
    Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm

    2022-09-22 10:57:46 | Yong-feng Li, Jing-ping Shi, Wei Jing, Wei-guo Zhang, Yong-xi Lyu
    Defence Technology, September 2022

    Yong-feng Li a, Jing-ping Shi a,b,*, Wei Jing a, Wei-guo Zhang a,b, Yong-xi Lyu a,b

    a School of Automation, Northwestern Polytechnical University, Xi'an 710129, China

    b Shaanxi Province Key Laboratory of Flight Control and Simulation Technology, Xi'an 710129, China

    Keywords: Unmanned combat aerial vehicle; Aerial combat decision; Multi-step double deep Q-network; Six-degree-of-freedom; Aerial combat maneuver library

    ABSTRACT To solve the problem of realizing autonomous aerial combat decision-making for unmanned combat aerial vehicles (UCAVs) rapidly and accurately in an uncertain environment, this paper proposes a decision-making method based on an improved deep reinforcement learning (DRL) algorithm: the multi-step double deep Q-network (MS-DDQN) algorithm. First, a six-degree-of-freedom UCAV model based on an aircraft control system is established on a simulation platform, and the situation assessment functions of the UCAV and its target are established by considering their angles, altitudes, environments, missile attack performance, and the UCAV's own flight performance. By controlling the flight path angle, roll angle, and flight velocity, 27 common basic actions are designed. On this basis, aiming to overcome the defects of traditional DRL in terms of training speed and convergence speed, the improved MS-DDQN method is introduced to incorporate the final return value into the previous steps. Finally, the pre-trained learning model is used as the starting point for a second learning model to simulate the UCAV aerial combat decision-making process based on the basic training method, which helps to shorten the training time and improve the learning efficiency. The improved DRL algorithm significantly accelerates training and estimates the target value more accurately, and it can be applied to aerial combat decision-making.

    1.Introduction

    Air domination is becoming increasingly important in modern warfare. Among recent developments in this area, unmanned combat aerial vehicles (UCAVs) have attracted worldwide attention. UCAVs have the advantages of high mobility, low cost, and zero risk of casualties among their operators. In the past, UCAVs have been used mainly for aerial reconnaissance, battlefield monitoring, attracting fire, and communication relay tasks [1]. With developments in weaponry, computer intelligence, and communication technology, there has been continuous improvement in the performance of UCAVs [2,3], and they are likely to become a mainstay of military equipment, able to perform aerial combat, ground fire suppression, and acquisition of air dominance [4].

    Despite this great improvement in performance, however, it is difficult for UCAVs to carry out all of their complex tasks independently in an increasingly complex combat environment, and most of these tasks cannot be separated completely from human intervention. With currently available technology, it is necessary for a ground controller to control a UCAV through a task control station. This control method suffers from delays and is prone to electromagnetic interference. An interference attack can interrupt communication between the base station and the UCAV [5]. It is therefore essential for air forces to develop UCAVs with autonomous combat capability.

    With the rapid development of artificial intelligence (AI), it has now reached a level where it has great potential for application to autonomous aerial combat. The development of the ALPHA intelligent aerial combat system in the United States [6] indicates the likelihood that future aerial combat will no longer be between human and human, but rather between human and machine or between machine and machine [7]. UCAVs will evolve from simple remote control to intelligent and autonomous control and, equipped with intelligent combat decision systems, will gradually replace piloted aircraft, with consequent improvements in combat effectiveness and reductions in cost. In close combat, a UCAV will be able to select appropriate flight control commands according to the current combat situation, seize a favorable position, and find an opportunity to shoot down the enemy aircraft and protect itself.

    Abbreviations: 6-DOF, six-degree-of-freedom; AI, artificial intelligence; BP, back-propagation; DDPG, deep deterministic policy gradient; DRL, deep reinforcement learning; GWO, grey wolf optimizer; MS-DDQN, multi-step double deep Q-network; PID, proportional-integral-derivative; UCAV, unmanned combat aerial vehicle

    In the increasingly complex aerial combat environment, an autonomous maneuver decision requires a UCAV to generate appropriate maneuver control commands automatically in different aerial combat situations. Over the past few decades, aerial combat decision-making methods have fallen into two categories: traditional methods and those based on AI. The traditional methods for maneuver decisions include pursuit-evasion games [8-11], differential game strategies [12-15], game-theoretic methods [16], influence graph methods [17-20], and Bayesian theory [21-23]. For example, the differential game method describes the dynamic decision-making process of aerial combat through a differential equation and thereby determines the optimal behavior of a UCAV relative to its target. This enables the UCAV to respond rapidly to changes in the combat environment, but, owing to the limitations of real-time calculation, it is difficult to apply this approach in complex environments [12]. Game theory has been used to model UCAV aerial combat decisions in situations that are common to all aerial combat environments, but the time-sensitive nature of information in complex environments has a deleterious effect on some decisions. Therefore, a constraint strategy game model for time-sensitive information in a complex aerial combat environment has been proposed [16]. This model provides improved decision-making for a UCAV in both attack and defense. In another approach [21], aerial combat is regarded as a Markov process and Bayesian reasoning is applied, with the weights of the maneuver decision factors being adjusted adaptively to give an improved objective function and enhance the superiority of the UCAV. Although these decision algorithms can improve the efficiency, robustness, and optimization rate of decision-making to some extent, the frequent reasoning processes that they require increase the optimization time, leading to slow responses of the UCAV, which is not suitable for the modern battlefield environment.

    Methods based on AI include expert system methods [24-26], genetic learning algorithms [27], neural network methods [28-30], and reinforcement learning algorithms [32-34]. Expert system methods are widely used in AI, and the associated techniques are mature. In the context of UCAV control in a combat environment, these methods establish rule sets in the knowledge base by using if-then rules according to data provided by pilots. The situation and state of motion of both sides in the combat are predicted, and pseudo-random numbers are used to generate corresponding maneuver commands according to a certain probability. However, the disadvantage of this method is that the rule base is complex and has poor generality, and therefore it needs to be continually debugged. In Ref. [27], a decision model for aircraft maneuvering was designed based on a genetic learning system. By optimizing the maneuver process, the aerial combat decision-making problem in an unknown aerial combat environment can be solved, and the corresponding tactical actions can then be generated in different aerial combat environments. However, the parameter design of this method is subjective and cannot be applied flexibly. A decision-making system using a neural network technique has been shown to have strong tracking accuracy for highly maneuverable targets [28]. However, this neural network method needs a large number of UCAV aerial combat samples for its learning process, and a sufficient number of such samples is not available. To handle the uncertain factors affecting UCAVs in aerial combat, the method proposed in Ref. [31] forecasts the target state using the grey wolf optimizer (GWO) and can be used for real-time optimization.

    In contrast to other methods based on AI, reinforcement learning algorithms for UCAV control base their learning process on continuous trial-and-error interactions between the agent and the environment. According to feedback from the environment, these algorithms generate a strategy and then act according to this strategy to make the UCAV continuously interact with the environment and achieve the dominant position in aerial combat. Further feedback and consequent modifications of the strategy finally lead to an optimal strategy. Since the reinforcement learning process does not usually require training samples, it is able to optimize behavior through rewards obtained from environmental feedback alone [32]. To improve the computational efficiency of reinforcement learning, in the approach proposed in Ref. [33], expert experience is used as a heuristic signal to guide the process of reinforcement learning and neural network training. In Ref. [34], an intelligent aerial combat learning system based on the learning mechanism of the brain was proposed, with the aim of training UCAVs by simulating human reasoning processes and carrying out autonomous learning. In Refs. [35-37], by combining neural networks with reinforcement learning, a deep reinforcement learning (DRL) algorithm was constructed to improve the operational efficiency of UCAVs. The deep deterministic policy gradient (DDPG) method is another network-based DRL method. It can deal with the learning problem in a continuous action space. Using this method, a driver can be designed, and a maneuver strategy can be established to solve for continuous action values [38-40].

    However, the DRL algorithm suffers from slow training and convergence speeds. At the same time, under the conditions of aerial combat, the aircraft model itself is nonlinear and the flight trajectory of the target is uncertain, leading to difficulties in UCAV maneuver decision-making. To solve these problems, this paper adopts the following approach:

    (1) A particle (point-mass) model of the UCAV is used in most papers on air combat maneuver decision-making; in this paper, we construct a six-degree-of-freedom (6-DOF) UCAV model. In establishing the 6-DOF UCAV situation assessment function, not only the angle, height, environment, and missile attack performance of the UCAV are considered, but also the flight performance of the UCAV itself, to solve the problem of the UCAV losing control. The 6-DOF model is more in line with actual application needs and has higher authenticity and practicability.

    (2) The control law is designed to control the flight path angle, roll angle, and flight velocity, which extends the 7 classic maneuver actions to 27, so that a UCAV can carry out maneuvers that cannot be completed with a typical action library.

    (3) A multi-step double deep Q-network (MS-DDQN) algorithm is proposed to improve the training speed and accuracy of traditional DRL by introducing the final return value into the previous steps.

    (4) A 6-DOF UCAV model is constructed in a MATLAB/Simulink environment, and the appropriate aerial combat action is selected as the UCAV maneuver output. An aerial combat superiority function is established, and a UCAV aerial combat maneuver decision model is designed. Finally, based on the basic training method, the pre-trained learning model is used as the starting point of a second learning model. The simulation results show that the method is effective and feasible for UCAV maneuver decision-making.

    2.Background

    In this section, an introduction to the UCAV autonomous tactical decision system, the UCAV motion model, and the UCAV controller is presented, together with the relevant theoretical background, including the Q-learning algorithm and the Q-learning-based DQN algorithm.

    2.1.UCAV autonomous tactical decision system

    Research into UCAV autonomous tactical decision-making systems has aimed to improve the autonomy of UCAVs, allowing them to deal independently with emergencies, improve the efficiency with which they execute tasks, and improve their ability to adapt to the environment. As shown in Fig. 1, the UCAV autonomous decision-making module comprehensively evaluates the situation information of the UCAV and its target. If the UCAV is in the aerial combat state, the combat situation information is input into the module, which obtains the relative advantage function and the maneuver command based on the maneuver action library, generates the corresponding maneuver action, and obtains the rudder deflection through the control law, to guide the aerial combat of the UCAV. If the UCAV is not in the aerial combat state, it will search the battlefield until the target is found. The module can also be switched automatically into manual mode.

    2.2.UCAV motion model

    Fig. 2. The F-16 aircraft configuration.

    Fig. 2 shows the UCAV model used in this paper, which is based on the General Dynamics F-16 aircraft of the United States Air Force. The flight envelope of the UCAV model is -20° ≤ α ≤ 90°, -30° ≤ β ≤ 30°, and 0.1 ≤ M ≤ 0.6. Because the focus of this study is on UCAV maneuver decision-making, the UCAV is regarded as an ideal rigid body with left-right symmetry when its motion is considered, and the influence of the Earth's rotation is ignored, as are the dynamic characteristics of the sensors. The velocity and three attitude angles of the UCAV are considered. Control of the UCAV relies mainly on the thrust of its engine and on its aerodynamic control surfaces. To take account of the nonlinear behavior of the UCAV and its maneuvering capabilities, a 6-DOF equation is adopted to describe its state of motion. The structural parameters of the UCAV are shown in Table 1, and the restrictions on each rudder surface are shown in Table 2.

    Table 1. Structural parameters of the UCAV.

    Table 2. Limitations of each rudder surface.

    The Earth-fixed ground coordinate system is taken as the reference frame, and the nonlinear 6-DOF model of the UCAV in this system can be described by force equations, moment equations, motion equations, and navigation equations. The nonlinear relationship between the aircraft state vector x = [V, α, β, p, q, r, φ, θ, ψ, x, y, z] and the control input u = [δ_e, δ_a, δ_r, δ_T] can be obtained, where V, α, and β are the speed, angle of attack, and sideslip angle, respectively; φ, θ, and ψ are the roll angle, pitch angle, and yaw angle, respectively; p, q, and r are the angular rates about the three body axes; x, y, and z are the positions along the x-axis, y-axis, and z-axis of the ground coordinate system, respectively; and δ_e, δ_a, δ_r, and δ_T are the elevator deflection, aileron deflection, rudder deflection, and throttle lever displacement, respectively.
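    For readers who prefer code to notation, the following minimal sketch (Python, illustrative only; the paper's model is implemented in MATLAB/Simulink) shows one way of organizing the 12-dimensional state vector x and the 4-dimensional control input u defined above. The field names and the placeholder step() function are assumptions, not part of the paper.

        from dataclasses import dataclass

        @dataclass
        class UcavState:
            V: float       # airspeed
            alpha: float   # angle of attack
            beta: float    # sideslip angle
            p: float       # roll rate
            q: float       # pitch rate
            r: float       # yaw rate
            phi: float     # roll angle
            theta: float   # pitch angle
            psi: float     # yaw angle
            x: float       # position, x-axis (ground frame)
            y: float       # position, y-axis (ground frame)
            z: float       # position, z-axis (ground frame)

        @dataclass
        class UcavControl:
            delta_e: float  # elevator deflection
            delta_a: float  # aileron deflection
            delta_r: float  # rudder deflection
            delta_T: float  # throttle lever displacement

        def step(state: UcavState, control: UcavControl, dt: float) -> UcavState:
            """Placeholder for one integration step of the force/moment/motion/navigation equations."""
            raise NotImplementedError("The 6-DOF dynamics live in the MATLAB/Simulink model.")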

    Fig.1.UCAV autonomous decision module.

    Eqs. (1)-(4) above together constitute the 6-DOF motion equations of the UCAV.

    2.3.UCAV controller

    In this paper, using the above nonlinear UCAV model and taking account of the influence of attitude on UCAV aerial combat decisions, a proportional-integral-derivative (PID) algorithm is used to set up the control law and construct the basic operation database. At the same time, a DRL algorithm is used to determine the UCAV maneuvers under autonomous decision-making, thus achieving complete and accurate control of the UCAV.

    According to the flight path angle information, the controller directly regulates the normal overload of the aircraft and thereby changes the elevator deflection.

    Longitudinal control loop:

    where n_z is the normal overload, q is the pitch angular rate, γ is the flight path angle, n_zc and γ_c are the given (commanded) normal overload and flight path angle, the K terms are the proportional coefficients, and K_I is the integral coefficient.

    Velocity control loop:

    where V_c is the given speed command, K_P is the proportional coefficient, K_I is the integral coefficient, and K_D is the differential coefficient.

    The roll angle control loop is built around a vertical gyroscope. By measuring the roll angle of the aircraft and feeding the signal to the aileron channel, a roll stabilization loop is formed to keep the wings level. The yaw rate is measured by the directional gyroscope, and the yaw rate and sideslip angle signals are fed to the rudder channel to form the heading stabilization loop and reduce sideslip.

    Lateral control loop:

    where p is the roll angular rate, φ is the roll angle, r is the yaw angular rate, φ_c is the given roll angle command, and the K terms are the proportional coefficients.
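    The three loops above are plain PID structures. The sketch below (Python, illustrative) shows the assumed wiring: the flight path angle error is converted to a normal overload command, which together with pitch-rate damping drives the elevator; a PID on speed drives the throttle; and a proportional roll loop with roll-rate damping drives the aileron. All gain values here are placeholders; the actual gains are those listed in Table 3.

        class PID:
            """Textbook PID element; the gains below are placeholders, not the values in Table 3."""
            def __init__(self, kp, ki=0.0, kd=0.0):
                self.kp, self.ki, self.kd = kp, ki, kd
                self.integral = 0.0
                self.prev_err = 0.0

            def update(self, err, dt):
                self.integral += err * dt
                deriv = (err - self.prev_err) / dt
                self.prev_err = err
                return self.kp * err + self.ki * self.integral + self.kd * deriv

        gamma_loop = PID(kp=2.0)                    # flight path angle error -> normal overload command
        nz_loop    = PID(kp=0.5, ki=0.1)            # overload error -> elevator
        vel_loop   = PID(kp=1.0, ki=0.05, kd=0.01)  # speed error -> throttle
        roll_loop  = PID(kp=1.5)                    # roll angle error -> aileron

        def control_step(meas, cmd, dt):
            """meas/cmd: dicts with keys gamma, nz, q, V, phi, p (measured / commanded values)."""
            nz_cmd  = gamma_loop.update(cmd["gamma"] - meas["gamma"], dt)
            delta_e = nz_loop.update(nz_cmd - meas["nz"], dt) - 0.8 * meas["q"]          # pitch-rate damping
            delta_T = vel_loop.update(cmd["V"] - meas["V"], dt)
            delta_a = roll_loop.update(cmd["phi"] - meas["phi"], dt) - 0.3 * meas["p"]   # roll-rate damping
            return delta_e, delta_a, delta_T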

    The specific values of control law parameters are listed in Table 3.

    Table 3. Control law parameters.

    Table 4. Parameters and their values.

    Starting from the trim condition H = 3000 m, V = 200 m/s, θ = 1.73°, a flight path angle command γ_c = 30°, a roll angle command φ_c = 30°, and a velocity command V_c = 250 m/s are given, respectively.

    According to the digital simulation results shown in Figs. 3-5, the flight path angle, velocity, and roll angle signals accurately track the given step commands, which meets the requirements of the control law design.

    2.4.Deep reinforcement learning

    The reinforcement learning framework consists of the following five principal parts: agent, environment, states S, actions A, and rewards R (see Fig. 6). At time t, the agent generates an action and interacts with the environment. After the action has been executed, the state of the agent changes from s_t to s_{t+1}, and the environment returns a reward value R_t. In this way, the agent continually updates its knowledge through interaction with the environment, and the optimal solution is obtained after many iterations.

    Fig.3.Simulation results of flight path angle control.

    The calculation process in reinforcement learning is one of continuous exploration of the optimal strategy. Strategy here refers to the mapping from state to action. The following formula gives the probability of each action corresponding to each state s:

    In a reinforcement learning algorithm, we want to maximize the value of the action corresponding to each state:

    In traditional reinforcement learning, a tabular form is usually used to record the value function model. This method can give the value of the function for different states and actions in a stable manner. However, in the face of complex problems, the space of states and actions is large, and it takes a long time to retrieve the corresponding state values in the table, which makes such problems difficult to solve. Because deep learning integrates feature learning into the model, it has self-learning properties, is robust, and can therefore be applied to nonlinear models. However, deep learning cannot estimate data rules without bias, and it requires a large amount of data and repetitive calculation to achieve high accuracy. It is therefore appropriate to consider constructing a DRL algorithm by combining deep learning with a reinforcement learning algorithm.
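    To make the tabular baseline concrete, the toy update below shows classical Q-learning with a lookup table, using the same symbols δ (learning rate) and τ (discount rate) that appear later in Eq. (12); the state and action counts are arbitrary placeholders.

        import numpy as np

        n_states, n_actions = 100, 27          # arbitrary sizes for the toy example
        Q = np.zeros((n_states, n_actions))    # the tabular value function

        def q_update(s, a, reward, s_next, delta=0.1, tau=0.9):
            target = reward + tau * np.max(Q[s_next])     # bootstrapped one-step target
            Q[s, a] += delta * (target - Q[s, a])         # move the estimate toward the target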

    Fig.4.Simulation results of velocity control.

    The deep Q-network (DQN) model consists mainly of a multilayer back-propagation (BP) network maneuver decision model and a decision model based on Q-learning. In the decision-making process for a UCAV involved in aerial combat, it is necessary to analyze its flight status and combat situation, as well as those of the enemy aircraft. The BP network is used to calculate the long-term discounted expectation of each state-action pair, and the Q-function network is used as the basis of evaluation when traversing all maneuvers in different states. At the same time, to keep the learning data close to independently distributed data, a database is established to store the state, action, reward, and next state over a given period of time. With this method, in each learning step, a small portion of the samples ⟨s_t, a_t, r_t, s_{t+1}⟩ in the storage area is used, which disturbs the correlation of the original data and reduces the divergence compared with previous Q-learning approaches (see Figs. 7 and 8).

    To deal with uncertainty, the DQN algorithm also establishes a target network with the same structure to update the Q value. The target network has the same initial structure as the Q-function network, but its parameters are held fixed. The parameters of the Q-function network are assigned to the target network at intervals to keep the Q value unchanged for a certain period of time. The optimal solution can be obtained by minimizing the following loss function with the gradient descent method:

    where

    is the target parameter.

    Because the DRL algorithm iterates the value Q(s_t, a_t) of the state-action pair, in the learning process, when action a_t is selected at time t, the update of the value function becomes

    where δ is the learning rate, τ is the discount rate, r(s_t, a_t) is the comprehensive advantage function, η is the network parameter of the Q-function, and η⁻ is the target network parameter. It can be seen from Eq. (12) that the reinforcement learning update contains the comprehensive advantage function and the value of the state after the selected action, which means that the method tends to become stable in the long run.
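    A compact sketch of these DQN ingredients, experience replay plus a periodically synchronized target network, is given below. PyTorch is used purely for illustration (the paper's implementation is a BP network in the MATLAB environment), and the network sizes and hyper-parameters here are placeholders, not the paper's settings.

        import copy
        import random
        from collections import deque

        import torch
        import torch.nn as nn

        q_net = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 27))  # toy sizes
        target_net = copy.deepcopy(q_net)          # same structure, parameters held fixed
        replay = deque(maxlen=100_000)             # stores <s, a, r, s', done> tuples

        def train_step(optimizer, batch_size=32, tau=0.9):
            if len(replay) < batch_size:
                return
            batch = random.sample(replay, batch_size)            # random sampling breaks temporal correlation
            s      = torch.tensor([b[0] for b in batch], dtype=torch.float32)
            a      = torch.tensor([b[1] for b in batch], dtype=torch.int64)
            r      = torch.tensor([b[2] for b in batch], dtype=torch.float32)
            s_next = torch.tensor([b[3] for b in batch], dtype=torch.float32)
            done   = torch.tensor([b[4] for b in batch], dtype=torch.float32)
            with torch.no_grad():                                # target computed with the frozen network
                y = r + tau * (1.0 - done) * target_net(s_next).max(dim=1).values
            q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q, y)                  # squared TD error, minimized by gradient descent
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        def sync_target():
            target_net.load_state_dict(q_net.state_dict())       # copy parameters to the target net at intervals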

    3.Proposed method

    This section describes in detail how to realize the autonomous maneuver decision of the UCAV when maneuvering toward the target based on an improved DQN in an uncertain environment. It also gives the situation assessment function and the maneuver action library, as well as describing the implementation of the corresponding algorithm.

    Fig. 5. Simulation results of roll angle control.

    3.1.DQN-based framework for UCAV control

    3.1.1.Reward function

    In aerial combat decision-making, taking the situation between the UCAV and its target in instantaneous time and space as a reward-and-punishment signal and constructing the corresponding aerial combat advantage function enables the decision-making system to choose the appropriate maneuver and improve the UCAV's combat advantage over the enemy. Traditional environmental rewards generally include an azimuth reward, speed reward, distance reward, and height reward, and the comprehensive aerial combat situation assessment value is obtained by weighting these components. However, this situation assessment does not consider the performance of the UCAV's weaponry and cannot adapt accurately to different weapons. Therefore, to solve this problem, a superiority function for the attack mode of UCAV air-to-air missiles is designed in this paper [41]. The typical air-to-air missile attack range is cone-shaped, extending over a certain distance and angle in front of the attacking aircraft, as shown in Fig. 9. In this figure, V_U and V_T are the velocity vectors of the UCAV and the target, respectively; R is the distance vector between the UCAV and the target; φ_U and φ_T are the angles between the distance vector and the velocity vectors of the UCAV and the target, respectively; and R_a and φ_a are the attack distance and the attack angle, respectively, of the UCAV's missile.

    Fig.6.Basic framework of reinforcement learning.

    Fig.7.DQN model.

    Fig.8.Neural network.

    (1)Azimuth situation

    In an aerial combat environment, the rear (chasing) aircraft is in the dominant state and the aircraft being chased is in a disadvantaged state, while the two aircraft are in an equilibrium state when they fly in opposite directions or in the same direction. In this paper, the angle advantage is calculated from the azimuth angles of the two aircraft. The azimuth advantage function is given by

    When φ_U + φ_T = 0, the UCAV is tailing the target and the angle advantage function is greatest; when φ_U + φ_T = π, the UCAV and target are in an equal situation; when φ_U + φ_T = 2π, the UCAV is being tailed by the target and is in a disadvantaged state.

    (2)Distance situation

    Fig.9.Aerial combat situation.

    Nowadays, aerial combat weapons are generally air-to-air missiles, and for most of these missiles the hit rate is related mainly to distance. To make the UCAV decision-making robust, the distance advantage function is designed to be insensitive to small changes in distance. The distance advantage function is given by

    where R is the distance between the UCAV and the target, R_a is the attack distance of the UCAV's missile, and σ_R is the standard deviation.

    (3)Speed situation

    In an aerial combat environment, there is an optimal attack speed V_0 of the UCAV relative to its target, and the associated speed advantage function is given by

    where V is the current speed of the UCAV, and the optimal attack speed relative to its target is given by

    where V_T is the speed of the target and V_max is the maximum speed of the UCAV.

    (4)Height situation

    In an aerial combat environment, there is an optimal attack height difference h_0. When the UCAV is higher than its target, it has a potential energy advantage, and the associated height advantage function is given by

    where ΔH is the current height difference between the UCAV and the target and σ_h is the standard deviation of the optimal attack height of the UCAV.

    (5)Attack situation

    If the distance between the UCAV and the target is less than the attack distance of the missile, the angle between the UCAV velocity vector and the distance vector is smaller than the attack angle of the UCAV's missile, and the angle between the velocity vector of the target and the distance vector is less than 90°, then the target is within the attack range of the UCAV, which can launch a missile and intercept the target. This ends the current simulation round, and the next round is entered. The reward value of the UCAV is given by

    When the conditions in Eq. (18) are met, the UCAV receives a reward value. To train UCAVs to avoid attack by enemy aircraft, the target in the simulation also has attack weapons. When the target meets the same conditions, the UCAV is at a disadvantage and receives a negative reward value:

    where

    (6)Environmental situation

    To avoid stalling, too low or too high a flight path, too great a distance from the target, or collision with the target, the speed of the UCAV must be limited to not less than 150 m/s, its height to not less than 1000 m, and its distance from the target to the range 1000-50,000 m. The associated reward value is:

    At the same time, because the UCAV model is a 6-DOF nonlinear model, the choice of maneuver action should consider not only the situations of the UCAV and the enemy but also the state of the UCAV itself, so that the selected maneuver action can be fully executed without losing control of the UCAV. For a fixed-wing aircraft, the magnitudes of the triaxial forces and moments are related to the angle of attack and the sideslip angle, and so the key to controlling the aircraft and its flight quality is the airflow angle. When the aircraft is maneuvering, it is necessary to protect the airflow angle to prevent the aircraft from losing control when inertia or disturbances take it beyond the flight envelope. To ensure that the decision mechanism avoids choosing maneuver instructions that cause the UCAV to lose control, the angle of attack of the UCAV should be limited to [-20°, 20°] and the sideslip angle to [-30°, 30°], with a negative reward value being given when these limits are exceeded. Thus, the reward value is given by:

    (7)Comprehensive advantage evaluation function

    The advantage function is used to judge the maneuver strategy. The purpose of the maneuver is to seize a favorable position and attack the target. Therefore, decision-makers need to consider the influence of each advantage function on the UCAV's aerial combat and evaluate it comprehensively. Because each of the advantage functions has a different influence on the aerial combat situation, a comprehensive advantage function is taken as a weighted sum of the individual advantage functions:

    In this expression, the weight of each advantage function is determined by simulated aerial combat, and the calculated value is used as the feedback input from the environment. When this weighted sum of the advantage functions is large, the UCAV is in the dominant position. If this situation can be maintained for a certain time, the UCAV will have a greater chance to shoot down the target. However, when the weighted sum is small, the UCAV is in a disadvantageous position and has a greater probability of being attacked by the target.

    At the same time, the missile-based attack situation and the environmental situation are added to the dominance function to give the following formula:
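    The exact expressions of the advantage terms are those given by the equations referenced above. The sketch below only illustrates plausible functional shapes that match the qualitative descriptions (Gaussian-like distance, speed, and height terms; an angle term largest in a tail chase) and how they are combined into the weighted comprehensive advantage. All functional forms, spreads, and weights here are assumptions, not the paper's formulas.

        import math

        def azimuth_adv(phi_u, phi_t):
            # largest when phi_u + phi_t = 0 (tail chase), smallest when it equals 2*pi
            return 1.0 - (phi_u + phi_t) / (2.0 * math.pi)

        def distance_adv(R, R_attack, sigma_r):
            return math.exp(-((R - R_attack) ** 2) / (2.0 * sigma_r ** 2))    # peaks at the missile attack range

        def speed_adv(V, V_opt, sigma_v):
            return math.exp(-((V - V_opt) ** 2) / (2.0 * sigma_v ** 2))       # peaks at the optimal attack speed

        def height_adv(dH, dH_opt, sigma_h):
            return math.exp(-((dH - dH_opt) ** 2) / (2.0 * sigma_h ** 2))     # peaks at the optimal height difference

        def comprehensive_adv(terms, weights, attack_reward=0.0, env_reward=0.0):
            # weighted sum of the continuous advantage terms plus the sparse attack and
            # environment rewards that the text adds to the dominance function
            return sum(w * t for w, t in zip(weights, terms)) + attack_reward + env_reward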

    3.1.2.Action space

    The control mode of the UCAV is divided into attitude control and track control. Attitude control is common in UCAV short-range aerial combat, which requires tracking of and entanglement with targets. In the PID control law, the state and rate of change of the attitude angle are usually controlled to realize attitude changes. Track control is common in beyond-visual-range UCAV aerial combat, where long-distance targeting is achieved by controlling the position.

    In the current aerial combat environment, combat is intense and tactical actions must be switched quickly. By constructing a maneuver action database, appropriate maneuver actions can be selected according to the current battlefield situation, and the probability of winning an aerial combat can be increased. A UCAV aerial combat maneuver action library can be divided into two sub-libraries: a typical tactical action library and a basic maneuver action library. The typical tactical action library includes the Cobra maneuver, the hammer maneuver, the spiral climb, etc., but each of these tactical actions is essentially composed of various basic actions. At present, special maneuver actions must be closely coordinated between human and computer; otherwise the state of the UCAV can exceed its normal envelope, leading to the risk of loss of control. Therefore, in the design of the decision system, the basic maneuver action library [42,43] proposed by NASA is usually used as the selection range of the UCAV maneuver action library.

    However, these seven basic maneuvers are the flight states of a UCAV under extreme control inputs, and they do not reflect all the actions encountered in aerial combat. Control of an aircraft according to speed can be divided into constant-speed flight, accelerated flight, and decelerated flight; control according to roll angle can be divided into wings-level flight, left turn, and right turn; and control according to altitude can be divided into level flight, climb, and descent. The combination of speed, roll angle, and height gives 27 kinds of maneuver actions, as shown in Fig. 10.

    For realization of the basic action library, the movement calculation command in the maneuver action library consists of three commands: thrust, normal overload, and roll rate, [T, n_z, p]. The maneuver action command [V_c, γ_c, φ_c] in the ground coordinate system is used to realize the various maneuvers, and a candidate action library for autonomous combat decision-making is established, in which V_c is the velocity command of the UCAV, γ_c is the flight path angle command of the UCAV track, and φ_c is the roll angle command of the UCAV. A sketch of this 3 × 3 × 3 action library is given below.
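    The sketch enumerates the 27-action library as the Cartesian product of three speed commands, three roll-angle commands, and three flight-path-angle commands. The command magnitudes are placeholders; the paper fixes only the 3 × 3 × 3 structure and the command limits discussed next.

        from itertools import product

        SPEED_CMDS = (0.0, +20.0, -20.0)    # hold / accelerate / decelerate (delta V in m/s, assumed)
        ROLL_CMDS  = (0.0, -30.0, +30.0)    # wings level / left turn / right turn (deg, assumed)
        GAMMA_CMDS = (0.0, +15.0, -15.0)    # level / climb / descend (flight path angle in deg, assumed)

        ACTION_LIBRARY = [
            {"dV": dv, "phi_c": phi, "gamma_c": gamma}
            for dv, phi, gamma in product(SPEED_CMDS, ROLL_CMDS, GAMMA_CMDS)
        ]
        assert len(ACTION_LIBRARY) == 27    # 3 x 3 x 3 combinations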

    The 27 maneuver actions are used as the output of the UCAV maneuver decision, and the flight of the UCAV is thereby controlled. Because the UCAV lacks the human ability to sense the aircraft state, it is necessary to limit the above maneuver actions. By limiting the flight path angle, roll angle, and speed commands, constraints are imposed at the control output to prevent the UCAV from losing control due to excessively large or small values of the angle of attack, sideslip angle, and speed. The command range of the flight path angle at the control output is [γ_min, γ_max], that of the roll angle is [φ_min, φ_max], and that of the speed is [V_min, V_max]. At the same time, if the UCAV azimuth is near the target, the roll angle is probably too large, and so a reverse rolling command should be issued to correct this and prevent the combat efficiency from being reduced. On the other hand, altitude is less important than azimuth and distance, and it is easy to climb too high or descend too low in the learning process, so certain restrictions need to be imposed. Therefore, for a variety of aerial combat situations, the maneuver action commands are as follows. In terms of height, when a fixed-height command is received,

    When a climb command is received,

    When a descent command is received,

    When a direction instruction in terms of angle is received,

    Fig.10.Maneuver library.

    When a right-turn command is received,

    When a left-turn command is received,

    When a constant-speed command is received,

    where V_0 is the current speed of the UCAV. When an acceleration command is received,

    When a deceleration command is received,

    3.1.3.State space

    Since the aerial combat environment is a three-dimensional space, to fully represent the flight status and aerial combat situation of the two aircraft, the state space of the autonomous aerial combat maneuver decision module in Fig. 1 contains 10 variables:

    where φ_q is the angle between the UCAV velocity vector and the target velocity vector, θ_U and θ_T are the pitch angles of the UCAV and the target, respectively, H_U is the current flight altitude of the UCAV, and ΔH = H_U - H_T is the altitude difference between the UCAV and the target. The state space must be normalized before being input into the neural network model, as sketched below.
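    The following sketch assembles and normalizes the 10-dimensional state vector before it enters the Q-network. The choice of the 10 quantities follows the description above, and the normalization ranges are assumptions roughly consistent with the limits used elsewhere in the paper (speed ≥ 150 m/s, height ≥ 1000 m, distance 1000-50,000 m); the paper does not specify the scaling itself.

        import numpy as np

        def build_state(obs):
            """obs: dict of relative-geometry and kinematic quantities of both aircraft."""
            s = np.array([obs["R"], obs["phi_u"], obs["phi_t"], obs["phi_q"],
                          obs["theta_u"], obs["theta_t"], obs["V_u"], obs["V_t"],
                          obs["H_u"], obs["dH"]], dtype=np.float32)
            lo = np.array([1000, -np.pi, -np.pi, -np.pi, -np.pi/2, -np.pi/2, 150, 150, 1000, -10000], np.float32)
            hi = np.array([50000, np.pi,  np.pi,  np.pi,  np.pi/2,  np.pi/2, 400, 400, 12000,  10000], np.float32)
            return 2.0 * (s - lo) / (hi - lo) - 1.0     # scale each component to roughly [-1, 1]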

    3.2.UCAV aerial combat decision-making based on MS-DDQN

    The DQN algorithm suffers from a cold-start problem. It is difficult to get the model into a relatively ideal environment during the early stages of algorithm iteration, which leads to a large error in the estimation of the value function in the early learning stage. The slow learning rate affects the later learning process, and it is difficult to find the optimal strategy. With a double Q-learning algorithm, the model can use different networks when choosing the optimal action and when calculating the target value, thereby reducing the overestimation of the value function.

    At the same time, the aerial combat decision-making system needs to make many decisions throughout the entire aerial combat, and this is a continuous process. During training, some time is needed to skip the period of large fluctuations in the early stage and enter a stable period. Because the purpose of the decision-making is to ensure that the UCAV reaches a dominant position within a certain time and attacks the enemy aircraft, the situation on the UCAV side at the end of the aerial combat is particularly important. The final reward value of each aerial combat round is therefore returned to the value functions of the earlier state-action pairs Q(s_t, a_t) with certain weights, with the aim of improving the speed and accuracy of training.

    The target parameters are updated as follows:

    Fig.11.Neural network model.

    When an action is selected at time t, the value function is updated as follows:

    where t′ is the end time of the single aerial combat round and r(s_t′, a_t′) is the final reward value of the aerial combat; the reward value r(s_t, a_t) is updated as follows:
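    Putting the two ideas together, the target value used by MS-DDQN can be sketched as follows: the online network selects the greedy next action, the target network evaluates it (the double-DQN part), and the final reward of the round is folded back into every earlier step with a weight that decays with the number of steps remaining, following Algorithm 1. Tensor names and the exact placement of the λ^(t′-t) term are an illustrative reading of the listing, not a reference implementation.

        import torch

        def ms_ddqn_target(r, s_next, done, R_final, steps_to_end, q_net, target_net,
                           tau=0.9, lam=0.99):
            """r, done, R_final, steps_to_end: 1-D tensors; s_next: batch of next states."""
            with torch.no_grad():
                a_star = q_net(s_next).argmax(dim=1, keepdim=True)          # action chosen by the online net
                q_eval = target_net(s_next).gather(1, a_star).squeeze(1)     # ... evaluated by the target net
                decay = torch.pow(torch.as_tensor(lam), steps_to_end.float())
                y = r + (1.0 - done) * (tau * q_eval + decay * R_final)      # double-DQN term + final-reward term
            return y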

    A neural network approximation function is also used, in which the input layer receives the current situation information and the output layer produces longitudinal maneuver decision-making instructions, lateral maneuver decision-making instructions, and speed control commands (Fig. 11). The situation information is input into the neural network, and the maneuver action values are output. At the same time, the optimal maneuver action is obtained through interaction with the environment, which allows the UCAV to make operational decisions independently and improve its intelligence.

    In its training for aerial combat decision-making, the UCAV makes maneuver decisions according to the DRL algorithm described above. The whole training process is composed of multiple aerial combat rounds. Whenever the UCAV hits the enemy plane, is hit by the enemy plane, reaches the maximum round time, or is in an incorrect position, the aerial combat round ends, a new round begins, and the simulation environment is reset. In the training process, an ε-greedy strategy is adopted. In the beginning, actions are generated randomly with 100% probability. As the simulation progresses, this probability is continuously reduced to 10%, at which point the strategy approaches the optimum. At the same time, to reflect the effect of learning, the decision-making ability needs to be judged regularly during training, and the random probability is reduced to 0 during these evaluations, so that the decision-making model directly outputs the action with the largest Q value. The final value of the dominance function also needs to be recorded, to judge the learning efficiency across different periods.

    The specific implementation steps of the MS-DDQN algorithm for UCAV short-range aerial combat maneuver decision-making are shown in Algorithm 1.

    Algorithm 1. Specific implementation steps of the MS-DDQN algorithm.
    1  Initialize replay memory D to capacity N
    2  Input UCAV model, aerial combat environment, and missile model
    3  Initialize action-value function Q with random weights η and target action-value function Q′ with weights η⁻ = η
    4  Initialize hyper-parameters: evaluation frequency M, mini-batch size N, random probability ε, learning rate δ, reward value ratio λ, maximum episode E, maximum step T, target network update interval C
    5  For each episode e = 1, 2, …, E, do
    6      Observe the initial states of the UCAV models of both sides and the simulated aerial combat environment to obtain the current situation
    7      If e mod M == 0, do
    8          Set ε = 0 to perform an evaluation episode
    9      End if
    10     For each step t = 1, 2, …, T, do
    11         With probability ε choose one action at random from the 27 maneuvers in the maneuver library; otherwise select a_t = argmax_a Q(s_t, a; η)
    12         Perform action a_t, get reward r_t and new state s_{t+1}
    13         Store transition (s_t, a_t, r_t, s_{t+1}) in D
    14         Judge whether the aerial combat round is over
    15     End for
    16     Get the final reward R_t and the stop time t′ of the aerial combat, and store the transition (s_t, a_t, r_t, s_{t+1}, R_t, t′) in D
    17     Sample a random minibatch of transitions (s_i, a_i, r_i, s_{i+1}, R_i, i′) from D
    18     Set y_i = r_i for terminal s_{i+1}; otherwise set y_i = r_i + τ Q(s_{i+1}, argmax_a Q(s_{i+1}, a; η); η⁻) + λ^(i′-i) R_{i′}
    19     Perform a gradient descent step on [y_i - Q(s_i, a_i; η)]² with respect to the network parameters η
    20     Every C steps, reset the target network by setting η⁻ = η
    21     Gradually decrease the value of ε to 0.1
    22 End for
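    The listing above can be condensed into the training skeleton below, which makes the ε-greedy schedule (fully random at the start, decayed to 10%) and the periodic evaluation episodes (ε forced to 0) explicit. The env and agent objects are hypothetical wrappers around the Simulink model and the networks sketched earlier, and the linear decay schedule is an assumption.

        import random

        def train(env, agent, max_episodes=200_000, max_steps=200, eval_every=100,
                  eps_start=1.0, eps_end=0.1, n_actions=27):
            """Hypothetical env: reset() -> state, step(a) -> (state, reward, done).
            Hypothetical agent: greedy_action(s), store(...), store_final_reward(), learn()."""
            epsilon = eps_start
            for episode in range(1, max_episodes + 1):
                eps = 0.0 if episode % eval_every == 0 else epsilon   # evaluation episode
                s = env.reset()
                for _ in range(max_steps):
                    if random.random() < eps:
                        a = random.randrange(n_actions)     # explore: random maneuver from the library
                    else:
                        a = agent.greedy_action(s)          # exploit: argmax_a Q(s, a)
                    s_next, r, done = env.step(a)
                    agent.store(s, a, r, s_next, done)
                    s = s_next
                    if done:
                        break
                agent.store_final_reward()                  # final round reward R and stop time t'
                agent.learn()                               # minibatch update with the MS-DDQN target
                epsilon = max(eps_end, epsilon - (eps_start - eps_end) / max_episodes)  # assumed linear decay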

    4.Experiments and analysis of results

    In this section, we describe the simulation environment and the setting of the various parameters and carry out experiments. On the basis of the experimental results, we demonstrate the effectiveness and performance improvement provided by the proposed method.

    4.1.Simulation environment settings

    To verify the MS-DDQN algorithm proposed in this paper, the parameters are set and constrained during the experiment. According to the ranges of flight performance, acceleration, and altitude of the UCAV, a constrained flight control law is designed; that is, the limitations of the aircraft state are incorporated into the design of the controller, the control input is calculated, the actual rudder deflection angle of the UCAV is limited to the maximum allowable deflection angle in real time, and the flight parameters are maintained within safe boundaries without affecting normal flight control. This ensures flight and operational performance and safety, reduces the workload of the UCAV, and realizes control of the UCAV. The main constrained variables are the angle of attack, sideslip angle, overload, and airspeed.

    Let the UCAV be designated as red and the enemy (the target) as blue. The red and blue aircraft are confined to the same airspace in the simulation. In the first task, a constant-velocity model is introduced for the target, in which the target moves in a straight line at a uniform speed, and its initial direction, velocity, and position are uniformly distributed. In the second task, the target turns at a uniform speed, and its initial direction, velocity, and position are again uniformly distributed. The parameters of the MS-DDQN algorithm and their values are listed in Table 4.

    The parameters of the neural network are set as follows. A two-layer fully connected feedforward neural network is used as the online Q-network, with 10 input states and 27 output values. The network has two hidden layers, with 1024 and 512 units, respectively. The tanh function is used as the activation function of the hidden layers, and the purelin (linear) function is used for the output layer. The buffer size of the memory playback unit D is set to 10. After 10,000 samples have been stored, the neural network starts training. The number of training samples extracted each time is 1000, and the target network is updated every 100 combat rounds. The learning rate is δ = 0.01 and the discount coefficient is τ = 0.9.
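    For concreteness, the same network can be written as follows. PyTorch is an assumption here, since the paper's implementation is a BP network in the MATLAB environment, but the layer sizes and activations match the description above.

        import torch.nn as nn

        q_network = nn.Sequential(
            nn.Linear(10, 1024), nn.Tanh(),     # 10 situation inputs -> first hidden layer
            nn.Linear(1024, 512), nn.Tanh(),    # second hidden layer
            nn.Linear(512, 27),                 # linear (purelin-equivalent) output: one Q value per maneuver
        )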

    In the simulation, to allow the whole maneuver to be completed, the decision time of each step is 1 s, and the maximum time of each round is 200 s. If the UCAV hits the enemy aircraft, is hit by the enemy aircraft, reaches the maximum round time, or is in an incorrect position, the aerial combat round ends. Every 100 combat rounds, the learning ability of the neural network is evaluated and the reward value at the end of the flight is checked. The computer used in the simulation has an AMD Ryzen 7 3700X 8-core CPU and an NVIDIA GeForce GTX 1660 SUPER graphics card.

    4.2.Analysis of simulation results

    4.2.1.Algorithm training

    The aircraft models of both sides are identical, as described in Section 2.2; the red aircraft is the UCAV, the blue aircraft is the target, and the initial state space is shown in Fig. 12. We carry out the first training task as follows. The initial direction, speed, and position of the target are given under training episode 1 in Table 5. The target moves in a straight line at a constant speed. Pre-training is carried out based on the basic training method. After pre-training for 200,000 combat rounds, the weights of the trained network in subtask 1 are transferred to subtask 2 as the initial weights of the current strategy network, and training for a further 200,000 combat rounds is carried out. The initial state values of subtask 2 are given under training episode 2 in Table 5; the initial position of the target is on a ring 20,000 m away from the UCAV.

    Table 5. Initial state values.

    Fig. 12. Initial state space.

    After training, the states of both sides are set as in test episode 1 in Table 5, and the test is carried out. The whole combat trajectory can be seen in Fig. 13, which shows the real-time position changes of both sides. The trajectories of the UCAV and target are green and gold, respectively. Fig. 13(a) and (b) are a stereogram and a plan view, respectively, of the aerial combat. It can be seen that the UCAV begins by rolling to the right, accelerates to climb and turn right at 13 s, and continues to accelerate at 30 s. During the whole process, it chooses the appropriate maneuvers and adjusts its speed and attitude so as to achieve the dominant position and attack the target. As can be seen from Fig. 14(a), the distance between the UCAV and the target decreases continually until the attack distance is reached, and as shown in Fig. 14(b), the UCAV is slightly higher than the target, which is convenient for attacking the latter. Fig. 14(c) shows that the value of the UCAV's dominance function increases continuously, and the UCAV finally succeeds in defeating the enemy. Fig. 14(d), (e), and (f) show the changes in pitch angle, roll angle, and velocity, respectively, of the UCAV during its tracking of the target. It can be clearly seen that the UCAV continually adjusts its attitude and velocity in order to reach the attack position. Owing to the improvement of the maneuver library, the roll value selected by the UCAV as it flies toward the target will not be too large, avoiding the risk of overflight.

    To verify the universality of the training, the states of both sides are set as in test episodes 2 and 3 in Table 5, and the corresponding combat trajectories are shown in Fig. 15 and Fig. 16. The target again moves in a straight line at a constant speed.

    We can see that when facing the enemy aircraft from the rear, the UCAV chooses to accelerate, roll right, and climb to a certain altitude at the start, accelerates in forward flight at 30 s, switches attitude at 40 s, and rolls in behind the target to attack. When facing the enemy aircraft from the right side, it rolls to the left and climbs at the start. At 28 s, it accelerates, rolls to the left, and descends; it again chooses to attack from behind the target. This shows that the UCAV trained by the algorithm can choose the appropriate maneuver and occupy a relatively dominant position in different situations.

    After the environmental variables have been set, the second training task is carried out. The training method is the same as in the first training, and the target turns at a constant speed with a roll angle of 30°. The initial direction, speed, and position values are given under test episode 4 in Table 5.

    Fig. 17 shows the combat trajectory of the UCAV, and Fig. 18(a), (b), and (c) show its range, altitude, and speed and those of the target. It can be seen that the UCAV can still choose the appropriate maneuver in the complex situation of facing a turning target.

    Fig.13.Position simulation 1 of a UCAV attacking a linearly moving target.

    Fig.14.Simulation 1 of a UCAV attacking a linearly moving target.

    4.2.2.Effectiveness of the algorithm

    In the air combat maneuver decision-making process, the length of the decision-making time affects the reaction speed of the UCAV. The decision-making time of the DRL method is about 1.5 ms, while the decision-making times of other air combat decision-making algorithms, such as the differential game method and the game-theoretic method, are more than 3 ms, so the response of the UCAV with these methods is slower than with the DRL algorithm. However, the DQN algorithm has the problems of slow training and low efficiency, so this subsection aims to improve the learning efficiency of DRL through the MS-DDQN algorithm.

    To verify the superiority of the MS-DDQN algorithm, an experiment is performed to compare its performance with those of the MS-DQN and traditional DQN strategy exploration methods. These three algorithms are implemented in the first training task and are then evaluated after a period of training. The initial positions of the enemy and the UCAV in the evaluation are those given under training episode 2 in Table 5, and the average final dominance value is calculated.

    Fig. 19 shows the average final dominance value of the UCAV in each evaluation episode. The red, blue, and green curves represent the dominance values of the MS-DDQN, MS-DQN, and DQN algorithms, respectively. It can be seen that the training speeds of the MS-DDQN and MS-DQN algorithms are initially higher than that of the DQN algorithm and that the average dominance value of the MS-DDQN algorithm eventually becomes greater than that of the MS-DQN algorithm.

    After the task training, we randomly select 1000 initial positions to test the algorithms.

    Fig. 20 shows the average final dominance value of the UCAV calculated over each test set. From Table 6, it can be seen that the success probability of the MS-DDQN algorithm in the tests is much higher than those of the MS-DQN and DQN algorithms. This means that the use of the DRL-based MS-DDQN algorithm enables the UCAV to explore effective maneuvers and succeed in targeting enemy aircraft.

    Fig.15.Position simulation 2 of a UCAV attacking a linearly moving target.

    Fig.16.Position simulation 3 of a UCAV attacking a linearly moving target.

    Fig.17.Position simulation 4 of a UCAV attacking a turning moving target.

    4.2.3.Comparison of algorithms

    Simulations of UCAV aerial combat are performed in which the two aircraft are trained with the MS-DDQN and DQN algorithms, with the initial values of direction, speed, and position as given in Table 7. The UCAV is trained with the MS-DDQN algorithm and generates the corresponding air combat maneuvers, while the target is trained with the DQN algorithm; the action space is the same as that of the UCAV.

    Table 6. Comparison of different algorithms.

    Table 7. Initial state values for aerial combat comparison.

    Fig. 21 shows the combat trajectories, and it can be seen that the flight paths form a scissors pattern for both algorithms. Fig. 22(a), (b), and (c) show the altitude, roll angle, and speed of the UCAV. After a period of combat, the UCAV using the MS-DDQN algorithm successfully shoots down its target. This comparison shows that the combat effectiveness of UCAVs can be effectively improved by using the MS-DDQN algorithm.

    5.Conclusions

    Traditional algorithms model UCAV decisions in attack and defense by describing the dynamic decision-making process of aerial combat. However, these decision models require frequent reasoning processes, which waste a great deal of time in seeking the optimal solution, making them difficult to apply to real-time UCAV aerial combat.

    In this paper, an MS-DDQN algorithm based on DRL has been used to construct an autonomous aerial combat maneuver decision system for UCAVs that takes account of the 6-DOF flight of a UCAV. The decision system is trained by the transfer method. The simulation results show that, by improving the original DQN algorithm within the DRL framework, the proposed algorithm can significantly speed up training and improve combat effectiveness.

    Fig.18.Simulation 4 of a UCAV attacking a turning moving target.

    Fig.19.Average final dominance value of each evaluation episode.

    Fig.20.Average final dominance value of each testing episodes.

    In the future, we intend to make the simulation scenarios more realistic by considering uncertain factors such as obstacles, wind field interference, and measurement errors, as well as moving on from virtual digital simulations of UCAVs to actual maneuver operations.

    Fig.21.Comparison of MS-DDQN and DQN algorithms for position simulation 5 of a UCAV attacking a turning moving target.

    Fig.22.Comparison of MS-DDQN and DQN algorithms for position simulation 5 of a UCAV attacking a turning moving target.

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    This work is supported by the National Natural Science Foundation of China (No. 61573286), the Aeronautical Science Foundation of China (No. 20180753006), the Fundamental Research Funds for the Central Universities (3102019ZDHKY07), the Natural Science Foundation of Shaanxi Province (2019JM-163, 2020JQ-218), and the Shaanxi Province Key Laboratory of Flight Control and Simulation Technology.
