
    Maneuvering target tracking of UAV based on MN-DDPG and transfer learning

    Defence Technology, 2021, Issue 2

    Bo Li, Zhi-peng Yang, Da-qing Chen, Shi-yang Liang, Hao Ma

    a School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 710072, China

    b School of Engineering, London South Bank University, London, SE1 0AA, UK

    c AVIC Xi'an Aeronautics Computing Technique Research Institute, Xi'an, 710068, China

    Keywords: UAVs; Maneuvering target tracking; Deep reinforcement learning; MN-DDPG; Mixed noises; Transfer learning

    ABSTRACT Tracking a maneuvering target autonomously, accurately and in real time in an uncertain environment is one of the challenging missions for unmanned aerial vehicles (UAVs). In this paper, aiming to address the control problem of maneuvering target tracking and obstacle avoidance, an online path planning approach for UAVs is developed based on deep reinforcement learning. Through end-to-end learning powered by neural networks, the proposed approach achieves perception of the environment and continuous motion output control. The proposed approach includes: (1) a deep deterministic policy gradient (DDPG)-based control framework that provides learning and autonomous decision-making capability for UAVs; (2) an improved method named MN-DDPG that introduces a type of mixed noise to assist the UAV in exploring stochastic strategies for online optimal planning; and (3) an algorithm of task decomposition and pre-training for efficient transfer learning to improve the generalization capability of the UAV control model built on MN-DDPG. The simulation results verify that the proposed approach achieves good self-adaptive adjustment of the UAV flight attitude in maneuvering target tracking tasks, with a significant improvement in the generalization capability and training efficiency of the UAV tracking controller in uncertain environments.

    1.Introduction

    Maneuvering target tracking technologies are widely employed in various fields, such as surveillance and early warning, search and rescue, and high-altitude shooting [1-3]. There are two major approaches to improving the accuracy of maneuvering target tracking: (1) implement a filtering algorithm to estimate and predict the status of the maneuvering target by using data fusion techniques [4]; and (2) design an online path planning method to realize continuous and stable tracking of moving targets by controlling the movement of perception platforms such as unmanned aerial vehicles (UAVs) [5-7]. In this paper, we focus on developing an effective path planning algorithm for UAV tracking control of a maneuvering target with sustainable real-time decision-making.

    Traditional path planning algorithms, such as A* [8], the ant colony algorithm [9], and the particle swarm optimization algorithm [10], have been implemented to obtain an optimal path based on environment modeling and performance evaluation. However, if a target moves or the external environment changes, these methods need to re-model the environment and plan a new effective path accordingly. This process consumes substantial computational resources and can significantly reduce the effectiveness of real-time target tracking [11]. Therefore, it is meaningful to provide autonomous learning and adaptive capabilities for UAV systems, so that UAVs can detect changes in the environment dynamically and make tracking decisions autonomously in real time.

    By combining the perceptual abilities of deep learning [12] with the decision-making capabilities of reinforcement learning [13], deep reinforcement learning (DRL) [14] can be used to generate adaptive strategies online through an interactive process between an agent and an environment, and it has performed well in the field of UAV motion planning [15-19]. Liu et al. [15] transformed the UAV route programming problem into a partially observable Markov decision process (MDP) and utilized the Deep Q Network (DQN) [20] algorithm combined with a greedy strategy to support UAV trajectory planning and optimization in a large-scale complex environment. By using the value-based Dueling Double DQN (DDQN) [21] with multi-step learning, Zeng [19] introduced a high-precision navigation technique for UAVs, which improved the effectiveness of target tracking. Although these methods have been successfully applied in UAV target tracking tasks, they oversimplify the UAV flight environment and divide the continuous action space into a limited number of action intervals before constructing the decision-making model, which can significantly affect UAV attitude stability and tracking accuracy.

    To address these problems and to achieve continuous action control of UAVs, researchers have explored other DRL algorithms based on policy gradients [22]. One such efficient and feasible algorithm is Deep Deterministic Policy Gradient (DDPG), proposed by Lillicrap [23] by incorporating DQN into the Actor-Critic [24] framework, which can map continuous observations directly to continuous actions. The DDPG algorithm has gained increasing popularity in intelligent continuous control of UAVs [25-29]. For instance, Yang [26] utilized the DDPG algorithm to assist UAVs in solving air combat situational awareness and dynamic planning decision-making problems. Ramos [27] used the DDPG algorithm to train a UAV to complete an automatic landing task in a real dynamic environment and achieved good results. However, despite this progress, existing DDPG-based UAV action control methods are still far from smart: their solutions tend to fall into local optima because of the generation of deterministic strategies. In addition, they are limited to single, pre-defined tasks, and can hardly be generalized to a new task where the target moves along random trajectories, or to a more complex situation that may contain unknown obstacle threats.

    To address these limitations, in this paper, an improved DRL algorithm is proposed. This new approach is used to provide UAVs with an accurate tracking control strategy for a maneuvering target in an uncertain environment. The main contributions of this research are as follows:

    (1) A dedicated UAV decision-making framework for target tracking is established based on the DDPG algorithm, in which a UAV tracking controller is optimized over the flight direction and velocity of the UAV to adjust the UAV flight attitude. This framework can be used by the UAV to learn autonomous real-time maneuvering target tracking and obstacle avoidance.

    (2) An improved MN-DDPG algorithm is proposed, in which a type of mixed noise composed of Gaussian noise and Ornstein-Uhlenbeck (OU) noise is introduced into the UAV training process to guide the UAV in strategy exploration and help the network escape from local optima.

    (3) An optimized approach based on transfer learning [30] is introduced to improve the generalization ability of the UAV control model, which helps the UAV quickly learn to track randomly moving targets after becoming proficient at tracking a target in uniform linear motion. In particular, we apply transfer learning twice for double pre-training, which assists the UAV in making more adequate preparations. Experimental results show that these optimizations improve the efficiency and stability of the deep reinforcement learning training process, and that the approach can be effectively adapted to training UAVs for precise flight actions and target tracking with obstacle avoidance.

    The remainder of this paper is organized as follows. Section 2 elaborates the background of the maneuvering target tracking task for UAVs and the related theoretical methods. Section 3 introduces the core approach, i.e. an improved MN-DDPG algorithm combined with a parameter-based transfer learning approach, whose crucial improvements are mixed-noise strategy exploration and parameter transfer. The performance, efficiency and adaptability of the proposed technique are demonstrated through the experiments presented in Section 4. Finally, Section 5 concludes the paper and envisages some future work on UAV intelligent autonomous decision systems.

    2.Background

    In this Section, an introduction is provided to the essential concepts of maneuvering target tracking and the motion and observation models of the UAV. Furthermore, the relevant theoretical background of the DRL-based DDPG algorithm and transfer learning is given.

    2.1.Mathematical model of maneuvering target tracking for UAV

    Maneuvering target tracking refers to observing a moving target through sensors, using an airborne computer to process the signals from the sensors, perceiving the environment in which the target is located, and making tracking decisions accordingly [31]. In this paper, the maneuvering target tracking problem under consideration involves a UAV, a group of sensors, and a ground maneuvering target. Therefore, it is essential to formulate the relevant UAV kinematic model and observation model.

    2.1.1.Kinematic model of UAV

    Many autopilots are equipped with advanced facilities such as high-precision dynamic three-dimensional information processing, AHRS, and GPS-INS inertial navigation aids [32], so there is no need for the user to specify wing control inputs for the UAV. For the sake of brevity and without loss of generality, in this research we suppose that the UAV flies at a fixed altitude with the help of the autopilot (i.e., the pitch angle is held constant and the flight height z(t) = H, where H is a positive constant). Hence, the continuous motion equation of the UAV with four degrees of freedom can be expressed as,

    where x and y represent the two-dimensional coordinates of the UAV, φ signifies the yaw angle of the UAV, and v is the flight velocity of the UAV. Furthermore, the state update of the UAV over a time interval Δt at time t can be described as,

    Note that it is reasonable to assume the speed v of the UAV is within a certain range, i.e., $v(t) \in [v_{\min}, v_{\max}]$.
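
    As a concrete reference, one standard fixed-altitude kinematic form consistent with the description above (an assumed reconstruction, not necessarily the authors' exact Equations (1) and (2)) is:

    $$\dot{x}(t)=v(t)\cos\varphi(t),\qquad \dot{y}(t)=v(t)\sin\varphi(t),\qquad \dot{\varphi}(t)=u_{\varphi}(t),\qquad \dot{v}(t)=u_{v}(t),$$

    with the corresponding discrete-time update over a step $\Delta t$:

    $$x(t+\Delta t)=x(t)+v(t)\cos\varphi(t)\,\Delta t,\qquad y(t+\Delta t)=y(t)+v(t)\sin\varphi(t)\,\Delta t,$$
    $$\varphi(t+\Delta t)=\varphi(t)+u_{\varphi}(t)\,\Delta t,\qquad v(t+\Delta t)=v(t)+u_{v}(t)\,\Delta t,$$

    where $u_{\varphi}$ and $u_{v}$ denote the yaw-rate and acceleration commands, matching the action definition given later in Section 3.1.2.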

    2.1.2.Observation model of UAV

    During the target tracking process, the UAV obtains the motion information of the target continuously in real time through radar sensors or other auxiliary systems, as shown in Fig. 1(a). Considering that the drone flies at a fixed altitude, in this paper the tracking process can be simplified to two-dimensional coordinates, as shown in Fig. 1(b).

    Fig. 1. Illustration of a UAV's observations of a ground target in a tracking task.

    In addition, a set of range sensors is employed to help the UAV detect possible obstacle threats ahead within the range L. As shown in Fig. 2, the blue area within the sensor range is considered the UAV threat detection area. At each moment, the UAV observation of obstacles is defined as

    Fig. 2. UAV obstacle threat detection based on range sensors.

    where $d_1, \ldots, d_7$ denote the corresponding sensor readings, with $d_n = L$ when no obstacle is detected; otherwise $d_n \in [0, L]$ represents the relative distance between the UAV and the obstacle threat.
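
    Based on this description, the obstacle observation can be written as the vector of range readings, and the target-relative geometry referenced later as Equation (3) takes the usual polar form (the exact notation and sign convention are assumptions here, not quoted from the original):

    $$O_o=[d_1,\,d_2,\,\ldots,\,d_7],\qquad D=\sqrt{(x_T-x)^2+(y_T-y)^2},\qquad \phi=\operatorname{atan2}(y_T-y,\;x_T-x)-\varphi,$$

    where $(x_T, y_T)$ is the target position, $D$ is the relative distance, and $\phi$ is the relative azimuth measured from the UAV heading $\varphi$.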

    2.2.Deep deterministic policy gradient based on deep reinforcement learning

    2.2.1.Deep reinforcement learning

    Deep reinforcement learning is a technique to train an agent to interact with its environment,and to generate reasonable coping behaviors in accordance with dynamic perception based on the powerful fitting capability of neural network.DRL uses Markov Decision Process(MDP)to model the training process.At each time step of this process,the agent interacts with the environment and obtain the state s∈S at the current moment,and then,selects a corresponding action a∈A.After performing the policy,the agent can receive reward r based on reward function R and get a new state s′∈S in next stage.By continuing trial-and-error interactions with the environment,the agent can leverage previous experience to learn the knowledges of making rational action in the direction of high returns and optimize its behavior to adapt well to varying circumstances,and ultimately,excels in making strategy to meet the mission delivery requirements.
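
    In this MDP formulation, the agent's objective can be written in the usual discounted-return form (standard RL notation, not quoted from the original):

    $$J(\pi)=\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t} r_t\right],\qquad \gamma\in[0,1),$$

    and the learned policy seeks $\pi^{*}=\arg\max_{\pi} J(\pi)$.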

    2.2.2.DDPG algorithm

    The DDPG algorithm is a mature DRL algorithm that can be used to tackle continuous action control problems. Built on the Actor-Critic framework, the algorithm uses an actor online network $\mu$ to output the action $a_t = \mu(s_t|\theta^{\mu})$ according to the agent's current state, and a critic online network $Q$ to evaluate the value of this action, where $\theta^{\mu}$ and $\theta^{Q}$ denote the parameters of the actor and critic online networks. In addition, an actor target network $\mu'$ and a critic target network $Q'$ are constructed for the subsequent update process.


    When updating the actor and critic networks, a mini-batch of N transitions $[s_t, a_t, r_t, s_{t+1}]$ is sampled from the experience replay buffer M to calculate the loss function L for the critic networks, which is given by,

    where $Y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$ is the target value, $\gamma$ is the discount coefficient, i is the index of the extracted sample, and $\theta^{\mu'}$ and $\theta^{Q'}$ represent the parameters of the actor target network and critic target network, respectively. Meanwhile, the actor networks are trained according to the policy gradient, which is expressed as,

    Finally, the two target network parameters $\theta^{\mu'}$ and $\theta^{Q'}$ are updated by the soft update strategy:

    where $\tau$ is a configurable constant coefficient used to regulate the soft update.
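
    For reference, the standard DDPG forms of these three updates (as in Lillicrap et al. [23], which the paper builds on) are:

    $$L(\theta^{Q})=\frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i - Q(s_i, a_i\,|\,\theta^{Q})\bigr)^{2},$$
    $$\nabla_{\theta^{\mu}}J \approx \frac{1}{N}\sum_{i=1}^{N}\nabla_{a}Q(s,a\,|\,\theta^{Q})\big|_{s=s_i,\,a=\mu(s_i)}\;\nabla_{\theta^{\mu}}\mu(s\,|\,\theta^{\mu})\big|_{s=s_i},$$
    $$\theta^{Q'} \leftarrow \tau\,\theta^{Q}+(1-\tau)\,\theta^{Q'},\qquad \theta^{\mu'} \leftarrow \tau\,\theta^{\mu}+(1-\tau)\,\theta^{\mu'}.$$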

    2.3.Transfer learning

    Transfer learning is a machine learning method that transfers learned model parameters to a new model to help train the new model [33]. Two basic concepts are involved in TL: (1) the source domain, which represents the object to be transferred from; and (2) the target domain, which represents the target to be endowed with knowledge. As shown in Fig. 3, through transfer learning, the learned model parameters or knowledge from the source-domain task can be shared with the new model in the target-domain task, which can speed up the training of new tasks and optimize learning efficiency [34].

    Fig. 3. Schematic of transfer learning.

    There are several common implementation methods of TL, such as the instance-based approach, the feature-based approach and the parameter-based approach [35]. The instance-based TL method weights different data samples according to similarity and importance. However, this method needs to collect a large number of instance samples and calculate the similarity between these instance samples and the new learning samples, which consumes large amounts of memory and computational resources [36]. The feature-based TL approach needs to project the features of the source domain and the target domain into the same feature space and utilize machine learning methods to process the feature matrix [37]; this approach is mainly used for solving classification and recognition problems. The parameter-based TL approach applies the model trained in the source domain to the target domain and completes the new, similar task through a short retraining [38]. To build the UAV tracking decision-making model in this research, parameter transfer is a simple and effective choice, and it can help the UAV learn similar strategies from a more reasonable initial network based on previously trained model parameters [39]. As a result, the tracking task is simplified into a set of simple sub-tasks. We train the model to fulfill the sub-tasks and migrate the sub-task models to the final task through parameter-based transfer learning, which is explained in detail in Section 3.

    3.Proposed method

    This Section details the proposed approach for realizing autonomous tracking control of a UAV for a maneuvering target in an uncertain environment, including a DDPG-based UAV control framework, an improved algorithm named MN-DDPG, and an optimization based on transfer learning.

    3.1.DDPG based framework for UAV control

    Fig. 4 describes the UAV maneuvering target tracking framework based on the DDPG algorithm. The DDPG networks output a UAV action based on the current status of the UAV, and then the UAV receives the output of the actor networks and generates a corresponding flight strategy to track the target accordingly. Throughout the interactive process with its environment, the networks can be updated, and the UAV keeps learning and making effective sequential control policies from the previous experiences recorded in the experience buffer. The UAV state space, UAV action space and the reward function under this framework are defined and explained below.

    Fig. 4. UAV maneuvering target tracking framework.

    3.1.1.State space

    The state space of the UAV represents the valuable information the UAV can obtain before decision-making, which is used to help the UAV assess the situation. To train the tracking decision-making model, the UAV state space is defined as the input $S_t$ of the neural network in our DDPG framework, which is defined as,

    where $x_t$ and $y_t$ are the coordinates of the UAV on the X-axis and Y-axis, and $v_t$ denotes the velocity of the UAV. $\phi_t$ and $D_t$ represent the relative azimuth and relative distance between the UAV and the target, as formulated in Equation (3). Moreover, $d_1, \ldots, d_7$ represent the states of the surrounding obstacles based on the UAV observation of obstacles $O_o$, as mentioned in Equation (4).
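
    Putting these components together, the 12-dimensional state input described above (the ordering is assumed; the dimension matches the 12-unit input layer of the actor network reported in Section 4.1) can be written as:

    $$S_t=\bigl[x_t,\;y_t,\;v_t,\;\phi_t,\;D_t,\;d_1,\;d_2,\;\ldots,\;d_7\bigr].$$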

    3.1.2.Action space

    Considering that the UAV needs to maintain a reasonable speed and a stable heading when performing tracking tasks, we employ a UAV strategy controller based on the change rates of the UAV flight speed and heading. Therefore, the action output of the UAV is defined as,

    where the acceleration $\dot{v}_t$ and yaw rate $\dot{\varphi}_t$ at time t manipulate the UAV attitude according to Equation (2). Taking into account the airborne capabilities of the UAV, we constrain the maneuvering characteristics that the UAV must obey, which are specified in the experimental verification in Section 4.
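
    Accordingly, the two-dimensional action output (matching the 2-unit output layer of the actor network in Section 4.1) is:

    $$a_t=\bigl[\dot{v}_t,\;\dot{\varphi}_t\bigr],$$

    with both components bounded by the UAV's maneuvering limits.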

    3.1.3.Reward function

    The reward in DRL, which is the feedback signal available for the agent's learning, is used to evaluate the action taken by the agent. A simple approach is to set a sparse reward, in which the agent receives a positive return only if the task is accomplished. However, this method is inefficient at collecting experience data that is useful for UAV learning. Accordingly, the convergence of the network update is slow and the UAV may not learn optimal strategies.

    In this paper, a non-sparse reward is designed to guide UAV maneuvering target tracking with obstacle avoidance. Four types of excitation are considered in the reward: (1) a track reward about the distance between the UAV and the target; (2) a UAV course reward; (3) a UAV steady-flight reward; and (4) a safety reward about obstacle avoidance. Specifically, the track reward is formulated as,

    where $D_{t-1}$ and $D_t$ denote the distance between the UAV and the target at the previous moment and at the current moment t, and $\xi_1$ and $\xi_2$ are the positive weights of the two terms in the tracking distance reward. The UAV is penalized moderately if the target escapes the sensor detection range L. On the contrary, the UAV is considered to keep up with the moving target and is given a positive reward when it meets the first condition in Equation (10), where $D_e$ is the maximum eligible tracking distance for the UAV. In addition, the other three rewards are defined as

    where $\phi$ is the relative azimuth between the UAV and the target, and $\dot{v}_t$ is the acceleration of the UAV. Autonomous obstacle avoidance is regarded as one of the most essential competences of a UAV. In Equation (13), $d_1, \ldots, d_7$ represent the readings of the seven sensors. The UAV is expected to stay away from obstacles to maintain its own safety, and it is punished when obstacles are detected. To summarize, the final reward function in the learning process is defined as,

    Here, we introduce four relative gain factors $\lambda_1, \ldots, \lambda_4$, which represent the respective weights of the four rewards or punishments.
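
    As an illustration of how such a composite non-sparse reward could be computed, the following is a minimal Python sketch. The weight values, the shaping of each term, and the function/argument names are assumptions for illustration only, not the authors' exact Equations (10)-(14).

    ```python
    import numpy as np

    def tracking_reward(D_prev, D_curr, phi, dv, ranges,
                        L=500.0, D_e=100.0,
                        xi=(1.0, 1.0), lam=(1.0, 0.5, 0.2, 1.0)):
        """Illustrative non-sparse reward combining the four terms described in the
        text: tracking distance, course, steady flight, and safety. All weights and
        thresholds (xi, lam, L, D_e) are assumed values, not taken from the paper."""
        # (1) Track reward: reward closing the distance, bonus when within the
        # eligible tracking distance D_e, penalty if the target leaves sensor range L.
        if D_curr > L:
            r_track = -1.0
        elif D_curr <= D_e:
            r_track = xi[0] * 1.0 + xi[1] * (D_prev - D_curr)
        else:
            r_track = xi[1] * (D_prev - D_curr)

        # (2) Course reward: encourage pointing toward the target (small |phi|).
        r_course = np.cos(phi)

        # (3) Steady-flight reward: penalize large accelerations.
        r_steady = -abs(dv)

        # (4) Safety reward: penalize proximity to detected obstacles.
        r_safe = -sum(max(0.0, 1.0 - d / L) for d in ranges)

        lam1, lam2, lam3, lam4 = lam
        return lam1 * r_track + lam2 * r_course + lam3 * r_steady + lam4 * r_safe
    ```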

    3.2.MN-DDPG with mixture noise for strategy exploration

    A major challenge of learning based on DDPG in continuous action spaces is exploration. In the process of UAV training, as the target trajectory and environment change, the UAV needs to explore new strategies to complete tracking tasks. Therefore, we propose an improved MN-DDPG algorithm, in which mixed noises composed of Gaussian noise and Ornstein-Uhlenbeck (OU) noise are added to optimize the deterministic strategy generated by DDPG and guide the UAV in strategy exploration.

    Considering that DRL uses the MDP to model the tracking task as a sequential decision-making problem, and according to the characteristics of the UAV's continuous action output, the temporally correlated Ornstein-Uhlenbeck (OU) random process is adopted to provide action exploration for the UAV to deal with the changing environment. The noise based on the OU process is given as

    where $a_t$ is the action state at moment t, $\bar{a}$ is the average of the action sampling data, $\beta$ is the learning rate of the random process, $\sigma_1$ is the OU random weight, and $W_t$ represents a Wiener process.

    In addition, considering that the deterministic policies that were optimal for the previous tasks of the transferred model might be unsuitable in new task scenarios, another Gaussian noise is introduced to help the UAV learn adaptive stochastic behaviors. This exploratory process is particularly important at the beginning of transfer learning. The optimized behavior based on the policy network output $\mu_t$ with the mixed noises is updated as

    where $\sigma_2(e)^2$ denotes the Gaussian variance in episode e, which ensures that the UAV has a uniform and stable exploration capability in each episode, maintaining the effectiveness of exploration and correcting deviations. Meanwhile, as learning goes on, the transferred model gradually adapts to the new task scenarios, which requires an exponential decay of the Gaussian variance:

    where $\delta$ is the attenuation coefficient. During the training process, we use the MN-DDPG algorithm to update the DDPG control framework, in which the mixed noises are added for UAV stochastic strategy exploration, and we then associate the training process with the following transfer learning.
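
    A minimal sketch of this mixed-noise exploration is given below. The exact exponential-decay schedule of the Gaussian variance is one plausible reading of the text, and the class and parameter names are ours; in training, the actor output $\mu(s_t|\theta^{\mu})$ would be perturbed this way and then clipped to the UAV's maneuvering limits before execution.

    ```python
    import numpy as np

    class OUNoise:
        """Ornstein-Uhlenbeck process for temporally correlated exploration noise."""
        def __init__(self, dim, a_bar=0.0, beta=0.1, sigma1=0.2, dt=0.1):
            self.a_bar, self.beta, self.sigma1, self.dt = a_bar, beta, sigma1, dt
            self.n = np.zeros(dim)

        def sample(self):
            # dn = beta * (a_bar - n) * dt + sigma1 * dW
            dn = self.beta * (self.a_bar - self.n) * self.dt \
                 + self.sigma1 * np.sqrt(self.dt) * np.random.randn(*self.n.shape)
            self.n += dn
            return self.n

    class MixedNoise:
        """OU noise plus a Gaussian term whose variance decays across episodes."""
        def __init__(self, dim, sigma2_0=0.5, delta=0.001):
            self.ou = OUNoise(dim)
            self.sigma2_0, self.delta = sigma2_0, delta

        def perturb(self, mu_action, episode):
            sigma2 = self.sigma2_0 * np.exp(-self.delta * episode)  # assumed decay form
            gaussian = np.random.normal(0.0, np.sqrt(sigma2), size=mu_action.shape)
            return mu_action + self.ou.sample() + gaussian
    ```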

    3.3.Transfer learning integrated into MN-DDPG

    In this Section, the process of combining the transfer learning approach with MN-DDPG is discussed. Essentially, this process decomposes the tracking task into two progressive sub-tasks and carries out double pre-training of tracking tasks. Specifically, to help the UAV learn to track a target with obstacle avoidance step by step, we disassemble the overall task of the UAV tracking a complex maneuvering target while avoiding obstacles into two simple sub-tasks: 1) a UAV tracking task for a target in uniform linear motion; and 2) a UAV tracking task for a target with random motion. The learning process of UAV maneuvering target tracking based on MN-DDPG combined with transfer learning via task decomposition and double pre-training is summarized in Fig. 5.

    During the interaction between the UAV and the environment in sub-task 1, i.e., the first pre-training, the preliminary tracking model of the UAV is trained based on the MN-DDPG algorithm, and the weight and bias parameters of the current networks are saved. Then, we transfer the trained parameters into sub-task 2 as the initial network for sub-task 2. As the target moves with various trajectories, the UAV is trained based on the MN-DDPG algorithm to continuously learn new strategies. After finishing the sub-task 2 learning, we save the networks again; at this point, the second pre-training of the UAV is considered complete. Ultimately, we add obstacles to the task scenario and transfer the second pre-trained networks into the final tracking task. Through training in multiple new environments with random obstacles, the model not only keeps tracking the target accurately but also learns new knowledge about obstacle avoidance. With the successive migration of each trained sub-task model, the UAV is ultimately trained to be proficient in generating a comprehensive strategy. A detailed description of the complete algorithm is given in Algorithm 1.
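
    The parameter-based transfer step itself amounts to initializing each new stage's networks from the previously trained weights. A PyTorch-style sketch of this double pre-training workflow is shown below; the file names, environment labels, and the `make_agent`/`train` helpers are hypothetical placeholders, not the authors' code.

    ```python
    import torch

    def pretrain_and_transfer(make_agent, train):
        """Illustrative double pre-training: sub-task 1 -> sub-task 2 -> final task.
        `make_agent` builds fresh MN-DDPG actor/critic networks; `train` runs MN-DDPG
        in the named environment. Both are placeholders for the paper's procedure."""
        # Sub-task 1: track a target in uniform linear motion.
        agent = make_agent()
        train(agent, env="subtask1_linear_target")
        torch.save(agent.actor.state_dict(), "theta1_actor.pt")    # first pre-trained model
        torch.save(agent.critic.state_dict(), "theta1_critic.pt")

        # Sub-task 2: randomly moving target, initialized from sub-task 1 weights.
        agent = make_agent()
        agent.actor.load_state_dict(torch.load("theta1_actor.pt"))
        agent.critic.load_state_dict(torch.load("theta1_critic.pt"))
        train(agent, env="subtask2_random_target")
        torch.save(agent.actor.state_dict(), "theta2_actor.pt")    # second pre-trained model
        torch.save(agent.critic.state_dict(), "theta2_critic.pt")

        # Final task: random target plus random obstacles, fine-tuned from sub-task 2 weights.
        agent = make_agent()
        agent.actor.load_state_dict(torch.load("theta2_actor.pt"))
        agent.critic.load_state_dict(torch.load("theta2_critic.pt"))
        train(agent, env="final_task_with_obstacles")
        return agent
    ```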

    Fig. 5. The learning process of UAV maneuvering target tracking based on MN-DDPG combined with transfer learning, where the task is decomposed into two sub-tasks for double pre-training.

    Algorithm 1. UAV maneuvering target tracking training workflow of the proposed approach.

    4.Experiment and result analysis

    In this Section, we discuss the simulation experiment settings and further analyze the effectiveness and performance gains of our proposed approach based on the experimental results.

    Table 1. The detailed environment settings in the maneuvering target tracking tasks.

    4.1.Simulation environment settings

    In the experiments, settings and constraints on the parameters of the task environments have been made. According to the UAV aircraft performance, the acceleration and angular velocity ranges of the UAV can be determined, and the velocity of the UAV is restricted to a bounded range (m/s). The initial velocities of both the UAV and the target are set to 20 m/s. In addition, the Constant Velocity (CV) [31] model is used for the target in sub-task 1, where the target performs uniform linear motion. Then, the target with stochastic CV and Constant Turn (CT) [31] models is considered in sub-task 2, where the target maintains a constant speed but its trajectory is random. To train and verify the decision-making ability of the UAV in unknown environments, three random obstacles are added in the final task based on sub-task 2. In our experiment, the simulation step size is Δt = 0.1 s. Taking into account the fuel consumption of the UAV, the maximum working time of the UAV is set to 40 s. When the UAV completes the tracking task within the specified time (i.e., the UAV keeps tracking for 400 steps) or the target moves out of the detection range of the radar sensors, the task in the current training episode is considered over, and the simulation environment is reset. Table 1 gives the detailed environment settings of the tasks.

    In the UAV maneuvering target tracking framework, the actor network and its target network are constructed as two 12×64×64×2 fully connected neural networks based on the UAV's observation, and the critic network and its target network are constructed as two 14×64×64×1 fully connected neural networks. When the experience replay buffer is full of data, the Adam optimizer is employed to update the neural network parameters. The detailed hyper-parameters in the different stages of training are set as shown in Table 2. In addition, the parameters relating to the mixed noises are set with the learning rate of the OU random process β = 0.1, the OU random weight σ1 = 0.2, the initial Gaussian variance σ2(0)² = 0.5, and the attenuation coefficient of the Gaussian variance δ = 0.001.
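
    A PyTorch sketch of fully connected networks with the stated layer sizes is given below; the activation functions and the bounded (tanh) actor output are assumptions, since the paper does not specify them here.

    ```python
    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """12 -> 64 -> 64 -> 2 actor: maps the UAV state to [dv, dyaw]."""
        def __init__(self, state_dim=12, action_dim=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(),
                nn.Linear(64, 64), nn.ReLU(),
                nn.Linear(64, action_dim), nn.Tanh(),  # assumed bounded output, scaled to maneuver limits
            )

        def forward(self, state):
            return self.net(state)

    class Critic(nn.Module):
        """14 -> 64 -> 64 -> 1 critic: maps (state, action) to a Q-value."""
        def __init__(self, state_dim=12, action_dim=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                nn.Linear(64, 64), nn.ReLU(),
                nn.Linear(64, 1),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))
    ```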

    Table 2. Hyper-parameters in the UAV training process based on MN-DDPG.

    4.2.Simulation results analysis

    First, we conducted the first pre-training, i.e. sub-task 1, as shown in Fig. 6. In Fig. 6(a) we present the real-time position changes of the UAV and the target with green and blue trajectories, respectively. The UAV continuously learned through the MN-DDPG algorithm and selected appropriate accelerations and angular velocities to adjust its speed and attitude, thereby keeping its distance to the target bounded, albeit within a large fluctuation range, as demonstrated in Fig. 6(b). Although the UAV did not stabilize its own speed, as evident in Fig. 6(c), the proposed MN-DDPG algorithm completed the preliminary goal of training the UAV for simple target tracking.

    Fig. 6. Simulation 1: UAV tracking a target in uniform linear motion.

    After setting the environment variables, the second pre-training was conducted to fine-tune the model in the new scenario on the basis of sub-task 1, where the weights of the trained network were transferred to sub-task 2 as the initial weights of the current policy network. Furthermore, the CT model was added for the target on top of the original CV model. Fig. 7(a), (b) and (c) show the UAV tracking trajectory, tracking distance deviation and flight speed, respectively, in sub-task 2. It can be clearly seen that the UAV kept following the target and maintained a flight attitude similar to that of the target. With continuous iteration, the distance between the UAV and the target gradually decreased, and the velocity of the UAV tended to become stable. Compared to the previous training, the UAV optimized its strategies and effectively tracked the complex maneuvering target.

    Fig. 7. Simulation 2: UAV tracking a complex maneuvering target.

    The final task was to train the UAV to reach the navigation goal of tracking a complex maneuvering target while avoiding obstacles, based on sub-task 2. After accomplishing the double pre-training, the trained network weight parameters were transferred into the final task, and the entire training was eventually completed by fine-tuning the neural networks; the training result is shown in Fig. 8. In Fig. 8(a), the red areas represent obstacles. Although it was difficult for the UAV to make maneuvering decisions on its own in a complex dynamic stochastic environment involving obstacles, the UAV still navigated safely and tracked the maneuvering target accurately after a quick transfer learning. As shown in Fig. 8(b), the distance between the UAV and the target remained dynamically stable within a small range. Fig. 8(c) displays the statistical results, with minor fluctuations of the UAV speed from 0 to 400 steps in one whole tracking process. This indicates that the MN-DDPG algorithm with transfer learning via double pre-training can not only help the UAV cope with unexpected changes in complex environments, but also improve the flight stability of the UAV.

    Fig. 8. Simulation 3: UAV tracking a complex maneuvering target in an environment with random obstacles.

    4.2.1.Algorithm effectiveness

    In order to verify the advantages of the MN-DDPG algorithm for strategy exploration, a comparative experiment was designed against conventional DDPG. Fig. 9 shows the reward curves of the UAV when implementing the two algorithms in sub-task 1. The red curve and blue curve represent the reward of the MN-DDPG algorithm and the DDPG algorithm, respectively. As can be seen from the figure, the blue curve was in a downtrend, which means that the UAV could not obtain substantial and stable rewards. After the two algorithms had converged, the reward value of the MN-DDPG algorithm converged to 271.37, whereas the reward value of DDPG stabilized at -212.68, significantly less than the convergence value of the MN-DDPG algorithm.

    Fig. 9. Reward in each episode during the training phase.

    During testing, we randomly selected 50 episodes as samples, as shown in Fig. 10, from which we can see that the reward value of the MN-DDPG algorithm test was relatively stable, with an average value of 310.7828, while the reward value of DDPG fluctuated significantly, with an average value of -110.2586. This means that the UAV trained with the DRL-based MN-DDPG algorithm can explore effective policies and complete maneuvering target tracking tasks in a simple environment.

    Fig. 10. Rewards in each episode during the testing phase.

    4.2.2.Algorithm improvement

    Fig. 11 depicts the UAV trajectory after we directly applied the network trained in sub-task 1 to sub-task 2, from which we can conclude that directly applying the network parameters without supplementary learning and fine-tuning performs badly in our tasks. When the target moved in a straight line, the UAV could track the target within a controllable range. But when the target moved along a curve, the UAV failed to make an effective response decision and was gradually pulled away from the target. This means that the decision-making model trained with the DDPG algorithm without transfer learning had a weak generalization capability.

    Fig. 11. Tracking simulation based on directly applying parameters without transfer learning.

    To verify the effectiveness of MN-DDPG combined with the transfer learning algorithm, we collected the UAV reward data over 800 rounds using traditional DDPG, MN-DDPG, and MN-DDPG with transfer learning. In Fig. 12, the green curve shows the change in rewards during training in the final complicated scenario when starting from initial networks with the traditional DDPG algorithm. In addition, the blue curve represents the cumulative reward of each round in our task based on the MN-DDPG algorithm, and the red curve represents the aggregate reward in each round when optimized by MN-DDPG combined with transfer learning. Clearly, the learning effect of the UAV based on the plain DDPG algorithm is comparatively poor: it only gradually gains unstable reward growth after about 480 rounds. The UAV optimized by MN-DDPG with mixed exploratory noises starts to learn valid knowledge shortly after the beginning of training, which means that the MN-DDPG algorithm improves the exploratory efficiency of the UAV and accelerates the learning process. In this case, the UAV obtained a stable high reward after about 600 episodes. Compared with the two situations discussed above, the UAV trained using MN-DDPG with transfer learning via double pre-training in the complex task scenario could obtain a high reward around the 210th episode. The reward curve stabilized quickly after a slight fluctuation, which means that the UAV can make rational tracking strategies after only about 270 training episodes.

    Fig. 12. Comparison of DDPG, MN-DDPG, and MN-DDPG combined with transfer learning, based on the sum of rewards the agent obtains in each training episode.

    These results indicate that the performance and efficiency of the DRL-based MN-DDPG algorithm combined with transfer learning for UAV maneuvering target tracking are greatly improved. It enables the UAV to accomplish the tracking task in an environment with complex maneuvering targets and random disturbances, and it also accelerates training compared with the original DRL approach.

    5.Conclusions

    Traditional algorithms solve the UAV maneuvering target tracking problem by establishing a tracking model matched to the target motion based on a known environment. When the environment changes or an obstacle threat appears, the environment model must be updated in real time, which consumes considerable computing resources and reduces the real-time performance and effectiveness of tracking.

    The intent of this paper was to illustrate that DRL-based MN-DDPG combined with transfer learning is an efficient approach for developing an autonomous maneuvering target tracking control system for UAV applications in dynamic environments. Using the MN-DDPG algorithm, this paper constructs an online decision-making method and realizes autonomous maneuvering target tracking and obstacle avoidance for the UAV. The simulation results show that the mixed exploratory noises and the parameter-based transfer learning we introduced can improve the convergence speed of the neural networks of the original DDPG algorithm and improve the generalization capability of the UAV control model.

    In the future, we intend to expand the UAV missions to evaluate the performance, efficiency and robustness of our algorithms in a realistic 3D space, so as to support UAV flight with six degrees of freedom. Furthermore, we are also going to increase the diversity and authenticity of the simulated scenario, e.g. by adding wind interference and other uncertainties, and by considering UAV positioning errors and other errors, which will accelerate the conversion from virtual digital simulation to real UAVs and other spacecraft instrumentation.

    Declaration of competing interest

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    Acknowledgements

    The authors would like to acknowledge the National Natural Science Foundation of China (Grant No. 61573285, No. 62003267), the Aeronautical Science Foundation of China (Grant No. 2017ZC53021), the Open Fund of the Key Laboratory of Data Link Technology of China Electronics Technology Group Corporation (Grant No. CLDL-20182101) and the Natural Science Foundation of Shaanxi Province (Grant No. 2020JQ-220) for providing funding for the experiments.
