
    Path Planning for Intelligent Robots Based on Deep Q-learning with Experience Replay and Heuristic Knowledge

    2020-08-05 09:40:22 Lan Jiang, Hongyun Huang, and Zuohua Ding
    IEEE/CAA Journal of Automatica Sinica, 2020, Issue 4

    Lan Jiang, Hongyun Huang, and Zuohua Ding

    Abstract: Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network is used to resolve the “curse of dimensionality” issue of the Q-table in reinforcement learning. When a robot walks in an unknown environment, it collects experience data that are used for training a neural network; this process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that, in comparison with existing methods, our method converges to an optimal action strategy in less time and explores a path in an unknown environment with fewer steps and a larger average reward.

    I. Introduction

    With the development of science and technology, intelligent robots play an increasingly important role in human life. Avoiding obstacles in unknown environments and exploring a route are the most basic tasks of intelligent robots. Examples include sweeping robots, mining robots, and rescue robots. Because detailed environment information is lacking and the environment is unpredictable, it is difficult for intelligent robots to autonomously plan a path and avoid obstacles.

    In traditional methods, researchers often regard the environment as a geometric world and construct a map [1], [2], but building and updating maps is time-consuming, and it is impossible to construct a map that covers every scenario. The fuzzy logic method can cope with uncertain data and lets the robot navigate while ensuring obstacle avoidance [3]. Heuristic algorithms are widely used in path planning. In [4], [5], the particle swarm optimization algorithm is used to avoid collisions with obstacles. The ant colony algorithm is also used for path planning in [6]. All of them adopt heuristic functions to steer the robot toward a good direction of exploration. The artificial potential field method regards the robot environment as a potential field in which the target point produces a gravitational force that attracts the robot and obstacles generate a repulsive force that repels it. The studies [7], [8] propose two modified artificial potential field methods for path planning. Map construction and a neural network are combined to sense the environment and avoid collisions in [9]: the system constructs a grid-based map from known information and calculates the optimal trajectory with a neural network.

    In recent years, reinforcement learning has been widely used in intelligent robot path planning and obstacle avoidance [10], [11]. However, reinforcement learning has several shortcomings. First, the “curse of dimensionality” occurs when the robot is put into a complex environment. Second, slow convergence is still a problem: it takes a long time to train the robot [12]. Third, reinforcement learning has poor portability and generalization, so a trained robot cannot move in a new, unknown environment.

    In this paper, we apply deep Q-learning (DQL) with experience replay (ER) [13], [14] and heuristic knowledge (HK) to robot path planning and obstacle avoidance. In this method, a neural network replaces the Q-table of reinforcement learning. We take the raw sonar signal as the input of the neural network, which addresses the “curse of dimensionality”. The experience replay mechanism maximizes the use of the experience data that the robot collects while moving and breaks the correlation within the neural network’s training data. Heuristic knowledge guides the robot’s action selection and helps the network converge faster. Simulation results show that our method enables the intelligent robot to plan a path without collision in an unknown environment. They also show the effectiveness and general applicability of our method.

    This paper is organized as follows. Section II introduces the framework of our approach; Section III presents a method to train the neural network with experience replay and heuristic knowledge; Section IV presents the simulation experiments and their results; Section V discusses related work; and finally, Section VI concludes the paper and outlines future work.

    II. Our Approach

    Our approach is based on modified reinforcement learning. First, we briefly describe commonly used reinforcement learning. Then, we introduce our approach.

    A. Existing Reinforcement Learning

    A reinforcement learning system is one in which the agent learns an action strategy, a mapping from environment states to behaviors, that maximizes the reward. The rewards provided by the environment evaluate the quality of actions. A reinforcement learning system gains knowledge in an action-evaluation environment and improves its action strategy to adapt to that environment.

    Reinforcement learning is a learning technique that approximates dynamic programming. It determines the optimal strategy step by step and tries to find the maximum cumulative reward value in every state as its optimization strategy [15]. Instead of requiring positive or negative labels, reinforcement learning enables a robot to autonomously discover an optimal behavior through trial-and-error interactions with the environment [16]. Fig. 1 shows the framework of reinforcement learning, where the agent selects an action a according to the Q-table and executes it, and the environment then returns a state s and a reward r to the agent. The most commonly used reinforcement learning algorithm is Q-learning. In Q-learning, the Q-table is an optimal-strategy action-value function Q(s, a), which is updated according to (1).
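    For reference, the standard tabular Q-learning update can be written as follows. This is the textbook form and is only assumed to correspond to (1); the learning rate $\alpha$ and the next action $a'$ are conventional symbols introduced here, while $\gamma$ is the discount factor that also appears in Algorithm 1.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```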

    Fig. 1. The framework of the reinforcement learning system.

    Reinforcement learning tasks are usually described using Markov decision processes (MDPs). The essence of an MDP is that the transition probability and the reward obtained when moving from the current state to the next state depend only on the current state and action, and have nothing to do with past states and actions [17].

    Today, reinforcement learning is applied in many areas. Lei et al. [18] introduced reinforcement learning to design an adaptive strategy for the iterated prisoner’s dilemma, and simulation results illustrate the effectiveness of this method. Lei et al. [19] studied how to apply reinforcement learning to complex system control. They propose parallel reinforcement learning to overcome difficulties encountered in complex control systems, such as data inefficiency, data dependency, and distribution. In [20], an unsupervised weightless neural network learning algorithm and Q-learning are combined into a self-learning algorithm, which is applied to mobile robot navigation and obstacle avoidance.

    B. Our Approach: A Modified Reinforcement Learning Algorithm

    The intelligent robot system in Fig. 2 is a modified reinforcement learning system. In our approach, we use a neural network to replace the Q-table and add heuristic knowledge. In this study, the input of the network is the state of the robot, and its output is the expected cumulative reward corresponding to each action. Instead of choosing actions by querying the Q-table, the robot selects actions directly according to the output of the neural network or according to heuristic knowledge.

    Fig. 2. The framework of the intelligent robot system.

    Training a neural network requires a lot of data, but when the robot explores an unknown environment, it is impossible to prepare enough training samples in advance. So, the robot collects the experience data generated while it moves, in the form (s, a, r, s′), and stores them in replay memory D. In this way, the quantity of training samples is guaranteed. Then, the robot samples a random mini-batch of experience data from D to train the neural network.

    In some studies, neural networks have been used to replace the Q-table. In [21], Li et al. set up a neural network to learn a Q-function corresponding to traffic state and traffic system performance. The study [22] uses the ε-greedy algorithm based on Q-learning and neural networks to make the robot arrive at the end line of the driving arena without any collisions. The neural Q-learning algorithm has been shown to be efficient for path planning on square grids in [23]. How to make the neural network learn more quickly and with better results is still an open problem.

    To provide effective training data to the neural network, we add heuristic knowledge to this system. On the one hand, it guides the behavior of the robot; on the other hand, it increases the effectiveness of training the neural network. With the help of heuristic knowledge, the neural network converges to an optimal action strategy faster.

    In this intelligent robot system, the robot carries out path planning and obstacle avoidance without being hindered by a lack of prior training data or by the “curse of dimensionality”. After training, we obtain an adaptive obstacle avoidance model.

    III. Training Deep Q-learning with Experience Replay and Heuristic Knowledge

    A. Deep Q-learning With Experience Replay

    Reinforcement learning and a neural network are combined in [23] to improve the generalization ability of the model and to address the “curse of dimensionality”. But data samples for training neural networks are hard to obtain, and in that work the data in the Q-table are used to train the neural network. Deep learning generally requires data samples to be independent; however, the data samples in the Q-table are highly correlated states produced in sequence. In addition, the actions selected by the robot have an impact on the environment in reinforcement learning. To alleviate these problems, DeepMind proposed deep Q-learning with experience replay [14], originally to play Atari games. In this algorithm, experience data (s, a, r, s′) are stored in the replay memory D. The size of the replay memory D is fixed at N, and the replay memory always stores the last N collected experience data. During training, we randomly sample a mini-batch of experience data from D and use it to train the network according to (1).
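    The replay memory described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: the class name `ReplayMemory` and its method names are hypothetical, and uniform sampling without replacement is assumed.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (s, a, r, s_next) experience tuples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest item once capacity N is reached,
        # so the memory always holds the last N experiences.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement (an assumption of this sketch)
        # breaks the temporal correlation between consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Tiny demonstration with capacity N = 3 and five stored steps.
memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.store(s=step, a=0, r=0.0, s_next=step + 1)
batch = memory.sample(batch_size=2)
```

    After the loop, only the three most recent experiences remain in the memory, and `batch` holds two of them chosen at random.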

    B. Heuristic Knowledge

    The characteristics and quantity of training data are the most important factors determining the performance of a neural network model. A neural network is more likely to learn better representations when it is fed sufficient data [24]. To learn the expected policy, it is very important for a robot to have sufficient and effective experience data in the replay memory.

    There is a lot of randomness in traditional deep Q-learning. For example, the robot may hit obstacles when exploring randomly, and it selects actions randomly under the ε-greedy algorithm. The ε-greedy algorithm trades off exploration and exploitation based on a probability. Every time the robot selects an action, it explores by selecting an action at random with probability ε, ε ∈ [0, 1], or it exploits with probability 1 − ε, i.e., it selects the action with the highest reward value. For the neural network, collision experience data and experience data from randomly selected actions cannot contribute to training.

    Heuristic knowledge is used to guide the behavior of the robot, and it reduces the randomness in an intelligent robot system. With the help of heuristic knowledge, the robot selects a suitable action, which provides characteristic training data for the neural network and accelerates the training process.

    1) Obstacle Avoidance Knowledge: Because action choice is random in the early stage of reinforcement learning, the probability of the robot hitting obstacles is very high. If there is a large amount of collision experience data in the replay memory, it will inevitably have a negative effect on the learning of the neural network, but if there is no collision experience to serve as negative samples, the neural network can only learn one-sided knowledge. So, we equip the robot with obstacle avoidance knowledge, which helps the robot avoid obstacles as much as possible. In addition, the robot does not stop exploring when it hits an obstacle, and this collision experience data is also stored in the replay memory.

    In our work, we divide the state of the intelligent robot into four categories: 1) safe state (S), in which the robot will not hit an obstacle no matter which action is selected; 2) unsafe state (U), in which the robot may hit an obstacle at the next step; 3) failure state (F), in which the robot hits an obstacle; and 4) winning state (W), in which the robot arrives at the end point.

    If the robot is in a safe state, it selects an action using the ε-greedy strategy; if the robot is in an unsafe state, the obstacle avoidance mode is enabled, and the robot selects the action that moves it as far away from obstacles as possible, without regard to path planning. In this paper, a robot in an unsafe state moves in the direction of the sonar whose reading is greater than the obstacle avoidance distance and which is farthest from the sonar with the minimum reading.

    Using obstacle avoidance knowledge reduces the number of times the robot hits obstacles and helps improve the quality of the training data.
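    The obstacle avoidance rule for an unsafe state can be sketched as below. Several details are assumptions made only for illustration: the function name `avoidance_action`, the left-to-right ordering of the readings, the identification of the i-th sonar direction with the i-th action, the tie-breaking rule, and the fallback when no reading exceeds the threshold.

```python
def avoidance_action(sonar, avoid_dist):
    """Return the 1-based index of the sonar direction to move toward.

    sonar: distance readings ordered left to right (ordering assumed).
    avoid_dist: obstacle avoidance distance threshold (hypothetical value).
    """
    # Index of the sonar that sees the nearest obstacle.
    nearest = min(range(len(sonar)), key=lambda i: sonar[i])
    # Candidate directions are those whose reading exceeds the threshold.
    candidates = [i for i, d in enumerate(sonar) if d > avoid_dist]
    if not candidates:
        # Assumption: if no direction is clear, fall back to the largest reading.
        return max(range(len(sonar)), key=lambda i: sonar[i]) + 1
    # Choose the clear direction farthest (by index) from the nearest obstacle;
    # ties are broken by the larger reading (an assumption).
    best = max(candidates, key=lambda i: (abs(i - nearest), sonar[i]))
    return best + 1


# Obstacle close on the far left, so the robot heads for the far right.
choice = avoidance_action([20, 35, 80, 90, 120], avoid_dist=50)
```

    With the readings above, the nearest obstacle is seen by the first sonar, so the sketch selects direction 5, the clear direction farthest from it.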

    2) Goal-Directed Knowledge: When the robot is in a safe state, it usually uses the ε-greedy strategy to select actions. The robot selects the action through the neural network with probability 1 − ε and selects an action randomly with probability ε, which increases the randomness of the action selection and also makes the training data noisy. To reduce the blind exploration of the robot, goal-directed knowledge is used to guide the robot’s action selection. With probability ε, the robot no longer selects an action randomly but selects the action that takes it closer to the end point according to the goal-directed knowledge.

    In this paper, we use the angle between the robot’s direction and the end point as the basis for the goal-directed knowledge. The angle is defined as the angle through which the robot must rotate counterclockwise until it points to the end point. The angle ranges from 0 to 360 degrees. From the size of the angle, we know the positional relationship between the robot and the end point; e.g., if the angle is 180 degrees, the robot is facing away from the end point. This provides guidance for the robot’s action selection, so that the robot can move toward the end point. For example, when the angle is 30 (or 330) degrees, rotating 30 degrees to the left (or right) points the robot toward the end point.
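    The angle defined above can be computed from the robot pose and the end point as in the following sketch. The function name `goal_angle`, the coordinate convention (x to the right, y upward, heading measured counterclockwise from the x-axis), and the use of degrees are assumptions for illustration.

```python
import math


def goal_angle(robot_xy, heading_deg, goal_xy):
    """Counterclockwise rotation in degrees that points the robot at the goal."""
    dx = goal_xy[0] - robot_xy[0]
    dy = goal_xy[1] - robot_xy[1]
    # Direction from the robot to the goal, counterclockwise from the +x axis.
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the difference into the range 0..360.
    return (bearing - heading_deg) % 360.0


ahead = goal_angle((0.0, 0.0), 0.0, (1.0, 0.0))    # goal straight ahead
behind = goal_angle((0.0, 0.0), 0.0, (-1.0, 0.0))  # goal directly behind
right = goal_angle((0.0, 0.0), 90.0, (1.0, 0.0))   # goal to the robot's right
```

    A goal directly behind the robot gives 180 degrees, matching the example in the text, and a goal on the robot’s right gives 270 degrees because the rotation is measured counterclockwise.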

    Goal-directed knowledge assists the selection of robot actions, and it also helps speed up the training of the neural network.
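    Taken together, the two kinds of heuristic knowledge change only the action selection step. The sketch below shows one way to express it; the function name `select_action`, the string labels for the state categories, and the 0-based action indices are hypothetical choices, not taken from the paper.

```python
import random


def select_action(category, q_values, epsilon, goal_action, avoid_action, rng=random):
    """Pick an action index for one step.

    category: "safe" or "unsafe" (labels assumed for this sketch).
    q_values: network outputs, one per action.
    goal_action / avoid_action: actions suggested by goal-directed and
    obstacle avoidance knowledge, respectively.
    """
    if category == "unsafe":
        # Unsafe state: obstacle avoidance knowledge overrides everything else.
        return avoid_action
    if rng.random() < epsilon:
        # Exploration step: follow goal-directed knowledge instead of acting randomly.
        return goal_action
    # Exploitation step: greedy action with respect to the network output.
    return max(range(len(q_values)), key=lambda a: q_values[a])


greedy = select_action("safe", [0.1, 0.7, 0.3], 0.0, goal_action=2, avoid_action=4)
guided = select_action("safe", [0.1, 0.7, 0.3], 1.0, goal_action=2, avoid_action=4)
evade = select_action("unsafe", [0.1, 0.7, 0.3], 0.5, goal_action=2, avoid_action=4)
```

    Compared with plain ε-greedy selection, the only difference is the second branch, where a uniformly random action is replaced by the goal-directed one.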

    C. Training the Neural Network

    Unlike traditional Markov evaluation, we use a neural network to replace the Q-table in deep Q-learning with experience replay and heuristic knowledge. Without a prior training set of experience data, the neural network has to be trained while the robot moves. At each training step, the output of the neural network changes. In the training process of this paper, fixed target values are therefore lacking. If we train a neural network with a series of continuously changing values as the target, the neural network has difficulty converging. The network may not work because it falls into a feedback loop between the target value and the estimated value. Therefore, we adopt two neural networks to complete error backpropagation and update the weights. We use a slower-updating network to provide target values and gradually optimize the weights of the neural network.

    The two neural networks work as shown in Fig. 3. One of them, called evaluate_net, generates an estimated value, denoted by q_evaluate. The other, called target_net, generates a target Q value, denoted by q_target. The two neural networks have exactly the same structure. The evaluate_net always has the latest weights and is constantly updated. The target_net is a historical version of evaluate_net: it records the old weights of evaluate_net and is updated periodically. We initialize the two neural networks with the same random weights at the beginning of training. During training, we regard the difference between the two networks’ output values as an error and propagate it back through evaluate_net. By modifying the weights of the neurons, the error is minimized.

    Fig. 3. Diagram of the two neural networks.
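    The interplay of evaluate_net and target_net can be illustrated with a small sketch. It tracks only the weight bookkeeping and the target computation; the class `TwoNetworks`, the helper names, and the dictionary used as a stand-in for real network weights are hypothetical, and the gradient step itself is replaced by a placeholder.

```python
import copy


class TwoNetworks:
    """Holds evaluate_net weights and a periodically refreshed copy, target_net."""

    def __init__(self, weights, sync_every):
        self.evaluate_net = weights
        self.target_net = copy.deepcopy(weights)  # same initial weights
        self.sync_every = sync_every              # update interval of target_net
        self.steps = 0

    def after_step(self):
        # Copy the latest weights into target_net once every sync_every steps.
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target_net = copy.deepcopy(self.evaluate_net)


def td_target(reward, q_target_next, gamma):
    # y = r + gamma * max over actions of the target_net output for the next state
    return reward + gamma * max(q_target_next)


def squared_error(y, q_evaluate_taken):
    # Loss between the target and the evaluate_net output of the action taken.
    return (y - q_evaluate_taken) ** 2


nets = TwoNetworks({"w": [0.0, 0.0]}, sync_every=3)
history = []
for _ in range(3):
    nets.evaluate_net["w"][0] += 1.0  # placeholder for one gradient step
    nets.after_step()
    history.append(nets.target_net["w"][0])
```

    In this run, `history` shows that target_net keeps its old weights for two steps and receives the evaluate_net weights only at the third, which is what keeps the training target stable between synchronizations.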

    D. Using Deep Q-learning With Experience Replay and Heuristic Knowledge on Robots

    We use deep Q-learning with experience replay and heuristic knowledge for robot path planning and obstacle avoidance; the procedure is presented in Algorithm 1.

    Algorithm 1 Routing algorithm
      Initialize replay memory D to capacity N
      Initialize learning frequency K and target_net weight update frequency L
      Initialize target_net and evaluate_net with the same random weights
      for episode = 1, M do
        Initialize environment and set the robot to the start point
        for t = 1, T do
          Determine the state of the robot s_t
          if s_t is an unsafe state then
            Select an action a_t through obstacle avoidance knowledge
          end if
          if s_t is a safe state then
            With probability ε select an action a_t through goal-directed knowledge
            Otherwise select an action a_t according to the output of evaluate_net
          end if
          Execute action a_t and observe the immediate reward r_t and new state s_{t+1}
          Store the experience data (s_t, a_t, r_t, s_{t+1}) in D
          if t % K == 0 then
            Sample a random mini-batch of experience data (S_j, A_j, R_j, S_{j+1}) from D
            Feed the random samples into the two networks, obtain q_tar_j, q_eva_j
            y_j = R_j + γ max_{A_j}(q_tar_j)
            loss = (y_j − q_eva_j)^2
            Update the weights of evaluate_net through a gradient descent procedure on loss
          end if
          if t % L == 0 then
            Assign the weights of evaluate_net to target_net at interval L
          end if
          Determine the state of the robot s_{t+1}
          if s_{t+1} is a winning state then
            Finish this episode
          else if s_{t+1} is a failure state then
            Step back and continue to learn
          else
            Continue this episode
          end if
        end for
      end for

    In the early exploration-exploitation process, the robot just collects and stores experience data in the replay memory until there is enough data for it to learn from. Instead of adjusting the weights after every action, in our algorithm the robot samples a random mini-batch of experience data from the replay memory to train the neural network every K steps it moves.

    The neural network used in this algorithm is a three-layer backpropagation neural network. It has i input nodes, h hidden nodes, and j output nodes. The size of a mini-batch is m. The complexity of Algorithm 1 consists of two parts: selecting actions and training the neural network.

    1) Select Actions: The time complexity of using heuristic knowledge to select actions is O(1). Using the neural network to select actions is a feedforward propagation process, whose time complexity is O(m×h×(i+j)). Because the neural network is used to select actions most of the time, the complexity of action selection is O(m×h×(i+j)).

    2) Train the Neural Network: Before training the neural network, training data need to be extracted from the experience replay memory. Because the dimensions of the sample input matrix are fixed, the time complexity of this step is O(1). In a three-layer backpropagation neural network, the total time complexity of one training pass is O(m×h×(i+j)). Given the learning frequency K, the number of training passes per epoch is T/K. So, the complexity of training the neural network per epoch is

    O((T/K)(1+m×h×(i+j))), that is, O((T/K)(m×h×(i+j))).

    The number of training epochs is M, and according to the above analysis, the time complexity of Algorithm 1 is O(M×(T/K)×m×h×(i+j)).
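    As a worked instance, substituting the layer sizes reported for the network in Section IV ($i = 6$, $h = 10$, $j = 5$) gives the per-pass and overall costs below. The quantities $M$, $T$, $K$, and $m$ are left symbolic because their values are not stated in this part of the text.

```latex
m \times h \times (i + j) = m \times 10 \times (6 + 5) = 110\,m,
\qquad
M \times \frac{T}{K} \times m \times h \times (i + j) = 110\,\frac{M\,T\,m}{K}.
```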

    This algorithm can also be used to solve other problems, such as playing Flappy Bird or walking through a maze. It can create a very good model for a given task, but the final model cannot be applied to other tasks. This is because the model learns only one specific goal at a time and works for a specific task. When the learning task is completely different, the model must be retrained. But if the learning task is very similar, the obtained model generalizes to a certain degree.

    IV. Path Planning for Robots

    A. Simulation Environment

    It is difficult to apply reinforcement learning to robots directly. Intelligent robots need thousands of repeated training runs to obtain a good behavior strategy. In this paper, we use a simulation environment to train the robot.

    In our experiment, the task of the robot is to move from the start point to the end point without collision in an unknown environment; the start point and the end point are given. The distribution of the robot’s sensors is shown in Fig. 4. There are 5 sonar sensors on the robot, and the angle between adjacent sonars is 30 degrees. The distances measured by the sonars are denoted by (s1, s2, s3, s4, s5), respectively. The robot’s motions are likewise divided into five actions:

    Action 1: turns left 60 degrees and moves forward 30 cm;

    Action 2: turns left 30 degrees and moves forward 30 cm;

    Action 3: moves forward 30 cm;

    Action 4: turns right 30 degrees and moves forward 30 cm;

    Action 5: turns right 60 degrees and moves forward 30 cm.

    Fig. 4. The distribution of the robot’s sonar sensors.

    The angle between the robot’s current coordinate and the end point coordinate is denoted as β. According to the five actions of the robot, we design the goal-directed knowledge as follows:

    1) If 0 ≤ β < 15 or 345 < β ≤ 360, Action 3 will be selected;

    2) If 15 ≤ β < 45, Action 2 will be selected;

    3) If 45 ≤ β < 180, Action 1 will be selected;

    4) If 180 ≤ β < 315, Action 5 will be selected;

    5) If 315 ≤ β ≤ 345, Action 4 will be selected.
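    The five rules above translate directly into a lookup function. The sketch below is one straightforward encoding; the function name `goal_directed_action` is hypothetical, and β is taken in degrees.

```python
def goal_directed_action(beta):
    """Map the goal angle beta (degrees, 0..360) to an action number 1..5."""
    if not 0 <= beta <= 360:
        raise ValueError("beta must lie in [0, 360]")
    if beta < 15 or beta > 345:
        return 3  # goal roughly ahead: move forward
    if beta < 45:
        return 2  # turn left 30 degrees
    if beta < 180:
        return 1  # turn left 60 degrees
    if beta < 315:
        return 5  # turn right 60 degrees
    return 4      # 315 <= beta <= 345: turn right 30 degrees


# Boundary values of each rule, in increasing order of beta.
boundary_actions = [goal_directed_action(b) for b in (0, 15, 45, 180, 315, 345, 350, 360)]
```

    Evaluating the boundary values confirms that the intervals cover the whole range from 0 to 360 degrees without overlap.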

    We combine the 5 sonar distances and the angle β as the robot state, where s = (s1, s2, s3, s4, s5, β).

    Fig. 5 shows the structure of the neural network used in this paper. It is a three-layer neural network. Its input is the robot state s, and its outputs are the cumulative reward values corresponding to the different actions in state s. There are 6 neurons in the input layer, 10 neurons in the hidden layer, and 5 neurons in the output layer. The activation function of the hidden layer is the rectified linear unit (ReLU). The activation function of the output layer is linear. We use stochastic gradient descent to train the neural network in our study.

    Fig. 5. Structure of the neural network.
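    A forward pass through a network of this shape (6 inputs, 10 ReLU hidden units, 5 linear outputs) can be sketched in plain Python as follows. The helper names, the weight initialization range, and the example state values are assumptions made for illustration; training is not shown.

```python
import random


def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases (initialization scheme is an assumption).
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    # One fully connected layer: weighted sum of the inputs plus bias per unit.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def q_values(state, hidden, output):
    h = [max(0.0, z) for z in dense(state, *hidden)]  # ReLU hidden layer
    return dense(h, *output)                          # linear output layer


rng = random.Random(0)
hidden = init_layer(6, 10, rng)
output = init_layer(10, 5, rng)
state = [1.0, 0.8, 0.6, 0.8, 1.0, 0.25]  # (s1..s5, beta) after scaling; made-up values
q = q_values(state, hidden, output)
best_action = max(range(5), key=lambda a: q[a]) + 1  # action number 1..5
```

    The five outputs play the role of the Q-table row for the given state, and the greedy action is simply the index of the largest output.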

    The reward function is used to judge the merits of an action. According to the reward function, the robot interacts with the environment and adjusts its action strategy by the reward value. The reward function helps to strengthen expected behaviors and punish unsuitable ones. As the only feedback that drives network convergence, the negative reward for a collision between the robot and an obstacle must be very large [25]. The reward function adopted in this paper is shown in (2), where “→” represents a state transition, AG (AO) means the robot stays away from the end point (obstacles), and CG (CO) means the robot is close to the end point (obstacles).

    B. Simulation and Analysis

    We conduct comparative tests in the same experimental environment using the method in [23], DQL with ER, and DQL with ER and HK. The comparison is divided into two parts: the first part is the training phase, and the second part compares the generalization of the obtained models.

    1) Training Phase: In [23], the initial training phase trains a Q-table; for it, we quantize the sonar values and the angle into 4 and 8 levels, respectively. Our approach trains a neural network and obtains a trained model.
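    Quantizing the state for the Q-table baseline can be sketched as below, with 4 bins per sonar value and 8 bins for the angle. Equal-width bins, the function names, and the maximum sonar range `max_range` are assumptions; the text does not specify how the bin boundaries were chosen.

```python
def quantize(value, low, high, levels):
    """Map value in [low, high] to an integer bin 0..levels-1 of equal width."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * levels)
    # The upper boundary itself belongs to the last bin.
    return min(index, levels - 1)


def discretize_state(sonar, beta, max_range):
    # 4 bins per sonar reading and 8 bins for the goal angle.
    sonar_bins = tuple(quantize(d, 0.0, max_range, 4) for d in sonar)
    angle_bin = quantize(beta, 0.0, 360.0, 8)
    return sonar_bins + (angle_bin,)


table_key = discretize_state([10, 30, 60, 90, 100], beta=200.0, max_range=100.0)
```

    Under this scheme the Q-table has at most 4^5 × 8 = 8192 distinct states, which illustrates how quickly the table grows with finer quantization.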

    Table I shows the detailed parameters used in DQL training.

    TABLE I Training Parameters and Their Values

    There are three different map environments with two, three, and four obstacles, denoted M1–M3, respectively. The three methods mentioned above were used in those map environments. Training finishes when the average reward of the robot tends to be stable. Fig. 6 shows nine trajectories that the robot obtained by using the three methods in the three maps; each row uses the same map, and each column uses the same method. In each map, the blue dot in the lower right corner is the start point, the red “×” in the upper left corner is the end point, and the black areas are the four walls and the obstacles.

    As we can see, all three methods can guide the robot to the end point without collision. The trajectories of the three methods are almost the same. In general, trajectories obtained using DQL are straighter than trajectories obtained by Q-learning. Once the robot has learned heuristic knowledge, it can travel along a straighter path than the other robots. However, the robot always tries to choose the action that keeps it farther away from obstacles to avoid collisions, which makes the trajectories less smooth.

    The moving step count and the average number of training epochs needed for the robot to reach the end point on the different maps are listed in Table II. The differences in moving step count are small; in addition, the two DQL methods reach the end point with the same moving step count. This shows that heuristic knowledge has little effect on the moving step count. In terms of training epochs, however, DQL performs much better than Q-learning. Compared with DQL+ER, our method reduces the training epochs by 33.33%, 15.84%, and 23.38% on the three maps, respectively. That is, after applying heuristic knowledge, the number of training rounds is greatly reduced again.

    TABLE II Moving Step Count and Average Training Rounds

    We choose the average reward the robot collects in an epoch as our evaluation metric. Fig. 7 shows the average reward when the three algorithms work on M3. As we can see, in the early stage of training, the average reward is very noisy. One reason is that the robot explores the map, which makes it take many steps to reach the end point and decreases the average reward. Without heuristic knowledge, Q-learning and DQL+ER may hit obstacles in the first few training rounds, which is another reason why the average reward is low. In general, DQL+ER+HK converges earliest, at epoch 118, and has the highest average reward, around 2.27. Although the final average reward values of the three algorithms differ only slightly, DQL+ER+HK has the highest average reward and obtains a better action strategy, allowing the robot to arrive at the end point more directly.

    In summary, DQL with ER and HK makes the robot converge to the best trajectory faster than traditional Q-learning and takes fewer steps to reach the target. With the help of heuristic knowledge, the robot learns faster and obtains a better strategy.

    2) Generalization and Flexibility: Following the training process in [23], we use the 323 data in the Q-table obtained from M3 to train a neural network (NN) and combine this Q-table with the network as an adaptive model (Q+NN). DQL with ER and DQL with ER and HK also obtain adaptive models from M3. Two tests are performed to assess the generalization and flexibility of these three models. We record the trajectory of the robot on its first run in each new environment and compare the trajectories obtained by the three models.

    Test 1: The Robot Is Initialized From Arbitrary Points on M3

    Fig. 6. Nine trajectories that the robot obtained by using three methods in three different maps.

    Fig. 7. Average reward per episode on M3 when using Q-learning, DQL+ER, and DQL+ER+HK.

    Fig. 8 shows the trajectories when the robot starts from three arbitrary start points. The horizontal axis lists the three models, and the vertical axis lists the three arbitrary start points. In the top row of this figure, the start point is on the left side of the original start point. In this case, the differences among the three trajectories obtained by the three models are small. In the middle row, the start point is above the obstacle in the lower right corner. In this situation, the robot that uses our method goes almost straight to the end point, whereas the robot that uses Q-learning with a neural network takes a detour, and the trajectory obtained by DQL with ER is also more tortuous than that obtained by our method. In the bottom row, the start point is on the right side of the original start point. We can see that the left trajectory is the most tortuous. The middle trajectory has almost the same outline as the right one, but the right one is straighter.

    Detailed data on the moving step count needed to arrive at the end point are shown in Table III. Bold numbers are the minimum moving step counts in the different situations. The table shows that the robot that uses our method reaches the end point with fewer steps.

    Table IV shows the average reward that the three models obtain when they start from the three arbitrary start points. When the robots have similar trajectories, their average reward values are similar. But the fewer detours the robot takes, the greater the average reward obtained. The robot that uses our method always has the largest average reward.

    TABLE III Moving Step Count From Arbitrary Start Point

    TABLE IV Average Reward From Arbitrary Start Point

    Test 1 shows that all the models can make the robot reach the end point without collision when it is initialized from arbitrary start points. By comparing the experimental results, we find that our approach provides a better action strategy, which helps the robot follow a shorter path and obtain a larger average reward.

    Test 2: Changing the Position of Obstacles in M3

    Fig. 8. Nine trajectories that the robot obtained by using three models when starting from three arbitrary start points.

    In Test 2, the positions of the obstacles in M3 are changed, and the changed maps and the obtained trajectories are shown in Fig. 9. The horizontal axis lists the three models, and the vertical axis lists the three obstacle-changed maps. In changed Map 1, the number of obstacles is reduced from 4 to 3, and the middle obstacle is located on the line connecting the start point and the end point. The left figure in the top row shows that the robot tends to move away from the end point while moving. However, the two robots on the right bypass the obstacle directly and reach the end point. Their trajectories are similar. The middle row is changed Map 2, where the start point is surrounded by an inverted U-shaped obstacle. The trajectories show that all of the robots first explore to the bottom of the inverted U-shaped obstacle and then finally find the exit and reach the end point. During the exploration, the left trajectory shows that the robot collides with obstacles 4 times, with the collision points represented by red dots. However, the two trajectories on the right show that the robots walk to the bottom of the inverted U-shaped obstacle, turn around directly, and reach the end point successfully. The bottom row is changed Map 3, where the positions of the obstacles have been changed tremendously and there are 10 block obstacles. The left trajectory shows that the robot moves in circles during its movement. In contrast, the two trajectories on the right show that the robots that use DQL find a shorter route. Although the two models on the right obtain very similar trajectories, DQL with ER and HK visibly obtains a straighter route.

    Table V shows the number of steps the robot takes to reach the end point in the three changed maps. Compared with Q-learning with neural networks, the robot that used our model takes fewer steps to reach the end point. The more complex the map is, the more obvious the advantage of our model is. Compared with DQL with ER, however, our model has only a slight advantage in the moving step count. In changed Map 2, those two models have the same moving step count.

    Table VI shows the average reward the three models obtained in the three changed maps. The table shows that Q-learning with NN cannot adapt well to the changed maps, and it always has the lowest average reward. It also causes the robot to hit obstacles 4 times in changed Map 2, so its average reward there is close to 0. The average reward of DQL + ER + HK is always higher than that of DQL + ER.

    TABLE V Moving Step Count in Three Changed Maps

    TABLE VI Average Reward in Changed Maps

    In Test 2, robots that used DQL can reach the end point without hitting obstacles, even though the obstacles have been changed drastically and the environment is more complex. Furthermore, we find that our model can adapt to new maps better than Q-learning with the neural network and DQL with ER. The reasons why the generalization ability of the model combining the Q-table with the neural network is not optimal are as follows:

    Fig.9.Nine trajectories that the robot obtained by using three models in the three obstacles-changed maps.

    1) Fuzzy states have an impact on the learning space of the robot. In the Q-learning training process, the states of the robot were fuzzified. An excessive degree of fuzzification leads to inaccurate execution of robot actions, whereas a small degree of fuzzification greatly increases the learning space of the robot and the learning time.

    2) A limited amount of data was used for training the neural network. It is critical to train a neural network with sufficient training data, and using only the data in the Q-table as training data limits how well the neural network can be trained.
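
The trade-off in the first reason can be made concrete with a small calculation. The sketch below counts Q-table entries when each sensor reading is bucketed into a fixed number of fuzzy levels. The sensor count, action count, and level values are illustrative assumptions, not figures taken from the experiments.

```python
def q_table_size(num_sensors: int, levels_per_sensor: int, num_actions: int) -> int:
    """Count Q-table entries when every sensor reading is bucketed into fuzzy levels."""
    # Each sensor independently takes one of `levels_per_sensor` values,
    # so the number of distinct states grows exponentially with the sensor count.
    num_states = levels_per_sensor ** num_sensors
    return num_states * num_actions


# Hypothetical robot: 8 sonar sensors and 3 actions (illustrative values only).
for levels in (2, 3, 5, 10):
    print(levels, q_table_size(num_sensors=8, levels_per_sensor=levels, num_actions=3))
```

With coarse fuzzification the table stays small but many distinct situations collapse into one state; with fine fuzzification the table grows exponentially, which is the growth in learning space and learning time described above.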

    DQL with ER also has good generalization ability. After heuristic knowledge is added, the robot can move more directly towards the end point. Thus, DQL with ER and HK can reach the end point with fewer steps and obtain a larger average reward.

    In the DQL with ER and HK algorithm, the state of the robot is represented directly by raw data, which is used as the input of the neural network. Training of the neural network runs throughout the moving process of the robot: the data from each step the robot moves is applied to the training of the neural network. Heuristic knowledge guides the behavior of the robot and makes the actions selected by the robot more purposeful. However, the trajectory obtained by our approach in a new map is not the shortest one. This is because, when avoiding obstacles, the robot tries to keep away from them, which leads to detours. The limited accuracy of the neural network is another reason, since it introduces uncertainty.
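
As a rough illustration of how heuristic knowledge can make exploration more purposeful, the sketch below replaces the uniformly random exploratory action of epsilon-greedy selection with the turn that best points the robot at the goal. This is only one possible form of heuristic, stated as an assumption: the action set `ACTIONS`, the functions `heading_error` and `choose_action`, and the goal-direction rule are hypothetical and are not the heuristic rules used in this paper.

```python
import math
import random
from typing import Sequence, Tuple

# Hypothetical action set: heading change in radians for each action index.
ACTIONS = {0: math.pi / 4, 1: 0.0, 2: -math.pi / 4}  # turn left, go straight, turn right


def heading_error(heading: float, robot_xy: Tuple[float, float],
                  goal_xy: Tuple[float, float]) -> float:
    """Signed angle from the robot heading to the goal direction, wrapped to [-pi, pi]."""
    desired = math.atan2(goal_xy[1] - robot_xy[1], goal_xy[0] - robot_xy[0])
    diff = desired - heading
    return math.atan2(math.sin(diff), math.cos(diff))


def choose_action(q_values: Sequence[float], heading: float,
                  robot_xy: Tuple[float, float], goal_xy: Tuple[float, float],
                  epsilon: float) -> int:
    """Epsilon-greedy selection whose exploratory branch is guided by the goal direction."""
    if random.random() < epsilon:
        # Exploration: pick the turn closest to the heading error (assumed heuristic)
        # instead of a uniformly random action.
        err = heading_error(heading, robot_xy, goal_xy)
        return min(ACTIONS, key=lambda a: abs(err - ACTIONS[a]))
    # Exploitation: pick the action with the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, `choose_action([0.1, 0.9, 0.3], 0.0, (0.0, 0.0), (0.0, 1.0), epsilon=1.0)` always explores and selects the left turn, because the goal lies to the robot's left, while `epsilon=0.0` selects the action with the highest Q-value.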

    The above two tests verify the generalization and flexibility of our approach. Our approach shows stronger robustness and adaptability than the Q-table with the neural network and DQL with ER, and it can provide a relatively superior action strategy.

    V. Discussion of Related Work

    Q-learning is widely used in path planning and obstacle avoidance of intelligent robots. Jaradat et al. proposed a new definition of the state space in [26] to reduce the size of the Q-table; in [27], [28], a method based on reinforcement learning and fuzzy logic is proposed. However, those methods only limit the number of states and cannot solve the "curse of dimensionality" completely. Also, the degree of fuzzification limits the learning states of the robot. In our work, a neural network is used to deal with this problem, where the input of the neural network is the state of the robot and consists of raw sonar data.

    To solve the problem of slow convergence and low efficiency in path planning, reinforcement learning based on a virtual potential field has been proposed in [29], which regards the unknown environment as a potential field. Y. Zheng et al. proposed a new algorithm based on hierarchical reinforcement learning and an artificial potential field [30]. However, as is well known, the potential field method easily falls into a local optimum.

    Neural networks have been used to enhance the generalization ability of reinforcement learning, but a large number of training samples are needed to train a neural network. In [23], [31], researchers use traditional reinforcement learning to generate training samples for neural networks and then train the neural network with those training samples. There are two training processes in those algorithms, and they are time-consuming. In [32], [33], the neural network is trained while the robot is moving, but because the neural network is trained once for each step the robot moves, the training efficiency is low. Furthermore, the correlation between consecutive training samples affects the representation learned by the neural network. The experience replay mechanism addresses both problems: training samples are stored in a replay memory, each step of experience data can be used in many weight updates, and the correlation between training samples is broken by random sampling. It reduces the burden of collecting prior experience data to train the neural network and improves the efficiency of experience data utilization.
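
The replay mechanism described above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation used in the experiments: the class `ReplayMemory`, the function `td_targets`, the transition layout, and the discount factor are all hypothetical choices.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# Assumed transition layout: (state, action, reward, next_state, done).
Transition = Tuple[List[float], int, float, List[float], bool]


class ReplayMemory:
    """Fixed-capacity store of transitions that is sampled uniformly at random."""

    def __init__(self, capacity: int) -> None:
        # A bounded deque discards the oldest transition once capacity is reached.
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks the temporal correlation
        # between consecutive steps of the same trajectory.
        return random.sample(list(self._buffer), min(batch_size, len(self._buffer)))

    def __len__(self) -> int:
        return len(self._buffer)


def td_targets(batch: List[Transition], next_q: List[List[float]],
               gamma: float = 0.9) -> List[float]:
    """One-step Q-learning targets: r if terminal, else r + gamma * max_a' Q(s', a')."""
    targets = []
    for (_, _, reward, _, done), q_next in zip(batch, next_q):
        targets.append(reward if done else reward + gamma * max(q_next))
    return targets
```

In use, every step the robot takes is pushed into the memory, and each training update draws a random mini-batch from it, so a single transition can contribute to many weight updates before it is evicted.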

    The work of [24] is similar to ours; its system is based on a deep Q-network, which combines a CNN (convolutional neural network) with deep Q-learning. In contrast to our work, this method uses the CNN to process image data and takes the obtained result as the input of deep Q-learning. We also add heuristic knowledge to the original DQL. Heuristic knowledge provides suitable and effective data for training the neural network, and it helps to train a satisfactory neural network.

    VI. Conclusion

    In this paper, we combined deep Q-learning with experience replay and heuristic knowledge for path planning and obstacle avoidance of intelligent robots. This method has been tested in three different environments, and the robot converges to an optimal strategy faster and reaches the end point with fewer steps than Q-learning with the neural network and normal DQL with ER. The experiments have also shown that our model adapts better to an unknown environment.

    In the future, we will work on the following aspects:

    1) Optimizing the planned path for robots. We should design a better strategy to coordinate path planning and obstacle avoidance.

    2) Exploring more complicated neural network architectures. In this paper, we completed the simulation experiments with the simplest neural network architecture. We hope that more complex neural network architectures can further improve the experimental results.

    3) Handling dynamic obstacles. In this paper, the obstacles are static. We will consider dynamic obstacles, which increase the difficulty of robot path planning.

    4) Applying our method to real robots. In this paper, we only simulated our method in an ideal environment, whose assumptions are hard to satisfy in real life. We will implement our method on a robot in a real environment. Due to the uncertainties in the environment, we may need to adjust our method.
