
    A policy iteration method for improving robot assembly trajectory efficiency

    Chinese Journal of Aeronautics, 2023, Issue 3

    Qi ZHANG, Zongwu XIE, Baoshi CAO, Yang LIU

    State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China

    KEYWORDS: Bolt assembly; Policy initialization; Policy iteration; Reinforcement learning (RL); Robotic assembly; Trajectory efficiency

    Abstract: Bolt assembly by robots is a vital and difficult task for replacing astronauts in extravehicular activities (EVA), but the trajectory efficiency still needs to be improved during wrench insertion into the hex hole of the bolt. In this paper, a policy iteration method based on reinforcement learning (RL) is proposed, by which the problem of trajectory efficiency improvement is formulated as an RL-based objective optimization problem. Firstly, the projection relation between raw data and the state-action space is established, and a policy iteration initialization method is designed based on this projection to provide the initial policy for iteration. Policy iteration based on the protective policy is then applied to continuously evaluate and optimize the action-value function of all state-action pairs until convergence is obtained. To verify the feasibility and effectiveness of the proposed method, a noncontact demonstration experiment with human supervision is performed. Experimental results show that the initialization policy and the generated policy can be obtained by the policy iteration method within a limited number of demonstrations. A comparison between experiments with two different assembly tolerances shows that the convergent generated policy possesses higher trajectory efficiency than the conservative one. In addition, this method ensures safety during the training process and improves the utilization efficiency of demonstration data.

    1. Introduction

    In-space assembly (ISA) technologies can effectively expand space structures, improve spacecraft performance, and reduce launch requirements.1–3 Bolt assembly is an indispensable on-orbit extravehicular activity (EVA) in the future work of the International Space Station (ISS)/Chinese Space Station (CSS) and large space telescope tasks. Currently, astronauts have to use an electric wrench tool to screw hexagonal bolts in the extravehicular scene. This activity is time consuming and lacking in operational flexibility and safety,4,5 so it is expected that space robots can replace astronauts in this activity.6–8 The space robot should autonomously make decisions for wrench insertion during assembly according to predetermined rules in the on-orbit extravehicular environment.

    Previous studies have proposed many peg insertion methods based on vision from multiple routes. Edge-fitting and shadow-aided positioning can facilitate alignment and peg-in-hole assembly with a visual-servo system.9 An edge detection method was proposed for post-processing applications based on edge proportion statistics.10,11 With the aid of vision recognition, eccentric peg-in-hole assembly of a crankshaft and bearing was achieved.12 However, the software and hardware of the space robot data processing system restrict the processing capability of visual information, so the insufficient visual information cannot satisfy real-time on-orbit visual servoing. With the obtained sparse visual information, it is difficult to distinguish the hexagonal contour features of the wrench and hole, but it is possible to recognize the approximate circular features to locate the hexagonal hole of the bolt.13 Force-based methods provide another way of insertion assembly by robots, and the core is to guide the assembly trajectory by the contact force. However, analyzing the relationship between the contact force and the trajectory required for assembly is rather difficult,14,15 so several types of methods were designed from the perspectives of control, shape recognition and contact modeling.16,17 Force control methods were typically adopted to solve peg-in-hole problems.18–20 By analyzing the geometric relationship through force-torque information, a hole detection algorithm tried to find the direction of a hole relative to a peg based on shape recognition.21 A guidance algorithm was proposed to assemble parts of complex shapes, and the force required for assembly was decided by kinesthetic teaching with a Gaussian mixture model.22 Force information can help to maximize the regions of attraction (ROA), and the complex problem of peg-in-hole assembly planning was solved analytically.23,24 Besides, a static contact model was also used for the peg-in-hole task, and the relative pose was estimated by force-torque maps.25 To eliminate the interference of external influences, a noncontact robotic demonstration method with human supervision was used to collect the pure contact force, so as to analyze the contact model in the wrench insertion task.26

    Even though the analytical contact model has been used in some cases, the efficiency of robotic insertion assembly requires further improvement. Reinforcement learning (RL) has shown its superiority in making optimal decisions, but robotic application of RL still faces various barriers. High-dimensional and unexplainable state definition is one of them. Deep reinforcement learning can handle high-dimensional image inputs with a neural network.27 The end-to-end approach utilizes images as both state and goal representation.28,29 High-dimensional image information is processed by a neural network with plenty of parameters, so ambiguous state expressions do not have clear physical meanings. Sample efficiency is also a major constraint for deploying RL in robotic tasks, as the interaction time is so long that experimenters cannot bear it. Some attempts have made progress in simulation, but application of RL in real-world robotic tasks is still challenging.30,31 Training time can be reduced by simultaneously executing the policy on multiple robots.32 Exploration sufficiency before RL policy iteration is another obstacle in robotic tasks, and the existing methods of exploration initialization or mild policy are impractical in robotic applications. The initial position cannot be set arbitrarily in the robotic assembly problem, which makes exploration initialization infeasible. The mild policy selects each action in any state with nonzero probability, but the frequency of selecting certain actions is not high enough during data acquisition. In addition, risk avoidance during policy training is also an important issue, because the robotic manipulator is continuously subjected to an unknown and potentially dangerous interacting contact force. From another point of view, this contact force can also provide guidance for assembly. In a word, to apply force-based RL in the robotic wrench insertion task, we need to overcome three difficulties as follows:

    1. The state-action space design in RL requires consideration of the physical meaning of the contact state.

    2. Sample efficiency cannot be neglected when sufficiently traversing the state-action space during policy iteration initialization.

    3. The security issue should be considered, especially in policy iteration.

    However, none of the methods mentioned above can meet these three key demands of force-based RL robotic manipulation at the same time. To cope with this problem, a novel policy iteration method including a state-action definition, an initialization method and a protective policy is proposed in this paper to realize the robotic wrench insertion task. The ground validation experiment using the noncontact demonstration method with human supervision proves that the proposed policy iteration method can improve the trajectory efficiency of robotic assembly in a safe way.

    The rest of the paper is organized as follows. Section 2 illustrates the proposed policy iteration method. The experiment platform, execution and results are introduced in Section 3. Section 4 presents the conclusions and future work.

    2. Policy iteration method

    In this section, the process of policy iteration is elaborated. First, how to improve the trajectory efficiency of the insertion task is defined as an objective optimization problem. To acquire a policy with higher efficiency, it is necessary to design the RL state-action space by projecting from raw force-position data. Then, policy iteration initialization is conducted to traverse all state-action pairs with higher sample efficiency. Finally, safety during the training process of policy iteration is guaranteed by using the protective policy.

    2.1. Optimization objective and definition of trajectory efficiency

    In this section, the robotic wrench insertion task is divided into several stages to illustrate the optimization objective, and then the trajectory efficiency is defined with a mathematical expression. Since the contour shapes of the wrench and the bolt hole are hexagonal, robotic wrench insertion into the bolt hole requires multiple steps. The relatively efficient sequence of the entire insertion assembly can be divided into three stages, with the wrench coordinate frame defined in Fig. 1.

    (1) rotating along the Z-axis (Rz+) till alignment is achieved in the Rz dimension.

    (2) moving along the Z-axis (Pz+) till the end face of the wrench contacts the bottom of the hole.

    Fig.1 Three stages of wrench insertion into hexagonal hole of bolt.

    (3) adjusting the position (Px or Py) or orientation (Rx or Ry) of the robotic end-effector.

    Stage (2) and Stage (3) can be exchanged according to real-time conditions as shown in Fig. 1, which provides an intuitive presentation of the stages of wrench insertion into the hexagonal hole of the bolt.

    For Stage (1), the end-effector wrench tool of the robotic manipulator only needs to be rotated along the direction in which the bolt can be tightened. When the lateral side of the wrench comes into contact with one side of the hexagonal hole as shown in Fig. 1, the wrench tool bears a torque opposite to the direction of rotation. This torque value can be measured by the 6-dimensional force/torque sensor, serving as the criterion for the end of Stage (1).

    For Stage (2), the robotic end-effector wrench tool only needs to move along the direction of insertion. When the end face of the wrench tool fits against the bottom of the hexagonal bolt hole as shown in Fig. 1, the wrench will bear a positive pressure opposite to the moving direction. The pressure value measured by the 6-dimensional force/torque sensor can be utilized as the criterion for the end of Stage (2).

    Since we only need to focus on whether the motion is finished in one dimension in the trajectories of Stage (1) and Stage (2), multi-dimensional decisions are unnecessary during trajectory planning for these stages. Moreover, in these two stages, it is unnecessary to consider trajectory efficiency improvement. By contrast, motion decisions in different dimensions of position and attitude must be considered in Stage (3). Thus, Stage (3) is particularly studied in this paper to improve the efficiency of the assembly trajectory.

    For Stage (3), the necessary trajectory from an arbitrary point to the terminal point is the "sum" of the shortest position and orientation distances, and the actual trajectory is the "sum" of the actual position and orientation paths. The difference between these two concepts is illustrated in Fig. 2.

    Thus, trajectory efficiency is defined as the ratio between the necessary trajectory (Rn and Pn) and the actual trajectory (Ra and Pa) as follows:
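    A plausible form of this definition, given as a sketch rather than the paper's exact Eq. (1) and assuming the orientation and position terms are summed after consistent normalization, is

```latex
\eta \;=\; \frac{R_{\mathrm{n}} + P_{\mathrm{n}}}{R_{\mathrm{a}} + P_{\mathrm{a}}} \times 100\%
```

    where Rn and Pn are the shortest orientation and position distances and Ra and Pa are the actually travelled ones; with this form the efficiency is at most 100%, which is consistent with the percentage values reported in Section 3.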

    Fig.2 Comparison between necessary trajectory and actual trajectory.

    This definition of trajectory efficiency can be adopted to evaluate and improve the assembly efficiency of the robotic wrench insertion task.

    2.2. RL state-action space design for data projection

    To improve the assembly trajectory efficiency by means of RL, the raw experiment data in the assembly task should be projected into the state-action space of the RL framework. Force information, instead of vision, is utilized to guide the robotic assembly task, so the state space should be designed according to force data. The properly designed state-action space correlates with the contact model established in the 2-dimensional space,26 as shown in Fig. 3. As the contact model is established in a 2-dimensional plane rather than the 3-dimensional space, the wrench is restricted to adjustment motions in this plane instead of directly moving towards the ideal direction. In this sense, dimensionality reduction is achieved by choosing the position or orientation adjustment in a specific direction in Stage (3).

    Fig.3 Relationship between force-torque direction and position-orientation deviation.

    The two contact points in the simplified contact model result in contact forces. These two contact forces can be denoted as F1 and F2, and can be regarded as two parallel forces. In fact, F = F1 - F2 is the resultant of the contact forces measured by the force sensor, and M is the torque value measured by the force sensor. The direction of the resultant force and the torque/force ratio reflect the position deviation and orientation deviation.26 Thus, according to the pose deviation determined by the contact model shown in Fig. 3, active trajectory planning is simplified to judging which decision is better, position or orientation adjustment, in this 2-dimensional plane.
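    As a brief sketch of why the torque/force ratio carries pose information (the lever-arm symbols below are illustrative and not the paper's notation): for two parallel contact forces acting at lever arms l1 and l2 from the sensor,

```latex
F = F_1 - F_2, \qquad M = F_1 l_1 - F_2 l_2, \qquad l_{\mathrm{eq}} = \frac{M}{F}
```

    so the equivalent lever arm locates the dominant contact along the wrench, which is the quantity that the SL intervals defined below discretize.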

    In the actual trajectory decision-making process, only the plane with the larger horizontal contact force is considered, and the force-torque plane orthogonal to it is ignored in each adjustment step (Fig. 1). Therefore, the position (represented by P) or orientation (represented by R) of the robotic end-effector is adjusted only in this plane each time.

    Then, the state and action can be designed as expressed in Eqs. (2)–(3), according to Fig. 3.

    After determining the principal plane for analysis, raw force-torque data are selected to construct the state variable, and the motion command can then be restricted to two options. Data dimensionality reduction is achieved accordingly. Specifically, the robot receives force information as the state variable, and the decision action is described as a 6-D motion of the robotic end-effector. After the motion of the end-effector, the contact state changes accordingly. The force data and the motion data can be recorded in the format shown in Eq. (4).
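    As a minimal sketch of this record format (field and class names are assumptions; the paper only specifies that each row pairs a force/torque reading with the issued motion command):

```python
from dataclasses import dataclass

@dataclass
class DemoRecord:
    # One demonstration row in the spirit of Eq. (4): the 6-D force/torque
    # reading at the moment a command button is pressed, plus the commanded
    # end-effector motion. Field names are illustrative assumptions.
    fx: float  # force along X [N]
    fy: float  # force along Y [N]
    fz: float  # force along Z [N]
    mx: float  # torque about X [N*m]
    my: float  # torque about Y [N*m]
    mz: float  # torque about Z [N*m]
    command: str  # issued motion command, e.g. "Px+" or "Ry-"
```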

    To improve the utilization efficiency of raw data, a data processing method is specially designed from the perspective of RL training, as shown in Eq. (5). The conditions of task completion are also illustrated.

    Due to the high expense of acquiring robotic manipulation data in the real physical environment, we should make full use of the obtained demonstration data. According to the contact model and the RL model, the contact forces Fx and Fy in the horizontal directions are compared first. The direction of the greater force determines the principal analysis plane at this moment (ignoring the contact force in the other, orthogonal plane), as shown in Fig. 4.

    After selecting the analysis plane at any time, the raw force data (Fx or Fy, Mx or My) that can be utilized to generate the designed state are determined (Fig. 4). Analyzing the contact conditions of all situations with the same simplified 2-dimensional contact model as shown in Fig. 3 significantly improves data utilization efficiency.

    Obviously, the value of the horizontal force affects the motion decision. For the specific task in this paper, the force values are widely scattered, so the range of force values is divided into 5 segments with an interval of 1 N to reduce the number of intervals and simplify judgement.

    According to the simplified contact model, both the horizontal force and the ratio between the torque and the horizontal force (in the same plane) reflect the position-orientation deviation. Specifically, this ratio may fall into 3 different interval ranges according to the distance between the end point of the tool and the force-torque sensor. Therefore, the defined two-dimensional state space, SF and SL, needs to be discussed by categorization. The detailed correspondence between raw force data and states is shown in Fig. 5.
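    The projection from raw data to the two-dimensional state can be sketched as follows; only the structure (plane selection by the larger horizontal force, 5 force segments of 1 N, 3 torque/force-ratio intervals) comes from the paper, while all numeric boundaries and the force-torque pairing are assumptions:

```python
def project_state(fx, fy, mx, my,
                  force_edges=(1.0, 2.0, 3.0, 4.0),   # assumed 1 N segment boundaries
                  ratio_edges=(0.02, 0.05)):          # assumed M/F interval boundaries [m]
    """Project a raw force/torque reading onto the two-dimensional (SF, SL) state."""
    # 1. Principal analysis plane: the horizontal direction with the larger force.
    if abs(fx) >= abs(fy):
        f, m = abs(fx), abs(my)   # assumed pairing: X-force with torque about Y
    else:
        f, m = abs(fy), abs(mx)   # assumed pairing: Y-force with torque about X
    # 2. SF: index of the 1 N-wide force segment the reading falls into.
    sf = 1 + sum(f > edge for edge in force_edges)        # SF1 ... SF5
    # 3. SL: index of the torque/force-ratio interval.
    ratio = m / f if f > 1e-6 else 0.0
    sl = 1 + sum(ratio > edge for edge in ratio_edges)    # SL1 ... SL3
    return sf, sl
```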

    In the simplified contact model, the robot only needs to determine a unidirectional position or orientation motion at any moment. In fact, the robot end-effector command has 8 options in total in Stage (3). Thus, after establishing the state space, the relation between the motion command and the action space is established according to the plane determined by the horizontal contact force and the state variable, as shown in Fig. 4. In the previous work, the conservative policy has been shown to be feasible,26 and it can be defined under the framework of RL as in Table 1. AR and AP correspond to Rx or Ry and Px or Py, respectively, in Fig. 4.

    Fig.4 Restricted options of the actions according to principal analysis plane and state variables.

    Fig.5 State definitions according to different situations.

    Obviously, it can be deduced that SL3 represents one-side contact with only position deviation. Thus, the policy at state SL3 outputs the concrete action AP, and the subsequent policies in this paper all obey the same principle. The policy in Table 1 is used to complete the task with the goal of controlling a small contact force during the whole process at the expense of sacrificing efficiency, which explains why this work is conducted.

    In the design of the state-action space, the state corresponds to raw force-torque data. After selecting certain force or torque components to constitute the state variables according to the larger horizontal force and the contact model, the 6-dimensional raw force-torque data are reduced to a 2-dimensional state variable. The action space corresponds to the motion command of the robotic end-effector. According to the current state and the contact model, the action space becomes a 1-dimensional variable with 2 options at any moment. This method of projection from raw data to the state-action space reflects the applicability and innovation of force-based active trajectory planning. The design of the state and action space in this paper greatly improves the efficiency of data utilization and reduces the difficulty of decision making. Four contact conditions share the same state, so all data are covered, and the action decision is simplified from one of eight options to one of two.

    2.3. Policy iteration initialization method

    After designing the state-action space of RL according to the contact model, policy evaluation before policy iteration is necessary to compare the effectiveness of different actions in the same state for higher reward. However, some state-action pairs are never visited after projecting the experiment data collected by a specific policy. Thus, the policy iteration initialization method is proposed to solve the state-action space coverage problem by efficiently sampling the subsequent experiment data with specific rules. Common initialization methods include exploration initialization and the mild policy.

    The exploration initialization method requires that every state have a certain probability of being the start point. For the robotic wrench tool, the whole assembly process of insertion into the hexagonal hole of the bolt has been divided into 3 stages as illustrated in Section 2.1. Policy improvement is intensively studied in Stage (3), which means that the start point of Stage (3) is an uncertain end point of Stage (1) and Stage (2). Thus, the exploration initialization method is not feasible for the current task.

    The mild policy is a method that can cover all the actions, but the probability of selecting specific actions may be relatively low. For the robotic wrench insertion task, the probability of visiting a specific state is already low, and choosing low-probability actions in low-probability states makes it difficult to traverse all state-action pairs. Theoretically, the mild policy requires more demonstration data, increasing the data acquisition cost.

    Although the conservative policy can accomplish the task safely, it cannot guarantee accessibility to the whole state-action space. The number of times that the previously collected demonstration data have traversed each state-action pair is counted and listed in a table, called the state-action table. The policy iteration initialization method in this paper requires the inclusion of all state-action pairs by a more direct and efficient approach. The core is to develop an active trajectory adjustment policy based on the vacancies of the state-action table, until the data for all positions in the state-action table have been obtained for calculating and comparing action-values. With respect to the policy iteration initialization method, the robotic experiment is required to obtain the data in the demonstration condition, in which the input of different policies is allowed.

    After accomplishing the projection from raw demonstration data to state-action variables, the collection of all previous demonstration experiment data generates a state-action table (e.g., Table 2). Each increment of the number in the table indicates that a trajectory has passed through this position one more time, and each time it passes by, the trajectory efficiency is calculated using this position as the starting point.
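    A minimal sketch of this bookkeeping (variable and function names are assumptions) is given below; each trajectory is assumed to be a list of already-projected (state, action) pairs:

```python
from collections import defaultdict

def build_state_action_table(trajectories):
    """Count how many times the demonstration trajectories have passed
    through each state-action pair."""
    table = defaultdict(int)
    for trajectory in trajectories:
        for state, action in trajectory:
            table[(state, action)] += 1
    return table

def vacant_pairs(table, states, actions):
    # Positions of the table still at zero: these are the pairs the projection
    # and directive policies are steered towards during initialization.
    return [(s, a) for s in states for a in actions if table[(s, a)] == 0]
```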

    Table 1 Conservative policy under framework of reinforcement learning.

    Thus, taking the actual situations of the robotic wrench insertion task into account, the policy initialization method can be designed as shown in Fig. 6. To accomplish the whole process, several policy forms are defined according to the state-action table to obtain the initialization policy, which is required as a prerequisite for policy iteration.

    Projection policy: it is completely opposite to the conservative policy.

    Directive policy: it chooses the action that has never been chosen if the specific states are encountered by chance in the demonstration. For other states, the policy chooses the action with the smaller count in the state-action table. After adopting the projection policy, some positions in the state-action table still cannot be accessed due to the uncertainty of the demonstration process. The directive policy continues the policy initialization after the projection policy has been executed twice, in order to protect the robotic manipulator and the manipulated object as far as possible.

    Both the projection policy and the directive policy can be defined as table policies. Specifically, the amount of data in the state-action table determines the formulation of the table policy. However, risks exist in the table policy. Initialization during the demonstration with human supervision provides the opportunity for policy adjustment at any time. In the initialization process, the policy can be changed, which gives rise to the concept of the protective policy.

    Protective policy: if the contact force increases continuously after two consecutive table-policy actions, switch to the conservative policy for the next action. When the contact force decreases, the table policy can be executed again. If the contact force still increases after adopting the table policy twice, the conservative policy will work to the end. Not only do the vacancies in the state-action table matter, but the human demonstrator should also make a real-time judgement about security based on the contact force. This requires that the demonstration platform be able to display the contact force on the monitor in real time (Fig. 8(c)).
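    The switching rule of the protective policy can be sketched as follows (the exact bookkeeping and thresholds are assumptions; the human demonstrator applies the rule by watching the force display):

```python
def protective_action(force_history, table_action, conservative_action):
    """If the contact-force magnitude has kept increasing over the last two
    table-policy actions, the conservative policy supplies the next action;
    once the force decreases, the table policy resumes."""
    if len(force_history) >= 3 and \
            force_history[-1] > force_history[-2] > force_history[-3]:
        return conservative_action   # force kept rising: protect the system
    return table_action              # otherwise keep executing the table policy
```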

    Once the demonstration experiment is performed, the demonstration data are classified according to the definition of the state-action space. The data amount in each position of the state-action table is accumulated, and the action-value function can be calculated by mathematical expectation. According to the policy definitions above, the policy initialization demonstration can be divided into several steps.

    In the first step, the policy initialization method starts with the conservative policy. It is obvious that more than half of the positions in the state-action table will be empty, as some state-action pairs are not accessed in the limited number of robotic demonstration experiments.

    Fig.6 Policy initialization process of the following stage of policy iteration.

    In the second step, demonstration experiments are conducted twice by the projection policy. In this step, the protective policy is involved in the demonstration to protect the robotic manipulator and the manipulated object.

    In the third step, the directive policy is adopted because the projection policy is too aggressive. The protective policy assists in this step, until no position in the state-action table remains zero.

    2.4. Policy iteration based on protective policy

    After policy iteration initialization, an initialization policy is obtained. Taking this policy as the start point, policy iteration generates a series of policies until convergence is realized. Directly executing these policies in the robotic task would inevitably cause damage to the robotic system. Therefore, the security issue should be considered and sufficiently guaranteed in the policy iteration process by using the protective policy.

    Policy iteration includes policy evaluation and policy improvement. Generally, three policy evaluation methods (Monte Carlo (MC), Dynamic Programming (DP), and Temporal-Difference (TD)) are commonly used to calculate the state-value function or action-value function. The state transition probability, representing the complete and accurate model of the environment, must be known in the DP-based method. Biased estimation exists in the TD-based method. Conversely, the MC-based method provides unbiased estimation but requires traversing the data. For the robotic assembly task in the real physical environment, the state transition probability, which represents the probability of transitioning to the next state from the current state and current action, cannot be described analytically. For the general demonstration experiment, it is indispensable to execute the experiment from beginning to end to obtain complete experiment data. Thus, the inherent problems of the MC-based method do not affect the calculation of the action-value function, and the MC-based method avoids involving the state transition probability. Therefore, the Monte Carlo method is utilized to calculate the action-value function by Eq. (6).
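    Consistent with lines 5–7 of Algorithm 1, Eq. (6) can be read as the standard every-visit Monte Carlo estimator (the notation below is standard RL notation, not necessarily the paper's):

```latex
Q_{\pi}(s,a) \;=\; \mathbb{E}_{\pi}\!\left[\,G_t \mid S_t = s,\ A_t = a\,\right]
\;\approx\; \frac{1}{N(s,a)} \sum_{i=1}^{N(s,a)} G_i(s,a)
```

    where G_i(s,a) is the return recorded at the i-th visit of (s,a) and N(s,a) is the visit count accumulated in the state-action table.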

    Table 2 Number of times that previous demonstration data have traversed each state-action pair, after accumulating 4 conservative-policy demonstrations and 2 projection-policy demonstrations.

    Among them, the reward function is designed as shown in Eq. (7) according to the trajectory efficiency definition in Eq. (1), to improve the trajectory efficiency in Stage (3) of the insertion assembly.

    Then, the mathematical expectation of the action-value function can be calculated accordingly. As the state-action space is small enough to represent the approximated action-value function as a table, the action-values that reflect the average trajectory efficiency of an action in each given state are listed in the form of a Q-table (Table 4). The optimal policy can be acquired exactly by comparison within the Q-table, and it outputs the optimal action in any state by Eq. (8). The policy improvement procedure can be calculated accordingly.
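    Consistent with line 9 of Algorithm 1, the policy improvement step of Eq. (8) is the greedy choice over the Q-table, where the action set at each state contains the two restricted options AR and AP:

```latex
\pi'(s) \;=\; \arg\max_{a \,\in\, \{A_R,\,A_P\}} Q(s,a)
```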

    RL methods can be classified into on-policy and off-policy methods according to whether the policy that generates the data is identical to the evaluated and improved policy. The off-policy method is used to guarantee sufficient exploration, and complex importance sampling is then necessary. As the policy initialization method can achieve coverage of the state-action space, the on-policy method can satisfy the robotic wrench insertion task. As a stochastic policy would generate unpredictable motions, the greedy policy is adopted in this paper to reduce the uncertainty caused by actions.

    The greedy policy, as a deterministic policy, still faces the security issue because the next state is unknown. Solutions should be designed according to the state-action space and the policy initialization method. Based on the protective policy above, two concepts are defined below:

    Generated policy: the final convergent policy of policy iteration. The generated policy should be capable of completing the assembly independently without the protective policy.

    Intermediate policy: a policy produced during the policy iteration process other than the initialization policy and the final generated policy.

    Different from the classical policy iteration method, when the intermediate policy in each iteration works in the real physical environment, the protective policy must take over if necessary. The intermediate policy cannot guarantee the security of the robotic manipulator and the manipulated object. In addition, it may not be able to complete the assembly trajectory on its own. The pseudo code of the policy iteration is listed in Algorithm 1.

    Therefore, the ability of a policy to complete the task without relying on the protective policy is a necessary condition for iterative convergence. If the output policy remains unchanged and does not rely on the protective policy, this policy can be regarded as the final generated policy.

    Algorithm 1
    1: Initialize π1 = initialization policy
    2: while π(s) has not converged do
    3:     Generate a demonstration using π(s); if danger occurs, add the protective policy
    4:     for each (s, a) pair appearing in the sequence do
    5:         G ← return of every visit to (s, a)
    6:         Append G to Returns(s, a)
    7:         Q(s, a) ← average(Returns(s, a))
    8:     end for
    9:     π(s) ← arg max_a Q(s, a)
    10: end while
    11: return π(s)
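    A minimal Python sketch of Algorithm 1 is given below. The run_demo interface, the convergence bookkeeping and all names are assumptions: run_demo(policy) stands for one supervised demonstration (with the human adding the protective policy when danger occurs) and returns, for every visited state-action pair, the return G observed from that visit onward.

```python
from collections import defaultdict

def policy_iteration(initialization_policy, run_demo, states, actions,
                     max_iterations=20):
    """Every-visit Monte Carlo policy iteration in the spirit of Algorithm 1."""
    returns = defaultdict(list)          # Returns(s, a)
    q = defaultdict(float)               # Q(s, a)
    policy = dict(initialization_policy)
    for _ in range(max_iterations):
        episode = run_demo(policy)       # generate a demo with pi(s) (+ protection)
        for (state, action), g in episode:
            returns[(state, action)].append(g)
            q[(state, action)] = (sum(returns[(state, action)])
                                  / len(returns[(state, action)]))
        improved = {s: max(actions, key=lambda a: q[(s, a)]) for s in states}
        if improved == policy:
            # Policy unchanged; the paper additionally requires it to complete
            # the task without the protective policy before declaring convergence.
            break
        policy = improved
    return policy
```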

    3. Experiment and results

    In this section, the experiment platform is introduced first to illustrate the environment for method verification. Then, the policy iteration initialization process and results are presented; the results of policy iteration initialization are utilized as the start point of the policy iteration process. Next, the protection-based policy iteration process is presented, and the results show that the assembly trajectory efficiency can be improved while safety is maintained. After policy iteration, the policy that can complete the task independently, called the generated policy, is compared with the conservative one according to the criterion of trajectory efficiency.

    3.1. Experiment platform for validating policy iteration method

    The validation experiment platform for the policy iteration method needs to support the policy iteration process. Not only the application background of the space robot but also the demonstration requirements should be considered for this platform.

    In the special space environment, the real-time performance and sufficiency of visual information cannot support visual servoing. The software and hardware of the space robot system cannot guarantee accurate modelling of the hexagonal contour edge.13 However, the approximate circular features acquired by cameras can help the space robot locate the initial contact position.

    Table 3 State-action table after 4 directive-policy demonstrations.

    Table 4 Action-values in specific states at the beginning of the policy iteration.

    Thus, the wrench assembly experiment platform for the ground verification experiment in this paper makes it possible for the wrench to contact the bolt and align with it in the position dimension (Fig. 7). Only the 6-dimensional force-torque sensor installed between the robotic manipulator and the end-effector is adopted to guide this active trajectory planning for assembly. In this paper, the JR3 6-axis force/torque sensor (product number: 67M25A3) is selected.

    In the space environment, visual servoing is achieved by the eye-in-hand camera and a visual marker. The relative pose between the target load and the visual marker is designed as a known variable. Within a 300 mm range, the position measurement accuracy of the camera can be better than 0.5 mm, and the orientation measurement accuracy can be better than 0.3°. In the ground verification experiment, the teleoperation method is used to guide the robotic manipulator to the initial position, as shown in Fig. 7. Considering factors such as the position and orientation deviation of the robotic manipulator, the initial conditions of the ground verification experiment should tolerate a larger initial position and orientation deviation between the robotic end-effector wrench tool and the hexagonal hole of the bolt.

    The initial pose deviations between the wrench tool and the hexagonal hole are set as follows: the orientation deviation is 2.5°, and the position deviation is 1 mm. The generated policy after training should ensure safety, and the trajectory efficiency can be improved under this condition.

    Due to the uncertain contact, it is dangerous to directly execute an arbitrary autonomous strategy when performing the assembly task. It is natural to protect the whole robotic system while acquiring a more efficient policy. Generally, dragging the robotic manipulator by the human demonstrator to acquire trajectories is a common robot demonstration method. In the situation where the robot end-effector is constrained by the manipulated object, an additional external force reflected in the force-torque sensor will be generated due to the contact between the human demonstrator and the robot manipulator. To avoid these issues, the method of noncontact demonstration with human supervision is adopted26 to protect the whole experiment of policy improvement and to provide an interface for sending robot end-effector motion commands with the mouse (Fig. 8(a)). With this demonstration method, not only can an arbitrary policy be applied through the software interface, but it also becomes possible for the human demonstrator to supervise the assembly process by eye. The real-time contact force detected by the force-torque sensor is also displayed on the monitor for reference.

    Fig.7 Robotic wrench insertion task on noncontact demonstration platform with human supervision from local and global views.

    In addition, when the assembly strategy is determined, the robot end-effector motion command can be input according to the real-time force signal on the monitor. The software interface provides the opportunity for the demonstrator to classify the interval of the force data. To ensure the purity and accuracy of the position information, only the position control method is adopted in this robotic demonstration. To simulate different environments, the adjustable-angle flat tongs and the rotation platform can be used to set 2-dimensional orientation deviations.

    After the robotic wrench tool contacts the hexagonal hole of the socket bolt, the demonstration experiment begins, as shown in Fig. 7. The software interface supports inputting an arbitrary policy by pressing the position or orientation command buttons (6 pairs) on the software interface in Fig. 8(a). After choosing the analysis plane according to the real-time force information, the motion command is input with the mouse. After the corresponding robot endpoint motion is executed, the position or orientation values on the monitor are updated. When a command button is pressed, both the force-torque data at that moment and the updated cumulative position-orientation values are recorded in the same row of a text file. The human demonstrator operates the buttons according to the chosen strategy. This demonstration software can be used to record the force-position data in the format of Eq. (4) until the assembly is completed.

    Fig.8 Core parts of demonstration software interface.

    As the wrench tool has entered the hole of the bolt to a certain depth, any position or orientation adjustment will cause uncertain contact. Thus, the relatively large motion amplitude above can accelerate the whole process significantly. Besides, the clearance between the wrench and the hole can tolerate this motion amplitude.

    In Stage (1), |Fx| and |Fy| are controlled to be less than 2 N, and Fz is controlled within the interval of [-5 N, -2 N] in Stage (2). In Stage (3), the assembly trajectory is judged to be finished when the maximum of |Mx| and |My| is smaller than 0.1 N·m.
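    As a small helper reflecting the thresholds just quoted (function names are assumptions):

```python
def stage3_finished(mx: float, my: float) -> bool:
    # Stage (3) termination criterion: the larger of |Mx| and |My| drops below 0.1 N*m.
    return max(abs(mx), abs(my)) < 0.1

def force_within_limits(fx: float, fy: float, fz: float, stage: int) -> bool:
    # Force limits quoted for the experiments: Stage (1) keeps |Fx| and |Fy|
    # below 2 N; Stage (2) keeps Fz within [-5 N, -2 N].
    if stage == 1:
        return abs(fx) < 2.0 and abs(fy) < 2.0
    if stage == 2:
        return -5.0 <= fz <= -2.0
    return True
```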

    In the entire training process, policy improvement is completed under the demonstration condition. Making full use of the existing demonstrations also reduces the number of samples that need to be collected by robotic experiments.

    3.2. Policy iteration initialization process and results

    According to the definition of policy initialization, the initialization demonstration experiments aim to fill the state-action table. As the conservative policy in the first step can theoretically only fill up half of the state-action table, demonstration experiments are conducted only 4 times. The angular position of the adjustable rotation platform is set to 210° or 300°, and the angular position of the adjustable-angle flat tongs is 2.5° in the policy initialization process. The detailed process of initialization policy acquisition is depicted in Fig. 9, where X indicates that the number in that position is non-zero.

    To fill the positions that would never be accessed by the conservative policy in the state-action table, the projection policy is adopted twice in the second step, together with the protective policy for security reasons. After classifying and accumulating the raw demonstration data according to the state-action definition, the state-action table is listed (Table 2). The state-action table cannot be totally filled because the demonstration trajectories cannot guarantee that every state is accessible.

    It can be seen in Table 2 that 4 of the 20 positions have not been accessed yet. Thus, in the third initialization step, the human demonstrator should focus on the states of these 4 state-action pairs and execute the directive policy in certain states. After 4 robotic wrench insertion demonstration experiments and data collection, the state-action table turns into Table 3.

    So far, each position in the state-action table has been filled, which can be seen as a sign of completion of the policy initialization process, and policy iteration can follow. Although vacancies of the table can still be seen in some individual demonstrations, those data still contribute to the calculation of the action-values of other state-action pairs.

    The policy initialization method proposed in this paper is suitable for policy iteration in the real physical environment of a robotic system. In the robotic assembly task, the initial state cannot be deliberately appointed, so the categories with only a small amount of data in the past data set should be fully used. When a policy outputs a specific action, the protective policy is prepared for protection. In the process of executing robotic tasks, the principal problem is not the small probability of selecting a certain action in a certain state, but the small probability of accessing some states. Therefore, the random policy or the epsilon-greedy policy is not as efficient as the directive policy.

    3.3. Policy iteration process and results

    When policy initialization finishes, the action-values corresponding to Table 3 and Eq. (6) can be calculated, as listed in Table 4, which is also named the Q-table. The bold characters highlight the relatively greater action-value for a given two-dimensional state, and the parentheses show the amount of data used for the calculation.

    Fig.9 Process of acquiring initialization policy.

    To make a clear comparison of the action-values of different actions, the Q-table is depicted in Fig. 11(a) to show the better action in each given 2-dimensional state. Then, the concrete policy iteration process is depicted in Fig. 10 to acquire the generated policy.

    Similarly, the robotic manipulator and the manipulated object still face security risks under the intermediate policy during policy iteration. Thus, the protective policy is required to cooperate with the intermediate policy demonstration. After adopting the policy acquired from Fig. 11(a), the action-values are updated, as shown in Fig. 11(b). The angular position of the adjustable rotation platform is set to 190° in the policy iteration process.

    Policy iteration does not converge at this point. Therefore, the robotic assembly demonstration experiment continues with the latest intermediate policy together with the protective policy. After accumulating the latest demonstration data, the updated action-values are depicted in Fig. 11(c).

    Although the new intermediate policy given by Fig. 11(c) stays the same as the previous policy, this intermediate policy cannot complete the task without the aid of the protective policy, so policy iteration needs to continue. However, the amount of data for calculating the action-values in the Q-table increases, which benefits future calculation. After executing the intermediate policy corresponding to Fig. 11(c) in the following demonstration experiment (with the protective policy), the data are obtained, and the action-values are updated as shown in Fig. 11(d).

    Fig.10 Policy iteration process for acquiring final generated policy with initialization policy as the start point.

    The protective policy is no longer necessary when adopting the policy produced by the latest action-values shown in Fig. 11(d) in the following demonstration experiment. After processing the latest demonstration data and adding them to the existing data set, the action-values are updated, as shown in Fig. 11(e).

    The action-values from Fig. 11(e) generate the same policy as that from Fig. 11(d). To judge whether the policy has converged, the latest policy is applied again. The updated action-values are shown in Fig. 11(f), and the corresponding policy can be judged to be the convergent policy, called the generated policy. This generated policy is expressed explicitly in Table 5.

    As the intermediate policies during the iteration process require the assistance of the protective policy, these policies are not comparable. In the following part, the generated policy is compared with the conservative policy in terms of trajectory efficiency through additional experiments.

    Fig.11 Action-values of all state-action pairs for 6 calculations in policy iteration.

    Compared with direct policy training under unsupervised conditions, the protective policy with human supervision can protect the robot and the manipulated object. It is also a novel idea to introduce the protective policy to deal with the danger caused by the unstable intermediate policies during policy iteration.

    3.4. Comparison between generated policy and conservative policy

    To improve trajectory efficiency, policy iteration is utilized, and a policy initialization method is proposed. The trajectory efficiencies of the two policies are compared in the demonstration mode. The angular position of the adjustable rotation platform is set to 100° in the policy comparison process, and the angular position of the adjustable-angle flat tongs is 2°. To illustrate the influence of the manufacturing tolerances of the hexagonal bolts and wrench, two types of M8 bolts are selected in the experiment; the distances between the opposite sides of the hexagonal holes of these two bolts are 6.07 mm and 6.03 mm, respectively. Besides, the distance between the opposite sides of the hexagonal wrench in this experiment is 5.98 mm, resulting in two assembly tolerances, 0.09 mm and 0.05 mm, respectively.
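    The two tolerances follow directly from the across-flats dimensions:

```latex
6.07\ \mathrm{mm} - 5.98\ \mathrm{mm} = 0.09\ \mathrm{mm}, \qquad
6.03\ \mathrm{mm} - 5.98\ \mathrm{mm} = 0.05\ \mathrm{mm}
```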

    Under this initial condition, the main orientation deviation is in the Rx direction (compared with Ry). Both the generated policy and the conservative policy are used to execute demonstration experiments 10 times each, and the trajectory efficiency of each experiment is calculated by Eq. (1). For the assembly tolerance of 0.09 mm, the average trajectory efficiency of the generated policy is 58.56%, while it is 55.25% for the conservative one. For the assembly tolerance of 0.05 mm, the average trajectory efficiency of the generated policy is 82.00%, while it is 62.91% for the conservative one. The results are compared and shown in Fig. 12.

    During the demonstration process, it can be found in the experiments with both types of bolts that the first adjustment of the orientation direction has a great impact on the trajectory efficiency. The trajectory efficiency is further classified and compared according to the direction of the first orientation adjustment, differentiated by Rx and Ry, in Table 6 (for the assembly tolerance of 0.09 mm) and Table 7 (for the assembly tolerance of 0.05 mm).

    According to the policy tables (Table 1 and Table 5), these two policies only differ in state [SF3, SL1]. If one of the policies generates a trajectory without going through this state, the corresponding demonstration trajectory data are obviously invalid for comparison.

    Regardless of the direction of the first orientation adjustment, the trajectory efficiency of the generated policy is always higher than that of the conservative policy. Choosing the direction with the larger orientation deviation first leads to higher trajectory efficiency. This is because the orientation deviation is the dominant deviation relative to the position deviation during the assembly process. Every time the orientation is adjusted, the position deviation will inevitably change. Conversely, the position adjustment does not change the orientation deviation. Therefore, if the first adjustment is towards the direction with the greater orientation deviation, it will result in higher trajectory efficiency.

    Table 5 Expressions of generated policy.

    Fig.12 Trajectory efficiency results for different assembly tolerance.

    Table 6 Trajectory efficiency classified and compared according to the first orientation adjustment (assembly tolerance: 0.09 mm).

    Table 7 Trajectory efficiency classified and compared according to the first orientation adjustment (assembly tolerance: 0.05 mm).

    However, it is difficult to judge the principal orientation deviation direction under the current conditions. The first orientation adjustment decision depends on the initial position deviation, which in turn depends on the uncertain initial contact state. It is hard to guarantee the optimal first orientation adjustment with the current policy, but the policy initialization and the iteration method surely improve the follow-up trajectory efficiency.

    Besides, the comparison experiment shows the influence of the manufacturing tolerance. For the bolt assembly with the smaller tolerance, the trajectory efficiency is higher for both policies. In our view, a small assembly tolerance restricts the position adjustment space. When the horizontal contact force falls in the region where the orientation of the robotic end-effector is being adjusted, a small assembly tolerance leads to more accurate position alignment. Thus, higher trajectory efficiency can be realized with a small assembly tolerance between the hexagonal bolt hole and the wrench.

    4. Conclusion

    To improve the assembly efficiency of robots, the efficiency improvement of trajectory planning in wrench insertion is formulated as an objective optimization problem based on RL. To achieve this objective optimization through RL, a novel policy iteration method is proposed. For the start point of policy iteration, a policy iteration initialization method based on the vacancies of the state-action table is designed and applied. Compared with the initialization mode dependent on exploring starts and the mild policy, this initialization method can improve demonstration efficiency by finishing the state-action traverse within a limited number of demonstrations. In addition, policy iteration based on the protective policy can output the generated policy and avoid unpredictable risks.

    In robotic experiments, the conservative policy and the generated policy are compared separately under the condition of noncontact robot demonstration with human supervision. Over 10 experiments each, the average trajectory efficiencies of the generated policy are 58.56% for the 0.09 mm assembly tolerance and 82.00% for the 0.05 mm assembly tolerance, which are higher than those of the conservative policy (55.25% for the 0.09 mm assembly tolerance and 62.91% for the 0.05 mm assembly tolerance). Although the trajectory efficiency is affected by the first orientation decision, the generated policy shows significant improvement in trajectory efficiency compared with the conservative one for all assembly tolerances in this paper.

    In the future, the state-action space could be refined in the targeted region so as to further improve trajectory efficiency, and the influence of the first orientation adjustment decision should be eliminated. In addition, this method guarantees the feasibility of the robotic manipulator inserting the wrench into the bolt and makes it possible to replace astronauts in extravehicular activities.

    Declaration of Competing Interest

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    Acknowledgements

    This study was supported by the National Natural Science Foundation of China (No. 91848202) and the Special Foundation (Pre-Station) of China Postdoctoral Science (No. 2021TQ0089).
