
    Collaborative Pushing and Grasping of Tightly Stacked Objects via Deep Reinforcement Learning

2022-10-26 12:24:00 | Yuxiang Yang, Zhihao Ni, Mingyu Gao, Jing Zhang, and Dacheng Tao
IEEE/CAA Journal of Automatica Sinica, 2022, Issue 1

Yuxiang Yang, Zhihao Ni, Mingyu Gao, Jing Zhang, and Dacheng Tao

Abstract—Directly grasping tightly stacked objects may cause collisions and result in failures, degenerating the functionality of robotic arms. Inspired by the observation that first pushing objects to a state of mutual separation and then grasping them individually can effectively increase the success rate, we devise a novel deep Q-learning framework to achieve collaborative pushing and grasping. Specifically, an efficient non-maximum suppression policy (PolicyNMS) is proposed to dynamically evaluate pushing and grasping actions by enforcing a suppression constraint on unreasonable actions. Moreover, a novel data-driven pushing reward network called PR-Net is designed to effectively assess the degree of separation or aggregation between objects. To benchmark the proposed method, we establish a common household items dataset (CHID) in both simulation and real scenarios. Although trained using simulation data only, experimental results validate that our method generalizes well to real scenarios and achieves a 97% grasp success rate with fast object separation in the real-world environment.

    I. INTRODUCTION

GRASPING is one of the most fundamental problems in robotics [1], [2], with important applications in many scenarios, such as sorting robots, service robots, and human-robot interaction. It has attracted increasing attention in recent years; however, it remains challenging for a robot arm to grasp tightly stacked objects automatically.

Traditional grasping methods are usually applied in a controlled environment with a known object model [3], which limits their adaptability to different objects and scenarios. Recently, researchers have applied deep learning and reinforcement learning to robotic tasks to improve the grasping success rate in various scenarios with different targets. For example, in [4]–[6], deep neural networks were used to predict the grasp point, angle, and jaw width from the input image. In [7]–[9], deep learning and reinforcement learning were combined for robotic grasping, mapping the RGB-D image to a specific action strategy and performing unsupervised learning guided by a reward function. Although these methods can achieve grasping at a reasonable success rate, they struggle in handling tightly stacked objects since it is hard to find a suitable grasp point on an object and grasp it without causing collisions [10]. Therefore, how to design effective strategies to grasp tightly stacked objects remains challenging.

In practice, first pushing tightly stacked objects to a state of mutual separation can facilitate the subsequent grasping phase and significantly increase the success rate [11]. Therefore, how to model both tasks in a unified multi-task framework to enable collaborative pushing and grasping is a promising direction to solve the problem. Recently, some collaborative pushing and grasping methods [12]–[16] based on deep reinforcement learning have been proposed. Zeng et al. [12] proposed a deep Q-learning framework to tackle this task. However, its reward function only accounts for whether there should be a push action without evaluating the consequence of the push action, which affects the effectiveness of the pushing strategy. The pushing reward functions in [13]–[15] were defined using the image difference before and after the pushing action, while the validity of the pushing action was still not evaluated. Yang et al. [16] evaluated the pushing effect using the maximum Q value of the local area around the push point before and after the pushing action. Since the evaluation only accounted for the consequence of pushing in a local area, it may result in predicting ineffective actions that achieve no gains from a global perspective, e.g., separating a small group of objects while some of them may move closer to the remaining objects. Indeed, how to design a pushing reward function that comprehensively evaluates the consequence of a pushing action remains under-explored.

Besides, these methods [12]–[16] mainly used toy blocks as representative objects in the experiments, which have simple colors and shapes and lack diversity and generality. Using simple objects during training may lead to a poor generalization capability when transferring to new scenarios, e.g., from the simulation environment to the real environment and from specific objects to unknown objects. Therefore, it is also very important to construct an object dataset containing objects of various shapes and colors to improve the generalization capability of the trained model.

To address these issues, we propose a novel collaborative pushing and grasping method based on deep Q-learning with an efficient non-maximum suppression policy (PolicyNMS), which helps suppress unreasonable actions. Moreover, a novel pushing reward function based on convolutional neural networks, called PR-Net, is devised, which can comprehensively assess the degree of aggregation or separation between objects for each candidate pushing action from a global perspective, thereby helping the model predict more effective pushing actions. Furthermore, we establish a dataset named CHID (common household items dataset) containing common household items in various colors and shapes and construct training scenarios from easy to difficult following the curriculum learning idea, which is beneficial for enhancing the generalization capability of the collaborative pushing and grasping model. Experiments show that our method can efficiently accomplish the grasping task of tightly stacked objects via collaborative pushing and grasping and generalizes well from simulation to real application and from specific objects to unknown objects, as illustrated in Fig. 1. The proposed method has a wide range of applications such as industrial parts sorting and household clutter sorting. The contributions of this study can be summarized as follows:

    Fig. 1. Illustration of the proposed method for collaborative pushing and grasping tightly stacked objects.

    1) A novel model-free deep Q-learning method is proposed for grasping tightly stacked objects via collaborative pushing and grasping, where an efficient PolicyNMS is devised to suppress unreasonable actions.

    2) A novel pushing reward function called PR-Net is devised to predict the global reward for each candidate pushing action by comprehensively assessing the degree of aggregation or separation between objects.

    3) A common household item dataset with curriculum training scenarios from easy to difficult is established to train and evaluate the model. Experimental results demonstrate the generalization capability of our model.

    The remainder of the paper is organized as follows. Section II reviews related work. In Section III, we present the details of the proposed method, including the PolicyNMS, the reward functions, and the proposed CHID dataset. The experimental results and analysis are presented in Section IV. Finally, we conclude the paper in Section V.

    II. RELATED WORK

    A. Grasping Methods

Grasping is one of the most fundamental and interesting problems in robotics research. Recently, data-driven robotic grasping methods have achieved a lot of progress. 6D pose estimation methods [17], [18] were proposed to achieve precise positioning of objects for grasping. Kehl et al. [17] extended the popular object detection network SSD (single shot multibox detector) [19] for 6D pose estimation and achieved good results from a single RGB image. Wang et al. [18] proposed a DenseFusion network to extract RGB and depth features separately and fuse them to estimate a precise 6D pose. However, these methods all need the 3D model information of the target objects, which is difficult to acquire in many practical applications.

Different from them, deep neural networks [4]–[6] were used to directly predict the grasp point, angle, and jaw width from the image, which generalize well to unknown objects and accomplish the grasping task. Mahler et al. [4] proposed a grasp quality convolutional neural network that predicts grasp locations from synthetic point cloud data. Kumra et al. [6] proposed a generative residual convolutional neural network that uses n-channel input data to generate images that can be used to infer grasp rectangles for each pixel. Although good grasping performance has been achieved, these supervised learning methods [4]–[6] are limited to a single grasp strategy and unable to coordinate different strategies throughout the task.

Reinforcement learning, which uses long-term future rewards, can help the agent learn a more robust and comprehensive policy. In [8], [9], [20]–[22], deep reinforcement learning methods were proposed to model the grasping task and use reward functions to guide the grasp strategy for accomplishing the task, achieving good generalization performance. However, for densely stacked objects, grasping them directly will cause collisions between objects as well as collisions between the gripper and objects, resulting in failures. Different from these methods, we propose a collaborative pushing and grasping method based on reinforcement learning, which first pushes the tightly stacked objects to separate them from each other and then grasps each object sequentially. In this way, our method can significantly improve the success rate.

    B. Pushing Methods

Pushing is another fundamental task in robotics research [23]. Pushing to separate tightly stacked objects can help improve the success rate of grasping. Actually, separating objects in close proximity is the prerequisite for many other subsequent operations [24], such as object classification, object arrangement, and object stacking. Deep learning methods are widely applied to robotic pushing problems [25], [26]. Katz et al. [25] presented an interactive segmentation algorithm to push cluttered objects. Similarly, Eitel et al. [26] also applied an object segmentation algorithm and then generated a series of push action sets based on the segmentation results. However, the segmentation results may be incorrect, especially for unknown objects, which greatly affects the robustness of these segmentation-based methods.

Reinforcement learning based pushing has also attracted increasing attention in recent years [27], [28]. Andrychowicz et al. [27] proposed the hindsight experience replay method to train the policy for robotic tasks like pushing from sparse and binary rewards. But their environments were pretty simple, where only one object needed to be pushed in the workspace. Kiatos et al. [28] designed a pushing method based on reinforcement learning to separate a target object from a cluttered environment. However, it is designed to separate a single specific target rather than all generic objects in a complex environment. Different from these methods, we focus on obtaining a suitable push sequence to separate all the objects in dense clutter, which is essential to improve the success rate of subsequent grasping.

    C. Multi-task Learning

Recently, researchers have focused on multi-task learning of collaborative pushing and grasping [12]–[16] based on deep reinforcement learning. In [12], a deep Q-learning framework was proposed to address this task. However, the pushing reward function in [12] only assessed whether the object was pushed, rather than evaluating the effectiveness of the push action. Hence, this method [12] may result in pushing all the objects in a certain direction. The pushing reward functions in [13]–[15] were defined using the scene image difference before and after the pushing action, while the effectiveness of the pushing action was still not assessed. The reward function in [16] evaluated the pushing consequence by comparing the Q value around the push point before and after the pushing action. However, only evaluating the pushing effectiveness in a local area may result in a non-optimal pushing action from a global perspective, e.g., separating a small group of objects while some of them may move closer to the remaining objects. Besides, these methods mainly used toy blocks during training and testing, which have simple colors and shapes and lack diversity and generality. Using simple training objects may lead to a poor generalization capability when transferring to new scenarios, e.g., from the simulation environment to the real environment and from specific objects to unknown objects.

By contrast, we establish a CHID dataset containing common household items in various shapes and colors, which can be used to improve the generalization capability of the trained model. In addition, following the multi-task learning idea, we design a novel collaborative pushing and grasping method based on deep Q-learning, where an efficient non-maximum suppression policy is designed to suppress unreasonable actions. Furthermore, we propose a new data-driven pushing reward network that can comprehensively assess the degree of separation and aggregation between objects from a global view rather than the local neighborhood based assessment in the previous method [16].

    III. THE PROPOSED METHOD

Pushing and grasping objects using a robotic arm can be expressed as a Markov decision process (MDP) [29], [30]. An MDP is commonly represented by a 4-tuple (S, A, P, R), where S denotes the state space, A denotes the action space, P denotes the transition probability, and R denotes the reward function. Value-based reinforcement learning (RL) methods can effectively deal with the MDP problem. Among them, deep Q-learning (DQN) methods [31]–[33] aim to obtain an end-to-end mapping function Q(S, A; θ) from the state space S to the action space A by learning the network parameters θ, and have demonstrated good performance and great potential in the field of robotics. In this paper, a novel collaborative pushing and grasping framework based on DQN is proposed for automatically pushing and grasping tightly stacked objects. As shown in Fig. 2, the proposed framework consists of a pushing network (Action-PNet) and a grasping network (Action-GNet), which follows the idea of first separating the cluttered objects by pushing and then grasping them one-by-one.
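For reference, the temporal-difference target used by standard DQN with a target network (the exact formulation adopted in this paper may differ in details) is

y_t = r_t + γ max_{a'} Q(s_{t+1}, a'; θ⁻),

where θ⁻ denotes the parameters of the periodically updated target network, γ is the future discount factor, and the online parameters θ are fitted to y_t by minimizing the squared error.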

    A. Collaborative Pushing and Grasping Network

The action space includes four components: the action type φ ∈ {push, grasp}, the location (x, y, z), the rotation angle Θ, and the push length L. During pushing, we set ΔΘ = 22.5° as the interval of pushing directions over a range of 360°, i.e., a total of 16 pushing directions. During grasping, we set ΔΘ = 11.25° as the interval of grasping directions over a range of 180°, i.e., a total of 16 grasping directions.

At time t, the state s_t is obtained from the RGB-D images. Specifically, we map the color and depth images to the robotic arm coordinate system and obtain the color-state-map and the depth-state-map. As shown in Fig. 2, our Action-PNet and Action-GNet are built upon the 121-layer DenseNet [34] pretrained on ImageNet [35] to extract features from the color-state-map and the depth-state-map. After feature concatenation, two identical blocks with batch normalization (BN) [36], rectified linear unit (ReLU) [37], and 1×1 convolution are used in Action-PNet and Action-GNet for further feature embedding. Then, a bilinear interpolation layer is used to obtain the pixel-wise state-action prediction value Q(s_t, a; θ). Note that the pushing process switches to the grasping process according to the separation degree of objects, i.e., the pushing state-action prediction value will decrease to a low level when the objects are already separated from each other.
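To make the data flow concrete, the following is a minimal PyTorch-style sketch of the Action-PNet/Action-GNet forward pass described above. It assumes torchvision's DenseNet-121, that the depth-state-map is replicated to three channels, and illustrative channel widths in the embedding head; these are assumptions, not the authors' exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class ActionNet(nn.Module):
    """Shared skeleton for Action-PNet / Action-GNet: two DenseNet-121 trunks,
    feature concatenation, two BN+ReLU+1x1-conv blocks, bilinear upsampling."""
    def __init__(self):
        super().__init__()
        weights = models.DenseNet121_Weights.IMAGENET1K_V1
        # One trunk for the color-state-map, one for the (3-channel) depth-state-map.
        self.color_trunk = models.densenet121(weights=weights).features   # -> 1024 channels
        self.depth_trunk = models.densenet121(weights=weights).features   # -> 1024 channels
        self.head = nn.Sequential(                                        # two embedding blocks
            nn.BatchNorm2d(2048), nn.ReLU(inplace=True), nn.Conv2d(2048, 64, kernel_size=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True), nn.Conv2d(64, 1, kernel_size=1),
        )

    def forward(self, color_map, depth_map):
        feat = torch.cat([self.color_trunk(color_map), self.depth_trunk(depth_map)], dim=1)
        q = self.head(feat)
        # Bilinear interpolation recovers a pixel-wise Q map at the input resolution.
        return F.interpolate(q, size=color_map.shape[-2:], mode="bilinear", align_corners=False)
```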

Moreover, efficient prior constraints are devised to reduce the complexity of the action space and accelerate the training process. As shown in Fig. 2, we present the dynamic action mask M(s_t, φ) to optimize the action strategy

    Fig. 2. Illustration of the proposed collaborative pushing and grasping method based on deep reinforcement learning.

where M(s_t, φ) is obtained from the object contours for pushing actions and from the centers of the object contours for grasping actions.
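The masking step can be written compactly as below; since the original equation did not survive extraction, this element-wise form is an assumption that is merely consistent with the surrounding description:

Q̃(s_t, a; θ) = Q(s_t, a; θ) ⊙ M(s_t, φ),

where ⊙ denotes element-wise multiplication over the pixel-wise prediction map.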

    B. Non-maximum Suppression Policy (PolicyNMS)

Non-maximum suppression (NMS) algorithms [38], [39] are widely applied to deal with highly redundant candidate boxes in object detection tasks. Inspired by these NMS algorithms, we propose an efficient PolicyNMS to suppress unreasonable actions. Specifically, we construct redundant boxes around each candidate action and calculate the confidences (i.e., the object percentages) of the redundant boxes to evaluate the reasonableness of an action, as shown in Fig. 3.

    Fig. 3. Illustration of our PolicyNMS.

PolicyNMS aims to use a constraint π_NMS(s_t) to suppress unreasonable actions and help obtain the final action as

According to (1), we can obtain the action locations (x, y, z) and the corresponding state-action predictions in the 16 action directions, respectively. As shown in Fig. 3, different shifts from the original action location are implemented along each action direction to obtain the boxes, where k ∈ [1, K] denotes the different shifts in each direction and d ∈ [0, 15] denotes the 16 directions. For pushing and grasping, boxes of different shapes are designed, as shown in Figs. 3(a) and (b). The probability of each box is defined as the percentage of objects inside it. Then, the probabilities corresponding to different shifts in the same direction are averaged to get the action probability P_d

During pushing, a larger P_d represents a higher possibility of successfully pushing the object. For grasping, a smaller P_d means a larger grasping space for the gripper and a lower possibility of collision. Therefore, for the 16 action directions, we can obtain the constraint on unreasonable actions π_NMS(s_t)

where π_NMS(s_t) is a 16-dimensional vector.

    By using such a constraint for unreasonable action suppression, our method can significantly improve the convergence speed and predict more reasonable actions.
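The Python sketch below illustrates the PolicyNMS computation under the assumptions stated in its comments; the box geometry helper, the threshold tau, and K are illustrative placeholders rather than the paper's exact choices.

```python
import numpy as np

def policy_nms(object_mask, box_fn, K=3, tau=0.5, is_push=True):
    """Return a 16-dim suppression vector pi_nms: 1 keeps a direction, 0 suppresses it.

    object_mask: binary HxW map of object pixels; box_fn(d, k) returns the (rows, cols)
    index arrays of the k-th shifted box along direction d (a hypothetical helper)."""
    pi_nms = np.zeros(16)
    for d in range(16):
        # Average the object percentage over the K shifted boxes in direction d -> P_d.
        p_d = np.mean([object_mask[box_fn(d, k)].mean() for k in range(1, K + 1)])
        if is_push:
            pi_nms[d] = 1.0 if p_d >= tau else 0.0   # pushing wants objects inside the box
        else:
            pi_nms[d] = 1.0 if p_d <= tau else 0.0   # grasping wants free space for the gripper
    return pi_nms
```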

    C. Rewards for Pushing and Grasping

To better evaluate the quality of the action strategies, novel rewards for pushing and grasping are designed in this paper. As shown in Fig. 4, a convolutional neural network based pushing reward, called PR-Net, is designed to evaluate the separation or aggregation trend after pushing.

    Fig. 4. The proposed PR-Net architecture.

Firstly, two sequential depth-state-maps I_push_before and I_push_after of size 224×224×3 are fed into two branches, respectively. In each branch, the VGG-16 (visual geometry group 16-layer) network is used as the backbone and outputs 7×7×512 feature maps. Then, the feature maps from both branches are concatenated to obtain the 7×7×1024 fused feature maps I_fusion, which are fed into a convolution layer with a kernel of size 1×1, followed by a BN layer and a ReLU layer, i.e., I_cov1 = σ(BN(ω_(512,(1,1)) ∗ I_fusion))

where I_cov1 ∈ R^(512×7×7) denotes the output feature maps, ω_(512,(1,1)) denotes the learnable parameters, BN(·) denotes the batch normalization layer, and σ(·) denotes the ReLU activation function.

Then, we obtain I_cov2 ∈ R^(512×5×5) by feeding the output feature maps into another convolution layer with a kernel of size 3×3×512, followed by a BN layer and a ReLU layer, i.e., I_cov2 = σ(BN(ω_(512,(3,3)) ∗ I_cov1))

The feature maps I_cov2 are flattened and fed into three fully connected layers. We use dropout to avoid overfitting and a ReLU layer as the activation function after the first two layers. The last fully connected layer is fed into a softmax layer to predict a probability vector for a binary classification task, i.e., whether or not the objects are separated further after a pushing action.
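A minimal PyTorch sketch of this PR-Net architecture follows; the fully connected layer widths and dropout rate are assumptions for illustration, while the backbone and convolution shapes follow the description above.

```python
import torch
import torch.nn as nn
from torchvision import models

class PRNet(nn.Module):
    """Two-branch VGG-16 reward network: depth-state-maps before/after a push in,
    2-way (aggregation vs. separation) logits out."""
    def __init__(self):
        super().__init__()
        weights = models.VGG16_Weights.IMAGENET1K_V1
        self.branch_before = models.vgg16(weights=weights).features   # 224x224x3 -> 512x7x7
        self.branch_after = models.vgg16(weights=weights).features
        self.conv1 = nn.Sequential(nn.Conv2d(1024, 512, 1), nn.BatchNorm2d(512), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(512, 512, 3), nn.BatchNorm2d(512), nn.ReLU(inplace=True))
        self.classifier = nn.Sequential(                               # three fully connected layers
            nn.Linear(512 * 5 * 5, 1024), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(1024, 256), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(256, 2),   # softmax is applied inside nn.CrossEntropyLoss during training
        )

    def forward(self, depth_before, depth_after):
        fused = torch.cat([self.branch_before(depth_before), self.branch_after(depth_after)], dim=1)
        x = self.conv2(self.conv1(fused))             # 1024x7x7 -> 512x7x7 -> 512x5x5
        return self.classifier(torch.flatten(x, 1))
```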

    We use the cross-entropy loss to train the PR-Net
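The loss takes the standard cross-entropy form implied by the definitions below; since the typeset equation was lost in extraction, this reconstruction is offered as an assumption:

L_p(θ_PR-Net) = -(1/N) Σ_{i=1}^{N} Σ_{j=1}^{C} p_ij log q_ij,

where q_ij denotes the softmax probability that the PR-Net assigns to class j for the ith image pair.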

where the ith input image pair of the PR-Net consists of the depth-state-maps before and after the pushing action, θ_PR-Net represents the learnable parameters of the PR-Net mapping function, p_ij represents the one-hot encoding vector of the ground-truth label of the ith sample, C represents the number of classes, N represents the total number of samples in the training set, and L_p(·) represents the loss function of the PR-Net.

Finally, a pushing reward r_p can be derived from the output of the PR-Net, which is defined as

where output = 0 means that the push action aggregates the objects, and output = 1 means that the push action separates the objects.

PR-Net can efficiently predict the global reward for each candidate pushing action by assessing the degree of aggregation or separation from a full view of the scene.

An efficient grasping reward function r_g is also designed

where G denotes the grasp result, i.e., 0 for a failed grasp and 1.5 for a successful one, ΔΘ denotes the angle constraint indicating the absolute difference between the rotation angle E_Θ of the gripper and the angle O_Θ of the object, and λ is a hyperparameter, which is set to 0.02. The angle constraint ΔΘ can help obtain a more precise grasp policy, which will be discussed in Section IV.
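Given these definitions, one plausible reconstruction of the grasping reward (the original equation was lost in extraction, so this exact form is an assumption) is

r_g = G − λ |E_Θ − O_Θ| = G − λ ΔΘ,

i.e., a successful grasp is rewarded and the reward is reduced in proportion to the angular mismatch between the gripper and the object.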

    D. The Common Household Item Dataset (CHID)

Different from [12]–[16], [40], we use common household items as the targets in our pushing and grasping task. To this end, we establish a common household item dataset (CHID), which contains many different household items in various shapes, colors, textures, and sizes, i.e., a better collection of various generic objects in the household scenario. Specifically, we select the household object meshes from the Freiburg spatial relations dataset [41] and 3D Warehouse1 (https://3dwarehouse.sketchup.com). We also set the physical properties for these objects so that the dataset can simulate physical collision, friction, and other phenomena in the real world. The simulation items in the training set and testing set are shown in Figs. 5(a) and (b), each including 15 kinds of objects. Note that the objects in the testing set are disjoint from those in the training set. The real-world items for testing are presented in Fig. 5(c), which are also disjoint from the simulation items in the training set.

    Fig. 5. Some items from the CHID dataset. (a) Simulation items in the training set; (b) Simulation items in the testing set; (c) Real-world items for testing.

Then, we randomly select n ∈ [3, 6] objects from the training set to build training scenarios and randomly select n ∈ [3, 8] objects from the testing set to build testing scenarios. To learn an effective pushing strategy to separate objects, we set two difficulty levels during training by following the idea of curriculum learning. As shown in Fig. 6(a), there are clear gaps between objects in the easy scenarios, which are used for the initial stage of training. As shown in Fig. 6(b), objects in the difficult scenarios are packed tightly, which are used for the later stage of training. Training from easy to difficult is beneficial for speeding up convergence and learning a robust pushing strategy. As shown in Fig. 6(c), the real-world testing scenarios are very different from the simulation ones and are used to evaluate the generalization capability of the proposed method.

    Fig. 6. Training and testing scenarios. (a) Easy scenarios for the initial stage of training; (b) Difficult scenarios for later stage of training; (c) The real-world testing scenarios.

Fig. 7. Some training and testing pairs for our PR-Net: (a) Depth-state-maps before a pushing action; (b) Depth-state-maps after the action. The bottom labels denote aggregation (0) or separation (1). The white arrows indicate the pushing directions.

Finally, we establish the training and testing sets for our PR-Net as shown in Fig. 7, including 31 628 training pairs and 7907 testing pairs. Each pair contains the depth-state-map before a pushing action and the depth-state-map after the pushing action, as well as the ground-truth label, i.e., 0 means aggregation while 1 means separation.

    IV. EXPERIMENTAL RESULTS

In the experiments, we evaluated the proposed method in both simulation and real-world environments. First, we compared our method with the non-RL pushing method and the grasping-only method to verify the performance of the proposed method. Then, we performed ablation studies to validate the effectiveness of the proposed PolicyNMS and PR-Net. Finally, we demonstrated the generalization capability of the proposed method from the simulation testing scenarios to real-world scenarios.

    A. Implementation Details

We built a simulation environment in Gazebo, including a UR10 robotic arm equipped with a Robotiq 85 gripper and a Kinect RGB-D camera fixed on the table. We trained our RL method using stochastic gradient descent (SGD) with a fixed learning rate of 0.0001, momentum of 0.95, and weight decay of 2E–5 on an Ubuntu 16.04 server with two NVIDIA GTX 1080Ti GPUs. We applied DQN [33] with prioritized experience replay [42] to train our Action-PNet and Action-GNet for 20 000 steps and 8000 steps, respectively. Each step took about 15 s, and we updated the parameters of the target network every 200 steps. ε-greedy [31] was used as the action selection policy, where ε was initialized to 0.4 and then annealed to 0.1 during training. The future discount factor γ was set to 0.5. For PR-Net, we used the VGG-16 network pretrained on ImageNet [35] as the backbone and trained it for 60 epochs using SGD with a fixed learning rate of 0.0001, momentum of 0.95, and weight decay of 2E–5. The batch size was set to 32. Horizontal and vertical flipping was used for data augmentation during training.
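As an illustration of the exploration schedule, the sketch below implements ε-greedy action selection with an anneal from 0.4 to 0.1; the linear form of the anneal and the helper signature are assumptions, only the endpoint values come from the text.

```python
import numpy as np

def select_action(q_map, step, total_steps=20000, eps_start=0.4, eps_end=0.1, rng=np.random):
    """Pick an action index from a pixel-wise Q map with linearly annealed epsilon-greedy."""
    eps = max(eps_end, eps_start - (eps_start - eps_end) * step / total_steps)
    if rng.random() < eps:
        # Explore: sample a random index from the Q map.
        return tuple(rng.randint(0, s) for s in q_map.shape)
    # Exploit: take the action with the highest predicted Q value.
    return np.unravel_index(np.argmax(q_map), q_map.shape)
```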

    B. Dataset and Evaluation Metrics

We adopted the CHID dataset described in Section III-D as the benchmark. Specifically, we used the training set including the easy scenarios and difficult scenarios to train the proposed model and tested it on the simulation testing scenarios as well as the real-world scenarios. For each number of objects (n ∈ [3, 8]), we conducted 25 tests. The performance of different methods was evaluated in terms of the following metrics: 1) success rate of separation (pushing times ≤ 2×n), where n represents the number of objects to be separated; if the pushing times in one test exceed 2×n, the test is regarded as a failure; 2) pushing efficiency, i.e., the mean and standard deviation of the pushing times over the 25 tests at different settings (n ∈ [3, 8]); the smaller this mean value, the more effective the method, and a smaller standard deviation indicates a more robust pushing strategy; and 3) success rate of grasping, which is defined as the average ratio between the number of objects and the total grasping times over the 25 tests at different settings.
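For clarity, the sketch below computes these three metrics from hypothetical per-test logs; the field names are illustrative, not from the paper.

```python
import numpy as np

def evaluate(tests):
    """tests: list of dicts such as {"n": 5, "push_times": 7, "grasp_attempts": 6}."""
    separated = [t["push_times"] <= 2 * t["n"] for t in tests]                 # metric 1
    push_times = np.array([t["push_times"] for t in tests], dtype=float)       # metric 2
    grasp_rate = np.mean([t["n"] / t["grasp_attempts"] for t in tests])        # metric 3
    return {
        "separation_success_rate": float(np.mean(separated)),
        "pushing_mean": float(push_times.mean()),
        "pushing_std": float(push_times.std()),
        "grasp_success_rate": float(grasp_rate),
    }
```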

    C. Pushing and Grasping Results in Simulation Scenarios

First, we compared our RL-based pushing method with a supervised learning method (named non-RL pushing), which has the same structure as Action-PNet but is trained in a supervised manner, where the binary classification labels are predicted by PR-Net. For each number of objects (n ∈ [3, 8]), we conducted 25 scenarios, i.e., we randomly selected the corresponding number of objects from the testing set and tightly stacked them together for each scenario. The success rate of separation and pushing efficiency metrics are reported in Table I. Compared with the non-RL pushing method, the performance of our method is much better. As the number of objects increases, the difficulty of the pushing task becomes higher, and the advantage of our method becomes more and more obvious. Besides, the fewer pushing times demonstrate that the proposed RL-based pushing method using long-term future rewards separates objects more effectively, while the smaller variance shows its robustness. Although only n ∈ [3, 6] objects were used during training, the proposed method still obtained high performance when pushing more tightly stacked objects (e.g., n ∈ [7, 8]) during testing, which demonstrates the generalization capability of our method. In addition, we replaced the proposed PR-Net reward with the local reward function of a recently proposed RL-based collaborative pushing and grasping method [16] and conducted comparative experiments. As shown in Table I, the proposed method obtains significant advantages over local-reward RL pushing [16]. This is because only evaluating the pushing effectiveness in a local area may result in non-optimal pushing actions from a global perspective, e.g., separating a small group of objects while some of them may move closer to the remaining objects, which well demonstrates the superior performance of our designed PR-Net reward.

    TABLE I COMPARISON WITH OTHER PUSHING METHODS

    TABLE II COMPARISON WITH THE GRASPING-ONLY METHOD

Then, we conducted experiments to compare the grasping-only method and the proposed collaborative pushing and grasping method for grasping tightly stacked objects. The grasping-only method has the same structure as our Action-GNet and was used to directly grasp objects without pushing. As shown in Table II, the grasp success rate of the grasping-only method is very low. This is because directly grasping tightly stacked objects causes collisions and results in failures. By contrast, the proposed method achieves a much higher success rate, which demonstrates the superiority of our collaborative pushing and grasping framework over the grasping-only one. The simulation testing environment is presented in Fig. 8.

    Fig. 8. Illustration of the simulation testing environment.

    D. Ablation Study

Ablation studies of the components of the proposed method were performed to validate their effectiveness. First, experiments were conducted to verify the performance of the pushing reward network PR-Net and PolicyNMS in the pushing task. Specifically, we conducted experiments with the following three models.

Model 1: the push reward without PR-Net, i.e., only getting the final reward when the separation is done, and Action-PNet without PolicyNMS.

Model 2: the push reward using PR-Net and Action-PNet without PolicyNMS.

Model 3: the push reward using PR-Net and Action-PNet with PolicyNMS. The training results of these three models for the setting of 6 objects are plotted in Fig. 9. It can be seen that the proposed PR-Net helps the method achieve better pushing efficiency by adequately evaluating the rationality of pushing actions, and the proposed PolicyNMS contributes significantly to faster learning by suppressing unreasonable pushing actions.

    Fig. 9. Ablation study of PR-Net and PolicyNMS for pushing.

    Then, to evaluate the effectiveness of the proposed grasping angle reward function defined in (9) and PolicyNMS in the grasping task, we conducted experiments for the following three models.

Model 1: the grasping reward without the angle constraint defined in (9) and Action-GNet without PolicyNMS.

Model 2: the grasping reward with the angle constraint and Action-GNet without PolicyNMS.

Model 3: the grasping reward with the angle constraint and Action-GNet with PolicyNMS. The training results of these three models for the setting of 6 objects are plotted in Fig. 10. It can be seen that PolicyNMS and the grasping angle reward bring a higher success rate of grasping. Specifically, the grasping angle constraint reward benefits the final performance while contributing less to the learning speed. By contrast, PolicyNMS has a larger impact on the learning speed, helping the agent learn much faster by suppressing unreasonable grasping actions to avoid failed grasps and collisions.

    E. Evaluation Results in Real-world Scenarios

We evaluated the proposed method in real-world scenarios. The testing setup consists of a UR10 robotic arm with a DH-95 gripper and a RealSense RGB-D camera fixed on the desktop, as shown in Fig. 2. We directly applied the network trained in the simulation training scenarios to the real-world testing scenarios. Specifically, we randomly selected n ∈ [3, 8] real-world household objects and tightly stacked them together. In all the real-world tests, our method successfully separated all the objects within the push times limitation, i.e., ≤ 2×n. A visual demo of pushing and grasping in the real-world environment is presented in Fig. 11. As shown in the first column of Table III, our method achieves robust and efficient pushing performance in the real-world tests, which is comparable to the results in the simulation tests shown in Table I. Besides, the high grasp success rates are also comparable to the results in Table II. The results validate the good generalization capability of the proposed method from the simulation environment to the real-world environment as well as from specific objects to unknown objects. Furthermore, we also prepared much more difficult testing scenarios, where n ∈ [3, 8] identical objects are tightly stacked together as shown in Fig. 12. These tests further demonstrate the excellent generalization capability and adaptability of our method, which is important for practical applications. A video demo of the tests in the real-world environment is also provided2 (https://github.com/nizhihao/Collaborative-Pushing-Grasping).

    Fig. 10. Ablation study of the proposed grasping angle reward and PolicyNMS for grasping.

    TABLE III RESULTS OF OUR METHOD IN REAL-WORLD ENVIRONMENT

    Fig. 11. Testing for random objects stacked tightly in real-world environment.

    Fig. 12. Testing for identical objects stacked tightly in real-world environment.

    V. CONCLUSIONS

In this paper, we propose a novel deep Q-learning method for collaboratively pushing and grasping tightly stacked objects. Specifically, a novel and efficient non-maximum suppression policy is designed, which helps accelerate learning by suppressing unreasonable actions to avoid bad consequences. For the pushing task, an end-to-end data-driven pushing reward network is designed to assess the state of aggregation or separation after different pushing actions from a global perspective. For the grasping task, an efficient grasping reward function with an angle constraint is defined to help optimize the angle of grasping actions. Together, they contribute to an efficient and robust pushing strategy as well as high success rates of pushing and grasping. Moreover, we establish the common household item dataset containing various objects of different colors, shapes, textures, and sizes, forming many easy-to-difficult training scenarios. Experimental results demonstrate the superiority of the proposed method over the non-RL pushing method and the direct grasping method for this challenging task, as well as its fast learning speed, good generalization capability, and robustness. One limitation of the method is that there is no constraint on the pushing distance, which may push some objects out of the boundary. In future work, we will explore an effective constraint to deal with this limitation.
