ZHANG Jue (張玨), LI Xiangjian (李祥健), LIU Xiaoyan (劉肖燕)*, LI Nan (李楠), YANG Kaiqiang (楊開強), ZHU Heng (朱恒)
1 College of Information Science and Technology, Donghua University, Shanghai 201620, China
2 Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education, Shanghai 201620, China
Abstract: A large number of logistics operations are needed to transport fabric rolls and dye barrels to different positions in printing and dyeing plants, and rising labor costs are making it difficult for plants to recruit workers for these manual operations. Rapidly evolving artificial intelligence and robotics offer potential solutions to this problem. This paper presents a navigation method that addresses two practical issues: the inability of robots to pass smoothly around corners, and local obstacle avoidance. In the system, a Gaussian fitting smoothing rapid exploration random tree star-smart (GFS RRT*-Smart) algorithm is proposed for global path planning and enhances performance when the robot makes sharp turns around corners. For local obstacle avoidance, a deep reinforcement learning determiner, the mixed actor critic (MAC) algorithm, is used to make obstacle avoidance decisions. The navigation system is implemented in a scaled-down simulated factory.
Key words: rapid exploration random tree star-smart (RRT*-Smart); Gaussian fitting; deep reinforcement learning (DRL); mixed actor critic (MAC)
In printing and dyeing plants, the logistics task is to transport fabric rolls and dye barrels. Workers use trolleys to load fabric rolls and dye barrels and transport the material between printing and dyeing machines.
Intelligent robot navigation systems must perform two tasks in printing and dyeing plants: plan the best route and maneuver around obstacles.
The rest of the paper is organized as follows. Section 1 reviews the literature related to these methods. Section 2 introduces the Gaussian fitting smoothing rapid exploration random tree star-smart (GFS RRT*-Smart) algorithm and the determiner mixed actor critic (MAC) method. Section 3 demonstrates the experimental results of the global path planning and local obstacle avoidance strategies.
The main contributions of this paper are a navigation method that uses the GFS RRT*-Smart algorithm for global path planning, addressing the problem of robots failing to pass smoothly around corners in practice, and a deep reinforcement learning (DRL) determiner, MAC, that solves the local obstacle avoidance problem.
Simultaneous localization and mapping (SLAM) systems based on 2D light detection and ranging (LiDAR) devices are unable to recognize 3D targets in the environment, while 3D mapping achieved by fusing 2D LiDAR with 3D ultrasound sensors is highly influenced by ambient light[1-2]. DRL methods for autonomous navigation are developing rapidly, and excellent methods are continuously being proposed. Francis et al.[3] presented a deep deterministic policy gradient (DDPG) network to obtain a better local DRL policy, and then combined the learned policy with probabilistic roadmaps over a given large map to build a roadmap network; after deployment, the best path can be searched in this network using the A star (A*) algorithm. Savva et al.[4] developed the indoor navigation platform Habitat, in which the visual properties during navigation were evaluated using DRL, and found that depth maps helped to improve the visual performance of navigation. Sax et al.[5] used mid-level visual features with transfer invariance for DRL of robot vision, addressing the problem of training results failing to generalize under illumination changes. Researchers[6-7] designed a neurotopological SLAM for visual navigation, whose main approach utilized image semantic features as nodes and a spatial topological representation that provided approximate geometric inference, together with a goal-oriented semantic exploration module to solve the problem of navigating to a given object category in unseen environments. These methods suggest that DRL autonomous navigation combined with semantic segmentation techniques can help to solve navigation problems. Random sampling algorithms, for example the probabilistic roadmap (PRM) and the rapid-exploration random tree (RRT), are primarily used to solve path planning problems in low-dimensional spaces. The RRT* algorithm is based on RRT and uses parent-node re-selection and rewiring operations to optimize the path. The RRT*-Smart algorithm optimizes the path by converting curves into straight lines as much as possible[8]. Further research improved the RRT* path planning method by using a sampling-based approach[9]. Wang et al.[10] optimized kinematic planning by using a sampling approach.
Heiden et al.[11] designed an open-source sampling-based motion planning benchmark for wheeled mobile robots. The benchmark provided a diverse set of algorithms, post-smoothing techniques, steering functions, optimization criteria, complex environments similar to real-world applications, and performance metrics. Kulhánek et al.[12] designed DRL-based visual assistance tasks, customized reward schemes, and simulators; the strategy was fine-tuned on images collected from real-world environments. Kontoudis and Vamvoudakis[13] presented an online kinodynamic motion planning algorithmic framework using asymptotically optimal RRT* and continuous-time Q-learning. Shi et al.[14] proposed an end-to-end navigation method based on DRL that translated sparse laser ranging results into movement actions, accomplishing mapless navigation in complex environments.
The logistics path of printing and dyeing plants has the following characteristics: the overall environment changes little, while local obstacles appear frequently. According to these characteristics, we designed a navigation method with global planning of the overall path and local dynamic obstacle avoidance decisions (shown in Fig. 1). The GFS RRT*-Smart algorithm is used for global path planning, and the decision method MAC, combined with semantic segmentation, is used for avoiding obstacles.
Fig. 1 Overview of the navigation system
By using intelligent sampling and path optimization techniques, the RRT*-Smart algorithm solves the slow convergence of the RRT* algorithm near the optimal value. However, the paths generated by the RRT*-Smart algorithm are not suitable for robots, especially those with a two-wheeled differential-drive chassis: the generated path turns through small angles, so the robot cannot steer accurately when the turning radius is small. Smoothing the straight steering segments into a curved path benefits the actual operation of the robot.
The GFS RRT*-Smart algorithm is outlined in Fig. 2. The algorithm traverses the path points derived from RRT*-Smart to confirm whether the path is legal (step 21), and selects those points where the turning angle is smaller than a given threshold (step 24). In step 25, the algorithm intercepts the line segments on both sides of these points and samples randomly near the segments according to a given standard deviation. In steps 26 and 27, the algorithm fits a Gaussian curve to these samples and replaces the original points with points within the confidence interval; a minimal sketch of this subroutine follows Fig. 2.
Fig. 2 GFS RRT*-Smart algorithm
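The Python sketch below illustrates the Gaussian fitting smoothing subroutine described above (steps 25-27). It is a minimal illustration under our own assumptions, not the authors' implementation: the sampling standard deviation, sample counts, and local-frame construction are hypothetical choices, and scipy is assumed to be available.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma):
    """Bell-shaped model fitted to the corner in a local frame."""
    return a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

def smooth_corner(p_prev, p_corner, p_next, n_samples=40, noise_std=0.05, n_out=15):
    """Replace a sharp corner with points on a fitted Gaussian curve.

    p_prev, p_corner, p_next: consecutive 2D way-points (numpy arrays);
    noise_std: the sampling standard deviation (hypothetical value).
    """
    # Local frame: x-axis along the chord from p_prev to p_next.
    chord = p_next - p_prev
    ux = chord / np.linalg.norm(chord)
    uy = np.array([-ux[1], ux[0]])            # left-hand normal
    to_local = lambda p: np.array([(p - p_prev) @ ux, (p - p_prev) @ uy])
    to_world = lambda q: p_prev + q[0] * ux + q[1] * uy

    # Step 25: sample noisy points along both segments of the corner.
    ts = np.linspace(0.0, 1.0, n_samples // 2)
    seg1 = p_prev[None, :] + ts[:, None] * (p_corner - p_prev)[None, :]
    seg2 = p_corner[None, :] + ts[:, None] * (p_next - p_corner)[None, :]
    local = np.array([to_local(p) for p in np.vstack([seg1, seg2])])
    local[:, 1] += np.random.normal(0.0, noise_std, len(local))

    # Steps 26-27: fit the Gaussian, seeded from the corner itself.
    c = to_local(p_corner)
    (a, mu, sigma), _ = curve_fit(gaussian, local[:, 0], local[:, 1],
                                  p0=[c[1], c[0], 5 * noise_std])

    # Replace the corner with points evaluated on the fitted curve.
    xs = np.linspace(0.0, np.linalg.norm(chord), n_out)
    return [to_world(np.array([x, gaussian(x, a, mu, sigma)])) for x in xs]
```

The fitted curve replaces the sharp turn with a bell-shaped detour whose width is controlled by the sampling standard deviation, which is what makes the turning radius feasible for a differential-drive chassis.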
The DDPG algorithm is a deterministic policy learning algorithm, but it suffers from considerable bias in value estimation, which tends to cause large bias in policy learning and reduces its exploration ability.
Twin delayed deep deterministic policy gradient (TD3) effectively reduces the value estimation bias of DDPG with techniques such as a twin-critic structure and delayed policy updates, but its policy learning and decision making remain biased toward deterministic behavior.
The soft actor critic (SAC) algorithm evaluates the entropy of the policy in addition to the rewards inherent in the task itself, correlates this entropy evaluation with the dimensionality of the agent's actions, and employs a stochastic policy; all of these techniques improve the agent's exploration ability. However, stochastic policies lack certainty in action selection, which can easily lead to task failure.
The proposed MAC algorithm is a deep fusion of state-of-the-art DRL algorithms and techniques such as TD3, DDPG, and SAC, with a fusion ratio adapted to the task. MAC integrates the advantages of the other algorithms to perform the task more efficiently.
MAC adopts six critic networks: SAC networks Qα1, Qα2, and Qβ to increase early-stage exploration; TD3 networks Qθ1 and Qθ2 to obtain mid-term certainty; and the DDPG network Qη to obtain posterior certainty. The target value is given by
(1)
where y is a substitute for the long formula that follows (to facilitate substitution into the formulas below); f, g, and k refer to functions whose values sum to 1 at the same moment t; s′ denotes the state at moment t+1; a′ denotes the action at moment t+1; x denotes the input at any moment of the loss; r denotes the reward value; and γ is the decay factor (generally between 0 and 1). The fused expectations of the TD3, DDPG, and SAC errors are used to update the critic Q-networks, and the fusion ratio is constrained by
f(t)+g(t)+k(t)=1.
(2)
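Since f(t), g(t), and k(t) must sum to 1 at every step, one simple way to realize the fusion ratio is a stage-dependent schedule that shifts weight from the SAC term (early exploration) to the TD3 and DDPG terms (later certainty). The schedule below is a hypothetical illustration, normalized by a softmax so that Eq. (2) holds by construction; the paper does not specify the exact functional form.

```python
import numpy as np

def fusion_weights(t, t_max):
    """Hypothetical fusion ratio (f, g, k) satisfying f + g + k = 1.

    Early steps favour k (SAC exploration); mid steps favour f (TD3);
    late steps favour g (DDPG posterior certainty).
    """
    phase = t / t_max                          # training progress in [0, 1]
    logits = np.array([
        4.0 * (1.0 - 2.0 * abs(phase - 0.5)),  # f: peaks mid-training (TD3)
        4.0 * phase,                           # g: grows late (DDPG)
        4.0 * (1.0 - phase),                   # k: decays from the start (SAC)
    ])
    w = np.exp(logits - logits.max())          # numerically stable softmax
    return w / w.sum()                         # (f, g, k), sums to exactly 1

f, g, k = fusion_weights(t=100, t_max=1000)    # early stage: k dominates
```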
The loss function is given as
(3)
where t is the index of steps in each episode, and β and ρβ are the noisy distributions of actions and states, respectively.
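Based on the description above, in which every critic is updated with the fused expectation of the TD3, DDPG, and SAC errors against the target y of Eq. (1), one critic update step might be sketched as below. This is a schematic PyTorch version under our own assumptions (shared fused target, independent optimizers per critic), not the authors' code.

```python
import torch
import torch.nn.functional as F

def update_critics(critics, optimizers, batch, y):
    """Schematic MAC critic update: every critic regresses toward the
    fused target y of Eq. (1), assumed precomputed from the target
    networks with the fusion ratio (f, g, k) of Eq. (2) and detached.

    critics: the six Q-networks (Qθ1, Qθ2, Qη, Qα1, Qα2, Qβ);
    batch: (s, a) pairs sampled from the replay buffer.
    """
    s, a = batch
    for q_net, opt in zip(critics, optimizers):
        loss = F.mse_loss(q_net(s, a), y)   # squared TD error to the target
        opt.zero_grad()
        loss.backward()
        opt.step()
```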
The goal of the actor network is to increase the values of Qθ1,2(si, ai), Qη(s, a), Qα1,2(si, ai), and Qβ(s, a). The actor network can be updated through the deterministic policy gradient algorithm:
∇φJ(φ) ≈ E(at∼β, st∼ρβ)[∇a(Qθ(st, at)f(t) + Qη(st, at)g(t) − (Qα(st, at) + Qβ(st, at))k(t))|a=π(s) ∇φπφ(s)],
(4)
where J stands for the objective of the actor update, φ denotes the full set of parameters of the actor network, and π refers to the policy, realized by the actor network. By taking the advantages of the TD3, DDPG, and SAC algorithms at different stages, the MAC algorithm reduces the bias caused by relying on a single algorithm.
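A schematic PyTorch version of the actor update in Eq. (4) follows: the actor ascends the fused critic value evaluated at a = πφ(s) by descending its negative. The concrete network handles and the use of one critic from each twin pair are illustrative assumptions.

```python
def update_actor(actor, actor_opt, critics, s, f, g, k):
    """Schematic actor update following Eq. (4): gradient ascent on the
    fused Q-value at a = pi_phi(s), implemented as descent on -J.

    critics = (q_theta, q_eta, q_alpha, q_beta); f, g, k are the current
    fusion weights with f + g + k = 1 (Eq. (2)).
    """
    q_theta, q_eta, q_alpha, q_beta = critics
    a = actor(s)                                   # a = pi_phi(s)
    j = (q_theta(s, a) * f + q_eta(s, a) * g
         - (q_alpha(s, a) + q_beta(s, a)) * k)     # fused objective of Eq. (4)
    loss = -j.mean()                               # ascend J by descending -J
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
```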
As shown in Fig. 3, MAC is composed of an actor network and several critic networks. The target actor network and the target critic networks store the parameters of the previous states of their corresponding networks. The replay buffer stores the information from the agent's interaction with the environment, and this information is sampled to update the actor and critic networks. The update formulas for the critic networks and the actor network use the target critic networks' output value Q′ and the critic networks' output value Q as inputs.
Fig. 3 Flow chart of MAC
In this section, we demonstrate the experimental results of the GFS RRT*-Smart algorithm for global planning and its smoothing of turns. In a virtual scenario created on the virtual robot experiment platform (V-REP), we also compare the performances of the DRL methods DDPG, TD3, SAC, and MAC for obstacle avoidance direction selection.
The GFS RRT*-Smart algorithm is executed in a 2D environment simulating a printing and dyeing plant (shown in Fig. 4). The upper left side is the blank fabric warehouse, from which the intelligent robot departs to transport fabric rolls to the inlet or delivery side of the mechanical equipment in the center and on the right.
Fig. 4 Global path planning in a simulated printing and dyeing plant environment by using the GFS RRT*-Smart algorithm: (a) path planning with the end of the bleacher as the target; (b) optimized path planning near the back of the bleacher; (c) path planning with the front of the dyeing machine as the target; (d) optimized path planning near the front of the dyeing machine
The GFS RRT*-Smart algorithm explores the simulated plant and produces the best path from the blank fabric shelves to the back of the bleacher (Fig. 4(a)) and the optimized path to the dyeing machine (Fig. 4(c)).
Then the algorithm calls the Gaussian fitting smoothing subroutine to determine whether a corner of the path hinders the steering of the intelligent robot; if so, it samples coordinate points and performs Gaussian fitting to obtain a curve shaped like a Gaussian distribution, part of which is intercepted and added to the original path. As shown in Figs. 4(b) and 4(d), the red path is the initial path, and the blue path is the local path after processing by the GFS sub-algorithm.
In order to compare the smoothness of the turning points generated by the two algorithms, we ran route planning tests with the RRT*-Smart algorithm and the GFS RRT*-Smart algorithm for different targets. The tests for each target were performed 50 times.
As shown in Fig. 5, the curvature comparison of the two algorithms at the points shown in Figs. 4(b) and 4(d) indicates that the GFS RRT*-Smart algorithm generates smaller curvature than the RRT*-Smart algorithm, confirming that it yields more favorable steering at turning points in the real environment.
Fig. 5 Curvature comparison for RRT*-Smart algorithm and GFS RRT*-Smart algorithm at turning points: (a) curvature comparison at point shown in Fig. 4 (b); (b) curvature comparison at point shown in Fig. 4 (d)
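The curvature values compared in Fig. 5 can be obtained, for example, from the Menger curvature of three consecutive path points, i.e., the reciprocal of the radius of their circumscribed circle. This is one common estimator for the smoothness of a discrete path; the paper does not state which estimator was used, so the sketch below is illustrative.

```python
import numpy as np

def menger_curvature(p1, p2, p3):
    """Curvature at p2 from three consecutive 2D path points: 4 times the
    triangle area divided by the product of the three side lengths (1/R of
    the circumscribed circle). Returns 0 for collinear points."""
    a = np.linalg.norm(p2 - p1)
    b = np.linalg.norm(p3 - p2)
    c = np.linalg.norm(p3 - p1)
    cross = (p2[0]-p1[0])*(p3[1]-p1[1]) - (p2[1]-p1[1])*(p3[0]-p1[0])
    area = 0.5 * abs(cross)
    denom = a * b * c
    return 0.0 if denom == 0 else 4.0 * area / denom

def path_curvatures(path):
    """Curvature profile along a polyline path of shape (n, 2)."""
    return [menger_curvature(path[i-1], path[i], path[i+1])
            for i in range(1, len(path) - 1)]
```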
In order to train the DRL obstacle avoidance strategy, the segmented image is used as the input to the agent. Based on this image, the agent must learn to transfer from the current direction to the target direction. The obstacle avoidance policy is therefore equivalent to a point-to-point path exploration policy in a two-dimensional plane.
The research team built a scaled-down printing and dyeing plant model and its 3D simulation environment (shown in Figs. 6(a) and 6(b)). In the V-REP simulation environment (Fig. 6(b)), we implemented direction decision tests for DDPG, TD3, SAC, and MAC, with randomly generated obstacles placed at different locations. The direction decision tests for obstacle avoidance at each location were performed ten times, with 500 interactions between the agent and the environment each time; the total cumulative return was used as the evaluation criterion. A minimal sketch of this evaluation protocol follows.
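The sketch below reproduces the protocol just described: ten runs of up to 500 agent-environment interactions, scored by total cumulative return. The Gym-style env interface and the agent.act method are hypothetical names, not the authors' API.

```python
def evaluate(agent, env, episodes=10, max_steps=500):
    """Evaluation protocol: ten runs of 500 interactions each; the total
    cumulative return over all runs is the criterion. Assumes a Gym-style
    environment interface (reset/step), which is an assumption here."""
    returns = []
    for _ in range(episodes):
        s, total = env.reset(), 0.0
        for _ in range(max_steps):
            a = agent.act(s)                 # direction decision
            s, r, done, _ = env.step(a)
            total += r                       # accumulate the return
            if done:
                break
        returns.append(total)
    return sum(returns), returns
```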
Fig. 6 Scaled-down printing and dyeing plant model, its 3D simulation, and robot perspectives: (a) scaled-down physical model of the printing and dyeing plant with 7 target sites; (b) 3D simulation test environment in V-REP; (c) robot perspectives while the cart was driving around a corner
As shown in Fig. 7, the performance of MAC and SAC was significantly better than that of DDPG and TD3 over the 500 agent-environment interactions, and MAC also outperformed SAC. The superior performance of MAC within the 500 interactions set in the test satisfies the timeliness requirement of obstacle avoidance decisions when the robot encounters an obstacle.
Fig. 7 Experimental results of obstacle avoidance strategies in three different scenarios: (a) MAC performs better than other algorithms; (b) MAC performs slightly better than other algorithms; (c) MAC performs significantly better than other algorithms
A 28 cm × 23 cm × 9 cm cart equipped with a Jetson Nano controller (NVIDIA, USA) and a RealSense camera (Intel, USA) was used in place of the intelligent robot for the real-world test (shown in Fig. 6(a)).
To evaluate the effectiveness of the GFS RRT*-Smart algorithm, seven alternative target sites were chosen for testing in the scenario shown in Fig. 6(a). The algorithm was tested in the cart's navigation program, and the results showed that the cart took less time to adjust its posture for smooth passage when driving around a corner (shown in Fig. 6(c)).
In the obstacle avoidance test, a robot moved about the simulated scene with its monocular camera, collecting about 2 000 pictures of the scene during the movement. The research team finely annotated the objects in the pictures and trained semantic segmentation on the annotated data by using the DeepLab v3[15] network.
The trained DeepLab v3 parameters were deployed on the Jetson Nano controller, combined with a binocular-camera depth estimation algorithm that detected obstacle distances to dynamically derive complete obstacle data. The DRL method MAC implemented the local obstacle avoidance strategy by rapidly processing these data; a sketch of such a segmentation-plus-depth pipeline follows.
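The sketch below shows one way segmentation output and a depth map might be combined to obtain obstacle distances. The torchvision DeepLab v3 model is real and stands in for the authors' fine-tuned network, which is not public; the class index, depth source, and return convention are our hypothetical choices.

```python
import torch
import torchvision

# A pretrained DeepLab v3 stands in for the authors' fine-tuned model.
model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()

def obstacle_distance(rgb, depth, obstacle_class=15):
    """Minimum distance to pixels labelled as an obstacle.

    rgb: normalized image tensor of shape (1, 3, H, W);
    depth: per-pixel distances in metres, a torch tensor of shape (H, W),
    e.g. from the stereo depth estimator;
    obstacle_class: a hypothetical label index from the annotation scheme.
    """
    with torch.no_grad():
        logits = model(rgb)["out"]            # (1, n_classes, H, W)
    labels = logits.argmax(dim=1)[0]          # per-pixel class map
    mask = labels == obstacle_class
    if not mask.any():
        return None                           # no obstacle in view
    return float(depth[mask].min())           # nearest obstacle, in metres
```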
This navigation system takes intelligent robot logistics in a printing and dyeing plant as its application context; it uses an improved RRT*-Smart algorithm to plan motion paths globally and performs flexible obstacle avoidance actions locally. The GFS RRT*-Smart algorithm improves on RRT*-Smart by making the steering path smoother, suiting the operation of intelligent robots in practice. In the local obstacle avoidance strategy, new obstacles on the planned path are detected with the help of image segmentation, and local obstacle avoidance is combined with DRL to achieve flexible detours, yielding an actually feasible optimal path.