Xiaolin Tang, Yuyou Yang, Teng Liu, Xianke Lin, Kai Yang, and Shen Li
Abstract—Parking in a small parking lot with limited space is a difficult task. It often leads to deviations between the final parking posture and the target posture. These deviations can cause partial occupancy of adjacent parking lots, which poses a safety threat to the vehicles parked there. However, previous studies have not addressed this issue. In this paper, we aim to evaluate the impact of the parking deviation of existing vehicles next to the target parking lot (PDEVNTPL) on the automatic ego vehicle (AEV), in terms of parking safety, comfort, accuracy, and efficiency. A segmented parking training framework (SPTF) based on soft actor-critic (SAC) is proposed to improve parking performance. In the proposed method, the SAC algorithm incorporates strategy entropy into the objective function, enabling the AEV to learn parking strategies based on a more comprehensive exploration of the environment. Additionally, the SPTF simplifies complex parking tasks to maintain the high performance of deep reinforcement learning (DRL). The experimental results reveal that the PDEVNTPL has a detrimental influence on AEV parking in terms of safety, accuracy, and comfort, leading to reductions of more than 27%, 54%, and 26%, respectively. However, the SAC-based SPTF effectively mitigates this impact, increasing the parking success rate from 71% to 93%. Furthermore, the heading angle deviation is significantly reduced from 2.25 degrees to 0.43 degrees.
MANY collisions occur every year during the parking process due to small parking spaces and the complex driving skills required [1]. Autonomous vehicles are expected to execute all driving skills safely and efficiently [2], [3]. However, in the field of intelligent vehicles, the technologies that have reached large-scale application are mostly focused on advanced driver assistance and energy management [4], [5]. As for autonomous driving technology, its safety and reliability still need to be verified and tested [6], [7]. Automatic parking is one such autonomous driving technology, and its safety, comfort, parking efficiency, and accuracy still need further study.
Currently, two types of automatic parking systems (APS) have been proposed [8], [9]. One is the general APS: after the system obtains the environmental information, it first plans a parking path and then controls the vehicle to follow the planned path to complete parking. The other is the end-to-end APS, which combines the path planning module and the path tracking control module and directly outputs control variables (acceleration and steering angle) to control the vehicle according to the environmental information.
Currently, the primary path planning methods employed in the general APS include the quintic polynomial curve, cubic spline curve, clothoid curve, and hybrid A-star algorithm; for tracking control, the commonly used algorithms are proportional-integral-derivative (PID) control, the linear quadratic regulator (LQR), and model predictive control (MPC) [10]–[14]. Some of them have been applied in advanced driver assistance systems. However, because it can directly optimize the performance of the entire system, the end-to-end APS exhibits superior overall performance [15]. The current methods for implementing end-to-end APS are deep learning and deep reinforcement learning (DRL) [16], [17]. DRL is very useful in autonomous driving [18]. It has great advantages not only in end-to-end parking systems but also in parking lot allocation systems [19]–[21], and many studies have been devoted to it. For example, Zhang et al. [15] presented a DRL-based APS and compared its parking performance with that of the general APS. The results showed that the DRL-based APS parked more easily and with smaller deviations. Song et al. [22] built a parking model based on actual parking data, which avoided the disadvantage that model-free RL requires a large amount of interactive data and improved training efficiency. However, the performance of the RL model is limited by this data. Bernhard et al. [23] used experience-based heuristic RL for automatic parking, in which human experience is used to weight the explored data. Zhang et al. [24] treated parking as a multi-objective optimization process that considers safety, comfort, parking efficiency, and final parking performance, and obtained a parking control strategy balancing all aspects by training an RL-based parking model. However, many difficulties remain unsolved in practical applications of DRL-based APS. The gaps and weaknesses of previous research are that generalization is not good enough, performance decreases rapidly as the difficulty of the training task increases, and the time required to train the system model is too long.
This paper proposes a segmented APS based on soft actor-critic (SAC). It improves the safety, comfort, and accuracy of automated parking and reduces training time by exploring a more realistic environment and decomposing complex driving tasks into simple ones. As shown in Fig. 1, firstly, we quantify in an experiment the impact of the parking deviation of existing vehicles next to the target parking lot (PDEVNTPL) on automatic parking in terms of safety, comfort, efficiency, and accuracy. Secondly, the best starting state for parking is determined, according to the target state, the starting state, the kinematic constraints, and the PDEVNTPL, so as to adapt to a target state that deviates from the ideal state because of the PDEVNTPL. Thirdly, the best starting parking state is set as the segmentation point. By doing so, the complex parking task is effectively divided into a simpler parking task and a posture adjustment task. This decomposition reduces the overall difficulty of the parking process and addresses the issue of diminishing DRL performance as training tasks become more challenging. Following this, the two driving tasks are trained using the SAC algorithm to acquire control strategies. SAC incorporates strategy entropy into the objective function, encouraging the vehicle to explore the environment and facilitating a more comprehensive learning process. Finally, the SAC-based segmented parking strategies are compared and analyzed against other algorithms. Safety, comfort, efficiency, and accuracy are evaluated using metrics such as success rate, trajectory smoothness, parking time, and posture deviation.
The main contributions of this work are as follows: 1) The SAC algorithm is utilized to balance and improve the safety, comfort, efficiency, and accuracy of automatic parking. 2) The impact of the parking deviation of existing vehicles next to the target parking lot on automatic ego vehicle parking is considered in the DRL training process and is quantified in terms of safety, comfort, and accuracy. 3) A segmented parking training framework is established to simplify the parking task and determine the best parking starting state, so as to improve the generalization of the APS.
To better explain the contributions of this paper, the rest of the work is arranged as follows. Section II describes the vehicle kinematics and parking scenarios. Section III introduces the SAC algorithm and the segmented parking training framework, and implements the automatic parking system. Section IV analyzes the results. Section V presents the conclusion.
In this section, two different parking scenarios are introduced, and the vehicle kinematics of the automatic ego vehicle (AEV) is described. The parking environment includes the AEV and the surrounding vehicles, which are already parked.
A rectangle represents the outline of the vehicle. Its width and length are the maximum width and length of the vehicle plus a restricted zone added to increase parking safety, as shown in Fig. 2. x and y are the lateral and longitudinal positions of the vehicle, respectively; v is the vehicle velocity; l_f and l_r are the front and rear wheelbases, respectively; ψ is the heading; β is the slip angle at the center of gravity; δ_f is the steering angle; and R is the turning radius. Since the vehicle travels at low speed during parking, the influence of the physical properties of the tires and suspension on driving is ignored. The AEV is modeled by the bicycle model, which is based on the nonlinear continuous-horizon equations [25], [26], as follows:
The linear kinematic model is calculated by Taylor expansion
For the convenience of the following research, the calculation formula of the minimum turning radius is given:
The default parameters of the AEV and the surrounding vehicles are the same. In this paper, the length and width of the vehicles are set to 5.0 meters and 2.0 meters, respectively. The wheelbase is set to 3.6 meters (the front wheelbase equals the rear wheelbase).
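As an illustration, the following Python sketch implements the standard kinematic bicycle model described above. The Euler discretization, the time step, and the turning-radius approximation are assumptions made here; the paper's exact Equations (6) and (7) may differ.

```python
import numpy as np

LF, LR = 1.8, 1.8          # front / rear wheelbase [m] (total wheelbase 3.6 m)

def bicycle_step(state, accel, delta_f, dt=0.2):
    """One Euler step of the nonlinear kinematic bicycle model.

    state = (x, y, v, psi): position, speed, heading.
    accel: longitudinal acceleration command [m/s^2].
    delta_f: front steering angle [rad].
    """
    x, y, v, psi = state
    beta = np.arctan(LR / (LF + LR) * np.tan(delta_f))   # slip angle at the CoG
    x   += v * np.cos(psi + beta) * dt
    y   += v * np.sin(psi + beta) * dt
    psi += v / LR * np.sin(beta) * dt
    v   += accel * dt
    return np.array([x, y, v, psi])

def min_turning_radius(delta_max, wheelbase=LF + LR):
    # Common low-speed approximation R_min = L / tan(delta_max);
    # the paper's Equation (7) may use a slightly different expression.
    return wheelbase / np.tan(delta_max)
```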
Two types of parking scenarios are investigated in this study: the ideal parking scenario and the actual parking scenario. The actual parking scenario considers the parking deviations of the existing vehicles next to the target parking lot (PDEVNTPL), as shown in Fig. 3. The blue vehicle is the AEV, the gray vehicles are the surrounding vehicles, and the green and red icons are the ideal target parking state and the original target parking state, respectively.
Fig. 3. The ideal parking scenario and the actual parking scenario.
In the actual parking scenario, a collision may happen if the AEV parks according to the original target parking state. To coordinate the distances between the AEV and the adjacent vehicles, a good choice is to adjust the target state by the mean of the PDEVNTPL. The basic change rules are as follows:
where (Δx_T, Δy_T, Δψ_T) is the variation of the target parking state, and Δx_i, Δy_i, Δψ_i, for i = 1, 2, are the lateral deviations, longitudinal deviations, and heading angle deviations of the two adjacent vehicles, respectively. For special cases in which the PDEVNTPL is so large that the entrance of the parking lot is occupied, one extra rule is added: if ... meters, the parking scenario is reset.
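For illustration, a minimal sketch of the target-state adjustment described above, assuming the adjustment is simply the arithmetic mean of the two neighbours' deviations; the paper's exact rule (including the reset condition) is given in its equations.

```python
import numpy as np

def adjust_target_state(ideal_target, dev_left, dev_right):
    """Shift the target parking state by the mean deviation of the two
    adjacent vehicles (hedged sketch; the paper's rule may weight or
    clip the deviations differently).

    ideal_target: (x, y, psi) of the ideal target state.
    dev_left, dev_right: (dx, dy, dpsi) deviations of the left/right
    neighbours from their lot centres.
    """
    mean_dev = 0.5 * (np.asarray(dev_left) + np.asarray(dev_right))
    return np.asarray(ideal_target) + mean_dev
```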
According to China's regulations on parking lot size, the length and width of mini parking lots are 5.5 meters and 2.5 meters, and the width of the center lane of the parking lot should be at least 3.5 meters. Therefore, without loss of generality, all parking environment sizes in this paper are set based on these regulations. To better reflect the influence of the PDEVNTPL, the width of the parking lot is changed to 3 meters. The target position is taken as the origin, with right as the positive direction of the X axis and up as the positive direction of the Y axis. The lateral deviation (the abscissa difference between the centers of the vehicle and the parking lot), longitudinal deviation (the ordinate difference between the centers of the vehicle and the parking lot), and heading deviation (the angle between the heading of a vehicle and the length direction of a parking lot) of the vehicles adjacent to the target parking lot take random values within [0, 0.5]/[-0.5, 0] meters (vehicle on the left/vehicle on the right), [0, 0.2] meters, and [-10, 10] degrees, respectively. (x, y, ψ) is used to represent the target state. The ideal target parking state is set as (0, 0, 90). The abscissa of the starting point should be greater than R_min = 6.4 (Equation (7)). For generality, the starting parking state of the ideal parking scenario and of general parking in the actual parking scenario is set as (8, 6, 0). The initial parking configuration of segmented parking in the actual parking scenario is introduced in Section III.
In this section, the APS is established. Firstly, the SAC algorithm is introduced. Then, the segmented parking training framework is introduced. Finally, the implementation of the automatic parking system is described.
In recent years, learning-based algorithms have become a popular class of methods in the field of intelligence [27]–[29]. DRL is one type of learning-based algorithm. It has been applied in autonomous driving, mainly for path planning, control strategy, and decision-making [30], [31]. The deep deterministic policy gradient (DDPG) algorithm, proposed in 2014 [33], is particularly notable for its excellent performance [32] and is one of the few DRL algorithms that can handle continuous action spaces. However, its training time is long and its model generalization is poor; these problems need to be addressed urgently [34], [35]. The SAC algorithm, a newer DRL algorithm presented in 2018 [36], can overcome these shortcomings. Currently, it has been applied in autonomous driving for decision-making [37] and path planning [38]. In this study, we use it to learn parking control strategies.
SAC is an improvement on the actor-critic framework. It borrows the experience replay mechanism of the deep Q-network to make the training data independent of each other. It also introduces strategy entropy to increase the randomness of actions, which encourages the vehicle to explore the environment. The strategy entropy is added directly into the Q function and the value function so as to maximize it [39].
where γ ∈ (0, 1) is the discount factor that balances the short-term and long-term rewards, ρ is a temperature parameter that trades off the entropy and the reward, and H(·|s_t) is the entropy, expressed as
As shown in Fig. 4, SAC consists of five neural networks (NNs): one policy network produces actions; one value network and one value-target network evaluate the current states; and two Q-function networks evaluate the effect of the action chosen in the current state.
Fig. 4. The network structure and the parameter update process of the SAC algorithm.
The pseudo-code for training the SAC algorithm parameters is displayed in Algorithm 1. Updating the NN parameters requires a certain amount of data. Before the formal training of the algorithm parameters, the vehicle randomly explores the environment to collect empirical data. Once the replay buffer D contains sufficient data, the NN parameters can be updated based on the replay buffer D.
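A minimal sketch of the replay buffer D referenced above; the capacity and batch size are illustrative assumptions rather than values from the paper.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay buffer D filled by random exploration before training
    and sampled during gradient updates (illustrative sketch)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=256):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```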
Referring to [36], the update principle of each network's parameters in the SAC algorithm is given below. The parameters of the value network are optimized by minimizing the loss function, which is expressed as
The gradient of the loss function is expressed as
The value-target network has the same structure as the value network.Its parameters are optimized by copying from the value network periodically.
For an algorithm with only one Q network (such as the deep Q-network or actor-critic), the value of events tends to be overestimated, which causes the optimization to converge easily to a local optimum. Van Hasselt et al. found that building two Q networks and selecting the smaller Q-network output as the target Q function during training can effectively alleviate the overestimation problem [40]. The two Q-function networks have the same structure. Their parameters are optimized by minimizing the soft Bellman residual function, which is expressed as
where Q̄(s_t, a_t) is the soft Bellman value of the state, expressed as
where V_ψ̄(s_{t+1}) is the output of the value-target network.
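The following PyTorch sketch illustrates the soft Bellman target and the min-of-two-Q trick described above; the discount factor and the network interfaces are assumptions for illustration only.

```python
import torch

def soft_bellman_target(reward, next_state, done, value_target_net, gamma=0.99):
    """Soft Bellman value Q_bar(s_t, a_t) = r_t + gamma * V_target(s_{t+1}),
    used as the regression target for both Q networks (gamma is an assumption)."""
    with torch.no_grad():
        v_next = value_target_net(next_state)
        return reward + gamma * (1.0 - done) * v_next

def min_double_q(q_net1, q_net2, state, action):
    """Take the smaller of the two Q estimates to mitigate overestimation,
    following the double-Q idea attributed to Van Hasselt et al. [40]."""
    return torch.min(q_net1(state, action), q_net2(state, action))
```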
Algorithm 1 Soft Actor-Critic
1: Initialize parameters Ψ, Ψ̄, θ, φ
2: for each iteration do
3:   for each environment step do
4:     Select action: a_t ← π_φ(a_t | s_t)
5:     Execute a_t, observe the next state and reward: s_{t+1} ← p(s_{t+1} | s_t, a_t), r_t = r(a_t, s_t)
6:     Store the observations in the replay memory D: D ← D ∪ (s_t, a_t, r_t)
7:   end for
8:   for each gradient step do
9:     Update the value network parameters: Ψ ← Ψ − λ_V ∇_Ψ J_V(Ψ)
10:    Update the Q-function network parameters: θ_i ← θ_i − λ_Q ∇_{θ_i} J_Q(θ_i), for i ∈ {1, 2}
11:    Update the policy network parameters: φ ← φ − λ_π ∇_φ J_π(φ)
12:    Update the target network weights: Ψ̄ ← τΨ + (1 − τ)Ψ̄
13:   end for
14: end for
The parameters of the Q-function networks are optimized by following the gradient of the soft Bellman residual function, which is expressed as
The parameters of the policy network are optimized by the loss function, expressed as
The gradient of this loss function for optimizing parameters is expressed as
where π_t is the probability function of the distribution of actions. For continuous-action-space DRL, the probability function of the action distribution is used as the basis of action choice, and only the action with the highest probability is selected [36]. It is expressed as
where μ and σ are the mean and standard deviation of the Gaussian distribution, respectively, ε is noise, and N denotes the standard normal distribution. The purpose of adding noise is to widen the range of action choices and increase vehicle exploration.
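A short sketch of the reparameterized action sampling described above; the tanh squashing added here is a common SAC convention and is an assumption, not something stated in the paper.

```python
import torch

def sample_action(mu, log_std):
    """Reparameterised sampling a = mu + sigma * eps with eps ~ N(0, I),
    squashed by tanh to keep actions bounded (the squash is an assumption)."""
    std = log_std.exp()
    eps = torch.randn_like(mu)          # standard normal noise
    raw_action = mu + std * eps         # reparameterisation trick
    return torch.tanh(raw_action)       # bounded action, e.g. in [-1, 1]
```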
The challenge in automatic parking lies in the limited freedom to adjust the vehicle's posture within confined spaces, which makes it difficult to ensure both safety and accuracy during the parking process. Additionally, the performance of DRL diminishes as task complexity increases. Zhuang et al. [41] proved that training efficiency can be improved by decomposing the parking process into three parking sub-processes. To address these issues, a segmented parking training framework (SPTF) is employed, as illustrated in Fig. 5. The complex parking task is broken down into two simpler tasks: a posture adjustment task and a parking task. The best starting state (c2) for the simple parking task, evaluated by the Euclidean distance between this state and the target state, can be calculated from the starting state (e2), the target state, the kinematic constraints, and the PDEVNTPL. The two tasks are trained independently. By reducing the difference between the target state and the starting state, the SPTF enlarges the space available for relative attitude adjustment. Therefore, the SPTF enables the AEV to effectively adjust its posture in crowded and limited spaces.
Among the components of the Euclidean distance, the heading angle difference accounts for the largest proportion. Consequently, the primary goal when calculating the best starting state for parking (state c2) is to minimize the heading difference. As depicted in Fig. 6(b), the maximum heading angle of the starting state of parking can be calculated as follows:
where d_lane is the width of the center lane, and L_AEV and W_AEV are the length and width of the AEV.
Substituting (24) into (23), φ_c2 is obtained as follows:
Fig. 5. Segmentation and training framework of segmented parking based on DRL.
Fig. 6. Automatic parking trajectory planning based on the curve.
The position of d2 is obtained as follows:
Choosing the maximum allowed value for the heading angle φ_c2, the lateral and longitudinal positions of the best starting state are obtained as follows:
where L_lot is the length of the parking lot.
x_c2 and y_c2 need to be adjusted according to the target state. Therefore, the lateral and longitudinal positions of a2 and b2 are calculated based on the variation of the target state (Δx_T, Δy_T, Δφ_T).
1) The Implementation of APS Based on DRL: The implementation of using DRL to solve practical problems includes building the environment and building the algorithm framework. The environment was built in Section II; the algorithm framework is built in this subsection. It generally includes five parts: confirming the state variables and control variables, establishing the reward function, constructing the NNs, setting the training mode, and confirming the training parameters.
The state variable is the observation of the AEV in the environment. It is generally expressed by position, velocity, and heading angle. For a more intuitive observation, this study projects each variable onto the two coordinate axes. It is expressed as
where (x_e, y_e), v_e, and φ_e are the location coordinates, speed, and heading angle of the vehicle, and (x_T, y_T), v_T, and φ_T are the location coordinates, speed, and heading angle of the target state.
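A possible reading of the 12-dimensional observation described above, in which the speed and heading of the ego vehicle and the target state are projected onto the two coordinate axes; the exact composition and ordering used in the paper may differ.

```python
import numpy as np

def build_observation(ego, target):
    """Assemble a 12-dimensional observation: each of the two states
    contributes its position plus its speed and heading projected onto the
    x/y axes (hedged interpretation of 'projected onto two coordinate axes').

    ego, target: dicts with keys 'x', 'y', 'v', 'psi' (psi in radians).
    """
    def project(s):
        return [s['x'], s['y'],
                s['v'] * np.cos(s['psi']), s['v'] * np.sin(s['psi']),
                np.cos(s['psi']), np.sin(s['psi'])]
    return np.array(project(ego) + project(target), dtype=np.float32)
```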
Control variables (actions) are the commands from the controller to the AEV. This study takes the control variables to be the acceleration a and the front wheel steering angle δ. They are defined as follows:
Parking safety (parking without collisions), parking comfort (the control variables vary smoothly), parking efficiency (the parking trajectory is smooth), and the deviation between the final parking posture and the target state are considered in this study. Therefore, the reward function consists of six parts corresponding to these four factors. It is expressed as
where R_ph considers the final parking posture, R_cf considers the parking comfort, R_v considers the parking efficiency, R_c is the collision penalty, R_s is the reward for success, and R_ad is used to reduce the lateral deviation of the parking.
R_n is used to strengthen posture adjustment. Since the target state can change in the actual parking scenario, the R_n in (43) is not feasible and is changed to
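For illustration, a hedged sketch of how the six reward terms named above could be combined. The functional form of each term and the weights (which the paper lists in Table II) are not reproduced here; the coefficients below are placeholders only.

```python
def parking_reward(posture_err, ctrl_jerk, speed, collided, success,
                   lateral_err, weights):
    """Hedged sketch of a six-term reward: posture (R_ph), comfort (R_cf),
    efficiency (R_v), collision penalty (R_c), success bonus (R_s), and a
    lateral-deviation term (R_ad). `weights` is a dict of placeholder values."""
    r  = -weights['ph'] * posture_err        # penalise distance to target posture
    r += -weights['cf'] * ctrl_jerk          # penalise abrupt control changes
    r +=  weights['v']  * speed              # encourage progress / efficiency
    r += -weights['c']  * float(collided)    # large penalty on collision
    r +=  weights['s']  * float(success)     # bonus for successful parking
    r += -weights['ad'] * lateral_err        # reduce lateral deviation
    return r
```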
The NN framework of the SAC algorithm is shown in Fig. 4. The two Q-function networks have the same architecture. Each network is composed of five layers: layer 1 has 14 units and takes the state and action information as input; layers 2–4 are hidden layers with 256 units each; layer 5 outputs the Q value.
The policy network, value network, and value-target network also share the same architecture. Each of these networks has five layers: layer 1 has 12 units and takes the state information as input; layers 2–4 are hidden layers with 256 units each; layer 5 outputs the mean and standard deviation of the action distribution or the value of the state.
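A PyTorch sketch of the network architectures described above; the layer sizes follow the text, while the activation functions are assumptions since the paper does not state them.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q-function network: 14 inputs (12-D state + 2-D action), three hidden
    layers of 256 units, one Q-value output."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(14, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class PolicyNetwork(nn.Module):
    """Policy network: 12 state inputs, three hidden layers of 256 units,
    outputting the mean and log-std of the 2-D action distribution."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(12, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU())
        self.mu_head = nn.Linear(256, 2)
        self.log_std_head = nn.Linear(256, 2)

    def forward(self, state):
        h = self.backbone(state)
        return self.mu_head(h), self.log_std_head(h)
```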
After much trial and error, the parameters of the DRL are determined. The weight coefficients of the reward function and the hyper-parameters are displayed in Tables I and II. The simulation frequency is 15 Hz. The action sampling time is 0.2 s. One episode lasts until 50 steps are reached, parking succeeds, or a collision occurs.
TABLE I THE HYPER-PARAMETERS OF SAC AND DDPG ALGORITHMS
2) The Implementation of Motion Planning: Motion planning includes trajectory planning and speed planning. Safety, comfort, and accuracy are taken into account in trajectory planning. Speed planning takes comfort and driving efficiency into consideration.
The trajectory planning method used in this study is shown in Fig. 6. Arcs and straight lines that satisfy the kinematic constraints are used to form a trajectory curve that connects the starting state to the target state. To solve the problem of curvature discontinuity, a cubic spline curve is used for interpolation fitting of the trajectory. The trajectory for segmented parking (as shown in Fig. 5(b)) is calculated using (23) to (28). The equations of the trajectory for general parking (Fig. 5(a)) are the same as those for segmented parking, with the following changes: d_lane and φ_c2 are set to y_s and 0.5π, where y_s is the longitudinal position of the starting position.
TABLE II THE WEIGHT COEFFICIENT OF REWARD FUNCTION
The principle of cubic spline interpolation fitting is to solve for the cubic polynomial function between adjacent nodes. It is represented as follows:
By substituting the boundary values, (45) is derived to obtain the solution
where h_i = x_{i+1} − x_i, and m_i, i ∈ [0, 1, ..., N], is the solution to the following equation:
Comfort is considered in the objective function of the algorithm. Speed planning is as follows:
where k, k ∈ {−1, 1}, is the directional coefficient, d is the distance between the AEV and the starting position, d_0 is the distance between the starting position and the target position, and v_max is the maximum allowed speed.
The parameters of motion planning are set as follows: the trajectory sampling interval is 0.1 meters; the maximum allowable speed is v_max = 3 m/s; and the boundary condition of the cubic spline is the natural spline.
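As an illustration of the trajectory smoothing step, the following sketch fits a natural cubic spline through the arc/line waypoints; it uses scipy's CubicSpline rather than solving Equations (45)–(47) directly, and the chord-length parameterisation is an assumption added here.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def smooth_trajectory(waypoints_x, waypoints_y, ds=0.1):
    """Fit a natural cubic spline through the waypoints and resample it every
    `ds` metres (0.1 m sampling interval and natural boundary as in the text)."""
    waypoints_x = np.asarray(waypoints_x, dtype=float)
    waypoints_y = np.asarray(waypoints_y, dtype=float)
    # Parameterise by cumulative chord length so the curve can bend freely in x-y.
    s = np.concatenate(([0.0], np.cumsum(np.hypot(np.diff(waypoints_x),
                                                  np.diff(waypoints_y)))))
    cs_x = CubicSpline(s, waypoints_x, bc_type='natural')
    cs_y = CubicSpline(s, waypoints_y, bc_type='natural')
    s_dense = np.arange(0.0, s[-1], ds)
    return cs_x(s_dense), cs_y(s_dense)
```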
3) The Implementation of Tracking Control Based on MPC: The MPC algorithm is known for its simulation and prediction capabilities. It aims to find a set of optimal solutions within the prediction time horizon through iterative optimization. A linear system model exhibits favorable prediction performance. Therefore, the implementation steps for MPC path tracking involve the linearization of the tracking control model and the construction of the objective function.
The linear equation of the approximate system can be obtained by the linear kinematic model (Equation (6))
where
The purpose of the objective function is to reduce the tracking error. Comfort is also considered in this study. The objective function is set as follows:
where Q_MPC, R_MPC, and R_d are weight matrices, and N is the number of prediction steps (the predictive time domain).
After much trial and error, the MPC parameters are set as follows: N = 30, Q_MPC = [2 2 1 1]^T, R_MPC = [0.1 0.1]^T, R_d = [0.01 0.01]^T. The maximum allowable loss value is 0.01. The maximum number of iteration steps is 100.
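A compact sketch of one step of the linear tracking MPC described above, formulated with cvxpy. The matrices A and B are assumed to come from the linearized model (6); the default weights follow the values listed above as interpreted here, and the paper's iterative solver and additional constraints are not reproduced.

```python
import numpy as np
import cvxpy as cp

def mpc_tracking_step(A, B, x0, x_ref, N=30,
                      q=(2, 2, 1, 1), r=(0.1, 0.1), rd=(0.01, 0.01)):
    """One MPC step for x_{k+1} = A x_k + B u_k (hedged sketch).

    x_ref: array of shape (N+1, nx) of reference states sampled along the
    planned trajectory; returns the first optimal control input.
    """
    nx, nu = A.shape[0], B.shape[1]
    Q, R, Rd = np.diag(q), np.diag(r), np.diag(rd)
    x = cp.Variable((nx, N + 1))
    u = cp.Variable((nu, N))
    cost, constraints = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.quad_form(x[:, k] - x_ref[k], Q) + cp.quad_form(u[:, k], R)
        if k > 0:
            cost += cp.quad_form(u[:, k] - u[:, k - 1], Rd)   # penalise input changes
        constraints.append(x[:, k + 1] == A @ x[:, k] + B @ u[:, k])
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u[:, 0].value   # apply only the first control input (receding horizon)
```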
4) The Implementation of Tracking Control Based on LQR and PID: Previously, researchers have explored the idea of breaking down a multi-input tracking control system into multiple single-input tracking control systems by decoupling the control variables [42]. However, it should be noted that a complete tracking control system is still interconnected through mutual state variables. One common approach involves designing the path tracking control system and the speed tracking control system separately, with the overall tracking control system being coupled through the velocity variable. In this study, the path tracking control system is designed using the LQR method, while the speed tracking control system is designed using a PID controller.
LQR is also used to solve linear system problems. A linear model for tracking control is given as follows:
where
d_e(t) and φ_e(t) are the position error and heading angle error, respectively.
The cost function is defined as follows:
where Q_LQR and R_LQR are weight matrices, and M is the predictive time domain.
The solution of LQR is as follows:
Substituting (44) into (43) and taking M = ∞, the equation for K is obtained as follows:
where P is the solution to the following Riccati equation:
PID is used for speed tracking control. Its formula is expressed as follows:
where e(t) is the error function.
After many trials and errors, the parameters of the PID and LQR are determined as follows: Q_LQR = eye[100 0.10 0.1], R_LQR = [0.1]. The maximum number of iterations is 150. The maximum allowable loss value is 0.01. The values of k_p, k_i, and k_d are set to 2, 0.001, and 0.1, respectively.
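The following sketch computes the LQR feedback gain from the discrete Riccati equation and implements a PID speed tracker with the gains reported above. Using scipy's DARE solver instead of the paper's iterative procedure, and the sampling time, are assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    """Infinite-horizon discrete LQR gain K (hedged sketch).
    K = (R + B^T P B)^-1 B^T P A, with P the DARE solution."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

class SpeedPID:
    """PID speed tracker with the gains reported in the text
    (kp = 2, ki = 0.001, kd = 0.1); the time step dt is an assumption."""
    def __init__(self, kp=2.0, ki=0.001, kd=0.1, dt=0.2):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, v_ref, v):
        error = v_ref - v
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```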
5) The Implementation of APS Based on DDPG and OBCA: The basic trajectory is planned based on DDPG, and then the trajectory is optimized by optimization-based collision avoidance (OBCA) to obtain a safe and comfortable parking trajectory [43]. The implementation process of DDPG is the same as that of SAC (30)–(44), so it is not repeated here.
The implementation of the OBCA-based optimization of the path planned by DDPG includes the construction of the objective function and the addition of safety constraints. The basis of the trajectory optimization is the basic trajectory; therefore, the objective function is similar to that of path tracking. It is expressed as follows:
where Q and R are weight matrices whose values are selected as in the MPC, x_ref denotes the points sampled from the basic trajectory, x and u are the state variables and control variables, and M is the number of sampled points.
The constraint types are kinematic constraints, obstacle avoidance constraints, control variable range constraints, and trajectory start and end constraints. They are expressed as follows:
where x_start and x_target are the start point and target point of the parking trajectory, f(x_k, u_k) is the vehicle kinematics model (6), (o1, o2, o3, o4) and (v1, v2, v3, v4) are the four corners of the rectangles representing the obstacles and the vehicle, S_Δ is the triangular area, and S is the rectangular area.
This section evaluates the performance of the SAC-based segmented parking strategy. The simulation environment is constructed with Python 3.7 [44]. Firstly, the convergence of the DRL training strategies is analyzed. Secondly, the parking performance of the AEV is compared between the DRL methods and the traditional methods. Thirdly, the influence of the PDEVNTPL on automatic parking is quantified in terms of safety, comfort, efficiency, and accuracy. Finally, the effect of the SPTF in reducing the influence of the PDEVNTPL is analyzed.
In this subsection, the training process is analyzed. The SAC and DDPG algorithms are used to train the AEV to park safely, comfortably, efficiently, and accurately. The learning effect is reflected by the cumulative reward of an episode. The AEV must learn to balance the four elements to maximize the cumulative reward. During training, the parking policy improves as the cumulative reward increases. When the cumulative reward converges to a value with small fluctuations, the parking policy has reached the optimum.
The automatic ego vehicle is trained for over 100 000 time steps (2000 episodes). The average reward during training is shown in Fig. 7. The AEV is trained with both SAC and DDPG. Both cumulative reward curves rise first and finally converge to a value. Clearly, the cumulative reward of SAC converges earlier (600 episodes) than that of DDPG (1800 episodes), while the convergence values of the two algorithms differ little. This means that SAC has higher training efficiency while ensuring a comparable learning effect.
Fig. 7. The average reward of the SAC and DDPG training processes.
According to the analysis of the above experimental results, the SAC algorithm can greatly improve training efficiency while ensuring the same convergence performance as DDPG.
In this subsection, the parking performance of the DRL methods, the traditional methods, and a scheme combining DRL and traditional methods is compared in the ideal parking scenario. The DRL methods are the SAC algorithm and the DDPG algorithm [15]. The traditional methods use a spline curve with MPC for path planning and tracking control (S+MPC) [12], and a spline curve with LQR and PID for path planning and tracking control (S+LQR+PID) [11]. The scheme combining DRL and traditional methods uses DDPG for path planning and OBCA for trajectory correction and tracking control (DDPG+OBCA) [43]. The performance of each method is analyzed in terms of safety, comfort, efficiency, and accuracy.
The metrics used to evaluate automated parking systems are generally safety, comfort, efficiency, and accuracy [15], [22]–[24]. Safety is concerned with whether any collisions occur during the parking process. Comfort assesses whether the control variables, namely acceleration and steering angle, experience frequent and significant fluctuations. Efficiency measures the duration of the parking process. Accuracy evaluates the deviation between the final parking state and the desired target state. In the ideal parking scenario, the AEV consistently parks without any collisions. Consequently, the analysis of the APS focuses on the comfort, efficiency, and accuracy aspects, examining factors such as parking deviations, control variable smoothness, and parking time.
The reward function assigns varying weights to the different criteria, with accuracy weighted second only to safety. This weighting prioritizes optimizing accuracy under the condition that safety is ensured. The parking deviations of all methods are tabulated in Table III. All methods meet the BS ISO 16787 standard, except for the lateral deviation of S+MPC. Notably, the DRL methods show smaller deviations than the traditional methods. For instance, the lateral deviation of S+MPC and the heading deviation of S+LQR+PID are more pronounced, whereas the deviations of DDPG and SAC remain relatively modest, both in position (below 0.1 meters) and in heading angle (below 1 degree). The deviation of DDPG+OBCA is comparable to, or even smaller than, that of the DRL methods: the performance of the DDPG technique is already commendable, and re-optimization further enhances it.
TABLE III THE PARKING DEVIATIONS IN IDEAL PARKING SCENARIO AND BS ISO16787 FOR COMPARISON
Fig. 8 shows the control variables during parking in the ideal parking scenario. To observe the parking comfort directly, the smoothness of each control variable is calculated using the following equation [41]:
where ā is the average value of the set of control variables, a_i is an element of the set, and n is the length of the set.
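A hedged reconstruction of the smoothness metric defined above, read here as the root-mean-square deviation of the control sequence from its mean; lower values indicate smoother control.

```python
import numpy as np

def smoothness(control_sequence):
    """Smoothness of a control-variable sequence (hedged reconstruction of
    the metric referenced above): RMS deviation from the sequence mean."""
    a = np.asarray(control_sequence, dtype=float)
    return np.sqrt(np.mean((a - a.mean()) ** 2))
```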
Fig. 8. The acceleration and steering angle for automatic parking in the ideal parking scenario.
TABLE IV THE SMOOTHNESS OF ACCELERATION AND STEERING ANGLE IN IDEAL PARKING SCENARIO
Table IV presents the smoothness of the control variables. Generally, both S+MPC and DDPG+OBCA exhibit superior smoothness (lower values) compared with the two DRL methods. S+LQR+PID displays the least favorable smoothness (maximum values), particularly for the steering angle. This is consistent with the fact that model-based methods (MPC and OBCA) can adeptly fine-tune parameters based on the model and excel in individual aspects, although finding a parameter set that ensures holistic optimization remains a challenge. LQR enhances real-time performance by maximizing subsequent state returns, which limits its agility in coping with abrupt state transitions; sharp curvature fluctuations then necessitate substantial control variable adjustments to keep the tracking error small. Conversely, DRL optimizes policies by considering both immediate actions and their downstream consequences, preserving global performance optimality. The relative reduction in comfort of DRL compared to the model-based methods (MPC and OBCA) is attributed to DRL's primary training focus on safe and precise parking, with comfort inadvertently sacrificed. This compromise is further compounded by the pursuit of parking efficiency, which also comes at the expense of comfort. As a result, DRL exhibits suboptimal performance in the comfort aspect.
Parking efficiency and the smoothness of acceleration are often conflicting objectives. Maximizing efficiency requires aggressive acceleration and deceleration, leading to significant changes in acceleration during the parking process; as a result, the smoothness of acceleration is unavoidably affected. The parking times are presented in Table V. The DRL methods prioritize reducing the parking time over maintaining smooth acceleration, thereby sacrificing trajectory smoothness. For instance, the parking duration of the DDPG method is 7 seconds, while that of DDPG+OBCA extends to 13.6 seconds: OBCA substantially enhances the smoothness of the control variables at the cost of a considerable reduction in parking efficiency.
Based on the experimental analysis, it is evident that the DRL methods outperform the traditional approaches in terms of automated parking performance. Compared with the state-of-the-art methods, they achieve high precision and notably greater efficiency, albeit at the cost of reduced comfort. While the DRL methods might not offer optimal comfort, this trade-off is made to attain higher efficiency and accuracy.
TABLE V THE PARKING TIME OF AUTOMATIC PARKING IN IDEAL PARKING SCENARIO
PDEVNTPL is common in real-life situations and significantly affects automatic parking. This is clearly illustrated in Fig. 9, where the PDEVNTPL threatens the successful execution of automatic parking. In this subsection, the impact of the PDEVNTPL on automatic parking is quantified through tests conducted in actual parking scenarios.
Fig. 9. The influence of the PDEVNTPL on automatic parking in the actual parking scenario. (a) The rear of the vehicle takes up parking space and causes a collision; (b) The head of the vehicle blocks the entrance by occupying the parking space.
During the training of the DRL, the lateral deviation of the adjacent vehicles is randomly set between 0 and 0.5 meters, while the heading angle deviation is randomly set between -10 and 10 degrees. It is observed that the influence of the heading angle deviation is more significant. During testing, the heading angle deviation therefore remains random within the range of -10 to 10 degrees to ensure a continuous influence on automatic parking, while the lateral deviation is increased sequentially. To study the extent of the influence of the PDEVNTPL, the parking space is gradually reduced and the parking difficulty is increased. The success rate, average smoothness of the control variables, and average deviations of the final state are computed over 1000 test results. These metrics quantify the degree of influence of the PDEVNTPL on automatic parking.
The success rates of the experiments are presented in Table VI. As the lateral deviation increases, the parking space gradually diminishes, which heightens the parking difficulty. Consequently, the success rates of all methods decline. At the maximum PDEVNTPL value (0.5), the success rates of the S+MPC and S+LQR+PID algorithms plummet by over 30%, and the success rates of SAC and DDPG decrease by 29%. Even for DDPG+OBCA, which enforces safety constraints for assurance, the success rate drops by almost 10%. The safety constraints ensure that the ego vehicle avoids collisions with the neighboring vehicles, but they do not guarantee safety throughout the entire trajectory from start to finish. Given the confined parking space, certain scenarios that prevent collisions with adjacent vehicles inadvertently lead to collisions with vehicles in the upper parking spaces.
TABLE VI THE SUCCESS RATE OF GENERAL PARKING WITH VARIATION OF DEVIATION
The parking deviations are presented in Table VII. All deviations with the PDEVNTPL exceed the corresponding deviations without the PDEVNTPL (Table III). The influence of the adjacent vehicles' parking deviations on the autonomous parking of the AEV is mainly reflected in the heading angle deviations and lateral deviations. The lateral deviations of S+MPC, S+LQR+PID, DDPG, DDPG+OBCA, and SAC increased by 230%, 153%, 300%, 200%, and 359%, respectively; the heading angle deviations increased by 196%, 21%, 32%, 101%, and 142%, respectively. These deviations are primarily impacted by the PDEVNTPL. Notably, the influence on the longitudinal deviation is minimal, because the increased complexity of posture adjustment stems from the alteration of the target state, which has no bearing on velocity planning.
TABLE VII THE PARKING DEVIATIONS OF GENERAL PARKING IN ACTUAL PARKING SCENARIO
Regarding parking comfort, the smoothness of the control variables is presented in Table VIII. In comparison to parking without the PDEVNTPL (Table IV), all methods exhibit poorer (higher) steering angle smoothness values, with relatively minor variations in acceleration smoothness. Fig. 10 provides an intuitive view of the results. To accommodate the deviation between the target state and the desired state, the steering angle requires large adjustments, whereas the acceleration needs only minor adjustments to accommodate the change in distance between the starting and target points. The steering angle smoothness values of S+MPC, S+LQR+PID, DDPG, DDPG+OBCA, and SAC increased by 27%, 6%, 9%, 72%, and 58%, respectively, indicating a decline in smoothness. Notably, both S+LQR+PID and DDPG exhibit minimal variation in steering angle smoothness because their comfort levels are already poor, as evident in Table IV; the degree of variation decreases as they approach worst-case values.
TABLE VIII THE ACCELERATION AND STEERING ANGLE SMOOTHNESS OF GENERAL PARKING IN ACTUAL PARKING SCENARIO
Fig. 10. The acceleration and steering angle of general parking in the actual parking scenario.
Based on the above analysis of the experimental data, the PDEVNTPL affects the automated parking of all methods. It primarily influences the success rate, lateral deviation, heading angle deviation, and steering angle smoothness. Under the influence of the PDEVNTPL, safety, accuracy, and comfort are reduced by approximately 27%, 54%, and 26%, respectively.
In this subsection, the SPTF is employed to minimize the impact of the PDEVNTPL in actual parking scenarios. The path planning of the traditional methods also follows the principles of the SPTF. By comparing the parking safety, accuracy, comfort, and efficiency of general parking and SPTF-based parking, the advantages of the SPTF are verified. To ensure an accurate evaluation of the parking strategy performance, data from 1000 tests are collected, and the average values are used as the main criteria.
In terms of parking safety, the success rates are presented in Table IX. The experimental results indicate that success rates decrease with increasing PDEVNTPL values. However, even when the PDEVNTPL reaches its maximum (0.5), the success rate of the SAC method remains above 93%, surpassing the success rates of both DDPG and DDPG+OBCA. The success rates of the traditional methods also improve (by over 20%), albeit not as prominently as SAC. The direct cause of the enhanced parking safety is the SPTF, which ensures that the AEV initiates parking in the best starting state. As depicted in Fig. 11, in general parking the initial heading angle usually differs from the target heading angle by 90 degrees, and there is also a larger lateral distance to cover. The SPTF ensures that the initial state is not only closer to the target state but also has a smaller angular difference, allowing the AEV to achieve parking through simple adjustments. Moreover, the best starting state determined by the SPTF takes the PDEVNTPL into account, enabling the AEV to adapt to non-ideal target states induced by the PDEVNTPL.
TABLE IX THE SUCCESS RATE OF SPTF-BASED PARKING AND GENERAL PARKING WITH VARIATION OF DEVIATION FOR COMPARISON
Fig. 11. General parking and SPTF-based parking processes. (a) In general parking, the AEV is unable to adjust its attitude because of the large distance between the starting state and the target state, resulting in a collision; (b) In SPTF-based parking, the distance between the starting state and the target state is small, and the AEV parks easily.
In terms of accuracy, the parking deviations are presented in Table X. It can be observed that the parking deviations are all smaller than those of general parking, and the lateral deviation is reduced to the level of parking without the PDEVNTPL (Table III). However, a relatively large deviation still exists for these four methods. Nevertheless, the success rate of the SAC algorithm surpasses that of the other algorithms. This implies that even under larger PDEVNTPL values, SAC is capable of parking safely. These cases present greater challenges to the parking task and therefore result in larger deviations: SAC prioritizes safety over the other performance aspects, sacrificing them to ensure a higher level of safety during parking.
TABLE X THE PARKING DEVIATIONS OF SPTF-BASED PARKING AND GENERAL PARKING IN ACTUAL PARKING SCENARIO FOR COMPARISON
In terms of parking comfort, the smoothness of the control variables is illustrated in Table XI. It can be observed that, in comparison to general parking, the smoothness of all methods is enhanced (the values are lower), particularly for the steering angle. By employing the SPTF, the parking task is divided into two phases: posture adjustment and simple parking. While the total amount of posture adjustment remains constant, the amount of posture adjustment within each individual task decreases. This partitioning enables a smoother and more comfortable parking experience. The best starting state further reduces the overall amount of posture adjustment required, consequently minimizing abrupt changes in the control variables. As depicted in Fig. 12, the smoothness of SAC is still inferior (higher values) compared to S+MPC and DDPG+OBCA. This disparity can be attributed to SAC's primary focus on optimizing safety during the parking process; comfort is further sacrificed to enhance parking efficiency.
TABLE XI THE ACCELERATION AND STEERING ANGLE SMOOTHNESS OF SPTF-BASED PARKING AND GENERAL PARKING IN ACTUAL PARKING SCENARIO FOR COMPARISON
Fig. 12. The acceleration and steering angle of SPTF-based parking in the actual parking scenario.
TABLE XII THE PARKING TIME OF SPTF-BASED PARKING AND GENERAL PARKING FOR COMPARISON
The parking durations based on the SPTF are presented in Table XII. The experimental results demonstrate that the parking times of most methods increase by more than 200% compared to general parking. This is a drawback of the SPTF approach. However, sacrificing some efficiency in favor of enhanced safety and accuracy is reasonable. Among these methods, S+MPC, DDPG, and SAC exhibit shorter parking durations than S+LQR+PID. SAC achieves higher success rates, and since the parking duration increases as the deviation of the target state increases, the parking duration of SAC would naturally be longer if the parking speeds were the same. Overall, SAC performs better in terms of parking duration.
The analysis of the experimental data shows that the SPTF effectively mitigates the impact of the PDEVNTPL on automated parking, leading to a notable enhancement in automated parking performance. The parking success rate exceeds 90%, and the parking deviations are reduced to a level comparable to parking without the PDEVNTPL. There is also a certain degree of improvement in comfort. Among the tested algorithms, SAC demonstrates the most pronounced performance improvement.
To assess the impact of the parking deviation caused by other vehicles on automatic parking, we conducted a quantitative analysis. It was observed that the PDEVNTPL significantly affects various aspects of automatic parking performance, with safety, comfort, and accuracy declining by more than a quarter. To mitigate this influence, we proposed a SAC-based SPTF approach, which successfully reduces the impact of the PDEVNTPL; consequently, the safety, comfort, and accuracy of parking are substantially improved. However, it is important to note that the DRL-based APS used in this study is not suitable for unfamiliar scenarios because of the inherent characteristics of RL and the specificity of the reward function. Moreover, DRL methods cannot incorporate state or control variable constraints to ensure vehicle safety in abnormal conditions. Furthermore, the study did not consider the influence of moving obstacles near the target parking lot or the impact of incomplete environmental information resulting from sensor failures on automatic parking. These factors will be addressed in future research, which will focus on autonomous parking scenarios where the ego vehicle interacts with nearby vehicles. Additionally, future work will explore the safety implications of varying loss ratios, types, and distances of environmental information. Finally, the parking control strategy will be transferred to real-world vehicles for experimental validation.