Optimal reinforcement learning and probabilistic-risk-based path planning and following of autonomous vehicles with obstacle avoidance

In this paper, a novel algorithm is proposed for the motion planning and path following of automated cars that incorporates a collision avoidance strategy. The approach couples optimal reinforcement learning (RL) with a new risk assessment method. For this purpose, a probabilistic function-based collision avoidance strategy is developed, and the proposed RL approach learns the probability distributions of the adjacent and leading vehicles. Subsequently, a nonlinear model predictive control (NMPC) algorithm approximates the optimal steering input and the required yaw moment to follow the safest and shortest path within the optimal RL-based probabilistic risk function framework. Additionally, the travel speed of the ego vehicle is kept stable so that ride comfort is preserved for the vehicle occupants. The steering system dynamics are also incorporated to capture the vehicle dynamics characteristics more thoroughly. Different driving scenarios are employed to evaluate the effectiveness of the proposed algorithm.


Introduction
Powerful motivations to employ automated cars include passenger safety, driving comfort, improved car performance, and efficiency in terms of time and infrastructure deployment [4,5]. Statistics reveal that drivers are the main cause of almost 90% of casualties in road accidents, which could potentially be avoided by employing automated cars [6,7]. To accomplish this goal, automated cars should hold an adequate level of intelligence to demonstrate effective decision-making and environmental awareness when handling severe traffic scenarios and hazardous road conditions. This is essential when the driving task is wholly taken over by the car. Optimal path planning based on road conditions, potential obstacles, and traffic regulations is crucial for automated cars. Accordingly, developing a framework that equips cars with optimized route planning algorithms based on potential road obstacles and available road space is still an active research field.
Path-planning combined with the obstacle avoidance paradigm has been extensively investigated in the literature for non-holonomic robots [8,9]. However, path-planning with obstacle avoidance for cars requires extra considerations, such as road regulations, free maneuvering space, and the dynamic constraints related to the vehicle components and system states. Overall, these factors turn the path-planning and following problem into a hugely challenging task [10]. It is also essential to devise a strategy for real-time path-planning because of the risk of obstacles emerging on the road. The recently employed path-planning techniques for automated cars include artificial potential field (APF) methods [10], random search methods [11], and variants of optimal control such as nonlinear model predictive control (NMPC) [2,4]. Huang et al. [12] employed an APF approach to designate several distinct potential functions for possible obstacles and road barriers. Moreover, the obstacle-free areas were meshed and utilized as safe driving zones. As a result, the driving path was planned spatiotemporally. The APF approach can designate various potential functions to complex obstacles and road barriers and set the desired path accordingly. However, the APF approach does not necessarily account for the optimal vehicle dynamic response required to follow the desired path. Rasekhipour et al. [13] developed a combined model predictive and APF algorithm for planning the optimal path, in which the objectives incorporated the obstacle-based potential functions together with the vehicle dynamics constraints.
The path-planning strategy typically accounts for each individual road barrier under the operating conditions, such as the shortest vehicle-obstacle distance and whether the obstacle is visible to the approaching vehicle. Among the commonly employed strategies, a single MPC is typically sluggish in dealing with two-dimensional collision-free maneuvering when an obstacle avoidance cost is incorporated. Additionally, the optimal control problem treats different types of obstacles with identical functions and without the road regulations. A hybrid path-planning paradigm for automated cars in constrained environments was proposed by considering various constraints related to the path geometry, vehicle dynamics and holonomicity [14]. In this approach, a non-derivative-based global search algorithm was employed to derive the higher-order state information for state sampling. Yoon et al. [15] proposed a recursive path-planning algorithm that reduces the states of the search space and incorporates several factors such as the geometry of the road and the dynamics of the car. Additionally, the framework operates through two heuristics-based, constraint-wise node expansion approaches that correct the future path according to the available geometry and cornering space [18]. Apart from the optimality of the path to take, it is also crucial to consider all surrounding risks of the ego vehicle and guarantee that the intended direction is reliable and reasonably safe [21]. Kim et al. [19] developed an algorithm based on potential risk assessment that evaluates the possible collision risk with obstacles in the driving situation and identifies the safest path to take. In addition, an integrated path planning and path-following strategy was proposed in Chen et al. [22] according to the velocity prediction for the leading vehicle, employing a composite nonlinear feedback controller design for path following and an input-output hidden Markov model (HMM) for estimating the velocity of the leading car. In the area of the dynamic field, variants of curve shapes such as Bezier curves [23], cubic polynomials [24], and quintic polynomials [25] have proven effective for generating smooth vehicle trajectories during the path planning process. Lattice-based path planning frameworks [26,27] as well as multi-objective trajectory planning based on evolutionary algorithms [28-30] have also been proposed as potential approaches. However, such strategies mainly incorporate the kinematic constraints but lack collision risk and obstacle avoidance considerations, demonstrating limited effectiveness in practical applications. In Liang et al. [31], a local motion planning framework was conceived and connected with a cruise control algorithm using adaptive MPC. Another lateral MPC controller then performed the lateral path-tracking for the global path. In Hang et al. [32], a tube-based MPC method was utilized to control autonomous electric vehicles with active safety considerations at driving limits by controlling four-wheel steering (4WS) and direct yaw-moment control (DYC). Because the sideslip angle, as a critical state, plays a significant role in achieving the desired tire forces for controlling the vehicle, the estimation of the sideslip angle was advanced in Xia et al. [33] by combining vehicle kinematic and dynamic states and feeding them to a fuzzy logic system. Other hybridized local and global path-planning methods, such as the visibility graph method, have also been introduced to control autonomous vehicles, combined with NMPC control algorithms for path-tracking purposes [34]. Despite the particular merits of the reviewed approaches, path planning for situations where the existence of obstacles and barriers is unknown, or where no information exists on their dynamic states, remains a challenge for optimal path planning problems.
The reviewed literature indicates that optimal path planners and predictive models have been widely employed so far. However, these algorithms tend to lack dynamic interaction with the surrounding environment, which is needed to plan the optimal path adaptively. Furthermore, a varying risk growth function expressed through probabilistic functions suggests greater reliability and generality. Moreover, reinforcement learning-based risk assessment is broadly considered a proper solution for the collision-free path-planning of automated cars. In this paper, the path planning of automated cars is explored based on the following main contributions: (i) a novel optimal path-planner paradigm to avoid obstacle collisions is proposed by employing the optimal reinforcement learning algorithm combined with probabilistic risk assessment and (ii) the potential risks of car-obstacle collisions, based on a growth function, are considered to be uniformly distributed but unequal in magnitude. Hence, the proposed algorithm combines the merits of unstructured obstacle avoidance control according to the nearest neighbor principle and the deep learning benefit of the RL algorithm with the optimality search of a nonlinear model predictive control (NMPC) paradigm. The optimal path is computed based on the approaching obstacles, road structures, and the dynamic response of the automated car in terms of the constrained inputs and states.
The structure of the present paper is laid out as follows. In Section II, the dynamics of an automated car are formulated. Section III presents the path-planning problem, prospective obstacles and constraints, and the probabilistic risk assessment. Section IV describes the optimal RL algorithm. In Section V, the path planning paradigm is investigated based on numerous simulations under various operating situations, and the results are discussed in further detail. Finally, Section VI concludes the paper.

Problem formulation
The dynamic response of a vehicle closely depends on the directional forces and resulting moments generated by the pneumatic tires. However, only the forces developed by the tires in the lateral direction have a substantial effect on the handling performance of the vehicle, whereas the longitudinal force components affect the handling dynamics only marginally. The longitudinal acceleration and the resultant force components of the car are therefore neglected. However, the vehicle's longitudinal speed must be adequate for producing the lateral forces in proportion to the slip angles, based on the well-known tire models.
Furthermore, the roll effect of the lateral weight transfer during cornering is considered negligible owing to the adequately adjusted suspension setting. Therefore, a bicycle model with two degrees of freedom is applied to describe the main dynamic modes of the vehicle in the yaw plane of motion, owing to the symmetry between the right- and left-side tracks (Figure 1). Vehicle yaw stability implies that the yaw angle, the so-called heading angle, is essentially taken as the controlled parameter. Furthermore, maintaining the vehicle yaw velocity γ in the vicinity of the desired value that professional drivers achieve is a substantial step in the functional control of vehicles during cornering. The lateral offset error in automated cars can be explained as the shortest distance between the desired trajectory and the vehicle, taken as an orthogonal projection. The vehicle yaw rate γ is computed as the first-order time derivative of the heading angle θ, and the difference between the actual yaw rate and the desired yaw rate, γ_d, defines the yaw rate error γ_e. Consequently, vehicle dynamics for the path-following task can be formulated as: where R(r) represents the road radius of curvature and r is the arc length of the position to track instantaneously, which varies as a function of the road trajectory. Furthermore, v_x and v_y denote the vehicle's longitudinal and lateral speed components described in the body-attached reference system. Moreover, y and ẏ denote the lateral displacement and velocity of the vehicle at the center of gravity (C.G.), respectively. A primary goal in developing automated cars is to guarantee the minimum lateral offset of the car by designing a controller that drives γ_e and ẏ_e to zero, where ẏ_e = ẏ − ẏ_d. Additionally, y_d is the desired trajectory of the car in the lateral direction. Consequently, the vehicle should be able to follow the desired trajectory satisfactorily. Integrated Active Front Steering (AFS) coupled with Direct Yaw-Moment Control (DYC) offers remarkable benefits across different variants of control approaches [35,36]. Thus, a 2-DOF yaw model of the vehicle can safely predict the dominant responses for the purpose of path-planning and trajectory following. These equations can be briefly written as: where ΔT represents the acting yaw moment, and F_yf and F_yr denote the tire lateral forces at the front and rear wheels, respectively. Furthermore, l_f and l_r represent the associated wheelbase elements, and m and I_z represent the total mass and yaw inertia, respectively. Consequently, the exerted moment with the track width l_b is described: where F_xij represent the longitudinal forces exerted on the front- and rear-axle wheels. There exists a proportional relationship between the tire lateral force and the side slip angle. Therefore, the lateral force components are essentially computable based on the front and rear tire cornering stiffness parameters (i.e. C_f and C_r): The nonlinear cornering characteristics of the tire may be captured in terms of the uncertainty about the nominal tire cornering stiffness as follows: where the nominal cornering stiffness values of the front and rear tires describe the linear region of the tire force-deflection curve, and ΔC_f and ΔC_r describe the bounded uncertainties of the tire cornering stiffness at the front and rear wheels, respectively. Additionally, the side slip angles associated with the front and rear tires can be described as: The steering system dynamics are also incorporated in the present study to explore the vehicle response thoroughly. The steering system dynamics can drastically affect the vehicle's dynamic response and its capacity to follow the path formed by the developed algorithm.
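As a minimal sketch of the 2-DOF yaw-plane model described above, the following Python snippet integrates a textbook linear bicycle model with an added yaw moment. The equation forms, the sign conventions, the function names, and every numerical value in `PARAMS` are assumptions chosen for illustration; they are not taken from the paper's equations or from Table 1.

```python
def slip_angles(v_y, yaw_rate, delta, v_x, l_f, l_r):
    """Small-angle front/rear tire slip angles (assumed sign convention)."""
    alpha_f = delta - (v_y + l_f * yaw_rate) / v_x
    alpha_r = -(v_y - l_r * yaw_rate) / v_x
    return alpha_f, alpha_r


def bicycle_derivatives(v_y, yaw_rate, delta, yaw_moment, p):
    """Time derivatives of lateral speed and yaw rate for a linear bicycle model."""
    alpha_f, alpha_r = slip_angles(v_y, yaw_rate, delta, p["v_x"], p["l_f"], p["l_r"])
    f_yf = p["C_f"] * alpha_f  # lateral force proportional to slip angle
    f_yr = p["C_r"] * alpha_r
    v_y_dot = (f_yf + f_yr) / p["m"] - p["v_x"] * yaw_rate
    yaw_rate_dot = (p["l_f"] * f_yf - p["l_r"] * f_yr + yaw_moment) / p["I_z"]
    return v_y_dot, yaw_rate_dot


def simulate(delta, yaw_moment, p, t_end=5.0, dt=0.001):
    """Forward-Euler integration with constant inputs, starting from zero states."""
    v_y, yaw_rate = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        v_y_dot, yaw_rate_dot = bicycle_derivatives(v_y, yaw_rate, delta, yaw_moment, p)
        v_y += dt * v_y_dot
        yaw_rate += dt * yaw_rate_dot
    return v_y, yaw_rate


# Hypothetical parameters (SI units); v_x corresponds to 30 km/h.
PARAMS = {"m": 1500.0, "I_z": 2500.0, "l_f": 1.2, "l_r": 1.4,
          "C_f": 80000.0, "C_r": 80000.0, "v_x": 30.0 / 3.6}
```

Calling `simulate(0.02, 0.0, PARAMS)` returns the lateral speed and yaw rate reached after a constant steering input, while a nonzero second argument shows how a direct yaw moment alone changes the yaw rate. The steering system dynamics are deliberately left out of this sketch.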
Considering the dynamics of the steering system (Figure 2), the moment produced about the kingpin by the tire lateral force is obtained as: where s_c and s_n define the caster and pneumatic trails, respectively. By substituting α_f from equation (6) into equation (7): Note that this moment is an external load on the front wheel and the attached steering system. Hence, the equations of motion for the steering wheel account for the rotations transferred through the kingpin: Since the rotational acceleration acts relative to the absolute space and the dynamically changing steering system, the term dγ/dt is incorporated in this model. Nevertheless, the relation d²δ/dt² ≫ dγ/dt [37,38] holds for regular automotive cars. Therefore, a dynamic model for the steering system can be expressed as: The equations of motion associated with the vehicle's lateral dynamics, in terms of the vehicle lateral speed, yaw rate, and steering system, can be restructured as follows:

Figure 2. EPS-based steering system model.
Accordingly, the general state-space representation of the system dynamics is derivable as follows: where ξ = [v_y, γ, δ]^T represents the states of the system, T includes the general control function and U = [u_1, u_2]^T is the control input to the system. Additionally, the corresponding subfunctions are obtained as: Furthermore, it can be stated that h_1 = 1/I_z.

Path-planning and probabilistic risk assessment

Path considerations and risk assessment
For automated cars to follow the intended path, a modified model according to the Frenet-Serret differential geometry can be developed. Assuming ũ denotes the independent parameterization variable and the intended path curve D(x_d, y_d), the desired position vector is denoted by: where s represents the curvilinear coordinate (arc length) relative to point O in the direction of the traveling path from a predetermined initial position. Therefore, s can be expressed as: Since dc_d/dũ is always nonzero, thus: Hereafter, the parameterized variable s is employed, with θ_d(s) representing the angle between the unit vector tangent to the path, T(s), and the coordinate X. For this purpose, the tangential vector is represented as: Differentiating (20) with respect to the variable s: Additionally, a curvature vector is assumed, which leads to the following kinematic equality [2]: where θ_e = θ − θ_d and θ is the vehicle heading angle. Assuming a scenario with two lanes that can be represented by a cubic spline c(O), the position in the direction of the spline can be parametrically expressed as follows: For any spline c(O), a number of disjoint unobserved segments is also assumed to exist. For segments sampled independently from uniform distributions within certain unobserved segments, the position and speed components are expressed as: where [·, ·] denotes a closed interval between two real numbers, U(·) denotes a uniform distribution on the set, and the remaining symbols represent the starting and final positions of an unobserved segment i on spline c(O) and the minimum and maximum speeds of other vehicles, respectively. Assuming that each car is traveling with a constant speed, it is then possible to propagate all segments forward in time for t_max seconds: where d^(k) is the position of the car in the k-th segment after t_max seconds. In order to incorporate the size of vehicles and the permissible lateral displacements within the lane, an offset term G^(k) is introduced from a uniform distribution for each road segment in Cartesian space perpendicular to the spline. Such a uniform distribution is employed to model the probability of adjacent vehicles, since no prior information is assumed on the lateral location of the adjacent vehicles.
where m and n are the indices related to the columns and rows of the matrix elements, respectively. Therefore, the probability function is expressed as: where ‖·‖_2 is the 2-norm of a vector, r(O) is the non-normalized vector perpendicular to c(O), and G^(k) is the maximum deviation among the segments alongside the spline curve. The main purpose of this part is to illustrate how the risk explained above is employed in practice. To this end, the risk assessment is combined with an optimization-based planning algorithm, and twofold improvements in terms of safety and ride comfort are demonstrated. The simple path-planner employed in this paper can be substituted by any other cost-function-based path-planner because the methods are agnostic to the planning technique. It is also assumed that the ego vehicle speed at time t is v_ego and that the target path is c(O), a cubic spline parameterized by the position along the spline. Therefore, the path-planning algorithm primarily evaluates the safety cost J_{a_ego} corresponding to the acceleration or deceleration a_ego as follows: where h_{a_ego} is the potential function of the vehicle, which is positive when the vehicle P(O) is within the ego lane and zero otherwise, and can be defined as follows: Additionally, the following kinematic equalities can be described as [39]: where d_ego^(k) denotes the expected position of the ego car along its intended route, Q(a_ego) represents a distance term, the minimum distance is taken between the vehicle P(O) and the ego car's intended route c(d^(k)), and the remaining parameter is the bandwidth of the repulsive potential field. Figure 3 shows sample variations of the collision risk probability function in different orientations.
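The sampling and propagation steps described in this subsection can be sketched in a few lines of Python. This is only an illustrative reading of the text: positions, speeds, and lateral offsets are drawn from uniform distributions and moved forward at constant speed for `t_max` seconds. The function names, the segment limits, and the Monte Carlo `occupancy_fraction` estimate are assumptions introduced here, not the paper's probability function.

```python
import random


def sample_hidden_vehicles(p_start, p_end, v_min, v_max, max_offset, n, rng):
    """Draw n hypothetical vehicles for one unobserved segment.

    Each sample is (position along the spline, speed, lateral offset),
    all drawn independently from uniform distributions.
    """
    samples = []
    for _ in range(n):
        position = rng.uniform(p_start, p_end)
        speed = rng.uniform(v_min, v_max)
        offset = rng.uniform(-max_offset, max_offset)
        samples.append((position, speed, offset))
    return samples


def propagate(samples, t_max):
    """Move every sample forward for t_max seconds at constant speed."""
    return [(p + v * t_max, v, g) for (p, v, g) in samples]


def occupancy_fraction(samples, s_low, s_high):
    """Fraction of samples whose position lies in [s_low, s_high]."""
    if not samples:
        return 0.0
    hits = sum(1 for (p, _, _) in samples if s_low <= p <= s_high)
    return hits / len(samples)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden = sample_hidden_vehicles(40.0, 60.0, 5.0, 10.0, 0.5, 2000, rng)
future = propagate(hidden, t_max=2.0)
risk_near = occupancy_fraction(future, 50.0, 80.0)
```

With these hypothetical limits, every propagated sample lies between 50 m and 80 m along the spline, so the interval queried in the last line covers all of them. A narrower interval yields a fraction below one, which is the kind of quantity a planner's safety cost could weigh.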

NMPC path-tracking
Herein, constraints are placed on the vehicle side-slip angle together with the yaw rate to guarantee cornering stability. The constraint on the vehicle slip angle is intended to keep the vehicle away from the tire-road adhesion limits, because the slip angle relies heavily on the road condition and varies under different adhesion characteristics. Accordingly, the constraints imposed on the yaw-rate and slip-angle states of the vehicle are expressed as follows: The vehicle's total acceleration/deceleration performance is directly related to the tire-road adhesion characteristics. This value is bounded by μg, where μ represents the tire/road friction coefficient and g denotes the gravitational acceleration. Accordingly, the acceleration absolute value is expressed as: Assuming that the vehicle longitudinal acceleration is negligible (v̇_x ≅ 0), equation (34) is simplified to: From equation (35), it is evident that the control input that puts the vehicle on the preplanned path is constrained for a specific operating condition. As a result, the steering angle and the direct yaw moment control (DYC) input, together with their variations, are constrained as follows: where |e_1| and |e_2| define the maximum allowable magnitude of the control input perturbation. The NMPC cost function is then minimized subject to (s.t.) these constraints, where Q is the weighting matrix related to the difference between the actual vehicle path and the planned path. Additionally, the weighting matrix for the control input is denoted by R. Moreover, the intended path vector, control input, and control increment are denoted by w_ref(k), u(k), and Δu(k), respectively. In addition, X denotes a slack variable, whose weighted term in the cost function penalizes exceeding the constrained tire slip angle.
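To make the role of these bounds concrete, the following Python sketch shows one common way such limits are evaluated. It assumes the steady-state approximation that lateral acceleration is roughly the product of longitudinal speed and yaw rate, so the adhesion bound becomes a yaw-rate limit of μg / v_x; this approximation, the helper names, and the order in which the rate and magnitude limits are applied are assumptions, not the paper's constraint equations.

```python
G = 9.81  # gravitational acceleration, m/s^2


def yaw_rate_limit(mu, v_x):
    """Upper bound on |yaw rate| from |a_y| <= mu * g with a_y ~ v_x * yaw_rate."""
    if v_x <= 0.0:
        raise ValueError("longitudinal speed must be positive")
    return mu * G / v_x


def clip(value, low, high):
    """Saturate value to the closed interval [low, high]."""
    return max(low, min(high, value))


def constrain_input(u_prev, u_cmd, u_max, du_max):
    """Apply a rate limit |u - u_prev| <= du_max, then a magnitude limit |u| <= u_max."""
    step = clip(u_cmd - u_prev, -du_max, du_max)
    return clip(u_prev + step, -u_max, u_max)
```

For example, `yaw_rate_limit(0.8, 10.0)` gives the admissible yaw rate on a road with a friction coefficient of 0.8 at 10 m/s, and `constrain_input` can be applied separately to the steering command and the yaw-moment command. In an NMPC solver these limits would normally enter as inequality constraints of the optimization rather than as post-hoc clipping.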

Reinforcement learning algorithm
The conventional reinforcement learning (RL) models are mostly explained through an agent that dynamically interacts with the environment. Such an interaction is implemented through action and perception. During each interaction step between the agent and the environment, the agent receives an input i indicating the current state s of the environment. The agent then selects an action a as its output. Afterward, the action changes the environment state, and the value of the state transition is communicated to the agent through a reinforcement signal r.
Herein, B describes the agent's behavior, which should choose actions that tend to increase the cumulative value of r over time. Additionally, S and A describe the discrete sets of environment states and agent actions. In this context, the problem of delayed reinforcement and delayed reward is modeled as a Markov Decision Process (MDP) [39]. Furthermore, a reward function R: S × A → ℝ and a state transition function T: S × A → P(S) are employed. It is also assumed that P(S) is a probability distribution over the set S. In this manner, the transition function T(s, s′, a) can be defined as the probability of making a transition from state s to state s′ under the action a.
By taking into account the longer-term reward policy for the agent, the infinite-horizon discounted model is applied. The subsequent rewards are geometrically discounted by a discount factor x between 0 and 1 (0 ≤ x < 1), such as E(Σ_{t=0}^{∞} x^t r_t). Additionally, the expected infinite discounted sum of rewards gained by an agent acting optimally defines the optimal value of a state: Based on the uniqueness and existence of the optimal result, the solution to the simultaneous equations is determined in terms of a recursive expression [39]: where V*(s) represents the value of s corresponding to the initial optimal action. The above statement shows that the value of a state is the sum of the expected instantaneous reward and the discounted value of the subsequent state under the current action. According to the optimal design, the desired value function is explained as follows [39]: Moreover, the action-value function Q(s, a) is described: Hence, the associated optimal solution Q*(s, a) is defined according to the action-value function [39]: where Q*(s, a) denotes the expected discounted reinforcement of taking action a in state s and continuing optimally thereafter. Moreover, the Q-learning algorithm updates the Q value according to the parameter Y ∈ [0, 1]: The stated update is used to implement the RL-based predictive decision-making to avoid obstacle collisions according to the predictive model. Figure 4 illustrates the integrated algorithm for the path planning and path following strategies described above, including the NMPC control algorithm and the control commands, in terms of the steering input and the DYC signal, applied to the vehicle dynamics model.
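The Q-learning update mentioned above can be illustrated with the standard tabular rule, in which the current estimate is moved toward the received reward plus the discounted best value of the next state. The Python sketch below applies it to a toy five-state chain; the environment, the epsilon-greedy exploration, and all parameter values are hypothetical and unrelated to the driving scenarios of this paper. The arguments `lr` and `discount` play the roles of the parameter Y and the discount factor x in the text.

```python
import random

ACTIONS = (0, 1)  # 0: move left, 1: move right
N_STATES = 5      # states 0..4; state 4 is the rewarding terminal state


def q_update(q, s, a, r, s_next, lr, discount):
    """One tabular Q-learning update; q is a dict keyed by (state, action)."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += lr * (r + discount * best_next - q[(s, a)])


def step(s, a):
    """Deterministic chain dynamics with a reward of 1 on reaching the last state."""
    s_next = min(N_STATES - 1, s + 1) if a == 1 else max(0, s - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward, s_next == N_STATES - 1


def train(episodes=500, lr=0.5, discount=0.9, epsilon=0.2, seed=0):
    """Run epsilon-greedy Q-learning from state 0 until the terminal state."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)                      # explore
            else:
                a = max(ACTIONS, key=lambda b: q[(s, b)])    # exploit
            s_next, r, done = step(s, a)
            q_update(q, s, a, r, s_next, lr, discount)
            s = s_next
    return q
```

After training, the greedy action in every non-terminal state is to move right, which is the shortest route to the reward. In the setting of this paper, the states and rewards would instead encode the driving situation and the collision risk.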

Results and discussion
In order to evaluate the performance of the proposed path-planning algorithm for automated cars, simulations are carried out for two different driving scenarios. These two scenarios demonstrate the feasibility of the proposed methods under various operating conditions. The simulation parameters are summarized in Table 1. The results for the proposed controller are obtained from numerous simulation runs. In the present study, a road with a single lane in each direction is employed for evaluating the proposed method, without loss of generality with respect to other road conditions and driving environments.

Scenario A
In the first scenario, the ego car travels at an average forward speed of 30 km/h while two leading vehicles travel in the same lane at a constant speed of 25 km/h. The ego vehicle is therefore required to pass the leading vehicles safely. The space available for the ego vehicle to pass a leading vehicle has to be sufficient, and it is typically a function of the traveling speed. Herein, the threshold is set to a small gap to verify whether the car is able to pass the leading vehicles and return to the main path successfully. This maneuver mimics the double lane change maneuver. Herein, the ego vehicle is represented with the risk functions shown in Figure 5. The two leading vehicles are represented by their collision risk functions in the global coordinate system. Additionally, the planned path for the ego vehicle to pass the two in-line leading vehicles can be seen to return safely to the original lane without collision.
Because the target lane is clear after the first lane change, the vehicle is planned to complete the double lane change without changing the travel speed, so that the left lane is kept free for other vehicles attempting to pass. Additionally, after the critical pass of the leading vehicles, because the lane ahead is free, the vehicle has the opportunity to make the second lane change in a more gradual and smoother manner. Figure 6 presents the vehicle trajectory in the plane of motion and how the vehicle passes the leading vehicles as a function of the number of iterations of the RL-agent and environment interaction. The plot encompasses both the longitudinal and lateral trajectory variations and shows how the ego vehicle (blue) passes the leading vehicles (red) without any collision while considering other dynamic obstacles in the environment (green car). Figure 7 illustrates the vehicle responses in terms of the steering system input and the applied torque for yaw generation for smooth cornering. These quantities are the control inputs to the system, and they remain within reasonable ranges for the tires, before saturation and before the tire lateral force generation starts to drop. In response to the inputs applied to the ego vehicle to follow the planned path, the dynamic response of the car in terms of the lateral acceleration (g-acceleration), yaw-rate variations, and vehicle heading angle change along the intended trajectory is presented in Figure 8. Furthermore, the co-simulations of the model indicate that the path planning and the following of the proposed trajectory by the ego vehicle are performed satisfactorily.

Scenario B
This scenario is considerably more complex than the first one, mainly because the ego vehicle is expected to perform two consecutive double-lane-change maneuvers. The leading vehicles are distributed across two lanes with various collision risk functions depending on the traveling speed. The relative positions and the collision risk functions of the leading vehicles in the plane of motion can be seen in Figure 9. Furthermore, the trajectory planned for the ego vehicle by the algorithm is also shown in Figure 9. Because the leading vehicles hold a constant speed lower than that of the ego vehicle and are distributed randomly, the optimal path to pass all vehicles safely without collision risk is to perform two consecutive double-lane changes with different lengths depending on the collision risk function, road condition, and geometric understanding of the environment. The ego vehicle changes to the left lane when that lane is clear and has sufficient space to accommodate the ego vehicle at constant speed. The vehicle keeps a constant speed to provide smooth ride comfort for the passengers. The vehicle also returns to the original lane after each pass of the leading vehicles to keep the left lane free for other cars traveling at higher speeds. The measure of comfort for seated passengers inside vehicles, according to ISO 2631-1:1997, is associated with the total magnitude of the weighted accelerations in all directions to which they are exposed. The root mean square (RMS) of the accelerations can be utilized to quantify the magnitude of the weighted accelerations: where a_w is the weighted acceleration according to ISO 2631-1:1997, and the subscript j denotes the acceleration component in each direction. As the road is assumed reasonably flat and the longitudinal acceleration is negligible, the RMS contributions of these components converge to zero, and the remaining RMS of the weighted acceleration in the lateral direction, according to ISO 2631-1 and Zhao and Schindler [40], is obtained at 0.201 m/s², which puts the measurable criterion in the comfortable range. It is also noted that the number of agent-environment interactions drives the self-tuning and deep learning of the ego vehicle as it adapts to the driving environment and the road condition. Finally, Figure 11 explores the tracking performance of the target vehicle with the designed NMPC algorithm, following the path planned by the risk-assessment-based collision avoidance algorithm. The target car is able to follow the planned path over the entire simulation range, although slight variations are observed, which are followed by rapid stabilization. Furthermore, the second double-lane-change maneuver is taken more consistently and smoothly than the first, which can be attributed to the improved learning of the algorithm after iterative interactions of the agent with the environment. Additionally, the variation of the tracking error along the X-coordinate is presented in Figure 11 along with the standard deviation of the tracking error. According to the obtained results, the maximum and mean values of the tracking error are 0.11 and 0.01 m, respectively, indicating the effectiveness and reliability of the proposed integrated path planning and following algorithm.
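The comfort and tracking metrics discussed above reduce to simple statistics once the signals are available. The Python sketch below assumes that the accelerations passed in have already been frequency-weighted as ISO 2631-1 requires (the weighting filters themselves are not implemented here) and that the per-axis multiplying factors default to 1; both points, and all function names, are assumptions made for illustration.

```python
import math


def rms(values):
    """Root mean square of a uniformly sampled signal."""
    if not values:
        raise ValueError("signal must not be empty")
    return math.sqrt(sum(v * v for v in values) / len(values))


def total_weighted_rms(a_wx, a_wy, a_wz, k_x=1.0, k_y=1.0, k_z=1.0):
    """Combine per-axis RMS values of already frequency-weighted accelerations."""
    return math.sqrt((k_x * rms(a_wx)) ** 2
                     + (k_y * rms(a_wy)) ** 2
                     + (k_z * rms(a_wz)) ** 2)


def tracking_error_stats(actual, planned):
    """Maximum and mean absolute error between equally long path samples."""
    if len(actual) != len(planned) or not actual:
        raise ValueError("paths must be non-empty and of equal length")
    errors = [abs(a - p) for a, p in zip(actual, planned)]
    return max(errors), sum(errors) / len(errors)
```

When the longitudinal and vertical components are zero, `total_weighted_rms` returns just the lateral RMS, which mirrors the reasoning used for the lateral value reported above. `tracking_error_stats` yields the kind of maximum and mean figures quoted for the tracking error.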

Conclusions
In this paper, a motion planning and path following algorithm was proposed by coupling optimal reinforcement learning (RL) with a novel risk assessment approach to avoid collisions with the leading and adjacent vehicles and obstacles during lane changes and critical maneuvers. The proposed RL approach was shown to be capable of learning the collision risk based on the probability distributions of the adjacent and leading vehicles and of identifying the safest and shortest paths during the lane changes. Additionally, the travel speed of the ego vehicle was kept unchanged so that ride comfort is preserved for the vehicle occupants by minimizing the contribution of the weighted longitudinal acceleration, as explored in equation (45). The dynamics of the steering system were also incorporated to provide an understanding of how they can affect the vehicle response to input variations. Different driving scenarios were employed to verify the effectiveness and performance of the proposed algorithm.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figure 1. Yaw-plane bicycle model of the vehicle, adopted owing to the symmetry between the right and left tracks.

Figure 3. Sample collision risk probability function value variations in different orientations.

Figure 4. The flowchart related to the integrated path planning and path following algorithm.

Figure 5. Sample collision risk probability function value variations in different orientations.

Figure 6. Vehicle trajectory in the plane of motion and the planned strategy for the vehicle passing the leading vehicles as a function of the iteration numbers of the RL-agent interaction with the environment.

Figure 7. Dynamic variations related to: (a) steering system input and (b) applied torque for the yaw generation of the ego car.

Figure 8. Dynamic responses of the ego car in terms of: (a) lateral acceleration, (b) yaw-rate variations, and (c) vehicle heading.

Figure 9. Path planning in the global coordinate system using the proposed algorithm: (a) collision-risk function and (b) yaw plane of motion.

Figure 10. Vehicle trajectory in the plane of motion and the planned strategy for the vehicle passing the leading vehicles as a function of the iteration numbers of the RL-agent interaction with the environment.

Figure 11. (a) The planned path versus the actual path subsequent to applying NMPC in the global coordinate system and (b) tracking error variations alongside the traveling direction.