
    Autonomous Vehicle Platoons In Urban Road Networks: A Joint Distributed Reinforcement Learning and Model Predictive Control Approach

IEEE/CAA Journal of Automatica Sinica, 2024, Issue 1

Luigi D’Alfonso, Francesco Giannini, Student Member, Giuseppe Franzè, Senior Member, Giuseppe Fedele, Francesco Pupo, and Giancarlo Fortino

Abstract—In this paper, platoons of autonomous vehicles operating in urban road networks are considered. From a methodological point of view, the problem of interest consists of formally characterizing vehicle state trajectory tubes by means of routing decisions complying with traffic congestion criteria. To this end, a novel distributed control architecture is conceived by taking advantage of two methodologies: deep reinforcement learning and model predictive control. On one hand, the routing decisions are obtained by using a distributed reinforcement learning algorithm that exploits available traffic data at each road junction. On the other hand, a bank of model predictive controllers is in charge of computing the most adequate control action for each involved vehicle. Such tasks are here combined into a single framework: the deep reinforcement learning output (action) is translated into a set-point to be tracked by the model predictive controller; conversely, the current vehicle position, resulting from the application of the control move, is exploited by the deep reinforcement learning unit for improving its reliability. The main novelty of the proposed solution lies in its hybrid nature: on one hand it fully exploits deep reinforcement learning capabilities for decision-making purposes; on the other hand, time-varying hard constraints are always satisfied during the dynamical platoon evolution imposed by the computed routing decisions. To efficiently evaluate the performance of the proposed control architecture, a co-design procedure, involving the SUMO and MATLAB platforms, is implemented so that complex operating environments can be used, and the information coming from road maps (links, junctions, obstacles, semaphores, etc.) and vehicle state trajectories can be shared and exchanged. Finally, by considering as operating scenario a real entire city block and a platoon of eleven vehicles described by double-integrator models, several simulations have been performed with the aim of highlighting the main features of the proposed approach. Moreover, it is important to underline that, in different operating scenarios, the proposed reinforcement learning scheme is capable of significantly reducing traffic congestion phenomena when compared with well-reputed competitors.

I. INTRODUCTION

IN the last decade, advances in vehicular networking, communication and computing technologies have facilitated the practical deployment of autonomous vehicles (AVs) [1]-[3]. On the other hand, the increasing number of vehicles has led to an urgent need for innovative solutions to deal with road traffic issues. By joining these novelties with modern control technologies, the vehicular system is expected to shift from individual driving automation to the platoon-based vehicular system paradigm [4]. It is envisioned that the use of platoon configurations can improve road traffic safety and efficiency. Therefore, the control of AV platoons has attracted increasing attention from different perspectives starting from the ’00s [5]-[14]. Indeed, vehicle routing decisions assume a key role because optimized routing has a positive impact on traffic congestion, see [15]-[17] and references therein. Moreover, these aspects require particular attention because private mobility within urban road networks is almost always unsustainable [18]. As a consequence, future smart cities should refer to autonomous mobility systems that may offer a new way to provide equivalent service capabilities at possibly low congestion levels. In the sequel, these objectives will be addressed by formally combining two different approaches: model predictive control (MPC) for constrained multi-agent evolutions and deep reinforcement learning (DRL) techniques for routing decision purposes.

    A. Literature Review

Coordination and control of multi-vehicle systems have gained increasing interest in the control community, as testified by several important contributions, see e.g., [19]-[23]. Essentially, two main lines are pursued: the characterization of admissible vehicle state trajectory tubes and the design of ad-hoc control architectures to keep the relative positions of the vehicles within safe and acceptable limits. According to this premise, MPC appears to be the most adequate approach to comply with hard constraints during the system dynamical evolution. Hereafter, a brief analysis of some pertinent works is provided. In [24] a distributed receding horizon control (DRHC) scheme for nonlinear platoons is developed, while [25] proposes a synchronous DRHC for nonholonomic multi-vehicle systems. In [26], the constrained DRHC problem of a vehicle platoon with constrained nonlinear dynamics is investigated by using γ-gain stability arguments, whereas in [27], the distributed MPC (DMPC) scheme is based on state trajectories parameterized as polynomial splines. Conversely, in [28] a DMPC strategy for solving the formation problem of leader-follower systems is derived by exploiting explicit robust model-based control and set-theoretic ideas. In [29] a DMPC algorithm is developed for heterogeneous vehicle platoons with unidirectional topologies and a-priori unknown desired set point. Finally, it is worth mentioning that the possibility to integrate MPC techniques within AV platoon scenarios has gained increasing interest because the resulting scheme is flexible with respect to topology changes and feasible when switching constraint sets arise, see e.g., [30].

In the last decade, the MPC philosophy has been coupled with reinforcement learning (RL) schemes for addressing complex problems in different real contexts: building energy management [31], control of multi-energy systems [32], quadrotor micro air vehicle navigation [33], power system control [34], to cite a few.

More recently, RL algorithms have been applied in several areas of intelligent transportation systems by virtue of the achievable high performance [35], [36]. A few DRL algorithms have been exploited for vehicle platoon control purposes. Since accelerations (or speeds) of all vehicles should be properly selected, the simple exploitation of a centralized RL unit could lead to an exponential increase of the plant size, making the underlying computations intractable [37], [38]. Hence, distributed descriptions of single-unit RL algorithms are mandatory, see [39] and references therein.

Although some studies have been published on the integration between machine learning and distributed model predictive control approaches, see [40]-[42] and references therein, this topic deserves further investigation into the capability of distributed MPC to take full advantage of the potential of RL algorithms in terms of achievable control performance. In particular, MPC-based strategies, being robust and based on worst-case analysis, may be too conservative under dynamic and changing working conditions; conversely, reinforcement learning algorithms, although capable of dynamically adapting to time-varying operating scenarios, are not able to guarantee hard constraint satisfaction.

    B. Distributed Reinforcement Learning Based Model Predictive Control Scheme

The analysis of the above-discussed contributions highlights two key aspects often left out in the proposed strategies: 1) the capability to efficiently take or modify routing decisions during on-line operations; 2) the definition of an ad-hoc control architecture capable of coupling the nominal path planning (defined by the sequence of routing decisions) with the real dynamics of the autonomous vehicles, where constraints and/or model impairments must be taken into account at each time instant.

Starting from this reasoning, the main aim of this paper is to propose new insights in the field of path planning for constrained autonomous vehicles moving in the cluttered environments of urban road networks. To this end, the guideline is to make the resulting control architecture scalable, flexible and computationally low-demanding in order to be capable of addressing time-varying operating scenarios along the path. From a methodological perspective, two aspects deserve particular attention:

    1) The computation of routing decisions for mitigating traffic congestion phenomena;

    2) A constrained control strategy in charge of adequately exploiting the routing decisions.

These considerations naturally lead to a distributed framework, necessary to enjoy the scalability and flexibility specifications, where the first aspect is addressed by resorting to an innovative DRL scheme, while the second is attacked by means of an MPC approach adequately designed to reduce as much as possible the computational load pertaining to the underlying optimizations. Preliminary studies along these lines have been presented in [43], [44].

To comply with this reasoning, the two approaches are combined into a single distributed control architecture where the capability of the DRL scheme to derive routing decisions that mitigate undesired traffic effects allows the platoon to be safely driven towards the pre-assigned target under the requirement that no collisions amongst the involved vehicles occur. Then, the main contributions of this paper can be summarized as follows:

1) Formalization of local MPC controllers that take advantage of the receding horizon philosophy and the platoon topology: in particular, most of the computations are performed by the leader, while followers exploit the information spread along the chain and set-membership conditions for reducing their computational complexity;

2) The periodic RL activation is the job of the leader, while the rest of the platoon only accesses and evaluates the reward function for routing decision purposes;

3) Time-varying topology scenarios are allowed, i.e., the initial platoon can split, whenever recommended (different routing decisions between the leader and its followers), into two or more leader-follower configurations and, vice versa, two platoons can safely join each other.

As will be shown in the simulation section, these properties make it possible to improve the overall performance when looking at the minimization of traffic congestion occurrences.

A second important outcome of this paper concerns the development of a co-design procedure between the SUMO platform [45] and MATLAB packages that allows performing realistic simulations by using real city maps and data series of the traffic evolution. This prescribes that data are recast in the XML format, a client-server paradigm is defined to make the communication between the two platforms admissible, and application programming interface (API) functions are implemented to get, modify or add information about the vehicles.

Finally, it is important to analyze the possible implications of using the proposed control architecture in real contexts. On one side, by looking at the computational burden, the fact that the proposed strategy is computationally “light” represents a crucial property for operating in unknown environments populated by obstacles and/or external agents, where routing decisions and command inputs must be quickly adopted. On the other hand, an effective application requires taking care of the following issues: developing accurate mathematical models describing vehicle dynamics and operating environments; introducing anti-collision capabilities into the overall strategy; defining adequate communication features in order to correctly acquire and evaluate the information available on the sensor units located along the urban road network.

    C. Paper Structure

The rest of the paper is organized as follows. In Section II, the path planning problem for urban road networks is formulated. In Section III, the proposed deep reinforcement learning algorithm is introduced and its main properties are discussed. Sections IV and V describe the proposed DMPC architecture and the resulting algorithm, respectively. Section VI details the co-design procedure. Finally, simulations and some remarks end the paper.

    D. Notation and Definitions

II. PROBLEM FORMULATION

Moreover, we assume that the plant description (3) has real (e.g., double-integrator models) or artificially added (i.e., thanks to a pre-compensation action) integral effects so that, under zero velocity and u_i(t) = 0, any point p_i ∈ R^2 in the planar space is an equilibrium. In the context of the intelligent vehicles field, this is by now a standard assumption, as outlined in [46].
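For concreteness, a standard discrete-time double-integrator description consistent with this assumption (the sampling time $T_s$ is introduced here only for illustration) reads

$$p_i(t+1) = p_i(t) + T_s\, v_i(t), \qquad v_i(t+1) = v_i(t) + T_s\, u_i(t),$$

so that, for $v_i(t) = 0$ and $u_i(t) = 0$, any planar position $p_i \in \mathbb{R}^2$ is indeed an equilibrium.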

Throughout the paper it is assumed that the team is topologically organized as a platoon, see Fig. 1.

Fig. 1. Leader-follower topology.

- Information Exchange: At each time instant t, each i-th vehicle sends to the (i+1)-th vehicle its predicted future state trajectory, namely x̂_i(t).

Operating scenario: The platoon (3) and (4) operates within an urban road network (URN), consisting of M links and F junctions, where a sequence of routing decisions has to be taken into account during the travel.

    Hereafter, the following definition will be exploited.

    Definition 4: A link is defined as a portion of a road comprised between two junctions.

    Then given a road map, the problem we want to solve can be stated as follows.

Platoons in Urban Environments (PL-UE): Given a platoon of constrained AVs (3) and (4) and a target x_f, design a distributed state-feedback control policy

satisfying constraints (4) and such that, starting from an admissible initial condition, the team is driven towards the target x_f regardless of any junction occurrence.

    According to the PL-UE statement, the following definition is exploited for the next developments.

Definition 5: A state x ∈ X_1 × ··· × X_L is said to be admissible if there exists at least one path from x to x_f ∈ X_1 × ··· × X_L compatible with (4).

The problem will be addressed by means of a DMPC scheme properly customized to exploit the capabilities of a recent DRL algorithm. In particular, the idea consists of exploiting the routing decisions provided by the DRL unit, at each junction occurrence, as set-point references for the underlying MPC controllers.

    A. Proposed Solution: An Overview

In the sequel, the key ingredients of a DMPC-based architecture devoted to addressing the PL-UE problem are discussed. First, without loss of generality, it is assumed that an admissible path complying with the PL-UE problem prescriptions exists. Then, the overall control architecture is reported in Fig. 2. Essentially, it consists of three units: a Path Planner based on the DRL algorithm, a Reference Generator and a Controller, each one tuned w.r.t. the involved vehicles AV_i, i = 1,...,L.

Fig. 2. Distributed control architecture exploiting the DRL algorithm.

where the direction of travel and center line are taken into account. By taking advantage of this reasoning, the set-point z_i(t) is then formally achieved by solving the following optimization:

which provides the admissible reference z_i(t), at the maximum distance from the current condition x_i(t), compatible with the road constraint set selected by the action a_i(t).
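Although the exact formulation is not reproduced here, a plausible form of this optimization (with $\mathcal{Z}_{a_i(t)}$ denoting, as an assumption on notation, the road constraint set selected by the action) is

$$z_i(t) = \arg\max_{z \in \mathcal{Z}_{a_i(t)}} \; \| z - x_i(t) \|,$$

i.e., the farthest admissible point along the selected link is used as the reference to be tracked.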

Moreover, a decision zone for routing decision purposes has to be formally defined for each link. This translates into the following half-space description:

    see the explanatory example of Fig.3.
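As a generic illustration (the specific coefficients are not reproduced here), such a decision zone for link $j$ can be written as a half-space of the form

$$\mathcal{D}_j = \{\, p \in \mathbb{R}^2 \;:\; c_j^{\top} p \le d_j \,\},$$

where $c_j$ and $d_j$ are assumed to encode the boundary at which the routing decision must be taken.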

Fig. 3. Geometric constraints (dashed green zone) due to direction of travel and center line (dashed red line).

Hence, the Controller computes an admissible command u_i(t) in a distributed receding horizon fashion. Specifically:

• MPC controllers - Each i-th vehicle is equipped with a DMPC:

1) The leader (namely i = 1) implements a local receding horizon controller (RHC) with the prediction horizon length N_1 = 0;

2) Each follower (namely i = 2,...,L) uses an MPC controller with N_i = i - 1;

Remark 1: Notice that the above choices are dictated by the following reasoning. In real contexts, the platoon is subject to a certain level of uncertainty within the URN, e.g., other agents moving in the same road links, obstacles, and so on. Since the leader should compute control actions in a guaranteed fashion, this leads to considering a receding horizon controller for the leader with a control horizon length N_1 = 0 within the positively invariant region, which can be simply checked by using on-board perception modules. As far as followers are concerned, the simplest idea is to increment the horizon length by one at each successive level so that the overall feasibility can be kept according to the Bellman optimization principle [47].

• Time-varying LF topology - According to a given switching rule, the proposed strategy allows splitting the platoon into two or more LF configurations whenever different routing decisions occur. Conversely, two platoons can be combined into a single configuration only if the leader of the first platoon reconnects to the leaf node of the second one. As a consequence, the feasibility is preserved if the following actions are performed:

– Let t̄ be the time instant when the i-th vehicle, i > 1, leaves the initial LF configuration. At t̄ + 1 it becomes the leader of a new platoon and its prediction horizon is set to N_i = 0. If 0

Remark 2: In order to ensure that the leader-follower (LF) formation is preserved, each vehicle should be aware of the position of its predecessor from the current time instant onward. Within an MPC framework, this prescribes that each element of the LF configuration informs its followers about its future state trajectory (predicted sequence) in order to properly define the LF formation constraints within the time interval defined by the prediction horizon length.

III. REINFORCEMENT LEARNING SCHEME

This section is devoted to implementing a DRL scheme for routing decision purposes. To this end, the routing problem can be easily recast as a path search problem on a fully connected graph by exploiting the decision zone as a threshold, see Fig. 3. For the sake of computational complexity savings, the resulting graph can be appropriately pruned to reduce the search space as much as possible.

    In the sequel, the following definitions are of interest.

Definition 6: Given a road link j, at each time instant t the vehicle density ρ_j(t) is defined as follows:

where μ_j(t) ∈ Z_+ is the number of vehicles on link j at time t, d̄_j ∈ R_+ the average road width and l_j the road length.
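The explicit expression is not reproduced above; a form consistent with the quantities just defined (an assumption on the exact formula) is

$$\rho_j(t) = \frac{\mu_j(t)}{\bar{d}_j\, l_j},$$

i.e., the number of vehicles on link $j$ normalized by the link area.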

By referring to a distributed framework, a Q-learning approach [48] is adopted and the multi-agent RL model is as follows:

    where

• V = {AV_1,...,AV_h} accounts for the set of vehicles (agents) contributing to the RL task;

where ED_i is a one-hot encoding string accounting for the usable edges, while ρ_j(t), j = 1,...,M_i, are the distribution densities of the road link i and its neighbors;

• Λ collects the admissible vehicle actions: e.g., “turn right”, “turn left”, “go straight on”;

• Φ : Σ × Λ × Σ → R is the global reward function.

Fig. 4. Neural network description.

During the training procedure, a trade-off between the exploration and exploitation phases arises. Such a hitch can be overcome by using an epsilon-greedy policy [49], i.e., the probability ε(·) of choosing a random action is an inversely proportional function of time.
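As an illustration of the epsilon-greedy rule described above, the following Python sketch (the function names and the specific decay law are assumptions, not the paper's exact choices) selects a random action with a probability that decays roughly inversely with time:

```python
import random

def epsilon(t, eps0=1.0, decay=0.01):
    """Exploration probability decaying (roughly) inversely with time."""
    return eps0 / (1.0 + decay * t)

def select_action(q_values, actions, t):
    """Epsilon-greedy selection: a random action with probability epsilon(t),
    otherwise the action with the largest estimated Q-value."""
    if random.random() < epsilon(t):
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[a])

# usage: q = {"turn_left": 0.2, "turn_right": 0.7, "go_straight": 0.5}
#        a = select_action(q, list(q), t=100)
```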

    Here, the following local reward function has been selected:

    Notice that one obtains good rewards (performance) as long as the vehicle density decreases.

    In order to adequately train the considered NN, the following squared loss function is used:

    where

represents the so-called expected reward that characterizes the quality of the information acquired by the i-th agent on its surroundings. Moreover, in the sequel the moving time-window average reward is exploited

with T being the moving window length.

    Finally, whenever a new reward measurement is available a gradient descent update is performed

    with α ∈(0,1) the learning rate.

All the above developments make it possible to write down the following computable algorithm.

DQL-Algorithm
Initialization
1: Extrapolate σ_i(0), i = 1,...,h, from the URN;
2: Initialize Q(σ_i, a; θ_i), i = 1,...,h.
At each t_dec do
1: Select an admissible action:
   a_i(t_dec) = rand(Λ) with probability ε(t_dec); a_i(t_dec) = argmax_{a ∈ Λ} Q_i(σ_i(t_dec), a; θ_i) otherwise.
2: Observe the global reward Φ(∪_k σ_k(t_dec + ΔT_dec(t))) and σ_i(t_dec + ΔT_dec(t)), ∀i = 1,...,h;
3: Compute the local rewards Φ_i(t_dec + ΔT_dec(t)), ∀i = 1,...,h;
4: Update θ_i, ∀i = 1,...,h.
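A minimal Python sketch of one decision step of the algorithm above is given below; the network, reward and environment interfaces (`q_net`, `local_reward`, `env_step`) are placeholders standing in for the paper's components, and the squared-loss/gradient-descent update only mirrors the scheme in spirit:

```python
import torch

def dql_step(q_net, optimizer, state, actions, eps, gamma, env_step, local_reward):
    """One DQL decision step for a single agent (illustrative sketch).

    q_net(state) is assumed to return a tensor of Q-values, one per action.
    env_step(action) applies the routing decision and returns the next state.
    local_reward(next_state) evaluates the agent's local reward (e.g., based
    on the observed vehicle densities)."""
    q_values = q_net(state)
    # epsilon-greedy action selection
    if torch.rand(1).item() < eps:
        a = torch.randint(len(actions), (1,)).item()
    else:
        a = int(torch.argmax(q_values).item())

    next_state = env_step(actions[a])      # observe sigma_i(t_dec + dT_dec)
    r = local_reward(next_state)           # local reward Phi_i

    # squared-loss update of the network weights via gradient descent
    with torch.no_grad():
        target = r + gamma * q_net(next_state).max()
    loss = (q_values[a] - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return actions[a], next_state
```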

Remark 3: Notice that, in the present context, DQL algorithms are more suitable than standard Q-table based schemes [50], [51]. In fact, deep Q-learning uses a neural network to calculate the Q-values necessary to take an action in the environment. Essentially, the Q-values are not directly updated; instead, the neural network weights are corrected, according to the reward for each action, and used to compute them. This means that such schemes are versatile and require far less memory than any Q-table counterpart. As a result, more appropriate actions come out.

IV. DMPC STRATEGY: SOLUTION AND FEASIBILITY

    The above control framework poses the following methodological questions:

1) Given an initial LF configuration, define local MPC problems for the L involved vehicles;

    2) Characterize the conditions under which the platoon split is feasible;

    3) Determine the conditions under which two platoons can be combined into a single LF configuration.

    A. DMPC Control Units: Single Platoon

Notice that the sequence x̂_i is transmitted to the follower i+1, which in turn assumes that the i-th vehicle will implement it during the updated prediction time interval [t+k+1, t+k+N_i+1].

    In order to formally define the optimal control problem underlying the MPC strategy, the following ingredients are required:

• Input sequence parametrization:

with K_i ∈ R^{m_i × n_i} a stabilizing and admissible state feedback law;

• Cost-to-go:

In the proposed DMPC scheme, the MPC controller of each vehicle has to be obtained by also taking care of the main PL-UE goal, i.e., converging to the target x_f. Then, it is necessary to deal with the fact that the LF formation moves towards x_f: the latter prescribes that the terminal condition is time-varying and the related constraint must be modified accordingly. Moreover, the vehicle must take into account routing decisions at each junction, so that the resulting reference z_i(t) comes into play when the terminal region is derived. First, the following conditions have to be satisfied for all involved vehicles:

where Ξ_i(t) is computed such that

As far as the followers are concerned, an admissible computation of the sets Ξ_i(t), i = 2,...,L, prescribes that each vehicle transmits to its direct follower i+1 the PI region computed at the previous time instant, i.e., Ξ_i(t-1).

    Then, one has that

• If i = 1 then Ξ_1(t) is such that

– If z_1(t) ∈ Ξ_1(t-2) ∩ Ξ_1(t-1) then

    – else

• else

– If z_i(t) = z_{i-1}(t) then

– else if z_i(t) ∈ Ξ_i(t-2) ∩ Ξ_i(t-1)

    – otherwise

Finally, it is mandatory to preserve the LF configuration at each time instant. Here, it is required that each follower vehicle maintains a safe distance from its predecessor and neighbors. This translates into LF formation constraints in terms of the desired separation and the desired relative bearing between the i-th vehicle and its predecessor [52]. Hence, given the current state measurement x_i(t|t) ∈ X_i and the predecessor's assumed state sequence computed according to the following strategy:

the optimization problem for the i-th follower, hereafter denoted as DMPC-P_i^F(t), is

Conversely, the DMPC optimization pertaining to the leader vehicle, named DMPC-P^L(t), reduces to the off-line computation of an admissible pair (Ξ_1, K_1) via the following semidefinite programming (SDP) problem (see e.g., [53]):

DMPC-P^L(t):

Notice that DMPC-P^L(t) does not involve the satisfaction of (36) because this vehicle leads the team, while the LF configuration is exclusively preserved by its follower nodes.

Remark 4: Although the l.h.s. of (36) gives rise to non-convex constraints, they are addressed by straightforwardly adapting the convexification results of [54] (Proposition 1, p. 292).
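To give a feel for the structure of such a follower problem, the following cvxpy sketch sets up a finite-horizon MPC for a discrete-time double integrator with input bounds, a terminal cost and a spacing constraint w.r.t. the predecessor's transmitted predictions. The weights, bounds and constraint shapes are illustrative assumptions, not the paper's exact formulation; in particular, the non-convex bearing constraint of (36) is replaced here by a simple convex spacing constraint.

```python
import numpy as np
import cvxpy as cp

def follower_mpc(x0, z_ref, x_pred_prev, N, Ts=0.1, u_max=2.0, d_min=2.0):
    """Illustrative finite-horizon MPC for one follower.

    x0          : current state [px, py, vx, vy]
    z_ref       : set-point provided by the routing layer
    x_pred_prev : (N+1, 4) predecessor's predicted states (transmitted sequence)
    """
    # discrete-time double-integrator model
    A = np.block([[np.eye(2), Ts * np.eye(2)], [np.zeros((2, 2)), np.eye(2)]])
    B = np.vstack([0.5 * Ts**2 * np.eye(2), Ts * np.eye(2)])

    x = cp.Variable((4, N + 1))
    u = cp.Variable((2, N))
    cost, constr = 0, [x[:, 0] == x0]
    for k in range(N):
        cost += cp.sum_squares(x[:, k] - z_ref) + 0.1 * cp.sum_squares(u[:, k])
        constr += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                   cp.norm(u[:, k], "inf") <= u_max,
                   # convex surrogate of the spacing constraint: stay behind
                   # the predecessor's predicted x-position by at least d_min
                   x[0, k + 1] <= x_pred_prev[k + 1, 0] - d_min]
    cost += cp.sum_squares(x[:, N] - z_ref)      # terminal cost (illustrative)
    cp.Problem(cp.Minimize(cost), constr).solve()
    return u.value[:, 0]                          # first move, receding horizon
```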

    B. Platoon Splitting and Queuing

Along the road links, the initial platoon can split into two or more sub-LF configurations and any two platoons (even singletons) can regroup.

Notice that the splitting phase occurs during the on-line operations when different routing decisions are provided to the platoon vehicles. More precisely, since the reward is periodically updated until the decision zone of any road junction is reached, a vehicle and its predecessor (father) could assume different directions of travel so that a disconnection into two or more sub-platoons takes place.

As a consequence of such an event, it is assumed that an adequate re-numbering procedure is implemented so that the two new platoons LF_1 and LF_2 are properly renamed.

On the other hand, two platoons could regroup according to the following reasoning. The idea is that LF_1 is added to LF_2, or vice versa, at the leaf node of the first LF chain. As a consequence, the associated DMPC controllers will be implemented over horizons of lengths N_h = N_{L_1} + h, h = 1,...,L_2.

The main difficulty with such an aim relies on the capability of the external platoon to detect the leaf node of the platoon to regroup with. In fact, no a-priori information about the roles and positions of the vehicles within the LF chain is made available to the external vehicles. Then, one can use the following technicality to overcome this hitch.

Statement 1: At each time instant t, each i-th vehicle of the LF_1 formation makes available a data packet of the form Count_i = {Count_i.child(i), Count_i.N_i}, where the integer child(i) accounts for its follower, i.e., child(i) = 0 implies no follower, and N_i is its horizon length. Such a packet can be acquired by another h-th vehicle if there exists a time instant t̄ such that

It is worth noticing that Statement 1 is simply implementable because current production robots are usually equipped with embedded, purpose-oriented communication devices with low energy consumption, see e.g., [55].

Then, the LF_2 platoon is able to regroup with LF_1 if condition (42) becomes valid for some i-th vehicle for which Count_i.child(i) = 0. This means that such a vehicle is a leaf node of the LF_1 formation and, as a consequence, communication facilities will be enabled and a new LF configuration will come out.
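The data-packet mechanism of Statement 1 can be sketched as follows; the field names follow the statement, while the proximity test standing in for condition (42), which is not reproduced here, is an assumption:

```python
from dataclasses import dataclass
import math

@dataclass
class Count:
    child: int   # index of the follower; 0 means no follower (leaf node)
    N: int       # prediction horizon length of the broadcasting vehicle

def can_regroup(packet: Count, pos_leaf, pos_external, comm_range=30.0):
    """LF2 may queue behind LF1 if a broadcast packet with child == 0 (leaf)
    is received; here a simple communication-range test replaces (42)."""
    in_range = math.dist(pos_leaf, pos_external) <= comm_range
    return in_range and packet.child == 0

def horizons_after_join(N_L1_leaf, L2_size):
    """Horizon lengths assigned to the joining platoon LF2:
    N_h = N_{L1} + h, h = 1,...,L2, as prescribed in the text."""
    return [N_L1_leaf + h for h in range(1, L2_size + 1)]
```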

The key question to investigate is whether splitting and queuing operations allow preserving the overall feasibility property of the DMPC strategy. To this end, the following arguments will be considered:

    imposed equal to the number of followers of the initial platoon

and this always allows providing an admissible sequence when the platoons LF_1 and LF_2 regroup. According to this reasoning, the positively invariant regions are first derived by resorting to the so-called delay-dependent time-delay scenario [57] and the prescribed constraints (4). To this end, by considering the state-feedback control law

    and the regulated plant

the delayed system technicalities of [57] are hereafter used. The feedback control law (45) stabilizes the plant and satisfies the prescribed constraints if the following matrix inequalities in the objective variables K_i, Q_i, Ψ_i, R_i and τ are satisfied

    Hence, the ellipsoidal set

Conversely, if the control horizon becomes N_i + k, a solution always exists because K_i x_i^*(t + N_i | t) can be consecutively applied as prescribed by (47)-(49), i.e.,
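As an indicative numerical example of how an ellipsoidal invariant set can be obtained for a given stabilizing gain, the following Python sketch solves a discrete-time Lyapunov equation and shrinks the level set to respect an input bound; it is a simplified, delay-free surrogate of the delay-dependent conditions (47)-(49), and the bound u_max is an assumption introduced only for illustration:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def invariant_ellipsoid(A, B, K, u_max=2.0):
    """Ellipsoidal invariant set {x : x' P x <= c} for x(t+1) = (A + B K) x(t).

    K is assumed such that A + B K is Schur stable. P is obtained from the
    discrete Lyapunov equation (A+BK)' P (A+BK) - P = -I, and the level c is
    chosen so that the feedback u = K x satisfies |u|_inf <= u_max inside the
    set (a sufficient, conservative condition)."""
    Acl = A + B @ K
    P = solve_discrete_lyapunov(Acl.T, np.eye(A.shape[0]))
    # max_{x' P x <= c} |k_j' x| = sqrt(c * k_j' P^{-1} k_j) for each input row j
    Pinv = np.linalg.inv(P)
    c = min(u_max**2 / (k_row @ Pinv @ k_row) for k_row in K)
    return P, c
```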

V. DISTRIBUTED DRL-MPC ALGORITHM

    The developments of the previous sections are here collected to define a computable algorithm based on the availability of an urban map network characterized in terms of admissible directions of travel for each road link.

Recursive feasibility and asymptotic stability of the DRL-MPC Algorithm are stated in the following proposition.

Proposition 3: Let the initial and final conditions x_i(0) and x_i^f, i = 1,...,L, be given. Then, the DRL-MPC Algorithm always satisfies the prescribed constraints, complies with the requirements of the PL-UE problem and ensures that the regulated state trajectories are asymptotically stable.

Proof: Under the hypothesis that feasible input sequences are available off-line both for the P^L and P_i^F optimization problems, the following arguments are exploited for feasibility purposes. The following scenarios can arise:

1) Single platoon phase: The leader feasibility arises from the fact that, if an optimal solution exists at the time instant t, namely u_1^*(t|t) = K_1(t) x_1(t), then at the next time instant t+1 the positive invariance property of Ξ_1(t) ensures that the state trajectory will at least remain confined within Ξ_1(t) and, by virtue of (22), (23), the transition to the new controller is asymptotically guaranteed. Similar arguments can be exploited for the generic follower because, after N_i-step-ahead state predictions, the state trajectory x_i(·) remains confined within the positively invariant set Ξ_i, in turn updated by (27);

2) Splitting mode: Without loss of generality, let us assume that two platoons arise, namely L_1 and L_2. At t+1 the local optimization P^{L_1} remains feasible because the domain of attraction Ξ_1(t) is invariant w.r.t. the state feedback K_1(t), while the feasibility of the leader of the new platoon L_2 is not affected because admissible, though not optimal, solutions always exist by virtue of the results of Proposition 1. Again, similar ideas apply to the follower vehicles;

3) Queuing mode: At t+1 the feasibility of the platoon L_2 is guaranteed by virtue of Proposition 2, which allows defining an admissible, though not optimal, control move sequence by exploiting the delay property of the PI region Ξ_i(t) computed in compliance with (47)-(49).

Finally, the closed-loop asymptotic stability derives from the same analysis because, according to each routing decision z_i(t), the vehicle position converges to x_i^f as t → ∞. ■

VI. SUMO VERSUS MATLAB CO-DESIGN

In this section, a simulation framework is developed with the aim of stressing the capabilities of the proposed DRL-MPC architecture in complex environments. Specifically, two software platforms are considered: the Simulation of Urban MObility (SUMO) software [45] and MATLAB 2022a.

Recall that SUMO is an open-source, widely accepted by the scientific community, highly portable, microscopic and continuous multi-modal traffic simulation package designed to handle large road networks [58]. Although from a macroscopic point of view traffic is usually described by means of departure times and routes with certain durations, in a real scenario it is highly conditioned by various other factors like driver mobility, driving styles, weather, infrastructure within the region or other incidents affecting the environment. As is well known, the environment complexity prevents any traffic modeling via explicit mathematical expressions [59]; therefore, the only chance consists in performing reliable simulations, instrumental to show critical waypoints or to predict traffic behaviors [45]. This software is particularly suitable to simulate URNs where different classes of vehicles (such as cars, trains, buses) lie. The simulation engine is based on a hybrid description: discrete-time and continuous-space. It enjoys collision avoidance capabilities, multi-lane roads with lane changing, junction-based right-of-way rules, and lane-to-lane connections. Moreover, SUMO provides all the URN data, the traffic demand and the resulting simulation in XML formats.

In the sequel, a co-design procedure is outlined in order to allow information sharing between SUMO and MATLAB. Essentially, this prescribes that, starting from the set of functions provided by TraCI4Matlab [60] that are customized to comply with the DQL-Algorithm, the co-design in terms of software communication and data exchange is performed. More in detail, the TraCI4Matlab toolbox allows the communication between any MATLAB application and the SUMO traffic simulator. The TraCI application-level protocol is based on the client-server paradigm and is built on top of the TCP/IP stack: the application developed in MATLAB plays the role of a client and it can access and modify the simulation environment provided by SUMO, which acts as a server. Hence, the proposed co-design is in charge of automatically mapping all the URN information into properly encoded MATLAB objects. Then, a MATLAB graph is obtained by exploiting the SUMO XML files, where the URN junctions (i.e., crossroads, traffic circles, road entrances and exits) are the graph nodes while the roads are the graph edges, see the explicative Fig. 5. A set of APIs has then been developed to get, modify or add information about vehicles within the simulation environment. For instance, these routines can be used to query SUMO where a vehicle is currently moving or to add a new vehicle in terms of its position, speed, attitude and driving features in the URN reference frame. In addition, the proposed toolbox straightforwardly allows obtaining data regarding road vehicle density, spatial constraints (i.e., road width, length and topology), obstacles, and so on.
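For readers who prefer SUMO's native Python bindings over TraCI4Matlab, an analogous client-server interaction can be sketched with the `traci` package; the configuration file name is a placeholder, and the queried quantities (per-edge vehicle counts, lane lengths, vehicle positions) are those needed by the reward computation and the controllers:

```python
import traci

# start SUMO as a server and connect as a client (client-server paradigm)
traci.start(["sumo", "-c", "andrea_costa.sumocfg"])  # placeholder config file

for _ in range(100):
    traci.simulationStep()                            # advance the simulation
    for edge_id in traci.edge.getIDList():
        n_veh = traci.edge.getLastStepVehicleNumber(edge_id)   # mu_j(t)
        length = traci.lane.getLength(edge_id + "_0")          # l_j (lane 0)
        density = n_veh / max(length, 1.0)            # per-length density proxy
    # query individual vehicles, e.g., the current position of the leader
    if "AV1" in traci.vehicle.getIDList():
        x, y = traci.vehicle.getPosition("AV1")

traci.close()
```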

As far as the DQL-Algorithm is concerned, the proposed co-design adapts to new operating scenarios by simply selecting a new SUMO XML description file, which in turn increases its flexibility for routing decision purposes. In particular, the designed APIs allow computing the local reward (13), while the optimization of the NN weights is achieved by means of the MATLAB Reinforcement Learning Toolbox and the MATLAB Deep Learning Toolbox [61], [62].

VII. SIMULATIONS

In this section, a set of simulations is presented with the aim of highlighting the performance of the proposed DRL-MPC algorithm by using real traffic data. The simulation setup has been implemented within the MATLAB R2022a environment by using the SUMO 1.11.0 toolbox, and the computations have been carried out on a Lenovo IdeaPad L340-15RH laptop equipped with an Intel Core i9 processor under a 64-bit operating system.

    TABLE I PLATOON INITIAL POSITIONS

    A. Vehicle Platoon, Urban Road Network and Constraints

In the sequel, the district around the “Andrea Costa” road in Bologna (IT) has been chosen for testing the features of the proposed scheme, see Fig. 6. To this end, the real traffic data set reported in [63] includes the area in the proximity of the city football stadium, which is well suited to simulating vehicle mobility during big and popular events such as football matches or concerts. In particular, these traffic data are related to Bologna's peak hour (8:00 am–9:00 am) and have been generated by considering the highest traffic demand during a football match on March 3rd 2010 and the daily flow one week before. Moreover, additional road information has also been considered: traffic light locations and plans, inductive loop positions, the public bus transportation system, and so on.

The URN has been built within the public project iTETRIS [64], which is specifically oriented to defining large-scale traffic management solutions for vehicular communications.

Operating scenario - The reference initial position is located at latitude 44°29′23.6″ N and longitude 11°18′30.2″ E. Within a local tangent plane East-North-Up (ENU) reference frame [65], the AVs' initial positions have been set as reported in Table I with v_i^x = 0 and v_i^y = 0, ∀i. The aim consists in driving all the vehicles towards the parking area located close to the target position on the south-east side, see Fig. 7.

Fig. 7. The URN description via SUMO: the leader initial (yellow) and target (red) locations are denoted with pins.

The following knobs are used for the DRL-MPC algorithm implementation: i = 1,...,L, α = 0.01 and γ = 0.99. Moreover, the decision zones (8) are straightforwardly derived according to the assumption that the decision is made 3.5 m before each junction. Finally, the NN architecture of Fig. 4 is customized within the MATLAB Deep Network Designer as depicted in Fig. 8, where the state input layer dimension is twice the number of edges of the considered URN, i.e., 2M = 358.

Fig. 8. Neural network from the MATLAB Deep Network Designer. The activation functions are labeled “elus” and “elua” because they are two ELU functions, one for the state path and another for the action path.
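The critic structure suggested by Fig. 8 (a state path and an action path merged before the output, with ELU activations) can be approximated by the following PyTorch sketch; the layer widths and the one-hot action encoding are assumptions, while only the state input size 2M = 358 is taken from the text:

```python
import torch
import torch.nn as nn

class QCritic(nn.Module):
    """Q(s, a) critic with separate state and action paths (cf. Fig. 8)."""
    def __init__(self, state_dim=358, action_dim=3, hidden=64):
        super().__init__()
        self.state_path = nn.Sequential(nn.Linear(state_dim, hidden), nn.ELU())
        self.action_path = nn.Sequential(nn.Linear(action_dim, hidden), nn.ELU())
        self.head = nn.Linear(hidden, 1)   # scalar Q-value

    def forward(self, state, action):
        return self.head(self.state_path(state) + self.action_path(action))

# usage: q_value = QCritic()(torch.zeros(1, 358), torch.zeros(1, 3))
```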

    B. Results

All the achieved results are collected in Figs. 9-16. As far as the Q-function is concerned, each AV benefits from a pre-trained initialization performed by using the MATLAB Reinforcement Learning Toolbox built-in routines and the SUMO data set related to the “Andrea Costa” district. Such a phase has been carried out by generating 2000 traffic episodes [49]. Each episode is here defined as a finite sequence of time instants where the first one corresponds to the initial vehicle position configuration of Table I, while the last corresponds to the target x_f (parking area). Notice that during the training the NN weights are updated according to (18). Specifically, the training process evolution is depicted in Fig. 9, where the episode reward (the sum of all the rewards received by the agent during each episode), the moving average reward and the expected episode reward are reported. There, it can be observed that the capability of the vehicle to select the most adequate decision improves as the training process evolves: the average reward (red line) increases, while the expected one (orange line) shows an asymptotically convergent behavior towards the effective reward.

Taking a look at the regulated behavior, all the prescribed constraints are always fulfilled, see Figs. 10 and 14. Hence, for the sake of comprehension, the attention will be focused on the two critical modes: platoon splitting and platoon queuing.

Fig. 9. Training from the MATLAB Reinforcement Learning Toolbox.

Fig. 10. 8:00 am–8:30 am traffic data set: constraints fulfillment. The bottom-left graph refers to a single platoon, while the right column refers to the two-platoon scenario (LF_1 (top) and LF_2 (bottom)).

Fig. 11. 8:00 am–8:30 am traffic data set: (a) single platoon before split; (b) splitting; (c) two platoons moving in the URN; (d) platoons reaching the target position. Yellow cars denote the traffic along the path.

The first experiment is performed by considering the traffic data in the time interval 8:00 am–8:30 am. In this case, the initial platoon configuration is kept the same until around

Fig. 12. 8:00 am–8:30 am traffic data set: single leader (blue line) and two leaders (AV_1 red line, AV_2 green line).

Fig. 13. 8:00 am–8:30 am traffic data set: routing decisions. Leader a_1 and first splitting vehicle a_9.

Fig. 14. 8:30 am–9:00 am traffic data set: constraints fulfillment. The bottom-left graph refers to a single platoon, the top-right and middle-right graphs to the two-platoon configuration, and the bottom-right one to the single platoon after queuing.

Fig. 15. 8:30 am–9:00 am traffic data set: (a) single platoon before split; (b) splitting; (c) two platoons moving in the URN; (d) platoons reaching the joining condition. In all the pictures, green vehicles are the leaders and red vehicles are the followers. Yellow cars denote the traffic along the path.

Fig. 16. 8:30 am–9:00 am traffic data set: leaders' trajectories in the case of a single platoon, before the split and after the join (blue line), and of two platoons (red line for LF_1 and green line for LF_2).

Fig. 17. 8:30 am–9:00 am traffic data set: routing decisions. Leader a_1 and first splitting vehicle a_6.

platoon reaches the target at t ≈ 715 s. For the sake of completeness, constraint evolutions and platoon behavior snapshots within SUMO are summarized in Figs. 14 and 15, respectively. Finally, Fig. 16 depicts the achieved paths during the prescribed mission.

    For the interested readers, simulation demos are available at the following web link: https://youtu.be/YcxVJctk75k.

Finally, some numerical comparisons have been carried out by contrasting the DQL algorithm with a well-reputed competitor, namely the Dijkstra algorithm [66]. To this end, the same DMPC controller has been exploited for both routing schemes. In addition to the simulations described above, a further scenario is addressed: within the time window 8:00 am–8:30 am, the platoon is initialized by imposing that all the vehicles have zero velocities, the leader is located at (x_1^1(0), x_1^2(0)) = (874.54, 979.20) m and the target is x_f = [1611.76, 883.38, 0, 0]^T m. Notice that the ENU reference frame is the same exploited for the other simulations. The results are collected in Table II in terms of the maximum travel time required, i.e., the time interval required by the leaf vehicle. Such a computation has been parceled out at 20%, 50% and 100% of the overall route length. As a common denominator, the three simulations show that during the first segment (20%) the two schemes have quite similar dynamical behaviors, while, as the platoon moves towards the target x_f, the proposed DRL-MPC increasingly outperforms the Dijkstra-based solution. The main reason for this behavior lies in the capability of the DQL algorithm to simply and quickly adapt its routing decisions to the time-varying traffic conditions, while this is difficult, or even unviable, for the Dijkstra counterpart.
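For reference, the Dijkstra baseline used in the comparison can be realized on the URN graph with standard tools; the sketch below (the edge attribute name is an assumption) computes a shortest route by static link length, which, unlike the DQL scheme, does not react to time-varying densities:

```python
import networkx as nx

def dijkstra_route(urn_graph: nx.DiGraph, source, target):
    """Static shortest path over the road-network graph by link length."""
    return nx.dijkstra_path(urn_graph, source, target, weight="length")

# A density-aware variant would instead re-weight edges on-line, e.g. with
# weight = length * (1 + rho_j(t)), which is closer in spirit to the DQL routing.
```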

VIII. CONCLUSIONS

In this paper, the path planning problem for vehicle platoons subject to routing decisions and operating in urban road networks has been addressed by means of a new control architecture based on the joint exploitation of DRL and MPC properties. As one of the main merits of the proposed approach, the DMPC algorithm has been formalized by fully taking advantage of the routing decisions obtained by the DRL unit. As a consequence, time-varying topologies of the multi-vehicle system, arising according to the minimization of traffic congestion criteria, have also been considered without compromising the overall feasibility. In the simulation section, a platoon of eleven vehicles, operating within the area around the football stadium of Bologna in Italy, has been used to highlight the main capabilities of the proposed DRL-MPC Algorithm. Future studies will follow two directions. First, it is important to extend the proposed approach to comply with obstacle avoidance requirements and with roadways having more than one lane, allowing the use of grid vehicle topologies. Then, the performance of the DRL algorithm can be improved by accelerating the distributed learning phase to get faster vehicle planning decision responses.

    TABLE II ROUTING TIMES IN THE TWO SIMULATED SCENARIOS
