
    Deep Reinforcement Learning for Addressing Disruptions in Traffic Light Control

Computers Materials & Continua, 2022, Issue 5

Faizan Rasheed, Kok-Lim Alvin Yau, Rafidah Md Noor and Yung-Wey Chong

1 School of Engineering and Computer Science, University of Hertfordshire, Hatfield, AL10 9AB, UK

2 Department of Computing and Information Systems, Sunway University, Subang Jaya, 47500, Malaysia

3 Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, 50603, Malaysia

4 National Advanced IPv6 Centre, Universiti Sains Malaysia, USM, Penang, 11800, Malaysia

Abstract: This paper investigates the use of multi-agent deep Q-network (MADQN) to address the curse of dimensionality issue that occurs in the traditional multi-agent reinforcement learning (MARL) approach. The proposed MADQN is applied to traffic light controllers at multiple intersections with busy traffic and traffic disruptions, particularly rainfall. MADQN is based on deep Q-network (DQN), which is an integration of the traditional reinforcement learning (RL) and the newly emerging deep learning (DL) approaches. MADQN enables traffic light controllers to learn, exchange knowledge with neighboring agents, and select optimal joint actions in a collaborative manner. A case study based on a real traffic network is conducted as part of a sustainable urban city project in the Sunway City of Kuala Lumpur in Malaysia. An investigation is also performed using a grid traffic network (GTN) to show that the proposed scheme is effective in a traditional traffic network. Our proposed scheme is evaluated using two simulation tools, namely Matlab and Simulation of Urban Mobility (SUMO). In the simulations, our proposed scheme reduces the cumulative delay of vehicles by up to 30%.

Keywords: Artificial intelligence; traffic light control; traffic disruptions; multi-agent deep Q-network; deep reinforcement learning

    1 Introduction

Traffic congestion has become a problem in most urban areas of the world, causing enormous economic waste, extra travel delay, and excessive vehicle emissions [1]. Traffic light controllers are installed to monitor and control the traffic flows at intersections in order to alleviate traffic congestion strategically. Each traffic light controller has: a) light colors, in which green represents "go", amber represents "slow down", and red represents "stop"; b) a traffic phase, in which a set of green lights is assigned to a set of lanes for safe and non-conflicting movements at the intersection; and c) a traffic phase split, which is the time period of a traffic phase. The traffic phase split includes a short moment of red lights for all lanes to provide a safe transition between traffic phases. The time passed since the respective lane light has changed to red is represented by the red timing.

Next, we present five main fundamentals related to our investigation in the use of MADQN in traffic light controllers for reducing the cumulative delay of vehicles. Firstly, various types of traffic light controllers for controlling and alleviating traffic congestion are presented. Secondly, various artificial intelligence approaches for accomplishing fully-dynamic traffic light controllers are presented. Thirdly, the traditional DQN approach is presented as an enhanced artificial intelligence approach applied to traffic light controllers. Fourthly, the significance of using the cumulative delay of vehicles as the performance measure is presented. Fifthly, the way traffic disruptions are introduced to traffic networks using the Burr distribution is presented. The contributions of this paper are also presented.

    1.1 Types of Traffic Light Controllers

Traditionally, traffic light controllers monitor traffic movements and determine traffic phase splits to control and reduce traffic congestion using three main techniques. Firstly, a deterministic traffic light controller is a pretimed control system that uses historical traffic data collected at different times and determines traffic phase splits using the Webster formula [2]. Secondly, a semi-dynamic traffic light controller is a system based on actuated control; it uses instantaneous or short-term traffic conditions, such as the absence and presence of vehicles, and assigns green lights to lanes with vehicles [3]. The short-term traffic condition can be detected using an inductive loop detector. Thirdly, a fully-dynamic traffic light controller is also a system based on actuated control; however, it uses longer-term traffic conditions, including the average queue length and waiting time of vehicles at a lane. The longer-term traffic condition (e.g., the average queue length and the waiting time of vehicles at a lane) can be measured by counting vehicles using at least two inductive loop detectors at a lane, one installed near the intersection and another installed further away from the intersection [4]. By measuring the number of vehicles, the average queue length and waiting time of vehicles at a lane can be calculated. A video-based traffic detector (or camera sensor) can also be installed at an intersection, and image processing can be used to calculate the number of vehicles at a lane of the intersection [5]. Subsequently, the controller adjusts the traffic phase split based on the traffic condition [6]. Fully-dynamic traffic light controllers are more realistic in monitoring traffic movements because they use longer-term traffic conditions, and this approach has been shown to alleviate traffic congestion more effectively compared to deterministic and semi-dynamic traffic light controllers [7–9].
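As an illustration of how the longer-term traffic condition can be derived from a pair of inductive loop detectors, the sketch below keeps a running count of vehicles between the far detector and the stop line; the class and variable names are hypothetical and the pulse-count interface is an assumption, not a specification of any particular detector hardware.

```python
# Illustrative sketch: estimating a lane's longer-term traffic condition from two
# inductive loop detectors (one far from the stop line, one near it). Detector
# readings are assumed to arrive as per-interval vehicle counts.
class LaneEstimator:
    def __init__(self):
        self.vehicles_in_segment = 0     # vehicles currently between the two detectors
        self.queue_samples = []          # history used for the longer-term average

    def update(self, entered, left):
        """entered: count from the detector far from the intersection,
        left: count from the detector near the intersection."""
        self.vehicles_in_segment = max(self.vehicles_in_segment + entered - left, 0)
        self.queue_samples.append(self.vehicles_in_segment)

    def average_queue_length(self):
        return sum(self.queue_samples) / max(len(self.queue_samples), 1)
```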

    1.2 Common Approaches of Fully-Dynamic Traffic Light Controllers

The fully-dynamic traffic light controller has commonly been accomplished using artificial intelligence approaches, particularly reinforcement learning (RL) [10,11] and multi-agent reinforcement learning (MARL) [12]. RL can be embedded in a single agent (or a decision maker), which is the traffic light controller, to learn, make an optimal action (i.e., traffic phase split), and improve its operating environment (i.e., the local traffic condition). In contrast, MARL can be embedded in multiple traffic light controllers (or agents) to exchange knowledge (i.e., Q-values), learn, make optimal joint actions, and improve their operating environment (i.e., the global traffic condition) in a collaborative manner. However, the curse of dimensionality occurs in single-agent RL and MARL, in which the state space is too large to be handled efficiently due to the complexity of the traffic congestion issue [13], and so this paper uses multi-agent deep Q-network (MADQN), which is based on the traditional single-agent DQN approach [14]. This paper extends our previous work [15], which mainly focuses on performance measures, such as throughput, waiting time, and queue length, that are unable to relate to drivers' experience. This paper investigates the use of MADQN in traffic light controllers at intersections with a high volume of traffic and traffic disruptions (i.e., rainfall) by measuring the cumulative delay of vehicles as the performance measure, which relates to drivers' experience.

    1.3 DQN as an Enhanced RL-Based Approach for Traffic Light Controllers

DQN is a combination of the newly emerging deep learning technique [16] and the traditional RL technique, conveniently called deep reinforcement learning (DRL) [17]. DQN solves the curse of dimensionality and provides two main advantages [18]: a) it reduces the learning time and computational cost incurred to explore different state-action pairs and identify the optimal actions; and b) it uses hidden layers to provide abstract and continuous representations of the complex and high-dimensional inputs (i.e., the state space), reducing the storage capacity needed to store an unlimited number of state-action pairs (or Q-values).

    1.4 Cumulative Delay of Vehicles as the Performance Measure Used by Traffic Light Controllers

The cumulative delay of vehicles is the average time (i.e., the average travelling and waiting times caused by congestion) taken by vehicles to travel from a source location to a destination location, which may require crossing multiple intersections. Compared to other measures, including the average queue length and waiting time of the vehicles at a lane, and throughput (i.e., the number of vehicles crossing an intersection), the cumulative delay can be directly perceived by drivers, and so it relates to the drivers' experiences. In other words, drivers perceive the difference between the actual and expected travel times when crossing multiple intersections. Investigations that use the cumulative delay of vehicles as the performance measure have gained momentum over the years with the use of DRL in traffic light control. Specifically, the cumulative delay is the most frequently used performance measure in the literature from January 2016 to November 2020, as compared to other performance measures, in investigations of the use of DRL in traffic light control, as shown in Fig. 1. The scientific literature databases, including Web of Science [19], IEEE Xplore Digital Library [20], and ScienceDirect [21], have been used to conduct this study. The cumulative delay of vehicles has gained momentum over the years due to its better reflection of real circumstances, particularly the drivers' experiences. While the cumulative delay has been used in the literature [22,23], it has not been applied to traffic congestion at multiple intersections under the presence of increased traffic volume and traffic disruptions, and so this paper adopts this measure to calculate the average time required by vehicles to cross multiple intersections.
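To make the measure concrete, a minimal sketch of how the cumulative delay of a single vehicle could be computed is given below; the free-flow speed, segment lengths, and function name are illustrative assumptions rather than the paper's exact formulation.

```python
# Illustrative sketch: cumulative delay as the extra time spent, relative to the
# expected free-flow travel time, while crossing multiple intersections.
def cumulative_delay(actual_times, segment_lengths, free_flow_speed):
    """actual_times[i]: measured travel time on road segment i (s);
    segment_lengths[i]: length of segment i (m);
    free_flow_speed: expected speed without congestion (m/s)."""
    expected_times = [length / free_flow_speed for length in segment_lengths]
    return sum(a - e for a, e in zip(actual_times, expected_times))

# Example: three 300 m segments, free-flow speed 15 m/s, measured times 45, 60, 30 s.
print(cumulative_delay([45.0, 60.0, 30.0], [300, 300, 300], 15.0))   # 75.0 s of delay
```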

    1.5 Burr Distribution for Introducing Traffic Disruptions to Traffic Networks

Traffic congestion can be categorized into: a) recurrent congestion (RC), caused by a high volume of traffic; and b) non-recurrent congestion (NRC), caused by erratic traffic disruptions, including accidents and rainfall [24,25]. In the literature [26], the arrival process of vehicles has been widely modeled by the Poisson process, whereby the inter-arrival time of vehicles follows an exponential distribution. The Poisson process incorporates RC naturally; however, it does not incorporate NRC, and so this work adopts the Burr distribution, a more flexible model of the inter-arrival time of vehicles, for scenarios with a high volume of traffic (i.e., RC) and traffic disruptions (i.e., NRC).
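For illustration, inter-arrival times can be drawn from a Burr Type XII distribution with SciPy as sketched below; the mapping of the paper's shape parameters (k, c) and scale parameter β onto SciPy's arguments, and the parameter values themselves, are assumptions for this example (the values actually used per rainfall scenario appear in Tab. 3).

```python
# Illustrative sketch: sampling vehicle inter-arrival times from a Burr Type XII
# distribution to generate RC/NRC arrival patterns. Parameter values are placeholders.
import numpy as np
from scipy.stats import burr12

c_shape, k_shape, beta_scale = 2.0, 1.5, 3.0   # hypothetical rainfall-scenario parameters
inter_arrivals = burr12.rvs(c_shape, k_shape, scale=beta_scale, size=1000,
                            random_state=np.random.default_rng(7))
arrival_times = np.cumsum(inter_arrivals)       # vehicle arrival timestamps in seconds
print(round(inter_arrivals.mean(), 2), round(arrival_times[-1], 1))
```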

    1.6 Contributions of the Paper

    Our contribution is to investigate the use of MADQN to traffic light controllers at intersections with a high volume of traffic and traffic disruptions(i.e.,rainfall).This work is based on simulation using the Burr distribution, which has been shown to model traffic disruptions in traffic networks accurately in [27].The performance measure is the cumulative delay of vehicles, which includes the average waiting and travelling times caused by congestion.This performance measure is used because the difference between the actual and expected travel times when crossing multiple intersections can be directly perceived by drivers,and so it relates to the drivers’experiences.In this paper,we aim to show a performance comparison between MADQN and MARL applied to traffic light controllers at intersections with a high volume of traffic and traffic disruptions in terms of the cumulative delay of vehicles,which has not been investigated in the literature despite its significance.

    Figure 1:The usage of three popular performance measures in the investigation of the use of DRL to traffic light control in recent years

    1.7 Organization of the Paper

    The paper is structured into six sections.

    ·Section 1 presents the introduction of traffic light controllers and the background of common approaches for traffic light controllers.It also presents the contributions of the paper.

    ·Section 2 presents the background of the Krauss vehicle-following model,DRL and MARL.The traditional algorithms of DRL and MARL are also presented in this section.

    ·Section 3 presents the literature review of DQN-based traffic light controllers.There are five main DQN approaches discussed in this section.

    ·Section 4 presents the proposed MADQN model for traffic light controllers.It presents the representations of the proposed MADQN model,including the state space,action space,and delayed reward, applied to traffic light controllers.It also presents the MADQN architecture and algorithm.

    ·Section 5 presents an application for sustainable city (i.e., Sunway city), and our simulation results and discussion.

    ·Section 6 concludes this paper with a discussion of potential future directions.

    2 Background

    This section presents the background of the Krauss vehicle-following model,DRL and MARL.The Krauss vehicle-following model is a mathematical model of safe vehicular movement,whereby a gap between two consecutive vehicles is maintained.The background of DRL includes the traditional single-agent deep Q-network (DQN) algorithm.The background of MARL includes its traditional algorithm.

    2.1 Krauss Vehicle-Following Model

    In 1997,Krauss developed a vehicle-following model based on the safe speed of vehicles.The safe speed is calculated as follows[28]:
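For reference, a standard form of the Krauss safe-speed expression, consistent with the variable definitions that follow, is:

$$ u_{safe}(t) = u_l(t) + \frac{g(t) - u_l(t)\,\tau_r}{\dfrac{u_l(t) + u_f(t)}{2b} + \tau_r} \qquad (1) $$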

where u_l(t) and u_f(t) represent the speeds of the leading and following vehicles at time t, respectively, and g(t) is the gap to the leading vehicle at time t. The driver's reaction time (e.g., one second) is represented by τ_r, and b is the maximum deceleration of the vehicle.

    In our study,the Krauss vehicle-following model is used to ensure the safe movement of vehicles at intersections of the Sunway city and grid traffic networks(see Section 5).

    2.2 Deep Reinforcement Learning

DRL incorporates deep neural networks (DNNs) into RL, which enables agents to learn the relationships between actions and states. DeepMind first proposed DQN [17], which is a DRL method, and it has been widely adopted in traffic light control [29]. DQN consists of a DNN, which is comprised of three main kinds of layers, namely the input layer, hidden layer(s), and output layer. In DQN, the neurons are interconnected with each other and can learn complex and unstructured data [30]. During training, the data flows from the input layer to the hidden layer(s), and finally to the output layer. DQN provides two main features: a) experience replay, in which experiences are stored in a replay memory and then randomly sampled for training; and b) a target network, which is a duplicate of the main network. The main network selects actions based on observations from the operating environment and updates its weights. The target network approximates the weights of the main network to generate its Q-values. During training, the Q-value of the target network is used to calculate the loss incurred by a selected action, which has been shown to stabilize training. After every certain number of iterations, the target network is updated. The main difference between the main and target networks is that the main network is used during the observation, action selection, and training processes, while the target network is used during the training process only.

Algorithm 1: The traditional single-agent DQN algorithm

1. procedure
2. for m = 1:M do {Observation process}
3.   observe current state s_t
4.   for t = 1:T do {Action selection process}
5.     select action v_t using Eq. (2)
6.     receive delayed reward r_{t+1}(s_{t+1}) and next state s_{t+1}
7.     store experience e_t = (s_t, v_t, r_{t+1}(s_{t+1}), s_{t+1}) in replay memory D_t {Training process}
8.     sample a random minibatch of experiences e_n from replay memory D_t
9.     for j = 1:N do
10.      set target y_j = r_{j+1}(s_{j+1}) if the episode terminates at s_{j+1}; otherwise y_j = r_{j+1}(s_{j+1}) + γ max_v Q(s_{j+1}, v; θ_j^-)
11.      compute the loss function using Eq. (3)
12.      perform a gradient descent optimization on (y_j − Q(s_j, v_j; θ_j))^2 with respect to θ_j using Eq. (5)
13.      reset θ^- = θ in every C steps
14.    end for
15.  end for
16. end for
17. end procedure

Algorithm 1 shows the DQN algorithm. In episode m ∈ M, the current state s_t ∈ S (or the decision-making factors) is observed by an agent. At time instant t ∈ T, the best-known (or greedy) action v_t^* ∈ V is selected by the agent as follows:

$$ v_t^{*} = \arg\max_{v_t \in V} Q_t(s_t, v_t; \theta_t) \qquad (2) $$

where Q_t(s_t, v_t; θ_t) is the Q-value, which indicates whether the action v_t is appropriate under state s_t, and θ_t are the parameters of the main network. After that, the agent receives the delayed reward r_{t+1}(s_{t+1}) and next state s_{t+1}, and then it stores its experience e_t = (s_t, v_t, r_{t+1}(s_{t+1}), s_{t+1}) in a replay memory D_t = (e_1, e_2, ..., e_t). After that, a minibatch of experiences e_n is sampled randomly by the agent from the replay memory D_t. Suppose the target network Q-value is Q_t(s_t, v_t; θ_t^-) and the main network Q-value is Q_t(s_t, v_t; θ_t). The target Q-value is fixed for C steps to stabilize the Q-values of the main network, and to reduce the loss between the Q-values of the target and main networks. To train the main network, the loss function is minimized at iteration j as follows:

$$ L_j(\theta_j) = \mathbb{E}_{(s_j, v_j) \sim p(s_j, v_j)}\big[(y_j - Q(s_j, v_j; \theta_j))^2\big] \qquad (3) $$

where p(s_j, v_j) is the probability distribution over state-action pairs (s_j, v_j), and y_j is a target, as follows:

$$ y_j = \begin{cases} r_{j+1}(s_{j+1}), & \text{if the episode terminates at } s_{j+1} \\ r_{j+1}(s_{j+1}) + \gamma \max_{v} Q(s_{j+1}, v; \theta_j^-), & \text{otherwise} \end{cases} \qquad (4) $$

where γ is a discount factor, in which the discounted reward γ max_v Q(s_{j+1}, v; θ_j^-) represents the long-term reward estimated by the maximum Q-value at iteration j+1, and the delayed reward r_{j+1}(s_{j+1}) represents the short-term reward. If the episode terminates at s_{j+1}, then y_j = r_{j+1}(s_{j+1}). The loss function gradient ∇_{θ_j} L_j(θ_j) is given as follows:

$$ \nabla_{\theta_j} L_j(\theta_j) = \mathbb{E}_{(s_j, v_j) \sim p(s_j, v_j)}\big[(y_j - Q(s_j, v_j; \theta_j))\, \nabla_{\theta_j} Q(s_j, v_j; \theta_j)\big] \qquad (5) $$

The target Q-values Q_t(s_t, v_t; θ_t^-) of the target network are updated by replacing the weights θ_j^- of the target network with the weights θ_j of the main network, in order to provide Q_j(s_j, v_j^*; θ_j^-) ≈ Q^*(s_j, v_j; θ_j), at every C steps (i.e., equivalent to a number of iterations [31]).
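A compact single-agent sketch of Algorithm 1 in PyTorch is shown below; the layer sizes, learning rate, and other hyperparameters are placeholders rather than the paper's settings (those appear in Tab. 2), and the code is meant only to make the interplay of Eqs. (2)–(5), the replay memory, and the C-step target update concrete.

```python
# Minimal single-agent DQN sketch: greedy selection (Eq. 2), replay memory,
# target y_j (Eq. 4), squared loss (Eq. 3), gradient step (Eq. 5), and a target
# network refreshed every C steps. Hyperparameters are illustrative only.
import random
from collections import deque
import torch
import torch.nn as nn

def make_net(n_states=5, n_actions=5):
    return nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(), nn.Linear(64, n_actions))

main_net, target_net = make_net(), make_net()
target_net.load_state_dict(main_net.state_dict())      # target network = duplicate of main
optimizer = torch.optim.Adam(main_net.parameters(), lr=1e-3)
replay, gamma, C, step = deque(maxlen=50_000), 0.9, 100, 0

def select_action(state):                               # Eq. (2): greedy action
    with torch.no_grad():
        return int(main_net(torch.tensor(state, dtype=torch.float32)).argmax())

def train_step(batch_size=32):
    global step
    if len(replay) < batch_size:
        return
    s, v, r, s_next, done = zip(*random.sample(replay, batch_size))
    s = torch.tensor(s, dtype=torch.float32)
    v = torch.tensor(v).unsqueeze(1)
    r = torch.tensor(r, dtype=torch.float32)
    s_next = torch.tensor(s_next, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    with torch.no_grad():                               # Eq. (4): target y_j from theta^-
        y = r + gamma * target_net(s_next).max(dim=1).values * (1.0 - done)
    q = main_net(s).gather(1, v).squeeze(1)             # Q(s_j, v_j; theta_j)
    loss = nn.functional.mse_loss(q, y)                 # Eq. (3)
    optimizer.zero_grad()
    loss.backward()                                     # Eq. (5): gradient of the loss
    optimizer.step()
    step += 1
    if step % C == 0:                                   # reset theta^- = theta every C steps
        target_net.load_state_dict(main_net.state_dict())
```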

    2.3 Multi-Agent Reinforcement Learning

MARL is an extension of the traditional RL approach that enables multiple agents to exchange information with each other in order to achieve the optimal network-wide performance [32]. The main purpose is to optimize the network-wide objective function, such as the global Q-value that sums up the local Q-values of all agents in a single network, as time goes by t = 1, 2, 3, .... Algorithm 2 shows the MARL algorithm. At time instant t ∈ T, an agent i observes its current local state s^i_t ∈ S, sends its own Q-value Q^i_t(s^i_t, v^i_t) to its neighboring agents J^i, receives the optimal Q-value max_{v^j ∈ V} Q^j_t(s^j_t, v^j) from each neighboring agent j ∈ J^i, and selects an action v^{i,*}_t ∈ V as follows:

Agent i receives a delayed reward r^i_{t+1}(s^i_{t+1}) for the state-action pair (s^i_t, v^i_t) under the next state s^i_{t+1} ∈ S at time instant t+1, and then the Q-value Q^i_t(s^i_t, v^i_t) for the state-action pair is updated as follows:

where Δ^i_t(s^i_t, v^i_t) represents a temporal difference, as follows:

where n_{i,j} represents the importance (or weight) of an agent j in the neighborhood of agent i.

Algorithm 2: MARL algorithm embedded in agent i

1. procedure
2. observe current state s^i_t ∈ S
3. send Q-value Q^i_t(s^i_t, v^i_t) to neighboring agents J^i
4. receive max_{v^j ∈ V} Q^j_t(s^j_t, v^j) from each agent j ∈ J^i
5. select action v^i_t ∈ V using Eq. (6)
6. receive delayed reward r^i_{t+1}(s^i_{t+1})
7. update Q-value Q^i_{t+1}(s^i_t, v^i_t) using Eq. (7)
8. end procedure
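The tabular sketch below illustrates the structure of Algorithm 2: each agent keeps a local Q-table, shares its best local Q-value with its neighbors, and applies a temporal-difference update. The exact way the weighted neighbor terms n_{i,j} enter the action selection and update is given by Eqs. (6)–(8); here the weighted neighbor values are simply added to the bootstrapped target as one plausible reading, so treat this as a structural illustration under that assumption.

```python
# Heavily simplified tabular MARL sketch: local Q-table, neighbor Q-value exchange,
# and a temporal-difference update. The placement of the n_{i,j}-weighted neighbor
# terms is an assumption made for illustration.
from collections import defaultdict

class MarlAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.Q = defaultdict(float)                    # Q[(state, action)]
        self.actions, self.alpha, self.gamma = actions, alpha, gamma

    def best_q(self, state):                           # value shared with neighbors
        return max(self.Q[(state, a)] for a in self.actions)

    def select_action(self, state):                    # greedy over the local Q-table
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, state, action, reward, next_state, neighbor_qs, weights):
        """neighbor_qs[j]: max_v Q^j received from neighbor j; weights[j]: n_{i,j}."""
        coop = sum(weights[j] * q for j, q in neighbor_qs.items())
        target = reward + self.gamma * (self.best_q(next_state) + coop)
        delta = target - self.Q[(state, action)]       # temporal difference
        self.Q[(state, action)] += self.alpha * delta
```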

    3 Literature Review

    This section presents a literature review of DQN-based traffic light controllers,which have shown to achieve various performance measures.Five main DQN approaches have been proposed to reduce the cumulative delay of vehicles.In general, the DQN approaches are embedded in traffic light controllers.The DQN model has an input layer that receives state,and an output layer that provides Q-values for possible actions(e.g.,traffic phases[33]and traffic phase splits[34]).

    3.1 Traditional DNN-Based DQN

The application of the traditional DNN-based DQN approach to traffic light control is proposed in [33,35]. In Wan's DNN-based DQN model [33]: a) the state represents the current traffic phase, the queue length, and the green and red timings; b) the action represents a traffic phase; and c) the reward represents the waiting time of vehicles. In Tan's DNN-based DQN model [35]: a) the state represents the queue length of vehicles; b) the action represents a traffic phase; and c) the reward represents the queue length and waiting time, as well as throughput. The proposed schemes have been shown to reduce the cumulative delay [33,35] and queue length [35] of vehicles, and improve throughput [33].

    3.2 CNN-Based DQN

The convolutional neural network (CNN)-based DQN approach enables agents to analyze visual imagery of traffic in an efficient manner. The agents process the input states, which are represented in the form of a two-dimensional matrix (i.e., multiple rows and columns of values, such as an image) or one-dimensional vectors (e.g., a single row or column of values, such as the queue length of a lane) [29]. The application of the CNN-based DQN approach to traffic light control is proposed in [22,23,34,36]. In Genders' and Gao's CNN-based DQN models [22,34]: a) the state represents the current traffic phase, as well as the position and speed of vehicles; b) the action represents a traffic phase [22] and a traffic phase split [34]; and c) the reward represents the waiting time of vehicles. In Wei's CNN-based DQN model [36]: a) the state represents the current traffic phase, as well as the position and queue length of vehicles; b) the action represents a traffic phase; and c) the reward represents the waiting time and queue length. In Mousavi's CNN-based DQN model [23]: a) the state represents the current traffic phase and queue length; b) the action represents a traffic phase; and c) the reward represents the waiting time of vehicles. The proposed schemes have been shown to reduce the cumulative delay [22,23,34,36], waiting time [34], and queue length [22,23,36] of vehicles, as well as improve throughput [22,36].

    3.3 SAE-Based DQN

The stacked auto-encoder (SAE) neural network-based DQN approach enables agents to perform encoding and decoding functions, and store inputs efficiently. The application of the SAE neural network-based DQN approach to traffic light control is proposed in [37]. In Li's SAE neural network-based DQN model [37]: a) the state represents the queue length; b) the action represents a traffic phase split; and c) the reward represents the queue length and waiting time. The simulation results have shown that the proposed scheme can reduce the cumulative delay and queue length of vehicles.

    3.4 LSTM-Based DQN with A2C

The long short-term memory (LSTM) neural network-based DQN approach enables agents to memorize previous inputs of a traffic light control using a memory cell that maintains a time window of states. The advantage of the advantage actor-critic (A2C)-based method is that it combines the value-based and policy-gradient (PG)-based DQN methods. Each agent has an actor that controls how it behaves (i.e., PG-based) and a critic that measures the suitability of the selected action (i.e., value-based). The application of the LSTM neural network-based DQN with A2C approach to traffic light control has been proposed in [38]. In [38]: a) the state represents the queue length; b) the action represents a traffic phase; and c) the reward represents the queue length and waiting time. The simulation results have shown that the proposed scheme can reduce the cumulative delay and queue length of vehicles, as well as improve throughput.

    3.5 MADQN

The MADQN approach allows multiple DQN agents to share knowledge (i.e., Q-values), learn, and make optimal joint actions (i.e., traffic phase splits) in a collaborative manner. In Rasheed's MADQN model [15]: a) the state represents the queue length, the current traffic phase, the red timing, and the rainfall intensity; b) the action represents a traffic phase split; and c) the reward represents the waiting time of the vehicles. The simulation results have shown that the proposed scheme can reduce the waiting time and queue length at a lane, and improve throughput. In this paper, we extend the work in [15] by evaluating the cumulative delay of vehicles incurred by MARL and MADQN at multiple intersections in the presence of traffic disruptions (i.e., rainfall).

    4 Our Proposed MADQN Approach for Traffic Light Controllers

In a traffic network, a set of intersections I is considered in this paper, whereby i ∈ I is an intersection in which: a) K^i is the set of incoming lanes, and b) J^i is the set of neighboring intersections. Fig. 2 shows an abstract model of MADQN, in which the agent i and its neighboring agents j = 1 ∈ J^i and j = 2 ∈ J^i share the same traffic environment. This research uses four traffic phases: a) the north-east bound traffic phase; b) the east-south bound traffic phase; c) the west-north bound traffic phase; and d) the south-west bound traffic phase. The traffic phases are activated in a round-robin fashion by the traffic light controllers at the intersections, and our MADQN approach is used to adjust the time intervals of the traffic phases (i.e., the traffic phase splits). Our proposed MADQN approach, including the MADQN model (i.e., the state, action, and delayed reward representations), the MADQN architecture, and the MADQN algorithm with its complexity analysis, is presented in the remainder of this section.

    4.1 MADQN Model

    MADQN has three main advantages as compared to MARL as follows:

    ·MADQN uses DNNs, which provide the state space with its continuous representation.Consequently,it represents an unlimited number of pairs of state-action.

    ·MADQN addresses the curse of dimensionality by providing efficient storage for complex inputs.

    ·MADQN uses a target network with experience replay, and so it improves the stability of training.

The remainder of this subsection presents the representations of the state, action, and delayed reward of the MADQN model at an intersection i at time t.

    4.1.1 State

· The current traffic phase sub-state ∈ {0, 1, 2, 3} is a discrete state. The north-east bound traffic phase is represented by a 0 value, the east-south bound by 1, the west-north bound by 2, and the south-west bound by 3.

· The queue-length sub-state ∈ {0, 1, 2, 3}, ∀k ∈ K^i, represents the queue length of the incoming lanes K^i, and it is a continuous state. No vehicle at a lane is represented by a 0 value, ≤ 25% occupancy is represented by 1, > 25% and ≤ 50% is represented by 2, and > 50% is represented by 3. The occupancy can be measured using inductive loop detectors installed at intersections.

· The neighboring queue-length sub-state ∈ {0, 1, 2, 3}, ∀k ∈ K^j, represents the queue length of the incoming lanes k ∈ K^j at a neighboring intersection j, and it is a continuous state. Both queue-length sub-states have a similar representation.

· The red-timing sub-state represents the red timing of the current traffic phase, and it is a continuous state.

· The rainfall-intensity sub-state ∈ {0, 1, 2, 3, ..., s^i_5} represents the intensity of rainfall, where a 0 value means no rain and the maximum value s^i_5 means the heaviest rain; it is a continuous state. Simply put, this sub-state represents the intensity of the disruption.
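The sketch below shows one way the five sub-states above could be packed into the state vector fed to the DQN input layer; the occupancy thresholds follow the description above, while the function names and the aggregation of per-lane queues into a single level are assumptions made for illustration.

```python
# Illustrative sketch: encoding the five sub-states into a state vector.
def occupancy_level(occupancy):
    """Map lane occupancy in [0, 1] to the queue-length level in {0, 1, 2, 3}."""
    if occupancy == 0.0:
        return 0
    if occupancy <= 0.25:
        return 1
    if occupancy <= 0.50:
        return 2
    return 3

def build_state(current_phase, own_occupancy, neighbor_occupancy,
                red_timing_s, rainfall_intensity):
    return [current_phase,
            occupancy_level(own_occupancy),
            occupancy_level(neighbor_occupancy),
            red_timing_s,
            rainfall_intensity]

print(build_state(2, 0.4, 0.1, 12.0, 3))   # -> [2, 2, 1, 12.0, 3]
```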

    Figure 2:An abstract model of agent i and its neighboring agents j =1 ∈Ji and j =2 ∈Ji in MADQN.The cumulative delay of vehicles increases as vehicles travel from one intersection to another

    4.1.2 Action

The action v^i_t ∈ V represents a selected action, which is a traffic phase split v^i_t ∈ {0, 1, 2, 3, 4} in a fixed, predetermined round-robin sequence of traffic phases, where v^i_t = 0 skips a traffic phase, for instance, due to the absence of a waiting vehicle at a lane. The north-east bound traffic phase is represented by a 1 value, the east-south bound by 2, the west-north bound by 3, and the south-west bound by 4. Hence, agent i can select to switch to another traffic phase or to keep the current traffic phase.

    4.1.3 Delayed Reward

An agent receives delayed rewards that vary with the average waiting time of the vehicles at the intersections. Traffic congestion can cause an increment in the average waiting time of the vehicles at the intersections. The delayed reward r^i_{t+1}(s^i_{t+1}) is a relative value that represents the difference between the average total waiting times of all vehicles at an intersection i at times t and t+1 (i.e., before and after taking an action v^i_t), whereby W^i_t > W^i_{t+1} gives a positive delayed reward, W^i_t = W^i_{t+1} gives a zero delayed reward, and W^i_t < W^i_{t+1} gives a negative delayed reward.
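A one-line sketch of this reward, with hypothetical function and argument names, is given below.

```python
# Illustrative sketch: the delayed reward is the drop in the average total waiting
# time at intersection i between times t and t+1 (positive when congestion eases).
def delayed_reward(avg_waiting_before, avg_waiting_after):
    return avg_waiting_before - avg_waiting_after

print(delayed_reward(42.0, 35.5))   # 6.5 -> positive reward, waiting time decreased
```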

    4.2 MADQN Architecture

Fig. 3 shows the DQN architecture. There are three main components in an agent, namely the main network, the target network, and the replay memory. The main network consists of a DNN with its weights θ^i_t used to approximate its Q-values Q^i_t(s^i_t, v^i_t; θ^i_t). The main network is used to choose an action v^i_t for a particular state s^i_t observed from the operating environment in order to achieve the best possible delayed reward r^i_{t+1}(s^i_{t+1}) and next state s^i_{t+1} at the next time instant t+1. The target network is a copy (or duplicate) of the main network with its weights θ^{i-}_t used to approximate its Q-values Q^i_t(s^i_t, v^i_t; θ^{i-}_t). The target network is used during training only, while the main network is used during both action selection and training. The replay memory represents the dataset of an agent's experiences D^i_t gathered during the interaction between the agent and its operating environment as time goes by. The experiences D^i_t are used for training purposes.

The DNN has three kinds of layers (see Section 2). In our DQN architecture, there are 5 neurons in the input layer, whereby each neuron represents a sub-state. The number of fully connected (FC) hidden layers is 5, whereby there are 400 neurons in each layer. There are 5 neurons in the output layer, whereby a possible action is represented by each neuron. A weight is associated with each link. Each neuron consists of a rectified linear unit (ReLU) activation function, on which the gradient descent is performed. The 5 sub-states of state s^i_t are fed to the input layer by its neurons. Subsequently, signals are forwarded to the output layer via the hidden layers, which provides the Q-values of the possible actions v^i_t at intersection i during training.
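A PyTorch rendering of this network (5 input neurons, five fully connected hidden layers of 400 ReLU neurons, and 5 output neurons, one per possible action) is sketched below as an assumed but representative implementation; the framework choice is ours, not the paper's.

```python
# Sketch of the DQN architecture described above: 5 -> (400 x 5, ReLU) -> 5.
import torch.nn as nn

def build_madqn_network(n_substates=5, n_hidden_layers=5, width=400, n_actions=5):
    layers, in_features = [], n_substates
    for _ in range(n_hidden_layers):
        layers += [nn.Linear(in_features, width), nn.ReLU()]
        in_features = width
    layers.append(nn.Linear(in_features, n_actions))   # one Q-value per traffic phase split
    return nn.Sequential(*layers)

q_network = build_madqn_network()
```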

    Figure 3:DQN architecture

    4.3 MADQN Algorithm

    In this section, the extension of the traditional DQN approach to MADQN for multiple intersections is presented, and it has not been explored in the literature.The proposed MADQN algorithm is evaluated in simulation under different traffic networks (see Section 5) in the presence of traffic disruptions.

MADQN allows knowledge to be learned and exchanged among multiple DQN agents for coordination. The traditional MARL approach enables an agent to choose an optimal action based on its neighboring agents' actions. In the moving target scenario, actions are selected independently by the agents simultaneously, and so the action selected by an agent i can affect the operating environment of its neighboring agents J^i. Therefore, the moving target scenario increases the dynamicity of the operating environment, which affects learning stability. For instance, at an upstream intersection, a traffic light controller i's action can have positive or negative effects on the congestion level of the downstream and neighboring intersections J^i, since vehicles move from one intersection to another. Likewise, agent i's action at an intersection can be affected by the actions of agents J^i at neighboring intersections. By exchanging knowledge and coordinating the agents, convergence to an optimal action in a multi-agent system has been shown in the literature [39]. The summation of the local Q-values of the agents is known as the global Q-value, which represents the global objective function. An optimal equilibrium is attained when convergence is achieved by the global Q-value. The convergence is attributed to: a) an agent updating its Q-values by using Q-values from the neighboring agents J^i; b) the availability of a local view of the neighboring agents J^i at an agent i; and c) an agent's action being the best response to its neighbors' actions. MADQN addresses the moving target problem by taking neighboring agents' actions into consideration and coordinating among the agents in a collaborative manner, in order to converge to an optimal joint action and achieve stability in a shared environment.

Algorithm 3: MADQN algorithm embedded in agent i (computational, message, and storage complexities are noted per step)

1. procedure
2. for m = 1:M do {Observation process}
3.   observe current state s^i_t ∈ S {Knowledge sharing process}
4.   send Q-value Q^i_t(s^i_t, v^i_t; θ^i_t) to neighboring agents J^i   [message: ≤ |J|]
5.   receive max_{v^j ∈ V} Q^j_t(s^j_t, v^j; θ^j_t) from each agent j ∈ J^i   [message: ≤ |J|]
6.   for t = 1:T do {Action selection process}
7.     select action v^i_t using Eq. (9)
8.     receive delayed reward r^i_{t+1}(s^i_{t+1}) and next state s^i_{t+1}
9.     store experience e^i_t = (s^i_t, v^i_t, r^i_{t+1}(s^i_{t+1}), s^i_{t+1}) in replay memory D^i_t   [computational: O(|S||A|); storage: ≤ |S||A|] {Training process}
10.    sample a random minibatch of experiences e^i_n from replay memory D^i_t
11.    for j = 1:N do
12.      set target y^i_j = r^i_{j+1}(s^i_{j+1}) if the episode terminates at s^i_{j+1}; otherwise y^i_j = r^i_{j+1}(s^i_{j+1}) + γ max_{v^i} Q^i(s^i_{j+1}, v^i; θ^{i-}_j)
13.      compute the loss function using Eq. (3)   [computational: O(|S||A|)]
14.      perform a gradient descent optimization on (y^i_j − Q^i_j(s^i_j, v^i_j; θ^i_j))^2 with respect to θ^i_j using Eq. (5)   [computational: O(|S||A|)]
15.      reset θ^{i-} = θ^i in every C steps
16.    end for
17.  end for
18. end for
19. end procedure

Algorithm 3 shows the algorithm for MADQN, which is an extension of Algorithm 2. In episode m ∈ M, the DQN agent i observes the current state s^i_t and sends its own Q-value Q^i_t(s^i_t, v^i_t; θ^i_t) to the agents in the neighborhood J^i. Next, it receives the optimal Q-value max_{v^j ∈ V} Q^j_t(s^j_t, v^j; θ^j_t) from each agent j in the neighborhood J^i. At time instant t, the agent i selects an optimal action v^{i,*}_t as follows:

where n_{i,j} represents the weight (or importance) of a neighboring agent j at agent i.

    The proposed MADQN algorithm allows an agent to receive Q-values from neighboring agents and use them to select the optimal action.

    4.4 Complexity Analysis

This section presents an analysis of the computational, message, and storage complexities of the proposed MADQN algorithm (i.e., Algorithm 3) applied to traffic light controllers. In this paper, the complexity analysis has two levels: a) step-wise, which considers a single execution or iteration of the MADQN algorithm; and b) agent-wise, which considers all the state-action pairs of an agent. Note that only exploitation actions are considered while analyzing the algorithm.

Computational complexity estimates the maximum number of iterations, episodes, and so on, executed to calculate the Q-values under all possible states. The step-wise computational complexity is considered under a particular state. An agent i stores its experience e^i_t upon receiving a delayed reward and the next state, so the step-wise computational complexity is calculated as O(|A|) (Step 9), since there are |A| possible actions for each state. The agent-wise complexity is calculated as O(|S||A|), since there are |S| possible states.

    Message complexity is the number of messages exchanged between the agents in order to calculate the Q-values.An agentiexchanges its knowledge(i.e.,Q-values)with its neighboring agentsJ(Steps 4-5),so the step-wise message complexity is given by ≤|J|since there are|J|neighboring agents.The agent-wise complexity is calculated as ≤|J|.

    Storage complexity is the amount of memory needed to store knowledge(i.e.,Q-values)and the experiences of agents.An agentistores its experience(Step 9),so the stepwise storage complexity has a value of 1,and the agent-wise complexity is calculated as ≤|S||A|.

    5 Application for Sustainable City and Simulation Results

An investigation of the proposed scheme has been conducted in a case study based on a real traffic network, which is part of a sustainable urban city project in the Sunway City of Kuala Lumpur in Malaysia. An investigation is also performed using a grid traffic network (GTN) to understand the performance of the proposed scheme in a complex traffic network. Hence, our investigation covers both real-world and complex traffic networks, which are based on simulation. In this paper, traffic networks with left-hand traffic are considered, in which the traffic movement for the left turn is either protected or does not conflict with other traffic movements. This section also presents the simulation results and discussion for our simulation in both RC and NRC environments.

    5.1 Sunway City in Kuala Lumpur

    Sunway city is one of the sustainable and smart cities in Malaysia [40].It has busy commercial areas, residential areas with high density (i.e., LaCosta and Sunway Monash Residence), higher educational institutions (i.e., Monash University Malaysia campus and Sunway University), health centre (i.e., Sunway Medical Centre), amusement park (i.e., Sunway Lagoon), hotel (i.e., Sunway Resort Hotel&Spa),and so on,as shown in Fig.4.In Fig.4,the Sunway city traffic network(SCTN)has seven intersections,whereby every intersection has a traffic light controller.Fig.5 shows the traffic phases,and Tab.1 shows the traffic phase splits of existing(i.e.,deterministic)traffic light controllers at all intersections in SCTN.The traffic phase splits were observed during the evening busy hours(i.e.,5–7 pm)of a working day,and they were measured using a stopwatch.

    Malaysia ranks third and fifth worldwide in the number of lightning strikes (i.e., around 240 thunderstorm days/year [41]) and rainfall (i.e., around 1000 mm/year [42]), respectively.So, traffic congestion caused by traffic disruptions(i.e.,rainfall)during the peak hours is a serious problem.

    In this paper,we apply our proposed algorithm to the traffic network of Sunway city.Investigation is conducted in the traffic simulator SUMO[43].

    5.2 Grid Traffic Network

A GTN, which is a complex traffic network, has been widely adopted in the literature [15,44–47] to conduct similar investigations, and so it is selected for investigation in this paper to show that the proposed scheme is effective. This paper uses a 3×3 GTN with nine intersections, whereby a traffic light controller is installed at each intersection, and each intersection has four legs in four different directions (i.e., north bound, south bound, east bound, and west bound). Each leg has two lanes so that a vehicle can either enter or leave the leg of an intersection.

    Figure 4:A SCTN and the locations of its traffic light controllers

    Figure 5:Seven intersections of the SCTN shown in Fig.4.(a) Intersection 1 (b) Intersection 2 (c)Intersection 3(d)Intersection 4(e)Intersection 5(f)Intersection 6(g)Intersection 7

    5.3 Simulation Settings

This subsection provides the specification of the simulation setup. Two different traffic networks are investigated: a) SCTN with seven intersections (see Fig. 4), and b) a 3×3 GTN with nine intersections. While SCTN is based on a real-world traffic network, GTN is a complex traffic network traditionally used in traffic light control investigations [15,44–47]. So, these traffic networks are chosen to investigate the effectiveness of the proposed scheme in both real-world and complex traffic networks. The simulations are conducted using Matlab [48] and the traffic simulator SUMO [43]. The traffic control interface protocol of SUMO (i.e., TraCI4Matlab [49]) is used to interconnect Matlab and SUMO. Both SCTN and GTN are designed using NetEdit, which is the traffic network editor of SUMO. The resource files in XML provide the details of the speed limits and arrival rates of vehicles, which define the RC and NRC traffic congestion levels and their effects on the traffic networks. The total duration of the simulations is up to 100 episodes. The steps of each episode are provided in Steps 2 to 18 of Algorithm 3.
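The paper interconnects Matlab and SUMO through TraCI4Matlab; the sketch below shows a functionally similar control loop using SUMO's Python TraCI bindings instead, with placeholder file names, element IDs, and a trivial phase-split rule standing in for the MADQN agent.

```python
# Illustrative sketch: driving a SUMO simulation over TraCI (Python bindings).
import traci

traci.start(["sumo", "-c", "sctn.sumocfg"])              # hypothetical SUMO configuration
TLS_ID, LANE_ID = "intersection_1", "lane_1_0"           # hypothetical element IDs

for step in range(3600):                                 # one simulated hour at 1 s/step
    traci.simulationStep()
    queue = traci.lane.getLastStepHaltingNumber(LANE_ID) # halted vehicles on the lane
    if step % 30 == 0:
        # In the proposed scheme, the MADQN agent would choose the phase split here.
        traci.trafficlight.setPhaseDuration(TLS_ID, 20 if queue < 5 else 40)
traci.close()
```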

    Table 1:Traffic phase splits for existing traffic light controllers in SCTN

    5.4 Parameters of Simulation and Performance Measure

The simulation parameters that allow the best possible results for a DQN agent are presented in Tab. 2. Up to 50,000 experiences can be stored in the replay memory, and up to 100 experiences can be sampled randomly to form a minibatch. The parameter values presented in Tab. 2 have been shown to provide the best possible performance in the literature [44].

Tab. 3 presents the simulation parameters for the Burr type XII distribution model, which covers various intensities of rainfall, including the no rain (NR), light rain (LR), moderate rain (MR), and heavy rain (HR) scenarios [27]. A lower and a higher scale parameter β value shrinks and stretches the distribution, respectively. The shape parameters k and c are reciprocals of the scale parameter β. The shape parameters k and c, as well as the scale parameter β, increase with the intensity of rainfall [27].

    Table 2:Parameters of simulation for the DQN agent

    The performance measure used in this paper is the cumulative delay of the vehicles.Our proposed scheme aims to reduce the cumulative delay required by vehicles to cross multiple intersections.The cumulative delay also includes the average travelling and waiting times during congestion caused by RC and NRC.The total number of vehicles is 1000 per episode.

    5.5 Results and Discussion

This section compares the performance measures achieved by our proposed MADQN, MARL, and the baseline approaches under RC and NRC traffic congestion. Fig. 6 presents the MADQN model loss during the training process under RC and NRC traffic congestion in SCTN and GTN. The lower model loss enhances the performance of both SCTN and GTN under both types of traffic congestion (i.e., RC and NRC).

    Table 3:Parameters of simulation for the Burr distribution model

    Figure 6:The MADQN model loss under RC and NRC traffic congestions in SCTN and GTN reduces with episode

    5.5.1 Accumulated Delayed Reward

    The accumulated delayed reward for MARL and MADQN under RC and NRC traffic congestions increases with episode in SCTN, as well as in GTN as shown in Fig.7.The accumulated delayed reward for both MARL and MADQN approaches becomes steady after 50 episodes.As compared to MARL,the accumulated delayed reward achieved by MADQN is higher in both types of traffic congestions(i.e.,RC and NRC)and traffic networks(i.e.,SCTN and GTN).Overall,MADQN increases accumulated delayed reward by up to 10%and 12.5%under RC and NRC traffic congestions in the SCTN,and up to 8.3%and 7.2%under RC and NRC traffic congestions in the GTN,respectively.

    Figure 7:Accumulated delayed reward under RC and NRC traffic congestions in SCTN and GTN increases with episode.The accumulated delayed reward of MADQN is higher than that of MARL,which enhances the performances of both SCTN and GTN.(a)RC at SCTN(b)RC at GTN(c)NRC at SCTN(d)NRC at GTN

    5.5.2 Cumulative Delay of the Vehicles

    The cumulative delay of the vehicles for MADQN,MARL and the baseline approaches,under RC and NRC traffic congestions reduces with episode in SCTN as shown in Fig.8.For both RC and NRC traffic congestions,MADQN outperforms MARL and the baseline approaches,in which MADQN achieves a lower cumulative delay compared to MARL and the baseline approaches under increased traffic volume.Similar trend is observed in GTN under RC and NRC traffic congestions as shown in Fig.8.Overall, MADQN reduces cumulative delay of vehicles by up to 27.7% and 27.9% under RC and NRC traffic congestions in SCTN, and up to 28.5% and 27.8% under RC and NRC traffic congestions in GTN,respectively.

Figure 8: Cumulative delay of the vehicles under RC and NRC traffic congestion in SCTN and GTN reduces with episode. The cumulative delay of the baseline approach is fixed, while those of the MARL and MADQN approaches reduce with episode. MADQN has a lower cumulative delay than MARL and the baseline approaches, which enhances the performance of both SCTN and GTN. (a) RC at SCTN (b) RC at GTN (c) NRC at SCTN (d) NRC at GTN

Fig. 9 presents a performance comparison between the MARL and MADQN traffic light controllers in SCTN and GTN in terms of the cumulative delay of vehicles. MADQN achieves lower values compared to MARL in both SCTN and GTN under both RC and NRC traffic congestion. Overall, MADQN achieves similar results under RC and NRC traffic congestion because of its two main features, particularly the target network and experience replay, which have been shown to outperform MARL in both the complex GTN and the real-world SCTN.

    Figure 9:Performance comparison between MARL and MADQN traffic light controllers applied in SCTN and GTN under RC and NRC traffic congestions.MADQN achieves a lower cumulative delay compared to MARL under increased traffic volume.Lower cumulative delay enhances the performance of the traffic networks

    6 Conclusions and Future Directions

This paper investigates the application of multi-agent deep Q-network (MADQN) to traffic light controllers at multiple intersections in order to address two types of traffic congestion: a) recurrent congestion (RC), caused by a high volume of traffic; and b) non-recurrent congestion (NRC), caused by traffic disruptions, particularly bad weather conditions. From the traffic light controller perspective, MADQN adjusts the traffic phase split according to traffic demand in order to minimize the number of waiting vehicles at the different lanes of an intersection. From the MADQN perspective, it enables traffic light controllers to use deep neural networks (DNNs) to store and represent complex and continuous states, exchange knowledge (or Q-values), learn, and achieve optimal joint actions while addressing the curse of dimensionality in a multi-agent environment. There are two main features in MADQN, namely the target network and experience replay, which provide training stability in the presence of multiple traffic light controllers. MADQN is investigated in a traditional GTN and a real traffic network based on the Sunway city. Our simulation in Matlab and SUMO shows that MADQN outperforms MARL by reducing the cumulative delay of vehicles by up to 27.7% and 27.9% under RC and NRC traffic congestion in the SCTN, and by up to 28.5% and 27.8% under RC and NRC traffic congestion in the GTN, respectively.

There are six future directions that could be pursued to improve MADQN. Firstly, relaxing the assumption of Section 5, so that the left-turning (or right-turning) traffic movement is no longer required to be protected or free of conflicts with other traffic movements in a left-hand (or right-hand) traffic network. Secondly, prioritizing the experiences during experience replay for faster learning in a multi-agent environment with multiple intersections. Thirdly, addressing the effects of dynamicity on MADQN, including the dynamic movement of vehicles. Fourthly, providing fairness and prioritized access among traffic flows at intersections. Fifthly, other kinds of traffic disruptions, including crashes, could be incorporated into the state space as they tend to cause serious traffic congestion. Lastly, real field experiments can be conducted to train and validate the proposed scheme using real-world feedback. The real field data can be collected so that the traffic network and the system performance achieved in the simulation can be calibrated.

    Funding Statement:This research was supported by Publication Fund under Research Creativity and Management Office,Universiti Sains Malaysia.

    Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.
