
    Deep Reinforcement Learning for Addressing Disruptions in Traffic Light Control

    2022-08-24 03:26:18
    Computers, Materials & Continua, 2022, Issue 5

    Faizan Rasheed, Kok-Lim Alvin Yau, Rafidah Md Noor and Yung-Wey Chong

    1 School of Engineering and Computer Science, University of Hertfordshire, Hatfield, AL10 9AB, UK

    2 Department of Computing and Information Systems, Sunway University, Subang Jaya, 47500, Malaysia

    3 Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, 50603, Malaysia

    4 National Advanced IPv6 Centre, Universiti Sains Malaysia, USM, Penang, 11800, Malaysia

    Abstract: This paper investigates the use of a multi-agent deep Q-network (MADQN) to address the curse of dimensionality that occurs in the traditional multi-agent reinforcement learning (MARL) approach. The proposed MADQN is applied to traffic light controllers at multiple intersections with busy traffic and traffic disruptions, particularly rainfall. MADQN is based on the deep Q-network (DQN), which integrates traditional reinforcement learning (RL) with the newly emerging deep learning (DL) approaches. MADQN enables traffic light controllers to learn, exchange knowledge with neighboring agents, and select optimal joint actions in a collaborative manner. A case study based on a real traffic network is conducted as part of a sustainable urban city project in the Sunway City of Kuala Lumpur, Malaysia. An investigation is also performed using a grid traffic network (GTN) to verify that the proposed scheme is effective in a traditional traffic network. The proposed scheme is evaluated using two simulation tools, namely Matlab and Simulation of Urban Mobility (SUMO). The simulations show that the proposed scheme reduces the cumulative delay of vehicles by up to 30%.

    Keywords: Artificial intelligence; traffic light control; traffic disruptions; multi-agent deep Q-network; deep reinforcement learning

    1 Introduction

    Traffic congestion has become a problem in most urban areas of the world, causing enormous economic waste, extra travel delay, and excessive vehicle emissions [1]. Traffic light controllers are installed to monitor and control the traffic flows at intersections in order to alleviate traffic congestion strategically. Each traffic light controller has: a) light colors, in which green represents “go”, amber represents “slow down”, and red represents “stop”; b) a traffic phase, in which a set of green lights is assigned to a set of lanes for safe and non-conflicting movements at the intersection; and c) a traffic phase split, which is the time period of a traffic phase. The traffic phase split includes a short moment of red lights for all lanes to provide a safe transition between traffic phases. The red timing is the time that has passed since the light of the respective lane changed to red.

    Next, we present five fundamentals related to our investigation of the application of MADQN to traffic light controllers for reducing the cumulative delay of vehicles. Firstly, various types of traffic light controllers used to control and alleviate traffic congestion are presented. Secondly, various artificial intelligence approaches used to realize fully-dynamic traffic light controllers are presented. Thirdly, the traditional DQN approach is presented as an enhanced artificial intelligence approach applied to traffic light controllers. Fourthly, the significance of using the cumulative delay of vehicles as the performance measure is presented. Fifthly, the use of the Burr distribution to introduce traffic disruptions to a traffic network is presented. The contributions of this paper are also presented.

    1.1 Types of Traffic Light Controllers

    Traditionally, traffic light controllers monitor traffic movements and determine traffic phase splits to control and reduce traffic congestion using three main techniques. Firstly, a deterministic traffic light controller is a pretimed control system that uses historical traffic data collected at different times and determines traffic phase splits using the Webster formula [2]. Secondly, a semi-dynamic traffic light controller is a system based on actuated control, and it uses instantaneous or short-term traffic conditions, such as the absence and presence of vehicles, and assigns green lights to lanes with vehicles [3]. The short-term traffic condition can be detected using an inductive loop detector. Thirdly, a fully-dynamic traffic light controller is also a system based on actuated control; however, it uses longer-term traffic conditions, including the average queue length and waiting time of vehicles at a lane. The longer-term traffic condition can be measured by counting vehicles using at least two inductive loop detectors at a lane, one installed near the intersection and another installed further away from the intersection [4]. By measuring the number of vehicles, the average queue length and waiting time of vehicles at a lane can be calculated. A video-based traffic detector (or camera sensor) can also be installed at an intersection, and image processing can be used to calculate the number of vehicles at a lane of the intersection [5]. Subsequently, the controller adjusts the traffic phase split based on the traffic condition [6]. Fully-dynamic traffic light controllers are more realistic in monitoring traffic movements because they use longer-term traffic conditions, and the approach has been shown to alleviate traffic congestion more effectively than deterministic and semi-dynamic traffic light controllers [7–9].
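
    As an illustration of how a deterministic (pretimed) controller could derive its phase splits, the sketch below uses the commonly cited form of the Webster formula, C0 = (1.5L + 5) / (1 - Y), and shares the effective green time among phases in proportion to their flow ratios. The function names and the example inputs are hypothetical, and the paper does not state which variant of the formula reference [2] uses.

```python
def webster_cycle_length(lost_time_s, flow_ratios):
    """Optimal cycle length C0 = (1.5 * L + 5) / (1 - Y), in seconds.

    lost_time_s: total lost time per cycle L (seconds).
    flow_ratios: critical flow ratio y_i of each phase; Y is their sum.
    """
    total_ratio = sum(flow_ratios)
    if not 0.0 < total_ratio < 1.0:
        raise ValueError("sum of critical flow ratios must lie in (0, 1)")
    return (1.5 * lost_time_s + 5.0) / (1.0 - total_ratio)


def webster_green_splits(lost_time_s, flow_ratios):
    """Effective green time of each phase: g_i = (y_i / Y) * (C0 - L)."""
    cycle = webster_cycle_length(lost_time_s, flow_ratios)
    total_ratio = sum(flow_ratios)
    effective_green = cycle - lost_time_s
    return [effective_green * y / total_ratio for y in flow_ratios]


if __name__ == "__main__":
    # Hypothetical four-phase intersection with 16 s of lost time per cycle.
    ratios = [0.20, 0.20, 0.15, 0.15]
    print(webster_cycle_length(16.0, ratios))
    print(webster_green_splits(16.0, ratios))
```

    Because the inputs are historical averages, the resulting splits stay fixed until the controller is re-timed, which is the limitation that the semi-dynamic and fully-dynamic controllers address.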

    1.2 Common Approaches of Fully-Dynamic Traffic Light Controllers

    The fully-dynamic traffic light controller has commonly been realized using artificial intelligence approaches, particularly reinforcement learning (RL) [10,11] and multi-agent reinforcement learning (MARL) [12]. RL can be embedded in a single agent (or decision maker), which is the traffic light controller, to learn, select the optimal action (i.e., traffic phase split), and improve its operating environment (i.e., the local traffic condition). In contrast, MARL can be embedded in multiple traffic light controllers (or agents) to exchange knowledge (i.e., Q-values), learn, select the optimal joint action, and improve their operating environment (i.e., the global traffic condition) in a collaborative manner. However, the curse of dimensionality occurs in both single-agent RL and MARL, in which the state space is too large to be handled efficiently due to the complexity of the traffic congestion issue [13], and so this paper uses a multi-agent deep Q-network (MADQN) that is based on the traditional single-agent DQN approach [14]. This paper extends our previous work [15], which mainly focuses on performance measures, such as throughput, waiting time, and queue length, that do not relate to drivers’ experience. This paper investigates the application of MADQN to traffic light controllers at intersections with a high volume of traffic and traffic disruptions (i.e., rainfall) by measuring the cumulative delay of vehicles as the performance measure, which relates to drivers’ experience.

    1.3 DQN as an Enhanced RL-Based Approach for Traffic Light Controllers

    DQN is a combination of the newly evolving deep learning technique [16] and the traditional RL technique, conveniently called deep reinforcement learning (DRL) [17]. DQN addresses the curse of dimensionality and provides two main advantages [18]: a) it reduces the learning time and computational cost incurred to explore different state-action pairs and identify the optimal actions; and b) it uses hidden layers to provide abstract and continuous representations of the complex and high-dimensional inputs (i.e., the state space), which reduces the storage capacity needed to store the unlimited number of state-action pairs (or Q-values).

    1.4 Cumulative Delay of Vehicles as the Performance Measure Used by Traffic Light Controllers

    The cumulative delay of vehicles is the average time (i.e., the average travelling and waiting times caused by congestion) taken by vehicles to travel from a source location to a destination location, which may require crossing multiple intersections. Compared to other measures, including the average queue length and waiting time of the vehicles at a lane, and throughput (i.e., the number of vehicles crossing an intersection), the cumulative delay can be directly perceived by drivers, and so it relates to the drivers’ experience. In other words, drivers perceive the difference between the actual and expected travel times when crossing multiple intersections. Investigations that use the cumulative delay of vehicles as the performance measure have gained momentum over the years with the application of DRL to traffic light control. Specifically, as shown in Fig. 1, the cumulative delay is the most frequently used performance measure in the literature on the application of DRL to traffic light control from January 2016 to November 2020. The scientific literature databases Web of Science [19], IEEE Xplore Digital Library [20], and ScienceDirect [21] were used to conduct this study. The cumulative delay has gained momentum because it better reflects real circumstances, particularly the drivers’ experience. While the cumulative delay has been used in the literature [22,23], it has not been applied to traffic congestion in a multiple-intersection scenario with increased traffic volume and traffic disruptions, and so this paper adopts this measure to calculate the average time required by vehicles to cross multiple intersections.

    1.5 Burr Distribution for Introducing Traffic Disruptions to Traffic Networks

    Traffic congestion can be categorized into: a) recurrent congestion (RC), caused by a high volume of traffic; and b) non-recurrent congestion (NRC), caused by erratic traffic disruptions, including accidents and rainfall [24,25]. In the literature [26], the arrival process of vehicles has been widely modeled by the Poisson process, whereby the inter-arrival time of vehicles follows an exponential distribution. The Poisson process incorporates RC naturally; however, it does not incorporate NRC, and so the Burr distribution, which is a generalization of the Poisson process, has been adopted in this work to model the inter-arrival time of vehicles under scenarios with a high volume of traffic (i.e., RC) and traffic disruptions (i.e., NRC).
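
    The following sketch shows one way vehicle inter-arrival times could be drawn from a Burr distribution by inverse-transform sampling. It assumes the Burr Type XII form with CDF F(x) = 1 - (1 + (x/scale)^c)^(-k); the paper does not state which Burr type or which parameter values it uses, so the function names and the values of c, k, and scale below are purely illustrative.

```python
import random


def burr_xii_cdf(x, c, k, scale=1.0):
    """CDF of the Burr Type XII distribution (assumed form)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (1.0 + (x / scale) ** c) ** (-k)


def burr_xii_inverse_cdf(u, c, k, scale=1.0):
    """Inverse CDF: the x for which burr_xii_cdf(x) equals u, for u in [0, 1)."""
    return scale * ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)


def sample_inter_arrival_times(n, c, k, scale=1.0, seed=None):
    """Draw n inter-arrival times (seconds) by inverse-transform sampling."""
    rng = random.Random(seed)
    return [burr_xii_inverse_cdf(rng.random(), c, k, scale) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical shape and scale parameters, not taken from the paper.
    gaps = sample_inter_arrival_times(5, c=2.0, k=3.0, scale=4.0, seed=1)
    print([round(g, 2) for g in gaps])
```

    The two shape parameters control the tail of the inter-arrival distribution, which is what allows irregular gaps, such as those caused by rainfall, to be represented.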

    1.6 Contributions of the Paper

    Our contribution is to investigate the application of MADQN to traffic light controllers at intersections with a high volume of traffic and traffic disruptions (i.e., rainfall). This work is based on simulation using the Burr distribution, which has been shown in [27] to model traffic disruptions in traffic networks accurately. The performance measure is the cumulative delay of vehicles, which includes the average waiting and travelling times caused by congestion. This performance measure is used because the difference between the actual and expected travel times when crossing multiple intersections can be directly perceived by drivers, and so it relates to the drivers’ experience. In this paper, we compare the performance of MADQN and MARL applied to traffic light controllers at intersections with a high volume of traffic and traffic disruptions in terms of the cumulative delay of vehicles, which has not been investigated in the literature despite its significance.

    Figure 1: The usage of three popular performance measures in the investigation of the use of DRL to traffic light control in recent years

    1.7 Organization of the Paper

    The paper is structured into six sections.

    · Section 1 presents the introduction of traffic light controllers and the background of common approaches for traffic light controllers. It also presents the contributions of the paper.

    · Section 2 presents the background of the Krauss vehicle-following model, DRL, and MARL. The traditional algorithms of DRL and MARL are also presented in this section.

    · Section 3 presents the literature review of DQN-based traffic light controllers. There are five main DQN approaches discussed in this section.

    · Section 4 presents the proposed MADQN model for traffic light controllers. It presents the representations of the proposed MADQN model, including the state space, action space, and delayed reward, applied to traffic light controllers. It also presents the MADQN architecture and algorithm.

    · Section 5 presents an application for a sustainable city (i.e., Sunway City), and our simulation results and discussion.

    · Section 6 concludes this paper with a discussion of potential future directions.

    2 Background

    This section presents the background of the Krauss vehicle-following model, DRL, and MARL. The Krauss vehicle-following model is a mathematical model of safe vehicular movement, whereby a gap between two consecutive vehicles is maintained. The background of DRL includes the traditional single-agent deep Q-network (DQN) algorithm. The background of MARL includes its traditional algorithm.

    2.1 Krauss Vehicle-Following Model

    In 1997, Krauss developed a vehicle-following model based on the safe speed of vehicles. The safe speed is calculated as follows [28]:

    $$u_{\text{safe}}(t) = u_l(t) + \frac{g(t) - u_l(t)\,\tau_r}{\dfrac{u_l(t) + u_f(t)}{2b} + \tau_r} \tag{1}$$

    where $u_l(t)$ and $u_f(t)$ represent the speed of the leading and following vehicles at time $t$, respectively, and $g(t)$ is the gap to the leading vehicle at time $t$. The driver’s reaction time (e.g., one second) is represented by $\tau_r$, and $b$ is the maximum deceleration of the vehicle.

    In our study, the Krauss vehicle-following model is used to ensure the safe movement of vehicles at intersections of the Sunway City and grid traffic networks (see Section 5).
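
    A minimal sketch of the safe-speed computation is given below, assuming the commonly cited form of the Krauss safe speed, u_l + (g - u_l * tau_r) / ((u_l + u_f) / (2b) + tau_r). The function name, the units (metres and seconds), and the default parameter values are assumptions made for illustration and are not taken from the paper.

```python
def krauss_safe_speed(leader_speed, follower_speed, gap,
                      reaction_time=1.0, max_decel=4.5):
    """Safe speed of the following vehicle (m/s).

    leader_speed, follower_speed: u_l(t) and u_f(t) in m/s.
    gap: g(t), distance to the leading vehicle in metres.
    reaction_time: tau_r in seconds (one second, as in the text's example).
    max_decel: b in m/s^2 (hypothetical default).
    """
    denominator = (leader_speed + follower_speed) / (2.0 * max_decel) + reaction_time
    return leader_speed + (gap - leader_speed * reaction_time) / denominator


if __name__ == "__main__":
    # A follower at 12 m/s that is 25 m behind a leader travelling at 10 m/s.
    print(krauss_safe_speed(10.0, 12.0, 25.0, reaction_time=1.0, max_decel=4.0))
```

    When the gap equals the distance the leader covers during one reaction time, the safe speed equals the leader's speed; a larger gap permits a higher speed and a smaller gap forces the follower to slow down.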

    2.2 Deep Reinforcement Learning

    DRL incorporates DNNs into RL, which enables agents to learn relationships between actions and states. DeepMind first proposed DQN [17], which is a DRL method, and it has been widely adopted in traffic light control [29]. DQN consists of a DNN, which comprises three main kinds of layers, namely the input layer, hidden layer(s), and output layer. In DQN, the neurons are interconnected with each other and they can learn complex and unstructured data [30]. During training, the data flows from the input layer to the hidden layer(s), and finally to the output layer. DQN provides two main features: a) experience replay, in which experiences are stored in a replay memory and then randomly selected for training; and b) a target network, which is a duplicate of the main network. The main network selects actions based on observations from the operating environment and updates its weights. The target network approximates the weights of the main network to generate its Q-value. During training, the Q-value of the target network is used to calculate the loss incurred by a selected action, which has been shown to stabilize training. After every certain number of iterations, the target network is updated. The main difference between the main and target networks is that the main network is used during the observation, action selection, and training processes, while the target network is used during the training process only.

    Algorithm 1: The traditional single-agent DQN algorithm

    1.  procedure
    2.    for m = 1 : M do
            {Observation process}
    3.      observe current state $s_t$
    4.      for t = 1 : T do
              {Action selection process}
    5.        select action $v_t$ using Eq. (2)
    6.        receive delayed reward $r_{t+1}(s_{t+1})$ and next state $s_{t+1}$
    7.        store experience $e_t = (s_t, v_t, r_{t+1}(s_{t+1}), s_{t+1})$ in replay memory $D_t$
              {Training process}
    8.        sample a random minibatch of experiences $e_n$ from replay memory $D_t$
    9.        for j = 1 : N do
    10.         set target $y_j = r_{j+1}(s_{j+1})$ if episode terminates at $s_{j+1}$;
                otherwise $y_j = r_{j+1}(s_{j+1}) + \gamma \max_v Q(s_{j+1}, v; \theta_j)$
    11.         compute the loss function using Eq. (3)
    12.         perform a gradient descent optimization on $(y_j - Q(s_j, v_j; \theta_j))^2$ with respect to $\theta_j$ using Eq. (5)
    13.         reset $\theta^- = \theta$ in every $C$ steps
    14.       end for
    15.     end for
    16.   end for
    17. end procedure

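    Algorithm 1 shows the DQN algorithm. In episode $m \in M$, the current state $s_t \in S$ (or the decision-making factors) is observed by an agent. At time instant $t \in T$, the best-known (or greedy) action $v_t^* \in V$ is selected by the agent as follows:

    $$v_t^* = \arg\max_{v \in V} Q_t(s_t, v; \theta_t) \tag{2}$$

    where $Q_t(s_t, v_t; \theta_t)$ is the Q-value, which indicates whether the action $v_t$ is appropriate under state $s_t$, and $\theta_t$ are the parameters of the main network. After that, the agent receives the delayed reward $r_{t+1}(s_{t+1})$ and next state $s_{t+1}$, and then it stores its experience $e_t = (s_t, v_t, r_{t+1}(s_{t+1}), s_{t+1})$ in a replay memory $D_t = (e_1, e_2, \ldots, e_t)$. After that, a minibatch of experiences $e_n$ is sampled randomly by the agent from the replay memory $D_t$. Suppose that the target network Q-value is $Q_t(s_t, v_t; \theta_t^-)$ and the main network Q-value is $Q_t(s_t, v_t; \theta_t)$. The target Q-value is fixed for $C$ steps to stabilize the Q-values of the main network, and to reduce the loss between the Q-values of the target and main networks. To train the main network, the loss function is reduced at iteration $j$ as follows:

    $$L_j(\theta_j) = \mathbb{E}_{(s_j, v_j) \sim p(\cdot)}\left[\left(y_j - Q(s_j, v_j; \theta_j)\right)^2\right] \tag{3}$$

    where $p(s_j, v_j)$ is the probability distribution of the state-action pair $(s_j, v_j)$, and $y_j$ is a target, as follows:

    $$y_j = r_{j+1}(s_{j+1}) + \gamma \max_{v} Q(s_{j+1}, v; \theta_j) \tag{4}$$

    where $\gamma$ is a discount factor, in which the discounted reward $\gamma \max_v Q(s_{j+1}, v; \theta_j)$ represents the long-term reward estimated by the maximum Q-value at iteration $j+1$, and the delayed reward $r_{j+1}(s_{j+1})$ represents the short-term reward. If the episode terminates at $s_{j+1}$, then $y_j = r_{j+1}(s_{j+1})$. Treating the target $y_j$ as a constant, the loss function gradient $\nabla_{\theta_j} L_j(\theta_j)$ is given as follows:

    $$\nabla_{\theta_j} L_j(\theta_j) = -2\,\mathbb{E}_{(s_j, v_j) \sim p(\cdot)}\left[\left(y_j - Q(s_j, v_j; \theta_j)\right) \nabla_{\theta_j} Q(s_j, v_j; \theta_j)\right] \tag{5}$$

    The target Q-value $Q_t(s_t, v_t; \theta_t^-)$ of the target network is updated by replacing the weights $\theta_j^-$ of the target network with the weights $\theta_j$ of the main network in order to provide $Q_j(s_j, v_j^*; \theta_j^-) \approx Q^*(s_j, v_j; \theta_j)$ at every $C$ steps (i.e., equivalent to a number of iterations [31]).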
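
    The pieces of the DQN training loop described above (greedy action selection, the replay memory, the target, and the squared loss) can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the class and function names are hypothetical, Q-values are passed in as plain lists, and the neural network itself is omitted.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of experiences (s, v, r, s_next, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Uniformly sample a minibatch without replacement."""
        size = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)


def greedy_action(q_values):
    """Index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda v: q_values[v])


def td_target(reward, next_q_values, gamma, done):
    """Target y_j: the reward alone at episode end, else reward + gamma * max Q."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_loss(target, q_value):
    """Per-sample loss (y_j - Q(s_j, v_j))^2."""
    return (target - q_value) ** 2


if __name__ == "__main__":
    memory = ReplayMemory(capacity=1000)
    memory.store(state=(0, 1), action=2, reward=1.0, next_state=(1, 1), done=False)
    y = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9, done=False)
    print(y, squared_loss(y, q_value=2.0))
```

    In a full agent, `next_q_values` would come from the target network, whose weights are copied from the main network every C steps.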

    2.3 Multi-Agent Reinforcement Learning

    MARL extends the traditional RL approach by enabling multiple agents to exchange information with each other in order to achieve the optimal network-wide performance [32]. The main purpose is to optimize a network-wide objective function, such as the global Q-value that sums up the local Q-values of all agents in a single network, as time goes by $t = 1, 2, 3, \ldots$. Algorithm 2 shows the MARL algorithm. At time instant $t \in T$, an agent $i$ observes its current local state $s_t^i \in S$, sends its own Q-value $Q_t^i(s_t^i, v_t^i)$ to neighboring agents $J^i$, receives the optimal Q-value $\max_{v^j \in V} Q_t^j(s_t^j, v^j)$ from each neighboring agent $j \in J^i$, and selects an action $v_t^{i,*} \in V$ as follows:

    Agent $i$ receives a delayed reward $r_{t+1}^i(s_{t+1}^i)$ for the state-action pair $(s_t^i, v_t^i)$ under the next state $s_{t+1}^i \in S$ at time instant $t+1$, and then the Q-value $Q_t^i(s_t^i, v_t^i)$ for the state-action pair is updated as follows:

    where $\Delta_t^i(s_t^i, v_t^i)$ represents a temporal difference as follows:

    where $n_{i,j}$ represents the importance (or weight) of an agent $j$ in the neighborhood of agent $i$, and

    Algorithm 2: MARL algorithm embedded in agent i

    1. procedure
    2.   observe current state $s_t^i \in S$
    3.   send Q-value $Q_t^i(s_t^i, v_t^i)$ to neighboring agents $J^i$
    4.   receive $\max_{v^j \in V} Q_t^j(s_t^j, v^j)$ from agent $j \in J^i$
    5.   select action $v_t^i \in V$ using Eq. (6)
    6.   receive delayed reward $r_{t+1}^i(s_{t+1}^i)$
    7.   update Q-value $Q_{t+1}^i(s_t^i, v_t^i)$ using Eq. (7)
    8. end procedure
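
    To make the knowledge exchange concrete, the sketch below shows one plausible tabular update in which the temporal difference of agent i includes the weighted maximum Q-values received from its neighbors. This is an assumption for illustration only: the exact forms of Eqs. (6) to (8) are not reproduced in this text, and the learning rate `alpha`, the discount factor `gamma`, and all function names are hypothetical.

```python
def neighbor_weighted_td(reward, q_current, own_next_q, neighbor_max_q,
                         weights, gamma=0.9):
    """Temporal difference that adds weighted neighbor Q-values (assumed form).

    neighbor_max_q: {neighbor_id: max Q-value received from that neighbor}.
    weights: {neighbor_id: importance n_ij of neighbor j at agent i}.
    """
    neighbor_term = sum(weights[j] * q for j, q in neighbor_max_q.items())
    return reward + gamma * (max(own_next_q) + neighbor_term) - q_current


def update_q(q_table, state, action, delta, alpha=0.1):
    """Tabular update Q(s, v) <- Q(s, v) + alpha * delta; returns the new value."""
    key = (state, action)
    q_table[key] = q_table.get(key, 0.0) + alpha * delta
    return q_table[key]


if __name__ == "__main__":
    q_table = {}
    delta = neighbor_weighted_td(
        reward=1.0,
        q_current=q_table.get(("s0", 1), 0.0),
        own_next_q=[0.2, 0.4],
        neighbor_max_q={"north": 1.0, "east": 2.0},
        weights={"north": 0.5, "east": 0.5},
    )
    print(update_q(q_table, "s0", 1, delta))
```

    The table-based storage in `q_table` is the part that grows with every distinct state-action pair, which is the scalability issue that motivates replacing it with a DNN in MADQN.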

    3 Literature Review

    This section presents a literature review of DQN-based traffic light controllers, which have been shown to improve various performance measures. Five main DQN approaches have been proposed to reduce the cumulative delay of vehicles. In general, the DQN approaches are embedded in traffic light controllers. The DQN model has an input layer that receives the state, and an output layer that provides Q-values for possible actions (e.g., traffic phases [33] and traffic phase splits [34]).

    3.1 Traditional DNN-Based DQN

    The application of the traditional DNN-based DQN approach to traffic light control is proposed in [33,35]. In Wan’s DNN-based DQN model [33]: a) state represents the current traffic phase, the queue length, and the green and red timings; b) action represents a traffic phase; and c) reward represents the waiting time of vehicles. In Tan’s DNN-based DQN model [35]: a) state represents the queue length of vehicles; b) action represents a traffic phase; and c) reward represents the queue length and waiting time, as well as throughput. The proposed schemes have been shown to reduce the cumulative delay [33,35] and queue length [35] of vehicles, and to improve throughput [33].

    3.2 CNN-Based DQN

    The convolutional neural network (CNN)-based DQN approach enables agents to analyze visual imagery of traffic in an efficient manner. The agents process the input states, which are represented in the form of a two-dimensional matrix (i.e., multiple rows and columns of values, such as an image) or one-dimensional vectors (e.g., a single row or column of values, such as the queue length of a lane) [29]. The application of the CNN-based DQN approach to traffic light control is proposed in [22,23,34,36]. In Genders’ and Gao’s CNN-based DQN models [22,34]: a) state represents the current traffic phase, as well as the position and speed of vehicles; b) action represents a traffic phase [22] or a traffic phase split [34]; and c) reward represents the waiting time of vehicles. In Wei’s CNN-based DQN model [36]: a) state represents the current traffic phase, as well as the position and queue length of vehicles; b) action represents a traffic phase; and c) reward represents the waiting time and queue length. In Mousavi’s CNN-based DQN model [23]: a) state represents the current traffic phase and queue length; b) action represents a traffic phase; and c) reward represents the waiting time of vehicles. The proposed schemes have been shown to reduce the cumulative delay [22,23,34,36], waiting time [34], and queue length [22,23,36] of vehicles, as well as to improve throughput [22,36].

    3.3 SAE-Based DQN

    The stacked auto encoder (SAE) neural network-based DQN approach enables agents to perform encoding and decoding functions, and store inputs efficiently. The application of the SAE neural network-based DQN approach to traffic light control is proposed in [37]. In Li’s SAE neural network-based DQN model [37]: a) state represents the queue length; b) action represents a traffic phase split; and c) reward represents the queue length and waiting time. The simulation results have shown that the proposed scheme can reduce the cumulative delay and queue length of vehicles.

    3.4 LSTM-Based DQN with A2C

    The long short-term memory (LSTM) neural network-based DQN approach enables agents to memorize previous inputs of a traffic light controller using a memory cell that maintains a time window of states. The advantage actor-critic (A2C) method combines the value-based and policy-gradient (PG)-based methods. Each agent has an actor that controls how it behaves (i.e., PG-based), and a critic that measures the suitability of the selected action (i.e., value-based). The application of the LSTM neural network-based DQN with A2C approach to traffic light control has been proposed in [38]. In [38]: a) state represents the queue length; b) action represents a traffic phase; and c) reward represents the queue length and waiting time. The simulation results have shown that the proposed scheme can reduce the cumulative delay and queue length of vehicles, as well as improve throughput.

    3.5 MADQN

    The MADQN approach allows multiple DQN agents to share knowledge (i.e., Q-values), learn, and select optimal joint actions (i.e., traffic phase splits) in a collaborative manner. In Rasheed’s MADQN model [15]: a) state represents the queue length, the current traffic phase, the red timing, and the rainfall intensity; b) action represents a traffic phase split; and c) reward represents the waiting time of the vehicles. The simulation results have shown that the proposed scheme can reduce the waiting time and queue length at a lane, and improve throughput. In this paper, we extend the work in [15] by evaluating the cumulative delay of vehicles incurred by MARL and MADQN at multiple intersections in the presence of traffic disruptions (i.e., rainfall).

    4 Our Proposed MADQN Approach for Traffic Light Controllers

    In a traffic network, a set of intersections $I$ is considered in this paper, whereby $i \in I$ is an intersection in which: a) $K^i$ is the set of incoming lanes, and b) $J^i$ is the set of neighboring intersections. Fig. 2 shows an abstract model of MADQN, in which the agent $i$ and its neighboring agents $j = 1 \in J^i$ and $j = 2 \in J^i$ share the same traffic environment. This research uses four traffic phases: a) the north-east bound traffic phase; b) the east-south bound traffic phase; c) the west-north bound traffic phase; and d) the south-west bound traffic phase. The traffic phases are activated in a round-robin fashion by traffic light controllers at intersections, and our MADQN approach is used to adjust the time intervals of the traffic phases (i.e., traffic phase splits). Our proposed MADQN approach, including the MADQN model (i.e., the state, action, and delayed reward representations), the MADQN architecture, and the MADQN algorithm with its complexity analysis, is presented in the remainder of this section.

    4.1 MADQN Model

    MADQN has three main advantages as compared to MARL as follows:

    · MADQN uses DNNs, which provide the state space with a continuous representation. Consequently, it can represent an unlimited number of state-action pairs.

    · MADQN addresses the curse of dimensionality by providing efficient storage for complex inputs.

    · MADQN uses a target network with experience replay, and so it improves the stability of training.

    The remainder of this subsection presents the representations of the state, action, and delayed reward of the MADQN model at an intersection $i$ at time $t$.

    4.1.1 State

    · $s_{1,t}^i \in \{0, 1, 2, 3\}$ represents the current traffic phase, and it is a discrete state. The north-east bound traffic phase is represented by a 0 value, the east-south is represented by 1, the west-north is represented by 2, and the south-west is represented by 3.

    · $s_{2,t}^i \in \{0, 1, 2, 3\}, \forall k \in K^i$ represents the queue length of the incoming lanes $K^i$, and it is a continuous state. No vehicle at a lane is represented by a 0 value, ≤ 25% occupancy is represented by 1, > 25% and ≤ 50% is represented by 2, and > 50% is represented by 3. The occupancy can be measured using inductive loop detectors installed at intersections.

    · $s_{3,t}^i \in \{0, 1, 2, 3\}, \forall k \in K^j$ represents the queue length of the incoming lanes $k \in K^j$ at a neighboring intersection $j$, and it is a continuous state. Both $s_{2,t}^i$ and $s_{3,t}^i$ have similar representations.

    · $s_{4,t}^i$ represents the red timing of the current traffic phase, and it is a continuous state.

    · $s_{5,t}^i \in \{0, 1, 2, 3, \ldots, s_5^i\}$ represents the intensity of rainfall, where a 0 value means no rain and $s_5^i$, which is the maximum value, means the heaviest rain, and it is a continuous state. Simply, the intensity of the disruption is represented by the sub-state $s_{5,t}^i$.
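
    The occupancy thresholds for the queue-length sub-states can be expressed as a small mapping function. The sketch below assumes that occupancy is reported as a fraction between 0 and 1; the function names are hypothetical, and how the per-lane levels are combined into a single network input is not specified in the text.

```python
def queue_length_level(occupancy):
    """Map lane occupancy (fraction in [0, 1]) to a queue-length level 0..3.

    0: no vehicle, 1: up to 25%, 2: above 25% and up to 50%, 3: above 50%.
    """
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must be a fraction between 0 and 1")
    if occupancy == 0.0:
        return 0
    if occupancy <= 0.25:
        return 1
    if occupancy <= 0.50:
        return 2
    return 3


def lane_levels(occupancies):
    """Queue-length level of every incoming lane of an intersection."""
    return [queue_length_level(o) for o in occupancies]


if __name__ == "__main__":
    # Hypothetical occupancies reported by loop detectors on four lanes.
    print(lane_levels([0.0, 0.10, 0.30, 0.80]))
```

    Binning occupancy into four levels keeps the queue-length input on a small, bounded scale regardless of the physical length of the lane.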

    Figure 2: An abstract model of agent $i$ and its neighboring agents $j = 1 \in J^i$ and $j = 2 \in J^i$ in MADQN. The cumulative delay of vehicles increases as vehicles travel from one intersection to another

    4.1.2 Action

    The action $v_t^i \in V^i$ represents a selected action, which is a traffic phase split $v_t^i \in \{0, 1, 2, 3, 4\}$ in a fixed predetermined round-robin sequence of traffic phases, where $v_t^i = 0$ skips a traffic phase; for instance, due to the absence of a waiting vehicle at a lane. The north-east bound traffic phase is represented by a 1 value, the east-south is represented by 2, the west-north is represented by 3, and the south-west is represented by 4. Hence, agent $i$ can choose to switch to another traffic phase or to keep the current traffic phase.

    4.1.3 Delayed Reward

    An agent receives delayed rewards that vary with the average waiting time of the vehicles at the intersections. Traffic congestion can increase the average waiting time of the vehicles at the intersections. The delayed reward $r_{t+1}^i(s_{t+1}^i)$ is a relative value that represents the difference between the average total waiting time of all vehicles at an intersection $i$ at time $t$ and at time $t+1$ (i.e., before and after taking an action $v_t^i$), whereby $W_t^i > W_{t+1}^i$ gives a positive delayed reward, $W_t^i = W_{t+1}^i$ gives a zero delayed reward, and $W_t^i < W_{t+1}^i$ gives a negative delayed reward.
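
    A minimal sketch of this reward is shown below, assuming that the delayed reward is the plain difference between the average waiting time before and after the action, which matches the sign convention described above. The function names and the sample waiting times are hypothetical.

```python
def average_waiting_time(waiting_times):
    """Average waiting time (seconds) of the vehicles at an intersection."""
    if not waiting_times:
        return 0.0
    return sum(waiting_times) / len(waiting_times)


def delayed_reward(wait_before, wait_after):
    """Positive when the average waiting time fell after the action."""
    return wait_before - wait_after


if __name__ == "__main__":
    before = average_waiting_time([30.0, 45.0, 60.0])  # time t
    after = average_waiting_time([20.0, 25.0, 30.0])   # time t + 1
    print(delayed_reward(before, after))
```

    Using a relative value means that an agent is rewarded for improving the situation at its intersection rather than for operating at an intersection that happens to be lightly loaded.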

    4.2 MADQN Architecture

    Fig. 3 shows the DQN architecture. There are three main components in an agent, namely the main network, the target network, and the replay memory. The main network consists of a DNN with its weights $\theta_t^i$ used to approximate its Q-values $Q_t^i(s_t^i, v_t^i; \theta_t^i)$. The main network is used to choose an action $v_t^i$ for a particular state $s_t^i$ observed from the operating environment in order to achieve the best possible delayed reward $r_{t+1}^i(s_{t+1}^i)$ and next state $s_{t+1}^i$ at the next time instant $t+1$. The target network is a copy (or duplicate) of the main network with its weights $\theta_t^{i-}$ used to approximate its Q-values $Q_t^i(s_t^i, v_t^i; \theta_t^{i-})$. The target network is used during training only, and the main network is used during both action selection and training. The replay memory represents the dataset of an agent’s experiences $D_t^i = (e_1^i, e_2^i, \ldots, e_t^i)$ gathered during the interaction between an agent and its operating environment as time goes by. The experiences $D_t^i$ are used for training purposes.

    The DNN has three kinds of layers (see Section 2). In our DQN architecture, there are 5 neurons in the input layer, whereby each neuron represents a sub-state. The number of fully connected (FC) hidden layers is 5, whereby there are 400 neurons in each layer. There are 5 neurons in the output layer, whereby a possible action is represented by each neuron. A weight is associated with each link. Each neuron has a rectified linear unit (ReLU) activation function, and the weights are trained using gradient descent. The 5 sub-states of state $s_t^i$ are fed to the neurons of the input layer. Subsequently, signals are forwarded via the hidden layers to the output layer, which provides the Q-values $Q_t^i(s_t^i, v_t^i; \theta_t^i)$ of the possible actions $v_t^i$ at intersection $i$ during training.

    Figure 3: DQN architecture
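
    The layer sizes described above can be illustrated with a small, dependency-free forward pass. The sketch below is not the authors' implementation: the weight initialization is an arbitrary choice, the output layer is left linear (an assumption, since the text attributes ReLU to every neuron), and all names are hypothetical.

```python
import random

# 5 inputs, five fully connected hidden layers of 400 neurons, 5 outputs.
LAYER_SIZES = [5] + [400] * 5 + [5]


def count_parameters(layer_sizes):
    """Total number of weights and biases in a fully connected network."""
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum(n_in * n_out + n_out for n_in, n_out in pairs)


def init_network(layer_sizes, seed=0):
    """Random weights (illustrative uniform initialization) and zero biases."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        limit = (6.0 / n_in) ** 0.5
        weights = [[rng.uniform(-limit, limit) for _ in range(n_in)]
                   for _ in range(n_out)]
        network.append((weights, [0.0] * n_out))
    return network


def forward(network, state):
    """Return one Q-value per action: ReLU on hidden layers, linear output."""
    activations = list(state)
    last = len(network) - 1
    for index, (weights, biases) in enumerate(network):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        activations = pre if index == last else [max(0.0, z) for z in pre]
    return activations


if __name__ == "__main__":
    print(count_parameters(LAYER_SIZES))
    q_values = forward(init_network(LAYER_SIZES), [0, 1, 2, 10.0, 3])
    print(len(q_values))
```

    With these sizes the network holds 646,005 trainable parameters, a fixed cost that does not grow with the number of distinct states encountered.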

    4.3 MADQN Algorithm

    In this section, the extension of the traditional DQN approach to MADQN for multiple intersections is presented, which has not been explored in the literature. The proposed MADQN algorithm is evaluated in simulation under different traffic networks (see Section 5) in the presence of traffic disruptions.

    MADQN allows knowledge to be learned and exchanged among multiple DQN agents for coordination. The traditional MARL approach enables an agent to choose an optimal action based on its neighboring agents’ actions. In the moving target scenario, actions are selected independently and simultaneously by agents, and so the action selected by an agent $i$ can affect the operating environment of its neighboring agents $J^i$. Therefore, the moving target scenario increases the dynamicity of the operating environment, which affects learning stability. For instance, at an upstream intersection, the action of a traffic light controller $i$ can have positive or negative effects on the congestion level of downstream and neighboring intersections $J^i$ since vehicles move from one intersection to another. Likewise, the action of agent $i$ at an intersection can be affected by the actions of agents $J^i$ at neighboring intersections. By exchanging knowledge and coordinating the agents, the convergence to an optimal action in a multi-agent system has been shown in the literature [39]. The summation of the local Q-values of the agents is known as the global Q-value, which represents the global objective function. An optimal equilibrium is attained when the global Q-value converges. The convergence is attributed to: a) an agent updating its Q-values by using Q-values from neighboring agents $J^i$; b) the availability of a local view of neighboring agents $J^i$ at an agent $i$; and c) an agent’s action being the best response to its neighboring agents. MADQN addresses the moving target scenario by taking neighboring agents’ actions into consideration, and by having the agents coordinate among themselves in a collaborative manner, in order to converge to an optimal joint action and achieve stability in a shared environment.

    Algorithm 3: MADQN algorithm embedded in agent $i$. Complexities (computational, message, storage) are given in brackets after the steps to which they apply.
    1. procedure
    2. for $m = 1 : M$ do
       {Observation process}
    3. observe current state $s_t^i \in S$
       {Knowledge sharing process}
    4. send Q-value $Q_t^i(s_t^i, v_t^i; \theta_t^i)$ to neighboring agents $J^i$ [message: $\le |J|$]
    5. receive $\max_{v^j \in V} Q_t^j(s_t^j, v^j; \theta_t^j)$ from agent $j \in J^i$ [message: $\le |J|$]
    6. for $t = 1 : T$ do
       {Action selection process}
    7. select action $v_t^i$ using Eq. (9)
    8. receive delayed reward $r_{t+1}^i(s_{t+1}^i)$ and next state $s_{t+1}^i$
    9. store experience $e_t^i = (s_t^i, v_t^i, r_{t+1}^i(s_{t+1}^i), s_{t+1}^i)$ in replay memory $D_t^i$ [computational: $O(|S||A|)$; storage: $\le |S||A|$]
       {Training process}
    10. sample a random minibatch of experiences $e_n^i$ from replay memory $D_t^i$
    11. for $j = 1 : N$ do
    12. set target $y_j^i = r_{j+1}^i(s_{j+1}^i)$ if the episode terminates at $s_{j+1}^i$; otherwise $y_j^i = r_{j+1}^i(s_{j+1}^i) + \gamma \max_{v^i} Q_{j+1}^i(s_{j+1}^i, v^i; \theta_j^i)$
    13. compute the loss function using Eq. (3) [computational: $O(|S||A|)$]
    14. perform a gradient descent optimization on $(y_j^i - Q_j^i(s_j^i, v_j^i; \theta_j^i))^2$ with respect to $\theta_j^i$ using Eq. (5) [computational: $O(|S||A|)$]
    15. reset $\theta^{i-} = \theta^i$ every $C$ steps
    16. end for
    17. end for
    18. end for
    19. end procedure
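
    The training steps of Algorithm 3 (Steps 12 to 15) can be sketched in Python as follows. This is a minimal illustration with scalar Q-values rather than a neural network; the function names, the numerical values, and the synchronization interval are assumptions and are not taken from the paper.

```python
def td_target(reward, next_q_values, gamma, terminal):
    """Step 12: the reward alone at a terminal state, otherwise the reward
    plus the discounted best Q-value of the next state."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def squared_loss(target, q_value):
    """Squared error minimized in Step 14: (y - Q)^2."""
    return (target - q_value) ** 2


def should_sync_target(step, sync_every):
    """Step 15: the target-network weights are reset every `sync_every` steps."""
    return step % sync_every == 0


# Hypothetical values: reward 1.0, two candidate next-state Q-values, gamma 0.5.
y = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.5, terminal=False)
loss = squared_loss(y, q_value=1.5)
```

    In a full implementation the loss would be differentiated with respect to the network weights; here it is only evaluated to show how the target enters the objective.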

    Algorithm 3 shows the MADQN algorithm, which is an extension of Algorithm 2. In episode $m \in M$, the DQN agent $i$ observes the current state $s_t^i$ and sends its own Q-value $Q_t^i(s_t^i, v_t^i; \theta_t^i)$ to the agents in the neighborhood $J^i$. Next, it receives the optimal Q-value $\max_{v^j \in V} Q_t^j(s_t^j, v^j; \theta_t^j)$ from each agent $j$ in the neighborhood $J^i$. At time instant $t$, the agent $i$ selects an optimal action $v_t^i$ as follows:

    where $n_{i,j}$ represents the weight (or importance) of neighboring agent $j$ at agent $i$, and

    The proposed MADQN algorithm allows an agent to receive Q-values from neighboring agents and use them to select the optimal action.
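
    As an illustration of how neighbors' Q-values might enter action selection, the following Python sketch scores each candidate action by the agent's own Q-value plus a weighted sum of neighbor Q-values. This is an assumed form and not the paper's Eq. (9), which is not reproduced here. In particular, the sketch assumes that each neighbor reports one value per candidate action of agent $i$, so that the neighbor term can change the chosen action; the action names, Q-values, and weights are hypothetical.

```python
def select_action(own_q, neighbor_q, weights):
    """Pick the action maximizing own Q-value plus weighted neighbor Q-values.

    own_q:      {action: Q-value of agent i}
    neighbor_q: {neighbor j: {action: Q-value reported by j}}  (assumed format)
    weights:    {neighbor j: weight n_ij of neighbor j at agent i}
    """
    def combined(action):
        neighbor_term = sum(weights[j] * q[action] for j, q in neighbor_q.items())
        return own_q[action] + neighbor_term

    return max(own_q, key=combined)


# Hypothetical traffic phases and values.
own_q = {"NS_green": 1.0, "EW_green": 1.2}
neighbor_q = {"j1": {"NS_green": 0.8, "EW_green": 0.2}}
weights = {"j1": 0.5}

chosen = select_action(own_q, neighbor_q, weights)
alone = select_action(own_q, {}, {})
```

    With these values the agent alone would prefer `EW_green`, while the weighted neighbor information shifts the choice to `NS_green`, which is the kind of coordination effect the knowledge exchange is meant to produce.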

    4.4 Complexity Analysis

    This section presents an analysis of the computational, message, and storage complexities of the proposed MADQN algorithm (i.e., Algorithm 3) applied to traffic light controllers. In this paper, the complexity analysis has two levels: a) step-wise, which considers a single execution or iteration of the MADQN algorithm; and b) agent-wise, which considers all the state-action pairs of an agent. Note that only exploitation actions are considered in the analysis.

    Computational complexity estimates the maximum number of iterations, episodes, and so on, executed to calculate Q-values under all possible states. The step-wise computational complexity is considered under a particular state. An agent $i$ stores its experience $e_t^i$ upon receiving a delayed reward and the next state, so the step-wise computational complexity is $O(|A|)$ (Step 9), since there are $|A|$ possible actions for each state. The agent-wise complexity is $O(|S||A|)$, since there are $|S|$ possible states.

    Message complexity is the number of messages exchanged between the agents in order to calculate the Q-values. An agent $i$ exchanges its knowledge (i.e., Q-values) with its neighboring agents $J$ (Steps 4-5), so the step-wise message complexity is $\le |J|$, since there are at most $|J|$ neighboring agents. The agent-wise complexity is also $\le |J|$.

    Storage complexity is the amount of memory needed to store the knowledge (i.e., Q-values) and the experiences of agents. An agent $i$ stores its experience $e_t^i$ (Step 9), so the step-wise storage complexity is 1, and the agent-wise complexity is $\le |S||A|$.
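
    The bounds stated in this subsection can be tabulated for concrete problem sizes with a short Python helper. The function name and the example sizes (10 states, 4 actions, 3 neighbors) are hypothetical; the formulas are the step-wise and agent-wise bounds given above.

```python
def madqn_complexity_bounds(n_states, n_actions, n_neighbors):
    """Return the step-wise and agent-wise upper bounds from the complexity analysis."""
    return {
        # O(|A|) per step, O(|S||A|) per agent
        "computational": {"step": n_actions, "agent": n_states * n_actions},
        # at most |J| messages at both levels
        "message": {"step": n_neighbors, "agent": n_neighbors},
        # one experience per step, at most |S||A| per agent
        "storage": {"step": 1, "agent": n_states * n_actions},
    }


# Hypothetical sizes for one traffic light controller.
bounds = madqn_complexity_bounds(n_states=10, n_actions=4, n_neighbors=3)
```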

    5 Application for Sustainable City and Simulation Results

    The proposed scheme has been investigated in a case study based on a real traffic network, which is part of a sustainable urban city project in the Sunway City of Kuala Lumpur in Malaysia. Investigation is also performed using a grid traffic network (GTN) to understand the performance of the proposed scheme in a complex traffic network. Hence, our simulation-based investigation covers both real-world and complex traffic networks. In this paper, traffic networks with left-hand traffic are considered, in which the traffic movement for the left turn is either protected or does not conflict with other traffic movements. This section also presents and discusses the simulation results in both RC and NRC environments.

    5.1 Sunway City in Kuala Lumpur

    Sunway City is one of the sustainable and smart cities in Malaysia [40]. It has busy commercial areas, residential areas with high density (i.e., LaCosta and Sunway Monash Residence), higher educational institutions (i.e., Monash University Malaysia campus and Sunway University), a health centre (i.e., Sunway Medical Centre), an amusement park (i.e., Sunway Lagoon), a hotel (i.e., Sunway Resort Hotel & Spa), and so on, as shown in Fig. 4. In Fig. 4, the Sunway City traffic network (SCTN) has seven intersections, each of which has a traffic light controller. Fig. 5 shows the traffic phases, and Tab. 1 shows the traffic phase splits of the existing (i.e., deterministic) traffic light controllers at all intersections in SCTN. The traffic phase splits were observed during the evening busy hours (i.e., 5–7 pm) of a working day, and they were measured using a stopwatch.

    Malaysia ranks third and fifth worldwide in the number of lightning strikes (i.e., around 240 thunderstorm days/year [41]) and rainfall (i.e., around 1000 mm/year [42]), respectively. So, traffic congestion caused by traffic disruptions (i.e., rainfall) during the peak hours is a serious problem.

    In this paper, we apply our proposed algorithm to the traffic network of Sunway City. Investigation is conducted in the traffic simulator SUMO [43].

    5.2 Grid Traffic Network

    A GTN, which is a complex traffic network, has been widely adopted in the literature [15,44–47] to conduct similar investigations, and so it is selected for investigation in this paper to show that the proposed scheme is effective. This paper uses a 3×3 GTN with nine intersections, whereby a traffic light controller is installed at each intersection, which has four legs in four different directions (i.e., northbound, southbound, eastbound, and westbound). Each leg has two lanes so that a vehicle can either enter or leave the leg of an intersection.

    Figure 4: The SCTN and the locations of its traffic light controllers

    Figure 5: Seven intersections of the SCTN shown in Fig. 4. (a) Intersection 1 (b) Intersection 2 (c) Intersection 3 (d) Intersection 4 (e) Intersection 5 (f) Intersection 6 (g) Intersection 7

    5.3 Simulation Settings

    This subsection provides the specification of the simulation setup. Two different traffic networks are investigated: a) SCTN with seven intersections (see Fig. 4), and b) a 3×3 GTN with nine intersections. While SCTN is based on a real-world traffic network, GTN is a complex traffic network traditionally used in traffic light control investigations [15,44–47]. So, these traffic networks are chosen to investigate the effectiveness of the proposed scheme in both real-world and complex traffic networks. The simulations are conducted using Matlab [48] and the traffic simulator SUMO [43]. The traffic control interface protocol of SUMO (i.e., TraCI4Matlab [49]) is used to interconnect Matlab and SUMO. Both SCTN and GTN are designed using NetEdit, which is the traffic network editor of SUMO. The resource files in XML provide the details of the speed limits and arrival rates of vehicles, which define the RC and NRC traffic congestion levels and their effects on the traffic networks. Each simulation runs for up to 100 episodes. The steps of each episode are given in Steps 2 to 18 of Algorithm 3.

    Table 1: Traffic phase splits for existing traffic light controllers in SCTN

    5.4 Parameters of Simulation and Performance Measure

    The simulation parameters that give the best possible results for a DQN agent are presented in Tab. 2. Up to 50,000 experiences can be stored in a replay memory, and up to 100 experiences can be sampled randomly to form a minibatch. The parameter values presented in Tab. 2 have been shown to provide the best possible performance in the literature [44].
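
    The replay memory settings above (a capacity of 50,000 experiences and minibatches of up to 100 experiences) can be sketched in Python as follows. The class name, its methods, and the placeholder experiences are assumptions made for illustration; the simulations in this paper were implemented in Matlab.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store that discards the oldest experience when full."""

    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size=100):
        # Sample without replacement; return fewer items if the memory is small.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory()
for step in range(60_000):
    # Placeholder experience tuple (state, action, reward, next_state).
    memory.store((step, 0, 0.0, step + 1))

batch = memory.sample()
```

    After 60,000 insertions the memory holds only the most recent 50,000 experiences, so the oldest 10,000 have been overwritten.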

    Tab. 3 presents the simulation parameters for the Burr type XII distribution model, which covers various intensities of rainfall, including the no rain (NR), light rain (LR), moderate rain (MR), and heavy rain (HR) scenarios [27]. A lower value of the scale parameter $\beta$ shrinks the distribution, and a higher value stretches it. The shape parameters $k$ and $c$ are reciprocals of the scale parameter $\beta$. The shape parameters $k$ and $c$, as well as the scale parameter $\beta$, increase with the intensity of rainfall [27].

    Table 2: Parameters of simulation for the DQN agent
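
    For readers unfamiliar with the Burr type XII distribution, the following Python sketch draws samples from it by inverse-transform sampling, using the standard cumulative distribution function $F(x) = 1 - [1 + (x/\beta)^c]^{-k}$ for $x \ge 0$. The function names and the parameter values are hypothetical and are not the values of Tab. 3.

```python
import random


def burr12_cdf(x, c, k, beta):
    """Burr type XII CDF with shape parameters c, k and scale parameter beta."""
    return 1.0 - (1.0 + (x / beta) ** c) ** (-k)


def burr12_inverse_cdf(u, c, k, beta):
    """Inverse of the CDF above, for 0 <= u < 1."""
    return beta * ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)


def burr12_sample(c, k, beta, rng=random):
    """Draw one sample by applying the inverse CDF to a uniform variate."""
    return burr12_inverse_cdf(rng.random(), c, k, beta)


# Hypothetical parameters; a seeded generator makes the draws repeatable.
rng = random.Random(0)
samples = [burr12_sample(c=2.0, k=3.0, beta=1.5, rng=rng) for _ in range(1000)]
```

    Doubling `beta` while keeping `c` and `k` fixed doubles every quantile, which is the stretching effect of the scale parameter described above.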

    The performance measure used in this paper is the cumulative delay of the vehicles. Our proposed scheme aims to reduce the cumulative delay required by vehicles to cross multiple intersections. The cumulative delay also includes the average travelling and waiting times during congestion caused by RC and NRC. The total number of vehicles is 1000 per episode.

    5.5 Results and Discussion

    This section compares the performance measures achieved by our proposed MADQN, MARL, and the baseline approaches under RC and NRC traffic congestions. Fig. 6 presents the MADQN model loss during the training process under RC and NRC traffic congestions in SCTN and GTN. The lower model loss enhances the performances of both SCTN and GTN under both traffic congestions (i.e., RC and NRC).

    Table 3: Parameters of simulation for the Burr distribution model

    Figure 6: The MADQN model loss under RC and NRC traffic congestions in SCTN and GTN reduces with episode

    5.5.1 Accumulated Delayed Reward

    The accumulated delayed reward for MARL and MADQN under RC and NRC traffic congestions increases with episode in SCTN, as well as in GTN, as shown in Fig. 7. The accumulated delayed reward for both MARL and MADQN approaches becomes steady after 50 episodes. Compared to MARL, the accumulated delayed reward achieved by MADQN is higher in both types of traffic congestions (i.e., RC and NRC) and traffic networks (i.e., SCTN and GTN). Overall, MADQN increases the accumulated delayed reward by up to 10% and 12.5% under RC and NRC traffic congestions in the SCTN, and up to 8.3% and 7.2% under RC and NRC traffic congestions in the GTN, respectively.

    Figure 7: Accumulated delayed reward under RC and NRC traffic congestions in SCTN and GTN increases with episode. The accumulated delayed reward of MADQN is higher than that of MARL, which enhances the performances of both SCTN and GTN. (a) RC at SCTN (b) RC at GTN (c) NRC at SCTN (d) NRC at GTN

    5.5.2 Cumulative Delay of the Vehicles

    The cumulative delay of the vehicles for MADQN, MARL, and the baseline approaches under RC and NRC traffic congestions reduces with episode in SCTN, as shown in Fig. 8. For both RC and NRC traffic congestions, MADQN outperforms MARL and the baseline approaches, achieving a lower cumulative delay under increased traffic volume. A similar trend is observed in GTN under RC and NRC traffic congestions, as shown in Fig. 8. Overall, MADQN reduces the cumulative delay of vehicles by up to 27.7% and 27.9% under RC and NRC traffic congestions in SCTN, and up to 28.5% and 27.8% under RC and NRC traffic congestions in GTN, respectively.

    Figure 8: Cumulative delay of the vehicles under RC and NRC traffic congestions in SCTN and GTN reduces with episode. The cumulative delay of the baseline approach is fixed, while that of the MARL and MADQN approaches reduces with episode. MADQN has a lower cumulative delay than the MARL and baseline approaches, which enhances the performances of both SCTN and GTN. (a) RC at SCTN (b) RC at GTN (c) NRC at SCTN (d) NRC at GTN
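
    Percentage reductions of this kind are computed relative to the delay of a reference approach. The following Python sketch shows the arithmetic; the function name and the delay values are hypothetical and are not results from the simulations.

```python
def percent_reduction(reference_delay, new_delay):
    """Reduction of `new_delay` relative to `reference_delay`, in percent."""
    return 100.0 * (reference_delay - new_delay) / reference_delay


# Hypothetical cumulative delays (arbitrary units) for a reference and an improved approach.
reduction = percent_reduction(reference_delay=200.0, new_delay=150.0)
```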

    Fig. 9 presents a performance comparison between MARL and MADQN traffic light controllers in SCTN and GTN in terms of the cumulative delay of vehicles. MADQN achieves lower values than MARL in both SCTN and GTN under both RC and NRC traffic congestions. Overall, MADQN achieves similar results under RC and NRC traffic congestions because of its two main features, namely the target network and experience replay, which allow it to outperform MARL in both the complex GTN and the real-world SCTN.

    Figure 9: Performance comparison between MARL and MADQN traffic light controllers applied in SCTN and GTN under RC and NRC traffic congestions. MADQN achieves a lower cumulative delay compared to MARL under increased traffic volume. Lower cumulative delay enhances the performance of the traffic networks

    6 Conclusions and Future Directions

    This paper investigates the application of multi-agent deep Q-network (MADQN) to traffic light controllers at multiple intersections in order to address two types of traffic congestions: a) recurrent congestion (RC) caused by high volume of traffic; and b) non-recurrent congestion (NRC) caused by traffic disruptions, particularly bad weather conditions. From the traffic light controller perspective, MADQN adjusts the traffic phase split according to traffic demand in order to minimize the number of waiting vehicles at different lanes of an intersection. From the MADQN perspective, it enables traffic light controllers to use deep neural networks (DNNs) to store and represent complex and continuous states, exchange knowledge (or Q-values), learn, and achieve optimal joint actions while addressing the curse of dimensionality in a multi-agent environment. There are two main features in MADQN, namely the target network and experience replay, which provide training with stability in the presence of multiple traffic light controllers. MADQN is investigated in a traditional GTN and a real traffic network based on Sunway City. Our simulation in Matlab and SUMO shows that MADQN outperforms MARL by reducing the cumulative delay of vehicles by up to 27.7% and 27.9% under RC and NRC traffic congestions in the SCTN, and up to 28.5% and 27.8% under RC and NRC traffic congestions in the GTN, respectively.

    There are six future works that could be pursued to improve MADQN. Firstly, the assumption on turning movements could be relaxed, so that the left-turning (or right-turning) traffic movement is not protected or can conflict with other traffic movements in a left-hand (or right-hand) traffic network. Secondly, the experiences could be prioritized during experience replay for faster learning in a multi-agent environment with multiple intersections. Thirdly, the effects of dynamicity on MADQN, including the dynamic movement of vehicles, could be addressed. Fourthly, fairness and prioritized access among traffic flows at intersections could be provided. Fifthly, other kinds of traffic disruptions, including crashes, could be incorporated into the state space as they tend to cause serious traffic congestion. Lastly, real field experiments could be conducted to train and validate the proposed scheme using real-world feedback. Real field data could be collected so that the traffic network and the system performance achieved in the simulation can be calibrated.

    Funding Statement: This research was supported by Publication Fund under Research Creativity and Management Office, Universiti Sains Malaysia.

    Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
