
    Locally generalised multi-agent reinforcement learning for demand and capacity balancing with customised neural networks

Chinese Journal of Aeronautics, 2023, Issue 4

Yutong CHEN, Minghu HU, Yan XU, Lei YANG

a College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

b School of Aerospace, Transport and Manufacturing, Cranfield University, Cranfield MK43 0AL, United Kingdom

KEYWORDS: Air traffic flow management; Demand and capacity balancing; Deep Q-learning network; Flight delays; Generalisation; Ground delay program; Multi-agent reinforcement learning

Abstract: Reinforcement Learning (RL) techniques are being studied to solve Demand and Capacity Balancing (DCB) problems to fully exploit their computational performance. A locally generalised Multi-Agent Reinforcement Learning (MARL) method for real-world DCB problems is proposed. The proposed method can deploy trained agents directly to unseen scenarios in a specific Air Traffic Flow Management (ATFM) region to quickly obtain a satisfactory solution. In this method, the agents of all flights in a scenario form a multi-agent decision-making system based on partial observation. The trained agent with the customised neural network can be deployed directly on the corresponding flight, allowing it to solve the DCB problem jointly. A cooperation coefficient is introduced in the reward function, which is used to adjust the agent's cooperation preference in the multi-agent system, thereby controlling the distribution of flight delay time allocation. A multi-iteration mechanism is designed for the DCB decision-making framework to deal with problems arising from non-stationarity in MARL and to ensure that all hotspots are eliminated. Experiments based on large-scale, high-complexity, real-world scenarios are conducted to verify the effectiveness and efficiency of the method. From a statistical point of view, it is proven that the proposed method is generalised within the scope of the flights and sectors of interest, and its optimisation performance outperforms the standard computer-assisted slot allocation and state-of-the-art RL-based DCB methods. The sensitivity analysis preliminarily reveals the effect of the cooperation coefficient on delay time allocation.

    1.Introduction

Recently, one of the major concerns in the global development of civil aviation is the growing imbalance between increasing total traffic volume and saturated airspace accommodation, also known as the demand-capacity mismatch. If demand exceeds capacity in a sector for a certain period, hotspots will arise, resulting in increased loads on controllers, congestion in airspace, and flight delays.1 As a result, balancing demand and capacity has become a vital issue for the aviation industry. DCB is one of the seven operational concepts of Air Traffic Management (ATM).2 According to Single European Sky ATM Research (SESAR), DCB will play an essential role in the future air traffic management system as part of network management and can help to reduce flight delays.3,4

DCB is also known as ATFM or Air Traffic Flow and Capacity Management (ATFCM).5,6 Depending on the advance time of operation, ATFM is divided into strategic (one year to one week), pre-tactical (one week to one day), and tactical (day of operation).7 The main focus of this paper is on ATFM implemented on the day before (D-1) or on the day (D-0) of operation. The typical operational ways of ATFM include Ground Delay Program (GDP),8,9 rerouting,10,11 separation management12,13 and their combination.14,15

ATFM methods are classified into two types based on their solution methods: exact solution methods16,17 and approximate solution methods.18,19 In general, the advantage of exact solution methods is obtaining a globally optimal solution. However, when the problem is too large, such methods cannot guarantee that the solution will be completed in a limited time. Besides, the computing time highly depends on the case and can vary considerably for DCB problems of similar scale.16,17 As a result, exact solution methods are hardly ever applied in practice. On the other hand, Computer-Assisted Slot Allocation (CASA), an approximate algorithm, is usually used in practice. CASA is commonly used in Europe, and it is similar to the Ration By Schedule (RBS) approach applied in the United States. Approximate solution methods typically employ some heuristic framework or algorithm to find a locally optimal solution in a reasonable amount of time, and their computation time is less sensitive to problem scale than that of exact solution methods. However, locally optimal solutions are often not readily accepted because there is frequently a significant gap between the local and the global optimum. Thus, a DCB method capable of obtaining solutions with high optimisation performance in a short time is highly desired.
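For illustration, a minimal Python sketch of a first-scheduled-first-served slot allocation in the spirit of CASA/RBS is given below; the single-sector simplification, the fixed window width and the function names are our own assumptions rather than the operational algorithm.

```python
from collections import defaultdict

def allocate_slots(requested_entries, capacity_per_window, window_minutes=20):
    """Greedy first-scheduled-first-served slot allocation for ONE regulated sector.

    requested_entries: dict flight_id -> requested entry time (minutes after midnight).
    capacity_per_window: maximum number of entries allowed per time window.
    Returns dict flight_id -> ground delay in minutes.
    """
    used = defaultdict(int)   # window index -> slots already taken
    delays = {}
    # Ration by schedule: flights with earlier requested entry get earlier slots.
    for fid, t_req in sorted(requested_entries.items(), key=lambda kv: kv[1]):
        w = t_req // window_minutes          # first window the flight could use
        while used[w] >= capacity_per_window:
            w += 1                           # push the flight to the next window
        used[w] += 1
        slot_time = max(t_req, w * window_minutes)
        delays[fid] = slot_time - t_req
    return delays

# Example: 5 flights request entry into a sector whose capacity is 2 per 20 min.
print(allocate_slots({"F1": 180, "F2": 185, "F3": 190, "F4": 195, "F5": 199},
                     capacity_per_window=2))
```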

In recent years, reinforcement learning techniques have gradually been applied to DCB problems to find a good balance between computing speed and optimisation performance. RL methods train agents to obtain strategies through a large number of training scenarios and then deploy the trained agents to an actual scenario to make quick decisions, so that the solution can be obtained in a short time. This is equivalent to transferring a significant amount of solution time to the training stage, allowing the method to respond quickly to actual scenarios in operation. Hence, RL-based methods have the advantage of faster computation compared with exact solution methods. Moreover, compared with approximate solution methods, RL-based methods have the potential to obtain a better approximate solution, because approximate solution methods such as CASA are rule-based algorithms designed from human experience, which is likely to limit optimality. RL methods are still at an early exploratory stage in the field of DCB. However, to the best of our knowledge, it is challenging for existing RL-based DCB methods to solve DCB problems with large-scale, high-complexity, real-world scenarios in a short time by directly deploying trained agents. For some existing RL-based DCB methods, it is impossible to deploy trained agents to unseen scenarios because of the models' lack of scalability. For others, it is not trivial to achieve satisfactory solutions in unseen scenarios because of the model design structure (please refer to Table 1 32–42 and the corresponding discussions for details). Therefore, these methods must retrain the agents to effectively solve the DCB problem for an unseen scenario. However, training agents is time-consuming, which offsets the advantage of RL methods.

Therefore, we propose a locally generalised MARL method for real-world DCB problems to fill this gap. Our method can deploy trained agents directly to unseen scenarios in a specific ATFM region to obtain a satisfactory solution quickly (where 'locally' corresponds to 'a specific ATFM region'). In this method, a distributed decision system is constructed by deploying a customised neural network on each flight to handle the specificity of each flight in DCB problems. A cooperation coefficient is introduced to control the degree of cooperation between flights. A multi-iteration mechanism is designed to deal with problems arising from non-stationarity in MARL.

This paper is organised as follows. In the rest of Section 1, we provide a brief review of related work and introduce the features and contributions of our method. Section 2 introduces notations and formulates the DCB problem. In Section 3, we discuss the construction of the reinforcement learning environment, the architecture of the network, the training method of the neural network, and the multi-iteration mechanism. In Section 4, we show the MARL training process, the performance test results, and the cooperation coefficient sensitivity analysis through simulation experiments. Finally, conclusions and future work are summarised in Section 5.

    1.1.Related work

Much research has been done on DCB problems, and some review articles provide good summaries of the current research progress.20–22 RL methods have been explored in various fields of ATM because they can solve a problem in a short time after completing training, such as conflict resolution,23–25 collision avoidance26–28 and assistant control tools.29–31 Due to the advantage of responding quickly in real scenarios, RL methods have excellent research value and potential for application in real-world DCB problems. Several representative RL-based DCB methods are summarised in Table 1,32–42 and the relevant explanations for this table are given as follows:

    (1) Agent: Artificial intelligence with decision-making ability.

    - Number: The number of agents in the system, where S and M refer to single and multiple, respectively.

    - Role: The agent’s role in the DCB problem, where C and F refer to controllers and flights, respectively.

    - Mode: The operating mode of agents, where C and D refer to centralized and decentralized (distributed), respectively.

    (2) RL method: The method used to train the agent's policy, e.g., Q-table, Temporal-Difference (TD) learning, Q-learning, Deep Q-Learning (DQN), Proximal Policy Optimisation (PPO) and Asynchronous Advantage Actor-Critic (A3C).

    Table 1 Features of DCB methods based on RL.32–42

    (3) Sharing policy: Whether agents share the neural network parameters in the multi-agent system.

    (4) Generalisation: Whether the method is generalised in a specific ATFM region. L1 means that the trained agent can be deployed directly to unseen scenarios (the model is scalable), but there is no guarantee that a satisfactory solution will be obtained. L2 means that the trained agent can obtain a satisfactory solution based on L1.

    (5) ATFM method: Operational ways of ATFM, where GDP refers to ground delay program, and MIT refers to Miles-in-trail (a kind of separation management).

    (6) Sector opening scheme: Whether the sector structure changes over time, as in the real world.

    (7) Uncertainty: Whether to consider the uncertainty of demand and capacity forecasts and the uncertainty of flight.

    (8) Elimination: Whether all hotspots can be guaranteed to be eliminated.

    (9) Experimental scenario: The most complex and realistic experimental scenario in the study.

    - Real-world: Whether the experiment was based on real-world data, including flights and sectors.

    - Hotspot: The initial number of hotspots.

    - Flight scale: The number of flights (rounded to the nearest 100).

    - Sector scale: The number of sectors.

    (10) Symbol descriptions: the checkmark (✓) and the circle (○) respectively mean that the method has and does not have the corresponding feature. N/A means that the feature or parameter is not applicable to the method or is not disclosed in the study.

    Please note that if there are several extended methods (or variants of the basic method) introduced in a study, only the one with the highest comprehensive performance is shown in the table.

Crespo et al.32 trained a single agent through RL to centrally assign flight delays to several airports (all flights at the same airport are delayed by the same time). Due to the limitation of the Q-table, this agent can only be deployed for problems with specific sectors and airports. Agogino and Tumer33 deployed agents on several route nodes around overloaded sectors, forming a distributed multi-agent system and employing the Miles-in-Trail (MIT) method to adjust the distance between flights passing through the same node for ATFM purposes. Kravaris et al.34 proposed a collaborative RL framework for DCB, which deploys an agent on each flight. In Kravaris' method, each flight makes a decision independently without communicating with other flights. However, the effectiveness of this method has only been demonstrated in one tiny toy case. Spatharis et al.35,36 further enhanced Kravaris' method and verified its effectiveness in real-world scenarios with high complexity. Duong et al.37 presented a blockchain-based RL method for ATFM. Like Crespo's method, the agent in Duong's method plays a controller that assigns delays to flights at its corresponding airport, and each agent is responsible for only one airport. Spatharis et al.38,39 proposed and extended a hierarchical MARL scheme for DCB problems, which constructs two states for agents: the ground state and the abstract state. A common shortcoming of the six studies mentioned above is that they all employed RL algorithms as search algorithms; that is, the process of training is used as the process of solving. Therefore, these methods are not generalised, and the advantage of reinforcement learning in terms of rapid response is not available to them. Besides, because of the non-stationarity in MARL, they cannot guarantee that the agents will always be able to eliminate all hotspots. Thanks to the development of deep neural networks in reinforcement learning applications, DCB methods with deep neural networks have been further enhanced. Chen et al.40 validated the effectiveness of the Deep Q-learning Network (DQN) in an RL-based DCB method. However, the experiments were not based on real-world scenarios, and the generalisation of the method was not verified. Tang and Xu41 integrated an action supervisor into a rule-based time-step DCB method using Proximal Policy Optimisation (PPO). Tang's method forces the agent to change its action when it chooses an action that would cause a hotspot, to ensure that hotspots are eliminated. However, the addition of the action supervisor leads to a significant increase in delays. Huang and Xu42 presented a hotspot-based DCB method with Asynchronous Advantage Actor-Critic (A3C), which ensures that hotspots are eliminated by allocating multiple delays. Although Tang's and Huang's methods attempt to deploy trained agents to unseen scenarios, they cannot be considered highly generalised due to potential flaws in the model design. For Tang's method, the observation matrix appears so sparse that it limits agent learning. For Huang's method, sector IDs are used as part of the agent observations, and an ID as a code is not computationally meaningful.

In summary, it is difficult to deploy agents trained by the existing methods in Table 1 directly to an unseen scenario and obtain a satisfactory solution quickly. Therefore, there is an urgent need to improve the generalisation of RL-based DCB methods.

    1.2.Proposed method

This paper serves as an extension of our previous study.40 In this paper, a locally generalised MARL method for DCB in which each agent has a customised neural network is proposed (the features of our method are also summarised in Table 1).

Generalisation is one of the most critical indicators of RL methods. To our knowledge, however, it presents a dilemma for the DCB problem. On the one hand, the scale of each DCB problem (the number of flights or sectors) is different, while the dimension of the observation matrix in an RL method is usually required to be fixed. On the other hand, we could perhaps technically remove the limitation on the observation matrix, but completely homogenising individuals in large-scale DCB problems could significantly decrease the solver's optimisation performance and make it difficult to meet the differentiated preferences of different flights. Hence, a balance needs to be found in this dilemma to maximise the advantages of the RL method.

Considering that flight schedules for commercial flights are usually cyclical, we can train an agent with a customised neural network for each flight schedule and deploy it in any scenario that contains that flight. Our method thus offers both local generalisation and the optimisation performance gains brought by heterogeneous individuals in the multi-agent system. We set a cooperation coefficient in the reward function to adjust the cooperation preference of each flight, thereby adjusting the distribution of the global delay time allocation. Our neural network outputs only two discrete actions. We employ a state-of-the-art algorithm based on DQN, Rainbow DQN,43 which significantly improves RL efficiency by integrating multiple DQN enhancement technologies. Besides, we design a multi-iteration mechanism for the DCB decision-making framework to deal with problems arising from non-stationarity in MARL, thereby enabling the solver to eliminate all hotspots.
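As an illustration only, one possible way to realise such a cooperation coefficient is to scale the individual holding penalty relative to a shared hotspot term, so that a larger coefficient makes an agent more tolerant of its own delay; the term names and weights below are assumptions, and the exact reward used in this work is defined in Section 3.1.3.

```python
def shaped_reward(own_delay_windows, hotspots_remaining, coop_coef=1.0,
                  w_hold=1.0, w_hotspot=5.0):
    """Illustrative per-agent reward with a cooperation coefficient (assumed form).

    own_delay_windows: number of time windows this flight has been held so far.
    hotspots_remaining: hotspots that the joint decision still leaves unresolved.
    coop_coef: larger values shrink the share of the individual holding penalty
               in the total reward, pushing the agent toward the global objective.
    """
    holding_penalty = -w_hold * own_delay_windows / coop_coef
    global_penalty = -w_hotspot * hotspots_remaining
    return holding_penalty + global_penalty
```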

    1.3.Summary of contributions

    (1) Trained agents of the proposed method can be deployed directly to unseen scenarios in a specific ATFM region to obtain a satisfactory solution quickly.

    (2) A cooperation coefficient is introduced to adjust the distribution of flight delay time allocation.

    (3) A multi-iteration mechanism is designed for the DCB decision-making framework to enable the solver to eliminate all hotspots.

    (4) Systematic experiments based on large-scale, high-complexity, real-world scenarios are performed to verify the proposed method's effectiveness and efficiency.

    2.Problem formulation

The demand and capacity balancing problem handled in this article is to ensure that the number of flights entering each sector per unit time does not exceed its capacity (that is, all hotspots are eliminated) by implementing a ground delay program before the operation.
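A hotspot is therefore a sector and time-window pair whose entry count exceeds the declared capacity. A minimal check of this condition might look like the following sketch, where the data layout is an assumption for illustration.

```python
from collections import Counter

def find_hotspots(schedules, capacities, window_minutes=20):
    """Return (sector, window_index) pairs where entry demand exceeds capacity.

    schedules:  iterable of flight schedules, each a list of (sector, entry_time_min) pairs.
    capacities: dict mapping (sector, window_index) -> maximum entries per window.
    """
    demand = Counter()
    for flight in schedules:
        for sector, t in flight:
            demand[(sector, t // window_minutes)] += 1
    # Sectors without a declared capacity are treated as unconstrained here.
    return [key for key, count in demand.items()
            if count > capacities.get(key, float("inf"))]
```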

    2.1.Demand and capacity

All initial flight schedules can be obtained one day before the flight operation. The flight schedule of the ith flight is denoted by $f_i$, which consists of a set of binary arrays, as shown in Fig.1.

where $e_{ij}$ denotes the jth sector through which the ith flight is scheduled to pass, $t_{ij}$ denotes the time of entry into that sector, $M_i$ denotes the number of sectors through which the ith flight is scheduled to pass, and $I$ denotes the set of flights. The set of initial flight schedules $F^{\mathrm{Initial}}$ is represented as $F^{\mathrm{Initial}} = \{ f_i \mid i \in I \}$.

    Fig.1 Flight schedule.

For example, if the width of the time windows τ is 20 min, the time range of the 10th time window is [180, 200) min, which corresponds to 03:00–03:20 of the operation day (not including 03:20).
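The mapping between an entry time and its (1-based) time-window index used in this example can be reproduced as follows.

```python
def window_index(entry_time_min, tau=20):
    """1-based index of the time window containing entry_time_min (minutes after 00:00)."""
    return entry_time_min // tau + 1

def window_range(k, tau=20):
    """Half-open time range [start, end) in minutes covered by the kth window."""
    return (k - 1) * tau, k * tau

# 180 min after midnight (03:00) falls in the 10th window, which covers [180, 200).
assert window_index(180) == 10 and window_range(10) == (180, 200)
```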

In practice, sectors are divided into two types: elementary sectors and collapsed sectors. An elementary sector is the most fundamental sector unit in the airspace, while a collapsed sector comprises several adjacent elementary sectors. The basic sector units operate as elementary sectors or collapsed sectors depending on the sector opening scheme. In this paper, the sector opening scheme is considered, and the state of the sectors changes over time windows.

    Fig.2 Sector opening scheme.

    In summary, the DCB problem in this paper is defined as balancing the traffic demand with the airspace capacity through shifting flight take-off time to avoid hotspots.The sector opening scheme is considered.The optimisation objective is to minimise the total delay time for all flights.

    2.2.Partially observable decision-making

    Fig.3 Multi-agent decision-making framework (top-level framework of the proposed method) where all agents are distributed.

To find an optimal policy, we adopt the objective of minimising the expectation of the average delay time of all flights in the same scenario, which is defined as

$\min \; \mathbb{E}\left[ \frac{1}{N} \sum_{i \in I} d_i \right]$

where $N$ is the number of flights and $d_i$ is the delay time assigned to the ith flight. The average delay time will also be an essential metric to evaluate the trained policy in Section 4.

    3.Approach

In this section, we first introduce the key ingredients of our reinforcement learning setup. Then, the architecture of the policy network, based on a neural network, is detailed. Next, we present the MARL training algorithm based on Rainbow DQN and the design of the training scenarios. Finally, we discuss the multi-iteration mechanism.

    3.1.Reinforcement learning setup

We assume all flights in the environment are heterogeneous in order to tackle the DCB problem with the MARL method. Each flight has a different aircraft type, route, and preferences. More importantly, most commercial flight plans are generally repeated every week, and there is a limited number of flight routes in the real world. Thus, we treat all flights over time as a set of flights based on historical data and train an agent for each flight using reinforcement learning methods. In this way, we can solve DCB problems based on any subset of this set of flights by deploying the corresponding agents to the flights. Based on the problem formulation in Section 2.2, the DCB problem can be transformed into a Partially Observable Markov Decision Process (POMDP) solved with MARL. A POMDP can be described as a six-tuple (S, A, P, R, Ω, O), where S is the state space, A is the action space, P is the state-transition model, R is the reward function, Ω is the observation space (o ∈ Ω), and O is the observation probability distribution given the system state (o ∼ O(s)), where s is the global state. An agent is deployed on each aircraft in this environment, and all the agents form a distributed decision-making system. Therefore, a multi-aircraft state-transition model P determined by the aircraft's delay time and uncertainty is unnecessary. The action space A, the observation space Ω and the reward function R are defined below.
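As a schematic illustration of this distributed, partially observable decision-making (not the exact implementation), each flight's agent maps its local observation to one of two discrete actions, taking off in the current time window or holding for one more window; the class and attribute names below are placeholders.

```python
import random

TAKE_OFF, HOLD = 0, 1  # the two discrete actions: depart in the current window or delay one window

class FlightAgent:
    """Schematic per-flight agent: maps a partial observation to one of two actions."""
    def __init__(self, policy_net, epsilon=0.05):
        self.policy_net = policy_net   # customised network of this flight, returns a sequence of Q-values
        self.epsilon = epsilon

    def act(self, observation):
        if random.random() < self.epsilon:            # occasional exploration
            return random.choice((TAKE_OFF, HOLD))
        q_values = self.policy_net(observation)        # greedy action from the Q-values
        return int(max(range(len(q_values)), key=q_values.__getitem__))
```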

    3.1.1.Action space

    3.1.3.Reward design

    3.2.Network architecture

    3.3.Training method

    3.3.1.Training algorithm

Because the proposed reinforcement learning model's action space is discrete and contains only two actions, DQN is ideal as an efficient reinforcement learning algorithm for discrete actions.45 In recent years, several DQN performance improvement techniques have been proposed. Inspired by a recently proposed advanced DQN method, Rainbow DQN,43 we adaptively employ and combine Double DQN,46 Duelling DQN47 and the Prioritised Replay Buffer48 to enhance the performance of DQN.
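For reference, a compact PyTorch sketch of two of these enhancements, a duelling Q-network head and a double-DQN target, is given below; the layer sizes and variable names are illustrative and do not reproduce the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DuellingQNet(nn.Module):
    """Duelling architecture: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, obs_dim, n_actions=2, hidden=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)          # state-value stream
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream

    def forward(self, obs):
        h = self.feature(obs)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)

def double_dqn_target(online_net, target_net, reward, next_obs, done, gamma=0.9):
    """Double DQN: the online net selects the next action, the target net evaluates it."""
    with torch.no_grad():
        next_action = online_net(next_obs).argmax(dim=-1, keepdim=True)
        next_q = target_net(next_obs).gather(-1, next_action).squeeze(-1)
        return reward + gamma * (1.0 - done) * next_q
```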

    Fig.4 Architecture of policy network, showing the working process of an agent and focusing on neural network.

Algorithm 1. Learning to solve DCB problems.46–49

The learning algorithm for DCB is summarised in Algorithm 1.46–49 In this algorithm, each episode consists of two parts: one is the simulation and collection of transitions (lines 4–29), and the other is the policy update (lines 31–37).

    As agents in our MARL model are distributed, several parts of the algorithm can run in parallel, including lines 7–16, lines 18–22 and lines 31–37.
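The episode structure just described (simulation and transition collection followed by per-agent policy updates) can be summarised as the following sketch, in which env, agents and replay_buffers are placeholders for the components defined in this section.

```python
import random

def train(env, agents, replay_buffers, scenarios, n_episodes, batch_size=20):
    """Sketch of the episode loop: one random scenario per episode,
    simulation/transition collection, then per-agent policy updates."""
    for episode in range(n_episodes):
        scenario = random.choice(scenarios)             # pick a training scenario (Algorithm 1, line 4)
        observations = env.reset(scenario)              # dict: flight_id -> partial observation
        done = False
        while not done:                                  # simulation and transition collection
            actions = {fid: agents[fid].act(obs) for fid, obs in observations.items()}
            next_observations, rewards, done = env.step(actions)
            for fid in actions:
                replay_buffers[fid].push(observations[fid], actions[fid],
                                         rewards[fid], next_observations[fid], done)
            observations = next_observations
        for fid, agent in agents.items():                # policy update (can run in parallel)
            if len(replay_buffers[fid]) >= batch_size:
                agent.update(replay_buffers[fid].sample(batch_size))
```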

    Fig.5 French and Spanish sectors.

    3.3.2.Training scenarios

This paper's scenario data consist of sector opening schemes and flight schedules. We use real data of French and Spanish airspace, as shown in Fig.5. Specifically, 12187 flights on 2019-08-27 and 12138 flights on 2019-08-28 are selected for training (a total of 24325 flights), and a unique agent is deployed on each of them. The sector opening schemes for each day in August 2019 (31 sector opening schemes in total) are used for the training scenarios. A total of 396 sectors are considered, including elementary sectors and collapsed sectors. Based on the real data mentioned above, each training scenario comprises a set of flight schedules and a sector opening scheme for any day. Thus, there are 62 different training scenarios in total, and one of them is randomly selected in each episode for training (refer to Algorithm 1, line 4).

    3.4.Multi-iteration mechanism

Due to the non-stationarity in MARL, the trained agents in our model do not guarantee that hotspots will be wholly eliminated within just one iteration. One iteration refers to the process in which the trained agents make action decisions for all flights of the day according to the order of the time windows, until all flights are assigned a departure time or delayed to the next day. Note that, to date, there are few proven methods to handle non-stationarity in MARL effectively.50 Therefore, a multi-iteration mechanism is introduced to deal with the problem arising from the non-stationarity in MARL. If there are still hotspots that have not been eliminated after the first iteration, multiple iterations are performed based on the assigned departure times until all hotspots are eliminated. Please note that the iterations following the first iteration differ from the first iteration. For legibility, we call an iteration after the first iteration a subsequent iteration. The framework of the subsequent iteration in the multi-iteration mechanism is shown in Fig.6. In each time window of a subsequent iteration, flights are divided into two categories:

    (1) One is the flight (represented by Flight A in Fig.6) which does not cause hotspots based on the currently assigned departure time. Its departure time will remain unchanged; it is scheduled to take off in the current time window.

    (2) The other is the flight (represented by Flight B in Fig.6) which causes hotspots based on the currently assigned departure time. Through its trained neural network, the agent deployed on the flight will decide whether the flight will take off in the current time window or be delayed to the next time window.

Please note that this multi-iteration mechanism is only used when trained agents solve a DCB problem, not for training. In training, only one iteration is performed. In solving, the first iteration (refer to Fig.3) is performed first. If, after the first iteration, there are still hotspots that have not been eliminated, subsequent iterations (refer to Fig.6) are performed until all the hotspots have been eliminated.
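In code form, the solving procedure with the multi-iteration mechanism might be organised as in the sketch below: one first full iteration, then repeated passes in which only flights that still contribute to hotspots are re-decided by their agents; all helper names are placeholders.

```python
HOLD = 1  # action index assumed for "delay to the next time window"

def solve_dcb(env, agents, scenario, max_iterations=50):
    """Sketch of the multi-iteration solving procedure (used only after training)."""
    departures = env.first_iteration(agents, scenario)        # all flights decided once
    for _ in range(max_iterations):
        if not env.find_hotspots(departures):                 # every hotspot eliminated
            return departures
        # Subsequent iteration: walk through the time windows again.
        for window in env.time_windows():
            for flight in env.flights_in_window(departures, window):
                if env.causes_hotspot(flight, departures):     # Flight B: re-decide via its agent
                    if agents[flight].act(env.observe(flight, departures)) == HOLD:
                        departures[flight] += env.window_minutes
                # Flight A: departure time left unchanged
    return departures
```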

    4.Simulation experiments

In this section, we first introduce the training setup and present the training results. Next, we test the proposed method's generalisation and compare it with state-of-the-art RL-based DCB methods. Finally, we discuss the sensitivity analysis of the cooperation coefficient.

    4.1.Training

The neural networks are constructed with the PyTorch library in the Python environment and are trained on a computer with one i5-8300H CPU. The Adam optimisation algorithm51 updates the network parameters. We determine the values of the hyper-parameters in Algorithm 1 through tuning experiments, and they are summarised in Table 2. Specifically, the exploration rate ε is set to 0.95; the buffer capacity n_BC is set to 200; the batch size n_BS is set to 20; the discount rate γ is set to 0.9; the learning rate α is set to 1 × 10⁻³; the target network update cycle n_TN is set to 100. In each training episode, one of the 62 scenarios in the training dataset is randomly input to the reinforcement learning environment as the training scenario (refer to Section 3.3.2 for details).
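For convenience, the hyper-parameter values listed above can be collected in a single configuration object; the dictionary keys below are our own naming.

```python
# Hyper-parameter settings reported in Table 2 (key names are illustrative).
HYPER_PARAMS = {
    "exploration_rate":    0.95,   # epsilon
    "buffer_capacity":     200,    # n_BC
    "batch_size":          20,     # n_BS
    "discount_rate":       0.9,    # gamma
    "learning_rate":       1e-3,   # alpha, used by the Adam optimiser
    "target_update_cycle": 100,    # n_TN
}
```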

To study the training process and the effectiveness of each DQN enhancement technology in our model, we introduce and compare the training results of five models: Model 1, Model 2, Model 3, Model 4 and Model 5. The features of each model are summarised in Table 3, where the checkmark (✓) means that the model uses the corresponding DQN enhancement technology and the circle (○) means that it does not.

    Fig.6 Framework of subsequent iteration in multi-iteration mechanism.

    Table 2 Hyper-parameter settings for training algorithm.

    Table 3 Model features.

The training results are shown in Fig.7. The light curves are the actual result data, and the dark ones are the corresponding curves smoothed with a 90% smoothing coefficient for legibility. We can see that the trend of these curves is the same: after a short fluctuation, each rises rapidly and then stabilises and gradually converges. Model 1 and Model 2 grow the fastest and converge around the 800th episode. The other three curves stabilise around the 1600th episode. In terms of performance, according to the average reward, Model 1 is the best and Model 2 is second. Model 3 and Model 4 are nearly identical. Model 5 is significantly inferior to the others. In summary, the training of our neural networks is effective, and the DQN enhancement technologies in our model can effectively improve training efficiency and model performance.

    Fig.7 Training result.

    4.2.Performance test

    4.2.1.Benchmark and test scenario

To verify the optimisation performance of the proposed MARL method, we use CASA as a benchmark.52 CASA is a DCB solver based on a heuristic algorithm, and it is often used in actual ATM operations. It has the advantages of requiring little computing time and being simple to use in the real world. CASA is a general term for a class of methods whose specific algorithms are often fine-tuned to suit specific scenarios. For ease of comparison, in this paper we use the standard CASA introduced in the ATFCM Operations Manual published by EUROCONTROL (Network Manager),53 and our Python 3 implementation of CASA is available on our GitHub.

To test the performance of our proposed method in diverse scenarios different from the training scenarios, we use the sector opening schemes of July 2019 and combine the flights on 2019-08-27 and 2019-08-28 to generate test scenarios, as shown in Fig.8. Three sets of test scenarios are generated for three corresponding tests to verify the generalisation of our method for sector opening schemes, flight schedules and problem scales, respectively. The numbers shown in Fig.8 represent the number of the corresponding element. For example, for Test 2 there are 7 sets of flight schedules and 31 sector opening schemes, which make up a total of 217 scenarios. Besides, we compare our method with two state-of-the-art RL-based DCB methods. The experimental design and results are presented as follows.

    4.2.2.Generalisation for sector opening scheme (Test 1)

    Fig.8 Scenarios generation for performance tests.

Test 1 aims to verify the generalisation of our method for the sector opening scheme. Although commercial flights are repeated for a certain period, the flight schedules of corresponding days will not be exactly the same. Therefore, we randomly select 90% of the flights on 2019-08-27 and 10% of the flights on 2019-08-28 to form a new set of flight schedules, which contains 12182 flights. To differentiate from the training scenarios, we use the 31 sector opening schemes of July 2019 in the test scenarios. The new set of flight schedules and the 31 sector opening schemes form 31 test scenarios (also refer to Fig.8). Two performance indicators are used to reflect the optimisation performance of the method, namely the number of delayed flights and the average delay time. Please note that the average delay time is the mean over all flights in a scenario. The 31 test scenarios are solved by our proposed MARL method and by CASA. The results are shown in Fig.9, where the sector opening scheme ID corresponds to the day in July 2019 (for example, 01 refers to 2019-07-01). Compared with CASA, MARL decreases the average delay time and the number of delayed flights by about 72.5% and 74.6%, respectively. Since the performance of our method, relative to the benchmark, does not vary significantly across the scenarios of Test 1, it can be considered generalised for various sector opening schemes.

    4.2.3.Generalisation for flight schedule (Test 2)

Test 2 aims to verify the generalisation of our method for flight schedules. We generate seven sets of flight schedules by combining the flights on 2019-08-27 and 2019-08-28 in different proportions. We use the rate of difference to represent the proportion of non-same-day flights. Assuming that the flights on 2019-08-27 are taken as the main flights in a test scenario, while the test set of flight schedules consists of x flights on 2019-08-27 and y flights on 2019-08-28, the rate of difference is y/(x + y). We randomly select flights on 2019-08-27 and 2019-08-28 based on seven specific rates of difference and test them with the 31 sector opening schemes of July 2019; there are 217 test scenarios in total (also refer to Fig.8). The results of Test 2 are shown in Fig.10, where C and M in the boxes below the figure represent CASA and MARL, respectively (grey represents CASA and red represents MARL; this representation is also used in Fig.11), and IQR is the interquartile range. Although the average delay time and the number of delayed flights are on average smallest when the rate of difference is 0, the two performance indicators, compared with the results of CASA, show no significant uptrend as the rate of difference increases. In other words, the method does not significantly lose performance as the rate of difference of flights rises. Therefore, it is demonstrated that the proposed method can cope with varying flight combinations.

    4.2.4.Generalisation for problem scale (Test 3)

Test 3 aims to verify the generalisation of our method for problem scale. To obtain several scenarios with different scales, we first select the sector opening scheme of 2019-07-27 and then randomly select a specific number of sectors to form a new sector opening scheme. We generate 6 sets of 10 sector opening schemes in total, with each set's sector opening schemes containing a different number of sectors. The flight schedule of Test 1 is tested with each opening scheme separately, giving 60 scenarios in Test 3 (also refer to Fig.8). The results of Test 3 are shown in Fig.11. In addition to the average delay time and the number of delayed flights, each scenario's computing time and initial condition are also presented for reference. For the 60 scenarios in Test 3, the initial number of hotspots and the initial number of flights increase with the number of sectors, as shown in Fig.11(d). As expected, the average delay time and the number of delayed flights increase as the problem scale increases. The growth trends of the two optimisation indicators for MARL are linear, similar to the growth trend of the problem scale. Therefore, the optimisation performance of our proposed method is similar across scenarios of different problem scales, which demonstrates its generalisation for problem scales. As shown in Fig.11(c), although MARL's computing time is much longer than CASA's in the same scenario, its growth trend is almost linear rather than exponential as the problem scale increases, and the computing time for the scenario with 300 sectors and more than 10000 flights is about 30 s, which is acceptable for a DCB problem of this scale.

4.2.5.Comparison with state-of-the-art RL-based DCB methods

    Fig.9 Experimental results of method generalisation test for sector opening schemes.

    Fig.10 Experimental results of method generalisation test for flight schedules.

    Fig.11 Experimental results of method generalisation test for problem scales.

This test aims to compare the proposed method with state-of-the-art RL-based DCB methods. Because Tang's41 and Huang's42 methods have the potential for generalisation, we compare our method with them in terms of computing time and average delay time. They are trained in the same way as the proposed method. Because the two methods do not consider sector opening schemes, we upgrade them to enable them to deal with DCB problems with sector opening schemes. The comparison is based on ten random scenarios of Test 2, and the experimental results are shown in Fig.12, where MARL represents the proposed method, T represents Tang's method41 and H represents Huang's method.42 Although this test is for comparing RL-based DCB methods, the performance data of CASA are still added to Fig.12 for reference. We can see that the proposed method outperforms the two state-of-the-art RL-based DCB methods. In terms of the average delay time, compared with the proposed method, Tang's method and Huang's method increase it by about 116% and 79%, respectively. The reason may be that their methods do not generalise well enough to the test scenarios. In terms of the computing time, compared with the proposed method, Tang's method and Huang's method increase it by about 43% and 620%, respectively. The delay time allocated by the agent in Huang's method is tiny at each iteration, so a considerable number of iterations are needed to eliminate hotspots, which is significantly time-consuming. The decision-making framework of Tang's method is similar to ours; a longer average delay time means that agents make more actions, which is also time-consuming. This tends to explain why the computing time of Tang's method is longer than ours but very close.

    4.3.Sensitivity analysis

A sensitivity analysis on the coefficient of cooperation (refer to Eq. (20)) is performed to explore its impact on our proposed method. We re-train another four sets of neural networks in the training scenarios (mentioned in Section 3.3) with different coefficients of cooperation. Together with the set of neural networks trained on the base model, there are five sets of neural networks for the training flights; the coefficient of cooperation is set to 0.25, 0.5, 1, 2 and 4, respectively. Then, the five sets of neural networks are deployed on the corresponding flights and tested in the scenarios of Test 1, respectively. The experimental results in terms of the average delay time and the number of delayed flights are shown in Fig.13. From a statistical point of view, as the coefficient of cooperation increases, the average delay time decreases while the number of delayed flights increases. The reason is that the greater the coefficient of cooperation, the lower the proportion of the penalty of holding in the total reward, and the more likely the flight is to promote the achievement of the global optimisation goal even if its own delay time is longer. Thus, in practice, an appropriate cooperation coefficient should be set according to the actual needs of air traffic operations to balance the average delay time and the number of delayed flights.

    Fig.12 Experimental results of the comparison of RL-based DCB methods.

    Fig.13 Experimental results of sensitivity analysis on coefficient of cooperation.

    5.Conclusions

In this paper, we have presented a locally generalised multi-agent reinforcement learning method for demand and capacity balancing with customised neural networks. The DQN-based enhancement technologies (i.e., Double DQN, Duelling DQN and Prioritised Replay Buffer) have been proven to improve the learning efficiency of the agents. We have evaluated the performance of our method using a series of comprehensive experiments based on large-scale, high-complexity, real-world scenarios. By analysing the experimental results, we can draw the following conclusions:

    (1) The proposed method is generalised for sector opening schemes, flight schedules and problem scales within the scope of interest.

    (2) The proposed method outperforms CASA in optimisation performance.

    (3) The proposed method outperforms the state-of-the-art RL-based DCB methods41,42 in computing time and optimisation performance.

    (4) As the cooperation coefficient changes, the trends of the average delay time and the number of delayed flights in the proposed method are negatively correlated.

This work serves as our first step toward reducing the gap between the theory and practice of multi-agent reinforcement learning methods used for demand and capacity balancing. We are fully aware that the situation in practice is far more complicated than that in our experiments. Compared with traditional methods (such as exact solution methods), reinforcement learning methods have the potential to deal with uncertainty because they are inherently based on probability. In the future, uncertainty should be taken into account in RL-based DCB methods to improve their potential for practical application. Furthermore, without compromising the optimisation performance, we will further enhance the generalisation of the technique to reduce the training cost; for example, all agents with the same flight path could share the same neural network.

    Declaration of Competing Interest

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    Acknowledgements

This study was co-funded by the National Natural Science Foundation of China (No. 61903187), the National Key R&D Program of China (No. 2021YFB1600500), the China Scholarship Council (No. 202006830095), the Natural Science Foundation of Jiangsu Province (No. BK20190414) and the Jiangsu Province Postgraduate Innovation Fund (No. KYCX20_0213). The research described in this paper has been partly conducted in the framework of a project that has received funding from the SESAR Joint Undertaking within the framework of the European Union's Horizon 2020 research and innovation programme under grant agreement No. 891965.
