• <tr id="yyy80"></tr>
  • <sup id="yyy80"></sup>
  • <tfoot id="yyy80"><noscript id="yyy80"></noscript></tfoot>
  • 99热精品在线国产_美女午夜性视频免费_国产精品国产高清国产av_av欧美777_自拍偷自拍亚洲精品老妇_亚洲熟女精品中文字幕_www日本黄色视频网_国产精品野战在线观看 ?

    Optimal control strategy for COVID-19 concerning both life and economy based on deep reinforcement learning?

    2021-12-22 06:48:20WeiDeng鄧為GuoyuanQi齊國元andXinchenYu蔚昕晨
    Chinese Physics B 2021年12期
    關(guān)鍵詞:齊國

    Wei Deng(鄧為) Guoyuan Qi(齊國元) and Xinchen Yu(蔚昕晨)

    1Tianjin Key Laboratory of Advanced Technology in Electrical Engineering and Energy,School of Control Science and Engineering,Tiangong University,Tianjin 300387,China

    2School of Mechanical Engineering,Tiangong University,Tianjin 300387,China

    Keywords: COVID-19,SIHR model,deep reinforcement learning,DQN,secondary outbreak,economy

    1. Introduction

    As of April 13, 2021, the number of diagnosed cases of COVID-19 worldwide reached 137 941 696, and at least 2 967 745 individuals have died from this virus since the first report in December 2019.[1]According to the research,[2]the new coronavirus is highly contagious with a relatively low case fatality rate, and has a long asymptomatic infection period. The infected individuals in the incubation period can infect normal people without any symptoms.[3]Therefore,the most effective measure to prevent the rapid spread of COVID-19 is nucleic acid detection, isolation measures and travel tracing.[4]However, extreme blockade measures have disastrous consequences for economy. The quarantine policy may be an effective short-term measure. However, the indefinite quarantine before the vaccine is released and put on the market on a large scale will prevent billions of people in the world from earning income,especially in countries with a more vulnerable economy, leading to an increase in the mortality rate of low-income people,[5]especially children.[6]

    Dynamic and mathematical models that simulated the spread of diseases can guide government policymakers to mitigate the detrimental consequences of the epidemic.[7]Many researchers have analyzed and predicted the spread of the epidemic by adopting the improved Susceptible-Exposed-Infectious-Recovered (SEIR).[8–11]Fanget al.simulated the transmission of COVID-19 and the impact of quarantine measures on the epidemic.[8]Mandalet al. established the Susceptible-Exposed-Quarantined-Infectious-Recovered (SEQIR) model in Ref. [9], and formulated reliable epidemic prevention and control measures through the optimal control methods. Huanget al. studied the consequences of relaxing control measures in Spain.[10]Yuet al. proposed the SIHR model in which the parameters were designed as piecewise functions in lockdown time, and studied the possible secondary outbreaks after India loosened control.[11]Wanget al.proposed a novel epidemic model based on two-layered multiplex networks to explore the influence of positive and negative preventive information on epidemic propagation.[12]Huanget al. proposed a new vaccination update rule on complex network to discuss the role of vaccine efficacy in the vaccination behavior.[13]Ronget al.studied the dependence of model parameters on the basic reproduction number.[14]Cuiet al.studied individuals’effective preventive measures against epidemics through reinforcement learning.[15]Tonget al.adopted agent-based simulation to assess disease-prevention measures during pandemics.[16]Some researchers have also adopted machine learning to predict the COVID-19,but have not considered the epidemic control.[17–19]

    In the literature above and the latest research of COVID-19, economy is not considered in the model of SEIR. Under the economic pressure caused by strict quarantine measures in the epidemic,some countries have pursued a balance between epidemic prevention and control and economic recovery. To accurately predict the spread of COVID-19 and evaluate consequences beyond the epidemic itself,the model must consider how quarantine measures may affect the economy.[20–24]However,to our best knowledge,there has been no model concerning preventing both peoples’lives and economic development that impacts the people’s welfare. We can regard the control of the epidemic and the economy’s development as an optimal control problem.

    Deep reinforcement learning (RL) is a machine learning technique that combines the perception ability of deep learning with the decision-making ability of the RL.Compared with other traditional decision-making optimization algorithms,the RL can realize model-free self-learning of high-dimensional mapping relationships from state to action. The RL is widely used in self-driving, optimal scheduling, path planning and other fields to solve optimal control problems.[25–27]Mnihet al.[28]introduced Deep Q-Network(DQN)that combines the deep neural networks and the RL. The DQN is an effective method of deep RL.Compared with traditional RL,the DQN can effectively improve learning efficiency in situations where the state space is too large or the environment is unknown.The balance between the retraining of the epidemic of Covid-19 and economic development is decision-making and policy optimization.Therefore,choosing the advanced method of the DQN to make an optimal policy is of great value and necessity.At present,most of the research on COVID-19 has mainly been devoted to giving analysis and prediction of the development trend of the pandemic. However, we have not found an optimal strategy for economic development and epidemic prevention and control using deep RL through searching references.

    In this paper, the SIHR model is adopted to simulate the spread of the epidemic, aiming to study the development of COVID-19 at different stages. The contribution and innovation of this paper are as follows.

    (i) An economic model affected by epidemic isolation measures is established. The development of the epidemic can be roughly divided into five stages, according to the government’s response measures and the trend of newly diagnosed cases. The effective reproduction number and the eigenvalues at the equilibrium point are introduced to verify the effectiveness of the model.

    (ii)Based on the deep reinforcement learning method of DQN,the blocking policy to maximize the economy under the premise of controlling the number of infections as much as possible is studied. The abilities of different countries to resist economic risks by adjusting the reward coefficient are simulated. From this,the optimal control policy of different countries is formulated.

    The remainder of this paper is organized as follows. In Section 2, the deep RL based on the DQN is introduced. In Section 3, a training experiment of deep RL based on the SIHR-based compartment model is designed. Section 4 studies the optimal policy in different conditions and adopts the optimal policy at different time points. In Section 5, a summary is made.

    2. Deep reinforcement learning

    Deep RL is a machine learning technique that combines the perception ability of deep learning with the decisionmaking ability of reinforcement learning.[29]Figure 1 shows the general framework of deep RL. Deep neural network obtains target observation information from the environment and provides state information. The RL takes environmental feedback as input and returns a policy that maximizes the timediscounted expected future rewards.

    Fig.1. Deep reinforcement learning framework.

    2.1. Markov decision process

    The government’s policy on COVID-19 can be approximately modeled as a Markov decision process (MDP). In a Markov process,we assume that the government does not fully understand their situation and what measures should be taken next.It only considers its current state and takes action leading to a new state. The MDP usually consists of four parts:O(observation state space),A(set of possible actions),P(transition probabilities),andV(set of value of the reward). At the stateot, the government takes the actionatand transfers from the current state to the next stateot+1with probabilityp. Finally,the government gets a rewardvtfor its action. This process can go on,or it can stop at a terminating state.

    The strategyπrepresents the probability distribution of actionAin each stateO. The goal of RL is to find an optimal economy-life balanced strategyπ?that maximizes the cumulative rewardVπthrough continuous interaction with the environment

    As the system environment changes,the method of calculating cumulative rewards will also be changed. In round tasks such as formulating a policy over a period of time, we usually useT-step of cumulative rewards,

    whereEπ[·]is the expectation under the strategyπ.

    2.2. Calculation of value function

    For a strategy,the value function can predict the cumulative reward that the government policy will obtain based on the current state in the future,which will bring great convenience to RL. For theT-step cumulative reward, given the current stateoand actiona,the state-action value function is the longterm reward expectation generated under the guidance of the strategyπ,which can be defined as

    From this,we can get the Bellman equation

    We can see that the state-action value function can be expressed in a recursive form.

    For all state-action pairs, there is an optimal strategyπ?to obtain the maximum expected return value. The strategyπ?is called the optimal strategy that can balance economic recovery and epidemic prevention and control, and its state-action value function can be defined as

    The Bellman equation changes to

    2.3. Deep Q network algorithm

    When the state space of the environment is vast, or the model is unknown,it is too costly for the government to obtain the value function using state transition functions or tables. It is necessary to approximate the value function through a nonlinear function approximator such as the deep neural network.This nonlinear function approximator can effectively store the experience accumulated by the government in adopting different policies. Equation(7)shows the updating process of theQfunction in table format,

    The DQN algorithm uses a deep neural network to approximate theQfunction,and equation(8)shows the updating process of its value function,

    whereαis the learning rate,andwis the weight of the neural network.

    When training a neural network,we use the mean square error to define the error function

    To get the maximumQvalue, we use the stochastic gradient descent method to update the parameters. We get the optimal strategy based on

    In the DQN training process,parameter selection and evaluation actions based on the same target value network will lead to overestimatingQvalue during the learning process, which will lead to more significant errors in the result. There are two groups of neural networks with different parameters and the same structure in double DQN.The online network is used to select the action corresponding to the maximumQvalue,and the target network is used to evaluate theQvalue of the optimal action. The target formula is as follows:

    Double DQN can separate action selection and strategy evaluation by using two sets of neural networks.In this way,we can estimate theQvalue more accurately and improve the speed of convergence.

    3. System model and scene construction

    3.1. Epidemic model and economic model

    The SIR dynamic model was firstly used for studying the Black Death in 1927.[30]The SIR-liked model has been widely adopted to simulate the spread of various infectious diseases.To simulate the spread of COVID-19 in different stages, we adopt the SIHR model[11]and add the isolation rate related to government quarantine measures.On this basis,we also establish an economic model affected by the quarantine measures.The following assumptions are needed:

    i)The community population is a closed system.

    ii)Everyone in the population is susceptible.

    iii)All the infected individuals enter the hospital for treatment.

    iv)Everyone in the population is not vaccinated.

    v)Ignore the impact of virus mutation on the transmission rate.

    The total populationNis composed of the susceptible individuals(S),the infected individualsI(latent individuals and those capable of spreading the coronavirus), the hospitalized individualsH(diagnosed patients diagnosed by the hospital),the recovered individualsR(immune to the coronavirus) and the dead individualsD. A schematic description of the model is depicted in Fig.2.

    Fig.2. Flow diagram of the dynamic system of COVID-19.

    Some susceptible individualsSwill be infected by contacting the infectiousI(inflowI), and the transmission rateα(t) indicates the possibility of infection per infector transmitting the disease to the susceptible.lrepresents the isolation rate that is mandated by the government and execution of people in the closed region. And the higherl, the lower isolation, andl=0 means the infectious route is completely cut off.Nis the total population andN=S+I+H+R+D.Yet,due to the limited diagnostic resources,only a portion of people could be diagnosed, soβ(t) indicates the probability of diagnosis. After being diagnosed, the patients are almost entirely isolated, so they would not be transmitted to others.The diagnosed infectorsIreceive treatment to reduce because of the recovery rateγ(t)and the mortality rated(t)caused by the disease,and the recovered individuals are not be infected if they have developed an immunity. The cumulative diagnosed cases can be expressed byC(t)=H(t)+R(t)+D(t). The following equations summarize the spread-prevention-infection dynamics model:

    wherel,α,β(t),γ(t),andd(t)respectively represent the rates of isolation,transmission,diagnosis,cure,and death based on the infectious disease model.β(t),γ(t),andd(t)are designed as Sigmoid cumulative functions 1/(1+ek(t?τ))composed ofk,τ,andtin different stages,kis usually positive inβ(t),γ(t),and negative ind(t), which means that theβ(t)andγ(t)will increase astincreases, while thed(t) is just opposite. The parameters setting above was given by Ref.[11].

    In the economic model, populations’ production will be affected by the lockdown measures.Compared with economic indicators such as gross domestic product (GDP), we only consider the wealth created by individuals, not the economic growth brought about by consumption.In our simulation,populations can be divided into two types:those whose productivity is highly damaged by quarantine and those whose productivity is less damaged.The total economic output is the sum of the outputs of all the individuals in the environment minus the medical expenses for treating patients. The individuals who are not isolated have normal productivity,isolated individuals lose a high percentage of their productivity (represent byη),dead individuals have no productivity, and hospitalized individuals have no productivity and pay for treatment. The following economic outputGper capita is proposed:

    whereηandμrepresent the reduced productivity per capita and average treatment expense,respectively.

    3.2. Indictors of controllability and stability of spread

    In terms of the controllability of the epidemic, the basic reproduction number(R0)measures the probability of the disease being transmitted to other populations through naive populations in initial stage (Ronget al., 2020). A real-time indicator in measuring the spread risk and the controllability of the spread is effective reproduction number (Re(t)).[31]In Eq.(12),Re(t)can be expressed as

    where??S(t) and ?C(t) represent the net newly infectious individuals and the net newly diagnosed infections.

    From the perspective of stability of the SIHR model, we solve the equilibrium point of the model(12)as(S?,0,0,R?,D?,C?),whileS?,R?,D?,C?can be any positive numbers less thanNand satisfyN=S?+R?+D?,C?=R?+D?.Under the premise of considering the stability of the epidemic, we can modify the model(12)as

    and assume thatX=(S I H R D). Now equation(15)can be expressed as

    whereBrepresent the 5×5 matrix to the right-hand side of Eq.(15). The characteristics equation ofBat the equilibrium point can be expressed as

    Then we can obtain the following eigenvalues:

    Here we observed thatλ1<0 and we give a specific example to analyze the role ofλ2andRe(t) in the spread of the epidemic. Supposing a closed area has 65 500 000 people, 500 unquarantined virus carriers, 100 diagnosed cases, no deaths and recovered cases in the outbreak stage. Figure 3 shows the simulated results with fixedβ(t)=0.10,α=0.5,and varyingl.

    From Fig. 3, we can observe that the newly diagnosed cases ?C(t) shows a single wave, the correspondingλ2andRe(t) decline. Moreover,λ2>0 andRe(t)>1 indicates that the newly infected cases increase and exceed the newly diagnosed cases, which means that the risk of spread of the pandemic may exists temporarily, and the system (15) will be in divergence. The biggerλ2andRe(t)are,the faster ?C(t)will grow. Conversely,λ2<0 andRe(t)<1 indicate the decline of ?C(t), which means the epidemic is under control and the system will be finally stable. The smallerλ2andRe(t) are,the faster ?C(t) will decline. It is worth noting that in the case ofλ2≡0 andRe(t)≡1,?C(t)will be a constant,which also means the infected individualsIwill not increase further.Therefore,λ2andRe(t) accurately depicts the stability and controllability of the system (15) and pandemic and further prove the effectiveness of the SIHR model. These results also indicates that the spread of the epidemic can be effectively affected by the quarantine measuresl,which is conducive to the establishment of the reward function.

    Fig.3. Simulated results with varying l and fixed β(t)=0.10,α =0.5,(a)the newly diagnosed cases,(b)effective reproduction number,(c)eigenvalues of equilibrium point.

    3.3. Preconditions for RL training

    Due to the constraints of physical conditions, the degree of public cooperation, system time lag and other factors, the following assumption must be considered:

    (I)The government needs to formulate a long-term quarantine policy, after at leastNdays, the government could change the isolation measures.

    (II) The government needs to implement different quarantine measures to deal with the changing situation of the epidemic. The quarantine ratesl1,l2,l3,l4represent the quarantine measures after the gradual unblocking in the state of emergency.

    (III)The system is updated in days. The number of diagnosed cases, deaths and the recovered cases will change with timet,and the smallest unit of timetis a day.

    3.4. Space and reward function

    When selecting statespace parameters, the performance improvement brought by an excellent new state information is significantly higher than that of other work. Similarly, some irrelevant interference information will have a counterproductive effect. The impact of dead individuals and recovered individuals on the epidemic is minimal, so they are not used as a statespace parameter. Statespace parameters include susceptible individualsS, infectious individualsI, hospitalized individualsHand timet. The observation state space is expressed as

    Action space includes isolation rates corresponding to isolation measures of different strengthsl1,l2,l3,l4andl1

    Besides, the isolation ratelrepresents the actionathat the government can perform,which meansa=l.

    The reward function penalizes the increase of the number of diagnosed cases,and also rewards the cumulative economic output.v?(st,at)ensures that the epidemic can be controlled,v+(st,at) ensures the maximization of cumulative gross production value. The reward function is expressed as

    wherev+(st,at)=G,v?(st,at)=?C/N, and?is the reward coefficient, representing the government’s emphasis on the economy.

    3.5. Economy-life optimal algorithm

    In this paper,we propose a short-term economy-life optimal algorithm based on deep RL,and its overall flow is shown in Fig.4.

    I)The original COVID-19 data is divided into several different stages to fit the SIHR model. Then the model is used to provide training data for RL, which can simulate the development of the epidemic under different government policies.The better the model fitted,the more reference value the optimal policy.

    II)The optimal policy derived from RL is mainly affected by the reward function. The reward coefficient?represents the government’s emphasis on the economy. Therefore, the optimal strategy for different countries can be formulated by adjusting the?.

    Fig.4. Algorithm flow chart.

    4. Experimental results and analysis

    It is noted that most countries are still suffering from the epidemic. The COVID-19 is far from over until the vaccine is successfully developed and put on the market on a large scale. Therefore, it is significant to adopt deep RL to study the economic-epidemic balance policies of different countries.According to the government’s response measures and the trend of newly diagnosed cases,the COVID-19 can be roughly divided into five stages:

    Stage I Outbreak stage. At the beginning of the epidemic, the government ignored the severity of the epidemic.The number of newly diagnoses has increased rapidly.

    Stage II Lockdown stage. The government implemented a strict isolation policy. The number of newly diagnoses peaked and began to decline.

    Stage III Gradually unblocking stage. The number of newly diagnoses has further decreased. The government began to unblock the city to recover the economy gradually.

    Stage IV Buffer stage. During this stage,the number of infections remained at a low level. But there is still a risk of an outbreak.

    Stage V Second or third outbreak stage. The number of newly diagnoses increased again after reaching the bottom,and the epidemic broke out again.

    The stage of the epidemic in different countries is shown in Fig.5. As shown in Fig.5,China and Iceland have entered the buffer stage early, and there has been no secondary outbreak. After entering the controllable stage of the epidemic,most European countries experienced a second outbreak.

    Fig.5. Stage of COVID-19 in different countries.

    Table 1. Fitting results of parameters.

    Fig. 6. Fitting curve and reported data, (a) cumulative confirmed cases, (b) newly diagnosed cases, (c) cumulative cured cases, and (d)cumulative dead cases.

    We notice that the Italian data is very representative.Therefore, we use data from different stages in Italy as the training data for the RL. Here, we fit the parameters ofl,α,β(t),γ(t), andd(t) by using the least square functionsfminconandlsqnonlinof Matlab.[14]The Italian government began to vaccinate the people on December 27,the number of vaccinated people(2 doses)reached 4 055 458(6.8%of the population)by April 13.[32]To avoid the influence of the vaccinated individual,twenty sets of data from February 22 to November 10 in Italy are used to fit the model. Figure 6 shows the fitting curve and the reported data. The model-based parameters by fitting the reported data are shown in Table 1.

    From Fig.6,we can see that the development of the epidemic in Italy can also be roughly divided into the above five stages. The fitting results are excellent, and the curve fits the real data. As shown in Table 1,the transmission rateαis usually fixed in different stages of an epidemic,and only changes during the second or third outbreak stage.The quarantine ratelrepresents the intensity of the government’s policy in response to the epidemic.lis different at each stage, but in a round of the epidemic,it first declines and then rises.This phenomenon shows that the government always locks down cities when the epidemic is severe and releases the lockdown to restore the economy after the epidemic eases. The diagnosis rateβ(t)and the cure rateγ(t) increase over time, while the mortality rate is the opposite. It is noted that in the second round of the epidemic,althoughlis nearly unchanged andαis significantly lower than the previous stage,a second outbreak still occurred.This phenomenon is due to the relaxation of vigilance by the government and the public during the second outbreak stage,resulting in a significant decrease in the diagnosis rate compared to the previous stage. Hidden virus carriers were not isolated,which led to a second outbreak.

    We adopted the coefficient of determinationR2to evaluate the goodness of the fitting results,[11]and the closer theR2is to 1, the better the fitting results. TheR2can be expressed as

    whereyi, ?yi,and ˉyirepresent the value of reported data,average value of reported data, and the fitted value in Italy from February 22 to November 10. Table 2 shows theR2of the cumulative diagnosed cases,daily diagnosed cases,recovered cases, and dead cases. The mean ofR2at different stages reached more than 0.84 and most value ofR2reached more than 0.9 or even 0.99. These results indicate that our model can fit the real data well, which is conducive to the training process of deep reinforcement learning and come up with an effective scheme.

    Table 2. Goodness of fitting results.

    4.1. Control strategy during outbreak stage

    On the premise of controlling the spread of the epidemic,recovering the economy as much as possible has become a concern for governments of many countries. We take the first day of the Italian government’s lockdown (March 10) as the starting point, 90 days later as a round, and assume that the government can take new quarantine measures at least 20 days after. Based on the TensorFlow framework,a fully connected neural network with a 3-layer network structure as theQ-value network of DQN has been designed. The input layer is a 5-dimensional feature tensor, including susceptible individualsS, infectious individualsI, hospitalized individualsH, timetand actiona. The hidden layer has five layers of the network,with each layer of the network having 20 neuron nodes. The output layer is a 4-dimensional tensor,which represents theQvalue of different actions (l1,l2,l3,l4). The memory buffer capacity is 10 000,and the random batch size is 64.

    We use the e-greedy strategy to train the agent,[22]which helps the government obtain a better strategy. The agent randomly explores actionsain the initial stage,and gets the corresponding reward valuevafter performing the actionato update theQvalue of Eq.(3). As the training progresses,it gradually replaces random exploration with network predictions.The agent selects the actiona=lwith the maximumQvalue of Eq.(6)in the output layer of the neural network and sends it to the SIHR model of Eq.(12)as the isolation rate at the next moment.

    Figure 7 shows the agent’s performance after 6500 episodes of training(90 days after the initial date is episode).In each episode,the agent made 90 action choices and updated the parameters in the neural network. The abscissa represents the number of training episodes, and the ordinate represents the rewards obtained by the agent in each episode. The result shows that as the number of training rounds increases,the agent gets convergent and steady rewards,which indicates that the agent already has some intelligent features.

    Figure 8 shows the optimal control strategy and epidemic development trend obtained by the agent after 20 000 episodes of training. We provide four isolation rates, as shown in Fig. 8(a), corresponding to the government’s isolation measures in different training periods. The agent decides to select which isolation rate according to the training. Consequently,figures 8(b), 8(c), and 8(d) show the newly diagnosed cases,the total diagnosed cases,and the cumulative dead cases after the government took quarantine measures using the deep RL.

    Fig. 8. Impact of government’s control after March 11 on, (a) isolation rate based on isolation measure, (b) newly diagnosed cases, (c)cumulative diagnosed cases,and(d)economic output compared to the pre-epidemic period.

    Fig.7. Training process.

    Figure 8(a)shows the optimal selection using the deep RL training. From Figs.8(a)and 8(b),in the outbreak stage when the newly diagnosed cases are increasing,the strategy given by the agent tends to adopt the most stringent isolation measures in the early stage of the epidemic becausel1that is the least number in the early stage is taken. After the epidemic is basically controlled,the agent recommends gradually lifting the lockdown measures to recover the economy.In the unblocking process,the isolation rate rises froml1tol2,then skipsl3and directly rises tol4.The rate of decrease in the number of newly diagnosed patients slowed down,but after the second release,the number of newly diagnosed people rose slightly.However,as the government stepped up the virus detection measures,the number of newly diagnosed people continued decreasing. In Fig.8(d),we can see that as the government gradually relaxes the isolation measures, the economic growth rate has also increased.

    These results indicate that the government should immediately adopt the most severe isolation measures in response to the rapidly spreading epidemic. In the process of gradual unblocking,the time and degree of unblocking not only affect the speed of economic recovery, but also determine whether there will be a second outbreak in the future. After accumulating experience through thousands of training episodes, the RL can formulate effective prevention and control strategies for the epidemic.

    4.2. Control strategy in different situations in outbreak stage

    Considering the differences in the industrial structure and economic risk resistance of different countries,too strict isolation measures may bring greater risks to economically vulnerable countries. Therefore, the epidemic prevention and control policy should be combined with the conditions of different countries.

    What is directly related to the government’s concern for the economy is the reward coefficient?in the reward function.The reward coefficient will affect the weight of the economy in the reward function—the smaller the?,and the more economical the policy. We take different reward coefficients?1,?2,?3(?1

    Figure 9(a) compares the isolation rate corresponding to the optimal control scheme under different parameters?. The smaller?is, the more emphasis is on recovering the economy,and the earlier the first unblocking and gradual unblocking. And in the case of?=?3, the degree of unblocking is more conservative. Figures 9(b)and 9(c)show the trend of the newly diagnoses and total diagnoses under different strategies.Compared with the reward coefficient?3, the final cumulative diagnosed cases of?1,?2were increased by 107.47%and 6.67%, respectively, and the cumulative dead case increased by 9.59%and 0.65%,respectively. As?decreases,the isolation measures become more relaxed,leading to that the newly diagnosed case and the cumulative diagnosed case increase.And a second outbreak occurred for?=?1,which is the consequence of striving to recover the economy in the short term.Figure 9(d)shows the trend of economic output under different strategies. Compared with the reward coefficient?3, the final economic output of?1,?2were increased by 8.17%and 18.95%,respectively. With the decrease of?,the average isolation rate decreases,which means more people are engaged in production activities, and the cumulative gross product value increases. Table 3 compares the specific data. Compared with the?2,the final economic output of?1is not much higher,and it pays a huge price with the much higher hospitalized people and deaths. Part of the reason is that the second outbreak has led to more diagnosed cases and medical expenses.

    Fig.9. Impact of government’s control after March 11 in different ?,(a)isolation rate based on isolation measure,(b)newly diagnosed cases,(c)cumulative diagnosed cases,and(d)economic output compared to the pre-epidemic period.

    Table 3. Comparison of data of different reward coefficients on March 11.

    These results show that based on different reward coefficients?, the epidemic control strategies given by the agent after training are also different. The smaller the?,the weaker the country’s ability to resist risks in the economy. The economy in short term will be more considered when formulating policies. The time for unblocking will come earlier and the intensity of unblocking will be greater, which will lead to an increase in the diagnosed cases and even a second outbreak.However, economically biased policies can only reduce economic losses in the short term. In the long term, looser policies will lead to more diagnosed cases and deaths,and a higher probability of recurrence will lead to a longer duration of the epidemic,which will delay the economic recovery.

    The above policies have one thing in common: the government implemented lockdown measures in the early stages of the outbreak to avoid significant medical expenses and deaths caused by the increasing diagnosed cases. We assume that the government did not lock down the city to maintain the economy and only adopted minimal quarantine measuresl2within 90 days after March 11. Figure 10 compares the economic growth curve of this policy and the optimal strategy recommended by the agent. It can be seen from the figure that although the adoption of loose quarantine measures can achieve rapid economic growth in the short term,as the number of diagnosed cases and deaths further increases, medical expenditures increase. The growth rate of the economy slowed down and reached an inflection point on April 24,which means that most of the population in the environment has been diagnosed and hospitalized without considering the carrying capacity of the medical system. They were unable to work,and the medical expenses exceeded the economic output of the whole society,and the economy began to grow negatively. After the epidemic was basically controlled,the economy of negative policy began to grow again,but the speed was significantly lower than the optimal strategy. Compared with the optimal policy given by the deep RL, the economic output decreased by 37.8%under the negative policy that the government adopted minimal quarantine measurel2.

    Fig.10. Economic output curve under different control strategies.

    The above results indicate that whether it is from the perspective of ensuring economic growth or controlling the spread of the epidemic,the strictest isolation measures should be taken during the outbreak stage when the newly diagnosed cases increasing rapidly. When the epidemic is under control,gradual unblocking will help recover the economy.

    4.3. Public policy in different time points of the second outbreak stage

    Due to the economic pressure caused by the long-term lockdown, European countries have gradually unblocked the city after the epidemic was basically under control. However,there have still been some virus carriers in the environment.The epidemic is far from over until the vaccine is successfully developed and put on the market on a large scale. As time passed,the newly diagnosed cases in most European countries,including Italy,began to rebound,and the epidemic entered the second outbreak stage or even the third outbreak stage.

    The conclusions we got in the first outbreak stage are still applicable to the second or third outbreak stage. Specific government policies can be given after the RL training. In this section, we have set September 26, October 6, and October 16 as the starting date for the government to adopt isolation measures to study the impact of the control strategy on the epidemic and economy on the different dates of the second outbreak stage. From Fig. 11, although the control strategy on different dates has little effect on the epidemic’s duration,the sooner control measures are taken, the fewer cumulative diagnosed cases and cumulative dead cases,and the higher the total economic output. Table 4 compares the specific data.

    In Fig. 11(a), after the government implemented lockdown measures, the number of cumulative diagnosed cases began to slow down and eventually stabilized. Besides, in Figs.11(a)and 11(b),compared with the date of lockdown on September 26,the final cumulative diagnosed cases of October 6,and October 16 were increased by 17.54%and 64.37%,respectively, and the cumulative dead case increased by 4.94%and 15.81% respectively. According to Fig. 11(c), the later the government takes lockdown measures,the greater the economic loss,even if the government can obtain more economic growth in the early stage. The reason for this phenomenon is that the later the government lockdown the country, the more infections and hospitalizations in the environment, and the time for unblocking will be later,which will lead to more significant economic losses.

    The results indicate that if the government can take effective prevention and control measures in time during the second explosion, it can effectively reduce the number of people infected with the epidemic and ensure continued economic growth. Although the policy of balancing economy and epidemic has controlled the spread of the epidemic,the virus carriers in the population have not completely disappeared. If the government relaxes inspections or the people’s awareness of epidemic prevention declines, the epidemic may break out again. Therefore, the government should strengthen personal nucleic acid testing and establish the case tracing mechanism to increase the diagnosis rate.

    Fig. 11. Impact of the same reward coefficient on: (a) cumulative diagnosed cases, (b) cumulative dead cases, and (c) economic output compared to the pre-epidemic period.

    Table 4. Comparison of data after adopting optimal policy at different dates.

    5. Conclusion

    At present, the global COVID-19 epidemic is still severe. More and more countries have experienced second or even third outbreaks. The epidemic is far from over until the vaccine is successfully developed and put on the market on a large scale. Under the premise of controlling the spread of the epidemic, how to ensure economic development as much as possible has become a major problem considered by many countries.In the above research,we improved the SIHR model to simulate the spread of COVID-19 in Italy at different stages and the determination coefficientR2is used to evaluate the goodness of the fitting results. On this basis, we established an economic model affected by the quarantine measures. We used the effective regeneration number and the eigenvalues at the equilibrium point of the model as indicators of controllability and stability of model.We adopted the DQN-based deep reinforcement learning method and introduced the cumulative diagnoses and cumulative gross production value into the reward function as rewards and punishments. After adequate training, an economy-life balanced policy at different stages of the epidemic was formulated.

    The research results show that our model and scheme are effective,to control the spread of the epidemic effectively,the government should adopt the most stringent blockade measuresl1during the outbreak stage,and the timetfor unblocking should be determined by the country’s ability to resist economic risks. These results also suggest that optimal policies may differ in various countries dependent on the level of disease spread and anti-economic risk ability?. For example,in countries with more vulnerable economies and a lower transmission rateα, the consequences of the disease may be less than those of other countries. In contrast, the consequences of blockade policies may cause an economic crisis which will lead many people to be unemployed and difficult to live.In the second outbreak stage,the sooner the lockdown measures are taken, the smaller the losses caused by the epidemic will be.Although the economic outputGwill suffer in the short term,it will benefit the long term.

    The research is not only applicable to Italy,but also provides references for other countries to formulate policies.Similarly,deep reinforcement learning can also be applied to different models. When the model is closer to the real world,the optimal strategy given by deep reinforcement learning will be more accurate.

    Data availability statement

    The data that supports the findings of this study are available within the article[and its supplementary material].

    猜你喜歡
    齊國
    Modeling and dynamics of double Hindmarsh–Rose neuron with memristor-based magnetic coupling and time delay?
    蝸牛的故事
    老馬識(shí)途
    遠(yuǎn)水救不了近火
    遠(yuǎn)水救不了近火
    鄒忌比美
    奢華萬乘國 齊地瑪瑙紅——齊國瑪瑙器藝術(shù)欣賞
    齊國強(qiáng) 作品
    秉筆直書
    略論古齊國的治國之道
    嫩草影视91久久| 国产一卡二卡三卡精品 | 午夜老司机福利片| 亚洲一区二区三区欧美精品| www.av在线官网国产| h视频一区二区三区| 蜜桃国产av成人99| 成年人午夜在线观看视频| av电影中文网址| 婷婷色综合www| 青春草亚洲视频在线观看| 黄色视频不卡| 黑人巨大精品欧美一区二区蜜桃| 伊人久久大香线蕉亚洲五| 国产精品久久久久久精品电影小说| 2021少妇久久久久久久久久久| 久久久久精品性色| 一级黄片播放器| 最近的中文字幕免费完整| 天堂俺去俺来也www色官网| 欧美 日韩 精品 国产| 乱人伦中国视频| 亚洲国产毛片av蜜桃av| 秋霞伦理黄片| 免费在线观看完整版高清| 赤兔流量卡办理| 成人亚洲精品一区在线观看| 大香蕉久久网| 欧美日韩综合久久久久久| 国产伦人伦偷精品视频| 叶爱在线成人免费视频播放| 看十八女毛片水多多多| 人体艺术视频欧美日本| 99精国产麻豆久久婷婷| 亚洲成色77777| 国产片内射在线| 啦啦啦在线观看免费高清www| 极品人妻少妇av视频| 精品国产一区二区久久| 又粗又硬又长又爽又黄的视频| 久久久久久久久免费视频了| 两个人看的免费小视频| 综合色丁香网| 老司机亚洲免费影院| 天堂俺去俺来也www色官网| 国产深夜福利视频在线观看| 久久毛片免费看一区二区三区| 十八禁网站网址无遮挡| 国产精品av久久久久免费| 高清黄色对白视频在线免费看| 国产成人91sexporn| 操美女的视频在线观看| 精品一区在线观看国产| 亚洲 欧美一区二区三区| 国产免费视频播放在线视频| 在现免费观看毛片| 少妇人妻精品综合一区二区| 精品人妻一区二区三区麻豆| 国产伦理片在线播放av一区| 国产精品麻豆人妻色哟哟久久| videos熟女内射| 捣出白浆h1v1| 亚洲成色77777| 伦理电影大哥的女人| 精品免费久久久久久久清纯 | 日本爱情动作片www.在线观看| 久久久久网色| 韩国av在线不卡| 亚洲精品aⅴ在线观看| 国产精品.久久久| 90打野战视频偷拍视频| av网站免费在线观看视频| 69精品国产乱码久久久| 少妇人妻久久综合中文| 欧美精品av麻豆av| 国产片内射在线| 国产精品免费大片| 国产男女内射视频| 观看美女的网站| 亚洲自偷自拍图片 自拍| 免费黄色在线免费观看| 黄色 视频免费看| 捣出白浆h1v1| 精品一区二区三区四区五区乱码 | 一边亲一边摸免费视频| 久久人人爽人人片av| 青春草视频在线免费观看| 国产一区有黄有色的免费视频| 香蕉丝袜av| 18禁裸乳无遮挡动漫免费视频| 2021少妇久久久久久久久久久| 国产色婷婷99| 一区二区三区精品91| 国产成人av激情在线播放| 婷婷成人精品国产| 大陆偷拍与自拍| 成年女人毛片免费观看观看9 | 亚洲熟女精品中文字幕| 最黄视频免费看| 亚洲精品av麻豆狂野| 亚洲精品中文字幕在线视频| 国产高清不卡午夜福利| 日韩av免费高清视频| 最近2019中文字幕mv第一页| 亚洲第一青青草原| 精品一区二区三区四区五区乱码 | 日韩欧美精品免费久久| 一区福利在线观看| 波多野结衣一区麻豆| 一区二区三区四区激情视频| 日本91视频免费播放| 国产成人精品无人区| 黑人猛操日本美女一级片| 国产乱来视频区| 亚洲精品一二三| 中文欧美无线码| 校园人妻丝袜中文字幕| 欧美av亚洲av综合av国产av | 国产乱来视频区| 国产一区二区三区综合在线观看| 桃花免费在线播放| 一级a爱视频在线免费观看| 国产一卡二卡三卡精品 | 在线免费观看不下载黄p国产| 久久久欧美国产精品| 成人手机av| 免费观看性生交大片5| 熟妇人妻不卡中文字幕| 亚洲人成77777在线视频| 国产精品亚洲av一区麻豆 | 国产精品一国产av| 国产在视频线精品| 欧美变态另类bdsm刘玥| 99香蕉大伊视频| 午夜91福利影院| 男人添女人高潮全过程视频| 国产精品一国产av| 一本久久精品| 国产成人午夜福利电影在线观看| 亚洲精品aⅴ在线观看| 高清av免费在线| 韩国av在线不卡| 美女高潮到喷水免费观看| 桃花免费在线播放| 精品国产超薄肉色丝袜足j| 日韩熟女老妇一区二区性免费视频| 亚洲免费av在线视频| 亚洲精品日本国产第一区| 亚洲伊人色综图| 伊人久久国产一区二区| 又大又黄又爽视频免费| 亚洲av男天堂| 综合色丁香网| 久久久久国产精品人妻一区二区| 乱人伦中国视频| 熟女av电影| 亚洲精品aⅴ在线观看| 亚洲精品乱久久久久久| 久久 成人 亚洲| 久热爱精品视频在线9| videos熟女内射| 国产精品.久久久| 欧美精品一区二区免费开放| 国产男女内射视频| 丰满少妇做爰视频| 免费黄网站久久成人精品| kizo精华| 只有这里有精品99| 无限看片的www在线观看| 久久久欧美国产精品| bbb黄色大片| 老司机亚洲免费影院| 人妻一区二区av| 色婷婷av一区二区三区视频| 亚洲熟女毛片儿| 日韩熟女老妇一区二区性免费视频| 亚洲综合色网址| 热re99久久精品国产66热6| 老司机影院毛片| 夫妻午夜视频| 国产在线一区二区三区精| 男女下面插进去视频免费观看| 人人妻人人添人人爽欧美一区卜| avwww免费| 超色免费av| 国产成人a∨麻豆精品| 国产精品久久久av美女十八| 国产极品粉嫩免费观看在线| 中文乱码字字幕精品一区二区三区| 欧美激情 高清一区二区三区| 亚洲欧洲国产日韩| 亚洲精品在线美女| av网站免费在线观看视频| 一区二区三区四区激情视频| kizo精华| 少妇的丰满在线观看| 一区二区三区激情视频| 交换朋友夫妻互换小说| 欧美乱码精品一区二区三区| 99国产精品免费福利视频| 亚洲免费av在线视频| 国产亚洲av高清不卡| videosex国产| 国产激情久久老熟女| 晚上一个人看的免费电影| 久久人妻熟女aⅴ| 青青草视频在线视频观看| 精品人妻一区二区三区麻豆| 波多野结衣一区麻豆| 老司机亚洲免费影院| 欧美日本中文国产一区发布| 丰满乱子伦码专区| 97在线人人人人妻| 欧美日韩福利视频一区二区| 色婷婷久久久亚洲欧美| 大香蕉久久网| 国产男女内射视频| 国产亚洲精品第一综合不卡| 亚洲一卡2卡3卡4卡5卡精品中文| 一级黄片播放器| 国产高清不卡午夜福利| 国精品久久久久久国模美| 美女中出高潮动态图| 亚洲一卡2卡3卡4卡5卡精品中文| 亚洲欧美激情在线| 国产免费福利视频在线观看| 亚洲,一卡二卡三卡| 青春草视频在线免费观看| 哪个播放器可以免费观看大片| 日韩免费高清中文字幕av| 中文字幕人妻丝袜一区二区 | 中文字幕亚洲精品专区| 久久久亚洲精品成人影院| av国产精品久久久久影院| 欧美97在线视频| 亚洲熟女精品中文字幕| 丝袜脚勾引网站| 欧美日本中文国产一区发布| 国产片内射在线| 精品亚洲成a人片在线观看| 男女下面插进去视频免费观看| 午夜福利视频精品| 欧美人与善性xxx| 日韩大片免费观看网站| 嫩草影视91久久| 丝袜人妻中文字幕| 久久精品亚洲熟妇少妇任你| 在线天堂中文资源库| 18禁裸乳无遮挡动漫免费视频| 在线 av 中文字幕| 国产一卡二卡三卡精品 | videos熟女内射| 久久久久久久久久久免费av| 丝袜人妻中文字幕| 亚洲精品国产色婷婷电影| 五月天丁香电影| 国产又爽黄色视频| 大话2 男鬼变身卡| 国产精品人妻久久久影院| av电影中文网址| av在线app专区| 国产男女超爽视频在线观看| 国产一区二区在线观看av| netflix在线观看网站| 国产爽快片一区二区三区| 国产一区二区三区综合在线观看| 99香蕉大伊视频| 毛片一级片免费看久久久久| 日日啪夜夜爽| 狠狠婷婷综合久久久久久88av| 亚洲国产欧美在线一区| 亚洲婷婷狠狠爱综合网| 伊人久久大香线蕉亚洲五| 日韩熟女老妇一区二区性免费视频| 久久这里只有精品19| 人妻人人澡人人爽人人| 精品亚洲乱码少妇综合久久| 国产熟女午夜一区二区三区| 亚洲五月色婷婷综合| 满18在线观看网站| 亚洲国产最新在线播放| 亚洲成色77777| 观看美女的网站| 青春草国产在线视频| 叶爱在线成人免费视频播放| 色视频在线一区二区三区| 久久国产精品男人的天堂亚洲| 欧美日韩一级在线毛片| 国产亚洲一区二区精品| 欧美亚洲 丝袜 人妻 在线| 免费日韩欧美在线观看| 国产av码专区亚洲av| 久久久久久久精品精品| 久久久久久人妻| 国产高清不卡午夜福利| 中文欧美无线码| av视频免费观看在线观看| 免费观看人在逋| 日韩大片免费观看网站| 免费日韩欧美在线观看| 午夜福利视频在线观看免费| 免费高清在线观看日韩| 菩萨蛮人人尽说江南好唐韦庄| 亚洲欧美一区二区三区久久| 亚洲欧美色中文字幕在线| 在线看a的网站| 国产人伦9x9x在线观看| 婷婷色av中文字幕| 久久鲁丝午夜福利片| 亚洲美女黄色视频免费看| av国产精品久久久久影院| 嫩草影视91久久| 亚洲色图 男人天堂 中文字幕| 国产精品免费视频内射| 永久免费av网站大全| 丝袜美足系列| 黑丝袜美女国产一区| 亚洲精品av麻豆狂野| 亚洲,欧美,日韩| 欧美精品av麻豆av| 欧美老熟妇乱子伦牲交| 国产一区二区 视频在线| 精品一品国产午夜福利视频| 美女扒开内裤让男人捅视频| 美女中出高潮动态图| 狂野欧美激情性xxxx| 涩涩av久久男人的天堂| 操美女的视频在线观看| 久久久国产欧美日韩av| 色婷婷av一区二区三区视频| 日本91视频免费播放| 色婷婷av一区二区三区视频| 91成人精品电影| 国产欧美日韩综合在线一区二区| 婷婷成人精品国产| 久久性视频一级片| 美女视频免费永久观看网站| 别揉我奶头~嗯~啊~动态视频 | 黄色怎么调成土黄色| 免费在线观看完整版高清| 成人影院久久| 黄片小视频在线播放| www.熟女人妻精品国产| 考比视频在线观看| 欧美精品av麻豆av| 国产毛片在线视频| 我的亚洲天堂| 久久久久精品人妻al黑| av国产精品久久久久影院| 99国产精品免费福利视频| 亚洲情色 制服丝袜| 宅男免费午夜| 欧美少妇被猛烈插入视频| 久久狼人影院| 久久久精品94久久精品| 国产午夜精品一二区理论片| 制服人妻中文乱码| 亚洲久久久国产精品| 国产日韩一区二区三区精品不卡| 国产精品二区激情视频| 一区二区三区四区激情视频| 亚洲成色77777| 热re99久久国产66热| 色94色欧美一区二区| av卡一久久| 看十八女毛片水多多多| 人妻人人澡人人爽人人| 亚洲欧美一区二区三区黑人| www日本在线高清视频| 天堂中文最新版在线下载| 2021少妇久久久久久久久久久| 免费女性裸体啪啪无遮挡网站| 午夜福利影视在线免费观看| 高清黄色对白视频在线免费看| 在线 av 中文字幕| 亚洲激情五月婷婷啪啪| 韩国av在线不卡| 久久99精品国语久久久| 婷婷成人精品国产| 视频区图区小说| www.自偷自拍.com| 国产97色在线日韩免费| 国产免费又黄又爽又色| 久久久精品94久久精品| 亚洲欧美成人精品一区二区| 看免费av毛片| 伦理电影大哥的女人| 亚洲婷婷狠狠爱综合网| 成年美女黄网站色视频大全免费| 欧美日韩av久久| a级毛片在线看网站| 精品少妇一区二区三区视频日本电影 | 黄色一级大片看看| 精品少妇内射三级| 香蕉丝袜av| 国产在视频线精品| 亚洲av欧美aⅴ国产| 欧美日韩av久久| 1024视频免费在线观看| 婷婷成人精品国产| 日韩制服骚丝袜av| 久久国产精品男人的天堂亚洲| 一边亲一边摸免费视频| 秋霞伦理黄片| 成人18禁高潮啪啪吃奶动态图| 亚洲,欧美,日韩| 亚洲精品av麻豆狂野| 国产成人精品在线电影| 中文字幕精品免费在线观看视频| 国产成人精品久久二区二区91 | 欧美日本中文国产一区发布| 久久97久久精品| 国产成人精品福利久久| 亚洲成人手机| 99国产综合亚洲精品| 精品卡一卡二卡四卡免费| 99精国产麻豆久久婷婷| 国产精品嫩草影院av在线观看| 久久久久久久久久久久大奶| 久久人人爽人人片av| 91国产中文字幕| 国产精品国产三级国产专区5o| 久久人妻熟女aⅴ| 国产熟女午夜一区二区三区| 少妇的丰满在线观看| 飞空精品影院首页| 亚洲欧美一区二区三区黑人| 国产av国产精品国产| 亚洲视频免费观看视频| 免费不卡黄色视频| 亚洲四区av| 精品国产一区二区三区四区第35| 汤姆久久久久久久影院中文字幕| 亚洲精品久久午夜乱码| tube8黄色片| 观看av在线不卡| 我要看黄色一级片免费的| 中文字幕人妻丝袜一区二区 | 老司机影院成人| 午夜免费男女啪啪视频观看| 久久国产精品大桥未久av| 午夜久久久在线观看| 国产日韩欧美在线精品| 人人妻人人澡人人看| 女人爽到高潮嗷嗷叫在线视频| 熟女av电影| 丰满迷人的少妇在线观看| 欧美变态另类bdsm刘玥| 大码成人一级视频| 超碰97精品在线观看| 纵有疾风起免费观看全集完整版| 日韩欧美精品免费久久| 亚洲av日韩精品久久久久久密 | 不卡视频在线观看欧美| 两个人看的免费小视频| 18在线观看网站| 中文字幕av电影在线播放| 岛国毛片在线播放| av一本久久久久| 天天影视国产精品| 狂野欧美激情性xxxx| 青春草国产在线视频| 日韩av不卡免费在线播放| 成年人午夜在线观看视频| 亚洲av日韩在线播放| 热99国产精品久久久久久7| 国产精品二区激情视频| 亚洲 欧美一区二区三区| 国产在线免费精品| 国产99久久九九免费精品| av.在线天堂| 日本vs欧美在线观看视频| 啦啦啦啦在线视频资源| 只有这里有精品99| 免费看av在线观看网站| 国产熟女午夜一区二区三区| 国产精品久久久久久精品古装| 国产黄色免费在线视频| 九色亚洲精品在线播放| 国产精品 国内视频| a级毛片黄视频| 男女之事视频高清在线观看 | 精品人妻一区二区三区麻豆| 麻豆av在线久日| 亚洲欧美一区二区三区国产| 亚洲七黄色美女视频| 最近中文字幕高清免费大全6| 国产又色又爽无遮挡免| 欧美日韩亚洲国产一区二区在线观看 | 一本久久精品| 岛国毛片在线播放| av在线播放精品| 成人免费观看视频高清| 美女福利国产在线| 看免费成人av毛片| 国产1区2区3区精品| 岛国毛片在线播放| 丝袜美足系列| 免费高清在线观看日韩| 欧美乱码精品一区二区三区| 天堂中文最新版在线下载| 亚洲精品日本国产第一区| 国产成人精品无人区| 少妇 在线观看| 国产精品偷伦视频观看了| 免费黄色在线免费观看| 最新的欧美精品一区二区| 黄网站色视频无遮挡免费观看| 青草久久国产| 欧美成人精品欧美一级黄| 日韩大码丰满熟妇| 亚洲三区欧美一区| 香蕉国产在线看| 一区二区三区激情视频| 免费观看av网站的网址| 2018国产大陆天天弄谢| 精品一品国产午夜福利视频| 国产精品欧美亚洲77777| av视频免费观看在线观看| 亚洲免费av在线视频| 无遮挡黄片免费观看| 亚洲欧洲国产日韩| 在线观看人妻少妇| 一级毛片 在线播放| 蜜桃国产av成人99| 波多野结衣一区麻豆| 免费看av在线观看网站| 青草久久国产| 国产精品久久久久久久久免| 国产欧美日韩综合在线一区二区| www.自偷自拍.com| 少妇人妻久久综合中文| 久久久久久久国产电影| 菩萨蛮人人尽说江南好唐韦庄| 国产乱人偷精品视频| 人人妻人人澡人人看| 国产日韩欧美亚洲二区| 精品久久蜜臀av无| 在线观看免费视频网站a站| 国产人伦9x9x在线观看| 国产成人一区二区在线| 人人妻人人添人人爽欧美一区卜| 久久久久久久久久久免费av| 国产精品久久久久成人av| 人成视频在线观看免费观看| 一级黄片播放器| 在线观看免费日韩欧美大片| 亚洲欧美激情在线| 青草久久国产| 欧美成人精品欧美一级黄| 操出白浆在线播放| 老熟女久久久| 18禁裸乳无遮挡动漫免费视频| 日韩 欧美 亚洲 中文字幕| 免费av中文字幕在线| 欧美老熟妇乱子伦牲交| 天美传媒精品一区二区| 女人久久www免费人成看片| 午夜91福利影院| 久久精品国产亚洲av高清一级| 一边摸一边抽搐一进一出视频| 亚洲激情五月婷婷啪啪| 青春草国产在线视频| 观看av在线不卡| 伊人久久国产一区二区| 国产精品人妻久久久影院| 成人午夜精彩视频在线观看| 丰满迷人的少妇在线观看| www.av在线官网国产| 精品少妇内射三级| 亚洲欧美精品自产自拍| 午夜激情av网站| 香蕉国产在线看| 久久久久久人妻| 最新在线观看一区二区三区 | 亚洲精品国产av成人精品| 制服丝袜香蕉在线| 九色亚洲精品在线播放| 午夜日本视频在线| 国产精品久久久久久久久免| 少妇人妻久久综合中文| 人妻 亚洲 视频| 综合色丁香网| 黑丝袜美女国产一区| 9热在线视频观看99| 久久性视频一级片| 人人妻,人人澡人人爽秒播 | 色精品久久人妻99蜜桃| 国产亚洲一区二区精品| 午夜老司机福利片| 在线观看一区二区三区激情| www.精华液| 日韩,欧美,国产一区二区三区| 人妻人人澡人人爽人人| 久久天躁狠狠躁夜夜2o2o | 国产在线一区二区三区精| 亚洲av福利一区| 国产精品久久久人人做人人爽| 久久久久人妻精品一区果冻| 亚洲久久久国产精品| 丰满少妇做爰视频| 天天躁夜夜躁狠狠久久av| 在线观看www视频免费| 日韩一卡2卡3卡4卡2021年| 日本欧美视频一区| 欧美精品av麻豆av| 成人免费观看视频高清| 成年人午夜在线观看视频| 国产精品一国产av| 一个人免费看片子| 欧美久久黑人一区二区| 90打野战视频偷拍视频| 久久精品亚洲熟妇少妇任你| 亚洲精品自拍成人| 如何舔出高潮|