
    Price-Based Residential Demand Response Management in Smart Grids: A Reinforcement Learning-Based Approach

IEEE/CAA Journal of Automatica Sinica, 2022, Issue 1

Yanni Wan, Jiahu Qin, Xinghuo Yu, Tao Yang, and Yu Kang

Abstract—This paper studies price-based residential demand response management (PB-RDRM) in smart grids, in which non-dispatchable and dispatchable loads (including general loads and plug-in electric vehicles (PEVs)) are both involved. The PB-RDRM is composed of a bi-level optimization problem, in which the upper-level dynamic retail pricing problem aims to maximize the profit of a utility company (UC) by selecting optimal retail prices (RPs), while the lower-level demand response (DR) problem expects to minimize the comprehensive cost of loads by coordinating their energy consumption behavior. The challenges here are mainly two-fold: 1) the uncertainty of energy consumption and RPs; 2) the flexible PEVs' temporally coupled constraints, which make it impossible to directly develop a model-based optimization algorithm to solve the PB-RDRM. To address these challenges, we first model the dynamic retail pricing problem as a Markovian decision process (MDP), and then employ a model-free reinforcement learning (RL) algorithm to learn the optimal dynamic RPs of the UC according to the loads' responses. Our proposed RL-based DR algorithm is benchmarked against two model-based optimization approaches (i.e., distributed dual decomposition-based (DDB) method and distributed primal-dual interior (PDI)-based method), which require exact load and electricity price models. The comparison results show that, compared with the benchmark solutions, our proposed algorithm can not only adaptively decide the RPs through an on-line learning process, but also achieve larger social welfare within an unknown electricity market environment.

    I. INTRODUCTION

THE rapid development of information and communication technologies (ICTs) in power systems, especially the introduction of two-way information and energy flow, has led to a revolutionary transition from the traditional power grid to the smart grid [1]. The smart grid, a typical cyber-physical system (CPS), integrates advanced monitoring, control, and communication techniques into the physical power system to provide reliable energy supply, promote the active participation of loads, and ensure the stable operation of the system [2]. Due to the cyber-physical fusion characteristics of the smart grid, demand response management (DRM) has become a research hotspot in the field of energy management [3], [4]. The purpose of DRM is to utilize changes in the energy usage of loads to cope with time-varying electricity prices or reward/punishment incentives, so as to achieve cost reduction or other interests [5].

The existing literature mainly focuses on two branches of DRM, namely price-based DRM (PBDRM) and incentive-based DRM (IBDRM) [6]. The PBDRM encourages loads to adjust their energy usage patterns in accordance with time-based pricing mechanisms, such as real-time pricing [7] and time-of-use (TOU) pricing [8]. The IBDRM prefers to provide loads with rewards/punishments for their contribution/failure in demand reduction during peak periods [3]. Although both of these two DRMs can promote the active participation of loads, as mentioned in [6], PBDRM is more common than IBDRM, so this study mainly focuses on the PBDRM.

Up to now, many efforts have been devoted to investigating the PBDRM [9]–[16], mainly from social and individual perspectives. From the social perspective, one expects to maximize the social benefit, including the interests of both the utility company (UC) and the users. For example, the work in [9] studies the distributed real-time demand response (DR) in a multiseller-multibuyer environment and proposes a distributed dual decomposition-based (DDB) method to maximize the social welfare, i.e., the comfort of users minus the energy cost of the UC. Another work in [10] proposes a distributed fast-DDB DR algorithm to obtain the consumption/generation behavior of the end-user/energy-supplier that yields the optimal social welfare. In addition to common residential loads, the authors in [11]–[14] further consider a holistic framework that optimizes and controls the building heating, ventilation, and air conditioning (HVAC) system and the residential energy management of smart homes under a dynamic retail pricing strategy, in addition to achieving DR goals. From the individuals' point of view, one prefers to reduce the electricity bills of users or to maximize the revenue of the UC by selecting appropriate pricing mechanisms. For example, the work in [15] studies a deterministic DR with day-ahead electricity prices, aiming to minimize the energy cost for customers. Some other works focus on the benefit of the UC; for instance, the objective in [16] is to minimize the energy cost of the UC.

Nevertheless, most of the existing works are either based on a given pricing mechanism (e.g., TOU pricing in [8] and day-ahead pricing in [15]) or a predetermined abstract pricing model (e.g., the linear pricing strategy in [17]). That is to say, the existing PBDRM depends largely on deterministic pricing models, which cannot reflect the uncertainty and flexibility of the dynamic electricity market. Additionally, in the long run, the UC expects to estimate/predict the impact of its current retail pricing strategy on the immediate and all subsequent responses of loads. However, in the existing works the UC is myopic and only focuses on the immediate response of loads to the current pricing strategy. In view of this, it is urgent to design a truly dynamic pricing mechanism that can adapt to flexible load changes and the dynamic electricity market environment. Moreover, it is necessary to develop an effective methodology to solve the dynamic PBDRM under an unknown electricity market environment.

The development of artificial intelligence (AI) has prompted many experts and scholars to adopt learning-based methods (a powerful tool for sequential decisions within an unknown environment [18]) to solve the decision-making problems arising in the smart grid, such as the PEV charging problem [19], energy management problem [20]–[24], and demand-side management (DSM) problem [25], [26]. Specifically, in [19], the authors use the reinforcement learning (RL) algorithm to determine the optimal charging behavior of an electric vehicle (EV) fleet without prior knowledge about the exact model of each EV. The work in [20] develops an RL-based algorithm to solve the problems of dynamic pricing and energy consumption scheduling without requiring a priori system information. Some other works, see [21], [23]–[26], focus more on energy management and DSM (which usually refers to the DR for load units). For instance, in [21], the authors study the distributed energy management problem by means of RL, in which the uncertainties caused by renewable energies and continuous fluctuations in energy consumption can be effectively addressed. Moreover, to further improve the scalability and reliability of learning-based approaches [22] and reduce the power loss during the energy trading between energy generation and consumption, the authors in [23] propose a Bayesian RL-based approach with coalition formation, which effectively addresses the uncertainty in generation and demand. A model-free RL-based approach is employed in [24] to train the action-value function to determine the optimal energy management strategy in a hybrid electric powertrain. When considering the DSM, the authors propose a multiagent RL approach based on Q-learning in [25], which not only enhances the possibility of dedicating separate DR programs to different load devices, but also accelerates the calculation process. Another work in [26] investigates a building DR control framework and formulates the DR control problem as an MDP. On this basis, a cost-effective RL-based edge-cloud integrated solution is proposed, which shows good performance in control efficiency and learning efficiency in different-sized buildings. However, the considered energy management and DSM problems only involve the general dispatchable loads (i.e., whose energy usage changes with the RP, such as air conditioning and lighting) while ignoring the non-dispatchable loads (i.e., whose energy usage cannot be changed at any time, such as a refrigerator) and a class of more flexible loads, such as plug-in electric vehicles (PEVs) [27].

Inspired by the application of RL in energy scheduling and trading, this paper adopts a model-free RL algorithm to learn the optimal RPs within an unknown electricity market environment. The main contributions are summarized below:

    1) This paper studies the price-based residential demand response management (PB-RDRM) in a smart grid, in which both the non-dispatchable and dispatchable loads are considered. Compared with the existing PBDRM with general dispatchable loads such as household appliances [9] and commercial buildings [11], this work innovatively considers a more flexible PEV load with two working modes of charging and discharging.

2) Unlike the existing works that focus on individual interests [15], [16], the considered PB-RDRM is modeled from a social perspective. Specifically, the PB-RDRM is composed of a bi-level optimization problem, where the upper level aims to maximize the profit of the UC and the lower level expects to minimize the comprehensive cost of loads. Therefore, the goal of PB-RDRM is to coordinate the energy consumption of all loads to maximize the social welfare (i.e., the weighted sum of the UC's profit and the loads' comprehensive cost) under the dynamic RPs.

    3) Considering the uncertainty induced by energy consumption and RPs, as well as the temporally coupled constraints of PEVs, a model-free RL-based DR algorithm is proposed. The comparison results between the proposed model-free algorithm and two benchmarked model-based optimization approaches (i.e., distributed DDB [9] and PDI methods [10]) show that our proposed algorithm can not only adaptively decide the dynamic RPs by an on-line learning process, but also achieve the optimal PB-RDRM within an unknown electricity market environment.

The organization of this paper is as follows. Section II presents the problem statement. Section III provides the RL-based DR algorithm, and Section IV conducts simulation studies to verify the performance of the proposed algorithm. Finally, conclusions are drawn in Section V.

    II. PROBLEM STATEMENT

Consider a retail electricity market in a residential area as shown in Fig. 1, including the load (lower) and UC (upper) levels. Note that there is a two-way information flow between the UC and load levels. Specifically, the UC (a form of distribution system operator (DSO)) releases the information of dynamic RPs to the loads, while the loads deliver their energy demand information to the UC. The PB-RDRM aims to coordinate the energy consumption of a finite set N = {1, 2, ..., N} of residential loads within a time period T = {1, 2, ..., T} in response to the dynamic RPs, thereby maximizing social welfare. Since the problem model involves the information and energy interactions between the UC and the loads, their mathematical models are introduced first, after which the system objective is given.

    Fig. 1. Retail electricity market model in residential area.

    A. Load Modeling

According to the users' preferences and the loads' energy consumption characteristics, the loads are usually classified into two categories [28], namely dispatchable loads N_d and non-dispatchable loads N_n. In this paper, in addition to the general dispatchable loads G, we consider a more flexible type of load, i.e., PEVs V. That is, N_d = G ∪ V.

1) General Dispatchable Loads: The consumed energy of a general dispatchable load n ∈ G is described as [28], [29]

where p_{n,t} and e_{n,t} are the energy consumption (kWh) and the energy demand (kWh) of general dispatchable load n at time slot t, respectively. Here the energy demand refers to the expected energy requirement of a load before it receives the RP from the UC, while the energy consumption is the energy actually consumed after the RP signal is received. ξ_t is the price elasticity coefficient, indicating the ratio of the energy demand change to the RP variation at time slot t. Note that ξ_t is usually negative, reflecting the reciprocal relationship between energy demand and electricity price [28]. η_t and λ_t respectively represent the RP ($/kWh) and the wholesale price ($/kWh) at time slot t, and satisfy η_t ≥ λ_t. The intuition behind (1) is that the current energy consumption of general dispatchable load n depends on its current energy demand and on the demand reduction resulting from the change in RP. Note that when the general dispatchable load n consumes energy p_{n,t} at time slot t, the remaining required energy e_{n,t} − p_{n,t} cannot be satisfied, so load n experiences dissatisfaction. To characterize such dissatisfaction, a dissatisfaction function is defined as follows [30]:

where the two bounds are the lower and upper limits of the battery capacity (kWh) of PEV n, respectively. Considering the overall interests of the electricity market, it is impossible to completely obey the PEV owners' charging willingness, leading to dissatisfaction of the PEV owners. Thus, the following dissatisfaction function is defined [30]:

    where κ ($/kWh) is the degradation coefficient.

3) Non-Dispatchable Loads: Since the energy consumption of non-dispatchable loads cannot be shifted or curtailed, these energy demands are critical and must be met at any time. Therefore, for any n ∈ N_n, one has

where p_{n,t} and e_{n,t} are the energy consumption and the energy demand of non-dispatchable load n at time slot t, respectively.
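For concreteness, the following Python sketch shows one way the lower-level load responses could be computed, assuming a standard linear price-elasticity rule for general dispatchable loads (the exact form of (1) may differ from this assumption) and exact demand matching for non-dispatchable loads; all function and variable names are illustrative.

```python
def dispatchable_response(e_demand, xi_t, eta_t, lambda_t):
    """Assumed linear price-elasticity response of a general dispatchable load:
    consumption = demand + xi_t * demand * (eta_t - lambda_t) / lambda_t,
    with xi_t < 0, so a retail price above the wholesale price cuts demand.
    This is a common textbook form, not necessarily the paper's exact (1)."""
    p = e_demand * (1.0 + xi_t * (eta_t - lambda_t) / lambda_t)
    return max(p, 0.0)  # consumed energy cannot be negative

def non_dispatchable_response(e_demand):
    """Non-dispatchable loads consume exactly their demand (cannot be shifted)."""
    return e_demand

# Example: 2 kWh demand, xi_t = -0.3, retail 0.12 $/kWh, wholesale 0.10 $/kWh
print(dispatchable_response(2.0, -0.3, 0.12, 0.10))  # -> 1.88 kWh
```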

From the point of view of the loads, one expects to decide the optimal energy consumption of all loads so as to minimize the comprehensive cost, which is described below:

    B. Utility Company Modeling

For the UC, since it first purchases electrical energy from the grid operator at predetermined wholesale prices and then sells the purchased energy to various types of loads at RPs set by itself, the goal of the UC is to select optimal RPs so as to maximize its profit, i.e.,
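For concreteness, a plausible per-slot form of the UC's profit consistent with this description, with p_{n,t} denoting the consumption of load n at slot t, is the following sketch (an assumed form, not necessarily the paper's exact expression):

$$ U_t \;=\; \sum_{n\in\mathcal{N}} \big(\eta_t - \lambda_t\big)\, p_{n,t}, \qquad U \;=\; \sum_{t\in\mathcal{T}} U_t . $$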

    C. Problem Formulation

Recall that the aim of the PB-RDRM is to adjust the energy usage patterns of loads to cope with time-varying RPs, so as to maximize social welfare (including both the UC's profit ($) and the loads' comprehensive cost ($)) from a social perspective. Therefore, the considered PB-RDRM can be formulated as the following optimization problem:

where ρ ∈ [0,1] is a weighting parameter showing the relative social value of the UC's profit and the loads' comprehensive cost from the social perspective [29], [32]. It is worth mentioning that many optimization methods, such as two-stage stochastic programming, Lyapunov optimization techniques, model predictive control, and robust optimization in [13], have been used to solve DRM problems similar to the optimization problem (14). Although the above methods are relatively mature, they still have the following limitations: 1) They require prior knowledge of the exact load model. However, the model of some loads, like the PEVs considered in this paper, is affected by many factors, such as the temporally coupled SoC constraints [31] and the randomness of EVs' commuting behavior [33], so an exact load model is often difficult to obtain or even unavailable. 2) They depend on accurate prediction of uncertain parameters (e.g., the RP in the current work). However, in most cases the prediction error cannot be guaranteed to be small enough, thus affecting the performance of the optimization approaches. 3) Almost all of the above methods are off-line; the calculation process must therefore be fully completed before the best result can be chosen, which is time-consuming when the problem size is large. To tackle the above limitations, we next adopt an RL-based approach which can adaptively determine the optimal policy by an on-line learning process without requiring the exact load model.
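For concreteness, and consistent with the per-slot reward r_t = ρU_t − (1−ρ)C_t used in Section III, the social-welfare objective can be read as the weighted combination below (an assumed form of (14), with C_t the loads' comprehensive cost at slot t):

$$ \max_{\{\eta_t\}_{t\in\mathcal{T}}} \;\; \sum_{t\in\mathcal{T}} \Big[\rho\, U_t - (1-\rho)\, C_t\Big] \quad \text{subject to the load models and the retail-price bounds in (13).} $$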

    III. REINFORCEMENT LEARNING-BASED DR ALGORITHM

    This section discusses how to employ the RL method for UC to decide the optimal retail pricing policy so as to solve the PB-RDRM.

    A. A Brief Overview of RL

RL is a type of machine learning (ML) approach that evolved from behaviorist psychology, focusing on how an agent can find, within a stochastic/unknown environment, an optimal policy that maximizes cumulative rewards [18]. Different from supervised ML, RL explores the unknown environment through continuous actions, continuously optimizes its behavioral strategy according to the reward provided by the environment, and finally finds the optimal policy (i.e., a sequence of actions) that yields the maximum cumulative reward. Fig. 2 depicts the basic principle of RL. Specifically, at the initial moment the agent does not know what reward and next state the environment will produce when it takes the current action, and thus has no knowledge of how to choose actions to maximize the cumulative reward. To tackle this issue, at the initial state s_0, the agent randomly takes an action a_0 from the action set and acts on the environment, causing the state of the environment to move from s_0 to s_1 (purple arrows). At the same time, the agent receives an immediate reward r_0 from the environment (orange arrows). The process repeats until the end of one episode (the completion of a task or the end of a period of time). Moreover, the current action affects both the immediate reward and the next state, and hence all future rewards. Therefore, RL has two significant characteristics, namely delayed reward and trial-and-error search.

    Fig. 2. Basic principle of RL.

    B. Mapping the System Model to RL Framework

To determine the optimal RPs, we first use the RL framework to describe the retail electricity market model (see Fig. 3). Specifically, the UC acts as the agent, all the loads serve as the environment, the retail prices are the actions that the agent applies to the environment, the energy demand, the energy consumption of the loads, and the time index represent the state, and the social welfare (i.e., the weighted sum of the UC's profit and the loads' comprehensive cost) is the reward. Then, we further adopt a discrete-time Markovian decision process (MDP) to model the dynamic retail pricing problem, as this is usually the first step when using an RL method [34], [35]. The MDP is represented by a quintuple ⟨S, A, R, P, γ⟩, wherein each component is described as follows:

    Fig. 3. Illustration of RL framework for retail electricity market model.

    Fig. 4. Energy demand/consumption of non-dispatchable loads.

    Fig. 5. Energy demand of dispatchable loads. (a) General dispatchable loads (b) PEVs.

1) State Set: S = {s_1, s_2, ..., s_T}, where s_t = (e_t, p_t, t). The environment state at time slot t is represented by three kinds of information, i.e., the energy demand e_t and energy consumption p_t of all loads, and the time step t;

2) Action Set: A = {a_1, a_2, ..., a_T}, where a_t = η_t. To be specific, the action at time slot t is the RP η_t that the UC sets for all loads at that time;

3) Reward Set: R = {r_1, r_2, ..., r_T}, where r_t = ρU_t − (1−ρ)C_t. That is to say, the reward at time slot t is the social welfare received by the system at that time;

4) State Transition Matrix: P = {P^a_{ss′}}, where P^a_{ss′} = Pr{s_{t+1} = s′ | s_t = s, a_t = a} is the probability of transitioning to the next state s′ when adopting action a in state s. Since the energy demand and consumption of the loads are affected by many factors, this state transition is rather difficult to obtain; therefore, we employ a model-free Q-learning method to solve the dynamic retail pricing problem;

5) Discount Factor: γ ∈ [0,1], indicating the relative importance of subsequent rewards versus the current reward.

One episode of the MDP is denoted by (s_1, a_1, s_2, r_1; a_2, s_3, r_2; ...; a_{T−1}, s_T, r_{T−1}). The total return of one episode is the sum of the rewards r_1 + r_2 + ··· + r_{T−1}, which represents the cumulative reward. Due to the delayed reward feature of RL, the discounted future return from time slot t is usually expressed as r_t + γr_{t+1} + γ²r_{t+2} + ···, where γ ∈ [0,1] is the discount factor: γ = 0 implies that the system is totally myopic and only focuses on the immediate reward, while γ = 1 means that the system treats all rewards equally. Thus, to show the foresight of the system, one usually chooses an appropriate discount factor. Note that, since we focus on the social welfare over the entire time horizon, the reward at each time slot is equally important; as a result, the discount factor is set to 1 in our problem formulation. In addition, denote by π the policy, which is a mapping from states to actions, i.e., π: S → A. The retail pricing problem then aims to seek the optimal policy π* that maximizes the expected cumulative return, i.e., π* = argmax_π E[Σ_{t∈T} r_t].
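As a concrete illustration of the return just described, the short Python sketch below accumulates the per-slot rewards r_t = ρU_t − (1−ρ)C_t of one episode with a configurable discount factor (γ = 1 in the paper's setting); the function name and inputs are hypothetical.

```python
def episode_return(uc_profits, load_costs, rho, gamma=1.0):
    """Cumulative (discounted) return of one pricing episode.

    uc_profits[t] -> U_t, load_costs[t] -> C_t; the per-slot reward is
    r_t = rho * U_t - (1 - rho) * C_t, and gamma = 1 treats all slots
    equally, as chosen in the paper.
    """
    total = 0.0
    for t, (u_t, c_t) in enumerate(zip(uc_profits, load_costs)):
        total += (gamma ** t) * (rho * u_t - (1.0 - rho) * c_t)
    return total

# e.g., episode_return([5.0, 6.0], [1.0, 2.0], rho=0.5) -> 4.0
```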

    C. Algorithm Implementation

After mapping the retail pricing problem to the MDP framework, the RL method can be used to seek the optimal retail pricing policy. Here we adopt Q-learning, one of the model-free RL methods, to analyze how the UC chooses RPs while interacting with all loads to achieve the system objective (14). Almost all RL methods rely on the estimation of value functions, which are functions of states (or state-action pairs) that measure the performance of the agent in a given state (or for a given state-action pair). Thus, the basic principle of Q-learning is to assign an action-value function Q(s,a) to each state-action pair (s,a) and update it at each iteration so as to acquire the optimal Q(s,a). The optimal action-value function Q*(s,a) is defined as the maximum cumulative discounted future return obtained by starting from state s, taking action a, and thereafter following the optimal policy π*; it obeys the Bellman optimality equation [18], i.e.,

Q*(s,a) = E[ r + γ max_{a′∈A} Q*(s′, a′) | s, a ],

where E is the expectation operator capturing the randomness of the next state and reward, s′ ∈ S represents the state at the next time slot, and a′ ∈ A is the action adopted at state s′. Therefore, by acquiring Q*(s,a), one can immediately obtain the optimal policy by (17), i.e., π*(s) = argmax_{a∈A} Q*(s,a).
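In table form, extracting the optimal policy then reduces to a row-wise argmax over the learned Q-table, e.g. (a minimal sketch, with the Q-table layout assumed to be states × actions):

```python
import numpy as np

def greedy_policy(q_table):
    """Greedy retail-pricing policy from a Q-table of shape
    (num_states, num_actions): pi*(s) = argmax_a Q*(s, a)."""
    return np.argmax(q_table, axis=1)  # one action index per state
```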

The overall implementation of the RL-based DR algorithm is summarized in Algorithm 1. Specifically, to begin with, a set of predefined parameters is input, including the loads' energy demands, dissatisfaction coefficients, price elasticity coefficients, wholesale prices, and weighting parameters. Then the action-value function Q_0(s,a) is initialized to zeros, and the UC learns the optimal RP policy by the following steps:

S1) Observe the current state s_t and select an action a_t with an ε-greedy policy within the RP boundaries.

S2) After performing a_t, the UC obtains an immediate reward r_t = ρU_t − (1−ρ)C_t and observes the next state s_{t+1}.

S3) Update Q_k(s_t, a_t) by the following mechanism:

Q_k(s_t, a_t) ← (1 − θ) Q_k(s_t, a_t) + θ [ r_t + γ max_{a′∈A} Q_k(s_{t+1}, a′) ],

where θ ∈ [0,1] is a learning rate indicating the degree to which newly obtained Q-values override the old ones.

S4) Check whether the end of one episode has been reached (that is, whether the final time slot T has been reached); if not, go back to S1); otherwise, go to S5).

S5) Check the stopping criterion. Specifically, compare the values of Q_k and Q_{k−1} to see whether the Q-table has converged; if not, go to the next iteration; otherwise, go to S6).

S6) Calculate the optimal retail pricing policy based on (17).

    S7) Calculate the optimal energy consumption of dispatchable loads by (1) and (4).

Remark 1: The basic principle of the ε-greedy policy (the most common exploration mechanism in RL) is to either choose a random action from the action set A with probability ε or select the action corresponding to the maximum action-value function with probability 1 − ε. Such an exploration and selection mechanism not only avoids complete randomness of the system but also promotes efficient exploration of the action space, and thus the optimal policy (i.e., the dynamic RPs) can be decided adaptively through the on-line learning process. Moreover, the iterative stopping criterion is |Q_k − Q_{k−1}| ≤ δ, where δ is a very small positive constant indicating the tolerated gap between the previous Q-value and the current one, ensuring that the Q-value eventually approaches its maximum. That is to say, the proposed RL-based DR algorithm is guaranteed to reach an optimal PB-RDRM within an unknown electricity market environment.
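For illustration, a minimal sketch of the ε-greedy selection described in Remark 1, assuming the discretized RPs are indexed 0, ..., |A|−1:

```python
import numpy as np

def epsilon_greedy(q_row, epsilon, rng=None):
    """Pick a retail-price action index from one Q-table row:
    with probability epsilon explore a random action,
    otherwise exploit the action with the largest Q-value."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # explore
    return int(np.argmax(q_row))              # exploit
```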

Algorithm 1 RL-Based DR Algorithm
1: Input: A set of predefined parameters
2: Initialize: An initial action-value function Q_0(s,a) = 0, k = 0, t = 0
3: Iteration:
4: For each episode do t ← t+1, k ← k+1
5: Repeat:
6:   Step 1: Observe the state s_t (i.e., energy demand, energy consumption, and time step) and choose an action a_t (i.e., RP) using the ε-greedy policy
7:   Step 2: Calculate the immediate reward r_t = ρU_t − (1−ρ)C_t and observe the next state s_{t+1}
8:   Step 3: Update the action-value function Q_k(s_t, a_t) by (18)
9:   Step 4: Check whether the end of one episode is reached
10:  if t = T
11:    break;
12:  end if
13:  Step 5: Check the stopping criterion
14:  if |Q_k − Q_{k−1}| ≤ δ
15:    break;
16:  end if
17: Step 6: Compute the optimal retail pricing policy by (17)
18: Step 7: Compute the optimal energy consumption by (1) and (4)
19: Output: The optimal energy consumption profile
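To make the flow of Algorithm 1 concrete, the self-contained Python sketch below combines the episode loop, ε-greedy selection, one-step Q-update, and stopping criterion; the environment interface env.reset()/env.step(a), the parameter values, and the state/action indexing are hypothetical stand-ins for the load and UC models of Section II.

```python
import numpy as np

def rl_based_dr(env, num_states, num_actions, episodes=5000,
                theta=0.1, gamma=1.0, epsilon=0.1, delta=1e-3, seed=0):
    """Tabular Q-learning sketch of the RL-based DR procedure.

    Assumed environment interface: env.reset() -> initial state index;
    env.step(a) -> (next_state, reward, done), where the reward is
    r_t = rho*U_t - (1-rho)*C_t computed by the load/UC models.
    Returns the learned Q-table and the greedy retail-pricing policy.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((num_states, num_actions))
    for _ in range(episodes):
        q_prev = q.copy()
        s = env.reset()
        done = False
        while not done:                                   # one episode = one day
            # Step 1: epsilon-greedy action (retail-price index)
            if rng.random() < epsilon:
                a = int(rng.integers(num_actions))
            else:
                a = int(np.argmax(q[s]))
            # Step 2: apply the RP, observe reward and next state
            s_next, r, done = env.step(a)
            # Step 3: one-step Q-learning update with learning rate theta
            target = r + gamma * np.max(q[s_next])
            q[s, a] = (1.0 - theta) * q[s, a] + theta * target
            s = s_next
        # Step 5: stopping criterion on the Q-table change
        if np.max(np.abs(q - q_prev)) <= delta:
            break
    policy = np.argmax(q, axis=1)                         # Step 6: greedy RP policy
    return q, policy
```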

    IV. CASE STUDIES

This section conducts case studies to verify the performance of the proposed RL-based algorithm. In particular, two model-based optimization approaches (i.e., the distributed DDB [9] and PDI [10] methods) are adopted as benchmarks for comparison. The algorithms are implemented in MATLAB R2016a on a desktop PC with an i3-6100 CPU @ 3.70 GHz, 8 GB of RAM, and a 64-bit Windows 10 operating system.

    A. Case Study 1: Effectiveness Verification

1) Performance Evaluation: In this case, we consider the DRM of a residential area with 5 non-dispatchable loads and 10 dispatchable loads (including 6 general dispatchable loads and 4 PEVs) over a whole day (i.e., 24 hours). The energy demand profiles of the non-dispatchable and dispatchable loads are obtained from San Diego Gas & Electric [36] and shown in Figs. 4 and 5. As shown in Fig. 5, the energy demand trends of all six general dispatchable loads are almost the same, resulting in two demand peaks (i.e., 10:00–15:00 and 19:00–22:00). Thus, if the actual energy consumption of these dispatchable loads is not properly coordinated, the electricity burden on the power grid will increase significantly and the economic operation of the electricity market cannot be guaranteed. Additionally, note that since we consider residential PEVs, their arrival and departure times are almost fixed and known in advance. The wholesale prices are determined by the grid operator and are derived from the Commonwealth Edison Company [37] (see also Fig. 6). The remaining related parameters of the dispatchable loads are listed in Table I. For illustration, the numerical values of the wholesale price and the remaining related parameters (including the elasticity coefficients ξ_t, weighting factor ρ, learning rate θ, and gap tolerance δ) are summarized in Table II. Note that the action space is discretized with a step of 0.1; that is, the RP increases or decreases by a multiple of 0.1 at each iteration. The numerical results of the proposed RL-based DR algorithm are shown below.
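For reference, the discretized RP action set can be constructed as follows (a sketch; the actual price bounds are those of constraint (13) and are not restated here):

```python
import numpy as np

def build_rp_actions(price_lb, price_ub, step=0.1):
    """Candidate retail prices spaced by `step` between the lower and
    upper RP bounds; the agent's action is an index into this array."""
    return np.round(np.arange(price_lb, price_ub + step / 2, step), 4)

# e.g., build_rp_actions(6.0, 9.0) -> array([6. , 6.1, ..., 9. ])
```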

    Fig. 6. Daily optimal retail pricing policies of loads. (a) Non-dispatchable load. (b) General dispatchable load. (c) PEV.

Fig. 6 shows the daily optimal retail pricing policies received by the three types of loads. It can be observed that the trends of the RPs and the wholesale prices are similar, which is justified in terms of maximizing social welfare. Moreover, all the RPs fall within the lower and upper price bounds, thus satisfying the constraint (13). It is worth noting that, due to the changes in price elasticity from the off-peak to the mid-peak (on-peak) period at 12:00 (16:00), sudden decreases appear at these two time slots. From the definition of price elasticity, one knows that a continuous increase in the RP may lead to more demand reduction, thus causing great reductions in the UC's profit. Another observation is that the price difference (i.e., retail price minus wholesale price) for each load unit during the three periods satisfies: off-peak > mid-peak > on-peak. This is because the price elasticity coefficient of the on-peak period is smaller than those of the mid- and off-peak periods. Once the optimal RPs for all loads are obtained, the optimal energy consumption can be directly calculated by (1) and (4), as shown in Fig. 7. One can see that the PEV discharges at some peak hours to relieve the electricity pressure and increase its own profit. For further analysis, the total demand reduction of each dispatchable load is displayed in Fig. 8. It can be observed that Gen6 reduces less energy than the other dispatchable loads, because a load unit with a larger α_n prefers a smaller demand reduction so as to avoid experiencing more dissatisfaction.

Next, we proceed to verify the convergence of Algorithm 1, that is, to judge whether the Q-values converge to their maximums. For clarity, we choose five Q-values of each type of load as an example, and the numerical results are displayed in Fig. 9. Clearly, at the beginning, the UC has no knowledge of which RP will result in a larger reward. But as the iterations proceed, since the UC learns the dynamic responses of the loads through trial and error, the Q-values gradually increase and eventually converge to their maximums.

Now let us move on to the discussion of the impact of ρ and θ. Since the demand reduction of the loads is closely related to the time-varying price elasticity, fixed elasticity coefficients are not representative. To tackle this issue, we adopt Monte Carlo simulations (2000 simulations with changing elasticity coefficients) to capture the trends of the average RP, the UC's total profit, and the loads' total cost as the weighting parameter ρ changes. As shown in Fig. 10, the average RP (red solid line), the UC's total profit (black solid line), and the loads' total cost (blue solid line) increase as ρ varies from 0 to 1 with a step of 0.1. This is because an increase in ρ means that, from a social point of view, maximizing the profit of the UC is more important than minimizing the comprehensive cost of the loads, thereby resulting in an increase in the RP determined by the UC. Correspondingly, as the RP increases, the amount of energy consumed by the loads is gradually reduced, leading to a slight increase in the total cost of the loads. Fig. 11 shows that as θ increases from 0 to 1, the convergence of the Q-values gradually becomes faster. In particular, θ = 0 means the UC learns nothing, while θ = 1 means the UC focuses only on the latest information.
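The ρ-sweep described above could be organized as in the sketch below, where simulate_pbdrm is a hypothetical wrapper around the full DR simulation returning the average RP, the UC's total profit, and the loads' total cost for a given ρ and a draw of elasticity coefficients:

```python
import numpy as np

def monte_carlo_rho_sweep(simulate_pbdrm, runs=2000, seed=0):
    """Average RP, UC profit, and load cost over Monte Carlo draws of the
    (negative) elasticity coefficients, for each weighting rho in 0, 0.1, ..., 1."""
    rng = np.random.default_rng(seed)
    results = {}
    for rho in np.round(np.arange(0.0, 1.01, 0.1), 1):
        samples = []
        for _ in range(runs):
            # assumed: 24 hourly elasticities drawn around a nominal value
            xi = -np.abs(rng.normal(loc=0.3, scale=0.05, size=24))
            samples.append(simulate_pbdrm(rho, xi))  # -> (avg_rp, profit, cost)
        results[float(rho)] = np.mean(samples, axis=0)
    return results
```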

Fig. 7. Daily energy consumption of loads. (a) General dispatchable load. (b) PEV.

    Fig. 8. Total demand reduction of dispatchable loads.

    Fig. 9. Convergence of Q-values. (a) Non-dispatchable load. (b) General dispatchable load. (c) PEV.

Fig. 10. Impact of ρ on average retail price, UC's total profit, and loads' total cost.

    Fig. 11. Impact of learning rate θ on convergence of Q-values.

    TABLE I RELATED PARAMETER SETTINGS OF DISPATCHABLE LOAD UNITS

    TABLE II NUMERICAL VALUES OF WHOLESALE PRICE AND OTHER RELATED PARAMETERS

2) Comparison With Benchmarks: To further evaluate the effectiveness of the proposed model, we first adopt a benchmark without PBDRM for comparison. Fig. 12 shows the daily energy consumption of all loads under two different situations, namely with and without PBDRM. Note that, for the sake of illustration, the figure plots the average RP of all loads. It can be seen from Fig. 12(a) that there is no energy demand reduction or shift in the absence of PBDRM, causing a fluctuating energy consumption profile. By contrast, as shown in Fig. 12(b), with the help of PBDRM, the loads reduce their energy consumption when the price is high, resulting in less total energy consumption and a smoother profile. Therefore, the proposed PB-RDRM effectively coordinates the energy consumption of residential loads and significantly improves the social welfare of the residential retail electricity market.

    Fig. 12. Energy consumption of all loads. (a) without and (b) with PBDRM.

Then, the proposed RL-based DR algorithm is benchmarked against two model-based optimization algorithms (i.e., the distributed DDB [9] and PDI [10] methods) and a scheme with perfect information of the random parameters (i.e., the RP and the EVs' commuting behavior). Note that both model-based optimization approaches rely on deterministic load and price models. This is because the PB-RDRM is essentially a bi-level optimization problem, so when the problem model is accurately formulated, one can use conventional optimization techniques to solve it directly. Specifically, by means of the DDB method in [9], the optimization problem of all lower-level loads can be decoupled into several sub-optimization problems, each of which can be solved in a distributed manner. In the simulations, the initial Lagrangian multipliers are set to zeros. In addition, another optimization technique, namely the PDI method, has been shown to be effective in dealing with DR in the smart grid [10]. In our comparative simulation study, the initial random dual vector is set to 0.5, α = 0.05, σ = 2.5, β = 0.2, and ε_ν = ε_feas = 10^{−2}. Note that the above parameter settings correspond to the algorithm in [10] and are independent of the parameters presented in this paper. The comparison results are shown in Fig. 13. It can be observed that the trends of the RPs (solid lines) obtained by the four compared algorithms are similar, but the energy consumption profile (bar charts) learned by our proposed RL-based algorithm is much smoother than those of the two model-based optimization approaches. Moreover, the trends of the RPs and energy consumption profiles obtained by our proposed approach are the closest to those of the scheme with perfect information of the random parameters (referred to as "Com-based" in Fig. 13). In addition, Table III lists the numerical comparison results for the UC's total profit, the loads' total cost, and the social welfare. It can be observed from Table III that the scheme with complete information of the random parameters provides an upper bound on the social welfare generated by the PB-RDRM considered in this paper. Moreover, the result of our proposed RL-based approach is closer to this upper bound than those of the other two model-based optimization approaches.

Fig. 13. Comparison results of the total energy consumption and RP solutions of the RL-based algorithm and three benchmark algorithms (including DDB, PDI, and Com-based).

    TABLE III NUMERICAL COMPARISON RESULTS

Moreover, note that the complexity of RL algorithms applied to goal-directed exploration tasks, especially Q-learning and its variants, has been thoroughly analyzed in an earlier paper [38]. Specifically, according to Theorem 2 and Corollary 3 in [38], the complexity of the one-step Q-learning algorithm (corresponding to the inner loop of the RL-based DR algorithm proposed in the current work) is O(md), where m and d are the total number of state-action pairs and the maximum number of actions that the agent executes to reach the next state, respectively. Therefore, the complexity of our proposed RL-based DR algorithm is O(md), which obviously depends on the size of the state space. As for the two compared model-based approaches (i.e., distributed DDB [9] and distributed PDI [10]), they have been shown to have polynomial-time complexity. Therefore, as the number of residential loads and the time horizon increase, the proposed algorithm has a complexity comparable with those of the distributed DDB and PDI methods. In addition, considering the performance advantages of our proposed algorithm in addressing the uncertainty of RPs and energy consumption, as well as in yielding larger social welfare, it turns out that the proposed RL-based DR algorithm can effectively solve the PB-RDRM within an unknown electricity market environment.

    B. Case Study 2: Scalability Verification

Next, to verify the scalability of the proposed algorithm, we consider more loads (i.e., the total number of loads varies from 50 to 200) participating in the PB-RDRM. Fig. 14 traces the convergence rate of the Q-values with different numbers of loads. It can be observed that the more loads there are, the more iterations are required for the Q-values to converge. Note that the iterative process for 200 loads takes about 7.17×10^3 s, while that for 50 loads takes 3.89×10^3 s. The main reason for such an increase in time and number of iterations is that, when one load is added, there are W^24 permutations to perform, where W ∈ {1, ..., |A|}. Fortunately, with the advent of a new generation of advanced computing platforms such as grid computing and cloud computing [2], such computing pressure is no longer an obstacle to the development of smart grids, as these platforms synthesize and coordinate various local computing facilities, such as smart meters, to provide the required sub-computing and storage tasks.

    Fig. 14. Convergence rate of Q-values with different numbers of loads.

    V. CONCLUSION

This paper investigates the problem of PB-RDRM, in which the flexible PEVs are innovatively considered. We first formulate the upper-level dynamic retail pricing problem as an MDP with unknown state transition probabilities from the social perspective. Then a model-free RL-based approach is proposed to obtain the optimal retail pricing policies to coordinate the energy consumption profiles. The proposed approach is shown to address the uncertainty induced by energy consumption and RPs, as well as the temporally coupled constraints of PEVs, without any prior knowledge about the exact models of the loads and electricity prices. The simulation results show that our proposed RL-based algorithm can not only adaptively decide the dynamic RPs by the on-line learning process, but also outperform the model-based optimization approaches in solving the PB-RDRM within an unknown market environment.

Note that the tabular Q-learning algorithm we use is limited by the changing dimensions of the state vector. However, unlike the commercial area, the number of loads in residential areas is almost fixed, resulting in the dimension of the state vector being constant in the considered PB-RDRM. Therefore, the corresponding Q-table does not need to be reconstructed and trained repeatedly. That is to say, the proposed Q-learning-based algorithm is applicable to the investigated PB-RDRM. In the future, we intend to use function approximation or neural networks to replace the Q-table so as to extend the algorithm to larger problems. We will also focus on the pros and cons of both PBDRM and IBDRM to explore the coordination between these two DRMs.
