
    Price-Based Residential Demand Response Management in Smart Grids: A Reinforcement Learning-Based Approach

IEEE/CAA Journal of Automatica Sinica, 2022, Issue 1

    Yanni Wan, Jiahu Qin, Xinghuo Yu, Tao Yang, and Yu Kang

    Abstract—This paper studies price-based residential demand response management (PB-RDRM) in smart grids, in which both non-dispatchable and dispatchable loads (including general loads and plug-in electric vehicles (PEVs)) are involved. The PB-RDRM is formulated as a bi-level optimization problem, in which the upper-level dynamic retail pricing problem aims to maximize the profit of a utility company (UC) by selecting optimal retail prices (RPs), while the lower-level demand response (DR) problem seeks to minimize the comprehensive cost of loads by coordinating their energy consumption behavior. The challenges here are mainly two-fold: 1) the uncertainty of energy consumption and RPs; 2) the flexible PEVs' temporally coupled constraints, which make it impossible to directly develop a model-based optimization algorithm to solve the PB-RDRM. To address these challenges, we first model the dynamic retail pricing problem as a Markovian decision process (MDP), and then employ a model-free reinforcement learning (RL) algorithm to learn the optimal dynamic RPs of the UC according to the loads' responses. Our proposed RL-based DR algorithm is benchmarked against two model-based optimization approaches (i.e., the distributed dual decomposition-based (DDB) method and the distributed primal-dual interior (PDI)-based method), which require exact load and electricity price models. The comparison results show that, compared with the benchmark solutions, our proposed algorithm can not only adaptively decide the RPs through an on-line learning process, but also achieve larger social welfare within an unknown electricity market environment.

    I. INTRODUCTION

    THE rapid development of information and communication technologies (ICTs) in power systems, especially the introduction of two-way information and energy flow, has led to a revolutionary transition from the traditional power grid to the smart grid [1]. The smart grid, a typical cyber-physical system (CPS), integrates advanced monitoring, control, and communication techniques into the physical power system to provide reliable energy supply, promote the active participation of loads, and ensure the stable operation of the system [2]. Due to the cyber-physical fusion characteristics of the smart grid, demand response management (DRM) has become a research hotspot in the field of energy management [3], [4]. The purpose of DRM is to utilize changes in the energy usage of loads to cope with time-varying electricity prices or reward/punishment incentives, so as to achieve cost reduction or other interests [5].

    The existing literature mainly focuses on two branches of DRM, namely price-based DRM (PBDRM) and incentive-based DRM (IBDRM) [6]. The PBDRM encourages loads to adjust their energy usage patterns in accordance with time-based pricing mechanisms, such as real-time pricing [7] and time-of-use (TOU) pricing [8]. The IBDRM prefers to provide loads with rewards/punishments for their contribution/failure in demand reduction during peak periods [3]. Although both of these two DRMs can promote the active participation of loads, as mentioned in [6], PBDRM is more common than IBDRM, so this study mainly focuses on the PBDRM.

    Up to now, many efforts have been devoted to investigating the PBDRM [9]–[16], mainly from social and individual perspectives. From the social perspective, one expects to maximize the social benefit, including the interests of both the utility company (UC) and the users. For example, the work in [9] studies distributed real-time demand response (DR) in a multi-seller multi-buyer environment and proposes a distributed dual decomposition-based (DDB) method to maximize social welfare, i.e., the comfort of users minus the energy cost of the UC. Another work in [10] proposes a distributed fast-DDB DR algorithm to obtain the consumption/generation behavior of the end-user/energy-supplier that yields the optimal social welfare. In addition to common residential loads, the authors in [11]–[14] further consider a holistic framework that optimizes and controls the building heating, ventilation and air conditioning (HVAC) system and the residential energy management of smart homes under a dynamic retail pricing strategy, in addition to achieving DR goals. From the individuals' point of view, one prefers to reduce the electricity bills of users or to maximize the revenue of the UC by selecting appropriate pricing mechanisms. For example, the work in [15] studies a deterministic DR with day-ahead electricity prices, aiming to minimize the energy cost for customers. Some other works focus on the benefit of the UC; for instance, the objective in [16] is to minimize the energy cost of the UC.

    Nevertheless, most of these works are either based on a given pricing mechanism (e.g., TOU pricing in [8] and day-ahead pricing in [15]) or a predetermined abstract pricing model (e.g., the linear pricing strategy in [17]). That is to say, the existing PBDRM depends largely on deterministic pricing models, which cannot reflect the uncertainty and flexibility of the dynamic electricity market. Additionally, in the long run, the UC expects to estimate/predict the impact of its current retail pricing strategy on the immediate and all subsequent responses of loads. However, the existing works show that the UC is myopic and only focuses on the immediate response of loads to the current pricing strategy. In view of this, it is urgent to design a truly dynamic pricing mechanism that can adapt to flexible load changes and the dynamic electricity market environment. Moreover, it is necessary to develop an effective methodology to solve the dynamic PBDRM under an unknown electricity market environment.

    The development of artificial intelligence (AI) has prompted many experts and scholars to adopt learning-based methods (a powerful tool for sequential decisions within unknown environments [18]) to solve decision-making problems arising in the smart grid, such as the PEV charging problem [19], the energy management problem [20]–[24], and the demand-side management (DSM) problem [25], [26]. Specifically, in [19], the authors use a reinforcement learning (RL) algorithm to determine the optimal charging behavior of an electric vehicle (EV) fleet without prior knowledge of the exact model of each EV. The work in [20] develops an RL-based algorithm to solve the problems of dynamic pricing and energy consumption scheduling without requiring a priori system information. Some other works, see [21], [23]–[26], focus more on energy management and DSM (which usually refers to the DR for load units). For instance, in [21], the authors study the distributed energy management problem by means of RL, in which the uncertainties caused by renewable energies and continuous fluctuations in energy consumption can be effectively addressed. Moreover, to further improve the scalability and reliability of learning-based approaches [22] and reduce the power loss during the energy trading between energy generation and consumption, the authors in [23] propose a Bayesian RL-based approach with coalition formation, which effectively addresses the uncertainty in generation and demand. A model-free RL-based approach is employed in [24] to train the action-value function to determine the optimal energy management strategy in a hybrid electric powertrain. When considering the DSM, the authors propose a multiagent RL approach based on Q-learning in [25], which not only enhances the possibility of dedicating separate DR programs to different load devices, but also accelerates the calculation process. Another work in [26] investigates the building DR control framework and formulates the DR control problem as an MDP. On this basis, a cost-effective RL-based edge-cloud integrated solution is proposed, which shows good performance in control efficiency and learning efficiency in different-sized buildings. However, the considered energy management and DSM problems only involve general dispatchable loads (i.e., whose energy usage changes with the RP, such as air conditioning and lighting) while ignoring non-dispatchable loads (i.e., whose energy usage cannot be adjusted at any time, such as a refrigerator) and a class of more flexible loads, such as plug-in electric vehicles (PEVs) [27].

    Inspired by the application of RL in energy scheduling and trading, this paper adopts a model-free RL to learn the optimal RPs within an unknown electricity market environment. The main contributions are shown below:

    1) This paper studies the price-based residential demand response management (PB-RDRM) in a smart grid, in which both the non-dispatchable and dispatchable loads are considered. Compared with the existing PBDRM with general dispatchable loads such as household appliances [9] and commercial buildings [11], this work innovatively considers a more flexible PEV load with two working modes of charging and discharging.

    2) Unlike the existing works that focus on individual interests [15], [16], the considered PB-RDRM is modeled from a social perspective. Specifically, the PB-RDRM is composed of a bi-level optimization problem, where the upper level aims to maximize the profit of the UC and the lower level expects to minimize the comprehensive cost of loads. Therefore, the goal of the PB-RDRM is to coordinate the energy consumption of all loads to maximize the social welfare (i.e., the weighted sum of the UC's profit and the loads' comprehensive cost) under the dynamic RPs.

    3) Considering the uncertainty induced by energy consumption and RPs, as well as the temporally coupled constraints of PEVs, a model-free RL-based DR algorithm is proposed. The comparison results between the proposed model-free algorithm and two benchmarked model-based optimization approaches (i.e., distributed DDB [9] and PDI methods [10]) show that our proposed algorithm can not only adaptively decide the dynamic RPs by an on-line learning process, but also achieve the optimal PB-RDRM within an unknown electricity market environment.

    The organization of this paper is as follows. Section II presents the problem statement. Section III provides the RL-based DR algorithm, and Section IV conducts simulation studies to verify the performance of the proposed algorithm. Finally, conclusions are drawn in Section V.

    II. PROBLEM STATEMENT

    Consider a retail electricity market in a residential area as shown in Fig. 1, including the load (lower) and UC (upper) levels. Note that there is a two-way information flow between the UC and load levels. Specifically, the UC (a form of distribution system operator (DSO)) releases the information of dynamic RPs to the loads, while the loads deliver their energy demand information to the UC. The PB-RDRM aims to coordinate the energy consumption of a finite set N = {1, 2, ..., N} of residential loads within a time period T = {1, 2, ..., T} in response to the dynamic RPs, thereby maximizing social welfare. Since the problem model involves the information and energy interactions between the UC and the loads, their mathematical models are introduced first, after which the system objective is given.

    Fig. 1. Retail electricity market model in residential area.

    A. Load Modeling

    According to the users' preferences and the loads' energy consumption characteristics, the loads are usually classified into two categories [28], namely dispatchable loads Nd and non-dispatchable loads Nn. In this paper, in addition to the general dispatchable loads G, we consider a more flexible type of load, i.e., PEVs V. That is, Nd = G ∪ V.

    1) General Dispatchable Loads: The consumed energy of a general dispatchable load n ∈ G is described as [28], [29]

    where the two quantities in (1) are the energy consumption (kWh) and the energy demand (kWh) of general dispatchable load n at time slot t, respectively. Here the energy demand refers to the expected energy requirement of loads before they receive the RP from the UC, while the energy consumption is the actual consumed energy of loads after they receive the RP signal. ξt is the price elasticity coefficient, indicating the ratio of the energy demand change to the RP variation at time slot t. Note that ξt is usually negative to show the reciprocal relationship between energy demand and electricity price [28]. ηt and λt respectively represent the RP ($/kWh) and the wholesale price ($/kWh) at time slot t, and ηt ≥ λt holds. The intuition behind (1) is that the current energy consumption of general dispatchable load n depends on the current energy demand information and the demand reduction resulting from the change in RP. Note that when general dispatchable load n consumes less energy than it demands at time slot t, the remaining required energy cannot be satisfied, thus causing load n to experience dissatisfaction. To characterize such dissatisfaction, a dissatisfaction function is defined as follows [30]:

    where the two bounds are the lower and upper limits of the battery capacity (kWh) of PEV n, respectively. Considering the overall interests of the electricity market, it is impossible to completely obey the PEV owners' charging willingness, which leads to dissatisfaction of the PEV owners. Thus, the following dissatisfaction function is defined [30]:

    where κ ($/kWh) is the degradation coefficient.

    3) Non-Dispatchable Loads: Since the energy consumption of non-dispatchable loads cannot be shifted or curtailed, their energy demands are critical and must be met at all times. Therefore, for any n ∈ Nn, one has

    where the two quantities are the energy consumption and the energy demand of non-dispatchable load n at time slot t, respectively.
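    Since the load-model equations are only described verbally above, the following minimal Python sketch encodes one plausible reading of them: a general dispatchable load responds linearly to the retail-wholesale price gap through the elasticity coefficient ξt, while a non-dispatchable load simply consumes its full demand. The function names and the linear form assumed for (1) are illustrative assumptions, not the paper's exact formulas.

```python
def general_load_consumption(demand_kwh, retail_price, wholesale_price, xi_t):
    """Plausible reading of (1): the energy actually consumed equals the demand
    plus the elasticity-weighted deviation of the RP from the wholesale price.
    Because xi_t is usually negative, an RP above the wholesale price reduces
    consumption below the original demand."""
    return demand_kwh + xi_t * (retail_price - wholesale_price)

def non_dispatchable_consumption(demand_kwh):
    """A non-dispatchable load always consumes exactly its demand."""
    return demand_kwh

# Example: 2.0 kWh demand, RP 0.14 $/kWh, wholesale 0.10 $/kWh, xi_t = -5 kWh/($/kWh)
# -> 2.0 + (-5) * 0.04 = 1.8 kWh actually consumed by the general dispatchable load.
```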

    From the loads' point of view, one expects to decide the optimal energy consumption of all loads so as to minimize the comprehensive cost, which is described below:

    B. Utility Company Modeling

    For the UC, since it first purchases electrical energy from the grid operator at predetermined wholesale prices and then sells the purchased energy to the various types of loads at RPs set by itself, the goal of the UC is to select optimal RPs so as to maximize its profit, i.e.,

    C. Problem Formulation

    Recall that the aim of the PB-RDRM is to adjust the energy usage patterns of loads to cope with time-varying RPs, so as to maximize social welfare (including both the UC's profit ($) and the loads' comprehensive cost ($)) from a social perspective. Therefore, the considered PB-RDRM can be formulated as the following optimization problem:

    where ρ ∈ [0,1] is a weighting parameter to show the relative social value of the UC's profit and the loads' comprehensive cost from the social perspective [29], [32]. It is worth mentioning that many optimization methods, such as two-stage stochastic programming, Lyapunov optimization techniques, model predictive control, and the robust optimization in [13], have been used to solve DRM problems similar to the optimization problem (14). Although the above methods are relatively mature, they still have the following limitations: 1) They require prior knowledge of the exact load model. However, the model of some loads, like the PEVs considered in this paper, is affected by many factors, such as the temporally coupled SoC constraints [31] and the randomness of EVs' commuting behavior [33], so an exact load model is often difficult to obtain or even unavailable; 2) They depend on accurate prediction of uncertain parameters (e.g., the RP in the current work). However, in most cases, the prediction error cannot be guaranteed to be small enough, which affects the performance of the optimization approaches; 3) Almost all of the above methods are off-line; therefore, the entire calculation process must be fully completed before the best result can be chosen, which is time-consuming when the problem size is large. To tackle the above limitations, we next adopt an RL-based approach which can adaptively determine the optimal policy through an on-line learning process without requiring the exact load model.
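    Since the profit, cost, and welfare expressions are not reproduced above, the sketch below states the natural reading of this formulation in Python: the UC's profit is the retail-minus-wholesale margin on the energy actually sold, and the social objective of (14) is a ρ-weighted combination of that profit and the loads' comprehensive cost. All names and the exact per-slot summation are illustrative assumptions, not the paper's formulas.

```python
def uc_profit(retail_prices, wholesale_prices, consumption):
    """Natural reading of the UC objective: the (RP - wholesale) margin earned
    on every kWh sold, summed over time slots t and loads n.
    consumption[t][n] is the energy (kWh) consumed by load n at slot t."""
    return sum((eta_t - lam_t) * x_nt
               for eta_t, lam_t, slot in zip(retail_prices, wholesale_prices, consumption)
               for x_nt in slot)

def social_welfare(profit, comprehensive_cost, rho):
    """Weighted objective of (14): rho trades off the UC's profit against the
    loads' comprehensive cost, consistent with the reward r_t = rho*U_t - (1-rho)*C_t
    used later in the MDP formulation."""
    return rho * profit - (1.0 - rho) * comprehensive_cost
```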

    III. REINFORCEMENT LEARNING-BASED DR ALGORITHM

    This section discusses how to employ the RL method for the UC to decide the optimal retail pricing policy so as to solve the PB-RDRM.

    A. A Brief Overview of RL

    RL is a type of machine learning (ML) approach that evolved from behaviorist psychology, focusing on how an agent can find an optimal policy within a stochastic/unknown environment that maximizes the cumulative reward [18]. Different from supervised ML, RL explores the unknown environment through continuous actions, continuously optimizes its behavioral strategy according to the reward provided by the environment, and finally finds the optimal policy (i.e., a sequence of actions) that yields the maximum cumulative reward. Fig. 2 depicts the basic principle of RL. Specifically, at the initial moment the agent does not know what reward and next state the environment will produce when it takes the current action, and thus has no knowledge of how to choose actions to maximize the cumulative reward. To tackle this issue, at the initial state s0, the agent randomly takes an action a0 from the action set and acts on the environment, resulting in the state of the environment moving from s0 to s1 (purple arrows). At the same time, the agent receives an immediate reward r0 from the environment (orange arrows). The process repeats until the end of one episode (the completion of a task or the end of a period of time). Moreover, the current action affects both the immediate reward as well as the next state and all future rewards. Therefore, RL has two significant characteristics, namely delayed reward and trial-and-error search.

    Fig. 2. Basic principle of RL.

    B. Mapping the System Model to RL Framework

    To determine the optimal RPs, we first use the RL framework to describe the retail electricity market model (see Fig. 3). Specifically, the UC acts as the agent, all the loads serve as the environment, the retail prices are the actions that the agent applies to the environment, the energy demand, the energy consumption of loads, and the time index constitute the state, and the social welfare (i.e., the weighted sum of the UC's profit and the loads' comprehensive cost) is the reward. Then, we further adopt a discrete-time Markovian decision process (MDP) to model the dynamic retail pricing problem, as this is usually the first step when using an RL method [34], [35]. The MDP is represented by a quintuple, wherein each component is described as follows:

    Fig. 3. Illustration of RL framework for retail electricity market model.

    Fig. 4. Energy demand/consumption of non-dispatchable loads.

    Fig. 5. Energy demand of dispatchable loads. (a) General dispatchable loads (b) PEVs.

    1) State Set: S = {s1, s2, ..., sT}, where st = (et, pt, t). The environment state at time slot t is represented by three kinds of information, i.e., the energy demand et and energy consumption pt of all loads, and the time step t;

    2) Action Set: A = {a1, a2, ..., aT}, where at = ηt. To be specific, the action at time slot t is the RP ηt that the UC sets for all loads at that time;

    3) Reward Set: R = {r1, r2, ..., rT}, where rt = ρUt − (1−ρ)Ct. That is to say, the reward at time slot t is the social welfare received by the system at that time;

    4) State Transition Matrix: P = {Pss′}, where Pss′ = P{st+1 = s′ | st = s, at = a} is the probability that the environment moves to the next state s′ when action a is taken at state s. Since the energy demand and consumption of loads are affected by many factors, the state transition probabilities are rather difficult to obtain; therefore, we employ a model-free Q-learning method to solve the dynamic retail pricing problem;

    5) Discount Factor: γ ∈ [0,1], indicating the relative importance of subsequent rewards with respect to the current reward.

    One episode of the MDP is denoted by (s1, a1, s2, r1; a2, s3, r2; ...; aT−1, sT, rT−1). The total return of one episode, which represents the cumulative reward, is R = r1 + r2 + ··· + rT−1. Due to the delayed reward feature of RL, the discounted future return from time slot t is usually expressed as Rt = rt + γ rt+1 + γ² rt+2 + ···, where γ ∈ [0,1] is the discount factor: γ = 0 implies that the system is totally myopic and only focuses on the immediate reward, while γ = 1 means that the system treats all rewards equally. Thus, to reflect the foresight of the system, one usually chooses an appropriate discount factor. Here note that since we focus on the social welfare over the entire time horizon, the reward at each time slot is equally important; as a result, the discount factor is set to 1 in our problem formulation. In addition, denote the policy by π, which is a mapping from states to actions, i.e., π: S → A. Then the retail pricing problem aims to seek the optimal policy π* that maximizes the cumulative return, i.e.,
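    To make this mapping concrete, the short sketch below mirrors the MDP ingredients above in Python: the state is the (demand, consumption, time) tuple, the per-slot reward is the weighted social welfare rt = ρUt − (1−ρ)Ct, and the return is computed with γ = 1 as argued in the text. The container layout and names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class State:
    demand: List[float]       # e_t: energy demand of all loads at slot t (kWh)
    consumption: List[float]  # p_t: energy consumption of all loads at slot t (kWh)
    t: int                    # time-slot index

def slot_reward(uc_profit_t: float, loads_cost_t: float, rho: float) -> float:
    """Per-slot reward r_t = rho * U_t - (1 - rho) * C_t (the social welfare)."""
    return rho * uc_profit_t - (1.0 - rho) * loads_cost_t

def episode_return(rewards: List[float], gamma: float = 1.0) -> float:
    """Discounted return of one episode; the paper uses gamma = 1 so that every
    time slot of the horizon contributes equally to the social welfare."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```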

    C. Algorithm Implementation

    After mapping the retail pricing problem to the MDP framework, the RL method can be used to seek the optimal retail pricing policy. Here we adopt Q-learning, one of the model-free RL methods, to analyze how the UC chooses RPs while interacting with all loads to achieve the system objective (14). Almost all RL methods rely on the estimation of value functions, which refer to a set of functions related to states (or state-action pairs), to evaluate the performance of the agent in a given state (or at a given state-action pair). Thus, the basic principle of Q-learning is to assign an action-value function (i.e., Q(s,a)) to each state-action pair (s,a) and update it at each iteration so as to acquire the optimal Q(s,a). The optimal action-value function Q*(s,a) is defined as the maximum cumulative discounted future return starting from state s, taking action a, and thereafter following the optimal policy π*, which obeys the Bellman optimality equation [18], i.e.,

    where E is the expectation operator used to capture the randomness of the next state and reward, s′ ∈ S represents the state at the next time slot, and a′ ∈ A is the action adopted at state s′. Therefore, by acquiring Q*(s,a), one can immediately obtain the optimal policy by
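    As a small illustration of the policy-extraction step in (17), the sketch below reads the greedy action out of a tabular action-value function stored as a Python dictionary keyed by (state, action) pairs; the dictionary layout is an assumption for illustration.

```python
def greedy_policy(Q, state, actions):
    """Return the action maximizing Q(s, a) for the given state, i.e., the
    greedy policy extracted from a learned tabular action-value function.
    Unseen (state, action) pairs default to a Q-value of 0."""
    return max(actions, key=lambda a: Q.get((state, a), 0.0))
```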

    The overall implementation of the RL-based DR algorithm is summarized in Algorithm 1. Specifically, to begin with, a set of predefined parameters is input, including the loads' energy demands, dissatisfaction coefficients, price elasticity coefficients, wholesale prices, and weighting parameters, etc. Then the action-value function Q0(s,a) is initialized to zeros, and the UC learns the optimal RP policy by the following steps:

    S1) Observe the state st and select an action at with an ε-greedy policy within the RP boundaries.

    S2) After performing at, the UC obtains an immediate reward rt = ρUt − (1−ρ)Ct and observes the next state st+1.

    S3) Update Qk(st, at) by the following mechanism:

    where θ ∈ [0,1] is the learning rate, indicating the degree to which newly obtained Q-values override the old ones.

    S4) Check whether the end of one episode has been reached (that is, whether the final time slot T has been reached); if not, go back to S1); otherwise, go to S5).

    S5) Check the stopping criterion. Specifically, compare the values of Qk and Qk−1 to see whether the algorithm has converged; if not, go to the next iteration; otherwise, go to S6).

    S6) Calculate the optimal retail pricing policy based on (17).

    S7) Calculate the optimal energy consumption of dispatchable loads by (1) and (4).

    Remark 1: The basic principle of the ε-greedy policy (the most common exploration mechanism in RL) is to either choose a random action from the action set A with probability ε or select the action that corresponds to the maximum action-value function with probability 1 − ε. Such an exploration and selection mechanism not only avoids complete randomness of the system but also promotes efficient exploration of the action space, and thus can adaptively decide the optimal policy (i.e., the dynamic RPs) through the on-line learning process. Moreover, the iterative stopping criterion is |Qk − Qk−1| ≤ δ, where δ is a very small positive constant indicating the gap tolerance between the previous Q-value and the current one, ensuring that the Q-value eventually approaches the maximum. That is to say, the proposed RL-based DR algorithm is guaranteed to reach an optimal PB-RDRM solution within an unknown electricity market environment.

    Algorithm 1 RL-Based DR Algorithm
    1: Input: A set of predefined parameters
    2: Initialize: An initial action-value function Q0(s,a) = 0, k = 0, t = 0
    3: Iteration:
    4: For each episode do
    5: Repeat: t ← t+1, k ← k+1
    6: Step 1: Observe the state st (i.e., energy demand, energy consumption, and time step) and choose an action at (i.e., RP) using the ε-greedy policy
    7: Step 2: Calculate the immediate reward rt = ρUt − (1−ρ)Ct and observe the next state st+1
    8: Step 3: Update the action-value function Qk(st, at) by (18)
    9: Step 4: Check whether to reach the end of one episode
    10: if t = T
    11: break;
    12: end if
    13: Step 5: Check the stopping criterion
    14: if |Qk − Qk−1| ≤ δ
    15: break;
    16: end if
    17: Step 6: Compute the optimal retail pricing policy by (17)
    18: Step 7: Compute the optimal energy consumption by (1) and (4)
    19: Output: The optimal energy consumption profile
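    The sketch below condenses steps S1)–S5) and the inner loop of Algorithm 1 into a tabular Q-learning routine with ε-greedy exploration and the |Qk − Qk−1| ≤ δ stopping test. The environment interface (env.reset, env.step) and all hyper-parameter values are assumptions introduced only to make the pseudocode executable; this is an illustrative sketch, not the authors' MATLAB implementation.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=5000, theta=0.1, gamma=1.0,
               epsilon=0.1, delta=1e-3):
    """Tabular Q-learning loosely mirroring Algorithm 1 (illustrative sketch).
    Assumed interface: env.reset() -> initial state; env.step(a) -> (next_state,
    reward, done), where reward is rho*U_t - (1-rho)*C_t and done marks t = T."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done, max_change = env.reset(), False, 0.0
        while not done:
            # S1) epsilon-greedy selection over the discretized RP action set
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            # S2) act on the loads, observe the social-welfare reward and next state
            next_state, reward, done = env.step(action)
            # S3) Q-value update as in (18), with learning rate theta
            target = reward if done else reward + gamma * max(Q[(next_state, a)] for a in actions)
            change = theta * (target - Q[(state, action)])
            Q[(state, action)] += change
            max_change = max(max_change, abs(change))
            state = next_state
        # S5) stopping criterion: largest Q-value change in this episode below delta
        if max_change <= delta:
            break
    return Q
```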

    IV. CASE STUDIES

    This section conducts case studies to verify the performance of the proposed RL-based algorithm. In particular, two model-based optimization approaches (i.e., the distributed DDB [9] and PDI [10] methods) are adopted as benchmarks for comparison. The algorithms are implemented in MATLAB R2016a on a desktop PC with an i3-6100 CPU @ 3.70 GHz, 8 GB of RAM, and a 64-bit Windows 10 operating system.

    A. Case Study 1: Effectiveness Verification

    1) Performance Evaluation: In this case, we consider the DRM of a residential area with 5 non-dispatchable loads and 10 dispatchable loads (including 6 general dispatchable loads and 4 PEVs) over a whole day (i.e., 24 hours). The energy demand profiles of the non-dispatchable and dispatchable loads are obtained from San Diego Gas & Electric [36] and shown in Figs. 4 and 5. As shown in Fig. 5, the energy demand trends of all six general dispatchable loads are almost the same, resulting in two demand peaks (i.e., 10:00–15:00 and 19:00–22:00). Thus, if the actual energy consumption of those dispatchable loads is not properly coordinated, the electricity burden on the power grid will be largely increased and the economic operation of the electricity market cannot be guaranteed. Additionally, note that since we consider residential PEVs, their arrival and departure times are almost fixed and known in advance. The wholesale prices are determined by the grid operator and are derived from Commonwealth Edison Company [37] (see also Fig. 6). The remaining related parameters of the dispatchable loads are listed in Table I. For illustration, the numerical values of the wholesale price and the remaining related parameters (including the elasticity coefficients ξt, weighting factor ρ, learning rate θ, and gap tolerance δ) are summarized in Table II. Note that the action space is discretized with a step of 0.1; that is, the RP increases or decreases by a multiple of 0.1 at each iteration. The numerical results of the proposed RL-based DR algorithm are shown below.
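    As a small illustration of the discretization described above, the snippet below builds a discrete RP action set between assumed lower and upper price bounds with a step of 0.1 $/kWh; the bound values are hypothetical placeholders, not the paper's settings.

```python
def build_action_set(rp_min, rp_max, step=0.1):
    """Discrete RP action set: every price from rp_min to rp_max in increments
    of `step` (0.1 in this case study)."""
    n_levels = int(round((rp_max - rp_min) / step)) + 1
    return [round(rp_min + k * step, 10) for k in range(n_levels)]

# Hypothetical bounds: RPs from 0.1 to 1.0 $/kWh -> 10 candidate actions per slot.
actions = build_action_set(0.1, 1.0)
```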

    Fig. 6. Daily optimal retail pricing policies of loads. (a) Non-dispatchable load. (b) General dispatchable load. (c) PEV.

    Fig. 6 shows the daily optimal retail pricing policies received by the three types of loads. It can be observed that the trends of the RPs and wholesale prices are similar, which is justified in terms of maximizing social welfare. Moreover, all the RPs fall within the lower and upper price bounds, thus satisfying the constraint (13). It is worth noting that due to the changes of price elasticity from off-peak to mid-peak (on-peak) periods at 12:00 (16:00), there appear to be sudden decreases at these two time slots. From the definition of price elasticity, one knows that a continuous increase in RP may lead to more demand reduction, thus causing great reductions in the UC's profit. Another observation is that the price difference (i.e., retail price minus wholesale price) for each load unit during the three periods satisfies: off-peak > mid-peak > on-peak. This is because the price elasticity coefficient of the on-peak period is smaller than those of the mid- and off-peak periods. Once the optimal RPs for all loads are obtained, the optimal energy consumption can be directly calculated by (1) and (4), as shown in Fig. 7. One can see that the PEV discharges at some peak hours to relieve the electricity pressure and increase its own profit. For further analysis, the total demand reduction of each dispatchable load is displayed in Fig. 8. It can be observed that Gen6 reduces less energy compared with the other dispatchable loads, because a load unit with a larger αn prefers a smaller demand reduction to avoid experiencing more dissatisfaction.

    Next, we proceed to verify the convergence of Algorithm 1, that is, to judge whether the Q-values converge to their maxima. For clarity, we choose five Q-values of each type of load as an example, and the numerical results are displayed in Fig. 9. Clearly, at the beginning, the UC has no knowledge of which RP can result in a larger reward. But as the iterations proceed, since the UC learns the dynamic responses of the loads through trial and error, the Q-values gradually increase and eventually converge to their maxima.

    Now let us move on to the discussion of the impact of ρ and θ. Since the demand reduction of loads is closely related to the time-varying price elasticity, fixed elasticity coefficients are not representative. To tackle this issue, we adopt Monte Carlo simulations (2000 simulations with changing elasticity coefficients) to capture the trends of the average RP, the UC's total profit, and the loads' total cost as the weighting parameter ρ changes. As shown in Fig. 10, the average RP (red solid line), the UC's total profit (black solid line), and the loads' total cost (blue solid line) increase as ρ varies from 0 to 1 with a step of 0.1. This is because an increase in ρ means that, from a social point of view, maximizing the profit of the UC is more important than minimizing the comprehensive cost of loads, thereby resulting in an increase in the RP determined by the UC. Correspondingly, as the RP increases, the amount of energy consumed by the loads is gradually reduced, leading to a slight increase in the total cost of the loads. Fig. 11 shows that as θ increases from 0 to 1, the convergence of the Q-values gradually becomes faster. In particular, θ = 0 means the UC learns nothing, while θ = 1 means the UC focuses only on the latest information.
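    The following rough sketch illustrates the Monte Carlo ρ-sweep described above: for each ρ from 0 to 1 in steps of 0.1, the elasticity coefficients are randomly perturbed and the average RP, UC profit, and loads' cost are averaged over the draws. The helper run_pbrdrm, the elasticity range, and the return format are hypothetical stand-ins for one full training/evaluation run and are not taken from the paper.

```python
import random

def monte_carlo_rho_sweep(run_pbrdrm, n_runs=2000, horizon=24):
    """For each weighting parameter rho, average the (avg RP, UC profit, loads'
    cost) triple returned by run_pbrdrm(rho, elasticity) over Monte Carlo draws
    of the hourly price-elasticity coefficients (assumed interface)."""
    results = {}
    for rho in [round(0.1 * k, 1) for k in range(11)]:
        samples = []
        for _ in range(n_runs):
            # Hypothetical elasticity range; the paper's values are listed in Table II.
            elasticity = [random.uniform(-6.0, -2.0) for _ in range(horizon)]
            samples.append(run_pbrdrm(rho, elasticity))
        results[rho] = tuple(sum(col) / n_runs for col in zip(*samples))
    return results
```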

    Fig. 7. Daily energy consumption of loads. (a) General dispatchable load.(b) PEV.

    Fig. 8. Total demand reduction of dispatchable loads.

    Fig. 9. Convergence of Q-values. (a) Non-dispatchable load. (b) General dispatchable load. (c) PEV.

    Fig. 10. Impact of ρ on average retail price, UC's total profit, and loads'total cost.

    Fig. 11. Impact of learning rate θ on convergence of Q-values.

    TABLE I RELATED PARAMETER SETTINGS OF DISPATCHABLE LOAD UNITS

    TABLE II NUMERICAL VALUES OF WHOLESALE PRICE AND OTHER RELATED PARAMETERS

    2) Comparison With Benchmarks: To further evaluate the effectiveness of the proposed model, we first adopt a benchmark without PBDRM for comparison. Fig. 12 shows the daily energy consumption of all loads under two different situations, namely with and without PBDRM. Note that for the sake of illustration, the figure plots the average RP of all loads. It can be seen from Fig. 12(a) that there is no energy demand reduction or shift in the absence of PBDRM, resulting in a fluctuating energy consumption profile. By contrast, as shown in Fig. 12(b), with the help of PBDRM, the loads reduce their energy consumption when the price is high, resulting in less total energy consumption and a smoother profile. Therefore, the proposed PB-RDRM effectively coordinates the energy consumption of residential loads and significantly improves the social welfare of the residential retail electricity market.

    Fig. 12. Energy consumption of all loads. (a) without and (b) with PBDRM.

    Then, the proposed RL-based DR algorithm is benchmarked against two model-based optimization algorithms (i.e., the distributed DDB [9] and PDI [10] methods) and against a scheme with perfect information of the random parameters (i.e., the RP and the EVs' commuting behavior). Note that both of the two model-based optimization approaches rely on deterministic load and price models. This is because the PB-RDRM is essentially a bi-level optimization problem, so when the problem model is accurately formulated, one can use conventional optimization techniques to solve it directly. Specifically, by means of the DDB method in [9], the optimization problem of all lower-level loads can be decoupled into several sub-optimization problems, each of which can be solved in a distributed manner. In the simulations, the initial Lagrangian multipliers are set to zeros. In addition, another optimization technique, namely the PDI method, has been shown to be effective in dealing with DR in the smart grid [10]. In our comparative simulation study, the initial random dual vector is set to 0.5, α = 0.05, σ = 2.5, β = 0.2, and εν = εfeas = 10−2. Note that the above parameter settings correspond to the algorithm in [10] and are independent of the parameters presented in this paper. The comparison results are shown in Fig. 13. It can be observed that the trends of the RPs (solid lines) obtained by the four compared algorithms are similar, but the energy consumption profile (bar charts) learned by our proposed RL-based algorithm is much smoother than those of the two model-based optimization approaches. Moreover, the trends of the RPs and energy consumption profiles obtained by our proposed approach are the closest to those of the scheme with perfect information of the random parameters (referred to as “Com-based” in Fig. 13). In addition, Table III lists the numerical comparison results of the UC's total profit, the loads' total cost, and the social welfare. It can be observed from Table III that the scheme with complete information of the random parameters provides an upper bound on the social welfare generated by the PB-RDRM considered in this paper. Moreover, the result of our proposed RL-based approach is closer to this upper bound than those of the other two model-based optimization approaches.

    Fig. 13. Comparison results of the total energy consumption and RP solutions of the RL-based algorithm and three benchmarked algorithms (including DDB, PDI, and Com-based).

    TABLE III NUMERICAL COMPARISON RESULTS

    Moreover, note that the complexity of RL algorithms applied to goal-directed exploration tasks, especially Q-learning and its variants, has been thoroughly analyzed in an earlier paper [38]. Specifically, according to Theorem 2 and Corollary 3 in [38], the complexity of the one-step Q-learning algorithm (corresponding to the inner loop of the RL-based DR algorithm proposed in the current work) is O(md), where m and d are the total number of state-action pairs and the maximum number of actions that the agent executes to reach the next state, respectively. Therefore, the complexity of our proposed RL-based DR algorithm is O(md), which obviously depends on the size of the state space. As for the two compared model-based approaches (i.e., the distributed DDB [9] and distributed PDI [10]), they are shown to have polynomial-time complexity. Therefore, with the increase in the number of residential loads and the time horizon, the proposed algorithm has a complexity comparable to the distributed DDB and PDI methods. In addition, considering the performance advantages of our proposed algorithm in addressing the uncertainty of RPs and energy consumption, as well as in yielding larger social welfare, it turns out that the proposed RL-based DR algorithm can effectively solve the PB-RDRM within an unknown electricity market environment.

    B. Case Study 2: Scalability Verification

    Next, to verify the scalability of the proposed algorithm, we consider more loads (i.e., the total number of loads changes from 50 to 200) participating in the PB-RDRM. Fig. 14 traces the convergence rate of the Q-values with different numbers of loads. It can be observed that the more loads there are, the more iterations are required for the Q-values to converge. Note that the iterative process for 200 loads takes about 7.17×10³ s, while that for 50 loads takes about 3.89×10³ s. The main reason for such an increase in time and number of iterations is that when one load is added, there are W^24 permutations to perform, where W ∈ {1, ..., |A|}. Fortunately, with the advent of a new generation of advanced computing platforms such as grid computing and cloud computing [2], such computing pressure is no longer an obstacle to the development of smart grids, as they synthesize and coordinate various local computing facilities, such as smart meters, to provide the required sub-computing and storage tasks.

    Fig. 14. Convergence rate of Q-values with different numbers of loads.

    V. CONCLUSION

    This paper investigates the PB-RDRM problem, in which flexible PEVs are innovatively considered. We first formulate the upper-level dynamic retail pricing problem as an MDP with unknown state transition probabilities from the social perspective. Then a model-free RL-based approach is proposed to obtain the optimal retail pricing policies to coordinate the energy consumption profiles. The proposed approach is shown to address the uncertainty induced by energy consumption and RPs, as well as the temporally coupled constraints of PEVs, without any prior knowledge of the exact models of the loads and electricity prices. The simulation results show that our proposed RL-based algorithm can not only adaptively decide the dynamic RPs through the on-line learning process, but also outperform the model-based optimization approaches in solving the PB-RDRM within an unknown market environment.

    Note that the tabular Q-learning algorithm used here is limited by changing dimensions of the state vector. However, unlike in a commercial area, the number of loads in residential areas is almost fixed, so the dimension of the state vector is constant in the considered PB-RDRM. Therefore, the corresponding Q-table does not need to be reconstructed and trained repeatedly; that is, the proposed Q-learning-based algorithm is applicable to the investigated PB-RDRM. In the future, we intend to use function approximation or neural networks to replace the Q-table so as to extend the algorithm to larger problems. We will also focus on the pros and cons of both PBDRM and IBDRM to explore the coordination between these two DRMs.
