
    Safety-Constrained Multi-Agent Reinforcement Learning for Power Quality Control in Distributed Renewable Energy Networks

Computers, Materials & Continua, 2024, Issue 4

Yongjiang Zhao, Haoyi Zhong, and Chang Cyoon Lim

Department of Computer Engineering, Chonnam National University, Yeosu, 59626, South Korea

ABSTRACT This paper examines the difficulties of managing distributed power systems, notably due to the increasing use of renewable energy sources, and focuses on voltage control challenges exacerbated by their variable nature in modern power grids. To tackle the unique challenges of voltage control in distributed renewable energy networks, researchers are increasingly turning towards multi-agent reinforcement learning (MARL). However, MARL raises safety concerns due to the unpredictability in agent actions during their exploration phase. This unpredictability can lead to unsafe control measures. To mitigate these safety concerns in MARL-based voltage control, our study introduces a novel approach: Safety-Constrained Multi-Agent Reinforcement Learning (SC-MARL). This approach incorporates a specialized safety constraint module specifically designed for voltage control within the MARL framework. This module ensures that the MARL agents carry out voltage control actions safely. The experiments demonstrate that, in the 33-buses, 141-buses, and 322-buses power systems, employing SC-MARL for voltage control resulted in a reduction of the Voltage Out of Control Rate (%V.out) from 0.43, 0.24, and 2.95 to 0, 0.01, and 0.03, respectively. Additionally, the Reactive Power Loss (Q loss) decreased from 0.095, 0.547, and 0.017 to 0.062, 0.452, and 0.016 in the corresponding systems.

KEYWORDS Power quality control; multi-agent reinforcement learning; safety-constrained MARL

    1 Introduction

The conventional utilization of non-renewable energy sources for electricity generation has given rise to issues of energy scarcity and environmental pollution. To address these concerns, an increasing reliance on renewable energy sources, such as solar and wind power, has been observed in power generation. However, the intermittent and volatile nature of renewable energy sources poses significant challenges to the safe operation and stability of the electrical grid when they are integrated into it. Within the grid, it is imperative to maintain specific ranges of both frequency and voltage to ensure the normal operation of various devices. Frequent fluctuations and unstable power supply can result in power quality issues, including voltage fluctuations, harmonics, and intermittent power supply, which may adversely affect grid stability and impact sensitive electronic equipment and industrial processes [1]. This article primarily explores how to mitigate the instability of renewable energy-based electricity generation through voltage control. Active power voltage control involves the adjustment of the active power level within the electrical system to ensure that the grid voltage is maintained within suitable limits. This can be achieved through the adjustment of generator output, the utilization of reactive power compensation devices (such as synchronous condensers or capacitors), and the implementation of voltage stabilizers to alleviate issues related to overvoltage and undervoltage [2].

Active power voltage control has always played a crucial role in traditional distribution grids. However, with the increasing integration of distributed renewable energy sources into the grid, an excessive injection of active power may lead to voltage fluctuations beyond the prescribed thresholds in the grid [3]. This renders traditional control algorithms, such as droop control and optimal power flow (OPF) [4], less adaptable to the uncertainty in distributed renewable energy networks. Droop control is a relatively simple control strategy that does not necessitate complex optimization algorithms or communication systems. Nevertheless, the parameters of droop control are typically fixed, making it less responsive to changes in grid conditions. On the other hand, OPF offers flexibility by adjusting to grid requirements and operational constraints, but it relies on intricate mathematical models and computations, often demanding high-performance computing and real-time data updates. The active power voltage control problem exhibits two primary characteristics: (1) In a distributed network, voltage is influenced by neighboring nodes, and as the distance between nodes increases, the likelihood of mutual influence decreases. (2) It presents a constrained multi-objective optimization problem, where the objective is to maintain the voltage within prescribed limits for all buses while minimizing total power losses [5].

Multi-agent reinforcement learning (MARL) is a subfield of reinforcement learning that involves multiple intelligent agents operating in a shared environment, learning to make decisions that maximize their individual long-term cumulative rewards. Each agent acts as an independent learning entity, but their behaviors interact with and influence each other since they share the environment and may pursue common or competitive objectives. MARL typically encompasses cooperative collaboration and competitive rivalry, addressing complex issues of coordination, communication, and competition. In the context of the power system, distributed voltage control is a critical concern for ensuring grid stability and voltage quality. Conventional voltage control methods typically involve centralized control or rule-based local control, which may lack the flexibility and efficiency needed in complex distributed energy environments. Consequently, the application of MARL holds significant potential in the realm of distributed voltage control [6].

Currently, many research efforts aim to employ MARL to address the issue of distributed voltage control [6–10]. Each of these studies has demonstrated that MARL-based approaches outperform traditional control algorithms. However, most of these works have not considered large-scale power grids. To assess the performance of MARL, this paper utilizes three power grids of varying scales, which can be employed to validate the effectiveness of MARL in different grid sizes and identify MARL algorithms suitable for voltage control. While MARL surpasses traditional algorithms in the context of distributed voltage control, it frequently exhibits risky actions during the early training and transfer processes, thereby increasing the risk of grid failure. Therefore, to ensure the safety of MARL throughout its training, testing, and transfer phases, this paper proposes the inclusion of safety constraints on MARL to guarantee both secure actions and the safe operation of the power grid. The primary contributions of this paper are as follows:

1. To address voltage control, we propose safety-constrained multi-agent reinforcement learning (SC-MARL) based on MARL. SC-MARL integrates a safety constraint module to derive secure actions, ensuring the safety of MARL-controlled actions during the voltage control process.

2. We conducted a comparative analysis of various MARL algorithms and the SC-MARL algorithm. Experimental results demonstrate that, in contrast to other MARL algorithms, SC-MARL consistently ensures the generation of safe actions. Moreover, throughout the training and testing phases, the proportion of voltage exceeding the safety range approaches zero.

3. Our experiments involved three different scales of power grids: 33-buses, 141-buses, and 322-buses. SC-MARL exhibited optimal performance across all three power grids, underscoring its adaptability as the scale of the power grid expands.

The remainder of this paper is organized as follows. Section 2 describes the work related to traditional methods and MARL for active voltage control. Section 3 formulates the power quality control problem and solves it using safety-constrained MARL. The experiments and results are described in detail in Section 4. Finally, we summarize our work in Section 5.

    2 Related Work

Traditional distribution networks typically have few or no renewable energy generation nodes, and voltage control is primarily managed through voltage control devices such as static var compensators (SVCs) and on-load tap changers (OLTCs) [11]. These voltage control devices are often installed at substations, limiting their control to the voltages at these substations, while more distant nodes or buses are not effectively regulated [12]. The integration of many photovoltaic (PV) systems into the grid has raised interest in controlling the reactive power output of PVs to adjust the voltages at their respective buses. In this context, methods such as OPF and droop control are frequently employed to address voltage control issues. OPF is typically considered in two main variants: centralized OPF [13] and distributed OPF [14]. Centralized OPF primarily addresses the problem of minimizing overall power losses, while distributed OPF aims to decentralize computation using the alternating direction method of multipliers (ADMM) to enhance efficiency. However, OPF's primary limitation is its inability to achieve real-time control in dynamic power systems, particularly in rapidly changing voltage scenarios [15]. On the other hand, droop control is often employed for local voltage control, offering strong real-time responsiveness. Yet, distributed networks require communication to enable global voltage stability [16]. In summary, while OPF can minimize power losses, it lacks real-time responsiveness, and droop control offers real-time capabilities but does not optimize power losses. Therefore, this paper adopts MARL to learn voltage control strategies, striking a balance between real-time responsiveness and power loss minimization.

MARL is increasingly being applied by researchers to address the issue of active voltage control. Commonly used MARL algorithms in research include multi-agent deep deterministic policy gradient (MADDPG) [8], multi-agent soft actor critic (MASAC) [17], and the multi-agent twin delayed deep deterministic policy gradient algorithm (TD3) [7], among others. These studies typically divide the distributed renewable energy network into numerous regions, with each region having a single agent. Each agent can manipulate the reactive power to affect the active power, with a focus on centralized training and decentralized execution (CTDE) frameworks. CTDE has the advantage of allowing each agent to learn global information and then execute actions quickly at the local level. However, these studies do not account for scenarios in which a region contains multiple agents or when the network scales up. To address these issues, the authors of [18] model the active voltage control problem as a decentralized partially observable Markov decision process (Dec-POMDP), in which each region contains multiple PV inverters. The control objective is to regulate the active power by adjusting the reactive power of each inverter. Additionally, they evaluate the performance of various MARL algorithms in three different scales of power networks [19]. Nonetheless, a common limitation of MARL algorithms is their failure to consider security concerns during training, testing, and transfer learning.

In the context of power grids, operational safety is of paramount importance, as actions that exceed safety boundaries can potentially lead to grid failures. Therefore, ensuring the safety of MARL involves agents maximizing rewards while simultaneously ensuring that their actions comply with safety requirements. One approach to ensuring the safety of MARL is by designing a safe reward function that encourages MARL to favor actions with high rewards. However, this does not guarantee safety during the model training phase [20]. To ensure safety during the training phase, the constrained policy optimization method can be employed, which restricts the policy's single-step updates through trust region optimization [21]. Nonetheless, this approach does not guarantee safety during the agent's testing phase. The utilization of Lyapunov functions, which guide policy learning, can provide safety assurance for both training and testing phases, but it is limited to global safety. To enable agents to take safe actions during training, testing, and transfer learning, we propose a safety-constrained MARL method. This primarily involves calculating safe actions by taking derivatives with respect to each action through the observed voltage changes, facilitated by the Lyapunov functions.

    3 Methodology

    3.1 Distributed Renewable Energy Network

The distributed renewable energy network is primarily divided into three components: generation, transmission, and local renewable energy systems, as illustrated in Fig. 1. Power generation involves the utilization of sources such as hydro energy, nuclear energy, and thermal energy. To minimize losses during the generation and transmission process, high voltages are often employed. The transmission component transports and distributes the electricity generated by power plants to residential or industrial consumers. To meet users' electricity demands, voltage is adjusted to accommodate the load through transformers. In this paper, renewable energy primarily refers to solar energy. Solar panels are not mandatory on the user side, so solar farms are introduced to address this issue. Local management entities communicate with users, solar farms, and smart meters through a network to obtain power parameter information. Solar power cannot be directly injected into the grid and requires conversion through voltage transformers. Due to the intermittent nature of solar power generation, voltage instability occurs on the user side, with the risk of voltage exceeding safe levels and causing reverse current flow into the grid. After acquiring power information, local management entities control the reactive power of the PV voltage converter to adjust the voltage within a safe range. Symbol definitions for distributed renewable energy networks are shown in Table 1.

Table 1: Symbol definition of distributed renewable energy networks

L — number of buses; N — number of branches; v_i, θ_i — voltage magnitude and phase angle at bus i; z_i = p_i + jq_i — power injection at bus i; L_i — set of buses connected to bus i; g_ij, b_ij — conductance and susceptance between buses i and j; θ_ij = θ_i − θ_j — phase difference between buses i and j; p_i^PV, q_i^PV — active and reactive power of the PV at bus i; p_i^L, q_i^L — active and reactive power of the loads at bus i

The power system dynamics can be defined as shown in Eqs. (1), (2), which are the cornerstone of solving the active power control and power flow problems [22].

    Figure 1: Illustration of distributed renewable energy networks

In a distributed renewable energy network, it is assumed that there are L buses and N branches, where each bus i is associated with a voltage value v_i and phase angle θ_i. Power injection is represented by z_i = p_i + jq_i. The set of buses connected to bus i is denoted as L_i. The conductance and susceptance for buses i and j are denoted by g_ij and b_ij, respectively, while the phase difference between buses i and j is represented by θ_ij = θ_i − θ_j. Active and reactive power at the PV nodes of bus i are denoted by p_i^PV and q_i^PV, respectively. Active and reactive power consumed by the loads at bus i are represented by p_i^L and q_i^L. To minimize the impact of voltage fluctuations on the grid, a safe range is defined as a 5% deviation from the baseline voltage. Assuming a baseline voltage of v_0 = 1.0 at the grid side, denoted as per unit (p.u.), the voltage at each bus must satisfy the condition 0.95 p.u. ≤ v_i ≤ 1.05 p.u. During nighttime, when energy consumption and generation are relatively low, the voltage at corresponding buses may fall below 0.95 p.u. In contrast, during daytime with ample solar energy, the corresponding bus voltages may exceed 1.05 p.u., resulting in reverse power flow from the user side to the grid side [23].
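Eqs. (1) and (2) did not survive extraction. Given the symbols defined above, they are presumably the standard AC power-flow injection equations; this reconstruction follows the usual textbook form rather than the paper's exact typesetting:

```latex
p_i = v_i \sum_{j \in L_i} v_j \,\bigl(g_{ij}\cos\theta_{ij} + b_{ij}\sin\theta_{ij}\bigr), \qquad (1)\\
q_i = v_i \sum_{j \in L_i} v_j \,\bigl(g_{ij}\sin\theta_{ij} - b_{ij}\cos\theta_{ij}\bigr). \qquad (2)
```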

To describe the voltage control relationship, we simplify the system into two buses. In Fig. 1, v_ip represents the voltage of a particular bus, which can be considered as the reference voltage. v_i is the voltage of the bus connected to it, and this bus has loads and PV elements. The impedance between them is denoted as z_i = r_i + jx_i, where r_i is the resistance and x_i is the reactance. The voltage difference between these two buses is Δv_i = v_ip − v_i, as shown in Eq. (3).

The power loss is denoted as P_loss, as expressed in Eq. (4).

To minimize the voltage difference between the two buses, the control variable q_i^PV is adjusted to manipulate Δv_i. Similarly, the adjustment of q_i^PV can also be employed to reduce power losses. It is crucial to note that, apart from q_i^PV, all other variables are uncontrollable.
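Eqs. (3) and (4) were likewise lost in extraction. Under the usual single-branch approximation for this two-bus model, with net demand P_i = p_i^L − p_i^PV and Q_i = q_i^L − q_i^PV, they are presumably of the form below; the exact sign conventions are the paper's, so treat this as a reconstruction:

```latex
\Delta v_i = v_{ip} - v_i \approx \frac{r_i P_i + x_i Q_i}{v_i}, \qquad (3)\\
P_{loss} = \frac{r_i \,\bigl(P_i^2 + Q_i^2\bigr)}{v_i^2}. \qquad (4)
```

Both expressions make the control lever explicit: p_i^L, q_i^L, and p_i^PV are uncontrollable, so only the q_i^PV term inside Q_i is available to shrink Δv_i and P_loss.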

    3.2 Problem Formulation

The MARL approach to addressing multi-agent control problems is typically formulated as a Dec-POMDP, represented by a tuple (M, S, A, O, T, R, P, ρ, γ). The M denotes the set of agents. The S represents the set of states, describing the entire environment's state. The A is the set of actions, with each element corresponding to the action of each individual agent. The O signifies the observable states for the agents, where each agent's observed state may only capture a portion of the overall environment state. The T denotes the transition probabilities of the environment, describing the dynamic changes in the environment, with values ranging over [0, 1]. The R is the set of rewards obtained by agents after executing actions in the environment. The P represents the observation function, indicating the probability of an agent observing a particular state after taking an action in a given state, with values in the range of [0, 1]. The ρ is the probability distribution function for the initial state values, ranging over [0, 1]. The γ is the discount factor, utilized to discount future rewards.

    (1) State and Observation Set:

In nodes within the bus, electrical data such as voltage, active power, and reactive power can be obtained through smart meters. In nodes equipped with PV systems, it is also necessary to monitor the active and reactive power of the PV system. The variables v and θ represent the voltage value and corresponding phase angle of the node's bus, respectively, influenced by the power of the load and PV. The variables p_L and q_L represent the total active and reactive power of the load at the bus node, where the load generally consists of user electrical devices and is considered uncontrollable. The variables p_PV and q_PV represent the total active and reactive power of the PV system at the bus node. The system's state can be represented as S = {v, θ, p_L, q_L, p_PV, q_PV}. As each agent can only observe the state of its respective region, referred to as an observation, O signifies the measured voltage, active power, and reactive power within the region.

    (2) Action Set:

The overall bus voltage in the region is primarily influenced by the intermittent nature of PV generation, resulting in voltage instability. The main cause of this instability is the fluctuation in the active power generated by the PV itself. To address this issue, voltage control can be achieved by adjusting the reactive power generated by the PV inverter. The adjustment of PV inverter reactive power can be expressed as q_k^PV = a_k √((s_k^max)² − (p_k^PV)²), where s_k^max represents the maximum apparent power for the corresponding bus node. The parameter a_k represents the adjustment of the reactive power of the PV inverter, enabling the regulation of reactive power production and, consequently, the control of voltage on the corresponding bus. The range of a_k is [−c, c], where the value of c is determined based on the load capacity.
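A minimal sketch of this action-to-reactive-power mapping, assuming the apparent-power form reconstructed above (the function name and the default bound c are ours, not the paper's):

```python
import numpy as np

def pv_reactive_power(a_k: float, p_pv: float, s_max: float, c: float = 0.8) -> float:
    """Map an agent action a_k in [-c, c] to the PV inverter's reactive power."""
    a_k = float(np.clip(a_k, -c, c))                  # respect the action range [-c, c]
    headroom = np.sqrt(max(s_max**2 - p_pv**2, 0.0))  # apparent-power capacity left by p_pv
    return a_k * headroom
```

By construction, |q_k^PV| can never exceed the inverter's remaining apparent-power capacity, which is why the action is defined as a scaling factor rather than a raw reactive-power setpoint.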

    (3) Reward Function:

In the process of utilizing MARL for voltage control, the primary consideration is whether the post-operation voltage remains within a safe range. Subsequently, the objective is to minimize losses incurred by the production of reactive power. The reward function is defined as follows:

In Eq. (5), l_v represents the penalty function for voltage. Any deviation of the voltage from the designated standard voltage results in a corresponding penalty value calculated through the penalty function. The penalty is zero only when the voltage value aligns with the set standard voltage. The parameters a, b, c, and d are set to 2, 0.095, 0.01, and 0.04, respectively. l_v is expressed as follows:

l_q is computed to quantify the loss due to the production of reactive power, denoted as:

The α denotes the proportion of reactive power generation loss, ranging over [0, 1].

The objective is to maximize the expected discounted return, max_π E[Σ_t γ^t r_t], with π representing the policy of the agent.
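Eqs. (5)–(7) did not survive extraction, so the sketch below only mirrors the structure described above: a per-bus voltage penalty l_v plus an α-weighted reactive-power term l_q. The quadratic barrier stands in for the paper's parameterized penalty (whose constants a, b, c, d we do not reproduce), and the value of α is an assumption:

```python
import numpy as np

ALPHA = 0.1  # weight of the reactive-power loss, alpha in [0, 1] (assumed value)

def voltage_penalty(v: np.ndarray) -> float:
    """l_v stand-in: zero when every bus sits at the 1.0 p.u. standard voltage,
    growing with deviation. The paper's exact barrier (with constants a, b, c, d)
    is not reproduced here."""
    return float(np.mean((v - 1.0) ** 2))

def reward(v: np.ndarray, q_pv: np.ndarray) -> float:
    """r = -(l_v + alpha * l_q): penalize voltage deviation and reactive output."""
    l_q = float(np.mean(np.abs(q_pv)))  # l_q proxy: mean reactive power produced
    return -(voltage_penalty(v) + ALPHA * l_q)
```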

    3.3 Multi-Agent Reinforcement Learning

In comparison to single-agent reinforcement learning, the environment in which multiple agents operate is more intricate. In collaborative tasks, each intelligent agent is tasked not only with maximizing its individual rewards but also with considering the collective achievement of goals with other agents. However, each agent, in the process of exploration, observes only a partial aspect of the environment, rendering each trained agent highly unstable. To address this issue, the concept of centralized training with decentralized execution has been proposed [24]. During the training phase, each agent explores the environment and collects transitions, which are then uniformly stored in a replay buffer. Once transitions from all agents are gathered, a critic network is trained using all the data. The policy network of each agent is trained locally based on the Q values provided by the critic. Consequently, upon completion of training, the agents can execute tasks locally.

In reinforcement learning algorithms, the issue of overestimation often arises in value estimation networks, impeding the convergence of the reinforcement learning algorithm. To address this problem, approaches such as DDQN [25], Dueling DQN [26], and TD3 [27] have been successively proposed. Among these, TD3 has demonstrated effective mitigation of issues arising from overestimation. TD3 employs a dual actor-critic network structure, comprising a main network and a target network. The critic utilizes two networks to estimate Q values and selects the minimum value for updating the policy network. To further alleviate overestimation issues, the actor is updated with a delay after several steps of critic training, ensuring more stable convergence. The MATD3 algorithm integrates the CTDE-derived algorithm into the TD3 framework and is designed to address challenges in multi-agent environments. The algorithmic framework of MATD3 is illustrated in Fig. 2.

    Figure 2: The algorithmic framework of MATD3

To prevent the overestimation problem of the critics, the minimum Q value is selected for updating the networks. The critic networks of the main network are updated using the Q values from the target networks. The objective of training the main network is to make the output Q values as close as possible to the values of the target network plus the reward obtained from environmental exploration, as shown in Eq. (11).

where γ is the discount factor and the target output action is produced by the target actor, with clipped Gaussian noise added as shown in Eq. (12).

N(0, σ) is random noise following a zero-mean Gaussian distribution, and c_i defines the maximum and minimum values of the noise. After obtaining the minimum Q value from the output of the target critic networks, denoted as y, the main critic networks are updated using the loss function represented in Eqs. (13), (14).

After calculating the loss values for the two main critic networks, their parameters are updated through backpropagation, as shown in Eqs. (15), (16).

The α is the learning rate for the two main critic networks. The update of the main actor is based on the Q values output by the two main critic networks, and the loss function is represented as shown in Eq. (17).

After obtaining the loss through the Q values, the parameters of the main actor network are updated through backpropagation, as shown in Eq. (18).

    The α is the learning rate of the main actor network.
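Since the bodies of Eqs. (11)–(18) were lost in extraction, the following is a minimal PyTorch sketch of the TD3 update loop they describe: clipped target-action noise, the minimum over twin target critics, and a delayed actor update. In MATD3 proper, the critics are centralized over joint observations and actions; the module and optimizer names here are assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def td3_update(step, batch, actor, actor_tgt, critic1, critic2, critic1_tgt,
               critic2_tgt, opt_actor, opt_critics,
               gamma=0.99, sigma=0.2, noise_clip=0.5, policy_delay=2):
    """One TD3-style update following the structure of Eqs. (11)-(18)."""
    obs, act, rew, next_obs, done = batch  # tensors; done is 0/1 per transition

    with torch.no_grad():
        # Eq. (12): target action from the target actor plus clipped Gaussian noise.
        noise = (torch.randn_like(act) * sigma).clamp(-noise_clip, noise_clip)
        next_act = (actor_tgt(next_obs) + noise).clamp(-1.0, 1.0)
        # Eq. (11): y = r + gamma * min(Q1', Q2'), using the smaller target Q value.
        q_next = torch.min(critic1_tgt(next_obs, next_act),
                           critic2_tgt(next_obs, next_act))
        y = rew + gamma * (1.0 - done) * q_next

    # Eqs. (13)-(16): regress both main critics toward y and backpropagate.
    critic_loss = F.mse_loss(critic1(obs, act), y) + F.mse_loss(critic2(obs, act), y)
    opt_critics.zero_grad()
    critic_loss.backward()
    opt_critics.step()

    # Eqs. (17)-(18): delayed actor update via the deterministic policy gradient.
    if step % policy_delay == 0:
        actor_loss = -critic1(obs, actor(obs)).mean()
        opt_actor.zero_grad()
        actor_loss.backward()
        opt_actor.step()
```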

    3.4 Safety-Constrained MARL for Power Quality Control

In the context of distributed renewable energy networks, each node with an embedded PV is considered as an agent. Each agent explores the environment, collects observations, and takes corresponding actions. However, due to the lack of prior knowledge during the initial training of agents, actions taken by the agents may cause the voltage to exceed the safe range, posing unknown potential risks. To ensure that agents make safe actions, we propose imposing safety constraints on their actions before execution. After an agent takes an action, the output action is subjected to a safety check within the safety constraint module. In this module, the input consists of the observations gathered by the agent, including the voltage value v. It is imperative to assess whether the voltage falls within the safe range, defined as [0.95 + η, 1.05 − η], where η is set to 0.025 in this paper, signifying a narrowed voltage safety range of [0.975, 1.025]. This adjustment aims to enhance the algorithm's fault-tolerance space. When the voltage exceeds the designated safety range, a quadratic programming (QP) problem is solved to determine the corrective action that satisfies the safety conditions. The calculated corrective action is then fed back to the agent to adjust its output action, which is subsequently utilized for training the policy network. If the voltage remains within the established safe range, no further action is taken, as illustrated in Fig. 3.

    Figure 3: The diagram of safety-constrained MARL for power quality control

In a specific region, the action of each agent influences the voltage at various bus points to varying degrees. This means that changes in bus voltage can be attributed to two main factors: inherent characteristics of the bus and the actions of other agents. Therefore, when bus voltages exceed the predefined safety range, it is essential to accurately assess the impact of each action on all bus voltages. To achieve this, we must calculate the partial derivatives of the voltage at each bus with respect to the actions of all agents. This approach allows for a more precise understanding of how individual actions affect the overall voltage stability in the region. This is represented using the Jacobian matrix, as shown in Eq. (19).

The variable m represents the number of buses, N denotes the number of agents, and the overall nodal voltage of the system, excluding the embedded PV inverter, is influenced by the actions performed by agents. The t signifies the current time step, and t + 1 represents the subsequent time step. The a_t^i represents the action executed by the i-th agent at time t. According to the Taylor series expansion, if a small change Δa_t is added to a_t, the expression for the voltage at the next time step t + 1 can be formulated as in Eq. (20).
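Eqs. (19) and (20) did not survive extraction; from the description, they are presumably the voltage-action sensitivity Jacobian and its first-order Taylor expansion (our reconstruction):

```latex
J_t = \frac{\partial v_t}{\partial a_t} =
\begin{bmatrix}
\dfrac{\partial v_t^1}{\partial a_t^1} & \cdots & \dfrac{\partial v_t^1}{\partial a_t^N} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial v_t^m}{\partial a_t^1} & \cdots & \dfrac{\partial v_t^m}{\partial a_t^N}
\end{bmatrix} \in \mathbb{R}^{m \times N},
\qquad (19)\\
v_{t+1} \approx v_t + J_t \,\Delta a_t. \qquad (20)
```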

To guarantee that the voltage determined from solving Eq. (20) adheres to the specified safety constraints, we have reformulated the problem. It is now presented as the solution to a quadratic programming (QP) problem, which is detailed in Eq. (21).

The boundary value for the safety range, denoted as η > 0, is set to 0.025, as the voltage must not exceed the safety range during the MARL process. Building upon this constraint, we further limit it to ensure safety during the training phase of MARL. By solving the quadratic programming problem, we obtain the optimal correction term Δa_t. Consequently, when the total bus voltage within the region exceeds the predefined safety range, the action taken by the agent, augmented by the correction term Δa_t, facilitates the return of the voltage to the safety range. In this way, the safety constraint module ensures that MARL maintains control of the voltage within the safety range throughout both the training and testing phases.
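Eq. (21) is presumably the QP min_{Δa_t} ||Δa_t||² subject to 0.95 + η ≤ v_t + J_t Δa_t ≤ 1.05 − η. A minimal sketch of solving it, assuming a known Jacobian and using SciPy's SLSQP solver (the helper name and solver choice are ours, not the paper's implementation):

```python
import numpy as np
from scipy.optimize import minimize

def safe_correction(v: np.ndarray, J: np.ndarray, eta: float = 0.025) -> np.ndarray:
    """Smallest action correction da such that v + J @ da stays in
    [0.95 + eta, 1.05 - eta]. v: (m,) bus voltages; J: (m, N) Jacobian dv/da."""
    lo, hi = 0.95 + eta, 1.05 - eta
    n = J.shape[1]
    constraints = [
        {"type": "ineq", "fun": lambda da: v + J @ da - lo},   # corrected voltage >= lo
        {"type": "ineq", "fun": lambda da: hi - (v + J @ da)}, # corrected voltage <= hi
    ]
    res = minimize(lambda da: float(da @ da), x0=np.zeros(n),
                   constraints=constraints, method="SLSQP")
    return res.x  # Delta a_t, added to the agents' raw actions
```

If the voltages are already inside [0.95 + η, 1.05 − η], the solver returns Δa_t ≈ 0, matching the module's behavior of leaving safe actions untouched.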

    4 Experiments

    4.1 Experiment Setups

To validate the proposed SC-MARL for power quality control, we employed the open-source distributed power grid environment MAPDN [2] as our simulation platform. To demonstrate the robust performance of SC-MARL across power grids of varying scales, three distributed power grids of different sizes were utilized, namely 33-buses, 141-buses, and 322-buses. The data used consisted of three years of records at 3-min intervals, with 2 years allocated for training and 1 year for validating the performance of different algorithms. Five commonly used MARL algorithms were selected for comparison with SC-MARL in the experiments. The performance of the proposed method was contrasted with that of other algorithms. To assess and compare the performance of SC-MARL and other algorithms, five distinct evaluation metrics were employed, allowing for a comprehensive analysis from various perspectives. Finally, for a visual representation of the voltage control efficacy, we chose a specific day in summer and winter for the 33-buses, 141-buses, and 322-buses systems, employing visualization techniques in the experiments.

    4.1.1 Distributed Renewable Power Network

In the experiment, three different scales of power grids were employed, namely 33-buses, 141-buses, and 322-buses. Their voltage levels were 12.66 kV, 12.5 kV, and 110-20-0.4 kV, respectively. The 322-buses scale is comparatively large, and to emphasize flexibility and diversity, three different voltage levels were utilized for assessing the performance of MARL across varying voltage networks. The power grids were equipped with varying numbers of loads and PVs, with load quantities of 32, 84, and 337, and PV quantities of 6, 22, and 38, respectively. Each power grid was partitioned into distinct regions, with the numbers of regions being 4, 9, and 22 for 33-buses, 141-buses, and 322-buses, respectively. The maximum active power of loads and PVs for the 33-buses system was 3.5 and 8.75 MW, respectively. For the 141-buses system, the maximum active power of loads and PVs was significantly larger at 20 and 80 MW, providing a basis for comparing the performance of MARL in networks with different active power levels. The 322-buses system had a maximum active power of 1.5 MW for loads and 3.75 MW for PVs. The distributed renewable energy environment configurations are presented in Table 2.

Table 2: Environment configurations

System | Voltage level | Loads | PVs | Regions | Max load power (MW) | Max PV power (MW)
33-buses | 12.66 kV | 32 | 6 | 4 | 3.5 | 8.75
141-buses | 12.5 kV | 84 | 22 | 9 | 20 | 80
322-buses | 110-20-0.4 kV | 337 | 38 | 22 | 1.5 | 3.75

Table 3: Test results in 33-buses. * refers to the MARL baseline

Table 4: Test results in 141-buses. * refers to the MARL baseline

Table 5: Test results in 322-buses. * refers to the MARL baseline

The network structure of the 33-buses system is illustrated in Fig. 4. We have partitioned the network into four zones, each characterized by varying quantities of loads and PV sources. Bus 0 typically serves as the bus interfacing with the main power grid. In zones 2 and 3, there is a single PV source connected to bus 1 and bus 2, respectively. Conversely, zones 1 and 4 feature two PV sources each, connected to bus 5. Within a given zone, each agent, which is linked to the bus where the photovoltaic (PV) source is situated, has the ability to monitor the voltage and power levels of other buses in the same zone. However, this capability to observe does not extend to buses located in different zones.

    Figure 4: The network structure of the 33-buses system

    4.1.2 Data Descriptions

The load data was collected in real time from the electricity consumption of 232 power users in the Portuguese region over a period of three years [28]. The original dataset comprised electricity consumption readings at 15-min intervals for a total of 370 residential and industrial entities spanning the years 2012 to 2015. Data collection commenced on January 01, 2012, at 00:15:00. Due to the presence of some missing values, a subset of users was excluded, resulting in a final dataset consisting of 232 power users. To enhance the temporal resolution, the data was interpolated to transform the 15-min intervals into 3-min intervals. The ultimate dataset dimensions were 526,080 × 232, covering 232 users over a span of 1,096 days.

Given that 322 buses were required, and to address the shortfall in users, load data was randomly duplicated from the existing 232 users to compensate for the missing data in the 322 buses. Additionally, Gaussian random noise was introduced to the duplicated data. PV data was sourced from the Elia group [28], with a resolution of 3 min and a total dataset size of 526,080 × 232. For the 33-bus configuration, encompassing 4 regions, the available PV profiles were deemed sufficient. In the case of the 141-bus configuration, featuring 9 regions, the PV data was utilized directly. Concerning the 322-bus configuration, comprising 22 regions, PV data for the 22 regions was obtained through random duplication, followed by the introduction of Gaussian random noise.
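A minimal sketch of the two preparation steps described above: interpolating 15-min profiles to 3-min resolution, then padding out the missing users by noisy duplication. It assumes the profiles sit in a pandas DataFrame with a DatetimeIndex and one column per user; the function name and noise scale are ours:

```python
import numpy as np
import pandas as pd

def upsample_and_augment(load_15min: pd.DataFrame, n_needed: int,
                         noise_std: float = 0.01, seed: int = 0) -> pd.DataFrame:
    """15-min -> 3-min linear interpolation, then random duplication with
    Gaussian noise until n_needed user profiles exist."""
    load_3min = load_15min.resample("3min").interpolate(method="linear")

    rng = np.random.default_rng(seed)
    original_cols = list(load_3min.columns)
    while load_3min.shape[1] < n_needed:
        src = rng.choice(original_cols)              # pick an existing user profile
        noise = rng.normal(0.0, noise_std, len(load_3min))
        load_3min[f"{src}_dup{load_3min.shape[1]}"] = load_3min[src] * (1.0 + noise)
    return load_3min
```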

The load and PV power profiles for buses 13, 18, and 23 of the 33-buses system during winter and summer are depicted in Fig. 5. For bus 13, the disparity between winter and summer load powers is marginal, as consumer electricity demand exhibits relatively limited variation. However, there is a substantial difference in PV power, primarily attributed to abundant sunlight during the summer, resulting in approximately a twofold increase in PV generation compared to winter. Bus 18 exhibits similarities to bus 13, with the distinction that power consumption during winter is higher than in summer. In the case of bus 23, PV generation is lower during winter, while in summer, the PV output is approximately four times that of winter.

Figure 5: Daily power of the 33-buses network in winter (January, 1st row) and summer (July, 2nd row). The (a), (b) and (c) are bus 13, bus 18 and bus 23, respectively

The load and PV power profiles for buses 36, 111, and 53 of the 141-buses system during both winter and summer seasons are illustrated in Fig. 6. It is noteworthy that bus 53 exhibits substantial PV generation in both winter and summer, with increased sunlight availability during the summer season. However, the load power is relatively small. Consequently, the injection of a significant amount of active power from PV sources into the bus can result in voltage fluctuations, leading to grid disturbances. In such scenarios, it becomes imperative to regulate the PV inverters, inducing them to generate reactive power to absorb the surplus active power. This action aims to stabilize the bus voltage, ensuring overall grid safety and reliability.

Figure 6: Daily power of the 141-buses network in winter (January, 1st row) and summer (July, 2nd row). The (a), (b) and (c) are bus 36, bus 111 and bus 53, respectively

The load and PV power curves for buses 70, 74, and 132 of the 322-buses system during both winter and summer seasons are illustrated in Fig. 7. In the summer, the scenario is analogous to that of the 33-buses system in winter. Conversely, during the winter, the PV generation power of the 322-buses system is extremely low. Owing to the presence of a significant number of inverters on the PVs, power is consumed on the bus, resulting in the bus voltage falling below the lower limit of the safety range and causing grid instability. To address this issue, the impact of PV inverters on the grid is mitigated by controlling the reactive power of the PV inverters. In summary, weak sunlight during the winter results in lower PV generation power, leading to voltages below the lower limit of the safety range. Conversely, strong sunlight during the summer results in higher PV generation power, causing voltages to exceed the upper limit of the safety range.

    4.1.3 MADRL Algorithm Settings

In the experiment, we selected five commonly used MARL algorithms for performance comparison with our proposed model. These algorithms are MATD3, MAPPO [29], MADDPG, IPPO [30], and COMA [31]. SC-MARL is an extension of MATD3 with the addition of a safety constraint module. To assess their performance, all algorithms were trained for a total of 400 episodes, with each episode consisting of 480 steps, corresponding to one day (3-min intervals). Due to variations in training methodologies, the training was conducted using both online and offline approaches. IPPO, COMA, and MAPPO were trained online, with the network being updated after each episode. On the other hand, MATD3 and MADDPG were trained offline, with policy network updates occurring after each episode as well. The learning rate was set to 0.0001, and the L1 norm clip bound was set to 1. The batch size was fixed at 32. For offline algorithms, the replay buffer size was set to 5000. Notably, COMA had a sample size of 10 for its replay buffer. Additionally, IPPO and MAPPO had a value loss coefficient of 2, with a clip bound of 0.4.
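For reference, the reported settings collected in one place (values grounded in the paragraph above; the key names are our shorthand, not identifiers from the MAPDN codebase):

```python
# Training configuration as reported in Section 4.1.3.
TRAIN_CONFIG = {
    "episodes": 400,
    "steps_per_episode": 480,          # one day at 3-min intervals
    "learning_rate": 1e-4,
    "l1_norm_clip": 1.0,
    "batch_size": 32,
    "replay_buffer_size": 5000,        # offline algorithms (MATD3, MADDPG)
    "coma_sample_size": 10,
    "value_loss_coefficient": 2.0,     # IPPO and MAPPO
    "clip_bound": 0.4,                 # IPPO and MAPPO
    "online_algorithms": ["IPPO", "COMA", "MAPPO"],
    "offline_algorithms": ["MATD3", "MADDPG"],
}
```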

Figure 7: Daily power of the 322-buses network in winter (January, 1st row) and summer (July, 2nd row). The (a), (b) and (c) are bus 70, bus 74 and bus 132, respectively

    4.1.4 Evaluation Metrics

To evaluate the performance of the proposed SC-MARL model in comparison to other MARL models, five metrics were employed in the experiments, namely controllable rate, average reward, reactive power loss, average voltage, and voltage out of control rate; a short sketch of how several of these are computed follows the list.

(1) Controllable Rate (%CR): It is calculated as the proportion of all buses controlled at each time step in every episode. Specifically, it represents the ratio of the number of times the controller has taken control over a period of time to the total number of time steps. The controllable rate ranges from 0 to 1, with a higher value indicating better performance.

(2) Average Reward (Avr.R): It measures the average sum of rewards obtained by all agents at each time step within each episode. The average reward falls within the range of [−8, 0], and a value closer to 0 is indicative of better performance.

(3) Reactive Power Loss (Q loss): It represents the average reactive power loss generated by all agents at each time step in every episode. The reactive power loss metric also ranges over [−8, 0].

(4) Average Voltage (V.pu): It calculates the average voltage at each time step for all buses within each episode. The average voltage is around 1, and a value closer to 1 is considered desirable.

(5) Voltage Out of Control Rate (%V.out): It is the average proportion of time steps in which the voltage of any bus goes out of control during each episode. The voltage out of control rate ranges over [0, 1], and a lower value indicates better control performance.
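A minimal sketch of three of these metrics, assuming per-episode arrays of bus voltages and agent rewards (array shapes and function names are ours):

```python
import numpy as np

def v_out_rate(voltages: np.ndarray, lo: float = 0.95, hi: float = 1.05) -> float:
    """%V.out: fraction of time steps in which any bus voltage leaves [lo, hi].
    voltages: shape (T, n_buses) for one episode."""
    out_of_range = (voltages < lo) | (voltages > hi)
    return float(out_of_range.any(axis=1).mean())

def avg_voltage(voltages: np.ndarray) -> float:
    """V.pu: mean voltage over all buses and time steps (ideally close to 1.0)."""
    return float(voltages.mean())

def avg_reward(rewards: np.ndarray) -> float:
    """Avr.R: mean over time steps of the summed per-agent rewards.
    rewards: shape (T, n_agents) for one episode."""
    return float(rewards.sum(axis=1).mean())
```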

    4.2 Results

In voltage control, voltage fluctuations occur frequently. To prevent the voltage from exceeding the predetermined safety range, it is essential for the controller to perform real-time processing of voltage fluctuations rather than waiting until the voltage is close to the dangerous threshold to react. The controllable rate is employed to measure the real-time control proportion of the controller. A comparative analysis of the controllable rates for COMA, IPPO, MADDPG, MATD3, MAPPO, and SC-MARL is presented in Fig. 8 across the 33-buses, 141-buses, and 322-buses scenarios. SC-MARL demonstrates outstanding performance across different buses. However, there is some notable fluctuation in the 322-buses scenario, primarily attributed to the large network scale, with certain bus nodes stabilizing and requiring minimal intervention.

Figure 8: The comparative analysis of the controllable rate. The (a), (b) and (c) are 33-buses, 141-buses and 322-buses, respectively

In the process of exploration, the agent, lacking any prior knowledge during the early stages of training, initially experiences lower rewards. A comparative analysis of average rewards for COMA, IPPO, MADDPG, MATD3, MAPPO, and SC-MARL is presented in Fig. 9 across the 33-buses, 141-buses, and 322-buses scenarios. As the agent continues to explore and learn, it gradually tends to favor actions that result in greater rewards. The inclusion of a safety constraint in our algorithm significantly reduces the margin for agent errors. Consequently, the agent learns safe actions from the outset, leading to higher rewards. In comparison to other algorithms, the proposed algorithm exhibits characteristics such as rapid convergence and high safety, making it more suitable for scenarios where safety is a primary concern.

Figure 9: The comparative analysis of the average reward. The (a), (b) and (c) are 33-buses, 141-buses and 322-buses, respectively

The primary objective of voltage control is to regulate voltage by controlling the generation of reactive power. The generation of reactive power results in power wastage; therefore, concurrently with voltage control, minimizing the generation of reactive power is also crucial. We compared the reactive power loss of COMA, IPPO, MADDPG, MATD3, MAPPO, and SC-MARL in the 33-buses, 141-buses, and 322-buses systems, as illustrated in Fig. 10. The proposed SC-MARL, when employed for voltage control, consistently exhibits lower levels of generated reactive power compared to other algorithms. As the number of episodes increases, the agent gradually reduces the generation of reactive power. The objective is to control voltage within a safe range while minimizing the production of reactive power as much as possible.

Figure 10: The comparative analysis of the reactive power loss. The (a), (b) and (c) are 33-buses, 141-buses and 322-buses, respectively

Maintaining voltage within a safe range and ensuring stability are critical aspects of control. We compared the average voltage of COMA, IPPO, MADDPG, MATD3, MAPPO, and SC-MARL across the 33-buses, 141-buses, and 322-buses scenarios, as depicted in Fig. 11. In contrast to other algorithms, SC-MARL exhibits comparatively stable voltage control with minimal fluctuations. In the case of the 141-buses scenario, MADDPG and MATD3 demonstrate significant voltage fluctuations, exceeding the safe range early in the training phase, which is an undesirable occurrence. Within the 322-buses scenario, MAPPO also exceeds the safe range in the initial stages of training and exhibits substantial fluctuations throughout the training process. Although SC-MARL does not precisely maintain voltage around 1, it underscores that the pivotal control objective is the safety and stability of the voltage rather than a specific numerical target.

Figure 11: The comparative analysis of the average voltage. The (a), (b) and (c) are 33-buses, 141-buses and 322-buses, respectively

Once the voltage exceeds the designated safety range, the stability of the power grid cannot be guaranteed, making voltage control a pivotal concern. We conducted a comparative analysis of the voltage out of control rate for COMA, IPPO, MADDPG, MATD3, MAPPO, and SC-MARL across the 33-buses, 141-buses, and 322-buses systems, as depicted in Fig. 12. Due to the incorporation of safety constraints, SC-MARL exhibits minimal instances of voltage instability, even during the early stages of training. In the case of the 141-buses system, the lack of prior knowledge in the initial training phase results in MATD3 experiencing an alarming 80% rate of voltage instability, posing a potentially fatal threat to the power grid. Similarly, in the 322-buses system, MAPPO exhibits high levels of instability, reaching 50% during the early stages of training. Although the voltage instability diminishes with continuous training, deploying these models in practical scenarios may expose them to potential control risks.

Figure 12: The comparative analysis of the voltage out of control rate. The (a), (b) and (c) are 33-buses, 141-buses and 322-buses, respectively

To assess the performance of the proposed algorithm, we randomly selected 10 episodes from the test sets of the 33-buses, 141-buses, and 322-buses scenarios and conducted evaluations on the COMA, IPPO, MADDPG, MATD3, MAPPO, and SC-MARL algorithms. The test results are presented in Tables 3–5. It is noteworthy that the %V.out metric of SC-MARL is significantly lower than that of other algorithms, particularly evident in the 33-bus scenario where the proportion of voltage exceeding the safety range is zero. This demonstrates the algorithm's robust safety profile. Additionally, SC-MARL outperforms other algorithms in terms of control proportion, notably in the 322-bus scenario where it surpasses all others, maintaining a high control ratio even in large-scale power grids. One limitation is that the voltage value cannot be adjusted to stabilize around 1 due to our set value of η as 0.025; hence, SC-MARL controls the voltage to approximately 0.98. In summary, our proposed SC-MARL demonstrates the capability to safely and stably control voltage while minimizing reactive power generation as much as possible, both during the training and testing phases.

In the context of the 33-buses, 141-buses, and 322-buses power systems, a comparative analysis was conducted for a selected day in winter (January) and summer (July), with a time interval of 3 min. Three scenarios were considered: no control (None), utilization of the optimal MARL algorithm, and application of SC-MARL for voltage control. The results are illustrated in Fig. 13. During summer in the 33-buses system, the high active power output from PV sources resulted in voltages exceeding the upper limit of the safety range between time steps 190 and 310. MARL was unable to consistently maintain voltages within the safe range at every time point, whereas SC-MARL exhibited superior and stable control, ensuring safety. In the case of the 141-buses and 322-buses systems, SC-MARL consistently demonstrated excellent control performance, ensuring both the safety and stability of voltages within the specified ranges.

Figure 13: The curves of voltage control. The (a), (b) and (c) are 33-buses, 141-buses and 322-buses, respectively

    5 Conclusion

This paper introduces a Safety-Constrained Multi-Agent Reinforcement Learning (SC-MARL) approach for power quality control. We have conducted experiments where the SC-MARL algorithm was trained and benchmarked against five other Multi-Agent Reinforcement Learning (MARL) algorithms using five key performance metrics. The results from our tests show that the SC-MARL algorithm effectively manages voltage control in power systems with 33, 141, and 322 buses. In the 33-buses system, specifically, the Controllable Rate (%CR) increased from 95.02% to 98.91%, the Average Reward (Avr.R) increased from −0.0104 to −0.0073, the Reactive Power Loss (Q loss) decreased from 0.095 to 0.0624, and the Average Voltage (V.pu) was 0.9852. Additionally, the Voltage Out of Control Rate (%V.out) decreased from 0.43 to 0. Notably, the percentage of voltage surpassing the safety range was almost zero during both the training and testing phases. This robust voltage control underscores the potential of SC-MARL for practical deployment in power grids.

A noted limitation of our experiment is that, due to the safety constraint module, which specifies a voltage safety range of [0.975, 1.025], SC-MARL tends to maintain the voltage around 0.98. Although this does not present immediate concerns, it does highlight an area for further research.

In conclusion, SC-MARL demonstrates effective and safe voltage control capabilities, which is crucial as the integration of photovoltaic systems into power grids escalates. This ensures high-quality electrical energy and minimizes energy losses. Future research will aim to refine the safety strategies of SC-MARL, enhancing its practical application in real-world power grid scenarios.

Acknowledgement: We would like to express our sincere gratitude to the Regional Innovation Strategy (RIS) for their generous support of this research. This work was made possible through funding provided by the National Research Foundation of Korea (NRF) under the Ministry of Education (MOE), Grant Number 2021RIS-002. We appreciate the opportunity to conduct this study and acknowledge the invaluable assistance provided by the NRF and MOE in advancing our research endeavors. Their support is instrumental in driving innovation and contributing to the advancement of knowledge in our field.

Funding Statement: This research was supported by the "Regional Innovation Strategy (RIS)" through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2021RIS-002).

Author Contributions: The authors confirm contribution to the paper as follows: study conception and design: Y. Zhao, H. Zhong; data collection: Y. Zhao, H. Zhong; analysis and interpretation of results: Y. Zhao; draft manuscript preparation: Y. Zhao; C. Lim reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: Data openly available in a public repository. The data that support the findings of this study are openly available in MAPDN at https://drive.google.com/file/d/1-GGPBSolVjX1HseJVblNY3KoTqfblmLh/view?usp=sharing.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
