
    Constrained Multi-Objective Optimization With Deep Reinforcement Learning Assisted Operator Selection

IEEE/CAA Journal of Automatica Sinica, 2024, Issue 4

Fei Ming, Wenyin Gong, Ling Wang, and Yaochu Jin

Abstract—Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may be heavily dependent on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by Deep Reinforcement Learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-network to learn a policy that estimates the Q-values of all actions, the proposed approach can adaptively select an operator that maximizes the improvement of the population according to the current state and thereby improve the algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed Deep Reinforcement Learning-assisted operator selection significantly improves the performance of these CMOEAs, and the resulting algorithm obtains better versatility compared to nine state-of-the-art CMOEAs.

I. INTRODUCTION

CONSTRAINED multi-objective optimization problems (CMOPs) contain multiple conflicting objective functions and constraints, and they widely exist in real-world applications and scientific research [1], for example, web service location-allocation problems [2] and the vehicle scheduling of urban bus lines [3].

The development of solving CMOPs by constrained multi-objective optimization evolutionary algorithms (CMOEAs) has seen prominent growth due to the wide existence of this kind of problem. Generally, a CMOEA contains three key components that affect its performance: the algorithmic strategy used to assist the selection procedures, the constraint-handling technique (CHT) to handle constraints, and the evolutionary operator to generate new solutions. In the past decade, a large number of CMOEAs have been proposed, most of which focus on enhancing algorithmic strategies [4]–[7] and CHTs [3], [8]. By contrast, little research on how to adaptively select the evolutionary operator in CMOEAs has been reported in the literature.

According to [9], different evolutionary operators¹ are suited to different optimization problems. For example, crossovers in genetic algorithms (GAs) [10] are effective in overcoming multimodal features; mutations in differential evolution (DE) [11] can handle complex linkages between variables because they use the weighted difference between two other solutions. The particle swarm optimizer (PSO) [12] shows a good convergence speed, the competitive swarm optimizer (CSO) [13] is good at dealing with large-scale optimization problems because of its competitive mechanism, and the multi-objective PSO (MOPSO) [14], [15] is particularly designed for and effective in multi-objective situations. Given that a real-world CMOP is usually subject to unknown features and challenges, using a fixed evolutionary operator limits the applicability of a CMOEA. Although the ensemble and adaptive selection of operators have received increased attention in the multi-objective optimization community [16]–[18], no research efforts have been dedicated to constrained multi-objective optimization. Hence, it is necessary to study the effect of adaptive operator selection in dealing with CMOPs.

¹ An evolutionary operator refers to the operations of an evolutionary algorithm used for generating offspring solutions, such as the crossover and mutation of a GA, the differential variation of DE, and the particle swarm update of PSO/CSO.

In the reinforcement learning community, deep reinforcement learning (DRL) is an emerging topic and has been applied to several real-world applications in the multi-objective optimization field [19]–[23]. Its effectiveness in dealing with multi-objective optimization problems (MOPs) has also been studied recently [17], [24], [25]. Since DRL uses a deep neural network to approximate the action-value function, it can handle continuous state spaces and is thereby suitable for operator selection in CMOPs, because a population can have an infinite number of states during the evolution. Therefore, it is promising to adopt DRL techniques to fill the research gap of operator selection in handling CMOPs. However, when applying DRL to CMOPs, the main challenge is to properly consider constraint satisfaction and feasibility in the DRL model.

This work proposes a DRL-assisted online operator selection framework for CMOPs. Specifically, the main contributions are as follows:

1) We propose a novel DRL model for operator selection in CMOPs. The convergence, diversity, and feasibility performance of the population are viewed as the state, the candidate operators are regarded as actions, and the improvement of the population state is the reward. Then, we develop a deep Q-learning (DQL)-assisted operator selection framework for CMOEAs. A deep Q-network (DQN) is trained to learn a policy that estimates the Q-value of an action at the current state.

2) The proposed model can contain an arbitrary number of operators, and the proposed framework can be easily employed in any CMOEA. In this work, we develop an operator selection (OS) method and further instantiate it by using GA and DE as the candidate operators in the DRL model. Then, the OS is embedded into four popular CMOEAs: CCMO (coevolutionary framework for constrained multiobjective optimization) [4], PPS (push and pull search) [26], MOEA/D-DAE (multiobjective evolutionary algorithm with detect-and-escape) [27], and EMCMO (evolutionary multitasking optimization framework for constrained multiobjective optimization) [7].

3) Extensive experimental studies on four popular and challenging CMOP benchmark test suites demonstrated that the OS method can significantly improve the performance of CMOEAs. Furthermore, compared to nine state-of-the-art CMOEAs, our methods obtained better versatility on different problems.

The remainder of this article is organized as follows. Section II introduces the related work. Section III elaborates on the proposed methods. Then, experiments and analysis are presented in Section IV. Finally, conclusions and future research directions are given in Section V.

II. RELATED WORK AND MOTIVATIONS

    A. Preliminaries

Without loss of generality, a CMOP can be formulated as

$$
\begin{aligned}
\min_{\mathbf{x}} \; & \mathbf{F}(\mathbf{x}) = \big(f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x})\big)^{T} \\
\text{s.t.} \; & g_i(\mathbf{x}) \le 0, \quad i = 1, \ldots, p \\
& h_j(\mathbf{x}) = 0, \quad j = p+1, \ldots, q \\
& \mathbf{x} \in \Omega \subseteq \mathbb{R}^{n}
\end{aligned} \tag{1}
$$

where m denotes the number of objective functions; x = (x_1, ..., x_n)^T denotes the decision vector with n dimensions (i.e., the number of decision variables); Ω ⊆ R^n represents the search space; g_i(x) and h_j(x) are the i-th inequality and j-th equality constraints, respectively; and q denotes the total number of constraints, the first p of which are inequalities.
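As a concrete illustration, a toy CMOP of this form can be written in Python as below; the two objectives and the constraints are invented for illustration only and do not correspond to any benchmark used in the paper.

```python
import numpy as np

def evaluate_cmop(x):
    """Toy CMOP with m = 2 objectives, one inequality and one equality
    constraint (illustrative functions only)."""
    x = np.asarray(x, dtype=float)
    f1 = x[0]                          # first objective
    f2 = (1.0 + x[1]) / x[0]           # second objective, conflicting with f1
    g1 = 0.25 - x[0]                   # inequality constraint g1(x) <= 0
    h1 = x[0] + x[1] - 1.0             # equality constraint h1(x) = 0
    return np.array([f1, f2]), np.array([g1]), np.array([h1])

objs, gs, hs = evaluate_cmop([0.5, 0.5])   # objectives and constraint values
```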

In a CMOP, the degree of the j-th constraint violation (CV) of a solution x is

$$
CV_j(\mathbf{x}) =
\begin{cases}
\max\{0,\, g_j(\mathbf{x})\}, & j = 1, \ldots, p \\
\max\{0,\, |h_j(\mathbf{x})| - \sigma\}, & j = p+1, \ldots, q
\end{cases} \tag{2}
$$

where σ is a small enough positive value used to relax the equality constraints into inequality ones. The overall CV of x is

$$
\phi(\mathbf{x}) = \sum_{j=1}^{q} CV_j(\mathbf{x}). \tag{3}
$$

x is feasible if φ(x) = 0; otherwise, it is infeasible.
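The CV definitions in (2) and (3) can be sketched as follows; the function name and the default value of σ are illustrative assumptions, and the constraint arrays follow the toy example above.

```python
import numpy as np

def constraint_violation(g_values, h_values, sigma=1e-4):
    """Overall CV: violations of inequality (g <= 0) and relaxed equality
    (|h| <= sigma) constraints summed up; zero means the solution is feasible."""
    g_cv = np.maximum(0.0, np.asarray(g_values, dtype=float))
    h_cv = np.maximum(0.0, np.abs(np.asarray(h_values, dtype=float)) - sigma)
    return float(np.sum(g_cv) + np.sum(h_cv))

print(constraint_violation([-0.1, 0.0], [1e-5]))   # 0.0 -> feasible solution
```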

    B. Adaptive Operator Selection in MOPs

In recent years, adaptive operator selection has gradually attracted research attention in the design of multi-objective optimization evolutionary algorithms (MOEAs).

Wang et al. [16] proposed a multi-operator ensemble method that uses multiple subpopulations for multiple operators and adjusts their sizes according to the effectiveness of each operator, so as to reward good operators and punish bad ones. Tian et al. [17] adopted DRL to construct an adaptive operator selection method for MOEA/D. Their DRL model regards the decision vectors as states and the operators as actions; the fitness improvement of a solution brought by an operator is taken as the reward. Santiago et al. [28] suggested a fuzzy selection of operators that chooses the most appropriate operators during evolution to promote both the diversity and convergence of solutions. Dong et al. [18] devised a test-and-apply structure to adaptively select the operator for decomposition-based MOEAs. The structure contains a test phase that uses all operators to generate offspring and takes their survival rates as the credits of the operators; the apply phase then uses only the best operator for the remaining evolution. Yuan et al. [29] investigated the effect of different variation operators (i.e., GA and DE) in MOEAs. McClymont and Keedwell [30] proposed a Markov chain hyper-heuristic that employs reinforcement learning and Markov chains to adaptively select heuristic methods, including different operators. Lin et al. [31] devised a one-to-one ensemble mechanism that measures the credits of operators in both the decision and objective spaces and designed an adaptive rule to guarantee that suitable operators generate more solutions.

    C. DRL and Its Applications in MOPs

As we know, evolutionary computation aims to find the optima in a static environment, whereas reinforcement learning can learn an optimal coping strategy in a dynamic environment [32]. The learned policy enables the agent to adaptively take the actions that gain the largest cumulative reward at the current environmental state [33]. Different from traditional reinforcement learning, DRL uses a deep neural network to approximate the action-value function. More specifically, Q-learning learns a policy through a discrete Q-table that records the cumulative reward of each action at each state [34]. In contrast, DQL trains a Q-network that approximates the action-value function to estimate the expected reward of each action [35]. The action-value function can thus be viewed as a preference model that indicates which action should be taken. Generally, DQL has two types of working principles, depicted in the left and right halves of Fig. 1, respectively, where the DQN is assumed to be trained. For the Type-I DQL model, the input includes a state and an action (shown as a cube), and the output is the Q-value of this action under this state. For the Type-II DQL model, the input contains only the state, and the DQN outputs the Q-values of all actions.

Fig. 1. An illustration of the two types of working principles of the DQL technique.
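As a rough sketch of the two principles, minimal PyTorch networks could look as follows; the 3-dimensional state, the two candidate actions, and the hidden-layer width are illustrative assumptions rather than the architecture used in the paper.

```python
import torch
import torch.nn as nn

# Type-I: input = (state, action), output = a single Q-value for that pair.
type1 = nn.Sequential(nn.Linear(3 + 1, 32), nn.ReLU(), nn.Linear(32, 1))

# Type-II: input = state only, output = one Q-value per candidate action.
type2 = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 2))

state = torch.rand(1, 3)                     # an illustrative 3-dimensional state
action = torch.tensor([[0.0]])               # index of the chosen operator
q_single = type1(torch.cat([state, action], dim=1))   # shape (1, 1)
q_all = type2(state)                                   # shape (1, 2)
```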

Without loss of generality, DQL trains a Q-network through the loss function

$$
L = \frac{1}{|\mathcal{T}|} \sum_{(s_t, a_t, r_t, s_{t+1}) \in \mathcal{T}} \big(q_t - Q(s_t, a_t)\big)^{2} \tag{4}
$$

where T is the training data randomly sampled from an experience replay (EP), the EP being a data set that records historical data for training the network; Q(s_t, a_t) is the output of the network with the input (s_t, a_t) ∈ T; and q_t is the target Q-value of taking action a_t at the state s_t, which is formulated as

$$
q_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a') \tag{5}
$$

where r_t is the immediate reward, γ is the discount factor, and max_{a'} Q(s_{t+1}, a') is the maximum estimated reward of taking the next action under the state s_{t+1} reached after a_t is performed. Therefore, q_t not only reflects the current reward of performing action a_t but also estimates the maximum cumulative reward in the future.
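A compact sketch of the target in (5) and the squared loss in (4) is given below; the value of the discount factor gamma is a placeholder.

```python
import torch

def q_target(reward, q_next_all, gamma=0.9):
    """q_t = r_t + gamma * max_a' Q(s_{t+1}, a'), as in (5)."""
    return reward + gamma * q_next_all.max(dim=1).values

def dqn_loss(q_pred, reward, q_next_all, gamma=0.9):
    """Mean squared error between Q(s_t, a_t) and the target q_t, as in (4)."""
    return torch.mean((q_target(reward, q_next_all, gamma) - q_pred) ** 2)
```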

Recently, DRL techniques have attracted more and more attention in the multi-objective optimization community. Researchers either employ DRL techniques to solve real-world MOPs or utilize them to enhance the performance of MOEAs.

1) Applications in Real-World MOPs: Li et al. [22] presented a DRL-assisted multi-objective bacterial foraging optimization algorithm for a five-objective renewable energy accommodation problem. Caviglione et al. [20] suggested solving the multi-objective placement problem of virtual machines in cloud data centers by a DRL framework that selects the best placement heuristic for each virtual machine. Schneider et al. [19] designed a DRL-based DeepCoord approach that trains an agent to adaptively learn how to best coordinate services, without prior or expert knowledge, for a three-objective service coordination problem.

2) Applications in MOEAs: Tian et al. [17] used the DRL technique to adaptively select the evolutionary operators in dealing with MOPs. In their method, the decision variables of a solution are regarded as the state, the operator is regarded as the action, and the improvement of the fitness of the solution is the reward. Li et al. [24] proposed modeling each subproblem decomposed by the reference vectors as a neural network and generating Pareto-optimal solutions directly through the trained network models; they modeled the multi-objective TSP (MOTSP) as a Pointer Network and solved it using DRL techniques. Zhang et al. [25] developed a concise meta-learning-based DRL technique that first trains a fine-tuned meta-model to derive submodels for the subproblems and thereby shortens the training time; the authors further used their technique to solve the MOTSP and multi-objective vehicle routing problems with time windows.

    D. Motivations

In the evolutionary computation community, the evolutionary operators (also known as variation operators) that generate new candidate solutions for selection are an important component [9]. According to the no-free-lunch theorem, a fixed operator cannot be suitable for all problems. Therefore, adaptive operator selection is widely recognized as an interesting and key issue in evolutionary computation [36]. Although adaptive operator selection has attracted some attention in the multi-objective optimization community, no research efforts have so far been dedicated to the design of CMOEAs. Given that a CMOP inherits the features and challenges of MOPs in addition to its constraints, it is valuable and promising to develop adaptive operator selection methods for solving CMOPs.

During the evolution of solving a CMOP, the whole environment is dynamic because the population is different and unknown in advance at each iteration. Therefore, adopting reinforcement learning techniques to address the adaptive operator selection issue is intuitively appealing. However, traditional reinforcement learning techniques such as Q-learning may be less effective in dealing with such problems because the environment can include an infinite number of states; that is, the search space is continuous, but a Q-table can only handle a discrete state space. In contrast, DRL techniques that train a deep neural network as the policy are suitable for a continuous and infinite state space.

Another advantage of DRL techniques is that the output q_t of (5) contains not only the current reward r_t but also the maximum expected reward after taking this action. Consequently, q_t can represent the maximum cumulative reward in the future due to the recursive nature of this formula [17]. As a result, DRL techniques such as DQL are more suitable for the evolutionary process of CMOEAs since both historical and future performance should be considered in heuristic algorithms [37]. Thus, DQL is very promising for adaptive operator selection.

Meanwhile, there are some challenging issues that must be resolved to successfully apply DRL to CMOPs. Since a CMOP contains constraints, it is necessary for a DRL model to consider constraint satisfaction and feasibility in the design of the state and reward. In addition, the effectiveness of an action (i.e., operator) on both handling constraints and optimizing objectives should be evaluated in training the DRL model.

To adopt DQL in CMOEAs, we need to develop a model that determines the state, action, reward, and learning procedure. Our proposed DQL model and the DQL-assisted CMOEA framework are elaborated on in the next section.

III. PROPOSED METHODS

    A. Proposed DRL Model

In this work, the evolutionary operators are regarded as actions; thus, the action set is

$$
A = \{op_1, op_2, \ldots, op_k\} \tag{6}
$$

where op_i is the i-th evolutionary operator, such as GA, DE, PSO, and CSO, and k is the number of candidate operators. It should be noted that, in our proposed model, any number of operators can be used as candidates.

Then, we define the state of a population by considering its performance in terms of convergence, diversity, and feasibility. The average sum of objective function values of the solutions in the population is used to evaluate the convergence of the population in the objective space, i.e., the approximation to the CPF (constrained Pareto front). Specifically, it is formulated as

$$
con = \frac{1}{N} \sum_{\mathbf{x} \in P} \sum_{j=1}^{m} f_j(\mathbf{x}) \tag{7}
$$

where P is the population, N is the population size, and f_j(x) is the j-th objective function value. When the population approximates the CPF, con becomes smaller. The objective functions are not normalized due to the following consideration: after normalization, all objectives would belong to [0, 1]; although the influence of different scales could thus be eliminated, con would then fail to represent the real distribution of the population.

The average CV value of the solutions in the population is used to estimate feasibility, i.e., the distribution of the population in the feasible/infeasible regions. Specifically, it is formulated as

$$
fea = \frac{1}{N} \sum_{\mathbf{x} \in P} \phi(\mathbf{x}) \tag{8}
$$

where φ(x) is the CV of x defined in (3). If all solutions of the population are in feasible regions, fea is zero; otherwise, it will be a large value if the population is located outside the feasible regions.

The sum of the scales (i.e., ranges) that the population covers in all objective dimensions is used to estimate the diversity of the population. Specifically, it is formulated as

$$
div = \sum_{j=1}^{m} \Big( \max_{\mathbf{x} \in P} f_j(\mathbf{x}) - \min_{\mathbf{x} \in P} f_j(\mathbf{x}) \Big) \tag{9}
$$

Then, we use these three components to form the state of the population:

$$
s = (con,\; fea,\; div)
$$

It uses con, fea, and div to reflect the current state of the population. In this case, whether an operator should be selected depends on a comprehensive evaluation of the current population state and of the operator's effectiveness in improving it.
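A sketch of how the state in (7)-(9) might be computed from a population's objective values and CVs follows; the array layout is an assumption.

```python
import numpy as np

def population_state(obj_matrix, cv_vector):
    """State (con, fea, div) of a population.
    obj_matrix: shape (N, m) objective values; cv_vector: shape (N,) overall CVs."""
    obj = np.asarray(obj_matrix, dtype=float)
    cv = np.asarray(cv_vector, dtype=float)
    con = obj.sum(axis=1).mean()                        # average sum of objectives (7)
    fea = cv.mean()                                     # average constraint violation (8)
    div = (obj.max(axis=0) - obj.min(axis=0)).sum()     # summed objective ranges (9)
    return np.array([con, fea, div])
```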

The reward is defined as the improvement of the population state from the current generation to the next, i.e., the decrease of con and fea and the increase of div, where con′, fea′, and div′ denote the state components of the population at the next generation. In this way, the effectiveness of a selected operator can be comprehensively evaluated in terms of convergence, diversity, and feasibility. It is necessary to note that these three terms are normalized when they are used as training/predicting data for the network.
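Since con and fea should decrease and div should increase as the population improves, one plausible scalar reward is the signed sum of the three improvements, sketched below; this unweighted aggregation is an assumption and is not claimed to be the paper's exact formula.

```python
def reward(state, next_state):
    """Improvement of the population state between two generations.
    state / next_state: (con, fea, div); lower con and fea and higher div are better.
    The unweighted sum is an illustrative assumption."""
    con, fea, div = state
    con_n, fea_n, div_n = next_state
    return (con - con_n) + (fea - fea_n) + (div_n - div)
```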

A record is formed as

$$
t = (con,\; fea,\; div,\; a,\; r,\; con',\; fea',\; div')
$$

and the EP is formed as

$$
EP = \{t_1, t_2, \ldots\}
$$

Such a data structure enables the EP to record the state, the selected operator, the reward, and the new state at each iteration. Thus, when training the network, the training data include all the items necessary to approximate the action-value function for the operator selection task.

Based on the above definitions, we develop a DQL model for operator selection in CMOEAs. The proposed model is shown in Fig. 2. This model contains four procedures:

1) Evolution Procedure: In the evolution procedure, the CMOEA generates offspring by the selected operator (determined by the DQN) and determines the solutions that survive to the next generation.

2) Interaction Procedure: In the interaction procedure, the agent and the environment exchange information and interact with each other. The agent gives the environment an action (i.e., an operator), and the environment generates feedback (i.e., a reward) for the agent from the evolution of the CMOEA.

Fig. 2. The illustration of the proposed DQL model.

Fig. 3. The flowchart of the proposed DQL-assisted CMOEA framework.

3) Learning Procedure: After the agent receives the feedback, it learns and improves the policy in the learning procedure. First, the record of the current iteration is added to the EP. Second, the DQN is trained based on training data sampled from the EP. The input of the DQN is the state and the action, i.e., the population state and the adopted operator at one iteration. The output of the DQN is the Q-value of adopting that operator at that population state.

4) Decision Procedure: After the DQN is trained, the agent can use it to decide which action to take given a population state. The decision procedure first estimates the Q-values of all actions; then, the action with the largest Q-value is selected.

The above four procedures execute in order at each iteration to achieve the DQL-assisted operator selection for the CMOEA.

    B. Proposed DQL-Assisted Framework

Based on the DQL model, we develop a DQL-assisted adaptive operator selection framework for CMOEAs. The flowchart of the proposed DQL-assisted CMOEA framework is presented in Fig. 3. The main steps are the same as those in a CMOEA (initialization, mating, and selection), except that the DQL-assisted framework contains two additional parts.

The blue part contains the operator selection process. In this process, the size of the EP needs to meet the requirement before the DQN is trained, because enough data are needed to train the network. If the EP does not yet contain enough records, we randomly select an operator at that iteration to give equal weight to each operator so that exploration is guaranteed. Otherwise, if the size of the EP meets the requirement but the DQN has not been built, the DQN is first built. If the DQN has already been trained, it is directly used to adaptively select an operator for the following steps.

After we obtain an adaptively selected operator, the offspring are generated based on the mating selection of the CMOEA, and the environmental selection of the CMOEA is conducted to determine the population for the next generation.

Then, the network update process (in yellow) is conducted. It contains two specific parts. At each iteration, the population state is detected and the record is calculated; afterward, the EP is updated by adding the record of this iteration. Every 50 iterations, the DQN is updated. The above steps continue until the termination condition is met.

Algorithm 1 The DQL-Assisted Framework for CMOEAs
Require: N (population size), Gmax (termination condition), k (number of operators), msep (maximum size of EP), rsep (required size of EP)
Output: P (the output solution set of the CMOEA)
1: Initialization of the CMOEA;
2: Determine the state of the population;
3: EP ← ∅ (initialize the EP);
4: g ← 0;
…

The pseudocode of the proposed DQL-assisted framework for CMOEAs is presented in Algorithm 1. The inputs include the population size, the termination condition (the maximum number of function evaluations), the maximum size of the EP, and the required size of the EP for training the DQN. The output is the same as the output solution set of the selected CMOEA. The main steps are consistent with the process in the flowchart. First, the initialization of the CMOEA is conducted to generate the initial population(s) for evolution (line 1). Then, the state of the initial population is determined (line 2) to prepare the first record. The EP is then initialized as empty (line 3), and the index of the current generation g is initialized as zero (line 4). In the main loop, the following steps are performed:

1) If the EP has not met the required size rsep, lines 7-11 are performed. First, an operator (denoted as the i-th operator) is randomly selected (line 7). Then, the offspring set O is generated based on the operator and the mating selection of the CMOEA (line 8). Afterward, the environmental selection of the CMOEA is performed to select the population P for the next generation (line 9). According to this iteration, the reward and the new state are determined and used to form a new record t (line 10). Then, the EP is updated using the record t (line 11). It should be noted that the EP is a queue that follows the first-in, first-out rule.

2) Otherwise, if the EP has met the required size rsep, lines 13-20 are performed. In the beginning, the DQN Q has not been built, so it is initialized using data from the EP by Algorithm 2 (line 14).

3) In the later iterations where the DQN has been built, lines 16-20 are performed. First, an operator is adaptively selected according to the current state by the DQL-assisted method presented in Algorithm 3 (line 16). Then, the offspring set O is generated by the CMOEA using the selected operator (line 17). Afterward, the population for the next generation is selected based on the environmental selection of the CMOEA (line 18). Finally, the new record is formed (line 19) and the EP is updated (line 20).

4) Once every 50 iterations, the DQN is updated using the EP by Algorithm 2 (line 25). This step guarantees that the DQN can approximate the recent environment.

Finally, the output solution set of the CMOEA is returned as the final solution set (line 25).
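A compact Python sketch of this main loop is given below. The CMOEA-specific routines (initialize, evaluate, mating, environmental_selection), the record layout, and the default values of msep, rsep, and eps are assumptions; population_state and reward refer to the earlier sketches, while train_dqn and select_operator are sketched in Sections III-C and III-D.

```python
import random
from collections import deque

def dql_assisted_cmoea(cmoea, operators, G_max, msep=200, rsep=100, eps=0.9):
    """Sketch of Algorithm 1: a DQL-assisted operator-selection loop wrapped
    around an existing CMOEA whose mating/environmental selection are reused."""
    pop = cmoea.initialize()                                   # line 1
    state = population_state(*cmoea.evaluate(pop))             # line 2
    ep = deque(maxlen=msep)                                    # line 3: EP is a FIFO queue
    dqn = None                                                 # line 4 onwards
    for g in range(G_max):
        if len(ep) < rsep:                                     # not enough records yet
            a = random.randrange(len(operators))               # random operator
        else:
            if dqn is None:
                dqn = train_dqn(ep)                            # build the DQN once
            a = select_operator(state, operators, dqn, eps)    # Algorithm 3
        offspring = cmoea.mating(pop, operators[a])            # generate offspring O
        pop = cmoea.environmental_selection(pop, offspring)    # next population P
        new_state = population_state(*cmoea.evaluate(pop))
        ep.append((state, a, reward(state, new_state), new_state))   # store record
        state = new_state
        if dqn is not None and (g + 1) % 50 == 0:              # periodic DQN update
            dqn = train_dqn(ep)
    return pop
```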

    C. Train/Update the DQN

In this work, we adopt a simple back-propagation neural network as the DQN. The detailed parameter settings of the adopted DQN architecture are listed in Table I. When the DQN is trained or updated, the procedure of Algorithm 2 is conducted.

The inputs include the experience replay EP and the required size of training data str. The output is a DQN Q. First, str records are sampled from the EP to form a set T as the training data (line 1). Then, the first four items of each record (i.e., the state s_t and the action a_t) are used as the input of the DQN (lines 2 and 3), while the fifth item (i.e., the reward r_t) is used as the output of the DQN (line 4). The last three items are used to train the DQN as the s_{t+1} in (5) (line 5). An important step is that the inputs and output are normalized to eliminate the influence of different scales so that the DQN can be accurately estimated (line 6). Then, the normalized training data are used to train the DQN with (4) as the loss function (line 7).
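Under the same assumptions as before, a minimal PyTorch sketch of this training procedure might look as follows; the batch size, number of epochs, learning rate, discount factor, and normalization scheme are illustrative choices, not the settings of Table I.

```python
import random
import numpy as np
import torch
import torch.nn as nn

def train_dqn(ep, net=None, str_size=64, gamma=0.9, epochs=200, num_actions=2):
    """Sketch of Algorithm 2: train/update a Type-I DQN (input: state & action,
    output: Q-value) on records (state, action, reward, next_state) from the EP."""
    if net is None:
        net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
    batch = random.sample(list(ep), min(str_size, len(ep)))          # sample T from EP
    s = torch.tensor(np.array([b[0] for b in batch]), dtype=torch.float32)
    a = torch.tensor([[float(b[1])] for b in batch])
    r = torch.tensor([float(b[2]) for b in batch])
    s2 = torch.tensor(np.array([b[3] for b in batch]), dtype=torch.float32)
    mu, sd = s.mean(0), s.std(0) + 1e-8            # normalize to remove scale effects
    s, s2 = (s - mu) / sd, (s2 - mu) / sd
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(epochs):
        with torch.no_grad():                      # target q_t from (5)
            q_next = torch.cat([net(torch.cat([s2, torch.full((len(batch), 1), float(k))], 1))
                                for k in range(num_actions)], 1)
            target = r + gamma * q_next.max(1).values
        pred = net(torch.cat([s, a], 1)).squeeze(1)
        loss = ((target - pred) ** 2).mean()       # loss (4)
        opt.zero_grad(); loss.backward(); opt.step()
    return net
```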

    D. Proposed OS Method

The OS method in this work is similar to the common ε-greedy action selection in reinforcement learning. The detailed pseudocode is presented in Algorithm 3. The inputs include the greedy probability ε, which controls whether the operator is selected by the DQN or at random, as well as the current population state s, the operator set A, and the DQN Q. The output is the selected operator a. First, a random number k in [0, 1] is generated (line 1). If k is less than the greedy probability ε, the DQN is used to select an operator (lines 3 and 4): all operators are evaluated through the DQN, and the one with the maximum estimated Q-value is adopted. If k is greater than ε, the index of the selected operator is randomly generated (line 6) to guarantee exploration. In this work, we use the GA and DE operators (i.e., their crossover and mutation strategies) as the candidate actions to instantiate the framework. The reasons for selecting these two operators are two-fold:

    1) First, GA and DE are the two most commonly used operators in existing CMOEAs;

    2) Second, GA can handle multimodality well and has a strong capability for convergence, while DE can handle linkages well and is good at exploring the decision space [9].

It is worth noting that before the EP meets the required size (i.e., line 6 in Algorithm 1), the operator is randomly selected to enhance exploration. In addition, the parameter ε allows some random selection to further enhance exploration.
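A minimal sketch of this ε-greedy selection with a Type-I DQN is given below; normalization of the queried state is omitted for brevity and the function signature is an assumption.

```python
import random
import torch

def select_operator(state, operators, net, eps=0.9):
    """Sketch of Algorithm 3: epsilon-greedy operator selection.
    Returns the index of the chosen operator."""
    if random.random() < eps:                        # greedy branch: query the DQN
        s = torch.tensor(state, dtype=torch.float32)
        q = [net(torch.cat([s, torch.tensor([float(k)])]).unsqueeze(0)).item()
             for k in range(len(operators))]
        return int(max(range(len(operators)), key=lambda k: q[k]))
    return random.randrange(len(operators))          # otherwise: random exploration
```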

    E. Computational Complexity

According to Algorithms 1-3, the computational complexity of our method is determined by three components: the selected CMOEA, the training of the DQN, and the calculation of the population state. The complexity of training the DQN (a back-propagation neural network) is O(4u^2), where u denotes the number of neurons. The time consumption of calculating the state is O(mnN). The time complexity of a CMOEA is usually between O(mN^2) and O(N^3) (e.g., CCMO [4]). Therefore, the overall computational complexity is determined by the selected CMOEA.

    F. Remarks

    The differences between our proposed operator selection method and existing adaptive operator selection methods are two-fold:

    1) Our method can apply to CMOPs because the design of the DQL model and the algorithmic framework both consider constraints and the feasibility of solutions, but existing methods are not applicable to CMOPs.

    2) Our method is based on DQL, which can evaluate the improvement of the population brought by the selected operator in the future, while existing methods can only evaluate the improvement according to historical evolution.

Although Tian et al. [17] also proposed a DQL-assisted operator selection method, our method differs from theirs in three aspects:

1) This work handles CMOPs, while the topic of [17] is unconstrained MOPs. Handling CMOPs is more difficult than handling MOPs due to the existence of constraints.

2) Our proposed DQL model is different from the model in [17]. In [17], the decision variables are regarded as the state, and the improvement of a single solution (offspring vs. parent solution) is regarded as the reward.

3) The method in [17] is a specific algorithm for MOPs, while our proposed framework can be embedded into any CMOEA.

    The differences between our methodology and existing applications of DRL in solving MOPs are as follows:

1) First, this work proposes a universal adaptive operator selection framework that can be used in any algorithm for CMOPs. By contrast, most existing applications focus on using DRL to solve a specific real-world application problem.

    2) Second, most existing applications of DRL in solving MOPs can only solve discrete MOPs, while our method can solve any form of CMOP as long as the embedded operators and the adopted CMOEA can handle discrete or continuous decision variables.

IV. EXPERIMENTAL STUDIES

This section presents the experimental studies. Section IV-A presents the detailed experimental settings. Section IV-B compares the DQL-assisted operator selection with the original CMOEAs and with random operator selection. Section IV-C gives the comparison studies between our method and state-of-the-art CMOEAs. Section IV-D shows the parameter analyses of the DRL-assisted operator selection technique and the adopted neural network. Section IV-E studies the effectiveness of using two performance indicators, instead of the proposed simplistic assessments in (7) and (9), to assess the population state.

    A. Experimental Settings

1) Benchmark Problems: In the experimental studies, we selected four challenging CMOP benchmark test suites to test our methods. The benchmarks include CF [38], DAS-CMOP [39], DOC [40], and LIR-CMOP [41]. The main difficulties and challenges of these benchmarks are summarized in Table S-II in the Supplementary File.

2) Algorithms in Comparison: In this work, we embed our proposed DQL-assisted operator selection framework into four existing CMOEAs: CCMO [4], MOEA/D-DAE [27], EMCMO [7], and PPS [26]. In the comparison studies, we selected nine state-of-the-art CMOEAs for comparison, including methods using different operators: c-DPEA [42], ToP [40], CMOEA-MS [5], BiCo [43], MFOSPEA2 [6], ShiP-A [44], DSPCMDE [45], NSGA-II-ToR [8], and CCEA [46].

3) Parameter Settings and Genetic Operators: For algorithms that use GA as the operator, the simulated binary crossover (SBX) and polynomial mutation (PM) [47] were used with the following parameter settings:

a) Crossover probability pc = 1; distribution index ηc = 20;

b) Mutation probability pm = 1/n; distribution index ηm = 20.

For algorithms that use DE as the operator, the parameters CR and F of the DE operator were set to 1 and 0.5, respectively.
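For reference, simplified implementations of the two candidate operators with these parameter settings might look as follows; boundary handling and parent selection are deliberately simplified, and the specific variants (a single SBX child, DE/rand/1/bin) are assumptions.

```python
import numpy as np

def sbx_pm(p1, p2, lb, ub, pc=1.0, eta_c=20, eta_m=20, rng=np.random):
    """GA operator: simulated binary crossover + polynomial mutation (pm = 1/n)."""
    n = len(p1)
    child = p1.copy()
    if rng.rand() < pc:
        u = rng.rand(n)
        beta = np.where(u <= 0.5, (2 * u) ** (1 / (eta_c + 1)),
                        (2 * (1 - u)) ** (-1 / (eta_c + 1)))
        child = 0.5 * ((1 + beta) * p1 + (1 - beta) * p2)
    mask = rng.rand(n) < 1.0 / n                 # mutate each variable with prob. 1/n
    u = rng.rand(n)
    delta = np.where(u < 0.5, (2 * u) ** (1 / (eta_m + 1)) - 1,
                     1 - (2 * (1 - u)) ** (1 / (eta_m + 1)))
    child = np.where(mask, child + delta * (ub - lb), child)
    return np.clip(child, lb, ub)

def de_rand_1(x, r1, r2, r3, lb, ub, CR=1.0, F=0.5, rng=np.random):
    """DE operator: DE/rand/1/bin variation with CR = 1 and F = 0.5."""
    mutant = r1 + F * (r2 - r3)
    cross = rng.rand(len(x)) < CR
    return np.clip(np.where(cross, mutant, x), lb, ub)
```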

The evolutionary settings and parameters of our methods are presented in Table S-III in the Supplementary File. The explanations for these settings are also presented in Section IV of the Supplementary File. All other parameter settings of the comparison methods were the same as in their original literature (i.e., the default settings in PlatEMO).

4) Performance Indicators: The inverted generational distance based on modified distance calculation (IGD+) [48] and the hypervolume (HV) [48] were adopted as indicators to evaluate the performance of the different algorithms. We used two Pareto-compliant indicators to achieve a sound and fair comparison [49]. Detailed information on these indicators can be found in the Supplementary File. The value NaN means that an algorithm cannot find a feasible solution or that the final solution set is far from the CPF.

5) Statistical Analysis: Each algorithm was executed for 30 independent runs on each test instance. The mean and standard deviation values of IGD+ and HV were recorded. The Wilcoxon rank-sum test with a significance level of 0.05 was used to perform the statistical analysis using the KEEL software [50]. "+", "-", and "=" indicate that the result of another algorithm is significantly better than, significantly worse than, or statistically similar to that obtained by our methods according to the Wilcoxon test, respectively. All NaN values of HV and IGD+ are replaced by zero and 100, respectively.
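The authors performed the statistical analysis with the KEEL software; purely as an illustration, an equivalent rank-sum comparison of two algorithms' 30-run results could be done in Python as follows (the data here are synthetic).

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
igd_a = rng.normal(0.05, 0.01, 30)   # 30 runs of algorithm A (synthetic IGD+ values)
igd_b = rng.normal(0.06, 0.01, 30)   # 30 runs of algorithm B (synthetic IGD+ values)

stat, p = ranksums(igd_a, igd_b)     # Wilcoxon rank-sum test
if p >= 0.05:
    mark = "="                        # statistically similar
else:
    mark = "+" if igd_a.mean() < igd_b.mean() else "-"   # lower IGD+ is better
print(mark, p)
```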

    B. On the Effectiveness of DQL-Assisted OS

In the first part of the experiments, we compare the CMOEAs embedded with the DQL-assisted operator selection method against the original CMOEAs using a fixed operator and against the CMOEAs using random operator selection, to verify the effectiveness of the DQL-assisted operator selection method. The results in terms of HV and IGD+ on the four benchmarks are presented in Tables S-V to S-XII in the Supplementary File.

For the CFs, it can be found that DRLOS-CCMO and DRLOS-EMCMO significantly outperformed the corresponding methods using a fixed or random operator. However, CCMO and EMCMO perform better or at least competitively on the three-objective CF8-10, revealing that for these instances with a larger objective space, the GA operators can enhance convergence. As for MOEA/D-DAE and PPS, using randomly and adaptively selected operators outperformed using a fixed operator. For the DAS-CMOPs, the superiority of CCMO and EMCMO on DAS-CMOP4-8 reveals that these instances prefer the GA operator; on the contrary, DAS-CMOP1-3 and DAS-CMOP9 prefer the DE operator. However, the adaptive operator selection method can better handle DAS-CMOP1-3 and DAS-CMOP9 than PPS using the DE operator, revealing that the DQL-assisted operator selection can learn to decide which operator to use according to the population state during evolution. For the DOCs, the results are similar to those of the CFs. DRLOS-CCMO and DRLOS-EMCMO performed significantly better than the other methods, revealing that DOC prefers the DE operator and that the adaptive selection method can accurately determine DE as the operator. Since PPS uses DE as the operator, its performance on the DOCs is not significantly influenced by operator selection. The results between RandOS and DRLOS also reveal that using DRL to adaptively select operators during evolution performs better than random selection. For the LIR-CMOPs, it is apparent that the proposed OS method can significantly improve the performance of all CMOEAs. Nevertheless, the results show that LIR-CMOP13-14 prefer GA as the operator.

To better understand the performance, we conduct the Friedman test with Holm correction at a significance level of 0.05 on all results. The average rankings and p-values are summarized in Table II. It can be found that the DQL-assisted operator selection method outperforms both the fixed operator and random operator selection. For all four CMOEAs, using our proposed DQL-assisted operator selection method can improve their performance.

In summary, our proposed DQL-assisted OS method can improve the performance of these CMOEAs. The DQL-assisted operator selection method can better determine the operator based on the population state compared to using a fixed or randomly selected operator.

    C. Comparison Studies

After the verification of the proposed DQL-assisted OS method, we further compare the DRLOS-EMCMO algorithm with nine state-of-the-art CMOEAs to study its algorithmic performance. The statistical results on the four benchmarks are presented in Tables III-V in this file and Tables S-XIII to S-XVIII in the Supplementary File.

In general, DRLOS-EMCMO outperforms the other CMOEAs on these challenging CMOPs, although it performed worse on some instances. Among the selected CMOPs, some instances rely on a particular operator. For example, the results show that DAS-CMOP4-8 and LIR-CMOP13-14 need the GA operator because of their multimodal feature. Besides, some instances of the CF test suite need the DE operator due to the linkage between variables. However, since the computational resources are spread over all operators in the learning stage of DRLOS-EMCMO, fewer resources are left for the specifically needed operator in dealing with these problems, and thus it performs worse. Additionally, it can be found that LIR-CMOP1-4, the CMOPs with extremely small feasible regions, require some constraint relaxation technique, such as the penalty function in c-DPEA [42] and ShiP-A [3], or the ε-constraint in DSPCMDE [45] and CCEA [46]. However, EMCMO does not contain a constraint relaxation technique, and thus it performs worse on LIR-CMOP1-4. Except for these specific problems, DRLOS-EMCMO generally performs better than the other CMOEAs.

    TABLE III STATISTICAL RESULTS OF IGD+ OBTAINED BY DRLOS-EMCMO AND OTHER METHODS ON DOC BENCHMARK PROBLEMS

    TABLE IV STATISTICAL RESULTS OF HV OBTAINED BY DRLOS-EMCMO AND OTHER METHODS ON LIR-CMOP BENCHMARK PROBLEMS.

Fig. 4. The final solution sets obtained by DRLOS-EMCMO and other methods on DAS-CMOP9 with the median IGD+ value among 30 runs.

We depict the final solution sets obtained by DRLOS-EMCMO and the other CMOEAs on CF1, DAS-CMOP9, DOC2, and LIR-CMOP8 in Figs. S-1 to S-3 in the Supplementary File and Fig. 4 in this file. The results clearly show that DRLOS-EMCMO can approximate the CPF and obtain an even distribution on every instance. For DOC2, it can be found that only DRLOS-EMCMO converges to the two segments of the CPF, revealing that using GA alone performs worse than using an adaptively selected operator. As for LIR-CMOP8, it can be found that DSPCMDE, which adopts DE as the operator, also converges to the CPF; however, some dominated and extreme solutions remain, revealing that when GA can be adaptively selected, convergence can be further enhanced. Similarly, for DAS-CMOP9, DSPCMDE finds fewer segments of the CPF than DRLOS-EMCMO, demonstrating that when GA can be used during evolution, the population can more effectively converge to the CPF and is less likely to get trapped in local optima.

Additionally, we depict the convergence profiles of DRLOS-EMCMO and the other CMOEAs on CF4, DAS-CMOP1, DOC7, and LIR-CMOP6 in Fig. 5. It can be seen that DRLOS-EMCMO not only achieves a faster convergence speed but also obtains a better final indicator value.

Fig. 5. The convergence profiles of IGD+ for DRLOS-EMCMO and other methods on CF4, DAS-CMOP1, DOC7, and LIR-CMOP6 with the median IGD+ values among 30 runs.

    In summary, DRLOS-EMCMO outperforms these CMOEAs using a fixed operator, revealing that the DQL-assisted operator selection can achieve better versatility on different CMOPs.

    D. Parameter Analyses

In the first part of the parameter analyses, we change the required size of the EP and the batch size to test their influence on training the DQN. We also change the greedy threshold to test the robustness of our method. The statistical results are presented in Tables S-XIX to S-L in the Supplementary File. It can be found that these parameters have little influence on the performance across all benchmark problems, demonstrating that our method is not parameter sensitive. Therefore, the current parameter settings are applicable when applying the DQL-assisted operator selection method to a new CMOEA.

In the second part of the parameter analyses, we test different settings of important parameters of the DQN, including the learning rate, the decay of the learning rate, the number of iterations in training, and the number of nodes in each hidden layer, to study their influence. Detailed explanations of the variants are reported in Table S-IV in the Supplementary File. In this part, only DRLOS-EMCMO is adopted to save space. The statistical results of HV and IGD+ are reported in Tables S-L to S-LVIII in the Supplementary File. On all benchmark problems, the different parameter settings have little or no significant influence on DRLOS-EMCMO, revealing that the DQN adopted in this work is not parameter sensitive.

    E. Ablation Studies on Assessing Population State Using Indicators

In this part, we adopt two indicators, HV [48] and Spacing [51], to estimate the population state. These two indicators are selected because they do not need prior knowledge of the true PF. Compared to the items used in (7) and (9), HV and Spacing are two more sophisticated indicators that can provide a more concise estimation of the population state: the HV indicator can evaluate convergence and diversity, while the Spacing indicator can evaluate diversity.

    The results are reported in Tables S-LIX and S-LX in the Supplementary File, according to which the following conclusions can be found.

1) For CCMO and EMCMO, using simplistic evaluations as in (7) and (9) performs better when dealing with DAS-CMOP1-3 and the DOCs, revealing that a sophisticated evaluation can easily lead to local optima because these instances need not only the GA operators but also other operators. In contrast, using the HV and Spacing indicators performs better on DAS-CMOP4-9, which prefer GA. The indicators provide a more concise estimation of the dynamics of the population, allowing the algorithm to determine when to select the GA operators.

    2) For MOEA/D-DAE, the choice of simplistic or sophisticated evaluations has no significant influence on the performance.

    3) For PPS, using sophisticated evaluations performs significantly worse on instances with no preference for operators of GA and performs better on some instances with a preference for operators of GA, which further demonstrates that a simplistic evaluation has better versatility.

V. CONCLUSIONS AND FUTURE WORK

In this article, we propose a DQL-assisted online operator selection method for CMOPs, filling the research gap of operator selection in CMOPs and introducing DRL techniques to CMOPs. We develop a DQL model that uses the population's convergence, diversity, and feasibility as the state, the operators as actions, and the improvement of the population state as the reward. Based on this DQL model, we develop a versatile and easy-to-use DQL-assisted operator selection framework that can contain any number of operators and be embedded into any CMOEA. We embedded the proposed method into four existing CMOEAs. Experimental studies have demonstrated that the proposed adaptive operator selection method is effective, and the resulting algorithm outperformed nine state-of-the-art CMOEAs.

Nevertheless, some issues regarding the current study must be addressed. According to the experimental results, the DRL-assisted CMOEAs perform worse when a CMOP prefers a specific operator. Therefore, it is necessary to improve the learning efficiency so that a more accurate DRL model can be trained with fewer computational resources to determine the desired operators. Moreover, increasing the maximum number of function evaluations in the experiments may improve the numerical performance of the DRL-assisted CMOEAs, which is also true when the proposed method is applied to real-world problems.

    In the future, the following directions are worth trying:

    1) Some other operators can be embedded to extend this framework to solve other kinds of MOPs.For example, the CSO [13] or other enhanced operators [52] can be embedded to solve large-scale CMOPs [53].In addition, some advanced learning-based optimizers such as switching particle swarm optimizers [54], [55] can be used as actions to enhance performance.

    2) Advanced neural networks [56] can be employed as the DQN to see if they can enhance the performance of the proposed DQL-assisted operator selection method.

    3) Moreover, hyperparameter adaptation [57] is necessary for future study since there are many hyperparameters that need to be adjusted.

    The codes of the methods in this work can be obtained from the authors upon request.
