
    A Lyapunov characterization of robust policy optimization

2023-11-16 10:12:32 Leilei Cui, Zhong-Ping Jiang
Control Theory and Technology, 2023, No. 3

    Leilei Cui·Zhong-Ping Jiang

Abstract In this paper, we study the robustness property of policy optimization (particularly the Gauss–Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions of the upper bound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated by the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.

Keywords Policy optimization · Policy iteration (PI) · Input-to-state stability (ISS) · Lyapunov's direct method

    1 Introduction

Through reinforcement learning (RL) techniques, agents can iteratively minimize a specific cost function by interacting continuously with an unknown environment. Policy optimization is fundamental for the development of RL algorithms, as introduced in [1]. Policy optimization first parameterizes the control policy, and then the performance of the control policy is iteratively improved by updating the parameters along the gradient descent direction of the given cost function. Since the linear quadratic regulator (LQR) problem is tractable and widely applied in many engineering fields, it provides an ideal benchmark example for the theoretical analysis of policy optimization. For the LQR problem, the control policy is parameterized by a control gain matrix, and the gradient of the quadratic cost with respect to the control gain is associated with a Lyapunov matrix equation. Based on these results, various policy optimization algorithms, including vanilla gradient descent, natural gradient descent and Gauss–Newton methods, have been developed in [2–5]. Compared with other policy optimization algorithms with a linear convergence rate, the control policies generated by the Gauss–Newton method converge quadratically to the optimal solution.

It is noticed that the Gauss–Newton method with a step size of 1/2 coincides with the policy iteration (PI) algorithm [6, 7], which is an important iterative algorithm in RL and adaptive/approximate dynamic programming (ADP) [1, 8, 9]. From the perspective of PI, the Lyapunov matrix equation for computing the gradient can be considered as policy evaluation. The update of the policy along the gradient direction can be interpreted as policy improvement. The steps of policy evaluation and policy improvement are iterated in turn to find the optimal solution of LQR. Various PI algorithms have been proposed for important classes of linear/nonlinear/time-delay/time-varying systems for optimal stabilization and output tracking [10–14]. In addition, PI has been successfully applied to sensorimotor control [15] and autonomous driving [16, 17].

The convergence of the PI algorithm is ensured under the assumption that accurate knowledge of the system model is accessible. However, in reality, either the system model obtained by system identification [18] is used for the PI algorithm, or the PI algorithm is directly implemented through a data-driven approach using input-state data [10, 19–22]. Consequently, the PI algorithm can hardly be implemented exactly due to modeling errors, inaccurate state estimation, measurement noise, and unknown system disturbances. The robustness of the PI algorithm to such unavoidable noise is an important property to be investigated, and it lays a foundation for better understanding RL algorithms. There are several challenges in studying the robustness of the PI algorithm. Firstly, the nonlinearity of the PI algorithm makes it hard to analyze the convergence property. Secondly, it is difficult to quantify the influence of noise, since noise may destroy the monotonic convergence property of the PI algorithm or even result in a destabilizing controller.

In this paper, we study the robustness of the PI algorithm in the presence of noise. The contributions are summarized as follows. Firstly, by viewing the PI algorithm as a nonlinear system and invoking the concept of input-to-state stability (ISS) [23], particularly the small-disturbance ISS [24, 25], we investigate the robustness of the PI algorithm under the influence of noise. It is demonstrated that, when subject to noise, the control policies generated by the PI algorithm will eventually converge to a small neighborhood of the optimal solution of LQR as long as the noise is sufficiently small. Different from [24, 25], where the analysis is trajectory-based, we directly utilize Lyapunov's direct method to analyze the convergence of the PI algorithm under disturbances. As a result, an explicit expression of the upper bound on the noise is provided. The size of the neighborhood in which the control policies will ultimately stay is shown to be a quadratic function of the noise. Secondly, by utilizing Willems' fundamental lemma, a learning-based PI algorithm is proposed. Compared with the conventional learning-based control approach, where the exploratory control input is hard to design such that the persistent excitation condition is satisfied [24], the persistently exciting exploratory signal of the proposed method can be easily designed by checking the rank condition of a Hankel matrix related to the exploration signal. Finally, based on the small-disturbance ISS property of the PI algorithm, we demonstrate that the proposed learning-based PI algorithm is robust to state measurement noise and unknown system disturbances.

The remaining contents of the paper are organized as follows. Section 2 reviews the LQR problem and the celebrated PI algorithm. In Sect. 3, the small-disturbance ISS property of the PI algorithm is studied. Section 4 proposes a learning-based PI algorithm, and the robustness of the algorithm is analyzed. Several numerical examples are given in Sect. 5, followed by some concluding remarks in Sect. 6.

    2 Preliminaries and problem formulation

    2.1 Policy iteration for discrete-time LQR

The discrete-time linear time-invariant (LTI) system is represented as

x_{k+1} = A x_k + B u_k,  k ∈ Z_+,    (1)

where x_k ∈ R^n and u_k ∈ R^m are the state and control input, respectively; A and B are system matrices with compatible dimensions.

Assumption 1 The pair (A, B) is controllable.

Under Assumption 1, the discrete-time LQR problem is to minimize the following accumulative quadratic cost

J_d(x_0, u) = Σ_{k=0}^∞ (x_k^T Q x_k + u_k^T R u_k),    (2)

where Q = Q^T ≻ 0 and R = R^T ≻ 0. The optimal controller of the discrete-time LQR is

u_k^* = -(R + B^T V^* B)^{-1} B^T V^* A x_k,    (3)

where V^* = (V^*)^T ≻ 0 is the unique solution to the following discrete-time algebraic Riccati equation (ARE)

V^* = A^T V^* A + Q - A^T V^* B (R + B^T V^* B)^{-1} B^T V^* A.    (4)

For a stabilizing control gain L ∈ R^{m×n}, the corresponding cost in (2) is J_d(x_0, -Lx) = x_0^T V_L x_0, where V_L = (V_L)^T ≻ 0 is the unique solution of the following Lyapunov equation

V_L = (A - BL)^T V_L (A - BL) + Q + L^T R L,    (5)

and the function G(·): S^n → S^{n+m} is defined as

The discrete-time PI algorithm was developed in [7] to iteratively solve the discrete-time LQR problem. Given an initial stabilizing control gain L_0, the discrete-time PI algorithm is represented as:

    Procedure 1 (Exact PI for discrete-time LQR)

1. Policy evaluation: get G(V_i) by solving

2. Policy improvement: get the improved policy by

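To make Procedure 1 concrete, the following Python sketch implements exact policy iteration for discrete-time LQR under the standard reading of the two steps above: policy evaluation solves the Lyapunov equation (5) at the current gain L_i, and policy improvement sets L_{i+1} = (R + B^T V_i B)^{-1} B^T V_i A. Since (7) and (8) are not reproduced here, treat this as an illustrative sketch rather than a verbatim transcription of the paper's equations.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def exact_pi_discrete(A, B, Q, R, L0, num_iters=20):
    """Sketch of Procedure 1: exact PI for discrete-time LQR.

    L0 must be stabilizing, i.e., A - B @ L0 is Schur.
    """
    L = L0
    for _ in range(num_iters):
        # Policy evaluation: solve V = (A - B L)^T V (A - B L) + Q + L^T R L.
        Acl = A - B @ L
        V = solve_discrete_lyapunov(Acl.T, Q + L.T @ R @ L)
        # Policy improvement: L <- (R + B^T V B)^{-1} B^T V A.
        L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    return L, V
```

Starting from any stabilizing L_0, the iterates move monotonically toward the LQR solution, in line with the convergence property recalled below.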
    The monotonic convergence property of the discrete-time PI is shown in the following lemma.

    2.2 Policy iteration for continuous-time LQR

Consider the continuous-time LTI system

ẋ(t) = A x(t) + B u(t),  x(0) = x_0,    (9)

where x(t) ∈ R^n is the state; u(t) ∈ R^m is the control input; x_0 is the initial state; A and B are constant matrices with compatible dimensions. The cost of system (9) is

J_c(x_0, u) = ∫_0^∞ (x^T(t) Q x(t) + u^T(t) R u(t)) dt.    (10)

Under Assumption 1, the classical continuous-time LQR aims at computing the optimal control policy as a function of the current state such that J_c(x_0, u) is minimized. The optimal control policy is

u^*(t) = -K^* x(t) = -R^{-1} B^T P^* x(t),    (11)

where P^* = (P^*)^T ≻ 0 is the unique solution of the continuous-time ARE [26]:

A^T P^* + P^* A + Q - P^* B R^{-1} B^T P^* = 0.    (12)

For a stabilizing control gain K ∈ R^{m×n}, the corresponding cost in (10) is J_c(x_0, -Kx) = x_0^T P_K x_0, where P_K = (P_K)^T ≻ 0 is the unique solution of the following Lyapunov equation

(A - BK)^T P_K + P_K (A - BK) + Q + K^T R K = 0,    (13)

and the function M(·): S^n → S^{n+m} is defined as

Given an initial stabilizing control gain K_0, the celebrated continuous-time PI developed in [6] iteratively solves the continuous-time LQR problem. The continuous-time PI algorithm is represented as:

    Procedure 2 (Exact PI for continuous-time LQR)

1. Policy evaluation: get M(P_i) by solving

2. Policy improvement: get the improved policy by

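Procedure 2 is Kleinman's classical algorithm. A minimal Python sketch under the standard reading (policy evaluation via the Lyapunov equation (13) at K_i, policy improvement K_{i+1} = R^{-1} B^T P_i) is given below; the exact displays (15) and (16) involving M(·) are not reproduced above, so this is an illustration of the iteration rather than a literal transcription.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def exact_pi_continuous(A, B, Q, R, K0, num_iters=20):
    """Sketch of Procedure 2: exact PI (Kleinman's algorithm) for continuous-time LQR.

    K0 must be stabilizing, i.e., A - B @ K0 is Hurwitz.
    """
    K = K0
    for _ in range(num_iters):
        # Policy evaluation: (A - B K)^T P + P (A - B K) + Q + K^T R K = 0.
        Acl = A - B @ K
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B^T P.
        K = np.linalg.solve(R, B.T @ P)
    return K, P
```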
Given an initial stabilizing control gain K_0, by iteratively solving (15) and (16), P_i monotonically converges to P^* and (A - BK_i) is Hurwitz, which is formally presented in the following lemma.

    2.3 Problem formulation

For the discrete-time and continuous-time PI algorithms, accurate knowledge of the model (A, B) is required for implementation. The convergence of the PI algorithms in Lemmas 1 and 2 is based on the assumption that the accurate system model is attainable. However, in reality, system uncertainties are unavoidable, and the PI algorithms cannot be implemented exactly. Therefore, in this paper, we investigate the following problem.

Problem 1 When the policy evaluation and improvement steps of the PI algorithms are subject to noise, will the convergence properties in Lemmas 1 and 2 still hold?

    3 Robustness analysis of policy iteration

In this section, we will formally introduce the inexact PI algorithms for the discrete-time and continuous-time LQR in the presence of noise. By invoking the concept of input-to-state stability [23], it is rigorously shown that the optimized control policies converge to a neighborhood of the optimal control policy, and the size of the neighborhood depends on the magnitude of the noise.

    3.1 Robustness analysis of discrete-time policy iteration

According to the exact discrete-time PI algorithm in (7) and (8), in the presence of noise, the steps of policy evaluation and policy improvement cannot be implemented accurately, and the inexact PI algorithm is as follows.

Procedure 3 (Inexact PI for discrete-time LQR)

1. Inexact policy evaluation: get Ĝ_i ∈ S^{n+m} as an approximation of G(V̂_i), where V̂_i is the solution of

2. Inexact policy improvement: get the improved policy by

Remark 1 The noise ΔG_i can be caused by various factors. For example, in data-driven control [24], the matrix G(V̂_i) is identified from the collected input-state data. Since noise possibly pollutes the collected data, Ĝ_i, instead of G(V̂_i), is obtained. Other factors that may cause ΔG_i include inaccurate system identification, the residual error of numerically solving the Lyapunov equation, and the approximate values of Q and R in inverse optimal control in the absence of exact knowledge of the cost function.
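To see what Procedure 3 looks like numerically, one can corrupt the result of policy evaluation at every iteration before improving the policy. The sketch below does exactly that, with a random symmetric perturbation of fixed spectral norm standing in for the effect of ΔG_i; this is an illustrative assumption, since (17)-(19) and the precise structure of Ĝ_i are not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def inexact_pi_discrete(A, B, Q, R, L0, noise_norm=0.2, num_iters=20, seed=0):
    """Illustrative inexact PI: the policy-evaluation result is corrupted
    by a random symmetric perturbation of spectral norm `noise_norm`."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    L = L0
    for _ in range(num_iters):
        # Exact policy evaluation for the current gain.
        Acl = A - B @ L
        V = solve_discrete_lyapunov(Acl.T, Q + L.T @ R @ L)
        # Emulate the noise Delta G_i as a perturbation of the evaluated cost matrix.
        D = rng.standard_normal((n, n))
        D = (D + D.T) / 2
        D *= noise_norm / np.linalg.norm(D, 2)
        V_hat = V + D
        # Inexact policy improvement using the perturbed evaluation.
        L = np.linalg.solve(R + B.T @ V_hat @ B, B.T @ V_hat @ A)
    return L
```

With small noise, the iterates hover in a neighborhood of the optimal gain instead of converging exactly, which is the behavior quantified by the small-disturbance ISS analysis that follows.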

Next, by considering the inexact PI as a nonlinear dynamical system with the state V̂_i, we analyze its robustness to the noise ΔG_i by Lyapunov's direct method, in the sense of small-disturbance ISS. For any stabilizing control gain L, define the candidate Lyapunov function as

V_d(V_L) = Tr(V_L) - Tr(V^*),    (20)

where V_L = (V_L)^T ≻ 0 is the solution of (5). Since V_L ≥ V^* (obtained by Lemma 1), we have

Remark 2 Since J_d(x_0, -Lx) = x_0^T V_L x_0, when x_0 ~ N(0, I_n), E_{x_0} J_d(x_0, -Lx) = Tr(V_L). Hence, the candidate Lyapunov function in (20) can be considered as the difference between the value function of the controller u = -Lx and the optimal value function.
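The expectation identity invoked in Remark 2 is a direct consequence of the linearity and cyclic property of the trace; spelled out,

```latex
\mathbb{E}_{x_0} J_d(x_0,-Lx)
  = \mathbb{E}_{x_0}\big[x_0^{\top} V_L x_0\big]
  = \mathbb{E}_{x_0}\operatorname{Tr}\!\big(V_L x_0 x_0^{\top}\big)
  = \operatorname{Tr}\!\big(V_L\,\mathbb{E}_{x_0}\big[x_0 x_0^{\top}\big]\big)
  = \operatorname{Tr}(V_L),
  \qquad x_0 \sim \mathcal{N}(0, I_n).
```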

For any h > 0, define a sublevel set L_h = {L ∈ R^{m×n} | (A - BL) is Schur, V_d(V_L) ≤ h}. Since V_L is continuous with respect to the stabilizing control gain L, it readily follows that L_h is compact. Before presenting the main theorem on the robustness of Procedure 3, we introduce the following instrumental lemma, which provides an upper bound on V_d(V_L).

Lemma 3 For any stabilizing control gain L, let L′ = (R + B^T V_L B)^{-1} B^T V_L A and E_L = (L′ - L)^T (R + B^T V_L B)(L′ - L). Then,

where

Proof We can rewrite (4) as


In addition, it follows from (5) that

Subtracting (24) from (25) yields

Since (A - BL^*) is Schur, it follows from [27, Theorem 5.D6] that

Taking the trace of (27) and using the main result of [28], we have

Hence, the proof is completed. ■

Lemma 4 For any L ∈ L_h,

Proof Since (A - BL) is Schur, it follows from (5) and [27, Theorem 5.D6] that

Hence, (29) readily follows from (31). ■

The following lemma shows that the sublevel set L_h is invariant as long as the noise ΔG_i is sufficiently small.

Proof Induction is applied to prove the statement. When i = 0, L̂_0 ∈ L_h. Suppose L̂_i ∈ L_h; then, by [27, Theorem 5.D6], we have V̂_i ≻ 0. We can rewrite (17) as

Considering (19), we have

In addition, it follows from (19) that

Plugging (33) and (34) into (32), and completing the squares, we have

    where

    If

    it is guaranteed that

Writing down (17) at the (i+1)th iteration, and subtracting it from (35), we have

Following [27, Theorem 5.D6], we have

Taking the trace of (40), and using Lemma 3 and the result in [28], we have

It follows from (41), Lemma 4 and [28] that

where b_2 is defined as

Hence, if

    it is guaranteed that

Now, by Lyapunov's direct method and by viewing Procedure 3 as a discrete-time nonlinear system with the state V̂_i, it is shown that V̂_i converges to a small neighborhood of the optimal solution as long as the noise is sufficiently small.

Lemma 6 For any h > 0 and L̂_0 ∈ L_h, if

there exist a K-function ρ(·) and a KL-function κ(·,·) such that

Proof Repeating (42) for i, i-1, ..., 0, we have

Considering (21), it follows that

    The proof is thus completed.■

The small-disturbance ISS property of Procedure 3 is shown in the following theorem.

    Consequently,

    3.2 Robustness analysis of continuous-time policy iteration

According to the exact PI for continuous-time LQR in (15) and (16), in the presence of noise, the steps of policy evaluation and policy improvement cannot be implemented accurately, and the inexact PI is as follows.

Procedure 4 (Inexact PI for continuous-time LQR)

1. Inexact policy evaluation: get M̂_i ∈ S^{n+m} as an approximation of M(P̂_i), where P̂_i is the solution of

2. Inexact policy improvement: get the updated control gain by

For any stabilizing control gain K, define the candidate Lyapunov function as

V_c(P_K) = Tr(P_K) - Tr(P^*),    (57)

where P_K = (P_K)^T ≻ 0 is the solution of (13), i.e.

Since P_K ≥ P^*, we have

For any h > 0, define the sublevel set K_h = {K ∈ R^{m×n} | (A - BK) is Hurwitz, V_c(P_K) ≤ h}. Since P_K is continuous with respect to the stabilizing control gain K, the sublevel set K_h is compact.

    The following lemmas are instrumental for the proof of the main theorem.

Proof The Taylor expansion of e^{Dt} is

    Hence,

Pick a v ∈ R^n which satisfies . Then,

Hence, the lemma follows readily from (63). ■

The following lemma presents an upper bound on the Lyapunov function V_c(P_K).

Lemma 8 For any stabilizing control gain K, let K′ = R^{-1} B^T P_K, where P_K = (P_K)^T ≻ 0 is the solution of (58), and E_K = (K′ - K)^T R (K′ - K). Then,

Proof Rewrite ARE (12) as

Furthermore, (58) is rewritten as

Subtracting (65) from (66) yields

Considering K′ = R^{-1} B^T P_K and completing the squares in (67), we have

Since (A - BK^*) is Hurwitz, by (68) and [27, equation (5.18)], we have

Taking the trace of (69), considering the cyclic property of the trace and [28], we have (64). ■

Lemma 9 For any K ∈ K_h,

Proof Since A - BK is Hurwitz, it follows from (58) that

Taking the trace of (71), and considering [28], we have

    The proof is hence completed.■

Writing down (54) for the (i+1)th iteration, and subtracting it from (75), we have

Since (A - BK̂_{i+1}) is Hurwitz (e(h) ≤ e_1(h)), it follows from [27, equation (5.18)] and (78) that

Taking the trace of (79), and considering [28] and Lemma 9, we have

It follows from (80) and Lemmas 7 and 8 that

Taking the expression of K̂_{i+1} into consideration, we have

Plugging (82) into (81), it follows that if

    we have

Proof It follows from Lemma 10, (81) and (82) that for any i ∈ Z_+,

Repeating (86) for i, i-1, ..., 1, 0, we have

By (59), we have

Hence, (85) follows readily. ■

    With the aforementioned lemmas, we are ready to propose the main result on the robustness of the inexact PI for continuous-time LQR.

Proof Suppose K̂_i ∈ K_h. If

(ΔM_{uu,i} + M_{uu}(P̂_i)) is invertible. It follows from (56) and

Remark 3 Compared with [24, 25], in Theorems 1 and 2, explicit expressions of the upper bounds on the small disturbance are given such that, at each iteration, the generated control gain is stabilizing and contained in the sublevel set L_h or K_h. In addition, it is observed from (48) and (88) that the control gains generated by the inexact PI algorithms ultimately converge to a neighborhood of the optimal solution, and the size of the neighborhood is proportional to a quadratic function of the noise.

    4 Learning-based policy iteration

In this section, based on the robustness property of the inexact PI in Procedure 3, we will develop a learning-based PI algorithm. Only the input-state trajectory data measured from the system is required for the algorithm.

    4.1 Algorithm development

For a signal u_{[0,N-1]} = [u_0, u_1, ..., u_{N-1}], its Hankel matrix of depth l is represented as

Definition 1 An input signal u_{[0,N-1]} is persistently exciting (PE) of order l if the Hankel matrix H_l(u_{[0,N-1]}) has full row rank.

Lemma 12 [31] Let an input signal u_{[0,N-1]} be PE of order l + n. Then, the state trajectory x_{[0,N-1]} sampled from system (1) driven by the input u_{[0,N-1]} satisfies

Given the input-state data u_{[0,N-1]} and x_{[0,N]} sampled from (1), we will design a learning-based PI algorithm such that accurate knowledge of the system matrices is not required. For any time indices 0 ≤ k_1, k_2 ≤ N - 1 and V ∈ S^n, along the state trajectory of (1), we have

It follows from (96) that

Assumption 2 The exploration signal u_{[0,N-1]} is PE of order n + 1.
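Checking Assumption 2 (and Definition 1 in general) amounts to forming the depth-l Hankel matrix of the exploration signal and testing its row rank, which is straightforward to do numerically. The sketch below uses the usual column layout of Hankel matrices in Willems' fundamental lemma; since the paper's display (95) is not reproduced above, the exact indexing convention is an assumption.

```python
import numpy as np

def hankel_matrix(u, depth):
    """Depth-`depth` Hankel matrix of a signal u of shape (N, m):
    column j stacks u_j, u_{j+1}, ..., u_{j+depth-1}."""
    N, m = u.shape
    cols = N - depth + 1
    H = np.empty((depth * m, cols))
    for j in range(cols):
        H[:, j] = u[j:j + depth].reshape(-1)
    return H

def is_persistently_exciting(u, order):
    """u is PE of the given order iff its depth-`order` Hankel matrix has full row rank."""
    H = hankel_matrix(u, order)
    return np.linalg.matrix_rank(H) == H.shape[0]
```

For example, a sufficiently long exploration input with i.i.d. random entries generically passes this rank test for order n + 1.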

Under Assumption 2 and according to Lemma 12, z_{[0,N-1]} has full row rank. As a result, for any fixed V ∈ S^n, (98) admits a unique solution

where Λ is a data-dependent matrix defined as

Therefore, given any V ∈ S^n, Θ(V) can be directly computed from (99) without knowing the system matrices A and B.
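Although the definitions of Λ in (100) and of Θ(V) are not reproduced above, the mechanism can be illustrated as follows: the quantities A^T V A, A^T V B and B^T V B needed for policy evaluation and improvement are computed from the input-state data alone. In the sketch below this is done by first recovering [A, B] from the data by least squares, which is an illustrative shortcut (an assumption on our part); the paper instead works directly with the data-dependent matrix Λ.

```python
import numpy as np

def theta_from_data(x_traj, u_traj, V):
    """Illustrative data-based evaluation of the blocks A^T V A, A^T V B, B^T V B.

    x_traj: states x_0, ..., x_N of shape (N+1, n); u_traj: inputs u_0, ..., u_{N-1}
    of shape (N, m). A persistently exciting input ensures the stacked data matrix
    has full row rank.
    """
    Z = np.hstack([x_traj[:-1], u_traj]).T   # z_k = [x_k; u_k], shape (n+m, N)
    X_next = x_traj[1:].T                    # x_{k+1}, shape (n, N)
    AB = X_next @ np.linalg.pinv(Z)          # least-squares estimate of [A, B]
    # Blocks: [[A^T V A, A^T V B], [B^T V A, B^T V B]]
    return AB.T @ V @ AB
```

With these blocks, the improvement step of Procedure 5 mirrors the model-based update, using Θ_uu and Θ_ux in place of B^T V B and B^T V A.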

By (97), we can rewrite (7) as

Plugging (99) into (101) yields (102). The learning-based PI is represented in the following procedure.

Procedure 5 (Learning-based PI for discrete-time LQR)

1. Learning-based policy evaluation

2. Learning-based policy improvement

It should be noticed that, due to (99), Procedure 5 is equivalent to Procedure 1.

    4.2 Robustness analysis

In the previous subsection, we assumed that accurate data from the system can be obtained. In reality, measurement noise and unknown system disturbances are inevitable. Therefore, the input-state data is sampled from the following linear system with unknown system disturbance and measurement noise

where w_k ~ N(0, Σ_w) and v_k ~ N(0, Σ_v) are independent and identically distributed random noises. Let ž_k = [y_k^T, u_k^T]^T and suppose there are in total S trajectories of system (104) which start from the same initial state and are driven by the same exploration input u_{[0,N-1]}. Averaging the collected data over S trajectories, we have

Then, the data-dependent matrix is constructed as

By the strong law of large numbers, the following limits hold almost surely

Recall that z_{[0,N-1]}, x_{[1,N-1]}, and Λ are the data collected from system (1) with the same initial state and exploration input as (104). Since S is finite, the difference between Λ and Λ̌^S is unavoidable, and hence,

    Procedure 6 (Learning-based PI using noisy data)

1. Learning-based policy evaluation using noisy data

2. Learning-based policy improvement using noisy data

In Procedure 6, the symbol "check" is used to denote the variables of the learning-based PI using noisy data. In addition, let V̂_i denote the result of the accurate evaluation of Ľ_i, i.e., V̂_i is the solution of (109) with Θ̌(V̌_i) replaced by Θ(V̂_i). V̌_i = V̂_i + ΔV_i is the solution of (109), and ΔV_i is the policy evaluation error induced by the noise ΔΛ. In the following, the superscript S is omitted to simplify the notation. Based on the robustness analysis in the previous section, we will analyze the robustness of the learning-based PI to the noise ΔΛ.
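In practice, the averaging that defines the noisy data matrix can be realized by repeating the same experiment S times and averaging the recorded signals sample-wise before building the data-dependent matrix. The helper simulate_noisy_rollout below is hypothetical (it stands for one experiment on system (104) with the fixed initial state and exploration input); the averaging itself is the point of the sketch.

```python
import numpy as np

def averaged_data(simulate_noisy_rollout, S):
    """Average S noisy rollouts that share the same initial state and exploration input.

    `simulate_noisy_rollout(s)` is assumed to return (y, u) with y of shape (N+1, n)
    and u of shape (N, m) for the s-th experiment.
    """
    ys, us = zip(*(simulate_noisy_rollout(s) for s in range(S)))
    y_bar = np.mean(ys, axis=0)   # sample-wise average of the measured states
    u_bar = np.mean(us, axis=0)   # the exploration input is the same in every experiment
    return y_bar, u_bar
```

For linear dynamics driven by zero-mean noise, the averaged trajectory approaches the noise-free one as S grows, which is the sense in which Λ̌^S approaches Λ almost surely, as stated above.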

For any stabilizing control gain L, let V̌_L = V_L + ΔV be the solution of the learning-based policy evaluation with the noisy data-dependent matrix Λ̌, i.e.

The following lemma guarantees that (111) has a unique solution (V_L + ΔV) = (V_L + ΔV)^T ≻ 0.

Lemma 13 If

then (Λ + ΔΛ)^T [I_n, -L^T]^T is a Schur matrix.

Proof Recall that V_L = (V_L)^T ≻ 0 is the solution of (5) associated with the stabilizing control gain L. By (99), (5) is equivalent to the following equation

Since Q ≻ 0, by [30, Lemma 3.9], Λ^T [I_n, -L^T]^T is Schur.

When Λ is disturbed by ΔΛ, we can rewrite (113) as

When (112) holds, we have (115). Then, by [30, Lemma 3.9], (114) and (115), (Λ + ΔΛ)^T [I_n, -L^T]^T is Schur. ■

The following lemma implies that the policy evaluation error ΔV is small as long as ΔΛ is sufficiently small.

Lemma 14 For any h > 0, L ∈ L_h, and ΔΛ satisfying (112), we have

Proof According to [32, Theorems 2.6 and 4.1], we have

Since Tr(V_L) ≤ h + Tr(V^*), it follows from (102) and [27, Theorem 5.D6] that

Taking the trace of both sides of (118) and utilizing [28], we have

Plugging (119) into (117) yields (116). ■

The following lemma tells us that ΔΘ is small if ΔV and ΔΛ are small enough.

Lemma 15 Let Θ̌(V̌_L) = Λ̌ V̌_L Λ̌^T and ΔΘ(V_L) = Θ̌(V̌_L) - Θ(V_L); then,

Proof By the expressions of Θ̌(V̌_L) and Θ(V_L), we have

Hence, (120) is obtained by (121) and the triangle inequality.

The following lemma ensures that ΔL converges to zero as ΔΘ tends to zero.

Lemma 16 Let ΔL = (R + Θ̌_uu(V̌_L))^{-1} Θ̌_ux(V̌_L) - (R + Θ_uu(V_L))^{-1} Θ_ux(V_L). Then,

Proof From the expression of ΔL, we have

Therefore, (122) readily follows from (123). ■

Given the aforementioned lemmas, we are ready to show the main result on the robustness of the learning-based PI algorithm in Procedure 6.

Proof At each iteration of Procedure 6, if Λ is not disturbed by noise, i.e., ΔΛ = 0, the policy improvement is (103). Due to the influence of ΔΛ, the control gain is updated by (110), which can be rewritten as

where ΔL_{i+1} is

    5 Numerical simulation

In this section, we illustrate the proposed theoretical results by a benchmark example, the cart-pole system [33]. The parameters of the cart-pole system are: m_c = 1 kg (mass of the cart), m_p = 0.1 kg (mass of the pendulum), l_p = 0.5 m (distance from the center of mass of the pendulum to the pivot), and g_c = 9.8 m/s^2 (gravitational acceleration). By linearizing the system around the equilibrium, the system is

By discretizing it through the Euler method with a step size of 0.01 s, we have

The weighting matrices of the cost are Q = 10 I_4 and R = 1. The initial stabilizing gain to start the policy iteration algorithm is

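For reference, the continuous-time linearization and its Euler discretization can be reproduced as follows. The state is taken as (cart position, cart velocity, pole angle, pole angular velocity), and the matrices are the textbook point-mass cart-pole linearization about the upright equilibrium; since the paper's displayed matrices and the initial gain are not reproduced above, the exact entries here should be read as an assumption consistent with the stated parameters.

```python
import numpy as np

# Cart-pole parameters from the text.
mc, mp, lp, gc = 1.0, 0.1, 0.5, 9.8
h = 0.01  # Euler step size (s)

# Standard linearization about the upright equilibrium,
# state = [cart position, cart velocity, pole angle, pole angular velocity].
A_ct = np.array([[0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, -mp * gc / mc, 0.0],
                 [0.0, 0.0, 0.0, 1.0],
                 [0.0, 0.0, (mc + mp) * gc / (mc * lp), 0.0]])
B_ct = np.array([[0.0], [1.0 / mc], [0.0], [-1.0 / (mc * lp)]])

# Forward-Euler discretization with step size h.
A_d = np.eye(4) + h * A_ct
B_d = h * B_ct

# Weighting matrices used in the simulations.
Q = 10.0 * np.eye(4)
R = np.array([[1.0]])
```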
    5.1 Robustness test of the inexact policy iteration

We test the robustness of the inexact PI for discrete-time systems in Procedure 3. At each iteration, each element of ΔG_i is sampled from a standard Gaussian distribution, and then its spectral norm is scaled to 0.2. During the iterations, the relative errors of the control gain L̂_i and cost matrix V̂_i are shown in Fig. 1. The control gain and cost matrix are close to the optimal solution by the 5th iteration. It is observed that, even under the influence of disturbances at each iteration, the inexact PI in Procedure 3 can still approach the optimal solution. This is consistent with the ISS property of Procedure 3 in Theorem 1.
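The disturbance ΔG_i used for Fig. 1 can be generated exactly as described: draw a matrix with i.i.d. standard Gaussian entries and rescale it so that its spectral norm equals 0.2, e.g.:

```python
import numpy as np

def random_disturbance(shape, target_norm=0.2, rng=None):
    """Gaussian random matrix rescaled to the prescribed spectral norm."""
    rng = np.random.default_rng() if rng is None else rng
    D = rng.standard_normal(shape)
    return D * (target_norm / np.linalg.norm(D, 2))
```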

In addition, the robustness of Procedure 4 is tested. At each iteration, ΔM_i is randomly sampled with a norm of 0.2. Under the influence of ΔM_i, the evolution of the control gain K̂_i and cost matrix P̂_i is shown in Fig. 2. Under the noise ΔM_i, the algorithm cannot converge exactly to the optimal solution. However, owing to the small-disturbance ISS property, the inexact PI can still converge to a neighborhood of the optimal solution, which is consistent with Theorem 2.

    5.2 Robustness test of the learning-based policy iteration

The robustness of the learning-based PI in Procedure 6 is tested for system (104) with both system disturbance and measurement noise. The variances of the system disturbance and measurement noise are Σ_w = 0.01 I_n and Σ_v = 0.01 I_n. One trajectory is sampled from the solution of (104) and the length of the sampled trajectory is N = 100, i.e., 100 data points collected from (104) are used to construct the data-dependent matrix Λ̌^S. Compared with the matrix Λ in (100b), where the data is collected from the system without unknown system disturbance and measurement noise, Λ̌^S is directly constructed from the noisy data. Therefore, at each iteration of the learning-based PI, ΔΛ^S introduces the disturbances. The evolution of the control gain and cost matrix is shown in Fig. 3. It is observed that, with the noisy data, the control gain and the cost matrix obtained by Procedure 6 converge to an approximation of the optimal solution. This coincides with the result in Theorem 3.

Fig. 1 Robustness test of Procedure 3 when ‖ΔG_i‖ = 0.2

Fig. 2 Robustness test of Procedure 4 when ‖ΔM_i‖ = 0.2

Fig. 3 Robustness test of Procedure 6 when the noisy data is applied for the construction of Λ̌

    6 Conclusion

In this paper, we have studied the robustness property of policy optimization in the presence of disturbances at each iteration. Using ISS Lyapunov techniques, it is demonstrated that the PI ultimately converges to a small neighborhood of the optimal solution as long as the disturbance is sufficiently small, and a quantifiable bound is provided. Based on the ISS property and Willems' fundamental lemma, a learning-based PI algorithm is proposed, for which the persistent excitation of the exploratory signal can be easily guaranteed. Numerical simulation examples are provided to illustrate the theoretical results.

Data availability The data that support the findings of this study are available from the corresponding author, L. Cui, upon reasonable request.
