
    A Lyapunov characterization of robust policy optimization

    Control Theory and Technology, 2023, Issue 3

    Leilei Cui·Zhong-Ping Jiang

    Abstract In this paper, we study the robustness property of policy optimization (particularly the Gauss–Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions of the upper bound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix related to an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated by the input-to-state stability of the policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.

    Keywords Policy optimization · Policy iteration (PI) · Input-to-state stability (ISS) · Lyapunov's direct method

    1 Introduction

    Through reinforcement learning (RL) techniques, agents can iteratively minimize a specified cost function by interacting continuously with an unknown environment. Policy optimization is fundamental for the development of RL algorithms, as introduced in [1]. Policy optimization first parameterizes the control policy, and then the performance of the control policy is iteratively improved by updating the parameters along the gradient descent direction of the given cost function. Since the linear quadratic regulator (LQR) problem is tractable and widely applied in many engineering fields, it provides an ideal benchmark for the theoretical analysis of policy optimization. For the LQR problem, the control policy is parameterized by a control gain matrix, and the gradient of the quadratic cost with respect to the control gain is associated with a Lyapunov matrix equation. Based on these results, various policy optimization algorithms, including vanilla gradient descent, natural gradient descent and Gauss–Newton methods, are developed in [2–5]. Compared with other policy optimization algorithms with a linear convergence rate, the control policies generated by the Gauss–Newton method converge quadratically to the optimal solution.

    It is noticed that the Gauss–Newton method with the step size of 1/2 coincides with the policy iteration (PI) algorithm [6, 7], which is an important iterative algorithm in RL and adaptive/approximate dynamic programming (ADP) [1, 8, 9]. From the perspective of PI, the Lyapunov matrix equation for computing the gradient can be considered as policy evaluation. The update of the policy along the gradient direction can be interpreted as policy improvement. The steps of policy evaluation and policy improvement are iterated in turn to find the optimal solution of the LQR problem. Various PI algorithms have been proposed for important classes of linear/nonlinear/time-delay/time-varying systems for optimal stabilization and output tracking [10–14]. In addition, PI has been successfully applied to sensory motor control [15] and autonomous driving [16, 17].
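    To make the step-size-1/2 claim concrete, a short calculation in the discrete-time notation of Sect. 2.1 can be sketched as follows. The policy-gradient expression quoted below is the one commonly used in the policy optimization literature for u_k = -L x_k, with Σ_L denoting the closed-loop state-correlation matrix; it is not reproduced in this paper, so it should be read as an assumption here:

    ∇_L C(L) = 2[(R + B^T V_L B) L - B^T V_L A] Σ_L.

    The Gauss–Newton update with step size η is then

    L_{i+1} = L_i - η (R + B^T V_{L_i} B)^{-1} ∇_L C(L_i) Σ_{L_i}^{-1} = (1 - 2η) L_i + 2η (R + B^T V_{L_i} B)^{-1} B^T V_{L_i} A,

    which at η = 1/2 reduces to L_{i+1} = (R + B^T V_{L_i} B)^{-1} B^T V_{L_i} A, exactly the policy-improvement step of PI.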

    The convergence of the PI algorithm is ensured under the assumption that accurate knowledge of the system model is accessible. However, in reality, either the system model obtained by system identification [18] is used for the PI algorithm, or the PI algorithm is directly implemented through a data-driven approach using input-state data [10, 19–22]. Consequently, the PI algorithm can hardly be implemented accurately due to modeling errors, inaccurate state estimation, measurement noise, and unknown system disturbances. The robustness of the PI algorithm to such unavoidable noise is an important property to be investigated, and it lays a foundation for a better understanding of RL algorithms. There are several challenges in studying the robustness of the PI algorithm. Firstly, the nonlinearity of the PI algorithm makes it hard to analyze the convergence property. Secondly, it is difficult to quantify the influence of noise, since noise may destroy the monotonic convergence property of the PI algorithm or even result in a destabilizing controller.

    In this paper, we study the robustness of the PI algorithm in the presence of noise. The contributions are summarized as follows. Firstly, by viewing the PI algorithm as a nonlinear system and invoking the concept of input-to-state stability (ISS) [23], particularly the small-disturbance ISS [24, 25], we investigate the robustness of the PI algorithm under the influence of noise. It is demonstrated that, when subject to noise, the control policies generated by the PI algorithm eventually converge to a small neighborhood of the optimal solution of LQR as long as the noise is sufficiently small. Different from [24, 25], where the analysis is trajectory-based, we directly utilize Lyapunov's direct method to analyze the convergence of the PI algorithm under disturbances. As a result, an explicit expression of the upper bound on the noise is provided. The size of the neighborhood in which the control policies ultimately stay is shown to be a quadratic function of the noise. Secondly, by utilizing Willems' fundamental lemma, a learning-based PI algorithm is proposed. Compared with the conventional learning-based control approach, where the exploratory control input is hard to design such that the persistent excitation condition is satisfied [24], the persistently exciting exploratory signal of the proposed method can be easily designed by checking the rank condition of a Hankel matrix related to the exploration signal. Finally, based on the small-disturbance ISS property of the PI algorithm, we demonstrate that the proposed learning-based PI algorithm is robust to state measurement noise and unknown system disturbances.

    The remaining contents of the paper are organized as follows. Section 2 reviews the LQR problem and the celebrated PI algorithm. In Sect. 3, the small-disturbance ISS property of the PI algorithm is studied. Section 4 proposes a learning-based PI algorithm and analyzes its robustness. Several numerical examples are given in Sect. 5, followed by some concluding remarks in Sect. 6.

    2 Preliminaries and problem formulation

    2.1 Policy iteration for discrete-time LQR

    The discrete-time linear time-invariant (LTI) system is represented as

    x_{k+1} = A x_k + B u_k,  k ∈ Z_+,  (1)

    where x_k ∈ R^n and u_k ∈ R^m are the state and control input, respectively; A and B are system matrices with compatible dimensions.

    Assumption 1 The pair (A, B) is controllable.

    Under Assumption 1, the discrete-time LQR problem is to minimize the following accumulative quadratic cost

    J_d(x_0, u) = Σ_{k=0}^{∞} (x_k^T Q x_k + u_k^T R u_k),  (2)

    where Q = Q^T ≻ 0 and R = R^T ≻ 0. The optimal controller of the discrete-time LQR is

    u_k^* = -L^* x_k,  L^* = (R + B^T V^* B)^{-1} B^T V^* A,  (3)

    where V^* = (V^*)^T ≻ 0 is the unique solution to the following discrete-time algebraic Riccati equation (ARE)

    V^* = A^T V^* A - A^T V^* B (R + B^T V^* B)^{-1} B^T V^* A + Q.  (4)

    For a stabilizing control gain L ∈ R^{m×n}, the corresponding cost in (2) is J_d(x_0, -Lx) = x_0^T V_L x_0, where V_L = (V_L)^T ≻ 0 is the unique solution of the following Lyapunov equation

    (A - BL)^T V_L (A - BL) - V_L + Q + L^T R L = 0,  (5)

    and the function G(·): S^n → S^{n+m} is defined as

    The discrete-time PI algorithm was developed by [7] to iteratively solve the discrete-time LQR problem. Given an initial stabilizing control gain L_0, the discrete-time PI algorithm is represented as:

    Procedure 1 (Exact PI for discrete-time LQR)

    1. Policy evaluation: get G(V_i) by solving

    2. Policy improvement: get the improved policy by
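    As a numerical illustration of Procedure 1, the following sketch implements the two steps above with SciPy. Since the display defining G(·) in (6) is not reproduced above, the block matrix is written here in the usual Q-function-style form G(V) = [A B]^T V [A B], and the improvement step L_{i+1} = (R + G_uu(V_i))^{-1} G_ux(V_i) mirrors the gain-update structure that appears later in Sect. 4; both are assumptions about the omitted displays rather than quotations from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

def exact_pi_discrete(A, B, Q, R, L0, num_iter=20):
    """Exact PI (Procedure 1) for discrete-time LQR with u_k = -L x_k."""
    n, m = B.shape
    L = L0
    for _ in range(num_iter):
        # Policy evaluation: V solves (A - BL)^T V (A - BL) - V + Q + L^T R L = 0.
        Acl = A - B @ L
        V = solve_discrete_lyapunov(Acl.T, Q + L.T @ R @ L)
        # Assumed block structure of G(V): [A B]^T V [A B].
        G = np.block([[A.T @ V @ A, A.T @ V @ B],
                      [B.T @ V @ A, B.T @ V @ B]])
        Gux, Guu = G[n:, :n], G[n:, n:]
        # Policy improvement: L_{i+1} = (R + G_uu)^{-1} G_ux.
        L = np.linalg.solve(R + Guu, Gux)
    return L, V

if __name__ == "__main__":
    A = np.array([[0.9, 0.1], [0.0, 0.8]])   # Schur, so L0 = 0 is stabilizing
    B = np.array([[0.0], [0.1]])
    Q, R = np.eye(2), np.eye(1)
    L, V = exact_pi_discrete(A, B, Q, R, np.zeros((1, 2)))
    print(np.max(np.abs(V - solve_discrete_are(A, B, Q, R))))  # close to zero after convergence
```

    Starting from a stabilizing L_0, the iterates converge rapidly to the solution of the discrete-time ARE, consistent with the monotone convergence recalled in this subsection.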

    The monotonic convergence property of the discrete-time PI is shown in the following lemma.

    2.2 Policy iteration for continuous-time LQR

    Consider the continuous-time LTI system

    ẋ(t) = A x(t) + B u(t),  x(0) = x_0,  (9)

    where x(t) ∈ R^n is the state; u(t) ∈ R^m is the control input; x_0 is the initial state; A and B are constant matrices with compatible dimensions. The cost of system (9) is

    J_c(x_0, u) = ∫_0^∞ (x^T(t) Q x(t) + u^T(t) R u(t)) dt,  (10)

    Under Assumption 1, the classical continuous-time LQR aims at computing the optimal control policy as a function of the current state such that J_c(x_0, u) is minimized. The optimal control policy is

    u^*(x) = -K^* x,  K^* = R^{-1} B^T P^*,  (11)

    where P^* = (P^*)^T ≻ 0 is the unique solution of the continuous-time ARE [26]:

    A^T P^* + P^* A + Q - P^* B R^{-1} B^T P^* = 0.  (12)

    For a stabilizing control gain K ∈ R^{m×n}, the corresponding cost in (10) is J_c(x_0, -Kx) = x_0^T P_K x_0, where P_K = (P_K)^T ≻ 0 is the unique solution of the following Lyapunov equation

    (A - BK)^T P_K + P_K (A - BK) + Q + K^T R K = 0,  (13)

    and the function M(·): S^n → S^{n+m} is defined as

    Given an initial stabilizing control gain K_0, the celebrated continuous-time PI developed in [6] iteratively solves the continuous-time LQR problem. The continuous-time PI algorithm is represented as:

    Procedure 2 (Exact PI for continuous-time LQR)

    1. Policy evaluation: get M(P_i) by solving

    2. Policy improvement: get the improved policy by

    Given an initial stabilizing control gain K_0, by iteratively solving (15) and (16), P_i monotonically converges to P^* and (A - BK_i) is Hurwitz, which is formally presented in the following lemma.
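    A matching sketch of Procedure 2 is given below (essentially a Kleinman-type iteration): policy evaluation solves the closed-loop Lyapunov equation (13) for the current gain, and policy improvement uses the gain form K′ = R^{-1} B^T P_K stated in Lemma 8; the exact correspondence with the omitted displays (15) and (16) is assumed.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def exact_pi_continuous(A, B, Q, R, K0, num_iter=20):
    """Exact PI (Procedure 2, Kleinman-type) for continuous-time LQR with u = -K x."""
    K = K0
    for _ in range(num_iter):
        Acl = A - B @ K
        # Policy evaluation: (A - BK)^T P + P (A - BK) + Q + K^T R K = 0.
        # solve_continuous_lyapunov(a, q) solves a X + X a^H = q.
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement, using the gain form K' = R^{-1} B^T P of Lemma 8.
        K = np.linalg.solve(R, B.T @ P)
    return K, P

if __name__ == "__main__":
    A = np.array([[0.0, 1.0], [-1.0, -1.0]])  # Hurwitz, so K0 = 0 is stabilizing
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    K, P = exact_pi_continuous(A, B, Q, R, np.zeros((1, 2)))
    print(np.max(np.abs(P - solve_continuous_are(A, B, Q, R))))  # close to zero after convergence
```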

    2.3 Problem formulation

    For the discrete-time and continuous-time PI algorithms, accurate model knowledge (A, B) is required for the algorithm implementation. The convergence of the PI algorithms in Lemmas 1 and 2 is based on the assumption that the accurate system model is attainable. However, in reality, system uncertainties are unavoidable, and the PI algorithms cannot be implemented exactly. Therefore, in this paper, we investigate the following problem.

    Problem 1 When the policy evaluation and improvement steps of the PI algorithms are subject to noise, will the convergence properties in Lemmas 1 and 2 still hold?

    3 Robustness analysis of policy iteration

    In this section, we will formally introduce the inexact PI algorithms for the discrete-time and continuous-time LQR in the presence of noise. By invoking the concept of input-to-state stability [23], it is rigorously shown that the optimized control policies converge to a neighborhood of the optimal control policy, and the size of the neighborhood depends on the magnitude of the noise.

    3.1 Robustness analysis of discrete-time policy iteration

    According to the exact discrete-time PI algorithm in (7) and (8), in the presence of noise, the steps of policy evaluation and policy improvement cannot be implemented accurately, and the inexact PI algorithm is as follows.

    Procedure 3 (Inexact PI for discrete-time LQR)

    1. Inexact policy evaluation: get Ĝ_i ∈ S^{m+n} as an approximation of G(V̂_i), where V̂_i is the solution of

    2. Inexact policy improvement: get the improved policy by

    Remark 1 The noise ΔG_i can be caused by various factors. For example, in data-driven control [24], the matrix G(V̂_i) is identified from the collected input-state data. Since noise possibly pollutes the collected data, Ĝ_i, instead of G(V̂_i), is obtained. Other factors that may cause ΔG_i include inaccurate system identification, the residual error of numerically solving the Lyapunov equation, and the approximate values of Q and R in inverse optimal control in the absence of exact knowledge of the cost function.

    Next, by considering the inexact PI as a nonlinear dynamical system with the state V̂_i, we analyze its robustness to the noise ΔG_i by Lyapunov's direct method and in the sense of small-disturbance ISS. For any stabilizing control gain L, define the candidate Lyapunov function as

    V_d(V_L) = Tr(V_L) - Tr(V^*),  (20)

    where V_L = V_L^T ≻ 0 is the solution of (5). Since V_L ⪰ V^* (obtained by Lemma 1), we have

    Remark 2 Since J_d(x_0, -Lx) = x_0^T V_L x_0, when x_0 ~ N(0, I_n), E_{x_0} J_d(x_0, -Lx) = Tr(V_L). Hence, the candidate Lyapunov function in (20) can be considered as the difference between the value function of the controller u_k = -L x_k and the optimal value function.
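    The identity behind Remark 2 is the standard trace argument: for x_0 ~ N(0, I_n),

    E_{x_0}[J_d(x_0, -Lx)] = E_{x_0}[x_0^T V_L x_0] = E_{x_0}[Tr(V_L x_0 x_0^T)] = Tr(V_L E_{x_0}[x_0 x_0^T]) = Tr(V_L),

    and likewise E_{x_0}[x_0^T V^* x_0] = Tr(V^*), so V_d(V_L) = Tr(V_L) - Tr(V^*) is the expected excess cost of the gain L over the optimal gain.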

    For any h > 0, define a sublevel set L_h = {L ∈ R^{m×n} | (A - BL) is Schur, V_d(V_L) ≤ h}. Since V_L is continuous with respect to the stabilizing control gain L, it readily follows that L_h is compact. Before the main theorem about the robustness of Procedure 3, we introduce the following instrumental lemma, which provides an upper bound on V_d(V_L).

    Lemma 3 For any stabilizing control gain L, let L′ = (R + B^T V_L B)^{-1} B^T V_L A and E_L = (L′ - L)^T (R + B^T V_L B)(L′ - L). Then,

    where

    Proof We can rewrite (4) as


    In addition, it follows from (5) that

    Subtracting (24) from (25) yields

    Since (A - BL^*) is Schur, it follows from [27, Theorem 5.D6] that

    Taking the trace of (27) and using the main result of [28], we have

    Hence, the proof is completed. ■

    Lemma 4 For any L ∈ L_h,

    Proof Since (A - BL) is Schur, it follows from (5) and [27, Theorem 5.D6] that

    Hence, (29) readily follows from (31). ■

    The following lemma shows that the sublevel set L_h is invariant as long as the noise ΔL_i is sufficiently small.

    Proof Induction is applied to prove the statement. When i = 0, L̂_0 ∈ L_h. Suppose L̂_i ∈ L_h; then, by [27, Theorem 5.D6], we have V̂_i ≻ 0. We can rewrite (17) as

    Considering (19), we have

    In addition, it follows from (19) that

    Plugging (33) and (34) into (32), and completing the squares, we have

    where

    If

    it is guaranteed that

    Writing down (17) at the (i+1)th iteration, and subtracting it from (35), we have

    Following [27, Theorem 5.D6], we have

    Taking the trace of (40), and using Lemma 3 and the result in [28], yields

    It follows from (41), Lemma 4 and [28] that

    where b_2 is defined as

    Hence, if

    it is guaranteed that

    Now, by Lyapunov's direct method and by viewing Procedure 3 as a discrete-time nonlinear system with the state V̂_i, it is shown that V̂_i converges to a small neighbourhood of the optimal solution as long as the noise is sufficiently small.

    Lemma 6 For any h > 0 and L̂_0 ∈ L_h, if

    there exist a K-function ρ(·) and a KL-function κ(·, ·) such that

    Proof Repeating (42) for i, i-1, ..., 0, we have

    Considering (21), it follows that

    The proof is thus completed. ■

    The small-disturbance ISS property of Procedure 3 is shown in the following theorem.

    Consequently,

    3.2 Robustness analysis of continuous-time policy iteration

    According to the exact PI for continuous-time LQR in (15) and (16), in the presence of noise, the steps of policy evaluation and policy improvement cannot be implemented accurately, and the inexact PI is as follows.

    Procedure 4 (Inexact PI for continuous-time LQR)

    1. Inexact policy evaluation: get M̂_i ∈ S^{m+n} as an approximation of M(P̂_i), where P̂_i is the solution of

    2. Inexact policy improvement: get the updated control gain by

    For any stabilizing control gain K, define the candidate Lyapunov function as

    V_c(P_K) = Tr(P_K) - Tr(P^*),  (57)

    where P_K = P_K^T ≻ 0 is the solution of (13), i.e.

    Since P_K ⪰ P^*, we have

    For any h > 0, define the sublevel set K_h = {K ∈ R^{m×n} | (A - BK) is Hurwitz, V_c(P_K) ≤ h}. Since P_K is continuous with respect to the stabilizing control gain K, the sublevel set K_h is compact.

    The following lemmas are instrumental for the proof of the main theorem.

    Proof The Taylor expansion of e^{Dt} is

    Hence,

    Pick a v ∈ R^n which satisfies

    Then,

    Hence, the lemma follows readily from (63). ■

    The following lemma presents an upper bound on the Lyapunov function V_c(P_K).

    Lemma 8 For any stabilizing control gain K, let K′ = R^{-1} B^T P_K, where P_K = P_K^T ≻ 0 is the solution of (58), and E_K = (K′ - K)^T R (K′ - K). Then,

    Proof Rewrite ARE (12) as

    Furthermore, (58) is rewritten as

    Subtracting (65) from (66) yields

    Considering K′ = R^{-1} B^T P_K and completing the squares in (67), we have

    Since (A - BK^*) is Hurwitz, by (68) and [27, equation (5.18)], we have

    Taking the trace of (69), considering the cyclic property of the trace and [28], we have (64). ■

    Lemma 9 For any K ∈ K_h,

    Proof Since A - BK is Hurwitz, it follows from (58) that

    Taking the trace of (71), and considering [28], we have

    The proof is hence completed. ■

    Writing down (54) for the (i+1)th iteration, and subtracting it from (75), we have

    Since (A - B K̂_{i+1}) is Hurwitz (e(h) ≤ e_1(h)), it follows from [27, equation (5.18)] and (78) that

    Taking the trace of (79), and considering [28] and Lemma 9, we have

    It follows from (80) and Lemmas 7 and 8 that

    Taking the expression of K̂_{i+1} into consideration, we have

    Plugging (82) into (81), it follows that if

    we have

    Proof It follows from Lemma 10, (81) and (82) that for any i ∈ Z_+,

    Repeating (86) for i, i-1, ..., 1, 0, we have

    By (59), we have

    Hence, (85) follows readily. ■

    With the aforementioned lemmas, we are ready to propose the main result on the robustness of the inexact PI for continuous-time LQR.

    Proof Suppose K̂_i ∈ K_h. If

    (ΔM_{uu,i} + M_{uu}(P̂_i)) is invertible. It follows from (56) and

    Remark 3 Compared with [24, 25], in Theorems 1 and 2, explicit expressions of the upper bounds on the small disturbance are given, such that at each iteration, the generated control gain is stabilizing and contained in the sublevel sets L_h and K_h. In addition, it is observed from (48) and (88) that the control gains generated by the inexact PI algorithms ultimately converge to a neighborhood of the optimal solution, and the size of the neighborhood is proportional to the quadratic form of the noise.

    4 Learning-based policy iteration

    In this section, based on the robustness property of the inexact PI in Procedure 3, we will develop a learning-based PI algorithm. Only the input-state trajectory data measured from the system is required for the algorithm.

    4.1 Algorithm development

    For a signal u_{[0,N-1]} = [u_0, u_1, ..., u_{N-1}], its Hankel matrix of depth l is represented as

    Definition 1 An input signal u_{[0,N-1]} is persistently exciting (PE) of order l if the Hankel matrix H_l(u_{[0,N-1]}) is full row rank.

    Lemma 12 [31] Let an input signal u_{[0,N-1]} be PE of order l + n. Then, the state trajectory x_{[0,N-1]} sampled from system (1) driven by the input u_{[0,N-1]} satisfies
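    The rank test in Definition 1 is straightforward to implement; the sketch below (with hypothetical helper names) builds the depth-l Hankel matrix of a signal and checks whether it has full row rank, which is how Assumption 2 can be verified for a candidate exploration signal.

```python
import numpy as np

def hankel_matrix(u, depth):
    """Hankel matrix H_l(u_[0,N-1]): u has shape (m, N); column j of the result
    stacks u_j, u_{j+1}, ..., u_{j+depth-1}."""
    m, N = u.shape
    cols = N - depth + 1
    return np.vstack([u[:, i:i + cols] for i in range(depth)])

def is_persistently_exciting(u, order):
    """u_[0,N-1] is PE of order l iff H_l(u) has full row rank (Definition 1)."""
    H = hankel_matrix(u, order)
    return np.linalg.matrix_rank(H) == H.shape[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, N = 1, 4, 100
    u = rng.standard_normal((m, N))            # i.i.d. Gaussian exploration signal
    print(is_persistently_exciting(u, n + 1))  # Assumption 2: PE of order n + 1
```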

    Given the input-state data u_{[0,N-1]} and x_{[0,N]} sampled from (1), we will design a learning-based PI algorithm such that accurate knowledge of the system matrices is not required. For any time indices 0 ≤ k_1, k_2 ≤ N - 1 and V ∈ S^n, along the state trajectory of (1), we have

    It follows from (96) that

    Assumption 2 The exploration signal u_{[0,N-1]} is PE of order n + 1.

    Under Assumption 2 and according to Lemma 12, z_{[0,N-1]} is full row rank. As a result, for any fixed V ∈ S^n, (98) admits a unique solution

    where Λ is a data-dependent matrix defined as

    Therefore, given any V ∈ S^n, Θ(V) can be directly computed from (99) without knowing the system matrices A and B.

    By (97), we can rewrite (7) as

    Plugging (99) into (101) yields (102). The learning-based PI is represented in the following procedure.

    Procedure 5 (Learning-based PI for discrete-time LQR)

    1. Learning-based policy evaluation

    2. Learning-based policy improvement

    It should be noticed that due to (99), Procedure 5 is equivalent to Procedure 1.
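    The following sketch assembles Procedure 5 from data. Because the displays (96)-(103) are not reproduced above, two ingredients are assumptions consistent with the surrounding text: the data-dependent matrix Λ is built so that Θ(V) = Λ V Λ^T (here as the transpose of a least-squares estimate of [A, B] from z_{[0,N-1]} and the shifted states), and the learning-based evaluation step replaces the closed-loop matrix A - BL by its data-based counterpart Λ^T [I_n, -L^T]^T, in line with Lemma 13.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def build_lambda(x, u):
    """Assumed construction of the data-dependent matrix Lambda:
    Lambda^T = X1 Z0^+, a least-squares estimate of [A, B], so that
    Theta(V) = Lambda V Lambda^T approximates [A B]^T V [A B]."""
    Z0 = np.vstack([x[:, :-1], u])              # z_k = [x_k; u_k], shape (n+m, N)
    X1 = x[:, 1:]                               # shifted states, shape (n, N)
    return (X1 @ np.linalg.pinv(Z0)).T          # shape (n+m, n)

def learning_based_pi(x, u, Q, R, L0, num_iter=15):
    """Sketch of Procedure 5: PI driven purely by input-state data."""
    n, m = x.shape[0], u.shape[0]
    Lam = build_lambda(x, u)
    L = L0
    for _ in range(num_iter):
        # Learning-based policy evaluation (assumed form): the closed-loop matrix
        # A - BL is replaced by its data-based estimate Lam^T [I; -L].
        IL = np.vstack([np.eye(n), -L])
        Acl_hat = Lam.T @ IL
        V = solve_discrete_lyapunov(Acl_hat.T, Q + L.T @ R @ L)
        # Learning-based policy improvement via Theta(V) = Lam V Lam^T.
        Theta = Lam @ V @ Lam.T
        L = np.linalg.solve(R + Theta[n:, n:], Theta[n:, :n])
    return L, V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[0.0], [0.1]])
    Q, R, n, m, N = np.eye(2), np.eye(1), 2, 1, 60
    u = rng.standard_normal((m, N))             # PE exploration signal
    x = np.zeros((n, N + 1))
    for k in range(N):
        x[:, k + 1] = A @ x[:, k] + B @ u[:, k]
    L, V = learning_based_pi(x, u, Q, R, np.zeros((m, n)))
    print(L)                                    # close to the model-based LQR gain
```

    With exact, persistently exciting data this reproduces Procedure 1, reflecting the equivalence noted above.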

    4.2 Robustness analysis

    In the previous subsection, we assumed that accurate data from the system can be obtained. In reality, measurement noise and unknown system disturbances are inevitable. Therefore, the input-state data are sampled from the following linear system with unknown system disturbance and measurement noise

    where w_k ~ N(0, Σ_w) and v_k ~ N(0, Σ_v) are independent and identically distributed random noises. Let ž_k = [y_k^T, u_k^T]^T and suppose there are in total S trajectories of system (104), which start from the same initial state and are driven by the same exploration input u_{[0,N-1]}. Averaging the collected data over the S trajectories, we have

    Then, the data-dependent matrix is constructed as

    By the strong law of large numbers, the following limits hold almost surely

    Recall that z_{[0,N-1]}, x_{[1,N-1]}, and Λ are the data collected from system (1) with the same initial state and exploration input as (104). Since S is finite, the difference between Λ and Λ̌_S is unavoidable, and hence,
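    A sketch of the data averaging used here: S noisy trajectories of (104) are generated from the same initial state and the same exploration input, the measured states are averaged, and the averaged data are then fed to the Λ construction of the previous sketch; function and variable names are hypothetical.

```python
import numpy as np

def collect_averaged_data(A, B, x0, u, S, sigma_w=0.01, sigma_v=0.01, seed=0):
    """Average the measured states of S noisy trajectories of (104), all starting
    from the same x0 and driven by the same exploration input u (shape (m, N))."""
    rng = np.random.default_rng(seed)
    n, N = A.shape[0], u.shape[1]
    y_avg = np.zeros((n, N + 1))
    for _ in range(S):
        x = x0.copy()
        y_avg[:, 0] += (x + np.sqrt(sigma_v) * rng.standard_normal(n)) / S
        for k in range(N):
            x = A @ x + B @ u[:, k] + np.sqrt(sigma_w) * rng.standard_normal(n)
            y_avg[:, k + 1] += (x + np.sqrt(sigma_v) * rng.standard_normal(n)) / S
    return y_avg  # feed into build_lambda in place of the exact states

if __name__ == "__main__":
    A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[0.0], [0.1]])
    u = np.random.default_rng(2).standard_normal((1, 100))
    y_avg = collect_averaged_data(A, B, np.zeros(2), u, S=200)
    print(y_avg.shape)  # (2, 101); the averaged data approach the noise-free data as S grows
```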

    Procedure 6 (Learning-based PI using noisy data)

    1. Learning-based policy evaluation using noisy data

    2. Learning-based policy improvement using noisy data

    In Procedure 6, the symbol "check" is used to denote the variables for the learning-based PI using noisy data. In addition, let V̂_i denote the result of the accurate evaluation of Ľ_i, i.e., V̂_i is the solution of (109) with Θ̌(V̌_i) replaced by Θ(V̂_i). V̌_i = V̂_i + ΔV_i is the solution of (109) and ΔV_i is the policy evaluation error induced by the noise ΔΛ. In the following contents, the superscript S is omitted to simplify the notation. Based on the robustness analysis in the previous section, we will analyze the robustness of the learning-based PI to the noise ΔΛ.

    For any stabilizing control gain L, let V̌_L = V_L + ΔV be the solution of the learning-based policy evaluation with the noisy data-dependent matrix Λ̌, i.e.

    The following lemma guarantees that (111) has a unique solution (V_L + ΔV) = (V_L + ΔV)^T ≻ 0.

    Lemma 13 If

    then (Λ + ΔΛ)^T [I_n, -L^T]^T is a Schur matrix.

    Proof Recall that V_L = V_L^T ≻ 0 is the solution of (5) associated with the stabilizing control gain L. By (99), (5) is equivalent to the following equation

    Since Q ≻ 0, by [30, Lemma 3.9], Λ^T [I_n, -L^T]^T is Schur.

    When Λ is disturbed by ΔΛ, we can rewrite (113) as

    When (112) holds, we have

    By [30, Lemma 3.9], (114) and (115), (Λ + ΔΛ)^T [I_n, -L^T]^T is Schur. ■

    The following lemma implies that the policy evaluation error ΔV is small as long as ΔΛ is sufficiently small.

    Lemma 14 For any h > 0, L ∈ L_h, and ΔΛ satisfying (112), we have

    Proof According to [32, Theorems 2.6 and 4.1], we have

    Since Tr(V_L) ≤ h + Tr(V^*), it follows from (102) and [27, Theorem 5.D6] that

    Taking the trace of both sides of (118) and utilizing [28], we have

    Plugging (119) into (117) yields (116). ■

    The following lemma tells us that ΔΘ is small if ΔV and ΔΛ are small enough.

    Lemma 15 Let Θ̌(V̌_L) = Λ̌ V̌_L Λ̌^T and ΔΘ(V_L) = Θ̌(V̌_L) - Θ(V_L); then,

    Proof By the expressions of Θ̌(V̌_L) and Θ(V_L), we have

    Hence, (120) is obtained by (121) and the triangle inequality.

    The following lemma ensures that ΔL converges to zero as ΔΘ tends to zero.

    Lemma 16 Let ΔL = (R + Θ̌_uu(V̌_L))^{-1} Θ̌_ux(V̌_L) - (R + Θ_uu(V_L))^{-1} Θ_ux(V_L). Then,

    Proof From the expression of ΔL, we have

    Therefore, (122) readily follows from (123). ■

    Given the aforementioned lemmas, we are ready to show the main result on the robustness of the learning-based PI algorithm in Procedure 6.

    Proof At each iteration of Procedure 6, if Λ is not disturbed by noise, i.e., ΔΛ = 0, the policy improvement is (103). Due to the influence of ΔΛ, the control gain is updated by (110), which can be rewritten as

    where ΔL_{i+1} is

    5 Numerical simulation

    In this section, we illustrate the proposed theoretical results by a benchmark example known as the cart-pole system [33]. The parameters of the cart-pole system are: m_c = 1 kg (mass of the cart), m_p = 0.1 kg (mass of the pendulum), l_p = 0.5 m (distance from the center of mass of the pendulum to the pivot), and g_c = 9.8 m/s² (gravitational acceleration). By linearizing the system around the equilibrium, the system is

    By discretizing it through the Euler method with a step size of 0.01 s, we have

    The weighting matrices of the cost (2) are Q = 10 I_4 and R = 1, and the policy iteration algorithm is started from an initial stabilizing gain L_0.
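    For reference, the simulation setup can be sketched as follows. The continuous-time matrices below use one common frictionless cart-pole linearization (pendulum treated as a point mass at distance l_p from the pivot) with the parameters listed above; the paper's exact A and B are not reproduced here, so this is an assumed stand-in, followed by the Euler discretization with step 0.01 s and the stated weights.

```python
import numpy as np

# Cart-pole parameters from Sect. 5.
mc, mp, lp, gc = 1.0, 0.1, 0.5, 9.8

# One common frictionless linearization about the upright equilibrium, with state
# (cart position, cart velocity, pole angle, pole angular rate); assumed stand-in.
A_c = np.array([[0.0, 1.0, 0.0,                       0.0],
                [0.0, 0.0, -mp * gc / mc,             0.0],
                [0.0, 0.0, 0.0,                       1.0],
                [0.0, 0.0, (mc + mp) * gc / (mc * lp), 0.0]])
B_c = np.array([[0.0], [1.0 / mc], [0.0], [-1.0 / (mc * lp)]])

# Euler discretization with step size 0.01 s, as in the paper.
dt = 0.01
A_d = np.eye(4) + dt * A_c
B_d = dt * B_c

Q, R = 10.0 * np.eye(4), np.eye(1)   # weighting matrices from Sect. 5
```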

    5.1 Robustness test of the inexact policy iteration

    We test the robustness of the inexact PI for discrete-time systems in Procedure 3. At each iteration, each element of ΔG_i is sampled from a standard Gaussian distribution, and then its spectral norm is scaled to 0.2. During the iterations, the relative errors of the control gain L̂_i and cost matrix V̂_i are shown in Fig. 1. The control gain and cost matrix are close to the optimal solution at the 5th iteration. It is observed that even under the influence of disturbances at each iteration, the inexact PI in Procedure 3 can still approach the optimal solution. This is consistent with the ISS property of Procedure 3 in Theorem 1.
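    A sketch of this robustness test: each ΔG_i is drawn entrywise from a standard Gaussian, symmetrized so that Ĝ_i remains in S^{n+m} (the symmetrization is an assumption), rescaled to spectral norm 0.2, and added to G(V̂_i) before the improvement step; G(V) takes the same assumed form as in the Sect. 2.1 sketch.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def random_spectral_noise(size, norm, rng):
    """Symmetric Gaussian matrix rescaled so that its spectral norm equals `norm`."""
    D = rng.standard_normal((size, size))
    D = 0.5 * (D + D.T)                   # keep Delta G_i symmetric (assumption)
    return norm * D / np.linalg.norm(D, 2)

def inexact_pi_discrete(A, B, Q, R, L0, noise_norm=0.2, num_iter=10, seed=0):
    """Sketch of Procedure 3: exact evaluation of the current gain, followed by
    improvement with the perturbed matrix Ghat_i = G(Vhat_i) + Delta G_i."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    L = L0
    for _ in range(num_iter):
        Acl = A - B @ L
        V = solve_discrete_lyapunov(Acl.T, Q + L.T @ R @ L)
        G = np.block([[A.T @ V @ A, A.T @ V @ B],
                      [B.T @ V @ A, B.T @ V @ B]])            # assumed form of G(V)
        Ghat = G + random_spectral_noise(n + m, noise_norm, rng)
        L = np.linalg.solve(R + Ghat[n:, n:], Ghat[n:, :n])   # inexact improvement
    return L, V

# Usage: inexact_pi_discrete(A_d, B_d, Q, R, L0) with the cart-pole matrices above
# and a stabilizing initial gain L0.
```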

    In addition, the robustness of Procedure 4 is tested. At each iteration, ΔM_i is randomly sampled with a norm of 0.2. Under the influence of ΔM_i, the evolution of the control gain K̂_i and cost matrix P̂_i is shown in Fig. 2. Under the noise ΔM_i, the algorithm cannot converge exactly to the optimal solution. However, with the small-disturbance ISS property, the inexact PI can still converge to a neighborhood of the optimal solution, which is consistent with Theorem 2.

    5.2 Robustness test of the learning-based policy iteration

    The robustness of the learning-based PI in Procedure 6 is tested for system (104) with both system disturbance and measurement noise. The variances of the system disturbance and measurement noise are Σ_w = 0.01 I_n and Σ_v = 0.01 I_n. One trajectory is sampled from the solution of (104) and the length of the sampled trajectory is N = 100, i.e., 100 data points collected from (104) are used to construct the data-dependent matrix Λ̌_S. Compared with the matrix Λ in (100b), where the data are collected from the system without unknown system disturbance and measurement noise, Λ̌_S is directly constructed from the noisy data. Therefore, at each iteration of the learning-based PI, ΔΛ_S introduces the disturbances. The evolution of the control gain and cost matrix is shown in Fig. 3. It is observed that with the noisy data, the control gain and the cost matrix obtained by Procedure 6 converge to an approximation of the optimal solution. This coincides with the result in Theorem 3.

    Fig. 1 Robustness test of Procedure 3 when ‖ΔG_i‖ = 0.2

    Fig. 2 Robustness test of Procedure 4 when ‖ΔM_i‖ = 0.2

    Fig. 3 Robustness test of Procedure 6 when the noisy data are applied for the construction of Λ̌

    6 Conclusion

    In this paper, we have studied the robustness property of policy optimization in the presence of disturbances at each iteration. Using ISS Lyapunov techniques, it is demonstrated that the PI ultimately converges to a small neighborhood of the optimal solution as long as the disturbance is sufficiently small, and an explicit, quantifiable bound is provided. Based on the ISS property and Willems' fundamental lemma, a learning-based PI algorithm is proposed, and the persistent excitation of the exploratory signal can be easily guaranteed. A numerical simulation example is provided to illustrate the theoretical results.

    Data availability The data that support the findings of this study are available from the corresponding author, L. Cui, upon reasonable request.
