Chi Zhang · Minggang Gan · Chenchen Xue
Abstract In this paper, optimal switching and control approaches are investigated for switched systems with infinite-horizon cost functions and unknown continuous-time subsystems. First, for switched systems with autonomous subsystems, the optimal solution based on the finite-horizon HJB equation is proposed and a data-driven optimal switching algorithm is designed. Then, for switched systems with subsystem inputs, a data-driven optimal control approach based on the finite-horizon HJB equation is proposed. The data-driven approaches approximate the optimal solutions online from system state data rather than from subsystem models. Moreover, the convergence of the two approaches is analyzed. Finally, the validity of the two approaches is demonstrated by simulation examples.
Keywords Switched systems · Optimal switching · Optimal control · Data-driven control
Switched systems [1–3] have been intensively studied in the existing literature. Although great progress has been made in the control of switched systems, many problems remain unexplored or unsolved, since switching among subsystems complicates the system behavior. Among them, optimal control problems of switched systems [4–7] have received increasing attention, and adaptive dynamic programming (ADP) [8–12] approaches have been investigated for these problems [13–24].
Optimal switching for switched systems with autonomous subsystems is a fundamental problem that differs from the optimal control problems of ordinary systems, and ADP can directly provide approximate optimal solutions for different initial conditions [13–20]. For optimal switching of switched systems with discrete-time dynamics, ADP-based optimal solutions are proposed for finite-horizon [13] and infinite-horizon [14] problems. Optimal switching problems with constraints such as a switching cost [15] or a minimum dwell time [16] are then investigated. Moreover, a value iteration (VI) algorithm is used to derive the infinite-horizon solution [17]. For switched systems with continuous-time dynamics, a policy iteration (PI) based optimal switching solution is proposed together with its offline, online, and concurrent implementations [18,19]. To simplify the calculation, a single-loop PI algorithm with recursive least squares is then introduced in [20].
For optimal control problems of switched systems with subsystem inputs, ADP approaches have also been investigated [21–24]. An ADP-based algorithm is designed for learning the optimal cost-to-go function based on the switching times and the initial conditions [21]. An ε-optimal control scheme based on the iterative ADP algorithm is presented for a class of nonlinear discrete-time switched systems [22]. An ADP-based method is proposed to provide a feedback solution for unspecified initial conditions and different final times [23]. A novel VI-based off-policy ADP algorithm is proposed for the infinite-horizon adaptive optimal control of continuous-time linear periodic systems without exact knowledge of the system dynamics [24]. Moreover, an ADP-based scheme is developed for the optimal control of nonlinear impulsive systems with free impulse instants and is applied to the orbital maneuver of a spacecraft with a fixed final time using impulsive actuators [25]. A model-based and switching-based optimization method is presented for multi-intersection and multiphase traffic light systems in the framework of ADP [26].
Most of the above optimal switching and control approaches depend on perfect knowledge of the switched system models. However, system models sometimes cannot be acquired accurately, so approaches independent of system models [27] need to be explored. Model-independent optimal control approaches have been introduced for switched systems. ADP approaches are investigated for switched systems under the assumption that the dynamic equations can be evaluated at some sets [28,29]. Gradient-descent optimal switching approaches are investigated for continuous-time switched systems with finite-horizon cost functions [30,31]. An off-policy ADP algorithm is proposed for continuous-time linear periodic systems with an infinite-horizon cost function without exact knowledge of the system dynamics [24]. It follows that approaches which do not require the assumption that the dynamic equations can be evaluated at some sets still need to be investigated for switched systems with infinite-horizon cost functions and unknown dynamics.
In the ADP field for ordinary systems, PI and VI approaches have been developed for systems with unknown dynamics. An optimal control solution based on PI is proposed online without knowledge of the internal dynamics of the system [8]. PI approaches are proposed to iteratively update the control policy online without the system dynamics [9,10]. Data-driven ADP approaches based on VI are proposed for completely unknown dynamics [11,12]. In this paper, inspired by the ADP approaches in [11,12], novel optimal solutions based on the finite-horizon HJB equation are proposed for optimal switching and optimal control of switched systems with continuous-time dynamics and infinite-horizon cost functions. Then, data-driven optimal switching and control algorithms are proposed based on the finite-horizon HJB equation for switched systems with unknown subsystems and infinite-horizon cost functions; they require neither known subsystems nor the assumption that the dynamic equations can be evaluated at some sets. The algorithms take full advantage of the data produced by an initial switching or control policy to approximate the optimal switching or control policy online, starting from a given positive definite function. Compared with the optimal control of ordinary systems with unknown dynamics in [11,12], the optimal switching and optimal control of switched systems with unknown subsystems are more complicated because of the switching among subsystems. The proposed data-driven optimal switching and control approaches exploit data and do not require known subsystems, whereas the approaches in [19,20] address optimal switching and require known models of the switched systems.
The contributions of this paper are as follows: (1) Optimal solutions based on the finite-horizon HJB equations are proposed for the optimal switching and control problems of switched systems with continuous-time dynamics and infinite-horizon cost functions. (2) A novel data-driven optimal switching algorithm is designed based on the finite-horizon HJB equation to approximate the optimal switching policy for switched systems with unknown autonomous subsystems. (3) A data-driven optimal control algorithm that relies solely on data, with no need for the subsystem dynamics, is proposed for the first time to solve optimal control problems for switched systems with unknown subsystems and inputs. (4) The convergence of the approaches is analyzed.
The remainder of the paper is organized as follows. In Sect. 2, the preliminaries are introduced. In Sect. 3, the data-driven optimal switching approach is proposed for switched systems with unknown autonomous subsystems. In Sect. 4, the data-driven optimal control approach is proposed for switched systems with unknown dynamics. In Sect. 5, simulation examples demonstrate the validity of the approaches. Finally, the conclusion is drawn in Sect. 6.
Consider the switched system with continuous-time dynamics in the following form:

$$\dot{x}(t) = f_v(x(t), u(t)), \tag{1}$$
where $x(t) \in \mathbb{R}^n$ is the system state, which is measurable, $u(t) \in \mathbb{R}^m$ is the input, $v \in V$ represents the subsystem index, $V = \{1, 2, \ldots, N\}$ is the index set of all subsystems, $N$ is the number of subsystems, and $f_v: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ denotes the dynamics of subsystem $v$. Each $f_v$ ($v \in V$) is Lipschitz continuous in $\Omega_{xu}$, where $\Omega_{xu} \subset \mathbb{R}^n \times \mathbb{R}^m$ is the region of interest, which guarantees the uniqueness of the resulting state trajectory; in addition, there exists some policy under which the origin is an equilibrium point of the closed-loop system [20]. $\Omega \subset \mathbb{R}^n$ is the region in $\Omega_{xu}$ corresponding to the state $x$, and $\Omega_u \subset \mathbb{R}^m$ is the region in $\Omega_{xu}$ corresponding to the input $u$. Moreover, there exists $v \in V$ such that $f_v(0, 0) = 0$.
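To make the model concrete, the following minimal sketch rolls out a switched system of this form with forward Euler integration. The two scalar subsystems, the switching rule, the step size and the function name `simulate_switched` are hypothetical choices made for illustration only; they are not taken from the paper.

```python
import numpy as np

def simulate_switched(f, policy, x0, dt, steps, mu=None):
    """Forward-Euler rollout of x' = f[v](x, u) with v = policy(x) and u = mu(x)."""
    x = np.asarray(x0, dtype=float)
    xs, vs = [x.copy()], []
    for _ in range(steps):
        v = policy(x)                                   # active subsystem index in {1, ..., N}
        u = mu(x) if mu is not None else np.zeros(1)    # zero input if no feedback law is given
        x = x + dt * f[v](x, u)                         # ideal (instantaneous) switching
        xs.append(x.copy())
        vs.append(v)
    return np.array(xs), np.array(vs)

# Hypothetical two-mode example: both modes are stable scalar systems.
f = {1: lambda x, u: -x + u, 2: lambda x, u: -2.0 * x + u}
policy = lambda x: 1 if x[0] > 0.5 else 2               # assumed state-dependent switching rule
xs, vs = simulate_switched(f, policy, x0=[1.0], dt=0.01, steps=500)
```

The rollout returns the sampled states together with the mode that was active on each step, which is the kind of state data the data-driven algorithms later rely on.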
In this paper, the infinite-horizon cost function is defined as

$$V(x_0) = \int_0^{\infty} Q(x(t), u(t))\,\mathrm{d}t, \quad x(0) = x_0, \tag{2}$$
where $Q: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is a positive definite function. The input of the system can be defined in state-feedback form as $u(t) = \mu(x(t))$. The optimal switching problem and the optimal control problem are investigated for system (1) with cost (2) and unknown subsystem dynamics, and are stated as follows:
Problem 1 The optimal switching problem is to find the optimal switching policy $v^*(\cdot)$ that minimizes the cost function (2) for system (1), where the subsystems are autonomous.
Problem 2 The optimal control problem is to find the optimal control policy $(v^*(\cdot), \mu^*(\cdot))$ that minimizes the cost function (2) for system (1).
The main challenge addressed in this paper is how to obtain the optimal switching and control policies from data, without the subsystem models. The calculations associated with switching between subsystems normally require the system dynamics and become intractable when the dynamics are unknown.
The cost from time $t$ to infinity, starting from the state $x(t)$ at time $t$, can be defined as

$$V(x(t)) = \int_t^{\infty} Q(x(\tau), u(\tau))\,\mathrm{d}\tau.$$
Then, an assumption on the admissible control policy is introduced to guarantee the existence of the solution.
Definition 1 A control policy is admissible with respect to the cost (2) for system (1) if it stabilizes the system in the state region $\Omega$ and the cost $V(x_0)$ is finite for all $x_0 \in \Omega$.
Assumption 1 There exists at least one admissible control policy for system (1).
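On recorded data, the cost along a trajectory can be approximated numerically, which also gives a quick finiteness check for a candidate policy. The sketch below uses a left Riemann sum over a finite record; the helper name `cost_to_go`, the test trajectory and the truncation of the horizon at 10 s are assumptions made for illustration.

```python
import numpy as np

def cost_to_go(xs, us, Q, dt):
    """Left Riemann sum of Q(x(t), u(t)) from each sample to the end of the record."""
    stage = np.array([Q(x, u) for x, u in zip(xs, us)]) * dt
    return np.cumsum(stage[::-1])[::-1]      # entry k approximates the cost from t_k onward

dt = 0.001
ts = np.arange(0.0, 10.0, dt)
xs = np.exp(-ts)                             # hypothetical closed-loop trajectory x(t) = exp(-t)
us = np.zeros_like(ts)                       # zero input along the record
V = cost_to_go(xs, us, lambda x, u: x**2 + u**2, dt)
```

For this test trajectory the exact infinite-horizon cost from $x_0 = 1$ is $1/2$, so `V[0]` should be close to 0.5, and the sequence `V` is non-increasing because the stage cost is non-negative.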
In this section, the optimal switching problem with autonomous subsystems is discussed first. The finite-horizon HJB equation is derived for the switched system with autonomous subsystems and is then employed to solve the optimal switching problem.
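The derivation in this section follows the usual dynamic programming pattern. As a sketch only, assuming autonomous dynamics $\dot{x} = f_v(x)$, a stage cost $Q(x)$, and the symbols $V^*$ and $V_F^*$ for the optimal infinite-horizon cost and the optimal finite-horizon cost-to-go (these symbols and the layout are assumptions and are not copied from the numbered equations of the paper), the steps can be written as:

```latex
% Infinite-horizon HJB equation for autonomous subsystems (standard form, assumed notation)
0 = \min_{v \in V} \left\{ Q(x) + \left( \frac{\partial V^*(x)}{\partial x} \right)^{T} f_v(x) \right\}

% Optimality principle over a short interval [t, t + \delta t]
V_F^*(x(t), t) = \min_{v \in V} \left\{ \int_t^{t+\delta t} Q(x(\tau))\, d\tau
                 + V_F^*(x(t+\delta t), t+\delta t) \right\}

% First-order Taylor expansion with x(t + \delta t) \approx x(t) + f_v(x(t))\,\delta t
V_F^*(x(t+\delta t), t+\delta t) \approx V_F^*(x, t)
    + \left( \frac{\partial V_F^*}{\partial x} \right)^{T} f_v(x)\,\delta t
    + \frac{\partial V_F^*}{\partial t}\,\delta t

% Substituting, cancelling V_F^*(x, t), dividing by \delta t and letting \delta t \to 0
-\frac{\partial V_F^*(x, t)}{\partial t}
    = \min_{v \in V} \left\{ Q(x) + \left( \frac{\partial V_F^*(x, t)}{\partial x} \right)^{T} f_v(x) \right\}
```

In this form the infinite-horizon equation is the stationary case of the finite-horizon one, which is why a backward sweep in time from a positive definite terminal function can be read as value iteration.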
The infinite-horizon HJB equation [19,20] for the switched system with autonomous subsystems is described as
For the switched system with autonomous subsystems, a finite-horizon cost-to-go from time $t$ to a fixed final time $t_f$, starting from the state $x(t)$ at time $t$, can be defined as
where $\delta t > 0$ is very small.
According to the optimality principle, the optimal cost-to-go can be transformed into
When $\delta t > 0$ is very small and the switching is ideal (namely, the switching is instantaneous), applying the first-order Taylor expansion of
Inspired by the VI idea [11,12], the optimal solution is provided for switched systems based on the finite-horizon HJB equation. First, with the aid of the finite-horizon HJB equation (10), a function $V_{F0}(x, t)$ can be defined as follows:
with the terminal condition $V_{F0}(\cdot, t_f) = V_0(\cdot)$, where $V_0$ is a positive definite function. Then the following theorem, which states the optimal solution, holds for $V_{F0}(\cdot, t)$.
The optimal solution based on the finite-horizon HJB equation starts from a given positive definite function and can be regarded as a VI approach for switched systems with continuous-time subsystem dynamics. Next, an algorithm is designed to calculate the optimal solution through (28)–(30) without known subsystems.
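The value-iteration reading of the finite-horizon equation can be illustrated on a grid when the dynamics are known. The sketch below is a model-based toy, not the data-driven algorithm of the paper: the two scalar subsystems, the stage cost, the grid and the step size are hypothetical, and one backward step is taken as $V(x, t - \delta t) = \min_v \{Q(x)\,\delta t + V(x + f_v(x)\,\delta t, t)\}$.

```python
import numpy as np

# Hypothetical scalar subsystems and stage cost, chosen only for illustration.
f = [lambda x: -x, lambda x: -x**3]
Q = lambda x: x**2

xs = np.linspace(-2.0, 2.0, 401)    # state grid over the region of interest
dt = 0.01                           # small time step (delta t)
V = xs**2                           # terminal condition V0: a positive definite function

for _ in range(1000):               # sweep backward in time from the final time
    # candidate values: run mode v for one step, then follow the current value function
    cand = np.stack([Q(xs) * dt + np.interp(xs + fv(xs) * dt, xs, V) for fv in f])
    V_new = cand.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-9:
        break
    V = V_new

policy = cand.argmin(axis=0) + 1    # greedy mode index in {1, 2} at each grid point
```

For these assumed dynamics the first mode decays faster for $|x| < 1$ and the second for $|x| > 1$, so the greedy policy is expected to pick mode 1 near the origin and mode 2 far from it, and the value near the origin approaches $x^2/2$ up to discretization error.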
In this section, according to the above optimal solution and the ADP idea, a data-driven optimal switching algorithm based on the HJB equation is proposed to approximate the optimal cost and the switching policy for switched systems with unknown autonomous subsystems.
In the process of solving (13), the cost $V_{F1}(x, s)$ in (13) is unknown and, to facilitate calculation, it can be represented by
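where $\Psi(x) = [\Psi_1(x), \Psi_2(x), \ldots, \Psi_{N_c}(x)]^T$ consists of linearly independent basis functions $\Psi_j(x): \mathbb{R}^n \to \mathbb{R}$ ($j = 1, 2, \ldots, N_c$), $C_v(s)$, $v \in V$, are the weight vectors corresponding to subsystem $v$, and $e_{\Psi,v}(x, s)$ is the approximation error. $N_c$ is the number of basis functions $\Psi_j(x)$.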
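A linear-in-weights representation of this kind can be fitted by ordinary least squares. The following sketch assumes a scalar state, an even polynomial basis and a made-up target function standing in for the unknown cost; the names `psi`, `target` and `residual` are illustrative only.

```python
import numpy as np

def psi(x, Nc=5):
    """Even polynomial basis [x^2, x^4, ..., x^(2*Nc)] for a scalar state (assumed choice)."""
    x = np.atleast_1d(x)
    return np.stack([x ** (2 * j) for j in range(1, Nc + 1)], axis=1)

xs = np.linspace(-1.0, 1.0, 201)
target = 0.5 * xs**2 + 0.1 * xs**4                    # hypothetical function to be represented
C, *_ = np.linalg.lstsq(psi(xs), target, rcond=None)  # weight vector of the expansion
residual = np.max(np.abs(psi(xs) @ C - target))       # plays the role of the approximation error
```

Because the target here lies in the span of the basis, the residual is at rounding level; for a general cost function the residual shrinks as more basis functions are added.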
With (16) and (17) applied in (13), it can be deduced that
Applying a switching policy $v_0$ to the system generates a state trajectory with a large amount of data that are useful in the following calculation. Some necessary data matrices are defined as follows:
where $t_1 < t_2 < \cdots < t_l$ are the selected time instants.

Assumption 2 There exist positive integers $L$, $L_1$ and positive numbers $\alpha$, $\alpha_1$ such that the following equalities hold:

where the time instant $t_r$ can be any instant among the selected instants $t_1, t_2, \ldots, t_l$;

for $\forall v \in V$, where the time instants $t_r$ satisfy the condition that $v_0(x(t_r)) = v$.

This kind of assumption is similar to the persistent excitation condition. According to [19,20], for the optimal switching problem of switched systems, Assumption 2 can be satisfied through random switching.

Along the state trajectory, it can be obtained that

Since $\delta t$ is very small, the value of $v_0(x)$ can be regarded as constant during the time interval $[t_r, t_r + \delta t)$. It follows that

where the time instants $t_r$ satisfy the condition that $v_0(x(t_r)) = v$, for $\forall v \in V$. Then, with the acquired data matrices applied, it can be obtained from (19) that

where the time instants $t_r$ satisfy the condition that $v_0(x(t_r)) = v$, for $\forall v \in V$. Then, under Assumption 2, the estimate of $C_v(s)$ can be obtained from (20) with the data matrices as follows:

According to the above analysis, an optimal switching algorithm can be presented for switched systems with unknown autonomous subsystems.

Algorithm 1

By means of the produced data, the proposed data-driven HJB equation-based optimal switching algorithm gradually approximates the optimal solution, starting from a given positive definite function, without a priori knowledge of the subsystem models. Then, the approximate optimal cost function can be obtained as

with the corresponding approximate optimal switching policy

Remark 2 The values taken by the initial switching policy $v_0(x)$ should include every element of $V$, so that enough data exist for every $v \in V$ to carry out the calculation. The produced state data are collected at the beginning, and the data matrices $g(t_r)$, $H_v$ and $K$ are calculated from them once at the beginning.
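The role of the per-subsystem data and of the excitation-type condition can be sketched with a plain least-squares regression. This is an illustration in the spirit of the estimate above, not the paper's data matrices $g(t_r)$, $H_v$ and $K$: samples are grouped by the mode that was active, a minimum-eigenvalue check stands in for the richness condition, and the function name `fit_mode_weights`, the threshold `alpha`, the basis and the synthetic data are all assumptions.

```python
import numpy as np

def fit_mode_weights(Psi, x, x_next, modes, V, Q, dt, alpha=1e-8):
    """For each mode v, fit Psi(x)^T C_v to Q(x) dt + V(x_next) on the samples where v was active."""
    C = {}
    for v in np.unique(modes):
        idx = modes == v
        P = Psi(x[idx])                               # rows: Psi(x(t_r))^T with v0(x(t_r)) = v
        gram = P.T @ P / idx.sum()
        if np.linalg.eigvalsh(gram).min() < alpha:    # excitation-type richness check
            raise ValueError(f"data for mode {v} are not rich enough")
        y = Q(x[idx]) * dt + V(x_next[idx])           # one-step cost plus current value estimate
        C[int(v)], *_ = np.linalg.lstsq(P, y, rcond=None)
    return C

# Synthetic data from two hypothetical scalar modes under random switching.
rng = np.random.default_rng(0)
dt = 0.01
x = rng.uniform(-1.0, 1.0, 400)
modes = rng.integers(1, 3, 400)                       # random mode labels in {1, 2}
x_next = np.where(modes == 1, x - x * dt, x - x**3 * dt)
Psi = lambda z: np.stack([z**2, z**4, z**6], axis=1)
C = fit_mode_weights(Psi, x, x_next, modes, V=lambda z: z**2, Q=lambda z: z**2, dt=dt)
```

Only the sampled states and the recorded mode labels enter the regression; the functions that generated `x_next` are used solely to produce the synthetic data. Taking the pointwise minimum of $\Psi(x)^T C_v$ over $v$ then gives one value-iteration step.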
Remark 3 The optimal switching policy is derived for switched systems with unknown subsystem dynamics under the premise that the subsystem dynamics are certain, i.e., free of uncertainty. However, since the optimal switching policy is obtained directly from system data rather than from a system model, small uncertainties can be handled by the approximation functions fitted to the data. When the parameter uncertainties are confined to a small range, the optimal switching policy will function well. When the parameter uncertainties are confined to a certain range, the optimal switching policy may become a suboptimal switching policy. When the parameter uncertainties exceed this range, the optimal switching policy will become unsuitable. Parameter uncertainties of different extents thus affect the optimal switching policy differently.

In this section, the convergence of Algorithm 1 is analyzed in Theorem 2 based on Theorem 1. First, Lemma 1 [32] is introduced for the proof of Theorem 2.

In this section, based on the above optimal switching approach, the optimal control problem is investigated for the switched system (1) with the input. First, the finite-horizon HJB equation is derived for the switched system (1) and is then employed to solve the optimal control problem.

Similar to (10), a finite-horizon cost-to-go from time $t$ to a fixed final time $t_f$, starting from the state $x(t)$ at time $t$, can be defined for the switched system (1):

and its recursion formula is

Theorem 3 provides the optimal solution based on the finite-horizon HJB equation (26).
Theorem 3 Based on the finite-horizon HJB equation (26), a function $V_{F0}(x, t)$ can be defined as follows:

with the terminal condition $V_{F0}(\cdot, t_f) = V_0(\cdot)$, where $V_0$ is a positive definite function. Then for $\forall t$

From a given positive definite function $V_0(\cdot)$, the optimal cost can be approximated through (28) and (29), and the optimal control policy can be approximated through (30). However, with unknown subsystems, how to calculate the optimal solution through (28)–(30) remains to be investigated.

In this section, according to the above analysis, a data-driven optimal control algorithm based on the HJB equation is proposed to approximate the optimal cost and the corresponding optimal control policy for the switched system with unknown subsystems.

The cost $V_{F1}(x, s)$ in (13) is unknown and, to facilitate calculation, it can be represented by

With (31) and (32) applied in (28), it can be deduced that

Applying a control policy $(v_0(x), \mu_0(x))$ to the system generates a state trajectory with a large amount of data that are useful in the following calculation. Some necessary data matrices are defined as follows:

where $t_1 < t_2 < \cdots < t_l$ are the selected time instants.

Assumption 3 There exist positive integers $L$, $L_1$ and positive numbers $\alpha$, $\alpha_1$ such that the following equalities hold:

where the time instant $t_r$ can be any instant among the selected instants $t_1, t_2, \ldots, t_l$;

for $\forall v \in V$, where the time instants $t_r$ satisfy the condition that $v_0(x(t_r)) = v$.

In the optimal control of ordinary systems, this kind of assumption can be satisfied by adding an exploration noise, such as sinusoidal signals or random noise, to the input [9–12]. In view of the above, for the optimal control problem of switched systems, Assumption 3 can be satisfied by applying an exploration noise to the input together with random switching.
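One possible way to generate such exciting data is sketched below. The frequencies, the amplitude, the hold length and the helper names `exploration_input` and `random_switching` are assumptions for illustration; the paper does not specify them here.

```python
import numpy as np

rng = np.random.default_rng(0)

def exploration_input(t, freqs=(1.0, 3.0, 7.0), amp=0.1):
    """Sum of sinusoids to be added to the nominal input (frequencies and amplitude assumed)."""
    return amp * sum(np.sin(w * t) for w in freqs)

def random_switching(n_steps, n_modes, hold=10):
    """Piecewise-constant random mode sequence, each draw held for `hold` samples."""
    n_draws = -(-n_steps // hold)                     # ceiling division
    draws = rng.integers(1, n_modes + 1, size=n_draws)
    return np.repeat(draws, hold)[:n_steps]

ts = np.arange(0.0, 10.0, 0.01)
noise = exploration_input(ts)                         # exploration signal over the record
modes = random_switching(len(ts), n_modes=3)          # mode label for every sample
```

Holding each random draw for several samples keeps the active mode constant over each short interval $[t_r, t_r + \delta t)$, in line with the constancy argument used in the derivation.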
Since $\delta t$ is very small, the value of $v_0(x)$ can be regarded as constant during the time interval $[t_r, t_r + \delta t)$. Then, with sufficient data employed and the corresponding data matrices applied, it can be obtained from (34) that

where the time instants $t_r$ satisfy the condition that $v_0(x(t_r)) = v$, for $\forall v \in V$. Then, under Assumption 3, the estimate of $C_v(s)$ can be obtained from (35) with the data matrices as follows:

In (33), the control input is unknown. To facilitate calculation, an approximation function can be employed to replace the control input. The control input $\mu$ corresponding to subsystem $v$ can be represented by $\mu_{2,v}(x, s)$ as follows:

where $\Theta(x) = [\Theta_1(x), \Theta_2(x), \ldots, \Theta_{N_l}(x)]^T$ is a vector of linearly independent basis functions $\Theta_j(x)$, a weight vector is associated with each $v \in V$, and $e_{\Theta,v}(x, s)$ is the approximation error. $N_l$ is the number of basis functions $\Theta_j$.

Applying (37) and according to (33), the weight corresponding to the minimum of the right-hand side can be estimated by $L_{1,v}(s)$ as follows:

Algorithm 2

By means of the produced data, the proposed data-driven HJB equation-based optimal control algorithm gradually approximates the optimal solution, starting from a given positive definite function, without a priori knowledge of the subsystem models. Then, the approximate optimal cost function can be obtained as

with the corresponding approximate optimal control policy

Remark 4 Similar to Remark 2, the values taken by the initial switching policy $v_0(x)$ in $(v_0(x), \mu_0(x))$ should include every element of $V$, so that enough data exist for every $v \in V$ to carry out the calculation.

In this section, the convergence of Algorithm 2 is analyzed in Theorem 4 based on Theorem 3.

In this section, three examples simulated in MATLAB are presented to validate the effectiveness of the data-driven optimal switching and control approaches in this paper.
Example 1 Consider a switched system, as in [13,18,19], consisting of autonomous subsystems as follows:

The optimal switching policy is known from [13] as

Choose the initial switching policy as $v_0(x) = (-1)^{\lfloor x/0.03 \rfloor}/2 + 1.5$. The basis functions are selected as $\Phi(x) = [x^2, x^4, x^6, x^8, x^{10}]^T$ and $\Psi(x) = [x^2, x^4, x^6, x^8, x^{10}, x^{12}]^T$. Set the sample period as $\delta t = 0.002$ s.

Apply Algorithm 1 in this example as shown in Fig. 1. Apply the initial switching policy $v_0(\cdot)$ to the system and use the online state data from $t = 0$ to 0.5 s to calculate the data matrices and then to update the weight vectors at each iteration. After 0.5 s of calculation, the approximate optimal cost is obtained through 277 iterations, together with the corresponding approximate optimal switching policy, which is then applied to the system. The initial cost, the approximate optimal cost and the optimal cost $V^*(\cdot)$ are shown in Fig. 2. The approximate optimal cost is very close to the optimal cost $V^*(\cdot)$. The state trajectories with the initial switching policy $v_0(\cdot)$, the approximate optimal switching policy and the optimal switching policy $v^*(\cdot)$ applied to the system after $t = 1$ s are shown in Fig. 3, where the largest error between the state trajectories corresponding to the approximate optimal switching policy and $v^*(\cdot)$ is 0, while the largest error between the state trajectories corresponding to $v_0(\cdot)$ and $v^*(\cdot)$ is 0.2624. At the end of 5 s in Fig. 3, the state trajectories with the approximate optimal switching policy and the optimal switching policy $v^*(\cdot)$ both reach 0.0125, which is near 0, while the one with the initial switching policy $v_0(\cdot)$ reaches 0.2523. The switching policies $v_0(\cdot)$, the approximate optimal switching policy and $v^*(\cdot)$ for $x \in [-2, 2]$ are illustrated in Fig. 4. The approximate optimal switching policy and $v^*(\cdot)$ are almost the same: the similarity rate between them is 100%, while the similarity rate between $v_0(\cdot)$ and $v^*(\cdot)$ is 50.62%.
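The initial switching policy of Example 1 alternates between the two subsystems in bands of width 0.03, which the short sketch below makes explicit. The formula for `v0` follows the text; the `similarity_rate` helper is only an assumed reading of the similarity rate (the percentage of grid points at which two policies agree), since the paper's exact definition is not given here.

```python
import numpy as np

def v0(x):
    """Initial switching policy of Example 1: (-1)^floor(x / 0.03) / 2 + 1.5, with values in {1, 2}."""
    return (-1.0) ** np.floor(np.asarray(x) / 0.03) / 2 + 1.5

def similarity_rate(policy_a, policy_b, xs):
    """Percentage of grid points at which two switching policies select the same subsystem."""
    return 100.0 * np.mean(policy_a(xs) == policy_b(xs))

xs = np.linspace(-2.0, 2.0, 401)     # grid over the interval used in Fig. 4 (grid size assumed)
values = np.unique(v0(xs))           # subsystem indices visited by the initial policy
```

Because the policy visits both indices across the state range, data are produced for every subsystem, as Remark 2 requires.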
Fig. 1 Simulation flow chart
Fig. 2 Costs achieved by Algorithm 1 applied in Example 1
Fig. 3 State trajectories with Algorithm 1 applied in Example 1
Fig. 4 Switching policies achieved by Algorithm 1 applied in Example 1

Example 2 Consider a mass-spring-damper system as in [19,20]:

with $v \in \{1, 2, 3\}$, $F_1 = 1$, $F_2 = -1$ and $F_3 = 0$. Here, the state $x_1(t)$ is the displacement of the mass measured from the relaxed length of the spring, and $F_v$ is the external force acting on the mass. The initial state is $x(0) = [2, 2]$ and the function $Q$ is given by

Choose the initial switching policy $v_0(x) = 1$ when $\mathrm{mod}(\mathrm{floor}(x_1/0.2), 3) = 0$, $v_0(x) = 2$ when $\mathrm{mod}(\mathrm{floor}(x_1/0.2), 3) = 1$, and $v_0(x) = 3$ otherwise. The basis functions are polynomials with all possible combinations of the state variables up to the 4th degree without repetitions, selected as in [19,20]. Set the sample period $\delta t = 0.01$ s.

Apply Algorithm 1 in this example as shown in Fig. 1. Apply the initial switching policy $v_0(\cdot)$ to the system and use the online state data from $t = 0$ to 4 s to calculate the data matrices and then to update the weight vectors at each iteration. After 2 s of calculation, the approximate optimal cost is obtained through 300 iterations, together with the corresponding approximate optimal switching policy, which is then applied to the system. The state trajectories with the initial switching policy $v_0(\cdot)$ and the approximate optimal switching policy applied to the system after $t = 6$ s are shown in Fig. 5, where the state trajectory corresponding to the approximate optimal switching policy converges to the origin quickly after $t = 6$ s, while the trajectory corresponding to $v_0(\cdot)$ goes to another point with decreasing oscillation amplitude. The state trajectories with $v_0(\cdot)$ and the approximate optimal switching policy applied to the system from $t = 0$ s are shown in Fig. 6, which indicates the same effect as Fig. 5. In Fig. 6, the state trajectories with the approximate optimal switching policy applied converge to within an absolute error of 0.1 of the origin at 18.02 s and 15.16 s. The initial cost and the approximate optimal cost are shown in Fig. 7. It can be seen that the approximate optimal cost is less than the initial cost.

Fig. 5 State trajectories with Algorithm 1 applied in Example 2
Fig. 6 State trajectories with Algorithm 1 applied in Example 2 from t = 0
Fig. 7 Costs achieved by Algorithm 1 applied in Example 2

Example 3 Consider a switched system [21] as follows:

Apply Algorithm 2 in this example in the same way as in Fig. 1. Apply the initial control policy $(v_0(\cdot), \mu_0(\cdot))$ to the system and use the online state data from $t = 0$ to 10 s to calculate the data matrices and then to update the weight vectors at each iteration. After 5 s of calculation, the approximate optimal cost is obtained through 44 iterations, together with the corresponding approximate optimal control policy, which is then applied to the system. The state trajectories with the initial control policy $(v_0(\cdot), \mu_0(\cdot))$ and the approximate optimal control policy applied to the system after $t = 15$ s are shown in Fig. 8, where the state trajectory corresponding to the approximate optimal control policy converges to the origin quickly after $t = 15$ s, while the trajectory corresponding to $(v_0(\cdot), \mu_0(\cdot))$ oscillates around the origin. The state trajectories with $(v_0(\cdot), \mu_0(\cdot))$ and the approximate optimal control policy applied to the system from $t = 0$ s are shown in Fig. 9, which indicates the same effect as Fig. 8. In Fig. 9, the state trajectories with the approximate optimal control policy applied converge to within an absolute error of 0.1 of the origin at 4.22 s and 3.94 s. The costs are shown in Fig. 10. It can be seen that the approximate optimal cost is less than the initial cost over most of the region.
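Two quantities used in Examples 2 and 3 are easy to state in code: the banded initial switching policy of Example 2 and the time at which a trajectory enters and stays within an error band of 0.1 around the origin. The policy follows the rule given in the text; the `settling_time` helper and its test signal are an assumed reading of how the reported convergence times could be measured, not the authors' procedure.

```python
import numpy as np

def v0_example2(x1):
    """Initial switching policy of Example 2: cycles through modes 1, 2, 3 in bands of width 0.2."""
    return int(np.floor(x1 / 0.2)) % 3 + 1    # Python's % matches MATLAB mod for negative values

def settling_time(ts, xs, tol=0.1):
    """First sample time after which |x| stays within `tol` of the origin, or None."""
    outside = np.abs(xs) > tol
    if not outside.any():
        return ts[0]
    last = np.flatnonzero(outside)[-1]        # last sample that violates the band
    return ts[last + 1] if last + 1 < len(ts) else None

ts = np.linspace(0.0, 10.0, 1001)
xs = 2.0 * np.exp(-ts)                        # hypothetical decaying state component
t_settle = settling_time(ts, xs)
```

For the test signal the band $|x| \le 0.1$ is entered at $t = \ln 20 \approx 2.996$, so on the 0.01 s grid the helper returns 3.0 s.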
Fig. 8 State trajectories with Algorithm 2 applied in Example 3
Fig. 9 State trajectories with Algorithm 2 applied in Example 3 from t = 0
Fig. 10 Costs achieved by Algorithm 2 applied in Example 3

The simulation results show that Algorithms 1 and 2 can effectively approximate the optimal cost and the optimal switching or control policy online when the subsystem models are unknown, and can therefore effectively solve the optimal switching and control problems for switched systems with unknown subsystems.

In this paper, optimal solutions based on the finite-horizon HJB equation are provided for the optimal switching problem and the optimal control problem of switched systems with continuous-time dynamics. Data-driven algorithms are proposed to approximate the optimal switching and control policies with unknown subsystems. The approaches use the data produced by the initial switching or control policy instead of the subsystem models. However, no restrictions on the switching policy or the input are considered in this paper, whereas practical applications involve many constraints and uncertainties. Therefore, optimal control problems for switched systems with constraints and uncertainties will be investigated in future work.

3.3 Convergence analysis
4 Data-driven optimal control for switched systems
4.1 Optimal solution based on the finite-horizon HJB equation
4.2 Data-driven optimal control algorithm based on the HJB equation
4.3 Convergence analysis
5 Simulation
6 Conclusions
Control Theory and Technology, 2021, Issue 3