Hongyun Xiong·Jiangxiong Han·Xiaohong Nian·Shiling Li
Abstract In this paper, a Nash equilibrium computation problem for a zero-sum game with a common constraint set is investigated over two time-varying multi-agent subnetworks, where the two subnetworks have opposite payoff functions. A novel distributed projection subgradient algorithm with a random sleep scheme is developed to reduce the amount of computation each agent performs in the process of computing the Nash equilibrium. In our algorithm, each agent uses an independent identically distributed Bernoulli decision to either compute the subgradient and perform the projection operation or keep its previous consensus estimate, which effectively reduces both the amount of computation and the computation time. Moreover, the traditional stepsize assumption adopted in existing methods is removed, and the stepsizes in our algorithm are randomized and diminishing. Besides, we prove that all agents converge to the Nash equilibrium with probability 1 under our algorithm. Finally, a simulation example verifies the validity of the algorithm.
Keywords Zero-sum game · Nash equilibrium · Time-varying multi-agent network · Projection subgradient algorithm · Random sleep scheme
The Nash equilibrium computation problem in multi-agent network games has received increasing attention in social networks, cloud computing, smart grids [1–3], and so on. In recent years, researchers in the field have proposed several useful methods to compute the Nash equilibrium of different game problems. For example, a distributed operator splitting method was developed in [4] to find generalized Nash equilibria. Ye and Hu [5] solved the Nash equilibrium of a non-cooperative game by combining consensus with the gradient play method. Furthermore, Deng et al. [6–8] studied aggregative game problems, in which each agent's cost depends on its own decision and the aggregate of all agents' decisions. Moreover, Zeng et al. [9] considered Nash equilibrium computation for multi-network games, in which agents cooperate within each subnetwork while the subnetworks compete with each other.
It should be noted that the game problems studied in [4–8] involve only one multi-agent network, whereas multi-network games are closer to practical application scenarios. In particular, the two-network zero-sum game is the most common multi-network game and has important research value in the power allocation of Gaussian communication channels [10] and adversarial detection in sensor networks [11]. Many researchers have proposed useful algorithms to compute the Nash equilibrium of two-network zero-sum games. A continuous-time dual-based method for finding the Nash equilibrium of the two-network zero-sum game was provided in [12]. Besides, a distributed projection subgradient algorithm was proposed in [13], which studied Nash equilibrium computation for two-network zero-sum games under different stepsizes and different communication topologies. In addition, Shi and Yang [14] proposed a novel incremental algorithm to reduce the communication burden between the two subnetworks.
In the distributed methods mentioned above, each agent needs to calculate local subgradient information at every iteration. Computing a subgradient can involve large amounts of data and multiple function evaluations, and the cost function is usually complex. The computational load of each agent in the above methods is therefore large, and it is worthwhile to design a new algorithm that reduces the agents' computation. To this end, the random sleep scheme was introduced in [15] for distributed optimization problems, and a distributed algorithm with a random sleep scheme was proposed in [16] for constrained convex optimization over multi-agent networks. Besides, the distributed optimization problem over multi-cluster networks was studied in [17] by combining a hierarchical algorithm with the random sleep scheme. Inspired by these works, this paper considers the Nash equilibrium computation problem of two-network zero-sum games over two time-varying multi-agent subnetworks and proposes a new algorithm with a random sleep scheme to reduce the agents' computation. Under the random sleep strategy, each agent uses an independent identically distributed Bernoulli decision to determine whether to compute the local subgradient and perform the projection operation at the current time, or to perform only a consensus update combining its neighbors' information. The computational burden of both the subgradient and the projection is therefore reduced in our algorithm.
This paper mainly studies the Nash equilibrium computation problem of two-network zero-sum games with a common constraint set under time-varying communication graphs. To effectively reduce the amount of computation and the running time, inspired by [15–17], a new projection subgradient algorithm with a random sleep scheme is proposed, and the consensus and convergence of the algorithm are analyzed. The main contributions of this paper are as follows:
(1) This paper proposes a new distributed projection subgradient algorithm with a random sleep scheme to compute the Nash equilibrium of a two-network zero-sum game under a time-varying weight-balanced communication topology. Compared with the existing algorithms [12–14], the proposed algorithm not only preserves convergence but also reduces the amount of computation and the running time.
(2) Compared with the coordinated stepsizes in [13,14], our algorithm uses randomized diminishing stepsizes that are nonhomogeneous across agents. Our algorithm is therefore more flexible in the choice of stepsize than [13,14].
(3) We use a distributed optimization method to find the Nash equilibrium of the zero-sum game. In our algorithm, the agents of the two subnetworks respectively minimize and maximize the same convex–concave payoff function, whereas all agents in [16] jointly minimize a convex objective function. The design and the proof of our algorithm therefore differ from [16].
(4) Different from [12], our algorithm is discrete-time, and the introduction of the random sleep scheme makes the convergence analysis more involved than for the continuous-time algorithm in [12]. Moreover, our algorithm handles a common constraint set and allows the cost functions to be nonsmooth, which is more general than the setting of [18].
The organization of this paper is as follows. In Sect. 2, preliminaries are introduced and the considered two-network zero-sum game Nash equilibrium computation problem is formulated. In Sect. 3, the novel distributed projection subgradient algorithm with random sleep scheme is designed, and its consensus and convergence under uncoordinated stepsizes are established. In Sect. 4, numerical examples are given. Finally, conclusions are drawn in Sect. 5.
ℝ is the set of real numbers, and ℕ is the set of natural numbers. ℝⁿ denotes the n-dimensional Euclidean space, and ‖·‖ denotes the Euclidean norm. rᵀ denotes the transpose of a vector r. The mathematical expectation of a random variable Y is denoted by E[Y]. Let X be a closed convex set; P_X[α] denotes the projection of a vector α ∈ ℝⁿ onto X, that is, ‖α − P_X[α]‖ ≤ ‖α − γ‖ for any γ ∈ X. The projection operator P_X[·] : ℝⁿ → X has the following important (nonexpansiveness) property: ‖P_X[α] − P_X[β]‖ ≤ ‖α − β‖ for all α, β ∈ ℝⁿ.
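As an illustration of the projection operator (not the paper's code), the following sketch implements Euclidean projection onto a box, the constraint-set shape used in the numerical examples, and checks the nonexpansiveness property numerically:

```python
import numpy as np

def project_box(alpha, lo, hi):
    """Euclidean projection onto the box X = [lo, hi]^n: the unique point
    of X closest to alpha, i.e. argmin over g in X of ||alpha - g||."""
    return np.clip(alpha, lo, hi)

a, b = np.array([6.0, -2.0]), np.array([3.0, 0.5])
pa, pb = project_box(a, 1.0, 5.0), project_box(b, 1.0, 5.0)
# nonexpansiveness: ||P_X[a] - P_X[b]|| <= ||a - b||
assert np.linalg.norm(pa - pb) <= np.linalg.norm(a - b)
```

For a one-dimensional interval such as X = [1, 5], the projection simply clamps each coordinate, which is how the projection step is realized in the later examples.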
For a convex function f(x), a subgradient ψ(x₁) at x₁ ∈ ℝⁿ satisfies f(x) − f(x₁) ≥ ⟨x − x₁, ψ(x₁)⟩ for all x ∈ ℝⁿ. For a concave function f(x), a subgradient ψ(x₁) at x₁ ∈ ℝⁿ satisfies f(x) − f(x₁) ≤ ⟨x − x₁, ψ(x₁)⟩ for all x ∈ ℝⁿ. The subdifferential ∂f(x₁) of f(x) at x₁ is the set of all subgradients of f(x) at x₁. A real-valued function f(x, y) is (strictly) convex–concave if it is (strictly) convex in x for any fixed y and (strictly) concave in y for any fixed x. The subdifferential of f(x, y) with respect to x is denoted by ∂ₓ f(x, y); similarly, the subdifferential with respect to y is denoted by ∂ᵧ f(x, y).
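The convex subgradient inequality can be checked numerically on a simple nonsmooth function; the following illustrative snippet (not from the paper) uses f(x) = |x|:

```python
def f_abs(x):
    return abs(x)

def subgrad_abs(x1):
    """A subgradient of the convex function f(x) = |x| (illustrative):
    sign(x1) away from 0; any value in [-1, 1] is valid at the kink x1 = 0,
    and we fix the choice 0 there."""
    return 1.0 if x1 > 0 else (-1.0 if x1 < 0 else 0.0)

# subgradient inequality: f(x) - f(x1) >= (x - x1) * psi(x1)
for x1 in [0.0, 2.0, -1.5]:
    psi = subgrad_abs(x1)
    for x in [-2.0, -0.5, 0.0, 0.7, 3.0]:
        assert f_abs(x) - f_abs(x1) >= (x - x1) * psi - 1e-12
```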
In this section, we give some preliminary knowledge and formulate our problem.
For a function f(x, y), if there exists a pair (x*, y*) ∈ X × Y such that f(x*, y) ≤ f(x*, y*) ≤ f(x, y*) for all x ∈ X, y ∈ Y, then (x*, y*) is called a saddle point of f. In this paper, the sets X and Y are nonempty convex compact sets and the function f is convex–concave, which ensures that the set of saddle points of f is nonempty.
A game is denoted by (M, S, U), where |M| = N ∈ ℕ, N ≥ 2, is the number of players, S = s₁ × … × s_N is the action set, with sᵢ the action set of player i, and U = (u₁, …, u_N), where uᵢ : S → ℝ is the payoff function of player i. Each player aims to minimize its cost function until no player can improve its payoff by unilaterally changing its action. An action profile is a Nash equilibrium if uᵢ(xᵢ*, x₋ᵢ*) ≤ uᵢ(xᵢ, x₋ᵢ*) for every player i ∈ M and every xᵢ ∈ sᵢ, where x₋ᵢ denotes the actions of all players except i. An N-person zero-sum game satisfies u₁ + … + u_N = 0. In particular, a two-person zero-sum game has only two players, whose payoff functions satisfy u₁ + u₂ = 0. An important conclusion is that the Nash equilibrium of a two-person zero-sum game is a saddle point of the payoff function.
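The saddle point / Nash equivalence can be illustrated numerically. For the convex–concave payoff f(x, y) = x² − y² on X = Y = [−1, 1] (an illustrative choice, not from the paper), the saddle-point inequalities hold at (0, 0):

```python
import numpy as np

# illustrative convex-concave payoff: convex in x, concave in y
f = lambda x, y: x**2 - y**2
x_star, y_star = 0.0, 0.0            # saddle point on X = Y = [-1, 1]
grid = np.linspace(-1.0, 1.0, 201)

# saddle inequalities: f(x*, y) <= f(x*, y*) <= f(x, y*) for all x, y
assert all(f(x_star, y) <= f(x_star, y_star) + 1e-12 for y in grid)
assert all(f(x_star, y_star) <= f(x, y_star) + 1e-12 for x in grid)
```

In the two-person zero-sum reading, player 1 (minimizer) plays x, player 2 (maximizer) plays y, and neither can improve by deviating unilaterally from (0, 0).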
In this paper, we provide a novel distributed projection subgradient algorithm with a random sleep scheme to find the Nash equilibrium of a two-network zero-sum game under time-varying weight-balanced topologies; each agent updates its state only by communicating with its neighbors until it finally converges to the Nash equilibrium. An illustrative communication topology for the two-network zero-sum game is given in Fig. 1. The same problem was studied in [13], where each agent computes subgradient information and performs a projection operation at every iteration. In contrast, our algorithm with the random sleep scheme effectively reduces the computation and energy burden while still ensuring that the agents converge to the Nash equilibrium.
Fig. 1 The communication topology of the two-network zero-sum game
In this section, a novel distributed algorithm is proposed in Sect. 3.1, the consensus result of the proposed algorithm is given in Sect. 3.2, and the convergence of the proposed algorithm is analyzed in Sect. 3.3.
In this subsection, a distributed projection subgradient algorithm with a random sleep scheme is first provided to find the Nash equilibrium of the two-network zero-sum game under time-varying weight-balanced communication topologies. First, a useful assumption on the communication topologies is given.
Assumption 1 The time-varying communication graphs G1(k) and G2(k) are uniformly jointly strongly connected, and Gcon(k) is uniformly jointly bipartite.
Remark 1 In [12], the communication topology is required to be connected at all times. In contrast, this paper only requires agents in the time-varying networks G(k) to exchange information with their neighbors at least once during each period of length T, which saves energy and reduces the communication burden of multi-agent network systems in practical applications.
We denote the state of agent i ∈ V1 at time k by xᵢ(k) ∈ ℝ^{m1} and the state of agent i ∈ V2 at time k by yᵢ(k) ∈ ℝ^{m2}. Let ξ_{ℓi}(k), ℓ = 1, 2, k ∈ ℕ, be independent identically distributed Bernoulli random variables with P(ξ_{ℓi}(k) = 1) = ρ_{ℓi} and P(ξ_{ℓi}(k) = 0) = 1 − ρ_{ℓi}. The update mechanism of the proposed distributed projection subgradient algorithm with random sleep scheme is as follows. The initial values satisfy xᵢ(0) ∈ X and yᵢ(0) ∈ Y. During the iteration, each agent i ∈ V1 either computes the subgradient and performs the projection operation when ξ_{1i}(k) = 1, or keeps its previous consensus estimate when ξ_{1i}(k) = 0. Each agent i ∈ V2 follows a similar update mechanism governed by ξ_{2i}(k). The distributed projection subgradient algorithm with random sleep scheme is given as follows:
where aᵢⱼ(k) and bᵢⱼ(k) are the communication weights of link (j, i) in the two subnetworks, respectively, q_{1i}(k) is a subgradient of fᵢ for agent i ∈ V1, and q_{2i}(k) is a subgradient of gᵢ for agent i ∈ V2. At time k, the stepsizes of the two subnetworks are denoted by αᵢ(k), i ∈ V1, and βᵢ(k), i ∈ V2, respectively, and Nᵢ^ℓ(k) is the neighbor set of agent i in subnetwork G_ℓ(k) at time k.
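The Bernoulli wake/sleep decision ξ_{ℓi}(k) can be sketched as follows (an illustration of the mechanism only; `rho` stands for the design parameter ρ_{ℓi}):

```python
import random

def active(rho, rng=random):
    """One i.i.d. Bernoulli wake decision: True with probability rho
    (compute the subgradient and project), False with probability 1 - rho
    (keep the previous consensus estimate)."""
    return rng.random() < rho

random.seed(0)
rho = 0.6
freq = sum(active(rho) for _ in range(100_000)) / 100_000
assert abs(freq - rho) < 0.01   # empirical wake frequency matches rho
```

With ρ = 0.6, each agent skips the subgradient and projection work on roughly 40% of iterations, which is the source of the computational savings reported later.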
In the proposed algorithm (2) and (3), each agent maintains an estimate of the state of the opposite subnetwork. Since the communication topology Gcon(k) is time-varying, an agent may not receive information from the opposite subnetwork at every time instant. Our algorithm therefore distinguishes two cases during the iteration, namely whether or not the agent currently has neighbors in the opposite subnetwork, and the state estimates are defined accordingly as follows:
(i) If agent i ∈ V1 has at least one neighbor in subnetwork G2(k) at time k, then its estimate is the weighted average of the states of those neighbors. If agent i ∈ V2 has at least one neighbor in subnetwork G1(k) at time k, its estimate is defined analogously.
(ii) If agent i ∈ V1 has no neighbors in subnetwork G2(k) at time k, then by Assumption 1 there must exist h_{1i} ∈ ℕ such that agent i keeps the estimate formed at time k − h_{1i}, where k − h_{1i} is the last time before k at which agent i had at least one neighbor in subnetwork G2(k). Similarly, if agent i ∈ V2 has no neighbors in subnetwork G1(k) at time k, then by Assumption 1 there must exist h_{2i} ∈ ℕ such that agent i keeps the estimate formed at time k − h_{2i}, the last time before k at which it had at least one neighbor in subnetwork G1(k).
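The two cases above can be sketched in code. The helper below uses hypothetical names (the displayed formulas follow [13]); it returns the weighted average of the current opposite-subnetwork neighbors when they exist, and otherwise reuses the most recent stored estimate:

```python
def estimate_opponent(i, cross_neighbors, weights, opp_state, last_estimate):
    """State estimate of the opposite subnetwork for agent i (hypothetical
    helper illustrating cases (i) and (ii)).
    Case (i): weighted average of current opposite-subnetwork neighbors.
    Case (ii): no such neighbors at this time, so reuse the estimate from
    the last time agent i had one (guaranteed to be at most T steps old
    by Assumption 1)."""
    if cross_neighbors:
        return sum(weights[(i, j)] * opp_state[j] for j in cross_neighbors)
    return last_estimate[i]

# case (i): 0.5 * 2.0 + 0.5 * 4.0 = 3.0
est = estimate_opponent(0, [1, 2], {(0, 1): 0.5, (0, 2): 0.5},
                        {1: 2.0, 2: 4.0}, {0: 9.9})
assert est == 3.0
```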
Remark 2 From the definition of the state estimates above, it is not difficult to see that our algorithm requires each agent to have at least one neighbor in the opposite subnetwork during each period of length T, so that it can estimate the state of the opposite subnetwork. The weighted average of the states of the neighbors in the opposite subnetwork is used as the estimate. The state estimates in this paper are similar to those in [13].
In the proposed algorithm, (2) and (3) carry out the state iterations of the two subnetworks, respectively. At each time k, each agent decides which update to perform according to the random variable ξ_{ℓi}(k): when ξ_{ℓi}(k) = 0, the agent only uses its neighbors' information to perform the consensus operation, which avoids computing the subgradient and executing the projection, thereby effectively reducing the amount of computation. In particular, when ρ_{ℓi} = 1, our algorithm reduces to the standard distributed projection subgradient algorithm in [13].
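A schematic single iteration of update (2) is sketched below, assuming the consensus-plus-projected-subgradient form of [13] gated by the Bernoulli variable; subnetwork G2 mirrors it with an ascent step. Function and parameter names are illustrative, not the paper's:

```python
import numpy as np

def step_subnet1(x, y_hat, A, alpha, subgrad, xi, lo, hi):
    """One schematic update of the minimizing subnetwork G1:
    x        : current states of the n1 agents
    y_hat    : each agent's estimate of the opposite subnetwork's state
    A        : doubly stochastic consensus weight matrix at this time
    alpha    : per-agent stepsizes
    subgrad  : callable returning a subgradient of the local cost
    xi       : Bernoulli wake flags (1 = awake, 0 = asleep)
    [lo, hi] : common box constraint set X."""
    x_new = np.empty(len(x))
    for i in range(len(x)):
        v = A[i] @ x                   # consensus with in-neighbors
        if xi[i]:                      # awake: subgradient step + projection
            x_new[i] = np.clip(v - alpha[i] * subgrad(v, y_hat[i]), lo, hi)
        else:                          # asleep: keep the consensus estimate
            x_new[i] = v
    return x_new
```

The maximizing subnetwork G2 would use `v + beta[i] * subgrad(...)` with projection onto Y, i.e., a projected subgradient ascent step.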
Let F_k be the σ-algebra generated by the entire history of our algorithm up to time k, that is, for k ≥ 1, F_k = {(xᵢ(0), ξ_{1i}(l), i ∈ V1); (yᵢ(0), ξ_{2i}(l), i ∈ V2), 1 ≤ l ≤ k − 1}, and F_0 = {(xᵢ(0), i ∈ V1); (yᵢ(0), i ∈ V2)}. Besides, J_{ℓk}, ℓ = 1, 2, denotes the set of agents in the respective subnetwork that perform the projected subgradient iteration at time k, that is, J_{ℓk} = {i : ξ_{ℓi}(k) = 1}. Some important assumptions for the consensus and convergence analysis are as follows:
Assumption 2 There exists a constant L > 0 such that the subgradients are bounded, that is, for all k ∈ ℕ, ‖q_{1i}(k)‖ ≤ L for all i ∈ V1 and ‖q_{2i}(k)‖ ≤ L for all i ∈ V2.
Remark 3 From Assumption 2 we obtain the following useful conclusions: ‖fᵢ(x₁, ·) − fᵢ(x₂, ·)‖ ≤ L‖x₁ − x₂‖ and ‖fᵢ(·, x₁) − fᵢ(·, x₂)‖ ≤ L‖x₁ − x₂‖, with analogous conclusions for the cost functions gᵢ: ‖gᵢ(y₁, ·) − gᵢ(y₂, ·)‖ ≤ L‖y₁ − y₂‖ and ‖gᵢ(·, y₁) − gᵢ(·, y₂)‖ ≤ L‖y₁ − y₂‖. These conclusions are useful in the convergence analysis of the algorithm.
Assumption 3 For all k ∈ ℕ, the adjacency matrices A(k) and B(k) are doubly stochastic. Besides, the weight matrices used to estimate the state of the opposite subnetwork are stochastic.
Assumption 4 The stepsizes of the agents in the two subnetworks are defined as αᵢ(k) = 1/Γ_{1i}(k) for i ∈ V1 and βᵢ(k) = 1/Γ_{2i}(k) for i ∈ V2, where Γ_{ℓi}(k) denotes the number of times agent i ∈ V_ℓ has been active before time k.
Remark 4 The nonhomogeneous randomized diminishing stepsizes αᵢ(k), βᵢ(k) are used in our algorithm, where each agent's stepsize depends on the number of times that agent i ∈ V_ℓ, ℓ = 1, 2, has been active before time k. Compared with [13,14], the stepsizes we consider are more flexible and general.
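The activation-count stepsize described in Remark 4 can be sketched as follows; the reciprocal-count form is an assumption consistent with Remark 4 and the scheme of [16]:

```python
class RandomizedStepsize:
    """Stepsize alpha_i(k) = 1 / Gamma_i(k), where Gamma_i(k) counts how
    many times agent i has been active so far. Each agent's stepsize thus
    diminishes at its own random rate, roughly 1 / (rho_i * k)."""
    def __init__(self, n):
        self.count = [0] * n

    def step(self, i, awake):
        if awake:
            self.count[i] += 1
        # an agent that has never been active does not actually use its
        # stepsize, so guard against division by zero
        return 1.0 / max(self.count[i], 1)

ss = RandomizedStepsize(2)
assert ss.step(0, True) == 1.0    # first activation
assert ss.step(0, True) == 0.5    # second activation
assert ss.step(1, False) == 1.0   # agent 1 inactive so far
```

Because an asleep agent's counter does not advance, two agents at the same time k generally hold different stepsizes, which is exactly the nonhomogeneity noted in Remark 4.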
For the stepsizes defined above, the following lemma will be used to analyze the convergence of our algorithm.
Lemma 1 [16] Let 0 < ρ_{ℓi} ≤ 1 for all agents. Then, with probability 1, the randomized stepsizes of Assumption 4 satisfy Σ_k αᵢ(k) = ∞ and Σ_k αᵢ(k)² < ∞ for all i ∈ V1, and Σ_k βᵢ(k) = ∞ and Σ_k βᵢ(k)² < ∞ for all i ∈ V2.

Lemma 2 [16] Let (Ω, F, P) be a probability space and F₀ ⊆ F₁ ⊆ F₂ ⊆ … be a sequence of sub-σ-algebras of F. Let {λ(k)}, {μ(k)}, {ν(k)}, {σ(k)} be sequences of F_k-measurable non-negative random variables such that, with probability 1, for all k ∈ ℕ, E[λ(k+1) | F_k] ≤ (1 + σ(k))λ(k) − μ(k) + ν(k), where Σ_k σ(k) < ∞ and Σ_k ν(k) < ∞ with probability 1. Then, with probability 1, λ(k) converges and Σ_k μ(k) < ∞.

3.2 Consensus analysis

In this section, we show that the multi-agent network G(k) achieves consensus with probability 1 under the proposed algorithm. First, we introduce an important lemma.

Lemma 3 [15] Let Assumptions 1, 2, 3 and 4 hold. Then (9) and (10) hold with probability 1, where x̄(k) and ȳ(k) are the average states of the two subnetworks at time k, respectively.

Proof The two-network zero-sum game can, to some extent, be divided into two distributed optimization problems in which each agent additionally estimates the state of the other subnetwork, so the proofs of (9) and (10) in Lemma 3 are similar to that of Lemma 5 in [15]. The proof of (9) is the same as that of Lemma 5 in [15], except that a gradient perturbation term is considered in [15]. On the other hand, the two subnetworks have similar state update laws with opposite update directions, so the proof of (10) is similar to that of (9). We therefore omit the details. □

The following lemma ensures the consensus of the proposed algorithm.

Lemma 4 Under Assumptions 1, 2, 3 and 4, if the agents update their states through algorithms (2) and (3) for all k ∈ ℕ, then consensus is achieved within each subnetwork with probability 1.

Taking the conditional expectation of (15) and applying Lemma 1, for all sufficiently large k we obtain the desired estimate with probability 1.

3.3 Convergence analysis

In this section, we prove that the multi-agent network G(k) converges to the Nash equilibrium with probability 1 under algorithms (2) and (3). First, a useful lemma needs to be introduced.
Proof The proof is similar to the corresponding proof in [13]; we only need to replace the stepsizes α_k, β_k in Lemma 5 with the randomized stepsizes αᵢ(k), βᵢ(k). Besides, because our algorithm has a random update strategy, we need to take the conditional expectation on both sides of the formulas in Lemma 5, but the idea of the proof is exactly the same as in [13], so we omit the details. □

The following theorem is our main result and shows the convergence of the proposed algorithm.

Theorem 1 Let Assumptions 1, 2, 3 and 4 hold and let U be strictly convex–concave. Consider the Nash equilibrium computation problem of the two-network zero-sum game under a time-varying weight-balanced communication topology G(k), k ∈ ℕ. If the agents update their states by algorithms (2) and (3), then all agents converge to the Nash equilibrium with probability 1.

4 Numerical examples

In this section, we provide two numerical examples, with smooth and nonsmooth cost functions respectively, to verify the validity of the distributed projection subgradient algorithm with random sleep scheme.

Example 1 We consider a multi-agent system composed of two subnetworks with 3 agents and 4 agents, respectively, that is, n1 = 3, n2 = 4 and m1 = 1, m2 = 1. The common constraint set is X = Y = [1, 5]. The smooth cost functions of the agents in subnetwork G1(k) are as follows:

and the cost functions of the agents in subnetwork G2(k) are as follows:

Besides, for the two subnetworks, we choose corresponding communication weights between each agent and its neighbors in the opposite subnetwork.

We let ρ = 0.6. Figures 3 and 4 show the convergence results of our algorithm with smooth cost functions after 4000 iterations; each agent performs the projection operation and computes the subgradient only about 2400 times. It can be seen that our algorithm not only drives all agents to the Nash equilibrium but also greatly reduces the amount of computation and shortens the running time.
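A self-contained toy instance of the scheme, with an assumed strictly convex–concave quadratic payoff rather than the paper's Example-1 costs, illustrates the overall behavior (consensus, opponent estimation, Bernoulli sleep, 1/count stepsizes, projection):

```python
import numpy as np

# Toy two-network zero-sum game (illustrative stand-in for Example 1):
# U(x, y) = (x - 2)^2 - (y - 3)^2 + 0.1*x*y is strictly convex-concave
# on X = Y = [1, 5], with saddle point x* ~ 1.8454, y* ~ 3.0923.
rng = np.random.default_rng(1)
n1, n2, rho = 3, 3, 0.6
x = rng.uniform(1, 5, n1)              # minimizing subnetwork G1
y = rng.uniform(1, 5, n2)              # maximizing subnetwork G2
cx, cy = np.zeros(n1), np.zeros(n2)    # activation counters for 1/count steps
W = np.full((3, 3), 1.0 / 3.0)         # doubly stochastic consensus weights

for k in range(4000):
    vx, vy = W @ x, W @ y              # consensus inside each subnetwork
    x_hat, y_hat = vx.mean(), vy.mean()  # crude opposite-subnetwork estimates
    for i in range(n1):
        if rng.random() < rho:         # awake: projected subgradient descent
            cx[i] += 1
            g = 2 * (vx[i] - 2) + 0.1 * y_hat
            vx[i] = np.clip(vx[i] - g / cx[i], 1, 5)
    for i in range(n2):
        if rng.random() < rho:         # awake: projected subgradient ascent
            cy[i] += 1
            g = -2 * (vy[i] - 3) + 0.1 * x_hat
            vy[i] = np.clip(vy[i] + g / cy[i], 1, 5)
    x, y = vx, vy

assert np.allclose(x, 1.8454, atol=0.05) and np.allclose(y, 3.0923, atol=0.05)
```

Despite each agent sleeping on roughly 40% of iterations, both subnetworks settle near the saddle point, matching the qualitative behavior reported for Example 1.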
Fig. 2 The time-varying communication topology of the multi-agent network

Fig. 3 The convergence result of subnetwork G1(k) with smooth cost functions

Next, we consider the effect of ρ_{ℓi} on the convergence rate and accuracy of our algorithm. We define the relative error as follows:

The relative errors under four different ρ values are shown in Fig. 5. It can be seen that the value of ρ has a large influence on the convergence performance of our algorithm: when ρ is close to 1, the algorithm shows better convergence performance. In particular, when ρ = 1, the algorithm reduces to the classical projection subgradient algorithm [13]; however, the saving in computation then vanishes. We can also see that, after adopting the random sleep strategy, the convergence performance is not as good as that of the classical algorithm. However, when ρ = 0.6, our algorithm achieves satisfactory accuracy while saving a large amount of computation. On the other hand, owing to the random sleep strategy and the switching communication topology, the relative error oscillates, which does not prevent our algorithm from converging to the Nash equilibrium with good precision.
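A common form of the relative error, assumed here since the displayed definition is a design choice, is the normalized distance of the stacked agent states to the equilibrium:

```python
import numpy as np

def relative_error(z, z_star):
    """An assumed form of the relative error: ||z(k) - z*|| / ||z*||,
    with all agents' states stacked into one vector z(k) and z* the
    corresponding stacked equilibrium."""
    z, z_star = np.asarray(z, dtype=float), np.asarray(z_star, dtype=float)
    return np.linalg.norm(z - z_star) / np.linalg.norm(z_star)

assert relative_error([2.0, 2.0, 2.0], [2.0, 2.0, 2.0]) == 0.0
assert abs(relative_error([4.0], [2.0]) - 1.0) < 1e-12
```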
Fig. 4 The convergence result of subnetwork G2(k) with smooth cost functions

Fig. 5 The relative errors with smooth cost functions

Finally, for a given accuracy, we compare the average number of subgradient computations under three different ρ values; the averages over 100 experiments are shown in Table 1. Under the random sleep scheme, the computation of an agent mainly comes from the subgradient calculation, so the average number of subgradient computations over the whole iterative process effectively reflects the computational load of the algorithm. According to Table 1, under the same accuracy requirement, the smaller the ρ value, the fewer subgradient computations are needed, which means a smaller amount of computation. However, combining this with the relative error curves in Fig. 5, we can see that when ρ is small, the convergence performance of the algorithm deteriorates. It is therefore necessary to choose a suitable ρ to balance convergence performance and computational cost.

Table 1 Average number of subgradient computations under various ρ values

According to the proposed algorithm (2) and (3), when ρ is large, each agent updates its state along the subgradient direction with a higher probability, which improves the convergence performance; when ρ is small, the agent performs more consensus operations, and the convergence performance is weakened to a certain extent. In practical applications, if higher convergence performance is preferred, a larger ρ (close to 1) is recommended; if a smaller amount of computation is preferred, a smaller ρ is recommended, but ρ should not be too small (no less than about 0.5), because Fig. 5 shows that too small a ρ leads to poor convergence performance.
Example 2 We consider a multi-agent system similar to that of the first example. The common constraint set is X = Y = [−2, 3]. The nonsmooth cost functions of the agents in subnetwork G1(k) are as follows:

and the nonsmooth cost functions of the agents in subnetwork G2(k) are as follows:

It can be verified that U(x, y) is strictly convex–concave, and the saddle point of U(x, y) is (1.0671, 0.7698). Besides, since f₂ is nonsmooth at x = 1 and g₁ is nonsmooth at y = 0, we fix the subgradient selections q₁₂(k) = 1 ∈ ∂ₓ f₂(1, y) = [−1, 1] and q₂₁(k) = −1 ∈ ∂ᵧ g₁(x, 0) = [−1, 1]. We select the initial values xᵢ(0) = [1, −2, 2] and yᵢ(0) = [1, −1, 0.5, −2]. As in the first example, we let ρ_{1i} = ρ_{2i} = ρ; the communication topology and adjacency matrices are given in the first example. We let ρ = 0.6; Figs. 6 and 7 show the convergence results of our algorithm with nonsmooth cost functions after 1500 iterations, and each agent performs the projection operation and computes the subgradient only about 900 times.

The influence of the ρ value on the convergence performance under nonsmooth cost functions is similar to that in the first example and is not shown in detail here.

In summary, for both smooth and nonsmooth cost functions, we have verified that all agents converge to the Nash equilibrium under our algorithm. The value of ρ affects both the convergence performance and the computational cost of the algorithm. By properly setting ρ, the algorithm can effectively reduce the agents' computation and running time while maintaining good convergence accuracy.
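Fixing one subgradient selection at a kink, as done above for f₂ and g₁, can be illustrated with f(x) = |x − 1|, an illustrative nonsmooth function (not the paper's f₂) that is likewise nonsmooth at x = 1:

```python
def subgrad_kink(x):
    """Subgradient selection for the illustrative nonsmooth convex function
    f(x) = |x - 1|: the subdifferential at the kink x = 1 is [-1, 1], and
    the algorithm simply fixes one valid element there (here the value 1,
    mirroring the fixed choice q12(k) = 1 in Example 2)."""
    if x > 1.0:
        return 1.0
    if x < 1.0:
        return -1.0
    return 1.0   # any value in [-1, 1] is a valid subgradient at x = 1

# the subgradient inequality holds even for the fixed selection at the kink
for x in [-2.0, 0.0, 1.0, 2.5]:
    assert abs(x - 1.0) >= (x - 1.0) * subgrad_kink(1.0) - 1e-12
```

Any fixed element of the subdifferential preserves the subgradient inequality, which is why the convergence analysis is unaffected by the specific choice.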
Fig. 6 The convergence result of subnetwork G1(k) with nonsmooth cost functions

Fig. 7 The convergence result of subnetwork G2(k) with nonsmooth cost functions

5 Conclusions

The Nash equilibrium computation problem of the two-network zero-sum game under time-varying communication graphs has been considered in this paper. We proposed a distributed projection subgradient algorithm with a random sleep scheme that reduces the computation of each agent while guaranteeing convergence. Besides, uncoordinated randomized diminishing stepsizes are used in our algorithm. Furthermore, we analyzed the convergence of the algorithm and verified its validity through numerical examples.

Our future work will extend the zero-sum game to N-coalition non-cooperative games and design a distributed algorithm with event-triggered communication to solve the Nash equilibrium computation problem of N-coalition non-cooperative games.
Control Theory and Technology, 2021, No. 3