Xiuxian Li
Online optimization has received numerous attention in recent two decades, mostly inspired by its potential applications to auctions, smart grids, portfolio management, dictionary learning, neural networks, and so on [1-4]. Generally, online optimization is a sequence of decision making processes, where a sequence of time-varying loss functions are gradually revealed in a dynamic environment which may be adversarial. At each time instant, the loss function information at current time is revealed to the decision maker only after her/his decision is made. The objective of online optimization is to choose the best decision at each time step as far as possible, but unfortunately, this goal is generally difficult or impossible to achieve. As such, to measure the performance for an algorithm, two metrics are usually exploited[5], i.e., regret and competitive ratio, for which the former one is leveraged more frequently in the literature. Moreover,two kinds of regrets, i.e., static and dynamic regrets, are usually considered by researchers, where the static regret is to compare the performance with a cumulative loss with respect to the same best decision through all the time horizons, while the dynamic regret is with respect to the best decision at each time instant. More recently, another regret,called adaptive regret [6], has been proposed and investigated as a suitable metric for changing environments, as dynamic regret does.
As the development of advanced technologies, smart devices, and big data in modern society, centralized algorithms are emerging some limitations, such as low failure robustness, low-level data privacy, vulnerability to attack,high computational complexity, and so forth. In contrast,distributed/decentralized algorithms only depend upon local information from each agent’s neighbors in a multi-agent network, where each individual agent is in possession of its own private information, thus enjoying a multitude of advantages which centralized algorithms do not have. With this motivation, distributed algorithms have been extensively studied in recent years in a few domains, e.g., computer science, systems and control, and machine learning [15-29].
Simple scenarios with/without feasible set constraints are first addressed for distributed online optimization. For instance, the authors in Refs. [16, 17] addressed distributed online optimization without any constraints, where two distributed algorithms, i.e., an online subgradient descent algorithm and a distributed online subgradient push-sum algorithm, are developed, respectively. In addition, global and local set constraints were also taken into consideration along with the design of effective distributed algorithms,including Nesterov based primal-dual algorithm [18], a variant of the Arrow-Hurwicz saddle point algorithm [19], dual subgradient averaging algorithm [21], distributed primaldual algorithm [22], and mirror descent algorithm [20], to name a few.
Besides simple feasible set constraints, inequality constraints also become the focus of researchers. For example, local stationary inequality constraints were discussed in Ref. [23], where a consensus-based adaptive primaldual subgradient algorithm is proposed. Besides, a sort of general constraint, i.e., stationary coupled inequality constraint, was investigated in Refs. [24, 30], where the global inequality constraint is not accessible to any agent,instead each agent only knows some partial information on it. For this problem, a sublinear bound is established for static regret by proposing distributed primal-dual algorithms. Later, time-varying coupled inequality constraints were considered under full gradient information and bandit feedback in Refs. [31, 32], respectively.
The above discussion is presented from the perspective of constraints. From another viewpoint, in the following several recent hot issues in distributed online optimization are presented, including optimal regret bounds, the case with predicted information, switching costs, communication delays, asynchrony, quantization, random networks,security, the case with an aggregative variable, and so on.
Predicted information is generally instrumental to improving the regret bounds in some cases [36-38],although most of existing works do not assume any future information. This is in line with the intuition that more information one has, better performance it achieves. In this respect, recent research has put more and more attention to online optimization in the presence of predicted information. For instance, the work [36] has taken advantage of potential predictions, introduced a new regularity for dynamic regret using the accuracy of predicted information, and shown that accurate prediction can derive sharper dynamic regret bounds under mild assumptions. In the meantime, a finite prediction window has been considered for cost functions along with switching costs in Ref.[38], where two distributed gradient-based algorithms,i.e., receding horizon gradient descent (RHGD) and receding horizon accelerated gradient (RHAG) are developed,establishing a near-optimal lower regret bound for RHAG.It should be noted that switching costs are pivotal in some scenarios, where the cost at each time depends not only on the current decision, but also on the previous one, with the purpose of penalizing the change in the decision at each stage.
The complexity of computation and communication is paramount in online optimization, mostly because physical agents, such as computers, often have limited capability of computation and communication. As such, to alleviate the complexity, a number of substantial factors, such as communication time-delays, asynchronous communication and iteration, and quantization, have been increasingly investigated in recent years [39, 40]. As an example, delays and asynchronicities have been studied for multi-agent online learning problems in Ref. [39] by providing a general framework, for which the authors proposed a class of adaptive dual averaging schemes without the need of any between-agent coordination and both single-agent and multi-agent cases were addressed in detail. Wherein, optimal regret bounds were established at both the agent and network levels, and an“optimistic” algorithm was proposed, which is with slower variation and improved regret bounds by employing the predictability of problems.
Additionally, communication channels are often unreliable in many practical applications, such as packet dropout of information transmission in wireless sensor networks, which thus inspires the necessity of (random) switching communication graphs [16, 21, 41, 42]. For example, unreliable links result in switching communication topologies, which motivates the work [21], where a dual subgradient averaging distributed online algorithm was developed along with a sublinear regret analysis. In Ref. [42], Erd?s-Rényi rule was leveraged to delineate node-to-node communications,and the effect of node-to-node communications on distributed online convex optimization was addressed in detail,including full gradients, one-point and two-point bandits for convex and strongly convex loss functions.
Security is an additional pivotal factor in distributed networks, since it makes a significant effect on private agents,including privacy preserving and malicious attacks, etc. In the presence of adversarial attacks, the performance will generally be degraded and in some cases they even cause detrimental consequences, such as completely opposite results driven by the compromised agents. Along this line,privacy preserving of local cost functions was studied in Ref.[43] for distributed online optimization under time-varying unbalanced directed graphs, a differentially private-distributed stochastic subgradient-push algorithm was devised for masking the privacy of participating agents. Wherein, the authors also analyzed the compromise between the proposed algorithm and privacy levels, and it was shown that the designed algorithm can effectively deal with all uniformly bounded delays in the communication channels.
By careful observation, it can be easily found that all the aforesaid works only focus on the case where each agent shares a common or its own decision variable without dependency on other agents’ variables. However, many practical problems may encounter the case where the loss functions of an agent rest on the variables of other agents besides its own decision variable, such as formation control in unmanned aerial vehicles (UAVs) where each UAV needs to avoid collision with its nearby ones. In this respect,an aggregative variable, which is composed of all agents’variables, was studied in Ref. [44] more recently, where all agents’ local loss functions are dependent on the aggregative variable. On the other hand, when involving physical systems, the system dynamics, in combination with control inputs, will apparently become a part of constraints for online optimization, which makes online optimization more challenging. Along this direction, learning-based techniques(e.g., online learning and reinforcement learning [45]) have attracted an increasing interest to handle control problems,such as robotics [46], autonomous vehicles [46], data center cooling [47], etc. In this respect, the recent work [48] investigated online optimal control subject to affine constraints for linear systems with random disturbances, for which an algorithm, called online gradient descent with buffer zone,was proposed and proved to meet all the constraints in spite of any disturbance realizations. As discussed above, it is promising to put more attention on the aforementioned scenarios in the future.
In summary, this paper presents a brief survey on distributed/decentralized online optimization, for which the centralized online optimization, as an incipient focus of researchers, is first succinctly discussed, following why distributed online optimization is considered, as motivated by a multitude of realistic problems with large-scale networks.Subsequently, the current existing works under investigation are presented, including unconstrained online optimization,online optimization with feasible set constraints, the case with local inequality constraints, and the case with coupled inequality constraints. Then several hot issues for distributed online optimization are briefly concluded, such as optimal regret bounds, the case with predicted information, switching costs, communication delays, asynchrony, quantization,random networks, security, the case with an aggregative variable, etc., which are still interesting and active research directions in future.
AcknowledgementsThis work was supported by the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0100),the Shanghai Municipal Commission of Science and Technology (No.19511132101) and the National Natural Science Foundation of China(Nos. 62003243, 62088101).
Control Theory and Technology2021年1期