Online Learning of Time-Varying Unbalanced Networks in Non-Convex Environments: A Multi-Armed Bandit Approach

This study examines how agents in a time-varying distributed network can converge to the network's global minimizer. Each agent knows only the local loss of its own observation and must cooperate with the other agents to find the global minimizer. Unlike most existing work, which assumes a convex loss function, this study assumes a generalized local Lipschitz loss function for each agent, which may be convex or non-convex. We propose a multi-armed bandit algorithm, CD EXP3, in which each agent does not know its loss function and observes only its losses. Through simulations on two different time-varying graph topologies, we show that the algorithm helps all agents converge to the minimizer of the network. We also discuss how the two topologies and various simulation parameters affect convergence. Finally, we obtain an upper bound on the expected regret and compare it with the sublinear regret bounds of well-known online distributed algorithms.
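
The abstract names CD EXP3 but does not state its update rule, so the following is only a minimal sketch of the standard single-agent EXP3 algorithm that the name appears to build on. It is not the paper's distributed variant and includes no communication between agents. It illustrates the bandit feedback model described above: the learner never sees the loss function and observes only the loss of the arm it plays. The loss table `LOSSES`, the learning rate `eta`, and the horizon are hypothetical values chosen for illustration.

```python
import math
import random

# Hypothetical per-arm losses in [0, 1], non-convex along the arm index:
# local minimum at arm 1, global minimum at arm 3. Not taken from the paper.
LOSSES = [0.6, 0.2, 0.8, 0.0, 0.7]


def exp3(loss_fn, n_arms, horizon, eta, rng):
    """Standard single-agent EXP3 with loss-only (bandit) feedback.

    loss_fn(t, arm) must return a loss in [0, 1]. Only the loss of the
    arm actually played is ever queried.
    Returns the sampling distribution of the final round and the pull counts.
    """
    cum_est = [0.0] * n_arms            # importance-weighted loss estimates
    pulls = [0] * n_arms
    probs = [1.0 / n_arms] * n_arms
    for t in range(horizon):
        # Exponential weights; shift by the minimum for numerical stability.
        low = min(cum_est)
        weights = [math.exp(-eta * (c - low)) for c in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        # Dividing by the sampling probability keeps the estimate unbiased.
        cum_est[arm] += loss / probs[arm]
        pulls[arm] += 1
    return probs, pulls


if __name__ == "__main__":
    horizon = 2000
    k = len(LOSSES)
    # A common textbook choice of learning rate for EXP3 with losses.
    eta = math.sqrt(2.0 * math.log(k) / (horizon * k))
    probs, pulls = exp3(lambda t, a: LOSSES[a], k, horizon, eta, random.Random(0))
    print("final sampling distribution:", [round(p, 3) for p in probs])
    print("pull counts:", pulls)
```

Because the learner only ever divides an observed loss by the probability of the arm it played, no convexity assumption enters the update, which is why bandit methods of this kind suit non-convex losses.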

Keywords: time; network; non-convex; armed bandit; time-varying; multi-armed

Journal Title: IEEE Access
Year Published: 2023
