"Online Learning of Time-Varying Unbalanced Networks in Non-Convex Environments: A Multi-Armed Bandit Approach"

This study discusses how agents in a time-varying distributed network can converge to the network's global minimizer. Each agent knows only the local loss of its own observation and must cooperate with the other agents to find the global minimizer of the network. Unlike most existing works in the literature, which consider a convex loss function, this study assumes a generalized locally Lipschitz loss function for each agent, which can be convex or non-convex. We propose a multi-armed bandit algorithm, CD EXP3, in which each agent does not know its loss function and only observes its losses. Through simulations using two different time-varying graph topologies, we show that the algorithm helps all agents converge to the minimizer of the network. In addition, we discuss the effects of the two topologies and of various simulation parameters on convergence. We obtain an upper bound on the expected regret and compare it with the sublinear regret bounds of well-known online distributed algorithms.
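To illustrate the bandit feedback setting mentioned above, here is a minimal sketch of the standard single-agent EXP3 algorithm in its loss form. It is not the paper's CD EXP3: there is no network, no communication between agents, and no time-varying topology. The function name `exp3`, the parameters `loss_fn`, `n_arms`, `n_rounds`, and `eta`, and the toy loss values are all assumptions made for illustration.

```python
import math
import random


def exp3(loss_fn, n_arms, n_rounds, eta, seed=0):
    """Standard single-agent EXP3 (loss version); illustrative only.

    loss_fn(t, arm) must return a loss in [0, 1]. The learner never sees
    the loss function itself, only the loss of the arm it pulled.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * n_arms   # importance-weighted cumulative loss estimates
    pulls = [0] * n_arms       # how often each arm was played
    probs = [1.0 / n_arms] * n_arms
    for t in range(n_rounds):
        # Exponential weights; subtracting the minimum avoids overflow.
        base = min(cum_est)
        weights = [math.exp(-eta * (c - base)) for c in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Sample one arm and observe only its loss (bandit feedback).
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        # Unbiased estimate: divide the observed loss by its sampling probability.
        cum_est[arm] += loss / probs[arm]
        pulls[arm] += 1
    return probs, pulls


if __name__ == "__main__":
    # Hypothetical toy problem: arm 2 has the smallest loss.
    toy_loss = lambda t, arm: 0.1 if arm == 2 else 0.9
    final_probs, pull_counts = exp3(toy_loss, n_arms=4, n_rounds=2000, eta=0.05)
    print(final_probs, pull_counts)
```

In a distributed variant, each agent would presumably combine such an update with information exchanged over the graph; how CD EXP3 does this is specified in the full paper, not in this sketch.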

Keywords: time; network; non-convex; armed bandit; time-varying; multi-armed

Journal Title: IEEE Access
Year Published: 2023

