"Cooperative Learning for Adversarial Multi-Armed Bandit on Open Multi-Agent Systems"

This letter considers a cooperative decision-making method for an adversarial bandit problem on open multi-agent systems. In an open multi-agent system, the network configuration changes dynamically as agents freely enter and leave the network. We propose a distributed Exp3 policy in which agents exchange their estimates of the expected reward of each arm with active neighboring agents. Each agent then updates its probability distribution over the arms by combining the estimated rewards of its neighbors. We derive a sufficient condition for a sublinear bound on the pseudo-regret. A numerical example shows that active agents can cooperatively find the optimal arm using the proposed distributed Exp3 policy.
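
The idea in the abstract can be illustrated with a minimal sketch. This is not the letter's algorithm: it is a standard Exp3 update combined with plain averaging of cumulative reward estimates over active neighbors. The function names, the parameters `eta` and `gamma`, the fixed neighbor graph, the averaging rule, and the choice to reset an agent's estimates when it re-enters are all assumptions made for illustration.

```python
import math
import random


def exp3_probabilities(cum_estimates, eta):
    """Exponential-weights distribution over arms from cumulative reward estimates."""
    m = max(cum_estimates)  # subtract the max for numerical stability
    weights = [math.exp(eta * (s - m)) for s in cum_estimates]
    total = sum(weights)
    return [w / total for w in weights]


def run_distributed_exp3(rewards, active, neighbors, eta=0.1, gamma=0.1, seed=0):
    """Illustrative distributed Exp3 on an open network (assumed setup).

    rewards[t][k] : reward in [0, 1] of arm k at round t
    active[t]     : set of agent ids present at round t
    neighbors[i]  : set of agent ids adjacent to agent i (hypothetical fixed graph)
    Returns {agent id: arm distribution} for agents active in the last round.
    """
    rng = random.Random(seed)
    num_arms = len(rewards[0])
    estimates = {}  # agent id -> cumulative importance-weighted reward estimates

    for t, reward_t in enumerate(rewards):
        present = sorted(active[t])
        # Assumption: agents that leave lose their state; entering agents start at zero.
        estimates = {i: estimates.get(i, [0.0] * num_arms) for i in present}

        updated = {}
        for i in present:
            p = exp3_probabilities(estimates[i], eta)
            # Mix with the uniform distribution so every arm keeps probability >= gamma / K.
            p = [(1 - gamma) * pk + gamma / num_arms for pk in p]
            arm = rng.choices(range(num_arms), weights=p)[0]
            local = list(estimates[i])
            # Importance-weighted estimate: only the pulled arm is observed.
            local[arm] += reward_t[arm] / p[arm]
            updated[i] = local

        # Cooperation step (assumed rule): average estimates with active neighbors.
        for i in present:
            group = [i] + [j for j in sorted(neighbors.get(i, ())) if j in active[t]]
            estimates[i] = [
                sum(updated[j][k] for j in group) / len(group) for k in range(num_arms)
            ]

    return {i: exp3_probabilities(est, eta) for i, est in estimates.items()}
```

Calling `run_distributed_exp3` with a reward table, a per-round set of active agents, and a neighbor map returns each remaining agent's distribution over arms. Averaging cumulative estimates is only one possible way to combine neighbors' information; the regret condition derived in the letter applies to its own update rule, not to this sketch.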

Keywords: open multi-agent systems; multi-armed bandit; multi-agent

Journal Title: IEEE Control Systems Letters
Year Published: 2023
