SWM-MARL: Strategic World Models for Multi-Agent Reinforcement Learning
Ongoing · SingleFlow
2026–2030
This project is situated at the frontier of multi-agent reinforcement learning (MARL), with a particular focus on strategic world models that enable autonomous agents to reason, plan, and act effectively in complex, interactive environments. The foundational work spans model-free centralized training with decentralized execution (CTDE) methods, alongside ongoing investigations into the intractability inherent in multi-agent settings, including efforts to reduce the computational complexity of joint policy search. Building on this, the research explores how learned world models can serve not merely as environment simulators but as strategic reasoning substrates: capturing opponent intentions, modeling recursive belief hierarchies, and supporting causal reasoning about the consequences of joint actions. A key interest lies in leveraging large language models to ground high-level strategic inference within these world models, enabling agents to reason about what others believe, how others are learning, and how to act in ways that shape the strategic landscape itself. Alongside this, the project invests in embedding safety and behavioral constraints systematically within multi-agent planning, ensuring that strategic competence does not come at the cost of reliability. Empirical investigations are conducted primarily in simulated multi-agent environments. Through this work, the project aims to contribute toward autonomous systems that are strategically competent, computationally tractable, and safe in the presence of other learning agents.
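To make the CTDE idea mentioned above concrete, the following is a minimal, self-contained sketch in a toy two-agent cooperative matrix game: during training, a centralized critic observes the joint action and shared reward, while at execution time each agent acts from its own local preferences only. The payoff table, class names, and learning rates are illustrative assumptions, not part of the project's methods.

```python
# Hedged CTDE sketch: the payoff table and all names below are
# illustrative assumptions, not the project's actual algorithms.

PAYOFF = {  # shared reward for each joint action (cooperative game)
    (0, 0): 1.0, (0, 1): 0.0,
    (1, 0): 0.0, (1, 1): 0.8,
}

class DecentralizedActor:
    """At execution time, each agent uses only its own preferences."""
    def __init__(self, n_actions=2):
        self.pref = [0.0] * n_actions

    def act(self):
        # greedy decentralized execution: no access to other agents
        return max(range(len(self.pref)), key=lambda a: self.pref[a])

class CentralizedCritic:
    """During training, the critic observes the full joint action."""
    def __init__(self):
        self.q = {ja: 0.0 for ja in PAYOFF}

    def update(self, joint_action, reward, lr=0.5):
        self.q[joint_action] += lr * (reward - self.q[joint_action])

def train(actors, critic, sweeps=50, lr=0.1):
    for _ in range(sweeps):
        for ja in PAYOFF:  # sweep all joint actions deterministically
            critic.update(ja, PAYOFF[ja])
            # push each agent's local preference toward the centralized
            # value of the joint action it participated in
            for i, actor in enumerate(actors):
                actor.pref[ja[i]] += lr * (critic.q[ja] - actor.pref[ja[i]])

actors = [DecentralizedActor(), DecentralizedActor()]
critic = CentralizedCritic()
train(actors, critic)
joint = tuple(a.act() for a in actors)  # decentralized execution
```

In this toy setting, both agents settle on the higher-payoff coordinated action even though neither observes the other at execution time; the centralized critic is only needed during training, which is the structural point of CTDE.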
Team
- Jilles S. Dibangoye — Promotor
- Yftah Ziser — Co-Promotor
- Bharath Mahadeva Rao — PhD Candidate