| Abstract | The ability to infer the beliefs, desires, and preferences of those around us, known as Theory of Mind, is critical to effective human collaboration. In recent years, Theory of Mind has been extended to the field of artificial intelligence as Machine Theory of Mind. Agents with a Theory of Mind have a human-like ability to reason about the mental states of other agents, which enables more efficient cooperation among agents. In the field of Multi-Agent Reinforcement Learning, AI practitioners tackle specific challenges such as Overcooked, Hanabi, and the StarCraft Multi-Agent Challenge, which encourage more research in this area. However, existing challenges either do not require Theory of Mind or, if they do, do not require spatio-temporal reasoning. To bridge this gap, we introduce in this work a novel challenge based on the card game Yokai. Like Hanabi, Yokai is a cooperative game characterized by incomplete information. The complexity of Yokai provides a valuable opportunity to test the current capabilities of Multi-Agent Reinforcement Learning algorithms in tasks that require Theory of Mind. In this thesis, we set out to explore Yokai as a potential benchmark for Theory of Mind reasoning. To this end, we first conduct a theoretical comparison between Yokai and Hanabi. We then discuss our implementation of a high-performance Yokai environment built on JAX, a numerical computing library that combines a NumPy-like interface with automatic differentiation and GPU/TPU support. Finally, we assess and discuss the performance of the MAPPO and IPPO algorithms in the Yokai and Hanabi environments, aiming to understand their effectiveness in scenarios that demand Theory of Mind reasoning capabilities. Our experimental results reveal that (1) Yokai indeed poses a greater challenge than Hanabi, as indicated by the rewards achieved and model adaptability;
(2) MAPPO outperforms IPPO in complex multi-agent cooperative tasks when agents are trained for a sufficiently large number of steps; and (3) an increase in the number of participants makes multi-agent cooperative tasks more challenging. In conclusion, our work suggests that the increased complexity of Yokai makes it a valuable and more demanding testbed for Theory of Mind, and we hope that this environment can facilitate further research on Machine Theory of Mind in the field of Multi-Agent Reinforcement Learning.
|