Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design

arXiv.org, 2023-11

2023. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. EISSN: 2331-8422; DOI: 10.48550/arxiv.2210.03852

  • Title: Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design
  • Author: Brero, Gianluca ; Eden, Alon ; Chakrabarti, Darshan ; Gerstgrasser, Matthias ; Greenwald, Amy ; Li, Vincent ; Parkes, David C
  • Subjects: Computer Science - Computer Science and Game Theory ; Computer Science - Multiagent Systems ; Design ; Game theory ; Games ; Learning ; Optimization ; Strategy
  • Is Part Of: arXiv.org, 2023-11
  • Description: We introduce a reinforcement learning framework for economic design where the interaction between the environment designer and the participants is modeled as a Stackelberg game. In this game, the designer (leader) sets up the rules of the economic system, while the participants (followers) respond strategically. We integrate algorithms for determining followers' response strategies into the leader's learning environment, providing a formulation of the leader's learning problem as a POMDP that we call the Stackelberg POMDP. We prove that the optimal leader's strategy in the Stackelberg game is the optimal policy in our Stackelberg POMDP under a limited set of possible policies, establishing a connection between solving POMDPs and Stackelberg games. We solve our POMDP under a limited set of policy options via the centralized training with decentralized execution framework. For the specific case of followers that are modeled as no-regret learners, we solve an array of increasingly complex settings, including problems of indirect mechanism design where there is turn-taking and limited communication by agents. We demonstrate the effectiveness of our training framework through ablation studies. We also give convergence results for no-regret learners to a Bayesian version of a coarse-correlated equilibrium, extending known results to the case of correlated types.
  • Publisher: Ithaca: Cornell University Library, arXiv.org
  • Language: English
  • Identifier: EISSN: 2331-8422; DOI: 10.48550/arxiv.2210.03852
  • Source: arXiv.org
    AUTh Library subscriptions: ProQuest Central
    Free E Journals
    ROAD: Directory of Open Access Scholarly Resources
