Shielding in Resource-Constrained Goal POMDPs

Ajdarów, Michal; Brlej, Šimon; Novotný, Petr

Warning

This publication does not belong to the Faculty of Sports Studies but to the Faculty of Informatics. The official publication page is on the muni.cz website.

Authors	AJDARÓW Michal, BRLEJ Šimon, NOVOTNÝ Petr
Year of publication	2023
Type	Article in proceedings
Conference	Proceedings of the 37th AAAI Conference on Artificial Intelligence
MU Faculty / Unit	Faculty of Informatics
Citation
www	https://ojs.aaai.org/index.php/AAAI/article/view/26715
Doi	http://dx.doi.org/10.1609/aaai.v37i12.26715
Keywords	decision making; Markov decision processes; controller synthesis; resource constraints; shielding
Description	We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by the agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call resource-constrained goal optimization (RSGO). We take a two-step approach to the RSGO problem. First, using formal methods techniques, we design an algorithm computing a shield for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm solving the RSGO problem. We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.
Related projects:	Efficient analysis and optimization of probabilistic systems and games; Involvement of students of the Faculty of Informatics in the international scientific community 23
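
To illustrate the idea of a shield mentioned in the description, here is a minimal Python sketch. It is not the algorithm from the paper: it assumes a fully observable model with deterministic transitions and strictly positive resource consumption, whereas the paper treats partially observable, probabilistic models and combines the shield with POMCP. All names (`min_levels`, `shield_allows`, the toy `MODEL`) are hypothetical and chosen only for this illustration.

```python
import math

# Hypothetical toy model: state -> {action: (next_state, resource_consumed)}.
MODEL = {
    "base": {"go_a": ("a", 2)},
    "a": {"back": ("base", 2), "on": ("b", 3)},
    "b": {"back": ("a", 3), "dive": ("pit", 1)},
    "pit": {"stay": ("pit", 1)},  # no reload state is reachable from here
}
RELOADS = {"base"}  # states where the resource is refilled to full capacity
CAPACITY = 10


def min_levels(transitions, reloads, capacity):
    """Return (need, safe): need[s] is the least resource level sufficient to
    reach a safe reload state from s in at least one step; safe is the set of
    reload states from which survival is possible forever."""
    states = list(transitions)
    safe = set(reloads)
    while True:
        need = {s: math.inf for s in states}
        # Bellman-Ford style relaxation; len(states) rounds suffice because
        # consumption is assumed to be positive.
        for _ in range(len(states)):
            for s in states:
                for nxt, cost in transitions[s].values():
                    rest = 0 if nxt in safe else need[nxt]
                    need[s] = min(need[s], cost + rest)
        # A reload state is useless if, even with a full tank, no safe
        # reload state can be reached from it.
        unsafe = {r for r in safe if need[r] > capacity}
        if not unsafe:
            return need, safe
        safe -= unsafe


def shield_allows(transitions, need, safe, state, level, action):
    """True if taking `action` in `state` with resource `level` cannot lead
    to exhaustion, i.e. a safe reload state stays within reach afterwards."""
    nxt, cost = transitions[state][action]
    if cost > level:
        return False
    return nxt in safe or need[nxt] <= level - cost


NEED, SAFE = min_levels(MODEL, RELOADS, CAPACITY)
```

In this toy model, `shield_allows(MODEL, NEED, SAFE, "a", 8, "on")` permits the move because enough resource remains to return to `base`, while the same action with level 7 is blocked, as is `dive` from `b` at any level.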