
Learning to Play Games from Multiple Imperfect Teachers

John Karlsson
Gothenburg: Chalmers University of Technology, 2014. 42 pp.
[Master's thesis, advanced level]

This project evaluates the modularity of a recent Bayesian Inverse Reinforcement Learning approach [1] by inferring, from observations of a set of agents, the sub-goals correlated with winning board games. A feature-based architecture is proposed together with a method for generating the space of reward functions; this makes inference tractable in large state spaces and allows the model to be combined with models that approximate state-action values. Further, a policy prior is suggested that allows least-squares policy evaluation using sample trajectories. The model is evaluated on randomly generated environments and on Tic-tac-toe, showing that combining the intentions inferred from all agents can yield strategies that outperform the corresponding strategies derived from each individual agent.
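The abstract mentions least-squares policy evaluation from sample trajectories over a feature-based representation. As a rough illustration only, the standard LSTD(0) estimator with a linear reward R(s) = w · φ(s) can be sketched as follows. The function name `lstd_value_weights`, the linear reward form, the discount factor and the small ridge term are assumptions of this sketch, not details taken from the thesis.

```python
import numpy as np


def lstd_value_weights(trajectories, reward_weights, gamma=0.9, reg=1e-6):
    """Estimate linear value-function weights theta with standard LSTD(0).

    Hypothetical sketch, not the thesis's method.
    trajectories: list of arrays of shape (T, k); row t is the feature vector phi(s_t).
    reward_weights: array of shape (k,); assumed linear reward R(s) = w . phi(s).
    The last state of each trajectory is treated as terminal (successor value 0).
    """
    w = np.asarray(reward_weights, dtype=float)
    k = len(w)
    A = reg * np.eye(k)  # small ridge term keeps A invertible
    b = np.zeros(k)
    for phis in trajectories:
        phis = np.asarray(phis, dtype=float)
        for t in range(len(phis)):
            phi = phis[t]
            # Successor features, or zeros after the terminal state
            phi_next = phis[t + 1] if t + 1 < len(phis) else np.zeros(k)
            r = phi @ w  # reward of the current state under the assumed linear model
            A += np.outer(phi, phi - gamma * phi_next)
            b += phi * r
    # V(s) is approximated by theta . phi(s)
    return np.linalg.solve(A, b)
```

For a two-state chain with one-hot features, where the first state leads to a terminal state with reward 1, `lstd_value_weights([np.array([[1.0, 0.0], [0.0, 1.0]])], np.array([0.0, 1.0]))` returns approximately `[0.9, 1.0]`, i.e. the discounted values of the two states.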



The publication was registered on 2014-09-19 and last modified on 2014-09-19.

CPL ID: 203067

This is a service provided by Chalmers Library.