Modulating Reinforcement-Learning Parameters Using Agent Emotions

Rickard von Haugwitz
Göteborg: Chalmers University of Technology, 2012. 42 pp. Report - IT University of Göteborg, Chalmers University of Technology and the University of Göteborg, ISSN 1651-4769, 2012.
[Degree project at advanced level]

When faced with the problem of learning a strategy for social interaction in a multi-agent environment, it is often difficult to define clear goals satisfactorily, and it might not be clear what would constitute a “good” course of action in most situations. In this case, by using a computational model of emotion to provide an intrinsic reward function, the task can be shifted to optimisation of emotional feedback, allowing more high-level goals to be defined. While of most interest in a general, not necessarily competitive, social setting on a continuing task, such a model can be better compared with more conventional reward functions on an episodic competitive task, where its benefit is not as readily visible. A reinforcement-learning system based on the actor-critic model of temporal-difference learning was implemented using a fuzzy inference system functioning as a normalised radial-basis-function network, capable of dynamically allocating computational units as needed and of adapting its features to the actual observed input. While adding some computational overhead, such a system requires less manual tuning by the programmer and is able to make better use of existing resources. Tests were carried out on a small-scale multi-agent system with an initially hostile environment, first with fixed learning parameters and then, separately, with modulated parameters that were allowed to deviate from their base values depending on the emotional state of the agent. The latter approach was shown to give marginally better performance once the hostile elements were removed from the environment, indicating that emotion-modulated learning may lead to a somewhat closer approximation of the optimal policy in a difficult environment by focusing learning on more useful input and increasing exploration when needed.
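
To make the ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not the thesis implementation. It combines a one-step temporal-difference actor-critic with normalised Gaussian radial-basis features and learning rates that deviate from their base values according to an emotion signal. Everything specific here is an assumption made for illustration: the scalar input, the fixed set of centres (the dynamic allocation of units and the fuzzy inference system are omitted), the hypothetical `emotion` value in [-1, 1], the linear modulation rule in `modulated`, and all names and default values.

```python
import math
import random


def normalised_rbf(x, centres, width):
    """Normalised Gaussian RBF activations for a scalar input (they sum to 1)."""
    raw = [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centres]
    total = sum(raw)
    return [r / total for r in raw]


def modulated(base, emotion, max_deviation=0.5):
    """Assumed rule: scale a base parameter by up to +/- max_deviation.

    `emotion` is a hypothetical scalar in [-1, 1]; 0 leaves the base value unchanged.
    """
    emotion = max(-1.0, min(1.0, emotion))
    return base * (1.0 + max_deviation * emotion)


class ActorCritic:
    """One-step TD actor-critic with linear critic and softmax actor."""

    def __init__(self, centres, width, n_actions, alpha=0.1, beta=0.1, gamma=0.9):
        self.centres = centres
        self.width = width
        self.n_actions = n_actions
        self.alpha = alpha  # base critic learning rate
        self.beta = beta    # base actor learning rate
        self.gamma = gamma  # discount factor
        self.v = [0.0] * len(centres)  # critic weights
        self.theta = [[0.0] * len(centres) for _ in range(n_actions)]  # action preferences

    def features(self, x):
        return normalised_rbf(x, self.centres, self.width)

    def value(self, x):
        return sum(w * f for w, f in zip(self.v, self.features(x)))

    def policy(self, x, temperature=1.0):
        """Softmax over action preferences; higher temperature means more exploration."""
        phi = self.features(x)
        prefs = [sum(w * f for w, f in zip(row, phi)) / temperature for row in self.theta]
        top = max(prefs)
        exps = [math.exp(p - top) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, x, rng, temperature=1.0):
        return rng.choices(range(self.n_actions), weights=self.policy(x, temperature))[0]

    def update(self, x, action, reward, x_next, done, emotion=0.0):
        """Apply one TD update and return the TD error."""
        phi = self.features(x)
        target = reward + (0.0 if done else self.gamma * self.value(x_next))
        delta = target - self.value(x)
        alpha = modulated(self.alpha, emotion)
        beta = modulated(self.beta, emotion)
        for i, f in enumerate(phi):
            self.v[i] += alpha * delta * f
            self.theta[action][i] += beta * delta * f
        return delta


agent = ActorCritic(centres=[0.0, 0.5, 1.0], width=0.25, n_actions=2)
rng = random.Random(0)
action = agent.act(0.5, rng)
td_error = agent.update(0.5, action, reward=1.0, x_next=0.5, done=True, emotion=0.4)
```

In this sketch a positive `emotion` raises both learning rates and a negative one lowers them. The exploration temperature passed to `policy` could be modulated in the same way, which is one possible reading of "increasing exploration when needed".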



The publication was registered 2013-02-19. It was last modified 2013-04-04.

CPL ID: 173825