In English

Comparison of Arm Selection Policies for the Multi-Armed Bandit Problem

Fifi Johansson ; MIRIAM MCHOME
Göteborg : Chalmers tekniska högskola, 2018. 46 s.
[Examensarbete på avancerad nivå]

Web content optimization involves deciding what content to put on a web page, its layout and design. All of which involve selecting few options among many. With the advent of personalization, many companies seek to make this decision even on a per-user basis in order to improve customer experience and satisfaction. Contextual multi-armed bandit provides several strategies to address this online decision-making problem at a lower experimental cost than traditional A/B testing. In this study, we compare three common Contextual Bandit strategies that exist in literature namely E-greedy, LinUCB and Thompson Sampling, and apply two of them, E-greedy and LinUCB, to three datasets. In doing so we offer further empirical evidence on the performance of these strategies and insights for practitioners on what strategy might work for them. Our results suggest that both approaches, E-Greedy and LinUCB are effective in improving click-through rate compared to the random approach. The more sophisticated approach has better results with large datasets, and a quite unstable performance when the number of datapoints is small. On the other hand, we find that the more sophisticated approach is more sensitive to parameter tuning and can have significantly worse outcome when parameters are incorrect. Our study also finds that LinUCB can have higher data requirements when performing evaluation offline. Collectively the varying performance of these approaches across dataset signal the need for better tools and procedures to help practitioners decide on the appropriate approach.

Nyckelord: machine learning, multi-armed bandit, offline evaluation, contextual bandit

Publikationen registrerades 2018-11-23. Den ändrades senast 2018-11-23

CPL ID: 256336

Detta är en tjänst från Chalmers bibliotek