
Auto-scaling cloud infrastructure with Reinforcement Learning: A comparison between multiple RL algorithms to auto-scale resources in cloud infrastructure

Daniel Edsinger
Gothenburg: Chalmers University of Technology, 2018. 69 pp.
[Degree project, advanced level]

With the growing use of cloud services for both personal and professional purposes, competition to offer the best product intensifies as more companies provide this type of service. Providers want not only to reduce cost but also to improve stability, so that they can better handle sudden, unexpected problems that degrade the performance and responsiveness of their cloud service. The purpose of this project was therefore to propose and evaluate different solutions that can auto-scale the cloud infrastructure based on its resource usage. The report also covers algorithms that did not provide usable results or could not handle the complexity of the problem. We developed three different reinforcement learning algorithms in Python, using the TensorFlow framework to train neural networks, and compared their performance in terms of both cost and stability. These algorithms were implemented to work on virtual machines with Apcera installed and were trained with data collected through Apcera's API. The training was done in a simulation of the cloud cluster. The results of this project show a noticeable difference between the three algorithms. While all three work to some degree, one stands out and performs significantly better than the other two in terms of cost and cluster stability. In conclusion, we have an algorithm that can accurately predict how to scale the cloud cluster based on the time of day and the current resource usage.
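
To make the idea of scaling from the time of day and the current resource usage concrete, here is a minimal sketch of Q-learning (one of the listed keywords) on a toy simulated cluster. It is not the thesis implementation: the thesis trains neural networks with TensorFlow on data from Apcera, whereas this sketch swaps in a plain lookup table and an invented load curve. Every name and number (`load_at`, `CAPACITY_PER_INSTANCE`, the reward weights, the hyperparameters) is a hypothetical choice made for illustration.

```python
import math
import random

ACTIONS = (-1, 0, 1)          # remove one instance, hold, add one instance
MIN_INSTANCES, MAX_INSTANCES = 1, 5
CAPACITY_PER_INSTANCE = 25.0  # hypothetical load units one instance can serve


def load_at(hour):
    """Hypothetical daily load curve: lowest at midnight, peaking at noon."""
    return 50.0 + 40.0 * math.sin(math.pi * (hour - 6) / 12.0)


def step(hour, instances, action):
    """Apply a scaling action, advance one hour, return (next_state, reward)."""
    instances = min(MAX_INSTANCES, max(MIN_INSTANCES, instances + action))
    hour = (hour + 1) % 24
    overloaded = load_at(hour) > instances * CAPACITY_PER_INSTANCE
    # Cost term: pay per running instance. Stability term: penalize overload.
    reward = -1.0 * instances - (10.0 if overloaded else 0.0)
    return (hour, instances), reward


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy over simulated days."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) to an estimated value; missing means 0.0
    for _ in range(episodes):
        state = (0, rng.randint(MIN_INSTANCES, MAX_INSTANCES))
        for _ in range(24):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)          # explore
            else:
                action = greedy_action(q, state)      # exploit
            next_state, reward = step(state[0], state[1], action)
            best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            # Q-learning update: bootstrap from the best next action.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q


def greedy_action(q, state):
    """Return the action with the highest estimated value in this state."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

After `q = train()`, calling `greedy_action(q, (hour, instances))` gives the learned scaling decision for a given hour and cluster size. SARSA, the other algorithm named in the keywords, would differ only in the update line: it bootstraps from the action actually taken in the next state instead of the maximum over actions.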

Keywords: Computer science, Q-learning, SARSA, machine learning, reinforcement learning, cloud computing, EC2, AWS, Apcera.

The publication was registered 2019-01-31. It was last modified 2019-01-31.

CPL ID: 256475
