
Distributed Systems verification using fault injection approach

Zhenxiao Hao ; Khaled Alnawasreh
Göteborg: Chalmers University of Technology, 2016. 51 pp.
[Degree project at advanced level]

Software is becoming increasingly complex, and the number of components involved in an application can be extremely large. When a fault occurs, it can easily propagate, grow in impact, and take longer to detect and reproduce. A robust system that continues to operate normally in the presence of faults is therefore very important, but also very challenging to build. Several studies have addressed robustness by means of fault injection techniques [23], [31]. Fault injection is mainly used to detect unexpected faults as well as dependency bottlenecks. Fault injection approaches work by sending faulty messages to the components of a distributed system and observing how the system handles them.

This study presents a fault injection approach for testing the robustness of the embedded distributed system in the RBS (Radio Base Station) at Ericsson. The RBS is a distributed system consisting of components that communicate with each other via messages. One characteristic of this distributed system is that it can continue to operate and provide services even when some components fail. Because the components are stateful and use complex protocols, verifying that the system is robust is not a trivial task. The new approach is inspired by Netflix’s Chaos Monkey: when Netflix moved its data center to Amazon Web Services, it needed a fault injection technique to test the reliability of its distributed system. After an in-depth analysis of the Performance Management (PM) framework documentation at Ericsson, some potential bottlenecks were identified and strategies for triggering faults were implemented. A fault injection tool was developed in this study for testing the robustness of the distributed system. Moreover, unexpected faults were detected after generating two fault types: sending random messages and delaying messages. This study illustrates the potential of fault injection as a complement to traditional software testing.
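The two fault types named in the abstract, random messages and delayed messages, can be illustrated with a small sketch. This is not the tool developed in the thesis: the class `FaultInjector`, its parameters `random_rate` and `delay_rate`, and the byte-string message format are all hypothetical, chosen only to show the general idea of a proxy that sits between a sender and a receiving component.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Message = bytes
Deliver = Callable[[Message], None]


@dataclass
class FaultInjector:
    """Hypothetical proxy placed between a sender and a receiving component."""

    deliver: Deliver            # the real delivery function being wrapped
    random_rate: float = 0.0    # probability of replacing a message with random bytes
    delay_rate: float = 0.0     # probability of holding a message back
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    delayed: List[Message] = field(default_factory=list)
    log: List[Tuple[str, Message]] = field(default_factory=list)

    def send(self, msg: Message) -> None:
        """Forward msg, or inject one of the two fault types."""
        roll = self.rng.random()  # uniform in [0.0, 1.0)
        if roll < self.random_rate:
            # Fault type 1: deliver random bytes of the same length instead.
            garbage = bytes(self.rng.randrange(256) for _ in range(len(msg)))
            self.log.append(("random", garbage))
            self.deliver(garbage)
        elif roll < self.random_rate + self.delay_rate:
            # Fault type 2: hold the message back until flush() is called.
            self.log.append(("delay", msg))
            self.delayed.append(msg)
        else:
            self.log.append(("pass", msg))
            self.deliver(msg)

    def flush(self) -> None:
        """Deliver all held-back messages, i.e. later than originally sent."""
        pending, self.delayed = self.delayed, []
        for msg in pending:
            self.deliver(msg)
```

In such a setup, a test would route a component's traffic through `FaultInjector.send`, call `flush()` at a chosen later point, and then compare the receiver's observable behaviour against its expected behaviour; the `log` list records which fault was injected for each message so that a failure can be reproduced.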

The report is written in English.

Keywords: Distributed Systems, Fault Tolerance, Fault Injection Testing, Embedded Systems

The publication was registered 2016-06-20. It was last modified 2016-06-20.

CPL ID: 237946
