
Shorter build-measure-learn cycle in software development by using natural language to query big data sets

Markus Berget
Göteborg: Chalmers tekniska högskola (Chalmers University of Technology), 2014. 58 pp.
[Master's thesis, advanced level]

Background: Big data is used by many companies to gain insights and drive decisions. The data scientist is the role responsible for analyzing data and finding trends in it. In software product development, such insights can be valuable for improving the quality of a software product. Examples of relevant data include usage logs and social media data. However, the gap between stakeholders in software product development and data insights makes it difficult for those stakeholders to gain insights about the data quickly.

Objective: This thesis explores which factors make it difficult for stakeholders in software product development to gain the data insights needed to improve products. It also explores how these stakeholders can gain big-data insights without involving data scientists.

Method: The research method chosen was action research. The research consisted of five iterations with a collaborating company: rule-based parsing using a DSL, statistical parsing using machine learning, a web application prototype, a survey, and observations.

Results: The survey and semi-structured observations showed that there was a need to improve data insights for stakeholders in software product development. The main issues found were a lack of customizability and flexibility, the use of multiple data sources, and difficulties in exploring the data. A prototype was presented to address the identified issues. The prototype used natural language and machine learning to query data and supported querying multiple data sources. In the observations, the prototype proved to be a simple way to query the data and made it possible to query multiple data sources in one place.

Conclusion: The proposed prototype did not eliminate the need for data scientists. However, it worked as a structured communication channel through which data scientists could gauge stakeholders' interest in different data queries and add missing functionality using a data-driven approach.
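The abstract does not describe the thesis's DSL or machine-learning model, so the following is only a minimal sketch, assuming a toy setup, of the two parsing styles named in the Method section. A rule-based parser maps questions onto a fixed grammar (here regular expressions standing in for a DSL), and a statistical component routes each question to a data source. The routing uses a swapped-in technique, a multinomial naive Bayes classifier with Laplace smoothing. All names (`parse_rule_based`, `NaiveBayesRouter`, `to_query`), the query fields, the data-source labels `usage_logs` and `social_media`, and the training questions are hypothetical illustrations.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical rule set standing in for a DSL grammar: each pattern maps a
# question shape onto a structured query dictionary.
RULES = [
    (re.compile(r"how many (?P<entity>\w+) (?:in|during) the last (?P<days>\d+) days"),
     lambda m: {"metric": "count", "entity": m["entity"], "days": int(m["days"])}),
    (re.compile(r"average (?P<field>\w+) of (?P<entity>\w+)"),
     lambda m: {"metric": "avg", "field": m["field"], "entity": m["entity"]}),
]


def parse_rule_based(question):
    """Return a structured query, or None when no rule matches the whole question."""
    text = question.lower().strip(" ?")
    for pattern, build in RULES:
        match = pattern.fullmatch(text)
        if match:
            return build(match)
    return None  # rule-based parsing is rigid: unseen phrasings fail


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRouter:
    """Multinomial naive Bayes (Laplace smoothing) choosing a data source for a question."""

    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in examples:
            words = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            n_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero out the probability.
                score += math.log((self.word_counts[label][word] + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled questions; a real system would learn from logged queries.
EXAMPLES = [
    ("how many crashes happened yesterday", "usage_logs"),
    ("number of app crashes per version", "usage_logs"),
    ("which features are used most", "usage_logs"),
    ("what do people tweet about the app", "social_media"),
    ("sentiment of tweets about the release", "social_media"),
    ("how many mentions on twitter", "social_media"),
]

router = NaiveBayesRouter(EXAMPLES)


def to_query(question, router):
    """Combine both steps: parse the question, then attach the predicted data source."""
    query = parse_rule_based(question)
    if query is None:
        return None
    query["source"] = router.predict(question)
    return query
```

For example, `to_query("How many crashes in the last 7 days?", router)` parses to a count query over `crashes` for 7 days and routes it to `usage_logs`, while a phrasing such as "show me crash counts for last week" falls outside the grammar and returns `None`. That failure mode illustrates why a purely rule-based approach struggles with the flexibility the survey called for.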

The publication was registered on 2015-08-12.

CPL ID: 220541
