Twitter Topic Modeling

Karina Bunyik
Göteborg: Chalmers tekniska högskola (Chalmers University of Technology), 2014. 79 pp.
[Degree project, advanced level]

Following social media discussions related to real-life events has been a topic of great interest. There is no general method for deciding whether social media discussions reflect the dynamics of the events or lead a life of their own. Existing methods for analyzing social media discussions rely on extensive manual work by domain experts and do not generalize well to discussions in languages other than English or to different kinds of events. Combining the domain expert's knowledge with data-driven approaches can lead to models that are applicable to different domains and, at the same time, capable of handling large amounts of data from social media. In this research, we modeled the Twitter discussions about the Swedish party leader debate held in October 2013. We constructed a semiautomatic model based on Term Frequency-Inverse Document Frequency (tf-idf) to identify and measure the debate topics on Twitter. To discover other discussions, we used Latent Dirichlet Allocation (LDA), an unsupervised learning algorithm. We evaluated the models manually with the help of a domain expert. We compared the Twitter discussions to the topics the politicians talked about in the debate. The correlation between the Twitter discussions and the debate topics corresponds to the results of ongoing political science research. The political science domain expert Linn Sandberg, from the Department of Political Science at the University of Gothenburg, contributed to the research by defining the research question and evaluating the models.
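
As a rough illustration of the tf-idf weighting mentioned in the abstract, the sketch below scores terms in a handful of tokenized tweets. It is not the thesis's semiautomatic model: the function name `tf_idf`, the toy tweets, and the particular weighting variant (relative term frequency times the natural log of inverse document frequency) are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy "tweets", already lowercased and tokenized.
tweets = [
    ["skola", "skola", "debatt"],
    ["jobb", "debatt"],
    ["skola", "jobb", "debatt"],
]
scores = tf_idf(tweets)
```

A term that occurs in every tweet (here "debatt") gets weight zero, while a term concentrated in a few tweets (here "skola" in the first tweet) gets a higher weight. That property is what makes tf-idf a plausible basis for picking out topic-specific vocabulary.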

Keywords: topic modeling, Twitter, LDA, tf-idf

The publication was registered on 2014-09-18. It was last modified on 2014-09-18.

CPL ID: 202973
