Topic Modeling and Clustering for Analysis of Road Traffic Accidents

Agazi Mekonnen ; Shamsi Abdullayev
Göteborg : Chalmers tekniska högskola, 2017. Diploma work - Department of Applied Mechanics, Chalmers University of Technology, Göteborg, Sweden, ISSN 1652-8557; 2017:65, 2017.
[Degree project, advanced level]

In this thesis, we examined different approaches to clustering, summarising and searching accident descriptions in the Swedish Traffic Accident Data Acquisition (STRADA) dataset. One central question in the project was how to retrieve relevant documents when a query shares no words with them. Another was how to increase the similarity between documents that describe the same or similar scenarios in different words. To address these issues, we designed a new pre-processing technique based on keyword extraction and word embeddings. Theoretical and empirical results show that this pre-processing technique improved the results of the examined topic modeling, clustering and document ranking methods.

Keywords: Machine Learning, Latent Dirichlet Allocation, Clustering, Probabilistic, Topic Models, Text Mining, Traffic Safety, Accident database



The publication was registered 2017-07-05. It was last modified 2017-07-05.

CPL ID: 250497
