Swedish Dialect Classification using Artificial Neural Networks and Gaussian Mixture Models

Viktor Blomqvist ; David Lidberg
Göteborg: Chalmers tekniska högskola, 2017. 86 pp.
[Degree project, advanced level]

Variations due to speaker dialects are one of the main problems in automatic speech recognition. A possible solution is to have a separate classifier identify the speaker's dialect and then load an appropriate speech recognition system. This thesis investigates classification of seven Swedish dialects based on the SweDia2000 database. Classification was done using Gaussian mixture models, a widely used technique in speech processing. Inspired by recent progress in deep learning techniques for speech recognition, convolutional neural networks and multi-layered perceptrons were also implemented. Data was preprocessed using both mel-frequency coefficients and a novel feature extraction technique using path signatures. Results showed high variance in classification accuracy during cross-validation, even for simple models, suggesting that the amount of available data is a limitation for the classification problems formulated in this project. The Gaussian mixture models reached the highest accuracy, 61.3% on the test set, based on single-word classification. Performance improved greatly when multiple words were included, reaching around 80% classification accuracy with 12 words.

Keywords: Swedish, SweDia2000, dialect classification, Gaussian mixture models, convolutional networks, artificial neural networks, deep learning, path signatures

The publication was registered on 2017-09-12. It was last modified on 2017-09-12.

CPL ID: 251852
