
Data integration using machine learning: Automation of data mapping using machine learning techniques

Marcus Birgersson ; Gustav Hansson
Göteborg : Chalmers tekniska högskola, 2016. 65 pp.
[Degree project, advanced level]

Data integration involves mapping the flow of data between systems. This task is usually performed manually, and much time can be saved if parts of it can be automated.

In this report, three models based on statistics from previously mapped systems are presented. The purpose of these models is to aid an expert in the mapping process by supplying a first guess at how to map two systems. The models are limited to mappings between two XML formats, where the path to a node carrying data is usually descriptive of its data content. The developed models are the following:

1. A shortest distance model, based on the idea that two nodes that have each been associated with a third node, but not with each other, are most likely related.

2. A network flow model, which connects words with similar semantic meaning to be able to associate the words in two connected XML paths with each other.

3. A data value model, which connects data values to nodes based on similarities between the value and previously seen data.

The results of the models agree with expectations. The shortest distance model can only make suggestions based on XML structures that are present in the training set supplied for the project. The network flow model has the advantage that it only needs to recognize parts of a path to map two nodes to each other, and even completely unfamiliar systems can be mapped if there are similarities between the two systems. Overall, the data value model performs the worst, but it can make correct mappings in some cases where neither of the others can.
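The idea behind the shortest distance model can be illustrated with a small sketch. This is not the authors' implementation: it assumes that earlier mappings are stored as pairs of XML paths, that these pairs form an undirected graph, and that the closest previously seen node (in number of mapping hops) is the best suggestion. The function names and the example paths are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(mappings):
    """Build an undirected graph from earlier (path_a, path_b) mappings."""
    graph = defaultdict(set)
    for a, b in mappings:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def distances_from(graph, start):
    """Breadth-first search: mapping hops from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def suggest(graph, source, candidates):
    """Return the candidate closest to source, or None if none is reachable."""
    dist = distances_from(graph, source)
    reachable = [c for c in candidates if c in dist]
    return min(reachable, key=lambda c: dist[c]) if reachable else None


# Hypothetical training data: two paths were each mapped to a third one.
history = [
    ("/Order/Customer/Name", "/Invoice/Buyer/FullName"),
    ("/Purchase/Client/Name", "/Invoice/Buyer/FullName"),
    ("/Order/Total", "/Invoice/Amount"),
]
graph = build_graph(history)

print(suggest(graph, "/Order/Customer/Name",
              ["/Purchase/Client/Name", "/Invoice/Amount"]))
# prints /Purchase/Client/Name
```

A path that never occurred in the earlier mappings has no neighbours in the graph, so `suggest` returns `None` for it. This mirrors the limitation noted above: the model can only make suggestions for structures present in its training set.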

Keywords: artificial intelligence, machine learning, system integration, data mapping



The publication was registered 2016-02-18. It was last modified 2016-03-03.

CPL ID: 232167

This is a service from Chalmers Library