Evaluation of Document and Search Query Processing Frameworks

Tobias Svensson
Göteborg: Chalmers tekniska högskola, 2014. 48 pp.
[Degree project, advanced level]

As search becomes a vital cornerstone of any organization and as expectations and demands on findability and search steadily increase, there is a need for high-performance, scalable and simple text processing frameworks to implement document processing solutions. Today, there are many open source solutions available to this end. In this thesis, the processing frameworks GATE, UIMA, OpenPipeline, Hydra and Storm are analyzed and compared. We investigate the impact of parallelism and distribution on throughput and performance. Additionally, the possibilities and demands of performing Natural Language Processing tasks on real-time search queries are analyzed. The feasibility of using the processing frameworks for this task is investigated and the results are discussed. Finally, recommendations are made for which kind of system to implement for different use cases, and improvements to existing systems are suggested.

