
Distributed Web Crawler

Hans Bjerkander ; Erik Karlsson
Göteborg : Chalmers University of Technology (Chalmers tekniska högskola), 2014. 44 pp.
[Master's thesis, advanced level]

This thesis investigates possible improvements to distributed web crawlers. Web crawling is the cornerstone of search engines and a well-defined part of Internet technology. Because the Web is so large, a web crawler must be fast and efficient, since it should find the interesting sites before they change or disappear. The thesis focuses on crawler distribution with respect to modularity, fault tolerance and group membership services. It also covers the download order of crawlers, since this order greatly influences a crawler's efficiency. In addition to the theoretical basis of the thesis, a prototype has been constructed in Java. The prototype is efficient, modular, fault-tolerant and configurable. The results indicate that using a membership service is a good way to distribute a crawler. The thesis also demonstrates a way to improve the crawling order compared with a breadth-first ordering.
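The abstract measures its improved crawling order against a breadth-first ordering. As a point of reference, here is a minimal Java sketch of that breadth-first baseline only. The improved ordering is not described in this record and is not shown. The sketch makes some assumptions: an in-memory link map stands in for fetching and parsing pages, and the class and method names (`BreadthFirstFrontier`, `crawlOrder`) are hypothetical, not taken from the thesis prototype.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical breadth-first URL frontier: FIFO queue plus a seen-set
// so that each URL is scheduled at most once.
public class BreadthFirstFrontier {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    // Enqueue a URL only the first time it is discovered.
    public void add(String url) {
        if (seen.add(url)) {
            queue.addLast(url);
        }
    }

    // Next URL to download, or null when the frontier is empty.
    public String next() {
        return queue.pollFirst();
    }

    // Simulated crawl: the map (assumed) plays the role of downloading a page
    // and extracting its outgoing links. Returns the order pages are visited.
    public static List<String> crawlOrder(Map<String, List<String>> links, String seed) {
        BreadthFirstFrontier frontier = new BreadthFirstFrontier();
        frontier.add(seed);
        List<String> order = new ArrayList<>();
        String url;
        while ((url = frontier.next()) != null) {
            order.add(url);
            for (String out : links.getOrDefault(url, List.of())) {
                frontier.add(out);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "c", List.of("d", "e"));
        System.out.println(crawlOrder(links, "a")); // prints [a, b, c, d, e]
    }
}
```

Pages are visited strictly by link distance from the seed, regardless of how interesting they are. An improved ordering would replace the FIFO queue with some priority-based selection.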

The publication was registered on 2014-02-12.

CPL ID: 193680
