Efficient watcher based web crawler design

Saed ALQARALEH (Eastern Mediterranean University, Famagusta, KKTC, via Mersin 10, Turkey)
Omar RAMADAN (Eastern Mediterranean University, Famagusta, KKTC, via Mersin 10, Turkey)
Muhammed SALAMAH (Eastern Mediterranean University, Famagusta, KKTC, via Mersin 10, Turkey)

Aslib Journal of Information Management

ISSN: 2050-3806

Article publication date: 16 November 2015

Abstract

Purpose

The purpose of this paper is to design a watcher-based crawler (WBC) that can crawl both static and dynamic web sites and download only the updated and newly added web pages.

Design/methodology/approach

In the proposed WBC, a watcher file, which can be uploaded to the web sites' servers, prepares a report that contains the addresses of the updated and newly added web pages. In addition, the WBC is split into five units, each responsible for performing a specific crawling process.
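The abstract does not specify how the watcher file decides which pages have changed. The following Python sketch shows one plausible approach, assuming the watcher can read each page's current content and keeps a fingerprint snapshot from its previous run. `build_report`, `fingerprint`, and the report layout (`new` and `updated` lists) are hypothetical names chosen for illustration, not the authors' implementation.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest used to detect content changes."""
    return hashlib.sha256(content).hexdigest()


def build_report(
    pages: Dict[str, bytes],
    previous: Dict[str, str],
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Hypothetical watcher step: compare current pages with the last snapshot.

    pages    -- assumed mapping of page URL -> current page content
    previous -- mapping of page URL -> fingerprint recorded at the last run
    Returns (report, snapshot): the report lists new and updated URLs,
    and the snapshot is kept for the next comparison.
    """
    report: Dict[str, List[str]] = {"new": [], "updated": []}
    snapshot: Dict[str, str] = {}
    for url in sorted(pages):
        digest = fingerprint(pages[url])
        snapshot[url] = digest
        if url not in previous:
            report["new"].append(url)          # never seen before
        elif previous[url] != digest:
            report["updated"].append(url)      # content changed since last run
    return report, snapshot


if __name__ == "__main__":
    first = {"/index.html": b"<h1>Home</h1>", "/news.php?id=1": b"Story 1"}
    _, snap = build_report(first, {})
    second = {
        "/index.html": b"<h1>Home v2</h1>",
        "/news.php?id=1": b"Story 1",
        "/news.php?id=2": b"Story 2",
    }
    report, snap = build_report(second, snap)
    print(report)  # {'new': ['/news.php?id=2'], 'updated': ['/index.html']}
```

Comparing content hashes rather than timestamps also catches changes to dynamic pages such as `news.php?id=2`, whose modification time the server may not expose. Deleted pages are simply ignored in this sketch.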

Findings

Several experiments have been conducted, and it has been observed that the proposed WBC increases the number of uniquely visited static and dynamic web sites compared with existing crawling techniques. In addition, the proposed watcher file not only allows crawlers to visit the updated and newly added web pages, but also solves the crawler overlapping and communication problems.
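The abstract states that the watcher file solves crawler overlapping and communication problems but does not say how. One common technique with the same goal, shown here as an assumption rather than the paper's mechanism, is to assign each web site to exactly one crawler by hashing its host name. Parallel crawlers then never fetch the same site's report or pages, and they need no coordination messages. `owner` and `partition` are hypothetical helpers.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit


def owner(url: str, num_crawlers: int) -> int:
    """Return the index of the crawler responsible for this URL's host.

    A SHA-256 digest is used instead of Python's built-in hash(), which is
    randomized per process and would give different crawlers different answers.
    """
    if num_crawlers <= 0:
        raise ValueError("num_crawlers must be positive")
    host = urlsplit(url).hostname or ""  # hostname is already lower-cased
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


def partition(urls: List[str], num_crawlers: int) -> Dict[int, List[str]]:
    """Group URLs by owning crawler; all URLs of one host share a bucket."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for url in urls:
        buckets[owner(url, num_crawlers)].append(url)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical URLs taken from the watcher reports of two sites.
    reported = [
        "http://site-a.example/index.html",
        "http://site-a.example/news.php?id=2",
        "http://site-b.example/products.php?page=3",
    ]
    for crawler, urls in sorted(partition(reported, 4).items()):
        print(crawler, urls)
```

Because the mapping depends only on the host name and the number of crawlers, each crawler can compute its own share locally. The trade-off is that adding or removing a crawler reshuffles most assignments; consistent hashing would reduce that churn.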

Originality/value

The proposed WBC performs all crawling processes automatically, in the sense that it detects all updated and newly added pages without any explicit human intervention and without downloading entire web sites.

Acknowledgements

The authors would like to thank Assistant Professor Dr Yıltan Bitirim (Eastern Mediterranean University) for his valuable suggestions and comments that greatly improved the manuscript.

Citation

ALQARALEH, S., RAMADAN, O. and SALAMAH, M. (2015), "Efficient watcher based web crawler design", Aslib Journal of Information Management, Vol. 67 No. 6, pp. 663-686. https://doi.org/10.1108/AJIM-02-2015-0019

Publisher

Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited