Engineering Science and Technology Literature Index Service Platform | abstract.cmanuf.com

SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER

Authors:	Joo Yong Lee (School of Computing, Soongsil University, Seoul, Korea; jylee@comp.ssu.ac.kr), Sang Ho Lee, Yanggon Kim
Conference:	International Joint Conference on e-Business and Telecommunications (ICETE 2007) - Second International Conference on E-Business (ICE-B 2007)
Conference dates:	July 28-31, 2007
Conference location:	Barcelona, Spain
Year of publication:	2007
ISBN:	978-989-8111-11-1
Pages:	86-91
Total pages:	6
Holding number:	hyw01663(1)
Classification number:	F713.36-53/I61/(2007)
Keywords:	Web crawler; Parallel crawler; Scalability; Web database
Chinese translation (reference):
Language:	eng
Abstract:	As the size of the Web grows, it becomes increasingly important to parallelize the crawling process so that page downloads can be completed in a reasonable amount of time. This paper presents the design and implementation of an effective parallel web crawler. We first present various design choices and strategies for a parallel web crawler, and then describe our crawler's architecture and implementation techniques. In particular, we investigate the URL distributor used for URL balancing, as well as the scalability of our crawler.
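To make the idea of a URL distributor concrete, the following is a minimal sketch, not the authors' design. It assumes that a "seed" is identified by a URL's host name, that every URL belonging to one seed is routed to the same crawling agent, and that a newly seen seed is handed to whichever agent currently holds the fewest URLs (a simple greedy balancing rule). The class `SeedDistributor` and its methods are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from urllib.parse import urlsplit


class SeedDistributor:
    """Hypothetical URL distributor (illustrative assumption, not the paper's method).

    Each seed (assumed here to be a host name) is owned by exactly one agent;
    a new seed is given to the agent that currently holds the fewest URLs.
    """

    def __init__(self, num_agents: int) -> None:
        if num_agents < 1:
            raise ValueError("num_agents must be at least 1")
        self.owner: dict[str, int] = {}        # seed host -> owning agent id
        self.load = [0] * num_agents           # number of URLs assigned to each agent
        self.queues: list[list[str]] = [[] for _ in range(num_agents)]

    @staticmethod
    def seed_of(url: str) -> str:
        # Assumption: URLs sharing a host name belong to the same seed.
        return urlsplit(url).hostname or ""

    def assign(self, url: str) -> int:
        """Route a URL to an agent and return that agent's id."""
        seed = self.seed_of(url)
        agent = self.owner.get(seed)
        if agent is None:
            # Greedy balancing: the least-loaded agent (lowest id on ties) takes the new seed.
            agent = min(range(len(self.load)), key=lambda i: (self.load[i], i))
            self.owner[seed] = agent
        self.load[agent] += 1
        self.queues[agent].append(url)
        return agent


if __name__ == "__main__":
    d = SeedDistributor(num_agents=2)
    urls = [
        "http://a.example/1",
        "http://b.example/1",
        "http://a.example/2",
        "http://c.example/",
    ]
    print([d.assign(u) for u in urls])  # prints [0, 1, 0, 1]
    print(d.load)                       # prints [2, 2]
```

Keeping a seed on a single agent means that agent alone sees that site's links, which avoids cross-agent coordination for it. The trade-off in this sketch is that balancing happens only when a seed is first seen, so one very large seed can still leave its agent overloaded.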