SCRAWLER: A SEED-BY-SEED PARALLEL WEB CRAWLER
  
  
Authors:
Joo Yong Lee
(School of Computing, Soongsil University, Seoul, Korea; jylee@comp.ssu.ac.kr)
Sang Ho Lee
Yanggon Kim
Conference:
International Joint Conference on e-Business and Telecommunications (ICETE 2007) - Second International Conference on E-Business (ICE-B 2007)
Conference dates:
July 28-31, 2007
Conference location:
Barcelona, Spain
Year of publication:
2007
ISBN:
978-989-8111-11-1
Pages:
86-91
Total pages:
6
Holdings number:
hyw01663(1)
Classification number:
F713.36-53/I61/(2007)
Keywords:
Web crawler; Parallel crawler; Scalability; Web database
Language:
English
Abstract:
As the Web grows, it becomes increasingly important to parallelize the crawling process so that page downloads complete in a reasonable amount of time. This paper presents the design and implementation of an effective parallel web crawler. We first present various design choices and strategies for a parallel web crawler, and then describe our crawler's architecture and implementation techniques. In particular, we investigate the URL distributor used for URL balancing, as well as the scalability of our crawler.
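The abstract mentions a URL distributor that balances URLs among parallel crawling processes. The record does not describe how the paper's distributor works, so the sketch below shows only one common, generic approach, given here as an assumption: hash each URL's host name to pick a crawler. Hashing the host keeps all pages of a site on one crawler, and a uniform hash spreads hosts roughly evenly. The names `assign_crawler`, `distribute`, and `num_crawlers` are hypothetical and do not come from the paper.

```python
import hashlib
from urllib.parse import urlsplit


def assign_crawler(url: str, num_crawlers: int) -> int:
    """Map a URL to a crawler index by hashing its host name.

    Hypothetical helper: hashing the host (not the full URL) sends every
    page of one site to the same crawler. A stable digest (MD5) is used
    because Python's built-in hash() of str is randomized per process.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


def distribute(urls, num_crawlers: int):
    """Partition URLs into one queue per crawler (hypothetical helper)."""
    queues = [[] for _ in range(num_crawlers)]
    for url in urls:
        queues[assign_crawler(url, num_crawlers)].append(url)
    return queues


if __name__ == "__main__":
    seeds = [
        "http://www.example.com/a",
        "http://www.example.com/b",
        "http://www.example.org/",
        "http://news.example.net/index.html",
    ]
    for i, queue in enumerate(distribute(seeds, num_crawlers=3)):
        print(i, queue)
```

Host hashing gives only statistical balance. A site with many pages still loads a single crawler, which is one reason a real distributor might track queue sizes or reassign work instead.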
Related literature:
CULTURALLY APPROPRIATE WEB INTERFACE DESIGN: WEB CRAWLER STUDY
ON-THE-FLY DETECTION OF CONTENT-POOR WEBPATHS
An Architectural Framework of a Crawler for Locating Deep Web Repositories using Learning Multi-agent Systems
Utilizing RSS feeds for crawling the Web
A Framework of Deep Web Crawler
SCTWC: An online semi-supervised clustering approach to topical web crawlers
Stanford WebBase Components and Applications
Novel approaches to crawling important pages early
Development of a scalable web crawler
Topical Web Crawling Using Weighted Anchor Text and Web Page Change Detection Techniques