Web Crawler
Simple illustration
- Given a set of URLs, download the web pages they point to
- Extract the URLs found in those web pages
- Add the new URLs to the list of URLs to be downloaded, then repeat all three steps
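The three steps above can be sketched as a small breadth-first loop. This is only a minimal illustration using the Python standard library, not a production crawler: the names `LinkExtractor`, `extract_urls`, `fetch`, `crawl`, and the `max_pages` limit are assumptions introduced here, and the page limit is added only so the loop terminates.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_urls(base_url, html):
    """Step 2: extract URLs from a downloaded page."""
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative links against the page URL and drop #fragments.
    return [urldefrag(urljoin(base_url, link)).url for link in parser.links]


def fetch(url):
    """Step 1: download one page (assumed default downloader)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_urls, fetch_page=fetch, max_pages=100):
    """Runs the download / extract / enqueue loop and returns {url: html}."""
    frontier = deque(dict.fromkeys(seed_urls))  # URLs still to be downloaded
    seen = set(frontier)                        # URLs already queued
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        pages[url] = html
        # Step 3: add newly discovered URLs to the list, then repeat.
        for link in extract_urls(url, html):
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

Calling `crawl(["https://example.com/"], max_pages=10)` would download up to ten pages starting from that seed. The sketch leaves out robots.txt handling and rate limiting, both of which a real crawler would need.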
Purpose
- Search engine indexing
- Web archiving
- Web mining
  - Collect data, for example financial firms gathering shareholder information
- Web monitoring
  - Monitor copyright and trademark infringement on the internet