Factor/To do/Spider
- option to check whether pages exist without downloading them
- retry framework
- retry when the connection fails
- random sleep
- redirects
- proxies
- timeouts: connect, page, data, and overall; stop all spiders once the overall timeout is reached
- bytes per second download rate limit
- download quota
- option to turn off DNS caching
- HTTPS
- cookies
- make filters compile somehow
- parallel version
- custom user agent string
- custom HTTP headers
- spidering the results of another spider
- save to database
- save to directories/files
- follow relative links only
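
The retry, random sleep, and overall timeout items could fit together roughly as sketched below. This is Python used purely for illustration, not Factor and not the spider vocabulary's actual API. Every name here (`fetch_with_retry`, `RetryExhausted`, `max_attempts`, `min_sleep`, `max_sleep`, `overall_timeout`) is hypothetical, and treating `ConnectionError` as the "connection failed" case is an assumption.

```python
import random
import time


class RetryExhausted(Exception):
    """Raised when every attempt failed or the overall deadline would be passed."""


def fetch_with_retry(fetch, url, max_attempts=3, min_sleep=0.5, max_sleep=2.0,
                     overall_timeout=30.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url), retrying only when it raises ConnectionError.

    Sleeps a random interval between attempts and gives up early if the
    next sleep would run past overall_timeout seconds from the start.
    """
    deadline = clock() + overall_timeout
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except ConnectionError as error:  # assumed "connection failed" case
            last_error = error
        if attempt == max_attempts:
            break
        # Random sleep so repeated requests do not arrive at a fixed rhythm.
        pause = random.uniform(min_sleep, max_sleep)
        if clock() + pause > deadline:
            break
        sleep(pause)
    raise RetryExhausted(f"giving up on {url}") from last_error
```

The `sleep` and `clock` parameters are injectable so the policy can be exercised without real waiting; a caller would normally pass only `fetch` and `url`.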
This revision created on Thu, 2 Oct 2008 01:16:31 by erg