Factor/To do/Spider
- make filters compile somehow
- random sleep between requests
- follow redirects
- https support
- cookie support
- connect timeout, page timeout, data timeout, and overall timeout; stop the spider when the overall timeout is reached (see the deadline sketch after this list)
- parse robots.txt and make filters for it (see the robots.txt sketch after this list)
- flag to disable robots.txt ;)
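
The timeout item above spans two levels: per-request timeouts (connect, page, data) can be handed to the HTTP client, while the overall timeout needs a deadline that the crawl loop checks so the spider stops itself. A rough Python sketch of the deadline part, assuming a simple single-threaded loop (the fetching details, names, and defaults are illustrative only, not the actual spider code):

```python
import time
import urllib.request

def crawl(start_urls, per_request_timeout=10.0, overall_timeout=300.0):
    """Crawl until the queue is empty or the overall deadline passes."""
    deadline = time.monotonic() + overall_timeout
    queue = list(start_urls)
    pages = {}
    while queue:
        if time.monotonic() > deadline:
            break  # overall timeout reached: stop the spider
        url = queue.pop()
        try:
            # this timeout covers connecting to and reading this one page
            with urllib.request.urlopen(url, timeout=per_request_timeout) as r:
                pages[url] = r.read()
        except OSError:
            pass  # a retry framework would plug in here
    return pages
```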
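
The robots.txt items amount to fetching and parsing robots.txt once per site and turning the result into another URL filter; the disable flag then just means skipping that filter. A minimal sketch of the idea in Python, standard library only (the "factor-spider" user agent string and function name are placeholders):

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def robots_filter(site_root, user_agent="factor-spider"):
    """Return a predicate telling whether a URL may be fetched."""
    rp = RobotFileParser(urljoin(site_root, "/robots.txt"))
    rp.read()  # fetch and parse robots.txt
    return lambda url: rp.can_fetch(user_agent, url)

# usage:
# allowed = robots_filter("http://example.com")
# allowed("http://example.com/secret/")  # False if disallowed by robots.txt
```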
Not immediately needed
- parallel version
- retry framework
- retry on connection failure
- option to turn off dns caching
- proxies
- option to check whether pages exist without downloading them
- custom user agent string
- custom http headers
- spidering the results of another spider run
- save to database
- save to directories/files
- follow relative links only
- support ftp spidering
- bytes per second download rate limit (see the sketch after this list)
- download quota
- quiet mode
- prefer ipv4/ipv6
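
A simple way to get the bytes-per-second limit is to read the response body in chunks and sleep whenever the measured rate runs ahead of the limit; the download quota item could reuse the same loop by aborting once the byte count passes the quota. A hedged Python sketch (the response object is assumed to have a read(n) method, e.g. what urllib.request.urlopen returns; names are illustrative):

```python
import time

def read_rate_limited(response, max_bytes_per_sec, chunk_size=8192):
    """Read a response body, throttled to roughly max_bytes_per_sec."""
    start = time.monotonic()
    received = 0
    chunks = []
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
        received += len(chunk)
        # sleep if we are ahead of the allowed average rate
        min_elapsed = received / max_bytes_per_sec
        elapsed = time.monotonic() - start
        if min_elapsed > elapsed:
            time.sleep(min_elapsed - elapsed)
    return b"".join(chunks)
```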