check out:
http://www.cs.washington.edu/research/networking/detour/local/harvest1.txt
It lists 283 GET-able traceroute slaves I found a few weeks ago, courtesy of
AltaVista and a few Perl scripts...
I still haven't checked robots.txt on those hosts, though.
neal
PS: a line with more than one URL means there is more than one script on
that particular host.
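
A rough sketch of how the robots.txt check could be done, in Python rather than the Perl scripts mentioned above. It assumes each line of harvest1.txt is a whitespace-separated list of URLs that all live on one host, which is only a guess at the file format. The names parse_line and allowed_urls are made up for illustration, and the robots.txt text is passed in as a string so the logic can be tried without touching the network.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def parse_line(line):
    """Split one line into (host, [urls]).

    Assumption: URLs on a line are whitespace-separated and share a host,
    so the host is taken from the first URL. Returns None for blank lines.
    """
    urls = line.split()
    if not urls:
        return None
    host = urlsplit(urls[0]).netloc
    return host, urls


def allowed_urls(urls, robots_txt, agent="*"):
    """Return the URLs that the given robots.txt text permits for agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]


if __name__ == "__main__":
    # Placeholder data, not taken from the real list.
    sample = "http://example.org/cgi-bin/trace http://example.org/cgi-bin/tr2"
    robots = "User-agent: *\nDisallow: /cgi-bin/tr2\n"
    host, urls = parse_line(sample)
    print(host, allowed_urls(urls, robots))
```

To run it against real hosts, one would fetch http://HOST/robots.txt for each host first and pass its text in; RobotFileParser can also do the fetch itself via set_url() and read().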