Question: Of server logs from 12 web sites, how many sites had domain
X in their pool of top 30 clients, when client domains are ranked by
number of HTTP GET requests?
Answer: client domains, sorted by number of sites (out of 12):
--------------------------------------------
<domains found at only 1-2 sites chopped: long tail>
com.boeing: 3
edu.tenet: 3
com.lycos: 3
net.uu: 3
com.inktomi: 3
198.5: 3 (198.5.210.8 <=> infoseek.com)
net.prodigy: 3
au.edu: 5
au.com: 5
au.net: 6
com.wisewire: 6 (WiseWire collaborative filtering)
uk.co: 6
com.netcom: 7
uk.ac: 7
com.alexa: 7 (Alexa web recommendation service)
com.atext: 8 (Excite)
com.compuserve: 10
com.dec: 10 (Alta Vista)
com.aol: 12
For example, at all 12 sites, clients from aol.com were among the 30
domains that made the most GET requests to the site. Likewise, half of
the sites (6 sites) had clients from British companies (uk.co) in
their top 30. Leaving aside requests from within the server's own
institution, AOL customers were usually one of the 2 or 3 biggest
pools of clients.
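The tally above can be sketched roughly as follows. This is only an illustration of the counting, not the script actually used: it assumes each site's log has already been reduced to one client host per GET request, and the function names (domain_key, top_domains, sites_per_domain) and the sample host names are made up for the example.

```python
from collections import Counter

def domain_key(host):
    """Reduce a client host to a two-label key, most significant first,
    e.g. 'proxy1.aol.com' -> 'com.aol', '198.5.210.8' -> '198.5'."""
    labels = host.lower().strip(".").split(".")
    if all(label.isdigit() for label in labels):
        # Unresolved IP address: keep the first two octets, in order.
        return ".".join(labels[:2])
    # Host name: keep the last two labels, reversed.
    return ".".join(reversed(labels[-2:]))

def top_domains(get_hosts, n=30):
    """Set of the n domain keys with the most GET requests at one site.
    get_hosts holds one client host per logged GET request."""
    counts = Counter(domain_key(h) for h in get_hosts)
    return {key for key, _ in counts.most_common(n)}

def sites_per_domain(logs_by_site, n=30):
    """For each domain key, count the sites whose top n include it."""
    tally = Counter()
    for get_hosts in logs_by_site.values():
        tally.update(top_domains(get_hosts, n))
    return tally

# Hypothetical two-site input, just to show the shape of the data.
logs = {
    "site_a": ["proxy1.aol.com", "proxy2.aol.com", "host.example.edu.au"],
    "site_b": ["cache.aol.com", "198.5.210.8"],
}
tally = sites_per_domain(logs)
for key, sites in sorted(tally.items(), key=lambda kv: kv[1]):
    print(f"{key}: {sites}")
```

Using a set per site means a domain counts at most once per site, however many requests it made there, which is why no count in the list can exceed 12.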
All these clients fall into 5 nice categories:
Big ISPs
--------
edu.tenet: 3 (Texas Educational Network)
net.uu: 3
net.prodigy: 3
com.netcom: 7
com.compuserve: 10
com.aol: 12
Search engines
--------------
com.lycos: 3
com.inktomi: 3
com.infoseek: 3
com.wisewire: 6 (WiseWire collaborative filtering)
com.alexa: 7 (Alexa collaborative filtering)
com.atext: 8 (Excite)
com.dec: 10 (Alta Vista)
Australian clients
------------------
au.edu: 5
au.com: 5
au.net: 6
British clients
---------------
uk.co: 6
uk.ac: 7
American companies with employees who waste a lot of time on the web
--------------------------------------------------------------------
com.boeing: 3
The server sites from which the logs were taken:
------------------------------------------------
http://www.csrl.ars.usda.gov/webstat.html (all of 1998)
http://resource.ca.jhu.edu/rawstats.html (nearly a year)
http://www.physiol.unimelb.edu.au/Webstat.html (nearly a year)
http://agdc.usgs.gov/srvinfo/http/host.html (nearly a year)
http://esdis.gsfc.nasa.gov/smo/metrics/smo_stats_may98.html
http://www.execpc.com/reports/birzer
http://www.dea.main.com/usage.html
http://darkstar.engr.wisc.edu/webstat.html
http://www.educ.drake.edu/WebStat.stat
http://www.baclass.panam.edu/webstat.html
http://gynoncology.obgyn.washington.edu/STATS/
http://wwwofe.er.doe.gov/Webstat.html (huge - 1996)
o All logs except one were from 1997-98.
o There were roughly 40,000-60,000 distinct client hosts in all the
traces combined.
o AOL and Prodigy customers come through web proxies.
Compuserve, MSN, and Netcom customers apparently do not.
neal