show me the clients

Neal Cardwell (cardwell@cs.washington.edu)
Wed, 10 Jun 1998 17:58:33 -0700 (PDT)

I spent a few hours looking at web server logs, with the general question
in mind: "where are the web clients?". I looked at web server logs from 12
web sites. Here's what I've found so far. Let me know if you think of
other related questions that might be interesting...

Question: Of server logs from 12 web sites, how many sites had domain
X in their pool of top 30 clients, when client domains are ranked by
number of HTTP GET requests?

Answer: client domains, sorted by frequency:
--------------------------------------------
<domains appearing at only 1-2 sites chopped: long tail>
com.boeing: 3
edu.tenet: 3
com.lycos: 3
net.uu: 3
com.inktomi: 3
198.5: 3 (198.5.210.8 <=> infoseek.com)
net.prodigy: 3
au.edu: 5
au.com: 5
au.net: 6
com.wisewire: 6 (WiseWire collaborative filtering)
uk.co: 6
com.netcom: 7
uk.ac: 7
com.alexa: 7 (Alexa web recommendation service)
com.atext: 8 (Excite)
com.compuserve: 10
com.dec: 10 (Alta Vista)
com.aol: 12
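
The keys in this list look like the last two labels of each client host
name, written top-level domain first, with the first two octets standing
in when a host only shows up as an IP address (as with 198.5). A rough
Python sketch of that keying, assuming that reading is right; the
function name `domain_key` and the sample host names are made up for
illustration and are not the script actually used:

```python
def domain_key(host: str) -> str:
    """Collapse a client host into a coarse key such as 'com.aol' or '198.5'."""
    labels = host.strip().lower().rstrip(".").split(".")
    # Unresolved dotted-quad address: keep the first two octets, in order.
    if len(labels) == 4 and all(label.isdigit() for label in labels):
        return ".".join(labels[:2])
    # Host name: keep the last two labels, top-level domain first.
    return ".".join(reversed(labels[-2:]))


# Hypothetical host names, only to show the shape of the keys.
print(domain_key("proxy1.aol.com"))     # com.aol
print(domain_key("cs.unimelb.edu.au"))  # au.edu
print(domain_key("198.5.210.8"))        # 198.5
```

Note that two labels is a coarse cut: it lumps every British company
under uk.co, which is why the list has country-level entries rather than
individual au/uk organizations.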

For example, this means that, for all 12 sites, clients from aol.com
were in the top 30 of the list of the domains that made the most GET
requests to the site. And, for example, half of the sites (6 sites)
had clients from British companies (uk.co) in their top 30 clients.
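
The counting itself can be sketched as follows. This is a minimal Python
version under assumptions: the per-site GET counts are already keyed by
client domain, and the name `sites_per_domain` and the sample numbers
are invented for illustration (they are not taken from the logs).

```python
from collections import Counter


def sites_per_domain(per_site_counts, top_n=30):
    """Count, for each client domain, how many sites have it in their top_n.

    per_site_counts maps a site name to {client domain: number of GETs}.
    """
    tally = Counter()
    for counts in per_site_counts.values():
        # Rank this site's client domains by GET requests, keep the top_n.
        top = [domain for domain, _ in Counter(counts).most_common(top_n)]
        tally.update(top)
    return tally


# Made-up counts for two sites, with top_n=2 to keep the example small.
sample = {
    "site_a": {"com.aol": 50, "uk.ac": 20, "com.dec": 5},
    "site_b": {"com.aol": 10, "com.dec": 30, "au.net": 1},
}
result = sites_per_domain(sample, top_n=2)
print(result["com.aol"])  # 2: in the top 2 at both sites
```

A domain's score can never exceed the number of sites, so com.aol: 12
is the maximum possible here.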

Outside of requests from within the server's own institution, AOL
customers were usually one of the 2 or 3 biggest pools of clients.

All these clients fall into 5 nice categories:

Big ISPs
--------
edu.tenet: 3 (Texas Educational Network)
net.uu: 3
net.prodigy: 3
com.netcom: 7
com.compuserve: 10
com.aol: 12

Search engines
--------------
com.lycos: 3
com.inktomi: 3
com.infoseek: 3
com.wisewire: 6 (WiseWire collaborative filtering)
com.alexa: 7 (Alexa collaborative filtering)
com.atext: 8 (Excite)
com.dec: 10 (Alta Vista)

Australian clients
------------------
au.edu: 5
au.com: 5
au.net: 6

British clients
---------------
uk.co: 6
uk.ac: 7

American companies with employees who waste a lot of time on the web
--------------------------------------------------------------------
com.boeing: 3

The server sites from which the logs were taken:
------------------------------------------------
http://www.csrl.ars.usda.gov/webstat.html (all of 98)
http://resource.ca.jhu.edu/rawstats.html (nearly a year)
http://www.physiol.unimelb.edu.au/Webstat.html (nearly a year)
http://agdc.usgs.gov/srvinfo/http/host.html (nearly a year)
http://esdis.gsfc.nasa.gov/smo/metrics/smo_stats_may98.html
http://www.execpc.com/reports/birzer
http://www.dea.main.com/usage.html
http://darkstar.engr.wisc.edu/webstat.html
http://www.educ.drake.edu/WebStat.stat
http://www.baclass.panam.edu/webstat.html
http://gynoncology.obgyn.washington.edu/STATS/
http://wwwofe.er.doe.gov/Webstat.html (huge - 1996)

o All logs except for one were from 1997-98.
o There were roughly 40,000-60,000 distinct client hosts in all the
  traces combined.
o AOL and Prodigy customers come through web proxies.
  CompuServe, MSN, and Netcom customers apparently do not.

neal