Andy, Neal and I drove up a couple days early to catch tutorials.
Vancouver is about 3 hours away by car (Expedia says 2:15, but I think
they assume you can drive from the Canadian border to Vancouver at
70mph). They have some slightly different traffic signals in Canada
(like blinking green) that I have no idea how to interpret, and I was
occasionally confused into speeding (I wasn't prepared for the change to
kph) but we got there safely.
The three of us attended two full day tutorials: "An Introduction to
Internet Measurement and Modeling" by Vern Paxson, and "Designing and
Building Gigabit and Terabit Internet Routers" by Craig Partridge. The
slides for both tutorials can be found in my office. Vern's was first
and was quite good, although if you've read his last few SIGCOMM papers
(as we had) then you didn't learn much that was new. A couple of nice
parts of his talk included a great graphical argument for plotting
internet measurements using a log scale, and some good definitions of
Pareto distributions, heavy tails and self-similarity. He has become a
big believer in self-similarity for modeling. Apparently, the big
result from the last year or so is a proof that if session arrivals are
Poisson and the individual sessions have heavy-tailed size distributions,
then the aggregate traffic is self-similar (giving one good explanation
for why internet traffic looks self-similar). I had a personal problem
with the term "long-range dependence" because, as best as I understand
it, it really means "long-range correlation" rather than dependence. He
had a number of good anecdotes about measurement pitfalls that seemed
useful... we also learned that illegal software is beating out porn as
the principal reason why the number of bytes per USENET article has
increased. Finally, I think perhaps one of his strongest pieces of
advice was to calibrate measurements by using alternative measurement
techniques. After the tutorial I talked to him about the routing
measurements and about the drop rate measurement tool. He sounded
interested in the former, but warned us to calibrate our traceroute
measurements. Some hosts (Solaris in particular) apparently rate-limit
their ICMP responses. This would perturb drop frequencies and possibly
RTTs. As for the packet loss tool, at first he didn't believe it could
be done, but after we talked through several of his scenarios he thought
it could be pretty cool.
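Since the Poisson-sessions-plus-heavy-tails result came up a few times,
here is a toy simulation sketch (mine, not Vern's; all of the parameters
are made up) that superimposes Poisson-ish session arrivals with Pareto
session lengths and then checks how the variance of the aggregate rate
decays as you average over bigger and bigger windows. For Poisson traffic
the variance falls off like 1/m; for self-similar traffic it falls off
much more slowly.

    # Toy sketch (not Vern's): Poisson session arrivals plus Pareto session
    # lengths give an aggregate whose variance decays slower than 1/m when
    # averaged over windows of size m -- the signature of self-similarity.
    import random

    random.seed(1)

    SLOTS = 200_000          # unit time slots to simulate
    ARRIVAL_RATE = 0.5       # expected new sessions per slot
    ALPHA = 1.5              # Pareto shape; 1 < alpha < 2 means heavy tails

    def pareto_length():
        # Pareto with minimum 1: P(L > x) = x**(-ALPHA)
        return int(random.paretovariate(ALPHA)) + 1

    # Each active session contributes one unit of load per slot it is alive.
    load = [0] * SLOTS
    for t in range(SLOTS):
        # Bernoulli approximation of Poisson arrivals within a unit slot
        if random.random() < ARRIVAL_RATE:
            for u in range(t, min(SLOTS, t + pareto_length())):
                load[u] += 1

    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    print(" m   variance of m-averaged rate")
    for m in (1, 4, 16, 64, 256):
        agg = [sum(load[i:i + m]) / m for i in range(0, SLOTS - m + 1, m)]
        print(f"{m:4d}  {variance(agg):10.3f}")
    # Poisson traffic would show the variance shrinking roughly as 1/m;
    # here it shrinks much more slowly, i.e. long-range-dependent behavior.
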
The second tutorial was, in my opinion, not as good. Craig is less
professorial than Vern and seems more dogmatic and off-the-cuff. Also,
most of the material was a preview or simple re-canning of other SIGCOMM
papers (for instance, the first 25 slides or so were outlines and
explanations of what the fields in the IP header are). The tutorial was
one part overview, one part history of router architecture (host, host
with smart interface, host with dual bus with smart interface, switch
with smart interfaces...), and one part discussion of route lookup
algorithms.
Among his claims that seemed useful:
Market:
    Low-end routers are commodity items with slim margins.
    High-end routers are a small market but necessary for advertising.
    All the money is in mid-range routers (although he didn't talk
    about these).
Technical issues:
    In old routers, a large problem was that administrators were
    responsible for configuring interfaces and balancing load to get the
    best performance... that didn't work.
    With switched backplanes, I/O is not an issue... route lookup is the
    bottleneck, with output port scheduling a secondary issue (see the
    toy lookup sketch after this list).
    There is a big debate over whether shared-memory switches can scale.
    No one does IP checksum verification.
    Everyone uses input-buffered switches because they're easier to
    build.
    Getting the management control hardware/software in place is the
    hardest part.
Hardware vs Software debate:
    The principal problem with software is increasing problems with the
    memory interfaces of mass-market microprocessors (multi-level
    caches), and instruction set mismatch.
    He likes FPGAs.
    Stanford's Nick McKeown has something called the Tiny Tera that he
    thinks is the best thing since sliced bread (it scales to a terabit).
    He thinks IPsec is going to make routers simpler and faster because
    they won't be able to classify on the packet header/body, and will
    need to use the IP TOS field.
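Since route lookup keeps coming up as the bottleneck, here is a minimal
longest-prefix-match sketch (mine, not one of the algorithms from the
tutorial) using a plain binary trie; the real gigabit designs use far
cleverer data structures, but this is the operation the whole
hardware/software argument is about. The prefixes and next hops are
made up.

    # Minimal longest-prefix-match sketch over a binary trie (illustrative
    # only, not one of the tutorial's algorithms); routes are made up.
    class TrieNode:
        def __init__(self):
            self.children = {}    # '0'/'1' -> TrieNode
            self.next_hop = None  # set if a prefix ends at this node

    def ip_to_bits(addr):
        return ''.join(f"{int(octet):08b}" for octet in addr.split('.'))

    def insert(root, prefix, next_hop):
        addr, length = prefix.split('/')
        node = root
        for bit in ip_to_bits(addr)[:int(length)]:
            node = node.children.setdefault(bit, TrieNode())
        node.next_hop = next_hop

    def lookup(root, addr):
        # Walk down the trie, remembering the most specific match seen.
        node, best = root, None
        for bit in ip_to_bits(addr):
            if node.next_hop is not None:
                best = node.next_hop
            node = node.children.get(bit)
            if node is None:
                return best
        return node.next_hop if node.next_hop is not None else best

    root = TrieNode()
    insert(root, "0.0.0.0/0", "default")
    insert(root, "128.2.0.0/16", "if0")
    insert(root, "128.2.4.0/24", "if1")
    print(lookup(root, "128.2.4.17"))   # -> if1 (the longest match wins)
    print(lookup(root, "10.1.1.1"))     # -> default
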
There were a number of places where we (I?) got confused, such as where
RED gets run on these boxes (there are buffers in lots of different
places) and exactly what the technical problems with output buffering
were. I did follow up with him concerning buffer sizes. The large
input buffers are sized to the largest MTU of an input interface
(leading to lots of wastage), but packets are fragmented WITHIN the
switch into small fixed-size units. ATM is a special case... cells are
passed directly into the internal switch (not re-packetized?).
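To make the buffer-sizing point concrete, here is a tiny sketch (my own
numbers, not Craig's) of the wastage from MTU-sized input buffers and of
chopping packets into fixed-size internal cells:

    # Toy illustration of the buffer-sizing point above; the MTU and cell
    # sizes are made-up but plausible numbers.
    MTU = 9180     # buffers are sized to the largest MTU on the interface
    CELL = 64      # fixed-size unit used inside the switch

    def buffer_waste(pkt_len):
        # an MTU-sized buffer holds one packet no matter how small it is
        return MTU - pkt_len

    def cells_needed(pkt_len):
        # inside the switch the packet is chopped into fixed-size cells;
        # only the last cell carries padding
        full, rest = divmod(pkt_len, CELL)
        return full + (1 if rest else 0)

    for pkt_len in (40, 552, 1500, 9180):
        print(f"{pkt_len:5d}B packet: wastes {buffer_waste(pkt_len):5d}B in an "
              f"MTU-sized buffer, needs {cells_needed(pkt_len):4d} {CELL}B cells")
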
The best part of these days, and indeed the entire conference, was
meeting other people.
The proceedings are in my office, so people should feel free to look
through them. Here are the papers that I thought were really good
(although I was only there for two of the three days and couldn't listen
closely to every single talk).
Lee Breslau gave a talk about a simple analytic model for comparing the
benefit of reservation-style schemes to best-effort schemes
parameterizing utility to the user, bandwidth, bandwidth cost, load,
etc... It was very simple... so simple that it was immediately
understandable and hence useful. It's amazing that no one has done this
before. Anyway, the interesting result from this simple model is that
relatively small amounts of overcapacity (less than a factor of 2) make
best-effort do as well as reservation for all load models except
heavy-tailed ones. In heavy-tailed ones, there is still a really strong
benefit for reservation. Breslau and Shenker come from the
pro-reservation camp, so any biases will be methodological and not
result-oriented.
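I can't reproduce their model from memory, so here is a deliberately
crude toy of the same flavor (everything here, the rigid all-or-nothing
utility, the Poisson load, and the parameters, is my own assumption, not
theirs): flows each need one unit of bandwidth to be happy; reservation
admits flows up to capacity and rejects the rest, best-effort splits the
link evenly among everyone; average per-flow utility is compared as
overcapacity grows.

    # Crude toy in the spirit of the Breslau/Shenker comparison -- NOT their
    # model. My assumptions: "rigid" flows that need a full unit of bandwidth
    # to get any utility, Poisson load, reservation admits up to capacity
    # (rejected flows get utility 0), best-effort shares capacity evenly.
    import random

    random.seed(2)

    def poisson(mean):
        # number of rate-1 arrivals in time `mean` (simple Poisson sampler)
        n, t = 0, random.expovariate(1.0)
        while t < mean:
            n += 1
            t += random.expovariate(1.0)
        return n

    def average_utility(capacity, mean_flows, trials=20000):
        be = rsv = flows = 0
        for _ in range(trials):
            n = poisson(mean_flows)
            if n == 0:
                continue
            flows += n
            be += n if n <= capacity else 0    # sharing: all or nothing
            rsv += min(n, int(capacity))       # each admitted flow gets 1
        return be / flows, rsv / flows

    MEAN_FLOWS = 10
    for over in (1.0, 1.2, 1.5, 2.0):
        b, r = average_utility(over * MEAN_FLOWS, MEAN_FLOWS)
        print(f"overcapacity x{over:.1f}: best-effort {b:.3f}  reservation {r:.3f}")
    # With this light-tailed load, modest overcapacity closes most of the gap;
    # the talk's claim is that the gap persists when the load is heavy-tailed.
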
Walter Willinger (Mr. self-similar) gave a talk on multi-fractal
analysis that, surprisingly to me, was very good and very accessible.
The basic issue is that internet traffic is well described as
self-similar on longer time scales, but on shorter time scales (i.e., a
round-trip time) it is not. So the modeling issue is how to talk in a
unified way about both the short-range-dependent components of traffic
(queues and TCP congestion behavior) and the long-range-correlated
component (the self-similar stuff). The trick is to use wavelet
decomposition (which isn't too surprising for those of you who took
graphics, since that's what wavelets are good for) to talk about the
different time scales independently. Past this the math goes beyond
me... although it's interesting enough that I plan on taking out my old
wavelet notes and trying to work through the paper. Willinger also came
up with a logical mechanism to explain why this kind of behavior would
occur (called cascades); however, the missing link is a physical
mechanism (i.e., a real property of protocols, networks, or users) that
matches this.
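For anyone else planning to dig out their wavelet notes, here is a small
sketch (mine, with a white-noise placeholder trace, not anything from
the paper) of the basic tool: a Haar decomposition of a traffic series
and the mean energy of the detail coefficients at each time scale. For
long-range-dependent traffic, log2(energy) grows roughly linearly with
the scale at the larger scales (the slope is tied to the Hurst
parameter), while the small scales show the short-time-scale structure
the multifractal work is after.

    # Haar wavelet "energy per scale" sketch (mine, not from the paper).
    # The trace below is a white-noise placeholder -- substitute real
    # packet or byte counts per time slot if you have them.
    import math
    import random

    random.seed(3)

    trace = [random.gauss(0.0, 1.0) for _ in range(2 ** 14)]

    def haar_energies(x):
        # mean squared Haar detail coefficient at each successive scale
        energies = []
        x = list(x)
        while len(x) >= 2:
            detail = [(x[i] - x[i + 1]) / math.sqrt(2)
                      for i in range(0, len(x) - 1, 2)]
            approx = [(x[i] + x[i + 1]) / math.sqrt(2)
                      for i in range(0, len(x) - 1, 2)]
            energies.append(sum(d * d for d in detail) / len(detail))
            x = approx
        return energies

    print("scale j   log2(mean detail energy)")
    for j, e in enumerate(haar_energies(trace), start=1):
        print(f"{j:7d}   {math.log2(e):8.2f}")
    # With this white-noise placeholder the energies stay roughly flat;
    # long-range-dependent traffic shows them rising roughly linearly in j
    # at the larger scales.
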
Ion Stoica from CMU presented their paper on Core-Stateless Fair
Queuing. This is a way to approximate fair queuing while doing the
per-flow computation only at the edges. The trick is that the edges
append a tag to
each packet that contains a probability and the core nodes just discard
based on this probability. Correctly calculating the probability is the
interesting problem. It was quite a nice result, although there seem to
be two big problems. The first is that it doesn't really address the
issues of multiple congestion points. You get assigned a drop
probability according to your fair share at an edge... however, in the
core you may compete with traffic from another edge, and clearly the two
flows will not receive fair shares at that internal congestion point.
Second, the model is based on the assumption that network "hot spots"
are in the core and not at the edges. This ignores exchange points,
which are high-bandwidth hot spots at the edges.
Still, a very nice paper.
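Here is a toy sketch of the edge/core split as I understood it from the
talk (the real paper's rate estimation and label handling are more
involved, and the flow names and numbers below are mine): the edge,
which keeps per-flow state, compares a flow's rate to its fair share and
tags packets with a drop probability; the core just flips a coin against
the tag.

    # Toy sketch of the Core-Stateless Fair Queuing idea as I understood the
    # talk -- the paper's actual rate estimation is more involved.
    import random

    random.seed(4)

    def edge_tag(flow_rate, fair_share):
        # edge router: per-flow state lets it compute a drop probability
        if flow_rate <= fair_share:
            return 0.0
        return 1.0 - fair_share / flow_rate

    def core_forwards(tag):
        # core router: no per-flow state, just drop against the packet's tag
        return random.random() >= tag

    # Made-up example: three flows behind an edge whose fair share is 10.
    FAIR_SHARE = 10.0
    flows = {"mouse": 5.0, "elephant": 40.0, "tcp-ish": 12.0}  # send rates

    for name, rate in flows.items():
        tag = edge_tag(rate, FAIR_SHARE)
        got = sum(core_forwards(tag) for _ in range(10000)) / 10000 * rate
        print(f"{name:9s} sends {rate:5.1f}, tag p={tag:.2f}, "
              f"gets through ~{got:5.1f}")
    # Each flow ends up with roughly min(rate, fair share) -- the fair-queuing
    # allocation -- without the core keeping any per-flow state.
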
John Byers presented a paper on "A digital fountain approach to reliable
distribution of bulk data". This is really the first "application"
paper to come out of the theoretical Tornado code paper by Mike Luby and
others. Anyway, the basic idea is that they've identified a set of
erasure codes that take up slightly more room than Reed-Solomon codes,
but are dramatically cheaper to compute. The application is bulk
distribution of data to lots of people (e.g., an Internet Explorer
upgrade). The data, consisting of k packets, is encoded into 2k packets
and then sent unreliably. As long as you receive slightly more than k of
those packets (roughly 5% overhead) you can reconstruct the
message. Tornado codes are pretty cool and should have wide
applicability.
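As a sanity check on those numbers, here is a toy loss simulation (mine;
the 2x stretch and ~5% reception overhead are just as I noted them, so
treat the constants as approximate): send 2k encoded packets through a
lossy channel and see how often enough arrive to decode.

    # Toy check of the digital-fountain numbers as I noted them (2x stretch,
    # ~5% reception overhead); the constants are approximate.
    import math
    import random

    random.seed(5)

    K = 1000                        # source packets
    ENCODED = 2 * K                 # stretch factor of 2
    NEEDED = math.ceil(1.05 * K)    # packets needed to reconstruct

    def decode_succeeds(loss_rate):
        received = sum(random.random() > loss_rate for _ in range(ENCODED))
        return received >= NEEDED

    for loss in (0.10, 0.30, 0.45, 0.50):
        ok = sum(decode_succeeds(loss) for _ in range(2000)) / 2000
        print(f"loss rate {loss:.2f}: decode succeeds in {ok:6.1%} of trials")
    # With a 2x stretch the receiver rides out close to 50% random loss
    # before decoding starts to fail, with no retransmissions or feedback.
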
Summary cache, a web caching paper, also looked good... but I didn't
see the talk.
As far as people/gossip news goes... the big news was that Van Jacobson
(and possibly others at LBL) is leaving LBL and going to a private
research lab at Cisco.
In other news, Jamshid Mahdavi from PSC was interested in our
measurements work and offered to let us use Nimi (their distributed
measurement infrastructure). He was also into the packet drop
measurement tool I've been talking about. We also met Jeff Semke from
over there. Both were very approachable and good people to talk with.
Sprint is forming a research lab in the Bay Area. They are tapping all
of the internet portion of their network, and it seemed like there might
be some
opportunity to work with them (at least to have someone come to our
retreat). They are particularly interested in finding ways to measure
and account for availability.
Matt Zeukauskas, who's an old CMU guy (actually Brian's student... still
finishing his thesis on Midway), is now working at ANS and has been
working on active measurement specifications for IPMG. He could be a
resource for ANS data or some sense of what IPMG is doing.
Steve McCanne and Elan Amir are at their new startup company, FastForward
Networks, but they won't say what they're doing. MSR's OS research group
has broadened into OS and networking research with the addition of Venkat
Padmanabhan from Berkeley. ARPA is looking for PMs. There is an
anti-Active Net sentiment among many of the networking people. My take
is that the issue is economic and not technical. Active services
represent a different business model than selling routers. I think a
more successful packaging model might be to get a high speed parallel
interface to the router switch fabric and then sell a separate "active
services" box that plugged in there.
Anyway, that's it. Feel free to come by and talk to me if you have any
questions about what went on or want to see the proceedings/tutorials.
- Stefan