whoops

Tom Anderson (tom@emigrant)
Sun, 23 Aug 1998 01:22:25 -0700 (PDT)

At the beginning of our discussion on Friday I suggested
using the old measurements as a coarse-grained version of
the simultaneous test. I realized later why that wouldn't
work: the path rate for the first test was one measurement every
5 hours, which means the alternative paths would on average be
several hours out of date.
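To make "several hours out of date" concrete, here is a short calculation under an assumption the note doesn't state: that each path is sampled either as a Poisson process with rate lambda = 1 per 5 hours, or strictly periodically every T = 5 hours. A is the age of the most recent measurement of a path, seen at a random moment long after measurement began.

```latex
\begin{aligned}
\text{Poisson, } \lambda = \tfrac{1}{5}\ \text{h}^{-1}: &\quad \Pr[A > a] = e^{-\lambda a}, \qquad \mathbb{E}[A] = \tfrac{1}{\lambda} = 5\ \text{h} \\
\text{Periodic, } T = 5\ \text{h}: &\quad A \sim \mathrm{Uniform}(0, T), \qquad \mathbb{E}[A] = \tfrac{T}{2} = 2.5\ \text{h}
\end{aligned}
```

Either way the comparison would be against data hours old, which is why the old measurements can't stand in for a simultaneous test.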

So in picking a timeout for the cache of recently tested links,
you should use the average time it takes to run a traceroute.
If most traceroutes complete in a few minutes, a cache timeout of a
few minutes wouldn't skew the results. Since we're only measuring
each link about once a day on average, I wouldn't expect this to be
a big deal.
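A minimal sketch of such a cache, assuming links are keyed by a (host, host) pair and that what gets cached is a hypothetical per-link RTT in ms. `RecentLinkCache` and `ttl_from_traceroute_durations` are illustrative names, not part of any existing tool; the timeout is set to the mean traceroute running time, as suggested above.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple


def ttl_from_traceroute_durations(durations_s: Sequence[float]) -> float:
    """Cache timeout = mean running time of recent traceroutes (seconds)."""
    return sum(durations_s) / len(durations_s)


class RecentLinkCache:
    """Hypothetical cache of recently measured links (value: RTT in ms).

    An entry is reused only while it is younger than `ttl_s` seconds;
    after that the link has to be measured again."""

    def __init__(self, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, float]] = {}

    def put(self, link: Hashable, rtt_ms: float) -> None:
        # Record when the link was measured along with the result.
        self._entries[link] = (self.clock(), rtt_ms)

    def get(self, link: Hashable) -> Optional[float]:
        entry = self._entries.get(link)
        if entry is None:
            return None
        stamp, rtt_ms = entry
        if self.clock() - stamp > self.ttl_s:
            del self._entries[link]  # expired: force a fresh measurement
            return None
        return rtt_ms
```

With a TTL of a few minutes and each link measured about once a day, almost every lookup misses; the cache only saves probes when several n-gons happen to reuse a link within the same few minutes.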

By the way, what is the motivation for measuring the direct
route and the n-gon bidirectionally? Note that this halves
your path rate, which is the thing you are worried about.
A traceroute that gets through to the target automatically measures
a bidirectional route: there and back. It won't tell
you which path the return traffic takes; is that important?
(It might be for figuring out where the bottlenecks are inside
the network, but maybe you don't need to do that for every n-gon
measurement: you could rely on the fact that every path will
eventually be measured in both directions, even if you don't
know which return path a particular measurement used.)
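Both halves of that argument can be checked with a small sketch. It assumes a hypothetical cost model in which each measurement initiated from one end costs one unit of a fixed probe budget, and that the end initiating each measurement is chosen at random. `path_rate` shows the halving; `draws_until_both_directions` counts how many one-ended measurements it takes before every host pair has been measured from both ends.

```python
import random


def path_rate(probe_budget_per_hour: float, bidirectional: bool) -> float:
    """Paths measured per hour under a fixed budget (hypothetical cost
    model: one unit per direction measured)."""
    return probe_budget_per_hour / (2 if bidirectional else 1)


def draws_until_both_directions(n_hosts: int, seed: int = 0) -> int:
    """Measure one random ordered pair (src, dst) per draw and count draws
    until every ordered pair, i.e. both directions of every host pair,
    has been covered at least once."""
    rng = random.Random(seed)
    remaining = {(s, d) for s in range(n_hosts)
                 for d in range(n_hosts) if s != d}
    draws = 0
    while remaining:
        src, dst = rng.sample(range(n_hosts), 2)
        remaining.discard((src, dst))
        draws += 1
    return draws
```

This is the coupon-collector setting, so full coverage of both directions takes on the order of n(n-1)·ln(n(n-1)) measurements: it does happen, just not on any particular schedule.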

Although this is contrary to something Jack said, I think
you could use earlier n-gon measurements to help determine
which links to use in the next n-gon. As long as we select
pairs randomly, the question is how well we can do at predicting
good n-gons. Why not use all the information we have available?
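One hedged way to use earlier n-gon results while still selecting pairs randomly: draw the endpoints uniformly, then rank candidate intermediate hosts by the RTT that past measurements predict for the two-hop detour src -> via -> dst. The `history` dictionary (directed link to an RTT estimate in ms), `pick_intermediates`, and `next_ngon` are hypothetical names for illustration.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Link = Tuple[Hashable, Hashable]


def pick_intermediates(src: Hashable, dst: Hashable,
                       hosts: Sequence[Hashable],
                       history: Dict[Link, float], k: int,
                       unknown_rtt_ms: float = float("inf")) -> List[Hashable]:
    """Return the k candidate relays with the lowest predicted two-hop RTT.

    `history` maps a directed link (a, b) to a past RTT estimate in ms.
    Links never measured get `unknown_rtt_ms`; the default (infinity)
    ranks them last."""
    def predicted(via: Hashable) -> float:
        return (history.get((src, via), unknown_rtt_ms)
                + history.get((via, dst), unknown_rtt_ms))

    candidates = [h for h in hosts if h != src and h != dst]
    return sorted(candidates, key=predicted)[:k]


def next_ngon(hosts: Sequence[Hashable], history: Dict[Link, float], k: int,
              rng: random.Random) -> Tuple[Hashable, Hashable, List[Hashable]]:
    """Endpoints are drawn uniformly at random; only the choice of
    intermediates uses past measurements."""
    src, dst = rng.sample(list(hosts), 2)
    return src, dst, pick_intermediates(src, dst, hosts, history, k)
```

A finite `unknown_rtt_ms` (for example, a typical RTT) would let untested links compete instead of never being chosen; how much to explore versus exploit is left open here.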

I'll reiterate the summary of our discussion: you
need to carefully figure out the path lambda to avoid
overwhelming the network with n-gon measurements, and you need
to carefully figure out how to exclude samples
that would overwhelm particular sites, without introducing bias.
(One idea that might work, but check with Jack: randomly choose which
measurements you won't make, with a bias based on the long-term rate
at which a node is being hit with n-gon measurements.) And you
need to be careful with the startup transient for exponential moving
averages.
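A sketch of the three pieces of that summary, under stated assumptions. `poisson_schedule` draws measurement start times with exponential gaps at rate lambda per hour. `keep_probability` is one hypothetical reading of the thinning idea above (whether it avoids bias is exactly the question to check with Jack). `BiasCorrectedEWMA` shows one standard fix for the startup transient: an average started at 0 is pulled toward 0 for its first samples, and dividing by the weight actually placed on data removes that pull.

```python
import random
from typing import List


def poisson_schedule(lam_per_hour: float, duration_hours: float,
                     rng: random.Random) -> List[float]:
    """Measurement start times (hours) from a Poisson process with rate
    `lam_per_hour`, generated from exponential inter-arrival gaps."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam_per_hour)
        if t >= duration_hours:
            return times
        times.append(t)


def keep_probability(node_rate: float, target_rate: float) -> float:
    """Hypothetical thinning rule: keep every measurement touching a node
    whose long-term hit rate is at or below `target_rate`; above that,
    keep only target_rate / node_rate of them."""
    if node_rate <= target_rate:
        return 1.0
    return target_rate / node_rate


class BiasCorrectedEWMA:
    """Exponential moving average with a startup-transient correction.

    Started at 0, the raw average after n samples puts total weight
    1 - (1 - alpha)**n on the data; dividing by that factor removes
    the bias toward the initial 0."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.raw = 0.0
        self.n = 0

    def update(self, x: float) -> float:
        self.raw = (1.0 - self.alpha) * self.raw + self.alpha * x
        self.n += 1
        return self.value()

    def value(self) -> float:
        if self.n == 0:
            return 0.0
        return self.raw / (1.0 - (1.0 - self.alpha) ** self.n)
```

Fed a constant 10.0 three times with alpha = 0.1, the raw average is only 2.71 while the corrected value is 10.0. The correction removes the bias, not the noise: the first few corrected estimates rest on very few samples, so a thinning rule driven by them should probably wait until n is reasonably large.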

tom

ps. Stefan and Neal: I suggested to John that he use your
model of how short-flow TCP performance varies with loss rate,
RTT, and RTT variance as a way of picking alternate n-gon routes.
The question is how to balance optimizing RTT against optimizing loss rate.
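Stefan and Neal's short-flow model isn't given here, so as a plainly swapped-in stand-in this sketch uses the well-known steady-state (long-flow) approximation of Mathis et al. (1997), throughput ≈ (MSS / RTT) * sqrt(3 / (2p)), to show how a single formula can trade RTT against loss when ranking alternate routes. The MSS, receive-window cap, and route values are assumptions, and unlike the short-flow model this formula ignores RTT variance.

```python
import math
from dataclasses import dataclass
from typing import Sequence

MSS_BYTES = 1460         # assumed segment size
RWND_BYTES = 64 * 1024   # assumed receiver-window cap


@dataclass
class Route:
    name: str
    rtt_s: float  # mean round-trip time in seconds
    loss: float   # packet loss probability, 0 <= loss < 1


def mathis_throughput(rtt_s: float, loss: float) -> float:
    """Approximate TCP throughput in bytes/s: (MSS / RTT) * sqrt(3 / (2p)),
    capped by the window limit RWND / RTT (also used when loss is 0)."""
    window_limited = RWND_BYTES / rtt_s
    if loss <= 0.0:
        return window_limited
    loss_limited = (MSS_BYTES / rtt_s) * math.sqrt(1.5 / loss)
    return min(window_limited, loss_limited)


def best_route(routes: Sequence[Route]) -> Route:
    """Pick the route with the highest predicted throughput."""
    return max(routes, key=lambda r: mathis_throughput(r.rtt_s, r.loss))
```

In this formula RTT enters linearly and loss as a square root, so halving RTT is worth as much as cutting loss by a factor of four; a model tuned to short flows may weigh them quite differently.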