RE: TCP-performance-based routing metric thought

Stefan Savage (savage@cs.washington.edu)
Tue, 20 Oct 1998 09:19:27 -0700

The other problem is that even with the "epsilon-fix" to drop rate,
there is a magnitude bias in our data towards drop rate that we're not
comfortable with. Almost any metric that incorporates drop rate will
make us take alternate paths with slower-than-normal RTTs. Our drop
rate samples have a couple of problems:
1) They have more error in them (we need many more samples to
differentiate small changes in drop rate)
2) I assert that they're more load-sensitive than latency (as
you increase load, latency will increase up to roughly "pkt service
time * q depth"... I don't think this will generally be more than a
factor of two greater than propagation delay. On the other hand, as
you increase load, drop rate will keep climbing toward 100%. Almost
anything is bigger than epsilon, or even 1%.)

For now I think we should stick with the hybrid metric or just straight
latency.

- Stefan

-----Original Message-----
From: Neal Cardwell [mailto:cardwell@cs.washington.edu]
Sent: Tuesday, October 20, 1998 2:42 AM
To: syn@cs
Subject: TCP-performance-based routing metric thought

here's another possibility for a routing metric: use the Padhye model
(or whatever model i settle on this week for my quals project, if it
seems ok), except either

1) for paths with measured loss probability = 0, use loss probability =
epsilon, where epsilon = 10^-5 or 10^-6 (the easy, cheesy way)

or, more interestingly,

2) for all paths, instead of using the measured loss rate, p, as the
loss-rate estimator, be conservative and use the top of the 95%
confidence interval of the estimated p.

ex:

n = 100 ping samples, y = 0 dropped of those n
--> alpha=95% confidence interval for loss rate (p. 278 in my ugrad
stat text): (0, 0.037)

n = 100 samples, y = 1 dropped
--> alpha=95% confidence interval for loss rate: (<something>, 0.054)

n = 1000 samples, y = 0 dropped,
--> interval is (0, .0038)

n = 10 ping samples, y = 0 dropped
--> interval: (0, 0.278)

n ping samples, y = 0 dropped
--> interval: (0, (1.96)^2 / (n + (1.96)^2)) ~= (0, 4/(n+4))
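fwiw, the intervals above are consistent with the Wilson score interval
for a binomial proportion; with y = 0 drops the upper end collapses to
z^2 / (n + z^2), which is the ~4/(n+4) rule above. a quick sketch (the
function name and z constant are my own choices here, not from the stat
text):

```python
import math

Z95 = 1.96  # two-sided 95% normal quantile

def wilson_upper(y, n, z=Z95):
    """Upper end of the Wilson score confidence interval for a
    binomial proportion: a conservative loss-rate estimate from
    y observed drops out of n ping samples."""
    phat = y / n
    center = phat + z * z / (2 * n)
    halfwidth = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return (center + halfwidth) / (1 + z * z / n)

# with y = 0 this reduces to z^2 / (n + z^2), roughly 4 / (n + 4)
```

plugging in the cases above, wilson_upper(0, 100) gives ~0.037 and
wilson_upper(0, 10) gives ~0.278, matching the intervals quoted.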

the advantage of this scheme is that we get to use a real TCP
performance prediction formula, and we have a consistent way to pick
routes, no matter what the loss rate, that is based on our estimate of
the loss rate and modulated by our confidence in this estimate. the
disadvantage is that it involves a nontrivial amount of computation -
maybe 20 fp ops to calculate the confidence interval.
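for concreteness, here is roughly what plugging a loss estimate into
the Padhye et al. throughput approximation would look like. this is a
sketch, not the code we'd ship: the MSS, the RTO = 4*RTT guess, b = 2
packets per ACK, and the epsilon floor from option 1 are all
illustrative assumptions:

```python
import math

EPSILON = 1e-6  # floor for zero measured loss (option 1 above)

def padhye_throughput(p, rtt, mss=1460, b=2, t0=None):
    """Approximate steady-state TCP throughput in bytes/sec from the
    Padhye et al. model. p: loss probability, rtt: round-trip time in
    seconds, b: packets acknowledged per ACK, t0: retransmit timeout."""
    if t0 is None:
        t0 = 4 * rtt  # crude stand-in for the measured RTO
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t0 * min(1.0, 3 * math.sqrt(3 * b * p / 8))
             * p * (1 + 32 * p * p))
    return mss / denom

def route_metric(measured_p, rtt):
    """Lower is better. Flooring p at EPSILON keeps the metric finite
    on loss-free paths (and avoids a divide-by-zero above)."""
    p = max(measured_p, EPSILON)
    return 1.0 / padhye_throughput(p, rtt)
```

option 2 would just pass the top of the confidence interval in place of
the raw measured_p instead of flooring.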

another feature, maybe good or bad: it is heavily biased against paths
about which we have little info, and therefore few samples (since having
few samples dramatically inflates the confidence interval). this may be
exactly the stabilizing hysteresis kinda thing we want, since it will
keep us using the current paths as long as we know they are good, and
we'll only flap to the path less traveled once we somehow get enough
samples from it to decide that its loss rate is definitely low and
stable. and then again, it may not be what we want, because it will
tend to discourage us from using potentially great, but as-yet-untried,
paths.

neal