There really isn't a strong result here; rather this is just a
somewhat interesting measurement and a potentially interesting
measurement technique, and I was hoping that it might spark some
ideas. If you want to read about my ideas for applying this data
before reading about the actual graph interpretation, skip to the last
two paragraphs.
What is basically being measured here is the extent to which
congestion along one path is correlated with congestion on other paths
Since the correlation varies dramatically depending on exactly which
two paths you're looking at, we can't effectively sum this up as a
single number, which is why I'm showing the entire distribution.
There are 15 nodes in the UW4 dataset, so there are 15*14 = 210 paths,
and therefore 210 * 209 = 43,890 pairs of paths. Some of these share a
common endpoint (and therefore would be more expected to show
correlations), but most of them do not.
The difficulty in using correlation coefficients as a metric is that
there doesn't seem to be a really good, simple way to map the
coefficient back into a simple notion like "definitely shared
bottleneck" or "just noise." Typically a statistician would say that
a coeffient less that 0.5 would indicate "weak" correlation, but I
tend to be a bit more generous. Negative correlations indicate that
increases in congestion along one path was associated with a decrease
in congestion along the other path. Not surprising, there are no
pairs with a strong negative correlation. Since I can't think of any
mechanism in the network that would actually cause negative
correlation, I assume that all the negative values are due to noise
(note how taking all the samples versus just every fifth sample pulls
in the number of negative pairs), and I tend to write off postive
values in the same absolute range as also being noise.
So looking at the ABCD line (unrelated pairs), I would notice that
values as negative as -0.1 are reasonably common, and therefore think
that every pair whose correlation is less than 0.1 (about 80% of the
unrelated paths) are showing only noise. It's a bit harder to know
what to do with the remaining 20% of these pairs, with coefficients in
the range 0.1 to 0.4. These paths are mostly independent, but they
could be sharing some links that aren't generally bottlenecks, or we
could just be observing diurnal effects.
The noise range on the ABC line (pairs with a common endpoint) is
about the same, but 50% of these pairs have positive correlation above
the noise level. The most interesting thing here (to my way of
thinking) is shift that happens at this point. Many of the pairs
above this noise level are in fact quite strongly correlated, while
the part of the curve below this level is very similar to the
unrelated pairs line.
The two areas that I've thought of in terms of applying this data are
generating synthetic end-to-end latencies and analyzing end-to-end
traces for sharing. We need some handle on correlations if we are
going to be able to generate synthetic datasets that match our traces,
because we have discovered that assuming independent congestion
effects on each path doesn't work well enough to properly match our
alternate path measurements.
We can also apply the analysis in reverse, working from our measured
traces and using the measured correlations to drive guesses about the
topology and the location of bottlenecks (if two paths are highly
correlated in their congestion behaviour, then they probably share a
bottleneck). I don't know how this sort of technique might compare to
or integrate with some of our other ideas.
Andy
> -----Original Message-----
> From: acollins@hotate.cs.washington.edu
> [mailto:acollins@hotate.cs.washington.edu]
> Sent: Thursday, July 29, 1999 12:21 PM
> To: syn@cs.washington.edu
> Subject: yet another CDF plot
>
>
>
> But don't worry, this one is very different. Take a look at:
>
> http://www.cs.washington.edu/homes/acollins/detour/local/timecorr.ps
>
> This plot shows the distribution of correlation coefficients between
> pairs of paths in the UW4 dataset. The traces labeled "range reduce
> 5" were generated using only one out of five samples, while the third
> trace used all the samples (so you can see how some of the noise goes
> away).
>
> The CDF is taken over pairs of paths in the dataset, and what we are
> looking at is the correlation between the latencies on the two paths
> (since this is UW4, we have a series of paired measurements, for which
> we can take a correlation coefficient.
>
> The trace labeled "ABCD" is for pairs of unrelated paths, while the
> trace labeled "ABC" is for pairs of paths with a common endpoint.
> High correlation coefficients indicate that congestion tended to
> happen on the two paths simultaneously (presumably indicating a shared
> bottleneck).
>
> Most of the unrelated pairs have weak correlation (especially since
> the distribution goes nearly as far negative as positive, and we
> wouldn't expect anything systematic to cause negative correlations),
> while about half the connected paths have fairly strong correlations
> (and the other half don't; I'm working on trying to find a
> distinguishing feature between these two classes).
>
> I've been looking at this with an eye towards generating synthetic
> datasets, but it also has potential application for locating sharing
> by tying it to correlation.
>
> Andy
>
>
> --
> //-------------------------------------------------------------------------
> // Andy Collins -- KC6YEY acollins@cs.washington.edu
> // Graduate Student, University of Washington
> // Committee Rules (3-5):
> // 3) Be as vague as possible; this prevents irritating the others.
> // 4) When in doubt, suggest that a subcommittee be appointed.
> // 5) Be the first to move for adjournment; this will make you popular --
> // it's what everyone is waiting for.
> // (collection last updated by Don Woods)
> //-------------------------------------------------------------------------
>
-- //------------------------------------------------------------------------- // Andy Collins -- KC6YEY acollins@cs.washington.edu // Graduate Student, University of Washington // Peter's Observation: // Super-competence is more objectionable than incompetence. // (collection last updated by Don Woods) //-------------------------------------------------------------------------