Re: clustering

Neal Cardwell (cardwell@cs.washington.edu)
Fri, 29 May 1998 20:52:39 -0700 (PDT)

> So, back to DNS resolutions. Bind is serial. Which means that there's a
> bottleneck in the measurements -- everyone queues up behind the BFE server
> that takes 15 minutes to resolve, when multiple traces are to be
> performed.
>
> Originally, this meant that they all timed out, and the results were not
> kept -- BFE's revenge.

I heard the NOW project had similar problems with parallel job startup,
when 128 jobs on 128 different nodes would simultaneously try to look up
uFOO.cs.berkeley.edu using the same name server. Reeealll sloooow... I
think they ended up just hard-wiring IP addresses or something. Amin may
be more familiar with the details.

> SO. Is this synchronization a problem? Should we drop the servers that,
> for some odd reason, don't resolve properly when using IP's, or should we
> allow that synchronization to occur?
>
> Although I don't like the synchronization, I similarly don't like the idea
> of removing more datapoints.

One workaround might be to use DNS names only for the picky servers, and
IP addresses for the rest (hopefully the majority). That would let you keep
all the traceroute servers while still avoiding most traffic jams at
the BIND server.
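
To make that concrete, here is a rough sketch of the per-server choice. The
class, the field names, and the `needsDnsName` flag are all made up for
illustration; I don't know how the real measurement program stores its
server list.

```java
import java.util.List;

class TracerouteTargets {
    /** Hypothetical description of one traceroute server. */
    static final class Server {
        final String dnsName;       // e.g. "picky.example.edu" (placeholder)
        final String ipAddress;     // address recorded ahead of time
        final boolean needsDnsName; // true for servers that misbehave with a raw IP

        Server(String dnsName, String ipAddress, boolean needsDnsName) {
            this.dnsName = dnsName;
            this.ipAddress = ipAddress;
            this.needsDnsName = needsDnsName;
        }
    }

    /** Use the DNS name only where it is required; otherwise use the raw IP. */
    static String targetFor(Server s) {
        return s.needsDnsName ? s.dnsName : s.ipAddress;
    }

    /** Number of servers that would still go through the name server. */
    static long dnsLookupsNeeded(List<Server> servers) {
        return servers.stream().filter(s -> s.needsDnsName).count();
    }
}
```

With a list that is mostly non-picky servers, `dnsLookupsNeeded` stays small,
which is the whole point: only the picky minority queues up at the name
server.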

> And one last question. Was the twonk who originally wrote bind working in
> MS-DOS, or some similarly-valued single user system? This lack of
> parallelism is annoying.

Yeah. I guess the apps at the time -- FTP, telnet, NNTP, and SMTP --
didn't beat on name servers much.

One question: anyone have an idea why the name->IP mappings aren't being
cached in the Java program that's accessing these traceroute servers? Are
new processes being fired up, or is Java just brain dead?
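
For what it's worth, a minimal sketch of the kind of in-process cache I have
in mind is below. `NameCache` and its methods are hypothetical names, not
anything from the actual program, and the resolver is passed in so the cache
can be exercised without touching a real name server. Note that a cache like
this only helps inside one long-lived JVM; if a new process is started per
trace, every process starts with an empty cache.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class NameCache {
    // Maps lower-cased host name -> textual IP address.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> resolver;

    NameCache(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /** Resolve each distinct name at most once per process, then reuse it. */
    String lookup(String host) {
        // Host names are case-insensitive, so normalize the cache key.
        String key = host.toLowerCase(Locale.ROOT);
        return cache.computeIfAbsent(key, resolver);
    }

    /** A resolver backed by the platform's normal name lookup. */
    static String dnsResolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalStateException("cannot resolve " + host, e);
        }
    }
}
```

Usage would be something like `new NameCache(NameCache::dnsResolve)`, created
once and shared by all the threads doing traces. A failed lookup throws and
is not cached, so it will be retried the next time that name comes up.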

neal