...thinking about access

Eric Hoffman (hoffman@cs.washington.edu)
Thu, 15 Oct 1998 12:48:22 -0700 (PDT)

...thinking about how to implement Stefan's model of node management, where the
basic management interface is to dump a new disk image and reboot

for a while I had been thinking that we could use an intelligent
network card as some kind of node-controller. It could listen
for authenticated messages at either IP or ethernet level, and
would be able to:

reset the host machine

address the pci ide controller while the host machine is stalled,
for purposes of dumping a new image

encapsulate and relay messages on the host serial port (for those
environments that use it) to network clients
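
to make the 'authenticated messages' idea concrete, here is a minimal sketch
of what a control message for such a card could look like. everything in it
is an assumption for illustration: the command codes, the header layout, the
sequence number used against replay, and the choice of HMAC-SHA256 with a
shared key are not part of any existing card or protocol.

```python
import hashlib
import hmac
import struct

# Hypothetical command codes, one per operation in the list above.
CMD_RESET = 1        # reset the host machine
CMD_WRITE_IMAGE = 2  # payload carries a chunk of a new disk image
CMD_SERIAL = 3       # payload carries bytes for the host serial port

# Assumed header layout: command (1 byte) + sequence number (4 bytes),
# both in network byte order.
HEADER = struct.Struct("!BI")
MAC_LEN = hashlib.sha256().digest_size


def pack_message(key: bytes, cmd: int, seq: int, payload: bytes = b"") -> bytes:
    """Build header + payload and append an HMAC over both."""
    body = HEADER.pack(cmd, seq) + payload
    return body + hmac.new(key, body, hashlib.sha256).digest()


def unpack_message(key: bytes, last_seq: int, data: bytes):
    """Return (cmd, seq, payload), or None if the message is rejected."""
    if len(data) < HEADER.size + MAC_LEN:
        return None  # too short to hold a header and a MAC
    body, mac = data[:-MAC_LEN], data[-MAC_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return None  # wrong key or tampered message
    cmd, seq = HEADER.unpack_from(body)
    if seq <= last_seq:
        return None  # old sequence number: treat as a replay
    return cmd, seq, body[HEADER.size:]
```

the same framing would work whether the bytes travel in a udp datagram or a
raw ethernet frame; the card only needs the shared key and the last sequence
number it accepted.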

in general this might not work because of two problems:

slave cards can't generally drive the host reset line (I also don't
know if PCI reset is actually interpreted as a system reset, or is
just local to the bus)

it won't be simple to ensure that the host is quiescent while writing
to the disk

intel has an ethernet board for $200-$300 with an i960, but mef isn't
sure we can get programming specs. we can buy a $1000 board from
cyclone which does come with specs, but that's quite a bit to spend
per node on control functions

the other technology which we have at our disposal is bootable
cds. this is basically the same story as boot floppies, but far less
space constrained and far more reliable

if we put a full operating system on the cd for purposes of upgrading
the drive boot image, it's not clear how easy it will be to boot again
into the target operating system. unfortunately, dos would probably be
the best for this since pc os's need information from the bios during
boot, and linux and freebsd generally use this information and dispose
of it

the bootable cd seems the most likely course, but it leaves us with
the requirement for a 'control host' in every rack. the cd-os would
check with this controller as to what the right action was on each
boot, load any new partitions, and boot into any of the existing ones.
the controller would contain power and serial control similar to that
used in the loom rack
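
the per-boot decision the cd-os would make can be sketched roughly as
follows. this is only an illustration: the Directive fields, the step names,
and the fallback of booting a default partition when the controller does not
answer are all assumptions, not a settled design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Directive:
    """Hypothetical reply from the rack's control host for one node."""
    boot_partition: int                     # partition to boot into
    new_image: Optional[str] = None         # image to load first, if any
    target_partition: Optional[int] = None  # where to load it


def plan_boot(directive: Optional[Directive], default_partition: int) -> List[tuple]:
    """Turn the controller's reply into an ordered list of steps."""
    if directive is None:
        # Assumed fallback: controller unreachable, boot what is already there.
        return [("boot", default_partition)]
    steps: List[tuple] = []
    if directive.new_image is not None:
        # Load into the named partition, or into the one we are about to boot.
        target = directive.target_partition
        if target is None:
            target = directive.boot_partition
        steps.append(("load", directive.new_image, target))
    steps.append(("boot", directive.boot_partition))
    return steps
```

for example, plan_boot(Directive(2, "new-image"), 1) yields a load of
"new-image" into partition 2 followed by a boot of partition 2, while
plan_boot(None, 1) just boots partition 1.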

the per-node intelligent ethernet card could (possibly) be engineered
to be directly managed by any authenticated node on the internet, and
thus spare the added administration overhead of dealing with the
control host

any other thoughts?