RE: ...thinking about access

Stefan Savage (savage@cs.washington.edu)
Thu, 15 Oct 1998 14:12:45 -0700

In fairness I don't think the hard problem is going to be "where does the
boot loader come from". There are a number of ways to get a small
program into memory early on in the boot sequence:
boot floppies,
boot CDs,
a PROM for the NIC (available for most NICs),
a flash card with a ROM signature mapped in the 640KB-1MB region,
network boot in BIOS (supported on some newer BIOSes),
mod to Flash BIOS (if Intel gives specs),
etc...

I expect that Access will become heterogeneous over time, so it may be
that some combination of these solutions ends up being used. Inevitably
there will be heterogeneity between network cards, SCSI/IDE cards,
etc... so while we may start with a single boot loader it will likely
need to be updated to reflect this diversity over time. I think we
should pick whatever solution is easiest for the set of machines that
Intel donates ;-)

A harder problem is going to be how to configure the
new OS once it's been set up. I can imagine doing this via scripts for
Linux/FreeBSD, but some thought is needed for NT.

While Intel/MS has been working on a standard where your PC can be
turned on and off via the network card, I don't think this is a great
plan for Access. Even if it always worked, I've witnessed that with
some I/O cards asserting RST isn't the same as power cycling. Hence, I
prefer the good old-fashioned solution of cutting the power and turning it
on again. For this we can use X-10 or high-voltage serial-controlled
relays (available from Home Automation also).
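
A rough sketch of the cut-the-power-and-restore approach, assuming a relay
controller that exposes a simple per-outlet on/off call. The Relay
interface, FakeRelay, and the default off time are all hypothetical; this
is not the API of any real X-10 or relay product.

```python
import time
from typing import Callable, List, Protocol, Tuple


class Relay(Protocol):
    """Hypothetical interface to a power relay; not a real X-10 API."""

    def set_power(self, outlet: int, on: bool) -> None: ...


class FakeRelay:
    """In-memory stand-in that records the commands it receives."""

    def __init__(self) -> None:
        self.log: List[Tuple[int, bool]] = []

    def set_power(self, outlet: int, on: bool) -> None:
        self.log.append((outlet, on))


def power_cycle(relay: Relay, outlet: int, off_seconds: float = 5.0,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Cut power, wait for off_seconds, then restore power."""
    relay.set_power(outlet, False)
    sleep(off_seconds)
    relay.set_power(outlet, True)
```

Passing sleep in as a parameter is only there so the sequence can be
exercised without actually waiting.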

I also think that we want at least TWO control hosts for every rack of
PCs, wired so one can reboot the other in case of independent
control host failure (the goal is to keep operator intervention to a
minimum). Control hosts will be necessary for all kinds of management
functions:
Staging code to be downloaded
Scheduling access to Access
Recovering persistent data
Getting status info on hosts (what's used, what's down, etc...)
Getting network status info (RMON/SNMP queries to nearest router)
Announcing new resources to the network, or withdrawing resources
Serving as a repository of use/error logs, etc...
Remote console support
Security checks

I think getting all of the above right is the really tough problem.
Perhaps we can draw up a list of functions that need to be on a control
node:
- schedule/request access to a node
- reboot a node
- load an OS onto a node
- configure an OS
- way to access console remotely (serial console?)
- check status of node/cluster
- keep history of cluster use, errors, etc...
- manage some amount of persistent state for users
- global announcement/withdrawal of resources
- validate that a user is allowed to perform any of the above
functions
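
To make a few of the functions in this list concrete, here is a toy
in-memory sketch covering reboot, OS load, status, history, and the
permission check. The ControlNode class, its method names, and the
per-user permission table are illustrative assumptions, not a design
anyone has agreed on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ControlNode:
    """Toy in-memory model of some control-node functions."""

    # user -> set of operations that user may perform (hypothetical policy)
    permissions: Dict[str, Set[str]] = field(default_factory=dict)
    # node name -> current status string
    status: Dict[str, str] = field(default_factory=dict)
    # append-only history of (user, operation, node, allowed)
    history: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def authorize(self, user: str, op: str, node: str) -> bool:
        """Validate the request and record it in the history either way."""
        allowed = op in self.permissions.get(user, set())
        self.history.append((user, op, node, allowed))
        return allowed

    def reboot(self, user: str, node: str) -> bool:
        """Mark a node as rebooting if the user is allowed to reboot it."""
        if not self.authorize(user, "reboot", node):
            return False
        self.status[node] = "rebooting"
        return True

    def load_os(self, user: str, node: str, image: str) -> bool:
        """Mark a node as loading an image if the user is allowed to."""
        if not self.authorize(user, "load_os", node):
            return False
        self.status[node] = f"loading {image}"
        return True
```

Every operation goes through authorize first, so denied requests end up
in the history as well as successful ones.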

- Stefan

-----Original Message-----
From: Eric Hoffman [mailto:hoffman@cs.washington.edu]
Sent: Thursday, October 15, 1998 12:48 PM
To: syn@cs.washington.edu
Subject: ...thinking about access

...thinking about how to implement Stefan's model of node management, where the
basic management interface is to dump a new disk image and reboot

for a while I had been thinking that we could use an intelligent
network card as some kind of node-controller. It could listen
for authenticated messages at either IP or ethernet level, and
would be able to:

reset the host machine
address the pci ide controller when the host machine was stalled
for purposes of dumping a new image
encapsulate and relay messages on the host serial port (for those
environments that use it) to network clients
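
one possible shape for the "authenticated messages" part, as a minimal
sketch. the message does not say which scheme would be used; a shared-key
HMAC is an assumption here, and the command names are made up. this has
no replay protection, so a real design would also need a nonce or
sequence number.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes

# Hypothetical command names for the three capabilities listed above.
ALLOWED_COMMANDS = {b"reset", b"write-image", b"relay-serial"}


def sign(key: bytes, command: bytes) -> bytes:
    """Return the command followed by its HMAC-SHA256 tag."""
    tag = hmac.new(key, command, hashlib.sha256).digest()
    return command + tag


def verify(key: bytes, message: bytes) -> Optional[bytes]:
    """Return the command if the tag is valid and the command is known."""
    if len(message) < TAG_LEN:
        return None
    command, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, command, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how much of the tag matched.
    if not hmac.compare_digest(tag, expected):
        return None
    return command if command in ALLOWED_COMMANDS else None
```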

in general this might not work because of two problems:

slave cards can't generally drive the host reset line (I also don't
know if PCI reset is actually interpreted as a system reset, or is
just local to the bus)

it won't be simple to ensure that the host is quiescent while writing
to the disk

intel has an ethernet board for $200-$300 with an i960, but mef isn't
sure we can get programming specs. we can buy a $1000 board from
cyclone which does come with specs, but that's quite a bit to spend
per node on control functions

the other technology which we have at our disposal is bootable
cds. this is basically the same story as boot floppies, but far less
space constrained and far more reliable

if we put a full operating system on the cd for purposes of upgrading
the drive boot image, it's not clear how easy it will be to boot again
into the target operating system. unfortunately, dos would probably be
the best for this since pc os's need information from the bios during
boot, and linux and freebsd generally use this information and dispose
of it

the bootable cd seems the most likely course, but it leaves us with
the requirement for a 'control host' in every rack. the cd-os would
check with this controller as to what the right action was on each
boot, load any new partitions, and boot into any of the existing ones.
the controller would contain power and serial control similar to that
used in the loom rack
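
the per-boot check could look roughly like the sketch below. the
BootOrder fields and the plan_boot function are hypothetical, and
falling back to an existing partition when the controller cannot be
reached is an added assumption, not something decided above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BootOrder:
    """What the controller tells a node to do on this boot (hypothetical)."""

    new_image: Optional[str]  # image to write first, or None
    target_partition: int     # partition to load the image into
    boot_partition: int       # partition to boot afterwards


def plan_boot(order: Optional[BootOrder], default_partition: int = 0) -> List[str]:
    """Turn the controller's answer into an ordered list of steps."""
    if order is None:
        # Assumption: controller unreachable, so boot an existing partition.
        return [f"boot partition {default_partition}"]
    steps = []
    if order.new_image is not None:
        steps.append(
            f"load {order.new_image} into partition {order.target_partition}"
        )
    steps.append(f"boot partition {order.boot_partition}")
    return steps
```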

the per-node intelligent ethernet card could (possibly) be engineered
to be directly managed by any authenticated node on the internet, and
thus spare the added administration overhead of dealing with the
control host

any other thoughts?