- Porcupine One is now so reliable that most of its remaining failures are kernel failures.
	- There is presently no way to monitor for, and recover from, kernel failures.
	- Yasushi is building a Failure Recovery Service that runs on a single node:
		- When a porc node suspects that another porc node has gone down, it calls the failure recovery service and says "check it out!"
		- The failure recovery service pings the porc app on the suspect node:
			- If the porc app is there and running, the service reports a false positive.
			- If the porc app is not there but the node is up, the porc app is restarted.
			- If the node is down or hung, the node is restarted.
		- At the end of the failure recovery, a mail message goes out recording what happened.
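The recovery decision described above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the actual interface of Yasushi's service: the probe and action hooks (ping_app, ping_node, restart_app, restart_node, send_mail) and the Verdict names are illustrative assumptions, and it also assumes that a mail report goes out in every case, including false positives.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    # Hypothetical names for the three outcomes listed in the notes.
    FALSE_POSITIVE = "false positive"
    RESTART_APP = "restart porc app"
    RESTART_NODE = "restart node"


def diagnose(app_responds: bool, node_up: bool) -> Verdict:
    """Map the two probe results to a recovery action."""
    if app_responds:
        # The porc app answered the ping: the suspicion was wrong.
        return Verdict.FALSE_POSITIVE
    if node_up:
        # The node answers but the porc app does not: restart only the app.
        return Verdict.RESTART_APP
    # The node is down or hung: restart the whole node.
    return Verdict.RESTART_NODE


def handle_suspicion(
    node: str,
    ping_app: Callable[[str], bool],      # assumed hook: does the porc app answer?
    ping_node: Callable[[str], bool],     # assumed hook: does the node itself answer?
    restart_app: Callable[[str], None],   # assumed hook: restart the porc app
    restart_node: Callable[[str], None],  # assumed hook: reboot/power-cycle the node
    send_mail: Callable[[str], None],     # assumed hook: mail the recovery report
) -> Verdict:
    """Called when one porc node reports another as possibly down."""
    app_ok = ping_app(node)
    # Probe the node itself only if the app did not answer.
    node_ok = app_ok or ping_node(node)
    verdict = diagnose(app_ok, node_ok)
    if verdict is Verdict.RESTART_APP:
        restart_app(node)
    elif verdict is Verdict.RESTART_NODE:
        restart_node(node)
    # Assumption: a report is mailed for every outcome.
    send_mail(f"recovery report for {node}: {verdict.value}")
    return verdict
```

Passing the probes and actions in as callables keeps the decision logic separate from however the real service pings, restarts, and mails, so the three cases can be exercised with simple stubs.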
-----Original Message-----
From: yasushi@yasushi-pc [mailto:yasushi@yasushi-pc]
Sent: Wednesday, October 28, 1998 12:02 PM
To: porcupine@yasushi-pc; syn@yasushi-pc; spin-m3@yasushi-pc
Subject: loom25 being used as a crash box
For the next couple of days, I will be testing a watchdog mechanism on
loom25. This means loom25 will go through many involuntary reboots and
power cycles.
yaz