comments?

Tom Anderson (tom@emigrant)
Thu, 17 Sep 1998 22:00:29 -0700 (PDT)

Your comments would be welcome on the latest draft.
I'm working under a deadline to submit something to the Abilene
steering committee, and to start collecting participants.

tom

-----
A Case for Access: A High Performance Communication and
Computation Environment for Wide Area Distributed Systems,
Networking, and Applications Research

1. Introduction

Despite the enormous commercial and social importance of the
Internet, academic research in wide area distributed systems,
networking, and applications is being hamstrung by the lack of a
development and deployment path for new ideas and
technologies. Without a general-purpose environment for
conducting long-term experiments under real traffic load, wide
area systems research is limited to paper design studies and
small-scale demonstrations, without the reality check of real
users. In the past, when much of computer science research
concerned systems and applications for single machines and
local area networks, it was possible for university groups to
develop and distribute software embodying new ideas, to raise
the level of work of the entire research community. Examples
include Berkeley UNIX, Mach, gcc and the gnu tools, PVM,
various CAD tools, Linux, Kerberos, High Performance
Fortran, Ingres, LAPACK, X-windows, and many, many other
smaller efforts. All that was needed to validate research ideas in
practice was to develop software that solved real problems for a
willing user community.

Unfortunately, for wide area research the path from concept to
deployment is far more laborious. Real-life deployment often
requires coordinating computation and communication
resources dispersed around the world, and hence negotiating
access across numerous administrative domains. One of
the few examples of success has been the MBone (the multicast
backbone), and its experience is a tremendous case study of
both the value of and the need for a more systematic approach
to providing an infrastructure for wide area systems and
applications research. The MBone has attracted large numbers
of new users to new applications, such as Internet-based
teleconferencing, in the process illustrating a number of new
problems that appeared when the system was deployed on the
wide scale. A huge amount of research has been motivated and
enabled by the MBone, leading directly to new products in
industry; a "paper MBone" would have simply not been
effective. The MBone, however, was painstakingly put together
piece by piece over the course of many years, and did not really
take off until it had real users. Although many proposals exist
for new telecollaboration tools, new approaches to web caching,
new name services, new security architectures, new Internet
measurement tools, and so on, the activation energy required to
deploy these systems makes it unlikely that any will reach critical mass.

Our proposal is to deploy high performance computing and
communication resources at 50-100 geographically separate sites
spread around the US and the world, for conducting long-term
wide area systems, networking and applications experiments to
gain experience with real users. We call this system Access (A
Computing and Communication Environment for System
Software). We envision a rack of roughly ten to twenty PC's per
site, with gigabit local area network connectivity between the
PC's, high performance connectivity to a local or regional
bandwidth redistribution point, such as a GigaPoP, and
nationwide bandwidth provided between the GigaPoPs via
Abilene, HSCC, SuperNet, and the plain old Internet. A single
PC per site might be enough to support a single experiment such
as the MBone, but we would like to use the infrastructure to
support many simultaneous experiments, each running
continuously to attract real users. While Access could be seen as
a parallel computer, our focus is on supporting applications that
require geographic distribution to benefit end users.

The benefits of Access relative to the existing deployment path
for wide area applications include:

-- Rapid deployment of new wide area applications; universities
offer a ready-made community of early adopters for new
technology.

-- Enable researchers to build on distributed system services
developed by other research groups, allowing groups to stand
on each other's shoulders rather than on each other's
toes. Access would allow new services for naming, security,
mobility, etc., to be continuously available, a key step to
encouraging applications to be developed that depend on those
services.

-- Robust testing of distributed services and applications under
realistic conditions, instead of extrapolating from simulations
and small scale LAN experiments. It is common wisdom among
Internet researchers that our intuition doesn't scale; new,
unexpected problems are always encountered when we scale
systems to larger numbers of machines and users.

-- Focus wide area research on problems of availability and fault
tolerance. To be practical, wide area systems must deal with
network partitions and hardware failures; this is easy to ignore in
small scale local area experiments. With 1000 PC's in Access,
it is guaranteed that a large number will be unavailable at any
point in time; even if each machine were up 95% of the time, on
average some 50 machines would be down at any given moment.

-- Develop shared software for remote authentication, rebooting,
debugging, and management, enabling researchers to focus on
new distributed applications, rather than having to repeat the
same framework for each new effort.

-- Provide defined administrative procedures for gaining access
to remote resources. It is hard to negotiate access to a remote
computer at a single site; it is nearly impossible for any
individual research group to do so for fifty separate sites.

Several developments make this a good time to develop
the kind of framework we propose. First, the enormous
growth in Internet use has led several companies, such as
Qwest, Enron, and others, to begin laying new nationwide fiber
plants; as demonstrated by Abilene, with planning it is feasible
to acquire a portion of that new fiber to support research into
new Internet applications, where buying the bandwidth on the
open market would otherwise be prohibitively costly. Further,
many areas have concentrated their high bandwidth networking
efforts at GigaPoPs, regional redistribution points for
national high bandwidth networks. This enables universities to
gain access to high bandwidth national networks by laying only
local area fiber; we envision the rack of PC's at each Access site
being either physically colocated at the GigaPoP or connected to
it by a high performance link. By contrast, a random PC recruited
from a friendly researcher's desk is likely to be connected to the Internet
only by a 10Mb/s Ethernet.

On the software side, the Berkeley NOW project and other
cluster research efforts have prototyped software to remotely
manage racks of PC's, with secure remote loading of software
including the operating system, remote debugging and
rebooting, and self-configuration. For the NOW project, it was
useful to reduce the number of trips to the local machine room to
reboot systems; for Access, it is an absolute requirement that the
systems be self-managing without local operator intervention.
Similarly, the NIMI, X-Bone and Active Network efforts have
developed other key parts of the software needed to run Access.

To support Access, we plan concurrent proposals to:
-- Abilene and HSCC to donate 10% of their bandwidth
-- Intel and Sun to donate on the order of 1000
PCs/workstations
-- the NSF Research Initiatives program for $2M for glue
hardware (e.g., racks, Myrinets, disks, GPS timers, etc.)
-- the DARPA Active Networks program to fund the
development of glue software (e.g., tools for remote
management and debugging)
-- participating universities and other sites to provide a high
bandwidth connection to their metropolitan or regional bandwidth
redistribution point (GigaPoP), as well as space in a machine
room for installing the PC rack

In the rest of this proposal, we outline a set of application
experiments that would be enabled by Access, as well as the
hardware and software that would be needed at each site.

2. Applications

Our goal is to enable novel wide-area distributed systems and
networking research. There is a huge pent-up demand for a
deployment mechanism for new research ideas; we enumerate a
few here. To obtain experience with real users, experiments
need to run for weeks or months, if not indefinitely. The breadth
of this list suggests that even twenty PC's per site would be
quickly utilized.

-- Virtual Web services. The model we propose for Access is
emerging as the standard platform of choice for web services.
Many heavily used web services such as Altavista and Netscape
are spread across geographically distributed replicas, to improve
latency, bandwidth and availability to end-users; Amazon.com,
for example, has a site on each coast. However, the tools used
to manage consistency across these distributed web sites are
extremely primitive. If the computer science research
community is to develop solutions for what web services will
need in the future, it needs an experimental testbed for
validating those solutions.

-- Telecollaboration. The MBone is a huge success story for
widespread deployment, but it is suffering from its own
popularity. Its protocols are based on manual coordination, and
although there are several proposals for self-managing multicast
networks (e.g., for extending a multicast tree across
administrative domains, for multicast address allocation, for
reliable multicast transmission), it is unclear how to deploy those
new ideas without disrupting the widespread use of the MBone
today. Access would allow an "MBone2" to be deployed in
parallel with the original MBone for validating new research
ideas in managing the multicast network. Similarly, Access
would enable various telecollaboration tools that require
computation inside of the network to be effective, such as
mixing audio and video feeds from multiple sites.

-- Real time. Providing end-to-end real time performance is an
active research area, with several competing proposals including
RSVP and DiffServ. Access would lower the barrier to
widespread deployment of these protocols, allowing groups to
collect real user experience to help evaluate their work.
Similarly, Internet switch manufacturers are moving towards
providing quality of service by implementing prioritized classes
of service in hardware; however, there has been no widespread
academic prototyping effort to demonstrate that priority classes
will be sufficient to provide reasonable performance for real-time
traffic.
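
To make the priority-class idea concrete, here is a minimal sketch
(in Python; the class numbering and packet names are purely
illustrative, not part of any proposed standard) of strict-priority
service, in which queued real-time packets always drain before
best-effort packets:

    class PriorityScheduler:
        """Strict-priority link scheduler: class 0 (real-time) is
        always served before class 1 (best-effort).  Illustrative
        only."""
        def __init__(self, num_classes=2):
            self.queues = [[] for _ in range(num_classes)]

        def enqueue(self, packet, traffic_class):
            self.queues[traffic_class].append(packet)

        def dequeue(self):
            # Serve the highest-priority non-empty queue first.
            for queue in self.queues:
                if queue:
                    return queue.pop(0)
            return None   # link idle

    sched = PriorityScheduler()
    sched.enqueue("audio-frame", 0)
    sched.enqueue("ftp-segment", 1)
    assert sched.dequeue() == "audio-frame"   # real-time goes first

Whether such simple class-based service is in fact sufficient for
real-time traffic is exactly the kind of question Access would let
us answer with real users rather than simulation.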

-- Worldwide web caching. One solution to the "world wide
wait" is to develop a network of web caches to allow web pages
to be served from locations closer to end users. As evidence of
the importance of validating research ideas in practice, the initial
deployment of the Squid web cache demonstrated several
surprising results, such as the need to cache active content to
provide high hit rates. Again, several competing research
groups have proposed alternative approaches to web caches;
Access would enable these alternatives to be evaluated in
practice, rather than in theory.

-- IETF standards. Access would complement the IETF
standards process, enabling proposed standards to be
implemented, deployed, and tested under real use before and
during their adoption for the real Internet. It is a principle of
standards bodies that no standard should be adopted without
first being implemented and used; Access would make it simpler
for new standards to be prototyped. Further, Access could
facilitate propagation of newly adopted standards; IPv6 and
mobile IP have been tested on a small scale, but with Access,
clients could begin to count on these services being continuously
available.

-- Internet measurement. A number of research efforts have
begun to focus on measuring characteristics of the Internet, both
to understand its behavior and to serve as input to large scale
simulations of the Internet. Because there is no direct way to
ask routers to report on their buffer occupancy, link
latencies/bandwidths, utilization, or drop rate, measurements
must be taken from multiple sites to be effective at capturing the
state of the Internet; Access would provide a platform for those
measurements.
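
To make this concrete, a measurement experiment might run a small
probe at every Access site, estimating path latency to every other
site. The sketch below (Python; the peer list is hypothetical, and a
real experiment would use purpose-built probe traffic rather than
TCP connection timing) illustrates the kind of code each site would
run continuously:

    import socket, time

    # Hypothetical peer Access sites; in practice the control PC's
    # would distribute the current site list.
    PEERS = ["access1.example.edu", "access2.example.edu"]

    def probe(host, port=80, timeout=5.0):
        """Estimate round-trip latency from TCP connection setup time."""
        start = time.time()
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
            sock.close()
            return time.time() - start
        except OSError:
            return None   # unreachable: itself a useful measurement

    for host in PEERS:
        rtt = probe(host)
        print(host, "unreachable" if rtt is None else "%.1f ms" % (rtt * 1000.0))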

-- Internet operation. Ongoing Internet measurement efforts
have begun to illustrate that the Internet has substantial
operational problems, including high drop rates (5-6%),
persistent congestion, poor route selection, and route
oscillations, just to name a few examples. Several researchers
have proposed new approaches to routing and congestion
control to address these problems; Access would provide a
platform for a "virtual Internet" to test these new approaches
against substantial user load. Without Access, it would be hard
to imagine validating any new approach to routing or congestion
control to the degree necessary to consider using it in the
real Internet.

-- Distillation and compression. As more of the Web becomes
graphics-based, and as end-host displays become more
heterogeneous (from PDA's to reality engines), there is an
increasing need for application-specific compression to take
place inside of the network to optimize around bottleneck links.
For example, it makes little sense to ship a full screen picture
across the Internet to a PDA; it also makes little sense to ask
users to manually select among small and large versions of the
same image. Various proposals exist to address this problem;
Access would enable validation of these proposals by providing
a framework for reliable service to real users. In the long run,
one would hope compression and distillation would be
supported by both servers and clients, but before there is
widespread adoption, there is a need for translators embedded in
the network to handle legacy systems.
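
As a sketch of what such a translator might do (illustrative only;
the size threshold and quality setting are arbitrary, and a
production distiller would negotiate formats with the client), the
fragment below shrinks and recompresses an image, using the Pillow
imaging library, before it crosses a bottleneck link to a
small-display client:

    from io import BytesIO
    from PIL import Image   # Pillow imaging library

    def distill_image(data, max_side=240, quality=40):
        """Shrink and recompress an image for a small-display client;
        falls back to the original if distillation does not help."""
        img = Image.open(BytesIO(data))
        img.thumbnail((max_side, max_side))        # preserves aspect ratio
        out = BytesIO()
        img.convert("RGB").save(out, format="JPEG", quality=quality)
        distilled = out.getvalue()
        return distilled if len(distilled) < len(data) else data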

-- Wide area distributed systems. A number of projects, such as
Globe, Globus, Legion, WebOS, ProActive, and DARPA's
Quorum, have recently been started to provide a software
framework to support applications that can make effective use of
remote computational and storage resources. These systems
face a huge research agenda; to illustrate just one example, we
have only a very limited understanding of how to provide cache
and replica consistency across the wide area. To focus this
work on the real problems of next-generation distributed
applications, Access would provide a path for applications to be
developed for these frameworks and then tested in real use.

-- Naming. Similarly, a number of proposals have recently been
developed for enhancing the Internet's Domain Naming System
(DNS). Although DNS is effective at mapping individual
machine names to IP addresses, as services become replicated,
there is an increasing need to carefully control the mapping from
names to instances of a service (e.g., binding clients on the East
Coast to the Amazon.com replica in Delaware vs. binding West
Coast clients to the one in Seattle). Similarly, there is a need for
enhanced naming services to track mobile clients. Access would
allow prototypes of these solutions to be built and deployed,
enabling applications to be written that depend on those services.
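
A toy example of the kind of binding such a name service would
perform (the service name, replica table, and region labels are
purely illustrative; a real service would measure proximity and load
rather than consult a static table):

    # Hypothetical replica table for a replicated web service.
    REPLICAS = {
        "shop.example.com": {
            "east": "192.0.2.10",    # e.g., a Delaware replica
            "west": "192.0.2.20",    # e.g., a Seattle replica
        },
    }

    def resolve(name, client_region, default_region="east"):
        """Bind a service name to the replica nearest the client."""
        replicas = REPLICAS[name]
        return replicas.get(client_region, replicas[default_region])

    # An East Coast client is bound to the East Coast replica.
    print(resolve("shop.example.com", "east"))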

-- Wide area security. There is an obvious need for a national
infrastructure for secure, authenticated, accountable, and
revocable access to remote resources; several proposals have
been made for extending local area security mechanisms to the
wide area. Each relies on physically secure computers spread
around the country to provide a reliable, continuously
available framework for authentication and key distribution.

-- Distributed databases and cooperative web crawling. An
active area of research in the database community is how to
integrate geographically distributed data sets, for example,
combining various Web databases or NASA's EOSDIS into a
usable system capable of supporting queries that span multiple
sites. The Mariposa project at Berkeley, for example, has
proposed an architecture for dynamically moving data and
computation around to minimize network bandwidth and local
computation cost. Similarly, web crawling (building an index of
web data) needs to be geographically distributed to be scalable;
otherwise, all data on the Web must be shipped through a single
site, limiting the rate at which the Web can be crawled. Access
would provide a platform for testing and deploying new
approaches for integrating diverse wide area information
sources.
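
The scalability argument rests on partitioning the URL space so that
each crawl site fetches only its share of the Web, ideally from
nearby servers. A minimal sketch of one such partitioning follows
(the site names are hypothetical, and a real system would also
balance load and exploit geographic locality):

    import hashlib
    from urllib.parse import urlparse

    # Hypothetical Access sites participating in a cooperative crawl.
    CRAWL_SITES = ["access-nyc", "access-chi", "access-sea"]

    def site_for(url):
        """Assign a URL to a crawl site by hashing its host name, so
        all pages on one server are fetched (and politely rate-limited)
        by a single site."""
        host = urlparse(url).netloc
        digest = hashlib.md5(host.encode()).hexdigest()
        return CRAWL_SITES[int(digest, 16) % len(CRAWL_SITES)]

    print(site_for("http://www.example.edu/index.html"))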

-- Active Networks. DARPA's Active Networks program
provides an architecture for applications that can benefit from
computing in the network. Access would provide a platform for
validating the Active Networks virtual machine, operating
system architectures for hosting computation in the network,
resource allocation among competing applications that share
physical resources, etc.

3. Local Hardware and Software

As a research community, over the past few years we have
gained substantial experience with assembling and operating
machine-room-area clusters. Since our graduate students have
wanted to avoid having to run to the machine room every time
something goes wrong with our local clusters, we have also
gained considerable experience with remote cluster operation.
Prototypes of the hardware and software we describe below
exist at both Berkeley and the University of Washington.

Our strategy is to simplify operations by (i) using two extra PCs
to serve as fail-safe monitors of cluster operation and (ii)
having a standard baseline hardware and software configuration,
avoiding the management problems of trying to turn random
collections of PC's running random collections of software into
a usable system.

At each site, we envision:

2 control PC's to serve as fail-safe reboot engines, monitoring
the operation of the other PC's. The control PC's would drive
reboot serial lines (X.10) for all of the machines in the cluster
(including each other); new experiments (including potentially
new OS kernels) would be downloaded over the Internet to a control
PC and then installed on the relevant PC in the rack. The control
PC's would have GPS receivers to provide time synchronization for the
rest of the cluster. They would also provide services to
authenticate researchers installing new experiments, support for
remote debugging and installation, and self-test code to detect
and report hardware failures in the cluster. Finally, the control
PC's would passively monitor the cluster's Internet connection
for illegal use by the experimental PC's. As a fail-safe, the two
control PC's would not be used to run experimental software, allowing the
system to recover if an experiment leaves one of the PC's in the
rack in a corrupted state.
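
A minimal sketch of the fail-safe loop a control PC might run (host
names, thresholds, and the power-control step are illustrative; the
real system would pulse the failed node's X.10 reboot line over the
control PC's serial port):

    import subprocess, time

    NODES = ["node%02d" % i for i in range(1, 21)]   # the 20 experimental PC's
    MISSES = dict((n, 0) for n in NODES)
    MAX_MISSES = 3        # consecutive failed pings before a power cycle

    def alive(host):
        """One ICMP echo request; True if the node answered."""
        return subprocess.call(["ping", "-c", "1", "-W", "2", host],
                               stdout=subprocess.DEVNULL) == 0

    def power_cycle(host):
        # Placeholder: the real control PC would toggle the node's
        # X.10 outlet here.
        print("power-cycling", host)

    while True:
        for node in NODES:
            MISSES[node] = 0 if alive(node) else MISSES[node] + 1
            if MISSES[node] >= MAX_MISSES:
                power_cycle(node)
                MISSES[node] = 0
        time.sleep(60)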

20 PC's to serve as experimental apparatus. Each PC would be
configured with a reasonable amount of memory and disk, a
machine-room area network connection, and a wide-area
network connection. A disk is needed for experiments that require
local persistent storage, such as web caching, naming, and security,
and for logging of results. The console for each machine would
be reflected remotely to the researcher controlling the
experiment. One model of operation is that each experiment is
given its own PC to minimize interference between experiments;
in this case, each researcher would be free to wipe their
environment completely clean, download a new OS, reformat
the disk, etc. Experiments that share hardware would clearly not
be as flexible; they would have to be assigned to a PC running a
compatible operating system.

A high-speed machine room area network, such as Myrinet or
fast switched Ethernet, connecting all of the PC's in the cluster.
This network would be dedicated to the cluster and isolated from
any other machines at the site. With Myrinet, downloading a
new software distribution to each of the dozens of PC's in our
local cluster takes only a matter of minutes. Further, prototype
software exists to automatically manage a Myrinet network,
establishing routes, detecting and mapping around failures, etc.

A high-speed connection to the local GigaPOP (the local
connection point to the high-speed national networks). This
connection would be something like Gigabit Ethernet that can
be passively monitored by the control PC's. Ideally, the link
would be dedicated to the cluster, carrying only traffic explicitly
sent to applications running on Access; for obvious privacy
reasons, it is not our goal to provide a mechanism for
experiments to snoop uninvited on local traffic. Additionally,
we envision using policy routing at the GigaPOP to allow
flexible configuration of the wide area network used for each
experiment, whether plain old Internet, Abilene, HSCC,
SuperNet, or other provider.
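
The core of the passive check is simple: each experiment declares the
traffic it intends to generate, and the control PC's flag anything
outside that declaration. A sketch follows (the policy table, node
names, and port sets are illustrative; a real monitor would be fed by
a packet capture library such as libpcap on the GigaPOP link):

    # Hypothetical per-experiment traffic policy: which experimental
    # PC may send to which destination ports.
    POLICY = {
        "node01": {80, 443},      # web caching experiment
        "node02": {5004, 5005},   # MBone2 experiment (RTP/RTCP)
    }

    def violates_policy(src_node, dst_port):
        """Flag traffic from an experimental PC outside its declared ports."""
        allowed = POLICY.get(src_node)
        return allowed is not None and dst_port not in allowed

    # Example: the web caching node trying to send mail would be reported.
    print(violates_policy("node01", 25))   # True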

A local operator would be needed at each site to install the
system, to replace any failed hardware components, and to
reboot the control PC's in the event that both crash. Otherwise,
the local operator would have no software responsibilities for the
cluster.