fixes the problem we saw last summer where the last little packet of any
old TCP transfer was being delayed. btw, yasushi's name is in the
acknowledgments; The old Nagle algorithm must have been killing porky's
TPS...
neal
---------- Forwarded message ----------
Date: Fri, 18 Dec 1998 22:44:37 -0800
From: Greg Minshall <minshall@siara.com>
To: end2end-interest@ISI.EDU
Subject: a proposed modification to the Nagle algorithm
Members of the end2end-interest list,
I am enclosing a draft which discusses a proposed modification to the Nagle
algorithm (in particular, the algorithm Nagle described in RFC896 to deal with
the "small-packet problem".
Hopefully, the draft does an adequate job of describing the current Nagle
algorithm as well as the proposed modification.
The draft also discusses some of the motivation for suggesting this change.
However, more motivation is included in a paper:
Nielsen, Henrik Frystyk, et al, "Network Performance Effects
of HTTP/1.1, CSS1, and PNG", in Proceedings, ACM SIGCOMM 1997.
A web page associated with that paper may also be of interest to people
interested in this topic:
http://www.w3.org/Protocols/HTTP/Performance/Nagle/
(Please note that while, in certain cases, the current Nagle algorithm can
have a negative performance impact for certain applications, turning OFF the
Nagle algorithm can have a very serious negative impact on the internet.
Thus, nothing in this e-mail message, or in the enclosed draft, should be
taken as advice to any application developer to disable the Nagle algorithm.
The current Nagle algorithm is very important in protecting the health of the
internet; the proposed modification [hopefully] provides the same level of
protection.)
I would be very interested in hearing people's thoughts about the proposed
modification. In particular, if people can think of ways in which the
proposed modification might have a serious negative impact on the internet
(see the section "A Failure Mode" for one possible line of reasoning), that
would be very interesting to hear!
(You may notice that the paper includes a discussion of a possible API change
to give applications better control over TCP's transmission of small packets.
While i believe this to be a beneficial change to the API, this is not the
main thrust of the paper and would not, therefore, be the main thrust of our
subsequent discussion of the paper.)
Thanks in advance for your careful reading and comments,
Cheers, Greg Minshall
ps -- I am about to disappear for a week's vacation. I wanted to get this
draft out now in order to give people a chance to comment on it, but *i* won't
be able to participate in the discussion in real time until the week of
December 28. Think of all those flames you are about to send me as being
community Christmas cards!
----
Internet Engineering Task Force Greg Minshall
INTERNET-DRAFT Siara Systems
draft-minshall-nagle-00 December 18, 1998
A Proposed Modification to Nagle's Algorithm
Status of This Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet- Drafts
as reference material or to cite them other than as ``work in
progress.''
To view the entire list of current Internet-Drafts, please check
the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US
West Coast).
This draft proposes a modification to Nagle's algorithm (as
specified in RFC896) to allow TCP, under certain conditions, to
send a small sized packet immediately after one or more maximum
segment sized packet.
Abstract
The Nagle algorithm is one of the primary mechanisms which protects
the internet from poorly designed and/or implemented applications.
However, for a certain class of applications (notably,
request-response protocols) the Nagle algorithm interacts poorly
with delayed acknowledgements to give these applications poorer
performance.
This draft is NOT suggesting that these applications should di
sable the Nagle algorithm.
This draft suggests a fairly small and simple modification to the
Nagle algorithm to preserve Nagle as a means of protecting the
internet while at the same time giving better performance to a
wider class of applications.
Introduction to the Nagle algorithm
The Nagle algorithm [RFC896] protects the internet from
applications (most notably Telnet, at the time the algorithm was
developed) which tend to dribble small amounts of data to TCP.
Without the Nagle algorithm, TCP would transmit a packet, with a
small amount of data, in response to each of the application's
writes to TCP. With the Nagle algorithm, a first small packet will
be transmitted, then subsequent writes from the application will be
buffered at the sending TCP until either i) enough application data
has accumulated to enable TCP to transmit a maximum sized packet,
or ii) the initial small packet is acknowledged by the receiving
TCP. This limits the number of small packets to one per round trip
time.
While the current Nagle algorithm does a very good job of
protecting the internet from such applications, there are other
applications, such as request-response protocols (with HTTP 1.1
being a topical example) in which the current Nagle algorithm
produces non-optimal results. In this context, the Nagle algorithm
is interacting with TCP's ``delayed ACK'' policy [RFC1122].
Delayed ACKs
A receiving TCP tries to avoid acknowledging every received data
packet. This process, known as ``delayed ACKing'' [RFC1122],
typically causes an ACK to be generated for every other received
(full-sized) data packet. In the case of an ``isolated'' TCP
packet (i.e., where a second TCP packet is not going to arrive
anytime soon), the delayed ACK policy causes an acknowledgement for
the data in the isolated packet to be sent within 200 milliseconds
of the receipt of the isolated packet. (The way delayed ACKs are
implemented in some systems causes the delayed ACK to be generated
anytime between 0 and 200ms; in this case, the average amount of
time before the delayed ACK is generated is 100ms.)
The interaction of delayed ACKs and Nagle
If a TCP has more application data to transmit than will fit in one
packet, but less than two full-sized packets' worth of data, it
will transmit the first packet. As a result of Nagle, it will not
transmit the second packet until the first packet has been
acknowledged. On the other hand, the receiving TCP will delay
acknowledging the first packet until either i) a second packet
arrives (which, in this case, won't arrive), or ii) approximately
100ms (and a maximum of 200ms) has elapsed.
When the sending TCP receives the delayed ACK, it can then transmit
its second packet.
In a request-response protocol, this second packet will complete
either a request or a response, which then enables a succeeding
response or request.
Note two (related) bad results of the interaction of delayed ACKs
and the Nagle algorithm in this case: the request-response time may
be increased by up to 400ms (if both the request and the response
are delayed); and, the number of transactions per second is
substantially reduced.
A proposed modification to the Nagle algorithm
The current Nagle algorithm can be described as follows:
If a TCP has less than a full-sized packet to transmit,
and if any previous packet has not yet been acknowledged,
do not transmit a packet.
The proposed Nagle algorithm modifies this as follows:
If a TCP has less than a full-sized packet to transmit,
and if any previous less than full-sized packet has not
yet been acknowledged, do not transmit a packet.
In other words, when running Nagle, only look at the recent
transmission (and acknowledgement) of small packets (rather than
all packets, as in the current Nagle).
(In writing the above, I am aware that TCP acknowledges BYTES, not
packets. However, expressing the algorithm in terms of packets
seems to make the explanation a bit clearer.)
Implementation of the modified Nagle algorithm in a system
The current Nagle algorithm does not require any more state to be
kept by TCP on a system. SND_NXT is a TCP variable which names the
next byte of data to be transmitted. SND_UNA is a TCP variable
which names the next byte of data to be acknowledged. If SND_NXT
equals SND_UNA, then all previous packets have been acknowledged.
The proposed modification to the Nagle algorithm does,
unfortunately, require one new state variable to be kept by TCP.
SND_SML is a TCP variable which names the last byte of data in the
most recently transmitted small packet.
An implementation could be as follows:
1. When transmitting a small packet, record the sequence
number of the last byte of the small packet in SND_SML.
2. When deciding whether or not to transmit a small packet,
check to ensure that SND_SML is less than, or equal to,
SND_UNA.
A Failure Mode
If an application sends a large amount of data, followed by a small
amount of data, followed by a large amount of data, the current
Nagle algorithm would perform better than the proposed
modification. The current Nagle algorithm would send at most one
small packet (possibly the last packet), delaying the middle
(small) amount of data which would allow the application to send
the following large amount of data; the proposed Nagle algorithm
would send two small packets (the middle packet, plus possibly a
last packet).
A separate, but desirable, system facility
In addition to the Nagle algorithm (or the modification proposed by
this draft), it would be desirable for a system providing TCP
service to applications to allow the application to set TCP into a
mode in which the TCP would only transmit small packets at the
explicit direction of the application. For example, a system based
on BSD might implement a socket option (using setsockopt(2))
SO_EXPLICITPUSH, as well as a flag to sendto(2) (possibly
overloading the semantics of an existing flag, such as MSG_EOF).
In this scenario, an application would set a socket into
SO_EXPLICITPUSH mode, then enter a mode of writing data to the
socket and, at the last write, using send(2) with the MSG_EOF flag.
The underlying TCP would recognize the MSG_EOF flag as an indicator
to transmit the (possibly) small packet.
Like the proposed modification to the Nagle algorithm, this is
fairly simple to implement.
If a system were to implement this interface, it would be important
to NOT disable Nagle when using this interface. In other words,
when using this interface, the default mode for TCP would be to NOT
transmit a small packet (even in the presence of MSG_EOF) if a
previously transmitted small packet was as yet unacknowledged.
Note, also, that implementing this interface does not eliminate the
desirability of using the modification of the Nagle as the default
for applications. More sophisticated networking applications might
well use the new interface, but naive applications will often be
adequately served by the modified Nagle algorithm.
Acknowledgements
Jim Gettys, Henrik Frystyk Nielsen, Jeff Mogul, and Yasushi Saito,
as well as a message forwarded to the end2end-interest list by Sean
Doran, have motivated my current interest in the Nagle algorithm.
John Heidemann's work related to the Nagle algorithm has informed
some of the thinking in this draft; discussions with John have also
been helpful. Members of the End-to-End Research Group (under
the direction of Bob Braden) patiently listened to my discussion of
the current state of the Nagle algorithm and to the modifications
proposed in this document.
Security Considerations
The Nagle algorithm does not have major security consequences.
Implementation of this algorithm should not negatively impact
the performance of the internet. The negative impact of
implementation of this algorithm should be significantly less
than disabling the Nagle algorithm.
References
[RFC896] Nagle, J., "Congestion control in IP/TCP internetworks",
Jan-06-1984.
[RFC1122] Braden, R. T., "Requirements for Internet hosts -
communication layers", Oct-01-1989.
Author's Addresses
Greg Minshall
Siara Systems
1399 Charleston Road
Mountain View, CA 94043
USA