TCP Selective Acknowledgement option (and related changes) for FreeBSD

The file sack.diffs includes a number of modifications to TCP designed to improve performance in the presence of losses, namely MODIFIED FAST RETRANSMIT, NEWRENO, SACK and TSACK. Some cleanup of the TCP code is also present. The software is in alpha stage, although it has been running for a couple of weeks in intermediate forms, and it has been running on a couple of our systems since Aug. 26, 1996.

MODIFIED FAST RETRANSMIT is really helpful on lossy links, and does not need modifications at the receiver side. The same holds for NEWRENO.

SACK (and/or TSACK), especially if paired with MODIFIED FAST RETRANSMIT, can give some improvement in throughput, but only with sufficiently large windows and a low loss rate.

The diffs are against FreeBSD 2.1R, although they should be easy to port to other BSD-derived systems. Most options must be enabled in the kernel config file via

	option	SPECIFICOPTIONNAME
and must then be activated at run time via a sysctl variable. Since this code is evolving, please check http://www.iet.unipi.it/~luigi/research.html to see if there is a newer version. Note that this code still has some diagnostic output, which goes to /var/log/messages.

Bugs, fixes and suggestions can be reported to me at rizzo@iet.unipi.it


A brief description of all changes included in sack.diffs follows:

CLEANUP OF BSD CODE

MODIFIED FAST RETRANSMIT

NEWRENO (following a suggestion by J. Hoe)

In Reno, after a fast retransmit, a non-dup ack causes exit from fast recovery. However, if multiple segments were lost in the same window, three more dupacks might be needed to detect the next loss, and a subsequent fast retransmit would shrink the window even further. We save the value of snd_max in snd_max_rxmt at the time of the fast retransmit; if a later ack advances snd_una but not as far as snd_max_rxmt, the segment at snd_una has been lost and can be retransmitted immediately.
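The partial-ack test described above can be sketched as follows. This is only an illustration, not the code in sack.diffs: struct mini_tcpcb, enter_fast_recovery() and newreno_ack() are hypothetical names, and only snd_una, snd_max and snd_max_rxmt come from the text.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-safe sequence comparisons, in the style of the BSD SEQ_* macros. */
#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

/* Hypothetical, reduced control block: only the fields used in this sketch. */
struct mini_tcpcb {
    tcp_seq snd_una;       /* oldest unacknowledged sequence number */
    tcp_seq snd_max;       /* highest sequence number sent so far */
    tcp_seq snd_max_rxmt;  /* snd_max saved at the time of the fast retransmit */
    int     in_recovery;   /* nonzero while in fast recovery */
};

/* Assumed to be called when dupacks trigger a fast retransmit. */
void enter_fast_recovery(struct mini_tcpcb *tp)
{
    tp->snd_max_rxmt = tp->snd_max;
    tp->in_recovery = 1;
}

/*
 * Process an incoming ack value.
 * Returns 1 if the segment at the new snd_una should be retransmitted
 * immediately (the ack did not reach snd_max_rxmt), 0 otherwise.
 */
int newreno_ack(struct mini_tcpcb *tp, tcp_seq ack)
{
    if (!SEQ_GT(ack, tp->snd_una))
        return 0;                /* not a new ack: nothing to do here */
    tp->snd_una = ack;
    if (!tp->in_recovery)
        return 0;
    if (SEQ_LT(ack, tp->snd_max_rxmt))
        return 1;                /* another hole in the same window */
    tp->in_recovery = 0;         /* all data sent before the loss is acked */
    return 0;
}
```

The sketch stays in recovery until snd_una reaches snd_max_rxmt, so each further hole in the window costs one retransmission rather than three more dupacks.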

SACK

This is an implementation of the SACK options as described in the recent internet draft, with which it is fully compliant. The maximum lifetime of SACK information can be set to 0 or more timeouts. The retransmission strategy, during fast recovery, is as follows: if new data can be sent within snd_wnd and snd_cwnd, then do it. Otherwise, old blocks (up to, but not beyond, the last SACKed block) are sent again. There is currently no provision to resend the block at snd_una if it has been lost twice (a solution is in the works).
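A rough sketch of the decision described above, under simplifying assumptions: struct rec_state, its snd_end, rxmt_next and sack_last fields, and recovery_action() are hypothetical and do not appear in sack.diffs; the window test counts all outstanding data, including SACKed blocks.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-safe "less than" on sequence numbers. */
#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)

enum rec_action { SEND_NEW_DATA, RESEND_OLD_BLOCK, SEND_NOTHING };

/* Hypothetical view of the sender state during fast recovery. */
struct rec_state {
    tcp_seq  snd_una;    /* oldest unacknowledged byte */
    tcp_seq  snd_max;    /* highest byte sent so far */
    tcp_seq  snd_end;    /* end of the data queued in the send buffer (assumed) */
    tcp_seq  rxmt_next;  /* next un-SACKed old byte to resend (assumed) */
    tcp_seq  sack_last;  /* start of the last (highest) SACKed block (assumed) */
    uint32_t snd_wnd;    /* window advertised by the receiver */
    uint32_t snd_cwnd;   /* congestion window */
};

enum rec_action recovery_action(const struct rec_state *s)
{
    uint32_t win = s->snd_wnd < s->snd_cwnd ? s->snd_wnd : s->snd_cwnd;
    uint32_t outstanding = s->snd_max - s->snd_una;

    /* New data first, if there is any and it fits in both windows. */
    if (SEQ_LT(s->snd_max, s->snd_end) && outstanding < win)
        return SEND_NEW_DATA;
    /* Otherwise resend old data, never going past the last SACKed block. */
    if (SEQ_LT(s->rxmt_next, s->sack_last))
        return RESEND_OLD_BLOCK;
    return SEND_NOTHING;
}
```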

TSACK

This is a simplified version of SACK, which carries SACK information embedded in slightly modified RFC1323 timestamps. There are some tradeoffs in using TSACKs instead of SACKs (almost no need for receiver support, but less precise SACK information), and TSACKs have some advantage over SACKs in some cases.

ARTIFICIAL LOSSES

In order to test the behaviour of the above code, there is a new function, tcp_dropit(), which allows some incoming data and ack packets to be dropped. Currently the drop rate is 10% for data segments and 5% for pure acks. Segments are dropped using a repeating pattern of 499 segments, in order to make results a bit more reproducible (they are not fully reproducible anyway, because the actual generation of ACKs depends on the behaviour of the receiver process and there is some interaction with timeouts).
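One way to get a deterministic, repeating drop pattern of this kind is sketched below. It is not the body of tcp_dropit(), whose internals are not described here: dropit_sketch(), DROP_PERIOD and the modulo rules are assumptions chosen only to approximate the 10% and 5% rates over a 499-segment period.

```c
#define DROP_PERIOD 499          /* length of the repeating pattern */

static unsigned drop_pos;        /* current position within the pattern */

/*
 * Hypothetical drop decision for one incoming segment.
 * Returns 1 if the segment should be dropped, 0 otherwise.
 * Data segments are dropped at every 10th position (about 10%),
 * pure acks at every 20th position (about 5%).
 */
int dropit_sketch(int is_pure_ack)
{
    unsigned pos = drop_pos;

    drop_pos = (drop_pos + 1) % DROP_PERIOD;
    return is_pure_ack ? (pos % 20 == 0) : (pos % 10 == 0);
}
```

Because the decision depends only on the position counter, two runs that receive the same sequence of segments drop the same ones; the rates are exact only when a whole period consists of one kind of segment.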

All the above mechanisms can be enabled by setting the variable

	net.inet.tcp.sack
as follows:

	SACK lifetime	0..15	(0 and 1 are equivalent)
	SACK		0x10	enables sack negotiation and processing
	TSACK		0x20	enables TSACK generation
	MODIFIED_FR	0x40	enables modified fast retransmit
	NEWRENO		0x80	enables newreno
	LOSSY		0x100	enables dropping incoming data/acks

The following kernel options are needed:

	option	TSACK		enables TSACK generation
	option	SACK		enables SACK code, TSACK processing, LOSSY

Newreno and modified fast retransmit are compiled in by default.
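To make the bit layout concrete, here is a small sketch that decodes a value of net.inet.tcp.sack according to the table above. The macro names, struct sack_cfg and decode_sack_sysctl() are hypothetical; only the numeric values come from the table.

```c
/* Bit layout of net.inet.tcp.sack, per the table above (names are hypothetical). */
#define SACKV_LIFETIME_MASK 0x00f   /* SACK lifetime, 0..15 */
#define SACKV_SACK          0x010   /* SACK negotiation and processing */
#define SACKV_TSACK         0x020   /* TSACK generation */
#define SACKV_MODIFIED_FR   0x040   /* modified fast retransmit */
#define SACKV_NEWRENO       0x080   /* newreno */
#define SACKV_LOSSY         0x100   /* drop incoming data/acks */

struct sack_cfg {
    int lifetime;
    int sack, tsack, modified_fr, newreno, lossy;
};

/* Split a sysctl value into its individual settings. */
struct sack_cfg decode_sack_sysctl(unsigned v)
{
    struct sack_cfg c;

    c.lifetime    = (int)(v & SACKV_LIFETIME_MASK);
    c.sack        = (v & SACKV_SACK) != 0;
    c.tsack       = (v & SACKV_TSACK) != 0;
    c.modified_fr = (v & SACKV_MODIFIED_FR) != 0;
    c.newreno     = (v & SACKV_NEWRENO) != 0;
    c.lossy       = (v & SACKV_LOSSY) != 0;
    return c;
}
```

For example, the value 0xd3 would mean NEWRENO, MODIFIED_FR and SACK enabled with a SACK lifetime of 3, and TSACK and LOSSY disabled.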

You might also need the following changes to sysctl and netstat. The former needs to be recompiled with the new tcp_var.h. The sysctl patch below just allows you to enter values as hex numbers as well as decimal ones.
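The effect of switching from atoi() to sscanf() with "%i" can be seen in isolation with the small sketch below; parse_int_value() is a hypothetical helper, not part of sysctl.c.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse an integer the way the patched sysctl does: "%i" accepts
 * decimal, 0x-prefixed hexadecimal and 0-prefixed octal input.
 * Returns 0 on success, -1 if no number could be read.
 */
int parse_int_value(const char *s, int *out)
{
    return sscanf(s, "%i", out) == 1 ? 0 : -1;
}
```

With this helper, "0xd0" and "208" both yield 208, whereas atoi("0xd0") stops at the 'x' and returns 0. One side effect worth keeping in mind is that "%i" reads a leading 0 as octal, so "010" yields 8.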

The patch to netstat (which also needs to be recompiled) is there to allow you to see the additional statistics variables in the tcpstat structure. Since these variables are allocated at the bottom of the structure, an older netstat will still work; it just won't print all the available info.

diff -cbwr /usr.sbin/sysctl/sysctl.c ./sysctl.c
*** /cdrom/usr/src/usr.sbin/sysctl/sysctl.c     Sun Jun 11 06:32:58 1995
--- ./sysctl.c  Mon Aug 19 16:28:31 1996
***************
*** 342,348 ****
        if (newsize > 0) {
                switch (type) {
                case CTLTYPE_INT:
!                       intval = atoi(newval);
                        newval = &intval;
                        newsize = sizeof intval;
                        break;
--- 342,349 ----
        if (newsize > 0) {
                switch (type) {
                case CTLTYPE_INT:
!                       sscanf(newval, "%i", &intval); /* XXX */
!                       /* intval = atoi(newval); */
                        newval = &intval;
                        newsize = sizeof intval;
                        break;

diff -cbwr netstat/inet.c /usr/src/usr.bin/netstat/inet.c
*** netstat/inet.c      Sat Jul 29 11:42:54 1995
--- /usr/src/usr.bin/netstat/inet.c     Fri Aug 23 17:02:49 1996
***************
*** 227,233 ****
--- 227,243 ----
        p(tcps_conndrops, "\t%d embryonic connection%s dropped\n");
        p2(tcps_rttupdated, tcps_segstimed,
                "\t%d segment%s updated rtt (of %d attempt%s)\n");
+       p(tcps_zerodupw, "\t%d invalid dupack reset on window update\n");
        p(tcps_rexmttimeo, "\t%d retransmit timeout%s\n");
+       p(tcps_rexmt[0], "\t\t%d retransmit timeout with 0 dup acks\n");
+       p(tcps_rexmt[1], "\t\t%d retransmit timeout with 1 dup acks\n");
+       p(tcps_rexmt[2], "\t\t%d retransmit timeout with 2 dup acks\n");
+       p(tcps_fastretransmit, "\t%d fast retransmit%s\n");
+       p(tcps_fastrexmt[0], "\t\t%d with 1 dup ack\n");
+       p(tcps_fastrexmt[1], "\t\t%d with 2 dup ack\n");
+       p(tcps_fastrexmt[2], "\t\t%d with 3 dup ack\n");
+       p(tcps_newreno, "\t%d newreno retrans\n");
+       p(tcps_fastrecovery, "\t%d fast recovery\n");
        p(tcps_timeoutdrop, "\t\t%d connection%s dropped by rexmit timeout\n");
        p(tcps_persisttimeo, "\t%d persist timeout%s\n");
        p(tcps_persistdrop, "\t\t%d connection%s dropped by persist timeout\n");