*** INSTALL NOTES for PGM *** Luigi Rizzo (luigi@iet.unipi.it) *** 14 August 1999

>>> ONLY THE FIRST TIME YOU INSTALL THIS CODE, you need to patch a few
system files with the hooks for pgm, using the patchfile named

    kpgm--.patch

The files affected are at most the following:

    /sys/conf/files
    /sys/conf/options
    /sys/netinet/in.h
    /sys/netinet/in_proto.c

(but newer releases of FreeBSD might already include the necessary
hooks). To patch your system sources, do the following:

    cd /
    patch < YOUR_PATCH_FILE

(patches are hand-edited and small; if something goes wrong or you
don't trust the automatic process, you can apply them by hand).

>>> Every time you install a new release, replace the files indicated
below with the ones coming from the archive:

  * copy the following four files to /sys/netinet

        pgm.h pgm_var.h pgm_timer.c pgm_usrreq.c

  * copy the following two files to /usr/include/netinet

        pgm.h pgm_var.h

  * copy the following file to /usr/share/man/man4

        pgm.4

To compile a pgm-enabled kernel you need to put

    options PGM

in your kernel config file, then use the usual sequence to build a
kernel (config, make depend, make).

BUILDING APPLICATIONS

The command "man pgm" gives some information on how to build
applications using PGM (the interface is very simple) and on the sysctl
variables that control PGM parameters. The "sample/" directory contains
a few sample applications.

README.pgm -- architecture of PGM implementation

The code is made of four files:

    pgm.h         main include file with PGM definitions.
    pgm_var.h     protocol control blocks, data structures and kernel
                  procedures
    pgm_timer.c   timers and input handling
    pgm_usrreq.c  user calls and output handling

PGM sockets are created with the socket() system call, initially in an
uncommitted state (PGM_NEW) where they cannot do any I/O.
Senders initialize a socket using the bind() and connect() system
calls. bind() is optional and is used to select a local port to be used
in the transmission -- the port is part of the TSI for the session. If
used, bind() must be called before connect(). connect() is used to
select the multicast (or unicast ?) address and port of the
destination. It also starts the state machine for the sender, starting
the transmission of SPM for the session.

Receivers are initialized with the bind() system call only, in one of
the two available modes: raw receiver or full receiver.

Raw receiver sockets are created by passing to the bind() call only the
local endpoint, and a zero TSI. This makes the socket able to receive
all PGM packets addressed to that group/port, but without running the
PGM state machine or sending NAKs. This mode can be used to learn the
TSI associated with a session.

Full receivers are created by passing a fully instantiated sockaddr_pgm
to the bind() call, i.e. including the full TSI for the session. Such a
call starts the full PGM receiver state machine (including data
reassembly, reordering, and repair requests).

PROTOCOL CONTROL BLOCK

A single control block is used, but it really contains info for both
sender and receiver state (this might be fixed in a future release by
separating sender and receiver descriptors).

Common fields include:
  + the usual back pointer to struct inpcb;
  + descriptor state name;
  + TSI used for the session;
  + network layer addresses for the multicast group and next hop;
  + socket buffer sizes;
  + a template header used for transmissions.

Sender variables include:
  + window variables;
  + a pointer into the socket buffer to decide which ODATA packet is
    the next one to transmit;
  + a queue of pending retransmit requests;
  + a timeout queue used to expire packets from the window;
  + state variables used by the traffic shaper;
  + variables used to schedule SPM.
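As an illustration of the sender setup described earlier (optional
bind() to pick the source port, then connect() to the destination), here
is a minimal userland sketch. It is not taken from the sample/
directory. The names pgm_addr_sketch, PGM_SKETCH_PROTO and the
pgm_sketch_*() helpers are hypothetical; the structure mirrors
struct sockaddr_pgm from pgm.h, the protocol number 113 comes from the
kpgm patches, and storing ports in network byte order is an assumption
made by analogy with sockaddr_in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define PGM_SKETCH_PROTO 113    /* IPPROTO_PGM as defined by the kpgm patches */

/* Hypothetical mirror of struct sockaddr_pgm (see pgm.h). */
struct pgm_addr_sketch {
    uint8_t        sin_len;
    uint8_t        sin_family;
    uint16_t       sin_port;    /* assumed network byte order */
    struct in_addr sin_addr;
    uint32_t       gsid_low;
    uint16_t       gsid_high;
    uint16_t       sport;
};

/* Address for the optional sender bind(): null GSI, INADDR_ANY, and the
 * chosen source port in both sin_port and sport. */
static struct pgm_addr_sketch
pgm_sketch_sender_bind_addr(uint16_t sport_host)
{
    struct pgm_addr_sketch a;

    memset(&a, 0, sizeof(a));
    a.sin_len = (uint8_t)sizeof(a);
    a.sin_family = AF_INET;
    a.sin_port = htons(sport_host);
    a.sin_addr.s_addr = htonl(INADDR_ANY);
    a.sport = a.sin_port;
    return a;
}

/* Destination for connect(): group address and destination port.
 * Returns 0 on success, -1 if the address string cannot be parsed. */
static int
pgm_sketch_dest_addr(struct pgm_addr_sketch *a, const char *group,
    uint16_t dport_host)
{
    memset(a, 0, sizeof(*a));
    a->sin_len = (uint8_t)sizeof(*a);
    a->sin_family = AF_INET;
    a->sin_port = htons(dport_host);
    return inet_pton(AF_INET, group, &a->sin_addr) == 1 ? 0 : -1;
}

/* socket() -> optional bind() -> connect(). Returns the descriptor, or
 * -1 on any failure (including kernels without PGM support). */
static int
pgm_sketch_open_sender(const char *group, uint16_t dport, uint16_t sport)
{
    struct pgm_addr_sketch local, dest;
    int fd;

    if (pgm_sketch_dest_addr(&dest, group, dport) != 0)
        return -1;
    fd = socket(AF_INET, SOCK_SEQPACKET, PGM_SKETCH_PROTO);
    if (fd < 0)
        return -1;
    if (sport != 0) {           /* bind() is optional, but must come first */
        local = pgm_sketch_sender_bind_addr(sport);
        if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0)
            goto fail;
    }
    if (connect(fd, (struct sockaddr *)&dest, sizeof(dest)) < 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}
```

On a kernel built without "options PGM" the socket() call is expected to
fail, so pgm_sketch_open_sender() simply returns -1 there; the two
address helpers can be exercised anywhere.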
Receiver data structures are the biggest ones and include:
  + window variables;
  + a queue of descriptors for each present or missing packet;
    descriptors for missing packets also contain retransmit state.

CODE PATHS

User calls are handled via the usual socket primitives and protosw[]
calls. Unsupported functions (such as listen() and accept()) return an
error; supported ones are routed to the corresponding pgm_XXX() calls.

socket() invokes pgm_attach(), which in turn creates a new descriptor
in state PGM_NEW.

bind() is only valid in state PGM_NEW. It calls pgm_bind(), which
initializes the state of the descriptor and possibly commits to one of
the available modes.

connect() is only valid in state PGM_NEW or PGM_SENDER. It completes
initialization of sender sockets and starts the generation of SPM, also
enabling transmission on the socket.

close() and shutdown() cause the socket to be closed, state to be
flushed...
TODO: check if we can leave sender sockets lingering for pending repair
requests.

write() and friends result in a call to pgm_send(), which schedules
transmission of a packet. The packet is queued into the socket buffer,
and transmitted through the traffic shaper at the programmed rate.
Zero-length writes should not be allowed, as they might be confused
with other situations such as a hole or end of transmission.

read() and friends just grab packets from the socket buffer, where they
have been placed by the input handlers described below. In case of a
non-recoverable hole in the data stream, a read returns a zero-length
segment (TODO see if can be made to return an error) and a specific
getsockopt() must be invoked to determine the number of missing
segments and restart regular input.

INPUT HANDLING

Incoming PGM packets are passed to pgm_input(), which does basic
checks, strips the IP header, and passes the packet to all matching
sockets.
This code uses a trick (borrowed from udp_input) to avoid unnecessary
copies: a loop scans the descriptors, and when a matching socket is
found the packet is first copied and delivered to the previously
matching socket, and then the new socket is recorded. A final pass
takes care of the last matching socket, this time without copying.

Delivery is done inline for raw receivers, by calling pgm_rx_in() for
full receivers, and by immediately sending an NCF and inserting the
request in the repair queue for sender sockets (further interventions
will be done by the timer; XXX check that there is no deadlock).

pgm_rx_in() runs the PGM receiver state machine, mostly following the
spec except for some deviations done for efficiency.

TIMERS

The PGM layer makes use of two timers. One is coarse-grained (0.5s) and
is used to handle retransmissions, backoffs and packet expiration. It
is called by slowtimo().

The second one is fine-grained, called once per timer tick (i.e. every
1..10ms), and is used to schedule packet transmissions on senders
(namely, ODATA and RDATA). It is only active when there are sender
sockets with pending ODATA/RDATA.

DATA STRUCTURES

The reassembly queue is a list of records with head and tail pointers.
Assuming ODATA arrive more or less in order, the critical point is the
search for the insertion point for repairs. In addition to managing a
free list of descriptors in the socket, explore the following
arrangements:
  + use an array to hold all descriptors -- search time is constant,
    but we have a potential waste of memory and the cost of realloc.
  + collapse data segments in a single mbuf chain -- this way already
    existing blocks can be skipped in one step, and recovery is also
    easier.
  + potentially, allocate small arrays for burst losses. However, the
    typical burst size might be so small as to nullify any advantage.

The expire queue is currently a FIFO queue. Enhancements are very
simple:
  + manage a free list autonomously;
  + collapse entries for packets expiring at the same time.
This is trivial, just add a count field. Ought to implement some form of fair queueing in the output scheduler. pgm.4100644 423 0 20634 6753755042 10367 0ustar luigiwheel.\" Copyright (c) 1999 .\" Luigi Rizzo .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $Id$ .\" .Dd August 9, 1999 .Dt PGM 4 .Os FreeBSD 3.2 .Sh NAME .Nm pgm .Nd PGM multicast Protocol .Sh SYNOPSIS .Fd #include .Fd #include .Fd #include .Fd #include .Ft int .Fn socket AF_INET SOCK_SEQPACKET IPPROTO_PGM .Sh DESCRIPTION The .Tn PGM protocol provides support for reliable multicast data transfer from one source to many receivers. It implements the .Dv SOCK_SEQPACKET abstraction on multicast sockets. 
PGM uses the standard Internet address format and, in addition,
provides a per-host collection of
.Dq port addresses .
Furthermore, a PGM session is identified by a 64-bit value called TSI,
which is made of a 16-bit
.Dv source port
(SPORT) and a 48-bit unique host identifier called
.Dv GSI .
.Pp
PGM traffic can be transmitted on one or more multicast groups (or even
unicast addresses ?).
.Pp
Sockets utilizing the PGM protocol are unidirectional, and configured
in any of the following states: unbound, sender, raw receiver, full
receiver.
.Pp
PGM sockets are created with the
.Xr socket 2
system call in an
.Dq unbound
state where they can neither send nor receive any traffic.
.Sh PGM RECEIVERS
PGM receivers must issue the
.Xr bind 2
system call to configure the socket as a
.Dq raw receiver
or
.Dq full receiver .
Bind is called with a struct sockaddr_pgm structure, described below,
which is an extension of struct sockaddr_in:
.Bd -literal -offset indent
struct sockaddr_pgm {
	u_int8_t	sin_len;
	u_int8_t	sin_family;
	u_int16_t	sin_port;
	struct in_addr	sin_addr;
	u_int32_t	gsid_low ;
	u_int16_t	gsid_high ;
	u_int16_t	sport ;
};
.Ed
sin_family must contain AF_INET.
sin_addr must contain the multicast address used for the communication.
sin_port must contain the local port (DPORT) used for the communication.
Binding to an address with a null TSI (i.e. gsid_low, gsid_high, sport
are all zero) causes the socket to become a
.Dq raw receiver ,
which simply passes up all received packets matching the specified
address.
This allows, for example, a receiver to listen to a given multicast
group/port and collect the TSI to be used for receiving the actual data.
.Pp
Binding to an address that contains a full TSI makes the socket a
.Dq full receiver ,
where the full PGM receiver state machine is run and only data packets
are delivered to the reader, in sequential order, using the PGM
recovery procedures to collect missing packets.
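The distinction between the two receiver modes depends only on the TSI
fields passed to bind. The following sketch shows one way an
application might check which mode a given address would select, and
how the 48-bit GSI and 16-bit SPORT combine into a single 64-bit value.
The names tsi_sketch, rx_mode_for and tsi_pack are hypothetical, any
non-null TSI is treated as full (a simplification), and the packed
layout (GSI in the upper 48 bits) is an assumption chosen for
illustration, not a wire format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the TSI fields of struct sockaddr_pgm. */
struct tsi_sketch {
    uint32_t gsid_low;   /* low 32 bits of the GSI */
    uint16_t gsid_high;  /* high 16 bits of the GSI */
    uint16_t sport;      /* source port of the session */
};

enum rx_mode_sketch { RX_RAW, RX_FULL };

/* A null TSI (all three fields zero) selects a raw receiver; this
 * sketch treats every other value as a full receiver. */
static enum rx_mode_sketch
rx_mode_for(const struct tsi_sketch *t)
{
    if (t->gsid_low == 0 && t->gsid_high == 0 && t->sport == 0)
        return RX_RAW;
    return RX_FULL;
}

/* Pack the TSI into 64 bits: 48-bit GSI followed by 16-bit SPORT.
 * The field order is an assumption, useful for logging or comparison. */
static uint64_t
tsi_pack(const struct tsi_sketch *t)
{
    uint64_t gsi = ((uint64_t)t->gsid_high << 32) | t->gsid_low;

    return (gsi << 16) | t->sport;
}
```

A raw receiver could fill a tsi_sketch from the headers of the packets
it reads and hand the values to a second socket bound as a full
receiver.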
.Pp
Note that
.Xr bind 2
cannot be executed twice on the same socket.
This means that a raw receiver socket cannot be transformed at a later
time into a full receiver.
A new socket must be created instead, and the old one can be closed or
kept open.
.Pp
Note that an application must issue an explicit
.Xr setsockopt 2
call to JOIN the desired multicast group (matching the address
specified in the bind() call) in order to actually receive the
multicast data.
.Sh Receiving data
A receiver will normally read incoming packets with
.Xr read 2 ,
.Xr recv 2
or
.Xr recvmsg 2
calls.
A raw receiver will simply return packets in the order they are
received, including the PGM headers.
A full receiver will normally return sequenced packets, or an error
when an unrecoverable hole occurs.
In this case, normal reception can only be resumed by issuing the
.Dq PGM_HOLE_SIZE
.Xr getsockopt 2
call to determine the size of the hole (in packets) and re-enable the
delivery of data packets to the application.
.Sh PGM SENDERS
.Dq sender
sockets require a
.Xr connect 2
system call to be issued before being able to transmit.
The connect() call is used to specify the multicast address and port
used for transmission.
Optionally, a
.Xr bind 2
can be invoked before the connect to set the SPORT part of the TSI.
In this case, the struct sockaddr_pgm should contain a null GSI,
INADDR_ANY as sin_addr, a non-null sin_port and the same value in sport.
.Sh Sending data
Normally, a sender will use any of the
.Xr write 2
or
.Xr send 2
calls to transmit packets; these are buffered at the PGM sender socket
and transmitted according to the PGM scheduler.
Socket options can be used to determine the window advance method.
.Sh OPTIONS
.Tn PGM
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width PGMx
.It Dv PGM_TXW_SIZE
sets the maximum size of the transmit window, in bytes.
.It Dv PGM_TXW_MAX_RATE
sets the maximum rate to be used for the transmission of DATA and SPM
packets.
.It Dv PGM_TRAIL_ADVANCE
specifies the trail advance policy.
Currently three are supported: ...
.It Dv PGM_ODATA_LIFETIME
specifies the lifetime of ODATA packets -- used when the
.Dq advance with time
trail advance policy is used.
.It Dv PGM_HOLE_SIZE
read-only; fetches the currently known size of holes in the data
sequence, and is necessary to re-enable the delivery of data packets to
the application.
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn PGM ,
available from
.Xr getprotobyname 3 ,
or
.Dv IPPROTO_PGM .
All options are declared in
.Aq Pa netinet/pgm.h .
.Pp
Options at the
.Tn IP
transport level may be used with
.Tn PGM ;
see
.Xr ip 4 .
.Sh MIB VARIABLES
The
.Nm
protocol implements the following variables in the
.Li net.inet.pgm
branch of the
.Xr sysctl 3
MIB.
.Bl -tag -width sendspacexx
.It Dv sendspace
PGM sender window size (default 16384 bytes).
.It Dv recvspace
PGM receive window size (default 16384 bytes, currently unchecked).
.It Dv bandwidth
maximum sending rate (default 16000 bit/s).
.It Dv odata_lifetime
ODATA lifetime when the "advance with time" window advance method is
used (default 30 seconds).
.It Dv nak_bo_ivl
NAK backoff interval, in ticks (default 6 = 3 sec).
.It Dv nak_rpt_ivl
NAK repeat interval if NCF not received, in ticks (default 15).
.It Dv nak_rdata_ivl
NAK repeat interval if RDATA not received, in ticks (default 30).
.It Dv nak_gen_ivl
total timeout for retransmission, in ticks (default 125).
.It Dv spm_ivl
interval between ambient SPM, in ticks (default 6).
.It Dv gsid_low, gsid_high
low 32 bits and high 16 bits of the GSI.
Should be set depending on the hostname etc.
.It Dv pgmcksum
enables the use of checksums on PGM packets (defaults to 1).
.El .Sh DIAGNOSTICS A socket operation may fail with one of the following errors returned: .Bl -tag -width [EADDRNOTAVAIL] .It Bq Er EISCONN when trying to establish a connection on a socket which already has one; .It Bq Er ENOBUFS when the system runs out of memory for an internal data structure; .It Bq Er ETIMEDOUT when a segment is unavailable due to excessive retransmissions; .It Bq Er EADDRINUSE when an attempt is made to create a socket with a port which has already been allocated; .It Bq Er EADDRNOTAVAIL when an attempt is made to create a socket with a network address for which no network interface exists. .It Bq Er EAFNOSUPPORT when an attempt is made to bind or connect a socket to a multicast address. .El .Sh SEE ALSO .Xr getsockopt 2 , .Xr socket 2 , .Xr sysctl 3 , .Xr inet 4 , .Xr intro 4 , .Xr ip 4 , .Rs .%A T.Speakman et al .%T "PGM draft specification" .%O Internet Draft, ... .Re .Sh HISTORY The .Nm protocol appeared in .Nm FreeBSD 3.2 kpgm-2.2.6R-990814.patch100664 423 0 5144 6755314331 13012 0ustar luigiwheeldiff -ubwr src/sys/conf/files /usr/src/sys/conf/files --- src/sys/conf/files Sat Mar 7 00:43:56 1998 +++ /usr/src/sys/conf/files Fri Aug 6 11:08:55 1999 @@ -222,6 +222,8 @@ netinet/ip_input.c optional inet netinet/ip_mroute.c optional inet netinet/ip_output.c optional inet +netinet/pgm_timer.c optional inet pgm +netinet/pgm_usrreq.c optional inet pgm netinet/raw_ip.c optional inet netinet/tcp_debug.c optional tcpdebug netinet/tcp_input.c optional inet diff -ubwr src/sys/conf/options /usr/src/sys/conf/options --- src/sys/conf/options Sat Mar 7 00:43:58 1998 +++ /usr/src/sys/conf/options Fri Aug 6 11:11:57 1999 @@ -97,6 +97,7 @@ IPFIREWALL opt_ipfw.h IPFIREWALL_VERBOSE opt_ipfw.h IPFIREWALL_VERBOSE_LIMIT opt_ipfw.h +PGM opt_pgm.h # DPT SCSI RAID Controller DPT_VERIFY_HINTR opt_dpt.h diff -ubwr src/sys/netinet/in.h /usr/src/sys/netinet/in.h --- src/sys/netinet/in.h Wed Feb 25 03:34:30 1998 +++ /usr/src/sys/netinet/in.h Thu Feb 11 13:27:38 1999 @@ 
-140,6 +140,7 @@ #define IPPROTO_ENCAP 98 /* encapsulation header */ #define IPPROTO_APES 99 /* any private encr. scheme */ #define IPPROTO_GMTP 100 /* GMTP*/ +#define IPPROTO_PGM 113 /* PGM */ /* 101-254: Unassigned */ /* 255: Reserved */ /* BSD Private, local use, namespace incursion */ diff -ubwr src/sys/netinet/in_proto.c /usr/src/sys/netinet/in_proto.c --- src/sys/netinet/in_proto.c Tue Sep 16 20:37:00 1997 +++ /usr/src/sys/netinet/in_proto.c Fri Aug 6 11:12:56 1999 @@ -35,6 +35,7 @@ */ #include "opt_tcpdebug.h" +#include "opt_pgm.h" #include #include @@ -68,6 +69,10 @@ #endif #include #include +#ifdef PGM +#include +#include +#endif /* * TCP/IP protocol family: IP, ICMP, UDP, TCP. */ @@ -135,6 +140,15 @@ rip_usrreq, 0, 0, 0, 0, }, +#ifdef PGM /* PGM stuff */ +{ SOCK_SEQPACKET,&inetdomain, IPPROTO_PGM, + PR_ATOMIC|PR_CONNREQUIRED|PR_ADDR, + pgm_input, 0, pgm_ctlinput, pgm_ctloutput, + /* pgm_usrreq */ 0, + pgm_init, pgm_fasttimo, pgm_slowtimo, pgm_drain, + /* 0 */ &pgm_usrreqs +}, +#endif #ifdef IPDIVERT { SOCK_RAW, &inetdomain, IPPROTO_DIVERT, PR_ATOMIC|PR_ADDR, div_input, 0, 0, ip_ctloutput, @@ -197,6 +211,9 @@ SYSCTL_NODE(_net_inet, IPPROTO_UDP, udp, CTLFLAG_RW, 0, "UDP"); SYSCTL_NODE(_net_inet, IPPROTO_TCP, tcp, CTLFLAG_RW, 0, "TCP"); SYSCTL_NODE(_net_inet, IPPROTO_IGMP, igmp, CTLFLAG_RW, 0, "IGMP"); +#ifdef PGM +SYSCTL_NODE(_net_inet, IPPROTO_PGM, pgm, CTLFLAG_RW, 0, "PGM"); +#endif #ifdef IPDIVERT SYSCTL_NODE(_net_inet, IPPROTO_DIVERT, div, CTLFLAG_RW, 0, "DIVERT"); #endif kpgm-2.2.8R-990814.patch100664 423 0 5334 6755314637 13026 0ustar luigiwheeldiff -ubwr src/sys/conf/files /usr/src/sys/conf/files --- src/sys/conf/files Sat Sep 26 19:36:12 1998 +++ /usr/src/sys/conf/files Fri Aug 6 11:08:55 1999 @@ -227,6 +224,8 @@ netinet/ip_input.c optional inet netinet/ip_mroute.c optional inet netinet/ip_output.c optional inet +netinet/pgm_timer.c optional inet pgm +netinet/pgm_usrreq.c optional inet pgm netinet/raw_ip.c optional inet netinet/tcp_debug.c optional 
tcpdebug netinet/tcp_input.c optional inet diff -ubwr src/sys/conf/options /usr/src/sys/conf/options --- src/sys/conf/options Thu Jun 25 02:46:17 1998 +++ /usr/src/sys/conf/options Fri Aug 6 11:11:57 1999 @@ -98,8 +98,9 @@ IPFIREWALL_VERBOSE opt_ipfw.h IPFIREWALL_VERBOSE_LIMIT opt_ipfw.h IPFIREWALL_DEFAULT_TO_ACCEPT opt_ipfw.h #temp option to change ipfw/divert semantics. Should become standard. IPFW_DIVERT_RESTART opt_ipfw.h +PGM opt_pgm.h # DPT SCSI RAID Controller DPT_VERIFY_HINTR opt_dpt.h diff -ubwr src/sys/netinet/in.h /usr/src/sys/netinet/in.h --- src/sys/netinet/in.h Thu Sep 17 20:02:25 1998 +++ /usr/src/sys/netinet/in.h Thu Feb 11 13:27:38 1999 @@ -140,6 +140,7 @@ #define IPPROTO_ENCAP 98 /* encapsulation header */ #define IPPROTO_APES 99 /* any private encr. scheme */ #define IPPROTO_GMTP 100 /* GMTP*/ +#define IPPROTO_PGM 113 /* PGM */ /* 101-254: Unassigned */ /* 255: Reserved */ /* BSD Private, local use, namespace incursion */ diff -ubwr src/sys/netinet/in_proto.c /usr/src/sys/netinet/in_proto.c --- src/sys/netinet/in_proto.c Tue Sep 16 20:37:00 1997 +++ /usr/src/sys/netinet/in_proto.c Fri Aug 6 11:12:56 1999 @@ -35,6 +35,7 @@ */ #include "opt_tcpdebug.h" +#include "opt_pgm.h" #include #include @@ -68,6 +69,10 @@ #endif #include #include +#ifdef PGM +#include +#include +#endif /* * TCP/IP protocol family: IP, ICMP, UDP, TCP. 
*/ @@ -135,6 +140,15 @@ rip_usrreq, 0, 0, 0, 0, }, +#ifdef PGM /* PGM stuff */ +{ SOCK_SEQPACKET,&inetdomain, IPPROTO_PGM, + PR_ATOMIC|PR_CONNREQUIRED|PR_ADDR, + pgm_input, 0, pgm_ctlinput, pgm_ctloutput, + /* pgm_usrreq */ 0, + pgm_init, pgm_fasttimo, pgm_slowtimo, pgm_drain, + /* 0 */ &pgm_usrreqs +}, +#endif #ifdef IPDIVERT { SOCK_RAW, &inetdomain, IPPROTO_DIVERT, PR_ATOMIC|PR_ADDR, div_input, 0, 0, ip_ctloutput, @@ -197,6 +211,9 @@ SYSCTL_NODE(_net_inet, IPPROTO_UDP, udp, CTLFLAG_RW, 0, "UDP"); SYSCTL_NODE(_net_inet, IPPROTO_TCP, tcp, CTLFLAG_RW, 0, "TCP"); SYSCTL_NODE(_net_inet, IPPROTO_IGMP, igmp, CTLFLAG_RW, 0, "IGMP"); +#ifdef PGM +SYSCTL_NODE(_net_inet, IPPROTO_PGM, pgm, CTLFLAG_RW, 0, "PGM"); +#endif #ifdef IPDIVERT SYSCTL_NODE(_net_inet, IPPROTO_DIVERT, div, CTLFLAG_RW, 0, "DIVERT"); #endif kpgm-3.1R-990730.patch100664 423 0 5746 6750330721 12647 0ustar luigiwheeldiff -ubwr src/sys/conf/files /mnt/src/sys/conf/files --- src/sys/conf/files Sun Jan 24 06:11:31 1999 +++ /mnt/src/sys/conf/files Fri Jul 30 12:53:05 1999 @@ -523,6 +523,8 @@ netinet/ip_proxy.c optional ipfilter inet netinet/ip_state.c optional ipfilter inet netinet/mlf_ipl.c optional ipfilter inet +netinet/pgm_timer.c optional pgm inet +netinet/pgm_usrreq.c optional pgm inet netinet/raw_ip.c optional inet netinet/tcp_debug.c optional tcpdebug netinet/tcp_input.c optional inet diff -ubwr src/sys/conf/options /mnt/src/sys/conf/options --- src/sys/conf/options Mon Feb 8 20:05:55 1999 +++ /mnt/src/sys/conf/options Fri Jul 30 12:57:26 1999 @@ -201,6 +201,7 @@ IPXIP opt_ipx.h IPTUNNEL opt_ipx.h NETATALK opt_atalk.h +PGM opt_pgm.h PPP_BSDCOMP opt_ppp.h PPP_DEFLATE opt_ppp.h PPP_FILTER opt_ppp.h diff -ubwr src/sys/netinet/in.h /mnt/src/sys/netinet/in.h --- src/sys/netinet/in.h Mon Dec 14 19:09:13 1998 +++ /mnt/src/sys/netinet/in.h Thu Jul 29 10:19:48 1999 @@ -31,7 +31,7 @@ * SUCH DAMAGE. 
* * @(#)in.h 8.3 (Berkeley) 1/3/94 - * $Id: in.h,v 1.38 1998/12/14 18:09:13 luigi Exp $ + * $Id: in.h,v 1.38.2.1 1999/05/04 16:23:55 luigi Exp $ */ #ifndef _NETINET_IN_H_ @@ -140,7 +140,8 @@ #define IPPROTO_ENCAP 98 /* encapsulation header */ #define IPPROTO_APES 99 /* any private encr. scheme */ #define IPPROTO_GMTP 100 /* GMTP*/ -/* 101-254: Unassigned */ +/* 101-254: Partly Unassigned */ +#define IPPROTO_PGM 113 /* PGM */ /* 255: Reserved */ /* BSD Private, local use, namespace incursion */ #define IPPROTO_DIVERT 254 /* divert pseudo-protocol */ diff -ubwr src/sys/netinet/in_proto.c /mnt/src/sys/netinet/in_proto.c --- src/sys/netinet/in_proto.c Sun Aug 23 05:07:14 1998 +++ /mnt/src/sys/netinet/in_proto.c Fri Jul 30 16:09:52 1999 @@ -36,6 +36,7 @@ #include "opt_ipdivert.h" #include "opt_ipx.h" +#include "opt_pgm.h" #include #include @@ -58,6 +59,10 @@ #include #include #include +#ifdef PGM +#include +#include +#endif /* * TCP/IP protocol family: IP, ICMP, UDP, TCP. */ @@ -124,6 +129,14 @@ 0, 0, 0, 0, &rip_usrreqs }, +#ifdef PGM +{ SOCK_SEQPACKET,&inetdomain, IPPROTO_PGM, PR_ATOMIC|PR_CONNREQUIRED|PR_ADDR, + pgm_input, 0, pgm_ctlinput, pgm_ctloutput, + 0, + pgm_init, pgm_fasttimo, pgm_slowtimo, pgm_drain, + &pgm_usrreqs +}, +#endif #ifdef IPDIVERT { SOCK_RAW, &inetdomain, IPPROTO_DIVERT, PR_ATOMIC|PR_ADDR, div_input, 0, 0, ip_ctloutput, @@ -190,6 +203,9 @@ SYSCTL_NODE(_net_inet, IPPROTO_UDP, udp, CTLFLAG_RW, 0, "UDP"); SYSCTL_NODE(_net_inet, IPPROTO_TCP, tcp, CTLFLAG_RW, 0, "TCP"); SYSCTL_NODE(_net_inet, IPPROTO_IGMP, igmp, CTLFLAG_RW, 0, "IGMP"); +#ifdef PGM +SYSCTL_NODE(_net_inet, IPPROTO_PGM, pgm, CTLFLAG_RW, 0, "PGM"); +#endif SYSCTL_NODE(_net_inet, IPPROTO_RAW, raw, CTLFLAG_RW, 0, "RAW"); #ifdef IPDIVERT SYSCTL_NODE(_net_inet, IPPROTO_DIVERT, div, CTLFLAG_RW, 0, "DIVERT"); kpgm-3.2R-990814.patch100644 423 0 4023 6755312576 12651 0ustar luigiwheel--- /sys/conf/files.orig Sat Jul 31 07:53:04 1999 +++ /sys/conf/files Sat Jul 31 07:53:30 1999 @@ -527,6 +527,8 @@ 
netinet/ip_proxy.c optional ipfilter inet netinet/ip_state.c optional ipfilter inet netinet/mlf_ipl.c optional ipfilter inet +netinet/pgm_timer.c optional pgm inet +netinet/pgm_usrreq.c optional pgm inet netinet/raw_ip.c optional inet netinet/tcp_debug.c optional tcpdebug netinet/tcp_input.c optional inet --- /sys/conf/options.orig Tue May 11 07:35:28 1999 +++ /sys/conf/options Sat Jul 31 07:53:44 1999 @@ -213,6 +213,7 @@ IPXIP opt_ipx.h IPTUNNEL opt_ipx.h NETATALK opt_atalk.h +PGM opt_pgm.h PPP_BSDCOMP opt_ppp.h PPP_DEFLATE opt_ppp.h PPP_FILTER opt_ppp.h --- /sys/netinet/in_proto.c.orig Sat Jul 31 07:55:14 1999 +++ /sys/netinet/in_proto.c Sat Jul 31 07:56:20 1999 @@ -36,6 +36,7 @@ #include "opt_ipdivert.h" #include "opt_ipx.h" +#include "opt_pgm.h" #include #include @@ -58,6 +59,10 @@ #include #include #include +#ifdef PGM +#include +#include +#endif /* * TCP/IP protocol family: IP, ICMP, UDP, TCP. */ @@ -124,6 +129,14 @@ 0, 0, 0, 0, &rip_usrreqs }, +#ifdef PGM +{ SOCK_SEQPACKET,&inetdomain, IPPROTO_PGM, PR_ATOMIC|PR_CONNREQUIRED|PR_ADDR, + pgm_input, 0, pgm_ctlinput, pgm_ctloutput, + 0, + pgm_init, pgm_fasttimo, pgm_slowtimo, pgm_drain, + &pgm_usrreqs +}, +#endif #ifdef IPDIVERT { SOCK_RAW, &inetdomain, IPPROTO_DIVERT, PR_ATOMIC|PR_ADDR, div_input, 0, 0, ip_ctloutput, @@ -190,6 +203,9 @@ SYSCTL_NODE(_net_inet, IPPROTO_UDP, udp, CTLFLAG_RW, 0, "UDP"); SYSCTL_NODE(_net_inet, IPPROTO_TCP, tcp, CTLFLAG_RW, 0, "TCP"); SYSCTL_NODE(_net_inet, IPPROTO_IGMP, igmp, CTLFLAG_RW, 0, "IGMP"); +#ifdef PGM +SYSCTL_NODE(_net_inet, IPPROTO_PGM, pgm, CTLFLAG_RW, 0, "PGM"); +#endif SYSCTL_NODE(_net_inet, IPPROTO_RAW, raw, CTLFLAG_RW, 0, "RAW"); #ifdef IPDIVERT SYSCTL_NODE(_net_inet, IPPROTO_DIVERT, div, CTLFLAG_RW, 0, "DIVERT"); kpgm-RELENG22-990814.patch100664 423 0 5337 6755315056 13354 0ustar luigiwheeldiff -ubwr src/sys/conf/files /usr/src/sys/conf/files --- src/sys/conf/files Sat Jul 3 01:52:50 1999 +++ /usr/src/sys/conf/files Fri Aug 6 11:08:55 1999 @@ -230,6 +224,8 @@ 
netinet/ip_input.c optional inet netinet/ip_mroute.c optional inet netinet/ip_output.c optional inet +netinet/pgm_timer.c optional inet pgm +netinet/pgm_usrreq.c optional inet pgm netinet/raw_ip.c optional inet netinet/tcp_debug.c optional tcpdebug netinet/tcp_input.c optional inet diff -ubwr src/sys/conf/options /usr/src/sys/conf/options --- src/sys/conf/options Mon Jul 5 22:20:39 1999 +++ /usr/src/sys/conf/options Fri Aug 6 11:11:57 1999 @@ -111,10 +98,11 @@ IPFIREWALL_VERBOSE opt_ipfw.h IPFIREWALL_VERBOSE_LIMIT opt_ipfw.h IPFIREWALL_DEFAULT_TO_ACCEPT opt_ipfw.h #temp option to change ipfw/divert semantics. Should become standard. IPFW_DIVERT_RESTART opt_ipfw.h +PGM opt_pgm.h # DPT SCSI RAID Controller DPT_VERIFY_HINTR opt_dpt.h diff -ubwr src/sys/netinet/in.h /usr/src/sys/netinet/in.h --- src/sys/netinet/in.h Thu Sep 17 20:02:25 1998 +++ /usr/src/sys/netinet/in.h Thu Feb 11 13:27:38 1999 @@ -140,6 +140,7 @@ #define IPPROTO_ENCAP 98 /* encapsulation header */ #define IPPROTO_APES 99 /* any private encr. scheme */ #define IPPROTO_GMTP 100 /* GMTP*/ +#define IPPROTO_PGM 113 /* PGM */ /* 101-254: Unassigned */ /* 255: Reserved */ /* BSD Private, local use, namespace incursion */ diff -ubwr src/sys/netinet/in_proto.c /usr/src/sys/netinet/in_proto.c --- src/sys/netinet/in_proto.c Tue Sep 16 20:37:00 1997 +++ /usr/src/sys/netinet/in_proto.c Fri Aug 6 11:12:56 1999 @@ -35,6 +35,7 @@ */ #include "opt_tcpdebug.h" +#include "opt_pgm.h" #include #include @@ -68,6 +69,10 @@ #endif #include #include +#ifdef PGM +#include +#include +#endif /* * TCP/IP protocol family: IP, ICMP, UDP, TCP. 
*/ @@ -135,6 +140,15 @@ rip_usrreq, 0, 0, 0, 0, }, +#ifdef PGM /* PGM stuff */ +{ SOCK_SEQPACKET,&inetdomain, IPPROTO_PGM, + PR_ATOMIC|PR_CONNREQUIRED|PR_ADDR, + pgm_input, 0, pgm_ctlinput, pgm_ctloutput, + /* pgm_usrreq */ 0, + pgm_init, pgm_fasttimo, pgm_slowtimo, pgm_drain, + /* 0 */ &pgm_usrreqs +}, +#endif #ifdef IPDIVERT { SOCK_RAW, &inetdomain, IPPROTO_DIVERT, PR_ATOMIC|PR_ADDR, div_input, 0, 0, ip_ctloutput, @@ -197,6 +211,9 @@ SYSCTL_NODE(_net_inet, IPPROTO_UDP, udp, CTLFLAG_RW, 0, "UDP"); SYSCTL_NODE(_net_inet, IPPROTO_TCP, tcp, CTLFLAG_RW, 0, "TCP"); SYSCTL_NODE(_net_inet, IPPROTO_IGMP, igmp, CTLFLAG_RW, 0, "IGMP"); +#ifdef PGM +SYSCTL_NODE(_net_inet, IPPROTO_PGM, pgm, CTLFLAG_RW, 0, "PGM"); +#endif #ifdef IPDIVERT SYSCTL_NODE(_net_inet, IPPROTO_DIVERT, div, CTLFLAG_RW, 0, "DIVERT"); #endif pgm.h100644 423 0 7670 6755017354 10437 0ustar luigiwheel/* * pgm.h -- include files for PGM * * Copyright (c) 1999 Luigi Rizzo * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * */ #ifndef _NETINET_PGM_H_ #define _NETINET_PGM_H_ typedef u_int32_t pgm_seq ; #define PGM_SEQ_LT(a,b) ((int)((a)-(b)) < 0) #define PGM_SEQ_LEQ(a,b) ((int)((a)-(b)) <= 0) #define PGM_SEQ_GT(a,b) ((int)((a)-(b)) > 0) #define PGM_SEQ_GEQ(a,b) ((int)((a)-(b)) >= 0) /* * the header of a PGM packet, which is always present. * Basically we consider the header of a data pkt, other pkts * are extensions to it. */ struct pgmhdr { u_int16_t ph_sport ; u_int16_t ph_dport ; u_int8_t type ; #define PGM_SPM_TYPE 0 #define PGM_OD_TYPE 4 #define PGM_RD_TYPE 5 #define PGM_NAK_TYPE 8 #define PGM_NNAK_TYPE 9 #define PGM_NCF_TYPE 10 u_int8_t options ; #define PGM_OPT_PRESENT 1 /* there are option extentions */ #define PGM_OPT_NE_PRESENT 2 /* there are NE-significant opt.ext. */ u_int16_t checksum ; u_int32_t gsid_low ; u_int16_t gsid_high ; u_int16_t tpdu_len ; /* * all packets have a couple of sequence numbers here, but * their meaning differs. 
     */
    pgm_seq _seq1;
#define od_txw_trail    _seq1   /* this is in ODATA pkts */
#define spm_seq         _seq1   /* this is in SPM pkts */
#define nak_req_seq     _seq1   /* this is in NAK pkts */
    pgm_seq _seq2;
#define od_dp_seq       _seq2   /* this is in ODATA pkts */
#define spm_txw_trail   _seq2   /* this is in SPM pkts */
} ;

/*
 * Source Path Message (SPM) packets
 */
struct pgm_spm_body {
    u_int32_t spm_le_seq ;
    u_int16_t nla_afi ;
    u_int16_t rsvd ;
    struct in_addr path_nla ;
    u_char options[0];
} ;

struct pgm_spm {
    struct pgmhdr pgmhdr ;
    struct pgm_spm_body body ;
} ;

/*
 * (N)ACK packets (from receivers/DLR)
 */
struct pgm_ack_body {
    /*
    u_int32_t req_seq ;
    u_int16_t nla_afi ;
    u_int16_t rsvd1 ;
    */
    struct in_addr src_nla ;
    u_int16_t nla_afi2 ;
    u_int16_t rsvd2 ;
    struct in_addr mc_nla ;
    u_char options[0] ;
} ;

/*
 * options (similar to IP options)
 */
struct pgm_option {
    u_int8_t type ;
    u_int8_t len ;
    u_int16_t tot_len ;
    u_int32_t opt_data[0] ;
} ;

#define PGM_OPT_LENGTH      0x00
#define PGM_OPT_FRAGMENT    0x01
#define PGM_OPT_JOIN        0x03
#define PGM_OPT_TIME        0x04
#define PGM_OPT_RXQ         0x05
#define PGM_OPT_DROP        0x06
#define PGM_OPT_REDIRECT    0x07
#define PGM_OPT_END         0x80
#define PGM_OPT_DESC        0xf0
#define PGM_OPT_SYN         0xf1
#define PGM_OPT_FIN         0xf2

/*
 * user settable options (with setsockopt)
 */
#define PGM_TXW_SIZE        1
#define PGM_TXW_MAX_RATE    2
#define PGM_TRAIL_ADVANCE   3
#define PGM_ODATA_LIFETIME  4
#define PGM_HOLE_SIZE       5

struct sockaddr_pgm {
    u_char sin_len ;
    u_char sin_family ;
    u_int16_t sin_port ;
    struct in_addr sin_addr ;
    u_int32_t gsid_low ;
    u_int16_t gsid_high ;
    u_int16_t sport ;
} ;

#endif /* _NETINET_PGM_H_ */

/*
 * pgm_var.h
 *
 * Copyright (C) 1999 Luigi Rizzo
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2.
Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * $Id$
 */

#ifndef _NETINET_PGM_VAR_H_
#define _NETINET_PGM_VAR_H_

/*
 * PGM kernel structures and variables.
 */
struct pgmiphdr {
    struct ipovly pi_i; /* overlaid ip structure */
    struct pgmhdr pi_p; /* pgm header */
};
#define pi_next     pi_i.ih_next
#define pi_prev     pi_i.ih_prev
#define pi_x1       pi_i.ih_x1
#define pi_pr       pi_i.ih_pr
#define pi_len      pi_i.ih_len
#define pi_src      pi_i.ih_src
#define pi_dst      pi_i.ih_dst
#define pi_sport    pi_p.ph_sport
#define pi_dport    pi_p.ph_dport
#define pi_ulen     pi_p.tpdu_len
#define pi_sum      pi_p.checksum

/*
 * pgm connection state -- host side -- sender
 */
typedef enum pgm_state {
    PGM_NEW,
    PGM_SENDER,
    PGM_RECEIVER,
    PGM_RX_CONNECTED,
    PGM_CLOSED,
} pgm_state ;

/*
 * a queue of pgm packets -- used for both RDATA and reassembly queue,
 * and NAK timing.
 */
typedef enum reass_queue_type {
    T_ODATA, T_HOLE, T_NAK
} reass_queue_type ;

/*
 * The first element of the queue only contains the seg_next, seg_prev
 * pointers. If the head points to itself, the queue is empty.
 */
#define PGM_Q_EMPTY(ptr) \
    ( ptr == (struct pgm_pkt_q *)&(ptr) )
#define PGM_Q_NONEMPTY(ptr) \
    ( ptr != (struct pgm_pkt_q *)&(ptr) )
#define PGM_Q_HEAD(ptr, queue) \
    ( (ptr) == (struct pgm_pkt_q *)&(queue->seg_next) )

/*
 * reassembly queue. This is a circular queue. The type field describes
 * the type of entry (T_HOLE, T_NAK, T_ODATA).
 * T_HOLE means a non-recoverable item.
 * T_NAK means there is a retransmit state associated.
 * T_ODATA means there is a chain of mbufs
 */
struct pgm_pkt_q {
    struct pgm_pkt_q *seg_next, *seg_prev ;
    reass_queue_type type ;
    pgm_seq dp_seq ; /* host order */
    union {
        struct { /* descriptor for data */
            struct mbuf *m, *m_tail ;
            pgm_seq tail_seq ; /* host order */
        } d ;
        struct {
            short timeout ; /* curr. timeout, decrement at each tick */
            short start_timeout ; /* start value for current timeout */
            short ticks_left ; /* ticks left before aborting. */
            short to_gen_ivl; /* time for which a receiver will
                               * retry a NAK while waiting for the
                               * corresponding RDATA */
            short got_ncf ;
        } n ; /* old timer types */
        struct {
            enum rxmt_timer_type { NAK_TIMEOUT, NCF_TIMEOUT, RDATA_TIMEOUT } ;
            enum rxmt_timer_type type ; /* timer type */
            short timeout ; /* expire when 1->0 */
            short ncf_to_ivl ;
            short rdata_to_ivl ;
            short ncf_retry ;
            short rdata_retry ;
        } new_timer;
    } u ;
} ;

/*
 * RDATA queue element. (also used for ODATA expire)
 * The queue is ordered by sequence number.
 */
struct pgm_rdata_q {
    struct pgm_rdata_q *next; /* next element */
    struct mbuf *mb; /* pointer to mbuf to retransmit */
    pgm_seq seq; /* seqno of the packet to retransmit */
    int ticks; /* timer queue for ODATA/RDATA */
};

struct pgmcb {
    /* Receiver: reassembly queue (must be first) */
    struct pgm_pkt_q *seg_next, *seg_prev ;
    struct pgm_rdata_q *rdata_head;
    struct inpcb *p_inpcb; /* back pointer to internet pcb */
    pgm_state state ;
    /*
     * local and remote addresses are held in the inpcb as
     * inp_{laddr,lport,faddr,fport}. Here we only need the GSI,
     * plus the sport on the receiver.
     */
    u_int32_t gsid_low ;
    u_int16_t gsid_high ;
    u_int16_t sport;
    struct in_addr path_nla ; /* and this (from SPM's) */
    /*
     * XXX src_nla is only used to fill the NAK, but i think it
     * is unnecessary. In any case... fill it from SPM.
     */
    struct in_addr src_nla ;

    /*
     * sender state variables.
     * see sec.3.3 for tx window definitions
     */
    int txw_max_rte ;
    pgm_seq txw_size ; /* window size in bytes */
    pgm_seq txw_trail ; /* oldest avail. pkt */
    pgm_seq txw_lead ; /* most recent xmitted pkt */
    /* next fields are used to implement the expire of odata
     * in timer-advance mode.
     */
    struct pgm_rdata_q *odata_trail_head, *odata_trail_tail;
    int trail_advance_policy ;
#define TRAIL_ADVANCE_TIMER 1
#define TRAIL_ADVANCE_DATA  2
#define TRAIL_ADVANCE_USER  3
    int odata_lifetime ; /* in ticks. */
    int odata_ticks_from_last_insert ;
    pgm_seq spm_sqn ;
    /*
     * ambient SPM go out "at a rate sufficient to maintain
     * source state".
     * Heartbeat SPM go out when data is idle at a decaying rate
     * IHB_TMR from the most recent transmit, from IHB_MIN to IHB_MAX.
     * Any data tx reinitializes IHB_TMR to min
     */
    int spm_ticks; /* when 1->0 send spm */
    int ihb_tmr, ihb_min, ihb_max ;
    struct pgmiphdr *p_template; /* skeletal packet for transmit */
    /*
     * transmit buffers are in the socket buffer. odata_curr
     * points there and is the next odata pkt to send.
     */
    struct mbuf *odata_curr;
    /*
     * next two values are scaled by 8*pgm_timer rate.
     * numbytes is the amount of data i can send next time,
     * txw_max_rte is the increment at each tick (in practice
     * is the rate in bits/s because of the scaling.).
     * The scheme is similar to the one used in dummynet.
     * At each tick, provided there is data or numbytes < txw_max_rte,
     * we increment numbytes. We transmit if we have sufficient
     * credit (could be numbytes>=0 or numbytes >= txw_max_rte, it
     * only matters at the beginning of a burst).
     */
    int numbytes ;
    int flushbytes ; /* how many bytes to flush ? */

    /*
     * receiver state variables.
     * See sec.3.4 for window definitions.
     */
    pgm_seq rxw_size; /* XXX in bytes!!! */
    pgm_seq rxw_irs; /* initial receive sequence number */
    pgm_seq rxw_trail; /* oldest recoverable segment. It is never
                        * the case that rxw_trail PGM_SEQ_LT rxw_next */
    pgm_seq rxw_lead ; /* highest seqno so far */
#define PGM_DEFAULT_LOOKAHEAD 10
    pgm_seq rxw_lookahead ; /* offset beyond rxw_lead for
                             * acceptable segments */
    pgm_seq nak_curr; /* which NAK/NCF are we sending ? */
    pgm_seq rxw_next ; /* next pkt to read */
    pgm_seq rxw_hole_start ; /* first segment in hole */
    /*
     * these can become flags at a later time
     */
    int in_hole ; /* last seg. appended was a hole */
    int have_gsi ; /* have gsi */
    /* some diagnostic variables */
    int rdata_count ; /* number of received rdata */
    int reass_q_bufs ; /* mbufs in reass_queue */
} ;

#define intopgmcb(ip)   ((struct pgmcb *)(ip)->inp_ppcb)
#define sotopgmcb(so)   (intopgmcb(sotoinpcb(so)))
#define pgmcbtoso(tp)   (tp->p_inpcb->inp_socket)

/*
 * PGM statistics.
 * Many of these should be kept per connection,
 * but that's inconvenient at the moment.
 */
struct pgmstat {
    u_long pgms_badlen ; /* invalid len */
    u_long pgms_badsum ;
    u_long pgms_fullsock ;
    u_long pgms_hdrops ; /* dropped by header */
    u_long pgms_ipackets ; /* input packets */
    u_long pgms_noport ;
    u_long pgms_noportbcast ;
    u_long pgms_opackets ; /* output packets */
    u_long pgms_rcvduppack ;
    u_long pgms_rcvpack ;
    u_long pgms_rcvoopack ;
};

/*
 * Names for PGM sysctl objects
 */
#define PGMCTL_DO_RFC1323   1 /* use RFC-1323 extensions */
#define PGMCTL_DO_RFC1644   2 /* use RFC-1644 extensions */
#define PGMCTL_MSSDFLT      3 /* MSS default */
#define PGMCTL_STATS        4 /* statistics (read-only) */
#define PGMCTL_RTTDFLT      5 /* default RTT estimate */
#define PGMCTL_KEEPIDLE     6 /* keepalive idle timer */
#define PGMCTL_KEEPINTVL    7 /* interval to send keepalives */
#define PGMCTL_SENDSPACE    8 /* send buffer space */
#define PGMCTL_RECVSPACE    9 /* receive buffer space */
#define PGMCTL_KEEPINIT     10 /* receive buffer space */
#define PGMCTL_MAXID        11

#define PGMCTL_NAMES { \
    { 0, 0 }, \
    { "rfc1323", CTLTYPE_INT }, \
    { "rfc1644", CTLTYPE_INT }, \
    { "mssdflt", CTLTYPE_INT }, \
    { "stats", CTLTYPE_STRUCT }, \
    { "rttdflt", CTLTYPE_INT }, \
    { "keepidle", CTLTYPE_INT }, \
    { "keepintvl", CTLTYPE_INT }, \
    { "sendspace", CTLTYPE_INT }, \
    { "recvspace", CTLTYPE_INT }, \
    { "keepinit", CTLTYPE_INT }, \
}

#ifdef KERNEL
extern struct inpcbhead pgmcb; /* head of queue of active PGM pcb's */
extern struct inpcbinfo pcbinfo;
extern struct pgmstat pgmstat; /* pgm statistics */

struct pgmcb *pgm_close (struct pgmcb *);
void pgm_ctlinput (int, struct sockaddr *, void *);
#if __FreeBSD__ >= 3
int pgm_ctloutput (struct socket *so, struct sockopt *sopt);
#else
int pgm_ctloutput (int, struct socket *, int, int, struct mbuf **);
#endif
void pgm_drain __P((void));
void pgm_fasttimo (void);
void pgm_init (void);
void pgm_input (struct mbuf *, int);
int pgm_output (struct pgmcb *, int);
void pgm_slowtimo (void);
void pgm_clean_reass(struct pgmcb *tp);
int pgm_usrreq (struct socket *, int, struct mbuf *,
        struct mbuf *, struct mbuf *);
void pgm_odata_move(struct pgmcb *);
void pgm_rdata_move(struct pgmcb *);
int in_window(struct pgmcb *, pgm_seq);

extern struct pr_usrreqs pgm_usrreqs;
extern struct inpcbinfo pgmcbinfo ;
extern u_long pgm_sendspace, pgm_recvspace ;
#endif /* KERNEL */

#endif /* _NETINET_PGM_VAR_H_ */

/*
 * pgm_timer.c - 1990812
 * Copyright (c) 1999 Luigi Rizzo
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * $Id$
 */

#define PGM_NEW_TIMER 1 /* 0 if old timer */

#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

#define DEB(x)
#define DDB(x) x

static void pgm_rx_in(struct pgmcb *tp, struct pgmhdr *ph, struct mbuf *m);
static void pgm_handle_naks(struct pgmcb *tp) ;
static void pgm_timer(void *);
static void pgm_timer_restart(struct pgmcb *tp);
static void pgm_dispatch(struct inpcb *last, struct mbuf *m,
        struct pgmhdr *ph);
static void insert_rdata_q(struct pgmcb *tp, pgm_seq seq);

int pgm_now;
struct pgmstat pgmstat ;
static struct sockaddr_pgm pgm_in = { sizeof(pgm_in), AF_INET } ;
static int pgm_timer_active = 0 ;

static int nak_bo_ivl = 6;
SYSCTL_INT(_net_inet_pgm, OID_AUTO, nak_bo_ivl, CTLFLAG_RW,
    &nak_bo_ivl , 0, "Base for random backoff nak timeout");
/* counted from loss detection */

#if PGM_NEW_TIMER /* new timer handling -- Luigi's description */
static int ncf_to_ivl = 5 ; /* base value for NCF timeout */
SYSCTL_INT(_net_inet_pgm, OID_AUTO, ncf_to_ivl, CTLFLAG_RW,
    &ncf_to_ivl , 5, "NCF receive timeout");

static int rdata_to_ivl = 20 ; /* base value for RDATA timeout */
SYSCTL_INT(_net_inet_pgm, OID_AUTO, rdata_to_ivl, CTLFLAG_RW,
    &rdata_to_ivl , 0, "RDATA receive timeout");

static int ncf_retries = 6 ; /* how many retries to get NCF */
SYSCTL_INT(_net_inet_pgm, OID_AUTO, ncf_retries, CTLFLAG_RW,
    &ncf_retries , 0, "NCF retries");

static int rdata_retries = 10 ; /* how many retries to get RDATA */
SYSCTL_INT(_net_inet_pgm, OID_AUTO, rdata_retries, CTLFLAG_RW,
    &rdata_retries , 0, "RDATA retries");

#else /* old timer handling -- v3 spec */
/*
 * Timeout handling
 * There is a global timeout on recovery phase (nak_gen_ivl)
 * one on NAK generation (nak_rb_ivl)
 * one on NCF waiting (nak_rpt_ivl)
 * one on RDATA waiting (nak_rdata_ivl)
 */
static int nak_rpt_ivl = 15;
SYSCTL_INT(_net_inet_pgm, OID_AUTO, nak_rpt_ivl, CTLFLAG_RW,
    &nak_rpt_ivl , 0, "Timeout for segment loss because of missing NCF");
/* counted from NAK transmission (first one ?) */

static int nak_rdata_ivl = 30;
SYSCTL_INT(_net_inet_pgm, OID_AUTO, nak_rdata_ivl, CTLFLAG_RW,
    &nak_rdata_ivl , 0, "Timeout for RDATA loss because of missing RDATA");
/* counted from the time a matching NCF is received */

static int nak_gen_ivl = 125;
SYSCTL_INT(_net_inet_pgm, OID_AUTO, nak_gen_ivl, CTLFLAG_RW,
    &nak_gen_ivl , 0, "Timeout for segment loss because of missing RDATA");
/* counted from NAK transmission */
#endif

static int spm_ivl = 6; /* measured in ticks */
SYSCTL_INT(_net_inet_pgm, OID_AUTO, spm_ivl, CTLFLAG_RW,
    &spm_ivl , 0, "Interval between ambient SPM");

void
pgm_fasttimo() /* currently unused... */
{
}

/*
 * Slow protocol timeout routine called every 500 ms.
 * Updates the timers in all active pcb's and
 * causes finite state machine actions if timers expire.
 */
void
pgm_slowtimo()
{
    struct inpcb *inp, *ipnxt;
    struct pgmcb *tp;
    int s = splnet();

    DEB(printf("PGM: pgm_slowtimo %d\n", pgm_now);)
    /*
     * Search through pcb's and update active timers.
     */
    for ( inp = pgmcb.lh_first; inp != NULL; inp = ipnxt) {
        ipnxt = inp->inp_list.le_next;
        tp = intopgmcb(inp) ;
        if (tp->state == PGM_SENDER && tp->have_gsi == 1) {
            /* XXX spm handling. In fact it is slightly more complex */
            if (tp->spm_ticks-- <= 0) {
                tp->spm_ticks = spm_ivl ;
                pgm_output(tp, PGM_SPM_TYPE);
            }
        } else if (tp->state == PGM_RX_CONNECTED &&
                PGM_Q_NONEMPTY(tp->seg_next) ) {
            pgm_handle_naks(tp);
            DEB(if (PGM_Q_NONEMPTY(tp->seg_next)) {
                int a = tp->seg_next->type ;
                int b = tp->seg_prev->type ;
                printf("-- rxw_next %d WIN %d ..%d REASS %d(%s)..%d(%s)\n",
                    tp->rxw_next, tp->rxw_trail, tp->rxw_lead,
                    tp->seg_next->dp_seq,
                    (a == T_ODATA ? "ODATA" : (a == T_HOLE ? "HOLE" : "NAK")),
                    tp->seg_prev->dp_seq,
                    (b == T_ODATA ? "ODATA" : (b == T_HOLE ? "HOLE" : "NAK")) ) ;
            } );
        }
    }
    pgm_now++; /* for timestamps */
    splx(s);
}

/*
 * High frequency task to move packets out of the traffic shaper.
 * Called by the timer. XXX Check splXX() protections.
 * We keep pgm_timer_active set till the very last moment to avoid
 * that the timer is set twice.
 * OK 990807
 */
static void
pgm_timer(void *dummy)
{
    struct inpcb *inp, *ipnxt ;
    struct pgmcb *tp ;
    struct pgm_rdata_q *r;
    int s ;

    inp = pgmcb.lh_first;
    if (inp == NULL) {
        pgm_timer_active = 0;
        return;
    }
    /*
     * Search through pcb's. We look for active senders.
     */
    for (; inp != NULL; inp = ipnxt) {
        struct socket *so = inp->inp_socket ;

        ipnxt = inp->inp_list.le_next;
        tp = intopgmcb(inp) ;
        if (tp->state != PGM_SENDER || tp->p_template == NULL)
            continue ;
        s = splnet();
        /*
         * Increment the amount of data that can be sent. However,
         * don't do that if the channel is idle.
         */
        if (tp->odata_curr || tp->rdata_head ||
                tp->numbytes < tp->txw_max_rte )
            tp->numbytes += tp->txw_max_rte;
        if (tp->rdata_head)
            pgm_rdata_move(tp);
        pgm_odata_move(tp);
        switch (tp->trail_advance_policy) {
        case TRAIL_ADVANCE_TIMER:
            if (tp->odata_trail_head) {
                struct pgm_rdata_q *q = tp->odata_trail_head ;
                int need_wakeup = 0 ;

                q->ticks-- ;
                if (tp->odata_ticks_from_last_insert < tp->odata_lifetime)
                    tp->odata_ticks_from_last_insert++ ;
                while (q && q->ticks <= 0) { /* expire old packets */
                    sbdroprecord(&so->so_snd) ;
                    tp->odata_trail_head = q->next ;
                    DEB(printf("pgm_timer: free trail %p\n", q);)
                    free(q, M_PCB);
                    q = tp->odata_trail_head;
                    need_wakeup = 1;
                    tp->txw_trail++ ;
                }
                if (need_wakeup)
                    sowwakeup(so);
            }
            break;
        case TRAIL_ADVANCE_USER:
            /* nothing to do here... */
            break;
        case TRAIL_ADVANCE_DATA: /* TODO */
            /*
             * If the sender is blocked,
             * start flushing data at a rate equal to send rate.
             */
#if 0 /* XXX need checking... */
            if (so->so_snd.sb_flags & SB_WAIT ||
                    /* so->so_snd.sb_cc > so->so_snd.sb_hiwat || */
                    tp->flushbytes < tp->txw_max_rte)
                tp->flushbytes += tp->txw_max_rte ;
            if ( so->so_snd.sb_flags & SB_WAIT ) {
                while ( so->so_snd.sb_mb ) {
                    int len_scaled = 8 * hz * so->so_snd.sb_mb->m_pkthdr.len ;
                    /* numbytes is in bit/sec, scaled 8*hz ...
                     */
                    if (tp->flushbytes < len_scaled)
                        break;
                    DEB(printf("PGM: flush %d bytes from socket queue\n",
                        len_scaled / (8 * hz) ); )
                    tp->flushbytes -= len_scaled;
                    sbdroprecord(&so->so_snd) ;
                    tp->txw_trail++ ;
                    sowwakeup(so) ;
                }
            }
#endif
            break ;
        default:
            printf("pgm_timer: advance trail method not recognized..\n");
            break;
        }
        /* free expired RDATA element: common to all trail advance method */
        for (r = tp->rdata_head;
                r != NULL && PGM_SEQ_LT(r->seq, tp->txw_trail);
                r = tp->rdata_head) {
            DDB(printf("pgm_timer: free expired rdata %u\n", r->seq);)
            tp->rdata_head = r->next;
            free(r, M_PCB);
        }
        splx(s);
    }
    /*
     * finally, if some queue has data, restart the timer. Must be
     * splnet() protected.
     */
    s = splnet();
    pgm_timer_active = 0;
    pgm_timer_restart(NULL);
    splx(s);
}

/*
 * invoked to reschedule the periodic task if necessary.
 * Must be called at splnet().
 * An optional hint speeds up the search...
 */
static void
pgm_timer_restart(struct pgmcb *tp)
{
    struct inpcb *inp = NULL, *ipnxt;
    struct socket *so ;

    if (pgm_timer_active)
        return ;
    DEB(printf("PGM: pgm_timer_restart %d\n", pgm_now);)
    inp = tp ? tp->p_inpcb : NULL ;
    if (inp == NULL)
        inp = pgmcb.lh_first ;
    if (inp == NULL)
        return ;
    /*
     * Search through pcb's for an active sender who needs us.
     */
    for (; inp != NULL; inp = ipnxt) {
        ipnxt = inp->inp_list.le_next;
        so = inp->inp_socket ;
        tp = intopgmcb(inp) ;
        if (tp->state != PGM_SENDER || tp->p_template == NULL)
            continue ;
        /* if there is any pgmcb that needs work, restart */
        if (tp->odata_curr ||       /* have something to transmit */
            tp->rdata_head ||       /* have something to retransmit */
            tp->numbytes < 0 ||     /* need to fixup credits */
            so->so_snd.sb_mb ||     /* have something to flush */
            tp->flushbytes < 0      /* need to fixup credits */
            ) {
            pgm_timer_active = 1;
            timeout(pgm_timer, (caddr_t)NULL, 1);
            return ;
        }
    }
    return;
}

/*
 * The following procedure decides whether to send NAKs or otherwise
 * handle timeouts for missing packets. Called by slowtimo.
 */
static void
pgm_handle_naks(struct pgmcb *tp)
{
    struct pgm_pkt_q *q = tp->seg_next ;
    int is_hole = 0;
    int ncf_timeouts=0, rdata_timeouts = 0 , nak_count = 0 ;

    for ( ; !PGM_Q_HEAD(q, tp) ; q = q->seg_next ) {
        if (q->type != T_NAK)
            continue ;
#if PGM_NEW_TIMER /* new timer handling */
        if ( --q->u.new_timer.timeout > 0 )
            continue;
        if ( q->u.new_timer.type == NAK_TIMEOUT ) {
            /* Can send NAK only if have path_nla from an SPM */
            if (tp->path_nla.s_addr) {
                tp->nak_curr = q->dp_seq ;
                pgm_output(tp, PGM_NAK_TYPE);
                if (q == tp->seg_next)
                    printf("++ send NAK for first miss %d\n", q->dp_seq);
            }
            nak_count++ ;
            q->u.new_timer.type = NCF_TIMEOUT ;
            q->u.new_timer.timeout = q->u.new_timer.ncf_to_ivl ;
        } else { /* basically the same thing here... */
            int toomuch ;

            if ( q->u.new_timer.type == NCF_TIMEOUT ) {
                /* q->u.new_timer.ncf_to_ivl *= 2 ; */
                toomuch = ( ++q->u.new_timer.ncf_retry >= ncf_retries ) ;
                if (q == tp->seg_next)
                    printf("++ NCF_TIMEOUT %d %d for first miss %d\n",
                        q->u.new_timer.ncf_retry, toomuch, q->dp_seq);
            } else { /* RDATA_TIMEOUT */
                /* q->u.new_timer.rdata_to_ivl *= 2 ; */
                toomuch = ( ++q->u.new_timer.rdata_retry >= rdata_retries ) ;
                if (q == tp->seg_next)
                    printf("++ RDATA_TIMEOUT %d %d for first miss %d\n",
                        q->u.new_timer.rdata_retry, toomuch, q->dp_seq);
            }
            if ( toomuch ) {
                q->type = T_HOLE; /* irrecov. loss due to missing NCF */
                is_hole = 1;
                if ( q->u.new_timer.type == NCF_TIMEOUT )
                    ncf_timeouts++ ;
                else
                    rdata_timeouts++ ;
            } else { /* schedule nak ... */
                q->u.new_timer.type = NAK_TIMEOUT ;
                q->u.new_timer.timeout =
                    (nak_bo_ivl / 2) * (1 + random() % nak_bo_ivl);
            }
        }
#else /* old timer handling */
        DEB(printf("**** NAK_GEN_IVL for seq %u: %d\n",
            q->dp_seq, q->u.n.to_gen_ivl );)
        if (q->u.n.to_gen_ivl < 0) {
            /* irrecoverable due to missing RDATA */
            rdata_timeouts++ ;
            q->type = T_HOLE;
            is_hole = 1;
            continue;
        }
        DEB(printf("**** NAK_RPT_IVL for seq %u: %d\n",
            q->dp_seq, q->u.n.ticks_left );)
        if (q->u.n.ticks_left < 0 ) { /* irrecov.
                                       loss due to missing NCF */
            ncf_timeouts++ ;
            q->type = T_HOLE;
            is_hole = 1;
            continue;
        }
        DEB(if (q->u.n.got_ncf)
            printf("**** NAK_RDATA_IVL (have NCF wait for RDATA) for %u: %d\n",
                q->dp_seq, q->u.n.timeout );)
        if (--q->u.n.timeout > 0)
            continue ;
        /* ok, we have something to do now */
        q->u.n.to_gen_ivl -= q->u.n.start_timeout;
        if (q->u.n.got_ncf == 1) { /* Got NCF, nak_rdata_ivl timeout */
            q->u.n.ticks_left = nak_rpt_ivl;
            q->u.n.start_timeout = q->u.n.timeout =
                (nak_bo_ivl / 2) * (1 + random() % nak_bo_ivl); /* XXX */
            q->u.n.got_ncf = 0;
            continue;
        }
        q->u.n.ticks_left -= q->u.n.start_timeout;
        /* send NAK */
        q->u.n.start_timeout = q->u.n.timeout =
            (nak_bo_ivl / 2) * (1 + random() % nak_bo_ivl); /* XXX */
        tp->nak_curr = q->dp_seq;
        /* Can send NAK only if have path_nla from an SPM */
        if (tp->path_nla.s_addr) {
            pgm_output(tp, PGM_NAK_TYPE);
            nak_count++ ;
        }
#endif /* old timer handling */
    }
    DEB( if (nak_count > 0 || ncf_timeouts > 0 || rdata_timeouts > 0)
        printf(" sent %d naks, %d rdata timeouts, %d ncf timeouts\n",
            nak_count, rdata_timeouts, ncf_timeouts); )
    if (is_hole)
        pgm_clean_reass(tp);
}

/*
 * Pass up packets from the reassembly queue when possible
 * (in-sequence data or irrecoverable packets "T_HOLE").
 */
void
pgm_clean_reass(struct pgmcb *tp)
{
    struct pgm_pkt_q *q ;
    struct socket *so = tp->p_inpcb->inp_socket;
    int need_wakeup = 0 ;
    int expired = 0 ; /* diagnostic */
    pgm_seq exp_first, exp_last ;

    /*
     * INVARIANT: at each stage, q->dp_seq == tp->rxw_next. If not
     * there is a bad mistake in the code, probably worth a panic.
     */
    while ( PGM_Q_NONEMPTY(tp->seg_next) ) {
        q = tp->seg_next ;
        if (q->dp_seq != tp->rxw_next) { /* check invariant */
            printf("### clean_reass: have %u (rxw_next.%d)\n",
                q->dp_seq, q->dp_seq - tp->rxw_next);
            panic("clean_reass, q->dp_seq != tp->rxw_next");
        }
        if ( PGM_SEQ_LT(tp->rxw_next, tp->rxw_trail) && q->type == T_NAK ) {
            if (!expired) { /* diagnostic */
                expired = 1 ;
                exp_first = q->dp_seq ;
            }
            exp_last = q->dp_seq ;
            q->type = T_HOLE ; /* NAK expired for trail advance */
        }
        if ( q->type == T_NAK ) /* recoverable NAK */
            break ;
        else if ( q->type == T_HOLE ) {
            if (tp->in_hole == 0) { /* add a hole entry */
                need_wakeup = 1 ;
                tp->in_hole = 1 ;
                tp->rxw_hole_start = tp->rxw_next ;
                sbappendaddr(&so->so_rcv, (struct sockaddr *)&pgm_in,
                    NULL, NULL) ;
            }
        } else if (q->type == T_ODATA) {
            if (tp->in_hole)
                break;
            else {
                need_wakeup = 1;
                tp->reass_q_bufs-- ;
                sbappendaddr(&so->so_rcv, (struct sockaddr *)&pgm_in,
                    q->u.d.m, NULL);
            }
        } else {
            printf("--- OUCH! unrecognised seg type %d\n", q->type);
        }
        /*
         * record gone. advance next,trail, unlink from queue
         */
        tp->rxw_next++;
        if (PGM_SEQ_LT(tp->rxw_trail, tp->rxw_next ) )
            tp->rxw_trail = tp->rxw_next ;
        q->dp_seq = 0; /* to mark errors */
        q->seg_next->seg_prev = q->seg_prev ;
        q->seg_prev->seg_next = q->seg_next ;
        free(q, M_PCB);
    }
    if (expired)
        printf("+++ %u .. %u expired for trail advance\n",
            exp_first, exp_last);
    if (need_wakeup)
        sorwakeup(so);
}

/*
 * pgm_rx_in is the main handler for packets in receivers.
 * Only called by pgm_dispatch on receive sockets with
 * an already-assigned TSI.
 * The reassembly queue is below the socket buffer.
 */
static void
pgm_rx_in(struct pgmcb *tp, struct pgmhdr *ph, struct mbuf *m)
{
    struct pgm_pkt_q *me = NULL, *q ;
    pgm_seq dp_seq = 0, dp_trail, dp_lead ;
    pgm_seq seq; /* first pkt we can recover */
    struct pgm_spm *spm;
    int pgmhdrlen = sizeof(struct pgmhdr) ;

    if (ph->options & PGM_OPT_PRESENT) {
        /*
         * XXX TODO: process options in incoming packets.
         * First option must be PGM_OPT_LENGTH, last PGM_OPT_END
         */
        goto fail ; /* XXX until we can deal with them... */
    }
    if (tp->state == PGM_RECEIVER) {
        /*
         * First ODATA or SPM makes receiver socket connected
         * (maybe also NCF and NAKs ?)
         */
        if (ph->type == PGM_OD_TYPE)
            dp_seq = ntohl(ph->od_dp_seq);
        else if (ph->type == PGM_SPM_TYPE) {
            if ((m = m_pullup(m, sizeof(struct pgm_spm))) == 0) {
                pgmstat.pgms_hdrops++;
                return;
            }
            spm = mtod(m, struct pgm_spm *);
            dp_seq = ntohl(spm->body.spm_le_seq) + 1 ;
        } else
            goto fail ;
        tp->state = PGM_RX_CONNECTED ;
        tp->rxw_lookahead = PGM_DEFAULT_LOOKAHEAD ;
        tp->rxw_irs = tp->rxw_trail = tp->rxw_lead = tp->rxw_next = dp_seq ;
    }
    /*
     * now we are in connected state.
     */
    switch (ph->type) {
    case PGM_NCF_TYPE:
        /*
         * XXX TODO -- i might use the info to detect missing pkts.
         */
        dp_seq = ntohl(ph->nak_req_seq) ;
        if (PGM_SEQ_GEQ(dp_seq, tp->rxw_trail) &&
                PGM_SEQ_LEQ(dp_seq, tp->rxw_lead)) {
            DEB(printf("++++ NCF received for seq %lu\n",
                ntohl(ph->nak_req_seq));)
            for (q = tp->seg_next ; !PGM_Q_HEAD(q,tp) ; q = q->seg_next)
                if ( q->type == T_NAK &&
                        q->dp_seq == ntohl(ph->nak_req_seq) ) {
#if PGM_NEW_TIMER
                    q->u.new_timer.type = RDATA_TIMEOUT ;
                    q->u.new_timer.timeout = q->u.new_timer.rdata_to_ivl ;
                    q->u.new_timer.ncf_retry = 0 ;
#else
                    q->u.n.got_ncf = 1;
                    q->u.n.start_timeout = q->u.n.timeout = nak_rdata_ivl;
#endif
                    break;
                }
        }
        m_freem(m);
        return;
        break;

    case PGM_OD_TYPE:
    case PGM_RD_TYPE:
        dp_seq = dp_lead = ntohl(ph->od_dp_seq);
        dp_trail = ntohl(ph->od_txw_trail);
        DEB( if (ph->type == PGM_RD_TYPE)
            printf("++++ RDATA received trail %u seq %u \n",
                dp_trail, dp_seq);)
        DEB( if (ph->type == PGM_OD_TYPE)
            printf("++++ ODATA received trail %u seq %u \n",
                dp_trail, dp_seq);)
        /*
         * XXX TODO enforce receive window size limitations, dropping
         * the most recent packets. Not trivial, as rxw_size is
         * measured in bytes, not packets; packets are spread between
         * the socket buffer and the reassembly queue; and we don't know
         * how big are the holes.
         */
        /* set lookahead to 1/2 of the current window or min 10 pkts */
        if (PGM_SEQ_GT(tp->rxw_lead, tp->rxw_trail + 20) )
            tp->rxw_lookahead = (tp->rxw_lead - tp->rxw_trail) / 2 ;
        else
            tp->rxw_lookahead = PGM_DEFAULT_LOOKAHEAD ;
        /*
         * check for in-window packet.
         */
        if (PGM_SEQ_LT(dp_seq, tp->rxw_trail) )
            goto fail ; /* way too old */
        if ( PGM_SEQ_GT(dp_seq, tp->rxw_lead + tp->rxw_lookahead) ) {
            /* new one... might be rogue, check trail is in-window */
            if (PGM_SEQ_LT(dp_trail, tp->rxw_trail) ||
                    PGM_SEQ_GT(dp_trail, tp->rxw_lead + tp->rxw_lookahead) ) {
                DDB(printf("--- data %u out-of-window (%u,%u + %d), drop\n",
                    dp_seq, tp->rxw_trail, tp->rxw_lead,
                    tp->rxw_lookahead); );
                goto fail ;
            }
        }
        /*
         * update rxw_lead and rxw_trail
         */
        if (PGM_SEQ_GT(dp_lead, tp->rxw_lead))
            tp->rxw_lead = dp_lead;
        if (PGM_SEQ_GT(dp_trail, tp->rxw_trail))
            tp->rxw_trail = dp_trail;
        /* strip off pgm header */
        m->m_len -= sizeof(struct pgmhdr);
        m->m_pkthdr.len -= sizeof(struct pgmhdr);
        m->m_data += sizeof(struct pgmhdr);
        break;

    case PGM_SPM_TYPE:
        if ((m = m_pullup(m, sizeof(struct pgm_spm))) == 0) {
            pgmstat.pgms_hdrops++;
            return;
        }
        /*
         * XXX todo: check that the SPM is a recent one
         */
        spm = mtod(m, struct pgm_spm *);
        dp_trail = ntohl(ph->spm_txw_trail);
        dp_lead = ntohl(spm->body.spm_le_seq);
        DDB(printf("pgm_rx_in: SPM from 0x%lx [%u,%u]\n",
            ntohl(spm->body.path_nla.s_addr), dp_trail, dp_lead););
        tp->path_nla = spm->body.path_nla;
        tp->src_nla.s_addr = pgm_in.sin_addr.s_addr ;
        m_freem(m);
        m = NULL ; /* don't need pkt anymore, only lead..trail markers */
        if (PGM_SEQ_GT(dp_trail, tp->rxw_trail)) {
            /* trail advanced, cleanup */
            tp->rxw_trail = dp_trail;
            pgm_clean_reass(tp);
        }
        if ( PGM_SEQ_LEQ(dp_lead, tp->rxw_lead) )
            return;
        /*
         * If I get here: dp_lead > rxw_lead, and must insert entries
         * for NAK after rxw_lead. XXX check the code!
         */
        dp_seq = tp->rxw_lead = dp_lead;
        break;

    case PGM_NAK_TYPE :
        /* XXX TODO
         * handle this for suppression and hole detection
         */
    default:
        printf("pgm_rx_in: discarding type %d\n", ph->type);
        goto fail ;
    }
#if 0 /* still incomplete... */
    /*
     * optimized common case. In sequence delivery and empty
     * reass. queue and not in_hole. XXX still incomplete
     */
    if (dp_seq == tp->rxw_next ) {
        tp->rxw_next++ ;
        if (PGM_SEQ_LT(tp->rxw_trail, tp->rxw_next))
            tp->rxw_trail = tp->rxw_next ;
        pgmstat.pgms_rcvpack++ ;
        /* pgmstat.pgms_rcvbyte += ... */
        sbappendaddr(&so->so_rcv, (struct sockaddr *)&pgm_in, m, NULL) ;
        if ( PGM_Q_NONEMPTY(tp->seg_next) )
            goto present ; /* possibly deliver other pkts already queued */
        sorwakeup(so);
        return ;
    }
#endif
    /*
     * XXX at the moment we only get here with ODATA/RDATA/SPM. Should we
     * decide to use NAK/NCF to detect holes, make sure m is NULL so we
     * can tell the two cases.
     */
    DEB(printf("pgm_rx_in: insert %d [%d, %d]\n",
        dp_seq - tp->rxw_irs,
        tp->seg_next == (struct pgm_pkt_q *)tp ? -1 :
            tp->seg_next->dp_seq - tp->rxw_irs,
        tp->seg_prev == (struct pgm_pkt_q *)tp ? -1 :
            tp->seg_prev->dp_seq - tp->rxw_irs ););
    /*
     * locate place to insert. dp_seq is the current packet, q points
     * initially to the last record. After the scan, seq is the seqno of
     * the first missing packet, and q points to the record after which
     * we must insert new entries (which are seq..dp_seq inclusive).
     */
    q = tp->seg_prev;
    if ( PGM_Q_HEAD(q, tp) ) {
        seq = tp->rxw_trail; /* queue empty, first missing is trail */
        /*
         * check the special case of trail going beyond next
         * XXX this can be optimized a lot!
         */
        if (PGM_SEQ_GT(seq, tp->rxw_next))
            seq = tp->rxw_next ;
    } else if ( PGM_SEQ_GT(dp_seq, q->dp_seq) )
        seq = q->dp_seq+1 ; /* pkt newer than last, start after that. */
    else if (m == NULL) /* we are in the middle, but this is not data. */
        return ; /*XXX maybe should not happen ?
                  */
    else { /* we are in the middle, need a full scan */
        for (q = tp->seg_next ; !PGM_Q_HEAD(q,tp) ; q = q->seg_next) {
            if (q->dp_seq == dp_seq) { /* entry already existing... */
                if (q->type == T_ODATA) { /* duplicate */
                    pgmstat.pgms_rcvduppack++ ;
                    goto fail ;
                } else { /* fill the hole */
                    tp->reass_q_bufs++ ;
                    q->type = T_ODATA;
                    q->u.d.m = m ;
                    goto present ; /* XXX only if q == tp->seg_next */
                }
            }
        }
        panic("pgm_rx_in: should not get here!!!\n");
    }
    /*
     * insert an entry for each missing packet
     */
    DEB( printf("trail=%d rcv_next=%d seq=%d dp_seq=%d lead=%d\n",
        tp->rxw_trail - tp->rxw_irs, tp->rxw_next - tp->rxw_irs,
        seq - tp->rxw_irs, dp_seq - tp->rxw_irs,
        tp->rxw_lead - tp->rxw_irs); );
    for (; PGM_SEQ_LEQ(seq, dp_seq) ; seq++ ) {
        me = malloc(sizeof(*me), M_PCB, M_NOWAIT);
        if (me == NULL)
            goto fail ;
        bzero(me, sizeof(*me) );
        me->dp_seq = seq ; /* index of the missing/new packet */
        if ( seq == dp_seq && m != NULL ) {
            me->type = T_ODATA ;
            me->u.d.m = m;
            tp->reass_q_bufs++ ;
        } else { /* set retransmission state */
            me->type = T_NAK ;
#if PGM_NEW_TIMER
            me->u.new_timer.type = NAK_TIMEOUT ;
            me->u.new_timer.timeout =
                (nak_bo_ivl / 2) * (1 + random() % nak_bo_ivl);
            me->u.new_timer.ncf_to_ivl = ncf_to_ivl ;
            me->u.new_timer.rdata_to_ivl = rdata_to_ivl ;
            me->u.new_timer.ncf_retry = me->u.new_timer.rdata_retry = 0 ;
#else
            me->u.n.start_timeout = me->u.n.timeout =
                (nak_bo_ivl / 2) * (1 + random() % nak_bo_ivl); /* XXX */
            me->u.n.ticks_left = nak_rpt_ivl ;
            me->u.n.to_gen_ivl = nak_gen_ivl;
#endif
        }
        /*
         * insert into queue
         */
        me->seg_next = q->seg_next ;
        me->seg_prev = q ;
        q->seg_next = me ;
        me->seg_next->seg_prev = me ;
        q = me ;
    }
    /*
     * try to figure out if we can request some retransmission
     * to fill holes, and check if we can pass one or more packets
     * up to the socket buffer.
     */
present:
    pgm_clean_reass(tp);
    return ;
fail:
    m_freem(m);
    return ;
}

/*
 * called by ip_input to demux the packet to the appropriate place(s).
 * Runs at splnet.
 */
void
pgm_input(struct mbuf *m, int iphlen)
{
    struct ip *ip;
    struct pgmhdr ph ;
    struct inpcb *inp;
    struct pgmcb *tp ;
    int len;
    u_int16_t pkt_sport ;
    struct inpcb *last;

    DEB(printf("PGM: pgm_input\n");)
    pgmstat.pgms_ipackets++;
    if (iphlen > sizeof (struct ip)) { /* Strip IP options, if any. */
        ip_stripoptions(m, (struct mbuf *)0);
        iphlen = sizeof(struct ip);
    }
    /*
     * Get IP and PGM header together in first mbuf (can still be
     * a cluster, so shared in copies).
     */
    ip = mtod(m, struct ip *);
    if (m->m_len < iphlen + sizeof(struct pgmhdr)) {
        if ((m = m_pullup(m, iphlen + sizeof(struct pgmhdr))) == 0) {
            pgmstat.pgms_hdrops++;
            return;
        }
        ip = mtod(m, struct ip *);
    }
    ph = *(struct pgmhdr *)((caddr_t)ip + iphlen);
    /*
     * Make mbuf data length reflect PGM length.
     * If not enough data to reflect PGM length, drop.
     */
    len = ntohs( ph.tpdu_len);
    if (ip->ip_len != len) {
        printf("pgm_input: ip_len %d len %d\n", ip->ip_len, len);
        if (len > ip->ip_len || len < sizeof(struct pgmhdr)) {
            pgmstat.pgms_badlen++;
            printf("pgm_input: bad len\n");
            m_freem(m);
            return ;
        }
        m_adj(m, len - ip->ip_len);
        /* ip->ip_len = len; */
    }
    /*
     * Construct sockaddr format source address, to be used in sbappendaddr.
     */
    pgm_in.sin_port = ph.ph_sport ;
    pgm_in.sin_addr = ip->ip_src ;
    pgm_in.gsid_low = ph.gsid_low ;
    pgm_in.gsid_high = ph.gsid_high ;
    pgm_in.sport = ph.ph_sport ;
    pkt_sport = (ph.type == PGM_NAK_TYPE ? ph.ph_dport : ph.ph_sport) ;
    /* strip off IP header, not needed anymore here. */
    m->m_len -= iphlen;
    m->m_pkthdr.len -= iphlen;
    m->m_data += iphlen;
    /*
     * Checksum PGM header and data. Note, the IP header is not included.
     */
    if (ph.checksum) {
        u_int16_t old_sum = ph.checksum ;

        ph.checksum = in_cksum(m, len );
        if (ph.checksum) {
            pgmstat.pgms_badsum++;
            printf("+++ cksum failed, type %u, len %d, 0x%x -> 0x%x\n",
                ph.type, len, old_sum, ph.checksum);
            m_freem(m);
            return ;
        }
    }
    /*
     * Deliver PGM packets to all matching pcbs. Most are multicast,
     * unicast PGM packets can only be NAK directed to a source.
* In principle we could go straight to the (only) pcb, but we cannot * use in_pcblookup_hash() for this as it checks faddr to filter * basing on the source IP, so we scan the whole list ourselves. * * To avoid mcopy'ing in case of a single destination, record the * matching position in "last", and handle it only when another match * is found. The final pass is done without copying. */ inp = pgmcb.lh_first ; last = NULL; /* * Now we look for matching inpcbs. * NOTA BENE: if on the same host we have a sender and receivers, * and a unicast NAK arrives, we will (correctly!) find a match only for * sender's inpcb. In fact ip->ip_dst.s_addr for NAK is a unicast address * and inp->inp_laddr.s_addr for receiver is a MC address. */ for (; inp != NULL ; inp = inp->inp_list.le_next) { DEB( printf( "pgm_input: packet SRC 0x%08lx/0x%04x -> DST 0x%08lx/0x%04x type %d\n" " socket FGN 0x%08lx/0x%04x -> LOC 0x%08lx/0x%04x\n", ntohl(ip->ip_src.s_addr), ntohs(ph.ph_sport), ntohl(ip->ip_dst.s_addr), ntohs(ph.ph_dport), ph.type, ntohl(inp->inp_laddr.s_addr), ntohs(inp->inp_lport), ntohl(inp->inp_faddr.s_addr), ntohs(inp->inp_fport) ); ) /* * various checks for a matching socket. We need to match: * + local port (always) * + foreign port (except for raw receiver where it is 0); * + local addr (for receiver it is multicast, for sender * it is the unicast IP of output interface) * + and finally, the full TSI * We _cannot_ match the foreign address, on the receiver because * packets might come from multiple sources, on the sender because * NAKs might come from multiple receivers/NEs. */ if (inp->inp_lport != ph.ph_dport) continue; /* local port not matching */ if (inp->inp_fport != 0 && inp->inp_fport != ph.ph_sport) continue; /* foreign port not matching */ /* * On the receiver, laddr is the MC group address. * On the sender, laddr is the unicast IP of the out interface. * XXX why do we check for INADDR_ANY ??? 
*/ if (inp->inp_laddr.s_addr != INADDR_ANY && inp->inp_laddr.s_addr != ip->ip_dst.s_addr) continue; /* local addr. not matching */ tp = intopgmcb(inp); if (tp->state == PGM_NEW || tp->state == PGM_CLOSED) continue; DEB(printf("pgm_input: tp->TSI 0x%08lx.%04x.0x%04x\n", ntohl(tp->gsid_low), ntohs(ph.gsid_high), ntohs(tp->sport) );) /* * check full TSI */ if ( tp->have_gsi && ( tp->gsid_low != ph.gsid_low || tp->gsid_high != ph.gsid_high || tp->sport != pkt_sport ) ) { printf("---- TSI match failed 0x%08lx.%04x.%04x\n", ntohl(ph.gsid_low), ntohs(ph.gsid_high), ntohs(pkt_sport) ); continue ; /* TSI does not match */ } DEB(printf("pgm_input: found descriptor 0x%p state %d for type %d\n", inp, tp->state, ph.type);); if (last != NULL) { struct mbuf *my_m = m_copypacket(m, M_DONTWAIT) ; if (my_m != NULL) pgm_dispatch(last, my_m, &ph); } last = inp; } if (last) pgm_dispatch(last, m, &ph); else { /* No matching pcb found; discard datagram. */ pgmstat.pgms_noportbcast++; m_freem(m); DDB(printf("pgm_input: no matching socket\n");) } } /* * pass the packet to the pcb, possibly copying if not the last one */ static void pgm_dispatch(struct inpcb *last, struct mbuf *m, struct pgmhdr *ph) { struct pgmcb *tp = intopgmcb(last); switch (tp->state) { case PGM_RX_CONNECTED: case PGM_RECEIVER: if (tp->have_gsi) { /* run the receiver state machine */ pgm_rx_in(tp, ph, m); sorwakeup(last->inp_socket); } else { /* can only be PGM_RECEIVER, a raw receiver */ if (sbappendaddr(&last->inp_socket->so_rcv, (struct sockaddr *)&pgm_in, m, NULL) == 0) { pgmstat.pgms_fullsock++; m_freem(m); return ; } sorwakeup(last->inp_socket); /* XXX */ } return ; case PGM_SENDER: if (tp->p_template == NULL) /* connect not done yet... */ break ; /* * Multicast an NCF in response to ANY NAK, then schedule RDATA * for in-window requests. 
*/ if (ph->type == PGM_NAK_TYPE) { /* sources only use NAKs */ pgm_seq i = tp->nak_curr = ntohl(ph->nak_req_seq); DDB(printf("++++ NAK received for seq %lu\n", i ); ) pgm_output(tp, PGM_NCF_TYPE); /* send NCF multicast to the group */ if (PGM_SEQ_GEQ(i, tp->txw_trail) && PGM_SEQ_LEQ(i, tp->txw_lead) ) insert_rdata_q( tp, i ); /* * any options should be processed here... */ } /* * XXX NNAK handling still missing... */ break; default: printf("pgm_input: state not recognized: should not be here !!!\n"); break; } m_freem(m); return; } /* * Send ODATA packets. Called either directly, or by the traffic shaper. * MUST BE CALLED AT splnet() OR ABOVE. * This, and pgm_rdata_move, send as many bytes as available or allowed * by the traffic shaper (the credit, scaled by 8*hz, is tp->numbytes). * The bandwidth budget should include ODATA, RDATA and SPM. * NOTE: pointers are advanced in pgm_output() !!! */ void pgm_odata_move(struct pgmcb *tp) { struct pgm_rdata_q *q; DEB(printf("PGM: pgm_odata_move\n"); ) while ( tp->odata_curr != NULL ) { int len_scaled = 8 * hz * tp->odata_curr->m_pkthdr.len ; if (tp->numbytes < len_scaled) break; tp->numbytes -= len_scaled; pgm_output(tp, PGM_OD_TYPE); /* * now implement the window advance policy for this pkt. * The seqno is txw_lead -- we don't have an mbuf pointer * for this anymore as odata_curr has been moved forward. 
*/ switch (tp->trail_advance_policy) { case TRAIL_ADVANCE_TIMER: q = malloc( sizeof (*q), M_PCB, M_NOWAIT); DEB(printf("pgm_odata_move: malloc trail %p\n", q);) if (q == NULL) { printf("--- OUCH, cannot allocate record to expire ODATA...\n"); } else { q->next = NULL ; if (tp->odata_trail_head == NULL) { tp->odata_trail_head = q ; q->ticks = tp->odata_lifetime ; } else { tp->odata_trail_tail->next = q ; q->ticks = tp->odata_ticks_from_last_insert ; } tp->odata_ticks_from_last_insert = 0 ; tp->odata_trail_tail = q ; } break ; case TRAIL_ADVANCE_DATA: printf("TRAIL_ADVANCE_DATA still unimplemented\n"); #if 0 /* still incomplete... */ /* do things in the timer handler */ tp->txw_trail++ ; #endif break ; case TRAIL_ADVANCE_USER: printf("TRAIL_ADVANCE_USER still unimplemented\n"); #if 0 /* XXX incomplete! */ must set socket in non blocking mode. #endif break ; default: break ; /* should not arrive here! */ } } if (pgm_timer_active == 0) pgm_timer_restart(tp); } /* * Send RDATA packets. Called by the traffic shaper. */ void pgm_rdata_move(struct pgmcb *tp) { struct pgm_rdata_q *r; DEB(printf("pgm_rdata_move\n"); ) while ( (r = tp->rdata_head) != NULL ) { int len_scaled = 8 * hz * r->mb->m_pkthdr.len; if (tp->numbytes < len_scaled) break; tp->numbytes -= len_scaled; pgm_output(tp, PGM_RD_TYPE); } } /* * Insert pgm_rdata_q element ordered by seqno. Retrieve mbuf ptr * by moving (seq - txw_trail) steps in the mbuf chain. * We check before that the requested segment is in-window. */ static void insert_rdata_q(struct pgmcb *tp, pgm_seq seq) { struct pgm_rdata_q *p, *q, *r; struct mbuf *m; int diff = seq - tp->txw_trail ; /* how many steps must go in the queue. */ struct socket *so = pgmcbtoso(tp); /* first, locate position in queue after which to insert. */ for (p = NULL, r = tp->rdata_head ; r != NULL ; p = r, r = r->next) if (r->seq == seq) return ; /* nothing to do, entry already existing */ else if (r->seq > seq) break; /* * Allocate a descriptor. 
If this fails, just ignore the request (should * record the failure in some statistics). */ q = (struct pgm_rdata_q *) malloc(sizeof(*q), M_PCB, M_NOWAIT); if (q == NULL) return ; /* locate mbuf pointer */ for (m = so->so_snd.sb_mb; m && diff > 0 ; m = m->m_nextpkt, diff-- ) ; if (m == NULL) { printf("insert_rdata_q: want %u trail-lead %u, %u\n", seq, tp->txw_trail, tp->txw_lead); panic("++ insert_rdata_q, mbuf not found\n"); } q->next = r; q->seq = seq; q->mb = m; if (p == NULL) tp->rdata_head = q; else p->next = q; } /*** end of pgm_timer.c ***/ /* * pgm_usrreq.c - 19990810 * * Copyright (c) 1999 Luigi Rizzo * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE.
* * $Id$ */ #include #include #include #include #include #include #include #include #include /* * There are some differences between FreeBSD 2.x and 3.x. * Mainly, in_pcb*() routines have a last argument p, * and the *_ctloutput has changed. For the former, we use * __2_or_3(x, y, z) to filter the third arg. * A few macros near the end help with the different interface. */ #if __FreeBSD__ >= 3 #include #define __2_or_3(x, y, z) x, y, z #else #define __2_or_3(x, y, z) x, y #endif #include #include #include #include #include #include #include #include #include #include #define DEB(x) #define DDB(x) x /* * PGM protocol interface to socket abstraction. */ #ifndef PGMCBHASHSIZE #define PGMCBHASHSIZE 128 #endif struct inpcbhead pgmcb; struct inpcbinfo pgmcbinfo ; static struct pgmcb *pgm_newpgmcb(struct inpcb *inp); static struct pgmiphdr *pgm_template(struct pgmcb *tp); /* * pgm_sendspace and pgm_recvspace are the default send and receive window * sizes, respectively. */ u_long pgm_sendspace = 1024*64; SYSCTL_INT(_net_inet_pgm, PGMCTL_SENDSPACE, sendspace, CTLFLAG_RW, &pgm_sendspace , 0, "PGM sender buffer space"); u_long pgm_recvspace = 1024*64; SYSCTL_INT(_net_inet_pgm, PGMCTL_RECVSPACE, recvspace, CTLFLAG_RW, &pgm_recvspace , 0, "PGM receive buffer space"); static int pgm_bandwidth = 16000 ; SYSCTL_INT(_net_inet_pgm, OID_AUTO, bandwidth, CTLFLAG_RW, &pgm_bandwidth , 0, "PGM send rate, bits/s"); static int pgm_odata_lifetime = 30 ; /* measured in seconds */ SYSCTL_INT(_net_inet_pgm, OID_AUTO, odata_lifetime, CTLFLAG_RW, &pgm_odata_lifetime , 0, "PGM ODATA lifetime, seconds"); static int pgmcksum = 1 ; /* do checksum */ SYSCTL_INT(_net_inet_pgm, OID_AUTO, pgmcksum, CTLFLAG_RW, &pgmcksum , 1, "Enable PGM checksums"); static int pgm_gsid_low = 0x12345678 ; SYSCTL_INT(_net_inet_pgm, OID_AUTO, gsid_low, CTLFLAG_RW, &pgm_gsid_low , 0x12345678, "PGM GSI-low (32 bit)"); static int pgm_gsid_high = 0x9abc ; SYSCTL_INT(_net_inet_pgm, OID_AUTO, gsid_high, CTLFLAG_RW, 
&pgm_gsid_high , 0x9abc, "PGM GSI-high (16 bit)"); /* * There is a router_alert mbuf in igmp.c which could be reused. * We do not reuse it only to avoid removing the static declaration * in igmp.c, but this should be fixed later. XXX */ static struct mbuf *pgm_router_alert; /* * Create a new PGM control block, making an * empty reassembly queue and hooking it to the argument * protocol control block. */ static struct pgmcb * pgm_newpgmcb(struct inpcb *inp) { struct pgmcb *tp; tp = malloc(sizeof(*tp), M_PCB, M_NOWAIT); if (tp == NULL) return ((struct pgmcb *)0); bzero((char *) tp, sizeof(struct pgmcb)); tp->p_inpcb = inp; tp->seg_next = tp->seg_prev = (struct pgm_pkt_q *)tp; /* * set fields as required. */ tp->state = PGM_NEW ; tp->txw_max_rte = pgm_bandwidth ; /* bits/s */ /* * note here trail > lead, meaning the window is empty */ tp->spm_sqn = 0 /* random() */ ; tp->txw_lead = 0 /* random() */; /* most recently tx pkt */ tp->txw_trail = tp->txw_lead + 1 ; /* oldest avail. pkt */ tp->rxw_size = pgm_recvspace ; tp->trail_advance_policy = TRAIL_ADVANCE_TIMER ; tp->odata_lifetime = pgm_odata_lifetime * hz ; /* now in ticks */ inp->inp_ppcb = (caddr_t)tp; return (tp); } /* * Output the pkt requested in cmd (additional info is in the pgmcb). * Also do any necessary state update (e.g. sequence numbers, * pointers, remove RDATA records...) */ int pgm_output(struct pgmcb *tp, int cmd) { struct inpcb *inp = tp->p_inpcb ; struct mbuf *m ; struct pgmiphdr *pi ; u_int16_t len ; /* pgm header+options+payload */ int error ; struct mbuf *opt = NULL ; int pgmoptlen = 0 ; if (tp->p_template == NULL) { printf("-- OUCH ! template not allocated !\n"); return EINVAL; } /* * First, allocate data (or extended header) portion.
*/ switch(cmd) { default: printf("pgm_output: unsupported type\n"); return EINVAL; case PGM_NAK_TYPE: { struct pgm_ack_body *nak; MGETHDR(m, M_DONTWAIT, MT_HEADER); if (m == NULL) return ENOBUFS; /* leave room for link and protocol headers if possible */ if (max_linkhdr + sizeof(struct pgmiphdr) + sizeof(*nak) <= MHLEN) m->m_data += max_linkhdr + sizeof(struct pgmiphdr) ; m->m_len = m->m_pkthdr.len = sizeof(*nak); nak = mtod(m, struct pgm_ack_body *); nak->src_nla.s_addr = tp->src_nla.s_addr ; nak->nla_afi2 = htons( 1 ); /* this is for IPv4 */ nak->rsvd2 = 0 ; nak->mc_nla = inp->inp_laddr; } break ; case PGM_NCF_TYPE: { struct pgm_ack_body *ncf; MGETHDR(m, M_DONTWAIT, MT_HEADER); if (m == NULL) return ENOBUFS; /* leave room for link and protocol headers if possible */ if (max_linkhdr + sizeof(struct pgmiphdr) + sizeof(*ncf) <= MHLEN) m->m_data += max_linkhdr + sizeof(struct pgmiphdr) ; m->m_len = m->m_pkthdr.len = sizeof(*ncf); ncf = mtod(m, struct pgm_ack_body *); ncf->src_nla = inp->inp_laddr; ncf->nla_afi2 = htons( 1 ); /* IPv4 AFI */ ncf->rsvd2 = 0; ncf->mc_nla = inp->inp_faddr; } opt = pgm_router_alert ; break; case PGM_SPM_TYPE: { struct pgm_spm_body *spm ; MGETHDR(m, M_DONTWAIT, MT_HEADER); if (m == NULL) return ENOBUFS; /* leave room for link and protocol headers if possible */ /* XXX maybe add room for options as well ? 
*/ if (max_linkhdr + sizeof(struct pgmiphdr) + sizeof(*spm) <= MHLEN) m->m_data += max_linkhdr + sizeof(struct pgmiphdr) ; m->m_len = m->m_pkthdr.len = sizeof(*spm); spm = mtod(m, struct pgm_spm_body *) ; spm->spm_le_seq = htonl( tp->txw_lead ) ; spm->nla_afi = htons( 1 ) ; /* IPv4 AFI */ spm->rsvd = 0 ; spm->path_nla = inp->inp_laddr; DEB( printf("Send SPM for nla 0x%lx\n", ntohl(spm->path_nla.s_addr) ); ) tp->spm_sqn++ ; } opt = pgm_router_alert ; break ; case PGM_OD_TYPE: /* copy packet from socket buffer */ m = m_copypacket( tp->odata_curr, M_DONTWAIT ); if (m == NULL) return ENOBUFS; break ; case PGM_RD_TYPE: /* copy packet from socket buffer */ m = m_copypacket( tp->rdata_head->mb, M_DONTWAIT); if (m == NULL) return ENOBUFS; opt = pgm_router_alert ; break; } /* * Fill in mbuf with extended pgm header and stuff... */ M_PREPEND(m, pgmoptlen + sizeof(struct pgmiphdr), M_DONTWAIT); if (m == NULL) return ENOBUFS; pi = mtod(m, struct pgmiphdr *); bcopy(tp->p_template, pi, sizeof(*pi) ); if (pgmoptlen) { /* XXX TODO: * copy pgm options at the right place, using the mbuf copy * functions if they don't fit in the first mbuf... */ pi->pi_p.options |= PGM_OPT_PRESENT ; } len = m->m_pkthdr.len - sizeof(struct ip) ; pi->pi_p.tpdu_len = htons( len ) ; /* * Fill up any remaining fields in the header, and update state * in the control block (pointers, sequence numbers, etc.). 
*/ switch(cmd) { case PGM_NAK_TYPE: { u_int16_t *p = (u_int16_t *)&(pi->pi_p._seq2); /* the _seq2 field in NAK is 16-bit NLA AFI, 16 bit reserved */ p[0] = htons(1) ; /* NLA AFI for src IP */ p[1] = 0 ; /* reserved for src IP */ pi->pi_p.type = PGM_NAK_TYPE; pi->pi_p.nak_req_seq = htonl(tp->nak_curr); pi->pi_dst = tp->path_nla; } break ; case PGM_NCF_TYPE: pi->pi_p.type = PGM_NCF_TYPE; pi->pi_p.nak_req_seq = htonl(tp->nak_curr); DEB(printf("pgm_output: NCF for seq %u\n", tp->nak_curr);) break; case PGM_SPM_TYPE: pi->pi_p.type = PGM_SPM_TYPE ; pi->pi_p.spm_txw_trail = htonl(tp->txw_trail); pi->pi_p.spm_seq = htonl(tp->spm_sqn); break ; case PGM_OD_TYPE: tp->odata_curr = tp->odata_curr->m_nextpkt ; /* advance ptr */ tp->txw_lead++ ; pi->pi_p.type = PGM_OD_TYPE; pi->pi_p.od_txw_trail = htonl(tp->txw_trail) ; pi->pi_p.od_dp_seq = htonl(tp->txw_lead); DDB(printf("pgm_output: ODATA packet seq %u\n", tp->txw_lead);) break ; case PGM_RD_TYPE: { struct pgm_rdata_q *r = tp->rdata_head ; pi->pi_p.type = PGM_RD_TYPE; pi->pi_p.od_txw_trail = htonl(tp->txw_trail); pi->pi_p.od_dp_seq = htonl(r->seq); DDB(printf("++++ pgm_output: RDATA for packet %u\n", r->seq);) tp->rdata_head = r->next; /* advance pointer and free queue */ free(r, M_PCB); } break; } /* * PGM checksum starts from the PGM header. It _does not_ include * the IP header (or the pseudoheader for what matters). */ pi->pi_sum = 0 ; if ( pgmcksum ) { /* skip ip header... 
*/ m->m_len -= sizeof(struct ip) ; m->m_pkthdr.len -= sizeof(struct ip) ; m->m_data += sizeof(struct ip) ; if ((pi->pi_sum = in_cksum(m, len)) == 0) pi->pi_sum = 0xffff ; m->m_len += sizeof(struct ip) ; m->m_pkthdr.len += sizeof(struct ip) ; m->m_data -= sizeof(struct ip) ; } pgmstat.pgms_opackets++; ((struct ip *)pi)->ip_len = len + sizeof (struct ip) ; ((struct ip *)pi)->ip_ttl = inp->inp_ip_ttl; /* XXX */ ((struct ip *)pi)->ip_tos = inp->inp_ip_tos; /* XXX */ error = ip_output(m, opt, &inp->inp_route, inp->inp_socket->so_options & (SO_DONTROUTE | SO_BROADCAST), inp->inp_moptions); if (error) DDB(printf("--- pgm_output: ip_output error\n");) return error ; } static int pgm_abort(struct socket *so) { struct inpcb *inp = sotoinpcb(so); struct pgmcb *tp = NULL ; int s ; struct pgm_rdata_q *q ; DDB(printf("pgm_abort\n");) if (inp == NULL) return EINVAL; s = splnet() ; /* XXX not sure it is really needed... */ tp = intopgmcb(inp); tp->state = PGM_CLOSED ; /* so input will not touch us */ /* free data structures... 
*/ while ( (q = tp->rdata_head) != NULL ) { tp->rdata_head = q->next; free(q, M_PCB); } while ( (q = tp->odata_trail_head) != NULL ) { tp->odata_trail_head = q->next; free(q, M_PCB); } DEB(printf("trail_q done.\n");) while ( PGM_Q_NONEMPTY(tp->seg_next) ) { struct pgm_pkt_q *q = tp->seg_next ; if (q->type == T_ODATA && q->u.d.m) { tp->reass_q_bufs-- ; m_freem(q->u.d.m); } tp->seg_next = q->seg_next ; free(q, M_PCB); } if (tp->reass_q_bufs > 0) printf("pgm_abort: reass_q left %d\n", tp->reass_q_bufs ); /* * Need to loop on sbdroprecord to clean the buffer, as sbflush * would stop on the first hole (zero-len record) */ while (so->so_rcv.sb_mb) sbdroprecord(&so->so_rcv); if (tp->p_template) m_free(dtom(tp->p_template)); free(tp, M_PCB); inp->inp_ppcb = 0; soisdisconnected(so); in_pcbdetach(inp); splx(s); return 0 ; } static int #if __FreeBSD__ >= 3 pgm_attach(struct socket *so, int proto, struct proc *p) #else pgm_attach(struct socket *so, int proto) #endif { struct inpcb *inp = NULL; /* XXX BC was sotoinpcb(so) */ struct pgmcb *tp = NULL; int error, s; DEB(printf("pgm_attach\n");) if (inp != NULL) return EINVAL; s = splnet(); error = in_pcballoc( __2_or_3(so, &pgmcbinfo, p) ); if (error) goto done; inp = sotoinpcb(so); tp = pgm_newpgmcb(inp); if (tp == 0) { int nofd = so->so_state & SS_NOFDREF; /* XXX */ so->so_state &= ~SS_NOFDREF; /* don't free the socket yet */ in_pcbdetach(inp); so->so_state |= nofd; error = ENOBUFS ; goto done; } error = soreserve(so, pgm_sendspace, pgm_recvspace); if (error) goto done; ((struct inpcb *) so->so_pcb)->inp_ip_ttl = ip_defttl; done: splx(s); return error; } /* * API: we can do a bind() on both rx and tx sockets, only allowed * in PGM_NEW state. Bind is optional for a sender, and if done must * precede the connect() call. * sin_addr, sin_port: anything (both sender and receiver) * For a receiver, sin_addr = multicast address, * sin_port = destination port (local endpoint).
* For a sender, sin_addr is usually INADDR_ANY (or an IP for a * local interface), sin_port is either 0 or the chosen port. * gsid == 0, sport == 0: get all matching pkts (raw, receiver only) * gsid == 0, sport == sin_port: commit as sender. * (we still need a connect...) * gsid != 0, sport != 0: commit as receiver * once committed, cannot bind again. * */ static int #if __FreeBSD__ >= 3 pgm_bind(struct socket *so, struct sockaddr *nam, struct proc *p) #else pgm_bind(struct socket *so, struct mbuf *nam) #endif { struct inpcb *inp = sotoinpcb(so); struct pgmcb *tp = NULL ; int error =0, s ; DEB(printf("pgm_bind\n");) if (inp == NULL) return EINVAL; tp = intopgmcb(inp); s = splnet(); if (tp->state != PGM_NEW ) { printf("bind: socket already committed\n"); error = EINVAL ; } if (error == 0) error = in_pcbbind( __2_or_3(inp, nam, p) ); DEB(printf("in_pcbbind error: %d\n",error);) if (error == 0) { #if __FreeBSD__ >= 3 struct sockaddr_pgm *sin = (struct sockaddr_pgm *) nam; #else struct sockaddr_pgm *sin = mtod(nam, struct sockaddr_pgm *); #endif if (sin->gsid_high == 0 && sin->gsid_low == 0) { if (sin->sport == sin->sin_port) { printf("sin_port == sport, sender mode\n"); tp->state = PGM_SENDER; tp->sport = inp->inp_lport; } else if (sin->sport == 0) { /* receiver...
just listen for everything */ printf("soisconnecting for everything\n"); tp->state = PGM_RECEIVER ; soisconnecting(so); } else { printf("bind: invalid sport with gsi == 0\n"); error = EINVAL ; } } else { if (sin->sport == 0) { printf("bind: invalid sport with gsi != 0\n"); error = EINVAL ; } else { tp->have_gsi = 1 ; tp->gsid_low = sin->gsid_low ; tp->gsid_high = sin->gsid_high ; tp->sport = inp->inp_fport = sin->sport ; tp->state = PGM_RECEIVER ; /* pgm_template needs TSI and state initialized in the pgmcb */ tp->p_template = pgm_template(tp); if (tp->p_template == 0) error = ENOBUFS; else { printf("soisconnecting for tsi\n"); soisconnecting(so); } } } } splx(s); return error ; } static int #if __FreeBSD__ >= 3 pgm_connect(struct socket *so, struct sockaddr *nam, struct proc *p) #else pgm_connect(struct socket *so, struct mbuf *nam) #endif { int error = 0, s; struct inpcb *inp = sotoinpcb(so); struct pgmcb *tp = intopgmcb(inp); DEB(printf("pgm: pgm_connect\n");); if (tp->state != PGM_NEW && tp->state != PGM_SENDER) { printf("-- invalid state %d for connect\n", tp->state); return EINVAL ; } s = splnet(); /* If the socket has not been bound with a local port, * in_pcbbind assigns one automatically */ if (inp->inp_lport == 0) error = in_pcbbind( __2_or_3(inp, nam, p) ); if (error == 0) { if (inp->inp_faddr.s_addr != INADDR_ANY) { printf("pgm_connect: faddr != INADDR_ANY, 0x%lx\n", ntohl(inp->inp_faddr.s_addr ) ); error = EISCONN; } else error = in_pcbconnect( __2_or_3(inp, nam, p) ); } if (error == 0) { tp->state = PGM_SENDER; tp->sport = inp->inp_lport; tp->have_gsi = 1; tp->gsid_low = htonl(pgm_gsid_low); tp->gsid_high = htons(pgm_gsid_high); /* pgm_template needs TSI and state initialized in the pgmcb */ tp->p_template = pgm_template(tp); if (tp->p_template == 0) { in_pcbdisconnect(inp); error = ENOBUFS ; } else { soisconnected(so); } } splx(s); printf("pgm_connect laddr 0x%lx 0x%x, faddr 0x%lx 0x%x\n", ntohl(inp->inp_laddr.s_addr), ntohs(inp->inp_lport), 
ntohl(inp->inp_faddr.s_addr), ntohs(inp->inp_fport) ); return error ; } static int pgm_detach(struct socket *so) { DDB(printf("pgm_detach\n"); ); return pgm_abort(so); } static int pgm_disconnect(struct socket *so) { DDB(printf("pgm_disconnect\n"); ); return pgm_abort(so); } /* * at the moment, can only work in state PGM_SENDER and with * a valid template. */ static int #if __FreeBSD__ >= 3 pgm_send(struct socket *so, int flags, struct mbuf *m, struct sockaddr *nam, struct mbuf *control, struct proc *p) #else pgm_send(struct socket *so, int flags, struct mbuf *m, struct mbuf *nam, struct mbuf *control) #endif { struct inpcb *inp= sotoinpcb(so); struct pgmcb *tp = intopgmcb(inp); int s, error = 0; DEB(printf("pgm_send\n");) if (control) { printf("pgm: PRU_SEND: control_len %d\n", control->m_len); m_freem(control); /* XXX shouldn't caller do this??? */ } if (nam) { printf("pgm: PRU_SEND: don't want an address!\n"); m_freem(m); return EISCONN ; } if (tp->state != PGM_SENDER || tp->p_template == NULL) { printf("--- pgm_send: socket not ready to send\n"); m_freem(m); /* we own the mbuf, do not leak it on error */ return EINVAL ; } s = splnet(); sbappendrecord(&so->so_snd, m); /* * If there are no pending bufs, this record is * also the next one to transmit. */ if (tp->odata_curr == NULL) tp->odata_curr = m ; pgm_odata_move(tp); /* send through the traffic shaper */ splx(s); return error; /* don't want to free the buffer */ } static int pgm_shutdown(struct socket *so) { struct inpcb *inp = sotoinpcb(so); DDB(printf("pgm_shutdown\n"); ); if (inp == 0) return EINVAL; /* maybe should schedule a FIN ?
*/ socantsendmore(so); return 0 ; } struct pr_usrreqs pgm_usrreqs = { pgm_abort, pru_accept_notsupp, pgm_attach, pgm_bind, pgm_connect, pru_connect2_notsupp, in_control, pgm_detach, pgm_disconnect, pru_listen_notsupp, in_setpeeraddr, pru_rcvd_notsupp, pru_rcvoob_notsupp, pgm_send, pru_sense_null, pgm_shutdown, in_setsockaddr #if __FreeBSD__ >= 3 , sosend, soreceive, sopoll #endif }; /* * pgm_template creates a template pkt for the sender. * We depend on some fields (gsid, sport, dport) being initialized * earlier. */ static struct pgmiphdr * pgm_template(struct pgmcb *tp) { struct inpcb *inp = tp->p_inpcb; struct pgmiphdr *pi; pi = tp->p_template ; if ( pi == NULL ) { struct mbuf *m = m_get(M_DONTWAIT, MT_HEADER); if (m == NULL) return NULL ; m->m_len = sizeof (struct pgmiphdr); pi = mtod(m, struct pgmiphdr *); } else { printf("+++ warning, p_template already set, state %d\n", tp->state); } bzero( pi, sizeof (*pi) ); /* IP header */ pi->pi_pr = IPPROTO_PGM ; pi->pi_len = htons(sizeof (struct pgmhdr) ) ; if (tp->state == PGM_RECEIVER) pi->pi_src.s_addr = INADDR_ANY; else pi->pi_src = inp->inp_laddr; pi->pi_dst = inp->inp_faddr; /* PGM header */ pi->pi_p.ph_sport = inp->inp_lport; pi->pi_p.ph_dport = inp->inp_fport; pi->pi_p.type = PGM_OD_TYPE; pi->pi_p.options = 0; pi->pi_p.checksum = 0; pi->pi_p.gsid_low = tp->gsid_low; pi->pi_p.gsid_high = tp->gsid_high; pi->pi_p.tpdu_len = pi->pi_p._seq1 = pi->pi_p._seq2 = 0; return pi ; } void pgm_drain() { printf("PGM: pgm_drain()\n"); } void pgm_init() { struct ipoption *ra; /* * init hash list for pgm control blocks */ LIST_INIT(&pgmcb) ; pgmcbinfo.listhead = &pgmcb ; pgmcbinfo.hashbase = hashinit(PGMCBHASHSIZE, M_PCB, &pgmcbinfo.hashmask); #if __FreeBSD__ >= 3 pgmcbinfo.porthashbase = hashinit(PGMCBHASHSIZE, M_PCB, &pgmcbinfo.porthashmask); pgmcbinfo.ipi_zone = zinit("pgmcb", sizeof(struct inpcb), maxsockets, ZONE_INTERRUPT, 0); #endif /* * update global variables */ if (max_protohdr < sizeof(struct pgmhdr)) max_protohdr = 
sizeof(struct pgmhdr) ; if (max_linkhdr + sizeof(struct pgmhdr) > MHLEN) panic("pgm_init: headers too long"); /* * Construct a Router Alert option to use in outgoing packets */ MGET(pgm_router_alert, M_DONTWAIT, MT_DATA); /* XXX might fail, just hope not! */ ra = mtod(pgm_router_alert, struct ipoption *); ra->ipopt_dst.s_addr = 0; ra->ipopt_list[0] = IPOPT_RA; /* Router Alert Option */ ra->ipopt_list[1] = 0x04; /* 4 bytes long */ ra->ipopt_list[2] = 0x00; ra->ipopt_list[3] = 0x00; pgm_router_alert->m_len = sizeof(ra->ipopt_dst) + ra->ipopt_list[1]; } void pgm_ctlinput(int cmd, struct sockaddr *sa, void *vip) { printf("PGM: pgm_ctlinput\n"); } /* * sockopt interface changed between 2.2 and 3.x. Some macros here help */ #if __FreeBSD__ >= 3 #define CTLOUTPUT_LEVEL sopt->sopt_level #define CTLOUTPUT_OP sopt->sopt_dir #define CTLOUTPUT_OP_SET SOPT_SET #define CTLOUTPUT_OP_GET SOPT_GET #define CTLOUTPUT_OP_NAME sopt->sopt_name #define CTLOUTPUT_ARGS ( struct socket *so, struct sockopt *sopt) #define GET_INT_ARG(x) \ error = sooptcopyin(sopt, &x, sizeof x, sizeof x); #define PUT_INT_ARG(x) \ optval = (x) ; error = sooptcopyout(sopt, &optval, sizeof optval); #else /* for FreeBSD 2.2.x */ #define CTLOUTPUT_LEVEL level #define CTLOUTPUT_OP op #define CTLOUTPUT_OP_SET PRCO_SETOPT #define CTLOUTPUT_OP_GET PRCO_GETOPT #define CTLOUTPUT_OP_NAME optname #define CTLOUTPUT_ARGS (int op, struct socket *so, int level, \ int optname, struct mbuf **mp) #define GET_INT_ARG(x) \ if ((*mp) == NULL || (*mp)->m_len != sizeof (int)) error = EINVAL; \ else x = *mtod( (*mp), int *) ; #define PUT_INT_ARG(x) \ *mp = m_get(M_WAIT, MT_SOOPTS); (*mp)->m_len = sizeof(int); \ *mtod( (*mp) , int *) = (x) ; #endif int pgm_ctloutput CTLOUTPUT_ARGS { int error = 0, optval, s; struct inpcb *inp; struct pgmcb *tp; s = splnet(); /* really too coarse locking... 
*/ inp = sotoinpcb(so); if (inp == NULL) { splx(s); #if __FreeBSD__ < 3 if (op == PRCO_SETOPT && *mp) m_free(*mp); #endif return (ECONNRESET); } if (CTLOUTPUT_LEVEL != IPPROTO_PGM) { #if __FreeBSD__ >= 3 error = ip_ctloutput(so, sopt); #else error = ip_ctloutput(op, so, level, optname, mp); #endif splx(s); return (error); } tp = intopgmcb(inp); switch (CTLOUTPUT_OP) { case CTLOUTPUT_OP_SET: switch (CTLOUTPUT_OP_NAME) { case PGM_TXW_MAX_RATE: GET_INT_ARG( optval ); if (error) break ; if (optval < 0 || optval > 1000000) error = EINVAL; else tp->txw_max_rte = optval ; break ; case PGM_TRAIL_ADVANCE: /* set trail advance method. */ /* * in case TRAIL_ADVANCE_USER, must make the * socket non-blocking. * in case the policy changes must cleanup old status. */ GET_INT_ARG( optval ); if (error) break; if (optval < 1 || optval > 3) error = EINVAL; else if (tp->trail_advance_policy != optval) { switch (tp->trail_advance_policy) { case TRAIL_ADVANCE_TIMER: /* * XXX check... Deallocate unused structures leaving * this trail advance method. 
*/ while ( tp->odata_trail_head != NULL ) { struct pgm_rdata_q *q = tp->odata_trail_head; tp->odata_trail_head = q->next; free(q, M_PCB); } DDB(printf("trail_q done.\n");) break; case TRAIL_ADVANCE_DATA: /* TODO */ case TRAIL_ADVANCE_USER: break; default: printf("pgm_ctloutput: should not get here !\n"); break; } tp->trail_advance_policy = optval; } break; case PGM_ODATA_LIFETIME: /* sets odata lifetime */ if (tp->trail_advance_policy != TRAIL_ADVANCE_TIMER) { /* XXX */ printf("This option can be used only with TRAIL_ADVANCE_TIMER policy\n"); error = EINVAL; break; } GET_INT_ARG( optval ); if (error) break; if (optval < 0 || optval > 1000) /* XXX: 1000 ?*/ error = EINVAL; else tp->odata_lifetime = optval * hz; /* now in ticks */ break; default: error = ENOPROTOOPT; break; } #if __FreeBSD__ < 3 if ( *mp ) m_free( *mp ); #endif break; case CTLOUTPUT_OP_GET: switch (CTLOUTPUT_OP_NAME) { case PGM_TXW_MAX_RATE: PUT_INT_ARG( tp->txw_max_rte ) ; break ; case PGM_HOLE_SIZE: PUT_INT_ARG (tp->rxw_next - tp->rxw_hole_start ); tp->in_hole = 0; pgm_clean_reass(tp); break; default: error = ENOPROTOOPT; break; } break; } splx(s); return (error); }