My own version of openvswitch, derived from
http://openvswitch.org/releases/openvswitch-1.1.0pre2.tar.gz

- LOCAL PORT
  The bridge local port represents an entry point to the normal network
  stack. It is implemented with a tap device. This device is opened by the
  'dpif' module and links all the ports belonging to the bridge with the
  network stack through the tap file descriptor.

- TAP DEVICES AS VIRTUAL DEVICES
  I've implemented the capability to create and open a tap device by
  specifying it on the command line (e.g. in the --ports option of
  ovs-openflowd). When a device is given with a name of the form 'tap:tapN'
  (e.g. --ports=tap:tap1), the software attempts to create and open a tap
  device named 'tapN'. If the device already exists, it only attempts to
  open it. Without the 'tap:' prefix (e.g. --ports=tap1), the software only
  attempts to open the device, which must have been created beforehand
  (e.g. with 'ifconfig tap1 create').

***********************************

Compilation:

The compilation of the netdev-bsd file requires some additional LDFLAGS.

1. For the time being, leave the Makefile unmodified and run the configure
   script as follows:

   bash# export LDFLAGS='-lrt -lpcap -lpthread'
   bash# ./configure

   This adds the realtime, pcap and pthread libraries to the Makefile.

2. Or modify the Makefile LDFLAGS with the following line:

   LDFLAGS = -lrt -lpcap -lpthread

3. The fake generator is meant to test the performance of openflowd while
   avoiding the overhead introduced by the write()/read() functions. It is
   enabled by the -DFAKE_GEN define and consists of:
   - creating a fake packet;
   - always returning the fake packet in netdev_bsd_recv_system();
   - not waiting on the involved interface;
   - always returning success from netdev_bsd_send().
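   The four points above can be illustrated with a minimal sketch. It is not
   the code used by the patch: the names fake_pkt, fake_pkt_init(),
   fake_recv() and fake_send(), the 60-byte frame size and the frame
   contents are all assumptions made for illustration. It only shows the
   general idea of a receive path that never touches a device and a send
   path that always succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_PKT_LEN 60   /* assumed size: minimum Ethernet frame without FCS */

/* Hypothetical fake frame; the contents used by the real patch are not shown. */
static uint8_t fake_pkt[FAKE_PKT_LEN];

/* Build the fake frame once: broadcast destination, locally administered
 * source address, IPv4 ethertype, zeroed payload. */
static void
fake_pkt_init(void)
{
    static const uint8_t src[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };

    memset(fake_pkt, 0, sizeof fake_pkt);
    memset(fake_pkt, 0xff, 6);               /* destination MAC */
    memcpy(fake_pkt + 6, src, sizeof src);   /* source MAC */
    fake_pkt[12] = 0x08;                     /* ethertype 0x0800 (IPv4) */
    fake_pkt[13] = 0x00;
}

/* Stand-in for the receive path: never reads from a device, always hands
 * back the fake frame (truncated if the caller's buffer is smaller). */
static int
fake_recv(void *data, size_t size)
{
    size_t n = size < sizeof fake_pkt ? size : sizeof fake_pkt;

    assert(data != NULL);
    memcpy(data, fake_pkt, n);
    return (int) n;
}

/* Stand-in for the send path: drops the frame and reports success. */
static int
fake_send(const void *data, size_t size)
{
    (void) data;
    (void) size;
    return 0;
}
```

   In a build with -DFAKE_GEN, functions like these would presumably replace
   the bodies of the real receive and send callbacks, so that only the
   switching path is measured.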
   To enable the fake generation, add a define in the Makefile:

   DEFS= -DFAKE_GEN

   and configure openflowd with "em0" and "em1":

   ./ovs-openflowd netdev@dp0 --ports=em0,em1 tcp:127.0.0.1:6635 --listen=ptcp:6634 --out-of-band
   ./ovs-controller -w --max-idle=permanent ptcp:6636

   Then insert the forwarding entry into the flow table with:

   ./ovs-ofctl add-flow tcp:127.0.0.1:6634 "in_port=1 idle_timeout=0 priority=65535 actions=output:2"

   This is mandatory because initially the switch does not know where to
   send the packets (ARP replies are missing for fake packets) and will
   continuously flood the packets across the whole network.

4. Commands

   ./ovs-ofctl dump-flows tcp:127.0.0.1:6634

   ovs-openflowd              # The OpenFlow server
     tcp:127.0.0.1:6635       # socket listening for the controller
     --listen=ptcp:6634       # socket listening for ofctl
     --out-of-band            # used to avoid errors on controller connection

   ovs-ofctl                  # Sends commands to and queries the server
     add-flow                 # adds a flow entry
     dump-flows               # dumps the active flows

   ifconfig dp0 destroy       # to be used when the ovs server crashes and
                              # does not reset the interfaces
   arp -d 192.168.1.2         # to be used to force an ARP request/reply;
                              # useful while running tests with netrate,
                              # because the flow table entry is added when
                              # a packet has a reply from a port

6. Linux kernel module (ONGOING)

6.1 create the var run directory, only for the first execution

    mkdir -p /usr/local/var/run/openvswitch/

6.2 load the openvswitch module

    insmod datapath/linux-2.6/openvswitch_mod.ko

6.3 start the controller and the server

    ./ovs-controller -w --max-idle=permanent ptcp:6636
    ./ovs-openflowd system@dp0 --ports=eth2,eth4 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band

7.
Linux kernel user space

7.1 create the var run directory, only for the first execution

    mkdir -p /usr/local/var/run/openvswitch/

7.2 launch controller, server and test

    ./ovs-controller -w --max-idle=permanent ptcp:6636
    ./ovs-openflowd netdev@dp0 --ports=eth2,eth4 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band

    LP MARTA: netsend 33K      PC AMD: ipfw 32900
    LP MARTA: ipfw 32200       PC AMD: netsend 34000

7.3 configure 4 ports and test

    ./ovs-openflowd netdev@dp0 --ports=eth2,eth4,eth6,eth7 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band

    LP MARTA: netsend 33K      PC AMD: ipfw 31700
    LP MARTA: netsend 33K      PC AMD: ipfw 33K
    LP MARTA: netsend 34K      PC AMD: ipfw 30K
    LP MARTA: netsend 33K      PC AMD: ipfw 29K
    LP MARTA: netsend 33K      PC AMD: ipfw 30K
    LP MARTA: netsend 33K      PC AMD: ipfw 30K

# Errors with the linux device
May 25 15:53:26|00007|pktbuf|WARN|cookie mismatch: 0000029a != 0000039a
May 25 15:53:26|00008|pktbuf|WARN|cookie mismatch: 0000029b != 0000039b

8. netmap

The netmap code is enabled by wrapping the pcap functions, which is done by
changing the loader search path. Start the switch with:

(LD_LIBRARY_PATH=. ./ovs-openflowd netdev@dp0 --ports=ix0,ix1 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band)

and the controller as usual.
./ovs-openflowd netdev@dp0 --ports=ix0,ix1 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band

Do not forget to disable checksum offloading on the interfaces:

# ifconfig ix0 -rxcsum -txcsum
# ifconfig ix1 -rxcsum -txcsum

And to insert the rules into the flow table:

# ./ovs-ofctl add-flow tcp:127.0.0.1:6634 "in_port=1 idle_timeout=0 priority=65535 actions=output:2"
# ./ovs-ofctl add-flow tcp:127.0.0.1:6634 "in_port=2 idle_timeout=0 priority=65535 actions=output:1"
# ./ovs-ofctl dump-flows tcp:127.0.0.1:6634

diff -Nur -x '*.svn*' -x '*.gitignore*' orig/acinclude.m4 mod/acinclude.m4
--- orig/acinclude.m4	2010-09-14 06:55:56.000000000 +0200
+++ mod/acinclude.m4	2011-11-29 15:46:30.626847642 +0100
@@ -220,6 +220,17 @@
                [Define to 1 if net/if_packet.h is available.])
    fi])
 
+dnl Checks for net/if_dl.h.
+AC_DEFUN([OVS_CHECK_IF_DL],
+  [AC_CHECK_HEADER([net/if_dl.h],
+                   [HAVE_IF_DL=yes],
+                   [HAVE_IF_DL=no])
+   AM_CONDITIONAL([HAVE_IF_DL], [test "$HAVE_IF_DL" = yes])
+   if test "$HAVE_IF_DL" = yes; then
+      AC_DEFINE([HAVE_IF_DL], [1],
+                [Define to 1 if net/if_dl.h is available.])
+   fi])
+
 dnl Checks for buggy strtok_r.
dnl dnl Some versions of glibc 2.7 has a bug in strtok_r when compiling diff -Nur -x '*.svn*' -x '*.gitignore*' orig/configure.ac mod/configure.ac --- orig/configure.ac 2010-09-14 06:55:56.000000000 +0200 +++ mod/configure.ac 2011-11-29 15:46:30.623514308 +0100 @@ -52,6 +52,7 @@ OVS_CHECK_OVSDBMONITOR OVS_CHECK_ER_DIAGRAMS OVS_CHECK_IF_PACKET +OVS_CHECK_IF_DL OVS_CHECK_STRTOK_R AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec], [], [], [[#include ]]) diff -Nur -x '*.svn*' -x '*.gitignore*' orig/lib/automake.mk mod/lib/automake.mk --- orig/lib/automake.mk 2010-09-14 06:55:56.000000000 +0200 +++ mod/lib/automake.mk 2011-11-29 15:46:23.960180769 +0100 @@ -187,6 +187,13 @@ lib/rtnetlink.h endif +if HAVE_IF_DL +lib_libopenvswitch_a_SOURCES += \ + lib/netdev-bsd.c \ + lib/rtbsd.c \ + lib/rtbsd.h +endif + if HAVE_OPENSSL lib_libopenvswitch_a_SOURCES += lib/stream-ssl.c nodist_lib_libopenvswitch_a_SOURCES += lib/dhparams.c diff -Nur -x '*.svn*' -x '*.gitignore*' orig/lib/dpif-netdev.c mod/lib/dpif-netdev.c --- orig/lib/dpif-netdev.c 2010-09-14 06:55:56.000000000 +0200 +++ mod/lib/dpif-netdev.c 2011-11-29 15:46:23.966847437 +0100 @@ -1043,7 +1043,7 @@ struct ofpbuf packet; struct dp_netdev *dp; - ofpbuf_init(&packet, DP_NETDEV_HEADROOM + max_mtu); + ofpbuf_init(&packet, DP_NETDEV_HEADROOM + VLAN_ETH_HEADER_LEN + max_mtu); LIST_FOR_EACH (dp, struct dp_netdev, node, &dp_netdev_list) { struct dp_netdev_port *port; diff -Nur -x '*.svn*' -x '*.gitignore*' orig/lib/netdev-bsd.c mod/lib/netdev-bsd.c --- orig/lib/netdev-bsd.c 1970-01-01 01:00:00.000000000 +0100 +++ mod/lib/netdev-bsd.c 2011-12-02 10:18:19.027581676 +0100 @@ -0,0 +1,1593 @@ +/* + * Copyright (c) 2011 Gaetano Catalli. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. 
Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED ``AS IS'' WITHOUT ANY WARRANTIES OF ANY KIND. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "rtbsd.h" +#include "coverage.h" +#include "dynamic-string.h" +#include "fatal-signal.h" +#include "netdev-provider.h" +#include "ofpbuf.h" +#include "openflow/openflow.h" +#include "packets.h" +#include "poll-loop.h" +#include "socket-util.h" +#include "shash.h" +#include "svec.h" +#include "vlog.h" + +VLOG_DEFINE_THIS_MODULE(netdev_bsd) + + +/* + * This file implements objects to access interfaces. + * Externally, interfaces are represented by two structures: + * + struct netdev_dev, representing a network device, + * containing e.g. name and a refcount; + * We can have private variables by embedding the + * struct netdev_dev into our own structure + * (e.g. netdev_dev_bsd) + * + * + struct netdev, representing an instance of an open netdev_dev. + * The structure contains a pointer to the 'struct netdev' + * representing the device. Again, private information + * such as file descriptor etc. are stored in our + * own struct netdev_bsd which includes a struct netdev. + * + * Both 'struct netdev' and 'struct netdev_dev' are referenced + * in containers which hold pointers to the data structures. + * We can reach our own struct netdev_XXX_bsd by putting a + * struct netdev_XXX within our own struct, and using CONTAINER_OF + * to access the parent structure. 
+ */ +struct netdev_bsd { + struct netdev netdev; + + int netdev_fd; /* Selectable file descriptor for the network device. + This descriptor will be used for polling operations */ + + pcap_t *pcap_handle; /* Packet capture descriptor for a system network + device */ +}; + +struct netdev_dev_bsd { + struct netdev_dev netdev_dev; + unsigned int cache_valid; + + int ifindex; + uint8_t etheraddr[ETH_ADDR_LEN]; + struct in_addr in4; + struct in6_addr in6; + int mtu; + int carrier; + + bool tap_opened; + int tap_fd; /* TAP character device, if any */ +}; + + +enum { + VALID_IFINDEX = 1 << 0, + VALID_ETHERADDR = 1 << 1, + VALID_IN4 = 1 << 2, + VALID_IN6 = 1 << 3, + VALID_MTU = 1 << 4, + VALID_CARRIER = 1 << 5 +}; + +/* An AF_INET socket (used for ioctl operations). */ +static int af_inet_sock = -1; + +#define PCAP_SNAPLEN 1024 + +/* + * A BSD network device notifier. + * + * Represents a handler to be invoked on a device when some event occurs. + * Contains handler, parameters (in netdev_notifier) and link fields for the + * list (in struct list). + */ +struct netdev_bsd_notifier { + struct netdev_notifier notifier; /* Handler and arguments */ + struct list node; /* Link fields for the list */ +}; + +/* + * All 'struct netdev_bsd_notifier' objects are linked as children of a generic + * 'struct shash_node', there is one shash_node per interface, and the + * interface name is the search key. In turn, all the 'struct shash_node' are + * stored in a container, all_bsd_notifiers. + * + * A 'netdev_bsd_notifier' is created and added to the all_bsd_notifiers + * using 'netdev_bsd_poll_add()' XXX again, same code as netdev-linux + */ +static struct shash all_bsd_notifiers = + SHASH_INITIALIZER(&all_bsd_notifiers); + +/* + * Openvswitch can register multiple handlers on route-related events. + * The descriptor for each handler is a struct rtbsd_notifier + * that contains the function and a parameter. 
+ * + * In this module we call rtbsd_notifier_register() to invoke + * the function netdev_bsd_poll_cb() on the all_bsd_notifiers above. + */ +static struct rtbsd_notifier netdev_bsd_poll_notifier; + +/* + * Notifier used to invalidate device informations in case of status change. + * + * It will be registered with a 'rtbsd_notifier_register()' when the first + * device will be created with the call of either 'netdev_bsd_tap_create()' or + * 'netdev_bsd_system_create()'. + * + * The callback associated with this notifier ('netdev_bsd_cache_cb()') will + * invalidate cached information about the device. + */ +static struct rtbsd_notifier netdev_bsd_cache_notifier; +static int cache_notifier_refcount; + +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + +static int netdev_bsd_do_ioctl(const struct netdev *, struct ifreq *, + unsigned long cmd, const char *cmd_name); +static void destroy_tap(int fd, const char *name); +static int get_flags(const struct netdev *, int *flagsp); +static int set_flags(struct netdev *, int flags); +static int do_set_addr(struct netdev *netdev, + int ioctl_nr, const char *ioctl_name, + struct in_addr addr); +static int get_etheraddr(const char *netdev_name, uint8_t ea[ETH_ADDR_LEN]); +static int set_etheraddr(const char *netdev_name, int hwaddr_family, + int hwaddr_len, const uint8_t[ETH_ADDR_LEN]); +static int get_ifindex(const struct netdev *, int *ifindexp); + +static int netdev_bsd_init(void); + +static bool +is_netdev_bsd_class(const struct netdev_class *netdev_class) +{ + return netdev_class->init == netdev_bsd_init; +} + +static struct netdev_bsd * +netdev_bsd_cast(const struct netdev *netdev) +{ + assert(is_netdev_bsd_class(netdev_dev_get_class(netdev_get_dev(netdev)))); + return CONTAINER_OF(netdev, struct netdev_bsd, netdev); +} + +static struct netdev_dev_bsd * +netdev_dev_bsd_cast(const struct netdev_dev *netdev_dev) +{ + assert(is_netdev_bsd_class(netdev_dev_get_class(netdev_dev))); + return 
CONTAINER_OF(netdev_dev, struct netdev_dev_bsd, netdev_dev); +} + +/* Initialize the AF_INET socket used for ioctl operations */ +static int +netdev_bsd_init(void) +{ + static int status = -1; + + if (status >= 0) { /* already initialized */ + return status; + } + + af_inet_sock = socket(AF_INET, SOCK_DGRAM, 0); + status = af_inet_sock >= 0 ? 0 : errno; + + if (status) { + VLOG_ERR("failed to create inet socket: %s", strerror(status)); + } + + return status; +} + +/* + * Perform periodic work needed by netdev. In BSD netdevs it checks for any + * interface status changes, and eventually calls all the user callbacks. + */ +static void +netdev_bsd_run(void) +{ + rtbsd_notifier_run(); +} + +/* + * Arranges for poll_block() to wake up if the "run" member function needs to + * be called. + */ +static void +netdev_bsd_wait(void) +{ + rtbsd_notifier_wait(); +} + +/* Invalidate cache in case of interface status change. */ +static void +netdev_bsd_cache_cb(const struct rtbsd_change *change, + void *aux OVS_UNUSED) +{ + struct netdev_dev_bsd *dev; + + if (change) { + struct netdev_dev *base_dev = netdev_dev_from_name(change->if_name); + + if (base_dev) { + const struct netdev_class *netdev_class = + netdev_dev_get_class(base_dev); + + if (is_netdev_bsd_class(netdev_class)) { + dev = netdev_dev_bsd_cast(base_dev); + dev->cache_valid = 0; + } + } + } else { + /* + * XXX the API is lacking, we should be able to iterate on the list of + * netdevs without having to store the info in a temp shash. 
+         */
+        struct shash device_shash;
+        struct shash_node *node;
+
+        shash_init(&device_shash);
+        netdev_dev_get_devices(&netdev_bsd_class, &device_shash);
+        SHASH_FOR_EACH (node, &device_shash) {
+            dev = node->data;
+            dev->cache_valid = 0;
+        }
+        shash_destroy(&device_shash);
+    }
+}
+
+static int
+cache_notifier_ref(void)
+{
+    int ret = 0;
+
+    if (!cache_notifier_refcount) {
+        ret = rtbsd_notifier_register(&netdev_bsd_cache_notifier,
+                                      netdev_bsd_cache_cb, NULL);
+        if (ret) {
+            return ret;
+        }
+    }
+    cache_notifier_refcount++;
+    return 0;
+}
+
+static void
+cache_notifier_unref(void)
+{
+    cache_notifier_refcount--;
+    if (cache_notifier_refcount == 0) {
+        rtbsd_notifier_unregister(&netdev_bsd_cache_notifier);
+    }
+}
+
+/* Allocate a netdev_dev_bsd structure */
+static int
+netdev_bsd_create_system(const char *name, const char *type OVS_UNUSED,
+                         const struct shash *args OVS_UNUSED,
+                         struct netdev_dev **netdev_devp)
+{
+    struct netdev_dev_bsd *netdev_dev;
+    int error;
+
+    error = cache_notifier_ref();
+    if (error) {    /* bail out only if registering the notifier failed */
+        return error;
+    }
+
+    netdev_dev = xzalloc(sizeof *netdev_dev);
+    netdev_dev_init(&netdev_dev->netdev_dev, name, &netdev_bsd_class);
+    *netdev_devp = &netdev_dev->netdev_dev;
+
+    return 0;
+}
+
+/*
+ * Allocate a netdev_dev_bsd structure with 'tap' class.
+ */
+static int
+netdev_bsd_create_tap(const char *name, const char *type OVS_UNUSED,
+                      const struct shash *args OVS_UNUSED,
+                      struct netdev_dev **netdev_devp)
+{
+    struct netdev_dev_bsd *netdev_dev = NULL;   /* freed on the error path */
+    int error = 0;
+    struct ifreq ifr;
+
+    error = cache_notifier_ref();
+    if (error) {    /* bail out only if registering the notifier failed */
+        goto error;
+    }
+
+    /* allocate the device structure and set the internal flag */
+    netdev_dev = xzalloc(sizeof *netdev_dev);
+
+    memset(&ifr, 0, sizeof(ifr));
+
+    /* Create a tap device by opening /dev/tap.  The TAPGIFNAME ioctl is used
+     * to retrieve the name of the tap device.
*/ + netdev_dev->tap_fd = open("/dev/tap", O_RDWR); + if (netdev_dev->tap_fd < 0) { + error = errno; + VLOG_WARN("opening \"/dev/tap\" failed: %s", strerror(error)); + goto error_undef_notifier; + } + + /* Retrieve tap name (e.g. tap0) */ + if (ioctl(netdev_dev->tap_fd, TAPGIFNAME, &ifr) == -1) { + /* XXX Need to destroy the device? */ + error = errno; + goto error_undef_notifier; + } + + /* Change the name of the tap device */ + ifr.ifr_data = (void *)name; + if (ioctl(af_inet_sock, SIOCSIFNAME, &ifr) == -1) { + error = errno; + destroy_tap(netdev_dev->tap_fd, ifr.ifr_name); + goto error_undef_notifier; + } + + /* set non-blocking. */ + error = set_nonblocking(netdev_dev->tap_fd); + if (error) { + destroy_tap(netdev_dev->tap_fd, name); + goto error_undef_notifier; + } + + /* Turn device UP */ + ifr.ifr_flags = (uint16_t)IFF_UP; + ifr.ifr_flagshigh = 0; + strncpy(ifr.ifr_name, name, sizeof ifr.ifr_name); + if (ioctl(af_inet_sock, SIOCSIFFLAGS, &ifr) == -1) { + error = errno; + destroy_tap(netdev_dev->tap_fd, name); + goto error_undef_notifier; + } + + /* initialize the device structure and + * link the structure to its netdev */ + netdev_dev_init(&netdev_dev->netdev_dev, name, &netdev_tap_class); + *netdev_devp = &netdev_dev->netdev_dev; + + return 0; + +error_undef_notifier: + cache_notifier_unref(); +error: + free(netdev_dev); + return error; +} + +static void +netdev_bsd_destroy(struct netdev_dev *netdev_dev_) +{ + struct netdev_dev_bsd *netdev_dev = netdev_dev_bsd_cast(netdev_dev_); + + cache_notifier_unref(); + + if (netdev_dev->tap_fd >= 0 && + !strcmp(netdev_dev_get_type(netdev_dev_), "tap")) { + destroy_tap(netdev_dev->tap_fd, netdev_dev_get_name(netdev_dev_)); + } + free(netdev_dev); +} + + +static int +netdev_bsd_open_system(struct netdev_dev *netdev_dev_, int ethertype, + struct netdev **netdevp) +{ + struct netdev_dev_bsd *netdev_dev = netdev_dev_bsd_cast(netdev_dev_); + struct netdev_bsd *netdev; + int error; + enum netdev_flags flags; + + /* Allocate 
network device. */ + netdev = xcalloc(1, sizeof *netdev); + netdev->netdev_fd = -1; + netdev_init(&netdev->netdev, netdev_dev_); + + /* Verify that the netdev really exists by attempting to read its flags */ + error = netdev_get_flags(&netdev->netdev, &flags); + if (error == ENXIO) { + goto error; + } + + /* The first user that opens a tap port(from dpif_create_and_open()) will + * receive the file descriptor associated with the tap device. Instead, the + * following users will open the tap device as a normal 'system' device. */ + if (!strcmp(netdev_dev_get_type(netdev_dev_), "tap") && + !netdev_dev->tap_opened) { + netdev_dev->tap_opened = true; + netdev->netdev_fd = netdev_dev->tap_fd; + } else if (ethertype != NETDEV_ETH_TYPE_NONE) { + char errbuf[PCAP_ERRBUF_SIZE]; + int one = 1; + + /* open the pcap device. The device is opened in non-promiscuous mode + * because the interface flags are manually set by the caller. */ + netdev->pcap_handle = pcap_open_live(netdev_dev_->name, PCAP_SNAPLEN, + 0, 1000, errbuf); + if (netdev->pcap_handle == NULL) { + error = errno; + goto error; + } + + /* initialize netdev->netdev_fd */ + netdev->netdev_fd = pcap_get_selectable_fd(netdev->pcap_handle); + if (netdev->netdev_fd == -1) { + error = errno; + goto error; + } + + /* Set non-blocking mode. Also the BIOCIMMEDIATE ioctl must be called + * on the file descriptor returned by pcap_get_selectable_fd to achieve + * a real non-blocking behaviour.*/ + error = pcap_setnonblock(netdev->pcap_handle, 1, errbuf); + if (error == -1) { + error = errno; + goto error; + } + + /* This call assure that reads return immediately upon packet reception. + * Otherwise, a read will block until either the kernel buffer becomes + * full or a timeout occurs. 
*/ + if(ioctl(netdev->netdev_fd, BIOCIMMEDIATE, &one) < 0 ) { + VLOG_ERR("ioctl(BIOCIMMEDIATE) on %s device failed: %s", + netdev_dev_get_name(netdev_dev_), strerror(errno)); + error = errno; + goto error; + } + + /* Capture only incoming packets */ + error = pcap_setdirection(netdev->pcap_handle, PCAP_D_IN); + if (error == -1) { + error = errno; + goto error; + } + } + *netdevp = &netdev->netdev; + + return 0; + +error: + netdev_uninit(&netdev->netdev, true); + return error; +} + + +/* Close a 'netdev'. */ +static void +netdev_bsd_close(struct netdev *netdev_) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + + if (netdev->netdev_fd >= 0 && strcmp(netdev_get_type(netdev_), "tap")) { + pcap_close(netdev->pcap_handle); + } + + free(netdev); +} + + +/* Initializes 'svec' with a list of the names of all known network devices. */ +static int +netdev_bsd_enumerate(struct svec *svec) +{ + struct if_nameindex *names; + + names = if_nameindex(); + if (names) { + size_t i; + + for (i = 0; names[i].if_name != NULL; i++) { + svec_add(svec, names[i].if_name); + } + if_freenameindex(names); + return 0; + } else { + VLOG_WARN("could not obtain list of network device names: %s", + strerror(errno)); + return errno; + } +} + +/* The recv callback of the netdev class returns the number of bytes of the + * received packet. + * + * This can be done by the pcap_next() function. Unfortunately pcap_next() does + * not make difference between a missing packet on the capture interface and + * an error during the file capture. We can use the pcap_dispatch() function + * instead, which is able to distinguish between errors and null packet. + * + * To make pcap_dispatch() returns the number of bytes read from the interface + * we need to define the following callback and argument. + */ +struct pcap_arg { + void *data; + int size; + int retval; +}; + +/* + * This callback will be executed on every captured packet. 
+ *
+ * If the packet captured by pcap_dispatch() does not fit the pcap buffer,
+ * pcap returns a truncated packet and we follow this behavior.
+ *
+ * The argument args->retval is the number of bytes copied to the caller.
+ */
+static void
+proc_pkt(u_char *args_, const struct pcap_pkthdr *hdr, const u_char *packet)
+{
+    struct pcap_arg *args = (struct pcap_arg *)args_;
+
+    if (args->size < hdr->caplen) {   /* caplen: bytes actually captured */
+        printf("%s Warning: Packet truncated\n", __func__);
+        args->retval = args->size;
+    } else {
+        args->retval = hdr->caplen;
+    }
+
+    /* copy the packet to our buffer */
+    memcpy(args->data, packet, args->retval);
+}
+
+/*
+ * This function attempts to receive a packet from the specified network
+ * device. It is assumed that the network device is a system device or a tap
+ * device opened as a system one. In this case the read operation is performed
+ * on the 'netdev' pcap descriptor.
+ */
+static int
+netdev_bsd_recv_system(struct netdev_bsd *netdev, void *data, size_t size)
+{
+    struct pcap_arg arg;
+    int ret;
+
+    if (netdev->netdev_fd < 0) {
+        /* Device was opened with NETDEV_ETH_TYPE_NONE. */
+        return -EAGAIN;
+    }
+
+    /* prepare the pcap argument to store the packet */
+    arg.size = size;
+    arg.data = data;
+
+    for (;;) {
+        ret = pcap_dispatch(netdev->pcap_handle, 1, proc_pkt, (u_char *)&arg);
+
+        if (ret > 0) {
+            return arg.retval;	/* arg.retval < 0 is handled in the caller */
+        }
+        if (ret == -1) {
+            if (errno == EINTR) {
+                 continue;
+            }
+        }
+
+        return -EAGAIN;
+    }
+}
+
+/*
+ * This function attempts to receive a packet from the specified network
+ * device. It is assumed that the network device is a tap device and the
+ * 'netdev_fd' member of the 'netdev' structure is initialized with the tap
+ * file descriptor.
+ */
+static int
+netdev_bsd_recv_tap(struct netdev_bsd *netdev, void *data, size_t size)
+{
+    if (netdev->netdev_fd < 0) {
+        /* Device was opened with NETDEV_ETH_TYPE_NONE.
*/
+        return -EAGAIN;
+    }
+
+    for (;;) {
+        ssize_t retval = read(netdev->netdev_fd, data, size);
+        if (retval >= 0) {
+            return retval;
+        } else if (errno != EINTR) {
+            if (errno != EAGAIN) {
+                VLOG_WARN_RL(&rl, "error receiving Ethernet packet on %s: %s",
+                             netdev->netdev.netdev_dev->name, strerror(errno));
+            }
+            return -errno;
+        }
+    }
+}
+
+
+/*
+ * Depending on the nature of the device, a different function must be called.
+ * If the device is the bridge local port the 'netdev_bsd_recv_tap' function
+ * must be called, otherwise the 'netdev_bsd_recv_system' function is called.
+ *
+ * type!="tap"                        --->  system device.
+ * type=="tap" && netdev_fd == tap_fd --->  internal tap device
+ * type=="tap" && netdev_fd != tap_fd --->  internal tap device
+ *                                          opened as a system
+ *                                          device.
+ */
+static int
+netdev_bsd_recv(struct netdev *netdev_, void* data, size_t size)
+{
+    struct netdev_bsd *netdev = netdev_bsd_cast(netdev_);
+    struct netdev_dev_bsd * netdev_dev =
+        netdev_dev_bsd_cast(netdev_get_dev(netdev_));
+
+    if (!strcmp(netdev_get_type(netdev_), "tap") &&
+            netdev->netdev_fd == netdev_dev->tap_fd) {
+        return netdev_bsd_recv_tap(netdev, data, size);
+    } else {
+        return netdev_bsd_recv_system(netdev, data, size);
+    }
+}
+
+
+/*
+ * Registers with the poll loop to wake up from the next call to poll_block()
+ * when a packet is ready to be received with netdev_recv() on 'netdev'.
+ */
+static void
+netdev_bsd_recv_wait(struct netdev *netdev_)
+{
+    struct netdev_bsd *netdev = netdev_bsd_cast(netdev_);
+
+    if (netdev->netdev_fd >= 0) {
+        poll_fd_wait(netdev->netdev_fd, POLLIN);
+    }
+}
+
+/* Discards all packets waiting to be received from 'netdev'.
*/ +static int +netdev_bsd_drain(struct netdev *netdev_) +{ + struct ifreq ifr; + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + + strcpy(ifr.ifr_name, netdev_get_name(netdev_)); + if (ioctl(netdev->netdev_fd, BIOCFLUSH, &ifr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(BIOCFLUSH) failed: %s", + netdev_get_name(netdev_), strerror(errno)); + return errno; + } + return 0; +} + +/* + * Send a packet on the specified network device. The device could be either a + * system or a tap device. + */ +static int +netdev_bsd_send(struct netdev *netdev_, const void *data, size_t size) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + struct netdev_dev_bsd * netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + /* XXX should support sending even if 'ethertype' was NETDEV_ETH_TYPE_NONE. + */ + if (netdev->netdev_fd < 0) { + return EPIPE; + } + + for (;;) { + ssize_t retval; + if (!strcmp(netdev_get_type(netdev_), "tap") && + netdev_dev->tap_fd == netdev->netdev_fd) { + retval = write(netdev->netdev_fd, data, size); + } else { + retval = pcap_inject(netdev->pcap_handle, data, size); + } + if (retval < 0) { + if (errno == EINTR) { + continue; + } else if (errno != EAGAIN) { + VLOG_WARN_RL(&rl, "error sending Ethernet packet on %s: %s", + netdev_get_name(netdev_), strerror(errno)); + } + return errno; + } else if (retval != size) { + VLOG_WARN_RL(&rl, "sent partial Ethernet packet (%zd bytes of " + "%zu) on %s", retval, size, + netdev_get_name(netdev_)); + return EMSGSIZE; + } else { + return 0; + } + } +} + +/* + * Registers with the poll loop to wake up from the next call to poll_block() + * when the packet transmission queue has sufficient room to transmit a packet + * with netdev_send(). + */ +static void +netdev_bsd_send_wait(struct netdev *netdev_) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + + if (netdev->netdev_fd < 0) { /* Nothing to do. 
*/ + return; + } + + if (strcmp(netdev_get_type(netdev_), "tap")) { + poll_fd_wait(netdev->netdev_fd, POLLOUT); + } else { + /* TAP device always accepts packets. */ + poll_immediate_wake(); + } +} + +/* + * Attempts to set 'netdev''s MAC address to 'mac'. Returns 0 if successful, + * otherwise a positive errno value. + */ +static int +netdev_bsd_set_etheraddr(struct netdev *netdev_, + const uint8_t mac[ETH_ADDR_LEN]) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + int error; + + if (!(netdev_dev->cache_valid & VALID_ETHERADDR) + || !eth_addr_equals(netdev_dev->etheraddr, mac)) { + error = set_etheraddr(netdev_get_name(netdev_), AF_LINK, ETH_ADDR_LEN, + mac); + if (!error) { + netdev_dev->cache_valid |= VALID_ETHERADDR; + memcpy(netdev_dev->etheraddr, mac, ETH_ADDR_LEN); + } + } else { + error = 0; + } + return error; +} + +/* + * Returns a pointer to 'netdev''s MAC address. The caller must not modify or + * free the returned buffer. + */ +static int +netdev_bsd_get_etheraddr(const struct netdev *netdev_, + uint8_t mac[ETH_ADDR_LEN]) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!(netdev_dev->cache_valid & VALID_ETHERADDR)) { + int error = get_etheraddr(netdev_get_name(netdev_), + netdev_dev->etheraddr); + if (error) { + return error; + } + netdev_dev->cache_valid |= VALID_ETHERADDR; + } + memcpy(mac, netdev_dev->etheraddr, ETH_ADDR_LEN); + + return 0; +} + +/* + * Returns the maximum size of transmitted (and received) packets on 'netdev', + * in bytes, not including the hardware header; thus, this is typically 1500 + * bytes for Ethernet devices. 
+ */ +static int +netdev_bsd_get_mtu(const struct netdev *netdev_, int *mtup) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!(netdev_dev->cache_valid & VALID_MTU)) { + struct ifreq ifr; + int error; + + error = netdev_bsd_do_ioctl(netdev_, &ifr, SIOCGIFMTU, "SIOCGIFMTU"); + if (error) { + return error; + } + netdev_dev->mtu = ifr.ifr_mtu; + netdev_dev->cache_valid |= VALID_MTU; + } + + *mtup = netdev_dev->mtu; + return 0; +} + +static int +netdev_bsd_get_ifindex(const struct netdev *netdev) +{ + int ifindex, error; + + error = get_ifindex(netdev, &ifindex); + return error ? -error : ifindex; +} + +static int +netdev_bsd_get_carrier(const struct netdev *netdev_, bool *carrier) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!(netdev_dev->cache_valid & VALID_CARRIER)) { + struct ifmediareq ifmr; + + memset(&ifmr, 0, sizeof(ifmr)); + strncpy(ifmr.ifm_name, netdev_get_name(netdev_), sizeof ifmr.ifm_name); + + if (ioctl(af_inet_sock, SIOCGIFMEDIA, &ifmr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(SIOCGIFMEDIA) failed: %s", + netdev_get_name(netdev_), strerror(errno)); + return errno; + } + + netdev_dev->carrier = (ifmr.ifm_status & IFM_ACTIVE) == IFM_ACTIVE; + netdev_dev->cache_valid |= VALID_CARRIER; + + /* If the interface doesn't report whether the media is active, + * just assume it is active. */ + if ((ifmr.ifm_status & IFM_AVALID) == 0) { + netdev_dev->carrier = true; + } + } + *carrier = netdev_dev->carrier; + + return 0; +} + +/* Retrieves current device stats for 'netdev'. 
*/ +static int +netdev_bsd_get_stats(const struct netdev *netdev_, struct netdev_stats *stats) +{ + int if_count, i; + int mib[6]; + size_t len; + struct ifmibdata ifmd; + + COVERAGE_INC(netdev_get_stats); + + mib[0] = CTL_NET; + mib[1] = PF_LINK; + mib[2] = NETLINK_GENERIC; + mib[3] = IFMIB_SYSTEM; + mib[4] = IFMIB_IFCOUNT; + + len = sizeof(if_count); + + if (sysctl(mib, 5, &if_count, &len, (void *)0, 0) == -1) { + VLOG_DBG_RL(&rl, "%s: sysctl failed: %s", + netdev_get_name(netdev_), strerror(errno)); + return errno; + } + + mib[5] = IFDATA_GENERAL; + mib[3] = IFMIB_IFDATA; + len = sizeof(ifmd); + for (i = 1; i <= if_count; i++) { + mib[4] = i; //row + if (sysctl(mib, 6, &ifmd, &len, (void *)0, 0) == -1) { + VLOG_DBG_RL(&rl, "%s: sysctl failed: %s", + netdev_get_name(netdev_), strerror(errno)); + return errno; + } else if (!strcmp(ifmd.ifmd_name, netdev_get_name(netdev_))) { + stats->rx_packets = ifmd.ifmd_data.ifi_ipackets; + stats->tx_packets = ifmd.ifmd_data.ifi_opackets; + stats->rx_bytes = ifmd.ifmd_data.ifi_ibytes; + stats->tx_bytes = ifmd.ifmd_data.ifi_obytes; + stats->rx_errors = ifmd.ifmd_data.ifi_ierrors; + stats->tx_errors = ifmd.ifmd_data.ifi_oerrors; + stats->rx_dropped = ifmd.ifmd_data.ifi_iqdrops; + stats->tx_dropped = 0; + stats->multicast = ifmd.ifmd_data.ifi_imcasts; + stats->collisions = ifmd.ifmd_data.ifi_collisions; + + stats->rx_length_errors = 0; + stats->rx_over_errors = 0; + stats->rx_crc_errors = 0; + stats->rx_frame_errors = 0; + stats->rx_fifo_errors = 0; + stats->rx_missed_errors = 0; + + stats->tx_aborted_errors = 0; + stats->tx_carrier_errors = 0; + stats->tx_fifo_errors = 0; + stats->tx_heartbeat_errors = 0; + stats->tx_window_errors = 0; + break; + } + } + + return 0; +} + +static uint32_t +netdev_bsd_parse_media(int media) +{ + uint32_t supported = 0; + bool half_duplex = media & IFM_HDX ? 
true : false; + + switch (IFM_SUBTYPE(media)) { + case IFM_10_2: + case IFM_10_5: + case IFM_10_STP: + case IFM_10_T: + supported |= half_duplex ? OFPPF_10MB_HD : OFPPF_10MB_FD; + supported |= OFPPF_COPPER; + break; + + case IFM_10_FL: + supported |= half_duplex ? OFPPF_10MB_HD : OFPPF_10MB_FD; + supported |= OFPPF_FIBER; + break; + + case IFM_100_T2: + case IFM_100_T4: + case IFM_100_TX: + case IFM_100_VG: + supported |= half_duplex ? OFPPF_100MB_HD : OFPPF_100MB_FD; + supported |= OFPPF_COPPER; + break; + + case IFM_100_FX: + supported |= half_duplex ? OFPPF_100MB_HD : OFPPF_100MB_FD; + supported |= OFPPF_FIBER; + break; + + case IFM_1000_CX: + case IFM_1000_T: + supported |= half_duplex ? OFPPF_1GB_HD : OFPPF_1GB_FD; + supported |= OFPPF_COPPER; + break; + + case IFM_1000_LX: + case IFM_1000_SX: + supported |= half_duplex ? OFPPF_1GB_HD : OFPPF_1GB_FD; + supported |= OFPPF_FIBER; + break; + + case IFM_10G_CX4: + supported |= OFPPF_10GB_FD; + supported |= OFPPF_COPPER; + break; + + case IFM_10G_LR: + case IFM_10G_SR: + supported |= OFPPF_10GB_FD; + supported |= OFPPF_FIBER; + break; + + default: + return 0; + } + + if (IFM_SUBTYPE(media) == IFM_AUTO) { + supported |= OFPPF_AUTONEG; + } + /* + if (media & IFM_ETH_FMASK) { + supported |= OFPPF_PAUSE; + } + */ + + return supported; +} + +/* + * Stores the features supported by 'netdev' into each of '*current', + * '*advertised', '*supported', and '*peer' that are non-null. Each value is a + * bitmap of "enum ofp_port_features" bits, in host byte order. Returns 0 if + * successful, otherwise a positive errno value. On failure, all of the + * passed-in values are set to 0. 
+ */ +static int +netdev_bsd_get_features(struct netdev *netdev, + uint32_t *current, uint32_t *advertised, + uint32_t *supported, uint32_t *peer) +{ + struct ifmediareq ifmr; + int *media_list; + int i; + int error; + + + /* XXX Look into SIOCGIFCAP instead of SIOCGIFMEDIA */ + + memset(&ifmr, 0, sizeof(ifmr)); + strncpy(ifmr.ifm_name, netdev_get_name(netdev), sizeof ifmr.ifm_name); + + /* We make two SIOCGIFMEDIA ioctl calls. The first to determine the + * number of supported modes, and a second with a buffer to retrieve + * them. */ + if (ioctl(af_inet_sock, SIOCGIFMEDIA, &ifmr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(SIOCGIFMEDIA) failed: %s", + netdev_get_name(netdev), strerror(errno)); + return errno; + } + + media_list = xcalloc(ifmr.ifm_count, sizeof(int)); + ifmr.ifm_ulist = media_list; + + if (IFM_TYPE(ifmr.ifm_current) != IFM_ETHER) { + VLOG_DBG_RL(&rl, "%s: doesn't appear to be ethernet", + netdev_get_name(netdev)); + error = EINVAL; + goto cleanup; + } + + if (ioctl(af_inet_sock, SIOCGIFMEDIA, &ifmr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(SIOCGIFMEDIA) failed: %s", + netdev_get_name(netdev), strerror(errno)); + error = errno; + goto cleanup; + } + + /* Current settings. */ + *current = netdev_bsd_parse_media(ifmr.ifm_active); + + /* Advertised features. */ + *advertised = netdev_bsd_parse_media(ifmr.ifm_current); + + /* Supported features. */ + *supported = 0; + for (i = 0; i < ifmr.ifm_count; i++) { + *supported |= netdev_bsd_parse_media(ifmr.ifm_ulist[i]); + } + + /* Peer advertisements. */ + *peer = 0; /* XXX */ + + error = 0; +cleanup: + free(media_list); + return error; +} + +/* + * If 'netdev' has an assigned IPv4 address, sets '*in4' to that address and + * returns 0. Otherwise, returns a positive errno value. 
+ */ +static int +netdev_bsd_get_in4(const struct netdev *netdev_, struct in_addr *in4, + struct in_addr *netmask) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!(netdev_dev->cache_valid & VALID_IN4)) { + const struct sockaddr_in *sin; + struct ifreq ifr; + int error; + + ifr.ifr_addr.sa_family = AF_INET; + error = netdev_bsd_do_ioctl(netdev_, &ifr, + SIOCGIFADDR, "SIOCGIFADDR"); + if (error) { + return error; + } + + sin = (struct sockaddr_in *) &ifr.ifr_addr; + netdev_dev->in4 = sin->sin_addr; + netdev_dev->cache_valid |= VALID_IN4; + error = netdev_bsd_do_ioctl(netdev_, &ifr, + SIOCGIFNETMASK, "SIOCGIFNETMASK"); + if (error) { + return error; + } + *netmask = ((struct sockaddr_in*)&ifr.ifr_addr)->sin_addr; + } + *in4 = netdev_dev->in4; + + return in4->s_addr == INADDR_ANY ? EADDRNOTAVAIL : 0; +} + +/* + * Assigns 'addr' as 'netdev''s IPv4 address and 'mask' as its netmask. If + * 'addr' is INADDR_ANY, 'netdev''s IPv4 address is cleared. Returns a + * positive errno value. 
+ */ +static int +netdev_bsd_set_in4(struct netdev *netdev_, struct in_addr addr, + struct in_addr mask) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + int error; + + error = do_set_addr(netdev_, SIOCSIFADDR, "SIOCSIFADDR", addr); + if (!error) { + netdev_dev->cache_valid |= VALID_IN4; + netdev_dev->in4 = addr; + if (addr.s_addr != INADDR_ANY) { + error = do_set_addr(netdev_, SIOCSIFNETMASK, + "SIOCSIFNETMASK", mask); + } + } + return error; +} + +static int +netdev_bsd_get_in6(const struct netdev *netdev_, struct in6_addr *in6) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + if (!(netdev_dev->cache_valid & VALID_IN6)) { + struct ifaddrs *ifa, *head; + struct sockaddr_in6 *sin6; + const char *netdev_name = netdev_get_name(netdev_); + + if (getifaddrs(&head) != 0) { + VLOG_ERR("getifaddrs on %s device failed: %s", netdev_name, + strerror(errno)); + return errno; + } + + for (ifa = head; ifa; ifa = ifa->ifa_next) { + if (ifa->ifa_addr->sa_family == AF_INET6 && + !strcmp(ifa->ifa_name, netdev_name)) { + sin6 = (struct sockaddr_in6 *)ifa->ifa_addr; + if (sin6) { + memcpy(&netdev_dev->in6, &sin6->sin6_addr, sizeof netdev_dev->in6); + netdev_dev->cache_valid |= VALID_IN6; + *in6 = netdev_dev->in6; + freeifaddrs(head); + return 0; + } + } + } + return EADDRNOTAVAIL; + } + *in6 = netdev_dev->in6; + return 0; +} + +static void +make_in4_sockaddr(struct sockaddr *sa, struct in_addr addr) +{ + struct sockaddr_in sin; + memset(&sin, 0, sizeof sin); + sin.sin_family = AF_INET; + sin.sin_addr = addr; + sin.sin_port = 0; + + memset(sa, 0, sizeof *sa); + memcpy(sa, &sin, sizeof sin); +} + +static int +do_set_addr(struct netdev *netdev, + int ioctl_nr, const char *ioctl_name, struct in_addr addr) +{ + struct ifreq ifr; + make_in4_sockaddr(&ifr.ifr_addr, addr); + return netdev_bsd_do_ioctl(netdev, &ifr, ioctl_nr, ioctl_name); +} + +static int +nd_to_iff_flags(enum netdev_flags nd) +{ + int iff = 0; + if 
(nd & NETDEV_UP) { + iff |= IFF_UP; + } + if (nd & NETDEV_PROMISC) { + iff |= IFF_PROMISC; + iff |= IFF_PPROMISC; + } + return iff; +} + +static int +iff_to_nd_flags(int iff) +{ + enum netdev_flags nd = 0; + if (iff & IFF_UP) { + nd |= NETDEV_UP; + } + if (iff & IFF_PROMISC) { + nd |= NETDEV_PROMISC; + } + return nd; +} + +static int +netdev_bsd_update_flags(struct netdev *netdev, enum netdev_flags off, + enum netdev_flags on, enum netdev_flags *old_flagsp) +{ + int old_flags, new_flags; + int error; + + error = get_flags(netdev, &old_flags); + if (!error) { + *old_flagsp = iff_to_nd_flags(old_flags); + new_flags = (old_flags & ~nd_to_iff_flags(off)) | nd_to_iff_flags(on); + if (new_flags != old_flags) { + error = set_flags(netdev, new_flags); + } + } + return error; +} + +/* Calls the callbacks of all the notifiers in the list. */ +static void +poll_notify(struct list *dev_notifiers) +{ + struct netdev_bsd_notifier *notifier; + LIST_FOR_EACH (notifier, struct netdev_bsd_notifier, node, dev_notifiers) { + struct netdev_notifier *n = &notifier->notifier; + n->cb(n); + } +} + +/* + * The callback registered for 'netdev_bsd_poll_notifier'. + * + * If 'change' is non-null, it retrieves the element in 'all_bsd_notifiers' + * relative to the network device which has been subject to the change, and + * then calls the callbacks for all the notifiers registered for that network + * device. + */ +static void +netdev_bsd_poll_cb(const struct rtbsd_change *change, void *aux) +{ + struct shash *arg = aux; + + if (change) { + struct list *dev_notifiers = shash_find_data(arg, change->if_name); + if (dev_notifiers) { + poll_notify(dev_notifiers); + } + } else { + struct shash_node *node; + SHASH_FOR_EACH (node, arg) { + poll_notify(node->data); + } + } +} + +/* + * Arranges for 'cb' to be called whenever one of the attributes of + * 'netdev' changes and sets '*notifierp' to a newly created + * netdev_notifier that represents this arrangement. 
+ * + * If 'all_bsd_notifiers' is empty, it registers the + * 'netdev_bsd_poll_notifier'. Then it creates a new notifier and inserts it + * into the list belonging to the node in 'all_bsd_notifiers' relative to + * the network device. In case it is the first notifier registered for this + * network device, it first creates the node in 'all_bsd_notifiers' and + * then appends the notifier to its list. + */ +static int +netdev_bsd_poll_add(struct netdev *netdev, + void (*cb)(struct netdev_notifier *), void *aux, + struct netdev_notifier **notifierp) +{ + const char *netdev_name = netdev_get_name(netdev); + struct netdev_bsd_notifier *notifier; + struct list *list; + + if (shash_is_empty(&all_bsd_notifiers)) { + /* This is the first time that this function is called so the main + * notifier needs to be registered */ + int error = rtbsd_notifier_register(&netdev_bsd_poll_notifier, + netdev_bsd_poll_cb, &all_bsd_notifiers); + if (error) { + return error; + } + } + list = shash_find_data(&all_bsd_notifiers, netdev_name); + if (!list) { + list = xmalloc(sizeof *list); + list_init(list); + shash_add(&all_bsd_notifiers, netdev_name, list); + } + + notifier = xmalloc(sizeof *notifier); + netdev_notifier_init(&notifier->notifier, netdev, cb, aux); + list_push_back(list, &notifier->node); + *notifierp = &notifier->notifier; + return 0; +} + +static void +netdev_bsd_poll_remove(struct netdev_notifier *notifier_) +{ + struct netdev_bsd_notifier *notifier = + CONTAINER_OF(notifier_, struct netdev_bsd_notifier, notifier); + struct list *list; + + /* Remove 'notifier' from its list. */ + list = list_remove(&notifier->node); + if (list_is_empty(list)) { + /* The list is now empty. Remove it from the hash and free it. */ + const char *netdev_name = netdev_get_name(notifier->notifier.netdev); + shash_delete(&all_bsd_notifiers, + shash_find(&all_bsd_notifiers, netdev_name)); + free(list); + } + free(notifier); + + /* If that was the last notifier, unregister. 
*/ + if (shash_is_empty(&all_bsd_notifiers)) { + rtbsd_notifier_unregister(&netdev_bsd_poll_notifier); + } +} + +const struct netdev_class netdev_bsd_class = { + "system", + + netdev_bsd_init, + netdev_bsd_run, + netdev_bsd_wait, + netdev_bsd_create_system, + netdev_bsd_destroy, + NULL, /* reconfigure */ + netdev_bsd_open_system, + netdev_bsd_close, + + netdev_bsd_enumerate, + + netdev_bsd_recv, + netdev_bsd_recv_wait, + netdev_bsd_drain, + + netdev_bsd_send, + netdev_bsd_send_wait, + + netdev_bsd_set_etheraddr, + netdev_bsd_get_etheraddr, + netdev_bsd_get_mtu, + netdev_bsd_get_ifindex, + netdev_bsd_get_carrier, + netdev_bsd_get_stats, + NULL, /* set_stats */ + + netdev_bsd_get_features, + NULL, /* set_advertisement */ + NULL, /* get_vlan_vid */ //XXX SIOCGETVLAN + NULL, /* set_policing */ + NULL, /* get_qos_type */ + NULL, /* get_qos_capabilities */ + NULL, /* get_qos */ + NULL, /* set_qos */ + NULL, /* get_queue */ + NULL, /* set_queue */ + NULL, /* delete_queue */ + NULL, /* get_queue_stats */ + NULL, /* dump_queue */ + NULL, /* dump_queue_stats */ + + netdev_bsd_get_in4, + netdev_bsd_set_in4, + netdev_bsd_get_in6, + NULL, /* add_router */ + NULL, /* get_next_hop */ + NULL, /* arp_lookup */ + + netdev_bsd_update_flags, + + netdev_bsd_poll_add, + netdev_bsd_poll_remove, +}; + +const struct netdev_class netdev_tap_class = { + "tap", + + netdev_bsd_init, + netdev_bsd_run, + netdev_bsd_wait, + netdev_bsd_create_tap, + netdev_bsd_destroy, + NULL, /* reconfigure */ + netdev_bsd_open_system, + netdev_bsd_close, + + netdev_bsd_enumerate, + + netdev_bsd_recv, + netdev_bsd_recv_wait, + netdev_bsd_drain, + + netdev_bsd_send, + netdev_bsd_send_wait, + + netdev_bsd_set_etheraddr, + netdev_bsd_get_etheraddr, + netdev_bsd_get_mtu, + netdev_bsd_get_ifindex, + netdev_bsd_get_carrier, + netdev_bsd_get_stats, + NULL, /* set_stats */ + + netdev_bsd_get_features, + NULL, /* set_advertisement */ + NULL, /* get_vlan_vid */ + NULL, /* set_policing */ + NULL, /* get_qos_type */ + NULL, 
/* get_qos_capabilities */ + NULL, /* get_qos */ + NULL, /* set_qos */ + NULL, /* get_queue */ + NULL, /* set_queue */ + NULL, /* delete_queue */ + NULL, /* get_queue_stats */ + NULL, /* dump_queue */ + NULL, /* dump_queue_stats */ + + netdev_bsd_get_in4, + netdev_bsd_set_in4, + netdev_bsd_get_in6, + NULL, /* add_router */ + NULL, /* get_next_hop */ + NULL, /* arp_lookup */ + + netdev_bsd_update_flags, + + netdev_bsd_poll_add, + netdev_bsd_poll_remove, +}; + + +static void +destroy_tap(int fd, const char *name) +{ + struct ifreq ifr; + + close(fd); + strcpy(ifr.ifr_name, name); + /* XXX What to do if this call fails? */ + ioctl(af_inet_sock, SIOCIFDESTROY, &ifr); +} + +static int +get_flags(const struct netdev *netdev, int *flags) +{ + struct ifreq ifr; + int error; + + error = netdev_bsd_do_ioctl(netdev, &ifr, SIOCGIFFLAGS, "SIOCGIFFLAGS"); + + *flags = 0xFFFF0000 & (ifr.ifr_flagshigh << 16); + *flags |= 0x0000FFFF & ifr.ifr_flags; + + return error; +} + +static int +set_flags(struct netdev *netdev, int flags) +{ + struct ifreq ifr; + + ifr.ifr_flags = 0x0000FFFF & flags; + ifr.ifr_flagshigh = (0xFFFF0000 & flags) >> 16; + + return netdev_bsd_do_ioctl(netdev, &ifr, SIOCSIFFLAGS, "SIOCSIFFLAGS"); +} + +static int +get_ifindex(const struct netdev *netdev_, int *ifindexp) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + *ifindexp = 0; + if (!(netdev_dev->cache_valid & VALID_IFINDEX)) { + int ifindex = if_nametoindex(netdev_get_name(netdev_)); + if (ifindex <= 0) { + return errno; + } + netdev_dev->cache_valid |= VALID_IFINDEX; + netdev_dev->ifindex = ifindex; + } + *ifindexp = netdev_dev->ifindex; + return 0; +} + +static int +get_etheraddr(const char *netdev_name, uint8_t ea[ETH_ADDR_LEN]) +{ + struct ifaddrs *head; + struct ifaddrs *ifa; + struct sockaddr_dl *sdl; + + if (getifaddrs(&head) != 0) { + VLOG_ERR("getifaddrs on %s device failed: %s", netdev_name, + strerror(errno)); + return errno; + } + + for (ifa = head; ifa; 
ifa = ifa->ifa_next) { + if (ifa->ifa_addr->sa_family == AF_LINK) { + if (!strcmp(ifa->ifa_name, netdev_name)) { + sdl = (struct sockaddr_dl *)ifa->ifa_addr; + if (sdl) { + memcpy(ea, LLADDR(sdl), sdl->sdl_alen); + freeifaddrs(head); + return 0; + } + } + } + } + + VLOG_ERR("could not find ethernet address for %s device", netdev_name); + freeifaddrs(head); + return ENODEV; +} + +static int +set_etheraddr(const char *netdev_name, int hwaddr_family, + int hwaddr_len, const uint8_t mac[ETH_ADDR_LEN]) +{ + struct ifreq ifr; + + memset(&ifr, 0, sizeof ifr); + strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name); + ifr.ifr_addr.sa_family = hwaddr_family; + ifr.ifr_addr.sa_len = hwaddr_len; + memcpy(ifr.ifr_addr.sa_data, mac, hwaddr_len); + COVERAGE_INC(netdev_set_hwaddr); + if (ioctl(af_inet_sock, SIOCSIFLLADDR, &ifr) < 0) { + VLOG_ERR("ioctl(SIOCSIFLLADDR) on %s device failed: %s", + netdev_name, strerror(errno)); + return errno; + } + return 0; +} + +static int +netdev_bsd_do_ioctl(const struct netdev *netdev, struct ifreq *ifr, + unsigned long cmd, const char *cmd_name) +{ + strncpy(ifr->ifr_name, netdev_get_name(netdev), sizeof ifr->ifr_name); + if (ioctl(af_inet_sock, cmd, ifr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(%s) failed: %s", + netdev_get_name(netdev), cmd_name, strerror(errno)); + return errno; + } + return 0; +} diff -Nur -x '*.svn*' -x '*.gitignore*' orig/lib/netdev.c mod/lib/netdev.c --- orig/lib/netdev.c 2010-09-14 06:55:56.000000000 +0200 +++ mod/lib/netdev.c 2011-11-29 15:46:23.973514103 +0100 @@ -49,6 +49,10 @@ &netdev_gre_class, &netdev_capwap_class, #endif +#ifdef __FreeBSD__ + &netdev_bsd_class, + &netdev_tap_class, +#endif }; static struct shash netdev_classes = SHASH_INITIALIZER(&netdev_classes); diff -Nur -x '*.svn*' -x '*.gitignore*' orig/lib/netdev-provider.h mod/lib/netdev-provider.h --- orig/lib/netdev-provider.h 2010-09-14 06:55:56.000000000 +0200 +++ mod/lib/netdev-provider.h 2011-11-29 15:46:23.980180769 +0100 @@ -551,6 +551,9 @@ 
extern const struct netdev_class netdev_patch_class; extern const struct netdev_class netdev_gre_class; extern const struct netdev_class netdev_capwap_class; +#ifdef __FreeBSD__ +extern const struct netdev_class netdev_bsd_class; +#endif #ifdef __cplusplus } diff -Nur -x '*.svn*' -x '*.gitignore*' orig/lib/rtbsd.c mod/lib/rtbsd.c --- orig/lib/rtbsd.c 1970-01-01 01:00:00.000000000 +0100 +++ mod/lib/rtbsd.c 2011-11-29 15:46:23.966847437 +0100 @@ -0,0 +1,168 @@ +/* + * Copyright (c) 2011 Gaetano Catalli. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED ``AS IS'' WITHOUT ANY WARRANTIES OF ANY KIND. + */ + +#include + +#include +#include +#include +#include +#include +#include + +#include "coverage.h" +#include "socket-util.h" +#include "poll-loop.h" +#include "vlog.h" +#include "rtbsd.h" + +VLOG_DEFINE_THIS_MODULE(rtbsd) + +/* PF_ROUTE socket. */ +static int notify_sock = -1; + +/* All registered notifiers. */ +static struct list all_notifiers = LIST_INITIALIZER(&all_notifiers); + +static void rtbsd_report_change(const struct if_msghdr *); +static void rtbsd_report_notify_error(void); + +/* Registers 'cb' to be called with auxiliary data 'aux' with network device + * change notifications. The notifier is stored in 'notifier', which the + * caller must not modify or free. + * + * Returns 0 if successful, otherwise a positive errno value. 
*/ +int +rtbsd_notifier_register(struct rtbsd_notifier *notifier, + rtbsd_notify_func *cb, void *aux) +{ + if (notify_sock < 0) { + int error; + notify_sock = socket(PF_ROUTE, SOCK_RAW, 0); + if (notify_sock < 0) { + VLOG_WARN("could not create PF_ROUTE socket: %s", + strerror(errno)); + return errno; + } + error = set_nonblocking(notify_sock); + if (error) { + VLOG_WARN("error set_nonblocking PF_ROUTE socket: %s", + strerror(error)); + return error; + } + } else { + /* Catch up on notification work so that the new notifier won't + * receive any stale notifications. XXX*/ + rtbsd_notifier_run(); + } + + list_push_back(&all_notifiers, &notifier->node); + notifier->cb = cb; + notifier->aux = aux; + return 0; +} + +/* Cancels notification on 'notifier', which must have previously been + * registered with rtbsd_notifier_register(). */ +void +rtbsd_notifier_unregister(struct rtbsd_notifier *notifier) +{ + list_remove(&notifier->node); + if (list_is_empty(&all_notifiers)) { + close(notify_sock); + notify_sock = -1; + } +} + +/* Calls all of the registered notifiers, passing along any as-yet-unreported + * netdev change events. 
*/ +void +rtbsd_notifier_run(void) +{ + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + struct if_msghdr msg; + if (notify_sock < 0) { + return; + } + + for (;;) { + int retval; + + msg.ifm_type = RTM_IFINFO; + msg.ifm_version = RTM_VERSION; //XXX check if necessary + + /* read from PF_ROUTE socket */ + retval = read(notify_sock, (char *)&msg, sizeof(msg)); + if (retval >= 0) { + /* received packet from PF_ROUTE socket + * XXX check for bad packets */ + if (msg.ifm_type == RTM_IFINFO) { + rtbsd_report_change(&msg); + } + } else if (errno == EAGAIN) { + return; + } else { + if (errno == ENOBUFS) { + VLOG_WARN_RL(&rl, "PF_ROUTE receive buffer overflowed"); + } else { + VLOG_WARN_RL(&rl, "error reading PF_ROUTE socket: %s", + strerror(errno)); + } + rtbsd_report_notify_error(); + } + } +} + +/* Causes poll_block() to wake up when network device change notifications are + * ready. */ +void +rtbsd_notifier_wait(void) +{ + if (notify_sock >= 0) { + poll_fd_wait(notify_sock, POLLIN); + } +} + +static void +rtbsd_report_change(const struct if_msghdr *msg) +{ + struct rtbsd_notifier *notifier; + struct rtbsd_change change; + + /*COVERAGE_INC(rtbsd_changed);*/ /* XXX update coverage-counters.c */ + + change.msg_type = msg->ifm_type; //XXX + change.if_index = msg->ifm_index; + if_indextoname(msg->ifm_index, change.if_name); + change.master_ifindex = 0; //XXX + + LIST_FOR_EACH (notifier, struct rtbsd_notifier, node, + &all_notifiers) { + notifier->cb(&change, notifier->aux); + } +} + +/* If an error occurs the notifiers' callbacks are called with NULL changes */ +static void +rtbsd_report_notify_error(void) +{ + struct rtbsd_notifier *notifier; + + LIST_FOR_EACH (notifier, struct rtbsd_notifier, node, + &all_notifiers) { + notifier->cb(NULL, notifier->aux); + } +} diff -Nur -x '*.svn*' -x '*.gitignore*' orig/lib/rtbsd.h mod/lib/rtbsd.h --- orig/lib/rtbsd.h 1970-01-01 01:00:00.000000000 +0100 +++ mod/lib/rtbsd.h 2011-11-29 15:46:23.966847437 +0100 @@ -0,0 +1,58 
@@ +/* + * Copyright (c) 2011 Gaetano Catalli. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED ``AS IS'' WITHOUT ANY WARRANTIES OF ANY KIND. + */ + +#ifndef RTBSD_H +#define RTBSD_H 1 + +#include "list.h" + +/* + * A digested version of a message received from a PF_ROUTE socket which + * indicates that a network device has been created or destroyed or changed. + */ +struct rtbsd_change { + /* Copied from struct if_msghdr. */ + int msg_type; /* e.g. XXX. */ + + /* Copied from struct if_msghdr. */ + int if_index; /* Index of network device. */ + + char if_name[IF_NAMESIZE]; /* Name of network device. */ + int master_ifindex; /* Ifindex of datapath master (0 if none). */ +}; + +/* + * Function called to report that a netdev has changed. 'change' describes the + * specific change. It may be null if the buffer of change information + * overflowed, in which case the function must assume that every device may + * have changed. 'aux' is as specified in the call to + * rtbsd_notifier_register(). 
+ */ +typedef void rtbsd_notify_func(const struct rtbsd_change *, void *aux); + +struct rtbsd_notifier { + struct list node; + rtbsd_notify_func *cb; + void *aux; +}; + +int rtbsd_notifier_register(struct rtbsd_notifier *, + rtbsd_notify_func *, void *aux); +void rtbsd_notifier_unregister(struct rtbsd_notifier *); +void rtbsd_notifier_run(void); +void rtbsd_notifier_wait(void); + +#endif /* rtbsd.h */ diff -Nur -x '*.svn*' -x '*.gitignore*' orig/lib/socket-util.c mod/lib/socket-util.c --- orig/lib/socket-util.c 2010-09-14 06:55:56.000000000 +0200 +++ mod/lib/socket-util.c 2011-11-29 15:46:23.973514103 +0100 @@ -259,7 +259,10 @@ } fatal_signal_add_file_to_unlink(bind_path); if (bind(fd, (struct sockaddr*) &un, un_len) - || fchmod(fd, S_IRWXU)) { +#ifndef __FreeBSD__ + || fchmod(fd, S_IRWXU) +#endif + ) { goto error; } } diff -Nur -x '*.svn*' -x '*.gitignore*' orig/lib/stream-ssl.c mod/lib/stream-ssl.c --- orig/lib/stream-ssl.c 2010-09-14 06:55:56.000000000 +0200 +++ mod/lib/stream-ssl.c 2011-11-29 15:46:23.956847435 +0100 @@ -22,6 +22,8 @@ #include #include #include +#include +#include #include #include #include diff -Nur -x '*.svn*' -x '*.gitignore*' orig/utilities/ovs-ofctl.c mod/utilities/ovs-ofctl.c --- orig/utilities/ovs-ofctl.c 2010-09-14 06:55:56.000000000 +0200 +++ mod/utilities/ovs-ofctl.c 2011-11-29 15:46:30.433514302 +0100 @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include openvswitch-1.1.0pre2-threaded.patch000644 000423 000000 00000307514 11747450023 017725 0ustar00luigiwheel000000 000000 diff -Nur -x '*.svn*' orig/acinclude.m4 mod/acinclude.m4 --- orig/acinclude.m4 2010-09-10 23:36:43.000000000 +0200 +++ mod/acinclude.m4 2011-11-29 12:51:10.969855922 +0100 @@ -220,6 +220,17 @@ [Define to 1 if net/if_packet.h is available.]) fi]) +dnl Checks for net/if_dl.h. 
+AC_DEFUN([OVS_CHECK_IF_DL], + [AC_CHECK_HEADER([net/if_dl.h], + [HAVE_IF_DL=yes], + [HAVE_IF_DL=no]) + AM_CONDITIONAL([HAVE_IF_DL], [test "$HAVE_IF_DL" = yes]) + if test "$HAVE_IF_DL" = yes; then + AC_DEFINE([HAVE_IF_DL], [1], + [Define to 1 if net/if_dl.h is available.]) + fi]) + dnl Checks for buggy strtok_r. dnl dnl Some versions of glibc 2.7 has a bug in strtok_r when compiling diff -Nur -x '*.svn*' orig/configure.ac mod/configure.ac --- orig/configure.ac 2010-09-14 06:49:48.000000000 +0200 +++ mod/configure.ac 2011-11-29 12:51:10.966522588 +0100 @@ -42,6 +42,7 @@ AC_SEARCH_LIBS([pow], [m]) AC_SEARCH_LIBS([clock_gettime], [rt]) +OVS_CHECK_THREADED OVS_CHECK_COVERAGE OVS_CHECK_NDEBUG OVS_CHECK_NETLINK @@ -52,6 +53,7 @@ OVS_CHECK_OVSDBMONITOR OVS_CHECK_ER_DIAGRAMS OVS_CHECK_IF_PACKET +OVS_CHECK_IF_DL OVS_CHECK_STRTOK_R AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec], [], [], [[#include ]]) diff -Nur -x '*.svn*' orig/lib/automake.mk mod/lib/automake.mk --- orig/lib/automake.mk 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/automake.mk 2011-11-29 12:51:05.443189084 +0100 @@ -187,6 +187,13 @@ lib/rtnetlink.h endif +if HAVE_IF_DL +lib_libopenvswitch_a_SOURCES += \ + lib/netdev-bsd.c \ + lib/rtbsd.c \ + lib/rtbsd.h +endif + if HAVE_OPENSSL lib_libopenvswitch_a_SOURCES += lib/stream-ssl.c nodist_lib_libopenvswitch_a_SOURCES += lib/dhparams.c diff -Nur -x '*.svn*' orig/lib/dispatch.h mod/lib/dispatch.h --- orig/lib/dispatch.h 1970-01-01 01:00:00.000000000 +0100 +++ mod/lib/dispatch.h 2011-11-29 12:51:05.449855752 +0100 @@ -0,0 +1,16 @@ +#include +#include + +#ifndef DISPATCH_H +#define DISPATCH_H 1 + +struct pkthdr { + struct timeval ts; /* time stamp */ + uint32_t caplen; /* length of portion present */ + uint32_t len; /* length this packet (off wire) */ +}; + +typedef void (*pkt_handler)(u_char *user, const struct pkthdr *h, + const u_char *pkt); + +#endif /* DISPATCH_H */ diff -Nur -x '*.svn*' orig/lib/dpif.c mod/lib/dpif.c --- 
orig/lib/dpif.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/dpif.c 2011-11-29 12:51:05.443189084 +0100 @@ -37,6 +37,7 @@ #include "svec.h" #include "util.h" #include "valgrind.h" +#include "fatal-signal.h" #include "vlog.h" VLOG_DEFINE_THIS_MODULE(dpif) @@ -54,6 +55,7 @@ }; static struct shash dpif_classes = SHASH_INITIALIZER(&dpif_classes); + /* Rate limit for individual messages going to or from the datapath, output at * DBG level. This is very high because, if these are enabled, it is because * we really need to see them. */ @@ -78,14 +80,30 @@ if (status < 0) { int i; +#ifdef THREADED + struct shash_node *node; +#endif status = 0; for (i = 0; i < ARRAY_SIZE(base_dpif_classes); i++) { dp_register_provider(base_dpif_classes[i]); } + +#ifdef THREADED + /* register an exit handler for the registered classes */ + SHASH_FOR_EACH(node, &dpif_classes) { + const struct registered_dpif_class *registered_class = node->data; + if (registered_class->dpif_class.exit_hook) { + fatal_signal_add_hook(registered_class->dpif_class.exit_hook, + NULL, NULL, true); + } + } +#endif } } + + /* Performs periodic work needed by all the various kinds of dpifs. * * If your program opens any dpifs, it must call both this function and @@ -118,13 +136,34 @@ } } +#ifdef THREADED +/* Starts the datapath management. + * + * This function is intended for a scenario in which the management of the + * datapath module and of the ofproto module is performed in separate + * threads or processes. */ +void +dp_start(void) +{ + struct shash_node *node; + + SHASH_FOR_EACH(node, &dpif_classes) { + const struct registered_dpif_class *registered_class = node->data; + if (registered_class->dpif_class.start) { + registered_class->dpif_class.start(); + } + } +} +#endif + + /* Registers a new datapath provider. After successful registration, new * datapaths of that type can be opened using dpif_open(). 
*/ int dp_register_provider(const struct dpif_class *new_class) { struct registered_dpif_class *registered_class; - + if (shash_find(&dpif_classes, new_class->type)) { VLOG_WARN("attempted to register duplicate datapath provider: %s", new_class->type); diff -Nur -x '*.svn*' orig/lib/dpif.h mod/lib/dpif.h --- orig/lib/dpif.h 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/dpif.h 2011-11-29 12:51:05.446522418 +0100 @@ -36,6 +36,9 @@ void dp_run(void); void dp_wait(void); +#ifdef THREADED +void dp_start(void); +#endif int dp_register_provider(const struct dpif_class *); int dp_unregister_provider(const char *type); diff -Nur -x '*.svn*' orig/lib/dpif-linux.c mod/lib/dpif-linux.c --- orig/lib/dpif-linux.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/dpif-linux.c 2011-11-29 12:51:05.459855752 +0100 @@ -524,6 +524,10 @@ "system", NULL, NULL, +#ifdef THREADED + NULL, + NULL, +#endif dpif_linux_enumerate, dpif_linux_open, dpif_linux_close, diff -Nur -x '*.svn*' orig/lib/dpif-netdev.c mod/lib/dpif-netdev.c --- orig/lib/dpif-netdev.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/dpif-netdev.c 2011-11-29 12:51:05.449855752 +0100 @@ -31,6 +31,15 @@ #include #include +#ifdef THREADED +#include +#include + +#include "socket-util.h" +#include "fatal-signal.h" +#include "dispatch.h" +#endif + #include "csum.h" #include "dpif-provider.h" #include "flow.h" @@ -49,6 +58,17 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev) +/* We could use these macros instead of using #ifdef and #endif every time we + * need to call the pthread_mutex_lock/unlock. +#ifdef THREADED +#define LOCK(mutex) pthread_mutex_lock(mutex) +#define UNLOCK(mutex) pthread_mutex_unlock(mutex) +#else +#define LOCK(mutex) +#define UNLOCK(mutex) +#endif +*/ + /* Configuration parameters. */ enum { N_QUEUES = 2 }; /* Number of queues for dpif_recv(). */ enum { MAX_QUEUE_LEN = 100 }; /* Maximum number of packets per queue. */ @@ -68,8 +88,25 @@ bool destroyed; bool drop_frags; /* Drop all IP fragments, if true. 
*/ - struct ovs_queue queues[N_QUEUES]; /* Messages queued for dpif_recv(). */ + +#ifdef THREADED + /* The pipe is used to signal the presence of a packet on the queue. + * - dpif_netdev_recv_wait() waits on p[0] + * - dpif_netdev_recv() extract from queue and read p[0] + * - dp_netdev_output_control() send to queue and write p[1] + */ + + /* The access to this queue is protected by the table_mutex mutex */ + int pipe[2]; /* signal a packet on the queue */ + + pthread_mutex_t table_mutex; /* mutex for the flow table */ + pthread_mutex_t port_list_mutex; /* port list mutex */ +#endif + + struct ovs_queue queues[N_QUEUES]; /* messages queued for dpif_recv(). */ + struct hmap flow_table; /* Flow table. */ + struct odp_port_group groups[N_GROUPS]; /* Statistics. */ @@ -79,9 +116,9 @@ long long int n_lost; /* Number of misses not passed to client. */ /* Ports. */ - int n_ports; struct dp_netdev_port *ports[MAX_PORTS]; struct list port_list; + int n_ports; unsigned int serial; }; @@ -90,7 +127,12 @@ int port_no; /* Index into dp_netdev's 'ports'. */ struct list node; /* Element in dp_netdev's 'port_list'. */ struct netdev *netdev; + bool internal; /* Internal port (as ODP_PORT_INTERNAL)? */ +#ifdef THREADED + struct pollfd *poll_fd; /* Useful to manage the poll loop in the + * thread */ +#endif }; /* A flow in dp_netdev's 'flow_table'. */ @@ -122,6 +164,11 @@ struct list dp_netdev_list = LIST_INITIALIZER(&dp_netdev_list); enum { N_DP_NETDEVS = ARRAY_SIZE(dp_netdevs) }; +#ifdef THREADED +/* Descriptor of the thread that manages the datapaths */ +pthread_t thread_p; +#endif + /* Maximum port MTU seen so far. 
*/ static int max_mtu = ETH_PAYLOAD_MAX; @@ -213,7 +260,7 @@ struct dp_netdev *dp; int error; int i; - + if (dp_netdevs[dp_idx]) { return EBUSY; } @@ -224,15 +271,33 @@ dp->dp_idx = dp_idx; dp->open_cnt = 0; dp->drop_frags = false; + +#ifdef THREADED + error = pipe(dp->pipe); + if (error) { + fprintf(stderr, "pipe creation error\n"); + return errno; + } + if (set_nonblocking(dp->pipe[0]) || set_nonblocking(dp->pipe[1])) { + fprintf(stderr, "error set_nonblock on pipe\n"); + return errno; + } + + pthread_mutex_init(&dp->table_mutex, NULL); + pthread_mutex_init(&dp->port_list_mutex, NULL); +#endif + for (i = 0; i < N_QUEUES; i++) { queue_init(&dp->queues[i]); } + hmap_init(&dp->flow_table); for (i = 0; i < N_GROUPS; i++) { dp->groups[i].ports = NULL; dp->groups[i].n_ports = 0; dp->groups[i].group = i; } + list_init(&dp->port_list); error = do_add_port(dp, name, ODP_PORT_INTERNAL, ODPP_LOCAL); if (error) { @@ -285,15 +350,28 @@ int i; dp_netdev_flow_flush(dp); +#ifdef THREADED + pthread_mutex_lock(&dp->port_list_mutex); +#endif while (dp->n_ports > 0) { struct dp_netdev_port *port = CONTAINER_OF( dp->port_list.next, struct dp_netdev_port, node); do_del_port(dp, port->port_no); } +#ifdef THREADED + pthread_mutex_unlock(&dp->port_list_mutex); + pthread_mutex_lock(&dp->table_mutex); +#endif for (i = 0; i < N_QUEUES; i++) { queue_destroy(&dp->queues[i]); } hmap_destroy(&dp->flow_table); +#ifdef THREADED + pthread_mutex_unlock(&dp->table_mutex); + pthread_mutex_destroy(&dp->table_mutex); + pthread_mutex_destroy(&dp->port_list_mutex); +#endif + for (i = 0; i < N_GROUPS; i++) { free(dp->groups[i].ports); } @@ -326,8 +404,16 @@ { struct dp_netdev *dp = get_dp_netdev(dpif); memset(stats, 0, sizeof *stats); + +#ifdef THREADED + pthread_mutex_lock(&dp->table_mutex); +#endif stats->n_flows = hmap_count(&dp->flow_table); stats->cur_capacity = hmap_capacity(&dp->flow_table); +#ifdef THREADED + pthread_mutex_unlock(&dp->table_mutex); +#endif + stats->max_capacity = MAX_FLOWS; 
stats->n_ports = dp->n_ports; stats->max_ports = MAX_PORTS; @@ -395,15 +481,24 @@ port->port_no = port_no; port->netdev = netdev; port->internal = internal; +#ifdef THREADED + port->poll_fd = NULL; +#endif netdev_get_mtu(netdev, &mtu); if (mtu > max_mtu) { max_mtu = mtu; } +#ifdef THREADED + pthread_mutex_lock(&dp->port_list_mutex); +#endif list_push_back(&dp->port_list, &port->node); - dp->ports[port_no] = port; dp->n_ports++; +#ifdef THREADED + pthread_mutex_unlock(&dp->port_list_mutex); +#endif + dp->ports[port_no] = port; dp->serial++; return 0; @@ -457,12 +552,21 @@ { struct dp_netdev_port *port; +#ifdef THREADED + pthread_mutex_lock(&dp->port_list_mutex); +#endif LIST_FOR_EACH (port, struct dp_netdev_port, node, &dp->port_list) { if (!strcmp(netdev_get_name(port->netdev), devname)) { *portp = port; +#ifdef THREADED + pthread_mutex_unlock(&dp->port_list_mutex); +#endif return 0; } } +#ifdef THREADED + pthread_mutex_unlock(&dp->port_list_mutex); +#endif return ENOENT; } @@ -472,7 +576,7 @@ struct dp_netdev_port *port; char *name; int error; - + /* XXX why no semaphores?? 
*/ error = get_port_by_number(dp, port_no, &port); if (error) { return error; @@ -535,7 +639,13 @@ static void dp_netdev_free_flow(struct dp_netdev *dp, struct dp_netdev_flow *flow) { +#ifdef THREADED + pthread_mutex_lock(&dp->table_mutex); +#endif hmap_remove(&dp->flow_table, &flow->node); +#ifdef THREADED + pthread_mutex_unlock(&dp->table_mutex); +#endif free(flow->actions); free(flow); } @@ -567,6 +677,9 @@ int i; i = 0; +#ifdef THREADED + pthread_mutex_lock(&dp->port_list_mutex); +#endif LIST_FOR_EACH (port, struct dp_netdev_port, node, &dp->port_list) { struct odp_port *odp_port = &ports[i]; if (i >= n) { @@ -575,6 +688,9 @@ answer_port_query(port, odp_port); i++; } +#ifdef THREADED + pthread_mutex_unlock(&dp->port_list_mutex); +#endif return dp->n_ports; } @@ -656,17 +772,27 @@ } static struct dp_netdev_flow * -dp_netdev_lookup_flow(const struct dp_netdev *dp, const flow_t *key) +dp_netdev_lookup_flow(struct dp_netdev *dp, const flow_t *key) { struct dp_netdev_flow *flow; assert(!key->reserved[0] && !key->reserved[1] && !key->reserved[2]); + +#ifdef THREADED + pthread_mutex_lock(&dp->table_mutex); +#endif HMAP_FOR_EACH_WITH_HASH (flow, struct dp_netdev_flow, node, flow_hash(key, 0), &dp->flow_table) { if (flow_equal(&flow->key, key)) { +#ifdef THREADED + pthread_mutex_unlock(&dp->table_mutex); +#endif return flow; } } +#ifdef THREADED + pthread_mutex_unlock(&dp->table_mutex); +#endif return NULL; } @@ -707,8 +833,11 @@ for (i = 0; i < n; i++) { struct odp_flow *odp_flow = &flows[i]; - answer_flow_query(dp_netdev_lookup_flow(dp, &odp_flow->key), - odp_flow->flags, odp_flow); + struct dp_netdev_flow *lookup_flow; + + lookup_flow = dp_netdev_lookup_flow(dp, &odp_flow->key); + if ( lookup_flow == NULL ) + answer_flow_query(lookup_flow, odp_flow->flags, odp_flow); } return 0; } @@ -817,7 +946,13 @@ return error; } +#ifdef THREADED + pthread_mutex_lock(&dp->table_mutex); +#endif hmap_insert(&dp->flow_table, &flow->node, flow_hash(&flow->key, 0)); +#ifdef THREADED + 
pthread_mutex_unlock(&dp->table_mutex); +#endif return 0; } @@ -836,11 +971,19 @@ { struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_flow *flow; + int n_flows; flow = dp_netdev_lookup_flow(dp, &put->flow.key); if (!flow) { if (put->flags & ODPPF_CREATE) { - if (hmap_count(&dp->flow_table) < MAX_FLOWS) { +#ifdef THREADED + pthread_mutex_lock(&dp->table_mutex); +#endif + n_flows = hmap_count(&dp->flow_table); +#ifdef THREADED + pthread_mutex_unlock(&dp->table_mutex); +#endif + if (n_flows < MAX_FLOWS) { return add_flow(dpif, &put->flow); } else { return EFBIG; @@ -883,16 +1026,24 @@ { struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_flow *flow; - int i; + int i, n_flows; i = 0; +#ifdef THREADED + pthread_mutex_lock(&dp->table_mutex); +#endif HMAP_FOR_EACH (flow, struct dp_netdev_flow, node, &dp->flow_table) { if (i >= n) { break; } answer_flow_query(flow, 0, &flows[i++]); } - return hmap_count(&dp->flow_table); + n_flows = hmap_count(&dp->flow_table); +#ifdef THREADED + pthread_mutex_unlock(&dp->table_mutex); +#endif + + return n_flows; } static int @@ -974,13 +1125,31 @@ } static int -dpif_netdev_recv(struct dpif *dpif, struct ofpbuf **bufp) +dpif_netdev_recv(struct dpif *dpif, struct ofpbuf **bufp OVS_UNUSED) { - struct ovs_queue *q = find_nonempty_queue(dpif); + struct ovs_queue *q; + +#ifdef THREADED + struct dp_netdev *dp = get_dp_netdev(dpif); + char c; + pthread_mutex_lock(&dp->table_mutex); +#endif + q = find_nonempty_queue(dpif); if (q) { *bufp = queue_pop_head(q); +#ifdef THREADED + /* read a byte from the pipe to advertise that a packet has been + * received */ + if (read(dp->pipe[0], &c, 1) < 0) { + printf("Error reading from the pipe\n"); + } + pthread_mutex_unlock(&dp->table_mutex); +#endif return 0; } else { +#ifdef THREADED + pthread_mutex_unlock(&dp->table_mutex); +#endif return EAGAIN; } } @@ -988,6 +1157,11 @@ static void dpif_netdev_recv_wait(struct dpif *dpif) { +#ifdef THREADED + struct dp_netdev *dp = 
get_dp_netdev(dpif); + + poll_fd_wait(dp->pipe[0], POLLIN); +#else struct ovs_queue *q = find_nonempty_queue(dpif); if (q) { poll_immediate_wake(); @@ -995,8 +1169,10 @@ /* No messages ready to be received, and dp_wait() will ensure that we * wake up to queue new messages, so there is nothing to do. */ } +#endif } + static void dp_netdev_flow_used(struct dp_netdev_flow *flow, const flow_t *key, const struct ofpbuf *packet) @@ -1037,13 +1213,17 @@ } } +/* + * This function is no longer called by the threaded version. The same task is + * instead performed in the thread body. + */ static void dp_netdev_run(void) { struct ofpbuf packet; struct dp_netdev *dp; - ofpbuf_init(&packet, DP_NETDEV_HEADROOM + max_mtu); + ofpbuf_init(&packet, DP_NETDEV_HEADROOM + VLAN_ETH_HEADER_LEN + max_mtu); LIST_FOR_EACH (dp, struct dp_netdev, node, &dp_netdev_list) { struct dp_netdev_port *port; @@ -1067,6 +1247,7 @@ ofpbuf_uninit(&packet); } +/* This function is no longer called in the threaded version. */ static void dp_netdev_wait(void) { @@ -1080,6 +1261,139 @@ } } +#ifdef THREADED +/* + * pcap callback argument + */ +struct dispatch_arg { + struct dp_netdev *dp; /* update statistics */ + struct dp_netdev_port *port; /* argument to flow identifier function */ + struct ofpbuf buf; /* used to process the packet */ +}; + +/* Process a packet. + * + * The port_input function will send immediately if it finds a flow match and + * the associated action is ODPAT_OUTPUT or ODPAT_OUTPUT_GROUP. + * If a flow is not found or for the other actions, the packet is copied. 
+ */ +static void +process_pkt(u_char *arg_p, const struct pkthdr *hdr, const u_char *packet) +{ + struct dispatch_arg *arg = (struct dispatch_arg *)arg_p; + struct ofpbuf *buf = &arg->buf; + + /* set packet size and data pointer */ + buf->size = hdr->caplen; /* XXX Should the size be hdr->len or + * hdr->caplen? */ + buf->data = (void*)packet; + + dp_netdev_port_input(arg->dp, arg->port, buf); + + return; +} + +/* Body of the thread that manages the datapaths */ +static void* +dp_thread_body(void *args OVS_UNUSED) +{ + struct dp_netdev *dp; + struct dp_netdev_port *port; + struct dispatch_arg arg; + int error; + int n_fds; + uint32_t batch = 50; /* max number of pkts processed by the dispatch */ + int processed; /* actual number of pkts processed by the dispatch */ + + sigset_t sigmask; + + /*XXX Since the poll involves all ports of all datapaths, the right fds + * size should be MAX_PORTS * max_number_of_datapaths */ + struct pollfd fds[MAX_PORTS]; + + /* Block the fatal signals in this thread, so that the main thread is the + * one that handles them. 
*/ + sigemptyset(&sigmask); + sigaddset(&sigmask, SIGTERM); + sigaddset(&sigmask, SIGALRM); + sigaddset(&sigmask, SIGINT); + sigaddset(&sigmask, SIGHUP); + + if (pthread_sigmask(SIG_BLOCK, &sigmask, NULL) != 0) { + printf("Error pthread_sigmask\n"); + } + + ofpbuf_init(&arg.buf, DP_NETDEV_HEADROOM + VLAN_ETH_HEADER_LEN + max_mtu); + for(;;) { + n_fds = 0; + /* build the structure for poll */ + LIST_FOR_EACH (dp, struct dp_netdev, node, &dp_netdev_list) { + pthread_mutex_lock(&dp->port_list_mutex); + LIST_FOR_EACH (port, struct dp_netdev_port, node, &dp->port_list) { + /* insert an element in the fds structure */ + fds[n_fds].fd = netdev_get_fd(port->netdev); + fds[n_fds].events = POLLIN; + port->poll_fd = &fds[n_fds]; + n_fds++; + } + pthread_mutex_unlock(&dp->port_list_mutex); + } + + error = poll(fds, n_fds, 2000); + + if (error < 0) { + printf("poll() error: %s\n", strerror(errno)); + break; + } + + LIST_FOR_EACH (dp, struct dp_netdev, node, &dp_netdev_list) { + arg.dp = dp; + pthread_mutex_lock(&dp->port_list_mutex); + LIST_FOR_EACH (port, struct dp_netdev_port, node, &dp->port_list) { + arg.port = port; + arg.buf.size = 0; + arg.buf.data = (char*)arg.buf.base + DP_NETDEV_HEADROOM; + if (port->poll_fd && (port->poll_fd->revents & POLLIN)) { + /* call the dispatch and process the packet into + * its callback. 
We process up to 'batch' packets at a time */ + processed = netdev_dispatch(port->netdev, batch, + process_pkt, (u_char *)&arg); + if (processed < 0) { /* pcap returns error */ + static struct vlog_rate_limit rl = + VLOG_RATE_LIMIT_INIT(1, 5); + VLOG_ERR_RL(&rl, "error receiving data from %s", + netdev_get_name(port->netdev)); + } + } /* end of if poll */ + } /* end of port loop */ + pthread_mutex_unlock(&dp->port_list_mutex); + } /* end of dp loop */ + } /* for ;; */ + + ofpbuf_uninit(&arg.buf); + return NULL; +} + +/* Starts the datapath */ +static void +dp_netdev_start(void) +{ + int error; + + /* Launch thread which manages the datapath */ + error = pthread_create(&thread_p, NULL, dp_thread_body, NULL); + return; +} + +/* This is the function that is called in response to a fatal signal (e.g. + * SIGTERM) */ +static void +dp_netdev_exit_hook(void *aux OVS_UNUSED) +{ + pthread_cancel(thread_p); + pthread_join(thread_p, NULL); +} +#endif /* THREADED */ /* Modify the TCI field of 'packet'. If a VLAN tag is not present, one * is added with the TCI field set to 'tci'. 
If a VLAN tag is present, @@ -1204,7 +1518,7 @@ dp_netdev_set_tp_port(struct ofpbuf *packet, const flow_t *key, const struct odp_action_tp_port *a) { - if (is_ip(packet, key)) { + if (is_ip(packet, key)) { uint16_t *field; if (key->nw_proto == IPPROTO_TCP && packet->l7) { struct tcp_header *th = packet->l4; @@ -1255,6 +1569,9 @@ struct odp_msg *header; struct ofpbuf *msg; size_t msg_size; +#ifdef THREADED + char c; +#endif if (q->n >= MAX_QUEUE_LEN) { dp->n_lost++; @@ -1269,7 +1586,18 @@ header->port = port_no; header->arg = arg; ofpbuf_put(msg, packet->data, packet->size); +#ifdef THREADED + pthread_mutex_lock(&dp->table_mutex); +#endif + queue_push_tail(q, msg); +#ifdef THREADED + /* write a byte on the pipe to advertise that a packet is ready */ + if (write(dp->pipe[1], &c, 1) < 0) { + printf("Error writing on the pipe\n"); + } + pthread_mutex_unlock(&dp->table_mutex); +#endif return 0; } @@ -1302,12 +1630,16 @@ || !eth_addr_equals(arp->ar_sha, eth->eth_src)); } +/* + * Execute the actions associated to a flow. + */ static int dp_netdev_execute_actions(struct dp_netdev *dp, struct ofpbuf *packet, const flow_t *key, const union odp_action *actions, int n_actions) { int i; + for (i = 0; i < n_actions; i++) { const union odp_action *a = &actions[i]; @@ -1376,6 +1708,10 @@ "netdev", dp_netdev_run, dp_netdev_wait, +#ifdef THREADED + dp_netdev_start, + dp_netdev_exit_hook, +#endif NULL, /* enumerate */ dpif_netdev_open, dpif_netdev_close, diff -Nur -x '*.svn*' orig/lib/dpif-provider.h mod/lib/dpif-provider.h --- orig/lib/dpif-provider.h 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/dpif-provider.h 2011-11-29 12:51:05.443189084 +0100 @@ -77,6 +77,16 @@ * to be called. */ void (*wait)(void); +#ifdef THREADED + /* Starts the datapath management. This function is thought for a scenario + * in which the datapath and the ofproto modules are managed in different + * threads/processes */ + void (*start)(void); + + /* Function called in the arrival of a fatal signal (e.g. 
SIGTERM) */ + void (*exit_hook)(void*); +#endif + /* Enumerates the names of all known created datapaths, if possible, into * 'all_dps'. The caller has already initialized 'all_dps' and other dpif * classes might already have added names to it. diff -Nur -x '*.svn*' orig/lib/fatal-signal.c mod/lib/fatal-signal.c --- orig/lib/fatal-signal.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/fatal-signal.c 2011-11-29 12:51:05.453189086 +0100 @@ -17,7 +17,6 @@ #include "fatal-signal.h" #include #include -#include #include #include #include diff -Nur -x '*.svn*' orig/lib/fatal-signal.h mod/lib/fatal-signal.h --- orig/lib/fatal-signal.h 2010-05-08 00:49:44.000000000 +0200 +++ mod/lib/fatal-signal.h 2011-11-29 12:51:05.453189086 +0100 @@ -18,6 +18,7 @@ #define FATAL_SIGNAL_H 1 #include +#include /* Basic interface. */ void fatal_signal_add_hook(void (*hook_cb)(void *aux), diff -Nur -x '*.svn*' orig/lib/netdev-bsd.c mod/lib/netdev-bsd.c --- orig/lib/netdev-bsd.c 1970-01-01 01:00:00.000000000 +0100 +++ mod/lib/netdev-bsd.c 2011-11-29 12:51:05.446522418 +0100 @@ -0,0 +1,1677 @@ +/* + * Copyright (c) 2011 Gaetano Catalli. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED ``AS IS'' WITHOUT ANY WARRANTIES OF ANY KIND. 
+ * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "rtbsd.h" +#include "coverage.h" +#include "dynamic-string.h" +#include "fatal-signal.h" +#include "netdev-provider.h" +#include "ofpbuf.h" +#include "openflow/openflow.h" +#include "packets.h" +#include "poll-loop.h" +#include "socket-util.h" +#include "shash.h" +#include "svec.h" +#include "vlog.h" + +VLOG_DEFINE_THIS_MODULE(netdev_bsd) + + +/* + * This file implements objects to access interfaces. + * Externally, interfaces are represented by two structures: + * + struct netdev_dev, representing a network device, + * containing e.g. name and a refcount; + * We can have private variables by embedding the + * struct netdev_dev into our own structure + * (e.g. netdev_dev_bsd) + * + * + struct netdev, representing an instance of an open netdev_dev. + * The structure contains a pointer to the 'struct netdev' + * representing the device. Again, private information + * such as file descriptor etc. are stored in our + * own struct netdev_bsd which includes a struct netdev. + * + * Both 'struct netdev' and 'struct netdev_dev' are referenced + * in containers which hold pointers to the data structures. + * We can reach our own struct netdev_XXX_bsd by putting a + * struct netdev_XXX within our own struct, and using CONTAINER_OF + * to access the parent structure. + */ +struct netdev_bsd { + struct netdev netdev; + + int netdev_fd; /* Selectable file descriptor for the network device. 
+ This descriptor will be used for polling operations */ + + pcap_t *pcap_handle; /* Packet capture descriptor for a system network + device */ +}; + +struct netdev_dev_bsd { + struct netdev_dev netdev_dev; + unsigned int cache_valid; + + int ifindex; + uint8_t etheraddr[ETH_ADDR_LEN]; + struct in_addr in4; + struct in6_addr in6; + int mtu; + int carrier; + + bool tap_opened; + int tap_fd; /* TAP character device, if any */ +}; + + +enum { + VALID_IFINDEX = 1 << 0, + VALID_ETHERADDR = 1 << 1, + VALID_IN4 = 1 << 2, + VALID_IN6 = 1 << 3, + VALID_MTU = 1 << 4, + VALID_CARRIER = 1 << 5 +}; + +/* An AF_INET socket (used for ioctl operations). */ +static int af_inet_sock = -1; + +#define PCAP_SNAPLEN 2048 + +/* + * A BSD network device notifier. + * + * Represents a handler to be invoked on a device when some event occurs. + * Contains handler, parameters (in netdev_notifier) and link fields for the + * list (in struct list). + */ +struct netdev_bsd_notifier { + struct netdev_notifier notifier; /* Handler and arguments */ + struct list node; /* Link fields for the list */ +}; + +/* + * All 'struct netdev_bsd_notifier' objects are linked as children of a generic + * 'struct shash_node', there is one shash_node per interface, and the + * interface name is the search key. In turn, all the 'struct shash_node' are + * stored in a container, all_bsd_notifiers. + * + * A 'netdev_bsd_notifier' is created and added to the all_bsd_notifiers + * using 'netdev_bsd_poll_add()' XXX again, same code as netdev-linux + */ +static struct shash all_bsd_notifiers = + SHASH_INITIALIZER(&all_bsd_notifiers); + +/* + * Openvswitch can register multiple handlers on route-related events. + * The descriptor for each handler is a struct rtbsd_notifier + * that contains the function and a parameter. + * + * In this module we call rtbsd_notifier_register() to invoke + * the function netdev_bsd_poll_cb() on the all_bsd_notifiers above. 
+ */ +static struct rtbsd_notifier netdev_bsd_poll_notifier; + +/* + * Notifier used to invalidate device informations in case of status change. + * + * It will be registered with a 'rtbsd_notifier_register()' when the first + * device will be created with the call of either 'netdev_bsd_tap_create()' or + * 'netdev_bsd_system_create()'. + * + * The callback associated with this notifier ('netdev_bsd_cache_cb()') will + * invalidate cached information about the device. + */ +static struct rtbsd_notifier netdev_bsd_cache_notifier; +static int cache_notifier_refcount; + +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + +static int netdev_bsd_do_ioctl(const struct netdev *, struct ifreq *, + int cmd, const char *cmd_name); +static void destroy_tap(int fd, const char *name); +static int get_flags(const struct netdev *, int *flagsp); +static int set_flags(struct netdev *, int flags); +static int do_set_addr(struct netdev *netdev, + int ioctl_nr, const char *ioctl_name, + struct in_addr addr); +static int get_etheraddr(const char *netdev_name, uint8_t ea[ETH_ADDR_LEN]); +static int set_etheraddr(const char *netdev_name, int hwaddr_family, + int hwaddr_len, const uint8_t[ETH_ADDR_LEN]); +static int get_ifindex(const struct netdev *, int *ifindexp); + +static int netdev_bsd_init(void); + +static bool +is_netdev_bsd_class(const struct netdev_class *netdev_class) +{ + return netdev_class->init == netdev_bsd_init; +} + +static struct netdev_bsd * +netdev_bsd_cast(const struct netdev *netdev) +{ + assert(is_netdev_bsd_class(netdev_dev_get_class(netdev_get_dev(netdev)))); + return CONTAINER_OF(netdev, struct netdev_bsd, netdev); +} + +static struct netdev_dev_bsd * +netdev_dev_bsd_cast(const struct netdev_dev *netdev_dev) +{ + assert(is_netdev_bsd_class(netdev_dev_get_class(netdev_dev))); + return CONTAINER_OF(netdev_dev, struct netdev_dev_bsd, netdev_dev); +} + +/* Initialize the AF_INET socket used for ioctl operations */ +static int +netdev_bsd_init(void) +{ 
+ static int status = -1; + + if (status >= 0) { /* already initialized */ + return status; + } + + af_inet_sock = socket(AF_INET, SOCK_DGRAM, 0); + status = af_inet_sock >= 0 ? 0 : errno; + + if (status) { + VLOG_ERR("failed to create inet socket: %s", strerror(status)); + } + + return status; +} + +/* + * Performs the periodic work needed by netdev. For BSD netdevs it checks for + * interface status changes and, if there are any, calls all the callbacks + * registered by the user. In addition, it invalidates the cached information + * about the affected interface. + */ +static void +netdev_bsd_run(void) +{ + rtbsd_notifier_run(); +} + +/* + * Arranges for poll_block() to wake up if + * the "run" member function needs to be called. + */ +static void +netdev_bsd_wait(void) +{ + rtbsd_notifier_wait(); +} + +/* Invalidate cache in case of interface status change. */ +static void +netdev_bsd_cache_cb(const struct rtbsd_change *change, + void *aux OVS_UNUSED) +{ + struct netdev_dev_bsd *dev; + + if (change) { + struct netdev_dev *base_dev = netdev_dev_from_name(change->if_name); + + if (base_dev) { + const struct netdev_class *netdev_class = + netdev_dev_get_class(base_dev); + + if (is_netdev_bsd_class(netdev_class)) { + dev = netdev_dev_bsd_cast(base_dev); + dev->cache_valid = 0; + } + } + } else { + /* + * XXX the API is lacking, we should be able to iterate + * on a .... 
without having to store the info in a temp shash + */ + struct shash device_shash; + struct shash_node *node; + + shash_init(&device_shash); + netdev_dev_get_devices(&netdev_bsd_class, &device_shash); + SHASH_FOR_EACH (node, &device_shash) { + dev = node->data; + dev->cache_valid = 0; + } + shash_destroy(&device_shash); + } +} + +#if 0 +/* Probabily could be moved into the rt code, + * This code could be simplified if the reference + * counter is increased inside the rtbsd_notifier_register + * function, on success */ +static int +rtbsd_notifier_register_refcount(struct rtbsd_notifier *notifier, + rtbsd_notify_func *cb, void *aux) +{ + int ret = 0; + + /* the first time register the notifier */ + if (!cache_notifier_refcount) { + ret = rtbsd_notifier_register(&netdev_bsd_cache_notifier, + netdev_bsd_cache_cb, NULL); + if (ret) { + return ret; + } + } + + cache_notifier_refcount++; + return ret; +} +#endif + +/* Allocate a netdev_dev_bsd structure */ +static int +netdev_bsd_create_system(const char *name, const char *type OVS_UNUSED, + const struct shash *args OVS_UNUSED, + struct netdev_dev **netdev_devp) +{ + struct netdev_dev_bsd *netdev_dev; + int ret = 0; + + if (!cache_notifier_refcount) { + ret = rtbsd_notifier_register(&netdev_bsd_cache_notifier, + netdev_bsd_cache_cb, NULL); + if (ret) { + return ret; + } + } + cache_notifier_refcount++; + + /* allocate and initialize the device structure, + * link the structure to its netdev */ + netdev_dev = xzalloc(sizeof *netdev_dev); + netdev_dev_init(&netdev_dev->netdev_dev, name, &netdev_bsd_class); + *netdev_devp = &netdev_dev->netdev_dev; + + return ret; +} + +/* + * Allocate a netdev_dev_bsd structure with 'tap' class. 
+ */ +static int +netdev_bsd_create_tap(const char *name, const char *type OVS_UNUSED, + const struct shash *args OVS_UNUSED, + struct netdev_dev **netdev_devp) +{ + struct netdev_dev_bsd *netdev_dev; + int error = 0; + struct ifreq ifr; + + if (!cache_notifier_refcount) { + error = rtbsd_notifier_register(&netdev_bsd_cache_notifier, + netdev_bsd_cache_cb, NULL); + if (error) { + return error; + } + } + cache_notifier_refcount++; + + /* allocate the device structure and set the internal flag */ + netdev_dev = xzalloc(sizeof *netdev_dev); + + memset(&ifr, 0, sizeof(ifr)); + + /* Create a tap device by opening /dev/tap. To find the name of + * the created device, the TAPGIFNAME ioctl needs to be used. */ + netdev_dev->tap_fd = open("/dev/tap", O_RDWR); + if (netdev_dev->tap_fd < 0) { + error = errno; + VLOG_WARN("opening \"/dev/tap\" failed: %s", strerror(error)); + goto error; + } + + /* Retrieve tap name (e.g. tap0) */ + if (ioctl(netdev_dev->tap_fd, TAPGIFNAME, &ifr) == -1) { + /* Now the interface must be destroyed, but how can it be done if + * the name of the interface isn't known? */ + error = errno; + goto error; + } + + /* Change the name of the tap device */ + ifr.ifr_data = (void *)name; + if (ioctl(af_inet_sock, SIOCSIFNAME, &ifr) == -1) { + error = errno; + destroy_tap(netdev_dev->tap_fd, ifr.ifr_name); + goto error; + } + + /* Make non-blocking. 
*/ + error = set_nonblocking(netdev_dev->tap_fd); + if (error) { + destroy_tap(netdev_dev->tap_fd, name); + goto error; + } + + /* Turn device UP */ + ifr.ifr_flags = (uint16_t)IFF_UP; + ifr.ifr_flagshigh = 0; + strncpy(ifr.ifr_name, name, sizeof ifr.ifr_name); + if (ioctl(af_inet_sock, SIOCSIFFLAGS, &ifr) == -1) { + error = errno; + goto error; + } + + /* initialize the device structure and + * link the structure to its netdev */ + netdev_dev_init(&netdev_dev->netdev_dev, name, &netdev_tap_class); + *netdev_devp = &netdev_dev->netdev_dev; + + return 0; + +error: + free(netdev_dev); + return error; +} + +static void +netdev_bsd_destroy(struct netdev_dev *netdev_dev_) +{ + struct netdev_dev_bsd *netdev_dev = netdev_dev_bsd_cast(netdev_dev_); + + cache_notifier_refcount--; + if (cache_notifier_refcount == 0) { + rtbsd_notifier_unregister(&netdev_bsd_cache_notifier); + } + + if (netdev_dev->tap_fd >= 0 && + !strcmp(netdev_dev_get_type(netdev_dev_), "tap")) { + destroy_tap(netdev_dev->tap_fd, netdev_dev_get_name(netdev_dev_)); + } + free(netdev_dev); +} + + +static int +netdev_bsd_open_system(struct netdev_dev *netdev_dev_, int ethertype, + struct netdev **netdevp) +{ + struct netdev_dev_bsd *netdev_dev = netdev_dev_bsd_cast(netdev_dev_); + struct netdev_bsd *netdev; + int error; + enum netdev_flags flags; + + /* Allocate network device. */ + netdev = xcalloc(1, sizeof *netdev); + netdev->netdev_fd = -1; + netdev_init(&netdev->netdev, netdev_dev_); + + /* Verify that the netdev really exists by attempting to read its flags */ + error = netdev_get_flags(&netdev->netdev, &flags); + if (error == ENXIO) { + goto error; + } + + /* The first user that opens this port (from dpif_create_and_open()) will + * receive the file descriptor associated with the tap device. Instead, the + * following users will open the tap device as a normal 'system' device. 
*/ + if (!strcmp(netdev_dev_get_type(netdev_dev_), "tap") && + !netdev_dev->tap_opened) { + netdev_dev->tap_opened = true; + netdev->netdev_fd = netdev_dev->tap_fd; + } else if (ethertype != NETDEV_ETH_TYPE_NONE) { + char errbuf[PCAP_ERRBUF_SIZE]; + int one = 1; + + /* open the pcap device. The device is opened in non-promiscuous mode + * because the interface flags are manually set by the caller. */ + netdev->pcap_handle = pcap_open_live(netdev_dev_->name, PCAP_SNAPLEN, + 0, 1000, errbuf); + if (netdev->pcap_handle == NULL) { + error = errno; + goto error; + } + + /* initialize netdev->netdev_fd */ + netdev->netdev_fd = pcap_get_selectable_fd(netdev->pcap_handle); + if (netdev->netdev_fd == -1) { + error = errno; + goto error; + } + + /* Set non-blocking mode. Also the BIOCIMMEDIATE ioctl must be called + * on the file descriptor returned by pcap_get_selectable_fd to achieve + * a real non-blocking behaviour.*/ + error = pcap_setnonblock(netdev->pcap_handle, 1, errbuf); + if (error == -1) { + error = errno; + goto error; + } + + /* This call assure that reads return immediately upon packet reception. + * Otherwise, a read will block until either the kernel buffer becomes + * full or a timeout occurs. */ + if(ioctl(netdev->netdev_fd, BIOCIMMEDIATE, &one) < 0 ) { + VLOG_ERR("ioctl(BIOCIMMEDIATE) on %s device failed: %s", + netdev_dev_get_name(netdev_dev_), strerror(errno)); + error = errno; + goto error; + } + + /* Capture only incoming packets */ + error = pcap_setdirection(netdev->pcap_handle, PCAP_D_IN); + if (error == -1) { + error = errno; + goto error; + } + } + *netdevp = &netdev->netdev; + + return 0; + +error: + netdev_uninit(&netdev->netdev, true); + return error; +} + + +/* Close a 'netdev'. 
*/ +static void +netdev_bsd_close(struct netdev *netdev_) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + + if (netdev->netdev_fd >= 0 && strcmp(netdev_get_type(netdev_), "tap")) { + pcap_close(netdev->pcap_handle); + } + + free(netdev); +} + + +/* Initializes 'svec' with a list of the names of all known network devices. */ +static int +netdev_bsd_enumerate(struct svec *svec) +{ + struct if_nameindex *names; + + names = if_nameindex(); + if (names) { + size_t i; + + for (i = 0; names[i].if_name != NULL; i++) { + svec_add(svec, names[i].if_name); + } + if_freenameindex(names); + return 0; + } else { + VLOG_WARN("could not obtain list of network device names: %s", + strerror(errno)); + return errno; + } +} + +/* The recv callback of the netdev class returns the number of bytes of the + * received packet. + * + * This could be done with the pcap_next() function. Unfortunately pcap_next() + * does not distinguish between a missing packet on the capture interface and + * an error during the capture. We use the pcap_dispatch() function instead, + * which can tell an error apart from the absence of packets. + * + * To make pcap_dispatch() return the number of bytes read from the interface + * we need to define the following callback and argument. + */ +struct pcap_arg { + void *data; + int size; + int retval; +}; + +/* + * This callback will be executed on every captured packet. + * + * If the packet captured by pcap_dispatch() does not fit the pcap buffer, + * pcap returns a truncated packet and we follow this behavior. + * + * The argument args->retval is the packet size in bytes. 
+ */ +static void +proc_pkt(u_char *args_, const struct pcap_pkthdr *hdr, const u_char *packet) +{ + struct pcap_arg *args = (struct pcap_arg *)args_; + + if (args->size < hdr->len) { + printf("%s Warning: Packet truncated\n", __func__); + args->retval = args->size; + } else { + args->retval = hdr->len; + } + + /* copy the packet to our buffer */ + memcpy(args->data, packet, args->retval); +} + +/* + * This function attempts to receive a packet from the specified network + * device. It is assumed that the network device is a system device or a tap + * device opened as a system one. In this case the read operation is performed + * on the 'netdev' pcap descriptor. + */ +static int +netdev_bsd_recv_system(struct netdev_bsd *netdev, void *data, size_t size) +{ + struct pcap_arg arg; + int ret; + + if (netdev->netdev_fd < 0) { + /* Device was opened with NETDEV_ETH_TYPE_NONE. */ + return -EAGAIN; + } + + /* prepare the pcap argument to store the packet */ + arg.size = size; + arg.data = data; + + for (;;) { + ret = pcap_dispatch(netdev->pcap_handle, 1, proc_pkt, (u_char *)&arg); + + if (ret > 0) { + return arg.retval; /* arg.retval < 0 is handled in the caller */ + } + + /* Check for EINTR, because this can be returned by our SIGALRM + * handler. XXX investigate this. */ + if (ret == -1) { + if (errno == EINTR) { + continue; + } + } + + return -EAGAIN; + } +} + +/* + * This function attempts to receive a packet from the specified network + * device. It is assumed that the network device is a tap device and the + * 'netdev_fd' member of the 'netdev' structure is initialized with the tap + * file descriptor. + */ +static int +netdev_bsd_recv_tap(struct netdev_bsd *netdev, void *data, size_t size) +{ + if (netdev->netdev_fd < 0) { + /* Device was opened with NETDEV_ETH_TYPE_NONE. 
*/ + return -EAGAIN; + } + + for (;;) { + ssize_t retval = read(netdev->netdev_fd, data, size); + if (retval >= 0) { + return retval; + } else if (errno != EINTR) { + if (errno != EAGAIN) { + VLOG_WARN_RL(&rl, "error receiving Ethernet packet on %s: %s", + netdev->netdev.netdev_dev->name, strerror(errno)); + } + return -errno; + } + } +} + + +/* + * Depending on the nature of the device, a different function must be called. + * If the device is the bridge local port the 'netdev_bsd_recv_tap' function + * must be called, otherwise the 'netdev_bsd_recv_system' function is called. + * + * type!="tap" ---> system device. + * type=="tap" && netdev_fd == tap_fd ---> internal tap device + * type=="tap" && netdev_fd != tap_fd ---> internal tap device + * opened as a system + * device. + */ +static int +netdev_bsd_recv(struct netdev *netdev_, void* data, size_t size) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + struct netdev_dev_bsd * netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!strcmp(netdev_get_type(netdev_), "tap") && + netdev->netdev_fd == netdev_dev->tap_fd) { + return netdev_bsd_recv_tap(netdev, data, size); + } else { + return netdev_bsd_recv_system(netdev, data, size); + } +} + + +/* + * Registers with the poll loop to wake up from the next call to poll_block() + * when a packet is ready to be received with netdev_recv() on 'netdev'.
+ */ +static void +netdev_bsd_recv_wait(struct netdev *netdev_) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + + if (netdev->netdev_fd >= 0) { + poll_fd_wait(netdev->netdev_fd, POLLIN); + } +} + +#ifdef THREADED +static int +netdev_bsd_dispatch_system(struct netdev_bsd *netdev, int batch, pkt_handler h, + u_char *user) +{ + int ret; + + ret = pcap_dispatch(netdev->pcap_handle, batch, (pcap_handler)h , user); + return ret; +} + +static int +netdev_bsd_dispatch_tap(struct netdev_bsd *netdev, int batch, pkt_handler h, + u_char *user) +{ + int ret; + int i; + u_char buf[VLAN_HEADER_LEN + ETH_HEADER_LEN + ETH_PAYLOAD_MAX]; + struct pkthdr hdr; + + for (i = 0; i < batch; i++) { + ret = netdev_bsd_recv_tap(netdev, buf, sizeof(buf)); + if (ret >= 0) { + /* XXX hdr.len should be set to the effective length of the packet */ + hdr.caplen = ret; + hdr.len = ret; + h(user, &hdr, buf); + } else if (ret != -EAGAIN) { + return -1; + } else { /* ret = EAGAIN */ + break; + } + } + return i; +} + +static int +netdev_bsd_dispatch(struct netdev *netdev_, int batch, pkt_handler h, + u_char *user) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + struct netdev_dev_bsd * netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!strcmp(netdev_get_type(netdev_), "tap") && + netdev->netdev_fd == netdev_dev->tap_fd) { + return netdev_bsd_dispatch_tap(netdev, batch, h, user); + } else { + return netdev_bsd_dispatch_system(netdev, batch, h, user); + } +} + +static int +netdev_bsd_get_fd(struct netdev *netdev_) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + return netdev->netdev_fd; +} +#endif + +/* Discards all packets waiting to be received from 'netdev'. 
*/ +static int +netdev_bsd_drain(struct netdev *netdev_) +{ + struct ifreq ifr; + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + + strcpy(ifr.ifr_name, netdev_get_name(netdev_)); + if (ioctl(netdev->netdev_fd, BIOCFLUSH, &ifr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(BIOCFLUSH) failed: %s", + netdev_get_name(netdev_), strerror(errno)); + return errno; + } + return 0; +} + +/* + * Send a packet on the specified network device. The device could be either a + * system or a tap device. + */ +static int +netdev_bsd_send(struct netdev *netdev_, const void *data, size_t size) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + struct netdev_dev_bsd * netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + /* XXX should support sending even if 'ethertype' was NETDEV_ETH_TYPE_NONE. + */ + if (netdev->netdev_fd < 0) { + return EPIPE; + } + + for (;;) { + ssize_t retval; + if (!strcmp(netdev_get_type(netdev_), "tap") && + netdev_dev->tap_fd == netdev->netdev_fd) { + retval = write(netdev->netdev_fd, data, size); + } else { + retval = pcap_inject(netdev->pcap_handle, data, size); + } + if (retval < 0) { + if (errno == EINTR) { + continue; + } else if (errno != EAGAIN) { + VLOG_WARN_RL(&rl, "error sending Ethernet packet on %s: %s", + netdev_get_name(netdev_), strerror(errno)); + } + return errno; + } else if (retval != size) { + VLOG_WARN_RL(&rl, "sent partial Ethernet packet (%zd bytes of " + "%zu) on %s", retval, size, + netdev_get_name(netdev_)); + return EMSGSIZE; + } else { + return 0; + } + } +} + +/* + * Registers with the poll loop to wake up from the next call to poll_block() + * when the packet transmission queue has sufficient room to transmit a packet + * with netdev_send(). + */ +static void +netdev_bsd_send_wait(struct netdev *netdev_) +{ + struct netdev_bsd *netdev = netdev_bsd_cast(netdev_); + + if (netdev->netdev_fd < 0) { /* Nothing to do. 
*/ + return; + } + + if (strcmp(netdev_get_type(netdev_), "tap")) { + poll_fd_wait(netdev->netdev_fd, POLLOUT); + } else { + /* TAP device always accepts packets. XXX it depends on which side it + * is open. */ + poll_immediate_wake(); + } +} + +/* + * Attempts to set 'netdev''s MAC address to 'mac'. Returns 0 if successful, + * otherwise a positive errno value. + */ +static int +netdev_bsd_set_etheraddr(struct netdev *netdev_, + const uint8_t mac[ETH_ADDR_LEN]) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + int error; + + if (!(netdev_dev->cache_valid & VALID_ETHERADDR) + || !eth_addr_equals(netdev_dev->etheraddr, mac)) { + error = set_etheraddr(netdev_get_name(netdev_), AF_LINK, ETH_ADDR_LEN, + mac); + if (!error) { + netdev_dev->cache_valid |= VALID_ETHERADDR; + memcpy(netdev_dev->etheraddr, mac, ETH_ADDR_LEN); + } + } else { + error = 0; + } + return error; +} + +/* + * Copies 'netdev''s MAC address into 'mac'. Returns 0 if successful, + * otherwise a positive errno value. + */ +static int +netdev_bsd_get_etheraddr(const struct netdev *netdev_, + uint8_t mac[ETH_ADDR_LEN]) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!(netdev_dev->cache_valid & VALID_ETHERADDR)) { + int error = get_etheraddr(netdev_get_name(netdev_), + netdev_dev->etheraddr); + if (error) { + return error; + } + netdev_dev->cache_valid |= VALID_ETHERADDR; + } + memcpy(mac, netdev_dev->etheraddr, ETH_ADDR_LEN); + + return 0; +} + +/* + * Returns the maximum size of transmitted (and received) packets on 'netdev', + * in bytes, not including the hardware header; thus, this is typically 1500 + * bytes for Ethernet devices.
+ */ +static int +netdev_bsd_get_mtu(const struct netdev *netdev_, int *mtup) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!(netdev_dev->cache_valid & VALID_MTU)) { + struct ifreq ifr; + int error; + + error = netdev_bsd_do_ioctl(netdev_, &ifr, SIOCGIFMTU, "SIOCGIFMTU"); + if (error) { + return error; + } + netdev_dev->mtu = ifr.ifr_mtu; + netdev_dev->cache_valid |= VALID_MTU; + } + + *mtup = netdev_dev->mtu; + return 0; +} + +static int +netdev_bsd_get_ifindex(const struct netdev *netdev) +{ + int ifindex, error; + + error = get_ifindex(netdev, &ifindex); + return error ? -error : ifindex; +} + +static int +netdev_bsd_get_carrier(const struct netdev *netdev_, bool *carrier) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!(netdev_dev->cache_valid & VALID_CARRIER)) { + struct ifmediareq ifmr; + + memset(&ifmr, 0, sizeof(ifmr)); + strncpy(ifmr.ifm_name, netdev_get_name(netdev_), sizeof ifmr.ifm_name); + + if (ioctl(af_inet_sock, SIOCGIFMEDIA, &ifmr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(SIOCGIFMEDIA) failed: %s", + netdev_get_name(netdev_), strerror(errno)); + return errno; + } + + netdev_dev->carrier = (ifmr.ifm_status & IFM_ACTIVE) == IFM_ACTIVE; + netdev_dev->cache_valid |= VALID_CARRIER; + + /* If the interface doesn't report whether the media is active, + * just assume it is active. */ + if ((ifmr.ifm_status & IFM_AVALID) == 0) { + netdev_dev->carrier = true; + } + } + *carrier = netdev_dev->carrier; + + return 0; +} + +/* Retrieves current device stats for 'netdev'. 
*/ +static int +netdev_bsd_get_stats(const struct netdev *netdev_, struct netdev_stats *stats) +{ + int if_count, i; + int mib[6]; + size_t len; + struct ifmibdata ifmd; + + COVERAGE_INC(netdev_get_stats); + + mib[0] = CTL_NET; + mib[1] = PF_LINK; + mib[2] = NETLINK_GENERIC; + mib[3] = IFMIB_SYSTEM; + mib[4] = IFMIB_IFCOUNT; + + len = sizeof(if_count); + + if (sysctl(mib, 5, &if_count, &len, (void *)0, 0) == -1) { + VLOG_DBG_RL(&rl, "%s: sysctl failed: %s", + netdev_get_name(netdev_), strerror(errno)); + return errno; + } + + mib[5] = IFDATA_GENERAL; + mib[3] = IFMIB_IFDATA; + len = sizeof(ifmd); + for (i = 1; i <= if_count; i++) { + mib[4] = i; //row + if (sysctl(mib, 6, &ifmd, &len, (void *)0, 0) == -1) { + VLOG_DBG_RL(&rl, "%s: sysctl failed: %s", + netdev_get_name(netdev_), strerror(errno)); + return errno; + } else if (!strcmp(ifmd.ifmd_name, netdev_get_name(netdev_))) { + stats->rx_packets = ifmd.ifmd_data.ifi_ipackets; + stats->tx_packets = ifmd.ifmd_data.ifi_opackets; + stats->rx_bytes = ifmd.ifmd_data.ifi_ibytes; + stats->tx_bytes = ifmd.ifmd_data.ifi_obytes; + stats->rx_errors = ifmd.ifmd_data.ifi_ierrors; + stats->tx_errors = ifmd.ifmd_data.ifi_oerrors; + stats->rx_dropped = ifmd.ifmd_data.ifi_iqdrops; + stats->tx_dropped = 0; + stats->multicast = ifmd.ifmd_data.ifi_imcasts; + stats->collisions = ifmd.ifmd_data.ifi_collisions; + + stats->rx_length_errors = 0; + stats->rx_over_errors = 0; + stats->rx_crc_errors = 0; + stats->rx_frame_errors = 0; + stats->rx_fifo_errors = 0; + stats->rx_missed_errors = 0; + + stats->tx_aborted_errors = 0; + stats->tx_carrier_errors = 0; + stats->tx_fifo_errors = 0; + stats->tx_heartbeat_errors = 0; + stats->tx_window_errors = 0; + break; + } + } + + return 0; +} + +static uint32_t +netdev_bsd_parse_media(int media) +{ + uint32_t supported = 0; + bool half_duplex = media & IFM_HDX ? 
true : false; + + switch (IFM_SUBTYPE(media)) { + case IFM_10_2: + case IFM_10_5: + case IFM_10_STP: + case IFM_10_T: + supported |= half_duplex ? OFPPF_10MB_HD : OFPPF_10MB_FD; + supported |= OFPPF_COPPER; + break; + + case IFM_10_FL: + supported |= half_duplex ? OFPPF_10MB_HD : OFPPF_10MB_FD; + supported |= OFPPF_FIBER; + break; + + case IFM_100_T2: + case IFM_100_T4: + case IFM_100_TX: + case IFM_100_VG: + supported |= half_duplex ? OFPPF_100MB_HD : OFPPF_100MB_FD; + supported |= OFPPF_COPPER; + break; + + case IFM_100_FX: + supported |= half_duplex ? OFPPF_100MB_HD : OFPPF_100MB_FD; + supported |= OFPPF_FIBER; + break; + + case IFM_1000_CX: + case IFM_1000_T: + supported |= half_duplex ? OFPPF_1GB_HD : OFPPF_1GB_FD; + supported |= OFPPF_COPPER; + break; + + case IFM_1000_LX: + case IFM_1000_SX: + supported |= half_duplex ? OFPPF_1GB_HD : OFPPF_1GB_FD; + supported |= OFPPF_FIBER; + break; + + case IFM_10G_CX4: + supported |= OFPPF_10GB_FD; + supported |= OFPPF_COPPER; + break; + + case IFM_10G_LR: + case IFM_10G_SR: + supported |= OFPPF_10GB_FD; + supported |= OFPPF_FIBER; + break; + + default: + return 0; + } + + if (IFM_SUBTYPE(media) == IFM_AUTO) { + supported |= OFPPF_AUTONEG; + } + /* + if (media & IFM_ETH_FMASK) { + supported |= OFPPF_PAUSE; + } + */ + + return supported; +} + +/* + * Stores the features supported by 'netdev' into each of '*current', + * '*advertised', '*supported', and '*peer' that are non-null. Each value is a + * bitmap of "enum ofp_port_features" bits, in host byte order. Returns 0 if + * successful, otherwise a positive errno value. On failure, all of the + * passed-in values are set to 0. 
+ */ +static int +netdev_bsd_get_features(struct netdev *netdev, + uint32_t *current, uint32_t *advertised, + uint32_t *supported, uint32_t *peer) +{ + struct ifmediareq ifmr; + int *media_list; + int i; + int error; + + + /* XXX Look into SIOCGIFCAP instead of SIOCGIFMEDIA */ + + memset(&ifmr, 0, sizeof(ifmr)); + strncpy(ifmr.ifm_name, netdev_get_name(netdev), sizeof ifmr.ifm_name); + + /* We make two SIOCGIFMEDIA ioctl calls. The first to determine the + * number of supported modes, and a second with a buffer to retrieve + * them. */ + if (ioctl(af_inet_sock, SIOCGIFMEDIA, &ifmr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(SIOCGIFMEDIA) failed: %s", + netdev_get_name(netdev), strerror(errno)); + return errno; + } + + media_list = xcalloc(ifmr.ifm_count, sizeof(int)); + ifmr.ifm_ulist = media_list; + + if (IFM_TYPE(ifmr.ifm_current) != IFM_ETHER) { + VLOG_DBG_RL(&rl, "%s: doesn't appear to be ethernet", + netdev_get_name(netdev)); + error = EINVAL; + goto cleanup; + } + + if (ioctl(af_inet_sock, SIOCGIFMEDIA, &ifmr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(SIOCGIFMEDIA) failed: %s", + netdev_get_name(netdev), strerror(errno)); + error = errno; + goto cleanup; + } + + /* Current settings. */ + *current = netdev_bsd_parse_media(ifmr.ifm_active); + + /* Advertised features. */ + *advertised = netdev_bsd_parse_media(ifmr.ifm_current); + + /* Supported features. */ + *supported = 0; + for (i = 0; i < ifmr.ifm_count; i++) { + *supported |= netdev_bsd_parse_media(ifmr.ifm_ulist[i]); + } + + /* Peer advertisements. */ + *peer = 0; /* XXX */ + + error = 0; +cleanup: + free(media_list); + return error; +} + +/* + * If 'netdev' has an assigned IPv4 address, stores it in '*in4' and returns 0. + * Otherwise, returns a positive errno value.
+ */ +static int +netdev_bsd_get_in4(const struct netdev *netdev_, struct in_addr *in4, + struct in_addr *netmask) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + + if (!(netdev_dev->cache_valid & VALID_IN4)) { + const struct sockaddr_in *sin; + struct ifreq ifr; + int error; + + ifr.ifr_addr.sa_family = AF_INET; + error = netdev_bsd_do_ioctl(netdev_, &ifr, + SIOCGIFADDR, "SIOCGIFADDR"); + if (error) { + return error; + } + + sin = (struct sockaddr_in *) &ifr.ifr_addr; + netdev_dev->in4 = sin->sin_addr; + netdev_dev->cache_valid |= VALID_IN4; + error = netdev_bsd_do_ioctl(netdev_, &ifr, + SIOCGIFNETMASK, "SIOCGIFNETMASK"); + if (error) { + return error; + } + *netmask = ((struct sockaddr_in*)&ifr.ifr_addr)->sin_addr; + } + *in4 = netdev_dev->in4; + + return in4->s_addr == INADDR_ANY ? EADDRNOTAVAIL : 0; +} + +/* + * Assigns 'addr' as 'netdev''s IPv4 address and 'mask' as its netmask. If + * 'addr' is INADDR_ANY, 'netdev''s IPv4 address is cleared. Returns a + * positive errno value. 
+ */ +static int +netdev_bsd_set_in4(struct netdev *netdev_, struct in_addr addr, + struct in_addr mask) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + int error; + + error = do_set_addr(netdev_, SIOCSIFADDR, "SIOCSIFADDR", addr); + if (!error) { + netdev_dev->cache_valid |= VALID_IN4; + netdev_dev->in4 = addr; + if (addr.s_addr != INADDR_ANY) { + error = do_set_addr(netdev_, SIOCSIFNETMASK, + "SIOCSIFNETMASK", mask); + } + } + return error; +} + +static int +netdev_bsd_get_in6(const struct netdev *netdev_, struct in6_addr *in6) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + if (!(netdev_dev->cache_valid & VALID_IN6)) { + struct ifaddrs *ifa, *head; + struct sockaddr_in6 *sin6; + const char *netdev_name = netdev_get_name(netdev_); + + if (getifaddrs(&head) != 0) { + VLOG_ERR("getifaddrs on %s device failed: %s", netdev_name, + strerror(errno)); + return errno; + } + + for (ifa = head; ifa; ifa = ifa->ifa_next) { + if (ifa->ifa_addr && ifa->ifa_addr->sa_family == AF_INET6 && + !strcmp(ifa->ifa_name, netdev_name)) { + sin6 = (struct sockaddr_in6 *)ifa->ifa_addr; + memcpy(&netdev_dev->in6, &sin6->sin6_addr, + sizeof netdev_dev->in6); + netdev_dev->cache_valid |= VALID_IN6; + *in6 = netdev_dev->in6; + freeifaddrs(head); + return 0; + } + } + freeifaddrs(head); + return EADDRNOTAVAIL; + } + *in6 = netdev_dev->in6; + return 0; +} + +static void +make_in4_sockaddr(struct sockaddr *sa, struct in_addr addr) +{ + struct sockaddr_in sin; + memset(&sin, 0, sizeof sin); + sin.sin_family = AF_INET; + sin.sin_addr = addr; + sin.sin_port = 0; + + memset(sa, 0, sizeof *sa); + memcpy(sa, &sin, sizeof sin); +} + +static int +do_set_addr(struct netdev *netdev, + int ioctl_nr, const char *ioctl_name, struct in_addr addr) +{ + struct ifreq ifr; + make_in4_sockaddr(&ifr.ifr_addr, addr); + return netdev_bsd_do_ioctl(netdev, &ifr, ioctl_nr, ioctl_name); +} + +static int +nd_to_iff_flags(enum netdev_flags nd) +{ + int iff = 0; + if
(nd & NETDEV_UP) { + iff |= IFF_UP; + } + if (nd & NETDEV_PROMISC) { + iff |= IFF_PROMISC; + iff |= IFF_PPROMISC; + } + return iff; +} + +static int +iff_to_nd_flags(int iff) +{ + enum netdev_flags nd = 0; + if (iff & IFF_UP) { + nd |= NETDEV_UP; + } + if (iff & IFF_PROMISC) { + nd |= NETDEV_PROMISC; + } + return nd; +} + +static int +netdev_bsd_update_flags(struct netdev *netdev, enum netdev_flags off, + enum netdev_flags on, enum netdev_flags *old_flagsp) +{ + int old_flags, new_flags; + int error; + + error = get_flags(netdev, &old_flags); + if (!error) { + *old_flagsp = iff_to_nd_flags(old_flags); + new_flags = (old_flags & ~nd_to_iff_flags(off)) | nd_to_iff_flags(on); + if (new_flags != old_flags) { + error = set_flags(netdev, new_flags); + } + } + return error; +} + +/* Calls the callbacks of all the notifiers in the list */ +static void +poll_notify(struct list *dev_notifiers) +{ + struct netdev_bsd_notifier *notifier; + LIST_FOR_EACH (notifier, struct netdev_bsd_notifier, node, dev_notifiers) { + struct netdev_notifier *n = &notifier->notifier; + n->cb(n); + } +} + +/* + * The callback registered for 'netdev_bsd_poll_notifier'. + * + * If 'change' is set it retrieves the element in the 'all_bsd_notifiers' + * relative to the network device which has been subject to the change, and + * then calls the callbacks for all the notifiers registered for that network + * device. + */ +static void +netdev_bsd_poll_cb(const struct rtbsd_change *change, void *aux) +{ + struct shash *arg = aux; + + if (change) { + struct list *dev_notifiers = shash_find_data(arg, change->if_name); + if (dev_notifiers) { + poll_notify(dev_notifiers); + } + } else { + struct shash_node *node; + SHASH_FOR_EACH (node, arg) { + poll_notify(node->data); + } + } +} + +/* + * Arranges for 'cb' to be called whenever one of the attributes of + * 'netdev' changes and sets '*notifierp' to a newly created + * netdev_notifier that represents this arrangement.
+ * + * If the 'all_bsd_notifiers' is empty, it registers the + * 'netdev_bsd_poll_notifier'. Then it creates a new notifier and inserts it + * into the list belonging to the node in the 'all_bsd_notifiers' relative to + * the network device. In case it is the first notifier registered for this + * network device, it first creates the node in the 'all_bsd_notifiers' and + * then appends the notifier to its list. + */ +static int +netdev_bsd_poll_add(struct netdev *netdev, + void (*cb)(struct netdev_notifier *), void *aux, + struct netdev_notifier **notifierp) +{ + const char *netdev_name = netdev_get_name(netdev); + struct netdev_bsd_notifier *notifier; + struct list *list; + + if (shash_is_empty(&all_bsd_notifiers)) { + /* This is the first time that this function is called so the main + * notifier needs to be registered */ + int error = rtbsd_notifier_register(&netdev_bsd_poll_notifier, + netdev_bsd_poll_cb, &all_bsd_notifiers); + if (error) { + return error; + } + } + list = shash_find_data(&all_bsd_notifiers, netdev_name); + if (!list) { + list = xmalloc(sizeof *list); + list_init(list); + shash_add(&all_bsd_notifiers, netdev_name, list); + } + + notifier = xmalloc(sizeof *notifier); + netdev_notifier_init(&notifier->notifier, netdev, cb, aux); + list_push_back(list, &notifier->node); + *notifierp = &notifier->notifier; + return 0; +} + +static void +netdev_bsd_poll_remove(struct netdev_notifier *notifier_) +{ + struct netdev_bsd_notifier *notifier = + CONTAINER_OF(notifier_, struct netdev_bsd_notifier, notifier); + struct list *list; + + /* Remove 'notifier' from its list. */ + list = list_remove(&notifier->node); + if (list_is_empty(list)) { + /* The list is now empty. Remove it from the hash and free it. */ + const char *netdev_name = netdev_get_name(notifier->notifier.netdev); + shash_delete(&all_bsd_notifiers, + shash_find(&all_bsd_notifiers, netdev_name)); + free(list); + } + free(notifier); + + /* If that was the last notifier, unregister.
*/ + if (shash_is_empty(&all_bsd_notifiers)) { + rtbsd_notifier_unregister(&netdev_bsd_poll_notifier); + } +} + +const struct netdev_class netdev_bsd_class = { + "system", + + netdev_bsd_init, + netdev_bsd_run, + netdev_bsd_wait, + netdev_bsd_create_system, + netdev_bsd_destroy, + NULL, /* reconfigure */ + netdev_bsd_open_system, + netdev_bsd_close, + + netdev_bsd_enumerate, + + netdev_bsd_recv, + netdev_bsd_recv_wait, +#ifdef THREADED + netdev_bsd_dispatch, + netdev_bsd_get_fd, +#endif + netdev_bsd_drain, + + netdev_bsd_send, + netdev_bsd_send_wait, + + netdev_bsd_set_etheraddr, + netdev_bsd_get_etheraddr, + netdev_bsd_get_mtu, + netdev_bsd_get_ifindex, + netdev_bsd_get_carrier, + netdev_bsd_get_stats, + NULL, /* set_stats */ + + netdev_bsd_get_features, + NULL, /* set_advertisement */ + NULL, /* get_vlan_vid */ //XXX SIOCGETVLAN + NULL, /* set_policing */ + NULL, /* get_qos_type */ + NULL, /* get_qos_capabilities */ + NULL, /* get_qos */ + NULL, /* set_qos */ + NULL, /* get_queue */ + NULL, /* set_queue */ + NULL, /* delete_queue */ + NULL, /* get_queue_stats */ + NULL, /* dump_queue */ + NULL, /* dump_queue_stats */ + + netdev_bsd_get_in4, + netdev_bsd_set_in4, + netdev_bsd_get_in6, + NULL, /* add_router */ + NULL, /* get_next_hop */ + NULL, /* arp_lookup */ + + netdev_bsd_update_flags, + + netdev_bsd_poll_add, + netdev_bsd_poll_remove, +}; + +const struct netdev_class netdev_tap_class = { + "tap", + + netdev_bsd_init, + netdev_bsd_run, + netdev_bsd_wait, + netdev_bsd_create_tap, + netdev_bsd_destroy, + NULL, /* reconfigure */ + netdev_bsd_open_system, + netdev_bsd_close, + + netdev_bsd_enumerate, + + netdev_bsd_recv, + netdev_bsd_recv_wait, +#ifdef THREADED + netdev_bsd_dispatch, /* dispatch */ + netdev_bsd_get_fd, +#endif + netdev_bsd_drain, + + netdev_bsd_send, + netdev_bsd_send_wait, + + netdev_bsd_set_etheraddr, + netdev_bsd_get_etheraddr, + netdev_bsd_get_mtu, + netdev_bsd_get_ifindex, + netdev_bsd_get_carrier, + netdev_bsd_get_stats, + NULL, /* set_stats 
*/ + + netdev_bsd_get_features, + NULL, /* set_advertisement */ + NULL, /* get_vlan_vid */ + NULL, /* set_policing */ + NULL, /* get_qos_type */ + NULL, /* get_qos_capabilities */ + NULL, /* get_qos */ + NULL, /* set_qos */ + NULL, /* get_queue */ + NULL, /* set_queue */ + NULL, /* delete_queue */ + NULL, /* get_queue_stats */ + NULL, /* dump_queue */ + NULL, /* dump_queue_stats */ + + netdev_bsd_get_in4, + netdev_bsd_set_in4, + netdev_bsd_get_in6, + NULL, /* add_router */ + NULL, /* get_next_hop */ + NULL, /* arp_lookup */ + + netdev_bsd_update_flags, + + netdev_bsd_poll_add, + netdev_bsd_poll_remove, +}; + + +static void +destroy_tap(int fd, const char *name) +{ + struct ifreq ifr; + close(fd); + strcpy(ifr.ifr_name, name); + /* XXX What to do if this call fails? */ + ioctl(af_inet_sock, SIOCIFDESTROY, &ifr); +} + +static int +get_flags(const struct netdev *netdev, int *flags) +{ + struct ifreq ifr; + int error; + + error = netdev_bsd_do_ioctl(netdev, &ifr, SIOCGIFFLAGS, "SIOCGIFFLAGS"); + + *flags = 0xFFFF0000 & (ifr.ifr_flagshigh << 16); + *flags |= 0x0000FFFF & ifr.ifr_flags; + + return error; +} + +static int +set_flags(struct netdev *netdev, int flags) +{ + struct ifreq ifr; + + ifr.ifr_flags = 0x0000FFFF & flags; + ifr.ifr_flagshigh = (0xFFFF0000 & flags) >> 16; + + return netdev_bsd_do_ioctl(netdev, &ifr, SIOCSIFFLAGS, "SIOCSIFFLAGS"); +} + +static int +get_ifindex(const struct netdev *netdev_, int *ifindexp) +{ + struct netdev_dev_bsd *netdev_dev = + netdev_dev_bsd_cast(netdev_get_dev(netdev_)); + *ifindexp = 0; + if (!(netdev_dev->cache_valid & VALID_IFINDEX)) { + int ifindex = if_nametoindex(netdev_get_name(netdev_)); + if (ifindex <= 0) { + return errno; + } + netdev_dev->cache_valid |= VALID_IFINDEX; + netdev_dev->ifindex = ifindex; + } + *ifindexp = netdev_dev->ifindex; + return 0; +} + +static int +get_etheraddr(const char *netdev_name, uint8_t ea[ETH_ADDR_LEN]) +{ + struct ifaddrs *head; + struct ifaddrs *ifa; + struct sockaddr_dl *sdl; + + if 
(getifaddrs(&head) != 0) { + VLOG_ERR("getifaddrs on %s device failed: %s", netdev_name, + strerror(errno)); + return errno; + } + + for (ifa = head; ifa; ifa = ifa->ifa_next) { + if (ifa->ifa_addr->sa_family == AF_LINK) { + if (!strcmp(ifa->ifa_name, netdev_name)) { + sdl = (struct sockaddr_dl *)ifa->ifa_addr; + if (sdl) { + memcpy(ea, LLADDR(sdl), sdl->sdl_alen); + freeifaddrs(head); + return 0; + } + } + } + } + + VLOG_ERR("could not find ethernet address for %s device", netdev_name); + freeifaddrs(head); + return ENODEV; +} + +static int +set_etheraddr(const char *netdev_name, int hwaddr_family, + int hwaddr_len, const uint8_t mac[ETH_ADDR_LEN]) +{ + struct ifreq ifr; + + memset(&ifr, 0, sizeof ifr); + strncpy(ifr.ifr_name, netdev_name, sizeof ifr.ifr_name); + ifr.ifr_addr.sa_family = hwaddr_family; + ifr.ifr_addr.sa_len = hwaddr_len; + memcpy(ifr.ifr_addr.sa_data, mac, hwaddr_len); + COVERAGE_INC(netdev_set_hwaddr); + if (ioctl(af_inet_sock, SIOCSIFLLADDR, &ifr) < 0) { + VLOG_ERR("ioctl(SIOCSIFLLADDR) on %s device failed: %s", + netdev_name, strerror(errno)); + return errno; + } + return 0; +} + +static int +netdev_bsd_do_ioctl(const struct netdev *netdev, struct ifreq *ifr, + int cmd, const char *cmd_name) +{ + strncpy(ifr->ifr_name, netdev_get_name(netdev), sizeof ifr->ifr_name); + if (ioctl(af_inet_sock, cmd, ifr) == -1) { + VLOG_DBG_RL(&rl, "%s: ioctl(%s) failed: %s", + netdev_get_name(netdev), cmd_name, strerror(errno)); + return errno; + } + return 0; +} diff -Nur -x '*.svn*' orig/lib/netdev.c mod/lib/netdev.c --- orig/lib/netdev.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/netdev.c 2011-11-29 12:51:05.456522419 +0100 @@ -49,6 +49,10 @@ &netdev_gre_class, &netdev_capwap_class, #endif +#ifdef __FreeBSD__ + &netdev_bsd_class, + &netdev_tap_class, +#endif }; static struct shash netdev_classes = SHASH_INITIALIZER(&netdev_classes); @@ -74,7 +78,6 @@ if (status < 0) { int i; - fatal_signal_add_hook(close_all_netdevs, NULL, NULL, true); status = 0; @@ 
-122,7 +125,7 @@ netdev_register_provider(const struct netdev_class *new_class) { struct netdev_class *new_provider; - + if (shash_find(&netdev_classes, new_class->type)) { VLOG_WARN("attempted to register duplicate netdev provider: %s", new_class->type); @@ -505,6 +508,28 @@ } } +#ifdef THREADED +/* Attempts to receive and process 'batch' packets from 'netdev'. */ +int +netdev_dispatch(struct netdev *netdev, int batch, pkt_handler h, u_char *user) +{ + int (*dispatch)(struct netdev*, int, pkt_handler, u_char *); + + dispatch = netdev_get_dev(netdev)->netdev_class->dispatch; + return dispatch ? dispatch(netdev, batch, h, user) : 0; +} + +/* Returns the file descriptor */ +int +netdev_get_fd(struct netdev *netdev) +{ + int (*get_fd)(struct netdev *); + + get_fd = netdev_get_dev(netdev)->netdev_class->get_fd; + return get_fd ? get_fd(netdev) : 0; +} +#endif + /* Discards all packets waiting to be received from 'netdev'. */ int netdev_drain(struct netdev *netdev) diff -Nur -x '*.svn*' orig/lib/netdev.h mod/lib/netdev.h --- orig/lib/netdev.h 2010-08-04 03:43:38.000000000 +0200 +++ mod/lib/netdev.h 2011-11-29 12:51:05.456522419 +0100 @@ -21,6 +21,10 @@ #include #include +#ifdef THREADED +#include "dispatch.h" +#endif + #ifdef __cplusplus extern "C" { #endif @@ -117,6 +121,10 @@ /* Packet send and receive. 
*/ int netdev_recv(struct netdev *, struct ofpbuf *); void netdev_recv_wait(struct netdev *); +#ifdef THREADED +int netdev_dispatch(struct netdev *, int, pkt_handler, u_char *); +int netdev_get_fd(struct netdev *); +#endif int netdev_drain(struct netdev *); int netdev_send(struct netdev *, const struct ofpbuf *); diff -Nur -x '*.svn*' orig/lib/netdev-linux.c mod/lib/netdev-linux.c --- orig/lib/netdev-linux.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/netdev-linux.c 2011-11-29 12:51:05.446522418 +0100 @@ -2079,6 +2079,10 @@ netdev_linux_recv, netdev_linux_recv_wait, +#ifdef THREADED + NULL, /* dispatch */ + NULL, /* get_fd */ +#endif netdev_linux_drain, netdev_linux_send, @@ -2139,6 +2143,10 @@ netdev_linux_recv, netdev_linux_recv_wait, +#ifdef THREADED + NULL, /* dispatch */ + NULL, /* get_fd */ +#endif netdev_linux_drain, netdev_linux_send, diff -Nur -x '*.svn*' orig/lib/netdev-patch.c mod/lib/netdev-patch.c --- orig/lib/netdev-patch.c 2010-08-04 03:43:38.000000000 +0200 +++ mod/lib/netdev-patch.c 2011-11-29 12:51:05.449855752 +0100 @@ -191,6 +191,10 @@ NULL, /* recv */ NULL, /* recv_wait */ +#ifdef THREADED + NULL, /* dispatch */ + NULL, /* get_fd */ +#endif NULL, /* drain */ NULL, /* send */ diff -Nur -x '*.svn*' orig/lib/netdev-provider.h mod/lib/netdev-provider.h --- orig/lib/netdev-provider.h 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/netdev-provider.h 2011-11-29 12:51:05.463189085 +0100 @@ -24,6 +24,9 @@ #include "netdev.h" #include "list.h" #include "shash.h" +#ifdef THREADED +#include "dispatch.h" +#endif #ifdef __cplusplus extern "C" { @@ -194,6 +197,22 @@ * implement packet reception through the 'recv' member function. */ void (*recv_wait)(struct netdev *netdev); +#ifdef THREADED + /* Attempts to receive 'batch' packets from 'netdev' and process them + * through the 'handler' callback. 
This function is used in the 'THREADED' + * version to optimize the forwarding process, since it allows packets to be + * processed directly in the netdev memory. + * + * Returns the number of packets processed on success; this can be 0 if no + * packets are available to be read. Returns -1 if an error occurred. + */ + int (*dispatch)(struct netdev *netdev, int batch, pkt_handler handler, + u_char *user); + + /* Return the file descriptor of the device */ + int (*get_fd)(struct netdev *netdev); +#endif + /* Discards all packets waiting to be received from 'netdev'. * * May be null if not needed, such as for a network device that does not @@ -551,6 +570,12 @@ extern const struct netdev_class netdev_patch_class; extern const struct netdev_class netdev_gre_class; extern const struct netdev_class netdev_capwap_class; +#ifdef __FreeBSD__ +extern const struct netdev_class netdev_bsd_class; +#ifdef NETMAP +extern const struct netdev_class netdev_netmap_class; +#endif +#endif #ifdef __cplusplus } diff -Nur -x '*.svn*' orig/lib/netdev-tunnel.c mod/lib/netdev-tunnel.c --- orig/lib/netdev-tunnel.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/netdev-tunnel.c 2011-11-29 12:51:05.453189086 +0100 @@ -247,6 +247,10 @@ NULL, /* recv */ NULL, /* recv_wait */ +#ifdef THREADED + NULL, /* dispatch */ + NULL, /* get_fd */ +#endif NULL, /* drain */ NULL, /* send */ @@ -307,7 +311,11 @@ NULL, /* recv */ NULL, /* recv_wait */ +#ifdef THREADED + NULL, /* dispatch */ + NULL, /* get_fd */ +#endif NULL, /* drain */ NULL, /* send */ NULL, /* send_wait */ diff -Nur -x '*.svn*' orig/lib/rtbsd.c mod/lib/rtbsd.c --- orig/lib/rtbsd.c 1970-01-01 01:00:00.000000000 +0100 +++ mod/lib/rtbsd.c 2011-11-29 12:51:05.449855752 +0100 @@ -0,0 +1,168 @@ +/* + * Copyright (c) 2011 Gaetano Catalli. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1.
Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED ``AS IS'' WITHOUT ANY WARRANTIES OF ANY KIND. + */ + +#include + +#include +#include +#include +#include +#include +#include + +#include "coverage.h" +#include "socket-util.h" +#include "poll-loop.h" +#include "vlog.h" +#include "rtbsd.h" + +VLOG_DEFINE_THIS_MODULE(rtbsd) + +/* PF_ROUTE socket. */ +static int notify_sock = -1; + +/* All registered notifiers. */ +static struct list all_notifiers = LIST_INITIALIZER(&all_notifiers); + +static void rtbsd_report_change(const struct if_msghdr *); +static void rtbsd_report_notify_error(void); + +/* Registers 'cb' to be called with auxiliary data 'aux' with network device + * change notifications. The notifier is stored in 'notifier', which the + * caller must not modify or free. + * + * Returns 0 if successful, otherwise a positive errno value. */ +int +rtbsd_notifier_register(struct rtbsd_notifier *notifier, + rtbsd_notify_func *cb, void *aux) +{ + if (notify_sock < 0) { + int error; + notify_sock = socket(PF_ROUTE, SOCK_RAW, 0); + if (notify_sock < 0) { + VLOG_WARN("could not create PF_ROUTE socket: %s", + strerror(errno)); + return errno; + } + error = set_nonblocking(notify_sock); + if (error) { + VLOG_WARN("error set_nonblocking PF_ROUTE socket: %s", + strerror(error)); + return error; + } + } else { + /* Catch up on notification work so that the new notifier won't + * receive any stale notifications. 
XXX*/ + rtbsd_notifier_run(); + } + + list_push_back(&all_notifiers, &notifier->node); + notifier->cb = cb; + notifier->aux = aux; + return 0; +} + +/* Cancels notification on 'notifier', which must have previously been + * registered with rtbsd_notifier_register(). */ +void +rtbsd_notifier_unregister(struct rtbsd_notifier *notifier) +{ + list_remove(&notifier->node); + if (list_is_empty(&all_notifiers)) { + close(notify_sock); + notify_sock = -1; + } +} + +/* Calls all of the registered notifiers, passing along any as-yet-unreported + * netdev change events. */ +void +rtbsd_notifier_run(void) +{ + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + struct if_msghdr msg; + if (notify_sock < 0) { + return; + } + + for (;;) { + int retval; + + msg.ifm_type = RTM_IFINFO; + msg.ifm_version = RTM_VERSION; //XXX check if necessary + + /* read from PF_ROUTE socket */ + retval = read(notify_sock, (char *)&msg, sizeof(msg)); + if (retval >= 0) { + /* received packet from PF_ROUTE socket + * XXX check for bad packets */ + if (msg.ifm_type == RTM_IFINFO) { + rtbsd_report_change(&msg); + } + } else if (errno == EAGAIN) { + return; + } else { + if (errno == ENOBUFS) { + VLOG_WARN_RL(&rl, "PF_ROUTE receive buffer overflowed"); + } else { + VLOG_WARN_RL(&rl, "error reading PF_ROUTE socket: %s", + strerror(errno)); + } + rtbsd_report_notify_error(); + } + } +} + +/* Causes poll_block() to wake up when network device change notifications are + * ready.
*/ +void +rtbsd_notifier_wait(void) +{ + if (notify_sock >= 0) { + poll_fd_wait(notify_sock, POLLIN); + } +} + +static void +rtbsd_report_change(const struct if_msghdr *msg) +{ + struct rtbsd_notifier *notifier; + struct rtbsd_change change; + + /*COVERAGE_INC(rtbsd_changed);*/ /* XXX update coverage-counters.c */ + + change.msg_type = msg->ifm_type; //XXX + change.if_index = msg->ifm_index; + if_indextoname(msg->ifm_index, change.if_name); + change.master_ifindex = 0; //XXX + + LIST_FOR_EACH (notifier, struct rtbsd_notifier, node, + &all_notifiers) { + notifier->cb(&change, notifier->aux); + } +} + +/* If an error occurs the notifiers' callbacks are called with NULL changes */ +static void +rtbsd_report_notify_error(void) +{ + struct rtbsd_notifier *notifier; + + LIST_FOR_EACH (notifier, struct rtbsd_notifier, node, + &all_notifiers) { + notifier->cb(NULL, notifier->aux); + } +} diff -Nur -x '*.svn*' orig/lib/rtbsd.h mod/lib/rtbsd.h --- orig/lib/rtbsd.h 1970-01-01 01:00:00.000000000 +0100 +++ mod/lib/rtbsd.h 2011-11-29 12:51:05.453189086 +0100 @@ -0,0 +1,58 @@ +/* + * Copyright (c) 2011 Gaetano Catalli. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED ``AS IS'' WITHOUT ANY WARRANTIES OF ANY KIND. + */ + +#ifndef RTBSD_H +#define RTBSD_H 1 + +#include "list.h" + +/* + * A digested version of a message received from a PF_ROUTE socket which + * indicates that a network device has been created or destroyed or changed. 
+ */ +struct rtbsd_change { + /* Copied from struct if_msghdr. */ + int msg_type; /* e.g. XXX. */ + + /* Copied from struct if_msghdr. */ + int if_index; /* Index of network device. */ + + char if_name[IF_NAMESIZE]; /* Name of network device. */ + int master_ifindex; /* Ifindex of datapath master (0 if none). */ +}; + +/* + * Function called to report that a netdev has changed. 'change' describes the + * specific change. It may be null if the buffer of change information + * overflowed, in which case the function must assume that every device may + * have changed. 'aux' is as specified in the call to + * rtbsd_notifier_register(). + */ +typedef void rtbsd_notify_func(const struct rtbsd_change *, void *aux); + +struct rtbsd_notifier { + struct list node; + rtbsd_notify_func *cb; + void *aux; +}; + +int rtbsd_notifier_register(struct rtbsd_notifier *, + rtbsd_notify_func *, void *aux); +void rtbsd_notifier_unregister(struct rtbsd_notifier *); +void rtbsd_notifier_run(void); +void rtbsd_notifier_wait(void); + +#endif /* rtbsd.h */ diff -Nur -x '*.svn*' orig/lib/socket-util.c mod/lib/socket-util.c --- orig/lib/socket-util.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/socket-util.c 2011-11-29 12:51:05.456522419 +0100 @@ -258,8 +258,11 @@ VLOG_WARN("unlinking \"%s\": %s\n", un.sun_path, strerror(errno)); } fatal_signal_add_file_to_unlink(bind_path); - if (bind(fd, (struct sockaddr*) &un, un_len) - || fchmod(fd, S_IRWXU)) { + if (bind(fd, (struct sockaddr*) &un, un_len) +#ifndef __FreeBSD__ + || fchmod(fd, S_IRWXU) +#endif + ) { goto error; } } diff -Nur -x '*.svn*' orig/lib/stream-ssl.c mod/lib/stream-ssl.c --- orig/lib/stream-ssl.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/lib/stream-ssl.c 2011-11-29 12:51:05.439855751 +0100 @@ -22,6 +22,8 @@ #include #include #include +#include +#include #include #include #include diff -Nur -x '*.svn*' orig/m4/openvswitch.m4 mod/m4/openvswitch.m4 --- orig/m4/openvswitch.m4 2010-09-10 23:36:43.000000000 +0200 +++ 
mod/m4/openvswitch.m4 2011-11-29 12:50:42.899855055 +0100 @@ -14,6 +14,25 @@ # See the License for the specific language governing permissions and # limitations under the License. +dnl Checks for --enable-threaded and defines THREADED if it is enabled. +AC_DEFUN([OVS_CHECK_THREADED], + [AC_REQUIRE([AC_PROG_CC]) + AC_ARG_ENABLE( + [threaded], + [AC_HELP_STRING([--enable-threaded], + [Enable threaded version of userspace implementation])], + [case "${enableval}" in + (yes) threaded=true ;; + (no) threaded=false ;; + (*) AC_MSG_ERROR([bad value ${enableval} for --enable-threaded]) ;; + esac], + [threaded=false]) + if $threaded; then + AC_DEFINE([THREADED], [1], + [Define to 1 if the threaded version of userspace + implementation is enabled.]) + fi]) + dnl Checks for --enable-coverage and updates CFLAGS and LDFLAGS appropriately. AC_DEFUN([OVS_CHECK_COVERAGE], [AC_REQUIRE([AC_PROG_CC]) diff -Nur -x '*.svn*' orig/ofproto/ofproto.c mod/ofproto/ofproto.c --- orig/ofproto/ofproto.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/ofproto/ofproto.c 2011-11-29 12:50:51.026521973 +0100 @@ -995,6 +995,7 @@ int ofproto_run(struct ofproto *p) { + /* handle protocol messages coming from the datapath */ int error = ofproto_run1(p); if (!error) { error = ofproto_run2(p, false); @@ -1056,6 +1057,11 @@ } } +/* + * Calls the netdevice dpif_netdev_recv() callback, + * which reads a protocol packet from the dpif queue + * and handles the message + */ int ofproto_run1(struct ofproto *p) { @@ -1278,6 +1284,7 @@ /* XXX Should we translate the dpif_execute() errno value into an OpenFlow * error code?
*/ + fprintf(stderr, "OFPROTO EXECUTE\n"); dpif_execute(p->dpif, flow->in_port, odp_actions.actions, odp_actions.n_actions, packet); return 0; diff -Nur -x '*.svn*' orig/README-CATALLI mod/README-CATALLI --- orig/README-CATALLI 1970-01-01 01:00:00.000000000 +0100 +++ mod/README-CATALLI 2011-11-29 12:51:10.969855922 +0100 @@ -0,0 +1,120 @@ +My own version of openvswitch derived from +http://openvswitch.org/releases/openvswitch-1.1.0pre2.tar.gz + +- LOCAL PORT +The bridge local port represents an entry point to the normal network stack. It +is realized with a tap device. This device is opened by the 'dpif' module and +links all the ports belonging to the bridge with the network stack through the +tap file descriptor. + +- TAP DEVICES AS VIRTUAL DEVICES +I've implemented the capability to create and open a tap device by specifying it +on the command line (e.g. in the --ports option of ovs-openflowd). +When a device is specified with name 'tap:tapN' (e.g. --ports=tap:tap1), the software +attempts to create and open a tap device with name 'tapN'. If the device already +exists it attempts only to open it. +If you don't put the 'tap:' prefix (e.g. --ports=tap1), the software attempts +only to open the device, which must have been previously created (e.g. with +'ifconfig tap1 create'). + +*********************************** +Compilation: +The compilation of the netdev-bsd file requires some additional LDFLAGS. + +1. For the time being, leave the Makefile unmodified and run the configure +script as follows: + + bash# export LDFLAGS='-lrt -lpcap -lpthread' + bash# ./configure + +This adds the rt (time), pcap and pthread libraries to the Makefile. + +2. Or modify the Makefile LDFLAGS with the following line: + + LDFLAGS = -lrt -lpcap -lpthread + +3. The fake generator is meant to test the openflowd performance + while avoiding the overhead introduced by the write()/read() functions.
+ It can be enabled by the -DFAKE_GEN define and consists of: + - a fake packet creation; + - always return the fake packet in netdev_bsd_recv_system(); + - do not wait on the involved interface; + - always return success for netdev_bsd_send(). + + To enable the fake generation you should add a define in the Makefile: + DEFS= -DFAKE_GEN + and configure the openflow with "em0" and "em1": + + ./ovs-openflowd netdev@dp0 --ports=em0,em1 tcp:127.0.0.1:6635 --listen=ptcp:6634 --out-of-band + ./ovs-controller -w --max-idle=permanent ptcp:6636 + + and insert the forward entry in the flow table by: + + ./ovs-ofctl add-flow tcp:127.0.0.1:6634 "in_port=1 idle_timeout=0 priority=65535 actions=output:2" + + this is mandatory because initially the switch does not know where + to send the packets (ARP replies are missing for fake packets) and + will continuously flood the packets across the whole network. + +4. Commands + ./ovs-ofctl dump-flows tcp:127.0.0.1:6634 + ovs-openflowd # The OpenFlow server + tcp:127.0.0.1:6635 # socket listening for the controller + --listen=ptcp:6634 # socket listening for ofctl + --out-of-band # used to avoid error on controller connection + + ovs-ofctl # Send commands and query the server + add-flow # add a flow entry + dump-flows # dumps the active flows + + ifconfig dp0 destroy # to be used when the ovs-server crashes and + # does not reset the interfaces + arp -d 192.168.1.2 # to be used to force an arp request/reply + # useful while running tests with netrate + # because the flowtable entry is added when + # a packet has a reply from a port + +6.
Linux kernel module (ONGOING) +6.1 create the var run directory, only for the first execution + mkdir -p /usr/local/var/run/openvswitch/ +6.2 load the openvswitch module + insmod datapath/linux-2.6/openvswitch_mod.ko +6.3 start the controller and the server + ./ovs-controller -w --max-idle=permanent ptcp:6636 + ./ovs-openflowd system@dp0 --ports=eth2,eth4 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band + +7. Linux kernel user space +7.1 create the var run directory, only for the first execution + mkdir -p /usr/local/var/run/openvswitch/ +7.2 launch controller, server and test + ./ovs-controller -w --max-idle=permanent ptcp:6636 + ./ovs-openflowd netdev@dp0 --ports=eth2,eth4 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band + LP MARTA: netsend 33K PC AMD: ipfw 32900 + LP MARTA: ipfw 32200 PC AMD: netsend 34000 +7.3 configure 4 ports and test + ./ovs-openflowd netdev@dp0 --ports=eth2,eth4,eth6,eth7 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band + LP MARTA: netsend 33K PC AMD: ipfw 31700 + LP MARTA: netsend 33K PC AMD: ipfw 33K + LP MARTA: netsend 34K PC AMD: ipfw 30K + LP MARTA: netsend 33K PC AMD: ipfw 29K + LP MARTA: netsend 33K PC AMD: ipfw 30K + LP MARTA: netsend 33K PC AMD: ipfw 30K + + # Errors with the linux device + May 25 15:53:26|00007|pktbuf|WARN|cookie mismatch: 0000029a != 0000039a + May 25 15:53:26|00008|pktbuf|WARN|cookie mismatch: 0000029b != 0000039b + +8. netmap +The netmap code is enabled by wrapping the pcap functions, which is done by changing the +loader search path. +Start the switch with: +(LD_LIBRARY_PATH=. ./ovs-openflowd netdev@dp0 --ports=ix0,ix1 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band) +and the controller as usual.
+./ovs-openflowd netdev@dp0 --ports=ix0,ix1 tcp:127.0.0.1:6636 --listen=ptcp:6634 --out-of-band +Do not forget to disable the checksum on the interfaces: +[root@10Gb1 /home/matteo/workspace/netmap/v2/netmap-v2/examples/pkt-gen]# ifconfig ix0 -rxcsum -txcsum +[root@10Gb1 /home/matteo/workspace/netmap/v2/netmap-v2/examples/pkt-gen]# ifconfig ix1 -rxcsum -txcsum +And to insert the rules into the flowtable: +# ./ovs-ofctl add-flow tcp:127.0.0.1:6634 "in_port=1 idle_timeout=0 priority=65535 actions=output:2" +# ./ovs-ofctl add-flow tcp:127.0.0.1:6634 "in_port=2 idle_timeout=0 priority=65535 actions=output:1" +# ./ovs-ofctl dump-flows tcp:127.0.0.1:6634 diff -Nur -x '*.svn*' orig/utilities/ovs-ofctl.c mod/utilities/ovs-ofctl.c --- orig/utilities/ovs-ofctl.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/utilities/ovs-ofctl.c 2011-11-29 12:51:10.826522585 +0100 @@ -18,6 +18,7 @@ #include #include #include +#include //XXX #include #include #include diff -Nur -x '*.svn*' orig/utilities/ovs-openflowd.c mod/utilities/ovs-openflowd.c --- orig/utilities/ovs-openflowd.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/utilities/ovs-openflowd.c 2011-11-29 12:51:10.823189252 +0100 @@ -152,18 +152,30 @@ daemonize_complete(); +#ifdef THREADED + /* Data thread started */ + fprintf(stdout, "THREADED version running!\n"); + dp_start(); +#endif + + /* The following loop polls on protocol messages + * and on messages related to topology changes */ while (ofproto_is_alive(ofproto)) { error = ofproto_run(ofproto); if (error) { ovs_fatal(error, "unrecoverable datapath error"); } unixctl_server_run(unixctl); +#ifndef THREADED dp_run(); +#endif netdev_run(); ofproto_wait(ofproto); unixctl_server_wait(unixctl); +#ifndef THREADED dp_wait(); +#endif netdev_wait(); poll_block(); } diff -Nur -x '*.svn*' orig/vswitchd/bridge.c mod/vswitchd/bridge.c --- orig/vswitchd/bridge.c 2010-09-10 23:36:43.000000000 +0200 +++ mod/vswitchd/bridge.c 2011-11-29 12:50:52.686522024 +0100 @@ -1260,6 +1260,7 @@ { struct
bridge *br; int error; + static int first = 1; assert(!bridge_lookup(br_cfg->name)); br = xzalloc(sizeof *br); @@ -1299,6 +1300,15 @@ VLOG_INFO("created bridge %s on %s", br->name, dpif_name(br->dpif)); +#ifdef THREADED + /* The first time a bridge is created, we launch the datapath thread */ + if (first) { + fprintf(stderr, "THREADED version running!\n"); + dp_start(); + first = 0; + } +#endif + return br; } diff -Nur -x '*.svn*' orig/vswitchd/ovs-vswitchd.c mod/vswitchd/ovs-vswitchd.c --- orig/vswitchd/ovs-vswitchd.c 2010-08-04 03:43:38.000000000 +0200 +++ mod/vswitchd/ovs-vswitchd.c 2011-11-29 12:50:52.676522024 +0100 @@ -91,13 +91,17 @@ } bridge_run(); unixctl_server_run(unixctl); +#ifndef THREADED dp_run(); +#endif netdev_run(); signal_wait(sighup); bridge_wait(); unixctl_server_wait(unixctl); +#ifndef THREADED dp_wait(); +#endif netdev_wait(); poll_block(); }