# P4in5G: Flexible Data Plane Pipelines for 5G / 5GINFIRE

## Abstract

Network Function Virtualization (NFV) is one of the enabling technologies of 5G: it increases the flexibility of communication networks by deploying network functions as software on commodity servers. The development of high-performance network function software, however, still requires deep target-specific knowledge, increasing development cost and time. To describe packet processing pipelines in a protocol-independent way, a domain-specific language called P4 has recently emerged. P4 compilers are available for different targets, both hardware and software, and generate the target-specific executable program. Through the high-level abstraction of P4, code complexity, implementation time and costs can all be reduced.

In this document, we describe the implementation and execution of the P4in5G experiment. P4in5G aims at combining the advantages of 5G-NFV and P4 by offering P4-programmable VNFs based on the P4 compiler and software data plane called T4P4S (using its DPDK backend). The proposed P4-enhanced VNFs have been validated through use cases described in the P4 language. The main goal of P4in5G is to extend 5GinFIRE with high-performance P4-programmable data plane capabilities and to demonstrate the usability of P4 in various 5G scenarios.

### **Organization:** Eötvös Loránd University (ELTE)

# Virtual Machine

Though T4P4S-based switches can run on any Linux distribution, we have chosen Ubuntu 18.04 LTS as the base image and prepared the necessary execution environment on it. To support high-performance packet processing in a virtualized environment, DPDK offers different possible approaches:

1) virtual NICs can be used without extra configuration requirements;
2) PCI passthrough gives the VM direct access to high-performance NICs;
3) SR-IOV technology gives the VM direct access to a slice of the NIC.

The 2nd and 3rd options require infrastructure support, while the 1st option, using the available virtual NICs with DPDK, works without any external requirements.

![Components in T4P4S VM and its network interfaces](/uploads/p-4-in-5-g/t-4-p-4-s-vm.png "Components in T4P4S VM and its network interfaces")

Figure 1 depicts the high-level structure of the T4P4S VM. The VNF is connected to three networks: a management network (vnf-mgmt) through which the VM can be accessed from outside, an uplink network (vnf-ul) and a downlink network (vnf-dl), distinguishing the two directions of data traffic the VNF needs to handle. Both vnf-ul and vnf-dl are expected to be connected to private networks representing virtual links. These two interfaces are occupied by the software data plane generated by the T4P4S compiler, using DPDK's drivers instead of the standard kernel driver. The VM contains the prepared P4 programs (the set can be extended, and remote P4 programs given by simple URLs can also be used), the control plane programs for the P4 codes, the T4P4S compiler and software data plane with helper scripts, and an InfluxDB instance for storing INT monitoring time series. Note that if multiple P4-VNF instances are running in the same environment, the control plane and INT DB services can run on any of them (e.g. only on a dedicated instance).

![Interaction of the components in the T4P4S VM](/uploads/p-4-in-5-g/interaction-t-4-p-4-s.png)

# Setting up DPDK

In our image, T4P4S and all of its dependencies are already compiled and pre-installed.
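
As a quick sanity check of the environment, one can list the network devices and the drivers they are currently bound to with DPDK's standard devbind utility at its pre-installed location (an optional check, not part of the original workflow):

```
# List all network devices known to DPDK together with their current drivers.
# The two VIRTIO devices at 0000:00:04.0 and 0000:00:05.0 (vnf-ul and vnf-dl)
# should appear here before being bound to igb_uio in the next steps.
~/t4p4s/dpdk-19.02/usertools/dpdk-devbind.py --status
```
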
To be able to run a DPDK-backend-based switch program, P4C and DPDK are required. Their locations and the RTE_TARGET need to be specified as environment variables.

```
export P4C=~/t4p4s/p4c
export RTE_SDK=~/t4p4s/dpdk-19.02/
export RTE_TARGET=x86_64-native-linuxapp-gcc
```

Hugepage support is also required for the large memory pool allocations used for packet buffers. Hugepages can, for example, be configured with the ~/t4p4s/dpdk-19.02/usertools/dpdk-setup.sh script.

DPDK has its own drivers for the network interfaces, which provide significantly better performance than the kernel driver. The VIRTIO virtual interfaces can be used with the so-called igb_uio poll mode driver of DPDK. To bind the driver to the interfaces, the following commands need to be run:

```
sudo modprobe uio
sudo insmod $RTE_SDK/$RTE_TARGET/kmod/igb_uio.ko
sudo ${RTE_SDK}/usertools/dpdk-devbind.py -b igb_uio 0000:00:04.0
sudo ${RTE_SDK}/usertools/dpdk-devbind.py -b igb_uio 0000:00:05.0
```

Note that PCI devices 0000:00:04.0 and 0000:00:05.0 represent the vnf-ul and vnf-dl virtual interfaces of the VM.

## T4P4S configuration

T4P4S is an open-source P4 compiler for different software targets, including DPDK backends. The original compiler has been extended to support the INT monitoring feature. The source code of the extended version used in the VM image of the P4in5G experiment is available at [https://github.com/eltecnl/t4p4s](https://github.com/eltecnl/t4p4s).

The data plane program described in P4 can be launched with the t4p4s.sh script. It uses settings from three configuration files:

1. light.cfg describes how texts in the terminal and switch output look.
2. examples.cfg describes default options for the examples.
   - A set of parameters for an example is called a configuration _variant_.
   - On the command line, you have to specify the _example_ (by name or full path) and the _variant name_.
3. An architecture-specific file (for DPDK, opts_dpdk.cfg) describes how the options are to be interpreted: they are translated to more options.
   - Everything apart from the _example_ is considered an option on the command line.

To run an example with the default configuration, execute the following command:

```
./t4p4s.sh :l2fwd
```

The program finds the source file under the examples folder by default, but one can also specify it manually:

```
./t4p4s.sh ./examples/l2fwd.p4_14
```

## Helper scripts for P4-VNF configuration and execution

For easier usage, we also provide a few scripts that help with the previous steps.

_setup_t4p4s.sh_ is the DPDK configuration script. T4P4S by default uses the DPDK backend, and to be able to run DPDK programs a few initial steps are required. This script prepares the VM for DPDK usage: it loads the igb_uio kernel module, binds the interfaces to DPDK and sets up the hugepages.

_run_switch.sh_ allows us to easily compile and run P4 programs from various sources. As command line argument, one can specify the name of the P4 program to be used. It is also possible to provide the URL where the P4 program is available; in this case the script downloads the file and moves it to the correct directory. The script then translates the P4 program to C:

```
./t4p4s.sh p4 :$program
```

compiles the binary from the C code:

```
./t4p4s.sh c :$program
```

and finally starts the switch program with the VM-specific parameters. Note that these parameters are identical in every P4-VNF instance.
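
For reference, the overall flow of run_switch.sh can be sketched as follows. This is a simplified, hypothetical reconstruction; the script shipped in the image may differ in details such as paths and error handling.

```
#!/bin/sh
# Sketch of run_switch.sh: compile and launch a P4 program given by short name or URL.
program=$1

case "$program" in
  http*://*)
    # Remote P4 program: download it (assumed target: the examples folder)
    # and keep only its base name without extension, e.g. l2fwd.p4_14 -> l2fwd.
    wget -q "$program" -P ~/t4p4s/t4p4s/examples/
    program=$(basename "$program" | cut -d. -f1)
    ;;
esac

cd ~/t4p4s/t4p4s
./t4p4s.sh p4 :$program    # translate the P4 program to C
./t4p4s.sh c :$program     # compile the generated C code into a switch binary
# Finally, the generated binary is launched with the VM-specific DPDK parameters (shown below).
```
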
The launch command with the VM-specific parameters is the following:

```
sudo ./build/$l2program@std/build/$l2program -c 0x1 -n 1 -w 0000:00:04.0 -w 0000:00:05.0 -- -p 0x3 --config "\"(0,0,0),(1,0,0)\""
```

Since all the VMs are identical, the parameters of this command should not be changed, but to clarify the configuration we briefly explain their roles:

- -c 0x1 specifies the core mask. In these VMs only single-core processing is supported.
- -n 1 sets the number of memory channels to use.
- -w PCI_DEV_ID adds a PCI device to the whitelist.
- -p 0x3 sets the port mask to use 2 ports.
- --config "\"(0,0,0),(1,0,0)\"" assigns cores to specific ports and queues: (port number [0-1 in the VM], queue ID, CPU core number).

**Examples**

To run our L2 switch example, the following command should be called. Short names can be used for the prepared use cases: l2fwd, l3fwd, nat, lb and 5gupf.

```
~/run_switch.sh l2fwd
```

If the program is not present locally, the file URL may be used as a parameter instead of the short name used above.

```
~/run_switch.sh https://raw.githubusercontent.com/P4ELTE/t4p4s/master/examples/l2fwd.p4_14
```

## Network measurement tools for benchmarking

In the P4in5G experiment, we use the same VM image for running the performance tests. To this end, we installed DPDK's pktgen tool, which makes it possible to replay previously captured network traces at high rates. Unfortunately, pktgen did not work during the experimentation: after around 10k packets it unexpectedly stopped sending packets. The problem seems to be related to the virtual NICs and the network (e.g., the underlying OVS or other software switch) used behind the scenes. To bypass the issue with pktgen, we also installed the iperf tool, which can be used to carry out throughput measurements with both TCP and UDP flows and with more realistic behavior (TCP congestion control is also exercised). Based on our experience in non-virtualized environments (our local testbed), iperf can be used up to 25 Gbps with ease.

## InfluxDB for storing telemetry data

The telemetry data collected by the data plane program generated by T4P4S is loaded into a local or remote Influx database. For the sake of simplicity, the InfluxDB instance can be instantiated on any P4VNF instance (where it is needed) as a Docker container. The local or remote data plane programs can then store their time series there. For the installation of InfluxDB we created an intdb_start.sh script:

```
#!/bin/sh
echo "Launching InfluxDB"
docker run -d --name intdb -p 8083:8083 -p 8086:8086 -p 25826:25826/udp influxdb:1.5
echo "Done."
```

The intdb_create.py Python script creates the database for the telemetry data:

```
from influxdb import InfluxDBClient

DBNAME = 'int_db'

print "Connecting to InfluxDB..."
client = InfluxDBClient()
print "Create database:", DBNAME
client.create_database(DBNAME)
print "Done."
```

The data plane program generated by T4P4S collects statistics on the packet processing time and on the traceable "intvar" standard metadata field carrying user-defined information. The minimum, maximum and mean values within 1-second time windows are logged for both metrics. The logs are loaded into the int_db database. Note that time series stored in InfluxDB can easily be visualized in tools like Grafana.

# P4-VNF

The P4-VNF relies on the T4P4S VM image (t4p4s_image). The VNF package contains a VNFD, a cloud-init config file and Juju charms that prepare the DPDK environment after the VNF boots up. The VNF requires 4 vCPUs, 8 GB RAM and 60 GB of storage for the VM's file system.
Three network interfaces are defined in the VNF: vnf-mgmt, vnf-ul and vnf-dl, as described in the previous sections. In _./cloud_init/p4vnf_cloud_init.cfg_ we enable password authentication; the username and password are "ubuntu" and "5ginfire". In the final VNF, the start configuration primitive has two parameters:

1) p4program: a short name of the use case, or a file path or URL to the P4 program to be executed;
2) controller: the controller program to be launched, or the URL of the controller instance.

The primitive also has default parameters. The p4_vnfd.yaml is the following:

```
vnfd:vnfd-catalog:
  vnfd:
  - id: p4_vnf
    name: p4_vnf
    short-name: p4_vnf
    logo: logo-p4.png
    vendor: ELTE
    version: '1.0'
    description: P4 VNF
    mgmt-interface:
      cp: vnf-mgmt
    connection-point:
    - name: vnf-mgmt
      type: VPORT
    - name: vnf-ul
      type: VPORT
      port-security-enabled: "false"
    - name: vnf-dl
      type: VPORT
      port-security-enabled: "false"
    vdu:
    - cloud-init-file: p4vnf_cloud_init.cfg
      count: 1
      interface:
      - name: ens3
        type: EXTERNAL
        virtual-interface:
          type: VIRTIO
          vpci: '0000:00:0a.0'
        external-connection-point-ref: vnf-mgmt
      - name: ens4
        type: EXTERNAL
        virtual-interface:
          type: VIRTIO
          vpci: '0000:00:0b.0'
        external-connection-point-ref: vnf-ul
      - name: ens5
        type: EXTERNAL
        virtual-interface:
          type: VIRTIO
          vpci: '0000:00:0c.0'
        external-connection-point-ref: vnf-dl
      id: p4_vdu
      image: t4p4s_image
      description: P4_vnf
      name: p4_vdu
      vm-flavor:
        memory-mb: '8192'
        storage-gb: '60'
        vcpu-count: '4'
    vnf-configuration:
      config-primitive:
      - name: start
        parameter:
        - data-type: STRING
          name: p4program
        - data-type: STRING
          name: controller
      - name: stop
      initial-config-primitive:
      - name: config
        parameter:
        - name: ssh-hostname
          value: <rw_mgmt_ip>
        - name: ssh-username
          value: ubuntu
        - name: ssh-password
          value: 5ginfire
        seq: '1'
      - seq: '2'
        name: conf-dpdk
      juju:
        charm: t4p4s
```

# Experiment description - P4Chain NS

![VNFs and connections in the P4Chain NS](uploads/p4in5g/p4chain.png)

The figure illustrates the experiment setup described in the NSD. Three instances of the P4-VNF are deployed, forming a ring topology that can be used to emulate the various scenarios analyzed in Deliverable D3. LinkA, linkB and linkC are private networks representing virtual links between the interfaces of two VNFs, while the MGMT network is the "provider" network through which the nodes are reachable from the jump gateway.
The p4chain_nsd.yaml is the following:

```
nsd:nsd-catalog:
  nsd:
  - id: p4chain_nsd
    name: p4chain_nsd
    short-name: p4chain_nsd
    description: Generated by OSM package generator
    vendor: OSM
    version: '1.0'
    # Specify the VNFDs that are part of this NSD
    constituent-vnfd:
    - vnfd-id-ref: p4_vnf
      member-vnf-index: '1'
    - vnfd-id-ref: p4_vnf
      member-vnf-index: '2'
    - vnfd-id-ref: p4_vnf
      member-vnf-index: '3'
    vld:
    - id: mgmtnet
      name: mgmtnet
      short-name: mgmtnet
      type: ELAN
      mgmt-network: 'true'
      vim-network-name: provider
      vnfd-connection-point-ref:
      - vnfd-id-ref: p4_vnf
        member-vnf-index-ref: '1'
        vnfd-connection-point-ref: vnf-mgmt
      - vnfd-id-ref: p4_vnf
        member-vnf-index-ref: '2'
        vnfd-connection-point-ref: vnf-mgmt
      - vnfd-id-ref: p4_vnf
        member-vnf-index-ref: '3'
        vnfd-connection-point-ref: vnf-mgmt
    - id: linkA
      name: linkA
      short-name: linkA
      type: ELAN
      vnfd-connection-point-ref:
      - vnfd-id-ref: p4_vnf
        member-vnf-index-ref: '1'
        vnfd-connection-point-ref: vnf-ul
      - vnfd-id-ref: p4_vnf
        member-vnf-index-ref: '2'
        vnfd-connection-point-ref: vnf-dl
    - id: linkB
      name: linkB
      short-name: linkB
      type: ELAN
      vnfd-connection-point-ref:
      - vnfd-id-ref: p4_vnf
        member-vnf-index-ref: '2'
        vnfd-connection-point-ref: vnf-ul
      - vnfd-id-ref: p4_vnf
        member-vnf-index-ref: '3'
        vnfd-connection-point-ref: vnf-dl
    - id: linkC
      name: linkC
      short-name: linkC
      type: ELAN
      vnfd-connection-point-ref:
      - vnfd-id-ref: p4_vnf
        member-vnf-index-ref: '3'
        vnfd-connection-point-ref: vnf-ul
      - vnfd-id-ref: p4_vnf
        member-vnf-index-ref: '1'
        vnfd-connection-point-ref: vnf-dl
```

# Execution

The T4P4S-based data plane is launched automatically when the VNF starts (with the l2fwd P4 program and its control plane by default). However, the P4 program can be changed manually at runtime with the run_switch.sh script described above. The P4VNF is also equipped with tools for performance measurements such as iperf and pktgen. Using this topology, there are multiple possible setups:

1) two nodes with linkA only: node #1 runs pktgen and node #2 runs the data plane program;
2) three nodes with linkA and linkB: nodes #1 and #3 run the iperf source and sink, and the data plane program is executed by node #2;
3) three nodes with all the links: the iperf source and sink (or pktgen) run on the same node #1, while nodes #2 and #3 both execute P4 data plane programs, resulting in a short chain of NFs.

## Experiences with pktgen

To measure the packet processing rate, we started the compiled P4 data plane program on one node and dpdk-pktgen on the other. We specified the destination MAC addresses in pktgen to match the ports of the node running the P4 program; otherwise the traffic was lost between the two nodes. Unfortunately, we only observed traffic in the first second: after around 10k packets, pktgen suddenly stopped sending packets. Similar behavior was not observed in non-virtualized environments. The PPS observed in the first second was not meaningful, even with the simplest port-forwarding P4 program (directly forwarding packets from one port to the other). Because of these limitations of pktgen, we decided to use the iperf tool to generate measurement traffic. In this scenario, we used the "two nodes with linkA only" setting.

## Experiences with iperf

In this experimental scenario, the "three nodes with linkA and linkB" setting is used. On nodes #1 and #3 we launched an iperf client and server, respectively, while node #2 runs the P4 data plane program to be tested. Note that the IP addresses of the interfaces connected to the virtual links have to be configured correctly.
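
On the endpoint nodes this can be done with the standard ip tool. A minimal sketch, assuming the data-plane port of the endpoint node is handled by the kernel virtio driver rather than bound to DPDK (interface names and addresses depend on the deployment; the placeholders follow the conventions used elsewhere in this document):

```
# On each iperf endpoint node, assign the virtual-link address to the
# data-plane port and bring the interface up.
sudo ip addr add <IP>/<PREFIX> dev <IFACE>
sudo ip link set <IFACE> up
```
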
For example, node #3 has IP 192.168.143.2/24 on the port connected to linkB, and node #1 has 192.168.143.3/24 on the port connected to linkA. They are in the same subnet here, but this may differ in the various use cases. Note that these steps are required because iperf generates real end-to-end traffic; with pktgen they could be avoided.

### Setting up the switch

The machine's interfaces, hugepages, etc. can be configured with the _setup_t4p4s.sh_ script. Note that this script is automatically executed when the VNF boots up, but if configuration issues arise, the script can be re-run to reset the state.

```
~/setup_t4p4s.sh
```

After the initial configuration, a controller and a switch program can be started or replaced with the following helper scripts. Note that the controller should be launched before the data plane program. By default, the L2 forwarding program is launched when the VNF boots up.

```
~/run_controller.sh <PROGRAM_NAME> <TABLE_FILE>
~/run_switch.sh <PROGRAM_NAME>
```

It is also possible to start the switch and the controller with the default configuration (as described in examples.cfg) with:

```
~/t4p4s/t4p4s/t4p4s.sh :<PROGRAM_NAME> verbose
```

### Launching iperf server

As described above, the iperf tool is preinstalled in the VM image of the P4VNF. On node #3 we launch the iperf server with the following command:

```
iperf -s
```

### Launching iperf client

The iperf client is executed on node #1, using the IP address of node #3 as the server side. The client can be executed in the following way:

```
iperf -c 192.168.143.2   # the iperf server's ens4 interface IP
```

We can observe that the RTTs reported by ping vary over a wide range (0.6 ms - 6.49 ms). Our hypothesis is that this phenomenon is caused by the underlying networking and virtualization layer of the 5TONIC infrastructure. The average packet processing delay reported by the data plane program (INT feature) was in the range of hundreds of microseconds.

## Chaining P4VNFs

In the third setting, all three nodes and all the links were used. Nodes #2 and #3 ran P4 data plane programs, while node #1 served as both source and destination of the test traffic. The test traffic originates at node #1 and traverses linkA, linkB and linkC in this order, and the responses (e.g. ACKs) should follow the linkC, linkB and linkA path. Note that implementing this measurement with pktgen would be simple, but with iperf we had to use some tricks. Since both the iperf source and sink instances run inside the same node, we had to force the traffic to take the path through nodes #2 and #3. To this end, we used Docker containers and the pipework tool, which can assign (VIRTIO) NICs to containers. We first created two containers for the iperf source and sink on node #1:

```
sudo docker run -itd --cap-add=NET_ADMIN --name=iperf3-server networkstatic/iperf3 -s
sudo docker run -itd --cap-add=NET_ADMIN --name=iperf3-client networkstatic/iperf3 -s
```

Then pipework was used to assign the interfaces to the containers and configure their IP/PREFIX:

```
sudo ./pipework --direct-phys <NETWORK-INTERFACE> <CONTAINERNAME> <IP>/<PREFIX>
```

For example, this is a valid configuration for the iperf containers:

```
sudo ./pipework --direct-phys enp11s0f0 iperf3-server 192.168.1.1/24
sudo ./pipework --direct-phys enp11s0f1 iperf3-client 192.168.1.2/24
```

Then iperf measurements can be carried out in both directions of the chain, e.g.
by entering one of the iperf containers and then launching the measurement:

```
docker exec -i -t iperf3-client bash
root@50b2dc33e165:/# iperf3 -t 20 -c 192.168.1.2
```

To measure the end-to-end delay during an iperf measurement, the ping tool can additionally be executed.

# Loading data into INT database

The _run_switch.sh_ script does not automatically store the data in the INT database. To this end, the following commands should be executed:

```
~/run_switch.sh <PROGRAM_NAME> > switch.log &
python ~/log_loader.py switch.log <DB_MGMT_IP-optional>
```

The Python script loads all the collected switch statistics into the prepared int_db database of the InfluxDB instance (the local instance is used by default). For the integration with Grafana, the required steps are the following:

1. A new data source should be added:
   - Database name: **int_db**
   - Database type: **InfluxDB**
   - URL: `http://<jumper node>:<forwarded port>` - note that you have to reach the node inside 5TONIC (VPN + tunneling)
   - Access: **Server (Default)**
   - InfluxDB Details->Database: **int_db**

   Then, by clicking on Save & Test, the data source can be added. If the DB is unavailable because of VPN or tunneling problems, an error message is presented. For visualizing the time series stored in the int_db data source, you can select different metrics:

   - **min_proc_delay**: Minimum of the observed packet processing delay in usec
   - **max_proc_delay**: Maximum of the observed packet processing delay in usec
   - **mean_proc_delay**: Mean of the observed packet processing delay in usec
   - **min_intvar**: Minimum of the "intvar" standard metadata field used for storing user-defined metrics
   - **max_intvar**: Maximum of the "intvar" standard metadata field used for storing user-defined metrics
   - **mean_intvar**: Mean of the "intvar" standard metadata field used for storing user-defined metrics

   Each metric represents the statistics (min/max/mean) observed in a time window of 1 second. For all the metrics, the actual value for the given timestamp is stored in the field called "value".

2. To add a graph to a Grafana dashboard, the int_db data source should first be selected and then the following query can be added:

   ```
   FROM default <METRIC-NAME> WHERE
   SELECT field(value)
   GROUP BY
   FORMAT AS Time series
   ALIAS BY
   ```

   In this query, **_<METRIC-NAME\>_** can be replaced by any of the collected metrics listed above.

# Measurements in 5TONIC

To analyze the performance of the P4VNF, we investigated two experimental scenarios:

1) P4VNF testing, focusing on the packet forwarding performance, end-to-end delay and packet processing delay. In this scenario, we considered P4 programs covering all the use cases described in Section 3.
2) P4VNF chaining, where two instances of a simple port forwarding network function were used, constituting a service chain. This scenario focused on the achievable throughput and end-to-end delays.

## Scenario 1: Evaluation of a P4VNF node

![Evaluation of a P4VNF node](uploads/p4in5g/scenario-1.png)

The figure depicts the scenario, consisting of three nodes. Nodes #1 and #3 are the source and sink of the test traffic, while node #2 runs the P4-based data plane. The test traffic was generated by the iperf tool. Note that we also tried to use DPDK's pktgen traffic generator, but after sending approximately 10k packets it suddenly stopped sending packets. The end-to-end delay (actually RTT) was measured by the ping tool, executed in parallel with iperf.
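
As an illustration, one round of such a measurement can be driven from node #1 roughly as follows. This is only a sketch: it assumes the iperf server is already running on node #3 (iperf -s) and reuses the addresses from the earlier two-hop example; the measurement duration is arbitrary.

```
# Measure throughput and RTT in parallel for 30 seconds.
ping -w 30 192.168.143.2 > ping.log &   # collect RTT samples toward the iperf server
iperf -c 192.168.143.2 -t 30            # TCP throughput through the P4 data plane on node #2
wait                                    # wait for the backgrounded ping to finish as well
tail -n 3 ping.log                      # ping summary: packet loss and min/avg/max RTT
```
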
The packet processing statistics were provided by the software data plane through its INT feature. In addition to the use cases described in Section 1, a port forwarding (PortFWD) program was also examined as a baseline; it simply forwards every packet from port 0 to port 1, and vice versa. In the L2 forwarding use case we had two exact match tables, one for source and one for destination MAC addresses. We considered two cases, shown as L2 and L2big in the figures and tables: 1) L2: both tables are filled with 2 entries; 2) L2big: 1000 entries are inserted into both tables. L3 routing is a more complex use case: it consists of an exact match source MAC table, an LPM table for IP/SUBNET and an exact match next hop table. Similarly to the L2 use case, in the L3 scenario all the tables contain 2 entries, while in L3big 1000 entries are inserted into the LPM table.

![Packet processing performance of basic use cases with various packet sizes. (average throughput)](/uploads/p-4-in-5-g/scenario-1-image-1-a.png)

![Packet processing performance of basic use cases with various packet sizes. (average RTT)](/uploads/p-4-in-5-g/scenario-1-image-1-b.png)

![The performance of L2 forwarding use case with different table sizes. (average throughput)](/uploads/p-4-in-5-g/scenario-1-image-2-a.png)

![The performance of L2 forwarding use case with different table sizes. (average throughput)](/uploads/p-4-in-5-g/scenario-1-image-2-b.png)

![The performance of L3 routing use case with various table sizes. (average throughput)](/uploads/p-4-in-5-g/scenario-1-image-3-a.png)

![The performance of L3 routing use case with various table sizes. (average throughput)](/uploads/p-4-in-5-g/scenario-1-image-3-b.png)

## Scenario 2: Chaining two P4VNFs

In this scenario, we executed two port forwarding (PortFWD) examples on node #2 and node #3, while the traffic was generated by node #1, where two Docker containers were running. The DL and UL interfaces were assigned to the two containers. The generated traffic followed the path denoted by the red dashed line in the figure below.

![Chaining VNFs](/uploads/p-4-in-5-g/scenario-2.png)

This scenario was only evaluated with the port forwarding example, to see the achievable baseline performance of chaining multiple P4VNFs. As in the previous cases, we monitored the end-to-end throughput and RTT, as well as the packet processing delay of the two data plane programs. One can observe that the achieved throughput is the same as in the previous experimental scenario; only the observed end-to-end RTT increased, because of the longer path.