| .. SPDX-License-Identifier: GPL-2.0+ |
| |
| ================================================================= |
| Linux Base Driver for the Intel(R) Ethernet Controller 800 Series |
| ================================================================= |
| |
| Intel ice Linux driver. |
| Copyright(c) 2018-2021 Intel Corporation. |
| |
| Contents |
| ======== |
| |
| - Overview |
| - Identifying Your Adapter |
| - Important Notes |
| - Additional Features & Configurations |
| - Performance Optimization |
| |
| |
| The associated Virtual Function (VF) driver for this driver is iavf. |
| |
| Driver information can be obtained using ethtool and lspci. |
| |
| For questions related to hardware requirements, refer to the documentation |
| supplied with your Intel adapter. All hardware requirements listed apply to use |
| with Linux. |
| |
| This driver supports XDP (Express Data Path) and AF_XDP zero-copy. Note that |
| XDP is blocked for frame sizes larger than 3KB. |
| |
| |
| Identifying Your Adapter |
| ======================== |
| For information on how to identify your adapter, and for the latest Intel |
| network drivers, refer to the Intel Support website: |
| https://www.intel.com/support |
| |
| |
| Important Notes |
| =============== |
| |
| Packet drops may occur under receive stress |
| ------------------------------------------- |
| Devices based on the Intel(R) Ethernet Controller 800 Series are designed to |
| tolerate a limited amount of system latency during PCIe and DMA transactions. |
| If these transactions take longer than the tolerated latency, it can impact the |
| length of time the packets are buffered in the device and associated memory, |
| which may result in dropped packets. These packets drops typically do not have |
| a noticeable impact on throughput and performance under standard workloads. |
| |
| If these packet drops appear to affect your workload, the following may improve |
| the situation: |
| |
| 1) Make sure that your system's physical memory is in a high-performance |
| configuration, as recommended by the platform vendor. A common |
| recommendation is for all channels to be populated with a single DIMM |
| module. |
| 2) In your system's BIOS/UEFI settings, select the "Performance" profile. |
| 3) Your distribution may provide tools like "tuned," which can help tweak |
| kernel settings to achieve better standard settings for different workloads. |
| |
| |
| Configuring SR-IOV for improved network security |
| ------------------------------------------------ |
| In a virtualized environment, on Intel(R) Ethernet Network Adapters that |
| support SR-IOV, the virtual function (VF) may be subject to malicious behavior. |
| Software-generated layer two frames, like IEEE 802.3x (link flow control), IEEE |
| 802.1Qbb (priority based flow-control), and others of this type, are not |
| expected and can throttle traffic between the host and the virtual switch, |
| reducing performance. To resolve this issue, and to ensure isolation from |
| unintended traffic streams, configure all SR-IOV enabled ports for VLAN tagging |
| from the administrative interface on the PF. This configuration allows |
| unexpected, and potentially malicious, frames to be dropped. |
| |
| See "Configuring VLAN Tagging on SR-IOV Enabled Adapter Ports" later in this |
| README for configuration instructions. |
| |
| |
| Do not unload port driver if VF with active VM is bound to it |
| ------------------------------------------------------------- |
| Do not unload a port's driver if a Virtual Function (VF) with an active Virtual |
| Machine (VM) is bound to it. Doing so will cause the port to appear to hang. |
| Once the VM shuts down, or otherwise releases the VF, the command will |
| complete. |
| |
| |
| Important notes for SR-IOV and Link Aggregation |
| ----------------------------------------------- |
| Link Aggregation is mutually exclusive with SR-IOV. |
| |
| - If Link Aggregation is active, SR-IOV VFs cannot be created on the PF. |
| - If SR-IOV is active, you cannot set up Link Aggregation on the interface. |
| |
| Bridging and MACVLAN are also affected by this. If you wish to use bridging or |
| MACVLAN with SR-IOV, you must set up bridging or MACVLAN before enabling |
| SR-IOV. If you are using bridging or MACVLAN in conjunction with SR-IOV, and |
| you want to remove the interface from the bridge or MACVLAN, you must follow |
| these steps: |
| |
| 1. Destroy SR-IOV VFs if they exist |
| 2. Remove the interface from the bridge or MACVLAN |
| 3. Recreate SRIOV VFs as needed |
| |
| |
| Additional Features and Configurations |
| ====================================== |
| |
| ethtool |
| ------- |
| The driver utilizes the ethtool interface for driver configuration and |
| diagnostics, as well as displaying statistical information. The latest ethtool |
| version is required for this functionality. Download it at: |
| https://kernel.org/pub/software/network/ethtool/ |
| |
| NOTE: The rx_bytes value of ethtool does not match the rx_bytes value of |
| Netdev, due to the 4-byte CRC being stripped by the device. The difference |
| between the two rx_bytes values will be 4 x the number of Rx packets. For |
| example, if Rx packets are 10 and Netdev (software statistics) displays |
| rx_bytes as "X", then ethtool (hardware statistics) will display rx_bytes as |
| "X+40" (4 bytes CRC x 10 packets). |
| |
| |
| Viewing Link Messages |
| --------------------- |
| Link messages will not be displayed to the console if the distribution is |
| restricting system messages. In order to see network driver link messages on |
| your console, set dmesg to eight by entering the following:: |
| |
| # dmesg -n 8 |
| |
| NOTE: This setting is not saved across reboots. |
| |
| |
| Dynamic Device Personalization |
| ------------------------------ |
| Dynamic Device Personalization (DDP) allows you to change the packet processing |
| pipeline of a device by applying a profile package to the device at runtime. |
| Profiles can be used to, for example, add support for new protocols, change |
| existing protocols, or change default settings. DDP profiles can also be rolled |
| back without rebooting the system. |
| |
| The DDP package loads during device initialization. The driver looks for |
| ``intel/ice/ddp/ice.pkg`` in your firmware root (typically ``/lib/firmware/`` |
| or ``/lib/firmware/updates/``) and checks that it contains a valid DDP package |
| file. |
| |
| NOTE: Your distribution should likely have provided the latest DDP file, but if |
| ice.pkg is missing, you can find it in the linux-firmware repository or from |
| intel.com. |
| |
| If the driver is unable to load the DDP package, the device will enter Safe |
| Mode. Safe Mode disables advanced and performance features and supports only |
| basic traffic and minimal functionality, such as updating the NVM or |
| downloading a new driver or DDP package. Safe Mode only applies to the affected |
| physical function and does not impact any other PFs. See the "Intel(R) Ethernet |
| Adapters and Devices User Guide" for more details on DDP and Safe Mode. |
| |
| NOTES: |
| |
| - If you encounter issues with the DDP package file, you may need to download |
| an updated driver or DDP package file. See the log messages for more |
| information. |
| |
| - The ice.pkg file is a symbolic link to the default DDP package file. |
| |
| - You cannot update the DDP package if any PF drivers are already loaded. To |
| overwrite a package, unload all PFs and then reload the driver with the new |
| package. |
| |
| - Only the first loaded PF per device can download a package for that device. |
| |
| You can install specific DDP package files for different physical devices in |
| the same system. To install a specific DDP package file: |
| |
| 1. Download the DDP package file you want for your device. |
| |
| 2. Rename the file ice-xxxxxxxxxxxxxxxx.pkg, where 'xxxxxxxxxxxxxxxx' is the |
| unique 64-bit PCI Express device serial number (in hex) of the device you |
| want the package downloaded on. The filename must include the complete |
| serial number (including leading zeros) and be all lowercase. For example, |
| if the 64-bit serial number is b887a3ffffca0568, then the file name would be |
| ice-b887a3ffffca0568.pkg. |
| |
| To find the serial number from the PCI bus address, you can use the |
| following command:: |
| |
| # lspci -vv -s af:00.0 | grep -i Serial |
| Capabilities: [150 v1] Device Serial Number b8-87-a3-ff-ff-ca-05-68 |
| |
| You can use the following command to format the serial number without the |
| dashes:: |
| |
| # lspci -vv -s af:00.0 | grep -i Serial | awk '{print $7}' | sed s/-//g |
| b887a3ffffca0568 |
| |
| 3. Copy the renamed DDP package file to |
| ``/lib/firmware/updates/intel/ice/ddp/``. If the directory does not yet |
| exist, create it before copying the file. |
| |
| 4. Unload all of the PFs on the device. |
| |
| 5. Reload the driver with the new package. |
| |
| NOTE: The presence of a device-specific DDP package file overrides the loading |
| of the default DDP package file (ice.pkg). |
| |
| |
| Intel(R) Ethernet Flow Director |
| ------------------------------- |
| The Intel Ethernet Flow Director performs the following tasks: |
| |
| - Directs receive packets according to their flows to different queues |
| - Enables tight control on routing a flow in the platform |
| - Matches flows and CPU cores for flow affinity |
| |
| NOTE: This driver supports the following flow types: |
| |
| - IPv4 |
| - TCPv4 |
| - UDPv4 |
| - SCTPv4 |
| - IPv6 |
| - TCPv6 |
| - UDPv6 |
| - SCTPv6 |
| |
| Each flow type supports valid combinations of IP addresses (source or |
| destination) and UDP/TCP/SCTP ports (source and destination). You can supply |
| only a source IP address, a source IP address and a destination port, or any |
| combination of one or more of these four parameters. |
| |
| NOTE: This driver allows you to filter traffic based on a user-defined flexible |
| two-byte pattern and offset by using the ethtool user-def and mask fields. Only |
| L3 and L4 flow types are supported for user-defined flexible filters. For a |
| given flow type, you must clear all Intel Ethernet Flow Director filters before |
| changing the input set (for that flow type). |
| |
| |
| Flow Director Filters |
| --------------------- |
| Flow Director filters are used to direct traffic that matches specified |
| characteristics. They are enabled through ethtool's ntuple interface. To enable |
| or disable the Intel Ethernet Flow Director and these filters:: |
| |
| # ethtool -K <ethX> ntuple <off|on> |
| |
| NOTE: When you disable ntuple filters, all the user programmed filters are |
| flushed from the driver cache and hardware. All needed filters must be re-added |
| when ntuple is re-enabled. |
| |
| To display all of the active filters:: |
| |
| # ethtool -u <ethX> |
| |
| To add a new filter:: |
| |
| # ethtool -U <ethX> flow-type <type> src-ip <ip> [m <ip_mask>] dst-ip <ip> |
| [m <ip_mask>] src-port <port> [m <port_mask>] dst-port <port> [m <port_mask>] |
| action <queue> |
| |
| Where: |
| <ethX> - the Ethernet device to program |
| <type> - can be ip4, tcp4, udp4, sctp4, ip6, tcp6, udp6, sctp6 |
| <ip> - the IP address to match on |
| <ip_mask> - the IPv4 address to mask on |
| NOTE: These filters use inverted masks. |
| <port> - the port number to match on |
| <port_mask> - the 16-bit integer for masking |
| NOTE: These filters use inverted masks. |
| <queue> - the queue to direct traffic toward (-1 discards the |
| matched traffic) |
| |
| To delete a filter:: |
| |
| # ethtool -U <ethX> delete <N> |
| |
| Where <N> is the filter ID displayed when printing all the active filters, |
| and may also have been specified using "loc <N>" when adding the filter. |
| |
| EXAMPLES: |
| |
| To add a filter that directs packet to queue 2:: |
| |
| # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \ |
| 192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1] |
| |
| To set a filter using only the source and destination IP address:: |
| |
| # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \ |
| 192.168.10.2 action 2 [loc 1] |
| |
| To set a filter based on a user-defined pattern and offset:: |
| |
| # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \ |
| 192.168.10.2 user-def 0x4FFFF action 2 [loc 1] |
| |
| where the value of the user-def field contains the offset (4 bytes) and |
| the pattern (0xffff). |
| |
| To match TCP traffic sent from 192.168.0.1, port 5300, directed to 192.168.0.5, |
| port 80, and then send it to queue 7:: |
| |
| # ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 |
| src-port 5300 dst-port 80 action 7 |
| |
| To add a TCPv4 filter with a partial mask for a source IP subnet:: |
| |
| # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.0.0 m 0.255.255.255 dst-ip |
| 192.168.5.12 src-port 12600 dst-port 31 action 12 |
| |
| NOTES: |
| |
| For each flow-type, the programmed filters must all have the same matching |
| input set. For example, issuing the following two commands is acceptable:: |
| |
| # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 |
| # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10 |
| |
| Issuing the next two commands, however, is not acceptable, since the first |
| specifies src-ip and the second specifies dst-ip:: |
| |
| # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7 |
| # ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10 |
| |
| The second command will fail with an error. You may program multiple filters |
| with the same fields, using different values, but, on one device, you may not |
| program two tcp4 filters with different matching fields. |
| |
| The ice driver does not support matching on a subportion of a field, thus |
| partial mask fields are not supported. |
| |
| |
| Flex Byte Flow Director Filters |
| ------------------------------- |
| The driver also supports matching user-defined data within the packet payload. |
| This flexible data is specified using the "user-def" field of the ethtool |
| command in the following way: |
| |
| .. table:: |
| |
| ============================== ============================ |
| ``31 28 24 20 16`` ``15 12 8 4 0`` |
| ``offset into packet payload`` ``2 bytes of flexible data`` |
| ============================== ============================ |
| |
| For example, |
| |
| :: |
| |
| ... user-def 0x4FFFF ... |
| |
| tells the filter to look 4 bytes into the payload and match that value against |
| 0xFFFF. The offset is based on the beginning of the payload, and not the |
| beginning of the packet. Thus |
| |
| :: |
| |
| flow-type tcp4 ... user-def 0x8BEAF ... |
| |
| would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the |
| TCP/IPv4 payload. |
| |
| Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload. |
| Thus to match the first byte of the payload, you must actually add 4 bytes to |
| the offset. Also note that ip4 filters match both ICMP frames as well as raw |
| (unknown) ip4 frames, where the payload will be the L3 payload of the IP4 |
| frame. |
| |
| The maximum offset is 64. The hardware will only read up to 64 bytes of data |
| from the payload. The offset must be even because the flexible data is 2 bytes |
| long and must be aligned to byte 0 of the packet payload. |
| |
| The user-defined flexible offset is also considered part of the input set and |
| cannot be programmed separately for multiple filters of the same type. However, |
| the flexible data is not part of the input set and multiple filters may use the |
| same offset but match against different data. |
| |
| |
| RSS Hash Flow |
| ------------- |
| Allows you to set the hash bytes per flow type and any combination of one or |
| more options for Receive Side Scaling (RSS) hash byte configuration. |
| |
| :: |
| |
| # ethtool -N <ethX> rx-flow-hash <type> <option> |
| |
| Where <type> is: |
| tcp4 signifying TCP over IPv4 |
| udp4 signifying UDP over IPv4 |
| tcp6 signifying TCP over IPv6 |
| udp6 signifying UDP over IPv6 |
| And <option> is one or more of: |
| s Hash on the IP source address of the Rx packet. |
| d Hash on the IP destination address of the Rx packet. |
| f Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet. |
| n Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet. |
| |
| |
| Accelerated Receive Flow Steering (aRFS) |
| ---------------------------------------- |
| Devices based on the Intel(R) Ethernet Controller 800 Series support |
| Accelerated Receive Flow Steering (aRFS) on the PF. aRFS is a load-balancing |
| mechanism that allows you to direct packets to the same CPU where an |
| application is running or consuming the packets in that flow. |
| |
| NOTES: |
| |
| - aRFS requires that ntuple filtering is enabled via ethtool. |
| - aRFS support is limited to the following packet types: |
| |
| - TCP over IPv4 and IPv6 |
| - UDP over IPv4 and IPv6 |
| - Nonfragmented packets |
| |
| - aRFS only supports Flow Director filters, which consist of the |
| source/destination IP addresses and source/destination ports. |
| - aRFS and ethtool's ntuple interface both use the device's Flow Director. aRFS |
| and ntuple features can coexist, but you may encounter unexpected results if |
| there's a conflict between aRFS and ntuple requests. See "Intel(R) Ethernet |
| Flow Director" for additional information. |
| |
| To set up aRFS: |
| |
| 1. Enable the Intel Ethernet Flow Director and ntuple filters using ethtool. |
| |
| :: |
| |
| # ethtool -K <ethX> ntuple on |
| |
| 2. Set up the number of entries in the global flow table. For example: |
| |
| :: |
| |
| # NUM_RPS_ENTRIES=16384 |
| # echo $NUM_RPS_ENTRIES > /proc/sys/net/core/rps_sock_flow_entries |
| |
| 3. Set up the number of entries in the per-queue flow table. For example: |
| |
| :: |
| |
| # NUM_RX_QUEUES=64 |
| # for file in /sys/class/net/$IFACE/queues/rx-*/rps_flow_cnt; do |
| # echo $(($NUM_RPS_ENTRIES/$NUM_RX_QUEUES)) > $file; |
| # done |
| |
| 4. Disable the IRQ balance daemon (this is only a temporary stop of the service |
| until the next reboot). |
| |
| :: |
| |
| # systemctl stop irqbalance |
| |
| 5. Configure the interrupt affinity. |
| |
| See ``/Documentation/core-api/irq/irq-affinity.rst`` |
| |
| |
| To disable aRFS using ethtool:: |
| |
| # ethtool -K <ethX> ntuple off |
| |
| NOTE: This command will disable ntuple filters and clear any aRFS filters in |
| software and hardware. |
| |
| Example Use Case: |
| |
| 1. Set the server application on the desired CPU (e.g., CPU 4). |
| |
| :: |
| |
| # taskset -c 4 netserver |
| |
| 2. Use netperf to route traffic from the client to CPU 4 on the server with |
| aRFS configured. This example uses TCP over IPv4. |
| |
| :: |
| |
| # netperf -H <Host IPv4 Address> -t TCP_STREAM |
| |
| |
| Enabling Virtual Functions (VFs) |
| -------------------------------- |
| Use sysfs to enable virtual functions (VF). |
| |
| For example, you can create 4 VFs as follows:: |
| |
| # echo 4 > /sys/class/net/<ethX>/device/sriov_numvfs |
| |
| To disable VFs, write 0 to the same file:: |
| |
| # echo 0 > /sys/class/net/<ethX>/device/sriov_numvfs |
| |
| The maximum number of VFs for the ice driver is 256 total (all ports). To check |
| how many VFs each PF supports, use the following command:: |
| |
| # cat /sys/class/net/<ethX>/device/sriov_totalvfs |
| |
| Note: You cannot use SR-IOV when link aggregation (LAG)/bonding is active, and |
| vice versa. To enforce this, the driver checks for this mutual exclusion. |
| |
| |
| Displaying VF Statistics on the PF |
| ---------------------------------- |
| Use the following command to display the statistics for the PF and its VFs:: |
| |
| # ip -s link show dev <ethX> |
| |
| NOTE: The output of this command can be very large due to the maximum number of |
| possible VFs. |
| |
| The PF driver will display a subset of the statistics for the PF and for all |
| VFs that are configured. The PF will always print a statistics block for each |
| of the possible VFs, and it will show zero for all unconfigured VFs. |
| |
| |
| Configuring VLAN Tagging on SR-IOV Enabled Adapter Ports |
| -------------------------------------------------------- |
| To configure VLAN tagging for the ports on an SR-IOV enabled adapter, use the |
| following command. The VLAN configuration should be done before the VF driver |
| is loaded or the VM is booted. The VF is not aware of the VLAN tag being |
| inserted on transmit and removed on received frames (sometimes called "port |
| VLAN" mode). |
| |
| :: |
| |
| # ip link set dev <ethX> vf <id> vlan <vlan id> |
| |
| For example, the following will configure PF eth0 and the first VF on VLAN 10:: |
| |
| # ip link set dev eth0 vf 0 vlan 10 |
| |
| |
| Enabling a VF link if the port is disconnected |
| ---------------------------------------------- |
| If the physical function (PF) link is down, you can force link up (from the |
| host PF) on any virtual functions (VF) bound to the PF. |
| |
| For example, to force link up on VF 0 bound to PF eth0:: |
| |
| # ip link set eth0 vf 0 state enable |
| |
| Note: If the command does not work, it may not be supported by your system. |
| |
| |
| Setting the MAC Address for a VF |
| -------------------------------- |
| To change the MAC address for the specified VF:: |
| |
| # ip link set <ethX> vf 0 mac <address> |
| |
| For example:: |
| |
| # ip link set <ethX> vf 0 mac 00:01:02:03:04:05 |
| |
| This setting lasts until the PF is reloaded. |
| |
| NOTE: Assigning a MAC address for a VF from the host will disable any |
| subsequent requests to change the MAC address from within the VM. This is a |
| security feature. The VM is not aware of this restriction, so if this is |
| attempted in the VM, it will trigger MDD events. |
| |
| |
| Trusted VFs and VF Promiscuous Mode |
| ----------------------------------- |
| This feature allows you to designate a particular VF as trusted and allows that |
| trusted VF to request selective promiscuous mode on the Physical Function (PF). |
| |
| To set a VF as trusted or untrusted, enter the following command in the |
| Hypervisor:: |
| |
| # ip link set dev <ethX> vf 1 trust [on|off] |
| |
| NOTE: It's important to set the VF to trusted before setting promiscuous mode. |
| If the VM is not trusted, the PF will ignore promiscuous mode requests from the |
| VF. If the VM becomes trusted after the VF driver is loaded, you must make a |
| new request to set the VF to promiscuous. |
| |
| Once the VF is designated as trusted, use the following commands in the VM to |
| set the VF to promiscuous mode. |
| |
| For promiscuous all:: |
| |
| # ip link set <ethX> promisc on |
| Where <ethX> is a VF interface in the VM |
| |
| For promiscuous Multicast:: |
| |
| # ip link set <ethX> allmulticast on |
| Where <ethX> is a VF interface in the VM |
| |
| NOTE: By default, the ethtool private flag vf-true-promisc-support is set to |
| "off," meaning that promiscuous mode for the VF will be limited. To set the |
| promiscuous mode for the VF to true promiscuous and allow the VF to see all |
| ingress traffic, use the following command:: |
| |
| # ethtool --set-priv-flags <ethX> vf-true-promisc-support on |
| |
| The vf-true-promisc-support private flag does not enable promiscuous mode; |
| rather, it designates which type of promiscuous mode (limited or true) you will |
| get when you enable promiscuous mode using the ip link commands above. Note |
| that this is a global setting that affects the entire device. However, the |
| vf-true-promisc-support private flag is only exposed to the first PF of the |
| device. The PF remains in limited promiscuous mode regardless of the |
| vf-true-promisc-support setting. |
| |
| Next, add a VLAN interface on the VF interface. For example:: |
| |
| # ip link add link eth2 name eth2.100 type vlan id 100 |
| |
| Note that the order in which you set the VF to promiscuous mode and add the |
| VLAN interface does not matter (you can do either first). The result in this |
| example is that the VF will get all traffic that is tagged with VLAN 100. |
| |
| |
| Malicious Driver Detection (MDD) for VFs |
| ---------------------------------------- |
| Some Intel Ethernet devices use Malicious Driver Detection (MDD) to detect |
| malicious traffic from the VF and disable Tx/Rx queues or drop the offending |
| packet until a VF driver reset occurs. You can view MDD messages in the PF's |
| system log using the dmesg command. |
| |
| - If the PF driver logs MDD events from the VF, confirm that the correct VF |
| driver is installed. |
| - To restore functionality, you can manually reload the VF or VM or enable |
| automatic VF resets. |
| - When automatic VF resets are enabled, the PF driver will immediately reset |
| the VF and reenable queues when it detects MDD events on the receive path. |
| - If automatic VF resets are disabled, the PF will not automatically reset the |
| VF when it detects MDD events. |
| |
| To enable or disable automatic VF resets, use the following command:: |
| |
| # ethtool --set-priv-flags <ethX> mdd-auto-reset-vf on|off |
| |
| |
| MAC and VLAN Anti-Spoofing Feature for VFs |
| ------------------------------------------ |
| When a malicious driver on a Virtual Function (VF) interface attempts to send a |
| spoofed packet, it is dropped by the hardware and not transmitted. |
| |
| NOTE: This feature can be disabled for a specific VF:: |
| |
| # ip link set <ethX> vf <vf id> spoofchk {off|on} |
| |
| |
| Jumbo Frames |
| ------------ |
| Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) |
| to a value larger than the default value of 1500. |
| |
| Use the ifconfig command to increase the MTU size. For example, enter the |
| following where <ethX> is the interface number:: |
| |
| # ifconfig <ethX> mtu 9000 up |
| |
| Alternatively, you can use the ip command as follows:: |
| |
| # ip link set mtu 9000 dev <ethX> |
| # ip link set up dev <ethX> |
| |
| This setting is not saved across reboots. |
| |
| |
| NOTE: The maximum MTU setting for jumbo frames is 9702. This corresponds to the |
| maximum jumbo frame size of 9728 bytes. |
| |
| NOTE: This driver will attempt to use multiple page sized buffers to receive |
| each jumbo packet. This should help to avoid buffer starvation issues when |
| allocating receive packets. |
| |
| NOTE: Packet loss may have a greater impact on throughput when you use jumbo |
| frames. If you observe a drop in performance after enabling jumbo frames, |
| enabling flow control may mitigate the issue. |
| |
| |
| Speed and Duplex Configuration |
| ------------------------------ |
| In addressing speed and duplex configuration issues, you need to distinguish |
| between copper-based adapters and fiber-based adapters. |
| |
| In the default mode, an Intel(R) Ethernet Network Adapter using copper |
| connections will attempt to auto-negotiate with its link partner to determine |
| the best setting. If the adapter cannot establish link with the link partner |
| using auto-negotiation, you may need to manually configure the adapter and link |
| partner to identical settings to establish link and pass packets. This should |
| only be needed when attempting to link with an older switch that does not |
| support auto-negotiation or one that has been forced to a specific speed or |
| duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds |
| and higher cannot be forced. Use the autonegotiation advertising setting to |
| manually set devices for 1 Gbps and higher. |
| |
| Speed, duplex, and autonegotiation advertising are configured through the |
| ethtool utility. For the latest version, download and install ethtool from the |
| following website: |
| |
| https://kernel.org/pub/software/network/ethtool/ |
| |
| To see the speed configurations your device supports, run the following:: |
| |
| # ethtool <ethX> |
| |
| Caution: Only experienced network administrators should force speed and duplex |
| or change autonegotiation advertising manually. The settings at the switch must |
| always match the adapter settings. Adapter performance may suffer or your |
| adapter may not operate if you configure the adapter differently from your |
| switch. |
| |
| |
| Data Center Bridging (DCB) |
| -------------------------- |
| NOTE: The kernel assumes that TC0 is available, and will disable Priority Flow |
| Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is |
| enabled when setting up DCB on your switch. |
| |
| DCB is a configuration Quality of Service implementation in hardware. It uses |
| the VLAN priority tag (802.1p) to filter traffic. That means that there are 8 |
| different priorities that traffic can be filtered into. It also enables |
| priority flow control (802.1Qbb) which can limit or eliminate the number of |
| dropped packets during network stress. Bandwidth can be allocated to each of |
| these priorities, which is enforced at the hardware level (802.1Qaz). |
| |
| DCB is normally configured on the network using the DCBX protocol (802.1Qaz), a |
| specialization of LLDP (802.1AB). The ice driver supports the following |
| mutually exclusive variants of DCBX support: |
| |
| 1) Firmware-based LLDP Agent |
| 2) Software-based LLDP Agent |
| |
| In firmware-based mode, firmware intercepts all LLDP traffic and handles DCBX |
| negotiation transparently for the user. In this mode, the adapter operates in |
| "willing" DCBX mode, receiving DCB settings from the link partner (typically a |
| switch). The local user can only query the negotiated DCB configuration. For |
| information on configuring DCBX parameters on a switch, please consult the |
| switch manufacturer's documentation. |
| |
| In software-based mode, LLDP traffic is forwarded to the network stack and user |
| space, where a software agent can handle it. In this mode, the adapter can |
| operate in either "willing" or "nonwilling" DCBX mode and DCB configuration can |
| be both queried and set locally. This mode requires the FW-based LLDP Agent to |
| be disabled. |
| |
| NOTE: |
| |
| - You can enable and disable the firmware-based LLDP Agent using an ethtool |
| private flag. Refer to the "FW-LLDP (Firmware Link Layer Discovery Protocol)" |
| section in this README for more information. |
| - In software-based DCBX mode, you can configure DCB parameters using software |
| LLDP/DCBX agents that interface with the Linux kernel's DCB Netlink API. We |
| recommend using OpenLLDP as the DCBX agent when running in software mode. For |
| more information, see the OpenLLDP man pages and |
| https://github.com/intel/openlldp. |
| - The driver implements the DCB netlink interface layer to allow the user space |
| to communicate with the driver and query DCB configuration for the port. |
| - iSCSI with DCB is not supported. |
| |
| |
| FW-LLDP (Firmware Link Layer Discovery Protocol) |
| ------------------------------------------------ |
| Use ethtool to change FW-LLDP settings. The FW-LLDP setting is per port and |
| persists across boots. |
| |
| To enable LLDP:: |
| |
| # ethtool --set-priv-flags <ethX> fw-lldp-agent on |
| |
| To disable LLDP:: |
| |
| # ethtool --set-priv-flags <ethX> fw-lldp-agent off |
| |
| To check the current LLDP setting:: |
| |
| # ethtool --show-priv-flags <ethX> |
| |
| NOTE: You must enable the UEFI HII "LLDP Agent" attribute for this setting to |
| take effect. If "LLDP AGENT" is set to disabled, you cannot enable it from the |
| OS. |
| |
| |
| Flow Control |
| ------------ |
| Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable |
| receiving and transmitting pause frames for ice. When transmit is enabled, |
| pause frames are generated when the receive packet buffer crosses a predefined |
| threshold. When receive is enabled, the transmit unit will halt for the time |
| delay specified when a pause frame is received. |
| |
| NOTE: You must have a flow control capable link partner. |
| |
| Flow Control is disabled by default. |
| |
| Use ethtool to change the flow control settings. |
| |
| To enable or disable Rx or Tx Flow Control:: |
| |
| # ethtool -A <ethX> rx <on|off> tx <on|off> |
| |
| Note: This command only enables or disables Flow Control if auto-negotiation is |
| disabled. If auto-negotiation is enabled, this command changes the parameters |
| used for auto-negotiation with the link partner. |
| |
| Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending |
| on your device, you may not be able to change the auto-negotiation setting. |
| |
| NOTE: |
| |
| - The ice driver requires flow control on both the port and link partner. If |
| flow control is disabled on one of the sides, the port may appear to hang on |
| heavy traffic. |
| - You may encounter issues with link-level flow control (LFC) after disabling |
| DCB. The LFC status may show as enabled but traffic is not paused. To resolve |
| this issue, disable and reenable LFC using ethtool:: |
| |
| # ethtool -A <ethX> rx off tx off |
| # ethtool -A <ethX> rx on tx on |
| |
| |
| NAPI |
| ---- |
| This driver supports NAPI (Rx polling mode). |
| For more information on NAPI, see |
| https://www.linuxfoundation.org/collaborate/workgroups/networking/napi |
| |
| |
| MACVLAN |
| ------- |
| This driver supports MACVLAN. Kernel support for MACVLAN can be tested by |
| checking if the MACVLAN driver is loaded. You can run 'lsmod | grep macvlan' to |
| see if the MACVLAN driver is loaded or run 'modprobe macvlan' to try to load |
| the MACVLAN driver. |
| |
| NOTE: |
| |
| - In passthru mode, you can only set up one MACVLAN device. It will inherit the |
| MAC address of the underlying PF (Physical Function) device. |
| |
| |
| IEEE 802.1ad (QinQ) Support |
| --------------------------- |
| The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN |
| IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as |
| "tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks |
| allow L2 tunneling and the ability to segregate traffic within a particular |
| VLAN ID, among other uses. |
| |
| NOTES: |
| |
| - Receive checksum offloads and VLAN acceleration are not supported for 802.1ad |
| (QinQ) packets. |
| |
| - 0x88A8 traffic will not be received unless VLAN stripping is disabled with |
| the following command:: |
| |
| # ethool -K <ethX> rxvlan off |
| |
| - 0x88A8/0x8100 double VLANs cannot be used with 0x8100 or 0x8100/0x8100 VLANS |
| configured on the same port. 0x88a8/0x8100 traffic will not be received if |
| 0x8100 VLANs are configured. |
| |
| - The VF can only transmit 0x88A8/0x8100 (i.e., 802.1ad/802.1Q) traffic if: |
| |
| 1) The VF is not assigned a port VLAN. |
| 2) spoofchk is disabled from the PF. If you enable spoofchk, the VF will |
| not transmit 0x88A8/0x8100 traffic. |
| |
| - The VF may not receive all network traffic based on the Inner VLAN header |
| when VF true promiscuous mode (vf-true-promisc-support) and double VLANs are |
| enabled in SR-IOV mode. |
| |
| The following are examples of how to configure 802.1ad (QinQ):: |
| |
| # ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24 |
| # ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371 |
| |
| Where "24" and "371" are example VLAN IDs. |
| |
| |
| Tunnel/Overlay Stateless Offloads |
| --------------------------------- |
| Supported tunnels and overlays include VXLAN, GENEVE, and others depending on |
| hardware and software configuration. Stateless offloads are enabled by default. |
| |
| To view the current state of all offloads:: |
| |
| # ethtool -k <ethX> |
| |
| |
| UDP Segmentation Offload |
| ------------------------ |
| Allows the adapter to offload transmit segmentation of UDP packets with |
| payloads up to 64K into valid Ethernet frames. Because the adapter hardware is |
| able to complete data segmentation much faster than operating system software, |
| this feature may improve transmission performance. |
| In addition, the adapter may use fewer CPU resources. |
| |
| NOTE: |
| |
| - The application sending UDP packets must support UDP segmentation offload. |
| |
| To enable/disable UDP Segmentation Offload, issue the following command:: |
| |
| # ethtool -K <ethX> tx-udp-segmentation [off|on] |
| |
| |
| Performance Optimization |
| ======================== |
| Driver defaults are meant to fit a wide variety of workloads, but if further |
| optimization is required, we recommend experimenting with the following |
| settings. |
| |
| |
| Rx Descriptor Ring Size |
| ----------------------- |
| To reduce the number of Rx packet discards, increase the number of Rx |
| descriptors for each Rx ring using ethtool. |
| |
| Check if the interface is dropping Rx packets due to buffers being full |
| (rx_dropped.nic can mean that there is no PCIe bandwidth):: |
| |
| # ethtool -S <ethX> | grep "rx_dropped" |
| |
| If the previous command shows drops on queues, it may help to increase |
| the number of descriptors using 'ethtool -G':: |
| |
| # ethtool -G <ethX> rx <N> |
| Where <N> is the desired number of ring entries/descriptors |
| |
| This can provide temporary buffering for issues that create latency while |
| the CPUs process descriptors. |
| |
| |
| Interrupt Rate Limiting |
| ----------------------- |
| This driver supports an adaptive interrupt throttle rate (ITR) mechanism that |
| is tuned for general workloads. The user can customize the interrupt rate |
| control for specific workloads, via ethtool, adjusting the number of |
| microseconds between interrupts. |
| |
| To set the interrupt rate manually, you must disable adaptive mode:: |
| |
| # ethtool -C <ethX> adaptive-rx off adaptive-tx off |
| |
| For lower CPU utilization: |
| |
| Disable adaptive ITR and lower Rx and Tx interrupts. The examples below |
| affect every queue of the specified interface. |
| |
| Setting rx-usecs and tx-usecs to 80 will limit interrupts to about |
| 12,500 interrupts per second per queue:: |
| |
| # ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs 80 tx-usecs 80 |
| |
| For reduced latency: |
| |
| Disable adaptive ITR and ITR by setting rx-usecs and tx-usecs to 0 |
| using ethtool:: |
| |
| # ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0 |
| |
| Per-queue interrupt rate settings: |
| |
| The following examples are for queues 1 and 3, but you can adjust other |
| queues. |
| |
| To disable Rx adaptive ITR and set static Rx ITR to 10 microseconds or |
| about 100,000 interrupts/second, for queues 1 and 3:: |
| |
| # ethtool --per-queue <ethX> queue_mask 0xa --coalesce adaptive-rx off |
| rx-usecs 10 |
| |
| To show the current coalesce settings for queues 1 and 3:: |
| |
| # ethtool --per-queue <ethX> queue_mask 0xa --show-coalesce |
| |
| Bounding interrupt rates using rx-usecs-high: |
| |
| :Valid Range: 0-236 (0=no limit) |
| |
| The range of 0-236 microseconds provides an effective range of 4,237 to |
| 250,000 interrupts per second. The value of rx-usecs-high can be set |
| independently of rx-usecs and tx-usecs in the same ethtool command, and is |
| also independent of the adaptive interrupt moderation algorithm. The |
| underlying hardware supports granularity in 4-microsecond intervals, so |
| adjacent values may result in the same interrupt rate. |
| |
| The following command would disable adaptive interrupt moderation, and allow |
| a maximum of 5 microseconds before indicating a receive or transmit was |
| complete. However, instead of resulting in as many as 200,000 interrupts per |
| second, it limits total interrupts per second to 50,000 via the rx-usecs-high |
| parameter. |
| |
| :: |
| |
| # ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs-high 20 |
| rx-usecs 5 tx-usecs 5 |
| |
| |
| Virtualized Environments |
| ------------------------ |
| In addition to the other suggestions in this section, the following may be |
| helpful to optimize performance in VMs. |
| |
| Using the appropriate mechanism (vcpupin) in the VM, pin the CPUs to |
| individual LCPUs, making sure to use a set of CPUs included in the |
| device's local_cpulist: ``/sys/class/net/<ethX>/device/local_cpulist``. |
| |
| Configure as many Rx/Tx queues in the VM as available. (See the iavf driver |
| documentation for the number of queues supported.) For example:: |
| |
| # ethtool -L <virt_interface> rx <max> tx <max> |
| |
| |
| Support |
| ======= |
| For general information, go to the Intel support website at: |
| https://www.intel.com/support/ |
| |
| or the Intel Wired Networking project hosted by Sourceforge at: |
| https://sourceforge.net/projects/e1000 |
| |
| If an issue is identified with the released source code on a supported kernel |
| with a supported adapter, email the specific information related to the issue |
| to e1000-devel@lists.sf.net. |
| |
| |
| Trademarks |
| ========== |
| Intel is a trademark or registered trademark of Intel Corporation or its |
| subsidiaries in the United States and/or other countries. |
| |
| * Other names and brands may be claimed as the property of others. |