| .. SPDX-License-Identifier: GPL-2.0 |
| |
| ================= |
| Ethernet Bridging |
| ================= |
| |
| Introduction |
| ============ |
| |
| The IEEE 802.1Q-2022 (Bridges and Bridged Networks) standard defines the |
| operation of bridges in computer networks. A bridge, in the context of this |
| standard, is a device that connects two or more network segments and operates |
| at the data link layer (Layer 2) of the OSI (Open Systems Interconnection) |
| model. The purpose of a bridge is to filter and forward frames between |
| different segments based on the destination MAC (Media Access Control) address. |
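
As a rough illustration of filtering and forwarding by destination MAC address, the following Python sketch models a learning bridge. The ``LearningBridge`` class and its method names are hypothetical and are not part of the kernel code. The real driver also handles ageing, VLANs and STP port states, which are left out here.

```python
class LearningBridge:
    """Toy model of a transparent bridge (hypothetical, not kernel code)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # forwarding database: MAC address -> port it was seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the sorted list of ports the frame is sent out of."""
        # Learn: remember which port leads to the sender.
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress one.
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            # Destination is on the same segment: filter (drop) the frame.
            return []
        # Known destination: forward to exactly one port.
        return [out_port]


bridge = LearningBridge(["eth0", "eth1", "eth2"])
flooded = bridge.receive("eth0", "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")
unicast = bridge.receive("eth1", "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa")
filtered = bridge.receive("eth0", "cc:cc:cc:cc:cc:cc", "aa:aa:aa:aa:aa:aa")
```

The first frame is flooded because its destination is unknown, the reply is forwarded to a single port, and the third frame is filtered because sender and receiver sit on the same segment.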
| |
| Bridge kAPI |
| =========== |
| |
| Here are some core structures of bridge code. Note that the kAPI is *unstable*, |
| and can be changed at any time. |
| |
| .. kernel-doc:: net/bridge/br_private.h |
| :identifiers: net_bridge_vlan |
| |
| Bridge uAPI |
| =========== |
| |
| The modern Linux bridge uAPI is accessed via the Netlink interface. The files |
| where the bridge and bridge port netlink attributes are defined are listed below. |
| |
| Bridge netlink attributes |
| ------------------------- |
| |
| .. kernel-doc:: include/uapi/linux/if_link.h |
| :doc: Bridge enum definition |
| |
| Bridge port netlink attributes |
| ------------------------------ |
| |
| .. kernel-doc:: include/uapi/linux/if_link.h |
| :doc: Bridge port enum definition |
| |
| Bridge sysfs |
| ------------ |
| |
| The sysfs interface is deprecated and should not be extended if new |
| options are added. |
| |
| STP |
| === |
| |
| The STP (Spanning Tree Protocol) implementation in the Linux bridge driver |
| is a critical feature that helps prevent loops and broadcast storms in |
| Ethernet networks by identifying and disabling redundant links. In a Linux |
| bridge context, STP is crucial for network stability and availability. |
| |
| STP is a Layer 2 protocol that operates at the Data Link Layer of the OSI |
| model. It was originally developed as IEEE 802.1D and has since evolved into |
| multiple versions, including Rapid Spanning Tree Protocol (RSTP) and |
| `Multiple Spanning Tree Protocol (MSTP) |
| <https://lore.kernel.org/netdev/20220316150857.2442916-1-tobias@waldekranz.com/>`_. |
| |
| IEEE 802.1D-2004 removed the original Spanning Tree Protocol, instead |
| incorporating the Rapid Spanning Tree Protocol (RSTP). By 2014, all the |
| functionality defined by IEEE 802.1D had been incorporated into either |
| IEEE 802.1Q (Bridges and Bridged Networks) or IEEE 802.1AC (MAC Service |
| Definition). 802.1D was officially withdrawn in 2022. |
| |
| Bridge Ports and STP States |
| --------------------------- |
| |
| In the context of STP, bridge ports can be in one of the following states: |
| * Blocking: The port is disabled for data traffic and only listens for |
| BPDUs (Bridge Protocol Data Units) from other devices to determine the |
| network topology. |
| * Listening: The port begins to participate in the STP process and listens |
| for BPDUs. |
| * Learning: The port continues to listen for BPDUs and begins to learn MAC |
| addresses from incoming frames but does not forward data frames. |
| * Forwarding: The port is fully operational and forwards both BPDUs and |
| data frames. |
| * Disabled: The port is administratively disabled and does not participate |
| in the STP process. Data frame forwarding is also disabled. |
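
The states listed above can be summarised in code. The numeric values below are the ``BR_STATE_*`` constants from ``include/uapi/linux/if_bridge.h``; the ``PortState`` enum and the two helper functions are hypothetical names used only for illustration.

```python
from enum import IntEnum


class PortState(IntEnum):
    """STP port states, numbered like the kernel's BR_STATE_* constants."""
    DISABLED = 0
    LISTENING = 1
    LEARNING = 2
    FORWARDING = 3
    BLOCKING = 4


def learns_addresses(state):
    """MAC learning happens in the Learning and Forwarding states."""
    return state in (PortState.LEARNING, PortState.FORWARDING)


def forwards_data(state):
    """Data frames are forwarded only in the Forwarding state."""
    return state == PortState.FORWARDING


forwarding_states = [s.name for s in PortState if forwards_data(s)]
learning_states = [s.name for s in PortState if learns_addresses(s)]
```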
| |
| Root Bridge and Convergence |
| --------------------------- |
| |
| In the context of networking and Ethernet bridging in Linux, the root bridge |
| is a designated switch in a bridged network that serves as a reference point |
| for the spanning tree algorithm to create a loop-free topology. |
| |
| Here is how STP works and how the root bridge is chosen: |
| 1. Bridge Priority: Each bridge running a spanning tree protocol has a |
| configurable Bridge Priority value. The lower the value, the higher the |
| priority. By default, the Bridge Priority is set to a standard value |
| (e.g., 32768). |
| 2. Bridge ID: The Bridge ID is composed of two components: Bridge Priority |
| and the MAC address of the bridge. It uniquely identifies each bridge |
| in the network. The Bridge ID is used to compare the priorities of |
| different bridges. |
| 3. Bridge Election: When the network starts, all bridges initially assume |
| that they are the root bridge. They start advertising Bridge Protocol |
| Data Units (BPDU) to their neighbors, containing their Bridge ID and |
| other information. |
| 4. BPDU Comparison: Bridges exchange BPDUs to determine the root bridge. |
| Each bridge examines the received BPDUs, including the Bridge Priority |
| and Bridge ID, to determine whether it should keep claiming to be the root. |
| The bridge with the lowest Bridge ID will become the root bridge. |
| 5. Root Bridge Announcement: Once the root bridge is determined, it sends |
| BPDUs with information about the root bridge to all other bridges in the |
| network. This information is used by other bridges to calculate the |
| shortest path to the root bridge and, in doing so, create a loop-free |
| topology. |
| 6. Forwarding Ports: After the root bridge is selected and the spanning tree |
| topology is established, each bridge determines which of its ports should |
| be in the forwarding state (used for data traffic) and which should be in |
| the blocking state (used to prevent loops). The root bridge's ports are |
| all in the forwarding state, while other bridges have some ports in the |
| blocking state to avoid loops. |
| 7. Root Ports: After the root bridge is selected and the spanning tree |
| topology is established, each non-root bridge processes incoming |
| BPDUs and determines which of its ports provides the shortest path to the |
| root bridge based on the information in the received BPDUs. This port is |
| designated as the root port and is placed in the Forwarding state, allowing |
| it to actively forward network traffic. |
| 8. Designated ports: A designated port is the port through which the non-root |
| bridge will forward traffic towards the designated segment. Designated ports |
| are placed in the Forwarding state. All other ports on the non-root |
| bridge that are not designated for specific segments are placed in the |
| Blocking state to prevent network loops. |
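
The election in steps 1 to 4 reduces to finding the lowest Bridge ID. The following is a minimal sketch, assuming the Bridge ID is compared as the pair (priority, MAC address). The ``bridge_id`` and ``elect_root`` helpers, the bridge names and the example addresses are all hypothetical.

```python
def bridge_id(priority, mac):
    """Return a comparable Bridge ID: priority first, then the MAC bytes."""
    return (priority, bytes.fromhex(mac.replace(":", "")))


def elect_root(bridges):
    """Pick the name of the bridge with the numerically lowest Bridge ID."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


bridges = {
    "br-a": (32768, "00:11:22:33:44:55"),
    "br-b": (32768, "00:11:22:33:44:01"),
    "br-c": (4096, "00:11:22:33:44:ff"),
}
# The lowest priority value wins, whatever the MAC address is.
root = elect_root(bridges)
# With equal priorities, the lowest MAC address breaks the tie.
defaults_only = {k: v for k, v in bridges.items() if v[0] == 32768}
tie_break_root = elect_root(defaults_only)
```

This also shows why lowering the priority value of one bridge is the usual way to pin the root: with default priorities everywhere, the outcome depends only on MAC addresses.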
| |
| STP ensures network convergence by calculating the shortest path and disabling |
| redundant links. When network topology changes occur (e.g., a link failure), |
| STP recalculates the network topology to restore connectivity while avoiding loops. |
| |
| Proper configuration of STP parameters, such as the bridge priority, can |
| influence network performance, path selection and which bridge becomes the |
| Root Bridge. |
| |
| User space STP helper |
| --------------------- |
| |
| The user space STP helper *bridge-stp* is a program that controls whether |
| user mode spanning tree is used. The kernel calls |
| ``/sbin/bridge-stp <bridge> <start|stop>`` when STP is enabled or disabled on |
| a bridge (via ``brctl stp <bridge> <on|off>`` or |
| ``ip link set <bridge> type bridge stp_state <0|1>``). The kernel enables |
| user_stp mode if that command returns 0, or enables kernel_stp mode if that |
| command returns any other value. |
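
The exit status contract can be modelled as follows. This is only a sketch of the rule stated above, not an existing helper: the ``USER_STP_BRIDGES`` set, ``main`` and ``kernel_stp_mode`` are hypothetical names, and a real helper would start or stop an STP daemon where the comment indicates.

```python
# Hypothetical set of bridges whose STP is handled by a user space daemon.
USER_STP_BRIDGES = {"br0"}


def main(argv):
    """Model of a bridge-stp helper; returns the process exit status."""
    if len(argv) != 3 or argv[2] not in ("start", "stop"):
        return 2  # usage error; any non-zero status means kernel STP
    bridge = argv[1]
    if bridge not in USER_STP_BRIDGES:
        return 1  # decline: the kernel falls back to its own STP
    # A real helper would start or stop its STP daemon for this bridge here.
    return 0  # accept: the kernel enables user_stp mode


def kernel_stp_mode(exit_status):
    """Mode chosen by the kernel after calling the helper with 'start'."""
    return "user_stp" if exit_status == 0 else "kernel_stp"


mode_br0 = kernel_stp_mode(main(["bridge-stp", "br0", "start"]))
mode_br1 = kernel_stp_mode(main(["bridge-stp", "br1", "start"]))
```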
| |
| VLAN |
| ==== |
| |
| A LAN (Local Area Network) is a network that covers a small geographic area, |
| typically within a single building or a campus. LANs are used to connect |
| computers, servers, printers, and other networked devices within a localized |
| area. LANs can be wired (using Ethernet cables) or wireless (using Wi-Fi). |
| |
| A VLAN (Virtual Local Area Network) is a logical segmentation of a physical |
| network into multiple isolated broadcast domains. VLANs are used to divide |
| a single physical LAN into multiple virtual LANs, allowing different groups of |
| devices to communicate as if they were on separate physical networks. |
| |
| Typically there are two VLAN implementations, IEEE 802.1Q and IEEE 802.1ad |
| (also known as QinQ). IEEE 802.1Q is a standard for VLAN tagging in Ethernet |
| networks. It allows network administrators to create logical VLANs on a |
| physical network and tag Ethernet frames with VLAN information, which is |
| called *VLAN-tagged frames*. IEEE 802.1ad, commonly known as QinQ or Double |
| VLAN, is an extension of the IEEE 802.1Q standard. QinQ allows for the |
| stacking of multiple VLAN tags within a single Ethernet frame. The Linux |
| bridge supports both the IEEE 802.1Q and `802.1ad |
| <https://lore.kernel.org/netdev/1402401565-15423-1-git-send-email-makita.toshiaki@lab.ntt.co.jp/>`_ |
| protocols for VLAN tagging. |
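
To make the tagging concrete, the sketch below builds the 4-byte tag that is inserted after the source MAC address: a 16-bit TPID followed by a 16-bit TCI holding the priority (PCP), the drop eligible indicator (DEI) and the 12-bit VLAN ID. The TPID values 0x8100 (802.1Q) and 0x88A8 (802.1ad) are the standard ones; the function name ``vlan_tag`` is hypothetical.

```python
import struct

TPID_8021Q = 0x8100   # customer tag (C-TAG)
TPID_8021AD = 0x88A8  # service tag (S-TAG) used as the outer tag in QinQ


def vlan_tag(vid, pcp=0, dei=0, tpid=TPID_8021Q):
    """Return the 4 tag bytes in network byte order."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP must be 0-7 and DEI must be 0 or 1")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", tpid, tci)


single = vlan_tag(100)
# QinQ: an outer 802.1ad tag followed by an inner 802.1Q tag.
qinq = vlan_tag(200, tpid=TPID_8021AD) + vlan_tag(100)
```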
| |
| `VLAN filtering <https://lore.kernel.org/netdev/1360792820-14116-1-git-send-email-vyasevic@redhat.com/>`_ |
| on a bridge is disabled by default. Once VLAN filtering is enabled, the bridge |
| forwards frames to the appropriate destinations based on their |
| destination MAC address and VLAN tag (both must match). |
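
The "both must match" rule can be pictured as a forwarding database keyed by VLAN ID and MAC address together. The following is a minimal sketch under that assumption; the ``VlanAwareFdb`` class is hypothetical and ignores per-port VLAN membership, flooding and ageing.

```python
class VlanAwareFdb:
    """Toy forwarding database keyed by (VLAN ID, MAC address)."""

    def __init__(self):
        self.entries = {}  # (vid, mac) -> port

    def learn(self, vid, mac, port):
        self.entries[(vid, mac)] = port

    def lookup(self, vid, mac):
        """Return the egress port, or None if VLAN and MAC do not both match."""
        return self.entries.get((vid, mac))


fdb = VlanAwareFdb()
fdb.learn(10, "aa:aa:aa:aa:aa:aa", "swp1")
hit = fdb.lookup(10, "aa:aa:aa:aa:aa:aa")
# Same MAC address, different VLAN: no match.
miss = fdb.lookup(20, "aa:aa:aa:aa:aa:aa")
```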
| |
| Multicast |
| ========= |
| |
| The Linux bridge driver has multicast support allowing it to process Internet |
| Group Management Protocol (IGMP) or Multicast Listener Discovery (MLD) |
| messages, and to efficiently forward multicast data packets. The bridge |
| driver supports IGMPv2/IGMPv3 and MLDv1/MLDv2. |
| |
| Multicast snooping |
| ------------------ |
| |
| Multicast snooping is a networking technology that allows network switches |
| to intelligently manage multicast traffic within a local area network (LAN). |
| |
| The switch maintains a multicast group table, which records the association |
| between multicast group addresses and the ports where hosts have joined these |
| groups. The group table is dynamically updated based on the IGMP/MLD messages |
| received. With the multicast group information gathered through snooping, the |
| switch optimizes the forwarding of multicast traffic. Instead of blindly |
| broadcasting the multicast traffic to all ports, it sends the multicast |
| traffic based on the destination MAC address only to ports which have |
| subscribed to the respective destination multicast group. |
| |
| Linux bridge devices have multicast snooping enabled by default when they are |
| created. The bridge maintains a multicast forwarding database (MDB) which keeps |
| track of port and group relationships. |
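
The group table described in this section can be sketched as a mapping from group address to the set of ports with listeners. The ``SnoopingBridge`` class below is hypothetical and deliberately simplified: it does not model queriers, router ports, timers, or the handling of traffic for groups nobody has joined.

```python
from collections import defaultdict


class SnoopingBridge:
    """Toy multicast database: group address -> ports with listeners."""

    def __init__(self):
        self.mdb = defaultdict(set)

    def report(self, port, group):
        """A membership report (join) was snooped on this port."""
        self.mdb[group].add(port)

    def leave(self, port, group):
        """A leave was snooped on this port; drop empty groups."""
        self.mdb[group].discard(port)
        if not self.mdb[group]:
            del self.mdb[group]

    def egress_ports(self, in_port, group):
        """Ports that receive traffic for the group, excluding the ingress port."""
        return sorted(self.mdb.get(group, set()) - {in_port})


br = SnoopingBridge()
br.report("swp2", "239.1.1.1")
br.report("swp3", "239.1.1.1")
before_leave = br.egress_ports("swp1", "239.1.1.1")
br.leave("swp3", "239.1.1.1")
after_leave = br.egress_ports("swp1", "239.1.1.1")
```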
| |
| IGMPv3/MLDv2 EHT support |
| ------------------------ |
| |
| The Linux bridge supports IGMPv3/MLDv2 EHT (Explicit Host Tracking), which |
| was added by `474ddb37fa3a ("net: bridge: multicast: add EHT allow/block handling") |
| <https://lore.kernel.org/netdev/20210120145203.1109140-1-razor@blackwall.org/>`_. |
| |
| The explicit host tracking enables the device to keep track of each |
| individual host that is joined to a particular group or channel. The main |
| benefit of the explicit host tracking in IGMP is to allow minimal leave |
| latencies when a host leaves a multicast group or channel. |
| |
| The length of time between a host wanting to leave and a device stopping |
| traffic forwarding is called the IGMP leave latency. A device configured |
| with IGMPv3 or MLDv2 and explicit tracking can immediately stop forwarding |
| traffic if the last host to request to receive traffic from the device |
| indicates that it no longer wants to receive traffic. The leave latency |
| is thus bound only by the packet transmission latencies in the multiaccess |
| network and the processing time in the device. |
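
The idea behind explicit host tracking can be sketched as keeping a set of hosts per port and group, so that the departure of the last host is known immediately. The ``HostTracker`` class is a hypothetical illustration and does not reflect the data structures used by the bridge driver.

```python
from collections import defaultdict


class HostTracker:
    """Toy explicit host tracking: (port, group) -> hosts that joined."""

    def __init__(self):
        self.hosts = defaultdict(set)

    def join(self, port, group, host):
        self.hosts[(port, group)].add(host)

    def leave(self, port, group, host):
        """Return True if forwarding to the port can stop immediately."""
        key = (port, group)
        self.hosts[key].discard(host)
        if self.hosts[key]:
            return False  # other hosts on this port still want the group
        del self.hosts[key]
        return True  # that was the last host: no need to wait for a query


tracker = HostTracker()
tracker.join("swp1", "239.1.1.1", "host-a")
tracker.join("swp1", "239.1.1.1", "host-b")
stop_after_first = tracker.leave("swp1", "239.1.1.1", "host-a")
stop_after_last = tracker.leave("swp1", "239.1.1.1", "host-b")
```

Without per-host state, the device would have to send a query and wait for responses before it could stop forwarding, which is where the extra leave latency comes from.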
| |
| Other multicast features |
| ------------------------ |
| |
| The Linux bridge also supports `per-VLAN multicast snooping |
| <https://lore.kernel.org/netdev/20210719170637.435541-1-razor@blackwall.org/>`_, |
| which is disabled by default but can be enabled, and `Multicast Router Discovery |
| <https://lore.kernel.org/netdev/20190121062628.2710-1-linus.luessing@c0d3.blue/>`_, |
| which helps identify the location of multicast routers. |
| |
| Switchdev |
| ========= |
| |
| Linux Bridge Switchdev is a feature in the Linux kernel that extends the |
| capabilities of the traditional Linux bridge to work more efficiently with |
| hardware switches that support switchdev. With Linux Bridge Switchdev, certain |
| networking functions like forwarding, filtering, and learning of Ethernet |
| frames can be offloaded to a hardware switch. This offloading reduces the |
| burden on the Linux kernel and CPU, leading to improved network performance |
| and lower latency. |
| |
| To use Linux Bridge Switchdev, you need hardware switches that support the |
| switchdev interface. This means that the switch hardware needs to have the |
| necessary drivers and functionality to work in conjunction with the Linux |
| kernel. |
| |
| Please see the :ref:`switchdev` document for more details. |
| |
| Netfilter |
| ========= |
| |
| The bridge netfilter module is a legacy feature that allows filtering bridged |
| packets with iptables and ip6tables. Its use is discouraged. Users should |
| consider using nftables for packet filtering. |
| |
| The older ebtables tool is more feature-limited than nftables, but |
| just like nftables it does not need this module to function. |
| |
| The br_netfilter module intercepts packets entering the bridge, performs |
| minimal sanity tests on IPv4 and IPv6 packets and then pretends that |
| these packets are being routed, not bridged. br_netfilter then calls |
| the IPv4 and IPv6 netfilter hooks from the bridge layer, i.e. ip(6)tables |
| rulesets will also see these packets. |
| |
| br_netfilter is also the reason for the iptables *physdev* match: |
| This match is the only way to reliably tell routed and bridged packets |
| apart in an iptables ruleset. |
| |
| Note that ebtables and nftables will work fine without the br_netfilter module. |
| iptables/ip6tables/arptables do not work for bridged traffic because they |
| plug into the routing stack. nftables rules in ip/ip6/inet/arp families won't |
| see traffic that is forwarded by a bridge either, but that's very much how it |
| should be. |
| |
| Historically the feature set of ebtables was very limited (it still is), so |
| this module was added to pretend packets are routed and invoke the IPv4/IPv6 |
| netfilter hooks from the bridge, giving users access to the more feature-rich |
| iptables matching capabilities (including conntrack). nftables doesn't have |
| this limitation; pretty much all features work regardless of the protocol family. |
| |
| So, br_netfilter is only needed if users, for some reason, need to use |
| ip(6)tables to filter packets forwarded by the bridge, or NAT bridged |
| traffic. For pure link layer filtering, this module isn't needed. |
| |
| Other Features |
| ============== |
| |
| The Linux bridge also supports `IEEE 802.11 Proxy ARP |
| <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=958501163ddd6ea22a98f94fa0e7ce6d4734e5c4>`_, |
| `Media Redundancy Protocol (MRP) |
| <https://lore.kernel.org/netdev/20200426132208.3232-1-horatiu.vultur@microchip.com/>`_, |
| `Media Redundancy Protocol (MRP) LC mode |
| <https://lore.kernel.org/r/20201124082525.273820-1-horatiu.vultur@microchip.com>`_, |
| `IEEE 802.1X port authentication |
| <https://lore.kernel.org/netdev/20220218155148.2329797-1-schultz.hans+netdev@gmail.com/>`_, |
| and `MAC Authentication Bypass (MAB) |
| <https://lore.kernel.org/netdev/20221101193922.2125323-2-idosch@nvidia.com/>`_. |
| |
| FAQ |
| === |
| |
| What does a bridge do? |
| ---------------------- |
| |
| A bridge transparently forwards traffic between multiple network interfaces. |
| In plain English this means that a bridge connects two or more physical |
| Ethernet networks, to form one larger (logical) Ethernet network. |
| |
| Is it L3 protocol independent? |
| ------------------------------ |
| |
| Yes. The bridge sees all frames, but it *uses* only L2 headers/information. |
| As such, the bridging functionality is protocol independent, and there should |
| be no trouble forwarding IPX, NetBEUI, IP, IPv6, etc. |
| |
| Contact Info |
| ============ |
| |
| The code is currently maintained by Roopa Prabhu <roopa@nvidia.com> and |
| Nikolay Aleksandrov <razor@blackwall.org>. Bridge bugs and enhancements |
| are discussed on the linux-netdev mailing list netdev@vger.kernel.org and |
| bridge@lists.linux-foundation.org. |
| |
| The list is open to anyone interested: http://vger.kernel.org/vger-lists.html#netdev |
| |
| External Links |
| ============== |
| |
| The old documentation for Linux bridging is at: |
| https://wiki.linuxfoundation.org/networking/bridge |