| ========================= |
| NXP SJA1105 switch driver |
| ========================= |
| |
| Overview |
| ======== |
| |
| The NXP SJA1105 is a family of 6 devices: |
| |
| - SJA1105E: First generation, no TTEthernet |
| - SJA1105T: First generation, TTEthernet |
| - SJA1105P: Second generation, no TTEthernet, no SGMII |
| - SJA1105Q: Second generation, TTEthernet, no SGMII |
| - SJA1105R: Second generation, no TTEthernet, SGMII |
| - SJA1105S: Second generation, TTEthernet, SGMII |
| |
| These are SPI-managed automotive switches, with all ports being gigabit |
| capable, and supporting MII/RMII/RGMII and optionally SGMII on one port. |
| |
| Being automotive parts, their configuration interface is geared towards |
| set-and-forget use, with minimal dynamic interaction at runtime. They |
| require a static configuration to be composed by software and packed |
| with CRC and table headers, and sent over SPI. |
| |
| The static configuration is composed of several configuration tables. Each |
| table takes a number of entries. Some configuration tables can be (partially) |
| reconfigured at runtime, some not. Some tables are mandatory, some not: |
| |
| ============================= ================== ============================= |
| Table Mandatory Reconfigurable |
| ============================= ================== ============================= |
| Schedule no no |
| Schedule entry points if Scheduling no |
| VL Lookup no no |
| VL Policing if VL Lookup no |
| VL Forwarding if VL Lookup no |
| L2 Lookup no no |
| L2 Policing yes no |
| VLAN Lookup yes yes |
| L2 Forwarding yes partially (fully on P/Q/R/S) |
| MAC Config yes partially (fully on P/Q/R/S) |
| Schedule Params if Scheduling no |
| Schedule Entry Points Params if Scheduling no |
| VL Forwarding Params if VL Forwarding no |
| L2 Lookup Params no partially (fully on P/Q/R/S) |
| L2 Forwarding Params yes no |
| Clock Sync Params no no |
| AVB Params no no |
| General Params yes partially |
| Retagging no yes |
| xMII Params yes no |
| SGMII no yes |
| ============================= ================== ============================= |
| |
| |
| Also the configuration is write-only (software cannot read it back from the |
| switch except for very few exceptions). |
| |
| The driver creates a static configuration at probe time, and keeps it at |
| all times in memory, as a shadow for the hardware state. When required to |
| change a hardware setting, the static configuration is also updated. |
| If that changed setting can be transmitted to the switch through the dynamic |
| reconfiguration interface, it is; otherwise the switch is reset and |
| reprogrammed with the updated static configuration. |
| |
| Traffic support |
| =============== |
| |
| The switches do not have hardware support for DSA tags, except for "slow |
| protocols" for switch control as STP and PTP. For these, the switches have two |
| programmable filters for link-local destination MACs. |
| These are used to trap BPDUs and PTP traffic to the master netdevice, and are |
| further used to support STP and 1588 ordinary clock/boundary clock |
| functionality. For frames trapped to the CPU, source port and switch ID |
| information is encoded by the hardware into the frames. |
| |
| But by leveraging ``CONFIG_NET_DSA_TAG_8021Q`` (a software-defined DSA tagging |
| format based on VLANs), general-purpose traffic termination through the network |
| stack can be supported under certain circumstances. |
| |
| Depending on VLAN awareness state, the following operating modes are possible |
| with the switch: |
| |
| - Mode 1 (VLAN-unaware): a port is in this mode when it is used as a standalone |
| net device, or when it is enslaved to a bridge with ``vlan_filtering=0``. |
| - Mode 2 (fully VLAN-aware): a port is in this mode when it is enslaved to a |
| bridge with ``vlan_filtering=1``. Access to the entire VLAN range is given to |
| the user through ``bridge vlan`` commands, but general-purpose (anything |
| other than STP, PTP etc) traffic termination is not possible through the |
| switch net devices. The other packets can be still by user space processed |
| through the DSA master interface (similar to ``DSA_TAG_PROTO_NONE``). |
| - Mode 3 (best-effort VLAN-aware): a port is in this mode when enslaved to a |
| bridge with ``vlan_filtering=1``, and the devlink property of its parent |
| switch named ``best_effort_vlan_filtering`` is set to ``true``. When |
| configured like this, the range of usable VIDs is reduced (0 to 1023 and 3072 |
| to 4094), so is the number of usable VIDs (maximum of 7 non-pvid VLANs per |
| port*), and shared VLAN learning is performed (FDB lookup is done only by |
| DMAC, not also by VID). |
| |
| To summarize, in each mode, the following types of traffic are supported over |
| the switch net devices: |
| |
| +-------------+-----------+--------------+------------+ |
| | | Mode 1 | Mode 2 | Mode 3 | |
| +=============+===========+==============+============+ |
| | Regular | Yes | No | Yes | |
| | traffic | | (use master) | | |
| +-------------+-----------+--------------+------------+ |
| | Management | Yes | Yes | Yes | |
| | traffic | | | | |
| | (BPDU, PTP) | | | | |
| +-------------+-----------+--------------+------------+ |
| |
| To configure the switch to operate in Mode 3, the following steps can be |
| followed:: |
| |
| ip link add dev br0 type bridge |
| # swp2 operates in Mode 1 now |
| ip link set dev swp2 master br0 |
| # swp2 temporarily moves to Mode 2 |
| ip link set dev br0 type bridge vlan_filtering 1 |
| [ 61.204770] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering |
| [ 61.239944] sja1105 spi0.1: Disabled switch tagging |
| # swp3 now operates in Mode 3 |
| devlink dev param set spi/spi0.1 name best_effort_vlan_filtering value true cmode runtime |
| [ 64.682927] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering |
| [ 64.711925] sja1105 spi0.1: Enabled switch tagging |
| # Cannot use VLANs in range 1024-3071 while in Mode 3. |
| bridge vlan add dev swp2 vid 1025 untagged pvid |
| RTNETLINK answers: Operation not permitted |
| bridge vlan add dev swp2 vid 100 |
| bridge vlan add dev swp2 vid 101 untagged |
| bridge vlan |
| port vlan ids |
| swp5 1 PVID Egress Untagged |
| |
| swp2 1 PVID Egress Untagged |
| 100 |
| 101 Egress Untagged |
| |
| swp3 1 PVID Egress Untagged |
| |
| swp4 1 PVID Egress Untagged |
| |
| br0 1 PVID Egress Untagged |
| bridge vlan add dev swp2 vid 102 |
| bridge vlan add dev swp2 vid 103 |
| bridge vlan add dev swp2 vid 104 |
| bridge vlan add dev swp2 vid 105 |
| bridge vlan add dev swp2 vid 106 |
| bridge vlan add dev swp2 vid 107 |
| # Cannot use mode than 7 VLANs per port while in Mode 3. |
| [ 3885.216832] sja1105 spi0.1: No more free subvlans |
| |
| \* "maximum of 7 non-pvid VLANs per port": Decoding VLAN-tagged packets on the |
| CPU in mode 3 is possible through VLAN retagging of packets that go from the |
| switch to the CPU. In cross-chip topologies, the port that goes to the CPU |
| might also go to other switches. In that case, those other switches will see |
| only a retagged packet (which only has meaning for the CPU). So if they are |
| interested in this VLAN, they need to apply retagging in the reverse direction, |
| to recover the original value from it. This consumes extra hardware resources |
| for this switch. There is a maximum of 32 entries in the Retagging Table of |
| each switch device. |
| |
| As an example, consider this cross-chip topology:: |
| |
| +-------------------------------------------------+ |
| | Host SoC | |
| | +-------------------------+ | |
| | | DSA master for embedded | | |
| | | switch (non-sja1105) | | |
| | +--------+-------------------------+--------+ | |
| | | embedded L2 switch | | |
| | | | | |
| | | +--------------+ +--------------+ | | |
| | | |DSA master for| |DSA master for| | | |
| | | | SJA1105 1 | | SJA1105 2 | | | |
| +--+---+--------------+-----+--------------+---+--+ |
| |
| +-----------------------+ +-----------------------+ |
| | SJA1105 switch 1 | | SJA1105 switch 2 | |
| +-----+-----+-----+-----+ +-----+-----+-----+-----+ |
| |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |
| +-----+-----+-----+-----+ +-----+-----+-----+-----+ |
| |
| To reach the CPU, SJA1105 switch 1 (spi/spi2.1) uses the same port as is uses |
| to reach SJA1105 switch 2 (spi/spi2.2), which would be port 4 (not drawn). |
| Similarly for SJA1105 switch 2. |
| |
| Also consider the following commands, that add VLAN 100 to every sja1105 user |
| port:: |
| |
| devlink dev param set spi/spi2.1 name best_effort_vlan_filtering value true cmode runtime |
| devlink dev param set spi/spi2.2 name best_effort_vlan_filtering value true cmode runtime |
| ip link add dev br0 type bridge |
| for port in sw1p0 sw1p1 sw1p2 sw1p3 \ |
| sw2p0 sw2p1 sw2p2 sw2p3; do |
| ip link set dev $port master br0 |
| done |
| ip link set dev br0 type bridge vlan_filtering 1 |
| for port in sw1p0 sw1p1 sw1p2 sw1p3 \ |
| sw2p0 sw2p1 sw2p2; do |
| bridge vlan add dev $port vid 100 |
| done |
| ip link add link br0 name br0.100 type vlan id 100 && ip link set dev br0.100 up |
| ip addr add 192.168.100.3/24 dev br0.100 |
| bridge vlan add dev br0 vid 100 self |
| |
| bridge vlan |
| port vlan ids |
| sw1p0 1 PVID Egress Untagged |
| 100 |
| |
| sw1p1 1 PVID Egress Untagged |
| 100 |
| |
| sw1p2 1 PVID Egress Untagged |
| 100 |
| |
| sw1p3 1 PVID Egress Untagged |
| 100 |
| |
| sw2p0 1 PVID Egress Untagged |
| 100 |
| |
| sw2p1 1 PVID Egress Untagged |
| 100 |
| |
| sw2p2 1 PVID Egress Untagged |
| 100 |
| |
| sw2p3 1 PVID Egress Untagged |
| |
| br0 1 PVID Egress Untagged |
| 100 |
| |
| SJA1105 switch 1 consumes 1 retagging entry for each VLAN on each user port |
| towards the CPU. It also consumes 1 retagging entry for each non-pvid VLAN that |
| it is also interested in, which is configured on any port of any neighbor |
| switch. |
| |
| In this case, SJA1105 switch 1 consumes a total of 11 retagging entries, as |
| follows: |
| |
| - 8 retagging entries for VLANs 1 and 100 installed on its user ports |
| (``sw1p0`` - ``sw1p3``) |
| - 3 retagging entries for VLAN 100 installed on the user ports of SJA1105 |
| switch 2 (``sw2p0`` - ``sw2p2``), because it also has ports that are |
| interested in it. The VLAN 1 is a pvid on SJA1105 switch 2 and does not need |
| reverse retagging. |
| |
| SJA1105 switch 2 also consumes 11 retagging entries, but organized as follows: |
| |
| - 7 retagging entries for the bridge VLANs on its user ports (``sw2p0`` - |
| ``sw2p3``). |
| - 4 retagging entries for VLAN 100 installed on the user ports of SJA1105 |
| switch 1 (``sw1p0`` - ``sw1p3``). |
| |
| Switching features |
| ================== |
| |
| The driver supports the configuration of L2 forwarding rules in hardware for |
| port bridging. The forwarding, broadcast and flooding domain between ports can |
| be restricted through two methods: either at the L2 forwarding level (isolate |
| one bridge's ports from another's) or at the VLAN port membership level |
| (isolate ports within the same bridge). The final forwarding decision taken by |
| the hardware is a logical AND of these two sets of rules. |
| |
| The hardware tags all traffic internally with a port-based VLAN (pvid), or it |
| decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification |
| is not possible. Once attributed a VLAN tag, frames are checked against the |
| port's membership rules and dropped at ingress if they don't match any VLAN. |
| This behavior is available when switch ports are enslaved to a bridge with |
| ``vlan_filtering 1``. |
| |
| Normally the hardware is not configurable with respect to VLAN awareness, but |
| by changing what TPID the switch searches 802.1Q tags for, the semantics of a |
| bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or |
| untagged), and therefore this mode is also supported. |
| |
| Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but |
| all bridges should have the same level of VLAN awareness (either both have |
| ``vlan_filtering`` 0, or both 1). Also an inevitable limitation of the fact |
| that VLAN awareness is global at the switch level is that once a bridge with |
| ``vlan_filtering`` enslaves at least one switch port, the other un-bridged |
| ports are no longer available for standalone traffic termination. |
| |
| Topology and loop detection through STP is supported. |
| |
| L2 FDB manipulation (add/delete/dump) is currently possible for the first |
| generation devices. Aging time of FDB entries, as well as enabling fully static |
| management (no address learning and no flooding of unknown traffic) is not yet |
| configurable in the driver. |
| |
| A special comment about bridging with other netdevices (illustrated with an |
| example): |
| |
| A board has eth0, eth1, swp0@eth1, swp1@eth1, swp2@eth1, swp3@eth1. |
| The switch ports (swp0-3) are under br0. |
| It is desired that eth0 is turned into another switched port that communicates |
| with swp0-3. |
| |
| If br0 has vlan_filtering 0, then eth0 can simply be added to br0 with the |
| intended results. |
| If br0 has vlan_filtering 1, then a new br1 interface needs to be created that |
| enslaves eth0 and eth1 (the DSA master of the switch ports). This is because in |
| this mode, the switch ports beneath br0 are not capable of regular traffic, and |
| are only used as a conduit for switchdev operations. |
| |
| Offloads |
| ======== |
| |
| Time-aware scheduling |
| --------------------- |
| |
| The switch supports a variation of the enhancements for scheduled traffic |
| specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to |
| ensure deterministic latency for priority traffic that is sent in-band with its |
| gate-open event in the network schedule. |
| |
| This capability can be managed through the tc-taprio offload ('flags 2'). The |
| difference compared to the software implementation of taprio is that the latter |
| would only be able to shape traffic originated from the CPU, but not |
| autonomously forwarded flows. |
| |
| The device has 8 traffic classes, and maps incoming frames to one of them based |
| on the VLAN PCP bits (if no VLAN is present, the port-based default is used). |
| As described in the previous sections, depending on the value of |
| ``vlan_filtering``, the EtherType recognized by the switch as being VLAN can |
| either be the typical 0x8100 or a custom value used internally by the driver |
| for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone |
| or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100 |
| EtherType. In these modes, injecting into a particular TX queue can only be |
| done by the DSA net devices, which populate the PCP field of the tagging header |
| on egress. Using ``vlan_filtering=1``, the behavior is the other way around: |
| offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA |
| net devices are no longer able to do that. To inject frames into a hardware TX |
| queue with VLAN awareness active, it is necessary to create a VLAN |
| sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged |
| towards the switch, with the VLAN PCP bits set appropriately. |
| |
| Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the |
| notable exception: the switch always treats it with a fixed priority and |
| disregards any VLAN PCP bits even if present. The traffic class for management |
| traffic has a value of 7 (highest priority) at the moment, which is not |
| configurable in the driver. |
| |
| Below is an example of configuring a 500 us cyclic schedule on egress port |
| ``swp5``. The traffic class gate for management traffic (7) is open for 100 us, |
| and the gates for all other traffic classes are open for 400 us:: |
| |
| #!/bin/bash |
| |
| set -e -u -o pipefail |
| |
| NSEC_PER_SEC="1000000000" |
| |
| gatemask() { |
| local tc_list="$1" |
| local mask=0 |
| |
| for tc in ${tc_list}; do |
| mask=$((${mask} | (1 << ${tc}))) |
| done |
| |
| printf "%02x" ${mask} |
| } |
| |
| if ! systemctl is-active --quiet ptp4l; then |
| echo "Please start the ptp4l service" |
| exit |
| fi |
| |
| now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }') |
| # Phase-align the base time to the start of the next second. |
| sec=$(echo "${now}" | gawk -F. '{ print $1; }') |
| base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))" |
| |
| tc qdisc add dev swp5 parent root handle 100 taprio \ |
| num_tc 8 \ |
| map 0 1 2 3 5 6 7 \ |
| queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ |
| base-time ${base_time} \ |
| sched-entry S $(gatemask 7) 100000 \ |
| sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \ |
| flags 2 |
| |
| It is possible to apply the tc-taprio offload on multiple egress ports. There |
| are hardware restrictions related to the fact that no gate event may trigger |
| simultaneously on two ports. The driver checks the consistency of the schedules |
| against this restriction and errors out when appropriate. Schedule analysis is |
| needed to avoid this, which is outside the scope of the document. |
| |
| Routing actions (redirect, trap, drop) |
| -------------------------------------- |
| |
| The switch is able to offload flow-based redirection of packets to a set of |
| destination ports specified by the user. Internally, this is implemented by |
| making use of Virtual Links, a TTEthernet concept. |
| |
| The driver supports 2 types of keys for Virtual Links: |
| |
| - VLAN-aware virtual links: these match on destination MAC address, VLAN ID and |
| VLAN PCP. |
| - VLAN-unaware virtual links: these match on destination MAC address only. |
| |
| The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while |
| there are virtual link rules installed. |
| |
| Composing multiple actions inside the same rule is supported. When only routing |
| actions are requested, the driver creates a "non-critical" virtual link. When |
| the action list also contains tc-gate (more details below), the virtual link |
| becomes "time-critical" (draws frame buffers from a reserved memory partition, |
| etc). |
| |
| The 3 routing actions that are supported are "trap", "drop" and "redirect". |
| |
| Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the |
| CPU and to swp3. This type of key (DA only) when the port's VLAN awareness |
| state is off:: |
| |
| tc qdisc add dev swp2 clsact |
| tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \ |
| action mirred egress redirect dev swp3 \ |
| action trap |
| |
| Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID |
| of 100 and a PCP of 0:: |
| |
| tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \ |
| dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop |
| |
| Time-based ingress policing |
| --------------------------- |
| |
| The TTEthernet hardware abilities of the switch can be constrained to act |
| similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in |
| IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform |
| tight timing-based admission control for up to 1024 flows (identified by a |
| tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which |
| are received outside their expected reception window are dropped. |
| |
| This capability can be managed through the offload of the tc-gate action. As |
| routing actions are intrinsic to virtual links in TTEthernet (which performs |
| explicit routing of time-critical traffic and does not leave that in the hands |
| of the FDB, flooding etc), the tc-gate action may never appear alone when |
| asking sja1105 to offload it. One (or more) redirect or trap actions must also |
| follow along. |
| |
| Example: create a tc-taprio schedule that is phase-aligned with a tc-gate |
| schedule (the clocks must be synchronized by a 1588 application stack, which is |
| outside the scope of this document). No packet delivered by the sender will be |
| dropped. Note that the reception window is larger than the transmission window |
| (and much more so, in this example) to compensate for the packet propagation |
| delay of the link (which can be determined by the 1588 application stack). |
| |
| Receiver (sja1105):: |
| |
| tc qdisc add dev swp2 clsact |
| now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \ |
| sec=$(echo $now | awk -F. '{print $1}') && \ |
| base_time="$(((sec + 2) * 1000000000))" && \ |
| echo "base time ${base_time}" |
| tc filter add dev swp2 ingress flower skip_sw \ |
| dst_mac 42:be:24:9b:76:20 \ |
| action gate base-time ${base_time} \ |
| sched-entry OPEN 60000 -1 -1 \ |
| sched-entry CLOSE 40000 -1 -1 \ |
| action trap |
| |
| Sender:: |
| |
| now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \ |
| sec=$(echo $now | awk -F. '{print $1}') && \ |
| base_time="$(((sec + 2) * 1000000000))" && \ |
| echo "base time ${base_time}" |
| tc qdisc add dev eno0 parent root taprio \ |
| num_tc 8 \ |
| map 0 1 2 3 4 5 6 7 \ |
| queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \ |
| base-time ${base_time} \ |
| sched-entry S 01 50000 \ |
| sched-entry S 00 50000 \ |
| flags 2 |
| |
| The engine used to schedule the ingress gate operations is the same that the |
| one used for the tc-taprio offload. Therefore, the restrictions regarding the |
| fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at |
| the same time (during the same 200 ns slot) still apply. |
| |
| To come in handy, it is possible to share time-triggered virtual links across |
| more than 1 ingress port, via flow blocks. In this case, the restriction of |
| firing at the same time does not apply because there is a single schedule in |
| the system, that of the shared virtual link:: |
| |
| tc qdisc add dev swp2 ingress_block 1 clsact |
| tc qdisc add dev swp3 ingress_block 1 clsact |
| tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \ |
| action gate index 2 \ |
| base-time 0 \ |
| sched-entry OPEN 50000000 -1 -1 \ |
| sched-entry CLOSE 50000000 -1 -1 \ |
| action trap |
| |
| Hardware statistics for each flow are also available ("pkts" counts the number |
| of dropped frames, which is a sum of frames dropped due to timing violations, |
| lack of destination ports and MTU enforcement checks). Byte-level counters are |
| not available. |
| |
| Device Tree bindings and board design |
| ===================================== |
| |
| This section references ``Documentation/devicetree/bindings/net/dsa/sja1105.txt`` |
| and aims to showcase some potential switch caveats. |
| |
| RMII PHY role and out-of-band signaling |
| --------------------------------------- |
| |
| In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by |
| an external oscillator (but not by the PHY). |
| But the spec is rather loose and devices go outside it in several ways. |
| Some PHYs go against the spec and may provide an output pin where they source |
| the 50 MHz clock themselves, in an attempt to be helpful. |
| On the other hand, the SJA1105 is only binary configurable - when in the RMII |
| MAC role it will also attempt to drive the clock signal. To prevent this from |
| happening it must be put in RMII PHY role. |
| But doing so has some unintended consequences. |
| In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0]. |
| These are practically some extra code words (/J/ and /K/) sent prior to the |
| preamble of each frame. The MAC does not have this out-of-band signaling |
| mechanism defined by the RMII spec. |
| So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the |
| clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105 |
| emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to |
| frame preambles, which the real PHY is not expected to understand. So the PHY |
| simply encodes the extra symbols received from the SJA1105-as-PHY onto the |
| 100Base-Tx wire. |
| On the other side of the wire, some link partners might discard these extra |
| symbols, while others might choke on them and discard the entire Ethernet |
| frames that follow along. This looks like packet loss with some link partners |
| but not with others. |
| The take-away is that in RMII mode, the SJA1105 must be let to drive the |
| reference clock if connected to a PHY. |
| |
| RGMII fixed-link and internal delays |
| ------------------------------------ |
| |
| As mentioned in the bindings document, the second generation of devices has |
| tunable delay lines as part of the MAC, which can be used to establish the |
| correct RGMII timing budget. |
| When powered up, these can shift the Rx and Tx clocks with a phase difference |
| between 73.8 and 101.7 degrees. |
| The catch is that the delay lines need to lock onto a clock signal with a |
| stable frequency. This means that there must be at least 2 microseconds of |
| silence between the clock at the old vs at the new frequency. Otherwise the |
| lock is lost and the delay lines must be reset (powered down and back up). |
| In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25 |
| MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the |
| AN process. |
| In the situation where the switch port is connected through an RGMII fixed-link |
| to a link partner whose link state life cycle is outside the control of Linux |
| (such as a different SoC), then the delay lines would remain unlocked (and |
| inactive) until there is manual intervention (ifdown/ifup on the switch port). |
| The take-away is that in RGMII mode, the switch's internal delays are only |
| reliable if the link partner never changes link speeds, or if it does, it does |
| so in a way that is coordinated with the switch port (practically, both ends of |
| the fixed-link are under control of the same Linux system). |
| As to why would a fixed-link interface ever change link speeds: there are |
| Ethernet controllers out there which come out of reset in 100 Mbps mode, and |
| their driver inevitably needs to change the speed and clock frequency if it's |
| required to work at gigabit. |
| |
| MDIO bus and PHY management |
| --------------------------- |
| |
| The SJA1105 does not have an MDIO bus and does not perform in-band AN either. |
| Therefore there is no link state notification coming from the switch device. |
| A board would need to hook up the PHYs connected to the switch to any other |
| MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO |
| bus). Link state management then works by the driver manually keeping in sync |
| (over SPI commands) the MAC link speed with the settings negotiated by the PHY. |