| What: /sys/bus/platform/devices/smpro-errmon.*/error_[core|mem|pcie|other]_[ce|ue] |
| KernelVersion: 6.1 |
| Contact: Quan Nguyen <quan@os.amperecomputing.com> |
| Description: |
| (RO) Contains the 48-byte Ampere (Vendor-Specific) Error Record printed |
| in hex format according to the table below: |
| |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | Offset | Field | Size (byte) | Description | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | 00 | Error Type | 1 | See :ref:`the table below <smpro-error-types>` for details | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | 01 | Subtype | 1 | See :ref:`the table below <smpro-error-types>` for details | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | 02 | Instance | 2 | See :ref:`the table below <smpro-error-types>` for details | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | 04 | Error status | 4 | See ARM RAS specification for details | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | 08 | Error Address | 8 | See ARM RAS specification for details | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | 16 | Error Misc 0 | 8 | See ARM RAS specification for details | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | 24 | Error Misc 1 | 8 | See ARM RAS specification for details | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | 32 | Error Misc 2 | 8 | See ARM RAS specification for details | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| | 40 | Error Misc 3 | 8 | See ARM RAS specification for details | |
| +--------+---------------+-------------+------------------------------------------------------------+ |
| |
| The table below defines the error type values and, for each, the subtype, subcomponent and instance encoding: |
| |
| .. _smpro-error-types: |
| |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | Error Group | Error Type | Sub type | Sub component | Instance | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | CPM (core) | 0 | 0 | Snoop-Logic | CPM # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | CPM (core) | 0 | 2 | Armv8 Core 1 | CPM # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | MCU (mem) | 1 | 1 | ERR1 | MCU # \| (SLOT << 11) | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | MCU (mem) | 1 | 2 | ERR2 | MCU # \| (SLOT << 11) | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | MCU (mem) | 1 | 3 | ERR3 | MCU # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | MCU (mem) | 1 | 4 | ERR4 | MCU # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | MCU (mem) | 1 | 5 | ERR5 | MCU # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | MCU (mem) | 1 | 6 | ERR6 | MCU # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | MCU (mem) | 1 | 7 | Link Error | MCU # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | Mesh (other) | 2 | 0 | Cross Point | X \| (Y << 5) \| (NS << 11) | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | Mesh (other) | 2 | 1 | Home Node(IO) | X \| (Y << 5) \| (NS << 11) | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | Mesh (other) | 2 | 2 | Home Node(Mem) | X \| (Y << 5) \| (NS << 11) \| (device << 12) | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | Mesh (other) | 2 | 4 | CCIX Node | X \| (Y << 5) \| (NS << 11) | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | 2P Link (other) | 3 | 0 | N/A | Altra 2P Link # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 0 | ERR0 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 1 | ERR1 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 2 | ERR2 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 3 | ERR3 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 4 | ERR4 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 5 | ERR5 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 6 | ERR6 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 7 | ERR7 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 8 | ERR8 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 9 | ERR9 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 10 | ERR10 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 11 | ERR11 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 12 | ERR12 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | GIC (other) | 5 | 13-21 | ERR13 | RC # + 1 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 100 | TCU | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 0 | TBU0 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 1 | TBU1 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 2 | TBU2 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 3 | TBU3 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 4 | TBU4 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 5 | TBU5 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 6 | TBU6 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 7 | TBU7 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 8 | TBU8 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMMU (other) | 6 | 9 | TBU9 | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | PCIe AER (pcie) | 7 | 0 | Root | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | PCIe AER (pcie) | 7 | 1 | Device | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | PCIe RC (pcie) | 8 | 0 | RCA HB | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | PCIe RC (pcie) | 8 | 1 | RCB HB | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | PCIe RC (pcie) | 8 | 8 | RASDP | RC # | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | OCM (other) | 9 | 0 | ERR0 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | OCM (other) | 9 | 1 | ERR1 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | OCM (other) | 9 | 2 | ERR2 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMpro (other) | 10 | 0 | ERR0 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMpro (other) | 10 | 1 | ERR1 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | SMpro (other) | 10 | 2 | MPA_ERR | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | PMpro (other) | 11 | 0 | ERR0 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | PMpro (other) | 11 | 1 | ERR1 | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
| | PMpro (other) | 11 | 2 | MPA_ERR | 0 | |
| +-----------------+------------+----------+----------------+----------------------------------------+ |
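The packed Instance encodings in the table above can be unpacked with shifts and masks. The following Python sketch is illustrative only: the function names are hypothetical, and the field widths are an assumption inferred from the shift amounts in the table (for example, X is taken to occupy bits 0-4 because Y starts at bit 5). The source does not state the widths explicitly.

```python
def decode_mesh_instance(instance: int) -> dict:
    """Split a Mesh (error type 2) Instance value into its parts.

    Assumed layout, inferred from the shifts in the table:
    X = bits 0-4, Y = bits 5-10, NS = bit 11, device = bits 12-15.
    The device field is only listed for Home Node(Mem) (subtype 2).
    """
    return {
        "x": instance & 0x1F,
        "y": (instance >> 5) & 0x3F,
        "ns": (instance >> 11) & 0x1,
        "device": (instance >> 12) & 0xF,
    }


def decode_mcu_instance(instance: int) -> dict:
    """Split an MCU ERR1/ERR2 Instance value: MCU # | (SLOT << 11).

    Assumed layout: MCU # = bits 0-10, SLOT = bits 11-15.
    """
    return {
        "mcu": instance & 0x7FF,
        "slot": (instance >> 11) & 0x1F,
    }
```

For the other MCU subtypes and for CPM, the table lists the Instance as the plain unit number, so no unpacking is needed.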
| |
| Example:: |
| |
| # cat error_other_ue |
| 880807001e004010401040101500000001004010401040100c0000000000000000000000000000000000000000000000 |
| |
| The details of each sysfs entry are as follows: |
| |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| | Error | Sysfs entry | Description (when triggered) | |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| | Core's CE | /sys/bus/platform/devices/smpro-errmon.*/error_core_ce | Core has CE error | |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| | Core's UE | /sys/bus/platform/devices/smpro-errmon.*/error_core_ue | Core has UE error | |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| | Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/error_mem_ce | Memory has CE error | |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| | Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/error_mem_ue | Memory has UE error | |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| | PCIe's CE | /sys/bus/platform/devices/smpro-errmon.*/error_pcie_ce | any PCIe controller has CE error | |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| | PCIe's UE | /sys/bus/platform/devices/smpro-errmon.*/error_pcie_ue | any PCIe controller has UE error | |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| | Other's CE | /sys/bus/platform/devices/smpro-errmon.*/error_other_ce | any other CE error | |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| | Other's UE | /sys/bus/platform/devices/smpro-errmon.*/error_other_ue | any other UE error | |
| +-------------+---------------------------------------------------------+----------------------------------+ |
| |
| where: |
| |
| - UE: Uncorrectable Error |
| - CE: Correctable Error |
| |
| For details, see section `3.3 Ampere (Vendor-Specific) Error Record Formats, |
| Altra Family RAS Supplement`. |
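As a rough illustration of the 48-byte record layout described above, the Python sketch below splits the hex string read from one of the error_* entries into the fields at the documented offsets. The function name is hypothetical, and the byte order of the multi-byte fields is an assumption (little-endian by default) because this document does not state it; check it against the Altra Family RAS Supplement before relying on the decoded values.

```python
import struct

RECORD_SIZE = 48  # bytes, as stated in the description above


def parse_error_record(hex_text: str, byteorder: str = "<") -> dict:
    """Decode one Ampere vendor-specific error record.

    byteorder is a struct prefix: "<" (little-endian, assumed default)
    or ">" (big-endian). The source does not specify which applies.
    """
    if byteorder not in ("<", ">"):
        raise ValueError("byteorder must be '<' or '>'")
    raw = bytes.fromhex(hex_text.strip())
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    # Offsets: 0 type (1), 1 subtype (1), 2 instance (2), 4 status (4),
    # 8 address (8), 16/24/32/40 misc 0-3 (8 each).
    fields = struct.unpack(byteorder + "BBHIQ4Q", raw)
    error_type, subtype, instance, status, address, *misc = fields
    return {
        "error_type": error_type,
        "subtype": subtype,
        "instance": instance,
        "status": status,
        "address": address,
        "misc": misc,
    }
```

The error type, subtype and instance can then be looked up in the error type table, while the status, address and misc values are interpreted according to the ARM RAS specification.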
| |
| |
| What: /sys/bus/platform/devices/smpro-errmon.*/overflow_[core|mem|pcie|other]_[ce|ue] |
| KernelVersion: 6.1 |
| Contact: Quan Nguyen <quan@os.amperecomputing.com> |
| Description: |
| (RO) Returns the overflow status for each type of HW error reported: |
| |
| - 0 : No overflow |
| - 1 : An overflow occurred and the oldest HW errors were dropped |
| |
| The details of each sysfs entry are as follows: |
| |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| | Overflow | Sysfs entry | Description | |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| | Core's CE | /sys/bus/platform/devices/smpro-errmon.*/overflow_core_ce | Core CE error overflow | |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| | Core's UE | /sys/bus/platform/devices/smpro-errmon.*/overflow_core_ue | Core UE error overflow | |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| | Memory's CE | /sys/bus/platform/devices/smpro-errmon.*/overflow_mem_ce | Memory CE error overflow | |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| | Memory's UE | /sys/bus/platform/devices/smpro-errmon.*/overflow_mem_ue | Memory UE error overflow | |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| | PCIe's CE | /sys/bus/platform/devices/smpro-errmon.*/overflow_pcie_ce | any PCIe controller CE error overflow | |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| | PCIe's UE | /sys/bus/platform/devices/smpro-errmon.*/overflow_pcie_ue | any PCIe controller UE error overflow | |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| | Other's CE | /sys/bus/platform/devices/smpro-errmon.*/overflow_other_ce | any other CE error overflow | |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| | Other's UE | /sys/bus/platform/devices/smpro-errmon.*/overflow_other_ue | any other UE error overflow | |
| +-------------+-----------------------------------------------------------+---------------------------------------+ |
| |
| where: |
| |
| - UE: Uncorrectable Error |
| - CE: Correctable Error |
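A monitoring tool might poll all eight overflow entries at once. The Python sketch below is one way to do that, assuming each file contains the text 0 or 1 as listed above; the function name is hypothetical, and the caller is expected to pass the resolved smpro-errmon.* device directory.

```python
import os

GROUPS = ("core", "mem", "pcie", "other")
SEVERITIES = ("ce", "ue")


def read_overflow_flags(device_dir: str) -> dict:
    """Return {entry name: True if an overflow was reported}.

    device_dir is the resolved smpro-errmon.* directory; the value
    format ("0" or "1" as text) follows the description above.
    """
    flags = {}
    for group in GROUPS:
        for severity in SEVERITIES:
            name = f"overflow_{group}_{severity}"
            with open(os.path.join(device_dir, name)) as f:
                flags[name] = int(f.read().strip()) != 0
    return flags
```

A True value for an entry means that some of the oldest errors of that type were dropped, so the corresponding error_* entry may not reflect every event.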
| |
| What: /sys/bus/platform/devices/smpro-errmon.*/[error|warn]_[smpro|pmpro] |
| KernelVersion: 6.1 |
| Contact: Quan Nguyen <quan@os.amperecomputing.com> |
| Description: |
| (RO) Contains the internal firmware error/warning printed in hex format. |
| |
| The details of each sysfs entry are as follows: |
| |
| +---------------+------------------------------------------------------+--------------------------+ |
| | Error | Sysfs entry | Description | |
| +---------------+------------------------------------------------------+--------------------------+ |
| | SMpro error | /sys/bus/platform/devices/smpro-errmon.*/error_smpro | system has SMpro error | |
| +---------------+------------------------------------------------------+--------------------------+ |
| | SMpro warning | /sys/bus/platform/devices/smpro-errmon.*/warn_smpro | system has SMpro warning | |
| +---------------+------------------------------------------------------+--------------------------+ |
| | PMpro error | /sys/bus/platform/devices/smpro-errmon.*/error_pmpro | system has PMpro error | |
| +---------------+------------------------------------------------------+--------------------------+ |
| | PMpro warning | /sys/bus/platform/devices/smpro-errmon.*/warn_pmpro | system has PMpro warning | |
| +---------------+------------------------------------------------------+--------------------------+ |
| |
| For details, see section `5.10 RAS Internal Error Register Definitions, |
| Altra Family SoC BMC Interface Specification`. |
| |
| What: /sys/bus/platform/devices/smpro-errmon.*/event_[vrd_warn_fault|vrd_hot|dimm_hot|dimm_2x_refresh] |
| KernelVersion: 6.1 (event_[vrd_warn_fault|vrd_hot|dimm_hot]), 6.4 (event_dimm_2x_refresh) |
| Contact: Quan Nguyen <quan@os.amperecomputing.com> |
| Description: |
| (RO) Contains detailed information about VRD/DIMM warning/hot events |
| in hex format as below:: |
| |
| AAAA |
| |
| where: |
| |
| - ``AAAA``: The event detail data |
| |
| The details of each sysfs entry are as follows: |
| |
| +---------------+---------------------------------------------------------------+---------------------+ |
| | Event | Sysfs entry | Description | |
| +---------------+---------------------------------------------------------------+---------------------+ |
| | VRD HOT | /sys/bus/platform/devices/smpro-errmon.*/event_vrd_hot | VRD Hot | |
| +---------------+---------------------------------------------------------------+---------------------+ |
| | VR Warn/Fault | /sys/bus/platform/devices/smpro-errmon.*/event_vrd_warn_fault | VR Warning or Fault | |
| +---------------+---------------------------------------------------------------+---------------------+ |
| | DIMM HOT | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_hot | DIMM Hot | |
| +---------------+---------------------------------------------------------------+---------------------+ |
| | DIMM 2X | /sys/bus/platform/devices/smpro-errmon.*/event_dimm_2x_refresh | DIMM 2x refresh rate | |
| | REFRESH RATE | | event in high temp | |
| +---------------+---------------------------------------------------------------+---------------------+ |
| |
| For more details, see sections `5.7 GPI Status Registers` and `5.9 Memory Error Register Definitions, |
| Altra Family SoC BMC Interface Specification`. |
| |
| What: /sys/bus/platform/devices/smpro-errmon.*/event_dimm[0-15]_syndrome |
| KernelVersion: 6.4 |
| Contact: Quan Nguyen <quan@os.amperecomputing.com> |
| Description: |
| (RO) Returns the 2-byte DIMM failure syndrome data for the DIMM in |
| slot 0-15 if that DIMM failed to initialize. |
| |
| For more details, see section `5.11 Boot Stage Register Definitions, |
| Altra Family SoC BMC Interface Specification`. |
| |
| What: /sys/bus/platform/devices/smpro-misc.*/boot_progress |
| KernelVersion: 6.1 |
| Contact: Quan Nguyen <quan@os.amperecomputing.com> |
| Description: |
| (RO) Contains the boot stage information in hex, in the format below:: |
| |
| AABBCCCCCCCC |
| |
| where: |
| |
| - ``AA`` : The boot stage |
| |
| - 00: SMpro firmware booting |
| - 01: PMpro firmware booting |
| - 02: ATF BL1 firmware booting |
| - 03: DDR initialization |
| - 04: DDR training report status |
| - 05: ATF BL2 firmware booting |
| - 06: ATF BL31 firmware booting |
| - 07: ATF BL32 firmware booting |
| - 08: UEFI firmware booting |
| - 09: OS booting |
| |
| - ``BB`` : Boot status |
| |
| - 00: Not started |
| - 01: Started |
| - 02: Completed without error |
| - 03: Failed |
| |
| - ``CCCCCCCC``: Boot status information defined for each boot stage |
| |
| For details, see section `5.11 Boot Stage Register Definitions` |
| and section `6. Processor Boot Progress Codes, Altra Family SoC BMC |
| Interface Specification`. |
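The AABBCCCCCCCC format above can be split into its three fields as in the following Python sketch. The function name and the returned dictionary keys are hypothetical; the stage and status names are copied from the lists above, and values outside those lists are reported as "unknown" rather than guessed.

```python
BOOT_STAGES = {
    0x00: "SMpro firmware booting",
    0x01: "PMpro firmware booting",
    0x02: "ATF BL1 firmware booting",
    0x03: "DDR initialization",
    0x04: "DDR training report status",
    0x05: "ATF BL2 firmware booting",
    0x06: "ATF BL31 firmware booting",
    0x07: "ATF BL32 firmware booting",
    0x08: "UEFI firmware booting",
    0x09: "OS booting",
}

BOOT_STATUS = {
    0x00: "Not started",
    0x01: "Started",
    0x02: "Completed without error",
    0x03: "Failed",
}


def parse_boot_progress(text: str) -> dict:
    """Split the 12 hex digits AABBCCCCCCCC into stage, status and info."""
    text = text.strip()
    if len(text) != 12:
        raise ValueError(f"expected 12 hex digits, got {len(text)}")
    stage = int(text[0:2], 16)    # AA
    status = int(text[2:4], 16)   # BB
    info = int(text[4:12], 16)    # CCCCCCCC, meaning depends on the stage
    return {
        "stage": stage,
        "stage_name": BOOT_STAGES.get(stage, "unknown"),
        "status": status,
        "status_name": BOOT_STATUS.get(status, "unknown"),
        "info": info,
    }
```

The info field is returned as a raw integer because its meaning is defined per boot stage in the BMC interface specification.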
| |
| |
| What: /sys/bus/platform/devices/smpro-misc.*/soc_power_limit |
| KernelVersion: 6.1 |
| Contact: Quan Nguyen <quan@os.amperecomputing.com> |
| Description: |
| (RW) Contains the desired SoC power limit in watts. |
| Writes to this sysfs entry set the desired SoC power limit (W). |
| Reads from this sysfs entry return the current SoC power limit (W). |
| The valid range is: |
| |
| - Minimum: 120 W |
| - Maximum: Socket TDP power |
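Because the upper bound depends on the socket, a caller may want to validate a value before writing it. The Python sketch below is a minimal example under two assumptions: the value is exchanged as a decimal integer, and the socket TDP is supplied by the caller, since this interface does not expose it. The function names are hypothetical.

```python
MIN_SOC_POWER_LIMIT_W = 120  # minimum from the range listed above


def set_soc_power_limit(path: str, watts: int, socket_tdp_w: int) -> None:
    """Write a new SoC power limit after checking the documented range.

    path is the resolved soc_power_limit entry; socket_tdp_w must come
    from elsewhere (for example, the platform datasheet).
    """
    if not MIN_SOC_POWER_LIMIT_W <= watts <= socket_tdp_w:
        raise ValueError(
            f"{watts} W is outside {MIN_SOC_POWER_LIMIT_W}..{socket_tdp_w} W"
        )
    with open(path, "w") as f:
        f.write(str(watts))


def get_soc_power_limit(path: str) -> int:
    """Read the current SoC power limit in watts (assumed decimal text)."""
    with open(path) as f:
        return int(f.read().strip())
```

Checking the range in user space only gives an earlier, clearer error; the firmware remains the authority on which values it accepts.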