| ====================================================== |
| HiSilicon SoC uncore Performance Monitoring Unit (PMU) |
| ====================================================== |
| |
| The HiSilicon SoC chip includes various independent system device PMUs |
| such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. Each of these PMUs |
| has its own hardware logic to gather statistics and performance |
| information. |
| |
| The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster |
| (CCL) is made up of 4 CPU cores sharing one L3 cache; each CPU die is |
| called a Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has |
| two HHAs (0 - 1) and four DDRCs (0 - 3). |
| |
| HiSilicon SoC uncore PMU driver |
| ------------------------------- |
| |
| Each device PMU has separate registers for event counting, control and |
| interrupt, and the driver registers each of them (L3C, HHA, DDRC, etc.) |
| as a perf PMU. The available events and configuration options are |
| described in sysfs, see: |
| |
| /sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>. |
| The "perf list" command lists the available events from sysfs. |
| |
| Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU |
| name will appear in event listing as hisi_sccl<sccl-id>_module<index-id>, |
| where "sccl-id" is the identifier of the SCCL and "index-id" is the index of |
| the module. |
| |
| e.g. hisi_sccl3_l3c0/rd_hit_cpipe is READ_HIT_CPIPE event of L3C index #0 in |
| SCCL ID #3. |
| |
| e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in |
| SCCL ID #1. |
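| |
| The naming scheme above can be taken apart mechanically. The following is a |
| minimal Python sketch, not part of the driver; the helper name |
| parse_pmu_name and the restriction to the l3c, hha and ddrc modules are |
| assumptions made for illustration. |
| |
```python
import re

# Assumed pattern, derived from the naming scheme described in the text:
# hisi_sccl<sccl-id>_<module><index-id>, with module limited to l3c/hha/ddrc.
PMU_NAME_RE = re.compile(r"^hisi_sccl(\d+)_(l3c|hha|ddrc)(\d+)$")


def parse_pmu_name(name):
    """Return (sccl_id, module, index), or None if the name does not match."""
    m = PMU_NAME_RE.match(name)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3))


if __name__ == "__main__":
    # L3C index 0 in SCCL 3, as in the example above.
    print(parse_pmu_name("hisi_sccl3_l3c0"))
```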
| |
| The driver also provides a "cpumask" sysfs attribute, which shows the CPU core |
| ID used to count the uncore PMU event. |
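| |
| As an illustration, the attribute can be read from user space. This Python |
| sketch assumes the file holds a CPU list (a single ID, or comma-separated |
| IDs and ranges); parse_cpulist and read_pmu_cpumask are hypothetical helper |
| names, not part of perf or the driver. |
| |
```python
def parse_cpulist(text):
    """Expand a CPU list such as "0-2,8" into [0, 1, 2, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_pmu_cpumask(pmu, root="/sys/bus/event_source/devices"):
    """Read and parse the cpumask attribute of one PMU (assumed list format)."""
    with open(f"{root}/{pmu}/cpumask") as f:
        return parse_cpulist(f.read())
```
| |
| For example, read_pmu_cpumask("hisi_sccl3_l3c0") would return the list of |
| CPU IDs on a system where that PMU exists. |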
| |
| Example usage of perf:: |
| |
| $# perf list |
| hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event] |
| ------------------------------------------ |
| hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event] |
| ------------------------------------------ |
| hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event] |
| ------------------------------------------ |
| hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event] |
| ------------------------------------------ |
| |
| $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5 |
| $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5 |
| |
| For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same |
| as PMU v1, but some new functions are added to the hardware. |
| |
| 1. L3C PMU supports filtering by core/thread within the cluster which can be |
| specified as a bitmap:: |
| |
| $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5 |
| |
| This will only count the operations from core/thread 0 and 1 in this cluster. |
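| |
| The tt_core bitmap can be derived from a list of core/thread indexes. A |
| minimal Python sketch follows; core_bitmap and l3c_event are hypothetical |
| helpers that only build the event string passed to "perf stat -e". |
| |
```python
def core_bitmap(cores):
    """Set bit N for every core/thread index N in the cluster."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return mask


def l3c_event(pmu, config, cores):
    """Build an event string with a tt_core filter for the given cores."""
    return f"{pmu}/config={config:#04x},tt_core={core_bitmap(cores):#x}/"


if __name__ == "__main__":
    # Cores 0 and 1 give bitmap 0x3, matching the command above.
    print(l3c_event("hisi_sccl3_l3c0", 0x02, [0, 1]))
```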
| |
| 2. Tracetag allows the user to choose to count only read, write or atomic |
| operations via the tt_req parameter in perf. The default value counts all |
| operations. tt_req is 3 bits: 3'b100 represents read operations, 3'b101 |
| represents write operations, 3'b110 represents atomic store operations and |
| 3'b111 represents atomic non-store operations; other values are reserved:: |
| |
| $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5 |
| |
| This will only count the read operations in this cluster. |
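| |
| The tt_req encodings listed above can be kept in a small lookup table. This |
| is a Python sketch; the key names and the tt_req_term helper are |
| illustrative assumptions, only the 3-bit values come from the text. |
| |
```python
# 3-bit tt_req encodings from the text; other values are reserved.
TT_REQ = {
    "read": 0b100,
    "write": 0b101,
    "atomic_store": 0b110,
    "atomic_non_store": 0b111,
}


def tt_req_term(kind):
    """Return the tt_req term for an event string, e.g. 'tt_req=0x4'."""
    if kind not in TT_REQ:
        raise ValueError(f"unknown or reserved request type: {kind}")
    return f"tt_req={TT_REQ[kind]:#x}"


if __name__ == "__main__":
    print(f"hisi_sccl3_l3c0/config=0x02,{tt_req_term('read')}/")
```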
| |
| 3. Datasrc allows the user to check where the data comes from. It is 5 bits. |
| Some important codes are as follows: |
| |
| - 5'b00001: comes from L3C in this die; |
| - 5'b01000: comes from L3C in the cross-die; |
| - 5'b01001: comes from L3C which is in another socket; |
| - 5'b01110: comes from the local DDR; |
| - 5'b01111: comes from the cross-die DDR; |
| - 5'b10000: comes from cross-socket DDR; |
| |
| and so on. This is mainly helpful for checking whether the data comes from |
| the source nearest to the CPU cores. If datasrc_cfg is used on a multi-chip |
| system, datasrc_skt shall also be configured in the perf command:: |
| |
| $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/, |
| hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5 |
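| |
| To compare several data sources in one run, the event list can be generated |
| from the datasrc codes. The following Python sketch assumes only the codes |
| listed above; DATASRC and datasrc_events are hypothetical names, and the |
| datasrc_skt parameter is left out because its values are not given here. |
| |
```python
# 5-bit datasrc codes from the list above (not a complete list).
DATASRC = {
    0b00001: "L3C in this die",
    0b01000: "L3C in the cross-die",
    0b01001: "L3C in another socket",
    0b01110: "local DDR",
    0b01111: "cross-die DDR",
    0b10000: "cross-socket DDR",
}


def datasrc_events(pmu, config, codes):
    """Build one comma-separated event list, one event per datasrc code."""
    for code in codes:
        if code not in DATASRC:
            raise ValueError(f"datasrc code {code:#07b} is not in the table")
    return ",".join(f"{pmu}/config={config:#x},datasrc_cfg={c:#x}/" for c in codes)


if __name__ == "__main__":
    # Local DDR (0xE) versus cross-die DDR (0xF), as in the command above.
    print(datasrc_events("hisi_sccl3_l3c0", 0xB9, [0xE, 0xF]))
```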
| |
| 4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die |
| contains several Compute Clusters (CCLs). The I/O dies are called Super I/O |
| clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the |
| SoC has a unique ID. Each ID is 11 bits, comprising a 6-bit SCCL-ID and a |
| 5-bit CCL/ICL-ID. For an I/O die, the ICL-ID is one of the following: |
| |
| - 5'b00000: I/O_MGMT_ICL; |
| - 5'b00001: Network_ICL; |
| - 5'b00011: HAC_ICL; |
| - 5'b10000: PCIe_ICL; |
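| |
| The 11-bit ID can be composed from its two fields. The Python sketch below |
| ASSUMES that the 6-bit SCCL-ID occupies the upper bits and the 5-bit |
| CCL/ICL-ID the lower bits; the text gives the field widths but not their |
| order, so check the hardware documentation. make_id is a hypothetical |
| helper. |
| |
```python
# ICL-ID values for I/O dies, from the list above.
ICL_IDS = {
    "I/O_MGMT_ICL": 0b00000,
    "Network_ICL": 0b00001,
    "HAC_ICL": 0b00011,
    "PCIe_ICL": 0b10000,
}


def make_id(sccl_id, cluster_id):
    """Pack a 6-bit SCCL-ID and a 5-bit CCL/ICL-ID into an 11-bit ID.

    ASSUMPTION: SCCL-ID is placed in bits [10:5], CCL/ICL-ID in bits [4:0].
    """
    if not 0 <= sccl_id < (1 << 6):
        raise ValueError("SCCL-ID must fit in 6 bits")
    if not 0 <= cluster_id < (1 << 5):
        raise ValueError("CCL/ICL-ID must fit in 5 bits")
    return (sccl_id << 5) | cluster_id


if __name__ == "__main__":
    print(hex(make_id(3, ICL_IDS["PCIe_ICL"])))
```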
| |
| 5. uring_channel: UC PMU events 0x47~0x59 support filtering by tx request |
| uring channel. It is 2 bits. Some important codes are as follows: |
| |
| - 2'b11: count the events which are sent to the uring_ext (MATA) channel; |
| - 2'b01: is the same as 2'b11; |
| - 2'b10: count the events which are sent to the uring (non-MATA) channel; |
| - 2'b00: default value, count the events which are sent to both the uring |
| and uring_ext channels; |
| |
| Users can configure IDs to count data coming from a specific CCL/ICL by |
| setting srcid_cmd & srcid_msk, and data destined for a specific CCL/ICL by |
| setting tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU |
| will not check that bit when matching against the srcid_cmd/tgtid_cmd. |
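| |
| The mask semantics can be modelled in a few lines. This Python sketch is a |
| software model of the rule stated above, not the hardware implementation; |
| id_matches is a hypothetical name and the 11-bit width is taken from the ID |
| description in item 4. |
| |
```python
ID_BITS = 11  # width of a CCL/ICL ID, per the description in item 4


def id_matches(observed, cmd, msk):
    """Return True if 'observed' matches 'cmd' on every bit not set in 'msk'.

    A set bit in msk means that bit is ignored during the comparison.
    """
    care = ~msk & ((1 << ID_BITS) - 1)
    return (observed & care) == (cmd & care)


if __name__ == "__main__":
    # Masking the low 5 bits ignores the CCL/ICL field of the example ID.
    print(id_matches(0x71, 0x70, 0x1F))
```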
| |
| If all of these options are disabled, the PMU works with the default values, |
| which do not filter by any condition or ID information, and returns the |
| total counter values in the PMU counters. |
| |
| The current driver does not support sampling, so "perf record" is unsupported. |
| Attaching to a task is also unsupported, as the events are all uncore. |
| |
| Note: Please contact the maintainer for a complete list of events supported for |
| the PMU devices in the SoC and related information, if needed. |