| Intel hybrid support |
| -------------------- |
| Support for Intel hybrid events within perf tools. |
| |
| For some Intel platforms, such as AlderLake, which is hybrid platform and |
| it consists of atom cpu and core cpu. Each cpu has dedicated event list. |
| Part of events are available on core cpu, part of events are available |
| on atom cpu and even part of events are available on both. |
| |
| Kernel exports two new cpu pmus via sysfs: |
| /sys/devices/cpu_core |
| /sys/devices/cpu_atom |
| |
| The 'cpus' files are created under the directories. For example, |
| |
| cat /sys/devices/cpu_core/cpus |
| 0-15 |
| |
| cat /sys/devices/cpu_atom/cpus |
| 16-23 |
| |
| It indicates cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus. |
| |
| Quickstart |
| |
| List hybrid event |
| ----------------- |
| |
| As before, use perf-list to list the symbolic event. |
| |
| perf list |
| |
| inst_retired.any |
| [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom] |
| inst_retired.any |
| [Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core] |
| |
| The 'Unit: xxx' is added to brief description to indicate which pmu |
| the event is belong to. Same event name but with different pmu can |
| be supported. |
| |
| Enable hybrid event with a specific pmu |
| --------------------------------------- |
| |
| To enable a core only event or atom only event, following syntax is supported: |
| |
| cpu_core/<event name>/ |
| or |
| cpu_atom/<event name>/ |
| |
| For example, count the 'cycles' event on core cpus. |
| |
| perf stat -e cpu_core/cycles/ |
| |
| Create two events for one hardware event automatically |
| ------------------------------------------------------ |
| |
| When creating one event and the event is available on both atom and core, |
| two events are created automatically. One is for atom, the other is for |
| core. Most of hardware events and cache events are available on both |
| cpu_core and cpu_atom. |
| |
| For hardware events, they have pre-defined configs (e.g. 0 for cycles). |
| But on hybrid platform, kernel needs to know where the event comes from |
| (from atom or from core). The original perf event type PERF_TYPE_HARDWARE |
| can't carry pmu information. So now this type is extended to be PMU aware |
| type. The PMU type ID is stored at attr.config[63:32]. |
| |
| PMU type ID is retrieved from sysfs. |
| /sys/devices/cpu_atom/type |
| /sys/devices/cpu_core/type |
| |
| The new attr.config layout for PERF_TYPE_HARDWARE: |
| |
| PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA |
| AA: hardware event ID |
| EEEEEEEE: PMU type ID |
| |
| Cache event is similar. The type PERF_TYPE_HW_CACHE is extended to be |
| PMU aware type. The PMU type ID is stored at attr.config[63:32]. |
| |
| The new attr.config layout for PERF_TYPE_HW_CACHE: |
| |
| PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB |
| BB: hardware cache ID |
| CC: hardware cache op ID |
| DD: hardware cache op result ID |
| EEEEEEEE: PMU type ID |
| |
| When enabling a hardware event without specified pmu, such as, |
| perf stat -e cycles -a (use system-wide in this example), two events |
| are created automatically. |
| |
| ------------------------------------------------------------ |
| perf_event_attr: |
| size 120 |
| config 0x400000000 |
| sample_type IDENTIFIER |
| read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING |
| disabled 1 |
| inherit 1 |
| exclude_guest 1 |
| ------------------------------------------------------------ |
| |
| and |
| |
| ------------------------------------------------------------ |
| perf_event_attr: |
| size 120 |
| config 0x800000000 |
| sample_type IDENTIFIER |
| read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING |
| disabled 1 |
| inherit 1 |
| exclude_guest 1 |
| ------------------------------------------------------------ |
| |
| type 0 is PERF_TYPE_HARDWARE. |
| 0x4 in 0x400000000 indicates it's cpu_core pmu. |
| 0x8 in 0x800000000 indicates it's cpu_atom pmu (atom pmu type id is random). |
| |
| The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus), |
| and create 'cycles' (0x800000000) on cpu16-cpu23 (atom cpus). |
| |
| For perf-stat result, it displays two events: |
| |
| Performance counter stats for 'system wide': |
| |
| 6,744,979 cpu_core/cycles/ |
| 1,965,552 cpu_atom/cycles/ |
| |
| The first 'cycles' is core event, the second 'cycles' is atom event. |
| |
| Thread mode example: |
| -------------------- |
| |
| perf-stat reports the scaled counts for hybrid event and with a percentage |
| displayed. The percentage is the event's running time/enabling time. |
| |
| One example, 'triad_loop' runs on cpu16 (atom core), while we can see the |
| scaled value for core cycles is 160,444,092 and the percentage is 0.47%. |
| |
| perf stat -e cycles -- taskset -c 16 ./triad_loop |
| |
| As previous, two events are created. |
| |
| ------------------------------------------------------------ |
| perf_event_attr: |
| size 120 |
| config 0x400000000 |
| sample_type IDENTIFIER |
| read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING |
| disabled 1 |
| inherit 1 |
| enable_on_exec 1 |
| exclude_guest 1 |
| ------------------------------------------------------------ |
| |
| and |
| |
| ------------------------------------------------------------ |
| perf_event_attr: |
| size 120 |
| config 0x800000000 |
| sample_type IDENTIFIER |
| read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING |
| disabled 1 |
| inherit 1 |
| enable_on_exec 1 |
| exclude_guest 1 |
| ------------------------------------------------------------ |
| |
| Performance counter stats for 'taskset -c 16 ./triad_loop': |
| |
| 233,066,666 cpu_core/cycles/ (0.43%) |
| 604,097,080 cpu_atom/cycles/ (99.57%) |
| |
| perf-record: |
| ------------ |
| |
| If there is no '-e' specified in perf record, on hybrid platform, |
| it creates two default 'cycles' and adds them to event list. One |
| is for core, the other is for atom. |
| |
| perf-stat: |
| ---------- |
| |
| If there is no '-e' specified in perf stat, on hybrid platform, |
| besides of software events, following events are created and |
| added to event list in order. |
| |
| cpu_core/cycles/, |
| cpu_atom/cycles/, |
| cpu_core/instructions/, |
| cpu_atom/instructions/, |
| cpu_core/branches/, |
| cpu_atom/branches/, |
| cpu_core/branch-misses/, |
| cpu_atom/branch-misses/ |
| |
| Of course, both perf-stat and perf-record support to enable |
| hybrid event with a specific pmu. |
| |
| e.g. |
| perf stat -e cpu_core/cycles/ |
| perf stat -e cpu_atom/cycles/ |
| perf stat -e cpu_core/r1a/ |
| perf stat -e cpu_atom/L1-icache-loads/ |
| perf stat -e cpu_core/cycles/,cpu_atom/instructions/ |
| perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' |
| |
| But '{cpu_core/cycles/,cpu_atom/instructions/}' will return |
| warning and disable grouping, because the pmus in group are |
| not matched (cpu_core vs. cpu_atom). |