| .. SPDX-License-Identifier: GPL-2.0 |
| .. _imc: |
| |
| =================================== |
| IMC (In-Memory Collection Counters) |
| =================================== |
| |
| Anju T Sudhakar, 10 May 2019 |
| |
| .. contents:: |
| :depth: 3 |
| |
| |
| Basic overview |
| ============== |
| |
| IMC (In-Memory collection counters) is a hardware monitoring facility that |
| collects large numbers of hardware performance events at Nest level (these are |
| on-chip but off-core), Core level and Thread level. |
| |
| The Nest PMU counters are handled by a Nest IMC microcode which runs in the OCC |
| (On-Chip Controller) complex. The microcode collects the counter data and moves |
| the nest IMC counter data to memory. |
| |
| The Core and Thread IMC PMU counters are handled in the core. Core level PMU |
| counters give us the IMC counters' data per core and thread level PMU counters |
| give us the IMC counters' data per CPU thread. |
| |
| OPAL obtains the IMC PMU and supported events information from the IMC Catalog |
| and passes it on to the kernel via the device tree. The information for each |
| event contains: |
| |
| - Event name |
| - Event Offset |
| - Event description |
| |
| and possibly also: |
| |
| - Event scale |
| - Event unit |
| |
| Some PMUs may have common scale and unit values for all their supported |
| events. In those cases, the scale and unit properties of the events must be |
| inherited from the PMU. |
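| |
| The inheritance rule can be sketched as follows. This is only an illustration |
| in Python, not kernel code: the `resolve_event` helper, the dictionary layout |
| and the sample values are all hypothetical. |

```python
# Illustrative sketch only: the dictionary layout and values are made up,
# they are not the device tree format used by OPAL or the kernel.

def resolve_event(pmu, event):
    """Return a copy of event, inheriting scale/unit from the PMU if missing."""
    resolved = dict(event)
    for key in ("scale", "unit"):
        if key not in resolved and key in pmu:
            resolved[key] = pmu[key]
    return resolved

# Hypothetical PMU with a common scale and unit for all its events.
pmu = {"name": "example_pmu", "scale": "0.5", "unit": "MiB"}
# Hypothetical events: the first has no scale/unit, the second has its own unit.
plain = {"name": "EXAMPLE_EVENT_A", "offset": 0x18}
custom = {"name": "EXAMPLE_EVENT_B", "offset": 0x20, "unit": "KiB"}

print(resolve_event(pmu, plain))
print(resolve_event(pmu, custom))
```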
| |
| The event offset is the location in memory where the counter data is |
| accumulated. |
| |
| IMC catalog is available at: |
| https://github.com/open-power/ima-catalog |
| |
| The kernel discovers the IMC counters information in the device tree at the |
| `imc-counters` device node, which has a compatible field |
| `ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs |
| and their event information and registers the PMUs and their attributes in the |
| kernel. |
| |
| IMC example usage |
| ================= |
| |
| .. code-block:: sh |
| |
| # perf list |
| [...] |
| nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/ [Kernel PMU event] |
| nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/ [Kernel PMU event] |
| [...] |
| core_imc/CPM_0THRD_NON_IDLE_PCYC/ [Kernel PMU event] |
| core_imc/CPM_1THRD_NON_IDLE_INST/ [Kernel PMU event] |
| [...] |
| thread_imc/CPM_0THRD_NON_IDLE_PCYC/ [Kernel PMU event] |
| thread_imc/CPM_1THRD_NON_IDLE_INST/ [Kernel PMU event] |
| |
| To see per chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/: |
| |
| .. code-block:: sh |
| |
| # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket |
| |
| To see non-idle instructions for core 0: |
| |
| .. code-block:: sh |
| |
| # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000 |
| |
| To see non-idle cycles for a "make": |
| |
| .. code-block:: sh |
| |
| # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make |
| |
| |
| IMC Trace-mode |
| =============== |
| |
| POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In |
| Accumulation mode, event counts are accumulated in system memory. The |
| hypervisor then reads the posted counts periodically or when requested. In IMC |
| Trace mode, the 64-bit trace SCOM value is initialized with the event |
| information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the |
| event to be monitored and the sampling duration. On each overflow in the |
| CPMCxSEL, hardware snapshots the program counter along with event counts and |
| writes them into the memory pointed to by LDBAR. |
| |
| LDBAR is a 64-bit special-purpose per-thread register. It has bits to indicate |
| whether hardware is configured for accumulation or trace mode. |
| |
| LDBAR Register Layout |
| --------------------- |
| |
| +-------+----------------------+ |
| | 0 | Enable/Disable | |
| +-------+----------------------+ |
| | 1 | 0: Accumulation Mode | |
| | +----------------------+ |
| | | 1: Trace Mode | |
| +-------+----------------------+ |
| | 2:3 | Reserved | |
| +-------+----------------------+ |
| 4:6 | PB scope |
| +-------+----------------------+ |
| | 7 | Reserved | |
| +-------+----------------------+ |
| | 8:50 | Counter Address | |
| +-------+----------------------+ |
| | 51:63 | Reserved | |
| +-------+----------------------+ |
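| |
| As an illustration of the layout above, the following Python sketch extracts |
| the fields from a 64-bit LDBAR value. It is not kernel code. It assumes IBM |
| bit numbering, where bit 0 is the most significant bit, and the helper names |
| and the sample value are hypothetical. |

```python
# Illustrative sketch only, assuming IBM (MSB-0) bit numbering:
# bit 0 is the most significant bit of the 64-bit register.

def field(value, first, last, width=64):
    """Extract bits first..last (inclusive, MSB-0 numbering) from value."""
    nbits = last - first + 1
    shift = width - 1 - last
    return (value >> shift) & ((1 << nbits) - 1)

def decode_ldbar(ldbar):
    """Split an LDBAR value into the fields listed in the layout table."""
    return {
        "enabled": bool(field(ldbar, 0, 0)),
        "mode": "trace" if field(ldbar, 1, 1) else "accumulation",
        "pb_scope": field(ldbar, 4, 6),
        # Raw contents of bits 8:50; how the hardware interprets this field
        # as an address is not covered by this sketch.
        "counter_address_field": field(ldbar, 8, 50),
    }

# Made-up value: enable bit set, trace mode set, arbitrary address bits.
example = (1 << 63) | (1 << 62) | 0x10000000
print(decode_ldbar(example))
```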
| |
| TRACE_IMC_SCOM bit representation |
| --------------------------------- |
| |
| +-------+------------+ |
| | 0:1 | SAMPSEL | |
| +-------+------------+ |
| | 2:33 | CPMC_LOAD | |
| +-------+------------+ |
| | 34:40 | CPMC1SEL | |
| +-------+------------+ |
| | 41:47 | CPMC2SEL | |
| +-------+------------+ |
| | 48:50 | BUFFERSIZE | |
| +-------+------------+ |
| | 51:63 | RESERVED | |
| +-------+------------+ |
| |
| CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the |
| event to count. BUFFERSIZE indicates the memory range. On each overflow, |
| hardware snapshots the program counter along with event counts, updates the |
| memory, and reloads the CPMC_LOAD value for the next sampling duration. IMC |
| hardware does not support exceptions, so it quietly wraps around when the |
| memory buffer reaches its end. |
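| |
| A minimal Python sketch of how the fields could be packed into a 64-bit |
| TRACE_IMC_SCOM value, following the bit positions in the table above. It |
| assumes IBM bit numbering (bit 0 is the most significant bit). The function |
| name and the sample field values are hypothetical and do not describe what |
| OPAL or the kernel actually programs. |

```python
# Illustrative sketch only, assuming IBM (MSB-0) bit numbering.
# (first bit, last bit) of each field, taken from the table above.
TRACE_SCOM_FIELDS = {
    "sampsel": (0, 1),
    "cpmc_load": (2, 33),
    "cpmc1sel": (34, 40),
    "cpmc2sel": (41, 47),
    "buffersize": (48, 50),
}

def pack_trace_scom(**values):
    """Pack the named field values into one 64-bit integer."""
    scom = 0
    for name, value in values.items():
        first, last = TRACE_SCOM_FIELDS[name]
        nbits = last - first + 1
        if not 0 <= value < (1 << nbits):
            raise ValueError(f"{name}={value} does not fit in {nbits} bits")
        # In MSB-0 numbering, the field's lowest bit sits at 63 - last.
        scom |= value << (63 - last)
    return scom

# Made-up field values, for illustration only.
scom = pack_trace_scom(sampsel=1, cpmc_load=0x1000, cpmc1sel=2,
                       cpmc2sel=0, buffersize=1)
print(f"{scom:#018x}")
```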
| |
| *Currently the event monitored for trace-mode is fixed as cycle.* |
| |
| Trace IMC example usage |
| ======================= |
| |
| .. code-block:: sh |
| |
| # perf list |
| [....] |
| trace_imc/trace_cycles/ [Kernel PMU event] |
| |
| To record an application/process with the trace-imc event: |
| |
| .. code-block:: sh |
| |
| # perf record -e trace_imc/trace_cycles/ yes > /dev/null |
| [ perf record: Woken up 1 times to write data ] |
| [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ] |
| |
| The generated `perf.data` can be read using perf report. |
| |
| Benefits of using IMC trace-mode |
| ================================ |
| |
| PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace |
| mode snapshots the program counter and writes it to memory. This also |
| provides a way for the operating system to do instruction sampling in real |
| time without PMI processing overhead. |
| |
| The following compares `perf top` with and without the trace-imc event. The |
| PMI counts are read before and after running `perf top` without the trace-imc |
| event, and again after running it with the trace-imc event. |
| |
| .. code-block:: sh |
| |
| # grep PMI /proc/interrupts |
| PMI: 0 0 0 0 Performance monitoring interrupts |
| # ./perf top |
| ... |
| # grep PMI /proc/interrupts |
| PMI: 39735 8710 17338 17801 Performance monitoring interrupts |
| # ./perf top -e trace_imc/trace_cycles/ |
| ... |
| # grep PMI /proc/interrupts |
| PMI: 39735 8710 17338 17801 Performance monitoring interrupts |
| |
| |
| That is, the PMI interrupt counts do not increment when using the `trace_imc` event. |
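| |
| The comparison can also be done programmatically. The following Python sketch |
| parses the PMI line from two snapshots of `/proc/interrupts` and prints the |
| per-CPU difference. The helper names are hypothetical, and the sample text is |
| the output shown above for the run with the `trace_imc` event. |

```python
# Illustrative sketch only: compare PMI counts between two snapshots
# of /proc/interrupts (passed in as text).

def pmi_counts(interrupts_text):
    """Return the per-CPU PMI counts found in /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "PMI:":
            counts = []
            for token in fields[1:]:
                if not token.isdigit():
                    break  # reached the textual description
                counts.append(int(token))
            return counts
    raise ValueError("no PMI line found")

def pmi_delta(before, after):
    """Per-CPU increase in PMI counts between two snapshots."""
    return [a - b for b, a in zip(pmi_counts(before), pmi_counts(after))]

# Snapshots taken before and after the run with the trace_imc event.
before = "PMI: 39735 8710 17338 17801 Performance monitoring interrupts"
after = "PMI: 39735 8710 17338 17801 Performance monitoring interrupts"
print(pmi_delta(before, after))  # prints [0, 0, 0, 0]
```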