| ====================================== |
| Coresight - HW Assisted Tracing on ARM |
| ====================================== |
| |
| :Author: Mathieu Poirier <mathieu.poirier@linaro.org> |
| :Date: September 11th, 2014 |
| |
| Introduction |
| ------------ |
| |
| Coresight is an umbrella of technologies allowing for the debugging of ARM |
| based SoC. It includes solutions for JTAG and HW assisted tracing. This |
| document is concerned with the latter. |
| |
| HW assisted tracing is becoming increasingly useful when dealing with systems |
| that have many SoCs and other components like GPU and DMA engines. ARM has |
| developed a HW assisted tracing solution by means of different components, each |
| being added to a design at synthesis time to cater to specific tracing needs. |
| Components are generally categorised as source, link and sinks and are |
| (usually) discovered using the AMBA bus. |
| |
| "Sources" generate a compressed stream representing the processor instruction |
| path based on tracing scenarios as configured by users. From there the stream |
| flows through the coresight system (via ATB bus) using links that are connecting |
| the emanating source to a sink(s). Sinks serve as endpoints to the coresight |
| implementation, either storing the compressed stream in a memory buffer or |
| creating an interface to the outside world where data can be transferred to a |
| host without fear of filling up the onboard coresight memory buffer. |
| |
| At typical coresight system would look like this:: |
| |
| ***************************************************************** |
| **************************** AMBA AXI ****************************===|| |
| ***************************************************************** || |
| ^ ^ | || |
| | | * ** |
| 0000000 ::::: 0000000 ::::: ::::: @@@@@@@ |||||||||||| |
| 0 CPU 0<-->: C : 0 CPU 0<-->: C : : C : @ STM @ || System || |
| |->0000000 : T : |->0000000 : T : : T :<--->@@@@@ || Memory || |
| | #######<-->: I : | #######<-->: I : : I : @@@<-| |||||||||||| |
| | # ETM # ::::: | # PTM # ::::: ::::: @ | |
| | ##### ^ ^ | ##### ^ ! ^ ! . | ||||||||| |
| | |->### | ! | |->### | ! | ! . | || DAP || |
| | | # | ! | | # | ! | ! . | ||||||||| |
| | | . | ! | | . | ! | ! . | | | |
| | | . | ! | | . | ! | ! . | | * |
| | | . | ! | | . | ! | ! . | | SWD/ |
| | | . | ! | | . | ! | ! . | | JTAG |
| *****************************************************************<-| |
| *************************** AMBA Debug APB ************************ |
| ***************************************************************** |
| | . ! . ! ! . | |
| | . * . * * . | |
| ***************************************************************** |
| ******************** Cross Trigger Matrix (CTM) ******************* |
| ***************************************************************** |
| | . ^ . . | |
| | * ! * * | |
| ***************************************************************** |
| ****************** AMBA Advanced Trace Bus (ATB) ****************** |
| ***************************************************************** |
| | ! =============== | |
| | * ===== F =====<---------| |
| | ::::::::: ==== U ==== |
| |-->:: CTI ::<!! === N === |
| | ::::::::: ! == N == |
| | ^ * == E == |
| | ! &&&&&&&&& IIIIIII == L == |
| |------>&& ETB &&<......II I ======= |
| | ! &&&&&&&&& II I . |
| | ! I I . |
| | ! I REP I<.......... |
| | ! I I |
| | !!>&&&&&&&&& II I *Source: ARM ltd. |
| |------>& TPIU &<......II I DAP = Debug Access Port |
| &&&&&&&&& IIIIIII ETM = Embedded Trace Macrocell |
| ; PTM = Program Trace Macrocell |
| ; CTI = Cross Trigger Interface |
| * ETB = Embedded Trace Buffer |
| To trace port TPIU= Trace Port Interface Unit |
| SWD = Serial Wire Debug |
| |
| While on target configuration of the components is done via the APB bus, |
| all trace data are carried out-of-band on the ATB bus. The CTM provides |
| a way to aggregate and distribute signals between CoreSight components. |
| |
| The coresight framework provides a central point to represent, configure and |
| manage coresight devices on a platform. This first implementation centers on |
| the basic tracing functionality, enabling components such ETM/PTM, funnel, |
| replicator, TMC, TPIU and ETB. Future work will enable more |
| intricate IP blocks such as STM and CTI. |
| |
| |
| Acronyms and Classification |
| --------------------------- |
| |
| Acronyms: |
| |
| PTM: |
| Program Trace Macrocell |
| ETM: |
| Embedded Trace Macrocell |
| STM: |
| System trace Macrocell |
| ETB: |
| Embedded Trace Buffer |
| ITM: |
| Instrumentation Trace Macrocell |
| TPIU: |
| Trace Port Interface Unit |
| TMC-ETR: |
| Trace Memory Controller, configured as Embedded Trace Router |
| TMC-ETF: |
| Trace Memory Controller, configured as Embedded Trace FIFO |
| CTI: |
| Cross Trigger Interface |
| |
| Classification: |
| |
| Source: |
| ETMv3.x ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM |
| Link: |
| Funnel, replicator (intelligent or not), TMC-ETR |
| Sinks: |
| ETBv1.0, ETB1.1, TPIU, TMC-ETF |
| Misc: |
| CTI |
| |
| |
| Device Tree Bindings |
| -------------------- |
| |
| See Documentation/devicetree/bindings/arm/coresight.txt for details. |
| |
| As of this writing drivers for ITM, STMs and CTIs are not provided but are |
| expected to be added as the solution matures. |
| |
| |
| Framework and implementation |
| ---------------------------- |
| |
| The coresight framework provides a central point to represent, configure and |
| manage coresight devices on a platform. Any coresight compliant device can |
| register with the framework for as long as they use the right APIs: |
| |
| .. c:function:: struct coresight_device *coresight_register(struct coresight_desc *desc); |
| .. c:function:: void coresight_unregister(struct coresight_device *csdev); |
| |
| The registering function is taking a ``struct coresight_desc *desc`` and |
| register the device with the core framework. The unregister function takes |
| a reference to a ``struct coresight_device *csdev`` obtained at registration time. |
| |
| If everything goes well during the registration process the new devices will |
| show up under /sys/bus/coresight/devices, as showns here for a TC2 platform:: |
| |
| root:~# ls /sys/bus/coresight/devices/ |
| replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm |
| 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm |
| root:~# |
| |
| The functions take a ``struct coresight_device``, which looks like this:: |
| |
| struct coresight_desc { |
| enum coresight_dev_type type; |
| struct coresight_dev_subtype subtype; |
| const struct coresight_ops *ops; |
| struct coresight_platform_data *pdata; |
| struct device *dev; |
| const struct attribute_group **groups; |
| }; |
| |
| |
| The "coresight_dev_type" identifies what the device is, i.e, source link or |
| sink while the "coresight_dev_subtype" will characterise that type further. |
| |
| The ``struct coresight_ops`` is mandatory and will tell the framework how to |
| perform base operations related to the components, each component having |
| a different set of requirement. For that ``struct coresight_ops_sink``, |
| ``struct coresight_ops_link`` and ``struct coresight_ops_source`` have been |
| provided. |
| |
| The next field ``struct coresight_platform_data *pdata`` is acquired by calling |
| ``of_get_coresight_platform_data()``, as part of the driver's _probe routine and |
| ``struct device *dev`` gets the device reference embedded in the ``amba_device``:: |
| |
| static int etm_probe(struct amba_device *adev, const struct amba_id *id) |
| { |
| ... |
| ... |
| drvdata->dev = &adev->dev; |
| ... |
| } |
| |
| Specific class of device (source, link, or sink) have generic operations |
| that can be performed on them (see ``struct coresight_ops``). The ``**groups`` |
| is a list of sysfs entries pertaining to operations |
| specific to that component only. "Implementation defined" customisations are |
| expected to be accessed and controlled using those entries. |
| |
| Device Naming scheme |
| -------------------- |
| |
| The devices that appear on the "coresight" bus were named the same as their |
| parent devices, i.e, the real devices that appears on AMBA bus or the platform bus. |
| Thus the names were based on the Linux Open Firmware layer naming convention, |
| which follows the base physical address of the device followed by the device |
| type. e.g:: |
| |
| root:~# ls /sys/bus/coresight/devices/ |
| 20010000.etf 20040000.funnel 20100000.stm 22040000.etm |
| 22140000.etm 230c0000.funnel 23240000.etm 20030000.tpiu |
| 20070000.etr 20120000.replicator 220c0000.funnel |
| 23040000.etm 23140000.etm 23340000.etm |
| |
| However, with the introduction of ACPI support, the names of the real |
| devices are a bit cryptic and non-obvious. Thus, a new naming scheme was |
| introduced to use more generic names based on the type of the device. The |
| following rules apply:: |
| |
| 1) Devices that are bound to CPUs, are named based on the CPU logical |
| number. |
| |
| e.g, ETM bound to CPU0 is named "etm0" |
| |
| 2) All other devices follow a pattern, "<device_type_prefix>N", where : |
| |
| <device_type_prefix> - A prefix specific to the type of the device |
| N - a sequential number assigned based on the order |
| of probing. |
| |
| e.g, tmc_etf0, tmc_etr0, funnel0, funnel1 |
| |
| Thus, with the new scheme the devices could appear as :: |
| |
| root:~# ls /sys/bus/coresight/devices/ |
| etm0 etm1 etm2 etm3 etm4 etm5 funnel0 |
| funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0 |
| |
| Some of the examples below might refer to old naming scheme and some |
| to the newer scheme, to give a confirmation that what you see on your |
| system is not unexpected. One must use the "names" as they appear on |
| the system under specified locations. |
| |
| How to use the tracer modules |
| ----------------------------- |
| |
| There are two ways to use the Coresight framework: |
| |
| 1. using the perf cmd line tools. |
| 2. interacting directly with the Coresight devices using the sysFS interface. |
| |
| Preference is given to the former as using the sysFS interface |
| requires a deep understanding of the Coresight HW. The following sections |
| provide details on using both methods. |
| |
| 1) Using the sysFS interface: |
| |
| Before trace collection can start, a coresight sink needs to be identified. |
| There is no limit on the amount of sinks (nor sources) that can be enabled at |
| any given moment. As a generic operation, all device pertaining to the sink |
| class will have an "active" entry in sysfs:: |
| |
| root:/sys/bus/coresight/devices# ls |
| replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm |
| 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm |
| root:/sys/bus/coresight/devices# ls 20010000.etb |
| enable_sink status trigger_cntr |
| root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink |
| root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink |
| 1 |
| root:/sys/bus/coresight/devices# |
| |
| At boot time the current etm3x driver will configure the first address |
| comparator with "_stext" and "_etext", essentially tracing any instruction |
| that falls within that range. As such "enabling" a source will immediately |
| trigger a trace capture:: |
| |
| root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source |
| root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source |
| 1 |
| root:/sys/bus/coresight/devices# cat 20010000.etb/status |
| Depth: 0x2000 |
| Status: 0x1 |
| RAM read ptr: 0x0 |
| RAM wrt ptr: 0x19d3 <----- The write pointer is moving |
| Trigger cnt: 0x0 |
| Control: 0x1 |
| Flush status: 0x0 |
| Flush ctrl: 0x2001 |
| root:/sys/bus/coresight/devices# |
| |
| Trace collection is stopped the same way:: |
| |
| root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source |
| root:/sys/bus/coresight/devices# |
| |
| The content of the ETB buffer can be harvested directly from /dev:: |
| |
| root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \ |
| of=~/cstrace.bin |
| 64+0 records in |
| 64+0 records out |
| 32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s |
| root:/sys/bus/coresight/devices# |
| |
| The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32. |
| |
| Following is a DS-5 output of an experimental loop that increments a variable up |
| to a certain value. The example is simple and yet provides a glimpse of the |
| wealth of possibilities that coresight provides. |
| :: |
| |
| Info Tracing enabled |
| Instruction 106378866 0x8026B53C E52DE004 false PUSH {lr} |
| Instruction 0 0x8026B540 E24DD00C false SUB sp,sp,#0xc |
| Instruction 0 0x8026B544 E3A03000 false MOV r3,#0 |
| Instruction 0 0x8026B548 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Timestamp Timestamp: 17106715833 |
| Instruction 319 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 9 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 10 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 6 0x8026B560 EE1D3F30 false MRC p15,#0x0,r3,c13,c0,#1 |
| Instruction 0 0x8026B564 E1A0100D false MOV r1,sp |
| Instruction 0 0x8026B568 E3C12D7F false BIC r2,r1,#0x1fc0 |
| Instruction 0 0x8026B56C E3C2203F false BIC r2,r2,#0x3f |
| Instruction 0 0x8026B570 E59D1004 false LDR r1,[sp,#4] |
| Instruction 0 0x8026B574 E59F0010 false LDR r0,[pc,#16] ; [0x8026B58C] = 0x80550368 |
| Instruction 0 0x8026B578 E592200C false LDR r2,[r2,#0xc] |
| Instruction 0 0x8026B57C E59221D0 false LDR r2,[r2,#0x1d0] |
| Instruction 0 0x8026B580 EB07A4CF true BL {pc}+0x1e9344 ; 0x804548c4 |
| Info Tracing enabled |
| Instruction 13570831 0x8026B584 E28DD00C false ADD sp,sp,#0xc |
| Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc} |
| Timestamp Timestamp: 17107041535 |
| |
| 2) Using perf framework: |
| |
| Coresight tracers are represented using the Perf framework's Performance |
| Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of |
| controlling when tracing gets enabled based on when the process of interest is |
| scheduled. When configured in a system, Coresight PMUs will be listed when |
| queried by the perf command line tool: |
| |
| linaro@linaro-nano:~$ ./perf list pmu |
| |
| List of pre-defined events (to be used in -e): |
| |
| cs_etm// [Kernel PMU event] |
| |
| linaro@linaro-nano:~$ |
| |
| Regardless of the number of tracers available in a system (usually equal to the |
| amount of processor cores), the "cs_etm" PMU will be listed only once. |
| |
| A Coresight PMU works the same way as any other PMU, i.e the name of the PMU is |
| listed along with configuration options within forward slashes '/'. Since a |
| Coresight system will typically have more than one sink, the name of the sink to |
| work with needs to be specified as an event option. |
| On newer kernels the available sinks are listed in sysFS under |
| ($SYSFS)/bus/event_source/devices/cs_etm/sinks/:: |
| |
| root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls |
| tmc_etf0 tmc_etr0 tpiu0 |
| |
| On older kernels, this may need to be found from the list of coresight devices, |
| available under ($SYSFS)/bus/coresight/devices/:: |
| |
| root:~# ls /sys/bus/coresight/devices/ |
| etm0 etm1 etm2 etm3 etm4 etm5 funnel0 |
| funnel1 funnel2 replicator0 stm0 tmc_etf0 tmc_etr0 tpiu0 |
| root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program |
| |
| As mentioned above in section "Device Naming scheme", the names of the devices could |
| look different from what is used in the example above. One must use the device names |
| as it appears under the sysFS. |
| |
| The syntax within the forward slashes '/' is important. The '@' character |
| tells the parser that a sink is about to be specified and that this is the sink |
| to use for the trace session. |
| |
| More information on the above and other example on how to use Coresight with |
| the perf tools can be found in the "HOWTO.md" file of the openCSD gitHub |
| repository [#third]_. |
| |
| 2.1) AutoFDO analysis using the perf tools: |
| |
| perf can be used to record and analyze trace of programs. |
| |
| Execution can be recorded using 'perf record' with the cs_etm event, |
| specifying the name of the sink to record to, e.g:: |
| |
| perf record -e cs_etm/@tmc_etr0/u --per-thread |
| |
| The 'perf report' and 'perf script' commands can be used to analyze execution, |
| synthesizing instruction and branch events from the instruction trace. |
| 'perf inject' can be used to replace the trace data with the synthesized events. |
| The --itrace option controls the type and frequency of synthesized events |
| (see perf documentation). |
| |
| Note that only 64-bit programs are currently supported - further work is |
| required to support instruction decode of 32-bit Arm programs. |
| |
| |
| Generating coverage files for Feedback Directed Optimization: AutoFDO |
| --------------------------------------------------------------------- |
| |
| 'perf inject' accepts the --itrace option in which case tracing data is |
| removed and replaced with the synthesized events. e.g. |
| :: |
| |
| perf inject --itrace --strip -i perf.data -o perf.data.new |
| |
| Below is an example of using ARM ETM for autoFDO. It requires autofdo |
| (https://github.com/google/autofdo) and gcc version 5. The bubble |
| sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial). |
| :: |
| |
| $ gcc-5 -O3 sort.c -o sort |
| $ taskset -c 2 ./sort |
| Bubble sorting array of 30000 elements |
| 5910 ms |
| |
| $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort |
| Bubble sorting array of 30000 elements |
| 12543 ms |
| [ perf record: Woken up 35 times to write data ] |
| [ perf record: Captured and wrote 69.640 MB perf.data ] |
| |
| $ perf inject -i perf.data -o inj.data --itrace=il64 --strip |
| $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1 |
| $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo |
| $ taskset -c 2 ./sort_autofdo |
| Bubble sorting array of 30000 elements |
| 5806 ms |
| |
| |
| How to use the STM module |
| ------------------------- |
| |
| Using the System Trace Macrocell module is the same as the tracers - the only |
| difference is that clients are driving the trace capture rather |
| than the program flow through the code. |
| |
| As with any other CoreSight component, specifics about the STM tracer can be |
| found in sysfs with more information on each entry being found in [#first]_:: |
| |
| root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0 |
| enable_source hwevent_select port_enable subsystem uevent |
| hwevent_enable mgmt port_select traceid |
| root@genericarmv8:~# |
| |
| Like any other source a sink needs to be identified and the STM enabled before |
| being used:: |
| |
| root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink |
| root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source |
| |
| From there user space applications can request and use channels using the devfs |
| interface provided for that purpose by the generic STM API:: |
| |
| root@genericarmv8:~# ls -l /dev/stm0 |
| crw------- 1 root root 10, 61 Jan 3 18:11 /dev/stm0 |
| root@genericarmv8:~# |
| |
| Details on how to use the generic STM API can be found here [#second]_. |
| |
| .. [#first] Documentation/ABI/testing/sysfs-bus-coresight-devices-stm |
| |
| .. [#second] Documentation/trace/stm.rst |
| |
| .. [#third] https://github.com/Linaro/perf-opencsd |