| Coresight - HW Assisted Tracing on ARM |
| ====================================== |
| |
| Author: Mathieu Poirier <mathieu.poirier@linaro.org> |
| Date: September 11th, 2014 |
| |
| Introduction |
| ------------ |
| |
| Coresight is an umbrella of technologies for debugging ARM-based SoCs. It |
| includes solutions for JTAG and HW assisted tracing. This document is |
| concerned with the latter. |
| |
| HW assisted tracing is becoming increasingly useful when dealing with systems |
| that have many SoCs and other components like GPUs and DMA engines. ARM has |
| developed a HW assisted tracing solution by means of different components, each |
| being added to a design at synthesis time to cater to specific tracing needs. |
| Components are generally categorised as sources, links and sinks and are |
| (usually) discovered using the AMBA bus. |
| |
| "Sources" generate a compressed stream representing the processor instruction |
| path based on tracing scenarios as configured by users. From there the stream |
| flows through the coresight system (via the ATB bus) using links that connect |
| the emanating source to one or more sinks. Sinks serve as endpoints to the |
| coresight implementation, either storing the compressed stream in a memory |
| buffer or creating an interface to the outside world where data can be |
| transferred to a host without fear of filling up the onboard coresight memory |
| buffer. |
| |
| A typical coresight system would look like this: |
| |
|   ***************************************************************** |
|  **************************** AMBA AXI ****************************===|| |
|   *****************************************************************   || |
|         ^                    ^                            |           || |
|         |                    |                            *           ** |
|      0000000    :::::     0000000    :::::    :::::    @@@@@@@   |||||||||||| |
|      0 CPU 0<-->: C :     0 CPU 0<-->: C :    : C :    @ STM @   || System || |
|   |->0000000    : T :  |->0000000    : T :    : T :<--->@@@@@    || Memory || |
|   |  #######<-->: I :  |  #######<-->: I :    : I :      @@@<-|  |||||||||||| |
|   |  # ETM #    :::::  |  # PTM #    :::::    :::::       @   | |
|   |   #####      ^ ^   |   #####      ^ !      ^ !        .   |   ||||||||| |
|   | |->###       | !   | |->###       | !      | !        .   |   || DAP || |
|   | |   #        | !   | |   #        | !      | !        .   |   ||||||||| |
|   | |   .        | !   | |   .        | !      | !        .   |      |  | |
|   | |   .        | !   | |   .        | !      | !        .   |      |  * |
|   | |   .        | !   | |   .        | !      | !        .   |      | SWD/ |
|   | |   .        | !   | |   .        | !      | !        .   |      | JTAG |
|   *****************************************************************<-| |
|  *************************** AMBA Debug APB ************************ |
|   ***************************************************************** |
|    |    .          !         .          !        !        .    | |
|    |    .          *         .          *        *        .    | |
|   ***************************************************************** |
|  ******************** Cross Trigger Matrix (CTM) ******************* |
|   ***************************************************************** |
|    |    .     ^              .                            .    | |
|    |    *     !              *                            *    | |
|   ***************************************************************** |
|  ****************** AMBA Advanced Trace Bus (ATB) ****************** |
|   ***************************************************************** |
|    |           !                        ===============         | |
|    |           *                         ===== F =====<---------| |
|    |   :::::::::                          ==== U ==== |
|    |-->:: CTI ::<!!                        === N === |
|    |   :::::::::  !                         == N == |
|    |    ^         *                         == E == |
|    |    !  &&&&&&&&&       IIIIIII          == L == |
|    |------>&& ETB &&<......II     I         ======= |
|    |    !  &&&&&&&&&       II     I            . |
|    |    !                    I     I           . |
|    |    !                    I REP I<.......... |
|    |    !                    I     I |
|    |    !!>&&&&&&&&&       II     I           *Source: ARM ltd. |
|    |------>& TPIU  &<......II     I           DAP = Debug Access Port |
|            &&&&&&&&&       IIIIIII            ETM = Embedded Trace Macrocell |
|                ;                              PTM = Program Trace Macrocell |
|                ;                              CTI = Cross Trigger Interface |
|                *                              ETB = Embedded Trace Buffer |
|          To trace port                        TPIU= Trace Port Interface Unit |
|                                               SWD = Serial Wire Debug |
| |
| While on-target configuration of the components is done via the APB bus, |
| all trace data is carried out-of-band on the ATB bus. The CTM provides |
| a way to aggregate and distribute signals between CoreSight components. |
| |
| The coresight framework provides a central point to represent, configure and |
| manage coresight devices on a platform. This first implementation centers on |
| the basic tracing functionality, enabling components such as ETM/PTM, funnel, |
| replicator, TMC, TPIU and ETB. Future work will enable more |
| intricate IP blocks such as STM and CTI. |
| |
| |
| Acronyms and Classification |
| --------------------------- |
| |
| Acronyms: |
| |
| PTM: Program Trace Macrocell |
| ETM: Embedded Trace Macrocell |
| STM: System Trace Macrocell |
| ETB: Embedded Trace Buffer |
| ITM: Instrumentation Trace Macrocell |
| TPIU: Trace Port Interface Unit |
| TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router |
| TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO |
| CTI: Cross Trigger Interface |
| |
| Classification: |
| |
| Source: |
| ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM |
| Link: |
| Funnel, replicator (intelligent or not), TMC-ETF |
| Sinks: |
| ETBv1.0, ETBv1.1, TPIU, TMC-ETR, TMC-ETF |
| Misc: |
| CTI |
| |
| |
| Device Tree Bindings |
| -------------------- |
| |
| See Documentation/devicetree/bindings/arm/coresight.txt for details. |
| |
| As of this writing drivers for ITM, STMs and CTIs are not provided but are |
| expected to be added as the solution matures. |
| |
| |
| Framework and implementation |
| ---------------------------- |
| |
| The coresight framework provides a central point to represent, configure and |
| manage coresight devices on a platform. Any coresight compliant device can |
| register with the framework as long as it uses the right APIs: |
| |
| struct coresight_device *coresight_register(struct coresight_desc *desc); |
| void coresight_unregister(struct coresight_device *csdev); |
| |
| The registering function takes a "struct coresight_desc *desc", registers the |
| device with the core framework and returns a "struct coresight_device *". The |
| unregister function takes that "struct coresight_device *", obtained at |
| registration time. |
| |
| If everything goes well during the registration process the new devices will |
| show up under /sys/bus/coresight/devices, as shown here for a TC2 platform: |
| |
| root:~# ls /sys/bus/coresight/devices/ |
| replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm |
| 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm |
| root:~# |
| |
| The registration function takes a "struct coresight_desc", which looks like |
| this: |
| |
| struct coresight_desc { |
|         enum coresight_dev_type type; |
|         struct coresight_dev_subtype subtype; |
|         const struct coresight_ops *ops; |
|         struct coresight_platform_data *pdata; |
|         struct device *dev; |
|         const struct attribute_group **groups; |
| }; |
| |
| |
| The "coresight_dev_type" identifies what the device is, i.e., source, link or |
| sink, while the "coresight_dev_subtype" will characterise that type further. |
| |
| The "struct coresight_ops" is mandatory and will tell the framework how to |
| perform base operations related to the components, each component having |
| a different set of requirements. For that, "struct coresight_ops_sink", |
| "struct coresight_ops_link" and "struct coresight_ops_source" have been |
| provided. |
| |
| The next field, "struct coresight_platform_data *pdata" is acquired by calling |
| "of_get_coresight_platform_data()", as part of the driver's _probe routine and |
| "struct device *dev" gets the device reference embedded in the "amba_device": |
| |
| static int etm_probe(struct amba_device *adev, const struct amba_id *id) |
| { |
|         ... |
|         ... |
|         drvdata->dev = &adev->dev; |
|         ... |
| } |
| |
| Each class of device (source, link, or sink) has generic operations |
| that can be performed on it (see "struct coresight_ops"). The |
| "**groups" is a list of sysfs entries pertaining to operations |
| specific to that component only. "Implementation defined" customisations are |
| expected to be accessed and controlled using those entries. |
| |
| Last but not least, "struct module *owner" is expected to be set to reflect |
| the information carried in "THIS_MODULE". |
| |
| How to use the tracer modules |
| ----------------------------- |
| |
| Before trace collection can start, a coresight sink needs to be identified. |
| There is no limit on the number of sinks (or sources) that can be enabled at |
| any given moment. As a generic operation, all devices pertaining to the sink |
| class will have an "enable_sink" entry in sysfs: |
| |
| root:/sys/bus/coresight/devices# ls |
| replicator 20030000.tpiu 2201c000.ptm 2203c000.etm 2203e000.etm |
| 20010000.etb 20040000.funnel 2201d000.ptm 2203d000.etm |
| root:/sys/bus/coresight/devices# ls 20010000.etb |
| enable_sink status trigger_cntr |
| root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink |
| root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink |
| 1 |
| root:/sys/bus/coresight/devices# |
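
The manual steps above can be wrapped in a small helper. The following is only
a sketch: the function name cs_enable_sink and the CS_DEVICES override are
hypothetical and not part of the kernel interface; only the enable_sink entry
itself comes from the session shown above.

```shell
#!/bin/sh
# Directory holding the coresight devices. CS_DEVICES is an assumed,
# illustrative override (handy for dry runs), not a kernel feature.
CS_DEVICES="${CS_DEVICES:-/sys/bus/coresight/devices}"

# Hypothetical helper: enable the named sink and confirm the setting.
cs_enable_sink() {
    sink="$1"
    entry="$CS_DEVICES/$sink/enable_sink"

    # Only devices of the sink class expose an enable_sink entry.
    if [ ! -e "$entry" ]; then
        echo "$sink: no enable_sink entry (not a sink?)" >&2
        return 1
    fi

    echo 1 > "$entry" || return 1

    # Read the value back, as done with "cat" in the session above.
    [ "$(cat "$entry")" = "1" ]
}
```

On a TC2-like platform this would be called as "cs_enable_sink 20010000.etb";
a non-zero return status means the device has no enable_sink entry or the
write did not take effect.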
| |
| At boot time the current etm3x driver will configure the first address |
| comparator with "_stext" and "_etext", essentially tracing any instruction |
| that falls within that range. As such "enabling" a source will immediately |
| trigger a trace capture: |
| |
| root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source |
| root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source |
| 1 |
| root:/sys/bus/coresight/devices# cat 20010000.etb/status |
| Depth: 0x2000 |
| Status: 0x1 |
| RAM read ptr: 0x0 |
| RAM wrt ptr: 0x19d3 <----- The write pointer is moving |
| Trigger cnt: 0x0 |
| Control: 0x1 |
| Flush status: 0x0 |
| Flush ctrl: 0x2001 |
| root:/sys/bus/coresight/devices# |
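
Checking that the write pointer moves can also be scripted. The sketch below
assumes the status entry keeps the "RAM wrt ptr: 0x..." layout shown above;
the function names are hypothetical. Comparing two samples is a heuristic
only, since a wrapping buffer could in principle yield the same value twice.

```shell
#!/bin/sh
# Print the "RAM wrt ptr" value from an ETB status dump (read on stdin)
# as a decimal number. Fails if the line is not present.
etb_wrt_ptr() {
    hex=$(sed -n 's/^[[:space:]]*RAM wrt ptr:[[:space:]]*\(0x[0-9a-fA-F]*\).*/\1/p')
    [ -n "$hex" ] || return 1
    printf '%d\n' "$hex"
}

# Sample the write pointer twice and succeed only if it changed.
# $1: path to the status entry, $2: delay in seconds (default 1).
etb_is_capturing() {
    status="$1"
    first=$(etb_wrt_ptr < "$status") || return 1
    sleep "${2:-1}"
    second=$(etb_wrt_ptr < "$status") || return 1
    [ "$first" != "$second" ]
}
```

For the dump shown above, etb_wrt_ptr would print 6611 (0x19d3). A typical
call would be "etb_is_capturing 20010000.etb/status" from within
/sys/bus/coresight/devices.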
| |
| Trace collection is stopped the same way: |
| |
| root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source |
| root:/sys/bus/coresight/devices# |
| |
| The content of the ETB buffer can be harvested directly from /dev: |
| |
| root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \ |
| of=~/cstrace.bin |
| |
| 64+0 records in |
| 64+0 records out |
| 32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s |
| root:/sys/bus/coresight/devices# |
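
The size reported by dd can be cross-checked against the "Depth" value of the
status entry. This is a sketch under the assumption that the depth counts
32-bit words; the helper name is hypothetical.

```shell
#!/bin/sh
# Expected size in bytes of an ETB dump, given the "Depth" value from the
# status entry. Assumes the depth is expressed in 32-bit (4-byte) words.
etb_dump_bytes() {
    echo $(( $1 * 4 ))
}

etb_dump_bytes 0x2000    # prints 32768
```

With a depth of 0x2000 this gives 32768 bytes, which matches the 64 records
of 512 bytes copied by dd above.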
| |
| The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32. |
| |
| Following is a DS-5 output of an experimental loop that increments a variable up |
| to a certain value. The example is simple and yet provides a glimpse of the |
| wealth of possibilities that coresight provides. |
| |
| Info Tracing enabled |
| Instruction 106378866 0x8026B53C E52DE004 false PUSH {lr} |
| Instruction 0 0x8026B540 E24DD00C false SUB sp,sp,#0xc |
| Instruction 0 0x8026B544 E3A03000 false MOV r3,#0 |
| Instruction 0 0x8026B548 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Timestamp Timestamp: 17106715833 |
| Instruction 319 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 9 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 7 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 10 0x8026B54C E59D3004 false LDR r3,[sp,#4] |
| Instruction 0 0x8026B550 E3530004 false CMP r3,#4 |
| Instruction 0 0x8026B554 E2833001 false ADD r3,r3,#1 |
| Instruction 0 0x8026B558 E58D3004 false STR r3,[sp,#4] |
| Instruction 0 0x8026B55C DAFFFFFA true BLE {pc}-0x10 ; 0x8026b54c |
| Instruction 6 0x8026B560 EE1D3F30 false MRC p15,#0x0,r3,c13,c0,#1 |
| Instruction 0 0x8026B564 E1A0100D false MOV r1,sp |
| Instruction 0 0x8026B568 E3C12D7F false BIC r2,r1,#0x1fc0 |
| Instruction 0 0x8026B56C E3C2203F false BIC r2,r2,#0x3f |
| Instruction 0 0x8026B570 E59D1004 false LDR r1,[sp,#4] |
| Instruction 0 0x8026B574 E59F0010 false LDR r0,[pc,#16] ; [0x8026B58C] = 0x80550368 |
| Instruction 0 0x8026B578 E592200C false LDR r2,[r2,#0xc] |
| Instruction 0 0x8026B57C E59221D0 false LDR r2,[r2,#0x1d0] |
| Instruction 0 0x8026B580 EB07A4CF true BL {pc}+0x1e9344 ; 0x804548c4 |
| Info Tracing enabled |
| Instruction 13570831 0x8026B584 E28DD00C false ADD sp,sp,#0xc |
| Instruction 0 0x8026B588 E8BD8000 true LDM sp!,{pc} |
| Timestamp Timestamp: 17107041535 |
| |
| How to use the STM module |
| ------------------------- |
| |
| Using the System Trace Macrocell module is the same as using the tracers. The |
| only difference is that clients drive the trace capture rather than the |
| program flow through the code. |
| |
| As with any other CoreSight component, specifics about the STM tracer can be |
| found in sysfs with more information on each entry being found in [1]: |
| |
| root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm |
| enable_source hwevent_select port_enable subsystem uevent |
| hwevent_enable mgmt port_select traceid |
| root@genericarmv8:~# |
| |
| As with any other source, a sink needs to be identified and the STM enabled |
| before being used: |
| |
| root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink |
| root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source |
| |
| From there user space applications can request and use channels using the devfs |
| interface provided for that purpose by the generic STM API: |
| |
| root@genericarmv8:~# ls -l /dev/20100000.stm |
| crw------- 1 root root 10, 61 Jan 3 18:11 /dev/20100000.stm |
| root@genericarmv8:~# |
| |
| Details on how to use the generic STM API can be found here [2]. |
| |
| [1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm |
| [2]. Documentation/trace/stm.txt |
| |
| |
| Using perf tools |
| ---------------- |
| |
| perf can be used to record and analyze traces of programs. |
| |
| Execution can be recorded using 'perf record' with the cs_etm event, |
| specifying the name of the sink to record to, e.g.: |
| |
| perf record -e cs_etm/@20070000.etr/u --per-thread |
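
The sink name has to be known before the event can be written down. The
sketch below lists candidate sinks by looking for enable_sink entries and
builds the event string used above. Both function names and the CS_DEVICES
override are hypothetical, and it assumes the names found in sysfs are the
ones perf expects after the '@'.

```shell
#!/bin/sh
# CS_DEVICES is an assumed, illustrative override of the sysfs directory.
CS_DEVICES="${CS_DEVICES:-/sys/bus/coresight/devices}"

# Print the name of every device exposing an enable_sink entry.
cs_list_sinks() {
    for entry in "$CS_DEVICES"/*/enable_sink; do
        [ -e "$entry" ] || continue
        basename "$(dirname "$entry")"
    done
}

# Build the cs_etm event specifier for a sink; the trailing "u" modifier
# restricts the event to user space, as in the command above.
cs_etm_event() {
    printf 'cs_etm/@%s/u\n' "$1"
}
```

A recording could then be started with, for instance,
perf record -e "$(cs_etm_event 20070000.etr)" --per-thread followed by the
program to trace.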
| |
| The 'perf report' and 'perf script' commands can be used to analyze execution, |
| synthesizing instruction and branch events from the instruction trace. |
| 'perf inject' can be used to replace the trace data with the synthesized events. |
| The --itrace option controls the type and frequency of synthesized events |
| (see perf documentation). |
| |
| Note that only 64-bit programs are currently supported - further work is |
| required to support instruction decode of 32-bit Arm programs. |
| |
| |
| Generating coverage files for Feedback Directed Optimization: AutoFDO |
| --------------------------------------------------------------------- |
| |
| 'perf inject' accepts the --itrace option, in which case trace data is |
| removed and replaced with the synthesized events, e.g.: |
| |
| perf inject --itrace --strip -i perf.data -o perf.data.new |
| |
| Below is an example of using ARM ETM for AutoFDO. It requires autofdo |
| (https://github.com/google/autofdo) and gcc version 5. The bubble |
| sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial). |
| |
| $ gcc-5 -O3 sort.c -o sort |
| $ taskset -c 2 ./sort |
| Bubble sorting array of 30000 elements |
| 5910 ms |
| |
| $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort |
| Bubble sorting array of 30000 elements |
| 12543 ms |
| [ perf record: Woken up 35 times to write data ] |
| [ perf record: Captured and wrote 69.640 MB perf.data ] |
| |
| $ perf inject -i perf.data -o inj.data --itrace=il64 --strip |
| $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1 |
| $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo |
| $ taskset -c 2 ./sort_autofdo |
| Bubble sorting array of 30000 elements |
| 5806 ms |
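
The run times above can be put into perspective with a little arithmetic.
This sketch uses two hypothetical helpers and only the numbers printed in the
session above; a single run like this is indicative rather than a benchmark.

```shell
#!/bin/sh
# Percentage by which a new run time improves on a baseline.
pct_faster() {
    LC_ALL=C awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (base - new) / base * 100 }'
}

# Factor by which a traced run is slower than the untraced baseline.
slowdown() {
    LC_ALL=C awk -v base="$1" -v traced="$2" \
        'BEGIN { printf "%.2f\n", traced / base }'
}

pct_faster 5910 5806     # prints 1.76
slowdown 5910 12543      # prints 2.12
```

In this example the AutoFDO build is about 1.76% faster than the plain -O3
build, while recording the trace made the profiled run roughly 2.12 times
slower.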