| .. SPDX-License-Identifier: GPL-2.0 |
| |
| =================================== |
| Using AutoFDO with the Linux kernel |
| =================================== |
| |
| This enables AutoFDO build support for the kernel when using |
| the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization) |
| is a type of profile-guided optimization (PGO) used to enhance the |
| performance of binary executables. It gathers information about the |
| frequency of execution of various code paths within a binary using |
| hardware sampling. This data is then used to guide the compiler's |
| optimization decisions, resulting in a more efficient binary. AutoFDO |
| is a powerful optimization technique, and data indicates that it can |
| significantly improve kernel performance. It's especially beneficial |
| for workloads affected by front-end stalls. |
| |
| For AutoFDO builds, unlike non-FDO builds, the user must supply a |
| profile. Acquiring an AutoFDO profile can be done in several ways. |
| AutoFDO profiles are created by converting hardware sampling using |
| the "perf" tool. It is crucial that the workload used to create these |
| perf files is representative; they must exhibit runtime |
| characteristics similar to the workloads that are intended to be |
| optimized. Failure to do so will result in the compiler optimizing |
| for the wrong objective. |
| |
| The AutoFDO profile often encapsulates the program's behavior. If the |
| performance-critical codes are architecture-independent, the profile |
| can be applied across platforms to achieve performance gains. For |
| instance, using the profile generated on Intel architecture to build |
| a kernel for AMD architecture can also yield performance improvements. |
| |
| There are two methods for acquiring a representative profile: |
| (1) Sample real workloads using a production environment. |
| (2) Generate the profile using a representative load test. |
| When enabling the AutoFDO build configuration without providing an |
| AutoFDO profile, the compiler only modifies the dwarf information in |
| the kernel without impacting runtime performance. It's advisable to |
| use a kernel binary built with the same AutoFDO configuration to |
| collect the perf profile. While it's possible to use a kernel built |
| with different options, it may result in inferior performance. |
| |
| One can collect profiles using AutoFDO build for the previous kernel. |
| AutoFDO employs relative line numbers to match the profiles, offering |
| some tolerance for source changes. This mode is commonly used in a |
| production environment for profile collection. |
| |
| In a profile collection based on a load test, the AutoFDO collection |
| process consists of the following steps: |
| |
| #. Initial build: The kernel is built with AutoFDO options |
| without a profile. |
| |
| #. Profiling: The above kernel is then run with a representative |
| workload to gather execution frequency data. This data is |
| collected using hardware sampling, via perf. AutoFDO is most |
| effective on platforms supporting advanced PMU features like |
| LBR on Intel machines. |
| |
| #. AutoFDO profile generation: Perf output file is converted to |
| the AutoFDO profile via offline tools. |
| |
| The support requires a Clang compiler LLVM 17 or later. |
| |
| Preparation |
| =========== |
| |
| Configure the kernel with:: |
| |
| CONFIG_AUTOFDO_CLANG=y |
| |
| Customization |
| ============= |
| |
| The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for |
| AutoFDO builds. One can, however, enable or disable AutoFDO build for |
| individual files and directories by adding a line similar to the following |
| to the respective kernel Makefile: |
| |
| - For enabling a single file (e.g. foo.o) :: |
| |
| AUTOFDO_PROFILE_foo.o := y |
| |
| - For enabling all files in one directory :: |
| |
| AUTOFDO_PROFILE := y |
| |
| - For disabling one file :: |
| |
| AUTOFDO_PROFILE_foo.o := n |
| |
| - For disabling all files in one directory :: |
| |
| AUTOFDO_PROFILE := n |
| |
| Workflow |
| ======== |
| |
| Here is an example workflow for AutoFDO kernel: |
| |
| 1) Build the kernel on the host machine with LLVM enabled, |
| for example, :: |
| |
| $ make menuconfig LLVM=1 |
| |
| Turn on AutoFDO build config:: |
| |
| CONFIG_AUTOFDO_CLANG=y |
| |
| With a configuration that with LLVM enabled, use the following command:: |
| |
| $ scripts/config -e AUTOFDO_CLANG |
| |
| After getting the config, build with :: |
| |
| $ make LLVM=1 |
| |
| 2) Install the kernel on the test machine. |
| |
| 3) Run the load tests. The '-c' option in perf specifies the sample |
| event period. We suggest using a suitable prime number, like 500009, |
| for this purpose. |
| |
| - For Intel platforms:: |
| |
| $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest> |
| |
| - For AMD platforms: |
| |
| The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check, |
| |
| For Zen3:: |
| |
| $ cat proc/cpuinfo | grep " brs" |
| |
| For Zen4:: |
| |
| $ cat proc/cpuinfo | grep amd_lbr_v2 |
| |
| The following command generated the perf data file:: |
| |
| $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest> |
| |
| 4) (Optional) Download the raw perf file to the host machine. |
| |
| 5) To generate an AutoFDO profile, two offline tools are available: |
| create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part |
| of the AutoFDO project and can be found on GitHub |
| (https://github.com/google/autofdo), version v0.30.1 or later. |
| The llvm_profgen tool is included in the LLVM compiler itself. It's |
| important to note that the version of llvm_profgen doesn't need to match |
| the version of Clang. It needs to be the LLVM 19 release of Clang |
| or later, or just from the LLVM trunk. :: |
| |
| $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file> |
| |
| or :: |
| |
| $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file> |
| |
| Note that multiple AutoFDO profile files can be merged into one via:: |
| |
| $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n> |
| |
| 6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1, |
| (Note CONFIG_AUTOFDO_CLANG needs to be enabled):: |
| |
| $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> |