| ============================= |
| BPF Kernel Functions (kfuncs) |
| ============================= |
| |
| 1. Introduction |
| =============== |
| |
| BPF Kernel Functions, more commonly known as kfuncs, are functions in the Linux |
| kernel that are exposed for use by BPF programs. Unlike normal BPF helpers, |
| kfuncs do not have a stable interface and can change from one kernel release to |
| another. Hence, BPF programs that call kfuncs need to be updated in response to |
| changes in the kernel. |
| |
| 2. Defining a kfunc |
| =================== |
| |
| There are two ways to expose a kernel function to BPF programs: either make an |
| existing function in the kernel visible, or add a new wrapper for BPF. In both |
| cases, care must be taken that a BPF program can only call such a function in a |
| valid context. To enforce this, the visibility of a kfunc can be restricted per |
| program type. |
| |
| If you are not creating a BPF wrapper for an existing kernel function, skip |
| ahead to :ref:`BPF_kfunc_nodef`. |
| |
| 2.1 Creating a wrapper kfunc |
| ---------------------------- |
| |
| When defining a wrapper kfunc, the wrapper function should have extern linkage. |
| This prevents the compiler from optimizing away dead code, as this wrapper kfunc |
| is not invoked anywhere in the kernel itself. It is not necessary to provide a |
| prototype in a header for the wrapper kfunc. |
| |
| An example is given below:: |
| |
|         /* Disables missing prototype warnings */ |
|         __diag_push(); |
|         __diag_ignore_all("-Wmissing-prototypes", |
|                           "Global kfuncs as their definitions will be in BTF"); |
| |
|         struct task_struct *bpf_find_get_task_by_vpid(pid_t nr) |
|         { |
|                 return find_get_task_by_vpid(nr); |
|         } |
| |
|         __diag_pop(); |
| |
| A wrapper kfunc is often needed when we need to annotate parameters of the |
| kfunc. Otherwise one may directly make the kfunc visible to the BPF program by |
| registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`. |
| |
| 2.2 Annotating kfunc parameters |
| ------------------------------- |
| |
| Similar to BPF helpers, the verifier sometimes needs additional context to make |
| the usage of kernel functions safer and more useful. Hence, we can annotate a |
| parameter by suffixing the name of the kfunc argument with a __tag, where tag |
| may be one of the supported annotations. |
| |
| 2.2.1 __sz Annotation |
| --------------------- |
| |
| This annotation is used to indicate a memory and size pair in the argument list. |
| An example is given below:: |
| |
|         void bpf_memzero(void *mem, int mem__sz) |
|         { |
|         ... |
|         } |
| |
| Here, the verifier will treat the first argument as a PTR_TO_MEM, and the second |
| argument as its size. By default, without the __sz annotation, the size of the |
| type of the pointer is used. Without the __sz annotation, a kfunc cannot accept |
| a void pointer. |
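
As a rough illustration of the memory and size pair, here is a user-space sketch of what such a function body might look like. The name ``demo_memzero`` and its body are assumptions made for this example; the document elides the real body, and the verifier-side handling of ``__sz`` is not modelled here.

```c
#include <stddef.h>
#include <string.h>

/*
 * User-space stand-in for a kfunc taking a memory/size pair.
 * Hypothetical name and body, for illustration only.
 * The "__sz" suffix on the second parameter is what the verifier keys on;
 * to the C compiler it is just part of the parameter name.
 */
void demo_memzero(void *mem, int mem__sz)
{
	/* Defensive check: ignore a NULL buffer or a non-positive size. */
	if (!mem || mem__sz <= 0)
		return;
	memset(mem, 0, (size_t)mem__sz);
}
```

Only the first ``mem__sz`` bytes are written, which is why the verifier needs to know that the two arguments belong together.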
| |
| .. _BPF_kfunc_nodef: |
| |
| 2.3 Using an existing kernel function |
| ------------------------------------- |
| |
| When an existing function in the kernel is fit for consumption by BPF programs, |
| it can be directly registered with the BPF subsystem. However, care must still |
| be taken to review the context in which it will be invoked by the BPF program |
| and whether it is safe to do so. |
| |
| 2.4 Annotating kfuncs |
| --------------------- |
| |
| In addition to a kfunc's arguments, the verifier may need more information about |
| the type of kfunc(s) being registered with the BPF subsystem. To do so, we |
| define flags on a set of kfuncs as follows:: |
| |
|         BTF_SET8_START(bpf_task_set) |
|         BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) |
|         BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) |
|         BTF_SET8_END(bpf_task_set) |
| |
| This set encodes the BTF ID of each kfunc listed above, together with its flags. |
| Of course, it is also allowed to specify no flags. |
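
The idea of a set that pairs each kfunc with its flags can be sketched in plain user-space C. This is a simplified model, not the kernel's data structure: the ``DEMO_KF_*`` values, ``struct demo_kfunc_entry`` and ``demo_kfunc_flags()`` are all hypothetical, and a name string stands in for the BTF ID.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative flag bits; the real values are defined by the kernel. */
#define DEMO_KF_ACQUIRE  (1u << 0)
#define DEMO_KF_RELEASE  (1u << 1)
#define DEMO_KF_RET_NULL (1u << 2)

/* One entry per kfunc: an identifier (a name here, not a BTF ID) and flags. */
struct demo_kfunc_entry {
	const char *name;
	unsigned int flags;
};

static const struct demo_kfunc_entry demo_task_set[] = {
	{ "bpf_get_task_pid", DEMO_KF_ACQUIRE | DEMO_KF_RET_NULL },
	{ "bpf_put_pid",      DEMO_KF_RELEASE },
};

/* Return a pointer to the flags of a kfunc in the set, or NULL if absent. */
const unsigned int *demo_kfunc_flags(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_task_set) / sizeof(demo_task_set[0]); i++) {
		if (strcmp(demo_task_set[i].name, name) == 0)
			return &demo_task_set[i].flags;
	}
	return NULL;
}
```

Returning a pointer rather than the flags value keeps "in the set with no flags" distinct from "not in the set".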
| |
| 2.4.1 KF_ACQUIRE flag |
| --------------------- |
| |
| The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a |
| refcounted object. The verifier will then ensure that the pointer to the object |
| is eventually released using a release kfunc, or transferred to a map using a |
| referenced kptr (by invoking bpf_kptr_xchg). Otherwise, the verifier rejects the |
| BPF program: loading succeeds only when no lingering references remain in any |
| explored state of the program. |
| |
| 2.4.2 KF_RET_NULL flag |
| ---------------------- |
| |
| The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc |
| may be NULL. Hence, it forces the user to do a NULL check on the pointer |
| returned from the kfunc before making use of it (dereferencing or passing to |
| another helper). This flag is often paired with the KF_ACQUIRE flag, but the |
| two are orthogonal to each other. |
| |
| 2.4.3 KF_RELEASE flag |
| --------------------- |
| |
| The KF_RELEASE flag is used to indicate that the kfunc releases the pointer |
| passed in to it. Only one referenced pointer may be passed in. All copies of |
| the pointer being released are invalidated as a result of invoking a kfunc with |
| this flag. |
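
The acquire, NULL check and release pattern described in the three subsections above can be modelled in user space as follows. Everything here (``struct demo_obj``, ``demo_obj_acquire()``, ``demo_obj_release()``, ``demo_use_obj()``) is hypothetical and only mimics the calling discipline; in a real BPF program it is the verifier, not the programmer's good will, that enforces it.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for a kernel object. */
struct demo_obj {
	int refcount;
	int present;	/* 0 models a lookup that finds nothing */
};

/* Models an acquire kfunc that may return NULL: takes a reference on success. */
struct demo_obj *demo_obj_acquire(struct demo_obj *obj)
{
	if (!obj || !obj->present)
		return NULL;
	obj->refcount++;
	return obj;
}

/* Models a release kfunc: drops the reference taken by the acquire. */
void demo_obj_release(struct demo_obj *obj)
{
	obj->refcount--;
}

/* The calling pattern: NULL check first, use, then release exactly once. */
int demo_use_obj(struct demo_obj *candidate)
{
	struct demo_obj *ref = demo_obj_acquire(candidate);

	if (!ref)
		return -1;	/* nothing acquired, so nothing to release */
	/* ... use ref here ... */
	demo_obj_release(ref);
	return 0;
}
```

After ``demo_use_obj()`` returns, the reference count is back to its initial value on both paths, which is the property the verifier checks for every explored state.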
| |
| 2.4.4 KF_KPTR_GET flag |
| ---------------------- |
| |
| The KF_KPTR_GET flag is used to indicate that the kfunc takes a pointer to a |
| kptr as its first argument, safely increments the refcount of the object it |
| points to, and returns a reference to the user. The rest of the arguments may |
| be normal arguments of a kfunc. The KF_KPTR_GET flag should be used in |
| conjunction with the KF_ACQUIRE and KF_RET_NULL flags. |
| |
| 2.4.5 KF_TRUSTED_ARGS flag |
| -------------------------- |
| |
| The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It |
| indicates that all pointer arguments will always have a guaranteed lifetime, |
| and pointers to kernel objects are always passed to helpers in their unmodified |
| form (as obtained from acquire kfuncs). |
| |
| It can be used to enforce that a pointer to a refcounted object acquired from a |
| kfunc or BPF helper is passed as an argument to this kfunc without any |
| modifications (e.g. pointer arithmetic) such that it is trusted and points to |
| the original object. |
| |
| Meanwhile, it is also allowed to pass pointers to normal memory to such kfuncs, |
| and those can have a non-zero offset. |
| |
| This flag is often used for kfuncs that operate (change some property, perform |
| some operation) on an object that was obtained using an acquire kfunc. Such |
| kfuncs need an unchanged pointer to ensure the integrity of the operation being |
| performed on the expected object. |
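
The rule for trusted arguments can be summarised with a small user-space model. ``struct demo_arg`` and ``demo_trusted_arg_ok()`` are assumptions made for this sketch; the verifier's real bookkeeping is considerably richer than three fields.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of what is known about a pointer argument. */
struct demo_arg {
	bool kernel_obj;	/* points to a kernel object rather than plain memory */
	bool referenced;	/* obtained from an acquire kfunc and still held */
	int offset;		/* offset added since the pointer was obtained */
};

/* Model of the rule described above for a kfunc with trusted arguments. */
bool demo_trusted_arg_ok(const struct demo_arg *arg)
{
	if (!arg->kernel_obj)
		return true;	/* plain memory: a non-zero offset is allowed */
	/* Kernel object: must be a held reference in unmodified form. */
	return arg->referenced && arg->offset == 0;
}
```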
| |
| 2.4.6 KF_SLEEPABLE flag |
| ----------------------- |
| |
| The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only |
| be called by sleepable BPF programs (BPF_F_SLEEPABLE). |
| |
| 2.4.7 KF_DESTRUCTIVE flag |
| ------------------------- |
| |
| The KF_DESTRUCTIVE flag is used to indicate functions whose invocation is |
| destructive to the system. For example, such a call can result in the system |
| rebooting or panicking. Because of this, additional restrictions apply to these |
| calls. At the moment they only require the CAP_SYS_BOOT capability, but more |
| can be added later. |
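
The restrictions attached to KF_SLEEPABLE and KF_DESTRUCTIVE amount to a check of the kfunc's flags against properties of the calling program. A minimal user-space sketch, assuming hypothetical names (``struct demo_prog``, ``demo_call_allowed()``) and illustrative flag values that are not the kernel's:

```c
#include <stdbool.h>

/* Illustrative flag bits for this model only. */
#define DEMO_KF_SLEEPABLE   (1u << 0)
#define DEMO_KF_DESTRUCTIVE (1u << 1)

/* Hypothetical summary of the calling program's properties. */
struct demo_prog {
	bool sleepable;		/* program was loaded as sleepable */
	bool cap_sys_boot;	/* loader holds CAP_SYS_BOOT */
};

/* Return true if a program with these properties may call the kfunc. */
bool demo_call_allowed(unsigned int kfunc_flags, const struct demo_prog *prog)
{
	if ((kfunc_flags & DEMO_KF_SLEEPABLE) && !prog->sleepable)
		return false;
	if ((kfunc_flags & DEMO_KF_DESTRUCTIVE) && !prog->cap_sys_boot)
		return false;
	return true;
}
```

The two checks are independent, so a kfunc carrying both flags needs a program that satisfies both conditions.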
| |
| 2.5 Registering the kfuncs |
| -------------------------- |
| |
| Once the kfunc is prepared for use, the final step to making it visible is |
| registering it with the BPF subsystem. Registration is done per BPF program |
| type. An example is shown below:: |
| |
|         BTF_SET8_START(bpf_task_set) |
|         BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) |
|         BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) |
|         BTF_SET8_END(bpf_task_set) |
| |
|         static const struct btf_kfunc_id_set bpf_task_kfunc_set = { |
|                 .owner = THIS_MODULE, |
|                 .set = &bpf_task_set, |
|         }; |
| |
|         static int init_subsystem(void) |
|         { |
|                 return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_task_kfunc_set); |
|         } |
|         late_initcall(init_subsystem); |
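
To make the effect of per-program-type registration concrete, the following user-space model keeps one kfunc set per program type and answers visibility queries. All names (``enum demo_prog_type``, ``struct demo_kfunc_set``, ``demo_register_kfunc_set()``, ``demo_kfunc_visible()``) are hypothetical, and holding a single set per type is a simplification chosen for brevity, not a statement about the kernel's behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical program types for this model; not the kernel's enum. */
enum demo_prog_type { DEMO_PROG_TRACING, DEMO_PROG_XDP, DEMO_PROG_MAX };

/* A set of kfuncs, identified by name here instead of by BTF ID. */
struct demo_kfunc_set {
	const char *const *names;
	size_t cnt;
};

/* One registered set per program type (simplification for the sketch). */
static const struct demo_kfunc_set *demo_registry[DEMO_PROG_MAX];

/* Register a set for one program type; returns 0 on success, -1 on error. */
int demo_register_kfunc_set(enum demo_prog_type type,
			    const struct demo_kfunc_set *set)
{
	if ((unsigned int)type >= DEMO_PROG_MAX || !set)
		return -1;
	demo_registry[type] = set;
	return 0;
}

/* A kfunc is visible only to program types it was registered for. */
bool demo_kfunc_visible(enum demo_prog_type type, const char *name)
{
	const struct demo_kfunc_set *set;
	size_t i;

	if ((unsigned int)type >= DEMO_PROG_MAX)
		return false;
	set = demo_registry[type];
	if (!set)
		return false;
	for (i = 0; i < set->cnt; i++) {
		if (strcmp(set->names[i], name) == 0)
			return true;
	}
	return false;
}
```

In this model, a set registered for the tracing type leaves the same kfunc invisible to the XDP type, which mirrors the intent of restricting kfuncs to contexts where calling them is valid.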