.. SPDX-License-Identifier: GPL-2.0
.. Author: Jacob Pan (commit d0023e3, 2020-09-25)
.. iommu:

=====================================
IOMMU Userspace API
=====================================

IOMMU UAPI is used for virtualization cases where communication is
needed between physical and virtual IOMMU drivers. For bare-metal
usage, the IOMMU is a system device which does not need to communicate
with userspace directly.

The primary use cases are guest Shared Virtual Address (SVA) and
guest IO virtual address (IOVA), wherein the vIOMMU implementation
relies on the physical IOMMU and for this reason requires interactions
with the host driver.

.. contents:: :local:

Functionalities
===============
Communication between user and kernel goes in both directions. The
supported user-kernel APIs are as follows:

1. Bind/Unbind guest PASID (e.g. Intel VT-d)
2. Bind/Unbind guest PASID table (e.g. ARM SMMU)
3. Invalidate IOMMU caches upon guest requests
4. Report errors to the guest and serve page requests

Requirements
============
The IOMMU UAPIs are generic and extensible to meet the following
requirements:

1. Emulated and para-virtualised vIOMMUs
2. Multiple vendors (Intel VT-d, ARM SMMU, etc.)
3. Extensions to the UAPI shall not break existing userspace

Interfaces
==========
Although the data structures defined in IOMMU UAPI are self-contained,
no user API functions are introduced. Instead, IOMMU UAPI is
designed to work with existing user driver frameworks such as VFIO.

Extension Rules & Precautions
-----------------------------
When the IOMMU UAPI is extended, the data structures can *only* be
modified in two ways:

1. Adding new fields by re-purposing the padding[] field. No size change.
2. Adding new union members at the end. This may increase the structure size.

No new fields can be added *after* the variable-sized union, because a
growing union would move their offsets and break backward
compatibility. Whichever method is used, a new flag must be introduced
for every change that affects the structure. The IOMMU driver
processes the data based on the flags, which ensures backward
compatibility.
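
The two extension rules can be illustrated with a hypothetical structure. In the C sketch below, ``struct demo_info_v1``, ``struct demo_info_v2``, ``DEMO_FLAG_EXTRA`` and ``demo_get_extra()`` are invented for illustration and are not part of the IOMMU UAPI; fixed-width ``uint*_t`` types stand in for the kernel's ``__u*`` types. The v2 layout carves a new field out of padding[] and appends a larger union member, so the union keeps its offset, and the consumer only trusts the new field when its flag is set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag announcing the field carved out of padding[]. */
#define DEMO_FLAG_EXTRA (1u << 0)

struct demo_small { uint64_t addr; };
struct demo_big   { uint64_t addr; uint64_t size; uint64_t stride; };

/* Original (hypothetical) layout. */
struct demo_info_v1 {
	uint32_t argsz;
	uint32_t flags;
	uint8_t  padding[8];
	union {
		struct demo_small small;
	} u;
};

/* Extended layout following both rules: the union offset is unchanged. */
struct demo_info_v2 {
	uint32_t argsz;
	uint32_t flags;
	uint32_t extra;           /* rule 1: re-purposed from padding[] */
	uint8_t  padding[4];
	union {
		struct demo_small small;
		struct demo_big   big;   /* rule 2: new member may grow the union */
	} u;
};

/* Consume the new field only when the caller's flags say it is valid. */
static uint32_t demo_get_extra(const struct demo_info_v2 *info)
{
	return (info->flags & DEMO_FLAG_EXTRA) ? info->extra : 0;
}
```

An older caller that never sets ``DEMO_FLAG_EXTRA`` keeps working, because the former padding bytes are ignored unless the flag is present.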

The version field is reserved only for the unlikely event of a UAPI
upgrade in its entirety.

It is *always* the caller's responsibility to indicate the size of the
structure passed by setting argsz appropriately.
At the same time, argsz is user-provided data and is not trusted. The
argsz field lets the user application indicate how much data it is
providing; it is still the kernel's responsibility to validate whether
that size is correct and sufficient for the requested operation.

Compatibility Checking
----------------------
When an IOMMU UAPI extension increases the size of a structure, the
IOMMU UAPI code shall handle the following cases:

1. User and kernel have an exact size match.
2. An older user with an older kernel header (smaller UAPI size) running
   on a newer kernel (larger UAPI size).
3. A newer user with a newer kernel header (larger UAPI size) running
   on an older kernel.
4. A malicious/misbehaving user passing an illegal/invalid size that is
   still within range. The data may contain garbage.
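
A minimal userspace sketch of how these four cases might be handled, assuming a hypothetical ``struct demo_uapi`` whose last field was added by a later extension. ``memcpy()`` stands in for ``copy_from_user()``, and the explicit ``ubuf_len`` bound exists only because a userspace simulation has no fault handling; neither detail is taken from the kernel. The minimum size covers the mandatory fields, anything an older user did not send reads as zero (case 2), and anything beyond the kernel's own size is ignored (case 3). Garbage that passes the size check (case 4) still has to be caught by content checks on flags and reserved fields.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "current kernel" view of a UAPI structure. */
struct demo_uapi {
	uint32_t argsz;   /* always the first field */
	uint32_t flags;
	uint64_t addr;    /* mandatory fields end here */
	uint64_t newer;   /* hypothetical field added by a later extension */
};

/* Size of the mandatory part, i.e. what the oldest valid user sends. */
#define DEMO_MINSZ offsetof(struct demo_uapi, newer)

/* Returns 0 on success, -EINVAL if argsz is unusable. */
static int demo_copy_uapi(struct demo_uapi *dst, const void *ubuf, size_t ubuf_len)
{
	uint32_t argsz;
	size_t copy;

	memset(dst, 0, sizeof(*dst));          /* unsent fields read as 0 */
	if (ubuf_len < sizeof(argsz))
		return -EINVAL;
	memcpy(&argsz, ubuf, sizeof(argsz));   /* read argsz first */

	/* Reject sizes below the mandatory part or beyond the buffer. */
	if (argsz < DEMO_MINSZ || argsz > ubuf_len)
		return -EINVAL;

	/* Copy no more than both sides understand. */
	copy = argsz < sizeof(*dst) ? argsz : sizeof(*dst);
	memcpy(dst, ubuf, copy);
	return 0;
}
```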

Feature Checking
----------------
While launching a guest with vIOMMU, it is strongly advised to check
compatibility upfront, as some errors that occur later during vIOMMU
operation, such as cache invalidation failures, cannot be nicely
escalated to the guest due to IOMMU specifications. This can lead to
catastrophic failures for the users.

User applications such as QEMU are expected to import kernel UAPI
headers. Backward compatibility is supported per feature flags.
For example, an older QEMU (with an older kernel header) can run on a
newer kernel. A newer QEMU (with a new kernel header) may refuse to
initialize on an older kernel if new feature flags are not supported
by the older kernel. Simply recompiling existing code with a newer
kernel header should not be an issue, since only existing flags are
used.

The IOMMU vendor driver should report the following features to IOMMU
UAPI consumers (e.g. via VFIO):

1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
2. IOMMU_NESTING_FEAT_BIND_PGTBL
3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
4. IOMMU_NESTING_FEAT_CACHE_INVLD
5. IOMMU_NESTING_FEAT_PAGE_REQUEST

Taking VFIO as an example, upon request from VFIO userspace (e.g.
QEMU), VFIO kernel code shall query the IOMMU vendor driver for
support of the above features. The query result can then be reported
back to the userspace caller. Details can be found in
Documentation/driver-api/vfio.rst.
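
On the userspace side, the upfront check can be as simple as comparing the reported feature bitmask against what the emulated vIOMMU needs. The sketch below assumes the features are reported as a 32-bit bitmask; the ``DEMO_NESTING_FEAT_*`` bit positions and ``demo_missing_features()`` are hypothetical stand-ins for the ``IOMMU_NESTING_FEAT_*`` values, whose real encoding comes from the kernel headers and the VFIO interface.

```c
#include <stdint.h>

/* Hypothetical bit positions standing in for IOMMU_NESTING_FEAT_*. */
#define DEMO_NESTING_FEAT_SYSWIDE_PASID    (1u << 0)
#define DEMO_NESTING_FEAT_BIND_PGTBL       (1u << 1)
#define DEMO_NESTING_FEAT_BIND_PASID_TABLE (1u << 2)
#define DEMO_NESTING_FEAT_CACHE_INVLD      (1u << 3)
#define DEMO_NESTING_FEAT_PAGE_REQUEST     (1u << 4)

/*
 * Return the required features the host did not report. A non-zero
 * result means the guest should not be launched with this vIOMMU,
 * since later failures (e.g. cache invalidation) may not be
 * reportable to the guest.
 */
static uint32_t demo_missing_features(uint32_t reported, uint32_t required)
{
	return required & ~reported;
}
```

Returning the missing bits rather than a boolean lets the caller name the unsupported features when it refuses to initialize.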

Data Passing Example with VFIO
------------------------------
As the ubiquitous userspace driver framework, VFIO is already IOMMU
aware and shares many key concepts such as device model, group, and
protection domain. Other user driver frameworks can also be extended
to support IOMMU UAPI, but that is outside the scope of this document.

In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
transport, capability checking, security, and life cycle management of
process address space IDs (PASIDs).

The VFIO layer conveys the data structures down to the IOMMU driver.
It follows the pattern below::

  struct {
          __u32   argsz;
          __u32   flags;
          __u8    data[];
  };

Here data[] contains the IOMMU UAPI data structures. VFIO has the
freedom to bundle the data as well as parse data size based on its own
flags.
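
A minimal sketch of the user side of this pattern, assuming a hypothetical ``struct demo_vfio_hdr`` that mirrors the anonymous structure above and a hypothetical ``DEMO_VFIO_FLAG_CACHE_INVLD`` flag; real VFIO ioctls define their own structures and flags. The outer argsz covers the header plus the payload, while the IOMMU UAPI payload in data[] keeps its own argsz as its first field.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header mirroring the { argsz, flags, data[] } pattern. */
struct demo_vfio_hdr {
	uint32_t argsz;
	uint32_t flags;
	uint8_t  data[];
};

/* Hypothetical flag: data[] carries a cache invalidation request. */
#define DEMO_VFIO_FLAG_CACHE_INVLD (1u << 0)

/*
 * Wrap 'len' bytes of IOMMU UAPI payload behind a VFIO-style header.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct demo_vfio_hdr *demo_vfio_wrap(uint32_t flags,
					    const void *payload, uint32_t len)
{
	struct demo_vfio_hdr *hdr = malloc(sizeof(*hdr) + len);

	if (!hdr)
		return NULL;
	hdr->argsz = sizeof(*hdr) + len;   /* outer size: header + payload */
	hdr->flags = flags;
	memcpy(hdr->data, payload, len);   /* inner argsz travels unchanged */
	return hdr;
}
```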

In order to determine the size and feature set of the user data, argsz
and flags (or the equivalent) are also embedded in the IOMMU UAPI data
structures.

A "__u32 argsz" field is *always* at the beginning of each structure.

For example:
::

  struct iommu_cache_invalidate_info {
          __u32   argsz;
  #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
          __u32   version;
  /* IOMMU paging structure cache */
  #define IOMMU_CACHE_INV_TYPE_IOTLB      (1 << 0) /* IOMMU IOTLB */
  #define IOMMU_CACHE_INV_TYPE_DEV_IOTLB  (1 << 1) /* Device IOTLB */
  #define IOMMU_CACHE_INV_TYPE_PASID      (1 << 2) /* PASID cache */
  #define IOMMU_CACHE_INV_TYPE_NR         (3)
          __u8    cache;
          __u8    granularity;
          __u8    padding[6];
          union {
                  struct iommu_inv_pasid_info pasid_info;
                  struct iommu_inv_addr_info addr_info;
          } granu;
  };

VFIO is responsible for checking its own argsz and flags. It then
invokes the appropriate IOMMU UAPI functions. The user pointers are
passed to the IOMMU layer for further processing. The responsibilities
are divided as follows:

- The generic IOMMU layer checks the argsz range based on the UAPI data
  in the current kernel version.

- The generic IOMMU layer checks the content of the UAPI data for
  non-zero reserved bits in flags, padding fields, and unsupported
  versions. This ensures that userspace does not break in the future
  when these fields or flags are put to use.

- The vendor IOMMU driver checks argsz based on vendor flags. UAPI data
  is consumed based on flags. The vendor driver has access to the
  unadulterated argsz value in case of vendor-specific future
  extensions. Currently, it does not perform the copy_from_user()
  itself. A __user pointer can be provided in some future scenarios
  where there is vendor data outside of the structure definition.
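
The generic content check can be sketched against the header of ``struct iommu_cache_invalidate_info`` shown earlier. This is an illustration, not the kernel's implementation: ``struct demo_inv_hdr`` and ``demo_check_inv_hdr()`` are hypothetical, the union is omitted, and the valid cache-type mask is derived from ``IOMMU_CACHE_INV_TYPE_NR``. Rejecting non-zero reserved bits and padding today is what allows a future kernel to give them meaning without misreading old callers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the structure definition above. */
#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
#define IOMMU_CACHE_INV_TYPE_NR (3)

/* Hypothetical header-only view of iommu_cache_invalidate_info. */
struct demo_inv_hdr {
	uint32_t argsz;
	uint32_t version;
	uint8_t  cache;
	uint8_t  granularity;
	uint8_t  padding[6];
};

/* Return 0 if the content is acceptable, -EINVAL otherwise. */
static int demo_check_inv_hdr(const struct demo_inv_hdr *h)
{
	const unsigned int valid_cache = (1u << IOMMU_CACHE_INV_TYPE_NR) - 1;
	size_t i;

	if (h->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
		return -EINVAL;            /* unsupported version */
	if (h->cache & ~valid_cache)
		return -EINVAL;            /* reserved cache-type bits set */
	for (i = 0; i < sizeof(h->padding); i++)
		if (h->padding[i])
			return -EINVAL;    /* padding must stay zero */
	return 0;
}
```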

IOMMU code treats UAPI data in two categories:

- Structures that contain vendor data
  (Example: iommu_uapi_cache_invalidate())

- Structures that contain only generic data
  (Example: iommu_uapi_sva_bind_gpasid())

Sharing UAPI with in-kernel users
---------------------------------
For UAPIs that are shared with in-kernel users, a wrapper function is
provided to distinguish the callers. For example:

Userspace caller ::

  int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
                                   struct device *dev,
                                   void __user *udata);

In-kernel caller ::

  int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
                              struct device *dev, ioasid_t ioasid);
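
The wrapper pattern can be sketched in plain C, assuming a hypothetical ``struct demo_unbind_data`` payload and hypothetical ``demo_uapi_unbind_gpasid()`` / ``demo_unbind_gpasid()`` functions that mirror the split above; the real functions also take a domain and a device, and ``memcpy()`` stands in for ``copy_from_user()``. Only the UAPI entry point deals with argsz and reserved flags, so in-kernel callers that already hold a trusted ioasid skip that validation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t demo_ioasid_t;          /* stand-in for ioasid_t */

/* Hypothetical user payload; argsz comes first, as the rules require. */
struct demo_unbind_data {
	uint32_t argsz;
	uint32_t flags;                  /* no flags defined yet */
	uint32_t pasid;
};

static demo_ioasid_t demo_last_unbound;  /* records the effect for testing */

/* In-kernel entry point: trusted callers pass the PASID directly. */
static int demo_unbind_gpasid(demo_ioasid_t ioasid)
{
	demo_last_unbound = ioasid;      /* vendor-specific teardown would go here */
	return 0;
}

/* UAPI entry point: validate untrusted data, then reuse the in-kernel path. */
static int demo_uapi_unbind_gpasid(const void *udata)
{
	struct demo_unbind_data data;
	const size_t minsz = offsetof(struct demo_unbind_data, pasid) +
			     sizeof(data.pasid);

	memcpy(&data.argsz, udata, sizeof(data.argsz));  /* read argsz first */
	if (data.argsz < minsz)
		return -EINVAL;
	memcpy(&data, udata, minsz);
	if (data.flags)                  /* reserved bits must be zero */
		return -EINVAL;
	return demo_unbind_gpasid(data.pasid);
}
```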