======================
Userspace verbs access
======================

The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
enables direct userspace access to IB hardware via "verbs," as
described in chapter 11 of the InfiniBand Architecture Specification.

To use the verbs, the libibverbs library, available from
https://github.com/linux-rdma/rdma-core, is required.  libibverbs contains a
device-independent API for using the ib_uverbs interface.
libibverbs also requires an appropriate device-dependent kernel driver
and userspace driver for your InfiniBand hardware.  For example, to use
a Mellanox HCA, you will need the ib_mthca kernel module and the
libmthca userspace driver to be installed.

User-kernel communication
=========================

Userspace communicates with the kernel for slow-path resource
management operations via the /dev/infiniband/uverbsN character
devices.  Fast-path operations are typically performed by writing
directly to hardware registers mmap()ed into userspace, with no
system call or context switch into the kernel.

Commands are sent to the kernel via write()s on these device files.
The ABI is defined in include/uapi/rdma/ib_user_verbs.h.
The structs for commands that require a response from the kernel
contain a 64-bit field used to pass a pointer to an output buffer.
Status is returned to userspace as the return value of the write()
system call.
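
As a rough illustration of the layout described above, the following
Python sketch builds a buffer shaped like a write() payload: a fixed
header followed by a body whose first field is the 64-bit response
pointer.  The header layout (32-bit command, 16-bit in_words, 16-bit
out_words, lengths counted in 32-bit words) and the use of command
number 0 are assumptions that should be checked against
ib_user_verbs.h.  The helper name pack_command is hypothetical.

```python
import ctypes
import struct

# Assumed header layout: u32 command, u16 in_words, u16 out_words.
HDR_FORMAT = "=IHH"
# Assumed unit for in_words/out_words: 32-bit words.
WORD = 4


def pack_command(command, response_buf, payload=b""):
    """Build a hypothetical command: header + response pointer + payload."""
    if len(payload) % WORD:
        raise ValueError("payload must be a multiple of 4 bytes")
    # The 64-bit field carries the userspace address of the output buffer.
    response_ptr = ctypes.addressof(response_buf)
    body = struct.pack("=Q", response_ptr) + payload
    total = struct.calcsize(HDR_FORMAT) + len(body)
    header = struct.pack(
        HDR_FORMAT,
        command,
        total // WORD,                        # whole command, in words
        ctypes.sizeof(response_buf) // WORD,  # output buffer, in words
    )
    return header + body


# Output buffer the kernel would fill in; 8 bytes is an arbitrary size.
resp = ctypes.create_string_buffer(8)
msg = pack_command(0, resp)
```

A real submission would be os.write(fd, msg) on an open
/dev/infiniband/uverbsN descriptor, with the result of the write()
indicating status.  Applications should normally use libibverbs rather
than building commands by hand.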

Resource management
===================

Since creation and destruction of all IB resources are done by
commands passed through a file descriptor, the kernel can keep track
of which resources are attached to a given userspace context.  The
ib_uverbs module maintains idr tables that are used to translate
between kernel pointers and opaque userspace handles, so that kernel
pointers are never exposed to userspace and userspace cannot trick
the kernel into following a bogus pointer.

This also allows the kernel to clean up when a process exits and
prevent one process from touching another process's resources.
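
The handle-translation idea can be sketched with a toy table.  This is
not the kernel's idr implementation, only a small Python model of the
three properties described above: callers see integers rather than
object references, a handle is honoured only for the context that
created it, and everything owned by a context can be released at once.
The class and method names are hypothetical.

```python
import itertools


class HandleTable:
    """Toy stand-in for an idr table: opaque integer handles -> objects."""

    def __init__(self):
        self._next = itertools.count(1)
        self._objects = {}  # handle -> (owner, object)

    def alloc(self, owner, obj):
        """Register obj for owner and return an opaque handle."""
        handle = next(self._next)
        self._objects[handle] = (owner, obj)
        return handle

    def lookup(self, owner, handle):
        """Translate a handle back to its object, rejecting foreign handles."""
        entry = self._objects.get(handle)
        if entry is None or entry[0] != owner:
            raise KeyError("invalid handle")
        return entry[1]

    def release_all(self, owner):
        """Drop every object owned by owner, as on process exit."""
        stale = [h for h, (o, _) in self._objects.items() if o == owner]
        for h in stale:
            del self._objects[h]
        return len(stale)


table = HandleTable()
qp_handle = table.alloc("ctx-a", {"type": "qp"})
```

A lookup of qp_handle on behalf of any context other than "ctx-a"
raises KeyError, and table.release_all("ctx-a") removes the entry.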

Memory pinning
==============

Direct userspace I/O requires that memory regions that are potential
I/O targets be kept resident at the same physical address.  The
ib_uverbs module manages pinning and unpinning memory regions via
get_user_pages() and put_page() calls.  It also accounts for the
amount of memory pinned in the process's pinned_vm, and checks that
unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.

Pages that are pinned multiple times are counted each time they are
pinned, so the value of pinned_vm may be an overestimate of the
number of pages pinned by a process.
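
The overcounting can be shown with a small worked example.  The Python
sketch below models the accounting described above; it pins nothing and
does not read the kernel's pinned_vm counter.  It counts pages per
region the way a per-pin counter would, compares that with the number
of distinct pages, and reads the caller's RLIMIT_MEMLOCK soft limit.
The region addresses and helper names are made up for illustration.

```python
import resource


def pages_in_region(start, length, page_size):
    """Number of pages touched by the byte range [start, start + length)."""
    first = start // page_size
    last = (start + length - 1) // page_size
    return last - first + 1


def memlock_limit_bytes():
    """Soft RLIMIT_MEMLOCK in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return None if soft == resource.RLIM_INFINITY else soft


PAGE = resource.getpagesize()

# Two hypothetical registrations that overlap on two pages.
regions = [(0, 3 * PAGE), (PAGE, 2 * PAGE)]

# Counted once per pin, as the text describes for pinned_vm.
accounted = sum(pages_in_region(s, n, PAGE) for s, n in regions)

# Pages that are actually distinct.
distinct = len({
    page
    for s, n in regions
    for page in range(s // PAGE, (s + n - 1) // PAGE + 1)
})

limit = memlock_limit_bytes()
```

Here accounted is 5 pages while distinct is 3, so under this model a
process that registers overlapping regions could reach its
RLIMIT_MEMLOCK limit sooner than the distinct page count suggests.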

/dev files
==========

To create the appropriate character device files automatically with
udev, a rule like::

    KERNEL=="uverbs*", NAME="infiniband/%k"

can be used.  This will create device nodes named::

    /dev/infiniband/uverbs0

and so on.  Since the InfiniBand userspace verbs should be safe for
use by non-privileged processes, it may be useful to add an
appropriate MODE or GROUP to the udev rule.
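
To check what a rule actually produced, a short script can list the
device nodes together with their permissions.  This is a minimal
sketch that assumes the nodes live under /dev/infiniband as shown
above; the function name is hypothetical, and on a machine without IB
hardware or the ib_uverbs module it simply prints nothing.

```python
import glob
import os
import stat


def list_uverbs_nodes(dev_dir="/dev/infiniband"):
    """Return (path, is_char_device, mode_string, gid) for each uverbs node."""
    nodes = []
    for path in sorted(glob.glob(os.path.join(dev_dir, "uverbs*"))):
        st = os.stat(path)
        nodes.append((
            path,
            stat.S_ISCHR(st.st_mode),     # True for a character device
            stat.filemode(st.st_mode),    # e.g. a string like "crw-rw----"
            st.st_gid,                    # owning group, set by GROUP
        ))
    return nodes


for path, is_chr, mode, gid in list_uverbs_nodes():
    kind = "char-device" if is_chr else "not-a-char-device"
    print(path, kind, mode, "gid=%d" % gid)
```

The mode string and group ID show whether the MODE or GROUP setting in
the udev rule took effect for unprivileged users.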