.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys provide a mechanism for enforcing page-based
protections, but without requiring modification of the page tables when an
application changes protection domains.

Pkeys Userspace (PKU) is a feature which can be found on:
 * Intel server CPUs, Skylake and later
 * Intel client CPUs, Tiger Lake (11th Gen Core) and later
 * Future AMD CPUs

Pkeys work by dedicating 4 previously Reserved bits in each page table entry to
a "protection key", giving 16 possible keys.

Protections for each key are defined with a per-CPU user-accessible register
(PKRU).  Each of these is a 32-bit register storing two bits (Access Disable
and Write Disable) for each of 16 keys.
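
The register layout described above can be illustrated with a small decoding
helper.  This is only a sketch: the names pkru_rights(), pkru_allows_read(),
pkru_allows_write() and the PKRU_* macros are hypothetical and are not part of
any kernel or libc API.  It assumes the x86 layout in which key N owns bits
2N (Access Disable) and 2N+1 (Write Disable).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, used for illustration only. */
#define PKRU_NR_PKEYS		16
#define PKRU_BITS_PER_PKEY	2
#define PKRU_AD_BIT		0x1u	/* Access Disable */
#define PKRU_WD_BIT		0x2u	/* Write Disable */

/* Return the two rights bits that a PKRU value holds for one key. */
static inline uint32_t pkru_rights(uint32_t pkru, int pkey)
{
	assert(pkey >= 0 && pkey < PKRU_NR_PKEYS);
	return (pkru >> (pkey * PKRU_BITS_PER_PKEY)) &
	       (PKRU_AD_BIT | PKRU_WD_BIT);
}

/* Data reads are allowed unless Access Disable is set. */
static inline int pkru_allows_read(uint32_t pkru, int pkey)
{
	return !(pkru_rights(pkru, pkey) & PKRU_AD_BIT);
}

/* Data writes are allowed only if neither disable bit is set. */
static inline int pkru_allows_write(uint32_t pkru, int pkey)
{
	return !(pkru_rights(pkru, pkey) & (PKRU_AD_BIT | PKRU_WD_BIT));
}
```

For example, a PKRU value of 0x8 has only the Write Disable bit of key 1 set,
so pkru_allows_read(0x8, 1) is true while pkru_allows_write(0x8, 1) is false.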

Being a CPU register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

There are two instructions (RDPKRU/WRPKRU) for reading and writing to the
register.  The feature is only available in 64-bit mode, even though there is
theoretically space in the PAE PTEs.  These permissions are enforced on data
access only and have no effect on instruction fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

        int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
        int pkey_free(int pkey);
        int pkey_mprotect(unsigned long start, size_t len,
                          unsigned long prot, int pkey);
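
Because the feature depends on both the CPU and the kernel, an application
may want to probe for it before relying on it.  One possible approach,
sketched here, is to attempt an allocation through the raw system calls and
release the key again.  The helper name pkeys_usable() is hypothetical, and
the sketch assumes the libc headers define SYS_pkey_alloc and SYS_pkey_free.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if a protection key can be allocated,
 * 0 otherwise.  A failure may mean that the CPU or kernel lacks pkey
 * support, or simply that all keys are currently in use.
 */
static int pkeys_usable(void)
{
#if defined(SYS_pkey_alloc) && defined(SYS_pkey_free)
	long pkey, ret;

	/* flags must be 0; initial rights of 0 disable nothing */
	pkey = syscall(SYS_pkey_alloc, 0UL, 0UL);
	if (pkey < 0)
		return 0;

	/* Releasing a key that was just allocated is not expected to fail. */
	ret = syscall(SYS_pkey_free, pkey);
	assert(ret == 0);
	return 1;
#else
	return 0;	/* headers too old to know about the syscalls */
#endif
}
```

A caller could use this once at startup and fall back to plain mprotect()
when it returns 0.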

Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key.  In this example WRPKRU is wrapped by a C function
called pkey_set()::

        int real_prot = PROT_READ|PROT_WRITE;
        pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
        ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
        ... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

        pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
        *ptr = foo; // assign something
        pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

When the application frees the memory, it should also free the pkey, since
the key is no longer in use::

        munmap(ptr, PAGE_SIZE);
        pkey_free(pkey);
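
The fragments above can be combined into one function with error handling.
The following is a minimal sketch, not a reference implementation.  It
assumes a glibc that provides the pkey_alloc(), pkey_mprotect(), pkey_set()
and pkey_free() wrappers in <sys/mman.h> (glibc 2.27 or later).  The function
name guarded_store() and its return convention are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: store 'value' in a page that is write-protected
 * by a pkey except during the update itself, then read it back.
 * Returns 0 on success, 1 if no pkey could be allocated, -1 on error.
 */
static int guarded_store(int value, int *readback)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	int ret = -1;
	int pkey;
	int *ptr;

	assert(readback != NULL);

	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	if (pkey < 0)
		return 1;	/* unsupported, or no free keys */

	ptr = mmap(NULL, page, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED)
		goto out_free;

	if (pkey_mprotect(ptr, page, PROT_READ | PROT_WRITE, pkey))
		goto out_unmap;

	if (pkey_set(pkey, 0))			/* open the write window */
		goto out_unmap;
	*ptr = value;
	if (pkey_set(pkey, PKEY_DISABLE_WRITE))	/* close it again */
		goto out_unmap;

	*readback = *ptr;	/* reads remain permitted */
	ret = 0;
out_unmap:
	munmap(ptr, page);
out_free:
	pkey_free(pkey);
	return ret;
}
```

On a system without pkey support the function returns 1 without touching
memory, which lets the caller choose a fallback.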

.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.
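
To make the read-modify-write nature of such a wrapper concrete, here is a
hedged sketch of how one might be structured.  It is not the selftest's code.
The names pkru_update(), rdpkru(), wrpkru() and pkey_set_sketch() are
hypothetical, and the instruction wrappers are only compiled on x86-64.  The
instructions fault on CPUs or kernels without PKU enabled, so the wrapper
should only be called after pkey_alloc() has succeeded.

```c
#include <assert.h>
#include <stdint.h>

#define PKRU_BITS_PER_PKEY	2
#define PKRU_RIGHTS_MASK	0x3u

/* Return 'pkru' with the two bits for 'pkey' replaced by 'rights'. */
static inline uint32_t pkru_update(uint32_t pkru, int pkey, uint32_t rights)
{
	unsigned int shift;

	assert(pkey >= 0 && pkey < 16);
	assert((rights & ~PKRU_RIGHTS_MASK) == 0);

	shift = (unsigned int)pkey * PKRU_BITS_PER_PKEY;
	pkru &= ~(PKRU_RIGHTS_MASK << shift);
	return pkru | (rights << shift);
}

#if defined(__x86_64__)
/* RDPKRU: ECX must be 0; PKRU is returned in EAX and EDX is cleared. */
static inline uint32_t rdpkru(void)
{
	uint32_t eax, edx;

	__asm__ volatile(".byte 0x0f,0x01,0xee"
			 : "=a" (eax), "=d" (edx)
			 : "c" (0));
	(void)edx;
	return eax;
}

/* WRPKRU: writes EAX to PKRU; ECX and EDX must be 0. */
static inline void wrpkru(uint32_t pkru)
{
	__asm__ volatile(".byte 0x0f,0x01,0xef"
			 : : "a" (pkru), "c" (0), "d" (0)
			 : "memory");
}

/* Read PKRU, change one key's rights, write the result back. */
static inline void pkey_set_sketch(int pkey, uint32_t rights)
{
	wrpkru(pkru_update(rdpkru(), pkey, rights));
}
#endif
```

Keeping the bit manipulation in pkru_update() separate from the instructions
means it can be checked on any machine, including ones without PKU.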

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance if you do this::

        mprotect(ptr, size, PROT_NONE);
        something(ptr);

you can expect the same effects with protection keys when doing this::

        pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
        pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
        something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

        *ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

        read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
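
A SIGSEGV handler can tell the two cases apart by inspecting si_code.  The
sketch below is an illustration under stated assumptions.  It installs an
SA_SIGINFO handler, provokes a fault on a PROT_NONE page with plain
mprotect()-style permissions (so it also runs on hardware without pkeys), and
reports the si_code it observed.  The names segv_is_pkey_fault() and
fault_si_code_on_prot_none() are hypothetical.  The fallback definition of
SEGV_PKERR as 4 is taken from the generic siginfo UAPI header.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef SEGV_PKERR
#define SEGV_PKERR 4	/* older libc headers may lack this */
#endif

static sigjmp_buf fault_jmp;
static volatile sig_atomic_t last_si_code;

/* Record why the fault happened, then jump back out of the access. */
static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	last_si_code = info->si_code;
	siglongjmp(fault_jmp, 1);
}

/* True if the fault was caused by a protection key violation. */
static int segv_is_pkey_fault(int si_code)
{
	return si_code == SEGV_PKERR;
}

/*
 * Write to a PROT_NONE page and return the si_code delivered with the
 * resulting SIGSEGV.  Returns 0 if no fault occurred, -1 on setup error.
 */
static int fault_si_code_on_prot_none(void)
{
	struct sigaction sa, old;
	long page = sysconf(_SC_PAGESIZE);
	volatile char *ptr;
	int code = 0;

	assert(page > 0);
	ptr = mmap(NULL, (size_t)page, PROT_NONE,
		   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (ptr == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old)) {
		munmap((void *)ptr, (size_t)page);
		return -1;
	}

	if (sigsetjmp(fault_jmp, 1) == 0)
		*ptr = 1;		/* faults: page is PROT_NONE */
	else
		code = last_si_code;	/* reached via siglongjmp() */

	sigaction(SIGSEGV, &old, NULL);
	munmap((void *)ptr, (size_t)page);
	return code;
}
```

With plain mprotect() permissions the observed code is SEGV_ACCERR.  If the
same page were instead blocked through a pkey, the handler would be expected
to see SEGV_PKERR, for which segv_is_pkey_fault() returns true.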