| .. SPDX-License-Identifier: GPL-2.0 |
| |
| ================ |
| Page Table Check |
| ================ |
| |
| Introduction |
| ============ |
| |
| Page table check hardens the kernel by ensuring that some types of memory |
| corruption are prevented. |
| |
| Page table check performs extra verification when new pages become accessible |
| from userspace, that is, when their page table entries (PTEs, PMDs, etc.) are |
| added to the page table. |
| |
| In most cases of detected corruption, the kernel is crashed. Page table check |
| has a small performance and memory overhead. Therefore, it is disabled by |
| default, but can optionally be enabled on systems where the extra hardening |
| outweighs the performance cost. Also, because page table check is synchronous, |
| it can help with debugging double-mapping memory corruption issues by crashing |
| the kernel at the time the wrong mapping occurs rather than later, which is |
| often the case with memory corruption bugs. |
| |
| It can also be used to check page table entry flags and dump warnings when |
| illegal combinations of entry flags are detected. Currently, userfaultfd is |
| the only user of this, to sanity check the wr-protect bit against any writable |
| flags. In this case, an illegal flag combination does not cause data corruption |
| immediately, but it makes read-only data writable, which leads to corruption |
| when the page content is later modified. |
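| |
| The wr-protect sanity check described above can be modelled in a few lines of |
| userspace C. This is only a sketch of the idea, not the kernel implementation: |
| the flag bits and the function name are hypothetical, and real page table entry |
| layouts are architecture-specific. |
| |
```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical flag bits, chosen for illustration only. Real PTE layouts
 * differ between architectures.
 */
#define SKETCH_PTE_WRITE   (UINT64_C(1) << 1)  /* entry is writable */
#define SKETCH_PTE_UFFD_WP (UINT64_C(1) << 2)  /* userfaultfd wr-protect bit */

/*
 * Return true when the flag combination is legal, false when an entry is
 * marked both wr-protected and writable. A caller would emit a warning on
 * false rather than crash, since the damage only happens on a later write.
 */
static bool sketch_pte_flags_legal(uint64_t pte_flags)
{
	bool writable = (pte_flags & SKETCH_PTE_WRITE) != 0;
	bool wr_protected = (pte_flags & SKETCH_PTE_UFFD_WP) != 0;

	return !(writable && wr_protected);
}
```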
| |
| Double mapping detection logic |
| ============================== |
| |
| +-------------------+-------------------+-------------------+------------------+ |
| | Current Mapping   | New Mapping       | Permissions       | Rule             | |
| +===================+===================+===================+==================+ |
| | Anonymous         | Anonymous         | Read              | Allow            | |
| +-------------------+-------------------+-------------------+------------------+ |
| | Anonymous         | Anonymous         | Read / Write      | Prohibit         | |
| +-------------------+-------------------+-------------------+------------------+ |
| | Anonymous         | Named             | Any               | Prohibit         | |
| +-------------------+-------------------+-------------------+------------------+ |
| | Named             | Anonymous         | Any               | Prohibit         | |
| +-------------------+-------------------+-------------------+------------------+ |
| | Named             | Named             | Any               | Allow            | |
| +-------------------+-------------------+-------------------+------------------+ |
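| |
| The table above can be read as a small decision function. The following is a |
| userspace sketch of those rules, not the kernel's code: the enum, the function |
| name, and the assumption that the "Permissions" column describes the new |
| mapping being added are all illustrative. |
| |
```c
#include <stdbool.h>

/* Hypothetical names for the two mapping kinds used in the table. */
enum sketch_map_kind {
	SKETCH_MAP_ANON,   /* anonymous mapping */
	SKETCH_MAP_NAMED,  /* named (file-backed) mapping */
};

/*
 * Decide whether a page that already has a mapping of kind "current" may
 * gain another mapping of kind "new_kind". "writable" is assumed to be the
 * permission of the new mapping (true for Read / Write, false for Read).
 */
static bool sketch_double_map_allowed(enum sketch_map_kind current,
				      enum sketch_map_kind new_kind,
				      bool writable)
{
	/* Named + Named: allowed with any permissions. */
	if (current == SKETCH_MAP_NAMED && new_kind == SKETCH_MAP_NAMED)
		return true;

	/* Anonymous + Anonymous: allowed only when read-only. */
	if (current == SKETCH_MAP_ANON && new_kind == SKETCH_MAP_ANON)
		return !writable;

	/* Mixing anonymous and named mappings is always prohibited. */
	return false;
}
```
| |
| In the kernel a "Prohibit" outcome crashes the system; the sketch only returns |
| false so that the rules can be exercised in isolation. |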
| |
| Enabling Page Table Check |
| ========================= |
| |
| To enable page table check: |
| |
| - Build the kernel with PAGE_TABLE_CHECK=y. Note that it can only be enabled on |
|   platforms where ARCH_SUPPORTS_PAGE_TABLE_CHECK is available. |
| |
| - Boot with the 'page_table_check=on' kernel parameter. |
| |
| Optionally, build the kernel with PAGE_TABLE_CHECK_ENFORCED in order to have |
| page table check support without the extra kernel parameter. |
| |
| Implementation notes |
| ==================== |
| |
| We specifically decided not to use VMA information in order to avoid relying on |
| MM states (except for limited "struct page" info). The page table check is a |
| state machine, separate from Linux-MM, that verifies that user-accessible |
| pages are not falsely shared. |
| |
| PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without |
| EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory |
| regions into userspace via /dev/mem. At the same time, pages may change |
| their properties (e.g., from anonymous pages to named pages) while they are |
| still mapped in userspace, leading to "corruption" being detected by the |
| page table check. |
| |
| Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may still be allowed to be mapped via |
| /dev/mem. However, these pages are always considered named pages, so they |
| won't break the logic used in the page table check. |