=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of
the mm_struct. But this approach leads to poor page fault scalability in
multi-threaded applications due to high contention on the lock. To improve
scalability, split page table lock was introduced.

With split page table lock we have a separate per-table lock to serialize
access to the table. At the moment we use split lock for PTE and PMD
tables. Access to higher level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:

- pte_offset_map_lock()
    maps PTE and takes PTE table lock, returns pointer to PTE with
    pointer to its PTE table lock, or returns NULL if no PTE table;
- pte_offset_map_nolock()
    maps PTE, returns pointer to PTE with pointer to its PTE table
    lock (not taken), or returns NULL if no PTE table;
- pte_offset_map()
    maps PTE, returns pointer to PTE, or returns NULL if no PTE table;
- pte_unmap()
    unmaps PTE table;
- pte_unmap_unlock()
    unlocks and unmaps PTE table;
- pte_alloc_map_lock()
    allocates PTE table if needed and takes its lock, returns pointer to
    PTE with pointer to its lock, or returns NULL if allocation failed;
- pmd_lock()
    takes PMD table lock, returns pointer to taken lock;
- pmd_lockptr()
    returns pointer to PMD table lock;

Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it's enabled for PTE
tables and the architecture supports it (see below).

Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. We use split lock only for the PMD
level, not for PUD.

Hugetlb-specific helpers:

- huge_pte_lock()
    takes pmd split lock for PMD_SIZE page, mm->page_table_lock
    otherwise;
- huge_pte_lockptr()
    returns pointer to table lock;

Support of split page table lock by an architecture
===================================================

There is no need to specially enable PTE split page table lock: everything
required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which
must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages, and this field
shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table
levels.

Enabling PMD split lock requires a pagetable_pmd_ctor() call on PMD table
allocation and pagetable_pmd_dtor() on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs on pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pagetable_pte_ctor() and pagetable_pmd_ctor() can fail -- failures
must be handled properly.

page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best
performance, we use a trick:

- if spinlock_t fits into long, we use page->ptl as the spinlock, so we
  can avoid indirect access and save a cache line.
- if size of spinlock_t is bigger than size of long, we use page->ptl as
  a pointer to spinlock_t and allocate it dynamically. This allows using
  split lock with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled, but costs
  one more cache line for indirect access;

The spinlock_t is allocated in pagetable_pte_ctor() for PTE tables and in
pagetable_pmd_ctor() for PMD tables.

Please, never access page->ptl directly -- use the appropriate helper.