Liam R. Howlett | 54a611b | 2022-09-06 19:48:39 +0000 | [diff] [blame] | 1 | .. SPDX-License-Identifier: GPL-2.0+ |
| 2 | |
| 3 | |
| 4 | ========== |
| 5 | Maple Tree |
| 6 | ========== |
| 7 | |
| 8 | :Author: Liam R. Howlett |
| 9 | |
| 10 | Overview |
| 11 | ======== |
| 12 | |
| 13 | The Maple Tree is a B-Tree data type which is optimized for storing |
| 14 | non-overlapping ranges, including ranges of size 1. The tree was designed to |
| 15 | be simple to use and does not require a user written search method. It |
| 16 | supports iterating over a range of entries and going to the previous or next |
| 17 | entry in a cache-efficient manner. The tree can also be put into an RCU-safe |
| 18 | mode of operation which allows reading and writing concurrently. Writers must |
| 19 | synchronize on a lock, which can be the default spinlock, or the user can set |
| 20 | the lock to an external lock of a different type. |
| 21 | |
| 22 | The Maple Tree maintains a small memory footprint and was designed to use |
| 23 | modern processor cache efficiently. The majority of the users will be able to |
| 24 | use the normal API. An :ref:`maple-tree-advanced-api` exists for more complex |
| 25 | scenarios. The most important usage of the Maple Tree is the tracking of the |
| 26 | virtual memory areas. |
| 27 | |
| 28 | The Maple Tree can store values between ``0`` and ``ULONG_MAX``. The Maple |
| 29 | Tree reserves values with the bottom two bits set to '10' which are below 4096 |
| 30 | (ie 2, 6, 10 .. 4094) for internal use. If the entries may use reserved |
| 31 | entries then the users can convert the entries using xa_mk_value() and convert |
| 32 | them back by calling xa_to_value(). If the user needs to use a reserved |
| 33 | value, then the user can convert the value when using the |
| 34 | :ref:`maple-tree-advanced-api`, but are blocked by the normal API. |
| 35 | |
| 36 | The Maple Tree can also be configured to support searching for a gap of a given |
| 37 | size (or larger). |
| 38 | |
| 39 | Pre-allocating of nodes is also supported using the |
| 40 | :ref:`maple-tree-advanced-api`. This is useful for users who must guarantee a |
| 41 | successful store operation within a given |
| 42 | code segment when allocating cannot be done. Allocations of nodes are |
| 43 | relatively small at around 256 bytes. |
| 44 | |
| 45 | .. _maple-tree-normal-api: |
| 46 | |
| 47 | Normal API |
| 48 | ========== |
| 49 | |
| 50 | Start by initialising a maple tree, either with DEFINE_MTREE() for statically |
| 51 | allocated maple trees or mt_init() for dynamically allocated ones. A |
| 52 | freshly-initialised maple tree contains a ``NULL`` pointer for the range ``0`` |
| 53 | - ``ULONG_MAX``. There are currently two types of maple trees supported: the |
| 54 | allocation tree and the regular tree. The regular tree has a higher branching |
| 55 | factor for internal nodes. The allocation tree has a lower branching factor |
| 56 | but allows the user to search for a gap of a given size or larger from either |
| 57 | ``0`` upwards or ``ULONG_MAX`` down. An allocation tree can be used by |
| 58 | passing in the ``MT_FLAGS_ALLOC_RANGE`` flag when initialising the tree. |
| 59 | |
| 60 | You can then set entries using mtree_store() or mtree_store_range(). |
| 61 | mtree_store() will overwrite any entry with the new entry and return 0 on |
| 62 | success or an error code otherwise. mtree_store_range() works in the same way |
| 63 | but takes a range. mtree_load() is used to retrieve the entry stored at a |
| 64 | given index. You can use mtree_erase() to erase an entire range by only |
| 65 | knowing one value within that range, or mtree_store() call with an entry of |
| 66 | NULL may be used to partially erase a range or many ranges at once. |
| 67 | |
| 68 | If you want to only store a new entry to a range (or index) if that range is |
| 69 | currently ``NULL``, you can use mtree_insert_range() or mtree_insert() which |
| 70 | return -EEXIST if the range is not empty. |
| 71 | |
| 72 | You can search for an entry from an index upwards by using mt_find(). |
| 73 | |
| 74 | You can walk each entry within a range by calling mt_for_each(). You must |
| 75 | provide a temporary variable to store a cursor. If you want to walk each |
| 76 | element of the tree then ``0`` and ``ULONG_MAX`` may be used as the range. If |
| 77 | the caller is going to hold the lock for the duration of the walk then it is |
| 78 | worth looking at the mas_for_each() API in the :ref:`maple-tree-advanced-api` |
| 79 | section. |
| 80 | |
| 81 | Sometimes it is necessary to ensure the next call to store to a maple tree does |
| 82 | not allocate memory, please see :ref:`maple-tree-advanced-api` for this use case. |
| 83 | |
Peng Zhang | 9bc1d3c | 2023-10-27 11:38:41 +0800 | [diff] [blame] | 84 | You can use mtree_dup() to duplicate an entire maple tree. It is a more |
| 85 | efficient way than inserting all elements one by one into a new tree. |
| 86 | |
Liam R. Howlett | 54a611b | 2022-09-06 19:48:39 +0000 | [diff] [blame] | 87 | Finally, you can remove all entries from a maple tree by calling |
| 88 | mtree_destroy(). If the maple tree entries are pointers, you may wish to free |
| 89 | the entries first. |
| 90 | |
| 91 | Allocating Nodes |
| 92 | ---------------- |
| 93 | |
| 94 | The allocations are handled by the internal tree code. See |
| 95 | :ref:`maple-tree-advanced-alloc` for other options. |
| 96 | |
| 97 | Locking |
| 98 | ------- |
| 99 | |
| 100 | You do not have to worry about locking. See :ref:`maple-tree-advanced-locks` |
| 101 | for other options. |
| 102 | |
| 103 | The Maple Tree uses RCU and an internal spinlock to synchronise access: |
| 104 | |
| 105 | Takes RCU read lock: |
| 106 | * mtree_load() |
| 107 | * mt_find() |
| 108 | * mt_for_each() |
| 109 | * mt_next() |
| 110 | * mt_prev() |
| 111 | |
| 112 | Takes ma_lock internally: |
| 113 | * mtree_store() |
| 114 | * mtree_store_range() |
| 115 | * mtree_insert() |
| 116 | * mtree_insert_range() |
| 117 | * mtree_erase() |
Peng Zhang | 9bc1d3c | 2023-10-27 11:38:41 +0800 | [diff] [blame] | 118 | * mtree_dup() |
Liam R. Howlett | 54a611b | 2022-09-06 19:48:39 +0000 | [diff] [blame] | 119 | * mtree_destroy() |
| 120 | * mt_set_in_rcu() |
| 121 | * mt_clear_in_rcu() |
| 122 | |
| 123 | If you want to take advantage of the internal lock to protect the data |
| 124 | structures that you are storing in the Maple Tree, you can call mtree_lock() |
| 125 | before calling mtree_load(), then take a reference count on the object you |
| 126 | have found before calling mtree_unlock(). This will prevent stores from |
| 127 | removing the object from the tree between looking up the object and |
| 128 | incrementing the refcount. You can also use RCU to avoid dereferencing |
| 129 | freed memory, but an explanation of that is beyond the scope of this |
| 130 | document. |
| 131 | |
| 132 | .. _maple-tree-advanced-api: |
| 133 | |
| 134 | Advanced API |
| 135 | ============ |
| 136 | |
| 137 | The advanced API offers more flexibility and better performance at the |
| 138 | cost of an interface which can be harder to use and has fewer safeguards. |
| 139 | You must take care of your own locking while using the advanced API. |
| 140 | You can use the ma_lock, RCU or an external lock for protection. |
| 141 | You can mix advanced and normal operations on the same array, as long |
| 142 | as the locking is compatible. The :ref:`maple-tree-normal-api` is implemented |
| 143 | in terms of the advanced API. |
| 144 | |
| 145 | The advanced API is based around the ma_state, this is where the 'mas' |
| 146 | prefix originates. The ma_state struct keeps track of tree operations to make |
| 147 | life easier for both internal and external tree users. |
| 148 | |
| 149 | Initialising the maple tree is the same as in the :ref:`maple-tree-normal-api`. |
| 150 | Please see above. |
| 151 | |
| 152 | The maple state keeps track of the range start and end in mas->index and |
| 153 | mas->last, respectively. |
| 154 | |
| 155 | mas_walk() will walk the tree to the location of mas->index and set the |
| 156 | mas->index and mas->last according to the range for the entry. |
| 157 | |
| 158 | You can set entries using mas_store(). mas_store() will overwrite any entry |
| 159 | with the new entry and return the first existing entry that is overwritten. |
| 160 | The range is passed in as members of the maple state: index and last. |
| 161 | |
| 162 | You can use mas_erase() to erase an entire range by setting index and |
| 163 | last of the maple state to the desired range to erase. This will erase |
| 164 | the first range that is found in that range, set the maple state index |
| 165 | and last as the range that was erased and return the entry that existed |
| 166 | at that location. |
| 167 | |
| 168 | You can walk each entry within a range by using mas_for_each(). If you want |
| 169 | to walk each element of the tree then ``0`` and ``ULONG_MAX`` may be used as |
| 170 | the range. If the lock needs to be periodically dropped, see the locking |
| 171 | section mas_pause(). |
| 172 | |
| 173 | Using a maple state allows mas_next() and mas_prev() to function as if the |
| 174 | tree was a linked list. With such a high branching factor the amortized |
| 175 | performance penalty is outweighed by cache optimization. mas_next() will |
| 176 | return the next entry which occurs after the entry at index. mas_prev() |
| 177 | will return the previous entry which occurs before the entry at index. |
| 178 | |
| 179 | mas_find() will find the first entry which exists at or above index on |
| 180 | the first call, and the next entry from every subsequent calls. |
| 181 | |
Tom Yang | 9e1b016 | 2023-10-23 17:57:37 +0800 | [diff] [blame] | 182 | mas_find_rev() will find the first entry which exists at or below the last on |
Liam R. Howlett | 54a611b | 2022-09-06 19:48:39 +0000 | [diff] [blame] | 183 | the first call, and the previous entry from every subsequent calls. |
| 184 | |
| 185 | If the user needs to yield the lock during an operation, then the maple state |
| 186 | must be paused using mas_pause(). |
| 187 | |
| 188 | There are a few extra interfaces provided when using an allocation tree. |
| 189 | If you wish to search for a gap within a range, then mas_empty_area() |
| 190 | or mas_empty_area_rev() can be used. mas_empty_area() searches for a gap |
| 191 | starting at the lowest index given up to the maximum of the range. |
| 192 | mas_empty_area_rev() searches for a gap starting at the highest index given |
| 193 | and continues downward to the lower bound of the range. |
| 194 | |
| 195 | .. _maple-tree-advanced-alloc: |
| 196 | |
| 197 | Advanced Allocating Nodes |
| 198 | ------------------------- |
| 199 | |
| 200 | Allocations are usually handled internally to the tree, however if allocations |
| 201 | need to occur before a write occurs then calling mas_expected_entries() will |
| 202 | allocate the worst-case number of needed nodes to insert the provided number of |
| 203 | ranges. This also causes the tree to enter mass insertion mode. Once |
| 204 | insertions are complete calling mas_destroy() on the maple state will free the |
| 205 | unused allocations. |
| 206 | |
| 207 | .. _maple-tree-advanced-locks: |
| 208 | |
| 209 | Advanced Locking |
| 210 | ---------------- |
| 211 | |
| 212 | The maple tree uses a spinlock by default, but external locks can be used for |
| 213 | tree updates as well. To use an external lock, the tree must be initialized |
| 214 | with the ``MT_FLAGS_LOCK_EXTERN flag``, this is usually done with the |
| 215 | MTREE_INIT_EXT() #define, which takes an external lock as an argument. |
| 216 | |
| 217 | Functions and structures |
| 218 | ======================== |
| 219 | |
| 220 | .. kernel-doc:: include/linux/maple_tree.h |
| 221 | .. kernel-doc:: lib/maple_tree.c |