Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 1 | ======================= |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 2 | Generic Mutex Subsystem |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 3 | ======================= |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 4 | |
| 5 | started by Ingo Molnar <mingo@redhat.com> |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 6 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 7 | updated by Davidlohr Bueso <davidlohr@hp.com> |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 8 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 9 | What are mutexes? |
| 10 | ----------------- |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 11 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 12 | In the Linux kernel, mutexes refer to a particular locking primitive |
| 13 | that enforces serialization on shared memory systems, and not only to |
| 14 | the generic term referring to 'mutual exclusion' found in academia |
| 15 | or similar theoretical text books. Mutexes are sleeping locks which |
| 16 | behave similarly to binary semaphores, and were introduced in 2006[1] |
| 17 | as an alternative to these. This new data structure provided a number |
| 18 | of advantages, including simpler interfaces, and at that time smaller |
| 19 | code (see Disadvantages). |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 20 | |
Alexander A. Klimov | 0288199 | 2020-07-13 13:57:28 +0200 | [diff] [blame] | 21 | [1] https://lwn.net/Articles/164802/ |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 22 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 23 | Implementation |
| 24 | -------------- |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 25 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 26 | Mutexes are represented by 'struct mutex', defined in include/linux/mutex.h |
Juri Lelli | 79e9023 | 2018-02-09 17:01:14 +0100 | [diff] [blame] | 27 | and implemented in kernel/locking/mutex.c. These locks use an atomic variable |
| 28 | (->owner) to keep track of the lock state during its lifetime. Field owner |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 29 | actually contains `struct task_struct *` to the current lock owner and it is |
Juri Lelli | 79e9023 | 2018-02-09 17:01:14 +0100 | [diff] [blame] | 30 | therefore NULL if not currently owned. Since task_struct pointers are aligned |
Randy Dunlap | 7d64394 | 2020-07-03 14:36:48 -0700 | [diff] [blame] | 31 | to at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g., |
Juri Lelli | 79e9023 | 2018-02-09 17:01:14 +0100 | [diff] [blame] | 32 | if waiter list is non-empty). In its most basic form it also includes a |
| 33 | wait-queue and a spinlock that serializes access to it. Furthermore, |
| 34 | CONFIG_MUTEX_SPIN_ON_OWNER=y systems use a spinner MCS lock (->osq), described |
| 35 | below in (ii). |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 36 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 37 | When acquiring a mutex, there are three possible paths that can be |
| 38 | taken, depending on the state of the lock: |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 39 | |
Juri Lelli | 79e9023 | 2018-02-09 17:01:14 +0100 | [diff] [blame] | 40 | (i) fastpath: tries to atomically acquire the lock by cmpxchg()ing the owner with |
| 41 | the current task. This only works in the uncontended case (cmpxchg() checks |
| 42 | against 0UL, so all 3 state bits above have to be 0). If the lock is |
| 43 | contended it goes to the next possible path. |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 44 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 45 | (ii) midpath: aka optimistic spinning, tries to spin for acquisition |
| 46 | while the lock owner is running and there are no other tasks ready |
| 47 | to run that have higher priority (need_resched). The rationale is |
| 48 | that if the lock owner is running, it is likely to release the lock |
| 49 | soon. The mutex spinners are queued up using MCS lock so that only |
| 50 | one spinner can compete for the mutex. |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 51 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 52 | The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spinlock |
| 53 | with the desirable properties of being fair and with each cpu trying |
| 54 | to acquire the lock spinning on a local variable. It avoids expensive |
| 55 | cacheline bouncing that common test-and-set spinlock implementations |
| 56 | incur. An MCS-like lock is specially tailored for optimistic spinning |
| 57 | for sleeping lock implementation. An important feature of the customized |
| 58 | MCS lock is that it has the extra property that spinners are able to exit |
| 59 | the MCS spinlock queue when they need to reschedule. This further helps |
| 60 | avoid situations where MCS spinners that need to reschedule would continue |
| 61 | waiting to spin on mutex owner, only to go directly to slowpath upon |
| 62 | obtaining the MCS lock. |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 63 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 64 | |
| 65 | (iii) slowpath: last resort, if the lock is still unable to be acquired, |
| 66 | the task is added to the wait-queue and sleeps until woken up by the |
| 67 | unlock path. Under normal circumstances it blocks as TASK_UNINTERRUPTIBLE. |
| 68 | |
| 69 | While formally kernel mutexes are sleepable locks, it is path (ii) that |
| 70 | makes them more practically a hybrid type. By simply not interrupting a |
| 71 | task and busy-waiting for a few cycles instead of immediately sleeping, |
| 72 | the performance of this lock has been seen to significantly improve a |
| 73 | number of workloads. Note that this technique is also used for rw-semaphores. |
| 74 | |
| 75 | Semantics |
| 76 | --------- |
| 77 | |
| 78 | The mutex subsystem checks and enforces the following rules: |
| 79 | |
| 80 | - Only one task can hold the mutex at a time. |
| 81 | - Only the owner can unlock the mutex. |
| 82 | - Multiple unlocks are not permitted. |
| 83 | - Recursive locking/unlocking is not permitted. |
| 84 | - A mutex must only be initialized via the API (see below). |
| 85 | - A task may not exit with a mutex held. |
| 86 | - Memory areas where held locks reside must not be freed. |
| 87 | - Held mutexes must not be reinitialized. |
| 88 | - Mutexes may not be used in hardware or software interrupt |
| 89 | contexts such as tasklets and timers. |
| 90 | |
| 91 | These semantics are fully enforced when CONFIG DEBUG_MUTEXES is enabled. |
| 92 | In addition, the mutex debugging code also implements a number of other |
| 93 | features that make lock debugging easier and faster: |
| 94 | |
| 95 | - Uses symbolic names of mutexes, whenever they are printed |
| 96 | in debug output. |
| 97 | - Point-of-acquire tracking, symbolic lookup of function names, |
| 98 | list of all locks held in the system, printout of them. |
| 99 | - Owner tracking. |
| 100 | - Detects self-recursing locks and prints out all relevant info. |
| 101 | - Detects multi-task circular deadlocks and prints out all affected |
| 102 | locks and tasks (and only those tasks). |
| 103 | |
Ingo Molnar | 2b9d9e0 | 2024-01-08 09:31:16 +0100 | [diff] [blame] | 104 | Mutexes - and most other sleeping locks like rwsems - do not provide an |
| 105 | implicit reference for the memory they occupy, which reference is released |
| 106 | with mutex_unlock(). |
| 107 | |
| 108 | [ This is in contrast with spin_unlock() [or completion_done()], which |
| 109 | APIs can be used to guarantee that the memory is not touched by the |
| 110 | lock implementation after spin_unlock()/completion_done() releases |
| 111 | the lock. ] |
| 112 | |
| 113 | mutex_unlock() may access the mutex structure even after it has internally |
| 114 | released the lock already - so it's not safe for another context to |
| 115 | acquire the mutex and assume that the mutex_unlock() context is not using |
| 116 | the structure anymore. |
| 117 | |
| 118 | The mutex user must ensure that the mutex is not destroyed while a |
| 119 | release operation is still in progress - in other words, callers of |
| 120 | mutex_unlock() must ensure that the mutex stays alive until mutex_unlock() |
| 121 | has returned. |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 122 | |
| 123 | Interfaces |
| 124 | ---------- |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 125 | Statically define the mutex:: |
| 126 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 127 | DEFINE_MUTEX(name); |
| 128 | |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 129 | Dynamically initialize the mutex:: |
| 130 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 131 | mutex_init(mutex); |
| 132 | |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 133 | Acquire the mutex, uninterruptible:: |
| 134 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 135 | void mutex_lock(struct mutex *lock); |
| 136 | void mutex_lock_nested(struct mutex *lock, unsigned int subclass); |
| 137 | int mutex_trylock(struct mutex *lock); |
| 138 | |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 139 | Acquire the mutex, interruptible:: |
| 140 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 141 | int mutex_lock_interruptible_nested(struct mutex *lock, |
| 142 | unsigned int subclass); |
| 143 | int mutex_lock_interruptible(struct mutex *lock); |
| 144 | |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 145 | Acquire the mutex, interruptible, if dec to 0:: |
| 146 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 147 | int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock); |
| 148 | |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 149 | Unlock the mutex:: |
| 150 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 151 | void mutex_unlock(struct mutex *lock); |
| 152 | |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 153 | Test if the mutex is taken:: |
| 154 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 155 | int mutex_is_locked(struct mutex *lock); |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 156 | |
| 157 | Disadvantages |
| 158 | ------------- |
| 159 | |
Juri Lelli | 79e9023 | 2018-02-09 17:01:14 +0100 | [diff] [blame] | 160 | Unlike its original design and purpose, 'struct mutex' is among the largest |
| 161 | locks in the kernel. E.g: on x86-64 it is 32 bytes, where 'struct semaphore' |
| 162 | is 24 bytes and rw_semaphore is 40 bytes. Larger structure sizes mean more CPU |
| 163 | cache and memory footprint. |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 164 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 165 | When to use mutexes |
| 166 | ------------------- |
Ingo Molnar | f3f54ff | 2006-01-09 15:59:20 -0800 | [diff] [blame] | 167 | |
Davidlohr Bueso | 9161f54 | 2014-05-28 21:36:43 -0700 | [diff] [blame] | 168 | Unless the strict semantics of mutexes are unsuitable and/or the critical |
| 169 | region prevents the lock from being shared, always prefer them to any other |
| 170 | locking primitive. |