| Completions - "wait for completion" barrier APIs |
| ================================================ |
| |
| Introduction: |
| ------------- |
| |
| If you have one or more threads that must wait for some kernel activity |
| to have reached a point or a specific state, completions can provide a |
| race-free solution to this problem. Semantically they are somewhat like a |
| pthread_barrier() and have similar use-cases. |
| |
| Completions are a code synchronization mechanism which is preferable to any |
| misuse of locks/semaphores and busy-loops. Any time you think of using |
| yield() or some quirky msleep(1) loop to allow something else to proceed, |
| you probably want to look into using one of the wait_for_completion*() |
| calls and complete() instead. |
| |
| The advantage of using completions is that they have a well defined, focused |
| purpose which makes it very easy to see the intent of the code, but they |
| also result in more efficient code as all threads can continue execution |
| until the result is actually needed, and both the waiting and the signalling |
| is highly efficient using low level scheduler sleep/wakeup facilities. |
| |
| Completions are built on top of the waitqueue and wakeup infrastructure of |
| the Linux scheduler. The event the threads on the waitqueue are waiting for |
| is reduced to a simple flag in 'struct completion', appropriately called "done". |
| |
| As completions are scheduling related, the code can be found in |
| kernel/sched/completion.c. |
| |
| |
| Usage: |
| ------ |
| |
| There are three main parts to using completions: |
| |
| - the initialization of the 'struct completion' synchronization object |
| - the waiting part through a call to one of the variants of wait_for_completion(), |
| - the signaling side through a call to complete() or complete_all(). |
| |
| There are also some helper functions for checking the state of completions. |
| Note that while initialization must happen first, the waiting and signaling |
| part can happen in any order. I.e. it's entirely normal for a thread |
| to have marked a completion as 'done' before another thread checks whether |
| it has to wait for it. |
| |
| To use completions you need to #include <linux/completion.h> and |
| create a static or dynamic variable of type 'struct completion', |
| which has only two fields: |
| |
| struct completion { |
| unsigned int done; |
| wait_queue_head_t wait; |
| }; |
| |
| This provides the ->wait waitqueue to place tasks on for waiting (if any), and |
| the ->done completion flag for indicating whether it's completed or not. |
| |
| Completions should be named to refer to the event that is being synchronized on. |
| A good example is: |
| |
| wait_for_completion(&early_console_added); |
| |
| complete(&early_console_added); |
| |
| Good, intuitive naming (as always) helps code readability. Naming a completion |
| 'complete' is not helpful unless the purpose is super obvious... |
| |
| |
| Initializing completions: |
| ------------------------- |
| |
| Dynamically allocated completion objects should preferably be embedded in data |
| structures that are assured to be alive for the life-time of the function/driver, |
| to prevent races with asynchronous complete() calls from occurring. |
| |
| Particular care should be taken when using the _timeout() or _killable()/_interruptible() |
| variants of wait_for_completion(), as it must be assured that memory de-allocation |
| does not happen until all related activities (complete() or reinit_completion()) |
| have taken place, even if these wait functions return prematurely due to a timeout |
| or a signal triggering. |
| |
| Initializing of dynamically allocated completion objects is done via a call to |
| init_completion(): |
| |
| init_completion(&dynamic_object->done); |
| |
| In this call we initialize the waitqueue and set ->done to 0, i.e. "not completed" |
| or "not done". |
| |
| The re-initialization function, reinit_completion(), simply resets the |
| ->done field to 0 ("not done"), without touching the waitqueue. |
| Callers of this function must make sure that there are no racy |
| wait_for_completion() calls going on in parallel. |
| |
| Calling init_completion() on the same completion object twice is |
| most likely a bug as it re-initializes the queue to an empty queue and |
| enqueued tasks could get "lost" - use reinit_completion() in that case, |
| but be aware of other races. |
| |
| For static declaration and initialization, macros are available. |
| |
| For static (or global) declarations in file scope you can use DECLARE_COMPLETION(): |
| |
| static DECLARE_COMPLETION(setup_done); |
| DECLARE_COMPLETION(setup_done); |
| |
| Note that in this case the completion is boot time (or module load time) |
| initialized to 'not done' and doesn't require an init_completion() call. |
| |
| When a completion is declared as a local variable within a function, |
| then the initialization should always use DECLARE_COMPLETION_ONSTACK() |
| explicitly, not just to make lockdep happy, but also to make it clear |
| that limited scope had been considered and is intentional: |
| |
| DECLARE_COMPLETION_ONSTACK(setup_done) |
| |
| Note that when using completion objects as local variables you must be |
| acutely aware of the short life time of the function stack: the function |
| must not return to a calling context until all activities (such as waiting |
| threads) have ceased and the completion object is completely unused. |
| |
| To emphasise this again: in particular when using some of the waiting API variants |
| with more complex outcomes, such as the timeout or signalling (_timeout(), |
| _killable() and _interruptible()) variants, the wait might complete |
| prematurely while the object might still be in use by another thread - and a return |
| from the wait_on_completion*() caller function will deallocate the function |
| stack and cause subtle data corruption if a complete() is done in some |
| other thread. Simple testing might not trigger these kinds of races. |
| |
| If unsure, use dynamically allocated completion objects, preferably embedded |
| in some other long lived object that has a boringly long life time which |
| exceeds the life time of any helper threads using the completion object, |
| or has a lock or other synchronization mechanism to make sure complete() |
| is not called on a freed object. |
| |
| A naive DECLARE_COMPLETION() on the stack triggers a lockdep warning. |
| |
| Waiting for completions: |
| ------------------------ |
| |
| For a thread to wait for some concurrent activity to finish, it |
| calls wait_for_completion() on the initialized completion structure: |
| |
| void wait_for_completion(struct completion *done) |
| |
| A typical usage scenario is: |
| |
| CPU#1 CPU#2 |
| |
| struct completion setup_done; |
| |
| init_completion(&setup_done); |
| initialize_work(...,&setup_done,...); |
| |
| /* run non-dependent code */ /* do setup */ |
| |
| wait_for_completion(&setup_done); complete(setup_done); |
| |
| This is not implying any particular order between wait_for_completion() and |
| the call to complete() - if the call to complete() happened before the call |
| to wait_for_completion() then the waiting side simply will continue |
| immediately as all dependencies are satisfied; if not, it will block until |
| completion is signaled by complete(). |
| |
| Note that wait_for_completion() is calling spin_lock_irq()/spin_unlock_irq(), |
| so it can only be called safely when you know that interrupts are enabled. |
| Calling it from IRQs-off atomic contexts will result in hard-to-detect |
| spurious enabling of interrupts. |
| |
| The default behavior is to wait without a timeout and to mark the task as |
| uninterruptible. wait_for_completion() and its variants are only safe |
| in process context (as they can sleep) but not in atomic context, |
| interrupt context, with disabled IRQs, or preemption is disabled - see also |
| try_wait_for_completion() below for handling completion in atomic/interrupt |
| context. |
| |
| As all variants of wait_for_completion() can (obviously) block for a long |
| time depending on the nature of the activity they are waiting for, so in |
| most cases you probably don't want to call this with held mutexes. |
| |
| |
| wait_for_completion*() variants available: |
| ------------------------------------------ |
| |
| The below variants all return status and this status should be checked in |
| most(/all) cases - in cases where the status is deliberately not checked you |
| probably want to make a note explaining this (e.g. see |
| arch/arm/kernel/smp.c:__cpu_up()). |
| |
| A common problem that occurs is to have unclean assignment of return types, |
| so take care to assign return-values to variables of the proper type. |
| |
| Checking for the specific meaning of return values also has been found |
| to be quite inaccurate, e.g. constructs like: |
| |
| if (!wait_for_completion_interruptible_timeout(...)) |
| |
| ... would execute the same code path for successful completion and for the |
| interrupted case - which is probably not what you want. |
| |
| int wait_for_completion_interruptible(struct completion *done) |
| |
| This function marks the task TASK_INTERRUPTIBLE while it is waiting. |
| If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise. |
| |
| unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout) |
| |
| The task is marked as TASK_UNINTERRUPTIBLE and will wait at most 'timeout' |
| jiffies. If a timeout occurs it returns 0, else the remaining time in |
| jiffies (but at least 1). |
| |
| Timeouts are preferably calculated with msecs_to_jiffies() or usecs_to_jiffies(), |
| to make the code largely HZ-invariant. |
| |
| If the returned timeout value is deliberately ignored a comment should probably explain |
| why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc()). |
| |
| long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout) |
| |
| This function passes a timeout in jiffies and marks the task as |
| TASK_INTERRUPTIBLE. If a signal was received it will return -ERESTARTSYS; |
| otherwise it returns 0 if the completion timed out, or the remaining time in |
| jiffies if completion occurred. |
| |
| Further variants include _killable which uses TASK_KILLABLE as the |
| designated tasks state and will return -ERESTARTSYS if it is interrupted, |
| or 0 if completion was achieved. There is a _timeout variant as well: |
| |
| long wait_for_completion_killable(struct completion *done) |
| long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout) |
| |
| The _io variants wait_for_completion_io() behave the same as the non-_io |
| variants, except for accounting waiting time as 'waiting on IO', which has |
| an impact on how the task is accounted in scheduling/IO stats: |
| |
| void wait_for_completion_io(struct completion *done) |
| unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout) |
| |
| |
| Signaling completions: |
| ---------------------- |
| |
| A thread that wants to signal that the conditions for continuation have been |
| achieved calls complete() to signal exactly one of the waiters that it can |
| continue: |
| |
| void complete(struct completion *done) |
| |
| ... or calls complete_all() to signal all current and future waiters: |
| |
| void complete_all(struct completion *done) |
| |
| The signaling will work as expected even if completions are signaled before |
| a thread starts waiting. This is achieved by the waiter "consuming" |
| (decrementing) the done field of 'struct completion'. Waiting threads |
| wakeup order is the same in which they were enqueued (FIFO order). |
| |
| If complete() is called multiple times then this will allow for that number |
| of waiters to continue - each call to complete() will simply increment the |
| done field. Calling complete_all() multiple times is a bug though. Both |
| complete() and complete_all() can be called in IRQ/atomic context safely. |
| |
| There can only be one thread calling complete() or complete_all() on a |
| particular 'struct completion' at any time - serialized through the wait |
| queue spinlock. Any such concurrent calls to complete() or complete_all() |
| probably are a design bug. |
| |
| Signaling completion from IRQ context is fine as it will appropriately |
| lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never |
| sleep. |
| |
| |
| try_wait_for_completion()/completion_done(): |
| -------------------------------------------- |
| |
| The try_wait_for_completion() function will not put the thread on the wait |
| queue but rather returns false if it would need to enqueue (block) the thread, |
| else it consumes one posted completion and returns true. |
| |
| bool try_wait_for_completion(struct completion *done) |
| |
| Finally, to check the state of a completion without changing it in any way, |
| call completion_done(), which returns false if there are no posted |
| completions that were not yet consumed by waiters (implying that there are |
| waiters) and true otherwise; |
| |
| bool completion_done(struct completion *done) |
| |
| Both try_wait_for_completion() and completion_done() are safe to be called in |
| IRQ or atomic context. |