| CONTROL DEPENDENCIES |
| ==================== |
| |
| A major difficulty with control dependencies is that current compilers |
| do not support them. One purpose of this document is therefore to |
| help you prevent your compiler from breaking your code. However, |
| control dependencies also pose other challenges, which leads to the |
| second purpose of this document, namely to help you to avoid breaking |
| your own code, even in the absence of help from your compiler. |
| |
| One such challenge is that control dependencies order only later stores. |
| Therefore, a load-load control dependency will not preserve ordering |
| unless a read memory barrier is provided. Consider the following code: |
| |
| q = READ_ONCE(a); |
| if (q) |
| p = READ_ONCE(b); |
| |
| This is not guaranteed to provide any ordering because some types of CPUs |
| are permitted to predict the result of the load from "b". This prediction |
| can cause other CPUs to see this load as having happened before the load |
| from "a". This means that an explicit read barrier is required, for example |
| as follows: |
| |
| q = READ_ONCE(a); |
| if (q) { |
| smp_rmb(); |
| p = READ_ONCE(b); |
| } |
| |
| However, stores are not speculated. This means that ordering is |
| (usually) guaranteed for load-store control dependencies, as in the |
| following example: |
| |
| q = READ_ONCE(a); |
| if (q) |
| WRITE_ONCE(b, 1); |
| |
| Control dependencies can pair with each other and with other types |
| of ordering. But please note that neither the READ_ONCE() nor the |
| WRITE_ONCE() are optional. Without the READ_ONCE(), the compiler might |
| fuse the load from "a" with other loads. Without the WRITE_ONCE(), |
| the compiler might fuse the store to "b" with other stores. Worse yet, |
| the compiler might convert the store into a load and a check followed |
| by a store, and this compiler-generated load would not be ordered by |
| the control dependency. |
| |
| Furthermore, if the compiler is able to prove that the value of variable |
| "a" is always non-zero, it would be well within its rights to optimize |
| the original example by eliminating the "if" statement as follows: |
| |
| q = a; |
| b = 1; /* BUG: Compiler and CPU can both reorder!!! */ |
| |
| So don't leave out either the READ_ONCE() or the WRITE_ONCE(). |
| In particular, although READ_ONCE() does force the compiler to emit a |
| load, it does *not* force the compiler to actually use the loaded value. |
| |
| It is tempting to try use control dependencies to enforce ordering on |
| identical stores on both branches of the "if" statement as follows: |
| |
| q = READ_ONCE(a); |
| if (q) { |
| barrier(); |
| WRITE_ONCE(b, 1); |
| do_something(); |
| } else { |
| barrier(); |
| WRITE_ONCE(b, 1); |
| do_something_else(); |
| } |
| |
| Unfortunately, current compilers will transform this as follows at high |
| optimization levels: |
| |
| q = READ_ONCE(a); |
| barrier(); |
| WRITE_ONCE(b, 1); /* BUG: No ordering vs. load from a!!! */ |
| if (q) { |
| /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */ |
| do_something(); |
| } else { |
| /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */ |
| do_something_else(); |
| } |
| |
| Now there is no conditional between the load from "a" and the store to |
| "b", which means that the CPU is within its rights to reorder them: The |
| conditional is absolutely required, and must be present in the final |
| assembly code, after all of the compiler and link-time optimizations |
| have been applied. Therefore, if you need ordering in this example, |
| you must use explicit memory ordering, for example, smp_store_release(): |
| |
| q = READ_ONCE(a); |
| if (q) { |
| smp_store_release(&b, 1); |
| do_something(); |
| } else { |
| smp_store_release(&b, 1); |
| do_something_else(); |
| } |
| |
| Without explicit memory ordering, control-dependency-based ordering is |
| guaranteed only when the stores differ, for example: |
| |
| q = READ_ONCE(a); |
| if (q) { |
| WRITE_ONCE(b, 1); |
| do_something(); |
| } else { |
| WRITE_ONCE(b, 2); |
| do_something_else(); |
| } |
| |
| The initial READ_ONCE() is still required to prevent the compiler from |
| knowing too much about the value of "a". |
| |
| But please note that you need to be careful what you do with the local |
| variable "q", otherwise the compiler might be able to guess the value |
| and again remove the conditional branch that is absolutely required to |
| preserve ordering. For example: |
| |
| q = READ_ONCE(a); |
| if (q % MAX) { |
| WRITE_ONCE(b, 1); |
| do_something(); |
| } else { |
| WRITE_ONCE(b, 2); |
| do_something_else(); |
| } |
| |
| If MAX is compile-time defined to be 1, then the compiler knows that |
| (q % MAX) must be equal to zero, regardless of the value of "q". |
| The compiler is therefore within its rights to transform the above code |
| into the following: |
| |
| q = READ_ONCE(a); |
| WRITE_ONCE(b, 2); |
| do_something_else(); |
| |
| Given this transformation, the CPU is not required to respect the ordering |
| between the load from variable "a" and the store to variable "b". It is |
| tempting to add a barrier(), but this does not help. The conditional |
| is gone, and the barrier won't bring it back. Therefore, if you need |
| to relying on control dependencies to produce this ordering, you should |
| make sure that MAX is greater than one, perhaps as follows: |
| |
| q = READ_ONCE(a); |
| BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */ |
| if (q % MAX) { |
| WRITE_ONCE(b, 1); |
| do_something(); |
| } else { |
| WRITE_ONCE(b, 2); |
| do_something_else(); |
| } |
| |
| Please note once again that each leg of the "if" statement absolutely |
| must store different values to "b". As in previous examples, if the two |
| values were identical, the compiler could pull this store outside of the |
| "if" statement, destroying the control dependency's ordering properties. |
| |
| You must also be careful avoid relying too much on boolean short-circuit |
| evaluation. Consider this example: |
| |
| q = READ_ONCE(a); |
| if (q || 1 > 0) |
| WRITE_ONCE(b, 1); |
| |
| Because the first condition cannot fault and the second condition is |
| always true, the compiler can transform this example as follows, again |
| destroying the control dependency's ordering: |
| |
| q = READ_ONCE(a); |
| WRITE_ONCE(b, 1); |
| |
| This is yet another example showing the importance of preventing the |
| compiler from out-guessing your code. Again, although READ_ONCE() really |
| does force the compiler to emit code for a given load, the compiler is |
| within its rights to discard the loaded value. |
| |
| In addition, control dependencies apply only to the then-clause and |
| else-clause of the "if" statement in question. In particular, they do |
| not necessarily order the code following the entire "if" statement: |
| |
| q = READ_ONCE(a); |
| if (q) { |
| WRITE_ONCE(b, 1); |
| } else { |
| WRITE_ONCE(b, 2); |
| } |
| WRITE_ONCE(c, 1); /* BUG: No ordering against the read from "a". */ |
| |
| It is tempting to argue that there in fact is ordering because the |
| compiler cannot reorder volatile accesses and also cannot reorder |
| the writes to "b" with the condition. Unfortunately for this line |
| of reasoning, the compiler might compile the two writes to "b" as |
| conditional-move instructions, as in this fanciful pseudo-assembly |
| language: |
| |
| ld r1,a |
| cmp r1,$0 |
| cmov,ne r4,$1 |
| cmov,eq r4,$2 |
| st r4,b |
| st $1,c |
| |
| The control dependencies would then extend only to the pair of cmov |
| instructions and the store depending on them. This means that a weakly |
| ordered CPU would have no dependency of any sort between the load from |
| "a" and the store to "c". In short, control dependencies provide ordering |
| only to the stores in the then-clause and else-clause of the "if" statement |
| in question (including functions invoked by those two clauses), and not |
| to code following that "if" statement. |
| |
| |
| In summary: |
| |
| (*) Control dependencies can order prior loads against later stores. |
| However, they do *not* guarantee any other sort of ordering: |
| Not prior loads against later loads, nor prior stores against |
| later anything. If you need these other forms of ordering, use |
| smp_load_acquire(), smp_store_release(), or, in the case of prior |
| stores and later loads, smp_mb(). |
| |
| (*) If both legs of the "if" statement contain identical stores to |
| the same variable, then you must explicitly order those stores, |
| either by preceding both of them with smp_mb() or by using |
| smp_store_release(). Please note that it is *not* sufficient to use |
| barrier() at beginning and end of each leg of the "if" statement |
| because, as shown by the example above, optimizing compilers can |
| destroy the control dependency while respecting the letter of the |
| barrier() law. |
| |
| (*) Control dependencies require at least one run-time conditional |
| between the prior load and the subsequent store, and this |
| conditional must involve the prior load. If the compiler is able |
| to optimize the conditional away, it will have also optimized |
| away the ordering. Careful use of READ_ONCE() and WRITE_ONCE() |
| can help to preserve the needed conditional. |
| |
| (*) Control dependencies require that the compiler avoid reordering the |
| dependency into nonexistence. Careful use of READ_ONCE() or |
| atomic{,64}_read() can help to preserve your control dependency. |
| |
| (*) Control dependencies apply only to the then-clause and else-clause |
| of the "if" statement containing the control dependency, including |
| any functions that these two clauses call. Control dependencies |
| do *not* apply to code beyond the end of that "if" statement. |
| |
| (*) Control dependencies pair normally with other types of barriers. |
| |
| (*) Control dependencies do *not* provide multicopy atomicity. If you |
| need all the CPUs to agree on the ordering of a given store against |
| all other accesses, use smp_mb(). |
| |
| (*) Compilers do not understand control dependencies. It is therefore |
| your job to ensure that they do not break your code. |