| This document gives an overview of the categories of memory-ordering | 
 | operations provided by the Linux-kernel memory model (LKMM). | 
 |  | 
 |  | 
 | Categories of Ordering | 
 | ====================== | 
 |  | 
 | This section lists LKMM's three top-level categories of memory-ordering | 
 | operations in decreasing order of strength: | 
 |  | 
 | 1.	Barriers (also known as "fences").  A barrier orders some or | 
 | 	all of the CPU's prior operations against some or all of its | 
 | 	subsequent operations. | 
 |  | 
 | 2.	Ordered memory accesses.  These operations order themselves | 
 | 	against some or all of the CPU's prior accesses or some or all | 
 | 	of the CPU's subsequent accesses, depending on the subcategory | 
 | 	of the operation. | 
 |  | 
 | 3.	Unordered accesses, as the name indicates, have no ordering | 
 | 	properties except to the extent that they interact with an | 
 | 	operation in the previous categories.  This being the real world, | 
 | 	some of these "unordered" operations provide limited ordering | 
 | 	in some special situations. | 
 |  | 
 | Each of the above categories is described in more detail by one of the | 
 | following sections. | 
 |  | 
 |  | 
 | Barriers | 
 | ======== | 
 |  | 
 | Each of the following categories of barriers is described in its own | 
 | subsection below: | 
 |  | 
 | a.	Full memory barriers. | 
 |  | 
 | b.	Read-modify-write (RMW) ordering augmentation barriers. | 
 |  | 
 | c.	Write memory barrier. | 
 |  | 
 | d.	Read memory barrier. | 
 |  | 
 | e.	Compiler barrier. | 
 |  | 
 | Note well that many of these primitives generate absolutely no code | 
 | in kernels built with CONFIG_SMP=n.  Therefore, if you are writing | 
 | a device driver, which must correctly order accesses to a physical | 
 | device even in kernels built with CONFIG_SMP=n, please use the | 
 | ordering primitives provided for that purpose.  For example, instead of | 
 | smp_mb(), use mb().  See the "Linux Kernel Device Drivers" book or the | 
 | https://lwn.net/Articles/698014/ article for more information. | 
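
For example, here is a minimal sketch of this distinction, assuming a
hypothetical driver in which a descriptor "desc" residing in normal
memory must be visible to the device before the doorbell register
"dev->doorbell" is written.  The names "desc", "READY", and
"dev->doorbell" are made up for illustration:

	desc->cmd = READY;		// Fill in descriptor in normal memory.
	mb();				// Order descriptor update before doorbell
					// write, even in CONFIG_SMP=n kernels.
	writel(1, dev->doorbell);	// Tell the device to process the descriptor.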
 |  | 
 |  | 
 | Full Memory Barriers | 
 | -------------------- | 
 |  | 
 | The Linux-kernel primitives that provide full ordering include: | 
 |  | 
 | o	The smp_mb() full memory barrier. | 
 |  | 
 | o	Value-returning RMW atomic operations whose names do not end in | 
 | 	_acquire, _release, or _relaxed. | 
 |  | 
 | o	RCU's grace-period primitives. | 
 |  | 
 | First, the smp_mb() full memory barrier orders all of the CPU's prior | 
 | accesses against all subsequent accesses from the viewpoint of all CPUs. | 
 | In other words, all CPUs will agree that any earlier action taken | 
 | by that CPU happened before any later action taken by that same CPU. | 
 | For example, consider the following: | 
 |  | 
 | 	WRITE_ONCE(x, 1); | 
 | 	smp_mb(); // Order store to x before load from y. | 
 | 	r1 = READ_ONCE(y); | 
 |  | 
 | All CPUs will agree that the store to "x" happened before the load | 
 | from "y", as indicated by the comment.  And yes, please comment your | 
 | memory-ordering primitives.  It is surprisingly hard to remember their | 
 | purpose after even a few months. | 
 |  | 
 | Second, some RMW atomic operations provide full ordering.  These | 
 | operations include value-returning RMW atomic operations (that is, those | 
 | with non-void return types) whose names do not end in _acquire, _release, | 
 | or _relaxed.  Examples include atomic_add_return(), atomic_dec_and_test(), | 
 | cmpxchg(), and xchg().  Note that conditional RMW atomic operations such | 
 | as cmpxchg() are only guaranteed to provide ordering when they succeed. | 
 | When RMW atomic operations provide full ordering, they partition the | 
 | CPU's accesses into three groups: | 
 |  | 
 | 1.	All code that executed prior to the RMW atomic operation. | 
 |  | 
 | 2.	The RMW atomic operation itself. | 
 |  | 
 | 3.	All code that executed after the RMW atomic operation. | 
 |  | 
 | All CPUs will agree that any operation in a given partition happened | 
 | before any operation in a higher-numbered partition. | 
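
For example, in the following sketch (reusing the hypothetical variables
from the earlier examples), all CPUs will agree that the store to "x"
happened before the atomic_add_return(), which in turn happened before
the load from "y":

	WRITE_ONCE(x, 1);			// Partition 1: prior accesses.
	r0 = atomic_add_return(1, &my_counter);	// Partition 2: the RMW itself.
	r1 = READ_ONCE(y);			// Partition 3: subsequent accesses.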
 |  | 
 | In contrast, non-value-returning RMW atomic operations (that is, those | 
 | with void return types) do not guarantee any ordering whatsoever.  Nor do | 
 | value-returning RMW atomic operations whose names end in _relaxed. | 
 | Examples of the former include atomic_inc() and atomic_dec(), | 
 | while examples of the latter include atomic_cmpxchg_relaxed() and | 
 | atomic_xchg_relaxed().  Similarly, value-returning non-RMW atomic | 
 | operations such as atomic_read() do not guarantee full ordering, and | 
 | are covered in the later section on unordered operations. | 
 |  | 
 | Value-returning RMW atomic operations whose names end in _acquire or | 
 | _release provide limited ordering, and will be described later in this | 
 | document. | 
 |  | 
 | Finally, RCU's grace-period primitives provide full ordering.  These | 
 | primitives include synchronize_rcu(), synchronize_rcu_expedited(), | 
 | synchronize_srcu() and so on.  However, these primitives have orders | 
 | of magnitude greater overhead than smp_mb(), atomic_xchg(), and so on. | 
 | Furthermore, RCU's grace-period primitives can only be invoked in | 
 | sleepable contexts.  Therefore, RCU's grace-period primitives are | 
 | typically instead used to provide ordering against RCU read-side critical | 
 | sections, as documented in their comment headers.  But of course if you | 
 | need a synchronize_rcu() to interact with readers, it costs you nothing | 
 | to also rely on its additional full-memory-barrier semantics.  Just please | 
 | carefully comment this, otherwise your future self will hate you. | 
 |  | 
 |  | 
 | RMW Ordering Augmentation Barriers | 
 | ---------------------------------- | 
 |  | 
 | As noted in the previous section, non-value-returning RMW operations | 
 | such as atomic_inc() and atomic_dec() guarantee no ordering whatsoever. | 
 | Nevertheless, a number of popular CPU families, including x86, provide | 
 | full ordering for these primitives.  One way to obtain full ordering on | 
 | all architectures is to add a call to smp_mb(): | 
 |  | 
 | 	WRITE_ONCE(x, 1); | 
 | 	atomic_inc(&my_counter); | 
 | 	smp_mb(); // Inefficient on x86!!! | 
 | 	r1 = READ_ONCE(y); | 
 |  | 
This works, but the smp_mb() adds needless overhead on x86, where
atomic_inc() already provides full ordering all by itself.
 | The smp_mb__after_atomic() primitive can be used instead: | 
 |  | 
 | 	WRITE_ONCE(x, 1); | 
 | 	atomic_inc(&my_counter); | 
 | 	smp_mb__after_atomic(); // Order store to x before load from y. | 
 | 	r1 = READ_ONCE(y); | 
 |  | 
 | The smp_mb__after_atomic() primitive emits code only on CPUs whose | 
 | atomic_inc() implementations do not guarantee full ordering, thus | 
 | incurring no unnecessary overhead on x86.  There are a number of | 
 | variations on the smp_mb__*() theme: | 
 |  | 
 | o	smp_mb__before_atomic(), which provides full ordering prior | 
 | 	to an unordered RMW atomic operation. | 
 |  | 
 | o	smp_mb__after_atomic(), which, as shown above, provides full | 
 | 	ordering subsequent to an unordered RMW atomic operation. | 
 |  | 
 | o	smp_mb__after_spinlock(), which provides full ordering subsequent | 
 | 	to a successful spinlock acquisition.  Note that spin_lock() is | 
 | 	always successful but spin_trylock() might not be. | 
 |  | 
 | o	smp_mb__after_srcu_read_unlock(), which provides full ordering | 
 | 	subsequent to an srcu_read_unlock(). | 
 |  | 
It is bad practice to place code between the smp_mb__*() primitive and
the operation whose ordering it is augmenting.  The reason is that the
ordering of this intervening code will differ from one CPU architecture
to another.
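
For example, here is a sketch of smp_mb__before_atomic(), again reusing
the hypothetical variables from the earlier examples:

	WRITE_ONCE(x, 1);
	smp_mb__before_atomic(); // Order store to x before atomic_inc().
	atomic_inc(&my_counter);
	r1 = READ_ONCE(y);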
 |  | 
 |  | 
 | Write Memory Barrier | 
 | -------------------- | 
 |  | 
 | The Linux kernel's write memory barrier is smp_wmb().  If a CPU executes | 
 | the following code: | 
 |  | 
 | 	WRITE_ONCE(x, 1); | 
 | 	smp_wmb(); | 
 | 	WRITE_ONCE(y, 1); | 
 |  | 
Then any given CPU will see the write to "x" as having happened before
 | the write to "y".  However, you are usually better off using a release | 
 | store, as described in the "Release Operations" section below. | 
 |  | 
 | Note that smp_wmb() might fail to provide ordering for unmarked C-language | 
 | stores because profile-driven optimization could determine that the | 
 | value being overwritten is almost always equal to the new value.  Such a | 
 | compiler might then reasonably decide to transform "x = 1" and "y = 1" | 
 | as follows: | 
 |  | 
 | 	if (x != 1) | 
 | 		x = 1; | 
 | 	smp_wmb(); // BUG: does not order the reads!!! | 
 | 	if (y != 1) | 
 | 		y = 1; | 
 |  | 
 | Therefore, if you need to use smp_wmb() with unmarked C-language writes, | 
 | you will need to make sure that none of the compilers used to build | 
 | the Linux kernel carry out this sort of transformation, both now and in | 
 | the future. | 
 |  | 
 |  | 
 | Read Memory Barrier | 
 | ------------------- | 
 |  | 
 | The Linux kernel's read memory barrier is smp_rmb().  If a CPU executes | 
 | the following code: | 
 |  | 
 | 	r0 = READ_ONCE(y); | 
 | 	smp_rmb(); | 
 | 	r1 = READ_ONCE(x); | 
 |  | 
Then any given CPU will see the read from "y" as having happened before
the read from "x".  However, you are usually better off using an acquire
load, as described in the "Acquire Operations" section below.


Compiler Barrier
 | ---------------- | 
 |  | 
 | The Linux kernel's compiler barrier is barrier().  This primitive | 
 | prohibits compiler code-motion optimizations that might move memory | 
 | references across the point in the code containing the barrier(), but | 
 | does not constrain hardware memory ordering.  For example, this can be | 
used to prevent the compiler from moving code across an infinite loop:
 |  | 
 | 	WRITE_ONCE(x, 1); | 
 | 	while (dontstop) | 
 | 		barrier(); | 
 | 	r1 = READ_ONCE(y); | 
 |  | 
 | Without the barrier(), the compiler would be within its rights to move the | 
 | WRITE_ONCE() to follow the loop.  This code motion could be problematic | 
 | in the case where an interrupt handler terminates the loop.  Another way | 
 | to handle this is to use READ_ONCE() for the load of "dontstop". | 
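
For example, here is a sketch of that READ_ONCE()-based alternative,
which relies on the fact that the compiler may not reorder the volatile
accesses implied by WRITE_ONCE() and READ_ONCE() with each other:

	WRITE_ONCE(x, 1);
	while (READ_ONCE(dontstop))
		;
	r1 = READ_ONCE(y);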
 |  | 
 | Note that the barriers discussed previously use barrier() or its low-level | 
 | equivalent in their implementations. | 
 |  | 
 |  | 
 | Ordered Memory Accesses | 
 | ======================= | 
 |  | 
 | The Linux kernel provides a wide variety of ordered memory accesses: | 
 |  | 
 | a.	Release operations. | 
 |  | 
 | b.	Acquire operations. | 
 |  | 
 | c.	RCU read-side ordering. | 
 |  | 
 | d.	Control dependencies. | 
 |  | 
 | Each of the above categories has its own section below. | 
 |  | 
 |  | 
 | Release Operations | 
 | ------------------ | 
 |  | 
 | Release operations include smp_store_release(), atomic_set_release(), | 
 | rcu_assign_pointer(), and value-returning RMW operations whose names | 
 | end in _release.  These operations order their own store against all | 
 | of the CPU's prior memory accesses.  Release operations often provide | 
 | improved readability and performance compared to explicit barriers. | 
 | For example, use of smp_store_release() saves a line compared to the | 
 | smp_wmb() example above: | 
 |  | 
 | 	WRITE_ONCE(x, 1); | 
 | 	smp_store_release(&y, 1); | 
 |  | 
 | More important, smp_store_release() makes it easier to connect up the | 
 | different pieces of the concurrent algorithm.  The variable stored to | 
 | by the smp_store_release(), in this case "y", will normally be used in | 
 | an acquire operation in other parts of the concurrent algorithm. | 
 |  | 
 | To see the performance advantages, suppose that the above example read | 
 | from "x" instead of writing to it.  Then an smp_wmb() could not guarantee | 
 | ordering, and an smp_mb() would be needed instead: | 
 |  | 
 | 	r1 = READ_ONCE(x); | 
 | 	smp_mb(); | 
 | 	WRITE_ONCE(y, 1); | 
 |  | 
 | But smp_mb() often incurs much higher overhead than does | 
 | smp_store_release(), which still provides the needed ordering of "x" | 
 | against "y".  On x86, the version using smp_store_release() might compile | 
 | to a simple load instruction followed by a simple store instruction. | 
 | In contrast, the smp_mb() compiles to an expensive instruction that | 
 | provides the needed ordering. | 
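
For example, here is a sketch of the smp_store_release() version of the
above read-then-store sequence, which provides the same ordering of the
load from "x" against the store to "y":

	r1 = READ_ONCE(x);
	smp_store_release(&y, 1); // Order load from x before store to y.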
 |  | 
 | There is a wide variety of release operations: | 
 |  | 
 | o	Store operations, including not only the aforementioned | 
 | 	smp_store_release(), but also atomic_set_release(), and | 
 | 	atomic_long_set_release(). | 
 |  | 
 | o	RCU's rcu_assign_pointer() operation.  This is the same as | 
 | 	smp_store_release() except that: (1) It takes the pointer to | 
 | 	be assigned to instead of a pointer to that pointer, (2) It | 
 | 	is intended to be used in conjunction with rcu_dereference() | 
 | 	and similar rather than smp_load_acquire(), and (3) It checks | 
 | 	for an RCU-protected pointer in "sparse" runs. | 
 |  | 
 | o	Value-returning RMW operations whose names end in _release, | 
 | 	such as atomic_fetch_add_release() and cmpxchg_release(). | 
 | 	Note that release ordering is guaranteed only against the | 
 | 	memory-store portion of the RMW operation, and not against the | 
 | 	memory-load portion.  Note also that conditional operations such | 
 | 	as cmpxchg_release() are only guaranteed to provide ordering | 
 | 	when they succeed. | 
 |  | 
 | As mentioned earlier, release operations are often paired with acquire | 
 | operations, which are the subject of the next section. | 
 |  | 
 |  | 
 | Acquire Operations | 
 | ------------------ | 
 |  | 
 | Acquire operations include smp_load_acquire(), atomic_read_acquire(), | 
and value-returning RMW operations whose names end in _acquire.  These
 | operations order their own load against all of the CPU's subsequent | 
 | memory accesses.  Acquire operations often provide improved performance | 
 | and readability compared to explicit barriers.  For example, use of | 
 | smp_load_acquire() saves a line compared to the smp_rmb() example above: | 
 |  | 
 | 	r0 = smp_load_acquire(&y); | 
 | 	r1 = READ_ONCE(x); | 
 |  | 
 | As with smp_store_release(), this also makes it easier to connect | 
 | the different pieces of the concurrent algorithm by looking for the | 
 | smp_store_release() that stores to "y".  In addition, smp_load_acquire() | 
 | improves upon smp_rmb() by ordering against subsequent stores as well | 
 | as against subsequent loads. | 
 |  | 
 | There are a couple of categories of acquire operations: | 
 |  | 
 | o	Load operations, including not only the aforementioned | 
 | 	smp_load_acquire(), but also atomic_read_acquire(), and | 
 | 	atomic64_read_acquire(). | 
 |  | 
 | o	Value-returning RMW operations whose names end in _acquire, | 
 | 	such as atomic_xchg_acquire() and atomic_cmpxchg_acquire(). | 
 | 	Note that acquire ordering is guaranteed only against the | 
 | 	memory-load portion of the RMW operation, and not against the | 
 | 	memory-store portion.  Note also that conditional operations | 
 | 	such as atomic_cmpxchg_acquire() are only guaranteed to provide | 
 | 	ordering when they succeed. | 
 |  | 
 | Symmetry being what it is, acquire operations are often paired with the | 
release operations covered earlier.  For example, consider the following
code, where task0() and task1() execute concurrently:
 |  | 
 | 	void task0(void) | 
 | 	{ | 
 | 		WRITE_ONCE(x, 1); | 
 | 		smp_store_release(&y, 1); | 
 | 	} | 
 |  | 
 | 	void task1(void) | 
 | 	{ | 
 | 		r0 = smp_load_acquire(&y); | 
 | 		r1 = READ_ONCE(x); | 
 | 	} | 
 |  | 
 | If "x" and "y" are both initially zero, then either r0's final value | 
 | will be zero or r1's final value will be one, thus providing the required | 
 | ordering. | 
 |  | 
 |  | 
 | RCU Read-Side Ordering | 
 | ---------------------- | 
 |  | 
 | This category includes read-side markers such as rcu_read_lock() | 
 | and rcu_read_unlock() as well as pointer-traversal primitives such as | 
 | rcu_dereference() and srcu_dereference(). | 
 |  | 
 | Compared to locking primitives and RMW atomic operations, markers | 
 | for RCU read-side critical sections incur very low overhead because | 
 | they interact only with the corresponding grace-period primitives. | 
 | For example, the rcu_read_lock() and rcu_read_unlock() markers interact | 
 | with synchronize_rcu(), synchronize_rcu_expedited(), and call_rcu(). | 
 | The way this works is that if a given call to synchronize_rcu() cannot | 
 | prove that it started before a given call to rcu_read_lock(), then | 
 | that synchronize_rcu() must block until the matching rcu_read_unlock() | 
 | is reached.  For more information, please see the synchronize_rcu() | 
 | docbook header comment and the material in Documentation/RCU. | 
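
For example, in the following sketch (with the hypothetical variables
"x" and "y" both initially zero), if the reader's r1 ends up equal to 1,
then its r0 is guaranteed to also be equal to 1.  This is because a load
from "y" that sees the updater's second store must follow the end of the
grace period, which in turn means that the entire read-side critical
section follows the updater's first store:

	void updater(void)
	{
		WRITE_ONCE(x, 1);
		synchronize_rcu();
		WRITE_ONCE(y, 1);
	}

	void reader(void)
	{
		rcu_read_lock();
		r1 = READ_ONCE(y);
		r0 = READ_ONCE(x);
		rcu_read_unlock();
	}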
 |  | 
 | RCU's pointer-traversal primitives, including rcu_dereference() and | 
 | srcu_dereference(), order their load (which must be a pointer) against any | 
 | of the CPU's subsequent memory accesses whose address has been calculated | 
 | from the value loaded.  There is said to be an *address dependency* | 
 | from the value returned by the rcu_dereference() or srcu_dereference() | 
 | to that subsequent memory access. | 
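
For example, in the following sketch (where "gp" is a hypothetical
RCU-protected pointer to a structure containing a "data" field), the
address dependency orders the load from p->data after the load of the
pointer itself:

	rcu_read_lock();
	p = rcu_dereference(gp);
	if (p)
		r1 = p->data; // Ordered after rcu_dereference() by the address dependency.
	rcu_read_unlock();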
 |  | 
 | A call to rcu_dereference() for a given RCU-protected pointer is | 
usually paired with a call to rcu_assign_pointer() for that
 | same pointer in much the same way that a call to smp_load_acquire() is | 
 | paired with a call to smp_store_release().  Calls to rcu_dereference() | 
and rcu_assign_pointer() are often buried in other APIs, for example,
 | the RCU list API members defined in include/linux/rculist.h.  For more | 
 | information, please see the docbook headers in that file, the most | 
 | recent LWN article on the RCU API (https://lwn.net/Articles/777036/), | 
 | and of course the material in Documentation/RCU. | 
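
For example, the publish side of such a pairing might look like the
following sketch, again using the hypothetical pointer "gp" and a
made-up structure type:

	struct foo *p;

	p = kmalloc(sizeof(*p), GFP_KERNEL);
	if (p) {
		p->data = 42;			// Initialize the structure...
		rcu_assign_pointer(gp, p);	// ...before publishing the pointer.
	}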
 |  | 
 | If the pointer value is manipulated between the rcu_dereference() | 
that returned it and a later dereference, please read
 | Documentation/RCU/rcu_dereference.rst.  It can also be quite helpful to | 
 | review uses in the Linux kernel. | 
 |  | 
 |  | 
 | Control Dependencies | 
 | -------------------- | 
 |  | 
 | A control dependency extends from a marked load (READ_ONCE() or stronger) | 
 | through an "if" condition to a marked store (WRITE_ONCE() or stronger) | 
 | that is executed only by one of the legs of that "if" statement. | 
 | Control dependencies are so named because they are mediated by | 
 | control-flow instructions such as comparisons and conditional branches. | 
 |  | 
 | In short, you can use a control dependency to enforce ordering between | 
a READ_ONCE() and a WRITE_ONCE() when there is an "if" condition
 | between them.  The canonical example is as follows: | 
 |  | 
 | 	q = READ_ONCE(a); | 
 | 	if (q) | 
 | 		WRITE_ONCE(b, 1); | 
 |  | 
 | In this case, all CPUs would see the read from "a" as happening before | 
 | the write to "b". | 
 |  | 
 | However, control dependencies are easily destroyed by compiler | 
 | optimizations, so any use of control dependencies must take into account | 
 | all of the compilers used to build the Linux kernel.  Please see the | 
 | "control-dependencies.txt" file for more information. | 
 |  | 
 |  | 
 | Unordered Accesses | 
 | ================== | 
 |  | 
 | Each of these two categories of unordered accesses has a section below: | 
 |  | 
 | a.	Unordered marked operations. | 
 |  | 
 | b.	Unmarked C-language accesses. | 
 |  | 
 |  | 
 | Unordered Marked Operations | 
 | --------------------------- | 
 |  | 
 | Unordered operations to different variables are just that, unordered. | 
 | However, if a group of CPUs apply these operations to a single variable, | 
 | all the CPUs will agree on the operation order.  Of course, the ordering | 
 | of unordered marked accesses can also be constrained using the mechanisms | 
 | described earlier in this document. | 
 |  | 
 | These operations come in three categories: | 
 |  | 
 | o	Marked writes, such as WRITE_ONCE() and atomic_set().  These | 
	primitives require the compiler to emit the corresponding store
	instructions in the expected execution order, thus suppressing
	a number of destructive optimizations.  However, they provide no
 | 	hardware ordering guarantees, and in fact many CPUs will happily | 
 | 	reorder marked writes with each other or with other unordered | 
 | 	operations, unless these operations are to the same variable. | 
 |  | 
 | o	Marked reads, such as READ_ONCE() and atomic_read().  These | 
	primitives require the compiler to emit the corresponding load
	instructions in the expected execution order, thus suppressing
	a number of destructive optimizations.  However, they provide no
 | 	hardware ordering guarantees, and in fact many CPUs will happily | 
 | 	reorder marked reads with each other or with other unordered | 
 | 	operations, unless these operations are to the same variable. | 
 |  | 
 | o	Unordered RMW atomic operations.  These are non-value-returning | 
 | 	RMW atomic operations whose names do not end in _acquire or | 
 | 	_release, and also value-returning RMW operations whose names | 
 | 	end in _relaxed.  Examples include atomic_add(), atomic_or(), | 
 | 	and atomic64_fetch_xor_relaxed().  These operations do carry | 
 | 	out the specified RMW operation atomically, for example, five | 
 | 	concurrent atomic_inc() operations applied to a given variable | 
 | 	will reliably increase the value of that variable by five. | 
 | 	However, many CPUs will happily reorder these operations with | 
 | 	each other or with other unordered operations. | 
 |  | 
 | 	This category of operations can be efficiently ordered using | 
 | 	smp_mb__before_atomic() and smp_mb__after_atomic(), as was | 
 | 	discussed in the "RMW Ordering Augmentation Barriers" section. | 
 |  | 
 | In short, these operations can be freely reordered unless they are all | 
 | operating on a single variable or unless they are constrained by one of | 
 | the operations called out earlier in this document. | 
 |  | 
 |  | 
 | Unmarked C-Language Accesses | 
 | ---------------------------- | 
 |  | 
 | Unmarked C-language accesses are normal variable accesses to normal | 
 | variables, that is, to variables that are not "volatile" and are not | 
 | C11 atomic variables.  These operations provide no ordering guarantees, | 
 | and further do not guarantee "atomic" access.  For example, the compiler | 
 | might (and sometimes does) split a plain C-language store into multiple | 
 | smaller stores.  A load from that same variable running on some other | 
 | CPU while such a store is executing might see a value that is a mashup | 
 | of the old value and the new value. | 
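
For example, in the following sketch (using a hypothetical aligned
32-bit "->flags" field), the compiler is permitted to split the first
store into a pair of 16-bit stores, but WRITE_ONCE() prevents such
store tearing:

	p->flags = 0x00010002;			// Plain store: may be torn.
	WRITE_ONCE(p->flags, 0x00010002);	// Marked store: not torn.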
 |  | 
 | Unmarked C-language accesses are unordered, and are also subject to | 
 | any number of compiler optimizations, many of which can break your | 
concurrent code.  It is possible to use unmarked C-language accesses for
 | shared variables that are subject to concurrent access, but great care | 
 | is required on an ongoing basis.  The compiler-constraining barrier() | 
 | primitive can be helpful, as can the various ordering primitives discussed | 
 | in this document.  It nevertheless bears repeating that use of unmarked | 
 | C-language accesses requires careful attention to not just your code, | 
 | but to all the compilers that might be used to build it.  Such compilers | 
 | might replace a series of loads with a single load, and might replace | 
 | a series of stores with a single store.  Some compilers will even split | 
 | a single store into multiple smaller stores. | 
 |  | 
 | But there are some ways of using unmarked C-language accesses for shared | 
 | variables without such worries: | 
 |  | 
 | o	Guard all accesses to a given variable by a particular lock, | 
 | 	so that there are never concurrent conflicting accesses to | 
	that variable.  (There are "conflicting accesses" when
	(1) at least one of the concurrent accesses to a variable is an
	unmarked C-language access and (2) at least one of those
	accesses is a write, whether marked or not.)
 |  | 
 | o	As above, but using other synchronization primitives such | 
 | 	as reader-writer locks or sequence locks. | 
 |  | 
 | o	Use locking or other means to ensure that all concurrent accesses | 
 | 	to a given variable are reads. | 
 |  | 
 | o	Restrict use of a given variable to statistics or heuristics | 
 | 	where the occasional bogus value can be tolerated. | 
 |  | 
 | o	Declare the accessed variables as C11 atomics. | 
 | 	https://lwn.net/Articles/691128/ | 
 |  | 
 | o	Declare the accessed variables as "volatile". | 
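
For example, here is a sketch of the lock-based approach from the first
item above, assuming a hypothetical "counter" that is accessed only
while holding the equally hypothetical "mylock":

	spin_lock(&mylock);
	counter++; // Plain C-language access, safe because mylock is held.
	spin_unlock(&mylock);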
 |  | 
 | If you need to live more dangerously, please do take the time to | 
 | understand the compilers.  One place to start is these two LWN | 
 | articles: | 
 |  | 
 | Who's afraid of a big bad optimizing compiler? | 
 | 	https://lwn.net/Articles/793253 | 
 | Calibrating your fear of big bad optimizing compilers | 
 | 	https://lwn.net/Articles/799218 | 
 |  | 
 | Used properly, unmarked C-language accesses can reduce overhead on | 
 | fastpaths.  However, the price is great care and continual attention | 
 | to your compiler as new versions come out and as new optimizations | 
 | are enabled. |