Reference-count design for elements of lists/arrays protected by RCU.


Please note that the percpu-ref feature is likely your first
stop if you need to combine reference counts and RCU. Please see
include/linux/percpu-refcount.h for more information. However, in
those unusual cases where percpu-ref would consume too much memory,
please read on.
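
As a rough illustration of that alternative, the sketch below shows one
way the percpu-ref API might be used. The gadget structure and the
gadget_*() functions are invented for this example; only the
percpu_ref_*() calls are real interfaces from
include/linux/percpu-refcount.h.

#include <linux/percpu-refcount.h>
#include <linux/slab.h>

struct gadget {
    struct percpu_ref ref;
    /* ... payload ... */
};

/* Called once the last reference has been dropped after percpu_ref_kill(). */
static void gadget_release(struct percpu_ref *ref)
{
    kfree(container_of(ref, struct gadget, ref));
}

static struct gadget *gadget_create(void)
{
    struct gadget *g = kzalloc(sizeof(*g), GFP_KERNEL);

    if (g && percpu_ref_init(&g->ref, gadget_release, 0, GFP_KERNEL)) {
        kfree(g);
        g = NULL;
    }
    return g;
}

/* Readers pair percpu_ref_tryget_live() with percpu_ref_put(); teardown
 * uses percpu_ref_kill() to drop the initial reference taken in
 * gadget_create(). */
static bool gadget_get(struct gadget *g)
{
    return percpu_ref_tryget_live(&g->ref);
}

static void gadget_put(struct gadget *g)
{
    percpu_ref_put(&g->ref);
}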

------------------------------------------------------------------------

Reference counting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores is straightforward:

CODE LISTING A:
1.                                      2.
add()                                   search_and_reference()
{                                       {
    alloc_object                            read_lock(&list_lock);
    ...                                     search_for_element
    atomic_set(&el->rc, 1);                 atomic_inc(&el->rc);
    write_lock(&list_lock);                 ...
    add_element                             read_unlock(&list_lock);
    ...                                     ...
    write_unlock(&list_lock);           }
}

3.                                      4.
release_referenced()                    delete()
{                                       {
    ...                                     write_lock(&list_lock);
    if (atomic_dec_and_test(&el->rc))       ...
        kfree(el);                          remove_element
    ...                                     write_unlock(&list_lock);
}                                           ...
                                            if (atomic_dec_and_test(&el->rc))
                                                kfree(el);
                                            ...
                                        }
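
For concreteness, the listing A pattern might be filled in with real
types roughly as follows. This is only a sketch: the struct foo, the
foo_list and foo_lock globals, the foo_*() function names, and the
integer key are invented for the example and do not come from any
kernel subsystem.

#include <linux/atomic.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct foo {
    struct list_head list;
    atomic_t rc;
    int key;
};

static LIST_HEAD(foo_list);
static DEFINE_RWLOCK(foo_lock);

/* 1. Allocate, set the count to 1 (the list's reference), and insert. */
static struct foo *foo_add(int key)
{
    struct foo *p = kzalloc(sizeof(*p), GFP_KERNEL);

    if (!p)
        return NULL;
    p->key = key;
    atomic_set(&p->rc, 1);
    write_lock(&foo_lock);
    list_add(&p->list, &foo_list);
    write_unlock(&foo_lock);
    return p;
}

/* 2. Readers hold the lock, so the element cannot vanish while its
 *    counter is being incremented. */
static struct foo *foo_search_and_reference(int key)
{
    struct foo *p;

    read_lock(&foo_lock);
    list_for_each_entry(p, &foo_list, list) {
        if (p->key == key) {
            atomic_inc(&p->rc);
            read_unlock(&foo_lock);
            return p;
        }
    }
    read_unlock(&foo_lock);
    return NULL;
}

/* 3. Drop a reference obtained from foo_search_and_reference(). */
static void foo_release_referenced(struct foo *p)
{
    if (atomic_dec_and_test(&p->rc))
        kfree(p);
}

/* 4. Unlink the element and drop the list's initial reference. */
static void foo_delete(struct foo *p)
{
    write_lock(&foo_lock);
    list_del(&p->list);
    write_unlock(&foo_lock);
    if (atomic_dec_and_test(&p->rc))
        kfree(p);
}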

If this list/array is made lock-free using RCU, that is, by changing the
write_lock() in add() and delete() to spin_lock() and the read_lock() in
search_and_reference() to rcu_read_lock(), then the atomic_inc() in
search_and_reference() could potentially take a reference to an element
that has already been deleted from the list/array. Use
atomic_inc_not_zero() in this scenario as follows:

CODE LISTING B:
1.                                      2.
add()                                   search_and_reference()
{                                       {
    alloc_object                            rcu_read_lock();
    ...                                     search_for_element
    atomic_set(&el->rc, 1);                 if (!atomic_inc_not_zero(&el->rc)) {
    spin_lock(&list_lock);                      rcu_read_unlock();
                                                return FAIL;
    add_element                             }
    ...                                     ...
    spin_unlock(&list_lock);                rcu_read_unlock();
}                                       }

3.                                      4.
release_referenced()                    delete()
{                                       {
    ...                                     spin_lock(&list_lock);
    if (atomic_dec_and_test(&el->rc))       ...
        call_rcu(&el->head, el_free);       remove_element
    ...                                     spin_unlock(&list_lock);
}                                           ...
                                            if (atomic_dec_and_test(&el->rc))
                                                call_rcu(&el->head, el_free);
                                            ...
                                        }
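
Continuing the illustrative foo sketch from listing A (again with
invented names, and assuming the same includes plus those shown below),
the listing B version changes the lock, the list traversal, and the
freeing path, and adds a struct rcu_head to the element so that kfree()
can be deferred past a grace period by call_rcu():

#include <linux/rcupdate.h>
#include <linux/rculist.h>

struct foo {
    struct list_head list;
    struct rcu_head rcu;
    atomic_t rc;
    int key;
};

static LIST_HEAD(foo_list);
static DEFINE_SPINLOCK(foo_lock);

static void foo_free_rcu(struct rcu_head *rhp)
{
    kfree(container_of(rhp, struct foo, rcu));
}

/* 2. A reader can race with delete(), so it must refuse elements whose
 *    count has already reached zero. */
static struct foo *foo_search_and_reference(int key)
{
    struct foo *p;

    rcu_read_lock();
    list_for_each_entry_rcu(p, &foo_list, list) {
        if (p->key != key)
            continue;
        if (!atomic_inc_not_zero(&p->rc))
            break;              /* Found, but already being deleted: "FAIL". */
        rcu_read_unlock();
        return p;
    }
    rcu_read_unlock();
    return NULL;
}

/* 3. Freeing must still wait for a grace period, because other readers
 *    may hold an RCU-protected pointer to this element. */
static void foo_release_referenced(struct foo *p)
{
    if (atomic_dec_and_test(&p->rc))
        call_rcu(&p->rcu, foo_free_rcu);
}

/* 4. add() is as in the listing A sketch except that it takes
 *    spin_lock(&foo_lock) and uses list_add_rcu(); delete() unlinks with
 *    list_del_rcu() and drops the list's initial reference. */
static void foo_delete(struct foo *p)
{
    spin_lock(&foo_lock);
    list_del_rcu(&p->list);
    spin_unlock(&foo_lock);
    if (atomic_dec_and_test(&p->rc))
        call_rcu(&p->rcu, foo_free_rcu);
}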

Sometimes, a reference to the element needs to be obtained in the
update (write) stream. In such cases, atomic_inc_not_zero() might be
overkill, since we hold the update-side spinlock: one might instead
simply use atomic_inc().
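
In terms of the running foo example, such an update-side lookup might
look like the following hypothetical helper (foo_find_locked is not an
existing kernel function):

/* Caller holds foo_lock, which excludes foo_delete(), so the element's
 * counter cannot drop to zero here and plain atomic_inc() is enough. */
static struct foo *foo_find_locked(int key)
{
    struct foo *p;

    lockdep_assert_held(&foo_lock);
    list_for_each_entry(p, &foo_list, list) {
        if (p->key == key) {
            atomic_inc(&p->rc);
            return p;
        }
    }
    return NULL;
}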

It is not always convenient to deal with "FAIL" in the
search_and_reference() code path. In such cases, the
atomic_dec_and_test() may be moved from delete() to el_free()
as follows:

CODE LISTING C:
1.                                      2.
add()                                   search_and_reference()
{                                       {
    alloc_object                            rcu_read_lock();
    ...                                     search_for_element
    atomic_set(&el->rc, 1);                 atomic_inc(&el->rc);
    spin_lock(&list_lock);                  ...

    add_element                             rcu_read_unlock();
    ...                                 }
    spin_unlock(&list_lock);            4.
}                                       delete()
3.                                      {
release_referenced()                        spin_lock(&list_lock);
{                                           ...
    ...                                     remove_element
    if (atomic_dec_and_test(&el->rc))       spin_unlock(&list_lock);
        kfree(el);                          ...
    ...                                     call_rcu(&el->head, el_free);
}                                           ...
5.                                      }
void el_free(struct rcu_head *rhp)
{
    release_referenced();
}
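
In terms of the illustrative foo sketch, the listing C variant differs
from the listing B sketch only in the reader and in how the initial
reference is dropped; the names remain invented:

/* 2. Any element found under rcu_read_lock() still holds the initial
 *    reference (it is not dropped until a grace period after removal),
 *    so readers may increment the counter unconditionally. */
static struct foo *foo_search_and_reference(int key)
{
    struct foo *p;

    rcu_read_lock();
    list_for_each_entry_rcu(p, &foo_list, list) {
        if (p->key == key) {
            atomic_inc(&p->rc);
            rcu_read_unlock();
            return p;
        }
    }
    rcu_read_unlock();
    return NULL;
}

/* 3. The last reference can now be dropped and the element freed
 *    immediately; no call_rcu() is needed here. */
static void foo_release_referenced(struct foo *p)
{
    if (atomic_dec_and_test(&p->rc))
        kfree(p);
}

/* 5. The RCU callback simply drops the list's initial reference. */
static void foo_free_rcu(struct rcu_head *rhp)
{
    foo_release_referenced(container_of(rhp, struct foo, rcu));
}

/* 4. delete() unlinks the element and defers dropping the initial
 *    reference until readers can no longer find it. */
static void foo_delete(struct foo *p)
{
    spin_lock(&foo_lock);
    list_del_rcu(&p->list);
    spin_unlock(&foo_lock);
    call_rcu(&p->rcu, foo_free_rcu);
}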

The key point is that the initial reference added by add() is not removed
until after a grace period has elapsed following removal. This means that
search_and_reference() cannot find this element, which means that the value
of el->rc cannot increase. Thus, once it reaches zero, there are no
readers that can or ever will be able to reference the element. The
element can therefore safely be freed. This in turn guarantees that if
any reader finds the element, that reader may safely acquire a reference
without checking the value of the reference counter.

A clear advantage of the RCU-based pattern in listing C over the one
in listing B is that any call to search_and_reference() that locates
a given object will succeed in obtaining a reference to that object,
even given a concurrent invocation of delete() for that same object.
Similarly, a clear advantage of both listings B and C over listing A is
that a call to delete() is not delayed even if there are an arbitrarily
large number of calls to search_and_reference() searching for the same
object that delete() was invoked on. Instead, all that is delayed is
the eventual invocation of kfree(), which is usually not a problem on
modern computer systems, even the small ones.

In cases where delete() can sleep, synchronize_rcu() can be called from
delete(), so that el_free() can be subsumed into delete() as follows:

4.
delete()
{
    spin_lock(&list_lock);
    ...
    remove_element
    spin_unlock(&list_lock);
    ...
    synchronize_rcu();
    if (atomic_dec_and_test(&el->rc))
        kfree(el);
    ...
}
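
In the running foo example, this sleepable variant of delete() might
look as follows (again, the names are illustrative only):

/* 4. Sleepable delete(): wait out the grace period directly instead of
 *    using call_rcu(), then drop the list's initial reference. */
static void foo_delete(struct foo *p)
{
    spin_lock(&foo_lock);
    list_del_rcu(&p->list);
    spin_unlock(&foo_lock);
    synchronize_rcu();          /* No reader can still find the element... */
    if (atomic_dec_and_test(&p->rc))
        kfree(p);               /* ...so a zero count means it is truly unused. */
}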

As additional examples in the kernel, the pattern in listing C is used by
reference counting of struct pid, while the pattern in listing B is used by
struct posix_acl.