Merge branch 'core/urgent' into core/rcu
Merge reason: Pick up RCU fixlet to base further commits on.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 187bbf1..8608fd8 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -1,185 +1,10 @@
CONFIG_RCU_TRACE debugfs Files and Formats
-The rcupreempt and rcutree implementations of RCU provide debugfs trace
-output that summarizes counters and state. This information is useful for
-debugging RCU itself, and can sometimes also help to debug abuses of RCU.
-Note that the rcuclassic implementation of RCU does not provide debugfs
-trace output.
-
-The following sections describe the debugfs files and formats for
-preemptable RCU (rcupreempt) and hierarchical RCU (rcutree).
-
-
-Preemptable RCU debugfs Files and Formats
-
-This implementation of RCU provides three debugfs files under the
-top-level directory RCU: rcu/rcuctrs (which displays the per-CPU
-counters used by preemptable RCU) rcu/rcugp (which displays grace-period
-counters), and rcu/rcustats (which internal counters for debugging RCU).
-
-The output of "cat rcu/rcuctrs" looks as follows:
-
-CPU last cur F M
- 0 5 -5 0 0
- 1 -1 0 0 0
- 2 0 1 0 0
- 3 0 1 0 0
- 4 0 1 0 0
- 5 0 1 0 0
- 6 0 2 0 0
- 7 0 -1 0 0
- 8 0 1 0 0
-ggp = 26226, state = waitzero
-
-The per-CPU fields are as follows:
-
-o "CPU" gives the CPU number. Offline CPUs are not displayed.
-
-o "last" gives the value of the counter that is being decremented
- for the current grace period phase. In the example above,
- the counters sum to 4, indicating that there are still four
- RCU read-side critical sections still running that started
- before the last counter flip.
-
-o "cur" gives the value of the counter that is currently being
- both incremented (by rcu_read_lock()) and decremented (by
- rcu_read_unlock()). In the example above, the counters sum to
- 1, indicating that there is only one RCU read-side critical section
- still running that started after the last counter flip.
-
-o "F" indicates whether RCU is waiting for this CPU to acknowledge
- a counter flip. In the above example, RCU is not waiting on any,
- which is consistent with the state being "waitzero" rather than
- "waitack".
-
-o "M" indicates whether RCU is waiting for this CPU to execute a
- memory barrier. In the above example, RCU is not waiting on any,
- which is consistent with the state being "waitzero" rather than
- "waitmb".
-
-o "ggp" is the global grace-period counter.
-
-o "state" is the RCU state, which can be one of the following:
-
- o "idle": there is no grace period in progress.
-
- o "waitack": RCU just incremented the global grace-period
- counter, which has the effect of reversing the roles of
- the "last" and "cur" counters above, and is waiting for
- all the CPUs to acknowledge the flip. Once the flip has
- been acknowledged, CPUs will no longer be incrementing
- what are now the "last" counters, so that their sum will
- decrease monotonically down to zero.
-
- o "waitzero": RCU is waiting for the sum of the "last" counters
- to decrease to zero.
-
- o "waitmb": RCU is waiting for each CPU to execute a memory
- barrier, which ensures that instructions from a given CPU's
- last RCU read-side critical section cannot be reordered
- with instructions following the memory-barrier instruction.
-
-The output of "cat rcu/rcugp" looks as follows:
-
-oldggp=48870 newggp=48873
-
-Note that reading from this file provokes a synchronize_rcu(). The
-"oldggp" value is that of "ggp" from rcu/rcuctrs above, taken before
-executing the synchronize_rcu(), and the "newggp" value is also the
-"ggp" value, but taken after the synchronize_rcu() command returns.
-
-
-The output of "cat rcu/rcugp" looks as follows:
-
-na=1337955 nl=40 wa=1337915 wl=44 da=1337871 dl=0 dr=1337871 di=1337871
-1=50989 e1=6138 i1=49722 ie1=82 g1=49640 a1=315203 ae1=265563 a2=49640
-z1=1401244 ze1=1351605 z2=49639 m1=5661253 me1=5611614 m2=49639
-
-These are counters tracking internal preemptable-RCU events, however,
-some of them may be useful for debugging algorithms using RCU. In
-particular, the "nl", "wl", and "dl" values track the number of RCU
-callbacks in various states. The fields are as follows:
-
-o "na" is the total number of RCU callbacks that have been enqueued
- since boot.
-
-o "nl" is the number of RCU callbacks waiting for the previous
- grace period to end so that they can start waiting on the next
- grace period.
-
-o "wa" is the total number of RCU callbacks that have started waiting
- for a grace period since boot. "na" should be roughly equal to
- "nl" plus "wa".
-
-o "wl" is the number of RCU callbacks currently waiting for their
- grace period to end.
-
-o "da" is the total number of RCU callbacks whose grace periods
- have completed since boot. "wa" should be roughly equal to
- "wl" plus "da".
-
-o "dr" is the total number of RCU callbacks that have been removed
- from the list of callbacks ready to invoke. "dr" should be roughly
- equal to "da".
-
-o "di" is the total number of RCU callbacks that have been invoked
- since boot. "di" should be roughly equal to "da", though some
- early versions of preemptable RCU had a bug so that only the
- last CPU's count of invocations was displayed, rather than the
- sum of all CPU's counts.
-
-o "1" is the number of calls to rcu_try_flip(). This should be
- roughly equal to the sum of "e1", "i1", "a1", "z1", and "m1"
- described below. In other words, the number of times that
- the state machine is visited should be equal to the sum of the
- number of times that each state is visited plus the number of
- times that the state-machine lock acquisition failed.
-
-o "e1" is the number of times that rcu_try_flip() was unable to
- acquire the fliplock.
-
-o "i1" is the number of calls to rcu_try_flip_idle().
-
-o "ie1" is the number of times rcu_try_flip_idle() exited early
- due to the calling CPU having no work for RCU.
-
-o "g1" is the number of times that rcu_try_flip_idle() decided
- to start a new grace period. "i1" should be roughly equal to
- "ie1" plus "g1".
-
-o "a1" is the number of calls to rcu_try_flip_waitack().
-
-o "ae1" is the number of times that rcu_try_flip_waitack() found
- that at least one CPU had not yet acknowledge the new grace period
- (AKA "counter flip").
-
-o "a2" is the number of time rcu_try_flip_waitack() found that
- all CPUs had acknowledged. "a1" should be roughly equal to
- "ae1" plus "a2". (This particular output was collected on
- a 128-CPU machine, hence the smaller-than-usual fraction of
- calls to rcu_try_flip_waitack() finding all CPUs having already
- acknowledged.)
-
-o "z1" is the number of calls to rcu_try_flip_waitzero().
-
-o "ze1" is the number of times that rcu_try_flip_waitzero() found
- that not all of the old RCU read-side critical sections had
- completed.
-
-o "z2" is the number of times that rcu_try_flip_waitzero() finds
- the sum of the counters equal to zero, in other words, that
- all of the old RCU read-side critical sections had completed.
- The value of "z1" should be roughly equal to "ze1" plus
- "z2".
-
-o "m1" is the number of calls to rcu_try_flip_waitmb().
-
-o "me1" is the number of times that rcu_try_flip_waitmb() finds
- that at least one CPU has not yet executed a memory barrier.
-
-o "m2" is the number of times that rcu_try_flip_waitmb() finds that
- all CPUs have executed a memory barrier.
+The rcutree implementation of RCU provides debugfs trace output that
+summarizes counters and state. This information is useful for debugging
+RCU itself, and can sometimes also help to debug abuses of RCU.
+The following sections describe the debugfs files and formats.
Hierarchical RCU debugfs Files and Formats
@@ -210,9 +35,10 @@
6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
-The first section lists the rcu_data structures for rcu, the second for
-rcu_bh. Each section has one line per CPU, or eight for this 8-CPU system.
-The fields are as follows:
+The first section lists the rcu_data structures for rcu_sched, the second
+for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an
+additional section for rcu_preempt. Each section has one line per CPU,
+or eight for this 8-CPU system. The fields are as follows:
o The number at the beginning of each line is the CPU number.
CPUs numbers followed by an exclamation mark are offline,
@@ -223,9 +49,9 @@
o "c" is the count of grace periods that this CPU believes have
completed. CPUs in dynticks idle mode may lag quite a ways
- behind, for example, CPU 4 under "rcu" above, which has slept
- through the past 25 RCU grace periods. It is not unusual to
- see CPUs lagging by thousands of grace periods.
+ behind, for example, CPU 4 under "rcu_sched" above, which has
+ slept through the past 25 RCU grace periods. It is not unusual
+ to see CPUs lagging by thousands of grace periods.
o "g" is the count of grace periods that this CPU believes have
started. Again, CPUs in dynticks idle mode may lag behind.
@@ -308,8 +134,10 @@
rcu_sched: completed=33062 gpnum=33063
rcu_bh: completed=464 gpnum=464
-Again, this output is for both "rcu" and "rcu_bh". The fields are
-taken from the rcu_state structure, and are as follows:
+Again, this output is for both "rcu_sched" and "rcu_bh". Note that
+kernels built with CONFIG_TREE_PREEMPT_RCU will have an additional
+"rcu_preempt" line. The fields are taken from the rcu_state structure,
+and are as follows:
o "completed" is the number of grace periods that have completed.
It is comparable to the "c" field from rcu/rcudata in that a
@@ -324,23 +152,24 @@
If these two fields are equal (as they are for "rcu_bh" above),
then there is no grace period in progress, in other words, RCU
is idle. On the other hand, if the two fields differ (as they
- do for "rcu" above), then an RCU grace period is in progress.
+ do for "rcu_sched" above), then an RCU grace period is in progress.
The output of "cat rcu/rcuhier" looks as follows, with very long lines:
-c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6
-1/1 0:127 ^0
-3/3 0:35 ^0 0/0 36:71 ^1 0/0 72:107 ^2 0/0 108:127 ^3
-3/3f 0:5 ^0 2/3 6:11 ^1 0/0 12:17 ^2 0/0 18:23 ^3 0/0 24:29 ^4 0/0 30:35 ^5 0/0 36:41 ^0 0/0 42:47 ^1 0/0 48:53 ^2 0/0 54:59 ^3 0/0 60:65 ^4 0/0 66:71 ^5 0/0 72:77 ^0 0/0 78:83 ^1 0/0 84:89 ^2 0/0 90:95 ^3 0/0 96:101 ^4 0/0 102:107 ^5 0/0 108:113 ^0 0/0 114:119 ^1 0/0 120:125 ^2 0/0 126:127 ^3
+c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0
+1/1 .>. 0:127 ^0
+3/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3
+3/3f .>. 0:5 ^0 2/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3
rcu_bh:
-c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0
-0/1 0:127 ^0
-0/3 0:35 ^0 0/0 36:71 ^1 0/0 72:107 ^2 0/0 108:127 ^3
-0/3f 0:5 ^0 0/3 6:11 ^1 0/0 12:17 ^2 0/0 18:23 ^3 0/0 24:29 ^4 0/0 30:35 ^5 0/0 36:41 ^0 0/0 42:47 ^1 0/0 48:53 ^2 0/0 54:59 ^3 0/0 60:65 ^4 0/0 66:71 ^5 0/0 72:77 ^0 0/0 78:83 ^1 0/0 84:89 ^2 0/0 90:95 ^3 0/0 96:101 ^4 0/0 102:107 ^5 0/0 108:113 ^0 0/0 114:119 ^1 0/0 120:125 ^2 0/0 126:127 ^3
+c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0
+0/1 .>. 0:127 ^0
+0/3 .>. 0:35 ^0 0/0 .>. 36:71 ^1 0/0 .>. 72:107 ^2 0/0 .>. 108:127 ^3
+0/3f .>. 0:5 ^0 0/3 .>. 6:11 ^1 0/0 .>. 12:17 ^2 0/0 .>. 18:23 ^3 0/0 .>. 24:29 ^4 0/0 .>. 30:35 ^5 0/0 .>. 36:41 ^0 0/0 .>. 42:47 ^1 0/0 .>. 48:53 ^2 0/0 .>. 54:59 ^3 0/0 .>. 60:65 ^4 0/0 .>. 66:71 ^5 0/0 .>. 72:77 ^0 0/0 .>. 78:83 ^1 0/0 .>. 84:89 ^2 0/0 .>. 90:95 ^3 0/0 .>. 96:101 ^4 0/0 .>. 102:107 ^5 0/0 .>. 108:113 ^0 0/0 .>. 114:119 ^1 0/0 .>. 120:125 ^2 0/0 .>. 126:127 ^3
-This is once again split into "rcu" and "rcu_bh" portions. The fields are
-as follows:
+This is once again split into "rcu_sched" and "rcu_bh" portions,
+and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional
+"rcu_preempt" section. The fields are as follows:
o "c" is exactly the same as "completed" under rcu/rcugp.
@@ -372,6 +201,11 @@
exited immediately (without even being counted in nfqs above)
due to contention on ->fqslock.
+o "oqlen" is the number of callbacks on the "orphan" callback
+ list. RCU callbacks are placed on this list by CPUs going
+ offline, and are "adopted" either by the CPU helping the outgoing
+ CPU or by the next rcu_barrier*() call, whichever comes first.
+
o Each element of the form "1/1 0:127 ^0" represents one struct
rcu_node. Each line represents one level of the hierarchy, from
root to leaves. It is best to think of the rcu_data structures
@@ -379,7 +213,7 @@
might be either one, two, or three levels of rcu_node structures,
depending on the relationship between CONFIG_RCU_FANOUT and
CONFIG_NR_CPUS.
-
+
o The numbers separated by the "/" are the qsmask followed
by the qsmaskinit. The qsmask will have one bit
set for each entity in the next lower level that
@@ -389,10 +223,19 @@
The value of qsmaskinit is assigned to that of qsmask
at the beginning of each grace period.
- For example, for "rcu", the qsmask of the first entry
- of the lowest level is 0x14, meaning that we are still
- waiting for CPUs 2 and 4 to check in for the current
- grace period.
+ For example, for "rcu_sched", the qsmask of the first
+ entry of the lowest level is 0x14, meaning that we
+ are still waiting for CPUs 2 and 4 to check in for the
+ current grace period.
+
+ o The characters separated by the ">" indicate the state
+ of the blocked-tasks lists. A "T" preceding the ">"
+ indicates that at least one task blocked in an RCU
+ read-side critical section blocks the current grace
+ period, while a "." preceding the ">" indicates otherwise.
+ The character following the ">" indicates similarly for
+ the next grace period. A "T" should appear in this
+ field only for rcu-preempt.
o The numbers separated by the ":" are the range of CPUs
served by this struct rcu_node. This can be helpful
@@ -431,8 +274,9 @@
6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
-As always, this is once again split into "rcu" and "rcu_bh" portions.
-The fields are as follows:
+As always, this is once again split into "rcu_sched" and "rcu_bh"
+portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional
+"rcu_preempt" section. The fields are as follows:
o "np" is the number of times that __rcu_pending() has been invoked
for the corresponding flavor of RCU.
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index e41a7fe..d542ca2 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -830,7 +830,7 @@
SRCU: Critical sections Grace period Barrier
srcu_read_lock synchronize_srcu N/A
- srcu_read_unlock
+ srcu_read_unlock synchronize_srcu_expedited
SRCU: Initialization/cleanup
init_srcu_struct
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 6d527ee..d5b3876 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -139,10 +139,34 @@
#endif
#if defined(CONFIG_NO_HZ)
+#if defined(CONFIG_TINY_RCU)
+extern void rcu_enter_nohz(void);
+extern void rcu_exit_nohz(void);
+
+static inline void rcu_irq_enter(void)
+{
+ rcu_exit_nohz();
+}
+
+static inline void rcu_irq_exit(void)
+{
+ rcu_enter_nohz();
+}
+
+static inline void rcu_nmi_enter(void)
+{
+}
+
+static inline void rcu_nmi_exit(void)
+{
+}
+
+#else
extern void rcu_irq_enter(void);
extern void rcu_irq_exit(void);
extern void rcu_nmi_enter(void);
extern void rcu_nmi_exit(void);
+#endif
#else
# define rcu_irq_enter() do { } while (0)
# define rcu_irq_exit() do { } while (0)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3ebd0b7..2f1bc42 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -68,11 +68,17 @@
/* Internal to kernel */
extern void rcu_init(void);
extern void rcu_scheduler_starting(void);
+#ifndef CONFIG_TINY_RCU
extern int rcu_needs_cpu(int cpu);
+#else
+static inline int rcu_needs_cpu(int cpu) { return 0; }
+#endif
extern int rcu_scheduler_active;
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
#include <linux/rcutree.h>
+#elif defined(CONFIG_TINY_RCU)
+#include <linux/rcutiny.h>
#else
#error "Unknown RCU implementation specified to kernel configuration"
#endif
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
new file mode 100644
index 0000000..2c1fe83
--- /dev/null
+++ b/include/linux/rcutiny.h
@@ -0,0 +1,95 @@
+/*
+ * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright IBM Corporation, 2008
+ *
+ * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * Documentation/RCU
+ */
+#ifndef __LINUX_TINY_H
+#define __LINUX_TINY_H
+
+#include <linux/cache.h>
+
+void rcu_sched_qs(int cpu);
+void rcu_bh_qs(int cpu);
+
+#define __rcu_read_lock() preempt_disable()
+#define __rcu_read_unlock() preempt_enable()
+#define __rcu_read_lock_bh() local_bh_disable()
+#define __rcu_read_unlock_bh() local_bh_enable()
+#define call_rcu_sched call_rcu
+
+#define rcu_init_sched() do { } while (0)
+extern void rcu_check_callbacks(int cpu, int user);
+extern void __rcu_init(void);
+
+/*
+ * Return the number of grace periods.
+ */
+static inline long rcu_batches_completed(void)
+{
+ return 0;
+}
+
+/*
+ * Return the number of bottom-half grace periods.
+ */
+static inline long rcu_batches_completed_bh(void)
+{
+ return 0;
+}
+
+extern int rcu_expedited_torture_stats(char *page);
+
+static inline void synchronize_rcu_expedited(void)
+{
+ synchronize_sched();
+}
+
+static inline void synchronize_rcu_bh_expedited(void)
+{
+ synchronize_sched();
+}
+
+struct notifier_block;
+extern int rcu_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu);
+
+#ifdef CONFIG_NO_HZ
+
+extern void rcu_enter_nohz(void);
+extern void rcu_exit_nohz(void);
+
+#else /* #ifdef CONFIG_NO_HZ */
+
+static inline void rcu_enter_nohz(void)
+{
+}
+
+static inline void rcu_exit_nohz(void)
+{
+}
+
+#endif /* #else #ifdef CONFIG_NO_HZ */
+
+static inline void exit_rcu(void)
+{
+}
+
+#endif /* __LINUX_RCUTINY_H */
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index aca0eee..4765d97 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -48,6 +48,7 @@
int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
void synchronize_srcu(struct srcu_struct *sp);
+void synchronize_srcu_expedited(struct srcu_struct *sp);
long srcu_batches_completed(struct srcu_struct *sp);
#endif
diff --git a/init/Kconfig b/init/Kconfig
index 09c5c64..d367bf6 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -334,6 +334,15 @@
is also required. It also scales down nicely to
smaller systems.
+config TINY_RCU
+ bool "UP-only small-memory-footprint RCU"
+ depends on !SMP
+ help
+ This option selects the RCU implementation that is
+ designed for UP systems from which real-time response
+ is not required. This option greatly reduces the
+ memory footprint of RCU.
+
endchoice
config RCU_TRACE
diff --git a/kernel/Makefile b/kernel/Makefile
index b8d4cd8..e9a0e6e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -82,6 +82,7 @@
obj-$(CONFIG_TREE_RCU) += rcutree.o
obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
+obj-$(CONFIG_TINY_RCU) += rcutiny.o
obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 4001833..7625f20 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -67,6 +67,8 @@
complete(&rcu->completion);
}
+#ifndef CONFIG_TINY_RCU
+
#ifdef CONFIG_TREE_PREEMPT_RCU
/**
@@ -157,6 +159,8 @@
}
EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
+#endif /* #ifndef CONFIG_TINY_RCU */
+
static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
unsigned long action, void *hcpu)
{
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
new file mode 100644
index 0000000..b33ec3aa
--- /dev/null
+++ b/kernel/rcutiny.c
@@ -0,0 +1,291 @@
+/*
+ * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright IBM Corporation, 2008
+ *
+ * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * Documentation/RCU
+ */
+#include <linux/moduleparam.h>
+#include <linux/completion.h>
+#include <linux/interrupt.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/time.h>
+#include <linux/cpu.h>
+
+/* Global control variables for rcupdate callback mechanism. */
+struct rcu_ctrlblk {
+ struct rcu_head *rcucblist; /* List of pending callbacks (CBs). */
+ struct rcu_head **donetail; /* ->next pointer of last "done" CB. */
+ struct rcu_head **curtail; /* ->next pointer of last CB. */
+};
+
+/* Definition for rcupdate control block. */
+static struct rcu_ctrlblk rcu_ctrlblk = {
+ .donetail = &rcu_ctrlblk.rcucblist,
+ .curtail = &rcu_ctrlblk.rcucblist,
+};
+
+static struct rcu_ctrlblk rcu_bh_ctrlblk = {
+ .donetail = &rcu_bh_ctrlblk.rcucblist,
+ .curtail = &rcu_bh_ctrlblk.rcucblist,
+};
+
+#ifdef CONFIG_NO_HZ
+
+static long rcu_dynticks_nesting = 1;
+
+/*
+ * Enter dynticks-idle mode, which is an extended quiescent state
+ * if we have fully entered that mode (i.e., if the new value of
+ * dynticks_nesting is zero).
+ */
+void rcu_enter_nohz(void)
+{
+ if (--rcu_dynticks_nesting == 0)
+ rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
+}
+
+/*
+ * Exit dynticks-idle mode, so that we are no longer in an extended
+ * quiescent state.
+ */
+void rcu_exit_nohz(void)
+{
+ rcu_dynticks_nesting++;
+}
+
+#endif /* #ifdef CONFIG_NO_HZ */
+
+/*
+ * Helper function for rcu_qsctr_inc() and rcu_bh_qsctr_inc().
+ * Also disable irqs to avoid confusion due to interrupt handlers
+ * invoking call_rcu().
+ */
+static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ if (rcp->rcucblist != NULL &&
+ rcp->donetail != rcp->curtail) {
+ rcp->donetail = rcp->curtail;
+ local_irq_restore(flags);
+ return 1;
+ }
+ local_irq_restore(flags);
+
+ return 0;
+}
+
+/*
+ * Record an rcu quiescent state. And an rcu_bh quiescent state while we
+ * are at it, given that any rcu quiescent state is also an rcu_bh
+ * quiescent state. Use "+" instead of "||" to defeat short circuiting.
+ */
+void rcu_sched_qs(int cpu)
+{
+ if (rcu_qsctr_help(&rcu_ctrlblk) + rcu_qsctr_help(&rcu_bh_ctrlblk))
+ raise_softirq(RCU_SOFTIRQ);
+}
+
+/*
+ * Record an rcu_bh quiescent state.
+ */
+void rcu_bh_qs(int cpu)
+{
+ if (rcu_qsctr_help(&rcu_bh_ctrlblk))
+ raise_softirq(RCU_SOFTIRQ);
+}
+
+/*
+ * Check to see if the scheduling-clock interrupt came from an extended
+ * quiescent state, and, if so, tell RCU about it.
+ */
+void rcu_check_callbacks(int cpu, int user)
+{
+ if (user ||
+ (idle_cpu(cpu) &&
+ !in_softirq() &&
+ hardirq_count() <= (1 << HARDIRQ_SHIFT)))
+ rcu_sched_qs(cpu);
+ else if (!in_softirq())
+ rcu_bh_qs(cpu);
+}
+
+/*
+ * Helper function for rcu_process_callbacks() that operates on the
+ * specified rcu_ctrlkblk structure.
+ */
+static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
+{
+ struct rcu_head *next, *list;
+ unsigned long flags;
+
+ /* If no RCU callbacks ready to invoke, just return. */
+ if (&rcp->rcucblist == rcp->donetail)
+ return;
+
+ /* Move the ready-to-invoke callbacks to a local list. */
+ local_irq_save(flags);
+ list = rcp->rcucblist;
+ rcp->rcucblist = *rcp->donetail;
+ *rcp->donetail = NULL;
+ if (rcp->curtail == rcp->donetail)
+ rcp->curtail = &rcp->rcucblist;
+ rcp->donetail = &rcp->rcucblist;
+ local_irq_restore(flags);
+
+ /* Invoke the callbacks on the local list. */
+ while (list) {
+ next = list->next;
+ prefetch(next);
+ list->func(list);
+ list = next;
+ }
+}
+
+/*
+ * Invoke any callbacks whose grace period has completed.
+ */
+static void rcu_process_callbacks(struct softirq_action *unused)
+{
+ __rcu_process_callbacks(&rcu_ctrlblk);
+ __rcu_process_callbacks(&rcu_bh_ctrlblk);
+}
+
+/*
+ * Null function to handle CPU being onlined. Longer term, we want to
+ * make TINY_RCU avoid using rcupdate.c, but later...
+ */
+int rcu_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
+{
+ return NOTIFY_OK;
+}
+
+/*
+ * Wait for a grace period to elapse. But it is illegal to invoke
+ * synchronize_sched() from within an RCU read-side critical section.
+ * Therefore, any legal call to synchronize_sched() is a quiescent
+ * state, and so on a UP system, synchronize_sched() need do nothing.
+ * Ditto for synchronize_rcu_bh(). (But Lai Jiangshan points out the
+ * benefits of doing might_sleep() to reduce latency.)
+ *
+ * Cool, huh? (Due to Josh Triplett.)
+ *
+ * But we want to make this a static inline later.
+ */
+void synchronize_sched(void)
+{
+ cond_resched();
+}
+EXPORT_SYMBOL_GPL(synchronize_sched);
+
+void synchronize_rcu_bh(void)
+{
+ synchronize_sched();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
+
+/*
+ * Helper function for call_rcu() and call_rcu_bh().
+ */
+static void __call_rcu(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu),
+ struct rcu_ctrlblk *rcp)
+{
+ unsigned long flags;
+
+ head->func = func;
+ head->next = NULL;
+
+ local_irq_save(flags);
+ *rcp->curtail = head;
+ rcp->curtail = &head->next;
+ local_irq_restore(flags);
+}
+
+/*
+ * Post an RCU callback to be invoked after the end of an RCU grace
+ * period. But since we have but one CPU, that would be after any
+ * quiescent state.
+ */
+void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+ __call_rcu(head, func, &rcu_ctrlblk);
+}
+EXPORT_SYMBOL_GPL(call_rcu);
+
+/*
+ * Post an RCU bottom-half callback to be invoked after any subsequent
+ * quiescent state.
+ */
+void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+ __call_rcu(head, func, &rcu_bh_ctrlblk);
+}
+EXPORT_SYMBOL_GPL(call_rcu_bh);
+
+void rcu_barrier(void)
+{
+ struct rcu_synchronize rcu;
+
+ init_completion(&rcu.completion);
+ /* Will wake me after RCU finished. */
+ call_rcu(&rcu.head, wakeme_after_rcu);
+ /* Wait for it. */
+ wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier);
+
+void rcu_barrier_bh(void)
+{
+ struct rcu_synchronize rcu;
+
+ init_completion(&rcu.completion);
+ /* Will wake me after RCU finished. */
+ call_rcu_bh(&rcu.head, wakeme_after_rcu);
+ /* Wait for it. */
+ wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier_bh);
+
+void rcu_barrier_sched(void)
+{
+ struct rcu_synchronize rcu;
+
+ init_completion(&rcu.completion);
+ /* Will wake me after RCU finished. */
+ call_rcu_sched(&rcu.head, wakeme_after_rcu);
+ /* Wait for it. */
+ wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier_sched);
+
+void __rcu_init(void)
+{
+ open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
+}
diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 697c0a0..3dd0ca2 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c
@@ -547,6 +547,25 @@
.name = "srcu"
};
+static void srcu_torture_synchronize_expedited(void)
+{
+ synchronize_srcu_expedited(&srcu_ctl);
+}
+
+static struct rcu_torture_ops srcu_expedited_ops = {
+ .init = srcu_torture_init,
+ .cleanup = srcu_torture_cleanup,
+ .readlock = srcu_torture_read_lock,
+ .read_delay = srcu_read_delay,
+ .readunlock = srcu_torture_read_unlock,
+ .completed = srcu_torture_completed,
+ .deferred_free = rcu_sync_torture_deferred_free,
+ .sync = srcu_torture_synchronize_expedited,
+ .cb_barrier = NULL,
+ .stats = srcu_torture_stats,
+ .name = "srcu_expedited"
+};
+
/*
* Definitions for sched torture testing.
*/
@@ -592,7 +611,7 @@
.name = "sched"
};
-static struct rcu_torture_ops sched_ops_sync = {
+static struct rcu_torture_ops sched_sync_ops = {
.init = rcu_sync_torture_init,
.cleanup = NULL,
.readlock = sched_torture_read_lock,
@@ -1098,8 +1117,8 @@
int firsterr = 0;
static struct rcu_torture_ops *torture_ops[] =
{ &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops,
- &sched_expedited_ops,
- &srcu_ops, &sched_ops, &sched_ops_sync, };
+ &srcu_ops, &srcu_expedited_ops,
+ &sched_ops, &sched_sync_ops, &sched_expedited_ops, };
mutex_lock(&fullstop_mutex);
@@ -1110,8 +1129,12 @@
break;
}
if (i == ARRAY_SIZE(torture_ops)) {
- printk(KERN_ALERT "rcutorture: invalid torture type: \"%s\"\n",
+ printk(KERN_ALERT "rcu-torture: invalid torture type: \"%s\"\n",
torture_type);
+ printk(KERN_ALERT "rcu-torture types:");
+ for (i = 0; i < ARRAY_SIZE(torture_ops); i++)
+ printk(KERN_ALERT " %s", torture_ops[i]->name);
+ printk(KERN_ALERT "\n");
mutex_unlock(&fullstop_mutex);
return -EINVAL;
}
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index f3077c0..69f4efe 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -51,6 +51,8 @@
/* Data structures. */
+static struct lock_class_key rcu_root_class;
+
#define RCU_STATE_INITIALIZER(name) { \
.level = { &name.node[0] }, \
.levelcnt = { \
@@ -1685,8 +1687,7 @@
cpustride *= rsp->levelspread[i];
rnp = rsp->level[i];
for (j = 0; j < rsp->levelcnt[i]; j++, rnp++) {
- if (rnp != rcu_get_root(rsp))
- spin_lock_init(&rnp->lock);
+ spin_lock_init(&rnp->lock);
rnp->gpnum = 0;
rnp->qsmask = 0;
rnp->qsmaskinit = 0;
@@ -1709,7 +1710,7 @@
INIT_LIST_HEAD(&rnp->blocked_tasks[1]);
}
}
- spin_lock_init(&rcu_get_root(rsp)->lock);
+ lockdep_set_class(&rcu_get_root(rsp)->lock, &rcu_root_class);
}
/*
diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 4b31c77..1984cdc 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c
@@ -155,12 +155,14 @@
static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
{
+ long gpnum;
int level = 0;
struct rcu_node *rnp;
+ gpnum = rsp->gpnum;
seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x "
"nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld\n",
- rsp->completed, rsp->gpnum, rsp->signaled,
+ rsp->completed, gpnum, rsp->signaled,
(long)(rsp->jiffies_force_qs - jiffies),
(int)(jiffies & 0xffff),
rsp->n_force_qs, rsp->n_force_qs_ngp,
@@ -171,8 +173,10 @@
seq_puts(m, "\n");
level = rnp->level;
}
- seq_printf(m, "%lx/%lx %d:%d ^%d ",
+ seq_printf(m, "%lx/%lx %c>%c %d:%d ^%d ",
rnp->qsmask, rnp->qsmaskinit,
+ "T."[list_empty(&rnp->blocked_tasks[gpnum & 1])],
+ "T."[list_empty(&rnp->blocked_tasks[!(gpnum & 1)])],
rnp->grplo, rnp->grphi, rnp->grpnum);
}
seq_puts(m, "\n");
diff --git a/kernel/softirq.c b/kernel/softirq.c
index f8749e5..21939d9 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -302,9 +302,9 @@
if (!in_interrupt() && local_softirq_pending())
invoke_softirq();
+ rcu_irq_exit();
#ifdef CONFIG_NO_HZ
/* Make sure that timer wheel updates are propagated */
- rcu_irq_exit();
if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
tick_nohz_stop_sched_tick(0);
#endif
diff --git a/kernel/srcu.c b/kernel/srcu.c
index b0aeeaf..818d7d9 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c
@@ -49,6 +49,7 @@
sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
return (sp->per_cpu_ref ? 0 : -ENOMEM);
}
+EXPORT_SYMBOL_GPL(init_srcu_struct);
/*
* srcu_readers_active_idx -- returns approximate number of readers
@@ -97,6 +98,7 @@
free_percpu(sp->per_cpu_ref);
sp->per_cpu_ref = NULL;
}
+EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
/**
* srcu_read_lock - register a new reader for an SRCU-protected structure.
@@ -118,6 +120,7 @@
preempt_enable();
return idx;
}
+EXPORT_SYMBOL_GPL(srcu_read_lock);
/**
* srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
@@ -136,22 +139,12 @@
per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
preempt_enable();
}
+EXPORT_SYMBOL_GPL(srcu_read_unlock);
-/**
- * synchronize_srcu - wait for prior SRCU read-side critical-section completion
- * @sp: srcu_struct with which to synchronize.
- *
- * Flip the completed counter, and wait for the old count to drain to zero.
- * As with classic RCU, the updater must use some separate means of
- * synchronizing concurrent updates. Can block; must be called from
- * process context.
- *
- * Note that it is illegal to call synchornize_srcu() from the corresponding
- * SRCU read-side critical section; doing so will result in deadlock.
- * However, it is perfectly legal to call synchronize_srcu() on one
- * srcu_struct from some other srcu_struct's read-side critical section.
+/*
+ * Helper function for synchronize_srcu() and synchronize_srcu_expedited().
*/
-void synchronize_srcu(struct srcu_struct *sp)
+void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
{
int idx;
@@ -173,7 +166,7 @@
return;
}
- synchronize_sched(); /* Force memory barrier on all CPUs. */
+ sync_func(); /* Force memory barrier on all CPUs. */
/*
* The preceding synchronize_sched() ensures that any CPU that
@@ -190,7 +183,7 @@
idx = sp->completed & 0x1;
sp->completed++;
- synchronize_sched(); /* Force memory barrier on all CPUs. */
+ sync_func(); /* Force memory barrier on all CPUs. */
/*
* At this point, because of the preceding synchronize_sched(),
@@ -203,7 +196,7 @@
while (srcu_readers_active_idx(sp, idx))
schedule_timeout_interruptible(1);
- synchronize_sched(); /* Force memory barrier on all CPUs. */
+ sync_func(); /* Force memory barrier on all CPUs. */
/*
* The preceding synchronize_sched() forces all srcu_read_unlock()
@@ -237,6 +230,47 @@
}
/**
+ * synchronize_srcu - wait for prior SRCU read-side critical-section completion
+ * @sp: srcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates. Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_srcu() from the corresponding
+ * SRCU read-side critical section; doing so will result in deadlock.
+ * However, it is perfectly legal to call synchronize_srcu() on one
+ * srcu_struct from some other srcu_struct's read-side critical section.
+ */
+void synchronize_srcu(struct srcu_struct *sp)
+{
+ __synchronize_srcu(sp, synchronize_sched);
+}
+EXPORT_SYMBOL_GPL(synchronize_srcu);
+
+/**
+ * synchronize_srcu_expedited - like synchronize_srcu, but less patient
+ * @sp: srcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates. Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_srcu_expedited()
+ * from the corresponding SRCU read-side critical section; doing so
+ * will result in deadlock. However, it is perfectly legal to call
+ * synchronize_srcu_expedited() on one srcu_struct from some other
+ * srcu_struct's read-side critical section.
+ */
+void synchronize_srcu_expedited(struct srcu_struct *sp)
+{
+ __synchronize_srcu(sp, synchronize_sched_expedited);
+}
+EXPORT_SYMBOL_GPL(synchronize_srcu_expedited);
+
+/**
* srcu_batches_completed - return batches completed.
* @sp: srcu_struct on which to report batch completion.
*
@@ -248,10 +282,4 @@
{
return sp->completed;
}
-
-EXPORT_SYMBOL_GPL(init_srcu_struct);
-EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
-EXPORT_SYMBOL_GPL(srcu_read_lock);
-EXPORT_SYMBOL_GPL(srcu_read_unlock);
-EXPORT_SYMBOL_GPL(synchronize_srcu);
EXPORT_SYMBOL_GPL(srcu_batches_completed);