Merge branch 'core/urgent' into core/rcu Merge reason: Pick up RCU fixlet to base further commits on. Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit: 7e1a2766e67a529f62c8cfba0a47d63fc4f7fa8a [log] [tgz]
author: Ingo Molnar <mingo@elte.hu> Tue Nov 10 04:10:31 2009 +0100
committer: Ingo Molnar <mingo@elte.hu> Tue Nov 10 04:10:35 2009 +0100
tree: 197950369a773afdf04af9bdfc4a2ce1d2b5d3af
parent: c5e0cb3ddc5f14cedcfc50c0fb3b5fc6b56576da [diff]
parent: 83f5b01ffbbaea6f97c9a79d21e240dbfb69f2f1 [diff]
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 187bbf1..8608fd8 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt

@@ -1,185 +1,10 @@
 CONFIG_RCU_TRACE debugfs Files and Formats
 
 
-The rcupreempt and rcutree implementations of RCU provide debugfs trace
-output that summarizes counters and state.  This information is useful for
-debugging RCU itself, and can sometimes also help to debug abuses of RCU.
-Note that the rcuclassic implementation of RCU does not provide debugfs
-trace output.
-
-The following sections describe the debugfs files and formats for
-preemptable RCU (rcupreempt) and hierarchical RCU (rcutree).
-
-
-Preemptable RCU debugfs Files and Formats
-
-This implementation of RCU provides three debugfs files under the
-top-level directory RCU: rcu/rcuctrs (which displays the per-CPU
-counters used by preemptable RCU) rcu/rcugp (which displays grace-period
-counters), and rcu/rcustats (which internal counters for debugging RCU).
-
-The output of "cat rcu/rcuctrs" looks as follows:
-
-CPU last cur F M
-  0    5  -5 0 0
-  1   -1   0 0 0
-  2    0   1 0 0
-  3    0   1 0 0
-  4    0   1 0 0
-  5    0   1 0 0
-  6    0   2 0 0
-  7    0  -1 0 0
-  8    0   1 0 0
-ggp = 26226, state = waitzero
-
-The per-CPU fields are as follows:
-
-o	"CPU" gives the CPU number.  Offline CPUs are not displayed.
-
-o	"last" gives the value of the counter that is being decremented
-	for the current grace period phase.  In the example above,
-	the counters sum to 4, indicating that there are still four
-	RCU read-side critical sections still running that started
-	before the last counter flip.
-
-o	"cur" gives the value of the counter that is currently being
-	both incremented (by rcu_read_lock()) and decremented (by
-	rcu_read_unlock()).  In the example above, the counters sum to
-	1, indicating that there is only one RCU read-side critical section
-	still running that started after the last counter flip.
-
-o	"F" indicates whether RCU is waiting for this CPU to acknowledge
-	a counter flip.  In the above example, RCU is not waiting on any,
-	which is consistent with the state being "waitzero" rather than
-	"waitack".
-
-o	"M" indicates whether RCU is waiting for this CPU to execute a
-	memory barrier.  In the above example, RCU is not waiting on any,
-	which is consistent with the state being "waitzero" rather than
-	"waitmb".
-
-o	"ggp" is the global grace-period counter.
-
-o	"state" is the RCU state, which can be one of the following:
-
-	o	"idle": there is no grace period in progress.
-
-	o	"waitack": RCU just incremented the global grace-period
-		counter, which has the effect of reversing the roles of
-		the "last" and "cur" counters above, and is waiting for
-		all the CPUs to acknowledge the flip.  Once the flip has
-		been acknowledged, CPUs will no longer be incrementing
-		what are now the "last" counters, so that their sum will
-		decrease monotonically down to zero.
-
-	o	"waitzero": RCU is waiting for the sum of the "last" counters
-		to decrease to zero.
-
-	o	"waitmb": RCU is waiting for each CPU to execute a memory
-		barrier, which ensures that instructions from a given CPU's
-		last RCU read-side critical section cannot be reordered
-		with instructions following the memory-barrier instruction.
-
-The output of "cat rcu/rcugp" looks as follows:
-
-oldggp=48870  newggp=48873
-
-Note that reading from this file provokes a synchronize_rcu().  The
-"oldggp" value is that of "ggp" from rcu/rcuctrs above, taken before
-executing the synchronize_rcu(), and the "newggp" value is also the
-"ggp" value, but taken after the synchronize_rcu() command returns.
-
-
-The output of "cat rcu/rcugp" looks as follows:
-
-na=1337955 nl=40 wa=1337915 wl=44 da=1337871 dl=0 dr=1337871 di=1337871
-1=50989 e1=6138 i1=49722 ie1=82 g1=49640 a1=315203 ae1=265563 a2=49640
-z1=1401244 ze1=1351605 z2=49639 m1=5661253 me1=5611614 m2=49639
-
-These are counters tracking internal preemptable-RCU events, however,
-some of them may be useful for debugging algorithms using RCU.  In
-particular, the "nl", "wl", and "dl" values track the number of RCU
-callbacks in various states.  The fields are as follows:
-
-o	"na" is the total number of RCU callbacks that have been enqueued
-	since boot.
-
-o	"nl" is the number of RCU callbacks waiting for the previous
-	grace period to end so that they can start waiting on the next
-	grace period.
-
-o	"wa" is the total number of RCU callbacks that have started waiting
-	for a grace period since boot.  "na" should be roughly equal to
-	"nl" plus "wa".
-
-o	"wl" is the number of RCU callbacks currently waiting for their
-	grace period to end.
-
-o	"da" is the total number of RCU callbacks whose grace periods
-	have completed since boot.  "wa" should be roughly equal to
-	"wl" plus "da".
-
-o	"dr" is the total number of RCU callbacks that have been removed
-	from the list of callbacks ready to invoke.  "dr" should be roughly
-	equal to "da".
-
-o	"di" is the total number of RCU callbacks that have been invoked
-	since boot.  "di" should be roughly equal to "da", though some
-	early versions of preemptable RCU had a bug so that only the
-	last CPU's count of invocations was displayed, rather than the
-	sum of all CPU's counts.
-
-o	"1" is the number of calls to rcu_try_flip().  This should be
-	roughly equal to the sum of "e1", "i1", "a1", "z1", and "m1"
-	described below.  In other words, the number of times that
-	the state machine is visited should be equal to the sum of the
-	number of times that each state is visited plus the number of
-	times that the state-machine lock acquisition failed.
-
-o	"e1" is the number of times that rcu_try_flip() was unable to
-	acquire the fliplock.
-
-o	"i1" is the number of calls to rcu_try_flip_idle().
-
-o	"ie1" is the number of times rcu_try_flip_idle() exited early
-	due to the calling CPU having no work for RCU.
-
-o	"g1" is the number of times that rcu_try_flip_idle() decided
-	to start a new grace period.  "i1" should be roughly equal to
-	"ie1" plus "g1".
-
-o	"a1" is the number of calls to rcu_try_flip_waitack().
-
-o	"ae1" is the number of times that rcu_try_flip_waitack() found
-	that at least one CPU had not yet acknowledge the new grace period
-	(AKA "counter flip").
-
-o	"a2" is the number of time rcu_try_flip_waitack() found that
-	all CPUs had acknowledged.  "a1" should be roughly equal to
-	"ae1" plus "a2".  (This particular output was collected on
-	a 128-CPU machine, hence the smaller-than-usual fraction of
-	calls to rcu_try_flip_waitack() finding all CPUs having already
-	acknowledged.)
-
-o	"z1" is the number of calls to rcu_try_flip_waitzero().
-
-o	"ze1" is the number of times that rcu_try_flip_waitzero() found
-	that not all of the old RCU read-side critical sections had
-	completed.
-
-o	"z2" is the number of times that rcu_try_flip_waitzero() finds
-	the sum of the counters equal to zero, in other words, that
-	all of the old RCU read-side critical sections had completed.
-	The value of "z1" should be roughly equal to "ze1" plus
-	"z2".
-
-o	"m1" is the number of calls to rcu_try_flip_waitmb().
-
-o	"me1" is the number of times that rcu_try_flip_waitmb() finds
-	that at least one CPU has not yet executed a memory barrier.
-
-o	"m2" is the number of times that rcu_try_flip_waitmb() finds that
-	all CPUs have executed a memory barrier.
+The rcutree implementation of RCU provides debugfs trace output that
+summarizes counters and state.  This information is useful for debugging
+RCU itself, and can sometimes also help to debug abuses of RCU.
+The following sections describe the debugfs files and formats.
 
 
 Hierarchical RCU debugfs Files and Formats
@@ -210,9 +35,10 @@
   6 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=859/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
   7 c=-275 g=-275 pq=1 pqc=-275 qp=0 dt=3761/1 dn=0 df=15 of=0 ri=0 ql=0 b=10
 
-The first section lists the rcu_data structures for rcu, the second for
-rcu_bh.  Each section has one line per CPU, or eight for this 8-CPU system.
-The fields are as follows:
+The first section lists the rcu_data structures for rcu_sched, the second
+for rcu_bh.  Note that CONFIG_TREE_PREEMPT_RCU kernels will have an
+additional section for rcu_preempt.  Each section has one line per CPU,
+or eight for this 8-CPU system.  The fields are as follows:
 
 o	The number at the beginning of each line is the CPU number.
 	CPUs numbers followed by an exclamation mark are offline,
@@ -223,9 +49,9 @@
 
 o	"c" is the count of grace periods that this CPU believes have
 	completed.  CPUs in dynticks idle mode may lag quite a ways
-	behind, for example, CPU 4 under "rcu" above, which has slept
-	through the past 25 RCU grace periods.	It is not unusual to
-	see CPUs lagging by thousands of grace periods.
+	behind, for example, CPU 4 under "rcu_sched" above, which has
+	slept through the past 25 RCU grace periods.  It is not unusual
+	to see CPUs lagging by thousands of grace periods.
 
 o	"g" is the count of grace periods that this CPU believes have
 	started.  Again, CPUs in dynticks idle mode may lag behind.
@@ -308,8 +134,10 @@
 rcu_sched: completed=33062  gpnum=33063
 rcu_bh: completed=464  gpnum=464
 
-Again, this output is for both "rcu" and "rcu_bh".  The fields are
-taken from the rcu_state structure, and are as follows:
+Again, this output is for both "rcu_sched" and "rcu_bh".  Note that
+kernels built with CONFIG_TREE_PREEMPT_RCU will have an additional
+"rcu_preempt" line.  The fields are taken from the rcu_state structure,
+and are as follows:
 
 o	"completed" is the number of grace periods that have completed.
 	It is comparable to the "c" field from rcu/rcudata in that a
@@ -324,23 +152,24 @@
 	If these two fields are equal (as they are for "rcu_bh" above),
 	then there is no grace period in progress, in other words, RCU
 	is idle.  On the other hand, if the two fields differ (as they
-	do for "rcu" above), then an RCU grace period is in progress.
+	do for "rcu_sched" above), then an RCU grace period is in progress.
 
 
 The output of "cat rcu/rcuhier" looks as follows, with very long lines:
 
-c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6
-1/1 0:127 ^0    
-3/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
-3/3f 0:5 ^0    2/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3    
+c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6 oqlen=0
+1/1 .>. 0:127 ^0    
+3/3 .>. 0:35 ^0    0/0 .>. 36:71 ^1    0/0 .>. 72:107 ^2    0/0 .>. 108:127 ^3    
+3/3f .>. 0:5 ^0    2/3 .>. 6:11 ^1    0/0 .>. 12:17 ^2    0/0 .>. 18:23 ^3    0/0 .>. 24:29 ^4    0/0 .>. 30:35 ^5    0/0 .>. 36:41 ^0    0/0 .>. 42:47 ^1    0/0 .>. 48:53 ^2    0/0 .>. 54:59 ^3    0/0 .>. 60:65 ^4    0/0 .>. 66:71 ^5    0/0 .>. 72:77 ^0    0/0 .>. 78:83 ^1    0/0 .>. 84:89 ^2    0/0 .>. 90:95 ^3    0/0 .>. 96:101 ^4    0/0 .>. 102:107 ^5    0/0 .>. 108:113 ^0    0/0 .>. 114:119 ^1    0/0 .>. 120:125 ^2    0/0 .>. 126:127 ^3    
 rcu_bh:
-c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0
-0/1 0:127 ^0    
-0/3 0:35 ^0    0/0 36:71 ^1    0/0 72:107 ^2    0/0 108:127 ^3    
-0/3f 0:5 ^0    0/3 6:11 ^1    0/0 12:17 ^2    0/0 18:23 ^3    0/0 24:29 ^4    0/0 30:35 ^5    0/0 36:41 ^0    0/0 42:47 ^1    0/0 48:53 ^2    0/0 54:59 ^3    0/0 60:65 ^4    0/0 66:71 ^5    0/0 72:77 ^0    0/0 78:83 ^1    0/0 84:89 ^2    0/0 90:95 ^3    0/0 96:101 ^4    0/0 102:107 ^5    0/0 108:113 ^0    0/0 114:119 ^1    0/0 120:125 ^2    0/0 126:127 ^3
+c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0 oqlen=0
+0/1 .>. 0:127 ^0    
+0/3 .>. 0:35 ^0    0/0 .>. 36:71 ^1    0/0 .>. 72:107 ^2    0/0 .>. 108:127 ^3    
+0/3f .>. 0:5 ^0    0/3 .>. 6:11 ^1    0/0 .>. 12:17 ^2    0/0 .>. 18:23 ^3    0/0 .>. 24:29 ^4    0/0 .>. 30:35 ^5    0/0 .>. 36:41 ^0    0/0 .>. 42:47 ^1    0/0 .>. 48:53 ^2    0/0 .>. 54:59 ^3    0/0 .>. 60:65 ^4    0/0 .>. 66:71 ^5    0/0 .>. 72:77 ^0    0/0 .>. 78:83 ^1    0/0 .>. 84:89 ^2    0/0 .>. 90:95 ^3    0/0 .>. 96:101 ^4    0/0 .>. 102:107 ^5    0/0 .>. 108:113 ^0    0/0 .>. 114:119 ^1    0/0 .>. 120:125 ^2    0/0 .>. 126:127 ^3
 
-This is once again split into "rcu" and "rcu_bh" portions.  The fields are
-as follows:
+This is once again split into "rcu_sched" and "rcu_bh" portions,
+and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional
+"rcu_preempt" section.  The fields are as follows:
 
 o	"c" is exactly the same as "completed" under rcu/rcugp.
 
@@ -372,6 +201,11 @@
 	exited immediately (without even being counted in nfqs above)
 	due to contention on ->fqslock.
 
+o	"oqlen" is the number of callbacks on the "orphan" callback
+	list.  RCU callbacks are placed on this list by CPUs going
+	offline, and are "adopted" either by the CPU helping the outgoing
+	CPU or by the next rcu_barrier*() call, whichever comes first.
+
 o	Each element of the form "1/1 0:127 ^0" represents one struct
 	rcu_node.  Each line represents one level of the hierarchy, from
 	root to leaves.  It is best to think of the rcu_data structures
@@ -379,7 +213,7 @@
 	might be either one, two, or three levels of rcu_node structures,
 	depending on the relationship between CONFIG_RCU_FANOUT and
 	CONFIG_NR_CPUS.
-	
+
 	o	The numbers separated by the "/" are the qsmask followed
 		by the qsmaskinit.  The qsmask will have one bit
 		set for each entity in the next lower level that
@@ -389,10 +223,19 @@
 		The value of qsmaskinit is assigned to that of qsmask
 		at the beginning of each grace period.
 
-		For example, for "rcu", the qsmask of the first entry
-		of the lowest level is 0x14, meaning that we are still
-		waiting for CPUs 2 and 4 to check in for the current
-		grace period.
+		For example, for "rcu_sched", the qsmask of the first
+		entry of the lowest level is 0x14, meaning that we
+		are still waiting for CPUs 2 and 4 to check in for the
+		current grace period.
+
+	o	The characters separated by the ">" indicate the state
+		of the blocked-tasks lists.  A "T" preceding the ">"
+		indicates that at least one task blocked in an RCU
+		read-side critical section blocks the current grace
+		period, while a "." preceding the ">" indicates otherwise.
+		The character following the ">" indicates similarly for
+		the next grace period.  A "T" should appear in this
+		field only for rcu-preempt.
 
 	o	The numbers separated by the ":" are the range of CPUs
 		served by this struct rcu_node.  This can be helpful
@@ -431,8 +274,9 @@
   6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
   7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
 
-As always, this is once again split into "rcu" and "rcu_bh" portions.
-The fields are as follows:
+As always, this is once again split into "rcu_sched" and "rcu_bh"
+portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional
+"rcu_preempt" section.  The fields are as follows:
 
 o	"np" is the number of times that __rcu_pending() has been invoked
 	for the corresponding flavor of RCU.

diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index e41a7fe..d542ca2 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt

@@ -830,7 +830,7 @@
 SRCU:	Critical sections	Grace period		Barrier
 
 	srcu_read_lock		synchronize_srcu	N/A
-	srcu_read_unlock
+	srcu_read_unlock	synchronize_srcu_expedited
 
 SRCU:	Initialization/cleanup
 	init_srcu_struct

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 6d527ee..d5b3876 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h

@@ -139,10 +139,34 @@
 #endif
 
 #if defined(CONFIG_NO_HZ)
+#if defined(CONFIG_TINY_RCU)
+extern void rcu_enter_nohz(void);
+extern void rcu_exit_nohz(void);
+
+static inline void rcu_irq_enter(void)
+{
+	rcu_exit_nohz();
+}
+
+static inline void rcu_irq_exit(void)
+{
+	rcu_enter_nohz();
+}
+
+static inline void rcu_nmi_enter(void)
+{
+}
+
+static inline void rcu_nmi_exit(void)
+{
+}
+
+#else
 extern void rcu_irq_enter(void);
 extern void rcu_irq_exit(void);
 extern void rcu_nmi_enter(void);
 extern void rcu_nmi_exit(void);
+#endif
 #else
 # define rcu_irq_enter() do { } while (0)
 # define rcu_irq_exit() do { } while (0)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3ebd0b7..2f1bc42 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h

@@ -68,11 +68,17 @@
 /* Internal to kernel */
 extern void rcu_init(void);
 extern void rcu_scheduler_starting(void);
+#ifndef CONFIG_TINY_RCU
 extern int rcu_needs_cpu(int cpu);
+#else
+static inline int rcu_needs_cpu(int cpu) { return 0; }
+#endif
 extern int rcu_scheduler_active;
 
 #if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
 #include <linux/rcutree.h>
+#elif defined(CONFIG_TINY_RCU)
+#include <linux/rcutiny.h>
 #else
 #error "Unknown RCU implementation specified to kernel configuration"
 #endif

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
new file mode 100644
index 0000000..2c1fe83
--- /dev/null
+++ b/include/linux/rcutiny.h

@@ -0,0 +1,95 @@
+/*
+ * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright IBM Corporation, 2008
+ *
+ * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ *		Documentation/RCU
+ */
+#ifndef __LINUX_TINY_H
+#define __LINUX_TINY_H
+
+#include <linux/cache.h>
+
+void rcu_sched_qs(int cpu);
+void rcu_bh_qs(int cpu);
+
+#define __rcu_read_lock()	preempt_disable()
+#define __rcu_read_unlock()	preempt_enable()
+#define __rcu_read_lock_bh()	local_bh_disable()
+#define __rcu_read_unlock_bh()	local_bh_enable()
+#define call_rcu_sched		call_rcu
+
+#define rcu_init_sched()	do { } while (0)
+extern void rcu_check_callbacks(int cpu, int user);
+extern void __rcu_init(void);
+
+/*
+ * Return the number of grace periods.
+ */
+static inline long rcu_batches_completed(void)
+{
+	return 0;
+}
+
+/*
+ * Return the number of bottom-half grace periods.
+ */
+static inline long rcu_batches_completed_bh(void)
+{
+	return 0;
+}
+
+extern int rcu_expedited_torture_stats(char *page);
+
+static inline void synchronize_rcu_expedited(void)
+{
+	synchronize_sched();
+}
+
+static inline void synchronize_rcu_bh_expedited(void)
+{
+	synchronize_sched();
+}
+
+struct notifier_block;
+extern int rcu_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu);
+
+#ifdef CONFIG_NO_HZ
+
+extern void rcu_enter_nohz(void);
+extern void rcu_exit_nohz(void);
+
+#else /* #ifdef CONFIG_NO_HZ */
+
+static inline void rcu_enter_nohz(void)
+{
+}
+
+static inline void rcu_exit_nohz(void)
+{
+}
+
+#endif /* #else #ifdef CONFIG_NO_HZ */
+
+static inline void exit_rcu(void)
+{
+}
+
+#endif /* __LINUX_RCUTINY_H */

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index aca0eee..4765d97 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h

@@ -48,6 +48,7 @@
 int srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
 void srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
 void synchronize_srcu(struct srcu_struct *sp);
+void synchronize_srcu_expedited(struct srcu_struct *sp);
 long srcu_batches_completed(struct srcu_struct *sp);
 
 #endif

diff --git a/init/Kconfig b/init/Kconfig
index 09c5c64..d367bf6 100644
--- a/init/Kconfig
+++ b/init/Kconfig

@@ -334,6 +334,15 @@
 	  is also required.  It also scales down nicely to
 	  smaller systems.
 
+config TINY_RCU
+	bool "UP-only small-memory-footprint RCU"
+	depends on !SMP
+	help
+	  This option selects the RCU implementation that is
+	  designed for UP systems from which real-time response
+	  is not required.  This option greatly reduces the
+	  memory footprint of RCU.
+
 endchoice
 
 config RCU_TRACE

diff --git a/kernel/Makefile b/kernel/Makefile
index b8d4cd8..e9a0e6e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile

@@ -82,6 +82,7 @@
 obj-$(CONFIG_TREE_RCU) += rcutree.o
 obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
 obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
+obj-$(CONFIG_TINY_RCU) += rcutiny.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o

diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 4001833..7625f20 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c

@@ -67,6 +67,8 @@
 	complete(&rcu->completion);
 }
 
+#ifndef CONFIG_TINY_RCU
+
 #ifdef CONFIG_TREE_PREEMPT_RCU
 
 /**
@@ -157,6 +159,8 @@
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
 
+#endif /* #ifndef CONFIG_TINY_RCU */
+
 static int __cpuinit rcu_barrier_cpu_hotplug(struct notifier_block *self,
 		unsigned long action, void *hcpu)
 {

diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
new file mode 100644
index 0000000..b33ec3aa
--- /dev/null
+++ b/kernel/rcutiny.c

@@ -0,0 +1,291 @@
+/*
+ * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright IBM Corporation, 2008
+ *
+ * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ *		Documentation/RCU
+ */
+#include <linux/moduleparam.h>
+#include <linux/completion.h>
+#include <linux/interrupt.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/time.h>
+#include <linux/cpu.h>
+
+/* Global control variables for rcupdate callback mechanism. */
+struct rcu_ctrlblk {
+	struct rcu_head *rcucblist;	/* List of pending callbacks (CBs). */
+	struct rcu_head **donetail;	/* ->next pointer of last "done" CB. */
+	struct rcu_head **curtail;	/* ->next pointer of last CB. */
+};
+
+/* Definition for rcupdate control block. */
+static struct rcu_ctrlblk rcu_ctrlblk = {
+	.donetail	= &rcu_ctrlblk.rcucblist,
+	.curtail	= &rcu_ctrlblk.rcucblist,
+};
+
+static struct rcu_ctrlblk rcu_bh_ctrlblk = {
+	.donetail	= &rcu_bh_ctrlblk.rcucblist,
+	.curtail	= &rcu_bh_ctrlblk.rcucblist,
+};
+
+#ifdef CONFIG_NO_HZ
+
+static long rcu_dynticks_nesting = 1;
+
+/*
+ * Enter dynticks-idle mode, which is an extended quiescent state
+ * if we have fully entered that mode (i.e., if the new value of
+ * dynticks_nesting is zero).
+ */
+void rcu_enter_nohz(void)
+{
+	if (--rcu_dynticks_nesting == 0)
+		rcu_sched_qs(0); /* implies rcu_bh_qsctr_inc(0) */
+}
+
+/*
+ * Exit dynticks-idle mode, so that we are no longer in an extended
+ * quiescent state.
+ */
+void rcu_exit_nohz(void)
+{
+	rcu_dynticks_nesting++;
+}
+
+#endif /* #ifdef CONFIG_NO_HZ */
+
+/*
+ * Helper function for rcu_qsctr_inc() and rcu_bh_qsctr_inc().
+ * Also disable irqs to avoid confusion due to interrupt handlers
+ * invoking call_rcu().
+ */
+static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	if (rcp->rcucblist != NULL &&
+	    rcp->donetail != rcp->curtail) {
+		rcp->donetail = rcp->curtail;
+		local_irq_restore(flags);
+		return 1;
+	}
+	local_irq_restore(flags);
+
+	return 0;
+}
+
+/*
+ * Record an rcu quiescent state.  And an rcu_bh quiescent state while we
+ * are at it, given that any rcu quiescent state is also an rcu_bh
+ * quiescent state.  Use "+" instead of "||" to defeat short circuiting.
+ */
+void rcu_sched_qs(int cpu)
+{
+	if (rcu_qsctr_help(&rcu_ctrlblk) + rcu_qsctr_help(&rcu_bh_ctrlblk))
+		raise_softirq(RCU_SOFTIRQ);
+}
+
+/*
+ * Record an rcu_bh quiescent state.
+ */
+void rcu_bh_qs(int cpu)
+{
+	if (rcu_qsctr_help(&rcu_bh_ctrlblk))
+		raise_softirq(RCU_SOFTIRQ);
+}
+
+/*
+ * Check to see if the scheduling-clock interrupt came from an extended
+ * quiescent state, and, if so, tell RCU about it.
+ */
+void rcu_check_callbacks(int cpu, int user)
+{
+	if (user ||
+	    (idle_cpu(cpu) &&
+	     !in_softirq() &&
+	     hardirq_count() <= (1 << HARDIRQ_SHIFT)))
+		rcu_sched_qs(cpu);
+	else if (!in_softirq())
+		rcu_bh_qs(cpu);
+}
+
+/*
+ * Helper function for rcu_process_callbacks() that operates on the
+ * specified rcu_ctrlkblk structure.
+ */
+static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
+{
+	struct rcu_head *next, *list;
+	unsigned long flags;
+
+	/* If no RCU callbacks ready to invoke, just return. */
+	if (&rcp->rcucblist == rcp->donetail)
+		return;
+
+	/* Move the ready-to-invoke callbacks to a local list. */
+	local_irq_save(flags);
+	list = rcp->rcucblist;
+	rcp->rcucblist = *rcp->donetail;
+	*rcp->donetail = NULL;
+	if (rcp->curtail == rcp->donetail)
+		rcp->curtail = &rcp->rcucblist;
+	rcp->donetail = &rcp->rcucblist;
+	local_irq_restore(flags);
+
+	/* Invoke the callbacks on the local list. */
+	while (list) {
+		next = list->next;
+		prefetch(next);
+		list->func(list);
+		list = next;
+	}
+}
+
+/*
+ * Invoke any callbacks whose grace period has completed.
+ */
+static void rcu_process_callbacks(struct softirq_action *unused)
+{
+	__rcu_process_callbacks(&rcu_ctrlblk);
+	__rcu_process_callbacks(&rcu_bh_ctrlblk);
+}
+
+/*
+ * Null function to handle CPU being onlined.  Longer term, we want to
+ * make TINY_RCU avoid using rcupdate.c, but later...
+ */
+int rcu_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
+{
+	return NOTIFY_OK;
+}
+
+/*
+ * Wait for a grace period to elapse.  But it is illegal to invoke
+ * synchronize_sched() from within an RCU read-side critical section.
+ * Therefore, any legal call to synchronize_sched() is a quiescent
+ * state, and so on a UP system, synchronize_sched() need do nothing.
+ * Ditto for synchronize_rcu_bh().  (But Lai Jiangshan points out the
+ * benefits of doing might_sleep() to reduce latency.)
+ *
+ * Cool, huh?  (Due to Josh Triplett.)
+ *
+ * But we want to make this a static inline later.
+ */
+void synchronize_sched(void)
+{
+	cond_resched();
+}
+EXPORT_SYMBOL_GPL(synchronize_sched);
+
+void synchronize_rcu_bh(void)
+{
+	synchronize_sched();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_bh);
+
+/*
+ * Helper function for call_rcu() and call_rcu_bh().
+ */
+static void __call_rcu(struct rcu_head *head,
+		       void (*func)(struct rcu_head *rcu),
+		       struct rcu_ctrlblk *rcp)
+{
+	unsigned long flags;
+
+	head->func = func;
+	head->next = NULL;
+
+	local_irq_save(flags);
+	*rcp->curtail = head;
+	rcp->curtail = &head->next;
+	local_irq_restore(flags);
+}
+
+/*
+ * Post an RCU callback to be invoked after the end of an RCU grace
+ * period.  But since we have but one CPU, that would be after any
+ * quiescent state.
+ */
+void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+	__call_rcu(head, func, &rcu_ctrlblk);
+}
+EXPORT_SYMBOL_GPL(call_rcu);
+
+/*
+ * Post an RCU bottom-half callback to be invoked after any subsequent
+ * quiescent state.
+ */
+void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+	__call_rcu(head, func, &rcu_bh_ctrlblk);
+}
+EXPORT_SYMBOL_GPL(call_rcu_bh);
+
+void rcu_barrier(void)
+{
+	struct rcu_synchronize rcu;
+
+	init_completion(&rcu.completion);
+	/* Will wake me after RCU finished. */
+	call_rcu(&rcu.head, wakeme_after_rcu);
+	/* Wait for it. */
+	wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier);
+
+void rcu_barrier_bh(void)
+{
+	struct rcu_synchronize rcu;
+
+	init_completion(&rcu.completion);
+	/* Will wake me after RCU finished. */
+	call_rcu_bh(&rcu.head, wakeme_after_rcu);
+	/* Wait for it. */
+	wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier_bh);
+
+void rcu_barrier_sched(void)
+{
+	struct rcu_synchronize rcu;
+
+	init_completion(&rcu.completion);
+	/* Will wake me after RCU finished. */
+	call_rcu_sched(&rcu.head, wakeme_after_rcu);
+	/* Wait for it. */
+	wait_for_completion(&rcu.completion);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier_sched);
+
+void __rcu_init(void)
+{
+	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
+}

diff --git a/kernel/rcutorture.c b/kernel/rcutorture.c
index 697c0a0..3dd0ca2 100644
--- a/kernel/rcutorture.c
+++ b/kernel/rcutorture.c

@@ -547,6 +547,25 @@
 	.name		= "srcu"
 };
 
+static void srcu_torture_synchronize_expedited(void)
+{
+	synchronize_srcu_expedited(&srcu_ctl);
+}
+
+static struct rcu_torture_ops srcu_expedited_ops = {
+	.init		= srcu_torture_init,
+	.cleanup	= srcu_torture_cleanup,
+	.readlock	= srcu_torture_read_lock,
+	.read_delay	= srcu_read_delay,
+	.readunlock	= srcu_torture_read_unlock,
+	.completed	= srcu_torture_completed,
+	.deferred_free	= rcu_sync_torture_deferred_free,
+	.sync		= srcu_torture_synchronize_expedited,
+	.cb_barrier	= NULL,
+	.stats		= srcu_torture_stats,
+	.name		= "srcu_expedited"
+};
+
 /*
  * Definitions for sched torture testing.
  */
@@ -592,7 +611,7 @@
 	.name		= "sched"
 };
 
-static struct rcu_torture_ops sched_ops_sync = {
+static struct rcu_torture_ops sched_sync_ops = {
 	.init		= rcu_sync_torture_init,
 	.cleanup	= NULL,
 	.readlock	= sched_torture_read_lock,
@@ -1098,8 +1117,8 @@
 	int firsterr = 0;
 	static struct rcu_torture_ops *torture_ops[] =
 		{ &rcu_ops, &rcu_sync_ops, &rcu_bh_ops, &rcu_bh_sync_ops,
-		  &sched_expedited_ops,
-		  &srcu_ops, &sched_ops, &sched_ops_sync, };
+		  &srcu_ops, &srcu_expedited_ops,
+		  &sched_ops, &sched_sync_ops, &sched_expedited_ops, };
 
 	mutex_lock(&fullstop_mutex);
 
@@ -1110,8 +1129,12 @@
 			break;
 	}
 	if (i == ARRAY_SIZE(torture_ops)) {
-		printk(KERN_ALERT "rcutorture: invalid torture type: \"%s\"\n",
+		printk(KERN_ALERT "rcu-torture: invalid torture type: \"%s\"\n",
 		       torture_type);
+		printk(KERN_ALERT "rcu-torture types:");
+		for (i = 0; i < ARRAY_SIZE(torture_ops); i++)
+			printk(KERN_ALERT " %s", torture_ops[i]->name);
+		printk(KERN_ALERT "\n");
 		mutex_unlock(&fullstop_mutex);
 		return -EINVAL;
 	}

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index f3077c0..69f4efe 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c

@@ -51,6 +51,8 @@
 
 /* Data structures. */
 
+static struct lock_class_key rcu_root_class;
+
 #define RCU_STATE_INITIALIZER(name) { \
 	.level = { &name.node[0] }, \
 	.levelcnt = { \
@@ -1685,8 +1687,7 @@
 		cpustride *= rsp->levelspread[i];
 		rnp = rsp->level[i];
 		for (j = 0; j < rsp->levelcnt[i]; j++, rnp++) {
-			if (rnp != rcu_get_root(rsp))
-				spin_lock_init(&rnp->lock);
+			spin_lock_init(&rnp->lock);
 			rnp->gpnum = 0;
 			rnp->qsmask = 0;
 			rnp->qsmaskinit = 0;
@@ -1709,7 +1710,7 @@
 			INIT_LIST_HEAD(&rnp->blocked_tasks[1]);
 		}
 	}
-	spin_lock_init(&rcu_get_root(rsp)->lock);
+	lockdep_set_class(&rcu_get_root(rsp)->lock, &rcu_root_class);
 }
 
 /*

diff --git a/kernel/rcutree_trace.c b/kernel/rcutree_trace.c
index 4b31c77..1984cdc 100644
--- a/kernel/rcutree_trace.c
+++ b/kernel/rcutree_trace.c

@@ -155,12 +155,14 @@
 
 static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
 {
+	long gpnum;
 	int level = 0;
 	struct rcu_node *rnp;
 
+	gpnum = rsp->gpnum;
 	seq_printf(m, "c=%ld g=%ld s=%d jfq=%ld j=%x "
 		      "nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld\n",
-		   rsp->completed, rsp->gpnum, rsp->signaled,
+		   rsp->completed, gpnum, rsp->signaled,
 		   (long)(rsp->jiffies_force_qs - jiffies),
 		   (int)(jiffies & 0xffff),
 		   rsp->n_force_qs, rsp->n_force_qs_ngp,
@@ -171,8 +173,10 @@
 			seq_puts(m, "\n");
 			level = rnp->level;
 		}
-		seq_printf(m, "%lx/%lx %d:%d ^%d    ",
+		seq_printf(m, "%lx/%lx %c>%c %d:%d ^%d    ",
 			   rnp->qsmask, rnp->qsmaskinit,
+			   "T."[list_empty(&rnp->blocked_tasks[gpnum & 1])],
+			   "T."[list_empty(&rnp->blocked_tasks[!(gpnum & 1)])],
 			   rnp->grplo, rnp->grphi, rnp->grpnum);
 	}
 	seq_puts(m, "\n");

diff --git a/kernel/softirq.c b/kernel/softirq.c
index f8749e5..21939d9 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c

@@ -302,9 +302,9 @@
 	if (!in_interrupt() && local_softirq_pending())
 		invoke_softirq();
 
+	rcu_irq_exit();
 #ifdef CONFIG_NO_HZ
 	/* Make sure that timer wheel updates are propagated */
-	rcu_irq_exit();
 	if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
 		tick_nohz_stop_sched_tick(0);
 #endif

diff --git a/kernel/srcu.c b/kernel/srcu.c
index b0aeeaf..818d7d9 100644
--- a/kernel/srcu.c
+++ b/kernel/srcu.c

@@ -49,6 +49,7 @@
 	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
 	return (sp->per_cpu_ref ? 0 : -ENOMEM);
 }
+EXPORT_SYMBOL_GPL(init_srcu_struct);
 
 /*
  * srcu_readers_active_idx -- returns approximate number of readers
@@ -97,6 +98,7 @@
 	free_percpu(sp->per_cpu_ref);
 	sp->per_cpu_ref = NULL;
 }
+EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
 
 /**
  * srcu_read_lock - register a new reader for an SRCU-protected structure.
@@ -118,6 +120,7 @@
 	preempt_enable();
 	return idx;
 }
+EXPORT_SYMBOL_GPL(srcu_read_lock);
 
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
@@ -136,22 +139,12 @@
 	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
 	preempt_enable();
 }
+EXPORT_SYMBOL_GPL(srcu_read_unlock);
 
-/**
- * synchronize_srcu - wait for prior SRCU read-side critical-section completion
- * @sp: srcu_struct with which to synchronize.
- *
- * Flip the completed counter, and wait for the old count to drain to zero.
- * As with classic RCU, the updater must use some separate means of
- * synchronizing concurrent updates.  Can block; must be called from
- * process context.
- *
- * Note that it is illegal to call synchornize_srcu() from the corresponding
- * SRCU read-side critical section; doing so will result in deadlock.
- * However, it is perfectly legal to call synchronize_srcu() on one
- * srcu_struct from some other srcu_struct's read-side critical section.
+/*
+ * Helper function for synchronize_srcu() and synchronize_srcu_expedited().
  */
-void synchronize_srcu(struct srcu_struct *sp)
+void __synchronize_srcu(struct srcu_struct *sp, void (*sync_func)(void))
 {
 	int idx;
 
@@ -173,7 +166,7 @@
 		return;
 	}
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
+	sync_func();  /* Force memory barrier on all CPUs. */
 
 	/*
 	 * The preceding synchronize_sched() ensures that any CPU that
@@ -190,7 +183,7 @@
 	idx = sp->completed & 0x1;
 	sp->completed++;
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
+	sync_func();  /* Force memory barrier on all CPUs. */
 
 	/*
 	 * At this point, because of the preceding synchronize_sched(),
@@ -203,7 +196,7 @@
 	while (srcu_readers_active_idx(sp, idx))
 		schedule_timeout_interruptible(1);
 
-	synchronize_sched();  /* Force memory barrier on all CPUs. */
+	sync_func();  /* Force memory barrier on all CPUs. */
 
 	/*
 	 * The preceding synchronize_sched() forces all srcu_read_unlock()
@@ -237,6 +230,47 @@
 }
 
 /**
+ * synchronize_srcu - wait for prior SRCU read-side critical-section completion
+ * @sp: srcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates.  Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_srcu() from the corresponding
+ * SRCU read-side critical section; doing so will result in deadlock.
+ * However, it is perfectly legal to call synchronize_srcu() on one
+ * srcu_struct from some other srcu_struct's read-side critical section.
+ */
+void synchronize_srcu(struct srcu_struct *sp)
+{
+	__synchronize_srcu(sp, synchronize_sched);
+}
+EXPORT_SYMBOL_GPL(synchronize_srcu);
+
+/**
+ * synchronize_srcu_expedited - like synchronize_srcu, but less patient
+ * @sp: srcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates.  Can block; must be called from
+ * process context.
+ *
+ * Note that it is illegal to call synchronize_srcu_expedited()
+ * from the corresponding SRCU read-side critical section; doing so
+ * will result in deadlock.  However, it is perfectly legal to call
+ * synchronize_srcu_expedited() on one srcu_struct from some other
+ * srcu_struct's read-side critical section.
+ */
+void synchronize_srcu_expedited(struct srcu_struct *sp)
+{
+	__synchronize_srcu(sp, synchronize_sched_expedited);
+}
+EXPORT_SYMBOL_GPL(synchronize_srcu_expedited);
+
+/**
  * srcu_batches_completed - return batches completed.
  * @sp: srcu_struct on which to report batch completion.
  *
@@ -248,10 +282,4 @@
 {
 	return sp->completed;
 }
-
-EXPORT_SYMBOL_GPL(init_srcu_struct);
-EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
-EXPORT_SYMBOL_GPL(srcu_read_lock);
-EXPORT_SYMBOL_GPL(srcu_read_unlock);
-EXPORT_SYMBOL_GPL(synchronize_srcu);
 EXPORT_SYMBOL_GPL(srcu_batches_completed);
commit	7e1a2766e67a529f62c8cfba0a47d63fc4f7fa8a	[log] [tgz]
author	Ingo Molnar <mingo@elte.hu>	Tue Nov 10 04:10:31 2009 +0100
committer	Ingo Molnar <mingo@elte.hu>	Tue Nov 10 04:10:35 2009 +0100
tree	197950369a773afdf04af9bdfc4a2ce1d2b5d3af
parent	c5e0cb3ddc5f14cedcfc50c0fb3b5fc6b56576da [diff]
parent	83f5b01ffbbaea6f97c9a79d21e240dbfb69f2f1 [diff]