.. SPDX-License-Identifier: GPL-2.0

Idmappings
==========

Most filesystem developers will have encountered idmappings. They are used when
reading from or writing ownership to disk, reporting ownership to userspace, or
for permission checking. This document is aimed at filesystem developers that
want to know how idmappings work.

Formal notes
------------

An idmapping is essentially a translation of a range of ids into another or the
same range of ids. The notational convention for idmappings that is widely used
in userspace is::

 u:k:r

``u`` indicates the first element in the upper idmapset ``U`` and ``k``
indicates the first element in the lower idmapset ``K``. The ``r`` parameter
indicates the range of the idmapping, i.e. how many ids are mapped. From now
on, we will always prefix ids with ``u`` or ``k`` to make it clear whether
we're talking about an id in the upper or lower idmapset.

To see what this looks like in practice, let's take the following idmapping::

 u22:k10000:r3

and write down the mappings it will generate::

 u22 -> k10000
 u23 -> k10001
 u24 -> k10002
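
The expansion above is mechanical. A minimal Python sketch of it follows. It is
a userspace model for illustration only, not kernel code, and the helper name
``expand()`` is made up for this example:

.. code-block:: python

    def expand(u, k, r):
        """List the (upper id, lower id) pairs generated by the idmapping u:k:r."""
        return [(u + i, k + i) for i in range(r)]


    # Print the pairs generated by u22:k10000:r3.
    for upper, lower in expand(22, 10000, 3):
        print(f"u{upper} -> k{lower}")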

From a mathematical viewpoint ``U`` and ``K`` are well-ordered sets and an
idmapping is an order isomorphism from ``U`` into ``K``. So ``U`` and ``K`` are
order isomorphic. In fact, ``U`` and ``K`` are always well-ordered subsets of
the set of all possible ids usable on a given system.

Looking at this mathematically briefly will help us highlight some properties
that make it easier to understand how we can translate between idmappings. For
example, we know that the inverse idmapping is an order isomorphism as well::

 k10000 -> u22
 k10001 -> u23
 k10002 -> u24

Given that we are dealing with order isomorphisms plus the fact that we're
dealing with subsets we can embed idmappings into each other, i.e. we can
sensibly translate between different idmappings. For example, assume we've been
given the three idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r10000
 3. u0:k30000:r10000

and id ``k11000`` which has been generated by the first idmapping by mapping
``u1000`` from the upper idmapset down to ``k11000`` in the lower idmapset.

Because we're dealing with order isomorphic subsets it is meaningful to ask
what id ``k11000`` corresponds to in the second or third idmapping. The
straightforward algorithm to use is to apply the inverse of the first idmapping,
mapping ``k11000`` up to ``u1000``. Afterwards, we can map ``u1000`` down using
either the second or the third idmapping. The second idmapping would map
``u1000`` down to ``k21000``. The third idmapping would map ``u1000`` down to
``k31000``.

If we were given the same task for the following three idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r200
 3. u0:k30000:r300

we would fail to translate as the sets aren't order isomorphic over the full
range of the first idmapping anymore. (However, they are order isomorphic over
the full range of the second idmapping.) Neither the second nor the third
idmapping contains ``u1000`` in the upper idmapset ``U``. This is equivalent to
not having an id mapped. We can simply say that ``u1000`` is unmapped in the
second and third idmapping. The mapping helpers signal an unmapped id with the
invalid value ``(uid_t)-1`` or ``(gid_t)-1``, and the kernel reports such ids
to userspace as the overflowuid or overflowgid (65534 by default).
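
The translation described above, applied to both sets of three idmappings, can
be sketched in Python. This is a userspace model for illustration, not kernel
code. The ``(u, k, r)`` tuple representation and the ``translate()`` helper are
assumptions of the sketch, and ``None`` stands in for an unmapped id:

.. code-block:: python

    def translate(src, dst, kid):
        """Translate a lower id of idmapping src into a lower id of idmapping dst.

        Idmappings are (u, k, r) tuples. Returns None if the id is unmapped.
        """
        src_u, src_k, src_r = src
        dst_u, dst_k, dst_r = dst
        # Apply the inverse of src: map the lower id up.
        if not src_k <= kid < src_k + src_r:
            return None
        uid = kid - src_k + src_u
        # Apply dst: map the upper id down.
        if not dst_u <= uid < dst_u + dst_r:
            return None
        return uid - dst_u + dst_k


    first = (0, 10000, 10000)

    # Full-range idmappings: k11000 translates in both cases.
    print(translate(first, (0, 20000, 10000), 11000))  # 21000
    print(translate(first, (0, 30000, 10000), 11000))  # 31000

    # Shortened ranges: u1000 is unmapped in the second and third idmapping.
    print(translate(first, (0, 20000, 200), 11000))    # None
    print(translate(first, (0, 30000, 300), 11000))    # None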

The algorithm to calculate what a given id maps to is pretty simple. First, we
need to verify that the range can contain our target id. We will skip this step
for simplicity. After that if we want to know what ``id`` maps to we can do
simple calculations:

- If we want to map from left to right::

   u:k:r
   id - u + k = n

- If we want to map from right to left::

   u:k:r
   id - k + u = n

Instead of "left to right" we can also say "down" and instead of "right to
left" we can also say "up". Obviously mapping down and up invert each other.

To see whether the simple formulas above work, consider the following two
idmappings::

 1. u0:k20000:r10000
 2. u500:k30000:r10000

Assume we are given ``k21000`` in the lower idmapset of the first idmapping. We
want to know what id this was mapped from in the upper idmapset of the first
idmapping. So we're mapping up in the first idmapping::

 id - k + u = n
 k21000 - k20000 + u0 = u1000

Now assume we are given the id ``u1100`` in the upper idmapset of the second
idmapping and we want to know what this id maps down to in the lower idmapset
of the second idmapping. This means we're mapping down in the second
idmapping::

 id - u + k = n
 u1100 - u500 + k30000 = k30600
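
The two formulas, including the range check that the text skips, can be written
down as a small Python sketch. The ``Idmapping`` tuple and the ``map_down()``
and ``map_up()`` names are illustrative assumptions, not kernel interfaces, and
``None`` stands in for an unmapped id:

.. code-block:: python

    from typing import NamedTuple, Optional


    class Idmapping(NamedTuple):
        u: int  # first id in the upper idmapset
        k: int  # first id in the lower idmapset
        r: int  # number of ids mapped


    def map_down(m: Idmapping, upper: int) -> Optional[int]:
        """Map left to right: n = id - u + k, or None if out of range."""
        if not m.u <= upper < m.u + m.r:
            return None
        return upper - m.u + m.k


    def map_up(m: Idmapping, lower: int) -> Optional[int]:
        """Map right to left: n = id - k + u, or None if out of range."""
        if not m.k <= lower < m.k + m.r:
            return None
        return lower - m.k + m.u


    first = Idmapping(u=0, k=20000, r=10000)
    second = Idmapping(u=500, k=30000, r=10000)

    print(map_up(first, 21000))    # k21000 maps up to u1000
    print(map_down(second, 1100))  # u1100 maps down to k30600

Since mapping down and up invert each other, ``map_up(second,
map_down(second, 1100))`` gives back ``1100``.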

General notes
-------------

In the context of the kernel an idmapping can be interpreted as mapping a range
of userspace ids into a range of kernel ids::

 userspace-id:kernel-id:range

A userspace id is always an element in the upper idmapset of an idmapping of
type ``uid_t`` or ``gid_t`` and a kernel id is always an element in the lower
idmapset of an idmapping of type ``kuid_t`` or ``kgid_t``. From now on
"userspace id" will be used to refer to the well known ``uid_t`` and ``gid_t``
types and "kernel id" will be used to refer to ``kuid_t`` and ``kgid_t``.

The kernel is mostly concerned with kernel ids. They are used when performing
permission checks and are stored in an inode's ``i_uid`` and ``i_gid`` fields.
A userspace id on the other hand is an id that is reported to userspace by the
kernel, or is passed by userspace to the kernel, or a raw device id that is
written to or read from disk.

Note that we are only concerned with idmappings as the kernel stores them, not
how userspace would specify them.

For the rest of this document we will prefix all userspace ids with ``u`` and
all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
an idmapping will be written as ``u0:k10000:r10000``.

For example, within this idmapping, the id ``u1000`` is an id in the upper
idmapset or "userspace idmapset" starting with ``u0``. And it is mapped to
``k11000`` which is a kernel id in the lower idmapset or "kernel idmapset"
starting with ``k10000``.

A kernel id is always created by an idmapping. Such idmappings are associated
with user namespaces. Since we mainly care about how idmappings work we're not
going to be concerned with how idmappings are created nor how they are used
outside of the filesystem context. This is best left to an explanation of user
namespaces.

The initial user namespace is special. It always has an idmapping of the
following form::

 u0:k0:r4294967295

which is an identity idmapping over the full range of ids available on this
system.

Other user namespaces usually have non-identity idmappings such as::

 u0:k10000:r10000

When a process creates or wants to change ownership of a file, or when the
ownership of a file is read from disk by a filesystem, the userspace id is
immediately translated into a kernel id according to the idmapping associated
with the relevant user namespace.

For instance, consider a file that is stored on disk by a filesystem as being
owned by ``u1000``:

- If a filesystem were to be mounted in the initial user namespace (as most
  filesystems are) then the initial idmapping will be used. As we saw this is
  simply the identity idmapping. This would mean id ``u1000`` read from disk
  would be mapped to id ``k1000``. So an inode's ``i_uid`` and ``i_gid`` fields
  would contain ``k1000``.

- If a filesystem were to be mounted with an idmapping of ``u0:k10000:r10000``
  then ``u1000`` read from disk would be mapped to ``k11000``. So an inode's
  ``i_uid`` and ``i_gid`` fields would contain ``k11000``.

Translation algorithms
----------------------

We've already seen briefly that it is possible to translate between different
idmappings. We'll now take a closer look at how that works.

Crossmapping
~~~~~~~~~~~~

This translation algorithm is used by the kernel in quite a few places. For
example, it is used when reporting back the ownership of a file to userspace
via the ``stat()`` system call family.

If we've been given ``k11000`` from one idmapping we can map that id up in
another idmapping. In order for this to work both idmappings need to contain
the same kernel id in their kernel idmapsets. For example, consider the
following idmappings::

 1. u0:k10000:r10000
 2. u20000:k10000:r10000

and we are mapping ``u1000`` down to ``k11000`` in the first idmapping. We can
then translate ``k11000`` into a userspace id in the second idmapping using the
kernel idmapset of the second idmapping::

 /* Map the kernel id up into a userspace id in the second idmapping. */
 from_kuid(u20000:k10000:r10000, k11000) = u21000

Note how we can get back to the kernel id in the first idmapping by inverting
the algorithm::

 /* Map the userspace id down into a kernel id in the second idmapping. */
 make_kuid(u20000:k10000:r10000, u21000) = k11000

 /* Map the kernel id up into a userspace id in the first idmapping. */
 from_kuid(u0:k10000:r10000, k11000) = u1000

This algorithm allows us to answer the question what userspace id a given
kernel id corresponds to in a given idmapping. In order to be able to answer
this question both idmappings need to contain the same kernel id in their
respective kernel idmapsets.

For example, when the kernel reads a raw userspace id from disk it maps it down
into a kernel id according to the idmapping associated with the filesystem.
Let's assume the filesystem was mounted with an idmapping of
``u0:k20000:r10000`` and it reads a file owned by ``u1000`` from disk. This
means ``u1000`` will be mapped to ``k21000`` which is what will be stored in
the inode's ``i_uid`` and ``i_gid`` fields.

When someone in userspace calls ``stat()`` or a related function to get
ownership information about the file the kernel can't simply map the id back up
according to the filesystem's idmapping as this would give the wrong owner if
the caller is using a different idmapping.

So the kernel will map the id back up in the idmapping of the caller. Let's
assume the caller has the somewhat unconventional idmapping
``u3000:k20000:r10000``. Then ``k21000`` would map back up to ``u4000``.
Consequently the user would see that this file is owned by ``u4000``.
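
The ``stat()`` scenario above can be modelled in a few lines of Python. This is
only a sketch: the Python ``make_kuid()`` and ``from_kuid()`` below borrow the
names of the kernel helpers but take plain ``(u, k, r)`` tuples and integers,
return ``None`` for unmapped ids, and ``crossmap()`` is a hypothetical name:

.. code-block:: python

    def make_kuid(idmapping, uid):
        """Map a userspace id down into a kernel id; None if unmapped."""
        u, k, r = idmapping
        return uid - u + k if u <= uid < u + r else None


    def from_kuid(idmapping, kuid):
        """Map a kernel id up into a userspace id; None if unmapped."""
        u, k, r = idmapping
        return kuid - k + u if k <= kuid < k + r else None


    def crossmap(fs_idmapping, caller_idmapping, disk_uid):
        """Map an on-disk id down in the filesystem's idmapping, then up in
        the caller's idmapping."""
        kuid = make_kuid(fs_idmapping, disk_uid)
        if kuid is None:
            return None
        return from_kuid(caller_idmapping, kuid)


    fs = (0, 20000, 10000)         # u0:k20000:r10000
    caller = (3000, 20000, 10000)  # u3000:k20000:r10000

    print(make_kuid(fs, 1000))         # 21000, the value stored in i_uid
    print(crossmap(fs, caller, 1000))  # 4000, the owner the caller sees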

Remapping
~~~~~~~~~

It is possible to translate a kernel id from one idmapping to another one via
the userspace idmapset of the two idmappings. This is equivalent to remapping
a kernel id.

Let's look at an example. We are given the following two idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r10000

and we are given ``k11000`` in the first idmapping. In order to translate this
kernel id in the first idmapping into a kernel id in the second idmapping we
need to perform two steps:

1. Map the kernel id up into a userspace id in the first idmapping::

    /* Map the kernel id up into a userspace id in the first idmapping. */
    from_kuid(u0:k10000:r10000, k11000) = u1000

2. Map the userspace id down into a kernel id in the second idmapping::

    /* Map the userspace id down into a kernel id in the second idmapping. */
    make_kuid(u0:k20000:r10000, u1000) = k21000

As you can see we used the userspace idmapset in both idmappings to translate
the kernel id in one idmapping to a kernel id in another idmapping.

This allows us to answer the question what kernel id we would need to use to
get the same userspace id in another idmapping. In order to be able to answer
this question both idmappings need to contain the same userspace id in their
respective userspace idmapsets.

Note how we can easily get back to the kernel id in the first idmapping by
inverting the algorithm:

1. Map the kernel id up into a userspace id in the second idmapping::

    /* Map the kernel id up into a userspace id in the second idmapping. */
    from_kuid(u0:k20000:r10000, k21000) = u1000

2. Map the userspace id down into a kernel id in the first idmapping::

    /* Map the userspace id down into a kernel id in the first idmapping. */
    make_kuid(u0:k10000:r10000, u1000) = k11000

Another way to look at this translation is to treat it as inverting one
idmapping and applying another idmapping if both idmappings have the relevant
userspace id mapped. This will come in handy when working with idmapped mounts.
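
As a rough Python model of remapping and of its inverse (a sketch only:
``remap()`` is a hypothetical helper, the two mapping functions are simplified
stand-ins for the kernel helpers of the same name, and ``None`` marks an
unmapped id):

.. code-block:: python

    def make_kuid(idmapping, uid):
        """Map a userspace id down into a kernel id; None if unmapped."""
        u, k, r = idmapping
        return uid - u + k if u <= uid < u + r else None


    def from_kuid(idmapping, kuid):
        """Map a kernel id up into a userspace id; None if unmapped."""
        u, k, r = idmapping
        return kuid - k + u if k <= kuid < k + r else None


    def remap(src, dst, kuid):
        """Translate a kernel id of src into the kernel id of dst that has
        the same userspace id."""
        uid = from_kuid(src, kuid)   # step 1: invert src
        if uid is None:
            return None
        return make_kuid(dst, uid)   # step 2: apply dst


    first = (0, 10000, 10000)   # u0:k10000:r10000
    second = (0, 20000, 10000)  # u0:k20000:r10000

    print(remap(first, second, 11000))  # 21000
    print(remap(second, first, 21000))  # 11000, back where we started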

Invalid translations
~~~~~~~~~~~~~~~~~~~~

It is never valid to use an id in the kernel idmapset of one idmapping as the
id in the userspace idmapset of another or the same idmapping. While ids in the
kernel idmapset always live in the kernel id space, ids in the userspace
idmapset are userspace ids. So the following translations are forbidden::

 /* Map the userspace id down into a kernel id in the first idmapping. */
 make_kuid(u0:k10000:r10000, u1000) = k11000

 /* INVALID: Map the kernel id down into a kernel id in the second idmapping. */
 make_kuid(u10000:k20000:r10000, k11000) = k21000
                                 ~~~~~~

and equally wrong::

 /* Map the kernel id up into a userspace id in the first idmapping. */
 from_kuid(u0:k10000:r10000, k11000) = u1000

 /* INVALID: Map the userspace id up into a userspace id in the second idmapping. */
 from_kuid(u20000:k0:r10000, u1000) = u21000
                             ~~~~~

Since userspace ids have type ``uid_t`` and ``gid_t`` and kernel ids have type
``kuid_t`` and ``kgid_t`` the compiler will throw an error when they are
conflated. So the two examples above would cause a compilation failure.
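
The effect of keeping the two id spaces in separate types can be imitated in
Python. Note the difference: in the kernel the mismatch is caught at compile
time, whereas this sketch can only reject it at run time. The ``Uid`` and
``Kuid`` classes are illustrative stand-ins for ``uid_t`` and ``kuid_t``, and
the functions are simplified models of the kernel helpers of the same name:

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Uid:
        """A userspace id (stands in for uid_t)."""
        val: int


    @dataclass(frozen=True)
    class Kuid:
        """A kernel id (stands in for kuid_t)."""
        val: int


    def make_kuid(idmapping, uid):
        """Map a userspace id down into a kernel id; None if unmapped."""
        if not isinstance(uid, Uid):
            raise TypeError("make_kuid() expects a userspace id")
        u, k, r = idmapping
        if not u <= uid.val < u + r:
            return None
        return Kuid(uid.val - u + k)


    def from_kuid(idmapping, kuid):
        """Map a kernel id up into a userspace id; None if unmapped."""
        if not isinstance(kuid, Kuid):
            raise TypeError("from_kuid() expects a kernel id")
        u, k, r = idmapping
        if not k <= kuid.val < k + r:
            return None
        return Uid(kuid.val - k + u)


    kuid = make_kuid((0, 10000, 10000), Uid(1000))
    print(kuid)  # Kuid(val=11000)

    # INVALID: feeding a kernel id into make_kuid() is rejected.
    try:
        make_kuid((10000, 20000, 10000), kuid)
    except TypeError as err:
        print("rejected:", err)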

Idmappings when creating filesystem objects
-------------------------------------------

The concepts of mapping an id down or mapping an id up are expressed in the two
kernel functions filesystem developers are rather familiar with and which we've
already used in this document::

 /* Map the userspace id down into a kernel id. */
 make_kuid(idmapping, uid)

 /* Map the kernel id up into a userspace id. */
 from_kuid(idmapping, kuid)

We will take an abbreviated look into how idmappings figure into creating
filesystem objects. For simplicity we will only look at what happens when the
VFS has already completed path lookup right before it calls into the filesystem
itself. So we're concerned with what happens when e.g. ``vfs_mkdir()`` is
called. We will also assume that the directory we're creating filesystem
objects in is readable and writable for everyone.

When creating a filesystem object the kernel will look at the caller's
filesystem ids. These are just regular ``uid_t`` and ``gid_t`` userspace ids
but they are exclusively used when determining file ownership which is why they
are called "filesystem ids". They are usually identical to the uid and gid of
the caller but can differ. We will just assume they are always identical to not
get lost in too many details.

When the caller enters the kernel two things happen:

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping.
   (To be precise, the kernel will simply look at the kernel ids stashed in the
   credentials of the current task but for our education we'll pretend this
   translation happens just in time.)
2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping.

The second step is important as regular filesystems will ultimately need to map
the kernel id back up into a userspace id when writing to disk.
So with the second step the kernel guarantees that a valid userspace id can be
written to disk. If it can't the kernel will refuse the creation request to not
even remotely risk filesystem corruption.

The astute reader will have realized that this is simply a variation of the
crossmapping algorithm we mentioned in a previous section. First, the
kernel maps the caller's userspace id down into a kernel id according to the
caller's idmapping and then maps that kernel id up according to the
filesystem's idmapping.

From an implementation point of view it's worth mentioning how idmappings are
represented. All idmappings are taken from the corresponding user namespace.

- caller's idmapping (usually taken from ``current_user_ns()``)
- filesystem's idmapping (``sb->s_user_ns``)
- mount's idmapping (``mnt_idmap(vfsmnt)``)

Let's see some examples with caller/filesystem idmapping but without mount
idmappings. This will exhibit some problems we can hit. After that we will
revisit/reconsider these examples, this time using mount idmappings, to see how
they can solve the problems we observed before.

Example 1
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295

Both the caller and the filesystem use the identity idmapping:

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping.

   For this second step the kernel will call the function
   ``fsuidgid_has_mapping()`` which ultimately boils down to calling
   ``from_kuid()``::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

In this example both idmappings are the same so there's nothing exciting going
on. Ultimately the userspace id that lands on disk will be ``u1000``.

Example 2
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k20000:r10000, k11000) = u-1

It's immediately clear that while the caller's userspace id could be
successfully mapped down into kernel ids in the caller's idmapping the kernel
ids could not be mapped up according to the filesystem's idmapping. So the
kernel will deny this creation request.

Note that while this example is less common, because most filesystems can't be
mounted with non-initial idmappings, this is a general problem as we can see in
the next examples.

Example 3
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k11000) = u11000

We can see that the translation always succeeds. The userspace id that the
filesystem will ultimately put to disk will always be identical to the value of
the kernel id that was created in the caller's idmapping. This has mainly two
consequences.

First, we can't allow a caller to ultimately write to disk with another
userspace id. We could only do this if we were to mount the whole filesystem
with the caller's or another idmapping. That solution is limited to a few
filesystems and not very flexible, yet this is a use case that is pretty
important in containerized workloads.

Second, the caller will usually not be able to create any files or access
directories that have stricter permissions because none of the filesystem's
kernel ids map up into valid userspace ids in the caller's idmapping:

1. Map raw userspace ids down to kernel ids in the filesystem's idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Map kernel ids up to userspace ids in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k1000) = u-1
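
The creation check from Examples 1 to 3 can be replayed with a short Python
model. This is a sketch under simplifying assumptions: ``disk_id_for_create()``
is a hypothetical helper that only loosely mirrors what
``fsuidgid_has_mapping()`` verifies, the two mapping functions are simplified
stand-ins for the kernel helpers, and ``None`` plays the role of ``u-1``:

.. code-block:: python

    def make_kuid(idmapping, uid):
        """Map a userspace id down into a kernel id; None if unmapped."""
        u, k, r = idmapping
        return uid - u + k if u <= uid < u + r else None


    def from_kuid(idmapping, kuid):
        """Map a kernel id up into a userspace id; None if unmapped."""
        u, k, r = idmapping
        return kuid - k + u if k <= kuid < k + r else None


    def disk_id_for_create(caller_idmapping, fs_idmapping, caller_uid):
        """Return the userspace id that would land on disk, or None if the
        creation request has to be denied."""
        kuid = make_kuid(caller_idmapping, caller_uid)  # step 1
        if kuid is None:
            return None
        return from_kuid(fs_idmapping, kuid)            # step 2


    INITIAL = (0, 0, 4294967295)   # u0:k0:r4294967295
    CALLER = (0, 10000, 10000)     # u0:k10000:r10000

    print(disk_id_for_create(INITIAL, INITIAL, 1000))            # Example 1: 1000
    print(disk_id_for_create(CALLER, (0, 20000, 10000), 1000))   # Example 2: None
    print(disk_id_for_create(CALLER, INITIAL, 1000))             # Example 3: 11000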

Example 4
~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295

In order to report ownership to userspace the kernel uses the crossmapping
algorithm introduced in a previous section:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k1000) = u-1

The crossmapping algorithm fails in this case because the kernel id in the
filesystem idmapping cannot be mapped up to a userspace id in the caller's
idmapping. Thus, the kernel will report the ownership of this file as the
overflowid.

Example 5
~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000

In order to report ownership to userspace the kernel uses the crossmapping
algorithm introduced in a previous section:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k21000) = u-1

Again, the crossmapping algorithm fails in this case because the kernel id in
the filesystem idmapping cannot be mapped to a userspace id in the caller's
idmapping. Thus, the kernel will report the ownership of this file as the
overflowid.

Note how in the last two examples things would be simple if the caller were
using the initial idmapping. For a filesystem mounted with the initial
idmapping it would be trivial. So we only consider a filesystem with an
idmapping of ``u0:k20000:r10000``:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k0:r4294967295, k21000) = u21000
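
Ownership reporting in Examples 4 and 5, including the fallback to the
overflowid, can be modelled as follows. This is a sketch only:
``reported_owner()`` is a hypothetical helper, the mapping functions are
simplified stand-ins for the kernel helpers, and ``OVERFLOWUID`` is assumed to
have its usual default value of 65534:

.. code-block:: python

    OVERFLOWUID = 65534  # assumption: the default overflowuid


    def make_kuid(idmapping, uid):
        """Map a userspace id down into a kernel id; None if unmapped."""
        u, k, r = idmapping
        return uid - u + k if u <= uid < u + r else None


    def from_kuid(idmapping, kuid):
        """Map a kernel id up into a userspace id; None if unmapped."""
        u, k, r = idmapping
        return kuid - k + u if k <= kuid < k + r else None


    def reported_owner(fs_idmapping, caller_idmapping, disk_uid):
        """Crossmap an on-disk id into the caller's idmapping and fall back
        to the overflowuid if the id is unmapped."""
        kuid = make_kuid(fs_idmapping, disk_uid)
        uid = None if kuid is None else from_kuid(caller_idmapping, kuid)
        return OVERFLOWUID if uid is None else uid


    INITIAL = (0, 0, 4294967295)  # u0:k0:r4294967295
    CALLER = (0, 10000, 10000)    # u0:k10000:r10000
    FS = (0, 20000, 10000)        # u0:k20000:r10000

    print(reported_owner(INITIAL, CALLER, 1000))  # Example 4: 65534
    print(reported_owner(FS, CALLER, 1000))       # Example 5: 65534
    print(reported_owner(FS, INITIAL, 1000))      # initial caller: 21000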

Idmappings on idmapped mounts
-----------------------------

The examples we've seen in the previous section where the caller's idmapping
and the filesystem's idmapping are incompatible cause various issues for
workloads. For a more complex but common example, consider two containers
started on the host. To completely prevent the two containers from affecting
each other, an administrator may often use different non-overlapping idmappings
for the two containers::

 container1 idmapping:  u0:k10000:r10000
 container2 idmapping:  u0:k20000:r10000
 filesystem idmapping:  u0:k30000:r10000

An administrator wanting to provide easy read-write access to the following set
of files::

 dir id:       u0
 dir/file1 id: u1000
 dir/file2 id: u2000

to both containers currently can't.

Of course the administrator has the option to recursively change ownership via
``chown()``. For example, they could change ownership so that ``dir`` and all
files below it can be crossmapped from the filesystem's into the container's
idmapping. Let's assume they change ownership so it is compatible with the
first container's idmapping::

 dir id:       u10000
 dir/file1 id: u11000
 dir/file2 id: u12000

This would still leave ``dir`` rather useless to the second container. In fact,
``dir`` and all files below it would continue to appear owned by the overflowid
for the second container.

Or consider another increasingly popular example. Some service managers such as
systemd implement a concept called "portable home directories". A user may want
to use their home directories on different machines where they are assigned
different login userspace ids. Most users will have ``u1000`` as the login id
on their machine at home and all files in their home directory will usually be
owned by ``u1000``. At uni or at work they may have another login id such as
``u1125``. This makes it rather difficult to interact with their home directory
on their work machine.

In both cases changing ownership recursively has grave implications. The most
obvious one is that ownership is changed globally and permanently. In the home
directory case this change in ownership would even need to happen every time the
user switches from their home to their work machine. For really large sets of
files this becomes increasingly costly.

If the user is lucky, they are dealing with a filesystem that is mountable
inside user namespaces. But this would also change ownership globally and the
change in ownership is tied to the lifetime of the filesystem mount, i.e. the
superblock. The only way to change ownership is to completely unmount the
filesystem and mount it again in another user namespace. This is usually
impossible because it would mean that all users currently accessing the
filesystem can't anymore. And it means that ``dir`` still can't be shared
between two containers with different idmappings.
But usually the user doesn't even have this option since most filesystems
aren't mountable inside containers. And not having them mountable might be
desirable as it doesn't require the filesystem to deal with malicious
filesystem images.

But the use cases mentioned above and more can be handled by idmapped mounts.
They make it possible to expose the same set of dentries with different
ownership at different mounts. This is achieved by marking the mounts with a
user namespace through the ``mount_setattr()`` system call. The idmapping
associated with it is then used to translate from the caller's idmapping to the
filesystem's idmapping and vice versa using the remapping algorithm we
introduced above.

Idmapped mounts make it possible to change ownership in a temporary and
localized way. The ownership changes are restricted to a specific mount and the
ownership changes are tied to the lifetime of the mount. All other users and
locations where the filesystem is exposed are unaffected.

Filesystems that support idmapped mounts don't have any real reason to support
being mountable inside user namespaces. A filesystem could be exposed
completely under an idmapped mount to get the same effect. This has the
advantage that filesystems can leave the creation of the superblock to
privileged users in the initial user namespace.

However, it is perfectly possible to combine idmapped mounts with filesystems
mountable inside user namespaces. We will touch on this further below.

Filesystem types vs idmapped mount types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With the introduction of idmapped mounts we need to distinguish between
filesystem ownership and mount ownership of a VFS object such as an inode. The
owner of an inode might be different when looked at from a filesystem
perspective than when looked at from an idmapped mount. Such fundamental
conceptual distinctions should almost always be clearly expressed in the code.
So, to distinguish idmapped mount ownership from filesystem ownership separate
types have been introduced.

If a uid or gid has been generated using the filesystem or caller's idmapping
then we will use the ``kuid_t`` and ``kgid_t`` types. However, if a uid or gid
has been generated using a mount idmapping then we will be using the dedicated
``vfsuid_t`` and ``vfsgid_t`` types.

All VFS helpers that generate or take uids and gids as arguments use the
``vfsuid_t`` and ``vfsgid_t`` types and we will be able to rely on the compiler
to catch errors that originate from conflating filesystem and VFS uids and gids.

The ``vfsuid_t`` and ``vfsgid_t`` types are often mapped from and to ``kuid_t``
and ``kgid_t`` types similar to how ``kuid_t`` and ``kgid_t`` types are mapped
from and to ``uid_t`` and ``gid_t`` types::

 uid_t <--> kuid_t <--> vfsuid_t
 gid_t <--> kgid_t <--> vfsgid_t

Whenever we report ownership based on a ``vfsuid_t`` or ``vfsgid_t`` type,
e.g., during ``stat()``, or store ownership information in a shared VFS object
based on a ``vfsuid_t`` or ``vfsgid_t`` type, e.g., during ``chown()`` we can
use the ``vfsuid_into_kuid()`` and ``vfsgid_into_kgid()`` helpers.

To illustrate why these helpers currently exist, consider what happens when we
change ownership of an inode from an idmapped mount. After we generated
a ``vfsuid_t`` or ``vfsgid_t`` based on the mount idmapping we later commit to
this ``vfsuid_t`` or ``vfsgid_t`` to become the new filesystem wide ownership.
				670	To illustrate why this helper currently exists, consider what happens when we
				671	change ownership of an inode from an idmapped mount. After we generated
				672	a ``vfsuid_t`` or ``vfsgid_t`` based on the mount idmapping we later commit to
Bjorn Helgaas	d56b699	2023-08-14 16:28:22 -0500	[diff] [blame]	673	this ``vfsuid_t`` or ``vfsgid_t`` to become the new filesystem wide ownership.
Christian Brauner	5d3ca59	2023-03-06 15:11:42 +0100	[diff] [blame]	674	Thus, we are turning the ``vfsuid_t`` or ``vfsgid_t`` into a global ``kuid_t``
				675	or ``kgid_t``. And this can be done by using ``vfsuid_into_kuid()`` and
				676	``vfsgid_into_kgid()``.
				677
				678	Note, whenever a shared VFS object, e.g., a cached ``struct inode`` or a cached
				679	``struct posix_acl``, stores ownership information a filesystem or "global"
				680	``kuid_t`` and ``kgid_t`` must be used. Ownership expressed via ``vfsuid_t``
				681	and ``vfsgid_t`` is specific to an idmapped mount.
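
The type separation described above can be modelled outside the kernel. The
following Python sketch is an illustration only: ``Kuid``, ``Vfsuid`` and
``store_owner()`` are hypothetical stand-ins, not kernel interfaces, and Python
detects the mix-up at runtime whereas the kernel relies on the C compiler:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kuid:
    """Hypothetical stand-in for kuid_t: filesystem-wide ("global") ownership."""
    val: int


@dataclass(frozen=True)
class Vfsuid:
    """Hypothetical stand-in for vfsuid_t: ownership as seen through one mount."""
    val: int


def vfsuid_into_kuid(vfsuid: Vfsuid) -> Kuid:
    """Commit a mount-specific id as filesystem-wide ownership."""
    if not isinstance(vfsuid, Vfsuid):
        raise TypeError("vfsuid_into_kuid() expects a Vfsuid")
    return Kuid(vfsuid.val)


def store_owner(inode: dict, owner: Kuid) -> None:
    """Model of a shared object (e.g. a cached inode): it only stores a Kuid."""
    if not isinstance(owner, Kuid):
        raise TypeError("shared VFS objects must store a Kuid")
    inode["i_uid"] = owner


inode = {}
store_owner(inode, vfsuid_into_kuid(Vfsuid(11000)))
print(inode["i_uid"])  # Kuid(val=11000)
```

Storing ``Vfsuid(11000)`` directly raises ``TypeError`` in this model, which
mirrors the compile-time error the dedicated kernel types are meant to provoke.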

We already noted that ``vfsuid_t`` and ``vfsgid_t`` types are generated based
on mount idmappings whereas ``kuid_t`` and ``kgid_t`` types are generated based
on filesystem idmappings. To prevent abusing filesystem idmappings to generate
``vfsuid_t`` or ``vfsgid_t`` types or mount idmappings to generate ``kuid_t``
or ``kgid_t`` types, filesystem idmappings and mount idmappings are different
types as well.

All helpers that map to or from ``vfsuid_t`` and ``vfsgid_t`` types require
a mount idmapping to be passed which is of type ``struct mnt_idmap``. Passing
a filesystem or caller idmapping will cause a compilation error.

Similar to how we prefix all userspace ids in this document with ``u`` and all
kernel ids with ``k`` we will prefix all VFS ids with ``v``. So a mount
idmapping will be written as: ``u0:v10000:r10000``.
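
To make the notation concrete, here is a small Python sketch that parses it
and lists the individual mappings an idmapping generates. ``Idmap``,
``parse_idmap()`` and ``expand()`` are names made up for this illustration;
nothing like them exists in the kernel:

```python
import re
from typing import List, NamedTuple


class Idmap(NamedTuple):
    upper: int  # first id in the upper idmapset (u)
    lower: int  # first id in the lower idmapset (k or v)
    count: int  # r, the number of ids mapped
    kind: str   # "k" for kernel ids, "v" for VFS ids


def parse_idmap(text: str) -> Idmap:
    """Parse the u<N>:k<N>:r<N> or u<N>:v<N>:r<N> notation used in this document."""
    m = re.fullmatch(r"u(\d+):([kv])(\d+):r(\d+)", text)
    if m is None:
        raise ValueError(f"not an idmapping: {text!r}")
    return Idmap(int(m.group(1)), int(m.group(3)), int(m.group(4)), m.group(2))


def expand(idmap: Idmap) -> List[str]:
    """List every single-id mapping the idmapping generates."""
    return [
        f"u{idmap.upper + i} -> {idmap.kind}{idmap.lower + i}"
        for i in range(idmap.count)
    ]


mount = parse_idmap("u0:v10000:r10000")
print(mount.kind, mount.lower, mount.count)     # v 10000 10000
print(expand(parse_idmap("u1000:v1125:r1")))    # ['u1000 -> v1125']
```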

Remapping helpers
~~~~~~~~~~~~~~~~~

Idmapping functions were added that translate between idmappings. They make use
of the remapping algorithm we've introduced earlier. We're going to look at:

- ``i_uid_into_vfsuid()`` and ``i_gid_into_vfsgid()``

  The ``i_id_into_vfsid()`` functions translate the filesystem's kernel ids into
  VFS ids in the mount's idmapping::

   /* Map the filesystem's kernel id up into a userspace id in the filesystem's idmapping. */
   from_kuid(filesystem, kid) = uid

   /* Map the filesystem's userspace id down into a VFS id in the mount's idmapping. */
   make_kuid(mount, uid) = kuid

- ``mapped_fsuid()`` and ``mapped_fsgid()``

  The ``mapped_fs*id()`` functions translate the caller's kernel ids into
  kernel ids in the filesystem's idmapping. This translation is achieved by
  remapping the caller's VFS ids using the mount's idmapping::

   /* Map the caller's VFS id up into a userspace id in the mount's idmapping. */
   from_kuid(mount, kid) = uid

   /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
   make_kuid(filesystem, uid) = kuid

- ``vfsuid_into_kuid()`` and ``vfsgid_into_kgid()``

  Whenever we report ownership based on a ``vfsuid_t`` or ``vfsgid_t`` or
  store it in a shared VFS object, these helpers turn the VFS id into a
  ``kuid_t`` or ``kgid_t``.

Note that ``i_uid_into_vfsuid()`` and ``mapped_fsuid()`` invert each other.
Consider the following idmappings::

 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:v10000:r10000

Assume a file owned by ``u1000`` is read from disk. The filesystem maps this id
to ``k21000`` according to its idmapping. This is what is stored in the
inode's ``i_uid`` and ``i_gid`` fields.

When the caller queries the ownership of this file via ``stat()`` the kernel
would usually simply use the crossmapping algorithm and map the filesystem's
kernel id up to a userspace id in the caller's idmapping.

But when the caller is accessing the file on an idmapped mount the kernel will
first call ``i_uid_into_vfsuid()`` thereby translating the filesystem's kernel
id into a VFS id in the mount's idmapping::

 i_uid_into_vfsuid(k21000):
   /* Map the filesystem's kernel id up into a userspace id. */
   from_kuid(u0:k20000:r10000, k21000) = u1000

   /* Map the filesystem's userspace id down into a VFS id in the mount's idmapping. */
   make_kuid(u0:v10000:r10000, u1000) = v11000

Finally, when the kernel reports the owner to the caller it will turn the
VFS id in the mount's idmapping into a userspace id in the caller's
idmapping::

 k11000 = vfsuid_into_kuid(v11000)
 from_kuid(u0:k10000:r10000, k11000) = u1000

We can test whether this algorithm really works by verifying what happens when
we create a new file. Let's say the user is creating a file with ``u1000``.

The kernel maps this to ``k11000`` in the caller's idmapping. Usually the
kernel would now apply the crossmapping, verifying that ``k11000`` can be
mapped to a userspace id in the filesystem's idmapping. Since ``k11000`` can't
be mapped up in the filesystem's idmapping directly this creation request
fails.

But when the caller is accessing the file on an idmapped mount the kernel will
first call ``mapped_fs*id()`` thereby translating the caller's VFS id into
a kernel id in the filesystem's idmapping according to the mount's idmapping::

 mapped_fsuid(v11000):
   /* Map the caller's VFS id up into a userspace id in the mount's idmapping. */
   from_kuid(u0:v10000:r10000, v11000) = u1000

   /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
   make_kuid(u0:k20000:r10000, u1000) = k21000

When finally writing to disk the kernel will then map ``k21000`` up into a
userspace id in the filesystem's idmapping::

 from_kuid(u0:k20000:r10000, k21000) = u1000

As we can see, we end up with an invertible and therefore information
preserving algorithm. A file created from ``u1000`` on an idmapped mount will
also be reported as being owned by ``u1000`` and vice versa.
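
The invertibility claim can be checked with a small userspace model. This is a
sketch under simplifying assumptions: ``Idmap``, ``from_id()`` and
``make_id()`` are hypothetical Python stand-ins for an idmapping,
``from_kuid()`` and ``make_kuid()``, unmapped ids are represented as ``None``,
and the two helpers take the idmappings as explicit arguments, which is not how
the kernel functions are declared:

```python
from typing import NamedTuple, Optional


class Idmap(NamedTuple):
    upper: int  # first id in the upper idmapset (u)
    lower: int  # first id in the lower idmapset (k or v)
    count: int  # range r


def from_id(idmap: Idmap, low: int) -> Optional[int]:
    """Model of from_kuid(): map a lower id up, None if it is unmapped."""
    off = low - idmap.lower
    return idmap.upper + off if 0 <= off < idmap.count else None


def make_id(idmap: Idmap, up: int) -> Optional[int]:
    """Model of make_kuid(): map an upper id down, None if it is unmapped."""
    off = up - idmap.upper
    return idmap.lower + off if 0 <= off < idmap.count else None


def i_uid_into_vfsuid(mount: Idmap, fs: Idmap, kuid: int) -> Optional[int]:
    """Filesystem kernel id -> VFS id: up through fs, down through mount."""
    uid = from_id(fs, kuid)
    return None if uid is None else make_id(mount, uid)


def mapped_fsuid(mount: Idmap, fs: Idmap, vfsuid: int) -> Optional[int]:
    """VFS id -> filesystem kernel id: up through mount, down through fs."""
    uid = from_id(mount, vfsuid)
    return None if uid is None else make_id(fs, uid)


filesystem = Idmap(0, 20000, 10000)  # u0:k20000:r10000
mount = Idmap(0, 10000, 10000)       # u0:v10000:r10000

# Every VFS id covered by the mount survives the round trip unchanged.
round_trip_ok = all(
    i_uid_into_vfsuid(mount, filesystem, mapped_fsuid(mount, filesystem, v)) == v
    for v in range(mount.lower, mount.lower + mount.count)
)
print(round_trip_ok)                           # True
print(mapped_fsuid(mount, filesystem, 11000))  # 21000
```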

Let's now briefly reconsider the failing examples from earlier in the context
of idmapped mounts.

Example 2 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:v10000:r10000

When the caller is using a non-initial idmapping the common case is to attach
the same idmapping to the mount. We now perform three steps:

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Translate the caller's VFS id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(v11000):
      /* Map the VFS id up into a userspace id in the mount's idmapping. */
      from_kuid(u0:v10000:r10000, v11000) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k20000:r10000, u1000) = k21000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k20000:r10000, k21000) = u1000

So the ownership that lands on disk will be ``u1000``.

Example 3 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u0:v10000:r10000

The same translation algorithm works with the third example.

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Translate the caller's VFS id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(v11000):
      /* Map the VFS id up into a userspace id in the mount's idmapping. */
      from_kuid(u0:v10000:r10000, v11000) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So the ownership that lands on disk will be ``u1000``.
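
The three creation steps of Examples 2 and 3 can be traced with the following
Python sketch. It is a model rather than kernel code: ``Idmap``, ``from_id()``,
``make_id()`` and ``create_owner()`` are hypothetical names, and a failed
mapping is represented as ``None``:

```python
from typing import NamedTuple, Optional, Tuple


class Idmap(NamedTuple):
    upper: int  # first id in the upper idmapset (u)
    lower: int  # first id in the lower idmapset (k or v)
    count: int  # range r


def from_id(idmap: Idmap, low: int) -> Optional[int]:
    """Model of from_kuid(): map a lower id up, None if it is unmapped."""
    off = low - idmap.lower
    return idmap.upper + off if 0 <= off < idmap.count else None


def make_id(idmap: Idmap, up: int) -> Optional[int]:
    """Model of make_kuid(): map an upper id down, None if it is unmapped."""
    off = up - idmap.upper
    return idmap.lower + off if 0 <= off < idmap.count else None


def create_owner(caller: Idmap, mount: Idmap, fs: Idmap,
                 uid: int) -> Optional[Tuple[int, int]]:
    """Return (kernel id in the filesystem, userspace id on disk), or None."""
    # Step 1: caller's userspace id -> caller's kernel id.
    kuid = make_id(caller, uid)
    if kuid is None:
        return None
    # Step 2: model of mapped_fsuid(), up through the mount, down through fs.
    mount_uid = from_id(mount, kuid)
    if mount_uid is None:
        return None
    fs_kuid = make_id(fs, mount_uid)
    if fs_kuid is None:
        return None
    # Step 3: verify the kernel id maps to a userspace id in the filesystem.
    disk_uid = from_id(fs, fs_kuid)
    return None if disk_uid is None else (fs_kuid, disk_uid)


caller = Idmap(0, 10000, 10000)
mount = Idmap(0, 10000, 10000)

print(create_owner(caller, mount, Idmap(0, 20000, 10000), 1000))   # (21000, 1000)
print(create_owner(caller, mount, Idmap(0, 0, 4294967295), 1000))  # (1000, 1000)
```

The kernel id differs between the two filesystems, but in both cases ``u1000``
is what lands on disk.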

Example 4 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u0:v10000:r10000

In order to report ownership to userspace the kernel now does three steps using
the translation algorithm we introduced earlier:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Translate the kernel id into a VFS id in the mount's idmapping::

    i_uid_into_vfsuid(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000

      /* Map the userspace id down into a VFS id in the mount's idmapping. */
      make_kuid(u0:v10000:r10000, u1000) = v11000

3. Map the VFS id up into a userspace id in the caller's idmapping::

    k11000 = vfsuid_into_kuid(v11000)
    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the file's kernel id couldn't be crossmapped between the filesystem's
and the caller's idmapping. With the idmapped mount in place it can now be
crossmapped via the mount's idmapping. The file is now reported as owned by
``u1000`` according to the mount's idmapping.

Example 5 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:v10000:r10000

Again, in order to report ownership to userspace the kernel now does three
steps using the translation algorithm we introduced earlier:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Translate the kernel id into a VFS id in the mount's idmapping::

    i_uid_into_vfsuid(k21000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k20000:r10000, k21000) = u1000

      /* Map the userspace id down into a VFS id in the mount's idmapping. */
      make_kuid(u0:v10000:r10000, u1000) = v11000

3. Map the VFS id up into a userspace id in the caller's idmapping::

    k11000 = vfsuid_into_kuid(v11000)
    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the file's kernel id couldn't be crossmapped between the filesystem's
and the caller's idmapping. With the idmapped mount in place it can now be
crossmapped via the mount's idmapping. The file is now owned by ``u1000``
according to the mount's idmapping.
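
To contrast the reporting path of Examples 4 and 5 with plain crossmapping,
here is a Python sketch. As before it is only a model: ``Idmap``,
``from_id()``, ``make_id()``, ``report_plain()`` and ``report_idmapped()`` are
hypothetical names, and ``None`` stands for an id that cannot be mapped:

```python
from typing import NamedTuple, Optional


class Idmap(NamedTuple):
    upper: int  # first id in the upper idmapset (u)
    lower: int  # first id in the lower idmapset (k or v)
    count: int  # range r


def from_id(idmap: Idmap, low: int) -> Optional[int]:
    """Model of from_kuid(): map a lower id up, None if it is unmapped."""
    off = low - idmap.lower
    return idmap.upper + off if 0 <= off < idmap.count else None


def make_id(idmap: Idmap, up: int) -> Optional[int]:
    """Model of make_kuid(): map an upper id down, None if it is unmapped."""
    off = up - idmap.upper
    return idmap.lower + off if 0 <= off < idmap.count else None


def report_plain(caller: Idmap, fs: Idmap, disk_uid: int) -> Optional[int]:
    """Crossmapping without an idmapped mount."""
    kuid = make_id(fs, disk_uid)
    return None if kuid is None else from_id(caller, kuid)


def report_idmapped(caller: Idmap, mount: Idmap, fs: Idmap,
                    disk_uid: int) -> Optional[int]:
    """The three reporting steps on an idmapped mount."""
    kuid = make_id(fs, disk_uid)                           # step 1
    uid = None if kuid is None else from_id(fs, kuid)      # step 2, map up
    vfsuid = None if uid is None else make_id(mount, uid)  # step 2, map down
    # Step 3: vfsuid_into_kuid() keeps the numeric value, then map up.
    return None if vfsuid is None else from_id(caller, vfsuid)


caller = Idmap(0, 10000, 10000)
mount = Idmap(0, 10000, 10000)

for fs in (Idmap(0, 0, 4294967295), Idmap(0, 20000, 10000)):
    print(report_plain(caller, fs, 1000), report_idmapped(caller, mount, fs, 1000))
# Both iterations print: None 1000
```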

Changing ownership on a home directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We've seen above how idmapped mounts can be used to translate between
idmappings when either the caller, the filesystem or both use a non-initial
idmapping. A wide range of use cases exists when the caller is using
a non-initial idmapping. This mostly happens in the context of containerized
workloads. The consequence is, as we have seen, that for both filesystems
mounted with the initial idmapping and filesystems mounted with non-initial
idmappings, access to the filesystem isn't working because the kernel ids can't
be crossmapped between the caller's and the filesystem's idmapping.

As we've seen above idmapped mounts provide a solution to this by remapping the
caller's or filesystem's idmapping according to the mount's idmapping.

Aside from containerized workloads, idmapped mounts have the advantage that
they also work when both the caller and the filesystem use the initial
idmapping which means users on the host can change the ownership of directories
and files on a per-mount basis.

Consider our previous example where a user has their home directory on portable
storage. At home they have id ``u1000`` and all files in their home directory
are owned by ``u1000`` whereas at uni or work they have login id ``u1125``.

Taking their home directory with them becomes problematic. They can't easily
access their files, they might not be able to write to disk without applying
lax permissions or ACLs and even if they can, they will end up with an annoying
mix of files and directories owned by ``u1000`` and ``u1125``.

Idmapped mounts make it possible to solve this problem. A user can create an
idmapped mount for their home directory on their work computer or their
computer at home depending on what ownership they would prefer to end up on the
portable storage itself.

Let's assume they want all files on disk to belong to ``u1000``. When the user
plugs in their portable storage at their workstation they can set up a job that
creates an idmapped mount with the minimal idmapping ``u1000:v1125:r1``. So now
when they create a file the kernel performs the following steps we already know
from above::

 caller id:            u1125
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:v1125:r1

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k0:r4294967295, u1125) = k1125

2. Translate the caller's VFS id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(v1125):
      /* Map the VFS id up into a userspace id in the mount's idmapping. */
      from_kuid(u1000:v1125:r1, v1125) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So ultimately the file will be created with ``u1000`` on disk.

Now let's briefly look at what ownership the caller with id ``u1125`` will see
on their work computer:

::

 file id:              u1000
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:v1125:r1

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Translate the kernel id into a VFS id in the mount's idmapping::

    i_uid_into_vfsuid(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000

      /* Map the userspace id down into a VFS id in the mount's idmapping. */
      make_kuid(u1000:v1125:r1, u1000) = v1125

3. Map the VFS id up into a userspace id in the caller's idmapping::

    k1125 = vfsuid_into_kuid(v1125)
    from_kuid(u0:k0:r4294967295, k1125) = u1125

So ultimately the file will be reported to the caller as belonging to ``u1125``
which is the caller's userspace id on their workstation in our example.

The raw userspace id that is put on disk is ``u1000`` so when the user takes
their home directory back to their home computer where they are assigned
``u1000`` using the initial idmapping and mount the filesystem with the initial
idmapping they will see all those files owned by ``u1000``.
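
The home directory scenario can be replayed with a short Python model. This is
a sketch under the same simplifications as the walkthrough: ``Idmap``,
``from_id()``, ``make_id()``, ``remap()``, ``disk_owner_on_create()`` and
``reported_owner()`` are hypothetical helpers, and ``None`` only means "not
mapped in this model"; it says nothing about which error the kernel would
return:

```python
from typing import NamedTuple, Optional


class Idmap(NamedTuple):
    upper: int  # first id in the upper idmapset (u)
    lower: int  # first id in the lower idmapset (k or v)
    count: int  # range r


def from_id(idmap: Idmap, low: int) -> Optional[int]:
    """Model of from_kuid(): map a lower id up, None if it is unmapped."""
    off = low - idmap.lower
    return idmap.upper + off if 0 <= off < idmap.count else None


def make_id(idmap: Idmap, up: int) -> Optional[int]:
    """Model of make_kuid(): map an upper id down, None if it is unmapped."""
    off = up - idmap.upper
    return idmap.lower + off if 0 <= off < idmap.count else None


def remap(src: Idmap, dst: Idmap, low: int) -> Optional[int]:
    """Map a lower id up through src, then down through dst."""
    uid = from_id(src, low)
    return None if uid is None else make_id(dst, uid)


initial = Idmap(0, 0, 4294967295)  # caller and filesystem idmapping
mount = Idmap(1000, 1125, 1)       # u1000:v1125:r1


def disk_owner_on_create(uid: int) -> Optional[int]:
    """Userspace id that lands on disk when `uid` creates a file."""
    kuid = make_id(initial, uid)
    fs_kuid = None if kuid is None else remap(mount, initial, kuid)  # mapped_fsuid()
    return None if fs_kuid is None else from_id(initial, fs_kuid)


def reported_owner(disk_uid: int) -> Optional[int]:
    """Userspace id the caller sees for a file owned by `disk_uid` on disk."""
    kuid = make_id(initial, disk_uid)
    vfsuid = None if kuid is None else remap(initial, mount, kuid)   # i_uid_into_vfsuid()
    return None if vfsuid is None else from_id(initial, vfsuid)


print(disk_owner_on_create(1125))  # 1000
print(reported_owner(1000))        # 1125
print(disk_owner_on_create(1126))  # None
```

Because the mount idmapping covers a single id, any other id such as ``u1126``
has no mapping through this mount in the model.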