Documentation/filesystems/ubifs-authentication.rst - linux - Git at Google

 .. SPDX-License-Identifier: GPL-2.0

 .. UBIFS Authentication
 .. sigma star gmbh
 .. 2018

 ============================
 UBIFS Authentication Support
 ============================

 Introduction
 ============

 UBIFS utilizes the fscrypt framework to provide confidentiality for file
 contents and file names. This prevents attacks where an attacker is able to
 read contents of the filesystem on a single point in time. A classic example
 is a lost smartphone where the attacker is unable to read personal data stored
 on the device without the filesystem decryption key.

 At the current state, UBIFS encryption however does not prevent attacks where
 the attacker is able to modify the filesystem contents and the user uses the
 device afterwards. In such a scenario an attacker can modify filesystem
 contents arbitrarily without the user noticing. One example is to modify a
 binary to perform a malicious action when executed [DMC-CBC-ATTACK]. Since
 most of the filesystem metadata of UBIFS is stored in plain, this makes it
 fairly easy to swap files and replace their contents.

 Other full disk encryption systems like dm-crypt cover all filesystem metadata,
 which makes such kinds of attacks more complicated, but not impossible.
 Especially, if the attacker is given access to the device multiple points in
 time. For dm-crypt and other filesystems that build upon the Linux block IO
 layer, the dm-integrity or dm-verity subsystems [DM-INTEGRITY, DM-VERITY]
 can be used to get full data authentication at the block layer.
 These can also be combined with dm-crypt [CRYPTSETUP2].

 This document describes an approach to get file contents _and_ full metadata
 authentication for UBIFS. Since UBIFS uses fscrypt for file contents and file
 name encryption, the authentication system could be tied into fscrypt such that
 existing features like key derivation can be utilized. It should however also
 be possible to use UBIFS authentication without using encryption.


 MTD, UBI & UBIFS
 ----------------

 On Linux, the MTD (Memory Technology Devices) subsystem provides a uniform
 interface to access raw flash devices. One of the more prominent subsystems that
 work on top of MTD is UBI (Unsorted Block Images). It provides volume management
 for flash devices and is thus somewhat similar to LVM for block devices. In
 addition, it deals with flash-specific wear-leveling and transparent I/O error
 handling. UBI offers logical erase blocks (LEBs) to the layers on top of it
 and maps them transparently to physical erase blocks (PEBs) on the flash.

 UBIFS is a filesystem for raw flash which operates on top of UBI. Thus, wear
 leveling and some flash specifics are left to UBI, while UBIFS focuses on
 scalability, performance and recoverability.

 ::

 	+------------+ +*******+ +-----------+ +-----+
 	|            | * UBIFS * | UBI-BLOCK | | ... |
 	| JFFS/JFFS2 | +*******+ +-----------+ +-----+
 	|            | +-----------------------------+ +-----------+ +-----+
 	|            | |              UBI            | | MTD-BLOCK | | ... |
 	+------------+ +-----------------------------+ +-----------+ +-----+
 	+------------------------------------------------------------------+
 	|                  MEMORY TECHNOLOGY DEVICES (MTD)                 |
 	+------------------------------------------------------------------+
 	+-----------------------------+ +--------------------------+ +-----+
 	|         NAND DRIVERS        | |        NOR DRIVERS       | | ... |
 	+-----------------------------+ +--------------------------+ +-----+

             Figure 1: Linux kernel subsystems for dealing with raw flash


 Internally, UBIFS maintains multiple data structures which are persisted on
 the flash:

 - *Index*: an on-flash B+ tree where the leaf nodes contain filesystem data
 - *Journal*: an additional data structure to collect FS changes before updating
   the on-flash index and reduce flash wear.
 - *Tree Node Cache (TNC)*: an in-memory B+ tree that reflects the current FS
   state to avoid frequent flash reads. It is basically the in-memory
   representation of the index, but contains additional attributes.
 - *LEB property tree (LPT)*: an on-flash B+ tree for free space accounting per
   UBI LEB.

 In the remainder of this section we will cover the on-flash UBIFS data
 structures in more detail. The TNC is of less importance here since it is never
 persisted onto the flash directly. More details on UBIFS can also be found in
 [UBIFS-WP].


 UBIFS Index & Tree Node Cache
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Basic on-flash UBIFS entities are called *nodes*. UBIFS knows different types
 of nodes. Eg. data nodes (``struct ubifs_data_node``) which store chunks of file
 contents or inode nodes (``struct ubifs_ino_node``) which represent VFS inodes.
 Almost all types of nodes share a common header (``ubifs_ch``) containing basic
 information like node type, node length, a sequence number, etc. (see
 ``fs/ubifs/ubifs-media.h`` in kernel source). Exceptions are entries of the LPT
 and some less important node types like padding nodes which are used to pad
 unusable content at the end of LEBs.

 To avoid re-writing the whole B+ tree on every single change, it is implemented
 as *wandering tree*, where only the changed nodes are re-written and previous
 versions of them are obsoleted without erasing them right away. As a result,
 the index is not stored in a single place on the flash, but *wanders* around
 and there are obsolete parts on the flash as long as the LEB containing them is
 not reused by UBIFS. To find the most recent version of the index, UBIFS stores
 a special node called *master node* into UBI LEB 1 which always points to the
 most recent root node of the UBIFS index. For recoverability, the master node
 is additionally duplicated to LEB 2. Mounting UBIFS is thus a simple read of
 LEB 1 and 2 to get the current master node and from there get the location of
 the most recent on-flash index.

 The TNC is the in-memory representation of the on-flash index. It contains some
 additional runtime attributes per node which are not persisted. One of these is
 a dirty-flag which marks nodes that have to be persisted the next time the
 index is written onto the flash. The TNC acts as a write-back cache and all
 modifications of the on-flash index are done through the TNC. Like other caches,
 the TNC does not have to mirror the full index into memory, but reads parts of
 it from flash whenever needed. A *commit* is the UBIFS operation of updating the
 on-flash filesystem structures like the index. On every commit, the TNC nodes
 marked as dirty are written to the flash to update the persisted index.


 Journal
 ~~~~~~~

 To avoid wearing out the flash, the index is only persisted (*committed*) when
 certain conditions are met (eg. ``fsync(2)``). The journal is used to record
 any changes (in form of inode nodes, data nodes etc.) between commits
 of the index. During mount, the journal is read from the flash and replayed
 onto the TNC (which will be created on-demand from the on-flash index).

 UBIFS reserves a bunch of LEBs just for the journal called *log area*. The
 amount of log area LEBs is configured on filesystem creation (using
 ``mkfs.ubifs``) and stored in the superblock node. The log area contains only
 two types of nodes: *reference nodes* and *commit start nodes*. A commit start
 node is written whenever an index commit is performed. Reference nodes are
 written on every journal update. Each reference node points to the position of
 other nodes (inode nodes, data nodes etc.) on the flash that are part of this
 journal entry. These nodes are called *buds* and describe the actual filesystem
 changes including their data.

 The log area is maintained as a ring. Whenever the journal is almost full,
 a commit is initiated. This also writes a commit start node so that during
 mount, UBIFS will seek for the most recent commit start node and just replay
 every reference node after that. Every reference node before the commit start
 node will be ignored as they are already part of the on-flash index.

 When writing a journal entry, UBIFS first ensures that enough space is
 available to write the reference node and buds part of this entry. Then, the
 reference node is written and afterwards the buds describing the file changes.
 On replay, UBIFS will record every reference node and inspect the location of
 the referenced LEBs to discover the buds. If these are corrupt or missing,
 UBIFS will attempt to recover them by re-reading the LEB. This is however only
 done for the last referenced LEB of the journal. Only this can become corrupt
 because of a power cut. If the recovery fails, UBIFS will not mount. An error
 for every other LEB will directly cause UBIFS to fail the mount operation.

 ::

        | ----    LOG AREA     ---- | ----------    MAIN AREA    ------------ |

         -----+------+-----+--------+----   ------+-----+-----+---------------
         \    |      |     |        |   /  /      |     |     |               \
         / CS |  REF | REF |        |   \  \ DENT | INO | INO |               /
         \    |      |     |        |   /  /      |     |     |               \
          ----+------+-----+--------+---   -------+-----+-----+----------------
                  |     |                  ^            ^
                  |     |                  |            |
                  +------------------------+            |
                        |                               |
                        +-------------------------------+


                 Figure 2: UBIFS flash layout of log area with commit start nodes
                           (CS) and reference nodes (REF) pointing to main area
                           containing their buds


 LEB Property Tree/Table
 ~~~~~~~~~~~~~~~~~~~~~~~

 The LEB property tree is used to store per-LEB information. This includes the
 LEB type and amount of free and *dirty* (old, obsolete content) space [1]_ on
 the LEB. The type is important, because UBIFS never mixes index nodes with data
 nodes on a single LEB and thus each LEB has a specific purpose. This again is
 useful for free space calculations. See [UBIFS-WP] for more details.

 The LEB property tree again is a B+ tree, but it is much smaller than the
 index. Due to its smaller size it is always written as one chunk on every
 commit. Thus, saving the LPT is an atomic operation.


 .. [1] Since LEBs can only be appended and never overwritten, there is a
    difference between free space ie. the remaining space left on the LEB to be
    written to without erasing it and previously written content that is obsolete
    but can't be overwritten without erasing the full LEB.


 UBIFS Authentication
 ====================

 This chapter introduces UBIFS authentication which enables UBIFS to verify
 the authenticity and integrity of metadata and file contents stored on flash.


 Threat Model
 ------------

 UBIFS authentication enables detection of offline data modification. While it
 does not prevent it, it enables (trusted) code to check the integrity and
 authenticity of on-flash file contents and filesystem metadata. This covers
 attacks where file contents are swapped.

 UBIFS authentication will not protect against rollback of full flash contents.
 Ie. an attacker can still dump the flash and restore it at a later time without
 detection. It will also not protect against partial rollback of individual
 index commits. That means that an attacker is able to partially undo changes.
 This is possible because UBIFS does not immediately overwrites obsolete
 versions of the index tree or the journal, but instead marks them as obsolete
 and garbage collection erases them at a later time. An attacker can use this by
 erasing parts of the current tree and restoring old versions that are still on
 the flash and have not yet been erased. This is possible, because every commit
 will always write a new version of the index root node and the master node
 without overwriting the previous version. This is further helped by the
 wear-leveling operations of UBI which copies contents from one physical
 eraseblock to another and does not atomically erase the first eraseblock.

 UBIFS authentication does not cover attacks where an attacker is able to
 execute code on the device after the authentication key was provided.
 Additional measures like secure boot and trusted boot have to be taken to
 ensure that only trusted code is executed on a device.


 Authentication
 --------------

 To be able to fully trust data read from flash, all UBIFS data structures
 stored on flash are authenticated. That is:

 - The index which includes file contents, file metadata like extended
   attributes, file length etc.
 - The journal which also contains file contents and metadata by recording changes
   to the filesystem
 - The LPT which stores UBI LEB metadata which UBIFS uses for free space accounting


 Index Authentication
 ~~~~~~~~~~~~~~~~~~~~

 Through UBIFS' concept of a wandering tree, it already takes care of only
 updating and persisting changed parts from leaf node up to the root node
 of the full B+ tree. This enables us to augment the index nodes of the tree
 with a hash over each node's child nodes. As a result, the index basically also
 a Merkle tree. Since the leaf nodes of the index contain the actual filesystem
 data, the hashes of their parent index nodes thus cover all the file contents
 and file metadata. When a file changes, the UBIFS index is updated accordingly
 from the leaf nodes up to the root node including the master node. This process
 can be hooked to recompute the hash only for each changed node at the same time.
 Whenever a file is read, UBIFS can verify the hashes from each leaf node up to
 the root node to ensure the node's integrity.

 To ensure the authenticity of the whole index, the UBIFS master node stores a
 keyed hash (HMAC) over its own contents and a hash of the root node of the index
 tree. As mentioned above, the master node is always written to the flash whenever
 the index is persisted (ie. on index commit).

 Using this approach only UBIFS index nodes and the master node are changed to
 include a hash. All other types of nodes will remain unchanged. This reduces
 the storage overhead which is precious for users of UBIFS (ie. embedded
 devices).

 ::

                              +---------------+
                              |  Master Node  |
                              |    (hash)     |
                              +---------------+
                                      |
                                      v
                             +-------------------+
                             |  Index Node #1    |
                             |                   |
                             | branch0   branchn |
                             | (hash)    (hash)  |
                             +-------------------+
                                |    ...   |  (fanout: 8)
                                |          |
                        +-------+          +------+
                        |                         |
                        v                         v
             +-------------------+       +-------------------+
             |  Index Node #2    |       |  Index Node #3    |
             |                   |       |                   |
             | branch0   branchn |       | branch0   branchn |
             | (hash)    (hash)  |       | (hash)    (hash)  |
             +-------------------+       +-------------------+
                  |   ...                     |   ...   |
                  v                           v         v
                +-----------+         +----------+  +-----------+
                | Data Node |         | INO Node |  | DENT Node |
                +-----------+         +----------+  +-----------+


            Figure 3: Coverage areas of index node hash and master node HMAC


 The most important part for robustness and power-cut safety is to atomically
 persist the hash and file contents. Here the existing UBIFS logic for how
 changed nodes are persisted is already designed for this purpose such that
 UBIFS can safely recover if a power-cut occurs while persisting. Adding
 hashes to index nodes does not change this since each hash will be persisted
 atomically together with its respective node.


 Journal Authentication
 ~~~~~~~~~~~~~~~~~~~~~~

 The journal is authenticated too. Since the journal is continuously written
 it is necessary to also add authentication information frequently to the
 journal so that in case of a powercut not too much data can't be authenticated.
 This is done by creating a continuous hash beginning from the commit start node
 over the previous reference nodes, the current reference node, and the bud
 nodes. From time to time whenever it is suitable authentication nodes are added
 between the bud nodes. This new node type contains a HMAC over the current state
 of the hash chain. That way a journal can be authenticated up to the last
 authentication node. The tail of the journal which may not have a authentication
 node cannot be authenticated and is skipped during journal replay.

 We get this picture for journal authentication::

     ,,,,,,,,
     ,......,...........................................
     ,. CS  ,               hash1.----.           hash2.----.
     ,.  |  ,                    .    |hmac            .    |hmac
     ,.  v  ,                    .    v                .    v
     ,.REF#0,-> bud -> bud -> bud.-> auth -> bud -> bud.-> auth ...
     ,..|...,...........................................
     ,  |   ,
     ,  |   ,,,,,,,,,,,,,,,
     .  |            hash3,----.
     ,  |                 ,    |hmac
     ,  v                 ,    v
     , REF#1 -> bud -> bud,-> auth ...
     ,,,|,,,,,,,,,,,,,,,,,,
        v
       REF#2 -> ...
        |
        V
       ...

 Since the hash also includes the reference nodes an attacker cannot reorder or
 skip any journal heads for replay. An attacker can only remove bud nodes or
 reference nodes from the end of the journal, effectively rewinding the
 filesystem at maximum back to the last commit.

 The location of the log area is stored in the master node. Since the master
 node is authenticated with a HMAC as described above, it is not possible to
 tamper with that without detection. The size of the log area is specified when
 the filesystem is created using `mkfs.ubifs` and stored in the superblock node.
 To avoid tampering with this and other values stored there, a HMAC is added to
 the superblock struct. The superblock node is stored in LEB 0 and is only
 modified on feature flag or similar changes, but never on file changes.


 LPT Authentication
 ~~~~~~~~~~~~~~~~~~

 The location of the LPT root node on the flash is stored in the UBIFS master
 node. Since the LPT is written and read atomically on every commit, there is
 no need to authenticate individual nodes of the tree. It suffices to
 protect the integrity of the full LPT by a simple hash stored in the master
 node. Since the master node itself is authenticated, the LPTs authenticity can
 be verified by verifying the authenticity of the master node and comparing the
 LTP hash stored there with the hash computed from the read on-flash LPT.


 Key Management
 --------------

 For simplicity, UBIFS authentication uses a single key to compute the HMACs
 of superblock, master, commit start and reference nodes. This key has to be
 available on creation of the filesystem (`mkfs.ubifs`) to authenticate the
 superblock node. Further, it has to be available on mount of the filesystem
 to verify authenticated nodes and generate new HMACs for changes.

 UBIFS authentication is intended to operate side-by-side with UBIFS encryption
 (fscrypt) to provide confidentiality and authenticity. Since UBIFS encryption
 has a different approach of encryption policies per directory, there can be
 multiple fscrypt master keys and there might be folders without encryption.
 UBIFS authentication on the other hand has an all-or-nothing approach in the
 sense that it either authenticates everything of the filesystem or nothing.
 Because of this and because UBIFS authentication should also be usable without
 encryption, it does not share the same master key with fscrypt, but manages
 a dedicated authentication key.

 The API for providing the authentication key has yet to be defined, but the
 key can eg. be provided by userspace through a keyring similar to the way it
 is currently done in fscrypt. It should however be noted that the current
 fscrypt approach has shown its flaws and the userspace API will eventually
 change [FSCRYPT-POLICY2].

 Nevertheless, it will be possible for a user to provide a single passphrase
 or key in userspace that covers UBIFS authentication and encryption. This can
 be solved by the corresponding userspace tools which derive a second key for
 authentication in addition to the derived fscrypt master key used for
 encryption.

 To be able to check if the proper key is available on mount, the UBIFS
 superblock node will additionally store a hash of the authentication key. This
 approach is similar to the approach proposed for fscrypt encryption policy v2
 [FSCRYPT-POLICY2].


 Future Extensions
 =================

 In certain cases where a vendor wants to provide an authenticated filesystem
 image to customers, it should be possible to do so without sharing the secret
 UBIFS authentication key. Instead, in addition the each HMAC a digital
 signature could be stored where the vendor shares the public key alongside the
 filesystem image. In case this filesystem has to be modified afterwards,
 UBIFS can exchange all digital signatures with HMACs on first mount similar
 to the way the IMA/EVM subsystem deals with such situations. The HMAC key
 will then have to be provided beforehand in the normal way.


 References
 ==========

 [CRYPTSETUP2]        https://www.saout.de/pipermail/dm-crypt/2017-November/005745.html

 [DMC-CBC-ATTACK]     https://www.jakoblell.com/blog/2013/12/22/practical-malleability-attack-against-cbc-encrypted-luks-partitions/

 [DM-INTEGRITY]       https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.rst

 [DM-VERITY]          https://www.kernel.org/doc/Documentation/device-mapper/verity.rst

 [FSCRYPT-POLICY2]    https://www.spinics.net/lists/linux-ext4/msg58710.html

 [UBIFS-WP]           http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf
	.. SPDX-License-Identifier: GPL-2.0

	.. UBIFS Authentication
	.. sigma star gmbh
	.. 2018

	============================
	UBIFS Authentication Support
	============================

	Introduction
	============

	UBIFS utilizes the fscrypt framework to provide confidentiality for file
	contents and file names. This prevents attacks where an attacker is able to
	read contents of the filesystem on a single point in time. A classic example
	is a lost smartphone where the attacker is unable to read personal data stored
	on the device without the filesystem decryption key.

	At the current state, UBIFS encryption however does not prevent attacks where
	the attacker is able to modify the filesystem contents and the user uses the
	device afterwards. In such a scenario an attacker can modify filesystem
	contents arbitrarily without the user noticing. One example is to modify a
	binary to perform a malicious action when executed [DMC-CBC-ATTACK]. Since
	most of the filesystem metadata of UBIFS is stored in plain, this makes it
	fairly easy to swap files and replace their contents.

	Other full disk encryption systems like dm-crypt cover all filesystem metadata,
	which makes such kinds of attacks more complicated, but not impossible.
	Especially, if the attacker is given access to the device multiple points in
	time. For dm-crypt and other filesystems that build upon the Linux block IO
	layer, the dm-integrity or dm-verity subsystems [DM-INTEGRITY, DM-VERITY]
	can be used to get full data authentication at the block layer.
	These can also be combined with dm-crypt [CRYPTSETUP2].

	This document describes an approach to get file contents _and_ full metadata
	authentication for UBIFS. Since UBIFS uses fscrypt for file contents and file
	name encryption, the authentication system could be tied into fscrypt such that
	existing features like key derivation can be utilized. It should however also
	be possible to use UBIFS authentication without using encryption.


	MTD, UBI & UBIFS
	----------------

	On Linux, the MTD (Memory Technology Devices) subsystem provides a uniform
	interface to access raw flash devices. One of the more prominent subsystems that
	work on top of MTD is UBI (Unsorted Block Images). It provides volume management
	for flash devices and is thus somewhat similar to LVM for block devices. In
	addition, it deals with flash-specific wear-leveling and transparent I/O error
	handling. UBI offers logical erase blocks (LEBs) to the layers on top of it
	and maps them transparently to physical erase blocks (PEBs) on the flash.

	UBIFS is a filesystem for raw flash which operates on top of UBI. Thus, wear
	leveling and some flash specifics are left to UBI, while UBIFS focuses on
	scalability, performance and recoverability.

	::

	+------------+ +*******+ +-----------+ +-----+
	\| \| * UBIFS * \| UBI-BLOCK \| \| ... \|
	\| JFFS/JFFS2 \| +*******+ +-----------+ +-----+
	\| \| +-----------------------------+ +-----------+ +-----+
	\| \| \| UBI \| \| MTD-BLOCK \| \| ... \|
	+------------+ +-----------------------------+ +-----------+ +-----+
	+------------------------------------------------------------------+
	\| MEMORY TECHNOLOGY DEVICES (MTD) \|
	+------------------------------------------------------------------+
	+-----------------------------+ +--------------------------+ +-----+
	\| NAND DRIVERS \| \| NOR DRIVERS \| \| ... \|
	+-----------------------------+ +--------------------------+ +-----+

	Figure 1: Linux kernel subsystems for dealing with raw flash



	Internally, UBIFS maintains multiple data structures which are persisted on
	the flash:

	- Index: an on-flash B+ tree where the leaf nodes contain filesystem data
	- Journal: an additional data structure to collect FS changes before updating
	the on-flash index and reduce flash wear.
	- Tree Node Cache (TNC): an in-memory B+ tree that reflects the current FS
	state to avoid frequent flash reads. It is basically the in-memory
	representation of the index, but contains additional attributes.
	- LEB property tree (LPT): an on-flash B+ tree for free space accounting per
	UBI LEB.

	In the remainder of this section we will cover the on-flash UBIFS data
	structures in more detail. The TNC is of less importance here since it is never
	persisted onto the flash directly. More details on UBIFS can also be found in
	[UBIFS-WP].


	UBIFS Index & Tree Node Cache
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

	Basic on-flash UBIFS entities are called nodes. UBIFS knows different types
	of nodes. Eg. data nodes (``struct ubifs_data_node``) which store chunks of file
	contents or inode nodes (``struct ubifs_ino_node``) which represent VFS inodes.
	Almost all types of nodes share a common header (``ubifs_ch``) containing basic
	information like node type, node length, a sequence number, etc. (see
	``fs/ubifs/ubifs-media.h`` in kernel source). Exceptions are entries of the LPT
	and some less important node types like padding nodes which are used to pad
	unusable content at the end of LEBs.

	To avoid re-writing the whole B+ tree on every single change, it is implemented
	as wandering tree, where only the changed nodes are re-written and previous
	versions of them are obsoleted without erasing them right away. As a result,
	the index is not stored in a single place on the flash, but wanders around
	and there are obsolete parts on the flash as long as the LEB containing them is
	not reused by UBIFS. To find the most recent version of the index, UBIFS stores
	a special node called master node into UBI LEB 1 which always points to the
	most recent root node of the UBIFS index. For recoverability, the master node
	is additionally duplicated to LEB 2. Mounting UBIFS is thus a simple read of
	LEB 1 and 2 to get the current master node and from there get the location of
	the most recent on-flash index.

	The TNC is the in-memory representation of the on-flash index. It contains some
	additional runtime attributes per node which are not persisted. One of these is
	a dirty-flag which marks nodes that have to be persisted the next time the
	index is written onto the flash. The TNC acts as a write-back cache and all
	modifications of the on-flash index are done through the TNC. Like other caches,
	the TNC does not have to mirror the full index into memory, but reads parts of
	it from flash whenever needed. A commit is the UBIFS operation of updating the
	on-flash filesystem structures like the index. On every commit, the TNC nodes
	marked as dirty are written to the flash to update the persisted index.


	Journal
	~~~~~~~

	To avoid wearing out the flash, the index is only persisted (committed) when
	certain conditions are met (eg. ``fsync(2)``). The journal is used to record
	any changes (in form of inode nodes, data nodes etc.) between commits
	of the index. During mount, the journal is read from the flash and replayed
	onto the TNC (which will be created on-demand from the on-flash index).

	UBIFS reserves a bunch of LEBs just for the journal called log area. The
	amount of log area LEBs is configured on filesystem creation (using
	``mkfs.ubifs``) and stored in the superblock node. The log area contains only
	two types of nodes: reference nodes and commit start nodes. A commit start
	node is written whenever an index commit is performed. Reference nodes are
	written on every journal update. Each reference node points to the position of
	other nodes (inode nodes, data nodes etc.) on the flash that are part of this
	journal entry. These nodes are called buds and describe the actual filesystem
	changes including their data.

	The log area is maintained as a ring. Whenever the journal is almost full,
	a commit is initiated. This also writes a commit start node so that during
	mount, UBIFS will seek for the most recent commit start node and just replay
	every reference node after that. Every reference node before the commit start
	node will be ignored as they are already part of the on-flash index.

	When writing a journal entry, UBIFS first ensures that enough space is
	available to write the reference node and buds part of this entry. Then, the
	reference node is written and afterwards the buds describing the file changes.
	On replay, UBIFS will record every reference node and inspect the location of
	the referenced LEBs to discover the buds. If these are corrupt or missing,
	UBIFS will attempt to recover them by re-reading the LEB. This is however only
	done for the last referenced LEB of the journal. Only this can become corrupt
	because of a power cut. If the recovery fails, UBIFS will not mount. An error
	for every other LEB will directly cause UBIFS to fail the mount operation.

	::

	\| ---- LOG AREA ---- \| ---------- MAIN AREA ------------ \|

	-----+------+-----+--------+---- ------+-----+-----+---------------
	\ \| \| \| \| / / \| \| \| \
	/ CS \| REF \| REF \| \| \ \ DENT \| INO \| INO \| /
	\ \| \| \| \| / / \| \| \| \
	----+------+-----+--------+--- -------+-----+-----+----------------
	\| \| ^ ^
	\| \| \| \|
	+------------------------+ \|
	\| \|
	+-------------------------------+


	Figure 2: UBIFS flash layout of log area with commit start nodes
	(CS) and reference nodes (REF) pointing to main area
	containing their buds


	LEB Property Tree/Table
	~~~~~~~~~~~~~~~~~~~~~~~

	The LEB property tree is used to store per-LEB information. This includes the
	LEB type and amount of free and dirty (old, obsolete content) space [1]_ on
	the LEB. The type is important, because UBIFS never mixes index nodes with data
	nodes on a single LEB and thus each LEB has a specific purpose. This again is
	useful for free space calculations. See [UBIFS-WP] for more details.

	The LEB property tree again is a B+ tree, but it is much smaller than the
	index. Due to its smaller size it is always written as one chunk on every
	commit. Thus, saving the LPT is an atomic operation.


	.. [1] Since LEBs can only be appended and never overwritten, there is a
	difference between free space ie. the remaining space left on the LEB to be
	written to without erasing it and previously written content that is obsolete
	but can't be overwritten without erasing the full LEB.


	UBIFS Authentication
	====================

	This chapter introduces UBIFS authentication which enables UBIFS to verify
	the authenticity and integrity of metadata and file contents stored on flash.


	Threat Model
	------------

	UBIFS authentication enables detection of offline data modification. While it
	does not prevent it, it enables (trusted) code to check the integrity and
	authenticity of on-flash file contents and filesystem metadata. This covers
	attacks where file contents are swapped.

	UBIFS authentication will not protect against rollback of full flash contents.
	Ie. an attacker can still dump the flash and restore it at a later time without
	detection. It will also not protect against partial rollback of individual
	index commits. That means that an attacker is able to partially undo changes.
	This is possible because UBIFS does not immediately overwrites obsolete
	versions of the index tree or the journal, but instead marks them as obsolete
	and garbage collection erases them at a later time. An attacker can use this by
	erasing parts of the current tree and restoring old versions that are still on
	the flash and have not yet been erased. This is possible, because every commit
	will always write a new version of the index root node and the master node
	without overwriting the previous version. This is further helped by the
	wear-leveling operations of UBI which copies contents from one physical
	eraseblock to another and does not atomically erase the first eraseblock.

	UBIFS authentication does not cover attacks where an attacker is able to
	execute code on the device after the authentication key was provided.
	Additional measures like secure boot and trusted boot have to be taken to
	ensure that only trusted code is executed on a device.


	Authentication
	--------------

	To be able to fully trust data read from flash, all UBIFS data structures
	stored on flash are authenticated. That is:

	- The index which includes file contents, file metadata like extended
	attributes, file length etc.
	- The journal which also contains file contents and metadata by recording changes
	to the filesystem
	- The LPT which stores UBI LEB metadata which UBIFS uses for free space accounting


	Index Authentication
	~~~~~~~~~~~~~~~~~~~~

	Through UBIFS' concept of a wandering tree, it already takes care of only
	updating and persisting changed parts from leaf node up to the root node
	of the full B+ tree. This enables us to augment the index nodes of the tree
	with a hash over each node's child nodes. As a result, the index basically also
	a Merkle tree. Since the leaf nodes of the index contain the actual filesystem
	data, the hashes of their parent index nodes thus cover all the file contents
	and file metadata. When a file changes, the UBIFS index is updated accordingly
	from the leaf nodes up to the root node including the master node. This process
	can be hooked to recompute the hash only for each changed node at the same time.
	Whenever a file is read, UBIFS can verify the hashes from each leaf node up to
	the root node to ensure the node's integrity.

	To ensure the authenticity of the whole index, the UBIFS master node stores a
	keyed hash (HMAC) over its own contents and a hash of the root node of the index
	tree. As mentioned above, the master node is always written to the flash whenever
	the index is persisted (ie. on index commit).

	Using this approach only UBIFS index nodes and the master node are changed to
	include a hash. All other types of nodes will remain unchanged. This reduces
	the storage overhead which is precious for users of UBIFS (ie. embedded
	devices).

	::

	+---------------+
	\| Master Node \|
	\| (hash) \|
	+---------------+
	\|
	v
	+-------------------+
	\| Index Node #1 \|
	\| \|
	\| branch0 branchn \|
	\| (hash) (hash) \|
	+-------------------+
	\| ... \| (fanout: 8)
	\| \|
	+-------+ +------+
	\| \|
	v v
	+-------------------+ +-------------------+
	\| Index Node #2 \| \| Index Node #3 \|
	\| \| \| \|
	\| branch0 branchn \| \| branch0 branchn \|
	\| (hash) (hash) \| \| (hash) (hash) \|
	+-------------------+ +-------------------+
	\| ... \| ... \|
	v v v
	+-----------+ +----------+ +-----------+
	\| Data Node \| \| INO Node \| \| DENT Node \|
	+-----------+ +----------+ +-----------+


	Figure 3: Coverage areas of index node hash and master node HMAC



	The most important part for robustness and power-cut safety is to atomically
	persist the hash and file contents. Here the existing UBIFS logic for how
	changed nodes are persisted is already designed for this purpose such that
	UBIFS can safely recover if a power-cut occurs while persisting. Adding
	hashes to index nodes does not change this since each hash will be persisted
	atomically together with its respective node.


	Journal Authentication
	~~~~~~~~~~~~~~~~~~~~~~

	The journal is authenticated too. Since the journal is continuously written
	it is necessary to also add authentication information frequently to the
	journal so that in case of a powercut not too much data can't be authenticated.
	This is done by creating a continuous hash beginning from the commit start node
	over the previous reference nodes, the current reference node, and the bud
	nodes. From time to time whenever it is suitable authentication nodes are added
	between the bud nodes. This new node type contains a HMAC over the current state
	of the hash chain. That way a journal can be authenticated up to the last
	authentication node. The tail of the journal which may not have a authentication
	node cannot be authenticated and is skipped during journal replay.

	We get this picture for journal authentication::

	,,,,,,,,
	,......,...........................................
	,. CS , hash1.----. hash2.----.
	,. \| , . \|hmac . \|hmac
	,. v , . v . v
	,.REF#0,-> bud -> bud -> bud.-> auth -> bud -> bud.-> auth ...
	,..\|...,...........................................
	, \| ,
	, \| ,,,,,,,,,,,,,,,
	. \| hash3,----.
	, \| , \|hmac
	, v , v
	, REF#1 -> bud -> bud,-> auth ...
	,,,\|,,,,,,,,,,,,,,,,,,
	v
	REF#2 -> ...
	\|
	V
	...

	Since the hash also includes the reference nodes an attacker cannot reorder or
	skip any journal heads for replay. An attacker can only remove bud nodes or
	reference nodes from the end of the journal, effectively rewinding the
	filesystem at maximum back to the last commit.

	The location of the log area is stored in the master node. Since the master
	node is authenticated with a HMAC as described above, it is not possible to
	tamper with that without detection. The size of the log area is specified when
	the filesystem is created using `mkfs.ubifs` and stored in the superblock node.
	To avoid tampering with this and other values stored there, a HMAC is added to
	the superblock struct. The superblock node is stored in LEB 0 and is only
	modified on feature flag or similar changes, but never on file changes.


	LPT Authentication
	~~~~~~~~~~~~~~~~~~

	The location of the LPT root node on the flash is stored in the UBIFS master
	node. Since the LPT is written and read atomically on every commit, there is
	no need to authenticate individual nodes of the tree. It suffices to
	protect the integrity of the full LPT by a simple hash stored in the master
	node. Since the master node itself is authenticated, the LPTs authenticity can
	be verified by verifying the authenticity of the master node and comparing the
	LTP hash stored there with the hash computed from the read on-flash LPT.


	Key Management
	--------------

	For simplicity, UBIFS authentication uses a single key to compute the HMACs
	of superblock, master, commit start and reference nodes. This key has to be
	available on creation of the filesystem (`mkfs.ubifs`) to authenticate the
	superblock node. Further, it has to be available on mount of the filesystem
	to verify authenticated nodes and generate new HMACs for changes.

	UBIFS authentication is intended to operate side-by-side with UBIFS encryption
	(fscrypt) to provide confidentiality and authenticity. Since UBIFS encryption
	has a different approach of encryption policies per directory, there can be
	multiple fscrypt master keys and there might be folders without encryption.
	UBIFS authentication on the other hand has an all-or-nothing approach in the
	sense that it either authenticates everything of the filesystem or nothing.
	Because of this and because UBIFS authentication should also be usable without
	encryption, it does not share the same master key with fscrypt, but manages
	a dedicated authentication key.

	The API for providing the authentication key has yet to be defined, but the
	key can eg. be provided by userspace through a keyring similar to the way it
	is currently done in fscrypt. It should however be noted that the current
	fscrypt approach has shown its flaws and the userspace API will eventually
	change [FSCRYPT-POLICY2].

	Nevertheless, it will be possible for a user to provide a single passphrase
	or key in userspace that covers UBIFS authentication and encryption. This can
	be solved by the corresponding userspace tools which derive a second key for
	authentication in addition to the derived fscrypt master key used for
	encryption.

	To be able to check if the proper key is available on mount, the UBIFS
	superblock node will additionally store a hash of the authentication key. This
	approach is similar to the approach proposed for fscrypt encryption policy v2
	[FSCRYPT-POLICY2].


	Future Extensions
	=================

	In certain cases where a vendor wants to provide an authenticated filesystem
	image to customers, it should be possible to do so without sharing the secret
	UBIFS authentication key. Instead, in addition the each HMAC a digital
	signature could be stored where the vendor shares the public key alongside the
	filesystem image. In case this filesystem has to be modified afterwards,
	UBIFS can exchange all digital signatures with HMACs on first mount similar
	to the way the IMA/EVM subsystem deals with such situations. The HMAC key
	will then have to be provided beforehand in the normal way.


	References
	==========

	[CRYPTSETUP2] https://www.saout.de/pipermail/dm-crypt/2017-November/005745.html

	[DMC-CBC-ATTACK] https://www.jakoblell.com/blog/2013/12/22/practical-malleability-attack-against-cbc-encrypted-luks-partitions/

	[DM-INTEGRITY] https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.rst

	[DM-VERITY] https://www.kernel.org/doc/Documentation/device-mapper/verity.rst

	[FSCRYPT-POLICY2] https://www.spinics.net/lists/linux-ext4/msg58710.html

	[UBIFS-WP] http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf