| .. SPDX-License-Identifier: GPL-2.0 |
| |
| .. _fsverity: |
| |
| ======================================================= |
| fs-verity: read-only file-based authenticity protection |
| ======================================================= |
| |
| Introduction |
| ============ |
| |
| fs-verity (``fs/verity/``) is a support layer that filesystems can |
| hook into to support transparent integrity and authenticity protection |
| of read-only files. Currently, it is supported by the ext4, f2fs, and |
| btrfs filesystems. Like fscrypt, it needs relatively little |
| filesystem-specific code. |
| |
| fs-verity is similar to `dm-verity |
| <https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_ |
| but works on files rather than block devices. On regular files on |
| filesystems supporting fs-verity, userspace can execute an ioctl that |
| causes the filesystem to build a Merkle tree for the file and persist |
| it to a filesystem-specific location associated with the file. |
| |
| After this, the file is made readonly, and all reads from the file are |
| automatically verified against the file's Merkle tree. Reads of any |
| corrupted data, including mmap reads, will fail. |
| |
| Userspace can use another ioctl to retrieve the root hash (actually |
| the "fs-verity file digest", which is a hash that includes the Merkle |
| tree root hash) that fs-verity is enforcing for the file. This ioctl |
| executes in constant time, regardless of the file size. |
| |
| fs-verity is essentially a way to hash a file in constant time, |
| subject to the caveat that reads which would violate the hash will |
| fail at runtime. |
| |
| Use cases |
| ========= |
| |
| By itself, fs-verity only provides integrity protection, i.e. |
| detection of accidental (non-malicious) corruption. |
| |
| However, because fs-verity makes retrieving the file hash extremely |
| efficient, it's primarily meant to be used as a tool to support |
| authentication (detection of malicious modifications) or auditing |
| (logging file hashes before use). |
| |
| A standard file hash could be used instead of fs-verity. However, |
| this is inefficient if the file is large and only a small portion may |
| be accessed. This is often the case for Android application package |
| (APK) files, for example. These typically contain many translations, |
| classes, and other resources that are infrequently or even never |
| accessed on a particular device. It would be slow and wasteful to |
| read and hash the entire file before starting the application. |
| |
| Unlike an ahead-of-time hash, fs-verity also re-verifies data each |
| time it's paged in. This ensures that malicious disk firmware can't |
| undetectably change the contents of the file at runtime. |
| |
| fs-verity does not replace or obsolete dm-verity. dm-verity should |
| still be used on read-only filesystems. fs-verity is for files that |
| must live on a read-write filesystem because they are independently |
| updated and potentially user-installed, so dm-verity cannot be used. |
| |
| fs-verity does not mandate a particular scheme for authenticating its |
| file hashes. (Similarly, dm-verity does not mandate a particular |
| scheme for authenticating its block device root hashes.) Options for |
| authenticating fs-verity file hashes include: |
| |
| - Trusted userspace code. Often, the userspace code that accesses |
| files can be trusted to authenticate them. Consider e.g. an |
| application that wants to authenticate data files before using them, |
| or an application loader that is part of the operating system (which |
| is already authenticated in a different way, such as by being loaded |
| from a read-only partition that uses dm-verity) and that wants to |
| authenticate applications before loading them. In these cases, this |
| trusted userspace code can authenticate a file's contents by |
| retrieving its fs-verity digest using `FS_IOC_MEASURE_VERITY`_, then |
| verifying a signature of it using any userspace cryptographic |
| library that supports digital signatures. |
| |
| - Integrity Measurement Architecture (IMA). IMA supports fs-verity |
| file digests as an alternative to its traditional full file digests. |
| "IMA appraisal" enforces that files contain a valid, matching |
| signature in their "security.ima" extended attribute, as controlled |
| by the IMA policy. For more information, see the IMA documentation. |
| |
| - Trusted userspace code in combination with `Built-in signature |
| verification`_. This approach should be used only with great care. |
| |
| User API |
| ======== |
| |
| FS_IOC_ENABLE_VERITY |
| -------------------- |
| |
| The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file. It takes |
| in a pointer to a struct fsverity_enable_arg, defined as |
| follows:: |
| |
|     struct fsverity_enable_arg { |
|         __u32 version; |
|         __u32 hash_algorithm; |
|         __u32 block_size; |
|         __u32 salt_size; |
|         __u64 salt_ptr; |
|         __u32 sig_size; |
|         __u32 __reserved1; |
|         __u64 sig_ptr; |
|         __u64 __reserved2[11]; |
|     }; |
| |
| This structure contains the parameters of the Merkle tree to build for |
| the file. It must be initialized as follows: |
| |
| - ``version`` must be 1. |
| - ``hash_algorithm`` must be the identifier for the hash algorithm to |
| use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See |
| ``include/uapi/linux/fsverity.h`` for the list of possible values. |
| - ``block_size`` is the Merkle tree block size, in bytes. In Linux |
| v6.3 and later, this can be any power of 2 between (inclusively) |
| 1024 and the minimum of the system page size and the filesystem |
| block size. In earlier versions, the page size was the only allowed |
| value. |
| - ``salt_size`` is the size of the salt in bytes, or 0 if no salt is |
| provided. The salt is a value that is prepended to every hashed |
| block; it can be used to personalize the hashing for a particular |
| file or device. Currently the maximum salt size is 32 bytes. |
| - ``salt_ptr`` is the pointer to the salt, or NULL if no salt is |
| provided. |
| - ``sig_size`` is the size of the builtin signature in bytes, or 0 if no |
| builtin signature is provided. Currently the builtin signature is |
| (somewhat arbitrarily) limited to 16128 bytes. |
| - ``sig_ptr`` is the pointer to the builtin signature, or NULL if no |
| builtin signature is provided. A builtin signature is only needed |
| if the `Built-in signature verification`_ feature is being used. It |
| is not needed for IMA appraisal, and it is not needed if the file |
| signature is being handled entirely in userspace. |
| - All reserved fields must be zeroed. |
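| |
| As a minimal sketch of the initialization rules above, the following C |
| code fills in the structure for SHA-256 with no salt and no builtin |
| signature, then issues the ioctl. The helper names |
| ``init_enable_arg()`` and ``enable_verity()`` are hypothetical and the |
| block size is left to the caller; this is an illustration, not the only |
| way to call the ioctl. |
| |
```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

/* Hypothetical helper: SHA-256, no salt, no builtin signature. */
void init_enable_arg(struct fsverity_enable_arg *arg, uint32_t block_size)
{
    memset(arg, 0, sizeof(*arg));   /* also zeroes the reserved fields */
    arg->version = 1;
    arg->hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
    arg->block_size = block_size;
    /* salt_size, salt_ptr, sig_size and sig_ptr stay 0 (none provided) */
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
int enable_verity(const char *path, uint32_t block_size)
{
    struct fsverity_enable_arg arg;
    int fd, ret, saved_errno;

    fd = open(path, O_RDONLY);      /* the ioctl requires an O_RDONLY fd */
    if (fd < 0)
        return -1;
    init_enable_arg(&arg, block_size);
    ret = ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;            /* report the ioctl's error, not close()'s */
    return ret;
}
```
| |
| For example, ``enable_verity("file", 4096)`` would request 4096-byte |
| Merkle tree blocks. On failure, ``errno`` is left as set by ``open()`` |
| or by the ioctl. |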
| |
| FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for |
| the file and persist it to a filesystem-specific location associated |
| with the file, then mark the file as a verity file. This ioctl may |
| take a long time to execute on large files, and it is interruptible by |
| fatal signals. |
| |
| FS_IOC_ENABLE_VERITY checks for write access to the inode. However, |
| it must be executed on an O_RDONLY file descriptor and no processes |
| can have the file open for writing. Attempts to open the file for |
| writing while this ioctl is executing will fail with ETXTBSY. (This |
| is necessary to guarantee that no writable file descriptors will exist |
| after verity is enabled, and to guarantee that the file's contents are |
| stable while the Merkle tree is being built over it.) |
| |
| On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a |
| verity file. On failure (including the case of interruption by a |
| fatal signal), no changes are made to the file. |
| |
| FS_IOC_ENABLE_VERITY can fail with the following errors: |
| |
| - ``EACCES``: the process does not have write access to the file |
| - ``EBADMSG``: the builtin signature is malformed |
| - ``EBUSY``: this ioctl is already running on the file |
| - ``EEXIST``: the file already has verity enabled |
| - ``EFAULT``: the caller provided inaccessible memory |
| - ``EFBIG``: the file is too large to enable verity on |
| - ``EINTR``: the operation was interrupted by a fatal signal |
| - ``EINVAL``: unsupported version, hash algorithm, or block size; or |
| reserved bits are set; or the file descriptor refers to neither a |
| regular file nor a directory. |
| - ``EISDIR``: the file descriptor refers to a directory |
| - ``EKEYREJECTED``: the builtin signature doesn't match the file |
| - ``EMSGSIZE``: the salt or builtin signature is too long |
| - ``ENOKEY``: the ".fs-verity" keyring doesn't contain the certificate |
| needed to verify the builtin signature |
| - ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not |
| available in the kernel's crypto API as currently configured (e.g. |
| for SHA-512, missing CONFIG_CRYPTO_SHA512). |
| - ``ENOTTY``: this type of filesystem does not implement fs-verity |
| - ``EOPNOTSUPP``: the kernel was not configured with fs-verity |
| support; or the filesystem superblock has not had the 'verity' |
| feature enabled on it; or the filesystem does not support fs-verity |
| on this file. (See `Filesystem support`_.) |
| - ``EPERM``: the file is append-only; or, a builtin signature is |
| required and one was not provided. |
| - ``EROFS``: the filesystem is read-only |
| - ``ETXTBSY``: someone has the file open for writing. This can be the |
| caller's file descriptor, another open file descriptor, or the file |
| reference held by a writable memory map. |
| |
| FS_IOC_MEASURE_VERITY |
| --------------------- |
| |
| The FS_IOC_MEASURE_VERITY ioctl retrieves the digest of a verity file. |
| The fs-verity file digest is a cryptographic digest that identifies |
| the file contents that are being enforced on reads; it is computed via |
| a Merkle tree and is different from a traditional full-file digest. |
| |
| This ioctl takes in a pointer to a variable-length structure:: |
| |
|     struct fsverity_digest { |
|         __u16 digest_algorithm; |
|         __u16 digest_size; /* input/output */ |
|         __u8 digest[]; |
|     }; |
| |
| ``digest_size`` is an input/output field. On input, it must be |
| initialized to the number of bytes allocated for the variable-length |
| ``digest`` field. |
| |
| On success, 0 is returned and the kernel fills in the structure as |
| follows: |
| |
| - ``digest_algorithm`` will be the hash algorithm used for the file |
| digest. It will match ``fsverity_enable_arg::hash_algorithm``. |
| - ``digest_size`` will be the size of the digest in bytes, e.g. 32 |
| for SHA-256. (This can be redundant with ``digest_algorithm``.) |
| - ``digest`` will be the actual bytes of the digest. |
| |
| FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time, |
| regardless of the size of the file. |
| |
| FS_IOC_MEASURE_VERITY can fail with the following errors: |
| |
| - ``EFAULT``: the caller provided inaccessible memory |
| - ``ENODATA``: the file is not a verity file |
| - ``ENOTTY``: this type of filesystem does not implement fs-verity |
| - ``EOPNOTSUPP``: the kernel was not configured with fs-verity |
| support, or the filesystem superblock has not had the 'verity' |
| feature enabled on it. (See `Filesystem support`_.) |
| - ``EOVERFLOW``: the digest is longer than the specified |
| ``digest_size`` bytes. Try providing a larger buffer. |
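| |
| The calling convention can be sketched as follows. This is a minimal |
| example, not a reference implementation: ``measure_verity()`` and |
| ``EXAMPLE_MAX_DIGEST_SIZE`` are hypothetical names, and the 64-byte |
| buffer is an assumption chosen to be large enough for SHA-512. |
| |
```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

/* Assumption: 64 bytes is enough for the largest digest (SHA-512). */
#define EXAMPLE_MAX_DIGEST_SIZE 64

/*
 * Hypothetical helper. On success, returns a malloc()ed structure that the
 * caller must free(). On failure, returns NULL with errno set.
 */
struct fsverity_digest *measure_verity(const char *path)
{
    struct fsverity_digest *d;
    int fd, ret, saved_errno;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    d = calloc(1, sizeof(*d) + EXAMPLE_MAX_DIGEST_SIZE);
    if (!d) {
        close(fd);
        errno = ENOMEM;
        return NULL;
    }
    d->digest_size = EXAMPLE_MAX_DIGEST_SIZE;   /* input: bytes available */
    ret = ioctl(fd, FS_IOC_MEASURE_VERITY, d);
    saved_errno = errno;
    close(fd);
    if (ret != 0) {
        free(d);
        errno = saved_errno;    /* e.g. ENODATA if not a verity file */
        return NULL;
    }
    /* Output: digest_algorithm, digest_size and digest[] are now filled in. */
    return d;
}
```
| |
| A caller would typically compare ``digest_algorithm`` and the first |
| ``digest_size`` bytes of ``digest`` against an expected value, or |
| verify a signature over them. |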
| |
| FS_IOC_READ_VERITY_METADATA |
| --------------------------- |
| |
| The FS_IOC_READ_VERITY_METADATA ioctl reads verity metadata from a |
| verity file. This ioctl is available since Linux v5.12. |
| |
| This ioctl allows writing a server program that takes a verity file |
| and serves it to a client program, such that the client can do its own |
| fs-verity compatible verification of the file. This only makes sense |
| if the client doesn't trust the server and if the server needs to |
| provide the storage for the client. |
| |
| This is a fairly specialized use case, and most fs-verity users won't |
| need this ioctl. |
| |
| This ioctl takes in a pointer to the following structure:: |
| |
|     #define FS_VERITY_METADATA_TYPE_MERKLE_TREE 1 |
|     #define FS_VERITY_METADATA_TYPE_DESCRIPTOR 2 |
|     #define FS_VERITY_METADATA_TYPE_SIGNATURE 3 |
| |
|     struct fsverity_read_metadata_arg { |
|         __u64 metadata_type; |
|         __u64 offset; |
|         __u64 length; |
|         __u64 buf_ptr; |
|         __u64 __reserved; |
|     }; |
| |
| ``metadata_type`` specifies the type of metadata to read: |
| |
| - ``FS_VERITY_METADATA_TYPE_MERKLE_TREE`` reads the blocks of the |
| Merkle tree. The blocks are returned in order from the root level |
| to the leaf level. Within each level, the blocks are returned in |
| the same order that their hashes are themselves hashed. |
| See `Merkle tree`_ for more information. |
| |
| - ``FS_VERITY_METADATA_TYPE_DESCRIPTOR`` reads the fs-verity |
| descriptor. See `fs-verity descriptor`_. |
| |
| - ``FS_VERITY_METADATA_TYPE_SIGNATURE`` reads the builtin signature |
| which was passed to FS_IOC_ENABLE_VERITY, if any. See `Built-in |
| signature verification`_. |
| |
| The semantics are similar to those of ``pread()``. ``offset`` |
| specifies the offset in bytes into the metadata item to read from, and |
| ``length`` specifies the maximum number of bytes to read from the |
| metadata item. ``buf_ptr`` is the pointer to the buffer to read into, |
| cast to a 64-bit integer. ``__reserved`` must be 0. On success, the |
| number of bytes read is returned. 0 is returned at the end of the |
| metadata item. The returned length may be less than ``length``, for |
| example if the ioctl is interrupted. |
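| |
| Because short reads are possible, callers usually loop, much as they |
| would with ``pread()``. The following is a minimal sketch under that |
| assumption; ``read_verity_metadata()`` is a hypothetical helper name, |
| and it requires kernel headers from Linux v5.12 or later. |
| |
```c
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <linux/fsverity.h>

/*
 * Hypothetical helper: read up to buf_size bytes of one metadata item,
 * starting at offset 0. Returns the number of bytes read, or -1 with
 * errno set.
 */
ssize_t read_verity_metadata(int fd, uint64_t type, void *buf, size_t buf_size)
{
    size_t done = 0;

    while (done < buf_size) {
        /* The designated initializer leaves __reserved as 0. */
        struct fsverity_read_metadata_arg arg = {
            .metadata_type = type,
            .offset = done,
            .length = buf_size - done,
            .buf_ptr = (uint64_t)(uintptr_t)((char *)buf + done),
        };
        int n = ioctl(fd, FS_IOC_READ_VERITY_METADATA, &arg);

        if (n < 0)
            return -1;
        if (n == 0)     /* end of the metadata item */
            break;
        done += (size_t)n;  /* short read: continue from the new offset */
    }
    return (ssize_t)done;
}
```
| |
| For instance, passing ``FS_VERITY_METADATA_TYPE_DESCRIPTOR`` with a |
| 256-byte buffer would be one way to fetch the descriptor of a verity |
| file. |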
| |
| The metadata returned by FS_IOC_READ_VERITY_METADATA isn't guaranteed |
| to be authenticated against the file digest that would be returned by |
| `FS_IOC_MEASURE_VERITY`_, as the metadata is expected to be used to |
| implement fs-verity compatible verification anyway (though absent a |
| malicious disk, the metadata will indeed match). E.g. to implement |
| this ioctl, the filesystem is allowed to just read the Merkle tree |
| blocks from disk without actually verifying the path to the root node. |
| |
| FS_IOC_READ_VERITY_METADATA can fail with the following errors: |
| |
| - ``EFAULT``: the caller provided inaccessible memory |
| - ``EINTR``: the ioctl was interrupted before any data was read |
| - ``EINVAL``: reserved fields were set, or ``offset + length`` |
| overflowed |
| - ``ENODATA``: the file is not a verity file, or |
| FS_VERITY_METADATA_TYPE_SIGNATURE was requested but the file doesn't |
| have a builtin signature |
| - ``ENOTTY``: this type of filesystem does not implement fs-verity, or |
| this ioctl is not yet implemented on it |
| - ``EOPNOTSUPP``: the kernel was not configured with fs-verity |
| support, or the filesystem superblock has not had the 'verity' |
| feature enabled on it. (See `Filesystem support`_.) |
| |
| FS_IOC_GETFLAGS |
| --------------- |
| |
| The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity) |
| can also be used to check whether a file has fs-verity enabled or not. |
| To do so, check for FS_VERITY_FL (0x00100000) in the returned flags. |
| |
| The verity flag is not settable via FS_IOC_SETFLAGS. You must use |
| FS_IOC_ENABLE_VERITY instead, since parameters must be provided. |
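| |
| A minimal sketch of this check follows. ``has_verity_flag()`` is a |
| hypothetical helper name, and the fallback definition of FS_VERITY_FL |
| (using the value given above) is only an assumption for older headers. |
| |
```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FS_VERITY_FL
#define FS_VERITY_FL 0x00100000 /* fallback for headers older than v5.4 */
#endif

/* Hypothetical helper: 1 if verity is enabled, 0 if not, -1 on error. */
int has_verity_flag(const char *path)
{
    int fd, ret, saved_errno;
    int flags = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ret = ioctl(fd, FS_IOC_GETFLAGS, &flags);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    if (ret != 0)
        return -1;
    return (flags & FS_VERITY_FL) ? 1 : 0;
}
```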
| |
| statx |
| ----- |
| |
| Since Linux v5.5, the statx() system call sets STATX_ATTR_VERITY if |
| the file has fs-verity enabled. This can perform better than |
| FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require |
| opening the file, and opening verity files can be expensive. |
| |
| Accessing verity files |
| ====================== |
| |
| Applications can transparently access a verity file just like a |
| non-verity one, with the following exceptions: |
| |
| - Verity files are readonly. They cannot be opened for writing or |
| truncate()d, even if the file mode bits allow it. Attempts to do |
| one of these things will fail with EPERM. However, changes to |
| metadata such as owner, mode, timestamps, and xattrs are still |
| allowed, since these are not measured by fs-verity. Verity files |
| can also still be renamed, deleted, and linked to. |
| |
| - Direct I/O is not supported on verity files. Attempts to use direct |
| I/O on such files will fall back to buffered I/O. |
| |
| - DAX (Direct Access) is not supported on verity files, because this |
| would circumvent the data verification. |
| |
| - Reads of data that doesn't match the verity Merkle tree will fail |
| with EIO (for read()) or SIGBUS (for mmap() reads). |
| |
| - If the sysctl "fs.verity.require_signatures" is set to 1 and the |
| file is not signed by a key in the ".fs-verity" keyring, then |
| opening the file will fail. See `Built-in signature verification`_. |
| |
| Direct access to the Merkle tree is not supported. Therefore, if a |
| verity file is copied, or is backed up and restored, then it will lose |
| its "verity"-ness. fs-verity is primarily meant for files like |
| executables that are managed by a package manager. |
| |
| File digest computation |
| ======================= |
| |
| This section describes how fs-verity hashes the file contents using a |
| Merkle tree to produce the digest which cryptographically identifies |
| the file contents. This algorithm is the same for all filesystems |
| that support fs-verity. |
| |
| Userspace only needs to be aware of this algorithm if it needs to |
| compute fs-verity file digests itself, e.g. in order to sign files. |
| |
| .. _fsverity_merkle_tree: |
| |
| Merkle tree |
| ----------- |
| |
| The file contents are divided into blocks, where the block size is |
| configurable but is usually 4096 bytes. The end of the last block is |
| zero-padded if needed. Each block is then hashed, producing the first |
| level of hashes. Then, the hashes in this first level are grouped |
| into 'blocksize'-byte blocks (zero-padding the ends as needed) and |
| these blocks are hashed, producing the second level of hashes. This |
| proceeds up the tree until only a single block remains. The hash of |
| this block is the "Merkle tree root hash". |
| |
| If the file fits in one block and is nonempty, then the "Merkle tree |
| root hash" is simply the hash of the single data block. If the file |
| is empty, then the "Merkle tree root hash" is all zeroes. |
| |
| The "blocks" here are not necessarily the same as "filesystem blocks". |
| |
| If a salt was specified, then it's zero-padded to the closest multiple |
| of the input size of the hash algorithm's compression function, e.g. |
| 64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is |
| prepended to every data or Merkle tree block that is hashed. |
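| |
| The salt padding rule amounts to rounding up to a multiple of the |
| compression function's input size. A minimal sketch, where |
| ``padded_salt_size()`` is a hypothetical helper name: |
| |
```c
#include <stddef.h>

/*
 * Hypothetical helper: size of the salt after zero-padding.
 * hash_block_size is the input size of the hash's compression function,
 * e.g. 64 for SHA-256 or 128 for SHA-512. An absent salt stays at 0 bytes.
 */
size_t padded_salt_size(size_t salt_size, size_t hash_block_size)
{
    return (salt_size + hash_block_size - 1) / hash_block_size
           * hash_block_size;
}
```
| |
| For example, a 32-byte salt is padded to 64 bytes for SHA-256 and to |
| 128 bytes for SHA-512. |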
| |
| The purpose of the block padding is to cause every hash to be taken |
| over the same amount of data, which simplifies the implementation and |
| keeps open more possibilities for hardware acceleration. The purpose |
| of the salt padding is to make the salting "free" when the salted hash |
| state is precomputed, then imported for each hash. |
| |
| Example: in the recommended configuration of SHA-256 and 4K blocks, |
| 128 hash values fit in each block. Thus, each level of the Merkle |
| tree is approximately 128 times smaller than the previous, and for |
| large files the Merkle tree's size converges to approximately 1/127 of |
| the original file size. However, for small files, the padding is |
| significant, making the space overhead proportionally more. |
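| |
| The level-by-level construction can be turned into a size estimate. |
| The following is a sketch only: ``merkle_tree_size()`` is a |
| hypothetical helper name, and it assumes that a file fitting in a |
| single block needs no tree blocks, as described above. |
| |
```c
#include <stdint.h>

/*
 * Hypothetical helper: total size in bytes of the Merkle tree blocks for
 * a file of file_size bytes. Returns 0 for invalid parameters.
 */
uint64_t merkle_tree_size(uint64_t file_size, uint32_t block_size,
                          uint32_t digest_size)
{
    uint64_t hashes_per_block, blocks, tree_blocks = 0;

    if (block_size == 0 || digest_size == 0 || block_size / digest_size < 2)
        return 0;
    hashes_per_block = block_size / digest_size;

    /* Start with the number of data blocks (last one zero-padded). */
    blocks = (file_size + block_size - 1) / block_size;

    /* Each pass adds one tree level, until a level fits in one block. */
    while (blocks > 1) {
        blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
        tree_blocks += blocks;
    }
    return tree_blocks * block_size;
}
```
| |
| With SHA-256 (32-byte digests) and 4096-byte blocks, a 1 GiB file |
| gives 2048 + 16 + 1 = 2065 tree blocks, i.e. 8458240 bytes, which is |
| close to 1/127 of the file size. |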
| |
| .. _fsverity_descriptor: |
| |
| fs-verity descriptor |
| -------------------- |
| |
| By itself, the Merkle tree root hash is ambiguous. For example, it |
| can't distinguish a large file from a small second file whose data |
| is exactly the top-level hash block of the first file. Ambiguities |
| also arise from the convention of padding to the next block boundary. |
| |
| To solve this problem, the fs-verity file digest is actually computed |
| as a hash of the following structure, which contains the Merkle tree |
| root hash as well as other fields such as the file size:: |
| |
|     struct fsverity_descriptor { |
|         __u8 version;           /* must be 1 */ |
|         __u8 hash_algorithm;    /* Merkle tree hash algorithm */ |
|         __u8 log_blocksize;     /* log2 of size of data and tree blocks */ |
|         __u8 salt_size;         /* size of salt in bytes; 0 if none */ |
|         __le32 __reserved_0x04; /* must be 0 */ |
|         __le64 data_size;       /* size of file the Merkle tree is built over */ |
|         __u8 root_hash[64];     /* Merkle tree root hash */ |
|         __u8 salt[32];          /* salt prepended to each hashed block */ |
|         __u8 __reserved[144];   /* must be 0's */ |
|     }; |
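| |
| For userspace that computes file digests itself, the layout above can |
| be serialized byte by byte. The sketch below is illustrative only: |
| ``build_descriptor()`` and ``DESC_SIZE`` are hypothetical names, and it |
| assumes the structure has no padding (256 bytes in total) and that |
| unused bytes of ``root_hash`` and ``salt`` are zero. Hashing the |
| resulting buffer is omitted, since it requires a cryptographic |
| library. |
| |
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DESC_SIZE 256   /* 1 + 1 + 1 + 1 + 4 + 8 + 64 + 32 + 144 */

/* Hypothetical helper: returns 0 on success, -1 if a length is too large. */
int build_descriptor(uint8_t out[DESC_SIZE], uint8_t hash_algorithm,
                     uint8_t log_blocksize, uint64_t data_size,
                     const uint8_t *root_hash, size_t root_hash_len,
                     const uint8_t *salt, size_t salt_len)
{
    int i;

    if (root_hash_len > 64 || salt_len > 32)
        return -1;
    memset(out, 0, DESC_SIZE);      /* reserved fields must be 0 */
    out[0] = 1;                     /* version */
    out[1] = hash_algorithm;
    out[2] = log_blocksize;         /* e.g. 12 for 4096-byte blocks */
    out[3] = (uint8_t)salt_len;
    /* bytes 4..7: __reserved_0x04, left as 0 */
    for (i = 0; i < 8; i++)         /* data_size, little-endian */
        out[8 + i] = (uint8_t)(data_size >> (8 * i));
    if (root_hash_len)
        memcpy(out + 16, root_hash, root_hash_len);
    if (salt_len)
        memcpy(out + 80, salt, salt_len);
    /* bytes 112..255: __reserved, left as 0 */
    return 0;
}
```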
| |
| Built-in signature verification |
| =============================== |
| |
| CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y adds support for in-kernel |
| verification of fs-verity builtin signatures. |
| |
| **IMPORTANT**! Please take great care before using this feature. |
| It is not the only way to do signatures with fs-verity, and the |
| alternatives (such as userspace signature verification, and IMA |
| appraisal) can be much better. It's also easy to fall into a trap |
| of thinking this feature solves more problems than it actually does. |
| |
| Enabling this option adds the following: |
| |
| 1. At boot time, the kernel creates a keyring named ".fs-verity". The |
| root user can add trusted X.509 certificates to this keyring using |
| the add_key() system call. |
| |
| 2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted |
| detached signature in DER format of the file's fs-verity digest. |
| On success, the ioctl persists the signature alongside the Merkle |
| tree. Then, any time the file is opened, the kernel verifies the |
| file's actual digest against this signature, using the certificates |
| in the ".fs-verity" keyring. |
| |
| 3. A new sysctl "fs.verity.require_signatures" is made available. |
| When set to 1, the kernel requires that all verity files have a |
| correctly signed digest as described in (2). |
| |
| The signature described in (2) must be a signature of the fs-verity |
| file digest in the following format:: |
| |
|     struct fsverity_formatted_digest { |
|         char magic[8]; /* must be "FSVerity" */ |
|         __le16 digest_algorithm; |
|         __le16 digest_size; |
|         __u8 digest[]; |
|     }; |
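| |
| A signing tool needs to produce exactly these bytes before passing |
| them to its signature routine. A minimal sketch of the serialization, |
| where ``format_digest()`` is a hypothetical helper name: |
| |
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write the formatted digest into out.
 * Returns the number of bytes written, or 0 if out is too small.
 */
size_t format_digest(uint8_t *out, size_t out_size, uint16_t digest_algorithm,
                     const uint8_t *digest, uint16_t digest_size)
{
    size_t total = 8 + 2 + 2 + (size_t)digest_size;

    if (out_size < total)
        return 0;
    memcpy(out, "FSVerity", 8);             /* magic, no NUL terminator */
    out[8] = (uint8_t)(digest_algorithm & 0xff);    /* little-endian */
    out[9] = (uint8_t)(digest_algorithm >> 8);
    out[10] = (uint8_t)(digest_size & 0xff);        /* little-endian */
    out[11] = (uint8_t)(digest_size >> 8);
    if (digest_size)
        memcpy(out + 12, digest, digest_size);
    return total;
}
```
| |
| For a SHA-256 file digest this yields 44 bytes: the 12-byte header |
| followed by the 32-byte digest. |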
| |
| That's it. It should be emphasized again that fs-verity builtin |
| signatures are not the only way to do signatures with fs-verity. See |
| `Use cases`_ for an overview of ways in which fs-verity can be used. |
| fs-verity builtin signatures have some major limitations that should |
| be carefully considered before using them: |
| |
| - Builtin signature verification does *not* make the kernel enforce |
| that any files actually have fs-verity enabled. Thus, it is not a |
| complete authentication policy. Currently, if it is used, the only |
| way to complete the authentication policy is for trusted userspace |
| code to explicitly check whether files have fs-verity enabled with a |
| signature before they are accessed. (With |
| fs.verity.require_signatures=1, just checking whether fs-verity is |
| enabled suffices.) But, in this case the trusted userspace code |
| could just store the signature alongside the file and verify it |
| itself using a cryptographic library, instead of using this feature. |
| |
| - A file's builtin signature can only be set at the same time that |
| fs-verity is being enabled on the file. Changing or deleting the |
| builtin signature later requires re-creating the file. |
| |
| - Builtin signature verification uses the same set of public keys for |
| all fs-verity enabled files on the system. Different keys cannot be |
| trusted for different files; each key is all or nothing. |
| |
| - The sysctl fs.verity.require_signatures applies system-wide. |
| Setting it to 1 only works when all users of fs-verity on the system |
| agree that it should be set to 1. This limitation can prevent |
| fs-verity from being used in cases where it would be helpful. |
| |
| - Builtin signature verification can only use signature algorithms |
| that are supported by the kernel. For example, the kernel does not |
| yet support Ed25519, even though this is often the signature |
| algorithm that is recommended for new cryptographic designs. |
| |
| - fs-verity builtin signatures are in PKCS#7 format, and the public |
| keys are in X.509 format. These formats are commonly used, |
| including by some other kernel features (which is why the fs-verity |
| builtin signatures use them), and are very feature rich. |
| Unfortunately, history has shown that code that parses and handles |
| these formats (which are from the 1990s and are based on ASN.1) |
| often has vulnerabilities as a result of their complexity. This |
| complexity is not inherent to the cryptography itself. |
| |
| fs-verity users who do not need advanced features of X.509 and |
| PKCS#7 should strongly consider using simpler formats, such as plain |
| Ed25519 keys and signatures, and verifying signatures in userspace. |
| |
| fs-verity users who choose to use X.509 and PKCS#7 anyway should |
| still consider that verifying those signatures in userspace is more |
| flexible (for other reasons mentioned earlier in this document) and |
| eliminates the need to enable CONFIG_FS_VERITY_BUILTIN_SIGNATURES |
| and its associated increase in kernel attack surface. In some cases |
| it can even be necessary, since advanced X.509 and PKCS#7 features |
| do not always work as intended with the kernel. For example, the |
| kernel does not check X.509 certificate validity times. |
| |
| Note: IMA appraisal, which supports fs-verity, does not use PKCS#7 |
| for its signatures, so it partially avoids the issues discussed |
| here. IMA appraisal does use X.509. |
| |
| Filesystem support |
| ================== |
| |
| fs-verity is supported by several filesystems, described below. The |
| CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity on |
| any of these filesystems. |
| |
| ``include/linux/fsverity.h`` declares the interface between the |
| ``fs/verity/`` support layer and filesystems. Briefly, filesystems |
| must provide an ``fsverity_operations`` structure that provides |
| methods to read and write the verity metadata to a filesystem-specific |
| location, including the Merkle tree blocks and |
| ``fsverity_descriptor``. Filesystems must also call functions in |
| ``fs/verity/`` at certain times, such as when a file is opened or when |
| pages have been read into the pagecache. (See `Verifying data`_.) |
| |
| ext4 |
| ---- |
| |
| ext4 supports fs-verity since Linux v5.4 and e2fsprogs v1.45.2. |
| |
| To create verity files on an ext4 filesystem, the filesystem must have |
| been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on |
| it. "verity" is an RO_COMPAT filesystem feature, so once set, old |
| kernels will only be able to mount the filesystem readonly, and old |
| versions of e2fsck will be unable to check the filesystem. |
| |
| Originally, an ext4 filesystem with the "verity" feature could only be |
| mounted when its block size was equal to the system page size |
| (typically 4096 bytes). In Linux v6.3, this limitation was removed. |
| |
| ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It |
| can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared. |
| |
| ext4 also supports encryption, which can be used simultaneously with |
| fs-verity. In this case, the plaintext data is verified rather than |
| the ciphertext. This is necessary in order to make the fs-verity file |
| digest meaningful, since every file is encrypted differently. |
| |
| ext4 stores the verity metadata (Merkle tree and fsverity_descriptor) |
| past the end of the file, starting at the first 64K boundary beyond |
| i_size. This approach works because (a) verity files are readonly, |
| and (b) pages fully beyond i_size aren't visible to userspace but can |
| be read/written internally by ext4 with only some relatively small |
| changes to ext4. This approach avoids having to depend on the |
| EA_INODE feature and on rearchitecting ext4's xattr support to |
| support paging multi-gigabyte xattrs into memory, and to support |
| encrypting xattrs. Note that the verity metadata *must* be encrypted |
| when the file is, since it contains hashes of the plaintext data. |
| |
| ext4 only allows verity on extent-based files. |
| |
| f2fs |
| ---- |
| |
| f2fs supports fs-verity since Linux v5.4 and f2fs-tools v1.11.0. |
| |
| To create verity files on an f2fs filesystem, the filesystem must have |
| been formatted with ``-O verity``. |
| |
| f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files. |
| It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be |
| cleared. |
| |
| Like ext4, f2fs stores the verity metadata (Merkle tree and |
| fsverity_descriptor) past the end of the file, starting at the first |
| 64K boundary beyond i_size. See explanation for ext4 above. |
| Moreover, f2fs supports at most 4096 bytes of xattr entries per inode, |
| which usually wouldn't be enough for even a single Merkle tree block. |
| |
| f2fs doesn't support enabling verity on files that currently have |
| atomic or volatile writes pending. |
| |
| btrfs |
| ----- |
| |
| btrfs supports fs-verity since Linux v5.15. Verity-enabled inodes are |
| marked with a RO_COMPAT inode flag, and the verity metadata is stored |
| in separate btree items. |
| |
| Implementation details |
| ====================== |
| |
| Verifying data |
| -------------- |
| |
| fs-verity ensures that all reads of a verity file's data are verified, |
| regardless of which syscall is used to do the read (e.g. mmap(), |
| read(), pread()) and regardless of whether it's the first read or a |
| later read (unless the later read can return cached data that was |
| already verified). Below, we describe how filesystems implement this. |
| |
| Pagecache |
| ~~~~~~~~~ |
| |
| For filesystems using Linux's pagecache, the ``->read_folio()`` and |
| ``->readahead()`` methods must be modified to verify folios before |
| they are marked Uptodate. Merely hooking ``->read_iter()`` would be |
| insufficient, since ``->read_iter()`` is not used for memory maps. |
| |
| Therefore, fs/verity/ provides the function fsverity_verify_blocks() |
| which verifies data that has been read into the pagecache of a verity |
| inode. The containing folio must still be locked and not Uptodate, so |
| it's not yet readable by userspace. As needed to do the verification, |
| fsverity_verify_blocks() will call back into the filesystem to read |
| hash blocks via fsverity_operations::read_merkle_tree_page(). |
| |
| fsverity_verify_blocks() returns false if verification failed; in this |
| case, the filesystem must not set the folio Uptodate. Following this, |
| as per the usual Linux pagecache behavior, attempts by userspace to |
| read() from the part of the file containing the folio will fail with |
| EIO, and accesses to the folio within a memory map will raise SIGBUS. |
| |
| In principle, verifying a data block requires verifying the entire |
| path in the Merkle tree from the data block to the root hash. |
| However, for efficiency the filesystem may cache the hash blocks. |
| Therefore, fsverity_verify_blocks() only ascends the tree reading hash |
| blocks until an already-verified hash block is seen. It then verifies |
| the path to that block. |
| |
| This optimization, which is also used by dm-verity, results in |
| excellent sequential read performance. This is because usually (e.g. |
| 127 in 128 times for 4K blocks and SHA-256) the hash block from the |
| bottom level of the tree will already be cached and checked from |
| reading a previous data block. However, random reads perform worse. |
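| |
| The "127 in 128" figure follows from how data blocks map to |
| bottom-level hash blocks. The sketch below is a simplified model, not |
| the kernel's code: the function names are hypothetical, and it looks |
| only at the bottom level of the tree and ignores cache eviction. |
| |
```c
#include <stdint.h>

/* Bottom-level hash block that holds the hash of a given data block. */
uint64_t hash_block_index(uint64_t data_block, uint32_t hashes_per_block)
{
    return data_block / hashes_per_block;
}

/* Position of that hash within its hash block. */
uint32_t hash_slot(uint64_t data_block, uint32_t hashes_per_block)
{
    return (uint32_t)(data_block % hashes_per_block);
}

/*
 * Count how many of n sequential data block reads (blocks 0..n-1) need a
 * bottom-level hash block different from the one used by the previous read.
 */
uint64_t count_new_hash_blocks(uint64_t n, uint32_t hashes_per_block)
{
    uint64_t misses = 0, prev = UINT64_MAX, i;

    for (i = 0; i < n; i++) {
        uint64_t hb = hash_block_index(i, hashes_per_block);

        if (hb != prev) {
            misses++;
            prev = hb;
        }
    }
    return misses;
}
```
| |
| With 128 hashes per block, reading 1280 data blocks in order touches |
| only 10 distinct bottom-level hash blocks, so 1270 of the reads find |
| their hash block already cached in this model. |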
| |
| Block device based filesystems |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Block device based filesystems (e.g. ext4 and f2fs) in Linux also use |
| the pagecache, so the above subsection applies too. However, they |
| also usually read many data blocks from a file at once, grouped into a |
| structure called a "bio". To make it easier for these types of |
| filesystems to support fs-verity, fs/verity/ also provides a function |
| fsverity_verify_bio() which verifies all data blocks in a bio. |
| |
| ext4 and f2fs also support encryption. If a verity file is also |
| encrypted, the data must be decrypted before being verified. To |
| support this, these filesystems allocate a "post-read context" for |
| each bio and store it in ``->bi_private``:: |
| |
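| struct bio_post_read_ctx { |
|     struct bio *bio;            /* bio being post-processed */ |
|     struct work_struct work;    /* work item for running a step */ |
|     unsigned int cur_step;      /* step currently being run */ |
|     unsigned int enabled_steps; /* bitmask of steps this bio needs */ |
| }; |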
| |
| ``enabled_steps`` is a bitmask that specifies whether decryption, |
| verity, or both are enabled. After the bio completes, for each needed |
| postprocessing step the filesystem enqueues the bio_post_read_ctx on a |
| workqueue, and then the workqueue work does the decryption or |
| verification. Finally, folios where no decryption or verity error |
| occurred are marked Uptodate, and the folios are unlocked. |
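| |
| The step sequencing can be modelled in userspace as shown below. This |
| is a simplified sketch, not the ext4 or f2fs code: the steps run |
| synchronously instead of on a workqueue, the step numbering and all |
| ``toy_*`` names are hypothetical, and the outcome of each step is |
| supplied by the caller. |
| |
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical step numbering; bit N of enabled_steps enables step N. */
enum toy_step { TOY_STEP_DECRYPT = 1, TOY_STEP_VERITY, TOY_STEP_MAX };

struct toy_post_read_ctx {
    unsigned int cur_step;
    unsigned int enabled_steps;
    unsigned int steps_run;  /* bitmask of steps actually executed */
    bool decrypt_ok;         /* simulated result of decryption */
    bool verity_ok;          /* simulated result of verification */
    bool uptodate;           /* stand-in for marking folios Uptodate */
};

static void toy_post_read_processing(struct toy_post_read_ctx *ctx)
{
    const unsigned int known = (1U << TOY_STEP_DECRYPT) |
                               (1U << TOY_STEP_VERITY);
    bool ok = true;

    assert((ctx->enabled_steps & ~known) == 0);
    ctx->steps_run = 0;

    /* Decrypt first: the Merkle tree holds hashes of the plaintext. */
    for (ctx->cur_step = TOY_STEP_DECRYPT;
         ok && ctx->cur_step < TOY_STEP_MAX; ctx->cur_step++) {
        if (!(ctx->enabled_steps & (1U << ctx->cur_step)))
            continue;  /* step not needed for this bio */
        ctx->steps_run |= 1U << ctx->cur_step;
        ok = (ctx->cur_step == TOY_STEP_DECRYPT) ? ctx->decrypt_ok
                                                 : ctx->verity_ok;
    }
    /* Only data that passed every enabled step becomes readable. */
    ctx->uptodate = ok;
}
```
| |
| If decryption fails in this model, verification is skipped and the |
| data is never marked up-to-date. |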
| |
| On many filesystems, files can contain holes. Normally, |
| ``->readahead()`` simply zeroes hole blocks and considers the |
| corresponding data to be up-to-date; no bios are issued. To prevent |
| this case from bypassing fs-verity, filesystems use |
| fsverity_verify_blocks() to verify hole blocks. |
| |
| Filesystems also disable direct I/O on verity files, since otherwise |
| direct I/O would bypass fs-verity. |
| |
| Userspace utility |
| ================= |
| |
| This document focuses on the kernel, but a userspace utility for |
| fs-verity can be found at: |
| |
| https://git.kernel.org/pub/scm/fs/fsverity/fsverity-utils.git |
| |
| See the README.md file in the fsverity-utils source tree for details, |
| including examples of setting up fs-verity protected files. |
| |
| Tests |
| ===== |
| |
| To test fs-verity, use xfstests. For example, using `kvm-xfstests |
| <https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_:: |
| |
| kvm-xfstests -c ext4,f2fs,btrfs -g verity |
| |
| FAQ |
| === |
| |
| This section answers frequently asked questions about fs-verity that |
| weren't already directly answered in other parts of this document. |
| |
| :Q: Why isn't fs-verity part of IMA? |
| :A: fs-verity and IMA (Integrity Measurement Architecture) have |
| different focuses. fs-verity is a filesystem-level mechanism for |
| hashing individual files using a Merkle tree. In contrast, IMA |
| specifies a system-wide policy that specifies which files are |
| hashed and what to do with those hashes, such as log them, |
| authenticate them, or add them to a measurement list. |
| |
| IMA supports the fs-verity hashing mechanism as an alternative |
| to full file hashes, for those who want the performance and |
| security benefits of the Merkle tree based hash. However, it |
| doesn't make sense to force all uses of fs-verity to be through |
| IMA. fs-verity already meets many users' needs even as a |
| standalone filesystem feature, and it's testable like other |
| filesystem features, e.g. with xfstests. |
| |
| :Q: Isn't fs-verity useless because the attacker can just modify the |
| hashes in the Merkle tree, which is stored on-disk? |
| :A: To verify the authenticity of an fs-verity file you must verify |
| the authenticity of the "fs-verity file digest", which |
| incorporates the root hash of the Merkle tree. See `Use cases`_. |
| |
| :Q: Isn't fs-verity useless because the attacker can just replace a |
| verity file with a non-verity one? |
| :A: See `Use cases`_. In the initial use case, it's really trusted |
| userspace code that authenticates the files; fs-verity is just a |
| tool to do this job efficiently and securely. The trusted |
| userspace code will consider non-verity files to be inauthentic. |
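| |
| The check performed by such trusted userspace code might look roughly |
| like the sketch below. It assumes the FS_IOC_MEASURE_VERITY ioctl and |
| ``struct fsverity_digest`` from ``<linux/fsverity.h>``; the function |
| name, the buffer size constant, and the way the expected digest is |
| obtained are hypothetical. Any failure, including the file not being |
| a verity file, is treated as "not authentic". |
| |
```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

#define TOY_MAX_DIGEST_SIZE 64  /* enough room for a SHA-512 digest */

/*
 * Hypothetical helper: returns true only if 'path' is a verity file whose
 * fs-verity file digest equals the expected (already authenticated) value.
 */
static bool toy_file_is_authentic(const char *path, uint16_t alg,
                                  const uint8_t *expected,
                                  size_t expected_size)
{
    struct fsverity_digest *d;
    bool ok = false;
    int fd;

    assert(expected_size <= TOY_MAX_DIGEST_SIZE);
    d = calloc(1, sizeof(*d) + TOY_MAX_DIGEST_SIZE);
    if (!d)
        return false;
    d->digest_size = TOY_MAX_DIGEST_SIZE;  /* in: space available */

    fd = open(path, O_RDONLY);
    if (fd >= 0) {
        /* The ioctl fails (e.g. ENODATA) if this is not a verity file. */
        if (ioctl(fd, FS_IOC_MEASURE_VERITY, d) == 0 &&
            d->digest_algorithm == alg &&
            d->digest_size == expected_size &&
            memcmp(d->digest, expected, expected_size) == 0)
            ok = true;
        close(fd);
    }
    free(d);
    return ok;
}
```
| |
| A caller would pass e.g. FS_VERITY_HASH_ALG_SHA256 and a 32-byte |
| digest taken from a signed manifest or another trusted source. |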
| |
| :Q: Why does the Merkle tree need to be stored on-disk? Couldn't you |
| store just the root hash? |
| :A: If the Merkle tree weren't stored on-disk, then you'd have to |
| compute the entire tree when the file is first accessed, even if |
| just one byte is being read. This is a fundamental consequence of |
| how Merkle tree hashing works. To verify a leaf node, you need to |
| verify the whole path to the root hash, including the root node |
| (the thing which the root hash is a hash of). But if the root |
| node isn't stored on-disk, you have to compute it by hashing its |
| children, and so on until you've actually hashed the entire file. |
| |
| That defeats most of the point of doing a Merkle tree-based hash, |
| since if you have to hash the whole file ahead of time anyway, |
| then you could simply do sha256(file) instead. That would be much |
| simpler, and a bit faster too. |
| |
| It's true that an in-memory Merkle tree could still provide the |
| advantage of verification on every read rather than just on the |
| first read. However, it would be inefficient because every time a |
| hash page gets evicted (you can't pin the entire Merkle tree into |
| memory, since it may be very large), in order to restore it you |
| again need to hash everything below it in the tree. This again |
| defeats most of the point of doing a Merkle tree-based hash, since |
| a single block read could trigger re-hashing gigabytes of data. |
| |
| :Q: But couldn't you store just the leaf nodes and compute the rest? |
| :A: See previous answer; this really just moves up one level, since |
| one could alternatively interpret the data blocks as being the |
| leaf nodes of the Merkle tree. It's true that the tree can be |
| computed much faster if the leaf level is stored rather than just |
| the data, but that's only because each level is less than 1% the |
| size of the level below (assuming the recommended settings of |
| SHA-256 and 4K blocks). For the exact same reason, by storing |
| "just the leaf nodes" you'd already be storing over 99% of the |
| tree, so you might as well simply store the whole tree. |
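| |
| The "over 99%" figure can be checked by counting the hash blocks in |
| each level of the tree. The helper below is a hypothetical sketch of |
| that arithmetic, not a kernel function; it assumes that a file which |
| fits in a single block needs no hash blocks at all. |
| |
```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the total number of hash blocks in the
 * Merkle tree and stores the number of bottom-level ("leaf") hash
 * blocks in *leaf_blocks.
 */
static uint64_t toy_merkle_tree_blocks(uint64_t file_size,
                                       unsigned int block_size,
                                       unsigned int digest_size,
                                       uint64_t *leaf_blocks)
{
    uint64_t hashes_per_block, blocks, total = 0;

    assert(digest_size != 0 && block_size >= 2 * digest_size);
    hashes_per_block = block_size / digest_size;
    /* Start from the number of data blocks, rounding up. */
    blocks = (file_size + block_size - 1) / block_size;
    *leaf_blocks = 0;

    /* Each level needs one hash per block of the level below it. */
    while (blocks > 1) {
        blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
        if (total == 0)
            *leaf_blocks = blocks;  /* first level computed is the bottom */
        total += blocks;
    }
    return total;
}
```
| |
| For a 1 GiB file with 4K blocks and SHA-256, this gives levels of |
| 2048, 16, and 1 hash blocks, so the bottom level accounts for 2048 of |
| 2065 blocks, or about 99.2% of the tree. |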
| |
| :Q: Can the Merkle tree be built ahead of time, e.g. distributed as |
| part of a package that is installed to many computers? |
| :A: This isn't currently supported. It was part of the original |
| design, but was removed to simplify the kernel UAPI and because it |
| wasn't a critical use case. Files are usually installed once and |
| used many times, and cryptographic hashing is somewhat fast on |
| most modern processors. |
| |
| :Q: Why doesn't fs-verity support writes? |
| :A: Write support would be very difficult and would require a |
| completely different design, so it's well outside the scope of |
| fs-verity. Write support would require: |
| |
| - A way to maintain consistency between the data and hashes, |
| including all levels of hashes, since corruption after a crash |
| (especially of potentially the entire file!) is unacceptable. |
| The main options for solving this are data journalling, |
| copy-on-write, and log-structured volumes. But it's very hard to |
| retrofit existing filesystems with new consistency mechanisms. |
| Data journalling is available on ext4, but is very slow. |
| |
| - Rebuilding the Merkle tree after every write, which would be |
| extremely inefficient. Alternatively, a different authenticated |
| dictionary structure such as an "authenticated skiplist" could |
| be used. However, this would be far more complex. |
| |
| Compare it to dm-verity vs. dm-integrity. dm-verity is very |
| simple: the kernel just verifies read-only data against a |
| read-only Merkle tree. In contrast, dm-integrity supports writes |
| but is slow, is much more complex, and doesn't actually support |
| full-device authentication since it authenticates each sector |
| independently, i.e. there is no "root hash". It doesn't really |
| make sense for the same device-mapper target to support these two |
| very different cases; the same applies to fs-verity. |
| |
| :Q: Since verity files are immutable, why isn't the immutable bit set? |
| :A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a |
| specific set of semantics which not only make the file contents |
| read-only, but also prevent the file from being deleted, renamed, |
| linked to, or having its owner or mode changed. These extra |
| properties are unwanted for fs-verity, so reusing the immutable |
| bit isn't appropriate. |
| |
| :Q: Why does the API use ioctls instead of setxattr() and getxattr()? |
| :A: Abusing the xattr interface for basically arbitrary syscalls is |
| heavily frowned upon by most of the Linux filesystem developers. |
| An xattr should really just be an xattr on-disk, not an API to |
| e.g. magically trigger construction of a Merkle tree. |
| |
| :Q: Does fs-verity support remote filesystems? |
| :A: So far all filesystems that have implemented fs-verity support are |
| local filesystems, but in principle any filesystem that can store |
| per-file verity metadata can support fs-verity, regardless of |
| whether it's local or remote. Some filesystems may have fewer |
| options of where to store the verity metadata; one possibility is |
| to store it past the end of the file and "hide" it from userspace |
| by manipulating i_size. The data verification functions provided |
| by ``fs/verity/`` also assume that the filesystem uses the Linux |
| pagecache, but both local and remote filesystems normally do so. |
| |
| :Q: Why is anything filesystem-specific at all? Shouldn't fs-verity |
| be implemented entirely at the VFS level? |
| :A: There are many reasons why this is not possible or would be very |
| difficult, including the following: |
| |
| - To prevent bypassing verification, folios must not be marked |
| Uptodate until they've been verified. Currently, each |
| filesystem is responsible for marking folios Uptodate via |
| ``->readahead()``. Therefore, currently it's not possible for |
| the VFS to do the verification on its own. Changing this would |
| require significant changes to the VFS and all filesystems. |
| |
| - It would require defining a filesystem-independent way to store |
| the verity metadata. Extended attributes don't work for this |
| because (a) the Merkle tree may be gigabytes, but many |
| filesystems assume that all xattrs fit into a single 4K |
| filesystem block, and (b) ext4 and f2fs encryption doesn't |
| encrypt xattrs, yet the Merkle tree *must* be encrypted when the |
| file contents are, because it stores hashes of the plaintext |
| file contents. |
| |
| So the verity metadata would have to be stored in an actual |
| file. Using a separate file would be very ugly, since the |
| metadata is fundamentally part of the file to be protected, and |
| it could cause problems where users could delete the real file |
| but not the metadata file or vice versa. On the other hand, |
| having it be in the same file would break applications unless |
| filesystems' notion of i_size were divorced from the VFS's, |
| which would be complex and require changes to all filesystems. |
| |
| - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's |
| transaction mechanism so that either the file ends up with |
| verity enabled, or no changes were made. Allowing intermediate |
| states to occur after a crash may cause problems. |