 ============
 dm-integrity
 ============

 The dm-integrity target emulates a block device that has additional
 per-sector tags that can be used for storing integrity information.

 A general problem with storing integrity tags with every sector is that
 writing the sector and the integrity tag must be atomic - i.e. in case of a
 crash, either both the sector and the integrity tag are written, or neither
 is.

 To guarantee write atomicity, the dm-integrity target uses a journal: it
 writes sector data and integrity tags into the journal, commits the journal
 and then copies the data and integrity tags to their respective locations.
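The journaling sequence above can be illustrated with a small in-memory model. This is a hypothetical Python sketch, not the driver's implementation: the class `ToyJournaledDevice` and its methods are made up for illustration. It only shows why a crash before the commit leaves the final location untouched, while a crash after the commit is repaired by replaying the committed journal entries.

```python
class ToyJournaledDevice:
    """Toy model of journaled writes (hypothetical, not kernel data structures).

    Each sector stores a (data, tag) pair. Writes go into the journal first;
    only committed entries are copied to their final location, so a crash
    never leaves data and tag out of step with each other.
    """

    def __init__(self):
        self.disk = {}        # sector -> (data, tag) at the final location
        self.pending = []     # journal entries written but not yet committed
        self.committed = []   # committed journal entries awaiting the copy

    def write(self, sector, data, tag):
        # Step 1: write data and integrity tag into the journal.
        self.pending.append((sector, data, tag))

    def commit(self):
        # Step 2: commit the journal; the entries now count as written.
        self.committed.extend(self.pending)
        self.pending = []

    def copy_to_final(self):
        # Step 3: copy committed data and tags to their respective locations.
        for sector, data, tag in self.committed:
            self.disk[sector] = (data, tag)
        self.committed = []

    def crash_and_recover(self):
        # Uncommitted entries are lost; committed entries are replayed.
        self.pending = []
        self.copy_to_final()
```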

 The dm-integrity target can be used with the dm-crypt target - in this
 situation, the dm-crypt target creates the integrity data and passes it
 to the dm-integrity target via the bio_integrity_payload attached to the bio.
 In this mode, the dm-crypt and dm-integrity targets provide authenticated
 disk encryption - if an attacker modifies the encrypted device, an I/O
 error is returned instead of random data.

 The dm-integrity target can also be used as a standalone target. In this
 mode, it calculates and verifies the integrity tags internally and can be
 used to detect silent data corruption on the disk or in the I/O path.

 There's an alternate mode of operation where dm-integrity uses a bitmap
 instead of a journal. If a bit in the bitmap is 1, the corresponding
 region's data and integrity tags are not synchronized - if the machine
 crashes, the unsynchronized regions will be recalculated. The bitmap mode
 is faster than the journal mode, because we don't have to write the data
 twice, but it is also less reliable, because if data corruption happens
 when the machine crashes, it may not be detected.
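A minimal sketch of the bookkeeping behind bitmap mode, assuming one bit covers a fixed number of sectors (compare the sectors_per_bit argument below). `ToyDirtyBitmap` is hypothetical; the real driver also has to persist the bitmap, flush it periodically and handle concurrent writes to the same region, none of which this model attempts.

```python
class ToyDirtyBitmap:
    """Toy model of bitmap mode: one bit per region, 1 = data/tags may disagree."""

    def __init__(self, total_sectors, sectors_per_bit):
        self.sectors_per_bit = sectors_per_bit
        nbits = -(-total_sectors // sectors_per_bit)  # ceiling division
        self.bits = [0] * nbits

    def region_of(self, sector):
        return sector // self.sectors_per_bit

    def before_write(self, sector):
        # Mark the region dirty before data and tags are written without
        # synchronization (a real device would persist this bit first).
        self.bits[self.region_of(sector)] = 1

    def after_sync(self, sector):
        # Once data and tags for the region are known to match, clear the bit.
        self.bits[self.region_of(sector)] = 0

    def regions_to_recalculate(self):
        # After a crash, only dirty regions need their tags recalculated.
        return [i for i, bit in enumerate(self.bits) if bit]
```

Because only dirty regions are recalculated, the data is written once instead of twice, which is where the speed advantage over the journal comes from.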

 When loading the target for the first time, the kernel driver will format
 the device, but only if the superblock is filled with zeroes. If the
 superblock is neither valid nor zeroed, the dm-integrity
 target can't be loaded.
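The load-time decision described above can be sketched as follows. The 8-byte magic value `b"integrt\x00"` is an assumption based on our reading of the driver source, and checking only the magic is a simplification: a real superblock is validated in other ways too (for example its version).

```python
SB_MAGIC = b"integrt\x00"  # assumed 8-byte magic at the start of the superblock


def superblock_action(sb: bytes) -> str:
    """Decide what loading the target would do with this superblock.

    Returns "load" for a formatted superblock, "format" for an all-zero one,
    and "refuse" for anything else. Only the magic is checked here.
    """
    if sb.startswith(SB_MAGIC):
        return "load"
    if not any(sb):  # every byte is zero
        return "format"
    return "refuse"
```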

 Accesses to the on-disk metadata area containing checksums (aka tags) are
 buffered using dm-bufio. When an access to any given metadata area
 occurs, each unique metadata area gets its own buffer(s). The buffer size
 is capped at the size of the metadata area, but may be smaller, thereby
 requiring multiple buffers to represent the full metadata area. A smaller
 buffer size will produce a smaller resulting read/write operation to the
 metadata area for small reads/writes. The metadata is still read even in
 a full write to the data covered by a single buffer.
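To make the buffer arithmetic concrete, the following sketch maps a data sector inside one interleaved run to the metadata buffer that holds its tag. It assumes, for illustration only, that tags are packed contiguously, one per data block, from the start of the run's tag area, and it ignores padding; `metadata_buffer_for` is a hypothetical helper, not a kernel function.

```python
SECTOR_SIZE = 512


def metadata_buffer_for(sector_in_run: int, tag_size: int,
                        buffer_sectors: int = 128, block_size: int = 512):
    """Return (buffer index, byte offset in that buffer) for a sector's tag.

    Simplified model: tags are packed contiguously, one per data block,
    from the start of the run's tag area; padding is ignored.
    """
    block_in_run = sector_in_run // (block_size // SECTOR_SIZE)
    tag_offset = block_in_run * tag_size
    buffer_bytes = buffer_sectors * SECTOR_SIZE
    return divmod(tag_offset, buffer_bytes)
```

Under these assumptions, with 4-byte tags and 128-sector buffers, data sectors 0 through 16383 of a run share buffer 0, so a small write anywhere in that range touches only one 64 KiB metadata buffer.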

 To use the target for the first time:

 1. overwrite the superblock with zeroes
 2. load the dm-integrity target with a size of one sector; the kernel
    driver will format the device
 3. unload the dm-integrity target
 4. read the "provided_data_sectors" value from the superblock
 5. load the dm-integrity target with the target size
    "provided_data_sectors"
 6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
    with the size "provided_data_sectors"
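Step 4 requires reading "provided_data_sectors" from the superblock. The sketch below decodes the leading superblock fields with Python's `struct` module. The little-endian field layout (including a log2(blocks per bitmap bit) byte and two padding bytes that the layout section below does not list) and the 8-byte magic are assumptions based on our reading of the driver's `struct superblock`, so verify them against your kernel before relying on this.

```python
import struct

SB_MAGIC = b"integrt\x00"  # assumed on-disk magic value
SECTOR_SIZE = 512
# Assumed little-endian layout of the leading superblock fields: magic,
# version, log2(interleave sectors), tag size, journal sections,
# provided data sectors, flags, log2(sectors per block),
# log2(blocks per bitmap bit), 2 pad bytes, recalculation position.
SB_FORMAT = "<8sBBHIQIBB2xQ"


def parse_superblock(raw: bytes) -> dict:
    """Decode the leading superblock fields from raw bytes."""
    (magic, version, log2_interleave, tag_size, journal_sections,
     provided_data_sectors, flags, log2_sectors_per_block,
     _log2_blocks_per_bitmap_bit, recalc_sector) = struct.unpack_from(SB_FORMAT, raw)
    if magic != SB_MAGIC:
        raise ValueError("not a formatted dm-integrity superblock")
    return {
        "version": version,
        "interleave_sectors": 1 << log2_interleave,
        "integrity_tag_size": tag_size,
        "journal_sections": journal_sections,
        "provided_data_sectors": provided_data_sectors,
        "flags": flags,
        "sectors_per_block": 1 << log2_sectors_per_block,
        "recalc_sector": recalc_sector,
    }


def read_superblock(path: str, reserved_sectors: int = 0) -> dict:
    """Read the 4 KiB superblock assumed to follow the reserved sectors."""
    with open(path, "rb") as dev:
        dev.seek(reserved_sectors * SECTOR_SIZE)
        return parse_superblock(dev.read(4096))
```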


 Target arguments:

 1. the underlying block device

 2. the number of reserved sectors at the beginning of the device - the
    dm-integrity target won't read or write these sectors

 3. the size of the integrity tag (if "-" is used, the size is taken from
    the internal-hash algorithm)

 4. mode:

 	D - direct writes (without journal)
 		in this mode, journaling is
 		not used and data sectors and integrity tags are written
 		separately. In case of a crash, it is possible that the data
 		and integrity tag do not match.
 	J - journaled writes
 		data and integrity tags are written to the
 		journal and atomicity is guaranteed. In case of crash,
 		either both data and tag or none of them are written. The
 		journaled mode roughly halves write throughput because the
 		data have to be written twice.
 	B - bitmap mode - data and metadata are written without any
 		synchronization, the driver maintains a bitmap of dirty
 		regions where data and metadata don't match. This mode can
 		only be used with internal hash.
 	R - recovery mode - in this mode, the journal is not replayed,
 		checksums are not checked and writes to the device are not
 		allowed. This mode is useful for data recovery if the
 		device cannot be activated in any of the other standard
 		modes.

 5. the number of additional arguments
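Putting the positional arguments together, a table line for this target can be assembled as in the sketch below. It assumes the usual device-mapper table form (start sector, length, target name, then target arguments); `integrity_table` and the device path `/dev/sdX` are placeholders for illustration. The same helper covers step 2 of the first-use procedure (a size of 1) and step 5 (a size of provided_data_sectors).

```python
VALID_MODES = {"D", "J", "B", "R"}


def integrity_table(size_sectors, device, reserved_sectors=0, tag_size="-",
                    mode="J", extra_args=()):
    """Build a one-line device-mapper table for the integrity target."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}")
    extra = [str(arg) for arg in extra_args]
    fields = [
        "0",                    # logical start sector of this table entry
        str(size_sectors),      # length in sectors
        "integrity",            # target name
        device,                 # 1. underlying block device
        str(reserved_sectors),  # 2. reserved sectors
        str(tag_size),          # 3. tag size, or "-" to take it from the hash
        mode,                   # 4. mode
        str(len(extra)),        # 5. number of additional arguments
        *extra,
    ]
    return " ".join(fields)
```

The resulting string would typically be passed to `dmsetup create <name> --table "<line>"`.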

 Additional arguments:

 journal_sectors:number
 	The size of the journal. This argument is used only when formatting the
 	device. If the device is already formatted, the value from the
 	superblock is used.

 interleave_sectors:number (default 32768)
 	The number of interleaved sectors. This value is rounded down to
 	a power of two. If the device is already formatted, the value from
 	the superblock is used.

 meta_device:device
 	Don't interleave the data and metadata on the device. Use a
 	separate device for metadata.

 buffer_sectors:number (default 128)
 	The number of sectors in one metadata buffer. The value is rounded
 	down to a power of two.

 journal_watermark:number (default 50)
 	The journal watermark in percent. When the size of the journal
 	exceeds this watermark, the thread that flushes the journal will
 	be started.

 commit_time:number (default 10000)
 	Commit time in milliseconds. When this time passes, the journal is
 	written. The journal is also written immediately when a FLUSH
 	request is received.

 internal_hash:algorithm(:key)	(the key is optional)
 	Use internal hash or crc.
 	When this argument is used, the dm-integrity target won't accept
 	integrity tags from the upper target, but it will automatically
 	generate and verify the integrity tags.

 	If you use a crc algorithm (such as crc32), the integrity target
 	will protect the data against accidental corruption.
 	If you use an hmac algorithm (for example
 	"hmac(sha256):0123456789abcdef"), it will provide
 	cryptographic authentication of the data without encryption.

 	When this argument is not used, the integrity tags are accepted
 	from an upper layer target, such as dm-crypt. The upper layer
 	target should check the validity of the integrity tags.

 recalculate
 	Recalculate the integrity tags automatically. It is only valid
 	when using internal hash.

 journal_crypt:algorithm(:key)	(the key is optional)
 	Encrypt the journal using the given algorithm to make sure that an
 	attacker can't read the journal. You can use a block cipher here
 	(such as "cbc(aes)") or a stream cipher (for example "chacha20"
 	or "ctr(aes)").

 	The journal contains the history of the last writes to the block
 	device; an attacker reading the journal could see the last sector
 	numbers that were written. From the sector numbers, the attacker can
 	infer the size of files that were written. To protect against this
 	situation, you can encrypt the journal.

 journal_mac:algorithm(:key)	(the key is optional)
 	Protect sector numbers in the journal from accidental or malicious
 	modification. To protect against accidental modification, use a
 	crc algorithm; to protect against malicious modification, use an
 	hmac algorithm with a key.

 	This option is not needed when using internal-hash because in this
 	mode, the integrity of journal entries is checked when replaying
 	the journal. Thus, a modified sector number would be detected at
 	this stage.

 block_size:number (default 512)
 	The size of a data block in bytes. The larger the block size the
 	less overhead there is for per-block integrity metadata.
 	Supported values are 512, 1024, 2048 and 4096 bytes.

 sectors_per_bit:number
 	In the bitmap mode, this parameter specifies the number of
 	512-byte sectors that correspond to one bitmap bit.

 bitmap_flush_interval:number
 	The bitmap flush interval in milliseconds. The metadata buffers
 	are synchronized when this interval expires.

 allow_discards
 	Allow block discard requests (a.k.a. TRIM) for the integrity device.
 	Discards are only allowed to devices using internal hash.

 fix_padding
 	Use a smaller padding of the tag area that is more
 	space-efficient. If this option is not present, large padding is
 	used - that is for compatibility with older kernels.

 fix_hmac
 	Improve security of internal_hash and journal_mac:

 	- the section number is mixed into the mac, so that an attacker can't
 	  copy sectors from one journal section to another journal section
 	- the superblock is protected by journal_mac
 	- a 16-byte salt stored in the superblock is mixed into the mac, so
 	  that an attacker can't detect that two disks have the same hmac
 	  key and can't move sectors from one disk to another

 legacy_recalculate
 	Allow recalculating of volumes with HMAC keys. This is disabled by
 	default for security reasons - an attacker could modify the volume,
 	set recalc_sector to zero, and the kernel would not detect the
 	modification.
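Several of the rules above (power-of-two rounding, the supported block sizes, and the options that only make sense with internal_hash) can be collected into one check. This is a hypothetical userspace helper that mirrors only the rules stated in this document, with options given as a dictionary keyed by argument name; the kernel performs its own validation, which may be stricter.

```python
SUPPORTED_BLOCK_SIZES = {512, 1024, 2048, 4096}
NEED_INTERNAL_HASH = ("recalculate", "allow_discards")


def round_down_pow2(n: int) -> int:
    """Largest power of two that is <= n (n must be positive)."""
    if n < 1:
        raise ValueError("value must be positive")
    return 1 << (n.bit_length() - 1)


def normalize_options(mode: str, opts: dict) -> dict:
    """Apply the rounding and compatibility rules listed above.

    opts maps argument names to values; flag arguments map to None.
    """
    out = dict(opts)
    for name in ("interleave_sectors", "buffer_sectors"):
        if name in out:
            out[name] = round_down_pow2(int(out[name]))
    if "block_size" in out and int(out["block_size"]) not in SUPPORTED_BLOCK_SIZES:
        raise ValueError(f"unsupported block_size {out['block_size']}")
    needs_hash = [name for name in NEED_INTERNAL_HASH if name in out]
    if mode == "B":
        needs_hash.append("bitmap mode")
    if needs_hash and "internal_hash" not in out:
        raise ValueError(f"{', '.join(needs_hash)} requires internal_hash")
    return out
```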

 The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
 allow_discards can be changed when reloading the target (load an inactive
 table and swap the tables with suspend and resume). The other arguments
 should not be changed when reloading the target because the layout of disk
 data depends on them and the reloaded target would be non-functional.
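Before swapping in a reloaded table, it can help to compare the old and new arguments against the list above. The sketch assumes both tables have already been parsed into dictionaries keyed by argument name, with the positional mode stored under the hypothetical key "mode".

```python
# Arguments this document lists as safe to change on reload.
RELOADABLE = {"mode", "buffer_sectors", "journal_watermark", "commit_time",
              "allow_discards"}


def unsafe_reload_changes(old: dict, new: dict) -> set:
    """Return the argument names whose change could break the on-disk layout."""
    changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
    unsafe = changed - RELOADABLE
    # Only switching between the D and J journal modes is listed as safe.
    if "mode" in changed and not {old.get("mode"), new.get("mode")} <= {"D", "J"}:
        unsafe.add("mode")
    return unsafe
```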

 For example, on a device using the default interleave_sectors of 32768, a
 block_size of 512, and an internal_hash of crc32c with a tag size of 4
 bytes, it will take 128 KiB of tags to track a full data area, requiring
 256 sectors of metadata per data area. With the default buffer_sectors of
 128, that means there will be 2 buffers per metadata area, or 2 buffers
 per 16 MiB of data.
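The arithmetic in this example can be checked with a few lines of code. The sketch ignores tag-area padding (see fix_padding) and assumes one tag per data block, so treat it as an approximation rather than an exact layout calculator.

```python
SECTOR_SIZE = 512


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def metadata_buffer_stats(interleave_sectors=32768, block_size=512,
                          tag_size=4, buffer_sectors=128):
    """Reproduce the buffer arithmetic for one data area (padding ignored)."""
    data_bytes = interleave_sectors * SECTOR_SIZE
    blocks = data_bytes // block_size              # one tag per data block
    tag_bytes = blocks * tag_size
    metadata_sectors = ceil_div(tag_bytes, SECTOR_SIZE)
    return {
        "tag_KiB": tag_bytes // 1024,
        "metadata_sectors": metadata_sectors,
        "buffers_per_area": ceil_div(metadata_sectors, buffer_sectors),
        "data_MiB_per_area": data_bytes // (1024 * 1024),
    }
```

With block_size=4096 and the other defaults, the same function gives 32 metadata sectors and a single buffer per area, which illustrates the lower per-block overhead mentioned under block_size.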

 Status line:

 1. the number of integrity mismatches
 2. provided data sectors - that is the number of sectors that the user
    could use
 3. the current recalculating position (or '-' if we didn't recalculate)
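A status line can be split into these three fields as sketched below. The sketch assumes the line comes from a tool such as `dmsetup status`, which prefixes the target's own fields with the table entry's start sector, length and target name; `parse_integrity_status` is a hypothetical helper.

```python
def parse_integrity_status(line: str) -> dict:
    """Parse one status line of the form
    "<start> <length> integrity <mismatches> <provided> <recalc|->"."""
    fields = line.split()
    if len(fields) < 6 or fields[2] != "integrity":
        raise ValueError(f"not an integrity status line: {line!r}")
    mismatches, provided, recalc = fields[3:6]
    return {
        "mismatches": int(mismatches),
        "provided_data_sectors": int(provided),
        # '-' means no recalculation position is reported.
        "recalc_position": None if recalc == "-" else int(recalc),
    }
```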


 The layout of the formatted block device:

 * reserved sectors
     (they are not used by this target; they can be used for
     storing LUKS metadata or for other purposes), the size of the reserved
     area is specified in the target arguments

 * superblock (4kiB)
 	* magic string - identifies that the device was formatted
 	* version
 	* log2(interleave sectors)
 	* integrity tag size
 	* the number of journal sections
 	* provided data sectors - the number of sectors that this target
 	  provides (i.e. the size of the device minus the size of all
 	  metadata and padding). The user of this target should not send
 	  bios that access data beyond the "provided data sectors" limit.
 	* flags
 	    SB_FLAG_HAVE_JOURNAL_MAC
 		- a flag is set if journal_mac is used
 	    SB_FLAG_RECALCULATING
 		- recalculating is in progress
 	    SB_FLAG_DIRTY_BITMAP
 		- journal area contains the bitmap of dirty
 		  blocks
 	* log2(sectors per block)
 	* a position where recalculating finished
 * journal
 	The journal is divided into sections, each section contains:

 	* metadata area (4kiB), it contains journal entries

 	  - every journal entry contains:

 		* logical sector (specifies where the data and tag should
 		  be written)
 		* last 8 bytes of data
 		* integrity tag (the size is specified in the superblock)

 	  - every metadata sector ends with

 		* mac (8-bytes), all the macs in 8 metadata sectors form a
 		  64-byte value. It is used to store hmac of sector
 		  numbers in the journal section, to protect against a
 		  possibility that the attacker tampers with sector
 		  numbers in the journal.
 		* commit id

 	* data area (the size is variable; it depends on how many journal
 	  entries fit into the metadata area)

 	    - every sector in the data area contains:

 		* data (504 bytes of data, the last 8 bytes are stored in
 		  the journal entry)
 		* commit id

 	To test if the whole journal section was written correctly, every
 	512-byte sector of the journal ends with an 8-byte commit id. If the
 	commit id matches on all sectors in a journal section, then it is
 	assumed that the section was written correctly. If the commit id
 	doesn't match, the section was written partially and it should not
 	be replayed.

 * one or more runs of interleaved tags and data.
     Each run contains:

 	* tag area - it contains integrity tags. There is one tag for each
 	  sector in the data area. The size of this area is always 4KiB or
 	  greater.
 	* data area - it contains data sectors. The number of data sectors
 	  in one run must be a power of two. log2 of this value is stored
 	  in the superblock.
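The replay rule for journal sections can be sketched as a check over the raw bytes of one section. For simplicity this compares every 512-byte sector against a single expected commit id, as the description above states; how the driver derives the expected value is not modeled here, and `section_fully_written` is a hypothetical name.

```python
SECTOR_SIZE = 512
COMMIT_ID_SIZE = 8


def section_fully_written(section: bytes, commit_id: bytes) -> bool:
    """True if every 512-byte sector of the section ends with commit_id."""
    if len(commit_id) != COMMIT_ID_SIZE:
        raise ValueError("commit id must be 8 bytes")
    if not section or len(section) % SECTOR_SIZE:
        raise ValueError("section must be a non-empty multiple of 512 bytes")
    return all(
        section[end - COMMIT_ID_SIZE:end] == commit_id
        for end in range(SECTOR_SIZE, len(section) + 1, SECTOR_SIZE)
    )
```

A section for which this returns False was only partially written and, per the rule above, would be skipped during replay.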