.. SPDX-License-Identifier: GPL-2.0

============
Fiemap Ioctl
============

The fiemap ioctl is an efficient method for userspace to get file
extent mappings. Instead of block-by-block mapping (such as bmap), fiemap
returns a list of extents.


Request Basics
--------------

A fiemap request is encoded within struct fiemap::

  struct fiemap {
        __u64 fm_start;          /* logical offset (inclusive) at
                                  * which to start mapping (in) */
        __u64 fm_length;         /* logical length of mapping which
                                  * userspace cares about (in) */
        __u32 fm_flags;          /* FIEMAP_FLAG_* flags for request (in/out) */
        __u32 fm_mapped_extents; /* number of extents that were
                                  * mapped (out) */
        __u32 fm_extent_count;   /* size of fm_extents array (in) */
        __u32 fm_reserved;
        struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
  };


fm_start and fm_length specify the logical range within the file
which the process would like mappings for. Returned extents mirror
those on disk; that is, the logical offset of the first returned extent
may start before fm_start, and the range covered by the last returned
extent may end after fm_start + fm_length. All offsets and lengths are
in bytes.
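
Because returned extents mirror the on-disk layout rather than the request, a caller that only cares about the requested window has to trim them itself. A minimal sketch, assuming a hypothetical helper ``fiemap_clip_extent()`` (not part of any kernel or libc API) and a request whose end does not overflow a 64-bit offset:

```c
#include <stdbool.h>
#include <linux/fiemap.h>   /* struct fiemap_extent, __u64 */

/*
 * Hypothetical helper: trim a returned extent to the requested range
 * [start, start + len).  Returns false if the two do not overlap.
 * Assumes start + len does not overflow a __u64.
 */
static bool fiemap_clip_extent(const struct fiemap_extent *fe,
                               __u64 start, __u64 len,
                               __u64 *out_logical, __u64 *out_length)
{
        __u64 req_end = start + len;
        __u64 ext_end = fe->fe_logical + fe->fe_length;
        __u64 lo = fe->fe_logical > start ? fe->fe_logical : start;
        __u64 hi = ext_end < req_end ? ext_end : req_end;

        if (lo >= hi)
                return false;   /* extent lies entirely outside the request */

        *out_logical = lo;
        *out_length = hi - lo;
        return true;
}
```

Only the logical range is trimmed here; fe_physical is left untouched.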

Certain flags to modify the way in which mappings are looked up can be
set in fm_flags. If the kernel doesn't understand some particular
flags, the ioctl will fail with EBADR and the contents of fm_flags will
contain the set of flags which caused the error. If the kernel is
compatible with all flags passed, the contents of fm_flags will be
unmodified. It is up to userspace to determine whether rejection of a
particular flag is fatal to its operation. This scheme is intended to
allow the fiemap interface to grow in the future without losing
compatibility with old software.
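
The compatibility scheme above leaves the retry policy to the caller. The sketch below shows one possible policy; the helper name ``fiemap_flags_to_retry()`` and the caller-defined ``required`` mask are assumptions for illustration, not part of the interface:

```c
#include <errno.h>
#include <linux/fiemap.h>   /* FIEMAP_FLAG_SYNC, FIEMAP_FLAG_XATTR, __u32 */

/*
 * Hypothetical policy helper, called after FS_IOC_FIEMAP failed.
 *   err       - the errno value from the failed ioctl()
 *   requested - the flags that were placed in fm_flags
 *   rejected  - the value the kernel left in fm_flags
 *   required  - flags the caller cannot do without
 * Returns the flags to retry with, or -1 if retrying is pointless.
 */
static long long fiemap_flags_to_retry(int err, __u32 requested,
                                       __u32 rejected, __u32 required)
{
        if (err != EBADR)
                return -1;      /* failure unrelated to flag support */
        if (rejected & required)
                return -1;      /* an essential flag is not supported */
        return (long long)(requested & ~rejected);
}
```

On a non-negative result the caller would store it back into fm_flags, reset fm_mapped_extents, and issue the ioctl again.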

fm_extent_count specifies the number of elements in the fm_extents[] array
that can be used to return extents. If fm_extent_count is zero, then the
fm_extents[] array is ignored (no extents will be returned), and the
fm_mapped_extents count will hold the number of extents needed in
fm_extents[] to hold the file's current mapping. Note that there is
nothing to prevent the file from changing between calls to FIEMAP.
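
The zero-count query suggests a two-pass pattern: ask how many extents are needed, allocate, then map. A sketch, assuming the caller wants the whole file and is happy to request FIEMAP_FLAG_SYNC; the function names are hypothetical. Because the file can change between the two calls, the second result may still be incomplete and should be checked for FIEMAP_EXTENT_LAST.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_MAX_OFFSET, ... */

/* Hypothetical helper: bytes needed for a struct fiemap plus 'count' extents. */
static size_t fiemap_buf_size(__u32 count)
{
        return sizeof(struct fiemap) + (size_t)count * sizeof(struct fiemap_extent);
}

/* Hypothetical helper: prepare a whole-file request holding 'count' extents. */
static void fiemap_init(struct fiemap *fm, __u32 count)
{
        memset(fm, 0, fiemap_buf_size(count));
        fm->fm_start = 0;
        fm->fm_length = FIEMAP_MAX_OFFSET;   /* up to the end of the file */
        fm->fm_flags = FIEMAP_FLAG_SYNC;
        fm->fm_extent_count = count;         /* 0 means "only count extents" */
}

/*
 * Hypothetical two-pass query.  Returns a malloc()ed struct fiemap
 * describing 'fd', or NULL with errno set.  The caller frees it.
 */
static struct fiemap *fiemap_query_all(int fd)
{
        struct fiemap probe;
        struct fiemap *fm;
        __u32 count;

        fiemap_init(&probe, 0);                      /* pass 1: count */
        if (ioctl(fd, FS_IOC_FIEMAP, &probe) < 0)
                return NULL;

        count = probe.fm_mapped_extents;
        fm = malloc(fiemap_buf_size(count));
        if (!fm)
                return NULL;
        fiemap_init(fm, count);                      /* pass 2: map */
        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                int saved = errno;
                free(fm);
                errno = saved;
                return NULL;
        }
        return fm;
}
```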

The following flags can be set in fm_flags:

FIEMAP_FLAG_SYNC
  If this flag is set, the kernel will sync the file before mapping extents.

FIEMAP_FLAG_XATTR
  If this flag is set, the extents returned will describe the inode's
  extended attribute lookup tree, instead of its data tree.


Extent Mapping
--------------

Extent information is returned within the embedded fm_extents array
which userspace must allocate along with the fiemap structure. The
number of elements in the fm_extents[] array should be passed via
fm_extent_count. The number of extents mapped by the kernel will be
returned via fm_mapped_extents. If the number of fiemap_extents
allocated is less than would be required to map the requested range,
the maximum number of extents that can be mapped in the fm_extents[]
array will be returned and fm_mapped_extents will be equal to
fm_extent_count. In that case, the last extent in the array will not
complete the requested range and will not have the FIEMAP_EXTENT_LAST
flag set (see the extent flags described below).
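
When the array fills up before FIEMAP_EXTENT_LAST is seen, the caller can issue another request that starts where the last returned extent ends. A sketch, assuming a batch of 32 extents per call (an arbitrary choice) and hypothetical function names:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>

#define EXTENTS_PER_CALL 32 /* arbitrary batch size for this sketch */

/*
 * Hypothetical helper: returns 1 and stores the next fm_start in *next
 * if the array filled up before FIEMAP_EXTENT_LAST was seen, else 0.
 */
static int fiemap_next_start(const struct fiemap *fm, __u64 *next)
{
        const struct fiemap_extent *last;

        if (fm->fm_mapped_extents == 0)
                return 0;
        last = &fm->fm_extents[fm->fm_mapped_extents - 1];
        if (last->fe_flags & FIEMAP_EXTENT_LAST)
                return 0;       /* no more extents in the file */
        if (fm->fm_mapped_extents < fm->fm_extent_count)
                return 0;       /* requested range ended before the array filled */
        *next = last->fe_logical + last->fe_length;
        return 1;
}

/* Hypothetical example: count every extent of 'fd', batch by batch. */
static long fiemap_count_extents(int fd)
{
        size_t size = sizeof(struct fiemap) +
                      EXTENTS_PER_CALL * sizeof(struct fiemap_extent);
        struct fiemap *fm = malloc(size);
        __u64 start = 0;
        long total = 0;

        if (!fm)
                return -1;
        for (;;) {
                memset(fm, 0, size);
                fm->fm_start = start;
                fm->fm_length = FIEMAP_MAX_OFFSET - start;  /* keep the end fixed */
                fm->fm_extent_count = EXTENTS_PER_CALL;
                if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                        total = -1;
                        break;
                }
                total += fm->fm_mapped_extents;
                if (!fiemap_next_start(fm, &start))
                        break;
        }
        free(fm);
        return total;
}
```

Each follow-up request keeps fm_start + fm_length pinned at FIEMAP_MAX_OFFSET, so the end of the original range does not move as fm_start advances.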

Each extent is described by a single fiemap_extent structure as
returned in fm_extents::

  struct fiemap_extent {
        __u64 fe_logical;  /* logical offset in bytes for the start of
                            * the extent */
        __u64 fe_physical; /* physical offset in bytes for the start
                            * of the extent */
        __u64 fe_length;   /* length in bytes for the extent */
        __u64 fe_reserved64[2];
        __u32 fe_flags;    /* FIEMAP_EXTENT_* flags for this extent */
        __u32 fe_reserved[3];
  };

All offsets and lengths are in bytes and mirror those on disk. It is valid
for an extent's logical offset to start before the request or its logical
length to extend past the request. Unless FIEMAP_EXTENT_NOT_ALIGNED is
returned, fe_logical, fe_physical, and fe_length will be aligned to the
block size of the file system. With the exception of extents flagged as
FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.

The fe_flags field contains flags which describe the extent returned.
A special flag, FIEMAP_EXTENT_LAST, is always set on the last extent in
the file so that the process making fiemap calls can determine when no
more extents are available, without having to call the ioctl again.

Some flags are intentionally vague and will always be set in the
presence of other more specific flags. This way a program looking for
a general property does not have to know all existing and future flags
which imply that property.

For example, if FIEMAP_EXTENT_DATA_INLINE or FIEMAP_EXTENT_DATA_TAIL
are set, FIEMAP_EXTENT_NOT_ALIGNED will also be set. A program looking
for inline or tail-packed data can key on the specific flag. Software
which simply wants to avoid operating on non-aligned extents, however,
can just key on FIEMAP_EXTENT_NOT_ALIGNED, and not have to worry about
all present and future flags which might imply unaligned data. Note
that the opposite is not true: it would be valid for
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
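
In code, the difference between keying on the general flag and keying on the specific one might look like this (a sketch; both helper names are hypothetical):

```c
#include <stdbool.h>
#include <linux/fiemap.h>

/*
 * Hypothetical helper for software that only cares about alignment.
 * Keying on the general flag also covers FIEMAP_EXTENT_DATA_INLINE,
 * FIEMAP_EXTENT_DATA_TAIL and any future flag implying unaligned data.
 */
static bool extent_is_aligned(const struct fiemap_extent *fe)
{
        return !(fe->fe_flags & FIEMAP_EXTENT_NOT_ALIGNED);
}

/* Hypothetical helper for software that specifically wants tail-packed data. */
static bool extent_is_tail_packed(const struct fiemap_extent *fe)
{
        return fe->fe_flags & FIEMAP_EXTENT_DATA_TAIL;
}
```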

FIEMAP_EXTENT_LAST
  This is generally the last extent in the file. A mapping attempt past
  this extent may return nothing. Some implementations set this flag to
  indicate this extent is the last one in the range queried by the user
  (via fiemap->fm_length).

FIEMAP_EXTENT_UNKNOWN
  The location of this extent is currently unknown. This may indicate
  the data is stored on an inaccessible volume or that no storage has
  been allocated for the file yet.

FIEMAP_EXTENT_DELALLOC
  This will also set FIEMAP_EXTENT_UNKNOWN.

  Delayed allocation: while there is data for this extent, its
  physical location has not been allocated yet.

FIEMAP_EXTENT_ENCODED
  This extent does not consist of plain filesystem blocks but is
  encoded (e.g. encrypted or compressed). Reading the data in this
  extent via I/O to the block device will have undefined results.

  Note that it is *always* undefined to try to update the data
  in-place by writing to the indicated location without the
  assistance of the filesystem, or to access the data using the
  information returned by the FIEMAP interface while the filesystem
  is mounted. In other words, user applications may only read the
  extent data via I/O to the block device while the filesystem is
  unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
  clear; user applications must not try reading or writing to the
  filesystem via the block device under any other circumstances.

FIEMAP_EXTENT_DATA_ENCRYPTED
  This will also set FIEMAP_EXTENT_ENCODED.

  The data in this extent has been encrypted by the file system.

FIEMAP_EXTENT_NOT_ALIGNED
  Extent offsets and lengths are not guaranteed to be block aligned.

FIEMAP_EXTENT_DATA_INLINE
  This will also set FIEMAP_EXTENT_NOT_ALIGNED.

  Data is located within a metadata block.

FIEMAP_EXTENT_DATA_TAIL
  This will also set FIEMAP_EXTENT_NOT_ALIGNED.

  Data is packed into a block with data from other files.

FIEMAP_EXTENT_UNWRITTEN
  Unwritten extent: the extent is allocated but its data has not been
  initialized. This indicates the extent's data will be all zero if read
  through the filesystem but the contents are undefined if read directly
  from the device.

FIEMAP_EXTENT_MERGED
  This will be set when a file does not support extents, i.e., it uses a
  block-based addressing scheme. Since returning an extent for each block
  back to userspace would be highly inefficient, the kernel will try to
  merge most adjacent blocks into 'extents'.
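
A diagnostic tool will often want to print these flags. The sketch below decodes fe_flags into a comma-separated list using the constants from ``<linux/fiemap.h>``; the helper names and the short labels are assumptions, and any bit not in the table (for example one defined by a newer kernel) is printed in hex rather than dropped:

```c
#include <stdio.h>
#include <string.h>
#include <linux/fiemap.h>

/* Hypothetical label table for the extent flags described above. */
static const struct {
        __u32 flag;
        const char *name;
} extent_flag_names[] = {
        { FIEMAP_EXTENT_LAST,           "last" },
        { FIEMAP_EXTENT_UNKNOWN,        "unknown" },
        { FIEMAP_EXTENT_DELALLOC,       "delalloc" },
        { FIEMAP_EXTENT_ENCODED,        "encoded" },
        { FIEMAP_EXTENT_DATA_ENCRYPTED, "encrypted" },
        { FIEMAP_EXTENT_NOT_ALIGNED,    "not_aligned" },
        { FIEMAP_EXTENT_DATA_INLINE,    "inline" },
        { FIEMAP_EXTENT_DATA_TAIL,      "tail" },
        { FIEMAP_EXTENT_UNWRITTEN,      "unwritten" },
        { FIEMAP_EXTENT_MERGED,         "merged" },
};

/* Append 's' to the NUL-terminated 'buf', comma-separated, truncating safely. */
static void append_name(char *buf, size_t bufsz, const char *s)
{
        size_t len = strlen(buf);

        if (len > 0 && len + 1 < bufsz)
                strncat(buf, ",", bufsz - len - 1);
        len = strlen(buf);
        if (len + 1 < bufsz)
                strncat(buf, s, bufsz - len - 1);
}

/* Hypothetical decoder: writes the names set in 'flags' into 'buf' (bufsz >= 1). */
static void fiemap_flags_str(__u32 flags, char *buf, size_t bufsz)
{
        size_t i;

        buf[0] = '\0';
        for (i = 0; i < sizeof(extent_flag_names) / sizeof(extent_flag_names[0]); i++) {
                if (flags & extent_flag_names[i].flag) {
                        append_name(buf, bufsz, extent_flag_names[i].name);
                        flags &= ~extent_flag_names[i].flag;
                }
        }
        if (flags) {            /* bits the table does not know about */
                char hex[16];

                snprintf(hex, sizeof(hex), "0x%x", flags);
                append_name(buf, bufsz, hex);
        }
}
```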


VFS -> File System Implementation
---------------------------------

File systems wishing to support fiemap must implement a ->fiemap callback on
their inode_operations structure. The fs ->fiemap call is responsible for
defining its set of supported fiemap flags, and calling a helper function on
each discovered extent::

  struct inode_operations {
        ...

        int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
                      u64 len);
        ...
  };

->fiemap is passed struct fiemap_extent_info which describes the
fiemap request::

  struct fiemap_extent_info {
        unsigned int fi_flags;                  /* Flags as passed from user */
        unsigned int fi_extents_mapped;         /* Number of mapped extents */
        unsigned int fi_extents_max;            /* Size of fiemap_extent array */
        struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
  };

It is intended that the file system should not need to access any of this
structure directly. Filesystem handlers should be tolerant of signals and
return -EINTR once a fatal signal has been received.


Flag checking should be done at the beginning of the ->fiemap callback via the
fiemap_prep() helper::

  int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
                  u64 start, u64 *len, u32 supported_flags);

The fieinfo structure should be passed in as received from ioctl_fiemap().
The set of fiemap flags which the fs understands should be passed via
supported_flags. If fiemap_prep() finds invalid user flags, it will place
the bad values in fieinfo->fi_flags and return -EBADR. If the file system
gets -EBADR from fiemap_prep(), it should immediately exit, returning that
error back to ioctl_fiemap(). Additionally, the range is validated against
the maximum file size supported by the file system, and *len may be reduced
accordingly.
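
The flag check and range trimming described for fiemap_prep() can be illustrated with a simplified userspace model. This is not the kernel's implementation: the names ``model_fiemap_prep()`` and ``struct prep_model_info``, the ``maxbytes`` parameter standing in for the file system's maximum file size, and the error codes chosen for an empty or out-of-range request are assumptions, and the writeback requested by FIEMAP_FLAG_SYNC is not modelled.

```c
#include <errno.h>
#include <stdint.h>
#include <linux/fiemap.h>   /* FIEMAP_FLAG_* */

/* Hypothetical stand-in for the one fiemap_extent_info field used here. */
struct prep_model_info {
        unsigned int fi_flags;
};

/* Hypothetical, simplified model of the checks described for fiemap_prep(). */
static int model_fiemap_prep(struct prep_model_info *fieinfo, uint64_t start,
                             uint64_t *len, uint32_t supported_flags,
                             uint64_t maxbytes)
{
        uint32_t incompat;

        if (*len == 0)
                return -EINVAL;         /* assumption: empty range rejected */
        if (start >= maxbytes)
                return -EFBIG;          /* assumption: start beyond max size */

        /* Trim the range so that start + *len does not exceed maxbytes. */
        if (*len > maxbytes || maxbytes - *len < start)
                *len = maxbytes - start;

        incompat = fieinfo->fi_flags & ~supported_flags;
        if (incompat) {
                fieinfo->fi_flags = incompat;   /* report the offending flags */
                return -EBADR;
        }
        return 0;
}
```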


For each extent in the request range, the file system should call
the helper function, fiemap_fill_next_extent()::

  int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
                              u64 phys, u64 len, u32 flags);

fiemap_fill_next_extent() will use the passed values to populate the
next free extent in the fm_extents array. 'General' extent flags will
automatically be set from specific flags on behalf of the calling file
system so that the userspace API is not broken.

fiemap_fill_next_extent() returns 0 on success, and 1 when the
user-supplied fm_extents array is full. If an error is encountered
while copying the extent to user memory, -EFAULT will be returned.
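
Putting the two helpers together, a file system's ->fiemap callback walks its own extent records and stops as soon as the helper reports that the user array is full. Kernel code cannot run standalone, so the sketch below is a userspace model: the hypothetical ``model_fill_next_extent()`` imitates the documented return values and the promotion of specific flags to the general flags listed earlier, while ``struct toy_extent`` and ``toy_fiemap()`` stand for an imaginary file system. Copying to user memory (and hence -EFAULT) is not modelled.

```c
#include <stdint.h>
#include <string.h>
#include <linux/fiemap.h>   /* struct fiemap_extent, FIEMAP_EXTENT_* */

/* Hypothetical userspace stand-in for struct fiemap_extent_info. */
struct model_fieinfo {
        unsigned int fi_flags;
        unsigned int fi_extents_mapped;
        unsigned int fi_extents_max;
        struct fiemap_extent *fi_extents_start;
};

/* Hypothetical model of the helper: 0 = keep going, 1 = the array is full. */
static int model_fill_next_extent(struct model_fieinfo *info, uint64_t logical,
                                  uint64_t phys, uint64_t len, uint32_t flags)
{
        struct fiemap_extent *dest;

        if (info->fi_extents_max == 0) {        /* count-only request */
                info->fi_extents_mapped++;
                return 0;
        }
        if (info->fi_extents_mapped >= info->fi_extents_max)
                return 1;

        /* Set the general flags implied by the specific ones. */
        if (flags & FIEMAP_EXTENT_DELALLOC)
                flags |= FIEMAP_EXTENT_UNKNOWN;
        if (flags & FIEMAP_EXTENT_DATA_ENCRYPTED)
                flags |= FIEMAP_EXTENT_ENCODED;
        if (flags & (FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_DATA_TAIL))
                flags |= FIEMAP_EXTENT_NOT_ALIGNED;

        dest = &info->fi_extents_start[info->fi_extents_mapped++];
        memset(dest, 0, sizeof(*dest));
        dest->fe_logical = logical;
        dest->fe_physical = phys;
        dest->fe_length = len;
        dest->fe_flags = flags;

        return info->fi_extents_mapped == info->fi_extents_max;
}

/* Hypothetical on-disk extent record of an imaginary file system. */
struct toy_extent {
        uint64_t logical, phys, len;
        uint32_t flags;
};

/* Sketch of a ->fiemap-style walk; assumes start + len does not overflow. */
static int toy_fiemap(const struct toy_extent *ext, size_t n,
                      struct model_fieinfo *info, uint64_t start, uint64_t len)
{
        size_t i;

        for (i = 0; i < n; i++) {
                uint32_t flags = ext[i].flags;
                int ret;

                if (ext[i].logical + ext[i].len <= start ||
                    ext[i].logical >= start + len)
                        continue;       /* outside the requested range */
                if (i == n - 1)
                        flags |= FIEMAP_EXTENT_LAST;    /* last extent of the file */

                ret = model_fill_next_extent(info, ext[i].logical, ext[i].phys,
                                             ext[i].len, flags);
                if (ret < 0)
                        return ret;     /* the real helper may return -EFAULT */
                if (ret == 1)
                        break;          /* user array is full: stop early */
        }
        return 0;
}
```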