======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct
   bio_vecs, constructed from the raw biovecs but taking into account
   bi_bvec_done and bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.

As of 5.12, bvec segments with zero bv_len are not supported.
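
The relationship between the immutable biovec array and the iterator can be
sketched in plain userspace C. This is only a simplified model, not the kernel
implementation: ``model_bvec``, ``model_iter``, ``model_iter_bvec()`` and
``model_iter_advance()`` are hypothetical stand-ins for struct bio_vec, struct
bvec_iter, bio_iter_iovec() and bio_advance_iter(), and a plain pointer
replaces the page pointer.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for struct bio_vec. */
struct model_bvec {
	const char  *base;       /* replaces bv_page in this model */
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical stand-in for struct bvec_iter: all mutable state lives here. */
struct model_iter {
	unsigned long long bi_sector;  /* current device sector (512 bytes) */
	unsigned int bi_size;          /* bytes remaining in the bio */
	unsigned int bi_idx;           /* current index into the biovec array */
	unsigned int bi_bvec_done;     /* bytes completed in the current bvec */
};

/*
 * Modelled on bio_iter_iovec(): build a bvec by value that looks partially
 * completed. The array itself is only read. Callers must check bi_size != 0.
 */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter it)
{
	struct model_bvec cur = bv[it.bi_idx];
	unsigned int left = cur.bv_len - it.bi_bvec_done;

	cur.bv_offset += it.bi_bvec_done;
	cur.bv_len = left < it.bi_size ? left : it.bi_size;
	return cur;
}

/* Modelled on bio_advance_iter(): only the iterator is modified. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	assert(bytes <= it->bi_size);

	it->bi_sector += bytes >> 9;
	it->bi_size -= bytes;

	/* Step over every bvec that is now fully consumed. */
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}
```

Advancing such an iterator by 5 bytes over an 8 byte bvec leaves the array
untouched; the next call to ``model_iter_bvec()`` simply reports an offset of
5 and a length of 3.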

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was tricky and
   convoluted - it had to walk two different bios at the same time, keeping
   both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array
   - again, this was being done in a fair number of places. Now that the
   array is never modified, saving the bvec_iter is enough.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split arbitrary
   size bios - because the new bio can share the old bio's biovec.

   Care must be taken to ensure the biovec isn't freed while the split bio is
   still using it, in case the original bio completes first. Using
   bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine for _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they are - submitting partially completed bios is
   perfectly fine.
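
The two-iterator copy pattern described for bio_copy_data() can be sketched in
userspace C. This is a simplified model under stated assumptions, not the code
in block/bio.c: ``model_bvec``, ``model_iter`` and the ``model_*()`` functions
are hypothetical names, plain buffers stand in for pages, and bi_sector is
left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace stand-ins for struct bio_vec and struct bvec_iter. */
struct model_bvec {
	char        *base;       /* replaces bv_page in this model */
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_iter {
	unsigned int bi_size;       /* bytes remaining */
	unsigned int bi_idx;        /* current index into the biovec array */
	unsigned int bi_bvec_done;  /* bytes completed in the current bvec */
};

/* Current bvec as seen through the iterator; requires it.bi_size != 0. */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter it)
{
	struct model_bvec cur = bv[it.bi_idx];
	unsigned int left = cur.bv_len - it.bi_bvec_done;

	cur.bv_offset += it.bi_bvec_done;
	cur.bv_len = left < it.bi_size ? left : it.bi_size;
	return cur;
}

/* Advance the iterator only; the biovec array is never written. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	assert(bytes <= it->bi_size);

	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}

/*
 * Copy until either side runs out, even though source and destination are
 * cut into bvecs of different sizes. The iterators are passed by value, so
 * the caller's copies still describe the untouched ranges afterwards.
 * Returns the number of bytes copied.
 */
static unsigned int model_copy_data(const struct model_bvec *dst_v,
				    struct model_iter dst,
				    const struct model_bvec *src_v,
				    struct model_iter src)
{
	unsigned int copied = 0;

	while (src.bi_size && dst.bi_size) {
		struct model_bvec s = model_iter_bvec(src_v, src);
		struct model_bvec d = model_iter_bvec(dst_v, dst);
		unsigned int n = s.bv_len < d.bv_len ? s.bv_len : d.bv_len;

		memcpy(d.base + d.bv_offset, s.base + s.bv_offset, n);
		model_iter_advance(src_v, &src, n);
		model_iter_advance(dst_v, &dst, n);
		copied += n;
	}
	return copied;
}
```

Each pass copies the smaller of the two current bvec lengths and advances both
iterators by that amount, so neither side has to track an index and offset by
hand.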

Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity, particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.
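
Why bi_size, and not bi_vcnt, marks the end of a bio can be illustrated with a
userspace sketch of two iterators sharing one array. It is a simplified model
only: ``model_bvec``, ``model_iter``, ``model_split()`` and
``model_iter_segments()`` are hypothetical names, the page pointer is omitted,
and no bio allocation or chaining is modelled.

```c
#include <assert.h>

/* Hypothetical stand-ins; the page pointer of struct bio_vec is omitted. */
struct model_bvec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_iter {
	unsigned int bi_size;       /* bytes remaining */
	unsigned int bi_idx;        /* current index into the shared array */
	unsigned int bi_bvec_done;  /* bytes completed in the current bvec */
};

/* Current bvec as seen through the iterator; requires it.bi_size != 0. */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter it)
{
	struct model_bvec cur = bv[it.bi_idx];
	unsigned int left = cur.bv_len - it.bi_bvec_done;

	cur.bv_offset += it.bi_bvec_done;
	cur.bv_len = left < it.bi_size ? left : it.bi_size;
	return cur;
}

static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	assert(bytes <= it->bi_size);

	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}

/*
 * Split off the first `bytes` bytes. The returned iterator covers the front
 * part and *it is advanced to cover the rest. Both keep pointing into the
 * same, unmodified array.
 */
static struct model_iter model_split(const struct model_bvec *bv,
				     struct model_iter *it, unsigned int bytes)
{
	struct model_iter front = *it;

	assert(bytes > 0 && bytes < it->bi_size);
	front.bi_size = bytes;
	model_iter_advance(bv, it, bytes);
	return front;
}

/* Count the segments an iterator covers, using bi_size alone as the bound. */
static unsigned int model_iter_segments(const struct model_bvec *bv,
					struct model_iter it)
{
	unsigned int n = 0;

	while (it.bi_size) {
		model_iter_advance(bv, &it, model_iter_bvec(bv, it).bv_len);
		n++;
	}
	return n;
}
```

Splitting a three-entry array of 4096 byte bvecs at byte 6144 yields two
iterators that each cover two segments; the array's entry count of three
describes neither half, which is why only bi_size is used.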

Usage of helpers:
=================

* The following helpers, whose names have the suffix ``_all``, can only be
  used on non-BIO_CLONED bios. They are usually used by filesystem code.
  Drivers shouldn't use them because the bio may have been split before it
  reached the driver.

::

	bio_for_each_segment_all()
	bio_for_each_bvec_all()
	bio_first_bvec_all()
	bio_first_page_all()
	bio_first_folio_all()
	bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

	bio_for_each_segment()
	bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

	bio_for_each_bvec()
	bio_for_each_bvec_all()
	rq_for_each_bvec()
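
The difference between the two iteration granularities can be sketched in
userspace C. This is a simplified model rather than the kernel macros:
``MODEL_PAGE_SIZE``, ``model_bvec``, ``model_segment_at()`` and
``model_count_segments()`` are hypothetical, a page is assumed to be 4096
bytes, and pages are identified by a plain index instead of a struct page
pointer.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for struct bio_vec; `page` is a page index. */
struct model_bvec {
	unsigned int page;       /* first page of the vector */
	unsigned int bv_len;     /* may span several contiguous pages */
	unsigned int bv_offset;  /* offset into the first page */
};

/*
 * Return the single-page segment that starts `done` bytes into a multi-page
 * bvec: the result never crosses a page boundary.
 */
static struct model_bvec model_segment_at(struct model_bvec bv,
					  unsigned int done)
{
	unsigned int off, left, room;
	struct model_bvec seg;

	assert(done < bv.bv_len);

	off = bv.bv_offset + done;
	left = bv.bv_len - done;
	room = MODEL_PAGE_SIZE - off % MODEL_PAGE_SIZE;

	seg.page = bv.page + off / MODEL_PAGE_SIZE;
	seg.bv_offset = off % MODEL_PAGE_SIZE;
	seg.bv_len = left < room ? left : room;
	return seg;
}

/* Number of single-page segments that one multi-page bvec breaks into. */
static unsigned int model_count_segments(struct model_bvec bv)
{
	unsigned int done = 0, n = 0;

	while (done < bv.bv_len) {
		done += model_segment_at(bv, done).bv_len;
		n++;
	}
	return n;
}
```

In this model, a 10000 byte vector starting 512 bytes into its first page is
seen as one element at bvec granularity, but as three elements (3584, 4096 and
2320 bytes) at segment granularity.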