c4923027bd58cdccbe54e13298a4d914668a56da - linux

commit	c4923027bd58cdccbe54e13298a4d914668a56da
author	Josef Bacik <josef@toxicpanda.com>	Tue Aug 25 16:56:59 2020 -0400
committer	David Sterba <dsterba@suse.com>	Wed Oct 07 12:06:54 2020 +0200
tree	8d09dca1805379371f0672f8adc8a6b02029f213
parent	1a7a92c8ddcd1edc4a5407de8f56edc6cfdf394a

btrfs: fix possible infinite loop in data async reclaim

Dave reported an issue where generic/102 would sometimes hang.  This
turned out to be because we'd get into this spot where we were no longer
making progress on data reservations because our exit condition was not
met.  The logic is basically

while (!space_info->full && !list_empty(&space_info->tickets))
	flush_space(space_info, flush_state);

where flush_state cycles through our various flush states, but doesn't include
ALLOC_CHUNK_FORCE.  This is because we actually lead with allocating
chunks, and so the assumption was that once you got to the actual
flushing states you could no longer allocate chunks.  This was a stupid
assumption, because you could have deleted block groups that would be
reclaimed by a transaction commit, thus unsetting space_info->full.
This is essentially what happens with generic/102, and so sometimes
you'd get stuck in the flushing loop because we weren't allocating
chunks, but flushing space wasn't giving us what we needed to make
progress.

Fix this by adding ALLOC_CHUNK_FORCE to the end of our flushing states,
that way we will eventually bail out because we did end up with
space_info->full if we freed a chunk previously.  Otherwise, as is the
case for this test, we'll allocate our chunk and continue on our happy
merry way.
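
To illustrate the difference, here is a toy userspace model of the loop
above.  It is only a sketch and not the code in fs/btrfs/space-info.c:
every toy_* name, the unallocated_chunks counter and the max_iters cap
are made up for the example, and it assumes that flushing frees nothing
and that one new chunk is enough to satisfy the pending tickets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of flush states (the kernel has more). */
enum toy_flush_state {
	TOY_FLUSH_DELALLOC,
	TOY_COMMIT_TRANS,
	TOY_ALLOC_CHUNK_FORCE,
};

/* Hypothetical stand-in for the fields the loop condition looks at. */
struct toy_space_info {
	bool full;              /* set once a chunk allocation fails */
	int tickets;            /* pending reservations */
	int unallocated_chunks; /* room left for new chunks */
};

static void toy_flush_space(struct toy_space_info *si,
			    enum toy_flush_state state)
{
	switch (state) {
	case TOY_ALLOC_CHUNK_FORCE:
		if (si->unallocated_chunks > 0) {
			/* Assumption: one new chunk satisfies all tickets. */
			si->unallocated_chunks--;
			si->tickets = 0;
		} else {
			si->full = true;
		}
		break;
	default:
		/* Assumption: plain flushing reclaims nothing here. */
		break;
	}
}

/*
 * Cycle through the given states until the exit condition is met.
 * Returns the number of flush_space calls made, or -1 if max_iters
 * was reached, which stands in for the hang.
 */
static int toy_reclaim(struct toy_space_info *si,
		       const enum toy_flush_state *states, int nstates,
		       int max_iters)
{
	int iters = 0;

	assert(nstates > 0 && max_iters > 0);

	while (!si->full && si->tickets > 0) {
		if (iters >= max_iters)
			return -1;
		toy_flush_space(si, states[iters % nstates]);
		iters++;
	}
	return iters;
}
```

With a state list that lacks TOY_ALLOC_CHUNK_FORCE, a space_info that is
not full and has one ticket never leaves the loop, so toy_reclaim() hits
the cap.  With TOY_ALLOC_CHUNK_FORCE appended, the same setup either
allocates the chunk and clears the ticket, or sets full and bails out.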

Reported-by: David Sterba <dsterba@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

fs/btrfs/space-info.c

1 file changed
