33d32fa7120ed184efc9be1ea3c016109b4fea84 - linux

commit	33d32fa7120ed184efc9be1ea3c016109b4fea84
author	Lars Ellenberg <lars.ellenberg@linbit.com>	Tue Aug 29 10:20:43 2017 +0200
committer	Jens Axboe <axboe@kernel.dk>	Tue Aug 29 15:34:45 2017 -0600
tree	30c0fdf900ecdfbf51207795eebc0d985a06effe
parent	427fd2bee0a33a670de186387e79d280a6808a66

drbd: fix potential deadlock when trying to detach during handshake

When requesting a detach, we first suspend IO, and also inhibit meta-data IO
by means of drbd_md_get_buffer(), because we don't want to "fail" the disk
while there is IO in-flight: the transition into D_FAILED for detach purposes
may get misinterpreted as an actual IO error in a confused endio function.

We wrap it all into wait_event(), to retry in case drbd_req_state()
returns SS_IN_TRANSIENT_STATE, as it does for example during an ongoing
connection handshake.

In that example, the receiver thread may need to call drbd_md_get_buffer()
during the handshake to make progress.  To avoid a potential deadlock with
detach, detach needs to grab and release the meta-data buffer inside of
that wait_event() retry loop.  To avoid lock inversion between
mutex_lock(&device->state_mutex) and drbd_md_get_buffer(device),
introduce a new enum chg_state_flag, CS_INHIBIT_MD_IO, and move the
call to drbd_md_get_buffer() inside the state_mutex grabbed in
drbd_req_state().
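
The locking pattern described above can be illustrated with a small,
single-threaded userspace model.  This is only a sketch under stated
assumptions, not the actual patch: every toy_* name is hypothetical, the
locks are modelled as plain flags, and the "receiver" is simulated by a
function that makes one step of handshake progress whenever the meta-data
buffer is free.  It contrasts holding the buffer across all retries (which
is what the message implies happened before) with taking and releasing it
per attempt, nested inside the state mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration; these are NOT the kernel definitions. */
enum toy_rv { TOY_SUCCESS, TOY_IN_TRANSIENT_STATE };
enum toy_flags { TOY_CS_INHIBIT_MD_IO = 1 };

struct toy_device {
	bool state_mutex_held;
	bool md_buffer_held;
	int handshake_steps_left;	/* receiver needs the md buffer once per step */
	bool detached;
};

/* Receiver side: one step of handshake progress, if the md buffer is free. */
static bool toy_receiver_step(struct toy_device *d)
{
	if (d->handshake_steps_left == 0 || d->md_buffer_held)
		return false;
	d->md_buffer_held = true;	/* grab */
	d->handshake_steps_left--;
	d->md_buffer_held = false;	/* release */
	return true;
}

/* One state change attempt: state mutex first, md buffer second,
 * both released in reverse order before returning. */
static enum toy_rv toy_req_state(struct toy_device *d, int flags)
{
	enum toy_rv rv;

	assert(!d->state_mutex_held);
	d->state_mutex_held = true;
	if (flags & TOY_CS_INHIBIT_MD_IO) {
		assert(!d->md_buffer_held);
		d->md_buffer_held = true;
	}

	if (d->handshake_steps_left > 0) {
		rv = TOY_IN_TRANSIENT_STATE;	/* handshake still ongoing */
	} else {
		d->detached = true;
		rv = TOY_SUCCESS;
	}

	if (flags & TOY_CS_INHIBIT_MD_IO)
		d->md_buffer_held = false;
	d->state_mutex_held = false;
	return rv;
}

/* Retry loop.  Returns the number of attempts needed, or -1 if the
 * simulated receiver can no longer make progress (the deadlock case). */
static int toy_detach(struct toy_device *d, bool hold_buffer_across_retries)
{
	int attempts = 0;

	if (hold_buffer_across_retries)
		d->md_buffer_held = true;	/* taken once, outside the loop */

	for (;;) {
		int flags = hold_buffer_across_retries ? 0 : TOY_CS_INHIBIT_MD_IO;

		attempts++;
		if (toy_req_state(d, flags) == TOY_SUCCESS)
			break;
		if (!toy_receiver_step(d)) {
			attempts = -1;		/* nobody can make progress */
			break;
		}
	}

	if (hold_buffer_across_retries)
		d->md_buffer_held = false;
	return attempts;
}
```

In this model, calling toy_detach() with hold_buffer_across_retries set
gets stuck as soon as the simulated receiver needs the buffer, while the
per-attempt variant lets the handshake finish between retries and then
succeeds.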

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3 files changed