drbd: debugfs: Add in_flight_summary
* Add details about pending meta data operations to in_flight_summary.
* Report number of requests waiting for activity log transactions.
* timing details of peer_requests to in_flight_summary.
* FLUSH details
DRBD devides the incoming request stream into "epochs",
in which peers are allowed to re-order writes independendly.
These epochs are separated by P_BARRIER on the replication link.
Such barrier packets, depending on configuration, may cause
the receiving side to drain the lower level device request queues
and call blkdev_issue_flush().
This is known to be an other major source of latency in DRBD.
Track timing details of calls to blkdev_issue_flush(),
and add them to in_flight_summary.
* data socket stats
To be able to diagnose bottlenecks and root causes of "slow" IO on DRBD,
it is useful to see network buffer stats along with the timing details of
requests, peer requests, and meta data IO.
* pending bitmap IO timing details to in_flight_summary.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 42e3835..9f6ed24 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1187,8 +1187,16 @@
kref_get(&device->kref);
rcu_read_unlock();
+ /* Right now, we have only this one synchronous code path
+ * for flushes between request epochs.
+ * We may want to make those asynchronous,
+ * or at least parallelize the flushes to the volume devices.
+ */
+ device->flush_jif = jiffies;
+ set_bit(FLUSH_PENDING, &device->flags);
rv = blkdev_issue_flush(device->ldev->backing_bdev,
GFP_NOIO, NULL);
+ clear_bit(FLUSH_PENDING, &device->flags);
if (rv) {
drbd_info(device, "local disk flush failed with status %d\n", rv);
/* would rather check on EOPNOTSUPP, but that is not reliable.