| Queue sysfs files |
| ================= |
| |
| This text file will detail the queue files that are located in the sysfs tree |
| for each block device. Note that stacked devices typically do not export |
| any settings, since their queue merely functions as a remapping target. |
| These files are the ones found in the /sys/block/xxx/queue/ directory. |
| |
| Files marked with an RO suffix are read-only; an RW suffix means |
| read-write. |
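| |
| As a rough illustration of how these files can be consumed, the Python |
| sketch below collects every regular file in a device's queue directory |
| into a dictionary. The function name read_queue_attrs and its sysfs_root |
| parameter are hypothetical helpers chosen for this example; they are not |
| part of any kernel interface. |
| |
```python
import os


def read_queue_attrs(device, sysfs_root="/sys/block"):
    """Return {attribute name: stripped contents} for one device's queue directory.

    `device` is a block device name such as "sda" (an assumed example name).
    """
    queue_dir = os.path.join(sysfs_root, device, "queue")
    attrs = {}
    for name in sorted(os.listdir(queue_dir)):
        path = os.path.join(queue_dir, name)
        if not os.path.isfile(path):
            continue  # skip any subdirectories; only plain files are read
        try:
            with open(path) as f:
                attrs[name] = f.read().strip()
        except OSError:
            continue  # an attribute that cannot be read is simply left out
    return attrs
```
| |
| Calling read_queue_attrs("sda") would return the settings of a device |
| named sda, assuming such a device exists on the running system. |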
| |
| add_random (RW) |
| --------------- |
| This file allows turning off the disk entropy contribution. The default |
| value of this file is '1' (on). |
| |
| dax (RO) |
| -------- |
| This file indicates whether the device supports Direct Access (DAX), |
| used by CPU-addressable storage to bypass the pagecache. It shows '1' |
| if true, '0' if not. |
| |
| discard_granularity (RO) |
| ------------------------ |
| This shows the size of the internal allocation unit of the device, in |
| bytes, if reported by the device. A value of '0' means the device does |
| not support discard functionality. |
| |
| discard_max_hw_bytes (RO) |
| ------------------------- |
| Devices that support discard functionality may have internal limits on |
| the number of bytes that can be trimmed or unmapped in a single operation. |
| The discard_max_hw_bytes parameter is set by the device driver to the |
| maximum number of bytes that can be discarded in a single operation. |
| Discard requests issued to the device must not exceed this limit. A |
| discard_max_hw_bytes value of 0 means that the device does not support |
| discard functionality. |
| |
| discard_max_bytes (RW) |
| ---------------------- |
| While discard_max_hw_bytes is the hardware limit for the device, this |
| setting is the software limit. Some devices exhibit large latencies when |
| large discards are issued; setting this value lower will make Linux issue |
| smaller discards and potentially help reduce latencies induced by large |
| discard operations. |
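| |
| The relationship between the software and hardware limits can be sketched |
| as a small calculation. The helper pick_discard_max_bytes below is |
| hypothetical. Rounding the result down to a multiple of |
| discard_granularity is an assumption of this sketch, not a rule stated in |
| this document. |
| |
```python
def pick_discard_max_bytes(requested, hw_max, granularity):
    """Choose a candidate value for discard_max_bytes.

    requested   -- the software limit the administrator would like, in bytes
    hw_max      -- contents of discard_max_hw_bytes
    granularity -- contents of discard_granularity
    Returns 0 when the device reports no discard support.
    """
    if hw_max == 0 or granularity == 0:
        return 0  # a value of 0 means discard is not supported
    value = min(requested, hw_max)  # never exceed the hardware limit
    value -= value % granularity    # assumption: keep whole allocation units
    return value
```
| |
| For instance, a request for 5000 bytes on a device with a 4096-byte |
| granularity yields 4096 in this sketch. |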
| |
| hw_sector_size (RO) |
| ------------------- |
| This is the hardware sector size of the device, in bytes. |
| |
| io_poll (RW) |
| ------------ |
| When read, this file shows whether polling is enabled (1) or disabled |
| (0). Writing '0' to this file will disable polling for this device. |
| Writing any non-zero value will enable this feature. |
| |
| io_poll_delay (RW) |
| ------------------ |
| If polling is enabled, this controls what kind of polling will be |
| performed. It defaults to -1, which is classic polling. In this mode, |
| the CPU will repeatedly ask for completions without giving up any time. |
| If set to 0, a hybrid polling mode is used, where the kernel will attempt |
| to make an educated guess at when the IO will complete. Based on this |
| guess, the kernel will put the process issuing IO to sleep for an amount |
| of time, before entering a classic poll loop. This mode might be a |
| little slower than pure classic polling, but it will be more efficient. |
| If set to a value larger than 0, the kernel will put the process issuing |
| IO to sleep for this many microseconds before entering classic |
| polling. |
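| |
| The three cases above can be summarised in a short Python sketch. The |
| function describe_io_poll_delay and the label "fixed-sleep" are names |
| invented for this example. Treating values below -1 as invalid is an |
| assumption, since this document does not describe them. |
| |
```python
def describe_io_poll_delay(value):
    """Map an io_poll_delay value to (mode, sleep_in_microseconds).

    The sleep is None when no fixed sleep time applies.
    """
    if value == -1:
        return ("classic", None)      # poll continuously, never sleep
    if value == 0:
        return ("hybrid", None)       # kernel estimates the sleep time itself
    if value > 0:
        return ("fixed-sleep", value)  # sleep this long, then classic polling
    raise ValueError("io_poll_delay values below -1 are not described")
```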
| |
| iostats (RW) |
| ------------ |
| This file is used to control (on/off) the iostats accounting of the |
| disk. |
| |
| logical_block_size (RO) |
| ----------------------- |
| This is the logical block size of the device, in bytes. |
| |
| max_hw_sectors_kb (RO) |
| ---------------------- |
| This is the maximum number of kilobytes supported in a single data transfer. |
| |
| max_integrity_segments (RO) |
| --------------------------- |
| When read, this file shows the maximum number of integrity segments, as |
| set by the block layer, that a hardware controller can handle. |
| |
| max_sectors_kb (RW) |
| ------------------- |
| This is the maximum number of kilobytes that the block layer will allow |
| for a filesystem request. Must be smaller than or equal to the maximum |
| size allowed by the hardware. |
| |
| max_segments (RO) |
| ----------------- |
| Maximum number of segments of the device. |
| |
| max_segment_size (RO) |
| --------------------- |
| Maximum segment size of the device. |
| |
| minimum_io_size (RO) |
| -------------------- |
| This is the smallest preferred IO size reported by the device. |
| |
| nomerges (RW) |
| ------------- |
| This enables the user to disable the lookup logic involved in merging IO |
| requests in the block layer. By default (0) all merges are |
| enabled. When set to 1 only simple one-hit merges will be tried. When |
| set to 2 no merge algorithms will be tried (including one-hit or more |
| complex tree/hash lookups). |
| |
| nr_requests (RW) |
| ---------------- |
| This controls how many requests may be allocated in the block layer for |
| read or write requests. Note that the total allocated number may be twice |
| this amount, since it applies only to reads or writes (not the accumulated |
| sum). |
| |
| To avoid priority inversion through request starvation, a request |
| queue maintains a separate request pool for each cgroup when |
| CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such |
| per-block-cgroup request pool. IOW, if there are N block cgroups, |
| each request queue may have up to N request pools, each independently |
| regulated by nr_requests. |
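| |
| The arithmetic implied above can be written out as a worst-case estimate. |
| The helper max_allocated_requests is hypothetical, and the result is only |
| an upper bound derived from the description in this section, not a value |
| the kernel reports. |
| |
```python
def max_allocated_requests(nr_requests, block_cgroups=1):
    """Upper bound on requests allocated for one request queue.

    nr_requests applies separately to reads and to writes, so one pool can
    hold up to twice that amount. With CONFIG_BLK_CGROUP enabled there may
    be one such pool per block cgroup.
    """
    per_pool = 2 * nr_requests        # reads plus writes
    pools = max(block_cgroups, 1)     # at least one pool always exists
    return per_pool * pools
```
| |
| With nr_requests set to 128 and four block cgroups, this estimate gives |
| 2 * 128 * 4 = 1024 requests. |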
| |
| optimal_io_size (RO) |
| -------------------- |
| This is the optimal IO size reported by the device. |
| |
| physical_block_size (RO) |
| ------------------------ |
| This is the physical block size of the device, in bytes. |
| |
| read_ahead_kb (RW) |
| ------------------ |
| Maximum number of kilobytes to read-ahead for filesystems on this block |
| device. |
| |
| rotational (RW) |
| --------------- |
| This file is used to state whether the device is of rotational type or |
| non-rotational type. |
| |
| rq_affinity (RW) |
| ---------------- |
| If this option is '1', the block layer will migrate request completions to the |
| CPU "group" that originally submitted the request. For some workloads this |
| provides a significant reduction in CPU cycles due to caching effects. |
| |
| For storage configurations that need to maximize distribution of completion |
| processing, setting this option to '2' forces the completion to run on the |
| requesting CPU (bypassing the "group" aggregation logic). |
| |
| scheduler (RW) |
| -------------- |
| When read, this file will display the current and available IO schedulers |
| for this block device. The currently active IO scheduler will be enclosed |
| in [] brackets. Writing an IO scheduler name to this file will switch |
| control of this block device to that new IO scheduler. Note that writing |
| an IO scheduler name to this file will attempt to load that IO scheduler |
| module, if it isn't already present in the system. |
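| |
| A minimal Python sketch of parsing the format described above follows. |
| The function parse_scheduler is a hypothetical helper, and the scheduler |
| names used in the usage note are only sample input. |
| |
```python
def parse_scheduler(text):
    """Split the contents of the scheduler file.

    Returns (active, available): the name enclosed in [] brackets (or None
    if no name is bracketed) and the list of all names in file order.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active one
            active = token
        available.append(token)
    return active, available
```
| |
| Given the sample text "noop deadline [cfq]", this returns "cfq" as the |
| active scheduler together with the full list of three names. |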
| |
| write_cache (RW) |
| ---------------- |
| When read, this file will display whether the device has write back |
| caching enabled or not. It will return "write back" for the former |
| case, and "write through" for the latter. Writing to this file can |
| change the kernel's view of the device, but it doesn't alter the |
| device state. This means that it might not be safe to toggle the |
| setting from "write back" to "write through", since that will also |
| eliminate cache flushes issued by the kernel. |
| |
| write_same_max_bytes (RO) |
| ------------------------- |
| This is the number of bytes the device can write in a single write-same |
| command. A value of '0' means write-same is not supported by this |
| device. |
| |
| wb_lat_usec (RW) |
| ---------------- |
| If the device is registered for writeback throttling, then this file shows |
| the target minimum read latency. If this latency is exceeded in a given |
| window of time (see wb_window_usec), then the writeback throttling will start |
| scaling back writes. Writing a value of '0' to this file disables the |
| feature. Writing a value of '-1' to this file resets the value to the |
| default setting. |
| |
| throttle_sample_time (RW) |
| ------------------------- |
| This is the time window over which blk-throttle samples data, in |
| milliseconds. blk-throttle makes decisions based on the samples. A lower |
| time means cgroups have smoother throughput, but higher CPU overhead. This |
| exists only when CONFIG_BLK_DEV_THROTTLING_LOW is enabled. |
| |
| zoned (RO) |
| ---------- |
| This indicates if the device is a zoned block device and the zone model of the |
| device if it is indeed zoned. The possible values indicated by zoned are |
| "none" for regular block devices and "host-aware" or "host-managed" for zoned |
| block devices. The characteristics of host-aware and host-managed zoned block |
| devices are described in the ZBC (Zoned Block Commands) and ZAC |
| (Zoned Device ATA Command Set) standards. These standards also define the |
| "drive-managed" zone model. However, since drive-managed zoned block devices |
| do not support zone commands, they will be treated as regular block devices |
| and zoned will report "none". |
| |
| nr_zones (RO) |
| ------------- |
| For zoned block devices (zoned attribute indicating "host-managed" or |
| "host-aware"), this indicates the total number of zones of the device. |
| This is always 0 for regular block devices. |
| |
| chunk_sectors (RO) |
| ------------------ |
| This has a different meaning depending on the type of the block device. |
| For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors |
| of the RAID volume stripe segment. For a zoned block device, either host-aware |
| or host-managed, chunk_sectors indicates the size in 512B sectors of the zones |
| of the device, with the possible exception of the last zone of the device, |
| which may be smaller. |
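| |
| Because chunk_sectors is expressed in 512B sectors, converting it to bytes |
| is a single multiplication. The Python sketch below also combines it with |
| the zoned and nr_zones attributes to bound the capacity of a zoned device. |
| Both function names are hypothetical, and the result is only an upper |
| bound because the last zone may be smaller. |
| |
```python
SECTOR_BYTES = 512  # chunk_sectors is expressed in 512B sectors


def chunk_bytes(chunk_sectors):
    """Convert a chunk_sectors value to bytes."""
    return chunk_sectors * SECTOR_BYTES


def zoned_capacity_upper_bound(zoned, nr_zones, chunk_sectors):
    """Upper bound, in bytes, on the capacity of a zoned block device.

    Returns None for regular devices, where zoned reports "none".
    """
    if zoned not in ("host-aware", "host-managed"):
        return None
    return nr_zones * chunk_bytes(chunk_sectors)
```
| |
| For example, a chunk_sectors value of 524288 corresponds to zones of |
| 268435456 bytes (256 MiB). |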
| |
| Jens Axboe <jens.axboe@oracle.com>, February 2009 |