blob: 20206ba965affe333423de87fe6b470a026953ba [file] [log] [blame]
What: /sys/fs/lustre/version
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows current running lustre version.
What: /sys/fs/lustre/pinger
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows if the lustre module has pinger support.
"on" means yes and "off" means no.
What: /sys/fs/lustre/health
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows whenever current system state believed to be "healthy",
"NOT HEALTHY", or "LBUG" whenever lustre has experienced
an internal assertion failure
What: /sys/fs/lustre/jobid_name
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Currently running job "name" for this node to be transferred
to Lustre servers for purposes of QoS and statistics gathering.
Writing into this file will change the name, reading outputs
currently set value.
What: /sys/fs/lustre/jobid_var
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Control file for lustre "jobstats" functionality, write new
value from the list below to change the mode:
disable - disable job name reporting to the servers (default)
procname_uid - form the job name as the current running
command name and pid with a dot in between
e.g. dd.1253
nodelocal - use jobid_name value from above.
What: /sys/fs/lustre/timeout
Date: June 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls "lustre timeout" variable, also known as obd_timeout
in some old manual. In the past obd_timeout was of paramount
importance as the timeout value used everywhere and where
other timeouts were derived from. These days it's much less
important as network timeouts are mostly determined by
AT (adaptive timeouts).
Unit: seconds, default: 100
What: /sys/fs/lustre/max_dirty_mb
Date: June 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls total number of dirty cache (in megabytes) allowed
across all mounted lustre filesystems.
Since writeout of dirty pages in Lustre is somewhat expensive,
when you allow to many dirty pages, this might lead to
performance degradations as kernel tries to desperately
find some pages to free/writeout.
Default 1/2 RAM. Min value 4, max value 9/10 of RAM.
What: /sys/fs/lustre/debug_peer_on_timeout
Date: June 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Control if lnet debug information should be printed when
an RPC timeout occurs.
0 disabled (default)
1 enabled
What: /sys/fs/lustre/dump_on_timeout
Date: June 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls if Lustre debug log should be dumped when an RPC
timeout occurs. This is useful if yout debug buffer typically
rolls over by the time you notice RPC timeouts.
What: /sys/fs/lustre/dump_on_eviction
Date: June 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls if Lustre debug log should be dumped when an this
client is evicted from one of the servers.
This is useful if yout debug buffer typically rolls over
by the time you notice the eviction event.
What: /sys/fs/lustre/at_min
Date: July 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls minimum adaptive timeout in seconds. If you encounter
a case where clients timeout due to server-reported processing
time being too short, you might consider increasing this value.
One common case of this if the underlying network has
unpredictable long delays.
Default: 0
What: /sys/fs/lustre/at_max
Date: July 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls maximum adaptive timeout in seconds. If at_max timeout
is reached for an RPC, the RPC will time out.
Some genuinuely slow network hardware might warrant increasing
this value.
Setting this value to 0 disables Adaptive Timeouts
functionality and old-style obd_timeout value is then used.
Default: 600
What: /sys/fs/lustre/at_extra
Date: July 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls how much extra time to request for unfinished requests
in processing in seconds. Normally a server-side parameter, it
is also used on the client for responses to various LDLM ASTs
that are handled with a special server thread on the client.
This is a way for the servers to ask the clients not to time
out the request that reached current servicing time estimate
yet and give it some more time.
Default: 30
What: /sys/fs/lustre/at_early_margin
Date: July 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls when to send the early reply for requests that are
about to timeout as an offset to the estimated service time in
seconds..
Default: 5
What: /sys/fs/lustre/at_history
Date: July 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls for how many seconds to remember slowest events
encountered by adaptive timeouts code.
Default: 600
What: /sys/fs/lustre/llite/<fsname>-<uuid>/blocksize
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Biggest blocksize on object storage server for this filesystem.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/kbytestotal
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows total number of kilobytes of space on this filesystem
What: /sys/fs/lustre/llite/<fsname>-<uuid>/kbytesfree
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows total number of free kilobytes of space on this filesystem
What: /sys/fs/lustre/llite/<fsname>-<uuid>/kbytesavail
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows total number of free kilobytes of space on this filesystem
actually available for use (taking into account per-client
grants and filesystem reservations).
What: /sys/fs/lustre/llite/<fsname>-<uuid>/filestotal
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows total number of inodes on the filesystem.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/filesfree
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows estimated number of free inodes on the filesystem
What: /sys/fs/lustre/llite/<fsname>-<uuid>/client_type
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows whenever this filesystem considers this client to be
compute cluster-local or remote. Remote clients have
additional uid/gid convrting logic applied.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/fstype
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows filesystem type of the filesystem
What: /sys/fs/lustre/llite/<fsname>-<uuid>/uuid
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows this filesystem superblock uuid
What: /sys/fs/lustre/llite/<fsname>-<uuid>/max_read_ahead_mb
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Sets maximum number of megabytes in system memory to be
given to read-ahead cache.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/max_read_ahead_per_file_mb
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Sets maximum number of megabytes to read-ahead for a single file
What: /sys/fs/lustre/llite/<fsname>-<uuid>/max_read_ahead_whole_mb
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
For small reads, how many megabytes to actually request from
the server as initial read-ahead.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/checksum_pages
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Enables or disables per-page checksum at llite layer, before
the pages are actually given to lower level for network transfer
What: /sys/fs/lustre/llite/<fsname>-<uuid>/stats_track_pid
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Limit Lustre vfs operations gathering to just a single pid.
0 to track everything.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/stats_track_ppid
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Limit Lustre vfs operations gathering to just a single ppid.
0 to track everything.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/stats_track_gid
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Limit Lustre vfs operations gathering to just a single gid.
0 to track everything.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/statahead_max
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls maximum number of statahead requests to send when
sequential readdir+stat pattern is detected.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/statahead_agl
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls if AGL (async glimpse ahead - obtain object information
from OSTs in parallel with MDS during statahead) should be
enabled or disabled.
0 to disable, 1 to enable.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/lazystatfs
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls statfs(2) behaviour in the face of down servers.
If 0, always wait for all servers to come online,
if 1, ignote inactive servers.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/max_easize
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows maximum number of bytes file striping data could be
in current configuration of storage.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/default_easize
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows maximum observed file striping data seen by this
filesystem client instance.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/xattr_cache
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls extended attributes client-side cache.
1 to enable, 0 to disable.
What: /sys/fs/lustre/llite/<fsname>-<uuid>/unstable_stats
Date: Apr 2016
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows number of pages that were sent and acknowledged by
server but were not yet committed and therefore still
pinned in client memory even though no longer dirty.
What: /sys/fs/lustre/ldlm/cancel_unused_locks_before_replay
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls if client should replay unused locks during recovery
If a client tends to have a lot of unused locks in LRU,
recovery times might become prolonged.
1 - just locally cancel unused locks (default)
0 - replay unused locks.
What: /sys/fs/lustre/ldlm/namespaces/<name>/resource_count
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Displays number of lock resources (objects on which individual
locks are taken) currently allocated in this namespace.
What: /sys/fs/lustre/ldlm/namespaces/<name>/lock_count
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Displays number or locks allocated in this namespace.
What: /sys/fs/lustre/ldlm/namespaces/<name>/lru_size
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls and displays LRU size limit for unused locks for this
namespace.
0 - LRU size is unlimited, controlled by server resources
positive number - number of locks to allow in lock LRU list
What: /sys/fs/lustre/ldlm/namespaces/<name>/lock_unused_count
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Display number of locks currently sitting in the LRU list
of this namespace
What: /sys/fs/lustre/ldlm/namespaces/<name>/lru_max_age
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Maximum number of milliseconds a lock could sit in LRU list
before client would voluntarily cancel it as unused.
What: /sys/fs/lustre/ldlm/namespaces/<name>/early_lock_cancel
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls "early lock cancellation" feature on this namespace
if supported by the server.
When enabled, tries to preemtively cancel locks that would be
cancelled by verious operations and bundle the cancellation
requests in the same RPC as the main operation, which results
in significant speedups due to reduced lock-pingpong RPCs.
0 - disabled
1 - enabled (default)
What: /sys/fs/lustre/ldlm/namespaces/<name>/pool/granted
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Displays number of granted locks in this namespace
What: /sys/fs/lustre/ldlm/namespaces/<name>/pool/grant_rate
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of granted locks in this namespace during last
time interval
What: /sys/fs/lustre/ldlm/namespaces/<name>/pool/cancel_rate
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of lock cancellations in this namespace during
last time interval
What: /sys/fs/lustre/ldlm/namespaces/<name>/pool/grant_speed
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Calculated speed of lock granting (grant_rate - cancel_rate)
in this namespace
What: /sys/fs/lustre/ldlm/namespaces/<name>/pool/grant_plan
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Estimated number of locks to be granted in the next time
interval in this namespace
What: /sys/fs/lustre/ldlm/namespaces/<name>/pool/limit
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls number of allowed locks in this pool.
When lru_size is 0, this is the actual limit then.
What: /sys/fs/lustre/ldlm/namespaces/<name>/pool/lock_volume_factor
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Multiplier for all lock volume calculations above.
Default is 1. Increase to make the client to more agressively
clean it's lock LRU list for this namespace.
What: /sys/fs/lustre/ldlm/namespaces/<name>/pool/server_lock_volume
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Calculated server lock volume.
What: /sys/fs/lustre/ldlm/namespaces/<name>/pool/recalc_period
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls length of time between recalculation of above
values (in seconds).
What: /sys/fs/lustre/ldlm/services/ldlm_cbd/threads_min
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls minimum number of ldlm callback threads to start.
What: /sys/fs/lustre/ldlm/services/ldlm_cbd/threads_max
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls maximum number of ldlm callback threads to start.
What: /sys/fs/lustre/ldlm/services/ldlm_cbd/threads_started
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows actual number of ldlm callback threads running.
What: /sys/fs/lustre/ldlm/services/ldlm_cbd/high_priority_ratio
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls what percentage of ldlm callback threads is dedicated
to "high priority" incoming requests.
What: /sys/fs/lustre/{obdtype}/{connection_name}/blocksize
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Blocksize on backend filesystem for service behind this obd
device (or biggest blocksize for compound devices like lov
and lmv)
What: /sys/fs/lustre/{obdtype}/{connection_name}/kbytestotal
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Total number of kilobytes of space on backend filesystem
for service behind this obd (or total amount for compound
devices like lov lmv)
What: /sys/fs/lustre/{obdtype}/{connection_name}/kbytesfree
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of free kilobytes on backend filesystem for service
behind this obd (or total amount for compound devices
like lov lmv)
What: /sys/fs/lustre/{obdtype}/{connection_name}/kbytesavail
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of kilobytes of free space on backend filesystem
for service behind this obd (or total amount for compound
devices like lov lmv) that is actually available for use
(taking into account per-client and filesystem reservations).
What: /sys/fs/lustre/{obdtype}/{connection_name}/filestotal
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of inodes on backend filesystem for service behind this
obd.
What: /sys/fs/lustre/{obdtype}/{connection_name}/filesfree
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of free inodes on backend filesystem for service
behind this obd.
What: /sys/fs/lustre/mdc/{connection_name}/max_pages_per_rpc
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Maximum number of readdir pages to fit into a single readdir
RPC.
What: /sys/fs/lustre/{mdc,osc}/{connection_name}/max_rpcs_in_flight
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Maximum number of parallel RPCs on the wire to allow on
this connection. Increasing this number would help on higher
latency links, but has a chance of overloading a server
if you have too many clients like this.
Default: 8
What: /sys/fs/lustre/osc/{connection_name}/max_pages_per_rpc
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Maximum number of pages to fit into a single RPC.
Typically bigger RPCs allow for better performance.
Default: however many pages to form 1M of data (256 pages
for 4K page sized platforms)
What: /sys/fs/lustre/osc/{connection_name}/active
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls accessibility of this connection. If set to 0,
fail all accesses immediately.
What: /sys/fs/lustre/osc/{connection_name}/checksums
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls whenever to checksum bulk RPC data over the wire
to this target.
1: enable (default) ; 0: disable
What: /sys/fs/lustre/osc/{connection_name}/contention_seconds
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls for how long to consider a file contended once
indicated as such by the server.
When a file is considered contended, all operations switch to
synchronous lockless mode to avoid cache and lock pingpong.
What: /sys/fs/lustre/osc/{connection_name}/cur_dirty_bytes
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Displays how many dirty bytes is presently in the cache for this
target.
What: /sys/fs/lustre/osc/{connection_name}/cur_grant_bytes
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows how many bytes we have as a "dirty cache" grant from the
server. Writing a value smaller than shown allows to release
some grant back to the server.
Dirty cache grant is a way Lustre ensures that cached successful
writes on client do not end up discarded by the server due to
lack of space later on.
What: /sys/fs/lustre/osc/{connection_name}/cur_lost_grant_bytes
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Shows how many granted bytes were released to the server due
to lack of write activity on this client.
What: /sys/fs/lustre/osc/{connection_name}/grant_shrink_interval
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of seconds with no write activity for this target
to start releasing dirty grant back to the server.
What: /sys/fs/lustre/osc/{connection_name}/destroys_in_flight
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of DESTROY RPCs currently in flight to this target.
What: /sys/fs/lustre/osc/{connection_name}/lockless_truncate
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls whether lockless truncate RPCs are allowed to this
target.
Lockless truncate causes server to perform the locking which
is beneficial if the truncate is not followed by a write
immediately.
1: enable ; 0: disable (default)
What: /sys/fs/lustre/osc/{connection_name}/max_dirty_mb
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls how much dirty data this client can accumulate
for this target. This is orthogonal to dirty grant and is
a hard limit even if the server would allow a bigger dirty
cache.
While allowing higher dirty cache is beneficial for write
performance, flushing write cache takes longer and as such
the node might be more prone to OOMs.
Having this value set too low might result in not being able
to sent too many parallel WRITE RPCs.
Default: 32
What: /sys/fs/lustre/osc/{connection_name}/resend_count
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Controls how many times to try and resend RPCs to this target
that failed with "recoverable" status, such as EAGAIN,
ENOMEM.
What: /sys/fs/lustre/lov/{connection_name}/numobd
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of OSC targets managed by this LOV instance.
What: /sys/fs/lustre/lov/{connection_name}/activeobd
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of OSC targets managed by this LOV instance that are
actually active.
What: /sys/fs/lustre/lmv/{connection_name}/numobd
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of MDC targets managed by this LMV instance.
What: /sys/fs/lustre/lmv/{connection_name}/activeobd
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Number of MDC targets managed by this LMV instance that are
actually active.
What: /sys/fs/lustre/lmv/{connection_name}/placement
Date: May 2015
Contact: "Oleg Drokin" <oleg.drokin@intel.com>
Description:
Determines policy of inode placement in case of multiple
metadata servers:
CHAR - based on a hash of the file name used at creation time
(Default)
NID - based on a hash of creating client network id.