| =================================== |
| Documentation for /proc/sys/kernel/ |
| =================================== |
| |
| .. See scripts/check-sysctl-docs to keep this up to date |
| |
| |
| Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> |
| |
| Copyright (c) 2009, Shen Feng <shen@cn.fujitsu.com> |
| |
| For general info and legal blurb, please look in |
| Documentation/admin-guide/sysctl/index.rst. |
| |
| ------------------------------------------------------------------------------ |
| |
| This file contains documentation for the sysctl files in |
| ``/proc/sys/kernel/``. |
| |
| The files in this directory can be used to tune and monitor |
| miscellaneous and general things in the operation of the Linux |
| kernel. Since some of the files *can* be used to screw up your |
| system, it is advisable to read both documentation and source |
| before actually making adjustments. |
| |
| Currently, these files might (depending on your configuration) |
| show up in ``/proc/sys/kernel``: |
| |
| .. contents:: :local: |
| |
| |
| acct |
| ==== |
| |
| :: |
| |
| highwater lowwater frequency |
| |
| If BSD-style process accounting is enabled, these values control |
| its behaviour. If free space on the filesystem where the log lives |
| goes below ``lowwater``\ %, accounting suspends. If free space rises |
| to ``highwater``\ % or more, accounting resumes. ``frequency`` determines |
| how often the amount of free space is checked (value is in |
| seconds). Default: |
| |
| :: |
| |
| 4 2 30 |
| |
| That is, suspend accounting if free space drops below 2%; resume it |
| if it increases to at least 4%; consider the information about the |
| amount of free space valid for 30 seconds. |
| |
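| The suspend/resume rule is a simple hysteresis. As a minimal sketch of |
| the documented behaviour (not the kernel's own code), the POSIX shell |
| function below computes the next accounting state; the name |
| ``acct_next_state`` and its arguments are hypothetical: |
| |
```shell
# Hypothetical model of the BSD accounting free-space hysteresis.
# Usage: acct_next_state STATE FREE_PCT [HIGHWATER [LOWWATER]]
#   STATE    - current state: "active" or "suspended"
#   FREE_PCT - free space on the log's filesystem, in whole percent
# The defaults mirror the documented "4 2 30" setting.
acct_next_state() {
    state=$1 free=$2 high=${3:-4} low=${4:-2}
    if [ "$state" = active ] && [ "$free" -lt "$low" ]; then
        echo suspended          # dropped below lowwater
    elif [ "$state" = suspended ] && [ "$free" -ge "$high" ]; then
        echo active             # recovered to at least highwater
    else
        echo "$state"           # between the marks: no change
    fi
}
```
| |
| Between the two marks the state does not change, so accounting does not |
| flap on and off while free space hovers around a single threshold. |
| |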
| |
| acpi_video_flags |
| ================ |
| |
| See Documentation/power/video.rst. This allows the video resume mode to be set, |
| in a similar fashion to the ``acpi_sleep`` kernel parameter, by |
| combining the following values: |
| |
| = ======= |
| 1 s3_bios |
| 2 s3_mode |
| 4 s3_beep |
| = ======= |
| |
| arch |
| ==== |
| |
| The machine hardware name, the same output as ``uname -m`` |
| (e.g. ``x86_64`` or ``aarch64``). |
| |
| auto_msgmni |
| =========== |
| |
| This variable has no effect and may be removed in future kernel |
| releases. Reading it always returns 0. |
| Up to Linux 3.17, it enabled/disabled automatic recomputing of |
| `msgmni`_ upon memory add/remove or upon IPC namespace creation/removal. |
| Echoing "1" into this file enabled automatic recomputing of msgmni; |
| echoing "0" turned it off. The default value was 1. |
| |
| |
| bootloader_type (x86 only) |
| ========================== |
| |
| This gives the bootloader type number as indicated by the bootloader, |
| shifted left by 4, and OR'd with the low four bits of the bootloader |
| version. The reason for this encoding is that this used to match the |
| ``type_of_loader`` field in the kernel header; the encoding is kept for |
| backwards compatibility. That is, if the full bootloader type number |
| is 0x15 and the full version number is 0x234, this file will contain |
| the value 340 = 0x154. |
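| |
| The encoding can be checked with shell arithmetic. This sketch |
| reproduces the example above; the helper names are hypothetical, and it |
| assumes a shell whose ``$(( ))`` accepts hexadecimal constants (bash and |
| dash both do): |
| |
```shell
# Pack a full bootloader type and version into the value this file
# reports: type shifted left by 4, OR'd with the low four bits of the
# version. (Hypothetical helper.)
encode_bootloader_type() {
    echo $(( ($1 << 4) | ($2 & 0xf) ))
}

# Recover the full type number from the file's value. The low nibble
# holds only part of the version; bootloader_version has all of it.
decode_bootloader_type() {
    printf '0x%x\n' $(( $1 >> 4 ))
}

encode_bootloader_type 0x15 0x234   # prints 340 (0x154)
decode_bootloader_type 340          # prints 0x15
```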
| |
| See the ``type_of_loader`` and ``ext_loader_type`` fields in |
| Documentation/arch/x86/boot.rst for additional information. |
| |
| |
| bootloader_version (x86 only) |
| ============================= |
| |
| The complete bootloader version number. In the example above, this |
| file will contain the value 564 = 0x234. |
| |
| See the ``type_of_loader`` and ``ext_loader_ver`` fields in |
| Documentation/arch/x86/boot.rst for additional information. |
| |
| |
| bpf_stats_enabled |
| ================= |
| |
| Controls whether the kernel should collect statistics on BPF programs |
| (total time spent running, number of times run...). Enabling |
| statistics causes a slight reduction in performance on each program |
| run. The statistics can be seen using ``bpftool``. |
| |
| = =================================== |
| 0 Don't collect statistics (default). |
| 1 Collect statistics. |
| = =================================== |
| |
| |
| cad_pid |
| ======= |
| |
| This is the pid which will be signalled on reboot (notably, by |
| Ctrl-Alt-Delete). Writing a value to this file which doesn't |
| correspond to a running process will result in ``-ESRCH``. |
| |
| See also `ctrl-alt-del`_. |
| |
| |
| cap_last_cap |
| ============ |
| |
| Highest valid capability of the running kernel. Exports |
| ``CAP_LAST_CAP`` from the kernel. |
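| |
| A common use of this value is to build a mask of every capability the |
| running kernel knows about, for example to compare with the ``Cap*`` |
| lines of ``/proc/<pid>/status``. A sketch, with a hypothetical function |
| name and assuming 64-bit shell arithmetic: |
| |
```shell
# Print a mask with bits 0..cap_last_cap set, in the 16-digit hex format
# that /proc/<pid>/status uses for its Cap* fields.
# Only valid while cap_last_cap < 63 (1 << 64 would overflow).
full_cap_mask() {
    last=$1    # e.g. $(cat /proc/sys/kernel/cap_last_cap)
    printf '%016x\n' $(( (1 << ($last + 1)) - 1 ))
}
```
| |
| For example, on a kernel whose last capability is 40, |
| ``full_cap_mask 40`` prints ``000001ffffffffff``. |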
| |
| |
| .. _core_pattern: |
| |
| core_pattern |
| ============ |
| |
| ``core_pattern`` is used to specify a core dumpfile pattern name. |
| |
| * max length 127 characters; default value is "core" |
| * ``core_pattern`` is used as a pattern template for the output |
| filename; certain string patterns (beginning with '%') are |
| substituted with their actual values. |
| * backward compatibility with ``core_uses_pid``: |
| |
| If ``core_pattern`` does not include "%p" (default does not) |
| and ``core_uses_pid`` is set, then .PID will be appended to |
| the filename. |
| |
| * corename format specifiers |
| |
| ======== ========================================== |
| %<NUL> '%' is dropped |
| %% output one '%' |
| %p pid |
| %P global pid (init PID namespace) |
| %i tid |
| %I global tid (init PID namespace) |
| %u uid (in initial user namespace) |
| %g gid (in initial user namespace) |
| %d dump mode, matches ``PR_SET_DUMPABLE`` and |
| ``/proc/sys/fs/suid_dumpable`` |
| %s signal number |
| %t UNIX time of dump |
| %h hostname |
| %e executable filename (may be shortened, could be changed by prctl etc) |
| %f executable filename |
| %E executable path |
| %c maximum size of core file by resource limit RLIMIT_CORE |
| %C CPU the task ran on |
| %<OTHER> both are dropped |
| ======== ========================================== |
| |
| * If the first character of the pattern is a '|', the kernel will treat |
| the rest of the pattern as a command to run. The core dump will be |
| written to the standard input of that program instead of to a file. |
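| |
| To illustrate the substitution, the following POSIX shell function |
| expands a small subset of the specifiers (``%%``, ``%p``, ``%s``, |
| ``%e``, ``%h``) from values supplied by the caller. It is a sketch only: |
| ``expand_core_pattern`` and its argument order are hypothetical, and it |
| handles neither the remaining specifiers nor '|' patterns. |
| |
```shell
# Usage: expand_core_pattern PATTERN PID SIGNAL EXE HOST
expand_core_pattern() {
    pat=$1 pid=$2 sig=$3 exe=$4 host=$5
    out=
    while [ -n "$pat" ]; do
        c=${pat%"${pat#?}"}      # first character
        pat=${pat#?}             # rest of the pattern
        if [ "$c" != % ]; then
            out=$out$c           # ordinary character: copy it
            continue
        fi
        spec=${pat%"${pat#?}"}   # character after '%' (empty at the end)
        pat=${pat#?}
        case $spec in
            %) out=$out% ;;
            p) out=$out$pid ;;
            s) out=$out$sig ;;
            e) out=$out$exe ;;
            h) out=$out$host ;;
            *) ;;                # trailing '%' or unsupported: both dropped
        esac
    done
    printf '%s\n' "$out"
}
```
| |
| For example, ``expand_core_pattern 'core.%e.%p' 1234 11 myprog box`` |
| prints ``core.myprog.1234``. |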
| |
| |
| core_pipe_limit |
| =============== |
| |
| This sysctl is only applicable when `core_pattern`_ is configured to |
| pipe core files to a user space helper (when the first character of |
| ``core_pattern`` is a '|', see above). |
| |
| When collecting cores via a pipe to an application, it is occasionally |
| useful for the collecting application to gather data about the |
| crashing process from its ``/proc/pid`` directory. In order to do this |
| safely, the kernel must wait for the collecting process to exit, so as |
| not to remove the crashing process's proc files prematurely. This in |
| turn creates the possibility that a misbehaving userspace collecting |
| process can block the reaping of a crashed process simply by never |
| exiting. |
| |
| This sysctl defends against that. It defines how many crashing |
| processes may be piped to user space applications in parallel. If this |
| value is exceeded, then those crashing processes above that value are |
| noted via the kernel log and their cores are skipped. |
| |
| 0 is a special value, indicating that an unlimited number of processes |
| may be captured in parallel, but that no waiting will take place (i.e. |
| the collecting process is not guaranteed access to |
| ``/proc/<crashing pid>/``). This value defaults to 0. |
| |
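| The admission rule can be modelled as below. This is a sketch of the |
| documented behaviour; the function name, and the convention that the |
| in-flight count includes the new crash, are assumptions for |
| illustration: |
| |
```shell
# Usage: pipe_dump_decision LIMIT IN_FLIGHT
#   LIMIT     - value of core_pipe_limit
#   IN_FLIGHT - piped dumps currently in progress, counting this one
pipe_dump_decision() {
    limit=$1 in_flight=$2
    if [ "$limit" -eq 0 ]; then
        echo "dump, no wait"           # unlimited; /proc/<pid> may vanish
    elif [ "$in_flight" -le "$limit" ]; then
        echo "dump, wait for helper"   # kernel waits for the collector
    else
        echo "skip, log"               # over the limit: core skipped
    fi
}
```
| |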
| |
| core_uses_pid |
| ============= |
| |
| The default coredump filename is "core". By setting |
| ``core_uses_pid`` to 1, the coredump filename becomes core.PID. |
| If `core_pattern`_ does not include "%p" (default does not) |
| and ``core_uses_pid`` is set, then .PID will be appended to |
| the filename. |
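| |
| For a pattern without other specifiers the rule reduces to a check for |
| ``%p``. In this sketch the helper name is hypothetical, and the check is |
| a plain substring match, so unlike the kernel it would also treat a |
| literal ``%%p`` as containing ``%p``: |
| |
```shell
# Usage: core_file_name PATTERN CORE_USES_PID PID
# PATTERN may contain %p but no other specifiers.
core_file_name() {
    pattern=$1 uses_pid=$2 pid=$3
    case $pattern in
        *%p*)
            # %p already places the PID: substitute its first occurrence
            # and never append ".PID".
            printf '%s\n' "${pattern%%\%p*}$pid${pattern#*\%p}" ;;
        *)
            if [ "$uses_pid" -eq 1 ]; then
                printf '%s\n' "$pattern.$pid"
            else
                printf '%s\n' "$pattern"
            fi ;;
    esac
}
```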
| |
| |
| ctrl-alt-del |
| ============ |
| |
| When the value in this file is 0, ctrl-alt-del is trapped and |
| sent to the ``init(1)`` program to handle a graceful restart. |
| When, however, the value is > 0, Linux's reaction to a Vulcan |
| Nerve Pinch (tm) will be an immediate reboot, without even |
| syncing its dirty buffers. |
| |
| Note: |
| when a program (like dosemu) has the keyboard in 'raw' |
| mode, the ctrl-alt-del is intercepted by the program before it |
| ever reaches the kernel tty layer, and it's up to the program |
| to decide what to do with it. |
| |
| |
| dmesg_restrict |
| ============== |
| |
| This toggle indicates whether unprivileged users are prevented |
| from using ``dmesg(8)`` to view messages from the kernel's log |
| buffer. |
| When ``dmesg_restrict`` is set to 0 there are no restrictions. |
| When ``dmesg_restrict`` is set to 1, users must have |
| ``CAP_SYSLOG`` to use ``dmesg(8)``. |
| |
| The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the |
| default value of ``dmesg_restrict``. |
| |
| |
| domainname & hostname |
| ===================== |
| |
| These files can be used to set the NIS/YP domainname and the |
| hostname of your box in exactly the same way as the commands |
| domainname and hostname, i.e.:: |
| |
| # echo "darkstar" > /proc/sys/kernel/hostname |
| # echo "mydomain" > /proc/sys/kernel/domainname |
| |
| has the same effect as:: |
| |
| # hostname "darkstar" |
| # domainname "mydomain" |
| |
| Note, however, that the classic darkstar.frop.org has the |
| hostname "darkstar" and DNS (Internet Domain Name System) |
| domainname "frop.org", not to be confused with the NIS (Network |
| Information Service) or YP (Yellow Pages) domainname. These two |
| domain names are in general different. For a detailed discussion |
| see the ``hostname(1)`` man page. |
| |
| |
| firmware_config |
| =============== |
| |
| See Documentation/driver-api/firmware/fallback-mechanisms.rst. |
| |
| The entries in this directory allow the firmware loader helper |
| fallback to be controlled: |
| |
| * ``force_sysfs_fallback``, when set to 1, forces the use of the |
| fallback; |
| * ``ignore_sysfs_fallback``, when set to 1, ignores any fallback. |
| |
| |
| ftrace_dump_on_oops |
| =================== |
| |
| Determines whether ``ftrace_dump()`` should be called on an oops (or |
| kernel panic). This will output the contents of the ftrace buffers to |
| the console. This is very useful for capturing traces that lead to |
| crashes and outputting them to a serial console. |
| |
| ======================= =========================================== |
| 0 Disabled (default). |
| 1 Dump buffers of all CPUs. |
| 2(orig_cpu) Dump the buffer of the CPU that triggered the |
| oops. |
| <instance> Dump the specific instance buffer on all CPUs. |
| <instance>=2(orig_cpu) Dump the specific instance buffer on the CPU |
| that triggered the oops. |
| ======================= =========================================== |
| |
| Dumping multiple instances is also supported; separate the instance |
| names with commas. If the global buffer also needs to be dumped, |
| specify its dump mode (1/2/orig_cpu) first. |
| |
| For example, to dump the "foo" and "bar" instance buffers on all CPUs, |
| use:: |
| |
| echo "foo,bar" > /proc/sys/kernel/ftrace_dump_on_oops |
| |
| To dump the global buffer and the "foo" instance buffer on all CPUs, |
| along with the "bar" instance buffer on the CPU that triggered the |
| oops, use:: |
| |
| echo "1,foo,bar=2" > /proc/sys/kernel/ftrace_dump_on_oops |
| |
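| The following sketch splits such a value on commas and describes what |
| each entry selects, following the table above. The function name is |
| hypothetical, and it does not check the rule that the global dump mode |
| must come first: |
| |
```shell
# Usage: describe_ftrace_dump VALUE
describe_ftrace_dump() {
    old_ifs=$IFS
    IFS=,
    set -f          # instance names must not be glob-expanded
    set -- $1       # split the value on commas
    set +f
    IFS=$old_ifs
    for tok in "$@"; do
        case $tok in
            0)              echo "global: disabled" ;;
            1)              echo "global: all CPUs" ;;
            2|orig_cpu)     echo "global: oops CPU only" ;;
            *=2|*=orig_cpu) echo "${tok%%=*}: oops CPU only" ;;
            *)              echo "$tok: all CPUs" ;;
        esac
    done
}
```
| |
| For ``1,foo,bar=2`` this prints one line each for the global buffer |
| (all CPUs), "foo" (all CPUs) and "bar" (oops CPU only). |
| |
| |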
| ftrace_enabled, stack_tracer_enabled |
| ==================================== |
| |
| See Documentation/trace/ftrace.rst. |
| |
| |
| hardlockup_all_cpu_backtrace |
| ============================ |
| |
| This value controls whether the hard lockup detector gathers further |
| debug information when a hard lockup condition is detected. If |
| enabled, arch-specific all-CPU stack dumping will be initiated. |
| |
| = ============================================ |
| 0 Do nothing. This is the default behavior. |
| 1 On detection capture more debug information. |
| = ============================================ |
| |
| |
| hardlockup_panic |
| ================ |
| |
| This parameter can be used to control whether the kernel panics |
| when a hard lockup is detected. |
| |
| = =========================== |
| 0 Don't panic on hard lockup. |
| 1 Panic on hard lockup. |
| = =========================== |
| |
| See Documentation/admin-guide/lockup-watchdogs.rst for more information. |
| This can also be set using the nmi_watchdog kernel parameter. |
| |
| |
| hotplug |
| ======= |
| |
| Path for the hotplug policy agent. |
| Default value is ``CONFIG_UEVENT_HELPER_PATH``, which in turn defaults |
| to the empty string. |
| |
| This file only exists when ``CONFIG_UEVENT_HELPER`` is enabled. Most |
| modern systems rely exclusively on the netlink-based uevent source and |
| don't need this. |
| |
| |
| hung_task_all_cpu_backtrace |
| =========================== |
| |
| If this option is set, the kernel will send an NMI to all CPUs to dump |
| their backtraces when a hung task is detected. This file shows up if |
| ``CONFIG_DETECT_HUNG_TASK`` and ``CONFIG_SMP`` are enabled. |
| |
| 0: Don't show the backtraces of all CPUs when a hung task is detected. |
| This is the default behavior. |
| |
| 1: Non-maskably interrupt all CPUs and dump their backtraces when |
| a hung task is detected. |
| |
| |
| hung_task_panic |
| =============== |
| |
| Controls the kernel's behavior when a hung task is detected. |
| This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. |
| |
| = ================================================= |
| 0 Continue operation. This is the default behavior. |
| 1 Panic immediately. |
| = ================================================= |
| |
| |
| hung_task_check_count |
| ===================== |
| |
| The upper bound on the number of tasks that are checked. |
| This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. |
| |
| |
| hung_task_timeout_secs |
| ====================== |
| |
| When a task in D state (uninterruptible sleep) has not been scheduled |
| for more than this many seconds, report a warning. |
| This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. |
| |
| 0 means an infinite timeout: no checking is done. |
| |
| Possible values to set are in range {0:``LONG_MAX``/``HZ``}. |
| |
| |
| hung_task_check_interval_secs |
| ============================= |
| |
| Hung task check interval. If hung task checking is enabled |
| (see `hung_task_timeout_secs`_), the check is done every |
| ``hung_task_check_interval_secs`` seconds. |
| This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. |
| |
| 0 (default) means use ``hung_task_timeout_secs`` as the checking |
| interval. |
| |
| Possible values to set are in range {0:``LONG_MAX``/``HZ``}. |
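| |
| How the two settings combine can be summarised as below; this sketch |
| follows the documentation above rather than the kernel source, and the |
| function name is hypothetical: |
| |
```shell
# Usage: hung_task_schedule TIMEOUT_SECS INTERVAL_SECS
hung_task_schedule() {
    timeout=$1 interval=$2
    if [ "$timeout" -eq 0 ]; then
        echo "no checking"                  # infinite timeout
    elif [ "$interval" -eq 0 ]; then
        # interval 0: reuse the timeout as the checking interval
        echo "check every ${timeout}s, warn after ${timeout}s"
    else
        echo "check every ${interval}s, warn after ${timeout}s"
    fi
}
```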
| |
| |
| hung_task_warnings |
| ================== |
| |
| The maximum number of warnings to report. Each time a hung task is |
| detected during a check interval, this value is decreased by 1. |
| When this value reaches 0, no more warnings will be reported. |
| This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. |
| |
| -1: report an infinite number of warnings. |
| |
| |
| hyperv_record_panic_msg |
| ======================= |
| |
| Controls whether the panic kmsg data should be reported to Hyper-V. |
| |
| = ========================================================= |
| 0 Do not report panic kmsg data. |
| 1 Report the panic kmsg data. This is the default behavior. |
| = ========================================================= |
| |
| |
| ignore-unaligned-usertrap |
| ========================= |
| |
| On architectures where unaligned accesses cause traps, and where this |
| feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``; |
| currently, ``arc``, ``parisc`` and ``loongarch``), controls whether all |
| unaligned traps are logged. |
| |
| = ============================================================= |
| 0 Log all unaligned accesses. |
| 1 Only warn the first time a process traps. This is the default |
| setting. |
| = ============================================================= |
| |
| See also `unaligned-trap`_. |
| |
| io_uring_disabled |
| ================= |
| |
| Restricts the creation of new io_uring instances, either for all |
| processes or only for unprivileged processes outside ``io_uring_group``. |
| Enabling this shrinks the kernel's attack surface. |
| |
| = ====================================================================== |
| 0 All processes can create io_uring instances as normal. This is the |
| default setting. |
| 1 io_uring creation is disabled (io_uring_setup() will fail with |
| -EPERM) for unprivileged processes not in the io_uring_group group. |
| Existing io_uring instances can still be used. See the |
| documentation for io_uring_group for more information. |
| 2 io_uring creation is disabled for all processes. io_uring_setup() |
| always fails with -EPERM. Existing io_uring instances can still be |
| used. |
| = ====================================================================== |
| |
| |
| io_uring_group |
| ============== |
| |
| When io_uring_disabled is set to 1, a process must either be |
| privileged (CAP_SYS_ADMIN) or be in the io_uring_group group in order |
| to create an io_uring instance. If io_uring_group is set to -1 (the |
| default), only processes with the CAP_SYS_ADMIN capability may create |
| io_uring instances. |
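| |
| Taken together, ``io_uring_disabled`` and ``io_uring_group`` give the |
| decision below. This is a sketch of the documented rules with a |
| hypothetical function name; in every case, existing io_uring instances |
| remain usable: |
| |
```shell
# Usage: io_uring_may_create DISABLED IS_ADMIN IN_GROUP
#   DISABLED - value of io_uring_disabled (0, 1 or 2)
#   IS_ADMIN - 1 if the process has CAP_SYS_ADMIN
#   IN_GROUP - 1 if the process is in io_uring_group
#              (always 0 when io_uring_group is -1)
io_uring_may_create() {
    disabled=$1 is_admin=$2 in_group=$3
    case $disabled in
        0)
            echo yes ;;
        1)
            if [ "$is_admin" -eq 1 ] || [ "$in_group" -eq 1 ]; then
                echo yes
            else
                echo "no (EPERM)"
            fi ;;
        *)
            echo "no (EPERM)" ;;   # 2: disabled for everyone
    esac
}
```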
| |
| |
| kexec_load_disabled |
| =================== |
| |
| A toggle indicating if the syscalls ``kexec_load`` and |
| ``kexec_file_load`` have been disabled. |
| This value defaults to 0 (false: ``kexec_*load`` enabled), but can be |
| set to 1 (true: ``kexec_*load`` disabled). |
| Once true, kexec can no longer be used, and the toggle cannot be set |
| back to false. |
| This allows a kexec image to be loaded before disabling the syscall, |
| allowing a system to set up (and later use) an image without it being |
| altered. |
| Generally used together with the `modules_disabled`_ sysctl. |
| |
| kexec_load_limit_panic |
| ====================== |
| |
| This parameter specifies a limit to the number of times the syscalls |
| ``kexec_load`` and ``kexec_file_load`` can be called with a crash |
| image. It can only be set to a more restrictive value than the |
| current one. |
| |
| == ====================================================== |
| -1 Unlimited calls to kexec. This is the default setting. |
| N Number of calls left. |
| == ====================================================== |
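| |
| The write rule can be sketched as follows. The function name is |
| hypothetical, and the sketch assumes that "more restrictive" means |
| strictly smaller and that -1 (unlimited) cannot be written back once a |
| limit is set: |
| |
```shell
# Usage: kexec_limit_write CURRENT NEW
# Prints the resulting limit, or "rejected" if the write is refused.
kexec_limit_write() {
    cur=$1 new=$2
    if [ "$new" -lt 0 ]; then
        echo rejected       # assumption: -1 cannot be restored
    elif [ "$cur" -eq -1 ] || [ "$new" -lt "$cur" ]; then
        echo "$new"         # more restrictive: accepted
    else
        echo rejected       # equal or looser: refused (assumption)
    fi
}
```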
| |
| kexec_load_limit_reboot |
| ======================= |
| |
| Similar functionality to ``kexec_load_limit_panic``, but for a normal |
| image. |
| |
| kptr_restrict |
| ============= |
| |
| This toggle indicates whether restrictions are placed on |
| exposing kernel addresses via ``/proc`` and other interfaces. |
| |
| When ``kptr_restrict`` is set to 0 (the default), the address is hashed |
| before printing. |
| (This is equivalent to %p.) |
| |
| When ``kptr_restrict`` is set to 1, kernel pointers printed using the |
| %pK format specifier will be replaced with 0s unless the user has |
| ``CAP_SYSLOG`` and effective user and group IDs are equal to the real |
| IDs. |
| This is because %pK checks are done at read() time rather than open() |
| time, so if permissions are elevated between the open() and the read() |
| (e.g. via a setuid binary) then %pK will not leak kernel pointers to |
| unprivileged users. |
| Note, this is a temporary solution only. |
| The correct long-term solution is to do the permission checks at |
| open() time. |
| Consider removing world read permissions from files that use %pK, and |
| using `dmesg_restrict`_ to protect against uses of %pK in ``dmesg(8)`` |
| if leaking kernel pointer values to unprivileged users is a concern. |
| |
| When ``kptr_restrict`` is set to 2, kernel pointers printed using |
| %pK will be replaced with 0s regardless of privileges. |
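| |
| What a %pK pointer prints, as described above, is summarised by this |
| sketch; the function name and its 0/1 arguments are hypothetical: |
| |
```shell
# Usage: pK_output KPTR_RESTRICT HAS_CAP_SYSLOG IDS_MATCH
#   IDS_MATCH - 1 if effective uid/gid equal the real uid/gid
pK_output() {
    level=$1 cap=$2 ids=$3
    case $level in
        0) echo hashed ;;                  # same as %p
        1)
            if [ "$cap" -eq 1 ] && [ "$ids" -eq 1 ]; then
                echo "real address"
            else
                echo zeros
            fi ;;
        2) echo zeros ;;                   # regardless of privileges
        *) echo invalid ;;
    esac
}
```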
| |
| |
| modprobe |
| ======== |
| |
| The full path to the usermode helper for autoloading kernel modules, |
| by default ``CONFIG_MODPROBE_PATH``, which in turn defaults to |
| "/sbin/modprobe". This binary is executed when the kernel requests a |
| module. For example, if userspace passes an unknown filesystem type |
| to mount(), then the kernel will automatically request the |
| corresponding filesystem module by executing this usermode helper. |
| This usermode helper should insert the needed module into the kernel. |
| |
| This sysctl only affects module autoloading. It has no effect on the |
| ability to explicitly insert modules. |
| |
| This sysctl can be used to debug module loading requests:: |
| |
| echo '#! /bin/sh' > /tmp/modprobe |
| echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe |
| echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe |
| chmod a+x /tmp/modprobe |
| echo /tmp/modprobe > /proc/sys/kernel/modprobe |
| |
| Alternatively, if this sysctl is set to the empty string, then module |
| autoloading is completely disabled. The kernel will not try to |
| execute a usermode helper at all, nor will it call the |
| kernel_module_request LSM hook. |
| |
| If CONFIG_STATIC_USERMODEHELPER=y is set in the kernel configuration, |
| then the configured static usermode helper overrides this sysctl, |
| except that the empty string is still accepted to completely disable |
| module autoloading as described above. |
| |
| modules_disabled |
| ================ |
| |
| A toggle value indicating if modules are allowed to be loaded |
| in an otherwise modular kernel. This toggle defaults to off |
| (0), but can be set to true (1). Once true, modules can be |
| neither loaded nor unloaded, and the toggle cannot be set back |
| to false. Generally used with the `kexec_load_disabled`_ toggle. |
| |
| |
| .. _msgmni: |
| |
| msgmax, msgmnb, and msgmni |
| ========================== |
| |
| ``msgmax`` is the maximum size of an IPC message, in bytes. 8192 by |
| default (``MSGMAX``). |
| |
| ``msgmnb`` is the maximum size of an IPC queue, in bytes. 16384 by |
| default (``MSGMNB``). |
| |
| ``msgmni`` is the maximum number of IPC queues. 32000 by default |
| (``MSGMNI``). |
| |
| All of these parameters are set per IPC namespace. The maximum number of bytes |
| in POSIX message queues is limited by ``RLIMIT_MSGQUEUE``. This limit is |
| respected hierarchically in each user namespace. |
| |
| msg_next_id, sem_next_id, and shm_next_id (System V IPC) |
| ======================================================== |
| |
| These three toggles allow specifying the desired ID for the next allocated |
| IPC object: message queue, semaphore set, or shared memory segment, respectively. |
| |
| By default they are equal to -1, which means generic allocation logic. |
| Possible values to set are in range {0:``INT_MAX``}. |
| |
| Notes: |
| 1) The kernel doesn't guarantee that the new object will have the desired |
| ID, so it's up to userspace how to handle an object with a "wrong" ID. |
| 2) A toggle with a non-default value will be reset to -1 by the kernel |
| after a successful IPC object allocation. If an IPC object allocation |
| syscall fails, it is undefined whether the value remains unmodified or |
| is reset to -1. |
| |
| |
| ngroups_max |
| =========== |
| |
| Maximum number of supplementary groups, i.e. the maximum size which |
| ``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel. |
| |
| |
| |
| nmi_watchdog |
| ============ |
| |
| This parameter can be used to control the NMI watchdog |
| (i.e. the hard lockup detector) on x86 systems. |
| |
| = ================================= |
| 0 Disable the hard lockup detector. |
| 1 Enable the hard lockup detector. |
| = ================================= |
| |
| The hard lockup detector monitors each CPU for its ability to respond to |
| timer interrupts. The mechanism utilizes CPU performance counter registers |
| that are programmed to generate Non-Maskable Interrupts (NMIs) periodically |
| while a CPU is busy. Hence, the alternative name 'NMI watchdog'. |
| |
| The NMI watchdog is disabled by default if the kernel is running as a guest |
| in a KVM virtual machine. This default can be overridden by adding:: |
| |
| nmi_watchdog=1 |
| |
| to the guest kernel command line (see |
| Documentation/admin-guide/kernel-parameters.rst). |
| |
| |
| nmi_wd_lpm_factor (PPC only) |
| ============================ |
| |
| Factor to apply to the NMI watchdog timeout (only when ``nmi_watchdog`` is |
| set to 1). This factor represents the percentage added to |
| ``watchdog_thresh`` when calculating the NMI watchdog timeout during an |
| LPM (Live Partition Migration). The soft lockup timeout is not impacted. |
| |
| A value of 0 means no change. The default value is 200, meaning the NMI |
| watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10). |
| |
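| The 30s figure follows from adding the factor as a percentage of |
| ``watchdog_thresh``. A sketch of that arithmetic in whole seconds (the |
| function name is hypothetical, and the kernel may compute the timeout |
| in finer units): |
| |
```shell
# Usage: lpm_nmi_timeout WATCHDOG_THRESH NMI_WD_LPM_FACTOR
lpm_nmi_timeout() {
    thresh=$1 factor=$2
    # The factor is a percentage *added* to watchdog_thresh.
    echo $(( $thresh + $thresh * $factor / 100 ))
}

lpm_nmi_timeout 10 200   # prints 30
```
| |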
| |
| numa_balancing |
| ============== |
| |
| Enables/disables and configures automatic page fault based NUMA memory |
| balancing. Memory is moved automatically to nodes that access it often. |
| The value to set can be the result of ORing the following: |
| |
| = ================================= |
| 0 NUMA_BALANCING_DISABLED |
| 1 NUMA_BALANCING_NORMAL |
| 2 NUMA_BALANCING_MEMORY_TIERING |
| = ================================= |
| |
| Setting NUMA_BALANCING_NORMAL optimizes page placement among different |
| NUMA nodes to reduce remote accesses. On NUMA machines, there is a |
| performance penalty if remote memory is accessed by a CPU. When this |
| feature is enabled, the kernel samples which task is accessing |
| memory by periodically unmapping pages and later trapping a page |
| fault. At the time of the page fault, it is determined if the data |
| being accessed should be migrated to a local memory node. |
| |
| The unmapping of pages and trapping faults incur additional overhead that |
| ideally is offset by improved memory locality, but there is no universal |
| guarantee. If the target workload is already bound to NUMA nodes, then this |
| feature should be disabled. |
| |
| Setting NUMA_BALANCING_MEMORY_TIERING optimizes page placement among |
| different types of memory (represented as different NUMA nodes) to |
| place hot pages in fast memory. This is also implemented based on |
| unmapping and page faults. |
| |
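| Because the value is a bitmask, enabling both modes means writing 3 |
| (1 | 2). This sketch, with a hypothetical function name, decodes a value |
| into the modes it enables: |
| |
```shell
# Usage: numa_balancing_modes VALUE
numa_balancing_modes() {
    v=$1
    [ "$v" -eq 0 ] && { echo NUMA_BALANCING_DISABLED; return 0; }
    [ $(( $v & 1 )) -ne 0 ] && echo NUMA_BALANCING_NORMAL
    [ $(( $v & 2 )) -ne 0 ] && echo NUMA_BALANCING_MEMORY_TIERING
    return 0
}
```
| |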
| numa_balancing_promote_rate_limit_MBps |
| ====================================== |
| |
| Too high promotion/demotion throughput between different memory types |
| may hurt application latency. This can be used to rate limit the |
| promotion throughput. The per-node max promotion throughput in MB/s |
| will be limited to be no more than the set value. |
| |
| A rule of thumb is to set this to less than 1/10 of the PMEM node |
| write bandwidth. |
| |
| oops_all_cpu_backtrace |
| ====================== |
| |
| If this option is set, the kernel will send an NMI to all CPUs to dump |
| their backtraces when an oops event occurs. It should be used as a last |
| resort in case a panic cannot be triggered (for example, to protect |
| running VMs) or kdump can't be collected. This file shows up if |
| ``CONFIG_SMP`` is enabled. |
| |
| 0: Don't show the backtraces of all CPUs when an oops is detected. |
| This is the default behavior. |
| |
| 1: Non-maskably interrupt all CPUs and dump their backtraces when |
| an oops event is detected. |
| |
| |
| oops_limit |
| ========== |
| |
| Number of kernel oopses after which the kernel should panic when |
| ``panic_on_oops`` is not set. Setting this to 0 disables checking |
| the count. Setting this to 1 has the same effect as setting |
| ``panic_on_oops=1``. The default value is 10000. |
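| |
| The limit check can be sketched as follows; the function name is |
| hypothetical, and the sketch assumes the count includes the oops that |
| just occurred, which is what makes a limit of 1 equivalent to |
| ``panic_on_oops=1``: |
| |
```shell
# Usage: oops_panics OOPS_COUNT OOPS_LIMIT
oops_panics() {
    count=$1 limit=$2
    if [ "$limit" -ne 0 ] && [ "$count" -ge "$limit" ]; then
        echo panic
    else
        echo continue      # limit 0 disables the check
    fi
}
```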
| |
| |
| osrelease, ostype & version |
| =========================== |
| |
| :: |
| |
| # cat osrelease |
| 2.1.88 |
| # cat ostype |
| Linux |
| # cat version |
| #5 Wed Feb 25 21:49:24 MET 1998 |
| |
| The files ``osrelease`` and ``ostype`` should be clear enough. |
| ``version`` needs a little more clarification, however. The '#5' means |
| that this is the fifth kernel built from this source base, and the |
| date behind it indicates the time the kernel was built. |
| The only way to tune these values is to rebuild the kernel :-) |
| |
| |
| overflowgid & overflowuid |
| ========================= |
| |
| If your architecture did not always support 32-bit UIDs (i.e. arm, |
| i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to |
| applications that use the old 16-bit UID/GID system calls, if the |
| actual UID or GID would exceed 65535. |
| |
| These sysctls allow you to change the value of the fixed UID and GID. |
| The default is 65534. |
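| |
| The mapping seen through the old 16-bit calls can be sketched as |
| follows; the function name is hypothetical and the default mirrors the |
| 65534 above: |
| |
```shell
# Usage: uid16_view UID [OVERFLOWUID]
uid16_view() {
    id=$1 overflow=${2:-65534}
    if [ "$id" -gt 65535 ]; then
        echo "$overflow"   # does not fit in 16 bits
    else
        echo "$id"
    fi
}
```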
| |
| |
| panic |
| ===== |
| |
| The value in this file determines the behaviour of the kernel on a |
| panic: |
| |
| * if zero, the kernel will loop forever; |
| * if negative, the kernel will reboot immediately; |
| * if positive, the kernel will reboot after the corresponding number |
| of seconds. |
| |
| When you use the software watchdog, the recommended setting is 60. |
| |
| |
| panic_on_io_nmi |
| =============== |
| |
| Controls the kernel's behavior when a CPU receives an NMI caused by |
| an IO error. |
| |
| = ================================================================== |
| 0 Try to continue operation (default). |
| 1 Panic immediately. The IO error triggered an NMI. This indicates a |
| serious system condition which could result in IO data corruption. |
| Rather than continuing, panicking might be a better choice. Some |
| servers issue this sort of NMI when the dump button is pushed, |
| and you can use this option to take a crash dump. |
| = ================================================================== |
| |
| |
| panic_on_oops |
| ============= |
| |
| Controls the kernel's behaviour when an oops or BUG is encountered. |
| |
| = =================================================================== |
| 0 Try to continue operation. |
| 1 Panic immediately. If the `panic`_ sysctl is also non-zero then the |
| machine will be rebooted. |
| = =================================================================== |
| |
| |
| panic_on_stackoverflow |
| ====================== |
| |
| Controls the kernel's behavior when it detects an overflow of the |
| kernel, IRQ, or exception stacks (user stacks are not covered). |
| This file shows up if ``CONFIG_DEBUG_STACKOVERFLOW`` is enabled. |
| |
| = ========================== |
| 0 Try to continue operation. |
| 1 Panic immediately. |
| = ========================== |
| |
| |
| panic_on_unrecovered_nmi |
| ======================== |
| |
| The default Linux behaviour on an NMI caused by a memory error or of |
| unknown origin is to continue operation. For many environments such as |
| scientific computing, it is preferable that the box is taken out and the |
| error dealt with rather than letting an uncorrected parity/ECC error |
| propagate. |
| |
| A small number of systems do generate NMIs for bizarre random reasons |
| such as power management, so the default is off. This sysctl works like |
| the existing panic controls already in this directory. |
| |
| |
| panic_on_warn |
| ============= |
| |
| Calls panic() in the WARN() path when set to 1. This is useful to avoid |
| a kernel rebuild when attempting to kdump at the location of a WARN(). |
| |
| = ================================================ |
| 0 Only WARN(), default behaviour. |
| 1 Call panic() after printing out WARN() location. |
| = ================================================ |
| |
| |
| panic_print |
| =========== |
| |
| Bitmask for printing system info when panic happens. Users can choose a |
| combination of the following bits: |
| |
| ===== ============================================ |
| bit 0 print all tasks info |
| bit 1 print system memory info |
| bit 2 print timer info |
| bit 3 print locks info if ``CONFIG_LOCKDEP`` is on |
| bit 4 print ftrace buffer |
| bit 5 print all printk messages in buffer |
| bit 6 print all CPUs backtrace (if available in the arch) |
| bit 7 print only tasks in uninterruptible (blocked) state |
| ===== ============================================ |
| |
| For example, to print task and memory info on panic, use:: |
| |
| echo 3 > /proc/sys/kernel/panic_print |
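| |
| Building the mask from named bits avoids arithmetic mistakes. In this |
| sketch both the function and the bit names are hypothetical labels for |
| the table rows above: |
| |
```shell
# Usage: panic_print_mask NAME...
panic_print_mask() {
    mask=0
    for name in "$@"; do
        case $name in
            tasks)         bit=0 ;;
            memory)        bit=1 ;;
            timers)        bit=2 ;;
            locks)         bit=3 ;;   # needs CONFIG_LOCKDEP
            ftrace)        bit=4 ;;
            printk)        bit=5 ;;
            all_cpu_bt)    bit=6 ;;
            blocked_tasks) bit=7 ;;
            *) echo "unknown: $name" >&2; return 1 ;;
        esac
        mask=$(( $mask | (1 << $bit) ))
    done
    echo "$mask"
}

panic_print_mask tasks memory   # prints 3
```
| |
| ``panic_print_mask tasks memory`` prints 3, the value used in the |
| example above. |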
| |
| |
| panic_on_rcu_stall |
| ================== |
| |
| When set to 1, calls panic() after RCU stall detection messages. This |
| is useful to determine the root cause of RCU stalls using a vmcore. |
| |
| = ============================================================ |
| 0 Do not panic() when RCU stall takes place, default behavior. |
| 1 panic() after printing RCU stall messages. |
| = ============================================================ |
| |
| max_rcu_stall_to_panic |
| ====================== |
| |
| When ``panic_on_rcu_stall`` is set to 1, this value determines the |
| number of times that RCU can stall before panic() is called. |
| |
| When ``panic_on_rcu_stall`` is set to 0, this value has no effect. |
| |
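| The interaction of the two settings can be sketched as follows. The sketch |
| assumes that each detected stall increments a counter, which is then compared |
| with ``max_rcu_stall_to_panic``. ``rcu_stall_would_panic`` is a hypothetical |
| helper that models the decision. It does not query the kernel: |
| |
```sh
# Decide whether the Nth RCU stall would trigger panic(), given both sysctls.
# Hypothetical model: a stall counter is compared with max_rcu_stall_to_panic.
rcu_stall_would_panic() {
    panic_on_rcu_stall=$1   # 0 or 1
    max_stalls=$2           # value of max_rcu_stall_to_panic
    stall_count=$3          # stalls seen so far, including this one
    [ "$panic_on_rcu_stall" -eq 1 ] && [ "$stall_count" -ge "$max_stalls" ]
}

# rcu_stall_would_panic 1 3 2 fails (no panic yet);
# rcu_stall_would_panic 1 3 3 succeeds (panic).
```
| |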
| perf_cpu_time_max_percent |
| ========================= |
| |
| Hints to the kernel how much CPU time it should be allowed to |
| use to handle perf sampling events. If the perf subsystem |
| is informed that its samples are exceeding this limit, it |
| will drop its sampling frequency to attempt to reduce its CPU |
| usage. |
| |
| Some perf sampling happens in NMIs. If these samples |
| unexpectedly take too long to execute, the NMIs can become |
| stacked up next to each other so much that nothing else is |
| allowed to execute. |
| |
| ===== ======================================================== |
| 0 Disable the mechanism. Do not monitor or correct perf's |
| sampling rate no matter how much CPU time it takes. |
| |
| 1-100 Attempt to throttle perf's sample rate to this |
| percentage of CPU. Note: the kernel calculates an |
| "expected" length of each sample event. 100 here means |
| 100% of that expected length. Even if this is set to |
| 100, you may still see sample throttling if this |
| length is exceeded. Set to 0 if you truly do not care |
| how much CPU is consumed. |
| ===== ======================================================== |
| |
| |
| perf_event_paranoid |
| =================== |
| |
| Controls use of the performance events system by unprivileged |
| users (without ``CAP_PERFMON``). The default value is 2. |
| |
| For backward compatibility reasons, access to system performance |
| monitoring and observability remains open to ``CAP_SYS_ADMIN`` |
| privileged processes. However, using ``CAP_SYS_ADMIN`` for secure system |
| performance monitoring and observability operations is discouraged |
| in favor of ``CAP_PERFMON``. |
| |
| === ================================================================== |
| -1 Allow use of (almost) all events by all users. |
| |
| Ignore the mlock limit beyond ``perf_event_mlock_kb`` for users without |
| ``CAP_IPC_LOCK``. |
| |
| >=0 Disallow ftrace function tracepoint by users without |
| ``CAP_PERFMON``. |
| |
| Disallow raw tracepoint access by users without ``CAP_PERFMON``. |
| |
| >=1 Disallow CPU event access by users without ``CAP_PERFMON``. |
| |
| >=2 Disallow kernel profiling by users without ``CAP_PERFMON``. |
| === ================================================================== |
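| |
| Each row of the table applies to every level at or above its threshold, so |
| the restrictions are cumulative. The following sketch lists the restrictions |
| for a given level. ``paranoid_restrictions`` is a hypothetical helper. It |
| covers only the levels in the table above and does not check capabilities: |
| |
```sh
# List the restrictions that apply to users without CAP_PERFMON at a given
# perf_event_paranoid level. Thresholds are cumulative (">=0", ">=1", ">=2").
# paranoid_restrictions is a hypothetical helper for illustration only.
paranoid_restrictions() {
    level=$1
    if [ "$level" -ge 0 ]; then
        echo "no ftrace function tracepoints"
        echo "no raw tracepoints"
    fi
    if [ "$level" -ge 1 ]; then
        echo "no CPU events"
    fi
    if [ "$level" -ge 2 ]; then
        echo "no kernel profiling"
    fi
}

# Example, using the live value:
#   paranoid_restrictions "$(cat /proc/sys/kernel/perf_event_paranoid)"
```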
| |
| |
| perf_event_max_stack |
| ==================== |
| |
| Controls the maximum number of stack frames to copy for (``attr.sample_type & |
| PERF_SAMPLE_CALLCHAIN``) configured events, for instance, when using |
| '``perf record -g``' or '``perf trace --call-graph fp``'. |
| |
| This can only be done when no events are in use that have callchains |
| enabled, otherwise writing to this file will return ``-EBUSY``. |
| |
| The default value is 127. |
| |
| |
| perf_event_mlock_kb |
| =================== |
| |
| Controls the size, in KiB, of the per-CPU ring buffer that is not counted |
| against the mlock limit. |
| |
| The default value is 512 KiB plus one page, i.e. 516 on systems with 4 KiB pages. |
| |
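| Expressed in KiB, the default is 512 plus the page size divided by 1024. The |
| following is a minimal sketch. ``default_mlock_kb`` is a hypothetical helper |
| that takes the page size in bytes: |
| |
```sh
# Default perf_event_mlock_kb in KiB: 512 KiB plus one page.
# default_mlock_kb is a hypothetical helper; $1 is the page size in bytes.
default_mlock_kb() {
    page_size=$1
    echo $(( 512 + page_size / 1024 ))
}

# For the running system:
#   default_mlock_kb "$(getconf PAGE_SIZE)"
```
| |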
| |
| perf_event_max_contexts_per_stack |
| ================================= |
| |
| Controls the maximum number of stack frame context entries for |
| (``attr.sample_type & PERF_SAMPLE_CALLCHAIN``) configured events, for |
| instance, when using '``perf record -g``' or '``perf trace --call-graph fp``'. |
| |
| This can only be done when no events are in use that have callchains |
| enabled, otherwise writing to this file will return ``-EBUSY``. |
| |
| The default value is 8. |
| |
| |
| perf_user_access (arm64 and riscv only) |
| ======================================= |
| |
| Controls user space access for reading perf event counters. |
| |
| arm64 |
| ----- |
| |
| The default value is 0 (access disabled). |
| |
| When set to 1, user space can read performance monitor counter registers |
| directly. |
| |
| See Documentation/arch/arm64/perf.rst for more information. |
| |
| riscv |
| ----- |
| |
| When set to 0, user space access is disabled. |
| |
| The default value is 1. User space can read performance monitor counter |
| registers through perf, and any direct access without perf intervention will |
| trigger an illegal instruction. |
| |
| When set to 2, legacy mode is enabled: user space has direct access to the |
| cycle and instret CSRs only. Note that this legacy value is deprecated and will |
| be removed once all user space applications are fixed. |
| |
| Note that the time CSR is always directly accessible to all modes. |
| |
| pid_max |
| ======= |
| |
| PID allocation wrap value. When the kernel's next PID value |
| reaches this value, it wraps back to a minimum PID value. |
| PIDs of value ``pid_max`` or larger are not allocated. |
| |
| |
| ns_last_pid |
| =========== |
| |
| The last PID allocated in the current PID namespace (the one the task using |
| this sysctl lives in). When selecting a PID for the next task on fork, the |
| kernel tries to allocate a number starting from this one. |
| |
| |
| powersave-nap (PPC only) |
| ======================== |
| |
| If set, Linux-PPC will use the 'nap' mode of powersaving, |
| otherwise the 'doze' mode will be used. |
| |
| |
| printk |
| ====== |
| |
| The four values in printk denote: ``console_loglevel``, |
| ``default_message_loglevel``, ``minimum_console_loglevel`` and |
| ``default_console_loglevel`` respectively. |
| |
| These values influence printk() behavior when printing or |
| logging error messages. See '``man 2 syslog``' for more info on |
| the different loglevels. |
| |
| ======================== ===================================== |
| console_loglevel messages with a higher priority than |
| this will be printed to the console |
| default_message_loglevel messages without an explicit priority |
| will be printed with this priority |
| minimum_console_loglevel minimum (highest) value to which |
| console_loglevel can be set |
| default_console_loglevel default value for console_loglevel |
| ======================== ===================================== |
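| |
| The values can be read and labelled with a short shell sketch. Note that a |
| "higher priority" is a numerically *lower* loglevel, so a message is printed |
| to the console only when its loglevel is below ``console_loglevel``. The |
| helpers ``show_printk_levels`` and ``reaches_console`` are hypothetical, and |
| ``reaches_console`` ignores the ``ignore_loglevel`` boot parameter: |
| |
```sh
# Label the four printk values, read from stdin in the documented order.
# show_printk_levels is a hypothetical helper for illustration only.
show_printk_levels() {
    read -r console default_msg minimum_console default_console
    echo "console_loglevel=$console"
    echo "default_message_loglevel=$default_msg"
    echo "minimum_console_loglevel=$minimum_console"
    echo "default_console_loglevel=$default_console"
}

# Succeeds if a message at loglevel $1 reaches a console whose
# console_loglevel is $2 (lower number = higher priority).
reaches_console() {
    [ "$1" -lt "$2" ]
}

# Usage:
#   show_printk_levels < /proc/sys/kernel/printk
#   reaches_console 3 4 && echo "KERN_ERR messages are shown"
```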
| |
| |
| printk_delay |
| ============ |
| |
| Delay each printk message by ``printk_delay`` milliseconds. |
| |
| Values from 0 to 10000 are allowed. |
| |
| |
| printk_ratelimit |
| ================ |
| |
| Some warning messages are rate limited. ``printk_ratelimit`` specifies |
| the minimum length of time between these messages (in seconds). |
| The default value is 5 seconds. |
| |
| A value of 0 will disable rate limiting. |
| |
| |
| printk_ratelimit_burst |
| ====================== |
| |
| While long term we enforce one message per `printk_ratelimit`_ |
| seconds, we do allow a burst of messages to pass through. |
| ``printk_ratelimit_burst`` specifies the number of messages we can |
| send before ratelimiting kicks in. |
| |
| The default value is 10 messages. |
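| |
| The interaction of the two settings can be sketched as a simplified |
| fixed-window model. A window opens at the first message, and at most |
| ``printk_ratelimit_burst`` messages pass within it. A new window opens once more |
| than ``printk_ratelimit`` seconds have elapsed. This model is an assumption made |
| for illustration. ``ratelimit_sim`` is hypothetical and works in whole seconds: |
| |
```sh
# Simplified fixed-window model of printk_ratelimit (interval, seconds) and
# printk_ratelimit_burst. Prints "allow" or "drop" for each message timestamp.
# ratelimit_sim is a hypothetical helper, not a kernel interface.
ratelimit_sim() {
    interval=$1 burst=$2
    shift 2
    begin= printed=0 out=
    for t in "$@"; do
        if [ "$interval" -eq 0 ]; then
            # An interval of 0 disables rate limiting.
            verdict=allow
        else
            # Open a new window on the first message, or once the old one expired.
            if [ -z "$begin" ] || [ "$t" -gt $(( begin + interval )) ]; then
                begin=$t
                printed=0
            fi
            if [ "$printed" -lt "$burst" ]; then
                printed=$(( printed + 1 ))
                verdict=allow
            else
                verdict=drop
            fi
        fi
        out="$out $verdict"
    done
    echo "${out# }"
}

# With the defaults: ratelimit_sim 5 10 <timestamps...>
```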
| |
| |
| printk_devkmsg |
| ============== |
| |
| Control the logging to ``/dev/kmsg`` from userspace: |
| |
| ========= ============================================= |
| ratelimit default, ratelimited |
| on unlimited logging to /dev/kmsg from userspace |
| off logging to /dev/kmsg disabled |
| ========= ============================================= |
| |
| The kernel command line parameter ``printk.devkmsg=`` overrides this and is |
| a one-time setting until next reboot: once set, it cannot be changed by |
| this sysctl interface anymore. |
| |
| |
| pty |
| === |
| |
| See Documentation/filesystems/devpts.rst. |
| |
| |
| random |
| ====== |
| |
| This is a directory, with the following entries: |
| |
| * ``boot_id``: a UUID generated the first time this is retrieved, and |
| unvarying after that; |
| |
| * ``uuid``: a UUID generated every time this is retrieved (this can |
| thus be used to generate UUIDs at will); |
| |
| * ``entropy_avail``: the pool's entropy count, in bits; |
| |
| * ``poolsize``: the entropy pool size, in bits; |
| |
| * ``urandom_min_reseed_secs``: obsolete (used to determine the minimum |
| number of seconds between urandom pool reseeding). This file is |
| writable for compatibility purposes, but writing to it has no effect |
| on any RNG behavior; |
| |
| * ``write_wakeup_threshold``: when the entropy count drops below this |
| (as a number of bits), processes waiting to write to ``/dev/random`` |
| are woken up. This file is writable for compatibility purposes, but |
| writing to it has no effect on any RNG behavior. |
| |
| |
| randomize_va_space |
| ================== |
| |
| This option can be used to select the type of process address |
| space randomization that is used in the system, for architectures |
| that support this feature. |
| |
| == =========================================================================== |
| 0 Turn the process address space randomization off. This is the |
| default for architectures that do not support this feature anyway, |
| and kernels that are booted with the "norandmaps" parameter. |
| |
| 1 Make the addresses of mmap base, stack and VDSO page randomized. |
| This, among other things, implies that shared libraries will be |
| loaded to random addresses. Also for PIE-linked binaries, the |
| location of code start is randomized. This is the default if the |
| ``CONFIG_COMPAT_BRK`` option is enabled. |
| |
| 2 Additionally enable heap randomization. This is the default if |
| ``CONFIG_COMPAT_BRK`` is disabled. |
| |
| There are a few legacy applications out there (such as some ancient |
| versions of libc.so.5 from 1996) that assume that brk area starts |
| just after the end of the code+bss. These applications break when |
| start of the brk area is randomized. There are however no known |
| non-legacy applications that would be broken this way, so for most |
| systems it is safe to choose full randomization. |
| |
| Systems with ancient and/or broken binaries should be configured |
| with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process |
| address space randomization. |
| == =========================================================================== |
| |
| |
| real-root-dev |
| ============= |
| |
| See Documentation/admin-guide/initrd.rst. |
| |
| |
| reboot-cmd (SPARC only) |
| ======================= |
| |
| This appears to be a way to give an argument to the SPARC |
| ROM/Flash boot loader, possibly to tell it what to do after |
| rebooting. |
| |
| |
| sched_energy_aware |
| ================== |
| |
| Enables/disables Energy Aware Scheduling (EAS). EAS starts |
| automatically on platforms where it can run (that is, |
| platforms with asymmetric CPU topologies and having an Energy |
| Model available). If your platform happens to meet the |
| requirements for EAS but you do not want to use it, change |
| this value to 0. On non-EAS platforms, write operations fail and |
| reads don't return anything. |
| |
| task_delayacct |
| =============== |
| |
| Enables/disables task delay accounting (see |
| Documentation/accounting/delay-accounting.rst). Enabling this feature incurs |
| a small amount of overhead in the scheduler but is useful for debugging |
| and performance tuning. It is required by some tools such as iotop. |
| |
| sched_schedstats |
| ================ |
| |
| Enables/disables scheduler statistics. Enabling this feature |
| incurs a small amount of overhead in the scheduler but is |
| useful for debugging and performance tuning. |
| |
| sched_util_clamp_min |
| ==================== |
| |
| Max allowed *minimum* utilization. |
| |
| Default value is 1024, which is the maximum possible value. |
| |
| It means that any requested uclamp.min value cannot be greater than |
| sched_util_clamp_min, i.e., it is restricted to the range |
| [0:sched_util_clamp_min]. |
| |
| sched_util_clamp_max |
| ==================== |
| |
| Max allowed *maximum* utilization. |
| |
| Default value is 1024, which is the maximum possible value. |
| |
| It means that any requested uclamp.max value cannot be greater than |
| sched_util_clamp_max, i.e., it is restricted to the range |
| [0:sched_util_clamp_max]. |
| |
| sched_util_clamp_min_rt_default |
| =============================== |
| |
| By default, Linux is tuned for performance, which means that RT tasks always run |
| at the highest frequency and on the most capable (highest capacity) CPU (in |
| heterogeneous systems). |
| |
| Uclamp achieves this by setting the requested uclamp.min of all RT tasks to |
| 1024 by default, which effectively boosts the tasks to run at the highest |
| frequency and biases them to run on the biggest CPU. |
| |
| This knob allows admins to change the default behavior when uclamp is being |
| used. In battery powered devices particularly, running at the maximum |
| capacity and frequency will increase energy consumption and shorten the battery |
| life. |
| |
| This knob is only effective for RT tasks whose requested uclamp.min value the |
| user hasn't modified via the sched_setattr() syscall. |
| |
| This knob will not escape the range constraint imposed by sched_util_clamp_min |
| defined above. |
| |
| For example, if:: |
| |
| sched_util_clamp_min_rt_default = 800 |
| sched_util_clamp_min = 600 |
| |
| Then the boost will be clamped to 600 because 800 is outside of the permissible |
| range of [0:600]. This could happen, for instance, if a powersave mode |
| restricts all boosts temporarily by modifying sched_util_clamp_min. As soon as |
| this restriction is lifted, the requested sched_util_clamp_min_rt_default |
| will take effect. |
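| |
| The clamping rule in the example above amounts to taking the minimum of the |
| two values. The following is a minimal sketch. ``effective_rt_boost`` is a |
| hypothetical helper, and it deliberately ignores cgroup-level uclamp |
| restrictions: |
| |
```sh
# Effective default uclamp.min for an RT task that has not set its own value:
# sched_util_clamp_min_rt_default clamped into [0:sched_util_clamp_min].
# effective_rt_boost is a hypothetical helper; cgroup limits are ignored.
effective_rt_boost() {
    rt_default=$1
    clamp_min=$2
    if [ "$rt_default" -gt "$clamp_min" ]; then
        echo "$clamp_min"
    else
        echo "$rt_default"
    fi
}

# effective_rt_boost 800 600 prints 600, as in the example above.
```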
| |
| seccomp |
| ======= |
| |
| See Documentation/userspace-api/seccomp_filter.rst. |
| |
| |
| sg-big-buff |
| =========== |
| |
| This file shows the size of the generic SCSI (sg) buffer. |
| You can't tune it just yet, but you could change it at |
| compile time by editing ``include/scsi/sg.h`` and changing |
| the value of ``SG_BIG_BUFF``. |
| |
| There shouldn't be any reason to change this value. If |
| you can come up with one, you probably know what you |
| are doing anyway :) |
| |
| |
| shmall |
| ====== |
| |
| This parameter sets the total number of shared memory pages that can be used |
| inside an IPC namespace. Shared memory pages are counted for each IPC |
| namespace separately, and the count is not inherited. ``shmall`` should always |
| be at least ``ceil(shmmax/PAGE_SIZE)``, so that a segment of the maximum size |
| (``shmmax`` bytes) can actually be created. |
| |
| If you are not sure what the default ``PAGE_SIZE`` is on your Linux |
| system, you can run the following command:: |
| |
| # getconf PAGE_SIZE |
| |
| To reduce or disable the ability to allocate shared memory, you must create a |
| new IPC namespace, set this parameter to the required value, and prohibit the |
| creation of new IPC namespaces in the current user namespace. Alternatively, |
| cgroups can be used. |
| |
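| The lower bound for ``shmall`` is a ceiling division, which can be sketched in |
| shell integer arithmetic. ``min_shmall`` is a hypothetical helper. Very large |
| ``shmmax`` values, such as the default on 64-bit kernels, typically exceed the |
| range of shell arithmetic. This sketch therefore assumes a value that fits in a |
| signed 64-bit integer: |
| |
```sh
# Smallest shmall (in pages) that still allows one segment of shmmax bytes:
# ceil(shmmax / page_size), computed without an overflow-prone addition.
# min_shmall is a hypothetical helper; both arguments are plain integers.
min_shmall() {
    shmmax=$1
    page_size=$2
    # Whole pages, plus one more if a partial page is left over.
    echo $(( shmmax / page_size + (shmmax % page_size > 0) ))
}

# Example (only if shmmax fits in shell arithmetic):
#   min_shmall "$(cat /proc/sys/kernel/shmmax)" "$(getconf PAGE_SIZE)"
```
| |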
| shmmax |
| ====== |
| |
| This value can be used to query and set the run time limit |
| on the maximum shared memory segment size that can be created. |
| Shared memory segments up to 1Gb are now supported in the |
| kernel. This value defaults to ``SHMMAX``. |
| |
| |
| shmmni |
| ====== |
| |
| This value determines the maximum number of shared memory segments. |
| 4096 by default (``SHMMNI``). |
| |
| |
| shm_rmid_forced |
| =============== |
| |
| Linux lets you set resource limits, including how much memory one |
| process can consume, via ``setrlimit(2)``. Unfortunately, shared memory |
| segments are allowed to exist without association with any process, and |
| thus might not be counted against any resource limits. If enabled, |
| shared memory segments are automatically destroyed when their attach |
| count becomes zero after a detach or a process termination. It will |
| also destroy segments that were created, but never attached to, on exit |
| from the process. The only use left for ``IPC_RMID`` is to immediately |
| destroy an unattached segment. Of course, this breaks the way things are |
| defined, so some applications might stop working. Note that this |
| feature will do you no good unless you also configure your resource |
| limits (in particular, ``RLIMIT_AS`` and ``RLIMIT_NPROC``). Most systems don't |
| need this. |
| |
| Note that if you change this from 0 to 1, already created segments |
| without users and whose creating process is dead will be destroyed. |
| |
| |
| sysctl_writes_strict |
| ==================== |
| |
| Control how file position affects the behavior of updating sysctl values |
| via the ``/proc/sys`` interface: |
| |
| == ====================================================================== |
| -1 Legacy per-write sysctl value handling, with no printk warnings. |
| Each write syscall must fully contain the sysctl value to be |
| written, and multiple writes on the same sysctl file descriptor |
| will rewrite the sysctl value, regardless of file position. |
| 0 Same behavior as above, but warn about processes that perform writes |
| to a sysctl file descriptor when the file position is not 0. |
| 1 (default) Respect file position when writing sysctl strings. Multiple |
| writes will append to the sysctl value buffer. Anything past the max |
| length of the sysctl value buffer will be ignored. Writes to numeric |
| sysctl entries must always be at file position 0 and the value must |
| be fully contained in the buffer sent in the write syscall. |
| == ====================================================================== |
| |
| |
| softlockup_all_cpu_backtrace |
| ============================ |
| |
| This value controls whether the soft lockup detector thread gathers |
| further debug information when a soft lockup condition is detected. |
| If enabled, each CPU will be issued an NMI and instructed to capture a |
| stack trace. |
| |
| This feature is only applicable for architectures which support |
| NMI. |
| |
| = ============================================ |
| 0 Do nothing. This is the default behavior. |
| 1 On detection capture more debug information. |
| = ============================================ |
| |
| |
| softlockup_panic |
| ================= |
| |
| This parameter can be used to control whether the kernel panics |
| when a soft lockup is detected. |
| |
| = ============================================ |
| 0 Don't panic on soft lockup. |
| 1 Panic on soft lockup. |
| = ============================================ |
| |
| This can also be set using the softlockup_panic kernel parameter. |
| |
| |
| soft_watchdog |
| ============= |
| |
| This parameter can be used to control the soft lockup detector. |
| |
| = ================================= |
| 0 Disable the soft lockup detector. |
| 1 Enable the soft lockup detector. |
| = ================================= |
| |
| The soft lockup detector monitors CPUs for threads that are hogging the CPUs |
| without rescheduling voluntarily. Such threads prevent the 'migration/N' |
| threads from running, which causes the watchdog work to fail to execute. The |
| mechanism depends on the CPUs' ability to respond to timer interrupts, which |
| are needed for the watchdog work to be queued by the watchdog timer function. |
| If they cannot, the NMI watchdog, if enabled, can detect a hard lockup |
| condition. |
| |
| |
| split_lock_mitigate (x86 only) |
| ============================== |
| |
| On x86, each "split lock" imposes a system-wide performance penalty. On larger |
| systems, large numbers of split locks from unprivileged users can result in |
| denials of service to well-behaved and potentially more important users. |
| |
| The kernel mitigates these bad users by detecting split locks and imposing |
| penalties: forcing them to wait and only allowing one core to execute split |
| locks at a time. |
| |
| These mitigations can make those bad applications unbearably slow. Setting |
| split_lock_mitigate=0 may restore some application performance, but will also |
| increase system exposure to denial of service attacks from split lock users. |
| |
| = =================================================================== |
| 0 Disable the mitigation mode: only warn about the split lock in the kernel log, |
| exposing the system to denials of service from the split lockers. |
| 1 Enable the mitigation mode (this is the default) - penalizes the split |
| lockers with intentional performance degradation. |
| = =================================================================== |
| |
| |
| stack_erasing |
| ============= |
| |
| This parameter can be used to control kernel stack erasing at the end |
| of syscalls for kernels built with ``CONFIG_GCC_PLUGIN_STACKLEAK``. |
| |
| That erasing reduces the information which kernel stack leak bugs |
| can reveal and blocks some uninitialized stack variable attacks. |
| The tradeoff is the performance impact: on a single CPU system kernel |
| compilation sees a 1% slowdown, other systems and workloads may vary. |
| |
| = ==================================================================== |
| 0 Kernel stack erasing is disabled, STACKLEAK_METRICS are not updated. |
| 1 Kernel stack erasing is enabled (default), it is performed before |
| returning to the userspace at the end of syscalls. |
| = ==================================================================== |
| |
| |
| stop-a (SPARC only) |
| =================== |
| |
| Controls Stop-A: |
| |
| = ==================================== |
| 0 Stop-A has no effect. |
| 1 Stop-A breaks to the PROM (default). |
| = ==================================== |
| |
| Stop-A is always enabled on a panic, so that the user can return to |
| the boot PROM. |
| |
| |
| sysrq |
| ===== |
| |
| See Documentation/admin-guide/sysrq.rst. |
| |
| |
| tainted |
| ======= |
| |
| Non-zero if the kernel has been tainted. The numeric values below can be |
| ORed together. The letters are seen in the "Tainted" line of Oops reports. |
| |
| ====== ===== ============================================================== |
| 1 `(P)` proprietary module was loaded |
| 2 `(F)` module was force loaded |
| 4 `(S)` kernel running on an out of specification system |
| 8 `(R)` module was force unloaded |
| 16 `(M)` processor reported a Machine Check Exception (MCE) |
| 32 `(B)` bad page referenced or some unexpected page flags |
| 64 `(U)` taint requested by userspace application |
| 128 `(D)` kernel died recently, i.e. there was an OOPS or BUG |
| 256 `(A)` an ACPI table was overridden by user |
| 512 `(W)` kernel issued warning |
| 1024 `(C)` staging driver was loaded |
| 2048 `(I)` workaround for bug in platform firmware applied |
| 4096 `(O)` externally-built ("out-of-tree") module was loaded |
| 8192 `(E)` unsigned module was loaded |
| 16384 `(L)` soft lockup occurred |
| 32768 `(K)` kernel has been live patched |
| 65536 `(X)` Auxiliary taint, defined for and used by distros |
| 131072 `(T)` The kernel was built with the struct randomization plugin |
| ====== ===== ============================================================== |
| |
| See Documentation/admin-guide/tainted-kernels.rst for more information. |
| |
| Note: |
| writes to this sysctl interface will fail with ``EINVAL`` if the kernel is |
| booted with the command line option ``panic_on_taint=<bitmask>,nousertaint`` |
| and any of the ORed together values being written to ``tainted`` match |
| the bitmask declared in ``panic_on_taint``. |
| See Documentation/admin-guide/kernel-parameters.rst for more details on |
| that particular kernel command line option and its optional |
| ``nousertaint`` switch. |
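| |
| To see which flags a given ``tainted`` value contains, test each bit in turn and |
| map it to the letter from the table above. The following is a minimal sketch. |
| ``decode_taint`` is a hypothetical helper that only knows bits 0-17 as listed |
| here. It prints the letters of the set bits rather than the exact "Tainted" |
| line format: |
| |
```sh
# Decode a numeric taint value into the letters from the table above.
# decode_taint is a hypothetical helper covering bits 0-17 only.
decode_taint() {
    value=$1
    # Letter for bit N is character N+1 of this string (P = bit 0 ... T = bit 17).
    letters=PFSRMBUDAWCIOELKXT
    bit=0
    out=
    while [ "$bit" -lt 18 ]; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="$out$(printf '%s' "$letters" | cut -c $(( bit + 1 )))"
        fi
        bit=$(( bit + 1 ))
    done
    echo "$out"
}

# Usage:
#   decode_taint "$(cat /proc/sys/kernel/tainted)"
```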
| |
| threads-max |
| =========== |
| |
| This value controls the maximum number of threads that can be created |
| using ``fork()``. |
| |
| During initialization the kernel sets this value such that even if the |
| maximum number of threads is created, the thread structures occupy only |
| a part (1/8th) of the available RAM pages. |
| |
| The minimum value that can be written to ``threads-max`` is 1. |
| |
| The maximum value that can be written to ``threads-max`` is given by the |
| constant ``FUTEX_TID_MASK`` (0x3fffffff). |
| |
| If a value outside of this range is written to ``threads-max`` an |
| ``EINVAL`` error occurs. |
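| |
| Before writing a new value, a script can check it against the documented range. |
| The following is a minimal sketch. ``threads_max_valid`` is a hypothetical |
| helper: |
| |
```sh
# Check a candidate threads-max value against the documented range
# 1..FUTEX_TID_MASK (0x3fffffff); writes outside it fail with EINVAL.
# threads_max_valid is a hypothetical helper for illustration only.
threads_max_valid() {
    [ "$1" -ge 1 ] && [ "$1" -le $(( 0x3fffffff )) ]
}

# Example (as root):
#   threads_max_valid 500000 && echo 500000 > /proc/sys/kernel/threads-max
```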
| |
| |
| traceoff_on_warning |
| =================== |
| |
| When set, disables tracing (see Documentation/trace/ftrace.rst) when a |
| ``WARN()`` is hit. |
| |
| |
| tracepoint_printk |
| ================= |
| |
| When tracepoints are sent to printk() (enabled by the ``tp_printk`` |
| boot parameter), this entry provides runtime control:: |
| |
| echo 0 > /proc/sys/kernel/tracepoint_printk |
| |
| will stop tracepoints from being sent to printk(), and:: |
| |
| echo 1 > /proc/sys/kernel/tracepoint_printk |
| |
| will send them to printk() again. |
| |
| This only works if the kernel was booted with ``tp_printk`` enabled. |
| |
| See Documentation/admin-guide/kernel-parameters.rst and |
| Documentation/trace/boottime-trace.rst. |
| |
| |
| unaligned-trap |
| ============== |
| |
| On architectures where unaligned accesses cause traps, and where this |
| feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently, |
| ``arc``, ``parisc`` and ``loongarch``), controls whether unaligned traps |
| are caught and emulated (instead of failing). |
| |
| = ======================================================== |
| 0 Do not emulate unaligned accesses. |
| 1 Emulate unaligned accesses. This is the default setting. |
| = ======================================================== |
| |
| See also `ignore-unaligned-usertrap`_. |
| |
| |
| unknown_nmi_panic |
| ================= |
| |
| The value in this file affects the behavior of NMI handling. When the |
| value is non-zero, an unknown NMI is trapped and then a panic occurs. At |
| that time, kernel debugging information is displayed on the console. |
| |
| For example, the NMI switch that most IA32 servers have fires an unknown |
| NMI. If a system hangs, try pressing the NMI switch. |
| |
| |
| unprivileged_bpf_disabled |
| ========================= |
| |
| Writing 1 to this entry will disable unprivileged calls to ``bpf()``; |
| once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` or ``CAP_BPF`` |
| will return ``-EPERM``. Once set to 1, this can't be cleared from the |
| running kernel anymore. |
| |
| Writing 2 to this entry will also disable unprivileged calls to ``bpf()``, |
| however, an admin can still change this setting later on, if needed, by |
| writing 0 or 1 to this entry. |
| |
| If ``BPF_UNPRIV_DEFAULT_OFF`` is enabled in the kernel config, then this |
| entry will default to 2 instead of 0. |
| |
| = ============================================================= |
| 0 Unprivileged calls to ``bpf()`` are enabled |
| 1 Unprivileged calls to ``bpf()`` are disabled without recovery |
| 2 Unprivileged calls to ``bpf()`` are disabled |
| = ============================================================= |
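| |
| The documented rules for changing this value can be expressed as a check to run |
| before writing. In this sketch, ``bpf_disabled_write_ok`` is hypothetical. It |
| assumes that rewriting 1 while the value is already 1 is accepted, and it does |
| not model the privileges needed to write the sysctl at all: |
| |
```sh
# Would writing $2 to unprivileged_bpf_disabled succeed when it currently holds $1?
# bpf_disabled_write_ok is a hypothetical helper modelling the rules above.
bpf_disabled_write_ok() {
    current=$1
    new=$2
    # Only 0, 1 and 2 are meaningful values.
    [ "$new" -ge 0 ] && [ "$new" -le 2 ] || return 1
    # Once set to 1, the value can no longer be changed to anything else.
    if [ "$current" -eq 1 ] && [ "$new" -ne 1 ]; then
        return 1
    fi
    return 0
}
```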
| |
| |
| warn_limit |
| ========== |
| |
| Number of kernel warnings after which the kernel should panic when |
| ``panic_on_warn`` is not set. Setting this to 0 disables checking |
| the warning count. Setting this to 1 has the same effect as setting |
| ``panic_on_warn=1``. The default value is 0. |
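| |
| The resulting panic decision can be sketched as follows. The sketch assumes |
| that the count includes the current warning. ``warn_would_panic`` is a |
| hypothetical helper: |
| |
```sh
# Would the kernel panic on this warning, given the running warning count,
# warn_limit and panic_on_warn? warn_would_panic is a hypothetical model.
warn_would_panic() {
    count=$1
    limit=$2
    panic_on_warn=$3
    # panic_on_warn=1 panics on every warning regardless of warn_limit.
    if [ "$panic_on_warn" -eq 1 ]; then
        return 0
    fi
    # warn_limit=0 disables the count check entirely.
    [ "$limit" -ne 0 ] && [ "$count" -ge "$limit" ]
}
```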
| |
| |
| watchdog |
| ======== |
| |
| This parameter can be used to disable or enable the soft lockup detector |
| *and* the NMI watchdog (i.e. the hard lockup detector) at the same time. |
| |
| = ============================== |
| 0 Disable both lockup detectors. |
| 1 Enable both lockup detectors. |
| = ============================== |
| |
| The soft lockup detector and the NMI watchdog can also be disabled or |
| enabled individually, using the ``soft_watchdog`` and ``nmi_watchdog`` |
| parameters. |
| |
| If the ``watchdog`` parameter is read, for example by executing:: |
| |
| cat /proc/sys/kernel/watchdog |
| |
| the output of this command (0 or 1) shows the logical OR of |
| ``soft_watchdog`` and ``nmi_watchdog``. |
| |
| |
| watchdog_cpumask |
| ================ |
| |
| This value can be used to control on which CPUs the watchdog may run. |
| The default cpumask is all possible cores, but if ``NO_HZ_FULL`` is |
| enabled in the kernel config, and cores are specified with the |
| ``nohz_full=`` boot argument, those cores are excluded by default. |
| Offline cores can be included in this mask, and if the core is later |
| brought online, the watchdog will be started based on the mask value. |
| |
| Typically this value would only be touched in the ``nohz_full`` case |
| to re-enable cores that by default were not running the watchdog, |
| if a kernel lockup was suspected on those cores. |
| |
| The argument value is the standard cpulist format for cpumasks, |
| so for example to enable the watchdog on cores 0, 2, 3, and 4 you |
| might say:: |
| |
| echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask |
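| |
| To check which cores a cpulist selects before writing it, the list can be |
| expanded into individual CPU numbers. The following is a minimal sketch. |
| ``expand_cpulist`` is a hypothetical helper that handles only single CPUs and |
| simple ``N-M`` ranges: |
| |
```sh
# Expand a cpulist such as "0,2-4" into space-separated CPU numbers.
# expand_cpulist is a hypothetical helper: single CPUs and N-M ranges only.
expand_cpulist() {
    out=
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case $part in
            *-*)
                first=${part%-*}
                last=${part#*-}
                ;;
            *)
                first=$part
                last=$part
                ;;
        esac
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            out="$out $cpu"
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    echo "${out# }"
}

# expand_cpulist 0,2-4 prints "0 2 3 4".
```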
| |
| |
| watchdog_thresh |
| =============== |
| |
| This value can be used to control the frequency of hrtimer and NMI |
| events and the soft and hard lockup thresholds. The default threshold |
| is 10 seconds. |
| |
| The softlockup threshold is (``2 * watchdog_thresh``). Setting this |
| tunable to zero will disable lockup detection altogether. |
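| |
| The relationship between the two thresholds can be written as a one-line |
| calculation. ``softlockup_threshold`` is a hypothetical helper. It prints the |
| soft lockup threshold in seconds, or ``disabled`` when ``watchdog_thresh`` is 0: |
| |
```sh
# Soft lockup threshold derived from watchdog_thresh (seconds).
# softlockup_threshold is a hypothetical helper for illustration only.
softlockup_threshold() {
    if [ "$1" -eq 0 ]; then
        # A watchdog_thresh of 0 disables lockup detection altogether.
        echo disabled
    else
        echo $(( 2 * $1 ))
    fi
}

# Usage:
#   softlockup_threshold "$(cat /proc/sys/kernel/watchdog_thresh)"
```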