| What: /sys/devices/system/machinecheck/machinecheckX/tolerant |
| Contact: Borislav Petkov <bp@suse.de> |
| Date: Dec, 2021 |
| Description: |
| Unused and obsolete after the advent of recoverable machine |
| checks (see last sentence below) and those are present since |
| 2010 (Nehalem). |
| |
| Original description: |
| |
| The entries appear for each CPU, but they are truly shared |
| between all CPUs. |
| |
| Tolerance level. When a machine check exception occurs for a |
| non corrected machine check the kernel can take different |
| actions. |
| |
| Since machine check exceptions can happen any time it is |
| sometimes risky for the kernel to kill a process because it |
| defies normal kernel locking rules. The tolerance level |
| configures how hard the kernel tries to recover even at some |
| risk of deadlock. Higher tolerant values trade potentially |
| better uptime with the risk of a crash or even corruption |
| (for tolerant >= 3). |
| |
| == =========================================================== |
| 0 always panic on uncorrected errors, log corrected errors |
| 1 panic or SIGBUS on uncorrected errors, log corrected errors |
| 2 SIGBUS or log uncorrected errors, log corrected errors |
| 3 never panic or SIGBUS, log all errors (for testing only) |
| == =========================================================== |
| |
| Default: 1 |
| |
| Note this only makes a difference if the CPU allows recovery |
| from a machine check exception. Current x86 CPUs generally |
| do not. |