| What: /sys/devices/system/machinecheck/machinecheckX/ |
| Contact: Andi Kleen <ak@linux.intel.com> |
| Date: Feb, 2007 |
| Description: |
| (X = CPU number) |
| |
| Machine checks report internal hardware error conditions |
| detected by the CPU. Uncorrected errors typically cause a |
| machine check (often with panic), corrected ones cause a |
| machine check log entry. |
| |
| For more details about the x86 machine check architecture |
| see the Intel and AMD architecture manuals from their |
| developer websites. |
| |
| For more details about the architecture |
| see http://one.firstfloor.org/~andi/mce.pdf |
| |
| Each CPU has its own directory. |
| |
| What: /sys/devices/system/machinecheck/machinecheckX/bank<Y> |
| Contact: Andi Kleen <ak@linux.intel.com> |
| Date: Feb, 2007 |
| Description: |
| (Y bank number) |
| |
| 64bit Hex bitmask enabling/disabling specific subevents for |
| bank Y. |
| |
| When a bit in the bitmask is zero then the respective |
| subevent will not be reported. |
| |
| By default all events are enabled. |
| |
| Note that BIOS maintain another mask to disable specific events |
| per bank. This is not visible here |
| |
| What: /sys/devices/system/machinecheck/machinecheckX/check_interval |
| Contact: Andi Kleen <ak@linux.intel.com> |
| Date: Feb, 2007 |
| Description: |
| The entries appear for each CPU, but they are truly shared |
| between all CPUs. |
| |
| How often to poll for corrected machine check errors, in |
| seconds (Note output is hexadecimal). Default 5 minutes. |
| When the poller finds MCEs it triggers an exponential speedup |
| (poll more often) on the polling interval. When the poller |
| stops finding MCEs, it triggers an exponential backoff |
| (poll less often) on the polling interval. The check_interval |
| variable is both the initial and maximum polling interval. |
| 0 means no polling for corrected machine check errors |
| (but some corrected errors might be still reported |
| in other ways) |
| |
| What: /sys/devices/system/machinecheck/machinecheckX/tolerant |
| Contact: Andi Kleen <ak@linux.intel.com> |
| Date: Feb, 2007 |
| Description: |
| The entries appear for each CPU, but they are truly shared |
| between all CPUs. |
| |
| Tolerance level. When a machine check exception occurs for a |
| non corrected machine check the kernel can take different |
| actions. |
| |
| Since machine check exceptions can happen any time it is |
| sometimes risky for the kernel to kill a process because it |
| defies normal kernel locking rules. The tolerance level |
| configures how hard the kernel tries to recover even at some |
| risk of deadlock. Higher tolerant values trade potentially |
| better uptime with the risk of a crash or even corruption |
| (for tolerant >= 3). |
| |
| == =========================================================== |
| 0 always panic on uncorrected errors, log corrected errors |
| 1 panic or SIGBUS on uncorrected errors, log corrected errors |
| 2 SIGBUS or log uncorrected errors, log corrected errors |
| 3 never panic or SIGBUS, log all errors (for testing only) |
| == =========================================================== |
| |
| Default: 1 |
| |
| Note this only makes a difference if the CPU allows recovery |
| from a machine check exception. Current x86 CPUs generally |
| do not. |
| |
| What: /sys/devices/system/machinecheck/machinecheckX/trigger |
| Contact: Andi Kleen <ak@linux.intel.com> |
| Date: Feb, 2007 |
| Description: |
| The entries appear for each CPU, but they are truly shared |
| between all CPUs. |
| |
| Program to run when a machine check event is detected. |
| This is an alternative to running mcelog regularly from cron |
| and allows to detect events faster. |
| |
| What: /sys/devices/system/machinecheck/machinecheckX/monarch_timeout |
| Contact: Andi Kleen <ak@linux.intel.com> |
| Date: Feb, 2007 |
| Description: |
| How long to wait for the other CPUs to machine check too on a |
| exception. 0 to disable waiting for other CPUs. |
| |
| Unit: us |
| |
| What: /sys/devices/system/machinecheck/machinecheckX/ignore_ce |
| Contact: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> |
| Date: Jun 2009 |
| Description: |
| Disables polling and CMCI for corrected errors. |
| All corrected events are not cleared and kept in bank MSRs. |
| |
| What: /sys/devices/system/machinecheck/machinecheckX/dont_log_ce |
| Contact: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> |
| Date: Jun 2009 |
| Description: |
| Disables logging for corrected errors. |
| All reported corrected errors will be cleared silently. |
| |
| This option will be useful if you never care about corrected |
| errors. |
| |
| What: /sys/devices/system/machinecheck/machinecheckX/cmci_disabled |
| Contact: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> |
| Date: Jun 2009 |
| Description: |
| Disables the CMCI feature. |