| .. SPDX-License-Identifier: GPL-2.0 |
| |
| =============== |
| Boot Interrupts |
| =============== |
| |
| :Author: - Sean V Kelley <sean.v.kelley@linux.intel.com> |
| |
| Overview |
| ======== |
| |
| On PCI Express, interrupts are represented with either MSI or inbound |
| interrupt messages (Assert_INTx/Deassert_INTx). The integrated IO-APIC in a |
| given Core IO converts the legacy interrupt messages from PCI Express to |
| MSI interrupts. If the IO-APIC is disabled (via the mask bits in the |
| IO-APIC table entries), the messages are routed to the legacy PCH. This |
| in-band interrupt mechanism was traditionally necessary for systems that |
| did not support the IO-APIC and for boot. Intel in the past has used the |
| term "boot interrupts" to describe this mechanism. Further, the PCI Express |
| protocol describes this in-band legacy wire-interrupt INTx mechanism for |
| I/O devices to signal PCI-style level interrupts. The subsequent paragraphs |
| describe problems with the Core IO handling of INTx message routing to the |
| PCH and mitigation within BIOS and the OS. |
| |
| |
| Issue |
| ===== |
| |
| When in-band legacy INTx messages are forwarded to the PCH, they in turn |
| trigger a new interrupt for which the OS likely lacks a handler. When such |
| interrupts go unhandled over time, the Linux kernel tracks them as spurious |
| interrupts. Once the unhandled count reaches a threshold, the kernel |
| disables the IRQ and reports the error "nobody cared". The disabled IRQ is |
| then unavailable to any valid device that happens to share the IRQ |
| line:: |
| |
| irq 19: nobody cared (try booting with the "irqpoll" option) |
| CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1 |
| Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020 |
| Call Trace: |
| |
| <IRQ> |
| ? dump_stack+0x46/0x5e |
| ? __report_bad_irq+0x2e/0xb0 |
| ? note_interrupt+0x242/0x290 |
| ? nNIKAL100_memoryRead16+0x8/0x10 [nikal] |
| ? handle_irq_event_percpu+0x55/0x70 |
| ? handle_irq_event+0x4f/0x80 |
| ? handle_fasteoi_irq+0x81/0x180 |
| ? handle_irq+0x1c/0x30 |
| ? do_IRQ+0x41/0xd0 |
| ? common_interrupt+0x84/0x84 |
| </IRQ> |
| |
| handlers: |
| irq_default_primary_handler threaded usb_hcd_irq |
| Disabling IRQ #19 |
| |
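The detection described above can be approximated with a small userspace
model. This is a simplified sketch, not the kernel's implementation: the
names irq_model and model_note_interrupt are hypothetical, and the window
and threshold values are assumptions modeled on kernel/irq/spurious.c that
may differ between kernel versions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed values, modeled on kernel/irq/spurious.c; illustrative only. */
#define MODEL_IRQ_WINDOW      100000u  /* interrupts per evaluation window */
#define MODEL_UNHANDLED_LIMIT  99900u  /* unhandled count that trips the check */

/* Hypothetical per-IRQ bookkeeping for this sketch. */
struct irq_model {
    unsigned int irq;        /* IRQ number, used only for the message */
    unsigned int count;      /* interrupts seen in the current window */
    unsigned int unhandled;  /* interrupts in the window no handler claimed */
    bool disabled;           /* set once the line has been shut off */
};

/*
 * Account for one interrupt on the line.
 * Returns true if this call caused the line to be disabled.
 */
bool model_note_interrupt(struct irq_model *m, bool handled)
{
    bool trip;

    if (m->disabled)
        return false;

    if (!handled)
        m->unhandled++;

    /* Only evaluate at the end of each window. */
    if (++m->count < MODEL_IRQ_WINDOW)
        return false;

    trip = m->unhandled > MODEL_UNHANDLED_LIMIT;
    m->count = 0;
    m->unhandled = 0;

    if (trip) {
        printf("irq %u: nobody cared\n", m->irq);
        printf("Disabling IRQ #%u\n", m->irq);
        m->disabled = true;
    }
    return trip;
}
```

Under these assumptions, feeding the model 100000 consecutive unhandled
interrupts trips the check on the final call, whereas a line on which a
handler claims half of the interrupts stays enabled.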
| |
| Conditions |
| ========== |
| |
| The use of threaded interrupts is the most likely condition to trigger |
| this problem today. With threaded interrupts, the interrupt line may not be |
| re-enabled once the primary IRQ handler has woken the thread. These "one |
| shot" conditions mean that the interrupt line must stay masked until the |
| threaded handler has run. |
| Especially when dealing with high data rate interrupts, the thread needs to |
| run to completion; otherwise some handlers will end up in stack overflows |
| since the interrupt of the issuing device is still active. |
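
The interaction between masking and INTx forwarding can be summarized as a
small decision function. This is a minimal sketch of the behavior described
in this document, not hardware-accurate code: the type and function names
are hypothetical, and the dis_intx_route2pch field stands in for whatever
routing-disable bit a given chipset offers, if any.

```c
#include <stdbool.h>

/* Where a legacy INTx message ends up in this model. */
enum model_dest {
    MODEL_TO_IOAPIC,  /* converted to MSI by the integrated IO-APIC */
    MODEL_TO_PCH,     /* forwarded to the legacy PCH: a boot interrupt */
    MODEL_DROPPED     /* no further action */
};

/* Hypothetical view of the Core IO state relevant to one INTx line. */
struct model_core_io {
    bool ioapic_entry_masked;  /* mask bit in the IO-APIC table entry */
    bool dis_intx_route2pch;   /* routing to the PCH disabled, if supported */
};

/* Decide where an incoming INTx message is delivered. */
enum model_dest model_route_intx(const struct model_core_io *io)
{
    if (!io->ioapic_entry_masked)
        return MODEL_TO_IOAPIC;

    /* Entry is masked, for example while a one-shot thread is running. */
    if (io->dis_intx_route2pch)
        return MODEL_DROPPED;

    return MODEL_TO_PCH;
}
```

In this model, a line that stays masked while the threaded handler runs
sends every further INTx message to the PCH unless the routing-disable bit
is set.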
| |
| Affected Chipsets |
| ================= |
| |
| The legacy interrupt forwarding mechanism exists today in a number of |
| devices including but not limited to chipsets from AMD/ATI, Broadcom, and |
| Intel. Changes made through the mitigations below have been applied to |
| drivers/pci/quirks.c. |
| |
| Starting with ICX there are no longer any IO-APICs in the Core IO's |
| devices. IO-APIC is only in the PCH. Devices connected to the Core IO's |
| PCIe Root Ports will use native MSI/MSI-X mechanisms. |
| |
| Mitigations |
| =========== |
| |
| The mitigations take the form of PCI quirks. The preference has been to |
| first identify and make use of a means to disable the routing to the PCH. |
| In such a case a quirk to disable boot interrupt generation can be |
| added. [1]_ |
| |
| Intel® 6300ESB I/O Controller Hub |
| Alternate Base Address Register: |
| BIE: Boot Interrupt Enable |
| |
| == =========================== |
| 0 Boot interrupt is enabled. |
| 1 Boot interrupt is disabled. |
| == =========================== |
| |
| Intel® Sandy Bridge through Skylake based Xeon servers: |
| Coherent Interface Protocol Interrupt Control |
| dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2: |
| When this bit is set, local INTx messages received from the |
| Intel® Quick Data DMA/PCI Express ports are not routed to the legacy |
| PCH. They are either converted into MSI via the integrated IO-APIC |
| (if the IO-APIC mask bit is clear in the appropriate entries) |
| or cause no further action (when the mask bit is set). |
| |
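A quirk of this kind boils down to a read-modify-write of one configuration
register. The following is a userspace sketch under stated assumptions: the
configuration space is modeled as a plain byte array, all names are
hypothetical, and the offset and bit position are placeholders. The real
values come from the chipset datasheet and drivers/pci/quirks.c.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder register location and bit; not authoritative values. */
#define MODEL_INTRC_OFFSET  0x14Cu
#define MODEL_DIS_INTX_BIT  (1u << 25)

/* Hypothetical stand-in for a device's PCIe configuration space. */
struct model_cfg_space {
    uint8_t bytes[4096];
};

/* Read a 32-bit value at the given byte offset (host byte order). */
uint32_t model_read_dword(const struct model_cfg_space *c, unsigned int off)
{
    uint32_t v;

    memcpy(&v, &c->bytes[off], sizeof(v));
    return v;
}

/* Write a 32-bit value at the given byte offset (host byte order). */
void model_write_dword(struct model_cfg_space *c, unsigned int off, uint32_t v)
{
    memcpy(&c->bytes[off], &v, sizeof(v));
}

/* Set the routing-disable bit while preserving every other bit. */
void model_disable_boot_interrupt(struct model_cfg_space *c)
{
    uint32_t v = model_read_dword(c, MODEL_INTRC_OFFSET);

    v |= MODEL_DIS_INTX_BIT;
    model_write_dword(c, MODEL_INTRC_OFFSET, v);
}
```

Because the helper only ORs in a single bit, calling it more than once
leaves the register unchanged after the first call.
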
| In the absence of a way to directly disable the routing, another approach |
| has been to make use of PCI Interrupt pin to INTx routing tables for |
| purposes of redirecting the interrupt handler to the rerouted interrupt |
| line by default. Therefore, on chipsets where this INTx routing cannot be |
| disabled, the Linux kernel will reroute the valid interrupt to its legacy |
| interrupt. This redirection of the handler will prevent the occurrence of |
| the spurious interrupt detection which would ordinarily disable the IRQ |
| line due to excessive unhandled counts. [2]_ |
| |
| The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or |
| disable) the redirection of the interrupt handler to the PCH interrupt |
| line. The option can be overridden by either pci=ioapicreroute or |
| pci=noioapicreroute. [3]_ |
| |
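How the build-time default and the boot parameters combine can be sketched
as follows. This is an illustrative model only: model_reroute_enabled is a
hypothetical helper, the "last option wins" behavior is an assumption of the
sketch, and the kernel's actual parsing of pci= options may differ in
detail.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Resolve whether handler rerouting is active.
 *
 * config_default models X86_REROUTE_FOR_BROKEN_BOOT_IRQS (true = enabled).
 * pci_opts models the comma-separated value of the pci= boot parameter,
 * or NULL when the parameter is absent.
 */
bool model_reroute_enabled(bool config_default, const char *pci_opts)
{
    bool enabled = config_default;
    char buf[256];
    char *tok;

    if (!pci_opts)
        return enabled;

    /* Work on a copy because strtok() modifies its input. */
    snprintf(buf, sizeof(buf), "%s", pci_opts);

    for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        if (strcmp(tok, "ioapicreroute") == 0)
            enabled = true;
        else if (strcmp(tok, "noioapicreroute") == 0)
            enabled = false;
        /* Unrelated pci= options are ignored by this sketch. */
    }
    return enabled;
}
```

With this model, a kernel built without the config option still reroutes
when booted with pci=ioapicreroute, and a kernel built with it stops
rerouting when booted with pci=noioapicreroute.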
| |
| More Documentation |
| ================== |
| |
| Several datasheets give an overview of the legacy interrupt handling |
| (6300ESB and 6700PXH below). While largely the same, they provide insight |
| into how this handling has evolved across chipsets. |
| |
| Example of disabling of the boot interrupt |
| ------------------------------------------ |
| |
| - Intel® 6300ESB I/O Controller Hub (Document # 300641-004US) |
| 5.7.3 Boot Interrupt |
| https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf |
| |
| - Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families |
| Datasheet - Volume 2: Registers (Document # 330784-003) |
| 6.6.41 cipintrc Coherent Interface Protocol Interrupt Control |
| https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf |
| |
| Example of handler rerouting |
| ---------------------------- |
| |
| - Intel® 6700PXH 64-bit PCI Hub (Document # 302628) |
| 2.15.2 PCI Express Legacy INTx Support and Boot Interrupt |
| https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf |
| |
| |
| If you have any legacy PCI interrupt questions that aren't answered, email me. |
| |
| Cheers, |
| Sean V Kelley |
| sean.v.kelley@linux.intel.com |
| |
| .. [1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/ |
| .. [2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/ |
| .. [3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/ |