| .. SPDX-License-Identifier: GPL-2.0 |
| |
| ============================ |
| PCI Peer-to-Peer DMA Support |
| ============================ |
| |
| The PCI bus has pretty decent support for performing DMA transfers |
| between two devices on the bus. This type of transaction is henceforth |
| called Peer-to-Peer (or P2P). However, there are a number of issues that |
| make P2P transactions tricky to do in a perfectly safe way. |
| |
| One of the biggest issues is that PCI doesn't require forwarding |
| transactions between hierarchy domains, and in PCIe, each Root Port |
| defines a separate hierarchy domain. To make things worse, there is no |
| simple way to determine if a given Root Complex supports this or not. |
| (See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel |
| only supports doing P2P when the endpoints involved are all behind the |
| same PCI bridge, as such devices are all in the same PCI hierarchy |
| domain, and the spec guarantees that all transactions within the |
| hierarchy will be routable, but it does not require routing |
| between hierarchies. |
| |
| The second issue is that to make use of existing interfaces in Linux, |
| memory that is used for P2P transactions needs to be backed by struct |
| pages. However, PCI BARs are not typically cache coherent so there are |
| a few corner case gotchas with these pages so developers need to |
| be careful about what they do with them. |
| |
| |
| Driver Writer's Guide |
| ===================== |
| |
| In a given P2P implementation there may be three or more different |
| types of kernel drivers in play: |
| |
| * Provider - A driver which provides or publishes P2P resources like |
| memory or doorbell registers to other drivers. |
| * Client - A driver which makes use of a resource by setting up a |
| DMA transaction to or from it. |
| * Orchestrator - A driver which orchestrates the flow of data between |
| clients and providers. |
| |
| In many cases there could be overlap between these three types (i.e., |
| it may be typical for a driver to be both a provider and a client). |
| |
| For example, in the NVMe Target Copy Offload implementation: |
| |
| * The NVMe PCI driver is both a client, provider and orchestrator |
| in that it exposes any CMB (Controller Memory Buffer) as a P2P memory |
| resource (provider), it accepts P2P memory pages as buffers in requests |
| to be used directly (client) and it can also make use of the CMB as |
| submission queue entries (orchastrator). |
| * The RDMA driver is a client in this arrangement so that an RNIC |
| can DMA directly to the memory exposed by the NVMe device. |
| * The NVMe Target driver (nvmet) can orchestrate the data from the RNIC |
| to the P2P memory (CMB) and then to the NVMe device (and vice versa). |
| |
| This is currently the only arrangement supported by the kernel but |
| one could imagine slight tweaks to this that would allow for the same |
| functionality. For example, if a specific RNIC added a BAR with some |
| memory behind it, its driver could add support as a P2P provider and |
| then the NVMe Target could use the RNIC's memory instead of the CMB |
| in cases where the NVMe cards in use do not have CMB support. |
| |
| |
| Provider Drivers |
| ---------------- |
| |
| A provider simply needs to register a BAR (or a portion of a BAR) |
| as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`. |
| This will register struct pages for all the specified memory. |
| |
| After that it may optionally publish all of its resources as |
| P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow |
| any orchestrator drivers to find and use the memory. When marked in |
| this way, the resource must be regular memory with no side effects. |
| |
| For the time being this is fairly rudimentary in that all resources |
| are typically going to be P2P memory. Future work will likely expand |
| this to include other types of resources like doorbells. |
| |
| |
| Client Drivers |
| -------------- |
| |
| A client driver typically only has to conditionally change its DMA map |
| routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead |
| of the usual :c:func:`dma_map_sg()` function. Memory mapped in this |
| way does not need to be unmapped. |
| |
| The client may also, optionally, make use of |
| :c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping |
| functions and when to use the regular mapping functions. In some |
| situations, it may be more appropriate to use a flag to indicate a |
| given request is P2P memory and map appropriately. It is important to |
| ensure that struct pages that back P2P memory stay out of code that |
| does not have support for them as other code may treat the pages as |
| regular memory which may not be appropriate. |
| |
| |
| Orchestrator Drivers |
| -------------------- |
| |
| The first task an orchestrator driver must do is compile a list of |
| all client devices that will be involved in a given transaction. For |
| example, the NVMe Target driver creates a list including the namespace |
| block device and the RNIC in use. If the orchestrator has access to |
| a specific P2P provider to use it may check compatibility using |
| :c:func:`pci_p2pdma_distance()` otherwise it may find a memory provider |
| that's compatible with all clients using :c:func:`pci_p2pmem_find()`. |
| If more than one provider is supported, the one nearest to all the clients will |
| be chosen first. If more than one provider is an equal distance away, the |
| one returned will be chosen at random (it is not an arbitrary but |
| truely random). This function returns the PCI device to use for the provider |
| with a reference taken and therefore when it's no longer needed it should be |
| returned with pci_dev_put(). |
| |
| Once a provider is selected, the orchestrator can then use |
| :c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to |
| allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()` |
| and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for |
| allocating scatter-gather lists with P2P memory. |
| |
| Struct Page Caveats |
| ------------------- |
| |
| Driver writers should be very careful about not passing these special |
| struct pages to code that isn't prepared for it. At this time, the kernel |
| interfaces do not have any checks for ensuring this. This obviously |
| precludes passing these pages to userspace. |
| |
| P2P memory is also technically IO memory but should never have any side |
| effects behind it. Thus, the order of loads and stores should not be important |
| and ioreadX(), iowriteX() and friends should not be necessary. |
| However, as the memory is not cache coherent, if access ever needs to |
| be protected by a spinlock then :c:func:`mmiowb()` must be used before |
| unlocking the lock. (See ACQUIRES VS I/O ACCESSES in |
| Documentation/memory-barriers.txt) |
| |
| |
| P2P DMA Support Library |
| ======================= |
| |
| .. kernel-doc:: drivers/pci/p2pdma.c |
| :export: |