 .. SPDX-License-Identifier: GPL-2.0

 ============
 Introduction
 ============

 The Linux compute accelerators subsystem is designed to expose compute
 accelerators in a common way to user-space and provide a common set of
 functionality.

 These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
 Although these devices are typically designed to accelerate
 Machine-Learning (ML) and/or Deep-Learning (DL) computations, the accel layer
 is not limited to handling these types of accelerators.

 Typically, a compute accelerator will belong to one of the following
 categories:

 - Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
   or an IP block inside an SoC (e.g. a laptop web camera). These devices
   are typically configured using registers and can work with or without DMA.

 - Inference data-center - single-user or multi-user devices in a large server.
   This type of device can be stand-alone or an IP block inside an SoC or a
   GPU. It will have on-board DRAM (to hold the DL topology), DMA engines and
   command submission queues (either kernel or user-space queues).
   It might also have an MMU to manage multiple users and might also enable
   virtualization (SR-IOV) to support multiple VMs on the same device. In
   addition, these devices will usually have some tools, such as a profiler
   and a debugger.

 - Training data-center - similar to inference data-center cards, but these
   typically have more computational power and memory bandwidth (e.g. HBM) and
   will likely have a method of scaling up/out, i.e. connecting to other
   training cards inside the server or in other servers, respectively.

 All these devices typically have different runtime user-space software stacks
 that are tailor-made to their hardware. In addition, they will also probably
 include a compiler to generate programs for their custom-made computational
 engines. Typically, the common layer in user-space will be the DL frameworks,
 such as PyTorch and TensorFlow.

 Sharing code with DRM
 =====================

 Because these types of devices can be IP blocks inside GPUs or have
 characteristics similar to those of GPUs, the accel subsystem will use the
 DRM subsystem's code and functionality, i.e. the accel core code will
 be part of the DRM subsystem and an accel device will be a new type of DRM
 device.

 This will allow us to leverage the extensive DRM code-base and
 collaborate with DRM developers who have experience with these types of
 devices. In addition, new features that are added for the accelerator
 drivers can be of use to GPU drivers as well.

 Differentiation from GPUs
 =========================

 Because we want to prevent the extensive user-space graphics software stack
 from trying to use an accelerator as a GPU, the compute accelerators will be
 differentiated from GPUs by using a new major number and new device char files.

 Furthermore, the drivers will be located in a separate place in the kernel
 tree - drivers/accel/.

 The accelerator devices will be exposed to user space with the dedicated
 major number 261 and will follow this convention:

 - device char files - /dev/accel/accel\*
 - sysfs             - /sys/class/accel/accel\*/
 - debugfs           - /sys/kernel/debug/accel/\*/
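
 The naming convention above can be checked from user space. The following is
 a minimal sketch, not part of the accel API: the helper names
 (accel_node_path, accel_rdev_is_accel, accel_path_is_accel_node) are
 hypothetical, and it assumes that the ``*`` in the device node name is a
 decimal index such as 0. Only the 261 major number and the /dev/accel/ path
 come from the text above.

 .. code-block:: c

    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/sysmacros.h>
    #include <sys/types.h>

    /* Major number dedicated to accel char devices, as stated above. */
    #define ACCEL_MAJOR 261

    /*
     * Build "/dev/accel/accel<index>" into buf.
     * Returns 0 on success, -1 if the buffer is too small.
     * Assumption: nodes are numbered with a decimal index.
     */
    static int accel_node_path(unsigned int index, char *buf, size_t len)
    {
        int n = snprintf(buf, len, "/dev/accel/accel%u", index);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /* True if the device number carries the accel major. */
    static bool accel_rdev_is_accel(dev_t rdev)
    {
        return major(rdev) == ACCEL_MAJOR;
    }

    /* True if path exists and is a char device with the accel major. */
    static bool accel_path_is_accel_node(const char *path)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return false;
        return S_ISCHR(st.st_mode) && accel_rdev_is_accel(st.st_rdev);
    }

 A caller could build the path for index 0 with accel_node_path() and pass it
 to accel_path_is_accel_node() before opening the node, so that a graphics
 device node is never mistaken for an accelerator.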

 Getting Started
 ===============

 First, read the DRM documentation at Documentation/gpu/index.rst.
 Not only does it explain how to write a new DRM driver, it also
 contains all the information on how to contribute, the Code of Conduct and
 the coding style and documentation conventions. All of that is the same for
 the accel subsystem.

 Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.

 To expose your device as an accelerator, two changes are needed
 in your driver (as opposed to a standard DRM driver):

 - Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
   driver_features field. It is important to note that this driver feature is
   mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
   to expose both graphics and compute device char files should be handled by
   two drivers that are connected using the auxiliary bus framework.

 - Change the open callback in your driver fops structure to accel_open().
   Alternatively, your driver can use the DEFINE_DRM_ACCEL_FOPS macro to easily
   set up a file operations structure with the correct function pointers.
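
 The mutual-exclusion rule for DRIVER_COMPUTE_ACCEL can be illustrated with a
 small user-space model. This is only a sketch of the rule as described above,
 not kernel code: the SKETCH_* bit values are stand-ins chosen for the example
 (real drivers take the DRIVER_* flags from the DRM headers), and
 sketch_features_valid() is a hypothetical helper.

 .. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-in bit values for this sketch only, not the kernel's values. */
    #define SKETCH_DRIVER_MODESET        (1u << 0)
    #define SKETCH_DRIVER_RENDER         (1u << 1)
    #define SKETCH_DRIVER_COMPUTE_ACCEL  (1u << 2)

    /*
     * Return false when a feature mask combines the compute accel flag
     * with either of the graphics flags, true otherwise.
     */
    static bool sketch_features_valid(uint32_t features)
    {
        const uint32_t graphics = SKETCH_DRIVER_MODESET | SKETCH_DRIVER_RENDER;

        if ((features & SKETCH_DRIVER_COMPUTE_ACCEL) && (features & graphics))
            return false;
        return true;
    }

 Under this model, a mask with only the compute accel flag is accepted, while
 a mask that also sets the render or modeset flag is rejected. That is why a
 device exposing both kinds of char files needs two drivers.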

 External References
 ===================

 Email threads
 -------------

 * `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
 * `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)

 Conference talks
 ----------------

 * `LPC 2022 Accelerators BOF outcomes summary <https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html>`_ - Dave Airlie (2022)