.. SPDX-License-Identifier: GPL-2.0
.. Vaibhav Jain, 58b278f, 2019-08-28 13:57:29 +0530

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
=========

Virtualization on 64-bit Power Book3S platforms is based on the PAPR
specification [1]_, which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR-compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions or LPARs). It
  supports the full PAPR specification.

- **QEMU/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host.
  It implements only a subset of the PAPR specification, called LoPAPR [2]_.

On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is called
a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0) and must
issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or for other services managed by the
hypervisor.

Hence a hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on behalf of the guest.
The guest issues a hcall with the necessary input operands. After performing
the privileged operation, the hypervisor returns a status code and output
operands back to the guest.

HCALL ABI
=========
The ABI specification for a hcall between a pseries guest and a PAPR hypervisor
is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context
is done via the instruction **HVSC**, which expects the opcode for the hcall to
be set in *r3* and any in-arguments for the hcall to be provided in registers
*r4-r12*. If values have to be passed through a memory buffer, the data stored
in that buffer should be in big-endian byte order.

Once control returns to the guest after the hypervisor has serviced the
'HVSC' instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments, any out
values stored in a memory buffer will be in big-endian byte order.

The powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
defined in an arch-specific header [4]_, to issue hcalls from a Linux kernel
running as a pseries guest.

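The calling convention described above can be modeled in a few lines of C.
The sketch below is illustrative only: ``struct hcall_regs``,
``hcall_marshal()`` and ``hcall_status()`` are hypothetical names, not kernel
APIs, and the code merely fills an in-memory register image rather than
executing the hypervisor call instruction. It assumes at most nine
in-arguments, matching registers *r4-r12*.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the guest GPRs r0..r12 around a hcall. */
struct hcall_regs {
	uint64_t gpr[13];
};

#define HCALL_MAX_ARGS 9	/* r4..r12 inclusive */

/*
 * Place the opcode in r3 and up to nine in-arguments in r4-r12.
 * Returns 0 on success, -1 if too many arguments were supplied.
 */
static int hcall_marshal(struct hcall_regs *regs, uint64_t opcode,
			 const uint64_t *args, size_t nargs)
{
	size_t i;

	assert(regs != NULL);
	if (nargs > HCALL_MAX_ARGS)
		return -1;

	regs->gpr[3] = opcode;
	for (i = 0; i < nargs; i++)
		regs->gpr[4 + i] = args[i];
	return 0;
}

/* After the hypervisor returns, the status code is read back from r3. */
static uint64_t hcall_status(const struct hcall_regs *regs)
{
	return regs->gpr[3];
}
```

For example, marshalling an arbitrary opcode value with two arguments leaves
the opcode in ``gpr[3]`` and the arguments in ``gpr[4]`` and ``gpr[5]``. In the
kernel itself the **plpar_hcall_xxx** wrappers [4]_ take care of this step.
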
Register Conventions
====================

Any hcall should follow the same register conventions as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_. The
table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register | Volatile | Purpose                                   |
| Range    | (Y/N)    |                                           |
+==========+==========+===========================================+
| r0       | Y        | Optional-usage                            |
+----------+----------+-------------------------------------------+
| r1       | N        | Stack Pointer                             |
+----------+----------+-------------------------------------------+
| r2       | N        | TOC                                       |
+----------+----------+-------------------------------------------+
| r3       | Y        | hcall opcode/return value                 |
+----------+----------+-------------------------------------------+
| r4-r10   | Y        | in and out values                         |
+----------+----------+-------------------------------------------+
| r11      | Y        | Optional-usage/Environmental pointer      |
+----------+----------+-------------------------------------------+
| r12      | Y        | Optional-usage/Function entry address at  |
|          |          | global entry point                        |
+----------+----------+-------------------------------------------+
| r13      | N        | Thread-Pointer                            |
+----------+----------+-------------------------------------------+
| r14-r31  | N        | Local Variables                           |
+----------+----------+-------------------------------------------+
| LR       | Y        | Link Register                             |
+----------+----------+-------------------------------------------+
| CTR      | Y        | Loop Counter                              |
+----------+----------+-------------------------------------------+
| XER      | Y        | Fixed-point exception register.           |
+----------+----------+-------------------------------------------+
| CR0-1    | Y        | Condition register fields.                |
+----------+----------+-------------------------------------------+
| CR2-4    | N        | Condition register fields.                |
+----------+----------+-------------------------------------------+
| CR5-7    | Y        | Condition register fields.                |
+----------+----------+-------------------------------------------+
| Others   | N        |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                   Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor terms shared hardware resources such as PCI devices and
NVDIMMs that are available for use by LPARs as Dynamic Resources (DRs). When a
DR is allocated to an LPAR, PHYP creates a data structure called a Dynamic
Resource Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via an
opaque 32-bit number called the DRC-Index. The DRC-Index value is provided to
the LPAR via the device tree, where it is present as an attribute in the device
tree node associated with the DR.

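Device-tree property values are stored as big-endian 32-bit cells, so a
DRC-Index read from raw property bytes has to be converted to host byte order
before use. The following is a minimal sketch of that conversion;
``drc_index_from_cell()`` is a hypothetical helper and the sample bytes in the
usage note are made up. Kernel code would normally rely on an existing accessor
such as ``of_property_read_u32()`` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 32-bit big-endian device-tree cell into a host integer. */
static uint32_t drc_index_from_cell(const uint8_t cell[4])
{
	assert(cell != NULL);
	return ((uint32_t)cell[0] << 24) | ((uint32_t)cell[1] << 16) |
	       ((uint32_t)cell[2] << 8)  |  (uint32_t)cell[3];
}
```

Given the bytes ``44 10 00 02``, the helper yields ``0x44100002`` regardless of
the endianness of the machine it runs on, because it assembles the value byte
by byte instead of reinterpreting memory.
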
HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return value in *r3*,
indicating success or failure of the hcall. In case of a failure, an error code
indicates the cause of the error. These codes are defined and documented in the
arch-specific header [4]_.

In some cases a hcall can potentially take a long time and needs to be issued
multiple times in order to be completely serviced. These hcalls will usually
accept an opaque value *continue-token* within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet finished
servicing the hcall.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor-returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a value other than
*H_CONTINUE*.

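The retry protocol above can be sketched as a simple loop. Everything in this
example is a stand-in: ``demo_long_hcall()`` simulates a hypervisor that needs
several calls to finish, and the ``DEMO_H_*`` status values are placeholders
rather than the real codes from the header [4]_. The caller treats the token as
opaque and only passes back whatever it last received.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder status codes; the real values live in the arch header. */
enum { DEMO_H_SUCCESS = 0, DEMO_H_CONTINUE = 1 };

/* Hypothetical hypervisor stand-in: finishes after 'steps_needed' calls. */
struct demo_hv {
	unsigned int steps_needed;
	unsigned int calls_seen;
};

static long demo_long_hcall(struct demo_hv *hv, uint64_t token_in,
			    uint64_t *token_out)
{
	hv->calls_seen++;
	/* The token is opaque to the guest; here it merely counts progress. */
	*token_out = token_in + 1;
	return (*token_out >= hv->steps_needed) ? DEMO_H_SUCCESS
						: DEMO_H_CONTINUE;
}

/* Guest side: start with token 0, feed back the returned token each time. */
static long demo_issue_until_done(struct demo_hv *hv)
{
	uint64_t token = 0;
	long rc;

	assert(hv != NULL);
	do {
		rc = demo_long_hcall(hv, token, &token);
	} while (rc == DEMO_H_CONTINUE);

	return rc;
}
```

A real caller would issue the hcall through the **plpar_hcall_xxx** wrappers
and would also have to handle the other return codes an hcall documents, which
this loop deliberately leaves out.
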
HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP. For the
corresponding opcode values please look into the arch-specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N bytes from the metadata area
associated with it, at a specified offset, and copy them to the provided buffer.
The metadata area stores configuration information such as label information,
bad blocks etc. The metadata area is located out-of-band of the NVDIMM storage
area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N bytes to the metadata area
associated with it, at the specified offset, from the provided buffer.

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor assigns a
target address to the guest. The HCALL can fail if the guest has an active PTE
entry to the SCM block being bound.

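As an illustration of the H_SCM_BIND_MEM arguments described above, the sketch
below computes the end of the requested block range and checks for the special
address value that lets the hypervisor choose the mapping address. The helper
names are hypothetical, and treating the end index as exclusive is an
assumption; the text does not say whether the upper bound is inclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel from the text: ask the hypervisor to choose the target address. */
#define SCM_BIND_ANY_ADDR UINT64_C(0xFFFFFFFFFFFFFFFF)

/*
 * Compute the (assumed exclusive) end index of the block range to bind.
 * Returns false if the range is empty or the end would overflow.
 */
static bool scm_block_range_end(uint64_t start, uint64_t nblocks,
				uint64_t *end)
{
	assert(end != NULL);
	if (nblocks == 0 || start > UINT64_MAX - nblocks)
		return false;
	*end = start + nblocks;
	return true;
}

/* True when the guest leaves the address choice to the hypervisor. */
static bool scm_bind_addr_is_hv_assigned(uint64_t target_addr)
{
	return target_addr == SCM_BIND_ANY_ADDR;
}
```

Validating the range on the guest side before issuing the hcall is a design
choice of this sketch, not a requirement stated by the text; the hypervisor
reports invalid parameters through its own return codes.
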
**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and SCM block that are
mapped to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap from the LPAR memory either all SCM blocks
belonging to all NVDIMMs or all SCM blocks belonging to a single NVDIMM
identified by its drcIndex.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap, health-bit-valid-bitmap*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on predictive failure and the overall
health of the NVDIMM. Each asserted bit in the health-bitmap indicates a single
predictive failure, and the health-bit-valid-bitmap indicates which bits in the
health-bitmap are valid.

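One way a guest might combine the two bitmaps is to mask the health bits with
the validity bits before interpreting them. This sketch assumes each bitmap
fits in a single 64-bit register value and makes no assumption about what
individual bits mean; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the health bits that the validity bitmap marks as meaningful. */
static uint64_t scm_valid_health_bits(uint64_t health_bitmap,
				      uint64_t valid_bitmap)
{
	return health_bitmap & valid_bitmap;
}

/* Count asserted, valid bits (each one stands for one reported failure). */
static unsigned int scm_count_failures(uint64_t health_bitmap,
				       uint64_t valid_bitmap)
{
	uint64_t bits = scm_valid_health_bits(health_bitmap, valid_bitmap);
	unsigned int count = 0;

	while (bits) {
		bits &= bits - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}
```

Bits that are asserted in the health-bitmap but not marked valid are ignored,
so a validity bitmap of zero always yields a count of zero.
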
**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture