| .. SPDX-License-Identifier: GPL-2.0 |
| |
| =========================== |
| The KVM halt polling system |
| =========================== |
| |
| The KVM halt polling system can, under some circumstances, reduce the latency |
| of a guest by polling in the host for some time after the guest has elected |
| to no longer run by ceding. That is, when a guest vcpu has ceded, or in the |
| case of powerpc when all of the vcpus of a single vcore have ceded, the host |
| kernel polls for wakeup conditions before giving up the cpu to the scheduler |
| in order to let something else run. |
| |
| Polling provides a latency advantage in cases where the guest can be run again |
| very quickly, because it at least saves a trip through the scheduler, which |
| normally costs on the order of a few microseconds, although performance |
| benefits are workload dependent. If no wakeup source arrives during the |
| polling interval, or some other task on the runqueue becomes runnable, the |
| scheduler is invoked. Halt polling is therefore especially useful for |
| workloads with very short wakeup periods, where the time spent halt polling |
| is minimised and the time saved by not invoking the scheduler is noticeable. |
| |
| The generic halt polling code is implemented in: |
| |
| virt/kvm/kvm_main.c: kvm_vcpu_block() |
| |
| The powerpc kvm-hv specific case is implemented in: |
| |
| arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked() |
| |
| Halt Polling Interval |
| ===================== |
| |
| The maximum time for which to poll before invoking the scheduler, referred to |
| as the halt polling interval, is increased and decreased based on the perceived |
| effectiveness of the polling in an attempt to limit pointless polling. |
| This value is stored in either the vcpu struct: |
| |
| kvm_vcpu->halt_poll_ns |
| |
| or in the case of powerpc kvm-hv, in the vcore struct: |
| |
| kvmppc_vcore->halt_poll_ns |
| |
| Thus this is a per-vcpu (or per-vcore) value. |
| |
| If a wakeup source is received within the halt polling interval, the interval |
| is left unchanged. If no wakeup source is received during the polling |
| interval (and thus schedule is invoked), there are two possibilities: either |
| the polling interval and total block time[0] were both less than the global |
| max polling interval (see module params below), or the total block time was |
| greater than the global max polling interval. |
| |
| If both the polling interval and total block time were less than the global |
| max polling interval, the polling interval can be increased in the hope that |
| next time, during the longer polling interval, the wakeup source will be |
| received while the host is polling and the latency benefit will be obtained. |
| The polling interval is grown in the function grow_halt_poll_ns(), which |
| multiplies it by the module parameter halt_poll_ns_grow. When growing from |
| zero, the interval is instead set to the module parameter |
| halt_poll_ns_grow_start. |
| |
| If the total block time was greater than the global max polling interval, the |
| host will never poll for long enough (limited by the global max) to catch the |
| wakeup during the polling interval, so the polling interval may as well be |
| shrunk in order to avoid pointless polling. The polling interval is shrunk in |
| the function shrink_halt_poll_ns() and is divided by the module parameter |
| halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0. |
| |
| It is worth noting that this adjustment process attempts to home in on some |
| steady state polling interval, but will only really do a good job for wakeups |
| which come at an approximately constant rate. Otherwise the polling interval |
| will be adjusted constantly. |
| |
| [0] total block time: |
| the time between when the halt polling function is |
| invoked and when a wakeup source is received (irrespective |
| of whether the scheduler is invoked within that function). |
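| |
| The adjustment rules described in this section can be modelled in a few |
| lines. The following is a minimal sketch in Python, not the kernel code: the |
| names PollParams, grow, shrink, adjust and simulate are hypothetical, the |
| global max of 200000 ns is an illustrative value (the real default is per |
| arch), and clamping the grown interval to the global max is an assumption |
| taken from the "ceiling value" wording in the module parameter table. |
| |

```python
from dataclasses import dataclass


@dataclass
class PollParams:
    """Stand-ins for the module parameters (hypothetical container)."""
    halt_poll_ns: int = 200_000            # global max; illustrative value only
    halt_poll_ns_grow: int = 2
    halt_poll_ns_grow_start: int = 10_000
    halt_poll_ns_shrink: int = 0


def grow(interval_ns: int, p: PollParams) -> int:
    """Multiply by the grow factor; growing from zero starts at grow_start."""
    if interval_ns == 0:
        grown = p.halt_poll_ns_grow_start
    else:
        grown = interval_ns * p.halt_poll_ns_grow
    # Assumption: the global max acts as a ceiling on the per-vcpu interval.
    return min(grown, p.halt_poll_ns)


def shrink(interval_ns: int, p: PollParams) -> int:
    """Divide by the shrink factor, or reset to 0 when the factor is 0."""
    if p.halt_poll_ns_shrink == 0:
        return 0
    return interval_ns // p.halt_poll_ns_shrink


def adjust(interval_ns: int, block_ns: int, p: PollParams) -> int:
    """Return the next polling interval after one halt lasting block_ns."""
    if block_ns <= interval_ns:
        return interval_ns                 # woken while polling: unchanged
    if block_ns > p.halt_poll_ns:
        return shrink(interval_ns, p)      # long block: polling was pointless
    if interval_ns < p.halt_poll_ns and block_ns < p.halt_poll_ns:
        return grow(interval_ns, p)        # short block: poll longer next time
    return interval_ns


def simulate(block_times_ns, p: PollParams, interval_ns: int = 0):
    """Apply adjust() to a sequence of block times and record each interval."""
    history = []
    for block_ns in block_times_ns:
        interval_ns = adjust(interval_ns, block_ns, p)
        history.append(interval_ns)
    return history
```

| |
| With these illustrative values, simulate([30_000] * 5, PollParams()) gives |
| [10000, 20000, 40000, 40000, 40000]: the interval grows until it covers the |
| 30000 ns block time and then stays put, which is the steady state the text |
| describes for wakeups arriving at a constant rate. |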
| |
| Module Parameters |
| ================= |
| |
| The kvm module has 4 tuneable module parameters to adjust the global max |
| polling interval, the rate at which the polling interval is grown and shrunk, |
| and the value from which it starts to grow. These variables are defined in |
| include/linux/kvm_host.h and as module parameters in virt/kvm/kvm_main.c, or |
| arch/powerpc/kvm/book3s_hv.c in the powerpc kvm-hv case. |
| |
| +-----------------------+---------------------------+-------------------------+ |
| |Module Parameter | Description | Default Value | |
| +-----------------------+---------------------------+-------------------------+ |
| |halt_poll_ns | The global max polling | KVM_HALT_POLL_NS_DEFAULT| |
| | | interval which defines | | |
| | | the ceiling value of the | | |
| | | polling interval for | (per arch value) | |
| | | each vcpu. | | |
| +-----------------------+---------------------------+-------------------------+ |
| |halt_poll_ns_grow | The value by which the | 2 | |
| | | halt polling interval is | | |
| | | multiplied in the | | |
| | | grow_halt_poll_ns() | | |
| | | function. | | |
| +-----------------------+---------------------------+-------------------------+ |
| |halt_poll_ns_grow_start| The initial value to grow | 10000 | |
| | | to from zero in the | | |
| | | grow_halt_poll_ns() | | |
| | | function. | | |
| +-----------------------+---------------------------+-------------------------+ |
| |halt_poll_ns_shrink | The value by which the | 0 | |
| | | halt polling interval is | | |
| | | divided in the | | |
| | | shrink_halt_poll_ns() | | |
| | | function. | | |
| +-----------------------+---------------------------+-------------------------+ |
| |
| These module parameters can be set from the sysfs files in: |
| |
| /sys/module/kvm/parameters/ |
| |
| Note: these module parameters are system-wide values and cannot be tuned on a |
| per-VM basis. |
| |
| Any changes to these parameters will be picked up by new and existing vCPUs the |
| next time they halt, with the notable exception of VMs using KVM_CAP_HALT_POLL |
| (see next section). |
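| |
| As a rough illustration of working with these sysfs files, the sketch below |
| reads and writes the four parameters listed in the table. The helper names |
| read_halt_poll_params and write_halt_poll_param are hypothetical, and the |
| sketch assumes each file holds a single unsigned integer. Writing normally |
| requires root privileges. |
| |

```python
from pathlib import Path

# Parameter names taken from the table above.
PARAM_NAMES = (
    "halt_poll_ns",
    "halt_poll_ns_grow",
    "halt_poll_ns_grow_start",
    "halt_poll_ns_shrink",
)

DEFAULT_BASE = "/sys/module/kvm/parameters"


def read_halt_poll_params(base=DEFAULT_BASE):
    """Return {name: int} for each halt polling parameter file found in base."""
    params = {}
    for name in PARAM_NAMES:
        path = Path(base) / name
        if path.is_file():
            # Assumption: the file contains one decimal integer and a newline.
            params[name] = int(path.read_text().strip())
    return params


def write_halt_poll_param(name, value, base=DEFAULT_BASE):
    """Write one halt polling parameter; rejects names not in the table."""
    if name not in PARAM_NAMES:
        raise ValueError(f"unknown halt polling parameter: {name}")
    (Path(base) / name).write_text(f"{int(value)}\n")
```

| |
| The base argument exists only so that the helpers can be exercised against a |
| scratch directory. On a system without the kvm module loaded, |
| read_halt_poll_params() simply returns an empty dict. |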
| |
| KVM_CAP_HALT_POLL |
| ================= |
| |
| KVM_CAP_HALT_POLL is a VM capability that allows userspace to override halt_poll_ns |
| on a per-VM basis. VMs using KVM_CAP_HALT_POLL ignore halt_poll_ns completely (but |
| still obey halt_poll_ns_grow, halt_poll_ns_grow_start, and halt_poll_ns_shrink). |
| |
| See Documentation/virt/kvm/api.rst for more information on this capability. |
| |
| Further Notes |
| ============= |
| |
| - Care should be taken when setting the halt_poll_ns module parameter, as a large value |
| has the potential to drive the cpu usage to 100% on a machine which would be almost |
| entirely idle otherwise. This is because even if a guest has wakeups during which very |
| little work is done and which are quite far apart, if the period between wakeups is |
| shorter than the global max polling interval (halt_poll_ns) then the host will always |
| poll for the entire block time and thus cpu utilisation will go to 100%. |
| |
| - Halt polling essentially presents a trade-off between power usage and latency, and |
| the module parameters should be used to tune the balance between the two. Idle cpu |
| time is essentially converted to host kernel time with the aim of decreasing latency |
| when entering the guest. |
| |
| - Halt polling will only be conducted by the host when no other tasks are runnable on |
| that cpu. Otherwise the polling will cease immediately and schedule will be invoked to |
| allow that other task to run. Halt polling therefore does not allow a guest to cause |
| denial of service of the cpu. |
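| |
| The first note above can be made concrete with a back-of-the-envelope |
| calculation. The sketch below is a simplification, not a measurement: the |
| function name polled_fraction is hypothetical, and it assumes the guest does |
| negligible work per wakeup and that the per-vcpu polling interval is already |
| at the given value. |
| |

```python
def polled_fraction(block_ns: int, poll_interval_ns: int) -> float:
    """Fraction of one block that the host spends polling instead of idling."""
    if block_ns <= 0:
        raise ValueError("block_ns must be positive")
    # The host polls until the wakeup arrives or the interval expires.
    return min(block_ns, poll_interval_ns) / block_ns
```

| |
| For example, a guest that wakes every 150000 ns while the polling interval is |
| 200000 ns gives polled_fraction(150_000, 200_000) == 1.0, so an otherwise |
| idle cpu is busy for the whole block time. With no polling, |
| polled_fraction(150_000, 0) == 0.0. |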