KVM: arm64: Refactor the guest teardown path

The __pkvm_teardown_vm hypercall can take a long time: I have measured
durations of 150+ms on Pixel 6. The vast majority of that time is
spent walking the guest stage-2 page-table to put its pages in the
'pending reclaim' state, which was introduced to allow poisoning the
pages asynchronously. Given that pKVM is fundamentally non-preemptible,
those 150+ms are not acceptable.

To spread the work across multiple smaller sections, let's split the
teardown procedure in two. A first hypercall will be used to place a VM
in a 'dying' state after all the required sanity checks have been done
(e.g. checking that no vCPUs are currently loaded). Once in a dying
state, the hypervisor will deny any attempt to load vCPUs and run the
VM, but accept requests to reclaim guest pages. Once all guest pages have
been reclaimed, the host can issue a second hypercall to finalize the
teardown, which will free the handle and return all pages used to store
guest metadata at EL2 back to EL1.
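
The lifecycle described above can be sketched as a small state machine.
This is only an illustrative model, not the code in this patch: the
struct, the state names, the function names and the error codes are all
hypothetical, and guest pages are reduced to a simple counter.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical VM lifecycle states; the names are illustrative only. */
enum vm_state { VM_ACTIVE, VM_DYING, VM_DESTROYED };

struct vm {
	enum vm_state state;
	unsigned int loaded_vcpus;	/* vCPUs currently loaded */
	size_t guest_pages;		/* guest pages not yet reclaimed */
};

/* Loading a vCPU is only allowed while the VM is active. */
int vm_load_vcpu(struct vm *vm)
{
	if (vm->state != VM_ACTIVE)
		return -EINVAL;
	vm->loaded_vcpus++;
	return 0;
}

void vm_put_vcpu(struct vm *vm)
{
	if (vm->loaded_vcpus)
		vm->loaded_vcpus--;
}

/* First hypercall: run the sanity checks, then mark the VM as dying. */
int vm_start_teardown(struct vm *vm)
{
	if (vm->state != VM_ACTIVE)
		return -EINVAL;
	if (vm->loaded_vcpus)
		return -EBUSY;
	vm->state = VM_DYING;
	return 0;
}

/*
 * Reclaim one guest page. Each call is a short, bounded piece of work.
 * Assumption: this model only accepts reclaim requests for a dying VM.
 */
int vm_reclaim_page(struct vm *vm)
{
	if (vm->state != VM_DYING)
		return -EPERM;
	if (!vm->guest_pages)
		return -ENOENT;
	vm->guest_pages--;
	return 0;
}

/* Second hypercall: only succeeds once every guest page is reclaimed. */
int vm_finalize_teardown(struct vm *vm)
{
	if (vm->state != VM_DYING)
		return -EINVAL;
	if (vm->guest_pages)
		return -EBUSY;
	vm->state = VM_DESTROYED;
	return 0;
}
```

In this model the host would call vm_start_teardown() once, then
vm_reclaim_page() repeatedly until no pages remain, and finally
vm_finalize_teardown(), so that no single call has to walk the whole
guest address space.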

This was tested on Pixel 6 with android14-6.1 while concurrently running
a memory-intensive benchmark on the host and a large protected guest.
The length of EL2 periods was measured by parsing pKVM traces, and the
results showed that all outliers of 200+us have been entirely
eliminated.

Signed-off-by: Quentin Perret <qperret@google.com>
7 files changed