crypto: riscv - add vector crypto accelerated AES-CBC-CTS

Add an implementation of cts(cbc(aes)) accelerated using the Zvkned
RISC-V vector crypto extension.  This is mainly useful for fscrypt,
where cts(cbc(aes)) is the "default" filenames encryption algorithm.  In
that use case, typically most messages are short and are block-aligned.
The CBC-CTS variant implemented is CS3; this is the variant Linux uses.

To perform well on short messages, the new implementation processes the
full message in one call to the assembly function if the data is
contiguous.  Otherwise it falls back to CBC operations followed by CTS
at the end.  For decryption, to further improve performance on short
messages, especially block-aligned messages, the CBC-CTS assembly
function parallelizes the AES decryption of all full blocks.  This
improves on the arm64 implementation of cts(cbc(aes)), which always
splits the CBC part(s) from the CTS part, doing the AES decryptions for
the last two blocks serially and usually loading the round keys twice.

Tested in QEMU with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20240213055442.35954-1-ebiggers@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
3 files changed