spi: stm32-qspi: Trigger DMA only if more than 4 bytes to transfer

In order to optimize accesses to spi flashes, trigger a DMA only
if more than 4 bytes has to be transferred.

DMA transfer preparation's cost becomes negligible above 4 bytes to
transfer. Below this threshold, indirect transfer give more throughput.

mtd_speedtest shows that page write throughtput increases :
  - from 779 to 853 KiB/s (~9.5%) with s25fl512s SPI-NOR.
  - from 5283 to 5666 KiB/s (~7.25%) with Micron SPI-NAND.

Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com>
Link: https://lore.kernel.org/r/20210419121541.11617-3-patrice.chotard@foss.st.com
Signed-off-by: Mark Brown <broonie@kernel.org>
diff --git a/drivers/spi/spi-stm32-qspi.c b/drivers/spi/spi-stm32-qspi.c
index 2786470..6e74d6b 100644
--- a/drivers/spi/spi-stm32-qspi.c
+++ b/drivers/spi/spi-stm32-qspi.c
@@ -269,8 +269,9 @@ static int stm32_qspi_tx(struct stm32_qspi *qspi, const struct spi_mem_op *op)
 
 	if (qspi->fmode == CCR_FMODE_MM)
 		return stm32_qspi_tx_mm(qspi, op);
-	else if ((op->data.dir == SPI_MEM_DATA_IN && qspi->dma_chrx) ||
-		 (op->data.dir == SPI_MEM_DATA_OUT && qspi->dma_chtx))
+	else if (((op->data.dir == SPI_MEM_DATA_IN && qspi->dma_chrx) ||
+		 (op->data.dir == SPI_MEM_DATA_OUT && qspi->dma_chtx)) &&
+		  op->data.nbytes > 4)
 		if (!stm32_qspi_tx_dma(qspi, op))
 			return 0;