.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

==================
Kernel TLS offload
==================

Kernel TLS operation
====================

The Linux kernel provides TLS connection offload infrastructure. Once a TCP
connection is in ``ESTABLISHED`` state, user space can enable the TLS Upper
Layer Protocol (ULP) and install the cryptographic connection state.
For details regarding the user-facing interface refer to the TLS
documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.
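
The two steps above can be sketched from user space as follows. This is
a minimal sketch, assuming TLS 1.2 with AES-GCM-128 and key material that
a user-space TLS library has already negotiated. ``ktls_enable_tx()`` is
a hypothetical helper name; the socket options and structures come from
``<linux/tls.h>``. The fallback ``#define`` values are only there for older
C libraries which do not provide them.

.. code-block:: c

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <linux/tls.h>

    #ifndef TCP_ULP
    #define TCP_ULP 31
    #endif
    #ifndef SOL_TLS
    #define SOL_TLS 282
    #endif

    /*
     * Hypothetical helper: attach the TLS ULP to an established TCP socket
     * and install the TX crypto state. Returns 0 on success, -1 on failure
     * (errno is set by setsockopt()).
     */
    int ktls_enable_tx(int fd,
                       const unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE],
                       const unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE],
                       const unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE],
                       const unsigned char seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE])
    {
        struct tls12_crypto_info_aes_gcm_128 ci;

        /* Step 1: enable the TLS ULP on the socket. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
            return -1;

        /* Step 2: install the cryptographic connection state for TX. */
        memset(&ci, 0, sizeof(ci));
        ci.info.version = TLS_1_2_VERSION;
        ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
        memcpy(ci.key, key, sizeof(ci.key));
        memcpy(ci.iv, iv, sizeof(ci.iv));
        memcpy(ci.salt, salt, sizeof(ci.salt));
        memcpy(ci.rec_seq, seq, sizeof(ci.rec_seq));

        if (setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof(ci)) < 0)
            return -1;
        return 0;
    }

The RX direction is installed the same way with ``TLS_RX``. Whether the
connection ends up offloaded is decided by the kernel, not by this call.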

``ktls`` can operate in three modes:

 * Software crypto mode (``TLS_SW``) - the CPU handles the cryptography.
   In the most basic cases only crypto operations synchronous with the CPU
   can be used, but depending on the calling context the CPU may utilize
   asynchronous crypto accelerators. The use of accelerators introduces extra
   latency on socket reads (decryption only starts when a read syscall
   is made) and additional I/O load on the system.
 * Packet-based NIC offload mode (``TLS_HW``) - the NIC handles crypto
   on a packet-by-packet basis, provided the packets arrive in order.
   This mode integrates best with the kernel stack and is described in detail
   in the remaining part of this document
   (``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``).
 * Full TCP NIC offload mode (``TLS_HW_RECORD``) - the NIC driver and
   firmware replace the kernel networking stack with their own TCP handling.
   This mode is not usable in production environments that rely on the
   Linux networking stack, for example for firewalling, QoS or packet
   scheduling (``ethtool`` flag ``tls-hw-record``).

The operation mode is selected automatically based on device configuration.
Offload opt-in or opt-out on a per-connection basis is not currently supported.

TX
--

At a high level, user write requests are turned into a scatter list. The TLS
ULP intercepts them, inserts record framing, performs encryption (in ``TLS_SW``
mode) and then hands the modified scatter list to the TCP layer. From this
point on the TCP stack proceeds as normal.

In ``TLS_HW`` mode the encryption is not performed in the TLS ULP.
Instead, packets reach the device driver, which marks them
for crypto offload based on the socket each packet is attached to
and sends them to the device for encryption and transmission.

RX
--

On the receive side, if the device handled decryption and authentication
successfully, the driver will set the decrypted bit in the associated
:c:type:`struct sk_buff <sk_buff>`. The packets reach the TCP stack and
are handled normally. ``ktls`` is informed when data is queued to the socket
and the ``strparser`` mechanism is used to delineate the records. Upon a read
request, records are retrieved from the socket and passed to the decryption
routine. If the device decrypted all the segments of the record the decryption
is skipped, otherwise the software path handles decryption.

.. kernel-figure:: tls-offload-layers.svg
   :alt: TLS offload layers
   :align: center
   :figwidth: 28em

   Layers of Kernel TLS stack

Device configuration
====================

During driver initialization the device sets the ``NETIF_F_HW_TLS_RX`` and
``NETIF_F_HW_TLS_TX`` features and installs its
:c:type:`struct tlsdev_ops <tlsdev_ops>`
pointer in the :c:member:`tlsdev_ops` member of the
:c:type:`struct net_device <net_device>`.

When TLS cryptographic connection state is installed on a ``ktls`` socket
(note that it is done twice, once for RX and once for TX direction,
and the two are completely independent), the kernel checks if the underlying
network device is offload-capable and attempts the offload. If the offload
fails the connection is handled entirely in software using the same mechanism
as if the offload was never tried.

The offload request is performed via the :c:member:`tls_dev_add` callback of
:c:type:`struct tlsdev_ops <tlsdev_ops>`:

.. code-block:: c

    int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
                       enum tls_offload_ctx_dir direction,
                       struct tls_crypto_info *crypto_info,
                       u32 start_offload_tcp_sn);

``direction`` indicates whether the cryptographic information is for
the received or transmitted packets. The driver uses the ``sk`` parameter
to retrieve the connection 5-tuple and socket family (IPv4 vs IPv6).
Cryptographic information in ``crypto_info`` includes the key, iv, salt
as well as the TLS record sequence number. ``start_offload_tcp_sn`` indicates
which TCP sequence number corresponds to the beginning of the record with
the sequence number from ``crypto_info``. The driver can add its state
at the end of kernel structures (see :c:member:`driver_state` members
in ``include/net/tls.h``) to avoid additional allocations and pointer
dereferences.
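
The add-or-fall-back behaviour described above can be modelled with a small
user-space sketch. This is not kernel code: every ``mock_`` name and the
table size are hypothetical, and the structure only holds the values
a driver would derive from ``direction``, ``crypto_info`` and
``start_offload_tcp_sn``.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>

    enum mock_dir { MOCK_DIR_TX, MOCK_DIR_RX };

    /* Hypothetical per-connection, per-direction device table entry. */
    struct mock_conn {
        int in_use;
        enum mock_dir dir;
        uint32_t start_tcp_sn;  /* TCP seq where the first record starts */
        uint64_t rec_seq;       /* TLS record sequence number at that point */
    };

    #define MOCK_TABLE_SIZE 2   /* tiny on purpose, to show the failure path */

    struct mock_dev {
        struct mock_conn table[MOCK_TABLE_SIZE];
    };

    /*
     * Model of an offload request: returns 0 when the state was installed,
     * or -ENOSPC when the device table is full. A failure here corresponds
     * to the connection staying in software mode.
     */
    int mock_tls_dev_add(struct mock_dev *dev, enum mock_dir dir,
                         uint32_t start_tcp_sn, uint64_t rec_seq)
    {
        for (int i = 0; i < MOCK_TABLE_SIZE; i++) {
            struct mock_conn *c = &dev->table[i];

            if (c->in_use)
                continue;
            c->in_use = 1;
            c->dir = dir;
            c->start_tcp_sn = start_tcp_sn;
            c->rec_seq = rec_seq;
            return 0;
        }
        return -ENOSPC;
    }

Because RX and TX are installed independently, one socket may consume two
entries in such a table, and either direction may fail on its own.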

TX
--

After TX state is installed, the stack guarantees that the first segment
of the stream will start exactly at the ``start_offload_tcp_sn`` sequence
number, simplifying TCP sequence number matching.

TX offload being fully initialized does not imply that all segments passing
through the driver and which belong to the offloaded socket will be after
the expected sequence number and will have kernel record information.
In particular, already encrypted data may have been queued to the socket
before installing the connection state in the kernel.

RX
--

In the RX direction the local networking stack has little control over
the segmentation, so the initial record's TCP sequence number may be anywhere
inside the segment.

Normal operation
================

At the minimum the device maintains the following state for each connection, in
each direction:

 * crypto secrets (key, iv, salt)
 * crypto processing state (partial blocks, partial authentication tag, etc.)
 * record metadata (sequence number, processing offset and length)
 * expected TCP sequence number

There are no guarantees on record length or record segmentation. In particular
segments may start at any point of a record and contain any number of records.
Assuming segments are received in order, the device should be able to perform
crypto operations and authentication regardless of segmentation. For this
to be possible the device has to keep a small amount of segment-to-segment
state. This includes at least:

 * partial headers (if a segment carried only a part of the TLS header)
 * partial data block
 * partial authentication tag (all data had been seen but part of the
   authentication tag has to be written or read from the subsequent segment)

Record reassembly is not necessary for TLS offload. If the packets arrive
in order the device should be able to handle them separately and make
forward progress.
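
The record-tracking part of that state can be illustrated with a small
user-space model. It is a sketch under stated assumptions, not a device
implementation: ``struct rec_state`` and ``rec_feed()`` are hypothetical
names, crypto state is left out entirely, and only the standard 5-byte TLS
record header (type, two version bytes, 16-bit big-endian length) is parsed.
It shows that a partial header carried between segments is enough to follow
record boundaries without reassembling records.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define TLS_HDR_LEN 5

    /* Hypothetical segment-to-segment state, crypto parts omitted. */
    struct rec_state {
        uint8_t hdr[TLS_HDR_LEN];  /* partial header carried over */
        size_t hdr_have;           /* header bytes collected so far */
        size_t body_left;          /* record bytes still expected */
        uint64_t rec_seq;          /* sequence number of current record */
        uint32_t next_tcp_sn;      /* expected TCP sequence number */
    };

    /*
     * Consume one in-order segment of any size. Returns the number of
     * records which ended inside this segment.
     */
    unsigned int rec_feed(struct rec_state *st, const uint8_t *data, size_t len)
    {
        unsigned int done = 0;
        size_t off = 0;

        while (off < len) {
            if (st->hdr_have < TLS_HDR_LEN) {
                /* Collect the header byte by byte, it may be split. */
                st->hdr[st->hdr_have++] = data[off++];
                if (st->hdr_have < TLS_HDR_LEN)
                    continue;
                st->body_left = ((size_t)st->hdr[3] << 8) | st->hdr[4];
            } else {
                size_t n = len - off;

                if (n > st->body_left)
                    n = st->body_left;
                /* A device would run crypto over data[off..off+n) here. */
                off += n;
                st->body_left -= n;
            }
            if (st->body_left == 0) {
                /* Record complete, the next byte starts a new header. */
                st->hdr_have = 0;
                st->rec_seq++;
                done++;
            }
        }
        st->next_tcp_sn += (uint32_t)len;
        return done;
    }

Feeding the same byte stream in different segment sizes leaves the model in
the same final state, which is the property the text above asks of a device.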

TX
--

The kernel stack performs record framing, reserving space for the
authentication tag and populating all other TLS header and trailer fields.

Both the device and the driver maintain expected TCP sequence numbers
due to the possibility of retransmissions and the lack of software fallback
once the packet reaches the device.
For segments passed in order, the driver marks the packets with
a connection identifier (note that a 5-tuple lookup is insufficient to identify
packets requiring HW offload, see the :ref:`5tuple_problems` section)
and hands them to the device. The device identifies the packet as requiring
TLS handling and confirms the sequence number matches its expectation.
The device performs encryption and authentication of the record data.
It replaces the authentication tag and TCP checksum with correct values.

RX
--

Before a packet is DMAed to the host (but after NIC's embedded switching
and packet transformation functions) the device validates the Layer 4
checksum and performs a 5-tuple lookup to find any TLS connection the packet
may belong to (technically a 4-tuple lookup is sufficient - IP addresses and
TCP port numbers, as the protocol is always TCP). If a connection is matched,
the device confirms that the TCP sequence number is the expected one and
proceeds to TLS handling (record delineation, decryption, authentication for
each record in the packet). The device leaves the record framing unmodified;
the stack takes care of record decapsulation. The device indicates successful
handling of TLS offload in the per-packet context (descriptor) passed to the
host.

Upon reception of a TLS offloaded packet, the driver sets
the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
corresponding to the segment. The networking stack makes sure decrypted
and non-decrypted segments do not get coalesced (e.g. by GRO or socket layer)
and takes care of partial decryption.

Resync handling
===============

In the presence of packet drops or network packet reordering, the device may
lose synchronization with the TLS stream and require a resync with the
kernel's TCP stack.

Note that resync is only attempted for connections which were successfully
added to the device table and are in ``TLS_HW`` mode. For example,
if the table was full when cryptographic state was installed in the kernel,
such a connection will never get offloaded. Therefore the resync request
does not carry any cryptographic connection state.

TX
--

Segments transmitted from an offloaded socket can get out of sync
in similar ways to the receive side: retransmissions and local drops
are possible, though network reorders are not. There are currently
two mechanisms for dealing with out of order segments.

Crypto state rebuilding
~~~~~~~~~~~~~~~~~~~~~~~

Whenever an out of order segment is transmitted the driver provides
the device with enough information to perform cryptographic operations.
This most likely means that the part of the record preceding the current
segment has to be passed to the device as part of the packet context,
together with its TCP sequence number and TLS record number. The device
can then initialize its crypto state, process and discard the preceding
data (to be able to insert the authentication tag) and move on to handling
the actual packet.

In this mode, depending on the implementation, the driver can either ask
for a continuation with the crypto state and the new sequence number
(the next expected segment is the one after the out of order one), or continue
with the previous stream state - assuming that the out of order segment
was just a retransmission. The former is simpler and does not require
retransmission detection, therefore it is the recommended method until
such time as it is proven inefficient.

Next record sync
~~~~~~~~~~~~~~~~

Whenever an out of order segment is detected the driver requests
that the ``ktls`` software fallback code encrypt it. If the segment's
sequence number is lower than expected the driver assumes retransmission
and doesn't change device state. If the segment is in the future, which
may imply a local drop, the driver asks the stack to sync the device
to the next record state and falls back to software.

A resync request is indicated with:

.. code-block:: c

    void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq);

Until the resync is complete the driver should not access its expected TCP
sequence number (as it will be updated from a different context).
The following helper should be used to test if the resync is complete:

.. code-block:: c

    bool tls_offload_tx_resync_pending(struct sock *sk);

The next time ``ktls`` pushes a record it will first send its TCP sequence
number and TLS record number to the driver. The stack will also make sure that
the new record will start on a segment boundary (like it does when
the connection is initially added).
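
The decision the driver makes for each transmitted segment in this mode can
be sketched as a user-space model. All names below (``tx_ctx``,
``tx_classify()``, ``tx_resync_done()``) are hypothetical; in a real driver
the resync request and the pending check would go through
``tls_offload_tx_resync_request()`` and ``tls_offload_tx_resync_pending()``.
The sequence comparison is wraparound-safe, which is an assumption about how
a driver would want to compare 32-bit TCP sequence numbers.

.. code-block:: c

    #include <stdint.h>

    /* Wraparound-safe "a is before b" for 32-bit TCP sequence numbers. */
    static inline int seq_before(uint32_t a, uint32_t b)
    {
        return (int32_t)(a - b) < 0;
    }

    enum tx_action {
        TX_OFFLOAD,          /* in order: device encrypts */
        TX_FALLBACK,         /* software encrypts, device state untouched */
        TX_FALLBACK_RESYNC,  /* software encrypts, resync requested */
    };

    /* Hypothetical driver-side context for one offloaded TX connection. */
    struct tx_ctx {
        uint32_t expected_seq;
        int resync_pending;
    };

    enum tx_action tx_classify(struct tx_ctx *ctx, uint32_t seq, uint32_t len)
    {
        /* expected_seq must not be read while a resync is pending. */
        if (ctx->resync_pending)
            return TX_FALLBACK;

        if (seq == ctx->expected_seq) {
            ctx->expected_seq += len;
            return TX_OFFLOAD;
        }
        if (seq_before(seq, ctx->expected_seq))
            return TX_FALLBACK;          /* assumed retransmission */

        ctx->resync_pending = 1;         /* segment from the future */
        return TX_FALLBACK_RESYNC;
    }

    /* Models the stack delivering the start of the next record. */
    void tx_resync_done(struct tx_ctx *ctx, uint32_t new_seq)
    {
        ctx->expected_seq = new_seq;
        ctx->resync_pending = 0;
    }

Note that the model leaves ``expected_seq`` untouched for retransmissions,
matching the rule that a retransmission does not change device state.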

RX
--

A small number of RX reorder events may not require a full resynchronization.
In particular the device should not lose synchronization
when the record boundary can be recovered:

.. kernel-figure:: tls-offload-reorder-good.svg
   :alt: reorder of non-header segment
   :align: center

   Reorder of non-header segment

Green segments are successfully decrypted, blue ones are passed
as received on wire, red stripes mark the start of new records.

In the above case segment 1 is received and decrypted successfully.
Segment 2 was dropped so 3 arrives out of order. The device knows
the next record starts inside 3, based on the record length in segment 1.
Segment 3 is passed untouched, because due to the lack of data from segment 2
the remainder of the previous record inside segment 3 cannot be handled.
The device can, however, collect the authentication algorithm's state
and partial block from the new record in segment 3 and, when 4 and 5
arrive, continue decryption. Finally when 2 arrives it's completely outside
of the expected window of the device so it's passed as is without special
handling. ``ktls`` software fallback handles the decryption of the record
spanning segments 1, 2 and 3. The device did not get out of sync,
even though two segments did not get decrypted.

Kernel synchronization may be necessary if the lost segment contained
a record header and arrived after the next record header has already passed:

.. kernel-figure:: tls-offload-reorder-bad.svg
   :alt: reorder of header segment
   :align: center

   Reorder of segment with a TLS header

In this example segment 2 gets dropped, and it contains a record header.
The device can only detect that segment 4 also contains a TLS header
if it knows the length of the previous record from segment 2. In this case
the device will lose synchronization with the stream.

Stream scan resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the device gets out of sync and the stream reaches TCP sequence
numbers more than a max size record past the expected TCP sequence number,
the device starts scanning for a known header pattern. For example,
for TLS 1.2 and TLS 1.3 two consecutive bytes of value ``0x03 0x03`` occur
in the SSL/TLS version field of the header. Once the pattern is matched
the device continues attempting to parse headers at expected locations
(based on the length fields at guessed locations).
Whenever the expected location does not contain a valid header the scan
is restarted.

When the header is matched the device sends a confirmation request
to the kernel, asking if the guessed location is correct (if a TLS record
really starts there), and which record sequence number the given header had.
The kernel confirms the guessed location was correct and tells the device
the record sequence number. Meanwhile, the device has been parsing
and counting all records since the just-confirmed one; it adds the number
of records it has seen to the record number provided by the kernel.
At this point the device is in sync and can resume decryption at the next
segment boundary.
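
The scan can be sketched over a flat buffer as follows. This is only
an illustration: a device scans a live stream rather than a buffer, the
names ``scan_for_header()`` and ``resync_rec_seq()`` are hypothetical, and
the length bound used as an extra plausibility check (the TLS 1.2 ciphertext
limit of 2^14 + 2048 bytes) is an assumption on top of the ``0x03 0x03``
pattern mentioned above.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define TLS_HDR_LEN      5
    #define TLS_MAX_REC_LEN  (16384 + 2048)

    /* Version field reads 0x03 0x03 and the length is within bounds. */
    static inline int hdr_plausible(const uint8_t *p)
    {
        size_t len = ((size_t)p[3] << 8) | p[4];

        return p[1] == 0x03 && p[2] == 0x03 && len <= TLS_MAX_REC_LEN;
    }

    /*
     * Return the offset of the first candidate header which is followed
     * by chain - 1 further plausible headers at the locations implied by
     * the length fields, or -1 if no such chain exists in the buffer.
     */
    long scan_for_header(const uint8_t *buf, size_t len, unsigned int chain)
    {
        for (size_t start = 0; start + TLS_HDR_LEN <= len; start++) {
            size_t pos = start;
            unsigned int found = 0;

            while (found < chain && pos + TLS_HDR_LEN <= len &&
                   hdr_plausible(buf + pos)) {
                size_t rec_len = ((size_t)buf[pos + 3] << 8) | buf[pos + 4];

                found++;
                pos += TLS_HDR_LEN + rec_len;
            }
            if (found == chain)
                return (long)start;
            /* No valid header where one was expected: restart the scan. */
        }
        return -1;
    }

    /*
     * After the kernel confirms the record number of the candidate header,
     * the device adds the records it has counted since that header.
     */
    uint64_t resync_rec_seq(uint64_t confirmed_seq, uint64_t seen_since)
    {
        return confirmed_seq + seen_since;
    }

A single pattern match is easy to hit by accident inside ciphertext, which
is why the chain of headers and the kernel confirmation both matter.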

In a pathological case the device may latch onto a sequence of matching
headers and never hear back from the kernel (there is no negative
confirmation from the kernel). The implementation may choose to periodically
restart the scan. Given how unlikely a falsely-matching stream is, however,
periodic restart is not deemed necessary.

Special care has to be taken if the confirmation request is passed
asynchronously to the packet stream, as the record may get processed
by the kernel before the confirmation request.

Stack-driven resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The driver may also request the stack to perform resynchronization
whenever it sees the records are no longer getting decrypted.
If the connection is configured in this mode the stack automatically
schedules resynchronization after it has received two completely encrypted
records.

The stack waits for the socket to drain and informs the device about
the next expected record number and its TCP sequence number. If the
records continue to be received fully encrypted the stack retries the
synchronization with an exponential back off (first after 2 encrypted
records, then after 4 records, after 8, after 16... up until every
128 records).
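
The back off schedule can be modelled as a small counter. This is a sketch
of the schedule described above, not the kernel's implementation; the
structure, the function names and the reset on the first decrypted record
are assumptions made for illustration.

.. code-block:: c

    #define RESYNC_FIRST    2
    #define RESYNC_MAX    128

    /* Hypothetical bookkeeping for one connection in stack-driven mode. */
    struct resync_backoff {
        unsigned int threshold;  /* encrypted records before next attempt */
        unsigned int encrypted;  /* encrypted records since last attempt */
    };

    void backoff_init(struct resync_backoff *b)
    {
        b->threshold = RESYNC_FIRST;
        b->encrypted = 0;
    }

    /*
     * Account one received record. Returns 1 when a resynchronization
     * attempt should be scheduled, 0 otherwise.
     */
    int backoff_record(struct resync_backoff *b, int was_decrypted)
    {
        if (was_decrypted) {
            /* Assumption: a decrypted record means we are back in sync. */
            backoff_init(b);
            return 0;
        }
        if (++b->encrypted < b->threshold)
            return 0;

        b->encrypted = 0;
        if (b->threshold < RESYNC_MAX)
            b->threshold *= 2;   /* 2, 4, 8, ... capped at 128 */
        return 1;
    }

With this model the attempts happen after 2, then 4 more, then 8 more
encrypted records and so on, settling at one attempt every 128 records.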

Error handling
==============

TX
--

Packets may be redirected or rerouted by the stack to a different
device than the selected TLS offload device. The stack will handle
such a condition using the :c:func:`sk_validate_xmit_skb` helper
(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook).
Offload maintains information about all records until the data is
fully acknowledged, so if skbs reach the wrong device they can be handled
by software fallback.

Any device TLS offload handling error on the transmission side must result
in the packet being dropped. For example, if a packet got out of order
due to a bug in the stack or the device, reached the device and can't
be encrypted, such a packet must be dropped.

RX
--

If the device encounters any problems with TLS offload on the receive
side it should pass the packet to the host's networking stack as it was
received on the wire.

For example, authentication failure for any record in the segment should
result in passing the unmodified packet to the software fallback. This means
packets should not be modified "in place". Splitting segments to handle partial
decryption is not advised. In other words, either all records in the packet
have been handled successfully and authenticated, or the packet has to be
passed to the host's stack as it was on the wire (it is sufficient for the
driver to recover the original packet if the device reports a precise error).

The Linux networking stack does not provide a way of reporting per-packet
decryption and authentication errors; packets with errors must simply not
have the :c:member:`decrypted` mark set.

A packet should also not be handled by the TLS offload if it contains
incorrect checksums.
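
The all-or-nothing rule for the receive side can be written down as
a single predicate. The structure below is a hypothetical summary of what
a device might report per packet; real descriptors differ between devices.

.. code-block:: c

    #include <stdbool.h>

    /* Hypothetical per-packet status reported by the device. */
    struct mock_rx_status {
        bool csum_ok;             /* checksums validated */
        unsigned int records;     /* records (or parts) in the packet */
        unsigned int records_ok;  /* of those, decrypted and authenticated */
    };

    /*
     * Returns true when the driver may set the decrypted mark. In every
     * other case the packet must reach the stack as it was on the wire.
     */
    bool mock_rx_mark_decrypted(const struct mock_rx_status *st)
    {
        if (!st->csum_ok)
            return false;         /* bad checksum: no TLS handling */
        if (st->records == 0)
            return false;         /* device did not handle anything */
        return st->records_ok == st->records;   /* all or nothing */
    }

There is deliberately no third outcome: a partially handled packet is
treated exactly like one the device never touched.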

Performance metrics
===================

TLS offload can be characterized by the following basic metrics:

 * max connection count
 * connection installation rate
 * connection installation latency
 * total cryptographic performance

Note that each TCP connection requires a TLS session in both directions,
so the performance may be reported treating each direction separately.

Max connection count
--------------------

The number of connections a device can support can be exposed via
the ``devlink resource`` API.

Total cryptographic performance
-------------------------------

Offload performance may depend on segment and record size.

Overload of the cryptographic subsystem of the device should not have
significant performance impact on non-offloaded streams.

Statistics
==========

The following minimum set of TLS-related statistics should be reported
by the driver:

 * ``rx_tls_decrypted_packets`` - number of successfully decrypted RX packets
   which were part of a TLS stream.
 * ``rx_tls_decrypted_bytes`` - number of TLS payload bytes in RX packets
   which were successfully decrypted.
 * ``rx_tls_ctx`` - number of TLS RX HW offload contexts added to device for
   decryption.
 * ``rx_tls_del`` - number of TLS RX HW offload contexts deleted from device
   (connection has finished).
 * ``rx_tls_resync_req_pkt`` - number of received TLS packets with a resync
   request.
 * ``rx_tls_resync_req_start`` - number of times the TLS async resync request
   was started.
 * ``rx_tls_resync_req_end`` - number of times the TLS async resync request
   properly ended with providing the HW tracked tcp-seq.
 * ``rx_tls_resync_req_skip`` - number of times the TLS async resync request
   procedure was started but not properly ended.
 * ``rx_tls_resync_res_ok`` - number of times the TLS resync response call to
   the driver was successfully handled.
 * ``rx_tls_resync_res_skip`` - number of times the TLS resync response call to
   the driver was terminated unsuccessfully.
 * ``rx_tls_err`` - number of RX packets which were part of a TLS stream
   but were not decrypted due to unexpected error in the state machine.
 * ``tx_tls_encrypted_packets`` - number of TX packets passed to the device
   for encryption of their TLS payload.
 * ``tx_tls_encrypted_bytes`` - number of TLS payload bytes in TX packets
   passed to the device for encryption.
 * ``tx_tls_ctx`` - number of TLS TX HW offload contexts added to device for
   encryption.
 * ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
   but did not arrive in the expected order.
 * ``tx_tls_skip_no_sync_data`` - number of TX packets which were part of
   a TLS stream and arrived out-of-order, but skipped the HW offload routine
   and went to the regular transmit flow as they were retransmissions of the
   connection handshake.
 * ``tx_tls_drop_no_sync_data`` - number of TX packets which were part of
   a TLS stream dropped, because they arrived out of order and associated
   record could not be found.
 * ``tx_tls_drop_bypass_req`` - number of TX packets which were part of a TLS
   stream dropped, because they contain both data that has been encrypted by
   software and data that expects hardware crypto offload.

Notable corner cases, exceptions and additional requirements
============================================================

.. _5tuple_problems:

5-tuple matching limitations
----------------------------

The device can only recognize received packets based on the 5-tuple
of the socket. The current ``ktls`` implementation will not offload sockets
routed through software interfaces such as those used for tunneling
or virtual networking. However, many packet transformations performed
by the networking stack (most notably any BPF logic) do not require
any intermediate software device, therefore a 5-tuple match may
consistently miss at the device level. In such cases the device
should still be able to perform TX offload (encryption) and should
fall back cleanly to software decryption (RX).

Out of order
------------

Introducing extra processing in NICs should not cause packets to be
transmitted or received out of order, for example pure ACK packets
should not be reordered with respect to data segments.

Ingress reorder
---------------

A device is permitted to perform packet reordering for consecutive
TCP segments (i.e. placing packets in the correct order) but any form
of additional buffering is disallowed.

Coexistence with standard networking offload features
-----------------------------------------------------

Offloaded ``ktls`` sockets should support standard TCP stack features
transparently. Enabling device TLS offload should not cause any difference
in packets as seen on the wire.

Transport layer transparency
----------------------------

The device should not modify any packet headers for the purpose
of simplifying TLS offload.

The device should not depend on any packet headers beyond what is strictly
necessary for TLS offload.

Segment drops
-------------

Dropping packets is acceptable only in the event of catastrophic
system errors and should never be used as an error handling mechanism
in cases arising from normal operation. In other words, reliance
on TCP retransmissions to handle corner cases is not acceptable.

TLS device features
-------------------

Drivers should ignore the changes to the TLS device feature flags.
These flags will be acted upon accordingly by the core ``ktls`` code.
TLS device feature flags only control adding of new TLS connection
offloads; old connections will remain active after flags are cleared.

TLS encryption cannot be offloaded to devices without checksum calculation
offload. Hence, the TLS TX device feature flag requires ``NETIF_F_HW_CSUM``
to be set. Disabling the latter implies clearing the former. Disabling TX
checksum offload should not affect old connections, and drivers should make
sure checksum calculation does not break for them.
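
The dependency between the two feature flags can be expressed as a small
fix-up function. This is a user-space sketch: the ``MOCK_F_*`` bit values
and ``mock_fix_features()`` are hypothetical stand-ins and do not match the
kernel's real ``NETIF_F_*`` definitions.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical feature bits, for illustration only. */
    #define MOCK_F_HW_CSUM    (1u << 0)
    #define MOCK_F_HW_TLS_TX  (1u << 1)
    #define MOCK_F_HW_TLS_RX  (1u << 2)

    /*
     * TLS TX offload depends on checksum offload: clearing the checksum
     * feature implies clearing TLS TX. TLS RX is not affected.
     */
    uint32_t mock_fix_features(uint32_t features)
    {
        if ((features & MOCK_F_HW_TLS_TX) && !(features & MOCK_F_HW_CSUM))
            features &= ~MOCK_F_HW_TLS_TX;
        return features;
    }

Such a fix-up only governs which features are advertised for new
connections; already offloaded connections keep working as described above.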