| .. SPDX-License-Identifier: BSD-3-Clause |
| |
| ======================= |
| Introduction to Netlink |
| ======================= |
| |
| Netlink is often described as an ioctl() replacement. |
| It aims to replace fixed-format C structures as supplied |
| to ioctl() with a format which allows an easy way to add |
| or extend the arguments. |
| |
| To achieve this Netlink uses a minimal fixed-format metadata header |
| followed by multiple attributes in the TLV (type, length, value) format. |
| |
| Unfortunately the protocol has evolved over the years, in an organic |
| and undocumented fashion, making it hard to coherently explain. |
| To make the most practical sense this document starts by describing |
| Netlink as it is used today and dives into more "historical" uses |
| in later sections. |
| |
| Opening a socket |
| ================ |
| |
| Netlink communication happens over sockets, so a socket needs to be |
| opened first: |
| |
| .. code-block:: c |
| |
|   fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); |
| |
| The use of sockets allows for a natural way of exchanging information |
| in both directions (to and from the kernel). The operations are still |
| performed synchronously when applications send() the request but |
| a separate recv() system call is needed to read the reply. |
| |
| A very simplified flow of a Netlink "call" will therefore look |
| something like: |
| |
| .. code-block:: c |
| |
|   fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); |
| |
|   /* format the request */ |
|   send(fd, &request, sizeof(request), 0); |
|   n = recv(fd, &response, RSP_BUFFER_SIZE, 0); |
|   /* interpret the response */ |
| |
| Netlink also provides natural support for "dumping", i.e. communicating |
| to user space all objects of a certain type (e.g. dumping all network |
| interfaces). |
| |
| .. code-block:: c |
| |
|   fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); |
| |
|   /* format the dump request */ |
|   send(fd, &request, sizeof(request), 0); |
|   while (1) { |
|     n = recv(fd, buffer, RSP_BUFFER_SIZE, 0); |
|     /* one recv() call can read multiple messages, hence the loop below */ |
|     for (nlh = (struct nlmsghdr *)buffer; NLMSG_OK(nlh, n); |
|          nlh = NLMSG_NEXT(nlh, n)) { |
|       if (nlh->nlmsg_type == NLMSG_DONE) |
|         goto dump_finished; |
|       /* process the object */ |
|     } |
|   } |
|   dump_finished: |
| |
| The first two arguments of the socket() call require little explanation - |
| it is opening a Netlink socket, with all headers provided by the user |
| (hence NETLINK, RAW). The last argument is the protocol within Netlink. |
| This field used to identify the subsystem with which the socket will |
| communicate. |
| |
| Classic vs Generic Netlink |
| -------------------------- |
| |
| The initial implementation of Netlink depended on a static allocation |
| of IDs to subsystems and provided little supporting infrastructure. |
| Let us refer to those protocols collectively as **Classic Netlink**. |
| The list of them is defined at the top of the ``include/uapi/linux/netlink.h`` |
| file; they include, among others, general networking (NETLINK_ROUTE), |
| iSCSI (NETLINK_ISCSI), and audit (NETLINK_AUDIT). |
| |
| **Generic Netlink** (introduced in 2005) allows for dynamic registration of |
| subsystems (and subsystem ID allocation), introspection and simplifies |
| implementing the kernel side of the interface. |
| |
| The following section describes how to use Generic Netlink, as the |
| number of subsystems using Generic Netlink outnumbers the older |
| protocols by an order of magnitude. There are also no plans for adding |
| more Classic Netlink protocols to the kernel. |
| Basic information on how communicating with core networking parts of |
| the Linux kernel (or another of the 20 subsystems using Classic |
| Netlink) differs from Generic Netlink is provided later in this document. |
| |
| Generic Netlink |
| =============== |
| |
| In addition to the Netlink fixed metadata header each Netlink protocol |
| defines its own fixed metadata header. (Similarly to how network |
| headers stack - Ethernet > IP > TCP we have Netlink > Generic N. > Family.) |
| |
| A Netlink message always starts with struct nlmsghdr, which is followed |
| by a protocol-specific header. In case of Generic Netlink the protocol |
| header is struct genlmsghdr. |
| |
| The practical meaning of the fields in case of Generic Netlink is as follows: |
| |
| .. code-block:: c |
| |
|   struct nlmsghdr { |
|     __u32 nlmsg_len;    /* Length of message including headers */ |
|     __u16 nlmsg_type;   /* Generic Netlink Family (subsystem) ID */ |
|     __u16 nlmsg_flags;  /* Flags - request or dump */ |
|     __u32 nlmsg_seq;    /* Sequence number */ |
|     __u32 nlmsg_pid;    /* Port ID, set to 0 */ |
|   }; |
|   struct genlmsghdr { |
|     __u8 cmd;           /* Command, as defined by the Family */ |
|     __u8 version;       /* Irrelevant, set to 1 */ |
|     __u16 reserved;     /* Reserved, set to 0 */ |
|   }; |
|   /* TLV attributes follow... */ |
| |
| In Classic Netlink :c:member:`nlmsghdr.nlmsg_type` used to identify |
| which operation within the subsystem the message was referring to |
| (e.g. get information about a netdev). Generic Netlink needs to mux |
| multiple subsystems in a single protocol so it uses this field to |
| identify the subsystem, and :c:member:`genlmsghdr.cmd` identifies |
| the operation instead. (See :ref:`res_fam` for |
| information on how to find the Family ID of the subsystem of interest.) |
| Note that the first 16 values (0 - 15) of this field are reserved for |
| control messages both in Classic Netlink and Generic Netlink. |
| See :ref:`nl_msg_type` for more details. |
| |
| There are 3 usual types of message exchanges on a Netlink socket: |
| |
| - performing a single action (``do``); |
| - dumping information (``dump``); |
| - getting asynchronous notifications (``multicast``). |
| |
| Classic Netlink is very flexible and presumably allows other types |
| of exchanges to happen, but in practice those are the three that get |
| used. |
| |
| Asynchronous notifications are sent by the kernel and received by |
| the user sockets which subscribed to them. ``do`` and ``dump`` requests |
| are initiated by the user. :c:member:`nlmsghdr.nlmsg_flags` should |
| be set as follows: |
| |
| - for ``do``: ``NLM_F_REQUEST | NLM_F_ACK`` |
| - for ``dump``: ``NLM_F_REQUEST | NLM_F_ACK | NLM_F_DUMP`` |
| |
| :c:member:`nlmsghdr.nlmsg_seq` should be set to a monotonically |
| increasing value. The value gets echoed back in responses, which allows |
| matching responses to requests. It doesn't matter much in practice, but |
| setting it to an increasing value for each message sent is considered |
| good hygiene. Asynchronous notifications will have |
| :c:member:`nlmsghdr.nlmsg_seq` of ``0``. |
| |
| :c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address. |
| This field can be set to ``0`` when talking to the kernel. |
| See :ref:`nlmsg_pid` for the (uncommon) uses of the field. |
| |
| The expected use for :c:member:`genlmsghdr.version` was to allow |
| versioning of the APIs provided by the subsystems. No subsystem to |
| date made significant use of this field, so setting it to ``1`` seems |
| like a safe bet. |
| |
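As a minimal sketch of the field descriptions above, the two fixed headers
of a request could be filled in as follows. ``genl_req_init()`` is a
hypothetical helper written for this example, not an existing kernel or
library function, and it assumes the caller provides a 4-byte aligned buffer
of sufficient size.

```c
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/*
 * Hypothetical helper: write struct nlmsghdr and struct genlmsghdr at the
 * start of buf. buf must be 4-byte aligned and hold at least
 * NLMSG_LENGTH(GENL_HDRLEN) bytes. extra_flags is 0 for a "do" request
 * and NLM_F_DUMP for a "dump" request.
 */
static struct nlmsghdr *genl_req_init(void *buf, __u16 family_id, __u8 cmd,
				      __u16 extra_flags, __u32 seq)
{
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *gnlh;

	memset(buf, 0, NLMSG_LENGTH(GENL_HDRLEN));
	nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);	/* no attributes yet */
	nlh->nlmsg_type = family_id;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | extra_flags;
	nlh->nlmsg_seq = seq;
	nlh->nlmsg_pid = 0;				/* talking to the kernel */

	gnlh = NLMSG_DATA(nlh);
	gnlh->cmd = cmd;
	gnlh->version = 1;
	return nlh;
}
```

Attributes would then be appended after the headers, growing
``nlmsg_len`` accordingly.
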
| .. _nl_msg_type: |
| |
| Netlink message types |
| --------------------- |
| |
| As previously mentioned :c:member:`nlmsghdr.nlmsg_type` carries |
| protocol specific values but the first 16 identifiers are reserved |
| (first subsystem specific message type should be equal to |
| ``NLMSG_MIN_TYPE`` which is ``0x10``). |
| |
| There are only 4 Netlink control messages defined: |
| |
| - ``NLMSG_NOOP`` - ignore the message, not used in practice; |
| - ``NLMSG_ERROR`` - carries the return code of an operation; |
| - ``NLMSG_DONE`` - marks the end of a dump; |
| - ``NLMSG_OVERRUN`` - socket buffer has overflowed, not used to date. |
| |
| ``NLMSG_ERROR`` and ``NLMSG_DONE`` are of practical importance. |
| They carry return codes for operations. Note that unless |
| the ``NLM_F_ACK`` flag is set on the request Netlink will not respond |
| with ``NLMSG_ERROR`` if there is no error. To avoid having to special-case |
| this quirk it is recommended to always set ``NLM_F_ACK``. |
| |
| The format of ``NLMSG_ERROR`` is described by struct nlmsgerr:: |
| |
|   ---------------------------------------------- |
|   | struct nlmsghdr - response header          | |
|   ---------------------------------------------- |
|   |    int error                               | |
|   ---------------------------------------------- |
|   | struct nlmsghdr - original request header  | |
|   ---------------------------------------------- |
|   | ** optionally (1) payload of the request   | |
|   ---------------------------------------------- |
|   | ** optionally (2) extended ACK             | |
|   ---------------------------------------------- |
| |
| There are two instances of struct nlmsghdr here, first of the response |
| and second of the request. ``NLMSG_ERROR`` carries the information about |
| the request which led to the error. This could be useful when trying |
| to match requests to responses or re-parse the request to dump it into |
| logs. |
| |
| The payload of the request is not echoed in messages reporting success |
| (``error == 0``) or if ``NETLINK_CAP_ACK`` setsockopt() was set. |
| The latter is common |
| and perhaps recommended as having to read a copy of every request back |
| from the kernel is rather wasteful. The absence of request payload |
| is indicated by ``NLM_F_CAPPED`` in :c:member:`nlmsghdr.nlmsg_flags`. |
| |
| The second optional element of ``NLMSG_ERROR`` is the set of extended ACK |
| attributes. See :ref:`ext_ack` for more details. The presence |
| of extended ACK is indicated by ``NLM_F_ACK_TLVS`` in |
| :c:member:`nlmsghdr.nlmsg_flags`. |
| |
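The layout above can be read with struct nlmsgerr from the uAPI header.
The following is a sketch only; ``nl_ack_error()`` is a hypothetical helper
name, and the caller is assumed to have already checked that the message
fits in the received buffer.

```c
#include <linux/netlink.h>

/*
 * Hypothetical helper: return the error code carried by an NLMSG_ERROR
 * message. 0 means the request was ACKed, a negative value is -errno.
 * Returns 1 (never sent by the kernel) if nlh is not a complete
 * NLMSG_ERROR message.
 */
static int nl_ack_error(const struct nlmsghdr *nlh)
{
	const struct nlmsgerr *err;

	if (nlh->nlmsg_type != NLMSG_ERROR ||
	    nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
		return 1;

	err = NLMSG_DATA(nlh);
	/* err->msg is the copy of the request header, e.g. err->msg.nlmsg_seq */
	return err->error;
}
```
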
| ``NLMSG_DONE`` is simpler, the request is never echoed but the extended |
| ACK attributes may be present:: |
| |
|   ---------------------------------------------- |
|   | struct nlmsghdr - response header          | |
|   ---------------------------------------------- |
|   |    int error                               | |
|   ---------------------------------------------- |
|   | ** optionally extended ACK                 | |
|   ---------------------------------------------- |
| |
| Note that some implementations may issue custom ``NLMSG_DONE`` messages |
| in reply to ``do`` action requests. In that case the payload is |
| implementation-specific and may also be absent. |
| |
| .. _res_fam: |
| |
| Resolving the Family ID |
| ----------------------- |
| |
| This section explains how to find the Family ID of a subsystem. |
| It also serves as an example of Generic Netlink communication. |
| |
| Generic Netlink is itself a subsystem exposed via the Generic Netlink API. |
| To avoid a circular dependency Generic Netlink has a statically allocated |
| Family ID (``GENL_ID_CTRL`` which is equal to ``NLMSG_MIN_TYPE``). |
| The Generic Netlink family implements a command used to find out information |
| about other families (``CTRL_CMD_GETFAMILY``). |
| |
| To get information about the Generic Netlink family named for example |
| ``"test1"`` we need to send a message on the previously opened Generic Netlink |
| socket. The message should target the Generic Netlink Family (1), be a |
| ``do`` (2) call to ``CTRL_CMD_GETFAMILY`` (3). A ``dump`` version of this |
| call would make the kernel respond with information about *all* the families |
| it knows about. Last but not least the name of the family in question has |
| to be specified (4) as an attribute with the appropriate type:: |
| |
| struct nlmsghdr: |
| __u32 nlmsg_len: 32 |
| __u16 nlmsg_type: GENL_ID_CTRL // (1) |
| __u16 nlmsg_flags: NLM_F_REQUEST | NLM_F_ACK // (2) |
| __u32 nlmsg_seq: 1 |
| __u32 nlmsg_pid: 0 |
| |
| struct genlmsghdr: |
| __u8 cmd: CTRL_CMD_GETFAMILY // (3) |
| __u8 version: 2 /* or 1, doesn't matter */ |
| __u16 reserved: 0 |
| |
| struct nlattr: // (4) |
| __u16 nla_len: 10 |
| __u16 nla_type: CTRL_ATTR_FAMILY_NAME |
| char data: test1\0 |
| |
| (padding:) |
| char data: \0\0 |
| |
| The length fields in Netlink (:c:member:`nlmsghdr.nlmsg_len` |
| and :c:member:`nlattr.nla_len`) always *include* the header. |
| Attribute headers in netlink must be aligned to 4 bytes from the start |
| of the message, hence the extra ``\0\0`` after ``CTRL_ATTR_FAMILY_NAME``. |
| The attribute lengths *exclude* the padding. |
| |
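The request shown above, including the attribute padding, could be built
along these lines. This is a sketch under stated assumptions:
``nl_put_attr()`` and ``genl_build_getfamily()`` are hypothetical helpers
written for this example, and the buffer passed in is assumed to be 4-byte
aligned.

```c
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/*
 * Hypothetical helper: append one attribute at the end of the message.
 * nla_len excludes the padding, nlmsg_len grows by the padded size.
 * The buffer must be zeroed and large enough.
 */
static void nl_put_attr(struct nlmsghdr *nlh, __u16 type,
			const void *data, __u16 len)
{
	struct nlattr *nla;

	nla = (struct nlattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
	nla->nla_type = type;
	nla->nla_len = NLA_HDRLEN + len;
	memcpy((char *)nla + NLA_HDRLEN, data, len);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
}

/*
 * Hypothetical helper: build a CTRL_CMD_GETFAMILY "do" request for the
 * family called name. Returns the message length, or 0 if buf is too small.
 */
static __u32 genl_build_getfamily(void *buf, size_t size,
				  const char *name, __u32 seq)
{
	size_t name_len = strlen(name) + 1;	/* include the trailing \0 */
	struct nlmsghdr *nlh = buf;
	struct genlmsghdr *gnlh;

	if (size < NLMSG_LENGTH(GENL_HDRLEN) + NLA_ALIGN(NLA_HDRLEN + name_len))
		return 0;

	memset(buf, 0, size);
	nlh->nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
	nlh->nlmsg_type = GENL_ID_CTRL;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nlh->nlmsg_seq = seq;

	gnlh = NLMSG_DATA(nlh);
	gnlh->cmd = CTRL_CMD_GETFAMILY;
	gnlh->version = 2;

	nl_put_attr(nlh, CTRL_ATTR_FAMILY_NAME, name, name_len);
	return nlh->nlmsg_len;
}
```

For the name ``"test1"`` the attribute is 10 bytes long and is followed by
2 bytes of padding, so the whole message is 32 bytes long.
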
| If the family is found the kernel will reply with two messages. The first |
| is the response with all the information about the family:: |
| |
| /* Message #1 - reply */ |
| struct nlmsghdr: |
| __u32 nlmsg_len: 136 |
| __u16 nlmsg_type: GENL_ID_CTRL |
| __u16 nlmsg_flags: 0 |
| __u32 nlmsg_seq: 1 /* echoed from our request */ |
| __u32 nlmsg_pid: 5831 /* The PID of our user space process */ |
| |
| struct genlmsghdr: |
| __u8 cmd: CTRL_CMD_GETFAMILY |
| __u8 version: 2 |
| __u16 reserved: 0 |
| |
| struct nlattr: |
| __u16 nla_len: 10 |
| __u16 nla_type: CTRL_ATTR_FAMILY_NAME |
| char data: test1\0 |
| |
| (padding:) |
| char data: \0\0 |
| |
| struct nlattr: |
| __u16 nla_len: 6 |
| __u16 nla_type: CTRL_ATTR_FAMILY_ID |
| __u16: 123 /* The Family ID we are after */ |
| |
| (padding:) |
| char data: \0\0 |
| |
| struct nlattr: |
| __u16 nla_len: 8 |
| __u16 nla_type: CTRL_ATTR_VERSION |
| __u32: 1 |
| |
| /* ... etc, more attributes will follow. */ |
| |
| And the error code (success) since ``NLM_F_ACK`` had been set on the request:: |
| |
| /* Message #2 - the ACK */ |
| struct nlmsghdr: |
| __u32 nlmsg_len: 36 |
| __u16 nlmsg_type: NLMSG_ERROR |
| __u16 nlmsg_flags: NLM_F_CAPPED /* There won't be a payload */ |
| __u32 nlmsg_seq: 1 /* echoed from our request */ |
| __u32 nlmsg_pid: 5831 /* The PID of our user space process */ |
| |
| int error: 0 |
| |
| struct nlmsghdr: /* Copy of the request header as we sent it */ |
| __u32 nlmsg_len: 32 |
| __u16 nlmsg_type: GENL_ID_CTRL |
| __u16 nlmsg_flags: NLM_F_REQUEST | NLM_F_ACK |
| __u32 nlmsg_seq: 1 |
| __u32 nlmsg_pid: 0 |
| |
| The order of attributes (struct nlattr) is not guaranteed so the user |
| has to walk the attributes and parse them. |
| |
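Walking the attributes of the reply might look like the sketch below.
``genl_parse_family_id()`` is a hypothetical helper; it assumes the caller
has already verified that ``nlmsg_len`` does not exceed the number of bytes
received.

```c
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/*
 * Hypothetical helper: scan the attributes of a CTRL_CMD_GETFAMILY reply
 * for CTRL_ATTR_FAMILY_ID. Returns the Family ID, or -1 if the attribute
 * is missing or the message is malformed.
 */
static int genl_parse_family_id(const struct nlmsghdr *nlh)
{
	__u32 off = NLMSG_LENGTH(GENL_HDRLEN);	/* skip both fixed headers */

	while (off + NLA_HDRLEN <= nlh->nlmsg_len) {
		const struct nlattr *nla =
			(const struct nlattr *)((const char *)nlh + off);
		__u32 len = nla->nla_len;

		if (len < NLA_HDRLEN || len > nlh->nlmsg_len - off)
			return -1;	/* attribute does not fit in the message */

		if ((nla->nla_type & NLA_TYPE_MASK) == CTRL_ATTR_FAMILY_ID &&
		    len == NLA_HDRLEN + sizeof(__u16)) {
			__u16 id;

			memcpy(&id, (const char *)nla + NLA_HDRLEN, sizeof(id));
			return id;
		}
		off += NLA_ALIGN(len);	/* next header is 4-byte aligned */
	}
	return -1;
}
```

Attributes of other types are simply skipped, which is what makes the
parser independent of the attribute order.
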
| Note that Generic Netlink sockets are not associated or bound to a single |
| family. A socket can be used to exchange messages with many different |
| families, selecting the recipient family on a message-by-message basis using |
| the :c:member:`nlmsghdr.nlmsg_type` field. |
| |
| .. _ext_ack: |
| |
| Extended ACK |
| ------------ |
| |
| Extended ACK controls reporting of additional error/warning TLVs |
| in ``NLMSG_ERROR`` and ``NLMSG_DONE`` messages. To maintain backward |
| compatibility this feature has to be explicitly enabled by setting |
| the ``NETLINK_EXT_ACK`` setsockopt() to ``1``. |
| |
| Types of extended ack attributes are defined in enum nlmsgerr_attrs. |
| The most commonly used attributes are ``NLMSGERR_ATTR_MSG``, |
| ``NLMSGERR_ATTR_OFFS`` and ``NLMSGERR_ATTR_MISS_*``. |
| |
| ``NLMSGERR_ATTR_MSG`` carries a message in English describing |
| the encountered problem. These messages are far more detailed |
| than what can be expressed through standard UNIX error codes. |
| |
| ``NLMSGERR_ATTR_OFFS`` points to the attribute which caused the problem. |
| |
| ``NLMSGERR_ATTR_MISS_TYPE`` and ``NLMSGERR_ATTR_MISS_NEST`` |
| inform about a missing attribute. |
| |
| Extended ACKs can be reported on errors as well as in case of success. |
| The latter should be treated as a warning. |
| |
| Extended ACKs greatly improve the usability of Netlink and should |
| always be enabled, appropriately parsed and reported to the user. |
| |
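As an illustration, the ``NLMSGERR_ATTR_MSG`` string could be extracted
from an ``NLMSG_ERROR`` message roughly as follows. ``nl_extack_msg()`` is
a hypothetical helper; the sketch assumes that ``NETLINK_EXT_ACK`` was
enabled on the socket and that ``nlmsg_len`` was already checked against
the number of bytes received.

```c
#include <stddef.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: return the NLMSGERR_ATTR_MSG string carried by an
 * NLMSG_ERROR message, or NULL if there is none.
 */
static const char *nl_extack_msg(const struct nlmsghdr *nlh)
{
	const struct nlmsgerr *err = NLMSG_DATA(nlh);
	__u32 off = NLMSG_LENGTH(sizeof(*err));

	if (nlh->nlmsg_type != NLMSG_ERROR || nlh->nlmsg_len < off ||
	    !(nlh->nlmsg_flags & NLM_F_ACK_TLVS))
		return NULL;

	/* the request payload is echoed unless the ACK is capped */
	if (!(nlh->nlmsg_flags & NLM_F_CAPPED)) {
		if (err->msg.nlmsg_len < NLMSG_HDRLEN)
			return NULL;
		off += NLMSG_ALIGN(err->msg.nlmsg_len) - NLMSG_HDRLEN;
	}

	while (off + NLA_HDRLEN <= nlh->nlmsg_len) {
		const struct nlattr *nla =
			(const struct nlattr *)((const char *)nlh + off);
		__u32 len = nla->nla_len;

		if (len < NLA_HDRLEN || len > nlh->nlmsg_len - off)
			return NULL;
		if ((nla->nla_type & NLA_TYPE_MASK) == NLMSGERR_ATTR_MSG) {
			const char *str = (const char *)nla + NLA_HDRLEN;

			/* only hand out strings which are NUL-terminated */
			if (len > NLA_HDRLEN && str[len - NLA_HDRLEN - 1] == '\0')
				return str;
			return NULL;
		}
		off += NLA_ALIGN(len);
	}
	return NULL;
}
```
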
| Advanced topics |
| =============== |
| |
| Dump consistency |
| ---------------- |
| |
| Some of the data structures kernel uses for storing objects make |
| it hard to provide an atomic snapshot of all the objects in a dump |
| (without impacting the fast-paths updating them). |
| |
| Kernel may set the ``NLM_F_DUMP_INTR`` flag on any message in a dump |
| (including the ``NLMSG_DONE`` message) if the dump was interrupted and |
| may be inconsistent (e.g. missing objects). User space should retry |
| the dump if it sees the flag set. |
| |
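A sketch of how the flag could be tracked while reading a dump is shown
below. ``nl_dump_scan()`` is a hypothetical helper which processes one
recv() buffer; the caller is assumed to call it for every buffer until
``*done`` is set and to restart the dump if ``*intr`` ended up set.

```c
#include <stdbool.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: walk all messages in one recv() buffer.
 * Sets *done when NLMSG_DONE is found and *intr when any message
 * (including NLMSG_DONE) carries NLM_F_DUMP_INTR.
 */
static void nl_dump_scan(void *buf, int len, bool *done, bool *intr)
{
	struct nlmsghdr *nlh;

	for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
		if (nlh->nlmsg_flags & NLM_F_DUMP_INTR)
			*intr = true;
		if (nlh->nlmsg_type == NLMSG_DONE) {
			*done = true;
			return;
		}
		/* process the object */
	}
}
```
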
| Introspection |
| ------------- |
| |
| The basic introspection abilities are enabled by access to the Family |
| object as reported in :ref:`res_fam`. User can query information about |
| the Generic Netlink family, including which operations are supported |
| by the kernel and what attributes the kernel understands. |
| Family information includes the highest ID of an attribute the kernel can |
| parse; a separate command (``CTRL_CMD_GETPOLICY``) provides detailed |
| information about supported attributes, including ranges of values the |
| kernel accepts. |
| |
| Querying family information is useful in cases when user space needs |
| to make sure that the kernel has support for a feature before issuing |
| a request. |
| |
| .. _nlmsg_pid: |
| |
| nlmsg_pid |
| --------- |
| |
| :c:member:`nlmsghdr.nlmsg_pid` is the Netlink equivalent of an address. |
| It is referred to as Port ID, sometimes Process ID because for historical |
| reasons if the application does not select (bind() to) an explicit Port ID |
| kernel will automatically assign it the ID equal to its Process ID |
| (as reported by the getpid() system call). |
| |
| Similarly to the bind() semantics of the TCP/IP network protocols the value |
| of zero means "assign automatically", hence it is common for applications |
| to leave the :c:member:`nlmsghdr.nlmsg_pid` field initialized to ``0``. |
| |
| The field is still used today in rare cases when kernel needs to send |
| a unicast notification. User space application can use bind() to associate |
| its socket with a specific PID, it then communicates its PID to the kernel. |
| This way the kernel can reach the specific user space process. |
| |
| This sort of communication is utilized in UMH (User Mode Helper)-like |
| scenarios when kernel needs to trigger user space processing or ask user |
| space for a policy decision. |
| |
| Multicast notifications |
| ----------------------- |
| |
| One of the strengths of Netlink is the ability to send event notifications |
| to user space. This is a unidirectional form of communication (kernel -> |
| user) and does not involve any control messages like ``NLMSG_ERROR`` or |
| ``NLMSG_DONE``. |
| |
| For example the Generic Netlink family itself defines a set of multicast |
| notifications about registered families. When a new family is added the |
| sockets subscribed to the notifications will get the following message:: |
| |
| struct nlmsghdr: |
| __u32 nlmsg_len: 136 |
| __u16 nlmsg_type: GENL_ID_CTRL |
| __u16 nlmsg_flags: 0 |
| __u32 nlmsg_seq: 0 |
| __u32 nlmsg_pid: 0 |
| |
| struct genlmsghdr: |
| __u8 cmd: CTRL_CMD_NEWFAMILY |
| __u8 version: 2 |
| __u16 reserved: 0 |
| |
| struct nlattr: |
| __u16 nla_len: 10 |
| __u16 nla_type: CTRL_ATTR_FAMILY_NAME |
| char data: test1\0 |
| |
| (padding:) |
| char data: \0\0 |
| |
| struct nlattr: |
| __u16 nla_len: 6 |
| __u16 nla_type: CTRL_ATTR_FAMILY_ID |
| __u16: 123 /* The ID allocated to the new family */ |
| |
| (padding:) |
| char data: \0\0 |
| |
| struct nlattr: |
| __u16 nla_len: 8 |
| __u16 nla_type: CTRL_ATTR_VERSION |
| __u32: 1 |
| |
| /* ... etc, more attributes will follow. */ |
| |
| The notification contains the same information as the response |
| to the ``CTRL_CMD_GETFAMILY`` request. |
| |
| The Netlink headers of the notification are mostly 0 and irrelevant. |
| The :c:member:`nlmsghdr.nlmsg_seq` may be either zero or a monotonically |
| increasing notification sequence number maintained by the family. |
| |
| To receive notifications the user socket must subscribe to the relevant |
| notification group. Much like the Family ID, the Group ID for a given |
| multicast group is dynamic and can be found inside the Family information. |
| The ``CTRL_ATTR_MCAST_GROUPS`` attribute contains nests with names |
| (``CTRL_ATTR_MCAST_GRP_NAME``) and IDs (``CTRL_ATTR_MCAST_GRP_ID``) of |
| the family's groups. |
| |
| Once the Group ID is known a setsockopt() call adds the socket to the group: |
| |
| .. code-block:: c |
| |
|   unsigned int group_id; |
| |
|   /* .. find the group ID... */ |
| |
|   setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, |
|              &group_id, sizeof(group_id)); |
| |
| The socket will now receive notifications. |
| |
| It is recommended to use separate sockets for receiving notifications |
| and sending requests to the kernel. The asynchronous nature of notifications |
| means that they may get mixed in with the responses making the message |
| handling much harder. |
| |
| Buffer sizing |
| ------------- |
| |
| Netlink sockets are datagram sockets rather than stream sockets, |
| meaning that each message must be received in its entirety by a single |
| recv()/recvmsg() system call. If the buffer provided by the user is too |
| short, the message will be truncated and the ``MSG_TRUNC`` flag set |
| in struct msghdr (struct msghdr is the second argument |
| of the recvmsg() system call, *not* a Netlink header). |
| |
| Upon truncation the remaining part of the message is discarded. |
| |
| Netlink expects that the user buffer will be at least 8kB or the page |
| size of the CPU architecture, whichever is bigger. Particular Netlink |
| families may, however, require a larger buffer. A 32kB buffer is recommended |
| for the most efficient handling of dumps (a larger buffer fits more dumped |
| objects and therefore fewer recvmsg() calls are needed). |
| |
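Detecting truncation requires recvmsg(), since recv() does not expose the
flags of struct msghdr. A minimal sketch follows; ``nl_recv_msg()`` is a
hypothetical helper name and works on any datagram socket.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: receive one datagram into buf and report whether
 * it was truncated. Returns the number of bytes stored in buf, or -1 on
 * error (errno is set by recvmsg()).
 */
static ssize_t nl_recv_msg(int fd, void *buf, size_t size, bool *truncated)
{
	struct iovec iov = { .iov_base = buf, .iov_len = size };
	struct msghdr msg;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	n = recvmsg(fd, &msg, 0);
	if (n < 0)
		return -1;

	/* MSG_TRUNC is reported in struct msghdr, not in a Netlink header */
	*truncated = !!(msg.msg_flags & MSG_TRUNC);
	return n;
}
```

When ``*truncated`` is set the rest of the message is already lost, so the
only remedy is to use a larger buffer and repeat the request.
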
| .. _classic_netlink: |
| |
| Classic Netlink |
| =============== |
| |
| The main differences between Classic and Generic Netlink are the dynamic |
| allocation of subsystem identifiers and availability of introspection. |
| In theory the protocol does not differ significantly, however, in practice |
| Classic Netlink experimented with concepts which were abandoned in Generic |
| Netlink (really, they usually only found use in a small corner of a single |
| subsystem). This section is meant as an explainer of a few of such concepts, |
| with the explicit goal of giving the Generic Netlink |
| users the confidence to ignore them when reading the uAPI headers. |
| |
| Most of the concepts and examples here refer to the ``NETLINK_ROUTE`` family, |
| which covers much of the configuration of the Linux networking stack. |
| Real documentation of that family deserves a chapter (or a book) of its own. |
| |
| Families |
| -------- |
| |
| Netlink refers to subsystems as families. This is a remnant of using |
| sockets and the concept of protocol families, which are part of message |
| demultiplexing in ``NETLINK_ROUTE``. |
| |
| Sadly every layer of encapsulation likes to refer to whatever it's carrying |
| as "families" making the term very confusing: |
| |
| 1. AF_NETLINK is a bona fide socket protocol family |
| 2. AF_NETLINK's documentation refers to what comes after its own |
| header (struct nlmsghdr) in a message as a "Family Header" |
| 3. Generic Netlink is a family for AF_NETLINK (struct genlmsghdr follows |
| struct nlmsghdr), yet it also calls its users "Families". |
| |
| Note that the Generic Netlink Family IDs are in a different "ID space" |
| and overlap with Classic Netlink protocol numbers (e.g. ``NETLINK_CRYPTO`` |
| has the Classic Netlink protocol ID of 21 which Generic Netlink will |
| happily allocate to one of its families as well). |
| |
| Strict checking |
| --------------- |
| |
| The ``NETLINK_GET_STRICT_CHK`` socket option enables strict input checking |
| in ``NETLINK_ROUTE``. It was needed because historically kernel did not |
| validate the fields of structures it didn't process. This made it impossible |
| to start using those fields later without risking regressions in applications |
| which initialized them incorrectly or not at all. |
| |
| ``NETLINK_GET_STRICT_CHK`` declares that the application is initializing |
| all fields correctly. It also opts into validating that the message does not |
| contain trailing data and requests that the kernel reject attributes with |
| a type higher than the largest attribute type known to the kernel. |
| |
| ``NETLINK_GET_STRICT_CHK`` is not used outside of ``NETLINK_ROUTE``. |
| |
| Unknown attributes |
| ------------------ |
| |
| Historically Netlink ignored all unknown attributes. The thinking was that |
| it would free the application from having to probe what kernel supports. |
| The application could make a request to change the state and check which |
| parts of the request "stuck". |
| |
| This is no longer the case for new Generic Netlink families and those opting |
| in to strict checking. See enum netlink_validation for validation types |
| performed. |
| |
| Fixed metadata and structures |
| ----------------------------- |
| |
| Classic Netlink made liberal use of fixed-format structures within |
| the messages. Messages would commonly have a structure with |
| a considerable number of fields after struct nlmsghdr. It was also |
| common to put structures with multiple members inside attributes, |
| without breaking each member into an attribute of its own. |
| |
| This has caused problems with validation and extensibility and |
| therefore using binary structures is actively discouraged for new |
| attributes. |
| |
| Request types |
| ------------- |
| |
| ``NETLINK_ROUTE`` categorized requests into 4 types: ``NEW``, ``DEL``, ``GET``, |
| and ``SET``. Each object can handle all or some of those requests |
| (objects being netdevs, routes, addresses, qdiscs etc.). The request type |
| is defined by the 2 lowest bits of the message type, so commands for |
| new objects would always be allocated with a stride of 4. |
| |
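The encoding can be illustrated with the message types from
``include/uapi/linux/rtnetlink.h``. The enum and ``rtm_req_kind()`` below
are hypothetical names made up for this sketch; only the ``RTM_*``
constants come from the uAPI header.

```c
#include <linux/rtnetlink.h>

/* Hypothetical names for the 4 request types, in encoding order */
enum req_kind { REQ_NEW, REQ_DEL, REQ_GET, REQ_SET };

/*
 * Hypothetical helper: the request type is stored in the 2 lowest bits
 * of the message type, counted from RTM_BASE.
 */
static enum req_kind rtm_req_kind(unsigned short nlmsg_type)
{
	return (enum req_kind)((nlmsg_type - RTM_BASE) & 3);
}
```

For example ``RTM_NEWLINK``, ``RTM_DELLINK``, ``RTM_GETLINK`` and
``RTM_SETLINK`` occupy four consecutive values, and the next object
(addresses) starts at ``RTM_NEWADDR``, 4 values after ``RTM_NEWLINK``.
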
| Each object would also have its own fixed metadata shared by all request |
| types (e.g. struct ifinfomsg for netdev requests, struct ifaddrmsg for address |
| requests, struct tcmsg for qdisc requests). |
| |
| Even though other protocols and Generic Netlink commands often use |
| the same verbs in their message names (``GET``, ``SET``) the concept |
| of request types did not find wider adoption. |
| |
| Notification echo |
| ----------------- |
| |
| ``NLM_F_ECHO`` requests that notifications resulting from the request |
| be queued onto the requesting socket. This is useful to discover |
| the impact of the request. |
| |
| Note that this feature is not universally implemented. |
| |
| Other request-type-specific flags |
| --------------------------------- |
| |
| Classic Netlink defined various flags for its ``GET``, ``NEW`` |
| and ``DEL`` requests in the upper byte of nlmsg_flags in struct nlmsghdr. |
| Since request types have not been generalized the request type specific |
| flags are rarely used (and considered deprecated for new families). |
| |
| For ``GET`` - ``NLM_F_ROOT`` and ``NLM_F_MATCH`` are combined into |
| ``NLM_F_DUMP``, and not used separately. ``NLM_F_ATOMIC`` is never used. |
| |
| For ``DEL`` - ``NLM_F_NONREC`` is only used by nftables and ``NLM_F_BULK`` |
| only by some FDB operations. |
| |
| The flags for ``NEW`` are used most commonly in classic Netlink. Unfortunately, |
| the meaning is not crystal clear. The following description is based on the |
| best guess of the intention of the authors, and in practice all families |
| stray from it in one way or another. ``NLM_F_REPLACE`` asks to replace |
| an existing object, if no matching object exists the operation should fail. |
| ``NLM_F_EXCL`` has the opposite semantics and only succeeds if object already |
| existed. |
| ``NLM_F_CREATE`` asks for the object to be created if it does not |
| exist, it can be combined with ``NLM_F_REPLACE`` and ``NLM_F_EXCL``. |
| |
| A comment in the main Netlink uAPI header states:: |
| |
| 4.4BSD ADD NLM_F_CREATE|NLM_F_EXCL |
| 4.4BSD CHANGE NLM_F_REPLACE |
| |
| True CHANGE NLM_F_CREATE|NLM_F_REPLACE |
| Append NLM_F_CREATE |
| Check NLM_F_EXCL |
| |
| which seems to indicate that those flags predate request types. |
| ``NLM_F_REPLACE`` without ``NLM_F_CREATE`` was initially used instead |
| of ``SET`` commands. |
| ``NLM_F_EXCL`` without ``NLM_F_CREATE`` was used to check if object exists |
| without creating it, presumably predating ``GET`` commands. |
| |
| ``NLM_F_APPEND`` indicates that if one key can have multiple objects associated |
| with it (e.g. multiple next-hop objects for a route) the new object should be |
| added to the list rather than replacing the entire list. |
| |
| uAPI reference |
| ============== |
| |
| .. kernel-doc:: include/uapi/linux/netlink.h |