| <head> |
| <style> p { max-width:50em} ol, ul {max-width: 40em}</style> |
| </head> |
| |
| autofs - how it works |
| ===================== |
| |
| Purpose |
| ------- |
| |
| The goal of autofs is to provide on-demand mounting and race free |
| automatic unmounting of various other filesystems. This provides two |
| key advantages: |
| |
| 1. There is no need to delay boot until all filesystems that |
| might be needed are mounted. Processes that try to access those |
| slow filesystems might be delayed but other processes can |
| continue freely. This is particularly important for |
| network filesystems (e.g. NFS) or filesystems stored on |
| media with a media-changing robot. |
| |
| 2. The names and locations of filesystems can be stored in |
| a remote database and can change at any time. The content |
| in that data base at the time of access will be used to provide |
| a target for the access. The interpretation of names in the |
| filesystem can even be programmatic rather than database-backed, |
| allowing wildcards for example, and can vary based on the user who |
| first accessed a name. |
| |
| Context |
| ------- |
| |
| The "autofs4" filesystem module is only one part of an autofs system. |
| There also needs to be a user-space program which looks up names |
| and mounts filesystems. This will often be the "automount" program, |
| though other tools including "systemd" can make use of "autofs4". |
| This document describes only the kernel module and the interactions |
| required with any user-space program. Subsequent text refers to this |
| as the "automount daemon" or simply "the daemon". |
| |
| "autofs4" is a Linux kernel module with provides the "autofs" |
| filesystem type. Several "autofs" filesystems can be mounted and they |
| can each be managed separately, or all managed by the same daemon. |
| |
| Content |
| ------- |
| |
| An autofs filesystem can contain 3 sorts of objects: directories, |
| symbolic links and mount traps. Mount traps are directories with |
| extra properties as described in the next section. |
| |
| Objects can only be created by the automount daemon: symlinks are |
| created with a regular `symlink` system call, while directories and |
| mount traps are created with `mkdir`. The determination of whether a |
| directory should be a mount trap or not is quite _ad hoc_, largely for |
| historical reasons, and is determined in part by the |
| *direct*/*indirect*/*offset* mount options, and the *maxproto* mount option. |
| |
| If neither the *direct* or *offset* mount options are given (so the |
| mount is considered to be *indirect*), then the root directory is |
| always a regular directory, otherwise it is a mount trap when it is |
| empty and a regular directory when not empty. Note that *direct* and |
| *offset* are treated identically so a concise summary is that the root |
| directory is a mount trap only if the filesystem is mounted *direct* |
| and the root is empty. |
| |
| Directories created in the root directory are mount traps only if the |
| filesystem is mounted *indirect* and they are empty. |
| |
| Directories further down the tree depend on the *maxproto* mount |
| option and particularly whether it is less than five or not. |
| When *maxproto* is five, no directories further down the |
| tree are ever mount traps, they are always regular directories. When |
| the *maxproto* is four (or three), these directories are mount traps |
| precisely when they are empty. |
| |
| So: non-empty (i.e. non-leaf) directories are never mount traps. Empty |
| directories are sometimes mount traps, and sometimes not depending on |
| where in the tree they are (root, top level, or lower), the *maxproto*, |
| and whether the mount was *indirect* or not. |
| |
| Mount Traps |
| --------------- |
| |
| A core element of the implementation of autofs is the Mount Traps |
| which are provided by the Linux VFS. Any directory provided by a |
| filesystem can be designated as a trap. This involves two separate |
| features that work together to allow autofs to do its job. |
| |
| **DCACHE_NEED_AUTOMOUNT** |
| |
| If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if |
| the inode has S_AUTOMOUNT set, or can be set directly) then it is |
| (potentially) a mount trap. Any access to this directory beyond a |
| "`stat`" will (normally) cause the `d_op->d_automount()` dentry operation |
| to be called. The task of this method is to find the filesystem that |
| should be mounted on the directory and to return it. The VFS is |
| responsible for actually mounting the root of this filesystem on the |
| directory. |
| |
| autofs doesn't find the filesystem itself but sends a message to the |
| automount daemon asking it to find and mount the filesystem. The |
| autofs `d_automount` method then waits for the daemon to report that |
| everything is ready. It will then return "`NULL`" indicating that the |
| mount has already happened. The VFS doesn't try to mount anything but |
| follows down the mount that is already there. |
| |
| This functionality is sufficient for some users of mount traps such |
| as NFS which creates traps so that mountpoints on the server can be |
| reflected on the client. However it is not sufficient for autofs. As |
| mounting onto a directory is considered to be "beyond a `stat`", the |
| automount daemon would not be able to mount a filesystem on the 'trap' |
| directory without some way to avoid getting caught in the trap. For |
| that purpose there is another flag. |
| |
| **DCACHE_MANAGE_TRANSIT** |
| |
| If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but |
| related behaviors are invoked, both using the `d_op->d_manage()` |
| dentry operation. |
| |
| Firstly, before checking to see if any filesystem is mounted on the |
| directory, d_manage() will be called with the `rcu_walk` parameter set |
| to `false`. It may return one of three things: |
| |
| - A return value of zero indicates that there is nothing special |
| about this dentry and normal checks for mounts and automounts |
| should proceed. |
| |
| autofs normally returns zero, but first waits for any |
| expiry (automatic unmounting of the mounted filesystem) to |
| complete. This avoids races. |
| |
| - A return value of `-EISDIR` tells the VFS to ignore any mounts |
| on the directory and to not consider calling `->d_automount()`. |
| This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag |
| causing the directory not be a mount trap after all. |
| |
| autofs returns this if it detects that the process performing the |
| lookup is the automount daemon and that the mount has been |
| requested but has not yet completed. How it determines this is |
| discussed later. This allows the automount daemon not to get |
| caught in the mount trap. |
| |
| There is a subtlety here. It is possible that a second autofs |
| filesystem can be mounted below the first and for both of them to |
| be managed by the same daemon. For the daemon to be able to mount |
| something on the second it must be able to "walk" down past the |
| first. This means that d_manage cannot *always* return -EISDIR for |
| the automount daemon. It must only return it when a mount has |
| been requested, but has not yet completed. |
| |
| `d_manage` also returns `-EISDIR` if the dentry shouldn't be a |
| mount trap, either because it is a symbolic link or because it is |
| not empty. |
| |
| - Any other negative value is treated as an error and returned |
| to the caller. |
| |
| autofs can return |
| |
| - -ENOENT if the automount daemon failed to mount anything, |
| - -ENOMEM if it ran out of memory, |
| - -EINTR if a signal arrived while waiting for expiry to |
| complete |
| - or any other error sent down by the automount daemon. |
| |
| |
| The second use case only occurs during an "RCU-walk" and so `rcu_walk` |
| will be set. |
| |
| An RCU-walk is a fast and lightweight process for walking down a |
| filename path (i.e. it is like running on tip-toes). RCU-walk cannot |
| cope with all situations so when it finds a difficulty it falls back |
| to "REF-walk", which is slower but more robust. |
| |
| RCU-walk will never call `->d_automount`; the filesystems must already |
| be mounted or RCU-walk cannot handle the path. |
| To determine if a mount-trap is safe for RCU-walk mode it calls |
| `->d_manage()` with `rcu_walk` set to `true`. |
| |
| In this case `d_manage()` must avoid blocking and should avoid taking |
| spinlocks if at all possible. Its sole purpose is to determine if it |
| would be safe to follow down into any mounted directory and the only |
| reason that it might not be is if an expiry of the mount is |
| underway. |
| |
| In the `rcu_walk` case, `d_manage()` cannot return -EISDIR to tell the |
| VFS that this is a directory that doesn't require d_automount. If |
| `rcu_walk` sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing |
| mounted, it *will* fall back to REF-walk. `d_manage()` cannot make the |
| VFS remain in RCU-walk mode, but can only tell it to get out of |
| RCU-walk mode by returning `-ECHILD`. |
| |
| So `d_manage()`, when called with `rcu_walk` set, should either return |
| -ECHILD if there is any reason to believe it is unsafe to end the |
| mounted filesystem, and otherwise should return 0. |
| |
| autofs will return `-ECHILD` if an expiry of the filesystem has been |
| initiated or is being considered, otherwise it returns 0. |
| |
| |
| Mountpoint expiry |
| ----------------- |
| |
| The VFS has a mechanism for automatically expiring unused mounts, |
| much as it can expire any unused dentry information from the dcache. |
| This is guided by the MNT_SHRINKABLE flag. This only applies to |
| mounts that were created by `d_automount()` returning a filesystem to be |
| mounted. As autofs doesn't return such a filesystem but leaves the |
| mounting to the automount daemon, it must involve the automount daemon |
| in unmounting as well. This also means that autofs has more control |
| of expiry. |
| |
| The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to |
| the `umount` system call. Unmounting with MNT_EXPIRE will fail unless |
| a previous attempt had been made, and the filesystem has been inactive |
| and untouched since that previous attempt. autofs4 does not depend on |
| this but has its own internal tracking of whether filesystems were |
| recently used. This allows individual names in the autofs directory |
| to expire separately. |
| |
| With version 4 of the protocol, the automount daemon can try to |
| unmount any filesystems mounted on the autofs filesystem or remove any |
| symbolic links or empty directories any time it likes. If the unmount |
| or removal is successful the filesystem will be returned to the state |
| it was before the mount or creation, so that any access of the name |
| will trigger normal auto-mount processing. In particlar, `rmdir` and |
| `unlink` do not leave negative entries in the dcache as a normal |
| filesystem would, so an attempt to access a recently-removed object is |
| passed to autofs for handling. |
| |
| With version 5, this is not safe except for unmounting from top-level |
| directories. As lower-level directories are never mount traps, other |
| processes will see an empty directory as soon as the filesystem is |
| unmounted. So it is generally safest to use the autofs expiry |
| protocol described below. |
| |
| Normally the daemon only wants to remove entries which haven't been |
| used for a while. For this purpose autofs maintains a "`last_used`" |
| time stamp on each directory or symlink. For symlinks it genuinely |
| does record the last time the symlink was "used" or followed to find |
| out where it points to. For directories the field is a slight |
| misnomer. It actually records the last time that autofs checked if |
| the directory or one of its descendents was busy and found that it |
| was. This is just as useful and doesn't require updating the field so |
| often. |
| |
| The daemon is able to ask autofs if anything is due to be expired, |
| using an `ioctl` as discussed later. For a *direct* mount, autofs |
| considers if the entire mount-tree can be unmounted or not. For an |
| *indirect* mount, autofs considers each of the names in the top level |
| directory to determine if any of those can be unmounted and cleaned |
| up. |
| |
| There is an option with indirect mounts to consider each of the leaves |
| that has been mounted on instead of considering the top-level names. |
| This is intended for compatability with version 4 of autofs and should |
| be considered as deprecated. |
| |
| When autofs considers a directory it checks the `last_used` time and |
| compares it with the "timeout" value set when the filesystem was |
| mounted, though this check is ignored in some cases. It also checks if |
| the directory or anything below it is in use. For symbolic links, |
| only the `last_used` time is ever considered. |
| |
| If both appear to support expiring the directory or symlink, an action |
| is taken. |
| |
| There are two ways to ask autofs to consider expiry. The first is to |
| use the **AUTOFS_IOC_EXPIRE** ioctl. This only works for indirect |
| mounts. If it finds something in the root directory to expire it will |
| return the name of that thing. Once a name has been returned the |
| automount daemon needs to unmount any filesystems mounted below the |
| name normally. As described above, this is unsafe for non-toplevel |
| mounts in a version-5 autofs. For this reason the current `automountd` |
| does not use this ioctl. |
| |
| The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or |
| the **AUTOFS_IOC_EXPIRE_MULTI** ioctl. This will work for both direct and |
| indirect mounts. If it selects an object to expire, it will notify |
| the daemon using the notification mechanism described below. This |
| will block until the daemon acknowledges the expiry notification. |
| This implies that the "`EXPIRE`" ioctl must be sent from a different |
| thread than the one which handles notification. |
| |
| While the ioctl is blocking, the entry is marked as "expiring" and |
| `d_manage` will block until the daemon affirms that the unmount has |
| completed (together with removing any directories that might have been |
| necessary), or has been aborted. |
| |
| Communicating with autofs: detecting the daemon |
| ----------------------------------------------- |
| |
| There are several forms of communication between the automount daemon |
| and the filesystem. As we have already seen, the daemon can create and |
| remove directories and symlinks using normal filesystem operations. |
| autofs knows whether a process requesting some operation is the daemon |
| or not based on its process-group id number (see getpgid(1)). |
| |
| When an autofs filesystem is mounted the pgid of the mounting |
| processes is recorded unless the "pgrp=" option is given, in which |
| case that number is recorded instead. Any request arriving from a |
| process in that process group is considered to come from the daemon. |
| If the daemon ever has to be stopped and restarted a new pgid can be |
| provided through an ioctl as will be described below. |
| |
| Communicating with autofs: the event pipe |
| ----------------------------------------- |
| |
| When an autofs filesystem is mounted, the 'write' end of a pipe must |
| be passed using the 'fd=' mount option. autofs will write |
| notification messages to this pipe for the daemon to respond to. |
| For version 5, the format of the message is: |
| |
| struct autofs_v5_packet { |
| int proto_version; /* Protocol version */ |
| int type; /* Type of packet */ |
| autofs_wqt_t wait_queue_entry_token; |
| __u32 dev; |
| __u64 ino; |
| __u32 uid; |
| __u32 gid; |
| __u32 pid; |
| __u32 tgid; |
| __u32 len; |
| char name[NAME_MAX+1]; |
| }; |
| |
| where the type is one of |
| |
| autofs_ptype_missing_indirect |
| autofs_ptype_expire_indirect |
| autofs_ptype_missing_direct |
| autofs_ptype_expire_direct |
| |
| so messages can indicate that a name is missing (something tried to |
| access it but it isn't there) or that it has been selected for expiry. |
| |
| The pipe will be set to "packet mode" (equivalent to passing |
| `O_DIRECT`) to _pipe2(2)_ so that a read from the pipe will return at |
| most one packet, and any unread portion of a packet will be discarded. |
| |
| The `wait_queue_entry_token` is a unique number which can identify a |
| particular request to be acknowledged. When a message is sent over |
| the pipe the affected dentry is marked as either "active" or |
| "expiring" and other accesses to it block until the message is |
| acknowledged using one of the ioctls below and the relevant |
| `wait_queue_entry_token`. |
| |
| Communicating with autofs: root directory ioctls |
| ------------------------------------------------ |
| |
| The root directory of an autofs filesystem will respond to a number of |
| ioctls. The process issuing the ioctl must have the CAP_SYS_ADMIN |
| capability, or must be the automount daemon. |
| |
| The available ioctl commands are: |
| |
| - **AUTOFS_IOC_READY**: a notification has been handled. The argument |
| to the ioctl command is the "wait_queue_entry_token" number |
| corresponding to the notification being acknowledged. |
| - **AUTOFS_IOC_FAIL**: similar to above, but indicates failure with |
| the error code `ENOENT`. |
| - **AUTOFS_IOC_CATATONIC**: Causes the autofs to enter "catatonic" |
| mode meaning that it stops sending notifications to the daemon. |
| This mode is also entered if a write to the pipe fails. |
| - **AUTOFS_IOC_PROTOVER**: This returns the protocol version in use. |
| - **AUTOFS_IOC_PROTOSUBVER**: Returns the protocol sub-version which |
| is really a version number for the implementation. It is |
| currently 2. |
| - **AUTOFS_IOC_SETTIMEOUT**: This passes a pointer to an unsigned |
| long. The value is used to set the timeout for expiry, and |
| the current timeout value is stored back through the pointer. |
| - **AUTOFS_IOC_ASKUMOUNT**: Returns, in the pointed-to `int`, 1 if |
| the filesystem could be unmounted. This is only a hint as |
| the situation could change at any instant. This call can be |
| use to avoid a more expensive full unmount attempt. |
| - **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is |
| anything suitable to expire. A pointer to a packet: |
| |
| struct autofs_packet_expire_multi { |
| int proto_version; /* Protocol version */ |
| int type; /* Type of packet */ |
| autofs_wqt_t wait_queue_entry_token; |
| int len; |
| char name[NAME_MAX+1]; |
| }; |
| |
| is required. This is filled in with the name of something |
| that can be unmounted or removed. If nothing can be expired, |
| `errno` is set to `EAGAIN`. Even though a `wait_queue_entry_token` |
| is present in the structure, no "wait queue" is established |
| and no acknowledgment is needed. |
| - **AUTOFS_IOC_EXPIRE_MULTI**: This is similar to |
| **AUTOFS_IOC_EXPIRE** except that it causes notification to be |
| sent to the daemon, and it blocks until the daemon acknowledges. |
| The argument is an integer which can contain two different flags. |
| |
| **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored |
| and objects are expired if the are not in use. |
| |
| **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level |
| name to expire. This is only safe when *maxproto* is 4. |
| |
| Communicating with autofs: char-device ioctls |
| --------------------------------------------- |
| |
| It is not always possible to open the root of an autofs filesystem, |
| particularly a *direct* mounted filesystem. If the automount daemon |
| is restarted there is no way for it to regain control of existing |
| mounts using any of the above communication channels. To address this |
| need there is a "miscellaneous" character device (major 10, minor 235) |
| which can be used to communicate directly with the autofs filesystem. |
| It requires CAP_SYS_ADMIN for access. |
| |
| The `ioctl`s that can be used on this device are described in a separate |
| document `autofs4-mount-control.txt`, and are summarized briefly here. |
| Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure: |
| |
| struct autofs_dev_ioctl { |
| __u32 ver_major; |
| __u32 ver_minor; |
| __u32 size; /* total size of data passed in |
| * including this struct */ |
| __s32 ioctlfd; /* automount command fd */ |
| |
| /* Command parameters */ |
| union { |
| struct args_protover protover; |
| struct args_protosubver protosubver; |
| struct args_openmount openmount; |
| struct args_ready ready; |
| struct args_fail fail; |
| struct args_setpipefd setpipefd; |
| struct args_timeout timeout; |
| struct args_requester requester; |
| struct args_expire expire; |
| struct args_askumount askumount; |
| struct args_ismountpoint ismountpoint; |
| }; |
| |
| char path[0]; |
| }; |
| |
| For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target |
| filesystem is identified by the `path`. All other commands identify |
| the filesystem by the `ioctlfd` which is a file descriptor open on the |
| root, and which can be returned by **OPEN_MOUNT**. |
| |
| The `ver_major` and `ver_minor` are in/out parameters which check that |
| the requested version is supported, and report the maximum version |
| that the kernel module can support. |
| |
| Commands are: |
| |
| - **AUTOFS_DEV_IOCTL_VERSION_CMD**: does nothing, except validate and |
| set version numbers. |
| - **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: return an open file descriptor |
| on the root of an autofs filesystem. The filesystem is identified |
| by name and device number, which is stored in `openmount.devid`. |
| Device numbers for existing filesystems can be found in |
| `/proc/self/mountinfo`. |
| - **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: same as `close(ioctlfd)`. |
| - **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: if the filesystem is in |
| catatonic mode, this can provide the write end of a new pipe |
| in `setpipefd.pipefd` to re-establish communication with a daemon. |
| The process group of the calling process is used to identify the |
| daemon. |
| - **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a |
| name within the filesystem that has been auto-mounted on. |
| On successful return, `requester.uid` and `requester.gid` will be |
| the UID and GID of the process which triggered that mount. |
| - **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: Check if path is a |
| mountpoint of a particular type - see separate documentation for |
| details. |
| - **AUTOFS_DEV_IOCTL_PROTOVER_CMD**: |
| - **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**: |
| - **AUTOFS_DEV_IOCTL_READY_CMD**: |
| - **AUTOFS_DEV_IOCTL_FAIL_CMD**: |
| - **AUTOFS_DEV_IOCTL_CATATONIC_CMD**: |
| - **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**: |
| - **AUTOFS_DEV_IOCTL_EXPIRE_CMD**: |
| - **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**: These all have the same |
| function as the similarly named **AUTOFS_IOC** ioctls, except |
| that **FAIL** can be given an explicit error number in `fail.status` |
| instead of assuming `ENOENT`, and this **EXPIRE** command |
| corresponds to **AUTOFS_IOC_EXPIRE_MULTI**. |
| |
| Catatonic mode |
| -------------- |
| |
| As mentioned, an autofs mount can enter "catatonic" mode. This |
| happens if a write to the notification pipe fails, or if it is |
| explicitly requested by an `ioctl`. |
| |
| When entering catatonic mode, the pipe is closed and any pending |
| notifications are acknowledged with the error `ENOENT`. |
| |
| Once in catatonic mode attempts to access non-existing names will |
| result in `ENOENT` while attempts to access existing directories will |
| be treated in the same way as if they came from the daemon, so mount |
| traps will not fire. |
| |
| When the filesystem is mounted a _uid_ and _gid_ can be given which |
| set the ownership of directories and symbolic links. When the |
| filesystem is in catatonic mode, any process with a matching UID can |
| create directories or symlinks in the root directory, but not in other |
| directories. |
| |
| Catatonic mode can only be left via the |
| **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on the `/dev/autofs`. |
| |
| autofs, name spaces, and shared mounts |
| -------------------------------------- |
| |
| With bind mounts and name spaces it is possible for an autofs |
| filesystem to appear at multiple places in one or more filesystem |
| name spaces. For this to work sensibly, the autofs filesystem should |
| always be mounted "shared". e.g. |
| |
| > `mount --make-shared /autofs/mount/point` |
| |
| The automount daemon is only able to manage a single mount location for |
| an autofs filesystem and if mounts on that are not 'shared', other |
| locations will not behave as expected. In particular access to those |
| other locations will likely result in the `ELOOP` error |
| |
| > Too many levels of symbolic links |