| .. SPDX-License-Identifier: GPL-2.0 |
| |
| ============================== |
| Network Filesystem Caching API |
| ============================== |
| |
| Fscache provides an API by which a network filesystem can make use of local |
| caching facilities. The API is arranged around a number of principles: |
| |
| (1) A cache is logically organised into volumes and data storage objects |
| within those volumes. |
| |
| (2) Volumes and data storage objects are represented by various types of |
| cookie. |
| |
| (3) Cookies have keys that distinguish them from their peers. |
| |
| (4) Cookies have coherency data that allows a cache to determine if the |
| cached data is still valid. |
| |
| (5) I/O is done asynchronously where possible. |
| |
| This API is used by:: |
| |
| #include <linux/fscache.h>. |
| |
| .. This document contains the following sections: |
| |
| (1) Overview |
| (2) Volume registration |
| (3) Data file registration |
| (4) Declaring a cookie to be in use |
| (5) Resizing a data file (truncation) |
| (6) Data I/O API |
| (7) Data file coherency |
| (8) Data file invalidation |
| (9) Write back resource management |
| (10) Caching of local modifications |
| (11) Page release and invalidation |
| |
| |
| Overview |
| ======== |
| |
| The fscache hierarchy is organised on two levels from a network filesystem's |
| point of view. The upper level represents "volumes" and the lower level |
| represents "data storage objects". These are represented by two types of |
| cookie, hereafter referred to as "volume cookies" and "cookies". |
| |
| A network filesystem acquires a volume cookie for a volume using a volume key, |
| which represents all the information that defines that volume (e.g. cell name |
| or server address, volume ID or share name). This must be rendered as a |
| printable string that can be used as a directory name (ie. no '/' characters |
| and shouldn't begin with a '.'). The maximum name length is one less than the |
| maximum size of a filename component (allowing the cache backend one char for |
| its own purposes). |
| |
| A filesystem would typically have a volume cookie for each superblock. |
| |
| The filesystem then acquires a cookie for each file within that volume using an |
| object key. Object keys are binary blobs and only need to be unique within |
| their parent volume. The cache backend is responsible for rendering the binary |
| blob into something it can use and may employ hash tables, trees or whatever to |
| improve its ability to find an object. This is transparent to the network |
| filesystem. |
| |
| A filesystem would typically have a cookie for each inode, and would acquire it |
| in iget and relinquish it when evicting the cookie. |
| |
| Once it has a cookie, the filesystem needs to mark the cookie as being in use. |
| This causes fscache to send the cache backend off to look up/create resources |
| for the cookie in the background, to check its coherency and, if necessary, to |
| mark the object as being under modification. |
| |
| A filesystem would typically "use" the cookie in its file open routine and |
| unuse it in file release and it needs to use the cookie around calls to |
| truncate the cookie locally. It *also* needs to use the cookie when the |
| pagecache becomes dirty and unuse it when writeback is complete. This is |
| slightly tricky, and provision is made for it. |
| |
| When performing a read, write or resize on a cookie, the filesystem must first |
| begin an operation. This copies the resources into a holding struct and puts |
| extra pins into the cache to stop cache withdrawal from tearing down the |
| structures being used. The actual operation can then be issued and conflicting |
| invalidations can be detected upon completion. |
| |
| The filesystem is expected to use netfslib to access the cache, but that's not |
| actually required and it can use the fscache I/O API directly. |
| |
| |
| Volume Registration |
| =================== |
| |
| The first step for a network filesystem is to acquire a volume cookie for the |
| volume it wants to access:: |
| |
| struct fscache_volume * |
| fscache_acquire_volume(const char *volume_key, |
| const char *cache_name, |
| const void *coherency_data, |
| size_t coherency_len); |
| |
| This function creates a volume cookie with the specified volume key as its name |
| and notes the coherency data. |
| |
| The volume key must be a printable string with no '/' characters in it. It |
| should begin with the name of the filesystem and should be no longer than 254 |
| characters. It should uniquely represent the volume and will be matched with |
| what's stored in the cache. |
| |
| The caller may also specify the name of the cache to use. If specified, |
| fscache will look up or create a cache cookie of that name and will use a cache |
| of that name if it is online or comes online. If no cache name is specified, |
| it will use the first cache that comes to hand and set the name to that. |
| |
| The specified coherency data is stored in the cookie and will be matched |
| against coherency data stored on disk. The data pointer may be NULL if no data |
| is provided. If the coherency data doesn't match, the entire cache volume will |
| be invalidated. |
| |
| This function can return errors such as EBUSY if the volume key is already in |
| use by an acquired volume or ENOMEM if an allocation failure occurred. It may |
| also return a NULL volume cookie if fscache is not enabled. It is safe to |
| pass a NULL cookie to any function that takes a volume cookie. This will |
| cause that function to do nothing. |
| |
| |
| When the network filesystem has finished with a volume, it should relinquish it |
| by calling:: |
| |
| void fscache_relinquish_volume(struct fscache_volume *volume, |
| const void *coherency_data, |
| bool invalidate); |
| |
| This will cause the volume to be committed or removed, and if sealed the |
| coherency data will be set to the value supplied. The amount of coherency data |
| must match the length specified when the volume was acquired. Note that all |
| data cookies obtained in this volume must be relinquished before the volume is |
| relinquished. |
| |
| |
| Data File Registration |
| ====================== |
| |
| Once it has a volume cookie, a network filesystem can use it to acquire a |
| cookie for data storage:: |
| |
| struct fscache_cookie * |
| fscache_acquire_cookie(struct fscache_volume *volume, |
| u8 advice, |
| const void *index_key, |
| size_t index_key_len, |
| const void *aux_data, |
| size_t aux_data_len, |
| loff_t object_size) |
| |
| This creates the cookie in the volume using the specified index key. The index |
| key is a binary blob of the given length and must be unique for the volume. |
| This is saved into the cookie. There are no restrictions on the content, but |
| its length shouldn't exceed about three quarters of the maximum filename length |
| to allow for encoding. |
| |
| The caller should also pass in a piece of coherency data in aux_data. A buffer |
| of size aux_data_len will be allocated and the coherency data copied in. It is |
| assumed that the size is invariant over time. The coherency data is used to |
| check the validity of data in the cache. Functions are provided by which the |
| coherency data can be updated. |
| |
| The file size of the object being cached should also be provided. This may be |
| used to trim the data and will be stored with the coherency data. |
| |
| This function never returns an error, though it may return a NULL cookie on |
| allocation failure or if fscache is not enabled. It is safe to pass in a NULL |
| volume cookie and pass the NULL cookie returned to any function that takes it. |
| This will cause that function to do nothing. |
| |
| |
| When the network filesystem has finished with a cookie, it should relinquish it |
| by calling:: |
| |
| void fscache_relinquish_cookie(struct fscache_cookie *cookie, |
| bool retire); |
| |
| This will cause fscache to either commit the storage backing the cookie or |
| delete it. |
| |
| |
| Marking A Cookie In-Use |
| ======================= |
| |
| Once a cookie has been acquired by a network filesystem, the filesystem should |
| tell fscache when it intends to use the cookie (typically done on file open) |
| and should say when it has finished with it (typically on file close):: |
| |
| void fscache_use_cookie(struct fscache_cookie *cookie, |
| bool will_modify); |
| void fscache_unuse_cookie(struct fscache_cookie *cookie, |
| const void *aux_data, |
| const loff_t *object_size); |
| |
| The *use* function tells fscache that it will use the cookie and, additionally, |
| indicate if the user is intending to modify the contents locally. If not yet |
| done, this will trigger the cache backend to go and gather the resources it |
| needs to access/store data in the cache. This is done in the background, and |
| so may not be complete by the time the function returns. |
| |
| The *unuse* function indicates that a filesystem has finished using a cookie. |
| It optionally updates the stored coherency data and object size and then |
| decreases the in-use counter. When the last user unuses the cookie, it is |
| scheduled for garbage collection. If not reused within a short time, the |
| resources will be released to reduce system resource consumption. |
| |
| A cookie must be marked in-use before it can be accessed for read, write or |
| resize - and an in-use mark must be kept whilst there is dirty data in the |
| pagecache in order to avoid an oops due to trying to open a file during process |
| exit. |
| |
| Note that in-use marks are cumulative. For each time a cookie is marked |
| in-use, it must be unused. |
| |
| |
| Resizing A Data File (Truncation) |
| ================================= |
| |
| If a network filesystem file is resized locally by truncation, the following |
| should be called to notify the cache:: |
| |
| void fscache_resize_cookie(struct fscache_cookie *cookie, |
| loff_t new_size); |
| |
| The caller must have first marked the cookie in-use. The cookie and the new |
| size are passed in and the cache is synchronously resized. This is expected to |
| be called from ``->setattr()`` inode operation under the inode lock. |
| |
| |
| Data I/O API |
| ============ |
| |
| To do data I/O operations directly through a cookie, the following functions |
| are available:: |
| |
| int fscache_begin_read_operation(struct netfs_cache_resources *cres, |
| struct fscache_cookie *cookie); |
| int fscache_read(struct netfs_cache_resources *cres, |
| loff_t start_pos, |
| struct iov_iter *iter, |
| enum netfs_read_from_hole read_hole, |
| netfs_io_terminated_t term_func, |
| void *term_func_priv); |
| int fscache_write(struct netfs_cache_resources *cres, |
| loff_t start_pos, |
| struct iov_iter *iter, |
| netfs_io_terminated_t term_func, |
| void *term_func_priv); |
| |
| The *begin* function sets up an operation, attaching the resources required to |
| the cache resources block from the cookie. Assuming it doesn't return an error |
| (for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do |
| nothing), then one of the other two functions can be issued. |
| |
| The *read* and *write* functions initiate a direct-IO operation. Both take the |
| previously set up cache resources block, an indication of the start file |
| position, and an I/O iterator that describes buffer and indicates the amount of |
| data. |
| |
| The read function also takes a parameter to indicate how it should handle a |
| partially populated region (a hole) in the disk content. This may be to ignore |
| it, skip over an initial hole and place zeros in the buffer or give an error. |
| |
| The read and write functions can be given an optional termination function that |
| will be run on completion:: |
| |
| typedef |
| void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error, |
| bool was_async); |
| |
| If a termination function is given, the operation will be run asynchronously |
| and the termination function will be called upon completion. If not given, the |
| operation will be run synchronously. Note that in the asynchronous case, it is |
| possible for the operation to complete before the function returns. |
| |
| Both the read and write functions end the operation when they complete, |
| detaching any pinned resources. |
| |
| The read operation will fail with ESTALE if invalidation occurred whilst the |
| operation was ongoing. |
| |
| |
| Data File Coherency |
| =================== |
| |
| To request an update of the coherency data and file size on a cookie, the |
| following should be called:: |
| |
| void fscache_update_cookie(struct fscache_cookie *cookie, |
| const void *aux_data, |
| const loff_t *object_size); |
| |
| This will update the cookie's coherency data and/or file size. |
| |
| |
| Data File Invalidation |
| ====================== |
| |
| Sometimes it will be necessary to invalidate an object that contains data. |
| Typically this will be necessary when the server informs the network filesystem |
| of a remote third-party change - at which point the filesystem has to throw |
| away the state and cached data that it had for an file and reload from the |
| server. |
| |
| To indicate that a cache object should be invalidated, the following should be |
| called:: |
| |
| void fscache_invalidate(struct fscache_cookie *cookie, |
| const void *aux_data, |
| loff_t size, |
| unsigned int flags); |
| |
| This increases the invalidation counter in the cookie to cause outstanding |
| reads to fail with -ESTALE, sets the coherency data and file size from the |
| information supplied, blocks new I/O on the cookie and dispatches the cache to |
| go and get rid of the old data. |
| |
| Invalidation runs asynchronously in a worker thread so that it doesn't block |
| too much. |
| |
| |
| Write-Back Resource Management |
| ============================== |
| |
| To write data to the cache from network filesystem writeback, the cache |
| resources required need to be pinned at the point the modification is made (for |
| instance when the page is marked dirty) as it's not possible to open a file in |
| a thread that's exiting. |
| |
| The following facilities are provided to manage this: |
| |
| * An inode flag, ``I_PINNING_FSCACHE_WB``, is provided to indicate that an |
| in-use is held on the cookie for this inode. It can only be changed if the |
| the inode lock is held. |
| |
| * A flag, ``unpinned_fscache_wb`` is placed in the ``writeback_control`` |
| struct that gets set if ``__writeback_single_inode()`` clears |
| ``I_PINNING_FSCACHE_WB`` because all the dirty pages were cleared. |
| |
| To support this, the following functions are provided:: |
| |
| bool fscache_dirty_folio(struct address_space *mapping, |
| struct folio *folio, |
| struct fscache_cookie *cookie); |
| void fscache_unpin_writeback(struct writeback_control *wbc, |
| struct fscache_cookie *cookie); |
| void fscache_clear_inode_writeback(struct fscache_cookie *cookie, |
| struct inode *inode, |
| const void *aux); |
| |
| The *set* function is intended to be called from the filesystem's |
| ``dirty_folio`` address space operation. If ``I_PINNING_FSCACHE_WB`` is not |
| set, it sets that flag and increments the use count on the cookie (the caller |
| must already have called ``fscache_use_cookie()``). |
| |
| The *unpin* function is intended to be called from the filesystem's |
| ``write_inode`` superblock operation. It cleans up after writing by unusing |
| the cookie if unpinned_fscache_wb is set in the writeback_control struct. |
| |
| The *clear* function is intended to be called from the netfs's ``evict_inode`` |
| superblock operation. It must be called *after* |
| ``truncate_inode_pages_final()``, but *before* ``clear_inode()``. This cleans |
| up any hanging ``I_PINNING_FSCACHE_WB``. It also allows the coherency data to |
| be updated. |
| |
| |
| Caching of Local Modifications |
| ============================== |
| |
| If a network filesystem has locally modified data that it wants to write to the |
| cache, it needs to mark the pages to indicate that a write is in progress, and |
| if the mark is already present, it needs to wait for it to be removed first |
| (presumably due to an already in-progress operation). This prevents multiple |
| competing DIO writes to the same storage in the cache. |
| |
| Firstly, the netfs should determine if caching is available by doing something |
| like:: |
| |
| bool caching = fscache_cookie_enabled(cookie); |
| |
| If caching is to be attempted, pages should be waited for and then marked using |
| the following functions provided by the netfs helper library:: |
| |
| void set_page_fscache(struct page *page); |
| void wait_on_page_fscache(struct page *page); |
| int wait_on_page_fscache_killable(struct page *page); |
| |
| Once all the pages in the span are marked, the netfs can ask fscache to |
| schedule a write of that region:: |
| |
| void fscache_write_to_cache(struct fscache_cookie *cookie, |
| struct address_space *mapping, |
| loff_t start, size_t len, loff_t i_size, |
| netfs_io_terminated_t term_func, |
| void *term_func_priv, |
| bool caching) |
| |
| And if an error occurs before that point is reached, the marks can be removed |
| by calling:: |
| |
| void fscache_clear_page_bits(struct address_space *mapping, |
| loff_t start, size_t len, |
| bool caching) |
| |
| In these functions, a pointer to the mapping to which the source pages are |
| attached is passed in and start and len indicate the size of the region that's |
| going to be written (it doesn't have to align to page boundaries necessarily, |
| but it does have to align to DIO boundaries on the backing filesystem). The |
| caching parameter indicates if caching should be skipped, and if false, the |
| functions do nothing. |
| |
| The write function takes some additional parameters: the cookie representing |
| the cache object to be written to, i_size indicates the size of the netfs file |
| and term_func indicates an optional completion function, to which |
| term_func_priv will be passed, along with the error or amount written. |
| |
| Note that the write function will always run asynchronously and will unmark all |
| the pages upon completion before calling term_func. |
| |
| |
| Page Release and Invalidation |
| ============================= |
| |
| Fscache keeps track of whether we have any data in the cache yet for a cache |
| object we've just created. It knows it doesn't have to do any reading until it |
| has done a write and then the page it wrote from has been released by the VM, |
| after which it *has* to look in the cache. |
| |
| To inform fscache that a page might now be in the cache, the following function |
| should be called from the ``release_folio`` address space op:: |
| |
| void fscache_note_page_release(struct fscache_cookie *cookie); |
| |
| if the page has been released (ie. release_folio returned true). |
| |
| Page release and page invalidation should also wait for any mark left on the |
| page to say that a DIO write is underway from that page:: |
| |
| void wait_on_page_fscache(struct page *page); |
| int wait_on_page_fscache_killable(struct page *page); |
| |
| |
| API Function Reference |
| ====================== |
| |
| .. kernel-doc:: include/linux/fscache.h |