Merge tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- MD pull requests via Song:
- Cleanup redundant checks (Yu Kuai)
- Remove deprecated headers (Marc Zyngier, Song Liu)
- Concurrency fixes (Li Lingfeng)
- Memory leak fix (Li Nan)
- Refactor raid1 read_balance (Yu Kuai, Paul Luse)
- Clean up and fix for md_ioctl (Li Nan)
- Other small fixes (Gui-Dong Han, Heming Zhao)
- MD atomic limits (Christoph)
- NVMe pull request via Keith:
- RDMA target enhancements (Max)
- Fabrics fixes (Max, Guixin, Hannes)
- Atomic queue_limits usage (Christoph)
- Const use for class_register (Ricardo)
- Identification error handling fixes (Shin'ichiro, Keith)
- Improvement and cleanup for cached request handling (Christoph)
- Moving towards atomic queue limits. Core changes and driver bits so
far (Christoph)
- Fix UAF issues in aoeblk (Chun-Yi)
- Zoned fix and cleanups (Damien)
- s390 dasd cleanups and fixes (Jan, Miroslav)
- Block issue timestamp caching (me)
- noio scope guarding for zoned IO (Johannes)
- block/nvme PI improvements (Kanchan)
- Ability to terminate long running discard loop (Keith)
- bdev revalidation fix (Li)
- Get rid of old nr_queues hack for kdump kernels (Ming)
- Support for async deletion of ublk (Ming)
- Improve IRQ bio recycling (Pavel)
- Factor in CPU capacity for remote vs local completion (Qais)
- Add shared_tags configfs entry for null_blk (Shin'ichiro
- Fix for a regression in page refcounts introduced by the folio
unification (Tony)
- Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid,
Ricardo, Roman, Tang, Uwe)
* tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits)
block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC
block/swim: Convert to platform remove callback returning void
cdrom: gdrom: Convert to platform remove callback returning void
block: remove disk_stack_limits
md: remove mddev->queue
md: don't initialize queue limits
md/raid10: use the atomic queue limit update APIs
md/raid5: use the atomic queue limit update APIs
md/raid1: use the atomic queue limit update APIs
md/raid0: use the atomic queue limit update APIs
md: add queue limit helpers
md: add a mddev_is_dm helper
md: add a mddev_add_trace_msg helper
md: add a mddev_trace_remap helper
bcache: move calculation of stripe_size and io_opt into bcache_device_init
virtio_blk: Do not use disk_set_max_open/active_zones()
aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts
block: move capacity validation to blkpg_do_ioctl()
block: prevent division by zero in blk_rq_stat_sum()
drbd: atomically update queue limits in drbd_reconsider_queue_parameters
...
diff --git a/.mailmap b/.mailmap
index 04998f7..bd9f102 100644
--- a/.mailmap
+++ b/.mailmap
@@ -191,10 +191,11 @@
Gao Xiang <xiang@kernel.org> <hsiangkao@aol.com>
Gao Xiang <xiang@kernel.org> <hsiangkao@linux.alibaba.com>
Gao Xiang <xiang@kernel.org> <hsiangkao@redhat.com>
-Geliang Tang <geliang.tang@linux.dev> <geliang.tang@suse.com>
-Geliang Tang <geliang.tang@linux.dev> <geliangtang@xiaomi.com>
-Geliang Tang <geliang.tang@linux.dev> <geliangtang@gmail.com>
-Geliang Tang <geliang.tang@linux.dev> <geliangtang@163.com>
+Geliang Tang <geliang@kernel.org> <geliang.tang@linux.dev>
+Geliang Tang <geliang@kernel.org> <geliang.tang@suse.com>
+Geliang Tang <geliang@kernel.org> <geliangtang@xiaomi.com>
+Geliang Tang <geliang@kernel.org> <geliangtang@gmail.com>
+Geliang Tang <geliang@kernel.org> <geliangtang@163.com>
Georgi Djakov <djakov@kernel.org> <georgi.djakov@linaro.org>
Gerald Schaefer <gerald.schaefer@linux.ibm.com> <geraldsc@de.ibm.com>
Gerald Schaefer <gerald.schaefer@linux.ibm.com> <gerald.schaefer@de.ibm.com>
@@ -289,6 +290,7 @@
John Crispin <john@phrozen.org> <blogic@openwrt.org>
John Fastabend <john.fastabend@gmail.com> <john.r.fastabend@intel.com>
John Keeping <john@keeping.me.uk> <john@metanate.com>
+John Moon <john@jmoon.dev> <quic_johmoo@quicinc.com>
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
John Stultz <johnstul@us.ibm.com>
<jon.toppins+linux@gmail.com> <jtoppins@cumulusnetworks.com>
@@ -323,6 +325,7 @@
Kenneth Westfield <quic_kwestfie@quicinc.com> <kwestfie@codeaurora.org>
Kiran Gunda <quic_kgunda@quicinc.com> <kgunda@codeaurora.org>
Kirill Tkhai <tkhai@ya.ru> <ktkhai@virtuozzo.com>
+Kishon Vijay Abraham I <kishon@kernel.org> <kishon@ti.com>
Konstantin Khlebnikov <koct9i@gmail.com> <khlebnikov@yandex-team.ru>
Konstantin Khlebnikov <koct9i@gmail.com> <k.khlebnikov@samsung.com>
Koushik <raghavendra.koushik@neterion.com>
@@ -344,6 +347,7 @@
Leon Romanovsky <leon@kernel.org> <leon@leon.nu>
Leon Romanovsky <leon@kernel.org> <leonro@mellanox.com>
Leon Romanovsky <leon@kernel.org> <leonro@nvidia.com>
+Leo Yan <leo.yan@linux.dev> <leo.yan@linaro.org>
Liam Mark <quic_lmark@quicinc.com> <lmark@codeaurora.org>
Linas Vepstas <linas@austin.ibm.com>
Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@ascom.ch>
@@ -550,6 +554,7 @@
Serge Hallyn <sergeh@kernel.org> <serge.hallyn@canonical.com>
Serge Hallyn <sergeh@kernel.org> <serue@us.ibm.com>
Seth Forshee <sforshee@kernel.org> <seth.forshee@canonical.com>
+Shakeel Butt <shakeel.butt@linux.dev> <shakeelb@google.com>
Shannon Nelson <shannon.nelson@amd.com> <snelson@pensando.io>
Shannon Nelson <shannon.nelson@amd.com> <shannon.nelson@intel.com>
Shannon Nelson <shannon.nelson@amd.com> <shannon.nelson@oracle.com>
@@ -605,6 +610,11 @@
TripleX Chung <xxx.phy@gmail.com> <zhongyu@18mail.cn>
Tsuneo Yoshioka <Tsuneo.Yoshioka@f-secure.com>
Tudor Ambarus <tudor.ambarus@linaro.org> <tudor.ambarus@microchip.com>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko.ursulin@intel.com>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko.ursulin@linux.intel.com>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko.ursulin@sophos.com>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko.ursulin@onelan.co.uk>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko@ursulin.net>
Tycho Andersen <tycho@tycho.pizza> <tycho@tycho.ws>
Tzung-Bi Shih <tzungbi@kernel.org> <tzungbi@google.com>
Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
diff --git a/CREDITS b/CREDITS
index df8d694..3c2bb55 100644
--- a/CREDITS
+++ b/CREDITS
@@ -63,6 +63,11 @@
S: Buenos Aires
S: Argentina
+NTFS FILESYSTEM
+N: Anton Altaparmakov
+E: anton@tuxera.com
+D: NTFS filesystem
+
N: Tim Alpaerts
E: tim_alpaerts@toyota-motor-europe.com
D: 802.2 class II logical link control layer,
diff --git a/Documentation/ABI/testing/sysfs-class-net-statistics b/Documentation/ABI/testing/sysfs-class-net-statistics
index 55db278..53e508c 100644
--- a/Documentation/ABI/testing/sysfs-class-net-statistics
+++ b/Documentation/ABI/testing/sysfs-class-net-statistics
@@ -1,4 +1,4 @@
-What: /sys/class/<iface>/statistics/collisions
+What: /sys/class/net/<iface>/statistics/collisions
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -6,7 +6,7 @@
Indicates the number of collisions seen by this network device.
This value might not be relevant with all MAC layers.
-What: /sys/class/<iface>/statistics/multicast
+What: /sys/class/net/<iface>/statistics/multicast
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -14,7 +14,7 @@
Indicates the number of multicast packets received by this
network device.
-What: /sys/class/<iface>/statistics/rx_bytes
+What: /sys/class/net/<iface>/statistics/rx_bytes
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -23,7 +23,7 @@
See the network driver for the exact meaning of when this
value is incremented.
-What: /sys/class/<iface>/statistics/rx_compressed
+What: /sys/class/net/<iface>/statistics/rx_compressed
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -32,7 +32,7 @@
network device. This value might only be relevant for interfaces
that support packet compression (e.g: PPP).
-What: /sys/class/<iface>/statistics/rx_crc_errors
+What: /sys/class/net/<iface>/statistics/rx_crc_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -41,7 +41,7 @@
by this network device. Note that the specific meaning might
depend on the MAC layer used by the interface.
-What: /sys/class/<iface>/statistics/rx_dropped
+What: /sys/class/net/<iface>/statistics/rx_dropped
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -51,7 +51,7 @@
packet processing. See the network driver for the exact
meaning of this value.
-What: /sys/class/<iface>/statistics/rx_errors
+What: /sys/class/net/<iface>/statistics/rx_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -59,7 +59,7 @@
Indicates the number of receive errors on this network device.
See the network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_fifo_errors
+What: /sys/class/net/<iface>/statistics/rx_fifo_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -68,7 +68,7 @@
network device. See the network driver for the exact
meaning of this value.
-What: /sys/class/<iface>/statistics/rx_frame_errors
+What: /sys/class/net/<iface>/statistics/rx_frame_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -78,7 +78,7 @@
on the MAC layer protocol used. See the network driver for
the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_length_errors
+What: /sys/class/net/<iface>/statistics/rx_length_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -87,7 +87,7 @@
error, oversized or undersized. See the network driver for the
exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_missed_errors
+What: /sys/class/net/<iface>/statistics/rx_missed_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -96,7 +96,7 @@
due to lack of capacity in the receive side. See the network
driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/rx_nohandler
+What: /sys/class/net/<iface>/statistics/rx_nohandler
Date: February 2016
KernelVersion: 4.6
Contact: netdev@vger.kernel.org
@@ -104,7 +104,7 @@
Indicates the number of received packets that were dropped on
an inactive device by the network core.
-What: /sys/class/<iface>/statistics/rx_over_errors
+What: /sys/class/net/<iface>/statistics/rx_over_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -114,7 +114,7 @@
(e.g: larger than MTU). See the network driver for the exact
meaning of this value.
-What: /sys/class/<iface>/statistics/rx_packets
+What: /sys/class/net/<iface>/statistics/rx_packets
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -122,7 +122,7 @@
Indicates the total number of good packets received by this
network device.
-What: /sys/class/<iface>/statistics/tx_aborted_errors
+What: /sys/class/net/<iface>/statistics/tx_aborted_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -132,7 +132,7 @@
a medium collision). See the network driver for the exact
meaning of this value.
-What: /sys/class/<iface>/statistics/tx_bytes
+What: /sys/class/net/<iface>/statistics/tx_bytes
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -143,7 +143,7 @@
transmitted packets or all packets that have been queued for
transmission.
-What: /sys/class/<iface>/statistics/tx_carrier_errors
+What: /sys/class/net/<iface>/statistics/tx_carrier_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -152,7 +152,7 @@
because of carrier errors (e.g: physical link down). See the
network driver for the exact meaning of this value.
-What: /sys/class/<iface>/statistics/tx_compressed
+What: /sys/class/net/<iface>/statistics/tx_compressed
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -161,7 +161,7 @@
this might only be relevant for devices that support
compression (e.g: PPP).
-What: /sys/class/<iface>/statistics/tx_dropped
+What: /sys/class/net/<iface>/statistics/tx_dropped
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -170,7 +170,7 @@
See the driver for the exact reasons as to why the packets were
dropped.
-What: /sys/class/<iface>/statistics/tx_errors
+What: /sys/class/net/<iface>/statistics/tx_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -179,7 +179,7 @@
a network device. See the driver for the exact reasons as to
why the packets were dropped.
-What: /sys/class/<iface>/statistics/tx_fifo_errors
+What: /sys/class/net/<iface>/statistics/tx_fifo_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -188,7 +188,7 @@
FIFO error. See the driver for the exact reasons as to why the
packets were dropped.
-What: /sys/class/<iface>/statistics/tx_heartbeat_errors
+What: /sys/class/net/<iface>/statistics/tx_heartbeat_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -197,7 +197,7 @@
reported as heartbeat errors. See the driver for the exact
reasons as to why the packets were dropped.
-What: /sys/class/<iface>/statistics/tx_packets
+What: /sys/class/net/<iface>/statistics/tx_packets
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
@@ -206,7 +206,7 @@
device. See the driver for whether this reports the number of all
attempted or successful transmissions.
-What: /sys/class/<iface>/statistics/tx_window_errors
+What: /sys/class/net/<iface>/statistics/tx_window_errors
Date: April 2005
KernelVersion: 2.6.12
Contact: netdev@vger.kernel.org
diff --git a/Documentation/ABI/testing/sysfs-nvmem-cells b/Documentation/ABI/testing/sysfs-nvmem-cells
index 7af70ad..c7c9444 100644
--- a/Documentation/ABI/testing/sysfs-nvmem-cells
+++ b/Documentation/ABI/testing/sysfs-nvmem-cells
@@ -4,18 +4,18 @@
Contact: Miquel Raynal <miquel.raynal@bootlin.com>
Description:
The "cells" folder contains one file per cell exposed by the
- NVMEM device. The name of the file is: <name>@<where>, with
- <name> being the cell name and <where> its location in the NVMEM
- device, in hexadecimal (without the '0x' prefix, to mimic device
- tree node names). The length of the file is the size of the cell
- (when known). The content of the file is the binary content of
- the cell (may sometimes be ASCII, likely without trailing
- character).
+ NVMEM device. The name of the file is: "<name>@<byte>,<bit>",
+ with <name> being the cell name and <where> its location in
+ the NVMEM device, in hexadecimal bytes and bits (without the
+ '0x' prefix, to mimic device tree node names). The length of
+ the file is the size of the cell (when known). The content of
+ the file is the binary content of the cell (may sometimes be
+ ASCII, likely without trailing character).
Note: This file is only present if CONFIG_NVMEM_SYSFS
is enabled.
Example::
- hexdump -C /sys/bus/nvmem/devices/1-00563/cells/product-name@d
+ hexdump -C /sys/bus/nvmem/devices/1-00563/cells/product-name@d,0
00000000 54 4e 34 38 4d 2d 50 2d 44 4e |TN48M-P-DN|
0000000a
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index e8c2ce1..45a7f49 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -243,3 +243,10 @@
+----------------+-----------------+-----------------+-----------------------------+
| ASR | ASR8601 | #8601001 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
+| Microsoft | Azure Cobalt 100| #2139208 | ARM64_ERRATUM_2139208 |
++----------------+-----------------+-----------------+-----------------------------+
+| Microsoft | Azure Cobalt 100| #2067961 | ARM64_ERRATUM_2067961 |
++----------------+-----------------+-----------------+-----------------------------+
+| Microsoft | Azure Cobalt 100| #2253138 | ARM64_ERRATUM_2253138 |
++----------------+-----------------+-----------------+-----------------------------+
diff --git a/Documentation/arch/x86/mds.rst b/Documentation/arch/x86/mds.rst
index e73fdff..c58c723 100644
--- a/Documentation/arch/x86/mds.rst
+++ b/Documentation/arch/x86/mds.rst
@@ -95,6 +95,9 @@
mds_clear_cpu_buffers()
+Also macro CLEAR_CPU_BUFFERS can be used in ASM late in exit-to-user path.
+Other than CFLAGS.ZF, this macro doesn't clobber any registers.
+
The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.
@@ -138,17 +141,30 @@
When transitioning from kernel to user space the CPU buffers are flushed
on affected CPUs when the mitigation is not disabled on the kernel
- command line. The migitation is enabled through the static key
- mds_user_clear.
+ command line. The mitigation is enabled through the feature flag
+ X86_FEATURE_CLEAR_CPU_BUF.
- The mitigation is invoked in prepare_exit_to_usermode() which covers
- all but one of the kernel to user space transitions. The exception
- is when we return from a Non Maskable Interrupt (NMI), which is
- handled directly in do_nmi().
+ The mitigation is invoked just before transitioning to userspace after
+ user registers are restored. This is done to minimize the window in
+ which kernel data could be accessed after VERW e.g. via an NMI after
+ VERW.
- (The reason that NMI is special is that prepare_exit_to_usermode() can
- enable IRQs. In NMI context, NMIs are blocked, and we don't want to
- enable IRQs with NMIs blocked.)
+ **Corner case not handled**
+ Interrupts returning to kernel don't clear CPUs buffers since the
+ exit-to-user path is expected to do that anyways. But, there could be
+ a case when an NMI is generated in kernel after the exit-to-user path
+ has cleared the buffers. This case is not handled and NMI returning to
+ kernel don't clear CPU buffers because:
+
+ 1. It is rare to get an NMI after VERW, but before returning to userspace.
+ 2. For an unprivileged user, there is no known way to make that NMI
+ less rare or target it.
+ 3. It would take a large number of these precisely-timed NMIs to mount
+ an actual attack. There's presumably not enough bandwidth.
+ 4. The NMI in question occurs after a VERW, i.e. when user state is
+ restored and most interesting data is already scrubbed. Whats left
+ is only the data that NMI touches, and that may or may not be of
+ any interest.
2. C-State transition
diff --git a/Documentation/conf.py b/Documentation/conf.py
index 5830b01c..da64c9f 100644
--- a/Documentation/conf.py
+++ b/Documentation/conf.py
@@ -388,6 +388,12 @@
verbatimhintsturnover=false,
''',
+ #
+ # Some of our authors are fond of deep nesting; tell latex to
+ # cope.
+ #
+ 'maxlistdepth': '10',
+
# For CJK One-half spacing, need to be in front of hyperref
'extrapackages': r'\usepackage{setspace}',
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index ab376b31..7f3582a 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -245,6 +245,10 @@
TEST_PROGS, TEST_GEN_PROGS mean it is the executable tested by
default.
+ TEST_GEN_MODS_DIR should be used by tests that require modules to be built
+ before the test starts. The variable will contain the name of the directory
+ containing the modules.
+
TEST_CUSTOM_PROGS should be used by tests that require custom build
rules and prevent common build rule use.
diff --git a/Documentation/devicetree/bindings/Makefile b/Documentation/devicetree/bindings/Makefile
index 2323fd5..129cf69 100644
--- a/Documentation/devicetree/bindings/Makefile
+++ b/Documentation/devicetree/bindings/Makefile
@@ -28,7 +28,10 @@
find_all_cmd = find $(srctree)/$(src) \( -name '*.yaml' ! \
-name 'processed-schema*' \)
-find_cmd = $(find_all_cmd) | sed 's|^$(srctree)/$(src)/||' | grep -F -e "$(subst :," -e ",$(DT_SCHEMA_FILES))" | sed 's|^|$(srctree)/$(src)/|'
+find_cmd = $(find_all_cmd) | \
+ sed 's|^$(srctree)/||' | \
+ grep -F -e "$(subst :," -e ",$(DT_SCHEMA_FILES))" | \
+ sed 's|^|$(srctree)/|'
CHK_DT_DOCS := $(shell $(find_cmd))
quiet_cmd_yamllint = LINT $(src)
diff --git a/Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml b/Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml
index b29ce59..9952e0e 100644
--- a/Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml
+++ b/Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml
@@ -7,7 +7,8 @@
title: Ceva AHCI SATA Controller
maintainers:
- - Piyush Mehta <piyush.mehta@amd.com>
+ - Mubin Sayyed <mubin.sayyed@amd.com>
+ - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
description: |
The Ceva SATA controller mostly conforms to the AHCI interface with some
diff --git a/Documentation/devicetree/bindings/clock/google,gs101-clock.yaml b/Documentation/devicetree/bindings/clock/google,gs101-clock.yaml
index 3eebc03..ca7fdad 100644
--- a/Documentation/devicetree/bindings/clock/google,gs101-clock.yaml
+++ b/Documentation/devicetree/bindings/clock/google,gs101-clock.yaml
@@ -85,8 +85,8 @@
clock-names:
items:
- - const: dout_cmu_misc_bus
- - const: dout_cmu_misc_sss
+ - const: bus
+ - const: sss
additionalProperties: false
diff --git a/Documentation/devicetree/bindings/display/bridge/nxp,tda998x.yaml b/Documentation/devicetree/bindings/display/bridge/nxp,tda998x.yaml
index 21d995f..b8e9cf6 100644
--- a/Documentation/devicetree/bindings/display/bridge/nxp,tda998x.yaml
+++ b/Documentation/devicetree/bindings/display/bridge/nxp,tda998x.yaml
@@ -29,19 +29,22 @@
audio-ports:
description:
- Array of 8-bit values, 2 values per DAI (Documentation/sound/soc/dai.rst).
+ Array of 2 values per DAI (Documentation/sound/soc/dai.rst).
The implementation allows one or two DAIs.
If two DAIs are defined, they must be of different type.
$ref: /schemas/types.yaml#/definitions/uint32-matrix
+ minItems: 1
+ maxItems: 2
items:
- minItems: 1
items:
- description: |
The first value defines the DAI type: TDA998x_SPDIF or TDA998x_I2S
(see include/dt-bindings/display/tda998x.h).
+ enum: [ 1, 2 ]
- description:
The second value defines the tda998x AP_ENA reg content when the
DAI in question is used.
+ maximum: 0xff
'#sound-dai-cells':
enum: [ 0, 1 ]
diff --git a/Documentation/devicetree/bindings/gpio/xlnx,zynqmp-gpio-modepin.yaml b/Documentation/devicetree/bindings/gpio/xlnx,zynqmp-gpio-modepin.yaml
index b1fd632..bb93baa 100644
--- a/Documentation/devicetree/bindings/gpio/xlnx,zynqmp-gpio-modepin.yaml
+++ b/Documentation/devicetree/bindings/gpio/xlnx,zynqmp-gpio-modepin.yaml
@@ -12,7 +12,8 @@
PS_MODE). Every pin can be configured as input/output.
maintainers:
- - Piyush Mehta <piyush.mehta@amd.com>
+ - Mubin Sayyed <mubin.sayyed@amd.com>
+ - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
properties:
compatible:
diff --git a/Documentation/devicetree/bindings/net/marvell,prestera.yaml b/Documentation/devicetree/bindings/net/marvell,prestera.yaml
index 5ea8b73..16ff892 100644
--- a/Documentation/devicetree/bindings/net/marvell,prestera.yaml
+++ b/Documentation/devicetree/bindings/net/marvell,prestera.yaml
@@ -78,8 +78,8 @@
pcie@0 {
#address-cells = <3>;
#size-cells = <2>;
- ranges = <0x0 0x0 0x0 0x0 0x0 0x0>;
- reg = <0x0 0x0 0x0 0x0 0x0 0x0>;
+ ranges = <0x02000000 0x0 0x100000 0x10000000 0x0 0x0>;
+ reg = <0x0 0x1000>;
device_type = "pci";
switch@0,0 {
diff --git a/Documentation/devicetree/bindings/net/renesas,ethertsn.yaml b/Documentation/devicetree/bindings/net/renesas,ethertsn.yaml
index 475aff7..ea35d19 100644
--- a/Documentation/devicetree/bindings/net/renesas,ethertsn.yaml
+++ b/Documentation/devicetree/bindings/net/renesas,ethertsn.yaml
@@ -65,9 +65,11 @@
rx-internal-delay-ps:
enum: [0, 1800]
+ default: 0
tx-internal-delay-ps:
enum: [0, 2000]
+ default: 0
'#address-cells':
const: 1
diff --git a/Documentation/devicetree/bindings/reset/xlnx,zynqmp-reset.yaml b/Documentation/devicetree/bindings/reset/xlnx,zynqmp-reset.yaml
index 49db668..1f1b42d 100644
--- a/Documentation/devicetree/bindings/reset/xlnx,zynqmp-reset.yaml
+++ b/Documentation/devicetree/bindings/reset/xlnx,zynqmp-reset.yaml
@@ -7,7 +7,8 @@
title: Zynq UltraScale+ MPSoC and Versal reset
maintainers:
- - Piyush Mehta <piyush.mehta@amd.com>
+ - Mubin Sayyed <mubin.sayyed@amd.com>
+ - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
description: |
The Zynq UltraScale+ MPSoC and Versal has several different resets.
diff --git a/Documentation/devicetree/bindings/sound/google,sc7280-herobrine.yaml b/Documentation/devicetree/bindings/sound/google,sc7280-herobrine.yaml
index ec4b6e5..cdcd7c6 100644
--- a/Documentation/devicetree/bindings/sound/google,sc7280-herobrine.yaml
+++ b/Documentation/devicetree/bindings/sound/google,sc7280-herobrine.yaml
@@ -7,7 +7,6 @@
title: Google SC7280-Herobrine ASoC sound card driver
maintainers:
- - Srinivasa Rao Mandadapu <srivasam@codeaurora.org>
- Judy Hsiao <judyhsiao@chromium.org>
description:
diff --git a/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-max9808x.yaml b/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-max9808x.yaml
index c29d794..241d20f 100644
--- a/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-max9808x.yaml
+++ b/Documentation/devicetree/bindings/sound/nvidia,tegra-audio-max9808x.yaml
@@ -64,7 +64,7 @@
#include <dt-bindings/clock/tegra30-car.h>
#include <dt-bindings/soc/tegra-pmc.h>
sound {
- compatible = "lge,tegra-audio-max98089-p895",
+ compatible = "lg,tegra-audio-max98089-p895",
"nvidia,tegra-audio-max98089";
nvidia,model = "LG Optimus Vu MAX98089";
diff --git a/Documentation/devicetree/bindings/tpm/tpm-common.yaml b/Documentation/devicetree/bindings/tpm/tpm-common.yaml
index 9039062..3c1241b 100644
--- a/Documentation/devicetree/bindings/tpm/tpm-common.yaml
+++ b/Documentation/devicetree/bindings/tpm/tpm-common.yaml
@@ -42,7 +42,7 @@
resets:
description: Reset controller to reset the TPM
- $ref: /schemas/types.yaml#/definitions/phandle
+ maxItems: 1
reset-gpios:
description: Output GPIO pin to reset the TPM
diff --git a/Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml b/Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml
index 88cc1e3..b2b509b 100644
--- a/Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml
+++ b/Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml
@@ -55,9 +55,12 @@
samsung,sysreg:
$ref: /schemas/types.yaml#/definitions/phandle-array
- description: Should be phandle/offset pair. The phandle to the syscon node
- which indicates the FSYSx sysreg interface and the offset of
- the control register for UFS io coherency setting.
+ items:
+ - items:
+ - description: phandle to FSYSx sysreg node
+ - description: offset of the control register for UFS io coherency setting
+ description:
+ Phandle and offset to the FSYSx sysreg for UFS io coherency setting.
dma-coherent: true
diff --git a/Documentation/devicetree/bindings/usb/dwc3-xilinx.yaml b/Documentation/devicetree/bindings/usb/dwc3-xilinx.yaml
index bb373eb..00f87a5 100644
--- a/Documentation/devicetree/bindings/usb/dwc3-xilinx.yaml
+++ b/Documentation/devicetree/bindings/usb/dwc3-xilinx.yaml
@@ -7,7 +7,8 @@
title: Xilinx SuperSpeed DWC3 USB SoC controller
maintainers:
- - Piyush Mehta <piyush.mehta@amd.com>
+ - Mubin Sayyed <mubin.sayyed@amd.com>
+ - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
properties:
compatible:
diff --git a/Documentation/devicetree/bindings/usb/microchip,usb5744.yaml b/Documentation/devicetree/bindings/usb/microchip,usb5744.yaml
index 6d4cfd9..445183d 100644
--- a/Documentation/devicetree/bindings/usb/microchip,usb5744.yaml
+++ b/Documentation/devicetree/bindings/usb/microchip,usb5744.yaml
@@ -16,8 +16,9 @@
USB 2.0 traffic.
maintainers:
- - Piyush Mehta <piyush.mehta@amd.com>
- Michal Simek <michal.simek@amd.com>
+ - Mubin Sayyed <mubin.sayyed@amd.com>
+ - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
properties:
compatible:
diff --git a/Documentation/devicetree/bindings/usb/xlnx,usb2.yaml b/Documentation/devicetree/bindings/usb/xlnx,usb2.yaml
index 868dffe..a7f75fe 100644
--- a/Documentation/devicetree/bindings/usb/xlnx,usb2.yaml
+++ b/Documentation/devicetree/bindings/usb/xlnx,usb2.yaml
@@ -7,7 +7,8 @@
title: Xilinx udc controller
maintainers:
- - Piyush Mehta <piyush.mehta@amd.com>
+ - Mubin Sayyed <mubin.sayyed@amd.com>
+ - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
properties:
compatible:
diff --git a/Documentation/driver-api/dpll.rst b/Documentation/driver-api/dpll.rst
index e3d5938..ea8d166 100644
--- a/Documentation/driver-api/dpll.rst
+++ b/Documentation/driver-api/dpll.rst
@@ -545,7 +545,7 @@
to drive dpll with signal recovered from the PHY netdevice.
This is done by exposing a pin to the netdevice - attaching pin to the
netdevice itself with
-``netdev_dpll_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)``.
+``dpll_netdev_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)``.
Exposed pin id handle ``DPLL_A_PIN_ID`` is then identifiable by the user
as it is attached to rtnetlink respond to get ``RTM_NEWLINK`` command in
nested attribute ``IFLA_DPLL_PIN``.
diff --git a/Documentation/filesystems/files.rst b/Documentation/filesystems/files.rst
index 9e38e4c..eb770f8 100644
--- a/Documentation/filesystems/files.rst
+++ b/Documentation/filesystems/files.rst
@@ -116,7 +116,7 @@
in get_file_rcu() and __files_get_rcu().
In addition, it isn't possible to access or check fields in struct file
-without first aqcuiring a reference on it under rcu lookup. Not doing
+without first acquiring a reference on it under rcu lookup. Not doing
that was always very dodgy and it was only usable for non-pointer data
in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers
either first acquire a reference or they must hold the files_lock of the
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index e18bc5a..0ea1e44 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -98,7 +98,6 @@
isofs
nilfs2
nfs/index
- ntfs
ntfs3
ocfs2
ocfs2-online-filecheck
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index d5bf4b6..e664061 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -29,7 +29,7 @@
char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
struct vfsmount *(*d_automount)(struct path *path);
int (*d_manage)(const struct path *, bool);
- struct dentry *(*d_real)(struct dentry *, const struct inode *);
+ struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
locking rules:
diff --git a/Documentation/filesystems/ntfs.rst b/Documentation/filesystems/ntfs.rst
deleted file mode 100644
index 5bb093a..0000000
--- a/Documentation/filesystems/ntfs.rst
+++ /dev/null
@@ -1,466 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-================================
-The Linux NTFS filesystem driver
-================================
-
-
-.. Table of contents
-
- - Overview
- - Web site
- - Features
- - Supported mount options
- - Known bugs and (mis-)features
- - Using NTFS volume and stripe sets
- - The Device-Mapper driver
- - The Software RAID / MD driver
- - Limitations when using the MD driver
-
-
-Overview
-========
-
-Linux-NTFS comes with a number of user-space programs known as ntfsprogs.
-These include mkntfs, a full-featured ntfs filesystem format utility,
-ntfsundelete used for recovering files that were unintentionally deleted
-from an NTFS volume and ntfsresize which is used to resize an NTFS partition.
-See the web site for more information.
-
-To mount an NTFS 1.2/3.x (Windows NT4/2000/XP/2003) volume, use the file
-system type 'ntfs'. The driver currently supports read-only mode (with no
-fault-tolerance, encryption or journalling) and very limited, but safe, write
-support.
-
-For fault tolerance and raid support (i.e. volume and stripe sets), you can
-use the kernel's Software RAID / MD driver. See section "Using Software RAID
-with NTFS" for details.
-
-
-Web site
-========
-
-There is plenty of additional information on the linux-ntfs web site
-at http://www.linux-ntfs.org/
-
-The web site has a lot of additional information, such as a comprehensive
-FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS
-userspace utilities, etc.
-
-
-Features
-========
-
-- This is a complete rewrite of the NTFS driver that used to be in the 2.4 and
- earlier kernels. This new driver implements NTFS read support and is
- functionally equivalent to the old ntfs driver and it also implements limited
- write support. The biggest limitation at present is that files/directories
- cannot be created or deleted. See below for the list of write features that
- are so far supported. Another limitation is that writing to compressed files
- is not implemented at all. Also, neither read nor write access to encrypted
- files is so far implemented.
-- The new driver has full support for sparse files on NTFS 3.x volumes which
- the old driver isn't happy with.
-- The new driver supports execution of binaries due to mmap() now being
- supported.
-- The new driver supports loopback mounting of files on NTFS which is used by
- some Linux distributions to enable the user to run Linux from an NTFS
- partition by creating a large file while in Windows and then loopback
- mounting the file while in Linux and creating a Linux filesystem on it that
- is used to install Linux on it.
-- A comparison of the two drivers using::
-
- time find . -type f -exec md5sum "{}" \;
-
- run three times in sequence with each driver (after a reboot) on a 1.4GiB
- NTFS partition, showed the new driver to be 20% faster in total time elapsed
- (from 9:43 minutes on average down to 7:53). The time spent in user space
- was unchanged but the time spent in the kernel was decreased by a factor of
- 2.5 (from 85 CPU seconds down to 33).
-- The driver does not support short file names in general. For backwards
- compatibility, we implement access to files using their short file names if
- they exist. The driver will not create short file names however, and a
- rename will discard any existing short file name.
-- The new driver supports exporting of mounted NTFS volumes via NFS.
-- The new driver supports async io (aio).
-- The new driver supports fsync(2), fdatasync(2), and msync(2).
-- The new driver supports readv(2) and writev(2).
-- The new driver supports access time updates (including mtime and ctime).
-- The new driver supports truncate(2) and open(2) with O_TRUNC. But at present
- only very limited support for highly fragmented files, i.e. ones which have
- their data attribute split across multiple extents, is included. Another
- limitation is that at present truncate(2) will never create sparse files,
- since to mark a file sparse we need to modify the directory entry for the
- file and we do not implement directory modifications yet.
-- The new driver supports write(2) which can both overwrite existing data and
- extend the file size so that you can write beyond the existing data. Also,
- writing into sparse regions is supported and the holes are filled in with
- clusters. But at present only limited support for highly fragmented files,
- i.e. ones which have their data attribute split across multiple extents, is
- included. Another limitation is that write(2) will never create sparse
- files, since to mark a file sparse we need to modify the directory entry for
- the file and we do not implement directory modifications yet.
-
-Supported mount options
-=======================
-
-In addition to the generic mount options described by the manual page for the
-mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the
-following mount options:
-
-======================= =======================================================
-iocharset=name Deprecated option. Still supported but please use
- nls=name in the future. See description for nls=name.
-
-nls=name Character set to use when returning file names.
- Unlike VFAT, NTFS suppresses names that contain
- unconvertible characters. Note that most character
- sets contain insufficient characters to represent all
- possible Unicode characters that can exist on NTFS.
- To be sure you are not missing any files, you are
- advised to use nls=utf8 which is capable of
- representing all Unicode characters.
-
-utf8=<bool> Option no longer supported. Currently mapped to
- nls=utf8 but please use nls=utf8 in the future and
- make sure utf8 is compiled either as module or into
- the kernel. See description for nls=name.
-
-uid=
-gid=
-umask= Provide default owner, group, and access mode mask.
- These options work as documented in mount(8). By
- default, the files/directories are owned by root and
- he/she has read and write permissions, as well as
- browse permission for directories. No one else has any
- access permissions. I.e. the mode on all files is by
- default rw------- and for directories rwx------, a
- consequence of the default fmask=0177 and dmask=0077.
- Using a umask of zero will grant all permissions to
- everyone, i.e. all files and directories will have mode
- rwxrwxrwx.
-
-fmask=
-dmask= Instead of specifying umask which applies both to
- files and directories, fmask applies only to files and
- dmask only to directories.
-
-sloppy=<BOOL> If sloppy is specified, ignore unknown mount options.
- Otherwise the default behaviour is to abort mount if
- any unknown options are found.
-
-show_sys_files=<BOOL> If show_sys_files is specified, show the system files
- in directory listings. Otherwise the default behaviour
- is to hide the system files.
- Note that even when show_sys_files is specified, "$MFT"
- will not be visible due to bugs/mis-features in glibc.
- Further, note that irrespective of show_sys_files, all
- files are accessible by name, i.e. you can always do
- "ls -l \$UpCase" for example to specifically show the
- system file containing the Unicode upcase table.
-
-case_sensitive=<BOOL> If case_sensitive is specified, treat all file names as
- case sensitive and create file names in the POSIX
- namespace. Otherwise the default behaviour is to treat
- file names as case insensitive and to create file names
- in the WIN32/LONG name space. Note, the Linux NTFS
- driver will never create short file names and will
- remove them on rename/delete of the corresponding long
- file name.
- Note that files remain accessible via their short file
- name, if it exists. If case_sensitive, you will need
- to provide the correct case of the short file name.
-
-disable_sparse=<BOOL> If disable_sparse is specified, creation of sparse
- regions, i.e. holes, inside files is disabled for the
- volume (for the duration of this mount only). By
- default, creation of sparse regions is enabled, which
- is consistent with the behaviour of traditional Unix
- filesystems.
-
-errors=opt What to do when critical filesystem errors are found.
- Following values can be used for "opt":
-
- ======== =========================================
- continue DEFAULT, try to clean-up as much as
- possible, e.g. marking a corrupt inode as
- bad so it is no longer accessed, and then
- continue.
- recover At present only supported is recovery of
- the boot sector from the backup copy.
- If read-only mount, the recovery is done
- in memory only and not written to disk.
- ======== =========================================
-
- Note that the options are additive, i.e. specifying::
-
- errors=continue,errors=recover
-
- means the driver will attempt to recover and if that
- fails it will clean-up as much as possible and
- continue.
-
-mft_zone_multiplier= Set the MFT zone multiplier for the volume (this
- setting is not persistent across mounts and can be
- changed from mount to mount but cannot be changed on
- remount). Values of 1 to 4 are allowed, 1 being the
- default. The MFT zone multiplier determines how much
- space is reserved for the MFT on the volume. If all
- other space is used up, then the MFT zone will be
- shrunk dynamically, so this has no impact on the
- amount of free space. However, it can have an impact
- on performance by affecting fragmentation of the MFT.
- In general use the default. If you have a lot of small
- files then use a higher value. The values have the
- following meaning:
-
- ===== =================================
- Value MFT zone size (% of volume size)
- ===== =================================
- 1 12.5%
- 2 25%
- 3 37.5%
- 4 50%
- ===== =================================
-
- Note this option is irrelevant for read-only mounts.
-======================= =======================================================
-
-
-Known bugs and (mis-)features
-=============================
-
-- The link count on each directory inode entry is set to 1, due to Linux not
- supporting directory hard links. This may well confuse some user space
- applications, since the directory names will have the same inode numbers.
- This also speeds up ntfs_read_inode() immensely. And we haven't found any
- problems with this approach so far. If you find a problem with this, please
- let us know.
-
-
-Please send bug reports/comments/feedback/abuse to the Linux-NTFS development
-list at sourceforge: linux-ntfs-dev@lists.sourceforge.net
-
-
-Using NTFS volume and stripe sets
-=================================
-
-For support of volume and stripe sets, you can either use the kernel's
-Device-Mapper driver or the kernel's Software RAID / MD driver. The former is
-the recommended one to use for linear raid. But the latter is required for
-raid level 5. For striping and mirroring, either driver should work fine.
-
-
-The Device-Mapper driver
-------------------------
-
-You will need to create a table of the components of the volume/stripe set and
-how they fit together and load this into the kernel using the dmsetup utility
-(see man 8 dmsetup).
-
-Linear volume sets, i.e. linear raid, has been tested and works fine. Even
-though untested, there is no reason why stripe sets, i.e. raid level 0, and
-mirrors, i.e. raid level 1 should not work, too. Stripes with parity, i.e.
-raid level 5, unfortunately cannot work yet because the current version of the
-Device-Mapper driver does not support raid level 5. You may be able to use the
-Software RAID / MD driver for raid level 5, see the next section for details.
-
-To create the table describing your volume you will need to know each of its
-components and their sizes in sectors, i.e. multiples of 512-byte blocks.
-
-For NT4 fault tolerant volumes you can obtain the sizes using fdisk. So for
-example if one of your partitions is /dev/hda2 you would do::
-
- $ fdisk -ul /dev/hda
-
- Disk /dev/hda: 81.9 GB, 81964302336 bytes
- 255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
- Units = sectors of 1 * 512 = 512 bytes
-
- Device Boot Start End Blocks Id System
- /dev/hda1 * 63 4209029 2104483+ 83 Linux
- /dev/hda2 4209030 37768814 16779892+ 86 NTFS
- /dev/hda3 37768815 46170809 4200997+ 83 Linux
-
-And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
-33559785 sectors.
-
-For Win2k and later dynamic disks, you can for example use the ldminfo utility
-which is part of the Linux LDM tools (the latest version at the time of
-writing is linux-ldm-0.0.8.tar.bz2). You can download it from:
-
- http://www.linux-ntfs.org/
-
-Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
-into it (cd linux-ldm-0.0.8) and change to the test directory (cd test). You
-will find the precompiled (i386) ldminfo utility there. NOTE: You will not be
-able to compile this yourself easily so use the binary version!
-
-Then you would use ldminfo in dump mode to obtain the necessary information::
-
- $ ./ldminfo --dump /dev/hda
-
-This would dump the LDM database found on /dev/hda which describes all of your
-dynamic disks and all the volumes on them. At the bottom you will see the
-VOLUME DEFINITIONS section which is all you really need. You may need to look
-further above to determine which of the disks in the volume definitions is
-which device in Linux. Hint: Run ldminfo on each of your dynamic disks and
-look at the Disk Id close to the top of the output for each (the PRIVATE HEADER
-section). You can then find these Disk Ids in the VBLK DATABASE section in the
-<Disk> components where you will get the LDM Name for the disk that is found in
-the VOLUME DEFINITIONS section.
-
-Note you will also need to enable the LDM driver in the Linux kernel. If your
-distribution did not enable it, you will need to recompile the kernel with it
-enabled. This will create the LDM partitions on each device at boot time. You
-would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3, etc)
-in the Device-Mapper table.
-
-You can also bypass using the LDM driver by using the main device (e.g.
-/dev/hda) and then using the offsets of the LDM partitions into this device as
-the "Start sector of device" when creating the table. Once again ldminfo would
-give you the correct information to do this.
-
-Assuming you know all your devices and their sizes things are easy.
-
-For a linear raid the table would look like this (note all values are in
-512-byte sectors)::
-
- # Offset into Size of this Raid type Device Start sector
- # volume device of device
- 0 1028161 linear /dev/hda1 0
- 1028161 3903762 linear /dev/hdb2 0
- 4931923 2103211 linear /dev/hdc1 0
-
-For a striped volume, i.e. raid level 0, you will need to know the chunk size
-you used when creating the volume. Windows uses 64kiB as the default, so it
-will probably be this unless you changes the defaults when creating the array.
-
-For a raid level 0 the table would look like this (note all values are in
-512-byte sectors)::
-
- # Offset Size Raid Number Chunk 1st Start 2nd Start
- # into of the type of size Device in Device in
- # volume volume stripes device device
- 0 2056320 striped 2 128 /dev/hda1 0 /dev/hdb1 0
-
-If there are more than two devices, just add each of them to the end of the
-line.
-
-Finally, for a mirrored volume, i.e. raid level 1, the table would look like
-this (note all values are in 512-byte sectors)::
-
- # Ofs Size Raid Log Number Region Should Number Source Start Target Start
- # in of the type type of log size sync? of Device in Device in
- # vol volume params mirrors Device Device
- 0 2056320 mirror core 2 16 nosync 2 /dev/hda1 0 /dev/hdb1 0
-
-If you are mirroring to multiple devices you can specify further targets at the
-end of the line.
-
-Note the "Should sync?" parameter "nosync" means that the two mirrors are
-already in sync which will be the case on a clean shutdown of Windows. If the
-mirrors are not clean, you can specify the "sync" option instead of "nosync"
-and the Device-Mapper driver will then copy the entirety of the "Source Device"
-to the "Target Device" or if you specified multiple target devices to all of
-them.
-
-Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
-and hand it over to dmsetup to work with, like so::
-
- $ dmsetup create myvolume1 /etc/ntfsvolume1
-
-You can obviously replace "myvolume1" with whatever name you like.
-
-If it all worked, you will now have the device /dev/device-mapper/myvolume1
-which you can then just use as an argument to the mount command as usual to
-mount the ntfs volume. For example::
-
- $ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
-
-(You need to create the directory /mnt/myvol1 first and of course you can use
-anything you like instead of /mnt/myvol1 as long as it is an existing
-directory.)
-
-It is advisable to do the mount read-only to see if the volume has been setup
-correctly to avoid the possibility of causing damage to the data on the ntfs
-volume.
-
-
-The Software RAID / MD driver
------------------------------
-
-An alternative to using the Device-Mapper driver is to use the kernel's
-Software RAID / MD driver. For which you need to set up your /etc/raidtab
-appropriately (see man 5 raidtab).
-
-Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. raid level
-0, have been tested and work fine (though see section "Limitations when using
-the MD driver with NTFS volumes" especially if you want to use linear raid).
-Even though untested, there is no reason why mirrors, i.e. raid level 1, and
-stripes with parity, i.e. raid level 5, should not work, too.
-
-You have to use the "persistent-superblock 0" option for each raid-disk in the
-NTFS volume/stripe you are configuring in /etc/raidtab as the persistent
-superblock used by the MD driver would damage the NTFS volume.
-
-Windows by default uses a stripe chunk size of 64k, so you probably want the
-"chunk-size 64k" option for each raid-disk, too.
-
-For example, if you have a stripe set consisting of two partitions /dev/hda5
-and /dev/hdb1 your /etc/raidtab would look like this::
-
- raiddev /dev/md0
- raid-level 0
- nr-raid-disks 2
- nr-spare-disks 0
- persistent-superblock 0
- chunk-size 64k
- device /dev/hda5
- raid-disk 0
- device /dev/hdb1
- raid-disk 1
-
-For linear raid, just change the raid-level above to "raid-level linear", for
-mirrors, change it to "raid-level 1", and for stripe sets with parity, change
-it to "raid-level 5".
-
-Note for stripe sets with parity you will also need to tell the MD driver
-which parity algorithm to use by specifying the option "parity-algorithm
-which", where you need to replace "which" with the name of the algorithm to
-use (see man 5 raidtab for available algorithms) and you will have to try the
-different available algorithms until you find one that works. Make sure you
-are working read-only when playing with this as you may damage your data
-otherwise. If you find which algorithm works please let us know (email the
-linux-ntfs developers list linux-ntfs-dev@lists.sourceforge.net or drop in on
-IRC in channel #ntfs on the irc.freenode.net network) so we can update this
-documentation.
-
-Once the raidtab is setup, run for example raid0run -a to start all devices or
-raid0run /dev/md0 to start a particular md device, in this case /dev/md0.
-
-Then just use the mount command as usual to mount the ntfs volume using for
-example::
-
- mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
-
-It is advisable to do the mount read-only to see if the md volume has been
-setup correctly to avoid the possibility of causing damage to the data on the
-ntfs volume.
-
-
-Limitations when using the Software RAID / MD driver
------------------------------------------------------
-
-Using the md driver will not work properly if any of your NTFS partitions have
-an odd number of sectors. This is especially important for linear raid as all
-data after the first partition with an odd number of sectors will be offset by
-one or more sectors so if you mount such a partition with write support you
-will cause massive damage to the data on the volume which will only become
-apparent when you try to use the volume again under Windows.
-
-So when using linear raid, make sure that all your partitions have an even
-number of sectors BEFORE attempting to use it. You have been warned!
-
-Even better is to simply use the Device-Mapper for linear raid and then you do
-not have this problem with odd numbers of sectors.
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index eebcc0f..6e903a9 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1264,7 +1264,7 @@
char *(*d_dname)(struct dentry *, char *, int);
struct vfsmount *(*d_automount)(struct path *);
int (*d_manage)(const struct path *, bool);
- struct dentry *(*d_real)(struct dentry *, const struct inode *);
+ struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
};
``d_revalidate``
@@ -1419,16 +1419,14 @@
the dentry being transited from.
``d_real``
- overlay/union type filesystems implement this method to return
- one of the underlying dentries hidden by the overlay. It is
- used in two different modes:
+ overlay/union type filesystems implement this method to return one
+ of the underlying dentries of a regular file hidden by the overlay.
- Called from file_dentry() it returns the real dentry matching
- the inode argument. The real dentry may be from a lower layer
- already copied up, but still referenced from the file. This
- mode is selected with a non-NULL inode argument.
+ The 'type' argument takes the values D_REAL_DATA or D_REAL_METADATA
+ for returning the real underlying dentry that refers to the inode
+ hosting the file's data or metadata respectively.
- With NULL inode the topmost real underlying dentry is returned.
+ For non-regular files, the 'dentry' argument is returned.
Each dentry has a pointer to its parent dentry, as well as a hash list
of child dentries. Child dentries are basically like files in a
diff --git a/Documentation/kbuild/Kconfig.recursion-issue-01 b/Documentation/kbuild/Kconfig.recursion-issue-01
index e8877db..ac49836 100644
--- a/Documentation/kbuild/Kconfig.recursion-issue-01
+++ b/Documentation/kbuild/Kconfig.recursion-issue-01
@@ -16,13 +16,13 @@
# that are possible for CORE. So for example if CORE_BELL_A_ADVANCED is 'y',
# CORE must be 'y' too.
#
-# * What influences CORE_BELL_A_ADVANCED ?
+# * What influences CORE_BELL_A_ADVANCED?
#
# As the name implies CORE_BELL_A_ADVANCED is an advanced feature of
# CORE_BELL_A so naturally it depends on CORE_BELL_A. So if CORE_BELL_A is 'y'
# we know CORE_BELL_A_ADVANCED can be 'y' too.
#
-# * What influences CORE_BELL_A ?
+# * What influences CORE_BELL_A?
#
# CORE_BELL_A depends on CORE, so CORE influences CORE_BELL_A.
#
@@ -34,7 +34,7 @@
# the "recursive dependency detected" error.
#
# Reading the Documentation/kbuild/Kconfig.recursion-issue-01 file it may be
-# obvious that an easy to solution to this problem should just be the removal
+# obvious that an easy solution to this problem should just be the removal
# of the "select CORE" from CORE_BELL_A_ADVANCED as that is implicit already
# since CORE_BELL_A depends on CORE. Recursive dependency issues are not always
# so trivial to resolve, we provide another example below of practical
diff --git a/Documentation/netlink/specs/dpll.yaml b/Documentation/netlink/specs/dpll.yaml
index b14aed1..3dcc9ec 100644
--- a/Documentation/netlink/specs/dpll.yaml
+++ b/Documentation/netlink/specs/dpll.yaml
@@ -384,8 +384,6 @@
- type
dump:
- pre: dpll-lock-dumpit
- post: dpll-unlock-dumpit
reply: *dev-attrs
-
@@ -473,8 +471,6 @@
- fractional-frequency-offset
dump:
- pre: dpll-lock-dumpit
- post: dpll-unlock-dumpit
request:
attributes:
- id
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index e33ad24..562f46b 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -126,7 +126,7 @@
`devlink port function set roce` command.
Users may also set the function as migratable using
-'devlink port function set migratable' command.
+`devlink port function set migratable` command.
Users may also set the IPsec crypto capability of the function using
`devlink port function set ipsec_crypto` command.
diff --git a/Documentation/networking/net_cachelines/inet_sock.rst b/Documentation/networking/net_cachelines/inet_sock.rst
index a2babd0..595d7ef 100644
--- a/Documentation/networking/net_cachelines/inet_sock.rst
+++ b/Documentation/networking/net_cachelines/inet_sock.rst
@@ -1,9 +1,9 @@
.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2023 Google LLC
-=====================================================
-inet_connection_sock struct fast path usage breakdown
-=====================================================
+==========================================
+inet_sock struct fast path usage breakdown
+==========================================
Type Name fastpath_tx_access fastpath_rx_access comment
..struct ..inet_sock
diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
index e75a535..dceb49d 100644
--- a/Documentation/networking/net_cachelines/net_device.rst
+++ b/Documentation/networking/net_cachelines/net_device.rst
@@ -136,8 +136,8 @@
possible_net_t nd_net - read_mostly (dev_net)napi_busy_loop,tcp_v(4/6)_rcv,ip(v6)_rcv,ip(6)_input,ip(6)_input_finish
void* ml_priv
enum_netdev_ml_priv_type ml_priv_type
-struct_pcpu_lstats__percpu* lstats
-struct_pcpu_sw_netstats__percpu* tstats
+struct_pcpu_lstats__percpu* lstats read_mostly dev_lstats_add()
+struct_pcpu_sw_netstats__percpu* tstats read_mostly dev_sw_netstats_tx_add()
struct_pcpu_dstats__percpu* dstats
struct_garp_port* garp_port
struct_mrp_port* mrp_port
diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index 97d7a5c..1c154cb 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -38,13 +38,13 @@
u32 mss_cache read_mostly read_mostly tcp_rate_check_app_limited,tcp_current_mss,tcp_sync_mss,tcp_sndbuf_expand,tcp_tso_should_defer(tx);tcp_update_pacing_rate,tcp_clean_rtx_queue(rx)
u32 window_clamp read_mostly read_write tcp_rcv_space_adjust,__tcp_select_window
u32 rcv_ssthresh read_mostly - __tcp_select_window
-u82 scaling_ratio
+u8 scaling_ratio read_mostly read_mostly tcp_win_from_space
struct tcp_rack
u16 advmss - read_mostly tcp_rcv_space_adjust
u8 compressed_ack
u8:2 dup_ack_counter
u8:1 tlp_retrans
-u8:1 tcp_usec_ts
+u8:1 tcp_usec_ts read_mostly read_mostly
u32 chrono_start read_write - tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data)
u32[3] chrono_stat read_write - tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data)
u8:2 chrono_type read_write - tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data)
diff --git a/Documentation/process/cve.rst b/Documentation/process/cve.rst
new file mode 100644
index 0000000..5e2753e
--- /dev/null
+++ b/Documentation/process/cve.rst
@@ -0,0 +1,121 @@
+====
+CVEs
+====
+
+Common Vulnerabilities and Exposure (CVE®) numbers were developed as an
+unambiguous way to identify, define, and catalog publicly disclosed
+security vulnerabilities. Over time, their usefulness has declined with
+regards to the kernel project, and CVE numbers were very often assigned
+in inappropriate ways and for inappropriate reasons. Because of this,
+the kernel development community has tended to avoid them. However, the
+combination of continuing pressure to assign CVEs and other forms of
+security identifiers, and ongoing abuses by individuals and companies
+outside of the kernel community has made it clear that the kernel
+community should have control over those assignments.
+
+The Linux kernel developer team does have the ability to assign CVEs for
+potential Linux kernel security issues. This assignment is independent
+of the :doc:`normal Linux kernel security bug reporting
+process<../process/security-bugs>`.
+
+A list of all assigned CVEs for the Linux kernel can be found in the
+archives of the linux-cve mailing list, as seen on
+https://lore.kernel.org/linux-cve-announce/. To get notice of the
+assigned CVEs, please `subscribe
+<https://subspace.kernel.org/subscribing.html>`_ to that mailing list.
+
+Process
+=======
+
+As part of the normal stable release process, kernel changes that are
+potentially security issues are identified by the developers responsible
+for CVE number assignments and have CVE numbers automatically assigned
+to them. These assignments are published on the linux-cve-announce
+mailing list as announcements on a frequent basis.
+
+Note, due to the layer at which the Linux kernel is in a system, almost
+any bug might be exploitable to compromise the security of the kernel,
+but the possibility of exploitation is often not evident when the bug is
+fixed. Because of this, the CVE assignment team is overly cautious and
+assign CVE numbers to any bugfix that they identify. This
+explains the seemingly large number of CVEs that are issued by the Linux
+kernel team.
+
+If the CVE assignment team misses a specific fix that any user feels
+should have a CVE assigned to it, please email them at <cve@kernel.org>
+and the team there will work with you on it. Note that no potential
+security issues should be sent to this alias, it is ONLY for assignment
+of CVEs for fixes that are already in released kernel trees. If you
+feel you have found an unfixed security issue, please follow the
+:doc:`normal Linux kernel security bug reporting
+process<../process/security-bugs>`.
+
+No CVEs will be automatically assigned for unfixed security issues in
+the Linux kernel; assignment will only automatically happen after a fix
+is available and applied to a stable kernel tree, and it will be tracked
+that way by the git commit id of the original fix. If anyone wishes to
+have a CVE assigned before an issue is resolved with a commit, please
+contact the kernel CVE assignment team at <cve@kernel.org> to get an
+identifier assigned from their batch of reserved identifiers.
+
+No CVEs will be assigned for any issue found in a version of the kernel
+that is not currently being actively supported by the Stable/LTS kernel
+team. A list of the currently supported kernel branches can be found at
+https://kernel.org/releases.html
+
+Disputes of assigned CVEs
+=========================
+
+The authority to dispute or modify an assigned CVE for a specific kernel
+change lies solely with the maintainers of the relevant subsystem
+affected. This principle ensures a high degree of accuracy and
+accountability in vulnerability reporting. Only those individuals with
+deep expertise and intimate knowledge of the subsystem can effectively
+assess the validity and scope of a reported vulnerability and determine
+its appropriate CVE designation. Any attempt to modify or dispute a CVE
+outside of this designated authority could lead to confusion, inaccurate
+reporting, and ultimately, compromised systems.
+
+Invalid CVEs
+============
+
+If a security issue is found in a Linux kernel that is only supported by
+a Linux distribution due to the changes that have been made by that
+distribution, or due to the distribution supporting a kernel version
+that is no longer one of the kernel.org supported releases, then a CVE
+can not be assigned by the Linux kernel CVE team, and must be asked for
+from that Linux distribution itself.
+
+Any CVE that is assigned against the Linux kernel for an actively
+supported kernel version, by any group other than the kernel assignment
+CVE team should not be treated as a valid CVE. Please notify the
+kernel CVE assignment team at <cve@kernel.org> so that they can work to
+invalidate such entries through the CNA remediation process.
+
+Applicability of specific CVEs
+==============================
+
+As the Linux kernel can be used in many different ways, with many
+different ways of accessing it by external users, or no access at all,
+the applicability of any specific CVE is up to the user of Linux to
+determine, it is not up to the CVE assignment team. Please do not
+contact us to attempt to determine the applicability of any specific
+CVE.
+
+Also, as the source tree is so large, and any one system only uses a
+small subset of the source tree, any users of Linux should be aware that
+large numbers of assigned CVEs are not relevant for their systems.
+
+In short, we do not know your use case, and we do not know what portions
+of the kernel that you use, so there is no way for us to determine if a
+specific CVE is relevant for your system.
+
+As always, it is best to take all released kernel changes, as they are
+tested together in a unified whole by many community members, and not as
+individual cherry-picked changes. Also note that for many bugs, the
+solution to the overall problem is not found in a single change, but by
+the sum of many fixes on top of each other. Ideally CVEs will be
+assigned to all fixes for all issues, but sometimes we will fail to
+notice fixes, therefore assume that some changes without a CVE assigned
+might be relevant to take.
+
diff --git a/Documentation/process/index.rst b/Documentation/process/index.rst
index 6cb732d..de9cbb7 100644
--- a/Documentation/process/index.rst
+++ b/Documentation/process/index.rst
@@ -81,6 +81,7 @@
handling-regressions
security-bugs
+ cve
embargoed-hardware-issues
Maintainer information
diff --git a/Documentation/process/maintainer-netdev.rst b/Documentation/process/maintainer-netdev.rst
index 84ee60f..fd96e4a 100644
--- a/Documentation/process/maintainer-netdev.rst
+++ b/Documentation/process/maintainer-netdev.rst
@@ -431,7 +431,7 @@
Checks in patchwork are mostly simple wrappers around existing kernel
scripts, the sources are available at:
-https://github.com/kuba-moo/nipa/tree/master/tests
+https://github.com/linux-netdev/nipa/tree/master/tests
**Do not** post your patches just to run them through the checks.
You must ensure that your patches are ready by testing them locally
diff --git a/Documentation/process/security-bugs.rst b/Documentation/process/security-bugs.rst
index 692a3ba..56c560a 100644
--- a/Documentation/process/security-bugs.rst
+++ b/Documentation/process/security-bugs.rst
@@ -99,9 +99,8 @@
The security team does not assign CVEs, nor do we require them for
reports or fixes, as this can needlessly complicate the process and may
delay the bug handling. If a reporter wishes to have a CVE identifier
-assigned, they should find one by themselves, for example by contacting
-MITRE directly. However under no circumstances will a patch inclusion
-be delayed to wait for a CVE identifier to arrive.
+assigned for a confirmed issue, they can contact the :doc:`kernel CVE
+assignment team<../process/cve>` to obtain one.
Non-disclosure agreements
-------------------------
diff --git a/Documentation/sphinx/kernel_feat.py b/Documentation/sphinx/kernel_feat.py
index b9df61e..03ace5f 100644
--- a/Documentation/sphinx/kernel_feat.py
+++ b/Documentation/sphinx/kernel_feat.py
@@ -109,7 +109,7 @@
else:
out_lines += line + "\n"
- nodeList = self.nestedParse(out_lines, fname)
+ nodeList = self.nestedParse(out_lines, self.arguments[0])
return nodeList
def nestedParse(self, lines, fname):
diff --git a/Documentation/sphinx/translations.py b/Documentation/sphinx/translations.py
index 47161e6..32c2b32 100644
--- a/Documentation/sphinx/translations.py
+++ b/Documentation/sphinx/translations.py
@@ -29,10 +29,7 @@
}
class LanguagesNode(nodes.Element):
- def __init__(self, current_language, *args, **kwargs):
- super().__init__(*args, **kwargs)
-
- self.current_language = current_language
+ pass
class TranslationsTransform(Transform):
default_priority = 900
@@ -49,7 +46,8 @@
# normalize docname to be the untranslated one
docname = os.path.join(*components[2:])
- new_nodes = LanguagesNode(all_languages[this_lang_code])
+ new_nodes = LanguagesNode()
+ new_nodes['current_language'] = all_languages[this_lang_code]
for lang_code, lang_name in all_languages.items():
if lang_code == this_lang_code:
@@ -84,7 +82,7 @@
html_content = app.builder.templates.render('translations.html',
context={
- 'current_language': node.current_language,
+ 'current_language': node['current_language'],
'languages': languages,
})
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 457e16f..3731ecf 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -82,8 +82,9 @@
0x10 00-0F drivers/char/s390/vmcp.h
0x10 10-1F arch/s390/include/uapi/sclp_ctl.h
0x10 20-2F arch/s390/include/uapi/asm/hypfs.h
-0x12 all linux/fs.h
+0x12 all linux/fs.h BLK* ioctls
linux/blkpg.h
+0x15 all linux/fs.h FS_IOC_* ioctls
0x1b all InfiniBand Subsystem
<http://infiniband.sourceforge.net/>
0x20 all drivers/cdrom/cm206.h
diff --git a/Documentation/virt/hyperv/index.rst b/Documentation/virt/hyperv/index.rst
index 4a7a1b7..de447e1 100644
--- a/Documentation/virt/hyperv/index.rst
+++ b/Documentation/virt/hyperv/index.rst
@@ -10,3 +10,4 @@
overview
vmbus
clocks
+ vpci
diff --git a/Documentation/virt/hyperv/vpci.rst b/Documentation/virt/hyperv/vpci.rst
new file mode 100644
index 0000000..b65b212
--- /dev/null
+++ b/Documentation/virt/hyperv/vpci.rst
@@ -0,0 +1,316 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+PCI pass-thru devices
+=========================
+In a Hyper-V guest VM, PCI pass-thru devices (also called
+virtual PCI devices, or vPCI devices) are physical PCI devices
+that are mapped directly into the VM's physical address space.
+Guest device drivers can interact directly with the hardware
+without intermediation by the host hypervisor. This approach
+provides higher bandwidth access to the device with lower
+latency, compared with devices that are virtualized by the
+hypervisor. The device should appear to the guest just as it
+would when running on bare metal, so no changes are required
+to the Linux device drivers for the device.
+
+Hyper-V terminology for vPCI devices is "Discrete Device
+Assignment" (DDA). Public documentation for Hyper-V DDA is
+available here: `DDA`_
+
+.. _DDA: https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment
+
+DDA is typically used for storage controllers, such as NVMe,
+and for GPUs. A similar mechanism for NICs is called SR-IOV
+and produces the same benefits by allowing a guest device
+driver to interact directly with the hardware. See Hyper-V
+public documentation here: `SR-IOV`_
+
+.. _SR-IOV: https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov-
+
+This discussion of vPCI devices includes DDA and SR-IOV
+devices.
+
+Device Presentation
+-------------------
+Hyper-V provides full PCI functionality for a vPCI device when
+it is operating, so the Linux device driver for the device can
+be used unchanged, provided it uses the correct Linux kernel
+APIs for accessing PCI config space and for other integration
+with Linux. But the initial detection of the PCI device and
+its integration with the Linux PCI subsystem must use Hyper-V
+specific mechanisms. Consequently, vPCI devices on Hyper-V
+have a dual identity. They are initially presented to Linux
+guests as VMBus devices via the standard VMBus "offer"
+mechanism, so they have a VMBus identity and appear under
+/sys/bus/vmbus/devices. The VMBus vPCI driver in Linux at
+drivers/pci/controller/pci-hyperv.c handles a newly introduced
+vPCI device by fabricating a PCI bus topology and creating all
+the normal PCI device data structures in Linux that would
+exist if the PCI device were discovered via ACPI on a bare-
+metal system. Once those data structures are set up, the
+device also has a normal PCI identity in Linux, and the normal
+Linux device driver for the vPCI device can function as if it
+were running in Linux on bare-metal. Because vPCI devices are
+presented dynamically through the VMBus offer mechanism, they
+do not appear in the Linux guest's ACPI tables. vPCI devices
+may be added to a VM or removed from a VM at any time during
+the life of the VM, and not just during initial boot.
+
+With this approach, the vPCI device is a VMBus device and a
+PCI device at the same time. In response to the VMBus offer
+message, the hv_pci_probe() function runs and establishes a
+VMBus connection to the vPCI VSP on the Hyper-V host. That
+connection has a single VMBus channel. The channel is used to
+exchange messages with the vPCI VSP for the purpose of setting
+up and configuring the vPCI device in Linux. Once the device
+is fully configured in Linux as a PCI device, the VMBus
+channel is used only if Linux changes the vCPU to be interrupted
+in the guest, or if the vPCI device is removed from
+the VM while the VM is running. The ongoing operation of the
+device happens directly between the Linux device driver for
+the device and the hardware, with VMBus and the VMBus channel
+playing no role.
+
+PCI Device Setup
+----------------
+PCI device setup follows a sequence that Hyper-V originally
+created for Windows guests, and that can be ill-suited for
+Linux guests due to differences in the overall structure of
+the Linux PCI subsystem compared with Windows. Nonetheless,
+with a bit of hackery in the Hyper-V virtual PCI driver for
+Linux, the virtual PCI device is setup in Linux so that
+generic Linux PCI subsystem code and the Linux driver for the
+device "just work".
+
+Each vPCI device is set up in Linux to be in its own PCI
+domain with a host bridge. The PCI domainID is derived from
+bytes 4 and 5 of the instance GUID assigned to the VMBus vPCI
+device. The Hyper-V host does not guarantee that these bytes
+are unique, so hv_pci_probe() has an algorithm to resolve
+collisions. The collision resolution is intended to be stable
+across reboots of the same VM so that the PCI domainIDs don't
+change, as the domainID appears in the user space
+configuration of some devices.
+
+hv_pci_probe() allocates a guest MMIO range to be used as PCI
+config space for the device. This MMIO range is communicated
+to the Hyper-V host over the VMBus channel as part of telling
+the host that the device is ready to enter d0. See
+hv_pci_enter_d0(). When the guest subsequently accesses this
+MMIO range, the Hyper-V host intercepts the accesses and maps
+them to the physical device PCI config space.
+
+hv_pci_probe() also gets BAR information for the device from
+the Hyper-V host, and uses this information to allocate MMIO
+space for the BARs. That MMIO space is then setup to be
+associated with the host bridge so that it works when generic
+PCI subsystem code in Linux processes the BARs.
+
+Finally, hv_pci_probe() creates the root PCI bus. At this
+point the Hyper-V virtual PCI driver hackery is done, and the
+normal Linux PCI machinery for scanning the root bus works to
+detect the device, to perform driver matching, and to
+initialize the driver and device.
+
+PCI Device Removal
+------------------
+A Hyper-V host may initiate removal of a vPCI device from a
+guest VM at any time during the life of the VM. The removal
+is instigated by an admin action taken on the Hyper-V host and
+is not under the control of the guest OS.
+
+A guest VM is notified of the removal by an unsolicited
+"Eject" message sent from the host to the guest over the VMBus
+channel associated with the vPCI device. Upon receipt of such
+a message, the Hyper-V virtual PCI driver in Linux
+asynchronously invokes Linux kernel PCI subsystem calls to
+shutdown and remove the device. When those calls are
+complete, an "Ejection Complete" message is sent back to
+Hyper-V over the VMBus channel indicating that the device has
+been removed. At this point, Hyper-V sends a VMBus rescind
+message to the Linux guest, which the VMBus driver in Linux
+processes by removing the VMBus identity for the device. Once
+that processing is complete, all vestiges of the device having
+been present are gone from the Linux kernel. The rescind
+message also indicates to the guest that Hyper-V has stopped
+providing support for the vPCI device in the guest. If the
+guest were to attempt to access that device's MMIO space, it
+would be an invalid reference. Hypercalls affecting the device
+return errors, and any further messages sent in the VMBus
+channel are ignored.
+
+After sending the Eject message, Hyper-V allows the guest VM
+60 seconds to cleanly shutdown the device and respond with
+Ejection Complete before sending the VMBus rescind
+message. If for any reason the Eject steps don't complete
+within the allowed 60 seconds, the Hyper-V host forcibly
+performs the rescind steps, which will likely result in
+cascading errors in the guest because the device is now no
+longer present from the guest standpoint and accessing the
+device MMIO space will fail.
+
+Because ejection is asynchronous and can happen at any point
+during the guest VM lifecycle, proper synchronization in the
+Hyper-V virtual PCI driver is very tricky. Ejection has been
+observed even before a newly offered vPCI device has been
+fully setup. The Hyper-V virtual PCI driver has been updated
+several times over the years to fix race conditions when
+ejections happen at inopportune times. Care must be taken when
+modifying this code to prevent re-introducing such problems.
+See comments in the code.
+
+Interrupt Assignment
+--------------------
+The Hyper-V virtual PCI driver supports vPCI devices using
+MSI, multi-MSI, or MSI-X. Assigning the guest vCPU that will
+receive the interrupt for a particular MSI or MSI-X message is
+complex because of the way the Linux setup of IRQs maps onto
+the Hyper-V interfaces. For the single-MSI and MSI-X cases,
+Linux calls hv_compse_msi_msg() twice, with the first call
+containing a dummy vCPU and the second call containing the
+real vCPU. Furthermore, hv_irq_unmask() is finally called
+(on x86) or the GICD registers are set (on arm64) to specify
+the real vCPU again. Each of these three calls interact
+with Hyper-V, which must decide which physical CPU should
+receive the interrupt before it is forwarded to the guest VM.
+Unfortunately, the Hyper-V decision-making process is a bit
+limited, and can result in concentrating the physical
+interrupts on a single CPU, causing a performance bottleneck.
+See details about how this is resolved in the extensive
+comment above the function hv_compose_msi_req_get_cpu().
+
+The Hyper-V virtual PCI driver implements the
+irq_chip.irq_compose_msi_msg function as hv_compose_msi_msg().
+Unfortunately, on Hyper-V the implementation requires sending
+a VMBus message to the Hyper-V host and awaiting an interrupt
+indicating receipt of a reply message. Since
+irq_chip.irq_compose_msi_msg can be called with IRQ locks
+held, it doesn't work to do the normal sleep until awakened by
+the interrupt. Instead hv_compose_msi_msg() must send the
+VMBus message, and then poll for the completion message. As
+further complexity, the vPCI device could be ejected/rescinded
+while the polling is in progress, so this scenario must be
+detected as well. See comments in the code regarding this
+very tricky area.
+
+Most of the code in the Hyper-V virtual PCI driver (pci-
+hyperv.c) applies to Hyper-V and Linux guests running on x86
+and on arm64 architectures. But there are differences in how
+interrupt assignments are managed. On x86, the Hyper-V
+virtual PCI driver in the guest must make a hypercall to tell
+Hyper-V which guest vCPU should be interrupted by each
+MSI/MSI-X interrupt, and the x86 interrupt vector number that
+the x86_vector IRQ domain has picked for the interrupt. This
+hypercall is made by hv_arch_irq_unmask(). On arm64, the
+Hyper-V virtual PCI driver manages the allocation of an SPI
+for each MSI/MSI-X interrupt. The Hyper-V virtual PCI driver
+stores the allocated SPI in the architectural GICD registers,
+which Hyper-V emulates, so no hypercall is necessary as with
+x86. Hyper-V does not support using LPIs for vPCI devices in
+arm64 guest VMs because it does not emulate a GICv3 ITS.
+
+The Hyper-V virtual PCI driver in Linux supports vPCI devices
+whose drivers create managed or unmanaged Linux IRQs. If the
+smp_affinity for an unmanaged IRQ is updated via the /proc/irq
+interface, the Hyper-V virtual PCI driver is called to tell
+the Hyper-V host to change the interrupt targeting and
+everything works properly. However, on x86 if the x86_vector
+IRQ domain needs to reassign an interrupt vector due to
+running out of vectors on a CPU, there's no path to inform the
+Hyper-V host of the change, and things break. Fortunately,
+guest VMs operate in a constrained device environment where
+using all the vectors on a CPU doesn't happen. Since such a
+problem is only a theoretical concern rather than a practical
+concern, it has been left unaddressed.
+
+DMA
+---
+By default, Hyper-V pins all guest VM memory in the host
+when the VM is created, and programs the physical IOMMU to
+allow the VM to have DMA access to all its memory. Hence
+it is safe to assign PCI devices to the VM, and allow the
+guest operating system to program the DMA transfers. The
+physical IOMMU prevents a malicious guest from initiating
+DMA to memory belonging to the host or to other VMs on the
+host. From the Linux guest standpoint, such DMA transfers
+are in "direct" mode since Hyper-V does not provide a virtual
+IOMMU in the guest.
+
+Hyper-V assumes that physical PCI devices always perform
+cache-coherent DMA. When running on x86, this behavior is
+required by the architecture. When running on arm64, the
+architecture allows for both cache-coherent and
+non-cache-coherent devices, with the behavior of each device
+specified in the ACPI DSDT. But when a PCI device is assigned
+to a guest VM, that device does not appear in the DSDT, so the
+Hyper-V VMBus driver propagates cache-coherency information
+from the VMBus node in the ACPI DSDT to all VMBus devices,
+including vPCI devices (since they have a dual identity as a VMBus
+device and as a PCI device). See vmbus_dma_configure().
+Current Hyper-V versions always indicate that the VMBus is
+cache coherent, so vPCI devices on arm64 always get marked as
+cache coherent and the CPU does not perform any sync
+operations as part of dma_map/unmap_*() calls.
+
+vPCI protocol versions
+----------------------
+As previously described, during vPCI device setup and teardown
+messages are passed over a VMBus channel between the Hyper-V
+host and the Hyper-v vPCI driver in the Linux guest. Some
+messages have been revised in newer versions of Hyper-V, so
+the guest and host must agree on the vPCI protocol version to
+be used. The version is negotiated when communication over
+the VMBus channel is first established. See
+hv_pci_protocol_negotiation(). Newer versions of the protocol
+extend support to VMs with more than 64 vCPUs, and provide
+additional information about the vPCI device, such as the
+guest virtual NUMA node to which it is most closely affined in
+the underlying hardware.
+
+Guest NUMA node affinity
+------------------------
+When the vPCI protocol version provides it, the guest NUMA
+node affinity of the vPCI device is stored as part of the Linux
+device information for subsequent use by the Linux driver. See
+hv_pci_assign_numa_node(). If the negotiated protocol version
+does not support the host providing NUMA affinity information,
+the Linux guest defaults the device NUMA node to 0. But even
+when the negotiated protocol version includes NUMA affinity
+information, the ability of the host to provide such
+information depends on certain host configuration options. If
+the guest receives NUMA node value "0", it could mean NUMA
+node 0, or it could mean "no information is available".
+Unfortunately it is not possible to distinguish the two cases
+from the guest side.
+
+PCI config space access in a CoCo VM
+------------------------------------
+Linux PCI device drivers access PCI config space using a
+standard set of functions provided by the Linux PCI subsystem.
+In Hyper-V guests these standard functions map to functions
+hv_pcifront_read_config() and hv_pcifront_write_config()
+in the Hyper-V virtual PCI driver. In normal VMs,
+these hv_pcifront_*() functions directly access the PCI config
+space, and the accesses trap to Hyper-V to be handled.
+But in CoCo VMs, memory encryption prevents Hyper-V
+from reading the guest instruction stream to emulate the
+access, so the hv_pcifront_*() functions must invoke
+hypercalls with explicit arguments describing the access to be
+made.
+
+Config Block back-channel
+-------------------------
+The Hyper-V host and Hyper-V virtual PCI driver in Linux
+together implement a non-standard back-channel communication
+path between the host and guest. The back-channel path uses
+messages sent over the VMBus channel associated with the vPCI
+device. The functions hyperv_read_cfg_blk() and
+hyperv_write_cfg_blk() are the primary interfaces provided to
+other parts of the Linux kernel. As of this writing, these
+interfaces are used only by the Mellanox mlx5 driver to pass
+diagnostic data to a Hyper-V host running in the Azure public
+cloud. The functions hyperv_read_cfg_blk() and
+hyperv_write_cfg_blk() are implemented in a separate module
+(pci-hyperv-intf.c, under CONFIG_PCI_HYPERV_INTERFACE) that
+effectively stubs them out when running in non-Hyper-V
+environments.
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3ec0b7a..09c7e58 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8791,6 +8791,11 @@
#define KVM_X86_DEFAULT_VM 0
#define KVM_X86_SW_PROTECTED_VM 1
+Note, KVM_X86_SW_PROTECTED_VM is currently only for development and testing.
+Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
+production. The behavior and effective ABI for software-protected VMs is
+unstable.
+
9. Known KVM API problems
=========================
diff --git a/MAINTAINERS b/MAINTAINERS
index 960512b..7f0b462 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1395,6 +1395,7 @@
ANALOGBITS PLL LIBRARIES
M: Paul Walmsley <paul.walmsley@sifive.com>
+M: Samuel Holland <samuel.holland@sifive.com>
S: Supported
F: drivers/clk/analogbits/*
F: include/linux/clk/analogbits*
@@ -2156,7 +2157,7 @@
M: Sascha Hauer <s.hauer@pengutronix.de>
R: Pengutronix Kernel Team <kernel@pengutronix.de>
R: Fabio Estevam <festevam@gmail.com>
-R: NXP Linux Team <linux-imx@nxp.com>
+L: imx@lists.linux.dev
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git
@@ -4169,14 +4170,14 @@
F: drivers/net/ethernet/broadcom/bnxt/
F: include/linux/firmware/broadcom/tee_bnxt_fw.h
-BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER
-M: Arend van Spriel <aspriel@gmail.com>
-M: Franky Lin <franky.lin@broadcom.com>
-M: Hante Meuleman <hante.meuleman@broadcom.com>
+BROADCOM BRCM80211 IEEE802.11 WIRELESS DRIVERS
+M: Arend van Spriel <arend.vanspriel@broadcom.com>
L: linux-wireless@vger.kernel.org
+L: brcm80211@lists.linux.dev
L: brcm80211-dev-list.pdl@broadcom.com
S: Supported
F: drivers/net/wireless/broadcom/brcm80211/
+F: include/linux/platform_data/brcmfmac.h
BROADCOM BRCMSTB GPIO DRIVER
M: Doug Berger <opendmb@gmail.com>
@@ -5378,7 +5379,7 @@
M: Johannes Weiner <hannes@cmpxchg.org>
M: Michal Hocko <mhocko@kernel.org>
M: Roman Gushchin <roman.gushchin@linux.dev>
-M: Shakeel Butt <shakeelb@google.com>
+M: Shakeel Butt <shakeel.butt@linux.dev>
R: Muchun Song <muchun.song@linux.dev>
L: cgroups@vger.kernel.org
L: linux-mm@kvack.org
@@ -5610,6 +5611,11 @@
F: Documentation/devicetree/bindings/net/can/ctu,ctucanfd.yaml
F: drivers/net/can/ctucanfd/
+CVE ASSIGNMENT CONTACT
+M: CVE Assignment Team <cve@kernel.org>
+S: Maintained
+F: Documentation/process/cve.rst
+
CW1200 WLAN driver
S: Orphan
F: drivers/net/wireless/st/cw1200/
@@ -8490,7 +8496,7 @@
M: Wei Fang <wei.fang@nxp.com>
R: Shenwei Wang <shenwei.wang@nxp.com>
R: Clark Wang <xiaoning.wang@nxp.com>
-R: NXP Linux Team <linux-imx@nxp.com>
+L: imx@lists.linux.dev
L: netdev@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/net/fsl,fec.yaml
@@ -8525,7 +8531,7 @@
FREESCALE IMX LPI2C DRIVER
M: Dong Aisheng <aisheng.dong@nxp.com>
L: linux-i2c@vger.kernel.org
-L: linux-imx@nxp.com
+L: imx@lists.linux.dev
S: Maintained
F: Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml
F: drivers/i2c/busses/i2c-imx-lpi2c.c
@@ -10729,7 +10735,7 @@
M: Jani Nikula <jani.nikula@linux.intel.com>
M: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
M: Rodrigo Vivi <rodrigo.vivi@intel.com>
-M: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
+M: Tvrtko Ursulin <tursulin@ursulin.net>
L: intel-gfx@lists.freedesktop.org
S: Supported
W: https://drm.pages.freedesktop.org/intel-docs/
@@ -10801,11 +10807,11 @@
INTEL GVT-g DRIVERS (Intel GPU Virtualization)
M: Zhenyu Wang <zhenyuw@linux.intel.com>
-M: Zhi Wang <zhi.a.wang@intel.com>
+M: Zhi Wang <zhi.wang.linux@gmail.com>
L: intel-gvt-dev@lists.freedesktop.org
L: intel-gfx@lists.freedesktop.org
S: Supported
-W: https://01.org/igvt-g
+W: https://github.com/intel/gvt-linux/wiki
T: git https://github.com/intel/gvt-linux.git
F: drivers/gpu/drm/i915/gvt/
@@ -11127,7 +11133,6 @@
F: drivers/net/wireless/intel/iwlegacy/
INTEL WIRELESS WIFI LINK (iwlwifi)
-M: Gregory Greenman <gregory.greenman@intel.com>
M: Miri Korenblit <miriam.rachel.korenblit@intel.com>
L: linux-wireless@vger.kernel.org
S: Supported
@@ -12512,7 +12517,6 @@
F: include/linux/livepatch.h
F: kernel/livepatch/
F: kernel/module/livepatch.c
-F: lib/livepatch/
F: samples/livepatch/
F: tools/testing/selftests/livepatch/
@@ -14107,6 +14111,17 @@
F: tools/mm/
F: tools/testing/selftests/mm/
+MEMORY MAPPING
+M: Andrew Morton <akpm@linux-foundation.org>
+R: Liam R. Howlett <Liam.Howlett@oracle.com>
+R: Vlastimil Babka <vbabka@suse.cz>
+R: Lorenzo Stoakes <lstoakes@gmail.com>
+L: linux-mm@kvack.org
+S: Maintained
+W: http://www.linux-mm.org
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+F: mm/mmap.c
+
MEMORY TECHNOLOGY DEVICES (MTD)
M: Miquel Raynal <miquel.raynal@bootlin.com>
M: Richard Weinberger <richard@nod.at>
@@ -14365,7 +14380,7 @@
M: Claudiu Beznea <claudiu.beznea@tuxon.dev>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Supported
-F: Documentation/devicetree/bindings/regulator/mcp16502-regulator.txt
+F: Documentation/devicetree/bindings/regulator/microchip,mcp16502.yaml
F: drivers/regulator/mcp16502.c
MICROCHIP MCP3564 ADC DRIVER
@@ -15238,6 +15253,8 @@
F: Documentation/networking/net_cachelines/
F: Documentation/process/maintainer-netdev.rst
F: Documentation/userspace-api/netlink/
+F: include/linux/framer/framer-provider.h
+F: include/linux/framer/framer.h
F: include/linux/in.h
F: include/linux/indirect_call_wrapper.h
F: include/linux/net.h
@@ -15325,7 +15342,7 @@
NETWORKING [MPTCP]
M: Matthieu Baerts <matttbe@kernel.org>
M: Mat Martineau <martineau@kernel.org>
-R: Geliang Tang <geliang.tang@linux.dev>
+R: Geliang Tang <geliang@kernel.org>
L: netdev@vger.kernel.org
L: mptcp@lists.linux.dev
S: Maintained
@@ -15572,16 +15589,6 @@
T: git https://github.com/davejiang/linux.git
F: drivers/ntb/hw/intel/
-NTFS FILESYSTEM
-M: Anton Altaparmakov <anton@tuxera.com>
-R: Namjae Jeon <linkinjeon@kernel.org>
-L: linux-ntfs-dev@lists.sourceforge.net
-S: Supported
-W: http://www.tuxera.com/
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs.git
-F: Documentation/filesystems/ntfs.rst
-F: fs/ntfs/
-
NTFS3 FILESYSTEM
M: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
L: ntfs3@lists.linux.dev
@@ -15710,7 +15717,7 @@
NXP i.MX 7D/6SX/6UL/93 AND VF610 ADC DRIVER
M: Haibo Chen <haibo.chen@nxp.com>
L: linux-iio@vger.kernel.org
-L: linux-imx@nxp.com
+L: imx@lists.linux.dev
S: Maintained
F: Documentation/devicetree/bindings/iio/adc/fsl,imx7d-adc.yaml
F: Documentation/devicetree/bindings/iio/adc/fsl,vf610-adc.yaml
@@ -15747,7 +15754,7 @@
NXP i.MX 8QXP ADC DRIVER
M: Cai Huoqing <cai.huoqing@linux.dev>
M: Haibo Chen <haibo.chen@nxp.com>
-L: linux-imx@nxp.com
+L: imx@lists.linux.dev
L: linux-iio@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/iio/adc/nxp,imx8qxp-adc.yaml
@@ -15755,7 +15762,7 @@
NXP i.MX 8QXP/8QM JPEG V4L2 DRIVER
M: Mirela Rabulea <mirela.rabulea@nxp.com>
-R: NXP Linux Team <linux-imx@nxp.com>
+L: imx@lists.linux.dev
L: linux-media@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/media/nxp,imx8-jpeg.yaml
@@ -15765,7 +15772,7 @@
M: Abel Vesa <abelvesa@kernel.org>
R: Peng Fan <peng.fan@nxp.com>
L: linux-clk@vger.kernel.org
-L: linux-imx@nxp.com
+L: imx@lists.linux.dev
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/abelvesa/linux.git clk/imx
F: Documentation/devicetree/bindings/clock/imx*
@@ -16726,6 +16733,7 @@
PCI DRIVER FOR FU740
M: Paul Walmsley <paul.walmsley@sifive.com>
M: Greentime Hu <greentime.hu@sifive.com>
+M: Samuel Holland <samuel.holland@sifive.com>
L: linux-pci@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/pci/sifive,fu740-pcie.yaml
@@ -16838,6 +16846,7 @@
PCI DRIVER FOR TI DRA7XX/J721E
M: Vignesh Raghavendra <vigneshr@ti.com>
+R: Siddharth Vadapalli <s-vadapalli@ti.com>
L: linux-omap@vger.kernel.org
L: linux-pci@vger.kernel.org
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
@@ -17183,7 +17192,7 @@
R: Will Deacon <will@kernel.org>
R: James Clark <james.clark@arm.com>
R: Mike Leach <mike.leach@linaro.org>
-R: Leo Yan <leo.yan@linaro.org>
+R: Leo Yan <leo.yan@linux.dev>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Supported
F: tools/build/feature/test-libopencsd.c
@@ -17530,6 +17539,7 @@
F: drivers/power/supply/
F: include/linux/power/
F: include/linux/power_supply.h
+F: tools/testing/selftests/power_supply/
POWERNV OPERATOR PANEL LCD DISPLAY DRIVER
M: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
@@ -17977,33 +17987,34 @@
QUALCOMM ATH12K WIRELESS DRIVER
M: Kalle Valo <kvalo@kernel.org>
-M: Jeff Johnson <quic_jjohnson@quicinc.com>
+M: Jeff Johnson <jjohnson@kernel.org>
L: ath12k@lists.infradead.org
S: Supported
W: https://wireless.wiki.kernel.org/en/users/Drivers/ath12k
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
F: drivers/net/wireless/ath/ath12k/
+N: ath12k
QUALCOMM ATHEROS ATH10K WIRELESS DRIVER
M: Kalle Valo <kvalo@kernel.org>
-M: Jeff Johnson <quic_jjohnson@quicinc.com>
+M: Jeff Johnson <jjohnson@kernel.org>
L: ath10k@lists.infradead.org
S: Supported
W: https://wireless.wiki.kernel.org/en/users/Drivers/ath10k
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
-F: Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
F: drivers/net/wireless/ath/ath10k/
+N: ath10k
QUALCOMM ATHEROS ATH11K WIRELESS DRIVER
M: Kalle Valo <kvalo@kernel.org>
-M: Jeff Johnson <quic_jjohnson@quicinc.com>
+M: Jeff Johnson <jjohnson@kernel.org>
L: ath11k@lists.infradead.org
S: Supported
W: https://wireless.wiki.kernel.org/en/users/Drivers/ath11k
B: https://wireless.wiki.kernel.org/en/users/Drivers/ath11k/bugreport
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
-F: Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml
F: drivers/net/wireless/ath/ath11k/
+N: ath11k
QUALCOMM ATHEROS ATH9K WIRELESS DRIVER
M: Toke Høiland-Jørgensen <toke@toke.dk>
@@ -18432,7 +18443,7 @@
F: drivers/infiniband/sw/rdmavt
RDS - RELIABLE DATAGRAM SOCKETS
-M: Santosh Shilimkar <santosh.shilimkar@oracle.com>
+M: Allison Henderson <allison.henderson@oracle.com>
L: netdev@vger.kernel.org
L: linux-rdma@vger.kernel.org
L: rds-devel@oss.oracle.com (moderated for non-subscribers)
@@ -19099,6 +19110,7 @@
F: rust/
F: samples/rust/
F: scripts/*rust*
+F: tools/testing/selftests/rust/
K: \b(?i:rust)\b
RXRPC SOCKETS (AF_RXRPC)
@@ -19634,7 +19646,7 @@
SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) NXP i.MX DRIVER
M: Haibo Chen <haibo.chen@nxp.com>
-L: linux-imx@nxp.com
+L: imx@lists.linux.dev
L: linux-mmc@vger.kernel.org
S: Maintained
F: drivers/mmc/host/sdhci-esdhc-imx.c
@@ -19969,35 +19981,14 @@
F: drivers/watchdog/simatic-ipc-wdt.c
SIFIVE DRIVERS
-M: Palmer Dabbelt <palmer@dabbelt.com>
M: Paul Walmsley <paul.walmsley@sifive.com>
+M: Samuel Holland <samuel.holland@sifive.com>
L: linux-riscv@lists.infradead.org
S: Supported
-N: sifive
-K: [^@]sifive
-
-SIFIVE CACHE DRIVER
-M: Conor Dooley <conor@kernel.org>
-L: linux-riscv@lists.infradead.org
-S: Maintained
-F: Documentation/devicetree/bindings/cache/sifive,ccache0.yaml
-F: drivers/cache/sifive_ccache.c
-
-SIFIVE FU540 SYSTEM-ON-CHIP
-M: Paul Walmsley <paul.walmsley@sifive.com>
-M: Palmer Dabbelt <palmer@dabbelt.com>
-L: linux-riscv@lists.infradead.org
-S: Supported
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/pjw/sifive.git
-N: fu540
-K: fu540
-
-SIFIVE PDMA DRIVER
-M: Green Wan <green.wan@sifive.com>
-S: Maintained
-F: Documentation/devicetree/bindings/dma/sifive,fu540-c000-pdma.yaml
F: drivers/dma/sf-pdma/
-
+N: sifive
+K: fu[57]40
+K: [^@]sifive
SILEAD TOUCHSCREEN DRIVER
M: Hans de Goede <hdegoede@redhat.com>
@@ -20207,8 +20198,8 @@
F: drivers/net/ethernet/socionext/sni_ave.c
SOCIONEXT (SNI) NETSEC NETWORK DRIVER
-M: Jassi Brar <jaswinder.singh@linaro.org>
M: Ilias Apalodimas <ilias.apalodimas@linaro.org>
+M: Masahisa Kojima <kojima.masahisa@socionext.com>
L: netdev@vger.kernel.org
S: Maintained
F: Documentation/devicetree/bindings/net/socionext,synquacer-netsec.yaml
@@ -22010,6 +22001,14 @@
F: drivers/media/i2c/ds90*
F: include/media/i2c/ds90*
+TI HDC302X HUMIDITY DRIVER
+M: Javier Carrasco <javier.carrasco.cruz@gmail.com>
+M: Li peiyu <579lpy@gmail.com>
+L: linux-iio@vger.kernel.org
+S: Maintained
+F: Documentation/devicetree/bindings/iio/humidity/ti,hdc3020.yaml
+F: drivers/iio/humidity/hdc3020.c
+
TI ICSSG ETHERNET DRIVER (ICSSG)
R: MD Danish Anwar <danishanwar@ti.com>
R: Roger Quadros <rogerq@kernel.org>
@@ -22865,9 +22864,8 @@
F: drivers/usb/typec/mux/pi3usb30532.c
USB TYPEC PORT CONTROLLER DRIVERS
-M: Guenter Roeck <linux@roeck-us.net>
L: linux-usb@vger.kernel.org
-S: Maintained
+S: Orphan
F: drivers/usb/typec/tcpm/
USB UHCI DRIVER
diff --git a/Makefile b/Makefile
index a171eafc..c7ee53f 100644
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@
VERSION = 6
PATCHLEVEL = 8
SUBLEVEL = 0
-EXTRAVERSION = -rc3
+EXTRAVERSION =
NAME = Hurr durr I'ma ninja sloth
# *DOCUMENTATION*
@@ -294,15 +294,15 @@
single-build :=
ifneq ($(filter $(no-dot-config-targets), $(MAKECMDGOALS)),)
- ifeq ($(filter-out $(no-dot-config-targets), $(MAKECMDGOALS)),)
+ ifeq ($(filter-out $(no-dot-config-targets), $(MAKECMDGOALS)),)
need-config :=
- endif
+ endif
endif
ifneq ($(filter $(no-sync-config-targets), $(MAKECMDGOALS)),)
- ifeq ($(filter-out $(no-sync-config-targets), $(MAKECMDGOALS)),)
+ ifeq ($(filter-out $(no-sync-config-targets), $(MAKECMDGOALS)),)
may-sync-config :=
- endif
+ endif
endif
need-compiler := $(may-sync-config)
@@ -323,9 +323,9 @@
# We cannot build single targets and the others at the same time
ifneq ($(filter $(single-targets), $(MAKECMDGOALS)),)
single-build := 1
- ifneq ($(filter-out $(single-targets), $(MAKECMDGOALS)),)
+ ifneq ($(filter-out $(single-targets), $(MAKECMDGOALS)),)
mixed-build := 1
- endif
+ endif
endif
# For "make -j clean all", "make -j mrproper defconfig all", etc.
diff --git a/arch/arc/include/asm/jump_label.h b/arch/arc/include/asm/jump_label.h
index 9d96180..a339223 100644
--- a/arch/arc/include/asm/jump_label.h
+++ b/arch/arc/include/asm/jump_label.h
@@ -31,7 +31,7 @@
static __always_inline bool arch_static_branch(struct static_key *key,
bool branch)
{
- asm_volatile_goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n"
+ asm goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n"
"1: \n"
"nop \n"
".pushsection __jump_table, \"aw\" \n"
@@ -47,7 +47,7 @@ static __always_inline bool arch_static_branch(struct static_key *key,
static __always_inline bool arch_static_branch_jump(struct static_key *key,
bool branch)
{
- asm_volatile_goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n"
+ asm goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)" \n"
"1: \n"
"b %l[l_yes] \n"
".pushsection __jump_table, \"aw\" \n"
diff --git a/arch/arm/boot/dts/amazon/alpine.dtsi b/arch/arm/boot/dts/amazon/alpine.dtsi
index ff68dfb..90bd12f 100644
--- a/arch/arm/boot/dts/amazon/alpine.dtsi
+++ b/arch/arm/boot/dts/amazon/alpine.dtsi
@@ -167,7 +167,6 @@ pcie@fbc00000 {
msix: msix@fbe00000 {
compatible = "al,alpine-msix";
reg = <0x0 0xfbe00000 0x0 0x100000>;
- interrupt-controller;
msi-controller;
al,msi-base-spi = <96>;
al,msi-num-spis = <64>;
diff --git a/arch/arm/boot/dts/aspeed/aspeed-g4.dtsi b/arch/arm/boot/dts/aspeed/aspeed-g4.dtsi
index 530491a..857cb26 100644
--- a/arch/arm/boot/dts/aspeed/aspeed-g4.dtsi
+++ b/arch/arm/boot/dts/aspeed/aspeed-g4.dtsi
@@ -466,7 +466,6 @@ i2c_ic: interrupt-controller@0 {
i2c0: i2c-bus@40 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x40 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -482,7 +481,6 @@ i2c0: i2c-bus@40 {
i2c1: i2c-bus@80 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x80 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -498,7 +496,6 @@ i2c1: i2c-bus@80 {
i2c2: i2c-bus@c0 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0xc0 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -515,7 +512,6 @@ i2c2: i2c-bus@c0 {
i2c3: i2c-bus@100 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x100 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -532,7 +528,6 @@ i2c3: i2c-bus@100 {
i2c4: i2c-bus@140 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x140 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -549,7 +544,6 @@ i2c4: i2c-bus@140 {
i2c5: i2c-bus@180 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x180 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -566,7 +560,6 @@ i2c5: i2c-bus@180 {
i2c6: i2c-bus@1c0 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x1c0 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -583,7 +576,6 @@ i2c6: i2c-bus@1c0 {
i2c7: i2c-bus@300 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x300 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -600,7 +592,6 @@ i2c7: i2c-bus@300 {
i2c8: i2c-bus@340 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x340 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -617,7 +608,6 @@ i2c8: i2c-bus@340 {
i2c9: i2c-bus@380 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x380 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -634,7 +624,6 @@ i2c9: i2c-bus@380 {
i2c10: i2c-bus@3c0 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x3c0 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -651,7 +640,6 @@ i2c10: i2c-bus@3c0 {
i2c11: i2c-bus@400 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x400 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -668,7 +656,6 @@ i2c11: i2c-bus@400 {
i2c12: i2c-bus@440 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x440 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
@@ -685,7 +672,6 @@ i2c12: i2c-bus@440 {
i2c13: i2c-bus@480 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x480 0x40>;
compatible = "aspeed,ast2400-i2c-bus";
diff --git a/arch/arm/boot/dts/aspeed/aspeed-g5.dtsi b/arch/arm/boot/dts/aspeed/aspeed-g5.dtsi
index 04f98d1..e6f3cf3 100644
--- a/arch/arm/boot/dts/aspeed/aspeed-g5.dtsi
+++ b/arch/arm/boot/dts/aspeed/aspeed-g5.dtsi
@@ -363,6 +363,7 @@ sgpio: sgpio@1e780200 {
interrupts = <40>;
reg = <0x1e780200 0x0100>;
clocks = <&syscon ASPEED_CLK_APB>;
+ #interrupt-cells = <2>;
interrupt-controller;
bus-frequency = <12000000>;
pinctrl-names = "default";
@@ -594,7 +595,6 @@ i2c_ic: interrupt-controller@0 {
i2c0: i2c-bus@40 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x40 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -610,7 +610,6 @@ i2c0: i2c-bus@40 {
i2c1: i2c-bus@80 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x80 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -626,7 +625,6 @@ i2c1: i2c-bus@80 {
i2c2: i2c-bus@c0 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0xc0 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -643,7 +641,6 @@ i2c2: i2c-bus@c0 {
i2c3: i2c-bus@100 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x100 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -660,7 +657,6 @@ i2c3: i2c-bus@100 {
i2c4: i2c-bus@140 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x140 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -677,7 +673,6 @@ i2c4: i2c-bus@140 {
i2c5: i2c-bus@180 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x180 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -694,7 +689,6 @@ i2c5: i2c-bus@180 {
i2c6: i2c-bus@1c0 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x1c0 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -711,7 +705,6 @@ i2c6: i2c-bus@1c0 {
i2c7: i2c-bus@300 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x300 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -728,7 +721,6 @@ i2c7: i2c-bus@300 {
i2c8: i2c-bus@340 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x340 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -745,7 +737,6 @@ i2c8: i2c-bus@340 {
i2c9: i2c-bus@380 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x380 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -762,7 +753,6 @@ i2c9: i2c-bus@380 {
i2c10: i2c-bus@3c0 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x3c0 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -779,7 +769,6 @@ i2c10: i2c-bus@3c0 {
i2c11: i2c-bus@400 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x400 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -796,7 +785,6 @@ i2c11: i2c-bus@400 {
i2c12: i2c-bus@440 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x440 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
@@ -813,7 +801,6 @@ i2c12: i2c-bus@440 {
i2c13: i2c-bus@480 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x480 0x40>;
compatible = "aspeed,ast2500-i2c-bus";
diff --git a/arch/arm/boot/dts/aspeed/aspeed-g6.dtsi b/arch/arm/boot/dts/aspeed/aspeed-g6.dtsi
index c4d1faa..29f9469 100644
--- a/arch/arm/boot/dts/aspeed/aspeed-g6.dtsi
+++ b/arch/arm/boot/dts/aspeed/aspeed-g6.dtsi
@@ -474,6 +474,7 @@ sgpiom0: sgpiom@1e780500 {
reg = <0x1e780500 0x100>;
interrupts = <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&syscon ASPEED_CLK_APB2>;
+ #interrupt-cells = <2>;
interrupt-controller;
bus-frequency = <12000000>;
pinctrl-names = "default";
@@ -488,6 +489,7 @@ sgpiom1: sgpiom@1e780600 {
reg = <0x1e780600 0x100>;
interrupts = <GIC_SPI 70 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&syscon ASPEED_CLK_APB2>;
+ #interrupt-cells = <2>;
interrupt-controller;
bus-frequency = <12000000>;
pinctrl-names = "default";
@@ -902,7 +904,6 @@ &i2c {
i2c0: i2c-bus@80 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x80 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -917,7 +918,6 @@ i2c0: i2c-bus@80 {
i2c1: i2c-bus@100 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x100 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -932,7 +932,6 @@ i2c1: i2c-bus@100 {
i2c2: i2c-bus@180 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x180 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -947,7 +946,6 @@ i2c2: i2c-bus@180 {
i2c3: i2c-bus@200 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x200 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -962,7 +960,6 @@ i2c3: i2c-bus@200 {
i2c4: i2c-bus@280 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x280 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -977,7 +974,6 @@ i2c4: i2c-bus@280 {
i2c5: i2c-bus@300 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x300 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -992,7 +988,6 @@ i2c5: i2c-bus@300 {
i2c6: i2c-bus@380 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x380 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -1007,7 +1002,6 @@ i2c6: i2c-bus@380 {
i2c7: i2c-bus@400 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x400 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -1022,7 +1016,6 @@ i2c7: i2c-bus@400 {
i2c8: i2c-bus@480 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x480 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -1037,7 +1030,6 @@ i2c8: i2c-bus@480 {
i2c9: i2c-bus@500 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x500 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -1052,7 +1044,6 @@ i2c9: i2c-bus@500 {
i2c10: i2c-bus@580 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x580 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -1067,7 +1058,6 @@ i2c10: i2c-bus@580 {
i2c11: i2c-bus@600 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x600 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -1082,7 +1072,6 @@ i2c11: i2c-bus@600 {
i2c12: i2c-bus@680 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x680 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -1097,7 +1086,6 @@ i2c12: i2c-bus@680 {
i2c13: i2c-bus@700 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x700 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -1112,7 +1100,6 @@ i2c13: i2c-bus@700 {
i2c14: i2c-bus@780 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x780 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
@@ -1127,7 +1114,6 @@ i2c14: i2c-bus@780 {
i2c15: i2c-bus@800 {
#address-cells = <1>;
#size-cells = <0>;
- #interrupt-cells = <1>;
reg = <0x800 0x80>;
compatible = "aspeed,ast2600-i2c-bus";
clocks = <&syscon ASPEED_CLK_APB2>;
diff --git a/arch/arm/boot/dts/broadcom/bcm-cygnus.dtsi b/arch/arm/boot/dts/broadcom/bcm-cygnus.dtsi
index f9f79ed..07ca0d9 100644
--- a/arch/arm/boot/dts/broadcom/bcm-cygnus.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm-cygnus.dtsi
@@ -167,6 +167,7 @@ gpio_crmu: gpio@3024800 {
#gpio-cells = <2>;
gpio-controller;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupt-parent = <&mailbox>;
interrupts = <0>;
};
@@ -247,6 +248,7 @@ gpio_ccm: gpio@1800a000 {
gpio-controller;
interrupts = <GIC_SPI 84 IRQ_TYPE_LEVEL_HIGH>;
interrupt-controller;
+ #interrupt-cells = <2>;
};
i2c1: i2c@1800b000 {
@@ -518,6 +520,7 @@ gpio_asiu: gpio@180a5000 {
gpio-controller;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupts = <GIC_SPI 174 IRQ_TYPE_LEVEL_HIGH>;
gpio-ranges = <&pinctrl 0 42 1>,
<&pinctrl 1 44 3>,
diff --git a/arch/arm/boot/dts/broadcom/bcm-hr2.dtsi b/arch/arm/boot/dts/broadcom/bcm-hr2.dtsi
index 788a680..75545b1 100644
--- a/arch/arm/boot/dts/broadcom/bcm-hr2.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm-hr2.dtsi
@@ -200,6 +200,7 @@ gpiob: gpio@30000 {
gpio-controller;
ngpios = <4>;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupts = <GIC_SPI 93 IRQ_TYPE_LEVEL_HIGH>;
};
diff --git a/arch/arm/boot/dts/broadcom/bcm-nsp.dtsi b/arch/arm/boot/dts/broadcom/bcm-nsp.dtsi
index 9d20ba3..6a4482c 100644
--- a/arch/arm/boot/dts/broadcom/bcm-nsp.dtsi
+++ b/arch/arm/boot/dts/broadcom/bcm-nsp.dtsi
@@ -180,6 +180,7 @@ gpioa: gpio@20 {
gpio-controller;
ngpios = <32>;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupts = <GIC_SPI 85 IRQ_TYPE_LEVEL_HIGH>;
gpio-ranges = <&pinctrl 0 0 32>;
};
@@ -352,6 +353,7 @@ gpiob: gpio@30000 {
gpio-controller;
ngpios = <4>;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupts = <GIC_SPI 87 IRQ_TYPE_LEVEL_HIGH>;
};
diff --git a/arch/arm/boot/dts/intel/ixp/intel-ixp42x-gateway-7001.dts b/arch/arm/boot/dts/intel/ixp/intel-ixp42x-gateway-7001.dts
index 4d70f6a..6d5e690 100644
--- a/arch/arm/boot/dts/intel/ixp/intel-ixp42x-gateway-7001.dts
+++ b/arch/arm/boot/dts/intel/ixp/intel-ixp42x-gateway-7001.dts
@@ -60,6 +60,8 @@ pci@c0000000 {
* We have slots (IDSEL) 1 and 2 with one assigned IRQ
* each handling all IRQs.
*/
+ #interrupt-cells = <1>;
+ interrupt-map-mask = <0xf800 0 0 7>;
interrupt-map =
/* IDSEL 1 */
<0x0800 0 0 1 &gpio0 11 IRQ_TYPE_LEVEL_LOW>, /* INT A on slot 1 is irq 11 */
diff --git a/arch/arm/boot/dts/intel/ixp/intel-ixp42x-goramo-multilink.dts b/arch/arm/boot/dts/intel/ixp/intel-ixp42x-goramo-multilink.dts
index 9ec0169..5f4c849 100644
--- a/arch/arm/boot/dts/intel/ixp/intel-ixp42x-goramo-multilink.dts
+++ b/arch/arm/boot/dts/intel/ixp/intel-ixp42x-goramo-multilink.dts
@@ -89,6 +89,8 @@ pci@c0000000 {
* The slots have Ethernet, Ethernet, NEC and MPCI.
* The IDSELs are 11, 12, 13, 14.
*/
+ #interrupt-cells = <1>;
+ interrupt-map-mask = <0xf800 0 0 7>;
interrupt-map =
/* IDSEL 11 - Ethernet A */
<0x5800 0 0 1 &gpio0 4 IRQ_TYPE_LEVEL_LOW>, /* INT A on slot 11 is irq 4 */
diff --git a/arch/arm/boot/dts/marvell/kirkwood-l-50.dts b/arch/arm/boot/dts/marvell/kirkwood-l-50.dts
index dffb9f8..c841eb8 100644
--- a/arch/arm/boot/dts/marvell/kirkwood-l-50.dts
+++ b/arch/arm/boot/dts/marvell/kirkwood-l-50.dts
@@ -65,6 +65,7 @@ i2c@11000 {
gpio2: gpio-expander@20 {
#gpio-cells = <2>;
#interrupt-cells = <2>;
+ interrupt-controller;
compatible = "semtech,sx1505q";
reg = <0x20>;
@@ -79,6 +80,7 @@ gpio2: gpio-expander@20 {
gpio3: gpio-expander@21 {
#gpio-cells = <2>;
#interrupt-cells = <2>;
+ interrupt-controller;
compatible = "semtech,sx1505q";
reg = <0x21>;
diff --git a/arch/arm/boot/dts/nuvoton/nuvoton-wpcm450.dtsi b/arch/arm/boot/dts/nuvoton/nuvoton-wpcm450.dtsi
index fd671c7..6e1f0f1 100644
--- a/arch/arm/boot/dts/nuvoton/nuvoton-wpcm450.dtsi
+++ b/arch/arm/boot/dts/nuvoton/nuvoton-wpcm450.dtsi
@@ -120,6 +120,7 @@ gpio0: gpio@0 {
interrupts = <2 IRQ_TYPE_LEVEL_HIGH>,
<3 IRQ_TYPE_LEVEL_HIGH>,
<4 IRQ_TYPE_LEVEL_HIGH>;
+ #interrupt-cells = <2>;
interrupt-controller;
};
@@ -128,6 +129,7 @@ gpio1: gpio@1 {
gpio-controller;
#gpio-cells = <2>;
interrupts = <5 IRQ_TYPE_LEVEL_HIGH>;
+ #interrupt-cells = <2>;
interrupt-controller;
};
diff --git a/arch/arm/boot/dts/nvidia/tegra30-apalis-v1.1.dtsi b/arch/arm/boot/dts/nvidia/tegra30-apalis-v1.1.dtsi
index 1640763..ff0d684 100644
--- a/arch/arm/boot/dts/nvidia/tegra30-apalis-v1.1.dtsi
+++ b/arch/arm/boot/dts/nvidia/tegra30-apalis-v1.1.dtsi
@@ -997,7 +997,6 @@ touchscreen@41 {
compatible = "st,stmpe811";
reg = <0x41>;
irq-gpio = <&gpio TEGRA_GPIO(V, 0) GPIO_ACTIVE_LOW>;
- interrupt-controller;
id = <0>;
blocks = <0x5>;
irq-trigger = <0x1>;
diff --git a/arch/arm/boot/dts/nvidia/tegra30-apalis.dtsi b/arch/arm/boot/dts/nvidia/tegra30-apalis.dtsi
index 3b6fad2..d38f1dd 100644
--- a/arch/arm/boot/dts/nvidia/tegra30-apalis.dtsi
+++ b/arch/arm/boot/dts/nvidia/tegra30-apalis.dtsi
@@ -980,7 +980,6 @@ touchscreen@41 {
compatible = "st,stmpe811";
reg = <0x41>;
irq-gpio = <&gpio TEGRA_GPIO(V, 0) GPIO_ACTIVE_LOW>;
- interrupt-controller;
id = <0>;
blocks = <0x5>;
irq-trigger = <0x1>;
diff --git a/arch/arm/boot/dts/nvidia/tegra30-colibri.dtsi b/arch/arm/boot/dts/nvidia/tegra30-colibri.dtsi
index 4eb526f..81c8a5f 100644
--- a/arch/arm/boot/dts/nvidia/tegra30-colibri.dtsi
+++ b/arch/arm/boot/dts/nvidia/tegra30-colibri.dtsi
@@ -861,7 +861,6 @@ touchscreen@41 {
compatible = "st,stmpe811";
reg = <0x41>;
irq-gpio = <&gpio TEGRA_GPIO(V, 0) GPIO_ACTIVE_LOW>;
- interrupt-controller;
id = <0>;
blocks = <0x5>;
irq-trigger = <0x1>;
diff --git a/arch/arm/boot/dts/nxp/imx/imx6q-b850v3.dts b/arch/arm/boot/dts/nxp/imx/imx6q-b850v3.dts
index db8c332..cad112e 100644
--- a/arch/arm/boot/dts/nxp/imx/imx6q-b850v3.dts
+++ b/arch/arm/boot/dts/nxp/imx/imx6q-b850v3.dts
@@ -227,7 +227,6 @@ bridge@1,0 {
#address-cells = <3>;
#size-cells = <2>;
- #interrupt-cells = <1>;
bridge@2,1 {
compatible = "pci10b5,8605";
@@ -235,7 +234,6 @@ bridge@2,1 {
#address-cells = <3>;
#size-cells = <2>;
- #interrupt-cells = <1>;
/* Intel Corporation I210 Gigabit Network Connection */
ethernet@3,0 {
@@ -250,7 +248,6 @@ bridge@2,2 {
#address-cells = <3>;
#size-cells = <2>;
- #interrupt-cells = <1>;
/* Intel Corporation I210 Gigabit Network Connection */
switch_nic: ethernet@4,0 {
diff --git a/arch/arm/boot/dts/nxp/imx/imx6q-bx50v3.dtsi b/arch/arm/boot/dts/nxp/imx/imx6q-bx50v3.dtsi
index 99f4f6a..c1ae7c4 100644
--- a/arch/arm/boot/dts/nxp/imx/imx6q-bx50v3.dtsi
+++ b/arch/arm/boot/dts/nxp/imx/imx6q-bx50v3.dtsi
@@ -245,6 +245,7 @@ pca9539: pca9539@74 {
reg = <0x74>;
gpio-controller;
#gpio-cells = <2>;
+ #interrupt-cells = <2>;
interrupt-controller;
interrupt-parent = <&gpio2>;
interrupts = <3 IRQ_TYPE_LEVEL_LOW>;
@@ -390,7 +391,6 @@ pci_root: root@0,0 {
#address-cells = <3>;
#size-cells = <2>;
- #interrupt-cells = <1>;
};
};
diff --git a/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi b/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
index 2ae93f5..ea40623 100644
--- a/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
+++ b/arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
@@ -626,7 +626,6 @@ stmpe811@41 {
blocks = <0x5>;
id = <0>;
interrupts = <10 IRQ_TYPE_LEVEL_LOW>;
- interrupt-controller;
interrupt-parent = <&gpio4>;
irq-trigger = <0x1>;
pinctrl-names = "default";
diff --git a/arch/arm/boot/dts/nxp/imx/imx6qdl-colibri.dtsi b/arch/arm/boot/dts/nxp/imx/imx6qdl-colibri.dtsi
index 55c90f6..d3a7a6e 100644
--- a/arch/arm/boot/dts/nxp/imx/imx6qdl-colibri.dtsi
+++ b/arch/arm/boot/dts/nxp/imx/imx6qdl-colibri.dtsi
@@ -550,7 +550,6 @@ stmpe811@41 {
blocks = <0x5>;
interrupts = <20 IRQ_TYPE_LEVEL_LOW>;
interrupt-parent = <&gpio6>;
- interrupt-controller;
id = <0>;
irq-trigger = <0x1>;
pinctrl-names = "default";
diff --git a/arch/arm/boot/dts/nxp/imx/imx6qdl-emcon.dtsi b/arch/arm/boot/dts/nxp/imx/imx6qdl-emcon.dtsi
index a63e73a..42b2ba2 100644
--- a/arch/arm/boot/dts/nxp/imx/imx6qdl-emcon.dtsi
+++ b/arch/arm/boot/dts/nxp/imx/imx6qdl-emcon.dtsi
@@ -225,7 +225,6 @@ da9063: pmic@58 {
pinctrl-0 = <&pinctrl_pmic>;
interrupt-parent = <&gpio2>;
interrupts = <8 IRQ_TYPE_LEVEL_LOW>;
- interrupt-controller;
onkey {
compatible = "dlg,da9063-onkey";
diff --git a/arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-pfla02.dtsi b/arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-pfla02.dtsi
index 1139745..c0c47ad 100644
--- a/arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-pfla02.dtsi
+++ b/arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-pfla02.dtsi
@@ -124,6 +124,7 @@ pmic@58 {
reg = <0x58>;
interrupt-parent = <&gpio2>;
interrupts = <9 IRQ_TYPE_LEVEL_LOW>; /* active-low GPIO2_9 */
+ #interrupt-cells = <2>;
interrupt-controller;
regulators {
diff --git a/arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-phycore-som.dtsi b/arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-phycore-som.dtsi
index 86b4269..85e278e 100644
--- a/arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-phycore-som.dtsi
+++ b/arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-phycore-som.dtsi
@@ -100,6 +100,7 @@ pmic: pmic@58 {
interrupt-parent = <&gpio1>;
interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
gpio-controller;
#gpio-cells = <2>;
diff --git a/arch/arm/boot/dts/nxp/imx/imx7d-pico-dwarf.dts b/arch/arm/boot/dts/nxp/imx/imx7d-pico-dwarf.dts
index 12361fc..1b96565 100644
--- a/arch/arm/boot/dts/nxp/imx/imx7d-pico-dwarf.dts
+++ b/arch/arm/boot/dts/nxp/imx/imx7d-pico-dwarf.dts
@@ -63,6 +63,7 @@ pca9554: io-expander@25 {
gpio-controller;
#gpio-cells = <2>;
#interrupt-cells = <2>;
+ interrupt-controller;
reg = <0x25>;
};
diff --git a/arch/arm/boot/dts/nxp/imx/imx7s.dtsi b/arch/arm/boot/dts/nxp/imx/imx7s.dtsi
index ebf7bef..9c81c6b 100644
--- a/arch/arm/boot/dts/nxp/imx/imx7s.dtsi
+++ b/arch/arm/boot/dts/nxp/imx/imx7s.dtsi
@@ -834,16 +834,6 @@ lcdif: lcdif@30730000 {
<&clks IMX7D_LCDIF_PIXEL_ROOT_CLK>;
clock-names = "pix", "axi";
status = "disabled";
-
- port {
- #address-cells = <1>;
- #size-cells = <0>;
-
- lcdif_out_mipi_dsi: endpoint@0 {
- reg = <0>;
- remote-endpoint = <&mipi_dsi_in_lcdif>;
- };
- };
};
mipi_csi: mipi-csi@30750000 {
@@ -895,22 +885,6 @@ mipi_dsi: dsi@30760000 {
samsung,esc-clock-frequency = <20000000>;
samsung,pll-clock-frequency = <24000000>;
status = "disabled";
-
- ports {
- #address-cells = <1>;
- #size-cells = <0>;
-
- port@0 {
- reg = <0>;
- #address-cells = <1>;
- #size-cells = <0>;
-
- mipi_dsi_in_lcdif: endpoint@0 {
- reg = <0>;
- remote-endpoint = <&lcdif_out_mipi_dsi>;
- };
- };
- };
};
};
diff --git a/arch/arm/boot/dts/nxp/vf/vf610-zii-dev-rev-b.dts b/arch/arm/boot/dts/nxp/vf/vf610-zii-dev-rev-b.dts
index b0ed68a..029f49b 100644
--- a/arch/arm/boot/dts/nxp/vf/vf610-zii-dev-rev-b.dts
+++ b/arch/arm/boot/dts/nxp/vf/vf610-zii-dev-rev-b.dts
@@ -338,6 +338,7 @@ gpio6: io-expander@22 {
reg = <0x22>;
gpio-controller;
#gpio-cells = <2>;
+ #interrupt-cells = <2>;
interrupt-controller;
interrupt-parent = <&gpio3>;
interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
diff --git a/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi b/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi
index 2045fc7..27429d0 100644
--- a/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi
+++ b/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi
@@ -340,10 +340,10 @@ pcie_rc: pcie@1c00000 {
"msi8";
#interrupt-cells = <1>;
interrupt-map-mask = <0 0 0 0x7>;
- interrupt-map = <0 0 0 1 &intc 0 0 0 141 IRQ_TYPE_LEVEL_HIGH>, /* int_a */
- <0 0 0 2 &intc 0 0 0 142 IRQ_TYPE_LEVEL_HIGH>, /* int_b */
- <0 0 0 3 &intc 0 0 0 143 IRQ_TYPE_LEVEL_HIGH>, /* int_c */
- <0 0 0 4 &intc 0 0 0 144 IRQ_TYPE_LEVEL_HIGH>; /* int_d */
+ interrupt-map = <0 0 0 1 &intc 0 141 IRQ_TYPE_LEVEL_HIGH>, /* int_a */
+ <0 0 0 2 &intc 0 142 IRQ_TYPE_LEVEL_HIGH>, /* int_b */
+ <0 0 0 3 &intc 0 143 IRQ_TYPE_LEVEL_HIGH>, /* int_c */
+ <0 0 0 4 &intc 0 144 IRQ_TYPE_LEVEL_HIGH>; /* int_d */
clocks = <&gcc GCC_PCIE_PIPE_CLK>,
<&gcc GCC_PCIE_AUX_CLK>,
diff --git a/arch/arm/boot/dts/renesas/r8a7790-lager.dts b/arch/arm/boot/dts/renesas/r8a7790-lager.dts
index 2fba4d0..8590981 100644
--- a/arch/arm/boot/dts/renesas/r8a7790-lager.dts
+++ b/arch/arm/boot/dts/renesas/r8a7790-lager.dts
@@ -447,6 +447,7 @@ pmic@58 {
interrupt-parent = <&irqc0>;
interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
rtc {
compatible = "dlg,da9063-rtc";
diff --git a/arch/arm/boot/dts/renesas/r8a7790-stout.dts b/arch/arm/boot/dts/renesas/r8a7790-stout.dts
index f9bc5b4..683f739 100644
--- a/arch/arm/boot/dts/renesas/r8a7790-stout.dts
+++ b/arch/arm/boot/dts/renesas/r8a7790-stout.dts
@@ -347,6 +347,7 @@ pmic@58 {
interrupt-parent = <&irqc0>;
interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
onkey {
compatible = "dlg,da9063-onkey";
diff --git a/arch/arm/boot/dts/renesas/r8a7791-koelsch.dts b/arch/arm/boot/dts/renesas/r8a7791-koelsch.dts
index e9c13bb..0efd9f9 100644
--- a/arch/arm/boot/dts/renesas/r8a7791-koelsch.dts
+++ b/arch/arm/boot/dts/renesas/r8a7791-koelsch.dts
@@ -819,6 +819,7 @@ pmic@58 {
interrupt-parent = <&irqc0>;
interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
rtc {
compatible = "dlg,da9063-rtc";
diff --git a/arch/arm/boot/dts/renesas/r8a7791-porter.dts b/arch/arm/boot/dts/renesas/r8a7791-porter.dts
index 7e8bc06..93c86e9 100644
--- a/arch/arm/boot/dts/renesas/r8a7791-porter.dts
+++ b/arch/arm/boot/dts/renesas/r8a7791-porter.dts
@@ -413,6 +413,7 @@ pmic@5a {
interrupt-parent = <&irqc0>;
interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
watchdog {
compatible = "dlg,da9063-watchdog";
diff --git a/arch/arm/boot/dts/renesas/r8a7792-blanche.dts b/arch/arm/boot/dts/renesas/r8a7792-blanche.dts
index 4f9838cf..540a9ad 100644
--- a/arch/arm/boot/dts/renesas/r8a7792-blanche.dts
+++ b/arch/arm/boot/dts/renesas/r8a7792-blanche.dts
@@ -381,6 +381,7 @@ pmic@58 {
interrupt-parent = <&irqc>;
interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
rtc {
compatible = "dlg,da9063-rtc";
diff --git a/arch/arm/boot/dts/renesas/r8a7793-gose.dts b/arch/arm/boot/dts/renesas/r8a7793-gose.dts
index 1744fdb..1ea6c75 100644
--- a/arch/arm/boot/dts/renesas/r8a7793-gose.dts
+++ b/arch/arm/boot/dts/renesas/r8a7793-gose.dts
@@ -759,6 +759,7 @@ pmic@58 {
interrupt-parent = <&irqc0>;
interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
rtc {
compatible = "dlg,da9063-rtc";
diff --git a/arch/arm/boot/dts/renesas/r8a7794-alt.dts b/arch/arm/boot/dts/renesas/r8a7794-alt.dts
index c0d067d..b5ecafb 100644
--- a/arch/arm/boot/dts/renesas/r8a7794-alt.dts
+++ b/arch/arm/boot/dts/renesas/r8a7794-alt.dts
@@ -453,6 +453,7 @@ pmic@58 {
interrupt-parent = <&gpio3>;
interrupts = <31 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
rtc {
compatible = "dlg,da9063-rtc";
diff --git a/arch/arm/boot/dts/renesas/r8a7794-silk.dts b/arch/arm/boot/dts/renesas/r8a7794-silk.dts
index 43d480a..595e074 100644
--- a/arch/arm/boot/dts/renesas/r8a7794-silk.dts
+++ b/arch/arm/boot/dts/renesas/r8a7794-silk.dts
@@ -439,6 +439,7 @@ pmic@58 {
interrupt-parent = <&gpio3>;
interrupts = <31 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
onkey {
compatible = "dlg,da9063-onkey";
diff --git a/arch/arm/boot/dts/rockchip/rv1108.dtsi b/arch/arm/boot/dts/rockchip/rv1108.dtsi
index abf3006..f3291f3 100644
--- a/arch/arm/boot/dts/rockchip/rv1108.dtsi
+++ b/arch/arm/boot/dts/rockchip/rv1108.dtsi
@@ -196,7 +196,6 @@ spi: spi@10270000 {
pwm4: pwm@10280000 {
compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
reg = <0x10280000 0x10>;
- interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
clock-names = "pwm", "pclk";
pinctrl-names = "default";
@@ -208,7 +207,6 @@ pwm4: pwm@10280000 {
pwm5: pwm@10280010 {
compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
reg = <0x10280010 0x10>;
- interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
clock-names = "pwm", "pclk";
pinctrl-names = "default";
@@ -220,7 +218,6 @@ pwm5: pwm@10280010 {
pwm6: pwm@10280020 {
compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
reg = <0x10280020 0x10>;
- interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
clock-names = "pwm", "pclk";
pinctrl-names = "default";
@@ -232,7 +229,6 @@ pwm6: pwm@10280020 {
pwm7: pwm@10280030 {
compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
reg = <0x10280030 0x10>;
- interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
clock-names = "pwm", "pclk";
pinctrl-names = "default";
@@ -386,7 +382,6 @@ i2c0: i2c@20000000 {
pwm0: pwm@20040000 {
compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
reg = <0x20040000 0x10>;
- interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_PWM0_PMU>, <&cru PCLK_PWM0_PMU>;
clock-names = "pwm", "pclk";
pinctrl-names = "default";
@@ -398,7 +393,6 @@ pwm0: pwm@20040000 {
pwm1: pwm@20040010 {
compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
reg = <0x20040010 0x10>;
- interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_PWM0_PMU>, <&cru PCLK_PWM0_PMU>;
clock-names = "pwm", "pclk";
pinctrl-names = "default";
@@ -410,7 +404,6 @@ pwm1: pwm@20040010 {
pwm2: pwm@20040020 {
compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
reg = <0x20040020 0x10>;
- interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_PWM0_PMU>, <&cru PCLK_PWM0_PMU>;
clock-names = "pwm", "pclk";
pinctrl-names = "default";
@@ -422,7 +415,6 @@ pwm2: pwm@20040020 {
pwm3: pwm@20040030 {
compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
reg = <0x20040030 0x10>;
- interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_PWM0_PMU>, <&cru PCLK_PWM0_PMU>;
clock-names = "pwm", "pclk";
pinctrl-names = "default";
diff --git a/arch/arm/boot/dts/st/stm32429i-eval.dts b/arch/arm/boot/dts/st/stm32429i-eval.dts
index 576235e..afa417b 100644
--- a/arch/arm/boot/dts/st/stm32429i-eval.dts
+++ b/arch/arm/boot/dts/st/stm32429i-eval.dts
@@ -222,7 +222,6 @@ stmpe1600: stmpe1600@42 {
reg = <0x42>;
interrupts = <8 3>;
interrupt-parent = <&gpioi>;
- interrupt-controller;
wakeup-source;
stmpegpio: stmpe_gpio {
diff --git a/arch/arm/boot/dts/st/stm32mp157c-dk2.dts b/arch/arm/boot/dts/st/stm32mp157c-dk2.dts
index 510cca5a..7a701f7 100644
--- a/arch/arm/boot/dts/st/stm32mp157c-dk2.dts
+++ b/arch/arm/boot/dts/st/stm32mp157c-dk2.dts
@@ -64,7 +64,6 @@ touchscreen@38 {
reg = <0x38>;
interrupts = <2 2>;
interrupt-parent = <&gpiof>;
- interrupt-controller;
touchscreen-size-x = <480>;
touchscreen-size-y = <800>;
status = "okay";
diff --git a/arch/arm/boot/dts/ti/omap/am5729-beagleboneai.dts b/arch/arm/boot/dts/ti/omap/am5729-beagleboneai.dts
index c8e5564..3e834fc 100644
--- a/arch/arm/boot/dts/ti/omap/am5729-beagleboneai.dts
+++ b/arch/arm/boot/dts/ti/omap/am5729-beagleboneai.dts
@@ -415,7 +415,6 @@ stmpe811@41 {
reg = <0x41>;
interrupts = <30 IRQ_TYPE_LEVEL_LOW>;
interrupt-parent = <&gpio2>;
- interrupt-controller;
id = <0>;
blocks = <0x5>;
irq-trigger = <0x1>;
diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig
index 0a90583..8f9dbe8 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -297,6 +297,7 @@
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_L4F00242T03=y
CONFIG_LCD_PLATFORM=y
+CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_PWM=y
CONFIG_BACKLIGHT_GPIO=y
CONFIG_FRAMEBUFFER_CONSOLE=y
diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h
index e12d7d0..e4eb54f 100644
--- a/arch/arm/include/asm/jump_label.h
+++ b/arch/arm/include/asm/jump_label.h
@@ -11,7 +11,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
WASM(nop) "\n\t"
".pushsection __jump_table, \"aw\"\n\t"
".word 1b, %l[l_yes], %c0\n\t"
@@ -25,7 +25,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
WASM(b) " %l[l_yes]\n\t"
".pushsection __jump_table, \"aw\"\n\t"
".word 1b, %l[l_yes], %c0\n\t"
diff --git a/arch/arm/mach-ep93xx/core.c b/arch/arm/mach-ep93xx/core.c
index 71b1139..8b1ec60 100644
--- a/arch/arm/mach-ep93xx/core.c
+++ b/arch/arm/mach-ep93xx/core.c
@@ -339,6 +339,7 @@ static struct gpiod_lookup_table ep93xx_i2c_gpiod_table = {
GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN),
GPIO_LOOKUP_IDX("G", 0, NULL, 1,
GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN),
+ { }
},
};
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index e96fb40..07565b5 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -298,6 +298,8 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
goto done;
}
count_vm_vma_lock_event(VMA_LOCK_RETRY);
+ if (fault & VM_FAULT_MAJOR)
+ flags |= FAULT_FLAG_TRIED;
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
diff --git a/arch/arm64/boot/dts/allwinner/Makefile b/arch/arm64/boot/dts/allwinner/Makefile
index 91d505b..1f1f8d8 100644
--- a/arch/arm64/boot/dts/allwinner/Makefile
+++ b/arch/arm64/boot/dts/allwinner/Makefile
@@ -42,5 +42,6 @@
dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-bigtreetech-pi.dtb
dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-orangepi-zero2.dtb
dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-x96-mate.dtb
+dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h618-orangepi-zero2w.dtb
dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h618-orangepi-zero3.dtb
dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h618-transpeed-8k618-t.dtb
diff --git a/arch/arm64/boot/dts/amazon/alpine-v2.dtsi b/arch/arm64/boot/dts/amazon/alpine-v2.dtsi
index dccbba6..dbf2dce 100644
--- a/arch/arm64/boot/dts/amazon/alpine-v2.dtsi
+++ b/arch/arm64/boot/dts/amazon/alpine-v2.dtsi
@@ -145,7 +145,6 @@ pci@fbc00000 {
msix: msix@fbe00000 {
compatible = "al,alpine-msix";
reg = <0x0 0xfbe00000 0x0 0x100000>;
- interrupt-controller;
msi-controller;
al,msi-base-spi = <160>;
al,msi-num-spis = <160>;
diff --git a/arch/arm64/boot/dts/amazon/alpine-v3.dtsi b/arch/arm64/boot/dts/amazon/alpine-v3.dtsi
index 39481d7..3ea178a 100644
--- a/arch/arm64/boot/dts/amazon/alpine-v3.dtsi
+++ b/arch/arm64/boot/dts/amazon/alpine-v3.dtsi
@@ -355,7 +355,6 @@ pcie@fbd00000 {
msix: msix@fbe00000 {
compatible = "al,alpine-msix";
reg = <0x0 0xfbe00000 0x0 0x100000>;
- interrupt-controller;
msi-controller;
al,msi-base-spi = <336>;
al,msi-num-spis = <959>;
diff --git a/arch/arm64/boot/dts/broadcom/northstar2/ns2.dtsi b/arch/arm64/boot/dts/broadcom/northstar2/ns2.dtsi
index 9dcd25e..896d1f3 100644
--- a/arch/arm64/boot/dts/broadcom/northstar2/ns2.dtsi
+++ b/arch/arm64/boot/dts/broadcom/northstar2/ns2.dtsi
@@ -586,6 +586,7 @@ gpio_g: gpio@660a0000 {
#gpio-cells = <2>;
gpio-controller;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupts = <GIC_SPI 400 IRQ_TYPE_LEVEL_HIGH>;
};
diff --git a/arch/arm64/boot/dts/broadcom/stingray/stingray.dtsi b/arch/arm64/boot/dts/broadcom/stingray/stingray.dtsi
index f049687..d8516ec 100644
--- a/arch/arm64/boot/dts/broadcom/stingray/stingray.dtsi
+++ b/arch/arm64/boot/dts/broadcom/stingray/stingray.dtsi
@@ -450,6 +450,7 @@ gpio_hsls: gpio@d0000 {
#gpio-cells = <2>;
gpio-controller;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupts = <GIC_SPI 183 IRQ_TYPE_LEVEL_HIGH>;
gpio-ranges = <&pinmux 0 0 16>,
<&pinmux 16 71 2>,
diff --git a/arch/arm64/boot/dts/freescale/Makefile b/arch/arm64/boot/dts/freescale/Makefile
index 2e02767..2cb0212 100644
--- a/arch/arm64/boot/dts/freescale/Makefile
+++ b/arch/arm64/boot/dts/freescale/Makefile
@@ -20,23 +20,41 @@
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1046a-qds.dtb
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1046a-rdb.dtb
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1046a-tqmls1046a-mbls10xxa.dtb
+DTC_FLAGS_fsl-ls1088a-qds := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1088a-qds.dtb
+DTC_FLAGS_fsl-ls1088a-rdb := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1088a-rdb.dtb
+DTC_FLAGS_fsl-ls1088a-ten64 := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1088a-ten64.dtb
+DTC_FLAGS_fsl-ls1088a-tqmls1088a-mbls10xxa := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1088a-tqmls1088a-mbls10xxa.dtb
+DTC_FLAGS_fsl-ls2080a-qds := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-qds.dtb
+DTC_FLAGS_fsl-ls2080a-rdb := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb
+DTC_FLAGS_fsl-ls2081a-rdb := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2081a-rdb.dtb
+DTC_FLAGS_fsl-ls2080a-simu := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
+DTC_FLAGS_fsl-ls2088a-qds := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
+DTC_FLAGS_fsl-ls2088a-rdb := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
+DTC_FLAGS_fsl-lx2160a-bluebox3 := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-bluebox3.dtb
+DTC_FLAGS_fsl-lx2160a-bluebox3-rev-a := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-bluebox3-rev-a.dtb
+DTC_FLAGS_fsl-lx2160a-clearfog-cx := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-clearfog-cx.dtb
+DTC_FLAGS_fsl-lx2160a-honeycomb := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-honeycomb.dtb
+DTC_FLAGS_fsl-lx2160a-qds := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-qds.dtb
+DTC_FLAGS_fsl-lx2160a-rdb := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
+DTC_FLAGS_fsl-lx2162a-clearfog := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2162a-clearfog.dtb
+DTC_FLAGS_fsl-lx2162a-qds := -Wno-interrupt_map
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2162a-qds.dtb
fsl-ls1028a-qds-13bb-dtbs := fsl-ls1028a-qds.dtb fsl-ls1028a-qds-13bb.dtbo
@@ -53,6 +71,7 @@
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1028a-qds-899b.dtb
dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1028a-qds-9999.dtb
+DTC_FLAGS_fsl-lx2160a-tqmlx2160a-mblx2160a := -Wno-interrupt_map
fsl-lx2160a-tqmlx2160a-mblx2160a-12-11-x-dtbs := fsl-lx2160a-tqmlx2160a-mblx2160a.dtb \
fsl-lx2160a-tqmlx2160a-mblx2160a_12_x_x.dtbo \
fsl-lx2160a-tqmlx2160a-mblx2160a_x_11_x.dtbo
diff --git a/arch/arm64/boot/dts/freescale/imx8mn-var-som-symphony.dts b/arch/arm64/boot/dts/freescale/imx8mn-var-som-symphony.dts
index f38ee22..a6b94d1 100644
--- a/arch/arm64/boot/dts/freescale/imx8mn-var-som-symphony.dts
+++ b/arch/arm64/boot/dts/freescale/imx8mn-var-som-symphony.dts
@@ -128,14 +128,9 @@ extcon_usbotg1: typec@3d {
pinctrl-0 = <&pinctrl_ptn5150>;
status = "okay";
- connector {
- compatible = "usb-c-connector";
- label = "USB-C";
-
- port {
- typec1_dr_sw: endpoint {
- remote-endpoint = <&usb1_drd_sw>;
- };
+ port {
+ typec1_dr_sw: endpoint {
+ remote-endpoint = <&usb1_drd_sw>;
};
};
};
diff --git a/arch/arm64/boot/dts/freescale/imx8mp-data-modul-edm-sbc.dts b/arch/arm64/boot/dts/freescale/imx8mp-data-modul-edm-sbc.dts
index d98a040..5828c9d 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp-data-modul-edm-sbc.dts
+++ b/arch/arm64/boot/dts/freescale/imx8mp-data-modul-edm-sbc.dts
@@ -486,7 +486,7 @@ &uart3 { /* A53 Debug */
&uart4 {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_uart4>;
- status = "okay";
+ status = "disabled";
};
&usb3_phy0 {
diff --git a/arch/arm64/boot/dts/freescale/imx8mp-dhcom-pdk3.dts b/arch/arm64/boot/dts/freescale/imx8mp-dhcom-pdk3.dts
index fea67a9..b749e28 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp-dhcom-pdk3.dts
+++ b/arch/arm64/boot/dts/freescale/imx8mp-dhcom-pdk3.dts
@@ -175,14 +175,10 @@ typec@3d {
pinctrl-names = "default";
pinctrl-0 = <&pinctrl_ptn5150>;
- connector {
- compatible = "usb-c-connector";
- label = "USB-C";
+ port {
- port {
- ptn5150_out_ep: endpoint {
- remote-endpoint = <&dwc3_0_ep>;
- };
+ ptn5150_out_ep: endpoint {
+ remote-endpoint = <&dwc3_0_ep>;
};
};
};
diff --git a/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi b/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi
index 4ae4fda..43f1d45c 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi
@@ -255,7 +255,7 @@ tc_bridge: bridge@f {
<&clk IMX8MP_AUDIO_PLL2_OUT>;
assigned-clock-parents = <&clk IMX8MP_AUDIO_PLL2_OUT>;
assigned-clock-rates = <13000000>, <13000000>, <156000000>;
- reset-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
+ reset-gpios = <&gpio4 1 GPIO_ACTIVE_HIGH>;
status = "disabled";
ports {
diff --git a/arch/arm64/boot/dts/freescale/imx8mp-tqma8mpql-mba8mpxl.dts b/arch/arm64/boot/dts/freescale/imx8mp-tqma8mpql-mba8mpxl.dts
index a2d5d19..86d3da3 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp-tqma8mpql-mba8mpxl.dts
+++ b/arch/arm64/boot/dts/freescale/imx8mp-tqma8mpql-mba8mpxl.dts
@@ -184,6 +184,13 @@ reg_vcc_12v0: regulator-12v0 {
enable-active-high;
};
+ reg_vcc_1v8: regulator-1v8 {
+ compatible = "regulator-fixed";
+ regulator-name = "VCC_1V8";
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <1800000>;
+ };
+
reg_vcc_3v3: regulator-3v3 {
compatible = "regulator-fixed";
regulator-name = "VCC_3V3";
@@ -480,7 +487,7 @@ tlv320aic3x04: audio-codec@18 {
clock-names = "mclk";
clocks = <&audio_blk_ctrl IMX8MP_CLK_AUDIOMIX_SAI3_MCLK1>;
reset-gpios = <&gpio4 29 GPIO_ACTIVE_LOW>;
- iov-supply = <®_vcc_3v3>;
+ iov-supply = <®_vcc_1v8>;
ldoin-supply = <®_vcc_3v3>;
};
diff --git a/arch/arm64/boot/dts/freescale/imx8mp.dtsi b/arch/arm64/boot/dts/freescale/imx8mp.dtsi
index 76c73da..39a550c1 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mp.dtsi
@@ -1820,7 +1820,7 @@ lvds_bridge: bridge@5c {
compatible = "fsl,imx8mp-ldb";
reg = <0x5c 0x4>, <0x128 0x4>;
reg-names = "ldb", "lvds";
- clocks = <&clk IMX8MP_CLK_MEDIA_LDB>;
+ clocks = <&clk IMX8MP_CLK_MEDIA_LDB_ROOT>;
clock-names = "ldb";
assigned-clocks = <&clk IMX8MP_CLK_MEDIA_LDB>;
assigned-clock-parents = <&clk IMX8MP_VIDEO_PLL1_OUT>;
diff --git a/arch/arm64/boot/dts/lg/lg1312.dtsi b/arch/arm64/boot/dts/lg/lg1312.dtsi
index 48ec4eb..b864ffa 100644
--- a/arch/arm64/boot/dts/lg/lg1312.dtsi
+++ b/arch/arm64/boot/dts/lg/lg1312.dtsi
@@ -126,7 +126,6 @@ eth0: ethernet@c1b00000 {
amba {
#address-cells = <2>;
#size-cells = <1>;
- #interrupt-cells = <3>;
compatible = "simple-bus";
interrupt-parent = <&gic>;
diff --git a/arch/arm64/boot/dts/lg/lg1313.dtsi b/arch/arm64/boot/dts/lg/lg1313.dtsi
index 3869460..996fb39 100644
--- a/arch/arm64/boot/dts/lg/lg1313.dtsi
+++ b/arch/arm64/boot/dts/lg/lg1313.dtsi
@@ -126,7 +126,6 @@ eth0: ethernet@c3700000 {
amba {
#address-cells = <2>;
#size-cells = <1>;
- #interrupt-cells = <3>;
compatible = "simple-bus";
interrupt-parent = <&gic>;
diff --git a/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi b/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
index 2c920e2..7ec7c78 100644
--- a/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
@@ -138,7 +138,6 @@ pmu {
odmi: odmi@300000 {
compatible = "marvell,odmi-controller";
- interrupt-controller;
msi-controller;
marvell,odmi-frames = <4>;
reg = <0x300000 0x4000>,
diff --git a/arch/arm64/boot/dts/mediatek/mt8195-demo.dts b/arch/arm64/boot/dts/mediatek/mt8195-demo.dts
index 69c7f39..4127cb8 100644
--- a/arch/arm64/boot/dts/mediatek/mt8195-demo.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8195-demo.dts
@@ -128,6 +128,7 @@ mt6360: pmic@34 {
compatible = "mediatek,mt6360";
reg = <0x34>;
interrupt-controller;
+ #interrupt-cells = <1>;
interrupts-extended = <&pio 101 IRQ_TYPE_EDGE_FALLING>;
interrupt-names = "IRQB";
diff --git a/arch/arm64/boot/dts/nvidia/tegra234-p3737-0000+p3701-0000.dts b/arch/arm64/boot/dts/nvidia/tegra234-p3737-0000+p3701-0000.dts
index ea13c4a..81a8293 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234-p3737-0000+p3701-0000.dts
+++ b/arch/arm64/boot/dts/nvidia/tegra234-p3737-0000+p3701-0000.dts
@@ -175,7 +175,7 @@ ethernet@6800000 {
status = "okay";
phy-handle = <&mgbe0_phy>;
- phy-mode = "usxgmii";
+ phy-mode = "10gbase-r";
mdio {
#address-cells = <1>;
diff --git a/arch/arm64/boot/dts/nvidia/tegra234.dtsi b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
index 3f16595..d1bd328 100644
--- a/arch/arm64/boot/dts/nvidia/tegra234.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra234.dtsi
@@ -1459,7 +1459,7 @@ ethernet@6800000 {
<&mc TEGRA234_MEMORY_CLIENT_MGBEAWR &emc>;
interconnect-names = "dma-mem", "write";
iommus = <&smmu_niso0 TEGRA234_SID_MGBE>;
- power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEA>;
+ power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEB>;
status = "disabled";
};
@@ -1493,7 +1493,7 @@ ethernet@6900000 {
<&mc TEGRA234_MEMORY_CLIENT_MGBEBWR &emc>;
interconnect-names = "dma-mem", "write";
iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
- power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEB>;
+ power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEC>;
status = "disabled";
};
@@ -1527,7 +1527,7 @@ ethernet@6a00000 {
<&mc TEGRA234_MEMORY_CLIENT_MGBECWR &emc>;
interconnect-names = "dma-mem", "write";
iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF2>;
- power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEC>;
+ power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBED>;
status = "disabled";
};
diff --git a/arch/arm64/boot/dts/qcom/ipq6018.dtsi b/arch/arm64/boot/dts/qcom/ipq6018.dtsi
index 5e1277f..61c8fd4 100644
--- a/arch/arm64/boot/dts/qcom/ipq6018.dtsi
+++ b/arch/arm64/boot/dts/qcom/ipq6018.dtsi
@@ -830,10 +830,10 @@ pcie0: pcie@20000000 {
#interrupt-cells = <1>;
interrupt-map-mask = <0 0 0 0x7>;
- interrupt-map = <0 0 0 1 &intc 0 75 IRQ_TYPE_LEVEL_HIGH>, /* int_a */
- <0 0 0 2 &intc 0 78 IRQ_TYPE_LEVEL_HIGH>, /* int_b */
- <0 0 0 3 &intc 0 79 IRQ_TYPE_LEVEL_HIGH>, /* int_c */
- <0 0 0 4 &intc 0 83 IRQ_TYPE_LEVEL_HIGH>; /* int_d */
+ interrupt-map = <0 0 0 1 &intc 0 0 0 75 IRQ_TYPE_LEVEL_HIGH>, /* int_a */
+ <0 0 0 2 &intc 0 0 0 78 IRQ_TYPE_LEVEL_HIGH>, /* int_b */
+ <0 0 0 3 &intc 0 0 0 79 IRQ_TYPE_LEVEL_HIGH>, /* int_c */
+ <0 0 0 4 &intc 0 0 0 83 IRQ_TYPE_LEVEL_HIGH>; /* int_d */
clocks = <&gcc GCC_SYS_NOC_PCIE0_AXI_CLK>,
<&gcc GCC_PCIE0_AXI_M_CLK>,
diff --git a/arch/arm64/boot/dts/qcom/ipq8074.dtsi b/arch/arm64/boot/dts/qcom/ipq8074.dtsi
index cf295be..2644144 100644
--- a/arch/arm64/boot/dts/qcom/ipq8074.dtsi
+++ b/arch/arm64/boot/dts/qcom/ipq8074.dtsi
@@ -814,13 +814,13 @@ pcie1: pcie@10000000 {
interrupt-names = "msi";
#interrupt-cells = <1>;
interrupt-map-mask = <0 0 0 0x7>;
- interrupt-map = <0 0 0 1 &intc 0 142
+ interrupt-map = <0 0 0 1 &intc 0 0 142
IRQ_TYPE_LEVEL_HIGH>, /* int_a */
- <0 0 0 2 &intc 0 143
+ <0 0 0 2 &intc 0 0 143
IRQ_TYPE_LEVEL_HIGH>, /* int_b */
- <0 0 0 3 &intc 0 144
+ <0 0 0 3 &intc 0 0 144
IRQ_TYPE_LEVEL_HIGH>, /* int_c */
- <0 0 0 4 &intc 0 145
+ <0 0 0 4 &intc 0 0 145
IRQ_TYPE_LEVEL_HIGH>; /* int_d */
clocks = <&gcc GCC_SYS_NOC_PCIE1_AXI_CLK>,
@@ -876,13 +876,13 @@ pcie0: pcie@20000000 {
interrupt-names = "msi";
#interrupt-cells = <1>;
interrupt-map-mask = <0 0 0 0x7>;
- interrupt-map = <0 0 0 1 &intc 0 75
+ interrupt-map = <0 0 0 1 &intc 0 0 75
IRQ_TYPE_LEVEL_HIGH>, /* int_a */
- <0 0 0 2 &intc 0 78
+ <0 0 0 2 &intc 0 0 78
IRQ_TYPE_LEVEL_HIGH>, /* int_b */
- <0 0 0 3 &intc 0 79
+ <0 0 0 3 &intc 0 0 79
IRQ_TYPE_LEVEL_HIGH>, /* int_c */
- <0 0 0 4 &intc 0 83
+ <0 0 0 4 &intc 0 0 83
IRQ_TYPE_LEVEL_HIGH>; /* int_d */
clocks = <&gcc GCC_SYS_NOC_PCIE0_AXI_CLK>,
diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index 8d41ed2..ee6f87c 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -457,25 +457,6 @@ modem_etm_out_funnel_in2: endpoint {
};
};
- mpm: interrupt-controller {
- compatible = "qcom,mpm";
- qcom,rpm-msg-ram = <&apss_mpm>;
- interrupts = <GIC_SPI 171 IRQ_TYPE_EDGE_RISING>;
- mboxes = <&apcs_glb 1>;
- interrupt-controller;
- #interrupt-cells = <2>;
- #power-domain-cells = <0>;
- interrupt-parent = <&intc>;
- qcom,mpm-pin-count = <96>;
- qcom,mpm-pin-map = <2 184>, /* TSENS1 upper_lower_int */
- <52 243>, /* DWC3_PRI ss_phy_irq */
- <79 347>, /* DWC3_PRI hs_phy_irq */
- <80 352>, /* DWC3_SEC hs_phy_irq */
- <81 347>, /* QUSB2_PHY_PRI DP+DM */
- <82 352>, /* QUSB2_PHY_SEC DP+DM */
- <87 326>; /* SPMI */
- };
-
psci {
compatible = "arm,psci-1.0";
method = "smc";
@@ -765,15 +746,8 @@ pciephy_2: phy@3000 {
};
rpm_msg_ram: sram@68000 {
- compatible = "qcom,rpm-msg-ram", "mmio-sram";
+ compatible = "qcom,rpm-msg-ram";
reg = <0x00068000 0x6000>;
- #address-cells = <1>;
- #size-cells = <1>;
- ranges = <0 0x00068000 0x7000>;
-
- apss_mpm: sram@1b8 {
- reg = <0x1b8 0x48>;
- };
};
qfprom@74000 {
@@ -856,8 +830,8 @@ tsens1: thermal-sensor@4ad000 {
reg = <0x004ad000 0x1000>, /* TM */
<0x004ac000 0x1000>; /* SROT */
#qcom,sensors = <8>;
- interrupts-extended = <&mpm 2 IRQ_TYPE_LEVEL_HIGH>,
- <&intc GIC_SPI 430 IRQ_TYPE_LEVEL_HIGH>;
+ interrupts = <GIC_SPI 184 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 430 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "uplow", "critical";
#thermal-sensor-cells = <1>;
};
@@ -1363,7 +1337,6 @@ tlmm: pinctrl@1010000 {
interrupts = <GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>;
gpio-controller;
gpio-ranges = <&tlmm 0 0 150>;
- wakeup-parent = <&mpm>;
#gpio-cells = <2>;
interrupt-controller;
#interrupt-cells = <2>;
@@ -1891,7 +1864,7 @@ spmi_bus: spmi@400f000 {
<0x0400a000 0x002100>;
reg-names = "core", "chnls", "obsrvr", "intr", "cnfg";
interrupt-names = "periph_irq";
- interrupts-extended = <&mpm 87 IRQ_TYPE_LEVEL_HIGH>;
+ interrupts = <GIC_SPI 326 IRQ_TYPE_LEVEL_HIGH>;
qcom,ee = <0>;
qcom,channel = <0>;
#address-cells = <2>;
@@ -3052,8 +3025,8 @@ usb3: usb@6af8800 {
#size-cells = <1>;
ranges;
- interrupts-extended = <&mpm 79 IRQ_TYPE_LEVEL_HIGH>,
- <&mpm 52 IRQ_TYPE_LEVEL_HIGH>;
+ interrupts = <GIC_SPI 347 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 243 IRQ_TYPE_LEVEL_HIGH>;
interrupt-names = "hs_phy_irq", "ss_phy_irq";
clocks = <&gcc GCC_SYS_NOC_USB3_AXI_CLK>,
diff --git a/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts b/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts
index ffc4406..4121556 100644
--- a/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts
+++ b/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts
@@ -563,6 +563,8 @@ &pcie3a_phy {
};
&pcie4 {
+ max-link-speed = <2>;
+
perst-gpios = <&tlmm 141 GPIO_ACTIVE_LOW>;
wake-gpios = <&tlmm 139 GPIO_ACTIVE_LOW>;
diff --git a/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts b/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
index def3976..eb657e5 100644
--- a/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
+++ b/arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
@@ -722,6 +722,8 @@ &pcie3a_phy {
};
&pcie4 {
+ max-link-speed = <2>;
+
perst-gpios = <&tlmm 141 GPIO_ACTIVE_LOW>;
wake-gpios = <&tlmm 139 GPIO_ACTIVE_LOW>;
diff --git a/arch/arm64/boot/dts/qcom/sm6115.dtsi b/arch/arm64/boot/dts/qcom/sm6115.dtsi
index 160e098f..f9849b8 100644
--- a/arch/arm64/boot/dts/qcom/sm6115.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm6115.dtsi
@@ -1304,6 +1304,9 @@ &clk_virt SLAVE_QUP_CORE_0 RPM_ALWAYS_TAG>,
&config_noc SLAVE_QUP_0 RPM_ALWAYS_TAG>,
<&system_noc MASTER_QUP_0 RPM_ALWAYS_TAG
&bimc SLAVE_EBI_CH0 RPM_ALWAYS_TAG>;
+ interconnect-names = "qup-core",
+ "qup-config",
+ "qup-memory";
#address-cells = <1>;
#size-cells = <0>;
status = "disabled";
diff --git a/arch/arm64/boot/dts/qcom/sm8650-mtp.dts b/arch/arm64/boot/dts/qcom/sm8650-mtp.dts
index 9d916ed..be133a3 100644
--- a/arch/arm64/boot/dts/qcom/sm8650-mtp.dts
+++ b/arch/arm64/boot/dts/qcom/sm8650-mtp.dts
@@ -622,7 +622,7 @@ right_spkr: speaker@0,1 {
&tlmm {
/* Reserved I/Os for NFC */
- gpio-reserved-ranges = <32 8>;
+ gpio-reserved-ranges = <32 8>, <74 1>;
disp0_reset_n_active: disp0-reset-n-active-state {
pins = "gpio133";
diff --git a/arch/arm64/boot/dts/qcom/sm8650-qrd.dts b/arch/arm64/boot/dts/qcom/sm8650-qrd.dts
index 592a67a..b9151c2 100644
--- a/arch/arm64/boot/dts/qcom/sm8650-qrd.dts
+++ b/arch/arm64/boot/dts/qcom/sm8650-qrd.dts
@@ -659,7 +659,7 @@ touchscreen@0 {
&tlmm {
/* Reserved I/Os for NFC */
- gpio-reserved-ranges = <32 8>;
+ gpio-reserved-ranges = <32 8>, <74 1>;
bt_default: bt-default-state {
bt-en-pins {
diff --git a/arch/arm64/boot/dts/renesas/ulcb-kf.dtsi b/arch/arm64/boot/dts/renesas/ulcb-kf.dtsi
index 3885ef3..50de17e 100644
--- a/arch/arm64/boot/dts/renesas/ulcb-kf.dtsi
+++ b/arch/arm64/boot/dts/renesas/ulcb-kf.dtsi
@@ -234,6 +234,7 @@ gpio_exp_74: gpio@74 {
gpio-controller;
#gpio-cells = <2>;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupt-parent = <&gpio6>;
interrupts = <8 IRQ_TYPE_EDGE_FALLING>;
@@ -294,6 +295,7 @@ gpio_exp_75: gpio@75 {
gpio-controller;
#gpio-cells = <2>;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupt-parent = <&gpio6>;
interrupts = <4 IRQ_TYPE_EDGE_FALLING>;
};
@@ -314,6 +316,7 @@ gpio_exp_76: gpio@76 {
gpio-controller;
#gpio-cells = <2>;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupt-parent = <&gpio7>;
interrupts = <3 IRQ_TYPE_EDGE_FALLING>;
};
@@ -324,6 +327,7 @@ gpio_exp_77: gpio@77 {
gpio-controller;
#gpio-cells = <2>;
interrupt-controller;
+ #interrupt-cells = <2>;
interrupt-parent = <&gpio5>;
interrupts = <9 IRQ_TYPE_EDGE_FALLING>;
};
diff --git a/arch/arm64/boot/dts/rockchip/px30.dtsi b/arch/arm64/boot/dts/rockchip/px30.dtsi
index d090551..9137dd7 100644
--- a/arch/arm64/boot/dts/rockchip/px30.dtsi
+++ b/arch/arm64/boot/dts/rockchip/px30.dtsi
@@ -631,6 +631,7 @@ spi0: spi@ff1d0000 {
clock-names = "spiclk", "apb_pclk";
dmas = <&dmac 12>, <&dmac 13>;
dma-names = "tx", "rx";
+ num-cs = <2>;
pinctrl-names = "default";
pinctrl-0 = <&spi0_clk &spi0_csn &spi0_miso &spi0_mosi>;
#address-cells = <1>;
@@ -646,6 +647,7 @@ spi1: spi@ff1d8000 {
clock-names = "spiclk", "apb_pclk";
dmas = <&dmac 14>, <&dmac 15>;
dma-names = "tx", "rx";
+ num-cs = <2>;
pinctrl-names = "default";
pinctrl-0 = <&spi1_clk &spi1_csn0 &spi1_csn1 &spi1_miso &spi1_mosi>;
#address-cells = <1>;
diff --git a/arch/arm64/boot/dts/rockchip/rk3328.dtsi b/arch/arm64/boot/dts/rockchip/rk3328.dtsi
index fb5dcf6..7b4c15c 100644
--- a/arch/arm64/boot/dts/rockchip/rk3328.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3328.dtsi
@@ -488,7 +488,6 @@ pwm2: pwm@ff1b0020 {
pwm3: pwm@ff1b0030 {
compatible = "rockchip,rk3328-pwm";
reg = <0x0 0xff1b0030 0x0 0x10>;
- interrupts = <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>;
clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
clock-names = "pwm", "pclk";
pinctrl-names = "default";
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-evb.dts b/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-evb.dts
index d4c7083..a4946cd 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-evb.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-evb.dts
@@ -72,7 +72,7 @@ vcc3v3_lcd: vcc3v3-lcd-regulator {
vin-supply = <&vcc3v3_sys>;
};
- vcc5v0_usb30_host: vcc5v0-usb30-host-regulator {
+ vcc5v0_usb_host1: vcc5v0_usb_host2: vcc5v0-usb-host-regulator {
compatible = "regulator-fixed";
regulator-name = "vcc5v0_host";
regulator-boot-on;
@@ -114,6 +114,7 @@ &pcie30phy {
status = "okay";
};
+/* Standard pcie */
&pcie3x2 {
reset-gpios = <&gpio3 RK_PB0 GPIO_ACTIVE_HIGH>;
vpcie3v3-supply = <&vcc3v3_sys>;
@@ -122,6 +123,7 @@ &pcie3x2 {
/* M.2 M-Key ssd */
&pcie3x4 {
+ num-lanes = <2>;
reset-gpios = <&gpio4 RK_PB6 GPIO_ACTIVE_HIGH>;
vpcie3v3-supply = <&vcc3v3_sys>;
status = "okay";
@@ -188,12 +190,12 @@ &u2phy3 {
};
&u2phy2_host {
- phy-supply = <&vcc5v0_usb30_host>;
+ phy-supply = <&vcc5v0_usb_host1>;
status = "okay";
};
&u2phy3_host {
- phy-supply = <&vcc5v0_usb30_host>;
+ phy-supply = <&vcc5v0_usb_host2>;
status = "okay";
};
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5.dtsi
index 0b02f4d..cce1c8e 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5.dtsi
@@ -16,8 +16,8 @@ / {
aliases {
mmc0 = &sdhci;
- mmc1 = &sdio;
- mmc2 = &sdmmc;
+ mmc1 = &sdmmc;
+ mmc2 = &sdio;
serial2 = &uart2;
};
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts b/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
index ac7c677..de30c26 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
@@ -448,6 +448,7 @@ pmic@0 {
<&rk806_dvs2_null>, <&rk806_dvs3_null>;
pinctrl-names = "default";
spi-max-frequency = <1000000>;
+ system-power-controller;
vcc1-supply = <&vcc5v0_sys>;
vcc2-supply = <&vcc5v0_sys>;
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-jaguar.dts b/arch/arm64/boot/dts/rockchip/rk3588-jaguar.dts
index 4ce70fb..39d6500 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-jaguar.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-jaguar.dts
@@ -62,7 +62,6 @@ leds {
compatible = "gpio-leds";
pinctrl-names = "default";
pinctrl-0 = <&led1_pin>;
- status = "okay";
/* LED1 on PCB */
led-1 {
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dts b/arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dts
index d772277..997b516 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dts
@@ -189,19 +189,19 @@ &cpu_l3 {
cpu-supply = <&vdd_cpu_lit_s0>;
};
-&cpu_b0{
+&cpu_b0 {
cpu-supply = <&vdd_cpu_big0_s0>;
};
-&cpu_b1{
+&cpu_b1 {
cpu-supply = <&vdd_cpu_big0_s0>;
};
-&cpu_b2{
+&cpu_b2 {
cpu-supply = <&vdd_cpu_big1_s0>;
};
-&cpu_b3{
+&cpu_b3 {
cpu-supply = <&vdd_cpu_big1_s0>;
};
diff --git a/arch/arm64/boot/dts/rockchip/rk3588s-coolpi-4b.dts b/arch/arm64/boot/dts/rockchip/rk3588s-coolpi-4b.dts
index ef4f058..e037bf9 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588s-coolpi-4b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588s-coolpi-4b.dts
@@ -19,8 +19,8 @@ / {
aliases {
mmc0 = &sdhci;
- mmc1 = &sdio;
- mmc2 = &sdmmc;
+ mmc1 = &sdmmc;
+ mmc2 = &sdio;
};
analog-sound {
diff --git a/arch/arm64/boot/dts/rockchip/rk3588s-indiedroid-nova.dts b/arch/arm64/boot/dts/rockchip/rk3588s-indiedroid-nova.dts
index dc677f2..3c22788 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588s-indiedroid-nova.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588s-indiedroid-nova.dts
@@ -195,13 +195,13 @@ &gpio0 {
&gpio1 {
gpio-line-names = /* GPIO1 A0-A7 */
- "HEADER_27_3v3", "HEADER_28_3v3", "", "",
+ "HEADER_27_3v3", "", "", "",
"HEADER_29_1v8", "", "HEADER_7_1v8", "",
/* GPIO1 B0-B7 */
"", "HEADER_31_1v8", "HEADER_33_1v8", "",
"HEADER_11_1v8", "HEADER_13_1v8", "", "",
/* GPIO1 C0-C7 */
- "", "", "", "",
+ "", "HEADER_28_3v3", "", "",
"", "", "", "",
/* GPIO1 D0-D7 */
"", "", "", "",
@@ -225,11 +225,11 @@ &gpio3 {
&gpio4 {
gpio-line-names = /* GPIO4 A0-A7 */
- "", "", "HEADER_37_3v3", "HEADER_32_3v3",
- "HEADER_36_3v3", "", "HEADER_35_3v3", "HEADER_38_3v3",
+ "", "", "HEADER_37_3v3", "HEADER_8_3v3",
+ "HEADER_10_3v3", "", "HEADER_32_3v3", "HEADER_35_3v3",
/* GPIO4 B0-B7 */
"", "", "", "HEADER_40_3v3",
- "HEADER_8_3v3", "HEADER_10_3v3", "", "",
+ "HEADER_38_3v3", "HEADER_36_3v3", "", "",
/* GPIO4 C0-C7 */
"", "", "", "",
"", "", "", "",
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
index bac4cab..467ac2f 100644
--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -227,8 +227,19 @@ static int ctr_encrypt(struct skcipher_request *req)
src += blocks * AES_BLOCK_SIZE;
}
if (nbytes && walk.nbytes == walk.total) {
+ u8 buf[AES_BLOCK_SIZE];
+ u8 *d = dst;
+
+ if (unlikely(nbytes < AES_BLOCK_SIZE))
+ src = dst = memcpy(buf + sizeof(buf) - nbytes,
+ src, nbytes);
+
neon_aes_ctr_encrypt(dst, src, ctx->enc, ctx->key.rounds,
nbytes, walk.iv);
+
+ if (unlikely(nbytes < AES_BLOCK_SIZE))
+ memcpy(d, dst, nbytes);
+
nbytes = 0;
}
kernel_neon_end();
diff --git a/arch/arm64/include/asm/alternative-macros.h b/arch/arm64/include/asm/alternative-macros.h
index 210bb43..d328f54 100644
--- a/arch/arm64/include/asm/alternative-macros.h
+++ b/arch/arm64/include/asm/alternative-macros.h
@@ -229,7 +229,7 @@ alternative_has_cap_likely(const unsigned long cpucap)
if (!cpucap_is_possible(cpucap))
return false;
- asm_volatile_goto(
+ asm goto(
ALTERNATIVE_CB("b %l[l_no]", %[cpucap], alt_cb_patch_nops)
:
: [cpucap] "i" (cpucap)
@@ -247,7 +247,7 @@ alternative_has_cap_unlikely(const unsigned long cpucap)
if (!cpucap_is_possible(cpucap))
return false;
- asm_volatile_goto(
+ asm goto(
ALTERNATIVE("nop", "b %l[l_yes]", %[cpucap])
:
: [cpucap] "i" (cpucap)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 21c824e..bd8d4ca 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -83,7 +83,7 @@ struct arm64_ftr_bits {
* to full-0 denotes that this field has no override
*
* A @mask field set to full-0 with the corresponding @val field set
- * to full-1 denotes thath this field has an invalid override.
+ * to full-1 denotes that this field has an invalid override.
*/
struct arm64_ftr_override {
u64 val;
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 7c7493c..52f076a 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -61,6 +61,7 @@
#define ARM_CPU_IMP_HISI 0x48
#define ARM_CPU_IMP_APPLE 0x61
#define ARM_CPU_IMP_AMPERE 0xC0
+#define ARM_CPU_IMP_MICROSOFT 0x6D
#define ARM_CPU_PART_AEM_V8 0xD0F
#define ARM_CPU_PART_FOUNDATION 0xD00
@@ -135,6 +136,8 @@
#define AMPERE_CPU_PART_AMPERE1 0xAC3
+#define MICROSOFT_CPU_PART_AZURE_COBALT_100 0xD49 /* Based on r0p0 of ARM Neoverse N2 */
+
#define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
#define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
#define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72)
@@ -193,6 +196,7 @@
#define MIDR_APPLE_M2_BLIZZARD_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_BLIZZARD_MAX)
#define MIDR_APPLE_M2_AVALANCHE_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_AVALANCHE_MAX)
#define MIDR_AMPERE1 MIDR_CPU_MODEL(ARM_CPU_IMP_AMPERE, AMPERE_CPU_PART_AMPERE1)
+#define MIDR_MICROSOFT_AZURE_COBALT_100 MIDR_CPU_MODEL(ARM_CPU_IMP_MICROSOFT, MICROSOFT_CPU_PART_AZURE_COBALT_100)
/* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
#define MIDR_FUJITSU_ERRATUM_010001 MIDR_FUJITSU_A64FX
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 50e5f25..b67b89c 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -62,13 +62,13 @@ static inline void cpacr_restore(unsigned long cpacr)
* When we defined the maximum SVE vector length we defined the ABI so
* that the maximum vector length included all the reserved for future
* expansion bits in ZCR rather than those just currently defined by
- * the architecture. While SME follows a similar pattern the fact that
- * it includes a square matrix means that any allocations that attempt
- * to cover the maximum potential vector length (such as happen with
- * the regset used for ptrace) end up being extremely large. Define
- * the much lower actual limit for use in such situations.
+ * the architecture. Using this length to allocate worst size buffers
+ * results in excessively large allocations, and this effect is even
+ * more pronounced for SME due to ZA. Define more suitable VLs for
+ * these situations.
*/
-#define SME_VQ_MAX 16
+#define ARCH_SVE_VQ_MAX ((ZCR_ELx_LEN_MASK >> ZCR_ELx_LEN_SHIFT) + 1)
+#define SME_VQ_MAX ((SMCR_ELx_LEN_MASK >> SMCR_ELx_LEN_SHIFT) + 1)
struct task_struct;
@@ -386,6 +386,7 @@ extern void sme_alloc(struct task_struct *task, bool flush);
extern unsigned int sme_get_vl(void);
extern int sme_set_current_vl(unsigned long arg);
extern int sme_get_current_vl(void);
+extern void sme_suspend_exit(void);
/*
* Return how many bytes of memory are required to store the full SME
@@ -421,6 +422,7 @@ static inline int sme_max_vl(void) { return 0; }
static inline int sme_max_virtualisable_vl(void) { return 0; }
static inline int sme_set_current_vl(unsigned long arg) { return -EINVAL; }
static inline int sme_get_current_vl(void) { return -EINVAL; }
+static inline void sme_suspend_exit(void) { }
static inline size_t sme_state_size(struct task_struct const *task)
{
diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
index 48ddc0f..6aafbb78 100644
--- a/arch/arm64/include/asm/jump_label.h
+++ b/arch/arm64/include/asm/jump_label.h
@@ -18,7 +18,7 @@
static __always_inline bool arch_static_branch(struct static_key * const key,
const bool branch)
{
- asm_volatile_goto(
+ asm goto(
"1: nop \n\t"
" .pushsection __jump_table, \"aw\" \n\t"
" .align 3 \n\t"
@@ -35,7 +35,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key,
static __always_inline bool arch_static_branch_jump(struct static_key * const key,
const bool branch)
{
- asm_volatile_goto(
+ asm goto(
"1: b %l[l_yes] \n\t"
" .pushsection __jump_table, \"aw\" \n\t"
" .align 3 \n\t"
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 967c7c7a..76b8dd3 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -374,6 +374,7 @@ static const struct midr_range erratum_1463225[] = {
static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
#ifdef CONFIG_ARM64_ERRATUM_2139208
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+ MIDR_ALL_VERSIONS(MIDR_MICROSOFT_AZURE_COBALT_100),
#endif
#ifdef CONFIG_ARM64_ERRATUM_2119858
MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
@@ -387,6 +388,7 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
static const struct midr_range tsb_flush_fail_cpus[] = {
#ifdef CONFIG_ARM64_ERRATUM_2067961
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+ MIDR_ALL_VERSIONS(MIDR_MICROSOFT_AZURE_COBALT_100),
#endif
#ifdef CONFIG_ARM64_ERRATUM_2054223
MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
@@ -399,6 +401,7 @@ static const struct midr_range tsb_flush_fail_cpus[] = {
static struct midr_range trbe_write_out_of_range_cpus[] = {
#ifdef CONFIG_ARM64_ERRATUM_2253138
MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+ MIDR_ALL_VERSIONS(MIDR_MICROSOFT_AZURE_COBALT_100),
#endif
#ifdef CONFIG_ARM64_ERRATUM_2224489
MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index a5dc6f7..f27acca 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1311,6 +1311,22 @@ void __init sme_setup(void)
get_sme_default_vl());
}
+void sme_suspend_exit(void)
+{
+ u64 smcr = 0;
+
+ if (!system_supports_sme())
+ return;
+
+ if (system_supports_fa64())
+ smcr |= SMCR_ELx_FA64;
+ if (system_supports_sme2())
+ smcr |= SMCR_ELx_EZT0;
+
+ write_sysreg_s(smcr, SYS_SMCR_EL1);
+ write_sysreg_s(0, SYS_SMPRI_EL1);
+}
+
#endif /* CONFIG_ARM64_SME */
static void sve_init_regs(void)
@@ -1635,7 +1651,7 @@ void fpsimd_preserve_current_state(void)
void fpsimd_signal_preserve_current_state(void)
{
fpsimd_preserve_current_state();
- if (test_thread_flag(TIF_SVE))
+ if (current->thread.fp_type == FP_STATE_SVE)
sve_to_fpsimd(current);
}
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index dc6cf0e..e3bef38 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1500,7 +1500,8 @@ static const struct user_regset aarch64_regsets[] = {
#ifdef CONFIG_ARM64_SVE
[REGSET_SVE] = { /* Scalable Vector Extension */
.core_note_type = NT_ARM_SVE,
- .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
+ .n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX,
+ SVE_PT_REGS_SVE),
SVE_VQ_BYTES),
.size = SVE_VQ_BYTES,
.align = SVE_VQ_BYTES,
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 0e8beb3..425b1bc 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -242,7 +242,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
vl = task_get_sme_vl(current);
vq = sve_vq_from_vl(vl);
flags |= SVE_SIG_FLAG_SM;
- } else if (test_thread_flag(TIF_SVE)) {
+ } else if (current->thread.fp_type == FP_STATE_SVE) {
vq = sve_vq_from_vl(vl);
}
@@ -878,7 +878,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
if (system_supports_sve() || system_supports_sme()) {
unsigned int vq = 0;
- if (add_all || test_thread_flag(TIF_SVE) ||
+ if (add_all || current->thread.fp_type == FP_STATE_SVE ||
thread_sm_enabled(¤t->thread)) {
int vl = max(sve_max_vl(), sme_max_vl());
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 7f88028..b2a60e0 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -247,7 +247,7 @@ struct kunwind_consume_entry_data {
void *cookie;
};
-static bool
+static __always_inline bool
arch_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
{
struct kunwind_consume_entry_data *data = cookie;
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index eca4d04..eaaff94 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -12,6 +12,7 @@
#include <asm/daifflags.h>
#include <asm/debug-monitors.h>
#include <asm/exec.h>
+#include <asm/fpsimd.h>
#include <asm/mte.h>
#include <asm/memory.h>
#include <asm/mmu_context.h>
@@ -80,6 +81,8 @@ void notrace __cpu_suspend_exit(void)
*/
spectre_v4_enable_mitigation(NULL);
+ sme_suspend_exit();
+
/* Restore additional feature-specific configuration */
ptrauth_suspend_exit();
}
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 6c3c8ca..27ca89b 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -3,7 +3,6 @@
# KVM configuration
#
-source "virt/lib/Kconfig"
source "virt/kvm/Kconfig"
menuconfig VIRTUALIZATION
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index c651df9..ab9d05f 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1419,7 +1419,6 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
level + 1);
if (ret) {
kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level);
- mm_ops->put_page(pgtable);
return ERR_PTR(ret);
}
@@ -1502,7 +1501,6 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
if (!stage2_try_break_pte(ctx, mmu)) {
kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level);
- mm_ops->put_page(childp);
return -EAGAIN;
}
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 8350fb8..b7be96a 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -101,6 +101,17 @@ void __init kvm_hyp_reserve(void)
hyp_mem_base);
}
+static void __pkvm_destroy_hyp_vm(struct kvm *host_kvm)
+{
+ if (host_kvm->arch.pkvm.handle) {
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
+ host_kvm->arch.pkvm.handle));
+ }
+
+ host_kvm->arch.pkvm.handle = 0;
+ free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc);
+}
+
/*
* Allocates and donates memory for hypervisor VM structs at EL2.
*
@@ -181,7 +192,7 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
return 0;
destroy_vm:
- pkvm_destroy_hyp_vm(host_kvm);
+ __pkvm_destroy_hyp_vm(host_kvm);
return ret;
free_vm:
free_pages_exact(hyp_vm, hyp_vm_sz);
@@ -194,23 +205,19 @@ int pkvm_create_hyp_vm(struct kvm *host_kvm)
{
int ret = 0;
- mutex_lock(&host_kvm->lock);
+ mutex_lock(&host_kvm->arch.config_lock);
if (!host_kvm->arch.pkvm.handle)
ret = __pkvm_create_hyp_vm(host_kvm);
- mutex_unlock(&host_kvm->lock);
+ mutex_unlock(&host_kvm->arch.config_lock);
return ret;
}
void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
{
- if (host_kvm->arch.pkvm.handle) {
- WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
- host_kvm->arch.pkvm.handle));
- }
-
- host_kvm->arch.pkvm.handle = 0;
- free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc);
+ mutex_lock(&host_kvm->arch.config_lock);
+ __pkvm_destroy_hyp_vm(host_kvm);
+ mutex_unlock(&host_kvm->arch.config_lock);
}
int pkvm_init_host_vm(struct kvm *host_kvm)
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index e2764d0..28a9307 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -468,6 +468,9 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
}
irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
+ if (!irq)
+ continue;
+
raw_spin_lock_irqsave(&irq->irq_lock, flags);
irq->pending_latch = pendmask & (1U << bit_nr);
vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
@@ -1432,6 +1435,8 @@ static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
for (i = 0; i < irq_count; i++) {
irq = vgic_get_irq(kvm, NULL, intids[i]);
+ if (!irq)
+ continue;
update_affinity(irq, vcpu2);
diff --git a/arch/csky/include/asm/jump_label.h b/arch/csky/include/asm/jump_label.h
index 98a3f4b..ef2e37a 100644
--- a/arch/csky/include/asm/jump_label.h
+++ b/arch/csky/include/asm/jump_label.h
@@ -12,7 +12,7 @@
static __always_inline bool arch_static_branch(struct static_key *key,
bool branch)
{
- asm_volatile_goto(
+ asm goto(
"1: nop32 \n"
" .pushsection __jump_table, \"aw\" \n"
" .align 2 \n"
@@ -29,7 +29,7 @@ static __always_inline bool arch_static_branch(struct static_key *key,
static __always_inline bool arch_static_branch_jump(struct static_key *key,
bool branch)
{
- asm_volatile_goto(
+ asm goto(
"1: bsr32 %l[label] \n"
" .pushsection __jump_table, \"aw\" \n"
" .align 2 \n"
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index 10959e6..929f689 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -12,6 +12,7 @@
select ARCH_DISABLE_KASAN_INLINE
select ARCH_ENABLE_MEMORY_HOTPLUG
select ARCH_ENABLE_MEMORY_HOTREMOVE
+ select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
select ARCH_HAS_CPU_FINALIZE_INIT
select ARCH_HAS_FORTIFY_SOURCE
@@ -99,6 +100,7 @@
select HAVE_ARCH_KFENCE
select HAVE_ARCH_KGDB if PERF_EVENTS
select HAVE_ARCH_MMAP_RND_BITS if MMU
+ select HAVE_ARCH_SECCOMP
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
@@ -632,23 +634,6 @@
This is limited by the size of the lower address memory, 256MB.
-config SECCOMP
- bool "Enable seccomp to safely compute untrusted bytecode"
- depends on PROC_FS
- default y
- help
- This kernel feature is useful for number crunching applications
- that may need to compute untrusted bytecode during their
- execution. By using pipes or other transports made available to
- the process as file descriptors supporting the read/write
- syscalls, it's possible to isolate those applications in
- their own address space using seccomp. Once seccomp is
- enabled via /proc/<pid>/seccomp, it cannot be disabled
- and the task is only allowed to execute a few safe syscalls
- defined by each seccomp mode.
-
- If unsure, say Y. Only embedded should say N here.
-
endmenu
config ARCH_SELECT_MEMORY_MODEL
@@ -667,10 +652,6 @@
or have huge holes in the physical address space for other reasons.
See <file:Documentation/mm/numa.rst> for more.
-config ARCH_ENABLE_THP_MIGRATION
- def_bool y
- depends on TRANSPARENT_HUGEPAGE
-
config ARCH_MEMORY_PROBE
def_bool y
depends on MEMORY_HOTPLUG
diff --git a/arch/loongarch/boot/dts/loongson-2k0500-ref.dts b/arch/loongarch/boot/dts/loongson-2k0500-ref.dts
index b38071a..8aefb0c 100644
--- a/arch/loongarch/boot/dts/loongson-2k0500-ref.dts
+++ b/arch/loongarch/boot/dts/loongson-2k0500-ref.dts
@@ -60,7 +60,7 @@ &i2c0 {
#address-cells = <1>;
#size-cells = <0>;
- eeprom@57{
+ eeprom@57 {
compatible = "atmel,24c16";
reg = <0x57>;
pagesize = <16>;
diff --git a/arch/loongarch/boot/dts/loongson-2k1000-ref.dts b/arch/loongarch/boot/dts/loongson-2k1000-ref.dts
index 132a2d1..ed4d324 100644
--- a/arch/loongarch/boot/dts/loongson-2k1000-ref.dts
+++ b/arch/loongarch/boot/dts/loongson-2k1000-ref.dts
@@ -78,7 +78,7 @@ &i2c2 {
#address-cells = <1>;
#size-cells = <0>;
- eeprom@57{
+ eeprom@57 {
compatible = "atmel,24c16";
reg = <0x57>;
pagesize = <16>;
diff --git a/arch/loongarch/include/asm/acpi.h b/arch/loongarch/include/asm/acpi.h
index 8de6c4b..49e29b2 100644
--- a/arch/loongarch/include/asm/acpi.h
+++ b/arch/loongarch/include/asm/acpi.h
@@ -32,8 +32,10 @@ static inline bool acpi_has_cpu_in_madt(void)
return true;
}
+#define MAX_CORE_PIC 256
+
extern struct list_head acpi_wakeup_device_list;
-extern struct acpi_madt_core_pic acpi_core_pic[NR_CPUS];
+extern struct acpi_madt_core_pic acpi_core_pic[MAX_CORE_PIC];
extern int __init parse_acpi_topology(void);
diff --git a/arch/loongarch/include/asm/jump_label.h b/arch/loongarch/include/asm/jump_label.h
index 3cea299..29acfe3 100644
--- a/arch/loongarch/include/asm/jump_label.h
+++ b/arch/loongarch/include/asm/jump_label.h
@@ -22,7 +22,7 @@
static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
{
- asm_volatile_goto(
+ asm goto(
"1: nop \n\t"
JUMP_TABLE_ENTRY
: : "i"(&((char *)key)[branch]) : : l_yes);
@@ -35,7 +35,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, co
static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
{
- asm_volatile_goto(
+ asm goto(
"1: b %l[l_yes] \n\t"
JUMP_TABLE_ENTRY
: : "i"(&((char *)key)[branch]) : : l_yes);
diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
index b6b097b..5cf59c6 100644
--- a/arch/loongarch/kernel/acpi.c
+++ b/arch/loongarch/kernel/acpi.c
@@ -29,11 +29,9 @@ int disabled_cpus;
u64 acpi_saved_sp;
-#define MAX_CORE_PIC 256
-
#define PREFIX "ACPI: "
-struct acpi_madt_core_pic acpi_core_pic[NR_CPUS];
+struct acpi_madt_core_pic acpi_core_pic[MAX_CORE_PIC];
void __init __iomem * __acpi_map_table(unsigned long phys, unsigned long size)
{
diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
index edf2bba..634ef17 100644
--- a/arch/loongarch/kernel/setup.c
+++ b/arch/loongarch/kernel/setup.c
@@ -357,6 +357,8 @@ void __init platform_init(void)
acpi_gbl_use_default_register_widths = false;
acpi_boot_table_init();
#endif
+
+ early_init_fdt_scan_reserved_mem();
unflatten_and_copy_device_tree();
#ifdef CONFIG_NUMA
@@ -390,8 +392,6 @@ static void __init arch_mem_init(char **cmdline_p)
check_kernel_sections_mem();
- early_init_fdt_scan_reserved_mem();
-
/*
* In order to reduce the possibility of kernel panic when failed to
* get IO TLB memory under CONFIG_SWIOTLB, it is better to allocate
diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
index 2b49d30..aabee0b 100644
--- a/arch/loongarch/kernel/smp.c
+++ b/arch/loongarch/kernel/smp.c
@@ -88,6 +88,73 @@ void show_ipi_list(struct seq_file *p, int prec)
}
}
+static inline void set_cpu_core_map(int cpu)
+{
+ int i;
+
+ cpumask_set_cpu(cpu, &cpu_core_setup_map);
+
+ for_each_cpu(i, &cpu_core_setup_map) {
+ if (cpu_data[cpu].package == cpu_data[i].package) {
+ cpumask_set_cpu(i, &cpu_core_map[cpu]);
+ cpumask_set_cpu(cpu, &cpu_core_map[i]);
+ }
+ }
+}
+
+static inline void set_cpu_sibling_map(int cpu)
+{
+ int i;
+
+ cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
+
+ for_each_cpu(i, &cpu_sibling_setup_map) {
+ if (cpus_are_siblings(cpu, i)) {
+ cpumask_set_cpu(i, &cpu_sibling_map[cpu]);
+ cpumask_set_cpu(cpu, &cpu_sibling_map[i]);
+ }
+ }
+}
+
+static inline void clear_cpu_sibling_map(int cpu)
+{
+ int i;
+
+ for_each_cpu(i, &cpu_sibling_setup_map) {
+ if (cpus_are_siblings(cpu, i)) {
+ cpumask_clear_cpu(i, &cpu_sibling_map[cpu]);
+ cpumask_clear_cpu(cpu, &cpu_sibling_map[i]);
+ }
+ }
+
+ cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
+}
+
+/*
+ * Calculate a new cpu_foreign_map mask whenever a
+ * new cpu appears or disappears.
+ */
+void calculate_cpu_foreign_map(void)
+{
+ int i, k, core_present;
+ cpumask_t temp_foreign_map;
+
+ /* Re-calculate the mask */
+ cpumask_clear(&temp_foreign_map);
+ for_each_online_cpu(i) {
+ core_present = 0;
+ for_each_cpu(k, &temp_foreign_map)
+ if (cpus_are_siblings(i, k))
+ core_present = 1;
+ if (!core_present)
+ cpumask_set_cpu(i, &temp_foreign_map);
+ }
+
+ for_each_online_cpu(i)
+ cpumask_andnot(&cpu_foreign_map[i],
+ &temp_foreign_map, &cpu_sibling_map[i]);
+}
+
/* Send mailbox buffer via Mail_Send */
static void csr_mail_send(uint64_t data, int cpu, int mailbox)
{
@@ -303,6 +370,7 @@ int loongson_cpu_disable(void)
numa_remove_cpu(cpu);
#endif
set_cpu_online(cpu, false);
+ clear_cpu_sibling_map(cpu);
calculate_cpu_foreign_map();
local_irq_save(flags);
irq_migrate_all_off_this_cpu();
@@ -337,6 +405,7 @@ void __noreturn arch_cpu_idle_dead(void)
addr = iocsr_read64(LOONGARCH_IOCSR_MBUF0);
} while (addr == 0);
+ local_irq_disable();
init_fn = (void *)TO_CACHE(addr);
iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_CLEAR);
@@ -379,59 +448,6 @@ static int __init ipi_pm_init(void)
core_initcall(ipi_pm_init);
#endif
-static inline void set_cpu_sibling_map(int cpu)
-{
- int i;
-
- cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
-
- for_each_cpu(i, &cpu_sibling_setup_map) {
- if (cpus_are_siblings(cpu, i)) {
- cpumask_set_cpu(i, &cpu_sibling_map[cpu]);
- cpumask_set_cpu(cpu, &cpu_sibling_map[i]);
- }
- }
-}
-
-static inline void set_cpu_core_map(int cpu)
-{
- int i;
-
- cpumask_set_cpu(cpu, &cpu_core_setup_map);
-
- for_each_cpu(i, &cpu_core_setup_map) {
- if (cpu_data[cpu].package == cpu_data[i].package) {
- cpumask_set_cpu(i, &cpu_core_map[cpu]);
- cpumask_set_cpu(cpu, &cpu_core_map[i]);
- }
- }
-}
-
-/*
- * Calculate a new cpu_foreign_map mask whenever a
- * new cpu appears or disappears.
- */
-void calculate_cpu_foreign_map(void)
-{
- int i, k, core_present;
- cpumask_t temp_foreign_map;
-
- /* Re-calculate the mask */
- cpumask_clear(&temp_foreign_map);
- for_each_online_cpu(i) {
- core_present = 0;
- for_each_cpu(k, &temp_foreign_map)
- if (cpus_are_siblings(i, k))
- core_present = 1;
- if (!core_present)
- cpumask_set_cpu(i, &temp_foreign_map);
- }
-
- for_each_online_cpu(i)
- cpumask_andnot(&cpu_foreign_map[i],
- &temp_foreign_map, &cpu_sibling_map[i]);
-}
-
/* Preload SMP state for boot cpu */
void smp_prepare_boot_cpu(void)
{
diff --git a/arch/loongarch/kvm/vcpu.c b/arch/loongarch/kvm/vcpu.c
index 2770199..3610692 100644
--- a/arch/loongarch/kvm/vcpu.c
+++ b/arch/loongarch/kvm/vcpu.c
@@ -298,74 +298,73 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
return ret;
}
-static int _kvm_get_cpucfg(int id, u64 *v)
+static int _kvm_get_cpucfg_mask(int id, u64 *v)
{
- int ret = 0;
-
- if (id < 0 && id >= KVM_MAX_CPUCFG_REGS)
+ if (id < 0 || id >= KVM_MAX_CPUCFG_REGS)
return -EINVAL;
switch (id) {
case 2:
- /* Return CPUCFG2 features which have been supported by KVM */
+ /* CPUCFG2 features unconditionally supported by KVM */
*v = CPUCFG2_FP | CPUCFG2_FPSP | CPUCFG2_FPDP |
CPUCFG2_FPVERS | CPUCFG2_LLFTP | CPUCFG2_LLFTPREV |
CPUCFG2_LAM;
/*
- * If LSX is supported by CPU, it is also supported by KVM,
- * as we implement it.
+ * For the ISA extensions listed below, if one is supported
+ * by the host, then it is also supported by KVM.
*/
if (cpu_has_lsx)
*v |= CPUCFG2_LSX;
- /*
- * if LASX is supported by CPU, it is also supported by KVM,
- * as we implement it.
- */
if (cpu_has_lasx)
*v |= CPUCFG2_LASX;
- break;
+ return 0;
default:
- ret = -EINVAL;
- break;
+ /*
+ * No restrictions on other valid CPUCFG IDs' values, but
+ * CPUCFG data is limited to 32 bits as the LoongArch ISA
+ * manual says (Volume 1, Section 2.2.10.5 "CPUCFG").
+ */
+ *v = U32_MAX;
+ return 0;
}
- return ret;
}
static int kvm_check_cpucfg(int id, u64 val)
{
- u64 mask;
- int ret = 0;
+ int ret;
+ u64 mask = 0;
- if (id < 0 && id >= KVM_MAX_CPUCFG_REGS)
- return -EINVAL;
-
- if (_kvm_get_cpucfg(id, &mask))
+ ret = _kvm_get_cpucfg_mask(id, &mask);
+ if (ret)
return ret;
+ if (val & ~mask)
+ /* Unsupported features and/or the higher 32 bits should not be set */
+ return -EINVAL;
+
switch (id) {
case 2:
- /* CPUCFG2 features checking */
- if (val & ~mask)
- /* The unsupported features should not be set */
- ret = -EINVAL;
- else if (!(val & CPUCFG2_LLFTP))
- /* The LLFTP must be set, as guest must has a constant timer */
- ret = -EINVAL;
- else if ((val & CPUCFG2_FP) && (!(val & CPUCFG2_FPSP) || !(val & CPUCFG2_FPDP)))
- /* Single and double float point must both be set when enable FP */
- ret = -EINVAL;
- else if ((val & CPUCFG2_LSX) && !(val & CPUCFG2_FP))
- /* FP should be set when enable LSX */
- ret = -EINVAL;
- else if ((val & CPUCFG2_LASX) && !(val & CPUCFG2_LSX))
- /* LSX, FP should be set when enable LASX, and FP has been checked before. */
- ret = -EINVAL;
- break;
+ if (!(val & CPUCFG2_LLFTP))
+ /* Guests must have a constant timer */
+ return -EINVAL;
+ if ((val & CPUCFG2_FP) && (!(val & CPUCFG2_FPSP) || !(val & CPUCFG2_FPDP)))
+ /* Single and double float point must both be set when FP is enabled */
+ return -EINVAL;
+ if ((val & CPUCFG2_LSX) && !(val & CPUCFG2_FP))
+ /* LSX architecturally implies FP but val does not satisfy that */
+ return -EINVAL;
+ if ((val & CPUCFG2_LASX) && !(val & CPUCFG2_LSX))
+ /* LASX architecturally implies LSX and FP but val does not satisfy that */
+ return -EINVAL;
+ return 0;
default:
- break;
+ /*
+ * Values for the other CPUCFG IDs are not being further validated
+ * besides the mask check above.
+ */
+ return 0;
}
- return ret;
}
static int kvm_get_one_reg(struct kvm_vcpu *vcpu,
@@ -566,7 +565,7 @@ static int kvm_loongarch_get_cpucfg_attr(struct kvm_vcpu *vcpu,
uint64_t val;
uint64_t __user *uaddr = (uint64_t __user *)attr->addr;
- ret = _kvm_get_cpucfg(attr->attr, &val);
+ ret = _kvm_get_cpucfg_mask(attr->attr, &val);
if (ret)
return ret;
diff --git a/arch/loongarch/mm/kasan_init.c b/arch/loongarch/mm/kasan_init.c
index cc3e81f..c608adc 100644
--- a/arch/loongarch/mm/kasan_init.c
+++ b/arch/loongarch/mm/kasan_init.c
@@ -44,6 +44,9 @@ void *kasan_mem_to_shadow(const void *addr)
unsigned long xrange = (maddr >> XRANGE_SHIFT) & 0xffff;
unsigned long offset = 0;
+ if (maddr >= FIXADDR_START)
+ return (void *)(kasan_early_shadow_page);
+
maddr &= XRANGE_SHADOW_MASK;
switch (xrange) {
case XKPRANGE_CC_SEG:
diff --git a/arch/loongarch/vdso/Makefile b/arch/loongarch/vdso/Makefile
index c74c992..f597cd0 100644
--- a/arch/loongarch/vdso/Makefile
+++ b/arch/loongarch/vdso/Makefile
@@ -2,6 +2,7 @@
# Objects to go into the VDSO.
KASAN_SANITIZE := n
+UBSAN_SANITIZE := n
KCOV_INSTRUMENT := n
# Include the generic Makefile to check the built vdso.
diff --git a/arch/m68k/Makefile b/arch/m68k/Makefile
index 76ef1a6..0abcf99 100644
--- a/arch/m68k/Makefile
+++ b/arch/m68k/Makefile
@@ -15,10 +15,10 @@
KBUILD_DEFCONFIG := multi_defconfig
ifdef cross_compiling
- ifeq ($(CROSS_COMPILE),)
+ ifeq ($(CROSS_COMPILE),)
CROSS_COMPILE := $(call cc-cross-prefix, \
m68k-linux-gnu- m68k-linux- m68k-unknown-linux-gnu-)
- endif
+ endif
endif
#
diff --git a/arch/mips/include/asm/checksum.h b/arch/mips/include/asm/checksum.h
index 4044eaf..0921ddda 100644
--- a/arch/mips/include/asm/checksum.h
+++ b/arch/mips/include/asm/checksum.h
@@ -241,7 +241,8 @@ static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
" .set pop"
: "=&r" (sum), "=&r" (tmp)
: "r" (saddr), "r" (daddr),
- "0" (htonl(len)), "r" (htonl(proto)), "r" (sum));
+ "0" (htonl(len)), "r" (htonl(proto)), "r" (sum)
+ : "memory");
return csum_fold(sum);
}
diff --git a/arch/mips/include/asm/jump_label.h b/arch/mips/include/asm/jump_label.h
index 081be98..ff5d388 100644
--- a/arch/mips/include/asm/jump_label.h
+++ b/arch/mips/include/asm/jump_label.h
@@ -39,7 +39,7 @@ extern void jump_label_apply_nops(struct module *mod);
static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\t" B_INSN " 2f\n\t"
+ asm goto("1:\t" B_INSN " 2f\n\t"
"2:\t.insn\n\t"
".pushsection __jump_table, \"aw\"\n\t"
WORD_INSN " 1b, %l[l_yes], %0\n\t"
@@ -53,7 +53,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\t" J_INSN " %l[l_yes]\n\t"
+ asm goto("1:\t" J_INSN " %l[l_yes]\n\t"
".pushsection __jump_table, \"aw\"\n\t"
WORD_INSN " 1b, %l[l_yes], %0\n\t"
".popsection\n\t"
diff --git a/arch/mips/include/asm/ptrace.h b/arch/mips/include/asm/ptrace.h
index daf3cf2..d14d0e3 100644
--- a/arch/mips/include/asm/ptrace.h
+++ b/arch/mips/include/asm/ptrace.h
@@ -60,6 +60,7 @@ static inline void instruction_pointer_set(struct pt_regs *regs,
unsigned long val)
{
regs->cp0_epc = val;
+ regs->cp0_cause &= ~CAUSEF_BD;
}
/* Query offset/name of register from its name/offset */
@@ -154,6 +155,8 @@ static inline long regs_return_value(struct pt_regs *regs)
}
#define instruction_pointer(regs) ((regs)->cp0_epc)
+extern unsigned long exception_ip(struct pt_regs *regs);
+#define exception_ip(regs) exception_ip(regs)
#define profile_pc(regs) instruction_pointer(regs)
extern asmlinkage long syscall_trace_enter(struct pt_regs *regs, long syscall);
diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index d9df543..59288c1 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -31,6 +31,7 @@
#include <linux/seccomp.h>
#include <linux/ftrace.h>
+#include <asm/branch.h>
#include <asm/byteorder.h>
#include <asm/cpu.h>
#include <asm/cpu-info.h>
@@ -48,6 +49,12 @@
#define CREATE_TRACE_POINTS
#include <trace/events/syscalls.h>
+unsigned long exception_ip(struct pt_regs *regs)
+{
+ return exception_epc(regs);
+}
+EXPORT_SYMBOL(exception_ip);
+
/*
* Called by kernel/ptrace.c when detaching..
*
diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile
index 7486b3b..316f84f 100644
--- a/arch/parisc/Makefile
+++ b/arch/parisc/Makefile
@@ -50,12 +50,12 @@
# Set default cross compiler for kernel build
ifdef cross_compiling
- ifeq ($(CROSS_COMPILE),)
+ ifeq ($(CROSS_COMPILE),)
CC_SUFFIXES = linux linux-gnu unknown-linux-gnu suse-linux
CROSS_COMPILE := $(call cc-cross-prefix, \
$(foreach a,$(CC_ARCHES), \
$(foreach s,$(CC_SUFFIXES),$(a)-$(s)-)))
- endif
+ endif
endif
ifdef CONFIG_DYNAMIC_FTRACE
diff --git a/arch/parisc/include/asm/jump_label.h b/arch/parisc/include/asm/jump_label.h
index 9442879..317ebc5 100644
--- a/arch/parisc/include/asm/jump_label.h
+++ b/arch/parisc/include/asm/jump_label.h
@@ -12,7 +12,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
"nop\n\t"
".pushsection __jump_table, \"aw\"\n\t"
".align %1\n\t"
@@ -29,7 +29,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
"b,n %l[l_yes]\n\t"
".pushsection __jump_table, \"aw\"\n\t"
".align %1\n\t"
diff --git a/arch/parisc/include/asm/kprobes.h b/arch/parisc/include/asm/kprobes.h
index 0a175ac..0f42f5c 100644
--- a/arch/parisc/include/asm/kprobes.h
+++ b/arch/parisc/include/asm/kprobes.h
@@ -10,9 +10,10 @@
#ifndef _PARISC_KPROBES_H
#define _PARISC_KPROBES_H
+#include <asm-generic/kprobes.h>
+
#ifdef CONFIG_KPROBES
-#include <asm-generic/kprobes.h>
#include <linux/types.h>
#include <linux/ptrace.h>
#include <linux/notifier.h>
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index d1defb9e..621a4b3 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -78,7 +78,7 @@ asmlinkage void notrace __hot ftrace_function_trampoline(unsigned long parent,
#endif
}
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
int ftrace_enable_ftrace_graph_caller(void)
{
static_key_enable(&ftrace_graph_enable.key);
diff --git a/arch/parisc/kernel/processor.c b/arch/parisc/kernel/processor.c
index e95a977..bf73562 100644
--- a/arch/parisc/kernel/processor.c
+++ b/arch/parisc/kernel/processor.c
@@ -172,7 +172,6 @@ static int __init processor_probe(struct parisc_device *dev)
p->cpu_num = cpu_info.cpu_num;
p->cpu_loc = cpu_info.cpu_loc;
- set_cpu_possible(cpuid, true);
store_cpu_topology(cpuid);
#ifdef CONFIG_SMP
@@ -474,13 +473,6 @@ static struct parisc_driver cpu_driver __refdata = {
*/
void __init processor_init(void)
{
- unsigned int cpu;
-
reset_cpu_topology();
-
- /* reset possible mask. We will mark those which are possible. */
- for_each_possible_cpu(cpu)
- set_cpu_possible(cpu, false);
-
register_parisc_driver(&cpu_driver);
}
diff --git a/arch/parisc/kernel/unwind.c b/arch/parisc/kernel/unwind.c
index 27ae40a..f7e0fee 100644
--- a/arch/parisc/kernel/unwind.c
+++ b/arch/parisc/kernel/unwind.c
@@ -228,10 +228,8 @@ static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int
#ifdef CONFIG_IRQSTACKS
extern void * const _call_on_stack;
#endif /* CONFIG_IRQSTACKS */
- void *ptr;
- ptr = dereference_kernel_function_descriptor(&handle_interruption);
- if (pc_is_kernel_fn(pc, ptr)) {
+ if (pc_is_kernel_fn(pc, handle_interruption)) {
struct pt_regs *regs = (struct pt_regs *)(info->sp - frame_size - PT_SZ_ALGN);
dbg("Unwinding through handle_interruption()\n");
info->prev_sp = regs->gr[30];
@@ -239,13 +237,13 @@ static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int
return 1;
}
- if (pc_is_kernel_fn(pc, ret_from_kernel_thread) ||
- pc_is_kernel_fn(pc, syscall_exit)) {
+ if (pc == (unsigned long)&ret_from_kernel_thread ||
+ pc == (unsigned long)&syscall_exit) {
info->prev_sp = info->prev_ip = 0;
return 1;
}
- if (pc_is_kernel_fn(pc, intr_return)) {
+ if (pc == (unsigned long)&intr_return) {
struct pt_regs *regs;
dbg("Found intr_return()\n");
@@ -257,14 +255,14 @@ static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int
}
if (pc_is_kernel_fn(pc, _switch_to) ||
- pc_is_kernel_fn(pc, _switch_to_ret)) {
+ pc == (unsigned long)&_switch_to_ret) {
info->prev_sp = info->sp - CALLEE_SAVE_FRAME_SIZE;
info->prev_ip = *(unsigned long *)(info->prev_sp - RP_OFFSET);
return 1;
}
#ifdef CONFIG_IRQSTACKS
- if (pc_is_kernel_fn(pc, _call_on_stack)) {
+ if (pc == (unsigned long)&_call_on_stack) {
info->prev_sp = *(unsigned long *)(info->sp - FRAME_SIZE - REG_SZ);
info->prev_ip = *(unsigned long *)(info->sp - FRAME_SIZE - RP_OFFSET);
return 1;
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 1ebd2ca..107fc5a 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -20,14 +20,6 @@
#ifndef __ASSEMBLY__
extern void _mcount(void);
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
- if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
- addr += MCOUNT_INSN_SIZE;
-
- return addr;
-}
-
unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
unsigned long sp);
@@ -142,8 +134,10 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
#ifdef CONFIG_FUNCTION_TRACER
extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
void ftrace_free_init_tramp(void);
+unsigned long ftrace_call_adjust(unsigned long addr);
#else
static inline void ftrace_free_init_tramp(void) { }
+static inline unsigned long ftrace_call_adjust(unsigned long addr) { return addr; }
#endif
#endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h
index 93ce3ec..2f2a86e 100644
--- a/arch/powerpc/include/asm/jump_label.h
+++ b/arch/powerpc/include/asm/jump_label.h
@@ -17,7 +17,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
"nop # arch_static_branch\n\t"
".pushsection __jump_table, \"aw\"\n\t"
".long 1b - ., %l[l_yes] - .\n\t"
@@ -32,7 +32,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
"b %l[l_yes] # arch_static_branch_jump\n\t"
".pushsection __jump_table, \"aw\"\n\t"
".long 1b - ., %l[l_yes] - .\n\t"
diff --git a/arch/powerpc/include/asm/papr-sysparm.h b/arch/powerpc/include/asm/papr-sysparm.h
index 0dbbff5..c3cd5b1 100644
--- a/arch/powerpc/include/asm/papr-sysparm.h
+++ b/arch/powerpc/include/asm/papr-sysparm.h
@@ -32,7 +32,7 @@ typedef struct {
*/
struct papr_sysparm_buf {
__be16 len;
- char val[PAPR_SYSPARM_MAX_OUTPUT];
+ u8 val[PAPR_SYSPARM_MAX_OUTPUT];
};
struct papr_sysparm_buf *papr_sysparm_buf_alloc(void);
diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
index ce2b1b5e..a8b7e86 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -30,6 +30,16 @@ void *pci_traverse_device_nodes(struct device_node *start,
void *data);
extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
+#if defined(CONFIG_IOMMU_API) && (defined(CONFIG_PPC_PSERIES) || \
+ defined(CONFIG_PPC_POWERNV))
+extern void ppc_iommu_register_device(struct pci_controller *phb);
+extern void ppc_iommu_unregister_device(struct pci_controller *phb);
+#else
+static inline void ppc_iommu_register_device(struct pci_controller *phb) { }
+static inline void ppc_iommu_unregister_device(struct pci_controller *phb) { }
+#endif
+
+
/* From rtas_pci.h */
extern void init_pci_config_tokens (void);
extern unsigned long get_phb_buid (struct device_node *);
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 7fd09f2..bb47af9 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -617,6 +617,8 @@
#endif
#define SPRN_HID2 0x3F8 /* Hardware Implementation Register 2 */
#define SPRN_HID2_GEKKO 0x398 /* Gekko HID2 Register */
+#define SPRN_HID2_G2_LE 0x3F3 /* G2_LE HID2 Register */
+#define HID2_G2_LE_HBE (1<<18) /* High BAT Enable (G2_LE) */
#define SPRN_IABR 0x3F2 /* Instruction Address Breakpoint Register */
#define SPRN_IABR2 0x3FA /* 83xx */
#define SPRN_IBCR 0x135 /* 83xx Insn Breakpoint Control Reg */
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9bb2210..065ffd1 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -69,7 +69,7 @@ enum rtas_function_index {
RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE,
RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE2,
RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW,
- RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOWS,
+ RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW,
RTAS_FNIDX__IBM_SCAN_LOG_DUMP,
RTAS_FNIDX__IBM_SET_DYNAMIC_INDICATOR,
RTAS_FNIDX__IBM_SET_EEH_OPTION,
@@ -164,7 +164,7 @@ typedef struct {
#define RTAS_FN_IBM_READ_SLOT_RESET_STATE rtas_fn_handle(RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE)
#define RTAS_FN_IBM_READ_SLOT_RESET_STATE2 rtas_fn_handle(RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE2)
#define RTAS_FN_IBM_REMOVE_PE_DMA_WINDOW rtas_fn_handle(RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW)
-#define RTAS_FN_IBM_RESET_PE_DMA_WINDOWS rtas_fn_handle(RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOWS)
+#define RTAS_FN_IBM_RESET_PE_DMA_WINDOW rtas_fn_handle(RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW)
#define RTAS_FN_IBM_SCAN_LOG_DUMP rtas_fn_handle(RTAS_FNIDX__IBM_SCAN_LOG_DUMP)
#define RTAS_FN_IBM_SET_DYNAMIC_INDICATOR rtas_fn_handle(RTAS_FNIDX__IBM_SET_DYNAMIC_INDICATOR)
#define RTAS_FN_IBM_SET_EEH_OPTION rtas_fn_handle(RTAS_FNIDX__IBM_SET_EEH_OPTION)
diff --git a/arch/powerpc/include/asm/sections.h b/arch/powerpc/include/asm/sections.h
index ea26665..f43f3a6 100644
--- a/arch/powerpc/include/asm/sections.h
+++ b/arch/powerpc/include/asm/sections.h
@@ -14,6 +14,7 @@ typedef struct func_desc func_desc_t;
extern char __head_end[];
extern char __srwx_boundary[];
+extern char __exittext_begin[], __exittext_end[];
/* Patch sites */
extern s32 patch__call_flush_branch_caches1;
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index bf5dde1..15c5691 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -14,7 +14,7 @@
#ifdef __KERNEL__
-#ifdef CONFIG_KASAN
+#if defined(CONFIG_KASAN) && CONFIG_THREAD_SHIFT < 15
#define MIN_THREAD_SHIFT (CONFIG_THREAD_SHIFT + 1)
#else
#define MIN_THREAD_SHIFT CONFIG_THREAD_SHIFT
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index f1f9890..de10437 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -74,7 +74,7 @@ __pu_failed: \
/* -mprefixed can generate offsets beyond range, fall back hack */
#ifdef CONFIG_PPC_KERNEL_PREFIXED
#define __put_user_asm_goto(x, addr, label, op) \
- asm_volatile_goto( \
+ asm goto( \
"1: " op " %0,0(%1) # put_user\n" \
EX_TABLE(1b, %l2) \
: \
@@ -83,7 +83,7 @@ __pu_failed: \
: label)
#else
#define __put_user_asm_goto(x, addr, label, op) \
- asm_volatile_goto( \
+ asm goto( \
"1: " op "%U1%X1 %0,%1 # put_user\n" \
EX_TABLE(1b, %l2) \
: \
@@ -97,7 +97,7 @@ __pu_failed: \
__put_user_asm_goto(x, ptr, label, "std")
#else /* __powerpc64__ */
#define __put_user_asm2_goto(x, addr, label) \
- asm_volatile_goto( \
+ asm goto( \
"1: stw%X1 %0, %1\n" \
"2: stw%X1 %L0, %L1\n" \
EX_TABLE(1b, %l2) \
@@ -146,7 +146,7 @@ do { \
/* -mprefixed can generate offsets beyond range, fall back hack */
#ifdef CONFIG_PPC_KERNEL_PREFIXED
#define __get_user_asm_goto(x, addr, label, op) \
- asm_volatile_goto( \
+ asm_goto_output( \
"1: "op" %0,0(%1) # get_user\n" \
EX_TABLE(1b, %l2) \
: "=r" (x) \
@@ -155,7 +155,7 @@ do { \
: label)
#else
#define __get_user_asm_goto(x, addr, label, op) \
- asm_volatile_goto( \
+ asm_goto_output( \
"1: "op"%U1%X1 %0, %1 # get_user\n" \
EX_TABLE(1b, %l2) \
: "=r" (x) \
@@ -169,7 +169,7 @@ do { \
__get_user_asm_goto(x, addr, label, "ld")
#else /* __powerpc64__ */
#define __get_user_asm2_goto(x, addr, label) \
- asm_volatile_goto( \
+ asm_goto_output( \
"1: lwz%X1 %0, %1\n" \
"2: lwz%X1 %L0, %L1\n" \
EX_TABLE(1b, %l2) \
diff --git a/arch/powerpc/include/uapi/asm/papr-sysparm.h b/arch/powerpc/include/uapi/asm/papr-sysparm.h
index 9f9a0f2..f733467 100644
--- a/arch/powerpc/include/uapi/asm/papr-sysparm.h
+++ b/arch/powerpc/include/uapi/asm/papr-sysparm.h
@@ -14,7 +14,7 @@ enum {
struct papr_sysparm_io_block {
__u32 parameter;
__u16 length;
- char data[PAPR_SYSPARM_MAX_OUTPUT];
+ __u8 data[PAPR_SYSPARM_MAX_OUTPUT];
};
/**
diff --git a/arch/powerpc/kernel/cpu_setup_6xx.S b/arch/powerpc/kernel/cpu_setup_6xx.S
index f29ce3d..bfd3f442 100644
--- a/arch/powerpc/kernel/cpu_setup_6xx.S
+++ b/arch/powerpc/kernel/cpu_setup_6xx.S
@@ -26,6 +26,15 @@
bl __init_fpu_registers
END_FTR_SECTION_IFCLR(CPU_FTR_FPU_UNAVAILABLE)
bl setup_common_caches
+
+ /*
+ * This assumes that all cores using __setup_cpu_603 with
+ * MMU_FTR_USE_HIGH_BATS are G2_LE compatible
+ */
+BEGIN_MMU_FTR_SECTION
+ bl setup_g2_le_hid2
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
+
mtlr r5
blr
_GLOBAL(__setup_cpu_604)
@@ -115,6 +124,16 @@
blr
SYM_FUNC_END(setup_604_hid0)
+/* Enable high BATs for G2_LE and derivatives like e300cX */
+SYM_FUNC_START_LOCAL(setup_g2_le_hid2)
+ mfspr r11,SPRN_HID2_G2_LE
+ oris r11,r11,HID2_G2_LE_HBE@h
+ mtspr SPRN_HID2_G2_LE,r11
+ sync
+ isync
+ blr
+SYM_FUNC_END(setup_g2_le_hid2)
+
/* 7400 <= rev 2.7 and 7410 rev = 1.0 suffer from some
* erratas we work around here.
* Moto MPC710CE.pdf describes them, those are errata
@@ -495,4 +514,3 @@
mtcr r7
blr
_ASM_NOKPROBE_SYMBOL(__restore_cpu_setup)
-
diff --git a/arch/powerpc/kernel/cpu_specs_e500mc.h b/arch/powerpc/kernel/cpu_specs_e500mc.h
index ceb06b1..2ae8e9a 100644
--- a/arch/powerpc/kernel/cpu_specs_e500mc.h
+++ b/arch/powerpc/kernel/cpu_specs_e500mc.h
@@ -8,7 +8,8 @@
#ifdef CONFIG_PPC64
#define COMMON_USER_BOOKE (PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU | \
- PPC_FEATURE_HAS_FPU | PPC_FEATURE_64)
+ PPC_FEATURE_HAS_FPU | PPC_FEATURE_64 | \
+ PPC_FEATURE_BOOKE)
#else
#define COMMON_USER_BOOKE (PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU | \
PPC_FEATURE_BOOKE)
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index bd86370..1ad059a9 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -52,7 +52,8 @@
mr r10,r1
ld r1,PACAKSAVE(r13)
std r10,0(r1)
- std r11,_NIP(r1)
+ std r11,_LINK(r1)
+ std r11,_NIP(r1) /* Saved LR is also the next instruction */
std r12,_MSR(r1)
std r0,GPR0(r1)
std r10,GPR1(r1)
@@ -70,7 +71,6 @@
std r9,GPR13(r1)
SAVE_NVGPRS(r1)
std r11,_XER(r1)
- std r11,_LINK(r1)
std r11,_CTR(r1)
li r11,\trapnr
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index d71eac3..1185efe 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1289,8 +1289,10 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain,
struct iommu_table_group *table_group;
/* At first attach the ownership is already set */
- if (!domain)
+ if (!domain) {
+ iommu_group_put(grp);
return 0;
+ }
table_group = iommu_group_get_iommudata(grp);
/*
@@ -1358,7 +1360,7 @@ static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
struct pci_controller *hose;
if (!dev_is_pci(dev))
- return ERR_PTR(-EPERM);
+ return ERR_PTR(-ENODEV);
pdev = to_pci_dev(dev);
hose = pdev->bus->sysdata;
@@ -1407,6 +1409,21 @@ static const struct attribute_group *spapr_tce_iommu_groups[] = {
NULL,
};
+void ppc_iommu_register_device(struct pci_controller *phb)
+{
+ iommu_device_sysfs_add(&phb->iommu, phb->parent,
+ spapr_tce_iommu_groups, "iommu-phb%04x",
+ phb->global_number);
+ iommu_device_register(&phb->iommu, &spapr_tce_iommu_ops,
+ phb->parent);
+}
+
+void ppc_iommu_unregister_device(struct pci_controller *phb)
+{
+ iommu_device_unregister(&phb->iommu);
+ iommu_device_sysfs_remove(&phb->iommu);
+}
+
/*
* This registers IOMMU devices of PHBs. This needs to happen
* after core_initcall(iommu_init) + postcore_initcall(pci_driver_init) and
@@ -1417,11 +1434,7 @@ static int __init spapr_tce_setup_phb_iommus_initcall(void)
struct pci_controller *hose;
list_for_each_entry(hose, &hose_list, list_node) {
- iommu_device_sysfs_add(&hose->iommu, hose->parent,
- spapr_tce_iommu_groups, "iommu-phb%04x",
- hose->global_number);
- iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops,
- hose->parent);
+ ppc_iommu_register_device(hose);
}
return 0;
}
diff --git a/arch/powerpc/kernel/irq_64.c b/arch/powerpc/kernel/irq_64.c
index 938e668..d5c48d1 100644
--- a/arch/powerpc/kernel/irq_64.c
+++ b/arch/powerpc/kernel/irq_64.c
@@ -230,7 +230,7 @@ notrace __no_kcsan void arch_local_irq_restore(unsigned long mask)
* This allows interrupts to be unmasked without hard disabling, and
* also without new hard interrupts coming in ahead of pending ones.
*/
- asm_volatile_goto(
+ asm goto(
"1: \n"
" lbz 9,%0(13) \n"
" cmpwi 9,0 \n"
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 7e793b5..8064d9c 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -375,8 +375,13 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
[RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW] = {
.name = "ibm,remove-pe-dma-window",
},
- [RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOWS] = {
- .name = "ibm,reset-pe-dma-windows",
+ [RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW] = {
+ /*
+ * Note: PAPR+ v2.13 7.3.31.4.1 spells this as
+ * "ibm,reset-pe-dma-windows" (plural), but RTAS
+ * implementations use the singular form in practice.
+ */
+ .name = "ibm,reset-pe-dma-window",
},
[RTAS_FNIDX__IBM_SCAN_LOG_DUMP] = {
.name = "ibm,scan-log-dump",
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 8201062..d8d6b4f 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -27,10 +27,22 @@
#include <asm/ftrace.h>
#include <asm/syscall.h>
#include <asm/inst.h>
+#include <asm/sections.h>
#define NUM_FTRACE_TRAMPS 2
static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS];
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+ if (addr >= (unsigned long)__exittext_begin && addr < (unsigned long)__exittext_end)
+ return 0;
+
+ if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
+ addr += MCOUNT_INSN_SIZE;
+
+ return addr;
+}
+
static ppc_inst_t ftrace_create_branch_inst(unsigned long ip, unsigned long addr, int link)
{
ppc_inst_t op;
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 7b85c3b..12fab18 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -37,6 +37,11 @@
#define NUM_FTRACE_TRAMPS 8
static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS];
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+ return addr;
+}
+
static ppc_inst_t
ftrace_call_replace(unsigned long ip, unsigned long addr, int link)
{
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 1c5970d..f420df7 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -281,7 +281,9 @@
* to deal with references from __bug_table
*/
.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
+ __exittext_begin = .;
EXIT_TEXT
+ __exittext_end = .;
}
. = ALIGN(PAGE_SIZE);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 52427fc..0b92170 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -391,6 +391,24 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
/* Dummy value used in computing PCR value below */
#define PCR_ARCH_31 (PCR_ARCH_300 << 1)
+static inline unsigned long map_pcr_to_cap(unsigned long pcr)
+{
+ unsigned long cap = 0;
+
+ switch (pcr) {
+ case PCR_ARCH_300:
+ cap = H_GUEST_CAP_POWER9;
+ break;
+ case PCR_ARCH_31:
+ cap = H_GUEST_CAP_POWER10;
+ break;
+ default:
+ break;
+ }
+
+ return cap;
+}
+
static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
{
unsigned long host_pcr_bit = 0, guest_pcr_bit = 0, cap = 0;
@@ -424,11 +442,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
break;
case PVR_ARCH_300:
guest_pcr_bit = PCR_ARCH_300;
- cap = H_GUEST_CAP_POWER9;
break;
case PVR_ARCH_31:
guest_pcr_bit = PCR_ARCH_31;
- cap = H_GUEST_CAP_POWER10;
break;
default:
return -EINVAL;
@@ -440,6 +456,12 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
return -EINVAL;
if (kvmhv_on_pseries() && kvmhv_is_nestedv2()) {
+ /*
+ * 'arch_compat == 0' would mean the guest should default to
+ * L1's compatibility. In this case, the guest would pick
+ * host's PCR and evaluate the corresponding capabilities.
+ */
+ cap = map_pcr_to_cap(guest_pcr_bit);
if (!(cap & nested_capabilities))
return -EINVAL;
}
diff --git a/arch/powerpc/kvm/book3s_hv_nestedv2.c b/arch/powerpc/kvm/book3s_hv_nestedv2.c
index 5378eb4..8e6f535 100644
--- a/arch/powerpc/kvm/book3s_hv_nestedv2.c
+++ b/arch/powerpc/kvm/book3s_hv_nestedv2.c
@@ -138,6 +138,7 @@ static int gs_msg_ops_vcpu_fill_info(struct kvmppc_gs_buff *gsb,
vector128 v;
int rc, i;
u16 iden;
+ u32 arch_compat = 0;
vcpu = gsm->data;
@@ -347,8 +348,23 @@ static int gs_msg_ops_vcpu_fill_info(struct kvmppc_gs_buff *gsb,
break;
}
case KVMPPC_GSID_LOGICAL_PVR:
- rc = kvmppc_gse_put_u32(gsb, iden,
- vcpu->arch.vcore->arch_compat);
+ /*
+ * Though 'arch_compat == 0' would mean the default
+ * compatibility, arch_compat, being a Guest Wide
+ * Element, cannot be filled with a value of 0 in GSB
+ * as this would result into a kernel trap.
+ * Hence, when `arch_compat == 0`, arch_compat should
+ * default to L1's PVR.
+ */
+ if (!vcpu->arch.vcore->arch_compat) {
+ if (cpu_has_feature(CPU_FTR_ARCH_31))
+ arch_compat = PVR_ARCH_31;
+ else if (cpu_has_feature(CPU_FTR_ARCH_300))
+ arch_compat = PVR_ARCH_300;
+ } else {
+ arch_compat = vcpu->arch.vcore->arch_compat;
+ }
+ rc = kvmppc_gse_put_u32(gsb, iden, arch_compat);
break;
}
diff --git a/arch/powerpc/mm/kasan/init_32.c b/arch/powerpc/mm/kasan/init_32.c
index a70828a..aa9aa11 100644
--- a/arch/powerpc/mm/kasan/init_32.c
+++ b/arch/powerpc/mm/kasan/init_32.c
@@ -64,6 +64,7 @@ int __init __weak kasan_init_region(void *start, size_t size)
if (ret)
return ret;
+ k_start = k_start & PAGE_MASK;
block = memblock_alloc(k_end - k_start, PAGE_SIZE);
if (!block)
return -ENOMEM;
diff --git a/arch/powerpc/platforms/85xx/mpc8536_ds.c b/arch/powerpc/platforms/85xx/mpc8536_ds.c
index e966b2a..b3327a3 100644
--- a/arch/powerpc/platforms/85xx/mpc8536_ds.c
+++ b/arch/powerpc/platforms/85xx/mpc8536_ds.c
@@ -27,7 +27,7 @@
#include "mpc85xx.h"
-void __init mpc8536_ds_pic_init(void)
+static void __init mpc8536_ds_pic_init(void)
{
struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN,
0, 256, " OpenPIC ");
diff --git a/arch/powerpc/platforms/85xx/mvme2500.c b/arch/powerpc/platforms/85xx/mvme2500.c
index 1b59e45..19122da 100644
--- a/arch/powerpc/platforms/85xx/mvme2500.c
+++ b/arch/powerpc/platforms/85xx/mvme2500.c
@@ -21,7 +21,7 @@
#include "mpc85xx.h"
-void __init mvme2500_pic_init(void)
+static void __init mvme2500_pic_init(void)
{
struct mpic *mpic = mpic_alloc(NULL, 0,
MPIC_BIG_ENDIAN | MPIC_SINGLE_DEST_CPU,
diff --git a/arch/powerpc/platforms/85xx/p1010rdb.c b/arch/powerpc/platforms/85xx/p1010rdb.c
index 10d6f1f..491895a 100644
--- a/arch/powerpc/platforms/85xx/p1010rdb.c
+++ b/arch/powerpc/platforms/85xx/p1010rdb.c
@@ -24,7 +24,7 @@
#include "mpc85xx.h"
-void __init p1010_rdb_pic_init(void)
+static void __init p1010_rdb_pic_init(void)
{
struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN |
MPIC_SINGLE_DEST_CPU,
diff --git a/arch/powerpc/platforms/85xx/p1022_ds.c b/arch/powerpc/platforms/85xx/p1022_ds.c
index 0dd786a..adc3a2e 100644
--- a/arch/powerpc/platforms/85xx/p1022_ds.c
+++ b/arch/powerpc/platforms/85xx/p1022_ds.c
@@ -370,7 +370,7 @@ static void p1022ds_set_monitor_port(enum fsl_diu_monitor_port port)
*
* @pixclock: the wavelength, in picoseconds, of the clock
*/
-void p1022ds_set_pixel_clock(unsigned int pixclock)
+static void p1022ds_set_pixel_clock(unsigned int pixclock)
{
struct device_node *guts_np = NULL;
struct ccsr_guts __iomem *guts;
@@ -418,7 +418,7 @@ void p1022ds_set_pixel_clock(unsigned int pixclock)
/**
* p1022ds_valid_monitor_port: set the monitor port for sysfs
*/
-enum fsl_diu_monitor_port
+static enum fsl_diu_monitor_port
p1022ds_valid_monitor_port(enum fsl_diu_monitor_port port)
{
switch (port) {
@@ -432,7 +432,7 @@ p1022ds_valid_monitor_port(enum fsl_diu_monitor_port port)
#endif
-void __init p1022_ds_pic_init(void)
+static void __init p1022_ds_pic_init(void)
{
struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN |
MPIC_SINGLE_DEST_CPU,
diff --git a/arch/powerpc/platforms/85xx/p1022_rdk.c b/arch/powerpc/platforms/85xx/p1022_rdk.c
index 25ab6e9..6198299 100644
--- a/arch/powerpc/platforms/85xx/p1022_rdk.c
+++ b/arch/powerpc/platforms/85xx/p1022_rdk.c
@@ -40,7 +40,7 @@
*
* @pixclock: the wavelength, in picoseconds, of the clock
*/
-void p1022rdk_set_pixel_clock(unsigned int pixclock)
+static void p1022rdk_set_pixel_clock(unsigned int pixclock)
{
struct device_node *guts_np = NULL;
struct ccsr_guts __iomem *guts;
@@ -88,7 +88,7 @@ void p1022rdk_set_pixel_clock(unsigned int pixclock)
/**
* p1022rdk_valid_monitor_port: set the monitor port for sysfs
*/
-enum fsl_diu_monitor_port
+static enum fsl_diu_monitor_port
p1022rdk_valid_monitor_port(enum fsl_diu_monitor_port port)
{
return FSL_DIU_PORT_DVI;
@@ -96,7 +96,7 @@ p1022rdk_valid_monitor_port(enum fsl_diu_monitor_port port)
#endif
-void __init p1022_rdk_pic_init(void)
+static void __init p1022_rdk_pic_init(void)
{
struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN |
MPIC_SINGLE_DEST_CPU,
diff --git a/arch/powerpc/platforms/85xx/socrates_fpga_pic.c b/arch/powerpc/platforms/85xx/socrates_fpga_pic.c
index baa12ef..60e0b89 100644
--- a/arch/powerpc/platforms/85xx/socrates_fpga_pic.c
+++ b/arch/powerpc/platforms/85xx/socrates_fpga_pic.c
@@ -8,6 +8,8 @@
#include <linux/of_irq.h>
#include <linux/io.h>
+#include "socrates_fpga_pic.h"
+
/*
* The FPGA supports 9 interrupt sources, which can be routed to 3
* interrupt request lines of the MPIC. The line to be used can be
diff --git a/arch/powerpc/platforms/85xx/xes_mpc85xx.c b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
index 45f257f..2582427 100644
--- a/arch/powerpc/platforms/85xx/xes_mpc85xx.c
+++ b/arch/powerpc/platforms/85xx/xes_mpc85xx.c
@@ -37,7 +37,7 @@
#define MPC85xx_L2CTL_L2I 0x40000000 /* L2 flash invalidate */
#define MPC85xx_L2CTL_L2SIZ_MASK 0x30000000 /* L2 SRAM size (R/O) */
-void __init xes_mpc85xx_pic_init(void)
+static void __init xes_mpc85xx_pic_init(void)
{
struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN,
0, 256, " OpenPIC ");
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 496e16c..e8c4129 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -574,29 +574,6 @@ static void iommu_table_setparms(struct pci_controller *phb,
struct iommu_table_ops iommu_table_lpar_multi_ops;
-/*
- * iommu_table_setparms_lpar
- *
- * Function: On pSeries LPAR systems, return TCE table info, given a pci bus.
- */
-static void iommu_table_setparms_lpar(struct pci_controller *phb,
- struct device_node *dn,
- struct iommu_table *tbl,
- struct iommu_table_group *table_group,
- const __be32 *dma_window)
-{
- unsigned long offset, size, liobn;
-
- of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
-
- iommu_table_setparms_common(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, NULL,
- &iommu_table_lpar_multi_ops);
-
-
- table_group->tce32_start = offset;
- table_group->tce32_size = size;
-}
-
struct iommu_table_ops iommu_table_pseries_ops = {
.set = tce_build_pSeries,
.clear = tce_free_pSeries,
@@ -724,26 +701,71 @@ struct iommu_table_ops iommu_table_lpar_multi_ops = {
* dynamic 64bit DMA window, walking up the device tree.
*/
static struct device_node *pci_dma_find(struct device_node *dn,
- const __be32 **dma_window)
+ struct dynamic_dma_window_prop *prop)
{
- const __be32 *dw = NULL;
+ const __be32 *default_prop = NULL;
+ const __be32 *ddw_prop = NULL;
+ struct device_node *rdn = NULL;
+ bool default_win = false, ddw_win = false;
for ( ; dn && PCI_DN(dn); dn = dn->parent) {
- dw = of_get_property(dn, "ibm,dma-window", NULL);
- if (dw) {
- if (dma_window)
- *dma_window = dw;
- return dn;
+ default_prop = of_get_property(dn, "ibm,dma-window", NULL);
+ if (default_prop) {
+ rdn = dn;
+ default_win = true;
}
- dw = of_get_property(dn, DIRECT64_PROPNAME, NULL);
- if (dw)
- return dn;
- dw = of_get_property(dn, DMA64_PROPNAME, NULL);
- if (dw)
- return dn;
+ ddw_prop = of_get_property(dn, DIRECT64_PROPNAME, NULL);
+ if (ddw_prop) {
+ rdn = dn;
+ ddw_win = true;
+ break;
+ }
+ ddw_prop = of_get_property(dn, DMA64_PROPNAME, NULL);
+ if (ddw_prop) {
+ rdn = dn;
+ ddw_win = true;
+ break;
+ }
+
+ /* At least found default window, which is the case for normal boot */
+ if (default_win)
+ break;
}
- return NULL;
+ /* For PCI devices there will always be a DMA window, either on the device
+ * or parent bus
+ */
+ WARN_ON(!(default_win | ddw_win));
+
+ /* caller doesn't want to get DMA window property */
+ if (!prop)
+ return rdn;
+
+ /* parse DMA window property. During normal system boot, only default
+ * DMA window is passed in OF. But, for kdump, a dedicated adapter might
+ * have both default and DDW in FDT. In this scenario, DDW takes precedence
+ * over default window.
+ */
+ if (ddw_win) {
+ struct dynamic_dma_window_prop *p;
+
+ p = (struct dynamic_dma_window_prop *)ddw_prop;
+ prop->liobn = p->liobn;
+ prop->dma_base = p->dma_base;
+ prop->tce_shift = p->tce_shift;
+ prop->window_shift = p->window_shift;
+ } else if (default_win) {
+ unsigned long offset, size, liobn;
+
+ of_parse_dma_window(rdn, default_prop, &liobn, &offset, &size);
+
+ prop->liobn = cpu_to_be32((u32)liobn);
+ prop->dma_base = cpu_to_be64(offset);
+ prop->tce_shift = cpu_to_be32(IOMMU_PAGE_SHIFT_4K);
+ prop->window_shift = cpu_to_be32(order_base_2(size));
+ }
+
+ return rdn;
}
static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
@@ -751,17 +773,20 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
struct iommu_table *tbl;
struct device_node *dn, *pdn;
struct pci_dn *ppci;
- const __be32 *dma_window = NULL;
+ struct dynamic_dma_window_prop prop;
dn = pci_bus_to_OF_node(bus);
pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n",
dn);
- pdn = pci_dma_find(dn, &dma_window);
+ pdn = pci_dma_find(dn, &prop);
- if (dma_window == NULL)
- pr_debug(" no ibm,dma-window property !\n");
+ /* In PPC architecture, there will always be DMA window on bus or one of the
+ * parent bus. During reboot, there will be ibm,dma-window property to
+ * define DMA window. For kdump, there will at least be default window or DDW
+ * or both.
+ */
ppci = PCI_DN(pdn);
@@ -771,13 +796,24 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
if (!ppci->table_group) {
ppci->table_group = iommu_pseries_alloc_group(ppci->phb->node);
tbl = ppci->table_group->tables[0];
- if (dma_window) {
- iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
- ppci->table_group, dma_window);
- if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
- panic("Failed to initialize iommu table");
- }
+ iommu_table_setparms_common(tbl, ppci->phb->bus->number,
+ be32_to_cpu(prop.liobn),
+ be64_to_cpu(prop.dma_base),
+ 1ULL << be32_to_cpu(prop.window_shift),
+ be32_to_cpu(prop.tce_shift), NULL,
+ &iommu_table_lpar_multi_ops);
+
+ /* Only for normal boot with default window. Doesn't matter even
+ * if we set these with DDW which is 64bit during kdump, since
+ * these will not be used during kdump.
+ */
+ ppci->table_group->tce32_start = be64_to_cpu(prop.dma_base);
+ ppci->table_group->tce32_size = 1 << be32_to_cpu(prop.window_shift);
+
+ if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
+ panic("Failed to initialize iommu table");
+
iommu_register_group(ppci->table_group,
pci_domain_nr(bus), 0);
pr_debug(" created table: %p\n", ppci->table_group);
@@ -968,6 +1004,12 @@ static void find_existing_ddw_windows_named(const char *name)
continue;
}
+ /* If at the time of system initialization, there are DDWs in OF,
+ * it means this is during kexec. DDW could be direct or dynamic.
+ * We will just mark DDWs as "dynamic" since this is kdump path,
+ * no need to worry about perforance. ddw_list_new_entry() will
+ * set window->direct = false.
+ */
window = ddw_list_new_entry(pdn, dma64);
if (!window) {
of_node_put(pdn);
@@ -1524,8 +1566,8 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
{
struct device_node *pdn, *dn;
struct iommu_table *tbl;
- const __be32 *dma_window = NULL;
struct pci_dn *pci;
+ struct dynamic_dma_window_prop prop;
pr_debug("pci_dma_dev_setup_pSeriesLP: %s\n", pci_name(dev));
@@ -1538,7 +1580,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
dn = pci_device_to_OF_node(dev);
pr_debug(" node is %pOF\n", dn);
- pdn = pci_dma_find(dn, &dma_window);
+ pdn = pci_dma_find(dn, &prop);
if (!pdn || !PCI_DN(pdn)) {
printk(KERN_WARNING "pci_dma_dev_setup_pSeriesLP: "
"no DMA window found for pci dev=%s dn=%pOF\n",
@@ -1551,8 +1593,20 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
if (!pci->table_group) {
pci->table_group = iommu_pseries_alloc_group(pci->phb->node);
tbl = pci->table_group->tables[0];
- iommu_table_setparms_lpar(pci->phb, pdn, tbl,
- pci->table_group, dma_window);
+
+ iommu_table_setparms_common(tbl, pci->phb->bus->number,
+ be32_to_cpu(prop.liobn),
+ be64_to_cpu(prop.dma_base),
+ 1ULL << be32_to_cpu(prop.window_shift),
+ be32_to_cpu(prop.tce_shift), NULL,
+ &iommu_table_lpar_multi_ops);
+
+ /* Only for normal boot with default window. Doesn't matter even
+ * if we set these with DDW which is 64bit during kdump, since
+ * these will not be used during kdump.
+ */
+ pci->table_group->tce32_start = be64_to_cpu(prop.dma_base);
+ pci->table_group->tce32_size = 1 << be32_to_cpu(prop.window_shift);
iommu_init_table(tbl, pci->phb->node, 0, 0);
iommu_register_group(pci->table_group,
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 4561667..4e9916b 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -662,8 +662,12 @@ u64 pseries_paravirt_steal_clock(int cpu)
{
struct lppaca *lppaca = &lppaca_of(cpu);
- return be64_to_cpu(READ_ONCE(lppaca->enqueue_dispatch_tb)) +
- be64_to_cpu(READ_ONCE(lppaca->ready_enqueue_tb));
+ /*
+ * VPA steal time counters are reported at TB frequency. Hence do a
+ * conversion to ns before returning
+ */
+ return tb_to_ns(be64_to_cpu(READ_ONCE(lppaca->enqueue_dispatch_tb)) +
+ be64_to_cpu(READ_ONCE(lppaca->ready_enqueue_tb)));
}
#endif
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 4ba8245..4448386 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -35,6 +35,8 @@ struct pci_controller *init_phb_dynamic(struct device_node *dn)
pseries_msi_allocate_domains(phb);
+ ppc_iommu_register_device(phb);
+
/* Create EEH devices for the PHB */
eeh_phb_pe_create(phb);
@@ -76,6 +78,8 @@ int remove_phb_dynamic(struct pci_controller *phb)
}
}
+ ppc_iommu_unregister_device(phb);
+
pseries_msi_free_domains(phb);
/* Keep a reference so phb isn't freed yet */
diff --git a/arch/powerpc/sysdev/udbg_memcons.c b/arch/powerpc/sysdev/udbg_memcons.c
index 5020044..4de57ba 100644
--- a/arch/powerpc/sysdev/udbg_memcons.c
+++ b/arch/powerpc/sysdev/udbg_memcons.c
@@ -41,7 +41,7 @@ struct memcons memcons = {
.input_end = &memcons_input[CONFIG_PPC_MEMCONS_INPUT_SIZE],
};
-void memcons_putc(char c)
+static void memcons_putc(char c)
{
char *new_output_pos;
@@ -54,7 +54,7 @@ void memcons_putc(char c)
memcons.output_pos = new_output_pos;
}
-int memcons_getc_poll(void)
+static int memcons_getc_poll(void)
{
char c;
char *new_input_pos;
@@ -77,7 +77,7 @@ int memcons_getc_poll(void)
return -1;
}
-int memcons_getc(void)
+static int memcons_getc(void)
{
int c;
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index bffbd86..e3142ce 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -315,7 +315,6 @@
# https://reviews.llvm.org/D123515
def_bool y
depends on $(as-instr, .option arch$(comma) +m)
- depends on !$(as-instr, .option arch$(comma) -i)
source "arch/riscv/Kconfig.socs"
source "arch/riscv/Kconfig.errata"
diff --git a/arch/riscv/boot/dts/sifive/hifive-unmatched-a00.dts b/arch/riscv/boot/dts/sifive/hifive-unmatched-a00.dts
index 07387f9..72b87b0 100644
--- a/arch/riscv/boot/dts/sifive/hifive-unmatched-a00.dts
+++ b/arch/riscv/boot/dts/sifive/hifive-unmatched-a00.dts
@@ -123,6 +123,7 @@ pmic@58 {
interrupt-parent = <&gpio>;
interrupts = <1 IRQ_TYPE_LEVEL_LOW>;
interrupt-controller;
+ #interrupt-cells = <2>;
onkey {
compatible = "dlg,da9063-onkey";
diff --git a/arch/riscv/boot/dts/starfive/jh7100.dtsi b/arch/riscv/boot/dts/starfive/jh7100.dtsi
index c216aae..8bcf36d 100644
--- a/arch/riscv/boot/dts/starfive/jh7100.dtsi
+++ b/arch/riscv/boot/dts/starfive/jh7100.dtsi
@@ -96,14 +96,14 @@ cpu-thermal {
thermal-sensors = <&sfctemp>;
trips {
- cpu_alert0 {
+ cpu-alert0 {
/* milliCelsius */
temperature = <75000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit {
+ cpu-crit {
/* milliCelsius */
temperature = <90000>;
hysteresis = <2000>;
@@ -113,28 +113,28 @@ cpu_crit {
};
};
- osc_sys: osc_sys {
+ osc_sys: osc-sys {
compatible = "fixed-clock";
#clock-cells = <0>;
/* This value must be overridden by the board */
clock-frequency = <0>;
};
- osc_aud: osc_aud {
+ osc_aud: osc-aud {
compatible = "fixed-clock";
#clock-cells = <0>;
/* This value must be overridden by the board */
clock-frequency = <0>;
};
- gmac_rmii_ref: gmac_rmii_ref {
+ gmac_rmii_ref: gmac-rmii-ref {
compatible = "fixed-clock";
#clock-cells = <0>;
/* Should be overridden by the board when needed */
clock-frequency = <0>;
};
- gmac_gr_mii_rxclk: gmac_gr_mii_rxclk {
+ gmac_gr_mii_rxclk: gmac-gr-mii-rxclk {
compatible = "fixed-clock";
#clock-cells = <0>;
/* Should be overridden by the board when needed */
diff --git a/arch/riscv/boot/dts/starfive/jh7110.dtsi b/arch/riscv/boot/dts/starfive/jh7110.dtsi
index 45213cd..74ed3b9 100644
--- a/arch/riscv/boot/dts/starfive/jh7110.dtsi
+++ b/arch/riscv/boot/dts/starfive/jh7110.dtsi
@@ -237,14 +237,14 @@ map0 {
};
trips {
- cpu_alert0: cpu_alert0 {
+ cpu_alert0: cpu-alert0 {
/* milliCelsius */
temperature = <85000>;
hysteresis = <2000>;
type = "passive";
};
- cpu_crit {
+ cpu-crit {
/* milliCelsius */
temperature = <100000>;
hysteresis = <2000>;
diff --git a/arch/riscv/include/asm/arch_hweight.h b/arch/riscv/include/asm/arch_hweight.h
index c20236a..85b2c44 100644
--- a/arch/riscv/include/asm/arch_hweight.h
+++ b/arch/riscv/include/asm/arch_hweight.h
@@ -20,7 +20,7 @@
static __always_inline unsigned int __arch_hweight32(unsigned int w)
{
#ifdef CONFIG_RISCV_ISA_ZBB
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
@@ -51,7 +51,7 @@ static inline unsigned int __arch_hweight8(unsigned int w)
static __always_inline unsigned long __arch_hweight64(__u64 w)
{
# ifdef CONFIG_RISCV_ISA_ZBB
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
index 9ffc355..329d824 100644
--- a/arch/riscv/include/asm/bitops.h
+++ b/arch/riscv/include/asm/bitops.h
@@ -39,7 +39,7 @@ static __always_inline unsigned long variable__ffs(unsigned long word)
{
int num;
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
@@ -95,7 +95,7 @@ static __always_inline unsigned long variable__fls(unsigned long word)
{
int num;
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
@@ -154,7 +154,7 @@ static __always_inline int variable_ffs(int x)
if (!x)
return 0;
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
@@ -209,7 +209,7 @@ static __always_inline int variable_fls(unsigned int x)
if (!x)
return 0;
- asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
: : : : legacy);
diff --git a/arch/riscv/include/asm/checksum.h b/arch/riscv/include/asm/checksum.h
index a5b60b5..88e6f14 100644
--- a/arch/riscv/include/asm/checksum.h
+++ b/arch/riscv/include/asm/checksum.h
@@ -53,7 +53,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
unsigned long fold_temp;
- asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
:
:
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 5a626ed..0bd1186 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -80,7 +80,7 @@ riscv_has_extension_likely(const unsigned long ext)
"ext must be < RISCV_ISA_EXT_MAX");
if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
- asm_volatile_goto(
+ asm goto(
ALTERNATIVE("j %l[l_no]", "nop", 0, %[ext], 1)
:
: [ext] "i" (ext)
@@ -103,7 +103,7 @@ riscv_has_extension_unlikely(const unsigned long ext)
"ext must be < RISCV_ISA_EXT_MAX");
if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
- asm_volatile_goto(
+ asm goto(
ALTERNATIVE("nop", "j %l[l_yes]", 0, %[ext], 1)
:
: [ext] "i" (ext)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 5100140..2468c559 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -424,6 +424,7 @@
# define CSR_STATUS CSR_MSTATUS
# define CSR_IE CSR_MIE
# define CSR_TVEC CSR_MTVEC
+# define CSR_ENVCFG CSR_MENVCFG
# define CSR_SCRATCH CSR_MSCRATCH
# define CSR_EPC CSR_MEPC
# define CSR_CAUSE CSR_MCAUSE
@@ -448,6 +449,7 @@
# define CSR_STATUS CSR_SSTATUS
# define CSR_IE CSR_SIE
# define CSR_TVEC CSR_STVEC
+# define CSR_ENVCFG CSR_SENVCFG
# define CSR_SCRATCH CSR_SSCRATCH
# define CSR_EPC CSR_SEPC
# define CSR_CAUSE CSR_SCAUSE
diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 3291721..15055f9 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -25,6 +25,11 @@
#define ARCH_SUPPORTS_FTRACE_OPS 1
#ifndef __ASSEMBLY__
+
+extern void *return_address(unsigned int level);
+
+#define ftrace_return_address(n) return_address(n)
+
void MCOUNT_NAME(void);
static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 4c5b0e9..22deb7a 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -11,6 +11,11 @@ static inline void arch_clear_hugepage_flags(struct page *page)
}
#define arch_clear_hugepage_flags arch_clear_hugepage_flags
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+bool arch_hugetlb_migration_supported(struct hstate *h);
+#define arch_hugetlb_migration_supported arch_hugetlb_migration_supported
+#endif
+
#ifdef CONFIG_RISCV_ISA_SVNAPOT
#define __HAVE_ARCH_HUGE_PTE_CLEAR
void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 5340f81..1f2d259 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -81,6 +81,8 @@
#define RISCV_ISA_EXT_ZTSO 72
#define RISCV_ISA_EXT_ZACAS 73
+#define RISCV_ISA_EXT_XLINUXENVCFG 127
+
#define RISCV_ISA_EXT_MAX 128
#define RISCV_ISA_EXT_INVALID U32_MAX
diff --git a/arch/riscv/include/asm/jump_label.h b/arch/riscv/include/asm/jump_label.h
index 14a5ea8..4a35d78 100644
--- a/arch/riscv/include/asm/jump_label.h
+++ b/arch/riscv/include/asm/jump_label.h
@@ -17,7 +17,7 @@
static __always_inline bool arch_static_branch(struct static_key * const key,
const bool branch)
{
- asm_volatile_goto(
+ asm goto(
" .align 2 \n\t"
" .option push \n\t"
" .option norelax \n\t"
@@ -39,7 +39,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key,
static __always_inline bool arch_static_branch_jump(struct static_key * const key,
const bool branch)
{
- asm_volatile_goto(
+ asm goto(
" .align 2 \n\t"
" .option push \n\t"
" .option norelax \n\t"
diff --git a/arch/riscv/include/asm/pgalloc.h b/arch/riscv/include/asm/pgalloc.h
index d169a4f..c80bb99 100644
--- a/arch/riscv/include/asm/pgalloc.h
+++ b/arch/riscv/include/asm/pgalloc.h
@@ -95,7 +95,13 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
__pud_free(mm, pud);
}
-#define __pud_free_tlb(tlb, pud, addr) pud_free((tlb)->mm, pud)
+#define __pud_free_tlb(tlb, pud, addr) \
+do { \
+ if (pgtable_l4_enabled) { \
+ pagetable_pud_dtor(virt_to_ptdesc(pud)); \
+ tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \
+ } \
+} while (0)
#define p4d_alloc_one p4d_alloc_one
static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
@@ -124,7 +130,11 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
__p4d_free(mm, p4d);
}
-#define __p4d_free_tlb(tlb, p4d, addr) p4d_free((tlb)->mm, p4d)
+#define __p4d_free_tlb(tlb, p4d, addr) \
+do { \
+ if (pgtable_l5_enabled) \
+ tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \
+} while (0)
#endif /* __PAGETABLE_PMD_FOLDED */
static inline void sync_kernel_mappings(pgd_t *pgd)
@@ -149,7 +159,11 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
#ifndef __PAGETABLE_PMD_FOLDED
-#define __pmd_free_tlb(tlb, pmd, addr) pmd_free((tlb)->mm, pmd)
+#define __pmd_free_tlb(tlb, pmd, addr) \
+do { \
+ pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \
+ tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
+} while (0)
#endif /* __PAGETABLE_PMD_FOLDED */
diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index b42017d..b99bd66 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -136,7 +136,7 @@ enum napot_cont_order {
* 10010 - IO Strongly-ordered, Non-cacheable, Non-bufferable, Shareable, Non-trustable
*/
#define _PAGE_PMA_THEAD ((1UL << 62) | (1UL << 61) | (1UL << 60))
-#define _PAGE_NOCACHE_THEAD ((1UL < 61) | (1UL << 60))
+#define _PAGE_NOCACHE_THEAD ((1UL << 61) | (1UL << 60))
#define _PAGE_IO_THEAD ((1UL << 63) | (1UL << 60))
#define _PAGE_MTMASK_THEAD (_PAGE_PMA_THEAD | _PAGE_IO_THEAD | (1UL << 59))
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 0c94260..6066822 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -84,7 +84,7 @@
* Define vmemmap for pfn_to_page & page_to_pfn calls. Needed if kernel
* is configured with CONFIG_SPARSEMEM_VMEMMAP enabled.
*/
-#define vmemmap ((struct page *)VMEMMAP_START)
+#define vmemmap ((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT))
#define PCI_IO_SIZE SZ_16M
#define PCI_IO_END VMEMMAP_START
@@ -439,6 +439,10 @@ static inline pte_t pte_mkhuge(pte_t pte)
return pte;
}
+#define pte_leaf_size(pte) (pte_napot(pte) ? \
+ napot_cont_size(napot_cont_order(pte)) :\
+ PAGE_SIZE)
+
#ifdef CONFIG_NUMA_BALANCING
/*
* See the comment in include/asm-generic/pgtable.h
diff --git a/arch/riscv/include/asm/stacktrace.h b/arch/riscv/include/asm/stacktrace.h
index f7e8ef2..b1495a7 100644
--- a/arch/riscv/include/asm/stacktrace.h
+++ b/arch/riscv/include/asm/stacktrace.h
@@ -21,4 +21,9 @@ static inline bool on_thread_stack(void)
return !(((unsigned long)(current->stack) ^ current_stack_pointer) & ~(THREAD_SIZE - 1));
}
+
+#ifdef CONFIG_VMAP_STACK
+DECLARE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack);
+#endif /* CONFIG_VMAP_STACK */
+
#endif /* _ASM_RISCV_STACKTRACE_H */
diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
index 02f8786..491296a 100644
--- a/arch/riscv/include/asm/suspend.h
+++ b/arch/riscv/include/asm/suspend.h
@@ -14,6 +14,7 @@ struct suspend_context {
struct pt_regs regs;
/* Saved and restored by high-level functions */
unsigned long scratch;
+ unsigned long envcfg;
unsigned long tvec;
unsigned long ie;
#ifdef CONFIG_MMU
diff --git a/arch/riscv/include/asm/tlb.h b/arch/riscv/include/asm/tlb.h
index 1eb5682..50b63b5 100644
--- a/arch/riscv/include/asm/tlb.h
+++ b/arch/riscv/include/asm/tlb.h
@@ -16,7 +16,7 @@ static void tlb_flush(struct mmu_gather *tlb);
static inline void tlb_flush(struct mmu_gather *tlb)
{
#ifdef CONFIG_MMU
- if (tlb->fullmm || tlb->need_flush_all)
+ if (tlb->fullmm || tlb->need_flush_all || tlb->freed_tables)
flush_tlb_mm(tlb->mm);
else
flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 928f096..4112cc8 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -75,6 +75,7 @@ static inline void flush_tlb_kernel_range(unsigned long start,
#define flush_tlb_mm(mm) flush_tlb_all()
#define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
+#define local_flush_tlb_kernel_range(start, end) flush_tlb_all()
#endif /* !CONFIG_SMP || !CONFIG_MMU */
#endif /* _ASM_RISCV_TLBFLUSH_H */
diff --git a/arch/riscv/include/asm/vmalloc.h b/arch/riscv/include/asm/vmalloc.h
index 924d01b..51f6dfe 100644
--- a/arch/riscv/include/asm/vmalloc.h
+++ b/arch/riscv/include/asm/vmalloc.h
@@ -19,65 +19,6 @@ static inline bool arch_vmap_pmd_supported(pgprot_t prot)
return true;
}
-#ifdef CONFIG_RISCV_ISA_SVNAPOT
-#include <linux/pgtable.h>
+#endif
-#define arch_vmap_pte_range_map_size arch_vmap_pte_range_map_size
-static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr, unsigned long end,
- u64 pfn, unsigned int max_page_shift)
-{
- unsigned long map_size = PAGE_SIZE;
- unsigned long size, order;
-
- if (!has_svnapot())
- return map_size;
-
- for_each_napot_order_rev(order) {
- if (napot_cont_shift(order) > max_page_shift)
- continue;
-
- size = napot_cont_size(order);
- if (end - addr < size)
- continue;
-
- if (!IS_ALIGNED(addr, size))
- continue;
-
- if (!IS_ALIGNED(PFN_PHYS(pfn), size))
- continue;
-
- map_size = size;
- break;
- }
-
- return map_size;
-}
-
-#define arch_vmap_pte_supported_shift arch_vmap_pte_supported_shift
-static inline int arch_vmap_pte_supported_shift(unsigned long size)
-{
- int shift = PAGE_SHIFT;
- unsigned long order;
-
- if (!has_svnapot())
- return shift;
-
- WARN_ON_ONCE(size >= PMD_SIZE);
-
- for_each_napot_order_rev(order) {
- if (napot_cont_size(order) > size)
- continue;
-
- if (!IS_ALIGNED(size, napot_cont_size(order)))
- continue;
-
- shift = napot_cont_shift(order);
- break;
- }
-
- return shift;
-}
-
-#endif /* CONFIG_RISCV_ISA_SVNAPOT */
-#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
#endif /* _ASM_RISCV_VMALLOC_H */
diff --git a/arch/riscv/include/uapi/asm/kvm.h b/arch/riscv/include/uapi/asm/kvm.h
index d6b7a5b..7499e88 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -139,6 +139,33 @@ enum KVM_RISCV_ISA_EXT_ID {
KVM_RISCV_ISA_EXT_ZIHPM,
KVM_RISCV_ISA_EXT_SMSTATEEN,
KVM_RISCV_ISA_EXT_ZICOND,
+ KVM_RISCV_ISA_EXT_ZBC,
+ KVM_RISCV_ISA_EXT_ZBKB,
+ KVM_RISCV_ISA_EXT_ZBKC,
+ KVM_RISCV_ISA_EXT_ZBKX,
+ KVM_RISCV_ISA_EXT_ZKND,
+ KVM_RISCV_ISA_EXT_ZKNE,
+ KVM_RISCV_ISA_EXT_ZKNH,
+ KVM_RISCV_ISA_EXT_ZKR,
+ KVM_RISCV_ISA_EXT_ZKSED,
+ KVM_RISCV_ISA_EXT_ZKSH,
+ KVM_RISCV_ISA_EXT_ZKT,
+ KVM_RISCV_ISA_EXT_ZVBB,
+ KVM_RISCV_ISA_EXT_ZVBC,
+ KVM_RISCV_ISA_EXT_ZVKB,
+ KVM_RISCV_ISA_EXT_ZVKG,
+ KVM_RISCV_ISA_EXT_ZVKNED,
+ KVM_RISCV_ISA_EXT_ZVKNHA,
+ KVM_RISCV_ISA_EXT_ZVKNHB,
+ KVM_RISCV_ISA_EXT_ZVKSED,
+ KVM_RISCV_ISA_EXT_ZVKSH,
+ KVM_RISCV_ISA_EXT_ZVKT,
+ KVM_RISCV_ISA_EXT_ZFH,
+ KVM_RISCV_ISA_EXT_ZFHMIN,
+ KVM_RISCV_ISA_EXT_ZIHINTNTL,
+ KVM_RISCV_ISA_EXT_ZVFH,
+ KVM_RISCV_ISA_EXT_ZVFHMIN,
+ KVM_RISCV_ISA_EXT_ZFA,
KVM_RISCV_ISA_EXT_MAX,
};
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index f719107..604d6bf 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -7,6 +7,7 @@
CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_patch.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_sbi.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = $(CC_FLAGS_FTRACE)
endif
CFLAGS_syscall_table.o += $(call cc-option,-Wno-override-init,)
CFLAGS_compat_syscall_table.o += $(call cc-option,-Wno-override-init,)
@@ -46,6 +47,7 @@
obj-y += process.o
obj-y += ptrace.o
obj-y += reset.o
+obj-y += return_address.o
obj-y += setup.o
obj-y += signal.o
obj-y += syscall_table.o
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 89920f8..79a5a35f 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -24,6 +24,7 @@
#include <asm/hwprobe.h>
#include <asm/patch.h>
#include <asm/processor.h>
+#include <asm/sbi.h>
#include <asm/vector.h>
#include "copy-unaligned.h"
@@ -202,6 +203,16 @@ static const unsigned int riscv_zvbb_exts[] = {
};
/*
+ * While the [ms]envcfg CSRs were not defined until version 1.12 of the RISC-V
+ * privileged ISA, the existence of the CSRs is implied by any extension which
+ * specifies [ms]envcfg bit(s). Hence, we define a custom ISA extension for the
+ * existence of the CSR, and treat it as a subset of those other extensions.
+ */
+static const unsigned int riscv_xlinuxenvcfg_exts[] = {
+ RISCV_ISA_EXT_XLINUXENVCFG
+};
+
+/*
* The canonical order of ISA extension names in the ISA string is defined in
* chapter 27 of the unprivileged specification.
*
@@ -250,8 +261,8 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
__RISCV_ISA_EXT_DATA(c, RISCV_ISA_EXT_c),
__RISCV_ISA_EXT_DATA(v, RISCV_ISA_EXT_v),
__RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h),
- __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
- __RISCV_ISA_EXT_DATA(zicboz, RISCV_ISA_EXT_ZICBOZ),
+ __RISCV_ISA_EXT_SUPERSET(zicbom, RISCV_ISA_EXT_ZICBOM, riscv_xlinuxenvcfg_exts),
+ __RISCV_ISA_EXT_SUPERSET(zicboz, RISCV_ISA_EXT_ZICBOZ, riscv_xlinuxenvcfg_exts),
__RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR),
__RISCV_ISA_EXT_DATA(zicond, RISCV_ISA_EXT_ZICOND),
__RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
@@ -539,6 +550,20 @@ static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
}
/*
+ * "V" in ISA strings is ambiguous in practice: it should mean
+ * just the standard V-1.0 but vendors aren't well behaved.
+ * Many vendors with T-Head CPU cores which implement the 0.7.1
+ * version of the vector specification put "v" into their DTs.
+ * CPU cores with the ratified spec will contain non-zero
+ * marchid.
+ */
+ if (acpi_disabled && riscv_cached_mvendorid(cpu) == THEAD_VENDOR_ID &&
+ riscv_cached_marchid(cpu) == 0x0) {
+ this_hwcap &= ~isa2hwcap[RISCV_ISA_EXT_v];
+ clear_bit(RISCV_ISA_EXT_v, isainfo->isa);
+ }
+
+ /*
* All "okay" hart should have same isa. Set HWCAP based on
* common capabilities of every "okay" hart, in case they don't
* have.
@@ -950,7 +975,7 @@ arch_initcall(check_unaligned_access_all_cpus);
void riscv_user_isa_enable(void)
{
if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
- csr_set(CSR_SENVCFG, ENVCFG_CBZE);
+ csr_set(CSR_ENVCFG, ENVCFG_CBZE);
}
#ifdef CONFIG_RISCV_ALTERNATIVE
diff --git a/arch/riscv/kernel/paravirt.c b/arch/riscv/kernel/paravirt.c
index 8e114f5..0d6225f 100644
--- a/arch/riscv/kernel/paravirt.c
+++ b/arch/riscv/kernel/paravirt.c
@@ -41,7 +41,7 @@ static int __init parse_no_stealacc(char *arg)
early_param("no-steal-acc", parse_no_stealacc);
-DEFINE_PER_CPU(struct sbi_sta_struct, steal_time) __aligned(64);
+static DEFINE_PER_CPU(struct sbi_sta_struct, steal_time) __aligned(64);
static bool __init has_pv_steal_clock(void)
{
@@ -91,8 +91,8 @@ static int pv_time_cpu_down_prepare(unsigned int cpu)
static u64 pv_time_steal_clock(int cpu)
{
struct sbi_sta_struct *st = per_cpu_ptr(&steal_time, cpu);
- u32 sequence;
- u64 steal;
+ __le32 sequence;
+ __le64 steal;
/*
* Check the sequence field before and after reading the steal
diff --git a/arch/riscv/kernel/return_address.c b/arch/riscv/kernel/return_address.c
new file mode 100644
index 0000000..c8115ec
--- /dev/null
+++ b/arch/riscv/kernel/return_address.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This code come from arch/arm64/kernel/return_address.c
+ *
+ * Copyright (C) 2023 SiFive.
+ */
+
+#include <linux/export.h>
+#include <linux/kprobes.h>
+#include <linux/stacktrace.h>
+
+struct return_address_data {
+ unsigned int level;
+ void *addr;
+};
+
+static bool save_return_addr(void *d, unsigned long pc)
+{
+ struct return_address_data *data = d;
+
+ if (!data->level) {
+ data->addr = (void *)pc;
+ return false;
+ }
+
+ --data->level;
+
+ return true;
+}
+NOKPROBE_SYMBOL(save_return_addr);
+
+noinline void *return_address(unsigned int level)
+{
+ struct return_address_data data;
+
+ data.level = level + 3;
+ data.addr = NULL;
+
+ arch_stack_walk(save_return_addr, &data, current, NULL);
+
+ if (!data.level)
+ return data.addr;
+ else
+ return NULL;
+
+}
+EXPORT_SYMBOL_GPL(return_address);
+NOKPROBE_SYMBOL(return_address);
diff --git a/arch/riscv/kernel/suspend.c b/arch/riscv/kernel/suspend.c
index 2395093..2997953 100644
--- a/arch/riscv/kernel/suspend.c
+++ b/arch/riscv/kernel/suspend.c
@@ -15,6 +15,8 @@
void suspend_save_csrs(struct suspend_context *context)
{
context->scratch = csr_read(CSR_SCRATCH);
+ if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
+ context->envcfg = csr_read(CSR_ENVCFG);
context->tvec = csr_read(CSR_TVEC);
context->ie = csr_read(CSR_IE);
@@ -36,6 +38,8 @@ void suspend_save_csrs(struct suspend_context *context)
void suspend_restore_csrs(struct suspend_context *context)
{
csr_write(CSR_SCRATCH, context->scratch);
+ if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
+ csr_write(CSR_ENVCFG, context->envcfg);
csr_write(CSR_TVEC, context->tvec);
csr_write(CSR_IE, context->ie);
diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c
index fc34557..5f7355e 100644
--- a/arch/riscv/kvm/vcpu_onereg.c
+++ b/arch/riscv/kvm/vcpu_onereg.c
@@ -42,15 +42,42 @@ static const unsigned long kvm_isa_ext_arr[] = {
KVM_ISA_EXT_ARR(SVPBMT),
KVM_ISA_EXT_ARR(ZBA),
KVM_ISA_EXT_ARR(ZBB),
+ KVM_ISA_EXT_ARR(ZBC),
+ KVM_ISA_EXT_ARR(ZBKB),
+ KVM_ISA_EXT_ARR(ZBKC),
+ KVM_ISA_EXT_ARR(ZBKX),
KVM_ISA_EXT_ARR(ZBS),
+ KVM_ISA_EXT_ARR(ZFA),
+ KVM_ISA_EXT_ARR(ZFH),
+ KVM_ISA_EXT_ARR(ZFHMIN),
KVM_ISA_EXT_ARR(ZICBOM),
KVM_ISA_EXT_ARR(ZICBOZ),
KVM_ISA_EXT_ARR(ZICNTR),
KVM_ISA_EXT_ARR(ZICOND),
KVM_ISA_EXT_ARR(ZICSR),
KVM_ISA_EXT_ARR(ZIFENCEI),
+ KVM_ISA_EXT_ARR(ZIHINTNTL),
KVM_ISA_EXT_ARR(ZIHINTPAUSE),
KVM_ISA_EXT_ARR(ZIHPM),
+ KVM_ISA_EXT_ARR(ZKND),
+ KVM_ISA_EXT_ARR(ZKNE),
+ KVM_ISA_EXT_ARR(ZKNH),
+ KVM_ISA_EXT_ARR(ZKR),
+ KVM_ISA_EXT_ARR(ZKSED),
+ KVM_ISA_EXT_ARR(ZKSH),
+ KVM_ISA_EXT_ARR(ZKT),
+ KVM_ISA_EXT_ARR(ZVBB),
+ KVM_ISA_EXT_ARR(ZVBC),
+ KVM_ISA_EXT_ARR(ZVFH),
+ KVM_ISA_EXT_ARR(ZVFHMIN),
+ KVM_ISA_EXT_ARR(ZVKB),
+ KVM_ISA_EXT_ARR(ZVKG),
+ KVM_ISA_EXT_ARR(ZVKNED),
+ KVM_ISA_EXT_ARR(ZVKNHA),
+ KVM_ISA_EXT_ARR(ZVKNHB),
+ KVM_ISA_EXT_ARR(ZVKSED),
+ KVM_ISA_EXT_ARR(ZVKSH),
+ KVM_ISA_EXT_ARR(ZVKT),
};
static unsigned long kvm_riscv_vcpu_base2isa_ext(unsigned long base_ext)
@@ -92,13 +119,40 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
case KVM_RISCV_ISA_EXT_SVNAPOT:
case KVM_RISCV_ISA_EXT_ZBA:
case KVM_RISCV_ISA_EXT_ZBB:
+ case KVM_RISCV_ISA_EXT_ZBC:
+ case KVM_RISCV_ISA_EXT_ZBKB:
+ case KVM_RISCV_ISA_EXT_ZBKC:
+ case KVM_RISCV_ISA_EXT_ZBKX:
case KVM_RISCV_ISA_EXT_ZBS:
+ case KVM_RISCV_ISA_EXT_ZFA:
+ case KVM_RISCV_ISA_EXT_ZFH:
+ case KVM_RISCV_ISA_EXT_ZFHMIN:
case KVM_RISCV_ISA_EXT_ZICNTR:
case KVM_RISCV_ISA_EXT_ZICOND:
case KVM_RISCV_ISA_EXT_ZICSR:
case KVM_RISCV_ISA_EXT_ZIFENCEI:
+ case KVM_RISCV_ISA_EXT_ZIHINTNTL:
case KVM_RISCV_ISA_EXT_ZIHINTPAUSE:
case KVM_RISCV_ISA_EXT_ZIHPM:
+ case KVM_RISCV_ISA_EXT_ZKND:
+ case KVM_RISCV_ISA_EXT_ZKNE:
+ case KVM_RISCV_ISA_EXT_ZKNH:
+ case KVM_RISCV_ISA_EXT_ZKR:
+ case KVM_RISCV_ISA_EXT_ZKSED:
+ case KVM_RISCV_ISA_EXT_ZKSH:
+ case KVM_RISCV_ISA_EXT_ZKT:
+ case KVM_RISCV_ISA_EXT_ZVBB:
+ case KVM_RISCV_ISA_EXT_ZVBC:
+ case KVM_RISCV_ISA_EXT_ZVFH:
+ case KVM_RISCV_ISA_EXT_ZVFHMIN:
+ case KVM_RISCV_ISA_EXT_ZVKB:
+ case KVM_RISCV_ISA_EXT_ZVKG:
+ case KVM_RISCV_ISA_EXT_ZVKNED:
+ case KVM_RISCV_ISA_EXT_ZVKNHA:
+ case KVM_RISCV_ISA_EXT_ZVKNHB:
+ case KVM_RISCV_ISA_EXT_ZVKSED:
+ case KVM_RISCV_ISA_EXT_ZVKSH:
+ case KVM_RISCV_ISA_EXT_ZVKT:
return false;
/* Extensions which can be disabled using Smstateen */
case KVM_RISCV_ISA_EXT_SSAIA:
diff --git a/arch/riscv/kvm/vcpu_sbi_sta.c b/arch/riscv/kvm/vcpu_sbi_sta.c
index 01f09fe..d8cf9ca 100644
--- a/arch/riscv/kvm/vcpu_sbi_sta.c
+++ b/arch/riscv/kvm/vcpu_sbi_sta.c
@@ -26,8 +26,12 @@ void kvm_riscv_vcpu_record_steal_time(struct kvm_vcpu *vcpu)
{
gpa_t shmem = vcpu->arch.sta.shmem;
u64 last_steal = vcpu->arch.sta.last_steal;
- u32 *sequence_ptr, sequence;
- u64 *steal_ptr, steal;
+ __le32 __user *sequence_ptr;
+ __le64 __user *steal_ptr;
+ __le32 sequence_le;
+ __le64 steal_le;
+ u32 sequence;
+ u64 steal;
unsigned long hva;
gfn_t gfn;
@@ -47,22 +51,22 @@ void kvm_riscv_vcpu_record_steal_time(struct kvm_vcpu *vcpu)
return;
}
- sequence_ptr = (u32 *)(hva + offset_in_page(shmem) +
+ sequence_ptr = (__le32 __user *)(hva + offset_in_page(shmem) +
offsetof(struct sbi_sta_struct, sequence));
- steal_ptr = (u64 *)(hva + offset_in_page(shmem) +
+ steal_ptr = (__le64 __user *)(hva + offset_in_page(shmem) +
offsetof(struct sbi_sta_struct, steal));
- if (WARN_ON(get_user(sequence, sequence_ptr)))
+ if (WARN_ON(get_user(sequence_le, sequence_ptr)))
return;
- sequence = le32_to_cpu(sequence);
+ sequence = le32_to_cpu(sequence_le);
sequence += 1;
if (WARN_ON(put_user(cpu_to_le32(sequence), sequence_ptr)))
return;
- if (!WARN_ON(get_user(steal, steal_ptr))) {
- steal = le64_to_cpu(steal);
+ if (!WARN_ON(get_user(steal_le, steal_ptr))) {
+ steal = le64_to_cpu(steal_le);
vcpu->arch.sta.last_steal = READ_ONCE(current->sched_info.run_delay);
steal += vcpu->arch.sta.last_steal - last_steal;
WARN_ON(put_user(cpu_to_le64(steal), steal_ptr));
diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
index af3df52..74af3ab 100644
--- a/arch/riscv/lib/csum.c
+++ b/arch/riscv/lib/csum.c
@@ -53,7 +53,7 @@ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
* support, so nop when Zbb is available and jump when Zbb is
* not available.
*/
- asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
:
:
@@ -170,7 +170,7 @@ do_csum_with_alignment(const unsigned char *buff, int len)
* support, so nop when Zbb is available and jump when Zbb is
* not available.
*/
- asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
:
:
@@ -178,7 +178,7 @@ do_csum_with_alignment(const unsigned char *buff, int len)
: no_zbb);
#ifdef CONFIG_32BIT
- asm_volatile_goto(".option push \n\
+ asm_goto_output(".option push \n\
.option arch,+zbb \n\
rori %[fold_temp], %[csum], 16 \n\
andi %[offset], %[offset], 1 \n\
@@ -193,7 +193,7 @@ do_csum_with_alignment(const unsigned char *buff, int len)
return (unsigned short)csum;
#else /* !CONFIG_32BIT */
- asm_volatile_goto(".option push \n\
+ asm_goto_output(".option push \n\
.option arch,+zbb \n\
rori %[fold_temp], %[csum], 32 \n\
add %[csum], %[fold_temp], %[csum] \n\
@@ -257,7 +257,7 @@ do_csum_no_alignment(const unsigned char *buff, int len)
* support, so nop when Zbb is available and jump when Zbb is
* not available.
*/
- asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+ asm goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
RISCV_ISA_EXT_ZBB, 1)
:
:
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index 431596c..5ef2a68 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -125,6 +125,26 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
return pte;
}
+unsigned long hugetlb_mask_last_page(struct hstate *h)
+{
+ unsigned long hp_size = huge_page_size(h);
+
+ switch (hp_size) {
+#ifndef __PAGETABLE_PMD_FOLDED
+ case PUD_SIZE:
+ return P4D_SIZE - PUD_SIZE;
+#endif
+ case PMD_SIZE:
+ return PUD_SIZE - PMD_SIZE;
+ case napot_cont_size(NAPOT_CONT64KB_ORDER):
+ return PMD_SIZE - napot_cont_size(NAPOT_CONT64KB_ORDER);
+ default:
+ break;
+ }
+
+ return 0UL;
+}
+
static pte_t get_clear_contig(struct mm_struct *mm,
unsigned long addr,
pte_t *ptep,
@@ -177,13 +197,36 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
return entry;
}
+static void clear_flush(struct mm_struct *mm,
+ unsigned long addr,
+ pte_t *ptep,
+ unsigned long pgsize,
+ unsigned long ncontig)
+{
+ struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
+ unsigned long i, saddr = addr;
+
+ for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
+ ptep_get_and_clear(mm, addr, ptep);
+
+ flush_tlb_range(&vma, saddr, addr);
+}
+
+/*
+ * When dealing with NAPOT mappings, the privileged specification indicates that
+ * "if an update needs to be made, the OS generally should first mark all of the
+ * PTEs invalid, then issue SFENCE.VMA instruction(s) covering all 4 KiB regions
+ * within the range, [...] then update the PTE(s), as described in Section
+ * 4.2.1.". That's the equivalent of the Break-Before-Make approach used by
+ * arm64.
+ */
void set_huge_pte_at(struct mm_struct *mm,
unsigned long addr,
pte_t *ptep,
pte_t pte,
unsigned long sz)
{
- unsigned long hugepage_shift;
+ unsigned long hugepage_shift, pgsize;
int i, pte_num;
if (sz >= PGDIR_SIZE)
@@ -198,7 +241,22 @@ void set_huge_pte_at(struct mm_struct *mm,
hugepage_shift = PAGE_SHIFT;
pte_num = sz >> hugepage_shift;
- for (i = 0; i < pte_num; i++, ptep++, addr += (1 << hugepage_shift))
+ pgsize = 1 << hugepage_shift;
+
+ if (!pte_present(pte)) {
+ for (i = 0; i < pte_num; i++, ptep++, addr += pgsize)
+ set_ptes(mm, addr, ptep, pte, 1);
+ return;
+ }
+
+ if (!pte_napot(pte)) {
+ set_ptes(mm, addr, ptep, pte, 1);
+ return;
+ }
+
+ clear_flush(mm, addr, ptep, pgsize, pte_num);
+
+ for (i = 0; i < pte_num; i++, ptep++, addr += pgsize)
set_pte_at(mm, addr, ptep, pte);
}
@@ -306,7 +364,7 @@ void huge_pte_clear(struct mm_struct *mm,
pte_clear(mm, addr, ptep);
}
-static __init bool is_napot_size(unsigned long size)
+static bool is_napot_size(unsigned long size)
{
unsigned long order;
@@ -334,7 +392,7 @@ arch_initcall(napot_hugetlbpages_init);
#else
-static __init bool is_napot_size(unsigned long size)
+static bool is_napot_size(unsigned long size)
{
return false;
}
@@ -351,7 +409,7 @@ int pmd_huge(pmd_t pmd)
return pmd_leaf(pmd);
}
-bool __init arch_hugetlb_valid_size(unsigned long size)
+static bool __hugetlb_valid_size(unsigned long size)
{
if (size == HPAGE_SIZE)
return true;
@@ -363,6 +421,18 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
return false;
}
+bool __init arch_hugetlb_valid_size(unsigned long size)
+{
+ return __hugetlb_valid_size(size);
+}
+
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+bool arch_hugetlb_migration_supported(struct hstate *h)
+{
+ return __hugetlb_valid_size(huge_page_size(h));
+}
+#endif
+
#ifdef CONFIG_CONTIG_ALLOC
static __init int gigantic_pages_init(void)
{
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 32cad6a..fa34cf5 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1385,6 +1385,10 @@ void __init misc_mem_init(void)
early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);
arch_numa_init();
sparse_init();
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+ /* The entire VMEMMAP region has been populated. Flush TLB for this region */
+ local_flush_tlb_kernel_range(VMEMMAP_START, VMEMMAP_END);
+#endif
zone_sizes_init();
arch_reserve_crashkernel();
memblock_dump_all();
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 8d12b26f..893566e 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -66,9 +66,10 @@ static inline void local_flush_tlb_range_asid(unsigned long start,
local_flush_tlb_range_threshold_asid(start, size, stride, asid);
}
+/* Flush a range of kernel pages without broadcasting */
void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
- local_flush_tlb_range_asid(start, end, PAGE_SIZE, FLUSH_TLB_NO_ASID);
+ local_flush_tlb_range_asid(start, end - start, PAGE_SIZE, FLUSH_TLB_NO_ASID);
}
static void __ipi_flush_tlb_all(void *info)
@@ -233,4 +234,5 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
{
__flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0,
FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
+ cpumask_clear(&batch->cpumask);
}
diff --git a/arch/s390/configs/compat.config b/arch/s390/configs/compat.config
new file mode 100644
index 0000000..6fd0514
--- /dev/null
+++ b/arch/s390/configs/compat.config
@@ -0,0 +1,3 @@
+# Help: Enable compat support
+CONFIG_COMPAT=y
+CONFIG_COMPAT_32BIT_TIME=y
diff --git a/arch/s390/configs/debug_defconfig b/arch/s390/configs/debug_defconfig
index cae2dd3..06756ba 100644
--- a/arch/s390/configs/debug_defconfig
+++ b/arch/s390/configs/debug_defconfig
@@ -118,7 +118,6 @@
CONFIG_UNIX_DIAG=m
CONFIG_XFRM_USER=m
CONFIG_NET_KEY=m
-CONFIG_SMC=m
CONFIG_SMC_DIAG=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
@@ -374,6 +373,7 @@
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
+CONFIG_NET_ACT_IPT=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
@@ -436,9 +436,6 @@
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
# CONFIG_MD_BITMAP_FILE is not set
-CONFIG_MD_LINEAR=m
-CONFIG_MD_MULTIPATH=m
-CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
CONFIG_BCACHE=m
CONFIG_BLK_DEV_DM=y
@@ -637,7 +634,6 @@
CONFIG_CUSE=m
CONFIG_VIRTIO_FS=m
CONFIG_OVERLAY_FS=m
-CONFIG_NETFS_SUPPORT=m
CONFIG_NETFS_STATS=y
CONFIG_FSCACHE=y
CONFIG_CACHEFILES=m
@@ -709,7 +705,6 @@
CONFIG_IMA_WRITE_POLICY=y
CONFIG_IMA_APPRAISE=y
CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
-CONFIG_INIT_STACK_NONE=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_CRYPTO_USER=m
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
@@ -739,7 +734,6 @@
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_ADIANTUM=m
CONFIG_CRYPTO_ARC4=m
-CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_HCTR2=m
CONFIG_CRYPTO_KEYWRAP=m
CONFIG_CRYPTO_LRW=m
@@ -886,4 +880,3 @@
CONFIG_STRING_SELFTEST=y
CONFIG_TEST_BITOPS=m
CONFIG_TEST_BPF=m
-CONFIG_TEST_LIVEPATCH=m
diff --git a/arch/s390/configs/defconfig b/arch/s390/configs/defconfig
index 42b9888..d33f814 100644
--- a/arch/s390/configs/defconfig
+++ b/arch/s390/configs/defconfig
@@ -109,7 +109,6 @@
CONFIG_UNIX_DIAG=m
CONFIG_XFRM_USER=m
CONFIG_NET_KEY=m
-CONFIG_SMC=m
CONFIG_SMC_DIAG=m
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
@@ -364,6 +363,7 @@
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
+CONFIG_NET_ACT_IPT=m
CONFIG_NET_ACT_NAT=m
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
@@ -426,9 +426,6 @@
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
# CONFIG_MD_BITMAP_FILE is not set
-CONFIG_MD_LINEAR=m
-CONFIG_MD_MULTIPATH=m
-CONFIG_MD_FAULTY=m
CONFIG_MD_CLUSTER=m
CONFIG_BCACHE=m
CONFIG_BLK_DEV_DM=y
@@ -622,7 +619,6 @@
CONFIG_CUSE=m
CONFIG_VIRTIO_FS=m
CONFIG_OVERLAY_FS=m
-CONFIG_NETFS_SUPPORT=m
CONFIG_NETFS_STATS=y
CONFIG_FSCACHE=y
CONFIG_CACHEFILES=m
@@ -693,7 +689,6 @@
CONFIG_IMA_WRITE_POLICY=y
CONFIG_IMA_APPRAISE=y
CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
-CONFIG_INIT_STACK_NONE=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_CRYPTO_FIPS=y
CONFIG_CRYPTO_USER=m
@@ -724,11 +719,9 @@
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_ADIANTUM=m
CONFIG_CRYPTO_ARC4=m
-CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_HCTR2=m
CONFIG_CRYPTO_KEYWRAP=m
CONFIG_CRYPTO_LRW=m
-CONFIG_CRYPTO_OFB=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_AEGIS128=m
CONFIG_CRYPTO_CHACHA20POLY1305=m
@@ -815,4 +808,3 @@
CONFIG_PERCPU_TEST=m
CONFIG_ATOMIC64_SELFTEST=y
CONFIG_TEST_BPF=m
-CONFIG_TEST_LIVEPATCH=m
diff --git a/arch/s390/configs/zfcpdump_defconfig b/arch/s390/configs/zfcpdump_defconfig
index 30d2a16..c51f3ec 100644
--- a/arch/s390/configs/zfcpdump_defconfig
+++ b/arch/s390/configs/zfcpdump_defconfig
@@ -8,6 +8,7 @@
# CONFIG_NET_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_MARCH_Z13=y
CONFIG_NR_CPUS=2
@@ -64,7 +65,6 @@
# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_LSM="yama,loadpin,safesetid,integrity"
-CONFIG_INIT_STACK_NONE=y
# CONFIG_ZLIB_DFLTCC is not set
CONFIG_XZ_DEC_MICROLZMA=y
CONFIG_PRINTK_TIME=y
diff --git a/arch/s390/include/asm/jump_label.h b/arch/s390/include/asm/jump_label.h
index 895f774..bf78cf3 100644
--- a/arch/s390/include/asm/jump_label.h
+++ b/arch/s390/include/asm/jump_label.h
@@ -25,7 +25,7 @@
*/
static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
- asm_volatile_goto("0: brcl 0,%l[label]\n"
+ asm goto("0: brcl 0,%l[label]\n"
".pushsection __jump_table,\"aw\"\n"
".balign 8\n"
".long 0b-.,%l[label]-.\n"
@@ -39,7 +39,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
- asm_volatile_goto("0: brcl 15,%l[label]\n"
+ asm goto("0: brcl 15,%l[label]\n"
".pushsection __jump_table,\"aw\"\n"
".balign 8\n"
".long 0b-.,%l[label]-.\n"
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 621a17f..f875a40 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -676,8 +676,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
if (vcpu->kvm->arch.crypto.pqap_hook) {
pqap_hook = *vcpu->kvm->arch.crypto.pqap_hook;
ret = pqap_hook(vcpu);
- if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
- kvm_s390_set_psw_cc(vcpu, 3);
+ if (!ret) {
+ if (vcpu->run->s.regs.gprs[1] & 0x00ff0000)
+ kvm_s390_set_psw_cc(vcpu, 3);
+ else
+ kvm_s390_set_psw_cc(vcpu, 0);
+ }
up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
return ret;
}
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index fef42e2..3af3bd2 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -1235,7 +1235,6 @@ static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
gmap = gmap_shadow(vcpu->arch.gmap, asce, edat);
if (IS_ERR(gmap))
return PTR_ERR(gmap);
- gmap->private = vcpu->kvm;
vcpu->kvm->stat.gmap_shadow_create++;
WRITE_ONCE(vsie_page->gmap, gmap);
return 0;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 6f96b5a..8da39de 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1691,6 +1691,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
return ERR_PTR(-ENOMEM);
new->mm = parent->mm;
new->parent = gmap_get(parent);
+ new->private = parent->private;
new->orig_asce = asce;
new->edat_level = edat_level;
new->initialized = false;
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 676ac74..52a44e3 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -252,7 +252,7 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res,
/* combine single writes by using store-block insn */
void __iowrite64_copy(void __iomem *to, const void *from, size_t count)
{
- zpci_memcpy_toio(to, from, count);
+ zpci_memcpy_toio(to, from, count * 8);
}
void __iomem *ioremap_prot(phys_addr_t phys_addr, size_t size,
diff --git a/arch/sparc/Makefile b/arch/sparc/Makefile
index 5f60359..2a03daa 100644
--- a/arch/sparc/Makefile
+++ b/arch/sparc/Makefile
@@ -60,7 +60,7 @@
libs-y += arch/sparc/lib/
drivers-$(CONFIG_PM) += arch/sparc/power/
-drivers-$(CONFIG_FB) += arch/sparc/video/
+drivers-$(CONFIG_FB_CORE) += arch/sparc/video/
boot := arch/sparc/boot
diff --git a/arch/sparc/include/asm/jump_label.h b/arch/sparc/include/asm/jump_label.h
index 94eb529..2718cbe 100644
--- a/arch/sparc/include/asm/jump_label.h
+++ b/arch/sparc/include/asm/jump_label.h
@@ -10,7 +10,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
"nop\n\t"
"nop\n\t"
".pushsection __jump_table, \"aw\"\n\t"
@@ -26,7 +26,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
"b %l[l_yes]\n\t"
"nop\n\t"
".pushsection __jump_table, \"aw\"\n\t"
diff --git a/arch/sparc/video/Makefile b/arch/sparc/video/Makefile
index 6baddbd..d4d83f1 100644
--- a/arch/sparc/video/Makefile
+++ b/arch/sparc/video/Makefile
@@ -1,3 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_FB) += fbdev.o
+obj-$(CONFIG_FB_CORE) += fbdev.o
diff --git a/arch/um/include/asm/cpufeature.h b/arch/um/include/asm/cpufeature.h
index 4b6d1b5..66fe06d 100644
--- a/arch/um/include/asm/cpufeature.h
+++ b/arch/um/include/asm/cpufeature.h
@@ -75,7 +75,7 @@ extern void setup_clear_cpu_cap(unsigned int bit);
*/
static __always_inline bool _static_cpu_has(u16 bit)
{
- asm_volatile_goto("1: jmp 6f\n"
+ asm goto("1: jmp 6f\n"
"2:\n"
".skip -(((5f-4f) - (2b-1b)) > 0) * "
"((5f-4f) - (2b-1b)),0x90\n"
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index b9224cf..2a7279d 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -379,7 +379,7 @@
config X86_MINIMUM_CPU_FAMILY
int
default "64" if X86_64
- default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MCRUSOE || MCORE2 || MK7 || MK8)
+ default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MCORE2 || MK7 || MK8)
default "5" if X86_32 && X86_CMPXCHG64
default "4"
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2264db1..da8f3ca 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -112,13 +112,13 @@
# temporary until string.h is fixed
KBUILD_CFLAGS += -ffreestanding
- ifeq ($(CONFIG_STACKPROTECTOR),y)
- ifeq ($(CONFIG_SMP),y)
+ ifeq ($(CONFIG_STACKPROTECTOR),y)
+ ifeq ($(CONFIG_SMP),y)
KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
- else
+ else
KBUILD_CFLAGS += -mstack-protector-guard=global
- endif
endif
+ endif
else
BITS := 64
UTS_MACHINE := x86_64
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index b277171..a1bbedd 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -106,8 +106,7 @@
.word 0 # MinorSubsystemVersion
.long 0 # Win32VersionValue
- .long setup_size + ZO__end + pecompat_vsize
- # SizeOfImage
+ .long setup_size + ZO__end # SizeOfImage
.long salign # SizeOfHeaders
.long 0 # CheckSum
@@ -143,7 +142,7 @@
.ascii ".setup"
.byte 0
.byte 0
- .long setup_size - salign # VirtualSize
+ .long pecompat_fstart - salign # VirtualSize
.long salign # VirtualAddress
.long pecompat_fstart - salign # SizeOfRawData
.long salign # PointerToRawData
@@ -156,8 +155,8 @@
#ifdef CONFIG_EFI_MIXED
.asciz ".compat"
- .long 8 # VirtualSize
- .long setup_size + ZO__end # VirtualAddress
+ .long pecompat_fsize # VirtualSize
+ .long pecompat_fstart # VirtualAddress
.long pecompat_fsize # SizeOfRawData
.long pecompat_fstart # PointerToRawData
@@ -172,17 +171,16 @@
* modes this image supports.
*/
.pushsection ".pecompat", "a", @progbits
- .balign falign
- .set pecompat_vsize, salign
+ .balign salign
.globl pecompat_fstart
pecompat_fstart:
.byte 0x1 # Version
.byte 8 # Size
.word IMAGE_FILE_MACHINE_I386 # PE machine type
.long setup_size + ZO_efi32_pe_entry # Entrypoint
+ .byte 0x0 # Sentinel
.popsection
#else
- .set pecompat_vsize, 0
.set pecompat_fstart, setup_size
#endif
.ascii ".text"
diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 83bb7ef..3a2d136 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -24,6 +24,9 @@
.text : { *(.text .text.*) }
.text32 : { *(.text32) }
+ .pecompat : { *(.pecompat) }
+ PROVIDE(pecompat_fsize = setup_size - pecompat_fstart);
+
. = ALIGN(16);
.rodata : { *(.rodata*) }
@@ -36,9 +39,6 @@
. = ALIGN(16);
.data : { *(.data*) }
- .pecompat : { *(.pecompat) }
- PROVIDE(pecompat_fsize = setup_size - pecompat_fstart);
-
.signature : {
setup_sig = .;
LONG(0x5a5aaa55)
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 8c8d38f..0033790 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -6,6 +6,9 @@
#include <linux/export.h>
#include <linux/linkage.h>
#include <asm/msr-index.h>
+#include <asm/unwind_hints.h>
+#include <asm/segment.h>
+#include <asm/cache.h>
.pushsection .noinstr.text, "ax"
@@ -20,3 +23,23 @@
EXPORT_SYMBOL_GPL(entry_ibpb);
.popsection
+
+/*
+ * Define the VERW operand that is disguised as entry code so that
+ * it can be referenced with KPTI enabled. This ensure VERW can be
+ * used late in exit-to-user path after page tables are switched.
+ */
+.pushsection .entry.text, "ax"
+
+.align L1_CACHE_BYTES, 0xcc
+SYM_CODE_START_NOALIGN(mds_verw_sel)
+ UNWIND_HINT_UNDEFINED
+ ANNOTATE_NOENDBR
+ .word __KERNEL_DS
+.align L1_CACHE_BYTES, 0xcc
+SYM_CODE_END(mds_verw_sel);
+/* For KVM */
+EXPORT_SYMBOL_GPL(mds_verw_sel);
+
+.popsection
+
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index c73047b..fba4276 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -885,6 +885,7 @@
BUG_IF_WRONG_CR3 no_user_check=1
popfl
popl %eax
+ CLEAR_CPU_BUFFERS
/*
* Return back to the vDSO, which will pop ecx and edx.
@@ -954,6 +955,7 @@
/* Restore user state */
RESTORE_REGS pop=4 # skip orig_eax/error_code
+ CLEAR_CPU_BUFFERS
.Lirq_return:
/*
* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
@@ -1146,6 +1148,7 @@
/* Not on SYSENTER stack. */
call exc_nmi
+ CLEAR_CPU_BUFFERS
jmp .Lnmi_return
.Lnmi_from_sysenter_stack:
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index c40f89a..9bb4859 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -161,6 +161,7 @@
SYM_INNER_LABEL(entry_SYSRETQ_unsafe_stack, SYM_L_GLOBAL)
ANNOTATE_NOENDBR
swapgs
+ CLEAR_CPU_BUFFERS
sysretq
SYM_INNER_LABEL(entry_SYSRETQ_end, SYM_L_GLOBAL)
ANNOTATE_NOENDBR
@@ -573,6 +574,7 @@
.Lswapgs_and_iret:
swapgs
+ CLEAR_CPU_BUFFERS
/* Assert that the IRET frame indicates user mode. */
testb $3, 8(%rsp)
jnz .Lnative_iret
@@ -723,6 +725,8 @@
*/
popq %rax /* Restore user RAX */
+ CLEAR_CPU_BUFFERS
+
/*
* RSP now points to an ordinary IRET frame, except that the page
* is read-only and RSP[31:16] are preloaded with the userspace
@@ -1450,6 +1454,12 @@
movq $0, 5*8(%rsp) /* clear "NMI executing" */
/*
+ * Skip CLEAR_CPU_BUFFERS here, since it only helps in rare cases like
+ * NMI in kernel after user state is restored. For an unprivileged user
+ * these conditions are hard to meet.
+ */
+
+ /*
* iretq reads the "iret" frame and exits the NMI stack in a
* single instruction. We are returning to kernel mode, so this
* cannot result in a fault. Similarly, we don't need to worry
@@ -1466,6 +1476,7 @@
UNWIND_HINT_END_OF_STACK
ENDBR
mov $-ENOSYS, %eax
+ CLEAR_CPU_BUFFERS
sysretl
SYM_CODE_END(entry_SYSCALL32_ignore)
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index de94e2e..eabf48c 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -270,6 +270,7 @@
xorl %r9d, %r9d
xorl %r10d, %r10d
swapgs
+ CLEAR_CPU_BUFFERS
sysretl
SYM_INNER_LABEL(entry_SYSRETL_compat_end, SYM_L_GLOBAL)
ANNOTATE_NOENDBR
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index 96e6c51..cf1b78c 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -16,6 +16,11 @@
extern struct boot_params boot_params;
static struct real_mode_header hv_vtl_real_mode_header;
+static bool __init hv_vtl_msi_ext_dest_id(void)
+{
+ return true;
+}
+
void __init hv_vtl_init_platform(void)
{
pr_info("Linux runs in Hyper-V Virtual Trust Level\n");
@@ -38,6 +43,8 @@ void __init hv_vtl_init_platform(void)
x86_platform.legacy.warm_reset = 0;
x86_platform.legacy.reserve_bios_regions = 0;
x86_platform.legacy.devices.pnpbios = 0;
+
+ x86_init.hyper.msi_ext_dest_id = hv_vtl_msi_ext_dest_id;
}
static inline u64 hv_vtl_system_desc_base(struct ldttss_desc *desc)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 7dcbf15..768d73d 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,7 @@
#include <asm/io.h>
#include <asm/coco.h>
#include <asm/mem_encrypt.h>
+#include <asm/set_memory.h>
#include <asm/mshyperv.h>
#include <asm/hypervisor.h>
#include <asm/mtrr.h>
@@ -503,6 +504,31 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
}
/*
+ * When transitioning memory between encrypted and decrypted, the caller
+ * of set_memory_encrypted() or set_memory_decrypted() is responsible for
+ * ensuring that the memory isn't in use and isn't referenced while the
+ * transition is in progress. The transition has multiple steps, and the
+ * memory is in an inconsistent state until all steps are complete. A
+ * reference while the state is inconsistent could result in an exception
+ * that can't be cleanly fixed up.
+ *
+ * But the Linux kernel load_unaligned_zeropad() mechanism could cause a
+ * stray reference that can't be prevented by the caller, so Linux has
+ * specific code to handle this case. But when the #VC and #VE exceptions
+ * routed to a paravisor, the specific code doesn't work. To avoid this
+ * problem, mark the pages as "not present" while the transition is in
+ * progress. If load_unaligned_zeropad() causes a stray reference, a normal
+ * page fault is generated instead of #VC or #VE, and the page-fault-based
+ * handlers for load_unaligned_zeropad() resolve the reference. When the
+ * transition is complete, hv_vtom_set_host_visibility() marks the pages
+ * as "present" again.
+ */
+static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
+{
+ return !set_memory_np(kbuffer, pagecount);
+}
+
+/*
* hv_vtom_set_host_visibility - Set specified memory visible to host.
*
* In Isolation VM, all guest memory is encrypted from host and guest
@@ -515,16 +541,28 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
enum hv_mem_host_visibility visibility = enc ?
VMBUS_PAGE_NOT_VISIBLE : VMBUS_PAGE_VISIBLE_READ_WRITE;
u64 *pfn_array;
+ phys_addr_t paddr;
+ void *vaddr;
int ret = 0;
bool result = true;
int i, pfn;
pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
- if (!pfn_array)
- return false;
+ if (!pfn_array) {
+ result = false;
+ goto err_set_memory_p;
+ }
for (i = 0, pfn = 0; i < pagecount; i++) {
- pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
+ /*
+ * Use slow_virt_to_phys() because the PRESENT bit has been
+ * temporarily cleared in the PTEs. slow_virt_to_phys() works
+ * without the PRESENT bit while virt_to_hvpfn() or similar
+ * does not.
+ */
+ vaddr = (void *)kbuffer + (i * HV_HYP_PAGE_SIZE);
+ paddr = slow_virt_to_phys(vaddr);
+ pfn_array[pfn] = paddr >> HV_HYP_PAGE_SHIFT;
pfn++;
if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
@@ -538,14 +576,30 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
}
}
- err_free_pfn_array:
+err_free_pfn_array:
kfree(pfn_array);
+
+err_set_memory_p:
+ /*
+ * Set the PTE PRESENT bits again to revert what hv_vtom_clear_present()
+ * did. Do this even if there is an error earlier in this function in
+ * order to avoid leaving the memory range in a "broken" state. Setting
+ * the PRESENT bits shouldn't fail, but return an error if it does.
+ */
+ if (set_memory_p(kbuffer, pagecount))
+ result = false;
+
return result;
}
static bool hv_vtom_tlb_flush_required(bool private)
{
- return true;
+ /*
+ * Since hv_vtom_clear_present() marks the PTEs as "not present"
+ * and flushes the TLB, they can't be in the TLB. That makes the
+ * flush controlled by this function redundant, so return "false".
+ */
+ return false;
}
static bool hv_vtom_cache_flush_required(void)
@@ -608,6 +662,7 @@ void __init hv_vtom_init(void)
x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
+ x86_platform.guest.enc_status_change_prepare = hv_vtom_clear_present;
x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
/* Set WB as the default cache mode. */
diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h
index 6ae2d16..76c310b 100644
--- a/arch/x86/include/asm/coco.h
+++ b/arch/x86/include/asm/coco.h
@@ -10,13 +10,14 @@ enum cc_vendor {
CC_VENDOR_INTEL,
};
-extern enum cc_vendor cc_vendor;
-
#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+extern enum cc_vendor cc_vendor;
void cc_set_mask(u64 mask);
u64 cc_mkenc(u64 val);
u64 cc_mkdec(u64 val);
#else
+#define cc_vendor (CC_VENDOR_NONE)
+
static inline u64 cc_mkenc(u64 val)
{
return val;
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index a26bebb..a127369 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -168,7 +168,7 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
*/
static __always_inline bool _static_cpu_has(u16 bit)
{
- asm_volatile_goto(
+ asm goto(
ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]")
".pushsection .altinstr_aux,\"ax\"\n"
"6:\n"
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index fdf723b..2b62cdd 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -95,7 +95,7 @@
#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in IA32 userspace */
#define X86_FEATURE_REP_GOOD ( 3*32+16) /* REP microcode works well */
#define X86_FEATURE_AMD_LBR_V2 ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
-/* FREE, was #define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) "" LFENCE synchronizes RDTSC */
+#define X86_FEATURE_CLEAR_CPU_BUF ( 3*32+18) /* "" Clear CPU buffers using VERW */
#define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */
#define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */
#define X86_FEATURE_ALWAYS ( 3*32+21) /* "" Always-present feature */
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index ce8f501..7e523bb 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -91,7 +91,6 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
static __always_inline void arch_exit_to_user_mode(void)
{
- mds_user_clear_cpu_buffers();
amd_clear_divider();
}
#define arch_exit_to_user_mode arch_exit_to_user_mode
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 071572e..cbbef32 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -24,7 +24,7 @@
static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
{
- asm_volatile_goto("1:"
+ asm goto("1:"
"jmp %l[l_yes] # objtool NOPs this \n\t"
JUMP_TABLE_ENTRY
: : "i" (key), "i" (2 | branch) : : l_yes);
@@ -38,7 +38,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
{
- asm_volatile_goto("1:"
+ asm goto("1:"
".byte " __stringify(BYTES_NOP5) "\n\t"
JUMP_TABLE_ENTRY
: : "i" (key), "i" (branch) : : l_yes);
@@ -52,7 +52,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, co
static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
{
- asm_volatile_goto("1:"
+ asm goto("1:"
"jmp %l[l_yes]\n\t"
JUMP_TABLE_ENTRY
: : "i" (key), "i" (branch) : : l_yes);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b5b2d0fd..d271ba2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1145,6 +1145,8 @@ struct kvm_hv {
unsigned int synic_auto_eoi_used;
struct kvm_hv_syndbg hv_syndbg;
+
+ bool xsaves_xsavec_checked;
};
#endif
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 262e655..2aa52ca 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -315,6 +315,17 @@
#endif
.endm
+/*
+ * Macro to execute VERW instruction that mitigate transient data sampling
+ * attacks such as MDS. On affected systems a microcode update overloaded VERW
+ * instruction to also clear the CPU buffers. VERW clobbers CFLAGS.ZF.
+ *
+ * Note: Only the memory operand variant of VERW clears the CPU buffers.
+ */
+.macro CLEAR_CPU_BUFFERS
+ ALTERNATIVE "", __stringify(verw _ASM_RIP(mds_verw_sel)), X86_FEATURE_CLEAR_CPU_BUF
+.endm
+
#else /* __ASSEMBLY__ */
#define ANNOTATE_RETPOLINE_SAFE \
@@ -529,13 +540,14 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp);
DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
-DECLARE_STATIC_KEY_FALSE(mds_user_clear);
DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
DECLARE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
DECLARE_STATIC_KEY_FALSE(mmio_stale_data_clear);
+extern u16 mds_verw_sel;
+
#include <asm/segment.h>
/**
@@ -562,17 +574,6 @@ static __always_inline void mds_clear_cpu_buffers(void)
}
/**
- * mds_user_clear_cpu_buffers - Mitigation for MDS and TAA vulnerability
- *
- * Clear CPU buffers if the corresponding static key is enabled
- */
-static __always_inline void mds_user_clear_cpu_buffers(void)
-{
- if (static_branch_likely(&mds_user_clear))
- mds_clear_cpu_buffers();
-}
-
-/**
* mds_idle_clear_cpu_buffers - Mitigation for MDS vulnerability
*
* Clear CPU buffers if the corresponding static key is enabled
diff --git a/arch/x86/include/asm/rmwcc.h b/arch/x86/include/asm/rmwcc.h
index 4b081e0..363266c 100644
--- a/arch/x86/include/asm/rmwcc.h
+++ b/arch/x86/include/asm/rmwcc.h
@@ -13,7 +13,7 @@
#define __GEN_RMWcc(fullop, _var, cc, clobbers, ...) \
({ \
bool c = false; \
- asm_volatile_goto (fullop "; j" #cc " %l[cc_label]" \
+ asm goto (fullop "; j" #cc " %l[cc_label]" \
: : [var] "m" (_var), ## __VA_ARGS__ \
: clobbers : cc_label); \
if (0) { \
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index a5e8964..9aee318 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -47,6 +47,7 @@ int set_memory_uc(unsigned long addr, int numpages);
int set_memory_wc(unsigned long addr, int numpages);
int set_memory_wb(unsigned long addr, int numpages);
int set_memory_np(unsigned long addr, int numpages);
+int set_memory_p(unsigned long addr, int numpages);
int set_memory_4k(unsigned long addr, int numpages);
int set_memory_encrypted(unsigned long addr, int numpages);
int set_memory_decrypted(unsigned long addr, int numpages);
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index d6cd934..48f8dd4 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -205,7 +205,7 @@ static inline void clwb(volatile void *__p)
#ifdef CONFIG_X86_USER_SHADOW_STACK
static inline int write_user_shstk_64(u64 __user *addr, u64 val)
{
- asm_volatile_goto("1: wrussq %[val], (%[addr])\n"
+ asm goto("1: wrussq %[val], (%[addr])\n"
_ASM_EXTABLE(1b, %l[fail])
:: [addr] "r" (addr), [val] "r" (val)
:: fail);
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 5c367c1..237dc8c 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -133,7 +133,7 @@ extern int __get_user_bad(void);
#ifdef CONFIG_X86_32
#define __put_user_goto_u64(x, addr, label) \
- asm_volatile_goto("\n" \
+ asm goto("\n" \
"1: movl %%eax,0(%1)\n" \
"2: movl %%edx,4(%1)\n" \
_ASM_EXTABLE_UA(1b, %l2) \
@@ -295,7 +295,7 @@ do { \
} while (0)
#define __get_user_asm(x, addr, itype, ltype, label) \
- asm_volatile_goto("\n" \
+ asm_goto_output("\n" \
"1: mov"itype" %[umem],%[output]\n" \
_ASM_EXTABLE_UA(1b, %l2) \
: [output] ltype(x) \
@@ -375,7 +375,7 @@ do { \
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
__typeof__(*(_ptr)) __old = *_old; \
__typeof__(*(_ptr)) __new = (_new); \
- asm_volatile_goto("\n" \
+ asm_goto_output("\n" \
"1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
_ASM_EXTABLE_UA(1b, %l[label]) \
: CC_OUT(z) (success), \
@@ -394,7 +394,7 @@ do { \
__typeof__(_ptr) _old = (__typeof__(_ptr))(_pold); \
__typeof__(*(_ptr)) __old = *_old; \
__typeof__(*(_ptr)) __new = (_new); \
- asm_volatile_goto("\n" \
+ asm_goto_output("\n" \
"1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n" \
_ASM_EXTABLE_UA(1b, %l[label]) \
: CC_OUT(z) (success), \
@@ -477,7 +477,7 @@ struct __large_struct { unsigned long buf[100]; };
* aliasing issues.
*/
#define __put_user_goto(x, addr, itype, ltype, label) \
- asm_volatile_goto("\n" \
+ asm goto("\n" \
"1: mov"itype" %0,%1\n" \
_ASM_EXTABLE_UA(1b, %l2) \
: : ltype(x), "m" (__m(addr)) \
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index ab60a71..472f026 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -4,6 +4,7 @@
#include <linux/seqlock.h>
#include <uapi/asm/vsyscall.h>
+#include <asm/page_types.h>
#ifdef CONFIG_X86_VSYSCALL_EMULATION
extern void map_vsyscall(void);
@@ -24,4 +25,13 @@ static inline bool emulate_vsyscall(unsigned long error_code,
}
#endif
+/*
+ * The (legacy) vsyscall page is the long page in the kernel portion
+ * of the address space that has user-accessible permissions.
+ */
+static inline bool is_vsyscall_vaddr(unsigned long vaddr)
+{
+ return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
+}
+
#endif /* _ASM_X86_VSYSCALL_H */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bb0ab84..48d049c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -111,9 +111,6 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
/* Control unconditional IBPB in switch_mm() */
DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
-/* Control MDS CPU buffer clear before returning to user space */
-DEFINE_STATIC_KEY_FALSE(mds_user_clear);
-EXPORT_SYMBOL_GPL(mds_user_clear);
/* Control MDS CPU buffer clear before idling (halt, mwait) */
DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
EXPORT_SYMBOL_GPL(mds_idle_clear);
@@ -252,7 +249,7 @@ static void __init mds_select_mitigation(void)
if (!boot_cpu_has(X86_FEATURE_MD_CLEAR))
mds_mitigation = MDS_MITIGATION_VMWERV;
- static_branch_enable(&mds_user_clear);
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
if (!boot_cpu_has(X86_BUG_MSBDS_ONLY) &&
(mds_nosmt || cpu_mitigations_auto_nosmt()))
@@ -356,7 +353,7 @@ static void __init taa_select_mitigation(void)
* For guests that can't determine whether the correct microcode is
* present on host, enable the mitigation for UCODE_NEEDED as well.
*/
- static_branch_enable(&mds_user_clear);
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
if (taa_nosmt || cpu_mitigations_auto_nosmt())
cpu_smt_disable(false);
@@ -424,7 +421,7 @@ static void __init mmio_select_mitigation(void)
*/
if (boot_cpu_has_bug(X86_BUG_MDS) || (boot_cpu_has_bug(X86_BUG_TAA) &&
boot_cpu_has(X86_FEATURE_RTM)))
- static_branch_enable(&mds_user_clear);
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
else
static_branch_enable(&mmio_stale_data_clear);
@@ -484,12 +481,12 @@ static void __init md_clear_update_mitigation(void)
if (cpu_mitigations_off())
return;
- if (!static_key_enabled(&mds_user_clear))
+ if (!boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
goto out;
/*
- * mds_user_clear is now enabled. Update MDS, TAA and MMIO Stale Data
- * mitigation, if necessary.
+ * X86_FEATURE_CLEAR_CPU_BUF is now enabled. Update MDS, TAA and MMIO
+ * Stale Data mitigation, if necessary.
*/
if (mds_mitigation == MDS_MITIGATION_OFF &&
boot_cpu_has_bug(X86_BUG_MDS)) {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0b97bcd..fbc4e60 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1589,6 +1589,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
get_cpu_vendor(c);
get_cpu_cap(c);
setup_force_cpu_cap(X86_FEATURE_CPUID);
+ get_cpu_address_sizes(c);
cpu_parse_early_param();
if (this_cpu->c_early_init)
@@ -1601,10 +1602,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
this_cpu->c_bsp_init(c);
} else {
setup_clear_cpu_cap(X86_FEATURE_CPUID);
+ get_cpu_address_sizes(c);
}
- get_cpu_address_sizes(c);
-
setup_force_cpu_cap(X86_FEATURE_ALWAYS);
cpu_set_bug_bits(c);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index a927a8f..40dec9b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -184,6 +184,90 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
return false;
}
+#define MSR_IA32_TME_ACTIVATE 0x982
+
+/* Helpers to access TME_ACTIVATE MSR */
+#define TME_ACTIVATE_LOCKED(x) (x & 0x1)
+#define TME_ACTIVATE_ENABLED(x) (x & 0x2)
+
+#define TME_ACTIVATE_POLICY(x) ((x >> 4) & 0xf) /* Bits 7:4 */
+#define TME_ACTIVATE_POLICY_AES_XTS_128 0
+
+#define TME_ACTIVATE_KEYID_BITS(x) ((x >> 32) & 0xf) /* Bits 35:32 */
+
+#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */
+#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1
+
+/* Values for mktme_status (SW only construct) */
+#define MKTME_ENABLED 0
+#define MKTME_DISABLED 1
+#define MKTME_UNINITIALIZED 2
+static int mktme_status = MKTME_UNINITIALIZED;
+
+static void detect_tme_early(struct cpuinfo_x86 *c)
+{
+ u64 tme_activate, tme_policy, tme_crypto_algs;
+ int keyid_bits = 0, nr_keyids = 0;
+ static u64 tme_activate_cpu0 = 0;
+
+ rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
+
+ if (mktme_status != MKTME_UNINITIALIZED) {
+ if (tme_activate != tme_activate_cpu0) {
+ /* Broken BIOS? */
+ pr_err_once("x86/tme: configuration is inconsistent between CPUs\n");
+ pr_err_once("x86/tme: MKTME is not usable\n");
+ mktme_status = MKTME_DISABLED;
+
+ /* Proceed. We may need to exclude bits from x86_phys_bits. */
+ }
+ } else {
+ tme_activate_cpu0 = tme_activate;
+ }
+
+ if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) {
+ pr_info_once("x86/tme: not enabled by BIOS\n");
+ mktme_status = MKTME_DISABLED;
+ return;
+ }
+
+ if (mktme_status != MKTME_UNINITIALIZED)
+ goto detect_keyid_bits;
+
+ pr_info("x86/tme: enabled by BIOS\n");
+
+ tme_policy = TME_ACTIVATE_POLICY(tme_activate);
+ if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128)
+ pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
+
+ tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
+ if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
+ pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
+ tme_crypto_algs);
+ mktme_status = MKTME_DISABLED;
+ }
+detect_keyid_bits:
+ keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
+ nr_keyids = (1UL << keyid_bits) - 1;
+ if (nr_keyids) {
+ pr_info_once("x86/mktme: enabled by BIOS\n");
+ pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids);
+ } else {
+ pr_info_once("x86/mktme: disabled by BIOS\n");
+ }
+
+ if (mktme_status == MKTME_UNINITIALIZED) {
+ /* MKTME is usable */
+ mktme_status = MKTME_ENABLED;
+ }
+
+ /*
+ * KeyID bits effectively lower the number of physical address
+ * bits. Update cpuinfo_x86::x86_phys_bits accordingly.
+ */
+ c->x86_phys_bits -= keyid_bits;
+}
+
static void early_init_intel(struct cpuinfo_x86 *c)
{
u64 misc_enable;
@@ -322,6 +406,13 @@ static void early_init_intel(struct cpuinfo_x86 *c)
*/
if (detect_extended_topology_early(c) < 0)
detect_ht_early(c);
+
+ /*
+ * Adjust the number of physical bits early because it affects the
+ * valid bits of the MTRR mask registers.
+ */
+ if (cpu_has(c, X86_FEATURE_TME))
+ detect_tme_early(c);
}
static void bsp_init_intel(struct cpuinfo_x86 *c)
@@ -482,90 +573,6 @@ static void srat_detect_node(struct cpuinfo_x86 *c)
#endif
}
-#define MSR_IA32_TME_ACTIVATE 0x982
-
-/* Helpers to access TME_ACTIVATE MSR */
-#define TME_ACTIVATE_LOCKED(x) (x & 0x1)
-#define TME_ACTIVATE_ENABLED(x) (x & 0x2)
-
-#define TME_ACTIVATE_POLICY(x) ((x >> 4) & 0xf) /* Bits 7:4 */
-#define TME_ACTIVATE_POLICY_AES_XTS_128 0
-
-#define TME_ACTIVATE_KEYID_BITS(x) ((x >> 32) & 0xf) /* Bits 35:32 */
-
-#define TME_ACTIVATE_CRYPTO_ALGS(x) ((x >> 48) & 0xffff) /* Bits 63:48 */
-#define TME_ACTIVATE_CRYPTO_AES_XTS_128 1
-
-/* Values for mktme_status (SW only construct) */
-#define MKTME_ENABLED 0
-#define MKTME_DISABLED 1
-#define MKTME_UNINITIALIZED 2
-static int mktme_status = MKTME_UNINITIALIZED;
-
-static void detect_tme(struct cpuinfo_x86 *c)
-{
- u64 tme_activate, tme_policy, tme_crypto_algs;
- int keyid_bits = 0, nr_keyids = 0;
- static u64 tme_activate_cpu0 = 0;
-
- rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
-
- if (mktme_status != MKTME_UNINITIALIZED) {
- if (tme_activate != tme_activate_cpu0) {
- /* Broken BIOS? */
- pr_err_once("x86/tme: configuration is inconsistent between CPUs\n");
- pr_err_once("x86/tme: MKTME is not usable\n");
- mktme_status = MKTME_DISABLED;
-
- /* Proceed. We may need to exclude bits from x86_phys_bits. */
- }
- } else {
- tme_activate_cpu0 = tme_activate;
- }
-
- if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) {
- pr_info_once("x86/tme: not enabled by BIOS\n");
- mktme_status = MKTME_DISABLED;
- return;
- }
-
- if (mktme_status != MKTME_UNINITIALIZED)
- goto detect_keyid_bits;
-
- pr_info("x86/tme: enabled by BIOS\n");
-
- tme_policy = TME_ACTIVATE_POLICY(tme_activate);
- if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128)
- pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
-
- tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
- if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
- pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
- tme_crypto_algs);
- mktme_status = MKTME_DISABLED;
- }
-detect_keyid_bits:
- keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
- nr_keyids = (1UL << keyid_bits) - 1;
- if (nr_keyids) {
- pr_info_once("x86/mktme: enabled by BIOS\n");
- pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids);
- } else {
- pr_info_once("x86/mktme: disabled by BIOS\n");
- }
-
- if (mktme_status == MKTME_UNINITIALIZED) {
- /* MKTME is usable */
- mktme_status = MKTME_ENABLED;
- }
-
- /*
- * KeyID bits effectively lower the number of physical address
- * bits. Update cpuinfo_x86::x86_phys_bits accordingly.
- */
- c->x86_phys_bits -= keyid_bits;
-}
-
static void init_cpuid_fault(struct cpuinfo_x86 *c)
{
u64 msr;
@@ -702,9 +709,6 @@ static void init_intel(struct cpuinfo_x86 *c)
init_ia32_feat_ctl(c);
- if (cpu_has(c, X86_FEATURE_TME))
- detect_tme(c);
-
init_intel_misc_features(c);
split_lock_init();
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index fb8cf953..b66f540 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1017,10 +1017,12 @@ void __init e820__reserve_setup_data(void)
e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
/*
- * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
- * to be reserved.
+ * SETUP_EFI, SETUP_IMA and SETUP_RNG_SEED are supplied by
+ * kexec and do not need to be reserved.
*/
- if (data->type != SETUP_EFI && data->type != SETUP_IMA)
+ if (data->type != SETUP_EFI &&
+ data->type != SETUP_IMA &&
+ data->type != SETUP_RNG_SEED)
e820__range_update_kexec(pa_data,
sizeof(*data) + data->len,
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 558076db..247f222 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -274,12 +274,13 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
* Attempt to restore the FPU registers directly from user memory.
* Pagefaults are handled and any errors returned are fatal.
*/
-static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
- bool fx_only, unsigned int size)
+static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
{
struct fpu *fpu = ¤t->thread.fpu;
int ret;
+ /* Restore enabled features only. */
+ xrestore &= fpu->fpstate->user_xfeatures;
retry:
fpregs_lock();
/* Ensure that XFD is up to date */
@@ -309,7 +310,7 @@ static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
if (ret != X86_TRAP_PF)
return false;
- if (!fault_in_readable(buf, size))
+ if (!fault_in_readable(buf, fpu->fpstate->user_size))
goto retry;
return false;
}
@@ -339,7 +340,6 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
struct user_i387_ia32_struct env;
bool success, fx_only = false;
union fpregs_state *fpregs;
- unsigned int state_size;
u64 user_xfeatures = 0;
if (use_xsave()) {
@@ -349,17 +349,14 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
return false;
fx_only = !fx_sw_user.magic1;
- state_size = fx_sw_user.xstate_size;
user_xfeatures = fx_sw_user.xfeatures;
} else {
user_xfeatures = XFEATURE_MASK_FPSSE;
- state_size = fpu->fpstate->user_size;
}
if (likely(!ia32_fxstate)) {
/* Restore the FPU registers directly from user memory. */
- return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only,
- state_size);
+ return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only);
}
/*
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index dfe9945..428ee74 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -434,7 +434,8 @@ static void __init sev_map_percpu_data(void)
{
int cpu;
- if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
+ if (cc_vendor != CC_VENDOR_AMD ||
+ !cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return;
for_each_possible_cpu(cpu) {
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 17e955a..3082cf2 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -563,9 +563,6 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
}
if (this_cpu_dec_return(nmi_state))
goto nmi_restart;
-
- if (user_mode(regs))
- mds_user_clear_cpu_buffers();
}
#if IS_ENABLED(CONFIG_KVM_INTEL)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 87e3da7..65ed14b 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -80,9 +80,10 @@
depends on KVM && X86_64
select KVM_GENERIC_PRIVATE_MEM
help
- Enable support for KVM software-protected VMs. Currently "protected"
- means the VM can be backed with memory provided by
- KVM_CREATE_GUEST_MEMFD.
+ Enable support for KVM software-protected VMs. Currently, software-
+ protected VMs are purely a development and testing vehicle for
+ KVM_CREATE_GUEST_MEMFD. Attempting to run a "real" VM workload as a
+ software-protected VM will fail miserably.
If unsure, say "N".
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 4943f6b..8a47f85 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1322,6 +1322,56 @@ static bool hv_check_msr_access(struct kvm_vcpu_hv *hv_vcpu, u32 msr)
return false;
}
+#define KVM_HV_WIN2016_GUEST_ID 0x1040a00003839
+#define KVM_HV_WIN2016_GUEST_ID_MASK (~GENMASK_ULL(23, 16)) /* mask out the service version */
+
+/*
+ * Hyper-V enabled Windows Server 2016 SMP VMs fail to boot in !XSAVES && XSAVEC
+ * configuration.
+ * Such configuration can result from, for example, AMD Erratum 1386 workaround.
+ *
+ * Print a notice so users aren't left wondering what's suddenly gone wrong.
+ */
+static void __kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_hv *hv = to_kvm_hv(kvm);
+
+ /* Check again under the hv_lock. */
+ if (hv->xsaves_xsavec_checked)
+ return;
+
+ if ((hv->hv_guest_os_id & KVM_HV_WIN2016_GUEST_ID_MASK) !=
+ KVM_HV_WIN2016_GUEST_ID)
+ return;
+
+ hv->xsaves_xsavec_checked = true;
+
+ /* UP configurations aren't affected */
+ if (atomic_read(&kvm->online_vcpus) < 2)
+ return;
+
+ if (guest_cpuid_has(vcpu, X86_FEATURE_XSAVES) ||
+ !guest_cpuid_has(vcpu, X86_FEATURE_XSAVEC))
+ return;
+
+ pr_notice_ratelimited("Booting SMP Windows KVM VM with !XSAVES && XSAVEC. "
+ "If it fails to boot try disabling XSAVEC in the VM config.\n");
+}
+
+void kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu)
+{
+ struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
+
+ if (!vcpu->arch.hyperv_enabled ||
+ hv->xsaves_xsavec_checked)
+ return;
+
+ mutex_lock(&hv->hv_lock);
+ __kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
+ mutex_unlock(&hv->hv_lock);
+}
+
static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
bool host)
{
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 1dc0b66..923e649 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -182,6 +182,8 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
struct pvclock_vcpu_time_info *hv_clock);
void kvm_hv_request_tsc_page_update(struct kvm *kvm);
+void kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu);
+
void kvm_hv_init_vm(struct kvm *kvm);
void kvm_hv_destroy_vm(struct kvm *kvm);
int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu);
@@ -267,6 +269,7 @@ int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
static inline void kvm_hv_setup_tsc_page(struct kvm *kvm,
struct pvclock_vcpu_time_info *hv_clock) {}
static inline void kvm_hv_request_tsc_page_update(struct kvm *kvm) {}
+static inline void kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu) {}
static inline void kvm_hv_init_vm(struct kvm *kvm) {}
static inline void kvm_hv_destroy_vm(struct kvm *kvm) {}
static inline int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2d6cdea..0544700 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4405,6 +4405,31 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
smp_rmb();
+ /*
+ * Check for a relevant mmu_notifier invalidation event before getting
+ * the pfn from the primary MMU, and before acquiring mmu_lock.
+ *
+ * For mmu_lock, if there is an in-progress invalidation and the kernel
+ * allows preemption, the invalidation task may drop mmu_lock and yield
+ * in response to mmu_lock being contended, which is *very* counter-
+ * productive as this vCPU can't actually make forward progress until
+ * the invalidation completes.
+ *
+ * Retrying now can also avoid unnessary lock contention in the primary
+ * MMU, as the primary MMU doesn't necessarily hold a single lock for
+ * the duration of the invalidation, i.e. faulting in a conflicting pfn
+ * can cause the invalidation to take longer by holding locks that are
+ * needed to complete the invalidation.
+ *
+ * Do the pre-check even for non-preemtible kernels, i.e. even if KVM
+ * will never yield mmu_lock in response to contention, as this vCPU is
+ * *guaranteed* to need to retry, i.e. waiting until mmu_lock is held
+ * to detect retry guarantees the worst case latency for the vCPU.
+ */
+ if (fault->slot &&
+ mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
+ return RET_PF_RETRY;
+
ret = __kvm_faultin_pfn(vcpu, fault);
if (ret != RET_PF_CONTINUE)
return ret;
@@ -4415,6 +4440,18 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
if (unlikely(!fault->slot))
return kvm_handle_noslot_fault(vcpu, fault, access);
+ /*
+ * Check again for a relevant mmu_notifier invalidation event purely to
+ * avoid contending mmu_lock. Most invalidations will be detected by
+ * the previous check, but checking is extremely cheap relative to the
+ * overall cost of failing to detect the invalidation until after
+ * mmu_lock is acquired.
+ */
+ if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn)) {
+ kvm_release_pfn_clean(fault->pfn);
+ return RET_PF_RETRY;
+ }
+
return RET_PF_CONTINUE;
}
@@ -4442,6 +4479,11 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
return true;
+ /*
+ * Check for a relevant mmu_notifier invalidation event one last time
+ * now that mmu_lock is held, as the "unsafe" checks performed without
+ * holding mmu_lock can get false negatives.
+ */
return fault->slot &&
mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn);
}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f760106..a8ce5226 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -57,7 +57,7 @@ static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);
/* enable/disable SEV-ES DebugSwap support */
-static bool sev_es_debug_swap_enabled = true;
+static bool sev_es_debug_swap_enabled = false;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
#else
#define sev_enabled false
@@ -612,8 +612,11 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->xss = svm->vcpu.arch.ia32_xss;
save->dr6 = svm->vcpu.arch.dr6;
- if (sev_es_debug_swap_enabled)
+ if (sev_es_debug_swap_enabled) {
save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+ pr_warn_once("Enabling DebugSwap with KVM_SEV_ES_INIT. "
+ "This will not work starting with Linux 6.10\n");
+ }
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
@@ -1975,20 +1978,22 @@ int sev_mem_enc_register_region(struct kvm *kvm,
goto e_free;
}
+ /*
+ * The guest may change the memory encryption attribute from C=0 -> C=1
+ * or vice versa for this memory range. Lets make sure caches are
+ * flushed to ensure that guest data gets written into memory with
+ * correct C-bit. Note, this must be done before dropping kvm->lock,
+ * as region and its array of pages can be freed by a different task
+ * once kvm->lock is released.
+ */
+ sev_clflush_pages(region->pages, region->npages);
+
region->uaddr = range->addr;
region->size = range->size;
list_add_tail(®ion->list, &sev->regions_list);
mutex_unlock(&kvm->lock);
- /*
- * The guest may change the memory encryption attribute from C=0 -> C=1
- * or vice versa for this memory range. Lets make sure caches are
- * flushed to ensure that guest data gets written into memory with
- * correct C-bit.
- */
- sev_clflush_pages(region->pages, region->npages);
-
return ret;
e_free:
diff --git a/arch/x86/kvm/svm/svm_ops.h b/arch/x86/kvm/svm/svm_ops.h
index 36c8af8..4e72585 100644
--- a/arch/x86/kvm/svm/svm_ops.h
+++ b/arch/x86/kvm/svm/svm_ops.h
@@ -8,7 +8,7 @@
#define svm_asm(insn, clobber...) \
do { \
- asm_volatile_goto("1: " __stringify(insn) "\n\t" \
+ asm goto("1: " __stringify(insn) "\n\t" \
_ASM_EXTABLE(1b, %l[fault]) \
::: clobber : fault); \
return; \
@@ -18,7 +18,7 @@ fault: \
#define svm_asm1(insn, op1, clobber...) \
do { \
- asm_volatile_goto("1: " __stringify(insn) " %0\n\t" \
+ asm goto("1: " __stringify(insn) " %0\n\t" \
_ASM_EXTABLE(1b, %l[fault]) \
:: op1 : clobber : fault); \
return; \
@@ -28,7 +28,7 @@ fault: \
#define svm_asm2(insn, op1, op2, clobber...) \
do { \
- asm_volatile_goto("1: " __stringify(insn) " %1, %0\n\t" \
+ asm goto("1: " __stringify(insn) " %1, %0\n\t" \
_ASM_EXTABLE(1b, %l[fault]) \
:: op1, op2 : clobber : fault); \
return; \
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index a6216c8..315c7c2 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -71,7 +71,7 @@ static int fixed_pmc_events[] = {
static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
{
struct kvm_pmc *pmc;
- u8 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl;
+ u64 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl;
int i;
pmu->fixed_ctr_ctrl = data;
diff --git a/arch/x86/kvm/vmx/run_flags.h b/arch/x86/kvm/vmx/run_flags.h
index edc3f16..6a9bfdf 100644
--- a/arch/x86/kvm/vmx/run_flags.h
+++ b/arch/x86/kvm/vmx/run_flags.h
@@ -2,7 +2,10 @@
#ifndef __KVM_X86_VMX_RUN_FLAGS_H
#define __KVM_X86_VMX_RUN_FLAGS_H
-#define VMX_RUN_VMRESUME (1 << 0)
-#define VMX_RUN_SAVE_SPEC_CTRL (1 << 1)
+#define VMX_RUN_VMRESUME_SHIFT 0
+#define VMX_RUN_SAVE_SPEC_CTRL_SHIFT 1
+
+#define VMX_RUN_VMRESUME BIT(VMX_RUN_VMRESUME_SHIFT)
+#define VMX_RUN_SAVE_SPEC_CTRL BIT(VMX_RUN_SAVE_SPEC_CTRL_SHIFT)
#endif /* __KVM_X86_VMX_RUN_FLAGS_H */
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 906ecd0..2bfbf75 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -139,7 +139,7 @@
mov (%_ASM_SP), %_ASM_AX
/* Check if vmlaunch or vmresume is needed */
- test $VMX_RUN_VMRESUME, %ebx
+ bt $VMX_RUN_VMRESUME_SHIFT, %ebx
/* Load guest registers. Don't clobber flags. */
mov VCPU_RCX(%_ASM_AX), %_ASM_CX
@@ -161,8 +161,11 @@
/* Load guest RAX. This kills the @regs pointer! */
mov VCPU_RAX(%_ASM_AX), %_ASM_AX
- /* Check EFLAGS.ZF from 'test VMX_RUN_VMRESUME' above */
- jz .Lvmlaunch
+ /* Clobbers EFLAGS.ZF */
+ CLEAR_CPU_BUFFERS
+
+ /* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
+ jnc .Lvmlaunch
/*
* After a successful VMRESUME/VMLAUNCH, control flow "magically"
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e262bc2..88a4ff2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -388,7 +388,16 @@ static __always_inline void vmx_enable_fb_clear(struct vcpu_vmx *vmx)
static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
{
- vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
+ /*
+ * Disable VERW's behavior of clearing CPU buffers for the guest if the
+ * CPU isn't affected by MDS/TAA, and the host hasn't forcefully enabled
+ * the mitigation. Disabling the clearing behavior provides a
+ * performance boost for guests that aren't aware that manually clearing
+ * CPU buffers is unnecessary, at the cost of MSR accesses on VM-Entry
+ * and VM-Exit.
+ */
+ vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) &&
+ (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
!boot_cpu_has_bug(X86_BUG_MDS) &&
!boot_cpu_has_bug(X86_BUG_TAA);
@@ -738,7 +747,7 @@ static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx,
*/
static int kvm_cpu_vmxoff(void)
{
- asm_volatile_goto("1: vmxoff\n\t"
+ asm goto("1: vmxoff\n\t"
_ASM_EXTABLE(1b, %l[fault])
::: "cc", "memory" : fault);
@@ -2784,7 +2793,7 @@ static int kvm_cpu_vmxon(u64 vmxon_pointer)
cr4_set_bits(X86_CR4_VMXE);
- asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
+ asm goto("1: vmxon %[vmxon_pointer]\n\t"
_ASM_EXTABLE(1b, %l[fault])
: : [vmxon_pointer] "m"(vmxon_pointer)
: : fault);
@@ -7224,11 +7233,14 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
guest_state_enter_irqoff();
- /* L1D Flush includes CPU buffer clear to mitigate MDS */
+ /*
+ * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
+ * mitigation for MDS is done late in VMentry and is still
+ * executed in spite of L1D Flush. This is because an extra VERW
+ * should not matter much after the big hammer L1D Flush.
+ */
if (static_branch_unlikely(&vmx_l1d_should_flush))
vmx_l1d_flush(vcpu);
- else if (static_branch_unlikely(&mds_user_clear))
- mds_clear_cpu_buffers();
else if (static_branch_unlikely(&mmio_stale_data_clear) &&
kvm_arch_has_assigned_device(vcpu->kvm))
mds_clear_cpu_buffers();
diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
index f41ce3c..8060e5f 100644
--- a/arch/x86/kvm/vmx/vmx_ops.h
+++ b/arch/x86/kvm/vmx/vmx_ops.h
@@ -94,7 +94,7 @@ static __always_inline unsigned long __vmcs_readl(unsigned long field)
#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
- asm_volatile_goto("1: vmread %[field], %[output]\n\t"
+ asm_goto_output("1: vmread %[field], %[output]\n\t"
"jna %l[do_fail]\n\t"
_ASM_EXTABLE(1b, %l[do_exception])
@@ -188,7 +188,7 @@ static __always_inline unsigned long vmcs_readl(unsigned long field)
#define vmx_asm1(insn, op1, error_args...) \
do { \
- asm_volatile_goto("1: " __stringify(insn) " %0\n\t" \
+ asm goto("1: " __stringify(insn) " %0\n\t" \
".byte 0x2e\n\t" /* branch not taken hint */ \
"jna %l[error]\n\t" \
_ASM_EXTABLE(1b, %l[fault]) \
@@ -205,7 +205,7 @@ fault: \
#define vmx_asm2(insn, op1, op2, error_args...) \
do { \
- asm_volatile_goto("1: " __stringify(insn) " %1, %0\n\t" \
+ asm goto("1: " __stringify(insn) " %1, %0\n\t" \
".byte 0x2e\n\t" /* branch not taken hint */ \
"jna %l[error]\n\t" \
_ASM_EXTABLE(1b, %l[fault]) \
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 363b1c080..e02cc710 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1704,22 +1704,17 @@ static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
struct kvm_msr_entry msr;
int r;
+ /* Unconditionally clear the output for simplicity */
+ msr.data = 0;
msr.index = index;
r = kvm_get_msr_feature(&msr);
- if (r == KVM_MSR_RET_INVALID) {
- /* Unconditionally clear the output for simplicity */
- *data = 0;
- if (kvm_msr_ignored_check(index, 0, false))
- r = 0;
- }
-
- if (r)
- return r;
+ if (r == KVM_MSR_RET_INVALID && kvm_msr_ignored_check(index, 0, false))
+ r = 0;
*data = msr.data;
- return 0;
+ return r;
}
static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
@@ -1782,6 +1777,10 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
kvm_mmu_reset_context(vcpu);
+ if (!static_cpu_has(X86_FEATURE_XSAVES) &&
+ (efer & EFER_SVME))
+ kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
+
return 0;
}
@@ -2507,7 +2506,7 @@ static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
}
#ifdef CONFIG_X86_64
-static inline int gtod_is_based_on_tsc(int mode)
+static inline bool gtod_is_based_on_tsc(int mode)
{
return mode == VDSO_CLOCKMODE_TSC || mode == VDSO_CLOCKMODE_HVCLOCK;
}
@@ -4581,7 +4580,7 @@ static bool kvm_is_vm_type_supported(unsigned long type)
{
return type == KVM_X86_DEFAULT_VM ||
(type == KVM_X86_SW_PROTECTED_VM &&
- IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_enabled);
+ IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled);
}
int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
@@ -5454,7 +5453,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
vcpu->arch.nmi_pending = 0;
atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
- kvm_make_request(KVM_REQ_NMI, vcpu);
+ if (events->nmi.pending)
+ kvm_make_request(KVM_REQ_NMI, vcpu);
}
static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
@@ -7016,6 +7016,9 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
r = -EEXIST;
if (kvm->arch.vpit)
goto create_pit_unlock;
+ r = -ENOENT;
+ if (!pic_in_kernel(kvm))
+ goto create_pit_unlock;
r = -ENOMEM;
kvm->arch.vpit = kvm_create_pit(kvm, u.pit_config.flags);
if (kvm->arch.vpit)
@@ -8004,6 +8007,16 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
if (r < 0)
return X86EMUL_UNHANDLEABLE;
+
+ /*
+ * Mark the page dirty _before_ checking whether or not the CMPXCHG was
+ * successful, as the old value is written back on failure. Note, for
+ * live migration, this is unnecessarily conservative as CMPXCHG writes
+ * back the original value and the access is atomic, but KVM's ABI is
+ * that all writes are dirty logged, regardless of the value written.
+ */
+ kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));
+
if (r)
return X86EMUL_CMPXCHG_FAILED;
diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S
index 20ef350..10d5ed8 100644
--- a/arch/x86/lib/getuser.S
+++ b/arch/x86/lib/getuser.S
@@ -163,23 +163,23 @@
#endif
/* get_user */
- _ASM_EXTABLE(1b, __get_user_handle_exception)
- _ASM_EXTABLE(2b, __get_user_handle_exception)
- _ASM_EXTABLE(3b, __get_user_handle_exception)
+ _ASM_EXTABLE_UA(1b, __get_user_handle_exception)
+ _ASM_EXTABLE_UA(2b, __get_user_handle_exception)
+ _ASM_EXTABLE_UA(3b, __get_user_handle_exception)
#ifdef CONFIG_X86_64
- _ASM_EXTABLE(4b, __get_user_handle_exception)
+ _ASM_EXTABLE_UA(4b, __get_user_handle_exception)
#else
- _ASM_EXTABLE(4b, __get_user_8_handle_exception)
- _ASM_EXTABLE(5b, __get_user_8_handle_exception)
+ _ASM_EXTABLE_UA(4b, __get_user_8_handle_exception)
+ _ASM_EXTABLE_UA(5b, __get_user_8_handle_exception)
#endif
/* __get_user */
- _ASM_EXTABLE(6b, __get_user_handle_exception)
- _ASM_EXTABLE(7b, __get_user_handle_exception)
- _ASM_EXTABLE(8b, __get_user_handle_exception)
+ _ASM_EXTABLE_UA(6b, __get_user_handle_exception)
+ _ASM_EXTABLE_UA(7b, __get_user_handle_exception)
+ _ASM_EXTABLE_UA(8b, __get_user_handle_exception)
#ifdef CONFIG_X86_64
- _ASM_EXTABLE(9b, __get_user_handle_exception)
+ _ASM_EXTABLE_UA(9b, __get_user_handle_exception)
#else
- _ASM_EXTABLE(9b, __get_user_8_handle_exception)
- _ASM_EXTABLE(10b, __get_user_8_handle_exception)
+ _ASM_EXTABLE_UA(9b, __get_user_8_handle_exception)
+ _ASM_EXTABLE_UA(10b, __get_user_8_handle_exception)
#endif
diff --git a/arch/x86/lib/putuser.S b/arch/x86/lib/putuser.S
index 2877f59..975c9c1 100644
--- a/arch/x86/lib/putuser.S
+++ b/arch/x86/lib/putuser.S
@@ -133,15 +133,15 @@
RET
SYM_CODE_END(__put_user_handle_exception)
- _ASM_EXTABLE(1b, __put_user_handle_exception)
- _ASM_EXTABLE(2b, __put_user_handle_exception)
- _ASM_EXTABLE(3b, __put_user_handle_exception)
- _ASM_EXTABLE(4b, __put_user_handle_exception)
- _ASM_EXTABLE(5b, __put_user_handle_exception)
- _ASM_EXTABLE(6b, __put_user_handle_exception)
- _ASM_EXTABLE(7b, __put_user_handle_exception)
- _ASM_EXTABLE(9b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(1b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(2b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(3b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(4b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(5b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(6b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(7b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(9b, __put_user_handle_exception)
#ifdef CONFIG_X86_32
- _ASM_EXTABLE(8b, __put_user_handle_exception)
- _ASM_EXTABLE(10b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(8b, __put_user_handle_exception)
+ _ASM_EXTABLE_UA(10b, __put_user_handle_exception)
#endif
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 679b09c..d6375b3c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -798,15 +798,6 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
show_opcodes(regs, loglvl);
}
-/*
- * The (legacy) vsyscall page is the long page in the kernel portion
- * of the address space that has user-accessible permissions.
- */
-static bool is_vsyscall_vaddr(unsigned long vaddr)
-{
- return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
-}
-
static void
__bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
unsigned long address, u32 pkey, int si_code)
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d700..f50cc21 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -26,18 +26,31 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
for (; addr < end; addr = next) {
pud_t *pud = pud_page + pud_index(addr);
pmd_t *pmd;
+ bool use_gbpage;
next = (addr & PUD_MASK) + PUD_SIZE;
if (next > end)
next = end;
- if (info->direct_gbpages) {
+ /* if this is already a gbpage, this portion is already mapped */
+ if (pud_large(*pud))
+ continue;
+
+ /* Is using a gbpage allowed? */
+ use_gbpage = info->direct_gbpages;
+
+ /* Don't use gbpage if it maps more than the requested region. */
+ /* at the begining: */
+ use_gbpage &= ((addr & ~PUD_MASK) == 0);
+ /* ... or at the end: */
+ use_gbpage &= ((next & ~PUD_MASK) == 0);
+
+ /* Never overwrite existing mappings */
+ use_gbpage &= !pud_present(*pud);
+
+ if (use_gbpage) {
pud_t pudval;
- if (pud_present(*pud))
- continue;
-
- addr &= PUD_MASK;
pudval = __pud((addr - info->offset) | info->page_flag);
set_pud(pud, pudval);
continue;
diff --git a/arch/x86/mm/maccess.c b/arch/x86/mm/maccess.c
index 6993f02..42115ac 100644
--- a/arch/x86/mm/maccess.c
+++ b/arch/x86/mm/maccess.c
@@ -3,6 +3,8 @@
#include <linux/uaccess.h>
#include <linux/kernel.h>
+#include <asm/vsyscall.h>
+
#ifdef CONFIG_X86_64
bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
@@ -16,6 +18,14 @@ bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
return false;
/*
+ * Reading from the vsyscall page may cause an unhandled fault in
+ * certain cases. Though it is at an address above TASK_SIZE_MAX, it is
+ * usually considered as a user space address.
+ */
+ if (is_vsyscall_vaddr(vaddr))
+ return false;
+
+ /*
* Allow everything during early boot before 'x86_virt_bits'
* is initialized. Needed for instruction decoding in early
* exception handlers.
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index adc497b..65e9a6e 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -934,7 +934,7 @@ static int __init cmp_memblk(const void *a, const void *b)
const struct numa_memblk *ma = *(const struct numa_memblk **)a;
const struct numa_memblk *mb = *(const struct numa_memblk **)b;
- return ma->start - mb->start;
+ return (ma->start > mb->start) - (ma->start < mb->start);
}
static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
@@ -944,14 +944,12 @@ static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
* @start: address to begin fill
* @end: address to end fill
*
- * Find and extend numa_meminfo memblks to cover the @start-@end
- * physical address range, such that the first memblk includes
- * @start, the last memblk includes @end, and any gaps in between
- * are filled.
+ * Find and extend numa_meminfo memblks to cover the physical
+ * address range @start-@end
*
* RETURNS:
* 0 : Success
- * NUMA_NO_MEMBLK : No memblk exists in @start-@end range
+ * NUMA_NO_MEMBLK : No memblks exist in address range @start-@end
*/
int __init numa_fill_memblks(u64 start, u64 end)
@@ -963,17 +961,14 @@ int __init numa_fill_memblks(u64 start, u64 end)
/*
* Create a list of pointers to numa_meminfo memblks that
- * overlap start, end. Exclude (start == bi->end) since
- * end addresses in both a CFMWS range and a memblk range
- * are exclusive.
- *
- * This list of pointers is used to make in-place changes
- * that fill out the numa_meminfo memblks.
+ * overlap start, end. The list is used to make in-place
+ * changes that fill out the numa_meminfo memblks.
*/
for (int i = 0; i < mi->nr_blks; i++) {
struct numa_memblk *bi = &mi->blk[i];
- if (start < bi->end && end >= bi->start) {
+ if (memblock_addrs_overlap(start, end - start, bi->start,
+ bi->end - bi->start)) {
blk[count] = &mi->blk[i];
count++;
}
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index e9b448d..1028804 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -755,10 +755,14 @@ pmd_t *lookup_pmd_address(unsigned long address)
* areas on 32-bit NUMA systems. The percpu areas can
* end up in this kind of memory, for instance.
*
- * This could be optimized, but it is only intended to be
- * used at initialization time, and keeping it
- * unoptimized should increase the testing coverage for
- * the more obscure platforms.
+ * Note that as long as the PTEs are well-formed with correct PFNs, this
+ * works without checking the PRESENT bit in the leaf PTE. This is unlike
+ * the similar vmalloc_to_page() and derivatives. Callers may depend on
+ * this behavior.
+ *
+ * This could be optimized, but it is only used in paths that are not perf
+ * sensitive, and keeping it unoptimized should increase the testing coverage
+ * for the more obscure platforms.
*/
phys_addr_t slow_virt_to_phys(void *__virt_addr)
{
@@ -2041,17 +2045,12 @@ int set_mce_nospec(unsigned long pfn)
return rc;
}
-static int set_memory_p(unsigned long *addr, int numpages)
-{
- return change_page_attr_set(addr, numpages, __pgprot(_PAGE_PRESENT), 0);
-}
-
/* Restore full speculative operation to the pfn. */
int clear_mce_nospec(unsigned long pfn)
{
unsigned long addr = (unsigned long) pfn_to_kaddr(pfn);
- return set_memory_p(&addr, 1);
+ return set_memory_p(addr, 1);
}
EXPORT_SYMBOL_GPL(clear_mce_nospec);
#endif /* CONFIG_X86_64 */
@@ -2104,6 +2103,11 @@ int set_memory_np_noalias(unsigned long addr, int numpages)
CPA_NO_CHECK_ALIAS, NULL);
}
+int set_memory_p(unsigned long addr, int numpages)
+{
+ return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+}
+
int set_memory_4k(unsigned long addr, int numpages)
{
return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 4b0d6ff..1fb9a16 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -65,6 +65,8 @@ int xen_smp_intr_init(unsigned int cpu)
char *resched_name, *callfunc_name, *debug_name;
resched_name = kasprintf(GFP_KERNEL, "resched%d", cpu);
+ if (!resched_name)
+ goto fail_mem;
per_cpu(xen_resched_irq, cpu).name = resched_name;
rc = bind_ipi_to_irqhandler(XEN_RESCHEDULE_VECTOR,
cpu,
@@ -77,6 +79,8 @@ int xen_smp_intr_init(unsigned int cpu)
per_cpu(xen_resched_irq, cpu).irq = rc;
callfunc_name = kasprintf(GFP_KERNEL, "callfunc%d", cpu);
+ if (!callfunc_name)
+ goto fail_mem;
per_cpu(xen_callfunc_irq, cpu).name = callfunc_name;
rc = bind_ipi_to_irqhandler(XEN_CALL_FUNCTION_VECTOR,
cpu,
@@ -90,6 +94,9 @@ int xen_smp_intr_init(unsigned int cpu)
if (!xen_fifo_events) {
debug_name = kasprintf(GFP_KERNEL, "debug%d", cpu);
+ if (!debug_name)
+ goto fail_mem;
+
per_cpu(xen_debug_irq, cpu).name = debug_name;
rc = bind_virq_to_irqhandler(VIRQ_DEBUG, cpu,
xen_debug_interrupt,
@@ -101,6 +108,9 @@ int xen_smp_intr_init(unsigned int cpu)
}
callfunc_name = kasprintf(GFP_KERNEL, "callfuncsingle%d", cpu);
+ if (!callfunc_name)
+ goto fail_mem;
+
per_cpu(xen_callfuncsingle_irq, cpu).name = callfunc_name;
rc = bind_ipi_to_irqhandler(XEN_CALL_FUNCTION_SINGLE_VECTOR,
cpu,
@@ -114,6 +124,8 @@ int xen_smp_intr_init(unsigned int cpu)
return 0;
+ fail_mem:
+ rc = -ENOMEM;
fail:
xen_smp_intr_free(cpu);
return rc;
diff --git a/arch/xtensa/include/asm/jump_label.h b/arch/xtensa/include/asm/jump_label.h
index c812bf8..46c8596 100644
--- a/arch/xtensa/include/asm/jump_label.h
+++ b/arch/xtensa/include/asm/jump_label.h
@@ -13,7 +13,7 @@
static __always_inline bool arch_static_branch(struct static_key *key,
bool branch)
{
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
"_nop\n\t"
".pushsection __jump_table, \"aw\"\n\t"
".word 1b, %l[l_yes], %c0\n\t"
@@ -38,7 +38,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
* make it reachable and wrap both into a no-transform block
* to avoid any assembler interference with this.
*/
- asm_volatile_goto("1:\n\t"
+ asm goto("1:\n\t"
".begin no-transform\n\t"
"_j %l[l_yes]\n\t"
"2:\n\t"
diff --git a/block/bdev.c b/block/bdev.c
index d5ed1a70..e7adaaf 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -49,6 +49,12 @@ struct block_device *I_BDEV(struct inode *inode)
}
EXPORT_SYMBOL(I_BDEV);
+struct block_device *file_bdev(struct file *bdev_file)
+{
+ return I_BDEV(bdev_file->f_mapping->host);
+}
+EXPORT_SYMBOL(file_bdev);
+
static void bdev_write_inode(struct block_device *bdev)
{
struct inode *inode = bdev->bd_inode;
@@ -368,12 +374,12 @@ static struct file_system_type bd_type = {
};
struct super_block *blockdev_superblock __ro_after_init;
+struct vfsmount *blockdev_mnt __ro_after_init;
EXPORT_SYMBOL_GPL(blockdev_superblock);
void __init bdev_cache_init(void)
{
int err;
- static struct vfsmount *bd_mnt __ro_after_init;
bdev_cachep = kmem_cache_create("bdev_cache", sizeof(struct bdev_inode),
0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
@@ -382,10 +388,10 @@ void __init bdev_cache_init(void)
err = register_filesystem(&bd_type);
if (err)
panic("Cannot register bdev pseudo-fs");
- bd_mnt = kern_mount(&bd_type);
- if (IS_ERR(bd_mnt))
+ blockdev_mnt = kern_mount(&bd_type);
+ if (IS_ERR(blockdev_mnt))
panic("Cannot create bdev pseudo-fs");
- blockdev_superblock = bd_mnt->mnt_sb; /* For writeback */
+ blockdev_superblock = blockdev_mnt->mnt_sb; /* For writeback */
}
struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
@@ -696,6 +702,31 @@ static int blkdev_get_part(struct block_device *part, blk_mode_t mode)
return ret;
}
+int bdev_permission(dev_t dev, blk_mode_t mode, void *holder)
+{
+ int ret;
+
+ ret = devcgroup_check_permission(DEVCG_DEV_BLOCK,
+ MAJOR(dev), MINOR(dev),
+ ((mode & BLK_OPEN_READ) ? DEVCG_ACC_READ : 0) |
+ ((mode & BLK_OPEN_WRITE) ? DEVCG_ACC_WRITE : 0));
+ if (ret)
+ return ret;
+
+ /* Blocking writes requires exclusive opener */
+ if (mode & BLK_OPEN_RESTRICT_WRITES && !holder)
+ return -EINVAL;
+
+ /*
+ * We're using error pointers to indicate to ->release() when we
+ * failed to open that block device. Also this doesn't make sense.
+ */
+ if (WARN_ON_ONCE(IS_ERR(holder)))
+ return -EINVAL;
+
+ return 0;
+}
+
static void blkdev_put_part(struct block_device *part)
{
struct block_device *whole = bdev_whole(part);
@@ -775,83 +806,55 @@ static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode)
bdev->bd_writers++;
}
-static void bdev_yield_write_access(struct block_device *bdev, blk_mode_t mode)
+static void bdev_yield_write_access(struct file *bdev_file)
{
+ struct block_device *bdev;
+
if (bdev_allow_write_mounted)
return;
+ bdev = file_bdev(bdev_file);
/* Yield exclusive or shared write access. */
- if (mode & BLK_OPEN_RESTRICT_WRITES)
- bdev_unblock_writes(bdev);
- else if (mode & BLK_OPEN_WRITE)
- bdev->bd_writers--;
+ if (bdev_file->f_mode & FMODE_WRITE) {
+ if (bdev_writes_blocked(bdev))
+ bdev_unblock_writes(bdev);
+ else
+ bdev->bd_writers--;
+ }
}
/**
- * bdev_open_by_dev - open a block device by device number
- * @dev: device number of block device to open
+ * bdev_open - open a block device
+ * @bdev: block device to open
* @mode: open mode (BLK_OPEN_*)
* @holder: exclusive holder identifier
* @hops: holder operations
+ * @bdev_file: file for the block device
*
- * Open the block device described by device number @dev. If @holder is not
- * %NULL, the block device is opened with exclusive access. Exclusive opens may
- * nest for the same @holder.
- *
- * Use this interface ONLY if you really do not have anything better - i.e. when
- * you are behind a truly sucky interface and all you are given is a device
- * number. Everything else should use bdev_open_by_path().
+ * Open the block device. If @holder is not %NULL, the block device is opened
+ * with exclusive access. Exclusive opens may nest for the same @holder.
*
* CONTEXT:
* Might sleep.
*
* RETURNS:
- * Handle with a reference to the block_device on success, ERR_PTR(-errno) on
- * failure.
+ * zero on success, -errno on failure.
*/
-struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
- const struct blk_holder_ops *hops)
+int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
+ const struct blk_holder_ops *hops, struct file *bdev_file)
{
- struct bdev_handle *handle = kmalloc(sizeof(struct bdev_handle),
- GFP_KERNEL);
- struct block_device *bdev;
bool unblock_events = true;
- struct gendisk *disk;
+ struct gendisk *disk = bdev->bd_disk;
int ret;
- if (!handle)
- return ERR_PTR(-ENOMEM);
-
- ret = devcgroup_check_permission(DEVCG_DEV_BLOCK,
- MAJOR(dev), MINOR(dev),
- ((mode & BLK_OPEN_READ) ? DEVCG_ACC_READ : 0) |
- ((mode & BLK_OPEN_WRITE) ? DEVCG_ACC_WRITE : 0));
- if (ret)
- goto free_handle;
-
- /* Blocking writes requires exclusive opener */
- if (mode & BLK_OPEN_RESTRICT_WRITES && !holder) {
- ret = -EINVAL;
- goto free_handle;
- }
-
- bdev = blkdev_get_no_open(dev);
- if (!bdev) {
- ret = -ENXIO;
- goto free_handle;
- }
- disk = bdev->bd_disk;
-
if (holder) {
mode |= BLK_OPEN_EXCL;
ret = bd_prepare_to_claim(bdev, holder, hops);
if (ret)
- goto put_blkdev;
+ return ret;
} else {
- if (WARN_ON_ONCE(mode & BLK_OPEN_EXCL)) {
- ret = -EIO;
- goto put_blkdev;
- }
+ if (WARN_ON_ONCE(mode & BLK_OPEN_EXCL))
+ return -EIO;
}
disk_block_events(disk);
@@ -892,10 +895,16 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
if (unblock_events)
disk_unblock_events(disk);
- handle->bdev = bdev;
- handle->holder = holder;
- handle->mode = mode;
- return handle;
+
+ bdev_file->f_flags |= O_LARGEFILE;
+ bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
+ if (bdev_nowait(bdev))
+ bdev_file->f_mode |= FMODE_NOWAIT;
+ bdev_file->f_mapping = bdev->bd_inode->i_mapping;
+ bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
+ bdev_file->private_data = holder;
+
+ return 0;
put_module:
module_put(disk->fops->owner);
abort_claiming:
@@ -903,36 +912,80 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
bd_abort_claiming(bdev, holder);
mutex_unlock(&disk->open_mutex);
disk_unblock_events(disk);
-put_blkdev:
- blkdev_put_no_open(bdev);
-free_handle:
- kfree(handle);
- return ERR_PTR(ret);
+ return ret;
}
-EXPORT_SYMBOL(bdev_open_by_dev);
-/**
- * bdev_open_by_path - open a block device by name
- * @path: path to the block device to open
- * @mode: open mode (BLK_OPEN_*)
- * @holder: exclusive holder identifier
- * @hops: holder operations
+/*
+ * If BLK_OPEN_WRITE_IOCTL is set then this is a historical quirk
+ * associated with the floppy driver where it has allowed ioctls if the
+ * file was opened for writing, but does not allow reads or writes.
+ * Make sure that this quirk is reflected in @f_flags.
*
- * Open the block device described by the device file at @path. If @holder is
- * not %NULL, the block device is opened with exclusive access. Exclusive opens
- * may nest for the same @holder.
- *
- * CONTEXT:
- * Might sleep.
- *
- * RETURNS:
- * Handle with a reference to the block_device on success, ERR_PTR(-errno) on
- * failure.
+ * It can also happen if a block device is opened as O_RDWR | O_WRONLY.
*/
-struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
- void *holder, const struct blk_holder_ops *hops)
+static unsigned blk_to_file_flags(blk_mode_t mode)
{
- struct bdev_handle *handle;
+ unsigned int flags = 0;
+
+ if ((mode & (BLK_OPEN_READ | BLK_OPEN_WRITE)) ==
+ (BLK_OPEN_READ | BLK_OPEN_WRITE))
+ flags |= O_RDWR;
+ else if (mode & BLK_OPEN_WRITE_IOCTL)
+ flags |= O_RDWR | O_WRONLY;
+ else if (mode & BLK_OPEN_WRITE)
+ flags |= O_WRONLY;
+ else if (mode & BLK_OPEN_READ)
+ flags |= O_RDONLY; /* homeopathic, because O_RDONLY is 0 */
+ else
+ WARN_ON_ONCE(true);
+
+ if (mode & BLK_OPEN_NDELAY)
+ flags |= O_NDELAY;
+
+ return flags;
+}
+
+struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+ const struct blk_holder_ops *hops)
+{
+ struct file *bdev_file;
+ struct block_device *bdev;
+ unsigned int flags;
+ int ret;
+
+ ret = bdev_permission(dev, mode, holder);
+ if (ret)
+ return ERR_PTR(ret);
+
+ bdev = blkdev_get_no_open(dev);
+ if (!bdev)
+ return ERR_PTR(-ENXIO);
+
+ flags = blk_to_file_flags(mode);
+ bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode,
+ blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops);
+ if (IS_ERR(bdev_file)) {
+ blkdev_put_no_open(bdev);
+ return bdev_file;
+ }
+ ihold(bdev->bd_inode);
+
+ ret = bdev_open(bdev, mode, holder, hops, bdev_file);
+ if (ret) {
+ /* We failed to open the block device. Let ->release() know. */
+ bdev_file->private_data = ERR_PTR(ret);
+ fput(bdev_file);
+ return ERR_PTR(ret);
+ }
+ return bdev_file;
+}
+EXPORT_SYMBOL(bdev_file_open_by_dev);
+
+struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode,
+ void *holder,
+ const struct blk_holder_ops *hops)
+{
+ struct file *file;
dev_t dev;
int error;
@@ -940,22 +993,28 @@ struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
if (error)
return ERR_PTR(error);
- handle = bdev_open_by_dev(dev, mode, holder, hops);
- if (!IS_ERR(handle) && (mode & BLK_OPEN_WRITE) &&
- bdev_read_only(handle->bdev)) {
- bdev_release(handle);
- return ERR_PTR(-EACCES);
+ file = bdev_file_open_by_dev(dev, mode, holder, hops);
+ if (!IS_ERR(file) && (mode & BLK_OPEN_WRITE)) {
+ if (bdev_read_only(file_bdev(file))) {
+ fput(file);
+ file = ERR_PTR(-EACCES);
+ }
}
- return handle;
+ return file;
}
-EXPORT_SYMBOL(bdev_open_by_path);
+EXPORT_SYMBOL(bdev_file_open_by_path);
-void bdev_release(struct bdev_handle *handle)
+void bdev_release(struct file *bdev_file)
{
- struct block_device *bdev = handle->bdev;
+ struct block_device *bdev = file_bdev(bdev_file);
+ void *holder = bdev_file->private_data;
struct gendisk *disk = bdev->bd_disk;
+ /* We failed to open that block device. */
+ if (IS_ERR(holder))
+ goto put_no_open;
+
/*
* Sync early if it looks like we're the last one. If someone else
* opens the block device between now and the decrement of bd_openers
@@ -967,10 +1026,10 @@ void bdev_release(struct bdev_handle *handle)
sync_blockdev(bdev);
mutex_lock(&disk->open_mutex);
- bdev_yield_write_access(bdev, handle->mode);
+ bdev_yield_write_access(bdev_file);
- if (handle->holder)
- bd_end_claim(bdev, handle->holder);
+ if (holder)
+ bd_end_claim(bdev, holder);
/*
* Trigger event checking and tell drivers to flush MEDIA_CHANGE
@@ -986,10 +1045,9 @@ void bdev_release(struct bdev_handle *handle)
mutex_unlock(&disk->open_mutex);
module_put(disk->fops->owner);
+put_no_open:
blkdev_put_no_open(bdev);
- kfree(handle);
}
-EXPORT_SYMBOL(bdev_release);
/**
* lookup_bdev() - Look up a struct block_device by name.
diff --git a/block/bio.c b/block/bio.c
index a8b6919..d24420e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -250,6 +250,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
bio->bi_opf = opf;
bio->bi_flags = 0;
bio->bi_ioprio = 0;
+ bio->bi_write_hint = 0;
bio->bi_status = 0;
bio->bi_iter.bi_sector = 0;
bio->bi_iter.bi_size = 0;
@@ -814,6 +815,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
{
bio_set_flag(bio, BIO_CLONED);
bio->bi_ioprio = bio_src->bi_ioprio;
+ bio->bi_write_hint = bio_src->bi_write_hint;
bio->bi_iter = bio_src->bi_iter;
if (bio->bi_bdev) {
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index e6468ea..b1e7415 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -172,6 +172,7 @@ static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
if (bio_flagged(bio_src, BIO_REMAPPED))
bio_set_flag(bio, BIO_REMAPPED);
bio->bi_ioprio = bio_src->bi_ioprio;
+ bio->bi_write_hint = bio_src->bi_write_hint;
bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 4b0b483..9a85bfb 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1353,6 +1353,13 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
lockdep_assert_held(&iocg->waitq.lock);
+ /*
+ * If the delay is set by another CPU, we may be in the past. No need to
+ * change anything if so. This avoids decay calculation underflow.
+ */
+ if (time_before64(now->now, iocg->delay_at))
+ return false;
+
/* calculate the current delay in effect - 1/2 every second */
tdelta = now->now - iocg->delay_at;
if (iocg->delay)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2d470cf..2a06fd3 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -810,6 +810,10 @@ static struct request *attempt_merge(struct request_queue *q,
if (rq_data_dir(req) != rq_data_dir(next))
return NULL;
+ /* Don't merge requests with different write hints. */
+ if (req->write_hint != next->write_hint)
+ return NULL;
+
if (req->ioprio != next->ioprio)
return NULL;
@@ -937,6 +941,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
if (!bio_crypt_rq_ctx_compatible(rq, bio))
return false;
+ /* Don't merge requests with different write hints. */
+ if (rq->write_hint != bio->bi_write_hint)
+ return false;
+
if (rq->ioprio != bio_prio(bio))
return false;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b7ba91d..555ada92 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2572,6 +2572,7 @@ static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
rq->cmd_flags |= REQ_FAILFAST_MASK;
rq->__sector = bio->bi_iter.bi_sector;
+ rq->write_hint = bio->bi_write_hint;
blk_rq_bio_prep(rq, bio, nr_segs);
/* This can't fail, since GFP_NOIO includes __GFP_DIRECT_RECLAIM. */
@@ -3170,6 +3171,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
}
rq->nr_phys_segments = rq_src->nr_phys_segments;
rq->ioprio = rq_src->ioprio;
+ rq->write_hint = rq_src->write_hint;
if (rq->bio && blk_crypto_rq_bio_prep(rq, rq->bio, gfp_mask) < 0)
goto free_and_out;
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 8cb53bf..6447213 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -164,9 +164,9 @@ static void wb_timestamp(struct rq_wb *rwb, unsigned long *var)
*/
static bool wb_recent_wait(struct rq_wb *rwb)
{
- struct bdi_writeback *wb = &rwb->rqos.disk->bdi->wb;
+ struct backing_dev_info *bdi = rwb->rqos.disk->bdi;
- return time_before(jiffies, wb->dirty_sleep + HZ);
+ return time_before(jiffies, bdi->last_bdp_sleep + HZ);
}
static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb,
diff --git a/block/blk.h b/block/blk.h
index 6c2749d..a19b7b4 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -596,4 +596,9 @@ static inline void bio_issue_init(struct bio_issue *issue,
((u64)size << BIO_ISSUE_SIZE_SHIFT));
}
+void bdev_release(struct file *bdev_file);
+int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
+ const struct blk_holder_ops *hops, struct file *bdev_file);
+int bdev_permission(dev_t dev, blk_mode_t mode, void *holder);
+
#endif /* BLK_INTERNAL_H */
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb24..d6a5219 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -169,6 +169,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
if (bio_flagged(bio_src, BIO_REMAPPED))
bio_set_flag(bio, BIO_REMAPPED);
bio->bi_ioprio = bio_src->bi_ioprio;
+ bio->bi_write_hint = bio_src->bi_write_hint;
bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
diff --git a/block/fops.c b/block/fops.c
index 0cf8cf7..679d9b7 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -73,6 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
}
bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+ bio.bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
bio.bi_ioprio = iocb->ki_ioprio;
ret = bio_iov_iter_get_pages(&bio, iter);
@@ -203,6 +204,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
for (;;) {
bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+ bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
bio->bi_ioprio = iocb->ki_ioprio;
@@ -321,6 +323,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
dio->flags = 0;
dio->iocb = iocb;
bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+ bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
bio->bi_end_io = blkdev_bio_end_io_async;
bio->bi_ioprio = iocb->ki_ioprio;
@@ -482,7 +485,7 @@ static void blkdev_readahead(struct readahead_control *rac)
}
static int blkdev_map_blocks(struct iomap_writepage_ctx *wpc,
- struct inode *inode, loff_t offset)
+ struct inode *inode, loff_t offset, unsigned int len)
{
loff_t isize = i_size_read(inode);
@@ -569,18 +572,17 @@ static int blkdev_fsync(struct file *filp, loff_t start, loff_t end,
blk_mode_t file_to_blk_mode(struct file *file)
{
blk_mode_t mode = 0;
- struct bdev_handle *handle = file->private_data;
if (file->f_mode & FMODE_READ)
mode |= BLK_OPEN_READ;
if (file->f_mode & FMODE_WRITE)
mode |= BLK_OPEN_WRITE;
/*
- * do_dentry_open() clears O_EXCL from f_flags, use handle->mode to
- * determine whether the open was exclusive for already open files.
+ * do_dentry_open() clears O_EXCL from f_flags, use file->private_data
+ * to determine whether the open was exclusive for already open files.
*/
- if (handle)
- mode |= handle->mode & BLK_OPEN_EXCL;
+ if (file->private_data)
+ mode |= BLK_OPEN_EXCL;
else if (file->f_flags & O_EXCL)
mode |= BLK_OPEN_EXCL;
if (file->f_flags & O_NDELAY)
@@ -599,36 +601,31 @@ blk_mode_t file_to_blk_mode(struct file *file)
static int blkdev_open(struct inode *inode, struct file *filp)
{
- struct bdev_handle *handle;
+ struct block_device *bdev;
blk_mode_t mode;
-
- /*
- * Preserve backwards compatibility and allow large file access
- * even if userspace doesn't ask for it explicitly. Some mkfs
- * binary needs it. We might want to drop this workaround
- * during an unstable branch.
- */
- filp->f_flags |= O_LARGEFILE;
- filp->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
+ int ret;
mode = file_to_blk_mode(filp);
- handle = bdev_open_by_dev(inode->i_rdev, mode,
- mode & BLK_OPEN_EXCL ? filp : NULL, NULL);
- if (IS_ERR(handle))
- return PTR_ERR(handle);
+ /* Use the file as the holder. */
+ if (mode & BLK_OPEN_EXCL)
+ filp->private_data = filp;
+ ret = bdev_permission(inode->i_rdev, mode, filp->private_data);
+ if (ret)
+ return ret;
- if (bdev_nowait(handle->bdev))
- filp->f_mode |= FMODE_NOWAIT;
+ bdev = blkdev_get_no_open(inode->i_rdev);
+ if (!bdev)
+ return -ENXIO;
- filp->f_mapping = handle->bdev->bd_inode->i_mapping;
- filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
- filp->private_data = handle;
- return 0;
+ ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
+ if (ret)
+ blkdev_put_no_open(bdev);
+ return ret;
}
static int blkdev_release(struct inode *inode, struct file *filp)
{
- bdev_release(filp->private_data);
+ bdev_release(filp);
return 0;
}
diff --git a/block/genhd.c b/block/genhd.c
index a214f9c..bb29a68 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -342,7 +342,7 @@ EXPORT_SYMBOL_GPL(disk_uevent);
int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode)
{
- struct bdev_handle *handle;
+ struct file *file;
int ret = 0;
if (disk->flags & (GENHD_FL_NO_PART | GENHD_FL_HIDDEN))
@@ -366,12 +366,12 @@ int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode)
}
set_bit(GD_NEED_PART_SCAN, &disk->state);
- handle = bdev_open_by_dev(disk_devt(disk), mode & ~BLK_OPEN_EXCL, NULL,
- NULL);
- if (IS_ERR(handle))
- ret = PTR_ERR(handle);
+ file = bdev_file_open_by_dev(disk_devt(disk), mode & ~BLK_OPEN_EXCL,
+ NULL, NULL);
+ if (IS_ERR(file))
+ ret = PTR_ERR(file);
else
- bdev_release(handle);
+ fput(file);
/*
* If blkdev_get_by_dev() failed early, GD_NEED_PART_SCAN is still set,
diff --git a/block/ioctl.c b/block/ioctl.c
index de0cc0d..0c76137 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -476,7 +476,7 @@ static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
int __user *argp)
{
int ret, n;
- struct bdev_handle *handle;
+ struct file *file;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
@@ -488,12 +488,11 @@ static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
if (mode & BLK_OPEN_EXCL)
return set_blocksize(bdev, n);
- handle = bdev_open_by_dev(bdev->bd_dev, mode, &bdev, NULL);
- if (IS_ERR(handle))
+ file = bdev_file_open_by_dev(bdev->bd_dev, mode, &bdev, NULL);
+ if (IS_ERR(file))
return -EBUSY;
ret = set_blocksize(bdev, n);
- bdev_release(handle);
-
+ fput(file);
return ret;
}
diff --git a/block/opal_proto.h b/block/opal_proto.h
index dec7ce3a..d247a45 100644
--- a/block/opal_proto.h
+++ b/block/opal_proto.h
@@ -71,6 +71,7 @@ enum opal_response_token {
#define SHORT_ATOM_BYTE 0xBF
#define MEDIUM_ATOM_BYTE 0xDF
#define LONG_ATOM_BYTE 0xE3
+#define EMPTY_ATOM_BYTE 0xFF
#define OPAL_INVAL_PARAM 12
#define OPAL_MANUFACTURED_INACTIVE 0x08
diff --git a/block/sed-opal.c b/block/sed-opal.c
index 25fdbb7..14fe0fe 100644
--- a/block/sed-opal.c
+++ b/block/sed-opal.c
@@ -1056,16 +1056,20 @@ static int response_parse(const u8 *buf, size_t length,
token_length = response_parse_medium(iter, pos);
else if (pos[0] <= LONG_ATOM_BYTE) /* long atom */
token_length = response_parse_long(iter, pos);
+ else if (pos[0] == EMPTY_ATOM_BYTE) /* empty atom */
+ token_length = 1;
else /* TOKEN */
token_length = response_parse_token(iter, pos);
if (token_length < 0)
return token_length;
+ if (pos[0] != EMPTY_ATOM_BYTE)
+ num_entries++;
+
pos += token_length;
total -= token_length;
iter++;
- num_entries++;
}
resp->num = num_entries;
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 82c44d4..e24c829 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -91,13 +91,13 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
if (!(msg->msg_flags & MSG_MORE)) {
err = hash_alloc_result(sk, ctx);
if (err)
- goto unlock_free;
+ goto unlock_free_result;
ahash_request_set_crypt(&ctx->req, NULL,
ctx->result, 0);
err = crypto_wait_req(crypto_ahash_final(&ctx->req),
&ctx->wait);
if (err)
- goto unlock_free;
+ goto unlock_free_result;
}
goto done_more;
}
@@ -170,6 +170,7 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
unlock_free:
af_alg_free_sg(&ctx->sgl);
+unlock_free_result:
hash_free_result(sk, ctx);
ctx->more = false;
goto unlock;
diff --git a/crypto/cbc.c b/crypto/cbc.c
index eedddef..e81918c 100644
--- a/crypto/cbc.c
+++ b/crypto/cbc.c
@@ -148,6 +148,9 @@ static int crypto_cbc_create(struct crypto_template *tmpl, struct rtattr **tb)
if (!is_power_of_2(inst->alg.co.base.cra_blocksize))
goto out_free_inst;
+ if (inst->alg.co.statesize)
+ goto out_free_inst;
+
inst->alg.encrypt = crypto_cbc_encrypt;
inst->alg.decrypt = crypto_cbc_decrypt;
diff --git a/crypto/lskcipher.c b/crypto/lskcipher.c
index 0b6dd8aa..0f1bd7d 100644
--- a/crypto/lskcipher.c
+++ b/crypto/lskcipher.c
@@ -212,13 +212,12 @@ static int crypto_lskcipher_crypt_sg(struct skcipher_request *req,
ivsize = crypto_lskcipher_ivsize(tfm);
ivs = PTR_ALIGN(ivs, crypto_skcipher_alignmask(skcipher) + 1);
+ memcpy(ivs, req->iv, ivsize);
flags = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP;
if (req->base.flags & CRYPTO_SKCIPHER_REQ_CONT)
flags |= CRYPTO_LSKCIPHER_FLAG_CONT;
- else
- memcpy(ivs, req->iv, ivsize);
if (!(req->base.flags & CRYPTO_SKCIPHER_REQ_NOTFINAL))
flags |= CRYPTO_LSKCIPHER_FLAG_FINAL;
@@ -234,8 +233,7 @@ static int crypto_lskcipher_crypt_sg(struct skcipher_request *req,
flags |= CRYPTO_LSKCIPHER_FLAG_CONT;
}
- if (flags & CRYPTO_LSKCIPHER_FLAG_FINAL)
- memcpy(req->iv, ivs, ivsize);
+ memcpy(req->iv, ivs, ivsize);
return err;
}
diff --git a/drivers/accel/ivpu/ivpu_drv.c b/drivers/accel/ivpu/ivpu_drv.c
index 9418c73..4b06402 100644
--- a/drivers/accel/ivpu/ivpu_drv.c
+++ b/drivers/accel/ivpu/ivpu_drv.c
@@ -480,9 +480,8 @@ static int ivpu_pci_init(struct ivpu_device *vdev)
/* Clear any pending errors */
pcie_capability_clear_word(pdev, PCI_EXP_DEVSTA, 0x3f);
- /* VPU 37XX does not require 10m D3hot delay */
- if (ivpu_hw_gen(vdev) == IVPU_HW_37XX)
- pdev->d3hot_delay = 0;
+ /* NPU does not require 10m D3hot delay */
+ pdev->d3hot_delay = 0;
ret = pcim_enable_device(pdev);
if (ret) {
diff --git a/drivers/accel/ivpu/ivpu_fw.c b/drivers/accel/ivpu/ivpu_fw.c
index 6576232..5fa8bd4 100644
--- a/drivers/accel/ivpu/ivpu_fw.c
+++ b/drivers/accel/ivpu/ivpu_fw.c
@@ -222,7 +222,6 @@ ivpu_fw_init_wa(struct ivpu_device *vdev)
const struct vpu_firmware_header *fw_hdr = (const void *)vdev->fw->file->data;
if (IVPU_FW_CHECK_API_VER_LT(vdev, fw_hdr, BOOT, 3, 17) ||
- (ivpu_hw_gen(vdev) > IVPU_HW_37XX) ||
(ivpu_test_mode & IVPU_TEST_MODE_D0I3_MSG_DISABLE))
vdev->wa.disable_d0i3_msg = true;
diff --git a/drivers/accel/ivpu/ivpu_hw_37xx.c b/drivers/accel/ivpu/ivpu_hw_37xx.c
index f15a93d..89af100 100644
--- a/drivers/accel/ivpu/ivpu_hw_37xx.c
+++ b/drivers/accel/ivpu/ivpu_hw_37xx.c
@@ -510,22 +510,12 @@ static int ivpu_boot_pwr_domain_enable(struct ivpu_device *vdev)
return ret;
}
-static int ivpu_boot_pwr_domain_disable(struct ivpu_device *vdev)
-{
- ivpu_boot_dpu_active_drive(vdev, false);
- ivpu_boot_pwr_island_isolation_drive(vdev, true);
- ivpu_boot_pwr_island_trickle_drive(vdev, false);
- ivpu_boot_pwr_island_drive(vdev, false);
-
- return ivpu_boot_wait_for_pwr_island_status(vdev, 0x0);
-}
-
static void ivpu_boot_no_snoop_enable(struct ivpu_device *vdev)
{
u32 val = REGV_RD32(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES);
val = REG_SET_FLD(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, NOSNOOP_OVERRIDE_EN, val);
- val = REG_SET_FLD(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, AW_NOSNOOP_OVERRIDE, val);
+ val = REG_CLR_FLD(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, AW_NOSNOOP_OVERRIDE, val);
val = REG_SET_FLD(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, AR_NOSNOOP_OVERRIDE, val);
REGV_WR32(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, val);
@@ -616,12 +606,37 @@ static int ivpu_hw_37xx_info_init(struct ivpu_device *vdev)
return 0;
}
+static int ivpu_hw_37xx_ip_reset(struct ivpu_device *vdev)
+{
+ int ret;
+ u32 val;
+
+ if (IVPU_WA(punit_disabled))
+ return 0;
+
+ ret = REGB_POLL_FLD(VPU_37XX_BUTTRESS_VPU_IP_RESET, TRIGGER, 0, TIMEOUT_US);
+ if (ret) {
+ ivpu_err(vdev, "Timed out waiting for TRIGGER bit\n");
+ return ret;
+ }
+
+ val = REGB_RD32(VPU_37XX_BUTTRESS_VPU_IP_RESET);
+ val = REG_SET_FLD(VPU_37XX_BUTTRESS_VPU_IP_RESET, TRIGGER, val);
+ REGB_WR32(VPU_37XX_BUTTRESS_VPU_IP_RESET, val);
+
+ ret = REGB_POLL_FLD(VPU_37XX_BUTTRESS_VPU_IP_RESET, TRIGGER, 0, TIMEOUT_US);
+ if (ret)
+ ivpu_err(vdev, "Timed out waiting for RESET completion\n");
+
+ return ret;
+}
+
static int ivpu_hw_37xx_reset(struct ivpu_device *vdev)
{
int ret = 0;
- if (ivpu_boot_pwr_domain_disable(vdev)) {
- ivpu_err(vdev, "Failed to disable power domain\n");
+ if (ivpu_hw_37xx_ip_reset(vdev)) {
+ ivpu_err(vdev, "Failed to reset NPU\n");
ret = -EIO;
}
@@ -661,6 +676,11 @@ static int ivpu_hw_37xx_power_up(struct ivpu_device *vdev)
{
int ret;
+ /* PLL requests may fail when powering down, so issue WP 0 here */
+ ret = ivpu_pll_disable(vdev);
+ if (ret)
+ ivpu_warn(vdev, "Failed to disable PLL: %d\n", ret);
+
ret = ivpu_hw_37xx_d0i3_disable(vdev);
if (ret)
ivpu_warn(vdev, "Failed to disable D0I3: %d\n", ret);
diff --git a/drivers/accel/ivpu/ivpu_hw_40xx.c b/drivers/accel/ivpu/ivpu_hw_40xx.c
index 7042880..a1523d0 100644
--- a/drivers/accel/ivpu/ivpu_hw_40xx.c
+++ b/drivers/accel/ivpu/ivpu_hw_40xx.c
@@ -24,7 +24,7 @@
#define SKU_HW_ID_SHIFT 16u
#define SKU_HW_ID_MASK 0xffff0000u
-#define PLL_CONFIG_DEFAULT 0x1
+#define PLL_CONFIG_DEFAULT 0x0
#define PLL_CDYN_DEFAULT 0x80
#define PLL_EPP_DEFAULT 0x80
#define PLL_REF_CLK_FREQ (50 * 1000000)
@@ -530,7 +530,7 @@ static void ivpu_boot_no_snoop_enable(struct ivpu_device *vdev)
u32 val = REGV_RD32(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES);
val = REG_SET_FLD(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, SNOOP_OVERRIDE_EN, val);
- val = REG_CLR_FLD(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, AW_SNOOP_OVERRIDE, val);
+ val = REG_SET_FLD(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, AW_SNOOP_OVERRIDE, val);
val = REG_CLR_FLD(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, AR_SNOOP_OVERRIDE, val);
REGV_WR32(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, val);
@@ -704,7 +704,6 @@ static int ivpu_hw_40xx_info_init(struct ivpu_device *vdev)
{
struct ivpu_hw_info *hw = vdev->hw;
u32 tile_disable;
- u32 tile_enable;
u32 fuse;
fuse = REGB_RD32(VPU_40XX_BUTTRESS_TILE_FUSE);
@@ -725,10 +724,6 @@ static int ivpu_hw_40xx_info_init(struct ivpu_device *vdev)
else
ivpu_dbg(vdev, MISC, "Fuse: All %d tiles enabled\n", TILE_MAX_NUM);
- tile_enable = (~tile_disable) & TILE_MAX_MASK;
-
- hw->sku = REG_SET_FLD_NUM(SKU, HW_ID, LNL_HW_ID, hw->sku);
- hw->sku = REG_SET_FLD_NUM(SKU, TILE, tile_enable, hw->sku);
hw->tile_fuse = tile_disable;
hw->pll.profiling_freq = PLL_PROFILING_FREQ_DEFAULT;
diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
index 0440bee..e70cfb8 100644
--- a/drivers/accel/ivpu/ivpu_job.c
+++ b/drivers/accel/ivpu/ivpu_job.c
@@ -294,7 +294,7 @@ static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32
return -ENOENT;
if (job->file_priv->has_mmu_faults)
- job_status = VPU_JSM_STATUS_ABORTED;
+ job_status = DRM_IVPU_JOB_STATUS_ABORTED;
job->bos[CMD_BUF_IDX]->job_status = job_status;
dma_fence_signal(job->done_fence);
@@ -315,7 +315,7 @@ void ivpu_jobs_abort_all(struct ivpu_device *vdev)
unsigned long id;
xa_for_each(&vdev->submitted_jobs_xa, id, job)
- ivpu_job_signal_and_destroy(vdev, id, VPU_JSM_STATUS_ABORTED);
+ ivpu_job_signal_and_destroy(vdev, id, DRM_IVPU_JOB_STATUS_ABORTED);
}
static int ivpu_job_submit(struct ivpu_job *job)
diff --git a/drivers/accel/ivpu/ivpu_mmu.c b/drivers/accel/ivpu/ivpu_mmu.c
index 9a3122ff..91bd640 100644
--- a/drivers/accel/ivpu/ivpu_mmu.c
+++ b/drivers/accel/ivpu/ivpu_mmu.c
@@ -72,10 +72,10 @@
#define IVPU_MMU_Q_COUNT_LOG2 4 /* 16 entries */
#define IVPU_MMU_Q_COUNT ((u32)1 << IVPU_MMU_Q_COUNT_LOG2)
-#define IVPU_MMU_Q_WRAP_BIT (IVPU_MMU_Q_COUNT << 1)
-#define IVPU_MMU_Q_WRAP_MASK (IVPU_MMU_Q_WRAP_BIT - 1)
-#define IVPU_MMU_Q_IDX_MASK (IVPU_MMU_Q_COUNT - 1)
+#define IVPU_MMU_Q_WRAP_MASK GENMASK(IVPU_MMU_Q_COUNT_LOG2, 0)
+#define IVPU_MMU_Q_IDX_MASK (IVPU_MMU_Q_COUNT - 1)
#define IVPU_MMU_Q_IDX(val) ((val) & IVPU_MMU_Q_IDX_MASK)
+#define IVPU_MMU_Q_WRP(val) ((val) & IVPU_MMU_Q_COUNT)
#define IVPU_MMU_CMDQ_CMD_SIZE 16
#define IVPU_MMU_CMDQ_SIZE (IVPU_MMU_Q_COUNT * IVPU_MMU_CMDQ_CMD_SIZE)
@@ -475,20 +475,32 @@ static int ivpu_mmu_cmdq_wait_for_cons(struct ivpu_device *vdev)
return 0;
}
+static bool ivpu_mmu_queue_is_full(struct ivpu_mmu_queue *q)
+{
+ return ((IVPU_MMU_Q_IDX(q->prod) == IVPU_MMU_Q_IDX(q->cons)) &&
+ (IVPU_MMU_Q_WRP(q->prod) != IVPU_MMU_Q_WRP(q->cons)));
+}
+
+static bool ivpu_mmu_queue_is_empty(struct ivpu_mmu_queue *q)
+{
+ return ((IVPU_MMU_Q_IDX(q->prod) == IVPU_MMU_Q_IDX(q->cons)) &&
+ (IVPU_MMU_Q_WRP(q->prod) == IVPU_MMU_Q_WRP(q->cons)));
+}
+
static int ivpu_mmu_cmdq_cmd_write(struct ivpu_device *vdev, const char *name, u64 data0, u64 data1)
{
- struct ivpu_mmu_queue *q = &vdev->mmu->cmdq;
- u64 *queue_buffer = q->base;
- int idx = IVPU_MMU_Q_IDX(q->prod) * (IVPU_MMU_CMDQ_CMD_SIZE / sizeof(*queue_buffer));
+ struct ivpu_mmu_queue *cmdq = &vdev->mmu->cmdq;
+ u64 *queue_buffer = cmdq->base;
+ int idx = IVPU_MMU_Q_IDX(cmdq->prod) * (IVPU_MMU_CMDQ_CMD_SIZE / sizeof(*queue_buffer));
- if (!CIRC_SPACE(IVPU_MMU_Q_IDX(q->prod), IVPU_MMU_Q_IDX(q->cons), IVPU_MMU_Q_COUNT)) {
+ if (ivpu_mmu_queue_is_full(cmdq)) {
ivpu_err(vdev, "Failed to write MMU CMD %s\n", name);
return -EBUSY;
}
queue_buffer[idx] = data0;
queue_buffer[idx + 1] = data1;
- q->prod = (q->prod + 1) & IVPU_MMU_Q_WRAP_MASK;
+ cmdq->prod = (cmdq->prod + 1) & IVPU_MMU_Q_WRAP_MASK;
ivpu_dbg(vdev, MMU, "CMD write: %s data: 0x%llx 0x%llx\n", name, data0, data1);
@@ -560,7 +572,6 @@ static int ivpu_mmu_reset(struct ivpu_device *vdev)
mmu->cmdq.cons = 0;
memset(mmu->evtq.base, 0, IVPU_MMU_EVTQ_SIZE);
- clflush_cache_range(mmu->evtq.base, IVPU_MMU_EVTQ_SIZE);
mmu->evtq.prod = 0;
mmu->evtq.cons = 0;
@@ -874,14 +885,10 @@ static u32 *ivpu_mmu_get_event(struct ivpu_device *vdev)
u32 *evt = evtq->base + (idx * IVPU_MMU_EVTQ_CMD_SIZE);
evtq->prod = REGV_RD32(IVPU_MMU_REG_EVTQ_PROD_SEC);
- if (!CIRC_CNT(IVPU_MMU_Q_IDX(evtq->prod), IVPU_MMU_Q_IDX(evtq->cons), IVPU_MMU_Q_COUNT))
+ if (ivpu_mmu_queue_is_empty(evtq))
return NULL;
- clflush_cache_range(evt, IVPU_MMU_EVTQ_CMD_SIZE);
-
evtq->cons = (evtq->cons + 1) & IVPU_MMU_Q_WRAP_MASK;
- REGV_WR32(IVPU_MMU_REG_EVTQ_CONS_SEC, evtq->cons);
-
return evt;
}
@@ -902,6 +909,7 @@ void ivpu_mmu_irq_evtq_handler(struct ivpu_device *vdev)
}
ivpu_mmu_user_context_mark_invalid(vdev, ssid);
+ REGV_WR32(IVPU_MMU_REG_EVTQ_CONS_SEC, vdev->mmu->evtq.cons);
}
}
diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c
index f501f27..5f73854 100644
--- a/drivers/accel/ivpu/ivpu_pm.c
+++ b/drivers/accel/ivpu/ivpu_pm.c
@@ -58,11 +58,14 @@ static int ivpu_suspend(struct ivpu_device *vdev)
{
int ret;
+ /* Save PCI state before powering down as it sometimes gets corrupted if NPU hangs */
+ pci_save_state(to_pci_dev(vdev->drm.dev));
+
ret = ivpu_shutdown(vdev);
- if (ret) {
+ if (ret)
ivpu_err(vdev, "Failed to shutdown VPU: %d\n", ret);
- return ret;
- }
+
+ pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D3hot);
return ret;
}
@@ -71,6 +74,9 @@ static int ivpu_resume(struct ivpu_device *vdev)
{
int ret;
+ pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D0);
+ pci_restore_state(to_pci_dev(vdev->drm.dev));
+
retry:
ret = ivpu_hw_power_up(vdev);
if (ret) {
@@ -120,15 +126,20 @@ static void ivpu_pm_recovery_work(struct work_struct *work)
ivpu_fw_log_dump(vdev);
-retry:
- ret = pci_try_reset_function(to_pci_dev(vdev->drm.dev));
- if (ret == -EAGAIN && !drm_dev_is_unplugged(&vdev->drm)) {
- cond_resched();
- goto retry;
- }
+ atomic_inc(&vdev->pm->reset_counter);
+ atomic_set(&vdev->pm->reset_pending, 1);
+ down_write(&vdev->pm->reset_lock);
- if (ret && ret != -EAGAIN)
- ivpu_err(vdev, "Failed to reset VPU: %d\n", ret);
+ ivpu_suspend(vdev);
+ ivpu_pm_prepare_cold_boot(vdev);
+ ivpu_jobs_abort_all(vdev);
+
+ ret = ivpu_resume(vdev);
+ if (ret)
+ ivpu_err(vdev, "Failed to resume NPU: %d\n", ret);
+
+ up_write(&vdev->pm->reset_lock);
+ atomic_set(&vdev->pm->reset_pending, 0);
kobject_uevent_env(&vdev->drm.dev->kobj, KOBJ_CHANGE, evt);
pm_runtime_mark_last_busy(vdev->drm.dev);
@@ -200,9 +211,6 @@ int ivpu_pm_suspend_cb(struct device *dev)
ivpu_suspend(vdev);
ivpu_pm_prepare_warm_boot(vdev);
- pci_save_state(to_pci_dev(dev));
- pci_set_power_state(to_pci_dev(dev), PCI_D3hot);
-
ivpu_dbg(vdev, PM, "Suspend done.\n");
return 0;
@@ -216,9 +224,6 @@ int ivpu_pm_resume_cb(struct device *dev)
ivpu_dbg(vdev, PM, "Resume..\n");
- pci_set_power_state(to_pci_dev(dev), PCI_D0);
- pci_restore_state(to_pci_dev(dev));
-
ret = ivpu_resume(vdev);
if (ret)
ivpu_err(vdev, "Failed to resume: %d\n", ret);
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 7b7c605..ab2a82c 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -26,7 +26,6 @@
#include <linux/interrupt.h>
#include <linux/timer.h>
#include <linux/cper.h>
-#include <linux/cxl-event.h>
#include <linux/platform_device.h>
#include <linux/mutex.h>
#include <linux/ratelimit.h>
@@ -674,78 +673,6 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
schedule_work(&entry->work);
}
-/*
- * Only a single callback can be registered for CXL CPER events.
- */
-static DECLARE_RWSEM(cxl_cper_rw_sem);
-static cxl_cper_callback cper_callback;
-
-/* CXL Event record UUIDs are formatted as GUIDs and reported in section type */
-
-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CPER_SEC_CXL_GEN_MEDIA_GUID \
- GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
- 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
-
-/*
- * DRAM Event Record
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
- */
-#define CPER_SEC_CXL_DRAM_GUID \
- GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
- 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-#define CPER_SEC_CXL_MEM_MODULE_GUID \
- GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
- 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
-
-static void cxl_cper_post_event(enum cxl_event_type event_type,
- struct cxl_cper_event_rec *rec)
-{
- if (rec->hdr.length <= sizeof(rec->hdr) ||
- rec->hdr.length > sizeof(*rec)) {
- pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
- rec->hdr.length);
- return;
- }
-
- if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
- pr_err(FW_WARN "CXL CPER invalid event\n");
- return;
- }
-
- guard(rwsem_read)(&cxl_cper_rw_sem);
- if (cper_callback)
- cper_callback(event_type, rec);
-}
-
-int cxl_cper_register_callback(cxl_cper_callback callback)
-{
- guard(rwsem_write)(&cxl_cper_rw_sem);
- if (cper_callback)
- return -EINVAL;
- cper_callback = callback;
- return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_register_callback, CXL);
-
-int cxl_cper_unregister_callback(cxl_cper_callback callback)
-{
- guard(rwsem_write)(&cxl_cper_rw_sem);
- if (callback != cper_callback)
- return -EINVAL;
- cper_callback = NULL;
- return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_callback, CXL);
-
static bool ghes_do_proc(struct ghes *ghes,
const struct acpi_hest_generic_status *estatus)
{
@@ -780,22 +707,6 @@ static bool ghes_do_proc(struct ghes *ghes,
}
else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
queued = ghes_handle_arm_hw_error(gdata, sev, sync);
- } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
- struct cxl_cper_event_rec *rec =
- acpi_hest_get_payload(gdata);
-
- cxl_cper_post_event(CXL_CPER_EVENT_GEN_MEDIA, rec);
- } else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID)) {
- struct cxl_cper_event_rec *rec =
- acpi_hest_get_payload(gdata);
-
- cxl_cper_post_event(CXL_CPER_EVENT_DRAM, rec);
- } else if (guid_equal(sec_type,
- &CPER_SEC_CXL_MEM_MODULE_GUID)) {
- struct cxl_cper_event_rec *rec =
- acpi_hest_get_payload(gdata);
-
- cxl_cper_post_event(CXL_CPER_EVENT_MEM_MODULE, rec);
} else {
void *err = acpi_hest_get_payload(gdata);
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index dbdee29..0225579 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -525,10 +525,12 @@ static void acpi_ec_clear(struct acpi_ec *ec)
static void acpi_ec_enable_event(struct acpi_ec *ec)
{
- spin_lock(&ec->lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&ec->lock, flags);
if (acpi_ec_started(ec))
__acpi_ec_enable_event(ec);
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
/* Drain additional events if hardware requires that */
if (EC_FLAGS_CLEAR_ON_RESUME)
@@ -544,9 +546,11 @@ static void __acpi_ec_flush_work(void)
static void acpi_ec_disable_event(struct acpi_ec *ec)
{
- spin_lock(&ec->lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&ec->lock, flags);
__acpi_ec_disable_event(ec);
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
/*
* When ec_freeze_events is true, we need to flush events in
@@ -567,9 +571,10 @@ void acpi_ec_flush_work(void)
static bool acpi_ec_guard_event(struct acpi_ec *ec)
{
+ unsigned long flags;
bool guarded;
- spin_lock(&ec->lock);
+ spin_lock_irqsave(&ec->lock, flags);
/*
* If firmware SCI_EVT clearing timing is "event", we actually
* don't know when the SCI_EVT will be cleared by firmware after
@@ -585,29 +590,31 @@ static bool acpi_ec_guard_event(struct acpi_ec *ec)
guarded = ec_event_clearing == ACPI_EC_EVT_TIMING_EVENT &&
ec->event_state != EC_EVENT_READY &&
(!ec->curr || ec->curr->command != ACPI_EC_COMMAND_QUERY);
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
return guarded;
}
static int ec_transaction_polled(struct acpi_ec *ec)
{
+ unsigned long flags;
int ret = 0;
- spin_lock(&ec->lock);
+ spin_lock_irqsave(&ec->lock, flags);
if (ec->curr && (ec->curr->flags & ACPI_EC_COMMAND_POLL))
ret = 1;
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
return ret;
}
static int ec_transaction_completed(struct acpi_ec *ec)
{
+ unsigned long flags;
int ret = 0;
- spin_lock(&ec->lock);
+ spin_lock_irqsave(&ec->lock, flags);
if (ec->curr && (ec->curr->flags & ACPI_EC_COMMAND_COMPLETE))
ret = 1;
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
return ret;
}
@@ -749,6 +756,7 @@ static int ec_guard(struct acpi_ec *ec)
static int ec_poll(struct acpi_ec *ec)
{
+ unsigned long flags;
int repeat = 5; /* number of command restarts */
while (repeat--) {
@@ -757,14 +765,14 @@ static int ec_poll(struct acpi_ec *ec)
do {
if (!ec_guard(ec))
return 0;
- spin_lock(&ec->lock);
+ spin_lock_irqsave(&ec->lock, flags);
advance_transaction(ec, false);
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
} while (time_before(jiffies, delay));
pr_debug("controller reset, restart transaction\n");
- spin_lock(&ec->lock);
+ spin_lock_irqsave(&ec->lock, flags);
start_transaction(ec);
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
}
return -ETIME;
}
@@ -772,10 +780,11 @@ static int ec_poll(struct acpi_ec *ec)
static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
struct transaction *t)
{
+ unsigned long tmp;
int ret = 0;
/* start transaction */
- spin_lock(&ec->lock);
+ spin_lock_irqsave(&ec->lock, tmp);
/* Enable GPE for command processing (IBF=0/OBF=1) */
if (!acpi_ec_submit_flushable_request(ec)) {
ret = -EINVAL;
@@ -786,11 +795,11 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
ec->curr = t;
ec_dbg_req("Command(%s) started", acpi_ec_cmd_string(t->command));
start_transaction(ec);
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, tmp);
ret = ec_poll(ec);
- spin_lock(&ec->lock);
+ spin_lock_irqsave(&ec->lock, tmp);
if (t->irq_count == ec_storm_threshold)
acpi_ec_unmask_events(ec);
ec_dbg_req("Command(%s) stopped", acpi_ec_cmd_string(t->command));
@@ -799,7 +808,7 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
acpi_ec_complete_request(ec);
ec_dbg_ref(ec, "Decrease command");
unlock:
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, tmp);
return ret;
}
@@ -927,7 +936,9 @@ EXPORT_SYMBOL(ec_get_handle);
static void acpi_ec_start(struct acpi_ec *ec, bool resuming)
{
- spin_lock(&ec->lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&ec->lock, flags);
if (!test_and_set_bit(EC_FLAGS_STARTED, &ec->flags)) {
ec_dbg_drv("Starting EC");
/* Enable GPE for event processing (SCI_EVT=1) */
@@ -937,28 +948,31 @@ static void acpi_ec_start(struct acpi_ec *ec, bool resuming)
}
ec_log_drv("EC started");
}
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
}
static bool acpi_ec_stopped(struct acpi_ec *ec)
{
+ unsigned long flags;
bool flushed;
- spin_lock(&ec->lock);
+ spin_lock_irqsave(&ec->lock, flags);
flushed = acpi_ec_flushed(ec);
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
return flushed;
}
static void acpi_ec_stop(struct acpi_ec *ec, bool suspending)
{
- spin_lock(&ec->lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&ec->lock, flags);
if (acpi_ec_started(ec)) {
ec_dbg_drv("Stopping EC");
set_bit(EC_FLAGS_STOPPED, &ec->flags);
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
wait_event(ec->wait, acpi_ec_stopped(ec));
- spin_lock(&ec->lock);
+ spin_lock_irqsave(&ec->lock, flags);
/* Disable GPE for event processing (SCI_EVT=1) */
if (!suspending) {
acpi_ec_complete_request(ec);
@@ -969,25 +983,29 @@ static void acpi_ec_stop(struct acpi_ec *ec, bool suspending)
clear_bit(EC_FLAGS_STOPPED, &ec->flags);
ec_log_drv("EC stopped");
}
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
}
static void acpi_ec_enter_noirq(struct acpi_ec *ec)
{
- spin_lock(&ec->lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&ec->lock, flags);
ec->busy_polling = true;
ec->polling_guard = 0;
ec_log_drv("interrupt blocked");
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
}
static void acpi_ec_leave_noirq(struct acpi_ec *ec)
{
- spin_lock(&ec->lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&ec->lock, flags);
ec->busy_polling = ec_busy_polling;
ec->polling_guard = ec_polling_guard;
ec_log_drv("interrupt unblocked");
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
}
void acpi_ec_block_transactions(void)
@@ -1119,9 +1137,9 @@ static void acpi_ec_event_processor(struct work_struct *work)
ec_dbg_evt("Query(0x%02x) stopped", handler->query_bit);
- spin_lock(&ec->lock);
+ spin_lock_irq(&ec->lock);
ec->queries_in_progress--;
- spin_unlock(&ec->lock);
+ spin_unlock_irq(&ec->lock);
acpi_ec_put_query_handler(handler);
kfree(q);
@@ -1184,12 +1202,12 @@ static int acpi_ec_submit_query(struct acpi_ec *ec)
*/
ec_dbg_evt("Query(0x%02x) scheduled", value);
- spin_lock(&ec->lock);
+ spin_lock_irq(&ec->lock);
ec->queries_in_progress++;
queue_work(ec_query_wq, &q->work);
- spin_unlock(&ec->lock);
+ spin_unlock_irq(&ec->lock);
return 0;
@@ -1205,14 +1223,14 @@ static void acpi_ec_event_handler(struct work_struct *work)
ec_dbg_evt("Event started");
- spin_lock(&ec->lock);
+ spin_lock_irq(&ec->lock);
while (ec->events_to_process) {
- spin_unlock(&ec->lock);
+ spin_unlock_irq(&ec->lock);
acpi_ec_submit_query(ec);
- spin_lock(&ec->lock);
+ spin_lock_irq(&ec->lock);
ec->events_to_process--;
}
@@ -1229,11 +1247,11 @@ static void acpi_ec_event_handler(struct work_struct *work)
ec_dbg_evt("Event stopped");
- spin_unlock(&ec->lock);
+ spin_unlock_irq(&ec->lock);
guard_timeout = !!ec_guard(ec);
- spin_lock(&ec->lock);
+ spin_lock_irq(&ec->lock);
/* Take care of SCI_EVT unless someone else is doing that. */
if (guard_timeout && !ec->curr)
@@ -1246,7 +1264,7 @@ static void acpi_ec_event_handler(struct work_struct *work)
ec->events_in_progress--;
- spin_unlock(&ec->lock);
+ spin_unlock_irq(&ec->lock);
}
static void clear_gpe_and_advance_transaction(struct acpi_ec *ec, bool interrupt)
@@ -1271,11 +1289,13 @@ static void clear_gpe_and_advance_transaction(struct acpi_ec *ec, bool interrupt
static void acpi_ec_handle_interrupt(struct acpi_ec *ec)
{
- spin_lock(&ec->lock);
+ unsigned long flags;
+
+ spin_lock_irqsave(&ec->lock, flags);
clear_gpe_and_advance_transaction(ec, true);
- spin_unlock(&ec->lock);
+ spin_unlock_irqrestore(&ec->lock, flags);
}
static u32 acpi_ec_gpe_handler(acpi_handle gpe_device,
@@ -2085,7 +2105,7 @@ bool acpi_ec_dispatch_gpe(void)
* Dispatch the EC GPE in-band, but do not report wakeup in any case
* to allow the caller to process events properly after that.
*/
- spin_lock(&first_ec->lock);
+ spin_lock_irq(&first_ec->lock);
if (acpi_ec_gpe_status_set(first_ec)) {
pm_pr_dbg("ACPI EC GPE status set\n");
@@ -2094,7 +2114,7 @@ bool acpi_ec_dispatch_gpe(void)
work_in_progress = acpi_ec_work_in_progress(first_ec);
}
- spin_unlock(&first_ec->lock);
+ spin_unlock_irq(&first_ec->lock);
if (!work_in_progress)
return false;
@@ -2107,11 +2127,11 @@ bool acpi_ec_dispatch_gpe(void)
pm_pr_dbg("ACPI EC work flushed\n");
- spin_lock(&first_ec->lock);
+ spin_lock_irq(&first_ec->lock);
work_in_progress = acpi_ec_work_in_progress(first_ec);
- spin_unlock(&first_ec->lock);
+ spin_unlock_irq(&first_ec->lock);
} while (work_in_progress && !pm_wakeup_pending());
return false;
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index da2e74f..682ff55 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -671,9 +671,17 @@ MODULE_PARM_DESC(mobile_lpm_policy, "Default LPM policy for mobile chipsets");
static void ahci_pci_save_initial_config(struct pci_dev *pdev,
struct ahci_host_priv *hpriv)
{
- if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA && pdev->device == 0x1166) {
- dev_info(&pdev->dev, "ASM1166 has only six ports\n");
- hpriv->saved_port_map = 0x3f;
+ if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA) {
+ switch (pdev->device) {
+ case 0x1166:
+ dev_info(&pdev->dev, "ASM1166 has only six ports\n");
+ hpriv->saved_port_map = 0x3f;
+ break;
+ case 0x1064:
+ dev_info(&pdev->dev, "ASM1064 has only four ports\n");
+ hpriv->saved_port_map = 0xf;
+ break;
+ }
}
if (pdev->vendor == PCI_VENDOR_ID_JMICRON && pdev->device == 0x2361) {
diff --git a/drivers/ata/ahci_ceva.c b/drivers/ata/ahci_ceva.c
index 64f7f7d..11a2c19 100644
--- a/drivers/ata/ahci_ceva.c
+++ b/drivers/ata/ahci_ceva.c
@@ -88,7 +88,6 @@ struct ceva_ahci_priv {
u32 axicc;
bool is_cci_enabled;
int flags;
- struct reset_control *rst;
};
static unsigned int ceva_ahci_read_id(struct ata_device *dev,
@@ -189,6 +188,60 @@ static const struct scsi_host_template ahci_platform_sht = {
AHCI_SHT(DRV_NAME),
};
+static int ceva_ahci_platform_enable_resources(struct ahci_host_priv *hpriv)
+{
+ int rc, i;
+
+ rc = ahci_platform_enable_regulators(hpriv);
+ if (rc)
+ return rc;
+
+ rc = ahci_platform_enable_clks(hpriv);
+ if (rc)
+ goto disable_regulator;
+
+ /* Assert the controller reset */
+ rc = ahci_platform_assert_rsts(hpriv);
+ if (rc)
+ goto disable_clks;
+
+ for (i = 0; i < hpriv->nports; i++) {
+ rc = phy_init(hpriv->phys[i]);
+ if (rc)
+ goto disable_rsts;
+ }
+
+ /* De-assert the controller reset */
+ ahci_platform_deassert_rsts(hpriv);
+
+ for (i = 0; i < hpriv->nports; i++) {
+ rc = phy_power_on(hpriv->phys[i]);
+ if (rc) {
+ phy_exit(hpriv->phys[i]);
+ goto disable_phys;
+ }
+ }
+
+ return 0;
+
+disable_rsts:
+ ahci_platform_deassert_rsts(hpriv);
+
+disable_phys:
+ while (--i >= 0) {
+ phy_power_off(hpriv->phys[i]);
+ phy_exit(hpriv->phys[i]);
+ }
+
+disable_clks:
+ ahci_platform_disable_clks(hpriv);
+
+disable_regulator:
+ ahci_platform_disable_regulators(hpriv);
+
+ return rc;
+}
+
static int ceva_ahci_probe(struct platform_device *pdev)
{
struct device_node *np = pdev->dev.of_node;
@@ -203,47 +256,19 @@ static int ceva_ahci_probe(struct platform_device *pdev)
return -ENOMEM;
cevapriv->ahci_pdev = pdev;
-
- cevapriv->rst = devm_reset_control_get_optional_exclusive(&pdev->dev,
- NULL);
- if (IS_ERR(cevapriv->rst))
- dev_err_probe(&pdev->dev, PTR_ERR(cevapriv->rst),
- "failed to get reset\n");
-
hpriv = ahci_platform_get_resources(pdev, 0);
if (IS_ERR(hpriv))
return PTR_ERR(hpriv);
- if (!cevapriv->rst) {
- rc = ahci_platform_enable_resources(hpriv);
- if (rc)
- return rc;
- } else {
- int i;
+ hpriv->rsts = devm_reset_control_get_optional_exclusive(&pdev->dev,
+ NULL);
+ if (IS_ERR(hpriv->rsts))
+ return dev_err_probe(&pdev->dev, PTR_ERR(hpriv->rsts),
+ "failed to get reset\n");
- rc = ahci_platform_enable_clks(hpriv);
- if (rc)
- return rc;
- /* Assert the controller reset */
- reset_control_assert(cevapriv->rst);
-
- for (i = 0; i < hpriv->nports; i++) {
- rc = phy_init(hpriv->phys[i]);
- if (rc)
- return rc;
- }
-
- /* De-assert the controller reset */
- reset_control_deassert(cevapriv->rst);
-
- for (i = 0; i < hpriv->nports; i++) {
- rc = phy_power_on(hpriv->phys[i]);
- if (rc) {
- phy_exit(hpriv->phys[i]);
- return rc;
- }
- }
- }
+ rc = ceva_ahci_platform_enable_resources(hpriv);
+ if (rc)
+ return rc;
if (of_property_read_bool(np, "ceva,broken-gen2"))
cevapriv->flags = CEVA_FLAG_BROKEN_GEN2;
@@ -252,52 +277,60 @@ static int ceva_ahci_probe(struct platform_device *pdev)
if (of_property_read_u8_array(np, "ceva,p0-cominit-params",
(u8 *)&cevapriv->pp2c[0], 4) < 0) {
dev_warn(dev, "ceva,p0-cominit-params property not defined\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto disable_resources;
}
if (of_property_read_u8_array(np, "ceva,p1-cominit-params",
(u8 *)&cevapriv->pp2c[1], 4) < 0) {
dev_warn(dev, "ceva,p1-cominit-params property not defined\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto disable_resources;
}
/* Read OOB timing value for COMWAKE from device-tree*/
if (of_property_read_u8_array(np, "ceva,p0-comwake-params",
(u8 *)&cevapriv->pp3c[0], 4) < 0) {
dev_warn(dev, "ceva,p0-comwake-params property not defined\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto disable_resources;
}
if (of_property_read_u8_array(np, "ceva,p1-comwake-params",
(u8 *)&cevapriv->pp3c[1], 4) < 0) {
dev_warn(dev, "ceva,p1-comwake-params property not defined\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto disable_resources;
}
/* Read phy BURST timing value from device-tree */
if (of_property_read_u8_array(np, "ceva,p0-burst-params",
(u8 *)&cevapriv->pp4c[0], 4) < 0) {
dev_warn(dev, "ceva,p0-burst-params property not defined\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto disable_resources;
}
if (of_property_read_u8_array(np, "ceva,p1-burst-params",
(u8 *)&cevapriv->pp4c[1], 4) < 0) {
dev_warn(dev, "ceva,p1-burst-params property not defined\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto disable_resources;
}
/* Read phy RETRY interval timing value from device-tree */
if (of_property_read_u16_array(np, "ceva,p0-retry-params",
(u16 *)&cevapriv->pp5c[0], 2) < 0) {
dev_warn(dev, "ceva,p0-retry-params property not defined\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto disable_resources;
}
if (of_property_read_u16_array(np, "ceva,p1-retry-params",
(u16 *)&cevapriv->pp5c[1], 2) < 0) {
dev_warn(dev, "ceva,p1-retry-params property not defined\n");
- return -EINVAL;
+ rc = -EINVAL;
+ goto disable_resources;
}
/*
@@ -335,7 +368,7 @@ static int __maybe_unused ceva_ahci_resume(struct device *dev)
struct ahci_host_priv *hpriv = host->private_data;
int rc;
- rc = ahci_platform_enable_resources(hpriv);
+ rc = ceva_ahci_platform_enable_resources(hpriv);
if (rc)
return rc;
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 09ed6777..be3412c 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2001,47 +2001,6 @@ bool ata_dev_power_init_tf(struct ata_device *dev, struct ata_taskfile *tf,
return true;
}
-/**
- * ata_dev_power_set_standby - Set a device power mode to standby
- * @dev: target device
- *
- * Issue a STANDBY IMMEDIATE command to set a device power mode to standby.
- * For an HDD device, this spins down the disks.
- *
- * LOCKING:
- * Kernel thread context (may sleep).
- */
-void ata_dev_power_set_standby(struct ata_device *dev)
-{
- unsigned long ap_flags = dev->link->ap->flags;
- struct ata_taskfile tf;
- unsigned int err_mask;
-
- /*
- * Some odd clown BIOSes issue spindown on power off (ACPI S4 or S5)
- * causing some drives to spin up and down again. For these, do nothing
- * if we are being called on shutdown.
- */
- if ((ap_flags & ATA_FLAG_NO_POWEROFF_SPINDOWN) &&
- system_state == SYSTEM_POWER_OFF)
- return;
-
- if ((ap_flags & ATA_FLAG_NO_HIBERNATE_SPINDOWN) &&
- system_entering_hibernation())
- return;
-
- /* Issue STANDBY IMMEDIATE command only if supported by the device */
- if (!ata_dev_power_init_tf(dev, &tf, false))
- return;
-
- ata_dev_notice(dev, "Entering standby power mode\n");
-
- err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
- if (err_mask)
- ata_dev_err(dev, "STANDBY IMMEDIATE failed (err_mask=0x%x)\n",
- err_mask);
-}
-
static bool ata_dev_power_is_active(struct ata_device *dev)
{
struct ata_taskfile tf;
@@ -2070,6 +2029,52 @@ static bool ata_dev_power_is_active(struct ata_device *dev)
}
/**
+ * ata_dev_power_set_standby - Set a device power mode to standby
+ * @dev: target device
+ *
+ * Issue a STANDBY IMMEDIATE command to set a device power mode to standby.
+ * For an HDD device, this spins down the disks.
+ *
+ * LOCKING:
+ * Kernel thread context (may sleep).
+ */
+void ata_dev_power_set_standby(struct ata_device *dev)
+{
+ unsigned long ap_flags = dev->link->ap->flags;
+ struct ata_taskfile tf;
+ unsigned int err_mask;
+
+ /* If the device is already sleeping or in standby, do nothing. */
+ if ((dev->flags & ATA_DFLAG_SLEEPING) ||
+ !ata_dev_power_is_active(dev))
+ return;
+
+ /*
+ * Some odd clown BIOSes issue spindown on power off (ACPI S4 or S5)
+ * causing some drives to spin up and down again. For these, do nothing
+ * if we are being called on shutdown.
+ */
+ if ((ap_flags & ATA_FLAG_NO_POWEROFF_SPINDOWN) &&
+ system_state == SYSTEM_POWER_OFF)
+ return;
+
+ if ((ap_flags & ATA_FLAG_NO_HIBERNATE_SPINDOWN) &&
+ system_entering_hibernation())
+ return;
+
+ /* Issue STANDBY IMMEDIATE command only if supported by the device */
+ if (!ata_dev_power_init_tf(dev, &tf, false))
+ return;
+
+ ata_dev_notice(dev, "Entering standby power mode\n");
+
+ err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
+ if (err_mask)
+ ata_dev_err(dev, "STANDBY IMMEDIATE failed (err_mask=0x%x)\n",
+ err_mask);
+}
+
+/**
* ata_dev_power_set_active - Set a device power mode to active
* @dev: target device
*
diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c
index e327a022..e7f713c 100644
--- a/drivers/atm/idt77252.c
+++ b/drivers/atm/idt77252.c
@@ -2930,6 +2930,8 @@ open_card_ubr0(struct idt77252_dev *card)
vc->scq = alloc_scq(card, vc->class);
if (!vc->scq) {
printk("%s: can't get SCQ.\n", card->name);
+ kfree(card->vcs[0]);
+ card->vcs[0] = NULL;
return -ENOMEM;
}
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 018ac20..024b78a 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -431,9 +431,6 @@ init_cpu_capacity_callback(struct notifier_block *nb,
struct cpufreq_policy *policy = data;
int cpu;
- if (!raw_capacity)
- return 0;
-
if (val != CPUFREQ_CREATE_POLICY)
return 0;
@@ -450,9 +447,11 @@ init_cpu_capacity_callback(struct notifier_block *nb,
}
if (cpumask_empty(cpus_to_visit)) {
- topology_normalize_cpu_scale();
- schedule_work(&update_topology_flags_work);
- free_raw_capacity();
+ if (raw_capacity) {
+ topology_normalize_cpu_scale();
+ schedule_work(&update_topology_flags_work);
+ free_raw_capacity();
+ }
pr_debug("cpu_capacity: parsing done\n");
schedule_work(&parsing_done_work);
}
@@ -472,7 +471,7 @@ static int __init register_cpufreq_notifier(void)
* On ACPI-based systems skip registering cpufreq notifier as cpufreq
* information is not needed for cpu capacity initialization.
*/
- if (!acpi_disabled || !raw_capacity)
+ if (!acpi_disabled)
return -EINVAL;
if (!alloc_cpumask_var(&cpus_to_visit, GFP_KERNEL))
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 14d46af..9828da9 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -125,7 +125,7 @@ static void __fwnode_link_del(struct fwnode_link *link)
*/
static void __fwnode_link_cycle(struct fwnode_link *link)
{
- pr_debug("%pfwf: Relaxing link with %pfwf\n",
+ pr_debug("%pfwf: cycle: depends on %pfwf\n",
link->consumer, link->supplier);
link->flags |= FWLINK_FLAG_CYCLE;
}
@@ -284,10 +284,12 @@ static bool device_is_ancestor(struct device *dev, struct device *target)
return false;
}
+#define DL_MARKER_FLAGS (DL_FLAG_INFERRED | \
+ DL_FLAG_CYCLE | \
+ DL_FLAG_MANAGED)
static inline bool device_link_flag_is_sync_state_only(u32 flags)
{
- return (flags & ~(DL_FLAG_INFERRED | DL_FLAG_CYCLE)) ==
- (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED);
+ return (flags & ~DL_MARKER_FLAGS) == DL_FLAG_SYNC_STATE_ONLY;
}
/**
@@ -1943,6 +1945,7 @@ static bool __fw_devlink_relax_cycles(struct device *con,
/* Termination condition. */
if (sup_dev == con) {
+ pr_debug("----- cycle: start -----\n");
ret = true;
goto out;
}
@@ -1974,8 +1977,11 @@ static bool __fw_devlink_relax_cycles(struct device *con,
else
par_dev = fwnode_get_next_parent_dev(sup_handle);
- if (par_dev && __fw_devlink_relax_cycles(con, par_dev->fwnode))
+ if (par_dev && __fw_devlink_relax_cycles(con, par_dev->fwnode)) {
+ pr_debug("%pfwf: cycle: child of %pfwf\n", sup_handle,
+ par_dev->fwnode);
ret = true;
+ }
if (!sup_dev)
goto out;
@@ -1991,6 +1997,8 @@ static bool __fw_devlink_relax_cycles(struct device *con,
if (__fw_devlink_relax_cycles(con,
dev_link->supplier->fwnode)) {
+ pr_debug("%pfwf: cycle: depends on %pfwf\n", sup_handle,
+ dev_link->supplier->fwnode);
fw_devlink_relax_link(dev_link);
dev_link->flags |= DL_FLAG_CYCLE;
ret = true;
@@ -2058,13 +2066,19 @@ static int fw_devlink_create_devlink(struct device *con,
/*
* SYNC_STATE_ONLY device links don't block probing and supports cycles.
- * So cycle detection isn't necessary and shouldn't be done.
+ * So, one might expect that cycle detection isn't necessary for them.
+ * However, if the device link was marked as SYNC_STATE_ONLY because
+ * it's part of a cycle, then we still need to do cycle detection. This
+ * is because the consumer and supplier might be part of multiple cycles
+ * and we need to detect all those cycles.
*/
- if (!(flags & DL_FLAG_SYNC_STATE_ONLY)) {
+ if (!device_link_flag_is_sync_state_only(flags) ||
+ flags & DL_FLAG_CYCLE) {
device_links_write_lock();
if (__fw_devlink_relax_cycles(con, sup_handle)) {
__fwnode_link_cycle(link);
flags = fw_devlink_get_flags(link->flags);
+ pr_debug("----- cycle: end -----\n");
dev_info(con, "Fixed dependency cycle(s) with %pfwf\n",
sup_handle);
}
diff --git a/drivers/base/regmap/regmap-kunit.c b/drivers/base/regmap/regmap-kunit.c
index 026bdcb..0d957c5 100644
--- a/drivers/base/regmap/regmap-kunit.c
+++ b/drivers/base/regmap/regmap-kunit.c
@@ -9,6 +9,23 @@
#define BLOCK_TEST_SIZE 12
+static void get_changed_bytes(void *orig, void *new, size_t size)
+{
+ char *o = orig;
+ char *n = new;
+ int i;
+
+ get_random_bytes(new, size);
+
+ /*
+ * This could be nicer and more efficient but we shouldn't
+ * super care.
+ */
+ for (i = 0; i < size; i++)
+ while (n[i] == o[i])
+ get_random_bytes(&n[i], 1);
+}
+
static const struct regmap_config test_regmap_config = {
.max_register = BLOCK_TEST_SIZE,
.reg_stride = 1,
@@ -1202,7 +1219,8 @@ static void raw_noinc_write(struct kunit *test)
struct regmap *map;
struct regmap_config config;
struct regmap_ram_data *data;
- unsigned int val, val_test, val_last;
+ unsigned int val;
+ u16 val_test, val_last;
u16 val_array[BLOCK_TEST_SIZE];
config = raw_regmap_config;
@@ -1251,7 +1269,7 @@ static void raw_sync(struct kunit *test)
struct regmap *map;
struct regmap_config config;
struct regmap_ram_data *data;
- u16 val[2];
+ u16 val[3];
u16 *hw_buf;
unsigned int rval;
int i;
@@ -1265,17 +1283,13 @@ static void raw_sync(struct kunit *test)
hw_buf = (u16 *)data->vals;
- get_random_bytes(&val, sizeof(val));
+ get_changed_bytes(&hw_buf[2], &val[0], sizeof(val));
/* Do a regular write and a raw write in cache only mode */
regcache_cache_only(map, true);
- KUNIT_EXPECT_EQ(test, 0, regmap_raw_write(map, 2, val, sizeof(val)));
- if (config.val_format_endian == REGMAP_ENDIAN_BIG)
- KUNIT_EXPECT_EQ(test, 0, regmap_write(map, 6,
- be16_to_cpu(val[0])));
- else
- KUNIT_EXPECT_EQ(test, 0, regmap_write(map, 6,
- le16_to_cpu(val[0])));
+ KUNIT_EXPECT_EQ(test, 0, regmap_raw_write(map, 2, val,
+ sizeof(u16) * 2));
+ KUNIT_EXPECT_EQ(test, 0, regmap_write(map, 4, val[2]));
/* We should read back the new values, and defaults for the rest */
for (i = 0; i < config.max_register + 1; i++) {
@@ -1284,24 +1298,34 @@ static void raw_sync(struct kunit *test)
switch (i) {
case 2:
case 3:
- case 6:
if (config.val_format_endian == REGMAP_ENDIAN_BIG) {
KUNIT_EXPECT_EQ(test, rval,
- be16_to_cpu(val[i % 2]));
+ be16_to_cpu(val[i - 2]));
} else {
KUNIT_EXPECT_EQ(test, rval,
- le16_to_cpu(val[i % 2]));
+ le16_to_cpu(val[i - 2]));
}
break;
+ case 4:
+ KUNIT_EXPECT_EQ(test, rval, val[i - 2]);
+ break;
default:
KUNIT_EXPECT_EQ(test, config.reg_defaults[i].def, rval);
break;
}
}
+
+ /*
+ * The value written via _write() was translated by the core,
+ * translate the original copy for comparison purposes.
+ */
+ if (config.val_format_endian == REGMAP_ENDIAN_BIG)
+ val[2] = cpu_to_be16(val[2]);
+ else
+ val[2] = cpu_to_le16(val[2]);
/* The values should not appear in the "hardware" */
- KUNIT_EXPECT_MEMNEQ(test, &hw_buf[2], val, sizeof(val));
- KUNIT_EXPECT_MEMNEQ(test, &hw_buf[6], val, sizeof(u16));
+ KUNIT_EXPECT_MEMNEQ(test, &hw_buf[2], &val[0], sizeof(val));
for (i = 0; i < config.max_register + 1; i++)
data->written[i] = false;
@@ -1312,8 +1336,7 @@ static void raw_sync(struct kunit *test)
KUNIT_EXPECT_EQ(test, 0, regcache_sync(map));
/* The values should now appear in the "hardware" */
- KUNIT_EXPECT_MEMEQ(test, &hw_buf[2], val, sizeof(val));
- KUNIT_EXPECT_MEMEQ(test, &hw_buf[6], val, sizeof(u16));
+ KUNIT_EXPECT_MEMEQ(test, &hw_buf[2], &val[0], sizeof(val));
regmap_exit(map);
}
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index c21e373..94dc0a23 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -524,9 +524,9 @@ struct drbd_md {
struct drbd_backing_dev {
struct block_device *backing_bdev;
- struct bdev_handle *backing_bdev_handle;
+ struct file *backing_bdev_file;
struct block_device *md_bdev;
- struct bdev_handle *md_bdev_handle;
+ struct file *f_md_bdev;
struct drbd_md md;
struct disk_conf *disk_conf; /* RCU, for updates: resource->conf_update */
sector_t known_size; /* last known size of that backing device */
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index fbd92803..5d65c97 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1627,45 +1627,45 @@ int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info)
return 0;
}
-static struct bdev_handle *open_backing_dev(struct drbd_device *device,
+static struct file *open_backing_dev(struct drbd_device *device,
const char *bdev_path, void *claim_ptr, bool do_bd_link)
{
- struct bdev_handle *handle;
+ struct file *file;
int err = 0;
- handle = bdev_open_by_path(bdev_path, BLK_OPEN_READ | BLK_OPEN_WRITE,
- claim_ptr, NULL);
- if (IS_ERR(handle)) {
+ file = bdev_file_open_by_path(bdev_path, BLK_OPEN_READ | BLK_OPEN_WRITE,
+ claim_ptr, NULL);
+ if (IS_ERR(file)) {
drbd_err(device, "open(\"%s\") failed with %ld\n",
- bdev_path, PTR_ERR(handle));
- return handle;
+ bdev_path, PTR_ERR(file));
+ return file;
}
if (!do_bd_link)
- return handle;
+ return file;
- err = bd_link_disk_holder(handle->bdev, device->vdisk);
+ err = bd_link_disk_holder(file_bdev(file), device->vdisk);
if (err) {
- bdev_release(handle);
+ fput(file);
drbd_err(device, "bd_link_disk_holder(\"%s\", ...) failed with %d\n",
bdev_path, err);
- handle = ERR_PTR(err);
+ file = ERR_PTR(err);
}
- return handle;
+ return file;
}
static int open_backing_devices(struct drbd_device *device,
struct disk_conf *new_disk_conf,
struct drbd_backing_dev *nbc)
{
- struct bdev_handle *handle;
+ struct file *file;
- handle = open_backing_dev(device, new_disk_conf->backing_dev, device,
+ file = open_backing_dev(device, new_disk_conf->backing_dev, device,
true);
- if (IS_ERR(handle))
+ if (IS_ERR(file))
return ERR_OPEN_DISK;
- nbc->backing_bdev = handle->bdev;
- nbc->backing_bdev_handle = handle;
+ nbc->backing_bdev = file_bdev(file);
+ nbc->backing_bdev_file = file;
/*
* meta_dev_idx >= 0: external fixed size, possibly multiple
@@ -1675,7 +1675,7 @@ static int open_backing_devices(struct drbd_device *device,
* should check it for you already; but if you don't, or
* someone fooled it, we need to double check here)
*/
- handle = open_backing_dev(device, new_disk_conf->meta_dev,
+ file = open_backing_dev(device, new_disk_conf->meta_dev,
/* claim ptr: device, if claimed exclusively; shared drbd_m_holder,
* if potentially shared with other drbd minors */
(new_disk_conf->meta_dev_idx < 0) ? (void*)device : (void*)drbd_m_holder,
@@ -1683,21 +1683,21 @@ static int open_backing_devices(struct drbd_device *device,
* as would happen with internal metadata. */
(new_disk_conf->meta_dev_idx != DRBD_MD_INDEX_FLEX_INT &&
new_disk_conf->meta_dev_idx != DRBD_MD_INDEX_INTERNAL));
- if (IS_ERR(handle))
+ if (IS_ERR(file))
return ERR_OPEN_MD_DISK;
- nbc->md_bdev = handle->bdev;
- nbc->md_bdev_handle = handle;
+ nbc->md_bdev = file_bdev(file);
+ nbc->f_md_bdev = file;
return NO_ERROR;
}
static void close_backing_dev(struct drbd_device *device,
- struct bdev_handle *handle, bool do_bd_unlink)
+ struct file *bdev_file, bool do_bd_unlink)
{
- if (!handle)
+ if (!bdev_file)
return;
if (do_bd_unlink)
- bd_unlink_disk_holder(handle->bdev, device->vdisk);
- bdev_release(handle);
+ bd_unlink_disk_holder(file_bdev(bdev_file), device->vdisk);
+ fput(bdev_file);
}
void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *ldev)
@@ -1705,9 +1705,9 @@ void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *
if (ldev == NULL)
return;
- close_backing_dev(device, ldev->md_bdev_handle,
+ close_backing_dev(device, ldev->f_md_bdev,
ldev->md_bdev != ldev->backing_bdev);
- close_backing_dev(device, ldev->backing_bdev_handle, true);
+ close_backing_dev(device, ldev->backing_bdev_file, true);
kfree(ldev->disk_conf);
kfree(ldev);
@@ -2123,9 +2123,9 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
fail:
conn_reconfig_done(connection);
if (nbc) {
- close_backing_dev(device, nbc->md_bdev_handle,
+ close_backing_dev(device, nbc->f_md_bdev,
nbc->md_bdev != nbc->backing_bdev);
- close_backing_dev(device, nbc->backing_bdev_handle, true);
+ close_backing_dev(device, nbc->backing_bdev_file, true);
kfree(nbc);
}
kfree(new_disk_conf);
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 9071c4e..21728e9 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -340,8 +340,8 @@ static ssize_t device_map_show(const struct class *c, const struct class_attribu
n += sysfs_emit_at(data, n, "%s %u:%u %u:%u\n",
pd->disk->disk_name,
MAJOR(pd->pkt_dev), MINOR(pd->pkt_dev),
- MAJOR(pd->bdev_handle->bdev->bd_dev),
- MINOR(pd->bdev_handle->bdev->bd_dev));
+ MAJOR(file_bdev(pd->bdev_file)->bd_dev),
+ MINOR(file_bdev(pd->bdev_file)->bd_dev));
}
mutex_unlock(&ctl_mutex);
return n;
@@ -438,7 +438,7 @@ static int pkt_seq_show(struct seq_file *m, void *p)
int states[PACKET_NUM_STATES];
seq_printf(m, "Writer %s mapped to %pg:\n", pd->disk->disk_name,
- pd->bdev_handle->bdev);
+ file_bdev(pd->bdev_file));
seq_printf(m, "\nSettings:\n");
seq_printf(m, "\tpacket size:\t\t%dkB\n", pd->settings.size / 2);
@@ -715,7 +715,7 @@ static void pkt_rbtree_insert(struct pktcdvd_device *pd, struct pkt_rb_node *nod
*/
static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *cgc)
{
- struct request_queue *q = bdev_get_queue(pd->bdev_handle->bdev);
+ struct request_queue *q = bdev_get_queue(file_bdev(pd->bdev_file));
struct scsi_cmnd *scmd;
struct request *rq;
int ret = 0;
@@ -1054,7 +1054,7 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
continue;
bio = pkt->r_bios[f];
- bio_init(bio, pd->bdev_handle->bdev, bio->bi_inline_vecs, 1,
+ bio_init(bio, file_bdev(pd->bdev_file), bio->bi_inline_vecs, 1,
REQ_OP_READ);
bio->bi_iter.bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
bio->bi_end_io = pkt_end_io_read;
@@ -1270,7 +1270,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
struct device *ddev = disk_to_dev(pd->disk);
int f;
- bio_init(pkt->w_bio, pd->bdev_handle->bdev, pkt->w_bio->bi_inline_vecs,
+ bio_init(pkt->w_bio, file_bdev(pd->bdev_file), pkt->w_bio->bi_inline_vecs,
pkt->frames, REQ_OP_WRITE);
pkt->w_bio->bi_iter.bi_sector = pkt->sector;
pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
@@ -2168,20 +2168,20 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
int ret;
long lba;
struct request_queue *q;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
/*
* We need to re-open the cdrom device without O_NONBLOCK to be able
* to read/write from/to it. It is already opened in O_NONBLOCK mode
* so open should not fail.
*/
- bdev_handle = bdev_open_by_dev(pd->bdev_handle->bdev->bd_dev,
+ bdev_file = bdev_file_open_by_dev(file_bdev(pd->bdev_file)->bd_dev,
BLK_OPEN_READ, pd, NULL);
- if (IS_ERR(bdev_handle)) {
- ret = PTR_ERR(bdev_handle);
+ if (IS_ERR(bdev_file)) {
+ ret = PTR_ERR(bdev_file);
goto out;
}
- pd->open_bdev_handle = bdev_handle;
+ pd->f_open_bdev = bdev_file;
ret = pkt_get_last_written(pd, &lba);
if (ret) {
@@ -2190,9 +2190,9 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
}
set_capacity(pd->disk, lba << 2);
- set_capacity_and_notify(pd->bdev_handle->bdev->bd_disk, lba << 2);
+ set_capacity_and_notify(file_bdev(pd->bdev_file)->bd_disk, lba << 2);
- q = bdev_get_queue(pd->bdev_handle->bdev);
+ q = bdev_get_queue(file_bdev(pd->bdev_file));
if (write) {
ret = pkt_open_write(pd);
if (ret)
@@ -2219,7 +2219,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
return 0;
out_putdev:
- bdev_release(bdev_handle);
+ fput(bdev_file);
out:
return ret;
}
@@ -2238,8 +2238,8 @@ static void pkt_release_dev(struct pktcdvd_device *pd, int flush)
pkt_lock_door(pd, 0);
pkt_set_speed(pd, MAX_SPEED, MAX_SPEED);
- bdev_release(pd->open_bdev_handle);
- pd->open_bdev_handle = NULL;
+ fput(pd->f_open_bdev);
+ pd->f_open_bdev = NULL;
pkt_shrink_pktlist(pd);
}
@@ -2327,7 +2327,7 @@ static void pkt_end_io_read_cloned(struct bio *bio)
static void pkt_make_request_read(struct pktcdvd_device *pd, struct bio *bio)
{
- struct bio *cloned_bio = bio_alloc_clone(pd->bdev_handle->bdev, bio,
+ struct bio *cloned_bio = bio_alloc_clone(file_bdev(pd->bdev_file), bio,
GFP_NOIO, &pkt_bio_set);
struct packet_stacked_data *psd = mempool_alloc(&psd_pool, GFP_NOIO);
@@ -2489,7 +2489,7 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
{
struct device *ddev = disk_to_dev(pd->disk);
int i;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct scsi_device *sdev;
if (pd->pkt_dev == dev) {
@@ -2500,9 +2500,9 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
struct pktcdvd_device *pd2 = pkt_devs[i];
if (!pd2)
continue;
- if (pd2->bdev_handle->bdev->bd_dev == dev) {
+ if (file_bdev(pd2->bdev_file)->bd_dev == dev) {
dev_err(ddev, "%pg already setup\n",
- pd2->bdev_handle->bdev);
+ file_bdev(pd2->bdev_file));
return -EBUSY;
}
if (pd2->pkt_dev == dev) {
@@ -2511,13 +2511,13 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
}
}
- bdev_handle = bdev_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_NDELAY,
+ bdev_file = bdev_file_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_NDELAY,
NULL, NULL);
- if (IS_ERR(bdev_handle))
- return PTR_ERR(bdev_handle);
- sdev = scsi_device_from_queue(bdev_handle->bdev->bd_disk->queue);
+ if (IS_ERR(bdev_file))
+ return PTR_ERR(bdev_file);
+ sdev = scsi_device_from_queue(file_bdev(bdev_file)->bd_disk->queue);
if (!sdev) {
- bdev_release(bdev_handle);
+ fput(bdev_file);
return -EINVAL;
}
put_device(&sdev->sdev_gendev);
@@ -2525,8 +2525,8 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
/* This is safe, since we have a reference from open(). */
__module_get(THIS_MODULE);
- pd->bdev_handle = bdev_handle;
- set_blocksize(bdev_handle->bdev, CD_FRAMESIZE);
+ pd->bdev_file = bdev_file;
+ set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
atomic_set(&pd->cdrw.pending_bios, 0);
pd->cdrw.thread = kthread_run(kcdrwd, pd, "%s", pd->disk->disk_name);
@@ -2536,11 +2536,11 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
}
proc_create_single_data(pd->disk->disk_name, 0, pkt_proc, pkt_seq_show, pd);
- dev_notice(ddev, "writer mapped to %pg\n", bdev_handle->bdev);
+ dev_notice(ddev, "writer mapped to %pg\n", file_bdev(bdev_file));
return 0;
out_mem:
- bdev_release(bdev_handle);
+ fput(bdev_file);
/* This is safe: open() is still holding a reference. */
module_put(THIS_MODULE);
return -ENOMEM;
@@ -2595,9 +2595,9 @@ static unsigned int pkt_check_events(struct gendisk *disk,
if (!pd)
return 0;
- if (!pd->bdev_handle)
+ if (!pd->bdev_file)
return 0;
- attached_disk = pd->bdev_handle->bdev->bd_disk;
+ attached_disk = file_bdev(pd->bdev_file)->bd_disk;
if (!attached_disk || !attached_disk->fops->check_events)
return 0;
return attached_disk->fops->check_events(attached_disk, clearing);
@@ -2687,7 +2687,7 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
goto out_mem2;
/* inherit events of the host device */
- disk->events = pd->bdev_handle->bdev->bd_disk->events;
+ disk->events = file_bdev(pd->bdev_file)->bd_disk->events;
ret = add_disk(disk);
if (ret)
@@ -2752,7 +2752,7 @@ static int pkt_remove_dev(dev_t pkt_dev)
pkt_debugfs_dev_remove(pd);
pkt_sysfs_dev_remove(pd);
- bdev_release(pd->bdev_handle);
+ fput(pd->bdev_file);
remove_proc_entry(pd->disk->disk_name, pkt_proc);
dev_notice(ddev, "writer unmapped\n");
@@ -2779,7 +2779,7 @@ static void pkt_get_status(struct pkt_ctrl_command *ctrl_cmd)
pd = pkt_find_dev_from_minor(ctrl_cmd->dev_index);
if (pd) {
- ctrl_cmd->dev = new_encode_dev(pd->bdev_handle->bdev->bd_dev);
+ ctrl_cmd->dev = new_encode_dev(file_bdev(pd->bdev_file)->bd_dev);
ctrl_cmd->pkt_dev = new_encode_dev(pd->pkt_dev);
} else {
ctrl_cmd->dev = 0;
diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
index 3a0d5dc..f6e3a3c 100644
--- a/drivers/block/rnbd/rnbd-srv.c
+++ b/drivers/block/rnbd/rnbd-srv.c
@@ -145,7 +145,7 @@ static int process_rdma(struct rnbd_srv_session *srv_sess,
priv->sess_dev = sess_dev;
priv->id = id;
- bio = bio_alloc(sess_dev->bdev_handle->bdev, 1,
+ bio = bio_alloc(file_bdev(sess_dev->bdev_file), 1,
rnbd_to_bio_flags(le32_to_cpu(msg->rw)), GFP_KERNEL);
if (bio_add_page(bio, virt_to_page(data), datalen,
offset_in_page(data)) != datalen) {
@@ -219,7 +219,7 @@ void rnbd_destroy_sess_dev(struct rnbd_srv_sess_dev *sess_dev, bool keep_id)
rnbd_put_sess_dev(sess_dev);
wait_for_completion(&dc); /* wait for inflights to drop to zero */
- bdev_release(sess_dev->bdev_handle);
+ fput(sess_dev->bdev_file);
mutex_lock(&sess_dev->dev->lock);
list_del(&sess_dev->dev_list);
if (!sess_dev->readonly)
@@ -534,7 +534,7 @@ rnbd_srv_get_or_create_srv_dev(struct block_device *bdev,
static void rnbd_srv_fill_msg_open_rsp(struct rnbd_msg_open_rsp *rsp,
struct rnbd_srv_sess_dev *sess_dev)
{
- struct block_device *bdev = sess_dev->bdev_handle->bdev;
+ struct block_device *bdev = file_bdev(sess_dev->bdev_file);
rsp->hdr.type = cpu_to_le16(RNBD_MSG_OPEN_RSP);
rsp->device_id = cpu_to_le32(sess_dev->device_id);
@@ -560,7 +560,7 @@ static void rnbd_srv_fill_msg_open_rsp(struct rnbd_msg_open_rsp *rsp,
static struct rnbd_srv_sess_dev *
rnbd_srv_create_set_sess_dev(struct rnbd_srv_session *srv_sess,
const struct rnbd_msg_open *open_msg,
- struct bdev_handle *handle, bool readonly,
+ struct file *bdev_file, bool readonly,
struct rnbd_srv_dev *srv_dev)
{
struct rnbd_srv_sess_dev *sdev = rnbd_sess_dev_alloc(srv_sess);
@@ -572,7 +572,7 @@ rnbd_srv_create_set_sess_dev(struct rnbd_srv_session *srv_sess,
strscpy(sdev->pathname, open_msg->dev_name, sizeof(sdev->pathname));
- sdev->bdev_handle = handle;
+ sdev->bdev_file = bdev_file;
sdev->sess = srv_sess;
sdev->dev = srv_dev;
sdev->readonly = readonly;
@@ -678,7 +678,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
struct rnbd_srv_dev *srv_dev;
struct rnbd_srv_sess_dev *srv_sess_dev;
const struct rnbd_msg_open *open_msg = msg;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
blk_mode_t open_flags = BLK_OPEN_READ;
char *full_path;
struct rnbd_msg_open_rsp *rsp = data;
@@ -716,15 +716,15 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
goto reject;
}
- bdev_handle = bdev_open_by_path(full_path, open_flags, NULL, NULL);
- if (IS_ERR(bdev_handle)) {
- ret = PTR_ERR(bdev_handle);
+ bdev_file = bdev_file_open_by_path(full_path, open_flags, NULL, NULL);
+ if (IS_ERR(bdev_file)) {
+ ret = PTR_ERR(bdev_file);
pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %pe\n",
- full_path, srv_sess->sessname, bdev_handle);
+ full_path, srv_sess->sessname, bdev_file);
goto free_path;
}
- srv_dev = rnbd_srv_get_or_create_srv_dev(bdev_handle->bdev, srv_sess,
+ srv_dev = rnbd_srv_get_or_create_srv_dev(file_bdev(bdev_file), srv_sess,
open_msg->access_mode);
if (IS_ERR(srv_dev)) {
pr_err("Opening device '%s' on session %s failed, creating srv_dev failed, err: %pe\n",
@@ -734,7 +734,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
}
srv_sess_dev = rnbd_srv_create_set_sess_dev(srv_sess, open_msg,
- bdev_handle,
+ bdev_file,
open_msg->access_mode == RNBD_ACCESS_RO,
srv_dev);
if (IS_ERR(srv_sess_dev)) {
@@ -750,7 +750,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
*/
mutex_lock(&srv_dev->lock);
if (!srv_dev->dev_kobj.state_in_sysfs) {
- ret = rnbd_srv_create_dev_sysfs(srv_dev, bdev_handle->bdev);
+ ret = rnbd_srv_create_dev_sysfs(srv_dev, file_bdev(bdev_file));
if (ret) {
mutex_unlock(&srv_dev->lock);
rnbd_srv_err(srv_sess_dev,
@@ -793,7 +793,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
}
rnbd_put_srv_dev(srv_dev);
blkdev_put:
- bdev_release(bdev_handle);
+ fput(bdev_file);
free_path:
kfree(full_path);
reject:
diff --git a/drivers/block/rnbd/rnbd-srv.h b/drivers/block/rnbd/rnbd-srv.h
index 343cc68..18d8738 100644
--- a/drivers/block/rnbd/rnbd-srv.h
+++ b/drivers/block/rnbd/rnbd-srv.h
@@ -46,7 +46,7 @@ struct rnbd_srv_dev {
struct rnbd_srv_sess_dev {
/* Entry inside rnbd_srv_dev struct */
struct list_head dev_list;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct rnbd_srv_session *sess;
struct rnbd_srv_dev *dev;
struct kobject kobj;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index aa9f865..42dea76 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1600,14 +1600,15 @@ static int virtblk_freeze(struct virtio_device *vdev)
{
struct virtio_blk *vblk = vdev->priv;
+ /* Ensure no requests in virtqueues before deleting vqs. */
+ blk_mq_freeze_queue(vblk->disk->queue);
+
/* Ensure we don't receive any more interrupts */
virtio_reset_device(vdev);
/* Make sure no work handler is accessing the device. */
flush_work(&vblk->config_work);
- blk_mq_quiesce_queue(vblk->disk->queue);
-
vdev->config->del_vqs(vdev);
kfree(vblk->vqs);
@@ -1625,7 +1626,7 @@ static int virtblk_restore(struct virtio_device *vdev)
virtio_device_ready(vdev);
- blk_mq_unquiesce_queue(vblk->disk->queue);
+ blk_mq_unfreeze_queue(vblk->disk->queue);
return 0;
}
#endif
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 4defd7f..944576d 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -465,7 +465,7 @@ static int xen_vbd_translate(struct phys_req *req, struct xen_blkif *blkif,
}
req->dev = vbd->pdevice;
- req->bdev = vbd->bdev_handle->bdev;
+ req->bdev = file_bdev(vbd->bdev_file);
rc = 0;
out:
@@ -969,7 +969,7 @@ static int dispatch_discard_io(struct xen_blkif_ring *ring,
int err = 0;
int status = BLKIF_RSP_OKAY;
struct xen_blkif *blkif = ring->blkif;
- struct block_device *bdev = blkif->vbd.bdev_handle->bdev;
+ struct block_device *bdev = file_bdev(blkif->vbd.bdev_file);
struct phys_req preq;
xen_blkif_get(blkif);
diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index 1432c83..b427d54 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -221,7 +221,7 @@ struct xen_vbd {
unsigned char type;
/* phys device that this vbd maps to. */
u32 pdevice;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
/* Cached size parameter. */
sector_t size;
unsigned int flush_support:1;
@@ -360,7 +360,7 @@ struct pending_req {
};
-#define vbd_sz(_v) bdev_nr_sectors((_v)->bdev_handle->bdev)
+#define vbd_sz(_v) bdev_nr_sectors(file_bdev((_v)->bdev_file))
#define xen_blkif_get(_b) (atomic_inc(&(_b)->refcnt))
#define xen_blkif_put(_b) \
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index e34219e..0621878 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -81,7 +81,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
int i;
/* Not ready to connect? */
- if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev_handle)
+ if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev_file)
return;
/* Already connected? */
@@ -99,13 +99,12 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
return;
}
- err = sync_blockdev(blkif->vbd.bdev_handle->bdev);
+ err = sync_blockdev(file_bdev(blkif->vbd.bdev_file));
if (err) {
xenbus_dev_error(blkif->be->dev, err, "block flush");
return;
}
- invalidate_inode_pages2(
- blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
+ invalidate_inode_pages2(blkif->vbd.bdev_file->f_mapping);
for (i = 0; i < blkif->nr_rings; i++) {
ring = &blkif->rings[i];
@@ -473,9 +472,9 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
static void xen_vbd_free(struct xen_vbd *vbd)
{
- if (vbd->bdev_handle)
- bdev_release(vbd->bdev_handle);
- vbd->bdev_handle = NULL;
+ if (vbd->bdev_file)
+ fput(vbd->bdev_file);
+ vbd->bdev_file = NULL;
}
static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
@@ -483,7 +482,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
int cdrom)
{
struct xen_vbd *vbd;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
vbd = &blkif->vbd;
vbd->handle = handle;
@@ -492,17 +491,17 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
vbd->pdevice = MKDEV(major, minor);
- bdev_handle = bdev_open_by_dev(vbd->pdevice, vbd->readonly ?
+ bdev_file = bdev_file_open_by_dev(vbd->pdevice, vbd->readonly ?
BLK_OPEN_READ : BLK_OPEN_WRITE, NULL, NULL);
- if (IS_ERR(bdev_handle)) {
+ if (IS_ERR(bdev_file)) {
pr_warn("xen_vbd_create: device %08x could not be opened\n",
vbd->pdevice);
return -ENOENT;
}
- vbd->bdev_handle = bdev_handle;
- if (vbd->bdev_handle->bdev->bd_disk == NULL) {
+ vbd->bdev_file = bdev_file;
+ if (file_bdev(vbd->bdev_file)->bd_disk == NULL) {
pr_warn("xen_vbd_create: device %08x doesn't exist\n",
vbd->pdevice);
xen_vbd_free(vbd);
@@ -510,14 +509,14 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
}
vbd->size = vbd_sz(vbd);
- if (cdrom || disk_to_cdi(vbd->bdev_handle->bdev->bd_disk))
+ if (cdrom || disk_to_cdi(file_bdev(vbd->bdev_file)->bd_disk))
vbd->type |= VDISK_CDROM;
- if (vbd->bdev_handle->bdev->bd_disk->flags & GENHD_FL_REMOVABLE)
+ if (file_bdev(vbd->bdev_file)->bd_disk->flags & GENHD_FL_REMOVABLE)
vbd->type |= VDISK_REMOVABLE;
- if (bdev_write_cache(bdev_handle->bdev))
+ if (bdev_write_cache(file_bdev(bdev_file)))
vbd->flush_support = true;
- if (bdev_max_secure_erase_sectors(bdev_handle->bdev))
+ if (bdev_max_secure_erase_sectors(file_bdev(bdev_file)))
vbd->discard_secure = true;
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
@@ -570,7 +569,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info
struct xen_blkif *blkif = be->blkif;
int err;
int state = 0;
- struct block_device *bdev = be->blkif->vbd.bdev_handle->bdev;
+ struct block_device *bdev = file_bdev(be->blkif->vbd.bdev_file);
if (!xenbus_read_unsigned(dev->nodename, "discard-enable", 1))
return;
@@ -932,7 +931,7 @@ static void connect(struct backend_info *be)
}
err = xenbus_printf(xbt, dev->nodename, "sector-size", "%lu",
(unsigned long)bdev_logical_block_size(
- be->blkif->vbd.bdev_handle->bdev));
+ file_bdev(be->blkif->vbd.bdev_file)));
if (err) {
xenbus_dev_fatal(dev, err, "writing %s/sector-size",
dev->nodename);
@@ -940,7 +939,7 @@ static void connect(struct backend_info *be)
}
err = xenbus_printf(xbt, dev->nodename, "physical-sector-size", "%u",
bdev_physical_block_size(
- be->blkif->vbd.bdev_handle->bdev));
+ file_bdev(be->blkif->vbd.bdev_file)));
if (err)
xenbus_dev_error(dev, err, "writing %s/physical-sector-size",
dev->nodename);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 8ee0f7b..da7a20f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -426,11 +426,11 @@ static void reset_bdev(struct zram *zram)
if (!zram->backing_dev)
return;
- bdev_release(zram->bdev_handle);
+ fput(zram->bdev_file);
/* hope filp_close flush all of IO */
filp_close(zram->backing_dev, NULL);
zram->backing_dev = NULL;
- zram->bdev_handle = NULL;
+ zram->bdev_file = NULL;
zram->disk->fops = &zram_devops;
kvfree(zram->bitmap);
zram->bitmap = NULL;
@@ -476,7 +476,7 @@ static ssize_t backing_dev_store(struct device *dev,
struct address_space *mapping;
unsigned int bitmap_sz;
unsigned long nr_pages, *bitmap = NULL;
- struct bdev_handle *bdev_handle = NULL;
+ struct file *bdev_file = NULL;
int err;
struct zram *zram = dev_to_zram(dev);
@@ -513,11 +513,11 @@ static ssize_t backing_dev_store(struct device *dev,
goto out;
}
- bdev_handle = bdev_open_by_dev(inode->i_rdev,
+ bdev_file = bdev_file_open_by_dev(inode->i_rdev,
BLK_OPEN_READ | BLK_OPEN_WRITE, zram, NULL);
- if (IS_ERR(bdev_handle)) {
- err = PTR_ERR(bdev_handle);
- bdev_handle = NULL;
+ if (IS_ERR(bdev_file)) {
+ err = PTR_ERR(bdev_file);
+ bdev_file = NULL;
goto out;
}
@@ -531,7 +531,7 @@ static ssize_t backing_dev_store(struct device *dev,
reset_bdev(zram);
- zram->bdev_handle = bdev_handle;
+ zram->bdev_file = bdev_file;
zram->backing_dev = backing_dev;
zram->bitmap = bitmap;
zram->nr_pages = nr_pages;
@@ -544,8 +544,8 @@ static ssize_t backing_dev_store(struct device *dev,
out:
kvfree(bitmap);
- if (bdev_handle)
- bdev_release(bdev_handle);
+ if (bdev_file)
+ fput(bdev_file);
if (backing_dev)
filp_close(backing_dev, NULL);
@@ -587,7 +587,7 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
{
struct bio *bio;
- bio = bio_alloc(zram->bdev_handle->bdev, 1, parent->bi_opf, GFP_NOIO);
+ bio = bio_alloc(file_bdev(zram->bdev_file), 1, parent->bi_opf, GFP_NOIO);
bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
__bio_add_page(bio, page, PAGE_SIZE, 0);
bio_chain(bio, parent);
@@ -703,7 +703,7 @@ static ssize_t writeback_store(struct device *dev,
continue;
}
- bio_init(&bio, zram->bdev_handle->bdev, &bio_vec, 1,
+ bio_init(&bio, file_bdev(zram->bdev_file), &bio_vec, 1,
REQ_OP_WRITE | REQ_SYNC);
bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
__bio_add_page(&bio, page, PAGE_SIZE, 0);
@@ -785,7 +785,7 @@ static void zram_sync_read(struct work_struct *work)
struct bio_vec bv;
struct bio bio;
- bio_init(&bio, zw->zram->bdev_handle->bdev, &bv, 1, REQ_OP_READ);
+ bio_init(&bio, file_bdev(zw->zram->bdev_file), &bv, 1, REQ_OP_READ);
bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
__bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
zw->error = submit_bio_wait(&bio);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 3b94d12..37bf29f 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -132,7 +132,7 @@ struct zram {
spinlock_t wb_limit_lock;
bool wb_limit_enable;
u64 bd_wb_limit;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
unsigned long *bitmap;
unsigned long nr_pages;
#endif
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index fdb0fae..b40b32f 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -152,7 +152,7 @@ static int qca_send_patch_config_cmd(struct hci_dev *hdev)
bt_dev_dbg(hdev, "QCA Patch config");
skb = __hci_cmd_sync_ev(hdev, EDL_PATCH_CMD_OPCODE, sizeof(cmd),
- cmd, HCI_EV_VENDOR, HCI_INIT_TIMEOUT);
+ cmd, 0, HCI_INIT_TIMEOUT);
if (IS_ERR(skb)) {
err = PTR_ERR(skb);
bt_dev_err(hdev, "Sending QCA Patch config failed (%d)", err);
diff --git a/drivers/bluetooth/hci_bcm4377.c b/drivers/bluetooth/hci_bcm4377.c
index a617578..9a7243d 100644
--- a/drivers/bluetooth/hci_bcm4377.c
+++ b/drivers/bluetooth/hci_bcm4377.c
@@ -1417,7 +1417,7 @@ static int bcm4377_check_bdaddr(struct bcm4377_data *bcm4377)
bda = (struct hci_rp_read_bd_addr *)skb->data;
if (!bcm4377_is_valid_bdaddr(bcm4377, &bda->bdaddr))
- set_bit(HCI_QUIRK_INVALID_BDADDR, &bcm4377->hdev->quirks);
+ set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &bcm4377->hdev->quirks);
kfree_skb(skb);
return 0;
@@ -2368,7 +2368,6 @@ static int bcm4377_probe(struct pci_dev *pdev, const struct pci_device_id *id)
hdev->set_bdaddr = bcm4377_hci_set_bdaddr;
hdev->setup = bcm4377_hci_setup;
- set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
if (bcm4377->hw->broken_mws_transport_config)
set_bit(HCI_QUIRK_BROKEN_MWS_TRANSPORT_CONFIG, &hdev->quirks);
if (bcm4377->hw->broken_ext_scan)
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 94b8c40..edd2a81 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -7,6 +7,7 @@
*
* Copyright (C) 2007 Texas Instruments, Inc.
* Copyright (c) 2010, 2012, 2018 The Linux Foundation. All rights reserved.
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
*
* Acknowledgements:
* This file is based on hci_ll.c, which was...
@@ -1806,13 +1807,12 @@ static int qca_power_on(struct hci_dev *hdev)
static void hci_coredump_qca(struct hci_dev *hdev)
{
+ int err;
static const u8 param[] = { 0x26 };
- struct sk_buff *skb;
- skb = __hci_cmd_sync(hdev, 0xfc0c, 1, param, HCI_CMD_TIMEOUT);
- if (IS_ERR(skb))
- bt_dev_err(hdev, "%s: trigger crash failed (%ld)", __func__, PTR_ERR(skb));
- kfree_skb(skb);
+ err = __hci_cmd_send(hdev, 0xfc0c, 1, param);
+ if (err < 0)
+ bt_dev_err(hdev, "%s: trigger crash failed (%d)", __func__, err);
}
static int qca_get_data_path_id(struct hci_dev *hdev, __u8 *data_path_id)
@@ -1904,7 +1904,17 @@ static int qca_setup(struct hci_uart *hu)
case QCA_WCN6750:
case QCA_WCN6855:
case QCA_WCN7850:
- set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
+
+ /* Set BDA quirk bit for reading BDA value from fwnode property
+ * only if that property exist in DT.
+ */
+ if (fwnode_property_present(dev_fwnode(hdev->dev.parent), "local-bd-address")) {
+ set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
+ bt_dev_info(hdev, "setting quirk bit to read BDA from fwnode later");
+ } else {
+ bt_dev_dbg(hdev, "local-bd-address` is not present in the devicetree so not setting quirk bit for BDA");
+ }
+
hci_set_aosp_capable(hdev);
ret = qca_read_soc_version(hdev, &ver, soc_type);
diff --git a/drivers/bus/imx-weim.c b/drivers/bus/imx-weim.c
index 6b5da73..837bf9d 100644
--- a/drivers/bus/imx-weim.c
+++ b/drivers/bus/imx-weim.c
@@ -120,7 +120,7 @@ static int imx_weim_gpr_setup(struct platform_device *pdev)
i++;
}
- if (i == 0 || i % 4)
+ if (i == 0)
goto err;
for (i = 0; i < ARRAY_SIZE(gprvals); i++) {
diff --git a/drivers/cache/ax45mp_cache.c b/drivers/cache/ax45mp_cache.c
index 57186c5..1d7dd3d 100644
--- a/drivers/cache/ax45mp_cache.c
+++ b/drivers/cache/ax45mp_cache.c
@@ -129,8 +129,12 @@ static void ax45mp_dma_cache_wback(phys_addr_t paddr, size_t size)
unsigned long line_size;
unsigned long flags;
+ if (unlikely(start == end))
+ return;
+
line_size = ax45mp_priv.ax45mp_cache_line_size;
start = start & (~(line_size - 1));
+ end = ((end + line_size - 1) & (~(line_size - 1)));
local_irq_save(flags);
ax45mp_cpu_dcache_wb_range(start, end);
local_irq_restore(flags);
diff --git a/drivers/clk/samsung/clk-gs101.c b/drivers/clk/samsung/clk-gs101.c
index 0964bb1..7829939 100644
--- a/drivers/clk/samsung/clk-gs101.c
+++ b/drivers/clk/samsung/clk-gs101.c
@@ -2475,7 +2475,7 @@ static const struct samsung_cmu_info misc_cmu_info __initconst = {
.nr_clk_ids = CLKS_NR_MISC,
.clk_regs = misc_clk_regs,
.nr_clk_regs = ARRAY_SIZE(misc_clk_regs),
- .clk_name = "dout_cmu_misc_bus",
+ .clk_name = "bus",
};
/* ---- platform_driver ----------------------------------------------------- */
diff --git a/drivers/comedi/drivers/comedi_8255.c b/drivers/comedi/drivers/comedi_8255.c
index e4974b5..a933ef5 100644
--- a/drivers/comedi/drivers/comedi_8255.c
+++ b/drivers/comedi/drivers/comedi_8255.c
@@ -159,6 +159,7 @@ static int __subdev_8255_init(struct comedi_device *dev,
return -ENOMEM;
spriv->context = context;
+ spriv->io = io;
s->type = COMEDI_SUBD_DIO;
s->subdev_flags = SDF_READABLE | SDF_WRITABLE;
diff --git a/drivers/comedi/drivers/comedi_test.c b/drivers/comedi/drivers/comedi_test.c
index 30ea8b53..05ae912 100644
--- a/drivers/comedi/drivers/comedi_test.c
+++ b/drivers/comedi/drivers/comedi_test.c
@@ -87,6 +87,8 @@ struct waveform_private {
struct comedi_device *dev; /* parent comedi device */
u64 ao_last_scan_time; /* time of previous AO scan in usec */
unsigned int ao_scan_period; /* AO scan period in usec */
+ bool ai_timer_enable:1; /* should AI timer be running? */
+ bool ao_timer_enable:1; /* should AO timer be running? */
unsigned short ao_loopbacks[N_CHANS];
};
@@ -236,8 +238,12 @@ static void waveform_ai_timer(struct timer_list *t)
time_increment = devpriv->ai_convert_time - now;
else
time_increment = 1;
- mod_timer(&devpriv->ai_timer,
- jiffies + usecs_to_jiffies(time_increment));
+ spin_lock(&dev->spinlock);
+ if (devpriv->ai_timer_enable) {
+ mod_timer(&devpriv->ai_timer,
+ jiffies + usecs_to_jiffies(time_increment));
+ }
+ spin_unlock(&dev->spinlock);
}
overrun:
@@ -393,9 +399,12 @@ static int waveform_ai_cmd(struct comedi_device *dev,
* Seem to need an extra jiffy here, otherwise timer expires slightly
* early!
*/
+ spin_lock_bh(&dev->spinlock);
+ devpriv->ai_timer_enable = true;
devpriv->ai_timer.expires =
jiffies + usecs_to_jiffies(devpriv->ai_convert_period) + 1;
add_timer(&devpriv->ai_timer);
+ spin_unlock_bh(&dev->spinlock);
return 0;
}
@@ -404,6 +413,9 @@ static int waveform_ai_cancel(struct comedi_device *dev,
{
struct waveform_private *devpriv = dev->private;
+ spin_lock_bh(&dev->spinlock);
+ devpriv->ai_timer_enable = false;
+ spin_unlock_bh(&dev->spinlock);
if (in_softirq()) {
/* Assume we were called from the timer routine itself. */
del_timer(&devpriv->ai_timer);
@@ -495,8 +507,12 @@ static void waveform_ao_timer(struct timer_list *t)
unsigned int time_inc = devpriv->ao_last_scan_time +
devpriv->ao_scan_period - now;
- mod_timer(&devpriv->ao_timer,
- jiffies + usecs_to_jiffies(time_inc));
+ spin_lock(&dev->spinlock);
+ if (devpriv->ao_timer_enable) {
+ mod_timer(&devpriv->ao_timer,
+ jiffies + usecs_to_jiffies(time_inc));
+ }
+ spin_unlock(&dev->spinlock);
}
underrun:
@@ -517,9 +533,12 @@ static int waveform_ao_inttrig_start(struct comedi_device *dev,
async->inttrig = NULL;
devpriv->ao_last_scan_time = ktime_to_us(ktime_get());
+ spin_lock_bh(&dev->spinlock);
+ devpriv->ao_timer_enable = true;
devpriv->ao_timer.expires =
jiffies + usecs_to_jiffies(devpriv->ao_scan_period);
add_timer(&devpriv->ao_timer);
+ spin_unlock_bh(&dev->spinlock);
return 1;
}
@@ -604,6 +623,9 @@ static int waveform_ao_cancel(struct comedi_device *dev,
struct waveform_private *devpriv = dev->private;
s->async->inttrig = NULL;
+ spin_lock_bh(&dev->spinlock);
+ devpriv->ao_timer_enable = false;
+ spin_unlock_bh(&dev->spinlock);
if (in_softirq()) {
/* Assume we were called from the timer routine itself. */
del_timer(&devpriv->ao_timer);
diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 3d5e6d7..44b19e69 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -108,9 +108,8 @@ static inline void send_msg(struct cn_msg *msg)
filter_data[1] = 0;
}
- if (cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
- cn_filter, (void *)filter_data) == -ESRCH)
- atomic_set(&proc_event_num_listeners, 0);
+ cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
+ cn_filter, (void *)filter_data);
local_unlock(&local_event.lock);
}
diff --git a/drivers/counter/counter-core.c b/drivers/counter/counter-core.c
index 09c77af..3f24481 100644
--- a/drivers/counter/counter-core.c
+++ b/drivers/counter/counter-core.c
@@ -31,10 +31,11 @@ struct counter_device_allochelper {
struct counter_device counter;
/*
- * This is cache line aligned to ensure private data behaves like if it
- * were kmalloced separately.
+ * This ensures private data behaves like if it were kmalloced
+ * separately. Also ensures the minimum alignment for safe DMA
+ * operations (which may or may not mean cache alignment).
*/
- unsigned long privdata[] ____cacheline_aligned;
+ unsigned long privdata[] __aligned(ARCH_DMA_MINALIGN);
};
static void counter_device_release(struct device *dev)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index ca94e60..7961922 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2987,6 +2987,9 @@ static void intel_cpufreq_adjust_perf(unsigned int cpunum,
if (min_pstate < cpu->min_perf_ratio)
min_pstate = cpu->min_perf_ratio;
+ if (min_pstate > cpu->max_perf_ratio)
+ min_pstate = cpu->max_perf_ratio;
+
max_pstate = min(cap_pstate, cpu->max_perf_ratio);
if (max_pstate < min_pstate)
max_pstate = min_pstate;
diff --git a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c b/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
index 1262a77..de50c00 100644
--- a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
+++ b/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
@@ -299,22 +299,6 @@ static int sun8i_ce_cipher_prepare(struct crypto_engine *engine, void *async_req
return err;
}
-static void sun8i_ce_cipher_run(struct crypto_engine *engine, void *areq)
-{
- struct skcipher_request *breq = container_of(areq, struct skcipher_request, base);
- struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(breq);
- struct sun8i_cipher_tfm_ctx *op = crypto_skcipher_ctx(tfm);
- struct sun8i_ce_dev *ce = op->ce;
- struct sun8i_cipher_req_ctx *rctx = skcipher_request_ctx(breq);
- int flow, err;
-
- flow = rctx->flow;
- err = sun8i_ce_run_task(ce, flow, crypto_tfm_alg_name(breq->base.tfm));
- local_bh_disable();
- crypto_finalize_skcipher_request(engine, breq, err);
- local_bh_enable();
-}
-
static void sun8i_ce_cipher_unprepare(struct crypto_engine *engine,
void *async_req)
{
@@ -360,6 +344,23 @@ static void sun8i_ce_cipher_unprepare(struct crypto_engine *engine,
dma_unmap_single(ce->dev, rctx->addr_key, op->keylen, DMA_TO_DEVICE);
}
+static void sun8i_ce_cipher_run(struct crypto_engine *engine, void *areq)
+{
+ struct skcipher_request *breq = container_of(areq, struct skcipher_request, base);
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(breq);
+ struct sun8i_cipher_tfm_ctx *op = crypto_skcipher_ctx(tfm);
+ struct sun8i_ce_dev *ce = op->ce;
+ struct sun8i_cipher_req_ctx *rctx = skcipher_request_ctx(breq);
+ int flow, err;
+
+ flow = rctx->flow;
+ err = sun8i_ce_run_task(ce, flow, crypto_tfm_alg_name(breq->base.tfm));
+ sun8i_ce_cipher_unprepare(engine, areq);
+ local_bh_disable();
+ crypto_finalize_skcipher_request(engine, breq, err);
+ local_bh_enable();
+}
+
int sun8i_ce_cipher_do_one(struct crypto_engine *engine, void *areq)
{
int err = sun8i_ce_cipher_prepare(engine, areq);
@@ -368,7 +369,6 @@ int sun8i_ce_cipher_do_one(struct crypto_engine *engine, void *areq)
return err;
sun8i_ce_cipher_run(engine, areq);
- sun8i_ce_cipher_unprepare(engine, areq);
return 0;
}
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index e4d3f45..b04bc1d 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -534,10 +534,16 @@ EXPORT_SYMBOL_GPL(sev_platform_init);
static int __sev_platform_shutdown_locked(int *error)
{
- struct sev_device *sev = psp_master->sev_data;
+ struct psp_device *psp = psp_master;
+ struct sev_device *sev;
int ret;
- if (!sev || sev->state == SEV_STATE_UNINIT)
+ if (!psp || !psp->sev_data)
+ return 0;
+
+ sev = psp->sev_data;
+
+ if (sev->state == SEV_STATE_UNINIT)
return 0;
ret = __sev_do_cmd_locked(SEV_CMD_SHUTDOWN, NULL, error);
diff --git a/drivers/crypto/rockchip/rk3288_crypto_ahash.c b/drivers/crypto/rockchip/rk3288_crypto_ahash.c
index 1b13b4a..a235e6c3 100644
--- a/drivers/crypto/rockchip/rk3288_crypto_ahash.c
+++ b/drivers/crypto/rockchip/rk3288_crypto_ahash.c
@@ -332,12 +332,12 @@ static int rk_hash_run(struct crypto_engine *engine, void *breq)
theend:
pm_runtime_put_autosuspend(rkc->dev);
+ rk_hash_unprepare(engine, breq);
+
local_bh_disable();
crypto_finalize_hash_request(engine, breq, err);
local_bh_enable();
- rk_hash_unprepare(engine, breq);
-
return 0;
}
diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
index 2621ff8..de53edd 100644
--- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
@@ -104,7 +104,8 @@ static void virtio_crypto_dataq_akcipher_callback(struct virtio_crypto_request *
}
static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher_ctx *ctx,
- struct virtio_crypto_ctrl_header *header, void *para,
+ struct virtio_crypto_ctrl_header *header,
+ struct virtio_crypto_akcipher_session_para *para,
const uint8_t *key, unsigned int keylen)
{
struct scatterlist outhdr_sg, key_sg, inhdr_sg, *sgs[3];
@@ -128,7 +129,7 @@ static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
ctrl = &vc_ctrl_req->ctrl;
memcpy(&ctrl->header, header, sizeof(ctrl->header));
- memcpy(&ctrl->u, para, sizeof(ctrl->u));
+ memcpy(&ctrl->u.akcipher_create_session.para, para, sizeof(*para));
input = &vc_ctrl_req->input;
input->status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index dcf2b39..1a3e6aa 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -316,31 +316,27 @@ static const struct cxl_root_ops acpi_root_ops = {
.qos_class = cxl_acpi_qos_class,
};
-static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
- const unsigned long end)
+static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
+ struct cxl_cfmws_context *ctx)
{
int target_map[CXL_DECODER_MAX_INTERLEAVE];
- struct cxl_cfmws_context *ctx = arg;
struct cxl_port *root_port = ctx->root_port;
struct resource *cxl_res = ctx->cxl_res;
struct cxl_cxims_context cxims_ctx;
struct cxl_root_decoder *cxlrd;
struct device *dev = ctx->dev;
- struct acpi_cedt_cfmws *cfmws;
cxl_calc_hb_fn cxl_calc_hb;
struct cxl_decoder *cxld;
unsigned int ways, i, ig;
struct resource *res;
int rc;
- cfmws = (struct acpi_cedt_cfmws *) header;
-
rc = cxl_acpi_cfmws_verify(dev, cfmws);
if (rc) {
dev_err(dev, "CFMWS range %#llx-%#llx not registered\n",
cfmws->base_hpa,
cfmws->base_hpa + cfmws->window_size - 1);
- return 0;
+ return rc;
}
rc = eiw_to_ways(cfmws->interleave_ways, &ways);
@@ -376,7 +372,7 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
cxlrd = cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
if (IS_ERR(cxlrd))
- return 0;
+ return PTR_ERR(cxlrd);
cxld = &cxlrd->cxlsd.cxld;
cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
@@ -420,16 +416,7 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
put_device(&cxld->dev);
else
rc = cxl_decoder_autoremove(dev, cxld);
- if (rc) {
- dev_err(dev, "Failed to add decode range: %pr", res);
- return rc;
- }
- dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
- dev_name(&cxld->dev),
- phys_to_target_node(cxld->hpa_range.start),
- cxld->hpa_range.start, cxld->hpa_range.end);
-
- return 0;
+ return rc;
err_insert:
kfree(res->name);
@@ -438,6 +425,29 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
return -ENOMEM;
}
+static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
+ const unsigned long end)
+{
+ struct acpi_cedt_cfmws *cfmws = (struct acpi_cedt_cfmws *)header;
+ struct cxl_cfmws_context *ctx = arg;
+ struct device *dev = ctx->dev;
+ int rc;
+
+ rc = __cxl_parse_cfmws(cfmws, ctx);
+ if (rc)
+ dev_err(dev,
+ "Failed to add decode range: [%#llx - %#llx] (%d)\n",
+ cfmws->base_hpa,
+ cfmws->base_hpa + cfmws->window_size - 1, rc);
+ else
+ dev_dbg(dev, "decode range: node: %d range [%#llx - %#llx]\n",
+ phys_to_target_node(cfmws->base_hpa), cfmws->base_hpa,
+ cfmws->base_hpa + cfmws->window_size - 1);
+
+ /* never fail cxl_acpi load for a single window failure */
+ return 0;
+}
+
__mock struct acpi_device *to_cxl_host_bridge(struct device *host,
struct device *dev)
{
diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index 6fe1154..08fd0ba 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -210,19 +210,12 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port,
return 0;
}
-static void add_perf_entry(struct device *dev, struct dsmas_entry *dent,
- struct list_head *list)
+static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
+ struct cxl_dpa_perf *dpa_perf)
{
- struct cxl_dpa_perf *dpa_perf;
-
- dpa_perf = kzalloc(sizeof(*dpa_perf), GFP_KERNEL);
- if (!dpa_perf)
- return;
-
dpa_perf->dpa_range = dent->dpa_range;
dpa_perf->coord = dent->coord;
dpa_perf->qos_class = dent->qos_class;
- list_add_tail(&dpa_perf->list, list);
dev_dbg(dev,
"DSMAS: dpa: %#llx qos: %d read_bw: %d write_bw %d read_lat: %d write_lat: %d\n",
dent->dpa_range.start, dpa_perf->qos_class,
@@ -230,20 +223,6 @@ static void add_perf_entry(struct device *dev, struct dsmas_entry *dent,
dent->coord.read_latency, dent->coord.write_latency);
}
-static void free_perf_ents(void *data)
-{
- struct cxl_memdev_state *mds = data;
- struct cxl_dpa_perf *dpa_perf, *n;
- LIST_HEAD(discard);
-
- list_splice_tail_init(&mds->ram_perf_list, &discard);
- list_splice_tail_init(&mds->pmem_perf_list, &discard);
- list_for_each_entry_safe(dpa_perf, n, &discard, list) {
- list_del(&dpa_perf->list);
- kfree(dpa_perf);
- }
-}
-
static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
struct xarray *dsmas_xa)
{
@@ -263,16 +242,14 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
xa_for_each(dsmas_xa, index, dent) {
if (resource_size(&cxlds->ram_res) &&
range_contains(&ram_range, &dent->dpa_range))
- add_perf_entry(dev, dent, &mds->ram_perf_list);
+ update_perf_entry(dev, dent, &mds->ram_perf);
else if (resource_size(&cxlds->pmem_res) &&
range_contains(&pmem_range, &dent->dpa_range))
- add_perf_entry(dev, dent, &mds->pmem_perf_list);
+ update_perf_entry(dev, dent, &mds->pmem_perf);
else
dev_dbg(dev, "no partition for dsmas dpa: %#llx\n",
dent->dpa_range.start);
}
-
- devm_add_action_or_reset(&cxlds->cxlmd->dev, free_perf_ents, mds);
}
static int match_cxlrd_qos_class(struct device *dev, void *data)
@@ -293,24 +270,24 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
return 0;
}
-static void cxl_qos_match(struct cxl_port *root_port,
- struct list_head *work_list,
- struct list_head *discard_list)
+static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
{
- struct cxl_dpa_perf *dpa_perf, *n;
+ *dpa_perf = (struct cxl_dpa_perf) {
+ .qos_class = CXL_QOS_CLASS_INVALID,
+ };
+}
- list_for_each_entry_safe(dpa_perf, n, work_list, list) {
- int rc;
+static bool cxl_qos_match(struct cxl_port *root_port,
+ struct cxl_dpa_perf *dpa_perf)
+{
+ if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
+ return false;
- if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
- return;
+ if (!device_for_each_child(&root_port->dev, &dpa_perf->qos_class,
+ match_cxlrd_qos_class))
+ return false;
- rc = device_for_each_child(&root_port->dev,
- (void *)&dpa_perf->qos_class,
- match_cxlrd_qos_class);
- if (!rc)
- list_move_tail(&dpa_perf->list, discard_list);
- }
+ return true;
}
static int match_cxlrd_hb(struct device *dev, void *data)
@@ -334,23 +311,10 @@ static int match_cxlrd_hb(struct device *dev, void *data)
return 0;
}
-static void discard_dpa_perf(struct list_head *list)
-{
- struct cxl_dpa_perf *dpa_perf, *n;
-
- list_for_each_entry_safe(dpa_perf, n, list, list) {
- list_del(&dpa_perf->list);
- kfree(dpa_perf);
- }
-}
-DEFINE_FREE(dpa_perf, struct list_head *, if (!list_empty(_T)) discard_dpa_perf(_T))
-
static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
{
struct cxl_dev_state *cxlds = cxlmd->cxlds;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
- LIST_HEAD(__discard);
- struct list_head *discard __free(dpa_perf) = &__discard;
struct cxl_port *root_port;
int rc;
@@ -363,16 +327,17 @@ static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
root_port = &cxl_root->port;
/* Check that the QTG IDs are all sane between end device and root decoders */
- cxl_qos_match(root_port, &mds->ram_perf_list, discard);
- cxl_qos_match(root_port, &mds->pmem_perf_list, discard);
+ if (!cxl_qos_match(root_port, &mds->ram_perf))
+ reset_dpa_perf(&mds->ram_perf);
+ if (!cxl_qos_match(root_port, &mds->pmem_perf))
+ reset_dpa_perf(&mds->pmem_perf);
/* Check to make sure that the device's host bridge is under a root decoder */
rc = device_for_each_child(&root_port->dev,
- (void *)cxlmd->endpoint->host_bridge,
- match_cxlrd_hb);
+ cxlmd->endpoint->host_bridge, match_cxlrd_hb);
if (!rc) {
- list_splice_tail_init(&mds->ram_perf_list, discard);
- list_splice_tail_init(&mds->pmem_perf_list, discard);
+ reset_dpa_perf(&mds->ram_perf);
+ reset_dpa_perf(&mds->pmem_perf);
}
return rc;
@@ -417,6 +382,7 @@ void cxl_endpoint_parse_cdat(struct cxl_port *port)
cxl_memdev_set_qos_class(cxlds, dsmas_xa);
cxl_qos_class_verify(cxlmd);
+ cxl_memdev_update_perf(cxlmd);
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_parse_cdat, CXL);
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 27166a4..9adda47 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1391,8 +1391,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mds->cxlds.reg_map.host = dev;
mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
- INIT_LIST_HEAD(&mds->ram_perf_list);
- INIT_LIST_HEAD(&mds->pmem_perf_list);
+ mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
+ mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
return mds;
}
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index dae8802..d4e259f 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -447,13 +447,41 @@ static struct attribute *cxl_memdev_attributes[] = {
NULL,
};
+static ssize_t pmem_qos_class_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+
+ return sysfs_emit(buf, "%d\n", mds->pmem_perf.qos_class);
+}
+
+static struct device_attribute dev_attr_pmem_qos_class =
+ __ATTR(qos_class, 0444, pmem_qos_class_show, NULL);
+
static struct attribute *cxl_memdev_pmem_attributes[] = {
&dev_attr_pmem_size.attr,
+ &dev_attr_pmem_qos_class.attr,
NULL,
};
+static ssize_t ram_qos_class_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+
+ return sysfs_emit(buf, "%d\n", mds->ram_perf.qos_class);
+}
+
+static struct device_attribute dev_attr_ram_qos_class =
+ __ATTR(qos_class, 0444, ram_qos_class_show, NULL);
+
static struct attribute *cxl_memdev_ram_attributes[] = {
&dev_attr_ram_size.attr,
+ &dev_attr_ram_qos_class.attr,
NULL,
};
@@ -477,14 +505,42 @@ static struct attribute_group cxl_memdev_attribute_group = {
.is_visible = cxl_memdev_visible,
};
+static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+
+ if (a == &dev_attr_ram_qos_class.attr)
+ if (mds->ram_perf.qos_class == CXL_QOS_CLASS_INVALID)
+ return 0;
+
+ return a->mode;
+}
+
static struct attribute_group cxl_memdev_ram_attribute_group = {
.name = "ram",
.attrs = cxl_memdev_ram_attributes,
+ .is_visible = cxl_ram_visible,
};
+static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ struct device *dev = kobj_to_dev(kobj);
+ struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+
+ if (a == &dev_attr_pmem_qos_class.attr)
+ if (mds->pmem_perf.qos_class == CXL_QOS_CLASS_INVALID)
+ return 0;
+
+ return a->mode;
+}
+
static struct attribute_group cxl_memdev_pmem_attribute_group = {
.name = "pmem",
.attrs = cxl_memdev_pmem_attributes,
+ .is_visible = cxl_pmem_visible,
};
static umode_t cxl_memdev_security_visible(struct kobject *kobj,
@@ -519,6 +575,13 @@ static const struct attribute_group *cxl_memdev_attribute_groups[] = {
NULL,
};
+void cxl_memdev_update_perf(struct cxl_memdev *cxlmd)
+{
+ sysfs_update_group(&cxlmd->dev.kobj, &cxl_memdev_ram_attribute_group);
+ sysfs_update_group(&cxlmd->dev.kobj, &cxl_memdev_pmem_attribute_group);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_memdev_update_perf, CXL);
+
static const struct device_type cxl_memdev_type = {
.name = "cxl_memdev",
.release = cxl_memdev_release,
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 6c9c8d9..e9e6c81 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -477,9 +477,9 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
allowed++;
}
- if (!allowed) {
- cxl_set_mem_enable(cxlds, 0);
- info->mem_enabled = 0;
+ if (!allowed && info->mem_enabled) {
+ dev_err(dev, "Range register decodes outside platform defined CXL ranges.\n");
+ return -ENXIO;
}
/*
@@ -932,11 +932,21 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
void cxl_cor_error_detected(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+ struct device *dev = &cxlds->cxlmd->dev;
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
+ scoped_guard(device, dev) {
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return;
+ }
- cxl_handle_endpoint_cor_ras(cxlds);
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+
+ cxl_handle_endpoint_cor_ras(cxlds);
+ }
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, CXL);
@@ -948,16 +958,25 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
struct device *dev = &cxlmd->dev;
bool ue;
- if (cxlds->rcd)
- cxl_handle_rdport_errors(cxlds);
+ scoped_guard(device, dev) {
+ if (!dev->driver) {
+ dev_warn(&pdev->dev,
+ "%s: memdev disabled, abort error handling\n",
+ dev_name(dev));
+ return PCI_ERS_RESULT_DISCONNECT;
+ }
- /*
- * A frozen channel indicates an impending reset which is fatal to
- * CXL.mem operation, and will likely crash the system. On the off
- * chance the situation is recoverable dump the status of the RAS
- * capability registers and bounce the active state of the memdev.
- */
- ue = cxl_handle_endpoint_ras(cxlds);
+ if (cxlds->rcd)
+ cxl_handle_rdport_errors(cxlds);
+ /*
+ * A frozen channel indicates an impending reset which is fatal to
+ * CXL.mem operation, and will likely crash the system. On the off
+ * chance the situation is recoverable dump the status of the RAS
+ * capability registers and bounce the active state of the memdev.
+ */
+ ue = cxl_handle_endpoint_ras(cxlds);
+ }
+
switch (state) {
case pci_channel_io_normal:
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ce0e2d8..4c7fd2d 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -730,12 +730,17 @@ static int match_auto_decoder(struct device *dev, void *data)
return 0;
}
-static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
- struct cxl_region *cxlr)
+static struct cxl_decoder *
+cxl_region_find_decoder(struct cxl_port *port,
+ struct cxl_endpoint_decoder *cxled,
+ struct cxl_region *cxlr)
{
struct device *dev;
int id = 0;
+ if (port == cxled_to_port(cxled))
+ return &cxled->cxld;
+
if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
dev = device_find_child(&port->dev, &cxlr->params,
match_auto_decoder);
@@ -753,8 +758,31 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
return to_cxl_decoder(dev);
}
-static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
- struct cxl_region *cxlr)
+static bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_iter,
+ struct cxl_decoder *cxld)
+{
+ struct cxl_region_ref *rr = cxl_rr_load(port, cxlr_iter);
+ struct cxl_decoder *cxld_iter = rr->decoder;
+
+ /*
+ * Allow the out of order assembly of auto-discovered regions.
+ * Per CXL Spec 3.1 8.2.4.20.12 software must commit decoders
+ * in HPA order. Confirm that the decoder with the lesser HPA
+ * starting address has the lesser id.
+ */
+ dev_dbg(&cxld->dev, "check for HPA violation %s:%d < %s:%d\n",
+ dev_name(&cxld->dev), cxld->id,
+ dev_name(&cxld_iter->dev), cxld_iter->id);
+
+ if (cxld_iter->id > cxld->id)
+ return true;
+
+ return false;
+}
+
+static struct cxl_region_ref *
+alloc_region_ref(struct cxl_port *port, struct cxl_region *cxlr,
+ struct cxl_endpoint_decoder *cxled)
{
struct cxl_region_params *p = &cxlr->params;
struct cxl_region_ref *cxl_rr, *iter;
@@ -764,16 +792,21 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
xa_for_each(&port->regions, index, iter) {
struct cxl_region_params *ip = &iter->region->params;
- if (!ip->res)
+ if (!ip->res || ip->res->start < p->res->start)
continue;
- if (ip->res->start > p->res->start) {
- dev_dbg(&cxlr->dev,
- "%s: HPA order violation %s:%pr vs %pr\n",
- dev_name(&port->dev),
- dev_name(&iter->region->dev), ip->res, p->res);
- return ERR_PTR(-EBUSY);
+ if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
+ struct cxl_decoder *cxld;
+
+ cxld = cxl_region_find_decoder(port, cxled, cxlr);
+ if (auto_order_ok(port, iter->region, cxld))
+ continue;
}
+ dev_dbg(&cxlr->dev, "%s: HPA order violation %s:%pr vs %pr\n",
+ dev_name(&port->dev),
+ dev_name(&iter->region->dev), ip->res, p->res);
+
+ return ERR_PTR(-EBUSY);
}
cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL);
@@ -853,10 +886,7 @@ static int cxl_rr_alloc_decoder(struct cxl_port *port, struct cxl_region *cxlr,
{
struct cxl_decoder *cxld;
- if (port == cxled_to_port(cxled))
- cxld = &cxled->cxld;
- else
- cxld = cxl_region_find_decoder(port, cxlr);
+ cxld = cxl_region_find_decoder(port, cxled, cxlr);
if (!cxld) {
dev_dbg(&cxlr->dev, "%s: no decoder available\n",
dev_name(&port->dev));
@@ -953,7 +983,7 @@ static int cxl_port_attach_region(struct cxl_port *port,
nr_targets_inc = true;
}
} else {
- cxl_rr = alloc_region_ref(port, cxlr);
+ cxl_rr = alloc_region_ref(port, cxlr, cxled);
if (IS_ERR(cxl_rr)) {
dev_dbg(&cxlr->dev,
"%s: failed to allocate region reference\n",
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index 8944543..bdf117a 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -338,7 +338,7 @@ TRACE_EVENT(cxl_general_media,
TP_fast_assign(
CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
- memcpy(&__entry->hdr_uuid, &CXL_EVENT_GEN_MEDIA_UUID, sizeof(uuid_t));
+ __entry->hdr_uuid = CXL_EVENT_GEN_MEDIA_UUID;
/* General Media */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -425,7 +425,7 @@ TRACE_EVENT(cxl_dram,
TP_fast_assign(
CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
- memcpy(&__entry->hdr_uuid, &CXL_EVENT_DRAM_UUID, sizeof(uuid_t));
+ __entry->hdr_uuid = CXL_EVENT_DRAM_UUID;
/* DRAM */
__entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -573,7 +573,7 @@ TRACE_EVENT(cxl_memory_module,
TP_fast_assign(
CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
- memcpy(&__entry->hdr_uuid, &CXL_EVENT_MEM_MODULE_UUID, sizeof(uuid_t));
+ __entry->hdr_uuid = CXL_EVENT_MEM_MODULE_UUID;
/* Memory Module Event */
__entry->event_type = rec->event_type;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b6017c0..003feeb 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -880,6 +880,8 @@ void cxl_switch_parse_cdat(struct cxl_port *port);
int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
struct access_coordinate *coord);
+void cxl_memdev_update_perf(struct cxl_memdev *cxlmd);
+
/*
* Unit test builds overrides this to __weak, find the 'strong' version
* of these symbols in tools/testing/cxl/.
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5303d69..20fb3b3 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -395,13 +395,11 @@ enum cxl_devtype {
/**
* struct cxl_dpa_perf - DPA performance property entry
- * @list - list entry
* @dpa_range - range for DPA address
* @coord - QoS performance data (i.e. latency, bandwidth)
* @qos_class - QoS Class cookies
*/
struct cxl_dpa_perf {
- struct list_head list;
struct range dpa_range;
struct access_coordinate coord;
int qos_class;
@@ -471,8 +469,8 @@ struct cxl_dev_state {
* @security: security driver state info
* @fw: firmware upload / activation state
* @mbox_send: @dev specific transport for transmitting mailbox commands
- * @ram_perf_list: performance data entries matched to RAM
- * @pmem_perf_list: performance data entries matched to PMEM
+ * @ram_perf: performance data entry matched to RAM partition
+ * @pmem_perf: performance data entry matched to PMEM partition
*
* See CXL 3.0 8.2.9.8.2 Capacity Configuration and Label Storage for
* details on capacity parameters.
@@ -494,8 +492,8 @@ struct cxl_memdev_state {
u64 next_volatile_bytes;
u64 next_persistent_bytes;
- struct list_head ram_perf_list;
- struct list_head pmem_perf_list;
+ struct cxl_dpa_perf ram_perf;
+ struct cxl_dpa_perf pmem_perf;
struct cxl_event_state event;
struct cxl_poison_state poison;
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index c5c9d8e..0c79d9c 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -215,52 +215,6 @@ static ssize_t trigger_poison_list_store(struct device *dev,
}
static DEVICE_ATTR_WO(trigger_poison_list);
-static ssize_t ram_qos_class_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
- struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
- struct cxl_dpa_perf *dpa_perf;
-
- if (!dev->driver)
- return -ENOENT;
-
- if (list_empty(&mds->ram_perf_list))
- return -ENOENT;
-
- dpa_perf = list_first_entry(&mds->ram_perf_list, struct cxl_dpa_perf,
- list);
-
- return sysfs_emit(buf, "%d\n", dpa_perf->qos_class);
-}
-
-static struct device_attribute dev_attr_ram_qos_class =
- __ATTR(qos_class, 0444, ram_qos_class_show, NULL);
-
-static ssize_t pmem_qos_class_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
- struct cxl_dev_state *cxlds = cxlmd->cxlds;
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
- struct cxl_dpa_perf *dpa_perf;
-
- if (!dev->driver)
- return -ENOENT;
-
- if (list_empty(&mds->pmem_perf_list))
- return -ENOENT;
-
- dpa_perf = list_first_entry(&mds->pmem_perf_list, struct cxl_dpa_perf,
- list);
-
- return sysfs_emit(buf, "%d\n", dpa_perf->qos_class);
-}
-
-static struct device_attribute dev_attr_pmem_qos_class =
- __ATTR(qos_class, 0444, pmem_qos_class_show, NULL);
-
static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
@@ -272,21 +226,11 @@ static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
mds->poison.enabled_cmds))
return 0;
- if (a == &dev_attr_pmem_qos_class.attr)
- if (list_empty(&mds->pmem_perf_list))
- return 0;
-
- if (a == &dev_attr_ram_qos_class.attr)
- if (list_empty(&mds->ram_perf_list))
- return 0;
-
return a->mode;
}
static struct attribute *cxl_mem_attrs[] = {
&dev_attr_trigger_poison_list.attr,
- &dev_attr_ram_qos_class.attr,
- &dev_attr_pmem_qos_class.attr,
NULL
};
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 233e7c4..2ff361e 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -974,61 +974,6 @@ static struct pci_driver cxl_pci_driver = {
},
};
-#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
-static void cxl_cper_event_call(enum cxl_event_type ev_type,
- struct cxl_cper_event_rec *rec)
-{
- struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
- struct pci_dev *pdev __free(pci_dev_put) = NULL;
- enum cxl_event_log_type log_type;
- struct cxl_dev_state *cxlds;
- unsigned int devfn;
- u32 hdr_flags;
-
- devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
- pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
- device_id->bus_num, devfn);
- if (!pdev)
- return;
-
- guard(pci_dev)(pdev);
- if (pdev->driver != &cxl_pci_driver)
- return;
-
- cxlds = pci_get_drvdata(pdev);
- if (!cxlds)
- return;
-
- /* Fabricate a log type */
- hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
- log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
-
- cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
- &uuid_null, &rec->event);
-}
-
-static int __init cxl_pci_driver_init(void)
-{
- int rc;
-
- rc = cxl_cper_register_callback(cxl_cper_event_call);
- if (rc)
- return rc;
-
- rc = pci_register_driver(&cxl_pci_driver);
- if (rc)
- cxl_cper_unregister_callback(cxl_cper_event_call);
-
- return rc;
-}
-
-static void __exit cxl_pci_driver_exit(void)
-{
- pci_unregister_driver(&cxl_pci_driver);
- cxl_cper_unregister_callback(cxl_cper_event_call);
-}
-
-module_init(cxl_pci_driver_init);
-module_exit(cxl_pci_driver_exit);
+module_pci_driver(cxl_pci_driver);
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);
diff --git a/drivers/dma/dw-edma/dw-edma-v0-core.c b/drivers/dma/dw-edma/dw-edma-v0-core.c
index b38786f..b75fdaf 100644
--- a/drivers/dma/dw-edma/dw-edma-v0-core.c
+++ b/drivers/dma/dw-edma/dw-edma-v0-core.c
@@ -346,6 +346,20 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
dw_edma_v0_write_ll_link(chunk, i, control, chunk->ll_region.paddr);
}
+static void dw_edma_v0_sync_ll_data(struct dw_edma_chunk *chunk)
+{
+ /*
+ * In case of remote eDMA engine setup, the DW PCIe RP/EP internal
+ * configuration registers and application memory are normally accessed
+ * over different buses. Ensure LL-data reaches the memory before the
+ * doorbell register is toggled by issuing the dummy-read from the remote
+ * LL memory in a hope that the MRd TLP will return only after the
+ * last MWr TLP is completed
+ */
+ if (!(chunk->chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+ readl(chunk->ll_region.vaddr.io);
+}
+
static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
{
struct dw_edma_chan *chan = chunk->chan;
@@ -412,6 +426,9 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
SET_CH_32(dw, chan->dir, chan->id, llp.msb,
upper_32_bits(chunk->ll_region.paddr));
}
+
+ dw_edma_v0_sync_ll_data(chunk);
+
/* Doorbell */
SET_RW_32(dw, chan->dir, doorbell,
FIELD_PREP(EDMA_V0_DOORBELL_CH_MASK, chan->id));
diff --git a/drivers/dma/dw-edma/dw-hdma-v0-core.c b/drivers/dma/dw-edma/dw-hdma-v0-core.c
index 00b735a..10e8f07 100644
--- a/drivers/dma/dw-edma/dw-hdma-v0-core.c
+++ b/drivers/dma/dw-edma/dw-hdma-v0-core.c
@@ -65,18 +65,12 @@ static void dw_hdma_v0_core_off(struct dw_edma *dw)
static u16 dw_hdma_v0_core_ch_count(struct dw_edma *dw, enum dw_edma_dir dir)
{
- u32 num_ch = 0;
- int id;
-
- for (id = 0; id < HDMA_V0_MAX_NR_CH; id++) {
- if (GET_CH_32(dw, id, dir, ch_en) & BIT(0))
- num_ch++;
- }
-
- if (num_ch > HDMA_V0_MAX_NR_CH)
- num_ch = HDMA_V0_MAX_NR_CH;
-
- return (u16)num_ch;
+ /*
+ * The HDMA IP have no way to know the number of hardware channels
+ * available, we set it to maximum channels and let the platform
+ * set the right number of channels.
+ */
+ return HDMA_V0_MAX_NR_CH;
}
static enum dma_status dw_hdma_v0_core_ch_status(struct dw_edma_chan *chan)
@@ -228,6 +222,20 @@ static void dw_hdma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
dw_hdma_v0_write_ll_link(chunk, i, control, chunk->ll_region.paddr);
}
+static void dw_hdma_v0_sync_ll_data(struct dw_edma_chunk *chunk)
+{
+ /*
+ * In case of remote HDMA engine setup, the DW PCIe RP/EP internal
+ * configuration registers and application memory are normally accessed
+ * over different buses. Ensure LL-data reaches the memory before the
+ * doorbell register is toggled by issuing the dummy-read from the remote
+ * LL memory in a hope that the MRd TLP will return only after the
+ * last MWr TLP is completed
+ */
+ if (!(chunk->chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+ readl(chunk->ll_region.vaddr.io);
+}
+
static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
{
struct dw_edma_chan *chan = chunk->chan;
@@ -242,7 +250,9 @@ static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
/* Interrupt enable&unmask - done, abort */
tmp = GET_CH_32(dw, chan->dir, chan->id, int_setup) |
HDMA_V0_STOP_INT_MASK | HDMA_V0_ABORT_INT_MASK |
- HDMA_V0_LOCAL_STOP_INT_EN | HDMA_V0_LOCAL_STOP_INT_EN;
+ HDMA_V0_LOCAL_STOP_INT_EN | HDMA_V0_LOCAL_ABORT_INT_EN;
+ if (!(dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+ tmp |= HDMA_V0_REMOTE_STOP_INT_EN | HDMA_V0_REMOTE_ABORT_INT_EN;
SET_CH_32(dw, chan->dir, chan->id, int_setup, tmp);
/* Channel control */
SET_CH_32(dw, chan->dir, chan->id, control1, HDMA_V0_LINKLIST_EN);
@@ -256,6 +266,9 @@ static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
/* Set consumer cycle */
SET_CH_32(dw, chan->dir, chan->id, cycle_sync,
HDMA_V0_CONSUMER_CYCLE_STAT | HDMA_V0_CONSUMER_CYCLE_BIT);
+
+ dw_hdma_v0_sync_ll_data(chunk);
+
/* Doorbell */
SET_CH_32(dw, chan->dir, chan->id, doorbell, HDMA_V0_DOORBELL_START);
}
diff --git a/drivers/dma/dw-edma/dw-hdma-v0-regs.h b/drivers/dma/dw-edma/dw-hdma-v0-regs.h
index a974abd..eab5fd7 100644
--- a/drivers/dma/dw-edma/dw-hdma-v0-regs.h
+++ b/drivers/dma/dw-edma/dw-hdma-v0-regs.h
@@ -15,7 +15,7 @@
#define HDMA_V0_LOCAL_ABORT_INT_EN BIT(6)
#define HDMA_V0_REMOTE_ABORT_INT_EN BIT(5)
#define HDMA_V0_LOCAL_STOP_INT_EN BIT(4)
-#define HDMA_V0_REMOTEL_STOP_INT_EN BIT(3)
+#define HDMA_V0_REMOTE_STOP_INT_EN BIT(3)
#define HDMA_V0_ABORT_INT_MASK BIT(2)
#define HDMA_V0_STOP_INT_MASK BIT(0)
#define HDMA_V0_LINKLIST_EN BIT(0)
diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
index b53f462..793f1a7 100644
--- a/drivers/dma/fsl-edma-common.c
+++ b/drivers/dma/fsl-edma-common.c
@@ -503,7 +503,7 @@ void fsl_edma_fill_tcd(struct fsl_edma_chan *fsl_chan,
if (fsl_chan->is_multi_fifo) {
/* set mloff to support multiple fifo */
burst = cfg->direction == DMA_DEV_TO_MEM ?
- cfg->src_addr_width : cfg->dst_addr_width;
+ cfg->src_maxburst : cfg->dst_maxburst;
nbytes |= EDMA_V3_TCD_NBYTES_MLOFF(-(burst * 4));
/* enable DMLOE/SMLOE */
if (cfg->direction == DMA_MEM_TO_DEV) {
diff --git a/drivers/dma/fsl-edma-common.h b/drivers/dma/fsl-edma-common.h
index bb52211..f5e216b 100644
--- a/drivers/dma/fsl-edma-common.h
+++ b/drivers/dma/fsl-edma-common.h
@@ -30,8 +30,9 @@
#define EDMA_TCD_ATTR_SSIZE(x) (((x) & GENMASK(2, 0)) << 8)
#define EDMA_TCD_ATTR_SMOD(x) (((x) & GENMASK(4, 0)) << 11)
-#define EDMA_TCD_CITER_CITER(x) ((x) & GENMASK(14, 0))
-#define EDMA_TCD_BITER_BITER(x) ((x) & GENMASK(14, 0))
+#define EDMA_TCD_ITER_MASK GENMASK(14, 0)
+#define EDMA_TCD_CITER_CITER(x) ((x) & EDMA_TCD_ITER_MASK)
+#define EDMA_TCD_BITER_BITER(x) ((x) & EDMA_TCD_ITER_MASK)
#define EDMA_TCD_CSR_START BIT(0)
#define EDMA_TCD_CSR_INT_MAJOR BIT(1)
diff --git a/drivers/dma/fsl-edma-main.c b/drivers/dma/fsl-edma-main.c
index 45cc419..d36e28b 100644
--- a/drivers/dma/fsl-edma-main.c
+++ b/drivers/dma/fsl-edma-main.c
@@ -10,6 +10,7 @@
*/
#include <dt-bindings/dma/fsl-edma.h>
+#include <linux/bitfield.h>
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/clk.h>
@@ -582,7 +583,8 @@ static int fsl_edma_probe(struct platform_device *pdev)
DMAENGINE_ALIGN_32_BYTES;
/* Per worst case 'nbytes = 1' take CITER as the max_seg_size */
- dma_set_max_seg_size(fsl_edma->dma_dev.dev, 0x3fff);
+ dma_set_max_seg_size(fsl_edma->dma_dev.dev,
+ FIELD_GET(EDMA_TCD_ITER_MASK, EDMA_TCD_ITER_MASK));
fsl_edma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c
index f405c77..5005e13 100644
--- a/drivers/dma/fsl-qdma.c
+++ b/drivers/dma/fsl-qdma.c
@@ -109,6 +109,7 @@
#define FSL_QDMA_CMD_WTHROTL_OFFSET 20
#define FSL_QDMA_CMD_DSEN_OFFSET 19
#define FSL_QDMA_CMD_LWC_OFFSET 16
+#define FSL_QDMA_CMD_PF BIT(17)
/* Field definition for Descriptor status */
#define QDMA_CCDF_STATUS_RTE BIT(5)
@@ -160,6 +161,10 @@ struct fsl_qdma_format {
u8 __reserved1[2];
u8 cfg8b_w1;
} __packed;
+ struct {
+ __le32 __reserved2;
+ __le32 cmd;
+ } __packed;
__le64 data;
};
} __packed;
@@ -354,7 +359,6 @@ static void fsl_qdma_free_chan_resources(struct dma_chan *chan)
static void fsl_qdma_comp_fill_memcpy(struct fsl_qdma_comp *fsl_comp,
dma_addr_t dst, dma_addr_t src, u32 len)
{
- u32 cmd;
struct fsl_qdma_format *sdf, *ddf;
struct fsl_qdma_format *ccdf, *csgf_desc, *csgf_src, *csgf_dest;
@@ -383,14 +387,11 @@ static void fsl_qdma_comp_fill_memcpy(struct fsl_qdma_comp *fsl_comp,
/* This entry is the last entry. */
qdma_csgf_set_f(csgf_dest, len);
/* Descriptor Buffer */
- cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE <<
- FSL_QDMA_CMD_RWTTYPE_OFFSET);
- sdf->data = QDMA_SDDF_CMD(cmd);
+ sdf->cmd = cpu_to_le32((FSL_QDMA_CMD_RWTTYPE << FSL_QDMA_CMD_RWTTYPE_OFFSET) |
+ FSL_QDMA_CMD_PF);
- cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE <<
- FSL_QDMA_CMD_RWTTYPE_OFFSET);
- cmd |= cpu_to_le32(FSL_QDMA_CMD_LWC << FSL_QDMA_CMD_LWC_OFFSET);
- ddf->data = QDMA_SDDF_CMD(cmd);
+ ddf->cmd = cpu_to_le32((FSL_QDMA_CMD_RWTTYPE << FSL_QDMA_CMD_RWTTYPE_OFFSET) |
+ (FSL_QDMA_CMD_LWC << FSL_QDMA_CMD_LWC_OFFSET));
}
/*
@@ -624,7 +625,7 @@ static int fsl_qdma_halt(struct fsl_qdma_engine *fsl_qdma)
static int
fsl_qdma_queue_transfer_complete(struct fsl_qdma_engine *fsl_qdma,
- void *block,
+ __iomem void *block,
int id)
{
bool duplicate;
@@ -1196,10 +1197,6 @@ static int fsl_qdma_probe(struct platform_device *pdev)
if (!fsl_qdma->queue)
return -ENOMEM;
- ret = fsl_qdma_irq_init(pdev, fsl_qdma);
- if (ret)
- return ret;
-
fsl_qdma->irq_base = platform_get_irq_byname(pdev, "qdma-queue0");
if (fsl_qdma->irq_base < 0)
return fsl_qdma->irq_base;
@@ -1238,19 +1235,22 @@ static int fsl_qdma_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, fsl_qdma);
- ret = dma_async_device_register(&fsl_qdma->dma_dev);
- if (ret) {
- dev_err(&pdev->dev,
- "Can't register NXP Layerscape qDMA engine.\n");
- return ret;
- }
-
ret = fsl_qdma_reg_init(fsl_qdma);
if (ret) {
dev_err(&pdev->dev, "Can't Initialize the qDMA engine.\n");
return ret;
}
+ ret = fsl_qdma_irq_init(pdev, fsl_qdma);
+ if (ret)
+ return ret;
+
+ ret = dma_async_device_register(&fsl_qdma->dma_dev);
+ if (ret) {
+ dev_err(&pdev->dev, "Can't register NXP Layerscape qDMA engine.\n");
+ return ret;
+ }
+
return 0;
}
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 77f8885..e5a94a9 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -345,7 +345,7 @@ static void idxd_cdev_evl_drain_pasid(struct idxd_wq *wq, u32 pasid)
spin_lock(&evl->lock);
status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
t = status.tail;
- h = evl->head;
+ h = status.head;
size = evl->size;
while (h != t) {
diff --git a/drivers/dma/idxd/debugfs.c b/drivers/dma/idxd/debugfs.c
index 9cfbd9b..f3f25ee 100644
--- a/drivers/dma/idxd/debugfs.c
+++ b/drivers/dma/idxd/debugfs.c
@@ -68,9 +68,9 @@ static int debugfs_evl_show(struct seq_file *s, void *d)
spin_lock(&evl->lock);
- h = evl->head;
evl_status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
t = evl_status.tail;
+ h = evl_status.head;
evl_size = evl->size;
seq_printf(s, "Event Log head %u tail %u interrupt pending %u\n\n",
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 47de3f9..d0f5db6 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -300,7 +300,6 @@ struct idxd_evl {
unsigned int log_size;
/* The number of entries in the event log. */
u16 size;
- u16 head;
unsigned long *bmap;
bool batch_fail[IDXD_MAX_BATCH_IDENT];
};
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 14df1f1..4954adc 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -343,7 +343,9 @@ static void idxd_cleanup_internals(struct idxd_device *idxd)
static int idxd_init_evl(struct idxd_device *idxd)
{
struct device *dev = &idxd->pdev->dev;
+ unsigned int evl_cache_size;
struct idxd_evl *evl;
+ const char *idxd_name;
if (idxd->hw.gen_cap.evl_support == 0)
return 0;
@@ -355,9 +357,16 @@ static int idxd_init_evl(struct idxd_device *idxd)
spin_lock_init(&evl->lock);
evl->size = IDXD_EVL_SIZE_MIN;
- idxd->evl_cache = kmem_cache_create(dev_name(idxd_confdev(idxd)),
- sizeof(struct idxd_evl_fault) + evl_ent_size(idxd),
- 0, 0, NULL);
+ idxd_name = dev_name(idxd_confdev(idxd));
+ evl_cache_size = sizeof(struct idxd_evl_fault) + evl_ent_size(idxd);
+ /*
+ * Since completion record in evl_cache will be copied to user
+ * when handling completion record page fault, need to create
+ * the cache suitable for user copy.
+ */
+ idxd->evl_cache = kmem_cache_create_usercopy(idxd_name, evl_cache_size,
+ 0, 0, 0, evl_cache_size,
+ NULL);
if (!idxd->evl_cache) {
kfree(evl);
return -ENOMEM;
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index c8a0aa8..348aa21 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -367,9 +367,9 @@ static void process_evl_entries(struct idxd_device *idxd)
/* Clear interrupt pending bit */
iowrite32(evl_status.bits_upper32,
idxd->reg_base + IDXD_EVLSTATUS_OFFSET + sizeof(u32));
- h = evl->head;
evl_status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
t = evl_status.tail;
+ h = evl_status.head;
size = idxd->evl->size;
while (h != t) {
@@ -378,7 +378,6 @@ static void process_evl_entries(struct idxd_device *idxd)
h = (h + 1) % size;
}
- evl->head = h;
evl_status.head = h;
iowrite32(evl_status.bits_lower32, idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
spin_unlock(&evl->lock);
diff --git a/drivers/dma/ptdma/ptdma-dmaengine.c b/drivers/dma/ptdma/ptdma-dmaengine.c
index 1aa65e5..f792407 100644
--- a/drivers/dma/ptdma/ptdma-dmaengine.c
+++ b/drivers/dma/ptdma/ptdma-dmaengine.c
@@ -385,8 +385,6 @@ int pt_dmaengine_register(struct pt_device *pt)
chan->vc.desc_free = pt_do_cleanup;
vchan_init(&chan->vc, dma_dev);
- dma_set_mask_and_coherent(pt->dev, DMA_BIT_MASK(64));
-
ret = dma_async_device_register(dma_dev);
if (ret)
goto err_reg;
diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
index 5152bd1..7f686d1 100644
--- a/drivers/dpll/dpll_core.c
+++ b/drivers/dpll/dpll_core.c
@@ -508,6 +508,26 @@ dpll_pin_alloc(u64 clock_id, u32 pin_idx, struct module *module,
return ERR_PTR(ret);
}
+static void dpll_netdev_pin_assign(struct net_device *dev, struct dpll_pin *dpll_pin)
+{
+ rtnl_lock();
+ rcu_assign_pointer(dev->dpll_pin, dpll_pin);
+ rtnl_unlock();
+}
+
+void dpll_netdev_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)
+{
+ WARN_ON(!dpll_pin);
+ dpll_netdev_pin_assign(dev, dpll_pin);
+}
+EXPORT_SYMBOL(dpll_netdev_pin_set);
+
+void dpll_netdev_pin_clear(struct net_device *dev)
+{
+ dpll_netdev_pin_assign(dev, NULL);
+}
+EXPORT_SYMBOL(dpll_netdev_pin_clear);
+
/**
* dpll_pin_get - find existing or create new dpll pin
* @clock_id: clock_id of creator
@@ -564,7 +584,7 @@ void dpll_pin_put(struct dpll_pin *pin)
xa_destroy(&pin->parent_refs);
xa_erase(&dpll_pin_xa, pin->id);
dpll_pin_prop_free(&pin->prop);
- kfree(pin);
+ kfree_rcu(pin, rcu);
}
mutex_unlock(&dpll_lock);
}
diff --git a/drivers/dpll/dpll_core.h b/drivers/dpll/dpll_core.h
index 717f715..2b6d8ef 100644
--- a/drivers/dpll/dpll_core.h
+++ b/drivers/dpll/dpll_core.h
@@ -47,6 +47,7 @@ struct dpll_device {
* @prop: pin properties copied from the registerer
* @rclk_dev_name: holds name of device when pin can recover clock from it
* @refcount: refcount
+ * @rcu: rcu_head for kfree_rcu()
**/
struct dpll_pin {
u32 id;
@@ -57,6 +58,7 @@ struct dpll_pin {
struct xarray parent_refs;
struct dpll_pin_properties prop;
refcount_t refcount;
+ struct rcu_head rcu;
};
/**
diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
index 314bb37..b57355e 100644
--- a/drivers/dpll/dpll_netlink.c
+++ b/drivers/dpll/dpll_netlink.c
@@ -8,6 +8,7 @@
*/
#include <linux/module.h>
#include <linux/kernel.h>
+#include <linux/netdevice.h>
#include <net/genetlink.h>
#include "dpll_core.h"
#include "dpll_netlink.h"
@@ -48,18 +49,6 @@ dpll_msg_add_dev_parent_handle(struct sk_buff *msg, u32 id)
}
/**
- * dpll_msg_pin_handle_size - get size of pin handle attribute for given pin
- * @pin: pin pointer
- *
- * Return: byte size of pin handle attribute for given pin.
- */
-size_t dpll_msg_pin_handle_size(struct dpll_pin *pin)
-{
- return pin ? nla_total_size(4) : 0; /* DPLL_A_PIN_ID */
-}
-EXPORT_SYMBOL_GPL(dpll_msg_pin_handle_size);
-
-/**
* dpll_msg_add_pin_handle - attach pin handle attribute to a given message
* @msg: pointer to sk_buff message to attach a pin handle
* @pin: pin pointer
@@ -68,7 +57,7 @@ EXPORT_SYMBOL_GPL(dpll_msg_pin_handle_size);
* * 0 - success
* * -EMSGSIZE - no space in message to attach pin handle
*/
-int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin)
+static int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin)
{
if (!pin)
return 0;
@@ -76,7 +65,28 @@ int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin)
return -EMSGSIZE;
return 0;
}
-EXPORT_SYMBOL_GPL(dpll_msg_add_pin_handle);
+
+static struct dpll_pin *dpll_netdev_pin(const struct net_device *dev)
+{
+ return rcu_dereference_rtnl(dev->dpll_pin);
+}
+
+/**
+ * dpll_netdev_pin_handle_size - get size of pin handle attribute of a netdev
+ * @dev: netdev from which to get the pin
+ *
+ * Return: byte size of pin handle attribute, or 0 if @dev has no pin.
+ */
+size_t dpll_netdev_pin_handle_size(const struct net_device *dev)
+{
+ return dpll_netdev_pin(dev) ? nla_total_size(4) : 0; /* DPLL_A_PIN_ID */
+}
+
+int dpll_netdev_add_pin_handle(struct sk_buff *msg,
+ const struct net_device *dev)
+{
+ return dpll_msg_add_pin_handle(msg, dpll_netdev_pin(dev));
+}
static int
dpll_msg_add_mode(struct sk_buff *msg, struct dpll_device *dpll,
@@ -1199,6 +1209,7 @@ int dpll_nl_pin_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
unsigned long i;
int ret = 0;
+ mutex_lock(&dpll_lock);
xa_for_each_marked_start(&dpll_pin_xa, i, pin, DPLL_REGISTERED,
ctx->idx) {
if (!dpll_pin_available(pin))
@@ -1218,6 +1229,8 @@ int dpll_nl_pin_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
}
genlmsg_end(skb, hdr);
}
+ mutex_unlock(&dpll_lock);
+
if (ret == -EMSGSIZE) {
ctx->idx = i;
return skb->len;
@@ -1373,6 +1386,7 @@ int dpll_nl_device_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
unsigned long i;
int ret = 0;
+ mutex_lock(&dpll_lock);
xa_for_each_marked_start(&dpll_device_xa, i, dpll, DPLL_REGISTERED,
ctx->idx) {
hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
@@ -1389,6 +1403,8 @@ int dpll_nl_device_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
}
genlmsg_end(skb, hdr);
}
+ mutex_unlock(&dpll_lock);
+
if (ret == -EMSGSIZE) {
ctx->idx = i;
return skb->len;
@@ -1439,20 +1455,6 @@ dpll_unlock_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
mutex_unlock(&dpll_lock);
}
-int dpll_lock_dumpit(struct netlink_callback *cb)
-{
- mutex_lock(&dpll_lock);
-
- return 0;
-}
-
-int dpll_unlock_dumpit(struct netlink_callback *cb)
-{
- mutex_unlock(&dpll_lock);
-
- return 0;
-}
-
int dpll_pin_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
struct genl_info *info)
{
diff --git a/drivers/dpll/dpll_nl.c b/drivers/dpll/dpll_nl.c
index eaee5be..1e95f53 100644
--- a/drivers/dpll/dpll_nl.c
+++ b/drivers/dpll/dpll_nl.c
@@ -95,9 +95,7 @@ static const struct genl_split_ops dpll_nl_ops[] = {
},
{
.cmd = DPLL_CMD_DEVICE_GET,
- .start = dpll_lock_dumpit,
.dumpit = dpll_nl_device_get_dumpit,
- .done = dpll_unlock_dumpit,
.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
},
{
@@ -129,9 +127,7 @@ static const struct genl_split_ops dpll_nl_ops[] = {
},
{
.cmd = DPLL_CMD_PIN_GET,
- .start = dpll_lock_dumpit,
.dumpit = dpll_nl_pin_get_dumpit,
- .done = dpll_unlock_dumpit,
.policy = dpll_pin_get_dump_nl_policy,
.maxattr = DPLL_A_PIN_ID,
.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
diff --git a/drivers/dpll/dpll_nl.h b/drivers/dpll/dpll_nl.h
index 92d4c9c..f491262 100644
--- a/drivers/dpll/dpll_nl.h
+++ b/drivers/dpll/dpll_nl.h
@@ -30,8 +30,6 @@ dpll_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
void
dpll_pin_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
struct genl_info *info);
-int dpll_lock_dumpit(struct netlink_callback *cb);
-int dpll_unlock_dumpit(struct netlink_callback *cb);
int dpll_nl_device_id_get_doit(struct sk_buff *skb, struct genl_info *info);
int dpll_nl_device_get_doit(struct sk_buff *skb, struct genl_info *info);
diff --git a/drivers/firewire/core-card.c b/drivers/firewire/core-card.c
index 6ac5ff2..401a77e 100644
--- a/drivers/firewire/core-card.c
+++ b/drivers/firewire/core-card.c
@@ -429,7 +429,23 @@ static void bm_work(struct work_struct *work)
*/
card->bm_generation = generation;
- if (root_device == NULL) {
+ if (card->gap_count == 0) {
+ /*
+ * If self IDs have inconsistent gap counts, do a
+ * bus reset ASAP. The config rom read might never
+ * complete, so don't wait for it. However, still
+ * send a PHY configuration packet prior to the
+ * bus reset. The PHY configuration packet might
+ * fail, but 1394-2008 8.4.5.2 explicitly permits
+ * it in this case, so it should be safe to try.
+ */
+ new_root_id = local_id;
+ /*
+ * We must always send a bus reset if the gap count
+ * is inconsistent, so bypass the 5-reset limit.
+ */
+ card->bm_retries = 0;
+ } else if (root_device == NULL) {
/*
* Either link_on is false, or we failed to read the
* config rom. In either case, pick another root.
@@ -484,7 +500,19 @@ static void bm_work(struct work_struct *work)
fw_notice(card, "phy config: new root=%x, gap_count=%d\n",
new_root_id, gap_count);
fw_send_phy_config(card, new_root_id, generation, gap_count);
- reset_bus(card, true);
+ /*
+ * Where possible, use a short bus reset to minimize
+ * disruption to isochronous transfers. But in the event
+ * of a gap count inconsistency, use a long bus reset.
+ *
+ * As noted in 1394a 8.4.6.2, nodes on a mixed 1394/1394a bus
+ * may set different gap counts after a bus reset. On a mixed
+ * 1394/1394a bus, a short bus reset can get doubled. Some
+ * nodes may treat the double reset as one bus reset and others
+ * may treat it as two, causing a gap count inconsistency
+ * again. Using a long bus reset prevents this.
+ */
+ reset_bus(card, card->gap_count != 0);
/* Will allocate broadcast channel after the reset. */
goto out;
}
diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 9db9290..7bc71f4 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -3773,6 +3773,7 @@ static int pci_probe(struct pci_dev *dev,
return 0;
fail_msi:
+ devm_free_irq(&dev->dev, dev->irq, ohci);
pci_disable_msi(dev);
return err;
@@ -3800,6 +3801,7 @@ static void pci_remove(struct pci_dev *dev)
software_reset(ohci);
+ devm_free_irq(&dev->dev, dev->irq, ohci);
pci_disable_msi(dev);
dev_notice(&dev->dev, "removing fw-ohci device\n");
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 83f5bb5..83092d9 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -107,7 +107,7 @@ static int __init arm_enable_runtime_services(void)
efi_memory_desc_t *md;
for_each_efi_memory_desc(md) {
- int md_size = md->num_pages << EFI_PAGE_SHIFT;
+ u64 md_size = md->num_pages << EFI_PAGE_SHIFT;
struct resource *res;
if (!(md->attribute & EFI_MEMORY_SP))
diff --git a/drivers/firmware/efi/capsule-loader.c b/drivers/firmware/efi/capsule-loader.c
index 3e8d4b5..97bafb5 100644
--- a/drivers/firmware/efi/capsule-loader.c
+++ b/drivers/firmware/efi/capsule-loader.c
@@ -292,7 +292,7 @@ static int efi_capsule_open(struct inode *inode, struct file *file)
return -ENOMEM;
}
- cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL);
+ cap_info->phys = kzalloc(sizeof(phys_addr_t), GFP_KERNEL);
if (!cap_info->phys) {
kfree(cap_info->pages);
kfree(cap_info);
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 35c37f6..9b3884f 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -523,6 +523,17 @@ static void cper_print_tstamp(const char *pfx,
}
}
+struct ignore_section {
+ guid_t guid;
+ const char *name;
+};
+
+static const struct ignore_section ignore_sections[] = {
+ { .guid = CPER_SEC_CXL_GEN_MEDIA_GUID, .name = "CXL General Media Event" },
+ { .guid = CPER_SEC_CXL_DRAM_GUID, .name = "CXL DRAM Event" },
+ { .guid = CPER_SEC_CXL_MEM_MODULE_GUID, .name = "CXL Memory Module Event" },
+};
+
static void
cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata,
int sec_no)
@@ -543,6 +554,14 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
printk("%s""fru_text: %.20s\n", pfx, gdata->fru_text);
snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
+
+ for (int i = 0; i < ARRAY_SIZE(ignore_sections); i++) {
+ if (guid_equal(sec_type, &ignore_sections[i].guid)) {
+ printk("%ssection_type: %s\n", newpfx, ignore_sections[i].name);
+ return;
+ }
+ }
+
if (guid_equal(sec_type, &CPER_SEC_PROC_GENERIC)) {
struct cper_sec_proc_generic *proc_err = acpi_hest_get_payload(gdata);
diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
index d4987d0..a00e07b 100644
--- a/drivers/firmware/efi/efi-init.c
+++ b/drivers/firmware/efi/efi-init.c
@@ -144,15 +144,6 @@ static __init int is_usable_memory(efi_memory_desc_t *md)
case EFI_CONVENTIONAL_MEMORY:
case EFI_PERSISTENT_MEMORY:
/*
- * Special purpose memory is 'soft reserved', which means it
- * is set aside initially, but can be hotplugged back in or
- * be assigned to the dax driver after boot.
- */
- if (efi_soft_reserve_enabled() &&
- (md->attribute & EFI_MEMORY_SP))
- return false;
-
- /*
* According to the spec, these regions are no longer reserved
* after calling ExitBootServices(). However, we can only use
* them as System RAM if they can be mapped writeback cacheable.
@@ -196,6 +187,16 @@ static __init void reserve_regions(void)
size = npages << PAGE_SHIFT;
if (is_memory(md)) {
+ /*
+ * Special purpose memory is 'soft reserved', which
+ * means it is set aside initially. Don't add a memblock
+ * for it now so that it can be hotplugged back in or
+ * be assigned to the dax driver after boot.
+ */
+ if (efi_soft_reserve_enabled() &&
+ (md->attribute & EFI_MEMORY_SP))
+ continue;
+
early_init_dt_add_memory_arch(paddr, size);
if (!is_usable_memory(md))
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 06964a3..73f4810 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -28,7 +28,7 @@
-DEFI_HAVE_MEMCHR -DEFI_HAVE_STRRCHR \
-DEFI_HAVE_STRCMP -fno-builtin -fpic \
$(call cc-option,-mno-single-pic-base)
-cflags-$(CONFIG_RISCV) += -fpic -DNO_ALTERNATIVE
+cflags-$(CONFIG_RISCV) += -fpic -DNO_ALTERNATIVE -mno-relax
cflags-$(CONFIG_LOONGARCH) += -fpie
cflags-$(CONFIG_EFI_PARAMS_FROM_FDT) += -I$(srctree)/scripts/dtc/libfdt
@@ -143,7 +143,7 @@
# exist.
STUBCOPY_FLAGS-$(CONFIG_RISCV) += --prefix-alloc-sections=.init \
--prefix-symbols=__efistub_
-STUBCOPY_RELOC-$(CONFIG_RISCV) := R_RISCV_HI20
+STUBCOPY_RELOC-$(CONFIG_RISCV) := -E R_RISCV_HI20\|R_RISCV_$(BITS)\|R_RISCV_RELAX
# For LoongArch, keep all the symbols in .init section and make sure that no
# absolute symbols references exist.
diff --git a/drivers/firmware/efi/libstub/alignedmem.c b/drivers/firmware/efi/libstub/alignedmem.c
index 6b83c49..31928bd 100644
--- a/drivers/firmware/efi/libstub/alignedmem.c
+++ b/drivers/firmware/efi/libstub/alignedmem.c
@@ -14,6 +14,7 @@
* @max: the address that the last allocated memory page shall not
* exceed
* @align: minimum alignment of the base of the allocation
+ * @memory_type: the type of memory to allocate
*
* Allocate pages as EFI_LOADER_DATA. The allocated pages are aligned according
* to @align, which should be >= EFI_ALLOC_ALIGN. The last allocated page will
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index 212687c..c04b82e 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -956,7 +956,8 @@ efi_status_t efi_get_random_bytes(unsigned long size, u8 *out);
efi_status_t efi_random_alloc(unsigned long size, unsigned long align,
unsigned long *addr, unsigned long random_seed,
- int memory_type, unsigned long alloc_limit);
+ int memory_type, unsigned long alloc_min,
+ unsigned long alloc_max);
efi_status_t efi_random_get_seed(void);
diff --git a/drivers/firmware/efi/libstub/kaslr.c b/drivers/firmware/efi/libstub/kaslr.c
index 62d63f7..1a98080 100644
--- a/drivers/firmware/efi/libstub/kaslr.c
+++ b/drivers/firmware/efi/libstub/kaslr.c
@@ -119,7 +119,7 @@ efi_status_t efi_kaslr_relocate_kernel(unsigned long *image_addr,
*/
status = efi_random_alloc(*reserve_size, min_kimg_align,
reserve_addr, phys_seed,
- EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
+ EFI_LOADER_CODE, 0, EFI_ALLOC_LIMIT);
if (status != EFI_SUCCESS)
efi_warn("efi_random_alloc() failed: 0x%lx\n", status);
} else {
diff --git a/drivers/firmware/efi/libstub/randomalloc.c b/drivers/firmware/efi/libstub/randomalloc.c
index 674a064..4e96a85 100644
--- a/drivers/firmware/efi/libstub/randomalloc.c
+++ b/drivers/firmware/efi/libstub/randomalloc.c
@@ -17,7 +17,7 @@
static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
unsigned long size,
unsigned long align_shift,
- u64 alloc_limit)
+ u64 alloc_min, u64 alloc_max)
{
unsigned long align = 1UL << align_shift;
u64 first_slot, last_slot, region_end;
@@ -30,11 +30,11 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
return 0;
region_end = min(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - 1,
- alloc_limit);
+ alloc_max);
if (region_end < size)
return 0;
- first_slot = round_up(md->phys_addr, align);
+ first_slot = round_up(max(md->phys_addr, alloc_min), align);
last_slot = round_down(region_end - size + 1, align);
if (first_slot > last_slot)
@@ -56,7 +56,8 @@ efi_status_t efi_random_alloc(unsigned long size,
unsigned long *addr,
unsigned long random_seed,
int memory_type,
- unsigned long alloc_limit)
+ unsigned long alloc_min,
+ unsigned long alloc_max)
{
unsigned long total_slots = 0, target_slot;
unsigned long total_mirrored_slots = 0;
@@ -78,7 +79,8 @@ efi_status_t efi_random_alloc(unsigned long size,
efi_memory_desc_t *md = (void *)map->map + map_offset;
unsigned long slots;
- slots = get_entry_num_slots(md, size, ilog2(align), alloc_limit);
+ slots = get_entry_num_slots(md, size, ilog2(align), alloc_min,
+ alloc_max);
MD_NUM_SLOTS(md) = slots;
total_slots += slots;
if (md->attribute & EFI_MEMORY_MORE_RELIABLE)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 0d510c9..99429bc 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -223,8 +223,8 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
}
}
-void efi_adjust_memory_range_protection(unsigned long start,
- unsigned long size)
+efi_status_t efi_adjust_memory_range_protection(unsigned long start,
+ unsigned long size)
{
efi_status_t status;
efi_gcd_memory_space_desc_t desc;
@@ -236,13 +236,17 @@ void efi_adjust_memory_range_protection(unsigned long start,
rounded_end = roundup(start + size, EFI_PAGE_SIZE);
if (memattr != NULL) {
- efi_call_proto(memattr, clear_memory_attributes, rounded_start,
- rounded_end - rounded_start, EFI_MEMORY_XP);
- return;
+ status = efi_call_proto(memattr, clear_memory_attributes,
+ rounded_start,
+ rounded_end - rounded_start,
+ EFI_MEMORY_XP);
+ if (status != EFI_SUCCESS)
+ efi_warn("Failed to clear EFI_MEMORY_XP attribute\n");
+ return status;
}
if (efi_dxe_table == NULL)
- return;
+ return EFI_SUCCESS;
/*
* Don't modify memory region attributes, they are
@@ -255,7 +259,7 @@ void efi_adjust_memory_range_protection(unsigned long start,
status = efi_dxe_call(get_memory_space_descriptor, start, &desc);
if (status != EFI_SUCCESS)
- return;
+ break;
next = desc.base_address + desc.length;
@@ -280,8 +284,10 @@ void efi_adjust_memory_range_protection(unsigned long start,
unprotect_start,
unprotect_start + unprotect_size,
status);
+ break;
}
}
+ return EFI_SUCCESS;
}
static void setup_unaccepted_memory(void)
@@ -793,6 +799,7 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
seed[0], EFI_LOADER_CODE,
+ LOAD_PHYSICAL_ADDR,
EFI_X86_KERNEL_ALLOC_LIMIT);
if (status != EFI_SUCCESS)
return status;
@@ -805,9 +812,7 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
*kernel_entry = addr + entry;
- efi_adjust_memory_range_protection(addr, kernel_total_size);
-
- return EFI_SUCCESS;
+ return efi_adjust_memory_range_protection(addr, kernel_total_size);
}
static void __noreturn enter_kernel(unsigned long kernel_addr,
diff --git a/drivers/firmware/efi/libstub/x86-stub.h b/drivers/firmware/efi/libstub/x86-stub.h
index 37c5a36..1c20e99a 100644
--- a/drivers/firmware/efi/libstub/x86-stub.h
+++ b/drivers/firmware/efi/libstub/x86-stub.h
@@ -5,8 +5,8 @@
extern void trampoline_32bit_src(void *, bool);
extern const u16 trampoline_ljmp_imm_offset;
-void efi_adjust_memory_range_protection(unsigned long start,
- unsigned long size);
+efi_status_t efi_adjust_memory_range_protection(unsigned long start,
+ unsigned long size);
#ifdef CONFIG_X86_64
efi_status_t efi_setup_5level_paging(void);
diff --git a/drivers/firmware/efi/libstub/zboot.c b/drivers/firmware/efi/libstub/zboot.c
index bdb17ea..1ceace9 100644
--- a/drivers/firmware/efi/libstub/zboot.c
+++ b/drivers/firmware/efi/libstub/zboot.c
@@ -119,7 +119,7 @@ efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab)
}
status = efi_random_alloc(alloc_size, min_kimg_align, &image_base,
- seed, EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
+ seed, EFI_LOADER_CODE, 0, EFI_ALLOC_LIMIT);
if (status != EFI_SUCCESS) {
efi_err("Failed to allocate memory\n");
goto free_cmdline;
diff --git a/drivers/firmware/efi/riscv-runtime.c b/drivers/firmware/efi/riscv-runtime.c
index 09525fb..01f0f90 100644
--- a/drivers/firmware/efi/riscv-runtime.c
+++ b/drivers/firmware/efi/riscv-runtime.c
@@ -85,7 +85,7 @@ static int __init riscv_enable_runtime_services(void)
efi_memory_desc_t *md;
for_each_efi_memory_desc(md) {
- int md_size = md->num_pages << EFI_PAGE_SHIFT;
+ u64 md_size = md->num_pages << EFI_PAGE_SHIFT;
struct resource *res;
if (!(md->attribute & EFI_MEMORY_SP))
diff --git a/drivers/firmware/microchip/mpfs-auto-update.c b/drivers/firmware/microchip/mpfs-auto-update.c
index 81f5f62..fbeeaee 100644
--- a/drivers/firmware/microchip/mpfs-auto-update.c
+++ b/drivers/firmware/microchip/mpfs-auto-update.c
@@ -167,7 +167,7 @@ static int mpfs_auto_update_verify_image(struct fw_upload *fw_uploader)
u32 *response_msg;
int ret;
- response_msg = devm_kzalloc(priv->dev, AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(response_msg),
+ response_msg = devm_kzalloc(priv->dev, AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(*response_msg),
GFP_KERNEL);
if (!response_msg)
return -ENOMEM;
@@ -384,7 +384,8 @@ static int mpfs_auto_update_available(struct mpfs_auto_update_priv *priv)
u32 *response_msg;
int ret;
- response_msg = devm_kzalloc(priv->dev, AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(response_msg),
+ response_msg = devm_kzalloc(priv->dev,
+ AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(*response_msg),
GFP_KERNEL);
if (!response_msg)
return -ENOMEM;
diff --git a/drivers/gpio/gpio-74x164.c b/drivers/gpio/gpio-74x164.c
index e00c333..753e7be 100644
--- a/drivers/gpio/gpio-74x164.c
+++ b/drivers/gpio/gpio-74x164.c
@@ -127,8 +127,6 @@ static int gen_74x164_probe(struct spi_device *spi)
if (IS_ERR(chip->gpiod_oe))
return PTR_ERR(chip->gpiod_oe);
- gpiod_set_value_cansleep(chip->gpiod_oe, 1);
-
spi_set_drvdata(spi, chip);
chip->gpio_chip.label = spi->modalias;
@@ -153,6 +151,8 @@ static int gen_74x164_probe(struct spi_device *spi)
goto exit_destroy;
}
+ gpiod_set_value_cansleep(chip->gpiod_oe, 1);
+
ret = gpiochip_add_data(&chip->gpio_chip, chip);
if (!ret)
return 0;
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 44c8f57..75be4a3 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -968,11 +968,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
ret = gpiochip_irqchip_init_valid_mask(gc);
if (ret)
- goto err_remove_acpi_chip;
+ goto err_free_hogs;
ret = gpiochip_irqchip_init_hw(gc);
if (ret)
- goto err_remove_acpi_chip;
+ goto err_remove_irqchip_mask;
ret = gpiochip_add_irqchip(gc, lock_key, request_key);
if (ret)
@@ -997,23 +997,23 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
gpiochip_irqchip_remove(gc);
err_remove_irqchip_mask:
gpiochip_irqchip_free_valid_mask(gc);
-err_remove_acpi_chip:
- acpi_gpiochip_remove(gc);
-err_remove_of_chip:
+err_free_hogs:
gpiochip_free_hogs(gc);
+ acpi_gpiochip_remove(gc);
+ gpiochip_remove_pin_ranges(gc);
+err_remove_of_chip:
of_gpiochip_remove(gc);
err_free_gpiochip_mask:
- gpiochip_remove_pin_ranges(gc);
gpiochip_free_valid_mask(gc);
+err_remove_from_list:
+ spin_lock_irqsave(&gpio_lock, flags);
+ list_del(&gdev->list);
+ spin_unlock_irqrestore(&gpio_lock, flags);
if (gdev->dev.release) {
/* release() has been registered by gpiochip_setup_dev() */
gpio_device_put(gdev);
goto err_print_message;
}
-err_remove_from_list:
- spin_lock_irqsave(&gpio_lock, flags);
- list_del(&gdev->list);
- spin_unlock_irqrestore(&gpio_lock, flags);
err_free_label:
kfree_const(gdev->label);
err_free_descs:
@@ -2042,6 +2042,11 @@ EXPORT_SYMBOL_GPL(gpiochip_generic_free);
int gpiochip_generic_config(struct gpio_chip *gc, unsigned int offset,
unsigned long config)
{
+#ifdef CONFIG_PINCTRL
+ if (list_empty(&gc->gpiodev->pin_ranges))
+ return -ENOTSUPP;
+#endif
+
return pinctrl_gpio_set_config(gc, offset, config);
}
EXPORT_SYMBOL_GPL(gpiochip_generic_config);
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 2520db0..c7edba1 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -199,7 +199,7 @@
config DRM_TTM_KUNIT_TEST
tristate "KUnit tests for TTM" if !KUNIT_ALL_TESTS
default n
- depends on DRM && KUNIT && MMU
+ depends on DRM && KUNIT && MMU && (UML || COMPILE_TEST)
select DRM_TTM
select DRM_EXPORT_FOR_TESTS if m
select DRM_KUNIT_TEST_HELPERS
@@ -207,7 +207,8 @@
help
Enables unit tests for TTM, a GPU memory manager subsystem used
to manage memory buffers. This option is mostly useful for kernel
- developers.
+ developers. It depends on (UML || COMPILE_TEST) since no other driver
+ which uses TTM can be loaded while running the tests.
If in doubt, say "N".
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 3d8a48f..79827a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -200,6 +200,7 @@ extern uint amdgpu_dc_debug_mask;
extern uint amdgpu_dc_visual_confirm;
extern uint amdgpu_dm_abm_level;
extern int amdgpu_backlight;
+extern int amdgpu_damage_clips;
extern struct amdgpu_mgpu_info mgpu_info;
extern int amdgpu_ras_enable;
extern uint amdgpu_ras_mask;
@@ -1078,6 +1079,8 @@ struct amdgpu_device {
bool in_s3;
bool in_s4;
bool in_s0ix;
+ /* indicate amdgpu suspension status */
+ bool suspend_complete;
enum pp_mp1_state mp1_state;
struct amdgpu_doorbell_index doorbell_index;
@@ -1547,9 +1550,11 @@ static inline int amdgpu_acpi_smart_shift_update(struct drm_device *dev,
#if defined(CONFIG_ACPI) && defined(CONFIG_SUSPEND)
bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev);
bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev);
+void amdgpu_choose_low_power_state(struct amdgpu_device *adev);
#else
static inline bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) { return false; }
static inline bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev) { return false; }
+static inline void amdgpu_choose_low_power_state(struct amdgpu_device *adev) { }
#endif
#if defined(CONFIG_DRM_AMD_DC)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index 2deebec..7099ff9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1519,4 +1519,22 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
#endif /* CONFIG_AMD_PMC */
}
+/**
+ * amdgpu_choose_low_power_state
+ *
+ * @adev: amdgpu_device_pointer
+ *
+ * Choose the target low power state for the GPU
+ */
+void amdgpu_choose_low_power_state(struct amdgpu_device *adev)
+{
+ if (adev->in_runpm)
+ return;
+
+ if (amdgpu_acpi_is_s0ix_active(adev))
+ adev->in_s0ix = true;
+ else if (amdgpu_acpi_is_s3_active(adev))
+ adev->in_s3 = true;
+}
+
#endif /* CONFIG_SUSPEND */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fdde748..94bdb5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4514,13 +4514,15 @@ int amdgpu_device_prepare(struct drm_device *dev)
struct amdgpu_device *adev = drm_to_adev(dev);
int i, r;
+ amdgpu_choose_low_power_state(adev);
+
if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
return 0;
/* Evict the majority of BOs before starting suspend sequence */
r = amdgpu_device_evict_resources(adev);
if (r)
- return r;
+ goto unprepare;
for (i = 0; i < adev->num_ip_blocks; i++) {
if (!adev->ip_blocks[i].status.valid)
@@ -4529,10 +4531,15 @@ int amdgpu_device_prepare(struct drm_device *dev)
continue;
r = adev->ip_blocks[i].version->funcs->prepare_suspend((void *)adev);
if (r)
- return r;
+ goto unprepare;
}
return 0;
+
+unprepare:
+ adev->in_s0ix = adev->in_s3 = false;
+
+ return r;
}
/**
@@ -4569,7 +4576,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
cancel_delayed_work_sync(&adev->delayed_init_work);
- flush_delayed_work(&adev->gfx.gfx_off_delay_work);
amdgpu_ras_suspend(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 971acf0..586f4d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -211,6 +211,7 @@ int amdgpu_seamless = -1; /* auto */
uint amdgpu_debug_mask;
int amdgpu_agp = -1; /* auto */
int amdgpu_wbrf = -1;
+int amdgpu_damage_clips = -1; /* auto */
static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
@@ -860,6 +861,18 @@ MODULE_PARM_DESC(backlight, "Backlight control (0 = pwm, 1 = aux, -1 auto (defau
module_param_named(backlight, amdgpu_backlight, bint, 0444);
/**
+ * DOC: damageclips (int)
+ * Enable or disable damage clips support. If damage clips support is disabled,
+ * we will force full frame updates, irrespective of what user space sends to
+ * us.
+ *
+ * Defaults to -1 (where it is enabled unless a PSR-SU display is detected).
+ */
+MODULE_PARM_DESC(damageclips,
+ "Damage clips support (0 = disable, 1 = enable, -1 auto (default))");
+module_param_named(damageclips, amdgpu_damage_clips, int, 0444);
+
+/**
* DOC: tmz (int)
* Trusted Memory Zone (TMZ) is a method to protect data being written
* to or read from memory.
@@ -2476,6 +2489,7 @@ static int amdgpu_pmops_suspend(struct device *dev)
struct drm_device *drm_dev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(drm_dev);
+ adev->suspend_complete = false;
if (amdgpu_acpi_is_s0ix_active(adev))
adev->in_s0ix = true;
else if (amdgpu_acpi_is_s3_active(adev))
@@ -2490,6 +2504,7 @@ static int amdgpu_pmops_suspend_noirq(struct device *dev)
struct drm_device *drm_dev = dev_get_drvdata(dev);
struct amdgpu_device *adev = drm_to_adev(drm_dev);
+ adev->suspend_complete = true;
if (amdgpu_acpi_should_gpu_reset(adev))
return amdgpu_asic_reset(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index b9674c5..6ddc8e33 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -723,8 +723,15 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
if (adev->gfx.gfx_off_req_count == 0 &&
!adev->gfx.gfx_off_state) {
- schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
+ /* If going to s2idle, no need to wait */
+ if (adev->in_s0ix) {
+ if (!amdgpu_dpm_set_powergating_by_smu(adev,
+ AMD_IP_BLOCK_TYPE_GFX, true))
+ adev->gfx.gfx_off_state = true;
+ } else {
+ schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
delay);
+ }
}
} else {
if (adev->gfx.gfx_off_req_count == 0) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
index 468a67b..ca5c86e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
@@ -362,7 +362,7 @@ static ssize_t ta_if_invoke_debugfs_write(struct file *fp, const char *buf, size
}
}
- if (copy_to_user((char *)buf, context->mem_context.shared_buf, shared_buf_len))
+ if (copy_to_user((char *)&buf[copy_pos], context->mem_context.shared_buf, shared_buf_len))
ret = -EFAULT;
err_free_shared_buf:
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 69c5009..3bc6943 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3034,6 +3034,14 @@ static int gfx_v9_0_cp_gfx_start(struct amdgpu_device *adev)
gfx_v9_0_cp_gfx_enable(adev, true);
+ /* Now only limit the quirk on the APU gfx9 series and already
+ * confirmed that the APU gfx10/gfx11 needn't such update.
+ */
+ if (adev->flags & AMD_IS_APU &&
+ adev->in_s3 && !adev->suspend_complete) {
+ DRM_INFO(" Will skip the CSB packet resubmit\n");
+ return 0;
+ }
r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + 3);
if (r) {
DRM_ERROR("amdgpu: cp failed to lock ring (%d).\n", r);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 40a00ea..e67a62d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1947,14 +1947,6 @@ static int gmc_v9_0_init_mem_ranges(struct amdgpu_device *adev)
static void gmc_v9_4_3_init_vram_info(struct amdgpu_device *adev)
{
- static const u32 regBIF_BIOS_SCRATCH_4 = 0x50;
- u32 vram_info;
-
- /* Only for dGPU, vendor informaton is reliable */
- if (!amdgpu_sriov_vf(adev) && !(adev->flags & AMD_IS_APU)) {
- vram_info = RREG32(regBIF_BIOS_SCRATCH_4);
- adev->gmc.vram_vendor = vram_info & 0xF;
- }
adev->gmc.vram_type = AMDGPU_VRAM_TYPE_HBM;
adev->gmc.vram_width = 128 * 64;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
index bc38b90..88ea58d 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
@@ -674,14 +674,6 @@ static int jpeg_v4_0_set_powergating_state(void *handle,
return ret;
}
-static int jpeg_v4_0_set_interrupt_state(struct amdgpu_device *adev,
- struct amdgpu_irq_src *source,
- unsigned type,
- enum amdgpu_interrupt_state state)
-{
- return 0;
-}
-
static int jpeg_v4_0_set_ras_interrupt_state(struct amdgpu_device *adev,
struct amdgpu_irq_src *source,
unsigned int type,
@@ -765,7 +757,6 @@ static void jpeg_v4_0_set_dec_ring_funcs(struct amdgpu_device *adev)
}
static const struct amdgpu_irq_src_funcs jpeg_v4_0_irq_funcs = {
- .set = jpeg_v4_0_set_interrupt_state,
.process = jpeg_v4_0_process_interrupt,
};
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
index 6ede85b..78b74da 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
@@ -181,7 +181,6 @@ static int jpeg_v4_0_5_hw_fini(void *handle)
RREG32_SOC15(JPEG, 0, regUVD_JRBC_STATUS))
jpeg_v4_0_5_set_powergating_state(adev, AMD_PG_STATE_GATE);
}
- amdgpu_irq_put(adev, &adev->jpeg.inst->irq, 0);
return 0;
}
@@ -516,14 +515,6 @@ static int jpeg_v4_0_5_set_powergating_state(void *handle,
return ret;
}
-static int jpeg_v4_0_5_set_interrupt_state(struct amdgpu_device *adev,
- struct amdgpu_irq_src *source,
- unsigned type,
- enum amdgpu_interrupt_state state)
-{
- return 0;
-}
-
static int jpeg_v4_0_5_process_interrupt(struct amdgpu_device *adev,
struct amdgpu_irq_src *source,
struct amdgpu_iv_entry *entry)
@@ -603,7 +594,6 @@ static void jpeg_v4_0_5_set_dec_ring_funcs(struct amdgpu_device *adev)
}
static const struct amdgpu_irq_src_funcs jpeg_v4_0_5_irq_funcs = {
- .set = jpeg_v4_0_5_set_interrupt_state,
.process = jpeg_v4_0_5_process_interrupt,
};
diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
index e90f337..b4723d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
@@ -431,6 +431,12 @@ static void nbio_v7_9_init_registers(struct amdgpu_device *adev)
u32 inst_mask;
int i;
+ if (amdgpu_sriov_vf(adev))
+ adev->rmmio_remap.reg_offset =
+ SOC15_REG_OFFSET(
+ NBIO, 0,
+ regBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+ << 2;
WREG32_SOC15(NBIO, 0, regXCC_DOORBELL_FENCE,
0xff & ~(adev->gfx.xcc_mask));
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 15033ef..1c61445 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -574,11 +574,34 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
return AMD_RESET_METHOD_MODE1;
}
+static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
+{
+ u32 sol_reg;
+
+ sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
+
+ /* Will reset for the following suspend abort cases.
+ * 1) Only reset limit on APU side, dGPU hasn't checked yet.
+ * 2) S3 suspend abort and TOS already launched.
+ */
+ if (adev->flags & AMD_IS_APU && adev->in_s3 &&
+ !adev->suspend_complete &&
+ sol_reg)
+ return true;
+
+ return false;
+}
+
static int soc15_asic_reset(struct amdgpu_device *adev)
{
/* original raven doesn't have full asic reset */
- if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
- (adev->apu_flags & AMD_APU_IS_RAVEN2))
+ /* On the latest Raven, the GPU reset can be performed
+ * successfully. So now, temporarily enable it for the
+ * S3 suspend abort case.
+ */
+ if (((adev->apu_flags & AMD_APU_IS_RAVEN) ||
+ (adev->apu_flags & AMD_APU_IS_RAVEN2)) &&
+ !soc15_need_reset_on_resume(adev))
return 0;
switch (soc15_asic_reset_method(adev)) {
@@ -1302,6 +1325,10 @@ static int soc15_common_resume(void *handle)
{
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ if (soc15_need_reset_on_resume(adev)) {
+ dev_info(adev->dev, "S3 suspend abort case, let's reset ASIC.\n");
+ soc15_asic_reset(adev);
+ }
return soc15_common_hw_init(adev);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
index 48c6efc..4d71889 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc21.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
@@ -50,13 +50,13 @@ static const struct amd_ip_funcs soc21_common_ip_funcs;
/* SOC21 */
static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn0[] = {
{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)},
- {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)},
+ {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)},
{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, 8192, 4352, 0)},
};
static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn1[] = {
{codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)},
- {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)},
+ {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)},
};
static const struct amdgpu_video_codecs vcn_4_0_0_video_codecs_encode_vcn0 = {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
index d722cbd..826bc4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
@@ -55,8 +55,8 @@ static void update_cu_mask(struct mqd_manager *mm, void *mqd,
m = get_mqd(mqd);
if (has_wa_flag) {
- uint32_t wa_mask = minfo->update_flag == UPDATE_FLAG_DBG_WA_ENABLE ?
- 0xffff : 0xffffffff;
+ uint32_t wa_mask =
+ (minfo->update_flag & UPDATE_FLAG_DBG_WA_ENABLE) ? 0xffff : 0xffffffff;
m->compute_static_thread_mgmt_se0 = wa_mask;
m->compute_static_thread_mgmt_se1 = wa_mask;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 42d8818..697b6d5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -303,6 +303,15 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
update_cu_mask(mm, mqd, minfo, 0);
set_priority(m, q);
+ if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2)) {
+ if (minfo->update_flag & UPDATE_FLAG_IS_GWS)
+ m->compute_resource_limits |=
+ COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+ else
+ m->compute_resource_limits &=
+ ~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+ }
+
q->is_active = QUEUE_IS_ACTIVE(*q);
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 677281c..80320b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -532,6 +532,7 @@ struct queue_properties {
enum mqd_update_flag {
UPDATE_FLAG_DBG_WA_ENABLE = 1,
UPDATE_FLAG_DBG_WA_DISABLE = 2,
+ UPDATE_FLAG_IS_GWS = 4, /* quirk for gfx9 IP */
};
struct mqd_update_info {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 43eff22..4858112 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -95,6 +95,7 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
void *gws)
{
+ struct mqd_update_info minfo = {0};
struct kfd_node *dev = NULL;
struct process_queue_node *pqn;
struct kfd_process_device *pdd;
@@ -146,9 +147,10 @@ int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
}
pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0;
+ minfo.update_flag = gws ? UPDATE_FLAG_IS_GWS : 0;
return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
- pqn->q, NULL);
+ pqn->q, &minfo);
}
void kfd_process_dequeue_from_all_devices(struct kfd_process *p)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index e5f7c92..6ed2ec3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1638,12 +1638,10 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext,
else
mode = UNKNOWN_MEMORY_PARTITION_MODE;
- if (pcache->cache_level == 2)
- pcache->cache_size = pcache_info[cache_type].cache_size * num_xcc;
- else if (mode)
- pcache->cache_size = pcache_info[cache_type].cache_size / mode;
- else
- pcache->cache_size = pcache_info[cache_type].cache_size;
+ pcache->cache_size = pcache_info[cache_type].cache_size;
+ /* Partition mode only affects L3 cache size */
+ if (mode && pcache->cache_level == 3)
+ pcache->cache_size /= mode;
if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
pcache->cache_type |= HSA_CACHE_TYPE_DATA;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index d292f29..1a9bbb0 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1843,21 +1843,12 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
DRM_ERROR("amdgpu: fail to register dmub aux callback");
goto error;
}
- if (!register_dmub_notify_callback(adev, DMUB_NOTIFICATION_HPD, dmub_hpd_callback, true)) {
- DRM_ERROR("amdgpu: fail to register dmub hpd callback");
- goto error;
- }
- if (!register_dmub_notify_callback(adev, DMUB_NOTIFICATION_HPD_IRQ, dmub_hpd_callback, true)) {
- DRM_ERROR("amdgpu: fail to register dmub hpd callback");
- goto error;
- }
- }
-
- /* Enable outbox notification only after IRQ handlers are registered and DMUB is alive.
- * It is expected that DMUB will resend any pending notifications at this point, for
- * example HPD from DPIA.
- */
- if (dc_is_dmub_outbox_supported(adev->dm.dc)) {
+ /* Enable outbox notification only after IRQ handlers are registered and DMUB is alive.
+ * It is expected that DMUB will resend any pending notifications at this point. Note
+ * that hpd and hpd_irq handler registration are deferred to register_hpd_handlers() to
+ * align legacy interface initialization sequence. Connection status will be proactivly
+ * detected once in the amdgpu_dm_initialize_drm_device.
+ */
dc_enable_dmub_outbox(adev->dm.dc);
/* DPIA trace goes to dmesg logs only if outbox is enabled */
@@ -1956,7 +1947,7 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
&adev->dm.dmub_bo_gpu_addr,
&adev->dm.dmub_bo_cpu_addr);
- if (adev->dm.hpd_rx_offload_wq) {
+ if (adev->dm.hpd_rx_offload_wq && adev->dm.dc) {
for (i = 0; i < adev->dm.dc->caps.max_links; i++) {
if (adev->dm.hpd_rx_offload_wq[i].wq) {
destroy_workqueue(adev->dm.hpd_rx_offload_wq[i].wq);
@@ -2287,6 +2278,7 @@ static int dm_sw_fini(void *handle)
if (adev->dm.dmub_srv) {
dmub_srv_destroy(adev->dm.dmub_srv);
+ kfree(adev->dm.dmub_srv);
adev->dm.dmub_srv = NULL;
}
@@ -3536,6 +3528,14 @@ static void register_hpd_handlers(struct amdgpu_device *adev)
int_params.requested_polarity = INTERRUPT_POLARITY_DEFAULT;
int_params.current_polarity = INTERRUPT_POLARITY_DEFAULT;
+ if (dc_is_dmub_outbox_supported(adev->dm.dc)) {
+ if (!register_dmub_notify_callback(adev, DMUB_NOTIFICATION_HPD, dmub_hpd_callback, true))
+ DRM_ERROR("amdgpu: fail to register dmub hpd callback");
+
+ if (!register_dmub_notify_callback(adev, DMUB_NOTIFICATION_HPD_IRQ, dmub_hpd_callback, true))
+ DRM_ERROR("amdgpu: fail to register dmub hpd callback");
+ }
+
list_for_each_entry(connector,
&dev->mode_config.connector_list, head) {
@@ -3564,10 +3564,6 @@ static void register_hpd_handlers(struct amdgpu_device *adev)
handle_hpd_rx_irq,
(void *) aconnector);
}
-
- if (adev->dm.hpd_rx_offload_wq)
- adev->dm.hpd_rx_offload_wq[connector->index].aconnector =
- aconnector;
}
}
@@ -4561,6 +4557,10 @@ static int amdgpu_dm_initialize_drm_device(struct amdgpu_device *adev)
goto fail;
}
+ if (dm->hpd_rx_offload_wq)
+ dm->hpd_rx_offload_wq[aconnector->base.index].aconnector =
+ aconnector;
+
if (!dc_link_detect_connection_type(link, &new_connection_type))
DRM_ERROR("KMS: Failed to detect connector\n");
@@ -5219,6 +5219,7 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
struct drm_plane_state *new_plane_state,
struct drm_crtc_state *crtc_state,
struct dc_flip_addrs *flip_addrs,
+ bool is_psr_su,
bool *dirty_regions_changed)
{
struct dm_crtc_state *dm_crtc_state = to_dm_crtc_state(crtc_state);
@@ -5243,6 +5244,10 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
num_clips = drm_plane_get_damage_clips_count(new_plane_state);
clips = drm_plane_get_damage_clips(new_plane_state);
+ if (num_clips && (!amdgpu_damage_clips || (amdgpu_damage_clips < 0 &&
+ is_psr_su)))
+ goto ffu;
+
if (!dm_crtc_state->mpo_requested) {
if (!num_clips || num_clips > DC_MAX_DIRTY_RECTS)
goto ffu;
@@ -6194,7 +6199,9 @@ create_stream_for_sink(struct drm_connector *connector,
if (recalculate_timing) {
freesync_mode = get_highest_refresh_rate_mode(aconnector, false);
drm_mode_copy(&saved_mode, &mode);
+ saved_mode.picture_aspect_ratio = mode.picture_aspect_ratio;
drm_mode_copy(&mode, freesync_mode);
+ mode.picture_aspect_ratio = saved_mode.picture_aspect_ratio;
} else {
decide_crtc_timing_for_drm_display_mode(
&mode, preferred_mode, scale);
@@ -6527,10 +6534,15 @@ amdgpu_dm_connector_late_register(struct drm_connector *connector)
static void amdgpu_dm_connector_funcs_force(struct drm_connector *connector)
{
struct amdgpu_dm_connector *aconnector = to_amdgpu_dm_connector(connector);
- struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector);
struct dc_link *dc_link = aconnector->dc_link;
struct dc_sink *dc_em_sink = aconnector->dc_em_sink;
struct edid *edid;
+ struct i2c_adapter *ddc;
+
+ if (dc_link && dc_link->aux_mode)
+ ddc = &aconnector->dm_dp_aux.aux.ddc;
+ else
+ ddc = &aconnector->i2c->base;
/*
* Note: drm_get_edid gets edid in the following order:
@@ -6538,7 +6550,7 @@ static void amdgpu_dm_connector_funcs_force(struct drm_connector *connector)
* 2) firmware EDID if set via edid_firmware module parameter
* 3) regular DDC read.
*/
- edid = drm_get_edid(connector, &amdgpu_connector->ddc_bus->aux.ddc);
+ edid = drm_get_edid(connector, ddc);
if (!edid) {
DRM_ERROR("No EDID found on connector: %s.\n", connector->name);
return;
@@ -6579,12 +6591,18 @@ static int get_modes(struct drm_connector *connector)
static void create_eml_sink(struct amdgpu_dm_connector *aconnector)
{
struct drm_connector *connector = &aconnector->base;
- struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(&aconnector->base);
+ struct dc_link *dc_link = aconnector->dc_link;
struct dc_sink_init_data init_params = {
.link = aconnector->dc_link,
.sink_signal = SIGNAL_TYPE_VIRTUAL
};
struct edid *edid;
+ struct i2c_adapter *ddc;
+
+ if (dc_link->aux_mode)
+ ddc = &aconnector->dm_dp_aux.aux.ddc;
+ else
+ ddc = &aconnector->i2c->base;
/*
* Note: drm_get_edid gets edid in the following order:
@@ -6592,7 +6610,7 @@ static void create_eml_sink(struct amdgpu_dm_connector *aconnector)
* 2) firmware EDID if set via edid_firmware module parameter
* 3) regular DDC read.
*/
- edid = drm_get_edid(connector, &amdgpu_connector->ddc_bus->aux.ddc);
+ edid = drm_get_edid(connector, ddc);
if (!edid) {
DRM_ERROR("No EDID found on connector: %s.\n", connector->name);
return;
@@ -8298,6 +8316,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
fill_dc_dirty_rects(plane, old_plane_state,
new_plane_state, new_crtc_state,
&bundle->flip_addrs[planes_count],
+ acrtc_state->stream->link->psr_settings.psr_version ==
+ DC_PSR_VERSION_SU_1,
&dirty_rects_changed);
/*
@@ -10731,11 +10751,13 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
goto fail;
}
- ret = compute_mst_dsc_configs_for_state(state, dm_state->context, vars);
- if (ret) {
- DRM_DEBUG_DRIVER("compute_mst_dsc_configs_for_state() failed\n");
- ret = -EINVAL;
- goto fail;
+ if (dc_resource_is_dsc_encoding_supported(dc)) {
+ ret = compute_mst_dsc_configs_for_state(state, dm_state->context, vars);
+ if (ret) {
+ DRM_DEBUG_DRIVER("compute_mst_dsc_configs_for_state() failed\n");
+ ret = -EINVAL;
+ goto fail;
+ }
}
ret = dm_update_mst_vcpi_slots_for_dsc(state, dm_state->context, vars);
@@ -11147,14 +11169,23 @@ void amdgpu_dm_update_freesync_caps(struct drm_connector *connector,
if (range->flags != 1)
continue;
- amdgpu_dm_connector->min_vfreq = range->min_vfreq;
- amdgpu_dm_connector->max_vfreq = range->max_vfreq;
- amdgpu_dm_connector->pixel_clock_mhz =
- range->pixel_clock_mhz * 10;
-
connector->display_info.monitor_range.min_vfreq = range->min_vfreq;
connector->display_info.monitor_range.max_vfreq = range->max_vfreq;
+ if (edid->revision >= 4) {
+ if (data->pad2 & DRM_EDID_RANGE_OFFSET_MIN_VFREQ)
+ connector->display_info.monitor_range.min_vfreq += 255;
+ if (data->pad2 & DRM_EDID_RANGE_OFFSET_MAX_VFREQ)
+ connector->display_info.monitor_range.max_vfreq += 255;
+ }
+
+ amdgpu_dm_connector->min_vfreq =
+ connector->display_info.monitor_range.min_vfreq;
+ amdgpu_dm_connector->max_vfreq =
+ connector->display_info.monitor_range.max_vfreq;
+ amdgpu_dm_connector->pixel_clock_mhz =
+ range->pixel_clock_mhz * 10;
+
break;
}
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
index 85b7f58..c270633 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
@@ -67,6 +67,8 @@ static void apply_edid_quirks(struct edid *edid, struct dc_edid_caps *edid_caps)
/* Workaround for some monitors that do not clear DPCD 0x317 if FreeSync is unsupported */
case drm_edid_encode_panel_id('A', 'U', 'O', 0xA7AB):
case drm_edid_encode_panel_id('A', 'U', 'O', 0xE69B):
+ case drm_edid_encode_panel_id('B', 'O', 'E', 0x092A):
+ case drm_edid_encode_panel_id('L', 'G', 'D', 0x06D1):
DRM_DEBUG_DRIVER("Clearing DPCD 0x317 on monitor with panel id %X\n", panel_id);
edid_caps->panel_patch.remove_sink_ext_caps = true;
break;
@@ -120,6 +122,8 @@ enum dc_edid_status dm_helpers_parse_edid_caps(
edid_caps->edid_hdmi = connector->display_info.is_hdmi;
+ apply_edid_quirks(edid_buf, edid_caps);
+
sad_count = drm_edid_to_sad((struct edid *) edid->raw_edid, &sads);
if (sad_count <= 0)
return result;
@@ -146,8 +150,6 @@ enum dc_edid_status dm_helpers_parse_edid_caps(
else
edid_caps->speaker_flags = DEFAULT_SPEAKER_LOCATION;
- apply_edid_quirks(edid_buf, edid_caps);
-
kfree(sads);
kfree(sadb);
diff --git a/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c b/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
index f2dfa96..39530b2 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
@@ -94,7 +94,7 @@ static void calculate_bandwidth(
const uint32_t s_high = 7;
const uint32_t dmif_chunk_buff_margin = 1;
- uint32_t max_chunks_fbc_mode;
+ uint32_t max_chunks_fbc_mode = 0;
int32_t num_cursor_lines;
int32_t i, j, k;
diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
index 960c4b4..05f3925 100644
--- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
+++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
@@ -1850,19 +1850,21 @@ static enum bp_result get_firmware_info_v3_2(
/* Vega12 */
smu_info_v3_2 = GET_IMAGE(struct atom_smu_info_v3_2,
DATA_TABLES(smu_info));
- DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", smu_info_v3_2->gpuclk_ss_percentage);
if (!smu_info_v3_2)
return BP_RESULT_BADBIOSTABLE;
+ DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", smu_info_v3_2->gpuclk_ss_percentage);
+
info->default_engine_clk = smu_info_v3_2->bootup_dcefclk_10khz * 10;
} else if (revision.minor == 3) {
/* Vega20 */
smu_info_v3_3 = GET_IMAGE(struct atom_smu_info_v3_3,
DATA_TABLES(smu_info));
- DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", smu_info_v3_3->gpuclk_ss_percentage);
if (!smu_info_v3_3)
return BP_RESULT_BADBIOSTABLE;
+ DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", smu_info_v3_3->gpuclk_ss_percentage);
+
info->default_engine_clk = smu_info_v3_3->bootup_dcefclk_10khz * 10;
}
@@ -2422,10 +2424,11 @@ static enum bp_result get_integrated_info_v11(
info_v11 = GET_IMAGE(struct atom_integrated_system_info_v1_11,
DATA_TABLES(integratedsysteminfo));
- DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v11->gpuclk_ss_percentage);
if (info_v11 == NULL)
return BP_RESULT_BADBIOSTABLE;
+ DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v11->gpuclk_ss_percentage);
+
info->gpu_cap_info =
le32_to_cpu(info_v11->gpucapinfo);
/*
@@ -2637,11 +2640,12 @@ static enum bp_result get_integrated_info_v2_1(
info_v2_1 = GET_IMAGE(struct atom_integrated_system_info_v2_1,
DATA_TABLES(integratedsysteminfo));
- DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v2_1->gpuclk_ss_percentage);
if (info_v2_1 == NULL)
return BP_RESULT_BADBIOSTABLE;
+ DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v2_1->gpuclk_ss_percentage);
+
info->gpu_cap_info =
le32_to_cpu(info_v2_1->gpucapinfo);
/*
@@ -2799,11 +2803,11 @@ static enum bp_result get_integrated_info_v2_2(
info_v2_2 = GET_IMAGE(struct atom_integrated_system_info_v2_2,
DATA_TABLES(integratedsysteminfo));
- DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v2_2->gpuclk_ss_percentage);
-
if (info_v2_2 == NULL)
return BP_RESULT_BADBIOSTABLE;
+ DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v2_2->gpuclk_ss_percentage);
+
info->gpu_cap_info =
le32_to_cpu(info_v2_2->gpucapinfo);
/*
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
index a5489fe..aa9fd1d 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
@@ -546,6 +546,8 @@ static unsigned int find_dcfclk_for_voltage(const struct vg_dpm_clocks *clock_ta
int i;
for (i = 0; i < VG_NUM_SOC_VOLTAGE_LEVELS; i++) {
+ if (i >= VG_NUM_DCFCLK_DPM_LEVELS)
+ break;
if (clock_table->SocVoltage[i] == voltage)
return clock_table->DcfClocks[i];
}
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
index 14cec1c..e648902 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
@@ -655,10 +655,13 @@ static void dcn35_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *clk
struct clk_limit_table_entry def_max = bw_params->clk_table.entries[bw_params->clk_table.num_entries - 1];
uint32_t max_fclk = 0, min_pstate = 0, max_dispclk = 0, max_dppclk = 0;
uint32_t max_pstate = 0, max_dram_speed_mts = 0, min_dram_speed_mts = 0;
+ uint32_t num_memps, num_fclk, num_dcfclk;
int i;
/* Determine min/max p-state values. */
- for (i = 0; i < clock_table->NumMemPstatesEnabled; i++) {
+ num_memps = (clock_table->NumMemPstatesEnabled > NUM_MEM_PSTATE_LEVELS) ? NUM_MEM_PSTATE_LEVELS :
+ clock_table->NumMemPstatesEnabled;
+ for (i = 0; i < num_memps; i++) {
uint32_t dram_speed_mts = calc_dram_speed_mts(&clock_table->MemPstateTable[i]);
if (is_valid_clock_value(dram_speed_mts) && dram_speed_mts > max_dram_speed_mts) {
@@ -670,7 +673,7 @@ static void dcn35_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *clk
min_dram_speed_mts = max_dram_speed_mts;
min_pstate = max_pstate;
- for (i = 0; i < clock_table->NumMemPstatesEnabled; i++) {
+ for (i = 0; i < num_memps; i++) {
uint32_t dram_speed_mts = calc_dram_speed_mts(&clock_table->MemPstateTable[i]);
if (is_valid_clock_value(dram_speed_mts) && dram_speed_mts < min_dram_speed_mts) {
@@ -699,9 +702,13 @@ static void dcn35_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *clk
/* Base the clock table on dcfclk, need at least one entry regardless of pmfw table */
ASSERT(clock_table->NumDcfClkLevelsEnabled > 0);
- max_fclk = find_max_clk_value(clock_table->FclkClocks_Freq, clock_table->NumFclkLevelsEnabled);
+ num_fclk = (clock_table->NumFclkLevelsEnabled > NUM_FCLK_DPM_LEVELS) ? NUM_FCLK_DPM_LEVELS :
+ clock_table->NumFclkLevelsEnabled;
+ max_fclk = find_max_clk_value(clock_table->FclkClocks_Freq, num_fclk);
- for (i = 0; i < clock_table->NumDcfClkLevelsEnabled; i++) {
+ num_dcfclk = (clock_table->NumFclkLevelsEnabled > NUM_DCFCLK_DPM_LEVELS) ? NUM_DCFCLK_DPM_LEVELS :
+ clock_table->NumDcfClkLevelsEnabled;
+ for (i = 0; i < num_dcfclk; i++) {
int j;
/* First search defaults for the clocks we don't read using closest lower or equal default dcfclk */
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index aa7c02b..2c424e4 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -3817,7 +3817,9 @@ static void commit_planes_for_stream(struct dc *dc,
* programming has completed (we turn on phantom OTG in order
* to complete the plane disable for phantom pipes).
*/
- dc->hwss.apply_ctx_to_hw(dc, context);
+
+ if (dc->hwss.disable_phantom_streams)
+ dc->hwss.disable_phantom_streams(dc, context);
}
if (update_type != UPDATE_TYPE_FAST)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_state.c b/drivers/gpu/drm/amd/display/dc/core/dc_state.c
index 88c6436..180ac47 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_state.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_state.c
@@ -291,11 +291,14 @@ void dc_state_destruct(struct dc_state *state)
dc_stream_release(state->phantom_streams[i]);
state->phantom_streams[i] = NULL;
}
+ state->phantom_stream_count = 0;
for (i = 0; i < state->phantom_plane_count; i++) {
dc_plane_state_release(state->phantom_planes[i]);
state->phantom_planes[i] = NULL;
}
+ state->phantom_plane_count = 0;
+
state->stream_mask = 0;
memset(&state->res_ctx, 0, sizeof(state->res_ctx));
memset(&state->pp_display_cfg, 0, sizeof(state->pp_display_cfg));
diff --git a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
index 2b79a0e..363d522 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
+++ b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
@@ -125,7 +125,7 @@ bool dc_dmub_srv_cmd_list_queue_execute(struct dc_dmub_srv *dc_dmub_srv,
unsigned int count,
union dmub_rb_cmd *cmd_list)
{
- struct dc_context *dc_ctx = dc_dmub_srv->ctx;
+ struct dc_context *dc_ctx;
struct dmub_srv *dmub;
enum dmub_status status;
int i;
@@ -133,6 +133,7 @@ bool dc_dmub_srv_cmd_list_queue_execute(struct dc_dmub_srv *dc_dmub_srv,
if (!dc_dmub_srv || !dc_dmub_srv->dmub)
return false;
+ dc_ctx = dc_dmub_srv->ctx;
dmub = dc_dmub_srv->dmub;
for (i = 0 ; i < count; i++) {
@@ -1161,7 +1162,7 @@ void dc_dmub_srv_subvp_save_surf_addr(const struct dc_dmub_srv *dc_dmub_srv, con
bool dc_dmub_srv_is_hw_pwr_up(struct dc_dmub_srv *dc_dmub_srv, bool wait)
{
- struct dc_context *dc_ctx = dc_dmub_srv->ctx;
+ struct dc_context *dc_ctx;
enum dmub_status status;
if (!dc_dmub_srv || !dc_dmub_srv->dmub)
@@ -1170,6 +1171,8 @@ bool dc_dmub_srv_is_hw_pwr_up(struct dc_dmub_srv *dc_dmub_srv, bool wait)
if (dc_dmub_srv->ctx->dc->debug.dmcub_emulation)
return true;
+ dc_ctx = dc_dmub_srv->ctx;
+
if (wait) {
if (dc_dmub_srv->ctx->dc->debug.disable_timeout) {
do {
diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_panel_cntl.c b/drivers/gpu/drm/amd/display/dc/dce/dce_panel_cntl.c
index e857006..5bca674 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_panel_cntl.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_panel_cntl.c
@@ -290,4 +290,5 @@ void dce_panel_cntl_construct(
dce_panel_cntl->base.funcs = &dce_link_panel_cntl_funcs;
dce_panel_cntl->base.ctx = init_data->ctx;
dce_panel_cntl->base.inst = init_data->inst;
+ dce_panel_cntl->base.pwrseq_inst = 0;
}
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp_cm.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp_cm.c
index e43f77c..5f97a86 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp_cm.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp_cm.c
@@ -56,16 +56,13 @@ static void dpp3_enable_cm_block(
static enum dc_lut_mode dpp30_get_gamcor_current(struct dpp *dpp_base)
{
- enum dc_lut_mode mode;
+ enum dc_lut_mode mode = LUT_BYPASS;
uint32_t state_mode;
uint32_t lut_mode;
struct dcn3_dpp *dpp = TO_DCN30_DPP(dpp_base);
REG_GET(CM_GAMCOR_CONTROL, CM_GAMCOR_MODE_CURRENT, &state_mode);
- if (state_mode == 0)
- mode = LUT_BYPASS;
-
if (state_mode == 2) {//Programmable RAM LUT
REG_GET(CM_GAMCOR_CONTROL, CM_GAMCOR_SELECT_CURRENT, &lut_mode);
if (lut_mode == 0)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_panel_cntl.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_panel_cntl.c
index ad0df1a..9e96a3a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_panel_cntl.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_panel_cntl.c
@@ -215,4 +215,5 @@ void dcn301_panel_cntl_construct(
dcn301_panel_cntl->base.funcs = &dcn301_link_panel_cntl_funcs;
dcn301_panel_cntl->base.ctx = init_data->ctx;
dcn301_panel_cntl->base.inst = init_data->inst;
+ dcn301_panel_cntl->base.pwrseq_inst = 0;
}
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c
index 0324842..281be20 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c
@@ -154,8 +154,24 @@ void dcn31_panel_cntl_construct(
struct dcn31_panel_cntl *dcn31_panel_cntl,
const struct panel_cntl_init_data *init_data)
{
+ uint8_t pwrseq_inst = 0xF;
+
dcn31_panel_cntl->base.funcs = &dcn31_link_panel_cntl_funcs;
dcn31_panel_cntl->base.ctx = init_data->ctx;
dcn31_panel_cntl->base.inst = init_data->inst;
- dcn31_panel_cntl->base.pwrseq_inst = init_data->pwrseq_inst;
+
+ switch (init_data->eng_id) {
+ case ENGINE_ID_DIGA:
+ pwrseq_inst = 0;
+ break;
+ case ENGINE_ID_DIGB:
+ pwrseq_inst = 1;
+ break;
+ default:
+ DC_LOG_WARNING("Unsupported pwrseq engine id: %d!\n", init_data->eng_id);
+ ASSERT(false);
+ break;
+ }
+
+ dcn31_panel_cntl->base.pwrseq_inst = pwrseq_inst;
}
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 6042a5a..59ade76 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -72,11 +72,11 @@
CFLAGS_$(AMDDALPATH)/dc/dml/display_mode_vba.o := $(dml_ccflags)
CFLAGS_$(AMDDALPATH)/dc/dml/dcn10/dcn10_fpu.o := $(dml_ccflags)
CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/dcn20_fpu.o := $(dml_ccflags)
-CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags)
+CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags) $(frame_warn_flag)
CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20.o := $(dml_ccflags)
-CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags)
+CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags) $(frame_warn_flag)
CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20v2.o := $(dml_ccflags)
-CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags)
+CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags) $(frame_warn_flag)
CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_rq_dlg_calc_21.o := $(dml_ccflags)
CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_mode_vba_30.o := $(dml_ccflags) $(frame_warn_flag)
CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_rq_dlg_calc_30.o := $(dml_ccflags)
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index dd781a2..a0a65e0 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -1288,7 +1288,7 @@ static bool update_pipes_with_split_flags(struct dc *dc, struct dc_state *contex
return updated;
}
-static bool should_allow_odm_power_optimization(struct dc *dc,
+static bool should_apply_odm_power_optimization(struct dc *dc,
struct dc_state *context, struct vba_vars_st *v, int *split,
bool *merge)
{
@@ -1392,9 +1392,12 @@ static void try_odm_power_optimization_and_revalidate(
{
int i;
unsigned int new_vlevel;
+ unsigned int cur_policy[MAX_PIPES];
- for (i = 0; i < pipe_cnt; i++)
+ for (i = 0; i < pipe_cnt; i++) {
+ cur_policy[i] = pipes[i].pipe.dest.odm_combine_policy;
pipes[i].pipe.dest.odm_combine_policy = dm_odm_combine_policy_2to1;
+ }
new_vlevel = dml_get_voltage_level(&context->bw_ctx.dml, pipes, pipe_cnt);
@@ -1403,6 +1406,9 @@ static void try_odm_power_optimization_and_revalidate(
memset(merge, 0, MAX_PIPES * sizeof(bool));
*vlevel = dcn20_validate_apply_pipe_split_flags(dc, context, new_vlevel, split, merge);
context->bw_ctx.dml.vba.VoltageLevel = *vlevel;
+ } else {
+ for (i = 0; i < pipe_cnt; i++)
+ pipes[i].pipe.dest.odm_combine_policy = cur_policy[i];
}
}
@@ -1580,7 +1586,7 @@ static void dcn32_full_validate_bw_helper(struct dc *dc,
}
}
- if (should_allow_odm_power_optimization(dc, context, vba, split, merge))
+ if (should_apply_odm_power_optimization(dc, context, vba, split, merge))
try_odm_power_optimization_and_revalidate(
dc, context, pipes, split, merge, vlevel, *pipe_cnt);
@@ -2209,7 +2215,8 @@ bool dcn32_internal_validate_bw(struct dc *dc,
int i;
pipe_cnt = dc->res_pool->funcs->populate_dml_pipes(dc, context, pipes, fast_validate);
- dcn32_update_dml_pipes_odm_policy_based_on_context(dc, context, pipes);
+ if (!dc->config.enable_windowed_mpo_odm)
+ dcn32_update_dml_pipes_odm_policy_based_on_context(dc, context, pipes);
/* repopulate_pipes = 1 means the pipes were either split or merged. In this case
* we have to re-calculate the DET allocation and run through DML once more to
@@ -2753,7 +2760,7 @@ static int build_synthetic_soc_states(bool disable_dc_mode_overwrite, struct clk
struct _vcs_dpi_voltage_scaling_st entry = {0};
struct clk_limit_table_entry max_clk_data = {0};
- unsigned int min_dcfclk_mhz = 399, min_fclk_mhz = 599;
+ unsigned int min_dcfclk_mhz = 199, min_fclk_mhz = 299;
static const unsigned int num_dcfclk_stas = 5;
unsigned int dcfclk_sta_targets[DC__VOLTAGE_STATES] = {199, 615, 906, 1324, 1564};
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c
index 23a6082..1ba6933 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c
@@ -398,7 +398,6 @@ void dml2_init_soc_states(struct dml2_context *dml2, const struct dc *in_dc,
/* Copy clocks tables entries, if available */
if (dml2->config.bbox_overrides.clks_table.num_states) {
p->in_states->num_states = dml2->config.bbox_overrides.clks_table.num_states;
-
for (i = 0; i < dml2->config.bbox_overrides.clks_table.num_entries_per_clk.num_dcfclk_levels; i++) {
p->in_states->state_array[i].dcfclk_mhz = dml2->config.bbox_overrides.clks_table.clk_entries[i].dcfclk_mhz;
}
@@ -437,6 +436,14 @@ void dml2_init_soc_states(struct dml2_context *dml2, const struct dc *in_dc,
}
dml2_policy_build_synthetic_soc_states(s, p);
+ if (dml2->v20.dml_core_ctx.project == dml_project_dcn35 ||
+ dml2->v20.dml_core_ctx.project == dml_project_dcn351) {
+ // Override last out_state with data from last in_state
+ // This will ensure that out_state contains max fclk
+ memcpy(&p->out_states->state_array[p->out_states->num_states - 1],
+ &p->in_states->state_array[p->in_states->num_states - 1],
+ sizeof(struct soc_state_bounding_box_st));
+ }
}
void dml2_translate_ip_params(const struct dc *in, struct ip_params_st *out)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c
index 26307e5..2a58a76 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c
@@ -76,6 +76,11 @@ static void map_hw_resources(struct dml2_context *dml2,
in_out_display_cfg->hw.DLGRefClkFreqMHz = 50;
}
for (j = 0; j < mode_support_info->DPPPerSurface[i]; j++) {
+ if (i >= __DML2_WRAPPER_MAX_STREAMS_PLANES__) {
+ dml_print("DML::%s: Index out of bounds: i=%d, __DML2_WRAPPER_MAX_STREAMS_PLANES__=%d\n",
+ __func__, i, __DML2_WRAPPER_MAX_STREAMS_PLANES__);
+ break;
+ }
dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_stream_id[num_pipes] = dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_stream_id[i];
dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_stream_id_valid[num_pipes] = true;
dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_plane_id[num_pipes] = dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_plane_id[i];
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
index 2352428..01493c4 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
@@ -1476,7 +1476,7 @@ static enum dc_status dce110_enable_stream_timing(
return DC_OK;
}
-static enum dc_status apply_single_controller_ctx_to_hw(
+enum dc_status dce110_apply_single_controller_ctx_to_hw(
struct pipe_ctx *pipe_ctx,
struct dc_state *context,
struct dc *dc)
@@ -2302,7 +2302,7 @@ enum dc_status dce110_apply_ctx_to_hw(
if (pipe_ctx->top_pipe || pipe_ctx->prev_odm_pipe)
continue;
- status = apply_single_controller_ctx_to_hw(
+ status = dce110_apply_single_controller_ctx_to_hw(
pipe_ctx,
context,
dc);
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.h b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.h
index 08028a1..ed3cc36 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.h
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.h
@@ -39,6 +39,10 @@ enum dc_status dce110_apply_ctx_to_hw(
struct dc *dc,
struct dc_state *context);
+enum dc_status dce110_apply_single_controller_ctx_to_hw(
+ struct pipe_ctx *pipe_ctx,
+ struct dc_state *context,
+ struct dc *dc);
void dce110_enable_stream(struct pipe_ctx *pipe_ctx);
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c
index 4853ecac..931ac8e 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c
@@ -2561,7 +2561,7 @@ void dcn20_setup_vupdate_interrupt(struct dc *dc, struct pipe_ctx *pipe_ctx)
tg->funcs->setup_vertical_interrupt2(tg, start_line);
}
-static void dcn20_reset_back_end_for_pipe(
+void dcn20_reset_back_end_for_pipe(
struct dc *dc,
struct pipe_ctx *pipe_ctx,
struct dc_state *context)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.h b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.h
index b94c853..d950b3e 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.h
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.h
@@ -84,6 +84,10 @@ enum dc_status dcn20_enable_stream_timing(
void dcn20_disable_stream_gating(struct dc *dc, struct pipe_ctx *pipe_ctx);
void dcn20_enable_stream_gating(struct dc *dc, struct pipe_ctx *pipe_ctx);
void dcn20_setup_vupdate_interrupt(struct dc *dc, struct pipe_ctx *pipe_ctx);
+void dcn20_reset_back_end_for_pipe(
+ struct dc *dc,
+ struct pipe_ctx *pipe_ctx,
+ struct dc_state *context);
void dcn20_init_blank(
struct dc *dc,
struct timing_generator *tg);
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
index 8e88dca..7252f5f 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
@@ -206,28 +206,32 @@ void dcn21_set_abm_immediate_disable(struct pipe_ctx *pipe_ctx)
void dcn21_set_pipe(struct pipe_ctx *pipe_ctx)
{
struct abm *abm = pipe_ctx->stream_res.abm;
- uint32_t otg_inst = pipe_ctx->stream_res.tg->inst;
+ struct timing_generator *tg = pipe_ctx->stream_res.tg;
struct panel_cntl *panel_cntl = pipe_ctx->stream->link->panel_cntl;
struct dmcu *dmcu = pipe_ctx->stream->ctx->dc->res_pool->dmcu;
+ uint32_t otg_inst;
+
+ if (!abm || !tg || !panel_cntl)
+ return;
+
+ otg_inst = tg->inst;
if (dmcu) {
dce110_set_pipe(pipe_ctx);
return;
}
- if (abm && panel_cntl) {
- if (abm->funcs && abm->funcs->set_pipe_ex) {
- abm->funcs->set_pipe_ex(abm,
+ if (abm->funcs && abm->funcs->set_pipe_ex) {
+ abm->funcs->set_pipe_ex(abm,
otg_inst,
SET_ABM_PIPE_NORMAL,
panel_cntl->inst,
panel_cntl->pwrseq_inst);
- } else {
- dmub_abm_set_pipe(abm, otg_inst,
- SET_ABM_PIPE_NORMAL,
- panel_cntl->inst,
- panel_cntl->pwrseq_inst);
- }
+ } else {
+ dmub_abm_set_pipe(abm, otg_inst,
+ SET_ABM_PIPE_NORMAL,
+ panel_cntl->inst,
+ panel_cntl->pwrseq_inst);
}
}
@@ -237,34 +241,35 @@ bool dcn21_set_backlight_level(struct pipe_ctx *pipe_ctx,
{
struct dc_context *dc = pipe_ctx->stream->ctx;
struct abm *abm = pipe_ctx->stream_res.abm;
+ struct timing_generator *tg = pipe_ctx->stream_res.tg;
struct panel_cntl *panel_cntl = pipe_ctx->stream->link->panel_cntl;
+ uint32_t otg_inst;
+
+ if (!abm || !tg || !panel_cntl)
+ return false;
+
+ otg_inst = tg->inst;
if (dc->dc->res_pool->dmcu) {
dce110_set_backlight_level(pipe_ctx, backlight_pwm_u16_16, frame_ramp);
return true;
}
- if (abm != NULL) {
- uint32_t otg_inst = pipe_ctx->stream_res.tg->inst;
-
- if (abm && panel_cntl) {
- if (abm->funcs && abm->funcs->set_pipe_ex) {
- abm->funcs->set_pipe_ex(abm,
- otg_inst,
- SET_ABM_PIPE_NORMAL,
- panel_cntl->inst,
- panel_cntl->pwrseq_inst);
- } else {
- dmub_abm_set_pipe(abm,
- otg_inst,
- SET_ABM_PIPE_NORMAL,
- panel_cntl->inst,
- panel_cntl->pwrseq_inst);
- }
- }
+ if (abm->funcs && abm->funcs->set_pipe_ex) {
+ abm->funcs->set_pipe_ex(abm,
+ otg_inst,
+ SET_ABM_PIPE_NORMAL,
+ panel_cntl->inst,
+ panel_cntl->pwrseq_inst);
+ } else {
+ dmub_abm_set_pipe(abm,
+ otg_inst,
+ SET_ABM_PIPE_NORMAL,
+ panel_cntl->inst,
+ panel_cntl->pwrseq_inst);
}
- if (abm && abm->funcs && abm->funcs->set_backlight_level_pwm)
+ if (abm->funcs && abm->funcs->set_backlight_level_pwm)
abm->funcs->set_backlight_level_pwm(abm, backlight_pwm_u16_16,
frame_ramp, 0, panel_cntl->inst);
else
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
index 6c9299c..aa36d7a 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
@@ -1474,9 +1474,44 @@ void dcn32_update_dsc_pg(struct dc *dc,
}
}
+void dcn32_disable_phantom_streams(struct dc *dc, struct dc_state *context)
+{
+ struct dce_hwseq *hws = dc->hwseq;
+ int i;
+
+ for (i = dc->res_pool->pipe_count - 1; i >= 0 ; i--) {
+ struct pipe_ctx *pipe_ctx_old =
+ &dc->current_state->res_ctx.pipe_ctx[i];
+ struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[i];
+
+ if (!pipe_ctx_old->stream)
+ continue;
+
+ if (dc_state_get_pipe_subvp_type(dc->current_state, pipe_ctx_old) != SUBVP_PHANTOM)
+ continue;
+
+ if (pipe_ctx_old->top_pipe || pipe_ctx_old->prev_odm_pipe)
+ continue;
+
+ if (!pipe_ctx->stream || pipe_need_reprogram(pipe_ctx_old, pipe_ctx) ||
+ (pipe_ctx->stream && dc_state_get_pipe_subvp_type(context, pipe_ctx) != SUBVP_PHANTOM)) {
+ struct clock_source *old_clk = pipe_ctx_old->clock_source;
+
+ if (hws->funcs.reset_back_end_for_pipe)
+ hws->funcs.reset_back_end_for_pipe(dc, pipe_ctx_old, dc->current_state);
+ if (hws->funcs.enable_stream_gating)
+ hws->funcs.enable_stream_gating(dc, pipe_ctx_old);
+ if (old_clk)
+ old_clk->funcs->cs_power_down(old_clk);
+ }
+ }
+}
+
void dcn32_enable_phantom_streams(struct dc *dc, struct dc_state *context)
{
unsigned int i;
+ enum dc_status status = DC_OK;
+ struct dce_hwseq *hws = dc->hwseq;
for (i = 0; i < dc->res_pool->pipe_count; i++) {
struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
@@ -1497,16 +1532,39 @@ void dcn32_enable_phantom_streams(struct dc *dc, struct dc_state *context)
}
}
for (i = 0; i < dc->res_pool->pipe_count; i++) {
- struct pipe_ctx *new_pipe = &context->res_ctx.pipe_ctx[i];
+ struct pipe_ctx *pipe_ctx_old =
+ &dc->current_state->res_ctx.pipe_ctx[i];
+ struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[i];
- if (new_pipe->stream && dc_state_get_pipe_subvp_type(context, new_pipe) == SUBVP_PHANTOM) {
- // If old context or new context has phantom pipes, apply
- // the phantom timings now. We can't change the phantom
- // pipe configuration safely without driver acquiring
- // the DMCUB lock first.
- dc->hwss.apply_ctx_to_hw(dc, context);
- break;
+ if (pipe_ctx->stream == NULL)
+ continue;
+
+ if (dc_state_get_pipe_subvp_type(context, pipe_ctx) != SUBVP_PHANTOM)
+ continue;
+
+ if (pipe_ctx->stream == pipe_ctx_old->stream &&
+ pipe_ctx->stream->link->link_state_valid) {
+ continue;
}
+
+ if (pipe_ctx_old->stream && !pipe_need_reprogram(pipe_ctx_old, pipe_ctx))
+ continue;
+
+ if (pipe_ctx->top_pipe || pipe_ctx->prev_odm_pipe)
+ continue;
+
+ if (hws->funcs.apply_single_controller_ctx_to_hw)
+ status = hws->funcs.apply_single_controller_ctx_to_hw(
+ pipe_ctx,
+ context,
+ dc);
+
+ ASSERT(status == DC_OK);
+
+#ifdef CONFIG_DRM_AMD_DC_FP
+ if (hws->funcs.resync_fifo_dccg_dio)
+ hws->funcs.resync_fifo_dccg_dio(hws, dc, context);
+#endif
}
}
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.h b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.h
index cecf7f0..069e20b 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.h
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.h
@@ -111,6 +111,8 @@ void dcn32_update_dsc_pg(struct dc *dc,
void dcn32_enable_phantom_streams(struct dc *dc, struct dc_state *context);
+void dcn32_disable_phantom_streams(struct dc *dc, struct dc_state *context);
+
void dcn32_init_blank(
struct dc *dc,
struct timing_generator *tg);
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c
index 427cfc8..e8ac94a 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c
@@ -109,6 +109,7 @@ static const struct hw_sequencer_funcs dcn32_funcs = {
.get_dcc_en_bits = dcn10_get_dcc_en_bits,
.commit_subvp_config = dcn32_commit_subvp_config,
.enable_phantom_streams = dcn32_enable_phantom_streams,
+ .disable_phantom_streams = dcn32_disable_phantom_streams,
.subvp_pipe_control_lock = dcn32_subvp_pipe_control_lock,
.update_visual_confirm_color = dcn10_update_visual_confirm_color,
.subvp_pipe_control_lock_fast = dcn32_subvp_pipe_control_lock_fast,
@@ -159,6 +160,8 @@ static const struct hwseq_private_funcs dcn32_private_funcs = {
.set_pixels_per_cycle = dcn32_set_pixels_per_cycle,
.resync_fifo_dccg_dio = dcn32_resync_fifo_dccg_dio,
.is_dp_dig_pixel_rate_div_policy = dcn32_is_dp_dig_pixel_rate_div_policy,
+ .apply_single_controller_ctx_to_hw = dce110_apply_single_controller_ctx_to_hw,
+ .reset_back_end_for_pipe = dcn20_reset_back_end_for_pipe,
};
void dcn32_hw_sequencer_init_functions(struct dc *dc)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h b/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h
index a543993..64ca7c6 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h
+++ b/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h
@@ -379,6 +379,7 @@ struct hw_sequencer_funcs {
struct dc_cursor_attributes *cursor_attr);
void (*commit_subvp_config)(struct dc *dc, struct dc_state *context);
void (*enable_phantom_streams)(struct dc *dc, struct dc_state *context);
+ void (*disable_phantom_streams)(struct dc *dc, struct dc_state *context);
void (*subvp_pipe_control_lock)(struct dc *dc,
struct dc_state *context,
bool lock,
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer_private.h b/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer_private.h
index 6137cf0..b3c62a8 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer_private.h
+++ b/drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer_private.h
@@ -165,8 +165,15 @@ struct hwseq_private_funcs {
void (*set_pixels_per_cycle)(struct pipe_ctx *pipe_ctx);
void (*resync_fifo_dccg_dio)(struct dce_hwseq *hws, struct dc *dc,
struct dc_state *context);
+ enum dc_status (*apply_single_controller_ctx_to_hw)(
+ struct pipe_ctx *pipe_ctx,
+ struct dc_state *context,
+ struct dc *dc);
bool (*is_dp_dig_pixel_rate_div_policy)(struct pipe_ctx *pipe_ctx);
#endif
+ void (*reset_back_end_for_pipe)(struct dc *dc,
+ struct pipe_ctx *pipe_ctx,
+ struct dc_state *context);
};
struct dce_hwseq {
diff --git a/drivers/gpu/drm/amd/display/dc/inc/hw/panel_cntl.h b/drivers/gpu/drm/amd/display/dc/inc/hw/panel_cntl.h
index 5dcbaa2..e97d964 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/hw/panel_cntl.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/hw/panel_cntl.h
@@ -57,7 +57,7 @@ struct panel_cntl_funcs {
struct panel_cntl_init_data {
struct dc_context *ctx;
uint32_t inst;
- uint32_t pwrseq_inst;
+ uint32_t eng_id;
};
struct panel_cntl {
diff --git a/drivers/gpu/drm/amd/display/dc/inc/resource.h b/drivers/gpu/drm/amd/display/dc/inc/resource.h
index c958ef3..77a60aa 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/resource.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/resource.h
@@ -427,22 +427,18 @@ struct pipe_ctx *resource_get_primary_dpp_pipe(const struct pipe_ctx *dpp_pipe);
int resource_get_mpc_slice_index(const struct pipe_ctx *dpp_pipe);
/*
- * Get number of MPC "cuts" of the plane associated with the pipe. MPC slice
- * count is equal to MPC splits + 1. For example if a plane is cut 3 times, it
- * will have 4 pieces of slice.
- * return - 0 if pipe is not used for a plane with MPCC combine. otherwise
- * the number of MPC "cuts" for the plane.
+ * Get the number of MPC slices associated with the pipe.
+ * The function returns 0 if the pipe is not associated with an MPC combine
+ * pipe topology.
*/
-int resource_get_mpc_slice_count(const struct pipe_ctx *opp_head);
+int resource_get_mpc_slice_count(const struct pipe_ctx *pipe);
/*
- * Get number of ODM "cuts" of the timing associated with the pipe. ODM slice
- * count is equal to ODM splits + 1. For example if a timing is cut 3 times, it
- * will have 4 pieces of slice.
- * return - 0 if pipe is not used for ODM combine. otherwise
- * the number of ODM "cuts" for the timing.
+ * Get the number of ODM slices associated with the pipe.
+ * The function returns 0 if the pipe is not associated with an ODM combine
+ * pipe topology.
*/
-int resource_get_odm_slice_count(const struct pipe_ctx *otg_master);
+int resource_get_odm_slice_count(const struct pipe_ctx *pipe);
/* Get the ODM slice index counting from 0 from left most slice */
int resource_get_odm_slice_index(const struct pipe_ctx *opp_head);
diff --git a/drivers/gpu/drm/amd/display/dc/link/link_factory.c b/drivers/gpu/drm/amd/display/dc/link/link_factory.c
index 37d3027..cf22b8f 100644
--- a/drivers/gpu/drm/amd/display/dc/link/link_factory.c
+++ b/drivers/gpu/drm/amd/display/dc/link/link_factory.c
@@ -370,30 +370,6 @@ static enum transmitter translate_encoder_to_transmitter(
}
}
-static uint8_t translate_dig_inst_to_pwrseq_inst(struct dc_link *link)
-{
- uint8_t pwrseq_inst = 0xF;
- struct dc_context *dc_ctx = link->dc->ctx;
-
- DC_LOGGER_INIT(dc_ctx->logger);
-
- switch (link->eng_id) {
- case ENGINE_ID_DIGA:
- pwrseq_inst = 0;
- break;
- case ENGINE_ID_DIGB:
- pwrseq_inst = 1;
- break;
- default:
- DC_LOG_WARNING("Unsupported pwrseq engine id: %d!\n", link->eng_id);
- ASSERT(false);
- break;
- }
-
- return pwrseq_inst;
-}
-
-
static void link_destruct(struct dc_link *link)
{
int i;
@@ -657,7 +633,7 @@ static bool construct_phy(struct dc_link *link,
link->link_id.id == CONNECTOR_ID_LVDS)) {
panel_cntl_init_data.ctx = dc_ctx;
panel_cntl_init_data.inst = panel_cntl_init_data.ctx->dc_edp_id_count;
- panel_cntl_init_data.pwrseq_inst = translate_dig_inst_to_pwrseq_inst(link);
+ panel_cntl_init_data.eng_id = link->eng_id;
link->panel_cntl =
link->dc->res_pool->funcs->panel_cntl_create(
&panel_cntl_init_data);
diff --git a/drivers/gpu/drm/amd/display/dc/link/link_validation.c b/drivers/gpu/drm/amd/display/dc/link/link_validation.c
index 8fe66c3..5b0bc7f 100644
--- a/drivers/gpu/drm/amd/display/dc/link/link_validation.c
+++ b/drivers/gpu/drm/amd/display/dc/link/link_validation.c
@@ -361,7 +361,7 @@ bool link_validate_dpia_bandwidth(const struct dc_stream_state *stream, const un
struct dc_link *dpia_link[MAX_DPIA_NUM] = {0};
int num_dpias = 0;
- for (uint8_t i = 0; i < num_streams; ++i) {
+ for (unsigned int i = 0; i < num_streams; ++i) {
if (stream[i].signal == SIGNAL_TYPE_DISPLAY_PORT) {
/* new dpia sst stream, check whether it exceeds max dpia */
if (num_dpias >= MAX_DPIA_NUM)
diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c
index 5a0b04518..16a62e0 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c
@@ -517,6 +517,7 @@ enum link_training_result dp_check_link_loss_status(
{
enum link_training_result status = LINK_TRAINING_SUCCESS;
union lane_status lane_status;
+ union lane_align_status_updated dpcd_lane_status_updated;
uint8_t dpcd_buf[6] = {0};
uint32_t lane;
@@ -532,10 +533,12 @@ enum link_training_result dp_check_link_loss_status(
* check lanes status
*/
lane_status.raw = dp_get_nibble_at_index(&dpcd_buf[2], lane);
+ dpcd_lane_status_updated.raw = dpcd_buf[4];
if (!lane_status.bits.CHANNEL_EQ_DONE_0 ||
!lane_status.bits.CR_DONE_0 ||
- !lane_status.bits.SYMBOL_LOCKED_0) {
+ !lane_status.bits.SYMBOL_LOCKED_0 ||
+ !dp_is_interlane_aligned(dpcd_lane_status_updated)) {
/* if one of the channel equalization, clock
* recovery or symbol lock is dropped
* consider it as (link has been
diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
index e8dda44..5d36bab 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
@@ -619,7 +619,7 @@ static enum link_training_result dpia_training_eq_non_transparent(
uint32_t retries_eq = 0;
enum dc_status status;
enum dc_dp_training_pattern tr_pattern;
- uint32_t wait_time_microsec;
+ uint32_t wait_time_microsec = 0;
enum dc_lane_count lane_count = lt_settings->link_settings.lane_count;
union lane_align_status_updated dpcd_lane_status_updated = {0};
union lane_status dpcd_lane_status[LANE_COUNT_DP_MAX] = {0};
diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
index 511ff6b..7538b54 100644
--- a/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
@@ -999,7 +999,7 @@ static struct stream_encoder *dcn301_stream_encoder_create(enum engine_id eng_id
vpg = dcn301_vpg_create(ctx, vpg_inst);
afmt = dcn301_afmt_create(ctx, afmt_inst);
- if (!enc1 || !vpg || !afmt) {
+ if (!enc1 || !vpg || !afmt || eng_id >= ARRAY_SIZE(stream_enc_regs)) {
kfree(enc1);
kfree(vpg);
kfree(afmt);
diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
index c4d71e7..6f10052 100644
--- a/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
@@ -1829,7 +1829,21 @@ int dcn32_populate_dml_pipes_from_context(
dcn32_zero_pipe_dcc_fraction(pipes, pipe_cnt);
DC_FP_END();
pipes[pipe_cnt].pipe.dest.vfront_porch = timing->v_front_porch;
- pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_dal;
+ if (dc->config.enable_windowed_mpo_odm &&
+ dc->debug.enable_single_display_2to1_odm_policy) {
+ switch (resource_get_odm_slice_count(pipe)) {
+ case 2:
+ pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_2to1;
+ break;
+ case 4:
+ pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_4to1;
+ break;
+ default:
+ pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_dal;
+ }
+ } else {
+ pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_dal;
+ }
pipes[pipe_cnt].pipe.src.gpuvm_min_page_size_kbytes = 256; // according to spreadsheet
pipes[pipe_cnt].pipe.src.unbounded_req_mode = false;
pipes[pipe_cnt].pipe.scale_ratio_depth.lb_depth = dm_lb_19;
diff --git a/drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c b/drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c
index 761ec98..5fdcda8 100644
--- a/drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c
@@ -780,8 +780,8 @@ static const struct dc_debug_options debug_defaults_drv = {
.disable_z10 = false,
.ignore_pg = true,
.psp_disabled_wa = true,
- .ips2_eval_delay_us = 200,
- .ips2_entry_delay_us = 400,
+ .ips2_eval_delay_us = 2000,
+ .ips2_entry_delay_us = 800,
.static_screen_wait_frames = 2,
};
@@ -2130,6 +2130,7 @@ static bool dcn35_resource_construct(
dc->dml2_options.dcn_pipe_count = pool->base.pipe_count;
dc->dml2_options.use_native_pstate_optimization = true;
dc->dml2_options.use_native_soc_bb_construction = true;
+ dc->dml2_options.minimize_dispclk_using_odm = false;
if (dc->config.EnableMinDispClkODM)
dc->dml2_options.minimize_dispclk_using_odm = true;
dc->dml2_options.enable_windowed_mpo_odm = dc->config.enable_windowed_mpo_odm;
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 087d578..39c5e1d 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2558,6 +2558,7 @@ static ssize_t amdgpu_hwmon_set_pwm1_enable(struct device *dev,
{
struct amdgpu_device *adev = dev_get_drvdata(dev);
int err, ret;
+ u32 pwm_mode;
int value;
if (amdgpu_in_reset(adev))
@@ -2569,13 +2570,22 @@ static ssize_t amdgpu_hwmon_set_pwm1_enable(struct device *dev,
if (err)
return err;
+ if (value == 0)
+ pwm_mode = AMD_FAN_CTRL_NONE;
+ else if (value == 1)
+ pwm_mode = AMD_FAN_CTRL_MANUAL;
+ else if (value == 2)
+ pwm_mode = AMD_FAN_CTRL_AUTO;
+ else
+ return -EINVAL;
+
ret = pm_runtime_get_sync(adev_to_drm(adev)->dev);
if (ret < 0) {
pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
return ret;
}
- ret = amdgpu_dpm_set_fan_control_mode(adev, value);
+ ret = amdgpu_dpm_set_fan_control_mode(adev, pwm_mode);
pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index df4f202..eb4da36 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -6925,6 +6925,23 @@ static int si_dpm_enable(struct amdgpu_device *adev)
return 0;
}
+static int si_set_temperature_range(struct amdgpu_device *adev)
+{
+ int ret;
+
+ ret = si_thermal_enable_alert(adev, false);
+ if (ret)
+ return ret;
+ ret = si_thermal_set_temperature_range(adev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX);
+ if (ret)
+ return ret;
+ ret = si_thermal_enable_alert(adev, true);
+ if (ret)
+ return ret;
+
+ return ret;
+}
+
static void si_dpm_disable(struct amdgpu_device *adev)
{
struct rv7xx_power_info *pi = rv770_get_pi(adev);
@@ -7608,6 +7625,18 @@ static int si_dpm_process_interrupt(struct amdgpu_device *adev,
static int si_dpm_late_init(void *handle)
{
+ int ret;
+ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+ if (!adev->pm.dpm_enabled)
+ return 0;
+
+ ret = si_set_temperature_range(adev);
+ if (ret)
+ return ret;
+#if 0 //TODO ?
+ si_dpm_powergate_uvd(adev, true);
+#endif
return 0;
}
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index 4cd43bb..bcad425 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -1303,13 +1303,12 @@ static int arcturus_get_power_limit(struct smu_context *smu,
if (default_power_limit)
*default_power_limit = power_limit;
- if (smu->od_enabled) {
+ if (smu->od_enabled)
od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
- od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
- } else {
+ else
od_percent_upper = 0;
- od_percent_lower = 100;
- }
+
+ od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
od_percent_upper, od_percent_lower, power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 8d1d29f..ed189a3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -2357,13 +2357,12 @@ static int navi10_get_power_limit(struct smu_context *smu,
*default_power_limit = power_limit;
if (smu->od_enabled &&
- navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_POWER_LIMIT)) {
+ navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_POWER_LIMIT))
od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
- od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
- } else {
+ else
od_percent_upper = 0;
- od_percent_lower = 100;
- }
+
+ od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
od_percent_upper, od_percent_lower, power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 21fc033..e2ad2b9 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -640,13 +640,12 @@ static int sienna_cichlid_get_power_limit(struct smu_context *smu,
if (default_power_limit)
*default_power_limit = power_limit;
- if (smu->od_enabled) {
+ if (smu->od_enabled)
od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
- od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
- } else {
+ else
od_percent_upper = 0;
- od_percent_lower = 100;
- }
+
+ od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
od_percent_upper, od_percent_lower, power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index a9954ff..9b80f18 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -2369,13 +2369,12 @@ static int smu_v13_0_0_get_power_limit(struct smu_context *smu,
if (default_power_limit)
*default_power_limit = power_limit;
- if (smu->od_enabled) {
+ if (smu->od_enabled)
od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]);
- od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]);
- } else {
+ else
od_percent_upper = 0;
- od_percent_lower = 100;
- }
+
+ od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
od_percent_upper, od_percent_lower, power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
index 0ffdb58..3dc7b60 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
@@ -2333,13 +2333,12 @@ static int smu_v13_0_7_get_power_limit(struct smu_context *smu,
if (default_power_limit)
*default_power_limit = power_limit;
- if (smu->od_enabled) {
+ if (smu->od_enabled)
od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]);
- od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]);
- } else {
+ else
od_percent_upper = 0;
- od_percent_lower = 100;
- }
+
+ od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]);
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
od_percent_upper, od_percent_lower, power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c
index 4894f7e..6dae5ad 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c
@@ -229,8 +229,6 @@ int smu_v14_0_check_fw_version(struct smu_context *smu)
smu->smc_driver_if_version = SMU14_DRIVER_IF_VERSION_SMU_V14_0_2;
break;
case IP_VERSION(14, 0, 0):
- if ((smu->smc_fw_version < 0x5d3a00))
- dev_warn(smu->adev->dev, "The PMFW version(%x) is behind in this BIOS!\n", smu->smc_fw_version);
smu->smc_driver_if_version = SMU14_DRIVER_IF_VERSION_SMU_V14_0_0;
break;
default:
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c
index 47fdbae..9310c47 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c
@@ -261,7 +261,10 @@ static int smu_v14_0_0_get_smu_metrics_data(struct smu_context *smu,
*value = metrics->MpipuclkFrequency;
break;
case METRICS_AVERAGE_GFXACTIVITY:
- *value = metrics->GfxActivity / 100;
+ if ((smu->smc_fw_version > 0x5d4600))
+ *value = metrics->GfxActivity;
+ else
+ *value = metrics->GfxActivity / 100;
break;
case METRICS_AVERAGE_VCNACTIVITY:
*value = metrics->VcnActivity / 100;
diff --git a/drivers/gpu/drm/bridge/aux-hpd-bridge.c b/drivers/gpu/drm/bridge/aux-hpd-bridge.c
index bb55f69..6886db2 100644
--- a/drivers/gpu/drm/bridge/aux-hpd-bridge.c
+++ b/drivers/gpu/drm/bridge/aux-hpd-bridge.c
@@ -25,20 +25,18 @@ static void drm_aux_hpd_bridge_release(struct device *dev)
ida_free(&drm_aux_hpd_bridge_ida, adev->id);
of_node_put(adev->dev.platform_data);
+ of_node_put(adev->dev.of_node);
kfree(adev);
}
-static void drm_aux_hpd_bridge_unregister_adev(void *_adev)
+static void drm_aux_hpd_bridge_free_adev(void *_adev)
{
- struct auxiliary_device *adev = _adev;
-
- auxiliary_device_delete(adev);
- auxiliary_device_uninit(adev);
+ auxiliary_device_uninit(_adev);
}
/**
- * drm_dp_hpd_bridge_register - Create a simple HPD DisplayPort bridge
+ * devm_drm_dp_hpd_bridge_alloc - allocate a HPD DisplayPort bridge
* @parent: device instance providing this bridge
* @np: device node pointer corresponding to this bridge instance
*
@@ -46,11 +44,9 @@ static void drm_aux_hpd_bridge_unregister_adev(void *_adev)
* DRM_MODE_CONNECTOR_DisplayPort, which terminates the bridge chain and is
* able to send the HPD events.
*
- * Return: device instance that will handle created bridge or an error code
- * encoded into the pointer.
+ * Return: bridge auxiliary device pointer or an error pointer
*/
-struct device *drm_dp_hpd_bridge_register(struct device *parent,
- struct device_node *np)
+struct auxiliary_device *devm_drm_dp_hpd_bridge_alloc(struct device *parent, struct device_node *np)
{
struct auxiliary_device *adev;
int ret;
@@ -74,18 +70,62 @@ struct device *drm_dp_hpd_bridge_register(struct device *parent,
ret = auxiliary_device_init(adev);
if (ret) {
+ of_node_put(adev->dev.platform_data);
+ of_node_put(adev->dev.of_node);
ida_free(&drm_aux_hpd_bridge_ida, adev->id);
kfree(adev);
return ERR_PTR(ret);
}
- ret = auxiliary_device_add(adev);
- if (ret) {
- auxiliary_device_uninit(adev);
+ ret = devm_add_action_or_reset(parent, drm_aux_hpd_bridge_free_adev, adev);
+ if (ret)
return ERR_PTR(ret);
- }
- ret = devm_add_action_or_reset(parent, drm_aux_hpd_bridge_unregister_adev, adev);
+ return adev;
+}
+EXPORT_SYMBOL_GPL(devm_drm_dp_hpd_bridge_alloc);
+
+static void drm_aux_hpd_bridge_del_adev(void *_adev)
+{
+ auxiliary_device_delete(_adev);
+}
+
+/**
+ * devm_drm_dp_hpd_bridge_add - register a HDP DisplayPort bridge
+ * @dev: struct device to tie registration lifetime to
+ * @adev: bridge auxiliary device to be registered
+ *
+ * Returns: zero on success or a negative errno
+ */
+int devm_drm_dp_hpd_bridge_add(struct device *dev, struct auxiliary_device *adev)
+{
+ int ret;
+
+ ret = auxiliary_device_add(adev);
+ if (ret)
+ return ret;
+
+ return devm_add_action_or_reset(dev, drm_aux_hpd_bridge_del_adev, adev);
+}
+EXPORT_SYMBOL_GPL(devm_drm_dp_hpd_bridge_add);
+
+/**
+ * drm_dp_hpd_bridge_register - allocate and register a HDP DisplayPort bridge
+ * @parent: device instance providing this bridge
+ * @np: device node pointer corresponding to this bridge instance
+ *
+ * Return: device instance that will handle created bridge or an error pointer
+ */
+struct device *drm_dp_hpd_bridge_register(struct device *parent, struct device_node *np)
+{
+ struct auxiliary_device *adev;
+ int ret;
+
+ adev = devm_drm_dp_hpd_bridge_alloc(parent, np);
+ if (IS_ERR(adev))
+ return ERR_CAST(adev);
+
+ ret = devm_drm_dp_hpd_bridge_add(parent, adev);
if (ret)
return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index f57e6d7..5ebdd6f 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -332,6 +332,7 @@ alloc_range_bias(struct drm_buddy *mm,
u64 start, u64 end,
unsigned int order)
{
+ u64 req_size = mm->chunk_size << order;
struct drm_buddy_block *block;
struct drm_buddy_block *buddy;
LIST_HEAD(dfs);
@@ -367,6 +368,15 @@ alloc_range_bias(struct drm_buddy *mm,
if (drm_buddy_block_is_allocated(block))
continue;
+ if (block_start < start || block_end > end) {
+ u64 adjusted_start = max(block_start, start);
+ u64 adjusted_end = min(block_end, end);
+
+ if (round_down(adjusted_end + 1, req_size) <=
+ round_up(adjusted_start, req_size))
+ continue;
+ }
+
if (contains(start, end, block_start, block_end) &&
order == drm_buddy_block_order(block)) {
/*
@@ -538,7 +548,13 @@ static int __alloc_range(struct drm_buddy *mm,
list_add(&block->left->tmp_link, dfs);
} while (1);
+ if (total_allocated < size) {
+ err = -ENOSPC;
+ goto err_free;
+ }
+
list_splice_tail(&allocated, blocks);
+
return 0;
err_undo:
@@ -755,8 +771,12 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
return -EINVAL;
/* Actual range allocation */
- if (start + size == end)
+ if (start + size == end) {
+ if (!IS_ALIGNED(start | end, min_block_size))
+ return -EINVAL;
+
return __drm_buddy_alloc_range(mm, start, size, NULL, blocks);
+ }
original_size = size;
original_min_size = min_block_size;
diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index cb90e70..65f9f66 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -904,6 +904,7 @@ int drm_mode_setcrtc(struct drm_device *dev, void *data,
connector_set = NULL;
fb = NULL;
mode = NULL;
+ num_connectors = 0;
DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 834a5e2..7352bde 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -820,7 +820,7 @@ struct sg_table *drm_prime_pages_to_sg(struct drm_device *dev,
if (max_segment == 0)
max_segment = UINT_MAX;
err = sg_alloc_table_from_pages_segment(sg, pages, nr_pages, 0,
- nr_pages << PAGE_SHIFT,
+ (unsigned long)nr_pages << PAGE_SHIFT,
max_segment, GFP_KERNEL);
if (err) {
kfree(sg);
diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c
index 3f47948..23b4e9a 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -760,9 +760,11 @@ static void output_poll_execute(struct work_struct *work)
changed = dev->mode_config.delayed_event;
dev->mode_config.delayed_event = false;
- if (!drm_kms_helper_poll && dev->mode_config.poll_running) {
- drm_kms_helper_disable_hpd(dev);
- dev->mode_config.poll_running = false;
+ if (!drm_kms_helper_poll) {
+ if (dev->mode_config.poll_running) {
+ drm_kms_helper_disable_hpd(dev);
+ dev->mode_config.poll_running = false;
+ }
goto out;
}
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 84101ba..a6c19de 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1040,7 +1040,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
uint64_t *points;
uint32_t signaled_count, i;
- if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT)
+ if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE))
lockdep_assert_none_held_once();
points = kmalloc_array(count, sizeof(*points), GFP_KERNEL);
@@ -1109,7 +1110,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
* fallthough and try a 0 timeout wait!
*/
- if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
+ if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
for (i = 0; i < count; ++i)
drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
}
@@ -1416,10 +1418,21 @@ syncobj_eventfd_entry_func(struct drm_syncobj *syncobj,
/* This happens inside the syncobj lock */
fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
+ if (!fence)
+ return;
+
ret = dma_fence_chain_find_seqno(&fence, entry->point);
- if (ret != 0 || !fence) {
+ if (ret != 0) {
+ /* The given seqno has not been submitted yet. */
dma_fence_put(fence);
return;
+ } else if (!fence) {
+ /* If dma_fence_chain_find_seqno returns 0 but sets the fence
+ * to NULL, it implies that the given seqno is signaled and a
+ * later seqno has already been submitted. Assign a stub fence
+ * so that the eventfd still gets signaled below.
+ */
+ fence = dma_fence_get_stub();
}
list_del_init(&entry->node);
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index b5d6e33..3089029 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -140,7 +140,7 @@
Note that this driver only supports newer device from Broadwell on.
For further information and setup guide, you can visit:
- http://01.org/igvt-g.
+ https://github.com/intel/gvt-linux/wiki.
If in doubt, say "N".
diff --git a/drivers/gpu/drm/i915/display/intel_display_power_well.c b/drivers/gpu/drm/i915/display/intel_display_power_well.c
index 47cd6bb..06900ff 100644
--- a/drivers/gpu/drm/i915/display/intel_display_power_well.c
+++ b/drivers/gpu/drm/i915/display/intel_display_power_well.c
@@ -246,7 +246,14 @@ static enum phy icl_aux_pw_to_phy(struct drm_i915_private *i915,
enum aux_ch aux_ch = icl_aux_pw_to_ch(power_well);
struct intel_digital_port *dig_port = aux_ch_to_digital_port(i915, aux_ch);
- return intel_port_to_phy(i915, dig_port->base.port);
+ /*
+ * FIXME should we care about the (VBT defined) dig_port->aux_ch
+ * relationship or should this be purely defined by the hardware layout?
+ * Currently if the port doesn't appear in the VBT, or if it's declared
+ * as HDMI-only and routed to a combo PHY, the encoder either won't be
+ * present at all or it will not have an aux_ch assigned.
+ */
+ return dig_port ? intel_port_to_phy(i915, dig_port->base.port) : PHY_NONE;
}
static void hsw_wait_for_power_well_enable(struct drm_i915_private *dev_priv,
@@ -414,7 +421,8 @@ icl_combo_phy_aux_power_well_enable(struct drm_i915_private *dev_priv,
intel_de_rmw(dev_priv, regs->driver, 0, HSW_PWR_WELL_CTL_REQ(pw_idx));
- if (DISPLAY_VER(dev_priv) < 12)
+ /* FIXME this is a mess */
+ if (phy != PHY_NONE)
intel_de_rmw(dev_priv, ICL_PORT_CL_DW12(phy),
0, ICL_LANE_ENABLE_AUX);
@@ -437,7 +445,10 @@ icl_combo_phy_aux_power_well_disable(struct drm_i915_private *dev_priv,
drm_WARN_ON(&dev_priv->drm, !IS_ICELAKE(dev_priv));
- intel_de_rmw(dev_priv, ICL_PORT_CL_DW12(phy), ICL_LANE_ENABLE_AUX, 0);
+ /* FIXME this is a mess */
+ if (phy != PHY_NONE)
+ intel_de_rmw(dev_priv, ICL_PORT_CL_DW12(phy),
+ ICL_LANE_ENABLE_AUX, 0);
intel_de_rmw(dev_priv, regs->driver, HSW_PWR_WELL_CTL_REQ(pw_idx), 0);
diff --git a/drivers/gpu/drm/i915/display/intel_display_types.h b/drivers/gpu/drm/i915/display/intel_display_types.h
index 3fdd8a5..ac7fe62 100644
--- a/drivers/gpu/drm/i915/display/intel_display_types.h
+++ b/drivers/gpu/drm/i915/display/intel_display_types.h
@@ -609,6 +609,13 @@ struct intel_connector {
* and active (i.e. dpms ON state). */
bool (*get_hw_state)(struct intel_connector *);
+ /*
+ * Optional hook called during init/resume to sync any state
+ * stored in the connector (eg. DSC state) wrt. the HW state.
+ */
+ void (*sync_state)(struct intel_connector *connector,
+ const struct intel_crtc_state *crtc_state);
+
/* Panel info for eDP and LVDS */
struct intel_panel panel;
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index f5ef95d..94d2a15 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -2355,6 +2355,9 @@ intel_dp_compute_config_limits(struct intel_dp *intel_dp,
limits->min_rate = intel_dp_common_rate(intel_dp, 0);
limits->max_rate = intel_dp_max_link_rate(intel_dp);
+ /* FIXME 128b/132b SST support missing */
+ limits->max_rate = min(limits->max_rate, 810000);
+
limits->min_lane_count = 1;
limits->max_lane_count = intel_dp_max_lane_count(intel_dp);
@@ -5696,6 +5699,9 @@ intel_dp_detect(struct drm_connector *connector,
goto out;
}
+ if (!intel_dp_is_edp(intel_dp))
+ intel_psr_init_dpcd(intel_dp);
+
intel_dp_detect_dsc_caps(intel_dp, intel_connector);
intel_dp_configure_mst(intel_dp);
@@ -5856,6 +5862,19 @@ intel_dp_connector_unregister(struct drm_connector *connector)
intel_connector_unregister(connector);
}
+void intel_dp_connector_sync_state(struct intel_connector *connector,
+ const struct intel_crtc_state *crtc_state)
+{
+ struct drm_i915_private *i915 = to_i915(connector->base.dev);
+
+ if (crtc_state && crtc_state->dsc.compression_enable) {
+ drm_WARN_ON(&i915->drm, !connector->dp.dsc_decompression_aux);
+ connector->dp.dsc_decompression_enabled = true;
+ } else {
+ connector->dp.dsc_decompression_enabled = false;
+ }
+}
+
void intel_dp_encoder_flush_work(struct drm_encoder *encoder)
{
struct intel_digital_port *dig_port = enc_to_dig_port(to_intel_encoder(encoder));
diff --git a/drivers/gpu/drm/i915/display/intel_dp.h b/drivers/gpu/drm/i915/display/intel_dp.h
index 05db46b..375d067 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.h
+++ b/drivers/gpu/drm/i915/display/intel_dp.h
@@ -45,6 +45,8 @@ bool intel_dp_limited_color_range(const struct intel_crtc_state *crtc_state,
int intel_dp_min_bpp(enum intel_output_format output_format);
bool intel_dp_init_connector(struct intel_digital_port *dig_port,
struct intel_connector *intel_connector);
+void intel_dp_connector_sync_state(struct intel_connector *connector,
+ const struct intel_crtc_state *crtc_state);
void intel_dp_set_link_params(struct intel_dp *intel_dp,
int link_rate, int lane_count);
int intel_dp_get_link_train_fallback_values(struct intel_dp *intel_dp,
diff --git a/drivers/gpu/drm/i915/display/intel_dp_hdcp.c b/drivers/gpu/drm/i915/display/intel_dp_hdcp.c
index 3a595cd..8538d1c 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_hdcp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_hdcp.c
@@ -330,23 +330,13 @@ static const struct hdcp2_dp_msg_data hdcp2_dp_msg_data[] = {
0, 0 },
};
-static struct drm_dp_aux *
-intel_dp_hdcp_get_aux(struct intel_connector *connector)
-{
- struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
-
- if (intel_encoder_is_mst(connector->encoder))
- return &connector->port->aux;
- else
- return &dig_port->dp.aux;
-}
-
static int
intel_dp_hdcp2_read_rx_status(struct intel_connector *connector,
u8 *rx_status)
{
struct drm_i915_private *i915 = to_i915(connector->base.dev);
- struct drm_dp_aux *aux = intel_dp_hdcp_get_aux(connector);
+ struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+ struct drm_dp_aux *aux = &dig_port->dp.aux;
ssize_t ret;
ret = drm_dp_dpcd_read(aux,
@@ -399,7 +389,9 @@ intel_dp_hdcp2_wait_for_msg(struct intel_connector *connector,
const struct hdcp2_dp_msg_data *hdcp2_msg_data)
{
struct drm_i915_private *i915 = to_i915(connector->base.dev);
- struct intel_hdcp *hdcp = &connector->hdcp;
+ struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+ struct intel_dp *dp = &dig_port->dp;
+ struct intel_hdcp *hdcp = &dp->attached_connector->hdcp;
u8 msg_id = hdcp2_msg_data->msg_id;
int ret, timeout;
bool msg_ready = false;
@@ -454,8 +446,9 @@ int intel_dp_hdcp2_write_msg(struct intel_connector *connector,
unsigned int offset;
u8 *byte = buf;
ssize_t ret, bytes_to_write, len;
+ struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+ struct drm_dp_aux *aux = &dig_port->dp.aux;
const struct hdcp2_dp_msg_data *hdcp2_msg_data;
- struct drm_dp_aux *aux;
hdcp2_msg_data = get_hdcp2_dp_msg_data(*byte);
if (!hdcp2_msg_data)
@@ -463,8 +456,6 @@ int intel_dp_hdcp2_write_msg(struct intel_connector *connector,
offset = hdcp2_msg_data->offset;
- aux = intel_dp_hdcp_get_aux(connector);
-
/* No msg_id in DP HDCP2.2 msgs */
bytes_to_write = size - 1;
byte++;
@@ -490,7 +481,8 @@ static
ssize_t get_receiver_id_list_rx_info(struct intel_connector *connector,
u32 *dev_cnt, u8 *byte)
{
- struct drm_dp_aux *aux = intel_dp_hdcp_get_aux(connector);
+ struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+ struct drm_dp_aux *aux = &dig_port->dp.aux;
ssize_t ret;
u8 *rx_info = byte;
@@ -515,8 +507,9 @@ int intel_dp_hdcp2_read_msg(struct intel_connector *connector,
{
struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
struct drm_i915_private *i915 = to_i915(dig_port->base.base.dev);
- struct intel_hdcp *hdcp = &connector->hdcp;
- struct drm_dp_aux *aux;
+ struct drm_dp_aux *aux = &dig_port->dp.aux;
+ struct intel_dp *dp = &dig_port->dp;
+ struct intel_hdcp *hdcp = &dp->attached_connector->hdcp;
unsigned int offset;
u8 *byte = buf;
ssize_t ret, bytes_to_recv, len;
@@ -530,8 +523,6 @@ int intel_dp_hdcp2_read_msg(struct intel_connector *connector,
return -EINVAL;
offset = hdcp2_msg_data->offset;
- aux = intel_dp_hdcp_get_aux(connector);
-
ret = intel_dp_hdcp2_wait_for_msg(connector, hdcp2_msg_data);
if (ret < 0)
return ret;
@@ -561,13 +552,8 @@ int intel_dp_hdcp2_read_msg(struct intel_connector *connector,
/* Entire msg read timeout since initiate of msg read */
if (bytes_to_recv == size - 1 && hdcp2_msg_data->msg_read_timeout > 0) {
- if (intel_encoder_is_mst(connector->encoder))
- msg_end = ktime_add_ms(ktime_get_raw(),
- hdcp2_msg_data->msg_read_timeout *
- connector->port->parent->num_ports);
- else
- msg_end = ktime_add_ms(ktime_get_raw(),
- hdcp2_msg_data->msg_read_timeout);
+ msg_end = ktime_add_ms(ktime_get_raw(),
+ hdcp2_msg_data->msg_read_timeout);
}
ret = drm_dp_dpcd_read(aux, offset,
@@ -651,12 +637,11 @@ static
int intel_dp_hdcp2_capable(struct intel_connector *connector,
bool *capable)
{
- struct drm_dp_aux *aux;
+ struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+ struct drm_dp_aux *aux = &dig_port->dp.aux;
u8 rx_caps[3];
int ret;
- aux = intel_dp_hdcp_get_aux(connector);
-
*capable = false;
ret = drm_dp_dpcd_read(aux,
DP_HDCP_2_2_REG_RX_CAPS_OFFSET,
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 8a94323..a01a59f 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -1534,6 +1534,7 @@ static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
return NULL;
intel_connector->get_hw_state = intel_dp_mst_get_hw_state;
+ intel_connector->sync_state = intel_dp_connector_sync_state;
intel_connector->mst_port = intel_dp;
intel_connector->port = port;
drm_dp_mst_get_port_malloc(port);
diff --git a/drivers/gpu/drm/i915/display/intel_modeset_setup.c b/drivers/gpu/drm/i915/display/intel_modeset_setup.c
index 94eece7..caeca3a8 100644
--- a/drivers/gpu/drm/i915/display/intel_modeset_setup.c
+++ b/drivers/gpu/drm/i915/display/intel_modeset_setup.c
@@ -318,12 +318,6 @@ static void intel_modeset_update_connector_atomic_state(struct drm_i915_private
const struct intel_crtc_state *crtc_state =
to_intel_crtc_state(crtc->base.state);
- if (crtc_state->dsc.compression_enable) {
- drm_WARN_ON(&i915->drm, !connector->dp.dsc_decompression_aux);
- connector->dp.dsc_decompression_enabled = true;
- } else {
- connector->dp.dsc_decompression_enabled = false;
- }
conn_state->max_bpc = (crtc_state->pipe_bpp ?: 24) / 3;
}
}
@@ -775,8 +769,9 @@ static void intel_modeset_readout_hw_state(struct drm_i915_private *i915)
drm_connector_list_iter_begin(&i915->drm, &conn_iter);
for_each_intel_connector_iter(connector, &conn_iter) {
+ struct intel_crtc_state *crtc_state = NULL;
+
if (connector->get_hw_state(connector)) {
- struct intel_crtc_state *crtc_state;
struct intel_crtc *crtc;
connector->base.dpms = DRM_MODE_DPMS_ON;
@@ -802,6 +797,10 @@ static void intel_modeset_readout_hw_state(struct drm_i915_private *i915)
connector->base.dpms = DRM_MODE_DPMS_OFF;
connector->base.encoder = NULL;
}
+
+ if (connector->sync_state)
+ connector->sync_state(connector, crtc_state);
+
drm_dbg_kms(&i915->drm,
"[CONNECTOR:%d:%s] hw state readout: %s\n",
connector->base.base.id, connector->base.name,
diff --git a/drivers/gpu/drm/i915/display/intel_psr.c b/drivers/gpu/drm/i915/display/intel_psr.c
index 57bbf3e..4faaf4b 100644
--- a/drivers/gpu/drm/i915/display/intel_psr.c
+++ b/drivers/gpu/drm/i915/display/intel_psr.c
@@ -2776,9 +2776,6 @@ void intel_psr_init(struct intel_dp *intel_dp)
if (!(HAS_PSR(dev_priv) || HAS_DP20(dev_priv)))
return;
- if (!intel_dp_is_edp(intel_dp))
- intel_psr_init_dpcd(intel_dp);
-
/*
* HSW spec explicitly says PSR is tied to port A.
* BDW+ platforms have a instance of PSR registers per transcoder but
diff --git a/drivers/gpu/drm/i915/display/intel_sdvo.c b/drivers/gpu/drm/i915/display/intel_sdvo.c
index acc6b68..2915d7a 100644
--- a/drivers/gpu/drm/i915/display/intel_sdvo.c
+++ b/drivers/gpu/drm/i915/display/intel_sdvo.c
@@ -1209,7 +1209,7 @@ static bool intel_sdvo_set_tv_format(struct intel_sdvo *intel_sdvo,
struct intel_sdvo_tv_format format;
u32 format_map;
- format_map = 1 << conn_state->tv.mode;
+ format_map = 1 << conn_state->tv.legacy_mode;
memset(&format, 0, sizeof(format));
memcpy(&format, &format_map, min(sizeof(format), sizeof(format_map)));
@@ -2298,7 +2298,7 @@ static int intel_sdvo_get_tv_modes(struct drm_connector *connector)
* Read the list of supported input resolutions for the selected TV
* format.
*/
- format_map = 1 << conn_state->tv.mode;
+ format_map = 1 << conn_state->tv.legacy_mode;
memcpy(&tv_res, &format_map,
min(sizeof(format_map), sizeof(struct intel_sdvo_sdtv_resolution_request)));
@@ -2363,7 +2363,7 @@ intel_sdvo_connector_atomic_get_property(struct drm_connector *connector,
int i;
for (i = 0; i < intel_sdvo_connector->format_supported_num; i++)
- if (state->tv.mode == intel_sdvo_connector->tv_format_supported[i]) {
+ if (state->tv.legacy_mode == intel_sdvo_connector->tv_format_supported[i]) {
*val = i;
return 0;
@@ -2419,7 +2419,7 @@ intel_sdvo_connector_atomic_set_property(struct drm_connector *connector,
struct intel_sdvo_connector_state *sdvo_state = to_intel_sdvo_connector_state(state);
if (property == intel_sdvo_connector->tv_format) {
- state->tv.mode = intel_sdvo_connector->tv_format_supported[val];
+ state->tv.legacy_mode = intel_sdvo_connector->tv_format_supported[val];
if (state->crtc) {
struct drm_crtc_state *crtc_state =
@@ -3076,7 +3076,7 @@ static bool intel_sdvo_tv_create_property(struct intel_sdvo *intel_sdvo,
drm_property_add_enum(intel_sdvo_connector->tv_format, i,
tv_format_names[intel_sdvo_connector->tv_format_supported[i]]);
- intel_sdvo_connector->base.base.state->tv.mode = intel_sdvo_connector->tv_format_supported[0];
+ intel_sdvo_connector->base.base.state->tv.legacy_mode = intel_sdvo_connector->tv_format_supported[0];
drm_object_attach_property(&intel_sdvo_connector->base.base.base,
intel_sdvo_connector->tv_format, 0);
return true;
diff --git a/drivers/gpu/drm/i915/display/intel_tv.c b/drivers/gpu/drm/i915/display/intel_tv.c
index d4386cb..992a725 100644
--- a/drivers/gpu/drm/i915/display/intel_tv.c
+++ b/drivers/gpu/drm/i915/display/intel_tv.c
@@ -949,7 +949,7 @@ intel_disable_tv(struct intel_atomic_state *state,
static const struct tv_mode *intel_tv_mode_find(const struct drm_connector_state *conn_state)
{
- int format = conn_state->tv.mode;
+ int format = conn_state->tv.legacy_mode;
return &tv_modes[format];
}
@@ -1704,7 +1704,7 @@ static void intel_tv_find_better_format(struct drm_connector *connector)
break;
}
- connector->state->tv.mode = i;
+ connector->state->tv.legacy_mode = i;
}
static int
@@ -1859,7 +1859,7 @@ static int intel_tv_atomic_check(struct drm_connector *connector,
old_state = drm_atomic_get_old_connector_state(state, connector);
new_crtc_state = drm_atomic_get_new_crtc_state(state, new_state->crtc);
- if (old_state->tv.mode != new_state->tv.mode ||
+ if (old_state->tv.legacy_mode != new_state->tv.legacy_mode ||
old_state->tv.margins.left != new_state->tv.margins.left ||
old_state->tv.margins.right != new_state->tv.margins.right ||
old_state->tv.margins.top != new_state->tv.margins.top ||
@@ -1896,7 +1896,7 @@ static void intel_tv_add_properties(struct drm_connector *connector)
conn_state->tv.margins.right = 46;
conn_state->tv.margins.bottom = 37;
- conn_state->tv.mode = 0;
+ conn_state->tv.legacy_mode = 0;
/* Create TV properties then attach current values */
for (i = 0; i < ARRAY_SIZE(tv_modes); i++) {
@@ -1910,7 +1910,7 @@ static void intel_tv_add_properties(struct drm_connector *connector)
drm_object_attach_property(&connector->base,
i915->drm.mode_config.legacy_tv_mode_property,
- conn_state->tv.mode);
+ conn_state->tv.legacy_mode);
drm_object_attach_property(&connector->base,
i915->drm.mode_config.tv_left_margin_property,
conn_state->tv.margins.left);
diff --git a/drivers/gpu/drm/i915/display/intel_vdsc_regs.h b/drivers/gpu/drm/i915/display/intel_vdsc_regs.h
index 64f440f..8b21dc8 100644
--- a/drivers/gpu/drm/i915/display/intel_vdsc_regs.h
+++ b/drivers/gpu/drm/i915/display/intel_vdsc_regs.h
@@ -51,8 +51,8 @@
#define DSCC_PICTURE_PARAMETER_SET_0 _MMIO(0x6BA00)
#define _DSCA_PPS_0 0x6B200
#define _DSCC_PPS_0 0x6BA00
-#define DSCA_PPS(pps) _MMIO(_DSCA_PPS_0 + (pps) * 4)
-#define DSCC_PPS(pps) _MMIO(_DSCC_PPS_0 + (pps) * 4)
+#define DSCA_PPS(pps) _MMIO(_DSCA_PPS_0 + ((pps) < 12 ? (pps) : (pps) + 12) * 4)
+#define DSCC_PPS(pps) _MMIO(_DSCC_PPS_0 + ((pps) < 12 ? (pps) : (pps) + 12) * 4)
#define _ICL_DSC0_PICTURE_PARAMETER_SET_0_PB 0x78270
#define _ICL_DSC1_PICTURE_PARAMETER_SET_0_PB 0x78370
#define _ICL_DSC0_PICTURE_PARAMETER_SET_0_PC 0x78470
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 1d3ebdf..c08b675 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -379,6 +379,9 @@ i915_gem_userptr_release(struct drm_i915_gem_object *obj)
{
GEM_WARN_ON(obj->userptr.page_ref);
+ if (!obj->userptr.notifier.mm)
+ return;
+
mmu_interval_notifier_remove(&obj->userptr.notifier);
obj->userptr.notifier.mm = NULL;
}
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index 90f6c1e..efcb004 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -2849,8 +2849,7 @@ static int handle_mmio(struct intel_gvt_mmio_table_iter *iter, u32 offset,
for (i = start; i < end; i += 4) {
p = intel_gvt_find_mmio_info(gvt, i);
if (p) {
- WARN(1, "dup mmio definition offset %x\n",
- info->offset);
+ WARN(1, "dup mmio definition offset %x\n", i);
/* We return -EEXIST here to make GVT-g load fail.
* So duplicated MMIO can be found as soon as
diff --git a/drivers/gpu/drm/i915/intel_gvt.c b/drivers/gpu/drm/i915/intel_gvt.c
index e98b6d6..9b6d87c 100644
--- a/drivers/gpu/drm/i915/intel_gvt.c
+++ b/drivers/gpu/drm/i915/intel_gvt.c
@@ -41,7 +41,7 @@
* To virtualize GPU resources GVT-g driver depends on hypervisor technology
* e.g KVM/VFIO/mdev, Xen, etc. to provide resource access trapping capability
* and be virtualized within GVT-g device module. More architectural design
- * doc is available on https://01.org/group/2230/documentation-list.
+ * doc is available on https://github.com/intel/gvt-linux/wiki.
*/
static LIST_HEAD(intel_gvt_devices);
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
index 2990dd4..e14ac0a 100644
--- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
@@ -3,6 +3,8 @@
* Copyright © 2021 Intel Corporation
*/
+#include <linux/jiffies.h>
+
//#include "gt/intel_engine_user.h"
#include "gt/intel_gt.h"
#include "i915_drv.h"
@@ -12,7 +14,7 @@
#define REDUCED_TIMESLICE 5
#define REDUCED_PREEMPT 10
-#define WAIT_FOR_RESET_TIME 10000
+#define WAIT_FOR_RESET_TIME_MS 10000
struct intel_engine_cs *intel_selftest_find_any_engine(struct intel_gt *gt)
{
@@ -91,7 +93,7 @@ int intel_selftest_wait_for_rq(struct i915_request *rq)
{
long ret;
- ret = i915_request_wait(rq, 0, WAIT_FOR_RESET_TIME);
+ ret = i915_request_wait(rq, 0, msecs_to_jiffies(WAIT_FOR_RESET_TIME_MS));
if (ret < 0)
return ret;
diff --git a/drivers/gpu/drm/meson/meson_encoder_cvbs.c b/drivers/gpu/drm/meson/meson_encoder_cvbs.c
index 3f73b21..3407450 100644
--- a/drivers/gpu/drm/meson/meson_encoder_cvbs.c
+++ b/drivers/gpu/drm/meson/meson_encoder_cvbs.c
@@ -294,6 +294,5 @@ void meson_encoder_cvbs_remove(struct meson_drm *priv)
if (priv->encoders[MESON_ENC_CVBS]) {
meson_encoder_cvbs = priv->encoders[MESON_ENC_CVBS];
drm_bridge_remove(&meson_encoder_cvbs->bridge);
- drm_bridge_remove(meson_encoder_cvbs->next_bridge);
}
}
diff --git a/drivers/gpu/drm/meson/meson_encoder_dsi.c b/drivers/gpu/drm/meson/meson_encoder_dsi.c
index 3f93c70..311b916 100644
--- a/drivers/gpu/drm/meson/meson_encoder_dsi.c
+++ b/drivers/gpu/drm/meson/meson_encoder_dsi.c
@@ -168,6 +168,5 @@ void meson_encoder_dsi_remove(struct meson_drm *priv)
if (priv->encoders[MESON_ENC_DSI]) {
meson_encoder_dsi = priv->encoders[MESON_ENC_DSI];
drm_bridge_remove(&meson_encoder_dsi->bridge);
- drm_bridge_remove(meson_encoder_dsi->next_bridge);
}
}
diff --git a/drivers/gpu/drm/meson/meson_encoder_hdmi.c b/drivers/gpu/drm/meson/meson_encoder_hdmi.c
index 25ea765..c468656 100644
--- a/drivers/gpu/drm/meson/meson_encoder_hdmi.c
+++ b/drivers/gpu/drm/meson/meson_encoder_hdmi.c
@@ -474,6 +474,5 @@ void meson_encoder_hdmi_remove(struct meson_drm *priv)
if (priv->encoders[MESON_ENC_HDMI]) {
meson_encoder_hdmi = priv->encoders[MESON_ENC_HDMI];
drm_bridge_remove(&meson_encoder_hdmi->bridge);
- drm_bridge_remove(meson_encoder_hdmi->next_bridge);
}
}
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index c0bc924..c9c55e2 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1287,7 +1287,7 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu *gpu)
gpu->ubwc_config.highest_bank_bit = 15;
if (adreno_is_a610(gpu)) {
- gpu->ubwc_config.highest_bank_bit = 14;
+ gpu->ubwc_config.highest_bank_bit = 13;
gpu->ubwc_config.min_acc_len = 1;
gpu->ubwc_config.ubwc_mode = 1;
}
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
index 83380bc..6a4b489 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
@@ -144,10 +144,6 @@ enum dpu_enc_rc_states {
* to track crtc in the disable() hook which is called
* _after_ encoder_mask is cleared.
* @connector: If a mode is set, cached pointer to the active connector
- * @crtc_kickoff_cb: Callback into CRTC that will flush & start
- * all CTL paths
- * @crtc_kickoff_cb_data: Opaque user data given to crtc_kickoff_cb
- * @debugfs_root: Debug file system root file node
* @enc_lock: Lock around physical encoder
* create/destroy/enable/disable
* @frame_busy_mask: Bitmask tracking which phys_enc we are still
@@ -2072,7 +2068,7 @@ void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc)
}
/* reset the merge 3D HW block */
- if (phys_enc->hw_pp->merge_3d) {
+ if (phys_enc->hw_pp && phys_enc->hw_pp->merge_3d) {
phys_enc->hw_pp->merge_3d->ops.setup_3d_mode(phys_enc->hw_pp->merge_3d,
BLEND_3D_NONE);
if (phys_enc->hw_ctl->ops.update_pending_flush_merge_3d)
@@ -2103,7 +2099,7 @@ void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc)
if (phys_enc->hw_wb)
intf_cfg.wb = phys_enc->hw_wb->idx;
- if (phys_enc->hw_pp->merge_3d)
+ if (phys_enc->hw_pp && phys_enc->hw_pp->merge_3d)
intf_cfg.merge_3d = phys_enc->hw_pp->merge_3d->idx;
if (ctl->ops.reset_intf_cfg)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c
index b58a9c2..724537a 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c
@@ -29,7 +29,6 @@ static inline bool reserved_by_other(uint32_t *res_map, int idx,
/**
* struct dpu_rm_requirements - Reservation requirements parameter bundle
* @topology: selected topology for the display
- * @hw_res: Hardware resources required as reported by the encoders
*/
struct dpu_rm_requirements {
struct msm_display_topology topology;
@@ -204,6 +203,8 @@ static bool _dpu_rm_needs_split_display(const struct msm_display_topology *top)
* _dpu_rm_get_lm_peer - get the id of a mixer which is a peer of the primary
* @rm: dpu resource manager handle
* @primary_idx: index of primary mixer in rm->mixer_blks[]
+ *
+ * Returns: lm peer mixed id on success or %-EINVAL on error
*/
static int _dpu_rm_get_lm_peer(struct dpu_rm *rm, int primary_idx)
{
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c
index 77a8d93..fb588fd 100644
--- a/drivers/gpu/drm/msm/dp/dp_ctrl.c
+++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c
@@ -135,11 +135,6 @@ static void dp_ctrl_config_ctrl(struct dp_ctrl_private *ctrl)
tbd = dp_link_get_test_bits_depth(ctrl->link,
ctrl->panel->dp_mode.bpp);
- if (tbd == DP_TEST_BIT_DEPTH_UNKNOWN) {
- pr_debug("BIT_DEPTH not set. Configure default\n");
- tbd = DP_TEST_BIT_DEPTH_8;
- }
-
config |= tbd << DP_CONFIGURATION_CTRL_BPC_SHIFT;
/* Num of Lanes */
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
index d37d599..4c72124 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -329,10 +329,26 @@ static const struct component_ops dp_display_comp_ops = {
.unbind = dp_display_unbind,
};
+static void dp_display_send_hpd_event(struct msm_dp *dp_display)
+{
+ struct dp_display_private *dp;
+ struct drm_connector *connector;
+
+ dp = container_of(dp_display, struct dp_display_private, dp_display);
+
+ connector = dp->dp_display.connector;
+ drm_helper_hpd_irq_event(connector->dev);
+}
+
static int dp_display_send_hpd_notification(struct dp_display_private *dp,
bool hpd)
{
- struct drm_bridge *bridge = dp->dp_display.bridge;
+ if ((hpd && dp->dp_display.link_ready) ||
+ (!hpd && !dp->dp_display.link_ready)) {
+ drm_dbg_dp(dp->drm_dev, "HPD already %s\n",
+ (hpd ? "on" : "off"));
+ return 0;
+ }
/* reset video pattern flag on disconnect */
if (!hpd) {
@@ -348,7 +364,7 @@ static int dp_display_send_hpd_notification(struct dp_display_private *dp,
drm_dbg_dp(dp->drm_dev, "type=%d hpd=%d\n",
dp->dp_display.connector_type, hpd);
- drm_bridge_hpd_notify(bridge, dp->dp_display.link_ready);
+ dp_display_send_hpd_event(&dp->dp_display);
return 0;
}
diff --git a/drivers/gpu/drm/msm/dp/dp_link.c b/drivers/gpu/drm/msm/dp/dp_link.c
index 98427d4..49dfac1 100644
--- a/drivers/gpu/drm/msm/dp/dp_link.c
+++ b/drivers/gpu/drm/msm/dp/dp_link.c
@@ -7,6 +7,7 @@
#include <drm/drm_print.h>
+#include "dp_reg.h"
#include "dp_link.h"
#include "dp_panel.h"
@@ -1082,7 +1083,7 @@ int dp_link_process_request(struct dp_link *dp_link)
int dp_link_get_colorimetry_config(struct dp_link *dp_link)
{
- u32 cc;
+ u32 cc = DP_MISC0_COLORIMERY_CFG_LEGACY_RGB;
struct dp_link_private *link;
if (!dp_link) {
@@ -1096,10 +1097,11 @@ int dp_link_get_colorimetry_config(struct dp_link *dp_link)
* Unless a video pattern CTS test is ongoing, use RGB_VESA
* Only RGB_VESA and RGB_CEA supported for now
*/
- if (dp_link_is_video_pattern_requested(link))
- cc = link->dp_link.test_video.test_dyn_range;
- else
- cc = DP_TEST_DYNAMIC_RANGE_VESA;
+ if (dp_link_is_video_pattern_requested(link)) {
+ if (link->dp_link.test_video.test_dyn_range &
+ DP_TEST_DYNAMIC_RANGE_CEA)
+ cc = DP_MISC0_COLORIMERY_CFG_CEA_RGB;
+ }
return cc;
}
@@ -1179,6 +1181,9 @@ void dp_link_reset_phy_params_vx_px(struct dp_link *dp_link)
u32 dp_link_get_test_bits_depth(struct dp_link *dp_link, u32 bpp)
{
u32 tbd;
+ struct dp_link_private *link;
+
+ link = container_of(dp_link, struct dp_link_private, dp_link);
/*
* Few simplistic rules and assumptions made here:
@@ -1196,12 +1201,13 @@ u32 dp_link_get_test_bits_depth(struct dp_link *dp_link, u32 bpp)
tbd = DP_TEST_BIT_DEPTH_10;
break;
default:
- tbd = DP_TEST_BIT_DEPTH_UNKNOWN;
+ drm_dbg_dp(link->drm_dev, "bpp=%d not supported, use bpc=8\n",
+ bpp);
+ tbd = DP_TEST_BIT_DEPTH_8;
break;
}
- if (tbd != DP_TEST_BIT_DEPTH_UNKNOWN)
- tbd = (tbd >> DP_TEST_BIT_DEPTH_SHIFT);
+ tbd = (tbd >> DP_TEST_BIT_DEPTH_SHIFT);
return tbd;
}
diff --git a/drivers/gpu/drm/msm/dp/dp_reg.h b/drivers/gpu/drm/msm/dp/dp_reg.h
index ea85a69..78785ed 100644
--- a/drivers/gpu/drm/msm/dp/dp_reg.h
+++ b/drivers/gpu/drm/msm/dp/dp_reg.h
@@ -143,6 +143,9 @@
#define DP_MISC0_COLORIMETRY_CFG_SHIFT (0x00000001)
#define DP_MISC0_TEST_BITS_DEPTH_SHIFT (0x00000005)
+#define DP_MISC0_COLORIMERY_CFG_LEGACY_RGB (0)
+#define DP_MISC0_COLORIMERY_CFG_CEA_RGB (0x04)
+
#define REG_DP_VALID_BOUNDARY (0x00000030)
#define REG_DP_VALID_BOUNDARY_2 (0x00000034)
diff --git a/drivers/gpu/drm/msm/msm_gem_prime.c b/drivers/gpu/drm/msm/msm_gem_prime.c
index 5f68e31..0915f3b 100644
--- a/drivers/gpu/drm/msm/msm_gem_prime.c
+++ b/drivers/gpu/drm/msm/msm_gem_prime.c
@@ -26,7 +26,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map)
{
void *vaddr;
- vaddr = msm_gem_get_vaddr(obj);
+ vaddr = msm_gem_get_vaddr_locked(obj);
if (IS_ERR(vaddr))
return PTR_ERR(vaddr);
iosys_map_set_vaddr(map, vaddr);
@@ -36,7 +36,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map)
void msm_gem_prime_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
{
- msm_gem_put_vaddr(obj);
+ msm_gem_put_vaddr_locked(obj);
}
struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 0953907..655002b 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -751,12 +751,14 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
struct msm_ringbuffer *ring = submit->ring;
unsigned long flags;
+ WARN_ON(!mutex_is_locked(&gpu->lock));
+
pm_runtime_get_sync(&gpu->pdev->dev);
- mutex_lock(&gpu->lock);
-
msm_gpu_hw_init(gpu);
+ submit->seqno = submit->hw_fence->seqno;
+
update_sw_cntrs(gpu);
/*
@@ -781,11 +783,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
gpu->funcs->submit(gpu, submit);
gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
- hangcheck_timer_reset(gpu);
-
- mutex_unlock(&gpu->lock);
-
pm_runtime_put(&gpu->pdev->dev);
+ hangcheck_timer_reset(gpu);
}
/*
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 5cc8d35..d551203 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -21,6 +21,8 @@ struct msm_iommu_pagetable {
struct msm_mmu base;
struct msm_mmu *parent;
struct io_pgtable_ops *pgtbl_ops;
+ const struct iommu_flush_ops *tlb;
+ struct device *iommu_dev;
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
phys_addr_t ttbr;
u32 asid;
@@ -201,11 +203,33 @@ static const struct msm_mmu_funcs pagetable_funcs = {
static void msm_iommu_tlb_flush_all(void *cookie)
{
+ struct msm_iommu_pagetable *pagetable = cookie;
+ struct adreno_smmu_priv *adreno_smmu;
+
+ if (!pm_runtime_get_if_in_use(pagetable->iommu_dev))
+ return;
+
+ adreno_smmu = dev_get_drvdata(pagetable->parent->dev);
+
+ pagetable->tlb->tlb_flush_all((void *)adreno_smmu->cookie);
+
+ pm_runtime_put_autosuspend(pagetable->iommu_dev);
}
static void msm_iommu_tlb_flush_walk(unsigned long iova, size_t size,
size_t granule, void *cookie)
{
+ struct msm_iommu_pagetable *pagetable = cookie;
+ struct adreno_smmu_priv *adreno_smmu;
+
+ if (!pm_runtime_get_if_in_use(pagetable->iommu_dev))
+ return;
+
+ adreno_smmu = dev_get_drvdata(pagetable->parent->dev);
+
+ pagetable->tlb->tlb_flush_walk(iova, size, granule, (void *)adreno_smmu->cookie);
+
+ pm_runtime_put_autosuspend(pagetable->iommu_dev);
}
static void msm_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
@@ -213,7 +237,7 @@ static void msm_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
{
}
-static const struct iommu_flush_ops null_tlb_ops = {
+static const struct iommu_flush_ops tlb_ops = {
.tlb_flush_all = msm_iommu_tlb_flush_all,
.tlb_flush_walk = msm_iommu_tlb_flush_walk,
.tlb_add_page = msm_iommu_tlb_add_page,
@@ -254,10 +278,10 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
/* The incoming cfg will have the TTBR1 quirk enabled */
ttbr0_cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
- ttbr0_cfg.tlb = &null_tlb_ops;
+ ttbr0_cfg.tlb = &tlb_ops;
pagetable->pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1,
- &ttbr0_cfg, iommu->domain);
+ &ttbr0_cfg, pagetable);
if (!pagetable->pgtbl_ops) {
kfree(pagetable);
@@ -279,6 +303,8 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
/* Needed later for TLB flush */
pagetable->parent = parent;
+ pagetable->tlb = ttbr1_cfg->tlb;
+ pagetable->iommu_dev = ttbr1_cfg->iommu_dev;
pagetable->pgsize_bitmap = ttbr0_cfg.pgsize_bitmap;
pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c
index 455b2e3..35423d1 100644
--- a/drivers/gpu/drm/msm/msm_mdss.c
+++ b/drivers/gpu/drm/msm/msm_mdss.c
@@ -562,6 +562,7 @@ static const struct msm_mdss_data sdm670_data = {
.ubwc_enc_version = UBWC_2_0,
.ubwc_dec_version = UBWC_2_0,
.highest_bank_bit = 1,
+ .reg_bus_bw = 76800,
};
static const struct msm_mdss_data sdm845_data = {
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 4bc13f7..9d6655f 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -21,8 +21,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
msm_fence_init(submit->hw_fence, fctx);
- submit->seqno = submit->hw_fence->seqno;
-
mutex_lock(&priv->lru.lock);
for (i = 0; i < submit->nr_bos; i++) {
@@ -35,8 +33,13 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
mutex_unlock(&priv->lru.lock);
+ /* TODO move submit path over to using a per-ring lock.. */
+ mutex_lock(&gpu->lock);
+
msm_gpu_submit(gpu, submit);
+ mutex_unlock(&gpu->lock);
+
return dma_fence_get(submit->hw_fence);
}
diff --git a/drivers/gpu/drm/nouveau/Kconfig b/drivers/gpu/drm/nouveau/Kconfig
index 1e6aaf9..ceef470 100644
--- a/drivers/gpu/drm/nouveau/Kconfig
+++ b/drivers/gpu/drm/nouveau/Kconfig
@@ -100,3 +100,11 @@
help
Say Y here if you want to enable experimental support for
Shared Virtual Memory (SVM).
+
+config DRM_NOUVEAU_GSP_DEFAULT
+ bool "Use GSP firmware for Turing/Ampere (needs firmware installed)"
+ depends on DRM_NOUVEAU
+ default n
+ help
+ Say Y here if you want to use the GSP codepaths by default on
+ Turing and Ampere GPUs.
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/core/client.h b/drivers/gpu/drm/nouveau/include/nvkm/core/client.h
index 0d9fc74..932c9fd 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/core/client.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/core/client.h
@@ -11,6 +11,7 @@ struct nvkm_client {
u32 debug;
struct rb_root objroot;
+ spinlock_t obj_lock;
void *data;
int (*event)(u64 token, void *argv, u32 argc);
diff --git a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
index d1437c0..6f5d376 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
@@ -9,7 +9,7 @@
#define GSP_PAGE_SIZE BIT(GSP_PAGE_SHIFT)
struct nvkm_gsp_mem {
- u32 size;
+ size_t size;
void *data;
dma_addr_t addr;
};
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c b/drivers/gpu/drm/nouveau/nouveau_abi16.c
index a04156c..80f74ee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
@@ -128,12 +128,14 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
struct nouveau_abi16_ntfy *ntfy, *temp;
/* Cancel all jobs from the entity's queue. */
- drm_sched_entity_fini(&chan->sched.entity);
+ if (chan->sched)
+ drm_sched_entity_fini(&chan->sched->entity);
if (chan->chan)
nouveau_channel_idle(chan->chan);
- nouveau_sched_fini(&chan->sched);
+ if (chan->sched)
+ nouveau_sched_destroy(&chan->sched);
/* cleanup notifier state */
list_for_each_entry_safe(ntfy, temp, &chan->notifiers, head) {
@@ -197,6 +199,7 @@ nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
struct nouveau_cli *cli = nouveau_cli(file_priv);
struct nouveau_drm *drm = nouveau_drm(dev);
struct nvif_device *device = &drm->client.device;
+ struct nvkm_device *nvkm_device = nvxx_device(&drm->client.device);
struct nvkm_gr *gr = nvxx_gr(device);
struct drm_nouveau_getparam *getparam = data;
struct pci_dev *pdev = to_pci_dev(dev->dev);
@@ -261,6 +264,14 @@ nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
getparam->value = nouveau_exec_push_max_from_ib_max(ib_max);
break;
}
+ case NOUVEAU_GETPARAM_VRAM_BAR_SIZE:
+ getparam->value = nvkm_device->func->resource_size(nvkm_device, 1);
+ break;
+ case NOUVEAU_GETPARAM_VRAM_USED: {
+ struct ttm_resource_manager *vram_mgr = ttm_manager_type(&drm->ttm.bdev, TTM_PL_VRAM);
+ getparam->value = (u64)ttm_resource_manager_usage(vram_mgr);
+ break;
+ }
default:
NV_PRINTK(dbg, cli, "unknown parameter %lld\n", getparam->param);
return -EINVAL;
@@ -337,10 +348,16 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
if (ret)
goto done;
- ret = nouveau_sched_init(&chan->sched, drm, drm->sched_wq,
- chan->chan->dma.ib_max);
- if (ret)
- goto done;
+ /* If we're not using the VM_BIND uAPI, we don't need a scheduler.
+ *
+ * The client lock is already acquired by nouveau_abi16_get().
+ */
+ if (nouveau_cli_uvmm(cli)) {
+ ret = nouveau_sched_create(&chan->sched, drm, drm->sched_wq,
+ chan->chan->dma.ib_max);
+ if (ret)
+ goto done;
+ }
init->channel = chan->chan->chid;
diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.h b/drivers/gpu/drm/nouveau/nouveau_abi16.h
index 1f5e243..11c8c4a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_abi16.h
+++ b/drivers/gpu/drm/nouveau/nouveau_abi16.h
@@ -26,7 +26,7 @@ struct nouveau_abi16_chan {
struct nouveau_bo *ntfy;
struct nouveau_vma *ntfy_vma;
struct nvkm_mm heap;
- struct nouveau_sched sched;
+ struct nouveau_sched *sched;
};
struct nouveau_abi16 {
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 6f6c31a..a947e1d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -201,7 +201,8 @@ nouveau_cli_fini(struct nouveau_cli *cli)
WARN_ON(!list_empty(&cli->worker));
usif_client_fini(cli);
- nouveau_sched_fini(&cli->sched);
+ if (cli->sched)
+ nouveau_sched_destroy(&cli->sched);
if (uvmm)
nouveau_uvmm_fini(uvmm);
nouveau_vmm_fini(&cli->svm);
@@ -311,7 +312,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname,
cli->mem = &mems[ret];
/* Don't pass in the (shared) sched_wq in order to let
- * nouveau_sched_init() create a dedicated one for VM_BIND jobs.
+ * nouveau_sched_create() create a dedicated one for VM_BIND jobs.
*
* This is required to ensure that for VM_BIND jobs free_job() work and
* run_job() work can always run concurrently and hence, free_job() work
@@ -320,7 +321,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname,
* locks which indirectly or directly are held for allocations
* elsewhere.
*/
- ret = nouveau_sched_init(&cli->sched, drm, NULL, 1);
+ ret = nouveau_sched_create(&cli->sched, drm, NULL, 1);
if (ret)
goto done;
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index 8a6d94c..e239c6b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -98,7 +98,7 @@ struct nouveau_cli {
bool disabled;
} uvmm;
- struct nouveau_sched sched;
+ struct nouveau_sched *sched;
const struct nvif_mclass *mem;
diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c b/drivers/gpu/drm/nouveau/nouveau_exec.c
index bc5d71b..e65c0ef 100644
--- a/drivers/gpu/drm/nouveau/nouveau_exec.c
+++ b/drivers/gpu/drm/nouveau/nouveau_exec.c
@@ -389,7 +389,7 @@ nouveau_exec_ioctl_exec(struct drm_device *dev,
if (ret)
goto out;
- args.sched = &chan16->sched;
+ args.sched = chan16->sched;
args.file_priv = file_priv;
args.chan = chan;
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 49c2bcb..5a887d6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -764,7 +764,7 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
return -ENOMEM;
if (unlikely(nouveau_cli_uvmm(cli)))
- return -ENOSYS;
+ return nouveau_abi16_put(abi16, -ENOSYS);
list_for_each_entry(temp, &abi16->channels, head) {
if (temp->chan->chid == req->channel) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.c b/drivers/gpu/drm/nouveau/nouveau_sched.c
index dd98f69..32fa2e2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.c
@@ -398,7 +398,7 @@ static const struct drm_sched_backend_ops nouveau_sched_ops = {
.free_job = nouveau_sched_free_job,
};
-int
+static int
nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
struct workqueue_struct *wq, u32 credit_limit)
{
@@ -453,7 +453,30 @@ nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
return ret;
}
-void
+int
+nouveau_sched_create(struct nouveau_sched **psched, struct nouveau_drm *drm,
+ struct workqueue_struct *wq, u32 credit_limit)
+{
+ struct nouveau_sched *sched;
+ int ret;
+
+ sched = kzalloc(sizeof(*sched), GFP_KERNEL);
+ if (!sched)
+ return -ENOMEM;
+
+ ret = nouveau_sched_init(sched, drm, wq, credit_limit);
+ if (ret) {
+ kfree(sched);
+ return ret;
+ }
+
+ *psched = sched;
+
+ return 0;
+}
+
+
+static void
nouveau_sched_fini(struct nouveau_sched *sched)
{
struct drm_gpu_scheduler *drm_sched = &sched->base;
@@ -471,3 +494,14 @@ nouveau_sched_fini(struct nouveau_sched *sched)
if (sched->wq)
destroy_workqueue(sched->wq);
}
+
+void
+nouveau_sched_destroy(struct nouveau_sched **psched)
+{
+ struct nouveau_sched *sched = *psched;
+
+ nouveau_sched_fini(sched);
+ kfree(sched);
+
+ *psched = NULL;
+}
diff --git a/drivers/gpu/drm/nouveau/nouveau_sched.h b/drivers/gpu/drm/nouveau/nouveau_sched.h
index a6528f5..e1f01a2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sched.h
+++ b/drivers/gpu/drm/nouveau/nouveau_sched.h
@@ -111,8 +111,8 @@ struct nouveau_sched {
} job;
};
-int nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
- struct workqueue_struct *wq, u32 credit_limit);
-void nouveau_sched_fini(struct nouveau_sched *sched);
+int nouveau_sched_create(struct nouveau_sched **psched, struct nouveau_drm *drm,
+ struct workqueue_struct *wq, u32 credit_limit);
+void nouveau_sched_destroy(struct nouveau_sched **psched);
#endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index cc03e0c..5e4565c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -1011,7 +1011,7 @@ nouveau_svm_fault_buffer_ctor(struct nouveau_svm *svm, s32 oclass, int id)
if (ret)
return ret;
- buffer->fault = kvcalloc(sizeof(*buffer->fault), buffer->entries, GFP_KERNEL);
+ buffer->fault = kvcalloc(buffer->entries, sizeof(*buffer->fault), GFP_KERNEL);
if (!buffer->fault)
return -ENOMEM;
diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
index 4f223c9..0a0a11d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c
@@ -1740,7 +1740,7 @@ nouveau_uvmm_ioctl_vm_bind(struct drm_device *dev,
if (ret)
return ret;
- args.sched = &cli->sched;
+ args.sched = cli->sched;
args.file_priv = file_priv;
ret = nouveau_uvmm_vm_bind(&args);
diff --git a/drivers/gpu/drm/nouveau/nvkm/core/client.c b/drivers/gpu/drm/nouveau/nvkm/core/client.c
index ebdeb8e..c556629 100644
--- a/drivers/gpu/drm/nouveau/nvkm/core/client.c
+++ b/drivers/gpu/drm/nouveau/nvkm/core/client.c
@@ -180,6 +180,7 @@ nvkm_client_new(const char *name, u64 device, const char *cfg, const char *dbg,
client->device = device;
client->debug = nvkm_dbgopt(dbg, "CLIENT");
client->objroot = RB_ROOT;
+ spin_lock_init(&client->obj_lock);
client->event = event;
INIT_LIST_HEAD(&client->umem);
spin_lock_init(&client->lock);
diff --git a/drivers/gpu/drm/nouveau/nvkm/core/object.c b/drivers/gpu/drm/nouveau/nvkm/core/object.c
index 7c554c1..aea3ba7 100644
--- a/drivers/gpu/drm/nouveau/nvkm/core/object.c
+++ b/drivers/gpu/drm/nouveau/nvkm/core/object.c
@@ -30,8 +30,10 @@ nvkm_object_search(struct nvkm_client *client, u64 handle,
const struct nvkm_object_func *func)
{
struct nvkm_object *object;
+ unsigned long flags;
if (handle) {
+ spin_lock_irqsave(&client->obj_lock, flags);
struct rb_node *node = client->objroot.rb_node;
while (node) {
object = rb_entry(node, typeof(*object), node);
@@ -40,9 +42,12 @@ nvkm_object_search(struct nvkm_client *client, u64 handle,
else
if (handle > object->object)
node = node->rb_right;
- else
+ else {
+ spin_unlock_irqrestore(&client->obj_lock, flags);
goto done;
+ }
}
+ spin_unlock_irqrestore(&client->obj_lock, flags);
return ERR_PTR(-ENOENT);
} else {
object = &client->object;
@@ -57,30 +62,39 @@ nvkm_object_search(struct nvkm_client *client, u64 handle,
void
nvkm_object_remove(struct nvkm_object *object)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&object->client->obj_lock, flags);
if (!RB_EMPTY_NODE(&object->node))
rb_erase(&object->node, &object->client->objroot);
+ spin_unlock_irqrestore(&object->client->obj_lock, flags);
}
bool
nvkm_object_insert(struct nvkm_object *object)
{
- struct rb_node **ptr = &object->client->objroot.rb_node;
+ struct rb_node **ptr;
struct rb_node *parent = NULL;
+ unsigned long flags;
+ spin_lock_irqsave(&object->client->obj_lock, flags);
+ ptr = &object->client->objroot.rb_node;
while (*ptr) {
struct nvkm_object *this = rb_entry(*ptr, typeof(*this), node);
parent = *ptr;
- if (object->object < this->object)
+ if (object->object < this->object) {
ptr = &parent->rb_left;
- else
- if (object->object > this->object)
+ } else if (object->object > this->object) {
ptr = &parent->rb_right;
- else
+ } else {
+ spin_unlock_irqrestore(&object->client->obj_lock, flags);
return false;
+ }
}
rb_link_node(&object->node, parent, ptr);
rb_insert_color(&object->node, &object->client->objroot);
+ spin_unlock_irqrestore(&object->client->obj_lock, flags);
return true;
}
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
index 4135690..3a30bea 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
@@ -168,12 +168,11 @@ r535_bar_new_(const struct nvkm_bar_func *hw, struct nvkm_device *device,
rm->flush = r535_bar_flush;
ret = gf100_bar_new_(rm, device, type, inst, &bar);
- *pbar = bar;
if (ret) {
- if (!bar)
- kfree(rm);
+ kfree(rm);
return ret;
}
+ *pbar = bar;
bar->flushBAR2PhysMode = ioremap(device->func->resource_addr(device, 3), PAGE_SIZE);
if (!bar->flushBAR2PhysMode)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c
index 1918868..8c2bf1c 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c
@@ -154,11 +154,17 @@ shadow_fw_init(struct nvkm_bios *bios, const char *name)
return (void *)fw;
}
+static void
+shadow_fw_release(void *fw)
+{
+ release_firmware(fw);
+}
+
static const struct nvbios_source
shadow_fw = {
.name = "firmware",
.init = shadow_fw_init,
- .fini = (void(*)(void *))release_firmware,
+ .fini = shadow_fw_release,
.read = shadow_fw_read,
.rw = false,
};
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
index 5e1fa17..a73a5b5 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
@@ -997,6 +997,32 @@ r535_gsp_rpc_get_gsp_static_info(struct nvkm_gsp *gsp)
return 0;
}
+static void
+nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem)
+{
+ if (mem->data) {
+ /*
+ * Poison the buffer to catch any unexpected access from
+ * GSP-RM if the buffer was prematurely freed.
+ */
+ memset(mem->data, 0xFF, mem->size);
+
+ dma_free_coherent(gsp->subdev.device->dev, mem->size, mem->data, mem->addr);
+ memset(mem, 0, sizeof(*mem));
+ }
+}
+
+static int
+nvkm_gsp_mem_ctor(struct nvkm_gsp *gsp, size_t size, struct nvkm_gsp_mem *mem)
+{
+ mem->size = size;
+ mem->data = dma_alloc_coherent(gsp->subdev.device->dev, size, &mem->addr, GFP_KERNEL);
+ if (WARN_ON(!mem->data))
+ return -ENOMEM;
+
+ return 0;
+}
+
static int
r535_gsp_postinit(struct nvkm_gsp *gsp)
{
@@ -1024,6 +1050,11 @@ r535_gsp_postinit(struct nvkm_gsp *gsp)
nvkm_inth_allow(&gsp->subdev.inth);
nvkm_wr32(device, 0x110004, 0x00000040);
+
+ /* Release the DMA buffers that were needed only for boot and init */
+ nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw);
+ nvkm_gsp_mem_dtor(gsp, &gsp->libos);
+
return ret;
}
@@ -1532,27 +1563,6 @@ r535_gsp_msg_run_cpu_sequencer(void *priv, u32 fn, void *repv, u32 repc)
return 0;
}
-static void
-nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem)
-{
- if (mem->data) {
- dma_free_coherent(gsp->subdev.device->dev, mem->size, mem->data, mem->addr);
- mem->data = NULL;
- }
-}
-
-static int
-nvkm_gsp_mem_ctor(struct nvkm_gsp *gsp, u32 size, struct nvkm_gsp_mem *mem)
-{
- mem->size = size;
- mem->data = dma_alloc_coherent(gsp->subdev.device->dev, size, &mem->addr, GFP_KERNEL);
- if (WARN_ON(!mem->data))
- return -ENOMEM;
-
- return 0;
-}
-
-
static int
r535_gsp_booter_unload(struct nvkm_gsp *gsp, u32 mbox0, u32 mbox1)
{
@@ -1938,20 +1948,20 @@ nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3)
* See kgspCreateRadix3_IMPL
*/
static int
-nvkm_gsp_radix3_sg(struct nvkm_device *device, struct sg_table *sgt, u64 size,
+nvkm_gsp_radix3_sg(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size,
struct nvkm_gsp_radix3 *rx3)
{
u64 addr;
for (int i = ARRAY_SIZE(rx3->mem) - 1; i >= 0; i--) {
u64 *ptes;
- int idx;
+ size_t bufsize;
+ int ret, idx;
- rx3->mem[i].size = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
- rx3->mem[i].data = dma_alloc_coherent(device->dev, rx3->mem[i].size,
- &rx3->mem[i].addr, GFP_KERNEL);
- if (WARN_ON(!rx3->mem[i].data))
- return -ENOMEM;
+ bufsize = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
+ ret = nvkm_gsp_mem_ctor(gsp, bufsize, &rx3->mem[i]);
+ if (ret)
+ return ret;
ptes = rx3->mem[i].data;
if (i == 2) {
@@ -1991,7 +2001,7 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend)
if (ret)
return ret;
- ret = nvkm_gsp_radix3_sg(gsp->subdev.device, &gsp->sr.sgt, len, &gsp->sr.radix3);
+ ret = nvkm_gsp_radix3_sg(gsp, &gsp->sr.sgt, len, &gsp->sr.radix3);
if (ret)
return ret;
@@ -2150,6 +2160,13 @@ r535_gsp_dtor(struct nvkm_gsp *gsp)
mutex_destroy(&gsp->cmdq.mutex);
r535_gsp_dtor_fws(gsp);
+
+ nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
+ nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
+ nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem);
+ nvkm_gsp_mem_dtor(gsp, &gsp->loginit);
+ nvkm_gsp_mem_dtor(gsp, &gsp->logintr);
+ nvkm_gsp_mem_dtor(gsp, &gsp->logrm);
}
int
@@ -2194,7 +2211,7 @@ r535_gsp_oneinit(struct nvkm_gsp *gsp)
memcpy(gsp->sig.data, data, size);
/* Build radix3 page table for ELF image. */
- ret = nvkm_gsp_radix3_sg(device, &gsp->fw.mem.sgt, gsp->fw.len, &gsp->radix3);
+ ret = nvkm_gsp_radix3_sg(gsp, &gsp->fw.mem.sgt, gsp->fw.len, &gsp->radix3);
if (ret)
return ret;
@@ -2295,8 +2312,12 @@ r535_gsp_load(struct nvkm_gsp *gsp, int ver, const struct nvkm_gsp_fwif *fwif)
{
struct nvkm_subdev *subdev = &gsp->subdev;
int ret;
+ bool enable_gsp = fwif->enable;
- if (!nvkm_boolopt(subdev->device->cfgopt, "NvGspRm", fwif->enable))
+#if IS_ENABLED(CONFIG_DRM_NOUVEAU_GSP_DEFAULT)
+ enable_gsp = true;
+#endif
+ if (!nvkm_boolopt(subdev->device->cfgopt, "NvGspRm", enable_gsp))
return -EINVAL;
if ((ret = r535_gsp_load_fw(gsp, "gsp", fwif->ver, &gsp->fws.rm)) ||
diff --git a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
index c4c0f08..4945a1e 100644
--- a/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
+++ b/drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
@@ -1768,11 +1768,11 @@ static const struct panel_desc starry_qfh032011_53g_desc = {
};
static const struct drm_display_mode starry_himax83102_j02_default_mode = {
- .clock = 162850,
+ .clock = 162680,
.hdisplay = 1200,
- .hsync_start = 1200 + 50,
- .hsync_end = 1200 + 50 + 20,
- .htotal = 1200 + 50 + 20 + 50,
+ .hsync_start = 1200 + 60,
+ .hsync_end = 1200 + 60 + 20,
+ .htotal = 1200 + 60 + 20 + 40,
.vdisplay = 1920,
.vsync_start = 1920 + 116,
.vsync_end = 1920 + 116 + 8,
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
index 85b3b48..fdd768b 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
@@ -1985,8 +1985,10 @@ static void vop2_crtc_atomic_enable(struct drm_crtc *crtc,
clock = vop2_set_intf_mux(vp, rkencoder->crtc_endpoint_id, polflags);
}
- if (!clock)
+ if (!clock) {
+ vop2_unlock(vop2);
return;
+ }
if (vcstate->output_mode == ROCKCHIP_OUT_MODE_AAAA &&
!(vp_data->feature & VOP2_VP_FEATURE_OUTPUT_10BIT))
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 85f0823..d442b89 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1178,21 +1178,24 @@ static void drm_sched_run_job_work(struct work_struct *w)
struct drm_sched_entity *entity;
struct dma_fence *fence;
struct drm_sched_fence *s_fence;
- struct drm_sched_job *sched_job = NULL;
+ struct drm_sched_job *sched_job;
int r;
if (READ_ONCE(sched->pause_submit))
return;
/* Find entity with a ready job */
- while (!sched_job && (entity = drm_sched_select_entity(sched))) {
- sched_job = drm_sched_entity_pop_job(entity);
- if (!sched_job)
- complete_all(&entity->entity_idle);
- }
+ entity = drm_sched_select_entity(sched);
if (!entity)
return; /* No more work */
+ sched_job = drm_sched_entity_pop_job(entity);
+ if (!sched_job) {
+ complete_all(&entity->entity_idle);
+ drm_sched_run_job_queue(sched);
+ return;
+ }
+
s_fence = sched_job->s_fence;
atomic_add(sched_job->credits, &sched->credit_count);
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index a73cff7..03d1c76 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1243,9 +1243,26 @@ static int host1x_drm_probe(struct host1x_device *dev)
drm_mode_config_reset(drm);
- err = drm_aperture_remove_framebuffers(&tegra_drm_driver);
- if (err < 0)
- goto hub;
+ /*
+ * Only take over from a potential firmware framebuffer if any CRTCs
+ * have been registered. This must not be a fatal error because there
+ * are other accelerators that are exposed via this driver.
+ *
+ * Another case where this happens is on Tegra234 where the display
+ * hardware is no longer part of the host1x complex, so this driver
+ * will not expose any modesetting features.
+ */
+ if (drm->mode_config.num_crtc > 0) {
+ err = drm_aperture_remove_framebuffers(&tegra_drm_driver);
+ if (err < 0)
+ goto hub;
+ } else {
+ /*
+ * Indicate to userspace that this doesn't expose any display
+ * capabilities.
+ */
+ drm->driver_features &= ~(DRIVER_MODESET | DRIVER_ATOMIC);
+ }
err = drm_dev_register(drm, 0);
if (err < 0)
diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c
index ea2af6b..e48863a 100644
--- a/drivers/gpu/drm/tests/drm_buddy_test.c
+++ b/drivers/gpu/drm/tests/drm_buddy_test.c
@@ -8,16 +8,308 @@
#include <linux/prime_numbers.h>
#include <linux/sched/signal.h>
+#include <linux/sizes.h>
#include <drm/drm_buddy.h>
#include "../lib/drm_random.h"
+static unsigned int random_seed;
+
static inline u64 get_size(int order, u64 chunk_size)
{
return (1 << order) * chunk_size;
}
+static void drm_test_buddy_alloc_range_bias(struct kunit *test)
+{
+ u32 mm_size, ps, bias_size, bias_start, bias_end, bias_rem;
+ DRM_RND_STATE(prng, random_seed);
+ unsigned int i, count, *order;
+ struct drm_buddy mm;
+ LIST_HEAD(allocated);
+
+ bias_size = SZ_1M;
+ ps = roundup_pow_of_two(prandom_u32_state(&prng) % bias_size);
+ ps = max(SZ_4K, ps);
+ mm_size = (SZ_8M-1) & ~(ps-1); /* Multiple roots */
+
+ kunit_info(test, "mm_size=%u, ps=%u\n", mm_size, ps);
+
+ KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
+ "buddy_init failed\n");
+
+ count = mm_size / bias_size;
+ order = drm_random_order(count, &prng);
+ KUNIT_EXPECT_TRUE(test, order);
+
+ /*
+ * Idea is to split the address space into uniform bias ranges, and then
+ * in some random order allocate within each bias, using various
+ * patterns within. This should detect if allocations leak out from a
+ * given bias, for example.
+ */
+
+ for (i = 0; i < count; i++) {
+ LIST_HEAD(tmp);
+ u32 size;
+
+ bias_start = order[i] * bias_size;
+ bias_end = bias_start + bias_size;
+ bias_rem = bias_size;
+
+ /* internal round_up too big */
+ KUNIT_ASSERT_TRUE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start,
+ bias_end, bias_size + ps, bias_size,
+ &allocated,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start, bias_end, bias_size, bias_size);
+
+ /* size too big */
+ KUNIT_ASSERT_TRUE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start,
+ bias_end, bias_size + ps, ps,
+ &allocated,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start, bias_end, bias_size + ps, ps);
+
+ /* bias range too small for size */
+ KUNIT_ASSERT_TRUE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start + ps,
+ bias_end, bias_size, ps,
+ &allocated,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start + ps, bias_end, bias_size, ps);
+
+ /* bias misaligned */
+ KUNIT_ASSERT_TRUE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start + ps,
+ bias_end - ps,
+ bias_size >> 1, bias_size >> 1,
+ &allocated,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc h didn't fail with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start + ps, bias_end - ps, bias_size >> 1, bias_size >> 1);
+
+ /* single big page */
+ KUNIT_ASSERT_FALSE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start,
+ bias_end, bias_size, bias_size,
+ &tmp,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc i failed with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start, bias_end, bias_size, bias_size);
+ drm_buddy_free_list(&mm, &tmp);
+
+ /* single page with internal round_up */
+ KUNIT_ASSERT_FALSE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start,
+ bias_end, ps, bias_size,
+ &tmp,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start, bias_end, ps, bias_size);
+ drm_buddy_free_list(&mm, &tmp);
+
+ /* random size within */
+ size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
+ if (size)
+ KUNIT_ASSERT_FALSE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start,
+ bias_end, size, ps,
+ &tmp,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start, bias_end, size, ps);
+
+ bias_rem -= size;
+ /* too big for current avail */
+ KUNIT_ASSERT_TRUE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start,
+ bias_end, bias_rem + ps, ps,
+ &allocated,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start, bias_end, bias_rem + ps, ps);
+
+ if (bias_rem) {
+ /* random fill of the remainder */
+ size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
+ size = max(size, ps);
+
+ KUNIT_ASSERT_FALSE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start,
+ bias_end, size, ps,
+ &allocated,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start, bias_end, size, ps);
+ /*
+ * Intentionally allow some space to be left
+ * unallocated, and ideally not always on the bias
+ * boundaries.
+ */
+ drm_buddy_free_list(&mm, &tmp);
+ } else {
+ list_splice_tail(&tmp, &allocated);
+ }
+ }
+
+ kfree(order);
+ drm_buddy_free_list(&mm, &allocated);
+ drm_buddy_fini(&mm);
+
+ /*
+ * Something more free-form. Idea is to pick a random starting bias
+ * range within the address space and then start filling it up. Also
+ * randomly grow the bias range in both directions as we go along. This
+ * should give us bias start/end which is not always uniform like above,
+ * and in some cases will require the allocator to jump over already
+ * allocated nodes in the middle of the address space.
+ */
+
+ KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
+ "buddy_init failed\n");
+
+ bias_start = round_up(prandom_u32_state(&prng) % (mm_size - ps), ps);
+ bias_end = round_up(bias_start + prandom_u32_state(&prng) % (mm_size - bias_start), ps);
+ bias_end = max(bias_end, bias_start + ps);
+ bias_rem = bias_end - bias_start;
+
+ do {
+ u32 size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
+
+ KUNIT_ASSERT_FALSE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start,
+ bias_end, size, ps,
+ &allocated,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+ bias_start, bias_end, size, ps);
+ bias_rem -= size;
+
+ /*
+ * Try to randomly grow the bias range in both directions, or
+ * only one, or perhaps don't grow at all.
+ */
+ do {
+ u32 old_bias_start = bias_start;
+ u32 old_bias_end = bias_end;
+
+ if (bias_start)
+ bias_start -= round_up(prandom_u32_state(&prng) % bias_start, ps);
+ if (bias_end != mm_size)
+ bias_end += round_up(prandom_u32_state(&prng) % (mm_size - bias_end), ps);
+
+ bias_rem += old_bias_start - bias_start;
+ bias_rem += bias_end - old_bias_end;
+ } while (!bias_rem && (bias_start || bias_end != mm_size));
+ } while (bias_rem);
+
+ KUNIT_ASSERT_EQ(test, bias_start, 0);
+ KUNIT_ASSERT_EQ(test, bias_end, mm_size);
+ KUNIT_ASSERT_TRUE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, bias_start, bias_end,
+ ps, ps,
+ &allocated,
+ DRM_BUDDY_RANGE_ALLOCATION),
+ "buddy_alloc passed with bias(%x-%x), size=%u\n",
+ bias_start, bias_end, ps);
+
+ drm_buddy_free_list(&mm, &allocated);
+ drm_buddy_fini(&mm);
+}
+
+static void drm_test_buddy_alloc_contiguous(struct kunit *test)
+{
+ const unsigned long ps = SZ_4K, mm_size = 16 * 3 * SZ_4K;
+ unsigned long i, n_pages, total;
+ struct drm_buddy_block *block;
+ struct drm_buddy mm;
+ LIST_HEAD(left);
+ LIST_HEAD(middle);
+ LIST_HEAD(right);
+ LIST_HEAD(allocated);
+
+ KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
+
+ /*
+ * Idea is to fragment the address space by alternating block
+ * allocations between three different lists; one for left, middle and
+ * right. We can then free a list to simulate fragmentation. In
+ * particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION,
+ * including the try_harder path.
+ */
+
+ i = 0;
+ n_pages = mm_size / ps;
+ do {
+ struct list_head *list;
+ int slot = i % 3;
+
+ if (slot == 0)
+ list = &left;
+ else if (slot == 1)
+ list = &middle;
+ else
+ list = &right;
+ KUNIT_ASSERT_FALSE_MSG(test,
+ drm_buddy_alloc_blocks(&mm, 0, mm_size,
+ ps, ps, list, 0),
+ "buddy_alloc hit an error size=%lu\n",
+ ps);
+ } while (++i < n_pages);
+
+ KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+ 3 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+ "buddy_alloc didn't error size=%lu\n", 3 * ps);
+
+ drm_buddy_free_list(&mm, &middle);
+ KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+ 3 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+ "buddy_alloc didn't error size=%lu\n", 3 * ps);
+ KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+ 2 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+ "buddy_alloc didn't error size=%lu\n", 2 * ps);
+
+ drm_buddy_free_list(&mm, &right);
+ KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+ 3 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+ "buddy_alloc didn't error size=%lu\n", 3 * ps);
+ /*
+ * At this point we should have enough contiguous space for 2 blocks,
+ * however they are never buddies (since we freed middle and right) so
+ * will require the try_harder logic to find them.
+ */
+ KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+ 2 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+ "buddy_alloc hit an error size=%lu\n", 2 * ps);
+
+ drm_buddy_free_list(&mm, &left);
+ KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+ 3 * ps, ps, &allocated,
+ DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+ "buddy_alloc hit an error size=%lu\n", 3 * ps);
+
+ total = 0;
+ list_for_each_entry(block, &allocated, link)
+ total += drm_buddy_block_size(&mm, block);
+
+ KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3);
+
+ drm_buddy_free_list(&mm, &allocated);
+ drm_buddy_fini(&mm);
+}
+
static void drm_test_buddy_alloc_pathological(struct kunit *test)
{
u64 mm_size, size, start = 0;
@@ -275,16 +567,30 @@ static void drm_test_buddy_alloc_limit(struct kunit *test)
drm_buddy_fini(&mm);
}
+static int drm_buddy_suite_init(struct kunit_suite *suite)
+{
+ while (!random_seed)
+ random_seed = get_random_u32();
+
+ kunit_info(suite, "Testing DRM buddy manager, with random_seed=0x%x\n",
+ random_seed);
+
+ return 0;
+}
+
static struct kunit_case drm_buddy_tests[] = {
KUNIT_CASE(drm_test_buddy_alloc_limit),
KUNIT_CASE(drm_test_buddy_alloc_optimistic),
KUNIT_CASE(drm_test_buddy_alloc_pessimistic),
KUNIT_CASE(drm_test_buddy_alloc_pathological),
+ KUNIT_CASE(drm_test_buddy_alloc_contiguous),
+ KUNIT_CASE(drm_test_buddy_alloc_range_bias),
{}
};
static struct kunit_suite drm_buddy_test_suite = {
.name = "drm_buddy",
+ .suite_init = drm_buddy_suite_init,
.test_cases = drm_buddy_tests,
};
diff --git a/drivers/gpu/drm/tests/drm_mm_test.c b/drivers/gpu/drm/tests/drm_mm_test.c
index 1eb0c30..f37c0d7 100644
--- a/drivers/gpu/drm/tests/drm_mm_test.c
+++ b/drivers/gpu/drm/tests/drm_mm_test.c
@@ -157,7 +157,7 @@ static void drm_test_mm_init(struct kunit *test)
/* After creation, it should all be one massive hole */
if (!assert_one_hole(test, &mm, 0, size)) {
- KUNIT_FAIL(test, "");
+ KUNIT_FAIL(test, "mm not one hole on creation");
goto out;
}
@@ -171,14 +171,14 @@ static void drm_test_mm_init(struct kunit *test)
/* After filling the range entirely, there should be no holes */
if (!assert_no_holes(test, &mm)) {
- KUNIT_FAIL(test, "");
+ KUNIT_FAIL(test, "mm has holes when filled");
goto out;
}
/* And then after emptying it again, the massive hole should be back */
drm_mm_remove_node(&tmp);
if (!assert_one_hole(test, &mm, 0, size)) {
- KUNIT_FAIL(test, "");
+ KUNIT_FAIL(test, "mm does not have single hole after emptying");
goto out;
}
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index b62f420..112438d 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
enum ttm_caching caching,
pgoff_t start_page, pgoff_t end_page)
{
- struct page **pages = tt->pages;
+ struct page **pages = &tt->pages[start_page];
unsigned int order;
pgoff_t i, nr;
diff --git a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h
index 68d9f61..777c20c 100644
--- a/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h
@@ -10,7 +10,7 @@
#include "xe_bo.h"
-#define i915_gem_object_is_shmem(obj) ((obj)->flags & XE_BO_CREATE_SYSTEM_BIT)
+#define i915_gem_object_is_shmem(obj) (0) /* We don't use shmem */
static inline dma_addr_t i915_gem_object_get_dma_address(const struct xe_bo *bo, pgoff_t n)
{
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index a6523df..c347e2c 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -114,21 +114,21 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
region |
XE_BO_NEEDS_CPU_ACCESS);
if (IS_ERR(remote)) {
- KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %li\n",
- str, PTR_ERR(remote));
+ KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n",
+ str, remote);
return;
}
err = xe_bo_validate(remote, NULL, false);
if (err) {
- KUNIT_FAIL(test, "Failed to validate system bo for %s: %li\n",
+ KUNIT_FAIL(test, "Failed to validate system bo for %s: %i\n",
str, err);
goto out_unlock;
}
err = xe_bo_vmap(remote);
if (err) {
- KUNIT_FAIL(test, "Failed to vmap system bo for %s: %li\n",
+ KUNIT_FAIL(test, "Failed to vmap system bo for %s: %i\n",
str, err);
goto out_unlock;
}
diff --git a/drivers/gpu/drm/xe/tests/xe_mocs_test.c b/drivers/gpu/drm/xe/tests/xe_mocs_test.c
index ef56bd5..421b819 100644
--- a/drivers/gpu/drm/xe/tests/xe_mocs_test.c
+++ b/drivers/gpu/drm/xe/tests/xe_mocs_test.c
@@ -21,4 +21,5 @@ kunit_test_suite(xe_mocs_test_suite);
MODULE_AUTHOR("Intel Corporation");
MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("xe_mocs kunit test");
MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING);
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 0b0e262..4d3b80e 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -28,6 +28,14 @@
#include "xe_ttm_stolen_mgr.h"
#include "xe_vm.h"
+const char *const xe_mem_type_to_name[TTM_NUM_MEM_TYPES] = {
+ [XE_PL_SYSTEM] = "system",
+ [XE_PL_TT] = "gtt",
+ [XE_PL_VRAM0] = "vram0",
+ [XE_PL_VRAM1] = "vram1",
+ [XE_PL_STOLEN] = "stolen"
+};
+
static const struct ttm_place sys_placement_flags = {
.fpfn = 0,
.lpfn = 0,
@@ -713,8 +721,7 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
migrate = xe->tiles[0].migrate;
xe_assert(xe, migrate);
-
- trace_xe_bo_move(bo);
+ trace_xe_bo_move(bo, new_mem->mem_type, old_mem_type, move_lacks_source);
xe_device_mem_access_get(xe);
if (xe_bo_is_pinned(bo) && !xe_bo_is_user(bo)) {
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 9b1279a..8be42ac 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -243,6 +243,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo);
int xe_bo_restore_pinned(struct xe_bo *bo);
extern struct ttm_device_funcs xe_ttm_funcs;
+extern const char *const xe_mem_type_to_name[];
int xe_gem_create_ioctl(struct drm_device *dev, void *data,
struct drm_file *file);
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 1f0b4b9..5176c27 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -83,9 +83,6 @@ static int xe_file_open(struct drm_device *dev, struct drm_file *file)
return 0;
}
-static void device_kill_persistent_exec_queues(struct xe_device *xe,
- struct xe_file *xef);
-
static void xe_file_close(struct drm_device *dev, struct drm_file *file)
{
struct xe_device *xe = to_xe_device(dev);
@@ -102,8 +99,6 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
mutex_unlock(&xef->exec_queue.lock);
xa_destroy(&xef->exec_queue.xa);
mutex_destroy(&xef->exec_queue.lock);
- device_kill_persistent_exec_queues(xe, xef);
-
mutex_lock(&xef->vm.lock);
xa_for_each(&xef->vm.xa, idx, vm)
xe_vm_close_and_put(vm);
@@ -255,9 +250,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
xa_erase(&xe->usm.asid_to_vm, asid);
}
- drmm_mutex_init(&xe->drm, &xe->persistent_engines.lock);
- INIT_LIST_HEAD(&xe->persistent_engines.list);
-
spin_lock_init(&xe->pinned.lock);
INIT_LIST_HEAD(&xe->pinned.kernel_bo_present);
INIT_LIST_HEAD(&xe->pinned.external_vram);
@@ -570,37 +562,6 @@ void xe_device_shutdown(struct xe_device *xe)
{
}
-void xe_device_add_persistent_exec_queues(struct xe_device *xe, struct xe_exec_queue *q)
-{
- mutex_lock(&xe->persistent_engines.lock);
- list_add_tail(&q->persistent.link, &xe->persistent_engines.list);
- mutex_unlock(&xe->persistent_engines.lock);
-}
-
-void xe_device_remove_persistent_exec_queues(struct xe_device *xe,
- struct xe_exec_queue *q)
-{
- mutex_lock(&xe->persistent_engines.lock);
- if (!list_empty(&q->persistent.link))
- list_del(&q->persistent.link);
- mutex_unlock(&xe->persistent_engines.lock);
-}
-
-static void device_kill_persistent_exec_queues(struct xe_device *xe,
- struct xe_file *xef)
-{
- struct xe_exec_queue *q, *next;
-
- mutex_lock(&xe->persistent_engines.lock);
- list_for_each_entry_safe(q, next, &xe->persistent_engines.list,
- persistent.link)
- if (q->persistent.xef == xef) {
- xe_exec_queue_kill(q);
- list_del_init(&q->persistent.link);
- }
- mutex_unlock(&xe->persistent_engines.lock);
-}
-
void xe_device_wmb(struct xe_device *xe)
{
struct xe_gt *gt = xe_root_mmio_gt(xe);
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 3da83b2..08d8b72 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -42,10 +42,6 @@ int xe_device_probe(struct xe_device *xe);
void xe_device_remove(struct xe_device *xe);
void xe_device_shutdown(struct xe_device *xe);
-void xe_device_add_persistent_exec_queues(struct xe_device *xe, struct xe_exec_queue *q);
-void xe_device_remove_persistent_exec_queues(struct xe_device *xe,
- struct xe_exec_queue *q);
-
void xe_device_wmb(struct xe_device *xe);
static inline struct xe_file *to_xe_file(const struct drm_file *file)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 5dc9127..e849197 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -341,14 +341,6 @@ struct xe_device {
struct mutex lock;
} usm;
- /** @persistent_engines: engines that are closed but still running */
- struct {
- /** @lock: protects persistent engines */
- struct mutex lock;
- /** @list: list of persistent engines */
- struct list_head list;
- } persistent_engines;
-
/** @pinned: pinned BO state */
struct {
/** @lock: protected pinned BO list state */
diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
index 74391d9..e4db069 100644
--- a/drivers/gpu/drm/xe/xe_display.c
+++ b/drivers/gpu/drm/xe/xe_display.c
@@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
int xe_display_init_nommio(struct xe_device *xe)
{
- int err;
-
if (!xe->info.enable_display)
return 0;
@@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe)
/* This must be called before any calls to HAS_PCH_* */
intel_detect_pch(xe);
- err = intel_power_domains_init(xe);
- if (err)
- return err;
-
return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe);
}
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index 82d1305..6040e4d 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -131,14 +131,6 @@ static void bo_meminfo(struct xe_bo *bo,
static void show_meminfo(struct drm_printer *p, struct drm_file *file)
{
- static const char *const mem_type_to_name[TTM_NUM_MEM_TYPES] = {
- [XE_PL_SYSTEM] = "system",
- [XE_PL_TT] = "gtt",
- [XE_PL_VRAM0] = "vram0",
- [XE_PL_VRAM1] = "vram1",
- [4 ... 6] = NULL,
- [XE_PL_STOLEN] = "stolen"
- };
struct drm_memory_stats stats[TTM_NUM_MEM_TYPES] = {};
struct xe_file *xef = file->driver_priv;
struct ttm_device *bdev = &xef->xe->ttm;
@@ -171,7 +163,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
spin_unlock(&client->bos_lock);
for (mem_type = XE_PL_SYSTEM; mem_type < TTM_NUM_MEM_TYPES; ++mem_type) {
- if (!mem_type_to_name[mem_type])
+ if (!xe_mem_type_to_name[mem_type])
continue;
man = ttm_manager_type(bdev, mem_type);
@@ -182,7 +174,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
DRM_GEM_OBJECT_RESIDENT |
(mem_type != XE_PL_SYSTEM ? 0 :
DRM_GEM_OBJECT_PURGEABLE),
- mem_type_to_name[mem_type]);
+ xe_mem_type_to_name[mem_type]);
}
}
}
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index bcfc412..4922302 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -60,7 +60,6 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe,
q->fence_irq = >->fence_irq[hwe->class];
q->ring_ops = gt->ring_ops[hwe->class];
q->ops = gt->exec_queue_ops;
- INIT_LIST_HEAD(&q->persistent.link);
INIT_LIST_HEAD(&q->compute.link);
INIT_LIST_HEAD(&q->multi_gt_link);
@@ -310,102 +309,6 @@ static int exec_queue_set_timeslice(struct xe_device *xe, struct xe_exec_queue *
return q->ops->set_timeslice(q, value);
}
-static int exec_queue_set_preemption_timeout(struct xe_device *xe,
- struct xe_exec_queue *q, u64 value,
- bool create)
-{
- u32 min = 0, max = 0;
-
- xe_exec_queue_get_prop_minmax(q->hwe->eclass,
- XE_EXEC_QUEUE_PREEMPT_TIMEOUT, &min, &max);
-
- if (xe_exec_queue_enforce_schedule_limit() &&
- !xe_hw_engine_timeout_in_range(value, min, max))
- return -EINVAL;
-
- return q->ops->set_preempt_timeout(q, value);
-}
-
-static int exec_queue_set_persistence(struct xe_device *xe, struct xe_exec_queue *q,
- u64 value, bool create)
-{
- if (XE_IOCTL_DBG(xe, !create))
- return -EINVAL;
-
- if (XE_IOCTL_DBG(xe, xe_vm_in_preempt_fence_mode(q->vm)))
- return -EINVAL;
-
- if (value)
- q->flags |= EXEC_QUEUE_FLAG_PERSISTENT;
- else
- q->flags &= ~EXEC_QUEUE_FLAG_PERSISTENT;
-
- return 0;
-}
-
-static int exec_queue_set_job_timeout(struct xe_device *xe, struct xe_exec_queue *q,
- u64 value, bool create)
-{
- u32 min = 0, max = 0;
-
- if (XE_IOCTL_DBG(xe, !create))
- return -EINVAL;
-
- xe_exec_queue_get_prop_minmax(q->hwe->eclass,
- XE_EXEC_QUEUE_JOB_TIMEOUT, &min, &max);
-
- if (xe_exec_queue_enforce_schedule_limit() &&
- !xe_hw_engine_timeout_in_range(value, min, max))
- return -EINVAL;
-
- return q->ops->set_job_timeout(q, value);
-}
-
-static int exec_queue_set_acc_trigger(struct xe_device *xe, struct xe_exec_queue *q,
- u64 value, bool create)
-{
- if (XE_IOCTL_DBG(xe, !create))
- return -EINVAL;
-
- if (XE_IOCTL_DBG(xe, !xe->info.has_usm))
- return -EINVAL;
-
- q->usm.acc_trigger = value;
-
- return 0;
-}
-
-static int exec_queue_set_acc_notify(struct xe_device *xe, struct xe_exec_queue *q,
- u64 value, bool create)
-{
- if (XE_IOCTL_DBG(xe, !create))
- return -EINVAL;
-
- if (XE_IOCTL_DBG(xe, !xe->info.has_usm))
- return -EINVAL;
-
- q->usm.acc_notify = value;
-
- return 0;
-}
-
-static int exec_queue_set_acc_granularity(struct xe_device *xe, struct xe_exec_queue *q,
- u64 value, bool create)
-{
- if (XE_IOCTL_DBG(xe, !create))
- return -EINVAL;
-
- if (XE_IOCTL_DBG(xe, !xe->info.has_usm))
- return -EINVAL;
-
- if (value > DRM_XE_ACC_GRANULARITY_64M)
- return -EINVAL;
-
- q->usm.acc_granularity = value;
-
- return 0;
-}
-
typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
struct xe_exec_queue *q,
u64 value, bool create);
@@ -413,12 +316,6 @@ typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
[DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
[DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
- [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PREEMPTION_TIMEOUT] = exec_queue_set_preemption_timeout,
- [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PERSISTENCE] = exec_queue_set_persistence,
- [DRM_XE_EXEC_QUEUE_SET_PROPERTY_JOB_TIMEOUT] = exec_queue_set_job_timeout,
- [DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_TRIGGER] = exec_queue_set_acc_trigger,
- [DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_NOTIFY] = exec_queue_set_acc_notify,
- [DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_GRANULARITY] = exec_queue_set_acc_granularity,
};
static int exec_queue_user_ext_set_property(struct xe_device *xe,
@@ -437,10 +334,15 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
if (XE_IOCTL_DBG(xe, ext.property >=
ARRAY_SIZE(exec_queue_set_property_funcs)) ||
- XE_IOCTL_DBG(xe, ext.pad))
+ XE_IOCTL_DBG(xe, ext.pad) ||
+ XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
+ ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE))
return -EINVAL;
idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
+ if (!exec_queue_set_property_funcs[idx])
+ return -EINVAL;
+
return exec_queue_set_property_funcs[idx](xe, q, ext.value, create);
}
@@ -704,9 +606,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
}
q = xe_exec_queue_create(xe, vm, logical_mask,
- args->width, hwe,
- xe_vm_in_lr_mode(vm) ? 0 :
- EXEC_QUEUE_FLAG_PERSISTENT);
+ args->width, hwe, 0);
up_read(&vm->lock);
xe_vm_put(vm);
if (IS_ERR(q))
@@ -728,8 +628,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
goto kill_exec_queue;
}
- q->persistent.xef = xef;
-
mutex_lock(&xef->exec_queue.lock);
err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b, GFP_KERNEL);
mutex_unlock(&xef->exec_queue.lock);
@@ -872,10 +770,7 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
if (XE_IOCTL_DBG(xe, !q))
return -ENOENT;
- if (!(q->flags & EXEC_QUEUE_FLAG_PERSISTENT))
- xe_exec_queue_kill(q);
- else
- xe_device_add_persistent_exec_queues(xe, q);
+ xe_exec_queue_kill(q);
trace_xe_exec_queue_close(q);
xe_exec_queue_put(q);
@@ -926,20 +821,24 @@ void xe_exec_queue_last_fence_put_unlocked(struct xe_exec_queue *q)
* @q: The exec queue
* @vm: The VM the engine does a bind or exec for
*
- * Get last fence, does not take a ref
+ * Get last fence, takes a ref
*
* Returns: last fence if not signaled, dma fence stub if signaled
*/
struct dma_fence *xe_exec_queue_last_fence_get(struct xe_exec_queue *q,
struct xe_vm *vm)
{
+ struct dma_fence *fence;
+
xe_exec_queue_last_fence_lockdep_assert(q, vm);
if (q->last_fence &&
test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &q->last_fence->flags))
xe_exec_queue_last_fence_put(q, vm);
- return q->last_fence ? q->last_fence : dma_fence_get_stub();
+ fence = q->last_fence ? q->last_fence : dma_fence_get_stub();
+ dma_fence_get(fence);
+ return fence;
}
/**
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 8d4b7fe..36f4901 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -105,16 +105,6 @@ struct xe_exec_queue {
struct xe_guc_exec_queue *guc;
};
- /**
- * @persistent: persistent exec queue state
- */
- struct {
- /** @xef: file which this exec queue belongs to */
- struct xe_file *xef;
- /** @link: link in list of persistent exec queues */
- struct list_head link;
- } persistent;
-
union {
/**
* @parallel: parallel submission state
@@ -160,16 +150,6 @@ struct xe_exec_queue {
spinlock_t lock;
} compute;
- /** @usm: unified shared memory state */
- struct {
- /** @acc_trigger: access counter trigger */
- u32 acc_trigger;
- /** @acc_notify: access counter notify */
- u32 acc_notify;
- /** @acc_granularity: access counter granularity */
- u32 acc_granularity;
- } usm;
-
/** @ops: submission backend exec queue operations */
const struct xe_exec_queue_ops *ops;
diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
index 96b5224..acb4d9f 100644
--- a/drivers/gpu/drm/xe/xe_execlist.c
+++ b/drivers/gpu/drm/xe/xe_execlist.c
@@ -212,7 +212,7 @@ static void xe_execlist_port_wake_locked(struct xe_execlist_port *port,
static void xe_execlist_make_active(struct xe_execlist_exec_queue *exl)
{
struct xe_execlist_port *port = exl->port;
- enum xe_exec_queue_priority priority = exl->active_priority;
+ enum xe_exec_queue_priority priority = exl->q->sched_props.priority;
XE_WARN_ON(priority == XE_EXEC_QUEUE_PRIORITY_UNSET);
XE_WARN_ON(priority < 0);
@@ -378,8 +378,6 @@ static void execlist_exec_queue_fini_async(struct work_struct *w)
list_del(&exl->active_link);
spin_unlock_irqrestore(&exl->port->lock, flags);
- if (q->flags & EXEC_QUEUE_FLAG_PERSISTENT)
- xe_device_remove_persistent_exec_queues(xe, q);
drm_sched_entity_fini(&exl->entity);
drm_sched_fini(&exl->sched);
kfree(exl);
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 3af2ade..35474dd 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -437,7 +437,10 @@ static int all_fw_domain_init(struct xe_gt *gt)
* USM has its only SA pool to non-block behind user operations
*/
if (gt_to_xe(gt)->info.has_usm) {
- gt->usm.bb_pool = xe_sa_bo_manager_init(gt_to_tile(gt), SZ_1M, 16);
+ struct xe_device *xe = gt_to_xe(gt);
+
+ gt->usm.bb_pool = xe_sa_bo_manager_init(gt_to_tile(gt),
+ IS_DGFX(xe) ? SZ_1M : SZ_512K, 16);
if (IS_ERR(gt->usm.bb_pool)) {
err = PTR_ERR(gt->usm.bb_pool);
goto err_force_wake;
diff --git a/drivers/gpu/drm/xe/xe_gt_idle.c b/drivers/gpu/drm/xe/xe_gt_idle.c
index 9358f73..9fcae65 100644
--- a/drivers/gpu/drm/xe/xe_gt_idle.c
+++ b/drivers/gpu/drm/xe/xe_gt_idle.c
@@ -145,10 +145,10 @@ void xe_gt_idle_sysfs_init(struct xe_gt_idle *gtidle)
}
if (xe_gt_is_media_type(gt)) {
- sprintf(gtidle->name, "gt%d-mc\n", gt->info.id);
+ sprintf(gtidle->name, "gt%d-mc", gt->info.id);
gtidle->idle_residency = xe_guc_pc_mc6_residency;
} else {
- sprintf(gtidle->name, "gt%d-rc\n", gt->info.id);
+ sprintf(gtidle->name, "gt%d-rc", gt->info.id);
gtidle->idle_residency = xe_guc_pc_rc6_residency;
}
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 9c2fe16..73f08f1 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -335,7 +335,7 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len)
return -EPROTO;
asid = FIELD_GET(PFD_ASID, msg[1]);
- pf_queue = >->usm.pf_queue[asid % NUM_PF_QUEUE];
+ pf_queue = gt->usm.pf_queue + (asid % NUM_PF_QUEUE);
spin_lock_irqsave(&pf_queue->lock, flags);
full = pf_queue_full(pf_queue);
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 7eef23a..f4c4852 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -247,6 +247,14 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
xe_gt_assert(gt, vma);
+ /* Execlists not supported */
+ if (gt_to_xe(gt)->info.force_execlist) {
+ if (fence)
+ __invalidation_fence_signal(fence);
+
+ return 0;
+ }
+
action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
action[len++] = 0; /* seqno, replaced in send_tlb_invalidation */
if (!xe->info.has_range_tlb_invalidation) {
@@ -317,6 +325,10 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
struct drm_printer p = drm_err_printer(__func__);
int ret;
+ /* Execlists not supported */
+ if (gt_to_xe(gt)->info.force_execlist)
+ return 0;
+
/*
* XXX: See above, this algorithm only works if seqno are always in
* order
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 54ffcfc..f22ae71 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1028,8 +1028,6 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
if (xe_exec_queue_is_lr(q))
cancel_work_sync(&ge->lr_tdr);
- if (q->flags & EXEC_QUEUE_FLAG_PERSISTENT)
- xe_device_remove_persistent_exec_queues(gt_to_xe(q->gt), q);
release_guc_id(guc, q);
xe_sched_entity_fini(&ge->entity);
xe_sched_fini(&ge->sched);
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 0ec5ad2..b38319d 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -682,8 +682,6 @@ static void xe_lrc_set_ppgtt(struct xe_lrc *lrc, struct xe_vm *vm)
#define PVC_CTX_ASID (0x2e + 1)
#define PVC_CTX_ACC_CTR_THOLD (0x2a + 1)
-#define ACC_GRANULARITY_S 20
-#define ACC_NOTIFY_S 16
int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
struct xe_exec_queue *q, struct xe_vm *vm, u32 ring_size)
@@ -754,13 +752,7 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
xe_lrc_write_ctx_reg(lrc, CTX_RING_CTL,
RING_CTL_SIZE(lrc->ring.size) | RING_VALID);
if (xe->info.has_asid && vm)
- xe_lrc_write_ctx_reg(lrc, PVC_CTX_ASID,
- (q->usm.acc_granularity <<
- ACC_GRANULARITY_S) | vm->usm.asid);
- if (xe->info.has_usm && vm)
- xe_lrc_write_ctx_reg(lrc, PVC_CTX_ACC_CTR_THOLD,
- (q->usm.acc_notify << ACC_NOTIFY_S) |
- q->usm.acc_trigger);
+ xe_lrc_write_ctx_reg(lrc, PVC_CTX_ASID, vm->usm.asid);
lrc->desc = LRC_VALID;
lrc->desc |= LRC_LEGACY_64B_CONTEXT << LRC_ADDRESSING_MODE_SHIFT;
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 5c6c546..70480c3 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -170,11 +170,6 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
if (!IS_DGFX(xe)) {
/* Write out batch too */
m->batch_base_ofs = NUM_PT_SLOTS * XE_PAGE_SIZE;
- if (xe->info.has_usm) {
- batch = tile->primary_gt->usm.bb_pool->bo;
- m->usm_batch_base_ofs = m->batch_base_ofs;
- }
-
for (i = 0; i < batch->size;
i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
XE_PAGE_SIZE) {
@@ -185,6 +180,24 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
entry);
level++;
}
+ if (xe->info.has_usm) {
+ xe_tile_assert(tile, batch->size == SZ_1M);
+
+ batch = tile->primary_gt->usm.bb_pool->bo;
+ m->usm_batch_base_ofs = m->batch_base_ofs + SZ_1M;
+ xe_tile_assert(tile, batch->size == SZ_512K);
+
+ for (i = 0; i < batch->size;
+ i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
+ XE_PAGE_SIZE) {
+ entry = vm->pt_ops->pte_encode_bo(batch, i,
+ pat_index, 0);
+
+ xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
+ entry);
+ level++;
+ }
+ }
} else {
u64 batch_addr = xe_bo_addr(batch, 0, XE_PAGE_SIZE);
@@ -1204,8 +1217,11 @@ static bool no_in_syncs(struct xe_vm *vm, struct xe_exec_queue *q,
}
if (q) {
fence = xe_exec_queue_last_fence_get(q, vm);
- if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+ if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
+ dma_fence_put(fence);
return false;
+ }
+ dma_fence_put(fence);
}
return true;
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index 5f6b53e..02f7808 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -105,7 +105,7 @@ static void xe_resize_vram_bar(struct xe_device *xe)
pci_bus_for_each_resource(root, root_res, i) {
if (root_res && root_res->flags & (IORESOURCE_MEM | IORESOURCE_MEM_64) &&
- root_res->start > 0x100000000ull)
+ (u64)root_res->start > 0x100000000ul)
break;
}
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index e45b37c..6653c04 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -20,8 +20,8 @@
struct xe_pt_dir {
struct xe_pt pt;
- /** @dir: Directory structure for the xe_pt_walk functionality */
- struct xe_ptw_dir dir;
+ /** @children: Array of page-table child nodes */
+ struct xe_ptw *children[XE_PDES];
};
#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
@@ -44,7 +44,7 @@ static struct xe_pt_dir *as_xe_pt_dir(struct xe_pt *pt)
static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index)
{
- return container_of(pt_dir->dir.entries[index], struct xe_pt, base);
+ return container_of(pt_dir->children[index], struct xe_pt, base);
}
static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
@@ -65,6 +65,14 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
XE_PTE_NULL;
}
+static void xe_pt_free(struct xe_pt *pt)
+{
+ if (pt->level)
+ kfree(as_xe_pt_dir(pt));
+ else
+ kfree(pt);
+}
+
/**
* xe_pt_create() - Create a page-table.
* @vm: The vm to create for.
@@ -85,15 +93,19 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
{
struct xe_pt *pt;
struct xe_bo *bo;
- size_t size;
int err;
- size = !level ? sizeof(struct xe_pt) : sizeof(struct xe_pt_dir) +
- XE_PDES * sizeof(struct xe_ptw *);
- pt = kzalloc(size, GFP_KERNEL);
+ if (level) {
+ struct xe_pt_dir *dir = kzalloc(sizeof(*dir), GFP_KERNEL);
+
+ pt = (dir) ? &dir->pt : NULL;
+ } else {
+ pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+ }
if (!pt)
return ERR_PTR(-ENOMEM);
+ pt->level = level;
bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K,
ttm_bo_type_kernel,
XE_BO_CREATE_VRAM_IF_DGFX(tile) |
@@ -106,8 +118,7 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
goto err_kfree;
}
pt->bo = bo;
- pt->level = level;
- pt->base.dir = level ? &as_xe_pt_dir(pt)->dir : NULL;
+ pt->base.children = level ? as_xe_pt_dir(pt)->children : NULL;
if (vm->xef)
xe_drm_client_add_bo(vm->xef->client, pt->bo);
@@ -116,7 +127,7 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
return pt;
err_kfree:
- kfree(pt);
+ xe_pt_free(pt);
return ERR_PTR(err);
}
@@ -193,7 +204,7 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred)
deferred);
}
}
- kfree(pt);
+ xe_pt_free(pt);
}
/**
@@ -358,7 +369,7 @@ xe_pt_insert_entry(struct xe_pt_stage_bind_walk *xe_walk, struct xe_pt *parent,
struct iosys_map *map = &parent->bo->vmap;
if (unlikely(xe_child))
- parent->base.dir->entries[offset] = &xe_child->base;
+ parent->base.children[offset] = &xe_child->base;
xe_pt_write(xe_walk->vm->xe, map, offset, pte);
parent->num_live++;
@@ -488,10 +499,12 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
* this device *requires* 64K PTE size for VRAM, fail.
*/
if (level == 0 && !xe_parent->is_compact) {
- if (xe_pt_is_pte_ps64K(addr, next, xe_walk))
+ if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) {
+ xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K;
pte |= XE_PTE_PS64;
- else if (XE_WARN_ON(xe_walk->needs_64K))
+ } else if (XE_WARN_ON(xe_walk->needs_64K)) {
return -EINVAL;
+ }
}
ret = xe_pt_insert_entry(xe_walk, xe_parent, offset, NULL, pte);
@@ -534,13 +547,16 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
*child = &xe_child->base;
/*
- * Prefer the compact pagetable layout for L0 if possible.
+ * Prefer the compact pagetable layout for L0 if possible. Only
+ * possible if VMA covers entire 2MB region as compact 64k and
+ * 4k pages cannot be mixed within a 2MB region.
* TODO: Suballocate the pt bo to avoid wasting a lot of
* memory.
*/
if (GRAPHICS_VERx100(tile_to_xe(xe_walk->tile)) >= 1250 && level == 1 &&
covers && xe_pt_scan_64K(addr, next, xe_walk)) {
walk->shifts = xe_compact_pt_shifts;
+ xe_walk->vma->gpuva.flags |= XE_VMA_PTE_COMPACT;
flags |= XE_PDE_64K;
xe_child->is_compact = true;
}
@@ -853,7 +869,7 @@ static void xe_pt_commit_bind(struct xe_vma *vma,
xe_pt_destroy(xe_pt_entry(pt_dir, j_),
xe_vma_vm(vma)->flags, deferred);
- pt_dir->dir.entries[j_] = &newpte->base;
+ pt_dir->children[j_] = &newpte->base;
}
kfree(entries[i].pt_entries);
}
@@ -1507,7 +1523,7 @@ xe_pt_commit_unbind(struct xe_vma *vma,
xe_pt_destroy(xe_pt_entry(pt_dir, i),
xe_vma_vm(vma)->flags, deferred);
- pt_dir->dir.entries[i] = NULL;
+ pt_dir->children[i] = NULL;
}
}
}
diff --git a/drivers/gpu/drm/xe/xe_pt_walk.c b/drivers/gpu/drm/xe/xe_pt_walk.c
index 8f6c8d0..b8b3d2a 100644
--- a/drivers/gpu/drm/xe/xe_pt_walk.c
+++ b/drivers/gpu/drm/xe/xe_pt_walk.c
@@ -74,7 +74,7 @@ int xe_pt_walk_range(struct xe_ptw *parent, unsigned int level,
u64 addr, u64 end, struct xe_pt_walk *walk)
{
pgoff_t offset = xe_pt_offset(addr, level, walk);
- struct xe_ptw **entries = parent->dir ? parent->dir->entries : NULL;
+ struct xe_ptw **entries = parent->children ? parent->children : NULL;
const struct xe_pt_walk_ops *ops = walk->ops;
enum page_walk_action action;
struct xe_ptw *child;
diff --git a/drivers/gpu/drm/xe/xe_pt_walk.h b/drivers/gpu/drm/xe/xe_pt_walk.h
index ec3d1e9..5ecc4d2 100644
--- a/drivers/gpu/drm/xe/xe_pt_walk.h
+++ b/drivers/gpu/drm/xe/xe_pt_walk.h
@@ -8,28 +8,15 @@
#include <linux/pagewalk.h>
#include <linux/types.h>
-struct xe_ptw_dir;
-
/**
* struct xe_ptw - base class for driver pagetable subclassing.
- * @dir: Pointer to an array of children if any.
+ * @children: Pointer to an array of children if any.
*
* Drivers could subclass this, and if it's a page-directory, typically
- * embed the xe_ptw_dir::entries array in the same allocation.
+ * embed an array of xe_ptw pointers.
*/
struct xe_ptw {
- struct xe_ptw_dir *dir;
-};
-
-/**
- * struct xe_ptw_dir - page directory structure
- * @entries: Array holding page directory children.
- *
- * It is the responsibility of the user to ensure @entries is
- * correctly sized.
- */
-struct xe_ptw_dir {
- struct xe_ptw *entries[0];
+ struct xe_ptw **children;
};
/**
diff --git a/drivers/gpu/drm/xe/xe_range_fence.c b/drivers/gpu/drm/xe/xe_range_fence.c
index d35d9ec..372378e 100644
--- a/drivers/gpu/drm/xe/xe_range_fence.c
+++ b/drivers/gpu/drm/xe/xe_range_fence.c
@@ -151,6 +151,11 @@ xe_range_fence_tree_next(struct xe_range_fence *rfence, u64 start, u64 last)
return xe_range_fence_tree_iter_next(rfence, start, last);
}
+static void xe_range_fence_free(struct xe_range_fence *rfence)
+{
+ kfree(rfence);
+}
+
const struct xe_range_fence_ops xe_range_fence_kfree_ops = {
- .free = (void (*)(struct xe_range_fence *rfence)) kfree,
+ .free = xe_range_fence_free,
};
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c
index 01106a1..4e2ccad 100644
--- a/drivers/gpu/drm/xe/xe_sched_job.c
+++ b/drivers/gpu/drm/xe/xe_sched_job.c
@@ -274,7 +274,6 @@ int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
struct dma_fence *fence;
fence = xe_exec_queue_last_fence_get(job->q, vm);
- dma_fence_get(fence);
return drm_sched_job_add_dependency(&job->drm, fence);
}
diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
index e4c220c..02c9577 100644
--- a/drivers/gpu/drm/xe/xe_sync.c
+++ b/drivers/gpu/drm/xe/xe_sync.c
@@ -19,7 +19,7 @@
#include "xe_macros.h"
#include "xe_sched_job_types.h"
-struct user_fence {
+struct xe_user_fence {
struct xe_device *xe;
struct kref refcount;
struct dma_fence_cb cb;
@@ -27,31 +27,32 @@ struct user_fence {
struct mm_struct *mm;
u64 __user *addr;
u64 value;
+ int signalled;
};
static void user_fence_destroy(struct kref *kref)
{
- struct user_fence *ufence = container_of(kref, struct user_fence,
+ struct xe_user_fence *ufence = container_of(kref, struct xe_user_fence,
refcount);
mmdrop(ufence->mm);
kfree(ufence);
}
-static void user_fence_get(struct user_fence *ufence)
+static void user_fence_get(struct xe_user_fence *ufence)
{
kref_get(&ufence->refcount);
}
-static void user_fence_put(struct user_fence *ufence)
+static void user_fence_put(struct xe_user_fence *ufence)
{
kref_put(&ufence->refcount, user_fence_destroy);
}
-static struct user_fence *user_fence_create(struct xe_device *xe, u64 addr,
- u64 value)
+static struct xe_user_fence *user_fence_create(struct xe_device *xe, u64 addr,
+ u64 value)
{
- struct user_fence *ufence;
+ struct xe_user_fence *ufence;
ufence = kmalloc(sizeof(*ufence), GFP_KERNEL);
if (!ufence)
@@ -69,7 +70,7 @@ static struct user_fence *user_fence_create(struct xe_device *xe, u64 addr,
static void user_fence_worker(struct work_struct *w)
{
- struct user_fence *ufence = container_of(w, struct user_fence, worker);
+ struct xe_user_fence *ufence = container_of(w, struct xe_user_fence, worker);
if (mmget_not_zero(ufence->mm)) {
kthread_use_mm(ufence->mm);
@@ -80,10 +81,11 @@ static void user_fence_worker(struct work_struct *w)
}
wake_up_all(&ufence->xe->ufence_wq);
+ WRITE_ONCE(ufence->signalled, 1);
user_fence_put(ufence);
}
-static void kick_ufence(struct user_fence *ufence, struct dma_fence *fence)
+static void kick_ufence(struct xe_user_fence *ufence, struct dma_fence *fence)
{
INIT_WORK(&ufence->worker, user_fence_worker);
queue_work(ufence->xe->ordered_wq, &ufence->worker);
@@ -92,7 +94,7 @@ static void kick_ufence(struct user_fence *ufence, struct dma_fence *fence)
static void user_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
{
- struct user_fence *ufence = container_of(cb, struct user_fence, cb);
+ struct xe_user_fence *ufence = container_of(cb, struct xe_user_fence, cb);
kick_ufence(ufence, fence);
}
@@ -307,7 +309,6 @@ xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
/* Easy case... */
if (!num_in_fence) {
fence = xe_exec_queue_last_fence_get(q, vm);
- dma_fence_get(fence);
return fence;
}
@@ -322,7 +323,6 @@ xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
}
}
fences[current_fence++] = xe_exec_queue_last_fence_get(q, vm);
- dma_fence_get(fences[current_fence - 1]);
cf = dma_fence_array_create(num_in_fence, fences,
vm->composite_fence_ctx,
vm->composite_fence_seqno++,
@@ -342,3 +342,39 @@ xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
return ERR_PTR(-ENOMEM);
}
+
+/**
+ * xe_sync_ufence_get() - Get user fence from sync
+ * @sync: input sync
+ *
+ * Get a user fence reference from sync.
+ *
+ * Return: xe_user_fence pointer with reference
+ */
+struct xe_user_fence *xe_sync_ufence_get(struct xe_sync_entry *sync)
+{
+ user_fence_get(sync->ufence);
+
+ return sync->ufence;
+}
+
+/**
+ * xe_sync_ufence_put() - Put user fence reference
+ * @ufence: user fence reference
+ *
+ */
+void xe_sync_ufence_put(struct xe_user_fence *ufence)
+{
+ user_fence_put(ufence);
+}
+
+/**
+ * xe_sync_ufence_get_status() - Get user fence status
+ * @ufence: user fence
+ *
+ * Return: 1 if signalled, 0 not signalled, <0 on error
+ */
+int xe_sync_ufence_get_status(struct xe_user_fence *ufence)
+{
+ return READ_ONCE(ufence->signalled);
+}
diff --git a/drivers/gpu/drm/xe/xe_sync.h b/drivers/gpu/drm/xe/xe_sync.h
index f43cdca..0fd0d51 100644
--- a/drivers/gpu/drm/xe/xe_sync.h
+++ b/drivers/gpu/drm/xe/xe_sync.h
@@ -38,4 +38,8 @@ static inline bool xe_sync_is_ufence(struct xe_sync_entry *sync)
return !!sync->ufence;
}
+struct xe_user_fence *xe_sync_ufence_get(struct xe_sync_entry *sync);
+void xe_sync_ufence_put(struct xe_user_fence *ufence);
+int xe_sync_ufence_get_status(struct xe_user_fence *ufence);
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_sync_types.h b/drivers/gpu/drm/xe/xe_sync_types.h
index 852db5e..30ac3f5 100644
--- a/drivers/gpu/drm/xe/xe_sync_types.h
+++ b/drivers/gpu/drm/xe/xe_sync_types.h
@@ -18,7 +18,7 @@ struct xe_sync_entry {
struct drm_syncobj *syncobj;
struct dma_fence *fence;
struct dma_fence_chain *chain_fence;
- struct user_fence *ufence;
+ struct xe_user_fence *ufence;
u64 addr;
u64 timeline_value;
u32 type;
diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
index 044c208..0650b2f 100644
--- a/drivers/gpu/drm/xe/xe_tile.c
+++ b/drivers/gpu/drm/xe/xe_tile.c
@@ -167,9 +167,10 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
goto err_mem_access;
tile->mem.kernel_bb_pool = xe_sa_bo_manager_init(tile, SZ_1M, 16);
- if (IS_ERR(tile->mem.kernel_bb_pool))
+ if (IS_ERR(tile->mem.kernel_bb_pool)) {
err = PTR_ERR(tile->mem.kernel_bb_pool);
-
+ goto err_mem_access;
+ }
xe_wa_apply_tile_workarounds(tile);
xe_tile_sysfs_init(tile);
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 95163c3..4ddc555 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -12,6 +12,7 @@
#include <linux/tracepoint.h>
#include <linux/types.h>
+#include "xe_bo.h"
#include "xe_bo_types.h"
#include "xe_exec_queue_types.h"
#include "xe_gpu_scheduler_types.h"
@@ -26,16 +27,16 @@ DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence,
TP_ARGS(fence),
TP_STRUCT__entry(
- __field(u64, fence)
+ __field(struct xe_gt_tlb_invalidation_fence *, fence)
__field(int, seqno)
),
TP_fast_assign(
- __entry->fence = (u64)fence;
+ __entry->fence = fence;
__entry->seqno = fence->seqno;
),
- TP_printk("fence=0x%016llx, seqno=%d",
+ TP_printk("fence=%p, seqno=%d",
__entry->fence, __entry->seqno)
);
@@ -82,16 +83,16 @@ DECLARE_EVENT_CLASS(xe_bo,
TP_STRUCT__entry(
__field(size_t, size)
__field(u32, flags)
- __field(u64, vm)
+ __field(struct xe_vm *, vm)
),
TP_fast_assign(
__entry->size = bo->size;
__entry->flags = bo->flags;
- __entry->vm = (unsigned long)bo->vm;
+ __entry->vm = bo->vm;
),
- TP_printk("size=%zu, flags=0x%02x, vm=0x%016llx",
+ TP_printk("size=%zu, flags=0x%02x, vm=%p",
__entry->size, __entry->flags, __entry->vm)
);
@@ -100,9 +101,31 @@ DEFINE_EVENT(xe_bo, xe_bo_cpu_fault,
TP_ARGS(bo)
);
-DEFINE_EVENT(xe_bo, xe_bo_move,
- TP_PROTO(struct xe_bo *bo),
- TP_ARGS(bo)
+TRACE_EVENT(xe_bo_move,
+ TP_PROTO(struct xe_bo *bo, uint32_t new_placement, uint32_t old_placement,
+ bool move_lacks_source),
+ TP_ARGS(bo, new_placement, old_placement, move_lacks_source),
+ TP_STRUCT__entry(
+ __field(struct xe_bo *, bo)
+ __field(size_t, size)
+ __field(u32, new_placement)
+ __field(u32, old_placement)
+ __array(char, device_id, 12)
+ __field(bool, move_lacks_source)
+ ),
+
+ TP_fast_assign(
+ __entry->bo = bo;
+ __entry->size = bo->size;
+ __entry->new_placement = new_placement;
+ __entry->old_placement = old_placement;
+ strscpy(__entry->device_id, dev_name(xe_bo_device(__entry->bo)->drm.dev), 12);
+ __entry->move_lacks_source = move_lacks_source;
+ ),
+ TP_printk("move_lacks_source:%s, migrate object %p [size %zu] from %s to %s device_id:%s",
+ __entry->move_lacks_source ? "yes" : "no", __entry->bo, __entry->size,
+ xe_mem_type_to_name[__entry->old_placement],
+ xe_mem_type_to_name[__entry->new_placement], __entry->device_id)
);
DECLARE_EVENT_CLASS(xe_exec_queue,
@@ -327,16 +350,16 @@ DECLARE_EVENT_CLASS(xe_hw_fence,
TP_STRUCT__entry(
__field(u64, ctx)
__field(u32, seqno)
- __field(u64, fence)
+ __field(struct xe_hw_fence *, fence)
),
TP_fast_assign(
__entry->ctx = fence->dma.context;
__entry->seqno = fence->dma.seqno;
- __entry->fence = (unsigned long)fence;
+ __entry->fence = fence;
),
- TP_printk("ctx=0x%016llx, fence=0x%016llx, seqno=%u",
+ TP_printk("ctx=0x%016llx, fence=%p, seqno=%u",
__entry->ctx, __entry->fence, __entry->seqno)
);
@@ -365,7 +388,7 @@ DECLARE_EVENT_CLASS(xe_vma,
TP_ARGS(vma),
TP_STRUCT__entry(
- __field(u64, vma)
+ __field(struct xe_vma *, vma)
__field(u32, asid)
__field(u64, start)
__field(u64, end)
@@ -373,14 +396,14 @@ DECLARE_EVENT_CLASS(xe_vma,
),
TP_fast_assign(
- __entry->vma = (unsigned long)vma;
+ __entry->vma = vma;
__entry->asid = xe_vma_vm(vma)->usm.asid;
__entry->start = xe_vma_start(vma);
__entry->end = xe_vma_end(vma) - 1;
__entry->ptr = xe_vma_userptr(vma);
),
- TP_printk("vma=0x%016llx, asid=0x%05x, start=0x%012llx, end=0x%012llx, ptr=0x%012llx,",
+ TP_printk("vma=%p, asid=0x%05x, start=0x%012llx, end=0x%012llx, userptr=0x%012llx,",
__entry->vma, __entry->asid, __entry->start,
__entry->end, __entry->ptr)
)
@@ -465,16 +488,16 @@ DECLARE_EVENT_CLASS(xe_vm,
TP_ARGS(vm),
TP_STRUCT__entry(
- __field(u64, vm)
+ __field(struct xe_vm *, vm)
__field(u32, asid)
),
TP_fast_assign(
- __entry->vm = (unsigned long)vm;
+ __entry->vm = vm;
__entry->asid = vm->usm.asid;
),
- TP_printk("vm=0x%016llx, asid=0x%05x", __entry->vm,
+ TP_printk("vm=%p, asid=0x%05x", __entry->vm,
__entry->asid)
);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 30db264..3b21afe 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -37,8 +37,6 @@
#include "generated/xe_wa_oob.h"
#include "xe_wa.h"
-#define TEST_VM_ASYNC_OPS_ERROR
-
static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
{
return vm->gpuvm.r_obj;
@@ -114,11 +112,8 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma)
num_pages - pinned,
read_only ? 0 : FOLL_WRITE,
&pages[pinned]);
- if (ret < 0) {
- if (in_kthread)
- ret = 0;
+ if (ret < 0)
break;
- }
pinned += ret;
ret = 0;
@@ -902,6 +897,11 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
struct xe_device *xe = vm->xe;
bool read_only = xe_vma_read_only(vma);
+ if (vma->ufence) {
+ xe_sync_ufence_put(vma->ufence);
+ vma->ufence = NULL;
+ }
+
if (xe_vma_is_userptr(vma)) {
struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
@@ -1000,9 +1000,16 @@ int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
int err;
XE_WARN_ON(!vm);
- err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared);
- if (!err && bo && !bo->vm)
- err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared);
+ if (num_shared)
+ err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared);
+ else
+ err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
+ if (!err && bo && !bo->vm) {
+ if (num_shared)
+ err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared);
+ else
+ err = drm_exec_lock_obj(exec, &bo->ttm.base);
+ }
return err;
}
@@ -1606,6 +1613,16 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
trace_xe_vma_unbind(vma);
+ if (vma->ufence) {
+ struct xe_user_fence * const f = vma->ufence;
+
+ if (!xe_sync_ufence_get_status(f))
+ return ERR_PTR(-EBUSY);
+
+ vma->ufence = NULL;
+ xe_sync_ufence_put(f);
+ }
+
if (number_tiles > 1) {
fences = kmalloc_array(number_tiles, sizeof(*fences),
GFP_KERNEL);
@@ -1739,6 +1756,21 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
return ERR_PTR(err);
}
+static struct xe_user_fence *
+find_ufence_get(struct xe_sync_entry *syncs, u32 num_syncs)
+{
+ unsigned int i;
+
+ for (i = 0; i < num_syncs; i++) {
+ struct xe_sync_entry *e = &syncs[i];
+
+ if (xe_sync_is_ufence(e))
+ return xe_sync_ufence_get(e);
+ }
+
+ return NULL;
+}
+
static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
struct xe_exec_queue *q, struct xe_sync_entry *syncs,
u32 num_syncs, bool immediate, bool first_op,
@@ -1746,9 +1778,16 @@ static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
{
struct dma_fence *fence;
struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
+ struct xe_user_fence *ufence;
xe_vm_assert_held(vm);
+ ufence = find_ufence_get(syncs, num_syncs);
+ if (vma->ufence && ufence)
+ xe_sync_ufence_put(vma->ufence);
+
+ vma->ufence = ufence ?: vma->ufence;
+
if (immediate) {
fence = xe_vm_bind_vma(vma, q, syncs, num_syncs, first_op,
last_op);
@@ -1984,6 +2023,7 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
xe_exec_queue_last_fence_get(wait_exec_queue, vm);
xe_sync_entry_signal(&syncs[i], NULL, fence);
+ dma_fence_put(fence);
}
}
@@ -2064,7 +2104,6 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
struct drm_gpuva_ops *ops;
struct drm_gpuva_op *__op;
- struct xe_vma_op *op;
struct drm_gpuvm_bo *vm_bo;
int err;
@@ -2111,23 +2150,10 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
if (IS_ERR(ops))
return ops;
-#ifdef TEST_VM_ASYNC_OPS_ERROR
- if (operation & FORCE_ASYNC_OP_ERROR) {
- op = list_first_entry_or_null(&ops->list, struct xe_vma_op,
- base.entry);
- if (op)
- op->inject_error = true;
- }
-#endif
-
drm_gpuva_for_each_op(__op, ops) {
struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
if (__op->op == DRM_GPUVA_OP_MAP) {
- op->map.immediate =
- flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE;
- op->map.read_only =
- flags & DRM_XE_VM_BIND_FLAG_READONLY;
op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
op->map.pat_index = pat_index;
} else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
@@ -2197,13 +2223,17 @@ static u64 xe_vma_max_pte_size(struct xe_vma *vma)
{
if (vma->gpuva.flags & XE_VMA_PTE_1G)
return SZ_1G;
- else if (vma->gpuva.flags & XE_VMA_PTE_2M)
+ else if (vma->gpuva.flags & (XE_VMA_PTE_2M | XE_VMA_PTE_COMPACT))
return SZ_2M;
+ else if (vma->gpuva.flags & XE_VMA_PTE_64K)
+ return SZ_64K;
+ else if (vma->gpuva.flags & XE_VMA_PTE_4K)
+ return SZ_4K;
- return SZ_4K;
+ return SZ_1G; /* Uninitialized, used max size */
}
-static u64 xe_vma_set_pte_size(struct xe_vma *vma, u64 size)
+static void xe_vma_set_pte_size(struct xe_vma *vma, u64 size)
{
switch (size) {
case SZ_1G:
@@ -2212,9 +2242,13 @@ static u64 xe_vma_set_pte_size(struct xe_vma *vma, u64 size)
case SZ_2M:
vma->gpuva.flags |= XE_VMA_PTE_2M;
break;
+ case SZ_64K:
+ vma->gpuva.flags |= XE_VMA_PTE_64K;
+ break;
+ case SZ_4K:
+ vma->gpuva.flags |= XE_VMA_PTE_4K;
+ break;
}
-
- return SZ_4K;
}
static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
@@ -2312,8 +2346,6 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
switch (op->base.op) {
case DRM_GPUVA_OP_MAP:
{
- flags |= op->map.read_only ?
- VMA_CREATE_FLAG_READ_ONLY : 0;
flags |= op->map.is_null ?
VMA_CREATE_FLAG_IS_NULL : 0;
@@ -2444,7 +2476,7 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
case DRM_GPUVA_OP_MAP:
err = xe_vm_bind(vm, vma, op->q, xe_vma_bo(vma),
op->syncs, op->num_syncs,
- op->map.immediate || !xe_vm_in_fault_mode(vm),
+ !xe_vm_in_fault_mode(vm),
op->flags & XE_VMA_OP_FIRST,
op->flags & XE_VMA_OP_LAST);
break;
@@ -2530,13 +2562,25 @@ static int __xe_vma_op_execute(struct xe_vm *vm, struct xe_vma *vma,
}
drm_exec_fini(&exec);
- if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
+ if (err == -EAGAIN) {
lockdep_assert_held_write(&vm->lock);
- err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
- if (!err)
- goto retry_userptr;
- trace_xe_vma_fail(vma);
+ if (op->base.op == DRM_GPUVA_OP_REMAP) {
+ if (!op->remap.unmap_done)
+ vma = gpuva_to_vma(op->base.remap.unmap->va);
+ else if (op->remap.prev)
+ vma = op->remap.prev;
+ else
+ vma = op->remap.next;
+ }
+
+ if (xe_vma_is_userptr(vma)) {
+ err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
+ if (!err)
+ goto retry_userptr;
+
+ trace_xe_vma_fail(vma);
+ }
}
return err;
@@ -2548,13 +2592,6 @@ static int xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
lockdep_assert_held_write(&vm->lock);
-#ifdef TEST_VM_ASYNC_OPS_ERROR
- if (op->inject_error) {
- op->inject_error = false;
- return -ENOMEM;
- }
-#endif
-
switch (op->base.op) {
case DRM_GPUVA_OP_MAP:
ret = __xe_vma_op_execute(vm, op->map.vma, op);
@@ -2669,7 +2706,7 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
{
int i;
- for (i = num_ops_list - 1; i; ++i) {
+ for (i = num_ops_list - 1; i >= 0; --i) {
struct drm_gpuva_ops *__ops = ops[i];
struct drm_gpuva_op *__op;
@@ -2714,21 +2751,11 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
return 0;
}
-#ifdef TEST_VM_ASYNC_OPS_ERROR
-#define SUPPORTED_FLAGS \
- (FORCE_ASYNC_OP_ERROR | DRM_XE_VM_BIND_FLAG_READONLY | \
- DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL | 0xffff)
-#else
-#define SUPPORTED_FLAGS \
- (DRM_XE_VM_BIND_FLAG_READONLY | \
- DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL | \
- 0xffff)
-#endif
+#define SUPPORTED_FLAGS (DRM_XE_VM_BIND_FLAG_NULL | \
+ DRM_XE_VM_BIND_FLAG_DUMPABLE)
#define XE_64K_PAGE_MASK 0xffffull
#define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
-#define MAX_BINDS 512 /* FIXME: Picking random upper limit */
-
static int vm_bind_ioctl_check_args(struct xe_device *xe,
struct drm_xe_vm_bind *args,
struct drm_xe_vm_bind_op **bind_ops)
@@ -2740,16 +2767,16 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
return -EINVAL;
- if (XE_IOCTL_DBG(xe, args->extensions) ||
- XE_IOCTL_DBG(xe, args->num_binds > MAX_BINDS))
+ if (XE_IOCTL_DBG(xe, args->extensions))
return -EINVAL;
if (args->num_binds > 1) {
u64 __user *bind_user =
u64_to_user_ptr(args->vector_of_binds);
- *bind_ops = kmalloc(sizeof(struct drm_xe_vm_bind_op) *
- args->num_binds, GFP_KERNEL);
+ *bind_ops = kvmalloc_array(args->num_binds,
+ sizeof(struct drm_xe_vm_bind_op),
+ GFP_KERNEL | __GFP_ACCOUNT);
if (!*bind_ops)
return -ENOMEM;
@@ -2839,7 +2866,7 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
free_bind_ops:
if (args->num_binds > 1)
- kfree(*bind_ops);
+ kvfree(*bind_ops);
return err;
}
@@ -2927,13 +2954,15 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
}
if (args->num_binds) {
- bos = kcalloc(args->num_binds, sizeof(*bos), GFP_KERNEL);
+ bos = kvcalloc(args->num_binds, sizeof(*bos),
+ GFP_KERNEL | __GFP_ACCOUNT);
if (!bos) {
err = -ENOMEM;
goto release_vm_lock;
}
- ops = kcalloc(args->num_binds, sizeof(*ops), GFP_KERNEL);
+ ops = kvcalloc(args->num_binds, sizeof(*ops),
+ GFP_KERNEL | __GFP_ACCOUNT);
if (!ops) {
err = -ENOMEM;
goto release_vm_lock;
@@ -3074,10 +3103,10 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
for (i = 0; bos && i < args->num_binds; ++i)
xe_bo_put(bos[i]);
- kfree(bos);
- kfree(ops);
+ kvfree(bos);
+ kvfree(ops);
if (args->num_binds > 1)
- kfree(bind_ops);
+ kvfree(bind_ops);
return err;
@@ -3101,10 +3130,10 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
if (q)
xe_exec_queue_put(q);
free_objs:
- kfree(bos);
- kfree(ops);
+ kvfree(bos);
+ kvfree(ops);
if (args->num_binds > 1)
- kfree(bind_ops);
+ kvfree(bind_ops);
return err;
}
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 1fec66a..7300eea5 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -19,11 +19,9 @@
struct xe_bo;
struct xe_sync_entry;
+struct xe_user_fence;
struct xe_vm;
-#define TEST_VM_ASYNC_OPS_ERROR
-#define FORCE_ASYNC_OP_ERROR BIT(31)
-
#define XE_VMA_READ_ONLY DRM_GPUVA_USERBITS
#define XE_VMA_DESTROYED (DRM_GPUVA_USERBITS << 1)
#define XE_VMA_ATOMIC_PTE_BIT (DRM_GPUVA_USERBITS << 2)
@@ -32,6 +30,8 @@ struct xe_vm;
#define XE_VMA_PTE_4K (DRM_GPUVA_USERBITS << 5)
#define XE_VMA_PTE_2M (DRM_GPUVA_USERBITS << 6)
#define XE_VMA_PTE_1G (DRM_GPUVA_USERBITS << 7)
+#define XE_VMA_PTE_64K (DRM_GPUVA_USERBITS << 8)
+#define XE_VMA_PTE_COMPACT (DRM_GPUVA_USERBITS << 9)
/** struct xe_userptr - User pointer */
struct xe_userptr {
@@ -105,6 +105,12 @@ struct xe_vma {
* @pat_index: The pat index to use when encoding the PTEs for this vma.
*/
u16 pat_index;
+
+ /**
+ * @ufence: The user fence that was provided with MAP.
+ * Needs to be signalled before UNMAP can be processed.
+ */
+ struct xe_user_fence *ufence;
};
/**
@@ -289,10 +295,6 @@ struct xe_vm {
struct xe_vma_op_map {
/** @vma: VMA to map */
struct xe_vma *vma;
- /** @immediate: Immediate bind */
- bool immediate;
- /** @read_only: Read only */
- bool read_only;
/** @is_null: is NULL binding */
bool is_null;
/** @pat_index: The pat index to use for this operation. */
@@ -360,11 +362,6 @@ struct xe_vma_op {
/** @flags: operation flags */
enum xe_vma_op_flags flags;
-#ifdef TEST_VM_ASYNC_OPS_ERROR
- /** @inject_error: inject error to test async op error handling */
- bool inject_error;
-#endif
-
union {
/** @map: VMA map operation specific data */
struct xe_vma_op_map map;
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 42fd504..89983d7 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -169,6 +169,7 @@ static const struct host1x_info host1x06_info = {
.num_sid_entries = ARRAY_SIZE(tegra186_sid_table),
.sid_table = tegra186_sid_table,
.reserve_vblank_syncpts = false,
+ .skip_reset_assert = true,
};
static const struct host1x_sid_entry tegra194_sid_table[] = {
@@ -680,13 +681,15 @@ static int __maybe_unused host1x_runtime_suspend(struct device *dev)
host1x_intr_stop(host);
host1x_syncpt_save(host);
- err = reset_control_bulk_assert(host->nresets, host->resets);
- if (err) {
- dev_err(dev, "failed to assert reset: %d\n", err);
- goto resume_host1x;
- }
+ if (!host->info->skip_reset_assert) {
+ err = reset_control_bulk_assert(host->nresets, host->resets);
+ if (err) {
+ dev_err(dev, "failed to assert reset: %d\n", err);
+ goto resume_host1x;
+ }
- usleep_range(1000, 2000);
+ usleep_range(1000, 2000);
+ }
clk_disable_unprepare(host->clk);
reset_control_bulk_release(host->nresets, host->resets);
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index c8e302d..925a118 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -116,6 +116,12 @@ struct host1x_info {
* the display driver disables VBLANK increments.
*/
bool reserve_vblank_syncpts;
+ /*
+ * On Tegra186, secure world applications may require access to
+ * host1x during suspend/resume. To allow this, we need to leave
+ * host1x not in reset.
+ */
+ bool skip_reset_assert;
};
struct host1x {
diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
index 6ef0c88e..d2f3f23 100644
--- a/drivers/hid/hid-logitech-hidpp.c
+++ b/drivers/hid/hid-logitech-hidpp.c
@@ -203,6 +203,8 @@ struct hidpp_device {
struct hidpp_scroll_counter vertical_wheel_counter;
u8 wireless_feature_index;
+
+ bool connected_once;
};
/* HID++ 1.0 error codes */
@@ -988,8 +990,13 @@ static int hidpp_root_get_protocol_version(struct hidpp_device *hidpp)
hidpp->protocol_minor = response.rap.params[1];
print_version:
- hid_info(hidpp->hid_dev, "HID++ %u.%u device connected.\n",
- hidpp->protocol_major, hidpp->protocol_minor);
+ if (!hidpp->connected_once) {
+ hid_info(hidpp->hid_dev, "HID++ %u.%u device connected.\n",
+ hidpp->protocol_major, hidpp->protocol_minor);
+ hidpp->connected_once = true;
+ } else
+ hid_dbg(hidpp->hid_dev, "HID++ %u.%u device connected.\n",
+ hidpp->protocol_major, hidpp->protocol_minor);
return 0;
}
@@ -4184,7 +4191,7 @@ static void hidpp_connect_event(struct work_struct *work)
/* Get device version to check if it is connected */
ret = hidpp_root_get_protocol_version(hidpp);
if (ret) {
- hid_info(hidpp->hid_dev, "Disconnected\n");
+ hid_dbg(hidpp->hid_dev, "Disconnected\n");
if (hidpp->battery.ps) {
hidpp->battery.online = false;
hidpp->battery.status = POWER_SUPPLY_STATUS_UNKNOWN;
diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c
index fd5b063..3e91e4d 100644
--- a/drivers/hid/hid-multitouch.c
+++ b/drivers/hid/hid-multitouch.c
@@ -2153,6 +2153,10 @@ static const struct hid_device_id mt_devices[] = {
{ .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT,
HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8,
+ USB_VENDOR_ID_SYNAPTICS, 0xcddc) },
+
+ { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT,
+ HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8,
USB_VENDOR_ID_SYNAPTICS, 0xce08) },
{ .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT,
diff --git a/drivers/hid/intel-ish-hid/ishtp/bus.c b/drivers/hid/intel-ish-hid/ishtp/bus.c
index aa6cb03..03d5601 100644
--- a/drivers/hid/intel-ish-hid/ishtp/bus.c
+++ b/drivers/hid/intel-ish-hid/ishtp/bus.c
@@ -722,6 +722,8 @@ void ishtp_bus_remove_all_clients(struct ishtp_device *ishtp_dev,
spin_lock_irqsave(&ishtp_dev->cl_list_lock, flags);
list_for_each_entry(cl, &ishtp_dev->cl_list, link) {
cl->state = ISHTP_CL_DISCONNECTED;
+ if (warm_reset && cl->device->reference_count)
+ continue;
/*
* Wake any pending process. The waiter would check dev->state
diff --git a/drivers/hid/intel-ish-hid/ishtp/client.c b/drivers/hid/intel-ish-hid/ishtp/client.c
index 82c907f..8a7f2f6 100644
--- a/drivers/hid/intel-ish-hid/ishtp/client.c
+++ b/drivers/hid/intel-ish-hid/ishtp/client.c
@@ -49,7 +49,9 @@ static void ishtp_read_list_flush(struct ishtp_cl *cl)
list_for_each_entry_safe(rb, next, &cl->dev->read_list.list, list)
if (rb->cl && ishtp_cl_cmp_id(cl, rb->cl)) {
list_del(&rb->list);
- ishtp_io_rb_free(rb);
+ spin_lock(&cl->free_list_spinlock);
+ list_add_tail(&rb->list, &cl->free_rb_list.list);
+ spin_unlock(&cl->free_list_spinlock);
}
spin_unlock_irqrestore(&cl->dev->read_list_spinlock, flags);
}
diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index b613f11e..2bc45b2 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -2087,7 +2087,7 @@ static int wacom_allocate_inputs(struct wacom *wacom)
return 0;
}
-static int wacom_register_inputs(struct wacom *wacom)
+static int wacom_setup_inputs(struct wacom *wacom)
{
struct input_dev *pen_input_dev, *touch_input_dev, *pad_input_dev;
struct wacom_wac *wacom_wac = &(wacom->wacom_wac);
@@ -2106,10 +2106,6 @@ static int wacom_register_inputs(struct wacom *wacom)
input_free_device(pen_input_dev);
wacom_wac->pen_input = NULL;
pen_input_dev = NULL;
- } else {
- error = input_register_device(pen_input_dev);
- if (error)
- goto fail;
}
error = wacom_setup_touch_input_capabilities(touch_input_dev, wacom_wac);
@@ -2118,10 +2114,6 @@ static int wacom_register_inputs(struct wacom *wacom)
input_free_device(touch_input_dev);
wacom_wac->touch_input = NULL;
touch_input_dev = NULL;
- } else {
- error = input_register_device(touch_input_dev);
- if (error)
- goto fail;
}
error = wacom_setup_pad_input_capabilities(pad_input_dev, wacom_wac);
@@ -2130,7 +2122,34 @@ static int wacom_register_inputs(struct wacom *wacom)
input_free_device(pad_input_dev);
wacom_wac->pad_input = NULL;
pad_input_dev = NULL;
- } else {
+ }
+
+ return 0;
+}
+
+static int wacom_register_inputs(struct wacom *wacom)
+{
+ struct input_dev *pen_input_dev, *touch_input_dev, *pad_input_dev;
+ struct wacom_wac *wacom_wac = &(wacom->wacom_wac);
+ int error = 0;
+
+ pen_input_dev = wacom_wac->pen_input;
+ touch_input_dev = wacom_wac->touch_input;
+ pad_input_dev = wacom_wac->pad_input;
+
+ if (pen_input_dev) {
+ error = input_register_device(pen_input_dev);
+ if (error)
+ goto fail;
+ }
+
+ if (touch_input_dev) {
+ error = input_register_device(touch_input_dev);
+ if (error)
+ goto fail;
+ }
+
+ if (pad_input_dev) {
error = input_register_device(pad_input_dev);
if (error)
goto fail;
@@ -2383,6 +2402,20 @@ static int wacom_parse_and_register(struct wacom *wacom, bool wireless)
if (error)
goto fail;
+ error = wacom_setup_inputs(wacom);
+ if (error)
+ goto fail;
+
+ if (features->type == HID_GENERIC)
+ connect_mask |= HID_CONNECT_DRIVER;
+
+ /* Regular HID work starts now */
+ error = hid_hw_start(hdev, connect_mask);
+ if (error) {
+ hid_err(hdev, "hw start failed\n");
+ goto fail;
+ }
+
error = wacom_register_inputs(wacom);
if (error)
goto fail;
@@ -2397,16 +2430,6 @@ static int wacom_parse_and_register(struct wacom *wacom, bool wireless)
goto fail;
}
- if (features->type == HID_GENERIC)
- connect_mask |= HID_CONNECT_DRIVER;
-
- /* Regular HID work starts now */
- error = hid_hw_start(hdev, connect_mask);
- if (error) {
- hid_err(hdev, "hw start failed\n");
- goto fail;
- }
-
if (!wireless) {
/* Note that if query fails it is not a hard failure */
wacom_query_tablet_data(wacom);
diff --git a/drivers/hid/wacom_wac.c b/drivers/hid/wacom_wac.c
index da8a01f..fbe10fb 100644
--- a/drivers/hid/wacom_wac.c
+++ b/drivers/hid/wacom_wac.c
@@ -2575,7 +2575,14 @@ static void wacom_wac_pen_report(struct hid_device *hdev,
wacom_wac->hid_data.tipswitch);
input_report_key(input, wacom_wac->tool[0], sense);
if (wacom_wac->serial[0]) {
- input_event(input, EV_MSC, MSC_SERIAL, wacom_wac->serial[0]);
+ /*
+ * xf86-input-wacom does not accept a serial number
+ * of '0'. Report the low 32 bits if possible, but
+ * if they are zero, report the upper ones instead.
+ */
+ __u32 serial_lo = wacom_wac->serial[0] & 0xFFFFFFFFu;
+ __u32 serial_hi = wacom_wac->serial[0] >> 32;
+ input_event(input, EV_MSC, MSC_SERIAL, (int)(serial_lo ? serial_lo : serial_hi));
input_report_abs(input, ABS_MISC, sense ? id : 0);
}
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 56f7e06..adbf674 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -322,125 +322,89 @@ static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
pagecount = hv_gpadl_size(type, size) >> HV_HYP_PAGE_SHIFT;
- /* do we need a gpadl body msg */
pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
sizeof(struct vmbus_channel_gpadl_header) -
sizeof(struct gpa_range);
+ pfncount = umin(pagecount, pfnsize / sizeof(u64));
+
+ msgsize = sizeof(struct vmbus_channel_msginfo) +
+ sizeof(struct vmbus_channel_gpadl_header) +
+ sizeof(struct gpa_range) + pfncount * sizeof(u64);
+ msgheader = kzalloc(msgsize, GFP_KERNEL);
+ if (!msgheader)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&msgheader->submsglist);
+ msgheader->msgsize = msgsize;
+
+ gpadl_header = (struct vmbus_channel_gpadl_header *)
+ msgheader->msg;
+ gpadl_header->rangecount = 1;
+ gpadl_header->range_buflen = sizeof(struct gpa_range) +
+ pagecount * sizeof(u64);
+ gpadl_header->range[0].byte_offset = 0;
+ gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
+ for (i = 0; i < pfncount; i++)
+ gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
+ type, kbuffer, size, send_offset, i);
+ *msginfo = msgheader;
+
+ pfnsum = pfncount;
+ pfnleft = pagecount - pfncount;
+
+ /* how many pfns can we fit in a body message */
+ pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
+ sizeof(struct vmbus_channel_gpadl_body);
pfncount = pfnsize / sizeof(u64);
- if (pagecount > pfncount) {
- /* we need a gpadl body */
- /* fill in the header */
+ /*
+ * If pfnleft is zero, everything fits in the header and no body
+ * messages are needed
+ */
+ while (pfnleft) {
+ pfncurr = umin(pfncount, pfnleft);
msgsize = sizeof(struct vmbus_channel_msginfo) +
- sizeof(struct vmbus_channel_gpadl_header) +
- sizeof(struct gpa_range) + pfncount * sizeof(u64);
- msgheader = kzalloc(msgsize, GFP_KERNEL);
- if (!msgheader)
- goto nomem;
+ sizeof(struct vmbus_channel_gpadl_body) +
+ pfncurr * sizeof(u64);
+ msgbody = kzalloc(msgsize, GFP_KERNEL);
- INIT_LIST_HEAD(&msgheader->submsglist);
- msgheader->msgsize = msgsize;
-
- gpadl_header = (struct vmbus_channel_gpadl_header *)
- msgheader->msg;
- gpadl_header->rangecount = 1;
- gpadl_header->range_buflen = sizeof(struct gpa_range) +
- pagecount * sizeof(u64);
- gpadl_header->range[0].byte_offset = 0;
- gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
- for (i = 0; i < pfncount; i++)
- gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
- type, kbuffer, size, send_offset, i);
- *msginfo = msgheader;
-
- pfnsum = pfncount;
- pfnleft = pagecount - pfncount;
-
- /* how many pfns can we fit */
- pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
- sizeof(struct vmbus_channel_gpadl_body);
- pfncount = pfnsize / sizeof(u64);
-
- /* fill in the body */
- while (pfnleft) {
- if (pfnleft > pfncount)
- pfncurr = pfncount;
- else
- pfncurr = pfnleft;
-
- msgsize = sizeof(struct vmbus_channel_msginfo) +
- sizeof(struct vmbus_channel_gpadl_body) +
- pfncurr * sizeof(u64);
- msgbody = kzalloc(msgsize, GFP_KERNEL);
-
- if (!msgbody) {
- struct vmbus_channel_msginfo *pos = NULL;
- struct vmbus_channel_msginfo *tmp = NULL;
- /*
- * Free up all the allocated messages.
- */
- list_for_each_entry_safe(pos, tmp,
- &msgheader->submsglist,
- msglistentry) {
-
- list_del(&pos->msglistentry);
- kfree(pos);
- }
-
- goto nomem;
- }
-
- msgbody->msgsize = msgsize;
- gpadl_body =
- (struct vmbus_channel_gpadl_body *)msgbody->msg;
-
+ if (!msgbody) {
+ struct vmbus_channel_msginfo *pos = NULL;
+ struct vmbus_channel_msginfo *tmp = NULL;
/*
- * Gpadl is u32 and we are using a pointer which could
- * be 64-bit
- * This is governed by the guest/host protocol and
- * so the hypervisor guarantees that this is ok.
+ * Free up all the allocated messages.
*/
- for (i = 0; i < pfncurr; i++)
- gpadl_body->pfn[i] = hv_gpadl_hvpfn(type,
- kbuffer, size, send_offset, pfnsum + i);
+ list_for_each_entry_safe(pos, tmp,
+ &msgheader->submsglist,
+ msglistentry) {
- /* add to msg header */
- list_add_tail(&msgbody->msglistentry,
- &msgheader->submsglist);
- pfnsum += pfncurr;
- pfnleft -= pfncurr;
+ list_del(&pos->msglistentry);
+ kfree(pos);
+ }
+ kfree(msgheader);
+ return -ENOMEM;
}
- } else {
- /* everything fits in a header */
- msgsize = sizeof(struct vmbus_channel_msginfo) +
- sizeof(struct vmbus_channel_gpadl_header) +
- sizeof(struct gpa_range) + pagecount * sizeof(u64);
- msgheader = kzalloc(msgsize, GFP_KERNEL);
- if (msgheader == NULL)
- goto nomem;
- INIT_LIST_HEAD(&msgheader->submsglist);
- msgheader->msgsize = msgsize;
+ msgbody->msgsize = msgsize;
+ gpadl_body = (struct vmbus_channel_gpadl_body *)msgbody->msg;
- gpadl_header = (struct vmbus_channel_gpadl_header *)
- msgheader->msg;
- gpadl_header->rangecount = 1;
- gpadl_header->range_buflen = sizeof(struct gpa_range) +
- pagecount * sizeof(u64);
- gpadl_header->range[0].byte_offset = 0;
- gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
- for (i = 0; i < pagecount; i++)
- gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
- type, kbuffer, size, send_offset, i);
+ /*
+ * Gpadl is u32 and we are using a pointer which could
+ * be 64-bit
+ * This is governed by the guest/host protocol and
+ * so the hypervisor guarantees that this is ok.
+ */
+ for (i = 0; i < pfncurr; i++)
+ gpadl_body->pfn[i] = hv_gpadl_hvpfn(type,
+ kbuffer, size, send_offset, pfnsum + i);
- *msginfo = msgheader;
+ /* add to msg header */
+ list_add_tail(&msgbody->msglistentry, &msgheader->submsglist);
+ pfnsum += pfncurr;
+ pfnleft -= pfncurr;
}
return 0;
-nomem:
- kfree(msgheader);
- kfree(msgbody);
- return -ENOMEM;
}
/*
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 42aec2c..9c97c40 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -296,6 +296,11 @@ static struct {
spinlock_t lock;
} host_ts;
+static bool timesync_implicit;
+
+module_param(timesync_implicit, bool, 0644);
+MODULE_PARM_DESC(timesync_implicit, "If set treat SAMPLE as SYNC when clock is behind");
+
static inline u64 reftime_to_ns(u64 reftime)
{
return (reftime - WLTIMEDELTA) * 100;
@@ -345,6 +350,29 @@ static void hv_set_host_time(struct work_struct *work)
}
/*
+ * Due to a bug on Hyper-V hosts, the sync flag may not always be sent on resume.
+ * Force a sync if the guest is behind.
+ */
+static inline bool hv_implicit_sync(u64 host_time)
+{
+ struct timespec64 new_ts;
+ struct timespec64 threshold_ts;
+
+ new_ts = ns_to_timespec64(reftime_to_ns(host_time));
+ ktime_get_real_ts64(&threshold_ts);
+
+ threshold_ts.tv_sec += 5;
+
+ /*
+ * If guest behind the host by 5 or more seconds.
+ */
+ if (timespec64_compare(&new_ts, &threshold_ts) >= 0)
+ return true;
+
+ return false;
+}
+
+/*
* Synchronize time with host after reboot, restore, etc.
*
* ICTIMESYNCFLAG_SYNC flag bit indicates reboot, restore events of the VM.
@@ -384,7 +412,8 @@ static inline void adj_guesttime(u64 hosttime, u64 reftime, u8 adj_flags)
spin_unlock_irqrestore(&host_ts.lock, flags);
/* Schedule work to do do_settimeofday64() */
- if (adj_flags & ICTIMESYNCFLAG_SYNC)
+ if ((adj_flags & ICTIMESYNCFLAG_SYNC) ||
+ (timesync_implicit && hv_implicit_sync(host_ts.host_time)))
schedule_work(&adj_time_work);
}
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index b33d5ab..7f7965f 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -988,7 +988,7 @@ static const struct dev_pm_ops vmbus_pm = {
};
/* The one and only one */
-static struct bus_type hv_bus = {
+static const struct bus_type hv_bus = {
.name = "vmbus",
.match = vmbus_match,
.shutdown = vmbus_shutdown,
diff --git a/drivers/hwmon/aspeed-pwm-tacho.c b/drivers/hwmon/aspeed-pwm-tacho.c
index f6e1e55..4acc185 100644
--- a/drivers/hwmon/aspeed-pwm-tacho.c
+++ b/drivers/hwmon/aspeed-pwm-tacho.c
@@ -195,6 +195,8 @@ struct aspeed_pwm_tacho_data {
u8 fan_tach_ch_source[MAX_ASPEED_FAN_TACH_CHANNELS];
struct aspeed_cooling_device *cdev[8];
const struct attribute_group *groups[3];
+ /* protects access to shared ASPEED_PTCR_RESULT */
+ struct mutex tach_lock;
};
enum type { TYPEM, TYPEN, TYPEO };
@@ -529,6 +531,8 @@ static int aspeed_get_fan_tach_ch_rpm(struct aspeed_pwm_tacho_data *priv,
u8 fan_tach_ch_source, type, mode, both;
int ret;
+ mutex_lock(&priv->tach_lock);
+
regmap_write(priv->regmap, ASPEED_PTCR_TRIGGER, 0);
regmap_write(priv->regmap, ASPEED_PTCR_TRIGGER, 0x1 << fan_tach_ch);
@@ -546,6 +550,8 @@ static int aspeed_get_fan_tach_ch_rpm(struct aspeed_pwm_tacho_data *priv,
ASPEED_RPM_STATUS_SLEEP_USEC,
usec);
+ mutex_unlock(&priv->tach_lock);
+
/* return -ETIMEDOUT if we didn't get an answer. */
if (ret)
return ret;
@@ -915,6 +921,7 @@ static int aspeed_pwm_tacho_probe(struct platform_device *pdev)
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
+ mutex_init(&priv->tach_lock);
priv->regmap = devm_regmap_init(dev, NULL, (__force void *)regs,
&aspeed_pwm_tacho_regmap_config);
if (IS_ERR(priv->regmap))
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index ba82d1e..b8fc8d1 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -41,7 +41,7 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
#define PKG_SYSFS_ATTR_NO 1 /* Sysfs attribute for package temp */
#define BASE_SYSFS_ATTR_NO 2 /* Sysfs Base attr no for coretemp */
-#define NUM_REAL_CORES 128 /* Number of Real cores per cpu */
+#define NUM_REAL_CORES 512 /* Number of Real cores per cpu */
#define CORETEMP_NAME_LENGTH 28 /* String Length of attrs */
#define MAX_CORE_ATTRS 4 /* Maximum no of basic attrs */
#define TOTAL_ATTRS (MAX_CORE_ATTRS + 1)
@@ -419,7 +419,7 @@ static ssize_t show_temp(struct device *dev,
}
static int create_core_attrs(struct temp_data *tdata, struct device *dev,
- int attr_no)
+ int index)
{
int i;
static ssize_t (*const rd_ptr[TOTAL_ATTRS]) (struct device *dev,
@@ -431,13 +431,20 @@ static int create_core_attrs(struct temp_data *tdata, struct device *dev,
};
for (i = 0; i < tdata->attr_size; i++) {
+ /*
+ * We map the attr number to core id of the CPU
+ * The attr number is always core id + 2
+ * The Pkgtemp will always show up as temp1_*, if available
+ */
+ int attr_no = tdata->is_pkg_data ? 1 : tdata->cpu_core_id + 2;
+
snprintf(tdata->attr_name[i], CORETEMP_NAME_LENGTH,
"temp%d_%s", attr_no, suffixes[i]);
sysfs_attr_init(&tdata->sd_attrs[i].dev_attr.attr);
tdata->sd_attrs[i].dev_attr.attr.name = tdata->attr_name[i];
tdata->sd_attrs[i].dev_attr.attr.mode = 0444;
tdata->sd_attrs[i].dev_attr.show = rd_ptr[i];
- tdata->sd_attrs[i].index = attr_no;
+ tdata->sd_attrs[i].index = index;
tdata->attrs[i] = &tdata->sd_attrs[i].dev_attr.attr;
}
tdata->attr_group.attrs = tdata->attrs;
@@ -495,30 +502,25 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
struct platform_data *pdata = platform_get_drvdata(pdev);
struct cpuinfo_x86 *c = &cpu_data(cpu);
u32 eax, edx;
- int err, index, attr_no;
+ int err, index;
if (!housekeeping_cpu(cpu, HK_TYPE_MISC))
return 0;
/*
- * Find attr number for sysfs:
- * We map the attr number to core id of the CPU
- * The attr number is always core id + 2
- * The Pkgtemp will always show up as temp1_*, if available
+ * Get the index of tdata in pdata->core_data[]
+ * tdata for package: pdata->core_data[1]
+ * tdata for core: pdata->core_data[2] .. pdata->core_data[NUM_REAL_CORES + 1]
*/
if (pkg_flag) {
- attr_no = PKG_SYSFS_ATTR_NO;
+ index = PKG_SYSFS_ATTR_NO;
} else {
- index = ida_alloc(&pdata->ida, GFP_KERNEL);
+ index = ida_alloc_max(&pdata->ida, NUM_REAL_CORES - 1, GFP_KERNEL);
if (index < 0)
return index;
- pdata->cpu_map[index] = topology_core_id(cpu);
- attr_no = index + BASE_SYSFS_ATTR_NO;
- }
- if (attr_no > MAX_CORE_DATA - 1) {
- err = -ERANGE;
- goto ida_free;
+ pdata->cpu_map[index] = topology_core_id(cpu);
+ index += BASE_SYSFS_ATTR_NO;
}
tdata = init_temp_data(cpu, pkg_flag);
@@ -544,20 +546,20 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
if (get_ttarget(tdata, &pdev->dev) >= 0)
tdata->attr_size++;
- pdata->core_data[attr_no] = tdata;
+ pdata->core_data[index] = tdata;
/* Create sysfs interfaces */
- err = create_core_attrs(tdata, pdata->hwmon_dev, attr_no);
+ err = create_core_attrs(tdata, pdata->hwmon_dev, index);
if (err)
goto exit_free;
return 0;
exit_free:
- pdata->core_data[attr_no] = NULL;
+ pdata->core_data[index] = NULL;
kfree(tdata);
ida_free:
if (!pkg_flag)
- ida_free(&pdata->ida, index);
+ ida_free(&pdata->ida, index - BASE_SYSFS_ATTR_NO);
return err;
}
diff --git a/drivers/hwmon/nct6775-core.c b/drivers/hwmon/nct6775-core.c
index 8d2ef31..9fbab8f 100644
--- a/drivers/hwmon/nct6775-core.c
+++ b/drivers/hwmon/nct6775-core.c
@@ -3512,6 +3512,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
const u16 *reg_temp_mon, *reg_temp_alternate, *reg_temp_crit;
const u16 *reg_temp_crit_l = NULL, *reg_temp_crit_h = NULL;
int num_reg_temp, num_reg_temp_mon, num_reg_tsi_temp;
+ int num_reg_temp_config;
struct device *hwmon_dev;
struct sensor_template_group tsi_temp_tg;
@@ -3594,6 +3595,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
reg_temp_over = NCT6106_REG_TEMP_OVER;
reg_temp_hyst = NCT6106_REG_TEMP_HYST;
reg_temp_config = NCT6106_REG_TEMP_CONFIG;
+ num_reg_temp_config = ARRAY_SIZE(NCT6106_REG_TEMP_CONFIG);
reg_temp_alternate = NCT6106_REG_TEMP_ALTERNATE;
reg_temp_crit = NCT6106_REG_TEMP_CRIT;
reg_temp_crit_l = NCT6106_REG_TEMP_CRIT_L;
@@ -3669,6 +3671,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
reg_temp_over = NCT6106_REG_TEMP_OVER;
reg_temp_hyst = NCT6106_REG_TEMP_HYST;
reg_temp_config = NCT6106_REG_TEMP_CONFIG;
+ num_reg_temp_config = ARRAY_SIZE(NCT6106_REG_TEMP_CONFIG);
reg_temp_alternate = NCT6106_REG_TEMP_ALTERNATE;
reg_temp_crit = NCT6106_REG_TEMP_CRIT;
reg_temp_crit_l = NCT6106_REG_TEMP_CRIT_L;
@@ -3746,6 +3749,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
reg_temp_over = NCT6775_REG_TEMP_OVER;
reg_temp_hyst = NCT6775_REG_TEMP_HYST;
reg_temp_config = NCT6775_REG_TEMP_CONFIG;
+ num_reg_temp_config = ARRAY_SIZE(NCT6775_REG_TEMP_CONFIG);
reg_temp_alternate = NCT6775_REG_TEMP_ALTERNATE;
reg_temp_crit = NCT6775_REG_TEMP_CRIT;
@@ -3821,6 +3825,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
reg_temp_over = NCT6775_REG_TEMP_OVER;
reg_temp_hyst = NCT6775_REG_TEMP_HYST;
reg_temp_config = NCT6776_REG_TEMP_CONFIG;
+ num_reg_temp_config = ARRAY_SIZE(NCT6776_REG_TEMP_CONFIG);
reg_temp_alternate = NCT6776_REG_TEMP_ALTERNATE;
reg_temp_crit = NCT6776_REG_TEMP_CRIT;
@@ -3900,6 +3905,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
reg_temp_over = NCT6779_REG_TEMP_OVER;
reg_temp_hyst = NCT6779_REG_TEMP_HYST;
reg_temp_config = NCT6779_REG_TEMP_CONFIG;
+ num_reg_temp_config = ARRAY_SIZE(NCT6779_REG_TEMP_CONFIG);
reg_temp_alternate = NCT6779_REG_TEMP_ALTERNATE;
reg_temp_crit = NCT6779_REG_TEMP_CRIT;
@@ -4034,6 +4040,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
reg_temp_over = NCT6779_REG_TEMP_OVER;
reg_temp_hyst = NCT6779_REG_TEMP_HYST;
reg_temp_config = NCT6779_REG_TEMP_CONFIG;
+ num_reg_temp_config = ARRAY_SIZE(NCT6779_REG_TEMP_CONFIG);
reg_temp_alternate = NCT6779_REG_TEMP_ALTERNATE;
reg_temp_crit = NCT6779_REG_TEMP_CRIT;
@@ -4123,6 +4130,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
reg_temp_over = NCT6798_REG_TEMP_OVER;
reg_temp_hyst = NCT6798_REG_TEMP_HYST;
reg_temp_config = NCT6779_REG_TEMP_CONFIG;
+ num_reg_temp_config = ARRAY_SIZE(NCT6779_REG_TEMP_CONFIG);
reg_temp_alternate = NCT6798_REG_TEMP_ALTERNATE;
reg_temp_crit = NCT6798_REG_TEMP_CRIT;
@@ -4204,7 +4212,8 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
= reg_temp_crit[src - 1];
if (reg_temp_crit_l && reg_temp_crit_l[i])
data->reg_temp[4][src - 1] = reg_temp_crit_l[i];
- data->reg_temp_config[src - 1] = reg_temp_config[i];
+ if (i < num_reg_temp_config)
+ data->reg_temp_config[src - 1] = reg_temp_config[i];
data->temp_src[src - 1] = src;
continue;
}
@@ -4217,7 +4226,8 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
data->reg_temp[0][s] = reg_temp[i];
data->reg_temp[1][s] = reg_temp_over[i];
data->reg_temp[2][s] = reg_temp_hyst[i];
- data->reg_temp_config[s] = reg_temp_config[i];
+ if (i < num_reg_temp_config)
+ data->reg_temp_config[s] = reg_temp_config[i];
if (reg_temp_crit_h && reg_temp_crit_h[i])
data->reg_temp[3][s] = reg_temp_crit_h[i];
else if (reg_temp_crit[src - 1])
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 3757b93..aa0ee8e 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -90,10 +90,8 @@
obj-$(CONFIG_I2C_OCORES) += i2c-ocores.o
obj-$(CONFIG_I2C_OMAP) += i2c-omap.o
obj-$(CONFIG_I2C_OWL) += i2c-owl.o
-i2c-pasemi-objs := i2c-pasemi-core.o i2c-pasemi-pci.o
-obj-$(CONFIG_I2C_PASEMI) += i2c-pasemi.o
-i2c-apple-objs := i2c-pasemi-core.o i2c-pasemi-platform.o
-obj-$(CONFIG_I2C_APPLE) += i2c-apple.o
+obj-$(CONFIG_I2C_PASEMI) += i2c-pasemi-core.o i2c-pasemi-pci.o
+obj-$(CONFIG_I2C_APPLE) += i2c-pasemi-core.o i2c-pasemi-platform.o
obj-$(CONFIG_I2C_PCA_PLATFORM) += i2c-pca-platform.o
obj-$(CONFIG_I2C_PNX) += i2c-pnx.o
obj-$(CONFIG_I2C_PXA) += i2c-pxa.o
diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c
index 5511fd4..ce8c484 100644
--- a/drivers/i2c/busses/i2c-aspeed.c
+++ b/drivers/i2c/busses/i2c-aspeed.c
@@ -445,6 +445,7 @@ static u32 aspeed_i2c_master_irq(struct aspeed_i2c_bus *bus, u32 irq_status)
irq_status);
irq_handled |= (irq_status & ASPEED_I2CD_INTR_MASTER_ERRORS);
if (bus->master_state != ASPEED_I2C_MASTER_INACTIVE) {
+ irq_handled = irq_status;
bus->cmd_err = ret;
bus->master_state = ASPEED_I2C_MASTER_INACTIVE;
goto out_complete;
diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index 3932e8d..274e987 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -498,11 +498,10 @@ static int i801_block_transaction_by_block(struct i801_priv *priv,
/* Set block buffer mode */
outb_p(inb_p(SMBAUXCTL(priv)) | SMBAUXCTL_E32B, SMBAUXCTL(priv));
- inb_p(SMBHSTCNT(priv)); /* reset the data buffer index */
-
if (read_write == I2C_SMBUS_WRITE) {
len = data->block[0];
outb_p(len, SMBHSTDAT0(priv));
+ inb_p(SMBHSTCNT(priv)); /* reset the data buffer index */
for (i = 0; i < len; i++)
outb_p(data->block[i+1], SMBBLKDAT(priv));
}
@@ -520,6 +519,7 @@ static int i801_block_transaction_by_block(struct i801_priv *priv,
}
data->block[0] = len;
+ inb_p(SMBHSTCNT(priv)); /* reset the data buffer index */
for (i = 0; i < len; i++)
data->block[i + 1] = inb_p(SMBBLKDAT(priv));
}
@@ -1416,7 +1416,6 @@ static void i801_add_mux(struct i801_priv *priv)
lookup->table[i] = GPIO_LOOKUP(mux_config->gpio_chip,
mux_config->gpios[i], "mux", 0);
gpiod_add_lookup_table(lookup);
- priv->lookup = lookup;
/*
* Register the mux device, we use PLATFORM_DEVID_NONE here
@@ -1430,7 +1429,10 @@ static void i801_add_mux(struct i801_priv *priv)
sizeof(struct i2c_mux_gpio_platform_data));
if (IS_ERR(priv->mux_pdev)) {
gpiod_remove_lookup_table(lookup);
+ devm_kfree(dev, lookup);
dev_err(dev, "Failed to register i2c-mux-gpio device\n");
+ } else {
+ priv->lookup = lookup;
}
}
@@ -1742,9 +1744,9 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id)
i801_enable_host_notify(&priv->adapter);
- i801_probe_optional_slaves(priv);
/* We ignore errors - multiplexing is optional */
i801_add_mux(priv);
+ i801_probe_optional_slaves(priv);
pci_set_drvdata(dev, priv);
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index 88a0539..60e8131 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -803,6 +803,11 @@ static irqreturn_t i2c_imx_slave_handle(struct imx_i2c_struct *i2c_imx,
ctl &= ~I2CR_MTX;
imx_i2c_write_reg(ctl, i2c_imx, IMX_I2C_I2CR);
imx_i2c_read_reg(i2c_imx, IMX_I2C_I2DR);
+
+ /* flag the last byte as processed */
+ i2c_imx_slave_event(i2c_imx,
+ I2C_SLAVE_READ_PROCESSED, &value);
+
i2c_imx_slave_finish_op(i2c_imx);
return IRQ_HANDLED;
}
diff --git a/drivers/i2c/busses/i2c-pasemi-core.c b/drivers/i2c/busses/i2c-pasemi-core.c
index 7d54a9f..bd8becb 100644
--- a/drivers/i2c/busses/i2c-pasemi-core.c
+++ b/drivers/i2c/busses/i2c-pasemi-core.c
@@ -369,6 +369,7 @@ int pasemi_i2c_common_probe(struct pasemi_smbus *smbus)
return 0;
}
+EXPORT_SYMBOL_GPL(pasemi_i2c_common_probe);
irqreturn_t pasemi_irq_handler(int irq, void *dev_id)
{
@@ -378,3 +379,8 @@ irqreturn_t pasemi_irq_handler(int irq, void *dev_id)
complete(&smbus->irq_completion);
return IRQ_HANDLED;
}
+EXPORT_SYMBOL_GPL(pasemi_irq_handler);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Olof Johansson <olof@lixom.net>");
+MODULE_DESCRIPTION("PA Semi PWRficient SMBus driver");
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index 0d2e717..da94df4 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -613,20 +613,20 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
peripheral.addr = msgs[i].addr;
- if (msgs[i].flags & I2C_M_RD) {
- ret = geni_i2c_gpi(gi2c, &msgs[i], &config,
- &rx_addr, &rx_buf, I2C_READ, gi2c->rx_c);
- if (ret)
- goto err;
- }
-
ret = geni_i2c_gpi(gi2c, &msgs[i], &config,
&tx_addr, &tx_buf, I2C_WRITE, gi2c->tx_c);
if (ret)
goto err;
- if (msgs[i].flags & I2C_M_RD)
+ if (msgs[i].flags & I2C_M_RD) {
+ ret = geni_i2c_gpi(gi2c, &msgs[i], &config,
+ &rx_addr, &rx_buf, I2C_READ, gi2c->rx_c);
+ if (ret)
+ goto err;
+
dma_async_issue_pending(gi2c->rx_c);
+ }
+
dma_async_issue_pending(gi2c->tx_c);
timeout = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
diff --git a/drivers/i2c/busses/i2c-wmt.c b/drivers/i2c/busses/i2c-wmt.c
index ec2a8da..198afee 100644
--- a/drivers/i2c/busses/i2c-wmt.c
+++ b/drivers/i2c/busses/i2c-wmt.c
@@ -378,11 +378,15 @@ static int wmt_i2c_probe(struct platform_device *pdev)
err = i2c_add_adapter(adap);
if (err)
- return err;
+ goto err_disable_clk;
platform_set_drvdata(pdev, i2c_dev);
return 0;
+
+err_disable_clk:
+ clk_disable_unprepare(i2c_dev->clk);
+ return err;
}
static void wmt_i2c_remove(struct platform_device *pdev)
diff --git a/drivers/iio/accel/Kconfig b/drivers/iio/accel/Kconfig
index 91adcac..c9d7afe 100644
--- a/drivers/iio/accel/Kconfig
+++ b/drivers/iio/accel/Kconfig
@@ -219,10 +219,12 @@
config BMA400_I2C
tristate
+ select REGMAP_I2C
depends on BMA400
config BMA400_SPI
tristate
+ select REGMAP_SPI
depends on BMA400
config BMC150_ACCEL
diff --git a/drivers/iio/accel/adxl367.c b/drivers/iio/accel/adxl367.c
index 90b7ae6..484fe2e 100644
--- a/drivers/iio/accel/adxl367.c
+++ b/drivers/iio/accel/adxl367.c
@@ -1429,9 +1429,11 @@ static int adxl367_verify_devid(struct adxl367_state *st)
unsigned int val;
int ret;
- ret = regmap_read_poll_timeout(st->regmap, ADXL367_REG_DEVID, val,
- val == ADXL367_DEVID_AD, 1000, 10000);
+ ret = regmap_read(st->regmap, ADXL367_REG_DEVID, &val);
if (ret)
+ return dev_err_probe(st->dev, ret, "Failed to read dev id\n");
+
+ if (val != ADXL367_DEVID_AD)
return dev_err_probe(st->dev, -ENODEV,
"Invalid dev id 0x%02X, expected 0x%02X\n",
val, ADXL367_DEVID_AD);
@@ -1510,6 +1512,8 @@ int adxl367_probe(struct device *dev, const struct adxl367_ops *ops,
if (ret)
return ret;
+ fsleep(15000);
+
ret = adxl367_verify_devid(st);
if (ret)
return ret;
diff --git a/drivers/iio/accel/adxl367_i2c.c b/drivers/iio/accel/adxl367_i2c.c
index b595fe9..62c74bd 100644
--- a/drivers/iio/accel/adxl367_i2c.c
+++ b/drivers/iio/accel/adxl367_i2c.c
@@ -11,7 +11,7 @@
#include "adxl367.h"
-#define ADXL367_I2C_FIFO_DATA 0x42
+#define ADXL367_I2C_FIFO_DATA 0x18
struct adxl367_i2c_state {
struct regmap *regmap;
diff --git a/drivers/iio/adc/ad4130.c b/drivers/iio/adc/ad4130.c
index feb86fe..6249042 100644
--- a/drivers/iio/adc/ad4130.c
+++ b/drivers/iio/adc/ad4130.c
@@ -1821,7 +1821,7 @@ static int ad4130_setup_int_clk(struct ad4130_state *st)
{
struct device *dev = &st->spi->dev;
struct device_node *of_node = dev_of_node(dev);
- struct clk_init_data init;
+ struct clk_init_data init = {};
const char *clk_name;
int ret;
@@ -1891,10 +1891,14 @@ static int ad4130_setup(struct iio_dev *indio_dev)
return ret;
/*
- * Configure all GPIOs for output. If configured, the interrupt function
- * of P2 takes priority over the GPIO out function.
+ * Configure unused GPIOs for output. If configured, the interrupt
+ * function of P2 takes priority over the GPIO out function.
*/
- val = AD4130_IO_CONTROL_GPIO_CTRL_MASK;
+ val = 0;
+ for (i = 0; i < AD4130_MAX_GPIOS; i++)
+ if (st->pins_fn[i + AD4130_AIN2_P1] == AD4130_PIN_FN_NONE)
+ val |= FIELD_PREP(AD4130_IO_CONTROL_GPIO_CTRL_MASK, BIT(i));
+
val |= FIELD_PREP(AD4130_IO_CONTROL_INT_PIN_SEL_MASK, st->int_pin_sel);
ret = regmap_write(st->regmap, AD4130_IO_CONTROL_REG, val);
diff --git a/drivers/iio/adc/ad7091r8.c b/drivers/iio/adc/ad7091r8.c
index 57700f1..7005643 100644
--- a/drivers/iio/adc/ad7091r8.c
+++ b/drivers/iio/adc/ad7091r8.c
@@ -195,7 +195,7 @@ static int ad7091r8_gpio_setup(struct ad7091r_state *st)
st->reset_gpio = devm_gpiod_get_optional(st->dev, "reset",
GPIOD_OUT_HIGH);
if (IS_ERR(st->reset_gpio))
- return dev_err_probe(st->dev, PTR_ERR(st->convst_gpio),
+ return dev_err_probe(st->dev, PTR_ERR(st->reset_gpio),
"Error on requesting reset GPIO\n");
if (st->reset_gpio) {
diff --git a/drivers/iio/humidity/Kconfig b/drivers/iio/humidity/Kconfig
index 2de5494..b15b7a3 100644
--- a/drivers/iio/humidity/Kconfig
+++ b/drivers/iio/humidity/Kconfig
@@ -48,6 +48,18 @@
To compile this driver as a module, choose M here: the module
will be called hdc2010.
+config HDC3020
+ tristate "TI HDC3020 relative humidity and temperature sensor"
+ depends on I2C
+ select CRC8
+ help
+ Say yes here to build support for the Texas Instruments
+ HDC3020, HDC3021 and HDC3022 relative humidity and temperature
+ sensors.
+
+ To compile this driver as a module, choose M here: the module
+ will be called hdc3020.
+
config HID_SENSOR_HUMIDITY
tristate "HID Environmental humidity sensor"
depends on HID_SENSOR_HUB
diff --git a/drivers/iio/humidity/Makefile b/drivers/iio/humidity/Makefile
index f19ff3d..5fbeef2 100644
--- a/drivers/iio/humidity/Makefile
+++ b/drivers/iio/humidity/Makefile
@@ -7,6 +7,7 @@
obj-$(CONFIG_DHT11) += dht11.o
obj-$(CONFIG_HDC100X) += hdc100x.o
obj-$(CONFIG_HDC2010) += hdc2010.o
+obj-$(CONFIG_HDC3020) += hdc3020.o
obj-$(CONFIG_HID_SENSOR_HUMIDITY) += hid-sensor-humidity.o
hts221-y := hts221_core.o \
diff --git a/drivers/iio/humidity/hdc3020.c b/drivers/iio/humidity/hdc3020.c
index 4e33111..ed70415 100644
--- a/drivers/iio/humidity/hdc3020.c
+++ b/drivers/iio/humidity/hdc3020.c
@@ -322,7 +322,7 @@ static int hdc3020_read_raw(struct iio_dev *indio_dev,
if (chan->type != IIO_TEMP)
return -EINVAL;
- *val = 16852;
+ *val = -16852;
return IIO_VAL_INT;
default:
diff --git a/drivers/iio/imu/bno055/Kconfig b/drivers/iio/imu/bno055/Kconfig
index 83e53ac..c7f5866 100644
--- a/drivers/iio/imu/bno055/Kconfig
+++ b/drivers/iio/imu/bno055/Kconfig
@@ -8,6 +8,7 @@
config BOSCH_BNO055_SERIAL
tristate "Bosch BNO055 attached via UART"
depends on SERIAL_DEV_BUS
+ select REGMAP
select BOSCH_BNO055
help
Enable this to support Bosch BNO055 IMUs attached via UART.
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
index 66d4ba0..d4f9b5d 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
@@ -109,6 +109,8 @@ irqreturn_t inv_mpu6050_read_fifo(int irq, void *p)
/* compute and process only all complete datum */
nb = fifo_count / bytes_per_datum;
fifo_count = nb * bytes_per_datum;
+ if (nb == 0)
+ goto end_session;
/* Each FIFO data contains all sensors, so same number for FIFO and sensor data */
fifo_period = NSEC_PER_SEC / INV_MPU6050_DIVIDER_TO_FIFO_RATE(st->chip_config.divider);
inv_sensors_timestamp_interrupt(&st->timestamp, fifo_period, nb, nb, pf->timestamp);
diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
index 676704f9..e6e6e94 100644
--- a/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
+++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
@@ -111,6 +111,7 @@ int inv_mpu6050_prepare_fifo(struct inv_mpu6050_state *st, bool enable)
if (enable) {
/* reset timestamping */
inv_sensors_timestamp_reset(&st->timestamp);
+ inv_sensors_timestamp_apply_odr(&st->timestamp, 0, 0, 0);
/* reset FIFO */
d = st->chip_config.user_ctrl | INV_MPU6050_BIT_FIFO_RST;
ret = regmap_write(st->map, st->reg->user_ctrl, d);
@@ -184,6 +185,10 @@ static int inv_mpu6050_set_enable(struct iio_dev *indio_dev, bool enable)
if (result)
goto error_power_off;
} else {
+ st->chip_config.gyro_fifo_enable = 0;
+ st->chip_config.accl_fifo_enable = 0;
+ st->chip_config.temp_fifo_enable = 0;
+ st->chip_config.magn_fifo_enable = 0;
result = inv_mpu6050_prepare_fifo(st, false);
if (result)
goto error_power_off;
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 9a85752..173dc00 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1584,10 +1584,13 @@ static int iio_device_register_sysfs(struct iio_dev *indio_dev)
ret = iio_device_register_sysfs_group(indio_dev,
&iio_dev_opaque->chan_attr_group);
if (ret)
- goto error_clear_attrs;
+ goto error_free_chan_attrs;
return 0;
+error_free_chan_attrs:
+ kfree(iio_dev_opaque->chan_attr_group.attrs);
+ iio_dev_opaque->chan_attr_group.attrs = NULL;
error_clear_attrs:
iio_free_chan_devattr_list(&iio_dev_opaque->channel_attr_list);
diff --git a/drivers/iio/light/hid-sensor-als.c b/drivers/iio/light/hid-sensor-als.c
index 5cd27f0..b6c4bef 100644
--- a/drivers/iio/light/hid-sensor-als.c
+++ b/drivers/iio/light/hid-sensor-als.c
@@ -226,6 +226,7 @@ static int als_capture_sample(struct hid_sensor_hub_device *hsdev,
case HID_USAGE_SENSOR_TIME_TIMESTAMP:
als_state->timestamp = hid_sensor_convert_timestamp(&als_state->common_attributes,
*(s64 *)raw_data);
+ ret = 0;
break;
default:
break;
diff --git a/drivers/iio/magnetometer/rm3100-core.c b/drivers/iio/magnetometer/rm3100-core.c
index 6993820..42b70cd 100644
--- a/drivers/iio/magnetometer/rm3100-core.c
+++ b/drivers/iio/magnetometer/rm3100-core.c
@@ -530,6 +530,7 @@ int rm3100_common_probe(struct device *dev, struct regmap *regmap, int irq)
struct rm3100_data *data;
unsigned int tmp;
int ret;
+ int samp_rate_index;
indio_dev = devm_iio_device_alloc(dev, sizeof(*data));
if (!indio_dev)
@@ -586,9 +587,14 @@ int rm3100_common_probe(struct device *dev, struct regmap *regmap, int irq)
ret = regmap_read(regmap, RM3100_REG_TMRC, &tmp);
if (ret < 0)
return ret;
+
+ samp_rate_index = tmp - RM3100_TMRC_OFFSET;
+ if (samp_rate_index < 0 || samp_rate_index >= RM3100_SAMP_NUM) {
+ dev_err(dev, "The value read from RM3100_REG_TMRC is invalid!\n");
+ return -EINVAL;
+ }
/* Initializing max wait time, which is double conversion time. */
- data->conversion_time = rm3100_samp_rates[tmp - RM3100_TMRC_OFFSET][2]
- * 2;
+ data->conversion_time = rm3100_samp_rates[samp_rate_index][2] * 2;
/* Cycle count values may not be what we want. */
if ((tmp - RM3100_TMRC_OFFSET) == 0)
diff --git a/drivers/iio/pressure/bmp280-spi.c b/drivers/iio/pressure/bmp280-spi.c
index 433d6fa..a444d4b 100644
--- a/drivers/iio/pressure/bmp280-spi.c
+++ b/drivers/iio/pressure/bmp280-spi.c
@@ -4,6 +4,7 @@
*
* Inspired by the older BMP085 driver drivers/misc/bmp085-spi.c
*/
+#include <linux/bits.h>
#include <linux/module.h>
#include <linux/spi/spi.h>
#include <linux/err.h>
@@ -35,6 +36,34 @@ static int bmp280_regmap_spi_read(void *context, const void *reg,
return spi_write_then_read(spi, reg, reg_size, val, val_size);
}
+static int bmp380_regmap_spi_read(void *context, const void *reg,
+ size_t reg_size, void *val, size_t val_size)
+{
+ struct spi_device *spi = to_spi_device(context);
+ u8 rx_buf[4];
+ ssize_t status;
+
+ /*
+ * Maximum number of consecutive bytes read for a temperature or
+ * pressure measurement is 3.
+ */
+ if (val_size > 3)
+ return -EINVAL;
+
+ /*
+ * According to the BMP3xx datasheets, for a basic SPI read opertion,
+ * the first byte needs to be dropped and the rest are the requested
+ * data.
+ */
+ status = spi_write_then_read(spi, reg, 1, rx_buf, val_size + 1);
+ if (status)
+ return status;
+
+ memcpy(val, rx_buf + 1, val_size);
+
+ return 0;
+}
+
static struct regmap_bus bmp280_regmap_bus = {
.write = bmp280_regmap_spi_write,
.read = bmp280_regmap_spi_read,
@@ -42,10 +71,19 @@ static struct regmap_bus bmp280_regmap_bus = {
.val_format_endian_default = REGMAP_ENDIAN_BIG,
};
+static struct regmap_bus bmp380_regmap_bus = {
+ .write = bmp280_regmap_spi_write,
+ .read = bmp380_regmap_spi_read,
+ .read_flag_mask = BIT(7),
+ .reg_format_endian_default = REGMAP_ENDIAN_BIG,
+ .val_format_endian_default = REGMAP_ENDIAN_BIG,
+};
+
static int bmp280_spi_probe(struct spi_device *spi)
{
const struct spi_device_id *id = spi_get_device_id(spi);
const struct bmp280_chip_info *chip_info;
+ struct regmap_bus *bmp_regmap_bus;
struct regmap *regmap;
int ret;
@@ -58,8 +96,18 @@ static int bmp280_spi_probe(struct spi_device *spi)
chip_info = spi_get_device_match_data(spi);
+ switch (chip_info->chip_id[0]) {
+ case BMP380_CHIP_ID:
+ case BMP390_CHIP_ID:
+ bmp_regmap_bus = &bmp380_regmap_bus;
+ break;
+ default:
+ bmp_regmap_bus = &bmp280_regmap_bus;
+ break;
+ }
+
regmap = devm_regmap_init(&spi->dev,
- &bmp280_regmap_bus,
+ bmp_regmap_bus,
&spi->dev,
chip_info->regmap_config);
if (IS_ERR(regmap)) {
@@ -87,6 +135,7 @@ static const struct of_device_id bmp280_of_spi_match[] = {
MODULE_DEVICE_TABLE(of, bmp280_of_spi_match);
static const struct spi_device_id bmp280_spi_id[] = {
+ { "bmp085", (kernel_ulong_t)&bmp180_chip_info },
{ "bmp180", (kernel_ulong_t)&bmp180_chip_info },
{ "bmp181", (kernel_ulong_t)&bmp180_chip_info },
{ "bmp280", (kernel_ulong_t)&bmp280_chip_info },
diff --git a/drivers/iio/pressure/dlhl60d.c b/drivers/iio/pressure/dlhl60d.c
index 28c8269..0bba4c5 100644
--- a/drivers/iio/pressure/dlhl60d.c
+++ b/drivers/iio/pressure/dlhl60d.c
@@ -250,18 +250,17 @@ static irqreturn_t dlh_trigger_handler(int irq, void *private)
struct dlh_state *st = iio_priv(indio_dev);
int ret;
unsigned int chn, i = 0;
- __be32 tmp_buf[2];
+ __be32 tmp_buf[2] = { };
ret = dlh_start_capture_and_read(st);
if (ret)
goto out;
for_each_set_bit(chn, indio_dev->active_scan_mask,
- indio_dev->masklength) {
- memcpy(tmp_buf + i,
+ indio_dev->masklength) {
+ memcpy(&tmp_buf[i++],
&st->rx_buf[1] + chn * DLH_NUM_DATA_BYTES,
DLH_NUM_DATA_BYTES);
- i++;
}
iio_push_to_buffers(indio_dev, tmp_buf);
diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index 8243496..ce9c5ba 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -401,6 +401,10 @@ static void bnxt_re_create_fence_wqe(struct bnxt_re_pd *pd)
struct bnxt_re_fence_data *fence = &pd->fence;
struct ib_mr *ib_mr = &fence->mr->ib_mr;
struct bnxt_qplib_swqe *wqe = &fence->bind_wqe;
+ struct bnxt_re_dev *rdev = pd->rdev;
+
+ if (bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx))
+ return;
memset(wqe, 0, sizeof(*wqe));
wqe->type = BNXT_QPLIB_SWQE_TYPE_BIND_MW;
@@ -455,6 +459,9 @@ static void bnxt_re_destroy_fence_mr(struct bnxt_re_pd *pd)
struct device *dev = &rdev->en_dev->pdev->dev;
struct bnxt_re_mr *mr = fence->mr;
+ if (bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx))
+ return;
+
if (fence->mw) {
bnxt_re_dealloc_mw(fence->mw);
fence->mw = NULL;
@@ -486,6 +493,9 @@ static int bnxt_re_create_fence_mr(struct bnxt_re_pd *pd)
struct ib_mw *mw;
int rc;
+ if (bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx))
+ return 0;
+
dma_addr = dma_map_single(dev, fence->va, BNXT_RE_FENCE_BYTES,
DMA_BIDIRECTIONAL);
rc = dma_mapping_error(dev, dma_addr);
@@ -1817,7 +1827,7 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr,
switch (srq_attr_mask) {
case IB_SRQ_MAX_WR:
/* SRQ resize is not supported */
- break;
+ return -EINVAL;
case IB_SRQ_LIMIT:
/* Change the SRQ threshold */
if (srq_attr->srq_limit > srq->qplib_srq.max_wqe)
@@ -1832,13 +1842,12 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr,
/* On success, update the shadow */
srq->srq_limit = srq_attr->srq_limit;
/* No need to Build and send response back to udata */
- break;
+ return 0;
default:
ibdev_err(&rdev->ibdev,
"Unsupported srq_attr_mask 0x%x", srq_attr_mask);
return -EINVAL;
}
- return 0;
}
int bnxt_re_query_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr)
@@ -2556,11 +2565,6 @@ static int bnxt_re_build_inv_wqe(const struct ib_send_wr *wr,
wqe->type = BNXT_QPLIB_SWQE_TYPE_LOCAL_INV;
wqe->local_inv.inv_l_key = wr->ex.invalidate_rkey;
- /* Need unconditional fence for local invalidate
- * opcode to work as expected.
- */
- wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_UC_FENCE;
-
if (wr->send_flags & IB_SEND_SIGNALED)
wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_SIGNAL_COMP;
if (wr->send_flags & IB_SEND_SOLICITED)
@@ -2583,12 +2587,6 @@ static int bnxt_re_build_reg_wqe(const struct ib_reg_wr *wr,
wqe->frmr.levels = qplib_frpl->hwq.level;
wqe->type = BNXT_QPLIB_SWQE_TYPE_REG_MR;
- /* Need unconditional fence for reg_mr
- * opcode to function as expected.
- */
-
- wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_UC_FENCE;
-
if (wr->wr.send_flags & IB_SEND_SIGNALED)
wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_SIGNAL_COMP;
@@ -2719,6 +2717,18 @@ static int bnxt_re_post_send_shadow_qp(struct bnxt_re_dev *rdev,
return rc;
}
+static void bnxt_re_legacy_set_uc_fence(struct bnxt_qplib_swqe *wqe)
+{
+ /* Need unconditional fence for non-wire memory opcode
+ * to work as expected.
+ */
+ if (wqe->type == BNXT_QPLIB_SWQE_TYPE_LOCAL_INV ||
+ wqe->type == BNXT_QPLIB_SWQE_TYPE_FAST_REG_MR ||
+ wqe->type == BNXT_QPLIB_SWQE_TYPE_REG_MR ||
+ wqe->type == BNXT_QPLIB_SWQE_TYPE_BIND_MW)
+ wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_UC_FENCE;
+}
+
int bnxt_re_post_send(struct ib_qp *ib_qp, const struct ib_send_wr *wr,
const struct ib_send_wr **bad_wr)
{
@@ -2798,8 +2808,11 @@ int bnxt_re_post_send(struct ib_qp *ib_qp, const struct ib_send_wr *wr,
rc = -EINVAL;
goto bad;
}
- if (!rc)
+ if (!rc) {
+ if (!bnxt_qplib_is_chip_gen_p5_p7(qp->rdev->chip_ctx))
+ bnxt_re_legacy_set_uc_fence(&wqe);
rc = bnxt_qplib_post_send(&qp->qplib_qp, &wqe);
+ }
bad:
if (rc) {
ibdev_err(&qp->rdev->ibdev,
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index f022c922..54b4d2f 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -280,9 +280,6 @@ static void bnxt_re_set_resource_limits(struct bnxt_re_dev *rdev)
static void bnxt_re_vf_res_config(struct bnxt_re_dev *rdev)
{
-
- if (test_bit(BNXT_RE_FLAG_ERR_DEVICE_DETACHED, &rdev->flags))
- return;
rdev->num_vfs = pci_sriov_get_totalvfs(rdev->en_dev->pdev);
if (!bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx)) {
bnxt_re_set_resource_limits(rdev);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
index c98e04f..439d0c7 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c
@@ -744,7 +744,8 @@ int bnxt_qplib_query_srq(struct bnxt_qplib_res *res,
bnxt_qplib_fill_cmdqmsg(&msg, &req, &resp, &sbuf, sizeof(req),
sizeof(resp), 0);
rc = bnxt_qplib_rcfw_send_message(rcfw, &msg);
- srq->threshold = le16_to_cpu(sb->srq_limit);
+ if (!rc)
+ srq->threshold = le16_to_cpu(sb->srq_limit);
dma_free_coherent(&rcfw->pdev->dev, sbuf.size,
sbuf.sb, sbuf.dma_addr);
diff --git a/drivers/infiniband/hw/hfi1/pio.c b/drivers/infiniband/hw/hfi1/pio.c
index 68c621f..5a91cbd 100644
--- a/drivers/infiniband/hw/hfi1/pio.c
+++ b/drivers/infiniband/hw/hfi1/pio.c
@@ -2086,7 +2086,7 @@ int init_credit_return(struct hfi1_devdata *dd)
"Unable to allocate credit return DMA range for NUMA %d\n",
i);
ret = -ENOMEM;
- goto done;
+ goto free_cr_base;
}
}
set_dev_node(&dd->pcidev->dev, dd->node);
@@ -2094,6 +2094,10 @@ int init_credit_return(struct hfi1_devdata *dd)
ret = 0;
done:
return ret;
+
+free_cr_base:
+ free_credit_return(dd);
+ goto done;
}
void free_credit_return(struct hfi1_devdata *dd)
diff --git a/drivers/infiniband/hw/hfi1/sdma.c b/drivers/infiniband/hw/hfi1/sdma.c
index 6e5ac20..b67d23b 100644
--- a/drivers/infiniband/hw/hfi1/sdma.c
+++ b/drivers/infiniband/hw/hfi1/sdma.c
@@ -3158,7 +3158,7 @@ int _pad_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx)
{
int rval = 0;
- if ((unlikely(tx->num_desc + 1 == tx->desc_limit))) {
+ if ((unlikely(tx->num_desc == tx->desc_limit))) {
rval = _extend_sdma_tx_descs(dd, tx);
if (rval) {
__sdma_txclean(dd, tx);
diff --git a/drivers/infiniband/hw/irdma/defs.h b/drivers/infiniband/hw/irdma/defs.h
index 8fb752f..2cb4b96 100644
--- a/drivers/infiniband/hw/irdma/defs.h
+++ b/drivers/infiniband/hw/irdma/defs.h
@@ -346,6 +346,7 @@ enum irdma_cqp_op_type {
#define IRDMA_AE_LLP_TOO_MANY_KEEPALIVE_RETRIES 0x050b
#define IRDMA_AE_LLP_DOUBT_REACHABILITY 0x050c
#define IRDMA_AE_LLP_CONNECTION_ESTABLISHED 0x050e
+#define IRDMA_AE_LLP_TOO_MANY_RNRS 0x050f
#define IRDMA_AE_RESOURCE_EXHAUSTION 0x0520
#define IRDMA_AE_RESET_SENT 0x0601
#define IRDMA_AE_TERMINATE_SENT 0x0602
diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
index bd4b2b8..ad50b77 100644
--- a/drivers/infiniband/hw/irdma/hw.c
+++ b/drivers/infiniband/hw/irdma/hw.c
@@ -387,6 +387,7 @@ static void irdma_process_aeq(struct irdma_pci_f *rf)
case IRDMA_AE_LLP_TOO_MANY_RETRIES:
case IRDMA_AE_LCE_QP_CATASTROPHIC:
case IRDMA_AE_LCE_FUNCTION_CATASTROPHIC:
+ case IRDMA_AE_LLP_TOO_MANY_RNRS:
case IRDMA_AE_LCE_CQ_CATASTROPHIC:
case IRDMA_AE_UDA_XMIT_DGRAM_TOO_LONG:
default:
@@ -570,6 +571,13 @@ static void irdma_destroy_irq(struct irdma_pci_f *rf,
dev->irq_ops->irdma_dis_irq(dev, msix_vec->idx);
irq_update_affinity_hint(msix_vec->irq, NULL);
free_irq(msix_vec->irq, dev_id);
+ if (rf == dev_id) {
+ tasklet_kill(&rf->dpc_tasklet);
+ } else {
+ struct irdma_ceq *iwceq = (struct irdma_ceq *)dev_id;
+
+ tasklet_kill(&iwceq->dpc_tasklet);
+ }
}
/**
diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index b5eb8d4..0b046c0 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -839,7 +839,9 @@ static int irdma_validate_qp_attrs(struct ib_qp_init_attr *init_attr,
if (init_attr->cap.max_inline_data > uk_attrs->max_hw_inline ||
init_attr->cap.max_send_sge > uk_attrs->max_hw_wq_frags ||
- init_attr->cap.max_recv_sge > uk_attrs->max_hw_wq_frags)
+ init_attr->cap.max_recv_sge > uk_attrs->max_hw_wq_frags ||
+ init_attr->cap.max_send_wr > uk_attrs->max_hw_wq_quanta ||
+ init_attr->cap.max_recv_wr > uk_attrs->max_hw_rq_quanta)
return -EINVAL;
if (rdma_protocol_roce(&iwdev->ibdev, 1)) {
@@ -2184,9 +2186,8 @@ static int irdma_create_cq(struct ib_cq *ibcq,
info.cq_base_pa = iwcq->kmem.pa;
}
- if (dev->hw_attrs.uk_attrs.hw_rev >= IRDMA_GEN_2)
- info.shadow_read_threshold = min(info.cq_uk_init_info.cq_size / 2,
- (u32)IRDMA_MAX_CQ_READ_THRESH);
+ info.shadow_read_threshold = min(info.cq_uk_init_info.cq_size / 2,
+ (u32)IRDMA_MAX_CQ_READ_THRESH);
if (irdma_sc_cq_init(cq, &info)) {
ibdev_dbg(&iwdev->ibdev, "VERBS: init cq fail\n");
diff --git a/drivers/infiniband/hw/mlx5/cong.c b/drivers/infiniband/hw/mlx5/cong.c
index f875313..a78a067 100644
--- a/drivers/infiniband/hw/mlx5/cong.c
+++ b/drivers/infiniband/hw/mlx5/cong.c
@@ -458,6 +458,12 @@ void mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u32 port_num)
dbg_cc_params->root = debugfs_create_dir("cc_params", mlx5_debugfs_get_dev_root(mdev));
for (i = 0; i < MLX5_IB_DBG_CC_MAX; i++) {
+ if ((i == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP_VALID ||
+ i == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP))
+ if (!MLX5_CAP_GEN(mdev, roce) ||
+ !MLX5_CAP_ROCE(mdev, roce_cc_general))
+ continue;
+
dbg_cc_params->params[i].offset = i;
dbg_cc_params->params[i].dev = dev;
dbg_cc_params->params[i].port_num = port_num;
diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index 869369c..253fea3 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -2949,7 +2949,7 @@ DECLARE_UVERBS_NAMED_METHOD(
MLX5_IB_METHOD_DEVX_OBJ_MODIFY,
UVERBS_ATTR_IDR(MLX5_IB_ATTR_DEVX_OBJ_MODIFY_HANDLE,
UVERBS_IDR_ANY_OBJECT,
- UVERBS_ACCESS_WRITE,
+ UVERBS_ACCESS_READ,
UA_MANDATORY),
UVERBS_ATTR_PTR_IN(
MLX5_IB_ATTR_DEVX_OBJ_MODIFY_CMD_IN,
diff --git a/drivers/infiniband/hw/mlx5/wr.c b/drivers/infiniband/hw/mlx5/wr.c
index df1d1b0..9947feb 100644
--- a/drivers/infiniband/hw/mlx5/wr.c
+++ b/drivers/infiniband/hw/mlx5/wr.c
@@ -78,7 +78,7 @@ static void set_eth_seg(const struct ib_send_wr *wr, struct mlx5_ib_qp *qp,
*/
copysz = min_t(u64, *cur_edge - (void *)eseg->inline_hdr.start,
left);
- memcpy(eseg->inline_hdr.start, pdata, copysz);
+ memcpy(eseg->inline_hdr.data, pdata, copysz);
stride = ALIGN(sizeof(struct mlx5_wqe_eth_seg) -
sizeof(eseg->inline_hdr.start) + copysz, 16);
*size += stride / 16;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 7887a67..f118ce0 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1879,8 +1879,17 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
/* RQ - read access only (0) */
rc = qedr_init_user_queue(udata, dev, &qp->urq, ureq.rq_addr,
ureq.rq_len, true, 0, alloc_and_init);
- if (rc)
+ if (rc) {
+ ib_umem_release(qp->usq.umem);
+ qp->usq.umem = NULL;
+ if (rdma_protocol_roce(&dev->ibdev, 1)) {
+ qedr_free_pbl(dev, &qp->usq.pbl_info,
+ qp->usq.pbl_tbl);
+ } else {
+ kfree(qp->usq.pbl_tbl);
+ }
return rc;
+ }
}
memset(&in_params, 0, sizeof(in_params));
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 58f70cf..040234c 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -79,12 +79,16 @@ module_param(srpt_srq_size, int, 0444);
MODULE_PARM_DESC(srpt_srq_size,
"Shared receive queue (SRQ) size.");
+static int srpt_set_u64_x(const char *buffer, const struct kernel_param *kp)
+{
+ return kstrtou64(buffer, 16, (u64 *)kp->arg);
+}
static int srpt_get_u64_x(char *buffer, const struct kernel_param *kp)
{
return sprintf(buffer, "0x%016llx\n", *(u64 *)kp->arg);
}
-module_param_call(srpt_service_guid, NULL, srpt_get_u64_x, &srpt_service_guid,
- 0444);
+module_param_call(srpt_service_guid, srpt_set_u64_x, srpt_get_u64_x,
+ &srpt_service_guid, 0444);
MODULE_PARM_DESC(srpt_service_guid,
"Using this value for ioc_guid, id_ext, and cm_listen_id instead of using the node_guid of the first HCA.");
@@ -210,10 +214,12 @@ static const char *get_ch_state_name(enum rdma_ch_state s)
/**
* srpt_qp_event - QP event callback function
* @event: Description of the event that occurred.
- * @ch: SRPT RDMA channel.
+ * @ptr: SRPT RDMA channel.
*/
-static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch)
+static void srpt_qp_event(struct ib_event *event, void *ptr)
{
+ struct srpt_rdma_ch *ch = ptr;
+
pr_debug("QP event %d on ch=%p sess_name=%s-%d state=%s\n",
event->event, ch, ch->sess_name, ch->qp->qp_num,
get_ch_state_name(ch->state));
@@ -1807,8 +1813,7 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
ch->cq_size = ch->rq_size + sq_size;
qp_init->qp_context = (void *)ch;
- qp_init->event_handler
- = (void(*)(struct ib_event *, void*))srpt_qp_event;
+ qp_init->event_handler = srpt_qp_event;
qp_init->send_cq = ch->cq;
qp_init->recv_cq = ch->cq;
qp_init->sq_sig_type = IB_SIGNAL_REQ_WR;
diff --git a/drivers/input/joystick/xpad.c b/drivers/input/joystick/xpad.c
index 7c4b2a5..14c828a 100644
--- a/drivers/input/joystick/xpad.c
+++ b/drivers/input/joystick/xpad.c
@@ -130,7 +130,12 @@ static const struct xpad_device {
{ 0x0079, 0x18d4, "GPD Win 2 X-Box Controller", 0, XTYPE_XBOX360 },
{ 0x03eb, 0xff01, "Wooting One (Legacy)", 0, XTYPE_XBOX360 },
{ 0x03eb, 0xff02, "Wooting Two (Legacy)", 0, XTYPE_XBOX360 },
+ { 0x03f0, 0x038D, "HyperX Clutch", 0, XTYPE_XBOX360 }, /* wired */
+ { 0x03f0, 0x048D, "HyperX Clutch", 0, XTYPE_XBOX360 }, /* wireless */
{ 0x03f0, 0x0495, "HyperX Clutch Gladiate", 0, XTYPE_XBOXONE },
+ { 0x03f0, 0x07A0, "HyperX Clutch Gladiate RGB", 0, XTYPE_XBOXONE },
+ { 0x03f0, 0x08B6, "HyperX Clutch Gladiate", 0, XTYPE_XBOXONE }, /* v2 */
+ { 0x03f0, 0x09B4, "HyperX Clutch Tanto", 0, XTYPE_XBOXONE },
{ 0x044f, 0x0f00, "Thrustmaster Wheel", 0, XTYPE_XBOX },
{ 0x044f, 0x0f03, "Thrustmaster Wheel", 0, XTYPE_XBOX },
{ 0x044f, 0x0f07, "Thrustmaster, Inc. Controller", 0, XTYPE_XBOX },
@@ -463,6 +468,7 @@ static const struct usb_device_id xpad_table[] = {
{ USB_INTERFACE_INFO('X', 'B', 0) }, /* Xbox USB-IF not-approved class */
XPAD_XBOX360_VENDOR(0x0079), /* GPD Win 2 controller */
XPAD_XBOX360_VENDOR(0x03eb), /* Wooting Keyboards (Legacy) */
+ XPAD_XBOX360_VENDOR(0x03f0), /* HP HyperX Xbox 360 controllers */
XPAD_XBOXONE_VENDOR(0x03f0), /* HP HyperX Xbox One controllers */
XPAD_XBOX360_VENDOR(0x044f), /* Thrustmaster Xbox 360 controllers */
XPAD_XBOX360_VENDOR(0x045e), /* Microsoft Xbox 360 controllers */
diff --git a/drivers/input/keyboard/gpio_keys_polled.c b/drivers/input/keyboard/gpio_keys_polled.c
index ba00ecf..b41fd12 100644
--- a/drivers/input/keyboard/gpio_keys_polled.c
+++ b/drivers/input/keyboard/gpio_keys_polled.c
@@ -315,12 +315,10 @@ static int gpio_keys_polled_probe(struct platform_device *pdev)
error = devm_gpio_request_one(dev, button->gpio,
flags, button->desc ? : DRV_NAME);
- if (error) {
- dev_err(dev,
- "unable to claim gpio %u, err=%d\n",
- button->gpio, error);
- return error;
- }
+ if (error)
+ return dev_err_probe(dev, error,
+ "unable to claim gpio %u\n",
+ button->gpio);
bdata->gpiod = gpio_to_desc(button->gpio);
if (!bdata->gpiod) {
diff --git a/drivers/input/mouse/bcm5974.c b/drivers/input/mouse/bcm5974.c
index 953992b..ca15061 100644
--- a/drivers/input/mouse/bcm5974.c
+++ b/drivers/input/mouse/bcm5974.c
@@ -19,7 +19,6 @@
* Copyright (C) 2006 Nicolas Boichat (nicolas@boichat.ch)
*/
-#include "linux/usb.h"
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/slab.h>
@@ -194,8 +193,6 @@ enum tp_type {
/* list of device capability bits */
#define HAS_INTEGRATED_BUTTON 1
-/* maximum number of supported endpoints (currently trackpad and button) */
-#define MAX_ENDPOINTS 2
/* trackpad finger data block size */
#define FSIZE_TYPE1 (14 * sizeof(__le16))
@@ -894,18 +891,6 @@ static int bcm5974_resume(struct usb_interface *iface)
return error;
}
-static bool bcm5974_check_endpoints(struct usb_interface *iface,
- const struct bcm5974_config *cfg)
-{
- u8 ep_addr[MAX_ENDPOINTS + 1] = {0};
-
- ep_addr[0] = cfg->tp_ep;
- if (cfg->tp_type == TYPE1)
- ep_addr[1] = cfg->bt_ep;
-
- return usb_check_int_endpoints(iface, ep_addr);
-}
-
static int bcm5974_probe(struct usb_interface *iface,
const struct usb_device_id *id)
{
@@ -918,11 +903,6 @@ static int bcm5974_probe(struct usb_interface *iface,
/* find the product index */
cfg = bcm5974_get_config(udev);
- if (!bcm5974_check_endpoints(iface, cfg)) {
- dev_err(&iface->dev, "Unexpected non-int endpoint\n");
- return -ENODEV;
- }
-
/* allocate memory for our device state and initialize it */
dev = kzalloc(sizeof(struct bcm5974), GFP_KERNEL);
input_dev = input_allocate_device();
diff --git a/drivers/input/rmi4/rmi_driver.c b/drivers/input/rmi4/rmi_driver.c
index 258d5fe..42eaebb 100644
--- a/drivers/input/rmi4/rmi_driver.c
+++ b/drivers/input/rmi4/rmi_driver.c
@@ -978,12 +978,12 @@ static int rmi_driver_remove(struct device *dev)
rmi_disable_irq(rmi_dev, false);
- irq_domain_remove(data->irqdomain);
- data->irqdomain = NULL;
-
rmi_f34_remove_sysfs(rmi_dev);
rmi_free_function_list(rmi_dev);
+ irq_domain_remove(data->irqdomain);
+ data->irqdomain = NULL;
+
return 0;
}
diff --git a/drivers/interconnect/qcom/sc8180x.c b/drivers/interconnect/qcom/sc8180x.c
index 20331e1..03d6267 100644
--- a/drivers/interconnect/qcom/sc8180x.c
+++ b/drivers/interconnect/qcom/sc8180x.c
@@ -1372,6 +1372,7 @@ static struct qcom_icc_bcm bcm_mm0 = {
static struct qcom_icc_bcm bcm_co0 = {
.name = "CO0",
+ .keepalive = true,
.num_nodes = 1,
.nodes = { &slv_qns_cdsp_mem_noc }
};
diff --git a/drivers/interconnect/qcom/sm8550.c b/drivers/interconnect/qcom/sm8550.c
index 629faa4..fc22cec 100644
--- a/drivers/interconnect/qcom/sm8550.c
+++ b/drivers/interconnect/qcom/sm8550.c
@@ -2223,6 +2223,7 @@ static struct platform_driver qnoc_driver = {
.driver = {
.name = "qnoc-sm8550",
.of_match_table = qnoc_of_match,
+ .sync_state = icc_sync_state,
},
};
diff --git a/drivers/interconnect/qcom/sm8650.c b/drivers/interconnect/qcom/sm8650.c
index b83de54..b962e6c 100644
--- a/drivers/interconnect/qcom/sm8650.c
+++ b/drivers/interconnect/qcom/sm8650.c
@@ -1160,7 +1160,7 @@ static struct qcom_icc_node qns_gemnoc_sf = {
static struct qcom_icc_bcm bcm_acv = {
.name = "ACV",
- .enable_mask = BIT(3),
+ .enable_mask = BIT(0),
.num_nodes = 1,
.nodes = { &ebi },
};
diff --git a/drivers/interconnect/qcom/x1e80100.c b/drivers/interconnect/qcom/x1e80100.c
index d19501d..cbaf4f9 100644
--- a/drivers/interconnect/qcom/x1e80100.c
+++ b/drivers/interconnect/qcom/x1e80100.c
@@ -1586,6 +1586,7 @@ static struct qcom_icc_node qns_pcie_south_gem_noc_pcie = {
static struct qcom_icc_bcm bcm_acv = {
.name = "ACV",
+ .enable_mask = BIT(3),
.num_nodes = 1,
.nodes = { &ebi },
};
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 0572212..4a27fbd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -292,10 +292,8 @@ arm_smmu_mmu_notifier_get(struct arm_smmu_domain *smmu_domain,
struct mm_struct *mm)
{
int ret;
- unsigned long flags;
struct arm_smmu_ctx_desc *cd;
struct arm_smmu_mmu_notifier *smmu_mn;
- struct arm_smmu_master *master;
list_for_each_entry(smmu_mn, &smmu_domain->mmu_notifiers, list) {
if (smmu_mn->mn.mm == mm) {
@@ -325,28 +323,9 @@ arm_smmu_mmu_notifier_get(struct arm_smmu_domain *smmu_domain,
goto err_free_cd;
}
- spin_lock_irqsave(&smmu_domain->devices_lock, flags);
- list_for_each_entry(master, &smmu_domain->devices, domain_head) {
- ret = arm_smmu_write_ctx_desc(master, mm_get_enqcmd_pasid(mm),
- cd);
- if (ret) {
- list_for_each_entry_from_reverse(
- master, &smmu_domain->devices, domain_head)
- arm_smmu_write_ctx_desc(
- master, mm_get_enqcmd_pasid(mm), NULL);
- break;
- }
- }
- spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- if (ret)
- goto err_put_notifier;
-
list_add(&smmu_mn->list, &smmu_domain->mmu_notifiers);
return smmu_mn;
-err_put_notifier:
- /* Frees smmu_mn */
- mmu_notifier_put(&smmu_mn->mn);
err_free_cd:
arm_smmu_free_shared_cd(cd);
return ERR_PTR(ret);
@@ -363,9 +342,6 @@ static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn)
list_del(&smmu_mn->list);
- arm_smmu_update_ctx_desc_devices(smmu_domain, mm_get_enqcmd_pasid(mm),
- NULL);
-
/*
* If we went through clear(), we've already invalidated, and no
* new TLB entry can have been formed.
@@ -381,7 +357,8 @@ static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn)
arm_smmu_free_shared_cd(cd);
}
-static int __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
+static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid,
+ struct mm_struct *mm)
{
int ret;
struct arm_smmu_bond *bond;
@@ -404,9 +381,15 @@ static int __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
goto err_free_bond;
}
+ ret = arm_smmu_write_ctx_desc(master, pasid, bond->smmu_mn->cd);
+ if (ret)
+ goto err_put_notifier;
+
list_add(&bond->list, &master->bonds);
return 0;
+err_put_notifier:
+ arm_smmu_mmu_notifier_put(bond->smmu_mn);
err_free_bond:
kfree(bond);
return ret;
@@ -568,6 +551,9 @@ void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain,
struct arm_smmu_master *master = dev_iommu_priv_get(dev);
mutex_lock(&sva_lock);
+
+ arm_smmu_write_ctx_desc(master, id, NULL);
+
list_for_each_entry(t, &master->bonds, list) {
if (t->mm == mm) {
bond = t;
@@ -590,7 +576,7 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
struct mm_struct *mm = domain->mm;
mutex_lock(&sva_lock);
- ret = __arm_smmu_sva_bind(dev, mm);
+ ret = __arm_smmu_sva_bind(dev, id, mm);
mutex_unlock(&sva_lock);
return ret;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 68b6bc5..6317aaf 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -859,10 +859,14 @@ static void arm_smmu_destroy_domain_context(struct arm_smmu_domain *smmu_domain)
arm_smmu_rpm_put(smmu);
}
-static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev)
+static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
{
struct arm_smmu_domain *smmu_domain;
+ if (type != IOMMU_DOMAIN_UNMANAGED) {
+ if (using_legacy_binding || type != IOMMU_DOMAIN_DMA)
+ return NULL;
+ }
/*
* Allocate the domain and initialise some of its data structures.
* We can't really do anything meaningful until we've added a
@@ -875,15 +879,6 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev)
mutex_init(&smmu_domain->init_mutex);
spin_lock_init(&smmu_domain->cb_lock);
- if (dev) {
- struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
-
- if (arm_smmu_init_domain_context(smmu_domain, cfg->smmu, dev)) {
- kfree(smmu_domain);
- return NULL;
- }
- }
-
return &smmu_domain->domain;
}
@@ -1600,7 +1595,7 @@ static struct iommu_ops arm_smmu_ops = {
.identity_domain = &arm_smmu_identity_domain,
.blocked_domain = &arm_smmu_blocked_domain,
.capable = arm_smmu_capable,
- .domain_alloc_paging = arm_smmu_domain_alloc_paging,
+ .domain_alloc = arm_smmu_domain_alloc,
.probe_device = arm_smmu_probe_device,
.release_device = arm_smmu_release_device,
.probe_finalize = arm_smmu_probe_finalize,
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6fb5f6f..11652e0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -396,8 +396,6 @@ static int domain_update_device_node(struct dmar_domain *domain)
return nid;
}
-static void domain_update_iotlb(struct dmar_domain *domain);
-
/* Return the super pagesize bitmap if supported. */
static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
{
@@ -1218,7 +1216,7 @@ domain_lookup_dev_info(struct dmar_domain *domain,
return NULL;
}
-static void domain_update_iotlb(struct dmar_domain *domain)
+void domain_update_iotlb(struct dmar_domain *domain)
{
struct dev_pasid_info *dev_pasid;
struct device_domain_info *info;
@@ -1368,6 +1366,46 @@ static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
spin_unlock_irqrestore(&domain->lock, flags);
}
+static void __iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
+ unsigned long pfn, unsigned int pages,
+ int ih)
+{
+ unsigned int aligned_pages = __roundup_pow_of_two(pages);
+ unsigned long bitmask = aligned_pages - 1;
+ unsigned int mask = ilog2(aligned_pages);
+ u64 addr = (u64)pfn << VTD_PAGE_SHIFT;
+
+ /*
+ * PSI masks the low order bits of the base address. If the
+ * address isn't aligned to the mask, then compute a mask value
+ * needed to ensure the target range is flushed.
+ */
+ if (unlikely(bitmask & pfn)) {
+ unsigned long end_pfn = pfn + pages - 1, shared_bits;
+
+ /*
+ * Since end_pfn <= pfn + bitmask, the only way bits
+ * higher than bitmask can differ in pfn and end_pfn is
+ * by carrying. This means after masking out bitmask,
+ * high bits starting with the first set bit in
+ * shared_bits are all equal in both pfn and end_pfn.
+ */
+ shared_bits = ~(pfn ^ end_pfn) & ~bitmask;
+ mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG;
+ }
+
+ /*
+ * Fallback to domain selective flush if no PSI support or
+ * the size is too big.
+ */
+ if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
+ iommu->flush.flush_iotlb(iommu, did, 0, 0,
+ DMA_TLB_DSI_FLUSH);
+ else
+ iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
+ DMA_TLB_PSI_FLUSH);
+}
+
static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
struct dmar_domain *domain,
unsigned long pfn, unsigned int pages,
@@ -1384,42 +1422,10 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
if (ih)
ih = 1 << 6;
- if (domain->use_first_level) {
+ if (domain->use_first_level)
domain_flush_pasid_iotlb(iommu, domain, addr, pages, ih);
- } else {
- unsigned long bitmask = aligned_pages - 1;
-
- /*
- * PSI masks the low order bits of the base address. If the
- * address isn't aligned to the mask, then compute a mask value
- * needed to ensure the target range is flushed.
- */
- if (unlikely(bitmask & pfn)) {
- unsigned long end_pfn = pfn + pages - 1, shared_bits;
-
- /*
- * Since end_pfn <= pfn + bitmask, the only way bits
- * higher than bitmask can differ in pfn and end_pfn is
- * by carrying. This means after masking out bitmask,
- * high bits starting with the first set bit in
- * shared_bits are all equal in both pfn and end_pfn.
- */
- shared_bits = ~(pfn ^ end_pfn) & ~bitmask;
- mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG;
- }
-
- /*
- * Fallback to domain selective flush if no PSI support or
- * the size is too big.
- */
- if (!cap_pgsel_inv(iommu->cap) ||
- mask > cap_max_amask_val(iommu->cap))
- iommu->flush.flush_iotlb(iommu, did, 0, 0,
- DMA_TLB_DSI_FLUSH);
- else
- iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
- DMA_TLB_PSI_FLUSH);
- }
+ else
+ __iommu_flush_iotlb_psi(iommu, did, pfn, pages, ih);
/*
* In caching mode, changes of pages from non-present to present require
@@ -1443,6 +1449,46 @@ static void __mapping_notify_one(struct intel_iommu *iommu, struct dmar_domain *
iommu_flush_write_buffer(iommu);
}
+/*
+ * Flush the relevant caches in nested translation if the domain
+ * also serves as a parent
+ */
+static void parent_domain_flush(struct dmar_domain *domain,
+ unsigned long pfn,
+ unsigned long pages, int ih)
+{
+ struct dmar_domain *s1_domain;
+
+ spin_lock(&domain->s1_lock);
+ list_for_each_entry(s1_domain, &domain->s1_domains, s2_link) {
+ struct device_domain_info *device_info;
+ struct iommu_domain_info *info;
+ unsigned long flags;
+ unsigned long i;
+
+ xa_for_each(&s1_domain->iommu_array, i, info)
+ __iommu_flush_iotlb_psi(info->iommu, info->did,
+ pfn, pages, ih);
+
+ if (!s1_domain->has_iotlb_device)
+ continue;
+
+ spin_lock_irqsave(&s1_domain->lock, flags);
+ list_for_each_entry(device_info, &s1_domain->devices, link)
+ /*
+ * Address translation cache in device side caches the
+ * result of nested translation. There is no easy way
+ * to identify the exact set of nested translations
+ * affected by a change in S2. So just flush the entire
+ * device cache.
+ */
+ __iommu_flush_dev_iotlb(device_info, 0,
+ MAX_AGAW_PFN_WIDTH);
+ spin_unlock_irqrestore(&s1_domain->lock, flags);
+ }
+ spin_unlock(&domain->s1_lock);
+}
+
static void intel_flush_iotlb_all(struct iommu_domain *domain)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
@@ -1462,6 +1508,9 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
if (!cap_caching_mode(iommu->cap))
iommu_flush_dev_iotlb(dmar_domain, 0, MAX_AGAW_PFN_WIDTH);
}
+
+ if (dmar_domain->nested_parent)
+ parent_domain_flush(dmar_domain, 0, -1, 0);
}
static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
@@ -1985,6 +2034,9 @@ static void switch_to_super_page(struct dmar_domain *domain,
iommu_flush_iotlb_psi(info->iommu, domain,
start_pfn, lvl_pages,
0, 0);
+ if (domain->nested_parent)
+ parent_domain_flush(domain, start_pfn,
+ lvl_pages, 0);
}
pte++;
@@ -3883,6 +3935,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
bool nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
struct intel_iommu *iommu = info->iommu;
+ struct dmar_domain *dmar_domain;
struct iommu_domain *domain;
/* Must be NESTING domain */
@@ -3908,11 +3961,16 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
if (!domain)
return ERR_PTR(-ENOMEM);
- if (nested_parent)
- to_dmar_domain(domain)->nested_parent = true;
+ dmar_domain = to_dmar_domain(domain);
+
+ if (nested_parent) {
+ dmar_domain->nested_parent = true;
+ INIT_LIST_HEAD(&dmar_domain->s1_domains);
+ spin_lock_init(&dmar_domain->s1_lock);
+ }
if (dirty_tracking) {
- if (to_dmar_domain(domain)->use_first_level) {
+ if (dmar_domain->use_first_level) {
iommu_domain_free(domain);
return ERR_PTR(-EOPNOTSUPP);
}
@@ -3924,8 +3982,12 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
static void intel_iommu_domain_free(struct iommu_domain *domain)
{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+
+ WARN_ON(dmar_domain->nested_parent &&
+ !list_empty(&dmar_domain->s1_domains));
if (domain != &si_domain->domain)
- domain_exit(to_dmar_domain(domain));
+ domain_exit(dmar_domain);
}
int prepare_domain_attach_device(struct iommu_domain *domain,
@@ -4107,6 +4169,9 @@ static void intel_iommu_tlb_sync(struct iommu_domain *domain,
start_pfn, nrpages,
list_empty(&gather->freelist), 0);
+ if (dmar_domain->nested_parent)
+ parent_domain_flush(dmar_domain, start_pfn, nrpages,
+ list_empty(&gather->freelist));
put_pages_list(&gather->freelist);
}
@@ -4664,21 +4729,70 @@ static void *intel_iommu_hw_info(struct device *dev, u32 *length, u32 *type)
return vtd;
}
+/*
+ * Set dirty tracking for the device list of a domain. The caller must
+ * hold the domain->lock when calling it.
+ */
+static int device_set_dirty_tracking(struct list_head *devices, bool enable)
+{
+ struct device_domain_info *info;
+ int ret = 0;
+
+ list_for_each_entry(info, devices, link) {
+ ret = intel_pasid_setup_dirty_tracking(info->iommu, info->dev,
+ IOMMU_NO_PASID, enable);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+static int parent_domain_set_dirty_tracking(struct dmar_domain *domain,
+ bool enable)
+{
+ struct dmar_domain *s1_domain;
+ unsigned long flags;
+ int ret;
+
+ spin_lock(&domain->s1_lock);
+ list_for_each_entry(s1_domain, &domain->s1_domains, s2_link) {
+ spin_lock_irqsave(&s1_domain->lock, flags);
+ ret = device_set_dirty_tracking(&s1_domain->devices, enable);
+ spin_unlock_irqrestore(&s1_domain->lock, flags);
+ if (ret)
+ goto err_unwind;
+ }
+ spin_unlock(&domain->s1_lock);
+ return 0;
+
+err_unwind:
+ list_for_each_entry(s1_domain, &domain->s1_domains, s2_link) {
+ spin_lock_irqsave(&s1_domain->lock, flags);
+ device_set_dirty_tracking(&s1_domain->devices,
+ domain->dirty_tracking);
+ spin_unlock_irqrestore(&s1_domain->lock, flags);
+ }
+ spin_unlock(&domain->s1_lock);
+ return ret;
+}
+
static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
bool enable)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
- struct device_domain_info *info;
int ret;
spin_lock(&dmar_domain->lock);
if (dmar_domain->dirty_tracking == enable)
goto out_unlock;
- list_for_each_entry(info, &dmar_domain->devices, link) {
- ret = intel_pasid_setup_dirty_tracking(info->iommu,
- info->domain, info->dev,
- IOMMU_NO_PASID, enable);
+ ret = device_set_dirty_tracking(&dmar_domain->devices, enable);
+ if (ret)
+ goto err_unwind;
+
+ if (dmar_domain->nested_parent) {
+ ret = parent_domain_set_dirty_tracking(dmar_domain, enable);
if (ret)
goto err_unwind;
}
@@ -4690,10 +4804,8 @@ static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
return 0;
err_unwind:
- list_for_each_entry(info, &dmar_domain->devices, link)
- intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
- info->dev, IOMMU_NO_PASID,
- dmar_domain->dirty_tracking);
+ device_set_dirty_tracking(&dmar_domain->devices,
+ dmar_domain->dirty_tracking);
spin_unlock(&dmar_domain->lock);
return ret;
}
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index d02f916..4145c04 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -627,6 +627,10 @@ struct dmar_domain {
int agaw;
/* maximum mapped address */
u64 max_addr;
+ /* Protect the s1_domains list */
+ spinlock_t s1_lock;
+ /* Track s1_domains nested on this domain */
+ struct list_head s1_domains;
};
/* Nested user domain */
@@ -637,6 +641,8 @@ struct dmar_domain {
unsigned long s1_pgtbl;
/* page table attributes */
struct iommu_hwpt_vtd_s1 s1_cfg;
+ /* link to parent domain siblings */
+ struct list_head s2_link;
};
};
@@ -1060,6 +1066,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
*/
#define QI_OPT_WAIT_DRAIN BIT(0)
+void domain_update_iotlb(struct dmar_domain *domain);
int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
void device_block_translation(struct device *dev);
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index f26c7f1..a7d68f3 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -65,12 +65,20 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
list_add(&info->link, &dmar_domain->devices);
spin_unlock_irqrestore(&dmar_domain->lock, flags);
+ domain_update_iotlb(dmar_domain);
+
return 0;
}
static void intel_nested_domain_free(struct iommu_domain *domain)
{
- kfree(to_dmar_domain(domain));
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct dmar_domain *s2_domain = dmar_domain->s2_domain;
+
+ spin_lock(&s2_domain->s1_lock);
+ list_del(&dmar_domain->s2_link);
+ spin_unlock(&s2_domain->s1_lock);
+ kfree(dmar_domain);
}
static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
@@ -95,7 +103,7 @@ static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
}
static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr,
- unsigned long npages, bool ih)
+ u64 npages, bool ih)
{
struct iommu_domain_info *info;
unsigned int mask;
@@ -201,5 +209,9 @@ struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
spin_lock_init(&domain->lock);
xa_init(&domain->iommu_array);
+ spin_lock(&s2_domain->s1_lock);
+ list_add(&domain->s2_link, &s2_domain->s1_domains);
+ spin_unlock(&s2_domain->s1_lock);
+
return &domain->domain;
}
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 3239cef..108158e 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -428,7 +428,6 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
* Set up dirty tracking on a second only or nested translation type.
*/
int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
- struct dmar_domain *domain,
struct device *dev, u32 pasid,
bool enabled)
{
@@ -445,7 +444,7 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
return -ENODEV;
}
- did = domain_id_iommu(domain, iommu);
+ did = pasid_get_domain_id(pte);
pgtt = pasid_pte_get_pgtt(pte);
if (pgtt != PASID_ENTRY_PGTT_SL_ONLY &&
pgtt != PASID_ENTRY_PGTT_NESTED) {
@@ -658,6 +657,8 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
pasid_set_domain_id(pte, did);
pasid_set_address_width(pte, s2_domain->agaw);
pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+ if (s2_domain->dirty_tracking)
+ pasid_set_ssade(pte);
pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
pasid_set_present(pte);
spin_unlock(&iommu->lock);
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 8d40d4c..487ede0 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -307,7 +307,6 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev, u32 pasid);
int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
- struct dmar_domain *domain,
struct device *dev, u32 pasid,
bool enabled);
int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index c3fc920..65814cbc 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -41,6 +41,7 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
}
iommu_mm->pasid = pasid;
INIT_LIST_HEAD(&iommu_mm->sva_domains);
+ INIT_LIST_HEAD(&iommu_mm->sva_handles);
/*
* Make sure the write to mm->iommu_mm is not reordered in front of
* initialization to iommu_mm fields. If it does, readers may see a
@@ -82,6 +83,14 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
goto out_unlock;
}
+ list_for_each_entry(handle, &mm->iommu_mm->sva_handles, handle_item) {
+ if (handle->dev == dev) {
+ refcount_inc(&handle->users);
+ mutex_unlock(&iommu_sva_lock);
+ return handle;
+ }
+ }
+
handle = kzalloc(sizeof(*handle), GFP_KERNEL);
if (!handle) {
ret = -ENOMEM;
@@ -111,6 +120,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
list_add(&domain->next, &mm->iommu_mm->sva_domains);
out:
+ refcount_set(&handle->users, 1);
+ list_add(&handle->handle_item, &mm->iommu_mm->sva_handles);
mutex_unlock(&iommu_sva_lock);
handle->dev = dev;
handle->domain = domain;
@@ -141,6 +152,12 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
struct device *dev = handle->dev;
mutex_lock(&iommu_sva_lock);
+ if (!refcount_dec_and_test(&handle->users)) {
+ mutex_unlock(&iommu_sva_lock);
+ return;
+ }
+ list_del(&handle->handle_item);
+
iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
if (--domain->users == 0) {
list_del(&domain->next);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 3f3f1fa..33d142f 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -263,7 +263,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
if (cmd->__reserved)
return -EOPNOTSUPP;
- if (cmd->data_type == IOMMU_HWPT_DATA_NONE && cmd->data_len)
+ if ((cmd->data_type == IOMMU_HWPT_DATA_NONE && cmd->data_len) ||
+ (cmd->data_type != IOMMU_HWPT_DATA_NONE && !cmd->data_len))
return -EINVAL;
idev = iommufd_get_device(ucmd, cmd->dev_id);
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 504ac1b..05fd9d3 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -1330,20 +1330,23 @@ int iopt_disable_large_pages(struct io_pagetable *iopt)
int iopt_add_access(struct io_pagetable *iopt, struct iommufd_access *access)
{
+ u32 new_id;
int rc;
down_write(&iopt->domains_rwsem);
down_write(&iopt->iova_rwsem);
- rc = xa_alloc(&iopt->access_list, &access->iopt_access_list_id, access,
- xa_limit_16b, GFP_KERNEL_ACCOUNT);
+ rc = xa_alloc(&iopt->access_list, &new_id, access, xa_limit_16b,
+ GFP_KERNEL_ACCOUNT);
+
if (rc)
goto out_unlock;
rc = iopt_calculate_iova_alignment(iopt);
if (rc) {
- xa_erase(&iopt->access_list, access->iopt_access_list_id);
+ xa_erase(&iopt->access_list, new_id);
goto out_unlock;
}
+ access->iopt_access_list_id = new_id;
out_unlock:
up_write(&iopt->iova_rwsem);
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 482d405..e854d3f 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -45,6 +45,7 @@ enum {
enum {
MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0,
+ MOCK_FLAGS_DEVICE_HUGE_IOVA = 1 << 1,
};
enum {
diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
index 0a92c9e..db8c46b 100644
--- a/drivers/iommu/iommufd/iova_bitmap.c
+++ b/drivers/iommu/iommufd/iova_bitmap.c
@@ -100,7 +100,7 @@ struct iova_bitmap {
struct iova_bitmap_map mapped;
/* userspace address of the bitmap */
- u64 __user *bitmap;
+ u8 __user *bitmap;
/* u64 index that @mapped points to */
unsigned long mapped_base_index;
@@ -113,6 +113,9 @@ struct iova_bitmap {
/* length of the IOVA range for the whole bitmap */
size_t length;
+
+ /* length of the IOVA range set ahead the pinned pages */
+ unsigned long set_ahead_length;
};
/*
@@ -162,7 +165,7 @@ static int iova_bitmap_get(struct iova_bitmap *bitmap)
{
struct iova_bitmap_map *mapped = &bitmap->mapped;
unsigned long npages;
- u64 __user *addr;
+ u8 __user *addr;
long ret;
/*
@@ -176,17 +179,18 @@ static int iova_bitmap_get(struct iova_bitmap *bitmap)
sizeof(*bitmap->bitmap), PAGE_SIZE);
/*
- * We always cap at max number of 'struct page' a base page can fit.
- * This is, for example, on x86 means 2M of bitmap data max.
- */
- npages = min(npages, PAGE_SIZE / sizeof(struct page *));
-
- /*
* Bitmap address to be pinned is calculated via pointer arithmetic
* with bitmap u64 word index.
*/
addr = bitmap->bitmap + bitmap->mapped_base_index;
+ /*
+ * We always cap at max number of 'struct page' a base page can fit.
+ * This is, for example, on x86 means 2M of bitmap data max.
+ */
+ npages = min(npages + !!offset_in_page(addr),
+ PAGE_SIZE / sizeof(struct page *));
+
ret = pin_user_pages_fast((unsigned long)addr, npages,
FOLL_WRITE, mapped->pages);
if (ret <= 0)
@@ -247,7 +251,7 @@ struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
mapped = &bitmap->mapped;
mapped->pgshift = __ffs(page_size);
- bitmap->bitmap = data;
+ bitmap->bitmap = (u8 __user *)data;
bitmap->mapped_total_index =
iova_bitmap_offset_to_index(bitmap, length - 1) + 1;
bitmap->iova = iova;
@@ -304,7 +308,7 @@ static unsigned long iova_bitmap_mapped_remaining(struct iova_bitmap *bitmap)
remaining = bitmap->mapped_total_index - bitmap->mapped_base_index;
remaining = min_t(unsigned long, remaining,
- bytes / sizeof(*bitmap->bitmap));
+ DIV_ROUND_UP(bytes, sizeof(*bitmap->bitmap)));
return remaining;
}
@@ -341,6 +345,32 @@ static bool iova_bitmap_done(struct iova_bitmap *bitmap)
return bitmap->mapped_base_index >= bitmap->mapped_total_index;
}
+static int iova_bitmap_set_ahead(struct iova_bitmap *bitmap,
+ size_t set_ahead_length)
+{
+ int ret = 0;
+
+ while (set_ahead_length > 0 && !iova_bitmap_done(bitmap)) {
+ unsigned long length = iova_bitmap_mapped_length(bitmap);
+ unsigned long iova = iova_bitmap_mapped_iova(bitmap);
+
+ ret = iova_bitmap_get(bitmap);
+ if (ret)
+ break;
+
+ length = min(length, set_ahead_length);
+ iova_bitmap_set(bitmap, iova, length);
+
+ set_ahead_length -= length;
+ bitmap->mapped_base_index +=
+ iova_bitmap_offset_to_index(bitmap, length - 1) + 1;
+ iova_bitmap_put(bitmap);
+ }
+
+ bitmap->set_ahead_length = 0;
+ return ret;
+}
+
/*
* Advances to the next range, releases the current pinned
* pages and pins the next set of bitmap pages.
@@ -357,6 +387,15 @@ static int iova_bitmap_advance(struct iova_bitmap *bitmap)
if (iova_bitmap_done(bitmap))
return 0;
+ /* Iterate, set and skip any bits requested for next iteration */
+ if (bitmap->set_ahead_length) {
+ int ret;
+
+ ret = iova_bitmap_set_ahead(bitmap, bitmap->set_ahead_length);
+ if (ret)
+ return ret;
+ }
+
/* When advancing the index we pin the next set of bitmap pages */
return iova_bitmap_get(bitmap);
}
@@ -409,6 +448,7 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
mapped->pgshift) + mapped->pgoff * BITS_PER_BYTE;
unsigned long last_bit = (((iova + length - 1) - mapped->iova) >>
mapped->pgshift) + mapped->pgoff * BITS_PER_BYTE;
+ unsigned long last_page_idx = mapped->npages - 1;
do {
unsigned int page_idx = cur_bit / BITS_PER_PAGE;
@@ -417,10 +457,18 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
last_bit - cur_bit + 1);
void *kaddr;
+ if (unlikely(page_idx > last_page_idx))
+ break;
+
kaddr = kmap_local_page(mapped->pages[page_idx]);
bitmap_set(kaddr, offset, nbits);
kunmap_local(kaddr);
cur_bit += nbits;
} while (cur_bit <= last_bit);
+
+ if (unlikely(cur_bit <= last_bit)) {
+ bitmap->set_ahead_length =
+ ((last_bit - cur_bit + 1) << bitmap->mapped.pgshift);
+ }
}
EXPORT_SYMBOL_NS_GPL(iova_bitmap_set, IOMMUFD);
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index d9e9920..7a219947 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -36,11 +36,12 @@ static struct mock_bus_type iommufd_mock_bus_type = {
},
};
-static atomic_t mock_dev_num;
+static DEFINE_IDA(mock_dev_ida);
enum {
MOCK_DIRTY_TRACK = 1,
MOCK_IO_PAGE_SIZE = PAGE_SIZE / 2,
+ MOCK_HUGE_PAGE_SIZE = 512 * MOCK_IO_PAGE_SIZE,
/*
* Like a real page table alignment requires the low bits of the address
@@ -53,6 +54,7 @@ enum {
MOCK_PFN_START_IOVA = _MOCK_PFN_START,
MOCK_PFN_LAST_IOVA = _MOCK_PFN_START,
MOCK_PFN_DIRTY_IOVA = _MOCK_PFN_START << 1,
+ MOCK_PFN_HUGE_IOVA = _MOCK_PFN_START << 2,
};
/*
@@ -61,8 +63,8 @@ enum {
* In syzkaller mode the 64 bit IOVA is converted into an nth area and offset
* value. This has a much smaller randomization space and syzkaller can hit it.
*/
-static unsigned long iommufd_test_syz_conv_iova(struct io_pagetable *iopt,
- u64 *iova)
+static unsigned long __iommufd_test_syz_conv_iova(struct io_pagetable *iopt,
+ u64 *iova)
{
struct syz_layout {
__u32 nth_area;
@@ -86,6 +88,21 @@ static unsigned long iommufd_test_syz_conv_iova(struct io_pagetable *iopt,
return 0;
}
+static unsigned long iommufd_test_syz_conv_iova(struct iommufd_access *access,
+ u64 *iova)
+{
+ unsigned long ret;
+
+ mutex_lock(&access->ioas_lock);
+ if (!access->ioas) {
+ mutex_unlock(&access->ioas_lock);
+ return 0;
+ }
+ ret = __iommufd_test_syz_conv_iova(&access->ioas->iopt, iova);
+ mutex_unlock(&access->ioas_lock);
+ return ret;
+}
+
void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
unsigned int ioas_id, u64 *iova, u32 *flags)
{
@@ -98,7 +115,7 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
ioas = iommufd_get_ioas(ucmd->ictx, ioas_id);
if (IS_ERR(ioas))
return;
- *iova = iommufd_test_syz_conv_iova(&ioas->iopt, iova);
+ *iova = __iommufd_test_syz_conv_iova(&ioas->iopt, iova);
iommufd_put_object(ucmd->ictx, &ioas->obj);
}
@@ -121,6 +138,7 @@ enum selftest_obj_type {
struct mock_dev {
struct device dev;
unsigned long flags;
+ int id;
};
struct selftest_obj {
@@ -191,6 +209,34 @@ static int mock_domain_set_dirty_tracking(struct iommu_domain *domain,
return 0;
}
+static bool mock_test_and_clear_dirty(struct mock_iommu_domain *mock,
+ unsigned long iova, size_t page_size,
+ unsigned long flags)
+{
+ unsigned long cur, end = iova + page_size - 1;
+ bool dirty = false;
+ void *ent, *old;
+
+ for (cur = iova; cur < end; cur += MOCK_IO_PAGE_SIZE) {
+ ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
+ if (!ent || !(xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA))
+ continue;
+
+ dirty = true;
+ /* Clear dirty */
+ if (!(flags & IOMMU_DIRTY_NO_CLEAR)) {
+ unsigned long val;
+
+ val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
+ old = xa_store(&mock->pfns, cur / MOCK_IO_PAGE_SIZE,
+ xa_mk_value(val), GFP_KERNEL);
+ WARN_ON_ONCE(ent != old);
+ }
+ }
+
+ return dirty;
+}
+
static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
unsigned long iova, size_t size,
unsigned long flags,
@@ -198,31 +244,31 @@ static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
{
struct mock_iommu_domain *mock =
container_of(domain, struct mock_iommu_domain, domain);
- unsigned long i, max = size / MOCK_IO_PAGE_SIZE;
- void *ent, *old;
+ unsigned long end = iova + size;
+ void *ent;
if (!(mock->flags & MOCK_DIRTY_TRACK) && dirty->bitmap)
return -EINVAL;
- for (i = 0; i < max; i++) {
- unsigned long cur = iova + i * MOCK_IO_PAGE_SIZE;
+ do {
+ unsigned long pgsize = MOCK_IO_PAGE_SIZE;
+ unsigned long head;
- ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
- if (ent && (xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA)) {
- /* Clear dirty */
- if (!(flags & IOMMU_DIRTY_NO_CLEAR)) {
- unsigned long val;
-
- val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
- old = xa_store(&mock->pfns,
- cur / MOCK_IO_PAGE_SIZE,
- xa_mk_value(val), GFP_KERNEL);
- WARN_ON_ONCE(ent != old);
- }
- iommu_dirty_bitmap_record(dirty, cur,
- MOCK_IO_PAGE_SIZE);
+ ent = xa_load(&mock->pfns, iova / MOCK_IO_PAGE_SIZE);
+ if (!ent) {
+ iova += pgsize;
+ continue;
}
- }
+
+ if (xa_to_value(ent) & MOCK_PFN_HUGE_IOVA)
+ pgsize = MOCK_HUGE_PAGE_SIZE;
+ head = iova & ~(pgsize - 1);
+
+ /* Clear dirty */
+ if (mock_test_and_clear_dirty(mock, head, pgsize, flags))
+ iommu_dirty_bitmap_record(dirty, head, pgsize);
+ iova = head + pgsize;
+ } while (iova < end);
return 0;
}
@@ -234,6 +280,7 @@ const struct iommu_dirty_ops dirty_ops = {
static struct iommu_domain *mock_domain_alloc_paging(struct device *dev)
{
+ struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
struct mock_iommu_domain *mock;
mock = kzalloc(sizeof(*mock), GFP_KERNEL);
@@ -242,6 +289,8 @@ static struct iommu_domain *mock_domain_alloc_paging(struct device *dev)
mock->domain.geometry.aperture_start = MOCK_APERTURE_START;
mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST;
mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE;
+ if (dev && mdev->flags & MOCK_FLAGS_DEVICE_HUGE_IOVA)
+ mock->domain.pgsize_bitmap |= MOCK_HUGE_PAGE_SIZE;
mock->domain.ops = mock_ops.default_domain_ops;
mock->domain.type = IOMMU_DOMAIN_UNMANAGED;
xa_init(&mock->pfns);
@@ -287,7 +336,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags,
return ERR_PTR(-EOPNOTSUPP);
if (user_data || (has_dirty_flag && no_dirty_ops))
return ERR_PTR(-EOPNOTSUPP);
- domain = mock_domain_alloc_paging(NULL);
+ domain = mock_domain_alloc_paging(dev);
if (!domain)
return ERR_PTR(-ENOMEM);
if (has_dirty_flag)
@@ -350,6 +399,9 @@ static int mock_domain_map_pages(struct iommu_domain *domain,
if (pgcount == 1 && cur + MOCK_IO_PAGE_SIZE == pgsize)
flags = MOCK_PFN_LAST_IOVA;
+ if (pgsize != MOCK_IO_PAGE_SIZE) {
+ flags |= MOCK_PFN_HUGE_IOVA;
+ }
old = xa_store(&mock->pfns, iova / MOCK_IO_PAGE_SIZE,
xa_mk_value((paddr / MOCK_IO_PAGE_SIZE) |
flags),
@@ -394,20 +446,27 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain,
/*
* iommufd generates unmaps that must be a strict
- * superset of the map's performend So every starting
- * IOVA should have been an iova passed to map, and the
+ * superset of the map's performend So every
+ * starting/ending IOVA should have been an iova passed
+ * to map.
*
- * First IOVA must be present and have been a first IOVA
- * passed to map_pages
+ * This simple logic doesn't work when the HUGE_PAGE is
+ * turned on since the core code will automatically
+ * switch between the two page sizes creating a break in
+ * the unmap calls. The break can land in the middle of
+ * contiguous IOVA.
*/
- if (first) {
- WARN_ON(ent && !(xa_to_value(ent) &
- MOCK_PFN_START_IOVA));
- first = false;
+ if (!(domain->pgsize_bitmap & MOCK_HUGE_PAGE_SIZE)) {
+ if (first) {
+ WARN_ON(ent && !(xa_to_value(ent) &
+ MOCK_PFN_START_IOVA));
+ first = false;
+ }
+ if (pgcount == 1 &&
+ cur + MOCK_IO_PAGE_SIZE == pgsize)
+ WARN_ON(ent && !(xa_to_value(ent) &
+ MOCK_PFN_LAST_IOVA));
}
- if (pgcount == 1 && cur + MOCK_IO_PAGE_SIZE == pgsize)
- WARN_ON(ent && !(xa_to_value(ent) &
- MOCK_PFN_LAST_IOVA));
iova += MOCK_IO_PAGE_SIZE;
ret += MOCK_IO_PAGE_SIZE;
@@ -595,7 +654,7 @@ static void mock_dev_release(struct device *dev)
{
struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
- atomic_dec(&mock_dev_num);
+ ida_free(&mock_dev_ida, mdev->id);
kfree(mdev);
}
@@ -604,7 +663,8 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
struct mock_dev *mdev;
int rc;
- if (dev_flags & ~(MOCK_FLAGS_DEVICE_NO_DIRTY))
+ if (dev_flags &
+ ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA))
return ERR_PTR(-EINVAL);
mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
@@ -616,8 +676,12 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
mdev->dev.release = mock_dev_release;
mdev->dev.bus = &iommufd_mock_bus_type.bus;
- rc = dev_set_name(&mdev->dev, "iommufd_mock%u",
- atomic_inc_return(&mock_dev_num));
+ rc = ida_alloc(&mock_dev_ida, GFP_KERNEL);
+ if (rc < 0)
+ goto err_put;
+ mdev->id = rc;
+
+ rc = dev_set_name(&mdev->dev, "iommufd_mock%u", mdev->id);
if (rc)
goto err_put;
@@ -1119,7 +1183,7 @@ static int iommufd_test_access_pages(struct iommufd_ucmd *ucmd,
}
if (flags & MOCK_FLAGS_ACCESS_SYZ)
- iova = iommufd_test_syz_conv_iova(&staccess->access->ioas->iopt,
+ iova = iommufd_test_syz_conv_iova(staccess->access,
&cmd->access_pages.iova);
npages = (ALIGN(iova + length, PAGE_SIZE) -
@@ -1221,8 +1285,8 @@ static int iommufd_test_access_rw(struct iommufd_ucmd *ucmd,
}
if (flags & MOCK_FLAGS_ACCESS_SYZ)
- iova = iommufd_test_syz_conv_iova(&staccess->access->ioas->iopt,
- &cmd->access_rw.iova);
+ iova = iommufd_test_syz_conv_iova(staccess->access,
+ &cmd->access_rw.iova);
rc = iommufd_access_rw(staccess->access, iova, tmp, length, flags);
if (rc)
diff --git a/drivers/irqchip/irq-brcmstb-l2.c b/drivers/irqchip/irq-brcmstb-l2.c
index 5559c94..2b0b317 100644
--- a/drivers/irqchip/irq-brcmstb-l2.c
+++ b/drivers/irqchip/irq-brcmstb-l2.c
@@ -2,7 +2,7 @@
/*
* Generic Broadcom Set Top Box Level 2 Interrupt controller driver
*
- * Copyright (C) 2014-2017 Broadcom
+ * Copyright (C) 2014-2024 Broadcom
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -112,6 +112,9 @@ static void brcmstb_l2_intc_irq_handle(struct irq_desc *desc)
generic_handle_domain_irq(b->domain, irq);
} while (status);
out:
+ /* Don't ack parent before all device writes are done */
+ wmb();
+
chained_irq_exit(chip, desc);
}
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index d097001..b822752 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -207,6 +207,11 @@ static bool require_its_list_vmovp(struct its_vm *vm, struct its_node *its)
return (gic_rdists->has_rvpeid || vm->vlpi_count[its->list_nr]);
}
+static bool rdists_support_shareable(void)
+{
+ return !(gic_rdists->flags & RDIST_FLAGS_FORCE_NON_SHAREABLE);
+}
+
static u16 get_its_list(struct its_vm *vm)
{
struct its_node *its;
@@ -2710,10 +2715,12 @@ static u64 inherit_vpe_l1_table_from_its(void)
break;
}
val |= FIELD_PREP(GICR_VPROPBASER_4_1_ADDR, addr >> 12);
- val |= FIELD_PREP(GICR_VPROPBASER_SHAREABILITY_MASK,
- FIELD_GET(GITS_BASER_SHAREABILITY_MASK, baser));
- val |= FIELD_PREP(GICR_VPROPBASER_INNER_CACHEABILITY_MASK,
- FIELD_GET(GITS_BASER_INNER_CACHEABILITY_MASK, baser));
+ if (rdists_support_shareable()) {
+ val |= FIELD_PREP(GICR_VPROPBASER_SHAREABILITY_MASK,
+ FIELD_GET(GITS_BASER_SHAREABILITY_MASK, baser));
+ val |= FIELD_PREP(GICR_VPROPBASER_INNER_CACHEABILITY_MASK,
+ FIELD_GET(GITS_BASER_INNER_CACHEABILITY_MASK, baser));
+ }
val |= FIELD_PREP(GICR_VPROPBASER_4_1_SIZE, GITS_BASER_NR_PAGES(baser) - 1);
return val;
@@ -2936,8 +2943,10 @@ static int allocate_vpe_l1_table(void)
WARN_ON(!IS_ALIGNED(pa, psz));
val |= FIELD_PREP(GICR_VPROPBASER_4_1_ADDR, pa >> 12);
- val |= GICR_VPROPBASER_RaWb;
- val |= GICR_VPROPBASER_InnerShareable;
+ if (rdists_support_shareable()) {
+ val |= GICR_VPROPBASER_RaWb;
+ val |= GICR_VPROPBASER_InnerShareable;
+ }
val |= GICR_VPROPBASER_4_1_Z;
val |= GICR_VPROPBASER_4_1_VALID;
@@ -3126,7 +3135,7 @@ static void its_cpu_init_lpis(void)
gicr_write_propbaser(val, rbase + GICR_PROPBASER);
tmp = gicr_read_propbaser(rbase + GICR_PROPBASER);
- if (gic_rdists->flags & RDIST_FLAGS_FORCE_NON_SHAREABLE)
+ if (!rdists_support_shareable())
tmp &= ~GICR_PROPBASER_SHAREABILITY_MASK;
if ((tmp ^ val) & GICR_PROPBASER_SHAREABILITY_MASK) {
@@ -3153,7 +3162,7 @@ static void its_cpu_init_lpis(void)
gicr_write_pendbaser(val, rbase + GICR_PENDBASER);
tmp = gicr_read_pendbaser(rbase + GICR_PENDBASER);
- if (gic_rdists->flags & RDIST_FLAGS_FORCE_NON_SHAREABLE)
+ if (!rdists_support_shareable())
tmp &= ~GICR_PENDBASER_SHAREABILITY_MASK;
if (!(tmp & GICR_PENDBASER_SHAREABILITY_MASK)) {
@@ -3172,6 +3181,7 @@ static void its_cpu_init_lpis(void)
val |= GICR_CTLR_ENABLE_LPIS;
writel_relaxed(val, rbase + GICR_CTLR);
+out:
if (gic_rdists->has_vlpis && !gic_rdists->has_rvpeid) {
void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
@@ -3207,7 +3217,6 @@ static void its_cpu_init_lpis(void)
/* Make sure the GIC has seen the above */
dsb(sy);
-out:
gic_data_rdist()->flags |= RD_LOCAL_LPI_ENABLED;
pr_info("GICv3: CPU%d: using %s LPI pending table @%pa\n",
smp_processor_id(),
@@ -3817,8 +3826,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
bool force)
{
struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
- int from, cpu = cpumask_first(mask_val);
+ struct cpumask common, *table_mask;
unsigned long flags;
+ int from, cpu;
/*
* Changing affinity is mega expensive, so let's be as lazy as
@@ -3834,19 +3844,22 @@ static int its_vpe_set_affinity(struct irq_data *d,
* taken on any vLPI handling path that evaluates vpe->col_idx.
*/
from = vpe_to_cpuid_lock(vpe, &flags);
+ table_mask = gic_data_rdist_cpu(from)->vpe_table_mask;
+
+ /*
+ * If we are offered another CPU in the same GICv4.1 ITS
+ * affinity, pick this one. Otherwise, any CPU will do.
+ */
+ if (table_mask && cpumask_and(&common, mask_val, table_mask))
+ cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
+ else
+ cpu = cpumask_first(mask_val);
+
if (from == cpu)
goto out;
vpe->col_idx = cpu;
- /*
- * GICv4.1 allows us to skip VMOVP if moving to a cpu whose RD
- * is sharing its VPE table with the current one.
- */
- if (gic_data_rdist_cpu(cpu)->vpe_table_mask &&
- cpumask_test_cpu(from, gic_data_rdist_cpu(cpu)->vpe_table_mask))
- goto out;
-
its_send_vmovp(vpe);
its_vpe_db_proxy_move(vpe, from, cpu);
@@ -3880,14 +3893,18 @@ static void its_vpe_schedule(struct its_vpe *vpe)
val = virt_to_phys(page_address(vpe->its_vm->vprop_page)) &
GENMASK_ULL(51, 12);
val |= (LPI_NRBITS - 1) & GICR_VPROPBASER_IDBITS_MASK;
- val |= GICR_VPROPBASER_RaWb;
- val |= GICR_VPROPBASER_InnerShareable;
+ if (rdists_support_shareable()) {
+ val |= GICR_VPROPBASER_RaWb;
+ val |= GICR_VPROPBASER_InnerShareable;
+ }
gicr_write_vpropbaser(val, vlpi_base + GICR_VPROPBASER);
val = virt_to_phys(page_address(vpe->vpt_page)) &
GENMASK_ULL(51, 16);
- val |= GICR_VPENDBASER_RaWaWb;
- val |= GICR_VPENDBASER_InnerShareable;
+ if (rdists_support_shareable()) {
+ val |= GICR_VPENDBASER_RaWaWb;
+ val |= GICR_VPENDBASER_InnerShareable;
+ }
/*
* There is no good way of finding out if the pending table is
* empty as we can race against the doorbell interrupt very
@@ -5078,6 +5095,8 @@ static int __init its_probe_one(struct its_node *its)
u32 ctlr;
int err;
+ its_enable_quirks(its);
+
if (is_v4(its)) {
if (!(its->typer & GITS_TYPER_VMOVP)) {
err = its_compute_its_list_map(its);
@@ -5429,7 +5448,6 @@ static int __init its_of_probe(struct device_node *node)
if (!its)
return -ENOMEM;
- its_enable_quirks(its);
err = its_probe_one(its);
if (err) {
its_node_destroy(its);
diff --git a/drivers/irqchip/irq-loongson-eiointc.c b/drivers/irqchip/irq-loongson-eiointc.c
index 1623cd7..b3736bd 100644
--- a/drivers/irqchip/irq-loongson-eiointc.c
+++ b/drivers/irqchip/irq-loongson-eiointc.c
@@ -241,7 +241,7 @@ static int eiointc_domain_alloc(struct irq_domain *domain, unsigned int virq,
int ret;
unsigned int i, type;
unsigned long hwirq = 0;
- struct eiointc *priv = domain->host_data;
+ struct eiointc_priv *priv = domain->host_data;
ret = irq_domain_translate_onecell(domain, arg, &hwirq, &type);
if (ret)
diff --git a/drivers/irqchip/irq-mbigen.c b/drivers/irqchip/irq-mbigen.c
index 5101a3f..58881d3 100644
--- a/drivers/irqchip/irq-mbigen.c
+++ b/drivers/irqchip/irq-mbigen.c
@@ -235,22 +235,17 @@ static const struct irq_domain_ops mbigen_domain_ops = {
static int mbigen_of_create_domain(struct platform_device *pdev,
struct mbigen_device *mgn_chip)
{
- struct device *parent;
struct platform_device *child;
struct irq_domain *domain;
struct device_node *np;
u32 num_pins;
int ret = 0;
- parent = bus_get_dev_root(&platform_bus_type);
- if (!parent)
- return -ENODEV;
-
for_each_child_of_node(pdev->dev.of_node, np) {
if (!of_property_read_bool(np, "interrupt-controller"))
continue;
- child = of_platform_device_create(np, NULL, parent);
+ child = of_platform_device_create(np, NULL, NULL);
if (!child) {
ret = -ENOMEM;
break;
@@ -273,7 +268,6 @@ static int mbigen_of_create_domain(struct platform_device *pdev,
}
}
- put_device(parent);
if (ret)
of_node_put(np);
diff --git a/drivers/irqchip/irq-qcom-mpm.c b/drivers/irqchip/irq-qcom-mpm.c
index cda5838..7942d8e 100644
--- a/drivers/irqchip/irq-qcom-mpm.c
+++ b/drivers/irqchip/irq-qcom-mpm.c
@@ -389,8 +389,8 @@ static int qcom_mpm_init(struct device_node *np, struct device_node *parent)
/* Don't use devm_ioremap_resource, as we're accessing a shared region. */
priv->base = devm_ioremap(dev, res.start, resource_size(&res));
of_node_put(msgram_np);
- if (IS_ERR(priv->base))
- return PTR_ERR(priv->base);
+ if (!priv->base)
+ return -ENOMEM;
} else {
/* Otherwise, fall back to simple MMIO. */
priv->base = devm_platform_ioremap_resource(pdev, 0);
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 5b7bc4f..bf0b40b 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d)
{
struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
- writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+ if (unlikely(irqd_irq_disabled(d))) {
+ plic_toggle(handler, d->hwirq, 1);
+ writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+ plic_toggle(handler, d->hwirq, 0);
+ } else {
+ writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+ }
}
#ifdef CONFIG_SMP
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 6ae2329..4e6afa8 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -300,7 +300,7 @@ struct cached_dev {
struct list_head list;
struct bcache_device disk;
struct block_device *bdev;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct cache_sb sb;
struct cache_sb_disk *sb_disk;
@@ -423,7 +423,7 @@ struct cache {
struct kobject kobj;
struct block_device *bdev;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct task_struct *alloc_thread;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index f716c32..330bcd9 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1375,8 +1375,8 @@ static CLOSURE_CALLBACK(cached_dev_free)
if (dc->sb_disk)
put_page(virt_to_page(dc->sb_disk));
- if (dc->bdev_handle)
- bdev_release(dc->bdev_handle);
+ if (dc->bdev_file)
+ fput(dc->bdev_file);
wake_up(&unregister_wait);
@@ -1446,7 +1446,7 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
/* Cached device - bcache superblock */
static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
- struct bdev_handle *bdev_handle,
+ struct file *bdev_file,
struct cached_dev *dc)
{
const char *err = "cannot allocate memory";
@@ -1454,8 +1454,8 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
int ret = -ENOMEM;
memcpy(&dc->sb, sb, sizeof(struct cache_sb));
- dc->bdev_handle = bdev_handle;
- dc->bdev = bdev_handle->bdev;
+ dc->bdev_file = bdev_file;
+ dc->bdev = file_bdev(bdev_file);
dc->sb_disk = sb_disk;
if (cached_dev_init(dc, sb->block_size << 9))
@@ -2219,8 +2219,8 @@ void bch_cache_release(struct kobject *kobj)
if (ca->sb_disk)
put_page(virt_to_page(ca->sb_disk));
- if (ca->bdev_handle)
- bdev_release(ca->bdev_handle);
+ if (ca->bdev_file)
+ fput(ca->bdev_file);
kfree(ca);
module_put(THIS_MODULE);
@@ -2340,18 +2340,18 @@ static int cache_alloc(struct cache *ca)
}
static int register_cache(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
- struct bdev_handle *bdev_handle,
+ struct file *bdev_file,
struct cache *ca)
{
const char *err = NULL; /* must be set for any error case */
int ret = 0;
memcpy(&ca->sb, sb, sizeof(struct cache_sb));
- ca->bdev_handle = bdev_handle;
- ca->bdev = bdev_handle->bdev;
+ ca->bdev_file = bdev_file;
+ ca->bdev = file_bdev(bdev_file);
ca->sb_disk = sb_disk;
- if (bdev_max_discard_sectors((bdev_handle->bdev)))
+ if (bdev_max_discard_sectors(file_bdev(bdev_file)))
ca->discard = CACHE_DISCARD(&ca->sb);
ret = cache_alloc(ca);
@@ -2362,20 +2362,20 @@ static int register_cache(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
err = "cache_alloc(): cache device is too small";
else
err = "cache_alloc(): unknown error";
- pr_notice("error %pg: %s\n", bdev_handle->bdev, err);
+ pr_notice("error %pg: %s\n", file_bdev(bdev_file), err);
/*
* If we failed here, it means ca->kobj is not initialized yet,
* kobject_put() won't be called and there is no chance to
- * call bdev_release() to bdev in bch_cache_release(). So
- * we explicitly call bdev_release() here.
+ * call fput() to bdev in bch_cache_release(). So
+ * we explicitly call fput() on the block device here.
*/
- bdev_release(bdev_handle);
+ fput(bdev_file);
return ret;
}
- if (kobject_add(&ca->kobj, bdev_kobj(bdev_handle->bdev), "bcache")) {
+ if (kobject_add(&ca->kobj, bdev_kobj(file_bdev(bdev_file)), "bcache")) {
pr_notice("error %pg: error calling kobject_add\n",
- bdev_handle->bdev);
+ file_bdev(bdev_file));
ret = -ENOMEM;
goto out;
}
@@ -2389,7 +2389,7 @@ static int register_cache(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
goto out;
}
- pr_info("registered cache device %pg\n", ca->bdev_handle->bdev);
+ pr_info("registered cache device %pg\n", file_bdev(ca->bdev_file));
out:
kobject_put(&ca->kobj);
@@ -2447,7 +2447,7 @@ struct async_reg_args {
char *path;
struct cache_sb *sb;
struct cache_sb_disk *sb_disk;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
void *holder;
};
@@ -2458,7 +2458,7 @@ static void register_bdev_worker(struct work_struct *work)
container_of(work, struct async_reg_args, reg_work.work);
mutex_lock(&bch_register_lock);
- if (register_bdev(args->sb, args->sb_disk, args->bdev_handle,
+ if (register_bdev(args->sb, args->sb_disk, args->bdev_file,
args->holder) < 0)
fail = true;
mutex_unlock(&bch_register_lock);
@@ -2479,7 +2479,7 @@ static void register_cache_worker(struct work_struct *work)
container_of(work, struct async_reg_args, reg_work.work);
/* blkdev_put() will be called in bch_cache_release() */
- if (register_cache(args->sb, args->sb_disk, args->bdev_handle,
+ if (register_cache(args->sb, args->sb_disk, args->bdev_file,
args->holder))
fail = true;
@@ -2517,7 +2517,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
char *path = NULL;
struct cache_sb *sb;
struct cache_sb_disk *sb_disk;
- struct bdev_handle *bdev_handle, *bdev_handle2;
+ struct file *bdev_file, *bdev_file2;
void *holder = NULL;
ssize_t ret;
bool async_registration = false;
@@ -2550,15 +2550,15 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
ret = -EINVAL;
err = "failed to open device";
- bdev_handle = bdev_open_by_path(strim(path), BLK_OPEN_READ, NULL, NULL);
- if (IS_ERR(bdev_handle))
+ bdev_file = bdev_file_open_by_path(strim(path), BLK_OPEN_READ, NULL, NULL);
+ if (IS_ERR(bdev_file))
goto out_free_sb;
err = "failed to set blocksize";
- if (set_blocksize(bdev_handle->bdev, 4096))
+ if (set_blocksize(file_bdev(bdev_file), 4096))
goto out_blkdev_put;
- err = read_super(sb, bdev_handle->bdev, &sb_disk);
+ err = read_super(sb, file_bdev(bdev_file), &sb_disk);
if (err)
goto out_blkdev_put;
@@ -2570,13 +2570,13 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
}
/* Now reopen in exclusive mode with proper holder */
- bdev_handle2 = bdev_open_by_dev(bdev_handle->bdev->bd_dev,
+ bdev_file2 = bdev_file_open_by_dev(file_bdev(bdev_file)->bd_dev,
BLK_OPEN_READ | BLK_OPEN_WRITE, holder, NULL);
- bdev_release(bdev_handle);
- bdev_handle = bdev_handle2;
- if (IS_ERR(bdev_handle)) {
- ret = PTR_ERR(bdev_handle);
- bdev_handle = NULL;
+ fput(bdev_file);
+ bdev_file = bdev_file2;
+ if (IS_ERR(bdev_file)) {
+ ret = PTR_ERR(bdev_file);
+ bdev_file = NULL;
if (ret == -EBUSY) {
dev_t dev;
@@ -2611,7 +2611,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
args->path = path;
args->sb = sb;
args->sb_disk = sb_disk;
- args->bdev_handle = bdev_handle;
+ args->bdev_file = bdev_file;
args->holder = holder;
register_device_async(args);
/* No wait and returns to user space */
@@ -2620,14 +2620,14 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
if (SB_IS_BDEV(sb)) {
mutex_lock(&bch_register_lock);
- ret = register_bdev(sb, sb_disk, bdev_handle, holder);
+ ret = register_bdev(sb, sb_disk, bdev_file, holder);
mutex_unlock(&bch_register_lock);
/* blkdev_put() will be called in cached_dev_free() */
if (ret < 0)
goto out_free_sb;
} else {
/* blkdev_put() will be called in bch_cache_release() */
- ret = register_cache(sb, sb_disk, bdev_handle, holder);
+ ret = register_cache(sb, sb_disk, bdev_file, holder);
if (ret)
goto out_free_sb;
}
@@ -2643,8 +2643,8 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
out_put_sb_page:
put_page(virt_to_page(sb_disk));
out_blkdev_put:
- if (bdev_handle)
- bdev_release(bdev_handle);
+ if (bdev_file)
+ fput(bdev_file);
out_free_sb:
kfree(sb);
out_free_path:
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index f745f85..5944576 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -53,15 +53,17 @@
struct convert_context {
struct completion restart;
struct bio *bio_in;
- struct bio *bio_out;
struct bvec_iter iter_in;
+ struct bio *bio_out;
struct bvec_iter iter_out;
- u64 cc_sector;
atomic_t cc_pending;
+ u64 cc_sector;
union {
struct skcipher_request *req;
struct aead_request *req_aead;
} r;
+ bool aead_recheck;
+ bool aead_failed;
};
@@ -82,6 +84,8 @@ struct dm_crypt_io {
blk_status_t error;
sector_t sector;
+ struct bvec_iter saved_bi_iter;
+
struct rb_node rb_node;
} CRYPTO_MINALIGN_ATTR;
@@ -1370,10 +1374,13 @@ static int crypt_convert_block_aead(struct crypt_config *cc,
if (r == -EBADMSG) {
sector_t s = le64_to_cpu(*sector);
- DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu",
- ctx->bio_in->bi_bdev, s);
- dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead",
- ctx->bio_in, s, 0);
+ ctx->aead_failed = true;
+ if (ctx->aead_recheck) {
+ DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu",
+ ctx->bio_in->bi_bdev, s);
+ dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead",
+ ctx->bio_in, s, 0);
+ }
}
if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
@@ -1757,6 +1764,8 @@ static void crypt_io_init(struct dm_crypt_io *io, struct crypt_config *cc,
io->base_bio = bio;
io->sector = sector;
io->error = 0;
+ io->ctx.aead_recheck = false;
+ io->ctx.aead_failed = false;
io->ctx.r.req = NULL;
io->integrity_metadata = NULL;
io->integrity_metadata_from_pool = false;
@@ -1768,6 +1777,8 @@ static void crypt_inc_pending(struct dm_crypt_io *io)
atomic_inc(&io->io_pending);
}
+static void kcryptd_queue_read(struct dm_crypt_io *io);
+
/*
* One of the bios was finished. Check for completion of
* the whole request and correctly clean up the buffer.
@@ -1781,6 +1792,15 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
if (!atomic_dec_and_test(&io->io_pending))
return;
+ if (likely(!io->ctx.aead_recheck) && unlikely(io->ctx.aead_failed) &&
+ cc->on_disk_tag_size && bio_data_dir(base_bio) == READ) {
+ io->ctx.aead_recheck = true;
+ io->ctx.aead_failed = false;
+ io->error = 0;
+ kcryptd_queue_read(io);
+ return;
+ }
+
if (io->ctx.r.req)
crypt_free_req(cc, io->ctx.r.req, base_bio);
@@ -1816,15 +1836,19 @@ static void crypt_endio(struct bio *clone)
struct dm_crypt_io *io = clone->bi_private;
struct crypt_config *cc = io->cc;
unsigned int rw = bio_data_dir(clone);
- blk_status_t error;
+ blk_status_t error = clone->bi_status;
+
+ if (io->ctx.aead_recheck && !error) {
+ kcryptd_queue_crypt(io);
+ return;
+ }
/*
* free the processed pages
*/
- if (rw == WRITE)
+ if (rw == WRITE || io->ctx.aead_recheck)
crypt_free_buffer_pages(cc, clone);
- error = clone->bi_status;
bio_put(clone);
if (rw == READ && !error) {
@@ -1845,6 +1869,22 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
struct crypt_config *cc = io->cc;
struct bio *clone;
+ if (io->ctx.aead_recheck) {
+ if (!(gfp & __GFP_DIRECT_RECLAIM))
+ return 1;
+ crypt_inc_pending(io);
+ clone = crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size);
+ if (unlikely(!clone)) {
+ crypt_dec_pending(io);
+ return 1;
+ }
+ clone->bi_iter.bi_sector = cc->start + io->sector;
+ crypt_convert_init(cc, &io->ctx, clone, clone, io->sector);
+ io->saved_bi_iter = clone->bi_iter;
+ dm_submit_bio_remap(io->base_bio, clone);
+ return 0;
+ }
+
/*
* We need the original biovec array in order to decrypt the whole bio
* data *afterwards* -- thanks to immutable biovecs we don't need to
@@ -2071,6 +2111,12 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
io->ctx.bio_out = clone;
io->ctx.iter_out = clone->bi_iter;
+ if (crypt_integrity_aead(cc)) {
+ bio_copy_data(clone, io->base_bio);
+ io->ctx.bio_in = clone;
+ io->ctx.iter_in = clone->bi_iter;
+ }
+
sector += bio_sectors(clone);
crypt_inc_pending(io);
@@ -2107,6 +2153,14 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
static void kcryptd_crypt_read_done(struct dm_crypt_io *io)
{
+ if (io->ctx.aead_recheck) {
+ if (!io->error) {
+ io->ctx.bio_in->bi_iter = io->saved_bi_iter;
+ bio_copy_data(io->base_bio, io->ctx.bio_in);
+ }
+ crypt_free_buffer_pages(io->cc, io->ctx.bio_in);
+ bio_put(io->ctx.bio_in);
+ }
crypt_dec_pending(io);
}
@@ -2136,11 +2190,17 @@ static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
crypt_inc_pending(io);
- crypt_convert_init(cc, &io->ctx, io->base_bio, io->base_bio,
- io->sector);
+ if (io->ctx.aead_recheck) {
+ io->ctx.cc_sector = io->sector + cc->iv_offset;
+ r = crypt_convert(cc, &io->ctx,
+ test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags), true);
+ } else {
+ crypt_convert_init(cc, &io->ctx, io->base_bio, io->base_bio,
+ io->sector);
- r = crypt_convert(cc, &io->ctx,
- test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags), true);
+ r = crypt_convert(cc, &io->ctx,
+ test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags), true);
+ }
/*
* Crypto API backlogged the request, because its queue was full
* and we're in softirq context, so continue from a workqueue
@@ -2182,10 +2242,13 @@ static void kcryptd_async_done(void *data, int error)
if (error == -EBADMSG) {
sector_t s = le64_to_cpu(*org_sector_of_dmreq(cc, dmreq));
- DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu",
- ctx->bio_in->bi_bdev, s);
- dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead",
- ctx->bio_in, s, 0);
+ ctx->aead_failed = true;
+ if (ctx->aead_recheck) {
+ DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu",
+ ctx->bio_in->bi_bdev, s);
+ dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead",
+ ctx->bio_in, s, 0);
+ }
io->error = BLK_STS_PROTECTION;
} else if (error < 0)
io->error = BLK_STS_IOERR;
@@ -3110,7 +3173,7 @@ static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar
sval = strchr(opt_string + strlen("integrity:"), ':') + 1;
if (!strcasecmp(sval, "aead")) {
set_bit(CRYPT_MODE_INTEGRITY_AEAD, &cc->cipher_flags);
- } else if (strcasecmp(sval, "none")) {
+ } else if (strcasecmp(sval, "none")) {
ti->error = "Unknown integrity profile";
return -EINVAL;
}
@@ -3639,7 +3702,7 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
static struct target_type crypt_target = {
.name = "crypt",
- .version = {1, 24, 0},
+ .version = {1, 25, 0},
.module = THIS_MODULE,
.ctr = crypt_ctr,
.dtr = crypt_dtr,
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index c5f03aa..1fc901d 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -278,6 +278,8 @@ struct dm_integrity_c {
atomic64_t number_of_mismatches;
+ mempool_t recheck_pool;
+
struct notifier_block reboot_notifier;
};
@@ -1689,6 +1691,77 @@ static void integrity_sector_checksum(struct dm_integrity_c *ic, sector_t sector
get_random_bytes(result, ic->tag_size);
}
+static noinline void integrity_recheck(struct dm_integrity_io *dio, char *checksum)
+{
+ struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
+ struct dm_integrity_c *ic = dio->ic;
+ struct bvec_iter iter;
+ struct bio_vec bv;
+ sector_t sector, logical_sector, area, offset;
+ struct page *page;
+ void *buffer;
+
+ get_area_and_offset(ic, dio->range.logical_sector, &area, &offset);
+ dio->metadata_block = get_metadata_sector_and_offset(ic, area, offset,
+ &dio->metadata_offset);
+ sector = get_data_sector(ic, area, offset);
+ logical_sector = dio->range.logical_sector;
+
+ page = mempool_alloc(&ic->recheck_pool, GFP_NOIO);
+ buffer = page_to_virt(page);
+
+ __bio_for_each_segment(bv, bio, iter, dio->bio_details.bi_iter) {
+ unsigned pos = 0;
+
+ do {
+ char *mem;
+ int r;
+ struct dm_io_request io_req;
+ struct dm_io_region io_loc;
+ io_req.bi_opf = REQ_OP_READ;
+ io_req.mem.type = DM_IO_KMEM;
+ io_req.mem.ptr.addr = buffer;
+ io_req.notify.fn = NULL;
+ io_req.client = ic->io;
+ io_loc.bdev = ic->dev->bdev;
+ io_loc.sector = sector;
+ io_loc.count = ic->sectors_per_block;
+
+ r = dm_io(&io_req, 1, &io_loc, NULL);
+ if (unlikely(r)) {
+ dio->bi_status = errno_to_blk_status(r);
+ goto free_ret;
+ }
+
+ integrity_sector_checksum(ic, logical_sector, buffer, checksum);
+ r = dm_integrity_rw_tag(ic, checksum, &dio->metadata_block,
+ &dio->metadata_offset, ic->tag_size, TAG_CMP);
+ if (r) {
+ if (r > 0) {
+ DMERR_LIMIT("%pg: Checksum failed at sector 0x%llx",
+ bio->bi_bdev, logical_sector);
+ atomic64_inc(&ic->number_of_mismatches);
+ dm_audit_log_bio(DM_MSG_PREFIX, "integrity-checksum",
+ bio, logical_sector, 0);
+ r = -EILSEQ;
+ }
+ dio->bi_status = errno_to_blk_status(r);
+ goto free_ret;
+ }
+
+ mem = bvec_kmap_local(&bv);
+ memcpy(mem + pos, buffer, ic->sectors_per_block << SECTOR_SHIFT);
+ kunmap_local(mem);
+
+ pos += ic->sectors_per_block << SECTOR_SHIFT;
+ sector += ic->sectors_per_block;
+ logical_sector += ic->sectors_per_block;
+ } while (pos < bv.bv_len);
+ }
+free_ret:
+ mempool_free(page, &ic->recheck_pool);
+}
+
static void integrity_metadata(struct work_struct *w)
{
struct dm_integrity_io *dio = container_of(w, struct dm_integrity_io, work);
@@ -1776,15 +1849,8 @@ static void integrity_metadata(struct work_struct *w)
checksums_ptr - checksums, dio->op == REQ_OP_READ ? TAG_CMP : TAG_WRITE);
if (unlikely(r)) {
if (r > 0) {
- sector_t s;
-
- s = sector - ((r + ic->tag_size - 1) / ic->tag_size);
- DMERR_LIMIT("%pg: Checksum failed at sector 0x%llx",
- bio->bi_bdev, s);
- r = -EILSEQ;
- atomic64_inc(&ic->number_of_mismatches);
- dm_audit_log_bio(DM_MSG_PREFIX, "integrity-checksum",
- bio, s, 0);
+ integrity_recheck(dio, checksums);
+ goto skip_io;
}
if (likely(checksums != checksums_onstack))
kfree(checksums);
@@ -4261,6 +4327,12 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned int argc, char **argv
goto bad;
}
+ r = mempool_init_page_pool(&ic->recheck_pool, 1, 0);
+ if (r) {
+ ti->error = "Cannot allocate mempool";
+ goto bad;
+ }
+
ic->metadata_wq = alloc_workqueue("dm-integrity-metadata",
WQ_MEM_RECLAIM, METADATA_WORKQUEUE_MAX_ACTIVE);
if (!ic->metadata_wq) {
@@ -4609,6 +4681,7 @@ static void dm_integrity_dtr(struct dm_target *ti)
kvfree(ic->bbs);
if (ic->bufio)
dm_bufio_client_destroy(ic->bufio);
+ mempool_exit(&ic->recheck_pool);
mempool_exit(&ic->journal_io_mempool);
if (ic->io)
dm_io_client_destroy(ic->io);
@@ -4661,7 +4734,7 @@ static void dm_integrity_dtr(struct dm_target *ti)
static struct target_type integrity_target = {
.name = "integrity",
- .version = {1, 10, 0},
+ .version = {1, 11, 0},
.module = THIS_MODULE,
.features = DM_TARGET_SINGLETON | DM_TARGET_INTEGRITY,
.ctr = dm_integrity_ctr,
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index 82662f5..1b591bf 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -482,6 +482,63 @@ int verity_for_bv_block(struct dm_verity *v, struct dm_verity_io *io,
return 0;
}
+static int verity_recheck_copy(struct dm_verity *v, struct dm_verity_io *io,
+ u8 *data, size_t len)
+{
+ memcpy(data, io->recheck_buffer, len);
+ io->recheck_buffer += len;
+
+ return 0;
+}
+
+static noinline int verity_recheck(struct dm_verity *v, struct dm_verity_io *io,
+ struct bvec_iter start, sector_t cur_block)
+{
+ struct page *page;
+ void *buffer;
+ int r;
+ struct dm_io_request io_req;
+ struct dm_io_region io_loc;
+
+ page = mempool_alloc(&v->recheck_pool, GFP_NOIO);
+ buffer = page_to_virt(page);
+
+ io_req.bi_opf = REQ_OP_READ;
+ io_req.mem.type = DM_IO_KMEM;
+ io_req.mem.ptr.addr = buffer;
+ io_req.notify.fn = NULL;
+ io_req.client = v->io;
+ io_loc.bdev = v->data_dev->bdev;
+ io_loc.sector = cur_block << (v->data_dev_block_bits - SECTOR_SHIFT);
+ io_loc.count = 1 << (v->data_dev_block_bits - SECTOR_SHIFT);
+ r = dm_io(&io_req, 1, &io_loc, NULL);
+ if (unlikely(r))
+ goto free_ret;
+
+ r = verity_hash(v, verity_io_hash_req(v, io), buffer,
+ 1 << v->data_dev_block_bits,
+ verity_io_real_digest(v, io), true);
+ if (unlikely(r))
+ goto free_ret;
+
+ if (memcmp(verity_io_real_digest(v, io),
+ verity_io_want_digest(v, io), v->digest_size)) {
+ r = -EIO;
+ goto free_ret;
+ }
+
+ io->recheck_buffer = buffer;
+ r = verity_for_bv_block(v, io, &start, verity_recheck_copy);
+ if (unlikely(r))
+ goto free_ret;
+
+ r = 0;
+free_ret:
+ mempool_free(page, &v->recheck_pool);
+
+ return r;
+}
+
static int verity_bv_zero(struct dm_verity *v, struct dm_verity_io *io,
u8 *data, size_t len)
{
@@ -508,9 +565,7 @@ static int verity_verify_io(struct dm_verity_io *io)
{
bool is_zero;
struct dm_verity *v = io->v;
-#if defined(CONFIG_DM_VERITY_FEC)
struct bvec_iter start;
-#endif
struct bvec_iter iter_copy;
struct bvec_iter *iter;
struct crypto_wait wait;
@@ -561,10 +616,7 @@ static int verity_verify_io(struct dm_verity_io *io)
if (unlikely(r < 0))
return r;
-#if defined(CONFIG_DM_VERITY_FEC)
- if (verity_fec_is_enabled(v))
- start = *iter;
-#endif
+ start = *iter;
r = verity_for_io_block(v, io, iter, &wait);
if (unlikely(r < 0))
return r;
@@ -586,6 +638,10 @@ static int verity_verify_io(struct dm_verity_io *io)
* tasklet since it may sleep, so fallback to work-queue.
*/
return -EAGAIN;
+ } else if (verity_recheck(v, io, start, cur_block) == 0) {
+ if (v->validated_blocks)
+ set_bit(cur_block, v->validated_blocks);
+ continue;
#if defined(CONFIG_DM_VERITY_FEC)
} else if (verity_fec_decode(v, io, DM_VERITY_BLOCK_TYPE_DATA,
cur_block, NULL, &start) == 0) {
@@ -941,6 +997,10 @@ static void verity_dtr(struct dm_target *ti)
if (v->verify_wq)
destroy_workqueue(v->verify_wq);
+ mempool_exit(&v->recheck_pool);
+ if (v->io)
+ dm_io_client_destroy(v->io);
+
if (v->bufio)
dm_bufio_client_destroy(v->bufio);
@@ -1379,6 +1439,20 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
}
v->hash_blocks = hash_position;
+ r = mempool_init_page_pool(&v->recheck_pool, 1, 0);
+ if (unlikely(r)) {
+ ti->error = "Cannot allocate mempool";
+ goto bad;
+ }
+
+ v->io = dm_io_client_create();
+ if (IS_ERR(v->io)) {
+ r = PTR_ERR(v->io);
+ v->io = NULL;
+ ti->error = "Cannot allocate dm io";
+ goto bad;
+ }
+
v->bufio = dm_bufio_client_create(v->hash_dev->bdev,
1 << v->hash_dev_block_bits, 1, sizeof(struct buffer_aux),
dm_bufio_alloc_callback, NULL,
@@ -1486,7 +1560,7 @@ int dm_verity_get_root_digest(struct dm_target *ti, u8 **root_digest, unsigned i
static struct target_type verity_target = {
.name = "verity",
.features = DM_TARGET_IMMUTABLE,
- .version = {1, 9, 0},
+ .version = {1, 10, 0},
.module = THIS_MODULE,
.ctr = verity_ctr,
.dtr = verity_dtr,
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
index f3f6070..db93a91 100644
--- a/drivers/md/dm-verity.h
+++ b/drivers/md/dm-verity.h
@@ -11,6 +11,7 @@
#ifndef DM_VERITY_H
#define DM_VERITY_H
+#include <linux/dm-io.h>
#include <linux/dm-bufio.h>
#include <linux/device-mapper.h>
#include <linux/interrupt.h>
@@ -68,6 +69,9 @@ struct dm_verity {
unsigned long *validated_blocks; /* bitset blocks validated */
char *signature_key_desc; /* signature keyring reference */
+
+ struct dm_io_client *io;
+ mempool_t recheck_pool;
};
struct dm_verity_io {
@@ -76,14 +80,16 @@ struct dm_verity_io {
/* original value of bio->bi_end_io */
bio_end_io_t *orig_bi_end_io;
+ struct bvec_iter iter;
+
sector_t block;
unsigned int n_blocks;
bool in_tasklet;
- struct bvec_iter iter;
-
struct work_struct work;
+ char *recheck_buffer;
+
/*
* Three variably-size fields follow this struct:
*
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b5e6a10..447e132 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -726,7 +726,8 @@ static struct table_device *open_table_device(struct mapped_device *md,
dev_t dev, blk_mode_t mode)
{
struct table_device *td;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
+ struct block_device *bdev;
u64 part_off;
int r;
@@ -735,34 +736,36 @@ static struct table_device *open_table_device(struct mapped_device *md,
return ERR_PTR(-ENOMEM);
refcount_set(&td->count, 1);
- bdev_handle = bdev_open_by_dev(dev, mode, _dm_claim_ptr, NULL);
- if (IS_ERR(bdev_handle)) {
- r = PTR_ERR(bdev_handle);
+ bdev_file = bdev_file_open_by_dev(dev, mode, _dm_claim_ptr, NULL);
+ if (IS_ERR(bdev_file)) {
+ r = PTR_ERR(bdev_file);
goto out_free_td;
}
+ bdev = file_bdev(bdev_file);
+
/*
* We can be called before the dm disk is added. In that case we can't
* register the holder relation here. It will be done once add_disk was
* called.
*/
if (md->disk->slave_dir) {
- r = bd_link_disk_holder(bdev_handle->bdev, md->disk);
+ r = bd_link_disk_holder(bdev, md->disk);
if (r)
goto out_blkdev_put;
}
td->dm_dev.mode = mode;
- td->dm_dev.bdev = bdev_handle->bdev;
- td->dm_dev.bdev_handle = bdev_handle;
- td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev_handle->bdev, &part_off,
+ td->dm_dev.bdev = bdev;
+ td->dm_dev.bdev_file = bdev_file;
+ td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev, &part_off,
NULL, NULL);
format_dev_t(td->dm_dev.name, dev);
list_add(&td->list, &md->table_devices);
return td;
out_blkdev_put:
- bdev_release(bdev_handle);
+ fput(bdev_file);
out_free_td:
kfree(td);
return ERR_PTR(r);
@@ -775,7 +778,7 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
{
if (md->disk->slave_dir)
bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
- bdev_release(td->dm_dev.bdev_handle);
+ fput(td->dm_dev.bdev_file);
put_dax(td->dm_dev.dax_dev);
list_del(&td->list);
kfree(td);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7d7b982..e575e74 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -587,8 +587,12 @@ static void submit_flushes(struct work_struct *ws)
rcu_read_lock();
}
rcu_read_unlock();
- if (atomic_dec_and_test(&mddev->flush_pending))
+ if (atomic_dec_and_test(&mddev->flush_pending)) {
+ /* The pair is percpu_ref_get() from md_flush_request() */
+ percpu_ref_put(&mddev->active_io);
+
queue_work(md_wq, &mddev->flush_work);
+ }
}
static void md_submit_flush_data(struct work_struct *ws)
@@ -2587,7 +2591,7 @@ static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
if (test_bit(AutoDetected, &rdev->flags))
md_autodetect_dev(rdev->bdev->bd_dev);
#endif
- bdev_release(rdev->bdev_handle);
+ fput(rdev->bdev_file);
rdev->bdev = NULL;
kobject_put(&rdev->kobj);
}
@@ -3774,16 +3778,16 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
if (err)
goto out_clear_rdev;
- rdev->bdev_handle = bdev_open_by_dev(newdev,
+ rdev->bdev_file = bdev_file_open_by_dev(newdev,
BLK_OPEN_READ | BLK_OPEN_WRITE,
super_format == -2 ? &claim_rdev : rdev, NULL);
- if (IS_ERR(rdev->bdev_handle)) {
+ if (IS_ERR(rdev->bdev_file)) {
pr_warn("md: could not open device unknown-block(%u,%u).\n",
MAJOR(newdev), MINOR(newdev));
- err = PTR_ERR(rdev->bdev_handle);
+ err = PTR_ERR(rdev->bdev_file);
goto out_clear_rdev;
}
- rdev->bdev = rdev->bdev_handle->bdev;
+ rdev->bdev = file_bdev(rdev->bdev_file);
kobject_init(&rdev->kobj, &rdev_ktype);
@@ -3814,7 +3818,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
return rdev;
out_blkdev_put:
- bdev_release(rdev->bdev_handle);
+ fput(rdev->bdev_file);
out_clear_rdev:
md_rdev_clear(rdev);
out_free_rdev:
@@ -8847,12 +8851,16 @@ void md_do_sync(struct md_thread *thread)
int ret;
/* just incase thread restarts... */
- if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
- test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
+ if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
return;
- if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
+
+ if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+ goto skip;
+
+ if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
+ !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- return;
+ goto skip;
}
if (mddev_is_clustered(mddev)) {
@@ -9432,13 +9440,19 @@ static void md_start_sync(struct work_struct *ws)
struct mddev *mddev = container_of(ws, struct mddev, sync_work);
int spares = 0;
bool suspend = false;
+ char *name;
- if (md_spares_need_change(mddev))
+ /*
+ * If reshape is still in progress, spares won't be added or removed
+ * from conf until reshape is done.
+ */
+ if (mddev->reshape_position == MaxSector &&
+ md_spares_need_change(mddev)) {
suspend = true;
+ mddev_suspend(mddev, false);
+ }
- suspend ? mddev_suspend_and_lock_nointr(mddev) :
- mddev_lock_nointr(mddev);
-
+ mddev_lock_nointr(mddev);
if (!md_is_rdwr(mddev)) {
/*
* On a read-only array we can:
@@ -9464,8 +9478,10 @@ static void md_start_sync(struct work_struct *ws)
if (spares)
md_bitmap_write_all(mddev->bitmap);
+ name = test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ?
+ "reshape" : "resync";
rcu_assign_pointer(mddev->sync_thread,
- md_register_thread(md_do_sync, mddev, "resync"));
+ md_register_thread(md_do_sync, mddev, name));
if (!mddev->sync_thread) {
pr_warn("%s: could not start resync thread...\n",
mdname(mddev));
@@ -9509,6 +9525,20 @@ static void md_start_sync(struct work_struct *ws)
sysfs_notify_dirent_safe(mddev->sysfs_action);
}
+static void unregister_sync_thread(struct mddev *mddev)
+{
+ if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
+ /* resync/recovery still happening */
+ clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ return;
+ }
+
+ if (WARN_ON_ONCE(!mddev->sync_thread))
+ return;
+
+ md_reap_sync_thread(mddev);
+}
+
/*
* This routine is regularly called by all per-raid-array threads to
* deal with generic issues like resync and super-block update.
@@ -9533,9 +9563,6 @@ static void md_start_sync(struct work_struct *ws)
*/
void md_check_recovery(struct mddev *mddev)
{
- if (READ_ONCE(mddev->suspended))
- return;
-
if (mddev->bitmap)
md_bitmap_daemon_work(mddev);
@@ -9549,7 +9576,8 @@ void md_check_recovery(struct mddev *mddev)
}
if (!md_is_rdwr(mddev) &&
- !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+ !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) &&
+ !test_bit(MD_RECOVERY_DONE, &mddev->recovery))
return;
if ( ! (
(mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
@@ -9571,8 +9599,7 @@ void md_check_recovery(struct mddev *mddev)
struct md_rdev *rdev;
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
- /* sync_work already queued. */
- clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ unregister_sync_thread(mddev);
goto unlock;
}
@@ -9635,16 +9662,7 @@ void md_check_recovery(struct mddev *mddev)
* still set.
*/
if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
- if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
- /* resync/recovery still happening */
- clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
- goto unlock;
- }
-
- if (WARN_ON_ONCE(!mddev->sync_thread))
- goto unlock;
-
- md_reap_sync_thread(mddev);
+ unregister_sync_thread(mddev);
goto unlock;
}
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 67e50c4..097d9db 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -60,7 +60,7 @@ struct md_rdev {
*/
struct block_device *meta_bdev;
struct block_device *bdev; /* block device handle */
- struct bdev_handle *bdev_handle; /* Handle from open for bdev */
+ struct file *bdev_file; /* Handle from open for bdev */
struct page *sb_page, *bb_page;
int sb_loaded;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b0fd300..a4556d2 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4149,11 +4149,7 @@ static int raid10_run(struct mddev *mddev)
clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
- set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
- rcu_assign_pointer(mddev->sync_thread,
- md_register_thread(md_do_sync, mddev, "reshape"));
- if (!mddev->sync_thread)
- goto out_free_conf;
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
}
return 0;
@@ -4547,16 +4543,8 @@ static int raid10_start_reshape(struct mddev *mddev)
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
- set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-
- rcu_assign_pointer(mddev->sync_thread,
- md_register_thread(md_do_sync, mddev, "reshape"));
- if (!mddev->sync_thread) {
- ret = -EAGAIN;
- goto abort;
- }
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
conf->reshape_checkpoint = jiffies;
- md_wakeup_thread(mddev->sync_thread);
md_new_event();
return 0;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f2e3c3e..d874abf 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8000,11 +8000,7 @@ static int raid5_run(struct mddev *mddev)
clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
- set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
- rcu_assign_pointer(mddev->sync_thread,
- md_register_thread(md_do_sync, mddev, "reshape"));
- if (!mddev->sync_thread)
- goto abort;
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
}
/* Ok, everything is just fine now */
@@ -8514,29 +8510,8 @@ static int raid5_start_reshape(struct mddev *mddev)
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
- set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
- rcu_assign_pointer(mddev->sync_thread,
- md_register_thread(md_do_sync, mddev, "reshape"));
- if (!mddev->sync_thread) {
- mddev->recovery = 0;
- spin_lock_irq(&conf->device_lock);
- write_seqcount_begin(&conf->gen_lock);
- mddev->raid_disks = conf->raid_disks = conf->previous_raid_disks;
- mddev->new_chunk_sectors =
- conf->chunk_sectors = conf->prev_chunk_sectors;
- mddev->new_layout = conf->algorithm = conf->prev_algo;
- rdev_for_each(rdev, mddev)
- rdev->new_data_offset = rdev->data_offset;
- smp_wmb();
- conf->generation --;
- conf->reshape_progress = MaxSector;
- mddev->reshape_position = MaxSector;
- write_seqcount_end(&conf->gen_lock);
- spin_unlock_irq(&conf->device_lock);
- return -EAGAIN;
- }
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
conf->reshape_checkpoint = jiffies;
- md_wakeup_thread(mddev->sync_thread);
md_new_event();
return 0;
}
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
index aebd3c1..c381c22 100644
--- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
+++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
@@ -725,6 +725,9 @@ irqreturn_t rkisp1_capture_isr(int irq, void *ctx)
unsigned int i;
u32 status;
+ if (!rkisp1->irqs_enabled)
+ return IRQ_NONE;
+
status = rkisp1_read(rkisp1, RKISP1_CIF_MI_MIS);
if (!status)
return IRQ_NONE;
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-common.h b/drivers/media/platform/rockchip/rkisp1/rkisp1-common.h
index 4b6b28c..b757f75 100644
--- a/drivers/media/platform/rockchip/rkisp1/rkisp1-common.h
+++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-common.h
@@ -450,6 +450,7 @@ struct rkisp1_debug {
* @debug: debug params to be exposed on debugfs
* @info: version-specific ISP information
* @irqs: IRQ line numbers
+ * @irqs_enabled: the hardware is enabled and can cause interrupts
*/
struct rkisp1_device {
void __iomem *base_addr;
@@ -471,6 +472,7 @@ struct rkisp1_device {
struct rkisp1_debug debug;
const struct rkisp1_info *info;
int irqs[RKISP1_NUM_IRQS];
+ bool irqs_enabled;
};
/*
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-csi.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-csi.c
index b6e47e2..4202642 100644
--- a/drivers/media/platform/rockchip/rkisp1/rkisp1-csi.c
+++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-csi.c
@@ -196,6 +196,9 @@ irqreturn_t rkisp1_csi_isr(int irq, void *ctx)
struct rkisp1_device *rkisp1 = dev_get_drvdata(dev);
u32 val, status;
+ if (!rkisp1->irqs_enabled)
+ return IRQ_NONE;
+
status = rkisp1_read(rkisp1, RKISP1_CIF_MIPI_MIS);
if (!status)
return IRQ_NONE;
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-dev.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-dev.c
index f96f821..73cf08a 100644
--- a/drivers/media/platform/rockchip/rkisp1/rkisp1-dev.c
+++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-dev.c
@@ -305,6 +305,24 @@ static int __maybe_unused rkisp1_runtime_suspend(struct device *dev)
{
struct rkisp1_device *rkisp1 = dev_get_drvdata(dev);
+ rkisp1->irqs_enabled = false;
+ /* Make sure the IRQ handler will see the above */
+ mb();
+
+ /*
+ * Wait until any running IRQ handler has returned. The IRQ handler
+ * may get called even after this (as it's a shared interrupt line)
+ * but the 'irqs_enabled' flag will make the handler return immediately.
+ */
+ for (unsigned int il = 0; il < ARRAY_SIZE(rkisp1->irqs); ++il) {
+ if (rkisp1->irqs[il] == -1)
+ continue;
+
+ /* Skip if the irq line is the same as previous */
+ if (il == 0 || rkisp1->irqs[il - 1] != rkisp1->irqs[il])
+ synchronize_irq(rkisp1->irqs[il]);
+ }
+
clk_bulk_disable_unprepare(rkisp1->clk_size, rkisp1->clks);
return pinctrl_pm_select_sleep_state(dev);
}
@@ -321,6 +339,10 @@ static int __maybe_unused rkisp1_runtime_resume(struct device *dev)
if (ret)
return ret;
+ rkisp1->irqs_enabled = true;
+ /* Make sure the IRQ handler will see the above */
+ mb();
+
return 0;
}
@@ -559,7 +581,7 @@ static int rkisp1_probe(struct platform_device *pdev)
rkisp1->irqs[il] = irq;
}
- ret = devm_request_irq(dev, irq, info->isrs[i].isr, 0,
+ ret = devm_request_irq(dev, irq, info->isrs[i].isr, IRQF_SHARED,
dev_driver_string(dev), dev);
if (ret) {
dev_err(dev, "request irq failed: %d\n", ret);
diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-isp.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-isp.c
index f00873d..78a1f7a 100644
--- a/drivers/media/platform/rockchip/rkisp1/rkisp1-isp.c
+++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-isp.c
@@ -976,6 +976,9 @@ irqreturn_t rkisp1_isp_isr(int irq, void *ctx)
struct rkisp1_device *rkisp1 = dev_get_drvdata(dev);
u32 status, isp_err;
+ if (!rkisp1->irqs_enabled)
+ return IRQ_NONE;
+
status = rkisp1_read(rkisp1, RKISP1_CIF_ISP_MIS);
if (!status)
return IRQ_NONE;
diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
index 2afe67f..74d69ce 100644
--- a/drivers/media/rc/Kconfig
+++ b/drivers/media/rc/Kconfig
@@ -319,6 +319,7 @@
tristate "PWM IR transmitter"
depends on LIRC
depends on PWM
+ depends on HIGH_RES_TIMERS
depends on OF
help
Say Y if you want to use a PWM based IR transmitter. This is
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index fe17c7f..52d82cb 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -253,7 +253,7 @@ int lirc_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
if (attr->attach_flags)
return -EINVAL;
- rcdev = rc_dev_get_from_fd(attr->target_fd);
+ rcdev = rc_dev_get_from_fd(attr->target_fd, true);
if (IS_ERR(rcdev))
return PTR_ERR(rcdev);
@@ -278,7 +278,7 @@ int lirc_prog_detach(const union bpf_attr *attr)
if (IS_ERR(prog))
return PTR_ERR(prog);
- rcdev = rc_dev_get_from_fd(attr->target_fd);
+ rcdev = rc_dev_get_from_fd(attr->target_fd, true);
if (IS_ERR(rcdev)) {
bpf_prog_put(prog);
return PTR_ERR(rcdev);
@@ -303,7 +303,7 @@ int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
if (attr->query.query_flags)
return -EINVAL;
- rcdev = rc_dev_get_from_fd(attr->query.target_fd);
+ rcdev = rc_dev_get_from_fd(attr->query.target_fd, false);
if (IS_ERR(rcdev))
return PTR_ERR(rcdev);
diff --git a/drivers/media/rc/ir_toy.c b/drivers/media/rc/ir_toy.c
index 1968067..69e630d 100644
--- a/drivers/media/rc/ir_toy.c
+++ b/drivers/media/rc/ir_toy.c
@@ -332,6 +332,7 @@ static int irtoy_tx(struct rc_dev *rc, uint *txbuf, uint count)
sizeof(COMMAND_SMODE_EXIT), STATE_COMMAND_NO_RESP);
if (err) {
dev_err(irtoy->dev, "exit sample mode: %d\n", err);
+ kfree(buf);
return err;
}
@@ -339,6 +340,7 @@ static int irtoy_tx(struct rc_dev *rc, uint *txbuf, uint count)
sizeof(COMMAND_SMODE_ENTER), STATE_COMMAND);
if (err) {
dev_err(irtoy->dev, "enter sample mode: %d\n", err);
+ kfree(buf);
return err;
}
diff --git a/drivers/media/rc/lirc_dev.c b/drivers/media/rc/lirc_dev.c
index a537734..caad59f 100644
--- a/drivers/media/rc/lirc_dev.c
+++ b/drivers/media/rc/lirc_dev.c
@@ -814,7 +814,7 @@ void __exit lirc_dev_exit(void)
unregister_chrdev_region(lirc_base_dev, RC_DEV_MAX);
}
-struct rc_dev *rc_dev_get_from_fd(int fd)
+struct rc_dev *rc_dev_get_from_fd(int fd, bool write)
{
struct fd f = fdget(fd);
struct lirc_fh *fh;
@@ -828,6 +828,9 @@ struct rc_dev *rc_dev_get_from_fd(int fd)
return ERR_PTR(-EINVAL);
}
+ if (write && !(f.file->f_mode & FMODE_WRITE))
+ return ERR_PTR(-EPERM);
+
fh = f.file->private_data;
dev = fh->rc;
diff --git a/drivers/media/rc/rc-core-priv.h b/drivers/media/rc/rc-core-priv.h
index ef1e95e..7df949f 100644
--- a/drivers/media/rc/rc-core-priv.h
+++ b/drivers/media/rc/rc-core-priv.h
@@ -325,7 +325,7 @@ void lirc_raw_event(struct rc_dev *dev, struct ir_raw_event ev);
void lirc_scancode_event(struct rc_dev *dev, struct lirc_scancode *lsc);
int lirc_register(struct rc_dev *dev);
void lirc_unregister(struct rc_dev *dev);
-struct rc_dev *rc_dev_get_from_fd(int fd);
+struct rc_dev *rc_dev_get_from_fd(int fd, bool write);
#else
static inline int lirc_dev_init(void) { return 0; }
static inline void lirc_dev_exit(void) {}
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 03319a1f..dbd26c3 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -263,7 +263,6 @@ struct fastrpc_channel_ctx {
int domain_id;
int sesscount;
int vmcount;
- u64 perms;
struct qcom_scm_vmperm vmperms[FASTRPC_MAX_VMIDS];
struct rpmsg_device *rpdev;
struct fastrpc_session_ctx session[FASTRPC_MAX_SESSIONS];
@@ -1279,9 +1278,11 @@ static int fastrpc_init_create_static_process(struct fastrpc_user *fl,
/* Map if we have any heap VMIDs associated with this ADSP Static Process. */
if (fl->cctx->vmcount) {
+ u64 src_perms = BIT(QCOM_SCM_VMID_HLOS);
+
err = qcom_scm_assign_mem(fl->cctx->remote_heap->phys,
(u64)fl->cctx->remote_heap->size,
- &fl->cctx->perms,
+ &src_perms,
fl->cctx->vmperms, fl->cctx->vmcount);
if (err) {
dev_err(fl->sctx->dev, "Failed to assign memory with phys 0x%llx size 0x%llx err %d",
@@ -1915,8 +1916,10 @@ static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
/* Add memory to static PD pool, protection thru hypervisor */
if (req.flags == ADSP_MMAP_REMOTE_HEAP_ADDR && fl->cctx->vmcount) {
+ u64 src_perms = BIT(QCOM_SCM_VMID_HLOS);
+
err = qcom_scm_assign_mem(buf->phys, (u64)buf->size,
- &fl->cctx->perms, fl->cctx->vmperms, fl->cctx->vmcount);
+ &src_perms, fl->cctx->vmperms, fl->cctx->vmcount);
if (err) {
dev_err(fl->sctx->dev, "Failed to assign memory phys 0x%llx size 0x%llx err %d",
buf->phys, buf->size, err);
@@ -2290,7 +2293,6 @@ static int fastrpc_rpmsg_probe(struct rpmsg_device *rpdev)
if (vmcount) {
data->vmcount = vmcount;
- data->perms = BIT(QCOM_SCM_VMID_HLOS);
for (i = 0; i < data->vmcount; i++) {
data->vmperms[i].vmid = vmids[i];
data->vmperms[i].perm = QCOM_SCM_PERM_RWX;
diff --git a/drivers/misc/lis3lv02d/lis3lv02d_i2c.c b/drivers/misc/lis3lv02d/lis3lv02d_i2c.c
index c6eb27d..1511958 100644
--- a/drivers/misc/lis3lv02d/lis3lv02d_i2c.c
+++ b/drivers/misc/lis3lv02d/lis3lv02d_i2c.c
@@ -198,8 +198,14 @@ static int lis3lv02d_i2c_suspend(struct device *dev)
struct i2c_client *client = to_i2c_client(dev);
struct lis3lv02d *lis3 = i2c_get_clientdata(client);
- if (!lis3->pdata || !lis3->pdata->wakeup_flags)
+ /* Turn on for wakeup if turned off by runtime suspend */
+ if (lis3->pdata && lis3->pdata->wakeup_flags) {
+ if (pm_runtime_suspended(dev))
+ lis3lv02d_poweron(lis3);
+ /* For non wakeup turn off if not already turned off by runtime suspend */
+ } else if (!pm_runtime_suspended(dev))
lis3lv02d_poweroff(lis3);
+
return 0;
}
@@ -208,13 +214,12 @@ static int lis3lv02d_i2c_resume(struct device *dev)
struct i2c_client *client = to_i2c_client(dev);
struct lis3lv02d *lis3 = i2c_get_clientdata(client);
- /*
- * pm_runtime documentation says that devices should always
- * be powered on at resume. Pm_runtime turns them off after system
- * wide resume is complete.
- */
- if (!lis3->pdata || !lis3->pdata->wakeup_flags ||
- pm_runtime_suspended(dev))
+ /* Turn back off if turned on for wakeup and runtime suspended*/
+ if (lis3->pdata && lis3->pdata->wakeup_flags) {
+ if (pm_runtime_suspended(dev))
+ lis3lv02d_poweroff(lis3);
+ /* For non wakeup turn back on if not runtime suspended */
+ } else if (!pm_runtime_suspended(dev))
lis3lv02d_poweron(lis3);
return 0;
diff --git a/drivers/misc/mei/gsc_proxy/mei_gsc_proxy.c b/drivers/misc/mei/gsc_proxy/mei_gsc_proxy.c
index be52b11..89364bd 100644
--- a/drivers/misc/mei/gsc_proxy/mei_gsc_proxy.c
+++ b/drivers/misc/mei/gsc_proxy/mei_gsc_proxy.c
@@ -96,7 +96,8 @@ static const struct component_master_ops mei_component_master_ops = {
*
* The function checks if the device is pci device and
* Intel VGA adapter, the subcomponent is SW Proxy
- * and the parent of MEI PCI and the parent of VGA are the same PCH device.
+ * and the VGA is on the bus 0 reserved for built-in devices
+ * to reject discrete GFX.
*
* @dev: master device
* @subcomponent: subcomponent to match (I915_COMPONENT_SWPROXY)
@@ -123,7 +124,8 @@ static int mei_gsc_proxy_component_match(struct device *dev, int subcomponent,
if (subcomponent != I915_COMPONENT_GSC_PROXY)
return 0;
- return component_compare_dev(dev->parent, ((struct device *)data)->parent);
+ /* Only built-in GFX */
+ return (pdev->bus->number == 0);
}
static int mei_gsc_proxy_probe(struct mei_cl_device *cldev,
@@ -146,7 +148,7 @@ static int mei_gsc_proxy_probe(struct mei_cl_device *cldev,
}
component_match_add_typed(&cldev->dev, &master_match,
- mei_gsc_proxy_component_match, cldev->dev.parent);
+ mei_gsc_proxy_component_match, NULL);
if (IS_ERR_OR_NULL(master_match)) {
ret = -ENOMEM;
goto err_exit;
diff --git a/drivers/misc/mei/hw-me-regs.h b/drivers/misc/mei/hw-me-regs.h
index 961e5d5..aac3675 100644
--- a/drivers/misc/mei/hw-me-regs.h
+++ b/drivers/misc/mei/hw-me-regs.h
@@ -112,6 +112,8 @@
#define MEI_DEV_ID_RPL_S 0x7A68 /* Raptor Lake Point S */
#define MEI_DEV_ID_MTL_M 0x7E70 /* Meteor Lake Point M */
+#define MEI_DEV_ID_ARL_S 0x7F68 /* Arrow Lake Point S */
+#define MEI_DEV_ID_ARL_H 0x7770 /* Arrow Lake Point H */
/*
* MEI HW Section
diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c
index 676d566..8cf636c 100644
--- a/drivers/misc/mei/pci-me.c
+++ b/drivers/misc/mei/pci-me.c
@@ -119,6 +119,8 @@ static const struct pci_device_id mei_me_pci_tbl[] = {
{MEI_PCI_DEVICE(MEI_DEV_ID_RPL_S, MEI_ME_PCH15_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_MTL_M, MEI_ME_PCH15_CFG)},
+ {MEI_PCI_DEVICE(MEI_DEV_ID_ARL_S, MEI_ME_PCH15_CFG)},
+ {MEI_PCI_DEVICE(MEI_DEV_ID_ARL_H, MEI_ME_PCH15_CFG)},
/* required last entry */
{0, }
diff --git a/drivers/misc/mei/vsc-tp.c b/drivers/misc/mei/vsc-tp.c
index 6f4a4be..55f7db4 100644
--- a/drivers/misc/mei/vsc-tp.c
+++ b/drivers/misc/mei/vsc-tp.c
@@ -535,6 +535,7 @@ static const struct acpi_device_id vsc_tp_acpi_ids[] = {
{ "INTC1009" }, /* Raptor Lake */
{ "INTC1058" }, /* Tiger Lake */
{ "INTC1094" }, /* Alder Lake */
+ { "INTC10D0" }, /* Meteor Lake */
{}
};
MODULE_DEVICE_TABLE(acpi, vsc_tp_acpi_ids);
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index f410bee..58ed719 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1015,10 +1015,12 @@ static int mmc_select_bus_width(struct mmc_card *card)
static unsigned ext_csd_bits[] = {
EXT_CSD_BUS_WIDTH_8,
EXT_CSD_BUS_WIDTH_4,
+ EXT_CSD_BUS_WIDTH_1,
};
static unsigned bus_widths[] = {
MMC_BUS_WIDTH_8,
MMC_BUS_WIDTH_4,
+ MMC_BUS_WIDTH_1,
};
struct mmc_host *host = card->host;
unsigned idx, bus_width = 0;
diff --git a/drivers/mmc/core/slot-gpio.c b/drivers/mmc/core/slot-gpio.c
index 2a2d949..39f45c2 100644
--- a/drivers/mmc/core/slot-gpio.c
+++ b/drivers/mmc/core/slot-gpio.c
@@ -75,11 +75,15 @@ EXPORT_SYMBOL(mmc_gpio_set_cd_irq);
int mmc_gpio_get_ro(struct mmc_host *host)
{
struct mmc_gpio *ctx = host->slot.handler_priv;
+ int cansleep;
if (!ctx || !ctx->ro_gpio)
return -ENOSYS;
- return gpiod_get_value_cansleep(ctx->ro_gpio);
+ cansleep = gpiod_cansleep(ctx->ro_gpio);
+ return cansleep ?
+ gpiod_get_value_cansleep(ctx->ro_gpio) :
+ gpiod_get_value(ctx->ro_gpio);
}
EXPORT_SYMBOL(mmc_gpio_get_ro);
diff --git a/drivers/mmc/host/mmci_stm32_sdmmc.c b/drivers/mmc/host/mmci_stm32_sdmmc.c
index 35067e1..f5da7f9 100644
--- a/drivers/mmc/host/mmci_stm32_sdmmc.c
+++ b/drivers/mmc/host/mmci_stm32_sdmmc.c
@@ -225,6 +225,8 @@ static int sdmmc_idma_start(struct mmci_host *host, unsigned int *datactrl)
struct scatterlist *sg;
int i;
+ host->dma_in_progress = true;
+
if (!host->variant->dma_lli || data->sg_len == 1 ||
idma->use_bounce_buffer) {
u32 dma_addr;
@@ -263,9 +265,30 @@ static int sdmmc_idma_start(struct mmci_host *host, unsigned int *datactrl)
return 0;
}
+static void sdmmc_idma_error(struct mmci_host *host)
+{
+ struct mmc_data *data = host->data;
+ struct sdmmc_idma *idma = host->dma_priv;
+
+ if (!dma_inprogress(host))
+ return;
+
+ writel_relaxed(0, host->base + MMCI_STM32_IDMACTRLR);
+ host->dma_in_progress = false;
+ data->host_cookie = 0;
+
+ if (!idma->use_bounce_buffer)
+ dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
+ mmc_get_dma_dir(data));
+}
+
static void sdmmc_idma_finalize(struct mmci_host *host, struct mmc_data *data)
{
+ if (!dma_inprogress(host))
+ return;
+
writel_relaxed(0, host->base + MMCI_STM32_IDMACTRLR);
+ host->dma_in_progress = false;
if (!data->host_cookie)
sdmmc_idma_unprep_data(host, data, 0);
@@ -676,6 +699,7 @@ static struct mmci_host_ops sdmmc_variant_ops = {
.dma_setup = sdmmc_idma_setup,
.dma_start = sdmmc_idma_start,
.dma_finalize = sdmmc_idma_finalize,
+ .dma_error = sdmmc_idma_error,
.set_clkreg = mmci_sdmmc_set_clkreg,
.set_pwrreg = mmci_sdmmc_set_pwrreg,
.busy_complete = sdmmc_busy_complete,
diff --git a/drivers/mmc/host/sdhci-pci-o2micro.c b/drivers/mmc/host/sdhci-pci-o2micro.c
index 7bfee28..d4a0218 100644
--- a/drivers/mmc/host/sdhci-pci-o2micro.c
+++ b/drivers/mmc/host/sdhci-pci-o2micro.c
@@ -693,6 +693,35 @@ static int sdhci_pci_o2_init_sd_express(struct mmc_host *mmc, struct mmc_ios *io
return 0;
}
+static void sdhci_pci_o2_set_power(struct sdhci_host *host, unsigned char mode, unsigned short vdd)
+{
+ struct sdhci_pci_chip *chip;
+ struct sdhci_pci_slot *slot = sdhci_priv(host);
+ u32 scratch_32 = 0;
+ u8 scratch_8 = 0;
+
+ chip = slot->chip;
+
+ if (mode == MMC_POWER_OFF) {
+ /* UnLock WP */
+ pci_read_config_byte(chip->pdev, O2_SD_LOCK_WP, &scratch_8);
+ scratch_8 &= 0x7f;
+ pci_write_config_byte(chip->pdev, O2_SD_LOCK_WP, scratch_8);
+
+ /* Set PCR 0x354[16] to switch Clock Source back to OPE Clock */
+ pci_read_config_dword(chip->pdev, O2_SD_OUTPUT_CLK_SOURCE_SWITCH, &scratch_32);
+ scratch_32 &= ~(O2_SD_SEL_DLL);
+ pci_write_config_dword(chip->pdev, O2_SD_OUTPUT_CLK_SOURCE_SWITCH, scratch_32);
+
+ /* Lock WP */
+ pci_read_config_byte(chip->pdev, O2_SD_LOCK_WP, &scratch_8);
+ scratch_8 |= 0x80;
+ pci_write_config_byte(chip->pdev, O2_SD_LOCK_WP, scratch_8);
+ }
+
+ sdhci_set_power(host, mode, vdd);
+}
+
static int sdhci_pci_o2_probe_slot(struct sdhci_pci_slot *slot)
{
struct sdhci_pci_chip *chip;
@@ -1051,6 +1080,7 @@ static const struct sdhci_ops sdhci_pci_o2_ops = {
.set_bus_width = sdhci_set_bus_width,
.reset = sdhci_reset,
.set_uhs_signaling = sdhci_set_uhs_signaling,
+ .set_power = sdhci_pci_o2_set_power,
};
const struct sdhci_pci_fixes sdhci_o2 = {
diff --git a/drivers/mmc/host/sdhci-xenon-phy.c b/drivers/mmc/host/sdhci-xenon-phy.c
index 8cf3a37..cc9d28b 100644
--- a/drivers/mmc/host/sdhci-xenon-phy.c
+++ b/drivers/mmc/host/sdhci-xenon-phy.c
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/delay.h>
#include <linux/ktime.h>
+#include <linux/iopoll.h>
#include <linux/of_address.h>
#include "sdhci-pltfm.h"
@@ -109,6 +110,8 @@
#define XENON_EMMC_PHY_LOGIC_TIMING_ADJUST (XENON_EMMC_PHY_REG_BASE + 0x18)
#define XENON_LOGIC_TIMING_VALUE 0x00AA8977
+#define XENON_MAX_PHY_TIMEOUT_LOOPS 100
+
/*
* List offset of PHY registers and some special register values
* in eMMC PHY 5.0 or eMMC PHY 5.1
@@ -216,6 +219,19 @@ static int xenon_alloc_emmc_phy(struct sdhci_host *host)
return 0;
}
+static int xenon_check_stability_internal_clk(struct sdhci_host *host)
+{
+ u32 reg;
+ int err;
+
+ err = read_poll_timeout(sdhci_readw, reg, reg & SDHCI_CLOCK_INT_STABLE,
+ 1100, 20000, false, host, SDHCI_CLOCK_CONTROL);
+ if (err)
+ dev_err(mmc_dev(host->mmc), "phy_init: Internal clock never stabilized.\n");
+
+ return err;
+}
+
/*
* eMMC 5.0/5.1 PHY init/re-init.
* eMMC PHY init should be executed after:
@@ -232,6 +248,11 @@ static int xenon_emmc_phy_init(struct sdhci_host *host)
struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
struct xenon_emmc_phy_regs *phy_regs = priv->emmc_phy_regs;
+ int ret = xenon_check_stability_internal_clk(host);
+
+ if (ret)
+ return ret;
+
reg = sdhci_readl(host, phy_regs->timing_adj);
reg |= XENON_PHY_INITIALIZAION;
sdhci_writel(host, reg, phy_regs->timing_adj);
@@ -259,18 +280,27 @@ static int xenon_emmc_phy_init(struct sdhci_host *host)
/* get the wait time */
wait /= clock;
wait++;
- /* wait for host eMMC PHY init completes */
- udelay(wait);
- reg = sdhci_readl(host, phy_regs->timing_adj);
- reg &= XENON_PHY_INITIALIZAION;
- if (reg) {
+ /*
+ * AC5X spec says bit must be polled until zero.
+ * We see cases in which timeout can take longer
+ * than the standard calculation on AC5X, which is
+ * expected following the spec comment above.
+ * According to the spec, we must wait as long as
+ * it takes for that bit to toggle on AC5X.
+ * Cap that with 100 delay loops so we won't get
+ * stuck here forever:
+ */
+
+ ret = read_poll_timeout(sdhci_readl, reg,
+ !(reg & XENON_PHY_INITIALIZAION),
+ wait, XENON_MAX_PHY_TIMEOUT_LOOPS * wait,
+ false, host, phy_regs->timing_adj);
+ if (ret)
dev_err(mmc_dev(host->mmc), "eMMC PHY init cannot complete after %d us\n",
- wait);
- return -ETIMEDOUT;
- }
+ wait * XENON_MAX_PHY_TIMEOUT_LOOPS);
- return 0;
+ return ret;
}
#define ARMADA_3700_SOC_PAD_1_8V 0x1
diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index aa44a23..97a00ec 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -37,7 +37,7 @@
/* Info for the block device */
struct block2mtd_dev {
struct list_head list;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct mtd_info mtd;
struct mutex write_mutex;
};
@@ -55,8 +55,7 @@ static struct page *page_read(struct address_space *mapping, pgoff_t index)
/* erase a specified part of the device */
static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
{
- struct address_space *mapping =
- dev->bdev_handle->bdev->bd_inode->i_mapping;
+ struct address_space *mapping = dev->bdev_file->f_mapping;
struct page *page;
pgoff_t index = to >> PAGE_SHIFT; // page index
int pages = len >> PAGE_SHIFT;
@@ -106,8 +105,7 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
size_t *retlen, u_char *buf)
{
struct block2mtd_dev *dev = mtd->priv;
- struct address_space *mapping =
- dev->bdev_handle->bdev->bd_inode->i_mapping;
+ struct address_space *mapping = dev->bdev_file->f_mapping;
struct page *page;
pgoff_t index = from >> PAGE_SHIFT;
int offset = from & (PAGE_SIZE-1);
@@ -142,8 +140,7 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
loff_t to, size_t len, size_t *retlen)
{
struct page *page;
- struct address_space *mapping =
- dev->bdev_handle->bdev->bd_inode->i_mapping;
+ struct address_space *mapping = dev->bdev_file->f_mapping;
pgoff_t index = to >> PAGE_SHIFT; // page index
int offset = to & ~PAGE_MASK; // page offset
int cpylen;
@@ -198,7 +195,7 @@ static int block2mtd_write(struct mtd_info *mtd, loff_t to, size_t len,
static void block2mtd_sync(struct mtd_info *mtd)
{
struct block2mtd_dev *dev = mtd->priv;
- sync_blockdev(dev->bdev_handle->bdev);
+ sync_blockdev(file_bdev(dev->bdev_file));
return;
}
@@ -210,10 +207,9 @@ static void block2mtd_free_device(struct block2mtd_dev *dev)
kfree(dev->mtd.name);
- if (dev->bdev_handle) {
- invalidate_mapping_pages(
- dev->bdev_handle->bdev->bd_inode->i_mapping, 0, -1);
- bdev_release(dev->bdev_handle);
+ if (dev->bdev_file) {
+ invalidate_mapping_pages(dev->bdev_file->f_mapping, 0, -1);
+ fput(dev->bdev_file);
}
kfree(dev);
@@ -223,10 +219,10 @@ static void block2mtd_free_device(struct block2mtd_dev *dev)
* This function is marked __ref because it calls the __init marked
* early_lookup_bdev when called from the early boot code.
*/
-static struct bdev_handle __ref *mdtblock_early_get_bdev(const char *devname,
+static struct file __ref *mdtblock_early_get_bdev(const char *devname,
blk_mode_t mode, int timeout, struct block2mtd_dev *dev)
{
- struct bdev_handle *bdev_handle = ERR_PTR(-ENODEV);
+ struct file *bdev_file = ERR_PTR(-ENODEV);
#ifndef MODULE
int i;
@@ -234,7 +230,7 @@ static struct bdev_handle __ref *mdtblock_early_get_bdev(const char *devname,
* We can't use early_lookup_bdev from a running system.
*/
if (system_state >= SYSTEM_RUNNING)
- return bdev_handle;
+ return bdev_file;
/*
* We might not have the root device mounted at this point.
@@ -253,20 +249,20 @@ static struct bdev_handle __ref *mdtblock_early_get_bdev(const char *devname,
wait_for_device_probe();
if (!early_lookup_bdev(devname, &devt)) {
- bdev_handle = bdev_open_by_dev(devt, mode, dev, NULL);
- if (!IS_ERR(bdev_handle))
+ bdev_file = bdev_file_open_by_dev(devt, mode, dev, NULL);
+ if (!IS_ERR(bdev_file))
break;
}
}
#endif
- return bdev_handle;
+ return bdev_file;
}
static struct block2mtd_dev *add_device(char *devname, int erase_size,
char *label, int timeout)
{
const blk_mode_t mode = BLK_OPEN_READ | BLK_OPEN_WRITE;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct block_device *bdev;
struct block2mtd_dev *dev;
char *name;
@@ -279,16 +275,16 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
return NULL;
/* Get a handle on the device */
- bdev_handle = bdev_open_by_path(devname, mode, dev, NULL);
- if (IS_ERR(bdev_handle))
- bdev_handle = mdtblock_early_get_bdev(devname, mode, timeout,
+ bdev_file = bdev_file_open_by_path(devname, mode, dev, NULL);
+ if (IS_ERR(bdev_file))
+ bdev_file = mdtblock_early_get_bdev(devname, mode, timeout,
dev);
- if (IS_ERR(bdev_handle)) {
+ if (IS_ERR(bdev_file)) {
pr_err("error: cannot open device %s\n", devname);
goto err_free_block2mtd;
}
- dev->bdev_handle = bdev_handle;
- bdev = bdev_handle->bdev;
+ dev->bdev_file = bdev_file;
+ bdev = file_bdev(bdev_file);
if (MAJOR(bdev->bd_dev) == MTD_BLOCK_MAJOR) {
pr_err("attempting to use an MTD device as a block device\n");
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index e451b28..5887feb 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -621,6 +621,7 @@ static void mtd_check_of_node(struct mtd_info *mtd)
if (plen == mtd_name_len &&
!strncmp(mtd->name, pname + offset, plen)) {
mtd_set_of_node(mtd, mtd_dn);
+ of_node_put(mtd_dn);
break;
}
}
diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index a466987..5b0f5a9 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -290,16 +290,13 @@ static const struct marvell_hw_ecc_layout marvell_nfc_layouts[] = {
MARVELL_LAYOUT( 2048, 512, 4, 1, 1, 2048, 32, 30, 0, 0, 0),
MARVELL_LAYOUT( 2048, 512, 8, 2, 1, 1024, 0, 30,1024,32, 30),
MARVELL_LAYOUT( 2048, 512, 8, 2, 1, 1024, 0, 30,1024,64, 30),
- MARVELL_LAYOUT( 2048, 512, 12, 3, 2, 704, 0, 30,640, 0, 30),
- MARVELL_LAYOUT( 2048, 512, 16, 5, 4, 512, 0, 30, 0, 32, 30),
+ MARVELL_LAYOUT( 2048, 512, 16, 4, 4, 512, 0, 30, 0, 32, 30),
MARVELL_LAYOUT( 4096, 512, 4, 2, 2, 2048, 32, 30, 0, 0, 0),
- MARVELL_LAYOUT( 4096, 512, 8, 5, 4, 1024, 0, 30, 0, 64, 30),
- MARVELL_LAYOUT( 4096, 512, 12, 6, 5, 704, 0, 30,576, 32, 30),
- MARVELL_LAYOUT( 4096, 512, 16, 9, 8, 512, 0, 30, 0, 32, 30),
+ MARVELL_LAYOUT( 4096, 512, 8, 4, 4, 1024, 0, 30, 0, 64, 30),
+ MARVELL_LAYOUT( 4096, 512, 16, 8, 8, 512, 0, 30, 0, 32, 30),
MARVELL_LAYOUT( 8192, 512, 4, 4, 4, 2048, 0, 30, 0, 0, 0),
- MARVELL_LAYOUT( 8192, 512, 8, 9, 8, 1024, 0, 30, 0, 160, 30),
- MARVELL_LAYOUT( 8192, 512, 12, 12, 11, 704, 0, 30,448, 64, 30),
- MARVELL_LAYOUT( 8192, 512, 16, 17, 16, 512, 0, 30, 0, 32, 30),
+ MARVELL_LAYOUT( 8192, 512, 8, 8, 8, 1024, 0, 30, 0, 160, 30),
+ MARVELL_LAYOUT( 8192, 512, 16, 16, 16, 512, 0, 30, 0, 32, 30),
};
/**
diff --git a/drivers/mtd/nand/spi/gigadevice.c b/drivers/mtd/nand/spi/gigadevice.c
index 987710e..6023cba 100644
--- a/drivers/mtd/nand/spi/gigadevice.c
+++ b/drivers/mtd/nand/spi/gigadevice.c
@@ -186,7 +186,7 @@ static int gd5fxgq4uexxg_ecc_get_status(struct spinand_device *spinand,
{
u8 status2;
struct spi_mem_op op = SPINAND_GET_FEATURE_OP(GD5FXGQXXEXXG_REG_STATUS2,
- &status2);
+ spinand->scratchbuf);
int ret;
switch (status & STATUS_ECC_MASK) {
@@ -207,6 +207,7 @@ static int gd5fxgq4uexxg_ecc_get_status(struct spinand_device *spinand,
* report the maximum of 4 in this case
*/
/* bits sorted this way (3...0): ECCS1,ECCS0,ECCSE1,ECCSE0 */
+ status2 = *(spinand->scratchbuf);
return ((status & STATUS_ECC_MASK) >> 2) |
((status2 & STATUS_ECC_MASK) >> 4);
@@ -228,7 +229,7 @@ static int gd5fxgq5xexxg_ecc_get_status(struct spinand_device *spinand,
{
u8 status2;
struct spi_mem_op op = SPINAND_GET_FEATURE_OP(GD5FXGQXXEXXG_REG_STATUS2,
- &status2);
+ spinand->scratchbuf);
int ret;
switch (status & STATUS_ECC_MASK) {
@@ -248,6 +249,7 @@ static int gd5fxgq5xexxg_ecc_get_status(struct spinand_device *spinand,
* 1 ... 4 bits are flipped (and corrected)
*/
/* bits sorted this way (1...0): ECCSE1, ECCSE0 */
+ status2 = *(spinand->scratchbuf);
return ((status2 & STATUS_ECC_MASK) >> 4) + 1;
case STATUS_ECC_UNCOR_ERROR:
diff --git a/drivers/net/arcnet/arc-rawmode.c b/drivers/net/arcnet/arc-rawmode.c
index 8c651fd..57f1729 100644
--- a/drivers/net/arcnet/arc-rawmode.c
+++ b/drivers/net/arcnet/arc-rawmode.c
@@ -186,4 +186,5 @@ static void __exit arcnet_raw_exit(void)
module_init(arcnet_raw_init);
module_exit(arcnet_raw_exit);
+MODULE_DESCRIPTION("ARCnet raw mode packet interface module");
MODULE_LICENSE("GPL");
diff --git a/drivers/net/arcnet/arc-rimi.c b/drivers/net/arcnet/arc-rimi.c
index 8c3ccc7..53d10a0 100644
--- a/drivers/net/arcnet/arc-rimi.c
+++ b/drivers/net/arcnet/arc-rimi.c
@@ -312,6 +312,7 @@ module_param(node, int, 0);
module_param(io, int, 0);
module_param(irq, int, 0);
module_param_string(device, device, sizeof(device), 0);
+MODULE_DESCRIPTION("ARCnet COM90xx RIM I chipset driver");
MODULE_LICENSE("GPL");
static struct net_device *my_dev;
diff --git a/drivers/net/arcnet/capmode.c b/drivers/net/arcnet/capmode.c
index c09b567..7a0a799 100644
--- a/drivers/net/arcnet/capmode.c
+++ b/drivers/net/arcnet/capmode.c
@@ -265,4 +265,5 @@ static void __exit capmode_module_exit(void)
module_init(capmode_module_init);
module_exit(capmode_module_exit);
+MODULE_DESCRIPTION("ARCnet CAP mode packet interface module");
MODULE_LICENSE("GPL");
diff --git a/drivers/net/arcnet/com20020-pci.c b/drivers/net/arcnet/com20020-pci.c
index 7b5c8bb..c5e571e 100644
--- a/drivers/net/arcnet/com20020-pci.c
+++ b/drivers/net/arcnet/com20020-pci.c
@@ -61,6 +61,7 @@ module_param(timeout, int, 0);
module_param(backplane, int, 0);
module_param(clockp, int, 0);
module_param(clockm, int, 0);
+MODULE_DESCRIPTION("ARCnet COM20020 chipset PCI driver");
MODULE_LICENSE("GPL");
static void led_tx_set(struct led_classdev *led_cdev,
diff --git a/drivers/net/arcnet/com20020.c b/drivers/net/arcnet/com20020.c
index 06e1651..a0053e3 100644
--- a/drivers/net/arcnet/com20020.c
+++ b/drivers/net/arcnet/com20020.c
@@ -399,6 +399,7 @@ EXPORT_SYMBOL(com20020_found);
EXPORT_SYMBOL(com20020_netdev_ops);
#endif
+MODULE_DESCRIPTION("ARCnet COM20020 chipset core driver");
MODULE_LICENSE("GPL");
#ifdef MODULE
diff --git a/drivers/net/arcnet/com20020_cs.c b/drivers/net/arcnet/com20020_cs.c
index dc3253b..75f08aa 100644
--- a/drivers/net/arcnet/com20020_cs.c
+++ b/drivers/net/arcnet/com20020_cs.c
@@ -97,6 +97,7 @@ module_param(backplane, int, 0);
module_param(clockp, int, 0);
module_param(clockm, int, 0);
+MODULE_DESCRIPTION("ARCnet COM20020 chipset PCMCIA driver");
MODULE_LICENSE("GPL");
/*====================================================================*/
diff --git a/drivers/net/arcnet/com90io.c b/drivers/net/arcnet/com90io.c
index 37b4774..3b463fb 100644
--- a/drivers/net/arcnet/com90io.c
+++ b/drivers/net/arcnet/com90io.c
@@ -350,6 +350,7 @@ static char device[9]; /* use eg. device=arc1 to change name */
module_param_hw(io, int, ioport, 0);
module_param_hw(irq, int, irq, 0);
module_param_string(device, device, sizeof(device), 0);
+MODULE_DESCRIPTION("ARCnet COM90xx IO mapped chipset driver");
MODULE_LICENSE("GPL");
#ifndef MODULE
diff --git a/drivers/net/arcnet/com90xx.c b/drivers/net/arcnet/com90xx.c
index f49dae1..b3b287c 100644
--- a/drivers/net/arcnet/com90xx.c
+++ b/drivers/net/arcnet/com90xx.c
@@ -645,6 +645,7 @@ static void com90xx_copy_from_card(struct net_device *dev, int bufnum,
TIME(dev, "memcpy_fromio", count, memcpy_fromio(buf, memaddr, count));
}
+MODULE_DESCRIPTION("ARCnet COM90xx normal chipset driver");
MODULE_LICENSE("GPL");
static int __init com90xx_init(void)
diff --git a/drivers/net/arcnet/rfc1051.c b/drivers/net/arcnet/rfc1051.c
index a7752a5..46519ca 100644
--- a/drivers/net/arcnet/rfc1051.c
+++ b/drivers/net/arcnet/rfc1051.c
@@ -78,6 +78,7 @@ static void __exit arcnet_rfc1051_exit(void)
module_init(arcnet_rfc1051_init);
module_exit(arcnet_rfc1051_exit);
+MODULE_DESCRIPTION("ARCNet packet format (RFC 1051) module");
MODULE_LICENSE("GPL");
/* Determine a packet's protocol ID.
diff --git a/drivers/net/arcnet/rfc1201.c b/drivers/net/arcnet/rfc1201.c
index a4c8562..0edf35d 100644
--- a/drivers/net/arcnet/rfc1201.c
+++ b/drivers/net/arcnet/rfc1201.c
@@ -35,6 +35,7 @@
#include "arcdevice.h"
+MODULE_DESCRIPTION("ARCNet packet format (RFC 1201) module");
MODULE_LICENSE("GPL");
static __be16 type_trans(struct sk_buff *skb, struct net_device *dev);
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4e0600c..cd0683b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1811,7 +1811,7 @@ void bond_xdp_set_features(struct net_device *bond_dev)
ASSERT_RTNL();
- if (!bond_xdp_check(bond)) {
+ if (!bond_xdp_check(bond) || !bond_has_slaves(bond)) {
xdp_clear_features_flag(bond_dev);
return;
}
@@ -1819,6 +1819,8 @@ void bond_xdp_set_features(struct net_device *bond_dev)
bond_for_each_slave(bond, slave, iter)
val &= slave->dev->xdp_features;
+ val &= ~NETDEV_XDP_ACT_XSK_ZEROCOPY;
+
xdp_set_features_flag(bond_dev, val);
}
@@ -5909,9 +5911,6 @@ void bond_setup(struct net_device *bond_dev)
if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
bond_dev->features |= BOND_XFRM_FEATURES;
#endif /* CONFIG_XFRM_OFFLOAD */
-
- if (bond_xdp_check(bond))
- bond_dev->xdp_features = NETDEV_XDP_ACT_MASK;
}
/* Destroy a bonding device.
diff --git a/drivers/net/can/dev/netlink.c b/drivers/net/can/dev/netlink.c
index 036d85e..dfdc039 100644
--- a/drivers/net/can/dev/netlink.c
+++ b/drivers/net/can/dev/netlink.c
@@ -346,7 +346,7 @@ static int can_changelink(struct net_device *dev, struct nlattr *tb[],
/* Neither of TDC parameters nor TDC flags are
* provided: do calculation
*/
- can_calc_tdco(&priv->tdc, priv->tdc_const, &priv->data_bittiming,
+ can_calc_tdco(&priv->tdc, priv->tdc_const, &dbt,
&priv->ctrlmode, priv->ctrlmode_supported);
} /* else: both CAN_CTRLMODE_TDC_{AUTO,MANUAL} are explicitly
* turned off. TDC is disabled: do nothing
diff --git a/drivers/net/dsa/dsa_loop_bdinfo.c b/drivers/net/dsa/dsa_loop_bdinfo.c
index 237066d..14ca424 100644
--- a/drivers/net/dsa/dsa_loop_bdinfo.c
+++ b/drivers/net/dsa/dsa_loop_bdinfo.c
@@ -32,4 +32,5 @@ static int __init dsa_loop_bdinfo_init(void)
}
arch_initcall(dsa_loop_bdinfo_init)
+MODULE_DESCRIPTION("DSA mock-up switch driver");
MODULE_LICENSE("GPL");
diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c
index 61b71bc..c3da97a 100644
--- a/drivers/net/dsa/microchip/ksz8795.c
+++ b/drivers/net/dsa/microchip/ksz8795.c
@@ -49,9 +49,9 @@ static int ksz8_ind_write8(struct ksz_device *dev, u8 table, u16 addr, u8 data)
mutex_lock(&dev->alu_mutex);
ctrl_addr = IND_ACC_TABLE(table) | addr;
- ret = ksz_write8(dev, regs[REG_IND_BYTE], data);
+ ret = ksz_write16(dev, regs[REG_IND_CTRL_0], ctrl_addr);
if (!ret)
- ret = ksz_write16(dev, regs[REG_IND_CTRL_0], ctrl_addr);
+ ret = ksz_write8(dev, regs[REG_IND_BYTE], data);
mutex_unlock(&dev->alu_mutex);
diff --git a/drivers/net/ethernet/adi/Kconfig b/drivers/net/ethernet/adi/Kconfig
index da3bdd3..760a9a6 100644
--- a/drivers/net/ethernet/adi/Kconfig
+++ b/drivers/net/ethernet/adi/Kconfig
@@ -21,6 +21,7 @@
tristate "Analog Devices ADIN1110 MAC-PHY"
depends on SPI && NET_SWITCHDEV
select CRC8
+ select PHYLIB
help
Say yes here to build support for Analog Devices ADIN1110
Low Power 10BASE-T1L Ethernet MAC-PHY.
diff --git a/drivers/net/ethernet/amd/pds_core/auxbus.c b/drivers/net/ethernet/amd/pds_core/auxbus.c
index 11c23a7..fd1a514 100644
--- a/drivers/net/ethernet/amd/pds_core/auxbus.c
+++ b/drivers/net/ethernet/amd/pds_core/auxbus.c
@@ -160,23 +160,19 @@ static struct pds_auxiliary_dev *pdsc_auxbus_dev_register(struct pdsc *cf,
if (err < 0) {
dev_warn(cf->dev, "auxiliary_device_init of %s failed: %pe\n",
name, ERR_PTR(err));
- goto err_out;
+ kfree(padev);
+ return ERR_PTR(err);
}
err = auxiliary_device_add(aux_dev);
if (err) {
dev_warn(cf->dev, "auxiliary_device_add of %s failed: %pe\n",
name, ERR_PTR(err));
- goto err_out_uninit;
+ auxiliary_device_uninit(aux_dev);
+ return ERR_PTR(err);
}
return padev;
-
-err_out_uninit:
- auxiliary_device_uninit(aux_dev);
-err_out:
- kfree(padev);
- return ERR_PTR(err);
}
int pdsc_auxbus_dev_del(struct pdsc *cf, struct pdsc *pf)
diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c
index cdbf053..0050c58 100644
--- a/drivers/net/ethernet/amd/pds_core/main.c
+++ b/drivers/net/ethernet/amd/pds_core/main.c
@@ -451,6 +451,9 @@ static void pdsc_remove(struct pci_dev *pdev)
static void pdsc_stop_health_thread(struct pdsc *pdsc)
{
+ if (pdsc->pdev->is_virtfn)
+ return;
+
timer_shutdown_sync(&pdsc->wdtimer);
if (pdsc->health_work.func)
cancel_work_sync(&pdsc->health_work);
@@ -458,6 +461,9 @@ static void pdsc_stop_health_thread(struct pdsc *pdsc)
static void pdsc_restart_health_thread(struct pdsc *pdsc)
{
+ if (pdsc->pdev->is_virtfn)
+ return;
+
timer_setup(&pdsc->wdtimer, pdsc_wdtimer_cb, 0);
mod_timer(&pdsc->wdtimer, jiffies + 1);
}
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c b/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c
index abd4832..5acb3e1 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ptp.c
@@ -993,7 +993,7 @@ int aq_ptp_ring_alloc(struct aq_nic_s *aq_nic)
return 0;
err_exit_hwts_rx:
- aq_ring_free(&aq_ptp->hwts_rx);
+ aq_ring_hwts_rx_free(&aq_ptp->hwts_rx);
err_exit_ptp_rx:
aq_ring_free(&aq_ptp->ptp_rx);
err_exit_ptp_tx:
@@ -1011,7 +1011,7 @@ void aq_ptp_ring_free(struct aq_nic_s *aq_nic)
aq_ring_free(&aq_ptp->ptp_tx);
aq_ring_free(&aq_ptp->ptp_rx);
- aq_ring_free(&aq_ptp->hwts_rx);
+ aq_ring_hwts_rx_free(&aq_ptp->hwts_rx);
aq_ptp_skb_ring_release(&aq_ptp->skb_ring);
}
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index cda8597..f7433ab 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -919,6 +919,19 @@ void aq_ring_free(struct aq_ring_s *self)
}
}
+void aq_ring_hwts_rx_free(struct aq_ring_s *self)
+{
+ if (!self)
+ return;
+
+ if (self->dx_ring) {
+ dma_free_coherent(aq_nic_get_dev(self->aq_nic),
+ self->size * self->dx_size + AQ_CFG_RXDS_DEF,
+ self->dx_ring, self->dx_ring_pa);
+ self->dx_ring = NULL;
+ }
+}
+
unsigned int aq_ring_fill_stats_data(struct aq_ring_s *self, u64 *data)
{
unsigned int count;
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
index 5284731..d627ace 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
@@ -210,6 +210,7 @@ int aq_ring_rx_fill(struct aq_ring_s *self);
int aq_ring_hwts_rx_alloc(struct aq_ring_s *self,
struct aq_nic_s *aq_nic, unsigned int idx,
unsigned int size, unsigned int dx_size);
+void aq_ring_hwts_rx_free(struct aq_ring_s *self);
void aq_ring_hwts_rx_clean(struct aq_ring_s *self, struct aq_nic_s *aq_nic);
unsigned int aq_ring_fill_stats_data(struct aq_ring_s *self, u64 *data);
diff --git a/drivers/net/ethernet/broadcom/asp2/bcmasp.c b/drivers/net/ethernet/broadcom/asp2/bcmasp.c
index 29b04a2..80245c6 100644
--- a/drivers/net/ethernet/broadcom/asp2/bcmasp.c
+++ b/drivers/net/ethernet/broadcom/asp2/bcmasp.c
@@ -535,9 +535,6 @@ int bcmasp_netfilt_get_all_active(struct bcmasp_intf *intf, u32 *rule_locs,
int j = 0, i;
for (i = 0; i < NUM_NET_FILTERS; i++) {
- if (j == *rule_cnt)
- return -EMSGSIZE;
-
if (!priv->net_filters[i].claimed ||
priv->net_filters[i].port != intf->port)
continue;
@@ -547,6 +544,9 @@ int bcmasp_netfilt_get_all_active(struct bcmasp_intf *intf, u32 *rule_locs,
priv->net_filters[i - 1].wake_filter)
continue;
+ if (j == *rule_cnt)
+ return -EMSGSIZE;
+
rule_locs[j++] = priv->net_filters[i].fs.location;
}
diff --git a/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c b/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
index 53e5428..6ad1366 100644
--- a/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
+++ b/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
@@ -684,6 +684,8 @@ static int bcmasp_init_rx(struct bcmasp_intf *intf)
intf->rx_buf_order = get_order(RING_BUFFER_SIZE);
buffer_pg = alloc_pages(GFP_KERNEL, intf->rx_buf_order);
+ if (!buffer_pg)
+ return -ENOMEM;
dma = dma_map_page(kdev, buffer_pg, 0, RING_BUFFER_SIZE,
DMA_FROM_DEVICE);
@@ -1048,6 +1050,9 @@ static int bcmasp_netif_init(struct net_device *dev, bool phy_connect)
netdev_err(dev, "could not attach to PHY\n");
goto err_phy_disable;
}
+
+ /* Indicate that the MAC is responsible for PHY PM */
+ phydev->mac_managed_pm = true;
} else if (!intf->wolopts) {
ret = phy_resume(dev->phydev);
if (ret)
@@ -1092,6 +1097,7 @@ static int bcmasp_netif_init(struct net_device *dev, bool phy_connect)
return 0;
err_reclaim_tx:
+ netif_napi_del(&intf->tx_napi);
bcmasp_reclaim_free_all_tx(intf);
err_phy_disconnect:
if (phydev)
diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 31191b5..c321744 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -1091,10 +1091,10 @@ bnad_cb_tx_resume(struct bnad *bnad, struct bna_tx *tx)
* Free all TxQs buffers and then notify TX_E_CLEANUP_DONE to Tx fsm.
*/
static void
-bnad_tx_cleanup(struct delayed_work *work)
+bnad_tx_cleanup(struct work_struct *work)
{
struct bnad_tx_info *tx_info =
- container_of(work, struct bnad_tx_info, tx_cleanup_work);
+ container_of(work, struct bnad_tx_info, tx_cleanup_work.work);
struct bnad *bnad = NULL;
struct bna_tcb *tcb;
unsigned long flags;
@@ -1170,7 +1170,7 @@ bnad_cb_rx_stall(struct bnad *bnad, struct bna_rx *rx)
* Free all RxQs buffers and then notify RX_E_CLEANUP_DONE to Rx fsm.
*/
static void
-bnad_rx_cleanup(void *work)
+bnad_rx_cleanup(struct work_struct *work)
{
struct bnad_rx_info *rx_info =
container_of(work, struct bnad_rx_info, rx_cleanup_work);
@@ -1991,8 +1991,7 @@ bnad_setup_tx(struct bnad *bnad, u32 tx_id)
}
tx_info->tx = tx;
- INIT_DELAYED_WORK(&tx_info->tx_cleanup_work,
- (work_func_t)bnad_tx_cleanup);
+ INIT_DELAYED_WORK(&tx_info->tx_cleanup_work, bnad_tx_cleanup);
/* Register ISR for the Tx object */
if (intr_info->intr_type == BNA_INTR_T_MSIX) {
@@ -2248,8 +2247,7 @@ bnad_setup_rx(struct bnad *bnad, u32 rx_id)
rx_info->rx = rx;
spin_unlock_irqrestore(&bnad->bna_lock, flags);
- INIT_WORK(&rx_info->rx_cleanup_work,
- (work_func_t)(bnad_rx_cleanup));
+ INIT_WORK(&rx_info->rx_cleanup_work, bnad_rx_cleanup);
/*
* Init NAPI, so that state is set to NAPI_STATE_SCHED,
diff --git a/drivers/net/ethernet/cisco/enic/vnic_vic.c b/drivers/net/ethernet/cisco/enic/vnic_vic.c
index 20fcb20..66b5778 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_vic.c
+++ b/drivers/net/ethernet/cisco/enic/vnic_vic.c
@@ -49,7 +49,8 @@ int vic_provinfo_add_tlv(struct vic_provinfo *vp, u16 type, u16 length,
tlv->type = htons(type);
tlv->length = htons(length);
- memcpy(tlv->value, value, length);
+ unsafe_memcpy(tlv->value, value, length,
+ /* Flexible array of flexible arrays */);
vp->num_tlvs = htonl(ntohl(vp->num_tlvs) + 1);
vp->length = htonl(ntohl(vp->length) +
diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
index 9aeff2b..64eadd3 100644
--- a/drivers/net/ethernet/engleder/tsnep_main.c
+++ b/drivers/net/ethernet/engleder/tsnep_main.c
@@ -719,17 +719,25 @@ static void tsnep_xdp_xmit_flush(struct tsnep_tx *tx)
static bool tsnep_xdp_xmit_back(struct tsnep_adapter *adapter,
struct xdp_buff *xdp,
- struct netdev_queue *tx_nq, struct tsnep_tx *tx)
+ struct netdev_queue *tx_nq, struct tsnep_tx *tx,
+ bool zc)
{
struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
bool xmit;
+ u32 type;
if (unlikely(!xdpf))
return false;
+ /* no page pool for zero copy */
+ if (zc)
+ type = TSNEP_TX_TYPE_XDP_NDO;
+ else
+ type = TSNEP_TX_TYPE_XDP_TX;
+
__netif_tx_lock(tx_nq, smp_processor_id());
- xmit = tsnep_xdp_xmit_frame_ring(xdpf, tx, TSNEP_TX_TYPE_XDP_TX);
+ xmit = tsnep_xdp_xmit_frame_ring(xdpf, tx, type);
/* Avoid transmit queue timeout since we share it with the slow path */
if (xmit)
@@ -1273,7 +1281,7 @@ static bool tsnep_xdp_run_prog(struct tsnep_rx *rx, struct bpf_prog *prog,
case XDP_PASS:
return false;
case XDP_TX:
- if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx))
+ if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx, false))
goto out_failure;
*status |= TSNEP_XDP_TX;
return true;
@@ -1323,7 +1331,7 @@ static bool tsnep_xdp_run_prog_zc(struct tsnep_rx *rx, struct bpf_prog *prog,
case XDP_PASS:
return false;
case XDP_TX:
- if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx))
+ if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx, true))
goto out_failure;
*status |= TSNEP_XDP_TX;
return true;
diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.c b/drivers/net/ethernet/freescale/fman/fman_memac.c
index 9ba15d318..758535a 100644
--- a/drivers/net/ethernet/freescale/fman/fman_memac.c
+++ b/drivers/net/ethernet/freescale/fman/fman_memac.c
@@ -1073,6 +1073,14 @@ int memac_initialization(struct mac_device *mac_dev,
unsigned long capabilities;
unsigned long *supported;
+ /* The internal connection to the serdes is XGMII, but this isn't
+ * really correct for the phy mode (which is the external connection).
+ * However, this is how all older device trees say that they want
+ * 10GBASE-R (aka XFI), so just convert it for them.
+ */
+ if (mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
+ mac_dev->phy_if = PHY_INTERFACE_MODE_10GBASER;
+
mac_dev->phylink_ops = &memac_mac_ops;
mac_dev->set_promisc = memac_set_promiscuous;
mac_dev->change_addr = memac_modify_mac_address;
@@ -1139,7 +1147,7 @@ int memac_initialization(struct mac_device *mac_dev,
* (and therefore that xfi_pcs cannot be set). If we are defaulting to
* XGMII, assume this is for XFI. Otherwise, assume it is for SGMII.
*/
- if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
+ if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_10GBASER)
memac->xfi_pcs = pcs;
else
memac->sgmii_pcs = pcs;
@@ -1153,14 +1161,6 @@ int memac_initialization(struct mac_device *mac_dev,
goto _return_fm_mac_free;
}
- /* The internal connection to the serdes is XGMII, but this isn't
- * really correct for the phy mode (which is the external connection).
- * However, this is how all older device trees say that they want
- * 10GBASE-R (aka XFI), so just convert it for them.
- */
- if (mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
- mac_dev->phy_if = PHY_INTERFACE_MODE_10GBASER;
-
/* TODO: The following interface modes are supported by (some) hardware
* but not by this driver:
* - 1000BASE-KX
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index a2788fd..19e450a 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -2559,7 +2559,7 @@ void e1000_copy_rx_addrs_to_phy_ich8lan(struct e1000_hw *hw)
hw->phy.ops.write_reg_page(hw, BM_RAR_H(i),
(u16)(mac_reg & 0xFFFF));
hw->phy.ops.write_reg_page(hw, BM_RAR_CTRL(i),
- FIELD_GET(E1000_RAH_AV, mac_reg));
+ (u16)((mac_reg & E1000_RAH_AV) >> 16));
}
e1000_disable_phy_wakeup_reg_access_bm(hw, &phy_reg);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
index 9d88ed6..8db1eb0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
@@ -1523,7 +1523,7 @@ void i40e_dcb_hw_rx_ets_bw_config(struct i40e_hw *hw, u8 *bw_share,
reg = rd32(hw, I40E_PRTDCB_RETSTCC(i));
reg &= ~(I40E_PRTDCB_RETSTCC_BWSHARE_MASK |
I40E_PRTDCB_RETSTCC_UPINTC_MODE_MASK |
- I40E_PRTDCB_RETSTCC_ETSTC_SHIFT);
+ I40E_PRTDCB_RETSTCC_ETSTC_MASK);
reg |= FIELD_PREP(I40E_PRTDCB_RETSTCC_BWSHARE_MASK,
bw_share[i]);
reg |= FIELD_PREP(I40E_PRTDCB_RETSTCC_UPINTC_MODE_MASK,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.h b/drivers/net/ethernet/intel/i40e/i40e_dcb.h
index 6b60dc9..d764975 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.h
@@ -43,7 +43,7 @@
#define I40E_LLDP_TLV_SUBTYPE_SHIFT 0
#define I40E_LLDP_TLV_SUBTYPE_MASK (0xFF << I40E_LLDP_TLV_SUBTYPE_SHIFT)
#define I40E_LLDP_TLV_OUI_SHIFT 8
-#define I40E_LLDP_TLV_OUI_MASK (0xFFFFFF << I40E_LLDP_TLV_OUI_SHIFT)
+#define I40E_LLDP_TLV_OUI_MASK (0xFFFFFFU << I40E_LLDP_TLV_OUI_SHIFT)
/* Defines for IEEE ETS TLV */
#define I40E_IEEE_ETS_MAXTC_SHIFT 0
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6e7fd47..89a3401 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4926,27 +4926,23 @@ int i40e_vsi_start_rings(struct i40e_vsi *vsi)
void i40e_vsi_stop_rings(struct i40e_vsi *vsi)
{
struct i40e_pf *pf = vsi->back;
- int pf_q, err, q_end;
+ u32 pf_q, tx_q_end, rx_q_end;
/* When port TX is suspended, don't wait */
if (test_bit(__I40E_PORT_SUSPENDED, vsi->back->state))
return i40e_vsi_stop_rings_no_wait(vsi);
- q_end = vsi->base_queue + vsi->num_queue_pairs;
- for (pf_q = vsi->base_queue; pf_q < q_end; pf_q++)
- i40e_pre_tx_queue_cfg(&pf->hw, (u32)pf_q, false);
+ tx_q_end = vsi->base_queue +
+ vsi->alloc_queue_pairs * (i40e_enabled_xdp_vsi(vsi) ? 2 : 1);
+ for (pf_q = vsi->base_queue; pf_q < tx_q_end; pf_q++)
+ i40e_pre_tx_queue_cfg(&pf->hw, pf_q, false);
- for (pf_q = vsi->base_queue; pf_q < q_end; pf_q++) {
- err = i40e_control_wait_rx_q(pf, pf_q, false);
- if (err)
- dev_info(&pf->pdev->dev,
- "VSI seid %d Rx ring %d disable timeout\n",
- vsi->seid, pf_q);
- }
+ rx_q_end = vsi->base_queue + vsi->num_queue_pairs;
+ for (pf_q = vsi->base_queue; pf_q < rx_q_end; pf_q++)
+ i40e_control_rx_q(pf, pf_q, false);
msleep(I40E_DISABLE_TX_GAP_MSEC);
- pf_q = vsi->base_queue;
- for (pf_q = vsi->base_queue; pf_q < q_end; pf_q++)
+ for (pf_q = vsi->base_queue; pf_q < tx_q_end; pf_q++)
wr32(&pf->hw, I40E_QTX_ENA(pf_q), 0);
i40e_vsi_wait_queues_disabled(vsi);
@@ -5360,7 +5356,7 @@ static int i40e_pf_wait_queues_disabled(struct i40e_pf *pf)
{
int v, ret = 0;
- for (v = 0; v < pf->hw.func_caps.num_vsis; v++) {
+ for (v = 0; v < pf->num_alloc_vsi; v++) {
if (pf->vsi[v]) {
ret = i40e_vsi_wait_queues_disabled(pf->vsi[v]);
if (ret)
@@ -13564,9 +13560,9 @@ int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair)
return err;
i40e_queue_pair_disable_irq(vsi, queue_pair);
+ i40e_queue_pair_toggle_napi(vsi, queue_pair, false /* off */);
err = i40e_queue_pair_toggle_rings(vsi, queue_pair, false /* off */);
i40e_clean_rx_ring(vsi->rx_rings[queue_pair]);
- i40e_queue_pair_toggle_napi(vsi, queue_pair, false /* off */);
i40e_queue_pair_clean_rings(vsi, queue_pair);
i40e_queue_pair_reset_stats(vsi, queue_pair);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_prototype.h b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
index af42693..ce1f11b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_prototype.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
@@ -567,8 +567,7 @@ static inline bool i40e_is_fw_ver_lt(struct i40e_hw *hw, u16 maj, u16 min)
**/
static inline bool i40e_is_fw_ver_eq(struct i40e_hw *hw, u16 maj, u16 min)
{
- return (hw->aq.fw_maj_ver > maj ||
- (hw->aq.fw_maj_ver == maj && hw->aq.fw_min_ver == min));
+ return (hw->aq.fw_maj_ver == maj && hw->aq.fw_min_ver == min);
}
#endif /* _I40E_PROTOTYPE_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 908cdbd..b34c717 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2848,6 +2848,24 @@ static int i40e_vc_get_stats_msg(struct i40e_vf *vf, u8 *msg)
(u8 *)&stats, sizeof(stats));
}
+/**
+ * i40e_can_vf_change_mac
+ * @vf: pointer to the VF info
+ *
+ * Return true if the VF is allowed to change its MAC filters, false otherwise
+ */
+static bool i40e_can_vf_change_mac(struct i40e_vf *vf)
+{
+ /* If the VF MAC address has been set administratively (via the
+ * ndo_set_vf_mac command), then deny permission to the VF to
+ * add/delete unicast MAC addresses, unless the VF is trusted
+ */
+ if (vf->pf_set_mac && !vf->trusted)
+ return false;
+
+ return true;
+}
+
#define I40E_MAX_MACVLAN_PER_HW 3072
#define I40E_MAX_MACVLAN_PER_PF(num_ports) (I40E_MAX_MACVLAN_PER_HW / \
(num_ports))
@@ -2907,8 +2925,8 @@ static inline int i40e_check_vf_permission(struct i40e_vf *vf,
* The VF may request to set the MAC address filter already
* assigned to it so do not return an error in that case.
*/
- if (!test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps) &&
- !is_multicast_ether_addr(addr) && vf->pf_set_mac &&
+ if (!i40e_can_vf_change_mac(vf) &&
+ !is_multicast_ether_addr(addr) &&
!ether_addr_equal(addr, vf->default_lan_addr.addr)) {
dev_err(&pf->pdev->dev,
"VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation\n");
@@ -3114,19 +3132,29 @@ static int i40e_vc_del_mac_addr_msg(struct i40e_vf *vf, u8 *msg)
ret = -EINVAL;
goto error_param;
}
- if (ether_addr_equal(al->list[i].addr, vf->default_lan_addr.addr))
- was_unimac_deleted = true;
}
vsi = pf->vsi[vf->lan_vsi_idx];
spin_lock_bh(&vsi->mac_filter_hash_lock);
/* delete addresses from the list */
- for (i = 0; i < al->num_elements; i++)
+ for (i = 0; i < al->num_elements; i++) {
+ const u8 *addr = al->list[i].addr;
+
+ /* Allow to delete VF primary MAC only if it was not set
+ * administratively by PF or if VF is trusted.
+ */
+ if (ether_addr_equal(addr, vf->default_lan_addr.addr) &&
+ i40e_can_vf_change_mac(vf))
+ was_unimac_deleted = true;
+ else
+ continue;
+
if (i40e_del_mac_filter(vsi, al->list[i].addr)) {
ret = -EINVAL;
spin_unlock_bh(&vsi->mac_filter_hash_lock);
goto error_param;
}
+ }
spin_unlock_bh(&vsi->mac_filter_hash_lock);
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 7ac8477..c979192 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -190,15 +190,13 @@ static void ice_free_q_vector(struct ice_vsi *vsi, int v_idx)
q_vector = vsi->q_vectors[v_idx];
ice_for_each_tx_ring(tx_ring, q_vector->tx) {
- if (vsi->netdev)
- netif_queue_set_napi(vsi->netdev, tx_ring->q_index,
- NETDEV_QUEUE_TYPE_TX, NULL);
+ ice_queue_set_napi(vsi, tx_ring->q_index, NETDEV_QUEUE_TYPE_TX,
+ NULL);
tx_ring->q_vector = NULL;
}
ice_for_each_rx_ring(rx_ring, q_vector->rx) {
- if (vsi->netdev)
- netif_queue_set_napi(vsi->netdev, rx_ring->q_index,
- NETDEV_QUEUE_TYPE_RX, NULL);
+ ice_queue_set_napi(vsi, rx_ring->q_index, NETDEV_QUEUE_TYPE_RX,
+ NULL);
rx_ring->q_vector = NULL;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index b9c5ece..bd9b1fe 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -31,6 +31,26 @@ static const char * const pin_type_name[] = {
};
/**
+ * ice_dpll_is_reset - check if reset is in progress
+ * @pf: private board structure
+ * @extack: error reporting
+ *
+ * If reset is in progress, fill extack with error.
+ *
+ * Return:
+ * * false - no reset in progress
+ * * true - reset in progress
+ */
+static bool ice_dpll_is_reset(struct ice_pf *pf, struct netlink_ext_ack *extack)
+{
+ if (ice_is_reset_in_progress(pf->state)) {
+ NL_SET_ERR_MSG(extack, "PF reset in progress");
+ return true;
+ }
+ return false;
+}
+
+/**
* ice_dpll_pin_freq_set - set pin's frequency
* @pf: private board structure
* @pin: pointer to a pin
@@ -109,6 +129,9 @@ ice_dpll_frequency_set(const struct dpll_pin *pin, void *pin_priv,
struct ice_pf *pf = d->pf;
int ret;
+ if (ice_dpll_is_reset(pf, extack))
+ return -EBUSY;
+
mutex_lock(&pf->dplls.lock);
ret = ice_dpll_pin_freq_set(pf, p, pin_type, frequency, extack);
mutex_unlock(&pf->dplls.lock);
@@ -254,6 +277,7 @@ ice_dpll_output_frequency_get(const struct dpll_pin *pin, void *pin_priv,
* ice_dpll_pin_enable - enable a pin on dplls
* @hw: board private hw structure
* @pin: pointer to a pin
+ * @dpll_idx: dpll index to connect to output pin
* @pin_type: type of pin being enabled
* @extack: error reporting
*
@@ -266,7 +290,7 @@ ice_dpll_output_frequency_get(const struct dpll_pin *pin, void *pin_priv,
*/
static int
ice_dpll_pin_enable(struct ice_hw *hw, struct ice_dpll_pin *pin,
- enum ice_dpll_pin_type pin_type,
+ u8 dpll_idx, enum ice_dpll_pin_type pin_type,
struct netlink_ext_ack *extack)
{
u8 flags = 0;
@@ -280,10 +304,12 @@ ice_dpll_pin_enable(struct ice_hw *hw, struct ice_dpll_pin *pin,
ret = ice_aq_set_input_pin_cfg(hw, pin->idx, 0, flags, 0, 0);
break;
case ICE_DPLL_PIN_TYPE_OUTPUT:
+ flags = ICE_AQC_SET_CGU_OUT_CFG_UPDATE_SRC_SEL;
if (pin->flags[0] & ICE_AQC_GET_CGU_OUT_CFG_ESYNC_EN)
flags |= ICE_AQC_SET_CGU_OUT_CFG_ESYNC_EN;
flags |= ICE_AQC_SET_CGU_OUT_CFG_OUT_EN;
- ret = ice_aq_set_output_pin_cfg(hw, pin->idx, flags, 0, 0, 0);
+ ret = ice_aq_set_output_pin_cfg(hw, pin->idx, flags, dpll_idx,
+ 0, 0);
break;
default:
return -EINVAL;
@@ -370,7 +396,7 @@ ice_dpll_pin_state_update(struct ice_pf *pf, struct ice_dpll_pin *pin,
case ICE_DPLL_PIN_TYPE_INPUT:
ret = ice_aq_get_input_pin_cfg(&pf->hw, pin->idx, NULL, NULL,
NULL, &pin->flags[0],
- &pin->freq, NULL);
+ &pin->freq, &pin->phase_adjust);
if (ret)
goto err;
if (ICE_AQC_GET_CGU_IN_CFG_FLG2_INPUT_EN & pin->flags[0]) {
@@ -398,14 +424,27 @@ ice_dpll_pin_state_update(struct ice_pf *pf, struct ice_dpll_pin *pin,
break;
case ICE_DPLL_PIN_TYPE_OUTPUT:
ret = ice_aq_get_output_pin_cfg(&pf->hw, pin->idx,
- &pin->flags[0], NULL,
+ &pin->flags[0], &parent,
&pin->freq, NULL);
if (ret)
goto err;
- if (ICE_AQC_SET_CGU_OUT_CFG_OUT_EN & pin->flags[0])
- pin->state[0] = DPLL_PIN_STATE_CONNECTED;
- else
- pin->state[0] = DPLL_PIN_STATE_DISCONNECTED;
+
+ parent &= ICE_AQC_GET_CGU_OUT_CFG_DPLL_SRC_SEL;
+ if (ICE_AQC_SET_CGU_OUT_CFG_OUT_EN & pin->flags[0]) {
+ pin->state[pf->dplls.eec.dpll_idx] =
+ parent == pf->dplls.eec.dpll_idx ?
+ DPLL_PIN_STATE_CONNECTED :
+ DPLL_PIN_STATE_DISCONNECTED;
+ pin->state[pf->dplls.pps.dpll_idx] =
+ parent == pf->dplls.pps.dpll_idx ?
+ DPLL_PIN_STATE_CONNECTED :
+ DPLL_PIN_STATE_DISCONNECTED;
+ } else {
+ pin->state[pf->dplls.eec.dpll_idx] =
+ DPLL_PIN_STATE_DISCONNECTED;
+ pin->state[pf->dplls.pps.dpll_idx] =
+ DPLL_PIN_STATE_DISCONNECTED;
+ }
break;
case ICE_DPLL_PIN_TYPE_RCLK_INPUT:
for (parent = 0; parent < pf->dplls.rclk.num_parents;
@@ -568,9 +607,13 @@ ice_dpll_pin_state_set(const struct dpll_pin *pin, void *pin_priv,
struct ice_pf *pf = d->pf;
int ret;
+ if (ice_dpll_is_reset(pf, extack))
+ return -EBUSY;
+
mutex_lock(&pf->dplls.lock);
if (enable)
- ret = ice_dpll_pin_enable(&pf->hw, p, pin_type, extack);
+ ret = ice_dpll_pin_enable(&pf->hw, p, d->dpll_idx, pin_type,
+ extack);
else
ret = ice_dpll_pin_disable(&pf->hw, p, pin_type, extack);
if (!ret)
@@ -603,6 +646,11 @@ ice_dpll_output_state_set(const struct dpll_pin *pin, void *pin_priv,
struct netlink_ext_ack *extack)
{
bool enable = state == DPLL_PIN_STATE_CONNECTED;
+ struct ice_dpll_pin *p = pin_priv;
+ struct ice_dpll *d = dpll_priv;
+
+ if (!enable && p->state[d->dpll_idx] == DPLL_PIN_STATE_DISCONNECTED)
+ return 0;
return ice_dpll_pin_state_set(pin, pin_priv, dpll, dpll_priv, enable,
extack, ICE_DPLL_PIN_TYPE_OUTPUT);
@@ -665,14 +713,16 @@ ice_dpll_pin_state_get(const struct dpll_pin *pin, void *pin_priv,
struct ice_pf *pf = d->pf;
int ret;
+ if (ice_dpll_is_reset(pf, extack))
+ return -EBUSY;
+
mutex_lock(&pf->dplls.lock);
ret = ice_dpll_pin_state_update(pf, p, pin_type, extack);
if (ret)
goto unlock;
- if (pin_type == ICE_DPLL_PIN_TYPE_INPUT)
+ if (pin_type == ICE_DPLL_PIN_TYPE_INPUT ||
+ pin_type == ICE_DPLL_PIN_TYPE_OUTPUT)
*state = p->state[d->dpll_idx];
- else if (pin_type == ICE_DPLL_PIN_TYPE_OUTPUT)
- *state = p->state[0];
ret = 0;
unlock:
mutex_unlock(&pf->dplls.lock);
@@ -790,6 +840,9 @@ ice_dpll_input_prio_set(const struct dpll_pin *pin, void *pin_priv,
struct ice_pf *pf = d->pf;
int ret;
+ if (ice_dpll_is_reset(pf, extack))
+ return -EBUSY;
+
mutex_lock(&pf->dplls.lock);
ret = ice_dpll_hw_input_prio_set(pf, d, p, prio, extack);
mutex_unlock(&pf->dplls.lock);
@@ -910,6 +963,9 @@ ice_dpll_pin_phase_adjust_set(const struct dpll_pin *pin, void *pin_priv,
u8 flag, flags_en = 0;
int ret;
+ if (ice_dpll_is_reset(pf, extack))
+ return -EBUSY;
+
mutex_lock(&pf->dplls.lock);
switch (type) {
case ICE_DPLL_PIN_TYPE_INPUT:
@@ -1069,6 +1125,9 @@ ice_dpll_rclk_state_on_pin_set(const struct dpll_pin *pin, void *pin_priv,
int ret = -EINVAL;
u32 hw_idx;
+ if (ice_dpll_is_reset(pf, extack))
+ return -EBUSY;
+
mutex_lock(&pf->dplls.lock);
hw_idx = parent->idx - pf->dplls.base_rclk_idx;
if (hw_idx >= pf->dplls.num_inputs)
@@ -1123,6 +1182,9 @@ ice_dpll_rclk_state_on_pin_get(const struct dpll_pin *pin, void *pin_priv,
int ret = -EINVAL;
u32 hw_idx;
+ if (ice_dpll_is_reset(pf, extack))
+ return -EBUSY;
+
mutex_lock(&pf->dplls.lock);
hw_idx = parent->idx - pf->dplls.base_rclk_idx;
if (hw_idx >= pf->dplls.num_inputs)
@@ -1305,8 +1367,10 @@ static void ice_dpll_periodic_work(struct kthread_work *work)
struct ice_pf *pf = container_of(d, struct ice_pf, dplls);
struct ice_dpll *de = &pf->dplls.eec;
struct ice_dpll *dp = &pf->dplls.pps;
- int ret;
+ int ret = 0;
+ if (ice_is_reset_in_progress(pf->state))
+ goto resched;
mutex_lock(&pf->dplls.lock);
ret = ice_dpll_update_state(pf, de, false);
if (!ret)
@@ -1326,6 +1390,7 @@ static void ice_dpll_periodic_work(struct kthread_work *work)
ice_dpll_notify_changes(de);
ice_dpll_notify_changes(dp);
+resched:
/* Run twice a second or reschedule if update failed */
kthread_queue_delayed_work(d->kworker, &d->work,
ret ? msecs_to_jiffies(10) :
@@ -1532,7 +1597,7 @@ static void ice_dpll_deinit_rclk_pin(struct ice_pf *pf)
}
if (WARN_ON_ONCE(!vsi || !vsi->netdev))
return;
- netdev_dpll_pin_clear(vsi->netdev);
+ dpll_netdev_pin_clear(vsi->netdev);
dpll_pin_put(rclk->pin);
}
@@ -1576,7 +1641,7 @@ ice_dpll_init_rclk_pins(struct ice_pf *pf, struct ice_dpll_pin *pin,
}
if (WARN_ON((!vsi || !vsi->netdev)))
return -EINVAL;
- netdev_dpll_pin_set(vsi->netdev, pf->dplls.rclk.pin);
+ dpll_netdev_pin_set(vsi->netdev, pf->dplls.rclk.pin);
return 0;
@@ -2055,6 +2120,7 @@ void ice_dpll_init(struct ice_pf *pf)
struct ice_dplls *d = &pf->dplls;
int err = 0;
+ mutex_init(&d->lock);
err = ice_dpll_init_info(pf, cgu);
if (err)
goto err_exit;
@@ -2067,7 +2133,6 @@ void ice_dpll_init(struct ice_pf *pf)
err = ice_dpll_init_pins(pf, cgu);
if (err)
goto deinit_pps;
- mutex_init(&d->lock);
if (cgu) {
err = ice_dpll_init_worker(pf);
if (err)
diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c
index 2a25323..467372d 100644
--- a/drivers/net/ethernet/intel/ice/ice_lag.c
+++ b/drivers/net/ethernet/intel/ice/ice_lag.c
@@ -152,6 +152,27 @@ ice_lag_find_hw_by_lport(struct ice_lag *lag, u8 lport)
}
/**
+ * ice_pkg_has_lport_extract - check if lport extraction supported
+ * @hw: HW struct
+ */
+static bool ice_pkg_has_lport_extract(struct ice_hw *hw)
+{
+ int i;
+
+ for (i = 0; i < hw->blk[ICE_BLK_SW].es.count; i++) {
+ u16 offset;
+ u8 fv_prot;
+
+ ice_find_prot_off(hw, ICE_BLK_SW, ICE_SW_DEFAULT_PROFILE, i,
+ &fv_prot, &offset);
+ if (fv_prot == ICE_FV_PROT_MDID &&
+ offset == ICE_LP_EXT_BUF_OFFSET)
+ return true;
+ }
+ return false;
+}
+
+/**
* ice_lag_find_primary - returns pointer to primary interfaces lag struct
* @lag: local interfaces lag struct
*/
@@ -1206,7 +1227,7 @@ static void ice_lag_del_prune_list(struct ice_lag *lag, struct ice_pf *event_pf)
}
/**
- * ice_lag_init_feature_support_flag - Check for NVM support for LAG
+ * ice_lag_init_feature_support_flag - Check for package and NVM support for LAG
* @pf: PF struct
*/
static void ice_lag_init_feature_support_flag(struct ice_pf *pf)
@@ -1219,7 +1240,7 @@ static void ice_lag_init_feature_support_flag(struct ice_pf *pf)
else
ice_clear_feature_support(pf, ICE_F_ROCE_LAG);
- if (caps->sriov_lag)
+ if (caps->sriov_lag && ice_pkg_has_lport_extract(&pf->hw))
ice_set_feature_support(pf, ICE_F_SRIOV_LAG);
else
ice_clear_feature_support(pf, ICE_F_SRIOV_LAG);
diff --git a/drivers/net/ethernet/intel/ice/ice_lag.h b/drivers/net/ethernet/intel/ice/ice_lag.h
index ede833d..183b387 100644
--- a/drivers/net/ethernet/intel/ice/ice_lag.h
+++ b/drivers/net/ethernet/intel/ice/ice_lag.h
@@ -17,6 +17,9 @@ enum ice_lag_role {
#define ICE_LAG_INVALID_PORT 0xFF
#define ICE_LAG_RESET_RETRIES 5
+#define ICE_SW_DEFAULT_PROFILE 0
+#define ICE_FV_PROT_MDID 255
+#define ICE_LP_EXT_BUF_OFFSET 32
struct ice_pf;
struct ice_vf;
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 9be7242..fc23dbe 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2426,7 +2426,7 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vsi_cfg_params *params)
ice_vsi_map_rings_to_vectors(vsi);
/* Associate q_vector rings to napi */
- ice_vsi_set_napi_queues(vsi, true);
+ ice_vsi_set_napi_queues(vsi);
vsi->stat_offsets_loaded = false;
@@ -2904,19 +2904,19 @@ void ice_vsi_dis_irq(struct ice_vsi *vsi)
}
/**
- * ice_queue_set_napi - Set the napi instance for the queue
+ * __ice_queue_set_napi - Set the napi instance for the queue
* @dev: device to which NAPI and queue belong
* @queue_index: Index of queue
* @type: queue type as RX or TX
* @napi: NAPI context
* @locked: is the rtnl_lock already held
*
- * Set the napi instance for the queue
+ * Set the napi instance for the queue. Caller indicates the lock status.
*/
static void
-ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
- enum netdev_queue_type type, struct napi_struct *napi,
- bool locked)
+__ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
+ enum netdev_queue_type type, struct napi_struct *napi,
+ bool locked)
{
if (!locked)
rtnl_lock();
@@ -2926,26 +2926,79 @@ ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
}
/**
- * ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
+ * ice_queue_set_napi - Set the napi instance for the queue
+ * @vsi: VSI being configured
+ * @queue_index: Index of queue
+ * @type: queue type as RX or TX
+ * @napi: NAPI context
+ *
+ * Set the napi instance for the queue. The rtnl lock state is derived from the
+ * execution path.
+ */
+void
+ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
+ enum netdev_queue_type type, struct napi_struct *napi)
+{
+ struct ice_pf *pf = vsi->back;
+
+ if (!vsi->netdev)
+ return;
+
+ if (current_work() == &pf->serv_task ||
+ test_bit(ICE_PREPARED_FOR_RESET, pf->state) ||
+ test_bit(ICE_DOWN, pf->state) ||
+ test_bit(ICE_SUSPENDED, pf->state))
+ __ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
+ false);
+ else
+ __ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
+ true);
+}
+
+/**
+ * __ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
* @q_vector: q_vector pointer
* @locked: is the rtnl_lock already held
*
- * Associate the q_vector napi with all the queue[s] on the vector
+ * Associate the q_vector napi with all the queue[s] on the vector.
+ * Caller indicates the lock status.
*/
-void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
+void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
{
struct ice_rx_ring *rx_ring;
struct ice_tx_ring *tx_ring;
ice_for_each_rx_ring(rx_ring, q_vector->rx)
- ice_queue_set_napi(q_vector->vsi->netdev, rx_ring->q_index,
- NETDEV_QUEUE_TYPE_RX, &q_vector->napi,
- locked);
+ __ice_queue_set_napi(q_vector->vsi->netdev, rx_ring->q_index,
+ NETDEV_QUEUE_TYPE_RX, &q_vector->napi,
+ locked);
ice_for_each_tx_ring(tx_ring, q_vector->tx)
- ice_queue_set_napi(q_vector->vsi->netdev, tx_ring->q_index,
- NETDEV_QUEUE_TYPE_TX, &q_vector->napi,
- locked);
+ __ice_queue_set_napi(q_vector->vsi->netdev, tx_ring->q_index,
+ NETDEV_QUEUE_TYPE_TX, &q_vector->napi,
+ locked);
+ /* Also set the interrupt number for the NAPI */
+ netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
+}
+
+/**
+ * ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
+ * @q_vector: q_vector pointer
+ *
+ * Associate the q_vector napi with all the queue[s] on the vector
+ */
+void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector)
+{
+ struct ice_rx_ring *rx_ring;
+ struct ice_tx_ring *tx_ring;
+
+ ice_for_each_rx_ring(rx_ring, q_vector->rx)
+ ice_queue_set_napi(q_vector->vsi, rx_ring->q_index,
+ NETDEV_QUEUE_TYPE_RX, &q_vector->napi);
+
+ ice_for_each_tx_ring(tx_ring, q_vector->tx)
+ ice_queue_set_napi(q_vector->vsi, tx_ring->q_index,
+ NETDEV_QUEUE_TYPE_TX, &q_vector->napi);
/* Also set the interrupt number for the NAPI */
netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
}
@@ -2953,11 +3006,10 @@ void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
/**
* ice_vsi_set_napi_queues
* @vsi: VSI pointer
- * @locked: is the rtnl_lock already held
*
* Associate queue[s] with napi for all vectors
*/
-void ice_vsi_set_napi_queues(struct ice_vsi *vsi, bool locked)
+void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
{
int i;
@@ -2965,7 +3017,7 @@ void ice_vsi_set_napi_queues(struct ice_vsi *vsi, bool locked)
return;
ice_for_each_q_vector(vsi, i)
- ice_q_vector_set_napi_queues(vsi->q_vectors[i], locked);
+ ice_q_vector_set_napi_queues(vsi->q_vectors[i]);
}
/**
@@ -3140,7 +3192,7 @@ ice_vsi_realloc_stat_arrays(struct ice_vsi *vsi)
}
}
- tx_ring_stats = vsi_stat->rx_ring_stats;
+ tx_ring_stats = vsi_stat->tx_ring_stats;
vsi_stat->tx_ring_stats =
krealloc_array(vsi_stat->tx_ring_stats, req_txq,
sizeof(*vsi_stat->tx_ring_stats),
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 71bd272..bfcfc58 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -91,9 +91,15 @@ void ice_vsi_cfg_netdev_tc(struct ice_vsi *vsi, u8 ena_tc);
struct ice_vsi *
ice_vsi_setup(struct ice_pf *pf, struct ice_vsi_cfg_params *params);
-void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked);
+void
+ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
+ enum netdev_queue_type type, struct napi_struct *napi);
-void ice_vsi_set_napi_queues(struct ice_vsi *vsi, bool locked);
+void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked);
+
+void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector);
+
+void ice_vsi_set_napi_queues(struct ice_vsi *vsi);
int ice_vsi_release(struct ice_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index dd4a9bc..df6a68a 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3495,7 +3495,7 @@ static void ice_napi_add(struct ice_vsi *vsi)
ice_for_each_q_vector(vsi, v_idx) {
netif_napi_add(vsi->netdev, &vsi->q_vectors[v_idx]->napi,
ice_napi_poll);
- ice_q_vector_set_napi_queues(vsi->q_vectors[v_idx], false);
+ __ice_q_vector_set_napi_queues(vsi->q_vectors[v_idx], false);
}
}
@@ -5447,6 +5447,7 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
if (ret)
goto err_reinit;
ice_vsi_map_rings_to_vectors(pf->vsi[v]);
+ ice_vsi_set_napi_queues(pf->vsi[v]);
}
ret = ice_req_irq_msix_misc(pf);
@@ -8012,6 +8013,8 @@ ice_bridge_setlink(struct net_device *dev, struct nlmsghdr *nlh,
pf_sw = pf->first_sw;
/* find the attribute in the netlink message */
br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
+ if (!br_spec)
+ return -EINVAL;
nla_for_each_nested(attr, br_spec, rem) {
__u16 mode;
diff --git a/drivers/net/ethernet/intel/ice/ice_osdep.h b/drivers/net/ethernet/intel/ice/ice_osdep.h
index 82bc54f..a2562f0 100644
--- a/drivers/net/ethernet/intel/ice/ice_osdep.h
+++ b/drivers/net/ethernet/intel/ice/ice_osdep.h
@@ -24,7 +24,7 @@
#define rd64(a, reg) readq((a)->hw_addr + (reg))
#define ice_flush(a) rd32((a), GLGEN_STAT)
-#define ICE_M(m, s) ((m) << (s))
+#define ICE_M(m, s) ((m ## U) << (s))
struct ice_dma_mem {
void *va;
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index a94a1c4..b0f78c2 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -1068,6 +1068,7 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
struct ice_pf *pf = pci_get_drvdata(pdev);
u16 prev_msix, prev_queues, queues;
bool needs_rebuild = false;
+ struct ice_vsi *vsi;
struct ice_vf *vf;
int id;
@@ -1102,6 +1103,10 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
if (!vf)
return -ENOENT;
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi)
+ return -ENOENT;
+
prev_msix = vf->num_msix;
prev_queues = vf->num_vf_qs;
@@ -1122,7 +1127,7 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
if (vf->first_vector_idx < 0)
goto unroll;
- if (ice_vf_reconfig_vsi(vf)) {
+ if (ice_vf_reconfig_vsi(vf) || ice_vf_init_host_cfg(vf, vsi)) {
/* Try to rebuild with previous values */
needs_rebuild = true;
goto unroll;
@@ -1148,8 +1153,10 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
if (vf->first_vector_idx < 0)
return -EINVAL;
- if (needs_rebuild)
+ if (needs_rebuild) {
ice_vf_reconfig_vsi(vf);
+ ice_vf_init_host_cfg(vf, vsi);
+ }
ice_ena_vf_mappings(vf);
ice_put_vf(vf);
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h
index 41ab6d7..a508e91 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -1072,7 +1072,7 @@ struct ice_aq_get_set_rss_lut_params {
#define ICE_OROM_VER_BUILD_SHIFT 8
#define ICE_OROM_VER_BUILD_MASK (0xffff << ICE_OROM_VER_BUILD_SHIFT)
#define ICE_OROM_VER_SHIFT 24
-#define ICE_OROM_VER_MASK (0xff << ICE_OROM_VER_SHIFT)
+#define ICE_OROM_VER_MASK (0xffU << ICE_OROM_VER_SHIFT)
#define ICE_SR_PFA_PTR 0x40
#define ICE_SR_1ST_NVM_BANK_PTR 0x42
#define ICE_SR_NVM_BANK_SIZE 0x43
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index c925813..6f2328a 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -440,7 +440,6 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
vf->driver_caps = *(u32 *)msg;
else
vf->driver_caps = VIRTCHNL_VF_OFFLOAD_L2 |
- VIRTCHNL_VF_OFFLOAD_RSS_REG |
VIRTCHNL_VF_OFFLOAD_VLAN;
vfres->vf_cap_flags = VIRTCHNL_VF_OFFLOAD_L2;
@@ -453,14 +452,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
vfres->vf_cap_flags |= ice_vc_get_vlan_caps(hw, vf, vsi,
vf->driver_caps);
- if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF) {
+ if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF)
vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_PF;
- } else {
- if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_AQ)
- vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_AQ;
- else
- vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_REG;
- }
if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC)
vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC;
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 5e19d48..d796dbd 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -13,8 +13,6 @@
* - opcodes needed by VF when caps are activated
*
* Caps that don't use new opcodes (no opcodes should be allowed):
- * - VIRTCHNL_VF_OFFLOAD_RSS_AQ
- * - VIRTCHNL_VF_OFFLOAD_RSS_REG
* - VIRTCHNL_VF_OFFLOAD_WB_ON_ITR
* - VIRTCHNL_VF_OFFLOAD_CRC
* - VIRTCHNL_VF_OFFLOAD_RX_POLLING
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 8b81a16..2eecd0f 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -179,6 +179,10 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
return -EBUSY;
usleep_range(1000, 2000);
}
+
+ ice_qvec_dis_irq(vsi, rx_ring, q_vector);
+ ice_qvec_toggle_napi(vsi, q_vector, false);
+
netif_tx_stop_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
ice_fill_txq_meta(vsi, tx_ring, &txq_meta);
@@ -195,13 +199,10 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
if (err)
return err;
}
- ice_qvec_dis_irq(vsi, rx_ring, q_vector);
-
err = ice_vsi_ctrl_one_rx_ring(vsi, false, q_idx, true);
if (err)
return err;
- ice_qvec_toggle_napi(vsi, q_vector, false);
ice_qp_clean_rings(vsi, q_idx);
ice_qp_reset_stats(vsi, q_idx);
@@ -259,11 +260,11 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
if (err)
return err;
- clear_bit(ICE_CFG_BUSY, vsi->state);
ice_qvec_toggle_napi(vsi, q_vector, true);
ice_qvec_ena_irq(vsi, q_vector);
netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
+ clear_bit(ICE_CFG_BUSY, vsi->state);
return 0;
}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index d0cdd63..390977a 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -2087,8 +2087,10 @@ int idpf_send_disable_queues_msg(struct idpf_vport *vport)
set_bit(__IDPF_Q_POLL_MODE, vport->txqs[i]->flags);
/* schedule the napi to receive all the marker packets */
+ local_bh_disable();
for (i = 0; i < vport->num_q_vectors; i++)
napi_schedule(&vport->q_vectors[i].napi);
+ local_bh_enable();
return idpf_wait_for_marker_event(vport);
}
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index a2b7595..3c2dc7b 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -637,7 +637,7 @@ struct igb_adapter {
struct timespec64 period;
} perout[IGB_N_PEROUT];
- char fw_version[32];
+ char fw_version[48];
#ifdef CONFIG_IGB_HWMON
struct hwmon_buff *igb_hwmon_buff;
bool ets;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 4df8d41..cebb44f 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3069,7 +3069,6 @@ void igb_set_fw_version(struct igb_adapter *adapter)
{
struct e1000_hw *hw = &adapter->hw;
struct e1000_fw_version fw;
- char *lbuf;
igb_get_fw_version(hw, &fw);
@@ -3077,34 +3076,36 @@ void igb_set_fw_version(struct igb_adapter *adapter)
case e1000_i210:
case e1000_i211:
if (!(igb_get_flash_presence_i210(hw))) {
- lbuf = kasprintf(GFP_KERNEL, "%2d.%2d-%d",
- fw.invm_major, fw.invm_minor,
- fw.invm_img_type);
+ snprintf(adapter->fw_version,
+ sizeof(adapter->fw_version),
+ "%2d.%2d-%d",
+ fw.invm_major, fw.invm_minor,
+ fw.invm_img_type);
break;
}
fallthrough;
default:
/* if option rom is valid, display its version too */
if (fw.or_valid) {
- lbuf = kasprintf(GFP_KERNEL, "%d.%d, 0x%08x, %d.%d.%d",
- fw.eep_major, fw.eep_minor,
- fw.etrack_id, fw.or_major, fw.or_build,
- fw.or_patch);
+ snprintf(adapter->fw_version,
+ sizeof(adapter->fw_version),
+ "%d.%d, 0x%08x, %d.%d.%d",
+ fw.eep_major, fw.eep_minor, fw.etrack_id,
+ fw.or_major, fw.or_build, fw.or_patch);
/* no option rom */
} else if (fw.etrack_id != 0X0000) {
- lbuf = kasprintf(GFP_KERNEL, "%d.%d, 0x%08x",
- fw.eep_major, fw.eep_minor,
- fw.etrack_id);
+ snprintf(adapter->fw_version,
+ sizeof(adapter->fw_version),
+ "%d.%d, 0x%08x",
+ fw.eep_major, fw.eep_minor, fw.etrack_id);
} else {
- lbuf = kasprintf(GFP_KERNEL, "%d.%d.%d", fw.eep_major,
- fw.eep_minor, fw.eep_build);
+ snprintf(adapter->fw_version,
+ sizeof(adapter->fw_version),
+ "%d.%d.%d",
+ fw.eep_major, fw.eep_minor, fw.eep_build);
}
break;
}
-
- /* the truncate happens here if it doesn't fit */
- strscpy(adapter->fw_version, lbuf, sizeof(adapter->fw_version));
- kfree(lbuf);
}
/**
diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index 319c544b..f945705 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -957,7 +957,7 @@ static void igb_ptp_tx_hwtstamp(struct igb_adapter *adapter)
igb_ptp_systim_to_hwtstamp(adapter, &shhwtstamps, regval);
/* adjust timestamp for the TX latency based on link speed */
- if (adapter->hw.mac.type == e1000_i210) {
+ if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) {
switch (adapter->link_speed) {
case SPEED_10:
adjust = IGB_I210_TX_LATENCY_10;
@@ -1003,6 +1003,7 @@ int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va,
ktime_t *timestamp)
{
struct igb_adapter *adapter = q_vector->adapter;
+ struct e1000_hw *hw = &adapter->hw;
struct skb_shared_hwtstamps ts;
__le64 *regval = (__le64 *)va;
int adjust = 0;
@@ -1022,7 +1023,7 @@ int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va,
igb_ptp_systim_to_hwtstamp(adapter, &ts, le64_to_cpu(regval[1]));
/* adjust timestamp for the RX latency based on link speed */
- if (adapter->hw.mac.type == e1000_i210) {
+ if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) {
switch (adapter->link_speed) {
case SPEED_10:
adjust = IGB_I210_RX_LATENCY_10;
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ba8d3fe..81c21a8 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6487,7 +6487,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
int cpu = smp_processor_id();
struct netdev_queue *nq;
struct igc_ring *ring;
- int i, drops;
+ int i, nxmit;
if (unlikely(!netif_carrier_ok(dev)))
return -ENETDOWN;
@@ -6503,16 +6503,15 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
/* Avoid transmit queue timeout since we share it with the slow path */
txq_trans_cond_update(nq);
- drops = 0;
+ nxmit = 0;
for (i = 0; i < num_frames; i++) {
int err;
struct xdp_frame *xdpf = frames[i];
err = igc_xdp_init_tx_descriptor(ring, xdpf);
- if (err) {
- xdp_return_frame_rx_napi(xdpf);
- drops++;
- }
+ if (err)
+ break;
+ nxmit++;
}
if (flags & XDP_XMIT_FLUSH)
@@ -6520,7 +6519,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
__netif_tx_unlock(nq);
- return num_frames - drops;
+ return nxmit;
}
static void igc_trigger_rxtxq_interrupt(struct igc_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/igc/igc_phy.c b/drivers/net/ethernet/intel/igc/igc_phy.c
index 7cd8716..861f370 100644
--- a/drivers/net/ethernet/intel/igc/igc_phy.c
+++ b/drivers/net/ethernet/intel/igc/igc_phy.c
@@ -130,11 +130,7 @@ void igc_power_down_phy_copper(struct igc_hw *hw)
/* The PHY will retain its settings across a power down/up cycle */
hw->phy.ops.read_reg(hw, PHY_CONTROL, &mii_reg);
mii_reg |= MII_CR_POWER_DOWN;
-
- /* Temporary workaround - should be removed when PHY will implement
- * IEEE registers as properly
- */
- /* hw->phy.ops.write_reg(hw, PHY_CONTROL, mii_reg);*/
+ hw->phy.ops.write_reg(hw, PHY_CONTROL, mii_reg);
usleep_range(1000, 2000);
}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index bd54152..99876b7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2939,8 +2939,8 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
u64 qmask)
{
- u32 mask;
struct ixgbe_hw *hw = &adapter->hw;
+ u32 mask;
switch (hw->mac.type) {
case ixgbe_mac_82598EB:
@@ -10525,6 +10525,44 @@ static void ixgbe_reset_rxr_stats(struct ixgbe_ring *rx_ring)
}
/**
+ * ixgbe_irq_disable_single - Disable single IRQ vector
+ * @adapter: adapter structure
+ * @ring: ring index
+ **/
+static void ixgbe_irq_disable_single(struct ixgbe_adapter *adapter, u32 ring)
+{
+ struct ixgbe_hw *hw = &adapter->hw;
+ u64 qmask = BIT_ULL(ring);
+ u32 mask;
+
+ switch (adapter->hw.mac.type) {
+ case ixgbe_mac_82598EB:
+ mask = qmask & IXGBE_EIMC_RTX_QUEUE;
+ IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, mask);
+ break;
+ case ixgbe_mac_82599EB:
+ case ixgbe_mac_X540:
+ case ixgbe_mac_X550:
+ case ixgbe_mac_X550EM_x:
+ case ixgbe_mac_x550em_a:
+ mask = (qmask & 0xFFFFFFFF);
+ if (mask)
+ IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+ mask = (qmask >> 32);
+ if (mask)
+ IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+ break;
+ default:
+ break;
+ }
+ IXGBE_WRITE_FLUSH(&adapter->hw);
+ if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
+ synchronize_irq(adapter->msix_entries[ring].vector);
+ else
+ synchronize_irq(adapter->pdev->irq);
+}
+
+/**
* ixgbe_txrx_ring_disable - Disable Rx/Tx/XDP Tx rings
* @adapter: adapter structure
* @ring: ring index
@@ -10540,6 +10578,11 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
tx_ring = adapter->tx_ring[ring];
xdp_ring = adapter->xdp_ring[ring];
+ ixgbe_irq_disable_single(adapter, ring);
+
+ /* Rx/Tx/XDP Tx share the same napi context. */
+ napi_disable(&rx_ring->q_vector->napi);
+
ixgbe_disable_txr(adapter, tx_ring);
if (xdp_ring)
ixgbe_disable_txr(adapter, xdp_ring);
@@ -10548,9 +10591,6 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
if (xdp_ring)
synchronize_rcu();
- /* Rx/Tx/XDP Tx share the same napi context. */
- napi_disable(&rx_ring->q_vector->napi);
-
ixgbe_clean_tx_ring(tx_ring);
if (xdp_ring)
ixgbe_clean_tx_ring(xdp_ring);
@@ -10578,9 +10618,6 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
tx_ring = adapter->tx_ring[ring];
xdp_ring = adapter->xdp_ring[ring];
- /* Rx/Tx/XDP Tx share the same napi context. */
- napi_enable(&rx_ring->q_vector->napi);
-
ixgbe_configure_tx_ring(adapter, tx_ring);
if (xdp_ring)
ixgbe_configure_tx_ring(adapter, xdp_ring);
@@ -10589,6 +10626,11 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
clear_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
if (xdp_ring)
clear_bit(__IXGBE_TX_DISABLED, &xdp_ring->state);
+
+ /* Rx/Tx/XDP Tx share the same napi context. */
+ napi_enable(&rx_ring->q_vector->napi);
+ ixgbe_irq_enable_queues(adapter, BIT_ULL(ring));
+ IXGBE_WRITE_FLUSH(&adapter->hw);
}
/**
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index 167145b..516adb5 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -61,28 +61,6 @@ int rvu_npc_get_tx_nibble_cfg(struct rvu *rvu, u64 nibble_ena)
return 0;
}
-static int npc_mcam_verify_pf_func(struct rvu *rvu,
- struct mcam_entry *entry_data, u8 intf,
- u16 pcifunc)
-{
- u16 pf_func, pf_func_mask;
-
- if (is_npc_intf_rx(intf))
- return 0;
-
- pf_func_mask = (entry_data->kw_mask[0] >> 32) &
- NPC_KEX_PF_FUNC_MASK;
- pf_func = (entry_data->kw[0] >> 32) & NPC_KEX_PF_FUNC_MASK;
-
- pf_func = be16_to_cpu((__force __be16)pf_func);
- if (pf_func_mask != NPC_KEX_PF_FUNC_MASK ||
- ((pf_func & ~RVU_PFVF_FUNC_MASK) !=
- (pcifunc & ~RVU_PFVF_FUNC_MASK)))
- return -EINVAL;
-
- return 0;
-}
-
void rvu_npc_set_pkind(struct rvu *rvu, int pkind, struct rvu_pfvf *pfvf)
{
int blkaddr;
@@ -437,6 +415,10 @@ static void npc_fixup_vf_rule(struct rvu *rvu, struct npc_mcam *mcam,
return;
}
+ /* AF modifies given action iff PF/VF has requested for it */
+ if ((entry->action & 0xFULL) != NIX_RX_ACTION_DEFAULT)
+ return;
+
/* copy VF default entry action to the VF mcam entry */
rx_action = npc_get_default_entry_action(rvu, mcam, blkaddr,
target_func);
@@ -1850,8 +1832,8 @@ void npc_mcam_rsrcs_deinit(struct rvu *rvu)
{
struct npc_mcam *mcam = &rvu->hw->mcam;
- kfree(mcam->bmap);
- kfree(mcam->bmap_reverse);
+ bitmap_free(mcam->bmap);
+ bitmap_free(mcam->bmap_reverse);
kfree(mcam->entry2pfvf_map);
kfree(mcam->cntr2pfvf_map);
kfree(mcam->entry2cntr_map);
@@ -1904,21 +1886,20 @@ int npc_mcam_rsrcs_init(struct rvu *rvu, int blkaddr)
mcam->pf_offset = mcam->nixlf_offset + nixlf_count;
/* Allocate bitmaps for managing MCAM entries */
- mcam->bmap = kmalloc_array(BITS_TO_LONGS(mcam->bmap_entries),
- sizeof(long), GFP_KERNEL);
+ mcam->bmap = bitmap_zalloc(mcam->bmap_entries, GFP_KERNEL);
if (!mcam->bmap)
return -ENOMEM;
- mcam->bmap_reverse = kmalloc_array(BITS_TO_LONGS(mcam->bmap_entries),
- sizeof(long), GFP_KERNEL);
+ mcam->bmap_reverse = bitmap_zalloc(mcam->bmap_entries, GFP_KERNEL);
if (!mcam->bmap_reverse)
goto free_bmap;
mcam->bmap_fcnt = mcam->bmap_entries;
/* Alloc memory for saving entry to RVU PFFUNC allocation mapping */
- mcam->entry2pfvf_map = kmalloc_array(mcam->bmap_entries,
- sizeof(u16), GFP_KERNEL);
+ mcam->entry2pfvf_map = kcalloc(mcam->bmap_entries, sizeof(u16),
+ GFP_KERNEL);
+
if (!mcam->entry2pfvf_map)
goto free_bmap_reverse;
@@ -1941,21 +1922,21 @@ int npc_mcam_rsrcs_init(struct rvu *rvu, int blkaddr)
if (err)
goto free_entry_map;
- mcam->cntr2pfvf_map = kmalloc_array(mcam->counters.max,
- sizeof(u16), GFP_KERNEL);
+ mcam->cntr2pfvf_map = kcalloc(mcam->counters.max, sizeof(u16),
+ GFP_KERNEL);
if (!mcam->cntr2pfvf_map)
goto free_cntr_bmap;
/* Alloc memory for MCAM entry to counter mapping and for tracking
* counter's reference count.
*/
- mcam->entry2cntr_map = kmalloc_array(mcam->bmap_entries,
- sizeof(u16), GFP_KERNEL);
+ mcam->entry2cntr_map = kcalloc(mcam->bmap_entries, sizeof(u16),
+ GFP_KERNEL);
if (!mcam->entry2cntr_map)
goto free_cntr_map;
- mcam->cntr_refcnt = kmalloc_array(mcam->counters.max,
- sizeof(u16), GFP_KERNEL);
+ mcam->cntr_refcnt = kcalloc(mcam->counters.max, sizeof(u16),
+ GFP_KERNEL);
if (!mcam->cntr_refcnt)
goto free_entry_cntr_map;
@@ -1988,9 +1969,9 @@ int npc_mcam_rsrcs_init(struct rvu *rvu, int blkaddr)
free_entry_map:
kfree(mcam->entry2pfvf_map);
free_bmap_reverse:
- kfree(mcam->bmap_reverse);
+ bitmap_free(mcam->bmap_reverse);
free_bmap:
- kfree(mcam->bmap);
+ bitmap_free(mcam->bmap);
return -ENOMEM;
}
@@ -2852,12 +2833,6 @@ int rvu_mbox_handler_npc_mcam_write_entry(struct rvu *rvu,
else
nix_intf = pfvf->nix_rx_intf;
- if (!is_pffunc_af(pcifunc) &&
- npc_mcam_verify_pf_func(rvu, &req->entry_data, req->intf, pcifunc)) {
- rc = NPC_MCAM_INVALID_REQ;
- goto exit;
- }
-
/* For AF installed rules, the nix_intf should be set to target NIX */
if (is_pffunc_af(req->hdr.pcifunc))
nix_intf = req->intf;
@@ -3209,10 +3184,6 @@ int rvu_mbox_handler_npc_mcam_alloc_and_write_entry(struct rvu *rvu,
if (!is_npc_interface_valid(rvu, req->intf))
return NPC_MCAM_INVALID_REQ;
- if (npc_mcam_verify_pf_func(rvu, &req->entry_data, req->intf,
- req->hdr.pcifunc))
- return NPC_MCAM_INVALID_REQ;
-
/* Try to allocate a MCAM entry */
entry_req.hdr.pcifunc = req->hdr.pcifunc;
entry_req.contig = true;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 7ca6941..02d0b70 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -951,8 +951,11 @@ int otx2_sq_init(struct otx2_nic *pfvf, u16 qidx, u16 sqb_aura)
if (pfvf->ptp && qidx < pfvf->hw.tx_queues) {
err = qmem_alloc(pfvf->dev, &sq->timestamps, qset->sqe_cnt,
sizeof(*sq->timestamps));
- if (err)
+ if (err) {
+ kfree(sq->sg);
+ sq->sg = NULL;
return err;
+ }
}
sq->head = 0;
@@ -968,7 +971,14 @@ int otx2_sq_init(struct otx2_nic *pfvf, u16 qidx, u16 sqb_aura)
sq->stats.bytes = 0;
sq->stats.pkts = 0;
- return pfvf->hw_ops->sq_aq_init(pfvf, qidx, sqb_aura);
+ err = pfvf->hw_ops->sq_aq_init(pfvf, qidx, sqb_aura);
+ if (err) {
+ kfree(sq->sg);
+ sq->sg = NULL;
+ return err;
+ }
+
+ return 0;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 3e06423..98d4306 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -157,6 +157,12 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
return -EOPNOTSUPP;
}
+ if (action == DEVLINK_RELOAD_ACTION_FW_ACTIVATE &&
+ !dev->priv.fw_reset) {
+ NL_SET_ERR_MSG_MOD(extack, "FW activate is unsupported for this function");
+ return -EOPNOTSUPP;
+ }
+
if (mlx5_core_is_pf(dev) && pci_num_vf(pdev))
NL_SET_ERR_MSG_MOD(extack, "reload while VFs are present is unfavorable");
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dpll.c b/drivers/net/ethernet/mellanox/mlx5/core/dpll.c
index 18fed2b..d74a5aa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dpll.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dpll.c
@@ -261,7 +261,7 @@ static void mlx5_dpll_netdev_dpll_pin_set(struct mlx5_dpll *mdpll,
{
if (mdpll->tracking_netdev)
return;
- netdev_dpll_pin_set(netdev, mdpll->dpll_pin);
+ dpll_netdev_pin_set(netdev, mdpll->dpll_pin);
mdpll->tracking_netdev = netdev;
}
@@ -269,7 +269,7 @@ static void mlx5_dpll_netdev_dpll_pin_clear(struct mlx5_dpll *mdpll)
{
if (!mdpll->tracking_netdev)
return;
- netdev_dpll_pin_clear(mdpll->tracking_netdev);
+ dpll_netdev_pin_clear(mdpll->tracking_netdev);
mdpll->tracking_netdev = NULL;
}
@@ -389,7 +389,7 @@ static void mlx5_dpll_remove(struct auxiliary_device *adev)
struct mlx5_dpll *mdpll = auxiliary_get_drvdata(adev);
struct mlx5_core_dev *mdev = mdpll->mdev;
- cancel_delayed_work(&mdpll->work);
+ cancel_delayed_work_sync(&mdpll->work);
mlx5_dpll_mdev_netdev_untrack(mdpll, mdev);
destroy_workqueue(mdpll->wq);
dpll_pin_unregister(mdpll->dpll, mdpll->dpll_pin,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index 078f56a..ca05b32 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -42,9 +42,9 @@ mlx5e_ptp_port_ts_cqe_list_add(struct mlx5e_ptp_port_ts_cqe_list *list, u8 metad
WARN_ON_ONCE(tracker->inuse);
tracker->inuse = true;
- spin_lock(&list->tracker_list_lock);
+ spin_lock_bh(&list->tracker_list_lock);
list_add_tail(&tracker->entry, &list->tracker_list_head);
- spin_unlock(&list->tracker_list_lock);
+ spin_unlock_bh(&list->tracker_list_lock);
}
static void
@@ -54,9 +54,9 @@ mlx5e_ptp_port_ts_cqe_list_remove(struct mlx5e_ptp_port_ts_cqe_list *list, u8 me
WARN_ON_ONCE(!tracker->inuse);
tracker->inuse = false;
- spin_lock(&list->tracker_list_lock);
+ spin_lock_bh(&list->tracker_list_lock);
list_del(&tracker->entry);
- spin_unlock(&list->tracker_list_lock);
+ spin_unlock_bh(&list->tracker_list_lock);
}
void mlx5e_ptpsq_track_metadata(struct mlx5e_ptpsq *ptpsq, u8 metadata)
@@ -155,7 +155,7 @@ static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
struct mlx5e_ptp_metadata_map *metadata_map = &ptpsq->metadata_map;
struct mlx5e_ptp_port_ts_cqe_tracker *pos, *n;
- spin_lock(&cqe_list->tracker_list_lock);
+ spin_lock_bh(&cqe_list->tracker_list_lock);
list_for_each_entry_safe(pos, n, &cqe_list->tracker_list_head, entry) {
struct sk_buff *skb =
mlx5e_ptp_metadata_map_lookup(metadata_map, pos->metadata_id);
@@ -170,7 +170,7 @@ static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
pos->inuse = false;
list_del(&pos->entry);
}
- spin_unlock(&cqe_list->tracker_list_lock);
+ spin_unlock_bh(&cqe_list->tracker_list_lock);
}
#define PTP_WQE_CTR2IDX(val) ((val) & ptpsq->ts_cqe_ctr_mask)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
index 86bf007..b500cc2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
@@ -37,7 +37,7 @@ mlx5e_tc_post_act_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
if (!MLX5_CAP_FLOWTABLE_TYPE(priv->mdev, ignore_flow_level, table_type)) {
if (priv->mdev->coredev_type == MLX5_COREDEV_PF)
- mlx5_core_warn(priv->mdev, "firmware level support is missing\n");
+ mlx5_core_dbg(priv->mdev, "firmware flow level support is missing\n");
err = -EOPNOTSUPP;
goto err_check;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
index d4ebd87..b2cabd6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
@@ -310,9 +310,9 @@ static void mlx5e_macsec_destroy_object(struct mlx5_core_dev *mdev, u32 macsec_o
mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
}
-static void mlx5e_macsec_cleanup_sa(struct mlx5e_macsec *macsec,
- struct mlx5e_macsec_sa *sa,
- bool is_tx, struct net_device *netdev, u32 fs_id)
+static void mlx5e_macsec_cleanup_sa_fs(struct mlx5e_macsec *macsec,
+ struct mlx5e_macsec_sa *sa, bool is_tx,
+ struct net_device *netdev, u32 fs_id)
{
int action = (is_tx) ? MLX5_ACCEL_MACSEC_ACTION_ENCRYPT :
MLX5_ACCEL_MACSEC_ACTION_DECRYPT;
@@ -322,20 +322,49 @@ static void mlx5e_macsec_cleanup_sa(struct mlx5e_macsec *macsec,
mlx5_macsec_fs_del_rule(macsec->mdev->macsec_fs, sa->macsec_rule, action, netdev,
fs_id);
- mlx5e_macsec_destroy_object(macsec->mdev, sa->macsec_obj_id);
sa->macsec_rule = NULL;
}
+static void mlx5e_macsec_cleanup_sa(struct mlx5e_macsec *macsec,
+ struct mlx5e_macsec_sa *sa, bool is_tx,
+ struct net_device *netdev, u32 fs_id)
+{
+ mlx5e_macsec_cleanup_sa_fs(macsec, sa, is_tx, netdev, fs_id);
+ mlx5e_macsec_destroy_object(macsec->mdev, sa->macsec_obj_id);
+}
+
+static int mlx5e_macsec_init_sa_fs(struct macsec_context *ctx,
+ struct mlx5e_macsec_sa *sa, bool encrypt,
+ bool is_tx, u32 *fs_id)
+{
+ struct mlx5e_priv *priv = macsec_netdev_priv(ctx->netdev);
+ struct mlx5_macsec_fs *macsec_fs = priv->mdev->macsec_fs;
+ struct mlx5_macsec_rule_attrs rule_attrs;
+ union mlx5_macsec_rule *macsec_rule;
+
+ rule_attrs.macsec_obj_id = sa->macsec_obj_id;
+ rule_attrs.sci = sa->sci;
+ rule_attrs.assoc_num = sa->assoc_num;
+ rule_attrs.action = (is_tx) ? MLX5_ACCEL_MACSEC_ACTION_ENCRYPT :
+ MLX5_ACCEL_MACSEC_ACTION_DECRYPT;
+
+ macsec_rule = mlx5_macsec_fs_add_rule(macsec_fs, ctx, &rule_attrs, fs_id);
+ if (!macsec_rule)
+ return -ENOMEM;
+
+ sa->macsec_rule = macsec_rule;
+
+ return 0;
+}
+
static int mlx5e_macsec_init_sa(struct macsec_context *ctx,
struct mlx5e_macsec_sa *sa,
bool encrypt, bool is_tx, u32 *fs_id)
{
struct mlx5e_priv *priv = macsec_netdev_priv(ctx->netdev);
struct mlx5e_macsec *macsec = priv->macsec;
- struct mlx5_macsec_rule_attrs rule_attrs;
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5_macsec_obj_attrs obj_attrs;
- union mlx5_macsec_rule *macsec_rule;
int err;
obj_attrs.next_pn = sa->next_pn;
@@ -357,20 +386,12 @@ static int mlx5e_macsec_init_sa(struct macsec_context *ctx,
if (err)
return err;
- rule_attrs.macsec_obj_id = sa->macsec_obj_id;
- rule_attrs.sci = sa->sci;
- rule_attrs.assoc_num = sa->assoc_num;
- rule_attrs.action = (is_tx) ? MLX5_ACCEL_MACSEC_ACTION_ENCRYPT :
- MLX5_ACCEL_MACSEC_ACTION_DECRYPT;
-
- macsec_rule = mlx5_macsec_fs_add_rule(mdev->macsec_fs, ctx, &rule_attrs, fs_id);
- if (!macsec_rule) {
- err = -ENOMEM;
- goto destroy_macsec_object;
+ if (sa->active) {
+ err = mlx5e_macsec_init_sa_fs(ctx, sa, encrypt, is_tx, fs_id);
+ if (err)
+ goto destroy_macsec_object;
}
- sa->macsec_rule = macsec_rule;
-
return 0;
destroy_macsec_object:
@@ -526,9 +547,7 @@ static int mlx5e_macsec_add_txsa(struct macsec_context *ctx)
goto destroy_sa;
macsec_device->tx_sa[assoc_num] = tx_sa;
- if (!secy->operational ||
- assoc_num != tx_sc->encoding_sa ||
- !tx_sa->active)
+ if (!secy->operational)
goto out;
err = mlx5e_macsec_init_sa(ctx, tx_sa, tx_sc->encrypt, true, NULL);
@@ -595,7 +614,7 @@ static int mlx5e_macsec_upd_txsa(struct macsec_context *ctx)
goto out;
if (ctx_tx_sa->active) {
- err = mlx5e_macsec_init_sa(ctx, tx_sa, tx_sc->encrypt, true, NULL);
+ err = mlx5e_macsec_init_sa_fs(ctx, tx_sa, tx_sc->encrypt, true, NULL);
if (err)
goto out;
} else {
@@ -604,7 +623,7 @@ static int mlx5e_macsec_upd_txsa(struct macsec_context *ctx)
goto out;
}
- mlx5e_macsec_cleanup_sa(macsec, tx_sa, true, ctx->secy->netdev, 0);
+ mlx5e_macsec_cleanup_sa_fs(macsec, tx_sa, true, ctx->secy->netdev, 0);
}
out:
mutex_unlock(&macsec->lock);
@@ -1030,8 +1049,9 @@ static int mlx5e_macsec_del_rxsa(struct macsec_context *ctx)
goto out;
}
- mlx5e_macsec_cleanup_sa(macsec, rx_sa, false, ctx->secy->netdev,
- rx_sc->sc_xarray_element->fs_id);
+ if (rx_sa->active)
+ mlx5e_macsec_cleanup_sa(macsec, rx_sa, false, ctx->secy->netdev,
+ rx_sc->sc_xarray_element->fs_id);
mlx5_destroy_encryption_key(macsec->mdev, rx_sa->enc_key_id);
kfree(rx_sa);
rx_sc->rx_sa[assoc_num] = NULL;
@@ -1112,8 +1132,8 @@ static int macsec_upd_secy_hw_address(struct macsec_context *ctx,
if (!rx_sa || !rx_sa->macsec_rule)
continue;
- mlx5e_macsec_cleanup_sa(macsec, rx_sa, false, ctx->secy->netdev,
- rx_sc->sc_xarray_element->fs_id);
+ mlx5e_macsec_cleanup_sa_fs(macsec, rx_sa, false, ctx->secy->netdev,
+ rx_sc->sc_xarray_element->fs_id);
}
}
@@ -1124,8 +1144,8 @@ static int macsec_upd_secy_hw_address(struct macsec_context *ctx,
continue;
if (rx_sa->active) {
- err = mlx5e_macsec_init_sa(ctx, rx_sa, true, false,
- &rx_sc->sc_xarray_element->fs_id);
+ err = mlx5e_macsec_init_sa_fs(ctx, rx_sa, true, false,
+ &rx_sc->sc_xarray_element->fs_id);
if (err)
goto out;
}
@@ -1178,7 +1198,7 @@ static int mlx5e_macsec_upd_secy(struct macsec_context *ctx)
if (!tx_sa)
continue;
- mlx5e_macsec_cleanup_sa(macsec, tx_sa, true, ctx->secy->netdev, 0);
+ mlx5e_macsec_cleanup_sa_fs(macsec, tx_sa, true, ctx->secy->netdev, 0);
}
for (i = 0; i < MACSEC_NUM_AN; ++i) {
@@ -1187,7 +1207,7 @@ static int mlx5e_macsec_upd_secy(struct macsec_context *ctx)
continue;
if (tx_sa->assoc_num == tx_sc->encoding_sa && tx_sa->active) {
- err = mlx5e_macsec_init_sa(ctx, tx_sa, tx_sc->encrypt, true, NULL);
+ err = mlx5e_macsec_init_sa_fs(ctx, tx_sa, tx_sc->encrypt, true, NULL);
if (err)
goto out;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 5c166d9..2fa076b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -401,6 +401,8 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb,
mlx5e_skb_cb_hwtstamp_init(skb);
mlx5e_ptp_metadata_map_put(&sq->ptpsq->metadata_map, skb,
metadata_index);
+ /* ensure skb is put on metadata_map before tracking the index */
+ wmb();
mlx5e_ptpsq_track_metadata(sq->ptpsq, metadata_index);
if (!netif_tx_queue_stopped(sq->txq) &&
mlx5e_ptpsq_metadata_freelist_empty(sq->ptpsq)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
index 190f10a..5a0047b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
@@ -152,7 +152,7 @@ void mlx5_esw_ipsec_restore_dest_uplink(struct mlx5_core_dev *mdev)
xa_for_each(&esw->offloads.vport_reps, i, rep) {
rpriv = rep->rep_data[REP_ETH].priv;
- if (!rpriv || !rpriv->netdev || !atomic_read(&rpriv->tc_ht.nelems))
+ if (!rpriv || !rpriv->netdev)
continue;
rhashtable_walk_enter(&rpriv->tc_ht, &iter);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index b045513..baaae62 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -535,21 +535,26 @@ esw_src_port_rewrite_supported(struct mlx5_eswitch *esw)
}
static bool
-esw_dests_to_vf_pf_vports(struct mlx5_flow_destination *dests, int max_dest)
+esw_dests_to_int_external(struct mlx5_flow_destination *dests, int max_dest)
{
- bool vf_dest = false, pf_dest = false;
+ bool internal_dest = false, external_dest = false;
int i;
for (i = 0; i < max_dest; i++) {
- if (dests[i].type != MLX5_FLOW_DESTINATION_TYPE_VPORT)
+ if (dests[i].type != MLX5_FLOW_DESTINATION_TYPE_VPORT &&
+ dests[i].type != MLX5_FLOW_DESTINATION_TYPE_UPLINK)
continue;
- if (dests[i].vport.num == MLX5_VPORT_UPLINK)
- pf_dest = true;
+ /* Uplink dest is external, but considered as internal
+ * if there is reformat because firmware uses LB+hairpin to support it.
+ */
+ if (dests[i].vport.num == MLX5_VPORT_UPLINK &&
+ !(dests[i].vport.flags & MLX5_FLOW_DEST_VPORT_REFORMAT_ID))
+ external_dest = true;
else
- vf_dest = true;
+ internal_dest = true;
- if (vf_dest && pf_dest)
+ if (internal_dest && external_dest)
return true;
}
@@ -695,9 +700,9 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
/* Header rewrite with combined wire+loopback in FDB is not allowed */
if ((flow_act.action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) &&
- esw_dests_to_vf_pf_vports(dest, i)) {
+ esw_dests_to_int_external(dest, i)) {
esw_warn(esw->dev,
- "FDB: Header rewrite with forwarding to both PF and VF is not allowed\n");
+ "FDB: Header rewrite with forwarding to both internal and external dests is not allowed\n");
rule = ERR_PTR(-EINVAL);
goto err_esw_get;
}
@@ -3658,22 +3663,6 @@ static int esw_inline_mode_to_devlink(u8 mlx5_mode, u8 *mode)
return 0;
}
-static bool esw_offloads_devlink_ns_eq_netdev_ns(struct devlink *devlink)
-{
- struct mlx5_core_dev *dev = devlink_priv(devlink);
- struct net *devl_net, *netdev_net;
- bool ret = false;
-
- mutex_lock(&dev->mlx5e_res.uplink_netdev_lock);
- if (dev->mlx5e_res.uplink_netdev) {
- netdev_net = dev_net(dev->mlx5e_res.uplink_netdev);
- devl_net = devlink_net(devlink);
- ret = net_eq(devl_net, netdev_net);
- }
- mutex_unlock(&dev->mlx5e_res.uplink_netdev_lock);
- return ret;
-}
-
int mlx5_eswitch_block_mode(struct mlx5_core_dev *dev)
{
struct mlx5_eswitch *esw = dev->priv.eswitch;
@@ -3718,13 +3707,6 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
if (esw_mode_from_devlink(mode, &mlx5_mode))
return -EINVAL;
- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV &&
- !esw_offloads_devlink_ns_eq_netdev_ns(devlink)) {
- NL_SET_ERR_MSG_MOD(extack,
- "Can't change E-Switch mode to switchdev when netdev net namespace has diverged from the devlink's.");
- return -EPERM;
- }
-
mlx5_lag_disable_change(esw->dev);
err = mlx5_esw_try_lock(esw);
if (err < 0) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index f27eab6..2911aa3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -703,19 +703,30 @@ void mlx5_fw_reset_events_start(struct mlx5_core_dev *dev)
{
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+ if (!fw_reset)
+ return;
+
MLX5_NB_INIT(&fw_reset->nb, fw_reset_event_notifier, GENERAL_EVENT);
mlx5_eq_notifier_register(dev, &fw_reset->nb);
}
void mlx5_fw_reset_events_stop(struct mlx5_core_dev *dev)
{
- mlx5_eq_notifier_unregister(dev, &dev->priv.fw_reset->nb);
+ struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+ if (!fw_reset)
+ return;
+
+ mlx5_eq_notifier_unregister(dev, &fw_reset->nb);
}
void mlx5_drain_fw_reset(struct mlx5_core_dev *dev)
{
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+ if (!fw_reset)
+ return;
+
set_bit(MLX5_FW_RESET_FLAGS_DROP_NEW_REQUESTS, &fw_reset->reset_flags);
cancel_work_sync(&fw_reset->fw_live_patch_work);
cancel_work_sync(&fw_reset->reset_request_work);
@@ -733,9 +744,13 @@ static const struct devlink_param mlx5_fw_reset_devlink_params[] = {
int mlx5_fw_reset_init(struct mlx5_core_dev *dev)
{
- struct mlx5_fw_reset *fw_reset = kzalloc(sizeof(*fw_reset), GFP_KERNEL);
+ struct mlx5_fw_reset *fw_reset;
int err;
+ if (!MLX5_CAP_MCAM_REG(dev, mfrl))
+ return 0;
+
+ fw_reset = kzalloc(sizeof(*fw_reset), GFP_KERNEL);
if (!fw_reset)
return -ENOMEM;
fw_reset->wq = create_singlethread_workqueue("mlx5_fw_reset_events");
@@ -771,6 +786,9 @@ void mlx5_fw_reset_cleanup(struct mlx5_core_dev *dev)
{
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+ if (!fw_reset)
+ return;
+
devl_params_unregister(priv_to_devlink(dev),
mlx5_fw_reset_devlink_params,
ARRAY_SIZE(mlx5_fw_reset_devlink_params));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 8ff6dc9..b5c709b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -452,10 +452,10 @@ mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
struct health_buffer __iomem *h = health->health;
u8 synd = ioread8(&h->synd);
+ devlink_fmsg_u8_pair_put(fmsg, "Syndrome", synd);
if (!synd)
return 0;
- devlink_fmsg_u8_pair_put(fmsg, "Syndrome", synd);
devlink_fmsg_string_pair_put(fmsg, "Description", hsynd_str(synd));
return 0;
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
index 41fa252..5f2cd9a 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
@@ -37,19 +37,24 @@ static void lan966x_lag_set_aggr_pgids(struct lan966x *lan966x)
/* Now, set PGIDs for each active LAG */
for (lag = 0; lag < lan966x->num_phys_ports; ++lag) {
- struct net_device *bond = lan966x->ports[lag]->bond;
+ struct lan966x_port *port = lan966x->ports[lag];
int num_active_ports = 0;
+ struct net_device *bond;
unsigned long bond_mask;
u8 aggr_idx[16];
- if (!bond || (visited & BIT(lag)))
+ if (!port || !port->bond || (visited & BIT(lag)))
continue;
+ bond = port->bond;
bond_mask = lan966x_lag_get_mask(lan966x, bond);
for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) {
struct lan966x_port *port = lan966x->ports[p];
+ if (!port)
+ continue;
+
lan_wr(ANA_PGID_PGID_SET(bond_mask),
lan966x, ANA_PGID(p));
if (port->lag_tx_active)
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
index 4af2859..75868b3 100644
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
@@ -347,10 +347,10 @@ int sparx5_del_mact_entry(struct sparx5 *sparx5,
list) {
if ((vid == 0 || mact_entry->vid == vid) &&
ether_addr_equal(addr, mact_entry->mac)) {
+ sparx5_mact_forget(sparx5, addr, mact_entry->vid);
+
list_del(&mact_entry->list);
devm_kfree(sparx5->dev, mact_entry);
-
- sparx5_mact_forget(sparx5, addr, mact_entry->vid);
}
}
mutex_unlock(&sparx5->mact_lock);
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c
index d1f7fc8..3c066b6 100644
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c
@@ -757,6 +757,7 @@ static int mchp_sparx5_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, sparx5);
sparx5->pdev = pdev;
sparx5->dev = &pdev->dev;
+ spin_lock_init(&sparx5->tx_lock);
/* Do switch core reset if available */
reset = devm_reset_control_get_optional_shared(&pdev->dev, "switch");
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_main.h b/drivers/net/ethernet/microchip/sparx5/sparx5_main.h
index 6f565c0..316fed5 100644
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_main.h
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_main.h
@@ -280,6 +280,7 @@ struct sparx5 {
int xtr_irq;
/* Frame DMA */
int fdma_irq;
+ spinlock_t tx_lock; /* lock for frame transmission */
struct sparx5_rx rx;
struct sparx5_tx tx;
/* PTP */
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_packet.c b/drivers/net/ethernet/microchip/sparx5/sparx5_packet.c
index 6db6ac6..ac7e1cf 100644
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_packet.c
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_packet.c
@@ -244,10 +244,12 @@ netdev_tx_t sparx5_port_xmit_impl(struct sk_buff *skb, struct net_device *dev)
}
skb_tx_timestamp(skb);
+ spin_lock(&sparx5->tx_lock);
if (sparx5->fdma_irq > 0)
ret = sparx5_fdma_xmit(sparx5, ifh, skb);
else
ret = sparx5_inject(sparx5, ifh, skb, dev);
+ spin_unlock(&sparx5->tx_lock);
if (ret == -EBUSY)
goto busy;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index e522845..0d7d138 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -1084,7 +1084,7 @@ nfp_tunnel_add_shared_mac(struct nfp_app *app, struct net_device *netdev,
u16 nfp_mac_idx = 0;
entry = nfp_tunnel_lookup_offloaded_macs(app, netdev->dev_addr);
- if (entry && nfp_tunnel_is_mac_idx_global(entry->index)) {
+ if (entry && (nfp_tunnel_is_mac_idx_global(entry->index) || netif_is_lag_port(netdev))) {
if (entry->bridge_count ||
!nfp_flower_is_supported_bridge(netdev)) {
nfp_tunnel_offloaded_macs_inc_ref_and_link(entry,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 3b3210d..f28e769 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2776,6 +2776,7 @@ static void nfp_net_netdev_init(struct nfp_net *nn)
case NFP_NFD_VER_NFD3:
netdev->netdev_ops = &nfp_nfd3_netdev_ops;
netdev->xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY;
+ netdev->xdp_features |= NETDEV_XDP_ACT_REDIRECT;
break;
case NFP_NFD_VER_NFDK:
netdev->netdev_ops = &nfp_nfdk_netdev_ops;
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index 33b4c28..3f10c53 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -537,11 +537,13 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 interface)
const u32 barcfg_msix_general =
NFP_PCIE_BAR_PCIE2CPP_MapType(
NFP_PCIE_BAR_PCIE2CPP_MapType_GENERAL) |
- NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT;
+ NFP_PCIE_BAR_PCIE2CPP_LengthSelect(
+ NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT);
const u32 barcfg_msix_xpb =
NFP_PCIE_BAR_PCIE2CPP_MapType(
NFP_PCIE_BAR_PCIE2CPP_MapType_BULK) |
- NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT |
+ NFP_PCIE_BAR_PCIE2CPP_LengthSelect(
+ NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT) |
NFP_PCIE_BAR_PCIE2CPP_Target_BaseAddress(
NFP_CPP_TARGET_ISLAND_XPB);
const u32 barcfg_explicit[4] = {
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c b/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
index c49aa35..6ba8d4ac 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
@@ -93,6 +93,7 @@ static void ionic_unmap_bars(struct ionic *ionic)
bars[i].len = 0;
}
}
+ ionic->num_bars = 0;
}
void __iomem *ionic_bus_map_dbpage(struct ionic *ionic, int page_num)
@@ -215,15 +216,17 @@ static int ionic_sriov_configure(struct pci_dev *pdev, int num_vfs)
static void ionic_clear_pci(struct ionic *ionic)
{
- ionic->idev.dev_info_regs = NULL;
- ionic->idev.dev_cmd_regs = NULL;
- ionic->idev.intr_status = NULL;
- ionic->idev.intr_ctrl = NULL;
+ if (ionic->num_bars) {
+ ionic->idev.dev_info_regs = NULL;
+ ionic->idev.dev_cmd_regs = NULL;
+ ionic->idev.intr_status = NULL;
+ ionic->idev.intr_ctrl = NULL;
- ionic_unmap_bars(ionic);
- pci_release_regions(ionic->pdev);
+ ionic_unmap_bars(ionic);
+ pci_release_regions(ionic->pdev);
+ }
- if (atomic_read(&ionic->pdev->enable_cnt) > 0)
+ if (pci_is_enabled(ionic->pdev))
pci_disable_device(ionic->pdev);
}
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_dev.c b/drivers/net/ethernet/pensando/ionic/ionic_dev.c
index 1e7c71f..746072b 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_dev.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_dev.c
@@ -319,22 +319,32 @@ int ionic_heartbeat_check(struct ionic *ionic)
u8 ionic_dev_cmd_status(struct ionic_dev *idev)
{
+ if (!idev->dev_cmd_regs)
+ return (u8)PCI_ERROR_RESPONSE;
return ioread8(&idev->dev_cmd_regs->comp.comp.status);
}
bool ionic_dev_cmd_done(struct ionic_dev *idev)
{
+ if (!idev->dev_cmd_regs)
+ return false;
return ioread32(&idev->dev_cmd_regs->done) & IONIC_DEV_CMD_DONE;
}
void ionic_dev_cmd_comp(struct ionic_dev *idev, union ionic_dev_cmd_comp *comp)
{
+ if (!idev->dev_cmd_regs)
+ return;
memcpy_fromio(comp, &idev->dev_cmd_regs->comp, sizeof(*comp));
}
void ionic_dev_cmd_go(struct ionic_dev *idev, union ionic_dev_cmd *cmd)
{
idev->opcode = cmd->cmd.opcode;
+
+ if (!idev->dev_cmd_regs)
+ return;
+
memcpy_toio(&idev->dev_cmd_regs->cmd, cmd, sizeof(*cmd));
iowrite32(0, &idev->dev_cmd_regs->done);
iowrite32(1, &idev->dev_cmd_regs->doorbell);
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
index cd3c0b0..0ffc9c4 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
@@ -90,18 +90,23 @@ static void ionic_get_regs(struct net_device *netdev, struct ethtool_regs *regs,
void *p)
{
struct ionic_lif *lif = netdev_priv(netdev);
+ struct ionic_dev *idev;
unsigned int offset;
unsigned int size;
regs->version = IONIC_DEV_CMD_REG_VERSION;
+ idev = &lif->ionic->idev;
+ if (!idev->dev_info_regs)
+ return;
+
offset = 0;
size = IONIC_DEV_INFO_REG_COUNT * sizeof(u32);
memcpy_fromio(p + offset, lif->ionic->idev.dev_info_regs->words, size);
offset += size;
size = IONIC_DEV_CMD_REG_COUNT * sizeof(u32);
- memcpy_fromio(p + offset, lif->ionic->idev.dev_cmd_regs->words, size);
+ memcpy_fromio(p + offset, idev->dev_cmd_regs->words, size);
}
static void ionic_get_link_ext_stats(struct net_device *netdev,
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_fw.c b/drivers/net/ethernet/pensando/ionic/ionic_fw.c
index 5f40324..3c209c1 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_fw.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_fw.c
@@ -109,6 +109,11 @@ int ionic_firmware_update(struct ionic_lif *lif, const struct firmware *fw,
dl = priv_to_devlink(ionic);
devlink_flash_update_status_notify(dl, "Preparing to flash", NULL, 0, 0);
+ if (!idev->dev_cmd_regs) {
+ err = -ENXIO;
+ goto err_out;
+ }
+
buf_sz = sizeof(idev->dev_cmd_regs->data);
netdev_dbg(netdev,
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
index cf2d5ad..fcb44ce 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c
@@ -3559,7 +3559,10 @@ int ionic_lif_init(struct ionic_lif *lif)
goto err_out_notifyq_deinit;
}
- err = ionic_init_nic_features(lif);
+ if (test_bit(IONIC_LIF_F_FW_RESET, lif->state))
+ err = ionic_set_nic_features(lif, lif->netdev->features);
+ else
+ err = ionic_init_nic_features(lif);
if (err)
goto err_out_notifyq_deinit;
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_main.c b/drivers/net/ethernet/pensando/ionic/ionic_main.c
index 165ab08..2f479de 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_main.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_main.c
@@ -416,6 +416,9 @@ static void ionic_dev_cmd_clean(struct ionic *ionic)
{
struct ionic_dev *idev = &ionic->idev;
+ if (!idev->dev_cmd_regs)
+ return;
+
iowrite32(0, &idev->dev_cmd_regs->doorbell);
memset_io(&idev->dev_cmd_regs->cmd, 0, sizeof(idev->dev_cmd_regs->cmd));
}
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
index 54cd96b..6f47767 100644
--- a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
+++ b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c
@@ -579,6 +579,9 @@ int ionic_tx_napi(struct napi_struct *napi, int budget)
work_done = ionic_cq_service(cq, budget,
ionic_tx_service, NULL, NULL);
+ if (unlikely(!budget))
+ return budget;
+
if (work_done < budget && napi_complete_done(napi, work_done)) {
ionic_dim_update(qcq, IONIC_LIF_F_TX_DIM_INTR);
flags |= IONIC_INTR_CRED_UNMASK;
@@ -607,6 +610,9 @@ int ionic_rx_napi(struct napi_struct *napi, int budget)
u32 work_done = 0;
u32 flags = 0;
+ if (unlikely(!budget))
+ return budget;
+
lif = cq->bound_q->lif;
idev = &lif->ionic->idev;
@@ -656,6 +662,9 @@ int ionic_txrx_napi(struct napi_struct *napi, int budget)
tx_work_done = ionic_cq_service(txcq, IONIC_TX_BUDGET_DEFAULT,
ionic_tx_service, NULL, NULL);
+ if (unlikely(!budget))
+ return budget;
+
rx_work_done = ionic_cq_service(rxcq, budget,
ionic_rx_service, NULL, NULL);
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 0e3731f..f7566cf 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -772,29 +772,25 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
struct ravb_rx_desc *desc;
struct sk_buff *skb;
dma_addr_t dma_addr;
+ int rx_packets = 0;
u8 desc_status;
- int boguscnt;
u16 pkt_len;
u8 die_dt;
int entry;
int limit;
+ int i;
entry = priv->cur_rx[q] % priv->num_rx_ring[q];
- boguscnt = priv->dirty_rx[q] + priv->num_rx_ring[q] - priv->cur_rx[q];
+ limit = priv->dirty_rx[q] + priv->num_rx_ring[q] - priv->cur_rx[q];
stats = &priv->stats[q];
- boguscnt = min(boguscnt, *quota);
- limit = boguscnt;
desc = &priv->gbeth_rx_ring[entry];
- while (desc->die_dt != DT_FEMPTY) {
+ for (i = 0; i < limit && rx_packets < *quota && desc->die_dt != DT_FEMPTY; i++) {
/* Descriptor type must be checked before all other reads */
dma_rmb();
desc_status = desc->msc;
pkt_len = le16_to_cpu(desc->ds_cc) & RX_DS;
- if (--boguscnt < 0)
- break;
-
/* We use 0-byte descriptors to mark the DMA mapping errors */
if (!pkt_len)
continue;
@@ -820,7 +816,7 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
skb_put(skb, pkt_len);
skb->protocol = eth_type_trans(skb, ndev);
napi_gro_receive(&priv->napi[q], skb);
- stats->rx_packets++;
+ rx_packets++;
stats->rx_bytes += pkt_len;
break;
case DT_FSTART:
@@ -848,7 +844,7 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
eth_type_trans(priv->rx_1st_skb, ndev);
napi_gro_receive(&priv->napi[q],
priv->rx_1st_skb);
- stats->rx_packets++;
+ rx_packets++;
stats->rx_bytes += pkt_len;
break;
}
@@ -887,9 +883,9 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
desc->die_dt = DT_FEMPTY;
}
- *quota -= limit - (++boguscnt);
-
- return boguscnt <= 0;
+ stats->rx_packets += rx_packets;
+ *quota -= rx_packets;
+ return *quota == 0;
}
/* Packet receive function for Ethernet AVB */
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 721c1f8..5ba606a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -59,28 +59,51 @@
#undef FRAME_FILTER_DEBUG
/* #define FRAME_FILTER_DEBUG */
+struct stmmac_q_tx_stats {
+ u64_stats_t tx_bytes;
+ u64_stats_t tx_set_ic_bit;
+ u64_stats_t tx_tso_frames;
+ u64_stats_t tx_tso_nfrags;
+};
+
+struct stmmac_napi_tx_stats {
+ u64_stats_t tx_packets;
+ u64_stats_t tx_pkt_n;
+ u64_stats_t poll;
+ u64_stats_t tx_clean;
+ u64_stats_t tx_set_ic_bit;
+};
+
struct stmmac_txq_stats {
- u64 tx_bytes;
- u64 tx_packets;
- u64 tx_pkt_n;
- u64 tx_normal_irq_n;
- u64 napi_poll;
- u64 tx_clean;
- u64 tx_set_ic_bit;
- u64 tx_tso_frames;
- u64 tx_tso_nfrags;
- struct u64_stats_sync syncp;
+ /* Updates protected by tx queue lock. */
+ struct u64_stats_sync q_syncp;
+ struct stmmac_q_tx_stats q;
+
+ /* Updates protected by NAPI poll logic. */
+ struct u64_stats_sync napi_syncp;
+ struct stmmac_napi_tx_stats napi;
} ____cacheline_aligned_in_smp;
+struct stmmac_napi_rx_stats {
+ u64_stats_t rx_bytes;
+ u64_stats_t rx_packets;
+ u64_stats_t rx_pkt_n;
+ u64_stats_t poll;
+};
+
struct stmmac_rxq_stats {
- u64 rx_bytes;
- u64 rx_packets;
- u64 rx_pkt_n;
- u64 rx_normal_irq_n;
- u64 napi_poll;
- struct u64_stats_sync syncp;
+ /* Updates protected by NAPI poll logic. */
+ struct u64_stats_sync napi_syncp;
+ struct stmmac_napi_rx_stats napi;
} ____cacheline_aligned_in_smp;
+/* Updates on each CPU protected by not allowing nested irqs. */
+struct stmmac_pcpu_stats {
+ struct u64_stats_sync syncp;
+ u64_stats_t rx_normal_irq_n[MTL_MAX_TX_QUEUES];
+ u64_stats_t tx_normal_irq_n[MTL_MAX_RX_QUEUES];
+};
+
/* Extra statistic and debug information exposed by ethtool */
struct stmmac_extra_stats {
/* Transmit errors */
@@ -205,6 +228,7 @@ struct stmmac_extra_stats {
/* per queue statistics */
struct stmmac_txq_stats txq_stats[MTL_MAX_TX_QUEUES];
struct stmmac_rxq_stats rxq_stats[MTL_MAX_RX_QUEUES];
+ struct stmmac_pcpu_stats __percpu *pcpu_stats;
unsigned long rx_dropped;
unsigned long rx_errors;
unsigned long tx_dropped;
@@ -216,6 +240,7 @@ struct stmmac_safety_stats {
unsigned long mac_errors[32];
unsigned long mtl_errors[32];
unsigned long dma_errors[32];
+ unsigned long dma_dpp_errors[32];
};
/* Number of fields in Safety Stats */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index 137741b..b21d99f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -441,8 +441,7 @@ static int sun8i_dwmac_dma_interrupt(struct stmmac_priv *priv,
struct stmmac_extra_stats *x, u32 chan,
u32 dir)
{
- struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan];
- struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan];
+ struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats);
int ret = 0;
u32 v;
@@ -455,9 +454,9 @@ static int sun8i_dwmac_dma_interrupt(struct stmmac_priv *priv,
if (v & EMAC_TX_INT) {
ret |= handle_tx;
- u64_stats_update_begin(&txq_stats->syncp);
- txq_stats->tx_normal_irq_n++;
- u64_stats_update_end(&txq_stats->syncp);
+ u64_stats_update_begin(&stats->syncp);
+ u64_stats_inc(&stats->tx_normal_irq_n[chan]);
+ u64_stats_update_end(&stats->syncp);
}
if (v & EMAC_TX_DMA_STOP_INT)
@@ -479,9 +478,9 @@ static int sun8i_dwmac_dma_interrupt(struct stmmac_priv *priv,
if (v & EMAC_RX_INT) {
ret |= handle_rx;
- u64_stats_update_begin(&rxq_stats->syncp);
- rxq_stats->rx_normal_irq_n++;
- u64_stats_update_end(&rxq_stats->syncp);
+ u64_stats_update_begin(&stats->syncp);
+ u64_stats_inc(&stats->rx_normal_irq_n[chan]);
+ u64_stats_update_end(&stats->syncp);
}
if (v & EMAC_RX_BUF_UA_INT)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
index 9470d3f..0d185e5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
@@ -171,8 +171,7 @@ int dwmac4_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
const struct dwmac4_addrs *dwmac4_addrs = priv->plat->dwmac4_addrs;
u32 intr_status = readl(ioaddr + DMA_CHAN_STATUS(dwmac4_addrs, chan));
u32 intr_en = readl(ioaddr + DMA_CHAN_INTR_ENA(dwmac4_addrs, chan));
- struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan];
- struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan];
+ struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats);
int ret = 0;
if (dir == DMA_DIR_RX)
@@ -201,15 +200,15 @@ int dwmac4_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
}
/* TX/RX NORMAL interrupts */
if (likely(intr_status & DMA_CHAN_STATUS_RI)) {
- u64_stats_update_begin(&rxq_stats->syncp);
- rxq_stats->rx_normal_irq_n++;
- u64_stats_update_end(&rxq_stats->syncp);
+ u64_stats_update_begin(&stats->syncp);
+ u64_stats_inc(&stats->rx_normal_irq_n[chan]);
+ u64_stats_update_end(&stats->syncp);
ret |= handle_rx;
}
if (likely(intr_status & DMA_CHAN_STATUS_TI)) {
- u64_stats_update_begin(&txq_stats->syncp);
- txq_stats->tx_normal_irq_n++;
- u64_stats_update_end(&txq_stats->syncp);
+ u64_stats_update_begin(&stats->syncp);
+ u64_stats_inc(&stats->tx_normal_irq_n[chan]);
+ u64_stats_update_end(&stats->syncp);
ret |= handle_tx;
}
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
index 7907d62..85e18f9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
@@ -162,8 +162,7 @@ static void show_rx_process_state(unsigned int status)
int dwmac_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
struct stmmac_extra_stats *x, u32 chan, u32 dir)
{
- struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan];
- struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan];
+ struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats);
int ret = 0;
/* read the status register (CSR5) */
u32 intr_status = readl(ioaddr + DMA_STATUS);
@@ -215,16 +214,16 @@ int dwmac_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
u32 value = readl(ioaddr + DMA_INTR_ENA);
/* to schedule NAPI on real RIE event. */
if (likely(value & DMA_INTR_ENA_RIE)) {
- u64_stats_update_begin(&rxq_stats->syncp);
- rxq_stats->rx_normal_irq_n++;
- u64_stats_update_end(&rxq_stats->syncp);
+ u64_stats_update_begin(&stats->syncp);
+ u64_stats_inc(&stats->rx_normal_irq_n[chan]);
+ u64_stats_update_end(&stats->syncp);
ret |= handle_rx;
}
}
if (likely(intr_status & DMA_STATUS_TI)) {
- u64_stats_update_begin(&txq_stats->syncp);
- txq_stats->tx_normal_irq_n++;
- u64_stats_update_end(&txq_stats->syncp);
+ u64_stats_update_begin(&stats->syncp);
+ u64_stats_inc(&stats->tx_normal_irq_n[chan]);
+ u64_stats_update_end(&stats->syncp);
ret |= handle_tx;
}
if (unlikely(intr_status & DMA_STATUS_ERI))
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
index 207ff17..6a2c7d2 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
@@ -303,6 +303,8 @@
#define XGMAC_RXCEIE BIT(4)
#define XGMAC_TXCEIE BIT(0)
#define XGMAC_MTL_ECC_INT_STATUS 0x000010cc
+#define XGMAC_MTL_DPP_CONTROL 0x000010e0
+#define XGMAC_DPP_DISABLE BIT(0)
#define XGMAC_MTL_TXQ_OPMODE(x) (0x00001100 + (0x80 * (x)))
#define XGMAC_TQS GENMASK(25, 16)
#define XGMAC_TQS_SHIFT 16
@@ -385,6 +387,7 @@
#define XGMAC_DCEIE BIT(1)
#define XGMAC_TCEIE BIT(0)
#define XGMAC_DMA_ECC_INT_STATUS 0x0000306c
+#define XGMAC_DMA_DPP_INT_STATUS 0x00003074
#define XGMAC_DMA_CH_CONTROL(x) (0x00003100 + (0x80 * (x)))
#define XGMAC_SPH BIT(24)
#define XGMAC_PBLx8 BIT(16)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
index eb48211..1af2f89 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
@@ -830,6 +830,44 @@ static const struct dwxgmac3_error_desc dwxgmac3_dma_errors[32]= {
{ false, "UNKNOWN", "Unknown Error" }, /* 31 */
};
+#define DPP_RX_ERR "Read Rx Descriptor Parity checker Error"
+#define DPP_TX_ERR "Read Tx Descriptor Parity checker Error"
+
+static const struct dwxgmac3_error_desc dwxgmac3_dma_dpp_errors[32] = {
+ { true, "TDPES0", DPP_TX_ERR },
+ { true, "TDPES1", DPP_TX_ERR },
+ { true, "TDPES2", DPP_TX_ERR },
+ { true, "TDPES3", DPP_TX_ERR },
+ { true, "TDPES4", DPP_TX_ERR },
+ { true, "TDPES5", DPP_TX_ERR },
+ { true, "TDPES6", DPP_TX_ERR },
+ { true, "TDPES7", DPP_TX_ERR },
+ { true, "TDPES8", DPP_TX_ERR },
+ { true, "TDPES9", DPP_TX_ERR },
+ { true, "TDPES10", DPP_TX_ERR },
+ { true, "TDPES11", DPP_TX_ERR },
+ { true, "TDPES12", DPP_TX_ERR },
+ { true, "TDPES13", DPP_TX_ERR },
+ { true, "TDPES14", DPP_TX_ERR },
+ { true, "TDPES15", DPP_TX_ERR },
+ { true, "RDPES0", DPP_RX_ERR },
+ { true, "RDPES1", DPP_RX_ERR },
+ { true, "RDPES2", DPP_RX_ERR },
+ { true, "RDPES3", DPP_RX_ERR },
+ { true, "RDPES4", DPP_RX_ERR },
+ { true, "RDPES5", DPP_RX_ERR },
+ { true, "RDPES6", DPP_RX_ERR },
+ { true, "RDPES7", DPP_RX_ERR },
+ { true, "RDPES8", DPP_RX_ERR },
+ { true, "RDPES9", DPP_RX_ERR },
+ { true, "RDPES10", DPP_RX_ERR },
+ { true, "RDPES11", DPP_RX_ERR },
+ { true, "RDPES12", DPP_RX_ERR },
+ { true, "RDPES13", DPP_RX_ERR },
+ { true, "RDPES14", DPP_RX_ERR },
+ { true, "RDPES15", DPP_RX_ERR },
+};
+
static void dwxgmac3_handle_dma_err(struct net_device *ndev,
void __iomem *ioaddr, bool correctable,
struct stmmac_safety_stats *stats)
@@ -841,6 +879,13 @@ static void dwxgmac3_handle_dma_err(struct net_device *ndev,
dwxgmac3_log_error(ndev, value, correctable, "DMA",
dwxgmac3_dma_errors, STAT_OFF(dma_errors), stats);
+
+ value = readl(ioaddr + XGMAC_DMA_DPP_INT_STATUS);
+ writel(value, ioaddr + XGMAC_DMA_DPP_INT_STATUS);
+
+ dwxgmac3_log_error(ndev, value, false, "DMA_DPP",
+ dwxgmac3_dma_dpp_errors,
+ STAT_OFF(dma_dpp_errors), stats);
}
static int
@@ -881,6 +926,12 @@ dwxgmac3_safety_feat_config(void __iomem *ioaddr, unsigned int asp,
value |= XGMAC_TMOUTEN; /* FSM Timeout Feature */
writel(value, ioaddr + XGMAC_MAC_FSM_CONTROL);
+ /* 5. Enable Data Path Parity Protection */
+ value = readl(ioaddr + XGMAC_MTL_DPP_CONTROL);
+ /* already enabled by default, explicit enable it again */
+ value &= ~XGMAC_DPP_DISABLE;
+ writel(value, ioaddr + XGMAC_MTL_DPP_CONTROL);
+
return 0;
}
@@ -914,7 +965,11 @@ static int dwxgmac3_safety_feat_irq_status(struct net_device *ndev,
ret |= !corr;
}
- err = dma & (XGMAC_DEUIS | XGMAC_DECIS);
+ /* DMA_DPP_Interrupt_Status is indicated by MCSIS bit in
+ * DMA_Safety_Interrupt_Status, so we handle DMA Data Path
+ * Parity Errors here
+ */
+ err = dma & (XGMAC_DEUIS | XGMAC_DECIS | XGMAC_MCSIS);
corr = dma & XGMAC_DECIS;
if (err) {
dwxgmac3_handle_dma_err(ndev, ioaddr, corr, stats);
@@ -930,6 +985,7 @@ static const struct dwxgmac3_error {
{ dwxgmac3_mac_errors },
{ dwxgmac3_mtl_errors },
{ dwxgmac3_dma_errors },
+ { dwxgmac3_dma_dpp_errors },
};
static int dwxgmac3_safety_feat_dump(struct stmmac_safety_stats *stats,
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
index 3cde695..dd2ab61 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
@@ -337,8 +337,7 @@ static int dwxgmac2_dma_interrupt(struct stmmac_priv *priv,
struct stmmac_extra_stats *x, u32 chan,
u32 dir)
{
- struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan];
- struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan];
+ struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats);
u32 intr_status = readl(ioaddr + XGMAC_DMA_CH_STATUS(chan));
u32 intr_en = readl(ioaddr + XGMAC_DMA_CH_INT_EN(chan));
int ret = 0;
@@ -367,15 +366,15 @@ static int dwxgmac2_dma_interrupt(struct stmmac_priv *priv,
/* TX/RX NORMAL interrupts */
if (likely(intr_status & XGMAC_NIS)) {
if (likely(intr_status & XGMAC_RI)) {
- u64_stats_update_begin(&rxq_stats->syncp);
- rxq_stats->rx_normal_irq_n++;
- u64_stats_update_end(&rxq_stats->syncp);
+ u64_stats_update_begin(&stats->syncp);
+ u64_stats_inc(&stats->rx_normal_irq_n[chan]);
+ u64_stats_update_end(&stats->syncp);
ret |= handle_rx;
}
if (likely(intr_status & (XGMAC_TI | XGMAC_TBU))) {
- u64_stats_update_begin(&txq_stats->syncp);
- txq_stats->tx_normal_irq_n++;
- u64_stats_update_end(&txq_stats->syncp);
+ u64_stats_update_begin(&stats->syncp);
+ u64_stats_inc(&stats->tx_normal_irq_n[chan]);
+ u64_stats_update_end(&stats->syncp);
ret |= handle_tx;
}
}
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 1bd34b2..2936710 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -224,7 +224,7 @@ static const struct stmmac_hwif_entry {
.regs = {
.ptp_off = PTP_GMAC4_OFFSET,
.mmc_off = MMC_GMAC4_OFFSET,
- .est_off = EST_XGMAC_OFFSET,
+ .est_off = EST_GMAC4_OFFSET,
},
.desc = &dwmac4_desc_ops,
.dma = &dwmac410_dma_ops,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 42d27b9..ec44bec 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -549,44 +549,79 @@ stmmac_set_pauseparam(struct net_device *netdev,
}
}
+static u64 stmmac_get_rx_normal_irq_n(struct stmmac_priv *priv, int q)
+{
+ u64 total;
+ int cpu;
+
+ total = 0;
+ for_each_possible_cpu(cpu) {
+ struct stmmac_pcpu_stats *pcpu;
+ unsigned int start;
+ u64 irq_n;
+
+ pcpu = per_cpu_ptr(priv->xstats.pcpu_stats, cpu);
+ do {
+ start = u64_stats_fetch_begin(&pcpu->syncp);
+ irq_n = u64_stats_read(&pcpu->rx_normal_irq_n[q]);
+ } while (u64_stats_fetch_retry(&pcpu->syncp, start));
+ total += irq_n;
+ }
+ return total;
+}
+
+static u64 stmmac_get_tx_normal_irq_n(struct stmmac_priv *priv, int q)
+{
+ u64 total;
+ int cpu;
+
+ total = 0;
+ for_each_possible_cpu(cpu) {
+ struct stmmac_pcpu_stats *pcpu;
+ unsigned int start;
+ u64 irq_n;
+
+ pcpu = per_cpu_ptr(priv->xstats.pcpu_stats, cpu);
+ do {
+ start = u64_stats_fetch_begin(&pcpu->syncp);
+ irq_n = u64_stats_read(&pcpu->tx_normal_irq_n[q]);
+ } while (u64_stats_fetch_retry(&pcpu->syncp, start));
+ total += irq_n;
+ }
+ return total;
+}
+
static void stmmac_get_per_qstats(struct stmmac_priv *priv, u64 *data)
{
u32 tx_cnt = priv->plat->tx_queues_to_use;
u32 rx_cnt = priv->plat->rx_queues_to_use;
unsigned int start;
- int q, stat;
- char *p;
+ int q;
for (q = 0; q < tx_cnt; q++) {
struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[q];
- struct stmmac_txq_stats snapshot;
+ u64 pkt_n;
do {
- start = u64_stats_fetch_begin(&txq_stats->syncp);
- snapshot = *txq_stats;
- } while (u64_stats_fetch_retry(&txq_stats->syncp, start));
+ start = u64_stats_fetch_begin(&txq_stats->napi_syncp);
+ pkt_n = u64_stats_read(&txq_stats->napi.tx_pkt_n);
+ } while (u64_stats_fetch_retry(&txq_stats->napi_syncp, start));
- p = (char *)&snapshot + offsetof(struct stmmac_txq_stats, tx_pkt_n);
- for (stat = 0; stat < STMMAC_TXQ_STATS; stat++) {
- *data++ = (*(u64 *)p);
- p += sizeof(u64);
- }
+ *data++ = pkt_n;
+ *data++ = stmmac_get_tx_normal_irq_n(priv, q);
}
for (q = 0; q < rx_cnt; q++) {
struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[q];
- struct stmmac_rxq_stats snapshot;
+ u64 pkt_n;
do {
- start = u64_stats_fetch_begin(&rxq_stats->syncp);
- snapshot = *rxq_stats;
- } while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
+ start = u64_stats_fetch_begin(&rxq_stats->napi_syncp);
+ pkt_n = u64_stats_read(&rxq_stats->napi.rx_pkt_n);
+ } while (u64_stats_fetch_retry(&rxq_stats->napi_syncp, start));
- p = (char *)&snapshot + offsetof(struct stmmac_rxq_stats, rx_pkt_n);
- for (stat = 0; stat < STMMAC_RXQ_STATS; stat++) {
- *data++ = (*(u64 *)p);
- p += sizeof(u64);
- }
+ *data++ = pkt_n;
+ *data++ = stmmac_get_rx_normal_irq_n(priv, q);
}
}
@@ -645,39 +680,49 @@ static void stmmac_get_ethtool_stats(struct net_device *dev,
pos = j;
for (i = 0; i < rx_queues_count; i++) {
struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[i];
- struct stmmac_rxq_stats snapshot;
+ struct stmmac_napi_rx_stats snapshot;
+ u64 n_irq;
j = pos;
do {
- start = u64_stats_fetch_begin(&rxq_stats->syncp);
- snapshot = *rxq_stats;
- } while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
+ start = u64_stats_fetch_begin(&rxq_stats->napi_syncp);
+ snapshot = rxq_stats->napi;
+ } while (u64_stats_fetch_retry(&rxq_stats->napi_syncp, start));
- data[j++] += snapshot.rx_pkt_n;
- data[j++] += snapshot.rx_normal_irq_n;
- normal_irq_n += snapshot.rx_normal_irq_n;
- napi_poll += snapshot.napi_poll;
+ data[j++] += u64_stats_read(&snapshot.rx_pkt_n);
+ n_irq = stmmac_get_rx_normal_irq_n(priv, i);
+ data[j++] += n_irq;
+ normal_irq_n += n_irq;
+ napi_poll += u64_stats_read(&snapshot.poll);
}
pos = j;
for (i = 0; i < tx_queues_count; i++) {
struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[i];
- struct stmmac_txq_stats snapshot;
+ struct stmmac_napi_tx_stats napi_snapshot;
+ struct stmmac_q_tx_stats q_snapshot;
+ u64 n_irq;
j = pos;
do {
- start = u64_stats_fetch_begin(&txq_stats->syncp);
- snapshot = *txq_stats;
- } while (u64_stats_fetch_retry(&txq_stats->syncp, start));
+ start = u64_stats_fetch_begin(&txq_stats->q_syncp);
+ q_snapshot = txq_stats->q;
+ } while (u64_stats_fetch_retry(&txq_stats->q_syncp, start));
+ do {
+ start = u64_stats_fetch_begin(&txq_stats->napi_syncp);
+ napi_snapshot = txq_stats->napi;
+ } while (u64_stats_fetch_retry(&txq_stats->napi_syncp, start));
- data[j++] += snapshot.tx_pkt_n;
- data[j++] += snapshot.tx_normal_irq_n;
- normal_irq_n += snapshot.tx_normal_irq_n;
- data[j++] += snapshot.tx_clean;
- data[j++] += snapshot.tx_set_ic_bit;
- data[j++] += snapshot.tx_tso_frames;
- data[j++] += snapshot.tx_tso_nfrags;
- napi_poll += snapshot.napi_poll;
+ data[j++] += u64_stats_read(&napi_snapshot.tx_pkt_n);
+ n_irq = stmmac_get_tx_normal_irq_n(priv, i);
+ data[j++] += n_irq;
+ normal_irq_n += n_irq;
+ data[j++] += u64_stats_read(&napi_snapshot.tx_clean);
+ data[j++] += u64_stats_read(&q_snapshot.tx_set_ic_bit) +
+ u64_stats_read(&napi_snapshot.tx_set_ic_bit);
+ data[j++] += u64_stats_read(&q_snapshot.tx_tso_frames);
+ data[j++] += u64_stats_read(&q_snapshot.tx_tso_nfrags);
+ napi_poll += u64_stats_read(&napi_snapshot.poll);
}
normal_irq_n += priv->xstats.rx_early_irq;
data[j++] = normal_irq_n;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 2551995..7c6aef0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2482,7 +2482,6 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
struct xdp_desc xdp_desc;
bool work_done = true;
u32 tx_set_ic_bit = 0;
- unsigned long flags;
/* Avoids TX time-out as we are sharing with slow path */
txq_trans_cond_update(nq);
@@ -2566,9 +2565,9 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
tx_q->cur_tx = STMMAC_GET_ENTRY(tx_q->cur_tx, priv->dma_conf.dma_tx_size);
entry = tx_q->cur_tx;
}
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
- txq_stats->tx_set_ic_bit += tx_set_ic_bit;
- u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+ u64_stats_update_begin(&txq_stats->napi_syncp);
+ u64_stats_add(&txq_stats->napi.tx_set_ic_bit, tx_set_ic_bit);
+ u64_stats_update_end(&txq_stats->napi_syncp);
if (tx_desc) {
stmmac_flush_tx_descriptors(priv, queue);
@@ -2616,7 +2615,6 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
unsigned int bytes_compl = 0, pkts_compl = 0;
unsigned int entry, xmits = 0, count = 0;
u32 tx_packets = 0, tx_errors = 0;
- unsigned long flags;
__netif_tx_lock_bh(netdev_get_tx_queue(priv->dev, queue));
@@ -2674,7 +2672,8 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
}
if (skb) {
stmmac_get_tx_hwtstamp(priv, p, skb);
- } else {
+ } else if (tx_q->xsk_pool &&
+ xp_tx_metadata_enabled(tx_q->xsk_pool)) {
struct stmmac_xsk_tx_complete tx_compl = {
.priv = priv,
.desc = p,
@@ -2782,11 +2781,11 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
if (tx_q->dirty_tx != tx_q->cur_tx)
*pending_packets = true;
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
- txq_stats->tx_packets += tx_packets;
- txq_stats->tx_pkt_n += tx_packets;
- txq_stats->tx_clean++;
- u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+ u64_stats_update_begin(&txq_stats->napi_syncp);
+ u64_stats_add(&txq_stats->napi.tx_packets, tx_packets);
+ u64_stats_add(&txq_stats->napi.tx_pkt_n, tx_packets);
+ u64_stats_inc(&txq_stats->napi.tx_clean);
+ u64_stats_update_end(&txq_stats->napi_syncp);
priv->xstats.tx_errors += tx_errors;
@@ -4007,8 +4006,10 @@ static void stmmac_fpe_stop_wq(struct stmmac_priv *priv)
{
set_bit(__FPE_REMOVING, &priv->fpe_task_state);
- if (priv->fpe_wq)
+ if (priv->fpe_wq) {
destroy_workqueue(priv->fpe_wq);
+ priv->fpe_wq = NULL;
+ }
netdev_info(priv->dev, "FPE workqueue stop");
}
@@ -4213,7 +4214,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
struct stmmac_tx_queue *tx_q;
bool has_vlan, set_ic;
u8 proto_hdr_len, hdr;
- unsigned long flags;
u32 pay_len, mss;
dma_addr_t des;
int i;
@@ -4378,13 +4378,13 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
netif_tx_stop_queue(netdev_get_tx_queue(priv->dev, queue));
}
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
- txq_stats->tx_bytes += skb->len;
- txq_stats->tx_tso_frames++;
- txq_stats->tx_tso_nfrags += nfrags;
+ u64_stats_update_begin(&txq_stats->q_syncp);
+ u64_stats_add(&txq_stats->q.tx_bytes, skb->len);
+ u64_stats_inc(&txq_stats->q.tx_tso_frames);
+ u64_stats_add(&txq_stats->q.tx_tso_nfrags, nfrags);
if (set_ic)
- txq_stats->tx_set_ic_bit++;
- u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+ u64_stats_inc(&txq_stats->q.tx_set_ic_bit);
+ u64_stats_update_end(&txq_stats->q_syncp);
if (priv->sarc_type)
stmmac_set_desc_sarc(priv, first, priv->sarc_type);
@@ -4483,7 +4483,6 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
struct stmmac_tx_queue *tx_q;
bool has_vlan, set_ic;
int entry, first_tx;
- unsigned long flags;
dma_addr_t des;
tx_q = &priv->dma_conf.tx_queue[queue];
@@ -4653,11 +4652,11 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
netif_tx_stop_queue(netdev_get_tx_queue(priv->dev, queue));
}
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
- txq_stats->tx_bytes += skb->len;
+ u64_stats_update_begin(&txq_stats->q_syncp);
+ u64_stats_add(&txq_stats->q.tx_bytes, skb->len);
if (set_ic)
- txq_stats->tx_set_ic_bit++;
- u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+ u64_stats_inc(&txq_stats->q.tx_set_ic_bit);
+ u64_stats_update_end(&txq_stats->q_syncp);
if (priv->sarc_type)
stmmac_set_desc_sarc(priv, first, priv->sarc_type);
@@ -4921,12 +4920,11 @@ static int stmmac_xdp_xmit_xdpf(struct stmmac_priv *priv, int queue,
set_ic = false;
if (set_ic) {
- unsigned long flags;
tx_q->tx_count_frames = 0;
stmmac_set_tx_ic(priv, tx_desc);
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
- txq_stats->tx_set_ic_bit++;
- u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+ u64_stats_update_begin(&txq_stats->q_syncp);
+ u64_stats_inc(&txq_stats->q.tx_set_ic_bit);
+ u64_stats_update_end(&txq_stats->q_syncp);
}
stmmac_enable_dma_transmission(priv, priv->ioaddr);
@@ -5076,7 +5074,6 @@ static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
unsigned int len = xdp->data_end - xdp->data;
enum pkt_hash_types hash_type;
int coe = priv->hw->rx_csum;
- unsigned long flags;
struct sk_buff *skb;
u32 hash;
@@ -5106,10 +5103,10 @@ static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
skb_record_rx_queue(skb, queue);
napi_gro_receive(&ch->rxtx_napi, skb);
- flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
- rxq_stats->rx_pkt_n++;
- rxq_stats->rx_bytes += len;
- u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+ u64_stats_update_begin(&rxq_stats->napi_syncp);
+ u64_stats_inc(&rxq_stats->napi.rx_pkt_n);
+ u64_stats_add(&rxq_stats->napi.rx_bytes, len);
+ u64_stats_update_end(&rxq_stats->napi_syncp);
}
static bool stmmac_rx_refill_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
@@ -5191,7 +5188,6 @@ static int stmmac_rx_zc(struct stmmac_priv *priv, int limit, u32 queue)
unsigned int desc_size;
struct bpf_prog *prog;
bool failure = false;
- unsigned long flags;
int xdp_status = 0;
int status = 0;
@@ -5346,9 +5342,9 @@ static int stmmac_rx_zc(struct stmmac_priv *priv, int limit, u32 queue)
stmmac_finalize_xdp_rx(priv, xdp_status);
- flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
- rxq_stats->rx_pkt_n += count;
- u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+ u64_stats_update_begin(&rxq_stats->napi_syncp);
+ u64_stats_add(&rxq_stats->napi.rx_pkt_n, count);
+ u64_stats_update_end(&rxq_stats->napi_syncp);
priv->xstats.rx_dropped += rx_dropped;
priv->xstats.rx_errors += rx_errors;
@@ -5386,7 +5382,6 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
unsigned int desc_size;
struct sk_buff *skb = NULL;
struct stmmac_xdp_buff ctx;
- unsigned long flags;
int xdp_status = 0;
int buf_sz;
@@ -5646,11 +5641,11 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
stmmac_rx_refill(priv, queue);
- flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
- rxq_stats->rx_packets += rx_packets;
- rxq_stats->rx_bytes += rx_bytes;
- rxq_stats->rx_pkt_n += count;
- u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+ u64_stats_update_begin(&rxq_stats->napi_syncp);
+ u64_stats_add(&rxq_stats->napi.rx_packets, rx_packets);
+ u64_stats_add(&rxq_stats->napi.rx_bytes, rx_bytes);
+ u64_stats_add(&rxq_stats->napi.rx_pkt_n, count);
+ u64_stats_update_end(&rxq_stats->napi_syncp);
priv->xstats.rx_dropped += rx_dropped;
priv->xstats.rx_errors += rx_errors;
@@ -5665,13 +5660,12 @@ static int stmmac_napi_poll_rx(struct napi_struct *napi, int budget)
struct stmmac_priv *priv = ch->priv_data;
struct stmmac_rxq_stats *rxq_stats;
u32 chan = ch->index;
- unsigned long flags;
int work_done;
rxq_stats = &priv->xstats.rxq_stats[chan];
- flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
- rxq_stats->napi_poll++;
- u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+ u64_stats_update_begin(&rxq_stats->napi_syncp);
+ u64_stats_inc(&rxq_stats->napi.poll);
+ u64_stats_update_end(&rxq_stats->napi_syncp);
work_done = stmmac_rx(priv, budget, chan);
if (work_done < budget && napi_complete_done(napi, work_done)) {
@@ -5693,13 +5687,12 @@ static int stmmac_napi_poll_tx(struct napi_struct *napi, int budget)
struct stmmac_txq_stats *txq_stats;
bool pending_packets = false;
u32 chan = ch->index;
- unsigned long flags;
int work_done;
txq_stats = &priv->xstats.txq_stats[chan];
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
- txq_stats->napi_poll++;
- u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+ u64_stats_update_begin(&txq_stats->napi_syncp);
+ u64_stats_inc(&txq_stats->napi.poll);
+ u64_stats_update_end(&txq_stats->napi_syncp);
work_done = stmmac_tx_clean(priv, budget, chan, &pending_packets);
work_done = min(work_done, budget);
@@ -5729,17 +5722,16 @@ static int stmmac_napi_poll_rxtx(struct napi_struct *napi, int budget)
struct stmmac_rxq_stats *rxq_stats;
struct stmmac_txq_stats *txq_stats;
u32 chan = ch->index;
- unsigned long flags;
rxq_stats = &priv->xstats.rxq_stats[chan];
- flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
- rxq_stats->napi_poll++;
- u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+ u64_stats_update_begin(&rxq_stats->napi_syncp);
+ u64_stats_inc(&rxq_stats->napi.poll);
+ u64_stats_update_end(&rxq_stats->napi_syncp);
txq_stats = &priv->xstats.txq_stats[chan];
- flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
- txq_stats->napi_poll++;
- u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+ u64_stats_update_begin(&txq_stats->napi_syncp);
+ u64_stats_inc(&txq_stats->napi.poll);
+ u64_stats_update_end(&txq_stats->napi_syncp);
tx_done = stmmac_tx_clean(priv, budget, chan, &tx_pending_packets);
tx_done = min(tx_done, budget);
@@ -6070,11 +6062,6 @@ static irqreturn_t stmmac_mac_interrupt(int irq, void *dev_id)
struct net_device *dev = (struct net_device *)dev_id;
struct stmmac_priv *priv = netdev_priv(dev);
- if (unlikely(!dev)) {
- netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
- return IRQ_NONE;
- }
-
/* Check if adapter is up */
if (test_bit(STMMAC_DOWN, &priv->state))
return IRQ_HANDLED;
@@ -6090,11 +6077,6 @@ static irqreturn_t stmmac_safety_interrupt(int irq, void *dev_id)
struct net_device *dev = (struct net_device *)dev_id;
struct stmmac_priv *priv = netdev_priv(dev);
- if (unlikely(!dev)) {
- netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
- return IRQ_NONE;
- }
-
/* Check if adapter is up */
if (test_bit(STMMAC_DOWN, &priv->state))
return IRQ_HANDLED;
@@ -6116,11 +6098,6 @@ static irqreturn_t stmmac_msi_intr_tx(int irq, void *data)
dma_conf = container_of(tx_q, struct stmmac_dma_conf, tx_queue[chan]);
priv = container_of(dma_conf, struct stmmac_priv, dma_conf);
- if (unlikely(!data)) {
- netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
- return IRQ_NONE;
- }
-
/* Check if adapter is up */
if (test_bit(STMMAC_DOWN, &priv->state))
return IRQ_HANDLED;
@@ -6147,11 +6124,6 @@ static irqreturn_t stmmac_msi_intr_rx(int irq, void *data)
dma_conf = container_of(rx_q, struct stmmac_dma_conf, rx_queue[chan]);
priv = container_of(dma_conf, struct stmmac_priv, dma_conf);
- if (unlikely(!data)) {
- netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
- return IRQ_NONE;
- }
-
/* Check if adapter is up */
if (test_bit(STMMAC_DOWN, &priv->state))
return IRQ_HANDLED;
@@ -7065,10 +7037,13 @@ static void stmmac_get_stats64(struct net_device *dev, struct rtnl_link_stats64
u64 tx_bytes;
do {
- start = u64_stats_fetch_begin(&txq_stats->syncp);
- tx_packets = txq_stats->tx_packets;
- tx_bytes = txq_stats->tx_bytes;
- } while (u64_stats_fetch_retry(&txq_stats->syncp, start));
+ start = u64_stats_fetch_begin(&txq_stats->q_syncp);
+ tx_bytes = u64_stats_read(&txq_stats->q.tx_bytes);
+ } while (u64_stats_fetch_retry(&txq_stats->q_syncp, start));
+ do {
+ start = u64_stats_fetch_begin(&txq_stats->napi_syncp);
+ tx_packets = u64_stats_read(&txq_stats->napi.tx_packets);
+ } while (u64_stats_fetch_retry(&txq_stats->napi_syncp, start));
stats->tx_packets += tx_packets;
stats->tx_bytes += tx_bytes;
@@ -7080,10 +7055,10 @@ static void stmmac_get_stats64(struct net_device *dev, struct rtnl_link_stats64
u64 rx_bytes;
do {
- start = u64_stats_fetch_begin(&rxq_stats->syncp);
- rx_packets = rxq_stats->rx_packets;
- rx_bytes = rxq_stats->rx_bytes;
- } while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
+ start = u64_stats_fetch_begin(&rxq_stats->napi_syncp);
+ rx_packets = u64_stats_read(&rxq_stats->napi.rx_packets);
+ rx_bytes = u64_stats_read(&rxq_stats->napi.rx_bytes);
+ } while (u64_stats_fetch_retry(&rxq_stats->napi_syncp, start));
stats->rx_packets += rx_packets;
stats->rx_bytes += rx_bytes;
@@ -7477,9 +7452,16 @@ int stmmac_dvr_probe(struct device *device,
priv->dev = ndev;
for (i = 0; i < MTL_MAX_RX_QUEUES; i++)
- u64_stats_init(&priv->xstats.rxq_stats[i].syncp);
- for (i = 0; i < MTL_MAX_TX_QUEUES; i++)
- u64_stats_init(&priv->xstats.txq_stats[i].syncp);
+ u64_stats_init(&priv->xstats.rxq_stats[i].napi_syncp);
+ for (i = 0; i < MTL_MAX_TX_QUEUES; i++) {
+ u64_stats_init(&priv->xstats.txq_stats[i].q_syncp);
+ u64_stats_init(&priv->xstats.txq_stats[i].napi_syncp);
+ }
+
+ priv->xstats.pcpu_stats =
+ devm_netdev_alloc_pcpu_stats(device, struct stmmac_pcpu_stats);
+ if (!priv->xstats.pcpu_stats)
+ return -ENOMEM;
stmmac_set_ethtool_ops(ndev);
priv->pause = pause;
diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index be01450..1530d139 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -189,6 +189,7 @@
select TI_K3_CPPI_DESC_POOL
depends on PRU_REMOTEPROC
depends on ARCH_K3 && OF && TI_K3_UDMA_GLUE_LAYER
+ depends on PTP_1588_CLOCK_OPTIONAL
help
Support dual Gigabit Ethernet ports over the ICSSG PRU Subsystem.
This subsystem is available starting with the AM65 platform.
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index 9d2f4ac..2939a21 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -294,7 +294,7 @@ static void am65_cpsw_nuss_ndo_host_tx_timeout(struct net_device *ndev,
txqueue,
netif_tx_queue_stopped(netif_txq),
jiffies_to_msecs(jiffies - trans_start),
- dql_avail(&netif_txq->dql),
+ netdev_queue_dql_avail(netif_txq),
k3_cppi_desc_pool_avail(tx_chn->desc_pool));
if (netif_tx_queue_stopped(netif_txq)) {
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ea85c6d..c0a5abd 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -631,6 +631,8 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
}
}
+ phy->mac_managed_pm = true;
+
slave->phy = phy;
phy_attached_info(slave->phy);
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index 498c50c..087dcb6 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -773,6 +773,9 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
slave->slave_num);
return;
}
+
+ phy->mac_managed_pm = true;
+
slave->phy = phy;
phy_attached_info(slave->phy);
diff --git a/drivers/net/ethernet/ti/cpts.c b/drivers/net/ethernet/ti/cpts.c
index bcccf43..dbbea91 100644
--- a/drivers/net/ethernet/ti/cpts.c
+++ b/drivers/net/ethernet/ti/cpts.c
@@ -638,6 +638,16 @@ static void cpts_calc_mult_shift(struct cpts *cpts)
freq, cpts->cc.mult, cpts->cc.shift, (ns - NSEC_PER_SEC));
}
+static void cpts_clk_unregister(void *clk)
+{
+ clk_hw_unregister_mux(clk);
+}
+
+static void cpts_clk_del_provider(void *np)
+{
+ of_clk_del_provider(np);
+}
+
static int cpts_of_mux_clk_setup(struct cpts *cpts, struct device_node *node)
{
struct device_node *refclk_np;
@@ -687,9 +697,7 @@ static int cpts_of_mux_clk_setup(struct cpts *cpts, struct device_node *node)
goto mux_fail;
}
- ret = devm_add_action_or_reset(cpts->dev,
- (void(*)(void *))clk_hw_unregister_mux,
- clk_hw);
+ ret = devm_add_action_or_reset(cpts->dev, cpts_clk_unregister, clk_hw);
if (ret) {
dev_err(cpts->dev, "add clkmux unreg action %d", ret);
goto mux_fail;
@@ -699,8 +707,7 @@ static int cpts_of_mux_clk_setup(struct cpts *cpts, struct device_node *node)
if (ret)
goto mux_fail;
- ret = devm_add_action_or_reset(cpts->dev,
- (void(*)(void *))of_clk_del_provider,
+ ret = devm_add_action_or_reset(cpts->dev, cpts_clk_del_provider,
refclk_np);
if (ret) {
dev_err(cpts->dev, "add clkmux provider unreg action %d", ret);
diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index d5b75af..c1b0d35 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -384,18 +384,18 @@ static int gelic_descr_prepare_rx(struct gelic_card *card,
if (gelic_descr_get_status(descr) != GELIC_DESCR_DMA_NOT_IN_USE)
dev_info(ctodev(card), "%s: ERROR status\n", __func__);
- descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
- if (!descr->skb) {
- descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */
- return -ENOMEM;
- }
descr->hw_regs.dmac_cmd_status = 0;
descr->hw_regs.result_size = 0;
descr->hw_regs.valid_size = 0;
descr->hw_regs.data_error = 0;
descr->hw_regs.payload.dev_addr = 0;
descr->hw_regs.payload.size = 0;
- descr->skb = NULL;
+
+ descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
+ if (!descr->skb) {
+ descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */
+ return -ENOMEM;
+ }
offset = ((unsigned long)descr->skb->data) &
(GELIC_NET_RXBUF_ALIGN - 1);
diff --git a/drivers/net/fddi/skfp/skfddi.c b/drivers/net/fddi/skfp/skfddi.c
index 2b6a607..a273362 100644
--- a/drivers/net/fddi/skfp/skfddi.c
+++ b/drivers/net/fddi/skfp/skfddi.c
@@ -153,6 +153,7 @@ static const struct pci_device_id skfddi_pci_tbl[] = {
{ } /* Terminating entry */
};
MODULE_DEVICE_TABLE(pci, skfddi_pci_tbl);
+MODULE_DESCRIPTION("SysKonnect FDDI PCI driver");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Mirko Lindner <mlindner@syskonnect.de>");
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 32c51c2..c4ed36c 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -221,7 +221,7 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
struct genevehdr *gnvh = geneve_hdr(skb);
struct metadata_dst *tun_dst = NULL;
unsigned int len;
- int err = 0;
+ int nh, err = 0;
void *oiph;
if (ip_tunnel_collect_metadata() || gs->collect_md) {
@@ -272,9 +272,23 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
skb->pkt_type = PACKET_HOST;
}
- oiph = skb_network_header(skb);
+ /* Save offset of outer header relative to skb->head,
+ * because we are going to reset the network header to the inner header
+ * and might change skb->head.
+ */
+ nh = skb_network_header(skb) - skb->head;
+
skb_reset_network_header(skb);
+ if (!pskb_inet_may_pull(skb)) {
+ DEV_STATS_INC(geneve->dev, rx_length_errors);
+ DEV_STATS_INC(geneve->dev, rx_errors);
+ goto drop;
+ }
+
+ /* Get the outer header. */
+ oiph = skb->head + nh;
+
if (geneve_get_sk_family(gs) == AF_INET)
err = IP_ECN_decapsulate(oiph, skb);
#if IS_ENABLED(CONFIG_IPV6)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index b191927..2b5357d 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -1903,26 +1903,26 @@ static int __init gtp_init(void)
get_random_bytes(>p_h_initval, sizeof(gtp_h_initval));
- err = rtnl_link_register(>p_link_ops);
+ err = register_pernet_subsys(>p_net_ops);
if (err < 0)
goto error_out;
+ err = rtnl_link_register(>p_link_ops);
+ if (err < 0)
+ goto unreg_pernet_subsys;
+
err = genl_register_family(>p_genl_family);
if (err < 0)
goto unreg_rtnl_link;
- err = register_pernet_subsys(>p_net_ops);
- if (err < 0)
- goto unreg_genl_family;
-
pr_info("GTP module loaded (pdp ctx size %zd bytes)\n",
sizeof(struct pdp_ctx));
return 0;
-unreg_genl_family:
- genl_unregister_family(>p_genl_family);
unreg_rtnl_link:
rtnl_link_unregister(>p_link_ops);
+unreg_pernet_subsys:
+ unregister_pernet_subsys(>p_net_ops);
error_out:
pr_err("error loading GTP module loaded\n");
return err;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 273bd8a..11831a1 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -42,6 +42,10 @@
#define LINKCHANGE_INT (2 * HZ)
#define VF_TAKEOVER_INT (HZ / 10)
+/* Macros to define the context of vf registration */
+#define VF_REG_IN_PROBE 1
+#define VF_REG_IN_NOTIFIER 2
+
static unsigned int ring_size __ro_after_init = 128;
module_param(ring_size, uint, 0444);
MODULE_PARM_DESC(ring_size, "Ring buffer size (# of 4K pages)");
@@ -2185,7 +2189,7 @@ static rx_handler_result_t netvsc_vf_handle_frame(struct sk_buff **pskb)
}
static int netvsc_vf_join(struct net_device *vf_netdev,
- struct net_device *ndev)
+ struct net_device *ndev, int context)
{
struct net_device_context *ndev_ctx = netdev_priv(ndev);
int ret;
@@ -2208,7 +2212,11 @@ static int netvsc_vf_join(struct net_device *vf_netdev,
goto upper_link_failed;
}
- schedule_delayed_work(&ndev_ctx->vf_takeover, VF_TAKEOVER_INT);
+ /* If this registration is called from probe context vf_takeover
+ * is taken care of later in probe itself.
+ */
+ if (context == VF_REG_IN_NOTIFIER)
+ schedule_delayed_work(&ndev_ctx->vf_takeover, VF_TAKEOVER_INT);
call_netdevice_notifiers(NETDEV_JOIN, vf_netdev);
@@ -2346,7 +2354,7 @@ static int netvsc_prepare_bonding(struct net_device *vf_netdev)
return NOTIFY_DONE;
}
-static int netvsc_register_vf(struct net_device *vf_netdev)
+static int netvsc_register_vf(struct net_device *vf_netdev, int context)
{
struct net_device_context *net_device_ctx;
struct netvsc_device *netvsc_dev;
@@ -2386,7 +2394,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
netdev_info(ndev, "VF registering: %s\n", vf_netdev->name);
- if (netvsc_vf_join(vf_netdev, ndev) != 0)
+ if (netvsc_vf_join(vf_netdev, ndev, context) != 0)
return NOTIFY_DONE;
dev_hold(vf_netdev);
@@ -2484,10 +2492,31 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
return NOTIFY_OK;
}
+static int check_dev_is_matching_vf(struct net_device *event_ndev)
+{
+ /* Skip NetVSC interfaces */
+ if (event_ndev->netdev_ops == &device_ops)
+ return -ENODEV;
+
+ /* Avoid non-Ethernet type devices */
+ if (event_ndev->type != ARPHRD_ETHER)
+ return -ENODEV;
+
+ /* Avoid Vlan dev with same MAC registering as VF */
+ if (is_vlan_dev(event_ndev))
+ return -ENODEV;
+
+ /* Avoid Bonding master dev with same MAC registering as VF */
+ if (netif_is_bond_master(event_ndev))
+ return -ENODEV;
+
+ return 0;
+}
+
static int netvsc_probe(struct hv_device *dev,
const struct hv_vmbus_device_id *dev_id)
{
- struct net_device *net = NULL;
+ struct net_device *net = NULL, *vf_netdev;
struct net_device_context *net_device_ctx;
struct netvsc_device_info *device_info = NULL;
struct netvsc_device *nvdev;
@@ -2599,6 +2628,30 @@ static int netvsc_probe(struct hv_device *dev,
}
list_add(&net_device_ctx->list, &netvsc_dev_list);
+
+ /* When the hv_netvsc driver is unloaded and reloaded, the
+ * NET_DEVICE_REGISTER for the vf device is replayed before probe
+ * is complete. This is because register_netdevice_notifier() gets
+ * registered before vmbus_driver_register() so that callback func
+ * is set before probe and we don't miss events like NETDEV_POST_INIT
+ * So, in this section we try to register the matching vf device that
+ * is present as a netdevice, knowing that its register call is not
+ * processed in the netvsc_netdev_notifier(as probing is progress and
+ * get_netvsc_byslot fails).
+ */
+ for_each_netdev(dev_net(net), vf_netdev) {
+ ret = check_dev_is_matching_vf(vf_netdev);
+ if (ret != 0)
+ continue;
+
+ if (net != get_netvsc_byslot(vf_netdev))
+ continue;
+
+ netvsc_prepare_bonding(vf_netdev);
+ netvsc_register_vf(vf_netdev, VF_REG_IN_PROBE);
+ __netvsc_vf_setup(net, vf_netdev);
+ break;
+ }
rtnl_unlock();
netvsc_devinfo_put(device_info);
@@ -2754,28 +2807,17 @@ static int netvsc_netdev_event(struct notifier_block *this,
unsigned long event, void *ptr)
{
struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
+ int ret = 0;
- /* Skip our own events */
- if (event_dev->netdev_ops == &device_ops)
- return NOTIFY_DONE;
-
- /* Avoid non-Ethernet type devices */
- if (event_dev->type != ARPHRD_ETHER)
- return NOTIFY_DONE;
-
- /* Avoid Vlan dev with same MAC registering as VF */
- if (is_vlan_dev(event_dev))
- return NOTIFY_DONE;
-
- /* Avoid Bonding master dev with same MAC registering as VF */
- if (netif_is_bond_master(event_dev))
+ ret = check_dev_is_matching_vf(event_dev);
+ if (ret != 0)
return NOTIFY_DONE;
switch (event) {
case NETDEV_POST_INIT:
return netvsc_prepare_bonding(event_dev);
case NETDEV_REGISTER:
- return netvsc_register_vf(event_dev);
+ return netvsc_register_vf(event_dev, VF_REG_IN_NOTIFIER);
case NETDEV_UNREGISTER:
return netvsc_unregister_vf(event_dev);
case NETDEV_UP:
diff --git a/drivers/net/ieee802154/fakelb.c b/drivers/net/ieee802154/fakelb.c
index 35e55f1..2930141 100644
--- a/drivers/net/ieee802154/fakelb.c
+++ b/drivers/net/ieee802154/fakelb.c
@@ -259,4 +259,5 @@ static __exit void fake_remove_module(void)
module_init(fakelb_init_module);
module_exit(fake_remove_module);
+MODULE_DESCRIPTION("IEEE 802.15.4 loopback driver");
MODULE_LICENSE("GPL");
diff --git a/drivers/net/ipa/ipa_interrupt.c b/drivers/net/ipa/ipa_interrupt.c
index 4bc0594..a78c692 100644
--- a/drivers/net/ipa/ipa_interrupt.c
+++ b/drivers/net/ipa/ipa_interrupt.c
@@ -212,7 +212,7 @@ void ipa_interrupt_suspend_clear_all(struct ipa_interrupt *interrupt)
u32 unit_count;
u32 unit;
- unit_count = roundup(ipa->endpoint_count, 32);
+ unit_count = DIV_ROUND_UP(ipa->endpoint_count, 32);
for (unit = 0; unit < unit_count; unit++) {
const struct reg *reg;
u32 val;
diff --git a/drivers/net/ipvlan/ipvtap.c b/drivers/net/ipvlan/ipvtap.c
index 60944a4..1afc4c4 100644
--- a/drivers/net/ipvlan/ipvtap.c
+++ b/drivers/net/ipvlan/ipvtap.c
@@ -237,4 +237,5 @@ static void __exit ipvtap_exit(void)
module_exit(ipvtap_exit);
MODULE_ALIAS_RTNL_LINK("ipvtap");
MODULE_AUTHOR("Sainath Grandhi <sainath.grandhi@intel.com>");
+MODULE_DESCRIPTION("IP-VLAN based tap driver");
MODULE_LICENSE("GPL");
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index b4d3b9c..92a7a36 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -835,14 +835,14 @@ static void nsim_dev_trap_report_work(struct work_struct *work)
trap_report_dw.work);
nsim_dev = nsim_trap_data->nsim_dev;
- /* For each running port and enabled packet trap, generate a UDP
- * packet with a random 5-tuple and report it.
- */
if (!devl_trylock(priv_to_devlink(nsim_dev))) {
- schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 0);
+ schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 1);
return;
}
+ /* For each running port and enabled packet trap, generate a UDP
+ * packet with a random 5-tuple and report it.
+ */
list_for_each_entry(nsim_dev_port, &nsim_dev->port_list, list) {
if (!netif_running(nsim_dev_port->ns->netdev))
continue;
diff --git a/drivers/net/phy/mdio_devres.c b/drivers/net/phy/mdio_devres.c
index 69b829e..7fd3377 100644
--- a/drivers/net/phy/mdio_devres.c
+++ b/drivers/net/phy/mdio_devres.c
@@ -131,4 +131,5 @@ int __devm_of_mdiobus_register(struct device *dev, struct mii_bus *mdio,
EXPORT_SYMBOL(__devm_of_mdiobus_register);
#endif /* CONFIG_OF_MDIO */
+MODULE_DESCRIPTION("Network MDIO bus devres helpers");
MODULE_LICENSE("GPL");
diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 894172a..337899c 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -421,9 +421,11 @@ static int rtl8211f_config_init(struct phy_device *phydev)
ERR_PTR(ret));
return ret;
}
+
+ return genphy_soft_reset(phydev);
}
- return genphy_soft_reset(phydev);
+ return 0;
}
static int rtl821x_suspend(struct phy_device *phydev)
diff --git a/drivers/net/plip/plip.c b/drivers/net/plip/plip.c
index 40ce8ab..cc7d111 100644
--- a/drivers/net/plip/plip.c
+++ b/drivers/net/plip/plip.c
@@ -1437,4 +1437,5 @@ static int __init plip_init (void)
module_init(plip_init);
module_exit(plip_cleanup_module);
+MODULE_DESCRIPTION("PLIP (parallel port) network module");
MODULE_LICENSE("GPL");
diff --git a/drivers/net/ppp/bsd_comp.c b/drivers/net/ppp/bsd_comp.c
index db0dc36..5595459 100644
--- a/drivers/net/ppp/bsd_comp.c
+++ b/drivers/net/ppp/bsd_comp.c
@@ -1166,5 +1166,6 @@ static void __exit bsdcomp_cleanup(void)
module_init(bsdcomp_init);
module_exit(bsdcomp_cleanup);
+MODULE_DESCRIPTION("PPP BSD-Compress compression module");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_ALIAS("ppp-compress-" __stringify(CI_BSD_COMPRESS));
diff --git a/drivers/net/ppp/ppp_async.c b/drivers/net/ppp/ppp_async.c
index 840da92..c33c3db 100644
--- a/drivers/net/ppp/ppp_async.c
+++ b/drivers/net/ppp/ppp_async.c
@@ -87,6 +87,7 @@ struct asyncppp {
static int flag_time = HZ;
module_param(flag_time, int, 0);
MODULE_PARM_DESC(flag_time, "ppp_async: interval between flagged packets (in clock ticks)");
+MODULE_DESCRIPTION("PPP async serial channel module");
MODULE_LICENSE("GPL");
MODULE_ALIAS_LDISC(N_PPP);
@@ -460,6 +461,10 @@ ppp_async_ioctl(struct ppp_channel *chan, unsigned int cmd, unsigned long arg)
case PPPIOCSMRU:
if (get_user(val, p))
break;
+ if (val > U16_MAX) {
+ err = -EINVAL;
+ break;
+ }
if (val < PPP_MRU)
val = PPP_MRU;
ap->mru = val;
diff --git a/drivers/net/ppp/ppp_deflate.c b/drivers/net/ppp/ppp_deflate.c
index e6d48e5..4d2ff63 100644
--- a/drivers/net/ppp/ppp_deflate.c
+++ b/drivers/net/ppp/ppp_deflate.c
@@ -630,6 +630,7 @@ static void __exit deflate_cleanup(void)
module_init(deflate_init);
module_exit(deflate_cleanup);
+MODULE_DESCRIPTION("PPP Deflate compression module");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_ALIAS("ppp-compress-" __stringify(CI_DEFLATE));
MODULE_ALIAS("ppp-compress-" __stringify(CI_DEFLATE_DRAFT));
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 0193af2..3dd52bf 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -3604,6 +3604,7 @@ EXPORT_SYMBOL(ppp_input_error);
EXPORT_SYMBOL(ppp_output_wakeup);
EXPORT_SYMBOL(ppp_register_compressor);
EXPORT_SYMBOL(ppp_unregister_compressor);
+MODULE_DESCRIPTION("Generic PPP layer driver");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CHARDEV(PPP_MAJOR, 0);
MODULE_ALIAS_RTNL_LINK("ppp");
diff --git a/drivers/net/ppp/ppp_synctty.c b/drivers/net/ppp/ppp_synctty.c
index 52d05ce..45bf59a 100644
--- a/drivers/net/ppp/ppp_synctty.c
+++ b/drivers/net/ppp/ppp_synctty.c
@@ -724,5 +724,6 @@ ppp_sync_cleanup(void)
module_init(ppp_sync_init);
module_exit(ppp_sync_cleanup);
+MODULE_DESCRIPTION("PPP synchronous TTY channel module");
MODULE_LICENSE("GPL");
MODULE_ALIAS_LDISC(N_SYNC_PPP);
diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 8e7238e..2ea4f48 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -1007,26 +1007,21 @@ static int pppoe_recvmsg(struct socket *sock, struct msghdr *m,
struct sk_buff *skb;
int error = 0;
- if (sk->sk_state & PPPOX_BOUND) {
- error = -EIO;
- goto end;
- }
+ if (sk->sk_state & PPPOX_BOUND)
+ return -EIO;
skb = skb_recv_datagram(sk, flags, &error);
- if (error < 0)
- goto end;
+ if (!skb)
+ return error;
- if (skb) {
- total_len = min_t(size_t, total_len, skb->len);
- error = skb_copy_datagram_msg(skb, 0, m, total_len);
- if (error == 0) {
- consume_skb(skb);
- return total_len;
- }
+ total_len = min_t(size_t, total_len, skb->len);
+ error = skb_copy_datagram_msg(skb, 0, m, total_len);
+ if (error == 0) {
+ consume_skb(skb);
+ return total_len;
}
kfree_skb(skb);
-end:
return error;
}
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4a4f8c8..8f95a56 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -653,6 +653,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
tun->tfiles[tun->numqueues - 1]);
ntfile = rtnl_dereference(tun->tfiles[index]);
ntfile->queue_index = index;
+ ntfile->xdp_rxq.queue_index = index;
rcu_assign_pointer(tun->tfiles[tun->numqueues - 1],
NULL);
diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c
index 99ec1d4..8b6d6a1 100644
--- a/drivers/net/usb/dm9601.c
+++ b/drivers/net/usb/dm9601.c
@@ -232,7 +232,7 @@ static int dm9601_mdio_read(struct net_device *netdev, int phy_id, int loc)
err = dm_read_shared_word(dev, 1, loc, &res);
if (err < 0) {
netdev_err(dev->net, "MDIO read error: %d\n", err);
- return err;
+ return 0;
}
netdev_dbg(dev->net,
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index a6d653f..d2aa2c5 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1501,7 +1501,9 @@ static int lan78xx_link_reset(struct lan78xx_net *dev)
lan78xx_rx_urb_submit_all(dev);
+ local_bh_disable();
napi_schedule(&dev->napi);
+ local_bh_enable();
}
return 0;
@@ -3033,7 +3035,8 @@ static int lan78xx_reset(struct lan78xx_net *dev)
if (dev->chipid == ID_REV_CHIP_ID_7801_)
buf &= ~MAC_CR_GMII_EN_;
- if (dev->chipid == ID_REV_CHIP_ID_7800_) {
+ if (dev->chipid == ID_REV_CHIP_ID_7800_ ||
+ dev->chipid == ID_REV_CHIP_ID_7850_) {
ret = lan78xx_read_raw_eeprom(dev, 0, 1, &sig);
if (!ret && sig != EEPROM_INDICATOR) {
/* Implies there is no external eeprom. Set mac speed */
@@ -3132,7 +3135,8 @@ static int lan78xx_open(struct net_device *net)
done:
mutex_unlock(&dev->dev_mutex);
- usb_autopm_put_interface(dev->intf);
+ if (ret < 0)
+ usb_autopm_put_interface(dev->intf);
return ret;
}
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index a530f20..2fa46baa 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -2105,6 +2105,11 @@ static const struct usb_device_id products[] = {
.driver_info = (unsigned long) &smsc95xx_info,
},
{
+ /* SYSTEC USB-SPEmodule1 10BASE-T1L Ethernet Device */
+ USB_DEVICE(0x0878, 0x1400),
+ .driver_info = (unsigned long)&smsc95xx_info,
+ },
+ {
/* Microchip's EVB-LAN8670-USB 10BASE-T1S Ethernet Device */
USB_DEVICE(0x184F, 0x0051),
.driver_info = (unsigned long)&smsc95xx_info,
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 578e36e..cd4a6fe 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1208,14 +1208,6 @@ static int veth_enable_xdp(struct net_device *dev)
veth_disable_xdp_range(dev, 0, dev->real_num_rx_queues, true);
return err;
}
-
- if (!veth_gro_requested(dev)) {
- /* user-space did not require GRO, but adding XDP
- * is supposed to get GRO working
- */
- dev->features |= NETIF_F_GRO;
- netdev_features_change(dev);
- }
}
}
@@ -1235,18 +1227,9 @@ static void veth_disable_xdp(struct net_device *dev)
for (i = 0; i < dev->real_num_rx_queues; i++)
rcu_assign_pointer(priv->rq[i].xdp_prog, NULL);
- if (!netif_running(dev) || !veth_gro_requested(dev)) {
+ if (!netif_running(dev) || !veth_gro_requested(dev))
veth_napi_del(dev);
- /* if user-space did not require GRO, since adding XDP
- * enabled it, clear it now
- */
- if (!veth_gro_requested(dev) && netif_running(dev)) {
- dev->features &= ~NETIF_F_GRO;
- netdev_features_change(dev);
- }
- }
-
veth_disable_xdp_range(dev, 0, dev->real_num_rx_queues, false);
}
@@ -1478,7 +1461,8 @@ static int veth_alloc_queues(struct net_device *dev)
struct veth_priv *priv = netdev_priv(dev);
int i;
- priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL_ACCOUNT);
+ priv->rq = kvcalloc(dev->num_rx_queues, sizeof(*priv->rq),
+ GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
if (!priv->rq)
return -ENOMEM;
@@ -1494,7 +1478,7 @@ static void veth_free_queues(struct net_device *dev)
{
struct veth_priv *priv = netdev_priv(dev);
- kfree(priv->rq);
+ kvfree(priv->rq);
}
static int veth_dev_init(struct net_device *dev)
@@ -1654,6 +1638,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
}
if (!old_prog) {
+ if (!veth_gro_requested(dev)) {
+ /* user-space did not require GRO, but adding
+ * XDP is supposed to get GRO working
+ */
+ dev->features |= NETIF_F_GRO;
+ netdev_features_change(dev);
+ }
+
peer->hw_features &= ~NETIF_F_GSO_SOFTWARE;
peer->max_mtu = max_mtu;
}
@@ -1669,6 +1661,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
if (dev->flags & IFF_UP)
veth_disable_xdp(dev);
+ /* if user-space did not require GRO, since adding XDP
+ * enabled it, clear it now
+ */
+ if (!veth_gro_requested(dev)) {
+ dev->features &= ~NETIF_F_GRO;
+ netdev_features_change(dev);
+ }
+
if (peer) {
peer->hw_features |= NETIF_F_GSO_SOFTWARE;
peer->max_mtu = ETH_MAX_MTU;
diff --git a/drivers/net/wireless/ath/ar5523/ar5523.c b/drivers/net/wireless/ath/ar5523/ar5523.c
index 43e0db78..a742cec 100644
--- a/drivers/net/wireless/ath/ar5523/ar5523.c
+++ b/drivers/net/wireless/ath/ar5523/ar5523.c
@@ -1803,5 +1803,6 @@ static struct usb_driver ar5523_driver = {
module_usb_driver(ar5523_driver);
+MODULE_DESCRIPTION("Atheros AR5523 wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_FIRMWARE(AR5523_FIRMWARE_FILE);
diff --git a/drivers/net/wireless/ath/wcn36xx/main.c b/drivers/net/wireless/ath/wcn36xx/main.c
index 41119fb..4e6b4df 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -1685,6 +1685,7 @@ static struct platform_driver wcn36xx_driver = {
module_platform_driver(wcn36xx_driver);
+MODULE_DESCRIPTION("Qualcomm Atheros WCN3660/3680 wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Eugene Krasnikov k.eugene.e@gmail.com");
MODULE_FIRMWARE(WLAN_NV_FILE);
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bca/module.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bca/module.c
index d55f327..4f0c1e1 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bca/module.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bca/module.c
@@ -20,6 +20,7 @@ static void __exit brcmf_bca_exit(void)
brcmf_fwvid_unregister_vendor(BRCMF_FWVENDOR_BCA, THIS_MODULE);
}
+MODULE_DESCRIPTION("Broadcom FullMAC WLAN driver plugin for Broadcom AP chipsets");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_IMPORT_NS(BRCMFMAC);
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 133c5ea..28d6a30 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -3779,8 +3779,10 @@ static int brcmf_internal_escan_add_info(struct cfg80211_scan_request *req,
if (req->channels[i] == chan)
break;
}
- if (i == req->n_channels)
- req->channels[req->n_channels++] = chan;
+ if (i == req->n_channels) {
+ req->n_channels++;
+ req->channels[i] = chan;
+ }
for (i = 0; i < req->n_ssids; i++) {
if (req->ssids[i].ssid_len == ssid_len &&
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cyw/module.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cyw/module.c
index f82fbbe3..90d06cd 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cyw/module.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cyw/module.c
@@ -20,6 +20,7 @@ static void __exit brcmf_cyw_exit(void)
brcmf_fwvid_unregister_vendor(BRCMF_FWVENDOR_CYW, THIS_MODULE);
}
+MODULE_DESCRIPTION("Broadcom FullMAC WLAN driver plugin for Cypress/Infineon chipsets");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_IMPORT_NS(BRCMFMAC);
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/wcc/module.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/wcc/module.c
index 02918d4..b66135e3c 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/wcc/module.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/wcc/module.c
@@ -20,6 +20,7 @@ static void __exit brcmf_wcc_exit(void)
brcmf_fwvid_unregister_vendor(BRCMF_FWVENDOR_WCC, THIS_MODULE);
}
+MODULE_DESCRIPTION("Broadcom FullMAC WLAN driver plugin for Broadcom mobility chipsets");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_IMPORT_NS(BRCMFMAC);
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c
index b96f30d..dcc4810 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c
@@ -618,7 +618,7 @@ int iwl_sar_get_wrds_table(struct iwl_fw_runtime *fwrt)
&tbl_rev);
if (!IS_ERR(wifi_pkg)) {
if (tbl_rev != 2) {
- ret = PTR_ERR(wifi_pkg);
+ ret = -EINVAL;
goto out_free;
}
@@ -634,7 +634,7 @@ int iwl_sar_get_wrds_table(struct iwl_fw_runtime *fwrt)
&tbl_rev);
if (!IS_ERR(wifi_pkg)) {
if (tbl_rev != 1) {
- ret = PTR_ERR(wifi_pkg);
+ ret = -EINVAL;
goto out_free;
}
@@ -650,7 +650,7 @@ int iwl_sar_get_wrds_table(struct iwl_fw_runtime *fwrt)
&tbl_rev);
if (!IS_ERR(wifi_pkg)) {
if (tbl_rev != 0) {
- ret = PTR_ERR(wifi_pkg);
+ ret = -EINVAL;
goto out_free;
}
@@ -707,7 +707,7 @@ int iwl_sar_get_ewrd_table(struct iwl_fw_runtime *fwrt)
&tbl_rev);
if (!IS_ERR(wifi_pkg)) {
if (tbl_rev != 2) {
- ret = PTR_ERR(wifi_pkg);
+ ret = -EINVAL;
goto out_free;
}
@@ -723,7 +723,7 @@ int iwl_sar_get_ewrd_table(struct iwl_fw_runtime *fwrt)
&tbl_rev);
if (!IS_ERR(wifi_pkg)) {
if (tbl_rev != 1) {
- ret = PTR_ERR(wifi_pkg);
+ ret = -EINVAL;
goto out_free;
}
@@ -739,7 +739,7 @@ int iwl_sar_get_ewrd_table(struct iwl_fw_runtime *fwrt)
&tbl_rev);
if (!IS_ERR(wifi_pkg)) {
if (tbl_rev != 0) {
- ret = PTR_ERR(wifi_pkg);
+ ret = -EINVAL;
goto out_free;
}
@@ -1116,6 +1116,9 @@ int iwl_acpi_get_ppag_table(struct iwl_fw_runtime *fwrt)
goto read_table;
}
+ ret = PTR_ERR(wifi_pkg);
+ goto out_free;
+
read_table:
fwrt->ppag_ver = tbl_rev;
flags = &wifi_pkg->package.elements[1];
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/api/debug.h b/drivers/net/wireless/intel/iwlwifi/fw/api/debug.h
index 798731e..b740c65 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/api/debug.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/api/debug.h
@@ -537,7 +537,7 @@ enum iwl_fw_dbg_config_cmd_type {
}; /* LDBG_CFG_CMD_TYPE_API_E_VER_1 */
/* this token disables debug asserts in the firmware */
-#define IWL_FW_DBG_CONFIG_TOKEN 0x00011301
+#define IWL_FW_DBG_CONFIG_TOKEN 0x00010001
/**
* struct iwl_fw_dbg_config_cmd - configure FW debug
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h b/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h
index 9c69d36..e6c0f92 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h
+++ b/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
- * Copyright (C) 2005-2014, 2019-2021, 2023 Intel Corporation
+ * Copyright (C) 2005-2014, 2019-2021, 2023-2024 Intel Corporation
* Copyright (C) 2013-2015 Intel Mobile Communications GmbH
* Copyright (C) 2016-2017 Intel Deutschland GmbH
*/
@@ -66,6 +66,16 @@ enum iwl_gen2_tx_fifo {
IWL_GEN2_TRIG_TX_FIFO_VO,
};
+enum iwl_bz_tx_fifo {
+ IWL_BZ_EDCA_TX_FIFO_BK,
+ IWL_BZ_EDCA_TX_FIFO_BE,
+ IWL_BZ_EDCA_TX_FIFO_VI,
+ IWL_BZ_EDCA_TX_FIFO_VO,
+ IWL_BZ_TRIG_TX_FIFO_BK,
+ IWL_BZ_TRIG_TX_FIFO_BE,
+ IWL_BZ_TRIG_TX_FIFO_VI,
+ IWL_BZ_TRIG_TX_FIFO_VO,
+};
/**
* enum iwl_tx_queue_cfg_actions - TXQ config options
* @TX_QUEUE_CFG_ENABLE_QUEUE: enable a queue
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
index e27774e..80fda05 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
/*
- * Copyright (C) 2005-2014, 2018-2023 Intel Corporation
+ * Copyright (C) 2005-2014, 2018-2024 Intel Corporation
* Copyright (C) 2013-2015 Intel Mobile Communications GmbH
* Copyright (C) 2015-2017 Intel Deutschland GmbH
*/
@@ -19,7 +19,6 @@
* @fwrt_ptr: pointer to the buffer coming from fwrt
* @trans_ptr: pointer to struct %iwl_trans_dump_data which contains the
* transport's data.
- * @trans_len: length of the valid data in trans_ptr
* @fwrt_len: length of the valid data in fwrt_ptr
*/
struct iwl_fw_dump_ptrs {
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c
index ffe2670..abf8001 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c
@@ -128,6 +128,7 @@ static void iwl_dealloc_ucode(struct iwl_drv *drv)
kfree(drv->fw.ucode_capa.cmd_versions);
kfree(drv->fw.phy_integration_ver);
kfree(drv->trans->dbg.pc_data);
+ drv->trans->dbg.pc_data = NULL;
for (i = 0; i < IWL_UCODE_TYPE_MAX; i++)
iwl_free_fw_img(drv, drv->fw.img + i);
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
index 4028969..2f6774ec 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
@@ -668,7 +668,6 @@ static const struct ieee80211_sband_iftype_data iwl_he_eht_capa[] = {
.has_eht = true,
.eht_cap_elem = {
.mac_cap_info[0] =
- IEEE80211_EHT_MAC_CAP0_EPCS_PRIO_ACCESS |
IEEE80211_EHT_MAC_CAP0_OM_CONTROL |
IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 |
IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE2 |
@@ -793,7 +792,6 @@ static const struct ieee80211_sband_iftype_data iwl_he_eht_capa[] = {
.has_eht = true,
.eht_cap_elem = {
.mac_cap_info[0] =
- IEEE80211_EHT_MAC_CAP0_EPCS_PRIO_ACCESS |
IEEE80211_EHT_MAC_CAP0_OM_CONTROL |
IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 |
IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE2,
@@ -1020,8 +1018,7 @@ iwl_nvm_fixup_sband_iftd(struct iwl_trans *trans,
if (CSR_HW_REV_TYPE(trans->hw_rev) == IWL_CFG_MAC_TYPE_GL &&
iftype_data->eht_cap.has_eht) {
iftype_data->eht_cap.eht_cap_elem.mac_cap_info[0] &=
- ~(IEEE80211_EHT_MAC_CAP0_EPCS_PRIO_ACCESS |
- IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 |
+ ~(IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 |
IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE2);
iftype_data->eht_cap.eht_cap_elem.phy_cap_info[3] &=
~(IEEE80211_EHT_PHY_CAP0_PARTIAL_BW_UL_MU_MIMO |
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/d3.c b/drivers/net/wireless/intel/iwlwifi/mvm/d3.c
index 4582afb..05b6417 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/d3.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/d3.c
@@ -1279,7 +1279,9 @@ static int __iwl_mvm_suspend(struct ieee80211_hw *hw,
mvm->net_detect = true;
} else {
- struct iwl_wowlan_config_cmd wowlan_config_cmd = {};
+ struct iwl_wowlan_config_cmd wowlan_config_cmd = {
+ .offloading_tid = 0,
+ };
wowlan_config_cmd.sta_id = mvmvif->deflink.ap_sta_id;
@@ -1291,6 +1293,11 @@ static int __iwl_mvm_suspend(struct ieee80211_hw *hw,
goto out_noreset;
}
+ ret = iwl_mvm_sta_ensure_queue(
+ mvm, ap_sta->txq[wowlan_config_cmd.offloading_tid]);
+ if (ret)
+ goto out_noreset;
+
ret = iwl_mvm_get_wowlan_config(mvm, wowlan, &wowlan_config_cmd,
vif, mvmvif, ap_sta);
if (ret)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c
index c4f9612..25a5a31 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c
@@ -31,6 +31,17 @@ const u8 iwl_mvm_ac_to_gen2_tx_fifo[] = {
IWL_GEN2_TRIG_TX_FIFO_BK,
};
+const u8 iwl_mvm_ac_to_bz_tx_fifo[] = {
+ IWL_BZ_EDCA_TX_FIFO_VO,
+ IWL_BZ_EDCA_TX_FIFO_VI,
+ IWL_BZ_EDCA_TX_FIFO_BE,
+ IWL_BZ_EDCA_TX_FIFO_BK,
+ IWL_BZ_TRIG_TX_FIFO_VO,
+ IWL_BZ_TRIG_TX_FIFO_VI,
+ IWL_BZ_TRIG_TX_FIFO_BE,
+ IWL_BZ_TRIG_TX_FIFO_BK,
+};
+
struct iwl_mvm_mac_iface_iterator_data {
struct iwl_mvm *mvm;
struct ieee80211_vif *vif;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index 7f13dff..53e26c3 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -1600,7 +1600,8 @@ static int iwl_mvm_mac_add_interface(struct ieee80211_hw *hw,
*/
if (vif->type == NL80211_IFTYPE_AP ||
vif->type == NL80211_IFTYPE_ADHOC) {
- iwl_mvm_vif_dbgfs_add_link(mvm, vif);
+ if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status))
+ iwl_mvm_vif_dbgfs_add_link(mvm, vif);
ret = 0;
goto out;
}
@@ -1640,7 +1641,8 @@ static int iwl_mvm_mac_add_interface(struct ieee80211_hw *hw,
iwl_mvm_chandef_get_primary_80(&vif->bss_conf.chandef);
}
- iwl_mvm_vif_dbgfs_add_link(mvm, vif);
+ if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status))
+ iwl_mvm_vif_dbgfs_add_link(mvm, vif);
if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status) &&
vif->type == NL80211_IFTYPE_STATION && !vif->p2p &&
@@ -3685,6 +3687,9 @@ iwl_mvm_sta_state_notexist_to_none(struct iwl_mvm *mvm,
NL80211_TDLS_SETUP);
}
+ if (ret)
+ return ret;
+
for_each_sta_active_link(vif, sta, link_sta, i)
link_sta->agg.max_rc_amsdu_len = 1;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
index 6117017..893b69f 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
@@ -81,7 +81,8 @@ static int iwl_mvm_mld_mac_add_interface(struct ieee80211_hw *hw,
ieee80211_hw_set(mvm->hw, RX_INCLUDES_FCS);
}
- iwl_mvm_vif_dbgfs_add_link(mvm, vif);
+ if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status))
+ iwl_mvm_vif_dbgfs_add_link(mvm, vif);
if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status) &&
vif->type == NL80211_IFTYPE_STATION && !vif->p2p &&
@@ -437,6 +438,9 @@ __iwl_mvm_mld_unassign_vif_chanctx(struct iwl_mvm *mvm,
mvmvif->ap_ibss_active = false;
}
+ iwl_mvm_link_changed(mvm, vif, link_conf,
+ LINK_CONTEXT_MODIFY_ACTIVE, false);
+
if (iwl_mvm_is_esr_supported(mvm->fwrt.trans) && n_active > 1) {
int ret = iwl_mvm_esr_mode_inactive(mvm, vif);
@@ -448,9 +452,6 @@ __iwl_mvm_mld_unassign_vif_chanctx(struct iwl_mvm *mvm,
if (vif->type == NL80211_IFTYPE_MONITOR)
iwl_mvm_mld_rm_snif_sta(mvm, vif);
- iwl_mvm_link_changed(mvm, vif, link_conf,
- LINK_CONTEXT_MODIFY_ACTIVE, false);
-
if (switching_chanctx)
return;
mvmvif->link[link_id]->phy_ctxt = NULL;
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
index 4062796..81dbef6 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
@@ -1581,12 +1581,16 @@ static inline int iwl_mvm_max_active_links(struct iwl_mvm *mvm,
extern const u8 iwl_mvm_ac_to_tx_fifo[];
extern const u8 iwl_mvm_ac_to_gen2_tx_fifo[];
+extern const u8 iwl_mvm_ac_to_bz_tx_fifo[];
static inline u8 iwl_mvm_mac_ac_to_tx_fifo(struct iwl_mvm *mvm,
enum ieee80211_ac_numbers ac)
{
- return iwl_mvm_has_new_tx_api(mvm) ?
- iwl_mvm_ac_to_gen2_tx_fifo[ac] : iwl_mvm_ac_to_tx_fifo[ac];
+ if (mvm->trans->trans_cfg->device_family >= IWL_DEVICE_FAMILY_BZ)
+ return iwl_mvm_ac_to_bz_tx_fifo[ac];
+ if (iwl_mvm_has_new_tx_api(mvm))
+ return iwl_mvm_ac_to_gen2_tx_fifo[ac];
+ return iwl_mvm_ac_to_tx_fifo[ac];
}
struct iwl_rate_info {
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
index 886d000..af15d47 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
@@ -505,6 +505,10 @@ static bool iwl_mvm_is_dup(struct ieee80211_sta *sta, int queue,
return false;
mvm_sta = iwl_mvm_sta_from_mac80211(sta);
+
+ if (WARN_ON_ONCE(!mvm_sta->dup_data))
+ return false;
+
dup_data = &mvm_sta->dup_data[queue];
/*
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
index 2a3ca97..c2e0cff 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c
@@ -1502,6 +1502,34 @@ static int iwl_mvm_sta_alloc_queue(struct iwl_mvm *mvm,
return ret;
}
+int iwl_mvm_sta_ensure_queue(struct iwl_mvm *mvm,
+ struct ieee80211_txq *txq)
+{
+ struct iwl_mvm_txq *mvmtxq = iwl_mvm_txq_from_mac80211(txq);
+ int ret = -EINVAL;
+
+ lockdep_assert_held(&mvm->mutex);
+
+ if (likely(test_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state)) ||
+ !txq->sta) {
+ return 0;
+ }
+
+ if (!iwl_mvm_sta_alloc_queue(mvm, txq->sta, txq->ac, txq->tid)) {
+ set_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state);
+ ret = 0;
+ }
+
+ local_bh_disable();
+ spin_lock(&mvm->add_stream_lock);
+ if (!list_empty(&mvmtxq->list))
+ list_del_init(&mvmtxq->list);
+ spin_unlock(&mvm->add_stream_lock);
+ local_bh_enable();
+
+ return ret;
+}
+
void iwl_mvm_add_new_dqa_stream_wk(struct work_struct *wk)
{
struct iwl_mvm *mvm = container_of(wk, struct iwl_mvm,
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.h b/drivers/net/wireless/intel/iwlwifi/mvm/sta.h
index b33a0ce..3cf8a70 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
/*
- * Copyright (C) 2012-2014, 2018-2023 Intel Corporation
+ * Copyright (C) 2012-2014, 2018-2024 Intel Corporation
* Copyright (C) 2013-2014 Intel Mobile Communications GmbH
* Copyright (C) 2015-2016 Intel Deutschland GmbH
*/
@@ -571,6 +571,7 @@ void iwl_mvm_modify_all_sta_disable_tx(struct iwl_mvm *mvm,
bool disable);
void iwl_mvm_csa_client_absent(struct iwl_mvm *mvm, struct ieee80211_vif *vif);
+int iwl_mvm_sta_ensure_queue(struct iwl_mvm *mvm, struct ieee80211_txq *txq);
void iwl_mvm_add_new_dqa_stream_wk(struct work_struct *wk);
int iwl_mvm_add_pasn_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
struct iwl_mvm_int_sta *sta, u8 *addr, u32 cipher,
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c b/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c
index 218fdf1..2e653a4 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
/*
- * Copyright (C) 2012-2014, 2018-2023 Intel Corporation
+ * Copyright (C) 2012-2014, 2018-2024 Intel Corporation
* Copyright (C) 2013-2015 Intel Mobile Communications GmbH
* Copyright (C) 2017 Intel Deutschland GmbH
*/
@@ -972,6 +972,7 @@ void iwl_mvm_rx_session_protect_notif(struct iwl_mvm *mvm,
if (!le32_to_cpu(notif->status) || !le32_to_cpu(notif->start)) {
/* End TE, notify mac80211 */
mvmvif->time_event_data.id = SESSION_PROTECT_CONF_MAX_ID;
+ mvmvif->time_event_data.link_id = -1;
iwl_mvm_p2p_roc_finished(mvm);
ieee80211_remain_on_channel_expired(mvm->hw);
} else if (le32_to_cpu(notif->start)) {
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
index db986bf..461f26d 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
@@ -520,13 +520,24 @@ static void iwl_mvm_set_tx_cmd_crypto(struct iwl_mvm *mvm,
}
}
+static void iwl_mvm_copy_hdr(void *cmd, const void *hdr, int hdrlen,
+ const u8 *addr3_override)
+{
+ struct ieee80211_hdr *out_hdr = cmd;
+
+ memcpy(cmd, hdr, hdrlen);
+ if (addr3_override)
+ memcpy(out_hdr->addr3, addr3_override, ETH_ALEN);
+}
+
/*
* Allocates and sets the Tx cmd the driver data pointers in the skb
*/
static struct iwl_device_tx_cmd *
iwl_mvm_set_tx_params(struct iwl_mvm *mvm, struct sk_buff *skb,
struct ieee80211_tx_info *info, int hdrlen,
- struct ieee80211_sta *sta, u8 sta_id)
+ struct ieee80211_sta *sta, u8 sta_id,
+ const u8 *addr3_override)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
struct iwl_device_tx_cmd *dev_cmd;
@@ -584,7 +595,7 @@ iwl_mvm_set_tx_params(struct iwl_mvm *mvm, struct sk_buff *skb,
cmd->len = cpu_to_le16((u16)skb->len);
/* Copy MAC header from skb into command buffer */
- memcpy(cmd->hdr, hdr, hdrlen);
+ iwl_mvm_copy_hdr(cmd->hdr, hdr, hdrlen, addr3_override);
cmd->flags = cpu_to_le16(flags);
cmd->rate_n_flags = cpu_to_le32(rate_n_flags);
@@ -599,7 +610,7 @@ iwl_mvm_set_tx_params(struct iwl_mvm *mvm, struct sk_buff *skb,
cmd->len = cpu_to_le16((u16)skb->len);
/* Copy MAC header from skb into command buffer */
- memcpy(cmd->hdr, hdr, hdrlen);
+ iwl_mvm_copy_hdr(cmd->hdr, hdr, hdrlen, addr3_override);
cmd->flags = cpu_to_le32(flags);
cmd->rate_n_flags = cpu_to_le32(rate_n_flags);
@@ -617,7 +628,7 @@ iwl_mvm_set_tx_params(struct iwl_mvm *mvm, struct sk_buff *skb,
iwl_mvm_set_tx_cmd_rate(mvm, tx_cmd, info, sta, hdr->frame_control);
/* Copy MAC header from skb into command buffer */
- memcpy(tx_cmd->hdr, hdr, hdrlen);
+ iwl_mvm_copy_hdr(tx_cmd->hdr, hdr, hdrlen, addr3_override);
out:
return dev_cmd;
@@ -820,7 +831,8 @@ int iwl_mvm_tx_skb_non_sta(struct iwl_mvm *mvm, struct sk_buff *skb)
IWL_DEBUG_TX(mvm, "station Id %d, queue=%d\n", sta_id, queue);
- dev_cmd = iwl_mvm_set_tx_params(mvm, skb, &info, hdrlen, NULL, sta_id);
+ dev_cmd = iwl_mvm_set_tx_params(mvm, skb, &info, hdrlen, NULL, sta_id,
+ NULL);
if (!dev_cmd)
return -1;
@@ -1140,7 +1152,8 @@ static int iwl_mvm_tx_pkt_queued(struct iwl_mvm *mvm,
*/
static int iwl_mvm_tx_mpdu(struct iwl_mvm *mvm, struct sk_buff *skb,
struct ieee80211_tx_info *info,
- struct ieee80211_sta *sta)
+ struct ieee80211_sta *sta,
+ const u8 *addr3_override)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
struct iwl_mvm_sta *mvmsta;
@@ -1172,7 +1185,8 @@ static int iwl_mvm_tx_mpdu(struct iwl_mvm *mvm, struct sk_buff *skb,
iwl_mvm_probe_resp_set_noa(mvm, skb);
dev_cmd = iwl_mvm_set_tx_params(mvm, skb, info, hdrlen,
- sta, mvmsta->deflink.sta_id);
+ sta, mvmsta->deflink.sta_id,
+ addr3_override);
if (!dev_cmd)
goto drop;
@@ -1294,9 +1308,11 @@ int iwl_mvm_tx_skb_sta(struct iwl_mvm *mvm, struct sk_buff *skb,
struct iwl_mvm_sta *mvmsta = iwl_mvm_sta_from_mac80211(sta);
struct ieee80211_tx_info info;
struct sk_buff_head mpdus_skbs;
+ struct ieee80211_vif *vif;
unsigned int payload_len;
int ret;
struct sk_buff *orig_skb = skb;
+ const u8 *addr3;
if (WARN_ON_ONCE(!mvmsta))
return -1;
@@ -1307,26 +1323,59 @@ int iwl_mvm_tx_skb_sta(struct iwl_mvm *mvm, struct sk_buff *skb,
memcpy(&info, skb->cb, sizeof(info));
if (!skb_is_gso(skb))
- return iwl_mvm_tx_mpdu(mvm, skb, &info, sta);
+ return iwl_mvm_tx_mpdu(mvm, skb, &info, sta, NULL);
payload_len = skb_tail_pointer(skb) - skb_transport_header(skb) -
tcp_hdrlen(skb) + skb->data_len;
if (payload_len <= skb_shinfo(skb)->gso_size)
- return iwl_mvm_tx_mpdu(mvm, skb, &info, sta);
+ return iwl_mvm_tx_mpdu(mvm, skb, &info, sta, NULL);
__skb_queue_head_init(&mpdus_skbs);
+ vif = info.control.vif;
+ if (!vif)
+ return -1;
+
ret = iwl_mvm_tx_tso(mvm, skb, &info, sta, &mpdus_skbs);
if (ret)
return ret;
WARN_ON(skb_queue_empty(&mpdus_skbs));
- while (!skb_queue_empty(&mpdus_skbs)) {
- skb = __skb_dequeue(&mpdus_skbs);
+ /*
+ * As described in IEEE sta 802.11-2020, table 9-30 (Address
+ * field contents), A-MSDU address 3 should contain the BSSID
+ * address.
+ * Pass address 3 down to iwl_mvm_tx_mpdu() and further to set it
+ * in the command header. We need to preserve the original
+ * address 3 in the skb header to correctly create all the
+ * A-MSDU subframe headers from it.
+ */
+ switch (vif->type) {
+ case NL80211_IFTYPE_STATION:
+ addr3 = vif->cfg.ap_addr;
+ break;
+ case NL80211_IFTYPE_AP:
+ addr3 = vif->addr;
+ break;
+ default:
+ addr3 = NULL;
+ break;
+ }
- ret = iwl_mvm_tx_mpdu(mvm, skb, &info, sta);
+ while (!skb_queue_empty(&mpdus_skbs)) {
+ struct ieee80211_hdr *hdr;
+ bool amsdu;
+
+ skb = __skb_dequeue(&mpdus_skbs);
+ hdr = (void *)skb->data;
+ amsdu = ieee80211_is_data_qos(hdr->frame_control) &&
+ (*ieee80211_get_qos_ctl(hdr) &
+ IEEE80211_QOS_CTL_A_MSDU_PRESENT);
+
+ ret = iwl_mvm_tx_mpdu(mvm, skb, &info, sta,
+ amsdu ? addr3 : NULL);
if (ret) {
/* Free skbs created as part of TSO logic that have not yet been dequeued */
__skb_queue_purge(&mpdus_skbs);
diff --git a/drivers/net/wireless/intersil/p54/p54spi.c b/drivers/net/wireless/intersil/p54/p54spi.c
index ce0179b..0073b5e 100644
--- a/drivers/net/wireless/intersil/p54/p54spi.c
+++ b/drivers/net/wireless/intersil/p54/p54spi.c
@@ -700,6 +700,7 @@ static struct spi_driver p54spi_driver = {
module_spi_driver(p54spi_driver);
+MODULE_DESCRIPTION("Prism54 SPI wireless driver");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Christian Lamparter <chunkeey@web.de>");
MODULE_ALIAS("spi:cx3110x");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/main.c b/drivers/net/wireless/mediatek/mt76/mt7603/main.c
index 89d738d..e2146d3 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7603/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7603/main.c
@@ -728,6 +728,7 @@ const struct ieee80211_ops mt7603_ops = {
.set_sar_specs = mt7603_set_sar_specs,
};
+MODULE_DESCRIPTION("MediaTek MT7603E and MT76x8 wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
static int __init mt7603_init(void)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/main.c b/drivers/net/wireless/mediatek/mt76/mt7615/main.c
index dab16b5..0971c16 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/main.c
@@ -1375,4 +1375,5 @@ const struct ieee80211_ops mt7615_ops = {
};
EXPORT_SYMBOL_GPL(mt7615_ops);
+MODULE_DESCRIPTION("MediaTek MT7615E and MT7663E wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/mmio.c b/drivers/net/wireless/mediatek/mt76/mt7615/mmio.c
index ac036a0..87a956e 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/mmio.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/mmio.c
@@ -270,4 +270,5 @@ static void __exit mt7615_exit(void)
module_init(mt7615_init);
module_exit(mt7615_exit);
+MODULE_DESCRIPTION("MediaTek MT7615E MMIO helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/sdio.c b/drivers/net/wireless/mediatek/mt76/mt7615/sdio.c
index 67cedd2..9692890 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/sdio.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/sdio.c
@@ -253,4 +253,5 @@ module_sdio_driver(mt7663s_driver);
MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7663S (SDIO) wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/usb.c b/drivers/net/wireless/mediatek/mt76/mt7615/usb.c
index 04963b9..df737e1 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/usb.c
@@ -281,4 +281,5 @@ module_usb_driver(mt7663u_driver);
MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7663U (USB) wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/usb_sdio.c b/drivers/net/wireless/mediatek/mt76/mt7615/usb_sdio.c
index 0052d10..820b395 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7615/usb_sdio.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7615/usb_sdio.c
@@ -349,4 +349,5 @@ EXPORT_SYMBOL_GPL(mt7663_usb_sdio_register_device);
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
+MODULE_DESCRIPTION("MediaTek MT7663 SDIO/USB helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c b/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c
index 96494ba..3a20ba0 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c
@@ -3160,4 +3160,5 @@ int mt76_connac2_mcu_fill_message(struct mt76_dev *dev, struct sk_buff *skb,
EXPORT_SYMBOL_GPL(mt76_connac2_mcu_fill_message);
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT76x connac layer helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/eeprom.c b/drivers/net/wireless/mediatek/mt76/mt76x0/eeprom.c
index c3a392a..bcd24c9 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x0/eeprom.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x0/eeprom.c
@@ -342,4 +342,5 @@ int mt76x0_eeprom_init(struct mt76x02_dev *dev)
return 0;
}
+MODULE_DESCRIPTION("MediaTek MT76x EEPROM helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c b/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c
index 9277ff3..293e66f 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x0/pci.c
@@ -302,6 +302,7 @@ static const struct pci_device_id mt76x0e_device_table[] = {
MODULE_DEVICE_TABLE(pci, mt76x0e_device_table);
MODULE_FIRMWARE(MT7610E_FIRMWARE);
MODULE_FIRMWARE(MT7650E_FIRMWARE);
+MODULE_DESCRIPTION("MediaTek MT76x0E (PCIe) wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
static struct pci_driver mt76x0e_driver = {
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
index 0422c33..dd04294 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
@@ -336,6 +336,7 @@ static int __maybe_unused mt76x0_resume(struct usb_interface *usb_intf)
MODULE_DEVICE_TABLE(usb, mt76x0_device_table);
MODULE_FIRMWARE(MT7610E_FIRMWARE);
MODULE_FIRMWARE(MT7610U_FIRMWARE);
+MODULE_DESCRIPTION("MediaTek MT76x0U (USB) wireless driver");
MODULE_LICENSE("GPL");
static struct usb_driver mt76x0_driver = {
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_mcu.c b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_mcu.c
index 02da543..b2cc449 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_mcu.c
@@ -293,4 +293,5 @@ void mt76x02u_init_mcu(struct mt76_dev *dev)
EXPORT_SYMBOL_GPL(mt76x02u_init_mcu);
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>");
+MODULE_DESCRIPTION("MediaTek MT76x02 MCU helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_util.c b/drivers/net/wireless/mediatek/mt76/mt76x02_util.c
index 8a0e812..8020446 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x02_util.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x02_util.c
@@ -696,4 +696,5 @@ void mt76x02_config_mac_addr_list(struct mt76x02_dev *dev)
}
EXPORT_SYMBOL_GPL(mt76x02_config_mac_addr_list);
+MODULE_DESCRIPTION("MediaTek MT76x02 helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/eeprom.c b/drivers/net/wireless/mediatek/mt76/mt76x2/eeprom.c
index 8c01855..1fe5f5a 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2/eeprom.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2/eeprom.c
@@ -506,4 +506,5 @@ int mt76x2_eeprom_init(struct mt76x02_dev *dev)
}
EXPORT_SYMBOL_GPL(mt76x2_eeprom_init);
+MODULE_DESCRIPTION("MediaTek MT76x2 EEPROM helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c b/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
index df85ebc..3095974 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
@@ -165,6 +165,7 @@ mt76x2e_resume(struct pci_dev *pdev)
MODULE_DEVICE_TABLE(pci, mt76x2e_device_table);
MODULE_FIRMWARE(MT7662_FIRMWARE);
MODULE_FIRMWARE(MT7662_ROM_PATCH);
+MODULE_DESCRIPTION("MediaTek MT76x2E (PCIe) wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
static struct pci_driver mt76pci_driver = {
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
index 55068f3..ca78e14 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
@@ -147,4 +147,5 @@ static struct usb_driver mt76x2u_driver = {
module_usb_driver(mt76x2u_driver);
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>");
+MODULE_DESCRIPTION("MediaTek MT76x2U (USB) wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c b/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
index aff4f21..3039f53 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
@@ -958,4 +958,5 @@ static void __exit mt7915_exit(void)
module_init(mt7915_init);
module_exit(mt7915_exit);
+MODULE_DESCRIPTION("MediaTek MT7915E MMIO helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/main.c b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
index 0645417..0d5adc5d 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/main.c
@@ -1418,5 +1418,6 @@ const struct ieee80211_ops mt7921_ops = {
};
EXPORT_SYMBOL_GPL(mt7921_ops);
+MODULE_DESCRIPTION("MediaTek MT7921 core driver");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/pci.c b/drivers/net/wireless/mediatek/mt76/mt7921/pci.c
index 57903c6..dde26f32 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/pci.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/pci.c
@@ -544,4 +544,5 @@ MODULE_FIRMWARE(MT7922_FIRMWARE_WM);
MODULE_FIRMWARE(MT7922_ROM_PATCH);
MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7921E (PCIe) wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/sdio.c b/drivers/net/wireless/mediatek/mt76/mt7921/sdio.c
index 7591e54..a9ce1e7 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/sdio.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/sdio.c
@@ -323,5 +323,6 @@ static struct sdio_driver mt7921s_driver = {
.drv.pm = pm_sleep_ptr(&mt7921s_pm_ops),
};
module_sdio_driver(mt7921s_driver);
+MODULE_DESCRIPTION("MediaTek MT7921S (SDIO) wireless driver");
MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/usb.c b/drivers/net/wireless/mediatek/mt76/mt7921/usb.c
index e5258c7..8b7c03c 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/usb.c
@@ -336,5 +336,6 @@ static struct usb_driver mt7921u_driver = {
};
module_usb_driver(mt7921u_driver);
+MODULE_DESCRIPTION("MediaTek MT7921U (USB) wireless driver");
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7925/main.c b/drivers/net/wireless/mediatek/mt76/mt7925/main.c
index 8f1075d..125a1be 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7925/main.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7925/main.c
@@ -1450,4 +1450,5 @@ const struct ieee80211_ops mt7925_ops = {
EXPORT_SYMBOL_GPL(mt7925_ops);
MODULE_AUTHOR("Deren Wu <deren.wu@mediatek.com>");
+MODULE_DESCRIPTION("MediaTek MT7925 core driver");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7925/pci.c b/drivers/net/wireless/mediatek/mt76/mt7925/pci.c
index 734f31e..1fd99a8 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7925/pci.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7925/pci.c
@@ -583,4 +583,5 @@ MODULE_FIRMWARE(MT7925_FIRMWARE_WM);
MODULE_FIRMWARE(MT7925_ROM_PATCH);
MODULE_AUTHOR("Deren Wu <deren.wu@mediatek.com>");
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7925E (PCIe) wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7925/usb.c b/drivers/net/wireless/mediatek/mt76/mt7925/usb.c
index 9b885c5..1e0f094 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7925/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7925/usb.c
@@ -329,4 +329,5 @@ static struct usb_driver mt7925u_driver = {
module_usb_driver(mt7925u_driver);
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7925U (USB) wireless driver");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/mt792x_core.c b/drivers/net/wireless/mediatek/mt76/mt792x_core.c
index 502be22..c42101a 100644
--- a/drivers/net/wireless/mediatek/mt76/mt792x_core.c
+++ b/drivers/net/wireless/mediatek/mt76/mt792x_core.c
@@ -862,5 +862,6 @@ int mt792x_load_firmware(struct mt792x_dev *dev)
}
EXPORT_SYMBOL_GPL(mt792x_load_firmware);
+MODULE_DESCRIPTION("MediaTek MT792x core driver");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
diff --git a/drivers/net/wireless/mediatek/mt76/mt792x_usb.c b/drivers/net/wireless/mediatek/mt76/mt792x_usb.c
index 2dd283c..589a3ef 100644
--- a/drivers/net/wireless/mediatek/mt76/mt792x_usb.c
+++ b/drivers/net/wireless/mediatek/mt76/mt792x_usb.c
@@ -314,5 +314,6 @@ void mt792xu_disconnect(struct usb_interface *usb_intf)
}
EXPORT_SYMBOL_GPL(mt792xu_disconnect);
+MODULE_DESCRIPTION("MediaTek MT792x USB helpers");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
index 3c729b5..699be57 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
@@ -4477,7 +4477,8 @@ int mt7996_mcu_set_txpower_sku(struct mt7996_phy *phy)
skb_put_data(skb, &req, sizeof(req));
/* cck and ofdm */
- skb_put_data(skb, &la.cck, sizeof(la.cck) + sizeof(la.ofdm));
+ skb_put_data(skb, &la.cck, sizeof(la.cck));
+ skb_put_data(skb, &la.ofdm, sizeof(la.ofdm));
/* ht20 */
skb_put_data(skb, &la.mcs[0], 8);
/* ht40 */
diff --git a/drivers/net/wireless/mediatek/mt76/mt7996/mmio.c b/drivers/net/wireless/mediatek/mt76/mt7996/mmio.c
index c50d89a..9f2abfa 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7996/mmio.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7996/mmio.c
@@ -650,4 +650,5 @@ static void __exit mt7996_exit(void)
module_init(mt7996_init);
module_exit(mt7996_exit);
+MODULE_DESCRIPTION("MediaTek MT7996 MMIO helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/sdio.c b/drivers/net/wireless/mediatek/mt76/sdio.c
index c52d550..3e88798 100644
--- a/drivers/net/wireless/mediatek/mt76/sdio.c
+++ b/drivers/net/wireless/mediatek/mt76/sdio.c
@@ -672,4 +672,5 @@ EXPORT_SYMBOL_GPL(mt76s_init);
MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT76x SDIO helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/usb.c b/drivers/net/wireless/mediatek/mt76/usb.c
index 1584665..5a0bcb5 100644
--- a/drivers/net/wireless/mediatek/mt76/usb.c
+++ b/drivers/net/wireless/mediatek/mt76/usb.c
@@ -1128,4 +1128,5 @@ int mt76u_init(struct mt76_dev *dev, struct usb_interface *intf)
EXPORT_SYMBOL_GPL(mt76u_init);
MODULE_AUTHOR("Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>");
+MODULE_DESCRIPTION("MediaTek MT76x USB helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/mediatek/mt76/util.c b/drivers/net/wireless/mediatek/mt76/util.c
index fc76c66..d6c01a2 100644
--- a/drivers/net/wireless/mediatek/mt76/util.c
+++ b/drivers/net/wireless/mediatek/mt76/util.c
@@ -138,4 +138,5 @@ int __mt76_worker_fn(void *ptr)
}
EXPORT_SYMBOL_GPL(__mt76_worker_fn);
+MODULE_DESCRIPTION("MediaTek MT76x helpers");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c b/drivers/net/wireless/microchip/wilc1000/netdev.c
index 91d71e0..81e8f25 100644
--- a/drivers/net/wireless/microchip/wilc1000/netdev.c
+++ b/drivers/net/wireless/microchip/wilc1000/netdev.c
@@ -1018,5 +1018,6 @@ struct wilc_vif *wilc_netdev_ifc_init(struct wilc *wl, const char *name,
return ERR_PTR(ret);
}
+MODULE_DESCRIPTION("Atmel WILC1000 core wireless driver");
MODULE_LICENSE("GPL");
MODULE_FIRMWARE(WILC1000_FW(WILC1000_API_VER));
diff --git a/drivers/net/wireless/microchip/wilc1000/sdio.c b/drivers/net/wireless/microchip/wilc1000/sdio.c
index 0d13e3e..d6d3946 100644
--- a/drivers/net/wireless/microchip/wilc1000/sdio.c
+++ b/drivers/net/wireless/microchip/wilc1000/sdio.c
@@ -984,4 +984,5 @@ static struct sdio_driver wilc_sdio_driver = {
module_driver(wilc_sdio_driver,
sdio_register_driver,
sdio_unregister_driver);
+MODULE_DESCRIPTION("Atmel WILC1000 SDIO wireless driver");
MODULE_LICENSE("GPL");
diff --git a/drivers/net/wireless/microchip/wilc1000/spi.c b/drivers/net/wireless/microchip/wilc1000/spi.c
index 77b4cdf..1d8b241 100644
--- a/drivers/net/wireless/microchip/wilc1000/spi.c
+++ b/drivers/net/wireless/microchip/wilc1000/spi.c
@@ -273,6 +273,7 @@ static struct spi_driver wilc_spi_driver = {
.remove = wilc_bus_remove,
};
module_spi_driver(wilc_spi_driver);
+MODULE_DESCRIPTION("Atmel WILC1000 SPI wireless driver");
MODULE_LICENSE("GPL");
static int wilc_spi_tx(struct wilc *wilc, u8 *b, u32 len)
diff --git a/drivers/net/wireless/ti/wl1251/sdio.c b/drivers/net/wireless/ti/wl1251/sdio.c
index 301bd00..4e5b351 100644
--- a/drivers/net/wireless/ti/wl1251/sdio.c
+++ b/drivers/net/wireless/ti/wl1251/sdio.c
@@ -343,5 +343,6 @@ static void __exit wl1251_sdio_exit(void)
module_init(wl1251_sdio_init);
module_exit(wl1251_sdio_exit);
+MODULE_DESCRIPTION("TI WL1251 SDIO helpers");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Kalle Valo <kvalo@adurom.com>");
diff --git a/drivers/net/wireless/ti/wl1251/spi.c b/drivers/net/wireless/ti/wl1251/spi.c
index 29292f0..1936bb3 100644
--- a/drivers/net/wireless/ti/wl1251/spi.c
+++ b/drivers/net/wireless/ti/wl1251/spi.c
@@ -342,6 +342,7 @@ static struct spi_driver wl1251_spi_driver = {
module_spi_driver(wl1251_spi_driver);
+MODULE_DESCRIPTION("TI WL1251 SPI helpers");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Kalle Valo <kvalo@adurom.com>");
MODULE_ALIAS("spi:wl1251");
diff --git a/drivers/net/wireless/ti/wl12xx/main.c b/drivers/net/wireless/ti/wl12xx/main.c
index de045fe..b26d42b 100644
--- a/drivers/net/wireless/ti/wl12xx/main.c
+++ b/drivers/net/wireless/ti/wl12xx/main.c
@@ -1955,6 +1955,7 @@ module_param_named(tcxo, tcxo_param, charp, 0);
MODULE_PARM_DESC(tcxo,
"TCXO clock: 19.2, 26, 38.4, 52, 16.368, 32.736, 16.8, 33.6");
+MODULE_DESCRIPTION("TI WL12xx wireless driver");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
MODULE_FIRMWARE(WL127X_FW_NAME_SINGLE);
diff --git a/drivers/net/wireless/ti/wl18xx/main.c b/drivers/net/wireless/ti/wl18xx/main.c
index 20d9181b..2ccac1c 100644
--- a/drivers/net/wireless/ti/wl18xx/main.c
+++ b/drivers/net/wireless/ti/wl18xx/main.c
@@ -2086,6 +2086,7 @@ module_param_named(num_rx_desc, num_rx_desc_param, int, 0400);
MODULE_PARM_DESC(num_rx_desc_param,
"Number of Rx descriptors: u8 (default is 32)");
+MODULE_DESCRIPTION("TI WiLink 8 wireless driver");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
MODULE_FIRMWARE(WL18XX_FW_NAME);
diff --git a/drivers/net/wireless/ti/wlcore/main.c b/drivers/net/wireless/ti/wlcore/main.c
index fb9ed97..5736acb4d 100644
--- a/drivers/net/wireless/ti/wlcore/main.c
+++ b/drivers/net/wireless/ti/wlcore/main.c
@@ -6793,6 +6793,7 @@ MODULE_PARM_DESC(bug_on_recovery, "BUG() on fw recovery");
module_param(no_recovery, int, 0600);
MODULE_PARM_DESC(no_recovery, "Prevent HW recovery. FW will remain stuck.");
+MODULE_DESCRIPTION("TI WLAN core driver");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
MODULE_AUTHOR("Juuso Oikarinen <juuso.oikarinen@nokia.com>");
diff --git a/drivers/net/wireless/ti/wlcore/sdio.c b/drivers/net/wireless/ti/wlcore/sdio.c
index f0686635..eb5482e 100644
--- a/drivers/net/wireless/ti/wlcore/sdio.c
+++ b/drivers/net/wireless/ti/wlcore/sdio.c
@@ -447,6 +447,7 @@ module_sdio_driver(wl1271_sdio_driver);
module_param(dump, bool, 0600);
MODULE_PARM_DESC(dump, "Enable sdio read/write dumps.");
+MODULE_DESCRIPTION("TI WLAN SDIO helpers");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
MODULE_AUTHOR("Juuso Oikarinen <juuso.oikarinen@nokia.com>");
diff --git a/drivers/net/wireless/ti/wlcore/spi.c b/drivers/net/wireless/ti/wlcore/spi.c
index 7d9a139..0aa2b2f 100644
--- a/drivers/net/wireless/ti/wlcore/spi.c
+++ b/drivers/net/wireless/ti/wlcore/spi.c
@@ -562,6 +562,7 @@ static struct spi_driver wl1271_spi_driver = {
};
module_spi_driver(wl1271_spi_driver);
+MODULE_DESCRIPTION("TI WLAN SPI helpers");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
MODULE_AUTHOR("Juuso Oikarinen <juuso.oikarinen@nokia.com>");
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index fab361a..ef76850d 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1778,5 +1778,6 @@ static void __exit netback_fini(void)
}
module_exit(netback_fini);
+MODULE_DESCRIPTION("Xen backend network device module");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_ALIAS("xen-backend:vif");
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2baf578..00864a6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -722,7 +722,7 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
if (req->q->queuedata) {
struct nvme_ns *ns = req->q->disk->private_data;
- logging_enabled = ns->passthru_err_log_enabled;
+ logging_enabled = ns->head->passthru_err_log_enabled;
req->timeout = NVME_IO_TIMEOUT;
} else { /* no queuedata implies admin queue */
logging_enabled = nr->ctrl->passthru_err_log_enabled;
@@ -1162,6 +1162,10 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
effects &= ~NVME_CMD_EFFECTS_CSE_MASK;
} else {
effects = le32_to_cpu(ctrl->effects->acs[opcode]);
+
+ /* Ignore execution restrictions if any relaxation bits are set */
+ if (effects & NVME_CMD_EFFECTS_CSER_MASK)
+ effects &= ~NVME_CMD_EFFECTS_CSE_MASK;
}
return effects;
@@ -3727,7 +3731,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
ns->disk = disk;
ns->queue = disk->queue;
- ns->passthru_err_log_enabled = false;
if (ctrl->opts && ctrl->opts->data_digest)
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
@@ -3793,8 +3796,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
/*
* Set ns->disk->device->driver_data to ns so we can access
- * ns->logging_enabled in nvme_passthru_err_log_enabled_store() and
- * nvme_passthru_err_log_enabled_show().
+ * ns->head->passthru_err_log_enabled in
+ * nvme_io_passthru_err_log_enabled_[store | show]().
*/
dev_set_drvdata(disk_to_dev(ns->disk), ns);
@@ -4222,6 +4225,7 @@ static bool nvme_ctrl_pp_status(struct nvme_ctrl *ctrl)
static void nvme_get_fw_slot_info(struct nvme_ctrl *ctrl)
{
struct nvme_fw_slot_info_log *log;
+ u8 next_fw_slot, cur_fw_slot;
log = kmalloc(sizeof(*log), GFP_KERNEL);
if (!log)
@@ -4233,13 +4237,15 @@ static void nvme_get_fw_slot_info(struct nvme_ctrl *ctrl)
goto out_free_log;
}
- if (log->afi & 0x70 || !(log->afi & 0x7)) {
+ cur_fw_slot = log->afi & 0x7;
+ next_fw_slot = (log->afi & 0x70) >> 4;
+ if (!cur_fw_slot || (next_fw_slot && (cur_fw_slot != next_fw_slot))) {
dev_info(ctrl->device,
"Firmware is activated after next Controller Level Reset\n");
goto out_free_log;
}
- memcpy(ctrl->subsys->firmware_rev, &log->frs[(log->afi & 0x7) - 1],
+ memcpy(ctrl->subsys->firmware_rev, &log->frs[cur_fw_slot - 1],
sizeof(ctrl->subsys->firmware_rev));
out_free_log:
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 0141c0a..1f0ea1f 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -534,6 +534,7 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
if (ret) {
nvmf_log_connect_error(ctrl, ret, le32_to_cpu(res.u32),
&cmd, data);
+ goto out_free_data;
}
result = le32_to_cpu(res.u32);
if (result & (NVME_CONNECT_AUTHREQ_ATR | NVME_CONNECT_AUTHREQ_ASCR)) {
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 18f5c1be..3dfd5ae9 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -228,7 +228,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
length = (io.nblocks + 1) << ns->head->lba_shift;
if ((io.control & NVME_RW_PRINFO_PRACT) &&
- ns->head->ms == sizeof(struct t10_pi_tuple)) {
+ (ns->head->ms == ns->head->pi_size)) {
/*
* Protection information is stripped/inserted by the
* controller.
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 27397f8..24193fc 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -455,6 +455,7 @@ struct nvme_ns_head {
struct list_head entry;
struct kref ref;
bool shared;
+ bool passthru_err_log_enabled;
int instance;
struct nvme_effects_log *effects;
u64 nuse;
@@ -524,7 +525,6 @@ struct nvme_ns {
struct device cdev_device;
struct nvme_fault_inject fault_inject;
- bool passthru_err_log_enabled;
};
/* NVMe ns supports metadata actions by the controller (generate/strip) */
diff --git a/drivers/nvme/host/sysfs.c b/drivers/nvme/host/sysfs.c
index 6c7f1d5..09fcaa5 100644
--- a/drivers/nvme/host/sysfs.c
+++ b/drivers/nvme/host/sysfs.c
@@ -48,8 +48,8 @@ static ssize_t nvme_adm_passthru_err_log_enabled_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
{
struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
- int err;
bool passthru_err_log_enabled;
+ int err;
err = kstrtobool(buf, &passthru_err_log_enabled);
if (err)
@@ -60,25 +60,34 @@ static ssize_t nvme_adm_passthru_err_log_enabled_store(struct device *dev,
return count;
}
+static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+
+ if (nvme_disk_is_ns_head(disk))
+ return disk->private_data;
+ return nvme_get_ns_from_dev(dev)->head;
+}
+
static ssize_t nvme_io_passthru_err_log_enabled_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
- struct nvme_ns *n = dev_get_drvdata(dev);
+ struct nvme_ns_head *head = dev_to_ns_head(dev);
- return sysfs_emit(buf, n->passthru_err_log_enabled ? "on\n" : "off\n");
+ return sysfs_emit(buf, head->passthru_err_log_enabled ? "on\n" : "off\n");
}
static ssize_t nvme_io_passthru_err_log_enabled_store(struct device *dev,
struct device_attribute *attr, const char *buf, size_t count)
{
- struct nvme_ns *ns = dev_get_drvdata(dev);
- int err;
+ struct nvme_ns_head *head = dev_to_ns_head(dev);
bool passthru_err_log_enabled;
+ int err;
err = kstrtobool(buf, &passthru_err_log_enabled);
if (err)
return -EINVAL;
- ns->passthru_err_log_enabled = passthru_err_log_enabled;
+ head->passthru_err_log_enabled = passthru_err_log_enabled;
return count;
}
@@ -91,15 +100,6 @@ static struct device_attribute dev_attr_io_passthru_err_log_enabled = \
__ATTR(passthru_err_log_enabled, S_IRUGO | S_IWUSR, \
nvme_io_passthru_err_log_enabled_show, nvme_io_passthru_err_log_enabled_store);
-static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
-{
- struct gendisk *disk = dev_to_disk(dev);
-
- if (nvme_disk_is_ns_head(disk))
- return disk->private_data;
- return nvme_get_ns_from_dev(dev)->head;
-}
-
static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index 08e9c6b..b23f4cf 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -210,7 +210,7 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
struct nvmf_connect_command *c = &req->cmd->connect;
struct nvmf_connect_data *d;
struct nvmet_ctrl *ctrl = NULL;
- u16 status = 0;
+ u16 status;
int ret;
if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
@@ -289,7 +289,7 @@ static void nvmet_execute_io_connect(struct nvmet_req *req)
struct nvmf_connect_data *d;
struct nvmet_ctrl *ctrl;
u16 qid = le16_to_cpu(c->qid);
- u16 status = 0;
+ u16 status;
if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
return;
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index f11400a..6426aac 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -50,10 +50,10 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
{
- if (ns->bdev_handle) {
- bdev_release(ns->bdev_handle);
+ if (ns->bdev_file) {
+ fput(ns->bdev_file);
ns->bdev = NULL;
- ns->bdev_handle = NULL;
+ ns->bdev_file = NULL;
}
}
@@ -85,18 +85,18 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
if (ns->buffered_io)
return -ENOTBLK;
- ns->bdev_handle = bdev_open_by_path(ns->device_path,
+ ns->bdev_file = bdev_file_open_by_path(ns->device_path,
BLK_OPEN_READ | BLK_OPEN_WRITE, NULL, NULL);
- if (IS_ERR(ns->bdev_handle)) {
- ret = PTR_ERR(ns->bdev_handle);
+ if (IS_ERR(ns->bdev_file)) {
+ ret = PTR_ERR(ns->bdev_file);
if (ret != -ENOTBLK) {
pr_err("failed to open block device %s: (%d)\n",
ns->device_path, ret);
}
- ns->bdev_handle = NULL;
+ ns->bdev_file = NULL;
return ret;
}
- ns->bdev = ns->bdev_handle->bdev;
+ ns->bdev = file_bdev(ns->bdev_file);
ns->size = bdev_nr_bytes(ns->bdev);
ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 7c6e7e6..f460728 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -58,7 +58,7 @@
struct nvmet_ns {
struct percpu_ref ref;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct block_device *bdev;
struct file *file;
bool readonly;
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index 980123f..eb357ac 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -460,8 +460,9 @@ static int nvmem_populate_sysfs_cells(struct nvmem_device *nvmem)
list_for_each_entry(entry, &nvmem->cells, node) {
sysfs_bin_attr_init(&attrs[i]);
attrs[i].attr.name = devm_kasprintf(&nvmem->dev, GFP_KERNEL,
- "%s@%x", entry->name,
- entry->offset);
+ "%s@%x,%x", entry->name,
+ entry->offset,
+ entry->bit_offset);
attrs[i].attr.mode = 0444;
attrs[i].size = entry->bytes;
attrs[i].read = &nvmem_cell_attr_read;
diff --git a/drivers/of/property.c b/drivers/of/property.c
index 641a40c..fa8cd33b 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -763,7 +763,9 @@ struct device_node *of_graph_get_port_parent(struct device_node *node)
/* Walk 3 levels up only if there is 'ports' node. */
for (depth = 3; depth && node; depth--) {
node = of_get_next_parent(node);
- if (depth == 2 && !of_node_name_eq(node, "ports"))
+ if (depth == 2 && !of_node_name_eq(node, "ports") &&
+ !of_node_name_eq(node, "in-ports") &&
+ !of_node_name_eq(node, "out-ports"))
break;
}
return node;
@@ -1063,36 +1065,6 @@ of_fwnode_device_get_match_data(const struct fwnode_handle *fwnode,
return of_device_get_match_data(dev);
}
-static struct device_node *of_get_compat_node(struct device_node *np)
-{
- of_node_get(np);
-
- while (np) {
- if (!of_device_is_available(np)) {
- of_node_put(np);
- np = NULL;
- }
-
- if (of_property_present(np, "compatible"))
- break;
-
- np = of_get_next_parent(np);
- }
-
- return np;
-}
-
-static struct device_node *of_get_compat_node_parent(struct device_node *np)
-{
- struct device_node *parent, *node;
-
- parent = of_get_parent(np);
- node = of_get_compat_node(parent);
- of_node_put(parent);
-
- return node;
-}
-
static void of_link_to_phandle(struct device_node *con_np,
struct device_node *sup_np)
{
@@ -1222,10 +1194,10 @@ static struct device_node *parse_##fname(struct device_node *np, \
* parse_prop.prop_name: Name of property holding a phandle value
* parse_prop.index: For properties holding a list of phandles, this is the
* index into the list
+ * @get_con_dev: If the consumer node containing the property is never converted
+ * to a struct device, implement this ops so fw_devlink can use it
+ * to find the true consumer.
* @optional: Describes whether a supplier is mandatory or not
- * @node_not_dev: The consumer node containing the property is never converted
- * to a struct device. Instead, parse ancestor nodes for the
- * compatible property to find a node corresponding to a device.
*
* Returns:
* parse_prop() return values are
@@ -1236,15 +1208,15 @@ static struct device_node *parse_##fname(struct device_node *np, \
struct supplier_bindings {
struct device_node *(*parse_prop)(struct device_node *np,
const char *prop_name, int index);
+ struct device_node *(*get_con_dev)(struct device_node *np);
bool optional;
- bool node_not_dev;
};
DEFINE_SIMPLE_PROP(clocks, "clocks", "#clock-cells")
DEFINE_SIMPLE_PROP(interconnects, "interconnects", "#interconnect-cells")
DEFINE_SIMPLE_PROP(iommus, "iommus", "#iommu-cells")
DEFINE_SIMPLE_PROP(mboxes, "mboxes", "#mbox-cells")
-DEFINE_SIMPLE_PROP(io_channels, "io-channel", "#io-channel-cells")
+DEFINE_SIMPLE_PROP(io_channels, "io-channels", "#io-channel-cells")
DEFINE_SIMPLE_PROP(interrupt_parent, "interrupt-parent", NULL)
DEFINE_SIMPLE_PROP(dmas, "dmas", "#dma-cells")
DEFINE_SIMPLE_PROP(power_domains, "power-domains", "#power-domain-cells")
@@ -1262,7 +1234,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
-DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
DEFINE_SIMPLE_PROP(leds, "leds", NULL)
@@ -1328,6 +1299,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
}
+static struct device_node *parse_remote_endpoint(struct device_node *np,
+ const char *prop_name,
+ int index)
+{
+ /* Return NULL for index > 0 to signify end of remote-endpoints. */
+ if (index > 0 || strcmp(prop_name, "remote-endpoint"))
+ return NULL;
+
+ return of_graph_get_remote_port_parent(np);
+}
+
static const struct supplier_bindings of_supplier_bindings[] = {
{ .parse_prop = parse_clocks, },
{ .parse_prop = parse_interconnects, },
@@ -1352,7 +1334,10 @@ static const struct supplier_bindings of_supplier_bindings[] = {
{ .parse_prop = parse_pinctrl6, },
{ .parse_prop = parse_pinctrl7, },
{ .parse_prop = parse_pinctrl8, },
- { .parse_prop = parse_remote_endpoint, .node_not_dev = true, },
+ {
+ .parse_prop = parse_remote_endpoint,
+ .get_con_dev = of_graph_get_port_parent,
+ },
{ .parse_prop = parse_pwms, },
{ .parse_prop = parse_resets, },
{ .parse_prop = parse_leds, },
@@ -1403,8 +1388,8 @@ static int of_link_property(struct device_node *con_np, const char *prop_name)
while ((phandle = s->parse_prop(con_np, prop_name, i))) {
struct device_node *con_dev_np;
- con_dev_np = s->node_not_dev
- ? of_get_compat_node_parent(con_np)
+ con_dev_np = s->get_con_dev
+ ? s->get_con_dev(con_np)
: of_node_get(con_np);
matched = true;
i++;
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index cfd60e3..d7593bd 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -50,6 +50,12 @@ static struct unittest_results {
failed; \
})
+#ifdef CONFIG_OF_KOBJ
+#define OF_KREF_READ(NODE) kref_read(&(NODE)->kobj.kref)
+#else
+#define OF_KREF_READ(NODE) 1
+#endif
+
/*
* Expected message may have a message level other than KERN_INFO.
* Print the expected message only if the current loglevel will allow
@@ -570,7 +576,7 @@ static void __init of_unittest_parse_phandle_with_args_map(void)
pr_err("missing testcase data\n");
return;
}
- prefs[i] = kref_read(&p[i]->kobj.kref);
+ prefs[i] = OF_KREF_READ(p[i]);
}
rc = of_count_phandle_with_args(np, "phandle-list", "#phandle-cells");
@@ -693,9 +699,9 @@ static void __init of_unittest_parse_phandle_with_args_map(void)
unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
for (i = 0; i < ARRAY_SIZE(p); ++i) {
- unittest(prefs[i] == kref_read(&p[i]->kobj.kref),
+ unittest(prefs[i] == OF_KREF_READ(p[i]),
"provider%d: expected:%d got:%d\n",
- i, prefs[i], kref_read(&p[i]->kobj.kref));
+ i, prefs[i], OF_KREF_READ(p[i]));
of_node_put(p[i]);
}
}
diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 5befed2..9a437cf 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -6,6 +6,7 @@
* Author: Kishon Vijay Abraham I <kishon@ti.com>
*/
+#include <linux/align.h>
#include <linux/bitfield.h>
#include <linux/of.h>
#include <linux/platform_device.h>
@@ -482,9 +483,10 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
reg = ep_func->msi_cap + PCI_MSI_DATA_32;
msg_data = dw_pcie_ep_readw_dbi(ep, func_no, reg);
}
- aligned_offset = msg_addr_lower & (epc->mem->window.page_size - 1);
- msg_addr = ((u64)msg_addr_upper) << 32 |
- (msg_addr_lower & ~aligned_offset);
+ msg_addr = ((u64)msg_addr_upper) << 32 | msg_addr_lower;
+
+ aligned_offset = msg_addr & (epc->mem->window.page_size - 1);
+ msg_addr = ALIGN_DOWN(msg_addr, epc->mem->window.page_size);
ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
epc->mem->window.page_size);
if (ret)
@@ -551,7 +553,7 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
}
aligned_offset = msg_addr & (epc->mem->window.page_size - 1);
- msg_addr &= ~aligned_offset;
+ msg_addr = ALIGN_DOWN(msg_addr, epc->mem->window.page_size);
ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
epc->mem->window.page_size);
if (ret)
diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index c8be056..cfd84a8 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -61,7 +61,7 @@ static irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
return (irq_hw_number_t)desc->msi_index |
pci_dev_id(dev) << 11 |
- (pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
+ ((irq_hw_number_t)(pci_domain_nr(dev->bus) & 0xFFFFFFFF)) << 27;
}
static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9ab9b10..c358522 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2522,29 +2522,36 @@ static void pci_pme_list_scan(struct work_struct *work)
if (pdev->pme_poll) {
struct pci_dev *bridge = pdev->bus->self;
struct device *dev = &pdev->dev;
- int pm_status;
+ struct device *bdev = bridge ? &bridge->dev : NULL;
+ int bref = 0;
/*
- * If bridge is in low power state, the
- * configuration space of subordinate devices
- * may be not accessible
+ * If we have a bridge, it should be in an active/D0
+ * state or the configuration space of subordinate
+ * devices may not be accessible or stable over the
+ * course of the call.
*/
- if (bridge && bridge->current_state != PCI_D0)
- continue;
+ if (bdev) {
+ bref = pm_runtime_get_if_active(bdev, true);
+ if (!bref)
+ continue;
+
+ if (bridge->current_state != PCI_D0)
+ goto put_bridge;
+ }
/*
- * If the device is in a low power state it
- * should not be polled either.
+ * The device itself should be suspended but config
+ * space must be accessible, therefore it cannot be in
+ * D3cold.
*/
- pm_status = pm_runtime_get_if_active(dev, true);
- if (!pm_status)
- continue;
-
- if (pdev->current_state != PCI_D3cold)
+ if (pm_runtime_suspended(dev) &&
+ pdev->current_state != PCI_D3cold)
pci_pme_wakeup(pdev, NULL);
- if (pm_status > 0)
- pm_runtime_put(dev);
+put_bridge:
+ if (bref > 0)
+ pm_runtime_put(bdev);
} else {
list_del(&pme_dev->list);
kfree(pme_dev);
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index c584165..7e3aa7e 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -2305,6 +2305,17 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
dev_dbg(cmn->dev, "ignoring external node %llx\n", reg);
continue;
}
+ /*
+ * AmpereOneX erratum AC04_MESH_1 makes some XPs report a bogus
+ * child count larger than the number of valid child pointers.
+ * A child offset of 0 can only occur on CMN-600; otherwise it
+ * would imply the root node being its own grandchild, which
+ * we can safely dismiss in general.
+ */
+ if (reg == 0 && cmn->part != PART_CMN600) {
+ dev_dbg(cmn->dev, "bogus child pointer?\n");
+ continue;
+ }
arm_cmn_init_node_info(cmn, reg & CMN_CHILD_NODE_ADDR, dn);
diff --git a/drivers/perf/cxl_pmu.c b/drivers/perf/cxl_pmu.c
index 365d964..308c996 100644
--- a/drivers/perf/cxl_pmu.c
+++ b/drivers/perf/cxl_pmu.c
@@ -59,7 +59,7 @@
#define CXL_PMU_COUNTER_CFG_EVENT_GRP_ID_IDX_MSK GENMASK_ULL(63, 59)
#define CXL_PMU_FILTER_CFG_REG(n, f) (0x400 + 4 * ((f) + (n) * 8))
-#define CXL_PMU_FILTER_CFG_VALUE_MSK GENMASK(15, 0)
+#define CXL_PMU_FILTER_CFG_VALUE_MSK GENMASK(31, 0)
#define CXL_PMU_COUNTER_REG(n) (0xc00 + 8 * (n))
@@ -314,9 +314,9 @@ static bool cxl_pmu_config1_get_edge(struct perf_event *event)
}
/*
- * CPMU specification allows for 8 filters, each with a 16 bit value...
- * So we need to find 8x16bits to store it in.
- * As the value used for disable is 0xffff, a separate enable switch
+ * CPMU specification allows for 8 filters, each with a 32 bit value...
+ * So we need to find 8x32bits to store it in.
+ * As the value used for disable is 0xffff_ffff, a separate enable switch
* is needed.
*/
@@ -419,7 +419,7 @@ static struct attribute *cxl_pmu_event_attrs[] = {
CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_cmp, CXL_PMU_GID_S2M_NDR, BIT(0)),
CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_cmps, CXL_PMU_GID_S2M_NDR, BIT(1)),
CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_cmpe, CXL_PMU_GID_S2M_NDR, BIT(2)),
- CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_biconflictack, CXL_PMU_GID_S2M_NDR, BIT(3)),
+ CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_biconflictack, CXL_PMU_GID_S2M_NDR, BIT(4)),
/* CXL rev 3.0 Table 3-46 S2M DRS opcodes */
CXL_PMU_EVENT_CXL_ATTR(s2m_drs_memdata, CXL_PMU_GID_S2M_DRS, BIT(0)),
CXL_PMU_EVENT_CXL_ATTR(s2m_drs_memdatanxm, CXL_PMU_GID_S2M_DRS, BIT(1)),
@@ -642,7 +642,7 @@ static void cxl_pmu_event_start(struct perf_event *event, int flags)
if (cxl_pmu_config1_hdm_filter_en(event))
cfg = cxl_pmu_config2_get_hdm_decoder(event);
else
- cfg = GENMASK(15, 0); /* No filtering if 0xFFFF_FFFF */
+ cfg = GENMASK(31, 0); /* No filtering if 0xFFFF_FFFF */
writeq(cfg, base + CXL_PMU_FILTER_CFG_REG(hwc->idx, 0));
}
diff --git a/drivers/perf/riscv_pmu.c b/drivers/perf/riscv_pmu.c
index 0dda70e1..c78a6fd 100644
--- a/drivers/perf/riscv_pmu.c
+++ b/drivers/perf/riscv_pmu.c
@@ -150,19 +150,11 @@ u64 riscv_pmu_ctr_get_width_mask(struct perf_event *event)
struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
- if (!rvpmu->ctr_get_width)
- /**
- * If the pmu driver doesn't support counter width, set it to default
- * maximum allowed by the specification.
- */
- cwidth = 63;
- else {
- if (hwc->idx == -1)
- /* Handle init case where idx is not initialized yet */
- cwidth = rvpmu->ctr_get_width(0);
- else
- cwidth = rvpmu->ctr_get_width(hwc->idx);
- }
+ if (hwc->idx == -1)
+ /* Handle init case where idx is not initialized yet */
+ cwidth = rvpmu->ctr_get_width(0);
+ else
+ cwidth = rvpmu->ctr_get_width(hwc->idx);
return GENMASK_ULL(cwidth, 0);
}
diff --git a/drivers/perf/riscv_pmu_legacy.c b/drivers/perf/riscv_pmu_legacy.c
index 79fdd66..fa0bccf 100644
--- a/drivers/perf/riscv_pmu_legacy.c
+++ b/drivers/perf/riscv_pmu_legacy.c
@@ -37,6 +37,12 @@ static int pmu_legacy_event_map(struct perf_event *event, u64 *config)
return pmu_legacy_ctr_get_idx(event);
}
+/* cycle & instret are always 64 bit, one bit less according to SBI spec */
+static int pmu_legacy_ctr_get_width(int idx)
+{
+ return 63;
+}
+
static u64 pmu_legacy_read_ctr(struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
@@ -111,12 +117,14 @@ static void pmu_legacy_init(struct riscv_pmu *pmu)
pmu->ctr_stop = NULL;
pmu->event_map = pmu_legacy_event_map;
pmu->ctr_get_idx = pmu_legacy_ctr_get_idx;
- pmu->ctr_get_width = NULL;
+ pmu->ctr_get_width = pmu_legacy_ctr_get_width;
pmu->ctr_clear_idx = NULL;
pmu->ctr_read = pmu_legacy_read_ctr;
pmu->event_mapped = pmu_legacy_event_mapped;
pmu->event_unmapped = pmu_legacy_event_unmapped;
pmu->csr_index = pmu_legacy_csr_index;
+ pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
+ pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
perf_pmu_register(&pmu->pmu, "cpu", PERF_TYPE_RAW);
}
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 16acd4d..452aab4 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -512,7 +512,7 @@ static void pmu_sbi_set_scounteren(void *arg)
if (event->hw.idx != -1)
csr_write(CSR_SCOUNTEREN,
- csr_read(CSR_SCOUNTEREN) | (1 << pmu_sbi_csr_index(event)));
+ csr_read(CSR_SCOUNTEREN) | BIT(pmu_sbi_csr_index(event)));
}
static void pmu_sbi_reset_scounteren(void *arg)
@@ -521,7 +521,7 @@ static void pmu_sbi_reset_scounteren(void *arg)
if (event->hw.idx != -1)
csr_write(CSR_SCOUNTEREN,
- csr_read(CSR_SCOUNTEREN) & ~(1 << pmu_sbi_csr_index(event)));
+ csr_read(CSR_SCOUNTEREN) & ~BIT(pmu_sbi_csr_index(event)));
}
static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
@@ -731,14 +731,14 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
/* compute hardware counter index */
hidx = info->csr - CSR_CYCLE;
/* check if the corresponding bit is set in sscountovf */
- if (!(overflow & (1 << hidx)))
+ if (!(overflow & BIT(hidx)))
continue;
/*
* Keep a track of overflowed counters so that they can be started
* with updated initial value.
*/
- overflowed_ctrs |= 1 << lidx;
+ overflowed_ctrs |= BIT(lidx);
hw_evt = &event->hw;
riscv_pmu_event_update(event);
perf_sample_data_init(&data, 0, hw_evt->last_period);
diff --git a/drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c b/drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c
index e625b32..0928a52 100644
--- a/drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c
+++ b/drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c
@@ -706,7 +706,7 @@ static int mixel_dphy_probe(struct platform_device *pdev)
return ret;
}
- priv->id = of_alias_get_id(np, "mipi_dphy");
+ priv->id = of_alias_get_id(np, "mipi-dphy");
if (priv->id < 0) {
dev_err(dev, "Failed to get phy node alias id: %d\n",
priv->id);
diff --git a/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c b/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c
index a623f09..a43e20a 100644
--- a/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c
+++ b/drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c
@@ -37,56 +37,28 @@
#define EUSB2_TUNE_EUSB_EQU 0x5A
#define EUSB2_TUNE_EUSB_HS_COMP_CUR 0x5B
-#define QCOM_EUSB2_REPEATER_INIT_CFG(r, v) \
- { \
- .reg = r, \
- .val = v, \
- }
+enum eusb2_reg_layout {
+ TUNE_EUSB_HS_COMP_CUR,
+ TUNE_EUSB_EQU,
+ TUNE_EUSB_SLEW,
+ TUNE_USB2_HS_COMP_CUR,
+ TUNE_USB2_PREEM,
+ TUNE_USB2_EQU,
+ TUNE_USB2_SLEW,
+ TUNE_SQUELCH_U,
+ TUNE_HSDISC,
+ TUNE_RES_FSDIF,
+ TUNE_IUSB2,
+ TUNE_USB2_CROSSOVER,
+ NUM_TUNE_FIELDS,
-enum reg_fields {
- F_TUNE_EUSB_HS_COMP_CUR,
- F_TUNE_EUSB_EQU,
- F_TUNE_EUSB_SLEW,
- F_TUNE_USB2_HS_COMP_CUR,
- F_TUNE_USB2_PREEM,
- F_TUNE_USB2_EQU,
- F_TUNE_USB2_SLEW,
- F_TUNE_SQUELCH_U,
- F_TUNE_HSDISC,
- F_TUNE_RES_FSDIF,
- F_TUNE_IUSB2,
- F_TUNE_USB2_CROSSOVER,
- F_NUM_TUNE_FIELDS,
+ FORCE_VAL_5 = NUM_TUNE_FIELDS,
+ FORCE_EN_5,
- F_FORCE_VAL_5 = F_NUM_TUNE_FIELDS,
- F_FORCE_EN_5,
+ EN_CTL1,
- F_EN_CTL1,
-
- F_RPTR_STATUS,
- F_NUM_FIELDS,
-};
-
-static struct reg_field eusb2_repeater_tune_reg_fields[F_NUM_FIELDS] = {
- [F_TUNE_EUSB_HS_COMP_CUR] = REG_FIELD(EUSB2_TUNE_EUSB_HS_COMP_CUR, 0, 1),
- [F_TUNE_EUSB_EQU] = REG_FIELD(EUSB2_TUNE_EUSB_EQU, 0, 1),
- [F_TUNE_EUSB_SLEW] = REG_FIELD(EUSB2_TUNE_EUSB_SLEW, 0, 1),
- [F_TUNE_USB2_HS_COMP_CUR] = REG_FIELD(EUSB2_TUNE_USB2_HS_COMP_CUR, 0, 1),
- [F_TUNE_USB2_PREEM] = REG_FIELD(EUSB2_TUNE_USB2_PREEM, 0, 2),
- [F_TUNE_USB2_EQU] = REG_FIELD(EUSB2_TUNE_USB2_EQU, 0, 1),
- [F_TUNE_USB2_SLEW] = REG_FIELD(EUSB2_TUNE_USB2_SLEW, 0, 1),
- [F_TUNE_SQUELCH_U] = REG_FIELD(EUSB2_TUNE_SQUELCH_U, 0, 2),
- [F_TUNE_HSDISC] = REG_FIELD(EUSB2_TUNE_HSDISC, 0, 2),
- [F_TUNE_RES_FSDIF] = REG_FIELD(EUSB2_TUNE_RES_FSDIF, 0, 2),
- [F_TUNE_IUSB2] = REG_FIELD(EUSB2_TUNE_IUSB2, 0, 3),
- [F_TUNE_USB2_CROSSOVER] = REG_FIELD(EUSB2_TUNE_USB2_CROSSOVER, 0, 2),
-
- [F_FORCE_VAL_5] = REG_FIELD(EUSB2_FORCE_VAL_5, 0, 7),
- [F_FORCE_EN_5] = REG_FIELD(EUSB2_FORCE_EN_5, 0, 7),
-
- [F_EN_CTL1] = REG_FIELD(EUSB2_EN_CTL1, 0, 7),
-
- [F_RPTR_STATUS] = REG_FIELD(EUSB2_RPTR_STATUS, 0, 7),
+ RPTR_STATUS,
+ LAYOUT_SIZE,
};
struct eusb2_repeater_cfg {
@@ -98,10 +70,11 @@ struct eusb2_repeater_cfg {
struct eusb2_repeater {
struct device *dev;
- struct regmap_field *regs[F_NUM_FIELDS];
+ struct regmap *regmap;
struct phy *phy;
struct regulator_bulk_data *vregs;
const struct eusb2_repeater_cfg *cfg;
+ u32 base;
enum phy_mode mode;
};
@@ -109,10 +82,10 @@ static const char * const pm8550b_vreg_l[] = {
"vdd18", "vdd3",
};
-static const u32 pm8550b_init_tbl[F_NUM_TUNE_FIELDS] = {
- [F_TUNE_IUSB2] = 0x8,
- [F_TUNE_SQUELCH_U] = 0x3,
- [F_TUNE_USB2_PREEM] = 0x5,
+static const u32 pm8550b_init_tbl[NUM_TUNE_FIELDS] = {
+ [TUNE_IUSB2] = 0x8,
+ [TUNE_SQUELCH_U] = 0x3,
+ [TUNE_USB2_PREEM] = 0x5,
};
static const struct eusb2_repeater_cfg pm8550b_eusb2_cfg = {
@@ -140,47 +113,42 @@ static int eusb2_repeater_init_vregs(struct eusb2_repeater *rptr)
static int eusb2_repeater_init(struct phy *phy)
{
- struct reg_field *regfields = eusb2_repeater_tune_reg_fields;
struct eusb2_repeater *rptr = phy_get_drvdata(phy);
struct device_node *np = rptr->dev->of_node;
- u32 init_tbl[F_NUM_TUNE_FIELDS] = { 0 };
- u8 override;
+ struct regmap *regmap = rptr->regmap;
+ const u32 *init_tbl = rptr->cfg->init_tbl;
+ u8 tune_usb2_preem = init_tbl[TUNE_USB2_PREEM];
+ u8 tune_hsdisc = init_tbl[TUNE_HSDISC];
+ u8 tune_iusb2 = init_tbl[TUNE_IUSB2];
+ u32 base = rptr->base;
u32 val;
int ret;
- int i;
+
+ of_property_read_u8(np, "qcom,tune-usb2-amplitude", &tune_iusb2);
+ of_property_read_u8(np, "qcom,tune-usb2-disc-thres", &tune_hsdisc);
+ of_property_read_u8(np, "qcom,tune-usb2-preem", &tune_usb2_preem);
ret = regulator_bulk_enable(rptr->cfg->num_vregs, rptr->vregs);
if (ret)
return ret;
- regmap_field_update_bits(rptr->regs[F_EN_CTL1], EUSB2_RPTR_EN, EUSB2_RPTR_EN);
+ regmap_write(regmap, base + EUSB2_EN_CTL1, EUSB2_RPTR_EN);
- for (i = 0; i < F_NUM_TUNE_FIELDS; i++) {
- if (init_tbl[i]) {
- regmap_field_update_bits(rptr->regs[i], init_tbl[i], init_tbl[i]);
- } else {
- /* Write 0 if there's no value set */
- u32 mask = GENMASK(regfields[i].msb, regfields[i].lsb);
+ regmap_write(regmap, base + EUSB2_TUNE_EUSB_HS_COMP_CUR, init_tbl[TUNE_EUSB_HS_COMP_CUR]);
+ regmap_write(regmap, base + EUSB2_TUNE_EUSB_EQU, init_tbl[TUNE_EUSB_EQU]);
+ regmap_write(regmap, base + EUSB2_TUNE_EUSB_SLEW, init_tbl[TUNE_EUSB_SLEW]);
+ regmap_write(regmap, base + EUSB2_TUNE_USB2_HS_COMP_CUR, init_tbl[TUNE_USB2_HS_COMP_CUR]);
+ regmap_write(regmap, base + EUSB2_TUNE_USB2_EQU, init_tbl[TUNE_USB2_EQU]);
+ regmap_write(regmap, base + EUSB2_TUNE_USB2_SLEW, init_tbl[TUNE_USB2_SLEW]);
+ regmap_write(regmap, base + EUSB2_TUNE_SQUELCH_U, init_tbl[TUNE_SQUELCH_U]);
+ regmap_write(regmap, base + EUSB2_TUNE_RES_FSDIF, init_tbl[TUNE_RES_FSDIF]);
+ regmap_write(regmap, base + EUSB2_TUNE_USB2_CROSSOVER, init_tbl[TUNE_USB2_CROSSOVER]);
- regmap_field_update_bits(rptr->regs[i], mask, 0);
- }
- }
- memcpy(init_tbl, rptr->cfg->init_tbl, sizeof(init_tbl));
+ regmap_write(regmap, base + EUSB2_TUNE_USB2_PREEM, tune_usb2_preem);
+ regmap_write(regmap, base + EUSB2_TUNE_HSDISC, tune_hsdisc);
+ regmap_write(regmap, base + EUSB2_TUNE_IUSB2, tune_iusb2);
- if (!of_property_read_u8(np, "qcom,tune-usb2-amplitude", &override))
- init_tbl[F_TUNE_IUSB2] = override;
-
- if (!of_property_read_u8(np, "qcom,tune-usb2-disc-thres", &override))
- init_tbl[F_TUNE_HSDISC] = override;
-
- if (!of_property_read_u8(np, "qcom,tune-usb2-preem", &override))
- init_tbl[F_TUNE_USB2_PREEM] = override;
-
- for (i = 0; i < F_NUM_TUNE_FIELDS; i++)
- regmap_field_update_bits(rptr->regs[i], init_tbl[i], init_tbl[i]);
-
- ret = regmap_field_read_poll_timeout(rptr->regs[F_RPTR_STATUS],
- val, val & RPTR_OK, 10, 5);
+ ret = regmap_read_poll_timeout(regmap, base + EUSB2_RPTR_STATUS, val, val & RPTR_OK, 10, 5);
if (ret)
dev_err(rptr->dev, "initialization timed-out\n");
@@ -191,6 +159,8 @@ static int eusb2_repeater_set_mode(struct phy *phy,
enum phy_mode mode, int submode)
{
struct eusb2_repeater *rptr = phy_get_drvdata(phy);
+ struct regmap *regmap = rptr->regmap;
+ u32 base = rptr->base;
switch (mode) {
case PHY_MODE_USB_HOST:
@@ -199,10 +169,8 @@ static int eusb2_repeater_set_mode(struct phy *phy,
* per eUSB 1.2 Spec. Below implement software workaround until
* PHY and controller is fixing seen observation.
*/
- regmap_field_update_bits(rptr->regs[F_FORCE_EN_5],
- F_CLK_19P2M_EN, F_CLK_19P2M_EN);
- regmap_field_update_bits(rptr->regs[F_FORCE_VAL_5],
- V_CLK_19P2M_EN, V_CLK_19P2M_EN);
+ regmap_write(regmap, base + EUSB2_FORCE_EN_5, F_CLK_19P2M_EN);
+ regmap_write(regmap, base + EUSB2_FORCE_VAL_5, V_CLK_19P2M_EN);
break;
case PHY_MODE_USB_DEVICE:
/*
@@ -211,10 +179,8 @@ static int eusb2_repeater_set_mode(struct phy *phy,
* repeater doesn't clear previous value due to shared
* regulators (say host <-> device mode switch).
*/
- regmap_field_update_bits(rptr->regs[F_FORCE_EN_5],
- F_CLK_19P2M_EN, 0);
- regmap_field_update_bits(rptr->regs[F_FORCE_VAL_5],
- V_CLK_19P2M_EN, 0);
+ regmap_write(regmap, base + EUSB2_FORCE_EN_5, 0);
+ regmap_write(regmap, base + EUSB2_FORCE_VAL_5, 0);
break;
default:
return -EINVAL;
@@ -243,9 +209,8 @@ static int eusb2_repeater_probe(struct platform_device *pdev)
struct device *dev = &pdev->dev;
struct phy_provider *phy_provider;
struct device_node *np = dev->of_node;
- struct regmap *regmap;
- int i, ret;
u32 res;
+ int ret;
rptr = devm_kzalloc(dev, sizeof(*rptr), GFP_KERNEL);
if (!rptr)
@@ -258,22 +223,15 @@ static int eusb2_repeater_probe(struct platform_device *pdev)
if (!rptr->cfg)
return -EINVAL;
- regmap = dev_get_regmap(dev->parent, NULL);
- if (!regmap)
+ rptr->regmap = dev_get_regmap(dev->parent, NULL);
+ if (!rptr->regmap)
return -ENODEV;
ret = of_property_read_u32(np, "reg", &res);
if (ret < 0)
return ret;
- for (i = 0; i < F_NUM_FIELDS; i++)
- eusb2_repeater_tune_reg_fields[i].reg += res;
-
- ret = devm_regmap_field_bulk_alloc(dev, regmap, rptr->regs,
- eusb2_repeater_tune_reg_fields,
- F_NUM_FIELDS);
- if (ret)
- return ret;
+ rptr->base = res;
ret = eusb2_repeater_init_vregs(rptr);
if (ret < 0) {
diff --git a/drivers/phy/qualcomm/phy-qcom-m31.c b/drivers/phy/qualcomm/phy-qcom-m31.c
index c259057..03fb0d4 100644
--- a/drivers/phy/qualcomm/phy-qcom-m31.c
+++ b/drivers/phy/qualcomm/phy-qcom-m31.c
@@ -299,7 +299,7 @@ static int m31usb_phy_probe(struct platform_device *pdev)
qphy->vreg = devm_regulator_get(dev, "vdda-phy");
if (IS_ERR(qphy->vreg))
- return dev_err_probe(dev, PTR_ERR(qphy->phy),
+ return dev_err_probe(dev, PTR_ERR(qphy->vreg),
"failed to get vreg\n");
phy_set_drvdata(qphy->phy, qphy);
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
index 1ad1011..17c4ad7 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-combo.c
@@ -3562,14 +3562,6 @@ static int qmp_combo_probe(struct platform_device *pdev)
if (ret)
return ret;
- ret = qmp_combo_typec_switch_register(qmp);
- if (ret)
- return ret;
-
- ret = drm_aux_bridge_register(dev);
- if (ret)
- return ret;
-
/* Check for legacy binding with child nodes. */
usb_np = of_get_child_by_name(dev->of_node, "usb3-phy");
if (usb_np) {
@@ -3589,6 +3581,14 @@ static int qmp_combo_probe(struct platform_device *pdev)
if (ret)
goto err_node_put;
+ ret = qmp_combo_typec_switch_register(qmp);
+ if (ret)
+ goto err_node_put;
+
+ ret = drm_aux_bridge_register(dev);
+ if (ret)
+ goto err_node_put;
+
pm_runtime_set_active(dev);
ret = devm_pm_runtime_enable(dev);
if (ret)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp-usb.c b/drivers/phy/qualcomm/phy-qcom-qmp-usb.c
index 6621246..5c00398 100644
--- a/drivers/phy/qualcomm/phy-qcom-qmp-usb.c
+++ b/drivers/phy/qualcomm/phy-qcom-qmp-usb.c
@@ -1556,7 +1556,7 @@ static const char * const qmp_phy_vreg_l[] = {
"vdda-phy", "vdda-pll",
};
-static const struct qmp_usb_offsets qmp_usb_offsets_ipq8074 = {
+static const struct qmp_usb_offsets qmp_usb_offsets_v3 = {
.serdes = 0,
.pcs = 0x800,
.pcs_misc = 0x600,
@@ -1572,7 +1572,7 @@ static const struct qmp_usb_offsets qmp_usb_offsets_ipq9574 = {
.rx = 0x400,
};
-static const struct qmp_usb_offsets qmp_usb_offsets_v3 = {
+static const struct qmp_usb_offsets qmp_usb_offsets_v3_msm8996 = {
.serdes = 0,
.pcs = 0x600,
.tx = 0x200,
@@ -1624,7 +1624,7 @@ static const struct qmp_usb_offsets qmp_usb_offsets_v7 = {
static const struct qmp_phy_cfg ipq6018_usb3phy_cfg = {
.lanes = 1,
- .offsets = &qmp_usb_offsets_ipq8074,
+ .offsets = &qmp_usb_offsets_v3,
.serdes_tbl = ipq9574_usb3_serdes_tbl,
.serdes_tbl_num = ARRAY_SIZE(ipq9574_usb3_serdes_tbl),
@@ -1642,7 +1642,7 @@ static const struct qmp_phy_cfg ipq6018_usb3phy_cfg = {
static const struct qmp_phy_cfg ipq8074_usb3phy_cfg = {
.lanes = 1,
- .offsets = &qmp_usb_offsets_ipq8074,
+ .offsets = &qmp_usb_offsets_v3,
.serdes_tbl = ipq8074_usb3_serdes_tbl,
.serdes_tbl_num = ARRAY_SIZE(ipq8074_usb3_serdes_tbl),
@@ -1678,7 +1678,7 @@ static const struct qmp_phy_cfg ipq9574_usb3phy_cfg = {
static const struct qmp_phy_cfg msm8996_usb3phy_cfg = {
.lanes = 1,
- .offsets = &qmp_usb_offsets_v3,
+ .offsets = &qmp_usb_offsets_v3_msm8996,
.serdes_tbl = msm8996_usb3_serdes_tbl,
.serdes_tbl_num = ARRAY_SIZE(msm8996_usb3_serdes_tbl),
diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c
index ee56856..bbcdece8 100644
--- a/drivers/pinctrl/core.c
+++ b/drivers/pinctrl/core.c
@@ -1644,7 +1644,7 @@ static int pinctrl_pins_show(struct seq_file *s, void *what)
const struct pinctrl_ops *ops = pctldev->desc->pctlops;
unsigned int i, pin;
#ifdef CONFIG_GPIOLIB
- struct gpio_device *gdev __free(gpio_device_put) = NULL;
+ struct gpio_device *gdev = NULL;
struct pinctrl_gpio_range *range;
int gpio_num;
#endif
diff --git a/drivers/pinctrl/pinctrl-amd.c b/drivers/pinctrl/pinctrl-amd.c
index 03ecb3d..49f89b7 100644
--- a/drivers/pinctrl/pinctrl-amd.c
+++ b/drivers/pinctrl/pinctrl-amd.c
@@ -1159,7 +1159,7 @@ static int amd_gpio_probe(struct platform_device *pdev)
}
ret = devm_request_irq(&pdev->dev, gpio_dev->irq, amd_gpio_irq_handler,
- IRQF_SHARED, KBUILD_MODNAME, gpio_dev);
+ IRQF_SHARED | IRQF_ONESHOT, KBUILD_MODNAME, gpio_dev);
if (ret)
goto out2;
diff --git a/drivers/pinctrl/stm32/pinctrl-stm32mp257.c b/drivers/pinctrl/stm32/pinctrl-stm32mp257.c
index 73f091c..23aebd4 100644
--- a/drivers/pinctrl/stm32/pinctrl-stm32mp257.c
+++ b/drivers/pinctrl/stm32/pinctrl-stm32mp257.c
@@ -2562,7 +2562,7 @@ static const struct of_device_id stm32mp257_pctrl_match[] = {
};
static const struct dev_pm_ops stm32_pinctrl_dev_pm_ops = {
- SET_LATE_SYSTEM_SLEEP_PM_OPS(NULL, stm32_pinctrl_resume)
+ SET_LATE_SYSTEM_SLEEP_PM_OPS(stm32_pinctrl_suspend, stm32_pinctrl_resume)
};
static struct platform_driver stm32mp257_pinctrl_driver = {
diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c
index feaa09f..4f734e0 100644
--- a/drivers/platform/x86/amd/pmf/core.c
+++ b/drivers/platform/x86/amd/pmf/core.c
@@ -296,7 +296,8 @@ static int amd_pmf_suspend_handler(struct device *dev)
{
struct amd_pmf_dev *pdev = dev_get_drvdata(dev);
- kfree(pdev->buf);
+ if (pdev->smart_pc_enabled)
+ cancel_delayed_work_sync(&pdev->pb_work);
return 0;
}
@@ -312,6 +313,9 @@ static int amd_pmf_resume_handler(struct device *dev)
return ret;
}
+ if (pdev->smart_pc_enabled)
+ schedule_delayed_work(&pdev->pb_work, msecs_to_jiffies(2000));
+
return 0;
}
@@ -330,9 +334,14 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev)
dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n");
}
- if (!amd_pmf_init_smart_pc(dev)) {
+ amd_pmf_init_smart_pc(dev);
+ if (dev->smart_pc_enabled) {
dev_dbg(dev->dev, "Smart PC Solution Enabled\n");
- } else if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
+ /* If Smart PC is enabled, no need to check for other features */
+ return;
+ }
+
+ if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
amd_pmf_init_auto_mode(dev);
dev_dbg(dev->dev, "Auto Mode Init done\n");
} else if (is_apmf_func_supported(dev, APMF_FUNC_DYN_SLIDER_AC) ||
@@ -351,7 +360,7 @@ static void amd_pmf_deinit_features(struct amd_pmf_dev *dev)
amd_pmf_deinit_sps(dev);
}
- if (!dev->smart_pc_enabled) {
+ if (dev->smart_pc_enabled) {
amd_pmf_deinit_smart_pc(dev);
} else if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
amd_pmf_deinit_auto_mode(dev);
diff --git a/drivers/platform/x86/amd/pmf/pmf.h b/drivers/platform/x86/amd/pmf/pmf.h
index 16999c5..66cae1c 100644
--- a/drivers/platform/x86/amd/pmf/pmf.h
+++ b/drivers/platform/x86/amd/pmf/pmf.h
@@ -441,11 +441,6 @@ struct apmf_dyn_slider_output {
struct apmf_cnqf_power_set ps[APMF_CNQF_MAX];
} __packed;
-enum smart_pc_status {
- PMF_SMART_PC_ENABLED,
- PMF_SMART_PC_DISABLED,
-};
-
/* Smart PC - TA internals */
enum system_state {
SYSTEM_STATE_S0i3,
diff --git a/drivers/platform/x86/amd/pmf/tee-if.c b/drivers/platform/x86/amd/pmf/tee-if.c
index f8c0177..dcbe8f8 100644
--- a/drivers/platform/x86/amd/pmf/tee-if.c
+++ b/drivers/platform/x86/amd/pmf/tee-if.c
@@ -252,15 +252,17 @@ static int amd_pmf_start_policy_engine(struct amd_pmf_dev *dev)
cookie = readl(dev->policy_buf + POLICY_COOKIE_OFFSET);
length = readl(dev->policy_buf + POLICY_COOKIE_LEN);
- if (cookie != POLICY_SIGN_COOKIE || !length)
+ if (cookie != POLICY_SIGN_COOKIE || !length) {
+ dev_dbg(dev->dev, "cookie doesn't match\n");
return -EINVAL;
+ }
/* Update the actual length */
dev->policy_sz = length + 512;
res = amd_pmf_invoke_cmd_init(dev);
if (res == TA_PMF_TYPE_SUCCESS) {
/* Now its safe to announce that smart pc is enabled */
- dev->smart_pc_enabled = PMF_SMART_PC_ENABLED;
+ dev->smart_pc_enabled = true;
/*
* Start collecting the data from TA FW after a small delay
* or else, we might end up getting stale values.
@@ -268,7 +270,7 @@ static int amd_pmf_start_policy_engine(struct amd_pmf_dev *dev)
schedule_delayed_work(&dev->pb_work, msecs_to_jiffies(pb_actions_ms * 3));
} else {
dev_err(dev->dev, "ta invoke cmd init failed err: %x\n", res);
- dev->smart_pc_enabled = PMF_SMART_PC_DISABLED;
+ dev->smart_pc_enabled = false;
return res;
}
@@ -336,25 +338,6 @@ static void amd_pmf_remove_pb(struct amd_pmf_dev *dev) {}
static void amd_pmf_hex_dump_pb(struct amd_pmf_dev *dev) {}
#endif
-static int amd_pmf_get_bios_buffer(struct amd_pmf_dev *dev)
-{
- dev->policy_buf = kzalloc(dev->policy_sz, GFP_KERNEL);
- if (!dev->policy_buf)
- return -ENOMEM;
-
- dev->policy_base = devm_ioremap(dev->dev, dev->policy_addr, dev->policy_sz);
- if (!dev->policy_base)
- return -ENOMEM;
-
- memcpy(dev->policy_buf, dev->policy_base, dev->policy_sz);
-
- amd_pmf_hex_dump_pb(dev);
- if (pb_side_load)
- amd_pmf_open_pb(dev, dev->dbgfs_dir);
-
- return amd_pmf_start_policy_engine(dev);
-}
-
static int amd_pmf_amdtee_ta_match(struct tee_ioctl_version_data *ver, const void *data)
{
return ver->impl_id == TEE_IMPL_ID_AMDTEE;
@@ -453,22 +436,59 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
return ret;
INIT_DELAYED_WORK(&dev->pb_work, amd_pmf_invoke_cmd);
- amd_pmf_set_dram_addr(dev, true);
- amd_pmf_get_bios_buffer(dev);
- dev->prev_data = kzalloc(sizeof(*dev->prev_data), GFP_KERNEL);
- if (!dev->prev_data)
- return -ENOMEM;
- return dev->smart_pc_enabled;
+ ret = amd_pmf_set_dram_addr(dev, true);
+ if (ret)
+ goto error;
+
+ dev->policy_base = devm_ioremap(dev->dev, dev->policy_addr, dev->policy_sz);
+ if (!dev->policy_base) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ dev->policy_buf = kzalloc(dev->policy_sz, GFP_KERNEL);
+ if (!dev->policy_buf) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ memcpy(dev->policy_buf, dev->policy_base, dev->policy_sz);
+
+ amd_pmf_hex_dump_pb(dev);
+
+ dev->prev_data = kzalloc(sizeof(*dev->prev_data), GFP_KERNEL);
+ if (!dev->prev_data) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ ret = amd_pmf_start_policy_engine(dev);
+ if (ret)
+ goto error;
+
+ if (pb_side_load)
+ amd_pmf_open_pb(dev, dev->dbgfs_dir);
+
+ return 0;
+
+error:
+ amd_pmf_deinit_smart_pc(dev);
+
+ return ret;
}
void amd_pmf_deinit_smart_pc(struct amd_pmf_dev *dev)
{
- if (pb_side_load)
+ if (pb_side_load && dev->esbin)
amd_pmf_remove_pb(dev);
- kfree(dev->prev_data);
- kfree(dev->policy_buf);
cancel_delayed_work_sync(&dev->pb_work);
+ kfree(dev->prev_data);
+ dev->prev_data = NULL;
+ kfree(dev->policy_buf);
+ dev->policy_buf = NULL;
+ kfree(dev->buf);
+ dev->buf = NULL;
amd_pmf_tee_deinit(dev);
}
diff --git a/drivers/platform/x86/intel/int0002_vgpio.c b/drivers/platform/x86/intel/int0002_vgpio.c
index b6708ba..527d8fb 100644
--- a/drivers/platform/x86/intel/int0002_vgpio.c
+++ b/drivers/platform/x86/intel/int0002_vgpio.c
@@ -196,7 +196,7 @@ static int int0002_probe(struct platform_device *pdev)
* IRQs into gpiolib.
*/
ret = devm_request_irq(dev, irq, int0002_irq,
- IRQF_SHARED, "INT0002", chip);
+ IRQF_ONESHOT | IRQF_SHARED, "INT0002", chip);
if (ret) {
dev_err(dev, "Error requesting IRQ %d: %d\n", irq, ret);
return ret;
diff --git a/drivers/platform/x86/intel/vbtn.c b/drivers/platform/x86/intel/vbtn.c
index 210b0a8..084c355 100644
--- a/drivers/platform/x86/intel/vbtn.c
+++ b/drivers/platform/x86/intel/vbtn.c
@@ -200,9 +200,6 @@ static void notify_handler(acpi_handle handle, u32 event, void *context)
autorelease = val && (!ke_rel || ke_rel->type == KE_IGNORE);
sparse_keymap_report_event(input_dev, event, val, autorelease);
-
- /* Some devices need this to report further events */
- acpi_evaluate_object(handle, "VBDL", NULL, NULL);
}
/*
diff --git a/drivers/platform/x86/p2sb.c b/drivers/platform/x86/p2sb.c
index 6bd14d0..3d66e1d 100644
--- a/drivers/platform/x86/p2sb.c
+++ b/drivers/platform/x86/p2sb.c
@@ -20,9 +20,11 @@
#define P2SBC_HIDE BIT(8)
#define P2SB_DEVFN_DEFAULT PCI_DEVFN(31, 1)
+#define P2SB_DEVFN_GOLDMONT PCI_DEVFN(13, 0)
+#define SPI_DEVFN_GOLDMONT PCI_DEVFN(13, 2)
static const struct x86_cpu_id p2sb_cpu_ids[] = {
- X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, PCI_DEVFN(13, 0)),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, P2SB_DEVFN_GOLDMONT),
{}
};
@@ -98,21 +100,12 @@ static void p2sb_scan_and_cache_devfn(struct pci_bus *bus, unsigned int devfn)
static int p2sb_scan_and_cache(struct pci_bus *bus, unsigned int devfn)
{
- unsigned int slot, fn;
+ /* Scan the P2SB device and cache its BAR0 */
+ p2sb_scan_and_cache_devfn(bus, devfn);
- if (PCI_FUNC(devfn) == 0) {
- /*
- * When function number of the P2SB device is zero, scan it and
- * other function numbers, and if devices are available, cache
- * their BAR0s.
- */
- slot = PCI_SLOT(devfn);
- for (fn = 0; fn < NR_P2SB_RES_CACHE; fn++)
- p2sb_scan_and_cache_devfn(bus, PCI_DEVFN(slot, fn));
- } else {
- /* Scan the P2SB device and cache its BAR0 */
- p2sb_scan_and_cache_devfn(bus, devfn);
- }
+ /* On Goldmont p2sb_bar() also gets called for the SPI controller */
+ if (devfn == P2SB_DEVFN_GOLDMONT)
+ p2sb_scan_and_cache_devfn(bus, SPI_DEVFN_GOLDMONT);
if (!p2sb_valid_resource(&p2sb_resources[PCI_FUNC(devfn)].res))
return -ENOENT;
diff --git a/drivers/platform/x86/serdev_helpers.h b/drivers/platform/x86/serdev_helpers.h
new file mode 100644
index 0000000..bcf3a0c
--- /dev/null
+++ b/drivers/platform/x86/serdev_helpers.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * In some cases UART attached devices which require an in kernel driver,
+ * e.g. UART attached Bluetooth HCIs are described in the ACPI tables
+ * by an ACPI device with a broken or missing UartSerialBusV2() resource.
+ *
+ * This causes the kernel to create a /dev/ttyS# char-device for the UART
+ * instead of creating an in kernel serdev-controller + serdev-device pair
+ * for the in kernel driver.
+ *
+ * The quirk handling in acpi_quirk_skip_serdev_enumeration() makes the kernel
+ * create a serdev-controller device for these UARTs instead of a /dev/ttyS#.
+ *
+ * Instantiating the actual serdev-device to bind to is up to pdx86 code,
+ * this header provides a helper for getting the serdev-controller device.
+ */
+#include <linux/acpi.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/printk.h>
+#include <linux/sprintf.h>
+#include <linux/string.h>
+
+static inline struct device *
+get_serdev_controller(const char *serial_ctrl_hid,
+ const char *serial_ctrl_uid,
+ int serial_ctrl_port,
+ const char *serdev_ctrl_name)
+{
+ struct device *ctrl_dev, *child;
+ struct acpi_device *ctrl_adev;
+ char name[32];
+ int i;
+
+ ctrl_adev = acpi_dev_get_first_match_dev(serial_ctrl_hid, serial_ctrl_uid, -1);
+ if (!ctrl_adev) {
+ pr_err("error could not get %s/%s serial-ctrl adev\n",
+ serial_ctrl_hid, serial_ctrl_uid);
+ return ERR_PTR(-ENODEV);
+ }
+
+ /* get_first_physical_node() returns a weak ref */
+ ctrl_dev = get_device(acpi_get_first_physical_node(ctrl_adev));
+ if (!ctrl_dev) {
+ pr_err("error could not get %s/%s serial-ctrl physical node\n",
+ serial_ctrl_hid, serial_ctrl_uid);
+ ctrl_dev = ERR_PTR(-ENODEV);
+ goto put_ctrl_adev;
+ }
+
+ /* Walk host -> uart-ctrl -> port -> serdev-ctrl */
+ for (i = 0; i < 3; i++) {
+ switch (i) {
+ case 0:
+ snprintf(name, sizeof(name), "%s:0", dev_name(ctrl_dev));
+ break;
+ case 1:
+ snprintf(name, sizeof(name), "%s.%d",
+ dev_name(ctrl_dev), serial_ctrl_port);
+ break;
+ case 2:
+ strscpy(name, serdev_ctrl_name, sizeof(name));
+ break;
+ }
+
+ child = device_find_child_by_name(ctrl_dev, name);
+ put_device(ctrl_dev);
+ if (!child) {
+ pr_err("error could not find '%s' device\n", name);
+ ctrl_dev = ERR_PTR(-ENODEV);
+ goto put_ctrl_adev;
+ }
+
+ ctrl_dev = child;
+ }
+
+put_ctrl_adev:
+ acpi_dev_put(ctrl_adev);
+ return ctrl_dev;
+}
diff --git a/drivers/platform/x86/think-lmi.c b/drivers/platform/x86/think-lmi.c
index 3a396b7..ce3e088 100644
--- a/drivers/platform/x86/think-lmi.c
+++ b/drivers/platform/x86/think-lmi.c
@@ -1009,7 +1009,16 @@ static ssize_t current_value_store(struct kobject *kobj,
* Note - this sets the variable and then the password as separate
* WMI calls. Function tlmi_save_bios_settings will error if the
* password is incorrect.
+ * Workstation's require the opcode to be set before changing the
+ * attribute.
*/
+ if (tlmi_priv.pwd_admin->valid && tlmi_priv.pwd_admin->password[0]) {
+ ret = tlmi_opcode_setting("WmiOpcodePasswordAdmin",
+ tlmi_priv.pwd_admin->password);
+ if (ret)
+ goto out;
+ }
+
set_str = kasprintf(GFP_KERNEL, "%s,%s;", setting->display_name,
new_setting);
if (!set_str) {
@@ -1021,17 +1030,10 @@ static ssize_t current_value_store(struct kobject *kobj,
if (ret)
goto out;
- if (tlmi_priv.save_mode == TLMI_SAVE_BULK) {
+ if (tlmi_priv.save_mode == TLMI_SAVE_BULK)
tlmi_priv.save_required = true;
- } else {
- if (tlmi_priv.pwd_admin->valid && tlmi_priv.pwd_admin->password[0]) {
- ret = tlmi_opcode_setting("WmiOpcodePasswordAdmin",
- tlmi_priv.pwd_admin->password);
- if (ret)
- goto out;
- }
+ else
ret = tlmi_save_bios_settings("");
- }
} else { /* old non-opcode based authentication method (deprecated) */
if (tlmi_priv.pwd_admin->valid && tlmi_priv.pwd_admin->password[0]) {
auth_str = kasprintf(GFP_KERNEL, "%s,%s,%s;",
diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index c4895e9..5ecd9d3 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -10308,6 +10308,7 @@ static int convert_dytc_to_profile(int funcmode, int dytcmode,
return 0;
default:
/* Unknown function */
+ pr_debug("unknown function 0x%x\n", funcmode);
return -EOPNOTSUPP;
}
return 0;
@@ -10493,8 +10494,8 @@ static void dytc_profile_refresh(void)
return;
perfmode = (output >> DYTC_GET_MODE_BIT) & 0xF;
- convert_dytc_to_profile(funcmode, perfmode, &profile);
- if (profile != dytc_current_profile) {
+ err = convert_dytc_to_profile(funcmode, perfmode, &profile);
+ if (!err && profile != dytc_current_profile) {
dytc_current_profile = profile;
platform_profile_notify();
}
diff --git a/drivers/platform/x86/touchscreen_dmi.c b/drivers/platform/x86/touchscreen_dmi.c
index 7aee5e9..975cf24 100644
--- a/drivers/platform/x86/touchscreen_dmi.c
+++ b/drivers/platform/x86/touchscreen_dmi.c
@@ -81,7 +81,7 @@ static const struct property_entry chuwi_hi8_air_props[] = {
};
static const struct ts_dmi_data chuwi_hi8_air_data = {
- .acpi_name = "MSSL1680:00",
+ .acpi_name = "MSSL1680",
.properties = chuwi_hi8_air_props,
};
@@ -415,18 +415,13 @@ static const struct property_entry gdix1001_upside_down_props[] = {
{ }
};
-static const struct ts_dmi_data gdix1001_00_upside_down_data = {
- .acpi_name = "GDIX1001:00",
+static const struct ts_dmi_data gdix1001_upside_down_data = {
+ .acpi_name = "GDIX1001",
.properties = gdix1001_upside_down_props,
};
-static const struct ts_dmi_data gdix1001_01_upside_down_data = {
- .acpi_name = "GDIX1001:01",
- .properties = gdix1001_upside_down_props,
-};
-
-static const struct ts_dmi_data gdix1002_00_upside_down_data = {
- .acpi_name = "GDIX1002:00",
+static const struct ts_dmi_data gdix1002_upside_down_data = {
+ .acpi_name = "GDIX1002",
.properties = gdix1001_upside_down_props,
};
@@ -1412,7 +1407,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
},
{
/* Juno Tablet */
- .driver_data = (void *)&gdix1002_00_upside_down_data,
+ .driver_data = (void *)&gdix1002_upside_down_data,
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "Default string"),
/* Both product- and board-name being "Default string" is somewhat rare */
@@ -1658,7 +1653,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
},
{
/* Teclast X89 (Android version / BIOS) */
- .driver_data = (void *)&gdix1001_00_upside_down_data,
+ .driver_data = (void *)&gdix1001_upside_down_data,
.matches = {
DMI_MATCH(DMI_BOARD_VENDOR, "WISKY"),
DMI_MATCH(DMI_BOARD_NAME, "3G062i"),
@@ -1666,7 +1661,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
},
{
/* Teclast X89 (Windows version / BIOS) */
- .driver_data = (void *)&gdix1001_01_upside_down_data,
+ .driver_data = (void *)&gdix1001_upside_down_data,
.matches = {
/* tPAD is too generic, also match on bios date */
DMI_MATCH(DMI_BOARD_VENDOR, "TECLAST"),
@@ -1684,7 +1679,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
},
{
/* Teclast X98 Pro */
- .driver_data = (void *)&gdix1001_00_upside_down_data,
+ .driver_data = (void *)&gdix1001_upside_down_data,
.matches = {
/*
* Only match BIOS date, because the manufacturers
@@ -1788,7 +1783,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
},
{
/* "WinBook TW100" */
- .driver_data = (void *)&gdix1001_00_upside_down_data,
+ .driver_data = (void *)&gdix1001_upside_down_data,
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "WinBook"),
DMI_MATCH(DMI_PRODUCT_NAME, "TW100")
@@ -1796,7 +1791,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
},
{
/* WinBook TW700 */
- .driver_data = (void *)&gdix1001_00_upside_down_data,
+ .driver_data = (void *)&gdix1001_upside_down_data,
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "WinBook"),
DMI_MATCH(DMI_PRODUCT_NAME, "TW700")
@@ -1821,7 +1816,7 @@ static void ts_dmi_add_props(struct i2c_client *client)
int error;
if (has_acpi_companion(dev) &&
- !strncmp(ts_data->acpi_name, client->name, I2C_NAME_SIZE)) {
+ strstarts(client->name, ts_data->acpi_name)) {
error = device_create_managed_software_node(dev, ts_data->properties, NULL);
if (error)
dev_err(dev, "failed to add properties: %d\n", error);
diff --git a/drivers/platform/x86/x86-android-tablets/core.c b/drivers/platform/x86/x86-android-tablets/core.c
index f8221a1..a3415f1 100644
--- a/drivers/platform/x86/x86-android-tablets/core.c
+++ b/drivers/platform/x86/x86-android-tablets/core.c
@@ -21,6 +21,7 @@
#include <linux/string.h>
#include "x86-android-tablets.h"
+#include "../serdev_helpers.h"
static struct platform_device *x86_android_tablet_device;
@@ -113,6 +114,9 @@ int x86_acpi_irq_helper_get(const struct x86_acpi_irq_data *data)
if (irq_type != IRQ_TYPE_NONE && irq_type != irq_get_trigger_type(irq))
irq_set_irq_type(irq, irq_type);
+ if (data->free_gpio)
+ devm_gpiod_put(&x86_android_tablet_device->dev, gpiod);
+
return irq;
case X86_ACPI_IRQ_TYPE_PMIC:
status = acpi_get_handle(NULL, data->chip, &handle);
@@ -229,38 +233,20 @@ static __init int x86_instantiate_spi_dev(const struct x86_dev_info *dev_info, i
static __init int x86_instantiate_serdev(const struct x86_serdev_info *info, int idx)
{
- struct acpi_device *ctrl_adev, *serdev_adev;
+ struct acpi_device *serdev_adev;
struct serdev_device *serdev;
struct device *ctrl_dev;
int ret = -ENODEV;
- ctrl_adev = acpi_dev_get_first_match_dev(info->ctrl_hid, info->ctrl_uid, -1);
- if (!ctrl_adev) {
- pr_err("error could not get %s/%s ctrl adev\n",
- info->ctrl_hid, info->ctrl_uid);
- return -ENODEV;
- }
+ ctrl_dev = get_serdev_controller(info->ctrl_hid, info->ctrl_uid, 0,
+ info->ctrl_devname);
+ if (IS_ERR(ctrl_dev))
+ return PTR_ERR(ctrl_dev);
serdev_adev = acpi_dev_get_first_match_dev(info->serdev_hid, NULL, -1);
if (!serdev_adev) {
pr_err("error could not get %s serdev adev\n", info->serdev_hid);
- goto put_ctrl_adev;
- }
-
- /* get_first_physical_node() returns a weak ref, no need to put() it */
- ctrl_dev = acpi_get_first_physical_node(ctrl_adev);
- if (!ctrl_dev) {
- pr_err("error could not get %s/%s ctrl physical dev\n",
- info->ctrl_hid, info->ctrl_uid);
- goto put_serdev_adev;
- }
-
- /* ctrl_dev now points to the controller's parent, get the controller */
- ctrl_dev = device_find_child_by_name(ctrl_dev, info->ctrl_devname);
- if (!ctrl_dev) {
- pr_err("error could not get %s/%s %s ctrl dev\n",
- info->ctrl_hid, info->ctrl_uid, info->ctrl_devname);
- goto put_serdev_adev;
+ goto put_ctrl_dev;
}
serdev = serdev_device_alloc(to_serdev_controller(ctrl_dev));
@@ -283,8 +269,8 @@ static __init int x86_instantiate_serdev(const struct x86_serdev_info *info, int
put_serdev_adev:
acpi_dev_put(serdev_adev);
-put_ctrl_adev:
- acpi_dev_put(ctrl_adev);
+put_ctrl_dev:
+ put_device(ctrl_dev);
return ret;
}
diff --git a/drivers/platform/x86/x86-android-tablets/lenovo.c b/drivers/platform/x86/x86-android-tablets/lenovo.c
index f1c66a6..c297391 100644
--- a/drivers/platform/x86/x86-android-tablets/lenovo.c
+++ b/drivers/platform/x86/x86-android-tablets/lenovo.c
@@ -116,6 +116,7 @@ static const struct x86_i2c_client_info lenovo_yb1_x90_i2c_clients[] __initconst
.trigger = ACPI_EDGE_SENSITIVE,
.polarity = ACPI_ACTIVE_LOW,
.con_id = "goodix_ts_irq",
+ .free_gpio = true,
},
}, {
/* Wacom Digitizer in keyboard half */
diff --git a/drivers/platform/x86/x86-android-tablets/other.c b/drivers/platform/x86/x86-android-tablets/other.c
index bc6bbf7..278402d 100644
--- a/drivers/platform/x86/x86-android-tablets/other.c
+++ b/drivers/platform/x86/x86-android-tablets/other.c
@@ -68,7 +68,7 @@ static const struct x86_i2c_client_info acer_b1_750_i2c_clients[] __initconst =
},
};
-static struct gpiod_lookup_table acer_b1_750_goodix_gpios = {
+static struct gpiod_lookup_table acer_b1_750_nvt_ts_gpios = {
.dev_id = "i2c-NVT-ts",
.table = {
GPIO_LOOKUP("INT33FC:01", 26, "reset", GPIO_ACTIVE_LOW),
@@ -77,7 +77,7 @@ static struct gpiod_lookup_table acer_b1_750_goodix_gpios = {
};
static struct gpiod_lookup_table * const acer_b1_750_gpios[] = {
- &acer_b1_750_goodix_gpios,
+ &acer_b1_750_nvt_ts_gpios,
&int3496_reference_gpios,
NULL
};
diff --git a/drivers/platform/x86/x86-android-tablets/x86-android-tablets.h b/drivers/platform/x86/x86-android-tablets/x86-android-tablets.h
index 49fed94..468993e 100644
--- a/drivers/platform/x86/x86-android-tablets/x86-android-tablets.h
+++ b/drivers/platform/x86/x86-android-tablets/x86-android-tablets.h
@@ -39,6 +39,7 @@ struct x86_acpi_irq_data {
int index;
int trigger; /* ACPI_EDGE_SENSITIVE / ACPI_LEVEL_SENSITIVE */
int polarity; /* ACPI_ACTIVE_HIGH / ACPI_ACTIVE_LOW / ACPI_ACTIVE_BOTH */
+ bool free_gpio; /* Release GPIO after getting IRQ (for TYPE_GPIOINT) */
const char *con_id;
};
diff --git a/drivers/pmdomain/arm/scmi_perf_domain.c b/drivers/pmdomain/arm/scmi_perf_domain.c
index 709bbc4..d7ef46c 100644
--- a/drivers/pmdomain/arm/scmi_perf_domain.c
+++ b/drivers/pmdomain/arm/scmi_perf_domain.c
@@ -159,6 +159,9 @@ static void scmi_perf_domain_remove(struct scmi_device *sdev)
struct genpd_onecell_data *scmi_pd_data = dev_get_drvdata(dev);
int i;
+ if (!scmi_pd_data)
+ return;
+
of_genpd_del_provider(dev->of_node);
for (i = 0; i < scmi_pd_data->num_domains; i++)
diff --git a/drivers/pmdomain/core.c b/drivers/pmdomain/core.c
index a1f6cba..18e232b 100644
--- a/drivers/pmdomain/core.c
+++ b/drivers/pmdomain/core.c
@@ -1109,7 +1109,7 @@ static int __init genpd_power_off_unused(void)
return 0;
}
-late_initcall(genpd_power_off_unused);
+late_initcall_sync(genpd_power_off_unused);
#ifdef CONFIG_PM_SLEEP
diff --git a/drivers/pmdomain/mediatek/mtk-pm-domains.c b/drivers/pmdomain/mediatek/mtk-pm-domains.c
index e26dc17d..e274e33 100644
--- a/drivers/pmdomain/mediatek/mtk-pm-domains.c
+++ b/drivers/pmdomain/mediatek/mtk-pm-domains.c
@@ -561,6 +561,11 @@ static int scpsys_add_subdomain(struct scpsys *scpsys, struct device_node *paren
goto err_put_node;
}
+ /* recursive call to add all subdomains */
+ ret = scpsys_add_subdomain(scpsys, child);
+ if (ret)
+ goto err_put_node;
+
ret = pm_genpd_add_subdomain(parent_pd, child_pd);
if (ret) {
dev_err(scpsys->dev, "failed to add %s subdomain to parent %s\n",
@@ -570,11 +575,6 @@ static int scpsys_add_subdomain(struct scpsys *scpsys, struct device_node *paren
dev_dbg(scpsys->dev, "%s add subdomain: %s\n", parent_pd->name,
child_pd->name);
}
-
- /* recursive call to add all subdomains */
- ret = scpsys_add_subdomain(scpsys, child);
- if (ret)
- goto err_put_node;
}
return 0;
@@ -588,9 +588,6 @@ static void scpsys_remove_one_domain(struct scpsys_domain *pd)
{
int ret;
- if (scpsys_domain_is_on(pd))
- scpsys_power_off(&pd->genpd);
-
/*
* We're in the error cleanup already, so we only complain,
* but won't emit another error on top of the original one.
@@ -600,6 +597,8 @@ static void scpsys_remove_one_domain(struct scpsys_domain *pd)
dev_err(pd->scpsys->dev,
"failed to remove domain '%s' : %d - state may be inconsistent\n",
pd->genpd.name, ret);
+ if (scpsys_domain_is_on(pd))
+ scpsys_power_off(&pd->genpd);
clk_bulk_put(pd->num_clks, pd->clks);
clk_bulk_put(pd->num_subsys_clks, pd->subsys_clks);
diff --git a/drivers/pmdomain/qcom/rpmhpd.c b/drivers/pmdomain/qcom/rpmhpd.c
index 3078896..47df910 100644
--- a/drivers/pmdomain/qcom/rpmhpd.c
+++ b/drivers/pmdomain/qcom/rpmhpd.c
@@ -692,6 +692,7 @@ static int rpmhpd_aggregate_corner(struct rpmhpd *pd, unsigned int corner)
unsigned int active_corner, sleep_corner;
unsigned int this_active_corner = 0, this_sleep_corner = 0;
unsigned int peer_active_corner = 0, peer_sleep_corner = 0;
+ unsigned int peer_enabled_corner;
if (pd->state_synced) {
to_active_sleep(pd, corner, &this_active_corner, &this_sleep_corner);
@@ -701,9 +702,11 @@ static int rpmhpd_aggregate_corner(struct rpmhpd *pd, unsigned int corner)
this_sleep_corner = pd->level_count - 1;
}
- if (peer && peer->enabled)
- to_active_sleep(peer, peer->corner, &peer_active_corner,
+ if (peer && peer->enabled) {
+ peer_enabled_corner = max(peer->corner, peer->enable_corner);
+ to_active_sleep(peer, peer_enabled_corner, &peer_active_corner,
&peer_sleep_corner);
+ }
active_corner = max(this_active_corner, peer_active_corner);
diff --git a/drivers/pmdomain/renesas/r8a77980-sysc.c b/drivers/pmdomain/renesas/r8a77980-sysc.c
index 39ca84a..621e411 100644
--- a/drivers/pmdomain/renesas/r8a77980-sysc.c
+++ b/drivers/pmdomain/renesas/r8a77980-sysc.c
@@ -25,7 +25,8 @@ static const struct rcar_sysc_area r8a77980_areas[] __initconst = {
PD_CPU_NOCR },
{ "ca53-cpu3", 0x200, 3, R8A77980_PD_CA53_CPU3, R8A77980_PD_CA53_SCU,
PD_CPU_NOCR },
- { "cr7", 0x240, 0, R8A77980_PD_CR7, R8A77980_PD_ALWAYS_ON },
+ { "cr7", 0x240, 0, R8A77980_PD_CR7, R8A77980_PD_ALWAYS_ON,
+ PD_CPU_NOCR },
{ "a3ir", 0x180, 0, R8A77980_PD_A3IR, R8A77980_PD_ALWAYS_ON },
{ "a2ir0", 0x400, 0, R8A77980_PD_A2IR0, R8A77980_PD_A3IR },
{ "a2ir1", 0x400, 1, R8A77980_PD_A2IR1, R8A77980_PD_A3IR },
diff --git a/drivers/power/supply/Kconfig b/drivers/power/supply/Kconfig
index f21cb05..3e31375 100644
--- a/drivers/power/supply/Kconfig
+++ b/drivers/power/supply/Kconfig
@@ -978,6 +978,7 @@
config FUEL_GAUGE_MM8013
tristate "Mitsumi MM8013 fuel gauge driver"
depends on I2C
+ select REGMAP_I2C
help
Say Y here to enable the Mitsumi MM8013 fuel gauge driver.
It enables the monitoring of many battery parameters, including
diff --git a/drivers/power/supply/bq27xxx_battery_i2c.c b/drivers/power/supply/bq27xxx_battery_i2c.c
index 3a1798b..9910c60 100644
--- a/drivers/power/supply/bq27xxx_battery_i2c.c
+++ b/drivers/power/supply/bq27xxx_battery_i2c.c
@@ -209,7 +209,9 @@ static void bq27xxx_battery_i2c_remove(struct i2c_client *client)
{
struct bq27xxx_device_info *di = i2c_get_clientdata(client);
- free_irq(client->irq, di);
+ if (client->irq)
+ free_irq(client->irq, di);
+
bq27xxx_battery_teardown(di);
mutex_lock(&battery_mutex);
diff --git a/drivers/regulator/max5970-regulator.c b/drivers/regulator/max5970-regulator.c
index 830a1c4..8bbcd98 100644
--- a/drivers/regulator/max5970-regulator.c
+++ b/drivers/regulator/max5970-regulator.c
@@ -29,8 +29,8 @@ struct max5970_regulator {
};
enum max597x_regulator_id {
- MAX597X_SW0,
- MAX597X_SW1,
+ MAX597X_sw0,
+ MAX597X_sw1,
};
static int max5970_read_adc(struct regmap *regmap, int reg, long *val)
@@ -378,8 +378,8 @@ static int max597x_dt_parse(struct device_node *np,
}
static const struct regulator_desc regulators[] = {
- MAX597X_SWITCH(SW0, MAX5970_REG_CHXEN, 0, "vss1"),
- MAX597X_SWITCH(SW1, MAX5970_REG_CHXEN, 1, "vss2"),
+ MAX597X_SWITCH(sw0, MAX5970_REG_CHXEN, 0, "vss1"),
+ MAX597X_SWITCH(sw1, MAX5970_REG_CHXEN, 1, "vss2"),
};
static int max597x_regmap_read_clear(struct regmap *map, unsigned int reg,
diff --git a/drivers/regulator/rk808-regulator.c b/drivers/regulator/rk808-regulator.c
index e374fa6..d89ae7f 100644
--- a/drivers/regulator/rk808-regulator.c
+++ b/drivers/regulator/rk808-regulator.c
@@ -1017,14 +1017,14 @@ static const struct regulator_desc rk805_reg[] = {
};
static const struct linear_range rk806_buck_voltage_ranges[] = {
- REGULATOR_LINEAR_RANGE(500000, 0, 160, 6250), /* 500mV ~ 1500mV */
- REGULATOR_LINEAR_RANGE(1500000, 161, 237, 25000), /* 1500mV ~ 3400mV */
- REGULATOR_LINEAR_RANGE(3400000, 238, 255, 0),
+ REGULATOR_LINEAR_RANGE(500000, 0, 159, 6250), /* 500mV ~ 1500mV */
+ REGULATOR_LINEAR_RANGE(1500000, 160, 235, 25000), /* 1500mV ~ 3400mV */
+ REGULATOR_LINEAR_RANGE(3400000, 236, 255, 0),
};
static const struct linear_range rk806_ldo_voltage_ranges[] = {
- REGULATOR_LINEAR_RANGE(500000, 0, 232, 12500), /* 500mV ~ 3400mV */
- REGULATOR_LINEAR_RANGE(3400000, 233, 255, 0), /* 500mV ~ 3400mV */
+ REGULATOR_LINEAR_RANGE(500000, 0, 231, 12500), /* 500mV ~ 3400mV */
+ REGULATOR_LINEAR_RANGE(3400000, 232, 255, 0),
};
static const struct regulator_desc rk806_reg[] = {
diff --git a/drivers/rtc/lib_test.c b/drivers/rtc/lib_test.c
index d5caf36..225c859 100644
--- a/drivers/rtc/lib_test.c
+++ b/drivers/rtc/lib_test.c
@@ -54,7 +54,7 @@ static void rtc_time64_to_tm_test_date_range(struct kunit *test)
days = div_s64(secs, 86400);
- #define FAIL_MSG "%d/%02d/%02d (%2d) : %ld", \
+ #define FAIL_MSG "%d/%02d/%02d (%2d) : %lld", \
year, month, mday, yday, days
KUNIT_ASSERT_EQ_MSG(test, year - 1900, result.tm_year, FAIL_MSG);
diff --git a/drivers/s390/block/dasd.c b/drivers/s390/block/dasd.c
index e8eb710..cead018 100644
--- a/drivers/s390/block/dasd.c
+++ b/drivers/s390/block/dasd.c
@@ -424,7 +424,7 @@ dasd_state_ready_to_online(struct dasd_device * device)
KOBJ_CHANGE);
return 0;
}
- disk_uevent(device->block->bdev_handle->bdev->bd_disk,
+ disk_uevent(file_bdev(device->block->bdev_file)->bd_disk,
KOBJ_CHANGE);
}
return 0;
@@ -445,7 +445,7 @@ static int dasd_state_online_to_ready(struct dasd_device *device)
device->state = DASD_STATE_READY;
if (device->block && !(device->features & DASD_FEATURE_USERAW))
- disk_uevent(device->block->bdev_handle->bdev->bd_disk,
+ disk_uevent(file_bdev(device->block->bdev_file)->bd_disk,
KOBJ_CHANGE);
return 0;
}
@@ -3581,7 +3581,7 @@ int dasd_generic_set_offline(struct ccw_device *cdev)
* in the other openers.
*/
if (device->block) {
- max_count = device->block->bdev_handle ? 0 : -1;
+ max_count = device->block->bdev_file ? 0 : -1;
open_count = atomic_read(&device->block->open_count);
if (open_count > max_count) {
if (open_count > 0)
@@ -3626,8 +3626,8 @@ int dasd_generic_set_offline(struct ccw_device *cdev)
* so sync bdev first and then wait for our queues to become
* empty
*/
- if (device->block && device->block->bdev_handle)
- bdev_mark_dead(device->block->bdev_handle->bdev, false);
+ if (device->block && device->block->bdev_file)
+ bdev_mark_dead(file_bdev(device->block->bdev_file), false);
dasd_schedule_device_bh(device);
rc = wait_event_interruptible(shutdown_waitq,
_wait_for_empty_queues(device));
diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
index 528e2d3..4533dd0 100644
--- a/drivers/s390/block/dasd_genhd.c
+++ b/drivers/s390/block/dasd_genhd.c
@@ -133,15 +133,15 @@ void dasd_gendisk_free(struct dasd_block *block)
*/
int dasd_scan_partitions(struct dasd_block *block)
{
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
int rc;
- bdev_handle = bdev_open_by_dev(disk_devt(block->gdp), BLK_OPEN_READ,
+ bdev_file = bdev_file_open_by_dev(disk_devt(block->gdp), BLK_OPEN_READ,
NULL, NULL);
- if (IS_ERR(bdev_handle)) {
+ if (IS_ERR(bdev_file)) {
DBF_DEV_EVENT(DBF_ERR, block->base,
"scan partitions error, blkdev_get returned %ld",
- PTR_ERR(bdev_handle));
+ PTR_ERR(bdev_file));
return -ENODEV;
}
@@ -153,15 +153,15 @@ int dasd_scan_partitions(struct dasd_block *block)
"scan partitions error, rc %d", rc);
/*
- * Since the matching bdev_release() call to the
- * bdev_open_by_path() in this function is not called before
+ * Since the matching fput() call to the
+ * bdev_file_open_by_path() in this function is not called before
* dasd_destroy_partitions the offline open_count limit needs to be
- * increased from 0 to 1. This is done by setting device->bdev_handle
+ * increased from 0 to 1. This is done by setting device->bdev_file
* (see dasd_generic_set_offline). As long as the partition detection
* is running no offline should be allowed. That is why the assignment
- * to block->bdev_handle is done AFTER the BLKRRPART ioctl.
+ * to block->bdev_file is done AFTER the BLKRRPART ioctl.
*/
- block->bdev_handle = bdev_handle;
+ block->bdev_file = bdev_file;
return 0;
}
@@ -171,21 +171,21 @@ int dasd_scan_partitions(struct dasd_block *block)
*/
void dasd_destroy_partitions(struct dasd_block *block)
{
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
/*
- * Get the bdev_handle pointer from the device structure and clear
- * device->bdev_handle to lower the offline open_count limit again.
+ * Get the bdev_file pointer from the device structure and clear
+ * device->bdev_file to lower the offline open_count limit again.
*/
- bdev_handle = block->bdev_handle;
- block->bdev_handle = NULL;
+ bdev_file = block->bdev_file;
+ block->bdev_file = NULL;
- mutex_lock(&bdev_handle->bdev->bd_disk->open_mutex);
- bdev_disk_changed(bdev_handle->bdev->bd_disk, true);
- mutex_unlock(&bdev_handle->bdev->bd_disk->open_mutex);
+ mutex_lock(&file_bdev(bdev_file)->bd_disk->open_mutex);
+ bdev_disk_changed(file_bdev(bdev_file)->bd_disk, true);
+ mutex_unlock(&file_bdev(bdev_file)->bd_disk->open_mutex);
/* Matching blkdev_put to the blkdev_get in dasd_scan_partitions. */
- bdev_release(bdev_handle);
+ fput(bdev_file);
}
int dasd_gendisk_init(void)
diff --git a/drivers/s390/block/dasd_int.h b/drivers/s390/block/dasd_int.h
index b56d683..e5f4053 100644
--- a/drivers/s390/block/dasd_int.h
+++ b/drivers/s390/block/dasd_int.h
@@ -619,7 +619,7 @@ struct dasd_block {
struct gendisk *gdp;
spinlock_t request_queue_lock;
struct blk_mq_tag_set tag_set;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
atomic_t open_count;
unsigned long blocks; /* size of volume in blocks */
diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
index 6814354..7e0ed70 100644
--- a/drivers/s390/block/dasd_ioctl.c
+++ b/drivers/s390/block/dasd_ioctl.c
@@ -531,7 +531,7 @@ static int __dasd_ioctl_information(struct dasd_block *block,
* This must be hidden from user-space.
*/
dasd_info->open_count = atomic_read(&block->open_count);
- if (!block->bdev_handle)
+ if (!block->bdev_file)
dasd_info->open_count++;
/*
diff --git a/drivers/s390/cio/device_ops.c b/drivers/s390/cio/device_ops.c
index c533d1d..a5dba38 100644
--- a/drivers/s390/cio/device_ops.c
+++ b/drivers/s390/cio/device_ops.c
@@ -202,7 +202,8 @@ int ccw_device_start_timeout_key(struct ccw_device *cdev, struct ccw1 *cpa,
return -EINVAL;
if (cdev->private->state == DEV_STATE_NOT_OPER)
return -ENODEV;
- if (cdev->private->state == DEV_STATE_VERIFY) {
+ if (cdev->private->state == DEV_STATE_VERIFY ||
+ cdev->private->flags.doverify) {
/* Remember to fake irb when finished. */
if (!cdev->private->flags.fake_irb) {
cdev->private->flags.fake_irb = FAKE_CMD_IRB;
@@ -214,8 +215,7 @@ int ccw_device_start_timeout_key(struct ccw_device *cdev, struct ccw1 *cpa,
}
if (cdev->private->state != DEV_STATE_ONLINE ||
((sch->schib.scsw.cmd.stctl & SCSW_STCTL_PRIM_STATUS) &&
- !(sch->schib.scsw.cmd.stctl & SCSW_STCTL_SEC_STATUS)) ||
- cdev->private->flags.doverify)
+ !(sch->schib.scsw.cmd.stctl & SCSW_STCTL_SEC_STATUS)))
return -EBUSY;
ret = cio_set_options (sch, flags);
if (ret)
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index b92a32b..04c64ce 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -255,9 +255,10 @@ static void qeth_l3_clear_ip_htable(struct qeth_card *card, int recover)
if (!recover) {
hash_del(&addr->hnode);
kfree(addr);
- continue;
+ } else {
+ /* prepare for recovery */
+ addr->disp_flag = QETH_DISP_ADDR_ADD;
}
- addr->disp_flag = QETH_DISP_ADDR_ADD;
}
mutex_unlock(&card->ip_lock);
@@ -278,9 +279,11 @@ static void qeth_l3_recover_ip(struct qeth_card *card)
if (addr->disp_flag == QETH_DISP_ADDR_ADD) {
rc = qeth_l3_register_addr_entry(card, addr);
- if (!rc) {
+ if (!rc || rc == -EADDRINUSE || rc == -ENETDOWN) {
+ /* keep it in the records */
addr->disp_flag = QETH_DISP_ADDR_DO_NOTHING;
} else {
+ /* bad address */
hash_del(&addr->hnode);
kfree(addr);
}
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index addac7f..9ce2709 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1270,7 +1270,7 @@
config JAZZ_ESP
bool "MIPS JAZZ FAS216 SCSI support"
- depends on MACH_JAZZ && SCSI
+ depends on MACH_JAZZ && SCSI=y
select SCSI_SPI_ATTRS
help
This is the driver for the onboard SCSI host adapter of MIPS Magnum
diff --git a/drivers/scsi/fcoe/fcoe_ctlr.c b/drivers/scsi/fcoe/fcoe_ctlr.c
index 19eee10..5c8d1ba 100644
--- a/drivers/scsi/fcoe/fcoe_ctlr.c
+++ b/drivers/scsi/fcoe/fcoe_ctlr.c
@@ -319,17 +319,16 @@ static void fcoe_ctlr_announce(struct fcoe_ctlr *fip)
{
struct fcoe_fcf *sel;
struct fcoe_fcf *fcf;
- unsigned long flags;
mutex_lock(&fip->ctlr_mutex);
- spin_lock_irqsave(&fip->ctlr_lock, flags);
+ spin_lock_bh(&fip->ctlr_lock);
kfree_skb(fip->flogi_req);
fip->flogi_req = NULL;
list_for_each_entry(fcf, &fip->fcfs, list)
fcf->flogi_sent = 0;
- spin_unlock_irqrestore(&fip->ctlr_lock, flags);
+ spin_unlock_bh(&fip->ctlr_lock);
sel = fip->sel_fcf;
if (sel && ether_addr_equal(sel->fcf_mac, fip->dest_addr))
@@ -700,7 +699,6 @@ int fcoe_ctlr_els_send(struct fcoe_ctlr *fip, struct fc_lport *lport,
{
struct fc_frame *fp;
struct fc_frame_header *fh;
- unsigned long flags;
u16 old_xid;
u8 op;
u8 mac[ETH_ALEN];
@@ -734,11 +732,11 @@ int fcoe_ctlr_els_send(struct fcoe_ctlr *fip, struct fc_lport *lport,
op = FIP_DT_FLOGI;
if (fip->mode == FIP_MODE_VN2VN)
break;
- spin_lock_irqsave(&fip->ctlr_lock, flags);
+ spin_lock_bh(&fip->ctlr_lock);
kfree_skb(fip->flogi_req);
fip->flogi_req = skb;
fip->flogi_req_send = 1;
- spin_unlock_irqrestore(&fip->ctlr_lock, flags);
+ spin_unlock_bh(&fip->ctlr_lock);
schedule_work(&fip->timer_work);
return -EINPROGRESS;
case ELS_FDISC:
@@ -1707,11 +1705,10 @@ static int fcoe_ctlr_flogi_send_locked(struct fcoe_ctlr *fip)
static int fcoe_ctlr_flogi_retry(struct fcoe_ctlr *fip)
{
struct fcoe_fcf *fcf;
- unsigned long flags;
int error;
mutex_lock(&fip->ctlr_mutex);
- spin_lock_irqsave(&fip->ctlr_lock, flags);
+ spin_lock_bh(&fip->ctlr_lock);
LIBFCOE_FIP_DBG(fip, "re-sending FLOGI - reselect\n");
fcf = fcoe_ctlr_select(fip);
if (!fcf || fcf->flogi_sent) {
@@ -1722,7 +1719,7 @@ static int fcoe_ctlr_flogi_retry(struct fcoe_ctlr *fip)
fcoe_ctlr_solicit(fip, NULL);
error = fcoe_ctlr_flogi_send_locked(fip);
}
- spin_unlock_irqrestore(&fip->ctlr_lock, flags);
+ spin_unlock_bh(&fip->ctlr_lock);
mutex_unlock(&fip->ctlr_mutex);
return error;
}
@@ -1739,9 +1736,8 @@ static int fcoe_ctlr_flogi_retry(struct fcoe_ctlr *fip)
static void fcoe_ctlr_flogi_send(struct fcoe_ctlr *fip)
{
struct fcoe_fcf *fcf;
- unsigned long flags;
- spin_lock_irqsave(&fip->ctlr_lock, flags);
+ spin_lock_bh(&fip->ctlr_lock);
fcf = fip->sel_fcf;
if (!fcf || !fip->flogi_req_send)
goto unlock;
@@ -1768,7 +1764,7 @@ static void fcoe_ctlr_flogi_send(struct fcoe_ctlr *fip)
} else /* XXX */
LIBFCOE_FIP_DBG(fip, "No FCF selected - defer send\n");
unlock:
- spin_unlock_irqrestore(&fip->ctlr_lock, flags);
+ spin_unlock_bh(&fip->ctlr_lock);
}
/**
diff --git a/drivers/scsi/fnic/fnic.h b/drivers/scsi/fnic/fnic.h
index 2074937..ce73f08 100644
--- a/drivers/scsi/fnic/fnic.h
+++ b/drivers/scsi/fnic/fnic.h
@@ -305,6 +305,7 @@ struct fnic {
unsigned int copy_wq_base;
struct work_struct link_work;
struct work_struct frame_work;
+ struct work_struct flush_work;
struct sk_buff_head frame_queue;
struct sk_buff_head tx_queue;
@@ -363,7 +364,7 @@ void fnic_handle_event(struct work_struct *work);
int fnic_rq_cmpl_handler(struct fnic *fnic, int);
int fnic_alloc_rq_frame(struct vnic_rq *rq);
void fnic_free_rq_buf(struct vnic_rq *rq, struct vnic_rq_buf *buf);
-void fnic_flush_tx(struct fnic *);
+void fnic_flush_tx(struct work_struct *work);
void fnic_eth_send(struct fcoe_ctlr *, struct sk_buff *skb);
void fnic_set_port_id(struct fc_lport *, u32, struct fc_frame *);
void fnic_update_mac(struct fc_lport *, u8 *new);
diff --git a/drivers/scsi/fnic/fnic_fcs.c b/drivers/scsi/fnic/fnic_fcs.c
index 5e312a5..a08293b 100644
--- a/drivers/scsi/fnic/fnic_fcs.c
+++ b/drivers/scsi/fnic/fnic_fcs.c
@@ -1182,7 +1182,7 @@ int fnic_send(struct fc_lport *lp, struct fc_frame *fp)
/**
* fnic_flush_tx() - send queued frames.
- * @fnic: fnic device
+ * @work: pointer to work element
*
* Send frames that were waiting to go out in FC or Ethernet mode.
* Whenever changing modes we purge queued frames, so these frames should
@@ -1190,8 +1190,9 @@ int fnic_send(struct fc_lport *lp, struct fc_frame *fp)
*
* Called without fnic_lock held.
*/
-void fnic_flush_tx(struct fnic *fnic)
+void fnic_flush_tx(struct work_struct *work)
{
+ struct fnic *fnic = container_of(work, struct fnic, flush_work);
struct sk_buff *skb;
struct fc_frame *fp;
diff --git a/drivers/scsi/fnic/fnic_main.c b/drivers/scsi/fnic/fnic_main.c
index 5ed1d89..29eead3 100644
--- a/drivers/scsi/fnic/fnic_main.c
+++ b/drivers/scsi/fnic/fnic_main.c
@@ -830,6 +830,7 @@ static int fnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
spin_lock_init(&fnic->vlans_lock);
INIT_WORK(&fnic->fip_frame_work, fnic_handle_fip_frame);
INIT_WORK(&fnic->event_work, fnic_handle_event);
+ INIT_WORK(&fnic->flush_work, fnic_flush_tx);
skb_queue_head_init(&fnic->fip_frame_queue);
INIT_LIST_HEAD(&fnic->evlist);
INIT_LIST_HEAD(&fnic->vlans);
diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 8d7fc52..fc4cee9 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -680,7 +680,7 @@ static int fnic_fcpio_fw_reset_cmpl_handler(struct fnic *fnic,
spin_unlock_irqrestore(&fnic->fnic_lock, flags);
- fnic_flush_tx(fnic);
+ queue_work(fnic_event_queue, &fnic->flush_work);
reset_cmpl_handler_end:
fnic_clear_state_flags(fnic, FNIC_FLAGS_FWRESET);
@@ -736,7 +736,7 @@ static int fnic_fcpio_flogi_reg_cmpl_handler(struct fnic *fnic,
}
spin_unlock_irqrestore(&fnic->fnic_lock, flags);
- fnic_flush_tx(fnic);
+ queue_work(fnic_event_queue, &fnic->flush_work);
queue_work(fnic_event_queue, &fnic->frame_work);
} else {
spin_unlock_irqrestore(&fnic->fnic_lock, flags);
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index d26941b..bf879d8 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -1918,7 +1918,7 @@ lpfc_bg_setup_bpl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc,
*
* Returns the number of SGEs added to the SGL.
**/
-static int
+static uint32_t
lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc,
struct sli4_sge *sgl, int datasegcnt,
struct lpfc_io_buf *lpfc_cmd)
@@ -1926,8 +1926,8 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc,
struct scatterlist *sgde = NULL; /* s/g data entry */
struct sli4_sge_diseed *diseed = NULL;
dma_addr_t physaddr;
- int i = 0, num_sge = 0, status;
- uint32_t reftag;
+ int i = 0, status;
+ uint32_t reftag, num_sge = 0;
uint8_t txop, rxop;
#ifdef CONFIG_SCSI_LPFC_DEBUG_FS
uint32_t rc;
@@ -2099,7 +2099,7 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc,
*
* Returns the number of SGEs added to the SGL.
**/
-static int
+static uint32_t
lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc,
struct sli4_sge *sgl, int datacnt, int protcnt,
struct lpfc_io_buf *lpfc_cmd)
@@ -2123,8 +2123,8 @@ lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc,
uint32_t rc;
#endif
uint32_t checking = 1;
- uint32_t dma_offset = 0;
- int num_sge = 0, j = 2;
+ uint32_t dma_offset = 0, num_sge = 0;
+ int j = 2;
struct sli4_hybrid_sgl *sgl_xtra = NULL;
sgpe = scsi_prot_sglist(sc);
diff --git a/drivers/scsi/mpi3mr/mpi3mr_transport.c b/drivers/scsi/mpi3mr/mpi3mr_transport.c
index c0c8ab5..d32ad46 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_transport.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_transport.c
@@ -1671,7 +1671,7 @@ mpi3mr_update_mr_sas_port(struct mpi3mr_ioc *mrioc, struct host_port *h_port,
void
mpi3mr_refresh_sas_ports(struct mpi3mr_ioc *mrioc)
{
- struct host_port h_port[64];
+ struct host_port *h_port = NULL;
int i, j, found, host_port_count = 0, port_idx;
u16 sz, attached_handle, ioc_status;
struct mpi3_sas_io_unit_page0 *sas_io_unit_pg0 = NULL;
@@ -1685,6 +1685,10 @@ mpi3mr_refresh_sas_ports(struct mpi3mr_ioc *mrioc)
sas_io_unit_pg0 = kzalloc(sz, GFP_KERNEL);
if (!sas_io_unit_pg0)
return;
+ h_port = kcalloc(64, sizeof(struct host_port), GFP_KERNEL);
+ if (!h_port)
+ goto out;
+
if (mpi3mr_cfg_get_sas_io_unit_pg0(mrioc, sas_io_unit_pg0, sz)) {
ioc_err(mrioc, "failure at %s:%d/%s()!\n",
__FILE__, __LINE__, __func__);
@@ -1814,6 +1818,7 @@ mpi3mr_refresh_sas_ports(struct mpi3mr_ioc *mrioc)
}
}
out:
+ kfree(h_port);
kfree(sas_io_unit_pg0);
}
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 8761bc5..b8120ca 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -7378,7 +7378,9 @@ _base_wait_for_iocstate(struct MPT3SAS_ADAPTER *ioc, int timeout)
return -EFAULT;
}
- issue_diag_reset:
+ return 0;
+
+issue_diag_reset:
rc = _base_diag_reset(ioc);
return rc;
}
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 76d3693..8cad979 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -328,21 +328,39 @@ static int scsi_vpd_inquiry(struct scsi_device *sdev, unsigned char *buffer,
return result + 4;
}
+enum scsi_vpd_parameters {
+ SCSI_VPD_HEADER_SIZE = 4,
+ SCSI_VPD_LIST_SIZE = 36,
+};
+
static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page)
{
- unsigned char vpd_header[SCSI_VPD_HEADER_SIZE] __aligned(4);
+ unsigned char vpd[SCSI_VPD_LIST_SIZE] __aligned(4);
int result;
if (sdev->no_vpd_size)
return SCSI_DEFAULT_VPD_LEN;
/*
+ * Fetch the supported pages VPD and validate that the requested page
+ * number is present.
+ */
+ if (page != 0) {
+ result = scsi_vpd_inquiry(sdev, vpd, 0, sizeof(vpd));
+ if (result < SCSI_VPD_HEADER_SIZE)
+ return 0;
+
+ result -= SCSI_VPD_HEADER_SIZE;
+ if (!memchr(&vpd[SCSI_VPD_HEADER_SIZE], page, result))
+ return 0;
+ }
+ /*
* Fetch the VPD page header to find out how big the page
* is. This is done to prevent problems on legacy devices
* which can not handle allocation lengths as large as
* potentially requested by the caller.
*/
- result = scsi_vpd_inquiry(sdev, vpd_header, page, sizeof(vpd_header));
+ result = scsi_vpd_inquiry(sdev, vpd, page, SCSI_VPD_HEADER_SIZE);
if (result < 0)
return 0;
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4f45588..612489a 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -282,11 +282,12 @@ static void scsi_eh_inc_host_failed(struct rcu_head *head)
{
struct scsi_cmnd *scmd = container_of(head, typeof(*scmd), rcu);
struct Scsi_Host *shost = scmd->device->host;
+ unsigned int busy = scsi_host_busy(shost);
unsigned long flags;
spin_lock_irqsave(shost->host_lock, flags);
shost->host_failed++;
- scsi_eh_wakeup(shost, scsi_host_busy(shost));
+ scsi_eh_wakeup(shost, busy);
spin_unlock_irqrestore(shost->host_lock, flags);
}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1fb80ea..df5ac03 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -278,9 +278,11 @@ static void scsi_dec_host_busy(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
rcu_read_lock();
__clear_bit(SCMD_STATE_INFLIGHT, &cmd->state);
if (unlikely(scsi_host_in_recovery(shost))) {
+ unsigned int busy = scsi_host_busy(shost);
+
spin_lock_irqsave(shost->host_lock, flags);
if (shost->host_failed || shost->host_eh_scheduled)
- scsi_eh_wakeup(shost, scsi_host_busy(shost));
+ scsi_eh_wakeup(shost, busy);
spin_unlock_irqrestore(shost->host_lock, flags);
}
rcu_read_unlock();
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 0833b3e..bdd0acf 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3407,6 +3407,24 @@ static bool sd_validate_opt_xfer_size(struct scsi_disk *sdkp,
return true;
}
+static void sd_read_block_zero(struct scsi_disk *sdkp)
+{
+ unsigned int buf_len = sdkp->device->sector_size;
+ char *buffer, cmd[10] = { };
+
+ buffer = kmalloc(buf_len, GFP_KERNEL);
+ if (!buffer)
+ return;
+
+ cmd[0] = READ_10;
+ put_unaligned_be32(0, &cmd[2]); /* Logical block address 0 */
+ put_unaligned_be16(1, &cmd[7]); /* Transfer 1 logical block */
+
+ scsi_execute_cmd(sdkp->device, cmd, REQ_OP_DRV_IN, buffer, buf_len,
+ SD_TIMEOUT, sdkp->max_retries, NULL);
+ kfree(buffer);
+}
+
/**
* sd_revalidate_disk - called the first time a new disk is seen,
* performs disk spin up, read_capacity, etc.
@@ -3446,7 +3464,13 @@ static int sd_revalidate_disk(struct gendisk *disk)
*/
if (sdkp->media_present) {
sd_read_capacity(sdkp, buffer);
-
+ /*
+ * Some USB/UAS devices return generic values for mode pages
+ * until the media has been accessed. Trigger a READ operation
+ * to force the device to populate mode pages.
+ */
+ if (sdp->read_before_ms)
+ sd_read_block_zero(sdkp);
/*
* set the default to rotational. All non-rotational devices
* support the block characteristics VPD page, which will
diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index ceff1ec..385180c 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -6533,8 +6533,11 @@ static void pqi_map_queues(struct Scsi_Host *shost)
{
struct pqi_ctrl_info *ctrl_info = shost_to_hba(shost);
- blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
+ if (!ctrl_info->disable_managed_interrupts)
+ return blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
ctrl_info->pci_dev, 0);
+ else
+ return blk_mq_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT]);
}
static inline bool pqi_is_tape_changer_device(struct pqi_scsi_dev *device)
diff --git a/drivers/soc/microchip/Kconfig b/drivers/soc/microchip/Kconfig
index 9b0fdd9..19f4b57 100644
--- a/drivers/soc/microchip/Kconfig
+++ b/drivers/soc/microchip/Kconfig
@@ -1,5 +1,5 @@
config POLARFIRE_SOC_SYS_CTRL
- tristate "POLARFIRE_SOC_SYS_CTRL"
+ tristate "Microchip PolarFire SoC (MPFS) system controller support"
depends on POLARFIRE_SOC_MAILBOX
depends on MTD
help
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index f4bfd24..f913e9bd 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -265,10 +265,17 @@ static int pmic_glink_probe(struct platform_device *pdev)
pg->client_mask = *match_data;
+ pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
+ if (IS_ERR(pg->pdr)) {
+ ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr),
+ "failed to initialize pdr\n");
+ return ret;
+ }
+
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI)) {
ret = pmic_glink_add_aux_device(pg, &pg->ucsi_aux, "ucsi");
if (ret)
- return ret;
+ goto out_release_pdr_handle;
}
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_ALTMODE)) {
ret = pmic_glink_add_aux_device(pg, &pg->altmode_aux, "altmode");
@@ -281,17 +288,11 @@ static int pmic_glink_probe(struct platform_device *pdev)
goto out_release_altmode_aux;
}
- pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
- if (IS_ERR(pg->pdr)) {
- ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr), "failed to initialize pdr\n");
- goto out_release_aux_devices;
- }
-
service = pdr_add_lookup(pg->pdr, "tms/servreg", "msm/adsp/charger_pd");
if (IS_ERR(service)) {
ret = dev_err_probe(&pdev->dev, PTR_ERR(service),
"failed adding pdr lookup for charger_pd\n");
- goto out_release_pdr_handle;
+ goto out_release_aux_devices;
}
mutex_lock(&__pmic_glink_lock);
@@ -300,8 +301,6 @@ static int pmic_glink_probe(struct platform_device *pdev)
return 0;
-out_release_pdr_handle:
- pdr_handle_release(pg->pdr);
out_release_aux_devices:
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_BATT))
pmic_glink_del_aux_device(pg, &pg->ps_aux);
@@ -311,6 +310,8 @@ static int pmic_glink_probe(struct platform_device *pdev)
out_release_ucsi_aux:
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI))
pmic_glink_del_aux_device(pg, &pg->ucsi_aux);
+out_release_pdr_handle:
+ pdr_handle_release(pg->pdr);
return ret;
}
diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c
index 5fcd0fd..b3808fc 100644
--- a/drivers/soc/qcom/pmic_glink_altmode.c
+++ b/drivers/soc/qcom/pmic_glink_altmode.c
@@ -76,7 +76,7 @@ struct pmic_glink_altmode_port {
struct work_struct work;
- struct device *bridge;
+ struct auxiliary_device *bridge;
enum typec_orientation orientation;
u16 svid;
@@ -230,7 +230,7 @@ static void pmic_glink_altmode_worker(struct work_struct *work)
else
pmic_glink_altmode_enable_usb(altmode, alt_port);
- drm_aux_hpd_bridge_notify(alt_port->bridge,
+ drm_aux_hpd_bridge_notify(&alt_port->bridge->dev,
alt_port->hpd_state ?
connector_status_connected :
connector_status_disconnected);
@@ -454,7 +454,7 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev,
alt_port->index = port;
INIT_WORK(&alt_port->work, pmic_glink_altmode_worker);
- alt_port->bridge = drm_dp_hpd_bridge_register(dev, to_of_node(fwnode));
+ alt_port->bridge = devm_drm_dp_hpd_bridge_alloc(dev, to_of_node(fwnode));
if (IS_ERR(alt_port->bridge)) {
fwnode_handle_put(fwnode);
return PTR_ERR(alt_port->bridge);
@@ -510,6 +510,16 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev,
}
}
+ for (port = 0; port < ARRAY_SIZE(altmode->ports); port++) {
+ alt_port = &altmode->ports[port];
+ if (!alt_port->bridge)
+ continue;
+
+ ret = devm_drm_dp_hpd_bridge_add(dev, alt_port->bridge);
+ if (ret)
+ return ret;
+ }
+
altmode->client = devm_pmic_glink_register_client(dev,
altmode->owner_id,
pmic_glink_altmode_callback,
diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
index f94e0d3..1a8d039 100644
--- a/drivers/spi/spi-cadence-quadspi.c
+++ b/drivers/spi/spi-cadence-quadspi.c
@@ -1927,24 +1927,18 @@ static void cqspi_remove(struct platform_device *pdev)
pm_runtime_disable(&pdev->dev);
}
-static int cqspi_suspend(struct device *dev)
+static int cqspi_runtime_suspend(struct device *dev)
{
struct cqspi_st *cqspi = dev_get_drvdata(dev);
- struct spi_controller *host = dev_get_drvdata(dev);
- int ret;
- ret = spi_controller_suspend(host);
cqspi_controller_enable(cqspi, 0);
-
clk_disable_unprepare(cqspi->clk);
-
- return ret;
+ return 0;
}
-static int cqspi_resume(struct device *dev)
+static int cqspi_runtime_resume(struct device *dev)
{
struct cqspi_st *cqspi = dev_get_drvdata(dev);
- struct spi_controller *host = dev_get_drvdata(dev);
clk_prepare_enable(cqspi->clk);
cqspi_wait_idle(cqspi);
@@ -1952,12 +1946,27 @@ static int cqspi_resume(struct device *dev)
cqspi->current_cs = -1;
cqspi->sclk = 0;
-
- return spi_controller_resume(host);
+ return 0;
}
-static DEFINE_RUNTIME_DEV_PM_OPS(cqspi_dev_pm_ops, cqspi_suspend,
- cqspi_resume, NULL);
+static int cqspi_suspend(struct device *dev)
+{
+ struct cqspi_st *cqspi = dev_get_drvdata(dev);
+
+ return spi_controller_suspend(cqspi->host);
+}
+
+static int cqspi_resume(struct device *dev)
+{
+ struct cqspi_st *cqspi = dev_get_drvdata(dev);
+
+ return spi_controller_resume(cqspi->host);
+}
+
+static const struct dev_pm_ops cqspi_dev_pm_ops = {
+ RUNTIME_PM_OPS(cqspi_runtime_suspend, cqspi_runtime_resume, NULL)
+ SYSTEM_SLEEP_PM_OPS(cqspi_suspend, cqspi_resume)
+};
static const struct cqspi_driver_platdata cdns_qspi = {
.quirks = CQSPI_DISABLE_DAC_MODE,
diff --git a/drivers/spi/spi-cs42l43.c b/drivers/spi/spi-cs42l43.c
index b241905..adf19e8 100644
--- a/drivers/spi/spi-cs42l43.c
+++ b/drivers/spi/spi-cs42l43.c
@@ -148,8 +148,7 @@ static void cs42l43_set_cs(struct spi_device *spi, bool is_high)
{
struct cs42l43_spi *priv = spi_controller_get_devdata(spi->controller);
- if (spi_get_chipselect(spi, 0) == 0)
- regmap_write(priv->regmap, CS42L43_SPI_CONFIG2, !is_high);
+ regmap_write(priv->regmap, CS42L43_SPI_CONFIG2, !is_high);
}
static int cs42l43_prepare_message(struct spi_controller *ctlr, struct spi_message *msg)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 546cdce..833a1bb 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -2,6 +2,7 @@
// Copyright 2004-2007 Freescale Semiconductor, Inc. All Rights Reserved.
// Copyright (C) 2008 Juergen Beisert
+#include <linux/bits.h>
#include <linux/clk.h>
#include <linux/completion.h>
#include <linux/delay.h>
@@ -660,15 +661,15 @@ static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx,
<< MX51_ECSPI_CTRL_BL_OFFSET;
else {
if (spi_imx->usedma) {
- ctrl |= (spi_imx->bits_per_word *
- spi_imx_bytes_per_word(spi_imx->bits_per_word) - 1)
+ ctrl |= (spi_imx->bits_per_word - 1)
<< MX51_ECSPI_CTRL_BL_OFFSET;
} else {
if (spi_imx->count >= MX51_ECSPI_CTRL_MAX_BURST)
- ctrl |= (MX51_ECSPI_CTRL_MAX_BURST - 1)
+ ctrl |= (MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1)
<< MX51_ECSPI_CTRL_BL_OFFSET;
else
- ctrl |= (spi_imx->count * spi_imx->bits_per_word - 1)
+ ctrl |= spi_imx->count / DIV_ROUND_UP(spi_imx->bits_per_word,
+ BITS_PER_BYTE) * spi_imx->bits_per_word
<< MX51_ECSPI_CTRL_BL_OFFSET;
}
}
diff --git a/drivers/spi/spi-intel-pci.c b/drivers/spi/spi-intel-pci.c
index 07d20ca..4337ca5 100644
--- a/drivers/spi/spi-intel-pci.c
+++ b/drivers/spi/spi-intel-pci.c
@@ -85,6 +85,7 @@ static const struct pci_device_id intel_spi_pci_ids[] = {
{ PCI_VDEVICE(INTEL, 0xa2a4), (unsigned long)&cnl_info },
{ PCI_VDEVICE(INTEL, 0xa324), (unsigned long)&cnl_info },
{ PCI_VDEVICE(INTEL, 0xa3a4), (unsigned long)&cnl_info },
+ { PCI_VDEVICE(INTEL, 0xa823), (unsigned long)&cnl_info },
{ },
};
MODULE_DEVICE_TABLE(pci, intel_spi_pci_ids);
diff --git a/drivers/spi/spi-mxs.c b/drivers/spi/spi-mxs.c
index 1bf0803..88cbe4f 100644
--- a/drivers/spi/spi-mxs.c
+++ b/drivers/spi/spi-mxs.c
@@ -39,6 +39,7 @@
#include <linux/spi/spi.h>
#include <linux/spi/mxs-spi.h>
#include <trace/events/spi.h>
+#include <linux/dma/mxs-dma.h>
#define DRIVER_NAME "mxs-spi"
@@ -252,7 +253,7 @@ static int mxs_spi_txrx_dma(struct mxs_spi *spi,
desc = dmaengine_prep_slave_sg(ssp->dmach,
&dma_xfer[sg_count].sg, 1,
(flags & TXRX_WRITE) ? DMA_MEM_TO_DEV : DMA_DEV_TO_MEM,
- DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+ DMA_PREP_INTERRUPT | MXS_DMA_CTRL_WAIT4END);
if (!desc) {
dev_err(ssp->dev,
diff --git a/drivers/spi/spi-omap2-mcspi.c b/drivers/spi/spi-omap2-mcspi.c
index a0c9fea..ddf1c68 100644
--- a/drivers/spi/spi-omap2-mcspi.c
+++ b/drivers/spi/spi-omap2-mcspi.c
@@ -53,8 +53,6 @@
/* per-register bitmasks: */
#define OMAP2_MCSPI_IRQSTATUS_EOW BIT(17)
-#define OMAP2_MCSPI_IRQSTATUS_TX0_EMPTY BIT(0)
-#define OMAP2_MCSPI_IRQSTATUS_RX0_FULL BIT(2)
#define OMAP2_MCSPI_MODULCTRL_SINGLE BIT(0)
#define OMAP2_MCSPI_MODULCTRL_MS BIT(2)
@@ -293,7 +291,7 @@ static void omap2_mcspi_set_mode(struct spi_controller *ctlr)
}
static void omap2_mcspi_set_fifo(const struct spi_device *spi,
- struct spi_transfer *t, int enable, int dma_enabled)
+ struct spi_transfer *t, int enable)
{
struct spi_controller *ctlr = spi->controller;
struct omap2_mcspi_cs *cs = spi->controller_state;
@@ -314,28 +312,20 @@ static void omap2_mcspi_set_fifo(const struct spi_device *spi,
max_fifo_depth = OMAP2_MCSPI_MAX_FIFODEPTH / 2;
else
max_fifo_depth = OMAP2_MCSPI_MAX_FIFODEPTH;
- if (dma_enabled)
- wcnt = t->len / bytes_per_word;
- else
- wcnt = 0;
+
+ wcnt = t->len / bytes_per_word;
if (wcnt > OMAP2_MCSPI_MAX_FIFOWCNT)
goto disable_fifo;
xferlevel = wcnt << 16;
if (t->rx_buf != NULL) {
chconf |= OMAP2_MCSPI_CHCONF_FFER;
- if (dma_enabled)
- xferlevel |= (bytes_per_word - 1) << 8;
- else
- xferlevel |= (max_fifo_depth - 1) << 8;
+ xferlevel |= (bytes_per_word - 1) << 8;
}
if (t->tx_buf != NULL) {
chconf |= OMAP2_MCSPI_CHCONF_FFET;
- if (dma_enabled)
- xferlevel |= bytes_per_word - 1;
- else
- xferlevel |= (max_fifo_depth - 1);
+ xferlevel |= bytes_per_word - 1;
}
mcspi_write_reg(ctlr, OMAP2_MCSPI_XFERLEVEL, xferlevel);
@@ -892,113 +882,6 @@ omap2_mcspi_txrx_pio(struct spi_device *spi, struct spi_transfer *xfer)
return count - c;
}
-static unsigned
-omap2_mcspi_txrx_piofifo(struct spi_device *spi, struct spi_transfer *xfer)
-{
- struct omap2_mcspi_cs *cs = spi->controller_state;
- struct omap2_mcspi *mcspi;
- unsigned int count, c;
- unsigned int iter, cwc;
- int last_request;
- void __iomem *base = cs->base;
- void __iomem *tx_reg;
- void __iomem *rx_reg;
- void __iomem *chstat_reg;
- void __iomem *irqstat_reg;
- int word_len, bytes_per_word;
- u8 *rx;
- const u8 *tx;
-
- mcspi = spi_controller_get_devdata(spi->controller);
- count = xfer->len;
- c = count;
- word_len = cs->word_len;
- bytes_per_word = mcspi_bytes_per_word(word_len);
-
- /*
- * We store the pre-calculated register addresses on stack to speed
- * up the transfer loop.
- */
- tx_reg = base + OMAP2_MCSPI_TX0;
- rx_reg = base + OMAP2_MCSPI_RX0;
- chstat_reg = base + OMAP2_MCSPI_CHSTAT0;
- irqstat_reg = base + OMAP2_MCSPI_IRQSTATUS;
-
- if (c < (word_len >> 3))
- return 0;
-
- rx = xfer->rx_buf;
- tx = xfer->tx_buf;
-
- do {
- /* calculate number of words in current iteration */
- cwc = min((unsigned int)mcspi->fifo_depth / bytes_per_word,
- c / bytes_per_word);
- last_request = cwc != (mcspi->fifo_depth / bytes_per_word);
- if (tx) {
- if (mcspi_wait_for_reg_bit(irqstat_reg,
- OMAP2_MCSPI_IRQSTATUS_TX0_EMPTY) < 0) {
- dev_err(&spi->dev, "TX Empty timed out\n");
- goto out;
- }
- writel_relaxed(OMAP2_MCSPI_IRQSTATUS_TX0_EMPTY, irqstat_reg);
-
- for (iter = 0; iter < cwc; iter++, tx += bytes_per_word) {
- if (bytes_per_word == 1)
- writel_relaxed(*tx, tx_reg);
- else if (bytes_per_word == 2)
- writel_relaxed(*((u16 *)tx), tx_reg);
- else if (bytes_per_word == 4)
- writel_relaxed(*((u32 *)tx), tx_reg);
- }
- }
-
- if (rx) {
- if (!last_request &&
- mcspi_wait_for_reg_bit(irqstat_reg,
- OMAP2_MCSPI_IRQSTATUS_RX0_FULL) < 0) {
- dev_err(&spi->dev, "RX_FULL timed out\n");
- goto out;
- }
- writel_relaxed(OMAP2_MCSPI_IRQSTATUS_RX0_FULL, irqstat_reg);
-
- for (iter = 0; iter < cwc; iter++, rx += bytes_per_word) {
- if (last_request &&
- mcspi_wait_for_reg_bit(chstat_reg,
- OMAP2_MCSPI_CHSTAT_RXS) < 0) {
- dev_err(&spi->dev, "RXS timed out\n");
- goto out;
- }
- if (bytes_per_word == 1)
- *rx = readl_relaxed(rx_reg);
- else if (bytes_per_word == 2)
- *((u16 *)rx) = readl_relaxed(rx_reg);
- else if (bytes_per_word == 4)
- *((u32 *)rx) = readl_relaxed(rx_reg);
- }
- }
-
- if (last_request) {
- if (mcspi_wait_for_reg_bit(chstat_reg,
- OMAP2_MCSPI_CHSTAT_EOT) < 0) {
- dev_err(&spi->dev, "EOT timed out\n");
- goto out;
- }
- if (mcspi_wait_for_reg_bit(chstat_reg,
- OMAP2_MCSPI_CHSTAT_TXFFE) < 0) {
- dev_err(&spi->dev, "TXFFE timed out\n");
- goto out;
- }
- omap2_mcspi_set_enable(spi, 0);
- }
- c -= cwc * bytes_per_word;
- } while (c >= bytes_per_word);
-
-out:
- omap2_mcspi_set_enable(spi, 1);
- return count - c;
-}
-
static u32 omap2_mcspi_calc_divisor(u32 speed_hz, u32 ref_clk_hz)
{
u32 div;
@@ -1323,9 +1206,7 @@ static int omap2_mcspi_transfer_one(struct spi_controller *ctlr,
if ((mcspi_dma->dma_rx && mcspi_dma->dma_tx) &&
ctlr->cur_msg_mapped &&
ctlr->can_dma(ctlr, spi, t))
- omap2_mcspi_set_fifo(spi, t, 1, 1);
- else if (t->len > OMAP2_MCSPI_MAX_FIFODEPTH)
- omap2_mcspi_set_fifo(spi, t, 1, 0);
+ omap2_mcspi_set_fifo(spi, t, 1);
omap2_mcspi_set_enable(spi, 1);
@@ -1338,8 +1219,6 @@ static int omap2_mcspi_transfer_one(struct spi_controller *ctlr,
ctlr->cur_msg_mapped &&
ctlr->can_dma(ctlr, spi, t))
count = omap2_mcspi_txrx_dma(spi, t);
- else if (mcspi->fifo_depth > 0)
- count = omap2_mcspi_txrx_piofifo(spi, t);
else
count = omap2_mcspi_txrx_pio(spi, t);
@@ -1352,7 +1231,7 @@ static int omap2_mcspi_transfer_one(struct spi_controller *ctlr,
omap2_mcspi_set_enable(spi, 0);
if (mcspi->fifo_depth > 0)
- omap2_mcspi_set_fifo(spi, t, 0, 0);
+ omap2_mcspi_set_fifo(spi, t, 0);
out:
/* Restore defaults if they were overriden */
@@ -1375,7 +1254,7 @@ static int omap2_mcspi_transfer_one(struct spi_controller *ctlr,
omap2_mcspi_set_cs(spi, !(spi->mode & SPI_CS_HIGH));
if (mcspi->fifo_depth > 0 && t)
- omap2_mcspi_set_fifo(spi, t, 0, 0);
+ omap2_mcspi_set_fifo(spi, t, 0);
return status;
}
diff --git a/drivers/spi/spi-ppc4xx.c b/drivers/spi/spi-ppc4xx.c
index 03aab66..82d6264 100644
--- a/drivers/spi/spi-ppc4xx.c
+++ b/drivers/spi/spi-ppc4xx.c
@@ -25,11 +25,13 @@
#include <linux/slab.h>
#include <linux/errno.h>
#include <linux/wait.h>
+#include <linux/platform_device.h>
#include <linux/of_address.h>
#include <linux/of_irq.h>
#include <linux/of_platform.h>
#include <linux/interrupt.h>
#include <linux/delay.h>
+#include <linux/platform_device.h>
#include <linux/spi/spi.h>
#include <linux/spi/spi_bitbang.h>
@@ -166,10 +168,8 @@ static int spi_ppc4xx_setupxfer(struct spi_device *spi, struct spi_transfer *t)
int scr;
u8 cdm = 0;
u32 speed;
- u8 bits_per_word;
/* Start with the generic configuration for this device. */
- bits_per_word = spi->bits_per_word;
speed = spi->max_speed_hz;
/*
@@ -177,9 +177,6 @@ static int spi_ppc4xx_setupxfer(struct spi_device *spi, struct spi_transfer *t)
* the transfer to overwrite the generic configuration with zeros.
*/
if (t) {
- if (t->bits_per_word)
- bits_per_word = t->bits_per_word;
-
if (t->speed_hz)
speed = min(t->speed_hz, spi->max_speed_hz);
}
diff --git a/drivers/staging/iio/impedance-analyzer/ad5933.c b/drivers/staging/iio/impedance-analyzer/ad5933.c
index e748a5d..9149d41 100644
--- a/drivers/staging/iio/impedance-analyzer/ad5933.c
+++ b/drivers/staging/iio/impedance-analyzer/ad5933.c
@@ -608,7 +608,7 @@ static void ad5933_work(struct work_struct *work)
struct ad5933_state, work.work);
struct iio_dev *indio_dev = i2c_get_clientdata(st->client);
__be16 buf[2];
- int val[2];
+ u16 val[2];
unsigned char status;
int ret;
diff --git a/drivers/staging/media/atomisp/pci/atomisp_cmd.c b/drivers/staging/media/atomisp/pci/atomisp_cmd.c
index f44e641..d0db2ef 100644
--- a/drivers/staging/media/atomisp/pci/atomisp_cmd.c
+++ b/drivers/staging/media/atomisp/pci/atomisp_cmd.c
@@ -3723,12 +3723,10 @@ void atomisp_get_padding(struct atomisp_device *isp, u32 width, u32 height,
static int atomisp_set_crop(struct atomisp_device *isp,
const struct v4l2_mbus_framefmt *format,
+ struct v4l2_subdev_state *sd_state,
int which)
{
struct atomisp_input_subdev *input = &isp->inputs[isp->asd.input_curr];
- struct v4l2_subdev_state pad_state = {
- .pads = &input->pad_cfg,
- };
struct v4l2_subdev_selection sel = {
.which = which,
.target = V4L2_SEL_TGT_CROP,
@@ -3754,7 +3752,7 @@ static int atomisp_set_crop(struct atomisp_device *isp,
sel.r.left = ((input->native_rect.width - sel.r.width) / 2) & ~1;
sel.r.top = ((input->native_rect.height - sel.r.height) / 2) & ~1;
- ret = v4l2_subdev_call(input->camera, pad, set_selection, &pad_state, &sel);
+ ret = v4l2_subdev_call(input->camera, pad, set_selection, sd_state, &sel);
if (ret)
dev_err(isp->dev, "Error setting crop to %ux%u @%ux%u: %d\n",
sel.r.width, sel.r.height, sel.r.left, sel.r.top, ret);
@@ -3770,9 +3768,6 @@ int atomisp_try_fmt(struct atomisp_device *isp, struct v4l2_pix_format *f,
const struct atomisp_format_bridge *fmt, *snr_fmt;
struct atomisp_sub_device *asd = &isp->asd;
struct atomisp_input_subdev *input = &isp->inputs[asd->input_curr];
- struct v4l2_subdev_state pad_state = {
- .pads = &input->pad_cfg,
- };
struct v4l2_subdev_format format = {
.which = V4L2_SUBDEV_FORMAT_TRY,
};
@@ -3809,11 +3804,16 @@ int atomisp_try_fmt(struct atomisp_device *isp, struct v4l2_pix_format *f,
dev_dbg(isp->dev, "try_mbus_fmt: asking for %ux%u\n",
format.format.width, format.format.height);
- ret = atomisp_set_crop(isp, &format.format, V4L2_SUBDEV_FORMAT_TRY);
- if (ret)
- return ret;
+ v4l2_subdev_lock_state(input->try_sd_state);
- ret = v4l2_subdev_call(input->camera, pad, set_fmt, &pad_state, &format);
+ ret = atomisp_set_crop(isp, &format.format, input->try_sd_state,
+ V4L2_SUBDEV_FORMAT_TRY);
+ if (ret == 0)
+ ret = v4l2_subdev_call(input->camera, pad, set_fmt,
+ input->try_sd_state, &format);
+
+ v4l2_subdev_unlock_state(input->try_sd_state);
+
if (ret)
return ret;
@@ -4238,9 +4238,7 @@ static int atomisp_set_fmt_to_snr(struct video_device *vdev, const struct v4l2_p
struct atomisp_device *isp = asd->isp;
struct atomisp_input_subdev *input = &isp->inputs[asd->input_curr];
const struct atomisp_format_bridge *format;
- struct v4l2_subdev_state pad_state = {
- .pads = &input->pad_cfg,
- };
+ struct v4l2_subdev_state *act_sd_state;
struct v4l2_subdev_format vformat = {
.which = V4L2_SUBDEV_FORMAT_TRY,
};
@@ -4268,12 +4266,18 @@ static int atomisp_set_fmt_to_snr(struct video_device *vdev, const struct v4l2_p
/* Disable dvs if resolution can't be supported by sensor */
if (asd->params.video_dis_en && asd->run_mode->val == ATOMISP_RUN_MODE_VIDEO) {
- ret = atomisp_set_crop(isp, &vformat.format, V4L2_SUBDEV_FORMAT_TRY);
- if (ret)
- return ret;
+ v4l2_subdev_lock_state(input->try_sd_state);
- vformat.which = V4L2_SUBDEV_FORMAT_TRY;
- ret = v4l2_subdev_call(input->camera, pad, set_fmt, &pad_state, &vformat);
+ ret = atomisp_set_crop(isp, &vformat.format, input->try_sd_state,
+ V4L2_SUBDEV_FORMAT_TRY);
+ if (ret == 0) {
+ vformat.which = V4L2_SUBDEV_FORMAT_TRY;
+ ret = v4l2_subdev_call(input->camera, pad, set_fmt,
+ input->try_sd_state, &vformat);
+ }
+
+ v4l2_subdev_unlock_state(input->try_sd_state);
+
if (ret)
return ret;
@@ -4291,12 +4295,18 @@ static int atomisp_set_fmt_to_snr(struct video_device *vdev, const struct v4l2_p
}
}
- ret = atomisp_set_crop(isp, &vformat.format, V4L2_SUBDEV_FORMAT_ACTIVE);
- if (ret)
- return ret;
+ act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
- vformat.which = V4L2_SUBDEV_FORMAT_ACTIVE;
- ret = v4l2_subdev_call(input->camera, pad, set_fmt, NULL, &vformat);
+ ret = atomisp_set_crop(isp, &vformat.format, act_sd_state,
+ V4L2_SUBDEV_FORMAT_ACTIVE);
+ if (ret == 0) {
+ vformat.which = V4L2_SUBDEV_FORMAT_ACTIVE;
+ ret = v4l2_subdev_call(input->camera, pad, set_fmt, act_sd_state, &vformat);
+ }
+
+ if (act_sd_state)
+ v4l2_subdev_unlock_state(act_sd_state);
+
if (ret)
return ret;
diff --git a/drivers/staging/media/atomisp/pci/atomisp_internal.h b/drivers/staging/media/atomisp/pci/atomisp_internal.h
index f7b4bee..d5b077e 100644
--- a/drivers/staging/media/atomisp/pci/atomisp_internal.h
+++ b/drivers/staging/media/atomisp/pci/atomisp_internal.h
@@ -132,8 +132,8 @@ struct atomisp_input_subdev {
/* Sensor rects for sensors which support crop */
struct v4l2_rect native_rect;
struct v4l2_rect active_rect;
- /* Sensor pad_cfg for which == V4L2_SUBDEV_FORMAT_TRY calls */
- struct v4l2_subdev_pad_config pad_cfg;
+ /* Sensor state for which == V4L2_SUBDEV_FORMAT_TRY calls */
+ struct v4l2_subdev_state *try_sd_state;
struct v4l2_subdev *motor;
diff --git a/drivers/staging/media/atomisp/pci/atomisp_ioctl.c b/drivers/staging/media/atomisp/pci/atomisp_ioctl.c
index 01b7fa9..5b2d88c 100644
--- a/drivers/staging/media/atomisp/pci/atomisp_ioctl.c
+++ b/drivers/staging/media/atomisp/pci/atomisp_ioctl.c
@@ -781,12 +781,20 @@ static int atomisp_enum_framesizes(struct file *file, void *priv,
.which = V4L2_SUBDEV_FORMAT_ACTIVE,
.code = input->code,
};
+ struct v4l2_subdev_state *act_sd_state;
int ret;
+ if (!input->camera)
+ return -EINVAL;
+
if (input->crop_support)
return atomisp_enum_framesizes_crop(isp, fsize);
- ret = v4l2_subdev_call(input->camera, pad, enum_frame_size, NULL, &fse);
+ act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
+ ret = v4l2_subdev_call(input->camera, pad, enum_frame_size,
+ act_sd_state, &fse);
+ if (act_sd_state)
+ v4l2_subdev_unlock_state(act_sd_state);
if (ret)
return ret;
@@ -803,18 +811,25 @@ static int atomisp_enum_frameintervals(struct file *file, void *priv,
struct video_device *vdev = video_devdata(file);
struct atomisp_device *isp = video_get_drvdata(vdev);
struct atomisp_sub_device *asd = atomisp_to_video_pipe(vdev)->asd;
+ struct atomisp_input_subdev *input = &isp->inputs[asd->input_curr];
struct v4l2_subdev_frame_interval_enum fie = {
- .code = atomisp_in_fmt_conv[0].code,
+ .code = atomisp_in_fmt_conv[0].code,
.index = fival->index,
.width = fival->width,
.height = fival->height,
.which = V4L2_SUBDEV_FORMAT_ACTIVE,
};
+ struct v4l2_subdev_state *act_sd_state;
int ret;
- ret = v4l2_subdev_call(isp->inputs[asd->input_curr].camera,
- pad, enum_frame_interval, NULL,
- &fie);
+ if (!input->camera)
+ return -EINVAL;
+
+ act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
+ ret = v4l2_subdev_call(input->camera, pad, enum_frame_interval,
+ act_sd_state, &fie);
+ if (act_sd_state)
+ v4l2_subdev_unlock_state(act_sd_state);
if (ret)
return ret;
@@ -830,30 +845,25 @@ static int atomisp_enum_fmt_cap(struct file *file, void *fh,
struct video_device *vdev = video_devdata(file);
struct atomisp_device *isp = video_get_drvdata(vdev);
struct atomisp_sub_device *asd = atomisp_to_video_pipe(vdev)->asd;
+ struct atomisp_input_subdev *input = &isp->inputs[asd->input_curr];
struct v4l2_subdev_mbus_code_enum code = {
.which = V4L2_SUBDEV_FORMAT_ACTIVE,
};
const struct atomisp_format_bridge *format;
- struct v4l2_subdev *camera;
+ struct v4l2_subdev_state *act_sd_state;
unsigned int i, fi = 0;
- int rval;
+ int ret;
- camera = isp->inputs[asd->input_curr].camera;
- if(!camera) {
- dev_err(isp->dev, "%s(): camera is NULL, device is %s\n",
- __func__, vdev->name);
+ if (!input->camera)
return -EINVAL;
- }
- rval = v4l2_subdev_call(camera, pad, enum_mbus_code, NULL, &code);
- if (rval == -ENOIOCTLCMD) {
- dev_warn(isp->dev,
- "enum_mbus_code pad op not supported by %s. Please fix your sensor driver!\n",
- camera->name);
- }
-
- if (rval)
- return rval;
+ act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
+ ret = v4l2_subdev_call(input->camera, pad, enum_mbus_code,
+ act_sd_state, &code);
+ if (act_sd_state)
+ v4l2_subdev_unlock_state(act_sd_state);
+ if (ret)
+ return ret;
for (i = 0; i < ARRAY_SIZE(atomisp_output_fmts); i++) {
format = &atomisp_output_fmts[i];
diff --git a/drivers/staging/media/atomisp/pci/atomisp_v4l2.c b/drivers/staging/media/atomisp/pci/atomisp_v4l2.c
index c1c8501..547e144 100644
--- a/drivers/staging/media/atomisp/pci/atomisp_v4l2.c
+++ b/drivers/staging/media/atomisp/pci/atomisp_v4l2.c
@@ -862,6 +862,9 @@ static void atomisp_unregister_entities(struct atomisp_device *isp)
v4l2_device_unregister(&isp->v4l2_dev);
media_device_unregister(&isp->media_dev);
media_device_cleanup(&isp->media_dev);
+
+ for (i = 0; i < isp->input_cnt; i++)
+ __v4l2_subdev_state_free(isp->inputs[i].try_sd_state);
}
static int atomisp_register_entities(struct atomisp_device *isp)
@@ -933,32 +936,49 @@ static int atomisp_register_entities(struct atomisp_device *isp)
static void atomisp_init_sensor(struct atomisp_input_subdev *input)
{
+ static struct lock_class_key try_sd_state_key;
struct v4l2_subdev_mbus_code_enum mbus_code_enum = { };
struct v4l2_subdev_frame_size_enum fse = { };
- struct v4l2_subdev_state sd_state = {
- .pads = &input->pad_cfg,
- };
struct v4l2_subdev_selection sel = { };
+ struct v4l2_subdev_state *try_sd_state, *act_sd_state;
int i, err;
+ /*
+ * FIXME: Drivers are not supposed to use __v4l2_subdev_state_alloc()
+ * but atomisp needs this for try_fmt on its /dev/video# node since
+ * it emulates a normal v4l2 device there, passing through try_fmt /
+ * set_fmt to the sensor.
+ */
+ try_sd_state = __v4l2_subdev_state_alloc(input->camera,
+ "atomisp:try_sd_state->lock", &try_sd_state_key);
+ if (IS_ERR(try_sd_state))
+ return;
+
+ input->try_sd_state = try_sd_state;
+
+ act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
+
mbus_code_enum.which = V4L2_SUBDEV_FORMAT_ACTIVE;
- err = v4l2_subdev_call(input->camera, pad, enum_mbus_code, NULL, &mbus_code_enum);
+ err = v4l2_subdev_call(input->camera, pad, enum_mbus_code,
+ act_sd_state, &mbus_code_enum);
if (!err)
input->code = mbus_code_enum.code;
sel.which = V4L2_SUBDEV_FORMAT_ACTIVE;
sel.target = V4L2_SEL_TGT_NATIVE_SIZE;
- err = v4l2_subdev_call(input->camera, pad, get_selection, NULL, &sel);
+ err = v4l2_subdev_call(input->camera, pad, get_selection,
+ act_sd_state, &sel);
if (err)
- return;
+ goto unlock_act_sd_state;
input->native_rect = sel.r;
sel.which = V4L2_SUBDEV_FORMAT_ACTIVE;
sel.target = V4L2_SEL_TGT_CROP_DEFAULT;
- err = v4l2_subdev_call(input->camera, pad, get_selection, NULL, &sel);
+ err = v4l2_subdev_call(input->camera, pad, get_selection,
+ act_sd_state, &sel);
if (err)
- return;
+ goto unlock_act_sd_state;
input->active_rect = sel.r;
@@ -973,7 +993,8 @@ static void atomisp_init_sensor(struct atomisp_input_subdev *input)
fse.code = input->code;
fse.which = V4L2_SUBDEV_FORMAT_ACTIVE;
- err = v4l2_subdev_call(input->camera, pad, enum_frame_size, NULL, &fse);
+ err = v4l2_subdev_call(input->camera, pad, enum_frame_size,
+ act_sd_state, &fse);
if (err)
break;
@@ -989,22 +1010,26 @@ static void atomisp_init_sensor(struct atomisp_input_subdev *input)
* for padding, set the crop rect to cover the entire sensor instead
* of only the default active area.
*
- * Do this for both try and active formats since the try_crop rect in
- * pad_cfg may influence (clamp) future try_fmt calls with which == try.
+ * Do this for both try and active formats since the crop rect in
+ * try_sd_state may influence (clamp size) in calls with which == try.
*/
sel.which = V4L2_SUBDEV_FORMAT_TRY;
sel.target = V4L2_SEL_TGT_CROP;
sel.r = input->native_rect;
- err = v4l2_subdev_call(input->camera, pad, set_selection, &sd_state, &sel);
+ v4l2_subdev_lock_state(input->try_sd_state);
+ err = v4l2_subdev_call(input->camera, pad, set_selection,
+ input->try_sd_state, &sel);
+ v4l2_subdev_unlock_state(input->try_sd_state);
if (err)
- return;
+ goto unlock_act_sd_state;
sel.which = V4L2_SUBDEV_FORMAT_ACTIVE;
sel.target = V4L2_SEL_TGT_CROP;
sel.r = input->native_rect;
- err = v4l2_subdev_call(input->camera, pad, set_selection, NULL, &sel);
+ err = v4l2_subdev_call(input->camera, pad, set_selection,
+ act_sd_state, &sel);
if (err)
- return;
+ goto unlock_act_sd_state;
dev_info(input->camera->dev, "Supports crop native %dx%d active %dx%d binning %d\n",
input->native_rect.width, input->native_rect.height,
@@ -1012,6 +1037,10 @@ static void atomisp_init_sensor(struct atomisp_input_subdev *input)
input->binning_support);
input->crop_support = true;
+
+unlock_act_sd_state:
+ if (act_sd_state)
+ v4l2_subdev_unlock_state(act_sd_state);
}
int atomisp_register_device_nodes(struct atomisp_device *isp)
diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c
index a5f5898..c1fbcdd 100644
--- a/drivers/target/target_core_configfs.c
+++ b/drivers/target/target_core_configfs.c
@@ -759,6 +759,29 @@ static ssize_t emulate_tas_store(struct config_item *item,
return count;
}
+static int target_try_configure_unmap(struct se_device *dev,
+ const char *config_opt)
+{
+ if (!dev->transport->configure_unmap) {
+ pr_err("Generic Block Discard not supported\n");
+ return -ENOSYS;
+ }
+
+ if (!target_dev_configured(dev)) {
+ pr_err("Generic Block Discard setup for %s requires device to be configured\n",
+ config_opt);
+ return -ENODEV;
+ }
+
+ if (!dev->transport->configure_unmap(dev)) {
+ pr_err("Generic Block Discard setup for %s failed\n",
+ config_opt);
+ return -ENOSYS;
+ }
+
+ return 0;
+}
+
static ssize_t emulate_tpu_store(struct config_item *item,
const char *page, size_t count)
{
@@ -776,11 +799,9 @@ static ssize_t emulate_tpu_store(struct config_item *item,
* Discard supported is detected iblock_create_virtdevice().
*/
if (flag && !da->max_unmap_block_desc_count) {
- if (!dev->transport->configure_unmap ||
- !dev->transport->configure_unmap(dev)) {
- pr_err("Generic Block Discard not supported\n");
- return -ENOSYS;
- }
+ ret = target_try_configure_unmap(dev, "emulate_tpu");
+ if (ret)
+ return ret;
}
da->emulate_tpu = flag;
@@ -806,11 +827,9 @@ static ssize_t emulate_tpws_store(struct config_item *item,
* Discard supported is detected iblock_create_virtdevice().
*/
if (flag && !da->max_unmap_block_desc_count) {
- if (!dev->transport->configure_unmap ||
- !dev->transport->configure_unmap(dev)) {
- pr_err("Generic Block Discard not supported\n");
- return -ENOSYS;
- }
+ ret = target_try_configure_unmap(dev, "emulate_tpws");
+ if (ret)
+ return ret;
}
da->emulate_tpws = flag;
@@ -1022,12 +1041,9 @@ static ssize_t unmap_zeroes_data_store(struct config_item *item,
* Discard supported is detected iblock_configure_device().
*/
if (flag && !da->max_unmap_block_desc_count) {
- if (!dev->transport->configure_unmap ||
- !dev->transport->configure_unmap(dev)) {
- pr_err("dev[%p]: Thin Provisioning LBPRZ will not be set because max_unmap_block_desc_count is zero\n",
- da->da_dev);
- return -ENOSYS;
- }
+ ret = target_try_configure_unmap(dev, "unmap_zeroes_data");
+ if (ret)
+ return ret;
}
da->unmap_zeroes_data = flag;
pr_debug("dev[%p]: SE Device Thin Provisioning LBPRZ bit: %d\n",
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 8eb9eb7..7f6ca81 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -91,7 +91,7 @@ static int iblock_configure_device(struct se_device *dev)
{
struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
struct request_queue *q;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct block_device *bd;
struct blk_integrity *bi;
blk_mode_t mode = BLK_OPEN_READ;
@@ -117,14 +117,14 @@ static int iblock_configure_device(struct se_device *dev)
else
dev->dev_flags |= DF_READ_ONLY;
- bdev_handle = bdev_open_by_path(ib_dev->ibd_udev_path, mode, ib_dev,
+ bdev_file = bdev_file_open_by_path(ib_dev->ibd_udev_path, mode, ib_dev,
NULL);
- if (IS_ERR(bdev_handle)) {
- ret = PTR_ERR(bdev_handle);
+ if (IS_ERR(bdev_file)) {
+ ret = PTR_ERR(bdev_file);
goto out_free_bioset;
}
- ib_dev->ibd_bdev_handle = bdev_handle;
- ib_dev->ibd_bd = bd = bdev_handle->bdev;
+ ib_dev->ibd_bdev_file = bdev_file;
+ ib_dev->ibd_bd = bd = file_bdev(bdev_file);
q = bdev_get_queue(bd);
@@ -180,7 +180,7 @@ static int iblock_configure_device(struct se_device *dev)
return 0;
out_blkdev_put:
- bdev_release(ib_dev->ibd_bdev_handle);
+ fput(ib_dev->ibd_bdev_file);
out_free_bioset:
bioset_exit(&ib_dev->ibd_bio_set);
out:
@@ -205,8 +205,8 @@ static void iblock_destroy_device(struct se_device *dev)
{
struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
- if (ib_dev->ibd_bdev_handle)
- bdev_release(ib_dev->ibd_bdev_handle);
+ if (ib_dev->ibd_bdev_file)
+ fput(ib_dev->ibd_bdev_file);
bioset_exit(&ib_dev->ibd_bio_set);
}
diff --git a/drivers/target/target_core_iblock.h b/drivers/target/target_core_iblock.h
index 683f9a5..91f6f42 100644
--- a/drivers/target/target_core_iblock.h
+++ b/drivers/target/target_core_iblock.h
@@ -32,7 +32,7 @@ struct iblock_dev {
u32 ibd_flags;
struct bio_set ibd_bio_set;
struct block_device *ibd_bd;
- struct bdev_handle *ibd_bdev_handle;
+ struct file *ibd_bdev_file;
bool ibd_readonly;
struct iblock_dev_plug *ibd_plug;
} ____cacheline_aligned;
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index 41b7489..f98ebb1 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -352,7 +352,7 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
struct pscsi_hba_virt *phv = dev->se_hba->hba_ptr;
struct pscsi_dev_virt *pdv = PSCSI_DEV(dev);
struct Scsi_Host *sh = sd->host;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
int ret;
if (scsi_device_get(sd)) {
@@ -366,18 +366,18 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
* Claim exclusive struct block_device access to struct scsi_device
* for TYPE_DISK and TYPE_ZBC using supplied udev_path
*/
- bdev_handle = bdev_open_by_path(dev->udev_path,
+ bdev_file = bdev_file_open_by_path(dev->udev_path,
BLK_OPEN_WRITE | BLK_OPEN_READ, pdv, NULL);
- if (IS_ERR(bdev_handle)) {
+ if (IS_ERR(bdev_file)) {
pr_err("pSCSI: bdev_open_by_path() failed\n");
scsi_device_put(sd);
- return PTR_ERR(bdev_handle);
+ return PTR_ERR(bdev_file);
}
- pdv->pdv_bdev_handle = bdev_handle;
+ pdv->pdv_bdev_file = bdev_file;
ret = pscsi_add_device_to_list(dev, sd);
if (ret) {
- bdev_release(bdev_handle);
+ fput(bdev_file);
scsi_device_put(sd);
return ret;
}
@@ -564,9 +564,9 @@ static void pscsi_destroy_device(struct se_device *dev)
* from pscsi_create_type_disk()
*/
if ((sd->type == TYPE_DISK || sd->type == TYPE_ZBC) &&
- pdv->pdv_bdev_handle) {
- bdev_release(pdv->pdv_bdev_handle);
- pdv->pdv_bdev_handle = NULL;
+ pdv->pdv_bdev_file) {
+ fput(pdv->pdv_bdev_file);
+ pdv->pdv_bdev_file = NULL;
}
/*
* For HBA mode PHV_LLD_SCSI_HOST_NO, release the reference
@@ -907,12 +907,15 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
return 0;
fail:
- if (bio)
- bio_put(bio);
+ if (bio) {
+ bio_uninit(bio);
+ kfree(bio);
+ }
while (req->bio) {
bio = req->bio;
req->bio = bio->bi_next;
- bio_put(bio);
+ bio_uninit(bio);
+ kfree(bio);
}
req->biotail = NULL;
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
@@ -994,8 +997,8 @@ static sector_t pscsi_get_blocks(struct se_device *dev)
{
struct pscsi_dev_virt *pdv = PSCSI_DEV(dev);
- if (pdv->pdv_bdev_handle)
- return bdev_nr_sectors(pdv->pdv_bdev_handle->bdev);
+ if (pdv->pdv_bdev_file)
+ return bdev_nr_sectors(file_bdev(pdv->pdv_bdev_file));
return 0;
}
diff --git a/drivers/target/target_core_pscsi.h b/drivers/target/target_core_pscsi.h
index b0a3ef1..9acaa21 100644
--- a/drivers/target/target_core_pscsi.h
+++ b/drivers/target/target_core_pscsi.h
@@ -37,7 +37,7 @@ struct pscsi_dev_virt {
int pdv_channel_id;
int pdv_target_id;
int pdv_lun_id;
- struct bdev_handle *pdv_bdev_handle;
+ struct file *pdv_bdev_file;
struct scsi_device *pdv_sd;
struct Scsi_Host *pdv_lld_host;
} ____cacheline_aligned;
diff --git a/drivers/tee/optee/device.c b/drivers/tee/optee/device.c
index 4b10921..1892e49 100644
--- a/drivers/tee/optee/device.c
+++ b/drivers/tee/optee/device.c
@@ -90,13 +90,14 @@ static int optee_register_device(const uuid_t *device_uuid, u32 func)
if (rc) {
pr_err("device registration failed, err: %d\n", rc);
put_device(&optee_device->dev);
+ return rc;
}
if (func == PTA_CMD_GET_DEVICES_SUPP)
device_create_file(&optee_device->dev,
&dev_attr_need_supplicant);
- return rc;
+ return 0;
}
static int __optee_enumerate_devices(u32 func)
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index 900114b..fad40c4 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -1249,6 +1249,9 @@ int tb_port_update_credits(struct tb_port *port)
ret = tb_port_do_update_credits(port);
if (ret)
return ret;
+
+ if (!port->dual_link_port)
+ return 0;
return tb_port_do_update_credits(port->dual_link_port);
}
diff --git a/drivers/thunderbolt/tb_regs.h b/drivers/thunderbolt/tb_regs.h
index 87e4795..6f798f6 100644
--- a/drivers/thunderbolt/tb_regs.h
+++ b/drivers/thunderbolt/tb_regs.h
@@ -203,7 +203,7 @@ struct tb_regs_switch_header {
#define ROUTER_CS_5_WOP BIT(1)
#define ROUTER_CS_5_WOU BIT(2)
#define ROUTER_CS_5_WOD BIT(3)
-#define ROUTER_CS_5_C3S BIT(23)
+#define ROUTER_CS_5_CNS BIT(23)
#define ROUTER_CS_5_PTO BIT(24)
#define ROUTER_CS_5_UTO BIT(25)
#define ROUTER_CS_5_HCO BIT(26)
diff --git a/drivers/thunderbolt/usb4.c b/drivers/thunderbolt/usb4.c
index f8f0d24..1515eff 100644
--- a/drivers/thunderbolt/usb4.c
+++ b/drivers/thunderbolt/usb4.c
@@ -290,7 +290,7 @@ int usb4_switch_setup(struct tb_switch *sw)
}
/* TBT3 supported by the CM */
- val |= ROUTER_CS_5_C3S;
+ val &= ~ROUTER_CS_5_CNS;
return tb_sw_write(sw, &val, TB_CFG_SWITCH, ROUTER_CS_5, 1);
}
diff --git a/drivers/tty/hvc/Kconfig b/drivers/tty/hvc/Kconfig
index 6e05c5c..c2a4e88 100644
--- a/drivers/tty/hvc/Kconfig
+++ b/drivers/tty/hvc/Kconfig
@@ -108,13 +108,15 @@
config HVC_RISCV_SBI
bool "RISC-V SBI console support"
- depends on RISCV_SBI
+ depends on RISCV_SBI && NONPORTABLE
select HVC_DRIVER
help
This enables support for console output via RISC-V SBI calls, which
- is normally used only during boot to output printk.
+ is normally used only during boot to output printk. This driver
+ conflicts with real console drivers and should not be enabled on
+ systems that directly access the console.
- If you don't know what do to here, say Y.
+ If you don't know what do to here, say N.
config HVCS
tristate "IBM Hypervisor Virtual Console Server support"
diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c
index 2d1f350..c1d43f0 100644
--- a/drivers/tty/serial/8250/8250_dw.c
+++ b/drivers/tty/serial/8250/8250_dw.c
@@ -357,9 +357,9 @@ static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
long rate;
int ret;
- clk_disable_unprepare(d->clk);
rate = clk_round_rate(d->clk, newrate);
- if (rate > 0) {
+ if (rate > 0 && p->uartclk != rate) {
+ clk_disable_unprepare(d->clk);
/*
* Note that any clock-notifer worker will block in
* serial8250_update_uartclk() until we are done.
@@ -367,8 +367,8 @@ static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
ret = clk_set_rate(d->clk, newrate);
if (!ret)
p->uartclk = rate;
+ clk_prepare_enable(d->clk);
}
- clk_prepare_enable(d->clk);
dw8250_do_set_termios(p, termios, old);
}
diff --git a/drivers/tty/serial/8250/8250_pci1xxxx.c b/drivers/tty/serial/8250/8250_pci1xxxx.c
index cd25892..2dda737 100644
--- a/drivers/tty/serial/8250/8250_pci1xxxx.c
+++ b/drivers/tty/serial/8250/8250_pci1xxxx.c
@@ -302,7 +302,7 @@ static void pci1xxxx_process_read_data(struct uart_port *port,
* to read, the data is received one byte at a time.
*/
while (valid_burst_count--) {
- if (*buff_index >= (RX_BUF_SIZE - UART_BURST_SIZE))
+ if (*buff_index > (RX_BUF_SIZE - UART_BURST_SIZE))
break;
burst_buf = (u32 *)&rx_buff[*buff_index];
*burst_buf = readl(port->membase + UART_RX_BURST_FIFO);
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index fccec16..cf2c890 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -1339,11 +1339,41 @@ static void pl011_start_tx_pio(struct uart_amba_port *uap)
}
}
+static void pl011_rs485_tx_start(struct uart_amba_port *uap)
+{
+ struct uart_port *port = &uap->port;
+ u32 cr;
+
+ /* Enable transmitter */
+ cr = pl011_read(uap, REG_CR);
+ cr |= UART011_CR_TXE;
+
+ /* Disable receiver if half-duplex */
+ if (!(port->rs485.flags & SER_RS485_RX_DURING_TX))
+ cr &= ~UART011_CR_RXE;
+
+ if (port->rs485.flags & SER_RS485_RTS_ON_SEND)
+ cr &= ~UART011_CR_RTS;
+ else
+ cr |= UART011_CR_RTS;
+
+ pl011_write(cr, uap, REG_CR);
+
+ if (port->rs485.delay_rts_before_send)
+ mdelay(port->rs485.delay_rts_before_send);
+
+ uap->rs485_tx_started = true;
+}
+
static void pl011_start_tx(struct uart_port *port)
{
struct uart_amba_port *uap =
container_of(port, struct uart_amba_port, port);
+ if ((uap->port.rs485.flags & SER_RS485_ENABLED) &&
+ !uap->rs485_tx_started)
+ pl011_rs485_tx_start(uap);
+
if (!pl011_dma_tx_start(uap))
pl011_start_tx_pio(uap);
}
@@ -1424,42 +1454,12 @@ static bool pl011_tx_char(struct uart_amba_port *uap, unsigned char c,
return true;
}
-static void pl011_rs485_tx_start(struct uart_amba_port *uap)
-{
- struct uart_port *port = &uap->port;
- u32 cr;
-
- /* Enable transmitter */
- cr = pl011_read(uap, REG_CR);
- cr |= UART011_CR_TXE;
-
- /* Disable receiver if half-duplex */
- if (!(port->rs485.flags & SER_RS485_RX_DURING_TX))
- cr &= ~UART011_CR_RXE;
-
- if (port->rs485.flags & SER_RS485_RTS_ON_SEND)
- cr &= ~UART011_CR_RTS;
- else
- cr |= UART011_CR_RTS;
-
- pl011_write(cr, uap, REG_CR);
-
- if (port->rs485.delay_rts_before_send)
- mdelay(port->rs485.delay_rts_before_send);
-
- uap->rs485_tx_started = true;
-}
-
/* Returns true if tx interrupts have to be (kept) enabled */
static bool pl011_tx_chars(struct uart_amba_port *uap, bool from_irq)
{
struct circ_buf *xmit = &uap->port.state->xmit;
int count = uap->fifosize >> 1;
- if ((uap->port.rs485.flags & SER_RS485_ENABLED) &&
- !uap->rs485_tx_started)
- pl011_rs485_tx_start(uap);
-
if (uap->port.x_char) {
if (!pl011_tx_char(uap, uap->port.x_char, from_irq))
return true;
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 5ddf110..bbcbc91 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -2345,9 +2345,12 @@ lpuart32_set_termios(struct uart_port *port, struct ktermios *termios,
lpuart32_write(&sport->port, bd, UARTBAUD);
lpuart32_serial_setbrg(sport, baud);
- lpuart32_write(&sport->port, modem, UARTMODIR);
- lpuart32_write(&sport->port, ctrl, UARTCTRL);
+ /* disable CTS before enabling UARTCTRL_TE to avoid pending idle preamble */
+ lpuart32_write(&sport->port, modem & ~UARTMODIR_TXCTSE, UARTMODIR);
/* restore control register */
+ lpuart32_write(&sport->port, ctrl, UARTCTRL);
+ /* re-enable the CTS if needed */
+ lpuart32_write(&sport->port, modem, UARTMODIR);
if ((ctrl & (UARTCTRL_PE | UARTCTRL_M)) == UARTCTRL_PE)
sport->is_cs7 = true;
diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index 4aa72d5..e148132 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -462,8 +462,7 @@ static void imx_uart_stop_tx(struct uart_port *port)
}
}
-/* called with port.lock taken and irqs off */
-static void imx_uart_stop_rx(struct uart_port *port)
+static void imx_uart_stop_rx_with_loopback_ctrl(struct uart_port *port, bool loopback)
{
struct imx_port *sport = (struct imx_port *)port;
u32 ucr1, ucr2, ucr4, uts;
@@ -485,7 +484,7 @@ static void imx_uart_stop_rx(struct uart_port *port)
/* See SER_RS485_ENABLED/UTS_LOOP comment in imx_uart_probe() */
if (port->rs485.flags & SER_RS485_ENABLED &&
port->rs485.flags & SER_RS485_RTS_ON_SEND &&
- sport->have_rtscts && !sport->have_rtsgpio) {
+ sport->have_rtscts && !sport->have_rtsgpio && loopback) {
uts = imx_uart_readl(sport, imx_uart_uts_reg(sport));
uts |= UTS_LOOP;
imx_uart_writel(sport, uts, imx_uart_uts_reg(sport));
@@ -498,6 +497,16 @@ static void imx_uart_stop_rx(struct uart_port *port)
}
/* called with port.lock taken and irqs off */
+static void imx_uart_stop_rx(struct uart_port *port)
+{
+ /*
+ * Stop RX and enable loopback in order to make sure RS485 bus
+ * is not blocked. Se comment in imx_uart_probe().
+ */
+ imx_uart_stop_rx_with_loopback_ctrl(port, true);
+}
+
+/* called with port.lock taken and irqs off */
static void imx_uart_enable_ms(struct uart_port *port)
{
struct imx_port *sport = (struct imx_port *)port;
@@ -682,9 +691,14 @@ static void imx_uart_start_tx(struct uart_port *port)
imx_uart_rts_inactive(sport, &ucr2);
imx_uart_writel(sport, ucr2, UCR2);
+ /*
+ * Since we are about to transmit we can not stop RX
+ * with loopback enabled because that will make our
+ * transmitted data being just looped to RX.
+ */
if (!(port->rs485.flags & SER_RS485_RX_DURING_TX) &&
!port->rs485_rx_during_tx_gpio)
- imx_uart_stop_rx(port);
+ imx_uart_stop_rx_with_loopback_ctrl(port, false);
sport->tx_state = WAIT_AFTER_RTS;
diff --git a/drivers/tty/serial/mxs-auart.c b/drivers/tty/serial/mxs-auart.c
index 3ec72555..4749331 100644
--- a/drivers/tty/serial/mxs-auart.c
+++ b/drivers/tty/serial/mxs-auart.c
@@ -605,13 +605,16 @@ static void mxs_auart_tx_chars(struct mxs_auart_port *s)
return;
}
- pending = uart_port_tx(&s->port, ch,
+ pending = uart_port_tx_flags(&s->port, ch, UART_TX_NOSTOP,
!(mxs_read(s, REG_STAT) & AUART_STAT_TXFF),
mxs_write(ch, s, REG_DATA));
if (pending)
mxs_set(AUART_INTR_TXIEN, s, REG_INTR);
else
mxs_clr(AUART_INTR_TXIEN, s, REG_INTR);
+
+ if (uart_tx_stopped(&s->port))
+ mxs_auart_stop_tx(&s->port);
}
static void mxs_auart_rx_char(struct mxs_auart_port *s)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index e63a8fb..99e0873 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -851,19 +851,21 @@ static void qcom_geni_serial_stop_tx(struct uart_port *uport)
}
static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
- unsigned int remaining)
+ unsigned int chunk)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
struct circ_buf *xmit = &uport->state->xmit;
- unsigned int tx_bytes;
+ unsigned int tx_bytes, c, remaining = chunk;
u8 buf[BYTES_PER_FIFO_WORD];
while (remaining) {
memset(buf, 0, sizeof(buf));
tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
- memcpy(buf, &xmit->buf[xmit->tail], tx_bytes);
- uart_xmit_advance(uport, tx_bytes);
+ for (c = 0; c < tx_bytes ; c++) {
+ buf[c] = xmit->buf[xmit->tail];
+ uart_xmit_advance(uport, 1);
+ }
iowrite32_rep(uport->membase + SE_GENI_TX_FIFOn, buf, 1);
diff --git a/drivers/tty/serial/serial_port.c b/drivers/tty/serial/serial_port.c
index 88975a4..72b6f4f 100644
--- a/drivers/tty/serial/serial_port.c
+++ b/drivers/tty/serial/serial_port.c
@@ -46,8 +46,31 @@ static int serial_port_runtime_resume(struct device *dev)
return 0;
}
+static int serial_port_runtime_suspend(struct device *dev)
+{
+ struct serial_port_device *port_dev = to_serial_base_port_device(dev);
+ struct uart_port *port = port_dev->port;
+ unsigned long flags;
+ bool busy;
+
+ if (port->flags & UPF_DEAD)
+ return 0;
+
+ uart_port_lock_irqsave(port, &flags);
+ busy = __serial_port_busy(port);
+ if (busy)
+ port->ops->start_tx(port);
+ uart_port_unlock_irqrestore(port, flags);
+
+ if (busy)
+ pm_runtime_mark_last_busy(dev);
+
+ return busy ? -EBUSY : 0;
+}
+
static DEFINE_RUNTIME_DEV_PM_OPS(serial_port_pm,
- NULL, serial_port_runtime_resume, NULL);
+ serial_port_runtime_suspend,
+ serial_port_runtime_resume, NULL);
static int serial_port_probe(struct device *dev)
{
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 794b775..693e932 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -251,7 +251,9 @@ static int stm32_usart_config_rs485(struct uart_port *port, struct ktermios *ter
writel_relaxed(cr3, port->membase + ofs->cr3);
writel_relaxed(cr1, port->membase + ofs->cr1);
- rs485conf->flags |= SER_RS485_RX_DURING_TX;
+ if (!port->rs485_rx_during_tx_gpio)
+ rs485conf->flags |= SER_RS485_RX_DURING_TX;
+
} else {
stm32_usart_clr_bits(port, ofs->cr3,
USART_CR3_DEM | USART_CR3_DEP);
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 156efda..38a765e 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -381,7 +381,7 @@ static void vc_uniscr_delete(struct vc_data *vc, unsigned int nr)
u32 *ln = vc->vc_uni_lines[vc->state.y];
unsigned int x = vc->state.x, cols = vc->vc_cols;
- memcpy(&ln[x], &ln[x + nr], (cols - x - nr) * sizeof(*ln));
+ memmove(&ln[x], &ln[x + nr], (cols - x - nr) * sizeof(*ln));
memset32(&ln[cols - nr], ' ', nr);
}
}
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index c502a86..eac7fff 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -1469,7 +1469,7 @@ static int ufshcd_devfreq_target(struct device *dev,
int ret = 0;
struct ufs_hba *hba = dev_get_drvdata(dev);
ktime_t start;
- bool scale_up, sched_clk_scaling_suspend_work = false;
+ bool scale_up = false, sched_clk_scaling_suspend_work = false;
struct list_head *clk_list = &hba->clk_list_head;
struct ufs_clk_info *clki;
unsigned long irq_flags;
@@ -3057,7 +3057,7 @@ bool ufshcd_cmd_inflight(struct scsi_cmnd *cmd)
*/
static int ufshcd_clear_cmd(struct ufs_hba *hba, u32 task_tag)
{
- u32 mask = 1U << task_tag;
+ u32 mask;
unsigned long flags;
int err;
@@ -3075,6 +3075,8 @@ static int ufshcd_clear_cmd(struct ufs_hba *hba, u32 task_tag)
return 0;
}
+ mask = 1U << task_tag;
+
/* clear outstanding transaction before retry */
spin_lock_irqsave(hba->host->host_lock, flags);
ufshcd_utrl_clear(hba, mask);
@@ -6352,7 +6354,6 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
ufshcd_hold(hba);
if (!ufshcd_is_clkgating_allowed(hba))
ufshcd_setup_clocks(hba, true);
- ufshcd_release(hba);
pm_op = hba->is_sys_suspended ? UFS_SYSTEM_PM : UFS_RUNTIME_PM;
ufshcd_vops_resume(hba, pm_op);
} else {
diff --git a/drivers/usb/cdns3/cdns3-gadget.c b/drivers/usb/cdns3/cdns3-gadget.c
index aeca902..fd1beb1 100644
--- a/drivers/usb/cdns3/cdns3-gadget.c
+++ b/drivers/usb/cdns3/cdns3-gadget.c
@@ -828,7 +828,11 @@ void cdns3_gadget_giveback(struct cdns3_endpoint *priv_ep,
return;
}
- if (request->complete) {
+ /*
+ * zlp request is appended by driver, needn't call usb_gadget_giveback_request() to notify
+ * gadget composite driver.
+ */
+ if (request->complete && request->buf != priv_dev->zlp_buf) {
spin_unlock(&priv_dev->lock);
usb_gadget_giveback_request(&priv_ep->endpoint,
request);
@@ -2540,11 +2544,11 @@ static int cdns3_gadget_ep_disable(struct usb_ep *ep)
while (!list_empty(&priv_ep->wa2_descmiss_req_list)) {
priv_req = cdns3_next_priv_request(&priv_ep->wa2_descmiss_req_list);
+ list_del_init(&priv_req->list);
kfree(priv_req->request.buf);
cdns3_gadget_ep_free_request(&priv_ep->endpoint,
&priv_req->request);
- list_del_init(&priv_req->list);
--priv_ep->wa2_counter;
}
diff --git a/drivers/usb/cdns3/core.c b/drivers/usb/cdns3/core.c
index 3354877..465e926 100644
--- a/drivers/usb/cdns3/core.c
+++ b/drivers/usb/cdns3/core.c
@@ -395,7 +395,6 @@ static int cdns_role_set(struct usb_role_switch *sw, enum usb_role role)
return ret;
}
-
/**
* cdns_wakeup_irq - interrupt handler for wakeup events
* @irq: irq number for cdns3/cdnsp core device
diff --git a/drivers/usb/cdns3/drd.c b/drivers/usb/cdns3/drd.c
index 04b6d12..ee917f1 100644
--- a/drivers/usb/cdns3/drd.c
+++ b/drivers/usb/cdns3/drd.c
@@ -156,7 +156,8 @@ bool cdns_is_device(struct cdns *cdns)
*/
static void cdns_otg_disable_irq(struct cdns *cdns)
{
- writel(0, &cdns->otg_irq_regs->ien);
+ if (cdns->version)
+ writel(0, &cdns->otg_irq_regs->ien);
}
/**
@@ -422,15 +423,20 @@ int cdns_drd_init(struct cdns *cdns)
cdns->otg_regs = (void __iomem *)&cdns->otg_v1_regs->cmd;
- if (readl(&cdns->otg_cdnsp_regs->did) == OTG_CDNSP_DID) {
+ state = readl(&cdns->otg_cdnsp_regs->did);
+
+ if (OTG_CDNSP_CHECK_DID(state)) {
cdns->otg_irq_regs = (struct cdns_otg_irq_regs __iomem *)
&cdns->otg_cdnsp_regs->ien;
cdns->version = CDNSP_CONTROLLER_V2;
- } else {
+ } else if (OTG_CDNS3_CHECK_DID(state)) {
cdns->otg_irq_regs = (struct cdns_otg_irq_regs __iomem *)
&cdns->otg_v1_regs->ien;
writel(1, &cdns->otg_v1_regs->simulate);
cdns->version = CDNS3_CONTROLLER_V1;
+ } else {
+ dev_err(cdns->dev, "not supporte DID=0x%08x\n", state);
+ return -EINVAL;
}
dev_dbg(cdns->dev, "DRD version v1 (ID: %08x, rev: %08x)\n",
@@ -483,7 +489,6 @@ int cdns_drd_exit(struct cdns *cdns)
return 0;
}
-
/* Indicate the cdns3 core was power lost before */
bool cdns_power_is_lost(struct cdns *cdns)
{
diff --git a/drivers/usb/cdns3/drd.h b/drivers/usb/cdns3/drd.h
index cbdf94f..d72370c 100644
--- a/drivers/usb/cdns3/drd.h
+++ b/drivers/usb/cdns3/drd.h
@@ -79,7 +79,11 @@ struct cdnsp_otg_regs {
__le32 susp_timing_ctrl;
};
-#define OTG_CDNSP_DID 0x0004034E
+/* CDNSP driver supports 0x000403xx Cadence USB controller family. */
+#define OTG_CDNSP_CHECK_DID(did) (((did) & GENMASK(31, 8)) == 0x00040300)
+
+/* CDNS3 driver supports 0x000402xx Cadence USB controller family. */
+#define OTG_CDNS3_CHECK_DID(did) (((did) & GENMASK(31, 8)) == 0x00040200)
/*
* Common registers interface for both CDNS3 and CDNSP version of DRD.
diff --git a/drivers/usb/cdns3/host.c b/drivers/usb/cdns3/host.c
index 6164fc4..ceca4d8 100644
--- a/drivers/usb/cdns3/host.c
+++ b/drivers/usb/cdns3/host.c
@@ -18,6 +18,11 @@
#include "../host/xhci.h"
#include "../host/xhci-plat.h"
+/*
+ * The XECP_PORT_CAP_REG and XECP_AUX_CTRL_REG1 exist only
+ * in Cadence USB3 dual-role controller, so it can't be used
+ * with Cadence CDNSP dual-role controller.
+ */
#define XECP_PORT_CAP_REG 0x8000
#define XECP_AUX_CTRL_REG1 0x8120
@@ -57,6 +62,8 @@ static const struct xhci_plat_priv xhci_plat_cdns3_xhci = {
.resume_quirk = xhci_cdns3_resume_quirk,
};
+static const struct xhci_plat_priv xhci_plat_cdnsp_xhci;
+
static int __cdns_host_init(struct cdns *cdns)
{
struct platform_device *xhci;
@@ -81,8 +88,13 @@ static int __cdns_host_init(struct cdns *cdns)
goto err1;
}
- cdns->xhci_plat_data = kmemdup(&xhci_plat_cdns3_xhci,
- sizeof(struct xhci_plat_priv), GFP_KERNEL);
+ if (cdns->version < CDNSP_CONTROLLER_V2)
+ cdns->xhci_plat_data = kmemdup(&xhci_plat_cdns3_xhci,
+ sizeof(struct xhci_plat_priv), GFP_KERNEL);
+ else
+ cdns->xhci_plat_data = kmemdup(&xhci_plat_cdnsp_xhci,
+ sizeof(struct xhci_plat_priv), GFP_KERNEL);
+
if (!cdns->xhci_plat_data) {
ret = -ENOMEM;
goto err1;
diff --git a/drivers/usb/core/port.c b/drivers/usb/core/port.c
index c628c1a..4d63496 100644
--- a/drivers/usb/core/port.c
+++ b/drivers/usb/core/port.c
@@ -573,7 +573,7 @@ static int match_location(struct usb_device *peer_hdev, void *p)
struct usb_hub *peer_hub = usb_hub_to_struct_hub(peer_hdev);
struct usb_device *hdev = to_usb_device(port_dev->dev.parent->parent);
- if (!peer_hub)
+ if (!peer_hub || port_dev->connect_type == USB_PORT_NOT_USED)
return 0;
hcd = bus_to_hcd(hdev->bus);
@@ -584,7 +584,8 @@ static int match_location(struct usb_device *peer_hdev, void *p)
for (port1 = 1; port1 <= peer_hdev->maxchild; port1++) {
peer = peer_hub->ports[port1 - 1];
- if (peer && peer->location == port_dev->location) {
+ if (peer && peer->connect_type != USB_PORT_NOT_USED &&
+ peer->location == port_dev->location) {
link_peers_report(port_dev, peer);
return 1; /* done */
}
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index e3eea96..e120611 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -376,7 +376,6 @@
/* Global HWPARAMS4 Register */
#define DWC3_GHWPARAMS4_HIBER_SCRATCHBUFS(n) (((n) & (0x0f << 13)) >> 13)
#define DWC3_MAX_HIBER_SCRATCHBUFS 15
-#define DWC3_EXT_BUFF_CONTROL BIT(21)
/* Global HWPARAMS6 Register */
#define DWC3_GHWPARAMS6_BCSUPPORT BIT(14)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 564976b..28f4940 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -673,12 +673,6 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
params.param1 |= DWC3_DEPCFG_BINTERVAL_M1(bInterval_m1);
}
- if (dep->endpoint.fifo_mode) {
- if (!(dwc->hwparams.hwparams4 & DWC3_EXT_BUFF_CONTROL))
- return -EINVAL;
- params.param1 |= DWC3_DEPCFG_EBC_HWO_NOWB | DWC3_DEPCFG_USE_EBC;
- }
-
return dwc3_send_gadget_ep_cmd(dep, DWC3_DEPCMD_SETEPCONFIG, ¶ms);
}
@@ -2656,6 +2650,11 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
int ret;
spin_lock_irqsave(&dwc->lock, flags);
+ if (!dwc->pullups_connected) {
+ spin_unlock_irqrestore(&dwc->lock, flags);
+ return 0;
+ }
+
dwc->connected = false;
/*
diff --git a/drivers/usb/dwc3/gadget.h b/drivers/usb/dwc3/gadget.h
index fd7a4e9..55a56cf67 100644
--- a/drivers/usb/dwc3/gadget.h
+++ b/drivers/usb/dwc3/gadget.h
@@ -26,8 +26,6 @@ struct dwc3;
#define DWC3_DEPCFG_XFER_NOT_READY_EN BIT(10)
#define DWC3_DEPCFG_FIFO_ERROR_EN BIT(11)
#define DWC3_DEPCFG_STREAM_EVENT_EN BIT(13)
-#define DWC3_DEPCFG_EBC_HWO_NOWB BIT(14)
-#define DWC3_DEPCFG_USE_EBC BIT(15)
#define DWC3_DEPCFG_BINTERVAL_M1(n) (((n) & 0xff) << 16)
#define DWC3_DEPCFG_STREAM_CAPABLE BIT(24)
#define DWC3_DEPCFG_EP_NUMBER(n) (((n) & 0x1f) << 25)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
index ca5d5f5..28f4e65 100644
--- a/drivers/usb/gadget/function/f_ncm.c
+++ b/drivers/usb/gadget/function/f_ncm.c
@@ -1338,7 +1338,15 @@ static int ncm_unwrap_ntb(struct gether *port,
"Parsed NTB with %d frames\n", dgram_counter);
to_process -= block_len;
- if (to_process != 0) {
+
+ /*
+ * Windows NCM driver avoids USB ZLPs by adding a 1-byte
+ * zero pad as needed.
+ */
+ if (to_process == 1 &&
+ (*(unsigned char *)(ntb_ptr + block_len) == 0x00)) {
+ to_process--;
+ } else if ((to_process > 0) && (block_len != 0)) {
ntb_ptr = (unsigned char *)(ntb_ptr + block_len);
goto parse_ntb;
}
diff --git a/drivers/usb/gadget/udc/omap_udc.c b/drivers/usb/gadget/udc/omap_udc.c
index 10c5d7f..f90eeecf 100644
--- a/drivers/usb/gadget/udc/omap_udc.c
+++ b/drivers/usb/gadget/udc/omap_udc.c
@@ -2036,7 +2036,8 @@ static irqreturn_t omap_udc_iso_irq(int irq, void *_dev)
static inline int machine_without_vbus_sense(void)
{
- return machine_is_omap_osk() || machine_is_sx1();
+ return machine_is_omap_osk() || machine_is_omap_palmte() ||
+ machine_is_sx1();
}
static int omap_udc_start(struct usb_gadget *g,
diff --git a/drivers/usb/host/uhci-grlib.c b/drivers/usb/host/uhci-grlib.c
index ac3fc59..cfebb83 100644
--- a/drivers/usb/host/uhci-grlib.c
+++ b/drivers/usb/host/uhci-grlib.c
@@ -22,6 +22,7 @@
#include <linux/of_irq.h>
#include <linux/of_address.h>
#include <linux/of_platform.h>
+#include <linux/platform_device.h>
static int uhci_grlib_init(struct usb_hcd *hcd)
{
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index f0d8a60..4f64b81 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -326,7 +326,13 @@ static unsigned int xhci_ring_expansion_needed(struct xhci_hcd *xhci, struct xhc
/* how many trbs will be queued past the enqueue segment? */
trbs_past_seg = enq_used + num_trbs - (TRBS_PER_SEGMENT - 1);
- if (trbs_past_seg <= 0)
+ /*
+ * Consider expanding the ring already if num_trbs fills the current
+ * segment (i.e. trbs_past_seg == 0), not only when num_trbs goes into
+ * the next segment. Avoids confusing full ring with special empty ring
+ * case below
+ */
+ if (trbs_past_seg < 0)
return 0;
/* Empty ring special case, enqueue stuck on link trb while dequeue advanced */
diff --git a/drivers/usb/roles/class.c b/drivers/usb/roles/class.c
index ae41578..70165dd 100644
--- a/drivers/usb/roles/class.c
+++ b/drivers/usb/roles/class.c
@@ -21,7 +21,9 @@ static const struct class role_class = {
struct usb_role_switch {
struct device dev;
struct mutex lock; /* device lock*/
+ struct module *module; /* the module this device depends on */
enum usb_role role;
+ bool registered;
/* From descriptor */
struct device *usb2_port;
@@ -48,6 +50,9 @@ int usb_role_switch_set_role(struct usb_role_switch *sw, enum usb_role role)
if (IS_ERR_OR_NULL(sw))
return 0;
+ if (!sw->registered)
+ return -EOPNOTSUPP;
+
mutex_lock(&sw->lock);
ret = sw->set(sw, role);
@@ -73,7 +78,7 @@ enum usb_role usb_role_switch_get_role(struct usb_role_switch *sw)
{
enum usb_role role;
- if (IS_ERR_OR_NULL(sw))
+ if (IS_ERR_OR_NULL(sw) || !sw->registered)
return USB_ROLE_NONE;
mutex_lock(&sw->lock);
@@ -135,7 +140,7 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev)
usb_role_switch_match);
if (!IS_ERR_OR_NULL(sw))
- WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
+ WARN_ON(!try_module_get(sw->module));
return sw;
}
@@ -157,7 +162,7 @@ struct usb_role_switch *fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
sw = fwnode_connection_find_match(fwnode, "usb-role-switch",
NULL, usb_role_switch_match);
if (!IS_ERR_OR_NULL(sw))
- WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
+ WARN_ON(!try_module_get(sw->module));
return sw;
}
@@ -172,7 +177,7 @@ EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
void usb_role_switch_put(struct usb_role_switch *sw)
{
if (!IS_ERR_OR_NULL(sw)) {
- module_put(sw->dev.parent->driver->owner);
+ module_put(sw->module);
put_device(&sw->dev);
}
}
@@ -189,15 +194,18 @@ struct usb_role_switch *
usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode)
{
struct device *dev;
+ struct usb_role_switch *sw = NULL;
if (!fwnode)
return NULL;
dev = class_find_device_by_fwnode(&role_class, fwnode);
- if (dev)
- WARN_ON(!try_module_get(dev->parent->driver->owner));
+ if (dev) {
+ sw = to_role_switch(dev);
+ WARN_ON(!try_module_get(sw->module));
+ }
- return dev ? to_role_switch(dev) : NULL;
+ return sw;
}
EXPORT_SYMBOL_GPL(usb_role_switch_find_by_fwnode);
@@ -338,6 +346,7 @@ usb_role_switch_register(struct device *parent,
sw->set = desc->set;
sw->get = desc->get;
+ sw->module = parent->driver->owner;
sw->dev.parent = parent;
sw->dev.fwnode = desc->fwnode;
sw->dev.class = &role_class;
@@ -352,6 +361,8 @@ usb_role_switch_register(struct device *parent,
return ERR_PTR(ret);
}
+ sw->registered = true;
+
/* TODO: Symlinks for the host port and the device controller. */
return sw;
@@ -366,8 +377,10 @@ EXPORT_SYMBOL_GPL(usb_role_switch_register);
*/
void usb_role_switch_unregister(struct usb_role_switch *sw)
{
- if (!IS_ERR_OR_NULL(sw))
+ if (!IS_ERR_OR_NULL(sw)) {
+ sw->registered = false;
device_unregister(&sw->dev);
+ }
}
EXPORT_SYMBOL_GPL(usb_role_switch_unregister);
diff --git a/drivers/usb/storage/isd200.c b/drivers/usb/storage/isd200.c
index 4e0eef14..300aeef 100644
--- a/drivers/usb/storage/isd200.c
+++ b/drivers/usb/storage/isd200.c
@@ -1105,7 +1105,7 @@ static void isd200_dump_driveid(struct us_data *us, u16 *id)
static int isd200_get_inquiry_data( struct us_data *us )
{
struct isd200_info *info = (struct isd200_info *)us->extra;
- int retStatus = ISD200_GOOD;
+ int retStatus;
u16 *id = info->id;
usb_stor_dbg(us, "Entering isd200_get_inquiry_data\n");
@@ -1137,6 +1137,13 @@ static int isd200_get_inquiry_data( struct us_data *us )
isd200_fix_driveid(id);
isd200_dump_driveid(us, id);
+ /* Prevent division by 0 in isd200_scsi_to_ata() */
+ if (id[ATA_ID_HEADS] == 0 || id[ATA_ID_SECTORS] == 0) {
+ usb_stor_dbg(us, " Invalid ATA Identify data\n");
+ retStatus = ISD200_ERROR;
+ goto Done;
+ }
+
memset(&info->InquiryData, 0, sizeof(info->InquiryData));
/* Standard IDE interface only supports disks */
@@ -1202,6 +1209,7 @@ static int isd200_get_inquiry_data( struct us_data *us )
}
}
+ Done:
usb_stor_dbg(us, "Leaving isd200_get_inquiry_data %08X\n", retStatus);
return(retStatus);
@@ -1481,22 +1489,27 @@ static int isd200_init_info(struct us_data *us)
static int isd200_Initialization(struct us_data *us)
{
+ int rc = 0;
+
usb_stor_dbg(us, "ISD200 Initialization...\n");
/* Initialize ISD200 info struct */
- if (isd200_init_info(us) == ISD200_ERROR) {
+ if (isd200_init_info(us) < 0) {
usb_stor_dbg(us, "ERROR Initializing ISD200 Info struct\n");
+ rc = -ENOMEM;
} else {
/* Get device specific data */
- if (isd200_get_inquiry_data(us) != ISD200_GOOD)
+ if (isd200_get_inquiry_data(us) != ISD200_GOOD) {
usb_stor_dbg(us, "ISD200 Initialization Failure\n");
- else
+ rc = -EINVAL;
+ } else {
usb_stor_dbg(us, "ISD200 Initialization complete\n");
+ }
}
- return 0;
+ return rc;
}
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index c54e980..12cf994 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -180,6 +180,13 @@ static int slave_configure(struct scsi_device *sdev)
sdev->use_192_bytes_for_3f = 1;
/*
+ * Some devices report generic values until the media has been
+ * accessed. Force a READ(10) prior to querying device
+ * characteristics.
+ */
+ sdev->read_before_ms = 1;
+
+ /*
* Some devices don't like MODE SENSE with page=0x3f,
* which is the command used for checking if a device
* is write-protected. Now that we tell the sd driver
diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index 9707f53..71ace27 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -879,6 +879,13 @@ static int uas_slave_configure(struct scsi_device *sdev)
sdev->guess_capacity = 1;
/*
+ * Some devices report generic values until the media has been
+ * accessed. Force a READ(10) prior to querying device
+ * characteristics.
+ */
+ sdev->read_before_ms = 1;
+
+ /*
* Some devices don't like MODE SENSE with page=0x3f,
* which is the command used for checking if a device
* is write-protected. Now that we tell the sd driver
diff --git a/drivers/usb/typec/altmodes/displayport.c b/drivers/usb/typec/altmodes/displayport.c
index f81bec0..f8ea305 100644
--- a/drivers/usb/typec/altmodes/displayport.c
+++ b/drivers/usb/typec/altmodes/displayport.c
@@ -559,16 +559,21 @@ static ssize_t hpd_show(struct device *dev, struct device_attribute *attr, char
}
static DEVICE_ATTR_RO(hpd);
-static struct attribute *dp_altmode_attrs[] = {
+static struct attribute *displayport_attrs[] = {
&dev_attr_configuration.attr,
&dev_attr_pin_assignment.attr,
&dev_attr_hpd.attr,
NULL
};
-static const struct attribute_group dp_altmode_group = {
+static const struct attribute_group displayport_group = {
.name = "displayport",
- .attrs = dp_altmode_attrs,
+ .attrs = displayport_attrs,
+};
+
+static const struct attribute_group *displayport_groups[] = {
+ &displayport_group,
+ NULL,
};
int dp_altmode_probe(struct typec_altmode *alt)
@@ -576,7 +581,6 @@ int dp_altmode_probe(struct typec_altmode *alt)
const struct typec_altmode *port = typec_altmode_get_partner(alt);
struct fwnode_handle *fwnode;
struct dp_altmode *dp;
- int ret;
/* FIXME: Port can only be DFP_U. */
@@ -587,10 +591,6 @@ int dp_altmode_probe(struct typec_altmode *alt)
DP_CAP_PIN_ASSIGN_DFP_D(alt->vdo)))
return -ENODEV;
- ret = sysfs_create_group(&alt->dev.kobj, &dp_altmode_group);
- if (ret)
- return ret;
-
dp = devm_kzalloc(&alt->dev, sizeof(*dp), GFP_KERNEL);
if (!dp)
return -ENOMEM;
@@ -624,7 +624,6 @@ void dp_altmode_remove(struct typec_altmode *alt)
{
struct dp_altmode *dp = typec_altmode_get_drvdata(alt);
- sysfs_remove_group(&alt->dev.kobj, &dp_altmode_group);
cancel_work_sync(&dp->work);
if (dp->connector_fwnode) {
@@ -649,6 +648,7 @@ static struct typec_altmode_driver dp_altmode_driver = {
.driver = {
.name = "typec_displayport",
.owner = THIS_MODULE,
+ .dev_groups = displayport_groups,
},
};
module_typec_altmode_driver(dp_altmode_driver);
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index f7d7daa..0965972 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -3743,9 +3743,6 @@ static void tcpm_detach(struct tcpm_port *port)
if (tcpm_port_is_disconnected(port))
port->hard_reset_count = 0;
- port->try_src_count = 0;
- port->try_snk_count = 0;
-
if (!port->attached)
return;
@@ -4876,7 +4873,11 @@ static void run_state_machine(struct tcpm_port *port)
break;
case PORT_RESET:
tcpm_reset_port(port);
- tcpm_set_cc(port, TYPEC_CC_OPEN);
+ if (port->self_powered)
+ tcpm_set_cc(port, TYPEC_CC_OPEN);
+ else
+ tcpm_set_cc(port, tcpm_default_state(port) == SNK_UNATTACHED ?
+ TYPEC_CC_RD : tcpm_rp_cc(port));
tcpm_set_state(port, PORT_RESET_WAIT_OFF,
PD_T_ERROR_RECOVERY);
break;
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c
index 53a7ede..faccc94 100644
--- a/drivers/usb/typec/ucsi/ucsi_glink.c
+++ b/drivers/usb/typec/ucsi/ucsi_glink.c
@@ -301,6 +301,7 @@ static const struct of_device_id pmic_glink_ucsi_of_quirks[] = {
{ .compatible = "qcom,sc8180x-pmic-glink", .data = (void *)UCSI_NO_PARTNER_PDOS, },
{ .compatible = "qcom,sc8280xp-pmic-glink", .data = (void *)UCSI_NO_PARTNER_PDOS, },
{ .compatible = "qcom,sm8350-pmic-glink", .data = (void *)UCSI_NO_PARTNER_PDOS, },
+ { .compatible = "qcom,sm8550-pmic-glink", .data = (void *)UCSI_NO_PARTNER_PDOS, },
{}
};
diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c
index 1183e7a..46823c2 100644
--- a/drivers/video/fbdev/core/fbcon.c
+++ b/drivers/video/fbdev/core/fbcon.c
@@ -2399,11 +2399,9 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, int charcount,
struct fbcon_ops *ops = info->fbcon_par;
struct fbcon_display *p = &fb_display[vc->vc_num];
int resize, ret, old_userfont, old_width, old_height, old_charcount;
- char *old_data = NULL;
+ u8 *old_data = vc->vc_font.data;
resize = (w != vc->vc_font.width) || (h != vc->vc_font.height);
- if (p->userfont)
- old_data = vc->vc_font.data;
vc->vc_font.data = (void *)(p->fontdata = data);
old_userfont = p->userfont;
if ((p->userfont = userfont))
@@ -2437,13 +2435,13 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, int charcount,
update_screen(vc);
}
- if (old_data && (--REFCOUNT(old_data) == 0))
+ if (old_userfont && (--REFCOUNT(old_data) == 0))
kfree(old_data - FONT_EXTRA_WORDS * sizeof(int));
return 0;
err_out:
p->fontdata = old_data;
- vc->vc_font.data = (void *)old_data;
+ vc->vc_font.data = old_data;
if (userfont) {
p->userfont = old_userfont;
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index c26ee6f..8fdccf0 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -1010,8 +1010,6 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
goto getmem_done;
}
pr_info("Unable to allocate enough contiguous physical memory on Gen 1 VM. Using MMIO instead.\n");
- } else {
- goto err1;
}
/*
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index b8cfea7..3b9f080 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -923,8 +923,8 @@ static void shutdown_pirq(struct irq_data *data)
return;
do_mask(info, EVT_MASK_REASON_EXPLICIT);
- xen_evtchn_close(evtchn);
xen_irq_info_cleanup(info);
+ xen_evtchn_close(evtchn);
}
static void enable_pirq(struct irq_data *data)
@@ -956,6 +956,7 @@ EXPORT_SYMBOL_GPL(xen_irq_from_gsi);
static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
{
evtchn_port_t evtchn;
+ bool close_evtchn = false;
if (!info) {
xen_irq_free_desc(irq);
@@ -975,7 +976,7 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
struct xenbus_device *dev;
if (!info->is_static)
- xen_evtchn_close(evtchn);
+ close_evtchn = true;
switch (info->type) {
case IRQT_VIRQ:
@@ -995,6 +996,9 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
}
xen_irq_info_cleanup(info);
+
+ if (close_evtchn)
+ xen_evtchn_close(evtchn);
}
xen_free_irq(info);
diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index 26ffb87..f93f73e 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -317,7 +317,7 @@ static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv,
rc = -EFAULT;
goto out_free;
}
- if (copy_to_user(arg->gref_ids, gref_ids,
+ if (copy_to_user(arg->gref_ids_flex, gref_ids,
sizeof(gref_ids[0]) * op.count)) {
rc = -EFAULT;
goto out_free;
diff --git a/drivers/xen/pcpu.c b/drivers/xen/pcpu.c
index 5086552..c63f317 100644
--- a/drivers/xen/pcpu.c
+++ b/drivers/xen/pcpu.c
@@ -65,7 +65,7 @@ struct pcpu {
uint32_t flags;
};
-static struct bus_type xen_pcpu_subsys = {
+static const struct bus_type xen_pcpu_subsys = {
.name = "xen_cpu",
.dev_name = "xen_cpu",
};
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 35b6e30..67dfa47 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -1223,18 +1223,13 @@ struct privcmd_kernel_ioreq *alloc_ioreq(struct privcmd_ioeventfd *ioeventfd)
kioreq->ioreq = (struct ioreq *)(page_to_virt(pages[0]));
mmap_write_unlock(mm);
- size = sizeof(*ports) * kioreq->vcpus;
- ports = kzalloc(size, GFP_KERNEL);
- if (!ports) {
- ret = -ENOMEM;
+ ports = memdup_array_user(u64_to_user_ptr(ioeventfd->ports),
+ kioreq->vcpus, sizeof(*ports));
+ if (IS_ERR(ports)) {
+ ret = PTR_ERR(ports);
goto error_kfree;
}
- if (copy_from_user(ports, u64_to_user_ptr(ioeventfd->ports), size)) {
- ret = -EFAULT;
- goto error_kfree_ports;
- }
-
for (i = 0; i < kioreq->vcpus; i++) {
kioreq->ports[i].vcpu = i;
kioreq->ports[i].port = ports[i];
@@ -1256,7 +1251,7 @@ struct privcmd_kernel_ioreq *alloc_ioreq(struct privcmd_ioeventfd *ioeventfd)
error_unbind:
while (--i >= 0)
unbind_from_irqhandler(irq_from_evtchn(ports[i]), &kioreq->ports[i]);
-error_kfree_ports:
+
kfree(ports);
error_kfree:
kfree(kioreq);
diff --git a/drivers/xen/xen-balloon.c b/drivers/xen/xen-balloon.c
index 8cd583d..b293d76 100644
--- a/drivers/xen/xen-balloon.c
+++ b/drivers/xen/xen-balloon.c
@@ -237,7 +237,7 @@ static const struct attribute_group *balloon_groups[] = {
NULL
};
-static struct bus_type balloon_subsys = {
+static const struct bus_type balloon_subsys = {
.name = BALLOON_CLASS_NAME,
.dev_name = BALLOON_CLASS_NAME,
};
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 32835b4..51b3124 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -116,14 +116,15 @@ EXPORT_SYMBOL_GPL(xenbus_strstate);
* @dev: xenbus device
* @path: path to watch
* @watch: watch to register
+ * @will_handle: events queuing determine callback
* @callback: callback to register
*
* Register a @watch on the given path, using the given xenbus_watch structure
- * for storage, and the given @callback function as the callback. On success,
- * the given @path will be saved as @watch->node, and remains the
- * caller's to free. On error, @watch->node will
- * be NULL, the device will switch to %XenbusStateClosing, and the error will
- * be saved in the store.
+ * for storage, @will_handle function as the callback to determine if each
+ * event need to be queued, and the given @callback function as the callback.
+ * On success, the given @path will be saved as @watch->node, and remains the
+ * caller's to free. On error, @watch->node will be NULL, the device will
+ * switch to %XenbusStateClosing, and the error will be saved in the store.
*
* Returns: %0 on success or -errno on error
*/
@@ -158,11 +159,13 @@ EXPORT_SYMBOL_GPL(xenbus_watch_path);
* xenbus_watch_pathfmt - register a watch on a sprintf-formatted path
* @dev: xenbus device
* @watch: watch to register
+ * @will_handle: events queuing determine callback
* @callback: callback to register
* @pathfmt: format of path to watch
*
* Register a watch on the given @path, using the given xenbus_watch
- * structure for storage, and the given @callback function as the
+ * structure for storage, @will_handle function as the callback to determine if
+ * each event need to be queued, and the given @callback function as the
* callback. On success, the watched path (@path/@path2) will be saved
* as @watch->node, and becomes the caller's to kfree().
* On error, watch->node will be NULL, so the caller has nothing to
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index bae330c..abdbbae 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -107,7 +107,7 @@ static int v9fs_file_lock(struct file *filp, int cmd, struct file_lock *fl)
p9_debug(P9_DEBUG_VFS, "filp: %p lock: %p\n", filp, fl);
- if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
+ if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->c.flc_type != F_UNLCK) {
filemap_write_and_wait(inode->i_mapping);
invalidate_mapping_pages(&inode->i_data, 0, -1);
}
@@ -121,13 +121,12 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl)
struct p9_fid *fid;
uint8_t status = P9_LOCK_ERROR;
int res = 0;
- unsigned char fl_type;
struct v9fs_session_info *v9ses;
fid = filp->private_data;
BUG_ON(fid == NULL);
- BUG_ON((fl->fl_flags & FL_POSIX) != FL_POSIX);
+ BUG_ON((fl->c.flc_flags & FL_POSIX) != FL_POSIX);
res = locks_lock_file_wait(filp, fl);
if (res < 0)
@@ -136,7 +135,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl)
/* convert posix lock to p9 tlock args */
memset(&flock, 0, sizeof(flock));
/* map the lock type */
- switch (fl->fl_type) {
+ switch (fl->c.flc_type) {
case F_RDLCK:
flock.type = P9_LOCK_TYPE_RDLCK;
break;
@@ -152,7 +151,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl)
flock.length = 0;
else
flock.length = fl->fl_end - fl->fl_start + 1;
- flock.proc_id = fl->fl_pid;
+ flock.proc_id = fl->c.flc_pid;
flock.client_id = fid->clnt->name;
if (IS_SETLKW(cmd))
flock.flags = P9_LOCK_FLAGS_BLOCK;
@@ -207,12 +206,13 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl)
* incase server returned error for lock request, revert
* it locally
*/
- if (res < 0 && fl->fl_type != F_UNLCK) {
- fl_type = fl->fl_type;
- fl->fl_type = F_UNLCK;
+ if (res < 0 && fl->c.flc_type != F_UNLCK) {
+ unsigned char type = fl->c.flc_type;
+
+ fl->c.flc_type = F_UNLCK;
/* Even if this fails we want to return the remote error */
locks_lock_file_wait(filp, fl);
- fl->fl_type = fl_type;
+ fl->c.flc_type = type;
}
if (flock.client_id != fid->clnt->name)
kfree(flock.client_id);
@@ -234,7 +234,7 @@ static int v9fs_file_getlock(struct file *filp, struct file_lock *fl)
* if we have a conflicting lock locally, no need to validate
* with server
*/
- if (fl->fl_type != F_UNLCK)
+ if (fl->c.flc_type != F_UNLCK)
return res;
/* convert posix lock to p9 tgetlock args */
@@ -245,7 +245,7 @@ static int v9fs_file_getlock(struct file *filp, struct file_lock *fl)
glock.length = 0;
else
glock.length = fl->fl_end - fl->fl_start + 1;
- glock.proc_id = fl->fl_pid;
+ glock.proc_id = fl->c.flc_pid;
glock.client_id = fid->clnt->name;
res = p9_client_getlock_dotl(fid, &glock);
@@ -254,13 +254,13 @@ static int v9fs_file_getlock(struct file *filp, struct file_lock *fl)
/* map 9p lock type to os lock type */
switch (glock.type) {
case P9_LOCK_TYPE_RDLCK:
- fl->fl_type = F_RDLCK;
+ fl->c.flc_type = F_RDLCK;
break;
case P9_LOCK_TYPE_WRLCK:
- fl->fl_type = F_WRLCK;
+ fl->c.flc_type = F_WRLCK;
break;
case P9_LOCK_TYPE_UNLCK:
- fl->fl_type = F_UNLCK;
+ fl->c.flc_type = F_UNLCK;
break;
}
if (glock.type != P9_LOCK_TYPE_UNLCK) {
@@ -269,7 +269,7 @@ static int v9fs_file_getlock(struct file *filp, struct file_lock *fl)
fl->fl_end = OFFSET_MAX;
else
fl->fl_end = glock.start + glock.length - 1;
- fl->fl_pid = -glock.proc_id;
+ fl->c.flc_pid = -glock.proc_id;
}
out:
if (glock.client_id != fid->clnt->name)
@@ -293,7 +293,7 @@ static int v9fs_file_lock_dotl(struct file *filp, int cmd, struct file_lock *fl)
p9_debug(P9_DEBUG_VFS, "filp: %p cmd:%d lock: %p name: %pD\n",
filp, cmd, fl, filp);
- if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
+ if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->c.flc_type != F_UNLCK) {
filemap_write_and_wait(inode->i_mapping);
invalidate_mapping_pages(&inode->i_data, 0, -1);
}
@@ -324,16 +324,16 @@ static int v9fs_file_flock_dotl(struct file *filp, int cmd,
p9_debug(P9_DEBUG_VFS, "filp: %p cmd:%d lock: %p name: %pD\n",
filp, cmd, fl, filp);
- if (!(fl->fl_flags & FL_FLOCK))
+ if (!(fl->c.flc_flags & FL_FLOCK))
goto out_err;
- if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
+ if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->c.flc_type != F_UNLCK) {
filemap_write_and_wait(inode->i_mapping);
invalidate_mapping_pages(&inode->i_data, 0, -1);
}
/* Convert flock to posix lock */
- fl->fl_flags |= FL_POSIX;
- fl->fl_flags ^= FL_FLOCK;
+ fl->c.flc_flags |= FL_POSIX;
+ fl->c.flc_flags ^= FL_FLOCK;
if (IS_SETLK(cmd) | IS_SETLKW(cmd))
ret = v9fs_file_do_lock(filp, cmd, fl);
diff --git a/fs/Kconfig b/fs/Kconfig
index 89fdbef..4bc7dd4 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -162,7 +162,6 @@
source "fs/fat/Kconfig"
source "fs/exfat/Kconfig"
-source "fs/ntfs/Kconfig"
source "fs/ntfs3/Kconfig"
endmenu
@@ -174,6 +173,13 @@
source "fs/kernfs/Kconfig"
source "fs/sysfs/Kconfig"
+config FS_PID
+ bool "Pseudo filesystem for process file descriptors"
+ depends on 64BIT
+ default y
+ help
+ Pidfs implements advanced features for process file descriptors.
+
config TMPFS
bool "Tmpfs virtual memory file system support (former shm fs)"
depends on SHMEM
diff --git a/fs/Makefile b/fs/Makefile
index c090162..6ecc9b0 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -15,7 +15,7 @@
pnode.o splice.o sync.o utimes.o d_path.o \
stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
fs_types.o fs_context.o fs_parser.o fsopen.o init.o \
- kernel_read_file.o mnt_idmapping.o remap_range.o
+ kernel_read_file.o mnt_idmapping.o remap_range.o pidfs.o
obj-$(CONFIG_BUFFER_HEAD) += buffer.o mpage.o
obj-$(CONFIG_PROC_FS) += proc_namespace.o
@@ -91,7 +91,6 @@
obj-$(CONFIG_SYSV_FS) += sysv/
obj-$(CONFIG_SMBFS) += smb/
obj-$(CONFIG_HPFS_FS) += hpfs/
-obj-$(CONFIG_NTFS_FS) += ntfs/
obj-$(CONFIG_NTFS3_FS) += ntfs3/
obj-$(CONFIG_UFS_FS) += ufs/
obj-$(CONFIG_EFS_FS) += efs/
diff --git a/fs/affs/affs.h b/fs/affs/affs.h
index 60685ec..2e612834 100644
--- a/fs/affs/affs.h
+++ b/fs/affs/affs.h
@@ -105,6 +105,7 @@ struct affs_sb_info {
int work_queued; /* non-zero delayed work is queued */
struct delayed_work sb_work; /* superblock flush delayed work */
spinlock_t work_lock; /* protects sb_work and work_queued */
+ struct rcu_head rcu;
};
#define AFFS_MOUNT_SF_INTL 0x0001 /* International filesystem. */
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 58b3914..b56a95c 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -640,7 +640,7 @@ static void affs_kill_sb(struct super_block *sb)
affs_brelse(sbi->s_root_bh);
kfree(sbi->s_prefix);
mutex_destroy(&sbi->s_bmlock);
- kfree(sbi);
+ kfree_rcu(sbi, rcu);
}
}
diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index b5b8de5..8a67fc4 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -479,8 +479,10 @@ static int afs_dir_iterate_block(struct afs_vnode *dvnode,
dire->u.name[0] == '.' &&
ctx->actor != afs_lookup_filldir &&
ctx->actor != afs_lookup_one_filldir &&
- memcmp(dire->u.name, ".__afs", 6) == 0)
+ memcmp(dire->u.name, ".__afs", 6) == 0) {
+ ctx->pos = blkoff + next * sizeof(union afs_xdr_dirent);
continue;
+ }
/* found the next entry */
if (!dir_emit(ctx, dire->u.name, nlen,
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 3d33b22..ef2cc8f 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -417,13 +417,17 @@ static void afs_add_open_mmap(struct afs_vnode *vnode)
static void afs_drop_open_mmap(struct afs_vnode *vnode)
{
- if (!atomic_dec_and_test(&vnode->cb_nr_mmap))
+ if (atomic_add_unless(&vnode->cb_nr_mmap, -1, 1))
return;
down_write(&vnode->volume->open_mmaps_lock);
- if (atomic_read(&vnode->cb_nr_mmap) == 0)
+ read_seqlock_excl(&vnode->cb_lock);
+ // the only place where ->cb_nr_mmap may hit 0
+ // see __afs_break_callback() for the other side...
+ if (atomic_dec_and_test(&vnode->cb_nr_mmap))
list_del_init(&vnode->cb_mmap_link);
+ read_sequnlock_excl(&vnode->cb_lock);
up_write(&vnode->volume->open_mmaps_lock);
flush_work(&vnode->cb_work);
diff --git a/fs/afs/flock.c b/fs/afs/flock.c
index 9c6dea3..f0e96a3 100644
--- a/fs/afs/flock.c
+++ b/fs/afs/flock.c
@@ -93,13 +93,13 @@ static void afs_grant_locks(struct afs_vnode *vnode)
bool exclusive = (vnode->lock_type == AFS_LOCK_WRITE);
list_for_each_entry_safe(p, _p, &vnode->pending_locks, fl_u.afs.link) {
- if (!exclusive && p->fl_type == F_WRLCK)
+ if (!exclusive && lock_is_write(p))
continue;
list_move_tail(&p->fl_u.afs.link, &vnode->granted_locks);
p->fl_u.afs.state = AFS_LOCK_GRANTED;
trace_afs_flock_op(vnode, p, afs_flock_op_grant);
- wake_up(&p->fl_wait);
+ locks_wake_up(p);
}
}
@@ -112,25 +112,24 @@ static void afs_next_locker(struct afs_vnode *vnode, int error)
{
struct file_lock *p, *_p, *next = NULL;
struct key *key = vnode->lock_key;
- unsigned int fl_type = F_RDLCK;
+ unsigned int type = F_RDLCK;
_enter("");
if (vnode->lock_type == AFS_LOCK_WRITE)
- fl_type = F_WRLCK;
+ type = F_WRLCK;
list_for_each_entry_safe(p, _p, &vnode->pending_locks, fl_u.afs.link) {
if (error &&
- p->fl_type == fl_type &&
- afs_file_key(p->fl_file) == key) {
+ p->c.flc_type == type &&
+ afs_file_key(p->c.flc_file) == key) {
list_del_init(&p->fl_u.afs.link);
p->fl_u.afs.state = error;
- wake_up(&p->fl_wait);
+ locks_wake_up(p);
}
/* Select the next locker to hand off to. */
- if (next &&
- (next->fl_type == F_WRLCK || p->fl_type == F_RDLCK))
+ if (next && (lock_is_write(next) || lock_is_read(p)))
continue;
next = p;
}
@@ -142,7 +141,7 @@ static void afs_next_locker(struct afs_vnode *vnode, int error)
afs_set_lock_state(vnode, AFS_VNODE_LOCK_SETTING);
next->fl_u.afs.state = AFS_LOCK_YOUR_TRY;
trace_afs_flock_op(vnode, next, afs_flock_op_wake);
- wake_up(&next->fl_wait);
+ locks_wake_up(next);
} else {
afs_set_lock_state(vnode, AFS_VNODE_LOCK_NONE);
trace_afs_flock_ev(vnode, NULL, afs_flock_no_lockers, 0);
@@ -166,7 +165,7 @@ static void afs_kill_lockers_enoent(struct afs_vnode *vnode)
struct file_lock, fl_u.afs.link);
list_del_init(&p->fl_u.afs.link);
p->fl_u.afs.state = -ENOENT;
- wake_up(&p->fl_wait);
+ locks_wake_up(p);
}
key_put(vnode->lock_key);
@@ -464,14 +463,14 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
_enter("{%llx:%llu},%llu-%llu,%u,%u",
vnode->fid.vid, vnode->fid.vnode,
- fl->fl_start, fl->fl_end, fl->fl_type, mode);
+ fl->fl_start, fl->fl_end, fl->c.flc_type, mode);
fl->fl_ops = &afs_lock_ops;
INIT_LIST_HEAD(&fl->fl_u.afs.link);
fl->fl_u.afs.state = AFS_LOCK_PENDING;
partial = (fl->fl_start != 0 || fl->fl_end != OFFSET_MAX);
- type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
+ type = lock_is_read(fl) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
if (mode == afs_flock_mode_write && partial)
type = AFS_LOCK_WRITE;
@@ -524,7 +523,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
}
if (vnode->lock_state == AFS_VNODE_LOCK_NONE &&
- !(fl->fl_flags & FL_SLEEP)) {
+ !(fl->c.flc_flags & FL_SLEEP)) {
ret = -EAGAIN;
if (type == AFS_LOCK_READ) {
if (vnode->status.lock_count == -1)
@@ -621,7 +620,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
return 0;
lock_is_contended:
- if (!(fl->fl_flags & FL_SLEEP)) {
+ if (!(fl->c.flc_flags & FL_SLEEP)) {
list_del_init(&fl->fl_u.afs.link);
afs_next_locker(vnode, 0);
ret = -EAGAIN;
@@ -641,7 +640,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
spin_unlock(&vnode->lock);
trace_afs_flock_ev(vnode, fl, afs_flock_waiting, 0);
- ret = wait_event_interruptible(fl->fl_wait,
+ ret = wait_event_interruptible(fl->c.flc_wait,
fl->fl_u.afs.state != AFS_LOCK_PENDING);
trace_afs_flock_ev(vnode, fl, afs_flock_waited, ret);
@@ -704,7 +703,8 @@ static int afs_do_unlk(struct file *file, struct file_lock *fl)
struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
int ret;
- _enter("{%llx:%llu},%u", vnode->fid.vid, vnode->fid.vnode, fl->fl_type);
+ _enter("{%llx:%llu},%u", vnode->fid.vid, vnode->fid.vnode,
+ fl->c.flc_type);
trace_afs_flock_op(vnode, fl, afs_flock_op_unlock);
@@ -730,11 +730,11 @@ static int afs_do_getlk(struct file *file, struct file_lock *fl)
if (vnode->lock_state == AFS_VNODE_LOCK_DELETED)
return -ENOENT;
- fl->fl_type = F_UNLCK;
+ fl->c.flc_type = F_UNLCK;
/* check local lock records first */
posix_test_lock(file, fl);
- if (fl->fl_type == F_UNLCK) {
+ if (lock_is_unlock(fl)) {
/* no local locks; consult the server */
ret = afs_fetch_status(vnode, key, false, NULL);
if (ret < 0)
@@ -743,18 +743,18 @@ static int afs_do_getlk(struct file *file, struct file_lock *fl)
lock_count = READ_ONCE(vnode->status.lock_count);
if (lock_count != 0) {
if (lock_count > 0)
- fl->fl_type = F_RDLCK;
+ fl->c.flc_type = F_RDLCK;
else
- fl->fl_type = F_WRLCK;
+ fl->c.flc_type = F_WRLCK;
fl->fl_start = 0;
fl->fl_end = OFFSET_MAX;
- fl->fl_pid = 0;
+ fl->c.flc_pid = 0;
}
}
ret = 0;
error:
- _leave(" = %d [%hd]", ret, fl->fl_type);
+ _leave(" = %d [%hd]", ret, fl->c.flc_type);
return ret;
}
@@ -769,7 +769,7 @@ int afs_lock(struct file *file, int cmd, struct file_lock *fl)
_enter("{%llx:%llu},%d,{t=%x,fl=%x,r=%Ld:%Ld}",
vnode->fid.vid, vnode->fid.vnode, cmd,
- fl->fl_type, fl->fl_flags,
+ fl->c.flc_type, fl->c.flc_flags,
(long long) fl->fl_start, (long long) fl->fl_end);
if (IS_GETLK(cmd))
@@ -778,7 +778,7 @@ int afs_lock(struct file *file, int cmd, struct file_lock *fl)
fl->fl_u.afs.debug_id = atomic_inc_return(&afs_file_lock_debug_id);
trace_afs_flock_op(vnode, fl, afs_flock_op_lock);
- if (fl->fl_type == F_UNLCK)
+ if (lock_is_unlock(fl))
ret = afs_do_unlk(file, fl);
else
ret = afs_do_setlk(file, fl);
@@ -804,7 +804,7 @@ int afs_flock(struct file *file, int cmd, struct file_lock *fl)
_enter("{%llx:%llu},%d,{t=%x,fl=%x}",
vnode->fid.vid, vnode->fid.vnode, cmd,
- fl->fl_type, fl->fl_flags);
+ fl->c.flc_type, fl->c.flc_flags);
/*
* No BSD flocks over NFS allowed.
@@ -813,14 +813,14 @@ int afs_flock(struct file *file, int cmd, struct file_lock *fl)
* Not sure whether that would be unique, though, or whether
* that would break in other places.
*/
- if (!(fl->fl_flags & FL_FLOCK))
+ if (!(fl->c.flc_flags & FL_FLOCK))
return -ENOLCK;
fl->fl_u.afs.debug_id = atomic_inc_return(&afs_file_lock_debug_id);
trace_afs_flock_op(vnode, fl, afs_flock_op_flock);
/* we're simulating flock() locks using posix locks on the server */
- if (fl->fl_type == F_UNLCK)
+ if (lock_is_unlock(fl))
ret = afs_do_unlk(file, fl);
else
ret = afs_do_setlk(file, fl);
@@ -843,7 +843,7 @@ int afs_flock(struct file *file, int cmd, struct file_lock *fl)
*/
static void afs_fl_copy_lock(struct file_lock *new, struct file_lock *fl)
{
- struct afs_vnode *vnode = AFS_FS_I(file_inode(fl->fl_file));
+ struct afs_vnode *vnode = AFS_FS_I(file_inode(fl->c.flc_file));
_enter("");
@@ -861,7 +861,7 @@ static void afs_fl_copy_lock(struct file_lock *new, struct file_lock *fl)
*/
static void afs_fl_release_private(struct file_lock *fl)
{
- struct afs_vnode *vnode = AFS_FS_I(file_inode(fl->fl_file));
+ struct afs_vnode *vnode = AFS_FS_I(file_inode(fl->c.flc_file));
_enter("");
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 9c03fcf..6ce5a61 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -321,8 +321,7 @@ struct afs_net {
struct list_head fs_probe_slow; /* List of afs_server to probe at 5m intervals */
struct hlist_head fs_proc; /* procfs servers list */
- struct hlist_head fs_addresses4; /* afs_server (by lowest IPv4 addr) */
- struct hlist_head fs_addresses6; /* afs_server (by lowest IPv6 addr) */
+ struct hlist_head fs_addresses; /* afs_server (by lowest IPv6 addr) */
seqlock_t fs_addr_lock; /* For fs_addresses[46] */
struct work_struct fs_manager;
@@ -561,8 +560,7 @@ struct afs_server {
struct afs_server __rcu *uuid_next; /* Next server with same UUID */
struct afs_server *uuid_prev; /* Previous server with same UUID */
struct list_head probe_link; /* Link in net->fs_probe_list */
- struct hlist_node addr4_link; /* Link in net->fs_addresses4 */
- struct hlist_node addr6_link; /* Link in net->fs_addresses6 */
+ struct hlist_node addr_link; /* Link in net->fs_addresses6 */
struct hlist_node proc_link; /* Link in net->fs_proc */
struct list_head volumes; /* RCU list of afs_server_entry objects */
struct afs_server *gc_next; /* Next server in manager's list */
diff --git a/fs/afs/main.c b/fs/afs/main.c
index 1b3bd21..a14f601 100644
--- a/fs/afs/main.c
+++ b/fs/afs/main.c
@@ -90,8 +90,7 @@ static int __net_init afs_net_init(struct net *net_ns)
INIT_LIST_HEAD(&net->fs_probe_slow);
INIT_HLIST_HEAD(&net->fs_proc);
- INIT_HLIST_HEAD(&net->fs_addresses4);
- INIT_HLIST_HEAD(&net->fs_addresses6);
+ INIT_HLIST_HEAD(&net->fs_addresses);
seqlock_init(&net->fs_addr_lock);
INIT_WORK(&net->fs_manager, afs_manage_servers);
diff --git a/fs/afs/server.c b/fs/afs/server.c
index e169121..038f9d0 100644
--- a/fs/afs/server.c
+++ b/fs/afs/server.c
@@ -38,7 +38,7 @@ struct afs_server *afs_find_server(struct afs_net *net, const struct rxrpc_peer
seq++; /* 2 on the 1st/lockless path, otherwise odd */
read_seqbegin_or_lock(&net->fs_addr_lock, &seq);
- hlist_for_each_entry_rcu(server, &net->fs_addresses6, addr6_link) {
+ hlist_for_each_entry_rcu(server, &net->fs_addresses, addr_link) {
estate = rcu_dereference(server->endpoint_state);
alist = estate->addresses;
for (i = 0; i < alist->nr_addrs; i++)
@@ -177,10 +177,8 @@ static struct afs_server *afs_install_server(struct afs_cell *cell,
* bit, but anything we might want to do gets messy and memory
* intensive.
*/
- if (alist->nr_ipv4 > 0)
- hlist_add_head_rcu(&server->addr4_link, &net->fs_addresses4);
- if (alist->nr_addrs > alist->nr_ipv4)
- hlist_add_head_rcu(&server->addr6_link, &net->fs_addresses6);
+ if (alist->nr_addrs > 0)
+ hlist_add_head_rcu(&server->addr_link, &net->fs_addresses);
write_sequnlock(&net->fs_addr_lock);
@@ -511,10 +509,8 @@ static void afs_gc_servers(struct afs_net *net, struct afs_server *gc_list)
list_del(&server->probe_link);
hlist_del_rcu(&server->proc_link);
- if (!hlist_unhashed(&server->addr4_link))
- hlist_del_rcu(&server->addr4_link);
- if (!hlist_unhashed(&server->addr6_link))
- hlist_del_rcu(&server->addr6_link);
+ if (!hlist_unhashed(&server->addr_link))
+ hlist_del_rcu(&server->addr_link);
}
write_sequnlock(&net->fs_lock);
diff --git a/fs/afs/volume.c b/fs/afs/volume.c
index 020ecd4..af3a3f5 100644
--- a/fs/afs/volume.c
+++ b/fs/afs/volume.c
@@ -353,7 +353,7 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key)
{
struct afs_server_list *new, *old, *discard;
struct afs_vldb_entry *vldb;
- char idbuf[16];
+ char idbuf[24];
int ret, idsz;
_enter("");
@@ -361,7 +361,7 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key)
/* We look up an ID by passing it as a decimal string in the
* operation's name parameter.
*/
- idsz = sprintf(idbuf, "%llu", volume->vid);
+ idsz = snprintf(idbuf, sizeof(idbuf), "%llu", volume->vid);
vldb = afs_vl_lookup_vldb(volume->cell, key, idbuf, idsz);
if (IS_ERR(vldb)) {
diff --git a/fs/aio.c b/fs/aio.c
index bb2ff48..9cdaa2f 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -589,13 +589,24 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
{
- struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
- struct kioctx *ctx = req->ki_ctx;
+ struct aio_kiocb *req;
+ struct kioctx *ctx;
unsigned long flags;
+ /*
+ * kiocb didn't come from aio or is neither a read nor a write, hence
+ * ignore it.
+ */
+ if (!(iocb->ki_flags & IOCB_AIO_RW))
+ return;
+
+ req = container_of(iocb, struct aio_kiocb, rw);
+
if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
return;
+ ctx = req->ki_ctx;
+
spin_lock_irqsave(&ctx->ctx_lock, flags);
list_add_tail(&req->ki_list, &ctx->active_reqs);
req->ki_cancel = cancel;
@@ -1509,7 +1520,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
req->ki_complete = aio_complete_rw;
req->private = NULL;
req->ki_pos = iocb->aio_offset;
- req->ki_flags = req->ki_filp->f_iocb_flags;
+ req->ki_flags = req->ki_filp->f_iocb_flags | IOCB_AIO_RW;
if (iocb->aio_flags & IOCB_FLAG_RESFD)
req->ki_flags |= IOCB_EVENTFD;
if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
diff --git a/fs/attr.c b/fs/attr.c
index 5a13f0c..49d23b5 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -352,7 +352,7 @@ int may_setattr(struct mnt_idmap *idmap, struct inode *inode,
EXPORT_SYMBOL(may_setattr);
/**
- * notify_change - modify attributes of a filesytem object
+ * notify_change - modify attributes of a filesystem object
* @idmap: idmap of the mount the inode was found from
* @dentry: object affected
* @attr: new attributes
diff --git a/fs/backing-file.c b/fs/backing-file.c
index a681f38..7401851 100644
--- a/fs/backing-file.c
+++ b/fs/backing-file.c
@@ -325,9 +325,7 @@ EXPORT_SYMBOL_GPL(backing_file_mmap);
static int __init backing_aio_init(void)
{
- backing_aio_cachep = kmem_cache_create("backing_aio",
- sizeof(struct backing_aio),
- 0, SLAB_HWCACHE_ALIGN, NULL);
+ backing_aio_cachep = KMEM_CACHE(backing_aio, SLAB_HWCACHE_ALIGN);
if (!backing_aio_cachep)
return -ENOMEM;
diff --git a/fs/bcachefs/backpointers.c b/fs/bcachefs/backpointers.c
index b4dc319..569b979 100644
--- a/fs/bcachefs/backpointers.c
+++ b/fs/bcachefs/backpointers.c
@@ -68,9 +68,11 @@ void bch2_backpointer_to_text(struct printbuf *out, const struct bch_backpointer
void bch2_backpointer_k_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k)
{
- prt_str(out, "bucket=");
- bch2_bpos_to_text(out, bp_pos_to_bucket(c, k.k->p));
- prt_str(out, " ");
+ if (bch2_dev_exists2(c, k.k->p.inode)) {
+ prt_str(out, "bucket=");
+ bch2_bpos_to_text(out, bp_pos_to_bucket(c, k.k->p));
+ prt_str(out, " ");
+ }
bch2_backpointer_to_text(out, bkey_s_c_to_backpointer(k).v);
}
diff --git a/fs/bcachefs/bcachefs.h b/fs/bcachefs/bcachefs.h
index b80c6c9..69d0d60 100644
--- a/fs/bcachefs/bcachefs.h
+++ b/fs/bcachefs/bcachefs.h
@@ -1249,6 +1249,18 @@ static inline struct stdio_redirect *bch2_fs_stdio_redirect(struct bch_fs *c)
return stdio;
}
+static inline unsigned metadata_replicas_required(struct bch_fs *c)
+{
+ return min(c->opts.metadata_replicas,
+ c->opts.metadata_replicas_required);
+}
+
+static inline unsigned data_replicas_required(struct bch_fs *c)
+{
+ return min(c->opts.data_replicas,
+ c->opts.data_replicas_required);
+}
+
#define BKEY_PADDED_ONSTACK(key, pad) \
struct { struct bkey_i key; __u64 key ## _pad[pad]; }
diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c
index 5467a86..3ef338d 100644
--- a/fs/bcachefs/btree_iter.c
+++ b/fs/bcachefs/btree_iter.c
@@ -2156,7 +2156,9 @@ struct bkey_s_c bch2_btree_iter_peek_upto(struct btree_iter *iter, struct bpos e
* isn't monotonically increasing before FILTER_SNAPSHOTS, and
* that's what we check against in extents mode:
*/
- if (k.k->p.inode > end.inode)
+ if (unlikely(!(iter->flags & BTREE_ITER_IS_EXTENTS)
+ ? bkey_gt(k.k->p, end)
+ : k.k->p.inode > end.inode))
goto end;
if (iter->update_path &&
diff --git a/fs/bcachefs/btree_update_interior.c b/fs/bcachefs/btree_update_interior.c
index 17a5938..4530b14 100644
--- a/fs/bcachefs/btree_update_interior.c
+++ b/fs/bcachefs/btree_update_interior.c
@@ -280,7 +280,8 @@ static struct btree *__bch2_btree_node_alloc(struct btree_trans *trans,
writepoint_ptr(&c->btree_write_point),
&devs_have,
res->nr_replicas,
- c->opts.metadata_replicas_required,
+ min(res->nr_replicas,
+ c->opts.metadata_replicas_required),
watermark, 0, cl, &wp);
if (unlikely(ret))
return ERR_PTR(ret);
diff --git a/fs/bcachefs/fs-io-buffered.c b/fs/bcachefs/fs-io-buffered.c
index 73c12e5..27710cd 100644
--- a/fs/bcachefs/fs-io-buffered.c
+++ b/fs/bcachefs/fs-io-buffered.c
@@ -303,18 +303,6 @@ void bch2_readahead(struct readahead_control *ractl)
darray_exit(&readpages_iter.folios);
}
-static void __bchfs_readfolio(struct bch_fs *c, struct bch_read_bio *rbio,
- subvol_inum inum, struct folio *folio)
-{
- bch2_folio_create(folio, __GFP_NOFAIL);
-
- rbio->bio.bi_opf = REQ_OP_READ|REQ_SYNC;
- rbio->bio.bi_iter.bi_sector = folio_sector(folio);
- BUG_ON(!bio_add_folio(&rbio->bio, folio, folio_size(folio), 0));
-
- bch2_trans_run(c, (bchfs_read(trans, rbio, inum, NULL), 0));
-}
-
static void bch2_read_single_folio_end_io(struct bio *bio)
{
complete(bio->bi_private);
@@ -329,6 +317,9 @@ int bch2_read_single_folio(struct folio *folio, struct address_space *mapping)
int ret;
DECLARE_COMPLETION_ONSTACK(done);
+ if (!bch2_folio_create(folio, GFP_KERNEL))
+ return -ENOMEM;
+
bch2_inode_opts_get(&opts, c, &inode->ei_inode);
rbio = rbio_init(bio_alloc_bioset(NULL, 1, REQ_OP_READ, GFP_KERNEL, &c->bio_read),
@@ -336,7 +327,11 @@ int bch2_read_single_folio(struct folio *folio, struct address_space *mapping)
rbio->bio.bi_private = &done;
rbio->bio.bi_end_io = bch2_read_single_folio_end_io;
- __bchfs_readfolio(c, rbio, inode_inum(inode), folio);
+ rbio->bio.bi_opf = REQ_OP_READ|REQ_SYNC;
+ rbio->bio.bi_iter.bi_sector = folio_sector(folio);
+ BUG_ON(!bio_add_folio(&rbio->bio, folio, folio_size(folio), 0));
+
+ bch2_trans_run(c, (bchfs_read(trans, rbio, inode_inum(inode), NULL), 0));
wait_for_completion(&done);
ret = blk_status_to_errno(rbio->bio.bi_status);
diff --git a/fs/bcachefs/fs-io-direct.c b/fs/bcachefs/fs-io-direct.c
index e3b219e..33cb6da 100644
--- a/fs/bcachefs/fs-io-direct.c
+++ b/fs/bcachefs/fs-io-direct.c
@@ -88,6 +88,8 @@ static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter *iter)
return ret;
shorten = iov_iter_count(iter) - round_up(ret, block_bytes(c));
+ if (shorten >= iter->count)
+ shorten = 0;
iter->count -= shorten;
bio = bio_alloc_bioset(NULL,
diff --git a/fs/bcachefs/fs-ioctl.c b/fs/bcachefs/fs-ioctl.c
index 3a4c24c..3dc8630 100644
--- a/fs/bcachefs/fs-ioctl.c
+++ b/fs/bcachefs/fs-ioctl.c
@@ -455,6 +455,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
if (IS_ERR(victim))
return PTR_ERR(victim);
+ dir = d_inode(path.dentry);
if (victim->d_sb->s_fs_info != c) {
ret = -EXDEV;
goto err;
@@ -463,14 +464,13 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
ret = -ENOENT;
goto err;
}
- dir = d_inode(path.dentry);
ret = __bch2_unlink(dir, victim, true);
if (!ret) {
fsnotify_rmdir(dir, victim);
d_delete(victim);
}
- inode_unlock(dir);
err:
+ inode_unlock(dir);
dput(victim);
path_put(&path);
return ret;
diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index ec419b8..77ae655 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -435,7 +435,7 @@ static int bch2_link(struct dentry *old_dentry, struct inode *vdir,
bch2_subvol_is_ro(c, inode->ei_subvol) ?:
__bch2_link(c, inode, dir, dentry);
if (unlikely(ret))
- return ret;
+ return bch2_err_class(ret);
ihold(&inode->v);
d_instantiate(dentry, &inode->v);
@@ -487,8 +487,9 @@ static int bch2_unlink(struct inode *vdir, struct dentry *dentry)
struct bch_inode_info *dir= to_bch_ei(vdir);
struct bch_fs *c = dir->v.i_sb->s_fs_info;
- return bch2_subvol_is_ro(c, dir->ei_subvol) ?:
+ int ret = bch2_subvol_is_ro(c, dir->ei_subvol) ?:
__bch2_unlink(vdir, dentry, false);
+ return bch2_err_class(ret);
}
static int bch2_symlink(struct mnt_idmap *idmap,
@@ -523,7 +524,7 @@ static int bch2_symlink(struct mnt_idmap *idmap,
return 0;
err:
iput(&inode->v);
- return ret;
+ return bch2_err_class(ret);
}
static int bch2_mkdir(struct mnt_idmap *idmap,
@@ -641,7 +642,7 @@ static int bch2_rename2(struct mnt_idmap *idmap,
src_inode,
dst_inode);
- return ret;
+ return bch2_err_class(ret);
}
static void bch2_setattr_copy(struct mnt_idmap *idmap,
diff --git a/fs/bcachefs/io_write.c b/fs/bcachefs/io_write.c
index ef3a53f..2c098ac 100644
--- a/fs/bcachefs/io_write.c
+++ b/fs/bcachefs/io_write.c
@@ -1564,6 +1564,7 @@ CLOSURE_CALLBACK(bch2_write)
BUG_ON(!op->write_point.v);
BUG_ON(bkey_eq(op->pos, POS_MAX));
+ op->nr_replicas_required = min_t(unsigned, op->nr_replicas_required, op->nr_replicas);
op->start_time = local_clock();
bch2_keylist_init(&op->insert_keys, op->inline_keys);
wbio_init(bio)->put_bio = false;
diff --git a/fs/bcachefs/journal_io.c b/fs/bcachefs/journal_io.c
index bfd6585..4780519 100644
--- a/fs/bcachefs/journal_io.c
+++ b/fs/bcachefs/journal_io.c
@@ -1478,6 +1478,8 @@ static int journal_write_alloc(struct journal *j, struct journal_buf *w)
c->opts.foreground_target;
unsigned i, replicas = 0, replicas_want =
READ_ONCE(c->opts.metadata_replicas);
+ unsigned replicas_need = min_t(unsigned, replicas_want,
+ READ_ONCE(c->opts.metadata_replicas_required));
rcu_read_lock();
retry:
@@ -1526,7 +1528,7 @@ static int journal_write_alloc(struct journal *j, struct journal_buf *w)
BUG_ON(bkey_val_u64s(&w->key.k) > BCH_REPLICAS_MAX);
- return replicas >= c->opts.metadata_replicas_required ? 0 : -EROFS;
+ return replicas >= replicas_need ? 0 : -EROFS;
}
static void journal_buf_realloc(struct journal *j, struct journal_buf *buf)
diff --git a/fs/bcachefs/journal_reclaim.c b/fs/bcachefs/journal_reclaim.c
index 820d25e..c33dca6 100644
--- a/fs/bcachefs/journal_reclaim.c
+++ b/fs/bcachefs/journal_reclaim.c
@@ -205,7 +205,7 @@ void bch2_journal_space_available(struct journal *j)
j->can_discard = can_discard;
- if (nr_online < c->opts.metadata_replicas_required) {
+ if (nr_online < metadata_replicas_required(c)) {
ret = JOURNAL_ERR_insufficient_devices;
goto out;
}
@@ -892,9 +892,11 @@ int bch2_journal_flush_device_pins(struct journal *j, int dev_idx)
journal_seq_pin(j, seq)->devs);
seq++;
- spin_unlock(&j->lock);
- ret = bch2_mark_replicas(c, &replicas.e);
- spin_lock(&j->lock);
+ if (replicas.e.nr_devs) {
+ spin_unlock(&j->lock);
+ ret = bch2_mark_replicas(c, &replicas.e);
+ spin_lock(&j->lock);
+ }
}
spin_unlock(&j->lock);
err:
diff --git a/fs/bcachefs/mean_and_variance.h b/fs/bcachefs/mean_and_variance.h
index b2be565..64df11a 100644
--- a/fs/bcachefs/mean_and_variance.h
+++ b/fs/bcachefs/mean_and_variance.h
@@ -17,7 +17,7 @@
* Rust and rustc has issues with u128.
*/
-#if defined(__SIZEOF_INT128__) && defined(__KERNEL__)
+#if defined(__SIZEOF_INT128__) && defined(__KERNEL__) && !defined(CONFIG_PARISC)
typedef struct {
unsigned __int128 v;
diff --git a/fs/bcachefs/printbuf.c b/fs/bcachefs/printbuf.c
index accf246..b27d229 100644
--- a/fs/bcachefs/printbuf.c
+++ b/fs/bcachefs/printbuf.c
@@ -56,6 +56,7 @@ void bch2_prt_vprintf(struct printbuf *out, const char *fmt, va_list args)
va_copy(args2, args);
len = vsnprintf(out->buf + out->pos, printbuf_remaining(out), fmt, args2);
+ va_end(args2);
} while (len + 1 >= printbuf_remaining(out) &&
!bch2_printbuf_make_room(out, len + 1));
diff --git a/fs/bcachefs/recovery.c b/fs/bcachefs/recovery.c
index 9127d0e..21e13bb 100644
--- a/fs/bcachefs/recovery.c
+++ b/fs/bcachefs/recovery.c
@@ -577,8 +577,9 @@ u64 bch2_recovery_passes_from_stable(u64 v)
static bool check_version_upgrade(struct bch_fs *c)
{
- unsigned latest_compatible = bch2_latest_compatible_version(c->sb.version);
unsigned latest_version = bcachefs_metadata_version_current;
+ unsigned latest_compatible = min(latest_version,
+ bch2_latest_compatible_version(c->sb.version));
unsigned old_version = c->sb.version_upgrade_complete ?: c->sb.version;
unsigned new_version = 0;
@@ -597,7 +598,7 @@ static bool check_version_upgrade(struct bch_fs *c)
new_version = latest_version;
break;
case BCH_VERSION_UPGRADE_none:
- new_version = old_version;
+ new_version = min(old_version, latest_version);
break;
}
}
@@ -774,7 +775,7 @@ int bch2_fs_recovery(struct bch_fs *c)
goto err;
}
- if (!(c->opts.nochanges && c->opts.norecovery)) {
+ if (!c->opts.nochanges) {
mutex_lock(&c->sb_lock);
bool write_sb = false;
@@ -804,7 +805,7 @@ int bch2_fs_recovery(struct bch_fs *c)
if (bch2_check_version_downgrade(c)) {
struct printbuf buf = PRINTBUF;
- prt_str(&buf, "Version downgrade required:\n");
+ prt_str(&buf, "Version downgrade required:");
__le64 passes = ext->recovery_passes_required[0];
bch2_sb_set_downgrade(c,
@@ -812,7 +813,7 @@ int bch2_fs_recovery(struct bch_fs *c)
BCH_VERSION_MINOR(c->sb.version));
passes = ext->recovery_passes_required[0] & ~passes;
if (passes) {
- prt_str(&buf, " running recovery passes: ");
+ prt_str(&buf, "\n running recovery passes: ");
prt_bitflags(&buf, bch2_recovery_passes,
bch2_recovery_passes_from_stable(le64_to_cpu(passes)));
}
diff --git a/fs/bcachefs/sb-members.c b/fs/bcachefs/sb-members.c
index a45354d..eff5ce1 100644
--- a/fs/bcachefs/sb-members.c
+++ b/fs/bcachefs/sb-members.c
@@ -421,7 +421,7 @@ void bch2_dev_errors_reset(struct bch_dev *ca)
m = bch2_members_v2_get_mut(c->disk_sb.sb, ca->dev_idx);
for (unsigned i = 0; i < ARRAY_SIZE(m->errors_at_reset); i++)
m->errors_at_reset[i] = cpu_to_le64(atomic64_read(&ca->errors[i]));
- m->errors_reset_time = ktime_get_real_seconds();
+ m->errors_reset_time = cpu_to_le64(ktime_get_real_seconds());
bch2_write_super(c);
mutex_unlock(&c->sb_lock);
diff --git a/fs/bcachefs/snapshot.c b/fs/bcachefs/snapshot.c
index 45f67e8..ac6ba04 100644
--- a/fs/bcachefs/snapshot.c
+++ b/fs/bcachefs/snapshot.c
@@ -728,7 +728,7 @@ static int check_snapshot(struct btree_trans *trans,
return 0;
memset(&s, 0, sizeof(s));
- memcpy(&s, k.v, bkey_val_bytes(k.k));
+ memcpy(&s, k.v, min(sizeof(s), bkey_val_bytes(k.k)));
id = le32_to_cpu(s.parent);
if (id) {
diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
index d60c7d2..bd64eb6 100644
--- a/fs/bcachefs/super-io.c
+++ b/fs/bcachefs/super-io.c
@@ -142,8 +142,8 @@ void bch2_sb_field_delete(struct bch_sb_handle *sb,
void bch2_free_super(struct bch_sb_handle *sb)
{
kfree(sb->bio);
- if (!IS_ERR_OR_NULL(sb->bdev_handle))
- bdev_release(sb->bdev_handle);
+ if (!IS_ERR_OR_NULL(sb->s_bdev_file))
+ fput(sb->s_bdev_file);
kfree(sb->holder);
kfree(sb->sb_name);
@@ -704,22 +704,22 @@ static int __bch2_read_super(const char *path, struct bch_opts *opts,
if (!opt_get(*opts, nochanges))
sb->mode |= BLK_OPEN_WRITE;
- sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
- if (IS_ERR(sb->bdev_handle) &&
- PTR_ERR(sb->bdev_handle) == -EACCES &&
+ sb->s_bdev_file = bdev_file_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
+ if (IS_ERR(sb->s_bdev_file) &&
+ PTR_ERR(sb->s_bdev_file) == -EACCES &&
opt_get(*opts, read_only)) {
sb->mode &= ~BLK_OPEN_WRITE;
- sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
- if (!IS_ERR(sb->bdev_handle))
+ sb->s_bdev_file = bdev_file_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
+ if (!IS_ERR(sb->s_bdev_file))
opt_set(*opts, nochanges, true);
}
- if (IS_ERR(sb->bdev_handle)) {
- ret = PTR_ERR(sb->bdev_handle);
- goto out;
+ if (IS_ERR(sb->s_bdev_file)) {
+ ret = PTR_ERR(sb->s_bdev_file);
+ goto err;
}
- sb->bdev = sb->bdev_handle->bdev;
+ sb->bdev = file_bdev(sb->s_bdev_file);
ret = bch2_sb_realloc(sb, 0);
if (ret) {
diff --git a/fs/bcachefs/super.c b/fs/bcachefs/super.c
index b991140..6b23e11 100644
--- a/fs/bcachefs/super.c
+++ b/fs/bcachefs/super.c
@@ -1428,10 +1428,10 @@ bool bch2_dev_state_allowed(struct bch_fs *c, struct bch_dev *ca,
required = max(!(flags & BCH_FORCE_IF_METADATA_DEGRADED)
? c->opts.metadata_replicas
- : c->opts.metadata_replicas_required,
+ : metadata_replicas_required(c),
!(flags & BCH_FORCE_IF_DATA_DEGRADED)
? c->opts.data_replicas
- : c->opts.data_replicas_required);
+ : data_replicas_required(c));
return nr_rw >= required;
case BCH_MEMBER_STATE_failed:
diff --git a/fs/bcachefs/super_types.h b/fs/bcachefs/super_types.h
index 0e5a14f..ec784d97 100644
--- a/fs/bcachefs/super_types.h
+++ b/fs/bcachefs/super_types.h
@@ -4,7 +4,7 @@
struct bch_sb_handle {
struct bch_sb *sb;
- struct bdev_handle *bdev_handle;
+ struct file *s_bdev_file;
struct block_device *bdev;
char *sb_name;
struct bio *bio;
diff --git a/fs/bcachefs/thread_with_file.c b/fs/bcachefs/thread_with_file.c
index b1c867a..9220d7d 100644
--- a/fs/bcachefs/thread_with_file.c
+++ b/fs/bcachefs/thread_with_file.c
@@ -53,9 +53,9 @@ int bch2_run_thread_with_file(struct thread_with_file *thr,
if (ret)
goto err;
- fd_install(fd, file);
get_task_struct(thr->task);
wake_up_process(thr->task);
+ fd_install(fd, file);
return fd;
err:
if (fd >= 0)
diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index 56b815f..3a32faa 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -289,7 +289,7 @@ int bch2_save_backtrace(bch_stacktrace *stack, struct task_struct *task, unsigne
do {
nr_entries = stack_trace_save_tsk(task, stack->data, stack->size, skipnr + 1);
} while (nr_entries == stack->size &&
- !(ret = darray_make_room(stack, stack->size * 2)));
+ !(ret = darray_make_room_gfp(stack, stack->size * 2, gfp)));
stack->nr = nr_entries;
up_read(&task->signal->exec_update_lock);
@@ -418,14 +418,15 @@ static inline void bch2_time_stats_update_one(struct bch2_time_stats *stats,
bch2_quantiles_update(&stats->quantiles, duration);
}
- if (time_after64(end, stats->last_event)) {
+ if (stats->last_event && time_after64(end, stats->last_event)) {
freq = end - stats->last_event;
mean_and_variance_update(&stats->freq_stats, freq);
mean_and_variance_weighted_update(&stats->freq_stats_weighted, freq);
stats->max_freq = max(stats->max_freq, freq);
stats->min_freq = min(stats->min_freq, freq);
- stats->last_event = end;
}
+
+ stats->last_event = end;
}
static void __bch2_time_stats_clear_buffer(struct bch2_time_stats *stats,
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index a9be9ac..378d910 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1455,6 +1455,7 @@ static bool clean_pinned_extents(struct btrfs_trans_handle *trans,
*/
void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
{
+ LIST_HEAD(retry_list);
struct btrfs_block_group *block_group;
struct btrfs_space_info *space_info;
struct btrfs_trans_handle *trans;
@@ -1476,6 +1477,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
spin_lock(&fs_info->unused_bgs_lock);
while (!list_empty(&fs_info->unused_bgs)) {
+ u64 used;
int trimming;
block_group = list_first_entry(&fs_info->unused_bgs,
@@ -1511,9 +1513,9 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
goto next;
}
+ spin_lock(&space_info->lock);
spin_lock(&block_group->lock);
- if (block_group->reserved || block_group->pinned ||
- block_group->used || block_group->ro ||
+ if (btrfs_is_block_group_used(block_group) || block_group->ro ||
list_is_singular(&block_group->list)) {
/*
* We want to bail if we made new allocations or have
@@ -1523,10 +1525,49 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
*/
trace_btrfs_skip_unused_block_group(block_group);
spin_unlock(&block_group->lock);
+ spin_unlock(&space_info->lock);
up_write(&space_info->groups_sem);
goto next;
}
+
+ /*
+ * The block group may be unused but there may be space reserved
+ * accounting with the existence of that block group, that is,
+ * space_info->bytes_may_use was incremented by a task but no
+ * space was yet allocated from the block group by the task.
+ * That space may or may not be allocated, as we are generally
+ * pessimistic about space reservation for metadata as well as
+ * for data when using compression (as we reserve space based on
+ * the worst case, when data can't be compressed, and before
+ * actually attempting compression, before starting writeback).
+ *
+ * So check if the total space of the space_info minus the size
+ * of this block group is less than the used space of the
+ * space_info - if that's the case, then it means we have tasks
+ * that might be relying on the block group in order to allocate
+ * extents, and add back the block group to the unused list when
+ * we finish, so that we retry later in case no tasks ended up
+ * needing to allocate extents from the block group.
+ */
+ used = btrfs_space_info_used(space_info, true);
+ if (space_info->total_bytes - block_group->length < used) {
+ /*
+ * Add a reference for the list, compensate for the ref
+ * drop under the "next" label for the
+ * fs_info->unused_bgs list.
+ */
+ btrfs_get_block_group(block_group);
+ list_add_tail(&block_group->bg_list, &retry_list);
+
+ trace_btrfs_skip_unused_block_group(block_group);
+ spin_unlock(&block_group->lock);
+ spin_unlock(&space_info->lock);
+ up_write(&space_info->groups_sem);
+ goto next;
+ }
+
spin_unlock(&block_group->lock);
+ spin_unlock(&space_info->lock);
/* We don't want to force the issue, only flip if it's ok. */
ret = inc_block_group_ro(block_group, 0);
@@ -1650,12 +1691,16 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
btrfs_put_block_group(block_group);
spin_lock(&fs_info->unused_bgs_lock);
}
+ list_splice_tail(&retry_list, &fs_info->unused_bgs);
spin_unlock(&fs_info->unused_bgs_lock);
mutex_unlock(&fs_info->reclaim_bgs_lock);
return;
flip_async:
btrfs_end_transaction(trans);
+ spin_lock(&fs_info->unused_bgs_lock);
+ list_splice_tail(&retry_list, &fs_info->unused_bgs);
+ spin_unlock(&fs_info->unused_bgs_lock);
mutex_unlock(&fs_info->reclaim_bgs_lock);
btrfs_put_block_group(block_group);
btrfs_discard_punt_unused_bgs_list(fs_info);
@@ -2684,6 +2729,37 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
btrfs_dec_delayed_refs_rsv_bg_inserts(fs_info);
list_del_init(&block_group->bg_list);
clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags);
+
+ /*
+ * If the block group is still unused, add it to the list of
+ * unused block groups. The block group may have been created in
+ * order to satisfy a space reservation, in which case the
+ * extent allocation only happens later. But often we don't
+ * actually need to allocate space that we previously reserved,
+ * so the block group may become unused for a long time. For
+ * example for metadata we generally reserve space for a worst
+ * possible scenario, but then don't end up allocating all that
+ * space or none at all (due to no need to COW, extent buffers
+ * were already COWed in the current transaction and still
+ * unwritten, tree heights lower than the maximum possible
+ * height, etc). For data we generally reserve the axact amount
+ * of space we are going to allocate later, the exception is
+ * when using compression, as we must reserve space based on the
+ * uncompressed data size, because the compression is only done
+ * when writeback triggered and we don't know how much space we
+ * are actually going to need, so we reserve the uncompressed
+ * size because the data may be uncompressible in the worst case.
+ */
+ if (ret == 0) {
+ bool used;
+
+ spin_lock(&block_group->lock);
+ used = btrfs_is_block_group_used(block_group);
+ spin_unlock(&block_group->lock);
+
+ if (!used)
+ btrfs_mark_bg_unused(block_group);
+ }
}
btrfs_trans_release_chunk_metadata(trans);
}
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index c4a1f01..962b119 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -257,6 +257,13 @@ static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
return (block_group->start + block_group->length);
}
+static inline bool btrfs_is_block_group_used(const struct btrfs_block_group *bg)
+{
+ lockdep_assert_held(&bg->lock);
+
+ return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0);
+}
+
static inline bool btrfs_is_block_group_data_only(
struct btrfs_block_group *block_group)
{
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index ceb5f58..1043a81 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -494,7 +494,7 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans,
block_rsv = get_block_rsv(trans, root);
- if (unlikely(block_rsv->size == 0))
+ if (unlikely(btrfs_block_rsv_size(block_rsv) == 0))
goto try_reserve;
again:
ret = btrfs_block_rsv_use_bytes(block_rsv, blocksize);
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index b0bd12b..43a9a6b 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -101,4 +101,36 @@ static inline bool btrfs_block_rsv_full(const struct btrfs_block_rsv *rsv)
return data_race(rsv->full);
}
+/*
+ * Get the reserved mount of a block reserve in a context where getting a stale
+ * value is acceptable, instead of accessing it directly and trigger data race
+ * warning from KCSAN.
+ */
+static inline u64 btrfs_block_rsv_reserved(struct btrfs_block_rsv *rsv)
+{
+ u64 ret;
+
+ spin_lock(&rsv->lock);
+ ret = rsv->reserved;
+ spin_unlock(&rsv->lock);
+
+ return ret;
+}
+
+/*
+ * Get the size of a block reserve in a context where getting a stale value is
+ * acceptable, instead of accessing it directly and trigger data race warning
+ * from KCSAN.
+ */
+static inline u64 btrfs_block_rsv_size(struct btrfs_block_rsv *rsv)
+{
+ u64 ret;
+
+ spin_lock(&rsv->lock);
+ ret = rsv->size;
+ spin_unlock(&rsv->lock);
+
+ return ret;
+}
+
#endif /* BTRFS_BLOCK_RSV_H */
diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c
index c276b13..5b0b6457 100644
--- a/fs/btrfs/defrag.c
+++ b/fs/btrfs/defrag.c
@@ -1046,7 +1046,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
goto add;
/* Skip too large extent */
- if (range_len >= extent_thresh)
+ if (em->len >= extent_thresh)
goto next;
/*
diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c
index 2833e8e..acf9f4b 100644
--- a/fs/btrfs/delalloc-space.c
+++ b/fs/btrfs/delalloc-space.c
@@ -245,7 +245,6 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
u64 reserve_size = 0;
u64 qgroup_rsv_size = 0;
- u64 csum_leaves;
unsigned outstanding_extents;
lockdep_assert_held(&inode->lock);
@@ -260,10 +259,12 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
outstanding_extents);
reserve_size += btrfs_calc_metadata_size(fs_info, 1);
}
- csum_leaves = btrfs_csum_bytes_to_leaves(fs_info,
- inode->csum_bytes);
- reserve_size += btrfs_calc_insert_metadata_size(fs_info,
- csum_leaves);
+ if (!(inode->flags & BTRFS_INODE_NODATASUM)) {
+ u64 csum_leaves;
+
+ csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, inode->csum_bytes);
+ reserve_size += btrfs_calc_insert_metadata_size(fs_info, csum_leaves);
+ }
/*
* For qgroup rsv, the calculation is very simple:
* account one nodesize for each outstanding extent
@@ -278,14 +279,20 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
spin_unlock(&block_rsv->lock);
}
-static void calc_inode_reservations(struct btrfs_fs_info *fs_info,
+static void calc_inode_reservations(struct btrfs_inode *inode,
u64 num_bytes, u64 disk_num_bytes,
u64 *meta_reserve, u64 *qgroup_reserve)
{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
u64 nr_extents = count_max_extents(fs_info, num_bytes);
- u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, disk_num_bytes);
+ u64 csum_leaves;
u64 inode_update = btrfs_calc_metadata_size(fs_info, 1);
+ if (inode->flags & BTRFS_INODE_NODATASUM)
+ csum_leaves = 0;
+ else
+ csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, disk_num_bytes);
+
*meta_reserve = btrfs_calc_insert_metadata_size(fs_info,
nr_extents + csum_leaves);
@@ -337,7 +344,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
* everything out and try again, which is bad. This way we just
* over-reserve slightly, and clean up the mess when we are done.
*/
- calc_inode_reservations(fs_info, num_bytes, disk_num_bytes,
+ calc_inode_reservations(inode, num_bytes, disk_num_bytes,
&meta_reserve, &qgroup_reserve);
ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserve, true,
noflush);
@@ -359,7 +366,8 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
nr_extents = count_max_extents(fs_info, num_bytes);
spin_lock(&inode->lock);
btrfs_mod_outstanding_extents(inode, nr_extents);
- inode->csum_bytes += disk_num_bytes;
+ if (!(inode->flags & BTRFS_INODE_NODATASUM))
+ inode->csum_bytes += disk_num_bytes;
btrfs_calculate_inode_block_rsv_size(fs_info, inode);
spin_unlock(&inode->lock);
@@ -393,7 +401,8 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
num_bytes = ALIGN(num_bytes, fs_info->sectorsize);
spin_lock(&inode->lock);
- inode->csum_bytes -= num_bytes;
+ if (!(inode->flags & BTRFS_INODE_NODATASUM))
+ inode->csum_bytes -= num_bytes;
btrfs_calculate_inode_block_rsv_size(fs_info, inode);
spin_unlock(&inode->lock);
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 1502d66..fb33027 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -246,7 +246,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
{
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_device *device;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct block_device *bdev;
u64 devid = BTRFS_DEV_REPLACE_DEVID;
int ret = 0;
@@ -257,13 +257,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
return -EINVAL;
}
- bdev_handle = bdev_open_by_path(device_path, BLK_OPEN_WRITE,
+ bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
fs_info->bdev_holder, NULL);
- if (IS_ERR(bdev_handle)) {
+ if (IS_ERR(bdev_file)) {
btrfs_err(fs_info, "target device %s is invalid!", device_path);
- return PTR_ERR(bdev_handle);
+ return PTR_ERR(bdev_file);
}
- bdev = bdev_handle->bdev;
+ bdev = file_bdev(bdev_file);
if (!btrfs_check_device_zone_type(fs_info, bdev)) {
btrfs_err(fs_info,
@@ -314,7 +314,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
device->commit_bytes_used = device->bytes_used;
device->fs_info = fs_info;
device->bdev = bdev;
- device->bdev_handle = bdev_handle;
+ device->bdev_file = bdev_file;
set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
device->dev_stats_valid = 1;
@@ -335,7 +335,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
return 0;
error:
- bdev_release(bdev_handle);
+ fput(bdev_file);
return ret;
}
@@ -725,6 +725,23 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
return ret;
}
+static int btrfs_check_replace_dev_names(struct btrfs_ioctl_dev_replace_args *args)
+{
+ if (args->start.srcdevid == 0) {
+ if (memchr(args->start.srcdev_name, 0,
+ sizeof(args->start.srcdev_name)) == NULL)
+ return -ENAMETOOLONG;
+ } else {
+ args->start.srcdev_name[0] = 0;
+ }
+
+ if (memchr(args->start.tgtdev_name, 0,
+ sizeof(args->start.tgtdev_name)) == NULL)
+ return -ENAMETOOLONG;
+
+ return 0;
+}
+
int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info,
struct btrfs_ioctl_dev_replace_args *args)
{
@@ -737,10 +754,9 @@ int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info,
default:
return -EINVAL;
}
-
- if ((args->start.srcdevid == 0 && args->start.srcdev_name[0] == '\0') ||
- args->start.tgtdev_name[0] == '\0')
- return -EINVAL;
+ ret = btrfs_check_replace_dev_names(args);
+ if (ret < 0)
+ return ret;
ret = btrfs_dev_replace_start(fs_info, args->start.tgtdev_name,
args->start.srcdevid,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c6907d5..c843563 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1307,12 +1307,12 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
*
* @objectid: root id
* @anon_dev: preallocated anonymous block device number for new roots,
- * pass 0 for new allocation.
+ * pass NULL for a new allocation.
* @check_ref: whether to check root item references, If true, return -ENOENT
* for orphan roots
*/
static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
- u64 objectid, dev_t anon_dev,
+ u64 objectid, dev_t *anon_dev,
bool check_ref)
{
struct btrfs_root *root;
@@ -1336,8 +1336,17 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
again:
root = btrfs_lookup_fs_root(fs_info, objectid);
if (root) {
- /* Shouldn't get preallocated anon_dev for cached roots */
- ASSERT(!anon_dev);
+ /*
+ * Some other caller may have read out the newly inserted
+ * subvolume already (for things like backref walk etc). Not
+ * that common but still possible. In that case, we just need
+ * to free the anon_dev.
+ */
+ if (unlikely(anon_dev && *anon_dev)) {
+ free_anon_bdev(*anon_dev);
+ *anon_dev = 0;
+ }
+
if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
btrfs_put_root(root);
return ERR_PTR(-ENOENT);
@@ -1357,7 +1366,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
goto fail;
}
- ret = btrfs_init_fs_root(root, anon_dev);
+ ret = btrfs_init_fs_root(root, anon_dev ? *anon_dev : 0);
if (ret)
goto fail;
@@ -1393,7 +1402,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
* root's anon_dev to 0 to avoid a double free, once by btrfs_put_root()
* and once again by our caller.
*/
- if (anon_dev)
+ if (anon_dev && *anon_dev)
root->anon_dev = 0;
btrfs_put_root(root);
return ERR_PTR(ret);
@@ -1409,7 +1418,7 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
u64 objectid, bool check_ref)
{
- return btrfs_get_root_ref(fs_info, objectid, 0, check_ref);
+ return btrfs_get_root_ref(fs_info, objectid, NULL, check_ref);
}
/*
@@ -1417,11 +1426,11 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
* the anonymous block device id
*
* @objectid: tree objectid
- * @anon_dev: if zero, allocate a new anonymous block device or use the
- * parameter value
+ * @anon_dev: if NULL, allocate a new anonymous block device or use the
+ * parameter value if not NULL
*/
struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
- u64 objectid, dev_t anon_dev)
+ u64 objectid, dev_t *anon_dev)
{
return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
}
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 9413726..eb3473d 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -61,7 +61,7 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
u64 objectid, bool check_ref);
struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
- u64 objectid, dev_t anon_dev);
+ u64 objectid, dev_t *anon_dev);
struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
struct btrfs_path *path,
u64 objectid);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cfd2967..8b4bef0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2480,6 +2480,7 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
struct fiemap_cache *cache,
u64 offset, u64 phys, u64 len, u32 flags)
{
+ u64 cache_end;
int ret = 0;
/* Set at the end of extent_fiemap(). */
@@ -2489,15 +2490,102 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
goto assign;
/*
- * Sanity check, extent_fiemap() should have ensured that new
- * fiemap extent won't overlap with cached one.
- * Not recoverable.
+ * When iterating the extents of the inode, at extent_fiemap(), we may
+ * find an extent that starts at an offset behind the end offset of the
+ * previous extent we processed. This happens if fiemap is called
+ * without FIEMAP_FLAG_SYNC and there are ordered extents completing
+ * while we call btrfs_next_leaf() (through fiemap_next_leaf_item()).
*
- * NOTE: Physical address can overlap, due to compression
+ * For example we are in leaf X processing its last item, which is the
+ * file extent item for file range [512K, 1M[, and after
+ * btrfs_next_leaf() releases the path, there's an ordered extent that
+ * completes for the file range [768K, 2M[, and that results in trimming
+ * the file extent item so that it now corresponds to the file range
+ * [512K, 768K[ and a new file extent item is inserted for the file
+ * range [768K, 2M[, which may end up as the last item of leaf X or as
+ * the first item of the next leaf - in either case btrfs_next_leaf()
+ * will leave us with a path pointing to the new extent item, for the
+ * file range [768K, 2M[, since that's the first key that follows the
+ * last one we processed. So in order not to report overlapping extents
+ * to user space, we trim the length of the previously cached extent and
+ * emit it.
+ *
+ * Upon calling btrfs_next_leaf() we may also find an extent with an
+ * offset smaller than or equals to cache->offset, and this happens
+ * when we had a hole or prealloc extent with several delalloc ranges in
+ * it, but after btrfs_next_leaf() released the path, delalloc was
+ * flushed and the resulting ordered extents were completed, so we can
+ * now have found a file extent item for an offset that is smaller than
+ * or equals to what we have in cache->offset. We deal with this as
+ * described below.
*/
- if (cache->offset + cache->len > offset) {
- WARN_ON(1);
- return -EINVAL;
+ cache_end = cache->offset + cache->len;
+ if (cache_end > offset) {
+ if (offset == cache->offset) {
+ /*
+ * We cached a dealloc range (found in the io tree) for
+ * a hole or prealloc extent and we have now found a
+ * file extent item for the same offset. What we have
+ * now is more recent and up to date, so discard what
+ * we had in the cache and use what we have just found.
+ */
+ goto assign;
+ } else if (offset > cache->offset) {
+ /*
+ * The extent range we previously found ends after the
+ * offset of the file extent item we found and that
+ * offset falls somewhere in the middle of that previous
+ * extent range. So adjust the range we previously found
+ * to end at the offset of the file extent item we have
+ * just found, since this extent is more up to date.
+ * Emit that adjusted range and cache the file extent
+ * item we have just found. This corresponds to the case
+ * where a previously found file extent item was split
+ * due to an ordered extent completing.
+ */
+ cache->len = offset - cache->offset;
+ goto emit;
+ } else {
+ const u64 range_end = offset + len;
+
+ /*
+ * The offset of the file extent item we have just found
+ * is behind the cached offset. This means we were
+ * processing a hole or prealloc extent for which we
+ * have found delalloc ranges (in the io tree), so what
+ * we have in the cache is the last delalloc range we
+ * found while the file extent item we found can be
+ * either for a whole delalloc range we previously
+ * emmitted or only a part of that range.
+ *
+ * We have two cases here:
+ *
+ * 1) The file extent item's range ends at or behind the
+ * cached extent's end. In this case just ignore the
+ * current file extent item because we don't want to
+ * overlap with previous ranges that may have been
+ * emmitted already;
+ *
+ * 2) The file extent item starts behind the currently
+ * cached extent but its end offset goes beyond the
+ * end offset of the cached extent. We don't want to
+ * overlap with a previous range that may have been
+ * emmitted already, so we emit the currently cached
+ * extent and then partially store the current file
+ * extent item's range in the cache, for the subrange
+ * going the cached extent's end to the end of the
+ * file extent item.
+ */
+ if (range_end <= cache_end)
+ return 0;
+
+ if (!(flags & (FIEMAP_EXTENT_ENCODED | FIEMAP_EXTENT_DELALLOC)))
+ phys += cache_end - offset;
+
+ offset = cache_end;
+ len = range_end - cache_end;
+ goto emit;
+ }
}
/*
@@ -2517,6 +2605,7 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
return 0;
}
+emit:
/* Not mergeable, need to submit cached one */
ret = fiemap_fill_next_extent(fieinfo, cache->offset, cache->phys,
cache->len, cache->flags);
@@ -2689,16 +2778,34 @@ static int fiemap_process_hole(struct btrfs_inode *inode,
* it beyond i_size.
*/
while (cur_offset < end && cur_offset < i_size) {
+ struct extent_state *cached_state = NULL;
u64 delalloc_start;
u64 delalloc_end;
u64 prealloc_start;
+ u64 lockstart;
+ u64 lockend;
u64 prealloc_len = 0;
bool delalloc;
+ lockstart = round_down(cur_offset, inode->root->fs_info->sectorsize);
+ lockend = round_up(end, inode->root->fs_info->sectorsize);
+
+ /*
+ * We are only locking for the delalloc range because that's the
+ * only thing that can change here. With fiemap we have a lock
+ * on the inode, so no buffered or direct writes can happen.
+ *
+ * However mmaps and normal page writeback will cause this to
+ * change arbitrarily. We have to lock the extent lock here to
+ * make sure that nobody messes with the tree while we're doing
+ * btrfs_find_delalloc_in_range.
+ */
+ lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
delalloc = btrfs_find_delalloc_in_range(inode, cur_offset, end,
delalloc_cached_state,
&delalloc_start,
&delalloc_end);
+ unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
if (!delalloc)
break;
@@ -2866,15 +2973,15 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len)
{
const u64 ino = btrfs_ino(inode);
- struct extent_state *cached_state = NULL;
struct extent_state *delalloc_cached_state = NULL;
struct btrfs_path *path;
struct fiemap_cache cache = { 0 };
struct btrfs_backref_share_check_ctx *backref_ctx;
u64 last_extent_end;
u64 prev_extent_end;
- u64 lockstart;
- u64 lockend;
+ u64 range_start;
+ u64 range_end;
+ const u64 sectorsize = inode->root->fs_info->sectorsize;
bool stopped = false;
int ret;
@@ -2885,22 +2992,19 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
goto out;
}
- lockstart = round_down(start, inode->root->fs_info->sectorsize);
- lockend = round_up(start + len, inode->root->fs_info->sectorsize);
- prev_extent_end = lockstart;
-
- btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
- lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
+ range_start = round_down(start, sectorsize);
+ range_end = round_up(start + len, sectorsize);
+ prev_extent_end = range_start;
ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end);
if (ret < 0)
- goto out_unlock;
+ goto out;
btrfs_release_path(path);
path->reada = READA_FORWARD;
- ret = fiemap_search_slot(inode, path, lockstart);
+ ret = fiemap_search_slot(inode, path, range_start);
if (ret < 0) {
- goto out_unlock;
+ goto out;
} else if (ret > 0) {
/*
* No file extent item found, but we may have delalloc between
@@ -2910,7 +3014,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
goto check_eof_delalloc;
}
- while (prev_extent_end < lockend) {
+ while (prev_extent_end < range_end) {
struct extent_buffer *leaf = path->nodes[0];
struct btrfs_file_extent_item *ei;
struct btrfs_key key;
@@ -2933,21 +3037,21 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
* The first iteration can leave us at an extent item that ends
* before our range's start. Move to the next item.
*/
- if (extent_end <= lockstart)
+ if (extent_end <= range_start)
goto next_item;
backref_ctx->curr_leaf_bytenr = leaf->start;
/* We have in implicit hole (NO_HOLES feature enabled). */
if (prev_extent_end < key.offset) {
- const u64 range_end = min(key.offset, lockend) - 1;
+ const u64 hole_end = min(key.offset, range_end) - 1;
ret = fiemap_process_hole(inode, fieinfo, &cache,
&delalloc_cached_state,
backref_ctx, 0, 0, 0,
- prev_extent_end, range_end);
+ prev_extent_end, hole_end);
if (ret < 0) {
- goto out_unlock;
+ goto out;
} else if (ret > 0) {
/* fiemap_fill_next_extent() told us to stop. */
stopped = true;
@@ -2955,7 +3059,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
}
/* We've reached the end of the fiemap range, stop. */
- if (key.offset >= lockend) {
+ if (key.offset >= range_end) {
stopped = true;
break;
}
@@ -3003,7 +3107,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
extent_gen,
backref_ctx);
if (ret < 0)
- goto out_unlock;
+ goto out;
else if (ret > 0)
flags |= FIEMAP_EXTENT_SHARED;
}
@@ -3014,7 +3118,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
}
if (ret < 0) {
- goto out_unlock;
+ goto out;
} else if (ret > 0) {
/* fiemap_fill_next_extent() told us to stop. */
stopped = true;
@@ -3025,12 +3129,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
next_item:
if (fatal_signal_pending(current)) {
ret = -EINTR;
- goto out_unlock;
+ goto out;
}
ret = fiemap_next_leaf_item(inode, path);
if (ret < 0) {
- goto out_unlock;
+ goto out;
} else if (ret > 0) {
/* No more file extent items for this inode. */
break;
@@ -3049,29 +3153,41 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
btrfs_free_path(path);
path = NULL;
- if (!stopped && prev_extent_end < lockend) {
+ if (!stopped && prev_extent_end < range_end) {
ret = fiemap_process_hole(inode, fieinfo, &cache,
&delalloc_cached_state, backref_ctx,
- 0, 0, 0, prev_extent_end, lockend - 1);
+ 0, 0, 0, prev_extent_end, range_end - 1);
if (ret < 0)
- goto out_unlock;
- prev_extent_end = lockend;
+ goto out;
+ prev_extent_end = range_end;
}
if (cache.cached && cache.offset + cache.len >= last_extent_end) {
const u64 i_size = i_size_read(&inode->vfs_inode);
if (prev_extent_end < i_size) {
+ struct extent_state *cached_state = NULL;
u64 delalloc_start;
u64 delalloc_end;
+ u64 lockstart;
+ u64 lockend;
bool delalloc;
+ lockstart = round_down(prev_extent_end, sectorsize);
+ lockend = round_up(i_size, sectorsize);
+
+ /*
+ * See the comment in fiemap_process_hole as to why
+ * we're doing the locking here.
+ */
+ lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
delalloc = btrfs_find_delalloc_in_range(inode,
prev_extent_end,
i_size - 1,
&delalloc_cached_state,
&delalloc_start,
&delalloc_end);
+ unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
if (!delalloc)
cache.flags |= FIEMAP_EXTENT_LAST;
} else {
@@ -3080,10 +3196,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
}
ret = emit_last_fiemap_cache(fieinfo, &cache);
-
-out_unlock:
- unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
- btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
out:
free_extent_state(delalloc_cached_state);
btrfs_free_backref_share_ctx(backref_ctx);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1eb93d3..4795738 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3184,8 +3184,23 @@ int btrfs_finish_one_ordered(struct btrfs_ordered_extent *ordered_extent)
unwritten_start += logical_len;
clear_extent_uptodate(io_tree, unwritten_start, end, NULL);
- /* Drop extent maps for the part of the extent we didn't write. */
- btrfs_drop_extent_map_range(inode, unwritten_start, end, false);
+ /*
+ * Drop extent maps for the part of the extent we didn't write.
+ *
+ * We have an exception here for the free_space_inode, this is
+ * because when we do btrfs_get_extent() on the free space inode
+ * we will search the commit root. If this is a new block group
+ * we won't find anything, and we will trip over the assert in
+ * writepage where we do ASSERT(em->block_start !=
+ * EXTENT_MAP_HOLE).
+ *
+ * Theoretically we could also skip this for any NOCOW extent as
+ * we don't mess with the extent map tree in the NOCOW case, but
+ * for now simply skip this if we are the free space inode.
+ */
+ if (!btrfs_is_free_space_inode(inode))
+ btrfs_drop_extent_map_range(inode, unwritten_start,
+ end, false);
/*
* If the ordered extent had an IOERR or something else went
@@ -7820,6 +7835,7 @@ struct iomap_dio *btrfs_dio_write(struct kiocb *iocb, struct iov_iter *iter,
static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len)
{
+ struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
int ret;
ret = fiemap_prep(inode, fieinfo, start, &len, 0);
@@ -7845,7 +7861,26 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
return ret;
}
- return extent_fiemap(BTRFS_I(inode), fieinfo, start, len);
+ btrfs_inode_lock(btrfs_inode, BTRFS_ILOCK_SHARED);
+
+ /*
+ * We did an initial flush to avoid holding the inode's lock while
+ * triggering writeback and waiting for the completion of IO and ordered
+ * extents. Now after we locked the inode we do it again, because it's
+ * possible a new write may have happened in between those two steps.
+ */
+ if (fieinfo->fi_flags & FIEMAP_FLAG_SYNC) {
+ ret = btrfs_wait_ordered_range(inode, 0, LLONG_MAX);
+ if (ret) {
+ btrfs_inode_unlock(btrfs_inode, BTRFS_ILOCK_SHARED);
+ return ret;
+ }
+ }
+
+ ret = extent_fiemap(btrfs_inode, fieinfo, start, len);
+ btrfs_inode_unlock(btrfs_inode, BTRFS_ILOCK_SHARED);
+
+ return ret;
}
static int btrfs_writepages(struct address_space *mapping,
@@ -10273,6 +10308,13 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
if (encoded->encryption != BTRFS_ENCODED_IO_ENCRYPTION_NONE)
return -EINVAL;
+ /*
+ * Compressed extents should always have checksums, so error out if we
+ * have a NOCOW file or inode was created while mounted with NODATASUM.
+ */
+ if (inode->flags & BTRFS_INODE_NODATASUM)
+ return -EINVAL;
+
orig_count = iov_iter_count(from);
/* The extent size must be sane. */
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dfed9dd..9876ee2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -721,7 +721,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
free_extent_buffer(leaf);
leaf = NULL;
- new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev);
+ new_root = btrfs_get_new_fs_root(fs_info, objectid, &anon_dev);
if (IS_ERR(new_root)) {
ret = PTR_ERR(new_root);
btrfs_abort_transaction(trans, ret);
@@ -2698,7 +2698,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
struct inode *inode = file_inode(file);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct btrfs_ioctl_vol_args_v2 *vol_args;
- struct bdev_handle *bdev_handle = NULL;
+ struct file *bdev_file = NULL;
int ret;
bool cancel = false;
@@ -2735,7 +2735,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
goto err_drop;
/* Exclusive operation is now claimed */
- ret = btrfs_rm_device(fs_info, &args, &bdev_handle);
+ ret = btrfs_rm_device(fs_info, &args, &bdev_file);
btrfs_exclop_finish(fs_info);
@@ -2749,8 +2749,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
}
err_drop:
mnt_drop_write_file(file);
- if (bdev_handle)
- bdev_release(bdev_handle);
+ if (bdev_file)
+ fput(bdev_file);
out:
btrfs_put_dev_args_from_path(&args);
kfree(vol_args);
@@ -2763,7 +2763,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
struct inode *inode = file_inode(file);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct btrfs_ioctl_vol_args *vol_args;
- struct bdev_handle *bdev_handle = NULL;
+ struct file *bdev_file = NULL;
int ret;
bool cancel = false;
@@ -2790,15 +2790,15 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
cancel);
if (ret == 0) {
- ret = btrfs_rm_device(fs_info, &args, &bdev_handle);
+ ret = btrfs_rm_device(fs_info, &args, &bdev_file);
if (!ret)
btrfs_info(fs_info, "disk deleted %s", vol_args->name);
btrfs_exclop_finish(fs_info);
}
mnt_drop_write_file(file);
- if (bdev_handle)
- bdev_release(bdev_handle);
+ if (bdev_file)
+ fput(bdev_file);
out:
btrfs_put_dev_args_from_path(&args);
kfree(vol_args);
@@ -3815,6 +3815,11 @@ static long btrfs_ioctl_qgroup_create(struct file *file, void __user *arg)
goto out;
}
+ if (sa->create && is_fstree(sa->qgroupid)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
trans = btrfs_join_transaction(root);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 63b426c..5470e1c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1736,6 +1736,15 @@ int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
return ret;
}
+static bool qgroup_has_usage(struct btrfs_qgroup *qgroup)
+{
+ return (qgroup->rfer > 0 || qgroup->rfer_cmpr > 0 ||
+ qgroup->excl > 0 || qgroup->excl_cmpr > 0 ||
+ qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] > 0 ||
+ qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] > 0 ||
+ qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > 0);
+}
+
int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
{
struct btrfs_fs_info *fs_info = trans->fs_info;
@@ -1755,6 +1764,11 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
goto out;
}
+ if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
/* Check if there are no children of this qgroup */
if (!list_empty(&qgroup->members)) {
ret = -EBUSY;
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 2d7519a..e48a063 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6705,11 +6705,20 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
if (ret)
goto out;
}
- if (sctx->cur_inode_last_extent <
- sctx->cur_inode_size) {
- ret = send_hole(sctx, sctx->cur_inode_size);
- if (ret)
+ if (sctx->cur_inode_last_extent < sctx->cur_inode_size) {
+ ret = range_is_hole_in_parent(sctx,
+ sctx->cur_inode_last_extent,
+ sctx->cur_inode_size);
+ if (ret < 0) {
goto out;
+ } else if (ret == 0) {
+ ret = send_hole(sctx, sctx->cur_inode_size);
+ if (ret < 0)
+ goto out;
+ } else {
+ /* Range is already a hole, skip. */
+ ret = 0;
+ }
}
}
if (need_truncate) {
@@ -8111,7 +8120,7 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
}
if (arg->flags & ~BTRFS_SEND_FLAG_MASK) {
- ret = -EINVAL;
+ ret = -EOPNOTSUPP;
goto out;
}
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 571bb13..3b54eb5 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -856,7 +856,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *space_info)
{
- u64 global_rsv_size = fs_info->global_block_rsv.reserved;
+ const u64 global_rsv_size = btrfs_block_rsv_reserved(&fs_info->global_block_rsv);
u64 ordered, delalloc;
u64 thresh;
u64 used;
@@ -956,8 +956,8 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
ordered = percpu_counter_read_positive(&fs_info->ordered_bytes) >> 1;
delalloc = percpu_counter_read_positive(&fs_info->delalloc_bytes);
if (ordered >= delalloc)
- used += fs_info->delayed_refs_rsv.reserved +
- fs_info->delayed_block_rsv.reserved;
+ used += btrfs_block_rsv_reserved(&fs_info->delayed_refs_rsv) +
+ btrfs_block_rsv_reserved(&fs_info->delayed_block_rsv);
else
used += space_info->bytes_may_use - global_rsv_size;
@@ -1173,7 +1173,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
enum btrfs_flush_state flush;
u64 delalloc_size = 0;
u64 to_reclaim, block_rsv_size;
- u64 global_rsv_size = global_rsv->reserved;
+ const u64 global_rsv_size = btrfs_block_rsv_reserved(global_rsv);
loops++;
@@ -1185,9 +1185,9 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
* assume it's tied up in delalloc reservations.
*/
block_rsv_size = global_rsv_size +
- delayed_block_rsv->reserved +
- delayed_refs_rsv->reserved +
- trans_rsv->reserved;
+ btrfs_block_rsv_reserved(delayed_block_rsv) +
+ btrfs_block_rsv_reserved(delayed_refs_rsv) +
+ btrfs_block_rsv_reserved(trans_rsv);
if (block_rsv_size < space_info->bytes_may_use)
delalloc_size = space_info->bytes_may_use - block_rsv_size;
@@ -1207,16 +1207,16 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
to_reclaim = delalloc_size;
flush = FLUSH_DELALLOC;
} else if (space_info->bytes_pinned >
- (delayed_block_rsv->reserved +
- delayed_refs_rsv->reserved)) {
+ (btrfs_block_rsv_reserved(delayed_block_rsv) +
+ btrfs_block_rsv_reserved(delayed_refs_rsv))) {
to_reclaim = space_info->bytes_pinned;
flush = COMMIT_TRANS;
- } else if (delayed_block_rsv->reserved >
- delayed_refs_rsv->reserved) {
- to_reclaim = delayed_block_rsv->reserved;
+ } else if (btrfs_block_rsv_reserved(delayed_block_rsv) >
+ btrfs_block_rsv_reserved(delayed_refs_rsv)) {
+ to_reclaim = btrfs_block_rsv_reserved(delayed_block_rsv);
flush = FLUSH_DELAYED_ITEMS_NR;
} else {
- to_reclaim = delayed_refs_rsv->reserved;
+ to_reclaim = btrfs_block_rsv_reserved(delayed_refs_rsv);
flush = FLUSH_DELAYED_REFS_NR;
}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 5b3333c..bf8e64c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -564,56 +564,22 @@ static int btrfs_reserve_trans_metadata(struct btrfs_fs_info *fs_info,
u64 num_bytes,
u64 *delayed_refs_bytes)
{
- struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
struct btrfs_space_info *si = fs_info->trans_block_rsv.space_info;
- u64 extra_delayed_refs_bytes = 0;
- u64 bytes;
+ u64 bytes = num_bytes + *delayed_refs_bytes;
int ret;
/*
- * If there's a gap between the size of the delayed refs reserve and
- * its reserved space, than some tasks have added delayed refs or bumped
- * its size otherwise (due to block group creation or removal, or block
- * group item update). Also try to allocate that gap in order to prevent
- * using (and possibly abusing) the global reserve when committing the
- * transaction.
- */
- if (flush == BTRFS_RESERVE_FLUSH_ALL &&
- !btrfs_block_rsv_full(delayed_refs_rsv)) {
- spin_lock(&delayed_refs_rsv->lock);
- if (delayed_refs_rsv->size > delayed_refs_rsv->reserved)
- extra_delayed_refs_bytes = delayed_refs_rsv->size -
- delayed_refs_rsv->reserved;
- spin_unlock(&delayed_refs_rsv->lock);
- }
-
- bytes = num_bytes + *delayed_refs_bytes + extra_delayed_refs_bytes;
-
- /*
* We want to reserve all the bytes we may need all at once, so we only
* do 1 enospc flushing cycle per transaction start.
*/
ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
- if (ret == 0) {
- if (extra_delayed_refs_bytes > 0)
- btrfs_migrate_to_delayed_refs_rsv(fs_info,
- extra_delayed_refs_bytes);
- return 0;
- }
-
- if (extra_delayed_refs_bytes > 0) {
- bytes -= extra_delayed_refs_bytes;
- ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
- if (ret == 0)
- return 0;
- }
/*
* If we are an emergency flush, which can steal from the global block
* reserve, then attempt to not reserve space for the delayed refs, as
* we will consume space for them from the global block reserve.
*/
- if (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
+ if (ret && flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
bytes -= *delayed_refs_bytes;
*delayed_refs_bytes = 0;
ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
@@ -1868,7 +1834,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
}
key.offset = (u64)-1;
- pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev);
+ pending->snap = btrfs_get_new_fs_root(fs_info, objectid, &pending->anon_dev);
if (IS_ERR(pending->snap)) {
ret = PTR_ERR(pending->snap);
pending->snap = NULL;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d67785b..e180da4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -468,39 +468,39 @@ static noinline struct btrfs_fs_devices *find_fsid(
static int
btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
- int flush, struct bdev_handle **bdev_handle,
+ int flush, struct file **bdev_file,
struct btrfs_super_block **disk_super)
{
struct block_device *bdev;
int ret;
- *bdev_handle = bdev_open_by_path(device_path, flags, holder, NULL);
+ *bdev_file = bdev_file_open_by_path(device_path, flags, holder, NULL);
- if (IS_ERR(*bdev_handle)) {
- ret = PTR_ERR(*bdev_handle);
+ if (IS_ERR(*bdev_file)) {
+ ret = PTR_ERR(*bdev_file);
goto error;
}
- bdev = (*bdev_handle)->bdev;
+ bdev = file_bdev(*bdev_file);
if (flush)
sync_blockdev(bdev);
ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
if (ret) {
- bdev_release(*bdev_handle);
+ fput(*bdev_file);
goto error;
}
invalidate_bdev(bdev);
*disk_super = btrfs_read_dev_super(bdev);
if (IS_ERR(*disk_super)) {
ret = PTR_ERR(*disk_super);
- bdev_release(*bdev_handle);
+ fput(*bdev_file);
goto error;
}
return 0;
error:
- *bdev_handle = NULL;
+ *bdev_file = NULL;
return ret;
}
@@ -643,7 +643,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
struct btrfs_device *device, blk_mode_t flags,
void *holder)
{
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct btrfs_super_block *disk_super;
u64 devid;
int ret;
@@ -654,7 +654,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
return -EINVAL;
ret = btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
- &bdev_handle, &disk_super);
+ &bdev_file, &disk_super);
if (ret)
return ret;
@@ -678,20 +678,20 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
fs_devices->seeding = true;
} else {
- if (bdev_read_only(bdev_handle->bdev))
+ if (bdev_read_only(file_bdev(bdev_file)))
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
else
set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
}
- if (!bdev_nonrot(bdev_handle->bdev))
+ if (!bdev_nonrot(file_bdev(bdev_file)))
fs_devices->rotating = true;
- if (bdev_max_discard_sectors(bdev_handle->bdev))
+ if (bdev_max_discard_sectors(file_bdev(bdev_file)))
fs_devices->discardable = true;
- device->bdev_handle = bdev_handle;
- device->bdev = bdev_handle->bdev;
+ device->bdev_file = bdev_file;
+ device->bdev = file_bdev(bdev_file);
clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
fs_devices->open_devices++;
@@ -706,7 +706,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
error_free_page:
btrfs_release_disk_super(disk_super);
- bdev_release(bdev_handle);
+ fput(bdev_file);
return -EINVAL;
}
@@ -1015,10 +1015,10 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
if (device->devid == BTRFS_DEV_REPLACE_DEVID)
continue;
- if (device->bdev_handle) {
- bdev_release(device->bdev_handle);
+ if (device->bdev_file) {
+ fput(device->bdev_file);
device->bdev = NULL;
- device->bdev_handle = NULL;
+ device->bdev_file = NULL;
fs_devices->open_devices--;
}
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
@@ -1063,7 +1063,7 @@ static void btrfs_close_bdev(struct btrfs_device *device)
invalidate_bdev(device->bdev);
}
- bdev_release(device->bdev_handle);
+ fput(device->bdev_file);
}
static void btrfs_close_one_device(struct btrfs_device *device)
@@ -1316,7 +1316,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
struct btrfs_super_block *disk_super;
bool new_device_added = false;
struct btrfs_device *device = NULL;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
u64 bytenr, bytenr_orig;
int ret;
@@ -1339,18 +1339,18 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
* values temporarily, as the device paths of the fsid are the only
* required information for assembling the volume.
*/
- bdev_handle = bdev_open_by_path(path, flags, NULL, NULL);
- if (IS_ERR(bdev_handle))
- return ERR_CAST(bdev_handle);
+ bdev_file = bdev_file_open_by_path(path, flags, NULL, NULL);
+ if (IS_ERR(bdev_file))
+ return ERR_CAST(bdev_file);
bytenr_orig = btrfs_sb_offset(0);
- ret = btrfs_sb_log_location_bdev(bdev_handle->bdev, 0, READ, &bytenr);
+ ret = btrfs_sb_log_location_bdev(file_bdev(bdev_file), 0, READ, &bytenr);
if (ret) {
device = ERR_PTR(ret);
goto error_bdev_put;
}
- disk_super = btrfs_read_disk_super(bdev_handle->bdev, bytenr,
+ disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr,
bytenr_orig);
if (IS_ERR(disk_super)) {
device = ERR_CAST(disk_super);
@@ -1381,7 +1381,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
btrfs_release_disk_super(disk_super);
error_bdev_put:
- bdev_release(bdev_handle);
+ fput(bdev_file);
return device;
}
@@ -2057,7 +2057,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
int btrfs_rm_device(struct btrfs_fs_info *fs_info,
struct btrfs_dev_lookup_args *args,
- struct bdev_handle **bdev_handle)
+ struct file **bdev_file)
{
struct btrfs_trans_handle *trans;
struct btrfs_device *device;
@@ -2166,7 +2166,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
btrfs_assign_next_active_device(device, NULL);
- if (device->bdev_handle) {
+ if (device->bdev_file) {
cur_devices->open_devices--;
/* remove sysfs entry */
btrfs_sysfs_remove_device(device);
@@ -2182,9 +2182,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
* free the device.
*
* We cannot call btrfs_close_bdev() here because we're holding the sb
- * write lock, and bdev_release() will pull in the ->open_mutex on
- * the block device and it's dependencies. Instead just flush the
- * device and let the caller do the final bdev_release.
+ * write lock, and fput() on the block device will pull in the
+ * ->open_mutex on the block device and it's dependencies. Instead
+ * just flush the device and let the caller do the final bdev_release.
*/
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
btrfs_scratch_superblocks(fs_info, device->bdev,
@@ -2195,7 +2195,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
}
}
- *bdev_handle = device->bdev_handle;
+ *bdev_file = device->bdev_file;
synchronize_rcu();
btrfs_free_device(device);
@@ -2332,7 +2332,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
const char *path)
{
struct btrfs_super_block *disk_super;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
int ret;
if (!path || !path[0])
@@ -2350,7 +2350,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
}
ret = btrfs_get_bdev_and_sb(path, BLK_OPEN_READ, NULL, 0,
- &bdev_handle, &disk_super);
+ &bdev_file, &disk_super);
if (ret) {
btrfs_put_dev_args_from_path(args);
return ret;
@@ -2363,7 +2363,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
else
memcpy(args->fsid, disk_super->fsid, BTRFS_FSID_SIZE);
btrfs_release_disk_super(disk_super);
- bdev_release(bdev_handle);
+ fput(bdev_file);
return 0;
}
@@ -2583,7 +2583,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
struct btrfs_root *root = fs_info->dev_root;
struct btrfs_trans_handle *trans;
struct btrfs_device *device;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct super_block *sb = fs_info->sb;
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_fs_devices *seed_devices = NULL;
@@ -2596,12 +2596,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
if (sb_rdonly(sb) && !fs_devices->seeding)
return -EROFS;
- bdev_handle = bdev_open_by_path(device_path, BLK_OPEN_WRITE,
+ bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
fs_info->bdev_holder, NULL);
- if (IS_ERR(bdev_handle))
- return PTR_ERR(bdev_handle);
+ if (IS_ERR(bdev_file))
+ return PTR_ERR(bdev_file);
- if (!btrfs_check_device_zone_type(fs_info, bdev_handle->bdev)) {
+ if (!btrfs_check_device_zone_type(fs_info, file_bdev(bdev_file))) {
ret = -EINVAL;
goto error;
}
@@ -2613,11 +2613,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
locked = true;
}
- sync_blockdev(bdev_handle->bdev);
+ sync_blockdev(file_bdev(bdev_file));
rcu_read_lock();
list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
- if (device->bdev == bdev_handle->bdev) {
+ if (device->bdev == file_bdev(bdev_file)) {
ret = -EEXIST;
rcu_read_unlock();
goto error;
@@ -2633,8 +2633,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
}
device->fs_info = fs_info;
- device->bdev_handle = bdev_handle;
- device->bdev = bdev_handle->bdev;
+ device->bdev_file = bdev_file;
+ device->bdev = file_bdev(bdev_file);
ret = lookup_bdev(device_path, &device->devt);
if (ret)
goto error_free_device;
@@ -2817,7 +2817,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
error_free_device:
btrfs_free_device(device);
error:
- bdev_release(bdev_handle);
+ fput(bdev_file);
if (locked) {
mutex_unlock(&uuid_mutex);
up_write(&sb->s_umount);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 53f87f3..a118549 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -90,7 +90,7 @@ struct btrfs_device {
u64 generation;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct block_device *bdev;
struct btrfs_zoned_device_info *zone_info;
@@ -661,7 +661,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
void btrfs_put_dev_args_from_path(struct btrfs_dev_lookup_args *args);
int btrfs_rm_device(struct btrfs_fs_info *fs_info,
struct btrfs_dev_lookup_args *args,
- struct bdev_handle **bdev_handle);
+ struct file **bdev_file);
void __exit btrfs_cleanup_fs_uuids(void);
int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len);
int btrfs_grow_device(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index cf2e779..aea51fd 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1652,6 +1652,15 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
}
out:
+ /* Reject non SINGLE data profiles without RST */
+ if ((map->type & BTRFS_BLOCK_GROUP_DATA) &&
+ (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) &&
+ !fs_info->stripe_root) {
+ btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
+ btrfs_bg_type_to_raid_name(map->type));
+ return -EINVAL;
+ }
+
if (cache->alloc_offset > cache->zone_capacity) {
btrfs_err(fs_info,
"zoned: invalid write pointer %llu (larger than zone capacity %llu) in block group %llu",
@@ -1683,6 +1692,7 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
}
bitmap_free(active);
kfree(zone_info);
+ btrfs_free_chunk_map(map);
return ret;
}
diff --git a/fs/buffer.c b/fs/buffer.c
index d3bcf60..4f73d23 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -55,7 +55,7 @@
static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
- struct writeback_control *wbc);
+ enum rw_hint hint, struct writeback_control *wbc);
#define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
@@ -464,7 +464,7 @@ EXPORT_SYMBOL(mark_buffer_async_write);
* a successful fsync(). For example, ext2 indirect blocks need to be
* written back and waited upon before fsync() returns.
*
- * The functions mark_buffer_inode_dirty(), fsync_inode_buffers(),
+ * The functions mark_buffer_dirty_inode(), fsync_inode_buffers(),
* inode_has_buffers() and invalidate_inode_buffers() are provided for the
* management of a list of dependent buffers at ->i_mapping->i_private_list.
*
@@ -1889,7 +1889,8 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
do {
struct buffer_head *next = bh->b_this_page;
if (buffer_async_write(bh)) {
- submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+ submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+ inode->i_write_hint, wbc);
nr_underway++;
}
bh = next;
@@ -1944,7 +1945,8 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
struct buffer_head *next = bh->b_this_page;
if (buffer_async_write(bh)) {
clear_buffer_dirty(bh);
- submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+ submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+ inode->i_write_hint, wbc);
nr_underway++;
}
bh = next;
@@ -2756,6 +2758,7 @@ static void end_bio_bh_io_sync(struct bio *bio)
}
static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
+ enum rw_hint write_hint,
struct writeback_control *wbc)
{
const enum req_op op = opf & REQ_OP_MASK;
@@ -2783,6 +2786,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+ bio->bi_write_hint = write_hint;
__bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
@@ -2802,7 +2806,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
void submit_bh(blk_opf_t opf, struct buffer_head *bh)
{
- submit_bh_wbc(opf, bh, NULL);
+ submit_bh_wbc(opf, bh, WRITE_LIFE_NOT_SET, NULL);
}
EXPORT_SYMBOL(submit_bh);
@@ -3121,12 +3125,8 @@ void __init buffer_init(void)
unsigned long nrpages;
int ret;
- bh_cachep = kmem_cache_create("buffer_head",
- sizeof(struct buffer_head), 0,
- (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
- SLAB_MEM_SPREAD),
- NULL);
-
+ bh_cachep = KMEM_CACHE(buffer_head,
+ SLAB_RECLAIM_ACCOUNT|SLAB_PANIC);
/*
* Limit the bh occupancy to 10% of ZONE_NORMAL
*/
diff --git a/fs/cachefiles/cache.c b/fs/cachefiles/cache.c
index 7077f72..f449f73 100644
--- a/fs/cachefiles/cache.c
+++ b/fs/cachefiles/cache.c
@@ -168,6 +168,8 @@ int cachefiles_add_cache(struct cachefiles_cache *cache)
dput(root);
error_open_root:
cachefiles_end_secure(cache, saved_cred);
+ put_cred(cache->cache_cred);
+ cache->cache_cred = NULL;
error_getsec:
fscache_relinquish_cache(cache_cookie);
cache->cache = NULL;
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 3f24905..6465e25 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -816,6 +816,7 @@ static void cachefiles_daemon_unbind(struct cachefiles_cache *cache)
cachefiles_put_directory(cache->graveyard);
cachefiles_put_directory(cache->store);
mntput(cache->mnt);
+ put_cred(cache->cache_cred);
kfree(cache->rootdirname);
kfree(cache->secctx);
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 9c02f32..7fb4aae 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1452,7 +1452,7 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap,
if (flushing & CEPH_CAP_XATTR_EXCL) {
arg->old_xattr_buf = __ceph_build_xattrs_blob(ci);
arg->xattr_version = ci->i_xattrs.version;
- arg->xattr_buf = ci->i_xattrs.blob;
+ arg->xattr_buf = ceph_buffer_get(ci->i_xattrs.blob);
} else {
arg->xattr_buf = NULL;
arg->old_xattr_buf = NULL;
@@ -1553,6 +1553,7 @@ static void __send_cap(struct cap_msg_args *arg, struct ceph_inode_info *ci)
encode_cap_msg(msg, arg);
ceph_con_send(&arg->session->s_con, msg);
ceph_buffer_put(arg->old_xattr_buf);
+ ceph_buffer_put(arg->xattr_buf);
if (arg->wake)
wake_up_all(&ci->i_cap_wq);
}
@@ -2155,6 +2156,30 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags)
ceph_cap_string(cap->implemented),
ceph_cap_string(revoking));
+ /* completed revocation? going down and there are no caps? */
+ if (revoking) {
+ if ((revoking & cap_used) == 0) {
+ doutc(cl, "completed revocation of %s\n",
+ ceph_cap_string(cap->implemented & ~cap->issued));
+ goto ack;
+ }
+
+ /*
+ * If the "i_wrbuffer_ref" was increased by mmap or generic
+ * cache write just before the ceph_check_caps() is called,
+ * the Fb capability revoking will fail this time. Then we
+ * must wait for the BDI's delayed work to flush the dirty
+ * pages and to release the "i_wrbuffer_ref", which will cost
+ * at most 5 seconds. That means the MDS needs to wait at
+ * most 5 seconds to finished the Fb capability's revocation.
+ *
+ * Let's queue a writeback for it.
+ */
+ if (S_ISREG(inode->i_mode) && ci->i_wrbuffer_ref &&
+ (revoking & CEPH_CAP_FILE_BUFFER))
+ queue_writeback = true;
+ }
+
if (cap == ci->i_auth_cap &&
(cap->issued & CEPH_CAP_FILE_WR)) {
/* request larger max_size from MDS? */
@@ -2182,30 +2207,6 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags)
}
}
- /* completed revocation? going down and there are no caps? */
- if (revoking) {
- if ((revoking & cap_used) == 0) {
- doutc(cl, "completed revocation of %s\n",
- ceph_cap_string(cap->implemented & ~cap->issued));
- goto ack;
- }
-
- /*
- * If the "i_wrbuffer_ref" was increased by mmap or generic
- * cache write just before the ceph_check_caps() is called,
- * the Fb capability revoking will fail this time. Then we
- * must wait for the BDI's delayed work to flush the dirty
- * pages and to release the "i_wrbuffer_ref", which will cost
- * at most 5 seconds. That means the MDS needs to wait at
- * most 5 seconds to finished the Fb capability's revocation.
- *
- * Let's queue a writeback for it.
- */
- if (S_ISREG(inode->i_mode) && ci->i_wrbuffer_ref &&
- (revoking & CEPH_CAP_FILE_BUFFER))
- queue_writeback = true;
- }
-
/* want more caps from mds? */
if (want & ~cap->mds_wanted) {
if (want & ~(cap->mds_wanted | cap->issued))
@@ -3215,7 +3216,6 @@ static int ceph_try_drop_cap_snap(struct ceph_inode_info *ci,
enum put_cap_refs_mode {
PUT_CAP_REFS_SYNC = 0,
- PUT_CAP_REFS_NO_CHECK,
PUT_CAP_REFS_ASYNC,
};
@@ -3331,11 +3331,6 @@ void ceph_put_cap_refs_async(struct ceph_inode_info *ci, int had)
__ceph_put_cap_refs(ci, had, PUT_CAP_REFS_ASYNC);
}
-void ceph_put_cap_refs_no_check_caps(struct ceph_inode_info *ci, int had)
-{
- __ceph_put_cap_refs(ci, had, PUT_CAP_REFS_NO_CHECK);
-}
-
/*
* Release @nr WRBUFFER refs on dirty pages for the given @snapc snap
* context. Adjust per-snap dirty page accounting as appropriate.
@@ -4777,7 +4772,22 @@ int ceph_drop_caps_for_unlink(struct inode *inode)
if (__ceph_caps_dirty(ci)) {
struct ceph_mds_client *mdsc =
ceph_inode_to_fs_client(inode)->mdsc;
- __cap_delay_requeue_front(mdsc, ci);
+
+ doutc(mdsc->fsc->client, "%p %llx.%llx\n", inode,
+ ceph_vinop(inode));
+ spin_lock(&mdsc->cap_unlink_delay_lock);
+ ci->i_ceph_flags |= CEPH_I_FLUSH;
+ if (!list_empty(&ci->i_cap_delay_list))
+ list_del_init(&ci->i_cap_delay_list);
+ list_add_tail(&ci->i_cap_delay_list,
+ &mdsc->cap_unlink_delay_list);
+ spin_unlock(&mdsc->cap_unlink_delay_lock);
+
+ /*
+ * Fire the work immediately, because the MDS maybe
+ * waiting for caps release.
+ */
+ ceph_queue_cap_unlink_work(mdsc);
}
}
spin_unlock(&ci->i_ceph_lock);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 0c25d32..7b2e775 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -78,6 +78,8 @@ struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
if (!inode)
return ERR_PTR(-ENOMEM);
+ inode->i_blkbits = CEPH_FSCRYPT_BLOCK_SHIFT;
+
if (!S_ISLNK(*mode)) {
err = ceph_pre_init_acls(dir, mode, as_ctx);
if (err < 0)
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index e07ad29..ebf4ac0 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -33,7 +33,7 @@ void __init ceph_flock_init(void)
static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
- struct inode *inode = file_inode(dst->fl_file);
+ struct inode *inode = file_inode(dst->c.flc_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
dst->fl_u.ceph.inode = igrab(inode);
}
@@ -110,17 +110,18 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct inode *inode,
else
length = fl->fl_end - fl->fl_start + 1;
- owner = secure_addr(fl->fl_owner);
+ owner = secure_addr(fl->c.flc_owner);
doutc(cl, "rule: %d, op: %d, owner: %llx, pid: %llu, "
"start: %llu, length: %llu, wait: %d, type: %d\n",
- (int)lock_type, (int)operation, owner, (u64)fl->fl_pid,
- fl->fl_start, length, wait, fl->fl_type);
+ (int)lock_type, (int)operation, owner,
+ (u64) fl->c.flc_pid,
+ fl->fl_start, length, wait, fl->c.flc_type);
req->r_args.filelock_change.rule = lock_type;
req->r_args.filelock_change.type = cmd;
req->r_args.filelock_change.owner = cpu_to_le64(owner);
- req->r_args.filelock_change.pid = cpu_to_le64((u64)fl->fl_pid);
+ req->r_args.filelock_change.pid = cpu_to_le64((u64) fl->c.flc_pid);
req->r_args.filelock_change.start = cpu_to_le64(fl->fl_start);
req->r_args.filelock_change.length = cpu_to_le64(length);
req->r_args.filelock_change.wait = wait;
@@ -130,13 +131,13 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct inode *inode,
err = ceph_mdsc_wait_request(mdsc, req, wait ?
ceph_lock_wait_for_completion : NULL);
if (!err && operation == CEPH_MDS_OP_GETFILELOCK) {
- fl->fl_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid);
+ fl->c.flc_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid);
if (CEPH_LOCK_SHARED == req->r_reply_info.filelock_reply->type)
- fl->fl_type = F_RDLCK;
+ fl->c.flc_type = F_RDLCK;
else if (CEPH_LOCK_EXCL == req->r_reply_info.filelock_reply->type)
- fl->fl_type = F_WRLCK;
+ fl->c.flc_type = F_WRLCK;
else
- fl->fl_type = F_UNLCK;
+ fl->c.flc_type = F_UNLCK;
fl->fl_start = le64_to_cpu(req->r_reply_info.filelock_reply->start);
length = le64_to_cpu(req->r_reply_info.filelock_reply->start) +
@@ -150,8 +151,8 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct inode *inode,
ceph_mdsc_put_request(req);
doutc(cl, "rule: %d, op: %d, pid: %llu, start: %llu, "
"length: %llu, wait: %d, type: %d, err code %d\n",
- (int)lock_type, (int)operation, (u64)fl->fl_pid,
- fl->fl_start, length, wait, fl->fl_type, err);
+ (int)lock_type, (int)operation, (u64) fl->c.flc_pid,
+ fl->fl_start, length, wait, fl->c.flc_type, err);
return err;
}
@@ -227,10 +228,10 @@ static int ceph_lock_wait_for_completion(struct ceph_mds_client *mdsc,
static int try_unlock_file(struct file *file, struct file_lock *fl)
{
int err;
- unsigned int orig_flags = fl->fl_flags;
- fl->fl_flags |= FL_EXISTS;
+ unsigned int orig_flags = fl->c.flc_flags;
+ fl->c.flc_flags |= FL_EXISTS;
err = locks_lock_file_wait(file, fl);
- fl->fl_flags = orig_flags;
+ fl->c.flc_flags = orig_flags;
if (err == -ENOENT) {
if (!(orig_flags & FL_EXISTS))
err = 0;
@@ -253,13 +254,13 @@ int ceph_lock(struct file *file, int cmd, struct file_lock *fl)
u8 wait = 0;
u8 lock_cmd;
- if (!(fl->fl_flags & FL_POSIX))
+ if (!(fl->c.flc_flags & FL_POSIX))
return -ENOLCK;
if (ceph_inode_is_shutdown(inode))
return -ESTALE;
- doutc(cl, "fl_owner: %p\n", fl->fl_owner);
+ doutc(cl, "fl_owner: %p\n", fl->c.flc_owner);
/* set wait bit as appropriate, then make command as Ceph expects it*/
if (IS_GETLK(cmd))
@@ -273,19 +274,19 @@ int ceph_lock(struct file *file, int cmd, struct file_lock *fl)
}
spin_unlock(&ci->i_ceph_lock);
if (err < 0) {
- if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK == fl->fl_type)
+ if (op == CEPH_MDS_OP_SETFILELOCK && lock_is_unlock(fl))
posix_lock_file(file, fl, NULL);
return err;
}
- if (F_RDLCK == fl->fl_type)
+ if (lock_is_read(fl))
lock_cmd = CEPH_LOCK_SHARED;
- else if (F_WRLCK == fl->fl_type)
+ else if (lock_is_write(fl))
lock_cmd = CEPH_LOCK_EXCL;
else
lock_cmd = CEPH_LOCK_UNLOCK;
- if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK == fl->fl_type) {
+ if (op == CEPH_MDS_OP_SETFILELOCK && lock_is_unlock(fl)) {
err = try_unlock_file(file, fl);
if (err <= 0)
return err;
@@ -293,7 +294,7 @@ int ceph_lock(struct file *file, int cmd, struct file_lock *fl)
err = ceph_lock_message(CEPH_LOCK_FCNTL, op, inode, lock_cmd, wait, fl);
if (!err) {
- if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK != fl->fl_type) {
+ if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK != fl->c.flc_type) {
doutc(cl, "locking locally\n");
err = posix_lock_file(file, fl, NULL);
if (err) {
@@ -319,13 +320,13 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
u8 wait = 0;
u8 lock_cmd;
- if (!(fl->fl_flags & FL_FLOCK))
+ if (!(fl->c.flc_flags & FL_FLOCK))
return -ENOLCK;
if (ceph_inode_is_shutdown(inode))
return -ESTALE;
- doutc(cl, "fl_file: %p\n", fl->fl_file);
+ doutc(cl, "fl_file: %p\n", fl->c.flc_file);
spin_lock(&ci->i_ceph_lock);
if (ci->i_ceph_flags & CEPH_I_ERROR_FILELOCK) {
@@ -333,7 +334,7 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
}
spin_unlock(&ci->i_ceph_lock);
if (err < 0) {
- if (F_UNLCK == fl->fl_type)
+ if (lock_is_unlock(fl))
locks_lock_file_wait(file, fl);
return err;
}
@@ -341,14 +342,14 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
if (IS_SETLKW(cmd))
wait = 1;
- if (F_RDLCK == fl->fl_type)
+ if (lock_is_read(fl))
lock_cmd = CEPH_LOCK_SHARED;
- else if (F_WRLCK == fl->fl_type)
+ else if (lock_is_write(fl))
lock_cmd = CEPH_LOCK_EXCL;
else
lock_cmd = CEPH_LOCK_UNLOCK;
- if (F_UNLCK == fl->fl_type) {
+ if (lock_is_unlock(fl)) {
err = try_unlock_file(file, fl);
if (err <= 0)
return err;
@@ -356,7 +357,7 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
err = ceph_lock_message(CEPH_LOCK_FLOCK, CEPH_MDS_OP_SETFILELOCK,
inode, lock_cmd, wait, fl);
- if (!err && F_UNLCK != fl->fl_type) {
+ if (!err && F_UNLCK != fl->c.flc_type) {
err = locks_lock_file_wait(file, fl);
if (err) {
ceph_lock_message(CEPH_LOCK_FLOCK,
@@ -385,9 +386,9 @@ void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count)
ctx = locks_inode_context(inode);
if (ctx) {
spin_lock(&ctx->flc_lock);
- list_for_each_entry(lock, &ctx->flc_posix, fl_list)
+ for_each_file_lock(lock, &ctx->flc_posix)
++(*fcntl_count);
- list_for_each_entry(lock, &ctx->flc_flock, fl_list)
+ for_each_file_lock(lock, &ctx->flc_flock)
++(*flock_count);
spin_unlock(&ctx->flc_lock);
}
@@ -408,10 +409,10 @@ static int lock_to_ceph_filelock(struct inode *inode,
cephlock->start = cpu_to_le64(lock->fl_start);
cephlock->length = cpu_to_le64(lock->fl_end - lock->fl_start + 1);
cephlock->client = cpu_to_le64(0);
- cephlock->pid = cpu_to_le64((u64)lock->fl_pid);
- cephlock->owner = cpu_to_le64(secure_addr(lock->fl_owner));
+ cephlock->pid = cpu_to_le64((u64) lock->c.flc_pid);
+ cephlock->owner = cpu_to_le64(secure_addr(lock->c.flc_owner));
- switch (lock->fl_type) {
+ switch (lock->c.flc_type) {
case F_RDLCK:
cephlock->type = CEPH_LOCK_SHARED;
break;
@@ -422,7 +423,8 @@ static int lock_to_ceph_filelock(struct inode *inode,
cephlock->type = CEPH_LOCK_UNLOCK;
break;
default:
- doutc(cl, "Have unknown lock type %d\n", lock->fl_type);
+ doutc(cl, "Have unknown lock type %d\n",
+ lock->c.flc_type);
err = -EINVAL;
}
@@ -453,7 +455,7 @@ int ceph_encode_locks_to_buffer(struct inode *inode,
return 0;
spin_lock(&ctx->flc_lock);
- list_for_each_entry(lock, &ctx->flc_posix, fl_list) {
+ for_each_file_lock(lock, &ctx->flc_posix) {
++seen_fcntl;
if (seen_fcntl > num_fcntl_locks) {
err = -ENOSPC;
@@ -464,7 +466,7 @@ int ceph_encode_locks_to_buffer(struct inode *inode,
goto fail;
++l;
}
- list_for_each_entry(lock, &ctx->flc_flock, fl_list) {
+ for_each_file_lock(lock, &ctx->flc_flock) {
++seen_flock;
if (seen_flock > num_flock_locks) {
err = -ENOSPC;
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 548d1de..3ab9c26 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1089,7 +1089,7 @@ void ceph_mdsc_release_request(struct kref *kref)
struct ceph_mds_request *req = container_of(kref,
struct ceph_mds_request,
r_kref);
- ceph_mdsc_release_dir_caps_no_check(req);
+ ceph_mdsc_release_dir_caps_async(req);
destroy_reply_info(&req->r_reply_info);
if (req->r_request)
ceph_msg_put(req->r_request);
@@ -2484,6 +2484,50 @@ void ceph_reclaim_caps_nr(struct ceph_mds_client *mdsc, int nr)
}
}
+void ceph_queue_cap_unlink_work(struct ceph_mds_client *mdsc)
+{
+ struct ceph_client *cl = mdsc->fsc->client;
+ if (mdsc->stopping)
+ return;
+
+ if (queue_work(mdsc->fsc->cap_wq, &mdsc->cap_unlink_work)) {
+ doutc(cl, "caps unlink work queued\n");
+ } else {
+ doutc(cl, "failed to queue caps unlink work\n");
+ }
+}
+
+static void ceph_cap_unlink_work(struct work_struct *work)
+{
+ struct ceph_mds_client *mdsc =
+ container_of(work, struct ceph_mds_client, cap_unlink_work);
+ struct ceph_client *cl = mdsc->fsc->client;
+
+ doutc(cl, "begin\n");
+ spin_lock(&mdsc->cap_unlink_delay_lock);
+ while (!list_empty(&mdsc->cap_unlink_delay_list)) {
+ struct ceph_inode_info *ci;
+ struct inode *inode;
+
+ ci = list_first_entry(&mdsc->cap_unlink_delay_list,
+ struct ceph_inode_info,
+ i_cap_delay_list);
+ list_del_init(&ci->i_cap_delay_list);
+
+ inode = igrab(&ci->netfs.inode);
+ if (inode) {
+ spin_unlock(&mdsc->cap_unlink_delay_lock);
+ doutc(cl, "on %p %llx.%llx\n", inode,
+ ceph_vinop(inode));
+ ceph_check_caps(ci, CHECK_CAPS_FLUSH);
+ iput(inode);
+ spin_lock(&mdsc->cap_unlink_delay_lock);
+ }
+ }
+ spin_unlock(&mdsc->cap_unlink_delay_lock);
+ doutc(cl, "done\n");
+}
+
/*
* requests
*/
@@ -4261,7 +4305,7 @@ void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req)
}
}
-void ceph_mdsc_release_dir_caps_no_check(struct ceph_mds_request *req)
+void ceph_mdsc_release_dir_caps_async(struct ceph_mds_request *req)
{
struct ceph_client *cl = req->r_mdsc->fsc->client;
int dcaps;
@@ -4269,8 +4313,7 @@ void ceph_mdsc_release_dir_caps_no_check(struct ceph_mds_request *req)
dcaps = xchg(&req->r_dir_caps, 0);
if (dcaps) {
doutc(cl, "releasing r_dir_caps=%s\n", ceph_cap_string(dcaps));
- ceph_put_cap_refs_no_check_caps(ceph_inode(req->r_parent),
- dcaps);
+ ceph_put_cap_refs_async(ceph_inode(req->r_parent), dcaps);
}
}
@@ -4306,7 +4349,7 @@ static void replay_unsafe_requests(struct ceph_mds_client *mdsc,
if (req->r_session->s_mds != session->s_mds)
continue;
- ceph_mdsc_release_dir_caps_no_check(req);
+ ceph_mdsc_release_dir_caps_async(req);
__send_request(session, req, true);
}
@@ -5360,6 +5403,8 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
INIT_LIST_HEAD(&mdsc->cap_delay_list);
INIT_LIST_HEAD(&mdsc->cap_wait_list);
spin_lock_init(&mdsc->cap_delay_lock);
+ INIT_LIST_HEAD(&mdsc->cap_unlink_delay_list);
+ spin_lock_init(&mdsc->cap_unlink_delay_lock);
INIT_LIST_HEAD(&mdsc->snap_flush_list);
spin_lock_init(&mdsc->snap_flush_lock);
mdsc->last_cap_flush_tid = 1;
@@ -5368,6 +5413,7 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
spin_lock_init(&mdsc->cap_dirty_lock);
init_waitqueue_head(&mdsc->cap_flushing_wq);
INIT_WORK(&mdsc->cap_reclaim_work, ceph_cap_reclaim_work);
+ INIT_WORK(&mdsc->cap_unlink_work, ceph_cap_unlink_work);
err = ceph_metric_init(&mdsc->metric);
if (err)
goto err_mdsmap;
@@ -5641,6 +5687,7 @@ void ceph_mdsc_close_sessions(struct ceph_mds_client *mdsc)
ceph_cleanup_global_and_empty_realms(mdsc);
cancel_work_sync(&mdsc->cap_reclaim_work);
+ cancel_work_sync(&mdsc->cap_unlink_work);
cancel_delayed_work_sync(&mdsc->delayed_work); /* cancel timer */
doutc(cl, "done\n");
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 2e6ddaa..03f8ff0 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -462,6 +462,8 @@ struct ceph_mds_client {
unsigned long last_renew_caps; /* last time we renewed our caps */
struct list_head cap_delay_list; /* caps with delayed release */
spinlock_t cap_delay_lock; /* protects cap_delay_list */
+ struct list_head cap_unlink_delay_list; /* caps with delayed release for unlink */
+ spinlock_t cap_unlink_delay_lock; /* protects cap_unlink_delay_list */
struct list_head snap_flush_list; /* cap_snaps ready to flush */
spinlock_t snap_flush_lock;
@@ -475,6 +477,8 @@ struct ceph_mds_client {
struct work_struct cap_reclaim_work;
atomic_t cap_reclaim_pending;
+ struct work_struct cap_unlink_work;
+
/*
* Cap reservations
*
@@ -552,7 +556,7 @@ extern int ceph_mdsc_do_request(struct ceph_mds_client *mdsc,
struct inode *dir,
struct ceph_mds_request *req);
extern void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req);
-extern void ceph_mdsc_release_dir_caps_no_check(struct ceph_mds_request *req);
+extern void ceph_mdsc_release_dir_caps_async(struct ceph_mds_request *req);
static inline void ceph_mdsc_get_request(struct ceph_mds_request *req)
{
kref_get(&req->r_kref);
@@ -574,6 +578,7 @@ extern void ceph_flush_cap_releases(struct ceph_mds_client *mdsc,
struct ceph_mds_session *session);
extern void ceph_queue_cap_reclaim_work(struct ceph_mds_client *mdsc);
extern void ceph_reclaim_caps_nr(struct ceph_mds_client *mdsc, int nr);
+extern void ceph_queue_cap_unlink_work(struct ceph_mds_client *mdsc);
extern int ceph_iterate_session_caps(struct ceph_mds_session *session,
int (*cb)(struct inode *, int mds, void *),
void *arg);
diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
index fae97c2..8109aba 100644
--- a/fs/ceph/mdsmap.c
+++ b/fs/ceph/mdsmap.c
@@ -380,10 +380,11 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p,
ceph_decode_skip_8(p, end, bad_ext);
/* required_client_features */
ceph_decode_skip_set(p, end, 64, bad_ext);
+ /* bal_rank_mask */
+ ceph_decode_skip_string(p, end, bad_ext);
+ }
+ if (mdsmap_ev >= 18) {
ceph_decode_64_safe(p, end, m->m_max_xattr_size, bad_ext);
- } else {
- /* This forces the usage of the (sync) SETXATTR Op */
- m->m_max_xattr_size = 0;
}
bad_ext:
doutc(cl, "m_enabled: %d, m_damaged: %d, m_num_laggy: %d\n",
diff --git a/fs/ceph/mdsmap.h b/fs/ceph/mdsmap.h
index 89f1931..1f2171d 100644
--- a/fs/ceph/mdsmap.h
+++ b/fs/ceph/mdsmap.h
@@ -27,7 +27,11 @@ struct ceph_mdsmap {
u32 m_session_timeout; /* seconds */
u32 m_session_autoclose; /* seconds */
u64 m_max_file_size;
- u64 m_max_xattr_size; /* maximum size for xattrs blob */
+ /*
+ * maximum size for xattrs blob.
+ * Zeroed by default to force the usage of the (sync) SETXATTR Op.
+ */
+ u64 m_max_xattr_size;
u32 m_max_mds; /* expected up:active mds number */
u32 m_num_active_mds; /* actual up:active mds number */
u32 possible_max_rank; /* possible max rank index */
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index b06e2bc..b63b4cd 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1255,8 +1255,6 @@ extern void ceph_take_cap_refs(struct ceph_inode_info *ci, int caps,
extern void ceph_get_cap_refs(struct ceph_inode_info *ci, int caps);
extern void ceph_put_cap_refs(struct ceph_inode_info *ci, int had);
extern void ceph_put_cap_refs_async(struct ceph_inode_info *ci, int had);
-extern void ceph_put_cap_refs_no_check_caps(struct ceph_inode_info *ci,
- int had);
extern void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
struct ceph_snap_context *snapc);
extern void __ceph_remove_capsnap(struct inode *inode,
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index 0c7c252..a50356c 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -24,6 +24,8 @@
#include <linux/pid_namespace.h>
#include <linux/uaccess.h>
#include <linux/fs.h>
+#include <linux/fs_context.h>
+#include <linux/fs_parser.h>
#include <linux/vmalloc.h>
#include <linux/coda.h>
@@ -87,10 +89,10 @@ void coda_destroy_inodecache(void)
kmem_cache_destroy(coda_inode_cachep);
}
-static int coda_remount(struct super_block *sb, int *flags, char *data)
+static int coda_reconfigure(struct fs_context *fc)
{
- sync_filesystem(sb);
- *flags |= SB_NOATIME;
+ sync_filesystem(fc->root->d_sb);
+ fc->sb_flags |= SB_NOATIME;
return 0;
}
@@ -102,78 +104,102 @@ static const struct super_operations coda_super_operations =
.evict_inode = coda_evict_inode,
.put_super = coda_put_super,
.statfs = coda_statfs,
- .remount_fs = coda_remount,
};
-static int get_device_index(struct coda_mount_data *data)
+struct coda_fs_context {
+ int idx;
+};
+
+enum {
+ Opt_fd,
+};
+
+static const struct fs_parameter_spec coda_param_specs[] = {
+ fsparam_fd ("fd", Opt_fd),
+ {}
+};
+
+static int coda_parse_fd(struct fs_context *fc, int fd)
{
+ struct coda_fs_context *ctx = fc->fs_private;
struct fd f;
struct inode *inode;
int idx;
- if (data == NULL) {
- pr_warn("%s: Bad mount data\n", __func__);
- return -1;
- }
-
- if (data->version != CODA_MOUNT_VERSION) {
- pr_warn("%s: Bad mount version\n", __func__);
- return -1;
- }
-
- f = fdget(data->fd);
+ f = fdget(fd);
if (!f.file)
- goto Ebadf;
+ return -EBADF;
inode = file_inode(f.file);
if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR) {
fdput(f);
- goto Ebadf;
+ return invalf(fc, "code: Not coda psdev");
}
idx = iminor(inode);
fdput(f);
- if (idx < 0 || idx >= MAX_CODADEVS) {
- pr_warn("%s: Bad minor number\n", __func__);
- return -1;
- }
-
- return idx;
-Ebadf:
- pr_warn("%s: Bad file\n", __func__);
- return -1;
+ if (idx < 0 || idx >= MAX_CODADEVS)
+ return invalf(fc, "coda: Bad minor number");
+ ctx->idx = idx;
+ return 0;
}
-static int coda_fill_super(struct super_block *sb, void *data, int silent)
+static int coda_parse_param(struct fs_context *fc, struct fs_parameter *param)
{
+ struct fs_parse_result result;
+ int opt;
+
+ opt = fs_parse(fc, coda_param_specs, param, &result);
+ if (opt < 0)
+ return opt;
+
+ switch (opt) {
+ case Opt_fd:
+ return coda_parse_fd(fc, result.uint_32);
+ }
+
+ return 0;
+}
+
+/*
+ * Parse coda's binary mount data form. We ignore any errors and go with index
+ * 0 if we get one for backward compatibility.
+ */
+static int coda_parse_monolithic(struct fs_context *fc, void *_data)
+{
+ struct coda_mount_data *data = _data;
+
+ if (!data)
+ return invalf(fc, "coda: Bad mount data");
+
+ if (data->version != CODA_MOUNT_VERSION)
+ return invalf(fc, "coda: Bad mount version");
+
+ coda_parse_fd(fc, data->fd);
+ return 0;
+}
+
+static int coda_fill_super(struct super_block *sb, struct fs_context *fc)
+{
+ struct coda_fs_context *ctx = fc->fs_private;
struct inode *root = NULL;
struct venus_comm *vc;
struct CodaFid fid;
int error;
- int idx;
- if (task_active_pid_ns(current) != &init_pid_ns)
- return -EINVAL;
+ infof(fc, "coda: device index: %i\n", ctx->idx);
- idx = get_device_index((struct coda_mount_data *) data);
-
- /* Ignore errors in data, for backward compatibility */
- if(idx == -1)
- idx = 0;
-
- pr_info("%s: device index: %i\n", __func__, idx);
-
- vc = &coda_comms[idx];
+ vc = &coda_comms[ctx->idx];
mutex_lock(&vc->vc_mutex);
if (!vc->vc_inuse) {
- pr_warn("%s: No pseudo device\n", __func__);
+ errorf(fc, "coda: No pseudo device");
error = -EINVAL;
goto unlock_out;
}
if (vc->vc_sb) {
- pr_warn("%s: Device already mounted\n", __func__);
+ errorf(fc, "coda: Device already mounted");
error = -EBUSY;
goto unlock_out;
}
@@ -313,18 +339,45 @@ static int coda_statfs(struct dentry *dentry, struct kstatfs *buf)
return 0;
}
-/* init_coda: used by filesystems.c to register coda */
-
-static struct dentry *coda_mount(struct file_system_type *fs_type,
- int flags, const char *dev_name, void *data)
+static int coda_get_tree(struct fs_context *fc)
{
- return mount_nodev(fs_type, flags, data, coda_fill_super);
+ if (task_active_pid_ns(current) != &init_pid_ns)
+ return -EINVAL;
+
+ return get_tree_nodev(fc, coda_fill_super);
+}
+
+static void coda_free_fc(struct fs_context *fc)
+{
+ kfree(fc->fs_private);
+}
+
+static const struct fs_context_operations coda_context_ops = {
+ .free = coda_free_fc,
+ .parse_param = coda_parse_param,
+ .parse_monolithic = coda_parse_monolithic,
+ .get_tree = coda_get_tree,
+ .reconfigure = coda_reconfigure,
+};
+
+static int coda_init_fs_context(struct fs_context *fc)
+{
+ struct coda_fs_context *ctx;
+
+ ctx = kzalloc(sizeof(struct coda_fs_context), GFP_KERNEL);
+ if (!ctx)
+ return -ENOMEM;
+
+ fc->fs_private = ctx;
+ fc->ops = &coda_context_ops;
+ return 0;
}
struct file_system_type coda_fs_type = {
.owner = THIS_MODULE,
.name = "coda",
- .mount = coda_mount,
+ .init_fs_context = coda_init_fs_context,
+ .parameters = coda_param_specs,
.kill_sb = kill_anon_super,
.fs_flags = FS_BINARY_MOUNTDATA,
};
diff --git a/fs/coredump.c b/fs/coredump.c
index f258c17..be6403b 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -872,6 +872,9 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
loff_t pos;
ssize_t n;
+ if (!page)
+ return 0;
+
if (cprm->to_skip) {
if (!__dump_skip(cprm, cprm->to_skip))
return 0;
@@ -884,7 +887,6 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
pos = file->f_pos;
bvec_set_page(&bvec, page, PAGE_SIZE, 0);
iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, PAGE_SIZE);
- iov_iter_set_copy_mc(&iter);
n = __kernel_write_iter(cprm->file, &iter, &pos);
if (n != PAGE_SIZE)
return 0;
@@ -895,10 +897,44 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
return 1;
}
+/*
+ * If we might get machine checks from kernel accesses during the
+ * core dump, let's get those errors early rather than during the
+ * IO. This is not performance-critical enough to warrant having
+ * all the machine check logic in the iovec paths.
+ */
+#ifdef copy_mc_to_kernel
+
+#define dump_page_alloc() alloc_page(GFP_KERNEL)
+#define dump_page_free(x) __free_page(x)
+static struct page *dump_page_copy(struct page *src, struct page *dst)
+{
+ void *buf = kmap_local_page(src);
+ size_t left = copy_mc_to_kernel(page_address(dst), buf, PAGE_SIZE);
+ kunmap_local(buf);
+ return left ? NULL : dst;
+}
+
+#else
+
+/* We just want to return non-NULL; it's never used. */
+#define dump_page_alloc() ERR_PTR(-EINVAL)
+#define dump_page_free(x) ((void)(x))
+static inline struct page *dump_page_copy(struct page *src, struct page *dst)
+{
+ return src;
+}
+#endif
+
int dump_user_range(struct coredump_params *cprm, unsigned long start,
unsigned long len)
{
unsigned long addr;
+ struct page *dump_page;
+
+ dump_page = dump_page_alloc();
+ if (!dump_page)
+ return 0;
for (addr = start; addr < start + len; addr += PAGE_SIZE) {
struct page *page;
@@ -912,14 +948,17 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
*/
page = get_dump_page(addr);
if (page) {
- int stop = !dump_emit_page(cprm, page);
+ int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
put_page(page);
- if (stop)
+ if (stop) {
+ dump_page_free(dump_page);
return 0;
+ }
} else {
dump_skip(cprm, PAGE_SIZE);
}
}
+ dump_page_free(dump_page);
return 1;
}
#endif
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 60dbfa0..39e7513 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -495,7 +495,7 @@ static void cramfs_kill_sb(struct super_block *sb)
sb->s_mtd = NULL;
} else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) && sb->s_bdev) {
sync_blockdev(sb->s_bdev);
- bdev_release(sb->s_bdev_handle);
+ fput(sb->s_bdev_file);
}
kfree(sbi);
}
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 7b3fc18..0ad52fb 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -74,13 +74,7 @@ struct fscrypt_nokey_name {
static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
{
- if (str->len == 1 && str->name[0] == '.')
- return true;
-
- if (str->len == 2 && str->name[0] == '.' && str->name[1] == '.')
- return true;
-
- return false;
+ return is_dot_dotdot(str->name, str->len);
}
/**
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index 52504dd..104771c 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -102,11 +102,8 @@ int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry,
if (err && err != -ENOENT)
return err;
- if (fname->is_nokey_name) {
- spin_lock(&dentry->d_lock);
- dentry->d_flags |= DCACHE_NOKEY_NAME;
- spin_unlock(&dentry->d_lock);
- }
+ fscrypt_prepare_dentry(dentry, fname->is_nokey_name);
+
return err;
}
EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup);
@@ -131,12 +128,10 @@ EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup);
int fscrypt_prepare_lookup_partial(struct inode *dir, struct dentry *dentry)
{
int err = fscrypt_get_encryption_info(dir, true);
+ bool is_nokey_name = (!err && !fscrypt_has_encryption_key(dir));
- if (!err && !fscrypt_has_encryption_key(dir)) {
- spin_lock(&dentry->d_lock);
- dentry->d_flags |= DCACHE_NOKEY_NAME;
- spin_unlock(&dentry->d_lock);
- }
+ fscrypt_prepare_dentry(dentry, is_nokey_name);
+
return err;
}
EXPORT_SYMBOL_GPL(fscrypt_prepare_lookup_partial);
diff --git a/fs/dcache.c b/fs/dcache.c
index b813528..71a8e94 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3061,7 +3061,10 @@ static enum d_walk_ret d_genocide_kill(void *data, struct dentry *dentry)
if (d_unhashed(dentry) || !dentry->d_inode)
return D_WALK_SKIP;
- dentry->d_lockref.count--;
+ if (!(dentry->d_flags & DCACHE_GENOCIDE)) {
+ dentry->d_flags |= DCACHE_GENOCIDE;
+ dentry->d_lockref.count--;
+ }
}
return D_WALK_CONTINUE;
}
@@ -3136,7 +3139,7 @@ static void __init dcache_init(void)
* of the dcache.
*/
dentry_cache = KMEM_CACHE_USERCOPY(dentry,
- SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+ SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT,
d_iname);
/* Hash may have been set up in dcache_init_early */
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 6045626..62c97ff 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -410,6 +410,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
bio->bi_end_io = dio_bio_end_io;
if (dio->is_pinned)
bio_set_flag(bio, BIO_PAGE_PINNED);
+ bio->bi_write_hint = file_inode(dio->iocb->ki_filp)->i_write_hint;
+
sdio->bio = bio;
sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
}
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index d814c512..9ca83ef 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -138,14 +138,14 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
}
op->info.optype = DLM_PLOCK_OP_LOCK;
- op->info.pid = fl->fl_pid;
- op->info.ex = (fl->fl_type == F_WRLCK);
- op->info.wait = !!(fl->fl_flags & FL_SLEEP);
+ op->info.pid = fl->c.flc_pid;
+ op->info.ex = lock_is_write(fl);
+ op->info.wait = !!(fl->c.flc_flags & FL_SLEEP);
op->info.fsid = ls->ls_global_id;
op->info.number = number;
op->info.start = fl->fl_start;
op->info.end = fl->fl_end;
- op->info.owner = (__u64)(long)fl->fl_owner;
+ op->info.owner = (__u64)(long) fl->c.flc_owner;
/* async handling */
if (fl->fl_lmops && fl->fl_lmops->lm_grant) {
op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
@@ -258,7 +258,7 @@ static int dlm_plock_callback(struct plock_op *op)
}
/* got fs lock; bookkeep locally as well: */
- flc->fl_flags &= ~FL_SLEEP;
+ flc->c.flc_flags &= ~FL_SLEEP;
if (posix_lock_file(file, flc, NULL)) {
/*
* This can only happen in the case of kmalloc() failure.
@@ -291,7 +291,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
struct dlm_ls *ls;
struct plock_op *op;
int rv;
- unsigned char fl_flags = fl->fl_flags;
+ unsigned char saved_flags = fl->c.flc_flags;
ls = dlm_find_lockspace_local(lockspace);
if (!ls)
@@ -304,7 +304,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
}
/* cause the vfs unlock to return ENOENT if lock is not found */
- fl->fl_flags |= FL_EXISTS;
+ fl->c.flc_flags |= FL_EXISTS;
rv = locks_lock_file_wait(file, fl);
if (rv == -ENOENT) {
@@ -317,14 +317,14 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
}
op->info.optype = DLM_PLOCK_OP_UNLOCK;
- op->info.pid = fl->fl_pid;
+ op->info.pid = fl->c.flc_pid;
op->info.fsid = ls->ls_global_id;
op->info.number = number;
op->info.start = fl->fl_start;
op->info.end = fl->fl_end;
- op->info.owner = (__u64)(long)fl->fl_owner;
+ op->info.owner = (__u64)(long) fl->c.flc_owner;
- if (fl->fl_flags & FL_CLOSE) {
+ if (fl->c.flc_flags & FL_CLOSE) {
op->info.flags |= DLM_PLOCK_FL_CLOSE;
send_op(op);
rv = 0;
@@ -345,7 +345,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
dlm_release_plock_op(op);
out:
dlm_put_lockspace(ls);
- fl->fl_flags = fl_flags;
+ fl->c.flc_flags = saved_flags;
return rv;
}
EXPORT_SYMBOL_GPL(dlm_posix_unlock);
@@ -375,14 +375,14 @@ int dlm_posix_cancel(dlm_lockspace_t *lockspace, u64 number, struct file *file,
return -EINVAL;
memset(&info, 0, sizeof(info));
- info.pid = fl->fl_pid;
- info.ex = (fl->fl_type == F_WRLCK);
+ info.pid = fl->c.flc_pid;
+ info.ex = lock_is_write(fl);
info.fsid = ls->ls_global_id;
dlm_put_lockspace(ls);
info.number = number;
info.start = fl->fl_start;
info.end = fl->fl_end;
- info.owner = (__u64)(long)fl->fl_owner;
+ info.owner = (__u64)(long) fl->c.flc_owner;
rv = do_lock_cancel(&info);
switch (rv) {
@@ -437,13 +437,13 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file,
}
op->info.optype = DLM_PLOCK_OP_GET;
- op->info.pid = fl->fl_pid;
- op->info.ex = (fl->fl_type == F_WRLCK);
+ op->info.pid = fl->c.flc_pid;
+ op->info.ex = lock_is_write(fl);
op->info.fsid = ls->ls_global_id;
op->info.number = number;
op->info.start = fl->fl_start;
op->info.end = fl->fl_end;
- op->info.owner = (__u64)(long)fl->fl_owner;
+ op->info.owner = (__u64)(long) fl->c.flc_owner;
send_op(op);
wait_event(recv_wq, (op->done != 0));
@@ -455,16 +455,16 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file,
rv = op->info.rv;
- fl->fl_type = F_UNLCK;
+ fl->c.flc_type = F_UNLCK;
if (rv == -ENOENT)
rv = 0;
else if (rv > 0) {
locks_init_lock(fl);
- fl->fl_type = (op->info.ex) ? F_WRLCK : F_RDLCK;
- fl->fl_flags = FL_POSIX;
- fl->fl_pid = op->info.pid;
+ fl->c.flc_type = (op->info.ex) ? F_WRLCK : F_RDLCK;
+ fl->c.flc_flags = FL_POSIX;
+ fl->c.flc_pid = op->info.pid;
if (op->info.nodeid != dlm_our_nodeid())
- fl->fl_pid = -fl->fl_pid;
+ fl->c.flc_pid = -fl->c.flc_pid;
fl->fl_start = op->info.start;
fl->fl_end = op->info.end;
rv = 0;
diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 03bd550..2fe0f3a 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -1949,16 +1949,6 @@ int ecryptfs_encrypt_and_encode_filename(
return rc;
}
-static bool is_dot_dotdot(const char *name, size_t name_size)
-{
- if (name_size == 1 && name[0] == '.')
- return true;
- else if (name_size == 2 && name[0] == '.' && name[1] == '.')
- return true;
-
- return false;
-}
-
/**
* ecryptfs_decode_and_decrypt_filename - converts the encoded cipher text name to decoded plaintext
* @plaintext_name: The plaintext name
diff --git a/fs/efivarfs/internal.h b/fs/efivarfs/internal.h
index 169252e..f720615 100644
--- a/fs/efivarfs/internal.h
+++ b/fs/efivarfs/internal.h
@@ -38,7 +38,7 @@ struct efivar_entry {
int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
struct list_head *),
- void *data, bool duplicates, struct list_head *head);
+ void *data, struct list_head *head);
int efivar_entry_add(struct efivar_entry *entry, struct list_head *head);
void __efivar_entry_add(struct efivar_entry *entry, struct list_head *head);
diff --git a/fs/efivarfs/super.c b/fs/efivarfs/super.c
index 6038dd393..bb14462 100644
--- a/fs/efivarfs/super.c
+++ b/fs/efivarfs/super.c
@@ -343,12 +343,7 @@ static int efivarfs_fill_super(struct super_block *sb, struct fs_context *fc)
if (err)
return err;
- err = efivar_init(efivarfs_callback, (void *)sb, true,
- &sfi->efivarfs_list);
- if (err)
- efivar_entry_iter(efivarfs_destroy, &sfi->efivarfs_list, NULL);
-
- return err;
+ return efivar_init(efivarfs_callback, sb, &sfi->efivarfs_list);
}
static int efivarfs_get_tree(struct fs_context *fc)
diff --git a/fs/efivarfs/vars.c b/fs/efivarfs/vars.c
index 114ff0f..4d722af 100644
--- a/fs/efivarfs/vars.c
+++ b/fs/efivarfs/vars.c
@@ -361,7 +361,6 @@ static void dup_variable_bug(efi_char16_t *str16, efi_guid_t *vendor_guid,
* efivar_init - build the initial list of EFI variables
* @func: callback function to invoke for every variable
* @data: function-specific data to pass to @func
- * @duplicates: error if we encounter duplicates on @head?
* @head: initialised head of variable list
*
* Get every EFI variable from the firmware and invoke @func. @func
@@ -371,9 +370,9 @@ static void dup_variable_bug(efi_char16_t *str16, efi_guid_t *vendor_guid,
*/
int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
struct list_head *),
- void *data, bool duplicates, struct list_head *head)
+ void *data, struct list_head *head)
{
- unsigned long variable_name_size = 1024;
+ unsigned long variable_name_size = 512;
efi_char16_t *variable_name;
efi_status_t status;
efi_guid_t vendor_guid;
@@ -390,12 +389,13 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
goto free;
/*
- * Per EFI spec, the maximum storage allocated for both
- * the variable name and variable data is 1024 bytes.
+ * A small set of old UEFI implementations reject sizes
+ * above a certain threshold, the lowest seen in the wild
+ * is 512.
*/
do {
- variable_name_size = 1024;
+ variable_name_size = 512;
status = efivar_get_next_variable(&variable_name_size,
variable_name,
@@ -413,8 +413,7 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
* we'll ever see a different variable name,
* and may end up looping here forever.
*/
- if (duplicates &&
- variable_is_present(variable_name, &vendor_guid,
+ if (variable_is_present(variable_name, &vendor_guid,
head)) {
dup_variable_bug(variable_name, &vendor_guid,
variable_name_size);
@@ -432,9 +431,13 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
break;
case EFI_NOT_FOUND:
break;
+ case EFI_BUFFER_TOO_SMALL:
+ pr_warn("efivars: Variable name size exceeds maximum (%lu > 512)\n",
+ variable_name_size);
+ status = EFI_NOT_FOUND;
+ break;
default:
- printk(KERN_WARNING "efivars: get_next_variable: status=%lx\n",
- status);
+ pr_warn("efivars: get_next_variable: status=%lx\n", status);
status = EFI_NOT_FOUND;
break;
}
diff --git a/fs/efs/super.c b/fs/efs/super.c
index f17fdac..e4421c1 100644
--- a/fs/efs/super.c
+++ b/fs/efs/super.c
@@ -14,19 +14,14 @@
#include <linux/buffer_head.h>
#include <linux/vfs.h>
#include <linux/blkdev.h>
-
+#include <linux/fs_context.h>
+#include <linux/fs_parser.h>
#include "efs.h"
#include <linux/efs_vh.h>
#include <linux/efs_fs_sb.h>
static int efs_statfs(struct dentry *dentry, struct kstatfs *buf);
-static int efs_fill_super(struct super_block *s, void *d, int silent);
-
-static struct dentry *efs_mount(struct file_system_type *fs_type,
- int flags, const char *dev_name, void *data)
-{
- return mount_bdev(fs_type, flags, dev_name, data, efs_fill_super);
-}
+static int efs_init_fs_context(struct fs_context *fc);
static void efs_kill_sb(struct super_block *s)
{
@@ -35,15 +30,6 @@ static void efs_kill_sb(struct super_block *s)
kfree(sbi);
}
-static struct file_system_type efs_fs_type = {
- .owner = THIS_MODULE,
- .name = "efs",
- .mount = efs_mount,
- .kill_sb = efs_kill_sb,
- .fs_flags = FS_REQUIRES_DEV,
-};
-MODULE_ALIAS_FS("efs");
-
static struct pt_types sgi_pt_types[] = {
{0x00, "SGI vh"},
{0x01, "SGI trkrepl"},
@@ -63,6 +49,27 @@ static struct pt_types sgi_pt_types[] = {
{0, NULL}
};
+enum {
+ Opt_explicit_open,
+};
+
+static const struct fs_parameter_spec efs_param_spec[] = {
+ fsparam_flag ("explicit-open", Opt_explicit_open),
+ {}
+};
+
+/*
+ * File system definition and registration.
+ */
+static struct file_system_type efs_fs_type = {
+ .owner = THIS_MODULE,
+ .name = "efs",
+ .kill_sb = efs_kill_sb,
+ .fs_flags = FS_REQUIRES_DEV,
+ .init_fs_context = efs_init_fs_context,
+ .parameters = efs_param_spec,
+};
+MODULE_ALIAS_FS("efs");
static struct kmem_cache * efs_inode_cachep;
@@ -91,8 +98,8 @@ static int __init init_inodecache(void)
{
efs_inode_cachep = kmem_cache_create("efs_inode_cache",
sizeof(struct efs_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
- SLAB_ACCOUNT, init_once);
+ SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT,
+ init_once);
if (efs_inode_cachep == NULL)
return -ENOMEM;
return 0;
@@ -108,18 +115,10 @@ static void destroy_inodecache(void)
kmem_cache_destroy(efs_inode_cachep);
}
-static int efs_remount(struct super_block *sb, int *flags, char *data)
-{
- sync_filesystem(sb);
- *flags |= SB_RDONLY;
- return 0;
-}
-
static const struct super_operations efs_superblock_operations = {
.alloc_inode = efs_alloc_inode,
.free_inode = efs_free_inode,
.statfs = efs_statfs,
- .remount_fs = efs_remount,
};
static const struct export_operations efs_export_ops = {
@@ -249,26 +248,26 @@ static int efs_validate_super(struct efs_sb_info *sb, struct efs_super *super) {
return 0;
}
-static int efs_fill_super(struct super_block *s, void *d, int silent)
+static int efs_fill_super(struct super_block *s, struct fs_context *fc)
{
struct efs_sb_info *sb;
struct buffer_head *bh;
struct inode *root;
- sb = kzalloc(sizeof(struct efs_sb_info), GFP_KERNEL);
+ sb = kzalloc(sizeof(struct efs_sb_info), GFP_KERNEL);
if (!sb)
return -ENOMEM;
s->s_fs_info = sb;
s->s_time_min = 0;
s->s_time_max = U32_MAX;
-
+
s->s_magic = EFS_SUPER_MAGIC;
if (!sb_set_blocksize(s, EFS_BLOCKSIZE)) {
pr_err("device does not support %d byte blocks\n",
EFS_BLOCKSIZE);
return -EINVAL;
}
-
+
/* read the vh (volume header) block */
bh = sb_bread(s, 0);
@@ -294,7 +293,7 @@ static int efs_fill_super(struct super_block *s, void *d, int silent)
pr_err("cannot read superblock\n");
return -EIO;
}
-
+
if (efs_validate_super(sb, (struct efs_super *) bh->b_data)) {
#ifdef DEBUG
pr_warn("invalid superblock at block %u\n",
@@ -328,6 +327,61 @@ static int efs_fill_super(struct super_block *s, void *d, int silent)
return 0;
}
+static void efs_free_fc(struct fs_context *fc)
+{
+ kfree(fc->fs_private);
+}
+
+static int efs_get_tree(struct fs_context *fc)
+{
+ return get_tree_bdev(fc, efs_fill_super);
+}
+
+static int efs_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+ int token;
+ struct fs_parse_result result;
+
+ token = fs_parse(fc, efs_param_spec, param, &result);
+ if (token < 0)
+ return token;
+ return 0;
+}
+
+static int efs_reconfigure(struct fs_context *fc)
+{
+ sync_filesystem(fc->root->d_sb);
+
+ return 0;
+}
+
+struct efs_context {
+ unsigned long s_mount_opts;
+};
+
+static const struct fs_context_operations efs_context_opts = {
+ .parse_param = efs_parse_param,
+ .get_tree = efs_get_tree,
+ .reconfigure = efs_reconfigure,
+ .free = efs_free_fc,
+};
+
+/*
+ * Set up the filesystem mount context.
+ */
+static int efs_init_fs_context(struct fs_context *fc)
+{
+ struct efs_context *ctx;
+
+ ctx = kzalloc(sizeof(struct efs_context), GFP_KERNEL);
+ if (!ctx)
+ return -ENOMEM;
+ fc->fs_private = ctx;
+ fc->ops = &efs_context_opts;
+
+ return 0;
+}
+
static int efs_statfs(struct dentry *dentry, struct kstatfs *buf) {
struct super_block *sb = dentry->d_sb;
struct efs_sb_info *sbi = SUPER_INFO(sb);
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index c98aeda..52524bd 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -220,7 +220,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
up_read(&devs->rwsem);
return 0;
}
- map->m_bdev = dif->bdev_handle ? dif->bdev_handle->bdev : NULL;
+ map->m_bdev = dif->bdev_file ? file_bdev(dif->bdev_file) : NULL;
map->m_daxdev = dif->dax_dev;
map->m_dax_part_off = dif->dax_part_off;
map->m_fscache = dif->fscache;
@@ -238,8 +238,8 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
if (map->m_pa >= startoff &&
map->m_pa < startoff + length) {
map->m_pa -= startoff;
- map->m_bdev = dif->bdev_handle ?
- dif->bdev_handle->bdev : NULL;
+ map->m_bdev = dif->bdev_file ?
+ file_bdev(dif->bdev_file) : NULL;
map->m_daxdev = dif->dax_dev;
map->m_dax_part_off = dif->dax_part_off;
map->m_fscache = dif->fscache;
@@ -447,5 +447,6 @@ const struct file_operations erofs_file_fops = {
.llseek = generic_file_llseek,
.read_iter = erofs_file_read_iter,
.mmap = erofs_file_mmap,
+ .get_unmapped_area = thp_get_unmapped_area,
.splice_read = filemap_splice_read,
};
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index d4cee95a..2ec9b2b 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -323,7 +323,8 @@ static int z_erofs_transform_plain(struct z_erofs_decompress_req *rq,
unsigned int cur = 0, ni = 0, no, pi, po, insz, cnt;
u8 *kin;
- DBG_BUGON(rq->outputsize > rq->inputsize);
+ if (rq->outputsize > rq->inputsize)
+ return -EOPNOTSUPP;
if (rq->alg == Z_EROFS_COMPRESSION_INTERLACED) {
cur = bs - (rq->pageofs_out & (bs - 1));
pi = (rq->pageofs_in + rq->inputsize - cur) & ~PAGE_MASK;
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 5ff9002..89a7c24 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -381,11 +381,12 @@ static int erofs_fscache_init_domain(struct super_block *sb)
goto out;
if (!erofs_pseudo_mnt) {
- erofs_pseudo_mnt = kern_mount(&erofs_fs_type);
- if (IS_ERR(erofs_pseudo_mnt)) {
- err = PTR_ERR(erofs_pseudo_mnt);
+ struct vfsmount *mnt = kern_mount(&erofs_fs_type);
+ if (IS_ERR(mnt)) {
+ err = PTR_ERR(mnt);
goto out;
}
+ erofs_pseudo_mnt = mnt;
}
domain->volume = sbi->volume;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index b0409ba..0f07063 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -49,7 +49,7 @@ typedef u32 erofs_blk_t;
struct erofs_device_info {
char *path;
struct erofs_fscache *fscache;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct dax_device *dax_dev;
u64 dax_part_off;
diff --git a/fs/erofs/namei.c b/fs/erofs/namei.c
index d4f631d..f0110a7 100644
--- a/fs/erofs/namei.c
+++ b/fs/erofs/namei.c
@@ -130,24 +130,24 @@ static void *erofs_find_target_block(struct erofs_buf *target,
/* string comparison without already matched prefix */
diff = erofs_dirnamecmp(name, &dname, &matched);
- if (!diff) {
- *_ndirents = 0;
- goto out;
- } else if (diff > 0) {
- head = mid + 1;
- startprfx = matched;
-
- if (!IS_ERR(candidate))
- erofs_put_metabuf(target);
- *target = buf;
- candidate = de;
- *_ndirents = ndirents;
- } else {
+ if (diff < 0) {
erofs_put_metabuf(&buf);
-
back = mid - 1;
endprfx = matched;
+ continue;
}
+
+ if (!IS_ERR(candidate))
+ erofs_put_metabuf(target);
+ *target = buf;
+ if (!diff) {
+ *_ndirents = 0;
+ return de;
+ }
+ head = mid + 1;
+ startprfx = matched;
+ candidate = de;
+ *_ndirents = ndirents;
continue;
}
out: /* free if the candidate is valid */
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 5f60f16..9b4b66d 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -177,7 +177,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
struct erofs_sb_info *sbi = EROFS_SB(sb);
struct erofs_fscache *fscache;
struct erofs_deviceslot *dis;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
void *ptr;
ptr = erofs_read_metabuf(buf, sb, erofs_blknr(sb, *pos), EROFS_KMAP);
@@ -201,12 +201,12 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
return PTR_ERR(fscache);
dif->fscache = fscache;
} else if (!sbi->devs->flatdev) {
- bdev_handle = bdev_open_by_path(dif->path, BLK_OPEN_READ,
+ bdev_file = bdev_file_open_by_path(dif->path, BLK_OPEN_READ,
sb->s_type, NULL);
- if (IS_ERR(bdev_handle))
- return PTR_ERR(bdev_handle);
- dif->bdev_handle = bdev_handle;
- dif->dax_dev = fs_dax_get_by_bdev(bdev_handle->bdev,
+ if (IS_ERR(bdev_file))
+ return PTR_ERR(bdev_file);
+ dif->bdev_file = bdev_file;
+ dif->dax_dev = fs_dax_get_by_bdev(file_bdev(bdev_file),
&dif->dax_part_off, NULL, NULL);
}
@@ -754,8 +754,8 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
struct erofs_device_info *dif = ptr;
fs_put_dax(dif->dax_dev, NULL);
- if (dif->bdev_handle)
- bdev_release(dif->bdev_handle);
+ if (dif->bdev_file)
+ fput(dif->bdev_file);
erofs_fscache_unregister_cookie(dif->fscache);
dif->fscache = NULL;
kfree(dif->path);
diff --git a/fs/eventfd.c b/fs/eventfd.c
index ad8186d..9afdb72 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -251,7 +251,7 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
ssize_t res;
__u64 ucnt;
- if (count < sizeof(ucnt))
+ if (count != sizeof(ucnt))
return -EINVAL;
if (copy_from_user(&ucnt, buf, sizeof(ucnt)))
return -EFAULT;
@@ -283,13 +283,18 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
static void eventfd_show_fdinfo(struct seq_file *m, struct file *f)
{
struct eventfd_ctx *ctx = f->private_data;
+ __u64 cnt;
spin_lock_irq(&ctx->wqh.lock);
- seq_printf(m, "eventfd-count: %16llx\n",
- (unsigned long long)ctx->count);
+ cnt = ctx->count;
spin_unlock_irq(&ctx->wqh.lock);
- seq_printf(m, "eventfd-id: %d\n", ctx->id);
- seq_printf(m, "eventfd-semaphore: %d\n",
+
+ seq_printf(m,
+ "eventfd-count: %16llx\n"
+ "eventfd-id: %d\n"
+ "eventfd-semaphore: %d\n",
+ cnt,
+ ctx->id,
!!(ctx->flags & EFD_SEMAPHORE));
}
#endif
@@ -383,6 +388,7 @@ static int do_eventfd(unsigned int count, int flags)
/* Check the EFD_* constants for consistency. */
BUILD_BUG_ON(EFD_CLOEXEC != O_CLOEXEC);
BUILD_BUG_ON(EFD_NONBLOCK != O_NONBLOCK);
+ BUILD_BUG_ON(EFD_SEMAPHORE != (1 << 0));
if (flags & ~EFD_FLAGS_SET)
return -EINVAL;
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3534d36..39ac6fd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -206,7 +206,7 @@ struct eventpoll {
*/
struct epitem *ovflist;
- /* wakeup_source used when ep_scan_ready_list is running */
+ /* wakeup_source used when ep_send_events or __ep_eventpoll_poll is running */
struct wakeup_source *ws;
/* The user that created the eventpoll descriptor */
@@ -678,12 +678,6 @@ static void ep_done_scan(struct eventpoll *ep,
write_unlock_irq(&ep->lock);
}
-static void epi_rcu_free(struct rcu_head *head)
-{
- struct epitem *epi = container_of(head, struct epitem, rcu);
- kmem_cache_free(epi_cache, epi);
-}
-
static void ep_get(struct eventpoll *ep)
{
refcount_inc(&ep->refcount);
@@ -767,7 +761,7 @@ static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force)
* ep->mtx. The rcu read side, reverse_path_check_proc(), does not make
* use of the rbn field.
*/
- call_rcu(&epi->rcu, epi_rcu_free);
+ kfree_rcu(epi, rcu);
percpu_counter_dec(&ep->user->epoll_watches);
return ep_refcount_dec_and_test(ep);
@@ -1153,7 +1147,7 @@ static inline bool chain_epi_lockless(struct epitem *epi)
* This callback takes a read lock in order not to contend with concurrent
* events from another file descriptor, thus all modifications to ->rdllist
* or ->ovflist are lockless. Read lock is paired with the write lock from
- * ep_scan_ready_list(), which stops all list modifications and guarantees
+ * ep_start/done_scan(), which stops all list modifications and guarantees
* that lists state is seen correctly.
*
* Another thing worth to mention is that ep_poll_callback() can be called
@@ -1751,7 +1745,7 @@ static int ep_send_events(struct eventpoll *ep,
* availability. At this point, no one can insert
* into ep->rdllist besides us. The epoll_ctl()
* callers are locked out by
- * ep_scan_ready_list() holding "mtx" and the
+ * ep_send_events() holding "mtx" and the
* poll callback will queue them in ep->ovflist.
*/
list_add_tail(&epi->rdllink, &ep->rdllist);
@@ -1904,7 +1898,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
__set_current_state(TASK_INTERRUPTIBLE);
/*
- * Do the final check under the lock. ep_scan_ready_list()
+ * Do the final check under the lock. ep_start/done_scan()
* plays with two lists (->rdllist and ->ovflist) and there
* is always a race when both lists are empty for short
* period of time although events are pending, so lock is
diff --git a/fs/exec.c b/fs/exec.c
index af4fbb6..ece3ab0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1158,7 +1158,6 @@ static int de_thread(struct task_struct *tsk)
BUG_ON(leader->exit_state != EXIT_ZOMBIE);
leader->exit_state = EXIT_DEAD;
-
/*
* We are going to release_task()->ptrace_unlink() silently,
* the tracer can sleep in do_wait(). EXIT_DEAD guarantees
diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 9474cd5..3615954 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -275,6 +275,7 @@ struct exfat_sb_info {
spinlock_t inode_hash_lock;
struct hlist_head inode_hashtable[EXFAT_HASH_SIZE];
+ struct rcu_head rcu;
};
#define EXFAT_CACHE_VALID 0
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index d25a96a..cc00f1a 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -35,13 +35,18 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
if (new_num_clusters == num_clusters)
goto out;
- exfat_chain_set(&clu, ei->start_clu, num_clusters, ei->flags);
- ret = exfat_find_last_cluster(sb, &clu, &last_clu);
- if (ret)
- return ret;
+ if (num_clusters) {
+ exfat_chain_set(&clu, ei->start_clu, num_clusters, ei->flags);
+ ret = exfat_find_last_cluster(sb, &clu, &last_clu);
+ if (ret)
+ return ret;
- clu.dir = (last_clu == EXFAT_EOF_CLUSTER) ?
- EXFAT_EOF_CLUSTER : last_clu + 1;
+ clu.dir = last_clu + 1;
+ } else {
+ last_clu = EXFAT_EOF_CLUSTER;
+ clu.dir = EXFAT_EOF_CLUSTER;
+ }
+
clu.size = 0;
clu.flags = ei->flags;
@@ -51,17 +56,19 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
return ret;
/* Append new clusters to chain */
- if (clu.flags != ei->flags) {
- exfat_chain_cont_cluster(sb, ei->start_clu, num_clusters);
- ei->flags = ALLOC_FAT_CHAIN;
- }
- if (clu.flags == ALLOC_FAT_CHAIN)
- if (exfat_ent_set(sb, last_clu, clu.dir))
- goto free_clu;
+ if (num_clusters) {
+ if (clu.flags != ei->flags)
+ if (exfat_chain_cont_cluster(sb, ei->start_clu, num_clusters))
+ goto free_clu;
- if (num_clusters == 0)
+ if (clu.flags == ALLOC_FAT_CHAIN)
+ if (exfat_ent_set(sb, last_clu, clu.dir))
+ goto free_clu;
+ } else
ei->start_clu = clu.dir;
+ ei->flags = clu.flags;
+
out:
inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
/* Expanded range not zeroed, do not update valid_size */
diff --git a/fs/exfat/nls.c b/fs/exfat/nls.c
index 705710f..afdf13c 100644
--- a/fs/exfat/nls.c
+++ b/fs/exfat/nls.c
@@ -655,7 +655,6 @@ static int exfat_load_upcase_table(struct super_block *sb,
unsigned int sect_size = sb->s_blocksize;
unsigned int i, index = 0;
u32 chksum = 0;
- int ret;
unsigned char skip = false;
unsigned short *upcase_table;
@@ -673,8 +672,7 @@ static int exfat_load_upcase_table(struct super_block *sb,
if (!bh) {
exfat_err(sb, "failed to read sector(0x%llx)",
(unsigned long long)sector);
- ret = -EIO;
- goto free_table;
+ return -EIO;
}
sector++;
for (i = 0; i < sect_size && index <= 0xFFFF; i += 2) {
@@ -701,15 +699,12 @@ static int exfat_load_upcase_table(struct super_block *sb,
exfat_err(sb, "failed to load upcase table (idx : 0x%08x, chksum : 0x%08x, utbl_chksum : 0x%08x)",
index, chksum, utbl_checksum);
- ret = -EINVAL;
-free_table:
- exfat_free_upcase_table(sbi);
- return ret;
+ return -EINVAL;
}
static int exfat_load_default_upcase_table(struct super_block *sb)
{
- int i, ret = -EIO;
+ int i;
struct exfat_sb_info *sbi = EXFAT_SB(sb);
unsigned char skip = false;
unsigned short uni = 0, *upcase_table;
@@ -740,8 +735,7 @@ static int exfat_load_default_upcase_table(struct super_block *sb)
return 0;
/* FATAL error: default upcase table has error */
- exfat_free_upcase_table(sbi);
- return ret;
+ return -EIO;
}
int exfat_create_upcase_table(struct super_block *sb)
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index d9d4fa9..fcb6582 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -39,9 +39,6 @@ static void exfat_put_super(struct super_block *sb)
exfat_free_bitmap(sbi);
brelse(sbi->boot_bh);
mutex_unlock(&sbi->s_lock);
-
- unload_nls(sbi->nls_io);
- exfat_free_upcase_table(sbi);
}
static int exfat_sync_fs(struct super_block *sb, int wait)
@@ -600,7 +597,7 @@ static int __exfat_fill_super(struct super_block *sb)
ret = exfat_load_bitmap(sb);
if (ret) {
exfat_err(sb, "failed to load alloc-bitmap");
- goto free_upcase_table;
+ goto free_bh;
}
ret = exfat_count_used_clusters(sb, &sbi->used_clusters);
@@ -613,8 +610,6 @@ static int __exfat_fill_super(struct super_block *sb)
free_alloc_bitmap:
exfat_free_bitmap(sbi);
-free_upcase_table:
- exfat_free_upcase_table(sbi);
free_bh:
brelse(sbi->boot_bh);
return ret;
@@ -701,12 +696,10 @@ static int exfat_fill_super(struct super_block *sb, struct fs_context *fc)
sb->s_root = NULL;
free_table:
- exfat_free_upcase_table(sbi);
exfat_free_bitmap(sbi);
brelse(sbi->boot_bh);
check_nls_io:
- unload_nls(sbi->nls_io);
return err;
}
@@ -771,13 +764,22 @@ static int exfat_init_fs_context(struct fs_context *fc)
return 0;
}
+static void delayed_free(struct rcu_head *p)
+{
+ struct exfat_sb_info *sbi = container_of(p, struct exfat_sb_info, rcu);
+
+ unload_nls(sbi->nls_io);
+ exfat_free_upcase_table(sbi);
+ exfat_free_sbi(sbi);
+}
+
static void exfat_kill_sb(struct super_block *sb)
{
struct exfat_sb_info *sbi = sb->s_fs_info;
kill_block_super(sb);
if (sbi)
- exfat_free_sbi(sbi);
+ call_rcu(&sbi->rcu, delayed_free);
}
static struct file_system_type exfat_fs_type = {
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 3ae0154..07ea3d6 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -255,7 +255,7 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len,
container_of(ctx, struct getdents_callback, ctx);
buf->sequence++;
- if (buf->ino == ino && len <= NAME_MAX) {
+ if (buf->ino == ino && len <= NAME_MAX && !is_dot_dotdot(name, len)) {
memcpy(buf->name, name, len);
buf->name[len] = '\0';
buf->found = 1;
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 023571f..3c0d7d1 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1550,7 +1550,7 @@ struct ext4_sb_info {
unsigned long s_commit_interval;
u32 s_max_batch_time;
u32 s_min_batch_time;
- struct bdev_handle *s_journal_bdev_handle;
+ struct file *s_journal_bdev_file;
#ifdef CONFIG_QUOTA
/* Names of quota files with journalled quota */
char __rcu *s_qf_names[EXT4_MAXQUOTAS];
diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
index 11e6f33..df853c4 100644
--- a/fs/ext4/fsmap.c
+++ b/fs/ext4/fsmap.c
@@ -576,9 +576,9 @@ static bool ext4_getfsmap_is_valid_device(struct super_block *sb,
if (fm->fmr_device == 0 || fm->fmr_device == UINT_MAX ||
fm->fmr_device == new_encode_dev(sb->s_bdev->bd_dev))
return true;
- if (EXT4_SB(sb)->s_journal_bdev_handle &&
+ if (EXT4_SB(sb)->s_journal_bdev_file &&
fm->fmr_device ==
- new_encode_dev(EXT4_SB(sb)->s_journal_bdev_handle->bdev->bd_dev))
+ new_encode_dev(file_bdev(EXT4_SB(sb)->s_journal_bdev_file)->bd_dev))
return true;
return false;
}
@@ -648,9 +648,9 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
memset(handlers, 0, sizeof(handlers));
handlers[0].gfd_dev = new_encode_dev(sb->s_bdev->bd_dev);
handlers[0].gfd_fn = ext4_getfsmap_datadev;
- if (EXT4_SB(sb)->s_journal_bdev_handle) {
+ if (EXT4_SB(sb)->s_journal_bdev_file) {
handlers[1].gfd_dev = new_encode_dev(
- EXT4_SB(sb)->s_journal_bdev_handle->bdev->bd_dev);
+ file_bdev(EXT4_SB(sb)->s_journal_bdev_file)->bd_dev);
handlers[1].gfd_fn = ext4_getfsmap_logdev;
}
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 05b647e..5e4f65c 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1762,7 +1762,6 @@ static struct buffer_head *ext4_lookup_entry(struct inode *dir,
struct buffer_head *bh;
err = ext4_fname_prepare_lookup(dir, dentry, &fname);
- generic_set_encrypted_ci_d_ops(dentry);
if (err == -ENOENT)
return NULL;
if (err)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0f931d0..a8ba84e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1359,14 +1359,14 @@ static void ext4_put_super(struct super_block *sb)
sync_blockdev(sb->s_bdev);
invalidate_bdev(sb->s_bdev);
- if (sbi->s_journal_bdev_handle) {
+ if (sbi->s_journal_bdev_file) {
/*
* Invalidate the journal device's buffers. We don't want them
* floating about in memory - the physical journal device may
* hotswapped, and it breaks the `ro-after' testing code.
*/
- sync_blockdev(sbi->s_journal_bdev_handle->bdev);
- invalidate_bdev(sbi->s_journal_bdev_handle->bdev);
+ sync_blockdev(file_bdev(sbi->s_journal_bdev_file));
+ invalidate_bdev(file_bdev(sbi->s_journal_bdev_file));
}
ext4_xattr_destroy_cache(sbi->s_ea_inode_cache);
@@ -4233,7 +4233,7 @@ int ext4_calculate_overhead(struct super_block *sb)
* Add the internal journal blocks whether the journal has been
* loaded or not
*/
- if (sbi->s_journal && !sbi->s_journal_bdev_handle)
+ if (sbi->s_journal && !sbi->s_journal_bdev_file)
overhead += EXT4_NUM_B2C(sbi, sbi->s_journal->j_total_len);
else if (ext4_has_feature_journal(sb) && !sbi->s_journal && j_inum) {
/* j_inum for internal journal is non-zero */
@@ -5346,7 +5346,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
sb->s_qcop = &ext4_qctl_operations;
sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
#endif
- memcpy(&sb->s_uuid, es->s_uuid, sizeof(es->s_uuid));
+ super_set_uuid(sb, es->s_uuid, sizeof(es->s_uuid));
INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
mutex_init(&sbi->s_orphan_lock);
@@ -5484,6 +5484,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
goto failed_mount4;
}
+ generic_set_sb_d_ops(sb);
sb->s_root = d_make_root(root);
if (!sb->s_root) {
ext4_msg(sb, KERN_ERR, "get root dentry failed");
@@ -5670,9 +5671,9 @@ failed_mount9: __maybe_unused
#endif
fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy);
brelse(sbi->s_sbh);
- if (sbi->s_journal_bdev_handle) {
- invalidate_bdev(sbi->s_journal_bdev_handle->bdev);
- bdev_release(sbi->s_journal_bdev_handle);
+ if (sbi->s_journal_bdev_file) {
+ invalidate_bdev(file_bdev(sbi->s_journal_bdev_file));
+ fput(sbi->s_journal_bdev_file);
}
out_fail:
invalidate_bdev(sb->s_bdev);
@@ -5842,30 +5843,30 @@ static journal_t *ext4_open_inode_journal(struct super_block *sb,
return journal;
}
-static struct bdev_handle *ext4_get_journal_blkdev(struct super_block *sb,
+static struct file *ext4_get_journal_blkdev(struct super_block *sb,
dev_t j_dev, ext4_fsblk_t *j_start,
ext4_fsblk_t *j_len)
{
struct buffer_head *bh;
struct block_device *bdev;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
int hblock, blocksize;
ext4_fsblk_t sb_block;
unsigned long offset;
struct ext4_super_block *es;
int errno;
- bdev_handle = bdev_open_by_dev(j_dev,
+ bdev_file = bdev_file_open_by_dev(j_dev,
BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
sb, &fs_holder_ops);
- if (IS_ERR(bdev_handle)) {
+ if (IS_ERR(bdev_file)) {
ext4_msg(sb, KERN_ERR,
"failed to open journal device unknown-block(%u,%u) %ld",
- MAJOR(j_dev), MINOR(j_dev), PTR_ERR(bdev_handle));
- return bdev_handle;
+ MAJOR(j_dev), MINOR(j_dev), PTR_ERR(bdev_file));
+ return bdev_file;
}
- bdev = bdev_handle->bdev;
+ bdev = file_bdev(bdev_file);
blocksize = sb->s_blocksize;
hblock = bdev_logical_block_size(bdev);
if (blocksize < hblock) {
@@ -5912,12 +5913,12 @@ static struct bdev_handle *ext4_get_journal_blkdev(struct super_block *sb,
*j_start = sb_block + 1;
*j_len = ext4_blocks_count(es);
brelse(bh);
- return bdev_handle;
+ return bdev_file;
out_bh:
brelse(bh);
out_bdev:
- bdev_release(bdev_handle);
+ fput(bdev_file);
return ERR_PTR(errno);
}
@@ -5927,14 +5928,14 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
journal_t *journal;
ext4_fsblk_t j_start;
ext4_fsblk_t j_len;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
int errno = 0;
- bdev_handle = ext4_get_journal_blkdev(sb, j_dev, &j_start, &j_len);
- if (IS_ERR(bdev_handle))
- return ERR_CAST(bdev_handle);
+ bdev_file = ext4_get_journal_blkdev(sb, j_dev, &j_start, &j_len);
+ if (IS_ERR(bdev_file))
+ return ERR_CAST(bdev_file);
- journal = jbd2_journal_init_dev(bdev_handle->bdev, sb->s_bdev, j_start,
+ journal = jbd2_journal_init_dev(file_bdev(bdev_file), sb->s_bdev, j_start,
j_len, sb->s_blocksize);
if (IS_ERR(journal)) {
ext4_msg(sb, KERN_ERR, "failed to create device journal");
@@ -5949,14 +5950,14 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
goto out_journal;
}
journal->j_private = sb;
- EXT4_SB(sb)->s_journal_bdev_handle = bdev_handle;
+ EXT4_SB(sb)->s_journal_bdev_file = bdev_file;
ext4_init_journal_params(sb, journal);
return journal;
out_journal:
jbd2_journal_destroy(journal);
out_bdev:
- bdev_release(bdev_handle);
+ fput(bdev_file);
return ERR_PTR(errno);
}
@@ -7314,12 +7315,12 @@ static inline int ext3_feature_set_ok(struct super_block *sb)
static void ext4_kill_sb(struct super_block *sb)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
- struct bdev_handle *handle = sbi ? sbi->s_journal_bdev_handle : NULL;
+ struct file *bdev_file = sbi ? sbi->s_journal_bdev_file : NULL;
kill_block_super(sb);
- if (handle)
- bdev_release(handle);
+ if (bdev_file)
+ fput(bdev_file);
}
static struct file_system_type ext4_fs_type = {
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 75bf1f8..645240c 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -92,10 +92,12 @@ static const char *ext4_get_link(struct dentry *dentry, struct inode *inode,
if (!dentry) {
bh = ext4_getblk(NULL, inode, 0, EXT4_GET_BLOCKS_CACHED_NOWAIT);
- if (IS_ERR(bh))
- return ERR_CAST(bh);
- if (!bh || !ext4_buffer_uptodate(bh))
+ if (IS_ERR(bh) || !bh)
return ERR_PTR(-ECHILD);
+ if (!ext4_buffer_uptodate(bh)) {
+ brelse(bh);
+ return ERR_PTR(-ECHILD);
+ }
} else {
bh = ext4_bread(NULL, inode, 0, 0);
if (IS_ERR(bh))
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 65294e3..4c77e8ce 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -24,6 +24,7 @@
#include <linux/blkdev.h>
#include <linux/quotaops.h>
#include <linux/part_stat.h>
+#include <linux/rw_hint.h>
#include <crypto/hash.h>
#include <linux/fscrypt.h>
@@ -1239,7 +1240,7 @@ struct f2fs_bio_info {
#define FDEV(i) (sbi->devs[i])
#define RDEV(i) (raw_super->devs[i])
struct f2fs_dev_info {
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct block_device *bdev;
char path[MAX_PATH_LEN];
unsigned int total_segments;
@@ -3364,17 +3365,6 @@ static inline bool f2fs_cp_error(struct f2fs_sb_info *sbi)
return is_set_ckpt_flags(sbi, CP_ERROR_FLAG);
}
-static inline bool is_dot_dotdot(const u8 *name, size_t len)
-{
- if (len == 1 && name[0] == '.')
- return true;
-
- if (len == 2 && name[0] == '.' && name[1] == '.')
- return true;
-
- return false;
-}
-
static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi,
size_t size, gfp_t flags)
{
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index b3bb815..f7f63a5 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -531,7 +531,6 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
}
err = f2fs_prepare_lookup(dir, dentry, &fname);
- generic_set_encrypted_ci_d_ops(dentry);
if (err == -ENOENT)
goto out_splice;
if (err)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index d45ab09..b880b74 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1605,7 +1605,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
for (i = 0; i < sbi->s_ndevs; i++) {
if (i > 0)
- bdev_release(FDEV(i).bdev_handle);
+ fput(FDEV(i).bdev_file);
#ifdef CONFIG_BLK_DEV_ZONED
kvfree(FDEV(i).blkz_seq);
#endif
@@ -4247,7 +4247,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
for (i = 0; i < max_devices; i++) {
if (i == 0)
- FDEV(0).bdev_handle = sbi->sb->s_bdev_handle;
+ FDEV(0).bdev_file = sbi->sb->s_bdev_file;
else if (!RDEV(i).path[0])
break;
@@ -4267,14 +4267,14 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
FDEV(i).end_blk = FDEV(i).start_blk +
(FDEV(i).total_segments <<
sbi->log_blocks_per_seg) - 1;
- FDEV(i).bdev_handle = bdev_open_by_path(
+ FDEV(i).bdev_file = bdev_file_open_by_path(
FDEV(i).path, mode, sbi->sb, NULL);
}
}
- if (IS_ERR(FDEV(i).bdev_handle))
- return PTR_ERR(FDEV(i).bdev_handle);
+ if (IS_ERR(FDEV(i).bdev_file))
+ return PTR_ERR(FDEV(i).bdev_file);
- FDEV(i).bdev = FDEV(i).bdev_handle->bdev;
+ FDEV(i).bdev = file_bdev(FDEV(i).bdev_file);
/* to release errored devices */
sbi->s_ndevs = i + 1;
@@ -4496,7 +4496,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
sb->s_time_gran = 1;
sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
(test_opt(sbi, POSIX_ACL) ? SB_POSIXACL : 0);
- memcpy(&sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
+ super_set_uuid(sb, (void *) raw_super->uuid, sizeof(raw_super->uuid));
sb->s_iflags |= SB_I_CGROUPWB;
/* init f2fs-specific super block info */
@@ -4660,6 +4660,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
goto free_node_inode;
}
+ generic_set_sb_d_ops(sb);
sb->s_root = d_make_root(root); /* allocate root dentry */
if (!sb->s_root) {
err = -ENOMEM;
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 1fac3da..5c81369 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -1762,6 +1762,9 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat,
else /* fat 16 or 12 */
sbi->vol_id = bpb.fat16_vol_id;
+ __le32 vol_id_le = cpu_to_le32(sbi->vol_id);
+ super_set_uuid(sb, (void *) &vol_id_le, sizeof(vol_id_le));
+
sbi->dir_per_block = sb->s_blocksize / sizeof(struct msdos_dir_entry);
sbi->dir_per_block_bits = ffs(sbi->dir_per_block) - 1;
diff --git a/fs/fcntl.c b/fs/fcntl.c
index c80a6ac..54cc85d 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -27,6 +27,7 @@
#include <linux/memfd.h>
#include <linux/compat.h>
#include <linux/mount.h>
+#include <linux/rw_hint.h>
#include <linux/poll.h>
#include <asm/siginfo.h>
@@ -268,8 +269,15 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
}
#endif
-static bool rw_hint_valid(enum rw_hint hint)
+static bool rw_hint_valid(u64 hint)
{
+ BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
+ BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
+ BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
+ BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
+ BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
+ BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
+
switch (hint) {
case RWH_WRITE_LIFE_NOT_SET:
case RWH_WRITE_LIFE_NONE:
@@ -283,34 +291,40 @@ static bool rw_hint_valid(enum rw_hint hint)
}
}
-static long fcntl_rw_hint(struct file *file, unsigned int cmd,
- unsigned long arg)
+static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
+ unsigned long arg)
{
struct inode *inode = file_inode(file);
u64 __user *argp = (u64 __user *)arg;
- enum rw_hint hint;
- u64 h;
+ u64 hint = READ_ONCE(inode->i_write_hint);
- switch (cmd) {
- case F_GET_RW_HINT:
- h = inode->i_write_hint;
- if (copy_to_user(argp, &h, sizeof(*argp)))
- return -EFAULT;
- return 0;
- case F_SET_RW_HINT:
- if (copy_from_user(&h, argp, sizeof(h)))
- return -EFAULT;
- hint = (enum rw_hint) h;
- if (!rw_hint_valid(hint))
- return -EINVAL;
+ if (copy_to_user(argp, &hint, sizeof(*argp)))
+ return -EFAULT;
+ return 0;
+}
- inode_lock(inode);
- inode->i_write_hint = hint;
- inode_unlock(inode);
- return 0;
- default:
+static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct inode *inode = file_inode(file);
+ u64 __user *argp = (u64 __user *)arg;
+ u64 hint;
+
+ if (copy_from_user(&hint, argp, sizeof(hint)))
+ return -EFAULT;
+ if (!rw_hint_valid(hint))
return -EINVAL;
- }
+
+ WRITE_ONCE(inode->i_write_hint, hint);
+
+ /*
+ * file->f_mapping->host may differ from inode. As an example,
+ * blkdev_open() modifies file->f_mapping.
+ */
+ if (file->f_mapping->host != inode)
+ WRITE_ONCE(file->f_mapping->host->i_write_hint, hint);
+
+ return 0;
}
static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
@@ -416,8 +430,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
err = memfd_fcntl(filp, cmd, argi);
break;
case F_GET_RW_HINT:
+ err = fcntl_get_rw_hint(filp, cmd, arg);
+ break;
case F_SET_RW_HINT:
- err = fcntl_rw_hint(filp, cmd, arg);
+ err = fcntl_set_rw_hint(filp, cmd, arg);
break;
default:
break;
@@ -846,12 +862,6 @@ int send_sigurg(struct fown_struct *fown)
static DEFINE_SPINLOCK(fasync_lock);
static struct kmem_cache *fasync_cache __ro_after_init;
-static void fasync_free_rcu(struct rcu_head *head)
-{
- kmem_cache_free(fasync_cache,
- container_of(head, struct fasync_struct, fa_rcu));
-}
-
/*
* Remove a fasync entry. If successfully removed, return
* positive and clear the FASYNC flag. If no entry exists,
@@ -877,7 +887,7 @@ int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp)
write_unlock_irq(&fa->fa_lock);
*fp = fa->fa_next;
- call_rcu(&fa->fa_rcu, fasync_free_rcu);
+ kfree_rcu(fa, fa_rcu);
filp->f_flags &= ~FASYNC;
result = 1;
break;
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 18b3ba8..57a1261 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -36,7 +36,7 @@ static long do_sys_name_to_handle(const struct path *path,
if (f_handle.handle_bytes > MAX_HANDLE_SZ)
return -EINVAL;
- handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
+ handle = kzalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
GFP_KERNEL);
if (!handle)
return -ENOMEM;
diff --git a/fs/file_table.c b/fs/file_table.c
index b991f90..6925522 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -276,21 +276,15 @@ struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
}
/**
- * alloc_file - allocate and initialize a 'struct file'
+ * file_init_path - initialize a 'struct file' based on path
*
+ * @file: the file to set up
* @path: the (dentry, vfsmount) pair for the new file
- * @flags: O_... flags with which the new file will be opened
* @fop: the 'struct file_operations' for the new file
*/
-static struct file *alloc_file(const struct path *path, int flags,
- const struct file_operations *fop)
+static void file_init_path(struct file *file, const struct path *path,
+ const struct file_operations *fop)
{
- struct file *file;
-
- file = alloc_empty_file(flags, current_cred());
- if (IS_ERR(file))
- return file;
-
file->f_path = *path;
file->f_inode = path->dentry->d_inode;
file->f_mapping = path->dentry->d_inode->i_mapping;
@@ -309,22 +303,51 @@ static struct file *alloc_file(const struct path *path, int flags,
file->f_op = fop;
if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
i_readcount_inc(path->dentry->d_inode);
+}
+
+/**
+ * alloc_file - allocate and initialize a 'struct file'
+ *
+ * @path: the (dentry, vfsmount) pair for the new file
+ * @flags: O_... flags with which the new file will be opened
+ * @fop: the 'struct file_operations' for the new file
+ */
+static struct file *alloc_file(const struct path *path, int flags,
+ const struct file_operations *fop)
+{
+ struct file *file;
+
+ file = alloc_empty_file(flags, current_cred());
+ if (!IS_ERR(file))
+ file_init_path(file, path, fop);
return file;
}
-struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
- const char *name, int flags,
- const struct file_operations *fops)
+static inline int alloc_path_pseudo(const char *name, struct inode *inode,
+ struct vfsmount *mnt, struct path *path)
{
struct qstr this = QSTR_INIT(name, strlen(name));
+
+ path->dentry = d_alloc_pseudo(mnt->mnt_sb, &this);
+ if (!path->dentry)
+ return -ENOMEM;
+ path->mnt = mntget(mnt);
+ d_instantiate(path->dentry, inode);
+ return 0;
+}
+
+struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
+ const char *name, int flags,
+ const struct file_operations *fops)
+{
+ int ret;
struct path path;
struct file *file;
- path.dentry = d_alloc_pseudo(mnt->mnt_sb, &this);
- if (!path.dentry)
- return ERR_PTR(-ENOMEM);
- path.mnt = mntget(mnt);
- d_instantiate(path.dentry, inode);
+ ret = alloc_path_pseudo(name, inode, mnt, &path);
+ if (ret)
+ return ERR_PTR(ret);
+
file = alloc_file(&path, flags, fops);
if (IS_ERR(file)) {
ihold(inode);
@@ -334,6 +357,30 @@ struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
}
EXPORT_SYMBOL(alloc_file_pseudo);
+struct file *alloc_file_pseudo_noaccount(struct inode *inode,
+ struct vfsmount *mnt, const char *name,
+ int flags,
+ const struct file_operations *fops)
+{
+ int ret;
+ struct path path;
+ struct file *file;
+
+ ret = alloc_path_pseudo(name, inode, mnt, &path);
+ if (ret)
+ return ERR_PTR(ret);
+
+ file = alloc_empty_file_noaccount(flags, current_cred());
+ if (IS_ERR(file)) {
+ ihold(inode);
+ path_put(&path);
+ return file;
+ }
+ file_init_path(file, &path, fops);
+ return file;
+}
+EXPORT_SYMBOL_GPL(alloc_file_pseudo_noaccount);
+
struct file *alloc_file_clone(struct file *base, int flags,
const struct file_operations *fops)
{
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 3d84fcc..e4f17c5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -141,6 +141,31 @@ static void wb_wakeup(struct bdi_writeback *wb)
spin_unlock_irq(&wb->work_lock);
}
+/*
+ * This function is used when the first inode for this wb is marked dirty. It
+ * wakes-up the corresponding bdi thread which should then take care of the
+ * periodic background write-out of dirty inodes. Since the write-out would
+ * starts only 'dirty_writeback_interval' centisecs from now anyway, we just
+ * set up a timer which wakes the bdi thread up later.
+ *
+ * Note, we wouldn't bother setting up the timer, but this function is on the
+ * fast-path (used by '__mark_inode_dirty()'), so we save few context switches
+ * by delaying the wake-up.
+ *
+ * We have to be careful not to postpone flush work if it is scheduled for
+ * earlier. Thus we use queue_delayed_work().
+ */
+static void wb_wakeup_delayed(struct bdi_writeback *wb)
+{
+ unsigned long timeout;
+
+ timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
+ spin_lock_irq(&wb->work_lock);
+ if (test_bit(WB_registered, &wb->state))
+ queue_delayed_work(bdi_wq, &wb->dwork, timeout);
+ spin_unlock_irq(&wb->work_lock);
+}
+
static void finish_writeback_work(struct bdi_writeback *wb,
struct wb_writeback_work *work)
{
diff --git a/fs/fs_parser.c b/fs/fs_parser.c
index edb3712..a4d6ca0 100644
--- a/fs/fs_parser.c
+++ b/fs/fs_parser.c
@@ -83,8 +83,8 @@ static const struct fs_parameter_spec *fs_lookup_key(
}
/*
- * fs_parse - Parse a filesystem configuration parameter
- * @fc: The filesystem context to log errors through.
+ * __fs_parse - Parse a filesystem configuration parameter
+ * @log: The filesystem context to log errors through.
* @desc: The parameter description to use.
* @param: The parameter.
* @result: Where to place the result of the parse
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 91e89e6..b6cad10 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -474,8 +474,7 @@ static int cuse_send_init(struct cuse_conn *cc)
static void cuse_fc_release(struct fuse_conn *fc)
{
- struct cuse_conn *cc = fc_to_cc(fc);
- kfree_rcu(cc, fc.rcu);
+ kfree(fc_to_cc(fc));
}
/**
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 148a71b..c007b0f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2509,14 +2509,14 @@ static int convert_fuse_file_lock(struct fuse_conn *fc,
* translate it into the caller's pid namespace.
*/
rcu_read_lock();
- fl->fl_pid = pid_nr_ns(find_pid_ns(ffl->pid, fc->pid_ns), &init_pid_ns);
+ fl->c.flc_pid = pid_nr_ns(find_pid_ns(ffl->pid, fc->pid_ns), &init_pid_ns);
rcu_read_unlock();
break;
default:
return -EIO;
}
- fl->fl_type = ffl->type;
+ fl->c.flc_type = ffl->type;
return 0;
}
@@ -2530,10 +2530,10 @@ static void fuse_lk_fill(struct fuse_args *args, struct file *file,
memset(inarg, 0, sizeof(*inarg));
inarg->fh = ff->fh;
- inarg->owner = fuse_lock_owner_id(fc, fl->fl_owner);
+ inarg->owner = fuse_lock_owner_id(fc, fl->c.flc_owner);
inarg->lk.start = fl->fl_start;
inarg->lk.end = fl->fl_end;
- inarg->lk.type = fl->fl_type;
+ inarg->lk.type = fl->c.flc_type;
inarg->lk.pid = pid;
if (flock)
inarg->lk_flags |= FUSE_LK_FLOCK;
@@ -2570,8 +2570,8 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
struct fuse_mount *fm = get_fuse_mount(inode);
FUSE_ARGS(args);
struct fuse_lk_in inarg;
- int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
- struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
+ int opcode = (fl->c.flc_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
+ struct pid *pid = fl->c.flc_type != F_UNLCK ? task_tgid(current) : NULL;
pid_t pid_nr = pid_nr_ns(pid, fm->fc->pid_ns);
int err;
@@ -2581,7 +2581,7 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
}
/* Unlock on close is handled by the flush method */
- if ((fl->fl_flags & FL_CLOSE_POSIX) == FL_CLOSE_POSIX)
+ if ((fl->c.flc_flags & FL_CLOSE_POSIX) == FL_CLOSE_POSIX)
return 0;
fuse_lk_fill(&args, file, fl, opcode, pid_nr, flock, &inarg);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1df83ee..bcbe344 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -888,6 +888,7 @@ struct fuse_mount {
/* Entry on fc->mounts */
struct list_head fc_entry;
+ struct rcu_head rcu;
};
static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 2a6d44f..516ea29 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -930,6 +930,14 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
}
EXPORT_SYMBOL_GPL(fuse_conn_init);
+static void delayed_release(struct rcu_head *p)
+{
+ struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
+
+ put_user_ns(fc->user_ns);
+ fc->release(fc);
+}
+
void fuse_conn_put(struct fuse_conn *fc)
{
if (refcount_dec_and_test(&fc->count)) {
@@ -941,13 +949,12 @@ void fuse_conn_put(struct fuse_conn *fc)
if (fiq->ops->release)
fiq->ops->release(fiq);
put_pid_ns(fc->pid_ns);
- put_user_ns(fc->user_ns);
bucket = rcu_dereference_protected(fc->curr_bucket, 1);
if (bucket) {
WARN_ON(atomic_read(&bucket->count) != 1);
kfree(bucket);
}
- fc->release(fc);
+ call_rcu(&fc->rcu, delayed_release);
}
}
EXPORT_SYMBOL_GPL(fuse_conn_put);
@@ -1366,7 +1373,7 @@ EXPORT_SYMBOL_GPL(fuse_send_init);
void fuse_free_conn(struct fuse_conn *fc)
{
WARN_ON(!list_empty(&fc->devices));
- kfree_rcu(fc, rcu);
+ kfree(fc);
}
EXPORT_SYMBOL_GPL(fuse_free_conn);
@@ -1902,7 +1909,7 @@ static void fuse_sb_destroy(struct super_block *sb)
void fuse_mount_destroy(struct fuse_mount *fm)
{
fuse_conn_put(fm->fc);
- kfree(fm);
+ kfree_rcu(fm, rcu);
}
EXPORT_SYMBOL(fuse_mount_destroy);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index d9ccfd2..789af5c 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -2465,7 +2465,7 @@ int __gfs2_punch_hole(struct file *file, loff_t offset, loff_t length)
}
static int gfs2_map_blocks(struct iomap_writepage_ctx *wpc, struct inode *inode,
- loff_t offset)
+ loff_t offset, unsigned int len)
{
int ret;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 992ca4e..4c42ada 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1440,10 +1440,10 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
struct gfs2_sbd *sdp = GFS2_SB(file->f_mapping->host);
struct lm_lockstruct *ls = &sdp->sd_lockstruct;
- if (!(fl->fl_flags & FL_POSIX))
+ if (!(fl->c.flc_flags & FL_POSIX))
return -ENOLCK;
if (gfs2_withdrawing_or_withdrawn(sdp)) {
- if (fl->fl_type == F_UNLCK)
+ if (lock_is_unlock(fl))
locks_lock_file_wait(file, fl);
return -EIO;
}
@@ -1451,7 +1451,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
return dlm_posix_cancel(ls->ls_dlm, ip->i_no_addr, file, fl);
else if (IS_GETLK(cmd))
return dlm_posix_get(ls->ls_dlm, ip->i_no_addr, file, fl);
- else if (fl->fl_type == F_UNLCK)
+ else if (lock_is_unlock(fl))
return dlm_posix_unlock(ls->ls_dlm, ip->i_no_addr, file, fl);
else
return dlm_posix_lock(ls->ls_dlm, ip->i_no_addr, file, cmd, fl);
@@ -1483,7 +1483,7 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
int error = 0;
int sleeptime;
- state = (fl->fl_type == F_WRLCK) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
+ state = lock_is_write(fl) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
flags = GL_EXACT | GL_NOPID;
if (!IS_SETLKW(cmd))
flags |= LM_FLAG_TRY_1CB;
@@ -1495,8 +1495,8 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
if (fl_gh->gh_state == state)
goto out;
locks_init_lock(&request);
- request.fl_type = F_UNLCK;
- request.fl_flags = FL_FLOCK;
+ request.c.flc_type = F_UNLCK;
+ request.c.flc_flags = FL_FLOCK;
locks_lock_file_wait(file, &request);
gfs2_glock_dq(fl_gh);
gfs2_holder_reinit(state, flags, fl_gh);
@@ -1557,10 +1557,10 @@ static void do_unflock(struct file *file, struct file_lock *fl)
static int gfs2_flock(struct file *file, int cmd, struct file_lock *fl)
{
- if (!(fl->fl_flags & FL_FLOCK))
+ if (!(fl->c.flc_flags & FL_FLOCK))
return -ENOLCK;
- if (fl->fl_type == F_UNLCK) {
+ if (lock_is_unlock(fl)) {
do_unflock(file, fl);
return 0;
} else {
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 1281e60..572d58e 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -214,7 +214,7 @@ static void gfs2_sb_in(struct gfs2_sbd *sdp, const void *buf)
memcpy(sb->sb_lockproto, str->sb_lockproto, GFS2_LOCKNAME_LEN);
memcpy(sb->sb_locktable, str->sb_locktable, GFS2_LOCKNAME_LEN);
- memcpy(&s->s_uuid, str->sb_uuid, 16);
+ super_set_uuid(s, str->sb_uuid, 16);
}
/**
diff --git a/fs/hfsplus/hfsplus_fs.h b/fs/hfsplus/hfsplus_fs.h
index 7ededcb..012a3d0 100644
--- a/fs/hfsplus/hfsplus_fs.h
+++ b/fs/hfsplus/hfsplus_fs.h
@@ -190,6 +190,7 @@ struct hfsplus_sb_info {
int work_queued; /* non-zero delayed work is queued */
struct delayed_work sync_work; /* FS sync delayed work */
spinlock_t work_lock; /* protects sync_work and work_queued */
+ struct rcu_head rcu;
};
#define HFSPLUS_SB_WRITEBACKUP 0
diff --git a/fs/hfsplus/super.c b/fs/hfsplus/super.c
index 1986b4f..9792020 100644
--- a/fs/hfsplus/super.c
+++ b/fs/hfsplus/super.c
@@ -277,6 +277,14 @@ void hfsplus_mark_mdb_dirty(struct super_block *sb)
spin_unlock(&sbi->work_lock);
}
+static void delayed_free(struct rcu_head *p)
+{
+ struct hfsplus_sb_info *sbi = container_of(p, struct hfsplus_sb_info, rcu);
+
+ unload_nls(sbi->nls);
+ kfree(sbi);
+}
+
static void hfsplus_put_super(struct super_block *sb)
{
struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
@@ -302,9 +310,7 @@ static void hfsplus_put_super(struct super_block *sb)
hfs_btree_close(sbi->ext_tree);
kfree(sbi->s_vhdr_buf);
kfree(sbi->s_backup_vhdr_buf);
- unload_nls(sbi->nls);
- kfree(sb->s_fs_info);
- sb->s_fs_info = NULL;
+ call_rcu(&sbi->rcu, delayed_free);
}
static int hfsplus_statfs(struct dentry *dentry, struct kstatfs *buf)
diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c
index b0cb704..ce93460 100644
--- a/fs/hfsplus/wrapper.c
+++ b/fs/hfsplus/wrapper.c
@@ -30,7 +30,7 @@ struct hfsplus_wd {
* @sector: block to read or write, for blocks of HFSPLUS_SECTOR_SIZE bytes
* @buf: buffer for I/O
* @data: output pointer for location of requested data
- * @opf: request op flags
+ * @opf: I/O operation type and flags
*
* The unit of I/O is hfsplus_min_io_size(sb), which may be bigger than
* HFSPLUS_SECTOR_SIZE, and @buf must be sized accordingly. On reads
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 671664f..6502c7e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -100,6 +100,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
loff_t len, vma_len;
int ret;
struct hstate *h = hstate_file(file);
+ vm_flags_t vm_flags;
/*
* vma address alignment (but not the pgoff alignment) has
@@ -141,10 +142,20 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
file_accessed(file);
ret = -ENOMEM;
+
+ vm_flags = vma->vm_flags;
+ /*
+ * for SHM_HUGETLB, the pages are reserved in the shmget() call so skip
+ * reserving here. Note: only for SHM hugetlbfs file, the inode
+ * flag S_PRIVATE is set.
+ */
+ if (inode->i_flags & S_PRIVATE)
+ vm_flags |= VM_NORESERVE;
+
if (!hugetlb_reserve_pages(inode,
vma->vm_pgoff >> huge_page_order(h),
len >> huge_page_shift(h), vma,
- vma->vm_flags))
+ vm_flags))
goto out;
ret = 0;
@@ -922,7 +933,7 @@ static int hugetlbfs_setattr(struct mnt_idmap *idmap,
unsigned int ia_valid = attr->ia_valid;
struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
- error = setattr_prepare(&nop_mnt_idmap, dentry, attr);
+ error = setattr_prepare(idmap, dentry, attr);
if (error)
return error;
@@ -939,7 +950,7 @@ static int hugetlbfs_setattr(struct mnt_idmap *idmap,
hugetlb_vmtruncate(inode, newsize);
}
- setattr_copy(&nop_mnt_idmap, inode, attr);
+ setattr_copy(idmap, inode, attr);
mark_inode_dirty(inode);
return 0;
}
@@ -974,6 +985,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;
static struct inode *hugetlbfs_get_inode(struct super_block *sb,
+ struct mnt_idmap *idmap,
struct inode *dir,
umode_t mode, dev_t dev)
{
@@ -995,7 +1007,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
inode->i_ino = get_next_ino();
- inode_init_owner(&nop_mnt_idmap, inode, dir, mode);
+ inode_init_owner(idmap, inode, dir, mode);
lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
&hugetlbfs_i_mmap_rwsem_key);
inode->i_mapping->a_ops = &hugetlbfs_aops;
@@ -1039,7 +1051,7 @@ static int hugetlbfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
{
struct inode *inode;
- inode = hugetlbfs_get_inode(dir->i_sb, dir, mode, dev);
+ inode = hugetlbfs_get_inode(dir->i_sb, idmap, dir, mode, dev);
if (!inode)
return -ENOSPC;
inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
@@ -1051,7 +1063,7 @@ static int hugetlbfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
static int hugetlbfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, umode_t mode)
{
- int retval = hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry,
+ int retval = hugetlbfs_mknod(idmap, dir, dentry,
mode | S_IFDIR, 0);
if (!retval)
inc_nlink(dir);
@@ -1062,7 +1074,7 @@ static int hugetlbfs_create(struct mnt_idmap *idmap,
struct inode *dir, struct dentry *dentry,
umode_t mode, bool excl)
{
- return hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry, mode | S_IFREG, 0);
+ return hugetlbfs_mknod(idmap, dir, dentry, mode | S_IFREG, 0);
}
static int hugetlbfs_tmpfile(struct mnt_idmap *idmap,
@@ -1071,7 +1083,7 @@ static int hugetlbfs_tmpfile(struct mnt_idmap *idmap,
{
struct inode *inode;
- inode = hugetlbfs_get_inode(dir->i_sb, dir, mode | S_IFREG, 0);
+ inode = hugetlbfs_get_inode(dir->i_sb, idmap, dir, mode | S_IFREG, 0);
if (!inode)
return -ENOSPC;
inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
@@ -1083,10 +1095,11 @@ static int hugetlbfs_symlink(struct mnt_idmap *idmap,
struct inode *dir, struct dentry *dentry,
const char *symname)
{
+ const umode_t mode = S_IFLNK|S_IRWXUGO;
struct inode *inode;
int error = -ENOSPC;
- inode = hugetlbfs_get_inode(dir->i_sb, dir, S_IFLNK|S_IRWXUGO, 0);
+ inode = hugetlbfs_get_inode(dir->i_sb, idmap, dir, mode, 0);
if (inode) {
int l = strlen(symname)+1;
error = page_symlink(inode, symname, l);
@@ -1354,6 +1367,7 @@ static int hugetlbfs_parse_param(struct fs_context *fc, struct fs_parameter *par
{
struct hugetlbfs_fs_context *ctx = fc->fs_private;
struct fs_parse_result result;
+ struct hstate *h;
char *rest;
unsigned long ps;
int opt;
@@ -1398,11 +1412,12 @@ static int hugetlbfs_parse_param(struct fs_context *fc, struct fs_parameter *par
case Opt_pagesize:
ps = memparse(param->string, &rest);
- ctx->hstate = size_to_hstate(ps);
- if (!ctx->hstate) {
+ h = size_to_hstate(ps);
+ if (!h) {
pr_err("Unsupported page size %lu MB\n", ps / SZ_1M);
return -EINVAL;
}
+ ctx->hstate = h;
return 0;
case Opt_min_size:
@@ -1553,6 +1568,7 @@ static struct file_system_type hugetlbfs_fs_type = {
.init_fs_context = hugetlbfs_init_fs_context,
.parameters = hugetlb_fs_parameters,
.kill_sb = kill_litter_super,
+ .fs_flags = FS_ALLOW_IDMAP,
};
static struct vfsmount *hugetlbfs_vfsmount[HUGE_MAX_HSTATE];
@@ -1606,7 +1622,9 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
}
file = ERR_PTR(-ENOSPC);
- inode = hugetlbfs_get_inode(mnt->mnt_sb, NULL, S_IFREG | S_IRWXUGO, 0);
+ /* hugetlbfs_vfsmount[] mounts do not use idmapped mounts. */
+ inode = hugetlbfs_get_inode(mnt->mnt_sb, &nop_mnt_idmap, NULL,
+ S_IFREG | S_IRWXUGO, 0);
if (!inode)
goto out;
if (creat_flags == HUGETLB_SHMFS_INODE)
diff --git a/fs/inode.c b/fs/inode.c
index 91048c4..d290f00 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -20,6 +20,7 @@
#include <linux/ratelimit.h>
#include <linux/list_lru.h>
#include <linux/iversion.h>
+#include <linux/rw_hint.h>
#include <trace/events/writeback.h>
#include "internal.h"
@@ -588,7 +589,8 @@ void dump_mapping(const struct address_space *mapping)
}
dentry_ptr = container_of(dentry_first, struct dentry, d_u.d_alias);
- if (get_kernel_nofault(dentry, dentry_ptr)) {
+ if (get_kernel_nofault(dentry, dentry_ptr) ||
+ !dentry.d_parent || !dentry.d_name.name) {
pr_warn("aops:%ps ino:%lx invalid dentry:%px\n",
a_ops, ino, dentry_ptr);
return;
@@ -2285,7 +2287,7 @@ void __init inode_init(void)
sizeof(struct inode),
0,
(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
/* Hash may have been set up in inode_init_early */
@@ -2509,7 +2511,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode)
{
struct timespec64 now = current_time(inode);
- inode_set_ctime(inode, now.tv_sec, now.tv_nsec);
+ inode_set_ctime_to_ts(inode, now);
return now;
}
EXPORT_SYMBOL(inode_set_ctime_current);
diff --git a/fs/internal.h b/fs/internal.h
index b674064..49c1fcf 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -183,6 +183,7 @@ extern struct open_how build_open_how(int flags, umode_t mode);
extern int build_open_flags(const struct open_how *how, struct open_flags *op);
struct file *file_close_fd_locked(struct files_struct *files, unsigned fd);
+long do_ftruncate(struct file *file, loff_t length, int small);
long do_sys_ftruncate(unsigned int fd, loff_t length, int small);
int chmod_common(const struct path *path, umode_t mode);
int do_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group,
@@ -310,3 +311,10 @@ ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from, loff_t *po
struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns);
struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap);
void mnt_idmap_put(struct mnt_idmap *idmap);
+struct stashed_operations {
+ void (*put_data)(void *data);
+ void (*init_inode)(struct inode *inode, void *data);
+};
+int path_from_stashed(struct dentry **stashed, unsigned long ino,
+ struct vfsmount *mnt, void *data, struct path *path);
+void stashed_dentry_prune(struct dentry *dentry);
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 76cf22a..1d5abfd 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -763,6 +763,33 @@ static int ioctl_fssetxattr(struct file *file, void __user *argp)
return err;
}
+static int ioctl_getfsuuid(struct file *file, void __user *argp)
+{
+ struct super_block *sb = file_inode(file)->i_sb;
+ struct fsuuid2 u = { .len = sb->s_uuid_len, };
+
+ if (!sb->s_uuid_len)
+ return -ENOIOCTLCMD;
+
+ memcpy(&u.uuid[0], &sb->s_uuid, sb->s_uuid_len);
+
+ return copy_to_user(argp, &u, sizeof(u)) ? -EFAULT : 0;
+}
+
+static int ioctl_get_fs_sysfs_path(struct file *file, void __user *argp)
+{
+ struct super_block *sb = file_inode(file)->i_sb;
+
+ if (!strlen(sb->s_sysfs_name))
+ return -ENOIOCTLCMD;
+
+ struct fs_sysfs_path u = {};
+
+ u.len = scnprintf(u.name, sizeof(u.name), "%s/%s", sb->s_type->name, sb->s_sysfs_name);
+
+ return copy_to_user(argp, &u, sizeof(u)) ? -EFAULT : 0;
+}
+
/*
* do_vfs_ioctl() is not for drivers and not intended to be EXPORT_SYMBOL()'d.
* It's just a simple helper for sys_ioctl and compat_sys_ioctl.
@@ -845,6 +872,12 @@ static int do_vfs_ioctl(struct file *filp, unsigned int fd,
case FS_IOC_FSSETXATTR:
return ioctl_fssetxattr(filp, argp);
+ case FS_IOC_GETFSUUID:
+ return ioctl_getfsuuid(filp, argp);
+
+ case FS_IOC_GETFSSYSFSPATH:
+ return ioctl_get_fs_sysfs_path(filp, argp);
+
default:
if (S_ISREG(inode->i_mode))
return file_ioctl(filp, cmd, argp);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 093c451..4e8e41c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2010 Red Hat, Inc.
- * Copyright (C) 2016-2019 Christoph Hellwig.
+ * Copyright (C) 2016-2023 Christoph Hellwig.
*/
#include <linux/module.h>
#include <linux/compiler.h>
@@ -95,6 +95,44 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
return test_bit(block + blks_per_folio, ifs->state);
}
+static unsigned ifs_find_dirty_range(struct folio *folio,
+ struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
+{
+ struct inode *inode = folio->mapping->host;
+ unsigned start_blk =
+ offset_in_folio(folio, *range_start) >> inode->i_blkbits;
+ unsigned end_blk = min_not_zero(
+ offset_in_folio(folio, range_end) >> inode->i_blkbits,
+ i_blocks_per_folio(inode, folio));
+ unsigned nblks = 1;
+
+ while (!ifs_block_is_dirty(folio, ifs, start_blk))
+ if (++start_blk == end_blk)
+ return 0;
+
+ while (start_blk + nblks < end_blk) {
+ if (!ifs_block_is_dirty(folio, ifs, start_blk + nblks))
+ break;
+ nblks++;
+ }
+
+ *range_start = folio_pos(folio) + (start_blk << inode->i_blkbits);
+ return nblks << inode->i_blkbits;
+}
+
+static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
+ u64 range_end)
+{
+ struct iomap_folio_state *ifs = folio->private;
+
+ if (*range_start >= range_end)
+ return 0;
+
+ if (ifs)
+ return ifs_find_dirty_range(folio, ifs, range_start, range_end);
+ return range_end - *range_start;
+}
+
static void ifs_clear_range_dirty(struct folio *folio,
struct iomap_folio_state *ifs, size_t off, size_t len)
{
@@ -1454,15 +1492,10 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
- size_t len, int error)
+ size_t len)
{
struct iomap_folio_state *ifs = folio->private;
- if (error) {
- folio_set_error(folio);
- mapping_set_error(inode->i_mapping, error);
- }
-
WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
@@ -1479,40 +1512,29 @@ static u32
iomap_finish_ioend(struct iomap_ioend *ioend, int error)
{
struct inode *inode = ioend->io_inode;
- struct bio *bio = &ioend->io_inline_bio;
- struct bio *last = ioend->io_bio, *next;
- u64 start = bio->bi_iter.bi_sector;
- loff_t offset = ioend->io_offset;
- bool quiet = bio_flagged(bio, BIO_QUIET);
+ struct bio *bio = &ioend->io_bio;
+ struct folio_iter fi;
u32 folio_count = 0;
- for (bio = &ioend->io_inline_bio; bio; bio = next) {
- struct folio_iter fi;
-
- /*
- * For the last bio, bi_private points to the ioend, so we
- * need to explicitly end the iteration here.
- */
- if (bio == last)
- next = NULL;
- else
- next = bio->bi_private;
-
- /* walk all folios in bio, ending page IO on them */
- bio_for_each_folio_all(fi, bio) {
- iomap_finish_folio_write(inode, fi.folio, fi.length,
- error);
- folio_count++;
- }
- bio_put(bio);
- }
- /* The ioend has been freed by bio_put() */
-
- if (unlikely(error && !quiet)) {
- printk_ratelimited(KERN_ERR
+ if (error) {
+ mapping_set_error(inode->i_mapping, error);
+ if (!bio_flagged(bio, BIO_QUIET)) {
+ pr_err_ratelimited(
"%s: writeback error on inode %lu, offset %lld, sector %llu",
- inode->i_sb->s_id, inode->i_ino, offset, start);
+ inode->i_sb->s_id, inode->i_ino,
+ ioend->io_offset, ioend->io_sector);
+ }
}
+
+ /* walk all folios in bio, ending page IO on them */
+ bio_for_each_folio_all(fi, bio) {
+ if (error)
+ folio_set_error(fi.folio);
+ iomap_finish_folio_write(inode, fi.folio, fi.length);
+ folio_count++;
+ }
+
+ bio_put(bio); /* frees the ioend */
return folio_count;
}
@@ -1553,7 +1575,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_ioends);
static bool
iomap_ioend_can_merge(struct iomap_ioend *ioend, struct iomap_ioend *next)
{
- if (ioend->io_bio->bi_status != next->io_bio->bi_status)
+ if (ioend->io_bio.bi_status != next->io_bio.bi_status)
return false;
if ((ioend->io_flags & IOMAP_F_SHARED) ^
(next->io_flags & IOMAP_F_SHARED))
@@ -1618,47 +1640,46 @@ EXPORT_SYMBOL_GPL(iomap_sort_ioends);
static void iomap_writepage_end_bio(struct bio *bio)
{
- struct iomap_ioend *ioend = bio->bi_private;
-
- iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status));
+ iomap_finish_ioend(iomap_ioend_from_bio(bio),
+ blk_status_to_errno(bio->bi_status));
}
/*
* Submit the final bio for an ioend.
*
* If @error is non-zero, it means that we have a situation where some part of
- * the submission process has failed after we've marked pages for writeback
- * and unlocked them. In this situation, we need to fail the bio instead of
- * submitting it. This typically only happens on a filesystem shutdown.
+ * the submission process has failed after we've marked pages for writeback.
+ * We cannot cancel ioend directly in that case, so call the bio end I/O handler
+ * with the error status here to run the normal I/O completion handler to clear
+ * the writeback bit and let the file system proess the errors.
*/
-static int
-iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
- int error)
+static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
{
- ioend->io_bio->bi_private = ioend;
- ioend->io_bio->bi_end_io = iomap_writepage_end_bio;
-
- if (wpc->ops->prepare_ioend)
- error = wpc->ops->prepare_ioend(ioend, error);
- if (error) {
- /*
- * If we're failing the IO now, just mark the ioend with an
- * error and finish it. This will run IO completion immediately
- * as there is only one reference to the ioend at this point in
- * time.
- */
- ioend->io_bio->bi_status = errno_to_blk_status(error);
- bio_endio(ioend->io_bio);
+ if (!wpc->ioend)
return error;
+
+ /*
+ * Let the file systems prepare the I/O submission and hook in an I/O
+ * comletion handler. This also needs to happen in case after a
+ * failure happened so that the file system end I/O handler gets called
+ * to clean up.
+ */
+ if (wpc->ops->prepare_ioend)
+ error = wpc->ops->prepare_ioend(wpc->ioend, error);
+
+ if (error) {
+ wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
+ bio_endio(&wpc->ioend->io_bio);
+ } else {
+ submit_bio(&wpc->ioend->io_bio);
}
- submit_bio(ioend->io_bio);
- return 0;
+ wpc->ioend = NULL;
+ return error;
}
-static struct iomap_ioend *
-iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
- loff_t offset, sector_t sector, struct writeback_control *wbc)
+static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
+ struct writeback_control *wbc, struct inode *inode, loff_t pos)
{
struct iomap_ioend *ioend;
struct bio *bio;
@@ -1666,63 +1687,42 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
REQ_OP_WRITE | wbc_to_write_flags(wbc),
GFP_NOFS, &iomap_ioend_bioset);
- bio->bi_iter.bi_sector = sector;
+ bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
+ bio->bi_end_io = iomap_writepage_end_bio;
wbc_init_bio(wbc, bio);
+ bio->bi_write_hint = inode->i_write_hint;
- ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
+ ioend = iomap_ioend_from_bio(bio);
INIT_LIST_HEAD(&ioend->io_list);
ioend->io_type = wpc->iomap.type;
ioend->io_flags = wpc->iomap.flags;
ioend->io_inode = inode;
ioend->io_size = 0;
- ioend->io_folios = 0;
- ioend->io_offset = offset;
- ioend->io_bio = bio;
- ioend->io_sector = sector;
+ ioend->io_offset = pos;
+ ioend->io_sector = bio->bi_iter.bi_sector;
+
+ wpc->nr_folios = 0;
return ioend;
}
-/*
- * Allocate a new bio, and chain the old bio to the new one.
- *
- * Note that we have to perform the chaining in this unintuitive order
- * so that the bi_private linkage is set up in the right direction for the
- * traversal in iomap_finish_ioend().
- */
-static struct bio *
-iomap_chain_bio(struct bio *prev)
-{
- struct bio *new;
-
- new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS);
- bio_clone_blkg_association(new, prev);
- new->bi_iter.bi_sector = bio_end_sector(prev);
-
- bio_chain(prev, new);
- bio_get(prev); /* for iomap_finish_ioend */
- submit_bio(prev);
- return new;
-}
-
-static bool
-iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
- sector_t sector)
+static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos)
{
if ((wpc->iomap.flags & IOMAP_F_SHARED) !=
(wpc->ioend->io_flags & IOMAP_F_SHARED))
return false;
if (wpc->iomap.type != wpc->ioend->io_type)
return false;
- if (offset != wpc->ioend->io_offset + wpc->ioend->io_size)
+ if (pos != wpc->ioend->io_offset + wpc->ioend->io_size)
return false;
- if (sector != bio_end_sector(wpc->ioend->io_bio))
+ if (iomap_sector(&wpc->iomap, pos) !=
+ bio_end_sector(&wpc->ioend->io_bio))
return false;
/*
* Limit ioend bio chain lengths to minimise IO completion latency. This
* also prevents long tight loops ending page writeback on all the
* folios in the ioend.
*/
- if (wpc->ioend->io_folios >= IOEND_BATCH_SIZE)
+ if (wpc->nr_folios >= IOEND_BATCH_SIZE)
return false;
return true;
}
@@ -1730,255 +1730,238 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
/*
* Test to see if we have an existing ioend structure that we could append to
* first; otherwise finish off the current ioend and start another.
+ *
+ * If a new ioend is created and cached, the old ioend is submitted to the block
+ * layer instantly. Batching optimisations are provided by higher level block
+ * plugging.
+ *
+ * At the end of a writeback pass, there will be a cached ioend remaining on the
+ * writepage context that the caller will need to submit.
*/
-static void
-iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
- struct iomap_folio_state *ifs, struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct list_head *iolist)
+static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
+ struct writeback_control *wbc, struct folio *folio,
+ struct inode *inode, loff_t pos, unsigned len)
{
- sector_t sector = iomap_sector(&wpc->iomap, pos);
- unsigned len = i_blocksize(inode);
+ struct iomap_folio_state *ifs = folio->private;
size_t poff = offset_in_folio(folio, pos);
+ int error;
- if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos, sector)) {
- if (wpc->ioend)
- list_add(&wpc->ioend->io_list, iolist);
- wpc->ioend = iomap_alloc_ioend(inode, wpc, pos, sector, wbc);
+ if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
+new_ioend:
+ error = iomap_submit_ioend(wpc, 0);
+ if (error)
+ return error;
+ wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos);
}
- if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) {
- wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio);
- bio_add_folio_nofail(wpc->ioend->io_bio, folio, len, poff);
- }
+ if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
+ goto new_ioend;
if (ifs)
atomic_add(len, &ifs->write_bytes_pending);
wpc->ioend->io_size += len;
wbc_account_cgroup_owner(wbc, &folio->page, len);
+ return 0;
}
-/*
- * We implement an immediate ioend submission policy here to avoid needing to
- * chain multiple ioends and hence nest mempool allocations which can violate
- * the forward progress guarantees we need to provide. The current ioend we're
- * adding blocks to is cached in the writepage context, and if the new block
- * doesn't append to the cached ioend, it will create a new ioend and cache that
- * instead.
- *
- * If a new ioend is created and cached, the old ioend is returned and queued
- * locally for submission once the entire page is processed or an error has been
- * detected. While ioends are submitted immediately after they are completed,
- * batching optimisations are provided by higher level block plugging.
- *
- * At the end of a writeback pass, there will be a cached ioend remaining on the
- * writepage context that the caller will need to submit.
- */
-static int
-iomap_writepage_map(struct iomap_writepage_ctx *wpc,
- struct writeback_control *wbc, struct inode *inode,
- struct folio *folio, u64 end_pos)
+static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
+ struct writeback_control *wbc, struct folio *folio,
+ struct inode *inode, u64 pos, unsigned dirty_len,
+ unsigned *count)
{
- struct iomap_folio_state *ifs = folio->private;
- struct iomap_ioend *ioend, *next;
- unsigned len = i_blocksize(inode);
- unsigned nblocks = i_blocks_per_folio(inode, folio);
- u64 pos = folio_pos(folio);
- int error = 0, count = 0, i;
- LIST_HEAD(submit_list);
+ int error;
- WARN_ON_ONCE(end_pos <= pos);
+ do {
+ unsigned map_len;
- if (!ifs && nblocks > 1) {
- ifs = ifs_alloc(inode, folio, 0);
- iomap_set_range_dirty(folio, 0, end_pos - pos);
- }
-
- WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) != 0);
-
- /*
- * Walk through the folio to find areas to write back. If we
- * run off the end of the current map or find the current map
- * invalid, grab a new one.
- */
- for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
- if (ifs && !ifs_block_is_dirty(folio, ifs, i))
- continue;
-
- error = wpc->ops->map_blocks(wpc, inode, pos);
+ error = wpc->ops->map_blocks(wpc, inode, pos, dirty_len);
if (error)
break;
- trace_iomap_writepage_map(inode, &wpc->iomap);
- if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
- continue;
- if (wpc->iomap.type == IOMAP_HOLE)
- continue;
- iomap_add_to_ioend(inode, pos, folio, ifs, wpc, wbc,
- &submit_list);
- count++;
- }
- if (count)
- wpc->ioend->io_folios++;
+ trace_iomap_writepage_map(inode, pos, dirty_len, &wpc->iomap);
- WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
- WARN_ON_ONCE(!folio_test_locked(folio));
- WARN_ON_ONCE(folio_test_writeback(folio));
- WARN_ON_ONCE(folio_test_dirty(folio));
+ map_len = min_t(u64, dirty_len,
+ wpc->iomap.offset + wpc->iomap.length - pos);
+ WARN_ON_ONCE(!folio->private && map_len < dirty_len);
+
+ switch (wpc->iomap.type) {
+ case IOMAP_INLINE:
+ WARN_ON_ONCE(1);
+ error = -EIO;
+ break;
+ case IOMAP_HOLE:
+ break;
+ default:
+ error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos,
+ map_len);
+ if (!error)
+ (*count)++;
+ break;
+ }
+ dirty_len -= map_len;
+ pos += map_len;
+ } while (dirty_len && !error);
/*
* We cannot cancel the ioend directly here on error. We may have
* already set other pages under writeback and hence we have to run I/O
* completion to mark the error state of the pages under writeback
* appropriately.
+ *
+ * Just let the file system know what portion of the folio failed to
+ * map.
*/
- if (unlikely(error)) {
+ if (error && wpc->ops->discard_folio)
+ wpc->ops->discard_folio(folio, pos);
+ return error;
+}
+
+/*
+ * Check interaction of the folio with the file end.
+ *
+ * If the folio is entirely beyond i_size, return false. If it straddles
+ * i_size, adjust end_pos and zero all data beyond i_size.
+ */
+static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
+ u64 *end_pos)
+{
+ u64 isize = i_size_read(inode);
+
+ if (*end_pos > isize) {
+ size_t poff = offset_in_folio(folio, isize);
+ pgoff_t end_index = isize >> PAGE_SHIFT;
+
/*
- * Let the filesystem know what portion of the current page
- * failed to map. If the page hasn't been added to ioend, it
- * won't be affected by I/O completion and we must unlock it
- * now.
+ * If the folio is entirely ouside of i_size, skip it.
+ *
+ * This can happen due to a truncate operation that is in
+ * progress and in that case truncate will finish it off once
+ * we've dropped the folio lock.
+ *
+ * Note that the pgoff_t used for end_index is an unsigned long.
+ * If the given offset is greater than 16TB on a 32-bit system,
+ * then if we checked if the folio is fully outside i_size with
+ * "if (folio->index >= end_index + 1)", "end_index + 1" would
+ * overflow and evaluate to 0. Hence this folio would be
+ * redirtied and written out repeatedly, which would result in
+ * an infinite loop; the user program performing this operation
+ * would hang. Instead, we can detect this situation by
+ * checking if the folio is totally beyond i_size or if its
+ * offset is just equal to the EOF.
*/
- if (wpc->ops->discard_folio)
- wpc->ops->discard_folio(folio, pos);
- if (!count) {
- folio_unlock(folio);
- goto done;
- }
+ if (folio->index > end_index ||
+ (folio->index == end_index && poff == 0))
+ return false;
+
+ /*
+ * The folio straddles i_size.
+ *
+ * It must be zeroed out on each and every writepage invocation
+ * because it may be mmapped:
+ *
+ * A file is mapped in multiples of the page size. For a
+ * file that is not a multiple of the page size, the
+ * remaining memory is zeroed when mapped, and writes to that
+ * region are not written out to the file.
+ *
+ * Also adjust the writeback range to skip all blocks entirely
+ * beyond i_size.
+ */
+ folio_zero_segment(folio, poff, folio_size(folio));
+ *end_pos = round_up(isize, i_blocksize(inode));
}
+ return true;
+}
+
+static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
+ struct writeback_control *wbc, struct folio *folio)
+{
+ struct iomap_folio_state *ifs = folio->private;
+ struct inode *inode = folio->mapping->host;
+ u64 pos = folio_pos(folio);
+ u64 end_pos = pos + folio_size(folio);
+ unsigned count = 0;
+ int error = 0;
+ u32 rlen;
+
+ WARN_ON_ONCE(!folio_test_locked(folio));
+ WARN_ON_ONCE(folio_test_dirty(folio));
+ WARN_ON_ONCE(folio_test_writeback(folio));
+
+ trace_iomap_writepage(inode, pos, folio_size(folio));
+
+ if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
+ folio_unlock(folio);
+ return 0;
+ }
+ WARN_ON_ONCE(end_pos <= pos);
+
+ if (i_blocks_per_folio(inode, folio) > 1) {
+ if (!ifs) {
+ ifs = ifs_alloc(inode, folio, 0);
+ iomap_set_range_dirty(folio, 0, end_pos - pos);
+ }
+
+ /*
+ * Keep the I/O completion handler from clearing the writeback
+ * bit until we have submitted all blocks by adding a bias to
+ * ifs->write_bytes_pending, which is dropped after submitting
+ * all blocks.
+ */
+ WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
+ atomic_inc(&ifs->write_bytes_pending);
+ }
+
+ /*
+ * Set the writeback bit ASAP, as the I/O completion for the single
+ * block per folio case happen hit as soon as we're submitting the bio.
+ */
+ folio_start_writeback(folio);
+
+ /*
+ * Walk through the folio to find dirty areas to write back.
+ */
+ while ((rlen = iomap_find_dirty_range(folio, &pos, end_pos))) {
+ error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
+ pos, rlen, &count);
+ if (error)
+ break;
+ pos += rlen;
+ }
+
+ if (count)
+ wpc->nr_folios++;
+
/*
* We can have dirty bits set past end of file in page_mkwrite path
* while mapping the last partial folio. Hence it's better to clear
* all the dirty bits in the folio here.
*/
iomap_clear_range_dirty(folio, 0, folio_size(folio));
- folio_start_writeback(folio);
+
+ /*
+ * Usually the writeback bit is cleared by the I/O completion handler.
+ * But we may end up either not actually writing any blocks, or (when
+ * there are multiple blocks in a folio) all I/O might have finished
+ * already at this point. In that case we need to clear the writeback
+ * bit ourselves right after unlocking the page.
+ */
folio_unlock(folio);
-
- /*
- * Preserve the original error if there was one; catch
- * submission errors here and propagate into subsequent ioend
- * submissions.
- */
- list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
- int error2;
-
- list_del_init(&ioend->io_list);
- error2 = iomap_submit_ioend(wpc, ioend, error);
- if (error2 && !error)
- error = error2;
+ if (ifs) {
+ if (atomic_dec_and_test(&ifs->write_bytes_pending))
+ folio_end_writeback(folio);
+ } else {
+ if (!count)
+ folio_end_writeback(folio);
}
-
- /*
- * We can end up here with no error and nothing to write only if we race
- * with a partial page truncate on a sub-page block sized filesystem.
- */
- if (!count)
- folio_end_writeback(folio);
-done:
mapping_set_error(inode->i_mapping, error);
return error;
}
-/*
- * Write out a dirty page.
- *
- * For delalloc space on the page, we need to allocate space and flush it.
- * For unwritten space on the page, we need to start the conversion to
- * regular allocated space.
- */
static int iomap_do_writepage(struct folio *folio,
struct writeback_control *wbc, void *data)
{
- struct iomap_writepage_ctx *wpc = data;
- struct inode *inode = folio->mapping->host;
- u64 end_pos, isize;
-
- trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
-
- /*
- * Refuse to write the folio out if we're called from reclaim context.
- *
- * This avoids stack overflows when called from deeply used stacks in
- * random callers for direct reclaim or memcg reclaim. We explicitly
- * allow reclaim from kswapd as the stack usage there is relatively low.
- *
- * This should never happen except in the case of a VM regression so
- * warn about it.
- */
- if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
- PF_MEMALLOC))
- goto redirty;
-
- /*
- * Is this folio beyond the end of the file?
- *
- * The folio index is less than the end_index, adjust the end_pos
- * to the highest offset that this folio should represent.
- * -----------------------------------------------------
- * | file mapping | <EOF> |
- * -----------------------------------------------------
- * | Page ... | Page N-2 | Page N-1 | Page N | |
- * ^--------------------------------^----------|--------
- * | desired writeback range | see else |
- * ---------------------------------^------------------|
- */
- isize = i_size_read(inode);
- end_pos = folio_pos(folio) + folio_size(folio);
- if (end_pos > isize) {
- /*
- * Check whether the page to write out is beyond or straddles
- * i_size or not.
- * -------------------------------------------------------
- * | file mapping | <EOF> |
- * -------------------------------------------------------
- * | Page ... | Page N-2 | Page N-1 | Page N | Beyond |
- * ^--------------------------------^-----------|---------
- * | | Straddles |
- * ---------------------------------^-----------|--------|
- */
- size_t poff = offset_in_folio(folio, isize);
- pgoff_t end_index = isize >> PAGE_SHIFT;
-
- /*
- * Skip the page if it's fully outside i_size, e.g.
- * due to a truncate operation that's in progress. We've
- * cleaned this page and truncate will finish things off for
- * us.
- *
- * Note that the end_index is unsigned long. If the given
- * offset is greater than 16TB on a 32-bit system then if we
- * checked if the page is fully outside i_size with
- * "if (page->index >= end_index + 1)", "end_index + 1" would
- * overflow and evaluate to 0. Hence this page would be
- * redirtied and written out repeatedly, which would result in
- * an infinite loop; the user program performing this operation
- * would hang. Instead, we can detect this situation by
- * checking if the page is totally beyond i_size or if its
- * offset is just equal to the EOF.
- */
- if (folio->index > end_index ||
- (folio->index == end_index && poff == 0))
- goto unlock;
-
- /*
- * The page straddles i_size. It must be zeroed out on each
- * and every writepage invocation because it may be mmapped.
- * "A file is mapped in multiples of the page size. For a file
- * that is not a multiple of the page size, the remaining
- * memory is zeroed when mapped, and writes to that region are
- * not written out to the file."
- */
- folio_zero_segment(folio, poff, folio_size(folio));
- end_pos = isize;
- }
-
- return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
-
-redirty:
- folio_redirty_for_writepage(wbc, folio);
-unlock:
- folio_unlock(folio);
- return 0;
+ return iomap_writepage_map(data, wbc, folio);
}
int
@@ -1988,18 +1971,24 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
{
int ret;
+ /*
+ * Writeback from reclaim context should never happen except in the case
+ * of a VM regression so warn about it and refuse to write the data.
+ */
+ if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC | PF_KSWAPD)) ==
+ PF_MEMALLOC))
+ return -EIO;
+
wpc->ops = ops;
ret = write_cache_pages(mapping, wbc, iomap_do_writepage, wpc);
- if (!wpc->ioend)
- return ret;
- return iomap_submit_ioend(wpc, wpc->ioend, ret);
+ return iomap_submit_ioend(wpc, ret);
}
EXPORT_SYMBOL_GPL(iomap_writepages);
static int __init iomap_init(void)
{
return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
- offsetof(struct iomap_ioend, io_inline_bio),
+ offsetof(struct iomap_ioend, io_bio),
BIOSET_NEED_BVECS);
}
fs_initcall(iomap_init);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index bcd3f8c..f3b43d2 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -380,6 +380,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
GFP_KERNEL);
bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
+ bio->bi_write_hint = inode->i_write_hint;
bio->bi_ioprio = dio->iocb->ki_ioprio;
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index c16fd55..0a991c4 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -154,7 +154,48 @@ DEFINE_EVENT(iomap_class, name, \
TP_ARGS(inode, iomap))
DEFINE_IOMAP_EVENT(iomap_iter_dstmap);
DEFINE_IOMAP_EVENT(iomap_iter_srcmap);
-DEFINE_IOMAP_EVENT(iomap_writepage_map);
+
+TRACE_EVENT(iomap_writepage_map,
+ TP_PROTO(struct inode *inode, u64 pos, unsigned int dirty_len,
+ struct iomap *iomap),
+ TP_ARGS(inode, pos, dirty_len, iomap),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(u64, ino)
+ __field(u64, pos)
+ __field(u64, dirty_len)
+ __field(u64, addr)
+ __field(loff_t, offset)
+ __field(u64, length)
+ __field(u16, type)
+ __field(u16, flags)
+ __field(dev_t, bdev)
+ ),
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->pos = pos;
+ __entry->dirty_len = dirty_len;
+ __entry->addr = iomap->addr;
+ __entry->offset = iomap->offset;
+ __entry->length = iomap->length;
+ __entry->type = iomap->type;
+ __entry->flags = iomap->flags;
+ __entry->bdev = iomap->bdev ? iomap->bdev->bd_dev : 0;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx bdev %d:%d pos 0x%llx dirty len 0x%llx "
+ "addr 0x%llx offset 0x%llx length 0x%llx type %s flags %s",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ MAJOR(__entry->bdev), MINOR(__entry->bdev),
+ __entry->pos,
+ __entry->dirty_len,
+ __entry->addr,
+ __entry->offset,
+ __entry->length,
+ __print_symbolic(__entry->type, IOMAP_TYPE_STRINGS),
+ __print_flags(__entry->flags, "|", IOMAP_F_FLAGS_STRINGS))
+);
TRACE_EVENT(iomap_iter,
TP_PROTO(struct iomap_iter *iter, const void *ops,
@@ -165,6 +206,7 @@ TRACE_EVENT(iomap_iter,
__field(u64, ino)
__field(loff_t, pos)
__field(u64, length)
+ __field(s64, processed)
__field(unsigned int, flags)
__field(const void *, ops)
__field(unsigned long, caller)
@@ -174,15 +216,17 @@ TRACE_EVENT(iomap_iter,
__entry->ino = iter->inode->i_ino;
__entry->pos = iter->pos;
__entry->length = iomap_length(iter);
+ __entry->processed = iter->processed;
__entry->flags = iter->flags;
__entry->ops = ops;
__entry->caller = caller;
),
- TP_printk("dev %d:%d ino 0x%llx pos 0x%llx length 0x%llx flags %s (0x%x) ops %ps caller %pS",
+ TP_printk("dev %d:%d ino 0x%llx pos 0x%llx length 0x%llx processed %lld flags %s (0x%x) ops %ps caller %pS",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->ino,
__entry->pos,
__entry->length,
+ __entry->processed,
__print_flags(__entry->flags, "|", IOMAP_FLAGS_STRINGS),
__entry->flags,
__entry->ops,
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index cb6d1fd..73389c6 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1058,7 +1058,7 @@ void jfs_syncpt(struct jfs_log *log, int hard_sync)
int lmLogOpen(struct super_block *sb)
{
int rc;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct jfs_log *log;
struct jfs_sb_info *sbi = JFS_SBI(sb);
@@ -1070,7 +1070,7 @@ int lmLogOpen(struct super_block *sb)
mutex_lock(&jfs_log_mutex);
list_for_each_entry(log, &jfs_external_logs, journal_list) {
- if (log->bdev_handle->bdev->bd_dev == sbi->logdev) {
+ if (file_bdev(log->bdev_file)->bd_dev == sbi->logdev) {
if (!uuid_equal(&log->uuid, &sbi->loguuid)) {
jfs_warn("wrong uuid on JFS journal");
mutex_unlock(&jfs_log_mutex);
@@ -1100,14 +1100,14 @@ int lmLogOpen(struct super_block *sb)
* file systems to log may have n-to-1 relationship;
*/
- bdev_handle = bdev_open_by_dev(sbi->logdev,
+ bdev_file = bdev_file_open_by_dev(sbi->logdev,
BLK_OPEN_READ | BLK_OPEN_WRITE, log, NULL);
- if (IS_ERR(bdev_handle)) {
- rc = PTR_ERR(bdev_handle);
+ if (IS_ERR(bdev_file)) {
+ rc = PTR_ERR(bdev_file);
goto free;
}
- log->bdev_handle = bdev_handle;
+ log->bdev_file = bdev_file;
uuid_copy(&log->uuid, &sbi->loguuid);
/*
@@ -1141,7 +1141,7 @@ int lmLogOpen(struct super_block *sb)
lbmLogShutdown(log);
close: /* close external log device */
- bdev_release(bdev_handle);
+ fput(bdev_file);
free: /* free log descriptor */
mutex_unlock(&jfs_log_mutex);
@@ -1162,7 +1162,7 @@ static int open_inline_log(struct super_block *sb)
init_waitqueue_head(&log->syncwait);
set_bit(log_INLINELOG, &log->flag);
- log->bdev_handle = sb->s_bdev_handle;
+ log->bdev_file = sb->s_bdev_file;
log->base = addressPXD(&JFS_SBI(sb)->logpxd);
log->size = lengthPXD(&JFS_SBI(sb)->logpxd) >>
(L2LOGPSIZE - sb->s_blocksize_bits);
@@ -1436,7 +1436,7 @@ int lmLogClose(struct super_block *sb)
{
struct jfs_sb_info *sbi = JFS_SBI(sb);
struct jfs_log *log = sbi->log;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
int rc = 0;
jfs_info("lmLogClose: log:0x%p", log);
@@ -1482,10 +1482,10 @@ int lmLogClose(struct super_block *sb)
* external log as separate logical volume
*/
list_del(&log->journal_list);
- bdev_handle = log->bdev_handle;
+ bdev_file = log->bdev_file;
rc = lmLogShutdown(log);
- bdev_release(bdev_handle);
+ fput(bdev_file);
kfree(log);
@@ -1972,7 +1972,7 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
bp->l_flag |= lbmREAD;
- bio = bio_alloc(log->bdev_handle->bdev, 1, REQ_OP_READ, GFP_NOFS);
+ bio = bio_alloc(file_bdev(log->bdev_file), 1, REQ_OP_READ, GFP_NOFS);
bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
__bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
@@ -2115,7 +2115,7 @@ static void lbmStartIO(struct lbuf * bp)
jfs_info("lbmStartIO");
if (!log->no_integrity)
- bdev = log->bdev_handle->bdev;
+ bdev = file_bdev(log->bdev_file);
bio = bio_alloc(bdev, 1, REQ_OP_WRITE | REQ_SYNC,
GFP_NOFS);
diff --git a/fs/jfs/jfs_logmgr.h b/fs/jfs/jfs_logmgr.h
index 84aa2d2..8b8994e 100644
--- a/fs/jfs/jfs_logmgr.h
+++ b/fs/jfs/jfs_logmgr.h
@@ -356,7 +356,7 @@ struct jfs_log {
* before writing syncpt.
*/
struct list_head journal_list; /* Global list */
- struct bdev_handle *bdev_handle; /* 4: log lv pointer */
+ struct file *bdev_file; /* 4: log lv pointer */
int serial; /* 4: log mount serial number */
s64 base; /* @8: log extent address (inline log ) */
diff --git a/fs/jfs/jfs_mount.c b/fs/jfs/jfs_mount.c
index 9b5c6a2..98f9a43 100644
--- a/fs/jfs/jfs_mount.c
+++ b/fs/jfs/jfs_mount.c
@@ -431,7 +431,7 @@ int updateSuper(struct super_block *sb, uint state)
if (state == FM_MOUNT) {
/* record log's dev_t and mount serial number */
j_sb->s_logdev = cpu_to_le32(
- new_encode_dev(sbi->log->bdev_handle->bdev->bd_dev));
+ new_encode_dev(file_bdev(sbi->log->bdev_file)->bd_dev));
j_sb->s_logserial = cpu_to_le32(sbi->log->serial);
} else if (state == FM_CLEAN) {
/*
diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index 8d8e556..73f09a7 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -932,7 +932,7 @@ static int __init init_jfs_fs(void)
jfs_inode_cachep =
kmem_cache_create_usercopy("jfs_ip", sizeof(struct jfs_inode_info),
- 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT,
offsetof(struct jfs_inode_info, i_inline_all),
sizeof_field(struct jfs_inode_info, i_inline_all),
init_once);
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 0c93cad..e29f4ed 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -358,7 +358,9 @@ int kernfs_get_tree(struct fs_context *fc)
}
sb->s_flags |= SB_ACTIVE;
- uuid_gen(&sb->s_uuid);
+ uuid_t uuid;
+ uuid_gen(&uuid);
+ super_set_uuid(sb, uuid.b, sizeof(uuid));
down_write(&root->kernfs_supers_rwsem);
list_add(&info->node, &info->root->supers);
diff --git a/fs/libfs.c b/fs/libfs.c
index eec6031..0d14ae8 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -23,6 +23,7 @@
#include <linux/fsnotify.h>
#include <linux/unicode.h>
#include <linux/fscrypt.h>
+#include <linux/pidfs.h>
#include <linux/uaccess.h>
@@ -240,17 +241,22 @@ const struct inode_operations simple_dir_inode_operations = {
};
EXPORT_SYMBOL(simple_dir_inode_operations);
-static void offset_set(struct dentry *dentry, u32 offset)
+/* 0 is '.', 1 is '..', so always start with offset 2 or more */
+enum {
+ DIR_OFFSET_MIN = 2,
+};
+
+static void offset_set(struct dentry *dentry, long offset)
{
- dentry->d_fsdata = (void *)((uintptr_t)(offset));
+ dentry->d_fsdata = (void *)offset;
}
-static u32 dentry2offset(struct dentry *dentry)
+static long dentry2offset(struct dentry *dentry)
{
- return (u32)((uintptr_t)(dentry->d_fsdata));
+ return (long)dentry->d_fsdata;
}
-static struct lock_class_key simple_offset_xa_lock;
+static struct lock_class_key simple_offset_lock_class;
/**
* simple_offset_init - initialize an offset_ctx
@@ -259,11 +265,9 @@ static struct lock_class_key simple_offset_xa_lock;
*/
void simple_offset_init(struct offset_ctx *octx)
{
- xa_init_flags(&octx->xa, XA_FLAGS_ALLOC1);
- lockdep_set_class(&octx->xa.xa_lock, &simple_offset_xa_lock);
-
- /* 0 is '.', 1 is '..', so always start with offset 2 */
- octx->next_offset = 2;
+ mt_init_flags(&octx->mt, MT_FLAGS_ALLOC_RANGE);
+ lockdep_set_class(&octx->mt.ma_lock, &simple_offset_lock_class);
+ octx->next_offset = DIR_OFFSET_MIN;
}
/**
@@ -271,20 +275,19 @@ void simple_offset_init(struct offset_ctx *octx)
* @octx: directory offset ctx to be updated
* @dentry: new dentry being added
*
- * Returns zero on success. @so_ctx and the dentry offset are updated.
+ * Returns zero on success. @octx and the dentry's offset are updated.
* Otherwise, a negative errno value is returned.
*/
int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
{
- static const struct xa_limit limit = XA_LIMIT(2, U32_MAX);
- u32 offset;
+ unsigned long offset;
int ret;
if (dentry2offset(dentry) != 0)
return -EBUSY;
- ret = xa_alloc_cyclic(&octx->xa, &offset, dentry, limit,
- &octx->next_offset, GFP_KERNEL);
+ ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN,
+ LONG_MAX, &octx->next_offset, GFP_KERNEL);
if (ret < 0)
return ret;
@@ -300,17 +303,49 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
*/
void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
{
- u32 offset;
+ long offset;
offset = dentry2offset(dentry);
if (offset == 0)
return;
- xa_erase(&octx->xa, offset);
+ mtree_erase(&octx->mt, offset);
offset_set(dentry, 0);
}
/**
+ * simple_offset_empty - Check if a dentry can be unlinked
+ * @dentry: dentry to be tested
+ *
+ * Returns 0 if @dentry is a non-empty directory; otherwise returns 1.
+ */
+int simple_offset_empty(struct dentry *dentry)
+{
+ struct inode *inode = d_inode(dentry);
+ struct offset_ctx *octx;
+ struct dentry *child;
+ unsigned long index;
+ int ret = 1;
+
+ if (!inode || !S_ISDIR(inode->i_mode))
+ return ret;
+
+ index = DIR_OFFSET_MIN;
+ octx = inode->i_op->get_offset_ctx(inode);
+ mt_for_each(&octx->mt, child, index, LONG_MAX) {
+ spin_lock(&child->d_lock);
+ if (simple_positive(child)) {
+ spin_unlock(&child->d_lock);
+ ret = 0;
+ break;
+ }
+ spin_unlock(&child->d_lock);
+ }
+
+ return ret;
+}
+
+/**
* simple_offset_rename_exchange - exchange rename with directory offsets
* @old_dir: parent of dentry being moved
* @old_dentry: dentry being moved
@@ -327,8 +362,8 @@ int simple_offset_rename_exchange(struct inode *old_dir,
{
struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
- u32 old_index = dentry2offset(old_dentry);
- u32 new_index = dentry2offset(new_dentry);
+ long old_index = dentry2offset(old_dentry);
+ long new_index = dentry2offset(new_dentry);
int ret;
simple_offset_remove(old_ctx, old_dentry);
@@ -354,9 +389,9 @@ int simple_offset_rename_exchange(struct inode *old_dir,
out_restore:
offset_set(old_dentry, old_index);
- xa_store(&old_ctx->xa, old_index, old_dentry, GFP_KERNEL);
+ mtree_store(&old_ctx->mt, old_index, old_dentry, GFP_KERNEL);
offset_set(new_dentry, new_index);
- xa_store(&new_ctx->xa, new_index, new_dentry, GFP_KERNEL);
+ mtree_store(&new_ctx->mt, new_index, new_dentry, GFP_KERNEL);
return ret;
}
@@ -369,7 +404,7 @@ int simple_offset_rename_exchange(struct inode *old_dir,
*/
void simple_offset_destroy(struct offset_ctx *octx)
{
- xa_destroy(&octx->xa);
+ mtree_destroy(&octx->mt);
}
/**
@@ -399,15 +434,16 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
/* In this case, ->private_data is protected by f_pos_lock */
file->private_data = NULL;
- return vfs_setpos(file, offset, U32_MAX);
+ return vfs_setpos(file, offset, LONG_MAX);
}
-static struct dentry *offset_find_next(struct xa_state *xas)
+static struct dentry *offset_find_next(struct offset_ctx *octx, loff_t offset)
{
+ MA_STATE(mas, &octx->mt, offset, offset);
struct dentry *child, *found = NULL;
rcu_read_lock();
- child = xas_next_entry(xas, U32_MAX);
+ child = mas_find(&mas, LONG_MAX);
if (!child)
goto out;
spin_lock(&child->d_lock);
@@ -421,8 +457,8 @@ static struct dentry *offset_find_next(struct xa_state *xas)
static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
{
- u32 offset = dentry2offset(dentry);
struct inode *inode = d_inode(dentry);
+ long offset = dentry2offset(dentry);
return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len, offset,
inode->i_ino, fs_umode_to_dtype(inode->i_mode));
@@ -430,12 +466,11 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
{
- struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode);
- XA_STATE(xas, &so_ctx->xa, ctx->pos);
+ struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
struct dentry *dentry;
while (true) {
- dentry = offset_find_next(&xas);
+ dentry = offset_find_next(octx, ctx->pos);
if (!dentry)
return ERR_PTR(-ENOENT);
@@ -444,8 +479,8 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
break;
}
+ ctx->pos = dentry2offset(dentry) + 1;
dput(dentry);
- ctx->pos = xas.xa_index + 1;
}
return NULL;
}
@@ -481,7 +516,7 @@ static int offset_readdir(struct file *file, struct dir_context *ctx)
return 0;
/* In this case, ->private_data is protected by f_pos_lock */
- if (ctx->pos == 2)
+ if (ctx->pos == DIR_OFFSET_MIN)
file->private_data = NULL;
else if (file->private_data == ERR_PTR(-ENOENT))
return 0;
@@ -1580,7 +1615,7 @@ EXPORT_SYMBOL(alloc_anon_inode);
* All arguments are ignored and it just returns -EINVAL.
*/
int
-simple_nosetlease(struct file *filp, int arg, struct file_lock **flp,
+simple_nosetlease(struct file *filp, int arg, struct file_lease **flp,
void **priv)
{
return -EINVAL;
@@ -1704,16 +1739,28 @@ bool is_empty_dir_inode(struct inode *inode)
static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
const char *str, const struct qstr *name)
{
- const struct dentry *parent = READ_ONCE(dentry->d_parent);
- const struct inode *dir = READ_ONCE(parent->d_inode);
- const struct super_block *sb = dentry->d_sb;
- const struct unicode_map *um = sb->s_encoding;
- struct qstr qstr = QSTR_INIT(str, len);
+ const struct dentry *parent;
+ const struct inode *dir;
char strbuf[DNAME_INLINE_LEN];
- int ret;
+ struct qstr qstr;
+ /*
+ * Attempt a case-sensitive match first. It is cheaper and
+ * should cover most lookups, including all the sane
+ * applications that expect a case-sensitive filesystem.
+ *
+ * This comparison is safe under RCU because the caller
+ * guarantees the consistency between str and len. See
+ * __d_lookup_rcu_op_compare() for details.
+ */
+ if (len == name->len && !memcmp(str, name->name, len))
+ return 0;
+
+ parent = READ_ONCE(dentry->d_parent);
+ dir = READ_ONCE(parent->d_inode);
if (!dir || !IS_CASEFOLDED(dir))
- goto fallback;
+ return 1;
+
/*
* If the dentry name is stored in-line, then it may be concurrently
* modified by a rename. If this happens, the VFS will eventually retry
@@ -1724,20 +1771,14 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
if (len <= DNAME_INLINE_LEN - 1) {
memcpy(strbuf, str, len);
strbuf[len] = 0;
- qstr.name = strbuf;
+ str = strbuf;
/* prevent compiler from optimizing out the temporary buffer */
barrier();
}
- ret = utf8_strncasecmp(um, name, &qstr);
- if (ret >= 0)
- return ret;
+ qstr.len = len;
+ qstr.name = str;
- if (sb_has_strict_encoding(sb))
- return -EINVAL;
-fallback:
- if (len != name->len)
- return 1;
- return !!memcmp(str, name->name, len);
+ return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr);
}
/**
@@ -1752,7 +1793,7 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
const struct inode *dir = READ_ONCE(dentry->d_inode);
struct super_block *sb = dentry->d_sb;
const struct unicode_map *um = sb->s_encoding;
- int ret = 0;
+ int ret;
if (!dir || !IS_CASEFOLDED(dir))
return 0;
@@ -1766,6 +1807,9 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
static const struct dentry_operations generic_ci_dentry_ops = {
.d_hash = generic_ci_d_hash,
.d_compare = generic_ci_d_compare,
+#ifdef CONFIG_FS_ENCRYPTION
+ .d_revalidate = fscrypt_d_revalidate,
+#endif
};
#endif
@@ -1775,64 +1819,33 @@ static const struct dentry_operations generic_encrypted_dentry_ops = {
};
#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
- .d_hash = generic_ci_d_hash,
- .d_compare = generic_ci_d_compare,
- .d_revalidate = fscrypt_d_revalidate,
-};
-#endif
-
/**
- * generic_set_encrypted_ci_d_ops - helper for setting d_ops for given dentry
- * @dentry: dentry to set ops on
+ * generic_set_sb_d_ops - helper for choosing the set of
+ * filesystem-wide dentry operations for the enabled features
+ * @sb: superblock to be configured
*
- * Casefolded directories need d_hash and d_compare set, so that the dentries
- * contained in them are handled case-insensitively. Note that these operations
- * are needed on the parent directory rather than on the dentries in it, and
- * while the casefolding flag can be toggled on and off on an empty directory,
- * dentry_operations can't be changed later. As a result, if the filesystem has
- * casefolding support enabled at all, we have to give all dentries the
- * casefolding operations even if their inode doesn't have the casefolding flag
- * currently (and thus the casefolding ops would be no-ops for now).
- *
- * Encryption works differently in that the only dentry operation it needs is
- * d_revalidate, which it only needs on dentries that have the no-key name flag.
- * The no-key flag can't be set "later", so we don't have to worry about that.
- *
- * Finally, to maximize compatibility with overlayfs (which isn't compatible
- * with certain dentry operations) and to avoid taking an unnecessary
- * performance hit, we use custom dentry_operations for each possible
- * combination rather than always installing all operations.
+ * Filesystems supporting casefolding and/or fscrypt can call this
+ * helper at mount-time to configure sb->s_d_op to best set of dentry
+ * operations required for the enabled features. The helper must be
+ * called after these have been configured, but before the root dentry
+ * is created.
*/
-void generic_set_encrypted_ci_d_ops(struct dentry *dentry)
+void generic_set_sb_d_ops(struct super_block *sb)
{
-#ifdef CONFIG_FS_ENCRYPTION
- bool needs_encrypt_ops = dentry->d_flags & DCACHE_NOKEY_NAME;
-#endif
#if IS_ENABLED(CONFIG_UNICODE)
- bool needs_ci_ops = dentry->d_sb->s_encoding;
-#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
- if (needs_encrypt_ops && needs_ci_ops) {
- d_set_d_op(dentry, &generic_encrypted_ci_dentry_ops);
+ if (sb->s_encoding) {
+ sb->s_d_op = &generic_ci_dentry_ops;
return;
}
#endif
#ifdef CONFIG_FS_ENCRYPTION
- if (needs_encrypt_ops) {
- d_set_d_op(dentry, &generic_encrypted_dentry_ops);
- return;
- }
-#endif
-#if IS_ENABLED(CONFIG_UNICODE)
- if (needs_ci_ops) {
- d_set_d_op(dentry, &generic_ci_dentry_ops);
+ if (sb->s_cop) {
+ sb->s_d_op = &generic_encrypted_dentry_ops;
return;
}
#endif
}
-EXPORT_SYMBOL(generic_set_encrypted_ci_d_ops);
+EXPORT_SYMBOL(generic_set_sb_d_ops);
/**
* inode_maybe_inc_iversion - increments i_version
@@ -1973,3 +1986,144 @@ struct timespec64 simple_inode_init_ts(struct inode *inode)
return ts;
}
EXPORT_SYMBOL(simple_inode_init_ts);
+
+static inline struct dentry *get_stashed_dentry(struct dentry *stashed)
+{
+ struct dentry *dentry;
+
+ guard(rcu)();
+ dentry = READ_ONCE(stashed);
+ if (!dentry)
+ return NULL;
+ if (!lockref_get_not_dead(&dentry->d_lockref))
+ return NULL;
+ return dentry;
+}
+
+static struct dentry *prepare_anon_dentry(struct dentry **stashed,
+ unsigned long ino,
+ struct super_block *sb,
+ void *data)
+{
+ struct dentry *dentry;
+ struct inode *inode;
+ const struct stashed_operations *sops = sb->s_fs_info;
+
+ dentry = d_alloc_anon(sb);
+ if (!dentry)
+ return ERR_PTR(-ENOMEM);
+
+ inode = new_inode_pseudo(sb);
+ if (!inode) {
+ dput(dentry);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ inode->i_ino = ino;
+ inode->i_flags |= S_IMMUTABLE;
+ inode->i_mode = S_IFREG;
+ simple_inode_init_ts(inode);
+ sops->init_inode(inode, data);
+
+ /* Notice when this is changed. */
+ WARN_ON_ONCE(!S_ISREG(inode->i_mode));
+ WARN_ON_ONCE(!IS_IMMUTABLE(inode));
+
+ /* Store address of location where dentry's supposed to be stashed. */
+ dentry->d_fsdata = stashed;
+
+ /* @data is now owned by the fs */
+ d_instantiate(dentry, inode);
+ return dentry;
+}
+
+static struct dentry *stash_dentry(struct dentry **stashed,
+ struct dentry *dentry)
+{
+ guard(rcu)();
+ for (;;) {
+ struct dentry *old;
+
+ /* Assume any old dentry was cleared out. */
+ old = cmpxchg(stashed, NULL, dentry);
+ if (likely(!old))
+ return dentry;
+
+ /* Check if somebody else installed a reusable dentry. */
+ if (lockref_get_not_dead(&old->d_lockref))
+ return old;
+
+ /* There's an old dead dentry there, try to take it over. */
+ if (likely(try_cmpxchg(stashed, &old, dentry)))
+ return dentry;
+ }
+}
+
+/**
+ * path_from_stashed - create path from stashed or new dentry
+ * @stashed: where to retrieve or stash dentry
+ * @ino: inode number to use
+ * @mnt: mnt of the filesystems to use
+ * @data: data to store in inode->i_private
+ * @path: path to create
+ *
+ * The function tries to retrieve a stashed dentry from @stashed. If the dentry
+ * is still valid then it will be reused. If the dentry isn't able the function
+ * will allocate a new dentry and inode. It will then check again whether it
+ * can reuse an existing dentry in case one has been added in the meantime or
+ * update @stashed with the newly added dentry.
+ *
+ * Special-purpose helper for nsfs and pidfs.
+ *
+ * Return: On success zero and on failure a negative error is returned.
+ */
+int path_from_stashed(struct dentry **stashed, unsigned long ino,
+ struct vfsmount *mnt, void *data, struct path *path)
+{
+ struct dentry *dentry;
+ const struct stashed_operations *sops = mnt->mnt_sb->s_fs_info;
+
+ /* See if dentry can be reused. */
+ path->dentry = get_stashed_dentry(*stashed);
+ if (path->dentry) {
+ sops->put_data(data);
+ goto out_path;
+ }
+
+ /* Allocate a new dentry. */
+ dentry = prepare_anon_dentry(stashed, ino, mnt->mnt_sb, data);
+ if (IS_ERR(dentry)) {
+ sops->put_data(data);
+ return PTR_ERR(dentry);
+ }
+
+ /* Added a new dentry. @data is now owned by the filesystem. */
+ path->dentry = stash_dentry(stashed, dentry);
+ if (path->dentry != dentry)
+ dput(dentry);
+
+out_path:
+ WARN_ON_ONCE(path->dentry->d_fsdata != stashed);
+ WARN_ON_ONCE(d_inode(path->dentry)->i_private != data);
+ path->mnt = mntget(mnt);
+ return 0;
+}
+
+void stashed_dentry_prune(struct dentry *dentry)
+{
+ struct dentry **stashed = dentry->d_fsdata;
+ struct inode *inode = d_inode(dentry);
+
+ if (WARN_ON_ONCE(!stashed))
+ return;
+
+ if (!inode)
+ return;
+
+ /*
+ * Only replace our own @dentry as someone else might've
+ * already cleared out @dentry and stashed their own
+ * dentry in there.
+ */
+ cmpxchg(stashed, dentry, NULL);
+}
diff --git a/fs/lockd/clnt4xdr.c b/fs/lockd/clnt4xdr.c
index 8161667..527458d 100644
--- a/fs/lockd/clnt4xdr.c
+++ b/fs/lockd/clnt4xdr.c
@@ -243,7 +243,7 @@ static void encode_nlm4_holder(struct xdr_stream *xdr,
u64 l_offset, l_len;
__be32 *p;
- encode_bool(xdr, lock->fl.fl_type == F_RDLCK);
+ encode_bool(xdr, lock->fl.c.flc_type == F_RDLCK);
encode_int32(xdr, lock->svid);
encode_netobj(xdr, lock->oh.data, lock->oh.len);
@@ -270,7 +270,7 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result)
goto out_overflow;
exclusive = be32_to_cpup(p++);
lock->svid = be32_to_cpup(p);
- fl->fl_pid = (pid_t)lock->svid;
+ fl->c.flc_pid = (pid_t)lock->svid;
error = decode_netobj(xdr, &lock->oh);
if (unlikely(error))
@@ -280,8 +280,8 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result)
if (unlikely(p == NULL))
goto out_overflow;
- fl->fl_flags = FL_POSIX;
- fl->fl_type = exclusive != 0 ? F_WRLCK : F_RDLCK;
+ fl->c.flc_flags = FL_POSIX;
+ fl->c.flc_type = exclusive != 0 ? F_WRLCK : F_RDLCK;
p = xdr_decode_hyper(p, &l_offset);
xdr_decode_hyper(p, &l_len);
nlm4svc_set_file_lock_range(fl, l_offset, l_len);
@@ -357,7 +357,7 @@ static void nlm4_xdr_enc_testargs(struct rpc_rqst *req,
const struct nlm_lock *lock = &args->lock;
encode_cookie(xdr, &args->cookie);
- encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+ encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
encode_nlm4_lock(xdr, lock);
}
@@ -380,7 +380,7 @@ static void nlm4_xdr_enc_lockargs(struct rpc_rqst *req,
encode_cookie(xdr, &args->cookie);
encode_bool(xdr, args->block);
- encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+ encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
encode_nlm4_lock(xdr, lock);
encode_bool(xdr, args->reclaim);
encode_int32(xdr, args->state);
@@ -403,7 +403,7 @@ static void nlm4_xdr_enc_cancargs(struct rpc_rqst *req,
encode_cookie(xdr, &args->cookie);
encode_bool(xdr, args->block);
- encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+ encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
encode_nlm4_lock(xdr, lock);
}
diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
index 5d85715..a7e0519 100644
--- a/fs/lockd/clntlock.c
+++ b/fs/lockd/clntlock.c
@@ -185,7 +185,7 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const struct nlm_lock *lock)
continue;
if (!rpc_cmp_addr(nlm_addr(block->b_host), addr))
continue;
- if (nfs_compare_fh(NFS_FH(file_inode(fl_blocked->fl_file)), fh) != 0)
+ if (nfs_compare_fh(NFS_FH(file_inode(fl_blocked->c.flc_file)), fh) != 0)
continue;
/* Alright, we found a lock. Set the return status
* and wake up the caller
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index fba6c7f..cebcc28 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -133,7 +133,8 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
char *nodename = req->a_host->h_rpcclnt->cl_nodename;
nlmclnt_next_cookie(&argp->cookie);
- memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
+ memcpy(&lock->fh, NFS_FH(file_inode(fl->c.flc_file)),
+ sizeof(struct nfs_fh));
lock->caller = nodename;
lock->oh.data = req->a_owner;
lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
@@ -142,7 +143,7 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
lock->svid = fl->fl_u.nfs_fl.owner->pid;
lock->fl.fl_start = fl->fl_start;
lock->fl.fl_end = fl->fl_end;
- lock->fl.fl_type = fl->fl_type;
+ lock->fl.c.flc_type = fl->c.flc_type;
}
static void nlmclnt_release_lockargs(struct nlm_rqst *req)
@@ -182,7 +183,7 @@ int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl, void *dat
call->a_callback_data = data;
if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
- if (fl->fl_type != F_UNLCK) {
+ if (fl->c.flc_type != F_UNLCK) {
call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
status = nlmclnt_lock(call, fl);
} else
@@ -432,13 +433,14 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
{
int status;
- status = nlmclnt_call(nfs_file_cred(fl->fl_file), req, NLMPROC_TEST);
+ status = nlmclnt_call(nfs_file_cred(fl->c.flc_file), req,
+ NLMPROC_TEST);
if (status < 0)
goto out;
switch (req->a_res.status) {
case nlm_granted:
- fl->fl_type = F_UNLCK;
+ fl->c.flc_type = F_UNLCK;
break;
case nlm_lck_denied:
/*
@@ -446,8 +448,8 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
*/
fl->fl_start = req->a_res.lock.fl.fl_start;
fl->fl_end = req->a_res.lock.fl.fl_end;
- fl->fl_type = req->a_res.lock.fl.fl_type;
- fl->fl_pid = -req->a_res.lock.fl.fl_pid;
+ fl->c.flc_type = req->a_res.lock.fl.c.flc_type;
+ fl->c.flc_pid = -req->a_res.lock.fl.c.flc_pid;
break;
default:
status = nlm_stat_to_errno(req->a_res.status);
@@ -485,14 +487,15 @@ static const struct file_lock_operations nlmclnt_lock_ops = {
static void nlmclnt_locks_init_private(struct file_lock *fl, struct nlm_host *host)
{
fl->fl_u.nfs_fl.state = 0;
- fl->fl_u.nfs_fl.owner = nlmclnt_find_lockowner(host, fl->fl_owner);
+ fl->fl_u.nfs_fl.owner = nlmclnt_find_lockowner(host,
+ fl->c.flc_owner);
INIT_LIST_HEAD(&fl->fl_u.nfs_fl.list);
fl->fl_ops = &nlmclnt_lock_ops;
}
static int do_vfs_lock(struct file_lock *fl)
{
- return locks_lock_file_wait(fl->fl_file, fl);
+ return locks_lock_file_wait(fl->c.flc_file, fl);
}
/*
@@ -518,12 +521,12 @@ static int do_vfs_lock(struct file_lock *fl)
static int
nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
{
- const struct cred *cred = nfs_file_cred(fl->fl_file);
+ const struct cred *cred = nfs_file_cred(fl->c.flc_file);
struct nlm_host *host = req->a_host;
struct nlm_res *resp = &req->a_res;
struct nlm_wait block;
- unsigned char fl_flags = fl->fl_flags;
- unsigned char fl_type;
+ unsigned char flags = fl->c.flc_flags;
+ unsigned char type;
__be32 b_status;
int status = -ENOLCK;
@@ -531,9 +534,9 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
goto out;
req->a_args.state = nsm_local_state;
- fl->fl_flags |= FL_ACCESS;
+ fl->c.flc_flags |= FL_ACCESS;
status = do_vfs_lock(fl);
- fl->fl_flags = fl_flags;
+ fl->c.flc_flags = flags;
if (status < 0)
goto out;
@@ -591,11 +594,11 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
goto again;
}
/* Ensure the resulting lock will get added to granted list */
- fl->fl_flags |= FL_SLEEP;
+ fl->c.flc_flags |= FL_SLEEP;
if (do_vfs_lock(fl) < 0)
printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n", __func__);
up_read(&host->h_rwsem);
- fl->fl_flags = fl_flags;
+ fl->c.flc_flags = flags;
status = 0;
}
if (status < 0)
@@ -605,7 +608,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
* cases NLM_LCK_DENIED is returned for a permanent error. So
* turn it into an ENOLCK.
*/
- if (resp->status == nlm_lck_denied && (fl_flags & FL_SLEEP))
+ if (resp->status == nlm_lck_denied && (flags & FL_SLEEP))
status = -ENOLCK;
else
status = nlm_stat_to_errno(resp->status);
@@ -622,13 +625,13 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
req->a_host->h_addrlen, req->a_res.status);
dprintk("lockd: lock attempt ended in fatal error.\n"
" Attempting to unlock.\n");
- fl_type = fl->fl_type;
- fl->fl_type = F_UNLCK;
+ type = fl->c.flc_type;
+ fl->c.flc_type = F_UNLCK;
down_read(&host->h_rwsem);
do_vfs_lock(fl);
up_read(&host->h_rwsem);
- fl->fl_type = fl_type;
- fl->fl_flags = fl_flags;
+ fl->c.flc_type = type;
+ fl->c.flc_flags = flags;
nlmclnt_async_call(cred, req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
return status;
}
@@ -651,12 +654,14 @@ nlmclnt_reclaim(struct nlm_host *host, struct file_lock *fl,
nlmclnt_setlockargs(req, fl);
req->a_args.reclaim = 1;
- status = nlmclnt_call(nfs_file_cred(fl->fl_file), req, NLMPROC_LOCK);
+ status = nlmclnt_call(nfs_file_cred(fl->c.flc_file), req,
+ NLMPROC_LOCK);
if (status >= 0 && req->a_res.status == nlm_granted)
return 0;
printk(KERN_WARNING "lockd: failed to reclaim lock for pid %d "
- "(errno %d, status %d)\n", fl->fl_pid,
+ "(errno %d, status %d)\n",
+ fl->c.flc_pid,
status, ntohl(req->a_res.status));
/*
@@ -683,26 +688,26 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
struct nlm_host *host = req->a_host;
struct nlm_res *resp = &req->a_res;
int status;
- unsigned char fl_flags = fl->fl_flags;
+ unsigned char flags = fl->c.flc_flags;
/*
* Note: the server is supposed to either grant us the unlock
* request, or to deny it with NLM_LCK_DENIED_GRACE_PERIOD. In either
* case, we want to unlock.
*/
- fl->fl_flags |= FL_EXISTS;
+ fl->c.flc_flags |= FL_EXISTS;
down_read(&host->h_rwsem);
status = do_vfs_lock(fl);
up_read(&host->h_rwsem);
- fl->fl_flags = fl_flags;
+ fl->c.flc_flags = flags;
if (status == -ENOENT) {
status = 0;
goto out;
}
refcount_inc(&req->a_count);
- status = nlmclnt_async_call(nfs_file_cred(fl->fl_file), req,
- NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
+ status = nlmclnt_async_call(nfs_file_cred(fl->c.flc_file), req,
+ NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
if (status < 0)
goto out;
@@ -795,8 +800,8 @@ static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl
req->a_args.block = block;
refcount_inc(&req->a_count);
- status = nlmclnt_async_call(nfs_file_cred(fl->fl_file), req,
- NLMPROC_CANCEL, &nlmclnt_cancel_ops);
+ status = nlmclnt_async_call(nfs_file_cred(fl->c.flc_file), req,
+ NLMPROC_CANCEL, &nlmclnt_cancel_ops);
if (status == 0 && req->a_res.status == nlm_lck_denied)
status = -ENOLCK;
nlmclnt_release_call(req);
diff --git a/fs/lockd/clntxdr.c b/fs/lockd/clntxdr.c
index 4df62f6..a3e9727 100644
--- a/fs/lockd/clntxdr.c
+++ b/fs/lockd/clntxdr.c
@@ -238,7 +238,7 @@ static void encode_nlm_holder(struct xdr_stream *xdr,
u32 l_offset, l_len;
__be32 *p;
- encode_bool(xdr, lock->fl.fl_type == F_RDLCK);
+ encode_bool(xdr, lock->fl.c.flc_type == F_RDLCK);
encode_int32(xdr, lock->svid);
encode_netobj(xdr, lock->oh.data, lock->oh.len);
@@ -265,7 +265,7 @@ static int decode_nlm_holder(struct xdr_stream *xdr, struct nlm_res *result)
goto out_overflow;
exclusive = be32_to_cpup(p++);
lock->svid = be32_to_cpup(p);
- fl->fl_pid = (pid_t)lock->svid;
+ fl->c.flc_pid = (pid_t)lock->svid;
error = decode_netobj(xdr, &lock->oh);
if (unlikely(error))
@@ -275,8 +275,8 @@ static int decode_nlm_holder(struct xdr_stream *xdr, struct nlm_res *result)
if (unlikely(p == NULL))
goto out_overflow;
- fl->fl_flags = FL_POSIX;
- fl->fl_type = exclusive != 0 ? F_WRLCK : F_RDLCK;
+ fl->c.flc_flags = FL_POSIX;
+ fl->c.flc_type = exclusive != 0 ? F_WRLCK : F_RDLCK;
l_offset = be32_to_cpup(p++);
l_len = be32_to_cpup(p);
end = l_offset + l_len - 1;
@@ -357,7 +357,7 @@ static void nlm_xdr_enc_testargs(struct rpc_rqst *req,
const struct nlm_lock *lock = &args->lock;
encode_cookie(xdr, &args->cookie);
- encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+ encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
encode_nlm_lock(xdr, lock);
}
@@ -380,7 +380,7 @@ static void nlm_xdr_enc_lockargs(struct rpc_rqst *req,
encode_cookie(xdr, &args->cookie);
encode_bool(xdr, args->block);
- encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+ encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
encode_nlm_lock(xdr, lock);
encode_bool(xdr, args->reclaim);
encode_int32(xdr, args->state);
@@ -403,7 +403,7 @@ static void nlm_xdr_enc_cancargs(struct rpc_rqst *req,
encode_cookie(xdr, &args->cookie);
encode_bool(xdr, args->block);
- encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+ encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
encode_nlm_lock(xdr, lock);
}
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
index b72023a..8a72c41 100644
--- a/fs/lockd/svc4proc.c
+++ b/fs/lockd/svc4proc.c
@@ -52,16 +52,16 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
*filp = file;
/* Set up the missing parts of the file_lock structure */
- lock->fl.fl_flags = FL_POSIX;
- lock->fl.fl_file = file->f_file[mode];
- lock->fl.fl_pid = current->tgid;
+ lock->fl.c.flc_flags = FL_POSIX;
+ lock->fl.c.flc_file = file->f_file[mode];
+ lock->fl.c.flc_pid = current->tgid;
lock->fl.fl_start = (loff_t)lock->lock_start;
lock->fl.fl_end = lock->lock_len ?
(loff_t)(lock->lock_start + lock->lock_len - 1) :
OFFSET_MAX;
lock->fl.fl_lmops = &nlmsvc_lock_operations;
nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid);
- if (!lock->fl.fl_owner) {
+ if (!lock->fl.c.flc_owner) {
/* lockowner allocation has failed */
nlmsvc_release_host(host);
return nlm_lck_denied_nolocks;
@@ -106,7 +106,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.fl_owner;
+ test_owner = argp->lock.fl.c.flc_owner;
/* Now check for conflicting locks */
resp->status = nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie);
if (resp->status == nlm_drop_reply)
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 2dc1090..1f2149d 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -150,16 +150,17 @@ nlmsvc_lookup_block(struct nlm_file *file, struct nlm_lock *lock)
struct file_lock *fl;
dprintk("lockd: nlmsvc_lookup_block f=%p pd=%d %Ld-%Ld ty=%d\n",
- file, lock->fl.fl_pid,
+ file, lock->fl.c.flc_pid,
(long long)lock->fl.fl_start,
- (long long)lock->fl.fl_end, lock->fl.fl_type);
+ (long long)lock->fl.fl_end,
+ lock->fl.c.flc_type);
spin_lock(&nlm_blocked_lock);
list_for_each_entry(block, &nlm_blocked, b_list) {
fl = &block->b_call->a_args.lock.fl;
dprintk("lockd: check f=%p pd=%d %Ld-%Ld ty=%d cookie=%s\n",
- block->b_file, fl->fl_pid,
+ block->b_file, fl->c.flc_pid,
(long long)fl->fl_start,
- (long long)fl->fl_end, fl->fl_type,
+ (long long)fl->fl_end, fl->c.flc_type,
nlmdbg_cookie2a(&block->b_call->a_args.cookie));
if (block->b_file == file && nlm_compare_locks(fl, &lock->fl)) {
kref_get(&block->b_count);
@@ -244,7 +245,7 @@ nlmsvc_create_block(struct svc_rqst *rqstp, struct nlm_host *host,
goto failed_free;
/* Set notifier function for VFS, and init args */
- call->a_args.lock.fl.fl_flags |= FL_SLEEP;
+ call->a_args.lock.fl.c.flc_flags |= FL_SLEEP;
call->a_args.lock.fl.fl_lmops = &nlmsvc_lock_operations;
nlmclnt_next_cookie(&call->a_args.cookie);
@@ -402,14 +403,14 @@ static struct nlm_lockowner *nlmsvc_find_lockowner(struct nlm_host *host, pid_t
void
nlmsvc_release_lockowner(struct nlm_lock *lock)
{
- if (lock->fl.fl_owner)
- nlmsvc_put_lockowner(lock->fl.fl_owner);
+ if (lock->fl.c.flc_owner)
+ nlmsvc_put_lockowner(lock->fl.c.flc_owner);
}
void nlmsvc_locks_init_private(struct file_lock *fl, struct nlm_host *host,
pid_t pid)
{
- fl->fl_owner = nlmsvc_find_lockowner(host, pid);
+ fl->c.flc_owner = nlmsvc_find_lockowner(host, pid);
}
/*
@@ -425,7 +426,7 @@ static int nlmsvc_setgrantargs(struct nlm_rqst *call, struct nlm_lock *lock)
/* set default data area */
call->a_args.lock.oh.data = call->a_owner;
- call->a_args.lock.svid = ((struct nlm_lockowner *)lock->fl.fl_owner)->pid;
+ call->a_args.lock.svid = ((struct nlm_lockowner *) lock->fl.c.flc_owner)->pid;
if (lock->oh.len > NLMCLNT_OHSIZE) {
void *data = kmalloc(lock->oh.len, GFP_KERNEL);
@@ -489,7 +490,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
dprintk("lockd: nlmsvc_lock(%s/%ld, ty=%d, pi=%d, %Ld-%Ld, bl=%d)\n",
inode->i_sb->s_id, inode->i_ino,
- lock->fl.fl_type, lock->fl.fl_pid,
+ lock->fl.c.flc_type,
+ lock->fl.c.flc_pid,
(long long)lock->fl.fl_start,
(long long)lock->fl.fl_end,
wait);
@@ -512,7 +514,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
goto out;
lock = &block->b_call->a_args.lock;
} else
- lock->fl.fl_flags &= ~FL_SLEEP;
+ lock->fl.c.flc_flags &= ~FL_SLEEP;
if (block->b_flags & B_QUEUED) {
dprintk("lockd: nlmsvc_lock deferred block %p flags %d\n",
@@ -560,10 +562,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
spin_unlock(&nlm_blocked_lock);
if (!wait)
- lock->fl.fl_flags &= ~FL_SLEEP;
+ lock->fl.c.flc_flags &= ~FL_SLEEP;
mode = lock_to_openmode(&lock->fl);
error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL);
- lock->fl.fl_flags &= ~FL_SLEEP;
+ lock->fl.c.flc_flags &= ~FL_SLEEP;
dprintk("lockd: vfs_lock_file returned %d\n", error);
switch (error) {
@@ -616,7 +618,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
dprintk("lockd: nlmsvc_testlock(%s/%ld, ty=%d, %Ld-%Ld)\n",
nlmsvc_file_inode(file)->i_sb->s_id,
nlmsvc_file_inode(file)->i_ino,
- lock->fl.fl_type,
+ lock->fl.c.flc_type,
(long long)lock->fl.fl_start,
(long long)lock->fl.fl_end);
@@ -636,19 +638,19 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
goto out;
}
- if (lock->fl.fl_type == F_UNLCK) {
+ if (lock->fl.c.flc_type == F_UNLCK) {
ret = nlm_granted;
goto out;
}
dprintk("lockd: conflicting lock(ty=%d, %Ld-%Ld)\n",
- lock->fl.fl_type, (long long)lock->fl.fl_start,
+ lock->fl.c.flc_type, (long long)lock->fl.fl_start,
(long long)lock->fl.fl_end);
conflock->caller = "somehost"; /* FIXME */
conflock->len = strlen(conflock->caller);
conflock->oh.len = 0; /* don't return OH info */
- conflock->svid = lock->fl.fl_pid;
- conflock->fl.fl_type = lock->fl.fl_type;
+ conflock->svid = lock->fl.c.flc_pid;
+ conflock->fl.c.flc_type = lock->fl.c.flc_type;
conflock->fl.fl_start = lock->fl.fl_start;
conflock->fl.fl_end = lock->fl.fl_end;
locks_release_private(&lock->fl);
@@ -673,21 +675,21 @@ nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock)
dprintk("lockd: nlmsvc_unlock(%s/%ld, pi=%d, %Ld-%Ld)\n",
nlmsvc_file_inode(file)->i_sb->s_id,
nlmsvc_file_inode(file)->i_ino,
- lock->fl.fl_pid,
+ lock->fl.c.flc_pid,
(long long)lock->fl.fl_start,
(long long)lock->fl.fl_end);
/* First, cancel any lock that might be there */
nlmsvc_cancel_blocked(net, file, lock);
- lock->fl.fl_type = F_UNLCK;
- lock->fl.fl_file = file->f_file[O_RDONLY];
- if (lock->fl.fl_file)
- error = vfs_lock_file(lock->fl.fl_file, F_SETLK,
+ lock->fl.c.flc_type = F_UNLCK;
+ lock->fl.c.flc_file = file->f_file[O_RDONLY];
+ if (lock->fl.c.flc_file)
+ error = vfs_lock_file(lock->fl.c.flc_file, F_SETLK,
&lock->fl, NULL);
- lock->fl.fl_file = file->f_file[O_WRONLY];
- if (lock->fl.fl_file)
- error |= vfs_lock_file(lock->fl.fl_file, F_SETLK,
+ lock->fl.c.flc_file = file->f_file[O_WRONLY];
+ if (lock->fl.c.flc_file)
+ error |= vfs_lock_file(lock->fl.c.flc_file, F_SETLK,
&lock->fl, NULL);
return (error < 0)? nlm_lck_denied_nolocks : nlm_granted;
@@ -710,7 +712,7 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l
dprintk("lockd: nlmsvc_cancel(%s/%ld, pi=%d, %Ld-%Ld)\n",
nlmsvc_file_inode(file)->i_sb->s_id,
nlmsvc_file_inode(file)->i_ino,
- lock->fl.fl_pid,
+ lock->fl.c.flc_pid,
(long long)lock->fl.fl_start,
(long long)lock->fl.fl_end);
@@ -863,12 +865,12 @@ nlmsvc_grant_blocked(struct nlm_block *block)
/* vfs_lock_file() can mangle fl_start and fl_end, but we need
* them unchanged for the GRANT_MSG
*/
- lock->fl.fl_flags |= FL_SLEEP;
+ lock->fl.c.flc_flags |= FL_SLEEP;
fl_start = lock->fl.fl_start;
fl_end = lock->fl.fl_end;
mode = lock_to_openmode(&lock->fl);
error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL);
- lock->fl.fl_flags &= ~FL_SLEEP;
+ lock->fl.c.flc_flags &= ~FL_SLEEP;
lock->fl.fl_start = fl_start;
lock->fl.fl_end = fl_end;
@@ -993,8 +995,8 @@ nlmsvc_grant_reply(struct nlm_cookie *cookie, __be32 status)
/* Client doesn't want it, just unlock it */
nlmsvc_unlink_block(block);
fl = &block->b_call->a_args.lock.fl;
- fl->fl_type = F_UNLCK;
- error = vfs_lock_file(fl->fl_file, F_SETLK, fl, NULL);
+ fl->c.flc_type = F_UNLCK;
+ error = vfs_lock_file(fl->c.flc_file, F_SETLK, fl, NULL);
if (error)
pr_warn("lockd: unable to unlock lock rejected by client!\n");
break;
diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
index 32784f5..a03220e 100644
--- a/fs/lockd/svcproc.c
+++ b/fs/lockd/svcproc.c
@@ -77,12 +77,12 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
/* Set up the missing parts of the file_lock structure */
mode = lock_to_openmode(&lock->fl);
- lock->fl.fl_flags = FL_POSIX;
- lock->fl.fl_file = file->f_file[mode];
- lock->fl.fl_pid = current->tgid;
+ lock->fl.c.flc_flags = FL_POSIX;
+ lock->fl.c.flc_file = file->f_file[mode];
+ lock->fl.c.flc_pid = current->tgid;
lock->fl.fl_lmops = &nlmsvc_lock_operations;
nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid);
- if (!lock->fl.fl_owner) {
+ if (!lock->fl.c.flc_owner) {
/* lockowner allocation has failed */
nlmsvc_release_host(host);
return nlm_lck_denied_nolocks;
@@ -127,7 +127,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
if ((resp->status = nlmsvc_retrieve_args(rqstp, argp, &host, &file)))
return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
- test_owner = argp->lock.fl.fl_owner;
+ test_owner = argp->lock.fl.c.flc_owner;
/* Now check for conflicting locks */
resp->status = cast_status(nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie));
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index e3b6229..9103896 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -73,7 +73,7 @@ static inline unsigned int file_hash(struct nfs_fh *f)
int lock_to_openmode(struct file_lock *lock)
{
- return (lock->fl_type == F_WRLCK) ? O_WRONLY : O_RDONLY;
+ return lock_is_write(lock) ? O_WRONLY : O_RDONLY;
}
/*
@@ -181,18 +181,18 @@ static int nlm_unlock_files(struct nlm_file *file, const struct file_lock *fl)
struct file_lock lock;
locks_init_lock(&lock);
- lock.fl_type = F_UNLCK;
+ lock.c.flc_type = F_UNLCK;
lock.fl_start = 0;
lock.fl_end = OFFSET_MAX;
- lock.fl_owner = fl->fl_owner;
- lock.fl_pid = fl->fl_pid;
- lock.fl_flags = FL_POSIX;
+ lock.c.flc_owner = fl->c.flc_owner;
+ lock.c.flc_pid = fl->c.flc_pid;
+ lock.c.flc_flags = FL_POSIX;
- lock.fl_file = file->f_file[O_RDONLY];
- if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL))
+ lock.c.flc_file = file->f_file[O_RDONLY];
+ if (lock.c.flc_file && vfs_lock_file(lock.c.flc_file, F_SETLK, &lock, NULL))
goto out_err;
- lock.fl_file = file->f_file[O_WRONLY];
- if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL))
+ lock.c.flc_file = file->f_file[O_WRONLY];
+ if (lock.c.flc_file && vfs_lock_file(lock.c.flc_file, F_SETLK, &lock, NULL))
goto out_err;
return 0;
out_err:
@@ -218,14 +218,14 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,
again:
file->f_locks = 0;
spin_lock(&flctx->flc_lock);
- list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
+ for_each_file_lock(fl, &flctx->flc_posix) {
if (fl->fl_lmops != &nlmsvc_lock_operations)
continue;
/* update current lock count */
file->f_locks++;
- lockhost = ((struct nlm_lockowner *)fl->fl_owner)->host;
+ lockhost = ((struct nlm_lockowner *) fl->c.flc_owner)->host;
if (match(lockhost, host)) {
spin_unlock(&flctx->flc_lock);
@@ -272,7 +272,7 @@ nlm_file_inuse(struct nlm_file *file)
if (flctx && !list_empty_careful(&flctx->flc_posix)) {
spin_lock(&flctx->flc_lock);
- list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
+ for_each_file_lock(fl, &flctx->flc_posix) {
if (fl->fl_lmops == &nlmsvc_lock_operations) {
spin_unlock(&flctx->flc_lock);
return 1;
diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c
index 2fb5748..adfcce2 100644
--- a/fs/lockd/xdr.c
+++ b/fs/lockd/xdr.c
@@ -88,8 +88,8 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock)
return false;
locks_init_lock(fl);
- fl->fl_flags = FL_POSIX;
- fl->fl_type = F_RDLCK;
+ fl->c.flc_flags = FL_POSIX;
+ fl->c.flc_type = F_RDLCK;
end = start + len - 1;
fl->fl_start = s32_to_loff_t(start);
if (len == 0 || end < 0)
@@ -107,7 +107,7 @@ svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock)
s32 start, len;
/* exclusive */
- if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0)
+ if (xdr_stream_encode_bool(xdr, fl->c.flc_type != F_RDLCK) < 0)
return false;
if (xdr_stream_encode_u32(xdr, lock->svid) < 0)
return false;
@@ -164,7 +164,7 @@ nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
if (!svcxdr_decode_lock(xdr, &argp->lock))
return false;
if (exclusive)
- argp->lock.fl.fl_type = F_WRLCK;
+ argp->lock.fl.c.flc_type = F_WRLCK;
return true;
}
@@ -184,7 +184,7 @@ nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
if (!svcxdr_decode_lock(xdr, &argp->lock))
return false;
if (exclusive)
- argp->lock.fl.fl_type = F_WRLCK;
+ argp->lock.fl.c.flc_type = F_WRLCK;
if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0)
return false;
if (xdr_stream_decode_u32(xdr, &argp->state) < 0)
@@ -209,7 +209,7 @@ nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
if (!svcxdr_decode_lock(xdr, &argp->lock))
return false;
if (exclusive)
- argp->lock.fl.fl_type = F_WRLCK;
+ argp->lock.fl.c.flc_type = F_WRLCK;
return true;
}
@@ -223,7 +223,7 @@ nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
return false;
if (!svcxdr_decode_lock(xdr, &argp->lock))
return false;
- argp->lock.fl.fl_type = F_UNLCK;
+ argp->lock.fl.c.flc_type = F_UNLCK;
return true;
}
diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
index 5fcbf30c..3d28b9c 100644
--- a/fs/lockd/xdr4.c
+++ b/fs/lockd/xdr4.c
@@ -89,8 +89,8 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock)
return false;
locks_init_lock(fl);
- fl->fl_flags = FL_POSIX;
- fl->fl_type = F_RDLCK;
+ fl->c.flc_flags = FL_POSIX;
+ fl->c.flc_type = F_RDLCK;
nlm4svc_set_file_lock_range(fl, lock->lock_start, lock->lock_len);
return true;
}
@@ -102,7 +102,7 @@ svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock)
s64 start, len;
/* exclusive */
- if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0)
+ if (xdr_stream_encode_bool(xdr, fl->c.flc_type != F_RDLCK) < 0)
return false;
if (xdr_stream_encode_u32(xdr, lock->svid) < 0)
return false;
@@ -159,7 +159,7 @@ nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
if (!svcxdr_decode_lock(xdr, &argp->lock))
return false;
if (exclusive)
- argp->lock.fl.fl_type = F_WRLCK;
+ argp->lock.fl.c.flc_type = F_WRLCK;
return true;
}
@@ -179,7 +179,7 @@ nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
if (!svcxdr_decode_lock(xdr, &argp->lock))
return false;
if (exclusive)
- argp->lock.fl.fl_type = F_WRLCK;
+ argp->lock.fl.c.flc_type = F_WRLCK;
if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0)
return false;
if (xdr_stream_decode_u32(xdr, &argp->state) < 0)
@@ -204,7 +204,7 @@ nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
if (!svcxdr_decode_lock(xdr, &argp->lock))
return false;
if (exclusive)
- argp->lock.fl.fl_type = F_WRLCK;
+ argp->lock.fl.c.flc_type = F_WRLCK;
return true;
}
@@ -218,7 +218,7 @@ nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
return false;
if (!svcxdr_decode_lock(xdr, &argp->lock))
return false;
- argp->lock.fl.fl_type = F_UNLCK;
+ argp->lock.fl.c.flc_type = F_UNLCK;
return true;
}
diff --git a/fs/locks.c b/fs/locks.c
index cc7c117..90c8746 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -48,7 +48,6 @@
* children.
*
*/
-
#include <linux/capability.h>
#include <linux/file.h>
#include <linux/fdtable.h>
@@ -70,24 +69,28 @@
#include <linux/uaccess.h>
-#define IS_POSIX(fl) (fl->fl_flags & FL_POSIX)
-#define IS_FLOCK(fl) (fl->fl_flags & FL_FLOCK)
-#define IS_LEASE(fl) (fl->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
-#define IS_OFDLCK(fl) (fl->fl_flags & FL_OFDLCK)
-#define IS_REMOTELCK(fl) (fl->fl_pid <= 0)
-
-static bool lease_breaking(struct file_lock *fl)
+static struct file_lock *file_lock(struct file_lock_core *flc)
{
- return fl->fl_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
+ return container_of(flc, struct file_lock, c);
}
-static int target_leasetype(struct file_lock *fl)
+static struct file_lease *file_lease(struct file_lock_core *flc)
{
- if (fl->fl_flags & FL_UNLOCK_PENDING)
+ return container_of(flc, struct file_lease, c);
+}
+
+static bool lease_breaking(struct file_lease *fl)
+{
+ return fl->c.flc_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
+}
+
+static int target_leasetype(struct file_lease *fl)
+{
+ if (fl->c.flc_flags & FL_UNLOCK_PENDING)
return F_UNLCK;
- if (fl->fl_flags & FL_DOWNGRADE_PENDING)
+ if (fl->c.flc_flags & FL_DOWNGRADE_PENDING)
return F_RDLCK;
- return fl->fl_type;
+ return fl->c.flc_type;
}
static int leases_enable = 1;
@@ -168,6 +171,7 @@ static DEFINE_SPINLOCK(blocked_lock_lock);
static struct kmem_cache *flctx_cache __ro_after_init;
static struct kmem_cache *filelock_cache __ro_after_init;
+static struct kmem_cache *filelease_cache __ro_after_init;
static struct file_lock_context *
locks_get_lock_context(struct inode *inode, int type)
@@ -204,11 +208,12 @@ locks_get_lock_context(struct inode *inode, int type)
static void
locks_dump_ctx_list(struct list_head *list, char *list_type)
{
- struct file_lock *fl;
+ struct file_lock_core *flc;
- list_for_each_entry(fl, list, fl_list) {
- pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
- }
+ list_for_each_entry(flc, list, flc_list)
+ pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n",
+ list_type, flc->flc_owner, flc->flc_flags,
+ flc->flc_type, flc->flc_pid);
}
static void
@@ -229,19 +234,19 @@ locks_check_ctx_lists(struct inode *inode)
}
static void
-locks_check_ctx_file_list(struct file *filp, struct list_head *list,
- char *list_type)
+locks_check_ctx_file_list(struct file *filp, struct list_head *list, char *list_type)
{
- struct file_lock *fl;
+ struct file_lock_core *flc;
struct inode *inode = file_inode(filp);
- list_for_each_entry(fl, list, fl_list)
- if (fl->fl_file == filp)
+ list_for_each_entry(flc, list, flc_list)
+ if (flc->flc_file == filp)
pr_warn("Leaked %s lock on dev=0x%x:0x%x ino=0x%lx "
" fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n",
list_type, MAJOR(inode->i_sb->s_dev),
MINOR(inode->i_sb->s_dev), inode->i_ino,
- fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
+ flc->flc_owner, flc->flc_flags,
+ flc->flc_type, flc->flc_pid);
}
void
@@ -255,13 +260,13 @@ locks_free_lock_context(struct inode *inode)
}
}
-static void locks_init_lock_heads(struct file_lock *fl)
+static void locks_init_lock_heads(struct file_lock_core *flc)
{
- INIT_HLIST_NODE(&fl->fl_link);
- INIT_LIST_HEAD(&fl->fl_list);
- INIT_LIST_HEAD(&fl->fl_blocked_requests);
- INIT_LIST_HEAD(&fl->fl_blocked_member);
- init_waitqueue_head(&fl->fl_wait);
+ INIT_HLIST_NODE(&flc->flc_link);
+ INIT_LIST_HEAD(&flc->flc_list);
+ INIT_LIST_HEAD(&flc->flc_blocked_requests);
+ INIT_LIST_HEAD(&flc->flc_blocked_member);
+ init_waitqueue_head(&flc->flc_wait);
}
/* Allocate an empty lock structure. */
@@ -270,19 +275,33 @@ struct file_lock *locks_alloc_lock(void)
struct file_lock *fl = kmem_cache_zalloc(filelock_cache, GFP_KERNEL);
if (fl)
- locks_init_lock_heads(fl);
+ locks_init_lock_heads(&fl->c);
return fl;
}
EXPORT_SYMBOL_GPL(locks_alloc_lock);
+/* Allocate an empty lock structure. */
+struct file_lease *locks_alloc_lease(void)
+{
+ struct file_lease *fl = kmem_cache_zalloc(filelease_cache, GFP_KERNEL);
+
+ if (fl)
+ locks_init_lock_heads(&fl->c);
+
+ return fl;
+}
+EXPORT_SYMBOL_GPL(locks_alloc_lease);
+
void locks_release_private(struct file_lock *fl)
{
- BUG_ON(waitqueue_active(&fl->fl_wait));
- BUG_ON(!list_empty(&fl->fl_list));
- BUG_ON(!list_empty(&fl->fl_blocked_requests));
- BUG_ON(!list_empty(&fl->fl_blocked_member));
- BUG_ON(!hlist_unhashed(&fl->fl_link));
+ struct file_lock_core *flc = &fl->c;
+
+ BUG_ON(waitqueue_active(&flc->flc_wait));
+ BUG_ON(!list_empty(&flc->flc_list));
+ BUG_ON(!list_empty(&flc->flc_blocked_requests));
+ BUG_ON(!list_empty(&flc->flc_blocked_member));
+ BUG_ON(!hlist_unhashed(&flc->flc_link));
if (fl->fl_ops) {
if (fl->fl_ops->fl_release_private)
@@ -292,8 +311,8 @@ void locks_release_private(struct file_lock *fl)
if (fl->fl_lmops) {
if (fl->fl_lmops->lm_put_owner) {
- fl->fl_lmops->lm_put_owner(fl->fl_owner);
- fl->fl_owner = NULL;
+ fl->fl_lmops->lm_put_owner(flc->flc_owner);
+ flc->flc_owner = NULL;
}
fl->fl_lmops = NULL;
}
@@ -309,16 +328,15 @@ EXPORT_SYMBOL_GPL(locks_release_private);
* %true: @owner has at least one blocker
* %false: @owner has no blockers
*/
-bool locks_owner_has_blockers(struct file_lock_context *flctx,
- fl_owner_t owner)
+bool locks_owner_has_blockers(struct file_lock_context *flctx, fl_owner_t owner)
{
- struct file_lock *fl;
+ struct file_lock_core *flc;
spin_lock(&flctx->flc_lock);
- list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
- if (fl->fl_owner != owner)
+ list_for_each_entry(flc, &flctx->flc_posix, flc_list) {
+ if (flc->flc_owner != owner)
continue;
- if (!list_empty(&fl->fl_blocked_requests)) {
+ if (!list_empty(&flc->flc_blocked_requests)) {
spin_unlock(&flctx->flc_lock);
return true;
}
@@ -336,35 +354,52 @@ void locks_free_lock(struct file_lock *fl)
}
EXPORT_SYMBOL(locks_free_lock);
+/* Free a lease which is not in use. */
+void locks_free_lease(struct file_lease *fl)
+{
+ kmem_cache_free(filelease_cache, fl);
+}
+EXPORT_SYMBOL(locks_free_lease);
+
static void
locks_dispose_list(struct list_head *dispose)
{
- struct file_lock *fl;
+ struct file_lock_core *flc;
while (!list_empty(dispose)) {
- fl = list_first_entry(dispose, struct file_lock, fl_list);
- list_del_init(&fl->fl_list);
- locks_free_lock(fl);
+ flc = list_first_entry(dispose, struct file_lock_core, flc_list);
+ list_del_init(&flc->flc_list);
+ if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
+ locks_free_lease(file_lease(flc));
+ else
+ locks_free_lock(file_lock(flc));
}
}
void locks_init_lock(struct file_lock *fl)
{
memset(fl, 0, sizeof(struct file_lock));
- locks_init_lock_heads(fl);
+ locks_init_lock_heads(&fl->c);
}
EXPORT_SYMBOL(locks_init_lock);
+void locks_init_lease(struct file_lease *fl)
+{
+ memset(fl, 0, sizeof(*fl));
+ locks_init_lock_heads(&fl->c);
+}
+EXPORT_SYMBOL(locks_init_lease);
+
/*
* Initialize a new lock from an existing file_lock structure.
*/
void locks_copy_conflock(struct file_lock *new, struct file_lock *fl)
{
- new->fl_owner = fl->fl_owner;
- new->fl_pid = fl->fl_pid;
- new->fl_file = NULL;
- new->fl_flags = fl->fl_flags;
- new->fl_type = fl->fl_type;
+ new->c.flc_owner = fl->c.flc_owner;
+ new->c.flc_pid = fl->c.flc_pid;
+ new->c.flc_file = NULL;
+ new->c.flc_flags = fl->c.flc_flags;
+ new->c.flc_type = fl->c.flc_type;
new->fl_start = fl->fl_start;
new->fl_end = fl->fl_end;
new->fl_lmops = fl->fl_lmops;
@@ -372,7 +407,7 @@ void locks_copy_conflock(struct file_lock *new, struct file_lock *fl)
if (fl->fl_lmops) {
if (fl->fl_lmops->lm_get_owner)
- fl->fl_lmops->lm_get_owner(fl->fl_owner);
+ fl->fl_lmops->lm_get_owner(fl->c.flc_owner);
}
}
EXPORT_SYMBOL(locks_copy_conflock);
@@ -384,7 +419,7 @@ void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
locks_copy_conflock(new, fl);
- new->fl_file = fl->fl_file;
+ new->c.flc_file = fl->c.flc_file;
new->fl_ops = fl->fl_ops;
if (fl->fl_ops) {
@@ -400,15 +435,17 @@ static void locks_move_blocks(struct file_lock *new, struct file_lock *fl)
/*
* As ctx->flc_lock is held, new requests cannot be added to
- * ->fl_blocked_requests, so we don't need a lock to check if it
+ * ->flc_blocked_requests, so we don't need a lock to check if it
* is empty.
*/
- if (list_empty(&fl->fl_blocked_requests))
+ if (list_empty(&fl->c.flc_blocked_requests))
return;
spin_lock(&blocked_lock_lock);
- list_splice_init(&fl->fl_blocked_requests, &new->fl_blocked_requests);
- list_for_each_entry(f, &new->fl_blocked_requests, fl_blocked_member)
- f->fl_blocker = new;
+ list_splice_init(&fl->c.flc_blocked_requests,
+ &new->c.flc_blocked_requests);
+ list_for_each_entry(f, &new->c.flc_blocked_requests,
+ c.flc_blocked_member)
+ f->c.flc_blocker = &new->c;
spin_unlock(&blocked_lock_lock);
}
@@ -429,21 +466,21 @@ static void flock_make_lock(struct file *filp, struct file_lock *fl, int type)
{
locks_init_lock(fl);
- fl->fl_file = filp;
- fl->fl_owner = filp;
- fl->fl_pid = current->tgid;
- fl->fl_flags = FL_FLOCK;
- fl->fl_type = type;
+ fl->c.flc_file = filp;
+ fl->c.flc_owner = filp;
+ fl->c.flc_pid = current->tgid;
+ fl->c.flc_flags = FL_FLOCK;
+ fl->c.flc_type = type;
fl->fl_end = OFFSET_MAX;
}
-static int assign_type(struct file_lock *fl, int type)
+static int assign_type(struct file_lock_core *flc, int type)
{
switch (type) {
case F_RDLCK:
case F_WRLCK:
case F_UNLCK:
- fl->fl_type = type;
+ flc->flc_type = type;
break;
default:
return -EINVAL;
@@ -488,14 +525,14 @@ static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl,
} else
fl->fl_end = OFFSET_MAX;
- fl->fl_owner = current->files;
- fl->fl_pid = current->tgid;
- fl->fl_file = filp;
- fl->fl_flags = FL_POSIX;
+ fl->c.flc_owner = current->files;
+ fl->c.flc_pid = current->tgid;
+ fl->c.flc_file = filp;
+ fl->c.flc_flags = FL_POSIX;
fl->fl_ops = NULL;
fl->fl_lmops = NULL;
- return assign_type(fl, l->l_type);
+ return assign_type(&fl->c, l->l_type);
}
/* Verify a "struct flock" and copy it to a "struct file_lock" as a POSIX
@@ -516,16 +553,16 @@ static int flock_to_posix_lock(struct file *filp, struct file_lock *fl,
/* default lease lock manager operations */
static bool
-lease_break_callback(struct file_lock *fl)
+lease_break_callback(struct file_lease *fl)
{
kill_fasync(&fl->fl_fasync, SIGIO, POLL_MSG);
return false;
}
static void
-lease_setup(struct file_lock *fl, void **priv)
+lease_setup(struct file_lease *fl, void **priv)
{
- struct file *filp = fl->fl_file;
+ struct file *filp = fl->c.flc_file;
struct fasync_struct *fa = *priv;
/*
@@ -539,7 +576,7 @@ lease_setup(struct file_lock *fl, void **priv)
__f_setown(filp, task_pid(current), PIDTYPE_TGID, 0);
}
-static const struct lock_manager_operations lease_manager_ops = {
+static const struct lease_manager_operations lease_manager_ops = {
.lm_break = lease_break_callback,
.lm_change = lease_modify,
.lm_setup = lease_setup,
@@ -548,27 +585,24 @@ static const struct lock_manager_operations lease_manager_ops = {
/*
* Initialize a lease, use the default lock manager operations
*/
-static int lease_init(struct file *filp, int type, struct file_lock *fl)
+static int lease_init(struct file *filp, int type, struct file_lease *fl)
{
- if (assign_type(fl, type) != 0)
+ if (assign_type(&fl->c, type) != 0)
return -EINVAL;
- fl->fl_owner = filp;
- fl->fl_pid = current->tgid;
+ fl->c.flc_owner = filp;
+ fl->c.flc_pid = current->tgid;
- fl->fl_file = filp;
- fl->fl_flags = FL_LEASE;
- fl->fl_start = 0;
- fl->fl_end = OFFSET_MAX;
- fl->fl_ops = NULL;
+ fl->c.flc_file = filp;
+ fl->c.flc_flags = FL_LEASE;
fl->fl_lmops = &lease_manager_ops;
return 0;
}
/* Allocate a file_lock initialised to this type of lease */
-static struct file_lock *lease_alloc(struct file *filp, int type)
+static struct file_lease *lease_alloc(struct file *filp, int type)
{
- struct file_lock *fl = locks_alloc_lock();
+ struct file_lease *fl = locks_alloc_lease();
int error = -ENOMEM;
if (fl == NULL)
@@ -576,7 +610,7 @@ static struct file_lock *lease_alloc(struct file *filp, int type)
error = lease_init(filp, type, fl);
if (error) {
- locks_free_lock(fl);
+ locks_free_lease(fl);
return ERR_PTR(error);
}
return fl;
@@ -593,26 +627,26 @@ static inline int locks_overlap(struct file_lock *fl1, struct file_lock *fl2)
/*
* Check whether two locks have the same owner.
*/
-static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
+static int posix_same_owner(struct file_lock_core *fl1, struct file_lock_core *fl2)
{
- return fl1->fl_owner == fl2->fl_owner;
+ return fl1->flc_owner == fl2->flc_owner;
}
/* Must be called with the flc_lock held! */
-static void locks_insert_global_locks(struct file_lock *fl)
+static void locks_insert_global_locks(struct file_lock_core *flc)
{
struct file_lock_list_struct *fll = this_cpu_ptr(&file_lock_list);
percpu_rwsem_assert_held(&file_rwsem);
spin_lock(&fll->lock);
- fl->fl_link_cpu = smp_processor_id();
- hlist_add_head(&fl->fl_link, &fll->hlist);
+ flc->flc_link_cpu = smp_processor_id();
+ hlist_add_head(&flc->flc_link, &fll->hlist);
spin_unlock(&fll->lock);
}
/* Must be called with the flc_lock held! */
-static void locks_delete_global_locks(struct file_lock *fl)
+static void locks_delete_global_locks(struct file_lock_core *flc)
{
struct file_lock_list_struct *fll;
@@ -623,33 +657,33 @@ static void locks_delete_global_locks(struct file_lock *fl)
* is done while holding the flc_lock, and new insertions into the list
* also require that it be held.
*/
- if (hlist_unhashed(&fl->fl_link))
+ if (hlist_unhashed(&flc->flc_link))
return;
- fll = per_cpu_ptr(&file_lock_list, fl->fl_link_cpu);
+ fll = per_cpu_ptr(&file_lock_list, flc->flc_link_cpu);
spin_lock(&fll->lock);
- hlist_del_init(&fl->fl_link);
+ hlist_del_init(&flc->flc_link);
spin_unlock(&fll->lock);
}
static unsigned long
-posix_owner_key(struct file_lock *fl)
+posix_owner_key(struct file_lock_core *flc)
{
- return (unsigned long)fl->fl_owner;
+ return (unsigned long) flc->flc_owner;
}
-static void locks_insert_global_blocked(struct file_lock *waiter)
+static void locks_insert_global_blocked(struct file_lock_core *waiter)
{
lockdep_assert_held(&blocked_lock_lock);
- hash_add(blocked_hash, &waiter->fl_link, posix_owner_key(waiter));
+ hash_add(blocked_hash, &waiter->flc_link, posix_owner_key(waiter));
}
-static void locks_delete_global_blocked(struct file_lock *waiter)
+static void locks_delete_global_blocked(struct file_lock_core *waiter)
{
lockdep_assert_held(&blocked_lock_lock);
- hash_del(&waiter->fl_link);
+ hash_del(&waiter->flc_link);
}
/* Remove waiter from blocker's block list.
@@ -657,41 +691,39 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
*
* Must be called with blocked_lock_lock held.
*/
-static void __locks_delete_block(struct file_lock *waiter)
+static void __locks_unlink_block(struct file_lock_core *waiter)
{
locks_delete_global_blocked(waiter);
- list_del_init(&waiter->fl_blocked_member);
+ list_del_init(&waiter->flc_blocked_member);
}
-static void __locks_wake_up_blocks(struct file_lock *blocker)
+static void __locks_wake_up_blocks(struct file_lock_core *blocker)
{
- while (!list_empty(&blocker->fl_blocked_requests)) {
- struct file_lock *waiter;
+ while (!list_empty(&blocker->flc_blocked_requests)) {
+ struct file_lock_core *waiter;
+ struct file_lock *fl;
- waiter = list_first_entry(&blocker->fl_blocked_requests,
- struct file_lock, fl_blocked_member);
- __locks_delete_block(waiter);
- if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
- waiter->fl_lmops->lm_notify(waiter);
+ waiter = list_first_entry(&blocker->flc_blocked_requests,
+ struct file_lock_core, flc_blocked_member);
+
+ fl = file_lock(waiter);
+ __locks_unlink_block(waiter);
+ if ((waiter->flc_flags & (FL_POSIX | FL_FLOCK)) &&
+ fl->fl_lmops && fl->fl_lmops->lm_notify)
+ fl->fl_lmops->lm_notify(fl);
else
- wake_up(&waiter->fl_wait);
+ locks_wake_up(fl);
/*
- * The setting of fl_blocker to NULL marks the "done"
+ * The setting of flc_blocker to NULL marks the "done"
* point in deleting a block. Paired with acquire at the top
* of locks_delete_block().
*/
- smp_store_release(&waiter->fl_blocker, NULL);
+ smp_store_release(&waiter->flc_blocker, NULL);
}
}
-/**
- * locks_delete_block - stop waiting for a file lock
- * @waiter: the lock which was waiting
- *
- * lockd/nfsd need to disconnect the lock while working on it.
- */
-int locks_delete_block(struct file_lock *waiter)
+static int __locks_delete_block(struct file_lock_core *waiter)
{
int status = -ENOENT;
@@ -716,24 +748,35 @@ int locks_delete_block(struct file_lock *waiter)
* no new locks can be inserted into its fl_blocked_requests list, and
* can avoid doing anything further if the list is empty.
*/
- if (!smp_load_acquire(&waiter->fl_blocker) &&
- list_empty(&waiter->fl_blocked_requests))
+ if (!smp_load_acquire(&waiter->flc_blocker) &&
+ list_empty(&waiter->flc_blocked_requests))
return status;
spin_lock(&blocked_lock_lock);
- if (waiter->fl_blocker)
+ if (waiter->flc_blocker)
status = 0;
__locks_wake_up_blocks(waiter);
- __locks_delete_block(waiter);
+ __locks_unlink_block(waiter);
/*
* The setting of fl_blocker to NULL marks the "done" point in deleting
* a block. Paired with acquire at the top of this function.
*/
- smp_store_release(&waiter->fl_blocker, NULL);
+ smp_store_release(&waiter->flc_blocker, NULL);
spin_unlock(&blocked_lock_lock);
return status;
}
+
+/**
+ * locks_delete_block - stop waiting for a file lock
+ * @waiter: the lock which was waiting
+ *
+ * lockd/nfsd need to disconnect the lock while working on it.
+ */
+int locks_delete_block(struct file_lock *waiter)
+{
+ return __locks_delete_block(&waiter->c);
+}
EXPORT_SYMBOL(locks_delete_block);
/* Insert waiter into blocker's block list.
@@ -751,26 +794,28 @@ EXPORT_SYMBOL(locks_delete_block);
* waiters, and add beneath any waiter that blocks the new waiter.
* Thus wakeups don't happen until needed.
*/
-static void __locks_insert_block(struct file_lock *blocker,
- struct file_lock *waiter,
- bool conflict(struct file_lock *,
- struct file_lock *))
+static void __locks_insert_block(struct file_lock_core *blocker,
+ struct file_lock_core *waiter,
+ bool conflict(struct file_lock_core *,
+ struct file_lock_core *))
{
- struct file_lock *fl;
- BUG_ON(!list_empty(&waiter->fl_blocked_member));
+ struct file_lock_core *flc;
+ BUG_ON(!list_empty(&waiter->flc_blocked_member));
new_blocker:
- list_for_each_entry(fl, &blocker->fl_blocked_requests, fl_blocked_member)
- if (conflict(fl, waiter)) {
- blocker = fl;
+ list_for_each_entry(flc, &blocker->flc_blocked_requests, flc_blocked_member)
+ if (conflict(flc, waiter)) {
+ blocker = flc;
goto new_blocker;
}
- waiter->fl_blocker = blocker;
- list_add_tail(&waiter->fl_blocked_member, &blocker->fl_blocked_requests);
- if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
+ waiter->flc_blocker = blocker;
+ list_add_tail(&waiter->flc_blocked_member,
+ &blocker->flc_blocked_requests);
+
+ if ((blocker->flc_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)
locks_insert_global_blocked(waiter);
- /* The requests in waiter->fl_blocked are known to conflict with
+ /* The requests in waiter->flc_blocked are known to conflict with
* waiter, but might not conflict with blocker, or the requests
* and lock which block it. So they all need to be woken.
*/
@@ -778,10 +823,10 @@ static void __locks_insert_block(struct file_lock *blocker,
}
/* Must be called with flc_lock held. */
-static void locks_insert_block(struct file_lock *blocker,
- struct file_lock *waiter,
- bool conflict(struct file_lock *,
- struct file_lock *))
+static void locks_insert_block(struct file_lock_core *blocker,
+ struct file_lock_core *waiter,
+ bool conflict(struct file_lock_core *,
+ struct file_lock_core *))
{
spin_lock(&blocked_lock_lock);
__locks_insert_block(blocker, waiter, conflict);
@@ -793,7 +838,7 @@ static void locks_insert_block(struct file_lock *blocker,
*
* Must be called with the inode->flc_lock held!
*/
-static void locks_wake_up_blocks(struct file_lock *blocker)
+static void locks_wake_up_blocks(struct file_lock_core *blocker)
{
/*
* Avoid taking global lock if list is empty. This is safe since new
@@ -802,7 +847,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
* fl_blocked_requests list does not require the flc_lock, so we must
* recheck list_empty() after acquiring the blocked_lock_lock.
*/
- if (list_empty(&blocker->fl_blocked_requests))
+ if (list_empty(&blocker->flc_blocked_requests))
return;
spin_lock(&blocked_lock_lock);
@@ -811,39 +856,39 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
}
static void
-locks_insert_lock_ctx(struct file_lock *fl, struct list_head *before)
+locks_insert_lock_ctx(struct file_lock_core *fl, struct list_head *before)
{
- list_add_tail(&fl->fl_list, before);
+ list_add_tail(&fl->flc_list, before);
locks_insert_global_locks(fl);
}
static void
-locks_unlink_lock_ctx(struct file_lock *fl)
+locks_unlink_lock_ctx(struct file_lock_core *fl)
{
locks_delete_global_locks(fl);
- list_del_init(&fl->fl_list);
+ list_del_init(&fl->flc_list);
locks_wake_up_blocks(fl);
}
static void
-locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
+locks_delete_lock_ctx(struct file_lock_core *fl, struct list_head *dispose)
{
locks_unlink_lock_ctx(fl);
if (dispose)
- list_add(&fl->fl_list, dispose);
+ list_add(&fl->flc_list, dispose);
else
- locks_free_lock(fl);
+ locks_free_lock(file_lock(fl));
}
/* Determine if lock sys_fl blocks lock caller_fl. Common functionality
* checks for shared/exclusive status of overlapping locks.
*/
-static bool locks_conflict(struct file_lock *caller_fl,
- struct file_lock *sys_fl)
+static bool locks_conflict(struct file_lock_core *caller_flc,
+ struct file_lock_core *sys_flc)
{
- if (sys_fl->fl_type == F_WRLCK)
+ if (sys_flc->flc_type == F_WRLCK)
return true;
- if (caller_fl->fl_type == F_WRLCK)
+ if (caller_flc->flc_type == F_WRLCK)
return true;
return false;
}
@@ -851,20 +896,23 @@ static bool locks_conflict(struct file_lock *caller_fl,
/* Determine if lock sys_fl blocks lock caller_fl. POSIX specific
* checking before calling the locks_conflict().
*/
-static bool posix_locks_conflict(struct file_lock *caller_fl,
- struct file_lock *sys_fl)
+static bool posix_locks_conflict(struct file_lock_core *caller_flc,
+ struct file_lock_core *sys_flc)
{
+ struct file_lock *caller_fl = file_lock(caller_flc);
+ struct file_lock *sys_fl = file_lock(sys_flc);
+
/* POSIX locks owned by the same process do not conflict with
* each other.
*/
- if (posix_same_owner(caller_fl, sys_fl))
+ if (posix_same_owner(caller_flc, sys_flc))
return false;
/* Check whether they overlap */
if (!locks_overlap(caller_fl, sys_fl))
return false;
- return locks_conflict(caller_fl, sys_fl);
+ return locks_conflict(caller_flc, sys_flc);
}
/* Determine if lock sys_fl blocks lock caller_fl. Used on xx_GETLK
@@ -873,28 +921,31 @@ static bool posix_locks_conflict(struct file_lock *caller_fl,
static bool posix_test_locks_conflict(struct file_lock *caller_fl,
struct file_lock *sys_fl)
{
+ struct file_lock_core *caller = &caller_fl->c;
+ struct file_lock_core *sys = &sys_fl->c;
+
/* F_UNLCK checks any locks on the same fd. */
- if (caller_fl->fl_type == F_UNLCK) {
- if (!posix_same_owner(caller_fl, sys_fl))
+ if (lock_is_unlock(caller_fl)) {
+ if (!posix_same_owner(caller, sys))
return false;
return locks_overlap(caller_fl, sys_fl);
}
- return posix_locks_conflict(caller_fl, sys_fl);
+ return posix_locks_conflict(caller, sys);
}
/* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
* checking before calling the locks_conflict().
*/
-static bool flock_locks_conflict(struct file_lock *caller_fl,
- struct file_lock *sys_fl)
+static bool flock_locks_conflict(struct file_lock_core *caller_flc,
+ struct file_lock_core *sys_flc)
{
/* FLOCK locks referring to the same filp do not conflict with
* each other.
*/
- if (caller_fl->fl_file == sys_fl->fl_file)
+ if (caller_flc->flc_file == sys_flc->flc_file)
return false;
- return locks_conflict(caller_fl, sys_fl);
+ return locks_conflict(caller_flc, sys_flc);
}
void
@@ -908,13 +959,13 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
ctx = locks_inode_context(inode);
if (!ctx || list_empty_careful(&ctx->flc_posix)) {
- fl->fl_type = F_UNLCK;
+ fl->c.flc_type = F_UNLCK;
return;
}
retry:
spin_lock(&ctx->flc_lock);
- list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
+ list_for_each_entry(cfl, &ctx->flc_posix, c.flc_list) {
if (!posix_test_locks_conflict(fl, cfl))
continue;
if (cfl->fl_lmops && cfl->fl_lmops->lm_lock_expirable
@@ -930,7 +981,7 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
locks_copy_conflock(fl, cfl);
goto out;
}
- fl->fl_type = F_UNLCK;
+ fl->c.flc_type = F_UNLCK;
out:
spin_unlock(&ctx->flc_lock);
return;
@@ -972,25 +1023,27 @@ EXPORT_SYMBOL(posix_test_lock);
#define MAX_DEADLK_ITERATIONS 10
-/* Find a lock that the owner of the given block_fl is blocking on. */
-static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
+/* Find a lock that the owner of the given @blocker is blocking on. */
+static struct file_lock_core *what_owner_is_waiting_for(struct file_lock_core *blocker)
{
- struct file_lock *fl;
+ struct file_lock_core *flc;
- hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
- if (posix_same_owner(fl, block_fl)) {
- while (fl->fl_blocker)
- fl = fl->fl_blocker;
- return fl;
+ hash_for_each_possible(blocked_hash, flc, flc_link, posix_owner_key(blocker)) {
+ if (posix_same_owner(flc, blocker)) {
+ while (flc->flc_blocker)
+ flc = flc->flc_blocker;
+ return flc;
}
}
return NULL;
}
/* Must be called with the blocked_lock_lock held! */
-static int posix_locks_deadlock(struct file_lock *caller_fl,
- struct file_lock *block_fl)
+static bool posix_locks_deadlock(struct file_lock *caller_fl,
+ struct file_lock *block_fl)
{
+ struct file_lock_core *caller = &caller_fl->c;
+ struct file_lock_core *blocker = &block_fl->c;
int i = 0;
lockdep_assert_held(&blocked_lock_lock);
@@ -999,16 +1052,16 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
* This deadlock detector can't reasonably detect deadlocks with
* FL_OFDLCK locks, since they aren't owned by a process, per-se.
*/
- if (IS_OFDLCK(caller_fl))
- return 0;
+ if (caller->flc_flags & FL_OFDLCK)
+ return false;
- while ((block_fl = what_owner_is_waiting_for(block_fl))) {
+ while ((blocker = what_owner_is_waiting_for(blocker))) {
if (i++ > MAX_DEADLK_ITERATIONS)
- return 0;
- if (posix_same_owner(caller_fl, block_fl))
- return 1;
+ return false;
+ if (posix_same_owner(caller, blocker))
+ return true;
}
- return 0;
+ return false;
}
/* Try to create a FLOCK lock on filp. We always insert new FLOCK locks
@@ -1027,14 +1080,14 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
bool found = false;
LIST_HEAD(dispose);
- ctx = locks_get_lock_context(inode, request->fl_type);
+ ctx = locks_get_lock_context(inode, request->c.flc_type);
if (!ctx) {
- if (request->fl_type != F_UNLCK)
+ if (request->c.flc_type != F_UNLCK)
return -ENOMEM;
- return (request->fl_flags & FL_EXISTS) ? -ENOENT : 0;
+ return (request->c.flc_flags & FL_EXISTS) ? -ENOENT : 0;
}
- if (!(request->fl_flags & FL_ACCESS) && (request->fl_type != F_UNLCK)) {
+ if (!(request->c.flc_flags & FL_ACCESS) && (request->c.flc_type != F_UNLCK)) {
new_fl = locks_alloc_lock();
if (!new_fl)
return -ENOMEM;
@@ -1042,41 +1095,41 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
- if (request->fl_flags & FL_ACCESS)
+ if (request->c.flc_flags & FL_ACCESS)
goto find_conflict;
- list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
- if (request->fl_file != fl->fl_file)
+ list_for_each_entry(fl, &ctx->flc_flock, c.flc_list) {
+ if (request->c.flc_file != fl->c.flc_file)
continue;
- if (request->fl_type == fl->fl_type)
+ if (request->c.flc_type == fl->c.flc_type)
goto out;
found = true;
- locks_delete_lock_ctx(fl, &dispose);
+ locks_delete_lock_ctx(&fl->c, &dispose);
break;
}
- if (request->fl_type == F_UNLCK) {
- if ((request->fl_flags & FL_EXISTS) && !found)
+ if (lock_is_unlock(request)) {
+ if ((request->c.flc_flags & FL_EXISTS) && !found)
error = -ENOENT;
goto out;
}
find_conflict:
- list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
- if (!flock_locks_conflict(request, fl))
+ list_for_each_entry(fl, &ctx->flc_flock, c.flc_list) {
+ if (!flock_locks_conflict(&request->c, &fl->c))
continue;
error = -EAGAIN;
- if (!(request->fl_flags & FL_SLEEP))
+ if (!(request->c.flc_flags & FL_SLEEP))
goto out;
error = FILE_LOCK_DEFERRED;
- locks_insert_block(fl, request, flock_locks_conflict);
+ locks_insert_block(&fl->c, &request->c, flock_locks_conflict);
goto out;
}
- if (request->fl_flags & FL_ACCESS)
+ if (request->c.flc_flags & FL_ACCESS)
goto out;
locks_copy_lock(new_fl, request);
locks_move_blocks(new_fl, request);
- locks_insert_lock_ctx(new_fl, &ctx->flc_flock);
+ locks_insert_lock_ctx(&new_fl->c, &ctx->flc_flock);
new_fl = NULL;
error = 0;
@@ -1105,9 +1158,9 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
void *owner;
void (*func)(void);
- ctx = locks_get_lock_context(inode, request->fl_type);
+ ctx = locks_get_lock_context(inode, request->c.flc_type);
if (!ctx)
- return (request->fl_type == F_UNLCK) ? 0 : -ENOMEM;
+ return lock_is_unlock(request) ? 0 : -ENOMEM;
/*
* We may need two file_lock structures for this operation,
@@ -1115,8 +1168,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
*
* In some cases we can be sure, that no new locks will be needed
*/
- if (!(request->fl_flags & FL_ACCESS) &&
- (request->fl_type != F_UNLCK ||
+ if (!(request->c.flc_flags & FL_ACCESS) &&
+ (request->c.flc_type != F_UNLCK ||
request->fl_start != 0 || request->fl_end != OFFSET_MAX)) {
new_fl = locks_alloc_lock();
new_fl2 = locks_alloc_lock();
@@ -1130,9 +1183,9 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
* there are any, either return error or put the request on the
* blocker's list of waiters and the global blocked_hash.
*/
- if (request->fl_type != F_UNLCK) {
- list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
- if (!posix_locks_conflict(request, fl))
+ if (request->c.flc_type != F_UNLCK) {
+ list_for_each_entry(fl, &ctx->flc_posix, c.flc_list) {
+ if (!posix_locks_conflict(&request->c, &fl->c))
continue;
if (fl->fl_lmops && fl->fl_lmops->lm_lock_expirable
&& (*fl->fl_lmops->lm_lock_expirable)(fl)) {
@@ -1148,7 +1201,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
if (conflock)
locks_copy_conflock(conflock, fl);
error = -EAGAIN;
- if (!(request->fl_flags & FL_SLEEP))
+ if (!(request->c.flc_flags & FL_SLEEP))
goto out;
/*
* Deadlock detection and insertion into the blocked
@@ -1160,10 +1213,10 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
* Ensure that we don't find any locks blocked on this
* request during deadlock detection.
*/
- __locks_wake_up_blocks(request);
+ __locks_wake_up_blocks(&request->c);
if (likely(!posix_locks_deadlock(request, fl))) {
error = FILE_LOCK_DEFERRED;
- __locks_insert_block(fl, request,
+ __locks_insert_block(&fl->c, &request->c,
posix_locks_conflict);
}
spin_unlock(&blocked_lock_lock);
@@ -1173,22 +1226,22 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
/* If we're just looking for a conflict, we're done. */
error = 0;
- if (request->fl_flags & FL_ACCESS)
+ if (request->c.flc_flags & FL_ACCESS)
goto out;
/* Find the first old lock with the same owner as the new lock */
- list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
- if (posix_same_owner(request, fl))
+ list_for_each_entry(fl, &ctx->flc_posix, c.flc_list) {
+ if (posix_same_owner(&request->c, &fl->c))
break;
}
/* Process locks with this owner. */
- list_for_each_entry_safe_from(fl, tmp, &ctx->flc_posix, fl_list) {
- if (!posix_same_owner(request, fl))
+ list_for_each_entry_safe_from(fl, tmp, &ctx->flc_posix, c.flc_list) {
+ if (!posix_same_owner(&request->c, &fl->c))
break;
/* Detect adjacent or overlapping regions (if same lock type) */
- if (request->fl_type == fl->fl_type) {
+ if (request->c.flc_type == fl->c.flc_type) {
/* In all comparisons of start vs end, use
* "start - 1" rather than "end + 1". If end
* is OFFSET_MAX, end + 1 will become negative.
@@ -1215,7 +1268,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
else
request->fl_end = fl->fl_end;
if (added) {
- locks_delete_lock_ctx(fl, &dispose);
+ locks_delete_lock_ctx(&fl->c, &dispose);
continue;
}
request = fl;
@@ -1228,7 +1281,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
continue;
if (fl->fl_start > request->fl_end)
break;
- if (request->fl_type == F_UNLCK)
+ if (lock_is_unlock(request))
added = true;
if (fl->fl_start < request->fl_start)
left = fl;
@@ -1244,7 +1297,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
* one (This may happen several times).
*/
if (added) {
- locks_delete_lock_ctx(fl, &dispose);
+ locks_delete_lock_ctx(&fl->c, &dispose);
continue;
}
/*
@@ -1261,8 +1314,9 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
locks_move_blocks(new_fl, request);
request = new_fl;
new_fl = NULL;
- locks_insert_lock_ctx(request, &fl->fl_list);
- locks_delete_lock_ctx(fl, &dispose);
+ locks_insert_lock_ctx(&request->c,
+ &fl->c.flc_list);
+ locks_delete_lock_ctx(&fl->c, &dispose);
added = true;
}
}
@@ -1279,8 +1333,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
error = 0;
if (!added) {
- if (request->fl_type == F_UNLCK) {
- if (request->fl_flags & FL_EXISTS)
+ if (lock_is_unlock(request)) {
+ if (request->c.flc_flags & FL_EXISTS)
error = -ENOENT;
goto out;
}
@@ -1291,7 +1345,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
}
locks_copy_lock(new_fl, request);
locks_move_blocks(new_fl, request);
- locks_insert_lock_ctx(new_fl, &fl->fl_list);
+ locks_insert_lock_ctx(&new_fl->c, &fl->c.flc_list);
fl = new_fl;
new_fl = NULL;
}
@@ -1303,14 +1357,14 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
left = new_fl2;
new_fl2 = NULL;
locks_copy_lock(left, right);
- locks_insert_lock_ctx(left, &fl->fl_list);
+ locks_insert_lock_ctx(&left->c, &fl->c.flc_list);
}
right->fl_start = request->fl_end + 1;
- locks_wake_up_blocks(right);
+ locks_wake_up_blocks(&right->c);
}
if (left) {
left->fl_end = request->fl_start - 1;
- locks_wake_up_blocks(left);
+ locks_wake_up_blocks(&left->c);
}
out:
spin_unlock(&ctx->flc_lock);
@@ -1364,8 +1418,8 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
error = posix_lock_inode(inode, fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait,
- list_empty(&fl->fl_blocked_member));
+ error = wait_event_interruptible(fl->c.flc_wait,
+ list_empty(&fl->c.flc_blocked_member));
if (error)
break;
}
@@ -1373,37 +1427,37 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
return error;
}
-static void lease_clear_pending(struct file_lock *fl, int arg)
+static void lease_clear_pending(struct file_lease *fl, int arg)
{
switch (arg) {
case F_UNLCK:
- fl->fl_flags &= ~FL_UNLOCK_PENDING;
+ fl->c.flc_flags &= ~FL_UNLOCK_PENDING;
fallthrough;
case F_RDLCK:
- fl->fl_flags &= ~FL_DOWNGRADE_PENDING;
+ fl->c.flc_flags &= ~FL_DOWNGRADE_PENDING;
}
}
/* We already had a lease on this file; just change its type */
-int lease_modify(struct file_lock *fl, int arg, struct list_head *dispose)
+int lease_modify(struct file_lease *fl, int arg, struct list_head *dispose)
{
- int error = assign_type(fl, arg);
+ int error = assign_type(&fl->c, arg);
if (error)
return error;
lease_clear_pending(fl, arg);
- locks_wake_up_blocks(fl);
+ locks_wake_up_blocks(&fl->c);
if (arg == F_UNLCK) {
- struct file *filp = fl->fl_file;
+ struct file *filp = fl->c.flc_file;
f_delown(filp);
filp->f_owner.signum = 0;
- fasync_helper(0, fl->fl_file, 0, &fl->fl_fasync);
+ fasync_helper(0, fl->c.flc_file, 0, &fl->fl_fasync);
if (fl->fl_fasync != NULL) {
printk(KERN_ERR "locks_delete_lock: fasync == %p\n", fl->fl_fasync);
fl->fl_fasync = NULL;
}
- locks_delete_lock_ctx(fl, dispose);
+ locks_delete_lock_ctx(&fl->c, dispose);
}
return 0;
}
@@ -1420,11 +1474,11 @@ static bool past_time(unsigned long then)
static void time_out_leases(struct inode *inode, struct list_head *dispose)
{
struct file_lock_context *ctx = inode->i_flctx;
- struct file_lock *fl, *tmp;
+ struct file_lease *fl, *tmp;
lockdep_assert_held(&ctx->flc_lock);
- list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, fl_list) {
+ list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, c.flc_list) {
trace_time_out_leases(inode, fl);
if (past_time(fl->fl_downgrade_time))
lease_modify(fl, F_RDLCK, dispose);
@@ -1433,38 +1487,40 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
}
}
-static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
+static bool leases_conflict(struct file_lock_core *lc, struct file_lock_core *bc)
{
bool rc;
+ struct file_lease *lease = file_lease(lc);
+ struct file_lease *breaker = file_lease(bc);
if (lease->fl_lmops->lm_breaker_owns_lease
&& lease->fl_lmops->lm_breaker_owns_lease(lease))
return false;
- if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT)) {
+ if ((bc->flc_flags & FL_LAYOUT) != (lc->flc_flags & FL_LAYOUT)) {
rc = false;
goto trace;
}
- if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE)) {
+ if ((bc->flc_flags & FL_DELEG) && (lc->flc_flags & FL_LEASE)) {
rc = false;
goto trace;
}
- rc = locks_conflict(breaker, lease);
+ rc = locks_conflict(bc, lc);
trace:
trace_leases_conflict(rc, lease, breaker);
return rc;
}
static bool
-any_leases_conflict(struct inode *inode, struct file_lock *breaker)
+any_leases_conflict(struct inode *inode, struct file_lease *breaker)
{
struct file_lock_context *ctx = inode->i_flctx;
- struct file_lock *fl;
+ struct file_lock_core *flc;
lockdep_assert_held(&ctx->flc_lock);
- list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
- if (leases_conflict(fl, breaker))
+ list_for_each_entry(flc, &ctx->flc_lease, flc_list) {
+ if (leases_conflict(flc, &breaker->c))
return true;
}
return false;
@@ -1487,7 +1543,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
{
int error = 0;
struct file_lock_context *ctx;
- struct file_lock *new_fl, *fl, *tmp;
+ struct file_lease *new_fl, *fl, *tmp;
unsigned long break_time;
int want_write = (mode & O_ACCMODE) != O_RDONLY;
LIST_HEAD(dispose);
@@ -1495,7 +1551,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
if (IS_ERR(new_fl))
return PTR_ERR(new_fl);
- new_fl->fl_flags = type;
+ new_fl->c.flc_flags = type;
/* typically we will check that ctx is non-NULL before calling */
ctx = locks_inode_context(inode);
@@ -1519,22 +1575,22 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
break_time++; /* so that 0 means no break time */
}
- list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, fl_list) {
- if (!leases_conflict(fl, new_fl))
+ list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, c.flc_list) {
+ if (!leases_conflict(&fl->c, &new_fl->c))
continue;
if (want_write) {
- if (fl->fl_flags & FL_UNLOCK_PENDING)
+ if (fl->c.flc_flags & FL_UNLOCK_PENDING)
continue;
- fl->fl_flags |= FL_UNLOCK_PENDING;
+ fl->c.flc_flags |= FL_UNLOCK_PENDING;
fl->fl_break_time = break_time;
} else {
if (lease_breaking(fl))
continue;
- fl->fl_flags |= FL_DOWNGRADE_PENDING;
+ fl->c.flc_flags |= FL_DOWNGRADE_PENDING;
fl->fl_downgrade_time = break_time;
}
if (fl->fl_lmops->lm_break(fl))
- locks_delete_lock_ctx(fl, &dispose);
+ locks_delete_lock_ctx(&fl->c, &dispose);
}
if (list_empty(&ctx->flc_lease))
@@ -1547,26 +1603,26 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
}
restart:
- fl = list_first_entry(&ctx->flc_lease, struct file_lock, fl_list);
+ fl = list_first_entry(&ctx->flc_lease, struct file_lease, c.flc_list);
break_time = fl->fl_break_time;
if (break_time != 0)
break_time -= jiffies;
if (break_time == 0)
break_time++;
- locks_insert_block(fl, new_fl, leases_conflict);
+ locks_insert_block(&fl->c, &new_fl->c, leases_conflict);
trace_break_lease_block(inode, new_fl);
spin_unlock(&ctx->flc_lock);
percpu_up_read(&file_rwsem);
locks_dispose_list(&dispose);
- error = wait_event_interruptible_timeout(new_fl->fl_wait,
- list_empty(&new_fl->fl_blocked_member),
- break_time);
+ error = wait_event_interruptible_timeout(new_fl->c.flc_wait,
+ list_empty(&new_fl->c.flc_blocked_member),
+ break_time);
percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
trace_break_lease_unblock(inode, new_fl);
- locks_delete_block(new_fl);
+ __locks_delete_block(&new_fl->c);
if (error >= 0) {
/*
* Wait for the next conflicting lease that has not been
@@ -1583,7 +1639,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
percpu_up_read(&file_rwsem);
locks_dispose_list(&dispose);
free_lock:
- locks_free_lock(new_fl);
+ locks_free_lease(new_fl);
return error;
}
EXPORT_SYMBOL(__break_lease);
@@ -1601,14 +1657,14 @@ void lease_get_mtime(struct inode *inode, struct timespec64 *time)
{
bool has_lease = false;
struct file_lock_context *ctx;
- struct file_lock *fl;
+ struct file_lock_core *flc;
ctx = locks_inode_context(inode);
if (ctx && !list_empty_careful(&ctx->flc_lease)) {
spin_lock(&ctx->flc_lock);
- fl = list_first_entry_or_null(&ctx->flc_lease,
- struct file_lock, fl_list);
- if (fl && (fl->fl_type == F_WRLCK))
+ flc = list_first_entry_or_null(&ctx->flc_lease,
+ struct file_lock_core, flc_list);
+ if (flc && flc->flc_type == F_WRLCK)
has_lease = true;
spin_unlock(&ctx->flc_lock);
}
@@ -1643,7 +1699,7 @@ EXPORT_SYMBOL(lease_get_mtime);
*/
int fcntl_getlease(struct file *filp)
{
- struct file_lock *fl;
+ struct file_lease *fl;
struct inode *inode = file_inode(filp);
struct file_lock_context *ctx;
int type = F_UNLCK;
@@ -1654,8 +1710,8 @@ int fcntl_getlease(struct file *filp)
percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
time_out_leases(inode, &dispose);
- list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
- if (fl->fl_file != filp)
+ list_for_each_entry(fl, &ctx->flc_lease, c.flc_list) {
+ if (fl->c.flc_file != filp)
continue;
type = target_leasetype(fl);
break;
@@ -1715,12 +1771,12 @@ check_conflicting_open(struct file *filp, const int arg, int flags)
}
static int
-generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **priv)
+generic_add_lease(struct file *filp, int arg, struct file_lease **flp, void **priv)
{
- struct file_lock *fl, *my_fl = NULL, *lease;
+ struct file_lease *fl, *my_fl = NULL, *lease;
struct inode *inode = file_inode(filp);
struct file_lock_context *ctx;
- bool is_deleg = (*flp)->fl_flags & FL_DELEG;
+ bool is_deleg = (*flp)->c.flc_flags & FL_DELEG;
int error;
LIST_HEAD(dispose);
@@ -1746,7 +1802,7 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
time_out_leases(inode, &dispose);
- error = check_conflicting_open(filp, arg, lease->fl_flags);
+ error = check_conflicting_open(filp, arg, lease->c.flc_flags);
if (error)
goto out;
@@ -1759,9 +1815,9 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
* except for this filp.
*/
error = -EAGAIN;
- list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
- if (fl->fl_file == filp &&
- fl->fl_owner == lease->fl_owner) {
+ list_for_each_entry(fl, &ctx->flc_lease, c.flc_list) {
+ if (fl->c.flc_file == filp &&
+ fl->c.flc_owner == lease->c.flc_owner) {
my_fl = fl;
continue;
}
@@ -1776,7 +1832,7 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
* Modifying our existing lease is OK, but no getting a
* new lease if someone else is opening for write:
*/
- if (fl->fl_flags & FL_UNLOCK_PENDING)
+ if (fl->c.flc_flags & FL_UNLOCK_PENDING)
goto out;
}
@@ -1792,7 +1848,7 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
if (!leases_enable)
goto out;
- locks_insert_lock_ctx(lease, &ctx->flc_lease);
+ locks_insert_lock_ctx(&lease->c, &ctx->flc_lease);
/*
* The check in break_lease() is lockless. It's possible for another
* open to race in after we did the earlier check for a conflicting
@@ -1803,9 +1859,9 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
* precedes these checks.
*/
smp_mb();
- error = check_conflicting_open(filp, arg, lease->fl_flags);
+ error = check_conflicting_open(filp, arg, lease->c.flc_flags);
if (error) {
- locks_unlink_lock_ctx(lease);
+ locks_unlink_lock_ctx(&lease->c);
goto out;
}
@@ -1826,7 +1882,7 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
static int generic_delete_lease(struct file *filp, void *owner)
{
int error = -EAGAIN;
- struct file_lock *fl, *victim = NULL;
+ struct file_lease *fl, *victim = NULL;
struct inode *inode = file_inode(filp);
struct file_lock_context *ctx;
LIST_HEAD(dispose);
@@ -1839,9 +1895,9 @@ static int generic_delete_lease(struct file *filp, void *owner)
percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
- list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
- if (fl->fl_file == filp &&
- fl->fl_owner == owner) {
+ list_for_each_entry(fl, &ctx->flc_lease, c.flc_list) {
+ if (fl->c.flc_file == filp &&
+ fl->c.flc_owner == owner) {
victim = fl;
break;
}
@@ -1866,21 +1922,9 @@ static int generic_delete_lease(struct file *filp, void *owner)
* The (input) flp->fl_lmops->lm_break function is required
* by break_lease().
*/
-int generic_setlease(struct file *filp, int arg, struct file_lock **flp,
+int generic_setlease(struct file *filp, int arg, struct file_lease **flp,
void **priv)
{
- struct inode *inode = file_inode(filp);
- vfsuid_t vfsuid = i_uid_into_vfsuid(file_mnt_idmap(filp), inode);
- int error;
-
- if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE))
- return -EACCES;
- if (!S_ISREG(inode->i_mode))
- return -EINVAL;
- error = security_file_lock(filp, arg);
- if (error)
- return error;
-
switch (arg) {
case F_UNLCK:
return generic_delete_lease(filp, *priv);
@@ -1913,7 +1957,7 @@ lease_notifier_chain_init(void)
}
static inline void
-setlease_notifier(int arg, struct file_lock *lease)
+setlease_notifier(int arg, struct file_lease *lease)
{
if (arg != F_UNLCK)
srcu_notifier_call_chain(&lease_notifier_chain, arg, lease);
@@ -1931,6 +1975,19 @@ void lease_unregister_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL_GPL(lease_unregister_notifier);
+
+int
+kernel_setlease(struct file *filp, int arg, struct file_lease **lease, void **priv)
+{
+ if (lease)
+ setlease_notifier(arg, *lease);
+ if (filp->f_op->setlease)
+ return filp->f_op->setlease(filp, arg, lease, priv);
+ else
+ return generic_setlease(filp, arg, lease, priv);
+}
+EXPORT_SYMBOL_GPL(kernel_setlease);
+
/**
* vfs_setlease - sets a lease on an open file
* @filp: file pointer
@@ -1949,20 +2006,26 @@ EXPORT_SYMBOL_GPL(lease_unregister_notifier);
* may be NULL if the lm_setup operation doesn't require it.
*/
int
-vfs_setlease(struct file *filp, int arg, struct file_lock **lease, void **priv)
+vfs_setlease(struct file *filp, int arg, struct file_lease **lease, void **priv)
{
- if (lease)
- setlease_notifier(arg, *lease);
- if (filp->f_op->setlease)
- return filp->f_op->setlease(filp, arg, lease, priv);
- else
- return generic_setlease(filp, arg, lease, priv);
+ struct inode *inode = file_inode(filp);
+ vfsuid_t vfsuid = i_uid_into_vfsuid(file_mnt_idmap(filp), inode);
+ int error;
+
+ if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE))
+ return -EACCES;
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+ error = security_file_lock(filp, arg);
+ if (error)
+ return error;
+ return kernel_setlease(filp, arg, lease, priv);
}
EXPORT_SYMBOL_GPL(vfs_setlease);
static int do_fcntl_add_lease(unsigned int fd, struct file *filp, int arg)
{
- struct file_lock *fl;
+ struct file_lease *fl;
struct fasync_struct *new;
int error;
@@ -1972,14 +2035,14 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, int arg)
new = fasync_alloc();
if (!new) {
- locks_free_lock(fl);
+ locks_free_lease(fl);
return -ENOMEM;
}
new->fa_fd = fd;
error = vfs_setlease(filp, arg, &fl, (void **)&new);
if (fl)
- locks_free_lock(fl);
+ locks_free_lease(fl);
if (new)
fasync_free(new);
return error;
@@ -2017,8 +2080,8 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
error = flock_lock_inode(inode, fl);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait,
- list_empty(&fl->fl_blocked_member));
+ error = wait_event_interruptible(fl->c.flc_wait,
+ list_empty(&fl->c.flc_blocked_member));
if (error)
break;
}
@@ -2036,7 +2099,7 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl)
{
int res = 0;
- switch (fl->fl_flags & (FL_POSIX|FL_FLOCK)) {
+ switch (fl->c.flc_flags & (FL_POSIX|FL_FLOCK)) {
case FL_POSIX:
res = posix_lock_inode_wait(inode, fl);
break;
@@ -2098,13 +2161,13 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
flock_make_lock(f.file, &fl, type);
- error = security_file_lock(f.file, fl.fl_type);
+ error = security_file_lock(f.file, fl.c.flc_type);
if (error)
goto out_putf;
can_sleep = !(cmd & LOCK_NB);
if (can_sleep)
- fl.fl_flags |= FL_SLEEP;
+ fl.c.flc_flags |= FL_SLEEP;
if (f.file->f_op->flock)
error = f.file->f_op->flock(f.file,
@@ -2130,7 +2193,7 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
*/
int vfs_test_lock(struct file *filp, struct file_lock *fl)
{
- WARN_ON_ONCE(filp != fl->fl_file);
+ WARN_ON_ONCE(filp != fl->c.flc_file);
if (filp->f_op->lock)
return filp->f_op->lock(filp, F_GETLK, fl);
posix_test_lock(filp, fl);
@@ -2145,25 +2208,28 @@ EXPORT_SYMBOL_GPL(vfs_test_lock);
*
* Used to translate a fl_pid into a namespace virtual pid number
*/
-static pid_t locks_translate_pid(struct file_lock *fl, struct pid_namespace *ns)
+static pid_t locks_translate_pid(struct file_lock_core *fl, struct pid_namespace *ns)
{
pid_t vnr;
struct pid *pid;
- if (IS_OFDLCK(fl))
+ if (fl->flc_flags & FL_OFDLCK)
return -1;
- if (IS_REMOTELCK(fl))
- return fl->fl_pid;
+
+ /* Remote locks report a negative pid value */
+ if (fl->flc_pid <= 0)
+ return fl->flc_pid;
+
/*
* If the flock owner process is dead and its pid has been already
* freed, the translation below won't work, but we still want to show
* flock owner pid number in init pidns.
*/
if (ns == &init_pid_ns)
- return (pid_t)fl->fl_pid;
+ return (pid_t) fl->flc_pid;
rcu_read_lock();
- pid = find_pid_ns(fl->fl_pid, &init_pid_ns);
+ pid = find_pid_ns(fl->flc_pid, &init_pid_ns);
vnr = pid_nr_ns(pid, ns);
rcu_read_unlock();
return vnr;
@@ -2171,7 +2237,7 @@ static pid_t locks_translate_pid(struct file_lock *fl, struct pid_namespace *ns)
static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
{
- flock->l_pid = locks_translate_pid(fl, task_active_pid_ns(current));
+ flock->l_pid = locks_translate_pid(&fl->c, task_active_pid_ns(current));
#if BITS_PER_LONG == 32
/*
* Make sure we can represent the posix lock via
@@ -2186,19 +2252,19 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
fl->fl_end - fl->fl_start + 1;
flock->l_whence = 0;
- flock->l_type = fl->fl_type;
+ flock->l_type = fl->c.flc_type;
return 0;
}
#if BITS_PER_LONG == 32
static void posix_lock_to_flock64(struct flock64 *flock, struct file_lock *fl)
{
- flock->l_pid = locks_translate_pid(fl, task_active_pid_ns(current));
+ flock->l_pid = locks_translate_pid(&fl->c, task_active_pid_ns(current));
flock->l_start = fl->fl_start;
flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
fl->fl_end - fl->fl_start + 1;
flock->l_whence = 0;
- flock->l_type = fl->fl_type;
+ flock->l_type = fl->c.flc_type;
}
#endif
@@ -2227,16 +2293,16 @@ int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock *flock)
if (flock->l_pid != 0)
goto out;
- fl->fl_flags |= FL_OFDLCK;
- fl->fl_owner = filp;
+ fl->c.flc_flags |= FL_OFDLCK;
+ fl->c.flc_owner = filp;
}
error = vfs_test_lock(filp, fl);
if (error)
goto out;
- flock->l_type = fl->fl_type;
- if (fl->fl_type != F_UNLCK) {
+ flock->l_type = fl->c.flc_type;
+ if (fl->c.flc_type != F_UNLCK) {
error = posix_lock_to_flock(flock, fl);
if (error)
goto out;
@@ -2283,7 +2349,7 @@ int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock *flock)
*/
int vfs_lock_file(struct file *filp, unsigned int cmd, struct file_lock *fl, struct file_lock *conf)
{
- WARN_ON_ONCE(filp != fl->fl_file);
+ WARN_ON_ONCE(filp != fl->c.flc_file);
if (filp->f_op->lock)
return filp->f_op->lock(filp, cmd, fl);
else
@@ -2296,7 +2362,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
{
int error;
- error = security_file_lock(filp, fl->fl_type);
+ error = security_file_lock(filp, fl->c.flc_type);
if (error)
return error;
@@ -2304,8 +2370,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
error = vfs_lock_file(filp, cmd, fl, NULL);
if (error != FILE_LOCK_DEFERRED)
break;
- error = wait_event_interruptible(fl->fl_wait,
- list_empty(&fl->fl_blocked_member));
+ error = wait_event_interruptible(fl->c.flc_wait,
+ list_empty(&fl->c.flc_blocked_member));
if (error)
break;
}
@@ -2318,13 +2384,13 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
static int
check_fmode_for_setlk(struct file_lock *fl)
{
- switch (fl->fl_type) {
+ switch (fl->c.flc_type) {
case F_RDLCK:
- if (!(fl->fl_file->f_mode & FMODE_READ))
+ if (!(fl->c.flc_file->f_mode & FMODE_READ))
return -EBADF;
break;
case F_WRLCK:
- if (!(fl->fl_file->f_mode & FMODE_WRITE))
+ if (!(fl->c.flc_file->f_mode & FMODE_WRITE))
return -EBADF;
}
return 0;
@@ -2363,8 +2429,8 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
goto out;
cmd = F_SETLK;
- file_lock->fl_flags |= FL_OFDLCK;
- file_lock->fl_owner = filp;
+ file_lock->c.flc_flags |= FL_OFDLCK;
+ file_lock->c.flc_owner = filp;
break;
case F_OFD_SETLKW:
error = -EINVAL;
@@ -2372,11 +2438,11 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
goto out;
cmd = F_SETLKW;
- file_lock->fl_flags |= FL_OFDLCK;
- file_lock->fl_owner = filp;
+ file_lock->c.flc_flags |= FL_OFDLCK;
+ file_lock->c.flc_owner = filp;
fallthrough;
case F_SETLKW:
- file_lock->fl_flags |= FL_SLEEP;
+ file_lock->c.flc_flags |= FL_SLEEP;
}
error = do_lock_file_wait(filp, cmd, file_lock);
@@ -2386,8 +2452,8 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
* lock that was just acquired. There is no need to do that when we're
* unlocking though, or for OFD locks.
*/
- if (!error && file_lock->fl_type != F_UNLCK &&
- !(file_lock->fl_flags & FL_OFDLCK)) {
+ if (!error && file_lock->c.flc_type != F_UNLCK &&
+ !(file_lock->c.flc_flags & FL_OFDLCK)) {
struct files_struct *files = current->files;
/*
* We need that spin_lock here - it prevents reordering between
@@ -2398,7 +2464,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
f = files_lookup_fd_locked(files, fd);
spin_unlock(&files->file_lock);
if (f != filp) {
- file_lock->fl_type = F_UNLCK;
+ file_lock->c.flc_type = F_UNLCK;
error = do_lock_file_wait(filp, cmd, file_lock);
WARN_ON_ONCE(error);
error = -EBADF;
@@ -2437,16 +2503,16 @@ int fcntl_getlk64(struct file *filp, unsigned int cmd, struct flock64 *flock)
if (flock->l_pid != 0)
goto out;
- fl->fl_flags |= FL_OFDLCK;
- fl->fl_owner = filp;
+ fl->c.flc_flags |= FL_OFDLCK;
+ fl->c.flc_owner = filp;
}
error = vfs_test_lock(filp, fl);
if (error)
goto out;
- flock->l_type = fl->fl_type;
- if (fl->fl_type != F_UNLCK)
+ flock->l_type = fl->c.flc_type;
+ if (fl->c.flc_type != F_UNLCK)
posix_lock_to_flock64(flock, fl);
out:
@@ -2486,8 +2552,8 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
goto out;
cmd = F_SETLK64;
- file_lock->fl_flags |= FL_OFDLCK;
- file_lock->fl_owner = filp;
+ file_lock->c.flc_flags |= FL_OFDLCK;
+ file_lock->c.flc_owner = filp;
break;
case F_OFD_SETLKW:
error = -EINVAL;
@@ -2495,11 +2561,11 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
goto out;
cmd = F_SETLKW64;
- file_lock->fl_flags |= FL_OFDLCK;
- file_lock->fl_owner = filp;
+ file_lock->c.flc_flags |= FL_OFDLCK;
+ file_lock->c.flc_owner = filp;
fallthrough;
case F_SETLKW64:
- file_lock->fl_flags |= FL_SLEEP;
+ file_lock->c.flc_flags |= FL_SLEEP;
}
error = do_lock_file_wait(filp, cmd, file_lock);
@@ -2509,8 +2575,8 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
* lock that was just acquired. There is no need to do that when we're
* unlocking though, or for OFD locks.
*/
- if (!error && file_lock->fl_type != F_UNLCK &&
- !(file_lock->fl_flags & FL_OFDLCK)) {
+ if (!error && file_lock->c.flc_type != F_UNLCK &&
+ !(file_lock->c.flc_flags & FL_OFDLCK)) {
struct files_struct *files = current->files;
/*
* We need that spin_lock here - it prevents reordering between
@@ -2521,7 +2587,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
f = files_lookup_fd_locked(files, fd);
spin_unlock(&files->file_lock);
if (f != filp) {
- file_lock->fl_type = F_UNLCK;
+ file_lock->c.flc_type = F_UNLCK;
error = do_lock_file_wait(filp, cmd, file_lock);
WARN_ON_ONCE(error);
error = -EBADF;
@@ -2555,13 +2621,13 @@ void locks_remove_posix(struct file *filp, fl_owner_t owner)
return;
locks_init_lock(&lock);
- lock.fl_type = F_UNLCK;
- lock.fl_flags = FL_POSIX | FL_CLOSE;
+ lock.c.flc_type = F_UNLCK;
+ lock.c.flc_flags = FL_POSIX | FL_CLOSE;
lock.fl_start = 0;
lock.fl_end = OFFSET_MAX;
- lock.fl_owner = owner;
- lock.fl_pid = current->tgid;
- lock.fl_file = filp;
+ lock.c.flc_owner = owner;
+ lock.c.flc_pid = current->tgid;
+ lock.c.flc_file = filp;
lock.fl_ops = NULL;
lock.fl_lmops = NULL;
@@ -2584,7 +2650,7 @@ locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
return;
flock_make_lock(filp, &fl, F_UNLCK);
- fl.fl_flags |= FL_CLOSE;
+ fl.c.flc_flags |= FL_CLOSE;
if (filp->f_op->flock)
filp->f_op->flock(filp, F_SETLKW, &fl);
@@ -2599,7 +2665,7 @@ locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
static void
locks_remove_lease(struct file *filp, struct file_lock_context *ctx)
{
- struct file_lock *fl, *tmp;
+ struct file_lease *fl, *tmp;
LIST_HEAD(dispose);
if (list_empty(&ctx->flc_lease))
@@ -2607,8 +2673,8 @@ locks_remove_lease(struct file *filp, struct file_lock_context *ctx)
percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
- list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, fl_list)
- if (filp == fl->fl_file)
+ list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, c.flc_list)
+ if (filp == fl->c.flc_file)
lease_modify(fl, F_UNLCK, &dispose);
spin_unlock(&ctx->flc_lock);
percpu_up_read(&file_rwsem);
@@ -2652,7 +2718,7 @@ void locks_remove_file(struct file *filp)
*/
int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
{
- WARN_ON_ONCE(filp != fl->fl_file);
+ WARN_ON_ONCE(filp != fl->c.flc_file);
if (filp->f_op->lock)
return filp->f_op->lock(filp, F_CANCELLK, fl);
return 0;
@@ -2691,69 +2757,73 @@ struct locks_iterator {
loff_t li_pos;
};
-static void lock_get_status(struct seq_file *f, struct file_lock *fl,
+static void lock_get_status(struct seq_file *f, struct file_lock_core *flc,
loff_t id, char *pfx, int repeat)
{
struct inode *inode = NULL;
- unsigned int fl_pid;
+ unsigned int pid;
struct pid_namespace *proc_pidns = proc_pid_ns(file_inode(f->file)->i_sb);
- int type;
+ int type = flc->flc_type;
+ struct file_lock *fl = file_lock(flc);
- fl_pid = locks_translate_pid(fl, proc_pidns);
+ pid = locks_translate_pid(flc, proc_pidns);
+
/*
* If lock owner is dead (and pid is freed) or not visible in current
* pidns, zero is shown as a pid value. Check lock info from
* init_pid_ns to get saved lock pid value.
*/
-
- if (fl->fl_file != NULL)
- inode = file_inode(fl->fl_file);
+ if (flc->flc_file != NULL)
+ inode = file_inode(flc->flc_file);
seq_printf(f, "%lld: ", id);
if (repeat)
seq_printf(f, "%*s", repeat - 1 + (int)strlen(pfx), pfx);
- if (IS_POSIX(fl)) {
- if (fl->fl_flags & FL_ACCESS)
+ if (flc->flc_flags & FL_POSIX) {
+ if (flc->flc_flags & FL_ACCESS)
seq_puts(f, "ACCESS");
- else if (IS_OFDLCK(fl))
+ else if (flc->flc_flags & FL_OFDLCK)
seq_puts(f, "OFDLCK");
else
seq_puts(f, "POSIX ");
seq_printf(f, " %s ",
(inode == NULL) ? "*NOINODE*" : "ADVISORY ");
- } else if (IS_FLOCK(fl)) {
+ } else if (flc->flc_flags & FL_FLOCK) {
seq_puts(f, "FLOCK ADVISORY ");
- } else if (IS_LEASE(fl)) {
- if (fl->fl_flags & FL_DELEG)
+ } else if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
+ struct file_lease *lease = file_lease(flc);
+
+ type = target_leasetype(lease);
+
+ if (flc->flc_flags & FL_DELEG)
seq_puts(f, "DELEG ");
else
seq_puts(f, "LEASE ");
- if (lease_breaking(fl))
+ if (lease_breaking(lease))
seq_puts(f, "BREAKING ");
- else if (fl->fl_file)
+ else if (flc->flc_file)
seq_puts(f, "ACTIVE ");
else
seq_puts(f, "BREAKER ");
} else {
seq_puts(f, "UNKNOWN UNKNOWN ");
}
- type = IS_LEASE(fl) ? target_leasetype(fl) : fl->fl_type;
seq_printf(f, "%s ", (type == F_WRLCK) ? "WRITE" :
(type == F_RDLCK) ? "READ" : "UNLCK");
if (inode) {
/* userspace relies on this representation of dev_t */
- seq_printf(f, "%d %02x:%02x:%lu ", fl_pid,
+ seq_printf(f, "%d %02x:%02x:%lu ", pid,
MAJOR(inode->i_sb->s_dev),
MINOR(inode->i_sb->s_dev), inode->i_ino);
} else {
- seq_printf(f, "%d <none>:0 ", fl_pid);
+ seq_printf(f, "%d <none>:0 ", pid);
}
- if (IS_POSIX(fl)) {
+ if (flc->flc_flags & FL_POSIX) {
if (fl->fl_end == OFFSET_MAX)
seq_printf(f, "%Ld EOF\n", fl->fl_start);
else
@@ -2763,17 +2833,18 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
}
}
-static struct file_lock *get_next_blocked_member(struct file_lock *node)
+static struct file_lock_core *get_next_blocked_member(struct file_lock_core *node)
{
- struct file_lock *tmp;
+ struct file_lock_core *tmp;
/* NULL node or root node */
- if (node == NULL || node->fl_blocker == NULL)
+ if (node == NULL || node->flc_blocker == NULL)
return NULL;
/* Next member in the linked list could be itself */
- tmp = list_next_entry(node, fl_blocked_member);
- if (list_entry_is_head(tmp, &node->fl_blocker->fl_blocked_requests, fl_blocked_member)
+ tmp = list_next_entry(node, flc_blocked_member);
+ if (list_entry_is_head(tmp, &node->flc_blocker->flc_blocked_requests,
+ flc_blocked_member)
|| tmp == node) {
return NULL;
}
@@ -2784,18 +2855,18 @@ static struct file_lock *get_next_blocked_member(struct file_lock *node)
static int locks_show(struct seq_file *f, void *v)
{
struct locks_iterator *iter = f->private;
- struct file_lock *cur, *tmp;
+ struct file_lock_core *cur, *tmp;
struct pid_namespace *proc_pidns = proc_pid_ns(file_inode(f->file)->i_sb);
int level = 0;
- cur = hlist_entry(v, struct file_lock, fl_link);
+ cur = hlist_entry(v, struct file_lock_core, flc_link);
if (locks_translate_pid(cur, proc_pidns) == 0)
return 0;
- /* View this crossed linked list as a binary tree, the first member of fl_blocked_requests
- * is the left child of current node, the next silibing in fl_blocked_member is the
- * right child, we can alse get the parent of current node from fl_blocker, so this
+ /* View this crossed linked list as a binary tree, the first member of flc_blocked_requests
+ * is the left child of current node, the next silibing in flc_blocked_member is the
+ * right child, we can alse get the parent of current node from flc_blocker, so this
* question becomes traversal of a binary tree
*/
while (cur != NULL) {
@@ -2804,17 +2875,18 @@ static int locks_show(struct seq_file *f, void *v)
else
lock_get_status(f, cur, iter->li_pos, "", level);
- if (!list_empty(&cur->fl_blocked_requests)) {
+ if (!list_empty(&cur->flc_blocked_requests)) {
/* Turn left */
- cur = list_first_entry_or_null(&cur->fl_blocked_requests,
- struct file_lock, fl_blocked_member);
+ cur = list_first_entry_or_null(&cur->flc_blocked_requests,
+ struct file_lock_core,
+ flc_blocked_member);
level++;
} else {
/* Turn right */
tmp = get_next_blocked_member(cur);
/* Fall back to parent node */
- while (tmp == NULL && cur->fl_blocker != NULL) {
- cur = cur->fl_blocker;
+ while (tmp == NULL && cur->flc_blocker != NULL) {
+ cur = cur->flc_blocker;
level--;
tmp = get_next_blocked_member(cur);
}
@@ -2829,14 +2901,13 @@ static void __show_fd_locks(struct seq_file *f,
struct list_head *head, int *id,
struct file *filp, struct files_struct *files)
{
- struct file_lock *fl;
+ struct file_lock_core *fl;
- list_for_each_entry(fl, head, fl_list) {
+ list_for_each_entry(fl, head, flc_list) {
- if (filp != fl->fl_file)
+ if (filp != fl->flc_file)
continue;
- if (fl->fl_owner != files &&
- fl->fl_owner != filp)
+ if (fl->flc_owner != files && fl->flc_owner != filp)
continue;
(*id)++;
@@ -2915,6 +2986,9 @@ static int __init filelock_init(void)
filelock_cache = kmem_cache_create("file_lock_cache",
sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
+ filelease_cache = kmem_cache_create("file_lock_cache",
+ sizeof(struct file_lease), 0, SLAB_PANIC, NULL);
+
for_each_possible_cpu(i) {
struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);
diff --git a/fs/mbcache.c b/fs/mbcache.c
index 82aa7a3..e60a840 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -426,9 +426,7 @@ EXPORT_SYMBOL(mb_cache_destroy);
static int __init mbcache_init(void)
{
- mb_entry_cache = kmem_cache_create("mbcache",
- sizeof(struct mb_cache_entry), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL);
+ mb_entry_cache = KMEM_CACHE(mb_cache_entry, SLAB_RECLAIM_ACCOUNT);
if (!mb_entry_cache)
return -ENOMEM;
return 0;
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index 73f37f29..7cbd2b9 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -87,7 +87,7 @@ static int __init init_inodecache(void)
minix_inode_cachep = kmem_cache_create("minix_inode_cache",
sizeof(struct minix_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (minix_inode_cachep == NULL)
return -ENOMEM;
diff --git a/fs/mnt_idmapping.c b/fs/mnt_idmapping.c
index 64c5205..3c60f1e 100644
--- a/fs/mnt_idmapping.c
+++ b/fs/mnt_idmapping.c
@@ -214,7 +214,7 @@ static int copy_mnt_idmap(struct uid_gid_map *map_from,
* anything at all.
*/
if (nr_extents == 0)
- return 0;
+ return -EINVAL;
/*
* Here we know that nr_extents is greater than zero which means
diff --git a/fs/mpage.c b/fs/mpage.c
index 738882e..fa8b99a 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -605,6 +605,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
GFP_NOFS);
bio->bi_iter.bi_sector = first_block << (blkbits - 9);
wbc_init_bio(wbc, bio);
+ bio->bi_write_hint = inode->i_write_hint;
}
/*
diff --git a/fs/namei.c b/fs/namei.c
index 4e0de93..d0c4a3e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1717,7 +1717,11 @@ static inline int may_lookup(struct mnt_idmap *idmap,
{
if (nd->flags & LOOKUP_RCU) {
int err = inode_permission(idmap, nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
- if (err != -ECHILD || !try_to_unlazy(nd))
+ if (!err) // success, keep going
+ return 0;
+ if (!try_to_unlazy(nd))
+ return -ECHILD; // redo it all non-lazy
+ if (err != -ECHILD) // hard error
return err;
}
return inode_permission(idmap, nd->inode, MAY_EXEC);
@@ -2676,10 +2680,8 @@ static int lookup_one_common(struct mnt_idmap *idmap,
if (!len)
return -EACCES;
- if (unlikely(name[0] == '.')) {
- if (len < 2 || (len == 2 && name[1] == '.'))
- return -EACCES;
- }
+ if (is_dot_dotdot(name, len))
+ return -EACCES;
while (len--) {
unsigned int c = *(const unsigned char *)name++;
diff --git a/fs/namespace.c b/fs/namespace.c
index 437f60e..5a51315 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4472,10 +4472,15 @@ static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
/*
* If this is an attached mount make sure it's located in the callers
* mount namespace. If it's not don't let the caller interact with it.
- * If this is a detached mount make sure it has an anonymous mount
- * namespace attached to it, i.e. we've created it via OPEN_TREE_CLONE.
+ *
+ * If this mount doesn't have a parent it's most often simply a
+ * detached mount with an anonymous mount namespace. IOW, something
+ * that's simply not attached yet. But there are apparently also users
+ * that do change mount properties on the rootfs itself. That obviously
+ * neither has a parent nor is it a detached mount so we cannot
+ * unconditionally check for detached mounts.
*/
- if (!(mnt_has_parent(mnt) ? check_mnt(mnt) : is_anon_ns(mnt->mnt_ns)))
+ if ((mnt_has_parent(mnt) || !is_anon_ns(mnt->mnt_ns)) && !check_mnt(mnt))
goto out;
/*
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c
index a3059b3..9a0d32e 100644
--- a/fs/netfs/buffered_write.c
+++ b/fs/netfs/buffered_write.c
@@ -477,6 +477,9 @@ ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
_enter("%llx,%zx,%llx", iocb->ki_pos, iov_iter_count(from), i_size_read(inode));
+ if (!iov_iter_count(from))
+ return 0;
+
if ((iocb->ki_flags & IOCB_DIRECT) ||
test_bit(NETFS_ICTX_UNBUFFERED, &ictx->flags))
return netfs_unbuffered_write_iter(iocb, from);
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 60a40d2..bee047e 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -139,6 +139,9 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
_enter("%llx,%zx,%llx", iocb->ki_pos, iov_iter_count(from), i_size_read(inode));
+ if (!iov_iter_count(from))
+ return 0;
+
trace_netfs_write_iter(iocb, from);
netfs_stat(&netfs_n_rh_dio_write);
@@ -146,7 +149,7 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (ret < 0)
return ret;
ret = generic_write_checks(iocb, from);
- if (ret < 0)
+ if (ret <= 0)
goto out;
ret = file_remove_privs(file);
if (ret < 0)
diff --git a/fs/netfs/io.c b/fs/netfs/io.c
index e8ff1e6..4261ad6 100644
--- a/fs/netfs/io.c
+++ b/fs/netfs/io.c
@@ -748,6 +748,8 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
if (!rreq->submitted) {
netfs_put_request(rreq, false, netfs_rreq_trace_put_no_submit);
+ if (rreq->origin == NETFS_DIO_READ)
+ inode_dio_end(rreq->inode);
ret = 0;
goto out;
}
diff --git a/fs/nfs/blocklayout/blocklayout.h b/fs/nfs/blocklayout/blocklayout.h
index b4294a8..f1eeb49 100644
--- a/fs/nfs/blocklayout/blocklayout.h
+++ b/fs/nfs/blocklayout/blocklayout.h
@@ -108,7 +108,7 @@ struct pnfs_block_dev {
struct pnfs_block_dev *children;
u64 chunk_size;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
u64 disk_offset;
u64 pr_key;
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index c97ebc4..93ef7f8 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -25,17 +25,17 @@ bl_free_device(struct pnfs_block_dev *dev)
} else {
if (dev->pr_registered) {
const struct pr_ops *ops =
- dev->bdev_handle->bdev->bd_disk->fops->pr_ops;
+ file_bdev(dev->bdev_file)->bd_disk->fops->pr_ops;
int error;
- error = ops->pr_register(dev->bdev_handle->bdev,
+ error = ops->pr_register(file_bdev(dev->bdev_file),
dev->pr_key, 0, false);
if (error)
pr_err("failed to unregister PR key.\n");
}
- if (dev->bdev_handle)
- bdev_release(dev->bdev_handle);
+ if (dev->bdev_file)
+ fput(dev->bdev_file);
}
}
@@ -169,7 +169,7 @@ static bool bl_map_simple(struct pnfs_block_dev *dev, u64 offset,
map->start = dev->start;
map->len = dev->len;
map->disk_offset = dev->disk_offset;
- map->bdev = dev->bdev_handle->bdev;
+ map->bdev = file_bdev(dev->bdev_file);
return true;
}
@@ -236,26 +236,26 @@ bl_parse_simple(struct nfs_server *server, struct pnfs_block_dev *d,
struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask)
{
struct pnfs_block_volume *v = &volumes[idx];
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
dev_t dev;
dev = bl_resolve_deviceid(server, v, gfp_mask);
if (!dev)
return -EIO;
- bdev_handle = bdev_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_WRITE,
+ bdev_file = bdev_file_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_WRITE,
NULL, NULL);
- if (IS_ERR(bdev_handle)) {
+ if (IS_ERR(bdev_file)) {
printk(KERN_WARNING "pNFS: failed to open device %d:%d (%ld)\n",
- MAJOR(dev), MINOR(dev), PTR_ERR(bdev_handle));
- return PTR_ERR(bdev_handle);
+ MAJOR(dev), MINOR(dev), PTR_ERR(bdev_file));
+ return PTR_ERR(bdev_file);
}
- d->bdev_handle = bdev_handle;
- d->len = bdev_nr_bytes(bdev_handle->bdev);
+ d->bdev_file = bdev_file;
+ d->len = bdev_nr_bytes(file_bdev(bdev_file));
d->map = bl_map_simple;
printk(KERN_INFO "pNFS: using block device %s\n",
- bdev_handle->bdev->bd_disk->disk_name);
+ file_bdev(bdev_file)->bd_disk->disk_name);
return 0;
}
@@ -300,10 +300,10 @@ bl_validate_designator(struct pnfs_block_volume *v)
}
}
-static struct bdev_handle *
+static struct file *
bl_open_path(struct pnfs_block_volume *v, const char *prefix)
{
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
const char *devname;
devname = kasprintf(GFP_KERNEL, "/dev/disk/by-id/%s%*phN",
@@ -311,15 +311,15 @@ bl_open_path(struct pnfs_block_volume *v, const char *prefix)
if (!devname)
return ERR_PTR(-ENOMEM);
- bdev_handle = bdev_open_by_path(devname, BLK_OPEN_READ | BLK_OPEN_WRITE,
+ bdev_file = bdev_file_open_by_path(devname, BLK_OPEN_READ | BLK_OPEN_WRITE,
NULL, NULL);
- if (IS_ERR(bdev_handle)) {
+ if (IS_ERR(bdev_file)) {
pr_warn("pNFS: failed to open device %s (%ld)\n",
- devname, PTR_ERR(bdev_handle));
+ devname, PTR_ERR(bdev_file));
}
kfree(devname);
- return bdev_handle;
+ return bdev_file;
}
static int
@@ -327,7 +327,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask)
{
struct pnfs_block_volume *v = &volumes[idx];
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
const struct pr_ops *ops;
int error;
@@ -340,14 +340,14 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
* On other distributions like Debian, the default SCSI by-id path will
* point to the dm-multipath device if one exists.
*/
- bdev_handle = bl_open_path(v, "dm-uuid-mpath-0x");
- if (IS_ERR(bdev_handle))
- bdev_handle = bl_open_path(v, "wwn-0x");
- if (IS_ERR(bdev_handle))
- return PTR_ERR(bdev_handle);
- d->bdev_handle = bdev_handle;
+ bdev_file = bl_open_path(v, "dm-uuid-mpath-0x");
+ if (IS_ERR(bdev_file))
+ bdev_file = bl_open_path(v, "wwn-0x");
+ if (IS_ERR(bdev_file))
+ return PTR_ERR(bdev_file);
+ d->bdev_file = bdev_file;
- d->len = bdev_nr_bytes(d->bdev_handle->bdev);
+ d->len = bdev_nr_bytes(file_bdev(d->bdev_file));
d->map = bl_map_simple;
d->pr_key = v->scsi.pr_key;
@@ -355,20 +355,20 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
return -ENODEV;
pr_info("pNFS: using block device %s (reservation key 0x%llx)\n",
- d->bdev_handle->bdev->bd_disk->disk_name, d->pr_key);
+ file_bdev(d->bdev_file)->bd_disk->disk_name, d->pr_key);
- ops = d->bdev_handle->bdev->bd_disk->fops->pr_ops;
+ ops = file_bdev(d->bdev_file)->bd_disk->fops->pr_ops;
if (!ops) {
pr_err("pNFS: block device %s does not support reservations.",
- d->bdev_handle->bdev->bd_disk->disk_name);
+ file_bdev(d->bdev_file)->bd_disk->disk_name);
error = -EINVAL;
goto out_blkdev_put;
}
- error = ops->pr_register(d->bdev_handle->bdev, 0, d->pr_key, true);
+ error = ops->pr_register(file_bdev(d->bdev_file), 0, d->pr_key, true);
if (error) {
pr_err("pNFS: failed to register key for block device %s.",
- d->bdev_handle->bdev->bd_disk->disk_name);
+ file_bdev(d->bdev_file)->bd_disk->disk_name);
goto out_blkdev_put;
}
@@ -376,7 +376,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
return 0;
out_blkdev_put:
- bdev_release(d->bdev_handle);
+ fput(d->bdev_file);
return error;
}
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 44eca51..fbdc9ca 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -246,7 +246,7 @@ void nfs_free_client(struct nfs_client *clp)
put_nfs_version(clp->cl_nfs_mod);
kfree(clp->cl_hostname);
kfree(clp->cl_acceptor);
- kfree(clp);
+ kfree_rcu(clp, rcu);
}
EXPORT_SYMBOL_GPL(nfs_free_client);
@@ -1006,6 +1006,14 @@ struct nfs_server *nfs_alloc_server(void)
}
EXPORT_SYMBOL_GPL(nfs_alloc_server);
+static void delayed_free(struct rcu_head *p)
+{
+ struct nfs_server *server = container_of(p, struct nfs_server, rcu);
+
+ nfs_free_iostats(server->io_stats);
+ kfree(server);
+}
+
/*
* Free up a server record
*/
@@ -1031,10 +1039,9 @@ void nfs_free_server(struct nfs_server *server)
ida_destroy(&server->lockowner_id);
ida_destroy(&server->openowner_id);
- nfs_free_iostats(server->io_stats);
put_cred(server->cred);
- kfree(server);
nfs_release_automount_timer();
+ call_rcu(&server->rcu, delayed_free);
}
EXPORT_SYMBOL_GPL(nfs_free_server);
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index fa1a14d..d4a42ce 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -156,8 +156,8 @@ static int nfs_delegation_claim_locks(struct nfs4_state *state, const nfs4_state
list = &flctx->flc_posix;
spin_lock(&flctx->flc_lock);
restart:
- list_for_each_entry(fl, list, fl_list) {
- if (nfs_file_open_context(fl->fl_file)->state != state)
+ for_each_file_lock(fl, list) {
+ if (nfs_file_open_context(fl->c.flc_file)->state != state)
continue;
spin_unlock(&flctx->flc_lock);
status = nfs4_lock_delegation_recall(fl, state, stateid);
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c8ecbe9..ac50567 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1431,9 +1431,9 @@ static bool nfs_verifier_is_delegated(struct dentry *dentry)
static void nfs_set_verifier_locked(struct dentry *dentry, unsigned long verf)
{
struct inode *inode = d_inode(dentry);
- struct inode *dir = d_inode(dentry->d_parent);
+ struct inode *dir = d_inode_rcu(dentry->d_parent);
- if (!nfs_verify_change_attribute(dir, verf))
+ if (!dir || !nfs_verify_change_attribute(dir, verf))
return;
if (inode && NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
nfs_set_verifier_delegated(&verf);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 8577ccf..407c6e1 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -720,15 +720,15 @@ do_getlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
{
struct inode *inode = filp->f_mapping->host;
int status = 0;
- unsigned int saved_type = fl->fl_type;
+ unsigned int saved_type = fl->c.flc_type;
/* Try local locking first */
posix_test_lock(filp, fl);
- if (fl->fl_type != F_UNLCK) {
+ if (fl->c.flc_type != F_UNLCK) {
/* found a conflict */
goto out;
}
- fl->fl_type = saved_type;
+ fl->c.flc_type = saved_type;
if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
goto out_noconflict;
@@ -740,7 +740,7 @@ do_getlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
out:
return status;
out_noconflict:
- fl->fl_type = F_UNLCK;
+ fl->c.flc_type = F_UNLCK;
goto out;
}
@@ -765,7 +765,7 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
* If we're signalled while cleaning up locks on process exit, we
* still need to complete the unlock.
*/
- if (status < 0 && !(fl->fl_flags & FL_CLOSE))
+ if (status < 0 && !(fl->c.flc_flags & FL_CLOSE))
return status;
}
@@ -832,12 +832,12 @@ int nfs_lock(struct file *filp, int cmd, struct file_lock *fl)
int is_local = 0;
dprintk("NFS: lock(%pD2, t=%x, fl=%x, r=%lld:%lld)\n",
- filp, fl->fl_type, fl->fl_flags,
+ filp, fl->c.flc_type, fl->c.flc_flags,
(long long)fl->fl_start, (long long)fl->fl_end);
nfs_inc_stats(inode, NFSIOS_VFSLOCK);
- if (fl->fl_flags & FL_RECLAIM)
+ if (fl->c.flc_flags & FL_RECLAIM)
return -ENOGRACE;
if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FCNTL)
@@ -851,7 +851,7 @@ int nfs_lock(struct file *filp, int cmd, struct file_lock *fl)
if (IS_GETLK(cmd))
ret = do_getlk(filp, cmd, fl, is_local);
- else if (fl->fl_type == F_UNLCK)
+ else if (lock_is_unlock(fl))
ret = do_unlk(filp, cmd, fl, is_local);
else
ret = do_setlk(filp, cmd, fl, is_local);
@@ -869,16 +869,16 @@ int nfs_flock(struct file *filp, int cmd, struct file_lock *fl)
int is_local = 0;
dprintk("NFS: flock(%pD2, t=%x, fl=%x)\n",
- filp, fl->fl_type, fl->fl_flags);
+ filp, fl->c.flc_type, fl->c.flc_flags);
- if (!(fl->fl_flags & FL_FLOCK))
+ if (!(fl->c.flc_flags & FL_FLOCK))
return -ENOLCK;
if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FLOCK)
is_local = 1;
/* We're simulating flock() locks using posix locks on the server */
- if (fl->fl_type == F_UNLCK)
+ if (lock_is_unlock(fl))
return do_unlk(filp, cmd, fl, is_local);
return do_setlk(filp, cmd, fl, is_local);
}
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 2de66e4..cbbe3f0 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -963,7 +963,7 @@ nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl)
struct nfs_open_context *ctx = nfs_file_open_context(filp);
int status;
- if (fl->fl_flags & FL_CLOSE) {
+ if (fl->c.flc_flags & FL_CLOSE) {
l_ctx = nfs_get_lock_context(ctx);
if (IS_ERR(l_ctx))
l_ctx = NULL;
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 581698f..6ff41ce 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -330,7 +330,7 @@ extern int update_open_stateid(struct nfs4_state *state,
const nfs4_stateid *deleg_stateid,
fmode_t fmode);
extern int nfs4_proc_setlease(struct file *file, int arg,
- struct file_lock **lease, void **priv);
+ struct file_lease **lease, void **priv);
extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
struct nfs_fsinfo *fsinfo);
extern void nfs4_update_changeattr(struct inode *dir,
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index e238abc..1cd9652 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -439,7 +439,7 @@ void nfs42_ssc_unregister_ops(void)
}
#endif /* CONFIG_NFS_V4_2 */
-static int nfs4_setlease(struct file *file, int arg, struct file_lock **lease,
+static int nfs4_setlease(struct file *file, int arg, struct file_lease **lease,
void **priv)
{
return nfs4_proc_setlease(file, arg, lease, priv);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 23819a7..815996c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6800,7 +6800,7 @@ static int _nfs4_proc_getlk(struct nfs4_state *state, int cmd, struct file_lock
status = nfs4_call_sync(server->client, server, &msg, &arg.seq_args, &res.seq_res, 1);
switch (status) {
case 0:
- request->fl_type = F_UNLCK;
+ request->c.flc_type = F_UNLCK;
break;
case -NFS4ERR_DENIED:
status = 0;
@@ -7018,8 +7018,8 @@ static struct rpc_task *nfs4_do_unlck(struct file_lock *fl,
/* Ensure this is an unlock - when canceling a lock, the
* canceled lock is passed in, and it won't be an unlock.
*/
- fl->fl_type = F_UNLCK;
- if (fl->fl_flags & FL_CLOSE)
+ fl->c.flc_type = F_UNLCK;
+ if (fl->c.flc_flags & FL_CLOSE)
set_bit(NFS_CONTEXT_UNLOCK, &ctx->flags);
data = nfs4_alloc_unlockdata(fl, ctx, lsp, seqid);
@@ -7045,11 +7045,11 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
struct rpc_task *task;
struct nfs_seqid *(*alloc_seqid)(struct nfs_seqid_counter *, gfp_t);
int status = 0;
- unsigned char fl_flags = request->fl_flags;
+ unsigned char saved_flags = request->c.flc_flags;
status = nfs4_set_lock_state(state, request);
/* Unlock _before_ we do the RPC call */
- request->fl_flags |= FL_EXISTS;
+ request->c.flc_flags |= FL_EXISTS;
/* Exclude nfs_delegation_claim_locks() */
mutex_lock(&sp->so_delegreturn_mutex);
/* Exclude nfs4_reclaim_open_stateid() - note nesting! */
@@ -7073,14 +7073,16 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
status = -ENOMEM;
if (IS_ERR(seqid))
goto out;
- task = nfs4_do_unlck(request, nfs_file_open_context(request->fl_file), lsp, seqid);
+ task = nfs4_do_unlck(request,
+ nfs_file_open_context(request->c.flc_file),
+ lsp, seqid);
status = PTR_ERR(task);
if (IS_ERR(task))
goto out;
status = rpc_wait_for_completion_task(task);
rpc_put_task(task);
out:
- request->fl_flags = fl_flags;
+ request->c.flc_flags = saved_flags;
trace_nfs4_unlock(request, state, F_SETLK, status);
return status;
}
@@ -7191,7 +7193,7 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
renew_lease(NFS_SERVER(d_inode(data->ctx->dentry)),
data->timestamp);
if (data->arg.new_lock && !data->cancelled) {
- data->fl.fl_flags &= ~(FL_SLEEP | FL_ACCESS);
+ data->fl.c.flc_flags &= ~(FL_SLEEP | FL_ACCESS);
if (locks_lock_inode_wait(lsp->ls_state->inode, &data->fl) < 0)
goto out_restart;
}
@@ -7292,7 +7294,8 @@ static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *f
if (nfs_server_capable(state->inode, NFS_CAP_MOVEABLE))
task_setup_data.flags |= RPC_TASK_MOVEABLE;
- data = nfs4_alloc_lockdata(fl, nfs_file_open_context(fl->fl_file),
+ data = nfs4_alloc_lockdata(fl,
+ nfs_file_open_context(fl->c.flc_file),
fl->fl_u.nfs4_fl.owner, GFP_KERNEL);
if (data == NULL)
return -ENOMEM;
@@ -7398,10 +7401,10 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
{
struct nfs_inode *nfsi = NFS_I(state->inode);
struct nfs4_state_owner *sp = state->owner;
- unsigned char fl_flags = request->fl_flags;
+ unsigned char flags = request->c.flc_flags;
int status;
- request->fl_flags |= FL_ACCESS;
+ request->c.flc_flags |= FL_ACCESS;
status = locks_lock_inode_wait(state->inode, request);
if (status < 0)
goto out;
@@ -7410,7 +7413,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
/* Yes: cache locks! */
/* ...but avoid races with delegation recall... */
- request->fl_flags = fl_flags & ~FL_SLEEP;
+ request->c.flc_flags = flags & ~FL_SLEEP;
status = locks_lock_inode_wait(state->inode, request);
up_read(&nfsi->rwsem);
mutex_unlock(&sp->so_delegreturn_mutex);
@@ -7420,7 +7423,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
mutex_unlock(&sp->so_delegreturn_mutex);
status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
out:
- request->fl_flags = fl_flags;
+ request->c.flc_flags = flags;
return status;
}
@@ -7562,7 +7565,7 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
if (!(IS_SETLK(cmd) || IS_SETLKW(cmd)))
return -EINVAL;
- if (request->fl_type == F_UNLCK) {
+ if (lock_is_unlock(request)) {
if (state != NULL)
return nfs4_proc_unlck(state, cmd, request);
return 0;
@@ -7571,7 +7574,7 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
if (state == NULL)
return -ENOLCK;
- if ((request->fl_flags & FL_POSIX) &&
+ if ((request->c.flc_flags & FL_POSIX) &&
!test_bit(NFS_STATE_POSIX_LOCKS, &state->flags))
return -ENOLCK;
@@ -7579,7 +7582,7 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
* Don't rely on the VFS having checked the file open mode,
* since it won't do this for flock() locks.
*/
- switch (request->fl_type) {
+ switch (request->c.flc_type) {
case F_RDLCK:
if (!(filp->f_mode & FMODE_READ))
return -EBADF;
@@ -7601,7 +7604,7 @@ static int nfs4_delete_lease(struct file *file, void **priv)
return generic_setlease(file, F_UNLCK, NULL, priv);
}
-static int nfs4_add_lease(struct file *file, int arg, struct file_lock **lease,
+static int nfs4_add_lease(struct file *file, int arg, struct file_lease **lease,
void **priv)
{
struct inode *inode = file_inode(file);
@@ -7619,7 +7622,7 @@ static int nfs4_add_lease(struct file *file, int arg, struct file_lock **lease,
return -EAGAIN;
}
-int nfs4_proc_setlease(struct file *file, int arg, struct file_lock **lease,
+int nfs4_proc_setlease(struct file *file, int arg, struct file_lease **lease,
void **priv)
{
switch (arg) {
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 9a5d911..8cfabdbd 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -847,15 +847,15 @@ void nfs4_close_sync(struct nfs4_state *state, fmode_t fmode)
*/
static struct nfs4_lock_state *
__nfs4_find_lock_state(struct nfs4_state *state,
- fl_owner_t fl_owner, fl_owner_t fl_owner2)
+ fl_owner_t owner, fl_owner_t owner2)
{
struct nfs4_lock_state *pos, *ret = NULL;
list_for_each_entry(pos, &state->lock_states, ls_locks) {
- if (pos->ls_owner == fl_owner) {
+ if (pos->ls_owner == owner) {
ret = pos;
break;
}
- if (pos->ls_owner == fl_owner2)
+ if (pos->ls_owner == owner2)
ret = pos;
}
if (ret)
@@ -868,7 +868,7 @@ __nfs4_find_lock_state(struct nfs4_state *state,
* exists, return an uninitialized one.
*
*/
-static struct nfs4_lock_state *nfs4_alloc_lock_state(struct nfs4_state *state, fl_owner_t fl_owner)
+static struct nfs4_lock_state *nfs4_alloc_lock_state(struct nfs4_state *state, fl_owner_t owner)
{
struct nfs4_lock_state *lsp;
struct nfs_server *server = state->owner->so_server;
@@ -879,7 +879,7 @@ static struct nfs4_lock_state *nfs4_alloc_lock_state(struct nfs4_state *state, f
nfs4_init_seqid_counter(&lsp->ls_seqid);
refcount_set(&lsp->ls_count, 1);
lsp->ls_state = state;
- lsp->ls_owner = fl_owner;
+ lsp->ls_owner = owner;
lsp->ls_seqid.owner_id = ida_alloc(&server->lockowner_id, GFP_KERNEL_ACCOUNT);
if (lsp->ls_seqid.owner_id < 0)
goto out_free;
@@ -980,7 +980,7 @@ int nfs4_set_lock_state(struct nfs4_state *state, struct file_lock *fl)
if (fl->fl_ops != NULL)
return 0;
- lsp = nfs4_get_lock_state(state, fl->fl_owner);
+ lsp = nfs4_get_lock_state(state, fl->c.flc_owner);
if (lsp == NULL)
return -ENOMEM;
fl->fl_u.nfs4_fl.owner = lsp;
@@ -993,7 +993,7 @@ static int nfs4_copy_lock_stateid(nfs4_stateid *dst,
const struct nfs_lock_context *l_ctx)
{
struct nfs4_lock_state *lsp;
- fl_owner_t fl_owner, fl_flock_owner;
+ fl_owner_t owner, fl_flock_owner;
int ret = -ENOENT;
if (l_ctx == NULL)
@@ -1002,11 +1002,11 @@ static int nfs4_copy_lock_stateid(nfs4_stateid *dst,
if (test_bit(LK_STATE_IN_USE, &state->flags) == 0)
goto out;
- fl_owner = l_ctx->lockowner;
+ owner = l_ctx->lockowner;
fl_flock_owner = l_ctx->open_context->flock_owner;
spin_lock(&state->state_lock);
- lsp = __nfs4_find_lock_state(state, fl_owner, fl_flock_owner);
+ lsp = __nfs4_find_lock_state(state, owner, fl_flock_owner);
if (lsp && test_bit(NFS_LOCK_LOST, &lsp->ls_flags))
ret = -EIO;
else if (lsp != NULL && test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags) != 0) {
@@ -1529,8 +1529,8 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
down_write(&nfsi->rwsem);
spin_lock(&flctx->flc_lock);
restart:
- list_for_each_entry(fl, list, fl_list) {
- if (nfs_file_open_context(fl->fl_file)->state != state)
+ for_each_file_lock(fl, list) {
+ if (nfs_file_open_context(fl->c.flc_file)->state != state)
continue;
spin_unlock(&flctx->flc_lock);
status = ops->recover_lock(state, fl);
diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index d27919d..fd7cb15 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -699,7 +699,7 @@ DECLARE_EVENT_CLASS(nfs4_lock_event,
__entry->error = error < 0 ? -error : 0;
__entry->cmd = cmd;
- __entry->type = request->fl_type;
+ __entry->type = request->c.flc_type;
__entry->start = request->fl_start;
__entry->end = request->fl_end;
__entry->dev = inode->i_sb->s_dev;
@@ -771,7 +771,7 @@ TRACE_EVENT(nfs4_set_lock,
__entry->error = error < 0 ? -error : 0;
__entry->cmd = cmd;
- __entry->type = request->fl_type;
+ __entry->type = request->c.flc_type;
__entry->start = request->fl_start;
__entry->end = request->fl_end;
__entry->dev = inode->i_sb->s_dev;
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 69406e6..1416099 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1305,7 +1305,7 @@ static void encode_link(struct xdr_stream *xdr, const struct qstr *name, struct
static inline int nfs4_lock_type(struct file_lock *fl, int block)
{
- if (fl->fl_type == F_RDLCK)
+ if (lock_is_read(fl))
return block ? NFS4_READW_LT : NFS4_READ_LT;
return block ? NFS4_WRITEW_LT : NFS4_WRITE_LT;
}
@@ -5052,10 +5052,10 @@ static int decode_lock_denied (struct xdr_stream *xdr, struct file_lock *fl)
fl->fl_end = fl->fl_start + (loff_t)length - 1;
if (length == ~(uint64_t)0)
fl->fl_end = OFFSET_MAX;
- fl->fl_type = F_WRLCK;
+ fl->c.flc_type = F_WRLCK;
if (type & 1)
- fl->fl_type = F_RDLCK;
- fl->fl_pid = 0;
+ fl->c.flc_type = F_RDLCK;
+ fl->c.flc_pid = 0;
}
p = xdr_decode_hyper(p, &clientid); /* read 8 bytes */
namelen = be32_to_cpup(p); /* read 4 bytes */ /* have read all 32 bytes now */
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index bb79d3a..84bb852 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1301,7 +1301,7 @@ static bool
is_whole_file_wrlock(struct file_lock *fl)
{
return fl->fl_start == 0 && fl->fl_end == OFFSET_MAX &&
- fl->fl_type == F_WRLCK;
+ lock_is_write(fl);
}
/* If we know the page is up to date, and we're not using byte range locks (or
@@ -1335,13 +1335,13 @@ static int nfs_can_extend_write(struct file *file, struct folio *folio,
spin_lock(&flctx->flc_lock);
if (!list_empty(&flctx->flc_posix)) {
fl = list_first_entry(&flctx->flc_posix, struct file_lock,
- fl_list);
+ c.flc_list);
if (is_whole_file_wrlock(fl))
ret = 1;
} else if (!list_empty(&flctx->flc_flock)) {
fl = list_first_entry(&flctx->flc_flock, struct file_lock,
- fl_list);
- if (fl->fl_type == F_WRLCK)
+ c.flc_list);
+ if (lock_is_write(fl))
ret = 1;
}
spin_unlock(&flctx->flc_lock);
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 9cb7f0c..b86d849 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -662,8 +662,8 @@ nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg,
struct file_lock *fl = data;
/* Only close files for F_SETLEASE leases */
- if (fl->fl_flags & FL_LEASE)
- nfsd_file_close_inode(file_inode(fl->fl_file));
+ if (fl->c.flc_flags & FL_LEASE)
+ nfsd_file_close_inode(file_inode(fl->c.flc_file));
return 0;
}
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 926c298..32d23ef 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -674,7 +674,7 @@ static void nfs4_xdr_enc_cb_notify_lock(struct rpc_rqst *req,
const struct nfsd4_callback *cb = data;
const struct nfsd4_blocked_lock *nbl =
container_of(cb, struct nfsd4_blocked_lock, nbl_cb);
- struct nfs4_lockowner *lo = (struct nfs4_lockowner *)nbl->nbl_lock.fl_owner;
+ struct nfs4_lockowner *lo = (struct nfs4_lockowner *)nbl->nbl_lock.c.flc_owner;
struct nfs4_cb_compound_hdr hdr = {
.ident = 0,
.minorversion = cb->cb_clp->cl_minorversion,
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 5e8096b..4c0d00b 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -25,7 +25,7 @@ static struct kmem_cache *nfs4_layout_cache;
static struct kmem_cache *nfs4_layout_stateid_cache;
static const struct nfsd4_callback_ops nfsd4_cb_layout_ops;
-static const struct lock_manager_operations nfsd4_layouts_lm_ops;
+static const struct lease_manager_operations nfsd4_layouts_lm_ops;
const struct nfsd4_layout_ops *nfsd4_layout_ops[LAYOUT_TYPE_MAX] = {
#ifdef CONFIG_NFSD_FLEXFILELAYOUT
@@ -170,7 +170,7 @@ nfsd4_free_layout_stateid(struct nfs4_stid *stid)
spin_unlock(&fp->fi_lock);
if (!nfsd4_layout_ops[ls->ls_layout_type]->disable_recalls)
- vfs_setlease(ls->ls_file->nf_file, F_UNLCK, NULL, (void **)&ls);
+ kernel_setlease(ls->ls_file->nf_file, F_UNLCK, NULL, (void **)&ls);
nfsd_file_put(ls->ls_file);
if (ls->ls_recalled)
@@ -182,27 +182,26 @@ nfsd4_free_layout_stateid(struct nfs4_stid *stid)
static int
nfsd4_layout_setlease(struct nfs4_layout_stateid *ls)
{
- struct file_lock *fl;
+ struct file_lease *fl;
int status;
if (nfsd4_layout_ops[ls->ls_layout_type]->disable_recalls)
return 0;
- fl = locks_alloc_lock();
+ fl = locks_alloc_lease();
if (!fl)
return -ENOMEM;
- locks_init_lock(fl);
+ locks_init_lease(fl);
fl->fl_lmops = &nfsd4_layouts_lm_ops;
- fl->fl_flags = FL_LAYOUT;
- fl->fl_type = F_RDLCK;
- fl->fl_end = OFFSET_MAX;
- fl->fl_owner = ls;
- fl->fl_pid = current->tgid;
- fl->fl_file = ls->ls_file->nf_file;
+ fl->c.flc_flags = FL_LAYOUT;
+ fl->c.flc_type = F_RDLCK;
+ fl->c.flc_owner = ls;
+ fl->c.flc_pid = current->tgid;
+ fl->c.flc_file = ls->ls_file->nf_file;
- status = vfs_setlease(fl->fl_file, fl->fl_type, &fl, NULL);
+ status = kernel_setlease(fl->c.flc_file, fl->c.flc_type, &fl, NULL);
if (status) {
- locks_free_lock(fl);
+ locks_free_lease(fl);
return status;
}
BUG_ON(fl != NULL);
@@ -723,7 +722,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_layout_ops = {
};
static bool
-nfsd4_layout_lm_break(struct file_lock *fl)
+nfsd4_layout_lm_break(struct file_lease *fl)
{
/*
* We don't want the locks code to timeout the lease for us;
@@ -731,19 +730,19 @@ nfsd4_layout_lm_break(struct file_lock *fl)
* in time:
*/
fl->fl_break_time = 0;
- nfsd4_recall_file_layout(fl->fl_owner);
+ nfsd4_recall_file_layout(fl->c.flc_owner);
return false;
}
static int
-nfsd4_layout_lm_change(struct file_lock *onlist, int arg,
+nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
struct list_head *dispose)
{
BUG_ON(!(arg & F_UNLCK));
return lease_modify(onlist, arg, dispose);
}
-static const struct lock_manager_operations nfsd4_layouts_lm_ops = {
+static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
.lm_break = nfsd4_layout_lm_break,
.lm_change = nfsd4_layout_lm_change,
};
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6dc6340..9257425 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1249,7 +1249,7 @@ static void nfs4_unlock_deleg_lease(struct nfs4_delegation *dp)
WARN_ON_ONCE(!fp->fi_delegees);
- vfs_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&dp);
+ kernel_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&dp);
put_deleg_file(fp);
}
@@ -4922,9 +4922,9 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
/* Called from break_lease() with flc_lock held. */
static bool
-nfsd_break_deleg_cb(struct file_lock *fl)
+nfsd_break_deleg_cb(struct file_lease *fl)
{
- struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner;
+ struct nfs4_delegation *dp = (struct nfs4_delegation *) fl->c.flc_owner;
struct nfs4_file *fp = dp->dl_stid.sc_file;
struct nfs4_client *clp = dp->dl_stid.sc_client;
struct nfsd_net *nn;
@@ -4945,10 +4945,8 @@ nfsd_break_deleg_cb(struct file_lock *fl)
*/
fl->fl_break_time = 0;
- spin_lock(&fp->fi_lock);
fp->fi_had_conflict = true;
nfsd_break_one_deleg(dp);
- spin_unlock(&fp->fi_lock);
return false;
}
@@ -4960,9 +4958,9 @@ nfsd_break_deleg_cb(struct file_lock *fl)
* %true: Lease conflict was resolved
* %false: Lease conflict was not resolved.
*/
-static bool nfsd_breaker_owns_lease(struct file_lock *fl)
+static bool nfsd_breaker_owns_lease(struct file_lease *fl)
{
- struct nfs4_delegation *dl = fl->fl_owner;
+ struct nfs4_delegation *dl = fl->c.flc_owner;
struct svc_rqst *rqst;
struct nfs4_client *clp;
@@ -4977,10 +4975,10 @@ static bool nfsd_breaker_owns_lease(struct file_lock *fl)
}
static int
-nfsd_change_deleg_cb(struct file_lock *onlist, int arg,
+nfsd_change_deleg_cb(struct file_lease *onlist, int arg,
struct list_head *dispose)
{
- struct nfs4_delegation *dp = (struct nfs4_delegation *)onlist->fl_owner;
+ struct nfs4_delegation *dp = (struct nfs4_delegation *) onlist->c.flc_owner;
struct nfs4_client *clp = dp->dl_stid.sc_client;
if (arg & F_UNLCK) {
@@ -4991,7 +4989,7 @@ nfsd_change_deleg_cb(struct file_lock *onlist, int arg,
return -EAGAIN;
}
-static const struct lock_manager_operations nfsd_lease_mng_ops = {
+static const struct lease_manager_operations nfsd_lease_mng_ops = {
.lm_breaker_owns_lease = nfsd_breaker_owns_lease,
.lm_break = nfsd_break_deleg_cb,
.lm_change = nfsd_change_deleg_cb,
@@ -5331,21 +5329,20 @@ static bool nfsd4_cb_channel_good(struct nfs4_client *clp)
return clp->cl_minorversion && clp->cl_cb_state == NFSD4_CB_UNKNOWN;
}
-static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp,
+static struct file_lease *nfs4_alloc_init_lease(struct nfs4_delegation *dp,
int flag)
{
- struct file_lock *fl;
+ struct file_lease *fl;
- fl = locks_alloc_lock();
+ fl = locks_alloc_lease();
if (!fl)
return NULL;
fl->fl_lmops = &nfsd_lease_mng_ops;
- fl->fl_flags = FL_DELEG;
- fl->fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
- fl->fl_end = OFFSET_MAX;
- fl->fl_owner = (fl_owner_t)dp;
- fl->fl_pid = current->tgid;
- fl->fl_file = dp->dl_stid.sc_file->fi_deleg_file->nf_file;
+ fl->c.flc_flags = FL_DELEG;
+ fl->c.flc_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
+ fl->c.flc_owner = (fl_owner_t)dp;
+ fl->c.flc_pid = current->tgid;
+ fl->c.flc_file = dp->dl_stid.sc_file->fi_deleg_file->nf_file;
return fl;
}
@@ -5463,7 +5460,7 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
struct nfs4_clnt_odstate *odstate = stp->st_clnt_odstate;
struct nfs4_delegation *dp;
struct nfsd_file *nf = NULL;
- struct file_lock *fl;
+ struct file_lease *fl;
u32 dl_type;
/*
@@ -5533,9 +5530,10 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
if (!fl)
goto out_clnt_odstate;
- status = vfs_setlease(fp->fi_deleg_file->nf_file, fl->fl_type, &fl, NULL);
+ status = kernel_setlease(fp->fi_deleg_file->nf_file,
+ fl->c.flc_type, &fl, NULL);
if (fl)
- locks_free_lock(fl);
+ locks_free_lease(fl);
if (status)
goto out_clnt_odstate;
@@ -5557,12 +5555,13 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
if (status)
goto out_unlock;
+ status = -EAGAIN;
+ if (fp->fi_had_conflict)
+ goto out_unlock;
+
spin_lock(&state_lock);
spin_lock(&fp->fi_lock);
- if (fp->fi_had_conflict)
- status = -EAGAIN;
- else
- status = hash_delegation_locked(dp, fp);
+ status = hash_delegation_locked(dp, fp);
spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
@@ -5571,7 +5570,7 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
return dp;
out_unlock:
- vfs_setlease(fp->fi_deleg_file->nf_file, F_UNLCK, NULL, (void **)&dp);
+ kernel_setlease(fp->fi_deleg_file->nf_file, F_UNLCK, NULL, (void **)&dp);
out_clnt_odstate:
put_clnt_odstate(dp->dl_clnt_odstate);
nfs4_put_stid(&dp->dl_stid);
@@ -7149,7 +7148,7 @@ nfsd4_lm_put_owner(fl_owner_t owner)
static bool
nfsd4_lm_lock_expirable(struct file_lock *cfl)
{
- struct nfs4_lockowner *lo = (struct nfs4_lockowner *)cfl->fl_owner;
+ struct nfs4_lockowner *lo = (struct nfs4_lockowner *) cfl->c.flc_owner;
struct nfs4_client *clp = lo->lo_owner.so_client;
struct nfsd_net *nn;
@@ -7171,7 +7170,7 @@ nfsd4_lm_expire_lock(void)
static void
nfsd4_lm_notify(struct file_lock *fl)
{
- struct nfs4_lockowner *lo = (struct nfs4_lockowner *)fl->fl_owner;
+ struct nfs4_lockowner *lo = (struct nfs4_lockowner *) fl->c.flc_owner;
struct net *net = lo->lo_owner.so_client->net;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct nfsd4_blocked_lock *nbl = container_of(fl,
@@ -7208,7 +7207,7 @@ nfs4_set_lock_denied(struct file_lock *fl, struct nfsd4_lock_denied *deny)
struct nfs4_lockowner *lo;
if (fl->fl_lmops == &nfsd_posix_mng_ops) {
- lo = (struct nfs4_lockowner *) fl->fl_owner;
+ lo = (struct nfs4_lockowner *) fl->c.flc_owner;
xdr_netobj_dup(&deny->ld_owner, &lo->lo_owner.so_owner,
GFP_KERNEL);
if (!deny->ld_owner.data)
@@ -7227,7 +7226,7 @@ nfs4_set_lock_denied(struct file_lock *fl, struct nfsd4_lock_denied *deny)
if (fl->fl_end != NFS4_MAX_UINT64)
deny->ld_length = fl->fl_end - fl->fl_start + 1;
deny->ld_type = NFS4_READ_LT;
- if (fl->fl_type != F_RDLCK)
+ if (fl->c.flc_type != F_RDLCK)
deny->ld_type = NFS4_WRITE_LT;
}
@@ -7493,8 +7492,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
int lkflg;
int err;
bool new = false;
- unsigned char fl_type;
- unsigned int fl_flags = FL_POSIX;
+ unsigned char type;
+ unsigned int flags = FL_POSIX;
struct net *net = SVC_NET(rqstp);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
@@ -7557,14 +7556,14 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
goto out;
if (lock->lk_reclaim)
- fl_flags |= FL_RECLAIM;
+ flags |= FL_RECLAIM;
fp = lock_stp->st_stid.sc_file;
switch (lock->lk_type) {
case NFS4_READW_LT:
if (nfsd4_has_session(cstate) ||
exportfs_lock_op_is_async(sb->s_export_op))
- fl_flags |= FL_SLEEP;
+ flags |= FL_SLEEP;
fallthrough;
case NFS4_READ_LT:
spin_lock(&fp->fi_lock);
@@ -7572,12 +7571,12 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (nf)
get_lock_access(lock_stp, NFS4_SHARE_ACCESS_READ);
spin_unlock(&fp->fi_lock);
- fl_type = F_RDLCK;
+ type = F_RDLCK;
break;
case NFS4_WRITEW_LT:
if (nfsd4_has_session(cstate) ||
exportfs_lock_op_is_async(sb->s_export_op))
- fl_flags |= FL_SLEEP;
+ flags |= FL_SLEEP;
fallthrough;
case NFS4_WRITE_LT:
spin_lock(&fp->fi_lock);
@@ -7585,7 +7584,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (nf)
get_lock_access(lock_stp, NFS4_SHARE_ACCESS_WRITE);
spin_unlock(&fp->fi_lock);
- fl_type = F_WRLCK;
+ type = F_WRLCK;
break;
default:
status = nfserr_inval;
@@ -7605,7 +7604,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
* on those filesystems:
*/
if (!exportfs_lock_op_is_async(sb->s_export_op))
- fl_flags &= ~FL_SLEEP;
+ flags &= ~FL_SLEEP;
nbl = find_or_allocate_block(lock_sop, &fp->fi_fhandle, nn);
if (!nbl) {
@@ -7615,11 +7614,11 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
}
file_lock = &nbl->nbl_lock;
- file_lock->fl_type = fl_type;
- file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(&lock_sop->lo_owner));
- file_lock->fl_pid = current->tgid;
- file_lock->fl_file = nf->nf_file;
- file_lock->fl_flags = fl_flags;
+ file_lock->c.flc_type = type;
+ file_lock->c.flc_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(&lock_sop->lo_owner));
+ file_lock->c.flc_pid = current->tgid;
+ file_lock->c.flc_file = nf->nf_file;
+ file_lock->c.flc_flags = flags;
file_lock->fl_lmops = &nfsd_posix_mng_ops;
file_lock->fl_start = lock->lk_offset;
file_lock->fl_end = last_byte_offset(lock->lk_offset, lock->lk_length);
@@ -7632,7 +7631,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
goto out;
}
- if (fl_flags & FL_SLEEP) {
+ if (flags & FL_SLEEP) {
nbl->nbl_time = ktime_get_boottime_seconds();
spin_lock(&nn->blocked_locks_lock);
list_add_tail(&nbl->nbl_list, &lock_sop->lo_blocked);
@@ -7669,7 +7668,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
out:
if (nbl) {
/* dequeue it if we queued it before */
- if (fl_flags & FL_SLEEP) {
+ if (flags & FL_SLEEP) {
spin_lock(&nn->blocked_locks_lock);
if (!list_empty(&nbl->nbl_list) &&
!list_empty(&nbl->nbl_lru)) {
@@ -7737,9 +7736,9 @@ static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct
err = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
if (err)
goto out;
- lock->fl_file = nf->nf_file;
+ lock->c.flc_file = nf->nf_file;
err = nfserrno(vfs_test_lock(nf->nf_file, lock));
- lock->fl_file = NULL;
+ lock->c.flc_file = NULL;
out:
inode_unlock(inode);
nfsd_file_put(nf);
@@ -7784,11 +7783,11 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
switch (lockt->lt_type) {
case NFS4_READ_LT:
case NFS4_READW_LT:
- file_lock->fl_type = F_RDLCK;
+ file_lock->c.flc_type = F_RDLCK;
break;
case NFS4_WRITE_LT:
case NFS4_WRITEW_LT:
- file_lock->fl_type = F_WRLCK;
+ file_lock->c.flc_type = F_WRLCK;
break;
default:
dprintk("NFSD: nfs4_lockt: bad lock type!\n");
@@ -7798,9 +7797,9 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
lo = find_lockowner_str(cstate->clp, &lockt->lt_owner);
if (lo)
- file_lock->fl_owner = (fl_owner_t)lo;
- file_lock->fl_pid = current->tgid;
- file_lock->fl_flags = FL_POSIX;
+ file_lock->c.flc_owner = (fl_owner_t)lo;
+ file_lock->c.flc_pid = current->tgid;
+ file_lock->c.flc_flags = FL_POSIX;
file_lock->fl_start = lockt->lt_offset;
file_lock->fl_end = last_byte_offset(lockt->lt_offset, lockt->lt_length);
@@ -7811,7 +7810,7 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (status)
goto out;
- if (file_lock->fl_type != F_UNLCK) {
+ if (file_lock->c.flc_type != F_UNLCK) {
status = nfserr_denied;
nfs4_set_lock_denied(file_lock, &lockt->lt_denied);
}
@@ -7867,11 +7866,11 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
goto put_file;
}
- file_lock->fl_type = F_UNLCK;
- file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(stp->st_stateowner));
- file_lock->fl_pid = current->tgid;
- file_lock->fl_file = nf->nf_file;
- file_lock->fl_flags = FL_POSIX;
+ file_lock->c.flc_type = F_UNLCK;
+ file_lock->c.flc_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(stp->st_stateowner));
+ file_lock->c.flc_pid = current->tgid;
+ file_lock->c.flc_file = nf->nf_file;
+ file_lock->c.flc_flags = FL_POSIX;
file_lock->fl_lmops = &nfsd_posix_mng_ops;
file_lock->fl_start = locku->lu_offset;
@@ -7928,8 +7927,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
if (flctx && !list_empty_careful(&flctx->flc_posix)) {
spin_lock(&flctx->flc_lock);
- list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
- if (fl->fl_owner == (fl_owner_t)lowner) {
+ for_each_file_lock(fl, &flctx->flc_posix) {
+ if (fl->c.flc_owner == (fl_owner_t)lowner) {
status = true;
break;
}
@@ -8452,15 +8451,17 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode)
{
__be32 status;
struct file_lock_context *ctx;
- struct file_lock *fl;
+ struct file_lease *fl;
struct nfs4_delegation *dp;
ctx = locks_inode_context(inode);
if (!ctx)
return 0;
spin_lock(&ctx->flc_lock);
- list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
- if (fl->fl_flags == FL_LAYOUT)
+ for_each_file_lock(fl, &ctx->flc_lease) {
+ unsigned char type = fl->c.flc_type;
+
+ if (fl->c.flc_flags == FL_LAYOUT)
continue;
if (fl->fl_lmops != &nfsd_lease_mng_ops) {
/*
@@ -8468,12 +8469,12 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode)
* we are done; there isn't any write delegation
* on this inode
*/
- if (fl->fl_type == F_RDLCK)
+ if (type == F_RDLCK)
break;
goto break_lease;
}
- if (fl->fl_type == F_WRLCK) {
- dp = fl->fl_owner;
+ if (type == F_WRLCK) {
+ dp = fl->c.flc_owner;
if (dp->dl_recall.cb_clp == *(rqstp->rq_lease_breaker)) {
spin_unlock(&ctx->flc_lock);
return 0;
diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index bec33b8..0e3fc5ba 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -107,7 +107,13 @@ static vm_fault_t nilfs_page_mkwrite(struct vm_fault *vmf)
nilfs_transaction_commit(inode->i_sb);
mapped:
- folio_wait_stable(folio);
+ /*
+ * Since checksumming including data blocks is performed to determine
+ * the validity of the log to be written and used for recovery, it is
+ * necessary to wait for writeback to finish here, regardless of the
+ * stable write requirement of the backing device.
+ */
+ folio_wait_writeback(folio);
out:
sb_end_pagefault(inode->i_sb);
return vmf_fs_error(ret);
diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index 0955b657..a9b8d77 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -472,9 +472,10 @@ static int nilfs_prepare_segment_for_recovery(struct the_nilfs *nilfs,
static int nilfs_recovery_copy_block(struct the_nilfs *nilfs,
struct nilfs_recovery_block *rb,
- struct page *page)
+ loff_t pos, struct page *page)
{
struct buffer_head *bh_org;
+ size_t from = pos & ~PAGE_MASK;
void *kaddr;
bh_org = __bread(nilfs->ns_bdev, rb->blocknr, nilfs->ns_blocksize);
@@ -482,7 +483,7 @@ static int nilfs_recovery_copy_block(struct the_nilfs *nilfs,
return -EIO;
kaddr = kmap_atomic(page);
- memcpy(kaddr + bh_offset(bh_org), bh_org->b_data, bh_org->b_size);
+ memcpy(kaddr + from, bh_org->b_data, bh_org->b_size);
kunmap_atomic(kaddr);
brelse(bh_org);
return 0;
@@ -521,7 +522,7 @@ static int nilfs_recover_dsync_blocks(struct the_nilfs *nilfs,
goto failed_inode;
}
- err = nilfs_recovery_copy_block(nilfs, rb, page);
+ err = nilfs_recovery_copy_block(nilfs, rb, pos, page);
if (unlikely(err))
goto failed_page;
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 2590a08..2bfb080 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1703,7 +1703,6 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
list_for_each_entry(bh, &segbuf->sb_payload_buffers,
b_assoc_buffers) {
- set_buffer_async_write(bh);
if (bh == segbuf->sb_super_root) {
if (bh->b_folio != bd_folio) {
folio_lock(bd_folio);
@@ -1714,6 +1713,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
}
break;
}
+ set_buffer_async_write(bh);
if (bh->b_folio != fs_folio) {
nilfs_begin_folio_io(fs_folio);
fs_folio = bh->b_folio;
@@ -1800,7 +1800,6 @@ static void nilfs_abort_logs(struct list_head *logs, int err)
list_for_each_entry(bh, &segbuf->sb_payload_buffers,
b_assoc_buffers) {
- clear_buffer_async_write(bh);
if (bh == segbuf->sb_super_root) {
clear_buffer_uptodate(bh);
if (bh->b_folio != bd_folio) {
@@ -1809,6 +1808,7 @@ static void nilfs_abort_logs(struct list_head *logs, int err)
}
break;
}
+ clear_buffer_async_write(bh);
if (bh->b_folio != fs_folio) {
nilfs_end_folio_io(fs_folio, err);
fs_folio = bh->b_folio;
@@ -1896,8 +1896,9 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci)
BIT(BH_Delay) | BIT(BH_NILFS_Volatile) |
BIT(BH_NILFS_Redirected));
- set_mask_bits(&bh->b_state, clear_bits, set_bits);
if (bh == segbuf->sb_super_root) {
+ set_buffer_uptodate(bh);
+ clear_buffer_dirty(bh);
if (bh->b_folio != bd_folio) {
folio_end_writeback(bd_folio);
bd_folio = bh->b_folio;
@@ -1905,6 +1906,7 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci)
update_sr = true;
break;
}
+ set_mask_bits(&bh->b_state, clear_bits, set_bits);
if (bh->b_folio != fs_folio) {
nilfs_end_folio_io(fs_folio, 0);
fs_folio = bh->b_folio;
diff --git a/fs/nsfs.c b/fs/nsfs.c
index 34e1e3e..7aaafb5 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -27,26 +27,17 @@ static const struct file_operations ns_file_operations = {
static char *ns_dname(struct dentry *dentry, char *buffer, int buflen)
{
struct inode *inode = d_inode(dentry);
- const struct proc_ns_operations *ns_ops = dentry->d_fsdata;
+ struct ns_common *ns = inode->i_private;
+ const struct proc_ns_operations *ns_ops = ns->ops;
return dynamic_dname(buffer, buflen, "%s:[%lu]",
ns_ops->name, inode->i_ino);
}
-static void ns_prune_dentry(struct dentry *dentry)
-{
- struct inode *inode = d_inode(dentry);
- if (inode) {
- struct ns_common *ns = inode->i_private;
- atomic_long_set(&ns->stashed, 0);
- }
-}
-
-const struct dentry_operations ns_dentry_operations =
-{
- .d_prune = ns_prune_dentry,
+const struct dentry_operations ns_dentry_operations = {
.d_delete = always_delete_dentry,
.d_dname = ns_dname,
+ .d_prune = stashed_dentry_prune,
};
static void nsfs_evict(struct inode *inode)
@@ -56,67 +47,16 @@ static void nsfs_evict(struct inode *inode)
ns->ops->put(ns);
}
-static int __ns_get_path(struct path *path, struct ns_common *ns)
-{
- struct vfsmount *mnt = nsfs_mnt;
- struct dentry *dentry;
- struct inode *inode;
- unsigned long d;
-
- rcu_read_lock();
- d = atomic_long_read(&ns->stashed);
- if (!d)
- goto slow;
- dentry = (struct dentry *)d;
- if (!lockref_get_not_dead(&dentry->d_lockref))
- goto slow;
- rcu_read_unlock();
- ns->ops->put(ns);
-got_it:
- path->mnt = mntget(mnt);
- path->dentry = dentry;
- return 0;
-slow:
- rcu_read_unlock();
- inode = new_inode_pseudo(mnt->mnt_sb);
- if (!inode) {
- ns->ops->put(ns);
- return -ENOMEM;
- }
- inode->i_ino = ns->inum;
- simple_inode_init_ts(inode);
- inode->i_flags |= S_IMMUTABLE;
- inode->i_mode = S_IFREG | S_IRUGO;
- inode->i_fop = &ns_file_operations;
- inode->i_private = ns;
-
- dentry = d_make_root(inode); /* not the normal use, but... */
- if (!dentry)
- return -ENOMEM;
- dentry->d_fsdata = (void *)ns->ops;
- d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry);
- if (d) {
- d_delete(dentry); /* make sure ->d_prune() does nothing */
- dput(dentry);
- cpu_relax();
- return -EAGAIN;
- }
- goto got_it;
-}
-
int ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
void *private_data)
{
- int ret;
+ struct ns_common *ns;
- do {
- struct ns_common *ns = ns_get_cb(private_data);
- if (!ns)
- return -ENOENT;
- ret = __ns_get_path(path, ns);
- } while (ret == -EAGAIN);
+ ns = ns_get_cb(private_data);
+ if (!ns)
+ return -ENOENT;
- return ret;
+ return path_from_stashed(&ns->stashed, ns->inum, nsfs_mnt, ns, path);
}
struct ns_get_path_task_args {
@@ -146,6 +86,7 @@ int open_related_ns(struct ns_common *ns,
struct ns_common *(*get_ns)(struct ns_common *ns))
{
struct path path = {};
+ struct ns_common *relative;
struct file *f;
int err;
int fd;
@@ -154,19 +95,15 @@ int open_related_ns(struct ns_common *ns,
if (fd < 0)
return fd;
- do {
- struct ns_common *relative;
+ relative = get_ns(ns);
+ if (IS_ERR(relative)) {
+ put_unused_fd(fd);
+ return PTR_ERR(relative);
+ }
- relative = get_ns(ns);
- if (IS_ERR(relative)) {
- put_unused_fd(fd);
- return PTR_ERR(relative);
- }
-
- err = __ns_get_path(&path, relative);
- } while (err == -EAGAIN);
-
- if (err) {
+ err = path_from_stashed(&relative->stashed, relative->inum, nsfs_mnt,
+ relative, &path);
+ if (err < 0) {
put_unused_fd(fd);
return err;
}
@@ -249,7 +186,8 @@ bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry)
{
struct inode *inode = d_inode(dentry);
- const struct proc_ns_operations *ns_ops = dentry->d_fsdata;
+ const struct ns_common *ns = inode->i_private;
+ const struct proc_ns_operations *ns_ops = ns->ops;
seq_printf(seq, "%s:[%lu]", ns_ops->name, inode->i_ino);
return 0;
@@ -261,6 +199,24 @@ static const struct super_operations nsfs_ops = {
.show_path = nsfs_show_path,
};
+static void nsfs_init_inode(struct inode *inode, void *data)
+{
+ inode->i_private = data;
+ inode->i_mode |= S_IRUGO;
+ inode->i_fop = &ns_file_operations;
+}
+
+static void nsfs_put_data(void *data)
+{
+ struct ns_common *ns = data;
+ ns->ops->put(ns);
+}
+
+static const struct stashed_operations nsfs_stashed_ops = {
+ .init_inode = nsfs_init_inode,
+ .put_data = nsfs_put_data,
+};
+
static int nsfs_init_fs_context(struct fs_context *fc)
{
struct pseudo_fs_context *ctx = init_pseudo(fc, NSFS_MAGIC);
@@ -268,6 +224,7 @@ static int nsfs_init_fs_context(struct fs_context *fc)
return -ENOMEM;
ctx->ops = &nsfs_ops;
ctx->dops = &ns_dentry_operations;
+ fc->s_fs_info = (void *)&nsfs_stashed_ops;
return 0;
}
diff --git a/fs/ntfs/Kconfig b/fs/ntfs/Kconfig
deleted file mode 100644
index 7b25097..0000000
--- a/fs/ntfs/Kconfig
+++ /dev/null
@@ -1,81 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-config NTFS_FS
- tristate "NTFS file system support"
- select BUFFER_HEAD
- select NLS
- help
- NTFS is the file system of Microsoft Windows NT, 2000, XP and 2003.
-
- Saying Y or M here enables read support. There is partial, but
- safe, write support available. For write support you must also
- say Y to "NTFS write support" below.
-
- There are also a number of user-space tools available, called
- ntfsprogs. These include ntfsundelete and ntfsresize, that work
- without NTFS support enabled in the kernel.
-
- This is a rewrite from scratch of Linux NTFS support and replaced
- the old NTFS code starting with Linux 2.5.11. A backport to
- the Linux 2.4 kernel series is separately available as a patch
- from the project web site.
-
- For more information see <file:Documentation/filesystems/ntfs.rst>
- and <http://www.linux-ntfs.org/>.
-
- To compile this file system support as a module, choose M here: the
- module will be called ntfs.
-
- If you are not using Windows NT, 2000, XP or 2003 in addition to
- Linux on your computer it is safe to say N.
-
-config NTFS_DEBUG
- bool "NTFS debugging support"
- depends on NTFS_FS
- help
- If you are experiencing any problems with the NTFS file system, say
- Y here. This will result in additional consistency checks to be
- performed by the driver as well as additional debugging messages to
- be written to the system log. Note that debugging messages are
- disabled by default. To enable them, supply the option debug_msgs=1
- at the kernel command line when booting the kernel or as an option
- to insmod when loading the ntfs module. Once the driver is active,
- you can enable debugging messages by doing (as root):
- echo 1 > /proc/sys/fs/ntfs-debug
- Replacing the "1" with "0" would disable debug messages.
-
- If you leave debugging messages disabled, this results in little
- overhead, but enabling debug messages results in very significant
- slowdown of the system.
-
- When reporting bugs, please try to have available a full dump of
- debugging messages while the misbehaviour was occurring.
-
-config NTFS_RW
- bool "NTFS write support"
- depends on NTFS_FS
- depends on PAGE_SIZE_LESS_THAN_64KB
- help
- This enables the partial, but safe, write support in the NTFS driver.
-
- The only supported operation is overwriting existing files, without
- changing the file length. No file or directory creation, deletion or
- renaming is possible. Note only non-resident files can be written to
- so you may find that some very small files (<500 bytes or so) cannot
- be written to.
-
- While we cannot guarantee that it will not damage any data, we have
- so far not received a single report where the driver would have
- damaged someones data so we assume it is perfectly safe to use.
-
- Note: While write support is safe in this version (a rewrite from
- scratch of the NTFS support), it should be noted that the old NTFS
- write support, included in Linux 2.5.10 and before (since 1997),
- is not safe.
-
- This is currently useful with TopologiLinux. TopologiLinux is run
- on top of any DOS/Microsoft Windows system without partitioning your
- hard disk. Unlike other Linux distributions TopologiLinux does not
- need its own partition. For more information see
- <http://topologi-linux.sourceforge.net/>
-
- It is perfectly safe to say N here.
diff --git a/fs/ntfs/Makefile b/fs/ntfs/Makefile
deleted file mode 100644
index 3e73657..0000000
--- a/fs/ntfs/Makefile
+++ /dev/null
@@ -1,15 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Rules for making the NTFS driver.
-
-obj-$(CONFIG_NTFS_FS) += ntfs.o
-
-ntfs-y := aops.o attrib.o collate.o compress.o debug.o dir.o file.o \
- index.o inode.o mft.o mst.o namei.o runlist.o super.o sysctl.o \
- unistr.o upcase.o
-
-ntfs-$(CONFIG_NTFS_RW) += bitmap.o lcnalloc.o logfile.o quota.o usnjrnl.o
-
-ccflags-y := -DNTFS_VERSION=\"2.1.32\"
-ccflags-$(CONFIG_NTFS_DEBUG) += -DDEBUG
-ccflags-$(CONFIG_NTFS_RW) += -DNTFS_RW
-
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
deleted file mode 100644
index 2d01517..0000000
--- a/fs/ntfs/aops.c
+++ /dev/null
@@ -1,1744 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * aops.c - NTFS kernel address space operations and page cache handling.
- *
- * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/errno.h>
-#include <linux/fs.h>
-#include <linux/gfp.h>
-#include <linux/mm.h>
-#include <linux/pagemap.h>
-#include <linux/swap.h>
-#include <linux/buffer_head.h>
-#include <linux/writeback.h>
-#include <linux/bit_spinlock.h>
-#include <linux/bio.h>
-
-#include "aops.h"
-#include "attrib.h"
-#include "debug.h"
-#include "inode.h"
-#include "mft.h"
-#include "runlist.h"
-#include "types.h"
-#include "ntfs.h"
-
-/**
- * ntfs_end_buffer_async_read - async io completion for reading attributes
- * @bh: buffer head on which io is completed
- * @uptodate: whether @bh is now uptodate or not
- *
- * Asynchronous I/O completion handler for reading pages belonging to the
- * attribute address space of an inode. The inodes can either be files or
- * directories or they can be fake inodes describing some attribute.
- *
- * If NInoMstProtected(), perform the post read mst fixups when all IO on the
- * page has been completed and mark the page uptodate or set the error bit on
- * the page. To determine the size of the records that need fixing up, we
- * cheat a little bit by setting the index_block_size in ntfs_inode to the ntfs
- * record size, and index_block_size_bits, to the log(base 2) of the ntfs
- * record size.
- */
-static void ntfs_end_buffer_async_read(struct buffer_head *bh, int uptodate)
-{
- unsigned long flags;
- struct buffer_head *first, *tmp;
- struct page *page;
- struct inode *vi;
- ntfs_inode *ni;
- int page_uptodate = 1;
-
- page = bh->b_page;
- vi = page->mapping->host;
- ni = NTFS_I(vi);
-
- if (likely(uptodate)) {
- loff_t i_size;
- s64 file_ofs, init_size;
-
- set_buffer_uptodate(bh);
-
- file_ofs = ((s64)page->index << PAGE_SHIFT) +
- bh_offset(bh);
- read_lock_irqsave(&ni->size_lock, flags);
- init_size = ni->initialized_size;
- i_size = i_size_read(vi);
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (unlikely(init_size > i_size)) {
- /* Race with shrinking truncate. */
- init_size = i_size;
- }
- /* Check for the current buffer head overflowing. */
- if (unlikely(file_ofs + bh->b_size > init_size)) {
- int ofs;
- void *kaddr;
-
- ofs = 0;
- if (file_ofs < init_size)
- ofs = init_size - file_ofs;
- kaddr = kmap_atomic(page);
- memset(kaddr + bh_offset(bh) + ofs, 0,
- bh->b_size - ofs);
- flush_dcache_page(page);
- kunmap_atomic(kaddr);
- }
- } else {
- clear_buffer_uptodate(bh);
- SetPageError(page);
- ntfs_error(ni->vol->sb, "Buffer I/O error, logical block "
- "0x%llx.", (unsigned long long)bh->b_blocknr);
- }
- first = page_buffers(page);
- spin_lock_irqsave(&first->b_uptodate_lock, flags);
- clear_buffer_async_read(bh);
- unlock_buffer(bh);
- tmp = bh;
- do {
- if (!buffer_uptodate(tmp))
- page_uptodate = 0;
- if (buffer_async_read(tmp)) {
- if (likely(buffer_locked(tmp)))
- goto still_busy;
- /* Async buffers must be locked. */
- BUG();
- }
- tmp = tmp->b_this_page;
- } while (tmp != bh);
- spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
- /*
- * If none of the buffers had errors then we can set the page uptodate,
- * but we first have to perform the post read mst fixups, if the
- * attribute is mst protected, i.e. if NInoMstProteced(ni) is true.
- * Note we ignore fixup errors as those are detected when
- * map_mft_record() is called which gives us per record granularity
- * rather than per page granularity.
- */
- if (!NInoMstProtected(ni)) {
- if (likely(page_uptodate && !PageError(page)))
- SetPageUptodate(page);
- } else {
- u8 *kaddr;
- unsigned int i, recs;
- u32 rec_size;
-
- rec_size = ni->itype.index.block_size;
- recs = PAGE_SIZE / rec_size;
- /* Should have been verified before we got here... */
- BUG_ON(!recs);
- kaddr = kmap_atomic(page);
- for (i = 0; i < recs; i++)
- post_read_mst_fixup((NTFS_RECORD*)(kaddr +
- i * rec_size), rec_size);
- kunmap_atomic(kaddr);
- flush_dcache_page(page);
- if (likely(page_uptodate && !PageError(page)))
- SetPageUptodate(page);
- }
- unlock_page(page);
- return;
-still_busy:
- spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
- return;
-}
-
-/**
- * ntfs_read_block - fill a @folio of an address space with data
- * @folio: page cache folio to fill with data
- *
- * We read each buffer asynchronously and when all buffers are read in, our io
- * completion handler ntfs_end_buffer_read_async(), if required, automatically
- * applies the mst fixups to the folio before finally marking it uptodate and
- * unlocking it.
- *
- * We only enforce allocated_size limit because i_size is checked for in
- * generic_file_read().
- *
- * Return 0 on success and -errno on error.
- *
- * Contains an adapted version of fs/buffer.c::block_read_full_folio().
- */
-static int ntfs_read_block(struct folio *folio)
-{
- loff_t i_size;
- VCN vcn;
- LCN lcn;
- s64 init_size;
- struct inode *vi;
- ntfs_inode *ni;
- ntfs_volume *vol;
- runlist_element *rl;
- struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
- sector_t iblock, lblock, zblock;
- unsigned long flags;
- unsigned int blocksize, vcn_ofs;
- int i, nr;
- unsigned char blocksize_bits;
-
- vi = folio->mapping->host;
- ni = NTFS_I(vi);
- vol = ni->vol;
-
- /* $MFT/$DATA must have its complete runlist in memory at all times. */
- BUG_ON(!ni->runlist.rl && !ni->mft_no && !NInoAttr(ni));
-
- blocksize = vol->sb->s_blocksize;
- blocksize_bits = vol->sb->s_blocksize_bits;
-
- head = folio_buffers(folio);
- if (!head)
- head = create_empty_buffers(folio, blocksize, 0);
- bh = head;
-
- /*
- * We may be racing with truncate. To avoid some of the problems we
- * now take a snapshot of the various sizes and use those for the whole
- * of the function. In case of an extending truncate it just means we
- * may leave some buffers unmapped which are now allocated. This is
- * not a problem since these buffers will just get mapped when a write
- * occurs. In case of a shrinking truncate, we will detect this later
- * on due to the runlist being incomplete and if the folio is being
- * fully truncated, truncate will throw it away as soon as we unlock
- * it so no need to worry what we do with it.
- */
- iblock = (s64)folio->index << (PAGE_SHIFT - blocksize_bits);
- read_lock_irqsave(&ni->size_lock, flags);
- lblock = (ni->allocated_size + blocksize - 1) >> blocksize_bits;
- init_size = ni->initialized_size;
- i_size = i_size_read(vi);
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (unlikely(init_size > i_size)) {
- /* Race with shrinking truncate. */
- init_size = i_size;
- }
- zblock = (init_size + blocksize - 1) >> blocksize_bits;
-
- /* Loop through all the buffers in the folio. */
- rl = NULL;
- nr = i = 0;
- do {
- int err = 0;
-
- if (unlikely(buffer_uptodate(bh)))
- continue;
- if (unlikely(buffer_mapped(bh))) {
- arr[nr++] = bh;
- continue;
- }
- bh->b_bdev = vol->sb->s_bdev;
- /* Is the block within the allowed limits? */
- if (iblock < lblock) {
- bool is_retry = false;
-
- /* Convert iblock into corresponding vcn and offset. */
- vcn = (VCN)iblock << blocksize_bits >>
- vol->cluster_size_bits;
- vcn_ofs = ((VCN)iblock << blocksize_bits) &
- vol->cluster_size_mask;
- if (!rl) {
-lock_retry_remap:
- down_read(&ni->runlist.lock);
- rl = ni->runlist.rl;
- }
- if (likely(rl != NULL)) {
- /* Seek to element containing target vcn. */
- while (rl->length && rl[1].vcn <= vcn)
- rl++;
- lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
- } else
- lcn = LCN_RL_NOT_MAPPED;
- /* Successful remap. */
- if (lcn >= 0) {
- /* Setup buffer head to correct block. */
- bh->b_blocknr = ((lcn << vol->cluster_size_bits)
- + vcn_ofs) >> blocksize_bits;
- set_buffer_mapped(bh);
- /* Only read initialized data blocks. */
- if (iblock < zblock) {
- arr[nr++] = bh;
- continue;
- }
- /* Fully non-initialized data block, zero it. */
- goto handle_zblock;
- }
- /* It is a hole, need to zero it. */
- if (lcn == LCN_HOLE)
- goto handle_hole;
- /* If first try and runlist unmapped, map and retry. */
- if (!is_retry && lcn == LCN_RL_NOT_MAPPED) {
- is_retry = true;
- /*
- * Attempt to map runlist, dropping lock for
- * the duration.
- */
- up_read(&ni->runlist.lock);
- err = ntfs_map_runlist(ni, vcn);
- if (likely(!err))
- goto lock_retry_remap;
- rl = NULL;
- } else if (!rl)
- up_read(&ni->runlist.lock);
- /*
- * If buffer is outside the runlist, treat it as a
- * hole. This can happen due to concurrent truncate
- * for example.
- */
- if (err == -ENOENT || lcn == LCN_ENOENT) {
- err = 0;
- goto handle_hole;
- }
- /* Hard error, zero out region. */
- if (!err)
- err = -EIO;
- bh->b_blocknr = -1;
- folio_set_error(folio);
- ntfs_error(vol->sb, "Failed to read from inode 0x%lx, "
- "attribute type 0x%x, vcn 0x%llx, "
- "offset 0x%x because its location on "
- "disk could not be determined%s "
- "(error code %i).", ni->mft_no,
- ni->type, (unsigned long long)vcn,
- vcn_ofs, is_retry ? " even after "
- "retrying" : "", err);
- }
- /*
- * Either iblock was outside lblock limits or
- * ntfs_rl_vcn_to_lcn() returned error. Just zero that portion
- * of the folio and set the buffer uptodate.
- */
-handle_hole:
- bh->b_blocknr = -1UL;
- clear_buffer_mapped(bh);
-handle_zblock:
- folio_zero_range(folio, i * blocksize, blocksize);
- if (likely(!err))
- set_buffer_uptodate(bh);
- } while (i++, iblock++, (bh = bh->b_this_page) != head);
-
- /* Release the lock if we took it. */
- if (rl)
- up_read(&ni->runlist.lock);
-
- /* Check we have at least one buffer ready for i/o. */
- if (nr) {
- struct buffer_head *tbh;
-
- /* Lock the buffers. */
- for (i = 0; i < nr; i++) {
- tbh = arr[i];
- lock_buffer(tbh);
- tbh->b_end_io = ntfs_end_buffer_async_read;
- set_buffer_async_read(tbh);
- }
- /* Finally, start i/o on the buffers. */
- for (i = 0; i < nr; i++) {
- tbh = arr[i];
- if (likely(!buffer_uptodate(tbh)))
- submit_bh(REQ_OP_READ, tbh);
- else
- ntfs_end_buffer_async_read(tbh, 1);
- }
- return 0;
- }
- /* No i/o was scheduled on any of the buffers. */
- if (likely(!folio_test_error(folio)))
- folio_mark_uptodate(folio);
- else /* Signal synchronous i/o error. */
- nr = -EIO;
- folio_unlock(folio);
- return nr;
-}
-
-/**
- * ntfs_read_folio - fill a @folio of a @file with data from the device
- * @file: open file to which the folio @folio belongs or NULL
- * @folio: page cache folio to fill with data
- *
- * For non-resident attributes, ntfs_read_folio() fills the @folio of the open
- * file @file by calling the ntfs version of the generic block_read_full_folio()
- * function, ntfs_read_block(), which in turn creates and reads in the buffers
- * associated with the folio asynchronously.
- *
- * For resident attributes, OTOH, ntfs_read_folio() fills @folio by copying the
- * data from the mft record (which at this stage is most likely in memory) and
- * fills the remainder with zeroes. Thus, in this case, I/O is synchronous, as
- * even if the mft record is not cached at this point in time, we need to wait
- * for it to be read in before we can do the copy.
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_read_folio(struct file *file, struct folio *folio)
-{
- struct page *page = &folio->page;
- loff_t i_size;
- struct inode *vi;
- ntfs_inode *ni, *base_ni;
- u8 *addr;
- ntfs_attr_search_ctx *ctx;
- MFT_RECORD *mrec;
- unsigned long flags;
- u32 attr_len;
- int err = 0;
-
-retry_readpage:
- BUG_ON(!PageLocked(page));
- vi = page->mapping->host;
- i_size = i_size_read(vi);
- /* Is the page fully outside i_size? (truncate in progress) */
- if (unlikely(page->index >= (i_size + PAGE_SIZE - 1) >>
- PAGE_SHIFT)) {
- zero_user(page, 0, PAGE_SIZE);
- ntfs_debug("Read outside i_size - truncated?");
- goto done;
- }
- /*
- * This can potentially happen because we clear PageUptodate() during
- * ntfs_writepage() of MstProtected() attributes.
- */
- if (PageUptodate(page)) {
- unlock_page(page);
- return 0;
- }
- ni = NTFS_I(vi);
- /*
- * Only $DATA attributes can be encrypted and only unnamed $DATA
- * attributes can be compressed. Index root can have the flags set but
- * this means to create compressed/encrypted files, not that the
- * attribute is compressed/encrypted. Note we need to check for
- * AT_INDEX_ALLOCATION since this is the type of both directory and
- * index inodes.
- */
- if (ni->type != AT_INDEX_ALLOCATION) {
- /* If attribute is encrypted, deny access, just like NT4. */
- if (NInoEncrypted(ni)) {
- BUG_ON(ni->type != AT_DATA);
- err = -EACCES;
- goto err_out;
- }
- /* Compressed data streams are handled in compress.c. */
- if (NInoNonResident(ni) && NInoCompressed(ni)) {
- BUG_ON(ni->type != AT_DATA);
- BUG_ON(ni->name_len);
- return ntfs_read_compressed_block(page);
- }
- }
- /* NInoNonResident() == NInoIndexAllocPresent() */
- if (NInoNonResident(ni)) {
- /* Normal, non-resident data stream. */
- return ntfs_read_block(folio);
- }
- /*
- * Attribute is resident, implying it is not compressed or encrypted.
- * This also means the attribute is smaller than an mft record and
- * hence smaller than a page, so can simply zero out any pages with
- * index above 0. Note the attribute can actually be marked compressed
- * but if it is resident the actual data is not compressed so we are
- * ok to ignore the compressed flag here.
- */
- if (unlikely(page->index > 0)) {
- zero_user(page, 0, PAGE_SIZE);
- goto done;
- }
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- /* Map, pin, and lock the mft record. */
- mrec = map_mft_record(base_ni);
- if (IS_ERR(mrec)) {
- err = PTR_ERR(mrec);
- goto err_out;
- }
- /*
- * If a parallel write made the attribute non-resident, drop the mft
- * record and retry the read_folio.
- */
- if (unlikely(NInoNonResident(ni))) {
- unmap_mft_record(base_ni);
- goto retry_readpage;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, mrec);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto unm_err_out;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err))
- goto put_unm_err_out;
- attr_len = le32_to_cpu(ctx->attr->data.resident.value_length);
- read_lock_irqsave(&ni->size_lock, flags);
- if (unlikely(attr_len > ni->initialized_size))
- attr_len = ni->initialized_size;
- i_size = i_size_read(vi);
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (unlikely(attr_len > i_size)) {
- /* Race with shrinking truncate. */
- attr_len = i_size;
- }
- addr = kmap_atomic(page);
- /* Copy the data to the page. */
- memcpy(addr, (u8*)ctx->attr +
- le16_to_cpu(ctx->attr->data.resident.value_offset),
- attr_len);
- /* Zero the remainder of the page. */
- memset(addr + attr_len, 0, PAGE_SIZE - attr_len);
- flush_dcache_page(page);
- kunmap_atomic(addr);
-put_unm_err_out:
- ntfs_attr_put_search_ctx(ctx);
-unm_err_out:
- unmap_mft_record(base_ni);
-done:
- SetPageUptodate(page);
-err_out:
- unlock_page(page);
- return err;
-}
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_write_block - write a @folio to the backing store
- * @folio: page cache folio to write out
- * @wbc: writeback control structure
- *
- * This function is for writing folios belonging to non-resident, non-mst
- * protected attributes to their backing store.
- *
- * For a folio with buffers, map and write the dirty buffers asynchronously
- * under folio writeback. For a folio without buffers, create buffers for the
- * folio, then proceed as above.
- *
- * If a folio doesn't have buffers the folio dirty state is definitive. If
- * a folio does have buffers, the folio dirty state is just a hint,
- * and the buffer dirty state is definitive. (A hint which has rules:
- * dirty buffers against a clean folio is illegal. Other combinations are
- * legal and need to be handled. In particular a dirty folio containing
- * clean buffers for example.)
- *
- * Return 0 on success and -errno on error.
- *
- * Based on ntfs_read_block() and __block_write_full_folio().
- */
-static int ntfs_write_block(struct folio *folio, struct writeback_control *wbc)
-{
- VCN vcn;
- LCN lcn;
- s64 initialized_size;
- loff_t i_size;
- sector_t block, dblock, iblock;
- struct inode *vi;
- ntfs_inode *ni;
- ntfs_volume *vol;
- runlist_element *rl;
- struct buffer_head *bh, *head;
- unsigned long flags;
- unsigned int blocksize, vcn_ofs;
- int err;
- bool need_end_writeback;
- unsigned char blocksize_bits;
-
- vi = folio->mapping->host;
- ni = NTFS_I(vi);
- vol = ni->vol;
-
- ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, page index "
- "0x%lx.", ni->mft_no, ni->type, folio->index);
-
- BUG_ON(!NInoNonResident(ni));
- BUG_ON(NInoMstProtected(ni));
- blocksize = vol->sb->s_blocksize;
- blocksize_bits = vol->sb->s_blocksize_bits;
- head = folio_buffers(folio);
- if (!head) {
- BUG_ON(!folio_test_uptodate(folio));
- head = create_empty_buffers(folio, blocksize,
- (1 << BH_Uptodate) | (1 << BH_Dirty));
- }
- bh = head;
-
- /* NOTE: Different naming scheme to ntfs_read_block()! */
-
- /* The first block in the folio. */
- block = (s64)folio->index << (PAGE_SHIFT - blocksize_bits);
-
- read_lock_irqsave(&ni->size_lock, flags);
- i_size = i_size_read(vi);
- initialized_size = ni->initialized_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
-
- /* The first out of bounds block for the data size. */
- dblock = (i_size + blocksize - 1) >> blocksize_bits;
-
- /* The last (fully or partially) initialized block. */
- iblock = initialized_size >> blocksize_bits;
-
- /*
- * Be very careful. We have no exclusion from block_dirty_folio
- * here, and the (potentially unmapped) buffers may become dirty at
- * any time. If a buffer becomes dirty here after we've inspected it
- * then we just miss that fact, and the folio stays dirty.
- *
- * Buffers outside i_size may be dirtied by block_dirty_folio;
- * handle that here by just cleaning them.
- */
-
- /*
- * Loop through all the buffers in the folio, mapping all the dirty
- * buffers to disk addresses and handling any aliases from the
- * underlying block device's mapping.
- */
- rl = NULL;
- err = 0;
- do {
- bool is_retry = false;
-
- if (unlikely(block >= dblock)) {
- /*
- * Mapped buffers outside i_size will occur, because
- * this folio can be outside i_size when there is a
- * truncate in progress. The contents of such buffers
- * were zeroed by ntfs_writepage().
- *
- * FIXME: What about the small race window where
- * ntfs_writepage() has not done any clearing because
- * the folio was within i_size but before we get here,
- * vmtruncate() modifies i_size?
- */
- clear_buffer_dirty(bh);
- set_buffer_uptodate(bh);
- continue;
- }
-
- /* Clean buffers are not written out, so no need to map them. */
- if (!buffer_dirty(bh))
- continue;
-
- /* Make sure we have enough initialized size. */
- if (unlikely((block >= iblock) &&
- (initialized_size < i_size))) {
- /*
- * If this folio is fully outside initialized
- * size, zero out all folios between the current
- * initialized size and the current folio. Just
- * use ntfs_read_folio() to do the zeroing
- * transparently.
- */
- if (block > iblock) {
- // TODO:
- // For each folio do:
- // - read_cache_folio()
- // Again for each folio do:
- // - wait_on_folio_locked()
- // - Check (folio_test_uptodate(folio) &&
- // !folio_test_error(folio))
- // Update initialized size in the attribute and
- // in the inode.
- // Again, for each folio do:
- // block_dirty_folio();
- // folio_put()
- // We don't need to wait on the writes.
- // Update iblock.
- }
- /*
- * The current folio straddles initialized size. Zero
- * all non-uptodate buffers and set them uptodate (and
- * dirty?). Note, there aren't any non-uptodate buffers
- * if the folio is uptodate.
- * FIXME: For an uptodate folio, the buffers may need to
- * be written out because they were not initialized on
- * disk before.
- */
- if (!folio_test_uptodate(folio)) {
- // TODO:
- // Zero any non-uptodate buffers up to i_size.
- // Set them uptodate and dirty.
- }
- // TODO:
- // Update initialized size in the attribute and in the
- // inode (up to i_size).
- // Update iblock.
- // FIXME: This is inefficient. Try to batch the two
- // size changes to happen in one go.
- ntfs_error(vol->sb, "Writing beyond initialized size "
- "is not supported yet. Sorry.");
- err = -EOPNOTSUPP;
- break;
- // Do NOT set_buffer_new() BUT DO clear buffer range
- // outside write request range.
- // set_buffer_uptodate() on complete buffers as well as
- // set_buffer_dirty().
- }
-
- /* No need to map buffers that are already mapped. */
- if (buffer_mapped(bh))
- continue;
-
- /* Unmapped, dirty buffer. Need to map it. */
- bh->b_bdev = vol->sb->s_bdev;
-
- /* Convert block into corresponding vcn and offset. */
- vcn = (VCN)block << blocksize_bits;
- vcn_ofs = vcn & vol->cluster_size_mask;
- vcn >>= vol->cluster_size_bits;
- if (!rl) {
-lock_retry_remap:
- down_read(&ni->runlist.lock);
- rl = ni->runlist.rl;
- }
- if (likely(rl != NULL)) {
- /* Seek to element containing target vcn. */
- while (rl->length && rl[1].vcn <= vcn)
- rl++;
- lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
- } else
- lcn = LCN_RL_NOT_MAPPED;
- /* Successful remap. */
- if (lcn >= 0) {
- /* Setup buffer head to point to correct block. */
- bh->b_blocknr = ((lcn << vol->cluster_size_bits) +
- vcn_ofs) >> blocksize_bits;
- set_buffer_mapped(bh);
- continue;
- }
- /* It is a hole, need to instantiate it. */
- if (lcn == LCN_HOLE) {
- u8 *kaddr;
- unsigned long *bpos, *bend;
-
- /* Check if the buffer is zero. */
- kaddr = kmap_local_folio(folio, bh_offset(bh));
- bpos = (unsigned long *)kaddr;
- bend = (unsigned long *)(kaddr + blocksize);
- do {
- if (unlikely(*bpos))
- break;
- } while (likely(++bpos < bend));
- kunmap_local(kaddr);
- if (bpos == bend) {
- /*
- * Buffer is zero and sparse, no need to write
- * it.
- */
- bh->b_blocknr = -1;
- clear_buffer_dirty(bh);
- continue;
- }
- // TODO: Instantiate the hole.
- // clear_buffer_new(bh);
- // clean_bdev_bh_alias(bh);
- ntfs_error(vol->sb, "Writing into sparse regions is "
- "not supported yet. Sorry.");
- err = -EOPNOTSUPP;
- break;
- }
- /* If first try and runlist unmapped, map and retry. */
- if (!is_retry && lcn == LCN_RL_NOT_MAPPED) {
- is_retry = true;
- /*
- * Attempt to map runlist, dropping lock for
- * the duration.
- */
- up_read(&ni->runlist.lock);
- err = ntfs_map_runlist(ni, vcn);
- if (likely(!err))
- goto lock_retry_remap;
- rl = NULL;
- } else if (!rl)
- up_read(&ni->runlist.lock);
- /*
- * If buffer is outside the runlist, truncate has cut it out
- * of the runlist. Just clean and clear the buffer and set it
- * uptodate so it can get discarded by the VM.
- */
- if (err == -ENOENT || lcn == LCN_ENOENT) {
- bh->b_blocknr = -1;
- clear_buffer_dirty(bh);
- folio_zero_range(folio, bh_offset(bh), blocksize);
- set_buffer_uptodate(bh);
- err = 0;
- continue;
- }
- /* Failed to map the buffer, even after retrying. */
- if (!err)
- err = -EIO;
- bh->b_blocknr = -1;
- ntfs_error(vol->sb, "Failed to write to inode 0x%lx, "
- "attribute type 0x%x, vcn 0x%llx, offset 0x%x "
- "because its location on disk could not be "
- "determined%s (error code %i).", ni->mft_no,
- ni->type, (unsigned long long)vcn,
- vcn_ofs, is_retry ? " even after "
- "retrying" : "", err);
- break;
- } while (block++, (bh = bh->b_this_page) != head);
-
- /* Release the lock if we took it. */
- if (rl)
- up_read(&ni->runlist.lock);
-
- /* For the error case, need to reset bh to the beginning. */
- bh = head;
-
- /* Just an optimization, so ->read_folio() is not called later. */
- if (unlikely(!folio_test_uptodate(folio))) {
- int uptodate = 1;
- do {
- if (!buffer_uptodate(bh)) {
- uptodate = 0;
- bh = head;
- break;
- }
- } while ((bh = bh->b_this_page) != head);
- if (uptodate)
- folio_mark_uptodate(folio);
- }
-
- /* Setup all mapped, dirty buffers for async write i/o. */
- do {
- if (buffer_mapped(bh) && buffer_dirty(bh)) {
- lock_buffer(bh);
- if (test_clear_buffer_dirty(bh)) {
- BUG_ON(!buffer_uptodate(bh));
- mark_buffer_async_write(bh);
- } else
- unlock_buffer(bh);
- } else if (unlikely(err)) {
- /*
- * For the error case. The buffer may have been set
- * dirty during attachment to a dirty folio.
- */
- if (err != -ENOMEM)
- clear_buffer_dirty(bh);
- }
- } while ((bh = bh->b_this_page) != head);
-
- if (unlikely(err)) {
- // TODO: Remove the -EOPNOTSUPP check later on...
- if (unlikely(err == -EOPNOTSUPP))
- err = 0;
- else if (err == -ENOMEM) {
- ntfs_warning(vol->sb, "Error allocating memory. "
- "Redirtying folio so we try again "
- "later.");
- /*
- * Put the folio back on mapping->dirty_pages, but
- * leave its buffer's dirty state as-is.
- */
- folio_redirty_for_writepage(wbc, folio);
- err = 0;
- } else
- folio_set_error(folio);
- }
-
- BUG_ON(folio_test_writeback(folio));
- folio_start_writeback(folio); /* Keeps try_to_free_buffers() away. */
-
- /* Submit the prepared buffers for i/o. */
- need_end_writeback = true;
- do {
- struct buffer_head *next = bh->b_this_page;
- if (buffer_async_write(bh)) {
- submit_bh(REQ_OP_WRITE, bh);
- need_end_writeback = false;
- }
- bh = next;
- } while (bh != head);
- folio_unlock(folio);
-
- /* If no i/o was started, need to end writeback here. */
- if (unlikely(need_end_writeback))
- folio_end_writeback(folio);
-
- ntfs_debug("Done.");
- return err;
-}
-
-/**
- * ntfs_write_mst_block - write a @page to the backing store
- * @page: page cache page to write out
- * @wbc: writeback control structure
- *
- * This function is for writing pages belonging to non-resident, mst protected
- * attributes to their backing store. The only supported attributes are index
- * allocation and $MFT/$DATA. Both directory inodes and index inodes are
- * supported for the index allocation case.
- *
- * The page must remain locked for the duration of the write because we apply
- * the mst fixups, write, and then undo the fixups, so if we were to unlock the
- * page before undoing the fixups, any other user of the page will see the
- * page contents as corrupt.
- *
- * We clear the page uptodate flag for the duration of the function to ensure
- * exclusion for the $MFT/$DATA case against someone mapping an mft record we
- * are about to apply the mst fixups to.
- *
- * Return 0 on success and -errno on error.
- *
- * Based on ntfs_write_block(), ntfs_mft_writepage(), and
- * write_mft_record_nolock().
- */
-static int ntfs_write_mst_block(struct page *page,
- struct writeback_control *wbc)
-{
- sector_t block, dblock, rec_block;
- struct inode *vi = page->mapping->host;
- ntfs_inode *ni = NTFS_I(vi);
- ntfs_volume *vol = ni->vol;
- u8 *kaddr;
- unsigned int rec_size = ni->itype.index.block_size;
- ntfs_inode *locked_nis[PAGE_SIZE / NTFS_BLOCK_SIZE];
- struct buffer_head *bh, *head, *tbh, *rec_start_bh;
- struct buffer_head *bhs[MAX_BUF_PER_PAGE];
- runlist_element *rl;
- int i, nr_locked_nis, nr_recs, nr_bhs, max_bhs, bhs_per_rec, err, err2;
- unsigned bh_size, rec_size_bits;
- bool sync, is_mft, page_is_dirty, rec_is_dirty;
- unsigned char bh_size_bits;
-
- if (WARN_ON(rec_size < NTFS_BLOCK_SIZE))
- return -EINVAL;
-
- ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, page index "
- "0x%lx.", vi->i_ino, ni->type, page->index);
- BUG_ON(!NInoNonResident(ni));
- BUG_ON(!NInoMstProtected(ni));
- is_mft = (S_ISREG(vi->i_mode) && !vi->i_ino);
- /*
- * NOTE: ntfs_write_mst_block() would be called for $MFTMirr if a page
- * in its page cache were to be marked dirty. However this should
- * never happen with the current driver and considering we do not
- * handle this case here we do want to BUG(), at least for now.
- */
- BUG_ON(!(is_mft || S_ISDIR(vi->i_mode) ||
- (NInoAttr(ni) && ni->type == AT_INDEX_ALLOCATION)));
- bh_size = vol->sb->s_blocksize;
- bh_size_bits = vol->sb->s_blocksize_bits;
- max_bhs = PAGE_SIZE / bh_size;
- BUG_ON(!max_bhs);
- BUG_ON(max_bhs > MAX_BUF_PER_PAGE);
-
- /* Were we called for sync purposes? */
- sync = (wbc->sync_mode == WB_SYNC_ALL);
-
- /* Make sure we have mapped buffers. */
- bh = head = page_buffers(page);
- BUG_ON(!bh);
-
- rec_size_bits = ni->itype.index.block_size_bits;
- BUG_ON(!(PAGE_SIZE >> rec_size_bits));
- bhs_per_rec = rec_size >> bh_size_bits;
- BUG_ON(!bhs_per_rec);
-
- /* The first block in the page. */
- rec_block = block = (sector_t)page->index <<
- (PAGE_SHIFT - bh_size_bits);
-
- /* The first out of bounds block for the data size. */
- dblock = (i_size_read(vi) + bh_size - 1) >> bh_size_bits;
-
- rl = NULL;
- err = err2 = nr_bhs = nr_recs = nr_locked_nis = 0;
- page_is_dirty = rec_is_dirty = false;
- rec_start_bh = NULL;
- do {
- bool is_retry = false;
-
- if (likely(block < rec_block)) {
- if (unlikely(block >= dblock)) {
- clear_buffer_dirty(bh);
- set_buffer_uptodate(bh);
- continue;
- }
- /*
- * This block is not the first one in the record. We
- * ignore the buffer's dirty state because we could
- * have raced with a parallel mark_ntfs_record_dirty().
- */
- if (!rec_is_dirty)
- continue;
- if (unlikely(err2)) {
- if (err2 != -ENOMEM)
- clear_buffer_dirty(bh);
- continue;
- }
- } else /* if (block == rec_block) */ {
- BUG_ON(block > rec_block);
- /* This block is the first one in the record. */
- rec_block += bhs_per_rec;
- err2 = 0;
- if (unlikely(block >= dblock)) {
- clear_buffer_dirty(bh);
- continue;
- }
- if (!buffer_dirty(bh)) {
- /* Clean records are not written out. */
- rec_is_dirty = false;
- continue;
- }
- rec_is_dirty = true;
- rec_start_bh = bh;
- }
- /* Need to map the buffer if it is not mapped already. */
- if (unlikely(!buffer_mapped(bh))) {
- VCN vcn;
- LCN lcn;
- unsigned int vcn_ofs;
-
- bh->b_bdev = vol->sb->s_bdev;
- /* Obtain the vcn and offset of the current block. */
- vcn = (VCN)block << bh_size_bits;
- vcn_ofs = vcn & vol->cluster_size_mask;
- vcn >>= vol->cluster_size_bits;
- if (!rl) {
-lock_retry_remap:
- down_read(&ni->runlist.lock);
- rl = ni->runlist.rl;
- }
- if (likely(rl != NULL)) {
- /* Seek to element containing target vcn. */
- while (rl->length && rl[1].vcn <= vcn)
- rl++;
- lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
- } else
- lcn = LCN_RL_NOT_MAPPED;
- /* Successful remap. */
- if (likely(lcn >= 0)) {
- /* Setup buffer head to correct block. */
- bh->b_blocknr = ((lcn <<
- vol->cluster_size_bits) +
- vcn_ofs) >> bh_size_bits;
- set_buffer_mapped(bh);
- } else {
- /*
- * Remap failed. Retry to map the runlist once
- * unless we are working on $MFT which always
- * has the whole of its runlist in memory.
- */
- if (!is_mft && !is_retry &&
- lcn == LCN_RL_NOT_MAPPED) {
- is_retry = true;
- /*
- * Attempt to map runlist, dropping
- * lock for the duration.
- */
- up_read(&ni->runlist.lock);
- err2 = ntfs_map_runlist(ni, vcn);
- if (likely(!err2))
- goto lock_retry_remap;
- if (err2 == -ENOMEM)
- page_is_dirty = true;
- lcn = err2;
- } else {
- err2 = -EIO;
- if (!rl)
- up_read(&ni->runlist.lock);
- }
- /* Hard error. Abort writing this record. */
- if (!err || err == -ENOMEM)
- err = err2;
- bh->b_blocknr = -1;
- ntfs_error(vol->sb, "Cannot write ntfs record "
- "0x%llx (inode 0x%lx, "
- "attribute type 0x%x) because "
- "its location on disk could "
- "not be determined (error "
- "code %lli).",
- (long long)block <<
- bh_size_bits >>
- vol->mft_record_size_bits,
- ni->mft_no, ni->type,
- (long long)lcn);
- /*
- * If this is not the first buffer, remove the
- * buffers in this record from the list of
- * buffers to write and clear their dirty bit
- * if not error -ENOMEM.
- */
- if (rec_start_bh != bh) {
- while (bhs[--nr_bhs] != rec_start_bh)
- ;
- if (err2 != -ENOMEM) {
- do {
- clear_buffer_dirty(
- rec_start_bh);
- } while ((rec_start_bh =
- rec_start_bh->
- b_this_page) !=
- bh);
- }
- }
- continue;
- }
- }
- BUG_ON(!buffer_uptodate(bh));
- BUG_ON(nr_bhs >= max_bhs);
- bhs[nr_bhs++] = bh;
- } while (block++, (bh = bh->b_this_page) != head);
- if (unlikely(rl))
- up_read(&ni->runlist.lock);
- /* If there were no dirty buffers, we are done. */
- if (!nr_bhs)
- goto done;
- /* Map the page so we can access its contents. */
- kaddr = kmap(page);
- /* Clear the page uptodate flag whilst the mst fixups are applied. */
- BUG_ON(!PageUptodate(page));
- ClearPageUptodate(page);
- for (i = 0; i < nr_bhs; i++) {
- unsigned int ofs;
-
- /* Skip buffers which are not at the beginning of records. */
- if (i % bhs_per_rec)
- continue;
- tbh = bhs[i];
- ofs = bh_offset(tbh);
- if (is_mft) {
- ntfs_inode *tni;
- unsigned long mft_no;
-
- /* Get the mft record number. */
- mft_no = (((s64)page->index << PAGE_SHIFT) + ofs)
- >> rec_size_bits;
- /* Check whether to write this mft record. */
- tni = NULL;
- if (!ntfs_may_write_mft_record(vol, mft_no,
- (MFT_RECORD*)(kaddr + ofs), &tni)) {
- /*
- * The record should not be written. This
- * means we need to redirty the page before
- * returning.
- */
- page_is_dirty = true;
- /*
- * Remove the buffers in this mft record from
- * the list of buffers to write.
- */
- do {
- bhs[i] = NULL;
- } while (++i % bhs_per_rec);
- continue;
- }
- /*
- * The record should be written. If a locked ntfs
- * inode was returned, add it to the array of locked
- * ntfs inodes.
- */
- if (tni)
- locked_nis[nr_locked_nis++] = tni;
- }
- /* Apply the mst protection fixups. */
- err2 = pre_write_mst_fixup((NTFS_RECORD*)(kaddr + ofs),
- rec_size);
- if (unlikely(err2)) {
- if (!err || err == -ENOMEM)
- err = -EIO;
- ntfs_error(vol->sb, "Failed to apply mst fixups "
- "(inode 0x%lx, attribute type 0x%x, "
- "page index 0x%lx, page offset 0x%x)!"
- " Unmount and run chkdsk.", vi->i_ino,
- ni->type, page->index, ofs);
- /*
- * Mark all the buffers in this record clean as we do
- * not want to write corrupt data to disk.
- */
- do {
- clear_buffer_dirty(bhs[i]);
- bhs[i] = NULL;
- } while (++i % bhs_per_rec);
- continue;
- }
- nr_recs++;
- }
- /* If no records are to be written out, we are done. */
- if (!nr_recs)
- goto unm_done;
- flush_dcache_page(page);
- /* Lock buffers and start synchronous write i/o on them. */
- for (i = 0; i < nr_bhs; i++) {
- tbh = bhs[i];
- if (!tbh)
- continue;
- if (!trylock_buffer(tbh))
- BUG();
- /* The buffer dirty state is now irrelevant, just clean it. */
- clear_buffer_dirty(tbh);
- BUG_ON(!buffer_uptodate(tbh));
- BUG_ON(!buffer_mapped(tbh));
- get_bh(tbh);
- tbh->b_end_io = end_buffer_write_sync;
- submit_bh(REQ_OP_WRITE, tbh);
- }
- /* Synchronize the mft mirror now if not @sync. */
- if (is_mft && !sync)
- goto do_mirror;
-do_wait:
- /* Wait on i/o completion of buffers. */
- for (i = 0; i < nr_bhs; i++) {
- tbh = bhs[i];
- if (!tbh)
- continue;
- wait_on_buffer(tbh);
- if (unlikely(!buffer_uptodate(tbh))) {
- ntfs_error(vol->sb, "I/O error while writing ntfs "
- "record buffer (inode 0x%lx, "
- "attribute type 0x%x, page index "
- "0x%lx, page offset 0x%lx)! Unmount "
- "and run chkdsk.", vi->i_ino, ni->type,
- page->index, bh_offset(tbh));
- if (!err || err == -ENOMEM)
- err = -EIO;
- /*
- * Set the buffer uptodate so the page and buffer
- * states do not become out of sync.
- */
- set_buffer_uptodate(tbh);
- }
- }
- /* If @sync, now synchronize the mft mirror. */
- if (is_mft && sync) {
-do_mirror:
- for (i = 0; i < nr_bhs; i++) {
- unsigned long mft_no;
- unsigned int ofs;
-
- /*
- * Skip buffers which are not at the beginning of
- * records.
- */
- if (i % bhs_per_rec)
- continue;
- tbh = bhs[i];
- /* Skip removed buffers (and hence records). */
- if (!tbh)
- continue;
- ofs = bh_offset(tbh);
- /* Get the mft record number. */
- mft_no = (((s64)page->index << PAGE_SHIFT) + ofs)
- >> rec_size_bits;
- if (mft_no < vol->mftmirr_size)
- ntfs_sync_mft_mirror(vol, mft_no,
- (MFT_RECORD*)(kaddr + ofs),
- sync);
- }
- if (!sync)
- goto do_wait;
- }
- /* Remove the mst protection fixups again. */
- for (i = 0; i < nr_bhs; i++) {
- if (!(i % bhs_per_rec)) {
- tbh = bhs[i];
- if (!tbh)
- continue;
- post_write_mst_fixup((NTFS_RECORD*)(kaddr +
- bh_offset(tbh)));
- }
- }
- flush_dcache_page(page);
-unm_done:
- /* Unlock any locked inodes. */
- while (nr_locked_nis-- > 0) {
- ntfs_inode *tni, *base_tni;
-
- tni = locked_nis[nr_locked_nis];
- /* Get the base inode. */
- mutex_lock(&tni->extent_lock);
- if (tni->nr_extents >= 0)
- base_tni = tni;
- else {
- base_tni = tni->ext.base_ntfs_ino;
- BUG_ON(!base_tni);
- }
- mutex_unlock(&tni->extent_lock);
- ntfs_debug("Unlocking %s inode 0x%lx.",
- tni == base_tni ? "base" : "extent",
- tni->mft_no);
- mutex_unlock(&tni->mrec_lock);
- atomic_dec(&tni->count);
- iput(VFS_I(base_tni));
- }
- SetPageUptodate(page);
- kunmap(page);
-done:
- if (unlikely(err && err != -ENOMEM)) {
- /*
- * Set page error if there is only one ntfs record in the page.
- * Otherwise we would loose per-record granularity.
- */
- if (ni->itype.index.block_size == PAGE_SIZE)
- SetPageError(page);
- NVolSetErrors(vol);
- }
- if (page_is_dirty) {
- ntfs_debug("Page still contains one or more dirty ntfs "
- "records. Redirtying the page starting at "
- "record 0x%lx.", page->index <<
- (PAGE_SHIFT - rec_size_bits));
- redirty_page_for_writepage(wbc, page);
- unlock_page(page);
- } else {
- /*
- * Keep the VM happy. This must be done otherwise the
- * radix-tree tag PAGECACHE_TAG_DIRTY remains set even though
- * the page is clean.
- */
- BUG_ON(PageWriteback(page));
- set_page_writeback(page);
- unlock_page(page);
- end_page_writeback(page);
- }
- if (likely(!err))
- ntfs_debug("Done.");
- return err;
-}
-
-/**
- * ntfs_writepage - write a @page to the backing store
- * @page: page cache page to write out
- * @wbc: writeback control structure
- *
- * This is called from the VM when it wants to have a dirty ntfs page cache
- * page cleaned. The VM has already locked the page and marked it clean.
- *
- * For non-resident attributes, ntfs_writepage() writes the @page by calling
- * the ntfs version of the generic block_write_full_folio() function,
- * ntfs_write_block(), which in turn if necessary creates and writes the
- * buffers associated with the page asynchronously.
- *
- * For resident attributes, OTOH, ntfs_writepage() writes the @page by copying
- * the data to the mft record (which at this stage is most likely in memory).
- * The mft record is then marked dirty and written out asynchronously via the
- * vfs inode dirty code path for the inode the mft record belongs to or via the
- * vm page dirty code path for the page the mft record is in.
- *
- * Based on ntfs_read_folio() and fs/buffer.c::block_write_full_folio().
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_writepage(struct page *page, struct writeback_control *wbc)
-{
- struct folio *folio = page_folio(page);
- loff_t i_size;
- struct inode *vi = folio->mapping->host;
- ntfs_inode *base_ni = NULL, *ni = NTFS_I(vi);
- char *addr;
- ntfs_attr_search_ctx *ctx = NULL;
- MFT_RECORD *m = NULL;
- u32 attr_len;
- int err;
-
-retry_writepage:
- BUG_ON(!folio_test_locked(folio));
- i_size = i_size_read(vi);
- /* Is the folio fully outside i_size? (truncate in progress) */
- if (unlikely(folio->index >= (i_size + PAGE_SIZE - 1) >>
- PAGE_SHIFT)) {
- /*
- * The folio may have dirty, unmapped buffers. Make them
- * freeable here, so the page does not leak.
- */
- block_invalidate_folio(folio, 0, folio_size(folio));
- folio_unlock(folio);
- ntfs_debug("Write outside i_size - truncated?");
- return 0;
- }
- /*
- * Only $DATA attributes can be encrypted and only unnamed $DATA
- * attributes can be compressed. Index root can have the flags set but
- * this means to create compressed/encrypted files, not that the
- * attribute is compressed/encrypted. Note we need to check for
- * AT_INDEX_ALLOCATION since this is the type of both directory and
- * index inodes.
- */
- if (ni->type != AT_INDEX_ALLOCATION) {
- /* If file is encrypted, deny access, just like NT4. */
- if (NInoEncrypted(ni)) {
- folio_unlock(folio);
- BUG_ON(ni->type != AT_DATA);
- ntfs_debug("Denying write access to encrypted file.");
- return -EACCES;
- }
- /* Compressed data streams are handled in compress.c. */
- if (NInoNonResident(ni) && NInoCompressed(ni)) {
- BUG_ON(ni->type != AT_DATA);
- BUG_ON(ni->name_len);
- // TODO: Implement and replace this with
- // return ntfs_write_compressed_block(page);
- folio_unlock(folio);
- ntfs_error(vi->i_sb, "Writing to compressed files is "
- "not supported yet. Sorry.");
- return -EOPNOTSUPP;
- }
- // TODO: Implement and remove this check.
- if (NInoNonResident(ni) && NInoSparse(ni)) {
- folio_unlock(folio);
- ntfs_error(vi->i_sb, "Writing to sparse files is not "
- "supported yet. Sorry.");
- return -EOPNOTSUPP;
- }
- }
- /* NInoNonResident() == NInoIndexAllocPresent() */
- if (NInoNonResident(ni)) {
- /* We have to zero every time due to mmap-at-end-of-file. */
- if (folio->index >= (i_size >> PAGE_SHIFT)) {
- /* The folio straddles i_size. */
- unsigned int ofs = i_size & (folio_size(folio) - 1);
- folio_zero_segment(folio, ofs, folio_size(folio));
- }
- /* Handle mst protected attributes. */
- if (NInoMstProtected(ni))
- return ntfs_write_mst_block(page, wbc);
- /* Normal, non-resident data stream. */
- return ntfs_write_block(folio, wbc);
- }
- /*
- * Attribute is resident, implying it is not compressed, encrypted, or
- * mst protected. This also means the attribute is smaller than an mft
- * record and hence smaller than a folio, so can simply return error on
- * any folios with index above 0. Note the attribute can actually be
- * marked compressed but if it is resident the actual data is not
- * compressed so we are ok to ignore the compressed flag here.
- */
- BUG_ON(folio_buffers(folio));
- BUG_ON(!folio_test_uptodate(folio));
- if (unlikely(folio->index > 0)) {
- ntfs_error(vi->i_sb, "BUG()! folio->index (0x%lx) > 0. "
- "Aborting write.", folio->index);
- BUG_ON(folio_test_writeback(folio));
- folio_start_writeback(folio);
- folio_unlock(folio);
- folio_end_writeback(folio);
- return -EIO;
- }
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- /* Map, pin, and lock the mft record. */
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- ctx = NULL;
- goto err_out;
- }
- /*
- * If a parallel write made the attribute non-resident, drop the mft
- * record and retry the writepage.
- */
- if (unlikely(NInoNonResident(ni))) {
- unmap_mft_record(base_ni);
- goto retry_writepage;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err))
- goto err_out;
- /*
- * Keep the VM happy. This must be done otherwise
- * PAGECACHE_TAG_DIRTY remains set even though the folio is clean.
- */
- BUG_ON(folio_test_writeback(folio));
- folio_start_writeback(folio);
- folio_unlock(folio);
- attr_len = le32_to_cpu(ctx->attr->data.resident.value_length);
- i_size = i_size_read(vi);
- if (unlikely(attr_len > i_size)) {
- /* Race with shrinking truncate or a failed truncate. */
- attr_len = i_size;
- /*
- * If the truncate failed, fix it up now. If a concurrent
- * truncate, we do its job, so it does not have to do anything.
- */
- err = ntfs_resident_attr_value_resize(ctx->mrec, ctx->attr,
- attr_len);
- /* Shrinking cannot fail. */
- BUG_ON(err);
- }
- addr = kmap_local_folio(folio, 0);
- /* Copy the data from the folio to the mft record. */
- memcpy((u8*)ctx->attr +
- le16_to_cpu(ctx->attr->data.resident.value_offset),
- addr, attr_len);
- /* Zero out of bounds area in the page cache folio. */
- memset(addr + attr_len, 0, folio_size(folio) - attr_len);
- kunmap_local(addr);
- flush_dcache_folio(folio);
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- /* We are done with the folio. */
- folio_end_writeback(folio);
- /* Finally, mark the mft record dirty, so it gets written back. */
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- return 0;
-err_out:
- if (err == -ENOMEM) {
- ntfs_warning(vi->i_sb, "Error allocating memory. Redirtying "
- "page so we try again later.");
- /*
- * Put the folio back on mapping->dirty_pages, but leave its
- * buffers' dirty state as-is.
- */
- folio_redirty_for_writepage(wbc, folio);
- err = 0;
- } else {
- ntfs_error(vi->i_sb, "Resident attribute write failed with "
- "error %i.", err);
- folio_set_error(folio);
- NVolSetErrors(ni->vol);
- }
- folio_unlock(folio);
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(base_ni);
- return err;
-}
-
-#endif /* NTFS_RW */
-
-/**
- * ntfs_bmap - map logical file block to physical device block
- * @mapping: address space mapping to which the block to be mapped belongs
- * @block: logical block to map to its physical device block
- *
- * For regular, non-resident files (i.e. not compressed and not encrypted), map
- * the logical @block belonging to the file described by the address space
- * mapping @mapping to its physical device block.
- *
- * The size of the block is equal to the @s_blocksize field of the super block
- * of the mounted file system which is guaranteed to be smaller than or equal
- * to the cluster size thus the block is guaranteed to fit entirely inside the
- * cluster which means we do not need to care how many contiguous bytes are
- * available after the beginning of the block.
- *
- * Return the physical device block if the mapping succeeded or 0 if the block
- * is sparse or there was an error.
- *
- * Note: This is a problem if someone tries to run bmap() on $Boot system file
- * as that really is in block zero but there is nothing we can do. bmap() is
- * just broken in that respect (just like it cannot distinguish sparse from
- * not available or error).
- */
-static sector_t ntfs_bmap(struct address_space *mapping, sector_t block)
-{
- s64 ofs, size;
- loff_t i_size;
- LCN lcn;
- unsigned long blocksize, flags;
- ntfs_inode *ni = NTFS_I(mapping->host);
- ntfs_volume *vol = ni->vol;
- unsigned delta;
- unsigned char blocksize_bits, cluster_size_shift;
-
- ntfs_debug("Entering for mft_no 0x%lx, logical block 0x%llx.",
- ni->mft_no, (unsigned long long)block);
- if (ni->type != AT_DATA || !NInoNonResident(ni) || NInoEncrypted(ni)) {
- ntfs_error(vol->sb, "BMAP does not make sense for %s "
- "attributes, returning 0.",
- (ni->type != AT_DATA) ? "non-data" :
- (!NInoNonResident(ni) ? "resident" :
- "encrypted"));
- return 0;
- }
- /* None of these can happen. */
- BUG_ON(NInoCompressed(ni));
- BUG_ON(NInoMstProtected(ni));
- blocksize = vol->sb->s_blocksize;
- blocksize_bits = vol->sb->s_blocksize_bits;
- ofs = (s64)block << blocksize_bits;
- read_lock_irqsave(&ni->size_lock, flags);
- size = ni->initialized_size;
- i_size = i_size_read(VFS_I(ni));
- read_unlock_irqrestore(&ni->size_lock, flags);
- /*
- * If the offset is outside the initialized size or the block straddles
- * the initialized size then pretend it is a hole unless the
- * initialized size equals the file size.
- */
- if (unlikely(ofs >= size || (ofs + blocksize > size && size < i_size)))
- goto hole;
- cluster_size_shift = vol->cluster_size_bits;
- down_read(&ni->runlist.lock);
- lcn = ntfs_attr_vcn_to_lcn_nolock(ni, ofs >> cluster_size_shift, false);
- up_read(&ni->runlist.lock);
- if (unlikely(lcn < LCN_HOLE)) {
- /*
- * Step down to an integer to avoid gcc doing a long long
- * comparision in the switch when we know @lcn is between
- * LCN_HOLE and LCN_EIO (i.e. -1 to -5).
- *
- * Otherwise older gcc (at least on some architectures) will
- * try to use __cmpdi2() which is of course not available in
- * the kernel.
- */
- switch ((int)lcn) {
- case LCN_ENOENT:
- /*
- * If the offset is out of bounds then pretend it is a
- * hole.
- */
- goto hole;
- case LCN_ENOMEM:
- ntfs_error(vol->sb, "Not enough memory to complete "
- "mapping for inode 0x%lx. "
- "Returning 0.", ni->mft_no);
- break;
- default:
- ntfs_error(vol->sb, "Failed to complete mapping for "
- "inode 0x%lx. Run chkdsk. "
- "Returning 0.", ni->mft_no);
- break;
- }
- return 0;
- }
- if (lcn < 0) {
- /* It is a hole. */
-hole:
- ntfs_debug("Done (returning hole).");
- return 0;
- }
- /*
- * The block is really allocated and fullfils all our criteria.
- * Convert the cluster to units of block size and return the result.
- */
- delta = ofs & vol->cluster_size_mask;
- if (unlikely(sizeof(block) < sizeof(lcn))) {
- block = lcn = ((lcn << cluster_size_shift) + delta) >>
- blocksize_bits;
- /* If the block number was truncated return 0. */
- if (unlikely(block != lcn)) {
- ntfs_error(vol->sb, "Physical block 0x%llx is too "
- "large to be returned, returning 0.",
- (long long)lcn);
- return 0;
- }
- } else
- block = ((lcn << cluster_size_shift) + delta) >>
- blocksize_bits;
- ntfs_debug("Done (returning block 0x%llx).", (unsigned long long)lcn);
- return block;
-}
-
-/*
- * ntfs_normal_aops - address space operations for normal inodes and attributes
- *
- * Note these are not used for compressed or mst protected inodes and
- * attributes.
- */
-const struct address_space_operations ntfs_normal_aops = {
- .read_folio = ntfs_read_folio,
-#ifdef NTFS_RW
- .writepage = ntfs_writepage,
- .dirty_folio = block_dirty_folio,
-#endif /* NTFS_RW */
- .bmap = ntfs_bmap,
- .migrate_folio = buffer_migrate_folio,
- .is_partially_uptodate = block_is_partially_uptodate,
- .error_remove_folio = generic_error_remove_folio,
-};
-
-/*
- * ntfs_compressed_aops - address space operations for compressed inodes
- */
-const struct address_space_operations ntfs_compressed_aops = {
- .read_folio = ntfs_read_folio,
-#ifdef NTFS_RW
- .writepage = ntfs_writepage,
- .dirty_folio = block_dirty_folio,
-#endif /* NTFS_RW */
- .migrate_folio = buffer_migrate_folio,
- .is_partially_uptodate = block_is_partially_uptodate,
- .error_remove_folio = generic_error_remove_folio,
-};
-
-/*
- * ntfs_mst_aops - general address space operations for mst protecteed inodes
- * and attributes
- */
-const struct address_space_operations ntfs_mst_aops = {
- .read_folio = ntfs_read_folio, /* Fill page with data. */
-#ifdef NTFS_RW
- .writepage = ntfs_writepage, /* Write dirty page to disk. */
- .dirty_folio = filemap_dirty_folio,
-#endif /* NTFS_RW */
- .migrate_folio = buffer_migrate_folio,
- .is_partially_uptodate = block_is_partially_uptodate,
- .error_remove_folio = generic_error_remove_folio,
-};
-
-#ifdef NTFS_RW
-
-/**
- * mark_ntfs_record_dirty - mark an ntfs record dirty
- * @page: page containing the ntfs record to mark dirty
- * @ofs: byte offset within @page at which the ntfs record begins
- *
- * Set the buffers and the page in which the ntfs record is located dirty.
- *
- * The latter also marks the vfs inode the ntfs record belongs to dirty
- * (I_DIRTY_PAGES only).
- *
- * If the page does not have buffers, we create them and set them uptodate.
- * The page may not be locked which is why we need to handle the buffers under
- * the mapping->i_private_lock. Once the buffers are marked dirty we no longer
- * need the lock since try_to_free_buffers() does not free dirty buffers.
- */
-void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs) {
- struct address_space *mapping = page->mapping;
- ntfs_inode *ni = NTFS_I(mapping->host);
- struct buffer_head *bh, *head, *buffers_to_free = NULL;
- unsigned int end, bh_size, bh_ofs;
-
- BUG_ON(!PageUptodate(page));
- end = ofs + ni->itype.index.block_size;
- bh_size = VFS_I(ni)->i_sb->s_blocksize;
- spin_lock(&mapping->i_private_lock);
- if (unlikely(!page_has_buffers(page))) {
- spin_unlock(&mapping->i_private_lock);
- bh = head = alloc_page_buffers(page, bh_size, true);
- spin_lock(&mapping->i_private_lock);
- if (likely(!page_has_buffers(page))) {
- struct buffer_head *tail;
-
- do {
- set_buffer_uptodate(bh);
- tail = bh;
- bh = bh->b_this_page;
- } while (bh);
- tail->b_this_page = head;
- attach_page_private(page, head);
- } else
- buffers_to_free = bh;
- }
- bh = head = page_buffers(page);
- BUG_ON(!bh);
- do {
- bh_ofs = bh_offset(bh);
- if (bh_ofs + bh_size <= ofs)
- continue;
- if (unlikely(bh_ofs >= end))
- break;
- set_buffer_dirty(bh);
- } while ((bh = bh->b_this_page) != head);
- spin_unlock(&mapping->i_private_lock);
- filemap_dirty_folio(mapping, page_folio(page));
- if (unlikely(buffers_to_free)) {
- do {
- bh = buffers_to_free->b_this_page;
- free_buffer_head(buffers_to_free);
- buffers_to_free = bh;
- } while (buffers_to_free);
- }
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/aops.h b/fs/ntfs/aops.h
deleted file mode 100644
index 8d0958a..0000000
--- a/fs/ntfs/aops.h
+++ /dev/null
@@ -1,88 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * aops.h - Defines for NTFS kernel address space operations and page cache
- * handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_AOPS_H
-#define _LINUX_NTFS_AOPS_H
-
-#include <linux/mm.h>
-#include <linux/highmem.h>
-#include <linux/pagemap.h>
-#include <linux/fs.h>
-
-#include "inode.h"
-
-/**
- * ntfs_unmap_page - release a page that was mapped using ntfs_map_page()
- * @page: the page to release
- *
- * Unpin, unmap and release a page that was obtained from ntfs_map_page().
- */
-static inline void ntfs_unmap_page(struct page *page)
-{
- kunmap(page);
- put_page(page);
-}
-
-/**
- * ntfs_map_page - map a page into accessible memory, reading it if necessary
- * @mapping: address space for which to obtain the page
- * @index: index into the page cache for @mapping of the page to map
- *
- * Read a page from the page cache of the address space @mapping at position
- * @index, where @index is in units of PAGE_SIZE, and not in bytes.
- *
- * If the page is not in memory it is loaded from disk first using the
- * read_folio method defined in the address space operations of @mapping
- * and the page is added to the page cache of @mapping in the process.
- *
- * If the page belongs to an mst protected attribute and it is marked as such
- * in its ntfs inode (NInoMstProtected()) the mst fixups are applied but no
- * error checking is performed. This means the caller has to verify whether
- * the ntfs record(s) contained in the page are valid or not using one of the
- * ntfs_is_XXXX_record{,p}() macros, where XXXX is the record type you are
- * expecting to see. (For details of the macros, see fs/ntfs/layout.h.)
- *
- * If the page is in high memory it is mapped into memory directly addressible
- * by the kernel.
- *
- * Finally the page count is incremented, thus pinning the page into place.
- *
- * The above means that page_address(page) can be used on all pages obtained
- * with ntfs_map_page() to get the kernel virtual address of the page.
- *
- * When finished with the page, the caller has to call ntfs_unmap_page() to
- * unpin, unmap and release the page.
- *
- * Note this does not grant exclusive access. If such is desired, the caller
- * must provide it independently of the ntfs_{un}map_page() calls by using
- * a {rw_}semaphore or other means of serialization. A spin lock cannot be
- * used as ntfs_map_page() can block.
- *
- * The unlocked and uptodate page is returned on success or an encoded error
- * on failure. Caller has to test for error using the IS_ERR() macro on the
- * return value. If that evaluates to 'true', the negative error code can be
- * obtained using PTR_ERR() on the return value of ntfs_map_page().
- */
-static inline struct page *ntfs_map_page(struct address_space *mapping,
- unsigned long index)
-{
- struct page *page = read_mapping_page(mapping, index, NULL);
-
- if (!IS_ERR(page))
- kmap(page);
- return page;
-}
-
-#ifdef NTFS_RW
-
-extern void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_AOPS_H */
diff --git a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
deleted file mode 100644
index f79408f..0000000
--- a/fs/ntfs/attrib.c
+++ /dev/null
@@ -1,2624 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * attrib.c - NTFS attribute operations. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/buffer_head.h>
-#include <linux/sched.h>
-#include <linux/slab.h>
-#include <linux/swap.h>
-#include <linux/writeback.h>
-
-#include "attrib.h"
-#include "debug.h"
-#include "layout.h"
-#include "lcnalloc.h"
-#include "malloc.h"
-#include "mft.h"
-#include "ntfs.h"
-#include "types.h"
-
-/**
- * ntfs_map_runlist_nolock - map (a part of) a runlist of an ntfs inode
- * @ni: ntfs inode for which to map (part of) a runlist
- * @vcn: map runlist part containing this vcn
- * @ctx: active attribute search context if present or NULL if not
- *
- * Map the part of a runlist containing the @vcn of the ntfs inode @ni.
- *
- * If @ctx is specified, it is an active search context of @ni and its base mft
- * record. This is needed when ntfs_map_runlist_nolock() encounters unmapped
- * runlist fragments and allows their mapping. If you do not have the mft
- * record mapped, you can specify @ctx as NULL and ntfs_map_runlist_nolock()
- * will perform the necessary mapping and unmapping.
- *
- * Note, ntfs_map_runlist_nolock() saves the state of @ctx on entry and
- * restores it before returning. Thus, @ctx will be left pointing to the same
- * attribute on return as on entry. However, the actual pointers in @ctx may
- * point to different memory locations on return, so you must remember to reset
- * any cached pointers from the @ctx, i.e. after the call to
- * ntfs_map_runlist_nolock(), you will probably want to do:
- * m = ctx->mrec;
- * a = ctx->attr;
- * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
- * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
- *
- * Return 0 on success and -errno on error. There is one special error code
- * which is not an error as such. This is -ENOENT. It means that @vcn is out
- * of bounds of the runlist.
- *
- * Note the runlist can be NULL after this function returns if @vcn is zero and
- * the attribute has zero allocated size, i.e. there simply is no runlist.
- *
- * WARNING: If @ctx is supplied, regardless of whether success or failure is
- * returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
- * is no longer valid, i.e. you need to either call
- * ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
- * In that case PTR_ERR(@ctx->mrec) will give you the error code for
- * why the mapping of the old inode failed.
- *
- * Locking: - The runlist described by @ni must be locked for writing on entry
- * and is locked on return. Note the runlist will be modified.
- * - If @ctx is NULL, the base mft record of @ni must not be mapped on
- * entry and it will be left unmapped on return.
- * - If @ctx is not NULL, the base mft record must be mapped on entry
- * and it will be left mapped on return.
- */
-int ntfs_map_runlist_nolock(ntfs_inode *ni, VCN vcn, ntfs_attr_search_ctx *ctx)
-{
- VCN end_vcn;
- unsigned long flags;
- ntfs_inode *base_ni;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- runlist_element *rl;
- struct page *put_this_page = NULL;
- int err = 0;
- bool ctx_is_temporary, ctx_needs_reset;
- ntfs_attr_search_ctx old_ctx = { NULL, };
-
- ntfs_debug("Mapping runlist part containing vcn 0x%llx.",
- (unsigned long long)vcn);
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- if (!ctx) {
- ctx_is_temporary = ctx_needs_reset = true;
- m = map_mft_record(base_ni);
- if (IS_ERR(m))
- return PTR_ERR(m);
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- } else {
- VCN allocated_size_vcn;
-
- BUG_ON(IS_ERR(ctx->mrec));
- a = ctx->attr;
- BUG_ON(!a->non_resident);
- ctx_is_temporary = false;
- end_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn);
- read_lock_irqsave(&ni->size_lock, flags);
- allocated_size_vcn = ni->allocated_size >>
- ni->vol->cluster_size_bits;
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (!a->data.non_resident.lowest_vcn && end_vcn <= 0)
- end_vcn = allocated_size_vcn - 1;
- /*
- * If we already have the attribute extent containing @vcn in
- * @ctx, no need to look it up again. We slightly cheat in
- * that if vcn exceeds the allocated size, we will refuse to
- * map the runlist below, so there is definitely no need to get
- * the right attribute extent.
- */
- if (vcn >= allocated_size_vcn || (a->type == ni->type &&
- a->name_length == ni->name_len &&
- !memcmp((u8*)a + le16_to_cpu(a->name_offset),
- ni->name, ni->name_len) &&
- sle64_to_cpu(a->data.non_resident.lowest_vcn)
- <= vcn && end_vcn >= vcn))
- ctx_needs_reset = false;
- else {
- /* Save the old search context. */
- old_ctx = *ctx;
- /*
- * If the currently mapped (extent) inode is not the
- * base inode we will unmap it when we reinitialize the
- * search context which means we need to get a
- * reference to the page containing the mapped mft
- * record so we do not accidentally drop changes to the
- * mft record when it has not been marked dirty yet.
- */
- if (old_ctx.base_ntfs_ino && old_ctx.ntfs_ino !=
- old_ctx.base_ntfs_ino) {
- put_this_page = old_ctx.ntfs_ino->page;
- get_page(put_this_page);
- }
- /*
- * Reinitialize the search context so we can lookup the
- * needed attribute extent.
- */
- ntfs_attr_reinit_search_ctx(ctx);
- ctx_needs_reset = true;
- }
- }
- if (ctx_needs_reset) {
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, vcn, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- err = -EIO;
- goto err_out;
- }
- BUG_ON(!ctx->attr->non_resident);
- }
- a = ctx->attr;
- /*
- * Only decompress the mapping pairs if @vcn is inside it. Otherwise
- * we get into problems when we try to map an out of bounds vcn because
- * we then try to map the already mapped runlist fragment and
- * ntfs_mapping_pairs_decompress() fails.
- */
- end_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn) + 1;
- if (unlikely(vcn && vcn >= end_vcn)) {
- err = -ENOENT;
- goto err_out;
- }
- rl = ntfs_mapping_pairs_decompress(ni->vol, a, ni->runlist.rl);
- if (IS_ERR(rl))
- err = PTR_ERR(rl);
- else
- ni->runlist.rl = rl;
-err_out:
- if (ctx_is_temporary) {
- if (likely(ctx))
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- } else if (ctx_needs_reset) {
- /*
- * If there is no attribute list, restoring the search context
- * is accomplished simply by copying the saved context back over
- * the caller supplied context. If there is an attribute list,
- * things are more complicated as we need to deal with mapping
- * of mft records and resulting potential changes in pointers.
- */
- if (NInoAttrList(base_ni)) {
- /*
- * If the currently mapped (extent) inode is not the
- * one we had before, we need to unmap it and map the
- * old one.
- */
- if (ctx->ntfs_ino != old_ctx.ntfs_ino) {
- /*
- * If the currently mapped inode is not the
- * base inode, unmap it.
- */
- if (ctx->base_ntfs_ino && ctx->ntfs_ino !=
- ctx->base_ntfs_ino) {
- unmap_extent_mft_record(ctx->ntfs_ino);
- ctx->mrec = ctx->base_mrec;
- BUG_ON(!ctx->mrec);
- }
- /*
- * If the old mapped inode is not the base
- * inode, map it.
- */
- if (old_ctx.base_ntfs_ino &&
- old_ctx.ntfs_ino !=
- old_ctx.base_ntfs_ino) {
-retry_map:
- ctx->mrec = map_mft_record(
- old_ctx.ntfs_ino);
- /*
- * Something bad has happened. If out
- * of memory retry till it succeeds.
- * Any other errors are fatal and we
- * return the error code in ctx->mrec.
- * Let the caller deal with it... We
- * just need to fudge things so the
- * caller can reinit and/or put the
- * search context safely.
- */
- if (IS_ERR(ctx->mrec)) {
- if (PTR_ERR(ctx->mrec) ==
- -ENOMEM) {
- schedule();
- goto retry_map;
- } else
- old_ctx.ntfs_ino =
- old_ctx.
- base_ntfs_ino;
- }
- }
- }
- /* Update the changed pointers in the saved context. */
- if (ctx->mrec != old_ctx.mrec) {
- if (!IS_ERR(ctx->mrec))
- old_ctx.attr = (ATTR_RECORD*)(
- (u8*)ctx->mrec +
- ((u8*)old_ctx.attr -
- (u8*)old_ctx.mrec));
- old_ctx.mrec = ctx->mrec;
- }
- }
- /* Restore the search context to the saved one. */
- *ctx = old_ctx;
- /*
- * We drop the reference on the page we took earlier. In the
- * case that IS_ERR(ctx->mrec) is true this means we might lose
- * some changes to the mft record that had been made between
- * the last time it was marked dirty/written out and now. This
- * at this stage is not a problem as the mapping error is fatal
- * enough that the mft record cannot be written out anyway and
- * the caller is very likely to shutdown the whole inode
- * immediately and mark the volume dirty for chkdsk to pick up
- * the pieces anyway.
- */
- if (put_this_page)
- put_page(put_this_page);
- }
- return err;
-}
-
-/**
- * ntfs_map_runlist - map (a part of) a runlist of an ntfs inode
- * @ni: ntfs inode for which to map (part of) a runlist
- * @vcn: map runlist part containing this vcn
- *
- * Map the part of a runlist containing the @vcn of the ntfs inode @ni.
- *
- * Return 0 on success and -errno on error. There is one special error code
- * which is not an error as such. This is -ENOENT. It means that @vcn is out
- * of bounds of the runlist.
- *
- * Locking: - The runlist must be unlocked on entry and is unlocked on return.
- * - This function takes the runlist lock for writing and may modify
- * the runlist.
- */
-int ntfs_map_runlist(ntfs_inode *ni, VCN vcn)
-{
- int err = 0;
-
- down_write(&ni->runlist.lock);
- /* Make sure someone else didn't do the work while we were sleeping. */
- if (likely(ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn) <=
- LCN_RL_NOT_MAPPED))
- err = ntfs_map_runlist_nolock(ni, vcn, NULL);
- up_write(&ni->runlist.lock);
- return err;
-}
-
-/**
- * ntfs_attr_vcn_to_lcn_nolock - convert a vcn into a lcn given an ntfs inode
- * @ni: ntfs inode of the attribute whose runlist to search
- * @vcn: vcn to convert
- * @write_locked: true if the runlist is locked for writing
- *
- * Find the virtual cluster number @vcn in the runlist of the ntfs attribute
- * described by the ntfs inode @ni and return the corresponding logical cluster
- * number (lcn).
- *
- * If the @vcn is not mapped yet, the attempt is made to map the attribute
- * extent containing the @vcn and the vcn to lcn conversion is retried.
- *
- * If @write_locked is true the caller has locked the runlist for writing and
- * if false for reading.
- *
- * Since lcns must be >= 0, we use negative return codes with special meaning:
- *
- * Return code Meaning / Description
- * ==========================================
- * LCN_HOLE Hole / not allocated on disk.
- * LCN_ENOENT There is no such vcn in the runlist, i.e. @vcn is out of bounds.
- * LCN_ENOMEM Not enough memory to map runlist.
- * LCN_EIO Critical error (runlist/file is corrupt, i/o error, etc).
- *
- * Locking: - The runlist must be locked on entry and is left locked on return.
- * - If @write_locked is 'false', i.e. the runlist is locked for reading,
- * the lock may be dropped inside the function so you cannot rely on
- * the runlist still being the same when this function returns.
- */
-LCN ntfs_attr_vcn_to_lcn_nolock(ntfs_inode *ni, const VCN vcn,
- const bool write_locked)
-{
- LCN lcn;
- unsigned long flags;
- bool is_retry = false;
-
- BUG_ON(!ni);
- ntfs_debug("Entering for i_ino 0x%lx, vcn 0x%llx, %s_locked.",
- ni->mft_no, (unsigned long long)vcn,
- write_locked ? "write" : "read");
- BUG_ON(!NInoNonResident(ni));
- BUG_ON(vcn < 0);
- if (!ni->runlist.rl) {
- read_lock_irqsave(&ni->size_lock, flags);
- if (!ni->allocated_size) {
- read_unlock_irqrestore(&ni->size_lock, flags);
- return LCN_ENOENT;
- }
- read_unlock_irqrestore(&ni->size_lock, flags);
- }
-retry_remap:
- /* Convert vcn to lcn. If that fails map the runlist and retry once. */
- lcn = ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn);
- if (likely(lcn >= LCN_HOLE)) {
- ntfs_debug("Done, lcn 0x%llx.", (long long)lcn);
- return lcn;
- }
- if (lcn != LCN_RL_NOT_MAPPED) {
- if (lcn != LCN_ENOENT)
- lcn = LCN_EIO;
- } else if (!is_retry) {
- int err;
-
- if (!write_locked) {
- up_read(&ni->runlist.lock);
- down_write(&ni->runlist.lock);
- if (unlikely(ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn) !=
- LCN_RL_NOT_MAPPED)) {
- up_write(&ni->runlist.lock);
- down_read(&ni->runlist.lock);
- goto retry_remap;
- }
- }
- err = ntfs_map_runlist_nolock(ni, vcn, NULL);
- if (!write_locked) {
- up_write(&ni->runlist.lock);
- down_read(&ni->runlist.lock);
- }
- if (likely(!err)) {
- is_retry = true;
- goto retry_remap;
- }
- if (err == -ENOENT)
- lcn = LCN_ENOENT;
- else if (err == -ENOMEM)
- lcn = LCN_ENOMEM;
- else
- lcn = LCN_EIO;
- }
- if (lcn != LCN_ENOENT)
- ntfs_error(ni->vol->sb, "Failed with error code %lli.",
- (long long)lcn);
- return lcn;
-}
-
-/**
- * ntfs_attr_find_vcn_nolock - find a vcn in the runlist of an ntfs inode
- * @ni: ntfs inode describing the runlist to search
- * @vcn: vcn to find
- * @ctx: active attribute search context if present or NULL if not
- *
- * Find the virtual cluster number @vcn in the runlist described by the ntfs
- * inode @ni and return the address of the runlist element containing the @vcn.
- *
- * If the @vcn is not mapped yet, the attempt is made to map the attribute
- * extent containing the @vcn and the vcn to lcn conversion is retried.
- *
- * If @ctx is specified, it is an active search context of @ni and its base mft
- * record. This is needed when ntfs_attr_find_vcn_nolock() encounters unmapped
- * runlist fragments and allows their mapping. If you do not have the mft
- * record mapped, you can specify @ctx as NULL and ntfs_attr_find_vcn_nolock()
- * will perform the necessary mapping and unmapping.
- *
- * Note, ntfs_attr_find_vcn_nolock() saves the state of @ctx on entry and
- * restores it before returning. Thus, @ctx will be left pointing to the same
- * attribute on return as on entry. However, the actual pointers in @ctx may
- * point to different memory locations on return, so you must remember to reset
- * any cached pointers from the @ctx, i.e. after the call to
- * ntfs_attr_find_vcn_nolock(), you will probably want to do:
- * m = ctx->mrec;
- * a = ctx->attr;
- * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
- * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
- * Note you need to distinguish between the lcn of the returned runlist element
- * being >= 0 and LCN_HOLE. In the later case you have to return zeroes on
- * read and allocate clusters on write.
- *
- * Return the runlist element containing the @vcn on success and
- * ERR_PTR(-errno) on error. You need to test the return value with IS_ERR()
- * to decide if the return is success or failure and PTR_ERR() to get to the
- * error code if IS_ERR() is true.
- *
- * The possible error return codes are:
- * -ENOENT - No such vcn in the runlist, i.e. @vcn is out of bounds.
- * -ENOMEM - Not enough memory to map runlist.
- * -EIO - Critical error (runlist/file is corrupt, i/o error, etc).
- *
- * WARNING: If @ctx is supplied, regardless of whether success or failure is
- * returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
- * is no longer valid, i.e. you need to either call
- * ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
- * In that case PTR_ERR(@ctx->mrec) will give you the error code for
- * why the mapping of the old inode failed.
- *
- * Locking: - The runlist described by @ni must be locked for writing on entry
- * and is locked on return. Note the runlist may be modified when
- * needed runlist fragments need to be mapped.
- * - If @ctx is NULL, the base mft record of @ni must not be mapped on
- * entry and it will be left unmapped on return.
- * - If @ctx is not NULL, the base mft record must be mapped on entry
- * and it will be left mapped on return.
- */
-runlist_element *ntfs_attr_find_vcn_nolock(ntfs_inode *ni, const VCN vcn,
- ntfs_attr_search_ctx *ctx)
-{
- unsigned long flags;
- runlist_element *rl;
- int err = 0;
- bool is_retry = false;
-
- BUG_ON(!ni);
- ntfs_debug("Entering for i_ino 0x%lx, vcn 0x%llx, with%s ctx.",
- ni->mft_no, (unsigned long long)vcn, ctx ? "" : "out");
- BUG_ON(!NInoNonResident(ni));
- BUG_ON(vcn < 0);
- if (!ni->runlist.rl) {
- read_lock_irqsave(&ni->size_lock, flags);
- if (!ni->allocated_size) {
- read_unlock_irqrestore(&ni->size_lock, flags);
- return ERR_PTR(-ENOENT);
- }
- read_unlock_irqrestore(&ni->size_lock, flags);
- }
-retry_remap:
- rl = ni->runlist.rl;
- if (likely(rl && vcn >= rl[0].vcn)) {
- while (likely(rl->length)) {
- if (unlikely(vcn < rl[1].vcn)) {
- if (likely(rl->lcn >= LCN_HOLE)) {
- ntfs_debug("Done.");
- return rl;
- }
- break;
- }
- rl++;
- }
- if (likely(rl->lcn != LCN_RL_NOT_MAPPED)) {
- if (likely(rl->lcn == LCN_ENOENT))
- err = -ENOENT;
- else
- err = -EIO;
- }
- }
- if (!err && !is_retry) {
- /*
- * If the search context is invalid we cannot map the unmapped
- * region.
- */
- if (IS_ERR(ctx->mrec))
- err = PTR_ERR(ctx->mrec);
- else {
- /*
- * The @vcn is in an unmapped region, map the runlist
- * and retry.
- */
- err = ntfs_map_runlist_nolock(ni, vcn, ctx);
- if (likely(!err)) {
- is_retry = true;
- goto retry_remap;
- }
- }
- if (err == -EINVAL)
- err = -EIO;
- } else if (!err)
- err = -EIO;
- if (err != -ENOENT)
- ntfs_error(ni->vol->sb, "Failed with error code %i.", err);
- return ERR_PTR(err);
-}
-
-/**
- * ntfs_attr_find - find (next) attribute in mft record
- * @type: attribute type to find
- * @name: attribute name to find (optional, i.e. NULL means don't care)
- * @name_len: attribute name length (only needed if @name present)
- * @ic: IGNORE_CASE or CASE_SENSITIVE (ignored if @name not present)
- * @val: attribute value to find (optional, resident attributes only)
- * @val_len: attribute value length
- * @ctx: search context with mft record and attribute to search from
- *
- * You should not need to call this function directly. Use ntfs_attr_lookup()
- * instead.
- *
- * ntfs_attr_find() takes a search context @ctx as parameter and searches the
- * mft record specified by @ctx->mrec, beginning at @ctx->attr, for an
- * attribute of @type, optionally @name and @val.
- *
- * If the attribute is found, ntfs_attr_find() returns 0 and @ctx->attr will
- * point to the found attribute.
- *
- * If the attribute is not found, ntfs_attr_find() returns -ENOENT and
- * @ctx->attr will point to the attribute before which the attribute being
- * searched for would need to be inserted if such an action were to be desired.
- *
- * On actual error, ntfs_attr_find() returns -EIO. In this case @ctx->attr is
- * undefined and in particular do not rely on it not changing.
- *
- * If @ctx->is_first is 'true', the search begins with @ctx->attr itself. If it
- * is 'false', the search begins after @ctx->attr.
- *
- * If @ic is IGNORE_CASE, the @name comparisson is not case sensitive and
- * @ctx->ntfs_ino must be set to the ntfs inode to which the mft record
- * @ctx->mrec belongs. This is so we can get at the ntfs volume and hence at
- * the upcase table. If @ic is CASE_SENSITIVE, the comparison is case
- * sensitive. When @name is present, @name_len is the @name length in Unicode
- * characters.
- *
- * If @name is not present (NULL), we assume that the unnamed attribute is
- * being searched for.
- *
- * Finally, the resident attribute value @val is looked for, if present. If
- * @val is not present (NULL), @val_len is ignored.
- *
- * ntfs_attr_find() only searches the specified mft record and it ignores the
- * presence of an attribute list attribute (unless it is the one being searched
- * for, obviously). If you need to take attribute lists into consideration,
- * use ntfs_attr_lookup() instead (see below). This also means that you cannot
- * use ntfs_attr_find() to search for extent records of non-resident
- * attributes, as extents with lowest_vcn != 0 are usually described by the
- * attribute list attribute only. - Note that it is possible that the first
- * extent is only in the attribute list while the last extent is in the base
- * mft record, so do not rely on being able to find the first extent in the
- * base mft record.
- *
- * Warning: Never use @val when looking for attribute types which can be
- * non-resident as this most likely will result in a crash!
- */
-static int ntfs_attr_find(const ATTR_TYPE type, const ntfschar *name,
- const u32 name_len, const IGNORE_CASE_BOOL ic,
- const u8 *val, const u32 val_len, ntfs_attr_search_ctx *ctx)
-{
- ATTR_RECORD *a;
- ntfs_volume *vol = ctx->ntfs_ino->vol;
- ntfschar *upcase = vol->upcase;
- u32 upcase_len = vol->upcase_len;
-
- /*
- * Iterate over attributes in mft record starting at @ctx->attr, or the
- * attribute following that, if @ctx->is_first is 'true'.
- */
- if (ctx->is_first) {
- a = ctx->attr;
- ctx->is_first = false;
- } else
- a = (ATTR_RECORD*)((u8*)ctx->attr +
- le32_to_cpu(ctx->attr->length));
- for (;; a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))) {
- u8 *mrec_end = (u8 *)ctx->mrec +
- le32_to_cpu(ctx->mrec->bytes_allocated);
- u8 *name_end;
-
- /* check whether ATTR_RECORD wrap */
- if ((u8 *)a < (u8 *)ctx->mrec)
- break;
-
- /* check whether Attribute Record Header is within bounds */
- if ((u8 *)a > mrec_end ||
- (u8 *)a + sizeof(ATTR_RECORD) > mrec_end)
- break;
-
- /* check whether ATTR_RECORD's name is within bounds */
- name_end = (u8 *)a + le16_to_cpu(a->name_offset) +
- a->name_length * sizeof(ntfschar);
- if (name_end > mrec_end)
- break;
-
- ctx->attr = a;
- if (unlikely(le32_to_cpu(a->type) > le32_to_cpu(type) ||
- a->type == AT_END))
- return -ENOENT;
- if (unlikely(!a->length))
- break;
-
- /* check whether ATTR_RECORD's length wrap */
- if ((u8 *)a + le32_to_cpu(a->length) < (u8 *)a)
- break;
- /* check whether ATTR_RECORD's length is within bounds */
- if ((u8 *)a + le32_to_cpu(a->length) > mrec_end)
- break;
-
- if (a->type != type)
- continue;
- /*
- * If @name is present, compare the two names. If @name is
- * missing, assume we want an unnamed attribute.
- */
- if (!name) {
- /* The search failed if the found attribute is named. */
- if (a->name_length)
- return -ENOENT;
- } else if (!ntfs_are_names_equal(name, name_len,
- (ntfschar*)((u8*)a + le16_to_cpu(a->name_offset)),
- a->name_length, ic, upcase, upcase_len)) {
- register int rc;
-
- rc = ntfs_collate_names(name, name_len,
- (ntfschar*)((u8*)a +
- le16_to_cpu(a->name_offset)),
- a->name_length, 1, IGNORE_CASE,
- upcase, upcase_len);
- /*
- * If @name collates before a->name, there is no
- * matching attribute.
- */
- if (rc == -1)
- return -ENOENT;
- /* If the strings are not equal, continue search. */
- if (rc)
- continue;
- rc = ntfs_collate_names(name, name_len,
- (ntfschar*)((u8*)a +
- le16_to_cpu(a->name_offset)),
- a->name_length, 1, CASE_SENSITIVE,
- upcase, upcase_len);
- if (rc == -1)
- return -ENOENT;
- if (rc)
- continue;
- }
- /*
- * The names match or @name not present and attribute is
- * unnamed. If no @val specified, we have found the attribute
- * and are done.
- */
- if (!val)
- return 0;
- /* @val is present; compare values. */
- else {
- register int rc;
-
- rc = memcmp(val, (u8*)a + le16_to_cpu(
- a->data.resident.value_offset),
- min_t(u32, val_len, le32_to_cpu(
- a->data.resident.value_length)));
- /*
- * If @val collates before the current attribute's
- * value, there is no matching attribute.
- */
- if (!rc) {
- register u32 avl;
-
- avl = le32_to_cpu(
- a->data.resident.value_length);
- if (val_len == avl)
- return 0;
- if (val_len < avl)
- return -ENOENT;
- } else if (rc < 0)
- return -ENOENT;
- }
- }
- ntfs_error(vol->sb, "Inode is corrupt. Run chkdsk.");
- NVolSetErrors(vol);
- return -EIO;
-}
-
-/**
- * load_attribute_list - load an attribute list into memory
- * @vol: ntfs volume from which to read
- * @runlist: runlist of the attribute list
- * @al_start: destination buffer
- * @size: size of the destination buffer in bytes
- * @initialized_size: initialized size of the attribute list
- *
- * Walk the runlist @runlist and load all clusters from it copying them into
- * the linear buffer @al. The maximum number of bytes copied to @al is @size
- * bytes. Note, @size does not need to be a multiple of the cluster size. If
- * @initialized_size is less than @size, the region in @al between
- * @initialized_size and @size will be zeroed and not read from disk.
- *
- * Return 0 on success or -errno on error.
- */
-int load_attribute_list(ntfs_volume *vol, runlist *runlist, u8 *al_start,
- const s64 size, const s64 initialized_size)
-{
- LCN lcn;
- u8 *al = al_start;
- u8 *al_end = al + initialized_size;
- runlist_element *rl;
- struct buffer_head *bh;
- struct super_block *sb;
- unsigned long block_size;
- unsigned long block, max_block;
- int err = 0;
- unsigned char block_size_bits;
-
- ntfs_debug("Entering.");
- if (!vol || !runlist || !al || size <= 0 || initialized_size < 0 ||
- initialized_size > size)
- return -EINVAL;
- if (!initialized_size) {
- memset(al, 0, size);
- return 0;
- }
- sb = vol->sb;
- block_size = sb->s_blocksize;
- block_size_bits = sb->s_blocksize_bits;
- down_read(&runlist->lock);
- rl = runlist->rl;
- if (!rl) {
- ntfs_error(sb, "Cannot read attribute list since runlist is "
- "missing.");
- goto err_out;
- }
- /* Read all clusters specified by the runlist one run at a time. */
- while (rl->length) {
- lcn = ntfs_rl_vcn_to_lcn(rl, rl->vcn);
- ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
- (unsigned long long)rl->vcn,
- (unsigned long long)lcn);
- /* The attribute list cannot be sparse. */
- if (lcn < 0) {
- ntfs_error(sb, "ntfs_rl_vcn_to_lcn() failed. Cannot "
- "read attribute list.");
- goto err_out;
- }
- block = lcn << vol->cluster_size_bits >> block_size_bits;
- /* Read the run from device in chunks of block_size bytes. */
- max_block = block + (rl->length << vol->cluster_size_bits >>
- block_size_bits);
- ntfs_debug("max_block = 0x%lx.", max_block);
- do {
- ntfs_debug("Reading block = 0x%lx.", block);
- bh = sb_bread(sb, block);
- if (!bh) {
- ntfs_error(sb, "sb_bread() failed. Cannot "
- "read attribute list.");
- goto err_out;
- }
- if (al + block_size >= al_end)
- goto do_final;
- memcpy(al, bh->b_data, block_size);
- brelse(bh);
- al += block_size;
- } while (++block < max_block);
- rl++;
- }
- if (initialized_size < size) {
-initialize:
- memset(al_start + initialized_size, 0, size - initialized_size);
- }
-done:
- up_read(&runlist->lock);
- return err;
-do_final:
- if (al < al_end) {
- /*
- * Partial block.
- *
- * Note: The attribute list can be smaller than its allocation
- * by multiple clusters. This has been encountered by at least
- * two people running Windows XP, thus we cannot do any
- * truncation sanity checking here. (AIA)
- */
- memcpy(al, bh->b_data, al_end - al);
- brelse(bh);
- if (initialized_size < size)
- goto initialize;
- goto done;
- }
- brelse(bh);
- /* Real overflow! */
- ntfs_error(sb, "Attribute list buffer overflow. Read attribute list "
- "is truncated.");
-err_out:
- err = -EIO;
- goto done;
-}
-
-/**
- * ntfs_external_attr_find - find an attribute in the attribute list of an inode
- * @type: attribute type to find
- * @name: attribute name to find (optional, i.e. NULL means don't care)
- * @name_len: attribute name length (only needed if @name present)
- * @ic: IGNORE_CASE or CASE_SENSITIVE (ignored if @name not present)
- * @lowest_vcn: lowest vcn to find (optional, non-resident attributes only)
- * @val: attribute value to find (optional, resident attributes only)
- * @val_len: attribute value length
- * @ctx: search context with mft record and attribute to search from
- *
- * You should not need to call this function directly. Use ntfs_attr_lookup()
- * instead.
- *
- * Find an attribute by searching the attribute list for the corresponding
- * attribute list entry. Having found the entry, map the mft record if the
- * attribute is in a different mft record/inode, ntfs_attr_find() the attribute
- * in there and return it.
- *
- * On first search @ctx->ntfs_ino must be the base mft record and @ctx must
- * have been obtained from a call to ntfs_attr_get_search_ctx(). On subsequent
- * calls @ctx->ntfs_ino can be any extent inode, too (@ctx->base_ntfs_ino is
- * then the base inode).
- *
- * After finishing with the attribute/mft record you need to call
- * ntfs_attr_put_search_ctx() to cleanup the search context (unmapping any
- * mapped inodes, etc).
- *
- * If the attribute is found, ntfs_external_attr_find() returns 0 and
- * @ctx->attr will point to the found attribute. @ctx->mrec will point to the
- * mft record in which @ctx->attr is located and @ctx->al_entry will point to
- * the attribute list entry for the attribute.
- *
- * If the attribute is not found, ntfs_external_attr_find() returns -ENOENT and
- * @ctx->attr will point to the attribute in the base mft record before which
- * the attribute being searched for would need to be inserted if such an action
- * were to be desired. @ctx->mrec will point to the mft record in which
- * @ctx->attr is located and @ctx->al_entry will point to the attribute list
- * entry of the attribute before which the attribute being searched for would
- * need to be inserted if such an action were to be desired.
- *
- * Thus to insert the not found attribute, one wants to add the attribute to
- * @ctx->mrec (the base mft record) and if there is not enough space, the
- * attribute should be placed in a newly allocated extent mft record. The
- * attribute list entry for the inserted attribute should be inserted in the
- * attribute list attribute at @ctx->al_entry.
- *
- * On actual error, ntfs_external_attr_find() returns -EIO. In this case
- * @ctx->attr is undefined and in particular do not rely on it not changing.
- */
-static int ntfs_external_attr_find(const ATTR_TYPE type,
- const ntfschar *name, const u32 name_len,
- const IGNORE_CASE_BOOL ic, const VCN lowest_vcn,
- const u8 *val, const u32 val_len, ntfs_attr_search_ctx *ctx)
-{
- ntfs_inode *base_ni, *ni;
- ntfs_volume *vol;
- ATTR_LIST_ENTRY *al_entry, *next_al_entry;
- u8 *al_start, *al_end;
- ATTR_RECORD *a;
- ntfschar *al_name;
- u32 al_name_len;
- int err = 0;
- static const char *es = " Unmount and run chkdsk.";
-
- ni = ctx->ntfs_ino;
- base_ni = ctx->base_ntfs_ino;
- ntfs_debug("Entering for inode 0x%lx, type 0x%x.", ni->mft_no, type);
- if (!base_ni) {
- /* First call happens with the base mft record. */
- base_ni = ctx->base_ntfs_ino = ctx->ntfs_ino;
- ctx->base_mrec = ctx->mrec;
- }
- if (ni == base_ni)
- ctx->base_attr = ctx->attr;
- if (type == AT_END)
- goto not_found;
- vol = base_ni->vol;
- al_start = base_ni->attr_list;
- al_end = al_start + base_ni->attr_list_size;
- if (!ctx->al_entry)
- ctx->al_entry = (ATTR_LIST_ENTRY*)al_start;
- /*
- * Iterate over entries in attribute list starting at @ctx->al_entry,
- * or the entry following that, if @ctx->is_first is 'true'.
- */
- if (ctx->is_first) {
- al_entry = ctx->al_entry;
- ctx->is_first = false;
- } else
- al_entry = (ATTR_LIST_ENTRY*)((u8*)ctx->al_entry +
- le16_to_cpu(ctx->al_entry->length));
- for (;; al_entry = next_al_entry) {
- /* Out of bounds check. */
- if ((u8*)al_entry < base_ni->attr_list ||
- (u8*)al_entry > al_end)
- break; /* Inode is corrupt. */
- ctx->al_entry = al_entry;
- /* Catch the end of the attribute list. */
- if ((u8*)al_entry == al_end)
- goto not_found;
- if (!al_entry->length)
- break;
- if ((u8*)al_entry + 6 > al_end || (u8*)al_entry +
- le16_to_cpu(al_entry->length) > al_end)
- break;
- next_al_entry = (ATTR_LIST_ENTRY*)((u8*)al_entry +
- le16_to_cpu(al_entry->length));
- if (le32_to_cpu(al_entry->type) > le32_to_cpu(type))
- goto not_found;
- if (type != al_entry->type)
- continue;
- /*
- * If @name is present, compare the two names. If @name is
- * missing, assume we want an unnamed attribute.
- */
- al_name_len = al_entry->name_length;
- al_name = (ntfschar*)((u8*)al_entry + al_entry->name_offset);
- if (!name) {
- if (al_name_len)
- goto not_found;
- } else if (!ntfs_are_names_equal(al_name, al_name_len, name,
- name_len, ic, vol->upcase, vol->upcase_len)) {
- register int rc;
-
- rc = ntfs_collate_names(name, name_len, al_name,
- al_name_len, 1, IGNORE_CASE,
- vol->upcase, vol->upcase_len);
- /*
- * If @name collates before al_name, there is no
- * matching attribute.
- */
- if (rc == -1)
- goto not_found;
- /* If the strings are not equal, continue search. */
- if (rc)
- continue;
- /*
- * FIXME: Reverse engineering showed 0, IGNORE_CASE but
- * that is inconsistent with ntfs_attr_find(). The
- * subsequent rc checks were also different. Perhaps I
- * made a mistake in one of the two. Need to recheck
- * which is correct or at least see what is going on...
- * (AIA)
- */
- rc = ntfs_collate_names(name, name_len, al_name,
- al_name_len, 1, CASE_SENSITIVE,
- vol->upcase, vol->upcase_len);
- if (rc == -1)
- goto not_found;
- if (rc)
- continue;
- }
- /*
- * The names match or @name not present and attribute is
- * unnamed. Now check @lowest_vcn. Continue search if the
- * next attribute list entry still fits @lowest_vcn. Otherwise
- * we have reached the right one or the search has failed.
- */
- if (lowest_vcn && (u8*)next_al_entry >= al_start &&
- (u8*)next_al_entry + 6 < al_end &&
- (u8*)next_al_entry + le16_to_cpu(
- next_al_entry->length) <= al_end &&
- sle64_to_cpu(next_al_entry->lowest_vcn) <=
- lowest_vcn &&
- next_al_entry->type == al_entry->type &&
- next_al_entry->name_length == al_name_len &&
- ntfs_are_names_equal((ntfschar*)((u8*)
- next_al_entry +
- next_al_entry->name_offset),
- next_al_entry->name_length,
- al_name, al_name_len, CASE_SENSITIVE,
- vol->upcase, vol->upcase_len))
- continue;
- if (MREF_LE(al_entry->mft_reference) == ni->mft_no) {
- if (MSEQNO_LE(al_entry->mft_reference) != ni->seq_no) {
- ntfs_error(vol->sb, "Found stale mft "
- "reference in attribute list "
- "of base inode 0x%lx.%s",
- base_ni->mft_no, es);
- err = -EIO;
- break;
- }
- } else { /* Mft references do not match. */
- /* If there is a mapped record unmap it first. */
- if (ni != base_ni)
- unmap_extent_mft_record(ni);
- /* Do we want the base record back? */
- if (MREF_LE(al_entry->mft_reference) ==
- base_ni->mft_no) {
- ni = ctx->ntfs_ino = base_ni;
- ctx->mrec = ctx->base_mrec;
- } else {
- /* We want an extent record. */
- ctx->mrec = map_extent_mft_record(base_ni,
- le64_to_cpu(
- al_entry->mft_reference), &ni);
- if (IS_ERR(ctx->mrec)) {
- ntfs_error(vol->sb, "Failed to map "
- "extent mft record "
- "0x%lx of base inode "
- "0x%lx.%s",
- MREF_LE(al_entry->
- mft_reference),
- base_ni->mft_no, es);
- err = PTR_ERR(ctx->mrec);
- if (err == -ENOENT)
- err = -EIO;
- /* Cause @ctx to be sanitized below. */
- ni = NULL;
- break;
- }
- ctx->ntfs_ino = ni;
- }
- ctx->attr = (ATTR_RECORD*)((u8*)ctx->mrec +
- le16_to_cpu(ctx->mrec->attrs_offset));
- }
- /*
- * ctx->vfs_ino, ctx->mrec, and ctx->attr now point to the
- * mft record containing the attribute represented by the
- * current al_entry.
- */
- /*
- * We could call into ntfs_attr_find() to find the right
- * attribute in this mft record but this would be less
- * efficient and not quite accurate as ntfs_attr_find() ignores
- * the attribute instance numbers for example which become
- * important when one plays with attribute lists. Also,
- * because a proper match has been found in the attribute list
- * entry above, the comparison can now be optimized. So it is
- * worth re-implementing a simplified ntfs_attr_find() here.
- */
- a = ctx->attr;
- /*
- * Use a manual loop so we can still use break and continue
- * with the same meanings as above.
- */
-do_next_attr_loop:
- if ((u8*)a < (u8*)ctx->mrec || (u8*)a > (u8*)ctx->mrec +
- le32_to_cpu(ctx->mrec->bytes_allocated))
- break;
- if (a->type == AT_END)
- break;
- if (!a->length)
- break;
- if (al_entry->instance != a->instance)
- goto do_next_attr;
- /*
- * If the type and/or the name are mismatched between the
- * attribute list entry and the attribute record, there is
- * corruption so we break and return error EIO.
- */
- if (al_entry->type != a->type)
- break;
- if (!ntfs_are_names_equal((ntfschar*)((u8*)a +
- le16_to_cpu(a->name_offset)), a->name_length,
- al_name, al_name_len, CASE_SENSITIVE,
- vol->upcase, vol->upcase_len))
- break;
- ctx->attr = a;
- /*
- * If no @val specified or @val specified and it matches, we
- * have found it!
- */
- if (!val || (!a->non_resident && le32_to_cpu(
- a->data.resident.value_length) == val_len &&
- !memcmp((u8*)a +
- le16_to_cpu(a->data.resident.value_offset),
- val, val_len))) {
- ntfs_debug("Done, found.");
- return 0;
- }
-do_next_attr:
- /* Proceed to the next attribute in the current mft record. */
- a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length));
- goto do_next_attr_loop;
- }
- if (!err) {
- ntfs_error(vol->sb, "Base inode 0x%lx contains corrupt "
- "attribute list attribute.%s", base_ni->mft_no,
- es);
- err = -EIO;
- }
- if (ni != base_ni) {
- if (ni)
- unmap_extent_mft_record(ni);
- ctx->ntfs_ino = base_ni;
- ctx->mrec = ctx->base_mrec;
- ctx->attr = ctx->base_attr;
- }
- if (err != -ENOMEM)
- NVolSetErrors(vol);
- return err;
-not_found:
- /*
- * If we were looking for AT_END, we reset the search context @ctx and
- * use ntfs_attr_find() to seek to the end of the base mft record.
- */
- if (type == AT_END) {
- ntfs_attr_reinit_search_ctx(ctx);
- return ntfs_attr_find(AT_END, name, name_len, ic, val, val_len,
- ctx);
- }
- /*
- * The attribute was not found. Before we return, we want to ensure
- * @ctx->mrec and @ctx->attr indicate the position at which the
- * attribute should be inserted in the base mft record. Since we also
- * want to preserve @ctx->al_entry we cannot reinitialize the search
- * context using ntfs_attr_reinit_search_ctx() as this would set
- * @ctx->al_entry to NULL. Thus we do the necessary bits manually (see
- * ntfs_attr_init_search_ctx() below). Note, we _only_ preserve
- * @ctx->al_entry as the remaining fields (base_*) are identical to
- * their non base_ counterparts and we cannot set @ctx->base_attr
- * correctly yet as we do not know what @ctx->attr will be set to by
- * the call to ntfs_attr_find() below.
- */
- if (ni != base_ni)
- unmap_extent_mft_record(ni);
- ctx->mrec = ctx->base_mrec;
- ctx->attr = (ATTR_RECORD*)((u8*)ctx->mrec +
- le16_to_cpu(ctx->mrec->attrs_offset));
- ctx->is_first = true;
- ctx->ntfs_ino = base_ni;
- ctx->base_ntfs_ino = NULL;
- ctx->base_mrec = NULL;
- ctx->base_attr = NULL;
- /*
- * In case there are multiple matches in the base mft record, need to
- * keep enumerating until we get an attribute not found response (or
- * another error), otherwise we would keep returning the same attribute
- * over and over again and all programs using us for enumeration would
- * lock up in a tight loop.
- */
- do {
- err = ntfs_attr_find(type, name, name_len, ic, val, val_len,
- ctx);
- } while (!err);
- ntfs_debug("Done, not found.");
- return err;
-}
-
-/**
- * ntfs_attr_lookup - find an attribute in an ntfs inode
- * @type: attribute type to find
- * @name: attribute name to find (optional, i.e. NULL means don't care)
- * @name_len: attribute name length (only needed if @name present)
- * @ic: IGNORE_CASE or CASE_SENSITIVE (ignored if @name not present)
- * @lowest_vcn: lowest vcn to find (optional, non-resident attributes only)
- * @val: attribute value to find (optional, resident attributes only)
- * @val_len: attribute value length
- * @ctx: search context with mft record and attribute to search from
- *
- * Find an attribute in an ntfs inode. On first search @ctx->ntfs_ino must
- * be the base mft record and @ctx must have been obtained from a call to
- * ntfs_attr_get_search_ctx().
- *
- * This function transparently handles attribute lists and @ctx is used to
- * continue searches where they were left off at.
- *
- * After finishing with the attribute/mft record you need to call
- * ntfs_attr_put_search_ctx() to cleanup the search context (unmapping any
- * mapped inodes, etc).
- *
- * Return 0 if the search was successful and -errno if not.
- *
- * When 0, @ctx->attr is the found attribute and it is in mft record
- * @ctx->mrec. If an attribute list attribute is present, @ctx->al_entry is
- * the attribute list entry of the found attribute.
- *
- * When -ENOENT, @ctx->attr is the attribute which collates just after the
- * attribute being searched for, i.e. if one wants to add the attribute to the
- * mft record this is the correct place to insert it into. If an attribute
- * list attribute is present, @ctx->al_entry is the attribute list entry which
- * collates just after the attribute list entry of the attribute being searched
- * for, i.e. if one wants to add the attribute to the mft record this is the
- * correct place to insert its attribute list entry into.
- *
- * When -errno != -ENOENT, an error occurred during the lookup. @ctx->attr is
- * then undefined and in particular you should not rely on it not changing.
- */
-int ntfs_attr_lookup(const ATTR_TYPE type, const ntfschar *name,
- const u32 name_len, const IGNORE_CASE_BOOL ic,
- const VCN lowest_vcn, const u8 *val, const u32 val_len,
- ntfs_attr_search_ctx *ctx)
-{
- ntfs_inode *base_ni;
-
- ntfs_debug("Entering.");
- BUG_ON(IS_ERR(ctx->mrec));
- if (ctx->base_ntfs_ino)
- base_ni = ctx->base_ntfs_ino;
- else
- base_ni = ctx->ntfs_ino;
- /* Sanity check, just for debugging really. */
- BUG_ON(!base_ni);
- if (!NInoAttrList(base_ni) || type == AT_ATTRIBUTE_LIST)
- return ntfs_attr_find(type, name, name_len, ic, val, val_len,
- ctx);
- return ntfs_external_attr_find(type, name, name_len, ic, lowest_vcn,
- val, val_len, ctx);
-}
-
-/**
- * ntfs_attr_init_search_ctx - initialize an attribute search context
- * @ctx: attribute search context to initialize
- * @ni: ntfs inode with which to initialize the search context
- * @mrec: mft record with which to initialize the search context
- *
- * Initialize the attribute search context @ctx with @ni and @mrec.
- */
-static inline void ntfs_attr_init_search_ctx(ntfs_attr_search_ctx *ctx,
- ntfs_inode *ni, MFT_RECORD *mrec)
-{
- *ctx = (ntfs_attr_search_ctx) {
- .mrec = mrec,
- /* Sanity checks are performed elsewhere. */
- .attr = (ATTR_RECORD*)((u8*)mrec +
- le16_to_cpu(mrec->attrs_offset)),
- .is_first = true,
- .ntfs_ino = ni,
- };
-}
-
-/**
- * ntfs_attr_reinit_search_ctx - reinitialize an attribute search context
- * @ctx: attribute search context to reinitialize
- *
- * Reinitialize the attribute search context @ctx, unmapping an associated
- * extent mft record if present, and initialize the search context again.
- *
- * This is used when a search for a new attribute is being started to reset
- * the search context to the beginning.
- */
-void ntfs_attr_reinit_search_ctx(ntfs_attr_search_ctx *ctx)
-{
- if (likely(!ctx->base_ntfs_ino)) {
- /* No attribute list. */
- ctx->is_first = true;
- /* Sanity checks are performed elsewhere. */
- ctx->attr = (ATTR_RECORD*)((u8*)ctx->mrec +
- le16_to_cpu(ctx->mrec->attrs_offset));
- /*
- * This needs resetting due to ntfs_external_attr_find() which
- * can leave it set despite having zeroed ctx->base_ntfs_ino.
- */
- ctx->al_entry = NULL;
- return;
- } /* Attribute list. */
- if (ctx->ntfs_ino != ctx->base_ntfs_ino)
- unmap_extent_mft_record(ctx->ntfs_ino);
- ntfs_attr_init_search_ctx(ctx, ctx->base_ntfs_ino, ctx->base_mrec);
- return;
-}
-
-/**
- * ntfs_attr_get_search_ctx - allocate/initialize a new attribute search context
- * @ni: ntfs inode with which to initialize the search context
- * @mrec: mft record with which to initialize the search context
- *
- * Allocate a new attribute search context, initialize it with @ni and @mrec,
- * and return it. Return NULL if allocation failed.
- */
-ntfs_attr_search_ctx *ntfs_attr_get_search_ctx(ntfs_inode *ni, MFT_RECORD *mrec)
-{
- ntfs_attr_search_ctx *ctx;
-
- ctx = kmem_cache_alloc(ntfs_attr_ctx_cache, GFP_NOFS);
- if (ctx)
- ntfs_attr_init_search_ctx(ctx, ni, mrec);
- return ctx;
-}
-
-/**
- * ntfs_attr_put_search_ctx - release an attribute search context
- * @ctx: attribute search context to free
- *
- * Release the attribute search context @ctx, unmapping an associated extent
- * mft record if present.
- */
-void ntfs_attr_put_search_ctx(ntfs_attr_search_ctx *ctx)
-{
- if (ctx->base_ntfs_ino && ctx->ntfs_ino != ctx->base_ntfs_ino)
- unmap_extent_mft_record(ctx->ntfs_ino);
- kmem_cache_free(ntfs_attr_ctx_cache, ctx);
- return;
-}
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_attr_find_in_attrdef - find an attribute in the $AttrDef system file
- * @vol: ntfs volume to which the attribute belongs
- * @type: attribute type which to find
- *
- * Search for the attribute definition record corresponding to the attribute
- * @type in the $AttrDef system file.
- *
- * Return the attribute type definition record if found and NULL if not found.
- */
-static ATTR_DEF *ntfs_attr_find_in_attrdef(const ntfs_volume *vol,
- const ATTR_TYPE type)
-{
- ATTR_DEF *ad;
-
- BUG_ON(!vol->attrdef);
- BUG_ON(!type);
- for (ad = vol->attrdef; (u8*)ad - (u8*)vol->attrdef <
- vol->attrdef_size && ad->type; ++ad) {
- /* We have not found it yet, carry on searching. */
- if (likely(le32_to_cpu(ad->type) < le32_to_cpu(type)))
- continue;
- /* We found the attribute; return it. */
- if (likely(ad->type == type))
- return ad;
- /* We have gone too far already. No point in continuing. */
- break;
- }
- /* Attribute not found. */
- ntfs_debug("Attribute type 0x%x not found in $AttrDef.",
- le32_to_cpu(type));
- return NULL;
-}
-
-/**
- * ntfs_attr_size_bounds_check - check a size of an attribute type for validity
- * @vol: ntfs volume to which the attribute belongs
- * @type: attribute type which to check
- * @size: size which to check
- *
- * Check whether the @size in bytes is valid for an attribute of @type on the
- * ntfs volume @vol. This information is obtained from $AttrDef system file.
- *
- * Return 0 if valid, -ERANGE if not valid, or -ENOENT if the attribute is not
- * listed in $AttrDef.
- */
-int ntfs_attr_size_bounds_check(const ntfs_volume *vol, const ATTR_TYPE type,
- const s64 size)
-{
- ATTR_DEF *ad;
-
- BUG_ON(size < 0);
- /*
- * $ATTRIBUTE_LIST has a maximum size of 256kiB, but this is not
- * listed in $AttrDef.
- */
- if (unlikely(type == AT_ATTRIBUTE_LIST && size > 256 * 1024))
- return -ERANGE;
- /* Get the $AttrDef entry for the attribute @type. */
- ad = ntfs_attr_find_in_attrdef(vol, type);
- if (unlikely(!ad))
- return -ENOENT;
- /* Do the bounds check. */
- if (((sle64_to_cpu(ad->min_size) > 0) &&
- size < sle64_to_cpu(ad->min_size)) ||
- ((sle64_to_cpu(ad->max_size) > 0) && size >
- sle64_to_cpu(ad->max_size)))
- return -ERANGE;
- return 0;
-}
-
-/**
- * ntfs_attr_can_be_non_resident - check if an attribute can be non-resident
- * @vol: ntfs volume to which the attribute belongs
- * @type: attribute type which to check
- *
- * Check whether the attribute of @type on the ntfs volume @vol is allowed to
- * be non-resident. This information is obtained from $AttrDef system file.
- *
- * Return 0 if the attribute is allowed to be non-resident, -EPERM if not, and
- * -ENOENT if the attribute is not listed in $AttrDef.
- */
-int ntfs_attr_can_be_non_resident(const ntfs_volume *vol, const ATTR_TYPE type)
-{
- ATTR_DEF *ad;
-
- /* Find the attribute definition record in $AttrDef. */
- ad = ntfs_attr_find_in_attrdef(vol, type);
- if (unlikely(!ad))
- return -ENOENT;
- /* Check the flags and return the result. */
- if (ad->flags & ATTR_DEF_RESIDENT)
- return -EPERM;
- return 0;
-}
-
-/**
- * ntfs_attr_can_be_resident - check if an attribute can be resident
- * @vol: ntfs volume to which the attribute belongs
- * @type: attribute type which to check
- *
- * Check whether the attribute of @type on the ntfs volume @vol is allowed to
- * be resident. This information is derived from our ntfs knowledge and may
- * not be completely accurate, especially when user defined attributes are
- * present. Basically we allow everything to be resident except for index
- * allocation and $EA attributes.
- *
- * Return 0 if the attribute is allowed to be non-resident and -EPERM if not.
- *
- * Warning: In the system file $MFT the attribute $Bitmap must be non-resident
- * otherwise windows will not boot (blue screen of death)! We cannot
- * check for this here as we do not know which inode's $Bitmap is
- * being asked about so the caller needs to special case this.
- */
-int ntfs_attr_can_be_resident(const ntfs_volume *vol, const ATTR_TYPE type)
-{
- if (type == AT_INDEX_ALLOCATION)
- return -EPERM;
- return 0;
-}
-
-/**
- * ntfs_attr_record_resize - resize an attribute record
- * @m: mft record containing attribute record
- * @a: attribute record to resize
- * @new_size: new size in bytes to which to resize the attribute record @a
- *
- * Resize the attribute record @a, i.e. the resident part of the attribute, in
- * the mft record @m to @new_size bytes.
- *
- * Return 0 on success and -errno on error. The following error codes are
- * defined:
- * -ENOSPC - Not enough space in the mft record @m to perform the resize.
- *
- * Note: On error, no modifications have been performed whatsoever.
- *
- * Warning: If you make a record smaller without having copied all the data you
- * are interested in the data may be overwritten.
- */
-int ntfs_attr_record_resize(MFT_RECORD *m, ATTR_RECORD *a, u32 new_size)
-{
- ntfs_debug("Entering for new_size %u.", new_size);
- /* Align to 8 bytes if it is not already done. */
- if (new_size & 7)
- new_size = (new_size + 7) & ~7;
- /* If the actual attribute length has changed, move things around. */
- if (new_size != le32_to_cpu(a->length)) {
- u32 new_muse = le32_to_cpu(m->bytes_in_use) -
- le32_to_cpu(a->length) + new_size;
- /* Not enough space in this mft record. */
- if (new_muse > le32_to_cpu(m->bytes_allocated))
- return -ENOSPC;
- /* Move attributes following @a to their new location. */
- memmove((u8*)a + new_size, (u8*)a + le32_to_cpu(a->length),
- le32_to_cpu(m->bytes_in_use) - ((u8*)a -
- (u8*)m) - le32_to_cpu(a->length));
- /* Adjust @m to reflect the change in used space. */
- m->bytes_in_use = cpu_to_le32(new_muse);
- /* Adjust @a to reflect the new size. */
- if (new_size >= offsetof(ATTR_REC, length) + sizeof(a->length))
- a->length = cpu_to_le32(new_size);
- }
- return 0;
-}
-
-/**
- * ntfs_resident_attr_value_resize - resize the value of a resident attribute
- * @m: mft record containing attribute record
- * @a: attribute record whose value to resize
- * @new_size: new size in bytes to which to resize the attribute value of @a
- *
- * Resize the value of the attribute @a in the mft record @m to @new_size bytes.
- * If the value is made bigger, the newly allocated space is cleared.
- *
- * Return 0 on success and -errno on error. The following error codes are
- * defined:
- * -ENOSPC - Not enough space in the mft record @m to perform the resize.
- *
- * Note: On error, no modifications have been performed whatsoever.
- *
- * Warning: If you make a record smaller without having copied all the data you
- * are interested in the data may be overwritten.
- */
-int ntfs_resident_attr_value_resize(MFT_RECORD *m, ATTR_RECORD *a,
- const u32 new_size)
-{
- u32 old_size;
-
- /* Resize the resident part of the attribute record. */
- if (ntfs_attr_record_resize(m, a,
- le16_to_cpu(a->data.resident.value_offset) + new_size))
- return -ENOSPC;
- /*
- * The resize succeeded! If we made the attribute value bigger, clear
- * the area between the old size and @new_size.
- */
- old_size = le32_to_cpu(a->data.resident.value_length);
- if (new_size > old_size)
- memset((u8*)a + le16_to_cpu(a->data.resident.value_offset) +
- old_size, 0, new_size - old_size);
- /* Finally update the length of the attribute value. */
- a->data.resident.value_length = cpu_to_le32(new_size);
- return 0;
-}
-
-/**
- * ntfs_attr_make_non_resident - convert a resident to a non-resident attribute
- * @ni: ntfs inode describing the attribute to convert
- * @data_size: size of the resident data to copy to the non-resident attribute
- *
- * Convert the resident ntfs attribute described by the ntfs inode @ni to a
- * non-resident one.
- *
- * @data_size must be equal to the attribute value size. This is needed since
- * we need to know the size before we can map the mft record and our callers
- * always know it. The reason we cannot simply read the size from the vfs
- * inode i_size is that this is not necessarily uptodate. This happens when
- * ntfs_attr_make_non_resident() is called in the ->truncate call path(s).
- *
- * Return 0 on success and -errno on error. The following error return codes
- * are defined:
- * -EPERM - The attribute is not allowed to be non-resident.
- * -ENOMEM - Not enough memory.
- * -ENOSPC - Not enough disk space.
- * -EINVAL - Attribute not defined on the volume.
- * -EIO - I/o error or other error.
- * Note that -ENOSPC is also returned in the case that there is not enough
- * space in the mft record to do the conversion. This can happen when the mft
- * record is already very full. The caller is responsible for trying to make
- * space in the mft record and trying again. FIXME: Do we need a separate
- * error return code for this kind of -ENOSPC or is it always worth trying
- * again in case the attribute may then fit in a resident state so no need to
- * make it non-resident at all? Ho-hum... (AIA)
- *
- * NOTE to self: No changes in the attribute list are required to move from
- * a resident to a non-resident attribute.
- *
- * Locking: - The caller must hold i_mutex on the inode.
- */
-int ntfs_attr_make_non_resident(ntfs_inode *ni, const u32 data_size)
-{
- s64 new_size;
- struct inode *vi = VFS_I(ni);
- ntfs_volume *vol = ni->vol;
- ntfs_inode *base_ni;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- ntfs_attr_search_ctx *ctx;
- struct page *page;
- runlist_element *rl;
- u8 *kaddr;
- unsigned long flags;
- int mp_size, mp_ofs, name_ofs, arec_size, err, err2;
- u32 attr_size;
- u8 old_res_attr_flags;
-
- /* Check that the attribute is allowed to be non-resident. */
- err = ntfs_attr_can_be_non_resident(vol, ni->type);
- if (unlikely(err)) {
- if (err == -EPERM)
- ntfs_debug("Attribute is not allowed to be "
- "non-resident.");
- else
- ntfs_debug("Attribute not defined on the NTFS "
- "volume!");
- return err;
- }
- /*
- * FIXME: Compressed and encrypted attributes are not supported when
- * writing and we should never have gotten here for them.
- */
- BUG_ON(NInoCompressed(ni));
- BUG_ON(NInoEncrypted(ni));
- /*
- * The size needs to be aligned to a cluster boundary for allocation
- * purposes.
- */
- new_size = (data_size + vol->cluster_size - 1) &
- ~(vol->cluster_size - 1);
- if (new_size > 0) {
- /*
- * Will need the page later and since the page lock nests
- * outside all ntfs locks, we need to get the page now.
- */
- page = find_or_create_page(vi->i_mapping, 0,
- mapping_gfp_mask(vi->i_mapping));
- if (unlikely(!page))
- return -ENOMEM;
- /* Start by allocating clusters to hold the attribute value. */
- rl = ntfs_cluster_alloc(vol, 0, new_size >>
- vol->cluster_size_bits, -1, DATA_ZONE, true);
- if (IS_ERR(rl)) {
- err = PTR_ERR(rl);
- ntfs_debug("Failed to allocate cluster%s, error code "
- "%i.", (new_size >>
- vol->cluster_size_bits) > 1 ? "s" : "",
- err);
- goto page_err_out;
- }
- } else {
- rl = NULL;
- page = NULL;
- }
- /* Determine the size of the mapping pairs array. */
- mp_size = ntfs_get_size_for_mapping_pairs(vol, rl, 0, -1);
- if (unlikely(mp_size < 0)) {
- err = mp_size;
- ntfs_debug("Failed to get size for mapping pairs array, error "
- "code %i.", err);
- goto rl_err_out;
- }
- down_write(&ni->runlist.lock);
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- ctx = NULL;
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- err = -EIO;
- goto err_out;
- }
- m = ctx->mrec;
- a = ctx->attr;
- BUG_ON(NInoNonResident(ni));
- BUG_ON(a->non_resident);
- /*
- * Calculate new offsets for the name and the mapping pairs array.
- */
- if (NInoSparse(ni) || NInoCompressed(ni))
- name_ofs = (offsetof(ATTR_REC,
- data.non_resident.compressed_size) +
- sizeof(a->data.non_resident.compressed_size) +
- 7) & ~7;
- else
- name_ofs = (offsetof(ATTR_REC,
- data.non_resident.compressed_size) + 7) & ~7;
- mp_ofs = (name_ofs + a->name_length * sizeof(ntfschar) + 7) & ~7;
- /*
- * Determine the size of the resident part of the now non-resident
- * attribute record.
- */
- arec_size = (mp_ofs + mp_size + 7) & ~7;
- /*
- * If the page is not uptodate bring it uptodate by copying from the
- * attribute value.
- */
- attr_size = le32_to_cpu(a->data.resident.value_length);
- BUG_ON(attr_size != data_size);
- if (page && !PageUptodate(page)) {
- kaddr = kmap_atomic(page);
- memcpy(kaddr, (u8*)a +
- le16_to_cpu(a->data.resident.value_offset),
- attr_size);
- memset(kaddr + attr_size, 0, PAGE_SIZE - attr_size);
- kunmap_atomic(kaddr);
- flush_dcache_page(page);
- SetPageUptodate(page);
- }
- /* Backup the attribute flag. */
- old_res_attr_flags = a->data.resident.flags;
- /* Resize the resident part of the attribute record. */
- err = ntfs_attr_record_resize(m, a, arec_size);
- if (unlikely(err))
- goto err_out;
- /*
- * Convert the resident part of the attribute record to describe a
- * non-resident attribute.
- */
- a->non_resident = 1;
- /* Move the attribute name if it exists and update the offset. */
- if (a->name_length)
- memmove((u8*)a + name_ofs, (u8*)a + le16_to_cpu(a->name_offset),
- a->name_length * sizeof(ntfschar));
- a->name_offset = cpu_to_le16(name_ofs);
- /* Setup the fields specific to non-resident attributes. */
- a->data.non_resident.lowest_vcn = 0;
- a->data.non_resident.highest_vcn = cpu_to_sle64((new_size - 1) >>
- vol->cluster_size_bits);
- a->data.non_resident.mapping_pairs_offset = cpu_to_le16(mp_ofs);
- memset(&a->data.non_resident.reserved, 0,
- sizeof(a->data.non_resident.reserved));
- a->data.non_resident.allocated_size = cpu_to_sle64(new_size);
- a->data.non_resident.data_size =
- a->data.non_resident.initialized_size =
- cpu_to_sle64(attr_size);
- if (NInoSparse(ni) || NInoCompressed(ni)) {
- a->data.non_resident.compression_unit = 0;
- if (NInoCompressed(ni) || vol->major_ver < 3)
- a->data.non_resident.compression_unit = 4;
- a->data.non_resident.compressed_size =
- a->data.non_resident.allocated_size;
- } else
- a->data.non_resident.compression_unit = 0;
- /* Generate the mapping pairs array into the attribute record. */
- err = ntfs_mapping_pairs_build(vol, (u8*)a + mp_ofs,
- arec_size - mp_ofs, rl, 0, -1, NULL);
- if (unlikely(err)) {
- ntfs_debug("Failed to build mapping pairs, error code %i.",
- err);
- goto undo_err_out;
- }
- /* Setup the in-memory attribute structure to be non-resident. */
- ni->runlist.rl = rl;
- write_lock_irqsave(&ni->size_lock, flags);
- ni->allocated_size = new_size;
- if (NInoSparse(ni) || NInoCompressed(ni)) {
- ni->itype.compressed.size = ni->allocated_size;
- if (a->data.non_resident.compression_unit) {
- ni->itype.compressed.block_size = 1U << (a->data.
- non_resident.compression_unit +
- vol->cluster_size_bits);
- ni->itype.compressed.block_size_bits =
- ffs(ni->itype.compressed.block_size) -
- 1;
- ni->itype.compressed.block_clusters = 1U <<
- a->data.non_resident.compression_unit;
- } else {
- ni->itype.compressed.block_size = 0;
- ni->itype.compressed.block_size_bits = 0;
- ni->itype.compressed.block_clusters = 0;
- }
- vi->i_blocks = ni->itype.compressed.size >> 9;
- } else
- vi->i_blocks = ni->allocated_size >> 9;
- write_unlock_irqrestore(&ni->size_lock, flags);
- /*
- * This needs to be last since the address space operations ->read_folio
- * and ->writepage can run concurrently with us as they are not
- * serialized on i_mutex. Note, we are not allowed to fail once we flip
- * this switch, which is another reason to do this last.
- */
- NInoSetNonResident(ni);
- /* Mark the mft record dirty, so it gets written back. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- up_write(&ni->runlist.lock);
- if (page) {
- set_page_dirty(page);
- unlock_page(page);
- put_page(page);
- }
- ntfs_debug("Done.");
- return 0;
-undo_err_out:
- /* Convert the attribute back into a resident attribute. */
- a->non_resident = 0;
- /* Move the attribute name if it exists and update the offset. */
- name_ofs = (offsetof(ATTR_RECORD, data.resident.reserved) +
- sizeof(a->data.resident.reserved) + 7) & ~7;
- if (a->name_length)
- memmove((u8*)a + name_ofs, (u8*)a + le16_to_cpu(a->name_offset),
- a->name_length * sizeof(ntfschar));
- mp_ofs = (name_ofs + a->name_length * sizeof(ntfschar) + 7) & ~7;
- a->name_offset = cpu_to_le16(name_ofs);
- arec_size = (mp_ofs + attr_size + 7) & ~7;
- /* Resize the resident part of the attribute record. */
- err2 = ntfs_attr_record_resize(m, a, arec_size);
- if (unlikely(err2)) {
- /*
- * This cannot happen (well if memory corruption is at work it
- * could happen in theory), but deal with it as well as we can.
- * If the old size is too small, truncate the attribute,
- * otherwise simply give it a larger allocated size.
- * FIXME: Should check whether chkdsk complains when the
- * allocated size is much bigger than the resident value size.
- */
- arec_size = le32_to_cpu(a->length);
- if ((mp_ofs + attr_size) > arec_size) {
- err2 = attr_size;
- attr_size = arec_size - mp_ofs;
- ntfs_error(vol->sb, "Failed to undo partial resident "
- "to non-resident attribute "
- "conversion. Truncating inode 0x%lx, "
- "attribute type 0x%x from %i bytes to "
- "%i bytes to maintain metadata "
- "consistency. THIS MEANS YOU ARE "
- "LOSING %i BYTES DATA FROM THIS %s.",
- vi->i_ino,
- (unsigned)le32_to_cpu(ni->type),
- err2, attr_size, err2 - attr_size,
- ((ni->type == AT_DATA) &&
- !ni->name_len) ? "FILE": "ATTRIBUTE");
- write_lock_irqsave(&ni->size_lock, flags);
- ni->initialized_size = attr_size;
- i_size_write(vi, attr_size);
- write_unlock_irqrestore(&ni->size_lock, flags);
- }
- }
- /* Setup the fields specific to resident attributes. */
- a->data.resident.value_length = cpu_to_le32(attr_size);
- a->data.resident.value_offset = cpu_to_le16(mp_ofs);
- a->data.resident.flags = old_res_attr_flags;
- memset(&a->data.resident.reserved, 0,
- sizeof(a->data.resident.reserved));
- /* Copy the data from the page back to the attribute value. */
- if (page) {
- kaddr = kmap_atomic(page);
- memcpy((u8*)a + mp_ofs, kaddr, attr_size);
- kunmap_atomic(kaddr);
- }
- /* Setup the allocated size in the ntfs inode in case it changed. */
- write_lock_irqsave(&ni->size_lock, flags);
- ni->allocated_size = arec_size - mp_ofs;
- write_unlock_irqrestore(&ni->size_lock, flags);
- /* Mark the mft record dirty, so it gets written back. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
-err_out:
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(base_ni);
- ni->runlist.rl = NULL;
- up_write(&ni->runlist.lock);
-rl_err_out:
- if (rl) {
- if (ntfs_cluster_free_from_rl(vol, rl) < 0) {
- ntfs_error(vol->sb, "Failed to release allocated "
- "cluster(s) in error code path. Run "
- "chkdsk to recover the lost "
- "cluster(s).");
- NVolSetErrors(vol);
- }
- ntfs_free(rl);
-page_err_out:
- unlock_page(page);
- put_page(page);
- }
- if (err == -EINVAL)
- err = -EIO;
- return err;
-}
-
-/**
- * ntfs_attr_extend_allocation - extend the allocated space of an attribute
- * @ni: ntfs inode of the attribute whose allocation to extend
- * @new_alloc_size: new size in bytes to which to extend the allocation to
- * @new_data_size: new size in bytes to which to extend the data to
- * @data_start: beginning of region which is required to be non-sparse
- *
- * Extend the allocated space of an attribute described by the ntfs inode @ni
- * to @new_alloc_size bytes. If @data_start is -1, the whole extension may be
- * implemented as a hole in the file (as long as both the volume and the ntfs
- * inode @ni have sparse support enabled). If @data_start is >= 0, then the
- * region between the old allocated size and @data_start - 1 may be made sparse
- * but the regions between @data_start and @new_alloc_size must be backed by
- * actual clusters.
- *
- * If @new_data_size is -1, it is ignored. If it is >= 0, then the data size
- * of the attribute is extended to @new_data_size. Note that the i_size of the
- * vfs inode is not updated. Only the data size in the base attribute record
- * is updated. The caller has to update i_size separately if this is required.
- * WARNING: It is a BUG() for @new_data_size to be smaller than the old data
- * size as well as for @new_data_size to be greater than @new_alloc_size.
- *
- * For resident attributes this involves resizing the attribute record and if
- * necessary moving it and/or other attributes into extent mft records and/or
- * converting the attribute to a non-resident attribute which in turn involves
- * extending the allocation of a non-resident attribute as described below.
- *
- * For non-resident attributes this involves allocating clusters in the data
- * zone on the volume (except for regions that are being made sparse) and
- * extending the run list to describe the allocated clusters as well as
- * updating the mapping pairs array of the attribute. This in turn involves
- * resizing the attribute record and if necessary moving it and/or other
- * attributes into extent mft records and/or splitting the attribute record
- * into multiple extent attribute records.
- *
- * Also, the attribute list attribute is updated if present and in some of the
- * above cases (the ones where extent mft records/attributes come into play),
- * an attribute list attribute is created if not already present.
- *
- * Return the new allocated size on success and -errno on error. In the case
- * that an error is encountered but a partial extension at least up to
- * @data_start (if present) is possible, the allocation is partially extended
- * and this is returned. This means the caller must check the returned size to
- * determine if the extension was partial. If @data_start is -1 then partial
- * allocations are not performed.
- *
- * WARNING: Do not call ntfs_attr_extend_allocation() for $MFT/$DATA.
- *
- * Locking: This function takes the runlist lock of @ni for writing as well as
- * locking the mft record of the base ntfs inode. These locks are maintained
- * throughout execution of the function. These locks are required so that the
- * attribute can be resized safely and so that it can for example be converted
- * from resident to non-resident safely.
- *
- * TODO: At present attribute list attribute handling is not implemented.
- *
- * TODO: At present it is not safe to call this function for anything other
- * than the $DATA attribute(s) of an uncompressed and unencrypted file.
- */
-s64 ntfs_attr_extend_allocation(ntfs_inode *ni, s64 new_alloc_size,
- const s64 new_data_size, const s64 data_start)
-{
- VCN vcn;
- s64 ll, allocated_size, start = data_start;
- struct inode *vi = VFS_I(ni);
- ntfs_volume *vol = ni->vol;
- ntfs_inode *base_ni;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- ntfs_attr_search_ctx *ctx;
- runlist_element *rl, *rl2;
- unsigned long flags;
- int err, mp_size;
- u32 attr_len = 0; /* Silence stupid gcc warning. */
- bool mp_rebuilt;
-
-#ifdef DEBUG
- read_lock_irqsave(&ni->size_lock, flags);
- allocated_size = ni->allocated_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, "
- "old_allocated_size 0x%llx, "
- "new_allocated_size 0x%llx, new_data_size 0x%llx, "
- "data_start 0x%llx.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type),
- (unsigned long long)allocated_size,
- (unsigned long long)new_alloc_size,
- (unsigned long long)new_data_size,
- (unsigned long long)start);
-#endif
-retry_extend:
- /*
- * For non-resident attributes, @start and @new_size need to be aligned
- * to cluster boundaries for allocation purposes.
- */
- if (NInoNonResident(ni)) {
- if (start > 0)
- start &= ~(s64)vol->cluster_size_mask;
- new_alloc_size = (new_alloc_size + vol->cluster_size - 1) &
- ~(s64)vol->cluster_size_mask;
- }
- BUG_ON(new_data_size >= 0 && new_data_size > new_alloc_size);
- /* Check if new size is allowed in $AttrDef. */
- err = ntfs_attr_size_bounds_check(vol, ni->type, new_alloc_size);
- if (unlikely(err)) {
- /* Only emit errors when the write will fail completely. */
- read_lock_irqsave(&ni->size_lock, flags);
- allocated_size = ni->allocated_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (start < 0 || start >= allocated_size) {
- if (err == -ERANGE) {
- ntfs_error(vol->sb, "Cannot extend allocation "
- "of inode 0x%lx, attribute "
- "type 0x%x, because the new "
- "allocation would exceed the "
- "maximum allowed size for "
- "this attribute type.",
- vi->i_ino, (unsigned)
- le32_to_cpu(ni->type));
- } else {
- ntfs_error(vol->sb, "Cannot extend allocation "
- "of inode 0x%lx, attribute "
- "type 0x%x, because this "
- "attribute type is not "
- "defined on the NTFS volume. "
- "Possible corruption! You "
- "should run chkdsk!",
- vi->i_ino, (unsigned)
- le32_to_cpu(ni->type));
- }
- }
- /* Translate error code to be POSIX conformant for write(2). */
- if (err == -ERANGE)
- err = -EFBIG;
- else
- err = -EIO;
- return err;
- }
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- /*
- * We will be modifying both the runlist (if non-resident) and the mft
- * record so lock them both down.
- */
- down_write(&ni->runlist.lock);
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- ctx = NULL;
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- read_lock_irqsave(&ni->size_lock, flags);
- allocated_size = ni->allocated_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- /*
- * If non-resident, seek to the last extent. If resident, there is
- * only one extent, so seek to that.
- */
- vcn = NInoNonResident(ni) ? allocated_size >> vol->cluster_size_bits :
- 0;
- /*
- * Abort if someone did the work whilst we waited for the locks. If we
- * just converted the attribute from resident to non-resident it is
- * likely that exactly this has happened already. We cannot quite
- * abort if we need to update the data size.
- */
- if (unlikely(new_alloc_size <= allocated_size)) {
- ntfs_debug("Allocated size already exceeds requested size.");
- new_alloc_size = allocated_size;
- if (new_data_size < 0)
- goto done;
- /*
- * We want the first attribute extent so that we can update the
- * data size.
- */
- vcn = 0;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, vcn, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- err = -EIO;
- goto err_out;
- }
- m = ctx->mrec;
- a = ctx->attr;
- /* Use goto to reduce indentation. */
- if (a->non_resident)
- goto do_non_resident_extend;
- BUG_ON(NInoNonResident(ni));
- /* The total length of the attribute value. */
- attr_len = le32_to_cpu(a->data.resident.value_length);
- /*
- * Extend the attribute record to be able to store the new attribute
- * size. ntfs_attr_record_resize() will not do anything if the size is
- * not changing.
- */
- if (new_alloc_size < vol->mft_record_size &&
- !ntfs_attr_record_resize(m, a,
- le16_to_cpu(a->data.resident.value_offset) +
- new_alloc_size)) {
- /* The resize succeeded! */
- write_lock_irqsave(&ni->size_lock, flags);
- ni->allocated_size = le32_to_cpu(a->length) -
- le16_to_cpu(a->data.resident.value_offset);
- write_unlock_irqrestore(&ni->size_lock, flags);
- if (new_data_size >= 0) {
- BUG_ON(new_data_size < attr_len);
- a->data.resident.value_length =
- cpu_to_le32((u32)new_data_size);
- }
- goto flush_done;
- }
- /*
- * We have to drop all the locks so we can call
- * ntfs_attr_make_non_resident(). This could be optimised by try-
- * locking the first page cache page and only if that fails dropping
- * the locks, locking the page, and redoing all the locking and
- * lookups. While this would be a huge optimisation, it is not worth
- * it as this is definitely a slow code path.
- */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- up_write(&ni->runlist.lock);
- /*
- * Not enough space in the mft record, try to make the attribute
- * non-resident and if successful restart the extension process.
- */
- err = ntfs_attr_make_non_resident(ni, attr_len);
- if (likely(!err))
- goto retry_extend;
- /*
- * Could not make non-resident. If this is due to this not being
- * permitted for this attribute type or there not being enough space,
- * try to make other attributes non-resident. Otherwise fail.
- */
- if (unlikely(err != -EPERM && err != -ENOSPC)) {
- /* Only emit errors when the write will fail completely. */
- read_lock_irqsave(&ni->size_lock, flags);
- allocated_size = ni->allocated_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (start < 0 || start >= allocated_size)
- ntfs_error(vol->sb, "Cannot extend allocation of "
- "inode 0x%lx, attribute type 0x%x, "
- "because the conversion from resident "
- "to non-resident attribute failed "
- "with error code %i.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), err);
- if (err != -ENOMEM)
- err = -EIO;
- goto conv_err_out;
- }
- /* TODO: Not implemented from here, abort. */
- read_lock_irqsave(&ni->size_lock, flags);
- allocated_size = ni->allocated_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (start < 0 || start >= allocated_size) {
- if (err == -ENOSPC)
- ntfs_error(vol->sb, "Not enough space in the mft "
- "record/on disk for the non-resident "
- "attribute value. This case is not "
- "implemented yet.");
- else /* if (err == -EPERM) */
- ntfs_error(vol->sb, "This attribute type may not be "
- "non-resident. This case is not "
- "implemented yet.");
- }
- err = -EOPNOTSUPP;
- goto conv_err_out;
-#if 0
- // TODO: Attempt to make other attributes non-resident.
- if (!err)
- goto do_resident_extend;
- /*
- * Both the attribute list attribute and the standard information
- * attribute must remain in the base inode. Thus, if this is one of
- * these attributes, we have to try to move other attributes out into
- * extent mft records instead.
- */
- if (ni->type == AT_ATTRIBUTE_LIST ||
- ni->type == AT_STANDARD_INFORMATION) {
- // TODO: Attempt to move other attributes into extent mft
- // records.
- err = -EOPNOTSUPP;
- if (!err)
- goto do_resident_extend;
- goto err_out;
- }
- // TODO: Attempt to move this attribute to an extent mft record, but
- // only if it is not already the only attribute in an mft record in
- // which case there would be nothing to gain.
- err = -EOPNOTSUPP;
- if (!err)
- goto do_resident_extend;
- /* There is nothing we can do to make enough space. )-: */
- goto err_out;
-#endif
-do_non_resident_extend:
- BUG_ON(!NInoNonResident(ni));
- if (new_alloc_size == allocated_size) {
- BUG_ON(vcn);
- goto alloc_done;
- }
- /*
- * If the data starts after the end of the old allocation, this is a
- * $DATA attribute and sparse attributes are enabled on the volume and
- * for this inode, then create a sparse region between the old
- * allocated size and the start of the data. Otherwise simply proceed
- * with filling the whole space between the old allocated size and the
- * new allocated size with clusters.
- */
- if ((start >= 0 && start <= allocated_size) || ni->type != AT_DATA ||
- !NVolSparseEnabled(vol) || NInoSparseDisabled(ni))
- goto skip_sparse;
- // TODO: This is not implemented yet. We just fill in with real
- // clusters for now...
- ntfs_debug("Inserting holes is not-implemented yet. Falling back to "
- "allocating real clusters instead.");
-skip_sparse:
- rl = ni->runlist.rl;
- if (likely(rl)) {
- /* Seek to the end of the runlist. */
- while (rl->length)
- rl++;
- }
- /* If this attribute extent is not mapped, map it now. */
- if (unlikely(!rl || rl->lcn == LCN_RL_NOT_MAPPED ||
- (rl->lcn == LCN_ENOENT && rl > ni->runlist.rl &&
- (rl-1)->lcn == LCN_RL_NOT_MAPPED))) {
- if (!rl && !allocated_size)
- goto first_alloc;
- rl = ntfs_mapping_pairs_decompress(vol, a, ni->runlist.rl);
- if (IS_ERR(rl)) {
- err = PTR_ERR(rl);
- if (start < 0 || start >= allocated_size)
- ntfs_error(vol->sb, "Cannot extend allocation "
- "of inode 0x%lx, attribute "
- "type 0x%x, because the "
- "mapping of a runlist "
- "fragment failed with error "
- "code %i.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type),
- err);
- if (err != -ENOMEM)
- err = -EIO;
- goto err_out;
- }
- ni->runlist.rl = rl;
- /* Seek to the end of the runlist. */
- while (rl->length)
- rl++;
- }
- /*
- * We now know the runlist of the last extent is mapped and @rl is at
- * the end of the runlist. We want to begin allocating clusters
- * starting at the last allocated cluster to reduce fragmentation. If
- * there are no valid LCNs in the attribute we let the cluster
- * allocator choose the starting cluster.
- */
- /* If the last LCN is a hole or simillar seek back to last real LCN. */
- while (rl->lcn < 0 && rl > ni->runlist.rl)
- rl--;
-first_alloc:
- // FIXME: Need to implement partial allocations so at least part of the
- // write can be performed when start >= 0. (Needed for POSIX write(2)
- // conformance.)
- rl2 = ntfs_cluster_alloc(vol, allocated_size >> vol->cluster_size_bits,
- (new_alloc_size - allocated_size) >>
- vol->cluster_size_bits, (rl && (rl->lcn >= 0)) ?
- rl->lcn + rl->length : -1, DATA_ZONE, true);
- if (IS_ERR(rl2)) {
- err = PTR_ERR(rl2);
- if (start < 0 || start >= allocated_size)
- ntfs_error(vol->sb, "Cannot extend allocation of "
- "inode 0x%lx, attribute type 0x%x, "
- "because the allocation of clusters "
- "failed with error code %i.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), err);
- if (err != -ENOMEM && err != -ENOSPC)
- err = -EIO;
- goto err_out;
- }
- rl = ntfs_runlists_merge(ni->runlist.rl, rl2);
- if (IS_ERR(rl)) {
- err = PTR_ERR(rl);
- if (start < 0 || start >= allocated_size)
- ntfs_error(vol->sb, "Cannot extend allocation of "
- "inode 0x%lx, attribute type 0x%x, "
- "because the runlist merge failed "
- "with error code %i.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), err);
- if (err != -ENOMEM)
- err = -EIO;
- if (ntfs_cluster_free_from_rl(vol, rl2)) {
- ntfs_error(vol->sb, "Failed to release allocated "
- "cluster(s) in error code path. Run "
- "chkdsk to recover the lost "
- "cluster(s).");
- NVolSetErrors(vol);
- }
- ntfs_free(rl2);
- goto err_out;
- }
- ni->runlist.rl = rl;
- ntfs_debug("Allocated 0x%llx clusters.", (long long)(new_alloc_size -
- allocated_size) >> vol->cluster_size_bits);
- /* Find the runlist element with which the attribute extent starts. */
- ll = sle64_to_cpu(a->data.non_resident.lowest_vcn);
- rl2 = ntfs_rl_find_vcn_nolock(rl, ll);
- BUG_ON(!rl2);
- BUG_ON(!rl2->length);
- BUG_ON(rl2->lcn < LCN_HOLE);
- mp_rebuilt = false;
- /* Get the size for the new mapping pairs array for this extent. */
- mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1);
- if (unlikely(mp_size <= 0)) {
- err = mp_size;
- if (start < 0 || start >= allocated_size)
- ntfs_error(vol->sb, "Cannot extend allocation of "
- "inode 0x%lx, attribute type 0x%x, "
- "because determining the size for the "
- "mapping pairs failed with error code "
- "%i.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), err);
- err = -EIO;
- goto undo_alloc;
- }
- /* Extend the attribute record to fit the bigger mapping pairs array. */
- attr_len = le32_to_cpu(a->length);
- err = ntfs_attr_record_resize(m, a, mp_size +
- le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
- if (unlikely(err)) {
- BUG_ON(err != -ENOSPC);
- // TODO: Deal with this by moving this extent to a new mft
- // record or by starting a new extent in a new mft record,
- // possibly by extending this extent partially and filling it
- // and creating a new extent for the remainder, or by making
- // other attributes non-resident and/or by moving other
- // attributes out of this mft record.
- if (start < 0 || start >= allocated_size)
- ntfs_error(vol->sb, "Not enough space in the mft "
- "record for the extended attribute "
- "record. This case is not "
- "implemented yet.");
- err = -EOPNOTSUPP;
- goto undo_alloc;
- }
- mp_rebuilt = true;
- /* Generate the mapping pairs array directly into the attr record. */
- err = ntfs_mapping_pairs_build(vol, (u8*)a +
- le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
- mp_size, rl2, ll, -1, NULL);
- if (unlikely(err)) {
- if (start < 0 || start >= allocated_size)
- ntfs_error(vol->sb, "Cannot extend allocation of "
- "inode 0x%lx, attribute type 0x%x, "
- "because building the mapping pairs "
- "failed with error code %i.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), err);
- err = -EIO;
- goto undo_alloc;
- }
- /* Update the highest_vcn. */
- a->data.non_resident.highest_vcn = cpu_to_sle64((new_alloc_size >>
- vol->cluster_size_bits) - 1);
- /*
- * We now have extended the allocated size of the attribute. Reflect
- * this in the ntfs_inode structure and the attribute record.
- */
- if (a->data.non_resident.lowest_vcn) {
- /*
- * We are not in the first attribute extent, switch to it, but
- * first ensure the changes will make it to disk later.
- */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_reinit_search_ctx(ctx);
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err))
- goto restore_undo_alloc;
- /* @m is not used any more so no need to set it. */
- a = ctx->attr;
- }
- write_lock_irqsave(&ni->size_lock, flags);
- ni->allocated_size = new_alloc_size;
- a->data.non_resident.allocated_size = cpu_to_sle64(new_alloc_size);
- /*
- * FIXME: This would fail if @ni is a directory, $MFT, or an index,
- * since those can have sparse/compressed set. For example can be
- * set compressed even though it is not compressed itself and in that
- * case the bit means that files are to be created compressed in the
- * directory... At present this is ok as this code is only called for
- * regular files, and only for their $DATA attribute(s).
- * FIXME: The calculation is wrong if we created a hole above. For now
- * it does not matter as we never create holes.
- */
- if (NInoSparse(ni) || NInoCompressed(ni)) {
- ni->itype.compressed.size += new_alloc_size - allocated_size;
- a->data.non_resident.compressed_size =
- cpu_to_sle64(ni->itype.compressed.size);
- vi->i_blocks = ni->itype.compressed.size >> 9;
- } else
- vi->i_blocks = new_alloc_size >> 9;
- write_unlock_irqrestore(&ni->size_lock, flags);
-alloc_done:
- if (new_data_size >= 0) {
- BUG_ON(new_data_size <
- sle64_to_cpu(a->data.non_resident.data_size));
- a->data.non_resident.data_size = cpu_to_sle64(new_data_size);
- }
-flush_done:
- /* Ensure the changes make it to disk. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
-done:
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- up_write(&ni->runlist.lock);
- ntfs_debug("Done, new_allocated_size 0x%llx.",
- (unsigned long long)new_alloc_size);
- return new_alloc_size;
-restore_undo_alloc:
- if (start < 0 || start >= allocated_size)
- ntfs_error(vol->sb, "Cannot complete extension of allocation "
- "of inode 0x%lx, attribute type 0x%x, because "
- "lookup of first attribute extent failed with "
- "error code %i.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), err);
- if (err == -ENOENT)
- err = -EIO;
- ntfs_attr_reinit_search_ctx(ctx);
- if (ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
- allocated_size >> vol->cluster_size_bits, NULL, 0,
- ctx)) {
- ntfs_error(vol->sb, "Failed to find last attribute extent of "
- "attribute in error code path. Run chkdsk to "
- "recover.");
- write_lock_irqsave(&ni->size_lock, flags);
- ni->allocated_size = new_alloc_size;
- /*
- * FIXME: This would fail if @ni is a directory... See above.
- * FIXME: The calculation is wrong if we created a hole above.
- * For now it does not matter as we never create holes.
- */
- if (NInoSparse(ni) || NInoCompressed(ni)) {
- ni->itype.compressed.size += new_alloc_size -
- allocated_size;
- vi->i_blocks = ni->itype.compressed.size >> 9;
- } else
- vi->i_blocks = new_alloc_size >> 9;
- write_unlock_irqrestore(&ni->size_lock, flags);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- up_write(&ni->runlist.lock);
- /*
- * The only thing that is now wrong is the allocated size of the
- * base attribute extent which chkdsk should be able to fix.
- */
- NVolSetErrors(vol);
- return err;
- }
- ctx->attr->data.non_resident.highest_vcn = cpu_to_sle64(
- (allocated_size >> vol->cluster_size_bits) - 1);
-undo_alloc:
- ll = allocated_size >> vol->cluster_size_bits;
- if (ntfs_cluster_free(ni, ll, -1, ctx) < 0) {
- ntfs_error(vol->sb, "Failed to release allocated cluster(s) "
- "in error code path. Run chkdsk to recover "
- "the lost cluster(s).");
- NVolSetErrors(vol);
- }
- m = ctx->mrec;
- a = ctx->attr;
- /*
- * If the runlist truncation fails and/or the search context is no
- * longer valid, we cannot resize the attribute record or build the
- * mapping pairs array thus we mark the inode bad so that no access to
- * the freed clusters can happen.
- */
- if (ntfs_rl_truncate_nolock(vol, &ni->runlist, ll) || IS_ERR(m)) {
- ntfs_error(vol->sb, "Failed to %s in error code path. Run "
- "chkdsk to recover.", IS_ERR(m) ?
- "restore attribute search context" :
- "truncate attribute runlist");
- NVolSetErrors(vol);
- } else if (mp_rebuilt) {
- if (ntfs_attr_record_resize(m, a, attr_len)) {
- ntfs_error(vol->sb, "Failed to restore attribute "
- "record in error code path. Run "
- "chkdsk to recover.");
- NVolSetErrors(vol);
- } else /* if (success) */ {
- if (ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu(
- a->data.non_resident.
- mapping_pairs_offset), attr_len -
- le16_to_cpu(a->data.non_resident.
- mapping_pairs_offset), rl2, ll, -1,
- NULL)) {
- ntfs_error(vol->sb, "Failed to restore "
- "mapping pairs array in error "
- "code path. Run chkdsk to "
- "recover.");
- NVolSetErrors(vol);
- }
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- }
- }
-err_out:
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(base_ni);
- up_write(&ni->runlist.lock);
-conv_err_out:
- ntfs_debug("Failed. Returning error code %i.", err);
- return err;
-}
-
-/**
- * ntfs_attr_set - fill (a part of) an attribute with a byte
- * @ni: ntfs inode describing the attribute to fill
- * @ofs: offset inside the attribute at which to start to fill
- * @cnt: number of bytes to fill
- * @val: the unsigned 8-bit value with which to fill the attribute
- *
- * Fill @cnt bytes of the attribute described by the ntfs inode @ni starting at
- * byte offset @ofs inside the attribute with the constant byte @val.
- *
- * This function is effectively like memset() applied to an ntfs attribute.
- * Note this function actually only operates on the page cache pages belonging
- * to the ntfs attribute and it marks them dirty after doing the memset().
- * Thus it relies on the vm dirty page write code paths to cause the modified
- * pages to be written to the mft record/disk.
- *
- * Return 0 on success and -errno on error. An error code of -ESPIPE means
- * that @ofs + @cnt were outside the end of the attribute and no write was
- * performed.
- */
-int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
-{
- ntfs_volume *vol = ni->vol;
- struct address_space *mapping;
- struct page *page;
- u8 *kaddr;
- pgoff_t idx, end;
- unsigned start_ofs, end_ofs, size;
-
- ntfs_debug("Entering for ofs 0x%llx, cnt 0x%llx, val 0x%hx.",
- (long long)ofs, (long long)cnt, val);
- BUG_ON(ofs < 0);
- BUG_ON(cnt < 0);
- if (!cnt)
- goto done;
- /*
- * FIXME: Compressed and encrypted attributes are not supported when
- * writing and we should never have gotten here for them.
- */
- BUG_ON(NInoCompressed(ni));
- BUG_ON(NInoEncrypted(ni));
- mapping = VFS_I(ni)->i_mapping;
- /* Work out the starting index and page offset. */
- idx = ofs >> PAGE_SHIFT;
- start_ofs = ofs & ~PAGE_MASK;
- /* Work out the ending index and page offset. */
- end = ofs + cnt;
- end_ofs = end & ~PAGE_MASK;
- /* If the end is outside the inode size return -ESPIPE. */
- if (unlikely(end > i_size_read(VFS_I(ni)))) {
- ntfs_error(vol->sb, "Request exceeds end of attribute.");
- return -ESPIPE;
- }
- end >>= PAGE_SHIFT;
- /* If there is a first partial page, need to do it the slow way. */
- if (start_ofs) {
- page = read_mapping_page(mapping, idx, NULL);
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Failed to read first partial "
- "page (error, index 0x%lx).", idx);
- return PTR_ERR(page);
- }
- /*
- * If the last page is the same as the first page, need to
- * limit the write to the end offset.
- */
- size = PAGE_SIZE;
- if (idx == end)
- size = end_ofs;
- kaddr = kmap_atomic(page);
- memset(kaddr + start_ofs, val, size - start_ofs);
- flush_dcache_page(page);
- kunmap_atomic(kaddr);
- set_page_dirty(page);
- put_page(page);
- balance_dirty_pages_ratelimited(mapping);
- cond_resched();
- if (idx == end)
- goto done;
- idx++;
- }
- /* Do the whole pages the fast way. */
- for (; idx < end; idx++) {
- /* Find or create the current page. (The page is locked.) */
- page = grab_cache_page(mapping, idx);
- if (unlikely(!page)) {
- ntfs_error(vol->sb, "Insufficient memory to grab "
- "page (index 0x%lx).", idx);
- return -ENOMEM;
- }
- kaddr = kmap_atomic(page);
- memset(kaddr, val, PAGE_SIZE);
- flush_dcache_page(page);
- kunmap_atomic(kaddr);
- /*
- * If the page has buffers, mark them uptodate since buffer
- * state and not page state is definitive in 2.6 kernels.
- */
- if (page_has_buffers(page)) {
- struct buffer_head *bh, *head;
-
- bh = head = page_buffers(page);
- do {
- set_buffer_uptodate(bh);
- } while ((bh = bh->b_this_page) != head);
- }
- /* Now that buffers are uptodate, set the page uptodate, too. */
- SetPageUptodate(page);
- /*
- * Set the page and all its buffers dirty and mark the inode
- * dirty, too. The VM will write the page later on.
- */
- set_page_dirty(page);
- /* Finally unlock and release the page. */
- unlock_page(page);
- put_page(page);
- balance_dirty_pages_ratelimited(mapping);
- cond_resched();
- }
- /* If there is a last partial page, need to do it the slow way. */
- if (end_ofs) {
- page = read_mapping_page(mapping, idx, NULL);
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Failed to read last partial page "
- "(error, index 0x%lx).", idx);
- return PTR_ERR(page);
- }
- kaddr = kmap_atomic(page);
- memset(kaddr, val, end_ofs);
- flush_dcache_page(page);
- kunmap_atomic(kaddr);
- set_page_dirty(page);
- put_page(page);
- balance_dirty_pages_ratelimited(mapping);
- cond_resched();
- }
-done:
- ntfs_debug("Done.");
- return 0;
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/attrib.h b/fs/ntfs/attrib.h
deleted file mode 100644
index fe0890d..0000000
--- a/fs/ntfs/attrib.h
+++ /dev/null
@@ -1,102 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * attrib.h - Defines for attribute handling in NTFS Linux kernel driver.
- * Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_ATTRIB_H
-#define _LINUX_NTFS_ATTRIB_H
-
-#include "endian.h"
-#include "types.h"
-#include "layout.h"
-#include "inode.h"
-#include "runlist.h"
-#include "volume.h"
-
-/**
- * ntfs_attr_search_ctx - used in attribute search functions
- * @mrec: buffer containing mft record to search
- * @attr: attribute record in @mrec where to begin/continue search
- * @is_first: if true ntfs_attr_lookup() begins search with @attr, else after
- *
- * Structure must be initialized to zero before the first call to one of the
- * attribute search functions. Initialize @mrec to point to the mft record to
- * search, and @attr to point to the first attribute within @mrec (not necessary
- * if calling the _first() functions), and set @is_first to 'true' (not necessary
- * if calling the _first() functions).
- *
- * If @is_first is 'true', the search begins with @attr. If @is_first is 'false',
- * the search begins after @attr. This is so that, after the first call to one
- * of the search attribute functions, we can call the function again, without
- * any modification of the search context, to automagically get the next
- * matching attribute.
- */
-typedef struct {
- MFT_RECORD *mrec;
- ATTR_RECORD *attr;
- bool is_first;
- ntfs_inode *ntfs_ino;
- ATTR_LIST_ENTRY *al_entry;
- ntfs_inode *base_ntfs_ino;
- MFT_RECORD *base_mrec;
- ATTR_RECORD *base_attr;
-} ntfs_attr_search_ctx;
-
-extern int ntfs_map_runlist_nolock(ntfs_inode *ni, VCN vcn,
- ntfs_attr_search_ctx *ctx);
-extern int ntfs_map_runlist(ntfs_inode *ni, VCN vcn);
-
-extern LCN ntfs_attr_vcn_to_lcn_nolock(ntfs_inode *ni, const VCN vcn,
- const bool write_locked);
-
-extern runlist_element *ntfs_attr_find_vcn_nolock(ntfs_inode *ni,
- const VCN vcn, ntfs_attr_search_ctx *ctx);
-
-int ntfs_attr_lookup(const ATTR_TYPE type, const ntfschar *name,
- const u32 name_len, const IGNORE_CASE_BOOL ic,
- const VCN lowest_vcn, const u8 *val, const u32 val_len,
- ntfs_attr_search_ctx *ctx);
-
-extern int load_attribute_list(ntfs_volume *vol, runlist *rl, u8 *al_start,
- const s64 size, const s64 initialized_size);
-
-static inline s64 ntfs_attr_size(const ATTR_RECORD *a)
-{
- if (!a->non_resident)
- return (s64)le32_to_cpu(a->data.resident.value_length);
- return sle64_to_cpu(a->data.non_resident.data_size);
-}
-
-extern void ntfs_attr_reinit_search_ctx(ntfs_attr_search_ctx *ctx);
-extern ntfs_attr_search_ctx *ntfs_attr_get_search_ctx(ntfs_inode *ni,
- MFT_RECORD *mrec);
-extern void ntfs_attr_put_search_ctx(ntfs_attr_search_ctx *ctx);
-
-#ifdef NTFS_RW
-
-extern int ntfs_attr_size_bounds_check(const ntfs_volume *vol,
- const ATTR_TYPE type, const s64 size);
-extern int ntfs_attr_can_be_non_resident(const ntfs_volume *vol,
- const ATTR_TYPE type);
-extern int ntfs_attr_can_be_resident(const ntfs_volume *vol,
- const ATTR_TYPE type);
-
-extern int ntfs_attr_record_resize(MFT_RECORD *m, ATTR_RECORD *a, u32 new_size);
-extern int ntfs_resident_attr_value_resize(MFT_RECORD *m, ATTR_RECORD *a,
- const u32 new_size);
-
-extern int ntfs_attr_make_non_resident(ntfs_inode *ni, const u32 data_size);
-
-extern s64 ntfs_attr_extend_allocation(ntfs_inode *ni, s64 new_alloc_size,
- const s64 new_data_size, const s64 data_start);
-
-extern int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt,
- const u8 val);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_ATTRIB_H */
diff --git a/fs/ntfs/bitmap.c b/fs/ntfs/bitmap.c
deleted file mode 100644
index 0675b24..0000000
--- a/fs/ntfs/bitmap.c
+++ /dev/null
@@ -1,179 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * bitmap.c - NTFS kernel bitmap handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2004-2005 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include <linux/pagemap.h>
-
-#include "bitmap.h"
-#include "debug.h"
-#include "aops.h"
-#include "ntfs.h"
-
-/**
- * __ntfs_bitmap_set_bits_in_run - set a run of bits in a bitmap to a value
- * @vi: vfs inode describing the bitmap
- * @start_bit: first bit to set
- * @count: number of bits to set
- * @value: value to set the bits to (i.e. 0 or 1)
- * @is_rollback: if 'true' this is a rollback operation
- *
- * Set @count bits starting at bit @start_bit in the bitmap described by the
- * vfs inode @vi to @value, where @value is either 0 or 1.
- *
- * @is_rollback should always be 'false', it is for internal use to rollback
- * errors. You probably want to use ntfs_bitmap_set_bits_in_run() instead.
- *
- * Return 0 on success and -errno on error.
- */
-int __ntfs_bitmap_set_bits_in_run(struct inode *vi, const s64 start_bit,
- const s64 count, const u8 value, const bool is_rollback)
-{
- s64 cnt = count;
- pgoff_t index, end_index;
- struct address_space *mapping;
- struct page *page;
- u8 *kaddr;
- int pos, len;
- u8 bit;
-
- BUG_ON(!vi);
- ntfs_debug("Entering for i_ino 0x%lx, start_bit 0x%llx, count 0x%llx, "
- "value %u.%s", vi->i_ino, (unsigned long long)start_bit,
- (unsigned long long)cnt, (unsigned int)value,
- is_rollback ? " (rollback)" : "");
- BUG_ON(start_bit < 0);
- BUG_ON(cnt < 0);
- BUG_ON(value > 1);
- /*
- * Calculate the indices for the pages containing the first and last
- * bits, i.e. @start_bit and @start_bit + @cnt - 1, respectively.
- */
- index = start_bit >> (3 + PAGE_SHIFT);
- end_index = (start_bit + cnt - 1) >> (3 + PAGE_SHIFT);
-
- /* Get the page containing the first bit (@start_bit). */
- mapping = vi->i_mapping;
- page = ntfs_map_page(mapping, index);
- if (IS_ERR(page)) {
- if (!is_rollback)
- ntfs_error(vi->i_sb, "Failed to map first page (error "
- "%li), aborting.", PTR_ERR(page));
- return PTR_ERR(page);
- }
- kaddr = page_address(page);
-
- /* Set @pos to the position of the byte containing @start_bit. */
- pos = (start_bit >> 3) & ~PAGE_MASK;
-
- /* Calculate the position of @start_bit in the first byte. */
- bit = start_bit & 7;
-
- /* If the first byte is partial, modify the appropriate bits in it. */
- if (bit) {
- u8 *byte = kaddr + pos;
- while ((bit & 7) && cnt) {
- cnt--;
- if (value)
- *byte |= 1 << bit++;
- else
- *byte &= ~(1 << bit++);
- }
- /* If we are done, unmap the page and return success. */
- if (!cnt)
- goto done;
-
- /* Update @pos to the new position. */
- pos++;
- }
- /*
- * Depending on @value, modify all remaining whole bytes in the page up
- * to @cnt.
- */
- len = min_t(s64, cnt >> 3, PAGE_SIZE - pos);
- memset(kaddr + pos, value ? 0xff : 0, len);
- cnt -= len << 3;
-
- /* Update @len to point to the first not-done byte in the page. */
- if (cnt < 8)
- len += pos;
-
- /* If we are not in the last page, deal with all subsequent pages. */
- while (index < end_index) {
- BUG_ON(cnt <= 0);
-
- /* Update @index and get the next page. */
- flush_dcache_page(page);
- set_page_dirty(page);
- ntfs_unmap_page(page);
- page = ntfs_map_page(mapping, ++index);
- if (IS_ERR(page))
- goto rollback;
- kaddr = page_address(page);
- /*
- * Depending on @value, modify all remaining whole bytes in the
- * page up to @cnt.
- */
- len = min_t(s64, cnt >> 3, PAGE_SIZE);
- memset(kaddr, value ? 0xff : 0, len);
- cnt -= len << 3;
- }
- /*
- * The currently mapped page is the last one. If the last byte is
- * partial, modify the appropriate bits in it. Note, @len is the
- * position of the last byte inside the page.
- */
- if (cnt) {
- u8 *byte;
-
- BUG_ON(cnt > 7);
-
- bit = cnt;
- byte = kaddr + len;
- while (bit--) {
- if (value)
- *byte |= 1 << bit;
- else
- *byte &= ~(1 << bit);
- }
- }
-done:
- /* We are done. Unmap the page and return success. */
- flush_dcache_page(page);
- set_page_dirty(page);
- ntfs_unmap_page(page);
- ntfs_debug("Done.");
- return 0;
-rollback:
- /*
- * Current state:
- * - no pages are mapped
- * - @count - @cnt is the number of bits that have been modified
- */
- if (is_rollback)
- return PTR_ERR(page);
- if (count != cnt)
- pos = __ntfs_bitmap_set_bits_in_run(vi, start_bit, count - cnt,
- value ? 0 : 1, true);
- else
- pos = 0;
- if (!pos) {
- /* Rollback was successful. */
- ntfs_error(vi->i_sb, "Failed to map subsequent page (error "
- "%li), aborting.", PTR_ERR(page));
- } else {
- /* Rollback failed. */
- ntfs_error(vi->i_sb, "Failed to map subsequent page (error "
- "%li) and rollback failed (error %i). "
- "Aborting and leaving inconsistent metadata. "
- "Unmount and run chkdsk.", PTR_ERR(page), pos);
- NVolSetErrors(NTFS_SB(vi->i_sb));
- }
- return PTR_ERR(page);
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/bitmap.h b/fs/ntfs/bitmap.h
deleted file mode 100644
index 9dd2224..0000000
--- a/fs/ntfs/bitmap.h
+++ /dev/null
@@ -1,104 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * bitmap.h - Defines for NTFS kernel bitmap handling. Part of the Linux-NTFS
- * project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_BITMAP_H
-#define _LINUX_NTFS_BITMAP_H
-
-#ifdef NTFS_RW
-
-#include <linux/fs.h>
-
-#include "types.h"
-
-extern int __ntfs_bitmap_set_bits_in_run(struct inode *vi, const s64 start_bit,
- const s64 count, const u8 value, const bool is_rollback);
-
-/**
- * ntfs_bitmap_set_bits_in_run - set a run of bits in a bitmap to a value
- * @vi: vfs inode describing the bitmap
- * @start_bit: first bit to set
- * @count: number of bits to set
- * @value: value to set the bits to (i.e. 0 or 1)
- *
- * Set @count bits starting at bit @start_bit in the bitmap described by the
- * vfs inode @vi to @value, where @value is either 0 or 1.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_set_bits_in_run(struct inode *vi,
- const s64 start_bit, const s64 count, const u8 value)
-{
- return __ntfs_bitmap_set_bits_in_run(vi, start_bit, count, value,
- false);
-}
-
-/**
- * ntfs_bitmap_set_run - set a run of bits in a bitmap
- * @vi: vfs inode describing the bitmap
- * @start_bit: first bit to set
- * @count: number of bits to set
- *
- * Set @count bits starting at bit @start_bit in the bitmap described by the
- * vfs inode @vi.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_set_run(struct inode *vi, const s64 start_bit,
- const s64 count)
-{
- return ntfs_bitmap_set_bits_in_run(vi, start_bit, count, 1);
-}
-
-/**
- * ntfs_bitmap_clear_run - clear a run of bits in a bitmap
- * @vi: vfs inode describing the bitmap
- * @start_bit: first bit to clear
- * @count: number of bits to clear
- *
- * Clear @count bits starting at bit @start_bit in the bitmap described by the
- * vfs inode @vi.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_clear_run(struct inode *vi, const s64 start_bit,
- const s64 count)
-{
- return ntfs_bitmap_set_bits_in_run(vi, start_bit, count, 0);
-}
-
-/**
- * ntfs_bitmap_set_bit - set a bit in a bitmap
- * @vi: vfs inode describing the bitmap
- * @bit: bit to set
- *
- * Set bit @bit in the bitmap described by the vfs inode @vi.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_set_bit(struct inode *vi, const s64 bit)
-{
- return ntfs_bitmap_set_run(vi, bit, 1);
-}
-
-/**
- * ntfs_bitmap_clear_bit - clear a bit in a bitmap
- * @vi: vfs inode describing the bitmap
- * @bit: bit to clear
- *
- * Clear bit @bit in the bitmap described by the vfs inode @vi.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_clear_bit(struct inode *vi, const s64 bit)
-{
- return ntfs_bitmap_clear_run(vi, bit, 1);
-}
-
-#endif /* NTFS_RW */
-
-#endif /* defined _LINUX_NTFS_BITMAP_H */
diff --git a/fs/ntfs/collate.c b/fs/ntfs/collate.c
deleted file mode 100644
index 3ab6ec9..0000000
--- a/fs/ntfs/collate.c
+++ /dev/null
@@ -1,110 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * collate.c - NTFS kernel collation handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#include "collate.h"
-#include "debug.h"
-#include "ntfs.h"
-
-static int ntfs_collate_binary(ntfs_volume *vol,
- const void *data1, const int data1_len,
- const void *data2, const int data2_len)
-{
- int rc;
-
- ntfs_debug("Entering.");
- rc = memcmp(data1, data2, min(data1_len, data2_len));
- if (!rc && (data1_len != data2_len)) {
- if (data1_len < data2_len)
- rc = -1;
- else
- rc = 1;
- }
- ntfs_debug("Done, returning %i", rc);
- return rc;
-}
-
-static int ntfs_collate_ntofs_ulong(ntfs_volume *vol,
- const void *data1, const int data1_len,
- const void *data2, const int data2_len)
-{
- int rc;
- u32 d1, d2;
-
- ntfs_debug("Entering.");
- // FIXME: We don't really want to bug here.
- BUG_ON(data1_len != data2_len);
- BUG_ON(data1_len != 4);
- d1 = le32_to_cpup(data1);
- d2 = le32_to_cpup(data2);
- if (d1 < d2)
- rc = -1;
- else {
- if (d1 == d2)
- rc = 0;
- else
- rc = 1;
- }
- ntfs_debug("Done, returning %i", rc);
- return rc;
-}
-
-typedef int (*ntfs_collate_func_t)(ntfs_volume *, const void *, const int,
- const void *, const int);
-
-static ntfs_collate_func_t ntfs_do_collate0x0[3] = {
- ntfs_collate_binary,
- NULL/*ntfs_collate_file_name*/,
- NULL/*ntfs_collate_unicode_string*/,
-};
-
-static ntfs_collate_func_t ntfs_do_collate0x1[4] = {
- ntfs_collate_ntofs_ulong,
- NULL/*ntfs_collate_ntofs_sid*/,
- NULL/*ntfs_collate_ntofs_security_hash*/,
- NULL/*ntfs_collate_ntofs_ulongs*/,
-};
-
-/**
- * ntfs_collate - collate two data items using a specified collation rule
- * @vol: ntfs volume to which the data items belong
- * @cr: collation rule to use when comparing the items
- * @data1: first data item to collate
- * @data1_len: length in bytes of @data1
- * @data2: second data item to collate
- * @data2_len: length in bytes of @data2
- *
- * Collate the two data items @data1 and @data2 using the collation rule @cr
- * and return -1, 0, ir 1 if @data1 is found, respectively, to collate before,
- * to match, or to collate after @data2.
- *
- * For speed we use the collation rule @cr as an index into two tables of
- * function pointers to call the appropriate collation function.
- */
-int ntfs_collate(ntfs_volume *vol, COLLATION_RULE cr,
- const void *data1, const int data1_len,
- const void *data2, const int data2_len) {
- int i;
-
- ntfs_debug("Entering.");
- /*
- * FIXME: At the moment we only support COLLATION_BINARY and
- * COLLATION_NTOFS_ULONG, so we BUG() for everything else for now.
- */
- BUG_ON(cr != COLLATION_BINARY && cr != COLLATION_NTOFS_ULONG);
- i = le32_to_cpu(cr);
- BUG_ON(i < 0);
- if (i <= 0x02)
- return ntfs_do_collate0x0[i](vol, data1, data1_len,
- data2, data2_len);
- BUG_ON(i < 0x10);
- i -= 0x10;
- if (likely(i <= 3))
- return ntfs_do_collate0x1[i](vol, data1, data1_len,
- data2, data2_len);
- BUG();
- return 0;
-}
diff --git a/fs/ntfs/collate.h b/fs/ntfs/collate.h
deleted file mode 100644
index f225561..0000000
--- a/fs/ntfs/collate.h
+++ /dev/null
@@ -1,36 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * collate.h - Defines for NTFS kernel collation handling. Part of the
- * Linux-NTFS project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_COLLATE_H
-#define _LINUX_NTFS_COLLATE_H
-
-#include "types.h"
-#include "volume.h"
-
-static inline bool ntfs_is_collation_rule_supported(COLLATION_RULE cr) {
- int i;
-
- /*
- * FIXME: At the moment we only support COLLATION_BINARY and
- * COLLATION_NTOFS_ULONG, so we return false for everything else for
- * now.
- */
- if (unlikely(cr != COLLATION_BINARY && cr != COLLATION_NTOFS_ULONG))
- return false;
- i = le32_to_cpu(cr);
- if (likely(((i >= 0) && (i <= 0x02)) ||
- ((i >= 0x10) && (i <= 0x13))))
- return true;
- return false;
-}
-
-extern int ntfs_collate(ntfs_volume *vol, COLLATION_RULE cr,
- const void *data1, const int data1_len,
- const void *data2, const int data2_len);
-
-#endif /* _LINUX_NTFS_COLLATE_H */
diff --git a/fs/ntfs/compress.c b/fs/ntfs/compress.c
deleted file mode 100644
index 761aaa0..0000000
--- a/fs/ntfs/compress.c
+++ /dev/null
@@ -1,950 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * compress.c - NTFS kernel compressed attributes handling.
- * Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/fs.h>
-#include <linux/buffer_head.h>
-#include <linux/blkdev.h>
-#include <linux/vmalloc.h>
-#include <linux/slab.h>
-
-#include "attrib.h"
-#include "inode.h"
-#include "debug.h"
-#include "ntfs.h"
-
-/**
- * ntfs_compression_constants - enum of constants used in the compression code
- */
-typedef enum {
- /* Token types and access mask. */
- NTFS_SYMBOL_TOKEN = 0,
- NTFS_PHRASE_TOKEN = 1,
- NTFS_TOKEN_MASK = 1,
-
- /* Compression sub-block constants. */
- NTFS_SB_SIZE_MASK = 0x0fff,
- NTFS_SB_SIZE = 0x1000,
- NTFS_SB_IS_COMPRESSED = 0x8000,
-
- /*
- * The maximum compression block size is by definition 16 * the cluster
- * size, with the maximum supported cluster size being 4kiB. Thus the
- * maximum compression buffer size is 64kiB, so we use this when
- * initializing the compression buffer.
- */
- NTFS_MAX_CB_SIZE = 64 * 1024,
-} ntfs_compression_constants;
-
-/*
- * ntfs_compression_buffer - one buffer for the decompression engine
- */
-static u8 *ntfs_compression_buffer;
-
-/*
- * ntfs_cb_lock - spinlock which protects ntfs_compression_buffer
- */
-static DEFINE_SPINLOCK(ntfs_cb_lock);
-
-/**
- * allocate_compression_buffers - allocate the decompression buffers
- *
- * Caller has to hold the ntfs_lock mutex.
- *
- * Return 0 on success or -ENOMEM if the allocations failed.
- */
-int allocate_compression_buffers(void)
-{
- BUG_ON(ntfs_compression_buffer);
-
- ntfs_compression_buffer = vmalloc(NTFS_MAX_CB_SIZE);
- if (!ntfs_compression_buffer)
- return -ENOMEM;
- return 0;
-}
-
-/**
- * free_compression_buffers - free the decompression buffers
- *
- * Caller has to hold the ntfs_lock mutex.
- */
-void free_compression_buffers(void)
-{
- BUG_ON(!ntfs_compression_buffer);
- vfree(ntfs_compression_buffer);
- ntfs_compression_buffer = NULL;
-}
-
-/**
- * zero_partial_compressed_page - zero out of bounds compressed page region
- */
-static void zero_partial_compressed_page(struct page *page,
- const s64 initialized_size)
-{
- u8 *kp = page_address(page);
- unsigned int kp_ofs;
-
- ntfs_debug("Zeroing page region outside initialized size.");
- if (((s64)page->index << PAGE_SHIFT) >= initialized_size) {
- clear_page(kp);
- return;
- }
- kp_ofs = initialized_size & ~PAGE_MASK;
- memset(kp + kp_ofs, 0, PAGE_SIZE - kp_ofs);
- return;
-}
-
-/**
- * handle_bounds_compressed_page - test for&handle out of bounds compressed page
- */
-static inline void handle_bounds_compressed_page(struct page *page,
- const loff_t i_size, const s64 initialized_size)
-{
- if ((page->index >= (initialized_size >> PAGE_SHIFT)) &&
- (initialized_size < i_size))
- zero_partial_compressed_page(page, initialized_size);
- return;
-}
-
-/**
- * ntfs_decompress - decompress a compression block into an array of pages
- * @dest_pages: destination array of pages
- * @completed_pages: scratch space to track completed pages
- * @dest_index: current index into @dest_pages (IN/OUT)
- * @dest_ofs: current offset within @dest_pages[@dest_index] (IN/OUT)
- * @dest_max_index: maximum index into @dest_pages (IN)
- * @dest_max_ofs: maximum offset within @dest_pages[@dest_max_index] (IN)
- * @xpage: the target page (-1 if none) (IN)
- * @xpage_done: set to 1 if xpage was completed successfully (IN/OUT)
- * @cb_start: compression block to decompress (IN)
- * @cb_size: size of compression block @cb_start in bytes (IN)
- * @i_size: file size when we started the read (IN)
- * @initialized_size: initialized file size when we started the read (IN)
- *
- * The caller must have disabled preemption. ntfs_decompress() reenables it when
- * the critical section is finished.
- *
- * This decompresses the compression block @cb_start into the array of
- * destination pages @dest_pages starting at index @dest_index into @dest_pages
- * and at offset @dest_pos into the page @dest_pages[@dest_index].
- *
- * When the page @dest_pages[@xpage] is completed, @xpage_done is set to 1.
- * If xpage is -1 or @xpage has not been completed, @xpage_done is not modified.
- *
- * @cb_start is a pointer to the compression block which needs decompressing
- * and @cb_size is the size of @cb_start in bytes (8-64kiB).
- *
- * Return 0 if success or -EOVERFLOW on error in the compressed stream.
- * @xpage_done indicates whether the target page (@dest_pages[@xpage]) was
- * completed during the decompression of the compression block (@cb_start).
- *
- * Warning: This function *REQUIRES* PAGE_SIZE >= 4096 or it will blow up
- * unpredicatbly! You have been warned!
- *
- * Note to hackers: This function may not sleep until it has finished accessing
- * the compression block @cb_start as it is a per-CPU buffer.
- */
-static int ntfs_decompress(struct page *dest_pages[], int completed_pages[],
- int *dest_index, int *dest_ofs, const int dest_max_index,
- const int dest_max_ofs, const int xpage, char *xpage_done,
- u8 *const cb_start, const u32 cb_size, const loff_t i_size,
- const s64 initialized_size)
-{
- /*
- * Pointers into the compressed data, i.e. the compression block (cb),
- * and the therein contained sub-blocks (sb).
- */
- u8 *cb_end = cb_start + cb_size; /* End of cb. */
- u8 *cb = cb_start; /* Current position in cb. */
- u8 *cb_sb_start; /* Beginning of the current sb in the cb. */
- u8 *cb_sb_end; /* End of current sb / beginning of next sb. */
-
- /* Variables for uncompressed data / destination. */
- struct page *dp; /* Current destination page being worked on. */
- u8 *dp_addr; /* Current pointer into dp. */
- u8 *dp_sb_start; /* Start of current sub-block in dp. */
- u8 *dp_sb_end; /* End of current sb in dp (dp_sb_start +
- NTFS_SB_SIZE). */
- u16 do_sb_start; /* @dest_ofs when starting this sub-block. */
- u16 do_sb_end; /* @dest_ofs of end of this sb (do_sb_start +
- NTFS_SB_SIZE). */
-
- /* Variables for tag and token parsing. */
- u8 tag; /* Current tag. */
- int token; /* Loop counter for the eight tokens in tag. */
- int nr_completed_pages = 0;
-
- /* Default error code. */
- int err = -EOVERFLOW;
-
- ntfs_debug("Entering, cb_size = 0x%x.", cb_size);
-do_next_sb:
- ntfs_debug("Beginning sub-block at offset = 0x%zx in the cb.",
- cb - cb_start);
- /*
- * Have we reached the end of the compression block or the end of the
- * decompressed data? The latter can happen for example if the current
- * position in the compression block is one byte before its end so the
- * first two checks do not detect it.
- */
- if (cb == cb_end || !le16_to_cpup((le16*)cb) ||
- (*dest_index == dest_max_index &&
- *dest_ofs == dest_max_ofs)) {
- int i;
-
- ntfs_debug("Completed. Returning success (0).");
- err = 0;
-return_error:
- /* We can sleep from now on, so we drop lock. */
- spin_unlock(&ntfs_cb_lock);
- /* Second stage: finalize completed pages. */
- if (nr_completed_pages > 0) {
- for (i = 0; i < nr_completed_pages; i++) {
- int di = completed_pages[i];
-
- dp = dest_pages[di];
- /*
- * If we are outside the initialized size, zero
- * the out of bounds page range.
- */
- handle_bounds_compressed_page(dp, i_size,
- initialized_size);
- flush_dcache_page(dp);
- kunmap(dp);
- SetPageUptodate(dp);
- unlock_page(dp);
- if (di == xpage)
- *xpage_done = 1;
- else
- put_page(dp);
- dest_pages[di] = NULL;
- }
- }
- return err;
- }
-
- /* Setup offsets for the current sub-block destination. */
- do_sb_start = *dest_ofs;
- do_sb_end = do_sb_start + NTFS_SB_SIZE;
-
- /* Check that we are still within allowed boundaries. */
- if (*dest_index == dest_max_index && do_sb_end > dest_max_ofs)
- goto return_overflow;
-
- /* Does the minimum size of a compressed sb overflow valid range? */
- if (cb + 6 > cb_end)
- goto return_overflow;
-
- /* Setup the current sub-block source pointers and validate range. */
- cb_sb_start = cb;
- cb_sb_end = cb_sb_start + (le16_to_cpup((le16*)cb) & NTFS_SB_SIZE_MASK)
- + 3;
- if (cb_sb_end > cb_end)
- goto return_overflow;
-
- /* Get the current destination page. */
- dp = dest_pages[*dest_index];
- if (!dp) {
- /* No page present. Skip decompression of this sub-block. */
- cb = cb_sb_end;
-
- /* Advance destination position to next sub-block. */
- *dest_ofs = (*dest_ofs + NTFS_SB_SIZE) & ~PAGE_MASK;
- if (!*dest_ofs && (++*dest_index > dest_max_index))
- goto return_overflow;
- goto do_next_sb;
- }
-
- /* We have a valid destination page. Setup the destination pointers. */
- dp_addr = (u8*)page_address(dp) + do_sb_start;
-
- /* Now, we are ready to process the current sub-block (sb). */
- if (!(le16_to_cpup((le16*)cb) & NTFS_SB_IS_COMPRESSED)) {
- ntfs_debug("Found uncompressed sub-block.");
- /* This sb is not compressed, just copy it into destination. */
-
- /* Advance source position to first data byte. */
- cb += 2;
-
- /* An uncompressed sb must be full size. */
- if (cb_sb_end - cb != NTFS_SB_SIZE)
- goto return_overflow;
-
- /* Copy the block and advance the source position. */
- memcpy(dp_addr, cb, NTFS_SB_SIZE);
- cb += NTFS_SB_SIZE;
-
- /* Advance destination position to next sub-block. */
- *dest_ofs += NTFS_SB_SIZE;
- if (!(*dest_ofs &= ~PAGE_MASK)) {
-finalize_page:
- /*
- * First stage: add current page index to array of
- * completed pages.
- */
- completed_pages[nr_completed_pages++] = *dest_index;
- if (++*dest_index > dest_max_index)
- goto return_overflow;
- }
- goto do_next_sb;
- }
- ntfs_debug("Found compressed sub-block.");
- /* This sb is compressed, decompress it into destination. */
-
- /* Setup destination pointers. */
- dp_sb_start = dp_addr;
- dp_sb_end = dp_sb_start + NTFS_SB_SIZE;
-
- /* Forward to the first tag in the sub-block. */
- cb += 2;
-do_next_tag:
- if (cb == cb_sb_end) {
- /* Check if the decompressed sub-block was not full-length. */
- if (dp_addr < dp_sb_end) {
- int nr_bytes = do_sb_end - *dest_ofs;
-
- ntfs_debug("Filling incomplete sub-block with "
- "zeroes.");
- /* Zero remainder and update destination position. */
- memset(dp_addr, 0, nr_bytes);
- *dest_ofs += nr_bytes;
- }
- /* We have finished the current sub-block. */
- if (!(*dest_ofs &= ~PAGE_MASK))
- goto finalize_page;
- goto do_next_sb;
- }
-
- /* Check we are still in range. */
- if (cb > cb_sb_end || dp_addr > dp_sb_end)
- goto return_overflow;
-
- /* Get the next tag and advance to first token. */
- tag = *cb++;
-
- /* Parse the eight tokens described by the tag. */
- for (token = 0; token < 8; token++, tag >>= 1) {
- u16 lg, pt, length, max_non_overlap;
- register u16 i;
- u8 *dp_back_addr;
-
- /* Check if we are done / still in range. */
- if (cb >= cb_sb_end || dp_addr > dp_sb_end)
- break;
-
- /* Determine token type and parse appropriately.*/
- if ((tag & NTFS_TOKEN_MASK) == NTFS_SYMBOL_TOKEN) {
- /*
- * We have a symbol token, copy the symbol across, and
- * advance the source and destination positions.
- */
- *dp_addr++ = *cb++;
- ++*dest_ofs;
-
- /* Continue with the next token. */
- continue;
- }
-
- /*
- * We have a phrase token. Make sure it is not the first tag in
- * the sb as this is illegal and would confuse the code below.
- */
- if (dp_addr == dp_sb_start)
- goto return_overflow;
-
- /*
- * Determine the number of bytes to go back (p) and the number
- * of bytes to copy (l). We use an optimized algorithm in which
- * we first calculate log2(current destination position in sb),
- * which allows determination of l and p in O(1) rather than
- * O(n). We just need an arch-optimized log2() function now.
- */
- lg = 0;
- for (i = *dest_ofs - do_sb_start - 1; i >= 0x10; i >>= 1)
- lg++;
-
- /* Get the phrase token into i. */
- pt = le16_to_cpup((le16*)cb);
-
- /*
- * Calculate starting position of the byte sequence in
- * the destination using the fact that p = (pt >> (12 - lg)) + 1
- * and make sure we don't go too far back.
- */
- dp_back_addr = dp_addr - (pt >> (12 - lg)) - 1;
- if (dp_back_addr < dp_sb_start)
- goto return_overflow;
-
- /* Now calculate the length of the byte sequence. */
- length = (pt & (0xfff >> lg)) + 3;
-
- /* Advance destination position and verify it is in range. */
- *dest_ofs += length;
- if (*dest_ofs > do_sb_end)
- goto return_overflow;
-
- /* The number of non-overlapping bytes. */
- max_non_overlap = dp_addr - dp_back_addr;
-
- if (length <= max_non_overlap) {
- /* The byte sequence doesn't overlap, just copy it. */
- memcpy(dp_addr, dp_back_addr, length);
-
- /* Advance destination pointer. */
- dp_addr += length;
- } else {
- /*
- * The byte sequence does overlap, copy non-overlapping
- * part and then do a slow byte by byte copy for the
- * overlapping part. Also, advance the destination
- * pointer.
- */
- memcpy(dp_addr, dp_back_addr, max_non_overlap);
- dp_addr += max_non_overlap;
- dp_back_addr += max_non_overlap;
- length -= max_non_overlap;
- while (length--)
- *dp_addr++ = *dp_back_addr++;
- }
-
- /* Advance source position and continue with the next token. */
- cb += 2;
- }
-
- /* No tokens left in the current tag. Continue with the next tag. */
- goto do_next_tag;
-
-return_overflow:
- ntfs_error(NULL, "Failed. Returning -EOVERFLOW.");
- goto return_error;
-}
-
-/**
- * ntfs_read_compressed_block - read a compressed block into the page cache
- * @page: locked page in the compression block(s) we need to read
- *
- * When we are called the page has already been verified to be locked and the
- * attribute is known to be non-resident, not encrypted, but compressed.
- *
- * 1. Determine which compression block(s) @page is in.
- * 2. Get hold of all pages corresponding to this/these compression block(s).
- * 3. Read the (first) compression block.
- * 4. Decompress it into the corresponding pages.
- * 5. Throw the compressed data away and proceed to 3. for the next compression
- * block or return success if no more compression blocks left.
- *
- * Warning: We have to be careful what we do about existing pages. They might
- * have been written to so that we would lose data if we were to just overwrite
- * them with the out-of-date uncompressed data.
- *
- * FIXME: For PAGE_SIZE > cb_size we are not doing the Right Thing(TM) at
- * the end of the file I think. We need to detect this case and zero the out
- * of bounds remainder of the page in question and mark it as handled. At the
- * moment we would just return -EIO on such a page. This bug will only become
- * apparent if pages are above 8kiB and the NTFS volume only uses 512 byte
- * clusters so is probably not going to be seen by anyone. Still this should
- * be fixed. (AIA)
- *
- * FIXME: Again for PAGE_SIZE > cb_size we are screwing up both in
- * handling sparse and compressed cbs. (AIA)
- *
- * FIXME: At the moment we don't do any zeroing out in the case that
- * initialized_size is less than data_size. This should be safe because of the
- * nature of the compression algorithm used. Just in case we check and output
- * an error message in read inode if the two sizes are not equal for a
- * compressed file. (AIA)
- */
-int ntfs_read_compressed_block(struct page *page)
-{
- loff_t i_size;
- s64 initialized_size;
- struct address_space *mapping = page->mapping;
- ntfs_inode *ni = NTFS_I(mapping->host);
- ntfs_volume *vol = ni->vol;
- struct super_block *sb = vol->sb;
- runlist_element *rl;
- unsigned long flags, block_size = sb->s_blocksize;
- unsigned char block_size_bits = sb->s_blocksize_bits;
- u8 *cb, *cb_pos, *cb_end;
- struct buffer_head **bhs;
- unsigned long offset, index = page->index;
- u32 cb_size = ni->itype.compressed.block_size;
- u64 cb_size_mask = cb_size - 1UL;
- VCN vcn;
- LCN lcn;
- /* The first wanted vcn (minimum alignment is PAGE_SIZE). */
- VCN start_vcn = (((s64)index << PAGE_SHIFT) & ~cb_size_mask) >>
- vol->cluster_size_bits;
- /*
- * The first vcn after the last wanted vcn (minimum alignment is again
- * PAGE_SIZE.
- */
- VCN end_vcn = ((((s64)(index + 1UL) << PAGE_SHIFT) + cb_size - 1)
- & ~cb_size_mask) >> vol->cluster_size_bits;
- /* Number of compression blocks (cbs) in the wanted vcn range. */
- unsigned int nr_cbs = (end_vcn - start_vcn) << vol->cluster_size_bits
- >> ni->itype.compressed.block_size_bits;
- /*
- * Number of pages required to store the uncompressed data from all
- * compression blocks (cbs) overlapping @page. Due to alignment
- * guarantees of start_vcn and end_vcn, no need to round up here.
- */
- unsigned int nr_pages = (end_vcn - start_vcn) <<
- vol->cluster_size_bits >> PAGE_SHIFT;
- unsigned int xpage, max_page, cur_page, cur_ofs, i;
- unsigned int cb_clusters, cb_max_ofs;
- int block, max_block, cb_max_page, bhs_size, nr_bhs, err = 0;
- struct page **pages;
- int *completed_pages;
- unsigned char xpage_done = 0;
-
- ntfs_debug("Entering, page->index = 0x%lx, cb_size = 0x%x, nr_pages = "
- "%i.", index, cb_size, nr_pages);
- /*
- * Bad things happen if we get here for anything that is not an
- * unnamed $DATA attribute.
- */
- BUG_ON(ni->type != AT_DATA);
- BUG_ON(ni->name_len);
-
- pages = kmalloc_array(nr_pages, sizeof(struct page *), GFP_NOFS);
- completed_pages = kmalloc_array(nr_pages + 1, sizeof(int), GFP_NOFS);
-
- /* Allocate memory to store the buffer heads we need. */
- bhs_size = cb_size / block_size * sizeof(struct buffer_head *);
- bhs = kmalloc(bhs_size, GFP_NOFS);
-
- if (unlikely(!pages || !bhs || !completed_pages)) {
- kfree(bhs);
- kfree(pages);
- kfree(completed_pages);
- unlock_page(page);
- ntfs_error(vol->sb, "Failed to allocate internal buffers.");
- return -ENOMEM;
- }
-
- /*
- * We have already been given one page, this is the one we must do.
- * Once again, the alignment guarantees keep it simple.
- */
- offset = start_vcn << vol->cluster_size_bits >> PAGE_SHIFT;
- xpage = index - offset;
- pages[xpage] = page;
- /*
- * The remaining pages need to be allocated and inserted into the page
- * cache, alignment guarantees keep all the below much simpler. (-8
- */
- read_lock_irqsave(&ni->size_lock, flags);
- i_size = i_size_read(VFS_I(ni));
- initialized_size = ni->initialized_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- max_page = ((i_size + PAGE_SIZE - 1) >> PAGE_SHIFT) -
- offset;
- /* Is the page fully outside i_size? (truncate in progress) */
- if (xpage >= max_page) {
- kfree(bhs);
- kfree(pages);
- kfree(completed_pages);
- zero_user(page, 0, PAGE_SIZE);
- ntfs_debug("Compressed read outside i_size - truncated?");
- SetPageUptodate(page);
- unlock_page(page);
- return 0;
- }
- if (nr_pages < max_page)
- max_page = nr_pages;
- for (i = 0; i < max_page; i++, offset++) {
- if (i != xpage)
- pages[i] = grab_cache_page_nowait(mapping, offset);
- page = pages[i];
- if (page) {
- /*
- * We only (re)read the page if it isn't already read
- * in and/or dirty or we would be losing data or at
- * least wasting our time.
- */
- if (!PageDirty(page) && (!PageUptodate(page) ||
- PageError(page))) {
- ClearPageError(page);
- kmap(page);
- continue;
- }
- unlock_page(page);
- put_page(page);
- pages[i] = NULL;
- }
- }
-
- /*
- * We have the runlist, and all the destination pages we need to fill.
- * Now read the first compression block.
- */
- cur_page = 0;
- cur_ofs = 0;
- cb_clusters = ni->itype.compressed.block_clusters;
-do_next_cb:
- nr_cbs--;
- nr_bhs = 0;
-
- /* Read all cb buffer heads one cluster at a time. */
- rl = NULL;
- for (vcn = start_vcn, start_vcn += cb_clusters; vcn < start_vcn;
- vcn++) {
- bool is_retry = false;
-
- if (!rl) {
-lock_retry_remap:
- down_read(&ni->runlist.lock);
- rl = ni->runlist.rl;
- }
- if (likely(rl != NULL)) {
- /* Seek to element containing target vcn. */
- while (rl->length && rl[1].vcn <= vcn)
- rl++;
- lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
- } else
- lcn = LCN_RL_NOT_MAPPED;
- ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
- (unsigned long long)vcn,
- (unsigned long long)lcn);
- if (lcn < 0) {
- /*
- * When we reach the first sparse cluster we have
- * finished with the cb.
- */
- if (lcn == LCN_HOLE)
- break;
- if (is_retry || lcn != LCN_RL_NOT_MAPPED)
- goto rl_err;
- is_retry = true;
- /*
- * Attempt to map runlist, dropping lock for the
- * duration.
- */
- up_read(&ni->runlist.lock);
- if (!ntfs_map_runlist(ni, vcn))
- goto lock_retry_remap;
- goto map_rl_err;
- }
- block = lcn << vol->cluster_size_bits >> block_size_bits;
- /* Read the lcn from device in chunks of block_size bytes. */
- max_block = block + (vol->cluster_size >> block_size_bits);
- do {
- ntfs_debug("block = 0x%x.", block);
- if (unlikely(!(bhs[nr_bhs] = sb_getblk(sb, block))))
- goto getblk_err;
- nr_bhs++;
- } while (++block < max_block);
- }
-
- /* Release the lock if we took it. */
- if (rl)
- up_read(&ni->runlist.lock);
-
- /* Setup and initiate io on all buffer heads. */
- for (i = 0; i < nr_bhs; i++) {
- struct buffer_head *tbh = bhs[i];
-
- if (!trylock_buffer(tbh))
- continue;
- if (unlikely(buffer_uptodate(tbh))) {
- unlock_buffer(tbh);
- continue;
- }
- get_bh(tbh);
- tbh->b_end_io = end_buffer_read_sync;
- submit_bh(REQ_OP_READ, tbh);
- }
-
- /* Wait for io completion on all buffer heads. */
- for (i = 0; i < nr_bhs; i++) {
- struct buffer_head *tbh = bhs[i];
-
- if (buffer_uptodate(tbh))
- continue;
- wait_on_buffer(tbh);
- /*
- * We need an optimization barrier here, otherwise we start
- * hitting the below fixup code when accessing a loopback
- * mounted ntfs partition. This indicates either there is a
- * race condition in the loop driver or, more likely, gcc
- * overoptimises the code without the barrier and it doesn't
- * do the Right Thing(TM).
- */
- barrier();
- if (unlikely(!buffer_uptodate(tbh))) {
- ntfs_warning(vol->sb, "Buffer is unlocked but not "
- "uptodate! Unplugging the disk queue "
- "and rescheduling.");
- get_bh(tbh);
- io_schedule();
- put_bh(tbh);
- if (unlikely(!buffer_uptodate(tbh)))
- goto read_err;
- ntfs_warning(vol->sb, "Buffer is now uptodate. Good.");
- }
- }
-
- /*
- * Get the compression buffer. We must not sleep any more
- * until we are finished with it.
- */
- spin_lock(&ntfs_cb_lock);
- cb = ntfs_compression_buffer;
-
- BUG_ON(!cb);
-
- cb_pos = cb;
- cb_end = cb + cb_size;
-
- /* Copy the buffer heads into the contiguous buffer. */
- for (i = 0; i < nr_bhs; i++) {
- memcpy(cb_pos, bhs[i]->b_data, block_size);
- cb_pos += block_size;
- }
-
- /* Just a precaution. */
- if (cb_pos + 2 <= cb + cb_size)
- *(u16*)cb_pos = 0;
-
- /* Reset cb_pos back to the beginning. */
- cb_pos = cb;
-
- /* We now have both source (if present) and destination. */
- ntfs_debug("Successfully read the compression block.");
-
- /* The last page and maximum offset within it for the current cb. */
- cb_max_page = (cur_page << PAGE_SHIFT) + cur_ofs + cb_size;
- cb_max_ofs = cb_max_page & ~PAGE_MASK;
- cb_max_page >>= PAGE_SHIFT;
-
- /* Catch end of file inside a compression block. */
- if (cb_max_page > max_page)
- cb_max_page = max_page;
-
- if (vcn == start_vcn - cb_clusters) {
- /* Sparse cb, zero out page range overlapping the cb. */
- ntfs_debug("Found sparse compression block.");
- /* We can sleep from now on, so we drop lock. */
- spin_unlock(&ntfs_cb_lock);
- if (cb_max_ofs)
- cb_max_page--;
- for (; cur_page < cb_max_page; cur_page++) {
- page = pages[cur_page];
- if (page) {
- if (likely(!cur_ofs))
- clear_page(page_address(page));
- else
- memset(page_address(page) + cur_ofs, 0,
- PAGE_SIZE -
- cur_ofs);
- flush_dcache_page(page);
- kunmap(page);
- SetPageUptodate(page);
- unlock_page(page);
- if (cur_page == xpage)
- xpage_done = 1;
- else
- put_page(page);
- pages[cur_page] = NULL;
- }
- cb_pos += PAGE_SIZE - cur_ofs;
- cur_ofs = 0;
- if (cb_pos >= cb_end)
- break;
- }
- /* If we have a partial final page, deal with it now. */
- if (cb_max_ofs && cb_pos < cb_end) {
- page = pages[cur_page];
- if (page)
- memset(page_address(page) + cur_ofs, 0,
- cb_max_ofs - cur_ofs);
- /*
- * No need to update cb_pos at this stage:
- * cb_pos += cb_max_ofs - cur_ofs;
- */
- cur_ofs = cb_max_ofs;
- }
- } else if (vcn == start_vcn) {
- /* We can't sleep so we need two stages. */
- unsigned int cur2_page = cur_page;
- unsigned int cur_ofs2 = cur_ofs;
- u8 *cb_pos2 = cb_pos;
-
- ntfs_debug("Found uncompressed compression block.");
- /* Uncompressed cb, copy it to the destination pages. */
- /*
- * TODO: As a big optimization, we could detect this case
- * before we read all the pages and use block_read_full_folio()
- * on all full pages instead (we still have to treat partial
- * pages especially but at least we are getting rid of the
- * synchronous io for the majority of pages.
- * Or if we choose not to do the read-ahead/-behind stuff, we
- * could just return block_read_full_folio(pages[xpage]) as long
- * as PAGE_SIZE <= cb_size.
- */
- if (cb_max_ofs)
- cb_max_page--;
- /* First stage: copy data into destination pages. */
- for (; cur_page < cb_max_page; cur_page++) {
- page = pages[cur_page];
- if (page)
- memcpy(page_address(page) + cur_ofs, cb_pos,
- PAGE_SIZE - cur_ofs);
- cb_pos += PAGE_SIZE - cur_ofs;
- cur_ofs = 0;
- if (cb_pos >= cb_end)
- break;
- }
- /* If we have a partial final page, deal with it now. */
- if (cb_max_ofs && cb_pos < cb_end) {
- page = pages[cur_page];
- if (page)
- memcpy(page_address(page) + cur_ofs, cb_pos,
- cb_max_ofs - cur_ofs);
- cb_pos += cb_max_ofs - cur_ofs;
- cur_ofs = cb_max_ofs;
- }
- /* We can sleep from now on, so drop lock. */
- spin_unlock(&ntfs_cb_lock);
- /* Second stage: finalize pages. */
- for (; cur2_page < cb_max_page; cur2_page++) {
- page = pages[cur2_page];
- if (page) {
- /*
- * If we are outside the initialized size, zero
- * the out of bounds page range.
- */
- handle_bounds_compressed_page(page, i_size,
- initialized_size);
- flush_dcache_page(page);
- kunmap(page);
- SetPageUptodate(page);
- unlock_page(page);
- if (cur2_page == xpage)
- xpage_done = 1;
- else
- put_page(page);
- pages[cur2_page] = NULL;
- }
- cb_pos2 += PAGE_SIZE - cur_ofs2;
- cur_ofs2 = 0;
- if (cb_pos2 >= cb_end)
- break;
- }
- } else {
- /* Compressed cb, decompress it into the destination page(s). */
- unsigned int prev_cur_page = cur_page;
-
- ntfs_debug("Found compressed compression block.");
- err = ntfs_decompress(pages, completed_pages, &cur_page,
- &cur_ofs, cb_max_page, cb_max_ofs, xpage,
- &xpage_done, cb_pos, cb_size - (cb_pos - cb),
- i_size, initialized_size);
- /*
- * We can sleep from now on, lock already dropped by
- * ntfs_decompress().
- */
- if (err) {
- ntfs_error(vol->sb, "ntfs_decompress() failed in inode "
- "0x%lx with error code %i. Skipping "
- "this compression block.",
- ni->mft_no, -err);
- /* Release the unfinished pages. */
- for (; prev_cur_page < cur_page; prev_cur_page++) {
- page = pages[prev_cur_page];
- if (page) {
- flush_dcache_page(page);
- kunmap(page);
- unlock_page(page);
- if (prev_cur_page != xpage)
- put_page(page);
- pages[prev_cur_page] = NULL;
- }
- }
- }
- }
-
- /* Release the buffer heads. */
- for (i = 0; i < nr_bhs; i++)
- brelse(bhs[i]);
-
- /* Do we have more work to do? */
- if (nr_cbs)
- goto do_next_cb;
-
- /* We no longer need the list of buffer heads. */
- kfree(bhs);
-
- /* Clean up if we have any pages left. Should never happen. */
- for (cur_page = 0; cur_page < max_page; cur_page++) {
- page = pages[cur_page];
- if (page) {
- ntfs_error(vol->sb, "Still have pages left! "
- "Terminating them with extreme "
- "prejudice. Inode 0x%lx, page index "
- "0x%lx.", ni->mft_no, page->index);
- flush_dcache_page(page);
- kunmap(page);
- unlock_page(page);
- if (cur_page != xpage)
- put_page(page);
- pages[cur_page] = NULL;
- }
- }
-
- /* We no longer need the list of pages. */
- kfree(pages);
- kfree(completed_pages);
-
- /* If we have completed the requested page, we return success. */
- if (likely(xpage_done))
- return 0;
-
- ntfs_debug("Failed. Returning error code %s.", err == -EOVERFLOW ?
- "EOVERFLOW" : (!err ? "EIO" : "unknown error"));
- return err < 0 ? err : -EIO;
-
-read_err:
- ntfs_error(vol->sb, "IO error while reading compressed data.");
- /* Release the buffer heads. */
- for (i = 0; i < nr_bhs; i++)
- brelse(bhs[i]);
- goto err_out;
-
-map_rl_err:
- ntfs_error(vol->sb, "ntfs_map_runlist() failed. Cannot read "
- "compression block.");
- goto err_out;
-
-rl_err:
- up_read(&ni->runlist.lock);
- ntfs_error(vol->sb, "ntfs_rl_vcn_to_lcn() failed. Cannot read "
- "compression block.");
- goto err_out;
-
-getblk_err:
- up_read(&ni->runlist.lock);
- ntfs_error(vol->sb, "getblk() failed. Cannot read compression block.");
-
-err_out:
- kfree(bhs);
- for (i = cur_page; i < max_page; i++) {
- page = pages[i];
- if (page) {
- flush_dcache_page(page);
- kunmap(page);
- unlock_page(page);
- if (i != xpage)
- put_page(page);
- }
- }
- kfree(pages);
- kfree(completed_pages);
- return -EIO;
-}
diff --git a/fs/ntfs/debug.c b/fs/ntfs/debug.c
deleted file mode 100644
index a3c1c56..0000000
--- a/fs/ntfs/debug.c
+++ /dev/null
@@ -1,159 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * debug.c - NTFS kernel debug support. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#include "debug.h"
-
-/**
- * __ntfs_warning - output a warning to the syslog
- * @function: name of function outputting the warning
- * @sb: super block of mounted ntfs filesystem
- * @fmt: warning string containing format specifications
- * @...: a variable number of arguments specified in @fmt
- *
- * Outputs a warning to the syslog for the mounted ntfs filesystem described
- * by @sb.
- *
- * @fmt and the corresponding @... is printf style format string containing
- * the warning string and the corresponding format arguments, respectively.
- *
- * @function is the name of the function from which __ntfs_warning is being
- * called.
- *
- * Note, you should be using debug.h::ntfs_warning(@sb, @fmt, @...) instead
- * as this provides the @function parameter automatically.
- */
-void __ntfs_warning(const char *function, const struct super_block *sb,
- const char *fmt, ...)
-{
- struct va_format vaf;
- va_list args;
- int flen = 0;
-
-#ifndef DEBUG
- if (!printk_ratelimit())
- return;
-#endif
- if (function)
- flen = strlen(function);
- va_start(args, fmt);
- vaf.fmt = fmt;
- vaf.va = &args;
- if (sb)
- pr_warn("(device %s): %s(): %pV\n",
- sb->s_id, flen ? function : "", &vaf);
- else
- pr_warn("%s(): %pV\n", flen ? function : "", &vaf);
- va_end(args);
-}
-
-/**
- * __ntfs_error - output an error to the syslog
- * @function: name of function outputting the error
- * @sb: super block of mounted ntfs filesystem
- * @fmt: error string containing format specifications
- * @...: a variable number of arguments specified in @fmt
- *
- * Outputs an error to the syslog for the mounted ntfs filesystem described
- * by @sb.
- *
- * @fmt and the corresponding @... is printf style format string containing
- * the error string and the corresponding format arguments, respectively.
- *
- * @function is the name of the function from which __ntfs_error is being
- * called.
- *
- * Note, you should be using debug.h::ntfs_error(@sb, @fmt, @...) instead
- * as this provides the @function parameter automatically.
- */
-void __ntfs_error(const char *function, const struct super_block *sb,
- const char *fmt, ...)
-{
- struct va_format vaf;
- va_list args;
- int flen = 0;
-
-#ifndef DEBUG
- if (!printk_ratelimit())
- return;
-#endif
- if (function)
- flen = strlen(function);
- va_start(args, fmt);
- vaf.fmt = fmt;
- vaf.va = &args;
- if (sb)
- pr_err("(device %s): %s(): %pV\n",
- sb->s_id, flen ? function : "", &vaf);
- else
- pr_err("%s(): %pV\n", flen ? function : "", &vaf);
- va_end(args);
-}
-
-#ifdef DEBUG
-
-/* If 1, output debug messages, and if 0, don't. */
-int debug_msgs = 0;
-
-void __ntfs_debug(const char *file, int line, const char *function,
- const char *fmt, ...)
-{
- struct va_format vaf;
- va_list args;
- int flen = 0;
-
- if (!debug_msgs)
- return;
- if (function)
- flen = strlen(function);
- va_start(args, fmt);
- vaf.fmt = fmt;
- vaf.va = &args;
- pr_debug("(%s, %d): %s(): %pV", file, line, flen ? function : "", &vaf);
- va_end(args);
-}
-
-/* Dump a runlist. Caller has to provide synchronisation for @rl. */
-void ntfs_debug_dump_runlist(const runlist_element *rl)
-{
- int i;
- const char *lcn_str[5] = { "LCN_HOLE ", "LCN_RL_NOT_MAPPED",
- "LCN_ENOENT ", "LCN_unknown " };
-
- if (!debug_msgs)
- return;
- pr_debug("Dumping runlist (values in hex):\n");
- if (!rl) {
- pr_debug("Run list not present.\n");
- return;
- }
- pr_debug("VCN LCN Run length\n");
- for (i = 0; ; i++) {
- LCN lcn = (rl + i)->lcn;
-
- if (lcn < (LCN)0) {
- int index = -lcn - 1;
-
- if (index > -LCN_ENOENT - 1)
- index = 3;
- pr_debug("%-16Lx %s %-16Lx%s\n",
- (long long)(rl + i)->vcn, lcn_str[index],
- (long long)(rl + i)->length,
- (rl + i)->length ? "" :
- " (runlist end)");
- } else
- pr_debug("%-16Lx %-16Lx %-16Lx%s\n",
- (long long)(rl + i)->vcn,
- (long long)(rl + i)->lcn,
- (long long)(rl + i)->length,
- (rl + i)->length ? "" :
- " (runlist end)");
- if (!(rl + i)->length)
- break;
- }
-}
-
-#endif
diff --git a/fs/ntfs/debug.h b/fs/ntfs/debug.h
deleted file mode 100644
index 6fdef38..0000000
--- a/fs/ntfs/debug.h
+++ /dev/null
@@ -1,57 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * debug.h - NTFS kernel debug support. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_DEBUG_H
-#define _LINUX_NTFS_DEBUG_H
-
-#include <linux/fs.h>
-
-#include "runlist.h"
-
-#ifdef DEBUG
-
-extern int debug_msgs;
-
-extern __printf(4, 5)
-void __ntfs_debug(const char *file, int line, const char *function,
- const char *format, ...);
-/**
- * ntfs_debug - write a debug level message to syslog
- * @f: a printf format string containing the message
- * @...: the variables to substitute into @f
- *
- * ntfs_debug() writes a DEBUG level message to the syslog but only if the
- * driver was compiled with -DDEBUG. Otherwise, the call turns into a NOP.
- */
-#define ntfs_debug(f, a...) \
- __ntfs_debug(__FILE__, __LINE__, __func__, f, ##a)
-
-extern void ntfs_debug_dump_runlist(const runlist_element *rl);
-
-#else /* !DEBUG */
-
-#define ntfs_debug(fmt, ...) \
-do { \
- if (0) \
- no_printk(fmt, ##__VA_ARGS__); \
-} while (0)
-
-#define ntfs_debug_dump_runlist(rl) do {} while (0)
-
-#endif /* !DEBUG */
-
-extern __printf(3, 4)
-void __ntfs_warning(const char *function, const struct super_block *sb,
- const char *fmt, ...);
-#define ntfs_warning(sb, f, a...) __ntfs_warning(__func__, sb, f, ##a)
-
-extern __printf(3, 4)
-void __ntfs_error(const char *function, const struct super_block *sb,
- const char *fmt, ...);
-#define ntfs_error(sb, f, a...) __ntfs_error(__func__, sb, f, ##a)
-
-#endif /* _LINUX_NTFS_DEBUG_H */
diff --git a/fs/ntfs/dir.c b/fs/ntfs/dir.c
deleted file mode 100644
index 629723a..0000000
--- a/fs/ntfs/dir.c
+++ /dev/null
@@ -1,1540 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * dir.c - NTFS kernel directory operations. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2007 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/buffer_head.h>
-#include <linux/slab.h>
-#include <linux/blkdev.h>
-
-#include "dir.h"
-#include "aops.h"
-#include "attrib.h"
-#include "mft.h"
-#include "debug.h"
-#include "ntfs.h"
-
-/*
- * The little endian Unicode string $I30 as a global constant.
- */
-ntfschar I30[5] = { cpu_to_le16('$'), cpu_to_le16('I'),
- cpu_to_le16('3'), cpu_to_le16('0'), 0 };
-
-/**
- * ntfs_lookup_inode_by_name - find an inode in a directory given its name
- * @dir_ni: ntfs inode of the directory in which to search for the name
- * @uname: Unicode name for which to search in the directory
- * @uname_len: length of the name @uname in Unicode characters
- * @res: return the found file name if necessary (see below)
- *
- * Look for an inode with name @uname in the directory with inode @dir_ni.
- * ntfs_lookup_inode_by_name() walks the contents of the directory looking for
- * the Unicode name. If the name is found in the directory, the corresponding
- * inode number (>= 0) is returned as a mft reference in cpu format, i.e. it
- * is a 64-bit number containing the sequence number.
- *
- * On error, a negative value is returned corresponding to the error code. In
- * particular if the inode is not found -ENOENT is returned. Note that you
- * can't just check the return value for being negative, you have to check the
- * inode number for being negative which you can extract using MREC(return
- * value).
- *
- * Note, @uname_len does not include the (optional) terminating NULL character.
- *
- * Note, we look for a case sensitive match first but we also look for a case
- * insensitive match at the same time. If we find a case insensitive match, we
- * save that for the case that we don't find an exact match, where we return
- * the case insensitive match and setup @res (which we allocate!) with the mft
- * reference, the file name type, length and with a copy of the little endian
- * Unicode file name itself. If we match a file name which is in the DOS name
- * space, we only return the mft reference and file name type in @res.
- * ntfs_lookup() then uses this to find the long file name in the inode itself.
- * This is to avoid polluting the dcache with short file names. We want them to
- * work but we don't care for how quickly one can access them. This also fixes
- * the dcache aliasing issues.
- *
- * Locking: - Caller must hold i_mutex on the directory.
- * - Each page cache page in the index allocation mapping must be
- * locked whilst being accessed otherwise we may find a corrupt
- * page due to it being under ->writepage at the moment which
- * applies the mst protection fixups before writing out and then
- * removes them again after the write is complete after which it
- * unlocks the page.
- */
-MFT_REF ntfs_lookup_inode_by_name(ntfs_inode *dir_ni, const ntfschar *uname,
- const int uname_len, ntfs_name **res)
-{
- ntfs_volume *vol = dir_ni->vol;
- struct super_block *sb = vol->sb;
- MFT_RECORD *m;
- INDEX_ROOT *ir;
- INDEX_ENTRY *ie;
- INDEX_ALLOCATION *ia;
- u8 *index_end;
- u64 mref;
- ntfs_attr_search_ctx *ctx;
- int err, rc;
- VCN vcn, old_vcn;
- struct address_space *ia_mapping;
- struct page *page;
- u8 *kaddr;
- ntfs_name *name = NULL;
-
- BUG_ON(!S_ISDIR(VFS_I(dir_ni)->i_mode));
- BUG_ON(NInoAttr(dir_ni));
- /* Get hold of the mft record for the directory. */
- m = map_mft_record(dir_ni);
- if (IS_ERR(m)) {
- ntfs_error(sb, "map_mft_record() failed with error code %ld.",
- -PTR_ERR(m));
- return ERR_MREF(PTR_ERR(m));
- }
- ctx = ntfs_attr_get_search_ctx(dir_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- /* Find the index root attribute in the mft record. */
- err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
- 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT) {
- ntfs_error(sb, "Index root attribute missing in "
- "directory inode 0x%lx.",
- dir_ni->mft_no);
- err = -EIO;
- }
- goto err_out;
- }
- /* Get to the index root value (it's been verified in read_inode). */
- ir = (INDEX_ROOT*)((u8*)ctx->attr +
- le16_to_cpu(ctx->attr->data.resident.value_offset));
- index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
- /* The first index entry. */
- ie = (INDEX_ENTRY*)((u8*)&ir->index +
- le32_to_cpu(ir->index.entries_offset));
- /*
- * Loop until we exceed valid memory (corruption case) or until we
- * reach the last entry.
- */
- for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
- /* Bounds checks. */
- if ((u8*)ie < (u8*)ctx->mrec || (u8*)ie +
- sizeof(INDEX_ENTRY_HEADER) > index_end ||
- (u8*)ie + le16_to_cpu(ie->key_length) >
- index_end)
- goto dir_err_out;
- /*
- * The last entry cannot contain a name. It can however contain
- * a pointer to a child node in the B+tree so we just break out.
- */
- if (ie->flags & INDEX_ENTRY_END)
- break;
- /*
- * We perform a case sensitive comparison and if that matches
- * we are done and return the mft reference of the inode (i.e.
- * the inode number together with the sequence number for
- * consistency checking). We convert it to cpu format before
- * returning.
- */
- if (ntfs_are_names_equal(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length,
- CASE_SENSITIVE, vol->upcase, vol->upcase_len)) {
-found_it:
- /*
- * We have a perfect match, so we don't need to care
- * about having matched imperfectly before, so we can
- * free name and set *res to NULL.
- * However, if the perfect match is a short file name,
- * we need to signal this through *res, so that
- * ntfs_lookup() can fix dcache aliasing issues.
- * As an optimization we just reuse an existing
- * allocation of *res.
- */
- if (ie->key.file_name.file_name_type == FILE_NAME_DOS) {
- if (!name) {
- name = kmalloc(sizeof(ntfs_name),
- GFP_NOFS);
- if (!name) {
- err = -ENOMEM;
- goto err_out;
- }
- }
- name->mref = le64_to_cpu(
- ie->data.dir.indexed_file);
- name->type = FILE_NAME_DOS;
- name->len = 0;
- *res = name;
- } else {
- kfree(name);
- *res = NULL;
- }
- mref = le64_to_cpu(ie->data.dir.indexed_file);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(dir_ni);
- return mref;
- }
- /*
- * For a case insensitive mount, we also perform a case
- * insensitive comparison (provided the file name is not in the
- * POSIX namespace). If the comparison matches, and the name is
- * in the WIN32 namespace, we cache the filename in *res so
- * that the caller, ntfs_lookup(), can work on it. If the
- * comparison matches, and the name is in the DOS namespace, we
- * only cache the mft reference and the file name type (we set
- * the name length to zero for simplicity).
- */
- if (!NVolCaseSensitive(vol) &&
- ie->key.file_name.file_name_type &&
- ntfs_are_names_equal(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length,
- IGNORE_CASE, vol->upcase, vol->upcase_len)) {
- int name_size = sizeof(ntfs_name);
- u8 type = ie->key.file_name.file_name_type;
- u8 len = ie->key.file_name.file_name_length;
-
- /* Only one case insensitive matching name allowed. */
- if (name) {
- ntfs_error(sb, "Found already allocated name "
- "in phase 1. Please run chkdsk "
- "and if that doesn't find any "
- "errors please report you saw "
- "this message to "
- "linux-ntfs-dev@lists."
- "sourceforge.net.");
- goto dir_err_out;
- }
-
- if (type != FILE_NAME_DOS)
- name_size += len * sizeof(ntfschar);
- name = kmalloc(name_size, GFP_NOFS);
- if (!name) {
- err = -ENOMEM;
- goto err_out;
- }
- name->mref = le64_to_cpu(ie->data.dir.indexed_file);
- name->type = type;
- if (type != FILE_NAME_DOS) {
- name->len = len;
- memcpy(name->name, ie->key.file_name.file_name,
- len * sizeof(ntfschar));
- } else
- name->len = 0;
- *res = name;
- }
- /*
- * Not a perfect match, need to do full blown collation so we
- * know which way in the B+tree we have to go.
- */
- rc = ntfs_collate_names(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, 1,
- IGNORE_CASE, vol->upcase, vol->upcase_len);
- /*
- * If uname collates before the name of the current entry, there
- * is definitely no such name in this index but we might need to
- * descend into the B+tree so we just break out of the loop.
- */
- if (rc == -1)
- break;
- /* The names are not equal, continue the search. */
- if (rc)
- continue;
- /*
- * Names match with case insensitive comparison, now try the
- * case sensitive comparison, which is required for proper
- * collation.
- */
- rc = ntfs_collate_names(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, 1,
- CASE_SENSITIVE, vol->upcase, vol->upcase_len);
- if (rc == -1)
- break;
- if (rc)
- continue;
- /*
- * Perfect match, this will never happen as the
- * ntfs_are_names_equal() call will have gotten a match but we
- * still treat it correctly.
- */
- goto found_it;
- }
- /*
- * We have finished with this index without success. Check for the
- * presence of a child node and if not present return -ENOENT, unless
- * we have got a matching name cached in name in which case return the
- * mft reference associated with it.
- */
- if (!(ie->flags & INDEX_ENTRY_NODE)) {
- if (name) {
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(dir_ni);
- return name->mref;
- }
- ntfs_debug("Entry not found.");
- err = -ENOENT;
- goto err_out;
- } /* Child node present, descend into it. */
- /* Consistency check: Verify that an index allocation exists. */
- if (!NInoIndexAllocPresent(dir_ni)) {
- ntfs_error(sb, "No index allocation attribute but index entry "
- "requires one. Directory inode 0x%lx is "
- "corrupt or driver bug.", dir_ni->mft_no);
- goto err_out;
- }
- /* Get the starting vcn of the index_block holding the child node. */
- vcn = sle64_to_cpup((sle64*)((u8*)ie + le16_to_cpu(ie->length) - 8));
- ia_mapping = VFS_I(dir_ni)->i_mapping;
- /*
- * We are done with the index root and the mft record. Release them,
- * otherwise we deadlock with ntfs_map_page().
- */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(dir_ni);
- m = NULL;
- ctx = NULL;
-descend_into_child_node:
- /*
- * Convert vcn to index into the index allocation attribute in units
- * of PAGE_SIZE and map the page cache page, reading it from
- * disk if necessary.
- */
- page = ntfs_map_page(ia_mapping, vcn <<
- dir_ni->itype.index.vcn_size_bits >> PAGE_SHIFT);
- if (IS_ERR(page)) {
- ntfs_error(sb, "Failed to map directory index page, error %ld.",
- -PTR_ERR(page));
- err = PTR_ERR(page);
- goto err_out;
- }
- lock_page(page);
- kaddr = (u8*)page_address(page);
-fast_descend_into_child_node:
- /* Get to the index allocation block. */
- ia = (INDEX_ALLOCATION*)(kaddr + ((vcn <<
- dir_ni->itype.index.vcn_size_bits) & ~PAGE_MASK));
- /* Bounds checks. */
- if ((u8*)ia < kaddr || (u8*)ia > kaddr + PAGE_SIZE) {
- ntfs_error(sb, "Out of bounds check failed. Corrupt directory "
- "inode 0x%lx or driver bug.", dir_ni->mft_no);
- goto unm_err_out;
- }
- /* Catch multi sector transfer fixup errors. */
- if (unlikely(!ntfs_is_indx_record(ia->magic))) {
- ntfs_error(sb, "Directory index record with vcn 0x%llx is "
- "corrupt. Corrupt inode 0x%lx. Run chkdsk.",
- (unsigned long long)vcn, dir_ni->mft_no);
- goto unm_err_out;
- }
- if (sle64_to_cpu(ia->index_block_vcn) != vcn) {
- ntfs_error(sb, "Actual VCN (0x%llx) of index buffer is "
- "different from expected VCN (0x%llx). "
- "Directory inode 0x%lx is corrupt or driver "
- "bug.", (unsigned long long)
- sle64_to_cpu(ia->index_block_vcn),
- (unsigned long long)vcn, dir_ni->mft_no);
- goto unm_err_out;
- }
- if (le32_to_cpu(ia->index.allocated_size) + 0x18 !=
- dir_ni->itype.index.block_size) {
- ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
- "0x%lx has a size (%u) differing from the "
- "directory specified size (%u). Directory "
- "inode is corrupt or driver bug.",
- (unsigned long long)vcn, dir_ni->mft_no,
- le32_to_cpu(ia->index.allocated_size) + 0x18,
- dir_ni->itype.index.block_size);
- goto unm_err_out;
- }
- index_end = (u8*)ia + dir_ni->itype.index.block_size;
- if (index_end > kaddr + PAGE_SIZE) {
- ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
- "0x%lx crosses page boundary. Impossible! "
- "Cannot access! This is probably a bug in the "
- "driver.", (unsigned long long)vcn,
- dir_ni->mft_no);
- goto unm_err_out;
- }
- index_end = (u8*)&ia->index + le32_to_cpu(ia->index.index_length);
- if (index_end > (u8*)ia + dir_ni->itype.index.block_size) {
- ntfs_error(sb, "Size of index buffer (VCN 0x%llx) of directory "
- "inode 0x%lx exceeds maximum size.",
- (unsigned long long)vcn, dir_ni->mft_no);
- goto unm_err_out;
- }
- /* The first index entry. */
- ie = (INDEX_ENTRY*)((u8*)&ia->index +
- le32_to_cpu(ia->index.entries_offset));
- /*
- * Iterate similar to above big loop but applied to index buffer, thus
- * loop until we exceed valid memory (corruption case) or until we
- * reach the last entry.
- */
- for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
- /* Bounds check. */
- if ((u8*)ie < (u8*)ia || (u8*)ie +
- sizeof(INDEX_ENTRY_HEADER) > index_end ||
- (u8*)ie + le16_to_cpu(ie->key_length) >
- index_end) {
- ntfs_error(sb, "Index entry out of bounds in "
- "directory inode 0x%lx.",
- dir_ni->mft_no);
- goto unm_err_out;
- }
- /*
- * The last entry cannot contain a name. It can however contain
- * a pointer to a child node in the B+tree so we just break out.
- */
- if (ie->flags & INDEX_ENTRY_END)
- break;
- /*
- * We perform a case sensitive comparison and if that matches
- * we are done and return the mft reference of the inode (i.e.
- * the inode number together with the sequence number for
- * consistency checking). We convert it to cpu format before
- * returning.
- */
- if (ntfs_are_names_equal(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length,
- CASE_SENSITIVE, vol->upcase, vol->upcase_len)) {
-found_it2:
- /*
- * We have a perfect match, so we don't need to care
- * about having matched imperfectly before, so we can
- * free name and set *res to NULL.
- * However, if the perfect match is a short file name,
- * we need to signal this through *res, so that
- * ntfs_lookup() can fix dcache aliasing issues.
- * As an optimization we just reuse an existing
- * allocation of *res.
- */
- if (ie->key.file_name.file_name_type == FILE_NAME_DOS) {
- if (!name) {
- name = kmalloc(sizeof(ntfs_name),
- GFP_NOFS);
- if (!name) {
- err = -ENOMEM;
- goto unm_err_out;
- }
- }
- name->mref = le64_to_cpu(
- ie->data.dir.indexed_file);
- name->type = FILE_NAME_DOS;
- name->len = 0;
- *res = name;
- } else {
- kfree(name);
- *res = NULL;
- }
- mref = le64_to_cpu(ie->data.dir.indexed_file);
- unlock_page(page);
- ntfs_unmap_page(page);
- return mref;
- }
- /*
- * For a case insensitive mount, we also perform a case
- * insensitive comparison (provided the file name is not in the
- * POSIX namespace). If the comparison matches, and the name is
- * in the WIN32 namespace, we cache the filename in *res so
- * that the caller, ntfs_lookup(), can work on it. If the
- * comparison matches, and the name is in the DOS namespace, we
- * only cache the mft reference and the file name type (we set
- * the name length to zero for simplicity).
- */
- if (!NVolCaseSensitive(vol) &&
- ie->key.file_name.file_name_type &&
- ntfs_are_names_equal(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length,
- IGNORE_CASE, vol->upcase, vol->upcase_len)) {
- int name_size = sizeof(ntfs_name);
- u8 type = ie->key.file_name.file_name_type;
- u8 len = ie->key.file_name.file_name_length;
-
- /* Only one case insensitive matching name allowed. */
- if (name) {
- ntfs_error(sb, "Found already allocated name "
- "in phase 2. Please run chkdsk "
- "and if that doesn't find any "
- "errors please report you saw "
- "this message to "
- "linux-ntfs-dev@lists."
- "sourceforge.net.");
- unlock_page(page);
- ntfs_unmap_page(page);
- goto dir_err_out;
- }
-
- if (type != FILE_NAME_DOS)
- name_size += len * sizeof(ntfschar);
- name = kmalloc(name_size, GFP_NOFS);
- if (!name) {
- err = -ENOMEM;
- goto unm_err_out;
- }
- name->mref = le64_to_cpu(ie->data.dir.indexed_file);
- name->type = type;
- if (type != FILE_NAME_DOS) {
- name->len = len;
- memcpy(name->name, ie->key.file_name.file_name,
- len * sizeof(ntfschar));
- } else
- name->len = 0;
- *res = name;
- }
- /*
- * Not a perfect match, need to do full blown collation so we
- * know which way in the B+tree we have to go.
- */
- rc = ntfs_collate_names(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, 1,
- IGNORE_CASE, vol->upcase, vol->upcase_len);
- /*
- * If uname collates before the name of the current entry, there
- * is definitely no such name in this index but we might need to
- * descend into the B+tree so we just break out of the loop.
- */
- if (rc == -1)
- break;
- /* The names are not equal, continue the search. */
- if (rc)
- continue;
- /*
- * Names match with case insensitive comparison, now try the
- * case sensitive comparison, which is required for proper
- * collation.
- */
- rc = ntfs_collate_names(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, 1,
- CASE_SENSITIVE, vol->upcase, vol->upcase_len);
- if (rc == -1)
- break;
- if (rc)
- continue;
- /*
- * Perfect match, this will never happen as the
- * ntfs_are_names_equal() call will have gotten a match but we
- * still treat it correctly.
- */
- goto found_it2;
- }
- /*
- * We have finished with this index buffer without success. Check for
- * the presence of a child node.
- */
- if (ie->flags & INDEX_ENTRY_NODE) {
- if ((ia->index.flags & NODE_MASK) == LEAF_NODE) {
- ntfs_error(sb, "Index entry with child node found in "
- "a leaf node in directory inode 0x%lx.",
- dir_ni->mft_no);
- goto unm_err_out;
- }
- /* Child node present, descend into it. */
- old_vcn = vcn;
- vcn = sle64_to_cpup((sle64*)((u8*)ie +
- le16_to_cpu(ie->length) - 8));
- if (vcn >= 0) {
- /* If vcn is in the same page cache page as old_vcn we
- * recycle the mapped page. */
- if (old_vcn << vol->cluster_size_bits >>
- PAGE_SHIFT == vcn <<
- vol->cluster_size_bits >>
- PAGE_SHIFT)
- goto fast_descend_into_child_node;
- unlock_page(page);
- ntfs_unmap_page(page);
- goto descend_into_child_node;
- }
- ntfs_error(sb, "Negative child node vcn in directory inode "
- "0x%lx.", dir_ni->mft_no);
- goto unm_err_out;
- }
- /*
- * No child node present, return -ENOENT, unless we have got a matching
- * name cached in name in which case return the mft reference
- * associated with it.
- */
- if (name) {
- unlock_page(page);
- ntfs_unmap_page(page);
- return name->mref;
- }
- ntfs_debug("Entry not found.");
- err = -ENOENT;
-unm_err_out:
- unlock_page(page);
- ntfs_unmap_page(page);
-err_out:
- if (!err)
- err = -EIO;
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(dir_ni);
- if (name) {
- kfree(name);
- *res = NULL;
- }
- return ERR_MREF(err);
-dir_err_out:
- ntfs_error(sb, "Corrupt directory. Aborting lookup.");
- goto err_out;
-}
-
-#if 0
-
-// TODO: (AIA)
-// The algorithm embedded in this code will be required for the time when we
-// want to support adding of entries to directories, where we require correct
-// collation of file names in order not to cause corruption of the filesystem.
-
-/**
- * ntfs_lookup_inode_by_name - find an inode in a directory given its name
- * @dir_ni: ntfs inode of the directory in which to search for the name
- * @uname: Unicode name for which to search in the directory
- * @uname_len: length of the name @uname in Unicode characters
- *
- * Look for an inode with name @uname in the directory with inode @dir_ni.
- * ntfs_lookup_inode_by_name() walks the contents of the directory looking for
- * the Unicode name. If the name is found in the directory, the corresponding
- * inode number (>= 0) is returned as a mft reference in cpu format, i.e. it
- * is a 64-bit number containing the sequence number.
- *
- * On error, a negative value is returned corresponding to the error code. In
- * particular if the inode is not found -ENOENT is returned. Note that you
- * can't just check the return value for being negative, you have to check the
- * inode number for being negative which you can extract using MREC(return
- * value).
- *
- * Note, @uname_len does not include the (optional) terminating NULL character.
- */
-u64 ntfs_lookup_inode_by_name(ntfs_inode *dir_ni, const ntfschar *uname,
- const int uname_len)
-{
- ntfs_volume *vol = dir_ni->vol;
- struct super_block *sb = vol->sb;
- MFT_RECORD *m;
- INDEX_ROOT *ir;
- INDEX_ENTRY *ie;
- INDEX_ALLOCATION *ia;
- u8 *index_end;
- u64 mref;
- ntfs_attr_search_ctx *ctx;
- int err, rc;
- IGNORE_CASE_BOOL ic;
- VCN vcn, old_vcn;
- struct address_space *ia_mapping;
- struct page *page;
- u8 *kaddr;
-
- /* Get hold of the mft record for the directory. */
- m = map_mft_record(dir_ni);
- if (IS_ERR(m)) {
- ntfs_error(sb, "map_mft_record() failed with error code %ld.",
- -PTR_ERR(m));
- return ERR_MREF(PTR_ERR(m));
- }
- ctx = ntfs_attr_get_search_ctx(dir_ni, m);
- if (!ctx) {
- err = -ENOMEM;
- goto err_out;
- }
- /* Find the index root attribute in the mft record. */
- err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
- 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT) {
- ntfs_error(sb, "Index root attribute missing in "
- "directory inode 0x%lx.",
- dir_ni->mft_no);
- err = -EIO;
- }
- goto err_out;
- }
- /* Get to the index root value (it's been verified in read_inode). */
- ir = (INDEX_ROOT*)((u8*)ctx->attr +
- le16_to_cpu(ctx->attr->data.resident.value_offset));
- index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
- /* The first index entry. */
- ie = (INDEX_ENTRY*)((u8*)&ir->index +
- le32_to_cpu(ir->index.entries_offset));
- /*
- * Loop until we exceed valid memory (corruption case) or until we
- * reach the last entry.
- */
- for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
- /* Bounds checks. */
- if ((u8*)ie < (u8*)ctx->mrec || (u8*)ie +
- sizeof(INDEX_ENTRY_HEADER) > index_end ||
- (u8*)ie + le16_to_cpu(ie->key_length) >
- index_end)
- goto dir_err_out;
- /*
- * The last entry cannot contain a name. It can however contain
- * a pointer to a child node in the B+tree so we just break out.
- */
- if (ie->flags & INDEX_ENTRY_END)
- break;
- /*
- * If the current entry has a name type of POSIX, the name is
- * case sensitive and not otherwise. This has the effect of us
- * not being able to access any POSIX file names which collate
- * after the non-POSIX one when they only differ in case, but
- * anyone doing screwy stuff like that deserves to burn in
- * hell... Doing that kind of stuff on NT4 actually causes
- * corruption on the partition even when using SP6a and Linux
- * is not involved at all.
- */
- ic = ie->key.file_name.file_name_type ? IGNORE_CASE :
- CASE_SENSITIVE;
- /*
- * If the names match perfectly, we are done and return the
- * mft reference of the inode (i.e. the inode number together
- * with the sequence number for consistency checking. We
- * convert it to cpu format before returning.
- */
- if (ntfs_are_names_equal(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, ic,
- vol->upcase, vol->upcase_len)) {
-found_it:
- mref = le64_to_cpu(ie->data.dir.indexed_file);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(dir_ni);
- return mref;
- }
- /*
- * Not a perfect match, need to do full blown collation so we
- * know which way in the B+tree we have to go.
- */
- rc = ntfs_collate_names(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, 1,
- IGNORE_CASE, vol->upcase, vol->upcase_len);
- /*
- * If uname collates before the name of the current entry, there
- * is definitely no such name in this index but we might need to
- * descend into the B+tree so we just break out of the loop.
- */
- if (rc == -1)
- break;
- /* The names are not equal, continue the search. */
- if (rc)
- continue;
- /*
- * Names match with case insensitive comparison, now try the
- * case sensitive comparison, which is required for proper
- * collation.
- */
- rc = ntfs_collate_names(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, 1,
- CASE_SENSITIVE, vol->upcase, vol->upcase_len);
- if (rc == -1)
- break;
- if (rc)
- continue;
- /*
- * Perfect match, this will never happen as the
- * ntfs_are_names_equal() call will have gotten a match but we
- * still treat it correctly.
- */
- goto found_it;
- }
- /*
- * We have finished with this index without success. Check for the
- * presence of a child node.
- */
- if (!(ie->flags & INDEX_ENTRY_NODE)) {
- /* No child node, return -ENOENT. */
- err = -ENOENT;
- goto err_out;
- } /* Child node present, descend into it. */
- /* Consistency check: Verify that an index allocation exists. */
- if (!NInoIndexAllocPresent(dir_ni)) {
- ntfs_error(sb, "No index allocation attribute but index entry "
- "requires one. Directory inode 0x%lx is "
- "corrupt or driver bug.", dir_ni->mft_no);
- goto err_out;
- }
- /* Get the starting vcn of the index_block holding the child node. */
- vcn = sle64_to_cpup((u8*)ie + le16_to_cpu(ie->length) - 8);
- ia_mapping = VFS_I(dir_ni)->i_mapping;
- /*
- * We are done with the index root and the mft record. Release them,
- * otherwise we deadlock with ntfs_map_page().
- */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(dir_ni);
- m = NULL;
- ctx = NULL;
-descend_into_child_node:
- /*
- * Convert vcn to index into the index allocation attribute in units
- * of PAGE_SIZE and map the page cache page, reading it from
- * disk if necessary.
- */
- page = ntfs_map_page(ia_mapping, vcn <<
- dir_ni->itype.index.vcn_size_bits >> PAGE_SHIFT);
- if (IS_ERR(page)) {
- ntfs_error(sb, "Failed to map directory index page, error %ld.",
- -PTR_ERR(page));
- err = PTR_ERR(page);
- goto err_out;
- }
- lock_page(page);
- kaddr = (u8*)page_address(page);
-fast_descend_into_child_node:
- /* Get to the index allocation block. */
- ia = (INDEX_ALLOCATION*)(kaddr + ((vcn <<
- dir_ni->itype.index.vcn_size_bits) & ~PAGE_MASK));
- /* Bounds checks. */
- if ((u8*)ia < kaddr || (u8*)ia > kaddr + PAGE_SIZE) {
- ntfs_error(sb, "Out of bounds check failed. Corrupt directory "
- "inode 0x%lx or driver bug.", dir_ni->mft_no);
- goto unm_err_out;
- }
- /* Catch multi sector transfer fixup errors. */
- if (unlikely(!ntfs_is_indx_record(ia->magic))) {
- ntfs_error(sb, "Directory index record with vcn 0x%llx is "
- "corrupt. Corrupt inode 0x%lx. Run chkdsk.",
- (unsigned long long)vcn, dir_ni->mft_no);
- goto unm_err_out;
- }
- if (sle64_to_cpu(ia->index_block_vcn) != vcn) {
- ntfs_error(sb, "Actual VCN (0x%llx) of index buffer is "
- "different from expected VCN (0x%llx). "
- "Directory inode 0x%lx is corrupt or driver "
- "bug.", (unsigned long long)
- sle64_to_cpu(ia->index_block_vcn),
- (unsigned long long)vcn, dir_ni->mft_no);
- goto unm_err_out;
- }
- if (le32_to_cpu(ia->index.allocated_size) + 0x18 !=
- dir_ni->itype.index.block_size) {
- ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
- "0x%lx has a size (%u) differing from the "
- "directory specified size (%u). Directory "
- "inode is corrupt or driver bug.",
- (unsigned long long)vcn, dir_ni->mft_no,
- le32_to_cpu(ia->index.allocated_size) + 0x18,
- dir_ni->itype.index.block_size);
- goto unm_err_out;
- }
- index_end = (u8*)ia + dir_ni->itype.index.block_size;
- if (index_end > kaddr + PAGE_SIZE) {
- ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
- "0x%lx crosses page boundary. Impossible! "
- "Cannot access! This is probably a bug in the "
- "driver.", (unsigned long long)vcn,
- dir_ni->mft_no);
- goto unm_err_out;
- }
- index_end = (u8*)&ia->index + le32_to_cpu(ia->index.index_length);
- if (index_end > (u8*)ia + dir_ni->itype.index.block_size) {
- ntfs_error(sb, "Size of index buffer (VCN 0x%llx) of directory "
- "inode 0x%lx exceeds maximum size.",
- (unsigned long long)vcn, dir_ni->mft_no);
- goto unm_err_out;
- }
- /* The first index entry. */
- ie = (INDEX_ENTRY*)((u8*)&ia->index +
- le32_to_cpu(ia->index.entries_offset));
- /*
- * Iterate similar to above big loop but applied to index buffer, thus
- * loop until we exceed valid memory (corruption case) or until we
- * reach the last entry.
- */
- for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
- /* Bounds check. */
- if ((u8*)ie < (u8*)ia || (u8*)ie +
- sizeof(INDEX_ENTRY_HEADER) > index_end ||
- (u8*)ie + le16_to_cpu(ie->key_length) >
- index_end) {
- ntfs_error(sb, "Index entry out of bounds in "
- "directory inode 0x%lx.",
- dir_ni->mft_no);
- goto unm_err_out;
- }
- /*
- * The last entry cannot contain a name. It can however contain
- * a pointer to a child node in the B+tree so we just break out.
- */
- if (ie->flags & INDEX_ENTRY_END)
- break;
- /*
- * If the current entry has a name type of POSIX, the name is
- * case sensitive and not otherwise. This has the effect of us
- * not being able to access any POSIX file names which collate
- * after the non-POSIX one when they only differ in case, but
- * anyone doing screwy stuff like that deserves to burn in
- * hell... Doing that kind of stuff on NT4 actually causes
- * corruption on the partition even when using SP6a and Linux
- * is not involved at all.
- */
- ic = ie->key.file_name.file_name_type ? IGNORE_CASE :
- CASE_SENSITIVE;
- /*
- * If the names match perfectly, we are done and return the
- * mft reference of the inode (i.e. the inode number together
- * with the sequence number for consistency checking. We
- * convert it to cpu format before returning.
- */
- if (ntfs_are_names_equal(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, ic,
- vol->upcase, vol->upcase_len)) {
-found_it2:
- mref = le64_to_cpu(ie->data.dir.indexed_file);
- unlock_page(page);
- ntfs_unmap_page(page);
- return mref;
- }
- /*
- * Not a perfect match, need to do full blown collation so we
- * know which way in the B+tree we have to go.
- */
- rc = ntfs_collate_names(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, 1,
- IGNORE_CASE, vol->upcase, vol->upcase_len);
- /*
- * If uname collates before the name of the current entry, there
- * is definitely no such name in this index but we might need to
- * descend into the B+tree so we just break out of the loop.
- */
- if (rc == -1)
- break;
- /* The names are not equal, continue the search. */
- if (rc)
- continue;
- /*
- * Names match with case insensitive comparison, now try the
- * case sensitive comparison, which is required for proper
- * collation.
- */
- rc = ntfs_collate_names(uname, uname_len,
- (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, 1,
- CASE_SENSITIVE, vol->upcase, vol->upcase_len);
- if (rc == -1)
- break;
- if (rc)
- continue;
- /*
- * Perfect match, this will never happen as the
- * ntfs_are_names_equal() call will have gotten a match but we
- * still treat it correctly.
- */
- goto found_it2;
- }
- /*
- * We have finished with this index buffer without success. Check for
- * the presence of a child node.
- */
- if (ie->flags & INDEX_ENTRY_NODE) {
- if ((ia->index.flags & NODE_MASK) == LEAF_NODE) {
- ntfs_error(sb, "Index entry with child node found in "
- "a leaf node in directory inode 0x%lx.",
- dir_ni->mft_no);
- goto unm_err_out;
- }
- /* Child node present, descend into it. */
- old_vcn = vcn;
- vcn = sle64_to_cpup((u8*)ie + le16_to_cpu(ie->length) - 8);
- if (vcn >= 0) {
- /* If vcn is in the same page cache page as old_vcn we
- * recycle the mapped page. */
- if (old_vcn << vol->cluster_size_bits >>
- PAGE_SHIFT == vcn <<
- vol->cluster_size_bits >>
- PAGE_SHIFT)
- goto fast_descend_into_child_node;
- unlock_page(page);
- ntfs_unmap_page(page);
- goto descend_into_child_node;
- }
- ntfs_error(sb, "Negative child node vcn in directory inode "
- "0x%lx.", dir_ni->mft_no);
- goto unm_err_out;
- }
- /* No child node, return -ENOENT. */
- ntfs_debug("Entry not found.");
- err = -ENOENT;
-unm_err_out:
- unlock_page(page);
- ntfs_unmap_page(page);
-err_out:
- if (!err)
- err = -EIO;
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(dir_ni);
- return ERR_MREF(err);
-dir_err_out:
- ntfs_error(sb, "Corrupt directory. Aborting lookup.");
- goto err_out;
-}
-
-#endif
-
-/**
- * ntfs_filldir - ntfs specific filldir method
- * @vol: current ntfs volume
- * @ndir: ntfs inode of current directory
- * @ia_page: page in which the index allocation buffer @ie is in resides
- * @ie: current index entry
- * @name: buffer to use for the converted name
- * @actor: what to feed the entries to
- *
- * Convert the Unicode @name to the loaded NLS and pass it to the @filldir
- * callback.
- *
- * If @ia_page is not NULL it is the locked page containing the index
- * allocation block containing the index entry @ie.
- *
- * Note, we drop (and then reacquire) the page lock on @ia_page across the
- * @filldir() call otherwise we would deadlock with NFSd when it calls ->lookup
- * since ntfs_lookup() will lock the same page. As an optimization, we do not
- * retake the lock if we are returning a non-zero value as ntfs_readdir()
- * would need to drop the lock immediately anyway.
- */
-static inline int ntfs_filldir(ntfs_volume *vol,
- ntfs_inode *ndir, struct page *ia_page, INDEX_ENTRY *ie,
- u8 *name, struct dir_context *actor)
-{
- unsigned long mref;
- int name_len;
- unsigned dt_type;
- FILE_NAME_TYPE_FLAGS name_type;
-
- name_type = ie->key.file_name.file_name_type;
- if (name_type == FILE_NAME_DOS) {
- ntfs_debug("Skipping DOS name space entry.");
- return 0;
- }
- if (MREF_LE(ie->data.dir.indexed_file) == FILE_root) {
- ntfs_debug("Skipping root directory self reference entry.");
- return 0;
- }
- if (MREF_LE(ie->data.dir.indexed_file) < FILE_first_user &&
- !NVolShowSystemFiles(vol)) {
- ntfs_debug("Skipping system file.");
- return 0;
- }
- name_len = ntfs_ucstonls(vol, (ntfschar*)&ie->key.file_name.file_name,
- ie->key.file_name.file_name_length, &name,
- NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1);
- if (name_len <= 0) {
- ntfs_warning(vol->sb, "Skipping unrepresentable inode 0x%llx.",
- (long long)MREF_LE(ie->data.dir.indexed_file));
- return 0;
- }
- if (ie->key.file_name.file_attributes &
- FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT)
- dt_type = DT_DIR;
- else
- dt_type = DT_REG;
- mref = MREF_LE(ie->data.dir.indexed_file);
- /*
- * Drop the page lock otherwise we deadlock with NFS when it calls
- * ->lookup since ntfs_lookup() will lock the same page.
- */
- if (ia_page)
- unlock_page(ia_page);
- ntfs_debug("Calling filldir for %s with len %i, fpos 0x%llx, inode "
- "0x%lx, DT_%s.", name, name_len, actor->pos, mref,
- dt_type == DT_DIR ? "DIR" : "REG");
- if (!dir_emit(actor, name, name_len, mref, dt_type))
- return 1;
- /* Relock the page but not if we are aborting ->readdir. */
- if (ia_page)
- lock_page(ia_page);
- return 0;
-}
-
-/*
- * We use the same basic approach as the old NTFS driver, i.e. we parse the
- * index root entries and then the index allocation entries that are marked
- * as in use in the index bitmap.
- *
- * While this will return the names in random order this doesn't matter for
- * ->readdir but OTOH results in a faster ->readdir.
- *
- * VFS calls ->readdir without BKL but with i_mutex held. This protects the VFS
- * parts (e.g. ->f_pos and ->i_size, and it also protects against directory
- * modifications).
- *
- * Locking: - Caller must hold i_mutex on the directory.
- * - Each page cache page in the index allocation mapping must be
- * locked whilst being accessed otherwise we may find a corrupt
- * page due to it being under ->writepage at the moment which
- * applies the mst protection fixups before writing out and then
- * removes them again after the write is complete after which it
- * unlocks the page.
- */
-static int ntfs_readdir(struct file *file, struct dir_context *actor)
-{
- s64 ia_pos, ia_start, prev_ia_pos, bmp_pos;
- loff_t i_size;
- struct inode *bmp_vi, *vdir = file_inode(file);
- struct super_block *sb = vdir->i_sb;
- ntfs_inode *ndir = NTFS_I(vdir);
- ntfs_volume *vol = NTFS_SB(sb);
- MFT_RECORD *m;
- INDEX_ROOT *ir = NULL;
- INDEX_ENTRY *ie;
- INDEX_ALLOCATION *ia;
- u8 *name = NULL;
- int rc, err, ir_pos, cur_bmp_pos;
- struct address_space *ia_mapping, *bmp_mapping;
- struct page *bmp_page = NULL, *ia_page = NULL;
- u8 *kaddr, *bmp, *index_end;
- ntfs_attr_search_ctx *ctx;
-
- ntfs_debug("Entering for inode 0x%lx, fpos 0x%llx.",
- vdir->i_ino, actor->pos);
- rc = err = 0;
- /* Are we at end of dir yet? */
- i_size = i_size_read(vdir);
- if (actor->pos >= i_size + vol->mft_record_size)
- return 0;
- /* Emulate . and .. for all directories. */
- if (!dir_emit_dots(file, actor))
- return 0;
- m = NULL;
- ctx = NULL;
- /*
- * Allocate a buffer to store the current name being processed
- * converted to format determined by current NLS.
- */
- name = kmalloc(NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1, GFP_NOFS);
- if (unlikely(!name)) {
- err = -ENOMEM;
- goto err_out;
- }
- /* Are we jumping straight into the index allocation attribute? */
- if (actor->pos >= vol->mft_record_size)
- goto skip_index_root;
- /* Get hold of the mft record for the directory. */
- m = map_mft_record(ndir);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(ndir, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- /* Get the offset into the index root attribute. */
- ir_pos = (s64)actor->pos;
- /* Find the index root attribute in the mft record. */
- err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
- 0, ctx);
- if (unlikely(err)) {
- ntfs_error(sb, "Index root attribute missing in directory "
- "inode 0x%lx.", vdir->i_ino);
- goto err_out;
- }
- /*
- * Copy the index root attribute value to a buffer so that we can put
- * the search context and unmap the mft record before calling the
- * filldir() callback. We need to do this because of NFSd which calls
- * ->lookup() from its filldir callback() and this causes NTFS to
- * deadlock as ntfs_lookup() maps the mft record of the directory and
- * we have got it mapped here already. The only solution is for us to
- * unmap the mft record here so that a call to ntfs_lookup() is able to
- * map the mft record without deadlocking.
- */
- rc = le32_to_cpu(ctx->attr->data.resident.value_length);
- ir = kmalloc(rc, GFP_NOFS);
- if (unlikely(!ir)) {
- err = -ENOMEM;
- goto err_out;
- }
- /* Copy the index root value (it has been verified in read_inode). */
- memcpy(ir, (u8*)ctx->attr +
- le16_to_cpu(ctx->attr->data.resident.value_offset), rc);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(ndir);
- ctx = NULL;
- m = NULL;
- index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
- /* The first index entry. */
- ie = (INDEX_ENTRY*)((u8*)&ir->index +
- le32_to_cpu(ir->index.entries_offset));
- /*
- * Loop until we exceed valid memory (corruption case) or until we
- * reach the last entry or until filldir tells us it has had enough
- * or signals an error (both covered by the rc test).
- */
- for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
- ntfs_debug("In index root, offset 0x%zx.", (u8*)ie - (u8*)ir);
- /* Bounds checks. */
- if (unlikely((u8*)ie < (u8*)ir || (u8*)ie +
- sizeof(INDEX_ENTRY_HEADER) > index_end ||
- (u8*)ie + le16_to_cpu(ie->key_length) >
- index_end))
- goto err_out;
- /* The last entry cannot contain a name. */
- if (ie->flags & INDEX_ENTRY_END)
- break;
- /* Skip index root entry if continuing previous readdir. */
- if (ir_pos > (u8*)ie - (u8*)ir)
- continue;
- /* Advance the position even if going to skip the entry. */
- actor->pos = (u8*)ie - (u8*)ir;
- /* Submit the name to the filldir callback. */
- rc = ntfs_filldir(vol, ndir, NULL, ie, name, actor);
- if (rc) {
- kfree(ir);
- goto abort;
- }
- }
- /* We are done with the index root and can free the buffer. */
- kfree(ir);
- ir = NULL;
- /* If there is no index allocation attribute we are finished. */
- if (!NInoIndexAllocPresent(ndir))
- goto EOD;
- /* Advance fpos to the beginning of the index allocation. */
- actor->pos = vol->mft_record_size;
-skip_index_root:
- kaddr = NULL;
- prev_ia_pos = -1LL;
- /* Get the offset into the index allocation attribute. */
- ia_pos = (s64)actor->pos - vol->mft_record_size;
- ia_mapping = vdir->i_mapping;
- ntfs_debug("Inode 0x%lx, getting index bitmap.", vdir->i_ino);
- bmp_vi = ntfs_attr_iget(vdir, AT_BITMAP, I30, 4);
- if (IS_ERR(bmp_vi)) {
- ntfs_error(sb, "Failed to get bitmap attribute.");
- err = PTR_ERR(bmp_vi);
- goto err_out;
- }
- bmp_mapping = bmp_vi->i_mapping;
- /* Get the starting bitmap bit position and sanity check it. */
- bmp_pos = ia_pos >> ndir->itype.index.block_size_bits;
- if (unlikely(bmp_pos >> 3 >= i_size_read(bmp_vi))) {
- ntfs_error(sb, "Current index allocation position exceeds "
- "index bitmap size.");
- goto iput_err_out;
- }
- /* Get the starting bit position in the current bitmap page. */
- cur_bmp_pos = bmp_pos & ((PAGE_SIZE * 8) - 1);
- bmp_pos &= ~(u64)((PAGE_SIZE * 8) - 1);
-get_next_bmp_page:
- ntfs_debug("Reading bitmap with page index 0x%llx, bit ofs 0x%llx",
- (unsigned long long)bmp_pos >> (3 + PAGE_SHIFT),
- (unsigned long long)bmp_pos &
- (unsigned long long)((PAGE_SIZE * 8) - 1));
- bmp_page = ntfs_map_page(bmp_mapping,
- bmp_pos >> (3 + PAGE_SHIFT));
- if (IS_ERR(bmp_page)) {
- ntfs_error(sb, "Reading index bitmap failed.");
- err = PTR_ERR(bmp_page);
- bmp_page = NULL;
- goto iput_err_out;
- }
- bmp = (u8*)page_address(bmp_page);
- /* Find next index block in use. */
- while (!(bmp[cur_bmp_pos >> 3] & (1 << (cur_bmp_pos & 7)))) {
-find_next_index_buffer:
- cur_bmp_pos++;
- /*
- * If we have reached the end of the bitmap page, get the next
- * page, and put away the old one.
- */
- if (unlikely((cur_bmp_pos >> 3) >= PAGE_SIZE)) {
- ntfs_unmap_page(bmp_page);
- bmp_pos += PAGE_SIZE * 8;
- cur_bmp_pos = 0;
- goto get_next_bmp_page;
- }
- /* If we have reached the end of the bitmap, we are done. */
- if (unlikely(((bmp_pos + cur_bmp_pos) >> 3) >= i_size))
- goto unm_EOD;
- ia_pos = (bmp_pos + cur_bmp_pos) <<
- ndir->itype.index.block_size_bits;
- }
- ntfs_debug("Handling index buffer 0x%llx.",
- (unsigned long long)bmp_pos + cur_bmp_pos);
- /* If the current index buffer is in the same page we reuse the page. */
- if ((prev_ia_pos & (s64)PAGE_MASK) !=
- (ia_pos & (s64)PAGE_MASK)) {
- prev_ia_pos = ia_pos;
- if (likely(ia_page != NULL)) {
- unlock_page(ia_page);
- ntfs_unmap_page(ia_page);
- }
- /*
- * Map the page cache page containing the current ia_pos,
- * reading it from disk if necessary.
- */
- ia_page = ntfs_map_page(ia_mapping, ia_pos >> PAGE_SHIFT);
- if (IS_ERR(ia_page)) {
- ntfs_error(sb, "Reading index allocation data failed.");
- err = PTR_ERR(ia_page);
- ia_page = NULL;
- goto err_out;
- }
- lock_page(ia_page);
- kaddr = (u8*)page_address(ia_page);
- }
- /* Get the current index buffer. */
- ia = (INDEX_ALLOCATION*)(kaddr + (ia_pos & ~PAGE_MASK &
- ~(s64)(ndir->itype.index.block_size - 1)));
- /* Bounds checks. */
- if (unlikely((u8*)ia < kaddr || (u8*)ia > kaddr + PAGE_SIZE)) {
- ntfs_error(sb, "Out of bounds check failed. Corrupt directory "
- "inode 0x%lx or driver bug.", vdir->i_ino);
- goto err_out;
- }
- /* Catch multi sector transfer fixup errors. */
- if (unlikely(!ntfs_is_indx_record(ia->magic))) {
- ntfs_error(sb, "Directory index record with vcn 0x%llx is "
- "corrupt. Corrupt inode 0x%lx. Run chkdsk.",
- (unsigned long long)ia_pos >>
- ndir->itype.index.vcn_size_bits, vdir->i_ino);
- goto err_out;
- }
- if (unlikely(sle64_to_cpu(ia->index_block_vcn) != (ia_pos &
- ~(s64)(ndir->itype.index.block_size - 1)) >>
- ndir->itype.index.vcn_size_bits)) {
- ntfs_error(sb, "Actual VCN (0x%llx) of index buffer is "
- "different from expected VCN (0x%llx). "
- "Directory inode 0x%lx is corrupt or driver "
- "bug. ", (unsigned long long)
- sle64_to_cpu(ia->index_block_vcn),
- (unsigned long long)ia_pos >>
- ndir->itype.index.vcn_size_bits, vdir->i_ino);
- goto err_out;
- }
- if (unlikely(le32_to_cpu(ia->index.allocated_size) + 0x18 !=
- ndir->itype.index.block_size)) {
- ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
- "0x%lx has a size (%u) differing from the "
- "directory specified size (%u). Directory "
- "inode is corrupt or driver bug.",
- (unsigned long long)ia_pos >>
- ndir->itype.index.vcn_size_bits, vdir->i_ino,
- le32_to_cpu(ia->index.allocated_size) + 0x18,
- ndir->itype.index.block_size);
- goto err_out;
- }
- index_end = (u8*)ia + ndir->itype.index.block_size;
- if (unlikely(index_end > kaddr + PAGE_SIZE)) {
- ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
- "0x%lx crosses page boundary. Impossible! "
- "Cannot access! This is probably a bug in the "
- "driver.", (unsigned long long)ia_pos >>
- ndir->itype.index.vcn_size_bits, vdir->i_ino);
- goto err_out;
- }
- ia_start = ia_pos & ~(s64)(ndir->itype.index.block_size - 1);
- index_end = (u8*)&ia->index + le32_to_cpu(ia->index.index_length);
- if (unlikely(index_end > (u8*)ia + ndir->itype.index.block_size)) {
- ntfs_error(sb, "Size of index buffer (VCN 0x%llx) of directory "
- "inode 0x%lx exceeds maximum size.",
- (unsigned long long)ia_pos >>
- ndir->itype.index.vcn_size_bits, vdir->i_ino);
- goto err_out;
- }
- /* The first index entry in this index buffer. */
- ie = (INDEX_ENTRY*)((u8*)&ia->index +
- le32_to_cpu(ia->index.entries_offset));
- /*
- * Loop until we exceed valid memory (corruption case) or until we
- * reach the last entry or until filldir tells us it has had enough
- * or signals an error (both covered by the rc test).
- */
- for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
- ntfs_debug("In index allocation, offset 0x%llx.",
- (unsigned long long)ia_start +
- (unsigned long long)((u8*)ie - (u8*)ia));
- /* Bounds checks. */
- if (unlikely((u8*)ie < (u8*)ia || (u8*)ie +
- sizeof(INDEX_ENTRY_HEADER) > index_end ||
- (u8*)ie + le16_to_cpu(ie->key_length) >
- index_end))
- goto err_out;
- /* The last entry cannot contain a name. */
- if (ie->flags & INDEX_ENTRY_END)
- break;
- /* Skip index block entry if continuing previous readdir. */
- if (ia_pos - ia_start > (u8*)ie - (u8*)ia)
- continue;
- /* Advance the position even if going to skip the entry. */
- actor->pos = (u8*)ie - (u8*)ia +
- (sle64_to_cpu(ia->index_block_vcn) <<
- ndir->itype.index.vcn_size_bits) +
- vol->mft_record_size;
- /*
- * Submit the name to the @filldir callback. Note,
- * ntfs_filldir() drops the lock on @ia_page but it retakes it
- * before returning, unless a non-zero value is returned in
- * which case the page is left unlocked.
- */
- rc = ntfs_filldir(vol, ndir, ia_page, ie, name, actor);
- if (rc) {
- /* @ia_page is already unlocked in this case. */
- ntfs_unmap_page(ia_page);
- ntfs_unmap_page(bmp_page);
- iput(bmp_vi);
- goto abort;
- }
- }
- goto find_next_index_buffer;
-unm_EOD:
- if (ia_page) {
- unlock_page(ia_page);
- ntfs_unmap_page(ia_page);
- }
- ntfs_unmap_page(bmp_page);
- iput(bmp_vi);
-EOD:
- /* We are finished, set fpos to EOD. */
- actor->pos = i_size + vol->mft_record_size;
-abort:
- kfree(name);
- return 0;
-err_out:
- if (bmp_page) {
- ntfs_unmap_page(bmp_page);
-iput_err_out:
- iput(bmp_vi);
- }
- if (ia_page) {
- unlock_page(ia_page);
- ntfs_unmap_page(ia_page);
- }
- kfree(ir);
- kfree(name);
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(ndir);
- if (!err)
- err = -EIO;
- ntfs_debug("Failed. Returning error code %i.", -err);
- return err;
-}
-
-/**
- * ntfs_dir_open - called when an inode is about to be opened
- * @vi: inode to be opened
- * @filp: file structure describing the inode
- *
- * Limit directory size to the page cache limit on architectures where unsigned
- * long is 32-bits. This is the most we can do for now without overflowing the
- * page cache page index. Doing it this way means we don't run into problems
- * because of existing too large directories. It would be better to allow the
- * user to read the accessible part of the directory but I doubt very much
- * anyone is going to hit this check on a 32-bit architecture, so there is no
- * point in adding the extra complexity required to support this.
- *
- * On 64-bit architectures, the check is hopefully optimized away by the
- * compiler.
- */
-static int ntfs_dir_open(struct inode *vi, struct file *filp)
-{
- if (sizeof(unsigned long) < 8) {
- if (i_size_read(vi) > MAX_LFS_FILESIZE)
- return -EFBIG;
- }
- return 0;
-}
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_dir_fsync - sync a directory to disk
- * @filp: directory to be synced
- * @start: offset in bytes of the beginning of data range to sync
- * @end: offset in bytes of the end of data range (inclusive)
- * @datasync: if non-zero only flush user data and not metadata
- *
- * Data integrity sync of a directory to disk. Used for fsync, fdatasync, and
- * msync system calls. This function is based on file.c::ntfs_file_fsync().
- *
- * Write the mft record and all associated extent mft records as well as the
- * $INDEX_ALLOCATION and $BITMAP attributes and then sync the block device.
- *
- * If @datasync is true, we do not wait on the inode(s) to be written out
- * but we always wait on the page cache pages to be written out.
- *
- * Note: In the past @filp could be NULL so we ignore it as we don't need it
- * anyway.
- *
- * Locking: Caller must hold i_mutex on the inode.
- *
- * TODO: We should probably also write all attribute/index inodes associated
- * with this inode but since we have no simple way of getting to them we ignore
- * this problem for now. We do write the $BITMAP attribute if it is present
- * which is the important one for a directory so things are not too bad.
- */
-static int ntfs_dir_fsync(struct file *filp, loff_t start, loff_t end,
- int datasync)
-{
- struct inode *bmp_vi, *vi = filp->f_mapping->host;
- int err, ret;
- ntfs_attr na;
-
- ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
-
- err = file_write_and_wait_range(filp, start, end);
- if (err)
- return err;
- inode_lock(vi);
-
- BUG_ON(!S_ISDIR(vi->i_mode));
- /* If the bitmap attribute inode is in memory sync it, too. */
- na.mft_no = vi->i_ino;
- na.type = AT_BITMAP;
- na.name = I30;
- na.name_len = 4;
- bmp_vi = ilookup5(vi->i_sb, vi->i_ino, ntfs_test_inode, &na);
- if (bmp_vi) {
- write_inode_now(bmp_vi, !datasync);
- iput(bmp_vi);
- }
- ret = __ntfs_write_inode(vi, 1);
- write_inode_now(vi, !datasync);
- err = sync_blockdev(vi->i_sb->s_bdev);
- if (unlikely(err && !ret))
- ret = err;
- if (likely(!ret))
- ntfs_debug("Done.");
- else
- ntfs_warning(vi->i_sb, "Failed to f%ssync inode 0x%lx. Error "
- "%u.", datasync ? "data" : "", vi->i_ino, -ret);
- inode_unlock(vi);
- return ret;
-}
-
-#endif /* NTFS_RW */
-
-WRAP_DIR_ITER(ntfs_readdir) // FIXME!
-const struct file_operations ntfs_dir_ops = {
- .llseek = generic_file_llseek, /* Seek inside directory. */
- .read = generic_read_dir, /* Return -EISDIR. */
- .iterate_shared = shared_ntfs_readdir, /* Read directory contents. */
-#ifdef NTFS_RW
- .fsync = ntfs_dir_fsync, /* Sync a directory to disk. */
-#endif /* NTFS_RW */
- /*.ioctl = ,*/ /* Perform function on the
- mounted filesystem. */
- .open = ntfs_dir_open, /* Open directory. */
-};
diff --git a/fs/ntfs/dir.h b/fs/ntfs/dir.h
deleted file mode 100644
index 0e32675..0000000
--- a/fs/ntfs/dir.h
+++ /dev/null
@@ -1,34 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * dir.h - Defines for directory handling in NTFS Linux kernel driver. Part of
- * the Linux-NTFS project.
- *
- * Copyright (c) 2002-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_DIR_H
-#define _LINUX_NTFS_DIR_H
-
-#include "layout.h"
-#include "inode.h"
-#include "types.h"
-
-/*
- * ntfs_name is used to return the file name to the caller of
- * ntfs_lookup_inode_by_name() in order for the caller (namei.c::ntfs_lookup())
- * to be able to deal with dcache aliasing issues.
- */
-typedef struct {
- MFT_REF mref;
- FILE_NAME_TYPE_FLAGS type;
- u8 len;
- ntfschar name[0];
-} __attribute__ ((__packed__)) ntfs_name;
-
-/* The little endian Unicode string $I30 as a global constant. */
-extern ntfschar I30[5];
-
-extern MFT_REF ntfs_lookup_inode_by_name(ntfs_inode *dir_ni,
- const ntfschar *uname, const int uname_len, ntfs_name **res);
-
-#endif /* _LINUX_NTFS_FS_DIR_H */
diff --git a/fs/ntfs/endian.h b/fs/ntfs/endian.h
deleted file mode 100644
index f30c139..0000000
--- a/fs/ntfs/endian.h
+++ /dev/null
@@ -1,79 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * endian.h - Defines for endianness handling in NTFS Linux kernel driver.
- * Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_ENDIAN_H
-#define _LINUX_NTFS_ENDIAN_H
-
-#include <asm/byteorder.h>
-#include "types.h"
-
-/*
- * Signed endianness conversion functions.
- */
-
-static inline s16 sle16_to_cpu(sle16 x)
-{
- return le16_to_cpu((__force le16)x);
-}
-
-static inline s32 sle32_to_cpu(sle32 x)
-{
- return le32_to_cpu((__force le32)x);
-}
-
-static inline s64 sle64_to_cpu(sle64 x)
-{
- return le64_to_cpu((__force le64)x);
-}
-
-static inline s16 sle16_to_cpup(sle16 *x)
-{
- return le16_to_cpu(*(__force le16*)x);
-}
-
-static inline s32 sle32_to_cpup(sle32 *x)
-{
- return le32_to_cpu(*(__force le32*)x);
-}
-
-static inline s64 sle64_to_cpup(sle64 *x)
-{
- return le64_to_cpu(*(__force le64*)x);
-}
-
-static inline sle16 cpu_to_sle16(s16 x)
-{
- return (__force sle16)cpu_to_le16(x);
-}
-
-static inline sle32 cpu_to_sle32(s32 x)
-{
- return (__force sle32)cpu_to_le32(x);
-}
-
-static inline sle64 cpu_to_sle64(s64 x)
-{
- return (__force sle64)cpu_to_le64(x);
-}
-
-static inline sle16 cpu_to_sle16p(s16 *x)
-{
- return (__force sle16)cpu_to_le16(*x);
-}
-
-static inline sle32 cpu_to_sle32p(s32 *x)
-{
- return (__force sle32)cpu_to_le32(*x);
-}
-
-static inline sle64 cpu_to_sle64p(s64 *x)
-{
- return (__force sle64)cpu_to_le64(*x);
-}
-
-#endif /* _LINUX_NTFS_ENDIAN_H */
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
deleted file mode 100644
index 297c0b9..0000000
--- a/fs/ntfs/file.c
+++ /dev/null
@@ -1,1997 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * file.c - NTFS kernel file operations. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2015 Anton Altaparmakov and Tuxera Inc.
- */
-
-#include <linux/blkdev.h>
-#include <linux/backing-dev.h>
-#include <linux/buffer_head.h>
-#include <linux/gfp.h>
-#include <linux/pagemap.h>
-#include <linux/pagevec.h>
-#include <linux/sched/signal.h>
-#include <linux/swap.h>
-#include <linux/uio.h>
-#include <linux/writeback.h>
-
-#include <asm/page.h>
-#include <linux/uaccess.h>
-
-#include "attrib.h"
-#include "bitmap.h"
-#include "inode.h"
-#include "debug.h"
-#include "lcnalloc.h"
-#include "malloc.h"
-#include "mft.h"
-#include "ntfs.h"
-
-/**
- * ntfs_file_open - called when an inode is about to be opened
- * @vi: inode to be opened
- * @filp: file structure describing the inode
- *
- * Limit file size to the page cache limit on architectures where unsigned long
- * is 32-bits. This is the most we can do for now without overflowing the page
- * cache page index. Doing it this way means we don't run into problems because
- * of existing too large files. It would be better to allow the user to read
- * the beginning of the file but I doubt very much anyone is going to hit this
- * check on a 32-bit architecture, so there is no point in adding the extra
- * complexity required to support this.
- *
- * On 64-bit architectures, the check is hopefully optimized away by the
- * compiler.
- *
- * After the check passes, just call generic_file_open() to do its work.
- */
-static int ntfs_file_open(struct inode *vi, struct file *filp)
-{
- if (sizeof(unsigned long) < 8) {
- if (i_size_read(vi) > MAX_LFS_FILESIZE)
- return -EOVERFLOW;
- }
- return generic_file_open(vi, filp);
-}
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_attr_extend_initialized - extend the initialized size of an attribute
- * @ni: ntfs inode of the attribute to extend
- * @new_init_size: requested new initialized size in bytes
- *
- * Extend the initialized size of an attribute described by the ntfs inode @ni
- * to @new_init_size bytes. This involves zeroing any non-sparse space between
- * the old initialized size and @new_init_size both in the page cache and on
- * disk (if relevant complete pages are already uptodate in the page cache then
- * these are simply marked dirty).
- *
- * As a side-effect, the file size (vfs inode->i_size) may be incremented as,
- * in the resident attribute case, it is tied to the initialized size and, in
- * the non-resident attribute case, it may not fall below the initialized size.
- *
- * Note that if the attribute is resident, we do not need to touch the page
- * cache at all. This is because if the page cache page is not uptodate we
- * bring it uptodate later, when doing the write to the mft record since we
- * then already have the page mapped. And if the page is uptodate, the
- * non-initialized region will already have been zeroed when the page was
- * brought uptodate and the region may in fact already have been overwritten
- * with new data via mmap() based writes, so we cannot just zero it. And since
- * POSIX specifies that the behaviour of resizing a file whilst it is mmap()ped
- * is unspecified, we choose not to do zeroing and thus we do not need to touch
- * the page at all. For a more detailed explanation see ntfs_truncate() in
- * fs/ntfs/inode.c.
- *
- * Return 0 on success and -errno on error. In the case that an error is
- * encountered it is possible that the initialized size will already have been
- * incremented some way towards @new_init_size but it is guaranteed that if
- * this is the case, the necessary zeroing will also have happened and that all
- * metadata is self-consistent.
- *
- * Locking: i_mutex on the vfs inode corrseponsind to the ntfs inode @ni must be
- * held by the caller.
- */
-static int ntfs_attr_extend_initialized(ntfs_inode *ni, const s64 new_init_size)
-{
- s64 old_init_size;
- loff_t old_i_size;
- pgoff_t index, end_index;
- unsigned long flags;
- struct inode *vi = VFS_I(ni);
- ntfs_inode *base_ni;
- MFT_RECORD *m = NULL;
- ATTR_RECORD *a;
- ntfs_attr_search_ctx *ctx = NULL;
- struct address_space *mapping;
- struct page *page = NULL;
- u8 *kattr;
- int err;
- u32 attr_len;
-
- read_lock_irqsave(&ni->size_lock, flags);
- old_init_size = ni->initialized_size;
- old_i_size = i_size_read(vi);
- BUG_ON(new_init_size > ni->allocated_size);
- read_unlock_irqrestore(&ni->size_lock, flags);
- ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, "
- "old_initialized_size 0x%llx, "
- "new_initialized_size 0x%llx, i_size 0x%llx.",
- vi->i_ino, (unsigned)le32_to_cpu(ni->type),
- (unsigned long long)old_init_size,
- (unsigned long long)new_init_size, old_i_size);
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- /* Use goto to reduce indentation and we need the label below anyway. */
- if (NInoNonResident(ni))
- goto do_non_resident_extend;
- BUG_ON(old_init_size != old_i_size);
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- err = -EIO;
- goto err_out;
- }
- m = ctx->mrec;
- a = ctx->attr;
- BUG_ON(a->non_resident);
- /* The total length of the attribute value. */
- attr_len = le32_to_cpu(a->data.resident.value_length);
- BUG_ON(old_i_size != (loff_t)attr_len);
- /*
- * Do the zeroing in the mft record and update the attribute size in
- * the mft record.
- */
- kattr = (u8*)a + le16_to_cpu(a->data.resident.value_offset);
- memset(kattr + attr_len, 0, new_init_size - attr_len);
- a->data.resident.value_length = cpu_to_le32((u32)new_init_size);
- /* Finally, update the sizes in the vfs and ntfs inodes. */
- write_lock_irqsave(&ni->size_lock, flags);
- i_size_write(vi, new_init_size);
- ni->initialized_size = new_init_size;
- write_unlock_irqrestore(&ni->size_lock, flags);
- goto done;
-do_non_resident_extend:
- /*
- * If the new initialized size @new_init_size exceeds the current file
- * size (vfs inode->i_size), we need to extend the file size to the
- * new initialized size.
- */
- if (new_init_size > old_i_size) {
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- err = -EIO;
- goto err_out;
- }
- m = ctx->mrec;
- a = ctx->attr;
- BUG_ON(!a->non_resident);
- BUG_ON(old_i_size != (loff_t)
- sle64_to_cpu(a->data.non_resident.data_size));
- a->data.non_resident.data_size = cpu_to_sle64(new_init_size);
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- /* Update the file size in the vfs inode. */
- i_size_write(vi, new_init_size);
- ntfs_attr_put_search_ctx(ctx);
- ctx = NULL;
- unmap_mft_record(base_ni);
- m = NULL;
- }
- mapping = vi->i_mapping;
- index = old_init_size >> PAGE_SHIFT;
- end_index = (new_init_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
- do {
- /*
- * Read the page. If the page is not present, this will zero
- * the uninitialized regions for us.
- */
- page = read_mapping_page(mapping, index, NULL);
- if (IS_ERR(page)) {
- err = PTR_ERR(page);
- goto init_err_out;
- }
- /*
- * Update the initialized size in the ntfs inode. This is
- * enough to make ntfs_writepage() work.
- */
- write_lock_irqsave(&ni->size_lock, flags);
- ni->initialized_size = (s64)(index + 1) << PAGE_SHIFT;
- if (ni->initialized_size > new_init_size)
- ni->initialized_size = new_init_size;
- write_unlock_irqrestore(&ni->size_lock, flags);
- /* Set the page dirty so it gets written out. */
- set_page_dirty(page);
- put_page(page);
- /*
- * Play nice with the vm and the rest of the system. This is
- * very much needed as we can potentially be modifying the
- * initialised size from a very small value to a really huge
- * value, e.g.
- * f = open(somefile, O_TRUNC);
- * truncate(f, 10GiB);
- * seek(f, 10GiB);
- * write(f, 1);
- * And this would mean we would be marking dirty hundreds of
- * thousands of pages or as in the above example more than
- * two and a half million pages!
- *
- * TODO: For sparse pages could optimize this workload by using
- * the FsMisc / MiscFs page bit as a "PageIsSparse" bit. This
- * would be set in read_folio for sparse pages and here we would
- * not need to mark dirty any pages which have this bit set.
- * The only caveat is that we have to clear the bit everywhere
- * where we allocate any clusters that lie in the page or that
- * contain the page.
- *
- * TODO: An even greater optimization would be for us to only
- * call read_folio() on pages which are not in sparse regions as
- * determined from the runlist. This would greatly reduce the
- * number of pages we read and make dirty in the case of sparse
- * files.
- */
- balance_dirty_pages_ratelimited(mapping);
- cond_resched();
- } while (++index < end_index);
- read_lock_irqsave(&ni->size_lock, flags);
- BUG_ON(ni->initialized_size != new_init_size);
- read_unlock_irqrestore(&ni->size_lock, flags);
- /* Now bring in sync the initialized_size in the mft record. */
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- goto init_err_out;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto init_err_out;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- err = -EIO;
- goto init_err_out;
- }
- m = ctx->mrec;
- a = ctx->attr;
- BUG_ON(!a->non_resident);
- a->data.non_resident.initialized_size = cpu_to_sle64(new_init_size);
-done:
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(base_ni);
- ntfs_debug("Done, initialized_size 0x%llx, i_size 0x%llx.",
- (unsigned long long)new_init_size, i_size_read(vi));
- return 0;
-init_err_out:
- write_lock_irqsave(&ni->size_lock, flags);
- ni->initialized_size = old_init_size;
- write_unlock_irqrestore(&ni->size_lock, flags);
-err_out:
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(base_ni);
- ntfs_debug("Failed. Returning error code %i.", err);
- return err;
-}
-
-static ssize_t ntfs_prepare_file_for_write(struct kiocb *iocb,
- struct iov_iter *from)
-{
- loff_t pos;
- s64 end, ll;
- ssize_t err;
- unsigned long flags;
- struct file *file = iocb->ki_filp;
- struct inode *vi = file_inode(file);
- ntfs_inode *ni = NTFS_I(vi);
- ntfs_volume *vol = ni->vol;
-
- ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, pos "
- "0x%llx, count 0x%zx.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type),
- (unsigned long long)iocb->ki_pos,
- iov_iter_count(from));
- err = generic_write_checks(iocb, from);
- if (unlikely(err <= 0))
- goto out;
- /*
- * All checks have passed. Before we start doing any writing we want
- * to abort any totally illegal writes.
- */
- BUG_ON(NInoMstProtected(ni));
- BUG_ON(ni->type != AT_DATA);
- /* If file is encrypted, deny access, just like NT4. */
- if (NInoEncrypted(ni)) {
- /* Only $DATA attributes can be encrypted. */
- /*
- * Reminder for later: Encrypted files are _always_
- * non-resident so that the content can always be encrypted.
- */
- ntfs_debug("Denying write access to encrypted file.");
- err = -EACCES;
- goto out;
- }
- if (NInoCompressed(ni)) {
- /* Only unnamed $DATA attribute can be compressed. */
- BUG_ON(ni->name_len);
- /*
- * Reminder for later: If resident, the data is not actually
- * compressed. Only on the switch to non-resident does
- * compression kick in. This is in contrast to encrypted files
- * (see above).
- */
- ntfs_error(vi->i_sb, "Writing to compressed files is not "
- "implemented yet. Sorry.");
- err = -EOPNOTSUPP;
- goto out;
- }
- err = file_remove_privs(file);
- if (unlikely(err))
- goto out;
- /*
- * Our ->update_time method always succeeds thus file_update_time()
- * cannot fail either so there is no need to check the return code.
- */
- file_update_time(file);
- pos = iocb->ki_pos;
- /* The first byte after the last cluster being written to. */
- end = (pos + iov_iter_count(from) + vol->cluster_size_mask) &
- ~(u64)vol->cluster_size_mask;
- /*
- * If the write goes beyond the allocated size, extend the allocation
- * to cover the whole of the write, rounded up to the nearest cluster.
- */
- read_lock_irqsave(&ni->size_lock, flags);
- ll = ni->allocated_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (end > ll) {
- /*
- * Extend the allocation without changing the data size.
- *
- * Note we ensure the allocation is big enough to at least
- * write some data but we do not require the allocation to be
- * complete, i.e. it may be partial.
- */
- ll = ntfs_attr_extend_allocation(ni, end, -1, pos);
- if (likely(ll >= 0)) {
- BUG_ON(pos >= ll);
- /* If the extension was partial truncate the write. */
- if (end > ll) {
- ntfs_debug("Truncating write to inode 0x%lx, "
- "attribute type 0x%x, because "
- "the allocation was only "
- "partially extended.",
- vi->i_ino, (unsigned)
- le32_to_cpu(ni->type));
- iov_iter_truncate(from, ll - pos);
- }
- } else {
- err = ll;
- read_lock_irqsave(&ni->size_lock, flags);
- ll = ni->allocated_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- /* Perform a partial write if possible or fail. */
- if (pos < ll) {
- ntfs_debug("Truncating write to inode 0x%lx "
- "attribute type 0x%x, because "
- "extending the allocation "
- "failed (error %d).",
- vi->i_ino, (unsigned)
- le32_to_cpu(ni->type),
- (int)-err);
- iov_iter_truncate(from, ll - pos);
- } else {
- if (err != -ENOSPC)
- ntfs_error(vi->i_sb, "Cannot perform "
- "write to inode "
- "0x%lx, attribute "
- "type 0x%x, because "
- "extending the "
- "allocation failed "
- "(error %ld).",
- vi->i_ino, (unsigned)
- le32_to_cpu(ni->type),
- (long)-err);
- else
- ntfs_debug("Cannot perform write to "
- "inode 0x%lx, "
- "attribute type 0x%x, "
- "because there is not "
- "space left.",
- vi->i_ino, (unsigned)
- le32_to_cpu(ni->type));
- goto out;
- }
- }
- }
- /*
- * If the write starts beyond the initialized size, extend it up to the
- * beginning of the write and initialize all non-sparse space between
- * the old initialized size and the new one. This automatically also
- * increments the vfs inode->i_size to keep it above or equal to the
- * initialized_size.
- */
- read_lock_irqsave(&ni->size_lock, flags);
- ll = ni->initialized_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (pos > ll) {
- /*
- * Wait for ongoing direct i/o to complete before proceeding.
- * New direct i/o cannot start as we hold i_mutex.
- */
- inode_dio_wait(vi);
- err = ntfs_attr_extend_initialized(ni, pos);
- if (unlikely(err < 0))
- ntfs_error(vi->i_sb, "Cannot perform write to inode "
- "0x%lx, attribute type 0x%x, because "
- "extending the initialized size "
- "failed (error %d).", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type),
- (int)-err);
- }
-out:
- return err;
-}
-
-/**
- * __ntfs_grab_cache_pages - obtain a number of locked pages
- * @mapping: address space mapping from which to obtain page cache pages
- * @index: starting index in @mapping at which to begin obtaining pages
- * @nr_pages: number of page cache pages to obtain
- * @pages: array of pages in which to return the obtained page cache pages
- * @cached_page: allocated but as yet unused page
- *
- * Obtain @nr_pages locked page cache pages from the mapping @mapping and
- * starting at index @index.
- *
- * If a page is newly created, add it to lru list
- *
- * Note, the page locks are obtained in ascending page index order.
- */
-static inline int __ntfs_grab_cache_pages(struct address_space *mapping,
- pgoff_t index, const unsigned nr_pages, struct page **pages,
- struct page **cached_page)
-{
- int err, nr;
-
- BUG_ON(!nr_pages);
- err = nr = 0;
- do {
- pages[nr] = find_get_page_flags(mapping, index, FGP_LOCK |
- FGP_ACCESSED);
- if (!pages[nr]) {
- if (!*cached_page) {
- *cached_page = page_cache_alloc(mapping);
- if (unlikely(!*cached_page)) {
- err = -ENOMEM;
- goto err_out;
- }
- }
- err = add_to_page_cache_lru(*cached_page, mapping,
- index,
- mapping_gfp_constraint(mapping, GFP_KERNEL));
- if (unlikely(err)) {
- if (err == -EEXIST)
- continue;
- goto err_out;
- }
- pages[nr] = *cached_page;
- *cached_page = NULL;
- }
- index++;
- nr++;
- } while (nr < nr_pages);
-out:
- return err;
-err_out:
- while (nr > 0) {
- unlock_page(pages[--nr]);
- put_page(pages[nr]);
- }
- goto out;
-}
-
-static inline void ntfs_submit_bh_for_read(struct buffer_head *bh)
-{
- lock_buffer(bh);
- get_bh(bh);
- bh->b_end_io = end_buffer_read_sync;
- submit_bh(REQ_OP_READ, bh);
-}
-
-/**
- * ntfs_prepare_pages_for_non_resident_write - prepare pages for receiving data
- * @pages: array of destination pages
- * @nr_pages: number of pages in @pages
- * @pos: byte position in file at which the write begins
- * @bytes: number of bytes to be written
- *
- * This is called for non-resident attributes from ntfs_file_buffered_write()
- * with i_mutex held on the inode (@pages[0]->mapping->host). There are
- * @nr_pages pages in @pages which are locked but not kmap()ped. The source
- * data has not yet been copied into the @pages.
- *
- * Need to fill any holes with actual clusters, allocate buffers if necessary,
- * ensure all the buffers are mapped, and bring uptodate any buffers that are
- * only partially being written to.
- *
- * If @nr_pages is greater than one, we are guaranteed that the cluster size is
- * greater than PAGE_SIZE, that all pages in @pages are entirely inside
- * the same cluster and that they are the entirety of that cluster, and that
- * the cluster is sparse, i.e. we need to allocate a cluster to fill the hole.
- *
- * i_size is not to be modified yet.
- *
- * Return 0 on success or -errno on error.
- */
-static int ntfs_prepare_pages_for_non_resident_write(struct page **pages,
- unsigned nr_pages, s64 pos, size_t bytes)
-{
- VCN vcn, highest_vcn = 0, cpos, cend, bh_cpos, bh_cend;
- LCN lcn;
- s64 bh_pos, vcn_len, end, initialized_size;
- sector_t lcn_block;
- struct folio *folio;
- struct inode *vi;
- ntfs_inode *ni, *base_ni = NULL;
- ntfs_volume *vol;
- runlist_element *rl, *rl2;
- struct buffer_head *bh, *head, *wait[2], **wait_bh = wait;
- ntfs_attr_search_ctx *ctx = NULL;
- MFT_RECORD *m = NULL;
- ATTR_RECORD *a = NULL;
- unsigned long flags;
- u32 attr_rec_len = 0;
- unsigned blocksize, u;
- int err, mp_size;
- bool rl_write_locked, was_hole, is_retry;
- unsigned char blocksize_bits;
- struct {
- u8 runlist_merged:1;
- u8 mft_attr_mapped:1;
- u8 mp_rebuilt:1;
- u8 attr_switched:1;
- } status = { 0, 0, 0, 0 };
-
- BUG_ON(!nr_pages);
- BUG_ON(!pages);
- BUG_ON(!*pages);
- vi = pages[0]->mapping->host;
- ni = NTFS_I(vi);
- vol = ni->vol;
- ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, start page "
- "index 0x%lx, nr_pages 0x%x, pos 0x%llx, bytes 0x%zx.",
- vi->i_ino, ni->type, pages[0]->index, nr_pages,
- (long long)pos, bytes);
- blocksize = vol->sb->s_blocksize;
- blocksize_bits = vol->sb->s_blocksize_bits;
- rl_write_locked = false;
- rl = NULL;
- err = 0;
- vcn = lcn = -1;
- vcn_len = 0;
- lcn_block = -1;
- was_hole = false;
- cpos = pos >> vol->cluster_size_bits;
- end = pos + bytes;
- cend = (end + vol->cluster_size - 1) >> vol->cluster_size_bits;
- /*
- * Loop over each buffer in each folio. Use goto to
- * reduce indentation.
- */
- u = 0;
-do_next_folio:
- folio = page_folio(pages[u]);
- bh_pos = folio_pos(folio);
- head = folio_buffers(folio);
- if (!head)
- /*
- * create_empty_buffers() will create uptodate/dirty
- * buffers if the folio is uptodate/dirty.
- */
- head = create_empty_buffers(folio, blocksize, 0);
- bh = head;
- do {
- VCN cdelta;
- s64 bh_end;
- unsigned bh_cofs;
-
- /* Clear buffer_new on all buffers to reinitialise state. */
- if (buffer_new(bh))
- clear_buffer_new(bh);
- bh_end = bh_pos + blocksize;
- bh_cpos = bh_pos >> vol->cluster_size_bits;
- bh_cofs = bh_pos & vol->cluster_size_mask;
- if (buffer_mapped(bh)) {
- /*
- * The buffer is already mapped. If it is uptodate,
- * ignore it.
- */
- if (buffer_uptodate(bh))
- continue;
- /*
- * The buffer is not uptodate. If the folio is uptodate
- * set the buffer uptodate and otherwise ignore it.
- */
- if (folio_test_uptodate(folio)) {
- set_buffer_uptodate(bh);
- continue;
- }
- /*
- * Neither the folio nor the buffer are uptodate. If
- * the buffer is only partially being written to, we
- * need to read it in before the write, i.e. now.
- */
- if ((bh_pos < pos && bh_end > pos) ||
- (bh_pos < end && bh_end > end)) {
- /*
- * If the buffer is fully or partially within
- * the initialized size, do an actual read.
- * Otherwise, simply zero the buffer.
- */
- read_lock_irqsave(&ni->size_lock, flags);
- initialized_size = ni->initialized_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (bh_pos < initialized_size) {
- ntfs_submit_bh_for_read(bh);
- *wait_bh++ = bh;
- } else {
- folio_zero_range(folio, bh_offset(bh),
- blocksize);
- set_buffer_uptodate(bh);
- }
- }
- continue;
- }
- /* Unmapped buffer. Need to map it. */
- bh->b_bdev = vol->sb->s_bdev;
- /*
- * If the current buffer is in the same clusters as the map
- * cache, there is no need to check the runlist again. The
- * map cache is made up of @vcn, which is the first cached file
- * cluster, @vcn_len which is the number of cached file
- * clusters, @lcn is the device cluster corresponding to @vcn,
- * and @lcn_block is the block number corresponding to @lcn.
- */
- cdelta = bh_cpos - vcn;
- if (likely(!cdelta || (cdelta > 0 && cdelta < vcn_len))) {
-map_buffer_cached:
- BUG_ON(lcn < 0);
- bh->b_blocknr = lcn_block +
- (cdelta << (vol->cluster_size_bits -
- blocksize_bits)) +
- (bh_cofs >> blocksize_bits);
- set_buffer_mapped(bh);
- /*
- * If the folio is uptodate so is the buffer. If the
- * buffer is fully outside the write, we ignore it if
- * it was already allocated and we mark it dirty so it
- * gets written out if we allocated it. On the other
- * hand, if we allocated the buffer but we are not
- * marking it dirty we set buffer_new so we can do
- * error recovery.
- */
- if (folio_test_uptodate(folio)) {
- if (!buffer_uptodate(bh))
- set_buffer_uptodate(bh);
- if (unlikely(was_hole)) {
- /* We allocated the buffer. */
- clean_bdev_bh_alias(bh);
- if (bh_end <= pos || bh_pos >= end)
- mark_buffer_dirty(bh);
- else
- set_buffer_new(bh);
- }
- continue;
- }
- /* Page is _not_ uptodate. */
- if (likely(!was_hole)) {
- /*
- * Buffer was already allocated. If it is not
- * uptodate and is only partially being written
- * to, we need to read it in before the write,
- * i.e. now.
- */
- if (!buffer_uptodate(bh) && bh_pos < end &&
- bh_end > pos &&
- (bh_pos < pos ||
- bh_end > end)) {
- /*
- * If the buffer is fully or partially
- * within the initialized size, do an
- * actual read. Otherwise, simply zero
- * the buffer.
- */
- read_lock_irqsave(&ni->size_lock,
- flags);
- initialized_size = ni->initialized_size;
- read_unlock_irqrestore(&ni->size_lock,
- flags);
- if (bh_pos < initialized_size) {
- ntfs_submit_bh_for_read(bh);
- *wait_bh++ = bh;
- } else {
- folio_zero_range(folio,
- bh_offset(bh),
- blocksize);
- set_buffer_uptodate(bh);
- }
- }
- continue;
- }
- /* We allocated the buffer. */
- clean_bdev_bh_alias(bh);
- /*
- * If the buffer is fully outside the write, zero it,
- * set it uptodate, and mark it dirty so it gets
- * written out. If it is partially being written to,
- * zero region surrounding the write but leave it to
- * commit write to do anything else. Finally, if the
- * buffer is fully being overwritten, do nothing.
- */
- if (bh_end <= pos || bh_pos >= end) {
- if (!buffer_uptodate(bh)) {
- folio_zero_range(folio, bh_offset(bh),
- blocksize);
- set_buffer_uptodate(bh);
- }
- mark_buffer_dirty(bh);
- continue;
- }
- set_buffer_new(bh);
- if (!buffer_uptodate(bh) &&
- (bh_pos < pos || bh_end > end)) {
- u8 *kaddr;
- unsigned pofs;
-
- kaddr = kmap_local_folio(folio, 0);
- if (bh_pos < pos) {
- pofs = bh_pos & ~PAGE_MASK;
- memset(kaddr + pofs, 0, pos - bh_pos);
- }
- if (bh_end > end) {
- pofs = end & ~PAGE_MASK;
- memset(kaddr + pofs, 0, bh_end - end);
- }
- kunmap_local(kaddr);
- flush_dcache_folio(folio);
- }
- continue;
- }
- /*
- * Slow path: this is the first buffer in the cluster. If it
- * is outside allocated size and is not uptodate, zero it and
- * set it uptodate.
- */
- read_lock_irqsave(&ni->size_lock, flags);
- initialized_size = ni->allocated_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (bh_pos > initialized_size) {
- if (folio_test_uptodate(folio)) {
- if (!buffer_uptodate(bh))
- set_buffer_uptodate(bh);
- } else if (!buffer_uptodate(bh)) {
- folio_zero_range(folio, bh_offset(bh),
- blocksize);
- set_buffer_uptodate(bh);
- }
- continue;
- }
- is_retry = false;
- if (!rl) {
- down_read(&ni->runlist.lock);
-retry_remap:
- rl = ni->runlist.rl;
- }
- if (likely(rl != NULL)) {
- /* Seek to element containing target cluster. */
- while (rl->length && rl[1].vcn <= bh_cpos)
- rl++;
- lcn = ntfs_rl_vcn_to_lcn(rl, bh_cpos);
- if (likely(lcn >= 0)) {
- /*
- * Successful remap, setup the map cache and
- * use that to deal with the buffer.
- */
- was_hole = false;
- vcn = bh_cpos;
- vcn_len = rl[1].vcn - vcn;
- lcn_block = lcn << (vol->cluster_size_bits -
- blocksize_bits);
- cdelta = 0;
- /*
- * If the number of remaining clusters touched
- * by the write is smaller or equal to the
- * number of cached clusters, unlock the
- * runlist as the map cache will be used from
- * now on.
- */
- if (likely(vcn + vcn_len >= cend)) {
- if (rl_write_locked) {
- up_write(&ni->runlist.lock);
- rl_write_locked = false;
- } else
- up_read(&ni->runlist.lock);
- rl = NULL;
- }
- goto map_buffer_cached;
- }
- } else
- lcn = LCN_RL_NOT_MAPPED;
- /*
- * If it is not a hole and not out of bounds, the runlist is
- * probably unmapped so try to map it now.
- */
- if (unlikely(lcn != LCN_HOLE && lcn != LCN_ENOENT)) {
- if (likely(!is_retry && lcn == LCN_RL_NOT_MAPPED)) {
- /* Attempt to map runlist. */
- if (!rl_write_locked) {
- /*
- * We need the runlist locked for
- * writing, so if it is locked for
- * reading relock it now and retry in
- * case it changed whilst we dropped
- * the lock.
- */
- up_read(&ni->runlist.lock);
- down_write(&ni->runlist.lock);
- rl_write_locked = true;
- goto retry_remap;
- }
- err = ntfs_map_runlist_nolock(ni, bh_cpos,
- NULL);
- if (likely(!err)) {
- is_retry = true;
- goto retry_remap;
- }
- /*
- * If @vcn is out of bounds, pretend @lcn is
- * LCN_ENOENT. As long as the buffer is out
- * of bounds this will work fine.
- */
- if (err == -ENOENT) {
- lcn = LCN_ENOENT;
- err = 0;
- goto rl_not_mapped_enoent;
- }
- } else
- err = -EIO;
- /* Failed to map the buffer, even after retrying. */
- bh->b_blocknr = -1;
- ntfs_error(vol->sb, "Failed to write to inode 0x%lx, "
- "attribute type 0x%x, vcn 0x%llx, "
- "vcn offset 0x%x, because its "
- "location on disk could not be "
- "determined%s (error code %i).",
- ni->mft_no, ni->type,
- (unsigned long long)bh_cpos,
- (unsigned)bh_pos &
- vol->cluster_size_mask,
- is_retry ? " even after retrying" : "",
- err);
- break;
- }
-rl_not_mapped_enoent:
- /*
- * The buffer is in a hole or out of bounds. We need to fill
- * the hole, unless the buffer is in a cluster which is not
- * touched by the write, in which case we just leave the buffer
- * unmapped. This can only happen when the cluster size is
- * less than the page cache size.
- */
- if (unlikely(vol->cluster_size < PAGE_SIZE)) {
- bh_cend = (bh_end + vol->cluster_size - 1) >>
- vol->cluster_size_bits;
- if ((bh_cend <= cpos || bh_cpos >= cend)) {
- bh->b_blocknr = -1;
- /*
- * If the buffer is uptodate we skip it. If it
- * is not but the folio is uptodate, we can set
- * the buffer uptodate. If the folio is not
- * uptodate, we can clear the buffer and set it
- * uptodate. Whether this is worthwhile is
- * debatable and this could be removed.
- */
- if (folio_test_uptodate(folio)) {
- if (!buffer_uptodate(bh))
- set_buffer_uptodate(bh);
- } else if (!buffer_uptodate(bh)) {
- folio_zero_range(folio, bh_offset(bh),
- blocksize);
- set_buffer_uptodate(bh);
- }
- continue;
- }
- }
- /*
- * Out of bounds buffer is invalid if it was not really out of
- * bounds.
- */
- BUG_ON(lcn != LCN_HOLE);
- /*
- * We need the runlist locked for writing, so if it is locked
- * for reading relock it now and retry in case it changed
- * whilst we dropped the lock.
- */
- BUG_ON(!rl);
- if (!rl_write_locked) {
- up_read(&ni->runlist.lock);
- down_write(&ni->runlist.lock);
- rl_write_locked = true;
- goto retry_remap;
- }
- /* Find the previous last allocated cluster. */
- BUG_ON(rl->lcn != LCN_HOLE);
- lcn = -1;
- rl2 = rl;
- while (--rl2 >= ni->runlist.rl) {
- if (rl2->lcn >= 0) {
- lcn = rl2->lcn + rl2->length;
- break;
- }
- }
- rl2 = ntfs_cluster_alloc(vol, bh_cpos, 1, lcn, DATA_ZONE,
- false);
- if (IS_ERR(rl2)) {
- err = PTR_ERR(rl2);
- ntfs_debug("Failed to allocate cluster, error code %i.",
- err);
- break;
- }
- lcn = rl2->lcn;
- rl = ntfs_runlists_merge(ni->runlist.rl, rl2);
- if (IS_ERR(rl)) {
- err = PTR_ERR(rl);
- if (err != -ENOMEM)
- err = -EIO;
- if (ntfs_cluster_free_from_rl(vol, rl2)) {
- ntfs_error(vol->sb, "Failed to release "
- "allocated cluster in error "
- "code path. Run chkdsk to "
- "recover the lost cluster.");
- NVolSetErrors(vol);
- }
- ntfs_free(rl2);
- break;
- }
- ni->runlist.rl = rl;
- status.runlist_merged = 1;
- ntfs_debug("Allocated cluster, lcn 0x%llx.",
- (unsigned long long)lcn);
- /* Map and lock the mft record and get the attribute record. */
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- break;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- unmap_mft_record(base_ni);
- break;
- }
- status.mft_attr_mapped = 1;
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, bh_cpos, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- err = -EIO;
- break;
- }
- m = ctx->mrec;
- a = ctx->attr;
- /*
- * Find the runlist element with which the attribute extent
- * starts. Note, we cannot use the _attr_ version because we
- * have mapped the mft record. That is ok because we know the
- * runlist fragment must be mapped already to have ever gotten
- * here, so we can just use the _rl_ version.
- */
- vcn = sle64_to_cpu(a->data.non_resident.lowest_vcn);
- rl2 = ntfs_rl_find_vcn_nolock(rl, vcn);
- BUG_ON(!rl2);
- BUG_ON(!rl2->length);
- BUG_ON(rl2->lcn < LCN_HOLE);
- highest_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn);
- /*
- * If @highest_vcn is zero, calculate the real highest_vcn
- * (which can really be zero).
- */
- if (!highest_vcn)
- highest_vcn = (sle64_to_cpu(
- a->data.non_resident.allocated_size) >>
- vol->cluster_size_bits) - 1;
- /*
- * Determine the size of the mapping pairs array for the new
- * extent, i.e. the old extent with the hole filled.
- */
- mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, vcn,
- highest_vcn);
- if (unlikely(mp_size <= 0)) {
- if (!(err = mp_size))
- err = -EIO;
- ntfs_debug("Failed to get size for mapping pairs "
- "array, error code %i.", err);
- break;
- }
- /*
- * Resize the attribute record to fit the new mapping pairs
- * array.
- */
- attr_rec_len = le32_to_cpu(a->length);
- err = ntfs_attr_record_resize(m, a, mp_size + le16_to_cpu(
- a->data.non_resident.mapping_pairs_offset));
- if (unlikely(err)) {
- BUG_ON(err != -ENOSPC);
- // TODO: Deal with this by using the current attribute
- // and fill it with as much of the mapping pairs
- // array as possible. Then loop over each attribute
- // extent rewriting the mapping pairs arrays as we go
- // along and if when we reach the end we have not
- // enough space, try to resize the last attribute
- // extent and if even that fails, add a new attribute
- // extent.
- // We could also try to resize at each step in the hope
- // that we will not need to rewrite every single extent.
- // Note, we may need to decompress some extents to fill
- // the runlist as we are walking the extents...
- ntfs_error(vol->sb, "Not enough space in the mft "
- "record for the extended attribute "
- "record. This case is not "
- "implemented yet.");
- err = -EOPNOTSUPP;
- break ;
- }
- status.mp_rebuilt = 1;
- /*
- * Generate the mapping pairs array directly into the attribute
- * record.
- */
- err = ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu(
- a->data.non_resident.mapping_pairs_offset),
- mp_size, rl2, vcn, highest_vcn, NULL);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Cannot fill hole in inode 0x%lx, "
- "attribute type 0x%x, because building "
- "the mapping pairs failed with error "
- "code %i.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), err);
- err = -EIO;
- break;
- }
- /* Update the highest_vcn but only if it was not set. */
- if (unlikely(!a->data.non_resident.highest_vcn))
- a->data.non_resident.highest_vcn =
- cpu_to_sle64(highest_vcn);
- /*
- * If the attribute is sparse/compressed, update the compressed
- * size in the ntfs_inode structure and the attribute record.
- */
- if (likely(NInoSparse(ni) || NInoCompressed(ni))) {
- /*
- * If we are not in the first attribute extent, switch
- * to it, but first ensure the changes will make it to
- * disk later.
- */
- if (a->data.non_resident.lowest_vcn) {
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_reinit_search_ctx(ctx);
- err = ntfs_attr_lookup(ni->type, ni->name,
- ni->name_len, CASE_SENSITIVE,
- 0, NULL, 0, ctx);
- if (unlikely(err)) {
- status.attr_switched = 1;
- break;
- }
- /* @m is not used any more so do not set it. */
- a = ctx->attr;
- }
- write_lock_irqsave(&ni->size_lock, flags);
- ni->itype.compressed.size += vol->cluster_size;
- a->data.non_resident.compressed_size =
- cpu_to_sle64(ni->itype.compressed.size);
- write_unlock_irqrestore(&ni->size_lock, flags);
- }
- /* Ensure the changes make it to disk. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- /* Successfully filled the hole. */
- status.runlist_merged = 0;
- status.mft_attr_mapped = 0;
- status.mp_rebuilt = 0;
- /* Setup the map cache and use that to deal with the buffer. */
- was_hole = true;
- vcn = bh_cpos;
- vcn_len = 1;
- lcn_block = lcn << (vol->cluster_size_bits - blocksize_bits);
- cdelta = 0;
- /*
- * If the number of remaining clusters in the @pages is smaller
- * or equal to the number of cached clusters, unlock the
- * runlist as the map cache will be used from now on.
- */
- if (likely(vcn + vcn_len >= cend)) {
- up_write(&ni->runlist.lock);
- rl_write_locked = false;
- rl = NULL;
- }
- goto map_buffer_cached;
- } while (bh_pos += blocksize, (bh = bh->b_this_page) != head);
- /* If there are no errors, do the next page. */
- if (likely(!err && ++u < nr_pages))
- goto do_next_folio;
- /* If there are no errors, release the runlist lock if we took it. */
- if (likely(!err)) {
- if (unlikely(rl_write_locked)) {
- up_write(&ni->runlist.lock);
- rl_write_locked = false;
- } else if (unlikely(rl))
- up_read(&ni->runlist.lock);
- rl = NULL;
- }
- /* If we issued read requests, let them complete. */
- read_lock_irqsave(&ni->size_lock, flags);
- initialized_size = ni->initialized_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- while (wait_bh > wait) {
- bh = *--wait_bh;
- wait_on_buffer(bh);
- if (likely(buffer_uptodate(bh))) {
- folio = bh->b_folio;
- bh_pos = folio_pos(folio) + bh_offset(bh);
- /*
- * If the buffer overflows the initialized size, need
- * to zero the overflowing region.
- */
- if (unlikely(bh_pos + blocksize > initialized_size)) {
- int ofs = 0;
-
- if (likely(bh_pos < initialized_size))
- ofs = initialized_size - bh_pos;
- folio_zero_segment(folio, bh_offset(bh) + ofs,
- blocksize);
- }
- } else /* if (unlikely(!buffer_uptodate(bh))) */
- err = -EIO;
- }
- if (likely(!err)) {
- /* Clear buffer_new on all buffers. */
- u = 0;
- do {
- bh = head = page_buffers(pages[u]);
- do {
- if (buffer_new(bh))
- clear_buffer_new(bh);
- } while ((bh = bh->b_this_page) != head);
- } while (++u < nr_pages);
- ntfs_debug("Done.");
- return err;
- }
- if (status.attr_switched) {
- /* Get back to the attribute extent we modified. */
- ntfs_attr_reinit_search_ctx(ctx);
- if (ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, bh_cpos, NULL, 0, ctx)) {
- ntfs_error(vol->sb, "Failed to find required "
- "attribute extent of attribute in "
- "error code path. Run chkdsk to "
- "recover.");
- write_lock_irqsave(&ni->size_lock, flags);
- ni->itype.compressed.size += vol->cluster_size;
- write_unlock_irqrestore(&ni->size_lock, flags);
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- /*
- * The only thing that is now wrong is the compressed
- * size of the base attribute extent which chkdsk
- * should be able to fix.
- */
- NVolSetErrors(vol);
- } else {
- m = ctx->mrec;
- a = ctx->attr;
- status.attr_switched = 0;
- }
- }
- /*
- * If the runlist has been modified, need to restore it by punching a
- * hole into it and we then need to deallocate the on-disk cluster as
- * well. Note, we only modify the runlist if we are able to generate a
- * new mapping pairs array, i.e. only when the mapped attribute extent
- * is not switched.
- */
- if (status.runlist_merged && !status.attr_switched) {
- BUG_ON(!rl_write_locked);
- /* Make the file cluster we allocated sparse in the runlist. */
- if (ntfs_rl_punch_nolock(vol, &ni->runlist, bh_cpos, 1)) {
- ntfs_error(vol->sb, "Failed to punch hole into "
- "attribute runlist in error code "
- "path. Run chkdsk to recover the "
- "lost cluster.");
- NVolSetErrors(vol);
- } else /* if (success) */ {
- status.runlist_merged = 0;
- /*
- * Deallocate the on-disk cluster we allocated but only
- * if we succeeded in punching its vcn out of the
- * runlist.
- */
- down_write(&vol->lcnbmp_lock);
- if (ntfs_bitmap_clear_bit(vol->lcnbmp_ino, lcn)) {
- ntfs_error(vol->sb, "Failed to release "
- "allocated cluster in error "
- "code path. Run chkdsk to "
- "recover the lost cluster.");
- NVolSetErrors(vol);
- }
- up_write(&vol->lcnbmp_lock);
- }
- }
- /*
- * Resize the attribute record to its old size and rebuild the mapping
- * pairs array. Note, we only can do this if the runlist has been
- * restored to its old state which also implies that the mapped
- * attribute extent is not switched.
- */
- if (status.mp_rebuilt && !status.runlist_merged) {
- if (ntfs_attr_record_resize(m, a, attr_rec_len)) {
- ntfs_error(vol->sb, "Failed to restore attribute "
- "record in error code path. Run "
- "chkdsk to recover.");
- NVolSetErrors(vol);
- } else /* if (success) */ {
- if (ntfs_mapping_pairs_build(vol, (u8*)a +
- le16_to_cpu(a->data.non_resident.
- mapping_pairs_offset), attr_rec_len -
- le16_to_cpu(a->data.non_resident.
- mapping_pairs_offset), ni->runlist.rl,
- vcn, highest_vcn, NULL)) {
- ntfs_error(vol->sb, "Failed to restore "
- "mapping pairs array in error "
- "code path. Run chkdsk to "
- "recover.");
- NVolSetErrors(vol);
- }
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- }
- }
- /* Release the mft record and the attribute. */
- if (status.mft_attr_mapped) {
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- }
- /* Release the runlist lock. */
- if (rl_write_locked)
- up_write(&ni->runlist.lock);
- else if (rl)
- up_read(&ni->runlist.lock);
- /*
- * Zero out any newly allocated blocks to avoid exposing stale data.
- * If BH_New is set, we know that the block was newly allocated above
- * and that it has not been fully zeroed and marked dirty yet.
- */
- nr_pages = u;
- u = 0;
- end = bh_cpos << vol->cluster_size_bits;
- do {
- folio = page_folio(pages[u]);
- bh = head = folio_buffers(folio);
- do {
- if (u == nr_pages &&
- folio_pos(folio) + bh_offset(bh) >= end)
- break;
- if (!buffer_new(bh))
- continue;
- clear_buffer_new(bh);
- if (!buffer_uptodate(bh)) {
- if (folio_test_uptodate(folio))
- set_buffer_uptodate(bh);
- else {
- folio_zero_range(folio, bh_offset(bh),
- blocksize);
- set_buffer_uptodate(bh);
- }
- }
- mark_buffer_dirty(bh);
- } while ((bh = bh->b_this_page) != head);
- } while (++u <= nr_pages);
- ntfs_error(vol->sb, "Failed. Returning error code %i.", err);
- return err;
-}
-
-static inline void ntfs_flush_dcache_pages(struct page **pages,
- unsigned nr_pages)
-{
- BUG_ON(!nr_pages);
- /*
- * Warning: Do not do the decrement at the same time as the call to
- * flush_dcache_page() because it is a NULL macro on i386 and hence the
- * decrement never happens so the loop never terminates.
- */
- do {
- --nr_pages;
- flush_dcache_page(pages[nr_pages]);
- } while (nr_pages > 0);
-}
-
-/**
- * ntfs_commit_pages_after_non_resident_write - commit the received data
- * @pages: array of destination pages
- * @nr_pages: number of pages in @pages
- * @pos: byte position in file at which the write begins
- * @bytes: number of bytes to be written
- *
- * See description of ntfs_commit_pages_after_write(), below.
- */
-static inline int ntfs_commit_pages_after_non_resident_write(
- struct page **pages, const unsigned nr_pages,
- s64 pos, size_t bytes)
-{
- s64 end, initialized_size;
- struct inode *vi;
- ntfs_inode *ni, *base_ni;
- struct buffer_head *bh, *head;
- ntfs_attr_search_ctx *ctx;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- unsigned long flags;
- unsigned blocksize, u;
- int err;
-
- vi = pages[0]->mapping->host;
- ni = NTFS_I(vi);
- blocksize = vi->i_sb->s_blocksize;
- end = pos + bytes;
- u = 0;
- do {
- s64 bh_pos;
- struct page *page;
- bool partial;
-
- page = pages[u];
- bh_pos = (s64)page->index << PAGE_SHIFT;
- bh = head = page_buffers(page);
- partial = false;
- do {
- s64 bh_end;
-
- bh_end = bh_pos + blocksize;
- if (bh_end <= pos || bh_pos >= end) {
- if (!buffer_uptodate(bh))
- partial = true;
- } else {
- set_buffer_uptodate(bh);
- mark_buffer_dirty(bh);
- }
- } while (bh_pos += blocksize, (bh = bh->b_this_page) != head);
- /*
- * If all buffers are now uptodate but the page is not, set the
- * page uptodate.
- */
- if (!partial && !PageUptodate(page))
- SetPageUptodate(page);
- } while (++u < nr_pages);
- /*
- * Finally, if we do not need to update initialized_size or i_size we
- * are finished.
- */
- read_lock_irqsave(&ni->size_lock, flags);
- initialized_size = ni->initialized_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- if (end <= initialized_size) {
- ntfs_debug("Done.");
- return 0;
- }
- /*
- * Update initialized_size/i_size as appropriate, both in the inode and
- * the mft record.
- */
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- /* Map, pin, and lock the mft record. */
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- ctx = NULL;
- goto err_out;
- }
- BUG_ON(!NInoNonResident(ni));
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- err = -EIO;
- goto err_out;
- }
- a = ctx->attr;
- BUG_ON(!a->non_resident);
- write_lock_irqsave(&ni->size_lock, flags);
- BUG_ON(end > ni->allocated_size);
- ni->initialized_size = end;
- a->data.non_resident.initialized_size = cpu_to_sle64(end);
- if (end > i_size_read(vi)) {
- i_size_write(vi, end);
- a->data.non_resident.data_size =
- a->data.non_resident.initialized_size;
- }
- write_unlock_irqrestore(&ni->size_lock, flags);
- /* Mark the mft record dirty, so it gets written back. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- ntfs_debug("Done.");
- return 0;
-err_out:
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(base_ni);
- ntfs_error(vi->i_sb, "Failed to update initialized_size/i_size (error "
- "code %i).", err);
- if (err != -ENOMEM)
- NVolSetErrors(ni->vol);
- return err;
-}
-
-/**
- * ntfs_commit_pages_after_write - commit the received data
- * @pages: array of destination pages
- * @nr_pages: number of pages in @pages
- * @pos: byte position in file at which the write begins
- * @bytes: number of bytes to be written
- *
- * This is called from ntfs_file_buffered_write() with i_mutex held on the inode
- * (@pages[0]->mapping->host). There are @nr_pages pages in @pages which are
- * locked but not kmap()ped. The source data has already been copied into the
- * @page. ntfs_prepare_pages_for_non_resident_write() has been called before
- * the data was copied (for non-resident attributes only) and it returned
- * success.
- *
- * Need to set uptodate and mark dirty all buffers within the boundary of the
- * write. If all buffers in a page are uptodate we set the page uptodate, too.
- *
- * Setting the buffers dirty ensures that they get written out later when
- * ntfs_writepage() is invoked by the VM.
- *
- * Finally, we need to update i_size and initialized_size as appropriate both
- * in the inode and the mft record.
- *
- * This is modelled after fs/buffer.c::generic_commit_write(), which marks
- * buffers uptodate and dirty, sets the page uptodate if all buffers in the
- * page are uptodate, and updates i_size if the end of io is beyond i_size. In
- * that case, it also marks the inode dirty.
- *
- * If things have gone as outlined in
- * ntfs_prepare_pages_for_non_resident_write(), we do not need to do any page
- * content modifications here for non-resident attributes. For resident
- * attributes we need to do the uptodate bringing here which we combine with
- * the copying into the mft record which means we save one atomic kmap.
- *
- * Return 0 on success or -errno on error.
- */
-static int ntfs_commit_pages_after_write(struct page **pages,
- const unsigned nr_pages, s64 pos, size_t bytes)
-{
- s64 end, initialized_size;
- loff_t i_size;
- struct inode *vi;
- ntfs_inode *ni, *base_ni;
- struct page *page;
- ntfs_attr_search_ctx *ctx;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- char *kattr, *kaddr;
- unsigned long flags;
- u32 attr_len;
- int err;
-
- BUG_ON(!nr_pages);
- BUG_ON(!pages);
- page = pages[0];
- BUG_ON(!page);
- vi = page->mapping->host;
- ni = NTFS_I(vi);
- ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, start page "
- "index 0x%lx, nr_pages 0x%x, pos 0x%llx, bytes 0x%zx.",
- vi->i_ino, ni->type, page->index, nr_pages,
- (long long)pos, bytes);
- if (NInoNonResident(ni))
- return ntfs_commit_pages_after_non_resident_write(pages,
- nr_pages, pos, bytes);
- BUG_ON(nr_pages > 1);
- /*
- * Attribute is resident, implying it is not compressed, encrypted, or
- * sparse.
- */
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- BUG_ON(NInoNonResident(ni));
- /* Map, pin, and lock the mft record. */
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- ctx = NULL;
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- err = -EIO;
- goto err_out;
- }
- a = ctx->attr;
- BUG_ON(a->non_resident);
- /* The total length of the attribute value. */
- attr_len = le32_to_cpu(a->data.resident.value_length);
- i_size = i_size_read(vi);
- BUG_ON(attr_len != i_size);
- BUG_ON(pos > attr_len);
- end = pos + bytes;
- BUG_ON(end > le32_to_cpu(a->length) -
- le16_to_cpu(a->data.resident.value_offset));
- kattr = (u8*)a + le16_to_cpu(a->data.resident.value_offset);
- kaddr = kmap_atomic(page);
- /* Copy the received data from the page to the mft record. */
- memcpy(kattr + pos, kaddr + pos, bytes);
- /* Update the attribute length if necessary. */
- if (end > attr_len) {
- attr_len = end;
- a->data.resident.value_length = cpu_to_le32(attr_len);
- }
- /*
- * If the page is not uptodate, bring the out of bounds area(s)
- * uptodate by copying data from the mft record to the page.
- */
- if (!PageUptodate(page)) {
- if (pos > 0)
- memcpy(kaddr, kattr, pos);
- if (end < attr_len)
- memcpy(kaddr + end, kattr + end, attr_len - end);
- /* Zero the region outside the end of the attribute value. */
- memset(kaddr + attr_len, 0, PAGE_SIZE - attr_len);
- flush_dcache_page(page);
- SetPageUptodate(page);
- }
- kunmap_atomic(kaddr);
- /* Update initialized_size/i_size if necessary. */
- read_lock_irqsave(&ni->size_lock, flags);
- initialized_size = ni->initialized_size;
- BUG_ON(end > ni->allocated_size);
- read_unlock_irqrestore(&ni->size_lock, flags);
- BUG_ON(initialized_size != i_size);
- if (end > initialized_size) {
- write_lock_irqsave(&ni->size_lock, flags);
- ni->initialized_size = end;
- i_size_write(vi, end);
- write_unlock_irqrestore(&ni->size_lock, flags);
- }
- /* Mark the mft record dirty, so it gets written back. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- ntfs_debug("Done.");
- return 0;
-err_out:
- if (err == -ENOMEM) {
- ntfs_warning(vi->i_sb, "Error allocating memory required to "
- "commit the write.");
- if (PageUptodate(page)) {
- ntfs_warning(vi->i_sb, "Page is uptodate, setting "
- "dirty so the write will be retried "
- "later on by the VM.");
- /*
- * Put the page on mapping->dirty_pages, but leave its
- * buffers' dirty state as-is.
- */
- __set_page_dirty_nobuffers(page);
- err = 0;
- } else
- ntfs_error(vi->i_sb, "Page is not uptodate. Written "
- "data has been lost.");
- } else {
- ntfs_error(vi->i_sb, "Resident attribute commit write failed "
- "with error %i.", err);
- NVolSetErrors(ni->vol);
- }
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(base_ni);
- return err;
-}
-
-/*
- * Copy as much as we can into the pages and return the number of bytes which
- * were successfully copied. If a fault is encountered then clear the pages
- * out to (ofs + bytes) and return the number of bytes which were copied.
- */
-static size_t ntfs_copy_from_user_iter(struct page **pages, unsigned nr_pages,
- unsigned ofs, struct iov_iter *i, size_t bytes)
-{
- struct page **last_page = pages + nr_pages;
- size_t total = 0;
- unsigned len, copied;
-
- do {
- len = PAGE_SIZE - ofs;
- if (len > bytes)
- len = bytes;
- copied = copy_page_from_iter_atomic(*pages, ofs, len, i);
- total += copied;
- bytes -= copied;
- if (!bytes)
- break;
- if (copied < len)
- goto err;
- ofs = 0;
- } while (++pages < last_page);
-out:
- return total;
-err:
- /* Zero the rest of the target like __copy_from_user(). */
- len = PAGE_SIZE - copied;
- do {
- if (len > bytes)
- len = bytes;
- zero_user(*pages, copied, len);
- bytes -= len;
- copied = 0;
- len = PAGE_SIZE;
- } while (++pages < last_page);
- goto out;
-}
-
-/**
- * ntfs_perform_write - perform buffered write to a file
- * @file: file to write to
- * @i: iov_iter with data to write
- * @pos: byte offset in file at which to begin writing to
- */
-static ssize_t ntfs_perform_write(struct file *file, struct iov_iter *i,
- loff_t pos)
-{
- struct address_space *mapping = file->f_mapping;
- struct inode *vi = mapping->host;
- ntfs_inode *ni = NTFS_I(vi);
- ntfs_volume *vol = ni->vol;
- struct page *pages[NTFS_MAX_PAGES_PER_CLUSTER];
- struct page *cached_page = NULL;
- VCN last_vcn;
- LCN lcn;
- size_t bytes;
- ssize_t status, written = 0;
- unsigned nr_pages;
-
- ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, pos "
- "0x%llx, count 0x%lx.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type),
- (unsigned long long)pos,
- (unsigned long)iov_iter_count(i));
- /*
- * If a previous ntfs_truncate() failed, repeat it and abort if it
- * fails again.
- */
- if (unlikely(NInoTruncateFailed(ni))) {
- int err;
-
- inode_dio_wait(vi);
- err = ntfs_truncate(vi);
- if (err || NInoTruncateFailed(ni)) {
- if (!err)
- err = -EIO;
- ntfs_error(vol->sb, "Cannot perform write to inode "
- "0x%lx, attribute type 0x%x, because "
- "ntfs_truncate() failed (error code "
- "%i).", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), err);
- return err;
- }
- }
- /*
- * Determine the number of pages per cluster for non-resident
- * attributes.
- */
- nr_pages = 1;
- if (vol->cluster_size > PAGE_SIZE && NInoNonResident(ni))
- nr_pages = vol->cluster_size >> PAGE_SHIFT;
- last_vcn = -1;
- do {
- VCN vcn;
- pgoff_t start_idx;
- unsigned ofs, do_pages, u;
- size_t copied;
-
- start_idx = pos >> PAGE_SHIFT;
- ofs = pos & ~PAGE_MASK;
- bytes = PAGE_SIZE - ofs;
- do_pages = 1;
- if (nr_pages > 1) {
- vcn = pos >> vol->cluster_size_bits;
- if (vcn != last_vcn) {
- last_vcn = vcn;
- /*
- * Get the lcn of the vcn the write is in. If
- * it is a hole, need to lock down all pages in
- * the cluster.
- */
- down_read(&ni->runlist.lock);
- lcn = ntfs_attr_vcn_to_lcn_nolock(ni, pos >>
- vol->cluster_size_bits, false);
- up_read(&ni->runlist.lock);
- if (unlikely(lcn < LCN_HOLE)) {
- if (lcn == LCN_ENOMEM)
- status = -ENOMEM;
- else {
- status = -EIO;
- ntfs_error(vol->sb, "Cannot "
- "perform write to "
- "inode 0x%lx, "
- "attribute type 0x%x, "
- "because the attribute "
- "is corrupt.",
- vi->i_ino, (unsigned)
- le32_to_cpu(ni->type));
- }
- break;
- }
- if (lcn == LCN_HOLE) {
- start_idx = (pos & ~(s64)
- vol->cluster_size_mask)
- >> PAGE_SHIFT;
- bytes = vol->cluster_size - (pos &
- vol->cluster_size_mask);
- do_pages = nr_pages;
- }
- }
- }
- if (bytes > iov_iter_count(i))
- bytes = iov_iter_count(i);
-again:
- /*
- * Bring in the user page(s) that we will copy from _first_.
- * Otherwise there is a nasty deadlock on copying from the same
- * page(s) as we are writing to, without it/them being marked
- * up-to-date. Note, at present there is nothing to stop the
- * pages being swapped out between us bringing them into memory
- * and doing the actual copying.
- */
- if (unlikely(fault_in_iov_iter_readable(i, bytes))) {
- status = -EFAULT;
- break;
- }
- /* Get and lock @do_pages starting at index @start_idx. */
- status = __ntfs_grab_cache_pages(mapping, start_idx, do_pages,
- pages, &cached_page);
- if (unlikely(status))
- break;
- /*
- * For non-resident attributes, we need to fill any holes with
- * actual clusters and ensure all bufferes are mapped. We also
- * need to bring uptodate any buffers that are only partially
- * being written to.
- */
- if (NInoNonResident(ni)) {
- status = ntfs_prepare_pages_for_non_resident_write(
- pages, do_pages, pos, bytes);
- if (unlikely(status)) {
- do {
- unlock_page(pages[--do_pages]);
- put_page(pages[do_pages]);
- } while (do_pages);
- break;
- }
- }
- u = (pos >> PAGE_SHIFT) - pages[0]->index;
- copied = ntfs_copy_from_user_iter(pages + u, do_pages - u, ofs,
- i, bytes);
- ntfs_flush_dcache_pages(pages + u, do_pages - u);
- status = 0;
- if (likely(copied == bytes)) {
- status = ntfs_commit_pages_after_write(pages, do_pages,
- pos, bytes);
- }
- do {
- unlock_page(pages[--do_pages]);
- put_page(pages[do_pages]);
- } while (do_pages);
- if (unlikely(status < 0)) {
- iov_iter_revert(i, copied);
- break;
- }
- cond_resched();
- if (unlikely(copied < bytes)) {
- iov_iter_revert(i, copied);
- if (copied)
- bytes = copied;
- else if (bytes > PAGE_SIZE - ofs)
- bytes = PAGE_SIZE - ofs;
- goto again;
- }
- pos += copied;
- written += copied;
- balance_dirty_pages_ratelimited(mapping);
- if (fatal_signal_pending(current)) {
- status = -EINTR;
- break;
- }
- } while (iov_iter_count(i));
- if (cached_page)
- put_page(cached_page);
- ntfs_debug("Done. Returning %s (written 0x%lx, status %li).",
- written ? "written" : "status", (unsigned long)written,
- (long)status);
- return written ? written : status;
-}
-
-/**
- * ntfs_file_write_iter - simple wrapper for ntfs_file_write_iter_nolock()
- * @iocb: IO state structure
- * @from: iov_iter with data to write
- *
- * Basically the same as generic_file_write_iter() except that it ends up
- * up calling ntfs_perform_write() instead of generic_perform_write() and that
- * O_DIRECT is not implemented.
- */
-static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
-{
- struct file *file = iocb->ki_filp;
- struct inode *vi = file_inode(file);
- ssize_t written = 0;
- ssize_t err;
-
- inode_lock(vi);
- /* We can write back this queue in page reclaim. */
- err = ntfs_prepare_file_for_write(iocb, from);
- if (iov_iter_count(from) && !err)
- written = ntfs_perform_write(file, from, iocb->ki_pos);
- inode_unlock(vi);
- iocb->ki_pos += written;
- if (likely(written > 0))
- written = generic_write_sync(iocb, written);
- return written ? written : err;
-}
-
-/**
- * ntfs_file_fsync - sync a file to disk
- * @filp: file to be synced
- * @datasync: if non-zero only flush user data and not metadata
- *
- * Data integrity sync of a file to disk. Used for fsync, fdatasync, and msync
- * system calls. This function is inspired by fs/buffer.c::file_fsync().
- *
- * If @datasync is false, write the mft record and all associated extent mft
- * records as well as the $DATA attribute and then sync the block device.
- *
- * If @datasync is true and the attribute is non-resident, we skip the writing
- * of the mft record and all associated extent mft records (this might still
- * happen due to the write_inode_now() call).
- *
- * Also, if @datasync is true, we do not wait on the inode to be written out
- * but we always wait on the page cache pages to be written out.
- *
- * Locking: Caller must hold i_mutex on the inode.
- *
- * TODO: We should probably also write all attribute/index inodes associated
- * with this inode but since we have no simple way of getting to them we ignore
- * this problem for now.
- */
-static int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end,
- int datasync)
-{
- struct inode *vi = filp->f_mapping->host;
- int err, ret = 0;
-
- ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
-
- err = file_write_and_wait_range(filp, start, end);
- if (err)
- return err;
- inode_lock(vi);
-
- BUG_ON(S_ISDIR(vi->i_mode));
- if (!datasync || !NInoNonResident(NTFS_I(vi)))
- ret = __ntfs_write_inode(vi, 1);
- write_inode_now(vi, !datasync);
- /*
- * NOTE: If we were to use mapping->private_list (see ext2 and
- * fs/buffer.c) for dirty blocks then we could optimize the below to be
- * sync_mapping_buffers(vi->i_mapping).
- */
- err = sync_blockdev(vi->i_sb->s_bdev);
- if (unlikely(err && !ret))
- ret = err;
- if (likely(!ret))
- ntfs_debug("Done.");
- else
- ntfs_warning(vi->i_sb, "Failed to f%ssync inode 0x%lx. Error "
- "%u.", datasync ? "data" : "", vi->i_ino, -ret);
- inode_unlock(vi);
- return ret;
-}
-
-#endif /* NTFS_RW */
-
-const struct file_operations ntfs_file_ops = {
- .llseek = generic_file_llseek,
- .read_iter = generic_file_read_iter,
-#ifdef NTFS_RW
- .write_iter = ntfs_file_write_iter,
- .fsync = ntfs_file_fsync,
-#endif /* NTFS_RW */
- .mmap = generic_file_mmap,
- .open = ntfs_file_open,
- .splice_read = filemap_splice_read,
-};
-
-const struct inode_operations ntfs_file_inode_ops = {
-#ifdef NTFS_RW
- .setattr = ntfs_setattr,
-#endif /* NTFS_RW */
-};
-
-const struct file_operations ntfs_empty_file_ops = {};
-
-const struct inode_operations ntfs_empty_inode_ops = {};
diff --git a/fs/ntfs/index.c b/fs/ntfs/index.c
deleted file mode 100644
index d46c2c0..0000000
--- a/fs/ntfs/index.c
+++ /dev/null
@@ -1,440 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * index.c - NTFS kernel index handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2004-2005 Anton Altaparmakov
- */
-
-#include <linux/slab.h>
-
-#include "aops.h"
-#include "collate.h"
-#include "debug.h"
-#include "index.h"
-#include "ntfs.h"
-
-/**
- * ntfs_index_ctx_get - allocate and initialize a new index context
- * @idx_ni: ntfs index inode with which to initialize the context
- *
- * Allocate a new index context, initialize it with @idx_ni and return it.
- * Return NULL if allocation failed.
- *
- * Locking: Caller must hold i_mutex on the index inode.
- */
-ntfs_index_context *ntfs_index_ctx_get(ntfs_inode *idx_ni)
-{
- ntfs_index_context *ictx;
-
- ictx = kmem_cache_alloc(ntfs_index_ctx_cache, GFP_NOFS);
- if (ictx)
- *ictx = (ntfs_index_context){ .idx_ni = idx_ni };
- return ictx;
-}
-
-/**
- * ntfs_index_ctx_put - release an index context
- * @ictx: index context to free
- *
- * Release the index context @ictx, releasing all associated resources.
- *
- * Locking: Caller must hold i_mutex on the index inode.
- */
-void ntfs_index_ctx_put(ntfs_index_context *ictx)
-{
- if (ictx->entry) {
- if (ictx->is_in_root) {
- if (ictx->actx)
- ntfs_attr_put_search_ctx(ictx->actx);
- if (ictx->base_ni)
- unmap_mft_record(ictx->base_ni);
- } else {
- struct page *page = ictx->page;
- if (page) {
- BUG_ON(!PageLocked(page));
- unlock_page(page);
- ntfs_unmap_page(page);
- }
- }
- }
- kmem_cache_free(ntfs_index_ctx_cache, ictx);
- return;
-}
-
-/**
- * ntfs_index_lookup - find a key in an index and return its index entry
- * @key: [IN] key for which to search in the index
- * @key_len: [IN] length of @key in bytes
- * @ictx: [IN/OUT] context describing the index and the returned entry
- *
- * Before calling ntfs_index_lookup(), @ictx must have been obtained from a
- * call to ntfs_index_ctx_get().
- *
- * Look for the @key in the index specified by the index lookup context @ictx.
- * ntfs_index_lookup() walks the contents of the index looking for the @key.
- *
- * If the @key is found in the index, 0 is returned and @ictx is setup to
- * describe the index entry containing the matching @key. @ictx->entry is the
- * index entry and @ictx->data and @ictx->data_len are the index entry data and
- * its length in bytes, respectively.
- *
- * If the @key is not found in the index, -ENOENT is returned and @ictx is
- * setup to describe the index entry whose key collates immediately after the
- * search @key, i.e. this is the position in the index at which an index entry
- * with a key of @key would need to be inserted.
- *
- * If an error occurs return the negative error code and @ictx is left
- * untouched.
- *
- * When finished with the entry and its data, call ntfs_index_ctx_put() to free
- * the context and other associated resources.
- *
- * If the index entry was modified, call flush_dcache_index_entry_page()
- * immediately after the modification and either ntfs_index_entry_mark_dirty()
- * or ntfs_index_entry_write() before the call to ntfs_index_ctx_put() to
- * ensure that the changes are written to disk.
- *
- * Locking: - Caller must hold i_mutex on the index inode.
- * - Each page cache page in the index allocation mapping must be
- * locked whilst being accessed otherwise we may find a corrupt
- * page due to it being under ->writepage at the moment which
- * applies the mst protection fixups before writing out and then
- * removes them again after the write is complete after which it
- * unlocks the page.
- */
-int ntfs_index_lookup(const void *key, const int key_len,
- ntfs_index_context *ictx)
-{
- VCN vcn, old_vcn;
- ntfs_inode *idx_ni = ictx->idx_ni;
- ntfs_volume *vol = idx_ni->vol;
- struct super_block *sb = vol->sb;
- ntfs_inode *base_ni = idx_ni->ext.base_ntfs_ino;
- MFT_RECORD *m;
- INDEX_ROOT *ir;
- INDEX_ENTRY *ie;
- INDEX_ALLOCATION *ia;
- u8 *index_end, *kaddr;
- ntfs_attr_search_ctx *actx;
- struct address_space *ia_mapping;
- struct page *page;
- int rc, err = 0;
-
- ntfs_debug("Entering.");
- BUG_ON(!NInoAttr(idx_ni));
- BUG_ON(idx_ni->type != AT_INDEX_ALLOCATION);
- BUG_ON(idx_ni->nr_extents != -1);
- BUG_ON(!base_ni);
- BUG_ON(!key);
- BUG_ON(key_len <= 0);
- if (!ntfs_is_collation_rule_supported(
- idx_ni->itype.index.collation_rule)) {
- ntfs_error(sb, "Index uses unsupported collation rule 0x%x. "
- "Aborting lookup.", le32_to_cpu(
- idx_ni->itype.index.collation_rule));
- return -EOPNOTSUPP;
- }
- /* Get hold of the mft record for the index inode. */
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- ntfs_error(sb, "map_mft_record() failed with error code %ld.",
- -PTR_ERR(m));
- return PTR_ERR(m);
- }
- actx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!actx)) {
- err = -ENOMEM;
- goto err_out;
- }
- /* Find the index root attribute in the mft record. */
- err = ntfs_attr_lookup(AT_INDEX_ROOT, idx_ni->name, idx_ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, actx);
- if (unlikely(err)) {
- if (err == -ENOENT) {
- ntfs_error(sb, "Index root attribute missing in inode "
- "0x%lx.", idx_ni->mft_no);
- err = -EIO;
- }
- goto err_out;
- }
- /* Get to the index root value (it has been verified in read_inode). */
- ir = (INDEX_ROOT*)((u8*)actx->attr +
- le16_to_cpu(actx->attr->data.resident.value_offset));
- index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
- /* The first index entry. */
- ie = (INDEX_ENTRY*)((u8*)&ir->index +
- le32_to_cpu(ir->index.entries_offset));
- /*
- * Loop until we exceed valid memory (corruption case) or until we
- * reach the last entry.
- */
- for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
- /* Bounds checks. */
- if ((u8*)ie < (u8*)actx->mrec || (u8*)ie +
- sizeof(INDEX_ENTRY_HEADER) > index_end ||
- (u8*)ie + le16_to_cpu(ie->length) > index_end)
- goto idx_err_out;
- /*
- * The last entry cannot contain a key. It can however contain
- * a pointer to a child node in the B+tree so we just break out.
- */
- if (ie->flags & INDEX_ENTRY_END)
- break;
- /* Further bounds checks. */
- if ((u32)sizeof(INDEX_ENTRY_HEADER) +
- le16_to_cpu(ie->key_length) >
- le16_to_cpu(ie->data.vi.data_offset) ||
- (u32)le16_to_cpu(ie->data.vi.data_offset) +
- le16_to_cpu(ie->data.vi.data_length) >
- le16_to_cpu(ie->length))
- goto idx_err_out;
- /* If the keys match perfectly, we setup @ictx and return 0. */
- if ((key_len == le16_to_cpu(ie->key_length)) && !memcmp(key,
- &ie->key, key_len)) {
-ir_done:
- ictx->is_in_root = true;
- ictx->ir = ir;
- ictx->actx = actx;
- ictx->base_ni = base_ni;
- ictx->ia = NULL;
- ictx->page = NULL;
-done:
- ictx->entry = ie;
- ictx->data = (u8*)ie +
- le16_to_cpu(ie->data.vi.data_offset);
- ictx->data_len = le16_to_cpu(ie->data.vi.data_length);
- ntfs_debug("Done.");
- return err;
- }
- /*
- * Not a perfect match, need to do full blown collation so we
- * know which way in the B+tree we have to go.
- */
- rc = ntfs_collate(vol, idx_ni->itype.index.collation_rule, key,
- key_len, &ie->key, le16_to_cpu(ie->key_length));
- /*
- * If @key collates before the key of the current entry, there
- * is definitely no such key in this index but we might need to
- * descend into the B+tree so we just break out of the loop.
- */
- if (rc == -1)
- break;
- /*
- * A match should never happen as the memcmp() call should have
- * cought it, but we still treat it correctly.
- */
- if (!rc)
- goto ir_done;
- /* The keys are not equal, continue the search. */
- }
- /*
- * We have finished with this index without success. Check for the
- * presence of a child node and if not present setup @ictx and return
- * -ENOENT.
- */
- if (!(ie->flags & INDEX_ENTRY_NODE)) {
- ntfs_debug("Entry not found.");
- err = -ENOENT;
- goto ir_done;
- } /* Child node present, descend into it. */
- /* Consistency check: Verify that an index allocation exists. */
- if (!NInoIndexAllocPresent(idx_ni)) {
- ntfs_error(sb, "No index allocation attribute but index entry "
- "requires one. Inode 0x%lx is corrupt or "
- "driver bug.", idx_ni->mft_no);
- goto err_out;
- }
- /* Get the starting vcn of the index_block holding the child node. */
- vcn = sle64_to_cpup((sle64*)((u8*)ie + le16_to_cpu(ie->length) - 8));
- ia_mapping = VFS_I(idx_ni)->i_mapping;
- /*
- * We are done with the index root and the mft record. Release them,
- * otherwise we deadlock with ntfs_map_page().
- */
- ntfs_attr_put_search_ctx(actx);
- unmap_mft_record(base_ni);
- m = NULL;
- actx = NULL;
-descend_into_child_node:
- /*
- * Convert vcn to index into the index allocation attribute in units
- * of PAGE_SIZE and map the page cache page, reading it from
- * disk if necessary.
- */
- page = ntfs_map_page(ia_mapping, vcn <<
- idx_ni->itype.index.vcn_size_bits >> PAGE_SHIFT);
- if (IS_ERR(page)) {
- ntfs_error(sb, "Failed to map index page, error %ld.",
- -PTR_ERR(page));
- err = PTR_ERR(page);
- goto err_out;
- }
- lock_page(page);
- kaddr = (u8*)page_address(page);
-fast_descend_into_child_node:
- /* Get to the index allocation block. */
- ia = (INDEX_ALLOCATION*)(kaddr + ((vcn <<
- idx_ni->itype.index.vcn_size_bits) & ~PAGE_MASK));
- /* Bounds checks. */
- if ((u8*)ia < kaddr || (u8*)ia > kaddr + PAGE_SIZE) {
- ntfs_error(sb, "Out of bounds check failed. Corrupt inode "
- "0x%lx or driver bug.", idx_ni->mft_no);
- goto unm_err_out;
- }
- /* Catch multi sector transfer fixup errors. */
- if (unlikely(!ntfs_is_indx_record(ia->magic))) {
- ntfs_error(sb, "Index record with vcn 0x%llx is corrupt. "
- "Corrupt inode 0x%lx. Run chkdsk.",
- (long long)vcn, idx_ni->mft_no);
- goto unm_err_out;
- }
- if (sle64_to_cpu(ia->index_block_vcn) != vcn) {
- ntfs_error(sb, "Actual VCN (0x%llx) of index buffer is "
- "different from expected VCN (0x%llx). Inode "
- "0x%lx is corrupt or driver bug.",
- (unsigned long long)
- sle64_to_cpu(ia->index_block_vcn),
- (unsigned long long)vcn, idx_ni->mft_no);
- goto unm_err_out;
- }
- if (le32_to_cpu(ia->index.allocated_size) + 0x18 !=
- idx_ni->itype.index.block_size) {
- ntfs_error(sb, "Index buffer (VCN 0x%llx) of inode 0x%lx has "
- "a size (%u) differing from the index "
- "specified size (%u). Inode is corrupt or "
- "driver bug.", (unsigned long long)vcn,
- idx_ni->mft_no,
- le32_to_cpu(ia->index.allocated_size) + 0x18,
- idx_ni->itype.index.block_size);
- goto unm_err_out;
- }
- index_end = (u8*)ia + idx_ni->itype.index.block_size;
- if (index_end > kaddr + PAGE_SIZE) {
- ntfs_error(sb, "Index buffer (VCN 0x%llx) of inode 0x%lx "
- "crosses page boundary. Impossible! Cannot "
- "access! This is probably a bug in the "
- "driver.", (unsigned long long)vcn,
- idx_ni->mft_no);
- goto unm_err_out;
- }
- index_end = (u8*)&ia->index + le32_to_cpu(ia->index.index_length);
- if (index_end > (u8*)ia + idx_ni->itype.index.block_size) {
- ntfs_error(sb, "Size of index buffer (VCN 0x%llx) of inode "
- "0x%lx exceeds maximum size.",
- (unsigned long long)vcn, idx_ni->mft_no);
- goto unm_err_out;
- }
- /* The first index entry. */
- ie = (INDEX_ENTRY*)((u8*)&ia->index +
- le32_to_cpu(ia->index.entries_offset));
- /*
- * Iterate similar to above big loop but applied to index buffer, thus
- * loop until we exceed valid memory (corruption case) or until we
- * reach the last entry.
- */
- for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
- /* Bounds checks. */
- if ((u8*)ie < (u8*)ia || (u8*)ie +
- sizeof(INDEX_ENTRY_HEADER) > index_end ||
- (u8*)ie + le16_to_cpu(ie->length) > index_end) {
- ntfs_error(sb, "Index entry out of bounds in inode "
- "0x%lx.", idx_ni->mft_no);
- goto unm_err_out;
- }
- /*
- * The last entry cannot contain a key. It can however contain
- * a pointer to a child node in the B+tree so we just break out.
- */
- if (ie->flags & INDEX_ENTRY_END)
- break;
- /* Further bounds checks. */
- if ((u32)sizeof(INDEX_ENTRY_HEADER) +
- le16_to_cpu(ie->key_length) >
- le16_to_cpu(ie->data.vi.data_offset) ||
- (u32)le16_to_cpu(ie->data.vi.data_offset) +
- le16_to_cpu(ie->data.vi.data_length) >
- le16_to_cpu(ie->length)) {
- ntfs_error(sb, "Index entry out of bounds in inode "
- "0x%lx.", idx_ni->mft_no);
- goto unm_err_out;
- }
- /* If the keys match perfectly, we setup @ictx and return 0. */
- if ((key_len == le16_to_cpu(ie->key_length)) && !memcmp(key,
- &ie->key, key_len)) {
-ia_done:
- ictx->is_in_root = false;
- ictx->actx = NULL;
- ictx->base_ni = NULL;
- ictx->ia = ia;
- ictx->page = page;
- goto done;
- }
- /*
- * Not a perfect match, need to do full blown collation so we
- * know which way in the B+tree we have to go.
- */
- rc = ntfs_collate(vol, idx_ni->itype.index.collation_rule, key,
- key_len, &ie->key, le16_to_cpu(ie->key_length));
- /*
- * If @key collates before the key of the current entry, there
- * is definitely no such key in this index but we might need to
- * descend into the B+tree so we just break out of the loop.
- */
- if (rc == -1)
- break;
- /*
- * A match should never happen as the memcmp() call should have
- * cought it, but we still treat it correctly.
- */
- if (!rc)
- goto ia_done;
- /* The keys are not equal, continue the search. */
- }
- /*
- * We have finished with this index buffer without success. Check for
- * the presence of a child node and if not present return -ENOENT.
- */
- if (!(ie->flags & INDEX_ENTRY_NODE)) {
- ntfs_debug("Entry not found.");
- err = -ENOENT;
- goto ia_done;
- }
- if ((ia->index.flags & NODE_MASK) == LEAF_NODE) {
- ntfs_error(sb, "Index entry with child node found in a leaf "
- "node in inode 0x%lx.", idx_ni->mft_no);
- goto unm_err_out;
- }
- /* Child node present, descend into it. */
- old_vcn = vcn;
- vcn = sle64_to_cpup((sle64*)((u8*)ie + le16_to_cpu(ie->length) - 8));
- if (vcn >= 0) {
- /*
- * If vcn is in the same page cache page as old_vcn we recycle
- * the mapped page.
- */
- if (old_vcn << vol->cluster_size_bits >>
- PAGE_SHIFT == vcn <<
- vol->cluster_size_bits >>
- PAGE_SHIFT)
- goto fast_descend_into_child_node;
- unlock_page(page);
- ntfs_unmap_page(page);
- goto descend_into_child_node;
- }
- ntfs_error(sb, "Negative child node vcn in inode 0x%lx.",
- idx_ni->mft_no);
-unm_err_out:
- unlock_page(page);
- ntfs_unmap_page(page);
-err_out:
- if (!err)
- err = -EIO;
- if (actx)
- ntfs_attr_put_search_ctx(actx);
- if (m)
- unmap_mft_record(base_ni);
- return err;
-idx_err_out:
- ntfs_error(sb, "Corrupt index. Aborting lookup.");
- goto err_out;
-}
diff --git a/fs/ntfs/index.h b/fs/ntfs/index.h
deleted file mode 100644
index bb3c3ae..0000000
--- a/fs/ntfs/index.h
+++ /dev/null
@@ -1,134 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * index.h - Defines for NTFS kernel index handling. Part of the Linux-NTFS
- * project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_INDEX_H
-#define _LINUX_NTFS_INDEX_H
-
-#include <linux/fs.h>
-
-#include "types.h"
-#include "layout.h"
-#include "inode.h"
-#include "attrib.h"
-#include "mft.h"
-#include "aops.h"
-
-/**
- * @idx_ni: index inode containing the @entry described by this context
- * @entry: index entry (points into @ir or @ia)
- * @data: index entry data (points into @entry)
- * @data_len: length in bytes of @data
- * @is_in_root: 'true' if @entry is in @ir and 'false' if it is in @ia
- * @ir: index root if @is_in_root and NULL otherwise
- * @actx: attribute search context if @is_in_root and NULL otherwise
- * @base_ni: base inode if @is_in_root and NULL otherwise
- * @ia: index block if @is_in_root is 'false' and NULL otherwise
- * @page: page if @is_in_root is 'false' and NULL otherwise
- *
- * @idx_ni is the index inode this context belongs to.
- *
- * @entry is the index entry described by this context. @data and @data_len
- * are the index entry data and its length in bytes, respectively. @data
- * simply points into @entry. This is probably what the user is interested in.
- *
- * If @is_in_root is 'true', @entry is in the index root attribute @ir described
- * by the attribute search context @actx and the base inode @base_ni. @ia and
- * @page are NULL in this case.
- *
- * If @is_in_root is 'false', @entry is in the index allocation attribute and @ia
- * and @page point to the index allocation block and the mapped, locked page it
- * is in, respectively. @ir, @actx and @base_ni are NULL in this case.
- *
- * To obtain a context call ntfs_index_ctx_get().
- *
- * We use this context to allow ntfs_index_lookup() to return the found index
- * @entry and its @data without having to allocate a buffer and copy the @entry
- * and/or its @data into it.
- *
- * When finished with the @entry and its @data, call ntfs_index_ctx_put() to
- * free the context and other associated resources.
- *
- * If the index entry was modified, call flush_dcache_index_entry_page()
- * immediately after the modification and either ntfs_index_entry_mark_dirty()
- * or ntfs_index_entry_write() before the call to ntfs_index_ctx_put() to
- * ensure that the changes are written to disk.
- */
-typedef struct {
- ntfs_inode *idx_ni;
- INDEX_ENTRY *entry;
- void *data;
- u16 data_len;
- bool is_in_root;
- INDEX_ROOT *ir;
- ntfs_attr_search_ctx *actx;
- ntfs_inode *base_ni;
- INDEX_ALLOCATION *ia;
- struct page *page;
-} ntfs_index_context;
-
-extern ntfs_index_context *ntfs_index_ctx_get(ntfs_inode *idx_ni);
-extern void ntfs_index_ctx_put(ntfs_index_context *ictx);
-
-extern int ntfs_index_lookup(const void *key, const int key_len,
- ntfs_index_context *ictx);
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_index_entry_flush_dcache_page - flush_dcache_page() for index entries
- * @ictx: ntfs index context describing the index entry
- *
- * Call flush_dcache_page() for the page in which an index entry resides.
- *
- * This must be called every time an index entry is modified, just after the
- * modification.
- *
- * If the index entry is in the index root attribute, simply flush the page
- * containing the mft record containing the index root attribute.
- *
- * If the index entry is in an index block belonging to the index allocation
- * attribute, simply flush the page cache page containing the index block.
- */
-static inline void ntfs_index_entry_flush_dcache_page(ntfs_index_context *ictx)
-{
- if (ictx->is_in_root)
- flush_dcache_mft_record_page(ictx->actx->ntfs_ino);
- else
- flush_dcache_page(ictx->page);
-}
-
-/**
- * ntfs_index_entry_mark_dirty - mark an index entry dirty
- * @ictx: ntfs index context describing the index entry
- *
- * Mark the index entry described by the index entry context @ictx dirty.
- *
- * If the index entry is in the index root attribute, simply mark the mft
- * record containing the index root attribute dirty. This ensures the mft
- * record, and hence the index root attribute, will be written out to disk
- * later.
- *
- * If the index entry is in an index block belonging to the index allocation
- * attribute, mark the buffers belonging to the index record as well as the
- * page cache page the index block is in dirty. This automatically marks the
- * VFS inode of the ntfs index inode to which the index entry belongs dirty,
- * too (I_DIRTY_PAGES) and this in turn ensures the page buffers, and hence the
- * dirty index block, will be written out to disk later.
- */
-static inline void ntfs_index_entry_mark_dirty(ntfs_index_context *ictx)
-{
- if (ictx->is_in_root)
- mark_mft_record_dirty(ictx->actx->ntfs_ino);
- else
- mark_ntfs_record_dirty(ictx->page,
- (u8*)ictx->ia - (u8*)page_address(ictx->page));
-}
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_INDEX_H */
diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
deleted file mode 100644
index aba1e22..0000000
--- a/fs/ntfs/inode.c
+++ /dev/null
@@ -1,3102 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * inode.c - NTFS kernel inode handling.
- *
- * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
- */
-
-#include <linux/buffer_head.h>
-#include <linux/fs.h>
-#include <linux/mm.h>
-#include <linux/mount.h>
-#include <linux/mutex.h>
-#include <linux/pagemap.h>
-#include <linux/quotaops.h>
-#include <linux/slab.h>
-#include <linux/log2.h>
-
-#include "aops.h"
-#include "attrib.h"
-#include "bitmap.h"
-#include "dir.h"
-#include "debug.h"
-#include "inode.h"
-#include "lcnalloc.h"
-#include "malloc.h"
-#include "mft.h"
-#include "time.h"
-#include "ntfs.h"
-
-/**
- * ntfs_test_inode - compare two (possibly fake) inodes for equality
- * @vi: vfs inode which to test
- * @data: data which is being tested with
- *
- * Compare the ntfs attribute embedded in the ntfs specific part of the vfs
- * inode @vi for equality with the ntfs attribute @data.
- *
- * If searching for the normal file/directory inode, set @na->type to AT_UNUSED.
- * @na->name and @na->name_len are then ignored.
- *
- * Return 1 if the attributes match and 0 if not.
- *
- * NOTE: This function runs with the inode_hash_lock spin lock held so it is not
- * allowed to sleep.
- */
-int ntfs_test_inode(struct inode *vi, void *data)
-{
- ntfs_attr *na = (ntfs_attr *)data;
- ntfs_inode *ni;
-
- if (vi->i_ino != na->mft_no)
- return 0;
- ni = NTFS_I(vi);
- /* If !NInoAttr(ni), @vi is a normal file or directory inode. */
- if (likely(!NInoAttr(ni))) {
- /* If not looking for a normal inode this is a mismatch. */
- if (unlikely(na->type != AT_UNUSED))
- return 0;
- } else {
- /* A fake inode describing an attribute. */
- if (ni->type != na->type)
- return 0;
- if (ni->name_len != na->name_len)
- return 0;
- if (na->name_len && memcmp(ni->name, na->name,
- na->name_len * sizeof(ntfschar)))
- return 0;
- }
- /* Match! */
- return 1;
-}
-
-/**
- * ntfs_init_locked_inode - initialize an inode
- * @vi: vfs inode to initialize
- * @data: data which to initialize @vi to
- *
- * Initialize the vfs inode @vi with the values from the ntfs attribute @data in
- * order to enable ntfs_test_inode() to do its work.
- *
- * If initializing the normal file/directory inode, set @na->type to AT_UNUSED.
- * In that case, @na->name and @na->name_len should be set to NULL and 0,
- * respectively. Although that is not strictly necessary as
- * ntfs_read_locked_inode() will fill them in later.
- *
- * Return 0 on success and -errno on error.
- *
- * NOTE: This function runs with the inode->i_lock spin lock held so it is not
- * allowed to sleep. (Hence the GFP_ATOMIC allocation.)
- */
-static int ntfs_init_locked_inode(struct inode *vi, void *data)
-{
- ntfs_attr *na = (ntfs_attr *)data;
- ntfs_inode *ni = NTFS_I(vi);
-
- vi->i_ino = na->mft_no;
-
- ni->type = na->type;
- if (na->type == AT_INDEX_ALLOCATION)
- NInoSetMstProtected(ni);
-
- ni->name = na->name;
- ni->name_len = na->name_len;
-
- /* If initializing a normal inode, we are done. */
- if (likely(na->type == AT_UNUSED)) {
- BUG_ON(na->name);
- BUG_ON(na->name_len);
- return 0;
- }
-
- /* It is a fake inode. */
- NInoSetAttr(ni);
-
- /*
- * We have I30 global constant as an optimization as it is the name
- * in >99.9% of named attributes! The other <0.1% incur a GFP_ATOMIC
- * allocation but that is ok. And most attributes are unnamed anyway,
- * thus the fraction of named attributes with name != I30 is actually
- * absolutely tiny.
- */
- if (na->name_len && na->name != I30) {
- unsigned int i;
-
- BUG_ON(!na->name);
- i = na->name_len * sizeof(ntfschar);
- ni->name = kmalloc(i + sizeof(ntfschar), GFP_ATOMIC);
- if (!ni->name)
- return -ENOMEM;
- memcpy(ni->name, na->name, i);
- ni->name[na->name_len] = 0;
- }
- return 0;
-}
-
-static int ntfs_read_locked_inode(struct inode *vi);
-static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode *vi);
-static int ntfs_read_locked_index_inode(struct inode *base_vi,
- struct inode *vi);
-
-/**
- * ntfs_iget - obtain a struct inode corresponding to a specific normal inode
- * @sb: super block of mounted volume
- * @mft_no: mft record number / inode number to obtain
- *
- * Obtain the struct inode corresponding to a specific normal inode (i.e. a
- * file or directory).
- *
- * If the inode is in the cache, it is just returned with an increased
- * reference count. Otherwise, a new struct inode is allocated and initialized,
- * and finally ntfs_read_locked_inode() is called to read in the inode and
- * fill in the remainder of the inode structure.
- *
- * Return the struct inode on success. Check the return value with IS_ERR() and
- * if true, the function failed and the error code is obtained from PTR_ERR().
- */
-struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no)
-{
- struct inode *vi;
- int err;
- ntfs_attr na;
-
- na.mft_no = mft_no;
- na.type = AT_UNUSED;
- na.name = NULL;
- na.name_len = 0;
-
- vi = iget5_locked(sb, mft_no, ntfs_test_inode,
- ntfs_init_locked_inode, &na);
- if (unlikely(!vi))
- return ERR_PTR(-ENOMEM);
-
- err = 0;
-
- /* If this is a freshly allocated inode, need to read it now. */
- if (vi->i_state & I_NEW) {
- err = ntfs_read_locked_inode(vi);
- unlock_new_inode(vi);
- }
- /*
- * There is no point in keeping bad inodes around if the failure was
- * due to ENOMEM. We want to be able to retry again later.
- */
- if (unlikely(err == -ENOMEM)) {
- iput(vi);
- vi = ERR_PTR(err);
- }
- return vi;
-}
-
-/**
- * ntfs_attr_iget - obtain a struct inode corresponding to an attribute
- * @base_vi: vfs base inode containing the attribute
- * @type: attribute type
- * @name: Unicode name of the attribute (NULL if unnamed)
- * @name_len: length of @name in Unicode characters (0 if unnamed)
- *
- * Obtain the (fake) struct inode corresponding to the attribute specified by
- * @type, @name, and @name_len, which is present in the base mft record
- * specified by the vfs inode @base_vi.
- *
- * If the attribute inode is in the cache, it is just returned with an
- * increased reference count. Otherwise, a new struct inode is allocated and
- * initialized, and finally ntfs_read_locked_attr_inode() is called to read the
- * attribute and fill in the inode structure.
- *
- * Note, for index allocation attributes, you need to use ntfs_index_iget()
- * instead of ntfs_attr_iget() as working with indices is a lot more complex.
- *
- * Return the struct inode of the attribute inode on success. Check the return
- * value with IS_ERR() and if true, the function failed and the error code is
- * obtained from PTR_ERR().
- */
-struct inode *ntfs_attr_iget(struct inode *base_vi, ATTR_TYPE type,
- ntfschar *name, u32 name_len)
-{
- struct inode *vi;
- int err;
- ntfs_attr na;
-
- /* Make sure no one calls ntfs_attr_iget() for indices. */
- BUG_ON(type == AT_INDEX_ALLOCATION);
-
- na.mft_no = base_vi->i_ino;
- na.type = type;
- na.name = name;
- na.name_len = name_len;
-
- vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
- ntfs_init_locked_inode, &na);
- if (unlikely(!vi))
- return ERR_PTR(-ENOMEM);
-
- err = 0;
-
- /* If this is a freshly allocated inode, need to read it now. */
- if (vi->i_state & I_NEW) {
- err = ntfs_read_locked_attr_inode(base_vi, vi);
- unlock_new_inode(vi);
- }
- /*
- * There is no point in keeping bad attribute inodes around. This also
- * simplifies things in that we never need to check for bad attribute
- * inodes elsewhere.
- */
- if (unlikely(err)) {
- iput(vi);
- vi = ERR_PTR(err);
- }
- return vi;
-}
-
-/**
- * ntfs_index_iget - obtain a struct inode corresponding to an index
- * @base_vi: vfs base inode containing the index related attributes
- * @name: Unicode name of the index
- * @name_len: length of @name in Unicode characters
- *
- * Obtain the (fake) struct inode corresponding to the index specified by @name
- * and @name_len, which is present in the base mft record specified by the vfs
- * inode @base_vi.
- *
- * If the index inode is in the cache, it is just returned with an increased
- * reference count. Otherwise, a new struct inode is allocated and
- * initialized, and finally ntfs_read_locked_index_inode() is called to read
- * the index related attributes and fill in the inode structure.
- *
- * Return the struct inode of the index inode on success. Check the return
- * value with IS_ERR() and if true, the function failed and the error code is
- * obtained from PTR_ERR().
- */
-struct inode *ntfs_index_iget(struct inode *base_vi, ntfschar *name,
- u32 name_len)
-{
- struct inode *vi;
- int err;
- ntfs_attr na;
-
- na.mft_no = base_vi->i_ino;
- na.type = AT_INDEX_ALLOCATION;
- na.name = name;
- na.name_len = name_len;
-
- vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
- ntfs_init_locked_inode, &na);
- if (unlikely(!vi))
- return ERR_PTR(-ENOMEM);
-
- err = 0;
-
- /* If this is a freshly allocated inode, need to read it now. */
- if (vi->i_state & I_NEW) {
- err = ntfs_read_locked_index_inode(base_vi, vi);
- unlock_new_inode(vi);
- }
- /*
- * There is no point in keeping bad index inodes around. This also
- * simplifies things in that we never need to check for bad index
- * inodes elsewhere.
- */
- if (unlikely(err)) {
- iput(vi);
- vi = ERR_PTR(err);
- }
- return vi;
-}
-
-struct inode *ntfs_alloc_big_inode(struct super_block *sb)
-{
- ntfs_inode *ni;
-
- ntfs_debug("Entering.");
- ni = alloc_inode_sb(sb, ntfs_big_inode_cache, GFP_NOFS);
- if (likely(ni != NULL)) {
- ni->state = 0;
- return VFS_I(ni);
- }
- ntfs_error(sb, "Allocation of NTFS big inode structure failed.");
- return NULL;
-}
-
-void ntfs_free_big_inode(struct inode *inode)
-{
- kmem_cache_free(ntfs_big_inode_cache, NTFS_I(inode));
-}
-
-static inline ntfs_inode *ntfs_alloc_extent_inode(void)
-{
- ntfs_inode *ni;
-
- ntfs_debug("Entering.");
- ni = kmem_cache_alloc(ntfs_inode_cache, GFP_NOFS);
- if (likely(ni != NULL)) {
- ni->state = 0;
- return ni;
- }
- ntfs_error(NULL, "Allocation of NTFS inode structure failed.");
- return NULL;
-}
-
-static void ntfs_destroy_extent_inode(ntfs_inode *ni)
-{
- ntfs_debug("Entering.");
- BUG_ON(ni->page);
- if (!atomic_dec_and_test(&ni->count))
- BUG();
- kmem_cache_free(ntfs_inode_cache, ni);
-}
-
-/*
- * The attribute runlist lock has separate locking rules from the
- * normal runlist lock, so split the two lock-classes:
- */
-static struct lock_class_key attr_list_rl_lock_class;
-
-/**
- * __ntfs_init_inode - initialize ntfs specific part of an inode
- * @sb: super block of mounted volume
- * @ni: freshly allocated ntfs inode which to initialize
- *
- * Initialize an ntfs inode to defaults.
- *
- * NOTE: ni->mft_no, ni->state, ni->type, ni->name, and ni->name_len are left
- * untouched. Make sure to initialize them elsewhere.
- *
- * Return zero on success and -ENOMEM on error.
- */
-void __ntfs_init_inode(struct super_block *sb, ntfs_inode *ni)
-{
- ntfs_debug("Entering.");
- rwlock_init(&ni->size_lock);
- ni->initialized_size = ni->allocated_size = 0;
- ni->seq_no = 0;
- atomic_set(&ni->count, 1);
- ni->vol = NTFS_SB(sb);
- ntfs_init_runlist(&ni->runlist);
- mutex_init(&ni->mrec_lock);
- ni->page = NULL;
- ni->page_ofs = 0;
- ni->attr_list_size = 0;
- ni->attr_list = NULL;
- ntfs_init_runlist(&ni->attr_list_rl);
- lockdep_set_class(&ni->attr_list_rl.lock,
- &attr_list_rl_lock_class);
- ni->itype.index.block_size = 0;
- ni->itype.index.vcn_size = 0;
- ni->itype.index.collation_rule = 0;
- ni->itype.index.block_size_bits = 0;
- ni->itype.index.vcn_size_bits = 0;
- mutex_init(&ni->extent_lock);
- ni->nr_extents = 0;
- ni->ext.base_ntfs_ino = NULL;
-}
-
-/*
- * Extent inodes get MFT-mapped in a nested way, while the base inode
- * is still mapped. Teach this nesting to the lock validator by creating
- * a separate class for nested inode's mrec_lock's:
- */
-static struct lock_class_key extent_inode_mrec_lock_key;
-
-inline ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
- unsigned long mft_no)
-{
- ntfs_inode *ni = ntfs_alloc_extent_inode();
-
- ntfs_debug("Entering.");
- if (likely(ni != NULL)) {
- __ntfs_init_inode(sb, ni);
- lockdep_set_class(&ni->mrec_lock, &extent_inode_mrec_lock_key);
- ni->mft_no = mft_no;
- ni->type = AT_UNUSED;
- ni->name = NULL;
- ni->name_len = 0;
- }
- return ni;
-}
-
-/**
- * ntfs_is_extended_system_file - check if a file is in the $Extend directory
- * @ctx: initialized attribute search context
- *
- * Search all file name attributes in the inode described by the attribute
- * search context @ctx and check if any of the names are in the $Extend system
- * directory.
- *
- * Return values:
- * 1: file is in $Extend directory
- * 0: file is not in $Extend directory
- * -errno: failed to determine if the file is in the $Extend directory
- */
-static int ntfs_is_extended_system_file(ntfs_attr_search_ctx *ctx)
-{
- int nr_links, err;
-
- /* Restart search. */
- ntfs_attr_reinit_search_ctx(ctx);
-
- /* Get number of hard links. */
- nr_links = le16_to_cpu(ctx->mrec->link_count);
-
- /* Loop through all hard links. */
- while (!(err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0,
- ctx))) {
- FILE_NAME_ATTR *file_name_attr;
- ATTR_RECORD *attr = ctx->attr;
- u8 *p, *p2;
-
- nr_links--;
- /*
- * Maximum sanity checking as we are called on an inode that
- * we suspect might be corrupt.
- */
- p = (u8*)attr + le32_to_cpu(attr->length);
- if (p < (u8*)ctx->mrec || (u8*)p > (u8*)ctx->mrec +
- le32_to_cpu(ctx->mrec->bytes_in_use)) {
-err_corrupt_attr:
- ntfs_error(ctx->ntfs_ino->vol->sb, "Corrupt file name "
- "attribute. You should run chkdsk.");
- return -EIO;
- }
- if (attr->non_resident) {
- ntfs_error(ctx->ntfs_ino->vol->sb, "Non-resident file "
- "name. You should run chkdsk.");
- return -EIO;
- }
- if (attr->flags) {
- ntfs_error(ctx->ntfs_ino->vol->sb, "File name with "
- "invalid flags. You should run "
- "chkdsk.");
- return -EIO;
- }
- if (!(attr->data.resident.flags & RESIDENT_ATTR_IS_INDEXED)) {
- ntfs_error(ctx->ntfs_ino->vol->sb, "Unindexed file "
- "name. You should run chkdsk.");
- return -EIO;
- }
- file_name_attr = (FILE_NAME_ATTR*)((u8*)attr +
- le16_to_cpu(attr->data.resident.value_offset));
- p2 = (u8 *)file_name_attr + le32_to_cpu(attr->data.resident.value_length);
- if (p2 < (u8*)attr || p2 > p)
- goto err_corrupt_attr;
- /* This attribute is ok, but is it in the $Extend directory? */
- if (MREF_LE(file_name_attr->parent_directory) == FILE_Extend)
- return 1; /* YES, it's an extended system file. */
- }
- if (unlikely(err != -ENOENT))
- return err;
- if (unlikely(nr_links)) {
- ntfs_error(ctx->ntfs_ino->vol->sb, "Inode hard link count "
- "doesn't match number of name attributes. You "
- "should run chkdsk.");
- return -EIO;
- }
- return 0; /* NO, it is not an extended system file. */
-}
-
-/**
- * ntfs_read_locked_inode - read an inode from its device
- * @vi: inode to read
- *
- * ntfs_read_locked_inode() is called from ntfs_iget() to read the inode
- * described by @vi into memory from the device.
- *
- * The only fields in @vi that we need to/can look at when the function is
- * called are i_sb, pointing to the mounted device's super block, and i_ino,
- * the number of the inode to load.
- *
- * ntfs_read_locked_inode() maps, pins and locks the mft record number i_ino
- * for reading and sets up the necessary @vi fields as well as initializing
- * the ntfs inode.
- *
- * Q: What locks are held when the function is called?
- * A: i_state has I_NEW set, hence the inode is locked, also
- * i_count is set to 1, so it is not going to go away
- * i_flags is set to 0 and we have no business touching it. Only an ioctl()
- * is allowed to write to them. We should of course be honouring them but
- * we need to do that using the IS_* macros defined in include/linux/fs.h.
- * In any case ntfs_read_locked_inode() has nothing to do with i_flags.
- *
- * Return 0 on success and -errno on error. In the error case, the inode will
- * have had make_bad_inode() executed on it.
- */
-static int ntfs_read_locked_inode(struct inode *vi)
-{
- ntfs_volume *vol = NTFS_SB(vi->i_sb);
- ntfs_inode *ni;
- struct inode *bvi;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- STANDARD_INFORMATION *si;
- ntfs_attr_search_ctx *ctx;
- int err = 0;
-
- ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
-
- /* Setup the generic vfs inode parts now. */
- vi->i_uid = vol->uid;
- vi->i_gid = vol->gid;
- vi->i_mode = 0;
-
- /*
- * Initialize the ntfs specific part of @vi special casing
- * FILE_MFT which we need to do at mount time.
- */
- if (vi->i_ino != FILE_MFT)
- ntfs_init_big_inode(vi);
- ni = NTFS_I(vi);
-
- m = map_mft_record(ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(ni, m);
- if (!ctx) {
- err = -ENOMEM;
- goto unm_err_out;
- }
-
- if (!(m->flags & MFT_RECORD_IN_USE)) {
- ntfs_error(vi->i_sb, "Inode is not in use!");
- goto unm_err_out;
- }
- if (m->base_mft_record) {
- ntfs_error(vi->i_sb, "Inode is an extent inode!");
- goto unm_err_out;
- }
-
- /* Transfer information from mft record into vfs and ntfs inodes. */
- vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
-
- /*
- * FIXME: Keep in mind that link_count is two for files which have both
- * a long file name and a short file name as separate entries, so if
- * we are hiding short file names this will be too high. Either we need
- * to account for the short file names by subtracting them or we need
- * to make sure we delete files even though i_nlink is not zero which
- * might be tricky due to vfs interactions. Need to think about this
- * some more when implementing the unlink command.
- */
- set_nlink(vi, le16_to_cpu(m->link_count));
- /*
- * FIXME: Reparse points can have the directory bit set even though
- * they would be S_IFLNK. Need to deal with this further below when we
- * implement reparse points / symbolic links but it will do for now.
- * Also if not a directory, it could be something else, rather than
- * a regular file. But again, will do for now.
- */
- /* Everyone gets all permissions. */
- vi->i_mode |= S_IRWXUGO;
- /* If read-only, no one gets write permissions. */
- if (IS_RDONLY(vi))
- vi->i_mode &= ~S_IWUGO;
- if (m->flags & MFT_RECORD_IS_DIRECTORY) {
- vi->i_mode |= S_IFDIR;
- /*
- * Apply the directory permissions mask set in the mount
- * options.
- */
- vi->i_mode &= ~vol->dmask;
- /* Things break without this kludge! */
- if (vi->i_nlink > 1)
- set_nlink(vi, 1);
- } else {
- vi->i_mode |= S_IFREG;
- /* Apply the file permissions mask set in the mount options. */
- vi->i_mode &= ~vol->fmask;
- }
- /*
- * Find the standard information attribute in the mft record. At this
- * stage we haven't setup the attribute list stuff yet, so this could
- * in fact fail if the standard information is in an extent record, but
- * I don't think this actually ever happens.
- */
- err = ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0, 0, 0, NULL, 0,
- ctx);
- if (unlikely(err)) {
- if (err == -ENOENT) {
- /*
- * TODO: We should be performing a hot fix here (if the
- * recover mount option is set) by creating a new
- * attribute.
- */
- ntfs_error(vi->i_sb, "$STANDARD_INFORMATION attribute "
- "is missing.");
- }
- goto unm_err_out;
- }
- a = ctx->attr;
- /* Get the standard information attribute value. */
- if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
- + le32_to_cpu(a->data.resident.value_length) >
- (u8 *)ctx->mrec + vol->mft_record_size) {
- ntfs_error(vi->i_sb, "Corrupt standard information attribute in inode.");
- goto unm_err_out;
- }
- si = (STANDARD_INFORMATION*)((u8*)a +
- le16_to_cpu(a->data.resident.value_offset));
-
- /* Transfer information from the standard information into vi. */
- /*
- * Note: The i_?times do not quite map perfectly onto the NTFS times,
- * but they are close enough, and in the end it doesn't really matter
- * that much...
- */
- /*
- * mtime is the last change of the data within the file. Not changed
- * when only metadata is changed, e.g. a rename doesn't affect mtime.
- */
- inode_set_mtime_to_ts(vi, ntfs2utc(si->last_data_change_time));
- /*
- * ctime is the last change of the metadata of the file. This obviously
- * always changes, when mtime is changed. ctime can be changed on its
- * own, mtime is then not changed, e.g. when a file is renamed.
- */
- inode_set_ctime_to_ts(vi, ntfs2utc(si->last_mft_change_time));
- /*
- * Last access to the data within the file. Not changed during a rename
- * for example but changed whenever the file is written to.
- */
- inode_set_atime_to_ts(vi, ntfs2utc(si->last_access_time));
-
- /* Find the attribute list attribute if present. */
- ntfs_attr_reinit_search_ctx(ctx);
- err = ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx);
- if (err) {
- if (unlikely(err != -ENOENT)) {
- ntfs_error(vi->i_sb, "Failed to lookup attribute list "
- "attribute.");
- goto unm_err_out;
- }
- } else /* if (!err) */ {
- if (vi->i_ino == FILE_MFT)
- goto skip_attr_list_load;
- ntfs_debug("Attribute list found in inode 0x%lx.", vi->i_ino);
- NInoSetAttrList(ni);
- a = ctx->attr;
- if (a->flags & ATTR_COMPRESSION_MASK) {
- ntfs_error(vi->i_sb, "Attribute list attribute is "
- "compressed.");
- goto unm_err_out;
- }
- if (a->flags & ATTR_IS_ENCRYPTED ||
- a->flags & ATTR_IS_SPARSE) {
- if (a->non_resident) {
- ntfs_error(vi->i_sb, "Non-resident attribute "
- "list attribute is encrypted/"
- "sparse.");
- goto unm_err_out;
- }
- ntfs_warning(vi->i_sb, "Resident attribute list "
- "attribute in inode 0x%lx is marked "
- "encrypted/sparse which is not true. "
- "However, Windows allows this and "
- "chkdsk does not detect or correct it "
- "so we will just ignore the invalid "
- "flags and pretend they are not set.",
- vi->i_ino);
- }
- /* Now allocate memory for the attribute list. */
- ni->attr_list_size = (u32)ntfs_attr_size(a);
- ni->attr_list = ntfs_malloc_nofs(ni->attr_list_size);
- if (!ni->attr_list) {
- ntfs_error(vi->i_sb, "Not enough memory to allocate "
- "buffer for attribute list.");
- err = -ENOMEM;
- goto unm_err_out;
- }
- if (a->non_resident) {
- NInoSetAttrListNonResident(ni);
- if (a->data.non_resident.lowest_vcn) {
- ntfs_error(vi->i_sb, "Attribute list has non "
- "zero lowest_vcn.");
- goto unm_err_out;
- }
- /*
- * Setup the runlist. No need for locking as we have
- * exclusive access to the inode at this time.
- */
- ni->attr_list_rl.rl = ntfs_mapping_pairs_decompress(vol,
- a, NULL);
- if (IS_ERR(ni->attr_list_rl.rl)) {
- err = PTR_ERR(ni->attr_list_rl.rl);
- ni->attr_list_rl.rl = NULL;
- ntfs_error(vi->i_sb, "Mapping pairs "
- "decompression failed.");
- goto unm_err_out;
- }
- /* Now load the attribute list. */
- if ((err = load_attribute_list(vol, &ni->attr_list_rl,
- ni->attr_list, ni->attr_list_size,
- sle64_to_cpu(a->data.non_resident.
- initialized_size)))) {
- ntfs_error(vi->i_sb, "Failed to load "
- "attribute list attribute.");
- goto unm_err_out;
- }
- } else /* if (!a->non_resident) */ {
- if ((u8*)a + le16_to_cpu(a->data.resident.value_offset)
- + le32_to_cpu(
- a->data.resident.value_length) >
- (u8*)ctx->mrec + vol->mft_record_size) {
- ntfs_error(vi->i_sb, "Corrupt attribute list "
- "in inode.");
- goto unm_err_out;
- }
- /* Now copy the attribute list. */
- memcpy(ni->attr_list, (u8*)a + le16_to_cpu(
- a->data.resident.value_offset),
- le32_to_cpu(
- a->data.resident.value_length));
- }
- }
-skip_attr_list_load:
- /*
- * If an attribute list is present we now have the attribute list value
- * in ntfs_ino->attr_list and it is ntfs_ino->attr_list_size bytes.
- */
- if (S_ISDIR(vi->i_mode)) {
- loff_t bvi_size;
- ntfs_inode *bni;
- INDEX_ROOT *ir;
- u8 *ir_end, *index_end;
-
- /* It is a directory, find index root attribute. */
- ntfs_attr_reinit_search_ctx(ctx);
- err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE,
- 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT) {
- // FIXME: File is corrupt! Hot-fix with empty
- // index root attribute if recovery option is
- // set.
- ntfs_error(vi->i_sb, "$INDEX_ROOT attribute "
- "is missing.");
- }
- goto unm_err_out;
- }
- a = ctx->attr;
- /* Set up the state. */
- if (unlikely(a->non_resident)) {
- ntfs_error(vol->sb, "$INDEX_ROOT attribute is not "
- "resident.");
- goto unm_err_out;
- }
- /* Ensure the attribute name is placed before the value. */
- if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
- le16_to_cpu(a->data.resident.value_offset)))) {
- ntfs_error(vol->sb, "$INDEX_ROOT attribute name is "
- "placed after the attribute value.");
- goto unm_err_out;
- }
- /*
- * Compressed/encrypted index root just means that the newly
- * created files in that directory should be created compressed/
- * encrypted. However index root cannot be both compressed and
- * encrypted.
- */
- if (a->flags & ATTR_COMPRESSION_MASK)
- NInoSetCompressed(ni);
- if (a->flags & ATTR_IS_ENCRYPTED) {
- if (a->flags & ATTR_COMPRESSION_MASK) {
- ntfs_error(vi->i_sb, "Found encrypted and "
- "compressed attribute.");
- goto unm_err_out;
- }
- NInoSetEncrypted(ni);
- }
- if (a->flags & ATTR_IS_SPARSE)
- NInoSetSparse(ni);
- ir = (INDEX_ROOT*)((u8*)a +
- le16_to_cpu(a->data.resident.value_offset));
- ir_end = (u8*)ir + le32_to_cpu(a->data.resident.value_length);
- if (ir_end > (u8*)ctx->mrec + vol->mft_record_size) {
- ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is "
- "corrupt.");
- goto unm_err_out;
- }
- index_end = (u8*)&ir->index +
- le32_to_cpu(ir->index.index_length);
- if (index_end > ir_end) {
- ntfs_error(vi->i_sb, "Directory index is corrupt.");
- goto unm_err_out;
- }
- if (ir->type != AT_FILE_NAME) {
- ntfs_error(vi->i_sb, "Indexed attribute is not "
- "$FILE_NAME.");
- goto unm_err_out;
- }
- if (ir->collation_rule != COLLATION_FILE_NAME) {
- ntfs_error(vi->i_sb, "Index collation rule is not "
- "COLLATION_FILE_NAME.");
- goto unm_err_out;
- }
- ni->itype.index.collation_rule = ir->collation_rule;
- ni->itype.index.block_size = le32_to_cpu(ir->index_block_size);
- if (ni->itype.index.block_size &
- (ni->itype.index.block_size - 1)) {
- ntfs_error(vi->i_sb, "Index block size (%u) is not a "
- "power of two.",
- ni->itype.index.block_size);
- goto unm_err_out;
- }
- if (ni->itype.index.block_size > PAGE_SIZE) {
- ntfs_error(vi->i_sb, "Index block size (%u) > "
- "PAGE_SIZE (%ld) is not "
- "supported. Sorry.",
- ni->itype.index.block_size,
- PAGE_SIZE);
- err = -EOPNOTSUPP;
- goto unm_err_out;
- }
- if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) {
- ntfs_error(vi->i_sb, "Index block size (%u) < "
- "NTFS_BLOCK_SIZE (%i) is not "
- "supported. Sorry.",
- ni->itype.index.block_size,
- NTFS_BLOCK_SIZE);
- err = -EOPNOTSUPP;
- goto unm_err_out;
- }
- ni->itype.index.block_size_bits =
- ffs(ni->itype.index.block_size) - 1;
- /* Determine the size of a vcn in the directory index. */
- if (vol->cluster_size <= ni->itype.index.block_size) {
- ni->itype.index.vcn_size = vol->cluster_size;
- ni->itype.index.vcn_size_bits = vol->cluster_size_bits;
- } else {
- ni->itype.index.vcn_size = vol->sector_size;
- ni->itype.index.vcn_size_bits = vol->sector_size_bits;
- }
-
- /* Setup the index allocation attribute, even if not present. */
- NInoSetMstProtected(ni);
- ni->type = AT_INDEX_ALLOCATION;
- ni->name = I30;
- ni->name_len = 4;
-
- if (!(ir->index.flags & LARGE_INDEX)) {
- /* No index allocation. */
- vi->i_size = ni->initialized_size =
- ni->allocated_size = 0;
- /* We are done with the mft record, so we release it. */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(ni);
- m = NULL;
- ctx = NULL;
- goto skip_large_dir_stuff;
- } /* LARGE_INDEX: Index allocation present. Setup state. */
- NInoSetIndexAllocPresent(ni);
- /* Find index allocation attribute. */
- ntfs_attr_reinit_search_ctx(ctx);
- err = ntfs_attr_lookup(AT_INDEX_ALLOCATION, I30, 4,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION "
- "attribute is not present but "
- "$INDEX_ROOT indicated it is.");
- else
- ntfs_error(vi->i_sb, "Failed to lookup "
- "$INDEX_ALLOCATION "
- "attribute.");
- goto unm_err_out;
- }
- a = ctx->attr;
- if (!a->non_resident) {
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute "
- "is resident.");
- goto unm_err_out;
- }
- /*
- * Ensure the attribute name is placed before the mapping pairs
- * array.
- */
- if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
- le16_to_cpu(
- a->data.non_resident.mapping_pairs_offset)))) {
- ntfs_error(vol->sb, "$INDEX_ALLOCATION attribute name "
- "is placed after the mapping pairs "
- "array.");
- goto unm_err_out;
- }
- if (a->flags & ATTR_IS_ENCRYPTED) {
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute "
- "is encrypted.");
- goto unm_err_out;
- }
- if (a->flags & ATTR_IS_SPARSE) {
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute "
- "is sparse.");
- goto unm_err_out;
- }
- if (a->flags & ATTR_COMPRESSION_MASK) {
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute "
- "is compressed.");
- goto unm_err_out;
- }
- if (a->data.non_resident.lowest_vcn) {
- ntfs_error(vi->i_sb, "First extent of "
- "$INDEX_ALLOCATION attribute has non "
- "zero lowest_vcn.");
- goto unm_err_out;
- }
- vi->i_size = sle64_to_cpu(a->data.non_resident.data_size);
- ni->initialized_size = sle64_to_cpu(
- a->data.non_resident.initialized_size);
- ni->allocated_size = sle64_to_cpu(
- a->data.non_resident.allocated_size);
- /*
- * We are done with the mft record, so we release it. Otherwise
- * we would deadlock in ntfs_attr_iget().
- */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(ni);
- m = NULL;
- ctx = NULL;
- /* Get the index bitmap attribute inode. */
- bvi = ntfs_attr_iget(vi, AT_BITMAP, I30, 4);
- if (IS_ERR(bvi)) {
- ntfs_error(vi->i_sb, "Failed to get bitmap attribute.");
- err = PTR_ERR(bvi);
- goto unm_err_out;
- }
- bni = NTFS_I(bvi);
- if (NInoCompressed(bni) || NInoEncrypted(bni) ||
- NInoSparse(bni)) {
- ntfs_error(vi->i_sb, "$BITMAP attribute is compressed "
- "and/or encrypted and/or sparse.");
- goto iput_unm_err_out;
- }
- /* Consistency check bitmap size vs. index allocation size. */
- bvi_size = i_size_read(bvi);
- if ((bvi_size << 3) < (vi->i_size >>
- ni->itype.index.block_size_bits)) {
- ntfs_error(vi->i_sb, "Index bitmap too small (0x%llx) "
- "for index allocation (0x%llx).",
- bvi_size << 3, vi->i_size);
- goto iput_unm_err_out;
- }
- /* No longer need the bitmap attribute inode. */
- iput(bvi);
-skip_large_dir_stuff:
- /* Setup the operations for this inode. */
- vi->i_op = &ntfs_dir_inode_ops;
- vi->i_fop = &ntfs_dir_ops;
- vi->i_mapping->a_ops = &ntfs_mst_aops;
- } else {
- /* It is a file. */
- ntfs_attr_reinit_search_ctx(ctx);
-
- /* Setup the data attribute, even if not present. */
- ni->type = AT_DATA;
- ni->name = NULL;
- ni->name_len = 0;
-
- /* Find first extent of the unnamed data attribute. */
- err = ntfs_attr_lookup(AT_DATA, NULL, 0, 0, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- vi->i_size = ni->initialized_size =
- ni->allocated_size = 0;
- if (err != -ENOENT) {
- ntfs_error(vi->i_sb, "Failed to lookup $DATA "
- "attribute.");
- goto unm_err_out;
- }
- /*
- * FILE_Secure does not have an unnamed $DATA
- * attribute, so we special case it here.
- */
- if (vi->i_ino == FILE_Secure)
- goto no_data_attr_special_case;
- /*
- * Most if not all the system files in the $Extend
- * system directory do not have unnamed data
- * attributes so we need to check if the parent
- * directory of the file is FILE_Extend and if it is
- * ignore this error. To do this we need to get the
- * name of this inode from the mft record as the name
- * contains the back reference to the parent directory.
- */
- if (ntfs_is_extended_system_file(ctx) > 0)
- goto no_data_attr_special_case;
- // FIXME: File is corrupt! Hot-fix with empty data
- // attribute if recovery option is set.
- ntfs_error(vi->i_sb, "$DATA attribute is missing.");
- goto unm_err_out;
- }
- a = ctx->attr;
- /* Setup the state. */
- if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
- if (a->flags & ATTR_COMPRESSION_MASK) {
- NInoSetCompressed(ni);
- if (vol->cluster_size > 4096) {
- ntfs_error(vi->i_sb, "Found "
- "compressed data but "
- "compression is "
- "disabled due to "
- "cluster size (%i) > "
- "4kiB.",
- vol->cluster_size);
- goto unm_err_out;
- }
- if ((a->flags & ATTR_COMPRESSION_MASK)
- != ATTR_IS_COMPRESSED) {
- ntfs_error(vi->i_sb, "Found unknown "
- "compression method "
- "or corrupt file.");
- goto unm_err_out;
- }
- }
- if (a->flags & ATTR_IS_SPARSE)
- NInoSetSparse(ni);
- }
- if (a->flags & ATTR_IS_ENCRYPTED) {
- if (NInoCompressed(ni)) {
- ntfs_error(vi->i_sb, "Found encrypted and "
- "compressed data.");
- goto unm_err_out;
- }
- NInoSetEncrypted(ni);
- }
- if (a->non_resident) {
- NInoSetNonResident(ni);
- if (NInoCompressed(ni) || NInoSparse(ni)) {
- if (NInoCompressed(ni) && a->data.non_resident.
- compression_unit != 4) {
- ntfs_error(vi->i_sb, "Found "
- "non-standard "
- "compression unit (%u "
- "instead of 4). "
- "Cannot handle this.",
- a->data.non_resident.
- compression_unit);
- err = -EOPNOTSUPP;
- goto unm_err_out;
- }
- if (a->data.non_resident.compression_unit) {
- ni->itype.compressed.block_size = 1U <<
- (a->data.non_resident.
- compression_unit +
- vol->cluster_size_bits);
- ni->itype.compressed.block_size_bits =
- ffs(ni->itype.
- compressed.
- block_size) - 1;
- ni->itype.compressed.block_clusters =
- 1U << a->data.
- non_resident.
- compression_unit;
- } else {
- ni->itype.compressed.block_size = 0;
- ni->itype.compressed.block_size_bits =
- 0;
- ni->itype.compressed.block_clusters =
- 0;
- }
- ni->itype.compressed.size = sle64_to_cpu(
- a->data.non_resident.
- compressed_size);
- }
- if (a->data.non_resident.lowest_vcn) {
- ntfs_error(vi->i_sb, "First extent of $DATA "
- "attribute has non zero "
- "lowest_vcn.");
- goto unm_err_out;
- }
- vi->i_size = sle64_to_cpu(
- a->data.non_resident.data_size);
- ni->initialized_size = sle64_to_cpu(
- a->data.non_resident.initialized_size);
- ni->allocated_size = sle64_to_cpu(
- a->data.non_resident.allocated_size);
- } else { /* Resident attribute. */
- vi->i_size = ni->initialized_size = le32_to_cpu(
- a->data.resident.value_length);
- ni->allocated_size = le32_to_cpu(a->length) -
- le16_to_cpu(
- a->data.resident.value_offset);
- if (vi->i_size > ni->allocated_size) {
- ntfs_error(vi->i_sb, "Resident data attribute "
- "is corrupt (size exceeds "
- "allocation).");
- goto unm_err_out;
- }
- }
-no_data_attr_special_case:
- /* We are done with the mft record, so we release it. */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(ni);
- m = NULL;
- ctx = NULL;
- /* Setup the operations for this inode. */
- vi->i_op = &ntfs_file_inode_ops;
- vi->i_fop = &ntfs_file_ops;
- vi->i_mapping->a_ops = &ntfs_normal_aops;
- if (NInoMstProtected(ni))
- vi->i_mapping->a_ops = &ntfs_mst_aops;
- else if (NInoCompressed(ni))
- vi->i_mapping->a_ops = &ntfs_compressed_aops;
- }
- /*
- * The number of 512-byte blocks used on disk (for stat). This is in so
- * far inaccurate as it doesn't account for any named streams or other
- * special non-resident attributes, but that is how Windows works, too,
- * so we are at least consistent with Windows, if not entirely
- * consistent with the Linux Way. Doing it the Linux Way would cause a
- * significant slowdown as it would involve iterating over all
- * attributes in the mft record and adding the allocated/compressed
- * sizes of all non-resident attributes present to give us the Linux
- * correct size that should go into i_blocks (after division by 512).
- */
- if (S_ISREG(vi->i_mode) && (NInoCompressed(ni) || NInoSparse(ni)))
- vi->i_blocks = ni->itype.compressed.size >> 9;
- else
- vi->i_blocks = ni->allocated_size >> 9;
- ntfs_debug("Done.");
- return 0;
-iput_unm_err_out:
- iput(bvi);
-unm_err_out:
- if (!err)
- err = -EIO;
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(ni);
-err_out:
- ntfs_error(vol->sb, "Failed with error code %i. Marking corrupt "
- "inode 0x%lx as bad. Run chkdsk.", err, vi->i_ino);
- make_bad_inode(vi);
- if (err != -EOPNOTSUPP && err != -ENOMEM)
- NVolSetErrors(vol);
- return err;
-}
-
-/**
- * ntfs_read_locked_attr_inode - read an attribute inode from its base inode
- * @base_vi: base inode
- * @vi: attribute inode to read
- *
- * ntfs_read_locked_attr_inode() is called from ntfs_attr_iget() to read the
- * attribute inode described by @vi into memory from the base mft record
- * described by @base_ni.
- *
- * ntfs_read_locked_attr_inode() maps, pins and locks the base inode for
- * reading and looks up the attribute described by @vi before setting up the
- * necessary fields in @vi as well as initializing the ntfs inode.
- *
- * Q: What locks are held when the function is called?
- * A: i_state has I_NEW set, hence the inode is locked, also
- * i_count is set to 1, so it is not going to go away
- *
- * Return 0 on success and -errno on error. In the error case, the inode will
- * have had make_bad_inode() executed on it.
- *
- * Note this cannot be called for AT_INDEX_ALLOCATION.
- */
-static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode *vi)
-{
- ntfs_volume *vol = NTFS_SB(vi->i_sb);
- ntfs_inode *ni, *base_ni;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- ntfs_attr_search_ctx *ctx;
- int err = 0;
-
- ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
-
- ntfs_init_big_inode(vi);
-
- ni = NTFS_I(vi);
- base_ni = NTFS_I(base_vi);
-
- /* Just mirror the values from the base inode. */
- vi->i_uid = base_vi->i_uid;
- vi->i_gid = base_vi->i_gid;
- set_nlink(vi, base_vi->i_nlink);
- inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
- inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
- inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
- vi->i_generation = ni->seq_no = base_ni->seq_no;
-
- /* Set inode type to zero but preserve permissions. */
- vi->i_mode = base_vi->i_mode & ~S_IFMT;
-
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (!ctx) {
- err = -ENOMEM;
- goto unm_err_out;
- }
- /* Find the attribute. */
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err))
- goto unm_err_out;
- a = ctx->attr;
- if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
- if (a->flags & ATTR_COMPRESSION_MASK) {
- NInoSetCompressed(ni);
- if ((ni->type != AT_DATA) || (ni->type == AT_DATA &&
- ni->name_len)) {
- ntfs_error(vi->i_sb, "Found compressed "
- "non-data or named data "
- "attribute. Please report "
- "you saw this message to "
- "linux-ntfs-dev@lists."
- "sourceforge.net");
- goto unm_err_out;
- }
- if (vol->cluster_size > 4096) {
- ntfs_error(vi->i_sb, "Found compressed "
- "attribute but compression is "
- "disabled due to cluster size "
- "(%i) > 4kiB.",
- vol->cluster_size);
- goto unm_err_out;
- }
- if ((a->flags & ATTR_COMPRESSION_MASK) !=
- ATTR_IS_COMPRESSED) {
- ntfs_error(vi->i_sb, "Found unknown "
- "compression method.");
- goto unm_err_out;
- }
- }
- /*
- * The compressed/sparse flag set in an index root just means
- * to compress all files.
- */
- if (NInoMstProtected(ni) && ni->type != AT_INDEX_ROOT) {
- ntfs_error(vi->i_sb, "Found mst protected attribute "
- "but the attribute is %s. Please "
- "report you saw this message to "
- "linux-ntfs-dev@lists.sourceforge.net",
- NInoCompressed(ni) ? "compressed" :
- "sparse");
- goto unm_err_out;
- }
- if (a->flags & ATTR_IS_SPARSE)
- NInoSetSparse(ni);
- }
- if (a->flags & ATTR_IS_ENCRYPTED) {
- if (NInoCompressed(ni)) {
- ntfs_error(vi->i_sb, "Found encrypted and compressed "
- "data.");
- goto unm_err_out;
- }
- /*
- * The encryption flag set in an index root just means to
- * encrypt all files.
- */
- if (NInoMstProtected(ni) && ni->type != AT_INDEX_ROOT) {
- ntfs_error(vi->i_sb, "Found mst protected attribute "
- "but the attribute is encrypted. "
- "Please report you saw this message "
- "to linux-ntfs-dev@lists.sourceforge."
- "net");
- goto unm_err_out;
- }
- if (ni->type != AT_DATA) {
- ntfs_error(vi->i_sb, "Found encrypted non-data "
- "attribute.");
- goto unm_err_out;
- }
- NInoSetEncrypted(ni);
- }
- if (!a->non_resident) {
- /* Ensure the attribute name is placed before the value. */
- if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
- le16_to_cpu(a->data.resident.value_offset)))) {
- ntfs_error(vol->sb, "Attribute name is placed after "
- "the attribute value.");
- goto unm_err_out;
- }
- if (NInoMstProtected(ni)) {
- ntfs_error(vi->i_sb, "Found mst protected attribute "
- "but the attribute is resident. "
- "Please report you saw this message to "
- "linux-ntfs-dev@lists.sourceforge.net");
- goto unm_err_out;
- }
- vi->i_size = ni->initialized_size = le32_to_cpu(
- a->data.resident.value_length);
- ni->allocated_size = le32_to_cpu(a->length) -
- le16_to_cpu(a->data.resident.value_offset);
- if (vi->i_size > ni->allocated_size) {
- ntfs_error(vi->i_sb, "Resident attribute is corrupt "
- "(size exceeds allocation).");
- goto unm_err_out;
- }
- } else {
- NInoSetNonResident(ni);
- /*
- * Ensure the attribute name is placed before the mapping pairs
- * array.
- */
- if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
- le16_to_cpu(
- a->data.non_resident.mapping_pairs_offset)))) {
- ntfs_error(vol->sb, "Attribute name is placed after "
- "the mapping pairs array.");
- goto unm_err_out;
- }
- if (NInoCompressed(ni) || NInoSparse(ni)) {
- if (NInoCompressed(ni) && a->data.non_resident.
- compression_unit != 4) {
- ntfs_error(vi->i_sb, "Found non-standard "
- "compression unit (%u instead "
- "of 4). Cannot handle this.",
- a->data.non_resident.
- compression_unit);
- err = -EOPNOTSUPP;
- goto unm_err_out;
- }
- if (a->data.non_resident.compression_unit) {
- ni->itype.compressed.block_size = 1U <<
- (a->data.non_resident.
- compression_unit +
- vol->cluster_size_bits);
- ni->itype.compressed.block_size_bits =
- ffs(ni->itype.compressed.
- block_size) - 1;
- ni->itype.compressed.block_clusters = 1U <<
- a->data.non_resident.
- compression_unit;
- } else {
- ni->itype.compressed.block_size = 0;
- ni->itype.compressed.block_size_bits = 0;
- ni->itype.compressed.block_clusters = 0;
- }
- ni->itype.compressed.size = sle64_to_cpu(
- a->data.non_resident.compressed_size);
- }
- if (a->data.non_resident.lowest_vcn) {
- ntfs_error(vi->i_sb, "First extent of attribute has "
- "non-zero lowest_vcn.");
- goto unm_err_out;
- }
- vi->i_size = sle64_to_cpu(a->data.non_resident.data_size);
- ni->initialized_size = sle64_to_cpu(
- a->data.non_resident.initialized_size);
- ni->allocated_size = sle64_to_cpu(
- a->data.non_resident.allocated_size);
- }
- vi->i_mapping->a_ops = &ntfs_normal_aops;
- if (NInoMstProtected(ni))
- vi->i_mapping->a_ops = &ntfs_mst_aops;
- else if (NInoCompressed(ni))
- vi->i_mapping->a_ops = &ntfs_compressed_aops;
- if ((NInoCompressed(ni) || NInoSparse(ni)) && ni->type != AT_INDEX_ROOT)
- vi->i_blocks = ni->itype.compressed.size >> 9;
- else
- vi->i_blocks = ni->allocated_size >> 9;
- /*
- * Make sure the base inode does not go away and attach it to the
- * attribute inode.
- */
- igrab(base_vi);
- ni->ext.base_ntfs_ino = base_ni;
- ni->nr_extents = -1;
-
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
-
- ntfs_debug("Done.");
- return 0;
-
-unm_err_out:
- if (!err)
- err = -EIO;
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
-err_out:
- ntfs_error(vol->sb, "Failed with error code %i while reading attribute "
- "inode (mft_no 0x%lx, type 0x%x, name_len %i). "
- "Marking corrupt inode and base inode 0x%lx as bad. "
- "Run chkdsk.", err, vi->i_ino, ni->type, ni->name_len,
- base_vi->i_ino);
- make_bad_inode(vi);
- if (err != -ENOMEM)
- NVolSetErrors(vol);
- return err;
-}
-
-/**
- * ntfs_read_locked_index_inode - read an index inode from its base inode
- * @base_vi: base inode
- * @vi: index inode to read
- *
- * ntfs_read_locked_index_inode() is called from ntfs_index_iget() to read the
- * index inode described by @vi into memory from the base mft record described
- * by @base_ni.
- *
- * ntfs_read_locked_index_inode() maps, pins and locks the base inode for
- * reading and looks up the attributes relating to the index described by @vi
- * before setting up the necessary fields in @vi as well as initializing the
- * ntfs inode.
- *
- * Note, index inodes are essentially attribute inodes (NInoAttr() is true)
- * with the attribute type set to AT_INDEX_ALLOCATION. Apart from that, they
- * are setup like directory inodes since directories are a special case of
- * indices ao they need to be treated in much the same way. Most importantly,
- * for small indices the index allocation attribute might not actually exist.
- * However, the index root attribute always exists but this does not need to
- * have an inode associated with it and this is why we define a new inode type
- * index. Also, like for directories, we need to have an attribute inode for
- * the bitmap attribute corresponding to the index allocation attribute and we
- * can store this in the appropriate field of the inode, just like we do for
- * normal directory inodes.
- *
- * Q: What locks are held when the function is called?
- * A: i_state has I_NEW set, hence the inode is locked, also
- * i_count is set to 1, so it is not going to go away
- *
- * Return 0 on success and -errno on error. In the error case, the inode will
- * have had make_bad_inode() executed on it.
- */
-static int ntfs_read_locked_index_inode(struct inode *base_vi, struct inode *vi)
-{
- loff_t bvi_size;
- ntfs_volume *vol = NTFS_SB(vi->i_sb);
- ntfs_inode *ni, *base_ni, *bni;
- struct inode *bvi;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- ntfs_attr_search_ctx *ctx;
- INDEX_ROOT *ir;
- u8 *ir_end, *index_end;
- int err = 0;
-
- ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
- ntfs_init_big_inode(vi);
- ni = NTFS_I(vi);
- base_ni = NTFS_I(base_vi);
- /* Just mirror the values from the base inode. */
- vi->i_uid = base_vi->i_uid;
- vi->i_gid = base_vi->i_gid;
- set_nlink(vi, base_vi->i_nlink);
- inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
- inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
- inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
- vi->i_generation = ni->seq_no = base_ni->seq_no;
- /* Set inode type to zero but preserve permissions. */
- vi->i_mode = base_vi->i_mode & ~S_IFMT;
- /* Map the mft record for the base inode. */
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (!ctx) {
- err = -ENOMEM;
- goto unm_err_out;
- }
- /* Find the index root attribute. */
- err = ntfs_attr_lookup(AT_INDEX_ROOT, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is "
- "missing.");
- goto unm_err_out;
- }
- a = ctx->attr;
- /* Set up the state. */
- if (unlikely(a->non_resident)) {
- ntfs_error(vol->sb, "$INDEX_ROOT attribute is not resident.");
- goto unm_err_out;
- }
- /* Ensure the attribute name is placed before the value. */
- if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
- le16_to_cpu(a->data.resident.value_offset)))) {
- ntfs_error(vol->sb, "$INDEX_ROOT attribute name is placed "
- "after the attribute value.");
- goto unm_err_out;
- }
- /*
- * Compressed/encrypted/sparse index root is not allowed, except for
- * directories of course but those are not dealt with here.
- */
- if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_ENCRYPTED |
- ATTR_IS_SPARSE)) {
- ntfs_error(vi->i_sb, "Found compressed/encrypted/sparse index "
- "root attribute.");
- goto unm_err_out;
- }
- ir = (INDEX_ROOT*)((u8*)a + le16_to_cpu(a->data.resident.value_offset));
- ir_end = (u8*)ir + le32_to_cpu(a->data.resident.value_length);
- if (ir_end > (u8*)ctx->mrec + vol->mft_record_size) {
- ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is corrupt.");
- goto unm_err_out;
- }
- index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
- if (index_end > ir_end) {
- ntfs_error(vi->i_sb, "Index is corrupt.");
- goto unm_err_out;
- }
- if (ir->type) {
- ntfs_error(vi->i_sb, "Index type is not 0 (type is 0x%x).",
- le32_to_cpu(ir->type));
- goto unm_err_out;
- }
- ni->itype.index.collation_rule = ir->collation_rule;
- ntfs_debug("Index collation rule is 0x%x.",
- le32_to_cpu(ir->collation_rule));
- ni->itype.index.block_size = le32_to_cpu(ir->index_block_size);
- if (!is_power_of_2(ni->itype.index.block_size)) {
- ntfs_error(vi->i_sb, "Index block size (%u) is not a power of "
- "two.", ni->itype.index.block_size);
- goto unm_err_out;
- }
- if (ni->itype.index.block_size > PAGE_SIZE) {
- ntfs_error(vi->i_sb, "Index block size (%u) > PAGE_SIZE "
- "(%ld) is not supported. Sorry.",
- ni->itype.index.block_size, PAGE_SIZE);
- err = -EOPNOTSUPP;
- goto unm_err_out;
- }
- if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) {
- ntfs_error(vi->i_sb, "Index block size (%u) < NTFS_BLOCK_SIZE "
- "(%i) is not supported. Sorry.",
- ni->itype.index.block_size, NTFS_BLOCK_SIZE);
- err = -EOPNOTSUPP;
- goto unm_err_out;
- }
- ni->itype.index.block_size_bits = ffs(ni->itype.index.block_size) - 1;
- /* Determine the size of a vcn in the index. */
- if (vol->cluster_size <= ni->itype.index.block_size) {
- ni->itype.index.vcn_size = vol->cluster_size;
- ni->itype.index.vcn_size_bits = vol->cluster_size_bits;
- } else {
- ni->itype.index.vcn_size = vol->sector_size;
- ni->itype.index.vcn_size_bits = vol->sector_size_bits;
- }
- /* Check for presence of index allocation attribute. */
- if (!(ir->index.flags & LARGE_INDEX)) {
- /* No index allocation. */
- vi->i_size = ni->initialized_size = ni->allocated_size = 0;
- /* We are done with the mft record, so we release it. */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- m = NULL;
- ctx = NULL;
- goto skip_large_index_stuff;
- } /* LARGE_INDEX: Index allocation present. Setup state. */
- NInoSetIndexAllocPresent(ni);
- /* Find index allocation attribute. */
- ntfs_attr_reinit_search_ctx(ctx);
- err = ntfs_attr_lookup(AT_INDEX_ALLOCATION, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT)
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is "
- "not present but $INDEX_ROOT "
- "indicated it is.");
- else
- ntfs_error(vi->i_sb, "Failed to lookup "
- "$INDEX_ALLOCATION attribute.");
- goto unm_err_out;
- }
- a = ctx->attr;
- if (!a->non_resident) {
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is "
- "resident.");
- goto unm_err_out;
- }
- /*
- * Ensure the attribute name is placed before the mapping pairs array.
- */
- if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
- le16_to_cpu(
- a->data.non_resident.mapping_pairs_offset)))) {
- ntfs_error(vol->sb, "$INDEX_ALLOCATION attribute name is "
- "placed after the mapping pairs array.");
- goto unm_err_out;
- }
- if (a->flags & ATTR_IS_ENCRYPTED) {
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is "
- "encrypted.");
- goto unm_err_out;
- }
- if (a->flags & ATTR_IS_SPARSE) {
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is sparse.");
- goto unm_err_out;
- }
- if (a->flags & ATTR_COMPRESSION_MASK) {
- ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is "
- "compressed.");
- goto unm_err_out;
- }
- if (a->data.non_resident.lowest_vcn) {
- ntfs_error(vi->i_sb, "First extent of $INDEX_ALLOCATION "
- "attribute has non zero lowest_vcn.");
- goto unm_err_out;
- }
- vi->i_size = sle64_to_cpu(a->data.non_resident.data_size);
- ni->initialized_size = sle64_to_cpu(
- a->data.non_resident.initialized_size);
- ni->allocated_size = sle64_to_cpu(a->data.non_resident.allocated_size);
- /*
- * We are done with the mft record, so we release it. Otherwise
- * we would deadlock in ntfs_attr_iget().
- */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- m = NULL;
- ctx = NULL;
- /* Get the index bitmap attribute inode. */
- bvi = ntfs_attr_iget(base_vi, AT_BITMAP, ni->name, ni->name_len);
- if (IS_ERR(bvi)) {
- ntfs_error(vi->i_sb, "Failed to get bitmap attribute.");
- err = PTR_ERR(bvi);
- goto unm_err_out;
- }
- bni = NTFS_I(bvi);
- if (NInoCompressed(bni) || NInoEncrypted(bni) ||
- NInoSparse(bni)) {
- ntfs_error(vi->i_sb, "$BITMAP attribute is compressed and/or "
- "encrypted and/or sparse.");
- goto iput_unm_err_out;
- }
- /* Consistency check bitmap size vs. index allocation size. */
- bvi_size = i_size_read(bvi);
- if ((bvi_size << 3) < (vi->i_size >> ni->itype.index.block_size_bits)) {
- ntfs_error(vi->i_sb, "Index bitmap too small (0x%llx) for "
- "index allocation (0x%llx).", bvi_size << 3,
- vi->i_size);
- goto iput_unm_err_out;
- }
- iput(bvi);
-skip_large_index_stuff:
- /* Setup the operations for this index inode. */
- vi->i_mapping->a_ops = &ntfs_mst_aops;
- vi->i_blocks = ni->allocated_size >> 9;
- /*
- * Make sure the base inode doesn't go away and attach it to the
- * index inode.
- */
- igrab(base_vi);
- ni->ext.base_ntfs_ino = base_ni;
- ni->nr_extents = -1;
-
- ntfs_debug("Done.");
- return 0;
-iput_unm_err_out:
- iput(bvi);
-unm_err_out:
- if (!err)
- err = -EIO;
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(base_ni);
-err_out:
- ntfs_error(vi->i_sb, "Failed with error code %i while reading index "
- "inode (mft_no 0x%lx, name_len %i.", err, vi->i_ino,
- ni->name_len);
- make_bad_inode(vi);
- if (err != -EOPNOTSUPP && err != -ENOMEM)
- NVolSetErrors(vol);
- return err;
-}
-
-/*
- * The MFT inode has special locking, so teach the lock validator
- * about this by splitting off the locking rules of the MFT from
- * the locking rules of other inodes. The MFT inode can never be
- * accessed from the VFS side (or even internally), only by the
- * map_mft functions.
- */
-static struct lock_class_key mft_ni_runlist_lock_key, mft_ni_mrec_lock_key;
-
-/**
- * ntfs_read_inode_mount - special read_inode for mount time use only
- * @vi: inode to read
- *
- * Read inode FILE_MFT at mount time, only called with super_block lock
- * held from within the read_super() code path.
- *
- * This function exists because when it is called the page cache for $MFT/$DATA
- * is not initialized and hence we cannot get at the contents of mft records
- * by calling map_mft_record*().
- *
- * Further it needs to cope with the circular references problem, i.e. cannot
- * load any attributes other than $ATTRIBUTE_LIST until $DATA is loaded, because
- * we do not know where the other extent mft records are yet and again, because
- * we cannot call map_mft_record*() yet. Obviously this applies only when an
- * attribute list is actually present in $MFT inode.
- *
- * We solve these problems by starting with the $DATA attribute before anything
- * else and iterating using ntfs_attr_lookup($DATA) over all extents. As each
- * extent is found, we ntfs_mapping_pairs_decompress() including the implied
- * ntfs_runlists_merge(). Each step of the iteration necessarily provides
- * sufficient information for the next step to complete.
- *
- * This should work but there are two possible pit falls (see inline comments
- * below), but only time will tell if they are real pits or just smoke...
- */
-int ntfs_read_inode_mount(struct inode *vi)
-{
- VCN next_vcn, last_vcn, highest_vcn;
- s64 block;
- struct super_block *sb = vi->i_sb;
- ntfs_volume *vol = NTFS_SB(sb);
- struct buffer_head *bh;
- ntfs_inode *ni;
- MFT_RECORD *m = NULL;
- ATTR_RECORD *a;
- ntfs_attr_search_ctx *ctx;
- unsigned int i, nr_blocks;
- int err;
-
- ntfs_debug("Entering.");
-
- /* Initialize the ntfs specific part of @vi. */
- ntfs_init_big_inode(vi);
-
- ni = NTFS_I(vi);
-
- /* Setup the data attribute. It is special as it is mst protected. */
- NInoSetNonResident(ni);
- NInoSetMstProtected(ni);
- NInoSetSparseDisabled(ni);
- ni->type = AT_DATA;
- ni->name = NULL;
- ni->name_len = 0;
- /*
- * This sets up our little cheat allowing us to reuse the async read io
- * completion handler for directories.
- */
- ni->itype.index.block_size = vol->mft_record_size;
- ni->itype.index.block_size_bits = vol->mft_record_size_bits;
-
- /* Very important! Needed to be able to call map_mft_record*(). */
- vol->mft_ino = vi;
-
- /* Allocate enough memory to read the first mft record. */
- if (vol->mft_record_size > 64 * 1024) {
- ntfs_error(sb, "Unsupported mft record size %i (max 64kiB).",
- vol->mft_record_size);
- goto err_out;
- }
- i = vol->mft_record_size;
- if (i < sb->s_blocksize)
- i = sb->s_blocksize;
- m = (MFT_RECORD*)ntfs_malloc_nofs(i);
- if (!m) {
- ntfs_error(sb, "Failed to allocate buffer for $MFT record 0.");
- goto err_out;
- }
-
- /* Determine the first block of the $MFT/$DATA attribute. */
- block = vol->mft_lcn << vol->cluster_size_bits >>
- sb->s_blocksize_bits;
- nr_blocks = vol->mft_record_size >> sb->s_blocksize_bits;
- if (!nr_blocks)
- nr_blocks = 1;
-
- /* Load $MFT/$DATA's first mft record. */
- for (i = 0; i < nr_blocks; i++) {
- bh = sb_bread(sb, block++);
- if (!bh) {
- ntfs_error(sb, "Device read failed.");
- goto err_out;
- }
- memcpy((char*)m + (i << sb->s_blocksize_bits), bh->b_data,
- sb->s_blocksize);
- brelse(bh);
- }
-
- if (le32_to_cpu(m->bytes_allocated) != vol->mft_record_size) {
- ntfs_error(sb, "Incorrect mft record size %u in superblock, should be %u.",
- le32_to_cpu(m->bytes_allocated), vol->mft_record_size);
- goto err_out;
- }
-
- /* Apply the mst fixups. */
- if (post_read_mst_fixup((NTFS_RECORD*)m, vol->mft_record_size)) {
- /* FIXME: Try to use the $MFTMirr now. */
- ntfs_error(sb, "MST fixup failed. $MFT is corrupt.");
- goto err_out;
- }
-
- /* Sanity check offset to the first attribute */
- if (le16_to_cpu(m->attrs_offset) >= le32_to_cpu(m->bytes_allocated)) {
- ntfs_error(sb, "Incorrect mft offset to the first attribute %u in superblock.",
- le16_to_cpu(m->attrs_offset));
- goto err_out;
- }
-
- /* Need this to sanity check attribute list references to $MFT. */
- vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
-
- /* Provides read_folio() for map_mft_record(). */
- vi->i_mapping->a_ops = &ntfs_mst_aops;
-
- ctx = ntfs_attr_get_search_ctx(ni, m);
- if (!ctx) {
- err = -ENOMEM;
- goto err_out;
- }
-
- /* Find the attribute list attribute if present. */
- err = ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx);
- if (err) {
- if (unlikely(err != -ENOENT)) {
- ntfs_error(sb, "Failed to lookup attribute list "
- "attribute. You should run chkdsk.");
- goto put_err_out;
- }
- } else /* if (!err) */ {
- ATTR_LIST_ENTRY *al_entry, *next_al_entry;
- u8 *al_end;
- static const char *es = " Not allowed. $MFT is corrupt. "
- "You should run chkdsk.";
-
- ntfs_debug("Attribute list attribute found in $MFT.");
- NInoSetAttrList(ni);
- a = ctx->attr;
- if (a->flags & ATTR_COMPRESSION_MASK) {
- ntfs_error(sb, "Attribute list attribute is "
- "compressed.%s", es);
- goto put_err_out;
- }
- if (a->flags & ATTR_IS_ENCRYPTED ||
- a->flags & ATTR_IS_SPARSE) {
- if (a->non_resident) {
- ntfs_error(sb, "Non-resident attribute list "
- "attribute is encrypted/"
- "sparse.%s", es);
- goto put_err_out;
- }
- ntfs_warning(sb, "Resident attribute list attribute "
- "in $MFT system file is marked "
- "encrypted/sparse which is not true. "
- "However, Windows allows this and "
- "chkdsk does not detect or correct it "
- "so we will just ignore the invalid "
- "flags and pretend they are not set.");
- }
- /* Now allocate memory for the attribute list. */
- ni->attr_list_size = (u32)ntfs_attr_size(a);
- if (!ni->attr_list_size) {
- ntfs_error(sb, "Attr_list_size is zero");
- goto put_err_out;
- }
- ni->attr_list = ntfs_malloc_nofs(ni->attr_list_size);
- if (!ni->attr_list) {
- ntfs_error(sb, "Not enough memory to allocate buffer "
- "for attribute list.");
- goto put_err_out;
- }
- if (a->non_resident) {
- NInoSetAttrListNonResident(ni);
- if (a->data.non_resident.lowest_vcn) {
- ntfs_error(sb, "Attribute list has non zero "
- "lowest_vcn. $MFT is corrupt. "
- "You should run chkdsk.");
- goto put_err_out;
- }
- /* Setup the runlist. */
- ni->attr_list_rl.rl = ntfs_mapping_pairs_decompress(vol,
- a, NULL);
- if (IS_ERR(ni->attr_list_rl.rl)) {
- err = PTR_ERR(ni->attr_list_rl.rl);
- ni->attr_list_rl.rl = NULL;
- ntfs_error(sb, "Mapping pairs decompression "
- "failed with error code %i.",
- -err);
- goto put_err_out;
- }
- /* Now load the attribute list. */
- if ((err = load_attribute_list(vol, &ni->attr_list_rl,
- ni->attr_list, ni->attr_list_size,
- sle64_to_cpu(a->data.
- non_resident.initialized_size)))) {
- ntfs_error(sb, "Failed to load attribute list "
- "attribute with error code %i.",
- -err);
- goto put_err_out;
- }
- } else /* if (!ctx.attr->non_resident) */ {
- if ((u8*)a + le16_to_cpu(
- a->data.resident.value_offset) +
- le32_to_cpu(
- a->data.resident.value_length) >
- (u8*)ctx->mrec + vol->mft_record_size) {
- ntfs_error(sb, "Corrupt attribute list "
- "attribute.");
- goto put_err_out;
- }
- /* Now copy the attribute list. */
- memcpy(ni->attr_list, (u8*)a + le16_to_cpu(
- a->data.resident.value_offset),
- le32_to_cpu(
- a->data.resident.value_length));
- }
- /* The attribute list is now setup in memory. */
- /*
- * FIXME: I don't know if this case is actually possible.
- * According to logic it is not possible but I have seen too
- * many weird things in MS software to rely on logic... Thus we
- * perform a manual search and make sure the first $MFT/$DATA
- * extent is in the base inode. If it is not we abort with an
- * error and if we ever see a report of this error we will need
- * to do some magic in order to have the necessary mft record
- * loaded and in the right place in the page cache. But
- * hopefully logic will prevail and this never happens...
- */
- al_entry = (ATTR_LIST_ENTRY*)ni->attr_list;
- al_end = (u8*)al_entry + ni->attr_list_size;
- for (;; al_entry = next_al_entry) {
- /* Out of bounds check. */
- if ((u8*)al_entry < ni->attr_list ||
- (u8*)al_entry > al_end)
- goto em_put_err_out;
- /* Catch the end of the attribute list. */
- if ((u8*)al_entry == al_end)
- goto em_put_err_out;
- if (!al_entry->length)
- goto em_put_err_out;
- if ((u8*)al_entry + 6 > al_end || (u8*)al_entry +
- le16_to_cpu(al_entry->length) > al_end)
- goto em_put_err_out;
- next_al_entry = (ATTR_LIST_ENTRY*)((u8*)al_entry +
- le16_to_cpu(al_entry->length));
- if (le32_to_cpu(al_entry->type) > le32_to_cpu(AT_DATA))
- goto em_put_err_out;
- if (AT_DATA != al_entry->type)
- continue;
- /* We want an unnamed attribute. */
- if (al_entry->name_length)
- goto em_put_err_out;
- /* Want the first entry, i.e. lowest_vcn == 0. */
- if (al_entry->lowest_vcn)
- goto em_put_err_out;
- /* First entry has to be in the base mft record. */
- if (MREF_LE(al_entry->mft_reference) != vi->i_ino) {
- /* MFT references do not match, logic fails. */
- ntfs_error(sb, "BUG: The first $DATA extent "
- "of $MFT is not in the base "
- "mft record. Please report "
- "you saw this message to "
- "linux-ntfs-dev@lists."
- "sourceforge.net");
- goto put_err_out;
- } else {
- /* Sequence numbers must match. */
- if (MSEQNO_LE(al_entry->mft_reference) !=
- ni->seq_no)
- goto em_put_err_out;
- /* Got it. All is ok. We can stop now. */
- break;
- }
- }
- }
-
- ntfs_attr_reinit_search_ctx(ctx);
-
- /* Now load all attribute extents. */
- a = NULL;
- next_vcn = last_vcn = highest_vcn = 0;
- while (!(err = ntfs_attr_lookup(AT_DATA, NULL, 0, 0, next_vcn, NULL, 0,
- ctx))) {
- runlist_element *nrl;
-
- /* Cache the current attribute. */
- a = ctx->attr;
- /* $MFT must be non-resident. */
- if (!a->non_resident) {
- ntfs_error(sb, "$MFT must be non-resident but a "
- "resident extent was found. $MFT is "
- "corrupt. Run chkdsk.");
- goto put_err_out;
- }
- /* $MFT must be uncompressed and unencrypted. */
- if (a->flags & ATTR_COMPRESSION_MASK ||
- a->flags & ATTR_IS_ENCRYPTED ||
- a->flags & ATTR_IS_SPARSE) {
- ntfs_error(sb, "$MFT must be uncompressed, "
- "non-sparse, and unencrypted but a "
- "compressed/sparse/encrypted extent "
- "was found. $MFT is corrupt. Run "
- "chkdsk.");
- goto put_err_out;
- }
- /*
- * Decompress the mapping pairs array of this extent and merge
- * the result into the existing runlist. No need for locking
- * as we have exclusive access to the inode at this time and we
- * are a mount in progress task, too.
- */
- nrl = ntfs_mapping_pairs_decompress(vol, a, ni->runlist.rl);
- if (IS_ERR(nrl)) {
- ntfs_error(sb, "ntfs_mapping_pairs_decompress() "
- "failed with error code %ld. $MFT is "
- "corrupt.", PTR_ERR(nrl));
- goto put_err_out;
- }
- ni->runlist.rl = nrl;
-
- /* Are we in the first extent? */
- if (!next_vcn) {
- if (a->data.non_resident.lowest_vcn) {
- ntfs_error(sb, "First extent of $DATA "
- "attribute has non zero "
- "lowest_vcn. $MFT is corrupt. "
- "You should run chkdsk.");
- goto put_err_out;
- }
- /* Get the last vcn in the $DATA attribute. */
- last_vcn = sle64_to_cpu(
- a->data.non_resident.allocated_size)
- >> vol->cluster_size_bits;
- /* Fill in the inode size. */
- vi->i_size = sle64_to_cpu(
- a->data.non_resident.data_size);
- ni->initialized_size = sle64_to_cpu(
- a->data.non_resident.initialized_size);
- ni->allocated_size = sle64_to_cpu(
- a->data.non_resident.allocated_size);
- /*
- * Verify the number of mft records does not exceed
- * 2^32 - 1.
- */
- if ((vi->i_size >> vol->mft_record_size_bits) >=
- (1ULL << 32)) {
- ntfs_error(sb, "$MFT is too big! Aborting.");
- goto put_err_out;
- }
- /*
- * We have got the first extent of the runlist for
- * $MFT which means it is now relatively safe to call
- * the normal ntfs_read_inode() function.
- * Complete reading the inode, this will actually
- * re-read the mft record for $MFT, this time entering
- * it into the page cache with which we complete the
- * kick start of the volume. It should be safe to do
- * this now as the first extent of $MFT/$DATA is
- * already known and we would hope that we don't need
- * further extents in order to find the other
- * attributes belonging to $MFT. Only time will tell if
- * this is really the case. If not we will have to play
- * magic at this point, possibly duplicating a lot of
- * ntfs_read_inode() at this point. We will need to
- * ensure we do enough of its work to be able to call
- * ntfs_read_inode() on extents of $MFT/$DATA. But lets
- * hope this never happens...
- */
- ntfs_read_locked_inode(vi);
- if (is_bad_inode(vi)) {
- ntfs_error(sb, "ntfs_read_inode() of $MFT "
- "failed. BUG or corrupt $MFT. "
- "Run chkdsk and if no errors "
- "are found, please report you "
- "saw this message to "
- "linux-ntfs-dev@lists."
- "sourceforge.net");
- ntfs_attr_put_search_ctx(ctx);
- /* Revert to the safe super operations. */
- ntfs_free(m);
- return -1;
- }
- /*
- * Re-initialize some specifics about $MFT's inode as
- * ntfs_read_inode() will have set up the default ones.
- */
- /* Set uid and gid to root. */
- vi->i_uid = GLOBAL_ROOT_UID;
- vi->i_gid = GLOBAL_ROOT_GID;
- /* Regular file. No access for anyone. */
- vi->i_mode = S_IFREG;
- /* No VFS initiated operations allowed for $MFT. */
- vi->i_op = &ntfs_empty_inode_ops;
- vi->i_fop = &ntfs_empty_file_ops;
- }
-
- /* Get the lowest vcn for the next extent. */
- highest_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn);
- next_vcn = highest_vcn + 1;
-
- /* Only one extent or error, which we catch below. */
- if (next_vcn <= 0)
- break;
-
- /* Avoid endless loops due to corruption. */
- if (next_vcn < sle64_to_cpu(
- a->data.non_resident.lowest_vcn)) {
- ntfs_error(sb, "$MFT has corrupt attribute list "
- "attribute. Run chkdsk.");
- goto put_err_out;
- }
- }
- if (err != -ENOENT) {
- ntfs_error(sb, "Failed to lookup $MFT/$DATA attribute extent. "
- "$MFT is corrupt. Run chkdsk.");
- goto put_err_out;
- }
- if (!a) {
- ntfs_error(sb, "$MFT/$DATA attribute not found. $MFT is "
- "corrupt. Run chkdsk.");
- goto put_err_out;
- }
- if (highest_vcn && highest_vcn != last_vcn - 1) {
- ntfs_error(sb, "Failed to load the complete runlist for "
- "$MFT/$DATA. Driver bug or corrupt $MFT. "
- "Run chkdsk.");
- ntfs_debug("highest_vcn = 0x%llx, last_vcn - 1 = 0x%llx",
- (unsigned long long)highest_vcn,
- (unsigned long long)last_vcn - 1);
- goto put_err_out;
- }
- ntfs_attr_put_search_ctx(ctx);
- ntfs_debug("Done.");
- ntfs_free(m);
-
- /*
- * Split the locking rules of the MFT inode from the
- * locking rules of other inodes:
- */
- lockdep_set_class(&ni->runlist.lock, &mft_ni_runlist_lock_key);
- lockdep_set_class(&ni->mrec_lock, &mft_ni_mrec_lock_key);
-
- return 0;
-
-em_put_err_out:
- ntfs_error(sb, "Couldn't find first extent of $DATA attribute in "
- "attribute list. $MFT is corrupt. Run chkdsk.");
-put_err_out:
- ntfs_attr_put_search_ctx(ctx);
-err_out:
- ntfs_error(sb, "Failed. Marking inode as bad.");
- make_bad_inode(vi);
- ntfs_free(m);
- return -1;
-}
-
-static void __ntfs_clear_inode(ntfs_inode *ni)
-{
- /* Free all alocated memory. */
- down_write(&ni->runlist.lock);
- if (ni->runlist.rl) {
- ntfs_free(ni->runlist.rl);
- ni->runlist.rl = NULL;
- }
- up_write(&ni->runlist.lock);
-
- if (ni->attr_list) {
- ntfs_free(ni->attr_list);
- ni->attr_list = NULL;
- }
-
- down_write(&ni->attr_list_rl.lock);
- if (ni->attr_list_rl.rl) {
- ntfs_free(ni->attr_list_rl.rl);
- ni->attr_list_rl.rl = NULL;
- }
- up_write(&ni->attr_list_rl.lock);
-
- if (ni->name_len && ni->name != I30) {
- /* Catch bugs... */
- BUG_ON(!ni->name);
- kfree(ni->name);
- }
-}
-
-void ntfs_clear_extent_inode(ntfs_inode *ni)
-{
- ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
-
- BUG_ON(NInoAttr(ni));
- BUG_ON(ni->nr_extents != -1);
-
-#ifdef NTFS_RW
- if (NInoDirty(ni)) {
- if (!is_bad_inode(VFS_I(ni->ext.base_ntfs_ino)))
- ntfs_error(ni->vol->sb, "Clearing dirty extent inode! "
- "Losing data! This is a BUG!!!");
- // FIXME: Do something!!!
- }
-#endif /* NTFS_RW */
-
- __ntfs_clear_inode(ni);
-
- /* Bye, bye... */
- ntfs_destroy_extent_inode(ni);
-}
-
-/**
- * ntfs_evict_big_inode - clean up the ntfs specific part of an inode
- * @vi: vfs inode pending annihilation
- *
- * When the VFS is going to remove an inode from memory, ntfs_clear_big_inode()
- * is called, which deallocates all memory belonging to the NTFS specific part
- * of the inode and returns.
- *
- * If the MFT record is dirty, we commit it before doing anything else.
- */
-void ntfs_evict_big_inode(struct inode *vi)
-{
- ntfs_inode *ni = NTFS_I(vi);
-
- truncate_inode_pages_final(&vi->i_data);
- clear_inode(vi);
-
-#ifdef NTFS_RW
- if (NInoDirty(ni)) {
- bool was_bad = (is_bad_inode(vi));
-
- /* Committing the inode also commits all extent inodes. */
- ntfs_commit_inode(vi);
-
- if (!was_bad && (is_bad_inode(vi) || NInoDirty(ni))) {
- ntfs_error(vi->i_sb, "Failed to commit dirty inode "
- "0x%lx. Losing data!", vi->i_ino);
- // FIXME: Do something!!!
- }
- }
-#endif /* NTFS_RW */
-
- /* No need to lock at this stage as no one else has a reference. */
- if (ni->nr_extents > 0) {
- int i;
-
- for (i = 0; i < ni->nr_extents; i++)
- ntfs_clear_extent_inode(ni->ext.extent_ntfs_inos[i]);
- kfree(ni->ext.extent_ntfs_inos);
- }
-
- __ntfs_clear_inode(ni);
-
- if (NInoAttr(ni)) {
- /* Release the base inode if we are holding it. */
- if (ni->nr_extents == -1) {
- iput(VFS_I(ni->ext.base_ntfs_ino));
- ni->nr_extents = 0;
- ni->ext.base_ntfs_ino = NULL;
- }
- }
- BUG_ON(ni->page);
- if (!atomic_dec_and_test(&ni->count))
- BUG();
- return;
-}
-
-/**
- * ntfs_show_options - show mount options in /proc/mounts
- * @sf: seq_file in which to write our mount options
- * @root: root of the mounted tree whose mount options to display
- *
- * Called by the VFS once for each mounted ntfs volume when someone reads
- * /proc/mounts in order to display the NTFS specific mount options of each
- * mount. The mount options of fs specified by @root are written to the seq file
- * @sf and success is returned.
- */
-int ntfs_show_options(struct seq_file *sf, struct dentry *root)
-{
- ntfs_volume *vol = NTFS_SB(root->d_sb);
- int i;
-
- seq_printf(sf, ",uid=%i", from_kuid_munged(&init_user_ns, vol->uid));
- seq_printf(sf, ",gid=%i", from_kgid_munged(&init_user_ns, vol->gid));
- if (vol->fmask == vol->dmask)
- seq_printf(sf, ",umask=0%o", vol->fmask);
- else {
- seq_printf(sf, ",fmask=0%o", vol->fmask);
- seq_printf(sf, ",dmask=0%o", vol->dmask);
- }
- seq_printf(sf, ",nls=%s", vol->nls_map->charset);
- if (NVolCaseSensitive(vol))
- seq_printf(sf, ",case_sensitive");
- if (NVolShowSystemFiles(vol))
- seq_printf(sf, ",show_sys_files");
- if (!NVolSparseEnabled(vol))
- seq_printf(sf, ",disable_sparse");
- for (i = 0; on_errors_arr[i].val; i++) {
- if (on_errors_arr[i].val & vol->on_errors)
- seq_printf(sf, ",errors=%s", on_errors_arr[i].str);
- }
- seq_printf(sf, ",mft_zone_multiplier=%i", vol->mft_zone_multiplier);
- return 0;
-}
-
-#ifdef NTFS_RW
-
-static const char *es = " Leaving inconsistent metadata. Unmount and run "
- "chkdsk.";
-
-/**
- * ntfs_truncate - called when the i_size of an ntfs inode is changed
- * @vi: inode for which the i_size was changed
- *
- * We only support i_size changes for normal files at present, i.e. not
- * compressed and not encrypted. This is enforced in ntfs_setattr(), see
- * below.
- *
- * The kernel guarantees that @vi is a regular file (S_ISREG() is true) and
- * that the change is allowed.
- *
- * This implies for us that @vi is a file inode rather than a directory, index,
- * or attribute inode as well as that @vi is a base inode.
- *
- * Returns 0 on success or -errno on error.
- *
- * Called with ->i_mutex held.
- */
-int ntfs_truncate(struct inode *vi)
-{
- s64 new_size, old_size, nr_freed, new_alloc_size, old_alloc_size;
- VCN highest_vcn;
- unsigned long flags;
- ntfs_inode *base_ni, *ni = NTFS_I(vi);
- ntfs_volume *vol = ni->vol;
- ntfs_attr_search_ctx *ctx;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- const char *te = " Leaving file length out of sync with i_size.";
- int err, mp_size, size_change, alloc_change;
-
- ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
- BUG_ON(NInoAttr(ni));
- BUG_ON(S_ISDIR(vi->i_mode));
- BUG_ON(NInoMstProtected(ni));
- BUG_ON(ni->nr_extents < 0);
-retry_truncate:
- /*
- * Lock the runlist for writing and map the mft record to ensure it is
- * safe to mess with the attribute runlist and sizes.
- */
- down_write(&ni->runlist.lock);
- if (!NInoAttr(ni))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- m = map_mft_record(base_ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- ntfs_error(vi->i_sb, "Failed to map mft record for inode 0x%lx "
- "(error code %d).%s", vi->i_ino, err, te);
- ctx = NULL;
- m = NULL;
- goto old_bad_out;
- }
- ctx = ntfs_attr_get_search_ctx(base_ni, m);
- if (unlikely(!ctx)) {
- ntfs_error(vi->i_sb, "Failed to allocate a search context for "
- "inode 0x%lx (not enough memory).%s",
- vi->i_ino, te);
- err = -ENOMEM;
- goto old_bad_out;
- }
- err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- if (err == -ENOENT) {
- ntfs_error(vi->i_sb, "Open attribute is missing from "
- "mft record. Inode 0x%lx is corrupt. "
- "Run chkdsk.%s", vi->i_ino, te);
- err = -EIO;
- } else
- ntfs_error(vi->i_sb, "Failed to lookup attribute in "
- "inode 0x%lx (error code %d).%s",
- vi->i_ino, err, te);
- goto old_bad_out;
- }
- m = ctx->mrec;
- a = ctx->attr;
- /*
- * The i_size of the vfs inode is the new size for the attribute value.
- */
- new_size = i_size_read(vi);
- /* The current size of the attribute value is the old size. */
- old_size = ntfs_attr_size(a);
- /* Calculate the new allocated size. */
- if (NInoNonResident(ni))
- new_alloc_size = (new_size + vol->cluster_size - 1) &
- ~(s64)vol->cluster_size_mask;
- else
- new_alloc_size = (new_size + 7) & ~7;
- /* The current allocated size is the old allocated size. */
- read_lock_irqsave(&ni->size_lock, flags);
- old_alloc_size = ni->allocated_size;
- read_unlock_irqrestore(&ni->size_lock, flags);
- /*
- * The change in the file size. This will be 0 if no change, >0 if the
- * size is growing, and <0 if the size is shrinking.
- */
- size_change = -1;
- if (new_size - old_size >= 0) {
- size_change = 1;
- if (new_size == old_size)
- size_change = 0;
- }
- /* As above for the allocated size. */
- alloc_change = -1;
- if (new_alloc_size - old_alloc_size >= 0) {
- alloc_change = 1;
- if (new_alloc_size == old_alloc_size)
- alloc_change = 0;
- }
- /*
- * If neither the size nor the allocation are being changed there is
- * nothing to do.
- */
- if (!size_change && !alloc_change)
- goto unm_done;
- /* If the size is changing, check if new size is allowed in $AttrDef. */
- if (size_change) {
- err = ntfs_attr_size_bounds_check(vol, ni->type, new_size);
- if (unlikely(err)) {
- if (err == -ERANGE) {
- ntfs_error(vol->sb, "Truncate would cause the "
- "inode 0x%lx to %simum size "
- "for its attribute type "
- "(0x%x). Aborting truncate.",
- vi->i_ino,
- new_size > old_size ? "exceed "
- "the max" : "go under the min",
- le32_to_cpu(ni->type));
- err = -EFBIG;
- } else {
- ntfs_error(vol->sb, "Inode 0x%lx has unknown "
- "attribute type 0x%x. "
- "Aborting truncate.",
- vi->i_ino,
- le32_to_cpu(ni->type));
- err = -EIO;
- }
- /* Reset the vfs inode size to the old size. */
- i_size_write(vi, old_size);
- goto err_out;
- }
- }
- if (NInoCompressed(ni) || NInoEncrypted(ni)) {
- ntfs_warning(vi->i_sb, "Changes in inode size are not "
- "supported yet for %s files, ignoring.",
- NInoCompressed(ni) ? "compressed" :
- "encrypted");
- err = -EOPNOTSUPP;
- goto bad_out;
- }
- if (a->non_resident)
- goto do_non_resident_truncate;
- BUG_ON(NInoNonResident(ni));
- /* Resize the attribute record to best fit the new attribute size. */
- if (new_size < vol->mft_record_size &&
- !ntfs_resident_attr_value_resize(m, a, new_size)) {
- /* The resize succeeded! */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- write_lock_irqsave(&ni->size_lock, flags);
- /* Update the sizes in the ntfs inode and all is done. */
- ni->allocated_size = le32_to_cpu(a->length) -
- le16_to_cpu(a->data.resident.value_offset);
- /*
- * Note ntfs_resident_attr_value_resize() has already done any
- * necessary data clearing in the attribute record. When the
- * file is being shrunk vmtruncate() will already have cleared
- * the top part of the last partial page, i.e. since this is
- * the resident case this is the page with index 0. However,
- * when the file is being expanded, the page cache page data
- * between the old data_size, i.e. old_size, and the new_size
- * has not been zeroed. Fortunately, we do not need to zero it
- * either since on one hand it will either already be zero due
- * to both read_folio and writepage clearing partial page data
- * beyond i_size in which case there is nothing to do or in the
- * case of the file being mmap()ped at the same time, POSIX
- * specifies that the behaviour is unspecified thus we do not
- * have to do anything. This means that in our implementation
- * in the rare case that the file is mmap()ped and a write
- * occurred into the mmap()ped region just beyond the file size
- * and writepage has not yet been called to write out the page
- * (which would clear the area beyond the file size) and we now
- * extend the file size to incorporate this dirty region
- * outside the file size, a write of the page would result in
- * this data being written to disk instead of being cleared.
- * Given both POSIX and the Linux mmap(2) man page specify that
- * this corner case is undefined, we choose to leave it like
- * that as this is much simpler for us as we cannot lock the
- * relevant page now since we are holding too many ntfs locks
- * which would result in a lock reversal deadlock.
- */
- ni->initialized_size = new_size;
- write_unlock_irqrestore(&ni->size_lock, flags);
- goto unm_done;
- }
- /* If the above resize failed, this must be an attribute extension. */
- BUG_ON(size_change < 0);
- /*
- * We have to drop all the locks so we can call
- * ntfs_attr_make_non_resident(). This could be optimised by try-
- * locking the first page cache page and only if that fails dropping
- * the locks, locking the page, and redoing all the locking and
- * lookups. While this would be a huge optimisation, it is not worth
- * it as this is definitely a slow code path as it only ever can happen
- * once for any given file.
- */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- up_write(&ni->runlist.lock);
- /*
- * Not enough space in the mft record, try to make the attribute
- * non-resident and if successful restart the truncation process.
- */
- err = ntfs_attr_make_non_resident(ni, old_size);
- if (likely(!err))
- goto retry_truncate;
- /*
- * Could not make non-resident. If this is due to this not being
- * permitted for this attribute type or there not being enough space,
- * try to make other attributes non-resident. Otherwise fail.
- */
- if (unlikely(err != -EPERM && err != -ENOSPC)) {
- ntfs_error(vol->sb, "Cannot truncate inode 0x%lx, attribute "
- "type 0x%x, because the conversion from "
- "resident to non-resident attribute failed "
- "with error code %i.", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), err);
- if (err != -ENOMEM)
- err = -EIO;
- goto conv_err_out;
- }
- /* TODO: Not implemented from here, abort. */
- if (err == -ENOSPC)
- ntfs_error(vol->sb, "Not enough space in the mft record/on "
- "disk for the non-resident attribute value. "
- "This case is not implemented yet.");
- else /* if (err == -EPERM) */
- ntfs_error(vol->sb, "This attribute type may not be "
- "non-resident. This case is not implemented "
- "yet.");
- err = -EOPNOTSUPP;
- goto conv_err_out;
-#if 0
- // TODO: Attempt to make other attributes non-resident.
- if (!err)
- goto do_resident_extend;
- /*
- * Both the attribute list attribute and the standard information
- * attribute must remain in the base inode. Thus, if this is one of
- * these attributes, we have to try to move other attributes out into
- * extent mft records instead.
- */
- if (ni->type == AT_ATTRIBUTE_LIST ||
- ni->type == AT_STANDARD_INFORMATION) {
- // TODO: Attempt to move other attributes into extent mft
- // records.
- err = -EOPNOTSUPP;
- if (!err)
- goto do_resident_extend;
- goto err_out;
- }
- // TODO: Attempt to move this attribute to an extent mft record, but
- // only if it is not already the only attribute in an mft record in
- // which case there would be nothing to gain.
- err = -EOPNOTSUPP;
- if (!err)
- goto do_resident_extend;
- /* There is nothing we can do to make enough space. )-: */
- goto err_out;
-#endif
-do_non_resident_truncate:
- BUG_ON(!NInoNonResident(ni));
- if (alloc_change < 0) {
- highest_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn);
- if (highest_vcn > 0 &&
- old_alloc_size >> vol->cluster_size_bits >
- highest_vcn + 1) {
- /*
- * This attribute has multiple extents. Not yet
- * supported.
- */
- ntfs_error(vol->sb, "Cannot truncate inode 0x%lx, "
- "attribute type 0x%x, because the "
- "attribute is highly fragmented (it "
- "consists of multiple extents) and "
- "this case is not implemented yet.",
- vi->i_ino,
- (unsigned)le32_to_cpu(ni->type));
- err = -EOPNOTSUPP;
- goto bad_out;
- }
- }
- /*
- * If the size is shrinking, need to reduce the initialized_size and
- * the data_size before reducing the allocation.
- */
- if (size_change < 0) {
- /*
- * Make the valid size smaller (i_size is already up-to-date).
- */
- write_lock_irqsave(&ni->size_lock, flags);
- if (new_size < ni->initialized_size) {
- ni->initialized_size = new_size;
- a->data.non_resident.initialized_size =
- cpu_to_sle64(new_size);
- }
- a->data.non_resident.data_size = cpu_to_sle64(new_size);
- write_unlock_irqrestore(&ni->size_lock, flags);
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- /* If the allocated size is not changing, we are done. */
- if (!alloc_change)
- goto unm_done;
- /*
- * If the size is shrinking it makes no sense for the
- * allocation to be growing.
- */
- BUG_ON(alloc_change > 0);
- } else /* if (size_change >= 0) */ {
- /*
- * The file size is growing or staying the same but the
- * allocation can be shrinking, growing or staying the same.
- */
- if (alloc_change > 0) {
- /*
- * We need to extend the allocation and possibly update
- * the data size. If we are updating the data size,
- * since we are not touching the initialized_size we do
- * not need to worry about the actual data on disk.
- * And as far as the page cache is concerned, there
- * will be no pages beyond the old data size and any
- * partial region in the last page between the old and
- * new data size (or the end of the page if the new
- * data size is outside the page) does not need to be
- * modified as explained above for the resident
- * attribute truncate case. To do this, we simply drop
- * the locks we hold and leave all the work to our
- * friendly helper ntfs_attr_extend_allocation().
- */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- up_write(&ni->runlist.lock);
- err = ntfs_attr_extend_allocation(ni, new_size,
- size_change > 0 ? new_size : -1, -1);
- /*
- * ntfs_attr_extend_allocation() will have done error
- * output already.
- */
- goto done;
- }
- if (!alloc_change)
- goto alloc_done;
- }
- /* alloc_change < 0 */
- /* Free the clusters. */
- nr_freed = ntfs_cluster_free(ni, new_alloc_size >>
- vol->cluster_size_bits, -1, ctx);
- m = ctx->mrec;
- a = ctx->attr;
- if (unlikely(nr_freed < 0)) {
- ntfs_error(vol->sb, "Failed to release cluster(s) (error code "
- "%lli). Unmount and run chkdsk to recover "
- "the lost cluster(s).", (long long)nr_freed);
- NVolSetErrors(vol);
- nr_freed = 0;
- }
- /* Truncate the runlist. */
- err = ntfs_rl_truncate_nolock(vol, &ni->runlist,
- new_alloc_size >> vol->cluster_size_bits);
- /*
- * If the runlist truncation failed and/or the search context is no
- * longer valid, we cannot resize the attribute record or build the
- * mapping pairs array thus we mark the inode bad so that no access to
- * the freed clusters can happen.
- */
- if (unlikely(err || IS_ERR(m))) {
- ntfs_error(vol->sb, "Failed to %s (error code %li).%s",
- IS_ERR(m) ?
- "restore attribute search context" :
- "truncate attribute runlist",
- IS_ERR(m) ? PTR_ERR(m) : err, es);
- err = -EIO;
- goto bad_out;
- }
- /* Get the size for the shrunk mapping pairs array for the runlist. */
- mp_size = ntfs_get_size_for_mapping_pairs(vol, ni->runlist.rl, 0, -1);
- if (unlikely(mp_size <= 0)) {
- ntfs_error(vol->sb, "Cannot shrink allocation of inode 0x%lx, "
- "attribute type 0x%x, because determining the "
- "size for the mapping pairs failed with error "
- "code %i.%s", vi->i_ino,
- (unsigned)le32_to_cpu(ni->type), mp_size, es);
- err = -EIO;
- goto bad_out;
- }
- /*
- * Shrink the attribute record for the new mapping pairs array. Note,
- * this cannot fail since we are making the attribute smaller thus by
- * definition there is enough space to do so.
- */
- err = ntfs_attr_record_resize(m, a, mp_size +
- le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
- BUG_ON(err);
- /*
- * Generate the mapping pairs array directly into the attribute record.
- */
- err = ntfs_mapping_pairs_build(vol, (u8*)a +
- le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
- mp_size, ni->runlist.rl, 0, -1, NULL);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Cannot shrink allocation of inode 0x%lx, "
- "attribute type 0x%x, because building the "
- "mapping pairs failed with error code %i.%s",
- vi->i_ino, (unsigned)le32_to_cpu(ni->type),
- err, es);
- err = -EIO;
- goto bad_out;
- }
- /* Update the allocated/compressed size as well as the highest vcn. */
- a->data.non_resident.highest_vcn = cpu_to_sle64((new_alloc_size >>
- vol->cluster_size_bits) - 1);
- write_lock_irqsave(&ni->size_lock, flags);
- ni->allocated_size = new_alloc_size;
- a->data.non_resident.allocated_size = cpu_to_sle64(new_alloc_size);
- if (NInoSparse(ni) || NInoCompressed(ni)) {
- if (nr_freed) {
- ni->itype.compressed.size -= nr_freed <<
- vol->cluster_size_bits;
- BUG_ON(ni->itype.compressed.size < 0);
- a->data.non_resident.compressed_size = cpu_to_sle64(
- ni->itype.compressed.size);
- vi->i_blocks = ni->itype.compressed.size >> 9;
- }
- } else
- vi->i_blocks = new_alloc_size >> 9;
- write_unlock_irqrestore(&ni->size_lock, flags);
- /*
- * We have shrunk the allocation. If this is a shrinking truncate we
- * have already dealt with the initialized_size and the data_size above
- * and we are done. If the truncate is only changing the allocation
- * and not the data_size, we are also done. If this is an extending
- * truncate, need to extend the data_size now which is ensured by the
- * fact that @size_change is positive.
- */
-alloc_done:
- /*
- * If the size is growing, need to update it now. If it is shrinking,
- * we have already updated it above (before the allocation change).
- */
- if (size_change > 0)
- a->data.non_resident.data_size = cpu_to_sle64(new_size);
- /* Ensure the modified mft record is written out. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
-unm_done:
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(base_ni);
- up_write(&ni->runlist.lock);
-done:
- /* Update the mtime and ctime on the base inode. */
- /* normally ->truncate shouldn't update ctime or mtime,
- * but ntfs did before so it got a copy & paste version
- * of file_update_time. one day someone should fix this
- * for real.
- */
- if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) {
- struct timespec64 now = current_time(VFS_I(base_ni));
- struct timespec64 ctime = inode_get_ctime(VFS_I(base_ni));
- struct timespec64 mtime = inode_get_mtime(VFS_I(base_ni));
- int sync_it = 0;
-
- if (!timespec64_equal(&mtime, &now) ||
- !timespec64_equal(&ctime, &now))
- sync_it = 1;
- inode_set_ctime_to_ts(VFS_I(base_ni), now);
- inode_set_mtime_to_ts(VFS_I(base_ni), now);
-
- if (sync_it)
- mark_inode_dirty_sync(VFS_I(base_ni));
- }
-
- if (likely(!err)) {
- NInoClearTruncateFailed(ni);
- ntfs_debug("Done.");
- }
- return err;
-old_bad_out:
- old_size = -1;
-bad_out:
- if (err != -ENOMEM && err != -EOPNOTSUPP)
- NVolSetErrors(vol);
- if (err != -EOPNOTSUPP)
- NInoSetTruncateFailed(ni);
- else if (old_size >= 0)
- i_size_write(vi, old_size);
-err_out:
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(base_ni);
- up_write(&ni->runlist.lock);
-out:
- ntfs_debug("Failed. Returning error code %i.", err);
- return err;
-conv_err_out:
- if (err != -ENOMEM && err != -EOPNOTSUPP)
- NVolSetErrors(vol);
- if (err != -EOPNOTSUPP)
- NInoSetTruncateFailed(ni);
- else
- i_size_write(vi, old_size);
- goto out;
-}
-
-/**
- * ntfs_truncate_vfs - wrapper for ntfs_truncate() that has no return value
- * @vi: inode for which the i_size was changed
- *
- * Wrapper for ntfs_truncate() that has no return value.
- *
- * See ntfs_truncate() description above for details.
- */
-#ifdef NTFS_RW
-void ntfs_truncate_vfs(struct inode *vi) {
- ntfs_truncate(vi);
-}
-#endif
-
-/**
- * ntfs_setattr - called from notify_change() when an attribute is being changed
- * @idmap: idmap of the mount the inode was found from
- * @dentry: dentry whose attributes to change
- * @attr: structure describing the attributes and the changes
- *
- * We have to trap VFS attempts to truncate the file described by @dentry as
- * soon as possible, because we do not implement changes in i_size yet. So we
- * abort all i_size changes here.
- *
- * We also abort all changes of user, group, and mode as we do not implement
- * the NTFS ACLs yet.
- *
- * Called with ->i_mutex held.
- */
-int ntfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
- struct iattr *attr)
-{
- struct inode *vi = d_inode(dentry);
- int err;
- unsigned int ia_valid = attr->ia_valid;
-
- err = setattr_prepare(&nop_mnt_idmap, dentry, attr);
- if (err)
- goto out;
- /* We do not support NTFS ACLs yet. */
- if (ia_valid & (ATTR_UID | ATTR_GID | ATTR_MODE)) {
- ntfs_warning(vi->i_sb, "Changes in user/group/mode are not "
- "supported yet, ignoring.");
- err = -EOPNOTSUPP;
- goto out;
- }
- if (ia_valid & ATTR_SIZE) {
- if (attr->ia_size != i_size_read(vi)) {
- ntfs_inode *ni = NTFS_I(vi);
- /*
- * FIXME: For now we do not support resizing of
- * compressed or encrypted files yet.
- */
- if (NInoCompressed(ni) || NInoEncrypted(ni)) {
- ntfs_warning(vi->i_sb, "Changes in inode size "
- "are not supported yet for "
- "%s files, ignoring.",
- NInoCompressed(ni) ?
- "compressed" : "encrypted");
- err = -EOPNOTSUPP;
- } else {
- truncate_setsize(vi, attr->ia_size);
- ntfs_truncate_vfs(vi);
- }
- if (err || ia_valid == ATTR_SIZE)
- goto out;
- } else {
- /*
- * We skipped the truncate but must still update
- * timestamps.
- */
- ia_valid |= ATTR_MTIME | ATTR_CTIME;
- }
- }
- if (ia_valid & ATTR_ATIME)
- inode_set_atime_to_ts(vi, attr->ia_atime);
- if (ia_valid & ATTR_MTIME)
- inode_set_mtime_to_ts(vi, attr->ia_mtime);
- if (ia_valid & ATTR_CTIME)
- inode_set_ctime_to_ts(vi, attr->ia_ctime);
- mark_inode_dirty(vi);
-out:
- return err;
-}
-
-/**
- * __ntfs_write_inode - write out a dirty inode
- * @vi: inode to write out
- * @sync: if true, write out synchronously
- *
- * Write out a dirty inode to disk including any extent inodes if present.
- *
- * If @sync is true, commit the inode to disk and wait for io completion. This
- * is done using write_mft_record().
- *
- * If @sync is false, just schedule the write to happen but do not wait for i/o
- * completion. In 2.6 kernels, scheduling usually happens just by virtue of
- * marking the page (and in this case mft record) dirty but we do not implement
- * this yet as write_mft_record() largely ignores the @sync parameter and
- * always performs synchronous writes.
- *
- * Return 0 on success and -errno on error.
- */
-int __ntfs_write_inode(struct inode *vi, int sync)
-{
- sle64 nt;
- ntfs_inode *ni = NTFS_I(vi);
- ntfs_attr_search_ctx *ctx;
- MFT_RECORD *m;
- STANDARD_INFORMATION *si;
- int err = 0;
- bool modified = false;
-
- ntfs_debug("Entering for %sinode 0x%lx.", NInoAttr(ni) ? "attr " : "",
- vi->i_ino);
- /*
- * Dirty attribute inodes are written via their real inodes so just
- * clean them here. Access time updates are taken care off when the
- * real inode is written.
- */
- if (NInoAttr(ni)) {
- NInoClearDirty(ni);
- ntfs_debug("Done.");
- return 0;
- }
- /* Map, pin, and lock the mft record belonging to the inode. */
- m = map_mft_record(ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- goto err_out;
- }
- /* Update the access times in the standard information attribute. */
- ctx = ntfs_attr_get_search_ctx(ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto unm_err_out;
- }
- err = ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- ntfs_attr_put_search_ctx(ctx);
- goto unm_err_out;
- }
- si = (STANDARD_INFORMATION*)((u8*)ctx->attr +
- le16_to_cpu(ctx->attr->data.resident.value_offset));
- /* Update the access times if they have changed. */
- nt = utc2ntfs(inode_get_mtime(vi));
- if (si->last_data_change_time != nt) {
- ntfs_debug("Updating mtime for inode 0x%lx: old = 0x%llx, "
- "new = 0x%llx", vi->i_ino, (long long)
- sle64_to_cpu(si->last_data_change_time),
- (long long)sle64_to_cpu(nt));
- si->last_data_change_time = nt;
- modified = true;
- }
- nt = utc2ntfs(inode_get_ctime(vi));
- if (si->last_mft_change_time != nt) {
- ntfs_debug("Updating ctime for inode 0x%lx: old = 0x%llx, "
- "new = 0x%llx", vi->i_ino, (long long)
- sle64_to_cpu(si->last_mft_change_time),
- (long long)sle64_to_cpu(nt));
- si->last_mft_change_time = nt;
- modified = true;
- }
- nt = utc2ntfs(inode_get_atime(vi));
- if (si->last_access_time != nt) {
- ntfs_debug("Updating atime for inode 0x%lx: old = 0x%llx, "
- "new = 0x%llx", vi->i_ino,
- (long long)sle64_to_cpu(si->last_access_time),
- (long long)sle64_to_cpu(nt));
- si->last_access_time = nt;
- modified = true;
- }
- /*
- * If we just modified the standard information attribute we need to
- * mark the mft record it is in dirty. We do this manually so that
- * mark_inode_dirty() is not called which would redirty the inode and
- * hence result in an infinite loop of trying to write the inode.
- * There is no need to mark the base inode nor the base mft record
- * dirty, since we are going to write this mft record below in any case
- * and the base mft record may actually not have been modified so it
- * might not need to be written out.
- * NOTE: It is not a problem when the inode for $MFT itself is being
- * written out as mark_ntfs_record_dirty() will only set I_DIRTY_PAGES
- * on the $MFT inode and hence __ntfs_write_inode() will not be
- * re-invoked because of it which in turn is ok since the dirtied mft
- * record will be cleaned and written out to disk below, i.e. before
- * this function returns.
- */
- if (modified) {
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- if (!NInoTestSetDirty(ctx->ntfs_ino))
- mark_ntfs_record_dirty(ctx->ntfs_ino->page,
- ctx->ntfs_ino->page_ofs);
- }
- ntfs_attr_put_search_ctx(ctx);
- /* Now the access times are updated, write the base mft record. */
- if (NInoDirty(ni))
- err = write_mft_record(ni, m, sync);
- /* Write all attached extent mft records. */
- mutex_lock(&ni->extent_lock);
- if (ni->nr_extents > 0) {
- ntfs_inode **extent_nis = ni->ext.extent_ntfs_inos;
- int i;
-
- ntfs_debug("Writing %i extent inodes.", ni->nr_extents);
- for (i = 0; i < ni->nr_extents; i++) {
- ntfs_inode *tni = extent_nis[i];
-
- if (NInoDirty(tni)) {
- MFT_RECORD *tm = map_mft_record(tni);
- int ret;
-
- if (IS_ERR(tm)) {
- if (!err || err == -ENOMEM)
- err = PTR_ERR(tm);
- continue;
- }
- ret = write_mft_record(tni, tm, sync);
- unmap_mft_record(tni);
- if (unlikely(ret)) {
- if (!err || err == -ENOMEM)
- err = ret;
- }
- }
- }
- }
- mutex_unlock(&ni->extent_lock);
- unmap_mft_record(ni);
- if (unlikely(err))
- goto err_out;
- ntfs_debug("Done.");
- return 0;
-unm_err_out:
- unmap_mft_record(ni);
-err_out:
- if (err == -ENOMEM) {
- ntfs_warning(vi->i_sb, "Not enough memory to write inode. "
- "Marking the inode dirty again, so the VFS "
- "retries later.");
- mark_inode_dirty(vi);
- } else {
- ntfs_error(vi->i_sb, "Failed (error %i): Run chkdsk.", -err);
- NVolSetErrors(ni->vol);
- }
- return err;
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/inode.h b/fs/ntfs/inode.h
deleted file mode 100644
index 147ef4d..0000000
--- a/fs/ntfs/inode.h
+++ /dev/null
@@ -1,310 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * inode.h - Defines for inode structures NTFS Linux kernel driver. Part of
- * the Linux-NTFS project.
- *
- * Copyright (c) 2001-2007 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_INODE_H
-#define _LINUX_NTFS_INODE_H
-
-#include <linux/atomic.h>
-
-#include <linux/fs.h>
-#include <linux/list.h>
-#include <linux/mm.h>
-#include <linux/mutex.h>
-#include <linux/seq_file.h>
-
-#include "layout.h"
-#include "volume.h"
-#include "types.h"
-#include "runlist.h"
-#include "debug.h"
-
-typedef struct _ntfs_inode ntfs_inode;
-
-/*
- * The NTFS in-memory inode structure. It is just used as an extension to the
- * fields already provided in the VFS inode.
- */
-struct _ntfs_inode {
- rwlock_t size_lock; /* Lock serializing access to inode sizes. */
- s64 initialized_size; /* Copy from the attribute record. */
- s64 allocated_size; /* Copy from the attribute record. */
- unsigned long state; /* NTFS specific flags describing this inode.
- See ntfs_inode_state_bits below. */
- unsigned long mft_no; /* Number of the mft record / inode. */
- u16 seq_no; /* Sequence number of the mft record. */
- atomic_t count; /* Inode reference count for book keeping. */
- ntfs_volume *vol; /* Pointer to the ntfs volume of this inode. */
- /*
- * If NInoAttr() is true, the below fields describe the attribute which
- * this fake inode belongs to. The actual inode of this attribute is
- * pointed to by base_ntfs_ino and nr_extents is always set to -1 (see
- * below). For real inodes, we also set the type (AT_DATA for files and
- * AT_INDEX_ALLOCATION for directories), with the name = NULL and
- * name_len = 0 for files and name = I30 (global constant) and
- * name_len = 4 for directories.
- */
- ATTR_TYPE type; /* Attribute type of this fake inode. */
- ntfschar *name; /* Attribute name of this fake inode. */
- u32 name_len; /* Attribute name length of this fake inode. */
- runlist runlist; /* If state has the NI_NonResident bit set,
- the runlist of the unnamed data attribute
- (if a file) or of the index allocation
- attribute (directory) or of the attribute
- described by the fake inode (if NInoAttr()).
- If runlist.rl is NULL, the runlist has not
- been read in yet or has been unmapped. If
- NI_NonResident is clear, the attribute is
- resident (file and fake inode) or there is
- no $I30 index allocation attribute
- (small directory). In the latter case
- runlist.rl is always NULL.*/
- /*
- * The following fields are only valid for real inodes and extent
- * inodes.
- */
- struct mutex mrec_lock; /* Lock for serializing access to the
- mft record belonging to this inode. */
- struct page *page; /* The page containing the mft record of the
- inode. This should only be touched by the
- (un)map_mft_record*() functions. */
- int page_ofs; /* Offset into the page at which the mft record
- begins. This should only be touched by the
- (un)map_mft_record*() functions. */
- /*
- * Attribute list support (only for use by the attribute lookup
- * functions). Setup during read_inode for all inodes with attribute
- * lists. Only valid if NI_AttrList is set in state, and attr_list_rl is
- * further only valid if NI_AttrListNonResident is set.
- */
- u32 attr_list_size; /* Length of attribute list value in bytes. */
- u8 *attr_list; /* Attribute list value itself. */
- runlist attr_list_rl; /* Run list for the attribute list value. */
- union {
- struct { /* It is a directory, $MFT, or an index inode. */
- u32 block_size; /* Size of an index block. */
- u32 vcn_size; /* Size of a vcn in this
- index. */
- COLLATION_RULE collation_rule; /* The collation rule
- for the index. */
- u8 block_size_bits; /* Log2 of the above. */
- u8 vcn_size_bits; /* Log2 of the above. */
- } index;
- struct { /* It is a compressed/sparse file/attribute inode. */
- s64 size; /* Copy of compressed_size from
- $DATA. */
- u32 block_size; /* Size of a compression block
- (cb). */
- u8 block_size_bits; /* Log2 of the size of a cb. */
- u8 block_clusters; /* Number of clusters per cb. */
- } compressed;
- } itype;
- struct mutex extent_lock; /* Lock for accessing/modifying the
- below . */
- s32 nr_extents; /* For a base mft record, the number of attached extent
- inodes (0 if none), for extent records and for fake
- inodes describing an attribute this is -1. */
- union { /* This union is only used if nr_extents != 0. */
- ntfs_inode **extent_ntfs_inos; /* For nr_extents > 0, array of
- the ntfs inodes of the extent
- mft records belonging to
- this base inode which have
- been loaded. */
- ntfs_inode *base_ntfs_ino; /* For nr_extents == -1, the
- ntfs inode of the base mft
- record. For fake inodes, the
- real (base) inode to which
- the attribute belongs. */
- } ext;
-};
-
-/*
- * Defined bits for the state field in the ntfs_inode structure.
- * (f) = files only, (d) = directories only, (a) = attributes/fake inodes only
- */
-typedef enum {
- NI_Dirty, /* 1: Mft record needs to be written to disk. */
- NI_AttrList, /* 1: Mft record contains an attribute list. */
- NI_AttrListNonResident, /* 1: Attribute list is non-resident. Implies
- NI_AttrList is set. */
-
- NI_Attr, /* 1: Fake inode for attribute i/o.
- 0: Real inode or extent inode. */
-
- NI_MstProtected, /* 1: Attribute is protected by MST fixups.
- 0: Attribute is not protected by fixups. */
- NI_NonResident, /* 1: Unnamed data attr is non-resident (f).
- 1: Attribute is non-resident (a). */
- NI_IndexAllocPresent = NI_NonResident, /* 1: $I30 index alloc attr is
- present (d). */
- NI_Compressed, /* 1: Unnamed data attr is compressed (f).
- 1: Create compressed files by default (d).
- 1: Attribute is compressed (a). */
- NI_Encrypted, /* 1: Unnamed data attr is encrypted (f).
- 1: Create encrypted files by default (d).
- 1: Attribute is encrypted (a). */
- NI_Sparse, /* 1: Unnamed data attr is sparse (f).
- 1: Create sparse files by default (d).
- 1: Attribute is sparse (a). */
- NI_SparseDisabled, /* 1: May not create sparse regions. */
- NI_TruncateFailed, /* 1: Last ntfs_truncate() call failed. */
-} ntfs_inode_state_bits;
-
-/*
- * NOTE: We should be adding dirty mft records to a list somewhere and they
- * should be independent of the (ntfs/vfs) inode structure so that an inode can
- * be removed but the record can be left dirty for syncing later.
- */
-
-/*
- * Macro tricks to expand the NInoFoo(), NInoSetFoo(), and NInoClearFoo()
- * functions.
- */
-#define NINO_FNS(flag) \
-static inline int NIno##flag(ntfs_inode *ni) \
-{ \
- return test_bit(NI_##flag, &(ni)->state); \
-} \
-static inline void NInoSet##flag(ntfs_inode *ni) \
-{ \
- set_bit(NI_##flag, &(ni)->state); \
-} \
-static inline void NInoClear##flag(ntfs_inode *ni) \
-{ \
- clear_bit(NI_##flag, &(ni)->state); \
-}
-
-/*
- * As above for NInoTestSetFoo() and NInoTestClearFoo().
- */
-#define TAS_NINO_FNS(flag) \
-static inline int NInoTestSet##flag(ntfs_inode *ni) \
-{ \
- return test_and_set_bit(NI_##flag, &(ni)->state); \
-} \
-static inline int NInoTestClear##flag(ntfs_inode *ni) \
-{ \
- return test_and_clear_bit(NI_##flag, &(ni)->state); \
-}
-
-/* Emit the ntfs inode bitops functions. */
-NINO_FNS(Dirty)
-TAS_NINO_FNS(Dirty)
-NINO_FNS(AttrList)
-NINO_FNS(AttrListNonResident)
-NINO_FNS(Attr)
-NINO_FNS(MstProtected)
-NINO_FNS(NonResident)
-NINO_FNS(IndexAllocPresent)
-NINO_FNS(Compressed)
-NINO_FNS(Encrypted)
-NINO_FNS(Sparse)
-NINO_FNS(SparseDisabled)
-NINO_FNS(TruncateFailed)
-
-/*
- * The full structure containing a ntfs_inode and a vfs struct inode. Used for
- * all real and fake inodes but not for extent inodes which lack the vfs struct
- * inode.
- */
-typedef struct {
- ntfs_inode ntfs_inode;
- struct inode vfs_inode; /* The vfs inode structure. */
-} big_ntfs_inode;
-
-/**
- * NTFS_I - return the ntfs inode given a vfs inode
- * @inode: VFS inode
- *
- * NTFS_I() returns the ntfs inode associated with the VFS @inode.
- */
-static inline ntfs_inode *NTFS_I(struct inode *inode)
-{
- return (ntfs_inode *)container_of(inode, big_ntfs_inode, vfs_inode);
-}
-
-static inline struct inode *VFS_I(ntfs_inode *ni)
-{
- return &((big_ntfs_inode *)ni)->vfs_inode;
-}
-
-/**
- * ntfs_attr - ntfs in memory attribute structure
- * @mft_no: mft record number of the base mft record of this attribute
- * @name: Unicode name of the attribute (NULL if unnamed)
- * @name_len: length of @name in Unicode characters (0 if unnamed)
- * @type: attribute type (see layout.h)
- *
- * This structure exists only to provide a small structure for the
- * ntfs_{attr_}iget()/ntfs_test_inode()/ntfs_init_locked_inode() mechanism.
- *
- * NOTE: Elements are ordered by size to make the structure as compact as
- * possible on all architectures.
- */
-typedef struct {
- unsigned long mft_no;
- ntfschar *name;
- u32 name_len;
- ATTR_TYPE type;
-} ntfs_attr;
-
-extern int ntfs_test_inode(struct inode *vi, void *data);
-
-extern struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no);
-extern struct inode *ntfs_attr_iget(struct inode *base_vi, ATTR_TYPE type,
- ntfschar *name, u32 name_len);
-extern struct inode *ntfs_index_iget(struct inode *base_vi, ntfschar *name,
- u32 name_len);
-
-extern struct inode *ntfs_alloc_big_inode(struct super_block *sb);
-extern void ntfs_free_big_inode(struct inode *inode);
-extern void ntfs_evict_big_inode(struct inode *vi);
-
-extern void __ntfs_init_inode(struct super_block *sb, ntfs_inode *ni);
-
-static inline void ntfs_init_big_inode(struct inode *vi)
-{
- ntfs_inode *ni = NTFS_I(vi);
-
- ntfs_debug("Entering.");
- __ntfs_init_inode(vi->i_sb, ni);
- ni->mft_no = vi->i_ino;
-}
-
-extern ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
- unsigned long mft_no);
-extern void ntfs_clear_extent_inode(ntfs_inode *ni);
-
-extern int ntfs_read_inode_mount(struct inode *vi);
-
-extern int ntfs_show_options(struct seq_file *sf, struct dentry *root);
-
-#ifdef NTFS_RW
-
-extern int ntfs_truncate(struct inode *vi);
-extern void ntfs_truncate_vfs(struct inode *vi);
-
-extern int ntfs_setattr(struct mnt_idmap *idmap,
- struct dentry *dentry, struct iattr *attr);
-
-extern int __ntfs_write_inode(struct inode *vi, int sync);
-
-static inline void ntfs_commit_inode(struct inode *vi)
-{
- if (!is_bad_inode(vi))
- __ntfs_write_inode(vi, 1);
- return;
-}
-
-#else
-
-static inline void ntfs_truncate_vfs(struct inode *vi) {}
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_INODE_H */
diff --git a/fs/ntfs/layout.h b/fs/ntfs/layout.h
deleted file mode 100644
index 5d4bf7a3..0000000
--- a/fs/ntfs/layout.h
+++ /dev/null
@@ -1,2421 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * layout.h - All NTFS associated on-disk structures. Part of the Linux-NTFS
- * project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_LAYOUT_H
-#define _LINUX_NTFS_LAYOUT_H
-
-#include <linux/types.h>
-#include <linux/bitops.h>
-#include <linux/list.h>
-#include <asm/byteorder.h>
-
-#include "types.h"
-
-/* The NTFS oem_id "NTFS " */
-#define magicNTFS cpu_to_le64(0x202020205346544eULL)
-
-/*
- * Location of bootsector on partition:
- * The standard NTFS_BOOT_SECTOR is on sector 0 of the partition.
- * On NT4 and above there is one backup copy of the boot sector to
- * be found on the last sector of the partition (not normally accessible
- * from within Windows as the bootsector contained number of sectors
- * value is one less than the actual value!).
- * On versions of NT 3.51 and earlier, the backup copy was located at
- * number of sectors/2 (integer divide), i.e. in the middle of the volume.
- */
-
-/*
- * BIOS parameter block (bpb) structure.
- */
-typedef struct {
- le16 bytes_per_sector; /* Size of a sector in bytes. */
- u8 sectors_per_cluster; /* Size of a cluster in sectors. */
- le16 reserved_sectors; /* zero */
- u8 fats; /* zero */
- le16 root_entries; /* zero */
- le16 sectors; /* zero */
- u8 media_type; /* 0xf8 = hard disk */
- le16 sectors_per_fat; /* zero */
- le16 sectors_per_track; /* irrelevant */
- le16 heads; /* irrelevant */
- le32 hidden_sectors; /* zero */
- le32 large_sectors; /* zero */
-} __attribute__ ((__packed__)) BIOS_PARAMETER_BLOCK;
-
-/*
- * NTFS boot sector structure.
- */
-typedef struct {
- u8 jump[3]; /* Irrelevant (jump to boot up code).*/
- le64 oem_id; /* Magic "NTFS ". */
- BIOS_PARAMETER_BLOCK bpb; /* See BIOS_PARAMETER_BLOCK. */
- u8 unused[4]; /* zero, NTFS diskedit.exe states that
- this is actually:
- __u8 physical_drive; // 0x80
- __u8 current_head; // zero
- __u8 extended_boot_signature;
- // 0x80
- __u8 unused; // zero
- */
-/*0x28*/sle64 number_of_sectors; /* Number of sectors in volume. Gives
- maximum volume size of 2^63 sectors.
- Assuming standard sector size of 512
- bytes, the maximum byte size is
- approx. 4.7x10^21 bytes. (-; */
- sle64 mft_lcn; /* Cluster location of mft data. */
- sle64 mftmirr_lcn; /* Cluster location of copy of mft. */
- s8 clusters_per_mft_record; /* Mft record size in clusters. */
- u8 reserved0[3]; /* zero */
- s8 clusters_per_index_record; /* Index block size in clusters. */
- u8 reserved1[3]; /* zero */
- le64 volume_serial_number; /* Irrelevant (serial number). */
- le32 checksum; /* Boot sector checksum. */
-/*0x54*/u8 bootstrap[426]; /* Irrelevant (boot up code). */
- le16 end_of_sector_marker; /* End of bootsector magic. Always is
- 0xaa55 in little endian. */
-/* sizeof() = 512 (0x200) bytes */
-} __attribute__ ((__packed__)) NTFS_BOOT_SECTOR;
-
-/*
- * Magic identifiers present at the beginning of all ntfs record containing
- * records (like mft records for example).
- */
-enum {
- /* Found in $MFT/$DATA. */
- magic_FILE = cpu_to_le32(0x454c4946), /* Mft entry. */
- magic_INDX = cpu_to_le32(0x58444e49), /* Index buffer. */
- magic_HOLE = cpu_to_le32(0x454c4f48), /* ? (NTFS 3.0+?) */
-
- /* Found in $LogFile/$DATA. */
- magic_RSTR = cpu_to_le32(0x52545352), /* Restart page. */
- magic_RCRD = cpu_to_le32(0x44524352), /* Log record page. */
-
- /* Found in $LogFile/$DATA. (May be found in $MFT/$DATA, also?) */
- magic_CHKD = cpu_to_le32(0x444b4843), /* Modified by chkdsk. */
-
- /* Found in all ntfs record containing records. */
- magic_BAAD = cpu_to_le32(0x44414142), /* Failed multi sector
- transfer was detected. */
- /*
- * Found in $LogFile/$DATA when a page is full of 0xff bytes and is
- * thus not initialized. Page must be initialized before using it.
- */
- magic_empty = cpu_to_le32(0xffffffff) /* Record is empty. */
-};
-
-typedef le32 NTFS_RECORD_TYPE;
-
-/*
- * Generic magic comparison macros. Finally found a use for the ## preprocessor
- * operator! (-8
- */
-
-static inline bool __ntfs_is_magic(le32 x, NTFS_RECORD_TYPE r)
-{
- return (x == r);
-}
-#define ntfs_is_magic(x, m) __ntfs_is_magic(x, magic_##m)
-
-static inline bool __ntfs_is_magicp(le32 *p, NTFS_RECORD_TYPE r)
-{
- return (*p == r);
-}
-#define ntfs_is_magicp(p, m) __ntfs_is_magicp(p, magic_##m)
-
-/*
- * Specialised magic comparison macros for the NTFS_RECORD_TYPEs defined above.
- */
-#define ntfs_is_file_record(x) ( ntfs_is_magic (x, FILE) )
-#define ntfs_is_file_recordp(p) ( ntfs_is_magicp(p, FILE) )
-#define ntfs_is_mft_record(x) ( ntfs_is_file_record (x) )
-#define ntfs_is_mft_recordp(p) ( ntfs_is_file_recordp(p) )
-#define ntfs_is_indx_record(x) ( ntfs_is_magic (x, INDX) )
-#define ntfs_is_indx_recordp(p) ( ntfs_is_magicp(p, INDX) )
-#define ntfs_is_hole_record(x) ( ntfs_is_magic (x, HOLE) )
-#define ntfs_is_hole_recordp(p) ( ntfs_is_magicp(p, HOLE) )
-
-#define ntfs_is_rstr_record(x) ( ntfs_is_magic (x, RSTR) )
-#define ntfs_is_rstr_recordp(p) ( ntfs_is_magicp(p, RSTR) )
-#define ntfs_is_rcrd_record(x) ( ntfs_is_magic (x, RCRD) )
-#define ntfs_is_rcrd_recordp(p) ( ntfs_is_magicp(p, RCRD) )
-
-#define ntfs_is_chkd_record(x) ( ntfs_is_magic (x, CHKD) )
-#define ntfs_is_chkd_recordp(p) ( ntfs_is_magicp(p, CHKD) )
-
-#define ntfs_is_baad_record(x) ( ntfs_is_magic (x, BAAD) )
-#define ntfs_is_baad_recordp(p) ( ntfs_is_magicp(p, BAAD) )
-
-#define ntfs_is_empty_record(x) ( ntfs_is_magic (x, empty) )
-#define ntfs_is_empty_recordp(p) ( ntfs_is_magicp(p, empty) )
-
-/*
- * The Update Sequence Array (usa) is an array of the le16 values which belong
- * to the end of each sector protected by the update sequence record in which
- * this array is contained. Note that the first entry is the Update Sequence
- * Number (usn), a cyclic counter of how many times the protected record has
- * been written to disk. The values 0 and -1 (ie. 0xffff) are not used. All
- * last le16's of each sector have to be equal to the usn (during reading) or
- * are set to it (during writing). If they are not, an incomplete multi sector
- * transfer has occurred when the data was written.
- * The maximum size for the update sequence array is fixed to:
- * maximum size = usa_ofs + (usa_count * 2) = 510 bytes
- * The 510 bytes comes from the fact that the last le16 in the array has to
- * (obviously) finish before the last le16 of the first 512-byte sector.
- * This formula can be used as a consistency check in that usa_ofs +
- * (usa_count * 2) has to be less than or equal to 510.
- */
-typedef struct {
- NTFS_RECORD_TYPE magic; /* A four-byte magic identifying the record
- type and/or status. */
- le16 usa_ofs; /* Offset to the Update Sequence Array (usa)
- from the start of the ntfs record. */
- le16 usa_count; /* Number of le16 sized entries in the usa
- including the Update Sequence Number (usn),
- thus the number of fixups is the usa_count
- minus 1. */
-} __attribute__ ((__packed__)) NTFS_RECORD;
-
-/*
- * System files mft record numbers. All these files are always marked as used
- * in the bitmap attribute of the mft; presumably in order to avoid accidental
- * allocation for random other mft records. Also, the sequence number for each
- * of the system files is always equal to their mft record number and it is
- * never modified.
- */
-typedef enum {
- FILE_MFT = 0, /* Master file table (mft). Data attribute
- contains the entries and bitmap attribute
- records which ones are in use (bit==1). */
- FILE_MFTMirr = 1, /* Mft mirror: copy of first four mft records
- in data attribute. If cluster size > 4kiB,
- copy of first N mft records, with
- N = cluster_size / mft_record_size. */
- FILE_LogFile = 2, /* Journalling log in data attribute. */
- FILE_Volume = 3, /* Volume name attribute and volume information
- attribute (flags and ntfs version). Windows
- refers to this file as volume DASD (Direct
- Access Storage Device). */
- FILE_AttrDef = 4, /* Array of attribute definitions in data
- attribute. */
- FILE_root = 5, /* Root directory. */
- FILE_Bitmap = 6, /* Allocation bitmap of all clusters (lcns) in
- data attribute. */
- FILE_Boot = 7, /* Boot sector (always at cluster 0) in data
- attribute. */
- FILE_BadClus = 8, /* Contains all bad clusters in the non-resident
- data attribute. */
- FILE_Secure = 9, /* Shared security descriptors in data attribute
- and two indexes into the descriptors.
- Appeared in Windows 2000. Before that, this
- file was named $Quota but was unused. */
- FILE_UpCase = 10, /* Uppercase equivalents of all 65536 Unicode
- characters in data attribute. */
- FILE_Extend = 11, /* Directory containing other system files (eg.
- $ObjId, $Quota, $Reparse and $UsnJrnl). This
- is new to NTFS3.0. */
- FILE_reserved12 = 12, /* Reserved for future use (records 12-15). */
- FILE_reserved13 = 13,
- FILE_reserved14 = 14,
- FILE_reserved15 = 15,
- FILE_first_user = 16, /* First user file, used as test limit for
- whether to allow opening a file or not. */
-} NTFS_SYSTEM_FILES;
-
-/*
- * These are the so far known MFT_RECORD_* flags (16-bit) which contain
- * information about the mft record in which they are present.
- */
-enum {
- MFT_RECORD_IN_USE = cpu_to_le16(0x0001),
- MFT_RECORD_IS_DIRECTORY = cpu_to_le16(0x0002),
-} __attribute__ ((__packed__));
-
-typedef le16 MFT_RECORD_FLAGS;
-
-/*
- * mft references (aka file references or file record segment references) are
- * used whenever a structure needs to refer to a record in the mft.
- *
- * A reference consists of a 48-bit index into the mft and a 16-bit sequence
- * number used to detect stale references.
- *
- * For error reporting purposes we treat the 48-bit index as a signed quantity.
- *
- * The sequence number is a circular counter (skipping 0) describing how many
- * times the referenced mft record has been (re)used. This has to match the
- * sequence number of the mft record being referenced, otherwise the reference
- * is considered stale and removed (FIXME: only ntfsck or the driver itself?).
- *
- * If the sequence number is zero it is assumed that no sequence number
- * consistency checking should be performed.
- *
- * FIXME: Since inodes are 32-bit as of now, the driver needs to always check
- * for high_part being 0 and if not either BUG(), cause a panic() or handle
- * the situation in some other way. This shouldn't be a problem as a volume has
- * to become HUGE in order to need more than 32-bits worth of mft records.
- * Assuming the standard mft record size of 1kb only the records (never mind
- * the non-resident attributes, etc.) would require 4Tb of space on their own
- * for the first 32 bits worth of records. This is only if some strange person
- * doesn't decide to foul play and make the mft sparse which would be a really
- * horrible thing to do as it would trash our current driver implementation. )-:
- * Do I hear screams "we want 64-bit inodes!" ?!? (-;
- *
- * FIXME: The mft zone is defined as the first 12% of the volume. This space is
- * reserved so that the mft can grow contiguously and hence doesn't become
- * fragmented. Volume free space includes the empty part of the mft zone and
- * when the volume's free 88% are used up, the mft zone is shrunk by a factor
- * of 2, thus making more space available for more files/data. This process is
- * repeated every time there is no more free space except for the mft zone until
- * there really is no more free space.
- */
-
-/*
- * Typedef the MFT_REF as a 64-bit value for easier handling.
- * Also define two unpacking macros to get to the reference (MREF) and
- * sequence number (MSEQNO) respectively.
- * The _LE versions are to be applied on little endian MFT_REFs.
- * Note: The _LE versions will return a CPU endian formatted value!
- */
-#define MFT_REF_MASK_CPU 0x0000ffffffffffffULL
-#define MFT_REF_MASK_LE cpu_to_le64(MFT_REF_MASK_CPU)
-
-typedef u64 MFT_REF;
-typedef le64 leMFT_REF;
-
-#define MK_MREF(m, s) ((MFT_REF)(((MFT_REF)(s) << 48) | \
- ((MFT_REF)(m) & MFT_REF_MASK_CPU)))
-#define MK_LE_MREF(m, s) cpu_to_le64(MK_MREF(m, s))
-
-#define MREF(x) ((unsigned long)((x) & MFT_REF_MASK_CPU))
-#define MSEQNO(x) ((u16)(((x) >> 48) & 0xffff))
-#define MREF_LE(x) ((unsigned long)(le64_to_cpu(x) & MFT_REF_MASK_CPU))
-#define MSEQNO_LE(x) ((u16)((le64_to_cpu(x) >> 48) & 0xffff))
-
-#define IS_ERR_MREF(x) (((x) & 0x0000800000000000ULL) ? true : false)
-#define ERR_MREF(x) ((u64)((s64)(x)))
-#define MREF_ERR(x) ((int)((s64)(x)))
-
-/*
- * The mft record header present at the beginning of every record in the mft.
- * This is followed by a sequence of variable length attribute records which
- * is terminated by an attribute of type AT_END which is a truncated attribute
- * in that it only consists of the attribute type code AT_END and none of the
- * other members of the attribute structure are present.
- */
-typedef struct {
-/*Ofs*/
-/* 0 NTFS_RECORD; -- Unfolded here as gcc doesn't like unnamed structs. */
- NTFS_RECORD_TYPE magic; /* Usually the magic is "FILE". */
- le16 usa_ofs; /* See NTFS_RECORD definition above. */
- le16 usa_count; /* See NTFS_RECORD definition above. */
-
-/* 8*/ le64 lsn; /* $LogFile sequence number for this record.
- Changed every time the record is modified. */
-/* 16*/ le16 sequence_number; /* Number of times this mft record has been
- reused. (See description for MFT_REF
- above.) NOTE: The increment (skipping zero)
- is done when the file is deleted. NOTE: If
- this is zero it is left zero. */
-/* 18*/ le16 link_count; /* Number of hard links, i.e. the number of
- directory entries referencing this record.
- NOTE: Only used in mft base records.
- NOTE: When deleting a directory entry we
- check the link_count and if it is 1 we
- delete the file. Otherwise we delete the
- FILE_NAME_ATTR being referenced by the
- directory entry from the mft record and
- decrement the link_count.
- FIXME: Careful with Win32 + DOS names! */
-/* 20*/ le16 attrs_offset; /* Byte offset to the first attribute in this
- mft record from the start of the mft record.
- NOTE: Must be aligned to 8-byte boundary. */
-/* 22*/ MFT_RECORD_FLAGS flags; /* Bit array of MFT_RECORD_FLAGS. When a file
- is deleted, the MFT_RECORD_IN_USE flag is
- set to zero. */
-/* 24*/ le32 bytes_in_use; /* Number of bytes used in this mft record.
- NOTE: Must be aligned to 8-byte boundary. */
-/* 28*/ le32 bytes_allocated; /* Number of bytes allocated for this mft
- record. This should be equal to the mft
- record size. */
-/* 32*/ leMFT_REF base_mft_record;/* This is zero for base mft records.
- When it is not zero it is a mft reference
- pointing to the base mft record to which
- this record belongs (this is then used to
- locate the attribute list attribute present
- in the base record which describes this
- extension record and hence might need
- modification when the extension record
- itself is modified, also locating the
- attribute list also means finding the other
- potential extents, belonging to the non-base
- mft record). */
-/* 40*/ le16 next_attr_instance;/* The instance number that will be assigned to
- the next attribute added to this mft record.
- NOTE: Incremented each time after it is used.
- NOTE: Every time the mft record is reused
- this number is set to zero. NOTE: The first
- instance number is always 0. */
-/* The below fields are specific to NTFS 3.1+ (Windows XP and above): */
-/* 42*/ le16 reserved; /* Reserved/alignment. */
-/* 44*/ le32 mft_record_number; /* Number of this mft record. */
-/* sizeof() = 48 bytes */
-/*
- * When (re)using the mft record, we place the update sequence array at this
- * offset, i.e. before we start with the attributes. This also makes sense,
- * otherwise we could run into problems with the update sequence array
- * containing in itself the last two bytes of a sector which would mean that
- * multi sector transfer protection wouldn't work. As you can't protect data
- * by overwriting it since you then can't get it back...
- * When reading we obviously use the data from the ntfs record header.
- */
-} __attribute__ ((__packed__)) MFT_RECORD;
-
-/* This is the version without the NTFS 3.1+ specific fields. */
-typedef struct {
-/*Ofs*/
-/* 0 NTFS_RECORD; -- Unfolded here as gcc doesn't like unnamed structs. */
- NTFS_RECORD_TYPE magic; /* Usually the magic is "FILE". */
- le16 usa_ofs; /* See NTFS_RECORD definition above. */
- le16 usa_count; /* See NTFS_RECORD definition above. */
-
-/* 8*/ le64 lsn; /* $LogFile sequence number for this record.
- Changed every time the record is modified. */
-/* 16*/ le16 sequence_number; /* Number of times this mft record has been
- reused. (See description for MFT_REF
- above.) NOTE: The increment (skipping zero)
- is done when the file is deleted. NOTE: If
- this is zero it is left zero. */
-/* 18*/ le16 link_count; /* Number of hard links, i.e. the number of
- directory entries referencing this record.
- NOTE: Only used in mft base records.
- NOTE: When deleting a directory entry we
- check the link_count and if it is 1 we
- delete the file. Otherwise we delete the
- FILE_NAME_ATTR being referenced by the
- directory entry from the mft record and
- decrement the link_count.
- FIXME: Careful with Win32 + DOS names! */
-/* 20*/ le16 attrs_offset; /* Byte offset to the first attribute in this
- mft record from the start of the mft record.
- NOTE: Must be aligned to 8-byte boundary. */
-/* 22*/ MFT_RECORD_FLAGS flags; /* Bit array of MFT_RECORD_FLAGS. When a file
- is deleted, the MFT_RECORD_IN_USE flag is
- set to zero. */
-/* 24*/ le32 bytes_in_use; /* Number of bytes used in this mft record.
- NOTE: Must be aligned to 8-byte boundary. */
-/* 28*/ le32 bytes_allocated; /* Number of bytes allocated for this mft
- record. This should be equal to the mft
- record size. */
-/* 32*/ leMFT_REF base_mft_record;/* This is zero for base mft records.
- When it is not zero it is a mft reference
- pointing to the base mft record to which
- this record belongs (this is then used to
- locate the attribute list attribute present
- in the base record which describes this
- extension record and hence might need
- modification when the extension record
- itself is modified, also locating the
- attribute list also means finding the other
- potential extents, belonging to the non-base
- mft record). */
-/* 40*/ le16 next_attr_instance;/* The instance number that will be assigned to
- the next attribute added to this mft record.
- NOTE: Incremented each time after it is used.
- NOTE: Every time the mft record is reused
- this number is set to zero. NOTE: The first
- instance number is always 0. */
-/* sizeof() = 42 bytes */
-/*
- * When (re)using the mft record, we place the update sequence array at this
- * offset, i.e. before we start with the attributes. This also makes sense,
- * otherwise we could run into problems with the update sequence array
- * containing in itself the last two bytes of a sector which would mean that
- * multi sector transfer protection wouldn't work. As you can't protect data
- * by overwriting it since you then can't get it back...
- * When reading we obviously use the data from the ntfs record header.
- */
-} __attribute__ ((__packed__)) MFT_RECORD_OLD;
-
-/*
- * System defined attributes (32-bit). Each attribute type has a corresponding
- * attribute name (Unicode string of maximum 64 character length) as described
- * by the attribute definitions present in the data attribute of the $AttrDef
- * system file. On NTFS 3.0 volumes the names are just as the types are named
- * in the below defines exchanging AT_ for the dollar sign ($). If that is not
- * a revealing choice of symbol I do not know what is... (-;
- */
-enum {
- AT_UNUSED = cpu_to_le32( 0),
- AT_STANDARD_INFORMATION = cpu_to_le32( 0x10),
- AT_ATTRIBUTE_LIST = cpu_to_le32( 0x20),
- AT_FILE_NAME = cpu_to_le32( 0x30),
- AT_OBJECT_ID = cpu_to_le32( 0x40),
- AT_SECURITY_DESCRIPTOR = cpu_to_le32( 0x50),
- AT_VOLUME_NAME = cpu_to_le32( 0x60),
- AT_VOLUME_INFORMATION = cpu_to_le32( 0x70),
- AT_DATA = cpu_to_le32( 0x80),
- AT_INDEX_ROOT = cpu_to_le32( 0x90),
- AT_INDEX_ALLOCATION = cpu_to_le32( 0xa0),
- AT_BITMAP = cpu_to_le32( 0xb0),
- AT_REPARSE_POINT = cpu_to_le32( 0xc0),
- AT_EA_INFORMATION = cpu_to_le32( 0xd0),
- AT_EA = cpu_to_le32( 0xe0),
- AT_PROPERTY_SET = cpu_to_le32( 0xf0),
- AT_LOGGED_UTILITY_STREAM = cpu_to_le32( 0x100),
- AT_FIRST_USER_DEFINED_ATTRIBUTE = cpu_to_le32( 0x1000),
- AT_END = cpu_to_le32(0xffffffff)
-};
-
-typedef le32 ATTR_TYPE;
-
-/*
- * The collation rules for sorting views/indexes/etc (32-bit).
- *
- * COLLATION_BINARY - Collate by binary compare where the first byte is most
- * significant.
- * COLLATION_UNICODE_STRING - Collate Unicode strings by comparing their binary
- * Unicode values, except that when a character can be uppercased, the
- * upper case value collates before the lower case one.
- * COLLATION_FILE_NAME - Collate file names as Unicode strings. The collation
- * is done very much like COLLATION_UNICODE_STRING. In fact I have no idea
- * what the difference is. Perhaps the difference is that file names
- * would treat some special characters in an odd way (see
- * unistr.c::ntfs_collate_names() and unistr.c::legal_ansi_char_array[]
- * for what I mean but COLLATION_UNICODE_STRING would not give any special
- * treatment to any characters at all, but this is speculation.
- * COLLATION_NTOFS_ULONG - Sorting is done according to ascending le32 key
- * values. E.g. used for $SII index in FILE_Secure, which sorts by
- * security_id (le32).
- * COLLATION_NTOFS_SID - Sorting is done according to ascending SID values.
- * E.g. used for $O index in FILE_Extend/$Quota.
- * COLLATION_NTOFS_SECURITY_HASH - Sorting is done first by ascending hash
- * values and second by ascending security_id values. E.g. used for $SDH
- * index in FILE_Secure.
- * COLLATION_NTOFS_ULONGS - Sorting is done according to a sequence of ascending
- * le32 key values. E.g. used for $O index in FILE_Extend/$ObjId, which
- * sorts by object_id (16-byte), by splitting up the object_id in four
- * le32 values and using them as individual keys. E.g. take the following
- * two security_ids, stored as follows on disk:
- * 1st: a1 61 65 b7 65 7b d4 11 9e 3d 00 e0 81 10 42 59
- * 2nd: 38 14 37 d2 d2 f3 d4 11 a5 21 c8 6b 79 b1 97 45
- * To compare them, they are split into four le32 values each, like so:
- * 1st: 0xb76561a1 0x11d47b65 0xe0003d9e 0x59421081
- * 2nd: 0xd2371438 0x11d4f3d2 0x6bc821a5 0x4597b179
- * Now, it is apparent why the 2nd object_id collates after the 1st: the
- * first le32 value of the 1st object_id is less than the first le32 of
- * the 2nd object_id. If the first le32 values of both object_ids were
- * equal then the second le32 values would be compared, etc.
- */
-enum {
- COLLATION_BINARY = cpu_to_le32(0x00),
- COLLATION_FILE_NAME = cpu_to_le32(0x01),
- COLLATION_UNICODE_STRING = cpu_to_le32(0x02),
- COLLATION_NTOFS_ULONG = cpu_to_le32(0x10),
- COLLATION_NTOFS_SID = cpu_to_le32(0x11),
- COLLATION_NTOFS_SECURITY_HASH = cpu_to_le32(0x12),
- COLLATION_NTOFS_ULONGS = cpu_to_le32(0x13),
-};
-
-typedef le32 COLLATION_RULE;
-
-/*
- * The flags (32-bit) describing attribute properties in the attribute
- * definition structure. FIXME: This information is based on Regis's
- * information and, according to him, it is not certain and probably
- * incomplete. The INDEXABLE flag is fairly certainly correct as only the file
- * name attribute has this flag set and this is the only attribute indexed in
- * NT4.
- */
-enum {
- ATTR_DEF_INDEXABLE = cpu_to_le32(0x02), /* Attribute can be
- indexed. */
- ATTR_DEF_MULTIPLE = cpu_to_le32(0x04), /* Attribute type
- can be present multiple times in the
- mft records of an inode. */
- ATTR_DEF_NOT_ZERO = cpu_to_le32(0x08), /* Attribute value
- must contain at least one non-zero
- byte. */
- ATTR_DEF_INDEXED_UNIQUE = cpu_to_le32(0x10), /* Attribute must be
- indexed and the attribute value must be
- unique for the attribute type in all of
- the mft records of an inode. */
- ATTR_DEF_NAMED_UNIQUE = cpu_to_le32(0x20), /* Attribute must be
- named and the name must be unique for
- the attribute type in all of the mft
- records of an inode. */
- ATTR_DEF_RESIDENT = cpu_to_le32(0x40), /* Attribute must be
- resident. */
- ATTR_DEF_ALWAYS_LOG = cpu_to_le32(0x80), /* Always log
- modifications to this attribute,
- regardless of whether it is resident or
- non-resident. Without this, only log
- modifications if the attribute is
- resident. */
-};
-
-typedef le32 ATTR_DEF_FLAGS;
-
-/*
- * The data attribute of FILE_AttrDef contains a sequence of attribute
- * definitions for the NTFS volume. With this, it is supposed to be safe for an
- * older NTFS driver to mount a volume containing a newer NTFS version without
- * damaging it (that's the theory. In practice it's: not damaging it too much).
- * Entries are sorted by attribute type. The flags describe whether the
- * attribute can be resident/non-resident and possibly other things, but the
- * actual bits are unknown.
- */
-typedef struct {
-/*hex ofs*/
-/* 0*/ ntfschar name[0x40]; /* Unicode name of the attribute. Zero
- terminated. */
-/* 80*/ ATTR_TYPE type; /* Type of the attribute. */
-/* 84*/ le32 display_rule; /* Default display rule.
- FIXME: What does it mean? (AIA) */
-/* 88*/ COLLATION_RULE collation_rule; /* Default collation rule. */
-/* 8c*/ ATTR_DEF_FLAGS flags; /* Flags describing the attribute. */
-/* 90*/ sle64 min_size; /* Optional minimum attribute size. */
-/* 98*/ sle64 max_size; /* Maximum size of attribute. */
-/* sizeof() = 0xa0 or 160 bytes */
-} __attribute__ ((__packed__)) ATTR_DEF;
-
-/*
- * Attribute flags (16-bit).
- */
-enum {
- ATTR_IS_COMPRESSED = cpu_to_le16(0x0001),
- ATTR_COMPRESSION_MASK = cpu_to_le16(0x00ff), /* Compression method
- mask. Also, first
- illegal value. */
- ATTR_IS_ENCRYPTED = cpu_to_le16(0x4000),
- ATTR_IS_SPARSE = cpu_to_le16(0x8000),
-} __attribute__ ((__packed__));
-
-typedef le16 ATTR_FLAGS;
-
-/*
- * Attribute compression.
- *
- * Only the data attribute is ever compressed in the current ntfs driver in
- * Windows. Further, compression is only applied when the data attribute is
- * non-resident. Finally, to use compression, the maximum allowed cluster size
- * on a volume is 4kib.
- *
- * The compression method is based on independently compressing blocks of X
- * clusters, where X is determined from the compression_unit value found in the
- * non-resident attribute record header (more precisely: X = 2^compression_unit
- * clusters). On Windows NT/2k, X always is 16 clusters (compression_unit = 4).
- *
- * There are three different cases of how a compression block of X clusters
- * can be stored:
- *
- * 1) The data in the block is all zero (a sparse block):
- * This is stored as a sparse block in the runlist, i.e. the runlist
- * entry has length = X and lcn = -1. The mapping pairs array actually
- * uses a delta_lcn value length of 0, i.e. delta_lcn is not present at
- * all, which is then interpreted by the driver as lcn = -1.
- * NOTE: Even uncompressed files can be sparse on NTFS 3.0 volumes, then
- * the same principles apply as above, except that the length is not
- * restricted to being any particular value.
- *
- * 2) The data in the block is not compressed:
- * This happens when compression doesn't reduce the size of the block
- * in clusters. I.e. if compression has a small effect so that the
- * compressed data still occupies X clusters, then the uncompressed data
- * is stored in the block.
- * This case is recognised by the fact that the runlist entry has
- * length = X and lcn >= 0. The mapping pairs array stores this as
- * normal with a run length of X and some specific delta_lcn, i.e.
- * delta_lcn has to be present.
- *
- * 3) The data in the block is compressed:
- * The common case. This case is recognised by the fact that the run
- * list entry has length L < X and lcn >= 0. The mapping pairs array
- * stores this as normal with a run length of X and some specific
- * delta_lcn, i.e. delta_lcn has to be present. This runlist entry is
- * immediately followed by a sparse entry with length = X - L and
- * lcn = -1. The latter entry is to make up the vcn counting to the
- * full compression block size X.
- *
- * In fact, life is more complicated because adjacent entries of the same type
- * can be coalesced. This means that one has to keep track of the number of
- * clusters handled and work on a basis of X clusters at a time being one
- * block. An example: if length L > X this means that this particular runlist
- * entry contains a block of length X and part of one or more blocks of length
- * L - X. Another example: if length L < X, this does not necessarily mean that
- * the block is compressed as it might be that the lcn changes inside the block
- * and hence the following runlist entry describes the continuation of the
- * potentially compressed block. The block would be compressed if the
- * following runlist entry describes at least X - L sparse clusters, thus
- * making up the compression block length as described in point 3 above. (Of
- * course, there can be several runlist entries with small lengths so that the
- * sparse entry does not follow the first data containing entry with
- * length < X.)
- *
- * NOTE: At the end of the compressed attribute value, there most likely is not
- * just the right amount of data to make up a compression block, thus this data
- * is not even attempted to be compressed. It is just stored as is, unless
- * the number of clusters it occupies is reduced when compressed in which case
- * it is stored as a compressed compression block, complete with sparse
- * clusters at the end.
- */
-
-/*
- * Flags of resident attributes (8-bit).
- */
-enum {
- RESIDENT_ATTR_IS_INDEXED = 0x01, /* Attribute is referenced in an index
- (has implications for deleting and
- modifying the attribute). */
-} __attribute__ ((__packed__));
-
-typedef u8 RESIDENT_ATTR_FLAGS;
-
-/*
- * Attribute record header. Always aligned to 8-byte boundary.
- */
-typedef struct {
-/*Ofs*/
-/* 0*/ ATTR_TYPE type; /* The (32-bit) type of the attribute. */
-/* 4*/ le32 length; /* Byte size of the resident part of the
- attribute (aligned to 8-byte boundary).
- Used to get to the next attribute. */
-/* 8*/ u8 non_resident; /* If 0, attribute is resident.
- If 1, attribute is non-resident. */
-/* 9*/ u8 name_length; /* Unicode character size of name of attribute.
- 0 if unnamed. */
-/* 10*/ le16 name_offset; /* If name_length != 0, the byte offset to the
- beginning of the name from the attribute
- record. Note that the name is stored as a
- Unicode string. When creating, place offset
- just at the end of the record header. Then,
- follow with attribute value or mapping pairs
- array, resident and non-resident attributes
- respectively, aligning to an 8-byte
- boundary. */
-/* 12*/ ATTR_FLAGS flags; /* Flags describing the attribute. */
-/* 14*/ le16 instance; /* The instance of this attribute record. This
- number is unique within this mft record (see
- MFT_RECORD/next_attribute_instance notes in
- mft.h for more details). */
-/* 16*/ union {
- /* Resident attributes. */
- struct {
-/* 16 */ le32 value_length;/* Byte size of attribute value. */
-/* 20 */ le16 value_offset;/* Byte offset of the attribute
- value from the start of the
- attribute record. When creating,
- align to 8-byte boundary if we
- have a name present as this might
- not have a length of a multiple
- of 8-bytes. */
-/* 22 */ RESIDENT_ATTR_FLAGS flags; /* See above. */
-/* 23 */ s8 reserved; /* Reserved/alignment to 8-byte
- boundary. */
- } __attribute__ ((__packed__)) resident;
- /* Non-resident attributes. */
- struct {
-/* 16*/ leVCN lowest_vcn;/* Lowest valid virtual cluster number
- for this portion of the attribute value or
- 0 if this is the only extent (usually the
- case). - Only when an attribute list is used
- does lowest_vcn != 0 ever occur. */
-/* 24*/ leVCN highest_vcn;/* Highest valid vcn of this extent of
- the attribute value. - Usually there is only one
- portion, so this usually equals the attribute
- value size in clusters minus 1. Can be -1 for
- zero length files. Can be 0 for "single extent"
- attributes. */
-/* 32*/ le16 mapping_pairs_offset; /* Byte offset from the
- beginning of the structure to the mapping pairs
- array which contains the mappings between the
- vcns and the logical cluster numbers (lcns).
- When creating, place this at the end of this
- record header aligned to 8-byte boundary. */
-/* 34*/ u8 compression_unit; /* The compression unit expressed
- as the log to the base 2 of the number of
- clusters in a compression unit. 0 means not
- compressed. (This effectively limits the
- compression unit size to be a power of two
- clusters.) WinNT4 only uses a value of 4.
- Sparse files have this set to 0 on XPSP2. */
-/* 35*/ u8 reserved[5]; /* Align to 8-byte boundary. */
-/* The sizes below are only used when lowest_vcn is zero, as otherwise it would
- be difficult to keep them up-to-date.*/
-/* 40*/ sle64 allocated_size; /* Byte size of disk space
- allocated to hold the attribute value. Always
- is a multiple of the cluster size. When a file
- is compressed, this field is a multiple of the
- compression block size (2^compression_unit) and
- it represents the logically allocated space
- rather than the actual on disk usage. For this
- use the compressed_size (see below). */
-/* 48*/ sle64 data_size; /* Byte size of the attribute
- value. Can be larger than allocated_size if
- attribute value is compressed or sparse. */
-/* 56*/ sle64 initialized_size; /* Byte size of initialized
- portion of the attribute value. Usually equals
- data_size. */
-/* sizeof(uncompressed attr) = 64*/
-/* 64*/ sle64 compressed_size; /* Byte size of the attribute
- value after compression. Only present when
- compressed or sparse. Always is a multiple of
- the cluster size. Represents the actual amount
- of disk space being used on the disk. */
-/* sizeof(compressed attr) = 72*/
- } __attribute__ ((__packed__)) non_resident;
- } __attribute__ ((__packed__)) data;
-} __attribute__ ((__packed__)) ATTR_RECORD;
-
-typedef ATTR_RECORD ATTR_REC;
-
-/*
- * File attribute flags (32-bit) appearing in the file_attributes fields of the
- * STANDARD_INFORMATION attribute of MFT_RECORDs and the FILENAME_ATTR
- * attributes of MFT_RECORDs and directory index entries.
- *
- * All of the below flags appear in the directory index entries but only some
- * appear in the STANDARD_INFORMATION attribute whilst only some others appear
- * in the FILENAME_ATTR attribute of MFT_RECORDs. Unless otherwise stated the
- * flags appear in all of the above.
- */
-enum {
- FILE_ATTR_READONLY = cpu_to_le32(0x00000001),
- FILE_ATTR_HIDDEN = cpu_to_le32(0x00000002),
- FILE_ATTR_SYSTEM = cpu_to_le32(0x00000004),
- /* Old DOS volid. Unused in NT. = cpu_to_le32(0x00000008), */
-
- FILE_ATTR_DIRECTORY = cpu_to_le32(0x00000010),
- /* Note, FILE_ATTR_DIRECTORY is not considered valid in NT. It is
- reserved for the DOS SUBDIRECTORY flag. */
- FILE_ATTR_ARCHIVE = cpu_to_le32(0x00000020),
- FILE_ATTR_DEVICE = cpu_to_le32(0x00000040),
- FILE_ATTR_NORMAL = cpu_to_le32(0x00000080),
-
- FILE_ATTR_TEMPORARY = cpu_to_le32(0x00000100),
- FILE_ATTR_SPARSE_FILE = cpu_to_le32(0x00000200),
- FILE_ATTR_REPARSE_POINT = cpu_to_le32(0x00000400),
- FILE_ATTR_COMPRESSED = cpu_to_le32(0x00000800),
-
- FILE_ATTR_OFFLINE = cpu_to_le32(0x00001000),
- FILE_ATTR_NOT_CONTENT_INDEXED = cpu_to_le32(0x00002000),
- FILE_ATTR_ENCRYPTED = cpu_to_le32(0x00004000),
-
- FILE_ATTR_VALID_FLAGS = cpu_to_le32(0x00007fb7),
- /* Note, FILE_ATTR_VALID_FLAGS masks out the old DOS VolId and the
- FILE_ATTR_DEVICE and preserves everything else. This mask is used
- to obtain all flags that are valid for reading. */
- FILE_ATTR_VALID_SET_FLAGS = cpu_to_le32(0x000031a7),
- /* Note, FILE_ATTR_VALID_SET_FLAGS masks out the old DOS VolId, the
- F_A_DEVICE, F_A_DIRECTORY, F_A_SPARSE_FILE, F_A_REPARSE_POINT,
- F_A_COMPRESSED, and F_A_ENCRYPTED and preserves the rest. This mask
- is used to obtain all flags that are valid for setting. */
- /*
- * The flag FILE_ATTR_DUP_FILENAME_INDEX_PRESENT is present in all
- * FILENAME_ATTR attributes but not in the STANDARD_INFORMATION
- * attribute of an mft record.
- */
- FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT = cpu_to_le32(0x10000000),
- /* Note, this is a copy of the corresponding bit from the mft record,
- telling us whether this is a directory or not, i.e. whether it has
- an index root attribute or not. */
- FILE_ATTR_DUP_VIEW_INDEX_PRESENT = cpu_to_le32(0x20000000),
- /* Note, this is a copy of the corresponding bit from the mft record,
- telling us whether this file has a view index present (eg. object id
- index, quota index, one of the security indexes or the encrypting
- filesystem related indexes). */
-};
-
-typedef le32 FILE_ATTR_FLAGS;
-
-/*
- * NOTE on times in NTFS: All times are in MS standard time format, i.e. they
- * are the number of 100-nanosecond intervals since 1st January 1601, 00:00:00
- * universal coordinated time (UTC). (In Linux time starts 1st January 1970,
- * 00:00:00 UTC and is stored as the number of 1-second intervals since then.)
- */
-
-/*
- * Attribute: Standard information (0x10).
- *
- * NOTE: Always resident.
- * NOTE: Present in all base file records on a volume.
- * NOTE: There is conflicting information about the meaning of each of the time
- * fields but the meaning as defined below has been verified to be
- * correct by practical experimentation on Windows NT4 SP6a and is hence
- * assumed to be the one and only correct interpretation.
- */
-typedef struct {
-/*Ofs*/
-/* 0*/ sle64 creation_time; /* Time file was created. Updated when
- a filename is changed(?). */
-/* 8*/ sle64 last_data_change_time; /* Time the data attribute was last
- modified. */
-/* 16*/ sle64 last_mft_change_time; /* Time this mft record was last
- modified. */
-/* 24*/ sle64 last_access_time; /* Approximate time when the file was
- last accessed (obviously this is not
- updated on read-only volumes). In
- Windows this is only updated when
- accessed if some time delta has
- passed since the last update. Also,
- last access time updates can be
- disabled altogether for speed. */
-/* 32*/ FILE_ATTR_FLAGS file_attributes; /* Flags describing the file. */
-/* 36*/ union {
- /* NTFS 1.2 */
- struct {
- /* 36*/ u8 reserved12[12]; /* Reserved/alignment to 8-byte
- boundary. */
- } __attribute__ ((__packed__)) v1;
- /* sizeof() = 48 bytes */
- /* NTFS 3.x */
- struct {
-/*
- * If a volume has been upgraded from a previous NTFS version, then these
- * fields are present only if the file has been accessed since the upgrade.
- * Recognize the difference by comparing the length of the resident attribute
- * value. If it is 48, then the following fields are missing. If it is 72 then
- * the fields are present. Maybe just check like this:
- * if (resident.ValueLength < sizeof(STANDARD_INFORMATION)) {
- * Assume NTFS 1.2- format.
- * If (volume version is 3.x)
- * Upgrade attribute to NTFS 3.x format.
- * else
- * Use NTFS 1.2- format for access.
- * } else
- * Use NTFS 3.x format for access.
- * Only problem is that it might be legal to set the length of the value to
- * arbitrarily large values thus spoiling this check. - But chkdsk probably
- * views that as a corruption, assuming that it behaves like this for all
- * attributes.
- */
- /* 36*/ le32 maximum_versions; /* Maximum allowed versions for
- file. Zero if version numbering is disabled. */
- /* 40*/ le32 version_number; /* This file's version (if any).
- Set to zero if maximum_versions is zero. */
- /* 44*/ le32 class_id; /* Class id from bidirectional
- class id index (?). */
- /* 48*/ le32 owner_id; /* Owner_id of the user owning
- the file. Translate via $Q index in FILE_Extend
- /$Quota to the quota control entry for the user
- owning the file. Zero if quotas are disabled. */
- /* 52*/ le32 security_id; /* Security_id for the file.
- Translate via $SII index and $SDS data stream
- in FILE_Secure to the security descriptor. */
- /* 56*/ le64 quota_charged; /* Byte size of the charge to
- the quota for all streams of the file. Note: Is
- zero if quotas are disabled. */
- /* 64*/ leUSN usn; /* Last update sequence number
- of the file. This is a direct index into the
- transaction log file ($UsnJrnl). It is zero if
- the usn journal is disabled or this file has
- not been subject to logging yet. See usnjrnl.h
- for details. */
- } __attribute__ ((__packed__)) v3;
- /* sizeof() = 72 bytes (NTFS 3.x) */
- } __attribute__ ((__packed__)) ver;
-} __attribute__ ((__packed__)) STANDARD_INFORMATION;
-
-/*
- * Attribute: Attribute list (0x20).
- *
- * - Can be either resident or non-resident.
- * - Value consists of a sequence of variable length, 8-byte aligned,
- * ATTR_LIST_ENTRY records.
- * - The list is not terminated by anything at all! The only way to know when
- * the end is reached is to keep track of the current offset and compare it to
- * the attribute value size.
- * - The attribute list attribute contains one entry for each attribute of
- * the file in which the list is located, except for the list attribute
- * itself. The list is sorted: first by attribute type, second by attribute
- * name (if present), third by instance number. The extents of one
- * non-resident attribute (if present) immediately follow after the initial
- * extent. They are ordered by lowest_vcn and have their instace set to zero.
- * It is not allowed to have two attributes with all sorting keys equal.
- * - Further restrictions:
- * - If not resident, the vcn to lcn mapping array has to fit inside the
- * base mft record.
- * - The attribute list attribute value has a maximum size of 256kb. This
- * is imposed by the Windows cache manager.
- * - Attribute lists are only used when the attributes of mft record do not
- * fit inside the mft record despite all attributes (that can be made
- * non-resident) having been made non-resident. This can happen e.g. when:
- * - File has a large number of hard links (lots of file name
- * attributes present).
- * - The mapping pairs array of some non-resident attribute becomes so
- * large due to fragmentation that it overflows the mft record.
- * - The security descriptor is very complex (not applicable to
- * NTFS 3.0 volumes).
- * - There are many named streams.
- */
-typedef struct {
-/*Ofs*/
-/* 0*/ ATTR_TYPE type; /* Type of referenced attribute. */
-/* 4*/ le16 length; /* Byte size of this entry (8-byte aligned). */
-/* 6*/ u8 name_length; /* Size in Unicode chars of the name of the
- attribute or 0 if unnamed. */
-/* 7*/ u8 name_offset; /* Byte offset to beginning of attribute name
- (always set this to where the name would
- start even if unnamed). */
-/* 8*/ leVCN lowest_vcn; /* Lowest virtual cluster number of this portion
- of the attribute value. This is usually 0. It
- is non-zero for the case where one attribute
- does not fit into one mft record and thus
- several mft records are allocated to hold
- this attribute. In the latter case, each mft
- record holds one extent of the attribute and
- there is one attribute list entry for each
- extent. NOTE: This is DEFINITELY a signed
- value! The windows driver uses cmp, followed
- by jg when comparing this, thus it treats it
- as signed. */
-/* 16*/ leMFT_REF mft_reference;/* The reference of the mft record holding
- the ATTR_RECORD for this portion of the
- attribute value. */
-/* 24*/ le16 instance; /* If lowest_vcn = 0, the instance of the
- attribute being referenced; otherwise 0. */
-/* 26*/ ntfschar name[0]; /* Use when creating only. When reading use
- name_offset to determine the location of the
- name. */
-/* sizeof() = 26 + (attribute_name_length * 2) bytes */
-} __attribute__ ((__packed__)) ATTR_LIST_ENTRY;
-
-/*
- * The maximum allowed length for a file name.
- */
-#define MAXIMUM_FILE_NAME_LENGTH 255
-
-/*
- * Possible namespaces for filenames in ntfs (8-bit).
- */
-enum {
- FILE_NAME_POSIX = 0x00,
- /* This is the largest namespace. It is case sensitive and allows all
- Unicode characters except for: '\0' and '/'. Beware that in
- WinNT/2k/2003 by default files which eg have the same name except
- for their case will not be distinguished by the standard utilities
- and thus a "del filename" will delete both "filename" and "fileName"
- without warning. However if for example Services For Unix (SFU) are
- installed and the case sensitive option was enabled at installation
- time, then you can create/access/delete such files.
- Note that even SFU places restrictions on the filenames beyond the
- '\0' and '/' and in particular the following set of characters is
- not allowed: '"', '/', '<', '>', '\'. All other characters,
- including the ones no allowed in WIN32 namespace are allowed.
- Tested with SFU 3.5 (this is now free) running on Windows XP. */
- FILE_NAME_WIN32 = 0x01,
- /* The standard WinNT/2k NTFS long filenames. Case insensitive. All
- Unicode chars except: '\0', '"', '*', '/', ':', '<', '>', '?', '\',
- and '|'. Further, names cannot end with a '.' or a space. */
- FILE_NAME_DOS = 0x02,
- /* The standard DOS filenames (8.3 format). Uppercase only. All 8-bit
- characters greater space, except: '"', '*', '+', ',', '/', ':', ';',
- '<', '=', '>', '?', and '\'. */
- FILE_NAME_WIN32_AND_DOS = 0x03,
- /* 3 means that both the Win32 and the DOS filenames are identical and
- hence have been saved in this single filename record. */
-} __attribute__ ((__packed__));
-
-typedef u8 FILE_NAME_TYPE_FLAGS;
-
-/*
- * Attribute: Filename (0x30).
- *
- * NOTE: Always resident.
- * NOTE: All fields, except the parent_directory, are only updated when the
- * filename is changed. Until then, they just become out of sync with
- * reality and the more up to date values are present in the standard
- * information attribute.
- * NOTE: There is conflicting information about the meaning of each of the time
- * fields but the meaning as defined below has been verified to be
- * correct by practical experimentation on Windows NT4 SP6a and is hence
- * assumed to be the one and only correct interpretation.
- */
-typedef struct {
-/*hex ofs*/
-/* 0*/ leMFT_REF parent_directory; /* Directory this filename is
- referenced from. */
-/* 8*/ sle64 creation_time; /* Time file was created. */
-/* 10*/ sle64 last_data_change_time; /* Time the data attribute was last
- modified. */
-/* 18*/ sle64 last_mft_change_time; /* Time this mft record was last
- modified. */
-/* 20*/ sle64 last_access_time; /* Time this mft record was last
- accessed. */
-/* 28*/ sle64 allocated_size; /* Byte size of on-disk allocated space
- for the unnamed data attribute. So
- for normal $DATA, this is the
- allocated_size from the unnamed
- $DATA attribute and for compressed
- and/or sparse $DATA, this is the
- compressed_size from the unnamed
- $DATA attribute. For a directory or
- other inode without an unnamed $DATA
- attribute, this is always 0. NOTE:
- This is a multiple of the cluster
- size. */
-/* 30*/ sle64 data_size; /* Byte size of actual data in unnamed
- data attribute. For a directory or
- other inode without an unnamed $DATA
- attribute, this is always 0. */
-/* 38*/ FILE_ATTR_FLAGS file_attributes; /* Flags describing the file. */
-/* 3c*/ union {
- /* 3c*/ struct {
- /* 3c*/ le16 packed_ea_size; /* Size of the buffer needed to
- pack the extended attributes
- (EAs), if such are present.*/
- /* 3e*/ le16 reserved; /* Reserved for alignment. */
- } __attribute__ ((__packed__)) ea;
- /* 3c*/ struct {
- /* 3c*/ le32 reparse_point_tag; /* Type of reparse point,
- present only in reparse
- points and only if there are
- no EAs. */
- } __attribute__ ((__packed__)) rp;
- } __attribute__ ((__packed__)) type;
-/* 40*/ u8 file_name_length; /* Length of file name in
- (Unicode) characters. */
-/* 41*/ FILE_NAME_TYPE_FLAGS file_name_type; /* Namespace of the file name.*/
-/* 42*/ ntfschar file_name[0]; /* File name in Unicode. */
-} __attribute__ ((__packed__)) FILE_NAME_ATTR;
-
-/*
- * GUID structures store globally unique identifiers (GUID). A GUID is a
- * 128-bit value consisting of one group of eight hexadecimal digits, followed
- * by three groups of four hexadecimal digits each, followed by one group of
- * twelve hexadecimal digits. GUIDs are Microsoft's implementation of the
- * distributed computing environment (DCE) universally unique identifier (UUID).
- * Example of a GUID:
- * 1F010768-5A73-BC91-0010A52216A7
- */
-typedef struct {
- le32 data1; /* The first eight hexadecimal digits of the GUID. */
- le16 data2; /* The first group of four hexadecimal digits. */
- le16 data3; /* The second group of four hexadecimal digits. */
- u8 data4[8]; /* The first two bytes are the third group of four
- hexadecimal digits. The remaining six bytes are the
- final 12 hexadecimal digits. */
-} __attribute__ ((__packed__)) GUID;
-
-/*
- * FILE_Extend/$ObjId contains an index named $O. This index contains all
- * object_ids present on the volume as the index keys and the corresponding
- * mft_record numbers as the index entry data parts. The data part (defined
- * below) also contains three other object_ids:
- * birth_volume_id - object_id of FILE_Volume on which the file was first
- * created. Optional (i.e. can be zero).
- * birth_object_id - object_id of file when it was first created. Usually
- * equals the object_id. Optional (i.e. can be zero).
- * domain_id - Reserved (always zero).
- */
-typedef struct {
- leMFT_REF mft_reference;/* Mft record containing the object_id in
- the index entry key. */
- union {
- struct {
- GUID birth_volume_id;
- GUID birth_object_id;
- GUID domain_id;
- } __attribute__ ((__packed__)) origin;
- u8 extended_info[48];
- } __attribute__ ((__packed__)) opt;
-} __attribute__ ((__packed__)) OBJ_ID_INDEX_DATA;
-
-/*
- * Attribute: Object id (NTFS 3.0+) (0x40).
- *
- * NOTE: Always resident.
- */
-typedef struct {
- GUID object_id; /* Unique id assigned to the
- file.*/
- /* The following fields are optional. The attribute value size is 16
- bytes, i.e. sizeof(GUID), if these are not present at all. Note,
- the entries can be present but one or more (or all) can be zero
- meaning that that particular value(s) is(are) not defined. */
- union {
- struct {
- GUID birth_volume_id; /* Unique id of volume on which
- the file was first created.*/
- GUID birth_object_id; /* Unique id of file when it was
- first created. */
- GUID domain_id; /* Reserved, zero. */
- } __attribute__ ((__packed__)) origin;
- u8 extended_info[48];
- } __attribute__ ((__packed__)) opt;
-} __attribute__ ((__packed__)) OBJECT_ID_ATTR;
-
-/*
- * The pre-defined IDENTIFIER_AUTHORITIES used as SID_IDENTIFIER_AUTHORITY in
- * the SID structure (see below).
- */
-//typedef enum { /* SID string prefix. */
-// SECURITY_NULL_SID_AUTHORITY = {0, 0, 0, 0, 0, 0}, /* S-1-0 */
-// SECURITY_WORLD_SID_AUTHORITY = {0, 0, 0, 0, 0, 1}, /* S-1-1 */
-// SECURITY_LOCAL_SID_AUTHORITY = {0, 0, 0, 0, 0, 2}, /* S-1-2 */
-// SECURITY_CREATOR_SID_AUTHORITY = {0, 0, 0, 0, 0, 3}, /* S-1-3 */
-// SECURITY_NON_UNIQUE_AUTHORITY = {0, 0, 0, 0, 0, 4}, /* S-1-4 */
-// SECURITY_NT_SID_AUTHORITY = {0, 0, 0, 0, 0, 5}, /* S-1-5 */
-//} IDENTIFIER_AUTHORITIES;
-
-/*
- * These relative identifiers (RIDs) are used with the above identifier
- * authorities to make up universal well-known SIDs.
- *
- * Note: The relative identifier (RID) refers to the portion of a SID, which
- * identifies a user or group in relation to the authority that issued the SID.
- * For example, the universal well-known SID Creator Owner ID (S-1-3-0) is
- * made up of the identifier authority SECURITY_CREATOR_SID_AUTHORITY (3) and
- * the relative identifier SECURITY_CREATOR_OWNER_RID (0).
- */
-typedef enum { /* Identifier authority. */
- SECURITY_NULL_RID = 0, /* S-1-0 */
- SECURITY_WORLD_RID = 0, /* S-1-1 */
- SECURITY_LOCAL_RID = 0, /* S-1-2 */
-
- SECURITY_CREATOR_OWNER_RID = 0, /* S-1-3 */
- SECURITY_CREATOR_GROUP_RID = 1, /* S-1-3 */
-
- SECURITY_CREATOR_OWNER_SERVER_RID = 2, /* S-1-3 */
- SECURITY_CREATOR_GROUP_SERVER_RID = 3, /* S-1-3 */
-
- SECURITY_DIALUP_RID = 1,
- SECURITY_NETWORK_RID = 2,
- SECURITY_BATCH_RID = 3,
- SECURITY_INTERACTIVE_RID = 4,
- SECURITY_SERVICE_RID = 6,
- SECURITY_ANONYMOUS_LOGON_RID = 7,
- SECURITY_PROXY_RID = 8,
- SECURITY_ENTERPRISE_CONTROLLERS_RID=9,
- SECURITY_SERVER_LOGON_RID = 9,
- SECURITY_PRINCIPAL_SELF_RID = 0xa,
- SECURITY_AUTHENTICATED_USER_RID = 0xb,
- SECURITY_RESTRICTED_CODE_RID = 0xc,
- SECURITY_TERMINAL_SERVER_RID = 0xd,
-
- SECURITY_LOGON_IDS_RID = 5,
- SECURITY_LOGON_IDS_RID_COUNT = 3,
-
- SECURITY_LOCAL_SYSTEM_RID = 0x12,
-
- SECURITY_NT_NON_UNIQUE = 0x15,
-
- SECURITY_BUILTIN_DOMAIN_RID = 0x20,
-
- /*
- * Well-known domain relative sub-authority values (RIDs).
- */
-
- /* Users. */
- DOMAIN_USER_RID_ADMIN = 0x1f4,
- DOMAIN_USER_RID_GUEST = 0x1f5,
- DOMAIN_USER_RID_KRBTGT = 0x1f6,
-
- /* Groups. */
- DOMAIN_GROUP_RID_ADMINS = 0x200,
- DOMAIN_GROUP_RID_USERS = 0x201,
- DOMAIN_GROUP_RID_GUESTS = 0x202,
- DOMAIN_GROUP_RID_COMPUTERS = 0x203,
- DOMAIN_GROUP_RID_CONTROLLERS = 0x204,
- DOMAIN_GROUP_RID_CERT_ADMINS = 0x205,
- DOMAIN_GROUP_RID_SCHEMA_ADMINS = 0x206,
- DOMAIN_GROUP_RID_ENTERPRISE_ADMINS= 0x207,
- DOMAIN_GROUP_RID_POLICY_ADMINS = 0x208,
-
- /* Aliases. */
- DOMAIN_ALIAS_RID_ADMINS = 0x220,
- DOMAIN_ALIAS_RID_USERS = 0x221,
- DOMAIN_ALIAS_RID_GUESTS = 0x222,
- DOMAIN_ALIAS_RID_POWER_USERS = 0x223,
-
- DOMAIN_ALIAS_RID_ACCOUNT_OPS = 0x224,
- DOMAIN_ALIAS_RID_SYSTEM_OPS = 0x225,
- DOMAIN_ALIAS_RID_PRINT_OPS = 0x226,
- DOMAIN_ALIAS_RID_BACKUP_OPS = 0x227,
-
- DOMAIN_ALIAS_RID_REPLICATOR = 0x228,
- DOMAIN_ALIAS_RID_RAS_SERVERS = 0x229,
- DOMAIN_ALIAS_RID_PREW2KCOMPACCESS = 0x22a,
-} RELATIVE_IDENTIFIERS;
-
-/*
- * The universal well-known SIDs:
- *
- * NULL_SID S-1-0-0
- * WORLD_SID S-1-1-0
- * LOCAL_SID S-1-2-0
- * CREATOR_OWNER_SID S-1-3-0
- * CREATOR_GROUP_SID S-1-3-1
- * CREATOR_OWNER_SERVER_SID S-1-3-2
- * CREATOR_GROUP_SERVER_SID S-1-3-3
- *
- * (Non-unique IDs) S-1-4
- *
- * NT well-known SIDs:
- *
- * NT_AUTHORITY_SID S-1-5
- * DIALUP_SID S-1-5-1
- *
- * NETWORD_SID S-1-5-2
- * BATCH_SID S-1-5-3
- * INTERACTIVE_SID S-1-5-4
- * SERVICE_SID S-1-5-6
- * ANONYMOUS_LOGON_SID S-1-5-7 (aka null logon session)
- * PROXY_SID S-1-5-8
- * SERVER_LOGON_SID S-1-5-9 (aka domain controller account)
- * SELF_SID S-1-5-10 (self RID)
- * AUTHENTICATED_USER_SID S-1-5-11
- * RESTRICTED_CODE_SID S-1-5-12 (running restricted code)
- * TERMINAL_SERVER_SID S-1-5-13 (running on terminal server)
- *
- * (Logon IDs) S-1-5-5-X-Y
- *
- * (NT non-unique IDs) S-1-5-0x15-...
- *
- * (Built-in domain) S-1-5-0x20
- */
-
-/*
- * The SID_IDENTIFIER_AUTHORITY is a 48-bit value used in the SID structure.
- *
- * NOTE: This is stored as a big endian number, hence the high_part comes
- * before the low_part.
- */
-typedef union {
- struct {
- u16 high_part; /* High 16-bits. */
- u32 low_part; /* Low 32-bits. */
- } __attribute__ ((__packed__)) parts;
- u8 value[6]; /* Value as individual bytes. */
-} __attribute__ ((__packed__)) SID_IDENTIFIER_AUTHORITY;
-
-/*
- * The SID structure is a variable-length structure used to uniquely identify
- * users or groups. SID stands for security identifier.
- *
- * The standard textual representation of the SID is of the form:
- * S-R-I-S-S...
- * Where:
- * - The first "S" is the literal character 'S' identifying the following
- * digits as a SID.
- * - R is the revision level of the SID expressed as a sequence of digits
- * either in decimal or hexadecimal (if the later, prefixed by "0x").
- * - I is the 48-bit identifier_authority, expressed as digits as R above.
- * - S... is one or more sub_authority values, expressed as digits as above.
- *
- * Example SID; the domain-relative SID of the local Administrators group on
- * Windows NT/2k:
- * S-1-5-32-544
- * This translates to a SID with:
- * revision = 1,
- * sub_authority_count = 2,
- * identifier_authority = {0,0,0,0,0,5}, // SECURITY_NT_AUTHORITY
- * sub_authority[0] = 32, // SECURITY_BUILTIN_DOMAIN_RID
- * sub_authority[1] = 544 // DOMAIN_ALIAS_RID_ADMINS
- */
-typedef struct {
- u8 revision;
- u8 sub_authority_count;
- SID_IDENTIFIER_AUTHORITY identifier_authority;
- le32 sub_authority[1]; /* At least one sub_authority. */
-} __attribute__ ((__packed__)) SID;
-
-/*
- * Current constants for SIDs.
- */
-typedef enum {
- SID_REVISION = 1, /* Current revision level. */
- SID_MAX_SUB_AUTHORITIES = 15, /* Maximum number of those. */
- SID_RECOMMENDED_SUB_AUTHORITIES = 1, /* Will change to around 6 in
- a future revision. */
-} SID_CONSTANTS;
-
-/*
- * The predefined ACE types (8-bit, see below).
- */
-enum {
- ACCESS_MIN_MS_ACE_TYPE = 0,
- ACCESS_ALLOWED_ACE_TYPE = 0,
- ACCESS_DENIED_ACE_TYPE = 1,
- SYSTEM_AUDIT_ACE_TYPE = 2,
- SYSTEM_ALARM_ACE_TYPE = 3, /* Not implemented as of Win2k. */
- ACCESS_MAX_MS_V2_ACE_TYPE = 3,
-
- ACCESS_ALLOWED_COMPOUND_ACE_TYPE= 4,
- ACCESS_MAX_MS_V3_ACE_TYPE = 4,
-
- /* The following are Win2k only. */
- ACCESS_MIN_MS_OBJECT_ACE_TYPE = 5,
- ACCESS_ALLOWED_OBJECT_ACE_TYPE = 5,
- ACCESS_DENIED_OBJECT_ACE_TYPE = 6,
- SYSTEM_AUDIT_OBJECT_ACE_TYPE = 7,
- SYSTEM_ALARM_OBJECT_ACE_TYPE = 8,
- ACCESS_MAX_MS_OBJECT_ACE_TYPE = 8,
-
- ACCESS_MAX_MS_V4_ACE_TYPE = 8,
-
- /* This one is for WinNT/2k. */
- ACCESS_MAX_MS_ACE_TYPE = 8,
-} __attribute__ ((__packed__));
-
-typedef u8 ACE_TYPES;
-
-/*
- * The ACE flags (8-bit) for audit and inheritance (see below).
- *
- * SUCCESSFUL_ACCESS_ACE_FLAG is only used with system audit and alarm ACE
- * types to indicate that a message is generated (in Windows!) for successful
- * accesses.
- *
- * FAILED_ACCESS_ACE_FLAG is only used with system audit and alarm ACE types
- * to indicate that a message is generated (in Windows!) for failed accesses.
- */
-enum {
- /* The inheritance flags. */
- OBJECT_INHERIT_ACE = 0x01,
- CONTAINER_INHERIT_ACE = 0x02,
- NO_PROPAGATE_INHERIT_ACE = 0x04,
- INHERIT_ONLY_ACE = 0x08,
- INHERITED_ACE = 0x10, /* Win2k only. */
- VALID_INHERIT_FLAGS = 0x1f,
-
- /* The audit flags. */
- SUCCESSFUL_ACCESS_ACE_FLAG = 0x40,
- FAILED_ACCESS_ACE_FLAG = 0x80,
-} __attribute__ ((__packed__));
-
-typedef u8 ACE_FLAGS;
-
-/*
- * An ACE is an access-control entry in an access-control list (ACL).
- * An ACE defines access to an object for a specific user or group or defines
- * the types of access that generate system-administration messages or alarms
- * for a specific user or group. The user or group is identified by a security
- * identifier (SID).
- *
- * Each ACE starts with an ACE_HEADER structure (aligned on 4-byte boundary),
- * which specifies the type and size of the ACE. The format of the subsequent
- * data depends on the ACE type.
- */
-typedef struct {
-/*Ofs*/
-/* 0*/ ACE_TYPES type; /* Type of the ACE. */
-/* 1*/ ACE_FLAGS flags; /* Flags describing the ACE. */
-/* 2*/ le16 size; /* Size in bytes of the ACE. */
-} __attribute__ ((__packed__)) ACE_HEADER;
-
-/*
- * The access mask (32-bit). Defines the access rights.
- *
- * The specific rights (bits 0 to 15). These depend on the type of the object
- * being secured by the ACE.
- */
-enum {
- /* Specific rights for files and directories are as follows: */
-
- /* Right to read data from the file. (FILE) */
- FILE_READ_DATA = cpu_to_le32(0x00000001),
- /* Right to list contents of a directory. (DIRECTORY) */
- FILE_LIST_DIRECTORY = cpu_to_le32(0x00000001),
-
- /* Right to write data to the file. (FILE) */
- FILE_WRITE_DATA = cpu_to_le32(0x00000002),
- /* Right to create a file in the directory. (DIRECTORY) */
- FILE_ADD_FILE = cpu_to_le32(0x00000002),
-
- /* Right to append data to the file. (FILE) */
- FILE_APPEND_DATA = cpu_to_le32(0x00000004),
- /* Right to create a subdirectory. (DIRECTORY) */
- FILE_ADD_SUBDIRECTORY = cpu_to_le32(0x00000004),
-
- /* Right to read extended attributes. (FILE/DIRECTORY) */
- FILE_READ_EA = cpu_to_le32(0x00000008),
-
- /* Right to write extended attributes. (FILE/DIRECTORY) */
- FILE_WRITE_EA = cpu_to_le32(0x00000010),
-
- /* Right to execute a file. (FILE) */
- FILE_EXECUTE = cpu_to_le32(0x00000020),
- /* Right to traverse the directory. (DIRECTORY) */
- FILE_TRAVERSE = cpu_to_le32(0x00000020),
-
- /*
- * Right to delete a directory and all the files it contains (its
- * children), even if the files are read-only. (DIRECTORY)
- */
- FILE_DELETE_CHILD = cpu_to_le32(0x00000040),
-
- /* Right to read file attributes. (FILE/DIRECTORY) */
- FILE_READ_ATTRIBUTES = cpu_to_le32(0x00000080),
-
- /* Right to change file attributes. (FILE/DIRECTORY) */
- FILE_WRITE_ATTRIBUTES = cpu_to_le32(0x00000100),
-
- /*
- * The standard rights (bits 16 to 23). These are independent of the
- * type of object being secured.
- */
-
- /* Right to delete the object. */
- DELETE = cpu_to_le32(0x00010000),
-
- /*
- * Right to read the information in the object's security descriptor,
- * not including the information in the SACL, i.e. right to read the
- * security descriptor and owner.
- */
- READ_CONTROL = cpu_to_le32(0x00020000),
-
- /* Right to modify the DACL in the object's security descriptor. */
- WRITE_DAC = cpu_to_le32(0x00040000),
-
- /* Right to change the owner in the object's security descriptor. */
- WRITE_OWNER = cpu_to_le32(0x00080000),
-
- /*
- * Right to use the object for synchronization. Enables a process to
- * wait until the object is in the signalled state. Some object types
- * do not support this access right.
- */
- SYNCHRONIZE = cpu_to_le32(0x00100000),
-
- /*
- * The following STANDARD_RIGHTS_* are combinations of the above for
- * convenience and are defined by the Win32 API.
- */
-
- /* These are currently defined to READ_CONTROL. */
- STANDARD_RIGHTS_READ = cpu_to_le32(0x00020000),
- STANDARD_RIGHTS_WRITE = cpu_to_le32(0x00020000),
- STANDARD_RIGHTS_EXECUTE = cpu_to_le32(0x00020000),
-
- /* Combines DELETE, READ_CONTROL, WRITE_DAC, and WRITE_OWNER access. */
- STANDARD_RIGHTS_REQUIRED = cpu_to_le32(0x000f0000),
-
- /*
- * Combines DELETE, READ_CONTROL, WRITE_DAC, WRITE_OWNER, and
- * SYNCHRONIZE access.
- */
- STANDARD_RIGHTS_ALL = cpu_to_le32(0x001f0000),
-
- /*
- * The access system ACL and maximum allowed access types (bits 24 to
- * 25, bits 26 to 27 are reserved).
- */
- ACCESS_SYSTEM_SECURITY = cpu_to_le32(0x01000000),
- MAXIMUM_ALLOWED = cpu_to_le32(0x02000000),
-
- /*
- * The generic rights (bits 28 to 31). These map onto the standard and
- * specific rights.
- */
-
- /* Read, write, and execute access. */
- GENERIC_ALL = cpu_to_le32(0x10000000),
-
- /* Execute access. */
- GENERIC_EXECUTE = cpu_to_le32(0x20000000),
-
- /*
- * Write access. For files, this maps onto:
- * FILE_APPEND_DATA | FILE_WRITE_ATTRIBUTES | FILE_WRITE_DATA |
- * FILE_WRITE_EA | STANDARD_RIGHTS_WRITE | SYNCHRONIZE
- * For directories, the mapping has the same numerical value. See
- * above for the descriptions of the rights granted.
- */
- GENERIC_WRITE = cpu_to_le32(0x40000000),
-
- /*
- * Read access. For files, this maps onto:
- * FILE_READ_ATTRIBUTES | FILE_READ_DATA | FILE_READ_EA |
- * STANDARD_RIGHTS_READ | SYNCHRONIZE
- * For directories, the mapping has the same numberical value. See
- * above for the descriptions of the rights granted.
- */
- GENERIC_READ = cpu_to_le32(0x80000000),
-};
-
-typedef le32 ACCESS_MASK;
-
-/*
- * The generic mapping array. Used to denote the mapping of each generic
- * access right to a specific access mask.
- *
- * FIXME: What exactly is this and what is it for? (AIA)
- */
-typedef struct {
- ACCESS_MASK generic_read;
- ACCESS_MASK generic_write;
- ACCESS_MASK generic_execute;
- ACCESS_MASK generic_all;
-} __attribute__ ((__packed__)) GENERIC_MAPPING;
-
-/*
- * The predefined ACE type structures are as defined below.
- */
-
-/*
- * ACCESS_ALLOWED_ACE, ACCESS_DENIED_ACE, SYSTEM_AUDIT_ACE, SYSTEM_ALARM_ACE
- */
-typedef struct {
-/* 0 ACE_HEADER; -- Unfolded here as gcc doesn't like unnamed structs. */
- ACE_TYPES type; /* Type of the ACE. */
- ACE_FLAGS flags; /* Flags describing the ACE. */
- le16 size; /* Size in bytes of the ACE. */
-/* 4*/ ACCESS_MASK mask; /* Access mask associated with the ACE. */
-
-/* 8*/ SID sid; /* The SID associated with the ACE. */
-} __attribute__ ((__packed__)) ACCESS_ALLOWED_ACE, ACCESS_DENIED_ACE,
- SYSTEM_AUDIT_ACE, SYSTEM_ALARM_ACE;
-
-/*
- * The object ACE flags (32-bit).
- */
-enum {
- ACE_OBJECT_TYPE_PRESENT = cpu_to_le32(1),
- ACE_INHERITED_OBJECT_TYPE_PRESENT = cpu_to_le32(2),
-};
-
-typedef le32 OBJECT_ACE_FLAGS;
-
-typedef struct {
-/* 0 ACE_HEADER; -- Unfolded here as gcc doesn't like unnamed structs. */
- ACE_TYPES type; /* Type of the ACE. */
- ACE_FLAGS flags; /* Flags describing the ACE. */
- le16 size; /* Size in bytes of the ACE. */
-/* 4*/ ACCESS_MASK mask; /* Access mask associated with the ACE. */
-
-/* 8*/ OBJECT_ACE_FLAGS object_flags; /* Flags describing the object ACE. */
-/* 12*/ GUID object_type;
-/* 28*/ GUID inherited_object_type;
-
-/* 44*/ SID sid; /* The SID associated with the ACE. */
-} __attribute__ ((__packed__)) ACCESS_ALLOWED_OBJECT_ACE,
- ACCESS_DENIED_OBJECT_ACE,
- SYSTEM_AUDIT_OBJECT_ACE,
- SYSTEM_ALARM_OBJECT_ACE;
-
-/*
- * An ACL is an access-control list (ACL).
- * An ACL starts with an ACL header structure, which specifies the size of
- * the ACL and the number of ACEs it contains. The ACL header is followed by
- * zero or more access control entries (ACEs). The ACL as well as each ACE
- * are aligned on 4-byte boundaries.
- */
-typedef struct {
- u8 revision; /* Revision of this ACL. */
- u8 alignment1;
- le16 size; /* Allocated space in bytes for ACL. Includes this
- header, the ACEs and the remaining free space. */
- le16 ace_count; /* Number of ACEs in the ACL. */
- le16 alignment2;
-/* sizeof() = 8 bytes */
-} __attribute__ ((__packed__)) ACL;
-
-/*
- * Current constants for ACLs.
- */
-typedef enum {
- /* Current revision. */
- ACL_REVISION = 2,
- ACL_REVISION_DS = 4,
-
- /* History of revisions. */
- ACL_REVISION1 = 1,
- MIN_ACL_REVISION = 2,
- ACL_REVISION2 = 2,
- ACL_REVISION3 = 3,
- ACL_REVISION4 = 4,
- MAX_ACL_REVISION = 4,
-} ACL_CONSTANTS;
-
-/*
- * The security descriptor control flags (16-bit).
- *
- * SE_OWNER_DEFAULTED - This boolean flag, when set, indicates that the SID
- * pointed to by the Owner field was provided by a defaulting mechanism
- * rather than explicitly provided by the original provider of the
- * security descriptor. This may affect the treatment of the SID with
- * respect to inheritance of an owner.
- *
- * SE_GROUP_DEFAULTED - This boolean flag, when set, indicates that the SID in
- * the Group field was provided by a defaulting mechanism rather than
- * explicitly provided by the original provider of the security
- * descriptor. This may affect the treatment of the SID with respect to
- * inheritance of a primary group.
- *
- * SE_DACL_PRESENT - This boolean flag, when set, indicates that the security
- * descriptor contains a discretionary ACL. If this flag is set and the
- * Dacl field of the SECURITY_DESCRIPTOR is null, then a null ACL is
- * explicitly being specified.
- *
- * SE_DACL_DEFAULTED - This boolean flag, when set, indicates that the ACL
- * pointed to by the Dacl field was provided by a defaulting mechanism
- * rather than explicitly provided by the original provider of the
- * security descriptor. This may affect the treatment of the ACL with
- * respect to inheritance of an ACL. This flag is ignored if the
- * DaclPresent flag is not set.
- *
- * SE_SACL_PRESENT - This boolean flag, when set, indicates that the security
- * descriptor contains a system ACL pointed to by the Sacl field. If this
- * flag is set and the Sacl field of the SECURITY_DESCRIPTOR is null, then
- * an empty (but present) ACL is being specified.
- *
- * SE_SACL_DEFAULTED - This boolean flag, when set, indicates that the ACL
- * pointed to by the Sacl field was provided by a defaulting mechanism
- * rather than explicitly provided by the original provider of the
- * security descriptor. This may affect the treatment of the ACL with
- * respect to inheritance of an ACL. This flag is ignored if the
- * SaclPresent flag is not set.
- *
- * SE_SELF_RELATIVE - This boolean flag, when set, indicates that the security
- * descriptor is in self-relative form. In this form, all fields of the
- * security descriptor are contiguous in memory and all pointer fields are
- * expressed as offsets from the beginning of the security descriptor.
- */
-enum {
- SE_OWNER_DEFAULTED = cpu_to_le16(0x0001),
- SE_GROUP_DEFAULTED = cpu_to_le16(0x0002),
- SE_DACL_PRESENT = cpu_to_le16(0x0004),
- SE_DACL_DEFAULTED = cpu_to_le16(0x0008),
-
- SE_SACL_PRESENT = cpu_to_le16(0x0010),
- SE_SACL_DEFAULTED = cpu_to_le16(0x0020),
-
- SE_DACL_AUTO_INHERIT_REQ = cpu_to_le16(0x0100),
- SE_SACL_AUTO_INHERIT_REQ = cpu_to_le16(0x0200),
- SE_DACL_AUTO_INHERITED = cpu_to_le16(0x0400),
- SE_SACL_AUTO_INHERITED = cpu_to_le16(0x0800),
-
- SE_DACL_PROTECTED = cpu_to_le16(0x1000),
- SE_SACL_PROTECTED = cpu_to_le16(0x2000),
- SE_RM_CONTROL_VALID = cpu_to_le16(0x4000),
- SE_SELF_RELATIVE = cpu_to_le16(0x8000)
-} __attribute__ ((__packed__));
-
-typedef le16 SECURITY_DESCRIPTOR_CONTROL;
-
-/*
- * Self-relative security descriptor. Contains the owner and group SIDs as well
- * as the sacl and dacl ACLs inside the security descriptor itself.
- */
-typedef struct {
- u8 revision; /* Revision level of the security descriptor. */
- u8 alignment;
- SECURITY_DESCRIPTOR_CONTROL control; /* Flags qualifying the type of
- the descriptor as well as the following fields. */
- le32 owner; /* Byte offset to a SID representing an object's
- owner. If this is NULL, no owner SID is present in
- the descriptor. */
- le32 group; /* Byte offset to a SID representing an object's
- primary group. If this is NULL, no primary group
- SID is present in the descriptor. */
- le32 sacl; /* Byte offset to a system ACL. Only valid, if
- SE_SACL_PRESENT is set in the control field. If
- SE_SACL_PRESENT is set but sacl is NULL, a NULL ACL
- is specified. */
- le32 dacl; /* Byte offset to a discretionary ACL. Only valid, if
- SE_DACL_PRESENT is set in the control field. If
- SE_DACL_PRESENT is set but dacl is NULL, a NULL ACL
- (unconditionally granting access) is specified. */
-/* sizeof() = 0x14 bytes */
-} __attribute__ ((__packed__)) SECURITY_DESCRIPTOR_RELATIVE;
-
-/*
- * Absolute security descriptor. Does not contain the owner and group SIDs, nor
- * the sacl and dacl ACLs inside the security descriptor. Instead, it contains
- * pointers to these structures in memory. Obviously, absolute security
- * descriptors are only useful for in memory representations of security
- * descriptors. On disk, a self-relative security descriptor is used.
- */
-typedef struct {
- u8 revision; /* Revision level of the security descriptor. */
- u8 alignment;
- SECURITY_DESCRIPTOR_CONTROL control; /* Flags qualifying the type of
- the descriptor as well as the following fields. */
- SID *owner; /* Points to a SID representing an object's owner. If
- this is NULL, no owner SID is present in the
- descriptor. */
- SID *group; /* Points to a SID representing an object's primary
- group. If this is NULL, no primary group SID is
- present in the descriptor. */
- ACL *sacl; /* Points to a system ACL. Only valid, if
- SE_SACL_PRESENT is set in the control field. If
- SE_SACL_PRESENT is set but sacl is NULL, a NULL ACL
- is specified. */
- ACL *dacl; /* Points to a discretionary ACL. Only valid, if
- SE_DACL_PRESENT is set in the control field. If
- SE_DACL_PRESENT is set but dacl is NULL, a NULL ACL
- (unconditionally granting access) is specified. */
-} __attribute__ ((__packed__)) SECURITY_DESCRIPTOR;
-
-/*
- * Current constants for security descriptors.
- */
-typedef enum {
- /* Current revision. */
- SECURITY_DESCRIPTOR_REVISION = 1,
- SECURITY_DESCRIPTOR_REVISION1 = 1,
-
- /* The sizes of both the absolute and relative security descriptors is
- the same as pointers, at least on ia32 architecture are 32-bit. */
- SECURITY_DESCRIPTOR_MIN_LENGTH = sizeof(SECURITY_DESCRIPTOR),
-} SECURITY_DESCRIPTOR_CONSTANTS;
-
-/*
- * Attribute: Security descriptor (0x50). A standard self-relative security
- * descriptor.
- *
- * NOTE: Can be resident or non-resident.
- * NOTE: Not used in NTFS 3.0+, as security descriptors are stored centrally
- * in FILE_Secure and the correct descriptor is found using the security_id
- * from the standard information attribute.
- */
-typedef SECURITY_DESCRIPTOR_RELATIVE SECURITY_DESCRIPTOR_ATTR;
-
-/*
- * On NTFS 3.0+, all security descriptors are stored in FILE_Secure. Only one
- * referenced instance of each unique security descriptor is stored.
- *
- * FILE_Secure contains no unnamed data attribute, i.e. it has zero length. It
- * does, however, contain two indexes ($SDH and $SII) as well as a named data
- * stream ($SDS).
- *
- * Every unique security descriptor is assigned a unique security identifier
- * (security_id, not to be confused with a SID). The security_id is unique for
- * the NTFS volume and is used as an index into the $SII index, which maps
- * security_ids to the security descriptor's storage location within the $SDS
- * data attribute. The $SII index is sorted by ascending security_id.
- *
- * A simple hash is computed from each security descriptor. This hash is used
- * as an index into the $SDH index, which maps security descriptor hashes to
- * the security descriptor's storage location within the $SDS data attribute.
- * The $SDH index is sorted by security descriptor hash and is stored in a B+
- * tree. When searching $SDH (with the intent of determining whether or not a
- * new security descriptor is already present in the $SDS data stream), if a
- * matching hash is found, but the security descriptors do not match, the
- * search in the $SDH index is continued, searching for a next matching hash.
- *
- * When a precise match is found, the security_id coresponding to the security
- * descriptor in the $SDS attribute is read from the found $SDH index entry and
- * is stored in the $STANDARD_INFORMATION attribute of the file/directory to
- * which the security descriptor is being applied. The $STANDARD_INFORMATION
- * attribute is present in all base mft records (i.e. in all files and
- * directories).
- *
- * If a match is not found, the security descriptor is assigned a new unique
- * security_id and is added to the $SDS data attribute. Then, entries
- * referencing the this security descriptor in the $SDS data attribute are
- * added to the $SDH and $SII indexes.
- *
- * Note: Entries are never deleted from FILE_Secure, even if nothing
- * references an entry any more.
- */
-
-/*
- * This header precedes each security descriptor in the $SDS data stream.
- * This is also the index entry data part of both the $SII and $SDH indexes.
- */
-typedef struct {
- le32 hash; /* Hash of the security descriptor. */
- le32 security_id; /* The security_id assigned to the descriptor. */
- le64 offset; /* Byte offset of this entry in the $SDS stream. */
- le32 length; /* Size in bytes of this entry in $SDS stream. */
-} __attribute__ ((__packed__)) SECURITY_DESCRIPTOR_HEADER;
-
-/*
- * The $SDS data stream contains the security descriptors, aligned on 16-byte
- * boundaries, sorted by security_id in a B+ tree. Security descriptors cannot
- * cross 256kib boundaries (this restriction is imposed by the Windows cache
- * manager). Each security descriptor is contained in a SDS_ENTRY structure.
- * Also, each security descriptor is stored twice in the $SDS stream with a
- * fixed offset of 0x40000 bytes (256kib, the Windows cache manager's max size)
- * between them; i.e. if a SDS_ENTRY specifies an offset of 0x51d0, then the
- * first copy of the security descriptor will be at offset 0x51d0 in the
- * $SDS data stream and the second copy will be at offset 0x451d0.
- */
-typedef struct {
-/*Ofs*/
-/* 0 SECURITY_DESCRIPTOR_HEADER; -- Unfolded here as gcc doesn't like
- unnamed structs. */
- le32 hash; /* Hash of the security descriptor. */
- le32 security_id; /* The security_id assigned to the descriptor. */
- le64 offset; /* Byte offset of this entry in the $SDS stream. */
- le32 length; /* Size in bytes of this entry in $SDS stream. */
-/* 20*/ SECURITY_DESCRIPTOR_RELATIVE sid; /* The self-relative security
- descriptor. */
-} __attribute__ ((__packed__)) SDS_ENTRY;
-
-/*
- * The index entry key used in the $SII index. The collation type is
- * COLLATION_NTOFS_ULONG.
- */
-typedef struct {
- le32 security_id; /* The security_id assigned to the descriptor. */
-} __attribute__ ((__packed__)) SII_INDEX_KEY;
-
-/*
- * The index entry key used in the $SDH index. The keys are sorted first by
- * hash and then by security_id. The collation rule is
- * COLLATION_NTOFS_SECURITY_HASH.
- */
-typedef struct {
- le32 hash; /* Hash of the security descriptor. */
- le32 security_id; /* The security_id assigned to the descriptor. */
-} __attribute__ ((__packed__)) SDH_INDEX_KEY;
-
-/*
- * Attribute: Volume name (0x60).
- *
- * NOTE: Always resident.
- * NOTE: Present only in FILE_Volume.
- */
-typedef struct {
- ntfschar name[0]; /* The name of the volume in Unicode. */
-} __attribute__ ((__packed__)) VOLUME_NAME;
-
-/*
- * Possible flags for the volume (16-bit).
- */
-enum {
- VOLUME_IS_DIRTY = cpu_to_le16(0x0001),
- VOLUME_RESIZE_LOG_FILE = cpu_to_le16(0x0002),
- VOLUME_UPGRADE_ON_MOUNT = cpu_to_le16(0x0004),
- VOLUME_MOUNTED_ON_NT4 = cpu_to_le16(0x0008),
-
- VOLUME_DELETE_USN_UNDERWAY = cpu_to_le16(0x0010),
- VOLUME_REPAIR_OBJECT_ID = cpu_to_le16(0x0020),
-
- VOLUME_CHKDSK_UNDERWAY = cpu_to_le16(0x4000),
- VOLUME_MODIFIED_BY_CHKDSK = cpu_to_le16(0x8000),
-
- VOLUME_FLAGS_MASK = cpu_to_le16(0xc03f),
-
- /* To make our life easier when checking if we must mount read-only. */
- VOLUME_MUST_MOUNT_RO_MASK = cpu_to_le16(0xc027),
-} __attribute__ ((__packed__));
-
-typedef le16 VOLUME_FLAGS;
-
-/*
- * Attribute: Volume information (0x70).
- *
- * NOTE: Always resident.
- * NOTE: Present only in FILE_Volume.
- * NOTE: Windows 2000 uses NTFS 3.0 while Windows NT4 service pack 6a uses
- * NTFS 1.2. I haven't personally seen other values yet.
- */
-typedef struct {
- le64 reserved; /* Not used (yet?). */
- u8 major_ver; /* Major version of the ntfs format. */
- u8 minor_ver; /* Minor version of the ntfs format. */
- VOLUME_FLAGS flags; /* Bit array of VOLUME_* flags. */
-} __attribute__ ((__packed__)) VOLUME_INFORMATION;
-
-/*
- * Attribute: Data attribute (0x80).
- *
- * NOTE: Can be resident or non-resident.
- *
- * Data contents of a file (i.e. the unnamed stream) or of a named stream.
- */
-typedef struct {
- u8 data[0]; /* The file's data contents. */
-} __attribute__ ((__packed__)) DATA_ATTR;
-
-/*
- * Index header flags (8-bit).
- */
-enum {
- /*
- * When index header is in an index root attribute:
- */
- SMALL_INDEX = 0, /* The index is small enough to fit inside the index
- root attribute and there is no index allocation
- attribute present. */
- LARGE_INDEX = 1, /* The index is too large to fit in the index root
- attribute and/or an index allocation attribute is
- present. */
- /*
- * When index header is in an index block, i.e. is part of index
- * allocation attribute:
- */
- LEAF_NODE = 0, /* This is a leaf node, i.e. there are no more nodes
- branching off it. */
- INDEX_NODE = 1, /* This node indexes other nodes, i.e. it is not a leaf
- node. */
- NODE_MASK = 1, /* Mask for accessing the *_NODE bits. */
-} __attribute__ ((__packed__));
-
-typedef u8 INDEX_HEADER_FLAGS;
-
-/*
- * This is the header for indexes, describing the INDEX_ENTRY records, which
- * follow the INDEX_HEADER. Together the index header and the index entries
- * make up a complete index.
- *
- * IMPORTANT NOTE: The offset, length and size structure members are counted
- * relative to the start of the index header structure and not relative to the
- * start of the index root or index allocation structures themselves.
- */
-typedef struct {
- le32 entries_offset; /* Byte offset to first INDEX_ENTRY
- aligned to 8-byte boundary. */
- le32 index_length; /* Data size of the index in bytes,
- i.e. bytes used from allocated
- size, aligned to 8-byte boundary. */
- le32 allocated_size; /* Byte size of this index (block),
- multiple of 8 bytes. */
- /* NOTE: For the index root attribute, the above two numbers are always
- equal, as the attribute is resident and it is resized as needed. In
- the case of the index allocation attribute the attribute is not
- resident and hence the allocated_size is a fixed value and must
- equal the index_block_size specified by the INDEX_ROOT attribute
- corresponding to the INDEX_ALLOCATION attribute this INDEX_BLOCK
- belongs to. */
- INDEX_HEADER_FLAGS flags; /* Bit field of INDEX_HEADER_FLAGS. */
- u8 reserved[3]; /* Reserved/align to 8-byte boundary. */
-} __attribute__ ((__packed__)) INDEX_HEADER;
-
-/*
- * Attribute: Index root (0x90).
- *
- * NOTE: Always resident.
- *
- * This is followed by a sequence of index entries (INDEX_ENTRY structures)
- * as described by the index header.
- *
- * When a directory is small enough to fit inside the index root then this
- * is the only attribute describing the directory. When the directory is too
- * large to fit in the index root, on the other hand, two additional attributes
- * are present: an index allocation attribute, containing sub-nodes of the B+
- * directory tree (see below), and a bitmap attribute, describing which virtual
- * cluster numbers (vcns) in the index allocation attribute are in use by an
- * index block.
- *
- * NOTE: The root directory (FILE_root) contains an entry for itself. Other
- * directories do not contain entries for themselves, though.
- */
-typedef struct {
- ATTR_TYPE type; /* Type of the indexed attribute. Is
- $FILE_NAME for directories, zero
- for view indexes. No other values
- allowed. */
- COLLATION_RULE collation_rule; /* Collation rule used to sort the
- index entries. If type is $FILE_NAME,
- this must be COLLATION_FILE_NAME. */
- le32 index_block_size; /* Size of each index block in bytes (in
- the index allocation attribute). */
- u8 clusters_per_index_block; /* Cluster size of each index block (in
- the index allocation attribute), when
- an index block is >= than a cluster,
- otherwise this will be the log of
- the size (like how the encoding of
- the mft record size and the index
- record size found in the boot sector
- work). Has to be a power of 2. */
- u8 reserved[3]; /* Reserved/align to 8-byte boundary. */
- INDEX_HEADER index; /* Index header describing the
- following index entries. */
-} __attribute__ ((__packed__)) INDEX_ROOT;
-
-/*
- * Attribute: Index allocation (0xa0).
- *
- * NOTE: Always non-resident (doesn't make sense to be resident anyway!).
- *
- * This is an array of index blocks. Each index block starts with an
- * INDEX_BLOCK structure containing an index header, followed by a sequence of
- * index entries (INDEX_ENTRY structures), as described by the INDEX_HEADER.
- */
-typedef struct {
-/* 0 NTFS_RECORD; -- Unfolded here as gcc doesn't like unnamed structs. */
- NTFS_RECORD_TYPE magic; /* Magic is "INDX". */
- le16 usa_ofs; /* See NTFS_RECORD definition. */
- le16 usa_count; /* See NTFS_RECORD definition. */
-
-/* 8*/ sle64 lsn; /* $LogFile sequence number of the last
- modification of this index block. */
-/* 16*/ leVCN index_block_vcn; /* Virtual cluster number of the index block.
- If the cluster_size on the volume is <= the
- index_block_size of the directory,
- index_block_vcn counts in units of clusters,
- and in units of sectors otherwise. */
-/* 24*/ INDEX_HEADER index; /* Describes the following index entries. */
-/* sizeof()= 40 (0x28) bytes */
-/*
- * When creating the index block, we place the update sequence array at this
- * offset, i.e. before we start with the index entries. This also makes sense,
- * otherwise we could run into problems with the update sequence array
- * containing in itself the last two bytes of a sector which would mean that
- * multi sector transfer protection wouldn't work. As you can't protect data
- * by overwriting it since you then can't get it back...
- * When reading use the data from the ntfs record header.
- */
-} __attribute__ ((__packed__)) INDEX_BLOCK;
-
-typedef INDEX_BLOCK INDEX_ALLOCATION;
-
-/*
- * The system file FILE_Extend/$Reparse contains an index named $R listing
- * all reparse points on the volume. The index entry keys are as defined
- * below. Note, that there is no index data associated with the index entries.
- *
- * The index entries are sorted by the index key file_id. The collation rule is
- * COLLATION_NTOFS_ULONGS. FIXME: Verify whether the reparse_tag is not the
- * primary key / is not a key at all. (AIA)
- */
-typedef struct {
- le32 reparse_tag; /* Reparse point type (inc. flags). */
- leMFT_REF file_id; /* Mft record of the file containing the
- reparse point attribute. */
-} __attribute__ ((__packed__)) REPARSE_INDEX_KEY;
-
-/*
- * Quota flags (32-bit).
- *
- * The user quota flags. Names explain meaning.
- */
-enum {
- QUOTA_FLAG_DEFAULT_LIMITS = cpu_to_le32(0x00000001),
- QUOTA_FLAG_LIMIT_REACHED = cpu_to_le32(0x00000002),
- QUOTA_FLAG_ID_DELETED = cpu_to_le32(0x00000004),
-
- QUOTA_FLAG_USER_MASK = cpu_to_le32(0x00000007),
- /* This is a bit mask for the user quota flags. */
-
- /*
- * These flags are only present in the quota defaults index entry, i.e.
- * in the entry where owner_id = QUOTA_DEFAULTS_ID.
- */
- QUOTA_FLAG_TRACKING_ENABLED = cpu_to_le32(0x00000010),
- QUOTA_FLAG_ENFORCEMENT_ENABLED = cpu_to_le32(0x00000020),
- QUOTA_FLAG_TRACKING_REQUESTED = cpu_to_le32(0x00000040),
- QUOTA_FLAG_LOG_THRESHOLD = cpu_to_le32(0x00000080),
-
- QUOTA_FLAG_LOG_LIMIT = cpu_to_le32(0x00000100),
- QUOTA_FLAG_OUT_OF_DATE = cpu_to_le32(0x00000200),
- QUOTA_FLAG_CORRUPT = cpu_to_le32(0x00000400),
- QUOTA_FLAG_PENDING_DELETES = cpu_to_le32(0x00000800),
-};
-
-typedef le32 QUOTA_FLAGS;
-
-/*
- * The system file FILE_Extend/$Quota contains two indexes $O and $Q. Quotas
- * are on a per volume and per user basis.
- *
- * The $Q index contains one entry for each existing user_id on the volume. The
- * index key is the user_id of the user/group owning this quota control entry,
- * i.e. the key is the owner_id. The user_id of the owner of a file, i.e. the
- * owner_id, is found in the standard information attribute. The collation rule
- * for $Q is COLLATION_NTOFS_ULONG.
- *
- * The $O index contains one entry for each user/group who has been assigned
- * a quota on that volume. The index key holds the SID of the user_id the
- * entry belongs to, i.e. the owner_id. The collation rule for $O is
- * COLLATION_NTOFS_SID.
- *
- * The $O index entry data is the user_id of the user corresponding to the SID.
- * This user_id is used as an index into $Q to find the quota control entry
- * associated with the SID.
- *
- * The $Q index entry data is the quota control entry and is defined below.
- */
-typedef struct {
- le32 version; /* Currently equals 2. */
- QUOTA_FLAGS flags; /* Flags describing this quota entry. */
- le64 bytes_used; /* How many bytes of the quota are in use. */
- sle64 change_time; /* Last time this quota entry was changed. */
- sle64 threshold; /* Soft quota (-1 if not limited). */
- sle64 limit; /* Hard quota (-1 if not limited). */
- sle64 exceeded_time; /* How long the soft quota has been exceeded. */
- SID sid; /* The SID of the user/object associated with
- this quota entry. Equals zero for the quota
- defaults entry (and in fact on a WinXP
- volume, it is not present at all). */
-} __attribute__ ((__packed__)) QUOTA_CONTROL_ENTRY;
-
-/*
- * Predefined owner_id values (32-bit).
- */
-enum {
- QUOTA_INVALID_ID = cpu_to_le32(0x00000000),
- QUOTA_DEFAULTS_ID = cpu_to_le32(0x00000001),
- QUOTA_FIRST_USER_ID = cpu_to_le32(0x00000100),
-};
-
-/*
- * Current constants for quota control entries.
- */
-typedef enum {
- /* Current version. */
- QUOTA_VERSION = 2,
-} QUOTA_CONTROL_ENTRY_CONSTANTS;
-
-/*
- * Index entry flags (16-bit).
- */
-enum {
- INDEX_ENTRY_NODE = cpu_to_le16(1), /* This entry contains a
- sub-node, i.e. a reference to an index block in form of
- a virtual cluster number (see below). */
- INDEX_ENTRY_END = cpu_to_le16(2), /* This signifies the last
- entry in an index block. The index entry does not
- represent a file but it can point to a sub-node. */
-
- INDEX_ENTRY_SPACE_FILLER = cpu_to_le16(0xffff), /* gcc: Force
- enum bit width to 16-bit. */
-} __attribute__ ((__packed__));
-
-typedef le16 INDEX_ENTRY_FLAGS;
-
-/*
- * This the index entry header (see below).
- */
-typedef struct {
-/* 0*/ union {
- struct { /* Only valid when INDEX_ENTRY_END is not set. */
- leMFT_REF indexed_file; /* The mft reference of the file
- described by this index
- entry. Used for directory
- indexes. */
- } __attribute__ ((__packed__)) dir;
- struct { /* Used for views/indexes to find the entry's data. */
- le16 data_offset; /* Data byte offset from this
- INDEX_ENTRY. Follows the
- index key. */
- le16 data_length; /* Data length in bytes. */
- le32 reservedV; /* Reserved (zero). */
- } __attribute__ ((__packed__)) vi;
- } __attribute__ ((__packed__)) data;
-/* 8*/ le16 length; /* Byte size of this index entry, multiple of
- 8-bytes. */
-/* 10*/ le16 key_length; /* Byte size of the key value, which is in the
- index entry. It follows field reserved. Not
- multiple of 8-bytes. */
-/* 12*/ INDEX_ENTRY_FLAGS flags; /* Bit field of INDEX_ENTRY_* flags. */
-/* 14*/ le16 reserved; /* Reserved/align to 8-byte boundary. */
-/* sizeof() = 16 bytes */
-} __attribute__ ((__packed__)) INDEX_ENTRY_HEADER;
-
-/*
- * This is an index entry. A sequence of such entries follows each INDEX_HEADER
- * structure. Together they make up a complete index. The index follows either
- * an index root attribute or an index allocation attribute.
- *
- * NOTE: Before NTFS 3.0 only filename attributes were indexed.
- */
-typedef struct {
-/*Ofs*/
-/* 0 INDEX_ENTRY_HEADER; -- Unfolded here as gcc dislikes unnamed structs. */
- union {
- struct { /* Only valid when INDEX_ENTRY_END is not set. */
- leMFT_REF indexed_file; /* The mft reference of the file
- described by this index
- entry. Used for directory
- indexes. */
- } __attribute__ ((__packed__)) dir;
- struct { /* Used for views/indexes to find the entry's data. */
- le16 data_offset; /* Data byte offset from this
- INDEX_ENTRY. Follows the
- index key. */
- le16 data_length; /* Data length in bytes. */
- le32 reservedV; /* Reserved (zero). */
- } __attribute__ ((__packed__)) vi;
- } __attribute__ ((__packed__)) data;
- le16 length; /* Byte size of this index entry, multiple of
- 8-bytes. */
- le16 key_length; /* Byte size of the key value, which is in the
- index entry. It follows field reserved. Not
- multiple of 8-bytes. */
- INDEX_ENTRY_FLAGS flags; /* Bit field of INDEX_ENTRY_* flags. */
- le16 reserved; /* Reserved/align to 8-byte boundary. */
-
-/* 16*/ union { /* The key of the indexed attribute. NOTE: Only present
- if INDEX_ENTRY_END bit in flags is not set. NOTE: On
- NTFS versions before 3.0 the only valid key is the
- FILE_NAME_ATTR. On NTFS 3.0+ the following
- additional index keys are defined: */
- FILE_NAME_ATTR file_name;/* $I30 index in directories. */
- SII_INDEX_KEY sii; /* $SII index in $Secure. */
- SDH_INDEX_KEY sdh; /* $SDH index in $Secure. */
- GUID object_id; /* $O index in FILE_Extend/$ObjId: The
- object_id of the mft record found in
- the data part of the index. */
- REPARSE_INDEX_KEY reparse; /* $R index in
- FILE_Extend/$Reparse. */
- SID sid; /* $O index in FILE_Extend/$Quota:
- SID of the owner of the user_id. */
- le32 owner_id; /* $Q index in FILE_Extend/$Quota:
- user_id of the owner of the quota
- control entry in the data part of
- the index. */
- } __attribute__ ((__packed__)) key;
- /* The (optional) index data is inserted here when creating. */
- // leVCN vcn; /* If INDEX_ENTRY_NODE bit in flags is set, the last
- // eight bytes of this index entry contain the virtual
- // cluster number of the index block that holds the
- // entries immediately preceding the current entry (the
- // vcn references the corresponding cluster in the data
- // of the non-resident index allocation attribute). If
- // the key_length is zero, then the vcn immediately
- // follows the INDEX_ENTRY_HEADER. Regardless of
- // key_length, the address of the 8-byte boundary
- // aligned vcn of INDEX_ENTRY{_HEADER} *ie is given by
- // (char*)ie + le16_to_cpu(ie*)->length) - sizeof(VCN),
- // where sizeof(VCN) can be hardcoded as 8 if wanted. */
-} __attribute__ ((__packed__)) INDEX_ENTRY;
-
-/*
- * Attribute: Bitmap (0xb0).
- *
- * Contains an array of bits (aka a bitfield).
- *
- * When used in conjunction with the index allocation attribute, each bit
- * corresponds to one index block within the index allocation attribute. Thus
- * the number of bits in the bitmap * index block size / cluster size is the
- * number of clusters in the index allocation attribute.
- */
-typedef struct {
- u8 bitmap[0]; /* Array of bits. */
-} __attribute__ ((__packed__)) BITMAP_ATTR;
-
-/*
- * The reparse point tag defines the type of the reparse point. It also
- * includes several flags, which further describe the reparse point.
- *
- * The reparse point tag is an unsigned 32-bit value divided in three parts:
- *
- * 1. The least significant 16 bits (i.e. bits 0 to 15) specifiy the type of
- * the reparse point.
- * 2. The 13 bits after this (i.e. bits 16 to 28) are reserved for future use.
- * 3. The most significant three bits are flags describing the reparse point.
- * They are defined as follows:
- * bit 29: Name surrogate bit. If set, the filename is an alias for
- * another object in the system.
- * bit 30: High-latency bit. If set, accessing the first byte of data will
- * be slow. (E.g. the data is stored on a tape drive.)
- * bit 31: Microsoft bit. If set, the tag is owned by Microsoft. User
- * defined tags have to use zero here.
- *
- * These are the predefined reparse point tags:
- */
-enum {
- IO_REPARSE_TAG_IS_ALIAS = cpu_to_le32(0x20000000),
- IO_REPARSE_TAG_IS_HIGH_LATENCY = cpu_to_le32(0x40000000),
- IO_REPARSE_TAG_IS_MICROSOFT = cpu_to_le32(0x80000000),
-
- IO_REPARSE_TAG_RESERVED_ZERO = cpu_to_le32(0x00000000),
- IO_REPARSE_TAG_RESERVED_ONE = cpu_to_le32(0x00000001),
- IO_REPARSE_TAG_RESERVED_RANGE = cpu_to_le32(0x00000001),
-
- IO_REPARSE_TAG_NSS = cpu_to_le32(0x68000005),
- IO_REPARSE_TAG_NSS_RECOVER = cpu_to_le32(0x68000006),
- IO_REPARSE_TAG_SIS = cpu_to_le32(0x68000007),
- IO_REPARSE_TAG_DFS = cpu_to_le32(0x68000008),
-
- IO_REPARSE_TAG_MOUNT_POINT = cpu_to_le32(0x88000003),
-
- IO_REPARSE_TAG_HSM = cpu_to_le32(0xa8000004),
-
- IO_REPARSE_TAG_SYMBOLIC_LINK = cpu_to_le32(0xe8000000),
-
- IO_REPARSE_TAG_VALID_VALUES = cpu_to_le32(0xe000ffff),
-};
-
-/*
- * Attribute: Reparse point (0xc0).
- *
- * NOTE: Can be resident or non-resident.
- */
-typedef struct {
- le32 reparse_tag; /* Reparse point type (inc. flags). */
- le16 reparse_data_length; /* Byte size of reparse data. */
- le16 reserved; /* Align to 8-byte boundary. */
- u8 reparse_data[0]; /* Meaning depends on reparse_tag. */
-} __attribute__ ((__packed__)) REPARSE_POINT;
-
-/*
- * Attribute: Extended attribute (EA) information (0xd0).
- *
- * NOTE: Always resident. (Is this true???)
- */
-typedef struct {
- le16 ea_length; /* Byte size of the packed extended
- attributes. */
- le16 need_ea_count; /* The number of extended attributes which have
- the NEED_EA bit set. */
- le32 ea_query_length; /* Byte size of the buffer required to query
- the extended attributes when calling
- ZwQueryEaFile() in Windows NT/2k. I.e. the
- byte size of the unpacked extended
- attributes. */
-} __attribute__ ((__packed__)) EA_INFORMATION;
-
-/*
- * Extended attribute flags (8-bit).
- */
-enum {
- NEED_EA = 0x80 /* If set the file to which the EA belongs
- cannot be interpreted without understanding
- the associates extended attributes. */
-} __attribute__ ((__packed__));
-
-typedef u8 EA_FLAGS;
-
-/*
- * Attribute: Extended attribute (EA) (0xe0).
- *
- * NOTE: Can be resident or non-resident.
- *
- * Like the attribute list and the index buffer list, the EA attribute value is
- * a sequence of EA_ATTR variable length records.
- */
-typedef struct {
- le32 next_entry_offset; /* Offset to the next EA_ATTR. */
- EA_FLAGS flags; /* Flags describing the EA. */
- u8 ea_name_length; /* Length of the name of the EA in bytes
- excluding the '\0' byte terminator. */
- le16 ea_value_length; /* Byte size of the EA's value. */
- u8 ea_name[0]; /* Name of the EA. Note this is ASCII, not
- Unicode and it is zero terminated. */
- u8 ea_value[0]; /* The value of the EA. Immediately follows
- the name. */
-} __attribute__ ((__packed__)) EA_ATTR;
-
-/*
- * Attribute: Property set (0xf0).
- *
- * Intended to support Native Structure Storage (NSS) - a feature removed from
- * NTFS 3.0 during beta testing.
- */
-typedef struct {
- /* Irrelevant as feature unused. */
-} __attribute__ ((__packed__)) PROPERTY_SET;
-
-/*
- * Attribute: Logged utility stream (0x100).
- *
- * NOTE: Can be resident or non-resident.
- *
- * Operations on this attribute are logged to the journal ($LogFile) like
- * normal metadata changes.
- *
- * Used by the Encrypting File System (EFS). All encrypted files have this
- * attribute with the name $EFS.
- */
-typedef struct {
- /* Can be anything the creator chooses. */
- /* EFS uses it as follows: */
- // FIXME: Type this info, verifying it along the way. (AIA)
-} __attribute__ ((__packed__)) LOGGED_UTILITY_STREAM, EFS_ATTR;
-
-#endif /* _LINUX_NTFS_LAYOUT_H */
diff --git a/fs/ntfs/lcnalloc.c b/fs/ntfs/lcnalloc.c
deleted file mode 100644
index eda9972..0000000
--- a/fs/ntfs/lcnalloc.c
+++ /dev/null
@@ -1,1000 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * lcnalloc.c - Cluster (de)allocation code. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2004-2005 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include <linux/pagemap.h>
-
-#include "lcnalloc.h"
-#include "debug.h"
-#include "bitmap.h"
-#include "inode.h"
-#include "volume.h"
-#include "attrib.h"
-#include "malloc.h"
-#include "aops.h"
-#include "ntfs.h"
-
-/**
- * ntfs_cluster_free_from_rl_nolock - free clusters from runlist
- * @vol: mounted ntfs volume on which to free the clusters
- * @rl: runlist describing the clusters to free
- *
- * Free all the clusters described by the runlist @rl on the volume @vol. In
- * the case of an error being returned, at least some of the clusters were not
- * freed.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: - The volume lcn bitmap must be locked for writing on entry and is
- * left locked on return.
- */
-int ntfs_cluster_free_from_rl_nolock(ntfs_volume *vol,
- const runlist_element *rl)
-{
- struct inode *lcnbmp_vi = vol->lcnbmp_ino;
- int ret = 0;
-
- ntfs_debug("Entering.");
- if (!rl)
- return 0;
- for (; rl->length; rl++) {
- int err;
-
- if (rl->lcn < 0)
- continue;
- err = ntfs_bitmap_clear_run(lcnbmp_vi, rl->lcn, rl->length);
- if (unlikely(err && (!ret || ret == -ENOMEM) && ret != err))
- ret = err;
- }
- ntfs_debug("Done.");
- return ret;
-}
-
-/**
- * ntfs_cluster_alloc - allocate clusters on an ntfs volume
- * @vol: mounted ntfs volume on which to allocate the clusters
- * @start_vcn: vcn to use for the first allocated cluster
- * @count: number of clusters to allocate
- * @start_lcn: starting lcn at which to allocate the clusters (or -1 if none)
- * @zone: zone from which to allocate the clusters
- * @is_extension: if 'true', this is an attribute extension
- *
- * Allocate @count clusters preferably starting at cluster @start_lcn or at the
- * current allocator position if @start_lcn is -1, on the mounted ntfs volume
- * @vol. @zone is either DATA_ZONE for allocation of normal clusters or
- * MFT_ZONE for allocation of clusters for the master file table, i.e. the
- * $MFT/$DATA attribute.
- *
- * @start_vcn specifies the vcn of the first allocated cluster. This makes
- * merging the resulting runlist with the old runlist easier.
- *
- * If @is_extension is 'true', the caller is allocating clusters to extend an
- * attribute and if it is 'false', the caller is allocating clusters to fill a
- * hole in an attribute. Practically the difference is that if @is_extension
- * is 'true' the returned runlist will be terminated with LCN_ENOENT and if
- * @is_extension is 'false' the runlist will be terminated with
- * LCN_RL_NOT_MAPPED.
- *
- * You need to check the return value with IS_ERR(). If this is false, the
- * function was successful and the return value is a runlist describing the
- * allocated cluster(s). If IS_ERR() is true, the function failed and
- * PTR_ERR() gives you the error code.
- *
- * Notes on the allocation algorithm
- * =================================
- *
- * There are two data zones. First is the area between the end of the mft zone
- * and the end of the volume, and second is the area between the start of the
- * volume and the start of the mft zone. On unmodified/standard NTFS 1.x
- * volumes, the second data zone does not exist due to the mft zone being
- * expanded to cover the start of the volume in order to reserve space for the
- * mft bitmap attribute.
- *
- * This is not the prettiest function but the complexity stems from the need of
- * implementing the mft vs data zoned approach and from the fact that we have
- * access to the lcn bitmap in portions of up to 8192 bytes at a time, so we
- * need to cope with crossing over boundaries of two buffers. Further, the
- * fact that the allocator allows for caller supplied hints as to the location
- * of where allocation should begin and the fact that the allocator keeps track
- * of where in the data zones the next natural allocation should occur,
- * contribute to the complexity of the function. But it should all be
- * worthwhile, because this allocator should: 1) be a full implementation of
- * the MFT zone approach used by Windows NT, 2) cause reduction in
- * fragmentation, and 3) be speedy in allocations (the code is not optimized
- * for speed, but the algorithm is, so further speed improvements are probably
- * possible).
- *
- * FIXME: We should be monitoring cluster allocation and increment the MFT zone
- * size dynamically but this is something for the future. We will just cause
- * heavier fragmentation by not doing it and I am not even sure Windows would
- * grow the MFT zone dynamically, so it might even be correct not to do this.
- * The overhead in doing dynamic MFT zone expansion would be very large and
- * unlikely worth the effort. (AIA)
- *
- * TODO: I have added in double the required zone position pointer wrap around
- * logic which can be optimized to having only one of the two logic sets.
- * However, having the double logic will work fine, but if we have only one of
- * the sets and we get it wrong somewhere, then we get into trouble, so
- * removing the duplicate logic requires _very_ careful consideration of _all_
- * possible code paths. So at least for now, I am leaving the double logic -
- * better safe than sorry... (AIA)
- *
- * Locking: - The volume lcn bitmap must be unlocked on entry and is unlocked
- * on return.
- * - This function takes the volume lcn bitmap lock for writing and
- * modifies the bitmap contents.
- */
-runlist_element *ntfs_cluster_alloc(ntfs_volume *vol, const VCN start_vcn,
- const s64 count, const LCN start_lcn,
- const NTFS_CLUSTER_ALLOCATION_ZONES zone,
- const bool is_extension)
-{
- LCN zone_start, zone_end, bmp_pos, bmp_initial_pos, last_read_pos, lcn;
- LCN prev_lcn = 0, prev_run_len = 0, mft_zone_size;
- s64 clusters;
- loff_t i_size;
- struct inode *lcnbmp_vi;
- runlist_element *rl = NULL;
- struct address_space *mapping;
- struct page *page = NULL;
- u8 *buf, *byte;
- int err = 0, rlpos, rlsize, buf_size;
- u8 pass, done_zones, search_zone, need_writeback = 0, bit;
-
- ntfs_debug("Entering for start_vcn 0x%llx, count 0x%llx, start_lcn "
- "0x%llx, zone %s_ZONE.", (unsigned long long)start_vcn,
- (unsigned long long)count,
- (unsigned long long)start_lcn,
- zone == MFT_ZONE ? "MFT" : "DATA");
- BUG_ON(!vol);
- lcnbmp_vi = vol->lcnbmp_ino;
- BUG_ON(!lcnbmp_vi);
- BUG_ON(start_vcn < 0);
- BUG_ON(count < 0);
- BUG_ON(start_lcn < -1);
- BUG_ON(zone < FIRST_ZONE);
- BUG_ON(zone > LAST_ZONE);
-
- /* Return NULL if @count is zero. */
- if (!count)
- return NULL;
- /* Take the lcnbmp lock for writing. */
- down_write(&vol->lcnbmp_lock);
- /*
- * If no specific @start_lcn was requested, use the current data zone
- * position, otherwise use the requested @start_lcn but make sure it
- * lies outside the mft zone. Also set done_zones to 0 (no zones done)
- * and pass depending on whether we are starting inside a zone (1) or
- * at the beginning of a zone (2). If requesting from the MFT_ZONE,
- * we either start at the current position within the mft zone or at
- * the specified position. If the latter is out of bounds then we start
- * at the beginning of the MFT_ZONE.
- */
- done_zones = 0;
- pass = 1;
- /*
- * zone_start and zone_end are the current search range. search_zone
- * is 1 for mft zone, 2 for data zone 1 (end of mft zone till end of
- * volume) and 4 for data zone 2 (start of volume till start of mft
- * zone).
- */
- zone_start = start_lcn;
- if (zone_start < 0) {
- if (zone == DATA_ZONE)
- zone_start = vol->data1_zone_pos;
- else
- zone_start = vol->mft_zone_pos;
- if (!zone_start) {
- /*
- * Zone starts at beginning of volume which means a
- * single pass is sufficient.
- */
- pass = 2;
- }
- } else if (zone == DATA_ZONE && zone_start >= vol->mft_zone_start &&
- zone_start < vol->mft_zone_end) {
- zone_start = vol->mft_zone_end;
- /*
- * Starting at beginning of data1_zone which means a single
- * pass in this zone is sufficient.
- */
- pass = 2;
- } else if (zone == MFT_ZONE && (zone_start < vol->mft_zone_start ||
- zone_start >= vol->mft_zone_end)) {
- zone_start = vol->mft_lcn;
- if (!vol->mft_zone_end)
- zone_start = 0;
- /*
- * Starting at beginning of volume which means a single pass
- * is sufficient.
- */
- pass = 2;
- }
- if (zone == MFT_ZONE) {
- zone_end = vol->mft_zone_end;
- search_zone = 1;
- } else /* if (zone == DATA_ZONE) */ {
- /* Skip searching the mft zone. */
- done_zones |= 1;
- if (zone_start >= vol->mft_zone_end) {
- zone_end = vol->nr_clusters;
- search_zone = 2;
- } else {
- zone_end = vol->mft_zone_start;
- search_zone = 4;
- }
- }
- /*
- * bmp_pos is the current bit position inside the bitmap. We use
- * bmp_initial_pos to determine whether or not to do a zone switch.
- */
- bmp_pos = bmp_initial_pos = zone_start;
-
- /* Loop until all clusters are allocated, i.e. clusters == 0. */
- clusters = count;
- rlpos = rlsize = 0;
- mapping = lcnbmp_vi->i_mapping;
- i_size = i_size_read(lcnbmp_vi);
- while (1) {
- ntfs_debug("Start of outer while loop: done_zones 0x%x, "
- "search_zone %i, pass %i, zone_start 0x%llx, "
- "zone_end 0x%llx, bmp_initial_pos 0x%llx, "
- "bmp_pos 0x%llx, rlpos %i, rlsize %i.",
- done_zones, search_zone, pass,
- (unsigned long long)zone_start,
- (unsigned long long)zone_end,
- (unsigned long long)bmp_initial_pos,
- (unsigned long long)bmp_pos, rlpos, rlsize);
- /* Loop until we run out of free clusters. */
- last_read_pos = bmp_pos >> 3;
- ntfs_debug("last_read_pos 0x%llx.",
- (unsigned long long)last_read_pos);
- if (last_read_pos > i_size) {
- ntfs_debug("End of attribute reached. "
- "Skipping to zone_pass_done.");
- goto zone_pass_done;
- }
- if (likely(page)) {
- if (need_writeback) {
- ntfs_debug("Marking page dirty.");
- flush_dcache_page(page);
- set_page_dirty(page);
- need_writeback = 0;
- }
- ntfs_unmap_page(page);
- }
- page = ntfs_map_page(mapping, last_read_pos >>
- PAGE_SHIFT);
- if (IS_ERR(page)) {
- err = PTR_ERR(page);
- ntfs_error(vol->sb, "Failed to map page.");
- goto out;
- }
- buf_size = last_read_pos & ~PAGE_MASK;
- buf = page_address(page) + buf_size;
- buf_size = PAGE_SIZE - buf_size;
- if (unlikely(last_read_pos + buf_size > i_size))
- buf_size = i_size - last_read_pos;
- buf_size <<= 3;
- lcn = bmp_pos & 7;
- bmp_pos &= ~(LCN)7;
- ntfs_debug("Before inner while loop: buf_size %i, lcn 0x%llx, "
- "bmp_pos 0x%llx, need_writeback %i.", buf_size,
- (unsigned long long)lcn,
- (unsigned long long)bmp_pos, need_writeback);
- while (lcn < buf_size && lcn + bmp_pos < zone_end) {
- byte = buf + (lcn >> 3);
- ntfs_debug("In inner while loop: buf_size %i, "
- "lcn 0x%llx, bmp_pos 0x%llx, "
- "need_writeback %i, byte ofs 0x%x, "
- "*byte 0x%x.", buf_size,
- (unsigned long long)lcn,
- (unsigned long long)bmp_pos,
- need_writeback,
- (unsigned int)(lcn >> 3),
- (unsigned int)*byte);
- /* Skip full bytes. */
- if (*byte == 0xff) {
- lcn = (lcn + 8) & ~(LCN)7;
- ntfs_debug("Continuing while loop 1.");
- continue;
- }
- bit = 1 << (lcn & 7);
- ntfs_debug("bit 0x%x.", bit);
- /* If the bit is already set, go onto the next one. */
- if (*byte & bit) {
- lcn++;
- ntfs_debug("Continuing while loop 2.");
- continue;
- }
- /*
- * Allocate more memory if needed, including space for
- * the terminator element.
- * ntfs_malloc_nofs() operates on whole pages only.
- */
- if ((rlpos + 2) * sizeof(*rl) > rlsize) {
- runlist_element *rl2;
-
- ntfs_debug("Reallocating memory.");
- if (!rl)
- ntfs_debug("First free bit is at LCN "
- "0x%llx.",
- (unsigned long long)
- (lcn + bmp_pos));
- rl2 = ntfs_malloc_nofs(rlsize + (int)PAGE_SIZE);
- if (unlikely(!rl2)) {
- err = -ENOMEM;
- ntfs_error(vol->sb, "Failed to "
- "allocate memory.");
- goto out;
- }
- memcpy(rl2, rl, rlsize);
- ntfs_free(rl);
- rl = rl2;
- rlsize += PAGE_SIZE;
- ntfs_debug("Reallocated memory, rlsize 0x%x.",
- rlsize);
- }
- /* Allocate the bitmap bit. */
- *byte |= bit;
- /* We need to write this bitmap page to disk. */
- need_writeback = 1;
- ntfs_debug("*byte 0x%x, need_writeback is set.",
- (unsigned int)*byte);
- /*
- * Coalesce with previous run if adjacent LCNs.
- * Otherwise, append a new run.
- */
- ntfs_debug("Adding run (lcn 0x%llx, len 0x%llx), "
- "prev_lcn 0x%llx, lcn 0x%llx, "
- "bmp_pos 0x%llx, prev_run_len 0x%llx, "
- "rlpos %i.",
- (unsigned long long)(lcn + bmp_pos),
- 1ULL, (unsigned long long)prev_lcn,
- (unsigned long long)lcn,
- (unsigned long long)bmp_pos,
- (unsigned long long)prev_run_len,
- rlpos);
- if (prev_lcn == lcn + bmp_pos - prev_run_len && rlpos) {
- ntfs_debug("Coalescing to run (lcn 0x%llx, "
- "len 0x%llx).",
- (unsigned long long)
- rl[rlpos - 1].lcn,
- (unsigned long long)
- rl[rlpos - 1].length);
- rl[rlpos - 1].length = ++prev_run_len;
- ntfs_debug("Run now (lcn 0x%llx, len 0x%llx), "
- "prev_run_len 0x%llx.",
- (unsigned long long)
- rl[rlpos - 1].lcn,
- (unsigned long long)
- rl[rlpos - 1].length,
- (unsigned long long)
- prev_run_len);
- } else {
- if (likely(rlpos)) {
- ntfs_debug("Adding new run, (previous "
- "run lcn 0x%llx, "
- "len 0x%llx).",
- (unsigned long long)
- rl[rlpos - 1].lcn,
- (unsigned long long)
- rl[rlpos - 1].length);
- rl[rlpos].vcn = rl[rlpos - 1].vcn +
- prev_run_len;
- } else {
- ntfs_debug("Adding new run, is first "
- "run.");
- rl[rlpos].vcn = start_vcn;
- }
- rl[rlpos].lcn = prev_lcn = lcn + bmp_pos;
- rl[rlpos].length = prev_run_len = 1;
- rlpos++;
- }
- /* Done? */
- if (!--clusters) {
- LCN tc;
- /*
- * Update the current zone position. Positions
- * of already scanned zones have been updated
- * during the respective zone switches.
- */
- tc = lcn + bmp_pos + 1;
- ntfs_debug("Done. Updating current zone "
- "position, tc 0x%llx, "
- "search_zone %i.",
- (unsigned long long)tc,
- search_zone);
- switch (search_zone) {
- case 1:
- ntfs_debug("Before checks, "
- "vol->mft_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->mft_zone_pos);
- if (tc >= vol->mft_zone_end) {
- vol->mft_zone_pos =
- vol->mft_lcn;
- if (!vol->mft_zone_end)
- vol->mft_zone_pos = 0;
- } else if ((bmp_initial_pos >=
- vol->mft_zone_pos ||
- tc > vol->mft_zone_pos)
- && tc >= vol->mft_lcn)
- vol->mft_zone_pos = tc;
- ntfs_debug("After checks, "
- "vol->mft_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->mft_zone_pos);
- break;
- case 2:
- ntfs_debug("Before checks, "
- "vol->data1_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->data1_zone_pos);
- if (tc >= vol->nr_clusters)
- vol->data1_zone_pos =
- vol->mft_zone_end;
- else if ((bmp_initial_pos >=
- vol->data1_zone_pos ||
- tc > vol->data1_zone_pos)
- && tc >= vol->mft_zone_end)
- vol->data1_zone_pos = tc;
- ntfs_debug("After checks, "
- "vol->data1_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->data1_zone_pos);
- break;
- case 4:
- ntfs_debug("Before checks, "
- "vol->data2_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->data2_zone_pos);
- if (tc >= vol->mft_zone_start)
- vol->data2_zone_pos = 0;
- else if (bmp_initial_pos >=
- vol->data2_zone_pos ||
- tc > vol->data2_zone_pos)
- vol->data2_zone_pos = tc;
- ntfs_debug("After checks, "
- "vol->data2_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->data2_zone_pos);
- break;
- default:
- BUG();
- }
- ntfs_debug("Finished. Going to out.");
- goto out;
- }
- lcn++;
- }
- bmp_pos += buf_size;
- ntfs_debug("After inner while loop: buf_size 0x%x, lcn "
- "0x%llx, bmp_pos 0x%llx, need_writeback %i.",
- buf_size, (unsigned long long)lcn,
- (unsigned long long)bmp_pos, need_writeback);
- if (bmp_pos < zone_end) {
- ntfs_debug("Continuing outer while loop, "
- "bmp_pos 0x%llx, zone_end 0x%llx.",
- (unsigned long long)bmp_pos,
- (unsigned long long)zone_end);
- continue;
- }
-zone_pass_done: /* Finished with the current zone pass. */
- ntfs_debug("At zone_pass_done, pass %i.", pass);
- if (pass == 1) {
- /*
- * Now do pass 2, scanning the first part of the zone
- * we omitted in pass 1.
- */
- pass = 2;
- zone_end = zone_start;
- switch (search_zone) {
- case 1: /* mft_zone */
- zone_start = vol->mft_zone_start;
- break;
- case 2: /* data1_zone */
- zone_start = vol->mft_zone_end;
- break;
- case 4: /* data2_zone */
- zone_start = 0;
- break;
- default:
- BUG();
- }
- /* Sanity check. */
- if (zone_end < zone_start)
- zone_end = zone_start;
- bmp_pos = zone_start;
- ntfs_debug("Continuing outer while loop, pass 2, "
- "zone_start 0x%llx, zone_end 0x%llx, "
- "bmp_pos 0x%llx.",
- (unsigned long long)zone_start,
- (unsigned long long)zone_end,
- (unsigned long long)bmp_pos);
- continue;
- } /* pass == 2 */
-done_zones_check:
- ntfs_debug("At done_zones_check, search_zone %i, done_zones "
- "before 0x%x, done_zones after 0x%x.",
- search_zone, done_zones,
- done_zones | search_zone);
- done_zones |= search_zone;
- if (done_zones < 7) {
- ntfs_debug("Switching zone.");
- /* Now switch to the next zone we haven't done yet. */
- pass = 1;
- switch (search_zone) {
- case 1:
- ntfs_debug("Switching from mft zone to data1 "
- "zone.");
- /* Update mft zone position. */
- if (rlpos) {
- LCN tc;
-
- ntfs_debug("Before checks, "
- "vol->mft_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->mft_zone_pos);
- tc = rl[rlpos - 1].lcn +
- rl[rlpos - 1].length;
- if (tc >= vol->mft_zone_end) {
- vol->mft_zone_pos =
- vol->mft_lcn;
- if (!vol->mft_zone_end)
- vol->mft_zone_pos = 0;
- } else if ((bmp_initial_pos >=
- vol->mft_zone_pos ||
- tc > vol->mft_zone_pos)
- && tc >= vol->mft_lcn)
- vol->mft_zone_pos = tc;
- ntfs_debug("After checks, "
- "vol->mft_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->mft_zone_pos);
- }
- /* Switch from mft zone to data1 zone. */
-switch_to_data1_zone: search_zone = 2;
- zone_start = bmp_initial_pos =
- vol->data1_zone_pos;
- zone_end = vol->nr_clusters;
- if (zone_start == vol->mft_zone_end)
- pass = 2;
- if (zone_start >= zone_end) {
- vol->data1_zone_pos = zone_start =
- vol->mft_zone_end;
- pass = 2;
- }
- break;
- case 2:
- ntfs_debug("Switching from data1 zone to "
- "data2 zone.");
- /* Update data1 zone position. */
- if (rlpos) {
- LCN tc;
-
- ntfs_debug("Before checks, "
- "vol->data1_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->data1_zone_pos);
- tc = rl[rlpos - 1].lcn +
- rl[rlpos - 1].length;
- if (tc >= vol->nr_clusters)
- vol->data1_zone_pos =
- vol->mft_zone_end;
- else if ((bmp_initial_pos >=
- vol->data1_zone_pos ||
- tc > vol->data1_zone_pos)
- && tc >= vol->mft_zone_end)
- vol->data1_zone_pos = tc;
- ntfs_debug("After checks, "
- "vol->data1_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->data1_zone_pos);
- }
- /* Switch from data1 zone to data2 zone. */
- search_zone = 4;
- zone_start = bmp_initial_pos =
- vol->data2_zone_pos;
- zone_end = vol->mft_zone_start;
- if (!zone_start)
- pass = 2;
- if (zone_start >= zone_end) {
- vol->data2_zone_pos = zone_start =
- bmp_initial_pos = 0;
- pass = 2;
- }
- break;
- case 4:
- ntfs_debug("Switching from data2 zone to "
- "data1 zone.");
- /* Update data2 zone position. */
- if (rlpos) {
- LCN tc;
-
- ntfs_debug("Before checks, "
- "vol->data2_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->data2_zone_pos);
- tc = rl[rlpos - 1].lcn +
- rl[rlpos - 1].length;
- if (tc >= vol->mft_zone_start)
- vol->data2_zone_pos = 0;
- else if (bmp_initial_pos >=
- vol->data2_zone_pos ||
- tc > vol->data2_zone_pos)
- vol->data2_zone_pos = tc;
- ntfs_debug("After checks, "
- "vol->data2_zone_pos "
- "0x%llx.",
- (unsigned long long)
- vol->data2_zone_pos);
- }
- /* Switch from data2 zone to data1 zone. */
- goto switch_to_data1_zone;
- default:
- BUG();
- }
- ntfs_debug("After zone switch, search_zone %i, "
- "pass %i, bmp_initial_pos 0x%llx, "
- "zone_start 0x%llx, zone_end 0x%llx.",
- search_zone, pass,
- (unsigned long long)bmp_initial_pos,
- (unsigned long long)zone_start,
- (unsigned long long)zone_end);
- bmp_pos = zone_start;
- if (zone_start == zone_end) {
- ntfs_debug("Empty zone, going to "
- "done_zones_check.");
- /* Empty zone. Don't bother searching it. */
- goto done_zones_check;
- }
- ntfs_debug("Continuing outer while loop.");
- continue;
- } /* done_zones == 7 */
- ntfs_debug("All zones are finished.");
- /*
- * All zones are finished! If DATA_ZONE, shrink mft zone. If
- * MFT_ZONE, we have really run out of space.
- */
- mft_zone_size = vol->mft_zone_end - vol->mft_zone_start;
- ntfs_debug("vol->mft_zone_start 0x%llx, vol->mft_zone_end "
- "0x%llx, mft_zone_size 0x%llx.",
- (unsigned long long)vol->mft_zone_start,
- (unsigned long long)vol->mft_zone_end,
- (unsigned long long)mft_zone_size);
- if (zone == MFT_ZONE || mft_zone_size <= 0) {
- ntfs_debug("No free clusters left, going to out.");
- /* Really no more space left on device. */
- err = -ENOSPC;
- goto out;
- } /* zone == DATA_ZONE && mft_zone_size > 0 */
- ntfs_debug("Shrinking mft zone.");
- zone_end = vol->mft_zone_end;
- mft_zone_size >>= 1;
- if (mft_zone_size > 0)
- vol->mft_zone_end = vol->mft_zone_start + mft_zone_size;
- else /* mft zone and data2 zone no longer exist. */
- vol->data2_zone_pos = vol->mft_zone_start =
- vol->mft_zone_end = 0;
- if (vol->mft_zone_pos >= vol->mft_zone_end) {
- vol->mft_zone_pos = vol->mft_lcn;
- if (!vol->mft_zone_end)
- vol->mft_zone_pos = 0;
- }
- bmp_pos = zone_start = bmp_initial_pos =
- vol->data1_zone_pos = vol->mft_zone_end;
- search_zone = 2;
- pass = 2;
- done_zones &= ~2;
- ntfs_debug("After shrinking mft zone, mft_zone_size 0x%llx, "
- "vol->mft_zone_start 0x%llx, "
- "vol->mft_zone_end 0x%llx, "
- "vol->mft_zone_pos 0x%llx, search_zone 2, "
- "pass 2, dones_zones 0x%x, zone_start 0x%llx, "
- "zone_end 0x%llx, vol->data1_zone_pos 0x%llx, "
- "continuing outer while loop.",
- (unsigned long long)mft_zone_size,
- (unsigned long long)vol->mft_zone_start,
- (unsigned long long)vol->mft_zone_end,
- (unsigned long long)vol->mft_zone_pos,
- done_zones, (unsigned long long)zone_start,
- (unsigned long long)zone_end,
- (unsigned long long)vol->data1_zone_pos);
- }
- ntfs_debug("After outer while loop.");
-out:
- ntfs_debug("At out.");
- /* Add runlist terminator element. */
- if (likely(rl)) {
- rl[rlpos].vcn = rl[rlpos - 1].vcn + rl[rlpos - 1].length;
- rl[rlpos].lcn = is_extension ? LCN_ENOENT : LCN_RL_NOT_MAPPED;
- rl[rlpos].length = 0;
- }
- if (likely(page && !IS_ERR(page))) {
- if (need_writeback) {
- ntfs_debug("Marking page dirty.");
- flush_dcache_page(page);
- set_page_dirty(page);
- need_writeback = 0;
- }
- ntfs_unmap_page(page);
- }
- if (likely(!err)) {
- up_write(&vol->lcnbmp_lock);
- ntfs_debug("Done.");
- return rl;
- }
- ntfs_error(vol->sb, "Failed to allocate clusters, aborting "
- "(error %i).", err);
- if (rl) {
- int err2;
-
- if (err == -ENOSPC)
- ntfs_debug("Not enough space to complete allocation, "
- "err -ENOSPC, first free lcn 0x%llx, "
- "could allocate up to 0x%llx "
- "clusters.",
- (unsigned long long)rl[0].lcn,
- (unsigned long long)(count - clusters));
- /* Deallocate all allocated clusters. */
- ntfs_debug("Attempting rollback...");
- err2 = ntfs_cluster_free_from_rl_nolock(vol, rl);
- if (err2) {
- ntfs_error(vol->sb, "Failed to rollback (error %i). "
- "Leaving inconsistent metadata! "
- "Unmount and run chkdsk.", err2);
- NVolSetErrors(vol);
- }
- /* Free the runlist. */
- ntfs_free(rl);
- } else if (err == -ENOSPC)
- ntfs_debug("No space left at all, err = -ENOSPC, first free "
- "lcn = 0x%llx.",
- (long long)vol->data1_zone_pos);
- up_write(&vol->lcnbmp_lock);
- return ERR_PTR(err);
-}
-
-/**
- * __ntfs_cluster_free - free clusters on an ntfs volume
- * @ni: ntfs inode whose runlist describes the clusters to free
- * @start_vcn: vcn in the runlist of @ni at which to start freeing clusters
- * @count: number of clusters to free or -1 for all clusters
- * @ctx: active attribute search context if present or NULL if not
- * @is_rollback: true if this is a rollback operation
- *
- * Free @count clusters starting at the cluster @start_vcn in the runlist
- * described by the vfs inode @ni.
- *
- * If @count is -1, all clusters from @start_vcn to the end of the runlist are
- * deallocated. Thus, to completely free all clusters in a runlist, use
- * @start_vcn = 0 and @count = -1.
- *
- * If @ctx is specified, it is an active search context of @ni and its base mft
- * record. This is needed when __ntfs_cluster_free() encounters unmapped
- * runlist fragments and allows their mapping. If you do not have the mft
- * record mapped, you can specify @ctx as NULL and __ntfs_cluster_free() will
- * perform the necessary mapping and unmapping.
- *
- * Note, __ntfs_cluster_free() saves the state of @ctx on entry and restores it
- * before returning. Thus, @ctx will be left pointing to the same attribute on
- * return as on entry. However, the actual pointers in @ctx may point to
- * different memory locations on return, so you must remember to reset any
- * cached pointers from the @ctx, i.e. after the call to __ntfs_cluster_free(),
- * you will probably want to do:
- * m = ctx->mrec;
- * a = ctx->attr;
- * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
- * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
- *
- * @is_rollback should always be 'false', it is for internal use to rollback
- * errors. You probably want to use ntfs_cluster_free() instead.
- *
- * Note, __ntfs_cluster_free() does not modify the runlist, so you have to
- * remove from the runlist or mark sparse the freed runs later.
- *
- * Return the number of deallocated clusters (not counting sparse ones) on
- * success and -errno on error.
- *
- * WARNING: If @ctx is supplied, regardless of whether success or failure is
- * returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
- * is no longer valid, i.e. you need to either call
- * ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
- * In that case PTR_ERR(@ctx->mrec) will give you the error code for
- * why the mapping of the old inode failed.
- *
- * Locking: - The runlist described by @ni must be locked for writing on entry
- * and is locked on return. Note the runlist may be modified when
- * needed runlist fragments need to be mapped.
- * - The volume lcn bitmap must be unlocked on entry and is unlocked
- * on return.
- * - This function takes the volume lcn bitmap lock for writing and
- * modifies the bitmap contents.
- * - If @ctx is NULL, the base mft record of @ni must not be mapped on
- * entry and it will be left unmapped on return.
- * - If @ctx is not NULL, the base mft record must be mapped on entry
- * and it will be left mapped on return.
- */
-s64 __ntfs_cluster_free(ntfs_inode *ni, const VCN start_vcn, s64 count,
- ntfs_attr_search_ctx *ctx, const bool is_rollback)
-{
- s64 delta, to_free, total_freed, real_freed;
- ntfs_volume *vol;
- struct inode *lcnbmp_vi;
- runlist_element *rl;
- int err;
-
- BUG_ON(!ni);
- ntfs_debug("Entering for i_ino 0x%lx, start_vcn 0x%llx, count "
- "0x%llx.%s", ni->mft_no, (unsigned long long)start_vcn,
- (unsigned long long)count,
- is_rollback ? " (rollback)" : "");
- vol = ni->vol;
- lcnbmp_vi = vol->lcnbmp_ino;
- BUG_ON(!lcnbmp_vi);
- BUG_ON(start_vcn < 0);
- BUG_ON(count < -1);
- /*
- * Lock the lcn bitmap for writing but only if not rolling back. We
- * must hold the lock all the way including through rollback otherwise
- * rollback is not possible because once we have cleared a bit and
- * dropped the lock, anyone could have set the bit again, thus
- * allocating the cluster for another use.
- */
- if (likely(!is_rollback))
- down_write(&vol->lcnbmp_lock);
-
- total_freed = real_freed = 0;
-
- rl = ntfs_attr_find_vcn_nolock(ni, start_vcn, ctx);
- if (IS_ERR(rl)) {
- if (!is_rollback)
- ntfs_error(vol->sb, "Failed to find first runlist "
- "element (error %li), aborting.",
- PTR_ERR(rl));
- err = PTR_ERR(rl);
- goto err_out;
- }
- if (unlikely(rl->lcn < LCN_HOLE)) {
- if (!is_rollback)
- ntfs_error(vol->sb, "First runlist element has "
- "invalid lcn, aborting.");
- err = -EIO;
- goto err_out;
- }
- /* Find the starting cluster inside the run that needs freeing. */
- delta = start_vcn - rl->vcn;
-
- /* The number of clusters in this run that need freeing. */
- to_free = rl->length - delta;
- if (count >= 0 && to_free > count)
- to_free = count;
-
- if (likely(rl->lcn >= 0)) {
- /* Do the actual freeing of the clusters in this run. */
- err = ntfs_bitmap_set_bits_in_run(lcnbmp_vi, rl->lcn + delta,
- to_free, likely(!is_rollback) ? 0 : 1);
- if (unlikely(err)) {
- if (!is_rollback)
- ntfs_error(vol->sb, "Failed to clear first run "
- "(error %i), aborting.", err);
- goto err_out;
- }
- /* We have freed @to_free real clusters. */
- real_freed = to_free;
- };
- /* Go to the next run and adjust the number of clusters left to free. */
- ++rl;
- if (count >= 0)
- count -= to_free;
-
- /* Keep track of the total "freed" clusters, including sparse ones. */
- total_freed = to_free;
- /*
- * Loop over the remaining runs, using @count as a capping value, and
- * free them.
- */
- for (; rl->length && count != 0; ++rl) {
- if (unlikely(rl->lcn < LCN_HOLE)) {
- VCN vcn;
-
- /* Attempt to map runlist. */
- vcn = rl->vcn;
- rl = ntfs_attr_find_vcn_nolock(ni, vcn, ctx);
- if (IS_ERR(rl)) {
- err = PTR_ERR(rl);
- if (!is_rollback)
- ntfs_error(vol->sb, "Failed to map "
- "runlist fragment or "
- "failed to find "
- "subsequent runlist "
- "element.");
- goto err_out;
- }
- if (unlikely(rl->lcn < LCN_HOLE)) {
- if (!is_rollback)
- ntfs_error(vol->sb, "Runlist element "
- "has invalid lcn "
- "(0x%llx).",
- (unsigned long long)
- rl->lcn);
- err = -EIO;
- goto err_out;
- }
- }
- /* The number of clusters in this run that need freeing. */
- to_free = rl->length;
- if (count >= 0 && to_free > count)
- to_free = count;
-
- if (likely(rl->lcn >= 0)) {
- /* Do the actual freeing of the clusters in the run. */
- err = ntfs_bitmap_set_bits_in_run(lcnbmp_vi, rl->lcn,
- to_free, likely(!is_rollback) ? 0 : 1);
- if (unlikely(err)) {
- if (!is_rollback)
- ntfs_error(vol->sb, "Failed to clear "
- "subsequent run.");
- goto err_out;
- }
- /* We have freed @to_free real clusters. */
- real_freed += to_free;
- }
- /* Adjust the number of clusters left to free. */
- if (count >= 0)
- count -= to_free;
-
- /* Update the total done clusters. */
- total_freed += to_free;
- }
- if (likely(!is_rollback))
- up_write(&vol->lcnbmp_lock);
-
- BUG_ON(count > 0);
-
- /* We are done. Return the number of actually freed clusters. */
- ntfs_debug("Done.");
- return real_freed;
-err_out:
- if (is_rollback)
- return err;
- /* If no real clusters were freed, no need to rollback. */
- if (!real_freed) {
- up_write(&vol->lcnbmp_lock);
- return err;
- }
- /*
- * Attempt to rollback and if that succeeds just return the error code.
- * If rollback fails, set the volume errors flag, emit an error
- * message, and return the error code.
- */
- delta = __ntfs_cluster_free(ni, start_vcn, total_freed, ctx, true);
- if (delta < 0) {
- ntfs_error(vol->sb, "Failed to rollback (error %i). Leaving "
- "inconsistent metadata! Unmount and run "
- "chkdsk.", (int)delta);
- NVolSetErrors(vol);
- }
- up_write(&vol->lcnbmp_lock);
- ntfs_error(vol->sb, "Aborting (error %i).", err);
- return err;
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/lcnalloc.h b/fs/ntfs/lcnalloc.h
deleted file mode 100644
index 1589a6d..0000000
--- a/fs/ntfs/lcnalloc.h
+++ /dev/null
@@ -1,131 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * lcnalloc.h - Exports for NTFS kernel cluster (de)allocation. Part of the
- * Linux-NTFS project.
- *
- * Copyright (c) 2004-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_LCNALLOC_H
-#define _LINUX_NTFS_LCNALLOC_H
-
-#ifdef NTFS_RW
-
-#include <linux/fs.h>
-
-#include "attrib.h"
-#include "types.h"
-#include "inode.h"
-#include "runlist.h"
-#include "volume.h"
-
-typedef enum {
- FIRST_ZONE = 0, /* For sanity checking. */
- MFT_ZONE = 0, /* Allocate from $MFT zone. */
- DATA_ZONE = 1, /* Allocate from $DATA zone. */
- LAST_ZONE = 1, /* For sanity checking. */
-} NTFS_CLUSTER_ALLOCATION_ZONES;
-
-extern runlist_element *ntfs_cluster_alloc(ntfs_volume *vol,
- const VCN start_vcn, const s64 count, const LCN start_lcn,
- const NTFS_CLUSTER_ALLOCATION_ZONES zone,
- const bool is_extension);
-
-extern s64 __ntfs_cluster_free(ntfs_inode *ni, const VCN start_vcn,
- s64 count, ntfs_attr_search_ctx *ctx, const bool is_rollback);
-
-/**
- * ntfs_cluster_free - free clusters on an ntfs volume
- * @ni: ntfs inode whose runlist describes the clusters to free
- * @start_vcn: vcn in the runlist of @ni at which to start freeing clusters
- * @count: number of clusters to free or -1 for all clusters
- * @ctx: active attribute search context if present or NULL if not
- *
- * Free @count clusters starting at the cluster @start_vcn in the runlist
- * described by the ntfs inode @ni.
- *
- * If @count is -1, all clusters from @start_vcn to the end of the runlist are
- * deallocated. Thus, to completely free all clusters in a runlist, use
- * @start_vcn = 0 and @count = -1.
- *
- * If @ctx is specified, it is an active search context of @ni and its base mft
- * record. This is needed when ntfs_cluster_free() encounters unmapped runlist
- * fragments and allows their mapping. If you do not have the mft record
- * mapped, you can specify @ctx as NULL and ntfs_cluster_free() will perform
- * the necessary mapping and unmapping.
- *
- * Note, ntfs_cluster_free() saves the state of @ctx on entry and restores it
- * before returning. Thus, @ctx will be left pointing to the same attribute on
- * return as on entry. However, the actual pointers in @ctx may point to
- * different memory locations on return, so you must remember to reset any
- * cached pointers from the @ctx, i.e. after the call to ntfs_cluster_free(),
- * you will probably want to do:
- * m = ctx->mrec;
- * a = ctx->attr;
- * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
- * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
- *
- * Note, ntfs_cluster_free() does not modify the runlist, so you have to remove
- * from the runlist or mark sparse the freed runs later.
- *
- * Return the number of deallocated clusters (not counting sparse ones) on
- * success and -errno on error.
- *
- * WARNING: If @ctx is supplied, regardless of whether success or failure is
- * returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
- * is no longer valid, i.e. you need to either call
- * ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
- * In that case PTR_ERR(@ctx->mrec) will give you the error code for
- * why the mapping of the old inode failed.
- *
- * Locking: - The runlist described by @ni must be locked for writing on entry
- * and is locked on return. Note the runlist may be modified when
- * needed runlist fragments need to be mapped.
- * - The volume lcn bitmap must be unlocked on entry and is unlocked
- * on return.
- * - This function takes the volume lcn bitmap lock for writing and
- * modifies the bitmap contents.
- * - If @ctx is NULL, the base mft record of @ni must not be mapped on
- * entry and it will be left unmapped on return.
- * - If @ctx is not NULL, the base mft record must be mapped on entry
- * and it will be left mapped on return.
- */
-static inline s64 ntfs_cluster_free(ntfs_inode *ni, const VCN start_vcn,
- s64 count, ntfs_attr_search_ctx *ctx)
-{
- return __ntfs_cluster_free(ni, start_vcn, count, ctx, false);
-}
-
-extern int ntfs_cluster_free_from_rl_nolock(ntfs_volume *vol,
- const runlist_element *rl);
-
-/**
- * ntfs_cluster_free_from_rl - free clusters from runlist
- * @vol: mounted ntfs volume on which to free the clusters
- * @rl: runlist describing the clusters to free
- *
- * Free all the clusters described by the runlist @rl on the volume @vol. In
- * the case of an error being returned, at least some of the clusters were not
- * freed.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: - This function takes the volume lcn bitmap lock for writing and
- * modifies the bitmap contents.
- * - The caller must have locked the runlist @rl for reading or
- * writing.
- */
-static inline int ntfs_cluster_free_from_rl(ntfs_volume *vol,
- const runlist_element *rl)
-{
- int ret;
-
- down_write(&vol->lcnbmp_lock);
- ret = ntfs_cluster_free_from_rl_nolock(vol, rl);
- up_write(&vol->lcnbmp_lock);
- return ret;
-}
-
-#endif /* NTFS_RW */
-
-#endif /* defined _LINUX_NTFS_LCNALLOC_H */
diff --git a/fs/ntfs/logfile.c b/fs/ntfs/logfile.c
deleted file mode 100644
index 6ce60ff..0000000
--- a/fs/ntfs/logfile.c
+++ /dev/null
@@ -1,849 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * logfile.c - NTFS kernel journal handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2002-2007 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include <linux/types.h>
-#include <linux/fs.h>
-#include <linux/highmem.h>
-#include <linux/buffer_head.h>
-#include <linux/bitops.h>
-#include <linux/log2.h>
-#include <linux/bio.h>
-
-#include "attrib.h"
-#include "aops.h"
-#include "debug.h"
-#include "logfile.h"
-#include "malloc.h"
-#include "volume.h"
-#include "ntfs.h"
-
-/**
- * ntfs_check_restart_page_header - check the page header for consistency
- * @vi: $LogFile inode to which the restart page header belongs
- * @rp: restart page header to check
- * @pos: position in @vi at which the restart page header resides
- *
- * Check the restart page header @rp for consistency and return 'true' if it is
- * consistent and 'false' otherwise.
- *
- * This function only needs NTFS_BLOCK_SIZE bytes in @rp, i.e. it does not
- * require the full restart page.
- */
-static bool ntfs_check_restart_page_header(struct inode *vi,
- RESTART_PAGE_HEADER *rp, s64 pos)
-{
- u32 logfile_system_page_size, logfile_log_page_size;
- u16 ra_ofs, usa_count, usa_ofs, usa_end = 0;
- bool have_usa = true;
-
- ntfs_debug("Entering.");
- /*
- * If the system or log page sizes are smaller than the ntfs block size
- * or either is not a power of 2 we cannot handle this log file.
- */
- logfile_system_page_size = le32_to_cpu(rp->system_page_size);
- logfile_log_page_size = le32_to_cpu(rp->log_page_size);
- if (logfile_system_page_size < NTFS_BLOCK_SIZE ||
- logfile_log_page_size < NTFS_BLOCK_SIZE ||
- logfile_system_page_size &
- (logfile_system_page_size - 1) ||
- !is_power_of_2(logfile_log_page_size)) {
- ntfs_error(vi->i_sb, "$LogFile uses unsupported page size.");
- return false;
- }
- /*
- * We must be either at !pos (1st restart page) or at pos = system page
- * size (2nd restart page).
- */
- if (pos && pos != logfile_system_page_size) {
- ntfs_error(vi->i_sb, "Found restart area in incorrect "
- "position in $LogFile.");
- return false;
- }
- /* We only know how to handle version 1.1. */
- if (sle16_to_cpu(rp->major_ver) != 1 ||
- sle16_to_cpu(rp->minor_ver) != 1) {
- ntfs_error(vi->i_sb, "$LogFile version %i.%i is not "
- "supported. (This driver supports version "
- "1.1 only.)", (int)sle16_to_cpu(rp->major_ver),
- (int)sle16_to_cpu(rp->minor_ver));
- return false;
- }
- /*
- * If chkdsk has been run the restart page may not be protected by an
- * update sequence array.
- */
- if (ntfs_is_chkd_record(rp->magic) && !le16_to_cpu(rp->usa_count)) {
- have_usa = false;
- goto skip_usa_checks;
- }
- /* Verify the size of the update sequence array. */
- usa_count = 1 + (logfile_system_page_size >> NTFS_BLOCK_SIZE_BITS);
- if (usa_count != le16_to_cpu(rp->usa_count)) {
- ntfs_error(vi->i_sb, "$LogFile restart page specifies "
- "inconsistent update sequence array count.");
- return false;
- }
- /* Verify the position of the update sequence array. */
- usa_ofs = le16_to_cpu(rp->usa_ofs);
- usa_end = usa_ofs + usa_count * sizeof(u16);
- if (usa_ofs < sizeof(RESTART_PAGE_HEADER) ||
- usa_end > NTFS_BLOCK_SIZE - sizeof(u16)) {
- ntfs_error(vi->i_sb, "$LogFile restart page specifies "
- "inconsistent update sequence array offset.");
- return false;
- }
-skip_usa_checks:
- /*
- * Verify the position of the restart area. It must be:
- * - aligned to 8-byte boundary,
- * - after the update sequence array, and
- * - within the system page size.
- */
- ra_ofs = le16_to_cpu(rp->restart_area_offset);
- if (ra_ofs & 7 || (have_usa ? ra_ofs < usa_end :
- ra_ofs < sizeof(RESTART_PAGE_HEADER)) ||
- ra_ofs > logfile_system_page_size) {
- ntfs_error(vi->i_sb, "$LogFile restart page specifies "
- "inconsistent restart area offset.");
- return false;
- }
- /*
- * Only restart pages modified by chkdsk are allowed to have chkdsk_lsn
- * set.
- */
- if (!ntfs_is_chkd_record(rp->magic) && sle64_to_cpu(rp->chkdsk_lsn)) {
- ntfs_error(vi->i_sb, "$LogFile restart page is not modified "
- "by chkdsk but a chkdsk LSN is specified.");
- return false;
- }
- ntfs_debug("Done.");
- return true;
-}
-
-/**
- * ntfs_check_restart_area - check the restart area for consistency
- * @vi: $LogFile inode to which the restart page belongs
- * @rp: restart page whose restart area to check
- *
- * Check the restart area of the restart page @rp for consistency and return
- * 'true' if it is consistent and 'false' otherwise.
- *
- * This function assumes that the restart page header has already been
- * consistency checked.
- *
- * This function only needs NTFS_BLOCK_SIZE bytes in @rp, i.e. it does not
- * require the full restart page.
- */
-static bool ntfs_check_restart_area(struct inode *vi, RESTART_PAGE_HEADER *rp)
-{
- u64 file_size;
- RESTART_AREA *ra;
- u16 ra_ofs, ra_len, ca_ofs;
- u8 fs_bits;
-
- ntfs_debug("Entering.");
- ra_ofs = le16_to_cpu(rp->restart_area_offset);
- ra = (RESTART_AREA*)((u8*)rp + ra_ofs);
- /*
- * Everything before ra->file_size must be before the first word
- * protected by an update sequence number. This ensures that it is
- * safe to access ra->client_array_offset.
- */
- if (ra_ofs + offsetof(RESTART_AREA, file_size) >
- NTFS_BLOCK_SIZE - sizeof(u16)) {
- ntfs_error(vi->i_sb, "$LogFile restart area specifies "
- "inconsistent file offset.");
- return false;
- }
- /*
- * Now that we can access ra->client_array_offset, make sure everything
- * up to the log client array is before the first word protected by an
- * update sequence number. This ensures we can access all of the
- * restart area elements safely. Also, the client array offset must be
- * aligned to an 8-byte boundary.
- */
- ca_ofs = le16_to_cpu(ra->client_array_offset);
- if (((ca_ofs + 7) & ~7) != ca_ofs ||
- ra_ofs + ca_ofs > NTFS_BLOCK_SIZE - sizeof(u16)) {
- ntfs_error(vi->i_sb, "$LogFile restart area specifies "
- "inconsistent client array offset.");
- return false;
- }
- /*
- * The restart area must end within the system page size both when
- * calculated manually and as specified by ra->restart_area_length.
- * Also, the calculated length must not exceed the specified length.
- */
- ra_len = ca_ofs + le16_to_cpu(ra->log_clients) *
- sizeof(LOG_CLIENT_RECORD);
- if (ra_ofs + ra_len > le32_to_cpu(rp->system_page_size) ||
- ra_ofs + le16_to_cpu(ra->restart_area_length) >
- le32_to_cpu(rp->system_page_size) ||
- ra_len > le16_to_cpu(ra->restart_area_length)) {
- ntfs_error(vi->i_sb, "$LogFile restart area is out of bounds "
- "of the system page size specified by the "
- "restart page header and/or the specified "
- "restart area length is inconsistent.");
- return false;
- }
- /*
- * The ra->client_free_list and ra->client_in_use_list must be either
- * LOGFILE_NO_CLIENT or less than ra->log_clients or they are
- * overflowing the client array.
- */
- if ((ra->client_free_list != LOGFILE_NO_CLIENT &&
- le16_to_cpu(ra->client_free_list) >=
- le16_to_cpu(ra->log_clients)) ||
- (ra->client_in_use_list != LOGFILE_NO_CLIENT &&
- le16_to_cpu(ra->client_in_use_list) >=
- le16_to_cpu(ra->log_clients))) {
- ntfs_error(vi->i_sb, "$LogFile restart area specifies "
- "overflowing client free and/or in use lists.");
- return false;
- }
- /*
- * Check ra->seq_number_bits against ra->file_size for consistency.
- * We cannot just use ffs() because the file size is not a power of 2.
- */
- file_size = (u64)sle64_to_cpu(ra->file_size);
- fs_bits = 0;
- while (file_size) {
- file_size >>= 1;
- fs_bits++;
- }
- if (le32_to_cpu(ra->seq_number_bits) != 67 - fs_bits) {
- ntfs_error(vi->i_sb, "$LogFile restart area specifies "
- "inconsistent sequence number bits.");
- return false;
- }
- /* The log record header length must be a multiple of 8. */
- if (((le16_to_cpu(ra->log_record_header_length) + 7) & ~7) !=
- le16_to_cpu(ra->log_record_header_length)) {
- ntfs_error(vi->i_sb, "$LogFile restart area specifies "
- "inconsistent log record header length.");
- return false;
- }
- /* Dito for the log page data offset. */
- if (((le16_to_cpu(ra->log_page_data_offset) + 7) & ~7) !=
- le16_to_cpu(ra->log_page_data_offset)) {
- ntfs_error(vi->i_sb, "$LogFile restart area specifies "
- "inconsistent log page data offset.");
- return false;
- }
- ntfs_debug("Done.");
- return true;
-}
-
-/**
- * ntfs_check_log_client_array - check the log client array for consistency
- * @vi: $LogFile inode to which the restart page belongs
- * @rp: restart page whose log client array to check
- *
- * Check the log client array of the restart page @rp for consistency and
- * return 'true' if it is consistent and 'false' otherwise.
- *
- * This function assumes that the restart page header and the restart area have
- * already been consistency checked.
- *
- * Unlike ntfs_check_restart_page_header() and ntfs_check_restart_area(), this
- * function needs @rp->system_page_size bytes in @rp, i.e. it requires the full
- * restart page and the page must be multi sector transfer deprotected.
- */
-static bool ntfs_check_log_client_array(struct inode *vi,
- RESTART_PAGE_HEADER *rp)
-{
- RESTART_AREA *ra;
- LOG_CLIENT_RECORD *ca, *cr;
- u16 nr_clients, idx;
- bool in_free_list, idx_is_first;
-
- ntfs_debug("Entering.");
- ra = (RESTART_AREA*)((u8*)rp + le16_to_cpu(rp->restart_area_offset));
- ca = (LOG_CLIENT_RECORD*)((u8*)ra +
- le16_to_cpu(ra->client_array_offset));
- /*
- * Check the ra->client_free_list first and then check the
- * ra->client_in_use_list. Check each of the log client records in
- * each of the lists and check that the array does not overflow the
- * ra->log_clients value. Also keep track of the number of records
- * visited as there cannot be more than ra->log_clients records and
- * that way we detect eventual loops in within a list.
- */
- nr_clients = le16_to_cpu(ra->log_clients);
- idx = le16_to_cpu(ra->client_free_list);
- in_free_list = true;
-check_list:
- for (idx_is_first = true; idx != LOGFILE_NO_CLIENT_CPU; nr_clients--,
- idx = le16_to_cpu(cr->next_client)) {
- if (!nr_clients || idx >= le16_to_cpu(ra->log_clients))
- goto err_out;
- /* Set @cr to the current log client record. */
- cr = ca + idx;
- /* The first log client record must not have a prev_client. */
- if (idx_is_first) {
- if (cr->prev_client != LOGFILE_NO_CLIENT)
- goto err_out;
- idx_is_first = false;
- }
- }
- /* Switch to and check the in use list if we just did the free list. */
- if (in_free_list) {
- in_free_list = false;
- idx = le16_to_cpu(ra->client_in_use_list);
- goto check_list;
- }
- ntfs_debug("Done.");
- return true;
-err_out:
- ntfs_error(vi->i_sb, "$LogFile log client array is corrupt.");
- return false;
-}
-
-/**
- * ntfs_check_and_load_restart_page - check the restart page for consistency
- * @vi: $LogFile inode to which the restart page belongs
- * @rp: restart page to check
- * @pos: position in @vi at which the restart page resides
- * @wrp: [OUT] copy of the multi sector transfer deprotected restart page
- * @lsn: [OUT] set to the current logfile lsn on success
- *
- * Check the restart page @rp for consistency and return 0 if it is consistent
- * and -errno otherwise. The restart page may have been modified by chkdsk in
- * which case its magic is CHKD instead of RSTR.
- *
- * This function only needs NTFS_BLOCK_SIZE bytes in @rp, i.e. it does not
- * require the full restart page.
- *
- * If @wrp is not NULL, on success, *@wrp will point to a buffer containing a
- * copy of the complete multi sector transfer deprotected page. On failure,
- * *@wrp is undefined.
- *
- * Simillarly, if @lsn is not NULL, on success *@lsn will be set to the current
- * logfile lsn according to this restart page. On failure, *@lsn is undefined.
- *
- * The following error codes are defined:
- * -EINVAL - The restart page is inconsistent.
- * -ENOMEM - Not enough memory to load the restart page.
- * -EIO - Failed to reading from $LogFile.
- */
-static int ntfs_check_and_load_restart_page(struct inode *vi,
- RESTART_PAGE_HEADER *rp, s64 pos, RESTART_PAGE_HEADER **wrp,
- LSN *lsn)
-{
- RESTART_AREA *ra;
- RESTART_PAGE_HEADER *trp;
- int size, err;
-
- ntfs_debug("Entering.");
- /* Check the restart page header for consistency. */
- if (!ntfs_check_restart_page_header(vi, rp, pos)) {
- /* Error output already done inside the function. */
- return -EINVAL;
- }
- /* Check the restart area for consistency. */
- if (!ntfs_check_restart_area(vi, rp)) {
- /* Error output already done inside the function. */
- return -EINVAL;
- }
- ra = (RESTART_AREA*)((u8*)rp + le16_to_cpu(rp->restart_area_offset));
- /*
- * Allocate a buffer to store the whole restart page so we can multi
- * sector transfer deprotect it.
- */
- trp = ntfs_malloc_nofs(le32_to_cpu(rp->system_page_size));
- if (!trp) {
- ntfs_error(vi->i_sb, "Failed to allocate memory for $LogFile "
- "restart page buffer.");
- return -ENOMEM;
- }
- /*
- * Read the whole of the restart page into the buffer. If it fits
- * completely inside @rp, just copy it from there. Otherwise map all
- * the required pages and copy the data from them.
- */
- size = PAGE_SIZE - (pos & ~PAGE_MASK);
- if (size >= le32_to_cpu(rp->system_page_size)) {
- memcpy(trp, rp, le32_to_cpu(rp->system_page_size));
- } else {
- pgoff_t idx;
- struct page *page;
- int have_read, to_read;
-
- /* First copy what we already have in @rp. */
- memcpy(trp, rp, size);
- /* Copy the remaining data one page at a time. */
- have_read = size;
- to_read = le32_to_cpu(rp->system_page_size) - size;
- idx = (pos + size) >> PAGE_SHIFT;
- BUG_ON((pos + size) & ~PAGE_MASK);
- do {
- page = ntfs_map_page(vi->i_mapping, idx);
- if (IS_ERR(page)) {
- ntfs_error(vi->i_sb, "Error mapping $LogFile "
- "page (index %lu).", idx);
- err = PTR_ERR(page);
- if (err != -EIO && err != -ENOMEM)
- err = -EIO;
- goto err_out;
- }
- size = min_t(int, to_read, PAGE_SIZE);
- memcpy((u8*)trp + have_read, page_address(page), size);
- ntfs_unmap_page(page);
- have_read += size;
- to_read -= size;
- idx++;
- } while (to_read > 0);
- }
- /*
- * Perform the multi sector transfer deprotection on the buffer if the
- * restart page is protected.
- */
- if ((!ntfs_is_chkd_record(trp->magic) || le16_to_cpu(trp->usa_count))
- && post_read_mst_fixup((NTFS_RECORD*)trp,
- le32_to_cpu(rp->system_page_size))) {
- /*
- * A multi sector tranfer error was detected. We only need to
- * abort if the restart page contents exceed the multi sector
- * transfer fixup of the first sector.
- */
- if (le16_to_cpu(rp->restart_area_offset) +
- le16_to_cpu(ra->restart_area_length) >
- NTFS_BLOCK_SIZE - sizeof(u16)) {
- ntfs_error(vi->i_sb, "Multi sector transfer error "
- "detected in $LogFile restart page.");
- err = -EINVAL;
- goto err_out;
- }
- }
- /*
- * If the restart page is modified by chkdsk or there are no active
- * logfile clients, the logfile is consistent. Otherwise, need to
- * check the log client records for consistency, too.
- */
- err = 0;
- if (ntfs_is_rstr_record(rp->magic) &&
- ra->client_in_use_list != LOGFILE_NO_CLIENT) {
- if (!ntfs_check_log_client_array(vi, trp)) {
- err = -EINVAL;
- goto err_out;
- }
- }
- if (lsn) {
- if (ntfs_is_rstr_record(rp->magic))
- *lsn = sle64_to_cpu(ra->current_lsn);
- else /* if (ntfs_is_chkd_record(rp->magic)) */
- *lsn = sle64_to_cpu(rp->chkdsk_lsn);
- }
- ntfs_debug("Done.");
- if (wrp)
- *wrp = trp;
- else {
-err_out:
- ntfs_free(trp);
- }
- return err;
-}
-
-/**
- * ntfs_check_logfile - check the journal for consistency
- * @log_vi: struct inode of loaded journal $LogFile to check
- * @rp: [OUT] on success this is a copy of the current restart page
- *
- * Check the $LogFile journal for consistency and return 'true' if it is
- * consistent and 'false' if not. On success, the current restart page is
- * returned in *@rp. Caller must call ntfs_free(*@rp) when finished with it.
- *
- * At present we only check the two restart pages and ignore the log record
- * pages.
- *
- * Note that the MstProtected flag is not set on the $LogFile inode and hence
- * when reading pages they are not deprotected. This is because we do not know
- * if the $LogFile was created on a system with a different page size to ours
- * yet and mst deprotection would fail if our page size is smaller.
- */
-bool ntfs_check_logfile(struct inode *log_vi, RESTART_PAGE_HEADER **rp)
-{
- s64 size, pos;
- LSN rstr1_lsn, rstr2_lsn;
- ntfs_volume *vol = NTFS_SB(log_vi->i_sb);
- struct address_space *mapping = log_vi->i_mapping;
- struct page *page = NULL;
- u8 *kaddr = NULL;
- RESTART_PAGE_HEADER *rstr1_ph = NULL;
- RESTART_PAGE_HEADER *rstr2_ph = NULL;
- int log_page_size, err;
- bool logfile_is_empty = true;
- u8 log_page_bits;
-
- ntfs_debug("Entering.");
- /* An empty $LogFile must have been clean before it got emptied. */
- if (NVolLogFileEmpty(vol))
- goto is_empty;
- size = i_size_read(log_vi);
- /* Make sure the file doesn't exceed the maximum allowed size. */
- if (size > MaxLogFileSize)
- size = MaxLogFileSize;
- /*
- * Truncate size to a multiple of the page cache size or the default
- * log page size if the page cache size is between the default log page
- * log page size if the page cache size is between the default log page
- * size and twice that.
- */
- if (PAGE_SIZE >= DefaultLogPageSize && PAGE_SIZE <=
- DefaultLogPageSize * 2)
- log_page_size = DefaultLogPageSize;
- else
- log_page_size = PAGE_SIZE;
- /*
- * Use ntfs_ffs() instead of ffs() to enable the compiler to
- * optimize log_page_size and log_page_bits into constants.
- */
- log_page_bits = ntfs_ffs(log_page_size) - 1;
- size &= ~(s64)(log_page_size - 1);
- /*
- * Ensure the log file is big enough to store at least the two restart
- * pages and the minimum number of log record pages.
- */
- if (size < log_page_size * 2 || (size - log_page_size * 2) >>
- log_page_bits < MinLogRecordPages) {
- ntfs_error(vol->sb, "$LogFile is too small.");
- return false;
- }
- /*
- * Read through the file looking for a restart page. Since the restart
- * page header is at the beginning of a page we only need to search at
- * what could be the beginning of a page (for each page size) rather
- * than scanning the whole file byte by byte. If all potential places
- * contain empty and uninitialzed records, the log file can be assumed
- * to be empty.
- */
- for (pos = 0; pos < size; pos <<= 1) {
- pgoff_t idx = pos >> PAGE_SHIFT;
- if (!page || page->index != idx) {
- if (page)
- ntfs_unmap_page(page);
- page = ntfs_map_page(mapping, idx);
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Error mapping $LogFile "
- "page (index %lu).", idx);
- goto err_out;
- }
- }
- kaddr = (u8*)page_address(page) + (pos & ~PAGE_MASK);
- /*
- * A non-empty block means the logfile is not empty while an
- * empty block after a non-empty block has been encountered
- * means we are done.
- */
- if (!ntfs_is_empty_recordp((le32*)kaddr))
- logfile_is_empty = false;
- else if (!logfile_is_empty)
- break;
- /*
- * A log record page means there cannot be a restart page after
- * this so no need to continue searching.
- */
- if (ntfs_is_rcrd_recordp((le32*)kaddr))
- break;
- /* If not a (modified by chkdsk) restart page, continue. */
- if (!ntfs_is_rstr_recordp((le32*)kaddr) &&
- !ntfs_is_chkd_recordp((le32*)kaddr)) {
- if (!pos)
- pos = NTFS_BLOCK_SIZE >> 1;
- continue;
- }
- /*
- * Check the (modified by chkdsk) restart page for consistency
- * and get a copy of the complete multi sector transfer
- * deprotected restart page.
- */
- err = ntfs_check_and_load_restart_page(log_vi,
- (RESTART_PAGE_HEADER*)kaddr, pos,
- !rstr1_ph ? &rstr1_ph : &rstr2_ph,
- !rstr1_ph ? &rstr1_lsn : &rstr2_lsn);
- if (!err) {
- /*
- * If we have now found the first (modified by chkdsk)
- * restart page, continue looking for the second one.
- */
- if (!pos) {
- pos = NTFS_BLOCK_SIZE >> 1;
- continue;
- }
- /*
- * We have now found the second (modified by chkdsk)
- * restart page, so we can stop looking.
- */
- break;
- }
- /*
- * Error output already done inside the function. Note, we do
- * not abort if the restart page was invalid as we might still
- * find a valid one further in the file.
- */
- if (err != -EINVAL) {
- ntfs_unmap_page(page);
- goto err_out;
- }
- /* Continue looking. */
- if (!pos)
- pos = NTFS_BLOCK_SIZE >> 1;
- }
- if (page)
- ntfs_unmap_page(page);
- if (logfile_is_empty) {
- NVolSetLogFileEmpty(vol);
-is_empty:
- ntfs_debug("Done. ($LogFile is empty.)");
- return true;
- }
- if (!rstr1_ph) {
- BUG_ON(rstr2_ph);
- ntfs_error(vol->sb, "Did not find any restart pages in "
- "$LogFile and it was not empty.");
- return false;
- }
- /* If both restart pages were found, use the more recent one. */
- if (rstr2_ph) {
- /*
- * If the second restart area is more recent, switch to it.
- * Otherwise just throw it away.
- */
- if (rstr2_lsn > rstr1_lsn) {
- ntfs_debug("Using second restart page as it is more "
- "recent.");
- ntfs_free(rstr1_ph);
- rstr1_ph = rstr2_ph;
- /* rstr1_lsn = rstr2_lsn; */
- } else {
- ntfs_debug("Using first restart page as it is more "
- "recent.");
- ntfs_free(rstr2_ph);
- }
- rstr2_ph = NULL;
- }
- /* All consistency checks passed. */
- if (rp)
- *rp = rstr1_ph;
- else
- ntfs_free(rstr1_ph);
- ntfs_debug("Done.");
- return true;
-err_out:
- if (rstr1_ph)
- ntfs_free(rstr1_ph);
- return false;
-}
-
-/**
- * ntfs_is_logfile_clean - check in the journal if the volume is clean
- * @log_vi: struct inode of loaded journal $LogFile to check
- * @rp: copy of the current restart page
- *
- * Analyze the $LogFile journal and return 'true' if it indicates the volume was
- * shutdown cleanly and 'false' if not.
- *
- * At present we only look at the two restart pages and ignore the log record
- * pages. This is a little bit crude in that there will be a very small number
- * of cases where we think that a volume is dirty when in fact it is clean.
- * This should only affect volumes that have not been shutdown cleanly but did
- * not have any pending, non-check-pointed i/o, i.e. they were completely idle
- * at least for the five seconds preceding the unclean shutdown.
- *
- * This function assumes that the $LogFile journal has already been consistency
- * checked by a call to ntfs_check_logfile() and in particular if the $LogFile
- * is empty this function requires that NVolLogFileEmpty() is true otherwise an
- * empty volume will be reported as dirty.
- */
-bool ntfs_is_logfile_clean(struct inode *log_vi, const RESTART_PAGE_HEADER *rp)
-{
- ntfs_volume *vol = NTFS_SB(log_vi->i_sb);
- RESTART_AREA *ra;
-
- ntfs_debug("Entering.");
- /* An empty $LogFile must have been clean before it got emptied. */
- if (NVolLogFileEmpty(vol)) {
- ntfs_debug("Done. ($LogFile is empty.)");
- return true;
- }
- BUG_ON(!rp);
- if (!ntfs_is_rstr_record(rp->magic) &&
- !ntfs_is_chkd_record(rp->magic)) {
- ntfs_error(vol->sb, "Restart page buffer is invalid. This is "
- "probably a bug in that the $LogFile should "
- "have been consistency checked before calling "
- "this function.");
- return false;
- }
- ra = (RESTART_AREA*)((u8*)rp + le16_to_cpu(rp->restart_area_offset));
- /*
- * If the $LogFile has active clients, i.e. it is open, and we do not
- * have the RESTART_VOLUME_IS_CLEAN bit set in the restart area flags,
- * we assume there was an unclean shutdown.
- */
- if (ra->client_in_use_list != LOGFILE_NO_CLIENT &&
- !(ra->flags & RESTART_VOLUME_IS_CLEAN)) {
- ntfs_debug("Done. $LogFile indicates a dirty shutdown.");
- return false;
- }
- /* $LogFile indicates a clean shutdown. */
- ntfs_debug("Done. $LogFile indicates a clean shutdown.");
- return true;
-}
-
-/**
- * ntfs_empty_logfile - empty the contents of the $LogFile journal
- * @log_vi: struct inode of loaded journal $LogFile to empty
- *
- * Empty the contents of the $LogFile journal @log_vi and return 'true' on
- * success and 'false' on error.
- *
- * This function assumes that the $LogFile journal has already been consistency
- * checked by a call to ntfs_check_logfile() and that ntfs_is_logfile_clean()
- * has been used to ensure that the $LogFile is clean.
- */
-bool ntfs_empty_logfile(struct inode *log_vi)
-{
- VCN vcn, end_vcn;
- ntfs_inode *log_ni = NTFS_I(log_vi);
- ntfs_volume *vol = log_ni->vol;
- struct super_block *sb = vol->sb;
- runlist_element *rl;
- unsigned long flags;
- unsigned block_size, block_size_bits;
- int err;
- bool should_wait = true;
-
- ntfs_debug("Entering.");
- if (NVolLogFileEmpty(vol)) {
- ntfs_debug("Done.");
- return true;
- }
- /*
- * We cannot use ntfs_attr_set() because we may be still in the middle
- * of a mount operation. Thus we do the emptying by hand by first
- * zapping the page cache pages for the $LogFile/$DATA attribute and
- * then emptying each of the buffers in each of the clusters specified
- * by the runlist by hand.
- */
- block_size = sb->s_blocksize;
- block_size_bits = sb->s_blocksize_bits;
- vcn = 0;
- read_lock_irqsave(&log_ni->size_lock, flags);
- end_vcn = (log_ni->initialized_size + vol->cluster_size_mask) >>
- vol->cluster_size_bits;
- read_unlock_irqrestore(&log_ni->size_lock, flags);
- truncate_inode_pages(log_vi->i_mapping, 0);
- down_write(&log_ni->runlist.lock);
- rl = log_ni->runlist.rl;
- if (unlikely(!rl || vcn < rl->vcn || !rl->length)) {
-map_vcn:
- err = ntfs_map_runlist_nolock(log_ni, vcn, NULL);
- if (err) {
- ntfs_error(sb, "Failed to map runlist fragment (error "
- "%d).", -err);
- goto err;
- }
- rl = log_ni->runlist.rl;
- BUG_ON(!rl || vcn < rl->vcn || !rl->length);
- }
- /* Seek to the runlist element containing @vcn. */
- while (rl->length && vcn >= rl[1].vcn)
- rl++;
- do {
- LCN lcn;
- sector_t block, end_block;
- s64 len;
-
- /*
- * If this run is not mapped map it now and start again as the
- * runlist will have been updated.
- */
- lcn = rl->lcn;
- if (unlikely(lcn == LCN_RL_NOT_MAPPED)) {
- vcn = rl->vcn;
- goto map_vcn;
- }
- /* If this run is not valid abort with an error. */
- if (unlikely(!rl->length || lcn < LCN_HOLE))
- goto rl_err;
- /* Skip holes. */
- if (lcn == LCN_HOLE)
- continue;
- block = lcn << vol->cluster_size_bits >> block_size_bits;
- len = rl->length;
- if (rl[1].vcn > end_vcn)
- len = end_vcn - rl->vcn;
- end_block = (lcn + len) << vol->cluster_size_bits >>
- block_size_bits;
- /* Iterate over the blocks in the run and empty them. */
- do {
- struct buffer_head *bh;
-
- /* Obtain the buffer, possibly not uptodate. */
- bh = sb_getblk(sb, block);
- BUG_ON(!bh);
- /* Setup buffer i/o submission. */
- lock_buffer(bh);
- bh->b_end_io = end_buffer_write_sync;
- get_bh(bh);
- /* Set the entire contents of the buffer to 0xff. */
- memset(bh->b_data, -1, block_size);
- if (!buffer_uptodate(bh))
- set_buffer_uptodate(bh);
- if (buffer_dirty(bh))
- clear_buffer_dirty(bh);
- /*
- * Submit the buffer and wait for i/o to complete but
- * only for the first buffer so we do not miss really
- * serious i/o errors. Once the first buffer has
- * completed ignore errors afterwards as we can assume
- * that if one buffer worked all of them will work.
- */
- submit_bh(REQ_OP_WRITE, bh);
- if (should_wait) {
- should_wait = false;
- wait_on_buffer(bh);
- if (unlikely(!buffer_uptodate(bh)))
- goto io_err;
- }
- brelse(bh);
- } while (++block < end_block);
- } while ((++rl)->vcn < end_vcn);
- up_write(&log_ni->runlist.lock);
- /*
- * Zap the pages again just in case any got instantiated whilst we were
- * emptying the blocks by hand. FIXME: We may not have completed
- * writing to all the buffer heads yet so this may happen too early.
- * We really should use a kernel thread to do the emptying
- * asynchronously and then we can also set the volume dirty and output
- * an error message if emptying should fail.
- */
- truncate_inode_pages(log_vi->i_mapping, 0);
- /* Set the flag so we do not have to do it again on remount. */
- NVolSetLogFileEmpty(vol);
- ntfs_debug("Done.");
- return true;
-io_err:
- ntfs_error(sb, "Failed to write buffer. Unmount and run chkdsk.");
- goto dirty_err;
-rl_err:
- ntfs_error(sb, "Runlist is corrupt. Unmount and run chkdsk.");
-dirty_err:
- NVolSetErrors(vol);
- err = -EIO;
-err:
- up_write(&log_ni->runlist.lock);
- ntfs_error(sb, "Failed to fill $LogFile with 0xff bytes (error %d).",
- -err);
- return false;
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/logfile.h b/fs/ntfs/logfile.h
deleted file mode 100644
index 429d490..0000000
--- a/fs/ntfs/logfile.h
+++ /dev/null
@@ -1,295 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * logfile.h - Defines for NTFS kernel journal ($LogFile) handling. Part of
- * the Linux-NTFS project.
- *
- * Copyright (c) 2000-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_LOGFILE_H
-#define _LINUX_NTFS_LOGFILE_H
-
-#ifdef NTFS_RW
-
-#include <linux/fs.h>
-
-#include "types.h"
-#include "endian.h"
-#include "layout.h"
-
-/*
- * Journal ($LogFile) organization:
- *
- * Two restart areas present in the first two pages (restart pages, one restart
- * area in each page). When the volume is dismounted they should be identical,
- * except for the update sequence array which usually has a different update
- * sequence number.
- *
- * These are followed by log records organized in pages headed by a log record
- * header going up to log file size. Not all pages contain log records when a
- * volume is first formatted, but as the volume ages, all records will be used.
- * When the log file fills up, the records at the beginning are purged (by
- * modifying the oldest_lsn to a higher value presumably) and writing begins
- * at the beginning of the file. Effectively, the log file is viewed as a
- * circular entity.
- *
- * NOTE: Windows NT, 2000, and XP all use log file version 1.1 but they accept
- * versions <= 1.x, including 0.-1. (Yes, that is a minus one in there!) We
- * probably only want to support 1.1 as this seems to be the current version
- * and we don't know how that differs from the older versions. The only
- * exception is if the journal is clean as marked by the two restart pages
- * then it doesn't matter whether we are on an earlier version. We can just
- * reinitialize the logfile and start again with version 1.1.
- */
-
-/* Some $LogFile related constants. */
-#define MaxLogFileSize 0x100000000ULL
-#define DefaultLogPageSize 4096
-#define MinLogRecordPages 48
-
-/*
- * Log file restart page header (begins the restart area).
- */
-typedef struct {
-/*Ofs*/
-/* 0 NTFS_RECORD; -- Unfolded here as gcc doesn't like unnamed structs. */
-/* 0*/ NTFS_RECORD_TYPE magic; /* The magic is "RSTR". */
-/* 4*/ le16 usa_ofs; /* See NTFS_RECORD definition in layout.h.
- When creating, set this to be immediately
- after this header structure (without any
- alignment). */
-/* 6*/ le16 usa_count; /* See NTFS_RECORD definition in layout.h. */
-
-/* 8*/ leLSN chkdsk_lsn; /* The last log file sequence number found by
- chkdsk. Only used when the magic is changed
- to "CHKD". Otherwise this is zero. */
-/* 16*/ le32 system_page_size; /* Byte size of system pages when the log file
- was created, has to be >= 512 and a power of
- 2. Use this to calculate the required size
- of the usa (usa_count) and add it to usa_ofs.
- Then verify that the result is less than the
- value of the restart_area_offset. */
-/* 20*/ le32 log_page_size; /* Byte size of log file pages, has to be >=
- 512 and a power of 2. The default is 4096
- and is used when the system page size is
- between 4096 and 8192. Otherwise this is
- set to the system page size instead. */
-/* 24*/ le16 restart_area_offset;/* Byte offset from the start of this header to
- the RESTART_AREA. Value has to be aligned
- to 8-byte boundary. When creating, set this
- to be after the usa. */
-/* 26*/ sle16 minor_ver; /* Log file minor version. Only check if major
- version is 1. */
-/* 28*/ sle16 major_ver; /* Log file major version. We only support
- version 1.1. */
-/* sizeof() = 30 (0x1e) bytes */
-} __attribute__ ((__packed__)) RESTART_PAGE_HEADER;
-
-/*
- * Constant for the log client indices meaning that there are no client records
- * in this particular client array. Also inside the client records themselves,
- * this means that there are no client records preceding or following this one.
- */
-#define LOGFILE_NO_CLIENT cpu_to_le16(0xffff)
-#define LOGFILE_NO_CLIENT_CPU 0xffff
-
-/*
- * These are the so far known RESTART_AREA_* flags (16-bit) which contain
- * information about the log file in which they are present.
- */
-enum {
- RESTART_VOLUME_IS_CLEAN = cpu_to_le16(0x0002),
- RESTART_SPACE_FILLER = cpu_to_le16(0xffff), /* gcc: Force enum bit width to 16. */
-} __attribute__ ((__packed__));
-
-typedef le16 RESTART_AREA_FLAGS;
-
-/*
- * Log file restart area record. The offset of this record is found by adding
- * the offset of the RESTART_PAGE_HEADER to the restart_area_offset value found
- * in it. See notes at restart_area_offset above.
- */
-typedef struct {
-/*Ofs*/
-/* 0*/ leLSN current_lsn; /* The current, i.e. last LSN inside the log
- when the restart area was last written.
- This happens often but what is the interval?
- Is it just fixed time or is it every time a
- check point is written or somethine else?
- On create set to 0. */
-/* 8*/ le16 log_clients; /* Number of log client records in the array of
- log client records which follows this
- restart area. Must be 1. */
-/* 10*/ le16 client_free_list; /* The index of the first free log client record
- in the array of log client records.
- LOGFILE_NO_CLIENT means that there are no
- free log client records in the array.
- If != LOGFILE_NO_CLIENT, check that
- log_clients > client_free_list. On Win2k
- and presumably earlier, on a clean volume
- this is != LOGFILE_NO_CLIENT, and it should
- be 0, i.e. the first (and only) client
- record is free and thus the logfile is
- closed and hence clean. A dirty volume
- would have left the logfile open and hence
- this would be LOGFILE_NO_CLIENT. On WinXP
- and presumably later, the logfile is always
- open, even on clean shutdown so this should
- always be LOGFILE_NO_CLIENT. */
-/* 12*/ le16 client_in_use_list;/* The index of the first in-use log client
- record in the array of log client records.
- LOGFILE_NO_CLIENT means that there are no
- in-use log client records in the array. If
- != LOGFILE_NO_CLIENT check that log_clients
- > client_in_use_list. On Win2k and
- presumably earlier, on a clean volume this
- is LOGFILE_NO_CLIENT, i.e. there are no
- client records in use and thus the logfile
- is closed and hence clean. A dirty volume
- would have left the logfile open and hence
- this would be != LOGFILE_NO_CLIENT, and it
- should be 0, i.e. the first (and only)
- client record is in use. On WinXP and
- presumably later, the logfile is always
- open, even on clean shutdown so this should
- always be 0. */
-/* 14*/ RESTART_AREA_FLAGS flags;/* Flags modifying LFS behaviour. On Win2k
- and presumably earlier this is always 0. On
- WinXP and presumably later, if the logfile
- was shutdown cleanly, the second bit,
- RESTART_VOLUME_IS_CLEAN, is set. This bit
- is cleared when the volume is mounted by
- WinXP and set when the volume is dismounted,
- thus if the logfile is dirty, this bit is
- clear. Thus we don't need to check the
- Windows version to determine if the logfile
- is clean. Instead if the logfile is closed,
- we know it must be clean. If it is open and
- this bit is set, we also know it must be
- clean. If on the other hand the logfile is
- open and this bit is clear, we can be almost
- certain that the logfile is dirty. */
-/* 16*/ le32 seq_number_bits; /* How many bits to use for the sequence
- number. This is calculated as 67 - the
- number of bits required to store the logfile
- size in bytes and this can be used in with
- the specified file_size as a consistency
- check. */
-/* 20*/ le16 restart_area_length;/* Length of the restart area including the
- client array. Following checks required if
- version matches. Otherwise, skip them.
- restart_area_offset + restart_area_length
- has to be <= system_page_size. Also,
- restart_area_length has to be >=
- client_array_offset + (log_clients *
- sizeof(log client record)). */
-/* 22*/ le16 client_array_offset;/* Offset from the start of this record to
- the first log client record if versions are
- matched. When creating, set this to be
- after this restart area structure, aligned
- to 8-bytes boundary. If the versions do not
- match, this is ignored and the offset is
- assumed to be (sizeof(RESTART_AREA) + 7) &
- ~7, i.e. rounded up to first 8-byte
- boundary. Either way, client_array_offset
- has to be aligned to an 8-byte boundary.
- Also, restart_area_offset +
- client_array_offset has to be <= 510.
- Finally, client_array_offset + (log_clients
- * sizeof(log client record)) has to be <=
- system_page_size. On Win2k and presumably
- earlier, this is 0x30, i.e. immediately
- following this record. On WinXP and
- presumably later, this is 0x40, i.e. there
- are 16 extra bytes between this record and
- the client array. This probably means that
- the RESTART_AREA record is actually bigger
- in WinXP and later. */
-/* 24*/ sle64 file_size; /* Usable byte size of the log file. If the
- restart_area_offset + the offset of the
- file_size are > 510 then corruption has
- occurred. This is the very first check when
- starting with the restart_area as if it
- fails it means that some of the above values
- will be corrupted by the multi sector
- transfer protection. The file_size has to
- be rounded down to be a multiple of the
- log_page_size in the RESTART_PAGE_HEADER and
- then it has to be at least big enough to
- store the two restart pages and 48 (0x30)
- log record pages. */
-/* 32*/ le32 last_lsn_data_length;/* Length of data of last LSN, not including
- the log record header. On create set to
- 0. */
-/* 36*/ le16 log_record_header_length;/* Byte size of the log record header.
- If the version matches then check that the
- value of log_record_header_length is a
- multiple of 8, i.e.
- (log_record_header_length + 7) & ~7 ==
- log_record_header_length. When creating set
- it to sizeof(LOG_RECORD_HEADER), aligned to
- 8 bytes. */
-/* 38*/ le16 log_page_data_offset;/* Offset to the start of data in a log record
- page. Must be a multiple of 8. On create
- set it to immediately after the update
- sequence array of the log record page. */
-/* 40*/ le32 restart_log_open_count;/* A counter that gets incremented every
- time the logfile is restarted which happens
- at mount time when the logfile is opened.
- When creating set to a random value. Win2k
- sets it to the low 32 bits of the current
- system time in NTFS format (see time.h). */
-/* 44*/ le32 reserved; /* Reserved/alignment to 8-byte boundary. */
-/* sizeof() = 48 (0x30) bytes */
-} __attribute__ ((__packed__)) RESTART_AREA;
-
-/*
- * Log client record. The offset of this record is found by adding the offset
- * of the RESTART_AREA to the client_array_offset value found in it.
- */
-typedef struct {
-/*Ofs*/
-/* 0*/ leLSN oldest_lsn; /* Oldest LSN needed by this client. On create
- set to 0. */
-/* 8*/ leLSN client_restart_lsn;/* LSN at which this client needs to restart
- the volume, i.e. the current position within
- the log file. At present, if clean this
- should = current_lsn in restart area but it
- probably also = current_lsn when dirty most
- of the time. At create set to 0. */
-/* 16*/ le16 prev_client; /* The offset to the previous log client record
- in the array of log client records.
- LOGFILE_NO_CLIENT means there is no previous
- client record, i.e. this is the first one.
- This is always LOGFILE_NO_CLIENT. */
-/* 18*/ le16 next_client; /* The offset to the next log client record in
- the array of log client records.
- LOGFILE_NO_CLIENT means there are no next
- client records, i.e. this is the last one.
- This is always LOGFILE_NO_CLIENT. */
-/* 20*/ le16 seq_number; /* On Win2k and presumably earlier, this is set
- to zero every time the logfile is restarted
- and it is incremented when the logfile is
- closed at dismount time. Thus it is 0 when
- dirty and 1 when clean. On WinXP and
- presumably later, this is always 0. */
-/* 22*/ u8 reserved[6]; /* Reserved/alignment. */
-/* 28*/ le32 client_name_length;/* Length of client name in bytes. Should
- always be 8. */
-/* 32*/ ntfschar client_name[64];/* Name of the client in Unicode. Should
- always be "NTFS" with the remaining bytes
- set to 0. */
-/* sizeof() = 160 (0xa0) bytes */
-} __attribute__ ((__packed__)) LOG_CLIENT_RECORD;
-
-extern bool ntfs_check_logfile(struct inode *log_vi,
- RESTART_PAGE_HEADER **rp);
-
-extern bool ntfs_is_logfile_clean(struct inode *log_vi,
- const RESTART_PAGE_HEADER *rp);
-
-extern bool ntfs_empty_logfile(struct inode *log_vi);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_LOGFILE_H */
diff --git a/fs/ntfs/malloc.h b/fs/ntfs/malloc.h
deleted file mode 100644
index 7068425..0000000
--- a/fs/ntfs/malloc.h
+++ /dev/null
@@ -1,77 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * malloc.h - NTFS kernel memory handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_MALLOC_H
-#define _LINUX_NTFS_MALLOC_H
-
-#include <linux/vmalloc.h>
-#include <linux/slab.h>
-#include <linux/highmem.h>
-
-/**
- * __ntfs_malloc - allocate memory in multiples of pages
- * @size: number of bytes to allocate
- * @gfp_mask: extra flags for the allocator
- *
- * Internal function. You probably want ntfs_malloc_nofs()...
- *
- * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
- * returns a pointer to the allocated memory.
- *
- * If there was insufficient memory to complete the request, return NULL.
- * Depending on @gfp_mask the allocation may be guaranteed to succeed.
- */
-static inline void *__ntfs_malloc(unsigned long size, gfp_t gfp_mask)
-{
- if (likely(size <= PAGE_SIZE)) {
- BUG_ON(!size);
- /* kmalloc() has per-CPU caches so is faster for now. */
- return kmalloc(PAGE_SIZE, gfp_mask & ~__GFP_HIGHMEM);
- /* return (void *)__get_free_page(gfp_mask); */
- }
- if (likely((size >> PAGE_SHIFT) < totalram_pages()))
- return __vmalloc(size, gfp_mask);
- return NULL;
-}
-
-/**
- * ntfs_malloc_nofs - allocate memory in multiples of pages
- * @size: number of bytes to allocate
- *
- * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
- * returns a pointer to the allocated memory.
- *
- * If there was insufficient memory to complete the request, return NULL.
- */
-static inline void *ntfs_malloc_nofs(unsigned long size)
-{
- return __ntfs_malloc(size, GFP_NOFS | __GFP_HIGHMEM);
-}
-
-/**
- * ntfs_malloc_nofs_nofail - allocate memory in multiples of pages
- * @size: number of bytes to allocate
- *
- * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
- * returns a pointer to the allocated memory.
- *
- * This function guarantees that the allocation will succeed. It will sleep
- * for as long as it takes to complete the allocation.
- *
- * If there was insufficient memory to complete the request, return NULL.
- */
-static inline void *ntfs_malloc_nofs_nofail(unsigned long size)
-{
- return __ntfs_malloc(size, GFP_NOFS | __GFP_HIGHMEM | __GFP_NOFAIL);
-}
-
-static inline void ntfs_free(void *addr)
-{
- kvfree(addr);
-}
-
-#endif /* _LINUX_NTFS_MALLOC_H */
diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
deleted file mode 100644
index 6fd1dc4..0000000
--- a/fs/ntfs/mft.c
+++ /dev/null
@@ -1,2907 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * mft.c - NTFS kernel mft record operations. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/buffer_head.h>
-#include <linux/slab.h>
-#include <linux/swap.h>
-#include <linux/bio.h>
-
-#include "attrib.h"
-#include "aops.h"
-#include "bitmap.h"
-#include "debug.h"
-#include "dir.h"
-#include "lcnalloc.h"
-#include "malloc.h"
-#include "mft.h"
-#include "ntfs.h"
-
-#define MAX_BHS (PAGE_SIZE / NTFS_BLOCK_SIZE)
-
-/**
- * map_mft_record_page - map the page in which a specific mft record resides
- * @ni: ntfs inode whose mft record page to map
- *
- * This maps the page in which the mft record of the ntfs inode @ni is situated
- * and returns a pointer to the mft record within the mapped page.
- *
- * Return value needs to be checked with IS_ERR() and if that is true PTR_ERR()
- * contains the negative error code returned.
- */
-static inline MFT_RECORD *map_mft_record_page(ntfs_inode *ni)
-{
- loff_t i_size;
- ntfs_volume *vol = ni->vol;
- struct inode *mft_vi = vol->mft_ino;
- struct page *page;
- unsigned long index, end_index;
- unsigned ofs;
-
- BUG_ON(ni->page);
- /*
- * The index into the page cache and the offset within the page cache
- * page of the wanted mft record. FIXME: We need to check for
- * overflowing the unsigned long, but I don't think we would ever get
- * here if the volume was that big...
- */
- index = (u64)ni->mft_no << vol->mft_record_size_bits >>
- PAGE_SHIFT;
- ofs = (ni->mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
-
- i_size = i_size_read(mft_vi);
- /* The maximum valid index into the page cache for $MFT's data. */
- end_index = i_size >> PAGE_SHIFT;
-
- /* If the wanted index is out of bounds the mft record doesn't exist. */
- if (unlikely(index >= end_index)) {
- if (index > end_index || (i_size & ~PAGE_MASK) < ofs +
- vol->mft_record_size) {
- page = ERR_PTR(-ENOENT);
- ntfs_error(vol->sb, "Attempt to read mft record 0x%lx, "
- "which is beyond the end of the mft. "
- "This is probably a bug in the ntfs "
- "driver.", ni->mft_no);
- goto err_out;
- }
- }
- /* Read, map, and pin the page. */
- page = ntfs_map_page(mft_vi->i_mapping, index);
- if (!IS_ERR(page)) {
- /* Catch multi sector transfer fixup errors. */
- if (likely(ntfs_is_mft_recordp((le32*)(page_address(page) +
- ofs)))) {
- ni->page = page;
- ni->page_ofs = ofs;
- return page_address(page) + ofs;
- }
- ntfs_error(vol->sb, "Mft record 0x%lx is corrupt. "
- "Run chkdsk.", ni->mft_no);
- ntfs_unmap_page(page);
- page = ERR_PTR(-EIO);
- NVolSetErrors(vol);
- }
-err_out:
- ni->page = NULL;
- ni->page_ofs = 0;
- return (void*)page;
-}
-
-/**
- * map_mft_record - map, pin and lock an mft record
- * @ni: ntfs inode whose MFT record to map
- *
- * First, take the mrec_lock mutex. We might now be sleeping, while waiting
- * for the mutex if it was already locked by someone else.
- *
- * The page of the record is mapped using map_mft_record_page() before being
- * returned to the caller.
- *
- * This in turn uses ntfs_map_page() to get the page containing the wanted mft
- * record (it in turn calls read_cache_page() which reads it in from disk if
- * necessary, increments the use count on the page so that it cannot disappear
- * under us and returns a reference to the page cache page).
- *
- * If read_cache_page() invokes ntfs_readpage() to load the page from disk, it
- * sets PG_locked and clears PG_uptodate on the page. Once I/O has completed
- * and the post-read mst fixups on each mft record in the page have been
- * performed, the page gets PG_uptodate set and PG_locked cleared (this is done
- * in our asynchronous I/O completion handler end_buffer_read_mft_async()).
- * ntfs_map_page() waits for PG_locked to become clear and checks if
- * PG_uptodate is set and returns an error code if not. This provides
- * sufficient protection against races when reading/using the page.
- *
- * However there is the write mapping to think about. Doing the above described
- * checking here will be fine, because when initiating the write we will set
- * PG_locked and clear PG_uptodate making sure nobody is touching the page
- * contents. Doing the locking this way means that the commit to disk code in
- * the page cache code paths is automatically sufficiently locked with us as
- * we will not touch a page that has been locked or is not uptodate. The only
- * locking problem then is them locking the page while we are accessing it.
- *
- * So that code will end up having to own the mrec_lock of all mft
- * records/inodes present in the page before I/O can proceed. In that case we
- * wouldn't need to bother with PG_locked and PG_uptodate as nobody will be
- * accessing anything without owning the mrec_lock mutex. But we do need to
- * use them because of the read_cache_page() invocation and the code becomes so
- * much simpler this way that it is well worth it.
- *
- * The mft record is now ours and we return a pointer to it. You need to check
- * the returned pointer with IS_ERR() and if that is true, PTR_ERR() will return
- * the error code.
- *
- * NOTE: Caller is responsible for setting the mft record dirty before calling
- * unmap_mft_record(). This is obviously only necessary if the caller really
- * modified the mft record...
- * Q: Do we want to recycle one of the VFS inode state bits instead?
- * A: No, the inode ones mean we want to change the mft record, not we want to
- * write it out.
- */
-MFT_RECORD *map_mft_record(ntfs_inode *ni)
-{
- MFT_RECORD *m;
-
- ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
-
- /* Make sure the ntfs inode doesn't go away. */
- atomic_inc(&ni->count);
-
- /* Serialize access to this mft record. */
- mutex_lock(&ni->mrec_lock);
-
- m = map_mft_record_page(ni);
- if (!IS_ERR(m))
- return m;
-
- mutex_unlock(&ni->mrec_lock);
- atomic_dec(&ni->count);
- ntfs_error(ni->vol->sb, "Failed with error code %lu.", -PTR_ERR(m));
- return m;
-}
-
-/**
- * unmap_mft_record_page - unmap the page in which a specific mft record resides
- * @ni: ntfs inode whose mft record page to unmap
- *
- * This unmaps the page in which the mft record of the ntfs inode @ni is
- * situated and returns. This is a NOOP if highmem is not configured.
- *
- * The unmap happens via ntfs_unmap_page() which in turn decrements the use
- * count on the page thus releasing it from the pinned state.
- *
- * We do not actually unmap the page from memory of course, as that will be
- * done by the page cache code itself when memory pressure increases or
- * whatever.
- */
-static inline void unmap_mft_record_page(ntfs_inode *ni)
-{
- BUG_ON(!ni->page);
-
- // TODO: If dirty, blah...
- ntfs_unmap_page(ni->page);
- ni->page = NULL;
- ni->page_ofs = 0;
- return;
-}
-
-/**
- * unmap_mft_record - release a mapped mft record
- * @ni: ntfs inode whose MFT record to unmap
- *
- * We release the page mapping and the mrec_lock mutex which unmaps the mft
- * record and releases it for others to get hold of. We also release the ntfs
- * inode by decrementing the ntfs inode reference count.
- *
- * NOTE: If caller has modified the mft record, it is imperative to set the mft
- * record dirty BEFORE calling unmap_mft_record().
- */
-void unmap_mft_record(ntfs_inode *ni)
-{
- struct page *page = ni->page;
-
- BUG_ON(!page);
-
- ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
-
- unmap_mft_record_page(ni);
- mutex_unlock(&ni->mrec_lock);
- atomic_dec(&ni->count);
- /*
- * If pure ntfs_inode, i.e. no vfs inode attached, we leave it to
- * ntfs_clear_extent_inode() in the extent inode case, and to the
- * caller in the non-extent, yet pure ntfs inode case, to do the actual
- * tear down of all structures and freeing of all allocated memory.
- */
- return;
-}
-
-/**
- * map_extent_mft_record - load an extent inode and attach it to its base
- * @base_ni: base ntfs inode
- * @mref: mft reference of the extent inode to load
- * @ntfs_ino: on successful return, pointer to the ntfs_inode structure
- *
- * Load the extent mft record @mref and attach it to its base inode @base_ni.
- * Return the mapped extent mft record if IS_ERR(result) is false. Otherwise
- * PTR_ERR(result) gives the negative error code.
- *
- * On successful return, @ntfs_ino contains a pointer to the ntfs_inode
- * structure of the mapped extent inode.
- */
-MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
- ntfs_inode **ntfs_ino)
-{
- MFT_RECORD *m;
- ntfs_inode *ni = NULL;
- ntfs_inode **extent_nis = NULL;
- int i;
- unsigned long mft_no = MREF(mref);
- u16 seq_no = MSEQNO(mref);
- bool destroy_ni = false;
-
- ntfs_debug("Mapping extent mft record 0x%lx (base mft record 0x%lx).",
- mft_no, base_ni->mft_no);
- /* Make sure the base ntfs inode doesn't go away. */
- atomic_inc(&base_ni->count);
- /*
- * Check if this extent inode has already been added to the base inode,
- * in which case just return it. If not found, add it to the base
- * inode before returning it.
- */
- mutex_lock(&base_ni->extent_lock);
- if (base_ni->nr_extents > 0) {
- extent_nis = base_ni->ext.extent_ntfs_inos;
- for (i = 0; i < base_ni->nr_extents; i++) {
- if (mft_no != extent_nis[i]->mft_no)
- continue;
- ni = extent_nis[i];
- /* Make sure the ntfs inode doesn't go away. */
- atomic_inc(&ni->count);
- break;
- }
- }
- if (likely(ni != NULL)) {
- mutex_unlock(&base_ni->extent_lock);
- atomic_dec(&base_ni->count);
- /* We found the record; just have to map and return it. */
- m = map_mft_record(ni);
- /* map_mft_record() has incremented this on success. */
- atomic_dec(&ni->count);
- if (!IS_ERR(m)) {
- /* Verify the sequence number. */
- if (likely(le16_to_cpu(m->sequence_number) == seq_no)) {
- ntfs_debug("Done 1.");
- *ntfs_ino = ni;
- return m;
- }
- unmap_mft_record(ni);
- ntfs_error(base_ni->vol->sb, "Found stale extent mft "
- "reference! Corrupt filesystem. "
- "Run chkdsk.");
- return ERR_PTR(-EIO);
- }
-map_err_out:
- ntfs_error(base_ni->vol->sb, "Failed to map extent "
- "mft record, error code %ld.", -PTR_ERR(m));
- return m;
- }
- /* Record wasn't there. Get a new ntfs inode and initialize it. */
- ni = ntfs_new_extent_inode(base_ni->vol->sb, mft_no);
- if (unlikely(!ni)) {
- mutex_unlock(&base_ni->extent_lock);
- atomic_dec(&base_ni->count);
- return ERR_PTR(-ENOMEM);
- }
- ni->vol = base_ni->vol;
- ni->seq_no = seq_no;
- ni->nr_extents = -1;
- ni->ext.base_ntfs_ino = base_ni;
- /* Now map the record. */
- m = map_mft_record(ni);
- if (IS_ERR(m)) {
- mutex_unlock(&base_ni->extent_lock);
- atomic_dec(&base_ni->count);
- ntfs_clear_extent_inode(ni);
- goto map_err_out;
- }
- /* Verify the sequence number if it is present. */
- if (seq_no && (le16_to_cpu(m->sequence_number) != seq_no)) {
- ntfs_error(base_ni->vol->sb, "Found stale extent mft "
- "reference! Corrupt filesystem. Run chkdsk.");
- destroy_ni = true;
- m = ERR_PTR(-EIO);
- goto unm_err_out;
- }
- /* Attach extent inode to base inode, reallocating memory if needed. */
- if (!(base_ni->nr_extents & 3)) {
- ntfs_inode **tmp;
- int new_size = (base_ni->nr_extents + 4) * sizeof(ntfs_inode *);
-
- tmp = kmalloc(new_size, GFP_NOFS);
- if (unlikely(!tmp)) {
- ntfs_error(base_ni->vol->sb, "Failed to allocate "
- "internal buffer.");
- destroy_ni = true;
- m = ERR_PTR(-ENOMEM);
- goto unm_err_out;
- }
- if (base_ni->nr_extents) {
- BUG_ON(!base_ni->ext.extent_ntfs_inos);
- memcpy(tmp, base_ni->ext.extent_ntfs_inos, new_size -
- 4 * sizeof(ntfs_inode *));
- kfree(base_ni->ext.extent_ntfs_inos);
- }
- base_ni->ext.extent_ntfs_inos = tmp;
- }
- base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] = ni;
- mutex_unlock(&base_ni->extent_lock);
- atomic_dec(&base_ni->count);
- ntfs_debug("Done 2.");
- *ntfs_ino = ni;
- return m;
-unm_err_out:
- unmap_mft_record(ni);
- mutex_unlock(&base_ni->extent_lock);
- atomic_dec(&base_ni->count);
- /*
- * If the extent inode was not attached to the base inode we need to
- * release it or we will leak memory.
- */
- if (destroy_ni)
- ntfs_clear_extent_inode(ni);
- return m;
-}
-
-#ifdef NTFS_RW
-
-/**
- * __mark_mft_record_dirty - set the mft record and the page containing it dirty
- * @ni: ntfs inode describing the mapped mft record
- *
- * Internal function. Users should call mark_mft_record_dirty() instead.
- *
- * Set the mapped (extent) mft record of the (base or extent) ntfs inode @ni,
- * as well as the page containing the mft record, dirty. Also, mark the base
- * vfs inode dirty. This ensures that any changes to the mft record are
- * written out to disk.
- *
- * NOTE: We only set I_DIRTY_DATASYNC (and not I_DIRTY_PAGES)
- * on the base vfs inode, because even though file data may have been modified,
- * it is dirty in the inode meta data rather than the data page cache of the
- * inode, and thus there are no data pages that need writing out. Therefore, a
- * full mark_inode_dirty() is overkill. A mark_inode_dirty_sync(), on the
- * other hand, is not sufficient, because ->write_inode needs to be called even
- * in case of fdatasync. This needs to happen or the file data would not
- * necessarily hit the device synchronously, even though the vfs inode has the
- * O_SYNC flag set. Also, I_DIRTY_DATASYNC simply "feels" better than just
- * I_DIRTY_SYNC, since the file data has not actually hit the block device yet,
- * which is not what I_DIRTY_SYNC on its own would suggest.
- */
-void __mark_mft_record_dirty(ntfs_inode *ni)
-{
- ntfs_inode *base_ni;
-
- ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
- BUG_ON(NInoAttr(ni));
- mark_ntfs_record_dirty(ni->page, ni->page_ofs);
- /* Determine the base vfs inode and mark it dirty, too. */
- mutex_lock(&ni->extent_lock);
- if (likely(ni->nr_extents >= 0))
- base_ni = ni;
- else
- base_ni = ni->ext.base_ntfs_ino;
- mutex_unlock(&ni->extent_lock);
- __mark_inode_dirty(VFS_I(base_ni), I_DIRTY_DATASYNC);
-}
-
-static const char *ntfs_please_email = "Please email "
- "linux-ntfs-dev@lists.sourceforge.net and say that you saw "
- "this message. Thank you.";
-
-/**
- * ntfs_sync_mft_mirror_umount - synchronise an mft record to the mft mirror
- * @vol: ntfs volume on which the mft record to synchronize resides
- * @mft_no: mft record number of mft record to synchronize
- * @m: mapped, mst protected (extent) mft record to synchronize
- *
- * Write the mapped, mst protected (extent) mft record @m with mft record
- * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol,
- * bypassing the page cache and the $MFTMirr inode itself.
- *
- * This function is only for use at umount time when the mft mirror inode has
- * already been disposed off. We BUG() if we are called while the mft mirror
- * inode is still attached to the volume.
- *
- * On success return 0. On error return -errno.
- *
- * NOTE: This function is not implemented yet as I am not convinced it can
- * actually be triggered considering the sequence of commits we do in super.c::
- * ntfs_put_super(). But just in case we provide this place holder as the
- * alternative would be either to BUG() or to get a NULL pointer dereference
- * and Oops.
- */
-static int ntfs_sync_mft_mirror_umount(ntfs_volume *vol,
- const unsigned long mft_no, MFT_RECORD *m)
-{
- BUG_ON(vol->mftmirr_ino);
- ntfs_error(vol->sb, "Umount time mft mirror syncing is not "
- "implemented yet. %s", ntfs_please_email);
- return -EOPNOTSUPP;
-}
-
-/**
- * ntfs_sync_mft_mirror - synchronize an mft record to the mft mirror
- * @vol: ntfs volume on which the mft record to synchronize resides
- * @mft_no: mft record number of mft record to synchronize
- * @m: mapped, mst protected (extent) mft record to synchronize
- * @sync: if true, wait for i/o completion
- *
- * Write the mapped, mst protected (extent) mft record @m with mft record
- * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol.
- *
- * On success return 0. On error return -errno and set the volume errors flag
- * in the ntfs volume @vol.
- *
- * NOTE: We always perform synchronous i/o and ignore the @sync parameter.
- *
- * TODO: If @sync is false, want to do truly asynchronous i/o, i.e. just
- * schedule i/o via ->writepage or do it via kntfsd or whatever.
- */
-int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
- MFT_RECORD *m, int sync)
-{
- struct page *page;
- unsigned int blocksize = vol->sb->s_blocksize;
- int max_bhs = vol->mft_record_size / blocksize;
- struct buffer_head *bhs[MAX_BHS];
- struct buffer_head *bh, *head;
- u8 *kmirr;
- runlist_element *rl;
- unsigned int block_start, block_end, m_start, m_end, page_ofs;
- int i_bhs, nr_bhs, err = 0;
- unsigned char blocksize_bits = vol->sb->s_blocksize_bits;
-
- ntfs_debug("Entering for inode 0x%lx.", mft_no);
- BUG_ON(!max_bhs);
- if (WARN_ON(max_bhs > MAX_BHS))
- return -EINVAL;
- if (unlikely(!vol->mftmirr_ino)) {
- /* This could happen during umount... */
- err = ntfs_sync_mft_mirror_umount(vol, mft_no, m);
- if (likely(!err))
- return err;
- goto err_out;
- }
- /* Get the page containing the mirror copy of the mft record @m. */
- page = ntfs_map_page(vol->mftmirr_ino->i_mapping, mft_no >>
- (PAGE_SHIFT - vol->mft_record_size_bits));
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Failed to map mft mirror page.");
- err = PTR_ERR(page);
- goto err_out;
- }
- lock_page(page);
- BUG_ON(!PageUptodate(page));
- ClearPageUptodate(page);
- /* Offset of the mft mirror record inside the page. */
- page_ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
- /* The address in the page of the mirror copy of the mft record @m. */
- kmirr = page_address(page) + page_ofs;
- /* Copy the mst protected mft record to the mirror. */
- memcpy(kmirr, m, vol->mft_record_size);
- /* Create uptodate buffers if not present. */
- if (unlikely(!page_has_buffers(page))) {
- struct buffer_head *tail;
-
- bh = head = alloc_page_buffers(page, blocksize, true);
- do {
- set_buffer_uptodate(bh);
- tail = bh;
- bh = bh->b_this_page;
- } while (bh);
- tail->b_this_page = head;
- attach_page_private(page, head);
- }
- bh = head = page_buffers(page);
- BUG_ON(!bh);
- rl = NULL;
- nr_bhs = 0;
- block_start = 0;
- m_start = kmirr - (u8*)page_address(page);
- m_end = m_start + vol->mft_record_size;
- do {
- block_end = block_start + blocksize;
- /* If the buffer is outside the mft record, skip it. */
- if (block_end <= m_start)
- continue;
- if (unlikely(block_start >= m_end))
- break;
- /* Need to map the buffer if it is not mapped already. */
- if (unlikely(!buffer_mapped(bh))) {
- VCN vcn;
- LCN lcn;
- unsigned int vcn_ofs;
-
- bh->b_bdev = vol->sb->s_bdev;
- /* Obtain the vcn and offset of the current block. */
- vcn = ((VCN)mft_no << vol->mft_record_size_bits) +
- (block_start - m_start);
- vcn_ofs = vcn & vol->cluster_size_mask;
- vcn >>= vol->cluster_size_bits;
- if (!rl) {
- down_read(&NTFS_I(vol->mftmirr_ino)->
- runlist.lock);
- rl = NTFS_I(vol->mftmirr_ino)->runlist.rl;
- /*
- * $MFTMirr always has the whole of its runlist
- * in memory.
- */
- BUG_ON(!rl);
- }
- /* Seek to element containing target vcn. */
- while (rl->length && rl[1].vcn <= vcn)
- rl++;
- lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
- /* For $MFTMirr, only lcn >= 0 is a successful remap. */
- if (likely(lcn >= 0)) {
- /* Setup buffer head to correct block. */
- bh->b_blocknr = ((lcn <<
- vol->cluster_size_bits) +
- vcn_ofs) >> blocksize_bits;
- set_buffer_mapped(bh);
- } else {
- bh->b_blocknr = -1;
- ntfs_error(vol->sb, "Cannot write mft mirror "
- "record 0x%lx because its "
- "location on disk could not "
- "be determined (error code "
- "%lli).", mft_no,
- (long long)lcn);
- err = -EIO;
- }
- }
- BUG_ON(!buffer_uptodate(bh));
- BUG_ON(!nr_bhs && (m_start != block_start));
- BUG_ON(nr_bhs >= max_bhs);
- bhs[nr_bhs++] = bh;
- BUG_ON((nr_bhs >= max_bhs) && (m_end != block_end));
- } while (block_start = block_end, (bh = bh->b_this_page) != head);
- if (unlikely(rl))
- up_read(&NTFS_I(vol->mftmirr_ino)->runlist.lock);
- if (likely(!err)) {
- /* Lock buffers and start synchronous write i/o on them. */
- for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
- struct buffer_head *tbh = bhs[i_bhs];
-
- if (!trylock_buffer(tbh))
- BUG();
- BUG_ON(!buffer_uptodate(tbh));
- clear_buffer_dirty(tbh);
- get_bh(tbh);
- tbh->b_end_io = end_buffer_write_sync;
- submit_bh(REQ_OP_WRITE, tbh);
- }
- /* Wait on i/o completion of buffers. */
- for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
- struct buffer_head *tbh = bhs[i_bhs];
-
- wait_on_buffer(tbh);
- if (unlikely(!buffer_uptodate(tbh))) {
- err = -EIO;
- /*
- * Set the buffer uptodate so the page and
- * buffer states do not become out of sync.
- */
- set_buffer_uptodate(tbh);
- }
- }
- } else /* if (unlikely(err)) */ {
- /* Clean the buffers. */
- for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++)
- clear_buffer_dirty(bhs[i_bhs]);
- }
- /* Current state: all buffers are clean, unlocked, and uptodate. */
- /* Remove the mst protection fixups again. */
- post_write_mst_fixup((NTFS_RECORD*)kmirr);
- flush_dcache_page(page);
- SetPageUptodate(page);
- unlock_page(page);
- ntfs_unmap_page(page);
- if (likely(!err)) {
- ntfs_debug("Done.");
- } else {
- ntfs_error(vol->sb, "I/O error while writing mft mirror "
- "record 0x%lx!", mft_no);
-err_out:
- ntfs_error(vol->sb, "Failed to synchronize $MFTMirr (error "
- "code %i). Volume will be left marked dirty "
- "on umount. Run ntfsfix on the partition "
- "after umounting to correct this.", -err);
- NVolSetErrors(vol);
- }
- return err;
-}
-
-/**
- * write_mft_record_nolock - write out a mapped (extent) mft record
- * @ni: ntfs inode describing the mapped (extent) mft record
- * @m: mapped (extent) mft record to write
- * @sync: if true, wait for i/o completion
- *
- * Write the mapped (extent) mft record @m described by the (regular or extent)
- * ntfs inode @ni to backing store. If the mft record @m has a counterpart in
- * the mft mirror, that is also updated.
- *
- * We only write the mft record if the ntfs inode @ni is dirty and the first
- * buffer belonging to its mft record is dirty, too. We ignore the dirty state
- * of subsequent buffers because we could have raced with
- * fs/ntfs/aops.c::mark_ntfs_record_dirty().
- *
- * On success, clean the mft record and return 0. On error, leave the mft
- * record dirty and return -errno.
- *
- * NOTE: We always perform synchronous i/o and ignore the @sync parameter.
- * However, if the mft record has a counterpart in the mft mirror and @sync is
- * true, we write the mft record, wait for i/o completion, and only then write
- * the mft mirror copy. This ensures that if the system crashes either the mft
- * or the mft mirror will contain a self-consistent mft record @m. If @sync is
- * false on the other hand, we start i/o on both and then wait for completion
- * on them. This provides a speedup but no longer guarantees that you will end
- * up with a self-consistent mft record in the case of a crash but if you asked
- * for asynchronous writing you probably do not care about that anyway.
- *
- * TODO: If @sync is false, want to do truly asynchronous i/o, i.e. just
- * schedule i/o via ->writepage or do it via kntfsd or whatever.
- */
-int write_mft_record_nolock(ntfs_inode *ni, MFT_RECORD *m, int sync)
-{
- ntfs_volume *vol = ni->vol;
- struct page *page = ni->page;
- unsigned int blocksize = vol->sb->s_blocksize;
- unsigned char blocksize_bits = vol->sb->s_blocksize_bits;
- int max_bhs = vol->mft_record_size / blocksize;
- struct buffer_head *bhs[MAX_BHS];
- struct buffer_head *bh, *head;
- runlist_element *rl;
- unsigned int block_start, block_end, m_start, m_end;
- int i_bhs, nr_bhs, err = 0;
-
- ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
- BUG_ON(NInoAttr(ni));
- BUG_ON(!max_bhs);
- BUG_ON(!PageLocked(page));
- if (WARN_ON(max_bhs > MAX_BHS)) {
- err = -EINVAL;
- goto err_out;
- }
- /*
- * If the ntfs_inode is clean no need to do anything. If it is dirty,
- * mark it as clean now so that it can be redirtied later on if needed.
- * There is no danger of races since the caller is holding the locks
- * for the mft record @m and the page it is in.
- */
- if (!NInoTestClearDirty(ni))
- goto done;
- bh = head = page_buffers(page);
- BUG_ON(!bh);
- rl = NULL;
- nr_bhs = 0;
- block_start = 0;
- m_start = ni->page_ofs;
- m_end = m_start + vol->mft_record_size;
- do {
- block_end = block_start + blocksize;
- /* If the buffer is outside the mft record, skip it. */
- if (block_end <= m_start)
- continue;
- if (unlikely(block_start >= m_end))
- break;
- /*
- * If this block is not the first one in the record, we ignore
- * the buffer's dirty state because we could have raced with a
- * parallel mark_ntfs_record_dirty().
- */
- if (block_start == m_start) {
- /* This block is the first one in the record. */
- if (!buffer_dirty(bh)) {
- BUG_ON(nr_bhs);
- /* Clean records are not written out. */
- break;
- }
- }
- /* Need to map the buffer if it is not mapped already. */
- if (unlikely(!buffer_mapped(bh))) {
- VCN vcn;
- LCN lcn;
- unsigned int vcn_ofs;
-
- bh->b_bdev = vol->sb->s_bdev;
- /* Obtain the vcn and offset of the current block. */
- vcn = ((VCN)ni->mft_no << vol->mft_record_size_bits) +
- (block_start - m_start);
- vcn_ofs = vcn & vol->cluster_size_mask;
- vcn >>= vol->cluster_size_bits;
- if (!rl) {
- down_read(&NTFS_I(vol->mft_ino)->runlist.lock);
- rl = NTFS_I(vol->mft_ino)->runlist.rl;
- BUG_ON(!rl);
- }
- /* Seek to element containing target vcn. */
- while (rl->length && rl[1].vcn <= vcn)
- rl++;
- lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
- /* For $MFT, only lcn >= 0 is a successful remap. */
- if (likely(lcn >= 0)) {
- /* Setup buffer head to correct block. */
- bh->b_blocknr = ((lcn <<
- vol->cluster_size_bits) +
- vcn_ofs) >> blocksize_bits;
- set_buffer_mapped(bh);
- } else {
- bh->b_blocknr = -1;
- ntfs_error(vol->sb, "Cannot write mft record "
- "0x%lx because its location "
- "on disk could not be "
- "determined (error code %lli).",
- ni->mft_no, (long long)lcn);
- err = -EIO;
- }
- }
- BUG_ON(!buffer_uptodate(bh));
- BUG_ON(!nr_bhs && (m_start != block_start));
- BUG_ON(nr_bhs >= max_bhs);
- bhs[nr_bhs++] = bh;
- BUG_ON((nr_bhs >= max_bhs) && (m_end != block_end));
- } while (block_start = block_end, (bh = bh->b_this_page) != head);
- if (unlikely(rl))
- up_read(&NTFS_I(vol->mft_ino)->runlist.lock);
- if (!nr_bhs)
- goto done;
- if (unlikely(err))
- goto cleanup_out;
- /* Apply the mst protection fixups. */
- err = pre_write_mst_fixup((NTFS_RECORD*)m, vol->mft_record_size);
- if (err) {
- ntfs_error(vol->sb, "Failed to apply mst fixups!");
- goto cleanup_out;
- }
- flush_dcache_mft_record_page(ni);
- /* Lock buffers and start synchronous write i/o on them. */
- for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
- struct buffer_head *tbh = bhs[i_bhs];
-
- if (!trylock_buffer(tbh))
- BUG();
- BUG_ON(!buffer_uptodate(tbh));
- clear_buffer_dirty(tbh);
- get_bh(tbh);
- tbh->b_end_io = end_buffer_write_sync;
- submit_bh(REQ_OP_WRITE, tbh);
- }
- /* Synchronize the mft mirror now if not @sync. */
- if (!sync && ni->mft_no < vol->mftmirr_size)
- ntfs_sync_mft_mirror(vol, ni->mft_no, m, sync);
- /* Wait on i/o completion of buffers. */
- for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
- struct buffer_head *tbh = bhs[i_bhs];
-
- wait_on_buffer(tbh);
- if (unlikely(!buffer_uptodate(tbh))) {
- err = -EIO;
- /*
- * Set the buffer uptodate so the page and buffer
- * states do not become out of sync.
- */
- if (PageUptodate(page))
- set_buffer_uptodate(tbh);
- }
- }
- /* If @sync, now synchronize the mft mirror. */
- if (sync && ni->mft_no < vol->mftmirr_size)
- ntfs_sync_mft_mirror(vol, ni->mft_no, m, sync);
- /* Remove the mst protection fixups again. */
- post_write_mst_fixup((NTFS_RECORD*)m);
- flush_dcache_mft_record_page(ni);
- if (unlikely(err)) {
- /* I/O error during writing. This is really bad! */
- ntfs_error(vol->sb, "I/O error while writing mft record "
- "0x%lx! Marking base inode as bad. You "
- "should unmount the volume and run chkdsk.",
- ni->mft_no);
- goto err_out;
- }
-done:
- ntfs_debug("Done.");
- return 0;
-cleanup_out:
- /* Clean the buffers. */
- for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++)
- clear_buffer_dirty(bhs[i_bhs]);
-err_out:
- /*
- * Current state: all buffers are clean, unlocked, and uptodate.
- * The caller should mark the base inode as bad so that no more i/o
- * happens. ->clear_inode() will still be invoked so all extent inodes
- * and other allocated memory will be freed.
- */
- if (err == -ENOMEM) {
- ntfs_error(vol->sb, "Not enough memory to write mft record. "
- "Redirtying so the write is retried later.");
- mark_mft_record_dirty(ni);
- err = 0;
- } else
- NVolSetErrors(vol);
- return err;
-}
-
-/**
- * ntfs_may_write_mft_record - check if an mft record may be written out
- * @vol: [IN] ntfs volume on which the mft record to check resides
- * @mft_no: [IN] mft record number of the mft record to check
- * @m: [IN] mapped mft record to check
- * @locked_ni: [OUT] caller has to unlock this ntfs inode if one is returned
- *
- * Check if the mapped (base or extent) mft record @m with mft record number
- * @mft_no belonging to the ntfs volume @vol may be written out. If necessary
- * and possible the ntfs inode of the mft record is locked and the base vfs
- * inode is pinned. The locked ntfs inode is then returned in @locked_ni. The
- * caller is responsible for unlocking the ntfs inode and unpinning the base
- * vfs inode.
- *
- * Return 'true' if the mft record may be written out and 'false' if not.
- *
- * The caller has locked the page and cleared the uptodate flag on it which
- * means that we can safely write out any dirty mft records that do not have
- * their inodes in icache as determined by ilookup5() as anyone
- * opening/creating such an inode would block when attempting to map the mft
- * record in read_cache_page() until we are finished with the write out.
- *
- * Here is a description of the tests we perform:
- *
- * If the inode is found in icache we know the mft record must be a base mft
- * record. If it is dirty, we do not write it and return 'false' as the vfs
- * inode write paths will result in the access times being updated which would
- * cause the base mft record to be redirtied and written out again. (We know
- * the access time update will modify the base mft record because Windows
- * chkdsk complains if the standard information attribute is not in the base
- * mft record.)
- *
- * If the inode is in icache and not dirty, we attempt to lock the mft record
- * and if we find the lock was already taken, it is not safe to write the mft
- * record and we return 'false'.
- *
- * If we manage to obtain the lock we have exclusive access to the mft record,
- * which also allows us safe writeout of the mft record. We then set
- * @locked_ni to the locked ntfs inode and return 'true'.
- *
- * Note we cannot just lock the mft record and sleep while waiting for the lock
- * because this would deadlock due to lock reversal (normally the mft record is
- * locked before the page is locked but we already have the page locked here
- * when we try to lock the mft record).
- *
- * If the inode is not in icache we need to perform further checks.
- *
- * If the mft record is not a FILE record or it is a base mft record, we can
- * safely write it and return 'true'.
- *
- * We now know the mft record is an extent mft record. We check if the inode
- * corresponding to its base mft record is in icache and obtain a reference to
- * it if it is. If it is not, we can safely write it and return 'true'.
- *
- * We now have the base inode for the extent mft record. We check if it has an
- * ntfs inode for the extent mft record attached and if not it is safe to write
- * the extent mft record and we return 'true'.
- *
- * The ntfs inode for the extent mft record is attached to the base inode so we
- * attempt to lock the extent mft record and if we find the lock was already
- * taken, it is not safe to write the extent mft record and we return 'false'.
- *
- * If we manage to obtain the lock we have exclusive access to the extent mft
- * record, which also allows us safe writeout of the extent mft record. We
- * set the ntfs inode of the extent mft record clean and then set @locked_ni to
- * the now locked ntfs inode and return 'true'.
- *
- * Note, the reason for actually writing dirty mft records here and not just
- * relying on the vfs inode dirty code paths is that we can have mft records
- * modified without them ever having actual inodes in memory. Also we can have
- * dirty mft records with clean ntfs inodes in memory. None of the described
- * cases would result in the dirty mft records being written out if we only
- * relied on the vfs inode dirty code paths. And these cases can really occur
- * during allocation of new mft records and in particular when the
- * initialized_size of the $MFT/$DATA attribute is extended and the new space
- * is initialized using ntfs_mft_record_format(). The clean inode can then
- * appear if the mft record is reused for a new inode before it got written
- * out.
- */
-bool ntfs_may_write_mft_record(ntfs_volume *vol, const unsigned long mft_no,
- const MFT_RECORD *m, ntfs_inode **locked_ni)
-{
- struct super_block *sb = vol->sb;
- struct inode *mft_vi = vol->mft_ino;
- struct inode *vi;
- ntfs_inode *ni, *eni, **extent_nis;
- int i;
- ntfs_attr na;
-
- ntfs_debug("Entering for inode 0x%lx.", mft_no);
- /*
- * Normally we do not return a locked inode so set @locked_ni to NULL.
- */
- BUG_ON(!locked_ni);
- *locked_ni = NULL;
- /*
- * Check if the inode corresponding to this mft record is in the VFS
- * inode cache and obtain a reference to it if it is.
- */
- ntfs_debug("Looking for inode 0x%lx in icache.", mft_no);
- na.mft_no = mft_no;
- na.name = NULL;
- na.name_len = 0;
- na.type = AT_UNUSED;
- /*
- * Optimize inode 0, i.e. $MFT itself, since we have it in memory and
- * we get here for it rather often.
- */
- if (!mft_no) {
- /* Balance the below iput(). */
- vi = igrab(mft_vi);
- BUG_ON(vi != mft_vi);
- } else {
- /*
- * Have to use ilookup5_nowait() since ilookup5() waits for the
- * inode lock which causes ntfs to deadlock when a concurrent
- * inode write via the inode dirty code paths and the page
- * dirty code path of the inode dirty code path when writing
- * $MFT occurs.
- */
- vi = ilookup5_nowait(sb, mft_no, ntfs_test_inode, &na);
- }
- if (vi) {
- ntfs_debug("Base inode 0x%lx is in icache.", mft_no);
- /* The inode is in icache. */
- ni = NTFS_I(vi);
- /* Take a reference to the ntfs inode. */
- atomic_inc(&ni->count);
- /* If the inode is dirty, do not write this record. */
- if (NInoDirty(ni)) {
- ntfs_debug("Inode 0x%lx is dirty, do not write it.",
- mft_no);
- atomic_dec(&ni->count);
- iput(vi);
- return false;
- }
- ntfs_debug("Inode 0x%lx is not dirty.", mft_no);
- /* The inode is not dirty, try to take the mft record lock. */
- if (unlikely(!mutex_trylock(&ni->mrec_lock))) {
- ntfs_debug("Mft record 0x%lx is already locked, do "
- "not write it.", mft_no);
- atomic_dec(&ni->count);
- iput(vi);
- return false;
- }
- ntfs_debug("Managed to lock mft record 0x%lx, write it.",
- mft_no);
- /*
- * The write has to occur while we hold the mft record lock so
- * return the locked ntfs inode.
- */
- *locked_ni = ni;
- return true;
- }
- ntfs_debug("Inode 0x%lx is not in icache.", mft_no);
- /* The inode is not in icache. */
- /* Write the record if it is not a mft record (type "FILE"). */
- if (!ntfs_is_mft_record(m->magic)) {
- ntfs_debug("Mft record 0x%lx is not a FILE record, write it.",
- mft_no);
- return true;
- }
- /* Write the mft record if it is a base inode. */
- if (!m->base_mft_record) {
- ntfs_debug("Mft record 0x%lx is a base record, write it.",
- mft_no);
- return true;
- }
- /*
- * This is an extent mft record. Check if the inode corresponding to
- * its base mft record is in icache and obtain a reference to it if it
- * is.
- */
- na.mft_no = MREF_LE(m->base_mft_record);
- ntfs_debug("Mft record 0x%lx is an extent record. Looking for base "
- "inode 0x%lx in icache.", mft_no, na.mft_no);
- if (!na.mft_no) {
- /* Balance the below iput(). */
- vi = igrab(mft_vi);
- BUG_ON(vi != mft_vi);
- } else
- vi = ilookup5_nowait(sb, na.mft_no, ntfs_test_inode,
- &na);
- if (!vi) {
- /*
- * The base inode is not in icache, write this extent mft
- * record.
- */
- ntfs_debug("Base inode 0x%lx is not in icache, write the "
- "extent record.", na.mft_no);
- return true;
- }
- ntfs_debug("Base inode 0x%lx is in icache.", na.mft_no);
- /*
- * The base inode is in icache. Check if it has the extent inode
- * corresponding to this extent mft record attached.
- */
- ni = NTFS_I(vi);
- mutex_lock(&ni->extent_lock);
- if (ni->nr_extents <= 0) {
- /*
- * The base inode has no attached extent inodes, write this
- * extent mft record.
- */
- mutex_unlock(&ni->extent_lock);
- iput(vi);
- ntfs_debug("Base inode 0x%lx has no attached extent inodes, "
- "write the extent record.", na.mft_no);
- return true;
- }
- /* Iterate over the attached extent inodes. */
- extent_nis = ni->ext.extent_ntfs_inos;
- for (eni = NULL, i = 0; i < ni->nr_extents; ++i) {
- if (mft_no == extent_nis[i]->mft_no) {
- /*
- * Found the extent inode corresponding to this extent
- * mft record.
- */
- eni = extent_nis[i];
- break;
- }
- }
- /*
- * If the extent inode was not attached to the base inode, write this
- * extent mft record.
- */
- if (!eni) {
- mutex_unlock(&ni->extent_lock);
- iput(vi);
- ntfs_debug("Extent inode 0x%lx is not attached to its base "
- "inode 0x%lx, write the extent record.",
- mft_no, na.mft_no);
- return true;
- }
- ntfs_debug("Extent inode 0x%lx is attached to its base inode 0x%lx.",
- mft_no, na.mft_no);
- /* Take a reference to the extent ntfs inode. */
- atomic_inc(&eni->count);
- mutex_unlock(&ni->extent_lock);
- /*
- * Found the extent inode coresponding to this extent mft record.
- * Try to take the mft record lock.
- */
- if (unlikely(!mutex_trylock(&eni->mrec_lock))) {
- atomic_dec(&eni->count);
- iput(vi);
- ntfs_debug("Extent mft record 0x%lx is already locked, do "
- "not write it.", mft_no);
- return false;
- }
- ntfs_debug("Managed to lock extent mft record 0x%lx, write it.",
- mft_no);
- if (NInoTestClearDirty(eni))
- ntfs_debug("Extent inode 0x%lx is dirty, marking it clean.",
- mft_no);
- /*
- * The write has to occur while we hold the mft record lock so return
- * the locked extent ntfs inode.
- */
- *locked_ni = eni;
- return true;
-}
-
-static const char *es = " Leaving inconsistent metadata. Unmount and run "
- "chkdsk.";
-
-/**
- * ntfs_mft_bitmap_find_and_alloc_free_rec_nolock - see name
- * @vol: volume on which to search for a free mft record
- * @base_ni: open base inode if allocating an extent mft record or NULL
- *
- * Search for a free mft record in the mft bitmap attribute on the ntfs volume
- * @vol.
- *
- * If @base_ni is NULL start the search at the default allocator position.
- *
- * If @base_ni is not NULL start the search at the mft record after the base
- * mft record @base_ni.
- *
- * Return the free mft record on success and -errno on error. An error code of
- * -ENOSPC means that there are no free mft records in the currently
- * initialized mft bitmap.
- *
- * Locking: Caller must hold vol->mftbmp_lock for writing.
- */
-static int ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(ntfs_volume *vol,
- ntfs_inode *base_ni)
-{
- s64 pass_end, ll, data_pos, pass_start, ofs, bit;
- unsigned long flags;
- struct address_space *mftbmp_mapping;
- u8 *buf, *byte;
- struct page *page;
- unsigned int page_ofs, size;
- u8 pass, b;
-
- ntfs_debug("Searching for free mft record in the currently "
- "initialized mft bitmap.");
- mftbmp_mapping = vol->mftbmp_ino->i_mapping;
- /*
- * Set the end of the pass making sure we do not overflow the mft
- * bitmap.
- */
- read_lock_irqsave(&NTFS_I(vol->mft_ino)->size_lock, flags);
- pass_end = NTFS_I(vol->mft_ino)->allocated_size >>
- vol->mft_record_size_bits;
- read_unlock_irqrestore(&NTFS_I(vol->mft_ino)->size_lock, flags);
- read_lock_irqsave(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
- ll = NTFS_I(vol->mftbmp_ino)->initialized_size << 3;
- read_unlock_irqrestore(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
- if (pass_end > ll)
- pass_end = ll;
- pass = 1;
- if (!base_ni)
- data_pos = vol->mft_data_pos;
- else
- data_pos = base_ni->mft_no + 1;
- if (data_pos < 24)
- data_pos = 24;
- if (data_pos >= pass_end) {
- data_pos = 24;
- pass = 2;
- /* This happens on a freshly formatted volume. */
- if (data_pos >= pass_end)
- return -ENOSPC;
- }
- pass_start = data_pos;
- ntfs_debug("Starting bitmap search: pass %u, pass_start 0x%llx, "
- "pass_end 0x%llx, data_pos 0x%llx.", pass,
- (long long)pass_start, (long long)pass_end,
- (long long)data_pos);
- /* Loop until a free mft record is found. */
- for (; pass <= 2;) {
- /* Cap size to pass_end. */
- ofs = data_pos >> 3;
- page_ofs = ofs & ~PAGE_MASK;
- size = PAGE_SIZE - page_ofs;
- ll = ((pass_end + 7) >> 3) - ofs;
- if (size > ll)
- size = ll;
- size <<= 3;
- /*
- * If we are still within the active pass, search the next page
- * for a zero bit.
- */
- if (size) {
- page = ntfs_map_page(mftbmp_mapping,
- ofs >> PAGE_SHIFT);
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Failed to read mft "
- "bitmap, aborting.");
- return PTR_ERR(page);
- }
- buf = (u8*)page_address(page) + page_ofs;
- bit = data_pos & 7;
- data_pos &= ~7ull;
- ntfs_debug("Before inner for loop: size 0x%x, "
- "data_pos 0x%llx, bit 0x%llx", size,
- (long long)data_pos, (long long)bit);
- for (; bit < size && data_pos + bit < pass_end;
- bit &= ~7ull, bit += 8) {
- byte = buf + (bit >> 3);
- if (*byte == 0xff)
- continue;
- b = ffz((unsigned long)*byte);
- if (b < 8 && b >= (bit & 7)) {
- ll = data_pos + (bit & ~7ull) + b;
- if (unlikely(ll > (1ll << 32))) {
- ntfs_unmap_page(page);
- return -ENOSPC;
- }
- *byte |= 1 << b;
- flush_dcache_page(page);
- set_page_dirty(page);
- ntfs_unmap_page(page);
- ntfs_debug("Done. (Found and "
- "allocated mft record "
- "0x%llx.)",
- (long long)ll);
- return ll;
- }
- }
- ntfs_debug("After inner for loop: size 0x%x, "
- "data_pos 0x%llx, bit 0x%llx", size,
- (long long)data_pos, (long long)bit);
- data_pos += size;
- ntfs_unmap_page(page);
- /*
- * If the end of the pass has not been reached yet,
- * continue searching the mft bitmap for a zero bit.
- */
- if (data_pos < pass_end)
- continue;
- }
- /* Do the next pass. */
- if (++pass == 2) {
- /*
- * Starting the second pass, in which we scan the first
- * part of the zone which we omitted earlier.
- */
- pass_end = pass_start;
- data_pos = pass_start = 24;
- ntfs_debug("pass %i, pass_start 0x%llx, pass_end "
- "0x%llx.", pass, (long long)pass_start,
- (long long)pass_end);
- if (data_pos >= pass_end)
- break;
- }
- }
- /* No free mft records in currently initialized mft bitmap. */
- ntfs_debug("Done. (No free mft records left in currently initialized "
- "mft bitmap.)");
- return -ENOSPC;
-}
-
-/**
- * ntfs_mft_bitmap_extend_allocation_nolock - extend mft bitmap by a cluster
- * @vol: volume on which to extend the mft bitmap attribute
- *
- * Extend the mft bitmap attribute on the ntfs volume @vol by one cluster.
- *
- * Note: Only changes allocated_size, i.e. does not touch initialized_size or
- * data_size.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: - Caller must hold vol->mftbmp_lock for writing.
- * - This function takes NTFS_I(vol->mftbmp_ino)->runlist.lock for
- * writing and releases it before returning.
- * - This function takes vol->lcnbmp_lock for writing and releases it
- * before returning.
- */
-static int ntfs_mft_bitmap_extend_allocation_nolock(ntfs_volume *vol)
-{
- LCN lcn;
- s64 ll;
- unsigned long flags;
- struct page *page;
- ntfs_inode *mft_ni, *mftbmp_ni;
- runlist_element *rl, *rl2 = NULL;
- ntfs_attr_search_ctx *ctx = NULL;
- MFT_RECORD *mrec;
- ATTR_RECORD *a = NULL;
- int ret, mp_size;
- u32 old_alen = 0;
- u8 *b, tb;
- struct {
- u8 added_cluster:1;
- u8 added_run:1;
- u8 mp_rebuilt:1;
- } status = { 0, 0, 0 };
-
- ntfs_debug("Extending mft bitmap allocation.");
- mft_ni = NTFS_I(vol->mft_ino);
- mftbmp_ni = NTFS_I(vol->mftbmp_ino);
- /*
- * Determine the last lcn of the mft bitmap. The allocated size of the
- * mft bitmap cannot be zero so we are ok to do this.
- */
- down_write(&mftbmp_ni->runlist.lock);
- read_lock_irqsave(&mftbmp_ni->size_lock, flags);
- ll = mftbmp_ni->allocated_size;
- read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
- rl = ntfs_attr_find_vcn_nolock(mftbmp_ni,
- (ll - 1) >> vol->cluster_size_bits, NULL);
- if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
- up_write(&mftbmp_ni->runlist.lock);
- ntfs_error(vol->sb, "Failed to determine last allocated "
- "cluster of mft bitmap attribute.");
- if (!IS_ERR(rl))
- ret = -EIO;
- else
- ret = PTR_ERR(rl);
- return ret;
- }
- lcn = rl->lcn + rl->length;
- ntfs_debug("Last lcn of mft bitmap attribute is 0x%llx.",
- (long long)lcn);
- /*
- * Attempt to get the cluster following the last allocated cluster by
- * hand as it may be in the MFT zone so the allocator would not give it
- * to us.
- */
- ll = lcn >> 3;
- page = ntfs_map_page(vol->lcnbmp_ino->i_mapping,
- ll >> PAGE_SHIFT);
- if (IS_ERR(page)) {
- up_write(&mftbmp_ni->runlist.lock);
- ntfs_error(vol->sb, "Failed to read from lcn bitmap.");
- return PTR_ERR(page);
- }
- b = (u8*)page_address(page) + (ll & ~PAGE_MASK);
- tb = 1 << (lcn & 7ull);
- down_write(&vol->lcnbmp_lock);
- if (*b != 0xff && !(*b & tb)) {
- /* Next cluster is free, allocate it. */
- *b |= tb;
- flush_dcache_page(page);
- set_page_dirty(page);
- up_write(&vol->lcnbmp_lock);
- ntfs_unmap_page(page);
- /* Update the mft bitmap runlist. */
- rl->length++;
- rl[1].vcn++;
- status.added_cluster = 1;
- ntfs_debug("Appending one cluster to mft bitmap.");
- } else {
- up_write(&vol->lcnbmp_lock);
- ntfs_unmap_page(page);
- /* Allocate a cluster from the DATA_ZONE. */
- rl2 = ntfs_cluster_alloc(vol, rl[1].vcn, 1, lcn, DATA_ZONE,
- true);
- if (IS_ERR(rl2)) {
- up_write(&mftbmp_ni->runlist.lock);
- ntfs_error(vol->sb, "Failed to allocate a cluster for "
- "the mft bitmap.");
- return PTR_ERR(rl2);
- }
- rl = ntfs_runlists_merge(mftbmp_ni->runlist.rl, rl2);
- if (IS_ERR(rl)) {
- up_write(&mftbmp_ni->runlist.lock);
- ntfs_error(vol->sb, "Failed to merge runlists for mft "
- "bitmap.");
- if (ntfs_cluster_free_from_rl(vol, rl2)) {
- ntfs_error(vol->sb, "Failed to deallocate "
- "allocated cluster.%s", es);
- NVolSetErrors(vol);
- }
- ntfs_free(rl2);
- return PTR_ERR(rl);
- }
- mftbmp_ni->runlist.rl = rl;
- status.added_run = 1;
- ntfs_debug("Adding one run to mft bitmap.");
- /* Find the last run in the new runlist. */
- for (; rl[1].length; rl++)
- ;
- }
- /*
- * Update the attribute record as well. Note: @rl is the last
- * (non-terminator) runlist element of mft bitmap.
- */
- mrec = map_mft_record(mft_ni);
- if (IS_ERR(mrec)) {
- ntfs_error(vol->sb, "Failed to map mft record.");
- ret = PTR_ERR(mrec);
- goto undo_alloc;
- }
- ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
- if (unlikely(!ctx)) {
- ntfs_error(vol->sb, "Failed to get search context.");
- ret = -ENOMEM;
- goto undo_alloc;
- }
- ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
- mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
- 0, ctx);
- if (unlikely(ret)) {
- ntfs_error(vol->sb, "Failed to find last attribute extent of "
- "mft bitmap attribute.");
- if (ret == -ENOENT)
- ret = -EIO;
- goto undo_alloc;
- }
- a = ctx->attr;
- ll = sle64_to_cpu(a->data.non_resident.lowest_vcn);
- /* Search back for the previous last allocated cluster of mft bitmap. */
- for (rl2 = rl; rl2 > mftbmp_ni->runlist.rl; rl2--) {
- if (ll >= rl2->vcn)
- break;
- }
- BUG_ON(ll < rl2->vcn);
- BUG_ON(ll >= rl2->vcn + rl2->length);
- /* Get the size for the new mapping pairs array for this extent. */
- mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1);
- if (unlikely(mp_size <= 0)) {
- ntfs_error(vol->sb, "Get size for mapping pairs failed for "
- "mft bitmap attribute extent.");
- ret = mp_size;
- if (!ret)
- ret = -EIO;
- goto undo_alloc;
- }
- /* Expand the attribute record if necessary. */
- old_alen = le32_to_cpu(a->length);
- ret = ntfs_attr_record_resize(ctx->mrec, a, mp_size +
- le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
- if (unlikely(ret)) {
- if (ret != -ENOSPC) {
- ntfs_error(vol->sb, "Failed to resize attribute "
- "record for mft bitmap attribute.");
- goto undo_alloc;
- }
- // TODO: Deal with this by moving this extent to a new mft
- // record or by starting a new extent in a new mft record or by
- // moving other attributes out of this mft record.
- // Note: It will need to be a special mft record and if none of
- // those are available it gets rather complicated...
- ntfs_error(vol->sb, "Not enough space in this mft record to "
- "accommodate extended mft bitmap attribute "
- "extent. Cannot handle this yet.");
- ret = -EOPNOTSUPP;
- goto undo_alloc;
- }
- status.mp_rebuilt = 1;
- /* Generate the mapping pairs array directly into the attr record. */
- ret = ntfs_mapping_pairs_build(vol, (u8*)a +
- le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
- mp_size, rl2, ll, -1, NULL);
- if (unlikely(ret)) {
- ntfs_error(vol->sb, "Failed to build mapping pairs array for "
- "mft bitmap attribute.");
- goto undo_alloc;
- }
- /* Update the highest_vcn. */
- a->data.non_resident.highest_vcn = cpu_to_sle64(rl[1].vcn - 1);
- /*
- * We now have extended the mft bitmap allocated_size by one cluster.
- * Reflect this in the ntfs_inode structure and the attribute record.
- */
- if (a->data.non_resident.lowest_vcn) {
- /*
- * We are not in the first attribute extent, switch to it, but
- * first ensure the changes will make it to disk later.
- */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_reinit_search_ctx(ctx);
- ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
- mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL,
- 0, ctx);
- if (unlikely(ret)) {
- ntfs_error(vol->sb, "Failed to find first attribute "
- "extent of mft bitmap attribute.");
- goto restore_undo_alloc;
- }
- a = ctx->attr;
- }
- write_lock_irqsave(&mftbmp_ni->size_lock, flags);
- mftbmp_ni->allocated_size += vol->cluster_size;
- a->data.non_resident.allocated_size =
- cpu_to_sle64(mftbmp_ni->allocated_size);
- write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
- /* Ensure the changes make it to disk. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(mft_ni);
- up_write(&mftbmp_ni->runlist.lock);
- ntfs_debug("Done.");
- return 0;
-restore_undo_alloc:
- ntfs_attr_reinit_search_ctx(ctx);
- if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
- mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
- 0, ctx)) {
- ntfs_error(vol->sb, "Failed to find last attribute extent of "
- "mft bitmap attribute.%s", es);
- write_lock_irqsave(&mftbmp_ni->size_lock, flags);
- mftbmp_ni->allocated_size += vol->cluster_size;
- write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(mft_ni);
- up_write(&mftbmp_ni->runlist.lock);
- /*
- * The only thing that is now wrong is ->allocated_size of the
- * base attribute extent which chkdsk should be able to fix.
- */
- NVolSetErrors(vol);
- return ret;
- }
- a = ctx->attr;
- a->data.non_resident.highest_vcn = cpu_to_sle64(rl[1].vcn - 2);
-undo_alloc:
- if (status.added_cluster) {
- /* Truncate the last run in the runlist by one cluster. */
- rl->length--;
- rl[1].vcn--;
- } else if (status.added_run) {
- lcn = rl->lcn;
- /* Remove the last run from the runlist. */
- rl->lcn = rl[1].lcn;
- rl->length = 0;
- }
- /* Deallocate the cluster. */
- down_write(&vol->lcnbmp_lock);
- if (ntfs_bitmap_clear_bit(vol->lcnbmp_ino, lcn)) {
- ntfs_error(vol->sb, "Failed to free allocated cluster.%s", es);
- NVolSetErrors(vol);
- }
- up_write(&vol->lcnbmp_lock);
- if (status.mp_rebuilt) {
- if (ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu(
- a->data.non_resident.mapping_pairs_offset),
- old_alen - le16_to_cpu(
- a->data.non_resident.mapping_pairs_offset),
- rl2, ll, -1, NULL)) {
- ntfs_error(vol->sb, "Failed to restore mapping pairs "
- "array.%s", es);
- NVolSetErrors(vol);
- }
- if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
- ntfs_error(vol->sb, "Failed to restore attribute "
- "record.%s", es);
- NVolSetErrors(vol);
- }
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- }
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (!IS_ERR(mrec))
- unmap_mft_record(mft_ni);
- up_write(&mftbmp_ni->runlist.lock);
- return ret;
-}
-
-/**
- * ntfs_mft_bitmap_extend_initialized_nolock - extend mftbmp initialized data
- * @vol: volume on which to extend the mft bitmap attribute
- *
- * Extend the initialized portion of the mft bitmap attribute on the ntfs
- * volume @vol by 8 bytes.
- *
- * Note: Only changes initialized_size and data_size, i.e. requires that
- * allocated_size is big enough to fit the new initialized_size.
- *
- * Return 0 on success and -error on error.
- *
- * Locking: Caller must hold vol->mftbmp_lock for writing.
- */
-static int ntfs_mft_bitmap_extend_initialized_nolock(ntfs_volume *vol)
-{
- s64 old_data_size, old_initialized_size;
- unsigned long flags;
- struct inode *mftbmp_vi;
- ntfs_inode *mft_ni, *mftbmp_ni;
- ntfs_attr_search_ctx *ctx;
- MFT_RECORD *mrec;
- ATTR_RECORD *a;
- int ret;
-
- ntfs_debug("Extending mft bitmap initiailized (and data) size.");
- mft_ni = NTFS_I(vol->mft_ino);
- mftbmp_vi = vol->mftbmp_ino;
- mftbmp_ni = NTFS_I(mftbmp_vi);
- /* Get the attribute record. */
- mrec = map_mft_record(mft_ni);
- if (IS_ERR(mrec)) {
- ntfs_error(vol->sb, "Failed to map mft record.");
- return PTR_ERR(mrec);
- }
- ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
- if (unlikely(!ctx)) {
- ntfs_error(vol->sb, "Failed to get search context.");
- ret = -ENOMEM;
- goto unm_err_out;
- }
- ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
- mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(ret)) {
- ntfs_error(vol->sb, "Failed to find first attribute extent of "
- "mft bitmap attribute.");
- if (ret == -ENOENT)
- ret = -EIO;
- goto put_err_out;
- }
- a = ctx->attr;
- write_lock_irqsave(&mftbmp_ni->size_lock, flags);
- old_data_size = i_size_read(mftbmp_vi);
- old_initialized_size = mftbmp_ni->initialized_size;
- /*
- * We can simply update the initialized_size before filling the space
- * with zeroes because the caller is holding the mft bitmap lock for
- * writing which ensures that no one else is trying to access the data.
- */
- mftbmp_ni->initialized_size += 8;
- a->data.non_resident.initialized_size =
- cpu_to_sle64(mftbmp_ni->initialized_size);
- if (mftbmp_ni->initialized_size > old_data_size) {
- i_size_write(mftbmp_vi, mftbmp_ni->initialized_size);
- a->data.non_resident.data_size =
- cpu_to_sle64(mftbmp_ni->initialized_size);
- }
- write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
- /* Ensure the changes make it to disk. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(mft_ni);
- /* Initialize the mft bitmap attribute value with zeroes. */
- ret = ntfs_attr_set(mftbmp_ni, old_initialized_size, 8, 0);
- if (likely(!ret)) {
- ntfs_debug("Done. (Wrote eight initialized bytes to mft "
- "bitmap.");
- return 0;
- }
- ntfs_error(vol->sb, "Failed to write to mft bitmap.");
- /* Try to recover from the error. */
- mrec = map_mft_record(mft_ni);
- if (IS_ERR(mrec)) {
- ntfs_error(vol->sb, "Failed to map mft record.%s", es);
- NVolSetErrors(vol);
- return ret;
- }
- ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
- if (unlikely(!ctx)) {
- ntfs_error(vol->sb, "Failed to get search context.%s", es);
- NVolSetErrors(vol);
- goto unm_err_out;
- }
- if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
- mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx)) {
- ntfs_error(vol->sb, "Failed to find first attribute extent of "
- "mft bitmap attribute.%s", es);
- NVolSetErrors(vol);
-put_err_out:
- ntfs_attr_put_search_ctx(ctx);
-unm_err_out:
- unmap_mft_record(mft_ni);
- goto err_out;
- }
- a = ctx->attr;
- write_lock_irqsave(&mftbmp_ni->size_lock, flags);
- mftbmp_ni->initialized_size = old_initialized_size;
- a->data.non_resident.initialized_size =
- cpu_to_sle64(old_initialized_size);
- if (i_size_read(mftbmp_vi) != old_data_size) {
- i_size_write(mftbmp_vi, old_data_size);
- a->data.non_resident.data_size = cpu_to_sle64(old_data_size);
- }
- write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(mft_ni);
-#ifdef DEBUG
- read_lock_irqsave(&mftbmp_ni->size_lock, flags);
- ntfs_debug("Restored status of mftbmp: allocated_size 0x%llx, "
- "data_size 0x%llx, initialized_size 0x%llx.",
- (long long)mftbmp_ni->allocated_size,
- (long long)i_size_read(mftbmp_vi),
- (long long)mftbmp_ni->initialized_size);
- read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-#endif /* DEBUG */
-err_out:
- return ret;
-}
-
-/**
- * ntfs_mft_data_extend_allocation_nolock - extend mft data attribute
- * @vol: volume on which to extend the mft data attribute
- *
- * Extend the mft data attribute on the ntfs volume @vol by 16 mft records
- * worth of clusters or if not enough space for this by one mft record worth
- * of clusters.
- *
- * Note: Only changes allocated_size, i.e. does not touch initialized_size or
- * data_size.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: - Caller must hold vol->mftbmp_lock for writing.
- * - This function takes NTFS_I(vol->mft_ino)->runlist.lock for
- * writing and releases it before returning.
- * - This function calls functions which take vol->lcnbmp_lock for
- * writing and release it before returning.
- */
-static int ntfs_mft_data_extend_allocation_nolock(ntfs_volume *vol)
-{
- LCN lcn;
- VCN old_last_vcn;
- s64 min_nr, nr, ll;
- unsigned long flags;
- ntfs_inode *mft_ni;
- runlist_element *rl, *rl2;
- ntfs_attr_search_ctx *ctx = NULL;
- MFT_RECORD *mrec;
- ATTR_RECORD *a = NULL;
- int ret, mp_size;
- u32 old_alen = 0;
- bool mp_rebuilt = false;
-
- ntfs_debug("Extending mft data allocation.");
- mft_ni = NTFS_I(vol->mft_ino);
- /*
- * Determine the preferred allocation location, i.e. the last lcn of
- * the mft data attribute. The allocated size of the mft data
- * attribute cannot be zero so we are ok to do this.
- */
- down_write(&mft_ni->runlist.lock);
- read_lock_irqsave(&mft_ni->size_lock, flags);
- ll = mft_ni->allocated_size;
- read_unlock_irqrestore(&mft_ni->size_lock, flags);
- rl = ntfs_attr_find_vcn_nolock(mft_ni,
- (ll - 1) >> vol->cluster_size_bits, NULL);
- if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
- up_write(&mft_ni->runlist.lock);
- ntfs_error(vol->sb, "Failed to determine last allocated "
- "cluster of mft data attribute.");
- if (!IS_ERR(rl))
- ret = -EIO;
- else
- ret = PTR_ERR(rl);
- return ret;
- }
- lcn = rl->lcn + rl->length;
- ntfs_debug("Last lcn of mft data attribute is 0x%llx.", (long long)lcn);
- /* Minimum allocation is one mft record worth of clusters. */
- min_nr = vol->mft_record_size >> vol->cluster_size_bits;
- if (!min_nr)
- min_nr = 1;
- /* Want to allocate 16 mft records worth of clusters. */
- nr = vol->mft_record_size << 4 >> vol->cluster_size_bits;
- if (!nr)
- nr = min_nr;
- /* Ensure we do not go above 2^32-1 mft records. */
- read_lock_irqsave(&mft_ni->size_lock, flags);
- ll = mft_ni->allocated_size;
- read_unlock_irqrestore(&mft_ni->size_lock, flags);
- if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
- vol->mft_record_size_bits >= (1ll << 32))) {
- nr = min_nr;
- if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
- vol->mft_record_size_bits >= (1ll << 32))) {
- ntfs_warning(vol->sb, "Cannot allocate mft record "
- "because the maximum number of inodes "
- "(2^32) has already been reached.");
- up_write(&mft_ni->runlist.lock);
- return -ENOSPC;
- }
- }
- ntfs_debug("Trying mft data allocation with %s cluster count %lli.",
- nr > min_nr ? "default" : "minimal", (long long)nr);
- old_last_vcn = rl[1].vcn;
- do {
- rl2 = ntfs_cluster_alloc(vol, old_last_vcn, nr, lcn, MFT_ZONE,
- true);
- if (!IS_ERR(rl2))
- break;
- if (PTR_ERR(rl2) != -ENOSPC || nr == min_nr) {
- ntfs_error(vol->sb, "Failed to allocate the minimal "
- "number of clusters (%lli) for the "
- "mft data attribute.", (long long)nr);
- up_write(&mft_ni->runlist.lock);
- return PTR_ERR(rl2);
- }
- /*
- * There is not enough space to do the allocation, but there
- * might be enough space to do a minimal allocation so try that
- * before failing.
- */
- nr = min_nr;
- ntfs_debug("Retrying mft data allocation with minimal cluster "
- "count %lli.", (long long)nr);
- } while (1);
- rl = ntfs_runlists_merge(mft_ni->runlist.rl, rl2);
- if (IS_ERR(rl)) {
- up_write(&mft_ni->runlist.lock);
- ntfs_error(vol->sb, "Failed to merge runlists for mft data "
- "attribute.");
- if (ntfs_cluster_free_from_rl(vol, rl2)) {
- ntfs_error(vol->sb, "Failed to deallocate clusters "
- "from the mft data attribute.%s", es);
- NVolSetErrors(vol);
- }
- ntfs_free(rl2);
- return PTR_ERR(rl);
- }
- mft_ni->runlist.rl = rl;
- ntfs_debug("Allocated %lli clusters.", (long long)nr);
- /* Find the last run in the new runlist. */
- for (; rl[1].length; rl++)
- ;
- /* Update the attribute record as well. */
- mrec = map_mft_record(mft_ni);
- if (IS_ERR(mrec)) {
- ntfs_error(vol->sb, "Failed to map mft record.");
- ret = PTR_ERR(mrec);
- goto undo_alloc;
- }
- ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
- if (unlikely(!ctx)) {
- ntfs_error(vol->sb, "Failed to get search context.");
- ret = -ENOMEM;
- goto undo_alloc;
- }
- ret = ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
- CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx);
- if (unlikely(ret)) {
- ntfs_error(vol->sb, "Failed to find last attribute extent of "
- "mft data attribute.");
- if (ret == -ENOENT)
- ret = -EIO;
- goto undo_alloc;
- }
- a = ctx->attr;
- ll = sle64_to_cpu(a->data.non_resident.lowest_vcn);
- /* Search back for the previous last allocated cluster of mft bitmap. */
- for (rl2 = rl; rl2 > mft_ni->runlist.rl; rl2--) {
- if (ll >= rl2->vcn)
- break;
- }
- BUG_ON(ll < rl2->vcn);
- BUG_ON(ll >= rl2->vcn + rl2->length);
- /* Get the size for the new mapping pairs array for this extent. */
- mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1);
- if (unlikely(mp_size <= 0)) {
- ntfs_error(vol->sb, "Get size for mapping pairs failed for "
- "mft data attribute extent.");
- ret = mp_size;
- if (!ret)
- ret = -EIO;
- goto undo_alloc;
- }
- /* Expand the attribute record if necessary. */
- old_alen = le32_to_cpu(a->length);
- ret = ntfs_attr_record_resize(ctx->mrec, a, mp_size +
- le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
- if (unlikely(ret)) {
- if (ret != -ENOSPC) {
- ntfs_error(vol->sb, "Failed to resize attribute "
- "record for mft data attribute.");
- goto undo_alloc;
- }
- // TODO: Deal with this by moving this extent to a new mft
- // record or by starting a new extent in a new mft record or by
- // moving other attributes out of this mft record.
- // Note: Use the special reserved mft records and ensure that
- // this extent is not required to find the mft record in
- // question. If no free special records left we would need to
- // move an existing record away, insert ours in its place, and
- // then place the moved record into the newly allocated space
- // and we would then need to update all references to this mft
- // record appropriately. This is rather complicated...
- ntfs_error(vol->sb, "Not enough space in this mft record to "
- "accommodate extended mft data attribute "
- "extent. Cannot handle this yet.");
- ret = -EOPNOTSUPP;
- goto undo_alloc;
- }
- mp_rebuilt = true;
- /* Generate the mapping pairs array directly into the attr record. */
- ret = ntfs_mapping_pairs_build(vol, (u8*)a +
- le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
- mp_size, rl2, ll, -1, NULL);
- if (unlikely(ret)) {
- ntfs_error(vol->sb, "Failed to build mapping pairs array of "
- "mft data attribute.");
- goto undo_alloc;
- }
- /* Update the highest_vcn. */
- a->data.non_resident.highest_vcn = cpu_to_sle64(rl[1].vcn - 1);
- /*
- * We now have extended the mft data allocated_size by nr clusters.
- * Reflect this in the ntfs_inode structure and the attribute record.
- * @rl is the last (non-terminator) runlist element of mft data
- * attribute.
- */
- if (a->data.non_resident.lowest_vcn) {
- /*
- * We are not in the first attribute extent, switch to it, but
- * first ensure the changes will make it to disk later.
- */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_reinit_search_ctx(ctx);
- ret = ntfs_attr_lookup(mft_ni->type, mft_ni->name,
- mft_ni->name_len, CASE_SENSITIVE, 0, NULL, 0,
- ctx);
- if (unlikely(ret)) {
- ntfs_error(vol->sb, "Failed to find first attribute "
- "extent of mft data attribute.");
- goto restore_undo_alloc;
- }
- a = ctx->attr;
- }
- write_lock_irqsave(&mft_ni->size_lock, flags);
- mft_ni->allocated_size += nr << vol->cluster_size_bits;
- a->data.non_resident.allocated_size =
- cpu_to_sle64(mft_ni->allocated_size);
- write_unlock_irqrestore(&mft_ni->size_lock, flags);
- /* Ensure the changes make it to disk. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(mft_ni);
- up_write(&mft_ni->runlist.lock);
- ntfs_debug("Done.");
- return 0;
-restore_undo_alloc:
- ntfs_attr_reinit_search_ctx(ctx);
- if (ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
- CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx)) {
- ntfs_error(vol->sb, "Failed to find last attribute extent of "
- "mft data attribute.%s", es);
- write_lock_irqsave(&mft_ni->size_lock, flags);
- mft_ni->allocated_size += nr << vol->cluster_size_bits;
- write_unlock_irqrestore(&mft_ni->size_lock, flags);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(mft_ni);
- up_write(&mft_ni->runlist.lock);
- /*
- * The only thing that is now wrong is ->allocated_size of the
- * base attribute extent which chkdsk should be able to fix.
- */
- NVolSetErrors(vol);
- return ret;
- }
- ctx->attr->data.non_resident.highest_vcn =
- cpu_to_sle64(old_last_vcn - 1);
-undo_alloc:
- if (ntfs_cluster_free(mft_ni, old_last_vcn, -1, ctx) < 0) {
- ntfs_error(vol->sb, "Failed to free clusters from mft data "
- "attribute.%s", es);
- NVolSetErrors(vol);
- }
-
- if (ntfs_rl_truncate_nolock(vol, &mft_ni->runlist, old_last_vcn)) {
- ntfs_error(vol->sb, "Failed to truncate mft data attribute "
- "runlist.%s", es);
- NVolSetErrors(vol);
- }
- if (ctx) {
- a = ctx->attr;
- if (mp_rebuilt && !IS_ERR(ctx->mrec)) {
- if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(
- a->data.non_resident.mapping_pairs_offset),
- old_alen - le16_to_cpu(
- a->data.non_resident.mapping_pairs_offset),
- rl2, ll, -1, NULL)) {
- ntfs_error(vol->sb, "Failed to restore mapping pairs "
- "array.%s", es);
- NVolSetErrors(vol);
- }
- if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
- ntfs_error(vol->sb, "Failed to restore attribute "
- "record.%s", es);
- NVolSetErrors(vol);
- }
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- } else if (IS_ERR(ctx->mrec)) {
- ntfs_error(vol->sb, "Failed to restore attribute search "
- "context.%s", es);
- NVolSetErrors(vol);
- }
- ntfs_attr_put_search_ctx(ctx);
- }
- if (!IS_ERR(mrec))
- unmap_mft_record(mft_ni);
- up_write(&mft_ni->runlist.lock);
- return ret;
-}
-
-/**
- * ntfs_mft_record_layout - layout an mft record into a memory buffer
- * @vol: volume to which the mft record will belong
- * @mft_no: mft reference specifying the mft record number
- * @m: destination buffer of size >= @vol->mft_record_size bytes
- *
- * Layout an empty, unused mft record with the mft record number @mft_no into
- * the buffer @m. The volume @vol is needed because the mft record structure
- * was modified in NTFS 3.1 so we need to know which volume version this mft
- * record will be used on.
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_mft_record_layout(const ntfs_volume *vol, const s64 mft_no,
- MFT_RECORD *m)
-{
- ATTR_RECORD *a;
-
- ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
- if (mft_no >= (1ll << 32)) {
- ntfs_error(vol->sb, "Mft record number 0x%llx exceeds "
- "maximum of 2^32.", (long long)mft_no);
- return -ERANGE;
- }
- /* Start by clearing the whole mft record to gives us a clean slate. */
- memset(m, 0, vol->mft_record_size);
- /* Aligned to 2-byte boundary. */
- if (vol->major_ver < 3 || (vol->major_ver == 3 && !vol->minor_ver))
- m->usa_ofs = cpu_to_le16((sizeof(MFT_RECORD_OLD) + 1) & ~1);
- else {
- m->usa_ofs = cpu_to_le16((sizeof(MFT_RECORD) + 1) & ~1);
- /*
- * Set the NTFS 3.1+ specific fields while we know that the
- * volume version is 3.1+.
- */
- m->reserved = 0;
- m->mft_record_number = cpu_to_le32((u32)mft_no);
- }
- m->magic = magic_FILE;
- if (vol->mft_record_size >= NTFS_BLOCK_SIZE)
- m->usa_count = cpu_to_le16(vol->mft_record_size /
- NTFS_BLOCK_SIZE + 1);
- else {
- m->usa_count = cpu_to_le16(1);
- ntfs_warning(vol->sb, "Sector size is bigger than mft record "
- "size. Setting usa_count to 1. If chkdsk "
- "reports this as corruption, please email "
- "linux-ntfs-dev@lists.sourceforge.net stating "
- "that you saw this message and that the "
- "modified filesystem created was corrupt. "
- "Thank you.");
- }
- /* Set the update sequence number to 1. */
- *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs)) = cpu_to_le16(1);
- m->lsn = 0;
- m->sequence_number = cpu_to_le16(1);
- m->link_count = 0;
- /*
- * Place the attributes straight after the update sequence array,
- * aligned to 8-byte boundary.
- */
- m->attrs_offset = cpu_to_le16((le16_to_cpu(m->usa_ofs) +
- (le16_to_cpu(m->usa_count) << 1) + 7) & ~7);
- m->flags = 0;
- /*
- * Using attrs_offset plus eight bytes (for the termination attribute).
- * attrs_offset is already aligned to 8-byte boundary, so no need to
- * align again.
- */
- m->bytes_in_use = cpu_to_le32(le16_to_cpu(m->attrs_offset) + 8);
- m->bytes_allocated = cpu_to_le32(vol->mft_record_size);
- m->base_mft_record = 0;
- m->next_attr_instance = 0;
- /* Add the termination attribute. */
- a = (ATTR_RECORD*)((u8*)m + le16_to_cpu(m->attrs_offset));
- a->type = AT_END;
- a->length = 0;
- ntfs_debug("Done.");
- return 0;
-}
-
-/**
- * ntfs_mft_record_format - format an mft record on an ntfs volume
- * @vol: volume on which to format the mft record
- * @mft_no: mft record number to format
- *
- * Format the mft record @mft_no in $MFT/$DATA, i.e. lay out an empty, unused
- * mft record into the appropriate place of the mft data attribute. This is
- * used when extending the mft data attribute.
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_mft_record_format(const ntfs_volume *vol, const s64 mft_no)
-{
- loff_t i_size;
- struct inode *mft_vi = vol->mft_ino;
- struct page *page;
- MFT_RECORD *m;
- pgoff_t index, end_index;
- unsigned int ofs;
- int err;
-
- ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
- /*
- * The index into the page cache and the offset within the page cache
- * page of the wanted mft record.
- */
- index = mft_no << vol->mft_record_size_bits >> PAGE_SHIFT;
- ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
- /* The maximum valid index into the page cache for $MFT's data. */
- i_size = i_size_read(mft_vi);
- end_index = i_size >> PAGE_SHIFT;
- if (unlikely(index >= end_index)) {
- if (unlikely(index > end_index || ofs + vol->mft_record_size >=
- (i_size & ~PAGE_MASK))) {
- ntfs_error(vol->sb, "Tried to format non-existing mft "
- "record 0x%llx.", (long long)mft_no);
- return -ENOENT;
- }
- }
- /* Read, map, and pin the page containing the mft record. */
- page = ntfs_map_page(mft_vi->i_mapping, index);
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Failed to map page containing mft record "
- "to format 0x%llx.", (long long)mft_no);
- return PTR_ERR(page);
- }
- lock_page(page);
- BUG_ON(!PageUptodate(page));
- ClearPageUptodate(page);
- m = (MFT_RECORD*)((u8*)page_address(page) + ofs);
- err = ntfs_mft_record_layout(vol, mft_no, m);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Failed to layout mft record 0x%llx.",
- (long long)mft_no);
- SetPageUptodate(page);
- unlock_page(page);
- ntfs_unmap_page(page);
- return err;
- }
- flush_dcache_page(page);
- SetPageUptodate(page);
- unlock_page(page);
- /*
- * Make sure the mft record is written out to disk. We could use
- * ilookup5() to check if an inode is in icache and so on but this is
- * unnecessary as ntfs_writepage() will write the dirty record anyway.
- */
- mark_ntfs_record_dirty(page, ofs);
- ntfs_unmap_page(page);
- ntfs_debug("Done.");
- return 0;
-}
-
-/**
- * ntfs_mft_record_alloc - allocate an mft record on an ntfs volume
- * @vol: [IN] volume on which to allocate the mft record
- * @mode: [IN] mode if want a file or directory, i.e. base inode or 0
- * @base_ni: [IN] open base inode if allocating an extent mft record or NULL
- * @mrec: [OUT] on successful return this is the mapped mft record
- *
- * Allocate an mft record in $MFT/$DATA of an open ntfs volume @vol.
- *
- * If @base_ni is NULL make the mft record a base mft record, i.e. a file or
- * direvctory inode, and allocate it at the default allocator position. In
- * this case @mode is the file mode as given to us by the caller. We in
- * particular use @mode to distinguish whether a file or a directory is being
- * created (S_IFDIR(mode) and S_IFREG(mode), respectively).
- *
- * If @base_ni is not NULL make the allocated mft record an extent record,
- * allocate it starting at the mft record after the base mft record and attach
- * the allocated and opened ntfs inode to the base inode @base_ni. In this
- * case @mode must be 0 as it is meaningless for extent inodes.
- *
- * You need to check the return value with IS_ERR(). If false, the function
- * was successful and the return value is the now opened ntfs inode of the
- * allocated mft record. *@mrec is then set to the allocated, mapped, pinned,
- * and locked mft record. If IS_ERR() is true, the function failed and the
- * error code is obtained from PTR_ERR(return value). *@mrec is undefined in
- * this case.
- *
- * Allocation strategy:
- *
- * To find a free mft record, we scan the mft bitmap for a zero bit. To
- * optimize this we start scanning at the place specified by @base_ni or if
- * @base_ni is NULL we start where we last stopped and we perform wrap around
- * when we reach the end. Note, we do not try to allocate mft records below
- * number 24 because numbers 0 to 15 are the defined system files anyway and 16
- * to 24 are special in that they are used for storing extension mft records
- * for the $DATA attribute of $MFT. This is required to avoid the possibility
- * of creating a runlist with a circular dependency which once written to disk
- * can never be read in again. Windows will only use records 16 to 24 for
- * normal files if the volume is completely out of space. We never use them
- * which means that when the volume is really out of space we cannot create any
- * more files while Windows can still create up to 8 small files. We can start
- * doing this at some later time, it does not matter much for now.
- *
- * When scanning the mft bitmap, we only search up to the last allocated mft
- * record. If there are no free records left in the range 24 to number of
- * allocated mft records, then we extend the $MFT/$DATA attribute in order to
- * create free mft records. We extend the allocated size of $MFT/$DATA by 16
- * records at a time or one cluster, if cluster size is above 16kiB. If there
- * is not sufficient space to do this, we try to extend by a single mft record
- * or one cluster, if cluster size is above the mft record size.
- *
- * No matter how many mft records we allocate, we initialize only the first
- * allocated mft record, incrementing mft data size and initialized size
- * accordingly, open an ntfs_inode for it and return it to the caller, unless
- * there are less than 24 mft records, in which case we allocate and initialize
- * mft records until we reach record 24 which we consider as the first free mft
- * record for use by normal files.
- *
- * If during any stage we overflow the initialized data in the mft bitmap, we
- * extend the initialized size (and data size) by 8 bytes, allocating another
- * cluster if required. The bitmap data size has to be at least equal to the
- * number of mft records in the mft, but it can be bigger, in which case the
- * superflous bits are padded with zeroes.
- *
- * Thus, when we return successfully (IS_ERR() is false), we will have:
- * - initialized / extended the mft bitmap if necessary,
- * - initialized / extended the mft data if necessary,
- * - set the bit corresponding to the mft record being allocated in the
- * mft bitmap,
- * - opened an ntfs_inode for the allocated mft record, and we will have
- * - returned the ntfs_inode as well as the allocated mapped, pinned, and
- * locked mft record.
- *
- * On error, the volume will be left in a consistent state and no record will
- * be allocated. If rolling back a partial operation fails, we may leave some
- * inconsistent metadata in which case we set NVolErrors() so the volume is
- * left dirty when unmounted.
- *
- * Note, this function cannot make use of most of the normal functions, like
- * for example for attribute resizing, etc, because when the run list overflows
- * the base mft record and an attribute list is used, it is very important that
- * the extension mft records used to store the $DATA attribute of $MFT can be
- * reached without having to read the information contained inside them, as
- * this would make it impossible to find them in the first place after the
- * volume is unmounted. $MFT/$BITMAP probably does not need to follow this
- * rule because the bitmap is not essential for finding the mft records, but on
- * the other hand, handling the bitmap in this special way would make life
- * easier because otherwise there might be circular invocations of functions
- * when reading the bitmap.
- */
-ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, const int mode,
- ntfs_inode *base_ni, MFT_RECORD **mrec)
-{
- s64 ll, bit, old_data_initialized, old_data_size;
- unsigned long flags;
- struct inode *vi;
- struct page *page;
- ntfs_inode *mft_ni, *mftbmp_ni, *ni;
- ntfs_attr_search_ctx *ctx;
- MFT_RECORD *m;
- ATTR_RECORD *a;
- pgoff_t index;
- unsigned int ofs;
- int err;
- le16 seq_no, usn;
- bool record_formatted = false;
-
- if (base_ni) {
- ntfs_debug("Entering (allocating an extent mft record for "
- "base mft record 0x%llx).",
- (long long)base_ni->mft_no);
- /* @mode and @base_ni are mutually exclusive. */
- BUG_ON(mode);
- } else
- ntfs_debug("Entering (allocating a base mft record).");
- if (mode) {
- /* @mode and @base_ni are mutually exclusive. */
- BUG_ON(base_ni);
- /* We only support creation of normal files and directories. */
- if (!S_ISREG(mode) && !S_ISDIR(mode))
- return ERR_PTR(-EOPNOTSUPP);
- }
- BUG_ON(!mrec);
- mft_ni = NTFS_I(vol->mft_ino);
- mftbmp_ni = NTFS_I(vol->mftbmp_ino);
- down_write(&vol->mftbmp_lock);
- bit = ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(vol, base_ni);
- if (bit >= 0) {
- ntfs_debug("Found and allocated free record (#1), bit 0x%llx.",
- (long long)bit);
- goto have_alloc_rec;
- }
- if (bit != -ENOSPC) {
- up_write(&vol->mftbmp_lock);
- return ERR_PTR(bit);
- }
- /*
- * No free mft records left. If the mft bitmap already covers more
- * than the currently used mft records, the next records are all free,
- * so we can simply allocate the first unused mft record.
- * Note: We also have to make sure that the mft bitmap at least covers
- * the first 24 mft records as they are special and whilst they may not
- * be in use, we do not allocate from them.
- */
- read_lock_irqsave(&mft_ni->size_lock, flags);
- ll = mft_ni->initialized_size >> vol->mft_record_size_bits;
- read_unlock_irqrestore(&mft_ni->size_lock, flags);
- read_lock_irqsave(&mftbmp_ni->size_lock, flags);
- old_data_initialized = mftbmp_ni->initialized_size;
- read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
- if (old_data_initialized << 3 > ll && old_data_initialized > 3) {
- bit = ll;
- if (bit < 24)
- bit = 24;
- if (unlikely(bit >= (1ll << 32)))
- goto max_err_out;
- ntfs_debug("Found free record (#2), bit 0x%llx.",
- (long long)bit);
- goto found_free_rec;
- }
- /*
- * The mft bitmap needs to be expanded until it covers the first unused
- * mft record that we can allocate.
- * Note: The smallest mft record we allocate is mft record 24.
- */
- bit = old_data_initialized << 3;
- if (unlikely(bit >= (1ll << 32)))
- goto max_err_out;
- read_lock_irqsave(&mftbmp_ni->size_lock, flags);
- old_data_size = mftbmp_ni->allocated_size;
- ntfs_debug("Status of mftbmp before extension: allocated_size 0x%llx, "
- "data_size 0x%llx, initialized_size 0x%llx.",
- (long long)old_data_size,
- (long long)i_size_read(vol->mftbmp_ino),
- (long long)old_data_initialized);
- read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
- if (old_data_initialized + 8 > old_data_size) {
- /* Need to extend bitmap by one more cluster. */
- ntfs_debug("mftbmp: initialized_size + 8 > allocated_size.");
- err = ntfs_mft_bitmap_extend_allocation_nolock(vol);
- if (unlikely(err)) {
- up_write(&vol->mftbmp_lock);
- goto err_out;
- }
-#ifdef DEBUG
- read_lock_irqsave(&mftbmp_ni->size_lock, flags);
- ntfs_debug("Status of mftbmp after allocation extension: "
- "allocated_size 0x%llx, data_size 0x%llx, "
- "initialized_size 0x%llx.",
- (long long)mftbmp_ni->allocated_size,
- (long long)i_size_read(vol->mftbmp_ino),
- (long long)mftbmp_ni->initialized_size);
- read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-#endif /* DEBUG */
- }
- /*
- * We now have sufficient allocated space, extend the initialized_size
- * as well as the data_size if necessary and fill the new space with
- * zeroes.
- */
- err = ntfs_mft_bitmap_extend_initialized_nolock(vol);
- if (unlikely(err)) {
- up_write(&vol->mftbmp_lock);
- goto err_out;
- }
-#ifdef DEBUG
- read_lock_irqsave(&mftbmp_ni->size_lock, flags);
- ntfs_debug("Status of mftbmp after initialized extension: "
- "allocated_size 0x%llx, data_size 0x%llx, "
- "initialized_size 0x%llx.",
- (long long)mftbmp_ni->allocated_size,
- (long long)i_size_read(vol->mftbmp_ino),
- (long long)mftbmp_ni->initialized_size);
- read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-#endif /* DEBUG */
- ntfs_debug("Found free record (#3), bit 0x%llx.", (long long)bit);
-found_free_rec:
- /* @bit is the found free mft record, allocate it in the mft bitmap. */
- ntfs_debug("At found_free_rec.");
- err = ntfs_bitmap_set_bit(vol->mftbmp_ino, bit);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Failed to allocate bit in mft bitmap.");
- up_write(&vol->mftbmp_lock);
- goto err_out;
- }
- ntfs_debug("Set bit 0x%llx in mft bitmap.", (long long)bit);
-have_alloc_rec:
- /*
- * The mft bitmap is now uptodate. Deal with mft data attribute now.
- * Note, we keep hold of the mft bitmap lock for writing until all
- * modifications to the mft data attribute are complete, too, as they
- * will impact decisions for mft bitmap and mft record allocation done
- * by a parallel allocation and if the lock is not maintained a
- * parallel allocation could allocate the same mft record as this one.
- */
- ll = (bit + 1) << vol->mft_record_size_bits;
- read_lock_irqsave(&mft_ni->size_lock, flags);
- old_data_initialized = mft_ni->initialized_size;
- read_unlock_irqrestore(&mft_ni->size_lock, flags);
- if (ll <= old_data_initialized) {
- ntfs_debug("Allocated mft record already initialized.");
- goto mft_rec_already_initialized;
- }
- ntfs_debug("Initializing allocated mft record.");
- /*
- * The mft record is outside the initialized data. Extend the mft data
- * attribute until it covers the allocated record. The loop is only
- * actually traversed more than once when a freshly formatted volume is
- * first written to so it optimizes away nicely in the common case.
- */
- read_lock_irqsave(&mft_ni->size_lock, flags);
- ntfs_debug("Status of mft data before extension: "
- "allocated_size 0x%llx, data_size 0x%llx, "
- "initialized_size 0x%llx.",
- (long long)mft_ni->allocated_size,
- (long long)i_size_read(vol->mft_ino),
- (long long)mft_ni->initialized_size);
- while (ll > mft_ni->allocated_size) {
- read_unlock_irqrestore(&mft_ni->size_lock, flags);
- err = ntfs_mft_data_extend_allocation_nolock(vol);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Failed to extend mft data "
- "allocation.");
- goto undo_mftbmp_alloc_nolock;
- }
- read_lock_irqsave(&mft_ni->size_lock, flags);
- ntfs_debug("Status of mft data after allocation extension: "
- "allocated_size 0x%llx, data_size 0x%llx, "
- "initialized_size 0x%llx.",
- (long long)mft_ni->allocated_size,
- (long long)i_size_read(vol->mft_ino),
- (long long)mft_ni->initialized_size);
- }
- read_unlock_irqrestore(&mft_ni->size_lock, flags);
- /*
- * Extend mft data initialized size (and data size of course) to reach
- * the allocated mft record, formatting the mft records allong the way.
- * Note: We only modify the ntfs_inode structure as that is all that is
- * needed by ntfs_mft_record_format(). We will update the attribute
- * record itself in one fell swoop later on.
- */
- write_lock_irqsave(&mft_ni->size_lock, flags);
- old_data_initialized = mft_ni->initialized_size;
- old_data_size = vol->mft_ino->i_size;
- while (ll > mft_ni->initialized_size) {
- s64 new_initialized_size, mft_no;
-
- new_initialized_size = mft_ni->initialized_size +
- vol->mft_record_size;
- mft_no = mft_ni->initialized_size >> vol->mft_record_size_bits;
- if (new_initialized_size > i_size_read(vol->mft_ino))
- i_size_write(vol->mft_ino, new_initialized_size);
- write_unlock_irqrestore(&mft_ni->size_lock, flags);
- ntfs_debug("Initializing mft record 0x%llx.",
- (long long)mft_no);
- err = ntfs_mft_record_format(vol, mft_no);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Failed to format mft record.");
- goto undo_data_init;
- }
- write_lock_irqsave(&mft_ni->size_lock, flags);
- mft_ni->initialized_size = new_initialized_size;
- }
- write_unlock_irqrestore(&mft_ni->size_lock, flags);
- record_formatted = true;
- /* Update the mft data attribute record to reflect the new sizes. */
- m = map_mft_record(mft_ni);
- if (IS_ERR(m)) {
- ntfs_error(vol->sb, "Failed to map mft record.");
- err = PTR_ERR(m);
- goto undo_data_init;
- }
- ctx = ntfs_attr_get_search_ctx(mft_ni, m);
- if (unlikely(!ctx)) {
- ntfs_error(vol->sb, "Failed to get search context.");
- err = -ENOMEM;
- unmap_mft_record(mft_ni);
- goto undo_data_init;
- }
- err = ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
- CASE_SENSITIVE, 0, NULL, 0, ctx);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Failed to find first attribute extent of "
- "mft data attribute.");
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(mft_ni);
- goto undo_data_init;
- }
- a = ctx->attr;
- read_lock_irqsave(&mft_ni->size_lock, flags);
- a->data.non_resident.initialized_size =
- cpu_to_sle64(mft_ni->initialized_size);
- a->data.non_resident.data_size =
- cpu_to_sle64(i_size_read(vol->mft_ino));
- read_unlock_irqrestore(&mft_ni->size_lock, flags);
- /* Ensure the changes make it to disk. */
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(mft_ni);
- read_lock_irqsave(&mft_ni->size_lock, flags);
- ntfs_debug("Status of mft data after mft record initialization: "
- "allocated_size 0x%llx, data_size 0x%llx, "
- "initialized_size 0x%llx.",
- (long long)mft_ni->allocated_size,
- (long long)i_size_read(vol->mft_ino),
- (long long)mft_ni->initialized_size);
- BUG_ON(i_size_read(vol->mft_ino) > mft_ni->allocated_size);
- BUG_ON(mft_ni->initialized_size > i_size_read(vol->mft_ino));
- read_unlock_irqrestore(&mft_ni->size_lock, flags);
-mft_rec_already_initialized:
- /*
- * We can finally drop the mft bitmap lock as the mft data attribute
- * has been fully updated. The only disparity left is that the
- * allocated mft record still needs to be marked as in use to match the
- * set bit in the mft bitmap but this is actually not a problem since
- * this mft record is not referenced from anywhere yet and the fact
- * that it is allocated in the mft bitmap means that no-one will try to
- * allocate it either.
- */
- up_write(&vol->mftbmp_lock);
- /*
- * We now have allocated and initialized the mft record. Calculate the
- * index of and the offset within the page cache page the record is in.
- */
- index = bit << vol->mft_record_size_bits >> PAGE_SHIFT;
- ofs = (bit << vol->mft_record_size_bits) & ~PAGE_MASK;
- /* Read, map, and pin the page containing the mft record. */
- page = ntfs_map_page(vol->mft_ino->i_mapping, index);
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Failed to map page containing allocated "
- "mft record 0x%llx.", (long long)bit);
- err = PTR_ERR(page);
- goto undo_mftbmp_alloc;
- }
- lock_page(page);
- BUG_ON(!PageUptodate(page));
- ClearPageUptodate(page);
- m = (MFT_RECORD*)((u8*)page_address(page) + ofs);
- /* If we just formatted the mft record no need to do it again. */
- if (!record_formatted) {
- /* Sanity check that the mft record is really not in use. */
- if (ntfs_is_file_record(m->magic) &&
- (m->flags & MFT_RECORD_IN_USE)) {
- ntfs_error(vol->sb, "Mft record 0x%llx was marked "
- "free in mft bitmap but is marked "
- "used itself. Corrupt filesystem. "
- "Unmount and run chkdsk.",
- (long long)bit);
- err = -EIO;
- SetPageUptodate(page);
- unlock_page(page);
- ntfs_unmap_page(page);
- NVolSetErrors(vol);
- goto undo_mftbmp_alloc;
- }
- /*
- * We need to (re-)format the mft record, preserving the
- * sequence number if it is not zero as well as the update
- * sequence number if it is not zero or -1 (0xffff). This
- * means we do not need to care whether or not something went
- * wrong with the previous mft record.
- */
- seq_no = m->sequence_number;
- usn = *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs));
- err = ntfs_mft_record_layout(vol, bit, m);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Failed to layout allocated mft "
- "record 0x%llx.", (long long)bit);
- SetPageUptodate(page);
- unlock_page(page);
- ntfs_unmap_page(page);
- goto undo_mftbmp_alloc;
- }
- if (seq_no)
- m->sequence_number = seq_no;
- if (usn && le16_to_cpu(usn) != 0xffff)
- *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs)) = usn;
- }
- /* Set the mft record itself in use. */
- m->flags |= MFT_RECORD_IN_USE;
- if (S_ISDIR(mode))
- m->flags |= MFT_RECORD_IS_DIRECTORY;
- flush_dcache_page(page);
- SetPageUptodate(page);
- if (base_ni) {
- MFT_RECORD *m_tmp;
-
- /*
- * Setup the base mft record in the extent mft record. This
- * completes initialization of the allocated extent mft record
- * and we can simply use it with map_extent_mft_record().
- */
- m->base_mft_record = MK_LE_MREF(base_ni->mft_no,
- base_ni->seq_no);
- /*
- * Allocate an extent inode structure for the new mft record,
- * attach it to the base inode @base_ni and map, pin, and lock
- * its, i.e. the allocated, mft record.
- */
- m_tmp = map_extent_mft_record(base_ni, bit, &ni);
- if (IS_ERR(m_tmp)) {
- ntfs_error(vol->sb, "Failed to map allocated extent "
- "mft record 0x%llx.", (long long)bit);
- err = PTR_ERR(m_tmp);
- /* Set the mft record itself not in use. */
- m->flags &= cpu_to_le16(
- ~le16_to_cpu(MFT_RECORD_IN_USE));
- flush_dcache_page(page);
- /* Make sure the mft record is written out to disk. */
- mark_ntfs_record_dirty(page, ofs);
- unlock_page(page);
- ntfs_unmap_page(page);
- goto undo_mftbmp_alloc;
- }
- BUG_ON(m != m_tmp);
- /*
- * Make sure the allocated mft record is written out to disk.
- * No need to set the inode dirty because the caller is going
- * to do that anyway after finishing with the new extent mft
- * record (e.g. at a minimum a new attribute will be added to
- * the mft record.
- */
- mark_ntfs_record_dirty(page, ofs);
- unlock_page(page);
- /*
- * Need to unmap the page since map_extent_mft_record() mapped
- * it as well so we have it mapped twice at the moment.
- */
- ntfs_unmap_page(page);
- } else {
- /*
- * Allocate a new VFS inode and set it up. NOTE: @vi->i_nlink
- * is set to 1 but the mft record->link_count is 0. The caller
- * needs to bear this in mind.
- */
- vi = new_inode(vol->sb);
- if (unlikely(!vi)) {
- err = -ENOMEM;
- /* Set the mft record itself not in use. */
- m->flags &= cpu_to_le16(
- ~le16_to_cpu(MFT_RECORD_IN_USE));
- flush_dcache_page(page);
- /* Make sure the mft record is written out to disk. */
- mark_ntfs_record_dirty(page, ofs);
- unlock_page(page);
- ntfs_unmap_page(page);
- goto undo_mftbmp_alloc;
- }
- vi->i_ino = bit;
-
- /* The owner and group come from the ntfs volume. */
- vi->i_uid = vol->uid;
- vi->i_gid = vol->gid;
-
- /* Initialize the ntfs specific part of @vi. */
- ntfs_init_big_inode(vi);
- ni = NTFS_I(vi);
- /*
- * Set the appropriate mode, attribute type, and name. For
- * directories, also setup the index values to the defaults.
- */
- if (S_ISDIR(mode)) {
- vi->i_mode = S_IFDIR | S_IRWXUGO;
- vi->i_mode &= ~vol->dmask;
-
- NInoSetMstProtected(ni);
- ni->type = AT_INDEX_ALLOCATION;
- ni->name = I30;
- ni->name_len = 4;
-
- ni->itype.index.block_size = 4096;
- ni->itype.index.block_size_bits = ntfs_ffs(4096) - 1;
- ni->itype.index.collation_rule = COLLATION_FILE_NAME;
- if (vol->cluster_size <= ni->itype.index.block_size) {
- ni->itype.index.vcn_size = vol->cluster_size;
- ni->itype.index.vcn_size_bits =
- vol->cluster_size_bits;
- } else {
- ni->itype.index.vcn_size = vol->sector_size;
- ni->itype.index.vcn_size_bits =
- vol->sector_size_bits;
- }
- } else {
- vi->i_mode = S_IFREG | S_IRWXUGO;
- vi->i_mode &= ~vol->fmask;
-
- ni->type = AT_DATA;
- ni->name = NULL;
- ni->name_len = 0;
- }
- if (IS_RDONLY(vi))
- vi->i_mode &= ~S_IWUGO;
-
- /* Set the inode times to the current time. */
- simple_inode_init_ts(vi);
- /*
- * Set the file size to 0, the ntfs inode sizes are set to 0 by
- * the call to ntfs_init_big_inode() below.
- */
- vi->i_size = 0;
- vi->i_blocks = 0;
-
- /* Set the sequence number. */
- vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
- /*
- * Manually map, pin, and lock the mft record as we already
- * have its page mapped and it is very easy to do.
- */
- atomic_inc(&ni->count);
- mutex_lock(&ni->mrec_lock);
- ni->page = page;
- ni->page_ofs = ofs;
- /*
- * Make sure the allocated mft record is written out to disk.
- * NOTE: We do not set the ntfs inode dirty because this would
- * fail in ntfs_write_inode() because the inode does not have a
- * standard information attribute yet. Also, there is no need
- * to set the inode dirty because the caller is going to do
- * that anyway after finishing with the new mft record (e.g. at
- * a minimum some new attributes will be added to the mft
- * record.
- */
- mark_ntfs_record_dirty(page, ofs);
- unlock_page(page);
-
- /* Add the inode to the inode hash for the superblock. */
- insert_inode_hash(vi);
-
- /* Update the default mft allocation position. */
- vol->mft_data_pos = bit + 1;
- }
- /*
- * Return the opened, allocated inode of the allocated mft record as
- * well as the mapped, pinned, and locked mft record.
- */
- ntfs_debug("Returning opened, allocated %sinode 0x%llx.",
- base_ni ? "extent " : "", (long long)bit);
- *mrec = m;
- return ni;
-undo_data_init:
- write_lock_irqsave(&mft_ni->size_lock, flags);
- mft_ni->initialized_size = old_data_initialized;
- i_size_write(vol->mft_ino, old_data_size);
- write_unlock_irqrestore(&mft_ni->size_lock, flags);
- goto undo_mftbmp_alloc_nolock;
-undo_mftbmp_alloc:
- down_write(&vol->mftbmp_lock);
-undo_mftbmp_alloc_nolock:
- if (ntfs_bitmap_clear_bit(vol->mftbmp_ino, bit)) {
- ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es);
- NVolSetErrors(vol);
- }
- up_write(&vol->mftbmp_lock);
-err_out:
- return ERR_PTR(err);
-max_err_out:
- ntfs_warning(vol->sb, "Cannot allocate mft record because the maximum "
- "number of inodes (2^32) has already been reached.");
- up_write(&vol->mftbmp_lock);
- return ERR_PTR(-ENOSPC);
-}
-
-/**
- * ntfs_extent_mft_record_free - free an extent mft record on an ntfs volume
- * @ni: ntfs inode of the mapped extent mft record to free
- * @m: mapped extent mft record of the ntfs inode @ni
- *
- * Free the mapped extent mft record @m of the extent ntfs inode @ni.
- *
- * Note that this function unmaps the mft record and closes and destroys @ni
- * internally and hence you cannot use either @ni nor @m any more after this
- * function returns success.
- *
- * On success return 0 and on error return -errno. @ni and @m are still valid
- * in this case and have not been freed.
- *
- * For some errors an error message is displayed and the success code 0 is
- * returned and the volume is then left dirty on umount. This makes sense in
- * case we could not rollback the changes that were already done since the
- * caller no longer wants to reference this mft record so it does not matter to
- * the caller if something is wrong with it as long as it is properly detached
- * from the base inode.
- */
-int ntfs_extent_mft_record_free(ntfs_inode *ni, MFT_RECORD *m)
-{
- unsigned long mft_no = ni->mft_no;
- ntfs_volume *vol = ni->vol;
- ntfs_inode *base_ni;
- ntfs_inode **extent_nis;
- int i, err;
- le16 old_seq_no;
- u16 seq_no;
-
- BUG_ON(NInoAttr(ni));
- BUG_ON(ni->nr_extents != -1);
-
- mutex_lock(&ni->extent_lock);
- base_ni = ni->ext.base_ntfs_ino;
- mutex_unlock(&ni->extent_lock);
-
- BUG_ON(base_ni->nr_extents <= 0);
-
- ntfs_debug("Entering for extent inode 0x%lx, base inode 0x%lx.\n",
- mft_no, base_ni->mft_no);
-
- mutex_lock(&base_ni->extent_lock);
-
- /* Make sure we are holding the only reference to the extent inode. */
- if (atomic_read(&ni->count) > 2) {
- ntfs_error(vol->sb, "Tried to free busy extent inode 0x%lx, "
- "not freeing.", base_ni->mft_no);
- mutex_unlock(&base_ni->extent_lock);
- return -EBUSY;
- }
-
- /* Dissociate the ntfs inode from the base inode. */
- extent_nis = base_ni->ext.extent_ntfs_inos;
- err = -ENOENT;
- for (i = 0; i < base_ni->nr_extents; i++) {
- if (ni != extent_nis[i])
- continue;
- extent_nis += i;
- base_ni->nr_extents--;
- memmove(extent_nis, extent_nis + 1, (base_ni->nr_extents - i) *
- sizeof(ntfs_inode*));
- err = 0;
- break;
- }
-
- mutex_unlock(&base_ni->extent_lock);
-
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Extent inode 0x%lx is not attached to "
- "its base inode 0x%lx.", mft_no,
- base_ni->mft_no);
- BUG();
- }
-
- /*
- * The extent inode is no longer attached to the base inode so no one
- * can get a reference to it any more.
- */
-
- /* Mark the mft record as not in use. */
- m->flags &= ~MFT_RECORD_IN_USE;
-
- /* Increment the sequence number, skipping zero, if it is not zero. */
- old_seq_no = m->sequence_number;
- seq_no = le16_to_cpu(old_seq_no);
- if (seq_no == 0xffff)
- seq_no = 1;
- else if (seq_no)
- seq_no++;
- m->sequence_number = cpu_to_le16(seq_no);
-
- /*
- * Set the ntfs inode dirty and write it out. We do not need to worry
- * about the base inode here since whatever caused the extent mft
- * record to be freed is guaranteed to do it already.
- */
- NInoSetDirty(ni);
- err = write_mft_record(ni, m, 0);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Failed to write mft record 0x%lx, not "
- "freeing.", mft_no);
- goto rollback;
- }
-rollback_error:
- /* Unmap and throw away the now freed extent inode. */
- unmap_extent_mft_record(ni);
- ntfs_clear_extent_inode(ni);
-
- /* Clear the bit in the $MFT/$BITMAP corresponding to this record. */
- down_write(&vol->mftbmp_lock);
- err = ntfs_bitmap_clear_bit(vol->mftbmp_ino, mft_no);
- up_write(&vol->mftbmp_lock);
- if (unlikely(err)) {
- /*
- * The extent inode is gone but we failed to deallocate it in
- * the mft bitmap. Just emit a warning and leave the volume
- * dirty on umount.
- */
- ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es);
- NVolSetErrors(vol);
- }
- return 0;
-rollback:
- /* Rollback what we did... */
- mutex_lock(&base_ni->extent_lock);
- extent_nis = base_ni->ext.extent_ntfs_inos;
- if (!(base_ni->nr_extents & 3)) {
- int new_size = (base_ni->nr_extents + 4) * sizeof(ntfs_inode*);
-
- extent_nis = kmalloc(new_size, GFP_NOFS);
- if (unlikely(!extent_nis)) {
- ntfs_error(vol->sb, "Failed to allocate internal "
- "buffer during rollback.%s", es);
- mutex_unlock(&base_ni->extent_lock);
- NVolSetErrors(vol);
- goto rollback_error;
- }
- if (base_ni->nr_extents) {
- BUG_ON(!base_ni->ext.extent_ntfs_inos);
- memcpy(extent_nis, base_ni->ext.extent_ntfs_inos,
- new_size - 4 * sizeof(ntfs_inode*));
- kfree(base_ni->ext.extent_ntfs_inos);
- }
- base_ni->ext.extent_ntfs_inos = extent_nis;
- }
- m->flags |= MFT_RECORD_IN_USE;
- m->sequence_number = old_seq_no;
- extent_nis[base_ni->nr_extents++] = ni;
- mutex_unlock(&base_ni->extent_lock);
- mark_mft_record_dirty(ni);
- return err;
-}
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/mft.h b/fs/ntfs/mft.h
deleted file mode 100644
index 49c001a..0000000
--- a/fs/ntfs/mft.h
+++ /dev/null
@@ -1,110 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * mft.h - Defines for mft record handling in NTFS Linux kernel driver.
- * Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_MFT_H
-#define _LINUX_NTFS_MFT_H
-
-#include <linux/fs.h>
-#include <linux/highmem.h>
-#include <linux/pagemap.h>
-
-#include "inode.h"
-
-extern MFT_RECORD *map_mft_record(ntfs_inode *ni);
-extern void unmap_mft_record(ntfs_inode *ni);
-
-extern MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
- ntfs_inode **ntfs_ino);
-
-static inline void unmap_extent_mft_record(ntfs_inode *ni)
-{
- unmap_mft_record(ni);
- return;
-}
-
-#ifdef NTFS_RW
-
-/**
- * flush_dcache_mft_record_page - flush_dcache_page() for mft records
- * @ni: ntfs inode structure of mft record
- *
- * Call flush_dcache_page() for the page in which an mft record resides.
- *
- * This must be called every time an mft record is modified, just after the
- * modification.
- */
-static inline void flush_dcache_mft_record_page(ntfs_inode *ni)
-{
- flush_dcache_page(ni->page);
-}
-
-extern void __mark_mft_record_dirty(ntfs_inode *ni);
-
-/**
- * mark_mft_record_dirty - set the mft record and the page containing it dirty
- * @ni: ntfs inode describing the mapped mft record
- *
- * Set the mapped (extent) mft record of the (base or extent) ntfs inode @ni,
- * as well as the page containing the mft record, dirty. Also, mark the base
- * vfs inode dirty. This ensures that any changes to the mft record are
- * written out to disk.
- *
- * NOTE: Do not do anything if the mft record is already marked dirty.
- */
-static inline void mark_mft_record_dirty(ntfs_inode *ni)
-{
- if (!NInoTestSetDirty(ni))
- __mark_mft_record_dirty(ni);
-}
-
-extern int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
- MFT_RECORD *m, int sync);
-
-extern int write_mft_record_nolock(ntfs_inode *ni, MFT_RECORD *m, int sync);
-
-/**
- * write_mft_record - write out a mapped (extent) mft record
- * @ni: ntfs inode describing the mapped (extent) mft record
- * @m: mapped (extent) mft record to write
- * @sync: if true, wait for i/o completion
- *
- * This is just a wrapper for write_mft_record_nolock() (see mft.c), which
- * locks the page for the duration of the write. This ensures that there are
- * no race conditions between writing the mft record via the dirty inode code
- * paths and via the page cache write back code paths or between writing
- * neighbouring mft records residing in the same page.
- *
- * Locking the page also serializes us against ->read_folio() if the page is not
- * uptodate.
- *
- * On success, clean the mft record and return 0. On error, leave the mft
- * record dirty and return -errno.
- */
-static inline int write_mft_record(ntfs_inode *ni, MFT_RECORD *m, int sync)
-{
- struct page *page = ni->page;
- int err;
-
- BUG_ON(!page);
- lock_page(page);
- err = write_mft_record_nolock(ni, m, sync);
- unlock_page(page);
- return err;
-}
-
-extern bool ntfs_may_write_mft_record(ntfs_volume *vol,
- const unsigned long mft_no, const MFT_RECORD *m,
- ntfs_inode **locked_ni);
-
-extern ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, const int mode,
- ntfs_inode *base_ni, MFT_RECORD **mrec);
-extern int ntfs_extent_mft_record_free(ntfs_inode *ni, MFT_RECORD *m);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_MFT_H */
diff --git a/fs/ntfs/mst.c b/fs/ntfs/mst.c
deleted file mode 100644
index 16b3c88..0000000
--- a/fs/ntfs/mst.c
+++ /dev/null
@@ -1,189 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * mst.c - NTFS multi sector transfer protection handling code. Part of the
- * Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-
-#include "ntfs.h"
-
-/**
- * post_read_mst_fixup - deprotect multi sector transfer protected data
- * @b: pointer to the data to deprotect
- * @size: size in bytes of @b
- *
- * Perform the necessary post read multi sector transfer fixup and detect the
- * presence of incomplete multi sector transfers. - In that case, overwrite the
- * magic of the ntfs record header being processed with "BAAD" (in memory only!)
- * and abort processing.
- *
- * Return 0 on success and -EINVAL on error ("BAAD" magic will be present).
- *
- * NOTE: We consider the absence / invalidity of an update sequence array to
- * mean that the structure is not protected at all and hence doesn't need to
- * be fixed up. Thus, we return success and not failure in this case. This is
- * in contrast to pre_write_mst_fixup(), see below.
- */
-int post_read_mst_fixup(NTFS_RECORD *b, const u32 size)
-{
- u16 usa_ofs, usa_count, usn;
- u16 *usa_pos, *data_pos;
-
- /* Setup the variables. */
- usa_ofs = le16_to_cpu(b->usa_ofs);
- /* Decrement usa_count to get number of fixups. */
- usa_count = le16_to_cpu(b->usa_count) - 1;
- /* Size and alignment checks. */
- if ( size & (NTFS_BLOCK_SIZE - 1) ||
- usa_ofs & 1 ||
- usa_ofs + (usa_count * 2) > size ||
- (size >> NTFS_BLOCK_SIZE_BITS) != usa_count)
- return 0;
- /* Position of usn in update sequence array. */
- usa_pos = (u16*)b + usa_ofs/sizeof(u16);
- /*
- * The update sequence number which has to be equal to each of the
- * u16 values before they are fixed up. Note no need to care for
- * endianness since we are comparing and moving data for on disk
- * structures which means the data is consistent. - If it is
- * consistenty the wrong endianness it doesn't make any difference.
- */
- usn = *usa_pos;
- /*
- * Position in protected data of first u16 that needs fixing up.
- */
- data_pos = (u16*)b + NTFS_BLOCK_SIZE/sizeof(u16) - 1;
- /*
- * Check for incomplete multi sector transfer(s).
- */
- while (usa_count--) {
- if (*data_pos != usn) {
- /*
- * Incomplete multi sector transfer detected! )-:
- * Set the magic to "BAAD" and return failure.
- * Note that magic_BAAD is already converted to le32.
- */
- b->magic = magic_BAAD;
- return -EINVAL;
- }
- data_pos += NTFS_BLOCK_SIZE/sizeof(u16);
- }
- /* Re-setup the variables. */
- usa_count = le16_to_cpu(b->usa_count) - 1;
- data_pos = (u16*)b + NTFS_BLOCK_SIZE/sizeof(u16) - 1;
- /* Fixup all sectors. */
- while (usa_count--) {
- /*
- * Increment position in usa and restore original data from
- * the usa into the data buffer.
- */
- *data_pos = *(++usa_pos);
- /* Increment position in data as well. */
- data_pos += NTFS_BLOCK_SIZE/sizeof(u16);
- }
- return 0;
-}
-
-/**
- * pre_write_mst_fixup - apply multi sector transfer protection
- * @b: pointer to the data to protect
- * @size: size in bytes of @b
- *
- * Perform the necessary pre write multi sector transfer fixup on the data
- * pointer to by @b of @size.
- *
- * Return 0 if fixup applied (success) or -EINVAL if no fixup was performed
- * (assumed not needed). This is in contrast to post_read_mst_fixup() above.
- *
- * NOTE: We consider the absence / invalidity of an update sequence array to
- * mean that the structure is not subject to protection and hence doesn't need
- * to be fixed up. This means that you have to create a valid update sequence
- * array header in the ntfs record before calling this function, otherwise it
- * will fail (the header needs to contain the position of the update sequence
- * array together with the number of elements in the array). You also need to
- * initialise the update sequence number before calling this function
- * otherwise a random word will be used (whatever was in the record at that
- * position at that time).
- */
-int pre_write_mst_fixup(NTFS_RECORD *b, const u32 size)
-{
- le16 *usa_pos, *data_pos;
- u16 usa_ofs, usa_count, usn;
- le16 le_usn;
-
- /* Sanity check + only fixup if it makes sense. */
- if (!b || ntfs_is_baad_record(b->magic) ||
- ntfs_is_hole_record(b->magic))
- return -EINVAL;
- /* Setup the variables. */
- usa_ofs = le16_to_cpu(b->usa_ofs);
- /* Decrement usa_count to get number of fixups. */
- usa_count = le16_to_cpu(b->usa_count) - 1;
- /* Size and alignment checks. */
- if ( size & (NTFS_BLOCK_SIZE - 1) ||
- usa_ofs & 1 ||
- usa_ofs + (usa_count * 2) > size ||
- (size >> NTFS_BLOCK_SIZE_BITS) != usa_count)
- return -EINVAL;
- /* Position of usn in update sequence array. */
- usa_pos = (le16*)((u8*)b + usa_ofs);
- /*
- * Cyclically increment the update sequence number
- * (skipping 0 and -1, i.e. 0xffff).
- */
- usn = le16_to_cpup(usa_pos) + 1;
- if (usn == 0xffff || !usn)
- usn = 1;
- le_usn = cpu_to_le16(usn);
- *usa_pos = le_usn;
- /* Position in data of first u16 that needs fixing up. */
- data_pos = (le16*)b + NTFS_BLOCK_SIZE/sizeof(le16) - 1;
- /* Fixup all sectors. */
- while (usa_count--) {
- /*
- * Increment the position in the usa and save the
- * original data from the data buffer into the usa.
- */
- *(++usa_pos) = *data_pos;
- /* Apply fixup to data. */
- *data_pos = le_usn;
- /* Increment position in data as well. */
- data_pos += NTFS_BLOCK_SIZE/sizeof(le16);
- }
- return 0;
-}
-
-/**
- * post_write_mst_fixup - fast deprotect multi sector transfer protected data
- * @b: pointer to the data to deprotect
- *
- * Perform the necessary post write multi sector transfer fixup, not checking
- * for any errors, because we assume we have just used pre_write_mst_fixup(),
- * thus the data will be fine or we would never have gotten here.
- */
-void post_write_mst_fixup(NTFS_RECORD *b)
-{
- le16 *usa_pos, *data_pos;
-
- u16 usa_ofs = le16_to_cpu(b->usa_ofs);
- u16 usa_count = le16_to_cpu(b->usa_count) - 1;
-
- /* Position of usn in update sequence array. */
- usa_pos = (le16*)b + usa_ofs/sizeof(le16);
-
- /* Position in protected data of first u16 that needs fixing up. */
- data_pos = (le16*)b + NTFS_BLOCK_SIZE/sizeof(le16) - 1;
-
- /* Fixup all sectors. */
- while (usa_count--) {
- /*
- * Increment position in usa and restore original data from
- * the usa into the data buffer.
- */
- *data_pos = *(++usa_pos);
-
- /* Increment position in data as well. */
- data_pos += NTFS_BLOCK_SIZE/sizeof(le16);
- }
-}
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
deleted file mode 100644
index d7498dd..0000000
--- a/fs/ntfs/namei.c
+++ /dev/null
@@ -1,392 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * namei.c - NTFS kernel directory inode operations. Part of the Linux-NTFS
- * project.
- *
- * Copyright (c) 2001-2006 Anton Altaparmakov
- */
-
-#include <linux/dcache.h>
-#include <linux/exportfs.h>
-#include <linux/security.h>
-#include <linux/slab.h>
-
-#include "attrib.h"
-#include "debug.h"
-#include "dir.h"
-#include "mft.h"
-#include "ntfs.h"
-
-/**
- * ntfs_lookup - find the inode represented by a dentry in a directory inode
- * @dir_ino: directory inode in which to look for the inode
- * @dent: dentry representing the inode to look for
- * @flags: lookup flags
- *
- * In short, ntfs_lookup() looks for the inode represented by the dentry @dent
- * in the directory inode @dir_ino and if found attaches the inode to the
- * dentry @dent.
- *
- * In more detail, the dentry @dent specifies which inode to look for by
- * supplying the name of the inode in @dent->d_name.name. ntfs_lookup()
- * converts the name to Unicode and walks the contents of the directory inode
- * @dir_ino looking for the converted Unicode name. If the name is found in the
- * directory, the corresponding inode is loaded by calling ntfs_iget() on its
- * inode number and the inode is associated with the dentry @dent via a call to
- * d_splice_alias().
- *
- * If the name is not found in the directory, a NULL inode is inserted into the
- * dentry @dent via a call to d_add(). The dentry is then termed a negative
- * dentry.
- *
- * Only if an actual error occurs, do we return an error via ERR_PTR().
- *
- * In order to handle the case insensitivity issues of NTFS with regards to the
- * dcache and the dcache requiring only one dentry per directory, we deal with
- * dentry aliases that only differ in case in ->ntfs_lookup() while maintaining
- * a case sensitive dcache. This means that we get the full benefit of dcache
- * speed when the file/directory is looked up with the same case as returned by
- * ->ntfs_readdir() but that a lookup for any other case (or for the short file
- * name) will not find anything in dcache and will enter ->ntfs_lookup()
- * instead, where we search the directory for a fully matching file name
- * (including case) and if that is not found, we search for a file name that
- * matches with different case and if that has non-POSIX semantics we return
- * that. We actually do only one search (case sensitive) and keep tabs on
- * whether we have found a case insensitive match in the process.
- *
- * To simplify matters for us, we do not treat the short vs long filenames as
- * two hard links but instead if the lookup matches a short filename, we
- * return the dentry for the corresponding long filename instead.
- *
- * There are three cases we need to distinguish here:
- *
- * 1) @dent perfectly matches (i.e. including case) a directory entry with a
- * file name in the WIN32 or POSIX namespaces. In this case
- * ntfs_lookup_inode_by_name() will return with name set to NULL and we
- * just d_splice_alias() @dent.
- * 2) @dent matches (not including case) a directory entry with a file name in
- * the WIN32 namespace. In this case ntfs_lookup_inode_by_name() will return
- * with name set to point to a kmalloc()ed ntfs_name structure containing
- * the properly cased little endian Unicode name. We convert the name to the
- * current NLS code page, search if a dentry with this name already exists
- * and if so return that instead of @dent. At this point things are
- * complicated by the possibility of 'disconnected' dentries due to NFS
- * which we deal with appropriately (see the code comments). The VFS will
- * then destroy the old @dent and use the one we returned. If a dentry is
- * not found, we allocate a new one, d_splice_alias() it, and return it as
- * above.
- * 3) @dent matches either perfectly or not (i.e. we don't care about case) a
- * directory entry with a file name in the DOS namespace. In this case
- * ntfs_lookup_inode_by_name() will return with name set to point to a
- * kmalloc()ed ntfs_name structure containing the mft reference (cpu endian)
- * of the inode. We use the mft reference to read the inode and to find the
- * file name in the WIN32 namespace corresponding to the matched short file
- * name. We then convert the name to the current NLS code page, and proceed
- * searching for a dentry with this name, etc, as in case 2), above.
- *
- * Locking: Caller must hold i_mutex on the directory.
- */
-static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
- unsigned int flags)
-{
- ntfs_volume *vol = NTFS_SB(dir_ino->i_sb);
- struct inode *dent_inode;
- ntfschar *uname;
- ntfs_name *name = NULL;
- MFT_REF mref;
- unsigned long dent_ino;
- int uname_len;
-
- ntfs_debug("Looking up %pd in directory inode 0x%lx.",
- dent, dir_ino->i_ino);
- /* Convert the name of the dentry to Unicode. */
- uname_len = ntfs_nlstoucs(vol, dent->d_name.name, dent->d_name.len,
- &uname);
- if (uname_len < 0) {
- if (uname_len != -ENAMETOOLONG)
- ntfs_error(vol->sb, "Failed to convert name to "
- "Unicode.");
- return ERR_PTR(uname_len);
- }
- mref = ntfs_lookup_inode_by_name(NTFS_I(dir_ino), uname, uname_len,
- &name);
- kmem_cache_free(ntfs_name_cache, uname);
- if (!IS_ERR_MREF(mref)) {
- dent_ino = MREF(mref);
- ntfs_debug("Found inode 0x%lx. Calling ntfs_iget.", dent_ino);
- dent_inode = ntfs_iget(vol->sb, dent_ino);
- if (!IS_ERR(dent_inode)) {
- /* Consistency check. */
- if (is_bad_inode(dent_inode) || MSEQNO(mref) ==
- NTFS_I(dent_inode)->seq_no ||
- dent_ino == FILE_MFT) {
- /* Perfect WIN32/POSIX match. -- Case 1. */
- if (!name) {
- ntfs_debug("Done. (Case 1.)");
- return d_splice_alias(dent_inode, dent);
- }
- /*
- * We are too indented. Handle imperfect
- * matches and short file names further below.
- */
- goto handle_name;
- }
- ntfs_error(vol->sb, "Found stale reference to inode "
- "0x%lx (reference sequence number = "
- "0x%x, inode sequence number = 0x%x), "
- "returning -EIO. Run chkdsk.",
- dent_ino, MSEQNO(mref),
- NTFS_I(dent_inode)->seq_no);
- iput(dent_inode);
- dent_inode = ERR_PTR(-EIO);
- } else
- ntfs_error(vol->sb, "ntfs_iget(0x%lx) failed with "
- "error code %li.", dent_ino,
- PTR_ERR(dent_inode));
- kfree(name);
- /* Return the error code. */
- return ERR_CAST(dent_inode);
- }
- /* It is guaranteed that @name is no longer allocated at this point. */
- if (MREF_ERR(mref) == -ENOENT) {
- ntfs_debug("Entry was not found, adding negative dentry.");
- /* The dcache will handle negative entries. */
- d_add(dent, NULL);
- ntfs_debug("Done.");
- return NULL;
- }
- ntfs_error(vol->sb, "ntfs_lookup_ino_by_name() failed with error "
- "code %i.", -MREF_ERR(mref));
- return ERR_PTR(MREF_ERR(mref));
- // TODO: Consider moving this lot to a separate function! (AIA)
-handle_name:
- {
- MFT_RECORD *m;
- ntfs_attr_search_ctx *ctx;
- ntfs_inode *ni = NTFS_I(dent_inode);
- int err;
- struct qstr nls_name;
-
- nls_name.name = NULL;
- if (name->type != FILE_NAME_DOS) { /* Case 2. */
- ntfs_debug("Case 2.");
- nls_name.len = (unsigned)ntfs_ucstonls(vol,
- (ntfschar*)&name->name, name->len,
- (unsigned char**)&nls_name.name, 0);
- kfree(name);
- } else /* if (name->type == FILE_NAME_DOS) */ { /* Case 3. */
- FILE_NAME_ATTR *fn;
-
- ntfs_debug("Case 3.");
- kfree(name);
-
- /* Find the WIN32 name corresponding to the matched DOS name. */
- ni = NTFS_I(dent_inode);
- m = map_mft_record(ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- m = NULL;
- ctx = NULL;
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(ni, m);
- if (unlikely(!ctx)) {
- err = -ENOMEM;
- goto err_out;
- }
- do {
- ATTR_RECORD *a;
- u32 val_len;
-
- err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0,
- NULL, 0, ctx);
- if (unlikely(err)) {
- ntfs_error(vol->sb, "Inode corrupt: No WIN32 "
- "namespace counterpart to DOS "
- "file name. Run chkdsk.");
- if (err == -ENOENT)
- err = -EIO;
- goto err_out;
- }
- /* Consistency checks. */
- a = ctx->attr;
- if (a->non_resident || a->flags)
- goto eio_err_out;
- val_len = le32_to_cpu(a->data.resident.value_length);
- if (le16_to_cpu(a->data.resident.value_offset) +
- val_len > le32_to_cpu(a->length))
- goto eio_err_out;
- fn = (FILE_NAME_ATTR*)((u8*)ctx->attr + le16_to_cpu(
- ctx->attr->data.resident.value_offset));
- if ((u32)(fn->file_name_length * sizeof(ntfschar) +
- sizeof(FILE_NAME_ATTR)) > val_len)
- goto eio_err_out;
- } while (fn->file_name_type != FILE_NAME_WIN32);
-
- /* Convert the found WIN32 name to current NLS code page. */
- nls_name.len = (unsigned)ntfs_ucstonls(vol,
- (ntfschar*)&fn->file_name, fn->file_name_length,
- (unsigned char**)&nls_name.name, 0);
-
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(ni);
- }
- m = NULL;
- ctx = NULL;
-
- /* Check if a conversion error occurred. */
- if ((signed)nls_name.len < 0) {
- err = (signed)nls_name.len;
- goto err_out;
- }
- nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);
-
- dent = d_add_ci(dent, dent_inode, &nls_name);
- kfree(nls_name.name);
- return dent;
-
-eio_err_out:
- ntfs_error(vol->sb, "Illegal file name attribute. Run chkdsk.");
- err = -EIO;
-err_out:
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- if (m)
- unmap_mft_record(ni);
- iput(dent_inode);
- ntfs_error(vol->sb, "Failed, returning error code %i.", err);
- return ERR_PTR(err);
- }
-}
-
-/*
- * Inode operations for directories.
- */
-const struct inode_operations ntfs_dir_inode_ops = {
- .lookup = ntfs_lookup, /* VFS: Lookup directory. */
-};
-
-/**
- * ntfs_get_parent - find the dentry of the parent of a given directory dentry
- * @child_dent: dentry of the directory whose parent directory to find
- *
- * Find the dentry for the parent directory of the directory specified by the
- * dentry @child_dent. This function is called from
- * fs/exportfs/expfs.c::find_exported_dentry() which in turn is called from the
- * default ->decode_fh() which is export_decode_fh() in the same file.
- *
- * The code is based on the ext3 ->get_parent() implementation found in
- * fs/ext3/namei.c::ext3_get_parent().
- *
- * Note: ntfs_get_parent() is called with @d_inode(child_dent)->i_mutex down.
- *
- * Return the dentry of the parent directory on success or the error code on
- * error (IS_ERR() is true).
- */
-static struct dentry *ntfs_get_parent(struct dentry *child_dent)
-{
- struct inode *vi = d_inode(child_dent);
- ntfs_inode *ni = NTFS_I(vi);
- MFT_RECORD *mrec;
- ntfs_attr_search_ctx *ctx;
- ATTR_RECORD *attr;
- FILE_NAME_ATTR *fn;
- unsigned long parent_ino;
- int err;
-
- ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
- /* Get the mft record of the inode belonging to the child dentry. */
- mrec = map_mft_record(ni);
- if (IS_ERR(mrec))
- return ERR_CAST(mrec);
- /* Find the first file name attribute in the mft record. */
- ctx = ntfs_attr_get_search_ctx(ni, mrec);
- if (unlikely(!ctx)) {
- unmap_mft_record(ni);
- return ERR_PTR(-ENOMEM);
- }
-try_next:
- err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, CASE_SENSITIVE, 0, NULL,
- 0, ctx);
- if (unlikely(err)) {
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(ni);
- if (err == -ENOENT)
- ntfs_error(vi->i_sb, "Inode 0x%lx does not have a "
- "file name attribute. Run chkdsk.",
- vi->i_ino);
- return ERR_PTR(err);
- }
- attr = ctx->attr;
- if (unlikely(attr->non_resident))
- goto try_next;
- fn = (FILE_NAME_ATTR *)((u8 *)attr +
- le16_to_cpu(attr->data.resident.value_offset));
- if (unlikely((u8 *)fn + le32_to_cpu(attr->data.resident.value_length) >
- (u8*)attr + le32_to_cpu(attr->length)))
- goto try_next;
- /* Get the inode number of the parent directory. */
- parent_ino = MREF_LE(fn->parent_directory);
- /* Release the search context and the mft record of the child. */
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(ni);
-
- return d_obtain_alias(ntfs_iget(vi->i_sb, parent_ino));
-}
-
-static struct inode *ntfs_nfs_get_inode(struct super_block *sb,
- u64 ino, u32 generation)
-{
- struct inode *inode;
-
- inode = ntfs_iget(sb, ino);
- if (!IS_ERR(inode)) {
- if (is_bad_inode(inode) || inode->i_generation != generation) {
- iput(inode);
- inode = ERR_PTR(-ESTALE);
- }
- }
-
- return inode;
-}
-
-static struct dentry *ntfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
- int fh_len, int fh_type)
-{
- return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
- ntfs_nfs_get_inode);
-}
-
-static struct dentry *ntfs_fh_to_parent(struct super_block *sb, struct fid *fid,
- int fh_len, int fh_type)
-{
- return generic_fh_to_parent(sb, fid, fh_len, fh_type,
- ntfs_nfs_get_inode);
-}
-
-/*
- * Export operations allowing NFS exporting of mounted NTFS partitions.
- *
- * We use the default ->encode_fh() for now. Note that they
- * use 32 bits to store the inode number which is an unsigned long so on 64-bit
- * architectures is usually 64 bits so it would all fail horribly on huge
- * volumes. I guess we need to define our own encode and decode fh functions
- * that store 64-bit inode numbers at some point but for now we will ignore the
- * problem...
- *
- * We also use the default ->get_name() helper (used by ->decode_fh() via
- * fs/exportfs/expfs.c::find_exported_dentry()) as that is completely fs
- * independent.
- *
- * The default ->get_parent() just returns -EACCES so we have to provide our
- * own and the default ->get_dentry() is incompatible with NTFS due to not
- * allowing the inode number 0 which is used in NTFS for the system file $MFT
- * and due to using iget() whereas NTFS needs ntfs_iget().
- */
-const struct export_operations ntfs_export_ops = {
- .encode_fh = generic_encode_ino32_fh,
- .get_parent = ntfs_get_parent, /* Find the parent of a given
- directory. */
- .fh_to_dentry = ntfs_fh_to_dentry,
- .fh_to_parent = ntfs_fh_to_parent,
-};
diff --git a/fs/ntfs/ntfs.h b/fs/ntfs/ntfs.h
deleted file mode 100644
index e81376ea..0000000
--- a/fs/ntfs/ntfs.h
+++ /dev/null
@@ -1,150 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * ntfs.h - Defines for NTFS Linux kernel driver.
- *
- * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
- * Copyright (C) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_H
-#define _LINUX_NTFS_H
-
-#include <linux/stddef.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/compiler.h>
-#include <linux/fs.h>
-#include <linux/nls.h>
-#include <linux/smp.h>
-#include <linux/pagemap.h>
-
-#include "types.h"
-#include "volume.h"
-#include "layout.h"
-
-typedef enum {
- NTFS_BLOCK_SIZE = 512,
- NTFS_BLOCK_SIZE_BITS = 9,
- NTFS_SB_MAGIC = 0x5346544e, /* 'NTFS' */
- NTFS_MAX_NAME_LEN = 255,
- NTFS_MAX_ATTR_NAME_LEN = 255,
- NTFS_MAX_CLUSTER_SIZE = 64 * 1024, /* 64kiB */
- NTFS_MAX_PAGES_PER_CLUSTER = NTFS_MAX_CLUSTER_SIZE / PAGE_SIZE,
-} NTFS_CONSTANTS;
-
-/* Global variables. */
-
-/* Slab caches (from super.c). */
-extern struct kmem_cache *ntfs_name_cache;
-extern struct kmem_cache *ntfs_inode_cache;
-extern struct kmem_cache *ntfs_big_inode_cache;
-extern struct kmem_cache *ntfs_attr_ctx_cache;
-extern struct kmem_cache *ntfs_index_ctx_cache;
-
-/* The various operations structs defined throughout the driver files. */
-extern const struct address_space_operations ntfs_normal_aops;
-extern const struct address_space_operations ntfs_compressed_aops;
-extern const struct address_space_operations ntfs_mst_aops;
-
-extern const struct file_operations ntfs_file_ops;
-extern const struct inode_operations ntfs_file_inode_ops;
-
-extern const struct file_operations ntfs_dir_ops;
-extern const struct inode_operations ntfs_dir_inode_ops;
-
-extern const struct file_operations ntfs_empty_file_ops;
-extern const struct inode_operations ntfs_empty_inode_ops;
-
-extern const struct export_operations ntfs_export_ops;
-
-/**
- * NTFS_SB - return the ntfs volume given a vfs super block
- * @sb: VFS super block
- *
- * NTFS_SB() returns the ntfs volume associated with the VFS super block @sb.
- */
-static inline ntfs_volume *NTFS_SB(struct super_block *sb)
-{
- return sb->s_fs_info;
-}
-
-/* Declarations of functions and global variables. */
-
-/* From fs/ntfs/compress.c */
-extern int ntfs_read_compressed_block(struct page *page);
-extern int allocate_compression_buffers(void);
-extern void free_compression_buffers(void);
-
-/* From fs/ntfs/super.c */
-#define default_upcase_len 0x10000
-extern struct mutex ntfs_lock;
-
-typedef struct {
- int val;
- char *str;
-} option_t;
-extern const option_t on_errors_arr[];
-
-/* From fs/ntfs/mst.c */
-extern int post_read_mst_fixup(NTFS_RECORD *b, const u32 size);
-extern int pre_write_mst_fixup(NTFS_RECORD *b, const u32 size);
-extern void post_write_mst_fixup(NTFS_RECORD *b);
-
-/* From fs/ntfs/unistr.c */
-extern bool ntfs_are_names_equal(const ntfschar *s1, size_t s1_len,
- const ntfschar *s2, size_t s2_len,
- const IGNORE_CASE_BOOL ic,
- const ntfschar *upcase, const u32 upcase_size);
-extern int ntfs_collate_names(const ntfschar *name1, const u32 name1_len,
- const ntfschar *name2, const u32 name2_len,
- const int err_val, const IGNORE_CASE_BOOL ic,
- const ntfschar *upcase, const u32 upcase_len);
-extern int ntfs_ucsncmp(const ntfschar *s1, const ntfschar *s2, size_t n);
-extern int ntfs_ucsncasecmp(const ntfschar *s1, const ntfschar *s2, size_t n,
- const ntfschar *upcase, const u32 upcase_size);
-extern void ntfs_upcase_name(ntfschar *name, u32 name_len,
- const ntfschar *upcase, const u32 upcase_len);
-extern void ntfs_file_upcase_value(FILE_NAME_ATTR *file_name_attr,
- const ntfschar *upcase, const u32 upcase_len);
-extern int ntfs_file_compare_values(FILE_NAME_ATTR *file_name_attr1,
- FILE_NAME_ATTR *file_name_attr2,
- const int err_val, const IGNORE_CASE_BOOL ic,
- const ntfschar *upcase, const u32 upcase_len);
-extern int ntfs_nlstoucs(const ntfs_volume *vol, const char *ins,
- const int ins_len, ntfschar **outs);
-extern int ntfs_ucstonls(const ntfs_volume *vol, const ntfschar *ins,
- const int ins_len, unsigned char **outs, int outs_len);
-
-/* From fs/ntfs/upcase.c */
-extern ntfschar *generate_default_upcase(void);
-
-static inline int ntfs_ffs(int x)
-{
- int r = 1;
-
- if (!x)
- return 0;
- if (!(x & 0xffff)) {
- x >>= 16;
- r += 16;
- }
- if (!(x & 0xff)) {
- x >>= 8;
- r += 8;
- }
- if (!(x & 0xf)) {
- x >>= 4;
- r += 4;
- }
- if (!(x & 3)) {
- x >>= 2;
- r += 2;
- }
- if (!(x & 1)) {
- x >>= 1;
- r += 1;
- }
- return r;
-}
-
-#endif /* _LINUX_NTFS_H */
diff --git a/fs/ntfs/quota.c b/fs/ntfs/quota.c
deleted file mode 100644
index 9160480..0000000
--- a/fs/ntfs/quota.c
+++ /dev/null
@@ -1,103 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * quota.c - NTFS kernel quota ($Quota) handling. Part of the Linux-NTFS
- * project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include "index.h"
-#include "quota.h"
-#include "debug.h"
-#include "ntfs.h"
-
-/**
- * ntfs_mark_quotas_out_of_date - mark the quotas out of date on an ntfs volume
- * @vol: ntfs volume on which to mark the quotas out of date
- *
- * Mark the quotas out of date on the ntfs volume @vol and return 'true' on
- * success and 'false' on error.
- */
-bool ntfs_mark_quotas_out_of_date(ntfs_volume *vol)
-{
- ntfs_index_context *ictx;
- QUOTA_CONTROL_ENTRY *qce;
- const le32 qid = QUOTA_DEFAULTS_ID;
- int err;
-
- ntfs_debug("Entering.");
- if (NVolQuotaOutOfDate(vol))
- goto done;
- if (!vol->quota_ino || !vol->quota_q_ino) {
- ntfs_error(vol->sb, "Quota inodes are not open.");
- return false;
- }
- inode_lock(vol->quota_q_ino);
- ictx = ntfs_index_ctx_get(NTFS_I(vol->quota_q_ino));
- if (!ictx) {
- ntfs_error(vol->sb, "Failed to get index context.");
- goto err_out;
- }
- err = ntfs_index_lookup(&qid, sizeof(qid), ictx);
- if (err) {
- if (err == -ENOENT)
- ntfs_error(vol->sb, "Quota defaults entry is not "
- "present.");
- else
- ntfs_error(vol->sb, "Lookup of quota defaults entry "
- "failed.");
- goto err_out;
- }
- if (ictx->data_len < offsetof(QUOTA_CONTROL_ENTRY, sid)) {
- ntfs_error(vol->sb, "Quota defaults entry size is invalid. "
- "Run chkdsk.");
- goto err_out;
- }
- qce = (QUOTA_CONTROL_ENTRY*)ictx->data;
- if (le32_to_cpu(qce->version) != QUOTA_VERSION) {
- ntfs_error(vol->sb, "Quota defaults entry version 0x%x is not "
- "supported.", le32_to_cpu(qce->version));
- goto err_out;
- }
- ntfs_debug("Quota defaults flags = 0x%x.", le32_to_cpu(qce->flags));
- /* If quotas are already marked out of date, no need to do anything. */
- if (qce->flags & QUOTA_FLAG_OUT_OF_DATE)
- goto set_done;
- /*
- * If quota tracking is neither requested, nor enabled and there are no
- * pending deletes, no need to mark the quotas out of date.
- */
- if (!(qce->flags & (QUOTA_FLAG_TRACKING_ENABLED |
- QUOTA_FLAG_TRACKING_REQUESTED |
- QUOTA_FLAG_PENDING_DELETES)))
- goto set_done;
- /*
- * Set the QUOTA_FLAG_OUT_OF_DATE bit thus marking quotas out of date.
- * This is verified on WinXP to be sufficient to cause windows to
- * rescan the volume on boot and update all quota entries.
- */
- qce->flags |= QUOTA_FLAG_OUT_OF_DATE;
- /* Ensure the modified flags are written to disk. */
- ntfs_index_entry_flush_dcache_page(ictx);
- ntfs_index_entry_mark_dirty(ictx);
-set_done:
- ntfs_index_ctx_put(ictx);
- inode_unlock(vol->quota_q_ino);
- /*
- * We set the flag so we do not try to mark the quotas out of date
- * again on remount.
- */
- NVolSetQuotaOutOfDate(vol);
-done:
- ntfs_debug("Done.");
- return true;
-err_out:
- if (ictx)
- ntfs_index_ctx_put(ictx);
- inode_unlock(vol->quota_q_ino);
- return false;
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/quota.h b/fs/ntfs/quota.h
deleted file mode 100644
index fe3132a..0000000
--- a/fs/ntfs/quota.h
+++ /dev/null
@@ -1,21 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * quota.h - Defines for NTFS kernel quota ($Quota) handling. Part of the
- * Linux-NTFS project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_QUOTA_H
-#define _LINUX_NTFS_QUOTA_H
-
-#ifdef NTFS_RW
-
-#include "types.h"
-#include "volume.h"
-
-extern bool ntfs_mark_quotas_out_of_date(ntfs_volume *vol);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_QUOTA_H */
diff --git a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
deleted file mode 100644
index 0d448e9..0000000
--- a/fs/ntfs/runlist.c
+++ /dev/null
@@ -1,1893 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * runlist.c - NTFS runlist handling code. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2007 Anton Altaparmakov
- * Copyright (c) 2002-2005 Richard Russon
- */
-
-#include "debug.h"
-#include "dir.h"
-#include "endian.h"
-#include "malloc.h"
-#include "ntfs.h"
-
-/**
- * ntfs_rl_mm - runlist memmove
- *
- * It is up to the caller to serialize access to the runlist @base.
- */
-static inline void ntfs_rl_mm(runlist_element *base, int dst, int src,
- int size)
-{
- if (likely((dst != src) && (size > 0)))
- memmove(base + dst, base + src, size * sizeof(*base));
-}
-
-/**
- * ntfs_rl_mc - runlist memory copy
- *
- * It is up to the caller to serialize access to the runlists @dstbase and
- * @srcbase.
- */
-static inline void ntfs_rl_mc(runlist_element *dstbase, int dst,
- runlist_element *srcbase, int src, int size)
-{
- if (likely(size > 0))
- memcpy(dstbase + dst, srcbase + src, size * sizeof(*dstbase));
-}
-
-/**
- * ntfs_rl_realloc - Reallocate memory for runlists
- * @rl: original runlist
- * @old_size: number of runlist elements in the original runlist @rl
- * @new_size: number of runlist elements we need space for
- *
- * As the runlists grow, more memory will be required. To prevent the
- * kernel having to allocate and reallocate large numbers of small bits of
- * memory, this function returns an entire page of memory.
- *
- * It is up to the caller to serialize access to the runlist @rl.
- *
- * N.B. If the new allocation doesn't require a different number of pages in
- * memory, the function will return the original pointer.
- *
- * On success, return a pointer to the newly allocated, or recycled, memory.
- * On error, return -errno. The following error codes are defined:
- * -ENOMEM - Not enough memory to allocate runlist array.
- * -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_realloc(runlist_element *rl,
- int old_size, int new_size)
-{
- runlist_element *new_rl;
-
- old_size = PAGE_ALIGN(old_size * sizeof(*rl));
- new_size = PAGE_ALIGN(new_size * sizeof(*rl));
- if (old_size == new_size)
- return rl;
-
- new_rl = ntfs_malloc_nofs(new_size);
- if (unlikely(!new_rl))
- return ERR_PTR(-ENOMEM);
-
- if (likely(rl != NULL)) {
- if (unlikely(old_size > new_size))
- old_size = new_size;
- memcpy(new_rl, rl, old_size);
- ntfs_free(rl);
- }
- return new_rl;
-}
-
-/**
- * ntfs_rl_realloc_nofail - Reallocate memory for runlists
- * @rl: original runlist
- * @old_size: number of runlist elements in the original runlist @rl
- * @new_size: number of runlist elements we need space for
- *
- * As the runlists grow, more memory will be required. To prevent the
- * kernel having to allocate and reallocate large numbers of small bits of
- * memory, this function returns an entire page of memory.
- *
- * This function guarantees that the allocation will succeed. It will sleep
- * for as long as it takes to complete the allocation.
- *
- * It is up to the caller to serialize access to the runlist @rl.
- *
- * N.B. If the new allocation doesn't require a different number of pages in
- * memory, the function will return the original pointer.
- *
- * On success, return a pointer to the newly allocated, or recycled, memory.
- * On error, return -errno. The following error codes are defined:
- * -ENOMEM - Not enough memory to allocate runlist array.
- * -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_realloc_nofail(runlist_element *rl,
- int old_size, int new_size)
-{
- runlist_element *new_rl;
-
- old_size = PAGE_ALIGN(old_size * sizeof(*rl));
- new_size = PAGE_ALIGN(new_size * sizeof(*rl));
- if (old_size == new_size)
- return rl;
-
- new_rl = ntfs_malloc_nofs_nofail(new_size);
- BUG_ON(!new_rl);
-
- if (likely(rl != NULL)) {
- if (unlikely(old_size > new_size))
- old_size = new_size;
- memcpy(new_rl, rl, old_size);
- ntfs_free(rl);
- }
- return new_rl;
-}
-
-/**
- * ntfs_are_rl_mergeable - test if two runlists can be joined together
- * @dst: original runlist
- * @src: new runlist to test for mergeability with @dst
- *
- * Test if two runlists can be joined together. For this, their VCNs and LCNs
- * must be adjacent.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * Return: true Success, the runlists can be merged.
- * false Failure, the runlists cannot be merged.
- */
-static inline bool ntfs_are_rl_mergeable(runlist_element *dst,
- runlist_element *src)
-{
- BUG_ON(!dst);
- BUG_ON(!src);
-
- /* We can merge unmapped regions even if they are misaligned. */
- if ((dst->lcn == LCN_RL_NOT_MAPPED) && (src->lcn == LCN_RL_NOT_MAPPED))
- return true;
- /* If the runs are misaligned, we cannot merge them. */
- if ((dst->vcn + dst->length) != src->vcn)
- return false;
- /* If both runs are non-sparse and contiguous, we can merge them. */
- if ((dst->lcn >= 0) && (src->lcn >= 0) &&
- ((dst->lcn + dst->length) == src->lcn))
- return true;
- /* If we are merging two holes, we can merge them. */
- if ((dst->lcn == LCN_HOLE) && (src->lcn == LCN_HOLE))
- return true;
- /* Cannot merge. */
- return false;
-}
-
-/**
- * __ntfs_rl_merge - merge two runlists without testing if they can be merged
- * @dst: original, destination runlist
- * @src: new runlist to merge with @dst
- *
- * Merge the two runlists, writing into the destination runlist @dst. The
- * caller must make sure the runlists can be merged or this will corrupt the
- * destination runlist.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- */
-static inline void __ntfs_rl_merge(runlist_element *dst, runlist_element *src)
-{
- dst->length += src->length;
-}
-
-/**
- * ntfs_rl_append - append a runlist after a given element
- * @dst: original runlist to be worked on
- * @dsize: number of elements in @dst (including end marker)
- * @src: runlist to be inserted into @dst
- * @ssize: number of elements in @src (excluding end marker)
- * @loc: append the new runlist @src after this element in @dst
- *
- * Append the runlist @src after element @loc in @dst. Merge the right end of
- * the new runlist, if necessary. Adjust the size of the hole before the
- * appended runlist.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- * -ENOMEM - Not enough memory to allocate runlist array.
- * -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_append(runlist_element *dst,
- int dsize, runlist_element *src, int ssize, int loc)
-{
- bool right = false; /* Right end of @src needs merging. */
- int marker; /* End of the inserted runs. */
-
- BUG_ON(!dst);
- BUG_ON(!src);
-
- /* First, check if the right hand end needs merging. */
- if ((loc + 1) < dsize)
- right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
-
- /* Space required: @dst size + @src size, less one if we merged. */
- dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - right);
- if (IS_ERR(dst))
- return dst;
- /*
- * We are guaranteed to succeed from here so can start modifying the
- * original runlists.
- */
-
- /* First, merge the right hand end, if necessary. */
- if (right)
- __ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
-
- /* First run after the @src runs that have been inserted. */
- marker = loc + ssize + 1;
-
- /* Move the tail of @dst out of the way, then copy in @src. */
- ntfs_rl_mm(dst, marker, loc + 1 + right, dsize - (loc + 1 + right));
- ntfs_rl_mc(dst, loc + 1, src, 0, ssize);
-
- /* Adjust the size of the preceding hole. */
- dst[loc].length = dst[loc + 1].vcn - dst[loc].vcn;
-
- /* We may have changed the length of the file, so fix the end marker */
- if (dst[marker].lcn == LCN_ENOENT)
- dst[marker].vcn = dst[marker - 1].vcn + dst[marker - 1].length;
-
- return dst;
-}
-
-/**
- * ntfs_rl_insert - insert a runlist into another
- * @dst: original runlist to be worked on
- * @dsize: number of elements in @dst (including end marker)
- * @src: new runlist to be inserted
- * @ssize: number of elements in @src (excluding end marker)
- * @loc: insert the new runlist @src before this element in @dst
- *
- * Insert the runlist @src before element @loc in the runlist @dst. Merge the
- * left end of the new runlist, if necessary. Adjust the size of the hole
- * after the inserted runlist.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- * -ENOMEM - Not enough memory to allocate runlist array.
- * -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_insert(runlist_element *dst,
- int dsize, runlist_element *src, int ssize, int loc)
-{
- bool left = false; /* Left end of @src needs merging. */
- bool disc = false; /* Discontinuity between @dst and @src. */
- int marker; /* End of the inserted runs. */
-
- BUG_ON(!dst);
- BUG_ON(!src);
-
- /*
- * disc => Discontinuity between the end of @dst and the start of @src.
- * This means we might need to insert a "not mapped" run.
- */
- if (loc == 0)
- disc = (src[0].vcn > 0);
- else {
- s64 merged_length;
-
- left = ntfs_are_rl_mergeable(dst + loc - 1, src);
-
- merged_length = dst[loc - 1].length;
- if (left)
- merged_length += src->length;
-
- disc = (src[0].vcn > dst[loc - 1].vcn + merged_length);
- }
- /*
- * Space required: @dst size + @src size, less one if we merged, plus
- * one if there was a discontinuity.
- */
- dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - left + disc);
- if (IS_ERR(dst))
- return dst;
- /*
- * We are guaranteed to succeed from here so can start modifying the
- * original runlist.
- */
- if (left)
- __ntfs_rl_merge(dst + loc - 1, src);
- /*
- * First run after the @src runs that have been inserted.
- * Nominally, @marker equals @loc + @ssize, i.e. location + number of
- * runs in @src. However, if @left, then the first run in @src has
- * been merged with one in @dst. And if @disc, then @dst and @src do
- * not meet and we need an extra run to fill the gap.
- */
- marker = loc + ssize - left + disc;
-
- /* Move the tail of @dst out of the way, then copy in @src. */
- ntfs_rl_mm(dst, marker, loc, dsize - loc);
- ntfs_rl_mc(dst, loc + disc, src, left, ssize - left);
-
- /* Adjust the VCN of the first run after the insertion... */
- dst[marker].vcn = dst[marker - 1].vcn + dst[marker - 1].length;
- /* ... and the length. */
- if (dst[marker].lcn == LCN_HOLE || dst[marker].lcn == LCN_RL_NOT_MAPPED)
- dst[marker].length = dst[marker + 1].vcn - dst[marker].vcn;
-
- /* Writing beyond the end of the file and there is a discontinuity. */
- if (disc) {
- if (loc > 0) {
- dst[loc].vcn = dst[loc - 1].vcn + dst[loc - 1].length;
- dst[loc].length = dst[loc + 1].vcn - dst[loc].vcn;
- } else {
- dst[loc].vcn = 0;
- dst[loc].length = dst[loc + 1].vcn;
- }
- dst[loc].lcn = LCN_RL_NOT_MAPPED;
- }
- return dst;
-}
-
-/**
- * ntfs_rl_replace - overwrite a runlist element with another runlist
- * @dst: original runlist to be worked on
- * @dsize: number of elements in @dst (including end marker)
- * @src: new runlist to be inserted
- * @ssize: number of elements in @src (excluding end marker)
- * @loc: index in runlist @dst to overwrite with @src
- *
- * Replace the runlist element @dst at @loc with @src. Merge the left and
- * right ends of the inserted runlist, if necessary.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- * -ENOMEM - Not enough memory to allocate runlist array.
- * -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_replace(runlist_element *dst,
- int dsize, runlist_element *src, int ssize, int loc)
-{
- signed delta;
- bool left = false; /* Left end of @src needs merging. */
- bool right = false; /* Right end of @src needs merging. */
- int tail; /* Start of tail of @dst. */
- int marker; /* End of the inserted runs. */
-
- BUG_ON(!dst);
- BUG_ON(!src);
-
- /* First, see if the left and right ends need merging. */
- if ((loc + 1) < dsize)
- right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
- if (loc > 0)
- left = ntfs_are_rl_mergeable(dst + loc - 1, src);
- /*
- * Allocate some space. We will need less if the left, right, or both
- * ends get merged. The -1 accounts for the run being replaced.
- */
- delta = ssize - 1 - left - right;
- if (delta > 0) {
- dst = ntfs_rl_realloc(dst, dsize, dsize + delta);
- if (IS_ERR(dst))
- return dst;
- }
- /*
- * We are guaranteed to succeed from here so can start modifying the
- * original runlists.
- */
-
- /* First, merge the left and right ends, if necessary. */
- if (right)
- __ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
- if (left)
- __ntfs_rl_merge(dst + loc - 1, src);
- /*
- * Offset of the tail of @dst. This needs to be moved out of the way
- * to make space for the runs to be copied from @src, i.e. the first
- * run of the tail of @dst.
- * Nominally, @tail equals @loc + 1, i.e. location, skipping the
- * replaced run. However, if @right, then one of @dst's runs is
- * already merged into @src.
- */
- tail = loc + right + 1;
- /*
- * First run after the @src runs that have been inserted, i.e. where
- * the tail of @dst needs to be moved to.
- * Nominally, @marker equals @loc + @ssize, i.e. location + number of
- * runs in @src. However, if @left, then the first run in @src has
- * been merged with one in @dst.
- */
- marker = loc + ssize - left;
-
- /* Move the tail of @dst out of the way, then copy in @src. */
- ntfs_rl_mm(dst, marker, tail, dsize - tail);
- ntfs_rl_mc(dst, loc, src, left, ssize - left);
-
- /* We may have changed the length of the file, so fix the end marker. */
- if (dsize - tail > 0 && dst[marker].lcn == LCN_ENOENT)
- dst[marker].vcn = dst[marker - 1].vcn + dst[marker - 1].length;
- return dst;
-}
-
-/**
- * ntfs_rl_split - insert a runlist into the centre of a hole
- * @dst: original runlist to be worked on
- * @dsize: number of elements in @dst (including end marker)
- * @src: new runlist to be inserted
- * @ssize: number of elements in @src (excluding end marker)
- * @loc: index in runlist @dst at which to split and insert @src
- *
- * Split the runlist @dst at @loc into two and insert @new in between the two
- * fragments. No merging of runlists is necessary. Adjust the size of the
- * holes either side.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- * -ENOMEM - Not enough memory to allocate runlist array.
- * -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_split(runlist_element *dst, int dsize,
- runlist_element *src, int ssize, int loc)
-{
- BUG_ON(!dst);
- BUG_ON(!src);
-
- /* Space required: @dst size + @src size + one new hole. */
- dst = ntfs_rl_realloc(dst, dsize, dsize + ssize + 1);
- if (IS_ERR(dst))
- return dst;
- /*
- * We are guaranteed to succeed from here so can start modifying the
- * original runlists.
- */
-
- /* Move the tail of @dst out of the way, then copy in @src. */
- ntfs_rl_mm(dst, loc + 1 + ssize, loc, dsize - loc);
- ntfs_rl_mc(dst, loc + 1, src, 0, ssize);
-
- /* Adjust the size of the holes either size of @src. */
- dst[loc].length = dst[loc+1].vcn - dst[loc].vcn;
- dst[loc+ssize+1].vcn = dst[loc+ssize].vcn + dst[loc+ssize].length;
- dst[loc+ssize+1].length = dst[loc+ssize+2].vcn - dst[loc+ssize+1].vcn;
-
- return dst;
-}
-
-/**
- * ntfs_runlists_merge - merge two runlists into one
- * @drl: original runlist to be worked on
- * @srl: new runlist to be merged into @drl
- *
- * First we sanity check the two runlists @srl and @drl to make sure that they
- * are sensible and can be merged. The runlist @srl must be either after the
- * runlist @drl or completely within a hole (or unmapped region) in @drl.
- *
- * It is up to the caller to serialize access to the runlists @drl and @srl.
- *
- * Merging of runlists is necessary in two cases:
- * 1. When attribute lists are used and a further extent is being mapped.
- * 2. When new clusters are allocated to fill a hole or extend a file.
- *
- * There are four possible ways @srl can be merged. It can:
- * - be inserted at the beginning of a hole,
- * - split the hole in two and be inserted between the two fragments,
- * - be appended at the end of a hole, or it can
- * - replace the whole hole.
- * It can also be appended to the end of the runlist, which is just a variant
- * of the insert case.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @drl and @srl are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- * -ENOMEM - Not enough memory to allocate runlist array.
- * -EINVAL - Invalid parameters were passed in.
- * -ERANGE - The runlists overlap and cannot be merged.
- */
-runlist_element *ntfs_runlists_merge(runlist_element *drl,
- runlist_element *srl)
-{
- int di, si; /* Current index into @[ds]rl. */
- int sstart; /* First index with lcn > LCN_RL_NOT_MAPPED. */
- int dins; /* Index into @drl at which to insert @srl. */
- int dend, send; /* Last index into @[ds]rl. */
- int dfinal, sfinal; /* The last index into @[ds]rl with
- lcn >= LCN_HOLE. */
- int marker = 0;
- VCN marker_vcn = 0;
-
-#ifdef DEBUG
- ntfs_debug("dst:");
- ntfs_debug_dump_runlist(drl);
- ntfs_debug("src:");
- ntfs_debug_dump_runlist(srl);
-#endif
-
- /* Check for silly calling... */
- if (unlikely(!srl))
- return drl;
- if (IS_ERR(srl) || IS_ERR(drl))
- return ERR_PTR(-EINVAL);
-
- /* Check for the case where the first mapping is being done now. */
- if (unlikely(!drl)) {
- drl = srl;
- /* Complete the source runlist if necessary. */
- if (unlikely(drl[0].vcn)) {
- /* Scan to the end of the source runlist. */
- for (dend = 0; likely(drl[dend].length); dend++)
- ;
- dend++;
- drl = ntfs_rl_realloc(drl, dend, dend + 1);
- if (IS_ERR(drl))
- return drl;
- /* Insert start element at the front of the runlist. */
- ntfs_rl_mm(drl, 1, 0, dend);
- drl[0].vcn = 0;
- drl[0].lcn = LCN_RL_NOT_MAPPED;
- drl[0].length = drl[1].vcn;
- }
- goto finished;
- }
-
- si = di = 0;
-
- /* Skip any unmapped start element(s) in the source runlist. */
- while (srl[si].length && srl[si].lcn < LCN_HOLE)
- si++;
-
- /* Can't have an entirely unmapped source runlist. */
- BUG_ON(!srl[si].length);
-
- /* Record the starting points. */
- sstart = si;
-
- /*
- * Skip forward in @drl until we reach the position where @srl needs to
- * be inserted. If we reach the end of @drl, @srl just needs to be
- * appended to @drl.
- */
- for (; drl[di].length; di++) {
- if (drl[di].vcn + drl[di].length > srl[sstart].vcn)
- break;
- }
- dins = di;
-
- /* Sanity check for illegal overlaps. */
- if ((drl[di].vcn == srl[si].vcn) && (drl[di].lcn >= 0) &&
- (srl[si].lcn >= 0)) {
- ntfs_error(NULL, "Run lists overlap. Cannot merge!");
- return ERR_PTR(-ERANGE);
- }
-
- /* Scan to the end of both runlists in order to know their sizes. */
- for (send = si; srl[send].length; send++)
- ;
- for (dend = di; drl[dend].length; dend++)
- ;
-
- if (srl[send].lcn == LCN_ENOENT)
- marker_vcn = srl[marker = send].vcn;
-
- /* Scan to the last element with lcn >= LCN_HOLE. */
- for (sfinal = send; sfinal >= 0 && srl[sfinal].lcn < LCN_HOLE; sfinal--)
- ;
- for (dfinal = dend; dfinal >= 0 && drl[dfinal].lcn < LCN_HOLE; dfinal--)
- ;
-
- {
- bool start;
- bool finish;
- int ds = dend + 1; /* Number of elements in drl & srl */
- int ss = sfinal - sstart + 1;
-
- start = ((drl[dins].lcn < LCN_RL_NOT_MAPPED) || /* End of file */
- (drl[dins].vcn == srl[sstart].vcn)); /* Start of hole */
- finish = ((drl[dins].lcn >= LCN_RL_NOT_MAPPED) && /* End of file */
- ((drl[dins].vcn + drl[dins].length) <= /* End of hole */
- (srl[send - 1].vcn + srl[send - 1].length)));
-
- /* Or we will lose an end marker. */
- if (finish && !drl[dins].length)
- ss++;
- if (marker && (drl[dins].vcn + drl[dins].length > srl[send - 1].vcn))
- finish = false;
-#if 0
- ntfs_debug("dfinal = %i, dend = %i", dfinal, dend);
- ntfs_debug("sstart = %i, sfinal = %i, send = %i", sstart, sfinal, send);
- ntfs_debug("start = %i, finish = %i", start, finish);
- ntfs_debug("ds = %i, ss = %i, dins = %i", ds, ss, dins);
-#endif
- if (start) {
- if (finish)
- drl = ntfs_rl_replace(drl, ds, srl + sstart, ss, dins);
- else
- drl = ntfs_rl_insert(drl, ds, srl + sstart, ss, dins);
- } else {
- if (finish)
- drl = ntfs_rl_append(drl, ds, srl + sstart, ss, dins);
- else
- drl = ntfs_rl_split(drl, ds, srl + sstart, ss, dins);
- }
- if (IS_ERR(drl)) {
- ntfs_error(NULL, "Merge failed.");
- return drl;
- }
- ntfs_free(srl);
- if (marker) {
- ntfs_debug("Triggering marker code.");
- for (ds = dend; drl[ds].length; ds++)
- ;
- /* We only need to care if @srl ended after @drl. */
- if (drl[ds].vcn <= marker_vcn) {
- int slots = 0;
-
- if (drl[ds].vcn == marker_vcn) {
- ntfs_debug("Old marker = 0x%llx, replacing "
- "with LCN_ENOENT.",
- (unsigned long long)
- drl[ds].lcn);
- drl[ds].lcn = LCN_ENOENT;
- goto finished;
- }
- /*
- * We need to create an unmapped runlist element in
- * @drl or extend an existing one before adding the
- * ENOENT terminator.
- */
- if (drl[ds].lcn == LCN_ENOENT) {
- ds--;
- slots = 1;
- }
- if (drl[ds].lcn != LCN_RL_NOT_MAPPED) {
- /* Add an unmapped runlist element. */
- if (!slots) {
- drl = ntfs_rl_realloc_nofail(drl, ds,
- ds + 2);
- slots = 2;
- }
- ds++;
- /* Need to set vcn if it isn't set already. */
- if (slots != 1)
- drl[ds].vcn = drl[ds - 1].vcn +
- drl[ds - 1].length;
- drl[ds].lcn = LCN_RL_NOT_MAPPED;
- /* We now used up a slot. */
- slots--;
- }
- drl[ds].length = marker_vcn - drl[ds].vcn;
- /* Finally add the ENOENT terminator. */
- ds++;
- if (!slots)
- drl = ntfs_rl_realloc_nofail(drl, ds, ds + 1);
- drl[ds].vcn = marker_vcn;
- drl[ds].lcn = LCN_ENOENT;
- drl[ds].length = (s64)0;
- }
- }
- }
-
-finished:
- /* The merge was completed successfully. */
- ntfs_debug("Merged runlist:");
- ntfs_debug_dump_runlist(drl);
- return drl;
-}
-
-/**
- * ntfs_mapping_pairs_decompress - convert mapping pairs array to runlist
- * @vol: ntfs volume on which the attribute resides
- * @attr: attribute record whose mapping pairs array to decompress
- * @old_rl: optional runlist in which to insert @attr's runlist
- *
- * It is up to the caller to serialize access to the runlist @old_rl.
- *
- * Decompress the attribute @attr's mapping pairs array into a runlist. On
- * success, return the decompressed runlist.
- *
- * If @old_rl is not NULL, decompressed runlist is inserted into the
- * appropriate place in @old_rl and the resultant, combined runlist is
- * returned. The original @old_rl is deallocated.
- *
- * On error, return -errno. @old_rl is left unmodified in that case.
- *
- * The following error codes are defined:
- * -ENOMEM - Not enough memory to allocate runlist array.
- * -EIO - Corrupt runlist.
- * -EINVAL - Invalid parameters were passed in.
- * -ERANGE - The two runlists overlap.
- *
- * FIXME: For now we take the conceptionally simplest approach of creating the
- * new runlist disregarding the already existing one and then splicing the
- * two into one, if that is possible (we check for overlap and discard the new
- * runlist if overlap present before returning ERR_PTR(-ERANGE)).
- */
-runlist_element *ntfs_mapping_pairs_decompress(const ntfs_volume *vol,
- const ATTR_RECORD *attr, runlist_element *old_rl)
-{
- VCN vcn; /* Current vcn. */
- LCN lcn; /* Current lcn. */
- s64 deltaxcn; /* Change in [vl]cn. */
- runlist_element *rl; /* The output runlist. */
- u8 *buf; /* Current position in mapping pairs array. */
- u8 *attr_end; /* End of attribute. */
- int rlsize; /* Size of runlist buffer. */
- u16 rlpos; /* Current runlist position in units of
- runlist_elements. */
- u8 b; /* Current byte offset in buf. */
-
-#ifdef DEBUG
- /* Make sure attr exists and is non-resident. */
- if (!attr || !attr->non_resident || sle64_to_cpu(
- attr->data.non_resident.lowest_vcn) < (VCN)0) {
- ntfs_error(vol->sb, "Invalid arguments.");
- return ERR_PTR(-EINVAL);
- }
-#endif
- /* Start at vcn = lowest_vcn and lcn 0. */
- vcn = sle64_to_cpu(attr->data.non_resident.lowest_vcn);
- lcn = 0;
- /* Get start of the mapping pairs array. */
- buf = (u8*)attr + le16_to_cpu(
- attr->data.non_resident.mapping_pairs_offset);
- attr_end = (u8*)attr + le32_to_cpu(attr->length);
- if (unlikely(buf < (u8*)attr || buf > attr_end)) {
- ntfs_error(vol->sb, "Corrupt attribute.");
- return ERR_PTR(-EIO);
- }
- /* If the mapping pairs array is valid but empty, nothing to do. */
- if (!vcn && !*buf)
- return old_rl;
- /* Current position in runlist array. */
- rlpos = 0;
- /* Allocate first page and set current runlist size to one page. */
- rl = ntfs_malloc_nofs(rlsize = PAGE_SIZE);
- if (unlikely(!rl))
- return ERR_PTR(-ENOMEM);
- /* Insert unmapped starting element if necessary. */
- if (vcn) {
- rl->vcn = 0;
- rl->lcn = LCN_RL_NOT_MAPPED;
- rl->length = vcn;
- rlpos++;
- }
- while (buf < attr_end && *buf) {
- /*
- * Allocate more memory if needed, including space for the
- * not-mapped and terminator elements. ntfs_malloc_nofs()
- * operates on whole pages only.
- */
- if (((rlpos + 3) * sizeof(*old_rl)) > rlsize) {
- runlist_element *rl2;
-
- rl2 = ntfs_malloc_nofs(rlsize + (int)PAGE_SIZE);
- if (unlikely(!rl2)) {
- ntfs_free(rl);
- return ERR_PTR(-ENOMEM);
- }
- memcpy(rl2, rl, rlsize);
- ntfs_free(rl);
- rl = rl2;
- rlsize += PAGE_SIZE;
- }
- /* Enter the current vcn into the current runlist element. */
- rl[rlpos].vcn = vcn;
- /*
- * Get the change in vcn, i.e. the run length in clusters.
- * Doing it this way ensures that we signextend negative values.
- * A negative run length doesn't make any sense, but hey, I
- * didn't make up the NTFS specs and Windows NT4 treats the run
- * length as a signed value so that's how it is...
- */
- b = *buf & 0xf;
- if (b) {
- if (unlikely(buf + b > attr_end))
- goto io_error;
- for (deltaxcn = (s8)buf[b--]; b; b--)
- deltaxcn = (deltaxcn << 8) + buf[b];
- } else { /* The length entry is compulsory. */
- ntfs_error(vol->sb, "Missing length entry in mapping "
- "pairs array.");
- deltaxcn = (s64)-1;
- }
- /*
- * Assume a negative length to indicate data corruption and
- * hence clean-up and return NULL.
- */
- if (unlikely(deltaxcn < 0)) {
- ntfs_error(vol->sb, "Invalid length in mapping pairs "
- "array.");
- goto err_out;
- }
- /*
- * Enter the current run length into the current runlist
- * element.
- */
- rl[rlpos].length = deltaxcn;
- /* Increment the current vcn by the current run length. */
- vcn += deltaxcn;
- /*
- * There might be no lcn change at all, as is the case for
- * sparse clusters on NTFS 3.0+, in which case we set the lcn
- * to LCN_HOLE.
- */
- if (!(*buf & 0xf0))
- rl[rlpos].lcn = LCN_HOLE;
- else {
- /* Get the lcn change which really can be negative. */
- u8 b2 = *buf & 0xf;
- b = b2 + ((*buf >> 4) & 0xf);
- if (buf + b > attr_end)
- goto io_error;
- for (deltaxcn = (s8)buf[b--]; b > b2; b--)
- deltaxcn = (deltaxcn << 8) + buf[b];
- /* Change the current lcn to its new value. */
- lcn += deltaxcn;
-#ifdef DEBUG
- /*
- * On NTFS 1.2-, apparently can have lcn == -1 to
- * indicate a hole. But we haven't verified ourselves
- * whether it is really the lcn or the deltaxcn that is
- * -1. So if either is found give us a message so we
- * can investigate it further!
- */
- if (vol->major_ver < 3) {
- if (unlikely(deltaxcn == (LCN)-1))
- ntfs_error(vol->sb, "lcn delta == -1");
- if (unlikely(lcn == (LCN)-1))
- ntfs_error(vol->sb, "lcn == -1");
- }
-#endif
- /* Check lcn is not below -1. */
- if (unlikely(lcn < (LCN)-1)) {
- ntfs_error(vol->sb, "Invalid LCN < -1 in "
- "mapping pairs array.");
- goto err_out;
- }
- /* Enter the current lcn into the runlist element. */
- rl[rlpos].lcn = lcn;
- }
- /* Get to the next runlist element. */
- rlpos++;
- /* Increment the buffer position to the next mapping pair. */
- buf += (*buf & 0xf) + ((*buf >> 4) & 0xf) + 1;
- }
- if (unlikely(buf >= attr_end))
- goto io_error;
- /*
- * If there is a highest_vcn specified, it must be equal to the final
- * vcn in the runlist - 1, or something has gone badly wrong.
- */
- deltaxcn = sle64_to_cpu(attr->data.non_resident.highest_vcn);
- if (unlikely(deltaxcn && vcn - 1 != deltaxcn)) {
-mpa_err:
- ntfs_error(vol->sb, "Corrupt mapping pairs array in "
- "non-resident attribute.");
- goto err_out;
- }
- /* Setup not mapped runlist element if this is the base extent. */
- if (!attr->data.non_resident.lowest_vcn) {
- VCN max_cluster;
-
- max_cluster = ((sle64_to_cpu(
- attr->data.non_resident.allocated_size) +
- vol->cluster_size - 1) >>
- vol->cluster_size_bits) - 1;
- /*
- * A highest_vcn of zero means this is a single extent
- * attribute so simply terminate the runlist with LCN_ENOENT).
- */
- if (deltaxcn) {
- /*
- * If there is a difference between the highest_vcn and
- * the highest cluster, the runlist is either corrupt
- * or, more likely, there are more extents following
- * this one.
- */
- if (deltaxcn < max_cluster) {
- ntfs_debug("More extents to follow; deltaxcn "
- "= 0x%llx, max_cluster = "
- "0x%llx",
- (unsigned long long)deltaxcn,
- (unsigned long long)
- max_cluster);
- rl[rlpos].vcn = vcn;
- vcn += rl[rlpos].length = max_cluster -
- deltaxcn;
- rl[rlpos].lcn = LCN_RL_NOT_MAPPED;
- rlpos++;
- } else if (unlikely(deltaxcn > max_cluster)) {
- ntfs_error(vol->sb, "Corrupt attribute. "
- "deltaxcn = 0x%llx, "
- "max_cluster = 0x%llx",
- (unsigned long long)deltaxcn,
- (unsigned long long)
- max_cluster);
- goto mpa_err;
- }
- }
- rl[rlpos].lcn = LCN_ENOENT;
- } else /* Not the base extent. There may be more extents to follow. */
- rl[rlpos].lcn = LCN_RL_NOT_MAPPED;
-
- /* Setup terminating runlist element. */
- rl[rlpos].vcn = vcn;
- rl[rlpos].length = (s64)0;
- /* If no existing runlist was specified, we are done. */
- if (!old_rl) {
- ntfs_debug("Mapping pairs array successfully decompressed:");
- ntfs_debug_dump_runlist(rl);
- return rl;
- }
- /* Now combine the new and old runlists checking for overlaps. */
- old_rl = ntfs_runlists_merge(old_rl, rl);
- if (!IS_ERR(old_rl))
- return old_rl;
- ntfs_free(rl);
- ntfs_error(vol->sb, "Failed to merge runlists.");
- return old_rl;
-io_error:
- ntfs_error(vol->sb, "Corrupt attribute.");
-err_out:
- ntfs_free(rl);
- return ERR_PTR(-EIO);
-}
-
-/**
- * ntfs_rl_vcn_to_lcn - convert a vcn into a lcn given a runlist
- * @rl: runlist to use for conversion
- * @vcn: vcn to convert
- *
- * Convert the virtual cluster number @vcn of an attribute into a logical
- * cluster number (lcn) of a device using the runlist @rl to map vcns to their
- * corresponding lcns.
- *
- * It is up to the caller to serialize access to the runlist @rl.
- *
- * Since lcns must be >= 0, we use negative return codes with special meaning:
- *
- * Return code Meaning / Description
- * ==================================================
- * LCN_HOLE Hole / not allocated on disk.
- * LCN_RL_NOT_MAPPED This is part of the runlist which has not been
- * inserted into the runlist yet.
- * LCN_ENOENT There is no such vcn in the attribute.
- *
- * Locking: - The caller must have locked the runlist (for reading or writing).
- * - This function does not touch the lock, nor does it modify the
- * runlist.
- */
-LCN ntfs_rl_vcn_to_lcn(const runlist_element *rl, const VCN vcn)
-{
- int i;
-
- BUG_ON(vcn < 0);
- /*
- * If rl is NULL, assume that we have found an unmapped runlist. The
- * caller can then attempt to map it and fail appropriately if
- * necessary.
- */
- if (unlikely(!rl))
- return LCN_RL_NOT_MAPPED;
-
- /* Catch out of lower bounds vcn. */
- if (unlikely(vcn < rl[0].vcn))
- return LCN_ENOENT;
-
- for (i = 0; likely(rl[i].length); i++) {
- if (unlikely(vcn < rl[i+1].vcn)) {
- if (likely(rl[i].lcn >= (LCN)0))
- return rl[i].lcn + (vcn - rl[i].vcn);
- return rl[i].lcn;
- }
- }
- /*
- * The terminator element is setup to the correct value, i.e. one of
- * LCN_HOLE, LCN_RL_NOT_MAPPED, or LCN_ENOENT.
- */
- if (likely(rl[i].lcn < (LCN)0))
- return rl[i].lcn;
- /* Just in case... We could replace this with BUG() some day. */
- return LCN_ENOENT;
-}
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_rl_find_vcn_nolock - find a vcn in a runlist
- * @rl: runlist to search
- * @vcn: vcn to find
- *
- * Find the virtual cluster number @vcn in the runlist @rl and return the
- * address of the runlist element containing the @vcn on success.
- *
- * Return NULL if @rl is NULL or @vcn is in an unmapped part/out of bounds of
- * the runlist.
- *
- * Locking: The runlist must be locked on entry.
- */
-runlist_element *ntfs_rl_find_vcn_nolock(runlist_element *rl, const VCN vcn)
-{
- BUG_ON(vcn < 0);
- if (unlikely(!rl || vcn < rl[0].vcn))
- return NULL;
- while (likely(rl->length)) {
- if (unlikely(vcn < rl[1].vcn)) {
- if (likely(rl->lcn >= LCN_HOLE))
- return rl;
- return NULL;
- }
- rl++;
- }
- if (likely(rl->lcn == LCN_ENOENT))
- return rl;
- return NULL;
-}
-
-/**
- * ntfs_get_nr_significant_bytes - get number of bytes needed to store a number
- * @n: number for which to get the number of bytes for
- *
- * Return the number of bytes required to store @n unambiguously as
- * a signed number.
- *
- * This is used in the context of the mapping pairs array to determine how
- * many bytes will be needed in the array to store a given logical cluster
- * number (lcn) or a specific run length.
- *
- * Return the number of bytes written. This function cannot fail.
- */
-static inline int ntfs_get_nr_significant_bytes(const s64 n)
-{
- s64 l = n;
- int i;
- s8 j;
-
- i = 0;
- do {
- l >>= 8;
- i++;
- } while (l != 0 && l != -1);
- j = (n >> 8 * (i - 1)) & 0xff;
- /* If the sign bit is wrong, we need an extra byte. */
- if ((n < 0 && j >= 0) || (n > 0 && j < 0))
- i++;
- return i;
-}
-
-/**
- * ntfs_get_size_for_mapping_pairs - get bytes needed for mapping pairs array
- * @vol: ntfs volume (needed for the ntfs version)
- * @rl: locked runlist to determine the size of the mapping pairs of
- * @first_vcn: first vcn which to include in the mapping pairs array
- * @last_vcn: last vcn which to include in the mapping pairs array
- *
- * Walk the locked runlist @rl and calculate the size in bytes of the mapping
- * pairs array corresponding to the runlist @rl, starting at vcn @first_vcn and
- * finishing with vcn @last_vcn.
- *
- * A @last_vcn of -1 means end of runlist and in that case the size of the
- * mapping pairs array corresponding to the runlist starting at vcn @first_vcn
- * and finishing at the end of the runlist is determined.
- *
- * This for example allows us to allocate a buffer of the right size when
- * building the mapping pairs array.
- *
- * If @rl is NULL, just return 1 (for the single terminator byte).
- *
- * Return the calculated size in bytes on success. On error, return -errno.
- * The following error codes are defined:
- * -EINVAL - Run list contains unmapped elements. Make sure to only pass
- * fully mapped runlists to this function.
- * -EIO - The runlist is corrupt.
- *
- * Locking: @rl must be locked on entry (either for reading or writing), it
- * remains locked throughout, and is left locked upon return.
- */
-int ntfs_get_size_for_mapping_pairs(const ntfs_volume *vol,
- const runlist_element *rl, const VCN first_vcn,
- const VCN last_vcn)
-{
- LCN prev_lcn;
- int rls;
- bool the_end = false;
-
- BUG_ON(first_vcn < 0);
- BUG_ON(last_vcn < -1);
- BUG_ON(last_vcn >= 0 && first_vcn > last_vcn);
- if (!rl) {
- BUG_ON(first_vcn);
- BUG_ON(last_vcn > 0);
- return 1;
- }
- /* Skip to runlist element containing @first_vcn. */
- while (rl->length && first_vcn >= rl[1].vcn)
- rl++;
- if (unlikely((!rl->length && first_vcn > rl->vcn) ||
- first_vcn < rl->vcn))
- return -EINVAL;
- prev_lcn = 0;
- /* Always need the termining zero byte. */
- rls = 1;
- /* Do the first partial run if present. */
- if (first_vcn > rl->vcn) {
- s64 delta, length = rl->length;
-
- /* We know rl->length != 0 already. */
- if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
- goto err_out;
- /*
- * If @stop_vcn is given and finishes inside this run, cap the
- * run length.
- */
- if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
- s64 s1 = last_vcn + 1;
- if (unlikely(rl[1].vcn > s1))
- length = s1 - rl->vcn;
- the_end = true;
- }
- delta = first_vcn - rl->vcn;
- /* Header byte + length. */
- rls += 1 + ntfs_get_nr_significant_bytes(length - delta);
- /*
- * If the logical cluster number (lcn) denotes a hole and we
- * are on NTFS 3.0+, we don't store it at all, i.e. we need
- * zero space. On earlier NTFS versions we just store the lcn.
- * Note: this assumes that on NTFS 1.2-, holes are stored with
- * an lcn of -1 and not a delta_lcn of -1 (unless both are -1).
- */
- if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
- prev_lcn = rl->lcn;
- if (likely(rl->lcn >= 0))
- prev_lcn += delta;
- /* Change in lcn. */
- rls += ntfs_get_nr_significant_bytes(prev_lcn);
- }
- /* Go to next runlist element. */
- rl++;
- }
- /* Do the full runs. */
- for (; rl->length && !the_end; rl++) {
- s64 length = rl->length;
-
- if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
- goto err_out;
- /*
- * If @stop_vcn is given and finishes inside this run, cap the
- * run length.
- */
- if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
- s64 s1 = last_vcn + 1;
- if (unlikely(rl[1].vcn > s1))
- length = s1 - rl->vcn;
- the_end = true;
- }
- /* Header byte + length. */
- rls += 1 + ntfs_get_nr_significant_bytes(length);
- /*
- * If the logical cluster number (lcn) denotes a hole and we
- * are on NTFS 3.0+, we don't store it at all, i.e. we need
- * zero space. On earlier NTFS versions we just store the lcn.
- * Note: this assumes that on NTFS 1.2-, holes are stored with
- * an lcn of -1 and not a delta_lcn of -1 (unless both are -1).
- */
- if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
- /* Change in lcn. */
- rls += ntfs_get_nr_significant_bytes(rl->lcn -
- prev_lcn);
- prev_lcn = rl->lcn;
- }
- }
- return rls;
-err_out:
- if (rl->lcn == LCN_RL_NOT_MAPPED)
- rls = -EINVAL;
- else
- rls = -EIO;
- return rls;
-}
-
-/**
- * ntfs_write_significant_bytes - write the significant bytes of a number
- * @dst: destination buffer to write to
- * @dst_max: pointer to last byte of destination buffer for bounds checking
- * @n: number whose significant bytes to write
- *
- * Store in @dst, the minimum bytes of the number @n which are required to
- * identify @n unambiguously as a signed number, taking care not to exceed
- * @dest_max, the maximum position within @dst to which we are allowed to
- * write.
- *
- * This is used when building the mapping pairs array of a runlist to compress
- * a given logical cluster number (lcn) or a specific run length to the minimum
- * size possible.
- *
- * Return the number of bytes written on success. On error, i.e. the
- * destination buffer @dst is too small, return -ENOSPC.
- */
-static inline int ntfs_write_significant_bytes(s8 *dst, const s8 *dst_max,
- const s64 n)
-{
- s64 l = n;
- int i;
- s8 j;
-
- i = 0;
- do {
- if (unlikely(dst > dst_max))
- goto err_out;
- *dst++ = l & 0xffll;
- l >>= 8;
- i++;
- } while (l != 0 && l != -1);
- j = (n >> 8 * (i - 1)) & 0xff;
- /* If the sign bit is wrong, we need an extra byte. */
- if (n < 0 && j >= 0) {
- if (unlikely(dst > dst_max))
- goto err_out;
- i++;
- *dst = (s8)-1;
- } else if (n > 0 && j < 0) {
- if (unlikely(dst > dst_max))
- goto err_out;
- i++;
- *dst = (s8)0;
- }
- return i;
-err_out:
- return -ENOSPC;
-}
-
-/**
- * ntfs_mapping_pairs_build - build the mapping pairs array from a runlist
- * @vol: ntfs volume (needed for the ntfs version)
- * @dst: destination buffer to which to write the mapping pairs array
- * @dst_len: size of destination buffer @dst in bytes
- * @rl: locked runlist for which to build the mapping pairs array
- * @first_vcn: first vcn which to include in the mapping pairs array
- * @last_vcn: last vcn which to include in the mapping pairs array
- * @stop_vcn: first vcn outside destination buffer on success or -ENOSPC
- *
- * Create the mapping pairs array from the locked runlist @rl, starting at vcn
- * @first_vcn and finishing with vcn @last_vcn and save the array in @dst.
- * @dst_len is the size of @dst in bytes and it should be at least equal to the
- * value obtained by calling ntfs_get_size_for_mapping_pairs().
- *
- * A @last_vcn of -1 means end of runlist and in that case the mapping pairs
- * array corresponding to the runlist starting at vcn @first_vcn and finishing
- * at the end of the runlist is created.
- *
- * If @rl is NULL, just write a single terminator byte to @dst.
- *
- * On success or -ENOSPC error, if @stop_vcn is not NULL, *@stop_vcn is set to
- * the first vcn outside the destination buffer. Note that on error, @dst has
- * been filled with all the mapping pairs that will fit, thus it can be treated
- * as partial success, in that a new attribute extent needs to be created or
- * the next extent has to be used and the mapping pairs build has to be
- * continued with @first_vcn set to *@stop_vcn.
- *
- * Return 0 on success and -errno on error. The following error codes are
- * defined:
- * -EINVAL - Run list contains unmapped elements. Make sure to only pass
- * fully mapped runlists to this function.
- * -EIO - The runlist is corrupt.
- * -ENOSPC - The destination buffer is too small.
- *
- * Locking: @rl must be locked on entry (either for reading or writing), it
- * remains locked throughout, and is left locked upon return.
- */
-int ntfs_mapping_pairs_build(const ntfs_volume *vol, s8 *dst,
- const int dst_len, const runlist_element *rl,
- const VCN first_vcn, const VCN last_vcn, VCN *const stop_vcn)
-{
- LCN prev_lcn;
- s8 *dst_max, *dst_next;
- int err = -ENOSPC;
- bool the_end = false;
- s8 len_len, lcn_len;
-
- BUG_ON(first_vcn < 0);
- BUG_ON(last_vcn < -1);
- BUG_ON(last_vcn >= 0 && first_vcn > last_vcn);
- BUG_ON(dst_len < 1);
- if (!rl) {
- BUG_ON(first_vcn);
- BUG_ON(last_vcn > 0);
- if (stop_vcn)
- *stop_vcn = 0;
- /* Terminator byte. */
- *dst = 0;
- return 0;
- }
- /* Skip to runlist element containing @first_vcn. */
- while (rl->length && first_vcn >= rl[1].vcn)
- rl++;
- if (unlikely((!rl->length && first_vcn > rl->vcn) ||
- first_vcn < rl->vcn))
- return -EINVAL;
- /*
- * @dst_max is used for bounds checking in
- * ntfs_write_significant_bytes().
- */
- dst_max = dst + dst_len - 1;
- prev_lcn = 0;
- /* Do the first partial run if present. */
- if (first_vcn > rl->vcn) {
- s64 delta, length = rl->length;
-
- /* We know rl->length != 0 already. */
- if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
- goto err_out;
- /*
- * If @stop_vcn is given and finishes inside this run, cap the
- * run length.
- */
- if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
- s64 s1 = last_vcn + 1;
- if (unlikely(rl[1].vcn > s1))
- length = s1 - rl->vcn;
- the_end = true;
- }
- delta = first_vcn - rl->vcn;
- /* Write length. */
- len_len = ntfs_write_significant_bytes(dst + 1, dst_max,
- length - delta);
- if (unlikely(len_len < 0))
- goto size_err;
- /*
- * If the logical cluster number (lcn) denotes a hole and we
- * are on NTFS 3.0+, we don't store it at all, i.e. we need
- * zero space. On earlier NTFS versions we just write the lcn
- * change. FIXME: Do we need to write the lcn change or just
- * the lcn in that case? Not sure as I have never seen this
- * case on NT4. - We assume that we just need to write the lcn
- * change until someone tells us otherwise... (AIA)
- */
- if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
- prev_lcn = rl->lcn;
- if (likely(rl->lcn >= 0))
- prev_lcn += delta;
- /* Write change in lcn. */
- lcn_len = ntfs_write_significant_bytes(dst + 1 +
- len_len, dst_max, prev_lcn);
- if (unlikely(lcn_len < 0))
- goto size_err;
- } else
- lcn_len = 0;
- dst_next = dst + len_len + lcn_len + 1;
- if (unlikely(dst_next > dst_max))
- goto size_err;
- /* Update header byte. */
- *dst = lcn_len << 4 | len_len;
- /* Position at next mapping pairs array element. */
- dst = dst_next;
- /* Go to next runlist element. */
- rl++;
- }
- /* Do the full runs. */
- for (; rl->length && !the_end; rl++) {
- s64 length = rl->length;
-
- if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
- goto err_out;
- /*
- * If @stop_vcn is given and finishes inside this run, cap the
- * run length.
- */
- if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
- s64 s1 = last_vcn + 1;
- if (unlikely(rl[1].vcn > s1))
- length = s1 - rl->vcn;
- the_end = true;
- }
- /* Write length. */
- len_len = ntfs_write_significant_bytes(dst + 1, dst_max,
- length);
- if (unlikely(len_len < 0))
- goto size_err;
- /*
- * If the logical cluster number (lcn) denotes a hole and we
- * are on NTFS 3.0+, we don't store it at all, i.e. we need
- * zero space. On earlier NTFS versions we just write the lcn
- * change. FIXME: Do we need to write the lcn change or just
- * the lcn in that case? Not sure as I have never seen this
- * case on NT4. - We assume that we just need to write the lcn
- * change until someone tells us otherwise... (AIA)
- */
- if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
- /* Write change in lcn. */
- lcn_len = ntfs_write_significant_bytes(dst + 1 +
- len_len, dst_max, rl->lcn - prev_lcn);
- if (unlikely(lcn_len < 0))
- goto size_err;
- prev_lcn = rl->lcn;
- } else
- lcn_len = 0;
- dst_next = dst + len_len + lcn_len + 1;
- if (unlikely(dst_next > dst_max))
- goto size_err;
- /* Update header byte. */
- *dst = lcn_len << 4 | len_len;
- /* Position at next mapping pairs array element. */
- dst = dst_next;
- }
- /* Success. */
- err = 0;
-size_err:
- /* Set stop vcn. */
- if (stop_vcn)
- *stop_vcn = rl->vcn;
- /* Add terminator byte. */
- *dst = 0;
- return err;
-err_out:
- if (rl->lcn == LCN_RL_NOT_MAPPED)
- err = -EINVAL;
- else
- err = -EIO;
- return err;
-}
-
-/**
- * ntfs_rl_truncate_nolock - truncate a runlist starting at a specified vcn
- * @vol: ntfs volume (needed for error output)
- * @runlist: runlist to truncate
- * @new_length: the new length of the runlist in VCNs
- *
- * Truncate the runlist described by @runlist as well as the memory buffer
- * holding the runlist elements to a length of @new_length VCNs.
- *
- * If @new_length lies within the runlist, the runlist elements with VCNs of
- * @new_length and above are discarded. As a special case if @new_length is
- * zero, the runlist is discarded and set to NULL.
- *
- * If @new_length lies beyond the runlist, a sparse runlist element is added to
- * the end of the runlist @runlist or if the last runlist element is a sparse
- * one already, this is extended.
- *
- * Note, no checking is done for unmapped runlist elements. It is assumed that
- * the caller has mapped any elements that need to be mapped already.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: The caller must hold @runlist->lock for writing.
- */
-int ntfs_rl_truncate_nolock(const ntfs_volume *vol, runlist *const runlist,
- const s64 new_length)
-{
- runlist_element *rl;
- int old_size;
-
- ntfs_debug("Entering for new_length 0x%llx.", (long long)new_length);
- BUG_ON(!runlist);
- BUG_ON(new_length < 0);
- rl = runlist->rl;
- if (!new_length) {
- ntfs_debug("Freeing runlist.");
- runlist->rl = NULL;
- if (rl)
- ntfs_free(rl);
- return 0;
- }
- if (unlikely(!rl)) {
- /*
- * Create a runlist consisting of a sparse runlist element of
- * length @new_length followed by a terminator runlist element.
- */
- rl = ntfs_malloc_nofs(PAGE_SIZE);
- if (unlikely(!rl)) {
- ntfs_error(vol->sb, "Not enough memory to allocate "
- "runlist element buffer.");
- return -ENOMEM;
- }
- runlist->rl = rl;
- rl[1].length = rl->vcn = 0;
- rl->lcn = LCN_HOLE;
- rl[1].vcn = rl->length = new_length;
- rl[1].lcn = LCN_ENOENT;
- return 0;
- }
- BUG_ON(new_length < rl->vcn);
- /* Find @new_length in the runlist. */
- while (likely(rl->length && new_length >= rl[1].vcn))
- rl++;
- /*
- * If not at the end of the runlist we need to shrink it.
- * If at the end of the runlist we need to expand it.
- */
- if (rl->length) {
- runlist_element *trl;
- bool is_end;
-
- ntfs_debug("Shrinking runlist.");
- /* Determine the runlist size. */
- trl = rl + 1;
- while (likely(trl->length))
- trl++;
- old_size = trl - runlist->rl + 1;
- /* Truncate the run. */
- rl->length = new_length - rl->vcn;
- /*
- * If a run was partially truncated, make the following runlist
- * element a terminator.
- */
- is_end = false;
- if (rl->length) {
- rl++;
- if (!rl->length)
- is_end = true;
- rl->vcn = new_length;
- rl->length = 0;
- }
- rl->lcn = LCN_ENOENT;
- /* Reallocate memory if necessary. */
- if (!is_end) {
- int new_size = rl - runlist->rl + 1;
- rl = ntfs_rl_realloc(runlist->rl, old_size, new_size);
- if (IS_ERR(rl))
- ntfs_warning(vol->sb, "Failed to shrink "
- "runlist buffer. This just "
- "wastes a bit of memory "
- "temporarily so we ignore it "
- "and return success.");
- else
- runlist->rl = rl;
- }
- } else if (likely(/* !rl->length && */ new_length > rl->vcn)) {
- ntfs_debug("Expanding runlist.");
- /*
- * If there is a previous runlist element and it is a sparse
- * one, extend it. Otherwise need to add a new, sparse runlist
- * element.
- */
- if ((rl > runlist->rl) && ((rl - 1)->lcn == LCN_HOLE))
- (rl - 1)->length = new_length - (rl - 1)->vcn;
- else {
- /* Determine the runlist size. */
- old_size = rl - runlist->rl + 1;
- /* Reallocate memory if necessary. */
- rl = ntfs_rl_realloc(runlist->rl, old_size,
- old_size + 1);
- if (IS_ERR(rl)) {
- ntfs_error(vol->sb, "Failed to expand runlist "
- "buffer, aborting.");
- return PTR_ERR(rl);
- }
- runlist->rl = rl;
- /*
- * Set @rl to the same runlist element in the new
- * runlist as before in the old runlist.
- */
- rl += old_size - 1;
- /* Add a new, sparse runlist element. */
- rl->lcn = LCN_HOLE;
- rl->length = new_length - rl->vcn;
- /* Add a new terminator runlist element. */
- rl++;
- rl->length = 0;
- }
- rl->vcn = new_length;
- rl->lcn = LCN_ENOENT;
- } else /* if (unlikely(!rl->length && new_length == rl->vcn)) */ {
- /* Runlist already has same size as requested. */
- rl->lcn = LCN_ENOENT;
- }
- ntfs_debug("Done.");
- return 0;
-}
-
-/**
- * ntfs_rl_punch_nolock - punch a hole into a runlist
- * @vol: ntfs volume (needed for error output)
- * @runlist: runlist to punch a hole into
- * @start: starting VCN of the hole to be created
- * @length: size of the hole to be created in units of clusters
- *
- * Punch a hole into the runlist @runlist starting at VCN @start and of size
- * @length clusters.
- *
- * Return 0 on success and -errno on error, in which case @runlist has not been
- * modified.
- *
- * If @start and/or @start + @length are outside the runlist return error code
- * -ENOENT.
- *
- * If the runlist contains unmapped or error elements between @start and @start
- * + @length return error code -EINVAL.
- *
- * Locking: The caller must hold @runlist->lock for writing.
- */
-int ntfs_rl_punch_nolock(const ntfs_volume *vol, runlist *const runlist,
- const VCN start, const s64 length)
-{
- const VCN end = start + length;
- s64 delta;
- runlist_element *rl, *rl_end, *rl_real_end, *trl;
- int old_size;
- bool lcn_fixup = false;
-
- ntfs_debug("Entering for start 0x%llx, length 0x%llx.",
- (long long)start, (long long)length);
- BUG_ON(!runlist);
- BUG_ON(start < 0);
- BUG_ON(length < 0);
- BUG_ON(end < 0);
- rl = runlist->rl;
- if (unlikely(!rl)) {
- if (likely(!start && !length))
- return 0;
- return -ENOENT;
- }
- /* Find @start in the runlist. */
- while (likely(rl->length && start >= rl[1].vcn))
- rl++;
- rl_end = rl;
- /* Find @end in the runlist. */
- while (likely(rl_end->length && end >= rl_end[1].vcn)) {
- /* Verify there are no unmapped or error elements. */
- if (unlikely(rl_end->lcn < LCN_HOLE))
- return -EINVAL;
- rl_end++;
- }
- /* Check the last element. */
- if (unlikely(rl_end->length && rl_end->lcn < LCN_HOLE))
- return -EINVAL;
- /* This covers @start being out of bounds, too. */
- if (!rl_end->length && end > rl_end->vcn)
- return -ENOENT;
- if (!length)
- return 0;
- if (!rl->length)
- return -ENOENT;
- rl_real_end = rl_end;
- /* Determine the runlist size. */
- while (likely(rl_real_end->length))
- rl_real_end++;
- old_size = rl_real_end - runlist->rl + 1;
- /* If @start is in a hole simply extend the hole. */
- if (rl->lcn == LCN_HOLE) {
- /*
- * If both @start and @end are in the same sparse run, we are
- * done.
- */
- if (end <= rl[1].vcn) {
- ntfs_debug("Done (requested hole is already sparse).");
- return 0;
- }
-extend_hole:
- /* Extend the hole. */
- rl->length = end - rl->vcn;
- /* If @end is in a hole, merge it with the current one. */
- if (rl_end->lcn == LCN_HOLE) {
- rl_end++;
- rl->length = rl_end->vcn - rl->vcn;
- }
- /* We have done the hole. Now deal with the remaining tail. */
- rl++;
- /* Cut out all runlist elements up to @end. */
- if (rl < rl_end)
- memmove(rl, rl_end, (rl_real_end - rl_end + 1) *
- sizeof(*rl));
- /* Adjust the beginning of the tail if necessary. */
- if (end > rl->vcn) {
- delta = end - rl->vcn;
- rl->vcn = end;
- rl->length -= delta;
- /* Only adjust the lcn if it is real. */
- if (rl->lcn >= 0)
- rl->lcn += delta;
- }
-shrink_allocation:
- /* Reallocate memory if the allocation changed. */
- if (rl < rl_end) {
- rl = ntfs_rl_realloc(runlist->rl, old_size,
- old_size - (rl_end - rl));
- if (IS_ERR(rl))
- ntfs_warning(vol->sb, "Failed to shrink "
- "runlist buffer. This just "
- "wastes a bit of memory "
- "temporarily so we ignore it "
- "and return success.");
- else
- runlist->rl = rl;
- }
- ntfs_debug("Done (extend hole).");
- return 0;
- }
- /*
- * If @start is at the beginning of a run things are easier as there is
- * no need to split the first run.
- */
- if (start == rl->vcn) {
- /*
- * @start is at the beginning of a run.
- *
- * If the previous run is sparse, extend its hole.
- *
- * If @end is not in the same run, switch the run to be sparse
- * and extend the newly created hole.
- *
- * Thus both of these cases reduce the problem to the above
- * case of "@start is in a hole".
- */
- if (rl > runlist->rl && (rl - 1)->lcn == LCN_HOLE) {
- rl--;
- goto extend_hole;
- }
- if (end >= rl[1].vcn) {
- rl->lcn = LCN_HOLE;
- goto extend_hole;
- }
- /*
- * The final case is when @end is in the same run as @start.
- * For this need to split the run into two. One run for the
- * sparse region between the beginning of the old run, i.e.
- * @start, and @end and one for the remaining non-sparse
- * region, i.e. between @end and the end of the old run.
- */
- trl = ntfs_rl_realloc(runlist->rl, old_size, old_size + 1);
- if (IS_ERR(trl))
- goto enomem_out;
- old_size++;
- if (runlist->rl != trl) {
- rl = trl + (rl - runlist->rl);
- rl_end = trl + (rl_end - runlist->rl);
- rl_real_end = trl + (rl_real_end - runlist->rl);
- runlist->rl = trl;
- }
-split_end:
- /* Shift all the runs up by one. */
- memmove(rl + 1, rl, (rl_real_end - rl + 1) * sizeof(*rl));
- /* Finally, setup the two split runs. */
- rl->lcn = LCN_HOLE;
- rl->length = length;
- rl++;
- rl->vcn += length;
- /* Only adjust the lcn if it is real. */
- if (rl->lcn >= 0 || lcn_fixup)
- rl->lcn += length;
- rl->length -= length;
- ntfs_debug("Done (split one).");
- return 0;
- }
- /*
- * @start is neither in a hole nor at the beginning of a run.
- *
- * If @end is in a hole, things are easier as simply truncating the run
- * @start is in to end at @start - 1, deleting all runs after that up
- * to @end, and finally extending the beginning of the run @end is in
- * to be @start is all that is needed.
- */
- if (rl_end->lcn == LCN_HOLE) {
- /* Truncate the run containing @start. */
- rl->length = start - rl->vcn;
- rl++;
- /* Cut out all runlist elements up to @end. */
- if (rl < rl_end)
- memmove(rl, rl_end, (rl_real_end - rl_end + 1) *
- sizeof(*rl));
- /* Extend the beginning of the run @end is in to be @start. */
- rl->vcn = start;
- rl->length = rl[1].vcn - start;
- goto shrink_allocation;
- }
- /*
- * If @end is not in a hole there are still two cases to distinguish.
- * Either @end is or is not in the same run as @start.
- *
- * The second case is easier as it can be reduced to an already solved
- * problem by truncating the run @start is in to end at @start - 1.
- * Then, if @end is in the next run need to split the run into a sparse
- * run followed by a non-sparse run (already covered above) and if @end
- * is not in the next run switching it to be sparse, again reduces the
- * problem to the already covered case of "@start is in a hole".
- */
- if (end >= rl[1].vcn) {
- /*
- * If @end is not in the next run, reduce the problem to the
- * case of "@start is in a hole".
- */
- if (rl[1].length && end >= rl[2].vcn) {
- /* Truncate the run containing @start. */
- rl->length = start - rl->vcn;
- rl++;
- rl->vcn = start;
- rl->lcn = LCN_HOLE;
- goto extend_hole;
- }
- trl = ntfs_rl_realloc(runlist->rl, old_size, old_size + 1);
- if (IS_ERR(trl))
- goto enomem_out;
- old_size++;
- if (runlist->rl != trl) {
- rl = trl + (rl - runlist->rl);
- rl_end = trl + (rl_end - runlist->rl);
- rl_real_end = trl + (rl_real_end - runlist->rl);
- runlist->rl = trl;
- }
- /* Truncate the run containing @start. */
- rl->length = start - rl->vcn;
- rl++;
- /*
- * @end is in the next run, reduce the problem to the case
- * where "@start is at the beginning of a run and @end is in
- * the same run as @start".
- */
- delta = rl->vcn - start;
- rl->vcn = start;
- if (rl->lcn >= 0) {
- rl->lcn -= delta;
- /* Need this in case the lcn just became negative. */
- lcn_fixup = true;
- }
- rl->length += delta;
- goto split_end;
- }
- /*
- * The first case from above, i.e. @end is in the same run as @start.
- * We need to split the run into three. One run for the non-sparse
- * region between the beginning of the old run and @start, one for the
- * sparse region between @start and @end, and one for the remaining
- * non-sparse region, i.e. between @end and the end of the old run.
- */
- trl = ntfs_rl_realloc(runlist->rl, old_size, old_size + 2);
- if (IS_ERR(trl))
- goto enomem_out;
- old_size += 2;
- if (runlist->rl != trl) {
- rl = trl + (rl - runlist->rl);
- rl_end = trl + (rl_end - runlist->rl);
- rl_real_end = trl + (rl_real_end - runlist->rl);
- runlist->rl = trl;
- }
- /* Shift all the runs up by two. */
- memmove(rl + 2, rl, (rl_real_end - rl + 1) * sizeof(*rl));
- /* Finally, setup the three split runs. */
- rl->length = start - rl->vcn;
- rl++;
- rl->vcn = start;
- rl->lcn = LCN_HOLE;
- rl->length = length;
- rl++;
- delta = end - rl->vcn;
- rl->vcn = end;
- rl->lcn += delta;
- rl->length -= delta;
- ntfs_debug("Done (split both).");
- return 0;
-enomem_out:
- ntfs_error(vol->sb, "Not enough memory to extend runlist buffer.");
- return -ENOMEM;
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/runlist.h b/fs/ntfs/runlist.h
deleted file mode 100644
index 38de0a3..0000000
--- a/fs/ntfs/runlist.h
+++ /dev/null
@@ -1,88 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * runlist.h - Defines for runlist handling in NTFS Linux kernel driver.
- * Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_RUNLIST_H
-#define _LINUX_NTFS_RUNLIST_H
-
-#include "types.h"
-#include "layout.h"
-#include "volume.h"
-
-/**
- * runlist_element - in memory vcn to lcn mapping array element
- * @vcn: starting vcn of the current array element
- * @lcn: starting lcn of the current array element
- * @length: length in clusters of the current array element
- *
- * The last vcn (in fact the last vcn + 1) is reached when length == 0.
- *
- * When lcn == -1 this means that the count vcns starting at vcn are not
- * physically allocated (i.e. this is a hole / data is sparse).
- */
-typedef struct { /* In memory vcn to lcn mapping structure element. */
- VCN vcn; /* vcn = Starting virtual cluster number. */
- LCN lcn; /* lcn = Starting logical cluster number. */
- s64 length; /* Run length in clusters. */
-} runlist_element;
-
-/**
- * runlist - in memory vcn to lcn mapping array including a read/write lock
- * @rl: pointer to an array of runlist elements
- * @lock: read/write spinlock for serializing access to @rl
- *
- */
-typedef struct {
- runlist_element *rl;
- struct rw_semaphore lock;
-} runlist;
-
-static inline void ntfs_init_runlist(runlist *rl)
-{
- rl->rl = NULL;
- init_rwsem(&rl->lock);
-}
-
-typedef enum {
- LCN_HOLE = -1, /* Keep this as highest value or die! */
- LCN_RL_NOT_MAPPED = -2,
- LCN_ENOENT = -3,
- LCN_ENOMEM = -4,
- LCN_EIO = -5,
-} LCN_SPECIAL_VALUES;
-
-extern runlist_element *ntfs_runlists_merge(runlist_element *drl,
- runlist_element *srl);
-
-extern runlist_element *ntfs_mapping_pairs_decompress(const ntfs_volume *vol,
- const ATTR_RECORD *attr, runlist_element *old_rl);
-
-extern LCN ntfs_rl_vcn_to_lcn(const runlist_element *rl, const VCN vcn);
-
-#ifdef NTFS_RW
-
-extern runlist_element *ntfs_rl_find_vcn_nolock(runlist_element *rl,
- const VCN vcn);
-
-extern int ntfs_get_size_for_mapping_pairs(const ntfs_volume *vol,
- const runlist_element *rl, const VCN first_vcn,
- const VCN last_vcn);
-
-extern int ntfs_mapping_pairs_build(const ntfs_volume *vol, s8 *dst,
- const int dst_len, const runlist_element *rl,
- const VCN first_vcn, const VCN last_vcn, VCN *const stop_vcn);
-
-extern int ntfs_rl_truncate_nolock(const ntfs_volume *vol,
- runlist *const runlist, const s64 new_length);
-
-int ntfs_rl_punch_nolock(const ntfs_volume *vol, runlist *const runlist,
- const VCN start, const s64 length);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_RUNLIST_H */
diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
deleted file mode 100644
index 56a7d5b..0000000
--- a/fs/ntfs/super.c
+++ /dev/null
@@ -1,3202 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * super.c - NTFS kernel super block handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
- * Copyright (c) 2001,2002 Richard Russon
- */
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/stddef.h>
-#include <linux/init.h>
-#include <linux/slab.h>
-#include <linux/string.h>
-#include <linux/spinlock.h>
-#include <linux/blkdev.h> /* For bdev_logical_block_size(). */
-#include <linux/backing-dev.h>
-#include <linux/buffer_head.h>
-#include <linux/vfs.h>
-#include <linux/moduleparam.h>
-#include <linux/bitmap.h>
-
-#include "sysctl.h"
-#include "logfile.h"
-#include "quota.h"
-#include "usnjrnl.h"
-#include "dir.h"
-#include "debug.h"
-#include "index.h"
-#include "inode.h"
-#include "aops.h"
-#include "layout.h"
-#include "malloc.h"
-#include "ntfs.h"
-
-/* Number of mounted filesystems which have compression enabled. */
-static unsigned long ntfs_nr_compression_users;
-
-/* A global default upcase table and a corresponding reference count. */
-static ntfschar *default_upcase;
-static unsigned long ntfs_nr_upcase_users;
-
-/* Error constants/strings used in inode.c::ntfs_show_options(). */
-typedef enum {
- /* One of these must be present, default is ON_ERRORS_CONTINUE. */
- ON_ERRORS_PANIC = 0x01,
- ON_ERRORS_REMOUNT_RO = 0x02,
- ON_ERRORS_CONTINUE = 0x04,
- /* Optional, can be combined with any of the above. */
- ON_ERRORS_RECOVER = 0x10,
-} ON_ERRORS_ACTIONS;
-
-const option_t on_errors_arr[] = {
- { ON_ERRORS_PANIC, "panic" },
- { ON_ERRORS_REMOUNT_RO, "remount-ro", },
- { ON_ERRORS_CONTINUE, "continue", },
- { ON_ERRORS_RECOVER, "recover" },
- { 0, NULL }
-};
-
-/**
- * simple_getbool - convert input string to a boolean value
- * @s: input string to convert
- * @setval: where to store the output boolean value
- *
- * Copied from old ntfs driver (which copied from vfat driver).
- *
- * "1", "yes", "true", or an empty string are converted to %true.
- * "0", "no", and "false" are converted to %false.
- *
- * Return: %1 if the string is converted or was empty and *setval contains it;
- * %0 if the string was not valid.
- */
-static int simple_getbool(char *s, bool *setval)
-{
- if (s) {
- if (!strcmp(s, "1") || !strcmp(s, "yes") || !strcmp(s, "true"))
- *setval = true;
- else if (!strcmp(s, "0") || !strcmp(s, "no") ||
- !strcmp(s, "false"))
- *setval = false;
- else
- return 0;
- } else
- *setval = true;
- return 1;
-}
-
-/**
- * parse_options - parse the (re)mount options
- * @vol: ntfs volume
- * @opt: string containing the (re)mount options
- *
- * Parse the recognized options in @opt for the ntfs volume described by @vol.
- */
-static bool parse_options(ntfs_volume *vol, char *opt)
-{
- char *p, *v, *ov;
- static char *utf8 = "utf8";
- int errors = 0, sloppy = 0;
- kuid_t uid = INVALID_UID;
- kgid_t gid = INVALID_GID;
- umode_t fmask = (umode_t)-1, dmask = (umode_t)-1;
- int mft_zone_multiplier = -1, on_errors = -1;
- int show_sys_files = -1, case_sensitive = -1, disable_sparse = -1;
- struct nls_table *nls_map = NULL, *old_nls;
-
- /* I am lazy... (-8 */
-#define NTFS_GETOPT_WITH_DEFAULT(option, variable, default_value) \
- if (!strcmp(p, option)) { \
- if (!v || !*v) \
- variable = default_value; \
- else { \
- variable = simple_strtoul(ov = v, &v, 0); \
- if (*v) \
- goto needs_val; \
- } \
- }
-#define NTFS_GETOPT(option, variable) \
- if (!strcmp(p, option)) { \
- if (!v || !*v) \
- goto needs_arg; \
- variable = simple_strtoul(ov = v, &v, 0); \
- if (*v) \
- goto needs_val; \
- }
-#define NTFS_GETOPT_UID(option, variable) \
- if (!strcmp(p, option)) { \
- uid_t uid_value; \
- if (!v || !*v) \
- goto needs_arg; \
- uid_value = simple_strtoul(ov = v, &v, 0); \
- if (*v) \
- goto needs_val; \
- variable = make_kuid(current_user_ns(), uid_value); \
- if (!uid_valid(variable)) \
- goto needs_val; \
- }
-#define NTFS_GETOPT_GID(option, variable) \
- if (!strcmp(p, option)) { \
- gid_t gid_value; \
- if (!v || !*v) \
- goto needs_arg; \
- gid_value = simple_strtoul(ov = v, &v, 0); \
- if (*v) \
- goto needs_val; \
- variable = make_kgid(current_user_ns(), gid_value); \
- if (!gid_valid(variable)) \
- goto needs_val; \
- }
-#define NTFS_GETOPT_OCTAL(option, variable) \
- if (!strcmp(p, option)) { \
- if (!v || !*v) \
- goto needs_arg; \
- variable = simple_strtoul(ov = v, &v, 8); \
- if (*v) \
- goto needs_val; \
- }
-#define NTFS_GETOPT_BOOL(option, variable) \
- if (!strcmp(p, option)) { \
- bool val; \
- if (!simple_getbool(v, &val)) \
- goto needs_bool; \
- variable = val; \
- }
-#define NTFS_GETOPT_OPTIONS_ARRAY(option, variable, opt_array) \
- if (!strcmp(p, option)) { \
- int _i; \
- if (!v || !*v) \
- goto needs_arg; \
- ov = v; \
- if (variable == -1) \
- variable = 0; \
- for (_i = 0; opt_array[_i].str && *opt_array[_i].str; _i++) \
- if (!strcmp(opt_array[_i].str, v)) { \
- variable |= opt_array[_i].val; \
- break; \
- } \
- if (!opt_array[_i].str || !*opt_array[_i].str) \
- goto needs_val; \
- }
- if (!opt || !*opt)
- goto no_mount_options;
- ntfs_debug("Entering with mount options string: %s", opt);
- while ((p = strsep(&opt, ","))) {
- if ((v = strchr(p, '=')))
- *v++ = 0;
- NTFS_GETOPT_UID("uid", uid)
- else NTFS_GETOPT_GID("gid", gid)
- else NTFS_GETOPT_OCTAL("umask", fmask = dmask)
- else NTFS_GETOPT_OCTAL("fmask", fmask)
- else NTFS_GETOPT_OCTAL("dmask", dmask)
- else NTFS_GETOPT("mft_zone_multiplier", mft_zone_multiplier)
- else NTFS_GETOPT_WITH_DEFAULT("sloppy", sloppy, true)
- else NTFS_GETOPT_BOOL("show_sys_files", show_sys_files)
- else NTFS_GETOPT_BOOL("case_sensitive", case_sensitive)
- else NTFS_GETOPT_BOOL("disable_sparse", disable_sparse)
- else NTFS_GETOPT_OPTIONS_ARRAY("errors", on_errors,
- on_errors_arr)
- else if (!strcmp(p, "posix") || !strcmp(p, "show_inodes"))
- ntfs_warning(vol->sb, "Ignoring obsolete option %s.",
- p);
- else if (!strcmp(p, "nls") || !strcmp(p, "iocharset")) {
- if (!strcmp(p, "iocharset"))
- ntfs_warning(vol->sb, "Option iocharset is "
- "deprecated. Please use "
- "option nls=<charsetname> in "
- "the future.");
- if (!v || !*v)
- goto needs_arg;
-use_utf8:
- old_nls = nls_map;
- nls_map = load_nls(v);
- if (!nls_map) {
- if (!old_nls) {
- ntfs_error(vol->sb, "NLS character set "
- "%s not found.", v);
- return false;
- }
- ntfs_error(vol->sb, "NLS character set %s not "
- "found. Using previous one %s.",
- v, old_nls->charset);
- nls_map = old_nls;
- } else /* nls_map */ {
- unload_nls(old_nls);
- }
- } else if (!strcmp(p, "utf8")) {
- bool val = false;
- ntfs_warning(vol->sb, "Option utf8 is no longer "
- "supported, using option nls=utf8. Please "
- "use option nls=utf8 in the future and "
- "make sure utf8 is compiled either as a "
- "module or into the kernel.");
- if (!v || !*v)
- val = true;
- else if (!simple_getbool(v, &val))
- goto needs_bool;
- if (val) {
- v = utf8;
- goto use_utf8;
- }
- } else {
- ntfs_error(vol->sb, "Unrecognized mount option %s.", p);
- if (errors < INT_MAX)
- errors++;
- }
-#undef NTFS_GETOPT_OPTIONS_ARRAY
-#undef NTFS_GETOPT_BOOL
-#undef NTFS_GETOPT
-#undef NTFS_GETOPT_WITH_DEFAULT
- }
-no_mount_options:
- if (errors && !sloppy)
- return false;
- if (sloppy)
- ntfs_warning(vol->sb, "Sloppy option given. Ignoring "
- "unrecognized mount option(s) and continuing.");
- /* Keep this first! */
- if (on_errors != -1) {
- if (!on_errors) {
- ntfs_error(vol->sb, "Invalid errors option argument "
- "or bug in options parser.");
- return false;
- }
- }
- if (nls_map) {
- if (vol->nls_map && vol->nls_map != nls_map) {
- ntfs_error(vol->sb, "Cannot change NLS character set "
- "on remount.");
- return false;
- } /* else (!vol->nls_map) */
- ntfs_debug("Using NLS character set %s.", nls_map->charset);
- vol->nls_map = nls_map;
- } else /* (!nls_map) */ {
- if (!vol->nls_map) {
- vol->nls_map = load_nls_default();
- if (!vol->nls_map) {
- ntfs_error(vol->sb, "Failed to load default "
- "NLS character set.");
- return false;
- }
- ntfs_debug("Using default NLS character set (%s).",
- vol->nls_map->charset);
- }
- }
- if (mft_zone_multiplier != -1) {
- if (vol->mft_zone_multiplier && vol->mft_zone_multiplier !=
- mft_zone_multiplier) {
- ntfs_error(vol->sb, "Cannot change mft_zone_multiplier "
- "on remount.");
- return false;
- }
- if (mft_zone_multiplier < 1 || mft_zone_multiplier > 4) {
- ntfs_error(vol->sb, "Invalid mft_zone_multiplier. "
- "Using default value, i.e. 1.");
- mft_zone_multiplier = 1;
- }
- vol->mft_zone_multiplier = mft_zone_multiplier;
- }
- if (!vol->mft_zone_multiplier)
- vol->mft_zone_multiplier = 1;
- if (on_errors != -1)
- vol->on_errors = on_errors;
- if (!vol->on_errors || vol->on_errors == ON_ERRORS_RECOVER)
- vol->on_errors |= ON_ERRORS_CONTINUE;
- if (uid_valid(uid))
- vol->uid = uid;
- if (gid_valid(gid))
- vol->gid = gid;
- if (fmask != (umode_t)-1)
- vol->fmask = fmask;
- if (dmask != (umode_t)-1)
- vol->dmask = dmask;
- if (show_sys_files != -1) {
- if (show_sys_files)
- NVolSetShowSystemFiles(vol);
- else
- NVolClearShowSystemFiles(vol);
- }
- if (case_sensitive != -1) {
- if (case_sensitive)
- NVolSetCaseSensitive(vol);
- else
- NVolClearCaseSensitive(vol);
- }
- if (disable_sparse != -1) {
- if (disable_sparse)
- NVolClearSparseEnabled(vol);
- else {
- if (!NVolSparseEnabled(vol) &&
- vol->major_ver && vol->major_ver < 3)
- ntfs_warning(vol->sb, "Not enabling sparse "
- "support due to NTFS volume "
- "version %i.%i (need at least "
- "version 3.0).", vol->major_ver,
- vol->minor_ver);
- else
- NVolSetSparseEnabled(vol);
- }
- }
- return true;
-needs_arg:
- ntfs_error(vol->sb, "The %s option requires an argument.", p);
- return false;
-needs_bool:
- ntfs_error(vol->sb, "The %s option requires a boolean argument.", p);
- return false;
-needs_val:
- ntfs_error(vol->sb, "Invalid %s option argument: %s", p, ov);
- return false;
-}
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_write_volume_flags - write new flags to the volume information flags
- * @vol: ntfs volume on which to modify the flags
- * @flags: new flags value for the volume information flags
- *
- * Internal function. You probably want to use ntfs_{set,clear}_volume_flags()
- * instead (see below).
- *
- * Replace the volume information flags on the volume @vol with the value
- * supplied in @flags. Note, this overwrites the volume information flags, so
- * make sure to combine the flags you want to modify with the old flags and use
- * the result when calling ntfs_write_volume_flags().
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_write_volume_flags(ntfs_volume *vol, const VOLUME_FLAGS flags)
-{
- ntfs_inode *ni = NTFS_I(vol->vol_ino);
- MFT_RECORD *m;
- VOLUME_INFORMATION *vi;
- ntfs_attr_search_ctx *ctx;
- int err;
-
- ntfs_debug("Entering, old flags = 0x%x, new flags = 0x%x.",
- le16_to_cpu(vol->vol_flags), le16_to_cpu(flags));
- if (vol->vol_flags == flags)
- goto done;
- BUG_ON(!ni);
- m = map_mft_record(ni);
- if (IS_ERR(m)) {
- err = PTR_ERR(m);
- goto err_out;
- }
- ctx = ntfs_attr_get_search_ctx(ni, m);
- if (!ctx) {
- err = -ENOMEM;
- goto put_unm_err_out;
- }
- err = ntfs_attr_lookup(AT_VOLUME_INFORMATION, NULL, 0, 0, 0, NULL, 0,
- ctx);
- if (err)
- goto put_unm_err_out;
- vi = (VOLUME_INFORMATION*)((u8*)ctx->attr +
- le16_to_cpu(ctx->attr->data.resident.value_offset));
- vol->vol_flags = vi->flags = flags;
- flush_dcache_mft_record_page(ctx->ntfs_ino);
- mark_mft_record_dirty(ctx->ntfs_ino);
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(ni);
-done:
- ntfs_debug("Done.");
- return 0;
-put_unm_err_out:
- if (ctx)
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(ni);
-err_out:
- ntfs_error(vol->sb, "Failed with error code %i.", -err);
- return err;
-}
-
-/**
- * ntfs_set_volume_flags - set bits in the volume information flags
- * @vol: ntfs volume on which to modify the flags
- * @flags: flags to set on the volume
- *
- * Set the bits in @flags in the volume information flags on the volume @vol.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_set_volume_flags(ntfs_volume *vol, VOLUME_FLAGS flags)
-{
- flags &= VOLUME_FLAGS_MASK;
- return ntfs_write_volume_flags(vol, vol->vol_flags | flags);
-}
-
-/**
- * ntfs_clear_volume_flags - clear bits in the volume information flags
- * @vol: ntfs volume on which to modify the flags
- * @flags: flags to clear on the volume
- *
- * Clear the bits in @flags in the volume information flags on the volume @vol.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_clear_volume_flags(ntfs_volume *vol, VOLUME_FLAGS flags)
-{
- flags &= VOLUME_FLAGS_MASK;
- flags = vol->vol_flags & cpu_to_le16(~le16_to_cpu(flags));
- return ntfs_write_volume_flags(vol, flags);
-}
-
-#endif /* NTFS_RW */
-
-/**
- * ntfs_remount - change the mount options of a mounted ntfs filesystem
- * @sb: superblock of mounted ntfs filesystem
- * @flags: remount flags
- * @opt: remount options string
- *
- * Change the mount options of an already mounted ntfs filesystem.
- *
- * NOTE: The VFS sets the @sb->s_flags remount flags to @flags after
- * ntfs_remount() returns successfully (i.e. returns 0). Otherwise,
- * @sb->s_flags are not changed.
- */
-static int ntfs_remount(struct super_block *sb, int *flags, char *opt)
-{
- ntfs_volume *vol = NTFS_SB(sb);
-
- ntfs_debug("Entering with remount options string: %s", opt);
-
- sync_filesystem(sb);
-
-#ifndef NTFS_RW
- /* For read-only compiled driver, enforce read-only flag. */
- *flags |= SB_RDONLY;
-#else /* NTFS_RW */
- /*
- * For the read-write compiled driver, if we are remounting read-write,
- * make sure there are no volume errors and that no unsupported volume
- * flags are set. Also, empty the logfile journal as it would become
- * stale as soon as something is written to the volume and mark the
- * volume dirty so that chkdsk is run if the volume is not umounted
- * cleanly. Finally, mark the quotas out of date so Windows rescans
- * the volume on boot and updates them.
- *
- * When remounting read-only, mark the volume clean if no volume errors
- * have occurred.
- */
- if (sb_rdonly(sb) && !(*flags & SB_RDONLY)) {
- static const char *es = ". Cannot remount read-write.";
-
- /* Remounting read-write. */
- if (NVolErrors(vol)) {
- ntfs_error(sb, "Volume has errors and is read-only%s",
- es);
- return -EROFS;
- }
- if (vol->vol_flags & VOLUME_IS_DIRTY) {
- ntfs_error(sb, "Volume is dirty and read-only%s", es);
- return -EROFS;
- }
- if (vol->vol_flags & VOLUME_MODIFIED_BY_CHKDSK) {
- ntfs_error(sb, "Volume has been modified by chkdsk "
- "and is read-only%s", es);
- return -EROFS;
- }
- if (vol->vol_flags & VOLUME_MUST_MOUNT_RO_MASK) {
- ntfs_error(sb, "Volume has unsupported flags set "
- "(0x%x) and is read-only%s",
- (unsigned)le16_to_cpu(vol->vol_flags),
- es);
- return -EROFS;
- }
- if (ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY)) {
- ntfs_error(sb, "Failed to set dirty bit in volume "
- "information flags%s", es);
- return -EROFS;
- }
-#if 0
- // TODO: Enable this code once we start modifying anything that
- // is different between NTFS 1.2 and 3.x...
- /* Set NT4 compatibility flag on newer NTFS version volumes. */
- if ((vol->major_ver > 1)) {
- if (ntfs_set_volume_flags(vol, VOLUME_MOUNTED_ON_NT4)) {
- ntfs_error(sb, "Failed to set NT4 "
- "compatibility flag%s", es);
- NVolSetErrors(vol);
- return -EROFS;
- }
- }
-#endif
- if (!ntfs_empty_logfile(vol->logfile_ino)) {
- ntfs_error(sb, "Failed to empty journal $LogFile%s",
- es);
- NVolSetErrors(vol);
- return -EROFS;
- }
- if (!ntfs_mark_quotas_out_of_date(vol)) {
- ntfs_error(sb, "Failed to mark quotas out of date%s",
- es);
- NVolSetErrors(vol);
- return -EROFS;
- }
- if (!ntfs_stamp_usnjrnl(vol)) {
- ntfs_error(sb, "Failed to stamp transaction log "
- "($UsnJrnl)%s", es);
- NVolSetErrors(vol);
- return -EROFS;
- }
- } else if (!sb_rdonly(sb) && (*flags & SB_RDONLY)) {
- /* Remounting read-only. */
- if (!NVolErrors(vol)) {
- if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY))
- ntfs_warning(sb, "Failed to clear dirty bit "
- "in volume information "
- "flags. Run chkdsk.");
- }
- }
-#endif /* NTFS_RW */
-
- // TODO: Deal with *flags.
-
- if (!parse_options(vol, opt))
- return -EINVAL;
-
- ntfs_debug("Done.");
- return 0;
-}
-
-/**
- * is_boot_sector_ntfs - check whether a boot sector is a valid NTFS boot sector
- * @sb: Super block of the device to which @b belongs.
- * @b: Boot sector of device @sb to check.
- * @silent: If 'true', all output will be silenced.
- *
- * is_boot_sector_ntfs() checks whether the boot sector @b is a valid NTFS boot
- * sector. Returns 'true' if it is valid and 'false' if not.
- *
- * @sb is only needed for warning/error output, i.e. it can be NULL when silent
- * is 'true'.
- */
-static bool is_boot_sector_ntfs(const struct super_block *sb,
- const NTFS_BOOT_SECTOR *b, const bool silent)
-{
- /*
- * Check that checksum == sum of u32 values from b to the checksum
- * field. If checksum is zero, no checking is done. We will work when
- * the checksum test fails, since some utilities update the boot sector
- * ignoring the checksum which leaves the checksum out-of-date. We
- * report a warning if this is the case.
- */
- if ((void*)b < (void*)&b->checksum && b->checksum && !silent) {
- le32 *u;
- u32 i;
-
- for (i = 0, u = (le32*)b; u < (le32*)(&b->checksum); ++u)
- i += le32_to_cpup(u);
- if (le32_to_cpu(b->checksum) != i)
- ntfs_warning(sb, "Invalid boot sector checksum.");
- }
- /* Check OEMidentifier is "NTFS " */
- if (b->oem_id != magicNTFS)
- goto not_ntfs;
- /* Check bytes per sector value is between 256 and 4096. */
- if (le16_to_cpu(b->bpb.bytes_per_sector) < 0x100 ||
- le16_to_cpu(b->bpb.bytes_per_sector) > 0x1000)
- goto not_ntfs;
- /* Check sectors per cluster value is valid. */
- switch (b->bpb.sectors_per_cluster) {
- case 1: case 2: case 4: case 8: case 16: case 32: case 64: case 128:
- break;
- default:
- goto not_ntfs;
- }
- /* Check the cluster size is not above the maximum (64kiB). */
- if ((u32)le16_to_cpu(b->bpb.bytes_per_sector) *
- b->bpb.sectors_per_cluster > NTFS_MAX_CLUSTER_SIZE)
- goto not_ntfs;
- /* Check reserved/unused fields are really zero. */
- if (le16_to_cpu(b->bpb.reserved_sectors) ||
- le16_to_cpu(b->bpb.root_entries) ||
- le16_to_cpu(b->bpb.sectors) ||
- le16_to_cpu(b->bpb.sectors_per_fat) ||
- le32_to_cpu(b->bpb.large_sectors) || b->bpb.fats)
- goto not_ntfs;
- /* Check clusters per file mft record value is valid. */
- if ((u8)b->clusters_per_mft_record < 0xe1 ||
- (u8)b->clusters_per_mft_record > 0xf7)
- switch (b->clusters_per_mft_record) {
- case 1: case 2: case 4: case 8: case 16: case 32: case 64:
- break;
- default:
- goto not_ntfs;
- }
- /* Check clusters per index block value is valid. */
- if ((u8)b->clusters_per_index_record < 0xe1 ||
- (u8)b->clusters_per_index_record > 0xf7)
- switch (b->clusters_per_index_record) {
- case 1: case 2: case 4: case 8: case 16: case 32: case 64:
- break;
- default:
- goto not_ntfs;
- }
- /*
- * Check for valid end of sector marker. We will work without it, but
- * many BIOSes will refuse to boot from a bootsector if the magic is
- * incorrect, so we emit a warning.
- */
- if (!silent && b->end_of_sector_marker != cpu_to_le16(0xaa55))
- ntfs_warning(sb, "Invalid end of sector marker.");
- return true;
-not_ntfs:
- return false;
-}
-
-/**
- * read_ntfs_boot_sector - read the NTFS boot sector of a device
- * @sb: super block of device to read the boot sector from
- * @silent: if true, suppress all output
- *
- * Reads the boot sector from the device and validates it. If that fails, tries
- * to read the backup boot sector, first from the end of the device a-la NT4 and
- * later and then from the middle of the device a-la NT3.51 and before.
- *
- * If a valid boot sector is found but it is not the primary boot sector, we
- * repair the primary boot sector silently (unless the device is read-only or
- * the primary boot sector is not accessible).
- *
- * NOTE: To call this function, @sb must have the fields s_dev, the ntfs super
- * block (u.ntfs_sb), nr_blocks and the device flags (s_flags) initialized
- * to their respective values.
- *
- * Return the unlocked buffer head containing the boot sector or NULL on error.
- */
-static struct buffer_head *read_ntfs_boot_sector(struct super_block *sb,
- const int silent)
-{
- const char *read_err_str = "Unable to read %s boot sector.";
- struct buffer_head *bh_primary, *bh_backup;
- sector_t nr_blocks = NTFS_SB(sb)->nr_blocks;
-
- /* Try to read primary boot sector. */
- if ((bh_primary = sb_bread(sb, 0))) {
- if (is_boot_sector_ntfs(sb, (NTFS_BOOT_SECTOR*)
- bh_primary->b_data, silent))
- return bh_primary;
- if (!silent)
- ntfs_error(sb, "Primary boot sector is invalid.");
- } else if (!silent)
- ntfs_error(sb, read_err_str, "primary");
- if (!(NTFS_SB(sb)->on_errors & ON_ERRORS_RECOVER)) {
- if (bh_primary)
- brelse(bh_primary);
- if (!silent)
- ntfs_error(sb, "Mount option errors=recover not used. "
- "Aborting without trying to recover.");
- return NULL;
- }
- /* Try to read NT4+ backup boot sector. */
- if ((bh_backup = sb_bread(sb, nr_blocks - 1))) {
- if (is_boot_sector_ntfs(sb, (NTFS_BOOT_SECTOR*)
- bh_backup->b_data, silent))
- goto hotfix_primary_boot_sector;
- brelse(bh_backup);
- } else if (!silent)
- ntfs_error(sb, read_err_str, "backup");
- /* Try to read NT3.51- backup boot sector. */
- if ((bh_backup = sb_bread(sb, nr_blocks >> 1))) {
- if (is_boot_sector_ntfs(sb, (NTFS_BOOT_SECTOR*)
- bh_backup->b_data, silent))
- goto hotfix_primary_boot_sector;
- if (!silent)
- ntfs_error(sb, "Could not find a valid backup boot "
- "sector.");
- brelse(bh_backup);
- } else if (!silent)
- ntfs_error(sb, read_err_str, "backup");
- /* We failed. Cleanup and return. */
- if (bh_primary)
- brelse(bh_primary);
- return NULL;
-hotfix_primary_boot_sector:
- if (bh_primary) {
- /*
- * If we managed to read sector zero and the volume is not
- * read-only, copy the found, valid backup boot sector to the
- * primary boot sector. Note we only copy the actual boot
- * sector structure, not the actual whole device sector as that
- * may be bigger and would potentially damage the $Boot system
- * file (FIXME: Would be nice to know if the backup boot sector
- * on a large sector device contains the whole boot loader or
- * just the first 512 bytes).
- */
- if (!sb_rdonly(sb)) {
- ntfs_warning(sb, "Hot-fix: Recovering invalid primary "
- "boot sector from backup copy.");
- memcpy(bh_primary->b_data, bh_backup->b_data,
- NTFS_BLOCK_SIZE);
- mark_buffer_dirty(bh_primary);
- sync_dirty_buffer(bh_primary);
- if (buffer_uptodate(bh_primary)) {
- brelse(bh_backup);
- return bh_primary;
- }
- ntfs_error(sb, "Hot-fix: Device write error while "
- "recovering primary boot sector.");
- } else {
- ntfs_warning(sb, "Hot-fix: Recovery of primary boot "
- "sector failed: Read-only mount.");
- }
- brelse(bh_primary);
- }
- ntfs_warning(sb, "Using backup boot sector.");
- return bh_backup;
-}
-
-/**
- * parse_ntfs_boot_sector - parse the boot sector and store the data in @vol
- * @vol: volume structure to initialise with data from boot sector
- * @b: boot sector to parse
- *
- * Parse the ntfs boot sector @b and store all imporant information therein in
- * the ntfs super block @vol. Return 'true' on success and 'false' on error.
- */
-static bool parse_ntfs_boot_sector(ntfs_volume *vol, const NTFS_BOOT_SECTOR *b)
-{
- unsigned int sectors_per_cluster_bits, nr_hidden_sects;
- int clusters_per_mft_record, clusters_per_index_record;
- s64 ll;
-
- vol->sector_size = le16_to_cpu(b->bpb.bytes_per_sector);
- vol->sector_size_bits = ffs(vol->sector_size) - 1;
- ntfs_debug("vol->sector_size = %i (0x%x)", vol->sector_size,
- vol->sector_size);
- ntfs_debug("vol->sector_size_bits = %i (0x%x)", vol->sector_size_bits,
- vol->sector_size_bits);
- if (vol->sector_size < vol->sb->s_blocksize) {
- ntfs_error(vol->sb, "Sector size (%i) is smaller than the "
- "device block size (%lu). This is not "
- "supported. Sorry.", vol->sector_size,
- vol->sb->s_blocksize);
- return false;
- }
- ntfs_debug("sectors_per_cluster = 0x%x", b->bpb.sectors_per_cluster);
- sectors_per_cluster_bits = ffs(b->bpb.sectors_per_cluster) - 1;
- ntfs_debug("sectors_per_cluster_bits = 0x%x",
- sectors_per_cluster_bits);
- nr_hidden_sects = le32_to_cpu(b->bpb.hidden_sectors);
- ntfs_debug("number of hidden sectors = 0x%x", nr_hidden_sects);
- vol->cluster_size = vol->sector_size << sectors_per_cluster_bits;
- vol->cluster_size_mask = vol->cluster_size - 1;
- vol->cluster_size_bits = ffs(vol->cluster_size) - 1;
- ntfs_debug("vol->cluster_size = %i (0x%x)", vol->cluster_size,
- vol->cluster_size);
- ntfs_debug("vol->cluster_size_mask = 0x%x", vol->cluster_size_mask);
- ntfs_debug("vol->cluster_size_bits = %i", vol->cluster_size_bits);
- if (vol->cluster_size < vol->sector_size) {
- ntfs_error(vol->sb, "Cluster size (%i) is smaller than the "
- "sector size (%i). This is not supported. "
- "Sorry.", vol->cluster_size, vol->sector_size);
- return false;
- }
- clusters_per_mft_record = b->clusters_per_mft_record;
- ntfs_debug("clusters_per_mft_record = %i (0x%x)",
- clusters_per_mft_record, clusters_per_mft_record);
- if (clusters_per_mft_record > 0)
- vol->mft_record_size = vol->cluster_size <<
- (ffs(clusters_per_mft_record) - 1);
- else
- /*
- * When mft_record_size < cluster_size, clusters_per_mft_record
- * = -log2(mft_record_size) bytes. mft_record_size normaly is
- * 1024 bytes, which is encoded as 0xF6 (-10 in decimal).
- */
- vol->mft_record_size = 1 << -clusters_per_mft_record;
- vol->mft_record_size_mask = vol->mft_record_size - 1;
- vol->mft_record_size_bits = ffs(vol->mft_record_size) - 1;
- ntfs_debug("vol->mft_record_size = %i (0x%x)", vol->mft_record_size,
- vol->mft_record_size);
- ntfs_debug("vol->mft_record_size_mask = 0x%x",
- vol->mft_record_size_mask);
- ntfs_debug("vol->mft_record_size_bits = %i (0x%x)",
- vol->mft_record_size_bits, vol->mft_record_size_bits);
- /*
- * We cannot support mft record sizes above the PAGE_SIZE since
- * we store $MFT/$DATA, the table of mft records in the page cache.
- */
- if (vol->mft_record_size > PAGE_SIZE) {
- ntfs_error(vol->sb, "Mft record size (%i) exceeds the "
- "PAGE_SIZE on your system (%lu). "
- "This is not supported. Sorry.",
- vol->mft_record_size, PAGE_SIZE);
- return false;
- }
- /* We cannot support mft record sizes below the sector size. */
- if (vol->mft_record_size < vol->sector_size) {
- ntfs_error(vol->sb, "Mft record size (%i) is smaller than the "
- "sector size (%i). This is not supported. "
- "Sorry.", vol->mft_record_size,
- vol->sector_size);
- return false;
- }
- clusters_per_index_record = b->clusters_per_index_record;
- ntfs_debug("clusters_per_index_record = %i (0x%x)",
- clusters_per_index_record, clusters_per_index_record);
- if (clusters_per_index_record > 0)
- vol->index_record_size = vol->cluster_size <<
- (ffs(clusters_per_index_record) - 1);
- else
- /*
- * When index_record_size < cluster_size,
- * clusters_per_index_record = -log2(index_record_size) bytes.
- * index_record_size normaly equals 4096 bytes, which is
- * encoded as 0xF4 (-12 in decimal).
- */
- vol->index_record_size = 1 << -clusters_per_index_record;
- vol->index_record_size_mask = vol->index_record_size - 1;
- vol->index_record_size_bits = ffs(vol->index_record_size) - 1;
- ntfs_debug("vol->index_record_size = %i (0x%x)",
- vol->index_record_size, vol->index_record_size);
- ntfs_debug("vol->index_record_size_mask = 0x%x",
- vol->index_record_size_mask);
- ntfs_debug("vol->index_record_size_bits = %i (0x%x)",
- vol->index_record_size_bits,
- vol->index_record_size_bits);
- /* We cannot support index record sizes below the sector size. */
- if (vol->index_record_size < vol->sector_size) {
- ntfs_error(vol->sb, "Index record size (%i) is smaller than "
- "the sector size (%i). This is not "
- "supported. Sorry.", vol->index_record_size,
- vol->sector_size);
- return false;
- }
- /*
- * Get the size of the volume in clusters and check for 64-bit-ness.
- * Windows currently only uses 32 bits to save the clusters so we do
- * the same as it is much faster on 32-bit CPUs.
- */
- ll = sle64_to_cpu(b->number_of_sectors) >> sectors_per_cluster_bits;
- if ((u64)ll >= 1ULL << 32) {
- ntfs_error(vol->sb, "Cannot handle 64-bit clusters. Sorry.");
- return false;
- }
- vol->nr_clusters = ll;
- ntfs_debug("vol->nr_clusters = 0x%llx", (long long)vol->nr_clusters);
- /*
- * On an architecture where unsigned long is 32-bits, we restrict the
- * volume size to 2TiB (2^41). On a 64-bit architecture, the compiler
- * will hopefully optimize the whole check away.
- */
- if (sizeof(unsigned long) < 8) {
- if ((ll << vol->cluster_size_bits) >= (1ULL << 41)) {
- ntfs_error(vol->sb, "Volume size (%lluTiB) is too "
- "large for this architecture. "
- "Maximum supported is 2TiB. Sorry.",
- (unsigned long long)ll >> (40 -
- vol->cluster_size_bits));
- return false;
- }
- }
- ll = sle64_to_cpu(b->mft_lcn);
- if (ll >= vol->nr_clusters) {
- ntfs_error(vol->sb, "MFT LCN (%lli, 0x%llx) is beyond end of "
- "volume. Weird.", (unsigned long long)ll,
- (unsigned long long)ll);
- return false;
- }
- vol->mft_lcn = ll;
- ntfs_debug("vol->mft_lcn = 0x%llx", (long long)vol->mft_lcn);
- ll = sle64_to_cpu(b->mftmirr_lcn);
- if (ll >= vol->nr_clusters) {
- ntfs_error(vol->sb, "MFTMirr LCN (%lli, 0x%llx) is beyond end "
- "of volume. Weird.", (unsigned long long)ll,
- (unsigned long long)ll);
- return false;
- }
- vol->mftmirr_lcn = ll;
- ntfs_debug("vol->mftmirr_lcn = 0x%llx", (long long)vol->mftmirr_lcn);
-#ifdef NTFS_RW
- /*
- * Work out the size of the mft mirror in number of mft records. If the
- * cluster size is less than or equal to the size taken by four mft
- * records, the mft mirror stores the first four mft records. If the
- * cluster size is bigger than the size taken by four mft records, the
- * mft mirror contains as many mft records as will fit into one
- * cluster.
- */
- if (vol->cluster_size <= (4 << vol->mft_record_size_bits))
- vol->mftmirr_size = 4;
- else
- vol->mftmirr_size = vol->cluster_size >>
- vol->mft_record_size_bits;
- ntfs_debug("vol->mftmirr_size = %i", vol->mftmirr_size);
-#endif /* NTFS_RW */
- vol->serial_no = le64_to_cpu(b->volume_serial_number);
- ntfs_debug("vol->serial_no = 0x%llx",
- (unsigned long long)vol->serial_no);
- return true;
-}
-
-/**
- * ntfs_setup_allocators - initialize the cluster and mft allocators
- * @vol: volume structure for which to setup the allocators
- *
- * Setup the cluster (lcn) and mft allocators to the starting values.
- */
-static void ntfs_setup_allocators(ntfs_volume *vol)
-{
-#ifdef NTFS_RW
- LCN mft_zone_size, mft_lcn;
-#endif /* NTFS_RW */
-
- ntfs_debug("vol->mft_zone_multiplier = 0x%x",
- vol->mft_zone_multiplier);
-#ifdef NTFS_RW
- /* Determine the size of the MFT zone. */
- mft_zone_size = vol->nr_clusters;
- switch (vol->mft_zone_multiplier) { /* % of volume size in clusters */
- case 4:
- mft_zone_size >>= 1; /* 50% */
- break;
- case 3:
- mft_zone_size = (mft_zone_size +
- (mft_zone_size >> 1)) >> 2; /* 37.5% */
- break;
- case 2:
- mft_zone_size >>= 2; /* 25% */
- break;
- /* case 1: */
- default:
- mft_zone_size >>= 3; /* 12.5% */
- break;
- }
- /* Setup the mft zone. */
- vol->mft_zone_start = vol->mft_zone_pos = vol->mft_lcn;
- ntfs_debug("vol->mft_zone_pos = 0x%llx",
- (unsigned long long)vol->mft_zone_pos);
- /*
- * Calculate the mft_lcn for an unmodified NTFS volume (see mkntfs
- * source) and if the actual mft_lcn is in the expected place or even
- * further to the front of the volume, extend the mft_zone to cover the
- * beginning of the volume as well. This is in order to protect the
- * area reserved for the mft bitmap as well within the mft_zone itself.
- * On non-standard volumes we do not protect it as the overhead would
- * be higher than the speed increase we would get by doing it.
- */
- mft_lcn = (8192 + 2 * vol->cluster_size - 1) / vol->cluster_size;
- if (mft_lcn * vol->cluster_size < 16 * 1024)
- mft_lcn = (16 * 1024 + vol->cluster_size - 1) /
- vol->cluster_size;
- if (vol->mft_zone_start <= mft_lcn)
- vol->mft_zone_start = 0;
- ntfs_debug("vol->mft_zone_start = 0x%llx",
- (unsigned long long)vol->mft_zone_start);
- /*
- * Need to cap the mft zone on non-standard volumes so that it does
- * not point outside the boundaries of the volume. We do this by
- * halving the zone size until we are inside the volume.
- */
- vol->mft_zone_end = vol->mft_lcn + mft_zone_size;
- while (vol->mft_zone_end >= vol->nr_clusters) {
- mft_zone_size >>= 1;
- vol->mft_zone_end = vol->mft_lcn + mft_zone_size;
- }
- ntfs_debug("vol->mft_zone_end = 0x%llx",
- (unsigned long long)vol->mft_zone_end);
- /*
- * Set the current position within each data zone to the start of the
- * respective zone.
- */
- vol->data1_zone_pos = vol->mft_zone_end;
- ntfs_debug("vol->data1_zone_pos = 0x%llx",
- (unsigned long long)vol->data1_zone_pos);
- vol->data2_zone_pos = 0;
- ntfs_debug("vol->data2_zone_pos = 0x%llx",
- (unsigned long long)vol->data2_zone_pos);
-
- /* Set the mft data allocation position to mft record 24. */
- vol->mft_data_pos = 24;
- ntfs_debug("vol->mft_data_pos = 0x%llx",
- (unsigned long long)vol->mft_data_pos);
-#endif /* NTFS_RW */
-}
-
-#ifdef NTFS_RW
-
-/**
- * load_and_init_mft_mirror - load and setup the mft mirror inode for a volume
- * @vol: ntfs super block describing device whose mft mirror to load
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_and_init_mft_mirror(ntfs_volume *vol)
-{
- struct inode *tmp_ino;
- ntfs_inode *tmp_ni;
-
- ntfs_debug("Entering.");
- /* Get mft mirror inode. */
- tmp_ino = ntfs_iget(vol->sb, FILE_MFTMirr);
- if (IS_ERR(tmp_ino) || is_bad_inode(tmp_ino)) {
- if (!IS_ERR(tmp_ino))
- iput(tmp_ino);
- /* Caller will display error message. */
- return false;
- }
- /*
- * Re-initialize some specifics about $MFTMirr's inode as
- * ntfs_read_inode() will have set up the default ones.
- */
- /* Set uid and gid to root. */
- tmp_ino->i_uid = GLOBAL_ROOT_UID;
- tmp_ino->i_gid = GLOBAL_ROOT_GID;
- /* Regular file. No access for anyone. */
- tmp_ino->i_mode = S_IFREG;
- /* No VFS initiated operations allowed for $MFTMirr. */
- tmp_ino->i_op = &ntfs_empty_inode_ops;
- tmp_ino->i_fop = &ntfs_empty_file_ops;
- /* Put in our special address space operations. */
- tmp_ino->i_mapping->a_ops = &ntfs_mst_aops;
- tmp_ni = NTFS_I(tmp_ino);
- /* The $MFTMirr, like the $MFT is multi sector transfer protected. */
- NInoSetMstProtected(tmp_ni);
- NInoSetSparseDisabled(tmp_ni);
- /*
- * Set up our little cheat allowing us to reuse the async read io
- * completion handler for directories.
- */
- tmp_ni->itype.index.block_size = vol->mft_record_size;
- tmp_ni->itype.index.block_size_bits = vol->mft_record_size_bits;
- vol->mftmirr_ino = tmp_ino;
- ntfs_debug("Done.");
- return true;
-}
-
-/**
- * check_mft_mirror - compare contents of the mft mirror with the mft
- * @vol: ntfs super block describing device whose mft mirror to check
- *
- * Return 'true' on success or 'false' on error.
- *
- * Note, this function also results in the mft mirror runlist being completely
- * mapped into memory. The mft mirror write code requires this and will BUG()
- * should it find an unmapped runlist element.
- */
-static bool check_mft_mirror(ntfs_volume *vol)
-{
- struct super_block *sb = vol->sb;
- ntfs_inode *mirr_ni;
- struct page *mft_page, *mirr_page;
- u8 *kmft, *kmirr;
- runlist_element *rl, rl2[2];
- pgoff_t index;
- int mrecs_per_page, i;
-
- ntfs_debug("Entering.");
- /* Compare contents of $MFT and $MFTMirr. */
- mrecs_per_page = PAGE_SIZE / vol->mft_record_size;
- BUG_ON(!mrecs_per_page);
- BUG_ON(!vol->mftmirr_size);
- mft_page = mirr_page = NULL;
- kmft = kmirr = NULL;
- index = i = 0;
- do {
- u32 bytes;
-
- /* Switch pages if necessary. */
- if (!(i % mrecs_per_page)) {
- if (index) {
- ntfs_unmap_page(mft_page);
- ntfs_unmap_page(mirr_page);
- }
- /* Get the $MFT page. */
- mft_page = ntfs_map_page(vol->mft_ino->i_mapping,
- index);
- if (IS_ERR(mft_page)) {
- ntfs_error(sb, "Failed to read $MFT.");
- return false;
- }
- kmft = page_address(mft_page);
- /* Get the $MFTMirr page. */
- mirr_page = ntfs_map_page(vol->mftmirr_ino->i_mapping,
- index);
- if (IS_ERR(mirr_page)) {
- ntfs_error(sb, "Failed to read $MFTMirr.");
- goto mft_unmap_out;
- }
- kmirr = page_address(mirr_page);
- ++index;
- }
- /* Do not check the record if it is not in use. */
- if (((MFT_RECORD*)kmft)->flags & MFT_RECORD_IN_USE) {
- /* Make sure the record is ok. */
- if (ntfs_is_baad_recordp((le32*)kmft)) {
- ntfs_error(sb, "Incomplete multi sector "
- "transfer detected in mft "
- "record %i.", i);
-mm_unmap_out:
- ntfs_unmap_page(mirr_page);
-mft_unmap_out:
- ntfs_unmap_page(mft_page);
- return false;
- }
- }
- /* Do not check the mirror record if it is not in use. */
- if (((MFT_RECORD*)kmirr)->flags & MFT_RECORD_IN_USE) {
- if (ntfs_is_baad_recordp((le32*)kmirr)) {
- ntfs_error(sb, "Incomplete multi sector "
- "transfer detected in mft "
- "mirror record %i.", i);
- goto mm_unmap_out;
- }
- }
- /* Get the amount of data in the current record. */
- bytes = le32_to_cpu(((MFT_RECORD*)kmft)->bytes_in_use);
- if (bytes < sizeof(MFT_RECORD_OLD) ||
- bytes > vol->mft_record_size ||
- ntfs_is_baad_recordp((le32*)kmft)) {
- bytes = le32_to_cpu(((MFT_RECORD*)kmirr)->bytes_in_use);
- if (bytes < sizeof(MFT_RECORD_OLD) ||
- bytes > vol->mft_record_size ||
- ntfs_is_baad_recordp((le32*)kmirr))
- bytes = vol->mft_record_size;
- }
- /* Compare the two records. */
- if (memcmp(kmft, kmirr, bytes)) {
- ntfs_error(sb, "$MFT and $MFTMirr (record %i) do not "
- "match. Run ntfsfix or chkdsk.", i);
- goto mm_unmap_out;
- }
- kmft += vol->mft_record_size;
- kmirr += vol->mft_record_size;
- } while (++i < vol->mftmirr_size);
- /* Release the last pages. */
- ntfs_unmap_page(mft_page);
- ntfs_unmap_page(mirr_page);
-
- /* Construct the mft mirror runlist by hand. */
- rl2[0].vcn = 0;
- rl2[0].lcn = vol->mftmirr_lcn;
- rl2[0].length = (vol->mftmirr_size * vol->mft_record_size +
- vol->cluster_size - 1) / vol->cluster_size;
- rl2[1].vcn = rl2[0].length;
- rl2[1].lcn = LCN_ENOENT;
- rl2[1].length = 0;
- /*
- * Because we have just read all of the mft mirror, we know we have
- * mapped the full runlist for it.
- */
- mirr_ni = NTFS_I(vol->mftmirr_ino);
- down_read(&mirr_ni->runlist.lock);
- rl = mirr_ni->runlist.rl;
- /* Compare the two runlists. They must be identical. */
- i = 0;
- do {
- if (rl2[i].vcn != rl[i].vcn || rl2[i].lcn != rl[i].lcn ||
- rl2[i].length != rl[i].length) {
- ntfs_error(sb, "$MFTMirr location mismatch. "
- "Run chkdsk.");
- up_read(&mirr_ni->runlist.lock);
- return false;
- }
- } while (rl2[i++].length);
- up_read(&mirr_ni->runlist.lock);
- ntfs_debug("Done.");
- return true;
-}
-
-/**
- * load_and_check_logfile - load and check the logfile inode for a volume
- * @vol: ntfs super block describing device whose logfile to load
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_and_check_logfile(ntfs_volume *vol,
- RESTART_PAGE_HEADER **rp)
-{
- struct inode *tmp_ino;
-
- ntfs_debug("Entering.");
- tmp_ino = ntfs_iget(vol->sb, FILE_LogFile);
- if (IS_ERR(tmp_ino) || is_bad_inode(tmp_ino)) {
- if (!IS_ERR(tmp_ino))
- iput(tmp_ino);
- /* Caller will display error message. */
- return false;
- }
- if (!ntfs_check_logfile(tmp_ino, rp)) {
- iput(tmp_ino);
- /* ntfs_check_logfile() will have displayed error output. */
- return false;
- }
- NInoSetSparseDisabled(NTFS_I(tmp_ino));
- vol->logfile_ino = tmp_ino;
- ntfs_debug("Done.");
- return true;
-}
-
-#define NTFS_HIBERFIL_HEADER_SIZE 4096
-
-/**
- * check_windows_hibernation_status - check if Windows is suspended on a volume
- * @vol: ntfs super block of device to check
- *
- * Check if Windows is hibernated on the ntfs volume @vol. This is done by
- * looking for the file hiberfil.sys in the root directory of the volume. If
- * the file is not present Windows is definitely not suspended.
- *
- * If hiberfil.sys exists and is less than 4kiB in size it means Windows is
- * definitely suspended (this volume is not the system volume). Caveat: on a
- * system with many volumes it is possible that the < 4kiB check is bogus but
- * for now this should do fine.
- *
- * If hiberfil.sys exists and is larger than 4kiB in size, we need to read the
- * hiberfil header (which is the first 4kiB). If this begins with "hibr",
- * Windows is definitely suspended. If it is completely full of zeroes,
- * Windows is definitely not hibernated. Any other case is treated as if
- * Windows is suspended. This caters for the above mentioned caveat of a
- * system with many volumes where no "hibr" magic would be present and there is
- * no zero header.
- *
- * Return 0 if Windows is not hibernated on the volume, >0 if Windows is
- * hibernated on the volume, and -errno on error.
- */
-static int check_windows_hibernation_status(ntfs_volume *vol)
-{
- MFT_REF mref;
- struct inode *vi;
- struct page *page;
- u32 *kaddr, *kend;
- ntfs_name *name = NULL;
- int ret = 1;
- static const ntfschar hiberfil[13] = { cpu_to_le16('h'),
- cpu_to_le16('i'), cpu_to_le16('b'),
- cpu_to_le16('e'), cpu_to_le16('r'),
- cpu_to_le16('f'), cpu_to_le16('i'),
- cpu_to_le16('l'), cpu_to_le16('.'),
- cpu_to_le16('s'), cpu_to_le16('y'),
- cpu_to_le16('s'), 0 };
-
- ntfs_debug("Entering.");
- /*
- * Find the inode number for the hibernation file by looking up the
- * filename hiberfil.sys in the root directory.
- */
- inode_lock(vol->root_ino);
- mref = ntfs_lookup_inode_by_name(NTFS_I(vol->root_ino), hiberfil, 12,
- &name);
- inode_unlock(vol->root_ino);
- if (IS_ERR_MREF(mref)) {
- ret = MREF_ERR(mref);
- /* If the file does not exist, Windows is not hibernated. */
- if (ret == -ENOENT) {
- ntfs_debug("hiberfil.sys not present. Windows is not "
- "hibernated on the volume.");
- return 0;
- }
- /* A real error occurred. */
- ntfs_error(vol->sb, "Failed to find inode number for "
- "hiberfil.sys.");
- return ret;
- }
- /* We do not care for the type of match that was found. */
- kfree(name);
- /* Get the inode. */
- vi = ntfs_iget(vol->sb, MREF(mref));
- if (IS_ERR(vi) || is_bad_inode(vi)) {
- if (!IS_ERR(vi))
- iput(vi);
- ntfs_error(vol->sb, "Failed to load hiberfil.sys.");
- return IS_ERR(vi) ? PTR_ERR(vi) : -EIO;
- }
- if (unlikely(i_size_read(vi) < NTFS_HIBERFIL_HEADER_SIZE)) {
- ntfs_debug("hiberfil.sys is smaller than 4kiB (0x%llx). "
- "Windows is hibernated on the volume. This "
- "is not the system volume.", i_size_read(vi));
- goto iput_out;
- }
- page = ntfs_map_page(vi->i_mapping, 0);
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Failed to read from hiberfil.sys.");
- ret = PTR_ERR(page);
- goto iput_out;
- }
- kaddr = (u32*)page_address(page);
- if (*(le32*)kaddr == cpu_to_le32(0x72626968)/*'hibr'*/) {
- ntfs_debug("Magic \"hibr\" found in hiberfil.sys. Windows is "
- "hibernated on the volume. This is the "
- "system volume.");
- goto unm_iput_out;
- }
- kend = kaddr + NTFS_HIBERFIL_HEADER_SIZE/sizeof(*kaddr);
- do {
- if (unlikely(*kaddr)) {
- ntfs_debug("hiberfil.sys is larger than 4kiB "
- "(0x%llx), does not contain the "
- "\"hibr\" magic, and does not have a "
- "zero header. Windows is hibernated "
- "on the volume. This is not the "
- "system volume.", i_size_read(vi));
- goto unm_iput_out;
- }
- } while (++kaddr < kend);
- ntfs_debug("hiberfil.sys contains a zero header. Windows is not "
- "hibernated on the volume. This is the system "
- "volume.");
- ret = 0;
-unm_iput_out:
- ntfs_unmap_page(page);
-iput_out:
- iput(vi);
- return ret;
-}
-
-/**
- * load_and_init_quota - load and setup the quota file for a volume if present
- * @vol: ntfs super block describing device whose quota file to load
- *
- * Return 'true' on success or 'false' on error. If $Quota is not present, we
- * leave vol->quota_ino as NULL and return success.
- */
-static bool load_and_init_quota(ntfs_volume *vol)
-{
- MFT_REF mref;
- struct inode *tmp_ino;
- ntfs_name *name = NULL;
- static const ntfschar Quota[7] = { cpu_to_le16('$'),
- cpu_to_le16('Q'), cpu_to_le16('u'),
- cpu_to_le16('o'), cpu_to_le16('t'),
- cpu_to_le16('a'), 0 };
- static ntfschar Q[3] = { cpu_to_le16('$'),
- cpu_to_le16('Q'), 0 };
-
- ntfs_debug("Entering.");
- /*
- * Find the inode number for the quota file by looking up the filename
- * $Quota in the extended system files directory $Extend.
- */
- inode_lock(vol->extend_ino);
- mref = ntfs_lookup_inode_by_name(NTFS_I(vol->extend_ino), Quota, 6,
- &name);
- inode_unlock(vol->extend_ino);
- if (IS_ERR_MREF(mref)) {
- /*
- * If the file does not exist, quotas are disabled and have
- * never been enabled on this volume, just return success.
- */
- if (MREF_ERR(mref) == -ENOENT) {
- ntfs_debug("$Quota not present. Volume does not have "
- "quotas enabled.");
- /*
- * No need to try to set quotas out of date if they are
- * not enabled.
- */
- NVolSetQuotaOutOfDate(vol);
- return true;
- }
- /* A real error occurred. */
- ntfs_error(vol->sb, "Failed to find inode number for $Quota.");
- return false;
- }
- /* We do not care for the type of match that was found. */
- kfree(name);
- /* Get the inode. */
- tmp_ino = ntfs_iget(vol->sb, MREF(mref));
- if (IS_ERR(tmp_ino) || is_bad_inode(tmp_ino)) {
- if (!IS_ERR(tmp_ino))
- iput(tmp_ino);
- ntfs_error(vol->sb, "Failed to load $Quota.");
- return false;
- }
- vol->quota_ino = tmp_ino;
- /* Get the $Q index allocation attribute. */
- tmp_ino = ntfs_index_iget(vol->quota_ino, Q, 2);
- if (IS_ERR(tmp_ino)) {
- ntfs_error(vol->sb, "Failed to load $Quota/$Q index.");
- return false;
- }
- vol->quota_q_ino = tmp_ino;
- ntfs_debug("Done.");
- return true;
-}
-
-/**
- * load_and_init_usnjrnl - load and setup the transaction log if present
- * @vol: ntfs super block describing device whose usnjrnl file to load
- *
- * Return 'true' on success or 'false' on error.
- *
- * If $UsnJrnl is not present or in the process of being disabled, we set
- * NVolUsnJrnlStamped() and return success.
- *
- * If the $UsnJrnl $DATA/$J attribute has a size equal to the lowest valid usn,
- * i.e. transaction logging has only just been enabled or the journal has been
- * stamped and nothing has been logged since, we also set NVolUsnJrnlStamped()
- * and return success.
- */
-static bool load_and_init_usnjrnl(ntfs_volume *vol)
-{
- MFT_REF mref;
- struct inode *tmp_ino;
- ntfs_inode *tmp_ni;
- struct page *page;
- ntfs_name *name = NULL;
- USN_HEADER *uh;
- static const ntfschar UsnJrnl[9] = { cpu_to_le16('$'),
- cpu_to_le16('U'), cpu_to_le16('s'),
- cpu_to_le16('n'), cpu_to_le16('J'),
- cpu_to_le16('r'), cpu_to_le16('n'),
- cpu_to_le16('l'), 0 };
- static ntfschar Max[5] = { cpu_to_le16('$'),
- cpu_to_le16('M'), cpu_to_le16('a'),
- cpu_to_le16('x'), 0 };
- static ntfschar J[3] = { cpu_to_le16('$'),
- cpu_to_le16('J'), 0 };
-
- ntfs_debug("Entering.");
- /*
- * Find the inode number for the transaction log file by looking up the
- * filename $UsnJrnl in the extended system files directory $Extend.
- */
- inode_lock(vol->extend_ino);
- mref = ntfs_lookup_inode_by_name(NTFS_I(vol->extend_ino), UsnJrnl, 8,
- &name);
- inode_unlock(vol->extend_ino);
- if (IS_ERR_MREF(mref)) {
- /*
- * If the file does not exist, transaction logging is disabled,
- * just return success.
- */
- if (MREF_ERR(mref) == -ENOENT) {
- ntfs_debug("$UsnJrnl not present. Volume does not "
- "have transaction logging enabled.");
-not_enabled:
- /*
- * No need to try to stamp the transaction log if
- * transaction logging is not enabled.
- */
- NVolSetUsnJrnlStamped(vol);
- return true;
- }
- /* A real error occurred. */
- ntfs_error(vol->sb, "Failed to find inode number for "
- "$UsnJrnl.");
- return false;
- }
- /* We do not care for the type of match that was found. */
- kfree(name);
- /* Get the inode. */
- tmp_ino = ntfs_iget(vol->sb, MREF(mref));
- if (IS_ERR(tmp_ino) || unlikely(is_bad_inode(tmp_ino))) {
- if (!IS_ERR(tmp_ino))
- iput(tmp_ino);
- ntfs_error(vol->sb, "Failed to load $UsnJrnl.");
- return false;
- }
- vol->usnjrnl_ino = tmp_ino;
- /*
- * If the transaction log is in the process of being deleted, we can
- * ignore it.
- */
- if (unlikely(vol->vol_flags & VOLUME_DELETE_USN_UNDERWAY)) {
- ntfs_debug("$UsnJrnl in the process of being disabled. "
- "Volume does not have transaction logging "
- "enabled.");
- goto not_enabled;
- }
- /* Get the $DATA/$Max attribute. */
- tmp_ino = ntfs_attr_iget(vol->usnjrnl_ino, AT_DATA, Max, 4);
- if (IS_ERR(tmp_ino)) {
- ntfs_error(vol->sb, "Failed to load $UsnJrnl/$DATA/$Max "
- "attribute.");
- return false;
- }
- vol->usnjrnl_max_ino = tmp_ino;
- if (unlikely(i_size_read(tmp_ino) < sizeof(USN_HEADER))) {
- ntfs_error(vol->sb, "Found corrupt $UsnJrnl/$DATA/$Max "
- "attribute (size is 0x%llx but should be at "
- "least 0x%zx bytes).", i_size_read(tmp_ino),
- sizeof(USN_HEADER));
- return false;
- }
- /* Get the $DATA/$J attribute. */
- tmp_ino = ntfs_attr_iget(vol->usnjrnl_ino, AT_DATA, J, 2);
- if (IS_ERR(tmp_ino)) {
- ntfs_error(vol->sb, "Failed to load $UsnJrnl/$DATA/$J "
- "attribute.");
- return false;
- }
- vol->usnjrnl_j_ino = tmp_ino;
- /* Verify $J is non-resident and sparse. */
- tmp_ni = NTFS_I(vol->usnjrnl_j_ino);
- if (unlikely(!NInoNonResident(tmp_ni) || !NInoSparse(tmp_ni))) {
- ntfs_error(vol->sb, "$UsnJrnl/$DATA/$J attribute is resident "
- "and/or not sparse.");
- return false;
- }
- /* Read the USN_HEADER from $DATA/$Max. */
- page = ntfs_map_page(vol->usnjrnl_max_ino->i_mapping, 0);
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Failed to read from $UsnJrnl/$DATA/$Max "
- "attribute.");
- return false;
- }
- uh = (USN_HEADER*)page_address(page);
- /* Sanity check the $Max. */
- if (unlikely(sle64_to_cpu(uh->allocation_delta) >
- sle64_to_cpu(uh->maximum_size))) {
- ntfs_error(vol->sb, "Allocation delta (0x%llx) exceeds "
- "maximum size (0x%llx). $UsnJrnl is corrupt.",
- (long long)sle64_to_cpu(uh->allocation_delta),
- (long long)sle64_to_cpu(uh->maximum_size));
- ntfs_unmap_page(page);
- return false;
- }
- /*
- * If the transaction log has been stamped and nothing has been written
- * to it since, we do not need to stamp it.
- */
- if (unlikely(sle64_to_cpu(uh->lowest_valid_usn) >=
- i_size_read(vol->usnjrnl_j_ino))) {
- if (likely(sle64_to_cpu(uh->lowest_valid_usn) ==
- i_size_read(vol->usnjrnl_j_ino))) {
- ntfs_unmap_page(page);
- ntfs_debug("$UsnJrnl is enabled but nothing has been "
- "logged since it was last stamped. "
- "Treating this as if the volume does "
- "not have transaction logging "
- "enabled.");
- goto not_enabled;
- }
- ntfs_error(vol->sb, "$UsnJrnl has lowest valid usn (0x%llx) "
- "which is out of bounds (0x%llx). $UsnJrnl "
- "is corrupt.",
- (long long)sle64_to_cpu(uh->lowest_valid_usn),
- i_size_read(vol->usnjrnl_j_ino));
- ntfs_unmap_page(page);
- return false;
- }
- ntfs_unmap_page(page);
- ntfs_debug("Done.");
- return true;
-}
-
-/**
- * load_and_init_attrdef - load the attribute definitions table for a volume
- * @vol: ntfs super block describing device whose attrdef to load
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_and_init_attrdef(ntfs_volume *vol)
-{
- loff_t i_size;
- struct super_block *sb = vol->sb;
- struct inode *ino;
- struct page *page;
- pgoff_t index, max_index;
- unsigned int size;
-
- ntfs_debug("Entering.");
- /* Read attrdef table and setup vol->attrdef and vol->attrdef_size. */
- ino = ntfs_iget(sb, FILE_AttrDef);
- if (IS_ERR(ino) || is_bad_inode(ino)) {
- if (!IS_ERR(ino))
- iput(ino);
- goto failed;
- }
- NInoSetSparseDisabled(NTFS_I(ino));
- /* The size of FILE_AttrDef must be above 0 and fit inside 31 bits. */
- i_size = i_size_read(ino);
- if (i_size <= 0 || i_size > 0x7fffffff)
- goto iput_failed;
- vol->attrdef = (ATTR_DEF*)ntfs_malloc_nofs(i_size);
- if (!vol->attrdef)
- goto iput_failed;
- index = 0;
- max_index = i_size >> PAGE_SHIFT;
- size = PAGE_SIZE;
- while (index < max_index) {
- /* Read the attrdef table and copy it into the linear buffer. */
-read_partial_attrdef_page:
- page = ntfs_map_page(ino->i_mapping, index);
- if (IS_ERR(page))
- goto free_iput_failed;
- memcpy((u8*)vol->attrdef + (index++ << PAGE_SHIFT),
- page_address(page), size);
- ntfs_unmap_page(page);
- }
- if (size == PAGE_SIZE) {
- size = i_size & ~PAGE_MASK;
- if (size)
- goto read_partial_attrdef_page;
- }
- vol->attrdef_size = i_size;
- ntfs_debug("Read %llu bytes from $AttrDef.", i_size);
- iput(ino);
- return true;
-free_iput_failed:
- ntfs_free(vol->attrdef);
- vol->attrdef = NULL;
-iput_failed:
- iput(ino);
-failed:
- ntfs_error(sb, "Failed to initialize attribute definition table.");
- return false;
-}
-
-#endif /* NTFS_RW */
-
-/**
- * load_and_init_upcase - load the upcase table for an ntfs volume
- * @vol: ntfs super block describing device whose upcase to load
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_and_init_upcase(ntfs_volume *vol)
-{
- loff_t i_size;
- struct super_block *sb = vol->sb;
- struct inode *ino;
- struct page *page;
- pgoff_t index, max_index;
- unsigned int size;
- int i, max;
-
- ntfs_debug("Entering.");
- /* Read upcase table and setup vol->upcase and vol->upcase_len. */
- ino = ntfs_iget(sb, FILE_UpCase);
- if (IS_ERR(ino) || is_bad_inode(ino)) {
- if (!IS_ERR(ino))
- iput(ino);
- goto upcase_failed;
- }
- /*
- * The upcase size must not be above 64k Unicode characters, must not
- * be zero and must be a multiple of sizeof(ntfschar).
- */
- i_size = i_size_read(ino);
- if (!i_size || i_size & (sizeof(ntfschar) - 1) ||
- i_size > 64ULL * 1024 * sizeof(ntfschar))
- goto iput_upcase_failed;
- vol->upcase = (ntfschar*)ntfs_malloc_nofs(i_size);
- if (!vol->upcase)
- goto iput_upcase_failed;
- index = 0;
- max_index = i_size >> PAGE_SHIFT;
- size = PAGE_SIZE;
- while (index < max_index) {
- /* Read the upcase table and copy it into the linear buffer. */
-read_partial_upcase_page:
- page = ntfs_map_page(ino->i_mapping, index);
- if (IS_ERR(page))
- goto iput_upcase_failed;
- memcpy((char*)vol->upcase + (index++ << PAGE_SHIFT),
- page_address(page), size);
- ntfs_unmap_page(page);
- }
- if (size == PAGE_SIZE) {
- size = i_size & ~PAGE_MASK;
- if (size)
- goto read_partial_upcase_page;
- }
- vol->upcase_len = i_size >> UCHAR_T_SIZE_BITS;
- ntfs_debug("Read %llu bytes from $UpCase (expected %zu bytes).",
- i_size, 64 * 1024 * sizeof(ntfschar));
- iput(ino);
- mutex_lock(&ntfs_lock);
- if (!default_upcase) {
- ntfs_debug("Using volume specified $UpCase since default is "
- "not present.");
- mutex_unlock(&ntfs_lock);
- return true;
- }
- max = default_upcase_len;
- if (max > vol->upcase_len)
- max = vol->upcase_len;
- for (i = 0; i < max; i++)
- if (vol->upcase[i] != default_upcase[i])
- break;
- if (i == max) {
- ntfs_free(vol->upcase);
- vol->upcase = default_upcase;
- vol->upcase_len = max;
- ntfs_nr_upcase_users++;
- mutex_unlock(&ntfs_lock);
- ntfs_debug("Volume specified $UpCase matches default. Using "
- "default.");
- return true;
- }
- mutex_unlock(&ntfs_lock);
- ntfs_debug("Using volume specified $UpCase since it does not match "
- "the default.");
- return true;
-iput_upcase_failed:
- iput(ino);
- ntfs_free(vol->upcase);
- vol->upcase = NULL;
-upcase_failed:
- mutex_lock(&ntfs_lock);
- if (default_upcase) {
- vol->upcase = default_upcase;
- vol->upcase_len = default_upcase_len;
- ntfs_nr_upcase_users++;
- mutex_unlock(&ntfs_lock);
- ntfs_error(sb, "Failed to load $UpCase from the volume. Using "
- "default.");
- return true;
- }
- mutex_unlock(&ntfs_lock);
- ntfs_error(sb, "Failed to initialize upcase table.");
- return false;
-}
-
-/*
- * The lcn and mft bitmap inodes are NTFS-internal inodes with
- * their own special locking rules:
- */
-static struct lock_class_key
- lcnbmp_runlist_lock_key, lcnbmp_mrec_lock_key,
- mftbmp_runlist_lock_key, mftbmp_mrec_lock_key;
-
-/**
- * load_system_files - open the system files using normal functions
- * @vol: ntfs super block describing device whose system files to load
- *
- * Open the system files with normal access functions and complete setting up
- * the ntfs super block @vol.
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_system_files(ntfs_volume *vol)
-{
- struct super_block *sb = vol->sb;
- MFT_RECORD *m;
- VOLUME_INFORMATION *vi;
- ntfs_attr_search_ctx *ctx;
-#ifdef NTFS_RW
- RESTART_PAGE_HEADER *rp;
- int err;
-#endif /* NTFS_RW */
-
- ntfs_debug("Entering.");
-#ifdef NTFS_RW
- /* Get mft mirror inode compare the contents of $MFT and $MFTMirr. */
- if (!load_and_init_mft_mirror(vol) || !check_mft_mirror(vol)) {
- static const char *es1 = "Failed to load $MFTMirr";
- static const char *es2 = "$MFTMirr does not match $MFT";
- static const char *es3 = ". Run ntfsfix and/or chkdsk.";
-
- /* If a read-write mount, convert it to a read-only mount. */
- if (!sb_rdonly(sb)) {
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors="
- "continue nor on_errors="
- "remount-ro was specified%s",
- !vol->mftmirr_ino ? es1 : es2,
- es3);
- goto iput_mirr_err_out;
- }
- sb->s_flags |= SB_RDONLY;
- ntfs_error(sb, "%s. Mounting read-only%s",
- !vol->mftmirr_ino ? es1 : es2, es3);
- } else
- ntfs_warning(sb, "%s. Will not be able to remount "
- "read-write%s",
- !vol->mftmirr_ino ? es1 : es2, es3);
- /* This will prevent a read-write remount. */
- NVolSetErrors(vol);
- }
-#endif /* NTFS_RW */
- /* Get mft bitmap attribute inode. */
- vol->mftbmp_ino = ntfs_attr_iget(vol->mft_ino, AT_BITMAP, NULL, 0);
- if (IS_ERR(vol->mftbmp_ino)) {
- ntfs_error(sb, "Failed to load $MFT/$BITMAP attribute.");
- goto iput_mirr_err_out;
- }
- lockdep_set_class(&NTFS_I(vol->mftbmp_ino)->runlist.lock,
- &mftbmp_runlist_lock_key);
- lockdep_set_class(&NTFS_I(vol->mftbmp_ino)->mrec_lock,
- &mftbmp_mrec_lock_key);
- /* Read upcase table and setup @vol->upcase and @vol->upcase_len. */
- if (!load_and_init_upcase(vol))
- goto iput_mftbmp_err_out;
-#ifdef NTFS_RW
- /*
- * Read attribute definitions table and setup @vol->attrdef and
- * @vol->attrdef_size.
- */
- if (!load_and_init_attrdef(vol))
- goto iput_upcase_err_out;
-#endif /* NTFS_RW */
- /*
- * Get the cluster allocation bitmap inode and verify the size, no
- * need for any locking at this stage as we are already running
- * exclusively as we are mount in progress task.
- */
- vol->lcnbmp_ino = ntfs_iget(sb, FILE_Bitmap);
- if (IS_ERR(vol->lcnbmp_ino) || is_bad_inode(vol->lcnbmp_ino)) {
- if (!IS_ERR(vol->lcnbmp_ino))
- iput(vol->lcnbmp_ino);
- goto bitmap_failed;
- }
- lockdep_set_class(&NTFS_I(vol->lcnbmp_ino)->runlist.lock,
- &lcnbmp_runlist_lock_key);
- lockdep_set_class(&NTFS_I(vol->lcnbmp_ino)->mrec_lock,
- &lcnbmp_mrec_lock_key);
-
- NInoSetSparseDisabled(NTFS_I(vol->lcnbmp_ino));
- if ((vol->nr_clusters + 7) >> 3 > i_size_read(vol->lcnbmp_ino)) {
- iput(vol->lcnbmp_ino);
-bitmap_failed:
- ntfs_error(sb, "Failed to load $Bitmap.");
- goto iput_attrdef_err_out;
- }
- /*
- * Get the volume inode and setup our cache of the volume flags and
- * version.
- */
- vol->vol_ino = ntfs_iget(sb, FILE_Volume);
- if (IS_ERR(vol->vol_ino) || is_bad_inode(vol->vol_ino)) {
- if (!IS_ERR(vol->vol_ino))
- iput(vol->vol_ino);
-volume_failed:
- ntfs_error(sb, "Failed to load $Volume.");
- goto iput_lcnbmp_err_out;
- }
- m = map_mft_record(NTFS_I(vol->vol_ino));
- if (IS_ERR(m)) {
-iput_volume_failed:
- iput(vol->vol_ino);
- goto volume_failed;
- }
- if (!(ctx = ntfs_attr_get_search_ctx(NTFS_I(vol->vol_ino), m))) {
- ntfs_error(sb, "Failed to get attribute search context.");
- goto get_ctx_vol_failed;
- }
- if (ntfs_attr_lookup(AT_VOLUME_INFORMATION, NULL, 0, 0, 0, NULL, 0,
- ctx) || ctx->attr->non_resident || ctx->attr->flags) {
-err_put_vol:
- ntfs_attr_put_search_ctx(ctx);
-get_ctx_vol_failed:
- unmap_mft_record(NTFS_I(vol->vol_ino));
- goto iput_volume_failed;
- }
- vi = (VOLUME_INFORMATION*)((char*)ctx->attr +
- le16_to_cpu(ctx->attr->data.resident.value_offset));
- /* Some bounds checks. */
- if ((u8*)vi < (u8*)ctx->attr || (u8*)vi +
- le32_to_cpu(ctx->attr->data.resident.value_length) >
- (u8*)ctx->attr + le32_to_cpu(ctx->attr->length))
- goto err_put_vol;
- /* Copy the volume flags and version to the ntfs_volume structure. */
- vol->vol_flags = vi->flags;
- vol->major_ver = vi->major_ver;
- vol->minor_ver = vi->minor_ver;
- ntfs_attr_put_search_ctx(ctx);
- unmap_mft_record(NTFS_I(vol->vol_ino));
- pr_info("volume version %i.%i.\n", vol->major_ver,
- vol->minor_ver);
- if (vol->major_ver < 3 && NVolSparseEnabled(vol)) {
- ntfs_warning(vol->sb, "Disabling sparse support due to NTFS "
- "volume version %i.%i (need at least version "
- "3.0).", vol->major_ver, vol->minor_ver);
- NVolClearSparseEnabled(vol);
- }
-#ifdef NTFS_RW
- /* Make sure that no unsupported volume flags are set. */
- if (vol->vol_flags & VOLUME_MUST_MOUNT_RO_MASK) {
- static const char *es1a = "Volume is dirty";
- static const char *es1b = "Volume has been modified by chkdsk";
- static const char *es1c = "Volume has unsupported flags set";
- static const char *es2a = ". Run chkdsk and mount in Windows.";
- static const char *es2b = ". Mount in Windows.";
- const char *es1, *es2;
-
- es2 = es2a;
- if (vol->vol_flags & VOLUME_IS_DIRTY)
- es1 = es1a;
- else if (vol->vol_flags & VOLUME_MODIFIED_BY_CHKDSK) {
- es1 = es1b;
- es2 = es2b;
- } else {
- es1 = es1c;
- ntfs_warning(sb, "Unsupported volume flags 0x%x "
- "encountered.",
- (unsigned)le16_to_cpu(vol->vol_flags));
- }
- /* If a read-write mount, convert it to a read-only mount. */
- if (!sb_rdonly(sb)) {
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors="
- "continue nor on_errors="
- "remount-ro was specified%s",
- es1, es2);
- goto iput_vol_err_out;
- }
- sb->s_flags |= SB_RDONLY;
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- } else
- ntfs_warning(sb, "%s. Will not be able to remount "
- "read-write%s", es1, es2);
- /*
- * Do not set NVolErrors() because ntfs_remount() re-checks the
- * flags which we need to do in case any flags have changed.
- */
- }
- /*
- * Get the inode for the logfile, check it and determine if the volume
- * was shutdown cleanly.
- */
- rp = NULL;
- if (!load_and_check_logfile(vol, &rp) ||
- !ntfs_is_logfile_clean(vol->logfile_ino, rp)) {
- static const char *es1a = "Failed to load $LogFile";
- static const char *es1b = "$LogFile is not clean";
- static const char *es2 = ". Mount in Windows.";
- const char *es1;
-
- es1 = !vol->logfile_ino ? es1a : es1b;
- /* If a read-write mount, convert it to a read-only mount. */
- if (!sb_rdonly(sb)) {
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors="
- "continue nor on_errors="
- "remount-ro was specified%s",
- es1, es2);
- if (vol->logfile_ino) {
- BUG_ON(!rp);
- ntfs_free(rp);
- }
- goto iput_logfile_err_out;
- }
- sb->s_flags |= SB_RDONLY;
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- } else
- ntfs_warning(sb, "%s. Will not be able to remount "
- "read-write%s", es1, es2);
- /* This will prevent a read-write remount. */
- NVolSetErrors(vol);
- }
- ntfs_free(rp);
-#endif /* NTFS_RW */
- /* Get the root directory inode so we can do path lookups. */
- vol->root_ino = ntfs_iget(sb, FILE_root);
- if (IS_ERR(vol->root_ino) || is_bad_inode(vol->root_ino)) {
- if (!IS_ERR(vol->root_ino))
- iput(vol->root_ino);
- ntfs_error(sb, "Failed to load root directory.");
- goto iput_logfile_err_out;
- }
-#ifdef NTFS_RW
- /*
- * Check if Windows is suspended to disk on the target volume. If it
- * is hibernated, we must not write *anything* to the disk so set
- * NVolErrors() without setting the dirty volume flag and mount
- * read-only. This will prevent read-write remounting and it will also
- * prevent all writes.
- */
- err = check_windows_hibernation_status(vol);
- if (unlikely(err)) {
- static const char *es1a = "Failed to determine if Windows is "
- "hibernated";
- static const char *es1b = "Windows is hibernated";
- static const char *es2 = ". Run chkdsk.";
- const char *es1;
-
- es1 = err < 0 ? es1a : es1b;
- /* If a read-write mount, convert it to a read-only mount. */
- if (!sb_rdonly(sb)) {
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors="
- "continue nor on_errors="
- "remount-ro was specified%s",
- es1, es2);
- goto iput_root_err_out;
- }
- sb->s_flags |= SB_RDONLY;
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- } else
- ntfs_warning(sb, "%s. Will not be able to remount "
- "read-write%s", es1, es2);
- /* This will prevent a read-write remount. */
- NVolSetErrors(vol);
- }
- /* If (still) a read-write mount, mark the volume dirty. */
- if (!sb_rdonly(sb) && ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY)) {
- static const char *es1 = "Failed to set dirty bit in volume "
- "information flags";
- static const char *es2 = ". Run chkdsk.";
-
- /* Convert to a read-only mount. */
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors=continue nor "
- "on_errors=remount-ro was specified%s",
- es1, es2);
- goto iput_root_err_out;
- }
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- sb->s_flags |= SB_RDONLY;
- /*
- * Do not set NVolErrors() because ntfs_remount() might manage
- * to set the dirty flag in which case all would be well.
- */
- }
-#if 0
- // TODO: Enable this code once we start modifying anything that is
- // different between NTFS 1.2 and 3.x...
- /*
- * If (still) a read-write mount, set the NT4 compatibility flag on
- * newer NTFS version volumes.
- */
- if (!(sb->s_flags & SB_RDONLY) && (vol->major_ver > 1) &&
- ntfs_set_volume_flags(vol, VOLUME_MOUNTED_ON_NT4)) {
- static const char *es1 = "Failed to set NT4 compatibility flag";
- static const char *es2 = ". Run chkdsk.";
-
- /* Convert to a read-only mount. */
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors=continue nor "
- "on_errors=remount-ro was specified%s",
- es1, es2);
- goto iput_root_err_out;
- }
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- sb->s_flags |= SB_RDONLY;
- NVolSetErrors(vol);
- }
-#endif
- /* If (still) a read-write mount, empty the logfile. */
- if (!sb_rdonly(sb) && !ntfs_empty_logfile(vol->logfile_ino)) {
- static const char *es1 = "Failed to empty $LogFile";
- static const char *es2 = ". Mount in Windows.";
-
- /* Convert to a read-only mount. */
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors=continue nor "
- "on_errors=remount-ro was specified%s",
- es1, es2);
- goto iput_root_err_out;
- }
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- sb->s_flags |= SB_RDONLY;
- NVolSetErrors(vol);
- }
-#endif /* NTFS_RW */
- /* If on NTFS versions before 3.0, we are done. */
- if (unlikely(vol->major_ver < 3))
- return true;
- /* NTFS 3.0+ specific initialization. */
- /* Get the security descriptors inode. */
- vol->secure_ino = ntfs_iget(sb, FILE_Secure);
- if (IS_ERR(vol->secure_ino) || is_bad_inode(vol->secure_ino)) {
- if (!IS_ERR(vol->secure_ino))
- iput(vol->secure_ino);
- ntfs_error(sb, "Failed to load $Secure.");
- goto iput_root_err_out;
- }
- // TODO: Initialize security.
- /* Get the extended system files' directory inode. */
- vol->extend_ino = ntfs_iget(sb, FILE_Extend);
- if (IS_ERR(vol->extend_ino) || is_bad_inode(vol->extend_ino) ||
- !S_ISDIR(vol->extend_ino->i_mode)) {
- if (!IS_ERR(vol->extend_ino))
- iput(vol->extend_ino);
- ntfs_error(sb, "Failed to load $Extend.");
- goto iput_sec_err_out;
- }
-#ifdef NTFS_RW
- /* Find the quota file, load it if present, and set it up. */
- if (!load_and_init_quota(vol)) {
- static const char *es1 = "Failed to load $Quota";
- static const char *es2 = ". Run chkdsk.";
-
- /* If a read-write mount, convert it to a read-only mount. */
- if (!sb_rdonly(sb)) {
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors="
- "continue nor on_errors="
- "remount-ro was specified%s",
- es1, es2);
- goto iput_quota_err_out;
- }
- sb->s_flags |= SB_RDONLY;
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- } else
- ntfs_warning(sb, "%s. Will not be able to remount "
- "read-write%s", es1, es2);
- /* This will prevent a read-write remount. */
- NVolSetErrors(vol);
- }
- /* If (still) a read-write mount, mark the quotas out of date. */
- if (!sb_rdonly(sb) && !ntfs_mark_quotas_out_of_date(vol)) {
- static const char *es1 = "Failed to mark quotas out of date";
- static const char *es2 = ". Run chkdsk.";
-
- /* Convert to a read-only mount. */
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors=continue nor "
- "on_errors=remount-ro was specified%s",
- es1, es2);
- goto iput_quota_err_out;
- }
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- sb->s_flags |= SB_RDONLY;
- NVolSetErrors(vol);
- }
- /*
- * Find the transaction log file ($UsnJrnl), load it if present, check
- * it, and set it up.
- */
- if (!load_and_init_usnjrnl(vol)) {
- static const char *es1 = "Failed to load $UsnJrnl";
- static const char *es2 = ". Run chkdsk.";
-
- /* If a read-write mount, convert it to a read-only mount. */
- if (!sb_rdonly(sb)) {
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors="
- "continue nor on_errors="
- "remount-ro was specified%s",
- es1, es2);
- goto iput_usnjrnl_err_out;
- }
- sb->s_flags |= SB_RDONLY;
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- } else
- ntfs_warning(sb, "%s. Will not be able to remount "
- "read-write%s", es1, es2);
- /* This will prevent a read-write remount. */
- NVolSetErrors(vol);
- }
- /* If (still) a read-write mount, stamp the transaction log. */
- if (!sb_rdonly(sb) && !ntfs_stamp_usnjrnl(vol)) {
- static const char *es1 = "Failed to stamp transaction log "
- "($UsnJrnl)";
- static const char *es2 = ". Run chkdsk.";
-
- /* Convert to a read-only mount. */
- if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
- ON_ERRORS_CONTINUE))) {
- ntfs_error(sb, "%s and neither on_errors=continue nor "
- "on_errors=remount-ro was specified%s",
- es1, es2);
- goto iput_usnjrnl_err_out;
- }
- ntfs_error(sb, "%s. Mounting read-only%s", es1, es2);
- sb->s_flags |= SB_RDONLY;
- NVolSetErrors(vol);
- }
-#endif /* NTFS_RW */
- return true;
-#ifdef NTFS_RW
-iput_usnjrnl_err_out:
- iput(vol->usnjrnl_j_ino);
- iput(vol->usnjrnl_max_ino);
- iput(vol->usnjrnl_ino);
-iput_quota_err_out:
- iput(vol->quota_q_ino);
- iput(vol->quota_ino);
- iput(vol->extend_ino);
-#endif /* NTFS_RW */
-iput_sec_err_out:
- iput(vol->secure_ino);
-iput_root_err_out:
- iput(vol->root_ino);
-iput_logfile_err_out:
-#ifdef NTFS_RW
- iput(vol->logfile_ino);
-iput_vol_err_out:
-#endif /* NTFS_RW */
- iput(vol->vol_ino);
-iput_lcnbmp_err_out:
- iput(vol->lcnbmp_ino);
-iput_attrdef_err_out:
- vol->attrdef_size = 0;
- if (vol->attrdef) {
- ntfs_free(vol->attrdef);
- vol->attrdef = NULL;
- }
-#ifdef NTFS_RW
-iput_upcase_err_out:
-#endif /* NTFS_RW */
- vol->upcase_len = 0;
- mutex_lock(&ntfs_lock);
- if (vol->upcase == default_upcase) {
- ntfs_nr_upcase_users--;
- vol->upcase = NULL;
- }
- mutex_unlock(&ntfs_lock);
- if (vol->upcase) {
- ntfs_free(vol->upcase);
- vol->upcase = NULL;
- }
-iput_mftbmp_err_out:
- iput(vol->mftbmp_ino);
-iput_mirr_err_out:
-#ifdef NTFS_RW
- iput(vol->mftmirr_ino);
-#endif /* NTFS_RW */
- return false;
-}
-
-/**
- * ntfs_put_super - called by the vfs to unmount a volume
- * @sb: vfs superblock of volume to unmount
- *
- * ntfs_put_super() is called by the VFS (from fs/super.c::do_umount()) when
- * the volume is being unmounted (umount system call has been invoked) and it
- * releases all inodes and memory belonging to the NTFS specific part of the
- * super block.
- */
-static void ntfs_put_super(struct super_block *sb)
-{
- ntfs_volume *vol = NTFS_SB(sb);
-
- ntfs_debug("Entering.");
-
-#ifdef NTFS_RW
- /*
- * Commit all inodes while they are still open in case some of them
- * cause others to be dirtied.
- */
- ntfs_commit_inode(vol->vol_ino);
-
- /* NTFS 3.0+ specific. */
- if (vol->major_ver >= 3) {
- if (vol->usnjrnl_j_ino)
- ntfs_commit_inode(vol->usnjrnl_j_ino);
- if (vol->usnjrnl_max_ino)
- ntfs_commit_inode(vol->usnjrnl_max_ino);
- if (vol->usnjrnl_ino)
- ntfs_commit_inode(vol->usnjrnl_ino);
- if (vol->quota_q_ino)
- ntfs_commit_inode(vol->quota_q_ino);
- if (vol->quota_ino)
- ntfs_commit_inode(vol->quota_ino);
- if (vol->extend_ino)
- ntfs_commit_inode(vol->extend_ino);
- if (vol->secure_ino)
- ntfs_commit_inode(vol->secure_ino);
- }
-
- ntfs_commit_inode(vol->root_ino);
-
- down_write(&vol->lcnbmp_lock);
- ntfs_commit_inode(vol->lcnbmp_ino);
- up_write(&vol->lcnbmp_lock);
-
- down_write(&vol->mftbmp_lock);
- ntfs_commit_inode(vol->mftbmp_ino);
- up_write(&vol->mftbmp_lock);
-
- if (vol->logfile_ino)
- ntfs_commit_inode(vol->logfile_ino);
-
- if (vol->mftmirr_ino)
- ntfs_commit_inode(vol->mftmirr_ino);
- ntfs_commit_inode(vol->mft_ino);
-
- /*
- * If a read-write mount and no volume errors have occurred, mark the
- * volume clean. Also, re-commit all affected inodes.
- */
- if (!sb_rdonly(sb)) {
- if (!NVolErrors(vol)) {
- if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY))
- ntfs_warning(sb, "Failed to clear dirty bit "
- "in volume information "
- "flags. Run chkdsk.");
- ntfs_commit_inode(vol->vol_ino);
- ntfs_commit_inode(vol->root_ino);
- if (vol->mftmirr_ino)
- ntfs_commit_inode(vol->mftmirr_ino);
- ntfs_commit_inode(vol->mft_ino);
- } else {
- ntfs_warning(sb, "Volume has errors. Leaving volume "
- "marked dirty. Run chkdsk.");
- }
- }
-#endif /* NTFS_RW */
-
- iput(vol->vol_ino);
- vol->vol_ino = NULL;
-
- /* NTFS 3.0+ specific clean up. */
- if (vol->major_ver >= 3) {
-#ifdef NTFS_RW
- if (vol->usnjrnl_j_ino) {
- iput(vol->usnjrnl_j_ino);
- vol->usnjrnl_j_ino = NULL;
- }
- if (vol->usnjrnl_max_ino) {
- iput(vol->usnjrnl_max_ino);
- vol->usnjrnl_max_ino = NULL;
- }
- if (vol->usnjrnl_ino) {
- iput(vol->usnjrnl_ino);
- vol->usnjrnl_ino = NULL;
- }
- if (vol->quota_q_ino) {
- iput(vol->quota_q_ino);
- vol->quota_q_ino = NULL;
- }
- if (vol->quota_ino) {
- iput(vol->quota_ino);
- vol->quota_ino = NULL;
- }
-#endif /* NTFS_RW */
- if (vol->extend_ino) {
- iput(vol->extend_ino);
- vol->extend_ino = NULL;
- }
- if (vol->secure_ino) {
- iput(vol->secure_ino);
- vol->secure_ino = NULL;
- }
- }
-
- iput(vol->root_ino);
- vol->root_ino = NULL;
-
- down_write(&vol->lcnbmp_lock);
- iput(vol->lcnbmp_ino);
- vol->lcnbmp_ino = NULL;
- up_write(&vol->lcnbmp_lock);
-
- down_write(&vol->mftbmp_lock);
- iput(vol->mftbmp_ino);
- vol->mftbmp_ino = NULL;
- up_write(&vol->mftbmp_lock);
-
-#ifdef NTFS_RW
- if (vol->logfile_ino) {
- iput(vol->logfile_ino);
- vol->logfile_ino = NULL;
- }
- if (vol->mftmirr_ino) {
- /* Re-commit the mft mirror and mft just in case. */
- ntfs_commit_inode(vol->mftmirr_ino);
- ntfs_commit_inode(vol->mft_ino);
- iput(vol->mftmirr_ino);
- vol->mftmirr_ino = NULL;
- }
- /*
- * We should have no dirty inodes left, due to
- * mft.c::ntfs_mft_writepage() cleaning all the dirty pages as
- * the underlying mft records are written out and cleaned.
- */
- ntfs_commit_inode(vol->mft_ino);
- write_inode_now(vol->mft_ino, 1);
-#endif /* NTFS_RW */
-
- iput(vol->mft_ino);
- vol->mft_ino = NULL;
-
- /* Throw away the table of attribute definitions. */
- vol->attrdef_size = 0;
- if (vol->attrdef) {
- ntfs_free(vol->attrdef);
- vol->attrdef = NULL;
- }
- vol->upcase_len = 0;
- /*
- * Destroy the global default upcase table if necessary. Also decrease
- * the number of upcase users if we are a user.
- */
- mutex_lock(&ntfs_lock);
- if (vol->upcase == default_upcase) {
- ntfs_nr_upcase_users--;
- vol->upcase = NULL;
- }
- if (!ntfs_nr_upcase_users && default_upcase) {
- ntfs_free(default_upcase);
- default_upcase = NULL;
- }
- if (vol->cluster_size <= 4096 && !--ntfs_nr_compression_users)
- free_compression_buffers();
- mutex_unlock(&ntfs_lock);
- if (vol->upcase) {
- ntfs_free(vol->upcase);
- vol->upcase = NULL;
- }
-
- unload_nls(vol->nls_map);
-
- sb->s_fs_info = NULL;
- kfree(vol);
-}
-
-/**
- * get_nr_free_clusters - return the number of free clusters on a volume
- * @vol: ntfs volume for which to obtain free cluster count
- *
- * Calculate the number of free clusters on the mounted NTFS volume @vol. We
- * actually calculate the number of clusters in use instead because this
- * allows us to not care about partial pages as these will be just zero filled
- * and hence not be counted as allocated clusters.
- *
- * The only particularity is that clusters beyond the end of the logical ntfs
- * volume will be marked as allocated to prevent errors which means we have to
- * discount those at the end. This is important as the cluster bitmap always
- * has a size in multiples of 8 bytes, i.e. up to 63 clusters could be outside
- * the logical volume and marked in use when they are not as they do not exist.
- *
- * If any pages cannot be read we assume all clusters in the erroring pages are
- * in use. This means we return an underestimate on errors which is better than
- * an overestimate.
- */
-static s64 get_nr_free_clusters(ntfs_volume *vol)
-{
- s64 nr_free = vol->nr_clusters;
- struct address_space *mapping = vol->lcnbmp_ino->i_mapping;
- struct page *page;
- pgoff_t index, max_index;
-
- ntfs_debug("Entering.");
- /* Serialize accesses to the cluster bitmap. */
- down_read(&vol->lcnbmp_lock);
- /*
- * Convert the number of bits into bytes rounded up, then convert into
- * multiples of PAGE_SIZE, rounding up so that if we have one
- * full and one partial page max_index = 2.
- */
- max_index = (((vol->nr_clusters + 7) >> 3) + PAGE_SIZE - 1) >>
- PAGE_SHIFT;
- /* Use multiples of 4 bytes, thus max_size is PAGE_SIZE / 4. */
- ntfs_debug("Reading $Bitmap, max_index = 0x%lx, max_size = 0x%lx.",
- max_index, PAGE_SIZE / 4);
- for (index = 0; index < max_index; index++) {
- unsigned long *kaddr;
-
- /*
- * Read the page from page cache, getting it from backing store
- * if necessary, and increment the use count.
- */
- page = read_mapping_page(mapping, index, NULL);
- /* Ignore pages which errored synchronously. */
- if (IS_ERR(page)) {
- ntfs_debug("read_mapping_page() error. Skipping "
- "page (index 0x%lx).", index);
- nr_free -= PAGE_SIZE * 8;
- continue;
- }
- kaddr = kmap_atomic(page);
- /*
- * Subtract the number of set bits. If this
- * is the last page and it is partial we don't really care as
- * it just means we do a little extra work but it won't affect
- * the result as all out of range bytes are set to zero by
- * ntfs_readpage().
- */
- nr_free -= bitmap_weight(kaddr,
- PAGE_SIZE * BITS_PER_BYTE);
- kunmap_atomic(kaddr);
- put_page(page);
- }
- ntfs_debug("Finished reading $Bitmap, last index = 0x%lx.", index - 1);
- /*
- * Fixup for eventual bits outside logical ntfs volume (see function
- * description above).
- */
- if (vol->nr_clusters & 63)
- nr_free += 64 - (vol->nr_clusters & 63);
- up_read(&vol->lcnbmp_lock);
- /* If errors occurred we may well have gone below zero, fix this. */
- if (nr_free < 0)
- nr_free = 0;
- ntfs_debug("Exiting.");
- return nr_free;
-}
-
-/**
- * __get_nr_free_mft_records - return the number of free inodes on a volume
- * @vol: ntfs volume for which to obtain free inode count
- * @nr_free: number of mft records in filesystem
- * @max_index: maximum number of pages containing set bits
- *
- * Calculate the number of free mft records (inodes) on the mounted NTFS
- * volume @vol. We actually calculate the number of mft records in use instead
- * because this allows us to not care about partial pages as these will be just
- * zero filled and hence not be counted as allocated mft record.
- *
- * If any pages cannot be read we assume all mft records in the erroring pages
- * are in use. This means we return an underestimate on errors which is better
- * than an overestimate.
- *
- * NOTE: Caller must hold mftbmp_lock rw_semaphore for reading or writing.
- */
-static unsigned long __get_nr_free_mft_records(ntfs_volume *vol,
- s64 nr_free, const pgoff_t max_index)
-{
- struct address_space *mapping = vol->mftbmp_ino->i_mapping;
- struct page *page;
- pgoff_t index;
-
- ntfs_debug("Entering.");
- /* Use multiples of 4 bytes, thus max_size is PAGE_SIZE / 4. */
- ntfs_debug("Reading $MFT/$BITMAP, max_index = 0x%lx, max_size = "
- "0x%lx.", max_index, PAGE_SIZE / 4);
- for (index = 0; index < max_index; index++) {
- unsigned long *kaddr;
-
- /*
- * Read the page from page cache, getting it from backing store
- * if necessary, and increment the use count.
- */
- page = read_mapping_page(mapping, index, NULL);
- /* Ignore pages which errored synchronously. */
- if (IS_ERR(page)) {
- ntfs_debug("read_mapping_page() error. Skipping "
- "page (index 0x%lx).", index);
- nr_free -= PAGE_SIZE * 8;
- continue;
- }
- kaddr = kmap_atomic(page);
- /*
- * Subtract the number of set bits. If this
- * is the last page and it is partial we don't really care as
- * it just means we do a little extra work but it won't affect
- * the result as all out of range bytes are set to zero by
- * ntfs_readpage().
- */
- nr_free -= bitmap_weight(kaddr,
- PAGE_SIZE * BITS_PER_BYTE);
- kunmap_atomic(kaddr);
- put_page(page);
- }
- ntfs_debug("Finished reading $MFT/$BITMAP, last index = 0x%lx.",
- index - 1);
- /* If errors occurred we may well have gone below zero, fix this. */
- if (nr_free < 0)
- nr_free = 0;
- ntfs_debug("Exiting.");
- return nr_free;
-}
-
-/**
- * ntfs_statfs - return information about mounted NTFS volume
- * @dentry: dentry from mounted volume
- * @sfs: statfs structure in which to return the information
- *
- * Return information about the mounted NTFS volume @dentry in the statfs structure
- * pointed to by @sfs (this is initialized with zeros before ntfs_statfs is
- * called). We interpret the values to be correct of the moment in time at
- * which we are called. Most values are variable otherwise and this isn't just
- * the free values but the totals as well. For example we can increase the
- * total number of file nodes if we run out and we can keep doing this until
- * there is no more space on the volume left at all.
- *
- * Called from vfs_statfs which is used to handle the statfs, fstatfs, and
- * ustat system calls.
- *
- * Return 0 on success or -errno on error.
- */
-static int ntfs_statfs(struct dentry *dentry, struct kstatfs *sfs)
-{
- struct super_block *sb = dentry->d_sb;
- s64 size;
- ntfs_volume *vol = NTFS_SB(sb);
- ntfs_inode *mft_ni = NTFS_I(vol->mft_ino);
- pgoff_t max_index;
- unsigned long flags;
-
- ntfs_debug("Entering.");
- /* Type of filesystem. */
- sfs->f_type = NTFS_SB_MAGIC;
- /* Optimal transfer block size. */
- sfs->f_bsize = PAGE_SIZE;
- /*
- * Total data blocks in filesystem in units of f_bsize and since
- * inodes are also stored in data blocs ($MFT is a file) this is just
- * the total clusters.
- */
- sfs->f_blocks = vol->nr_clusters << vol->cluster_size_bits >>
- PAGE_SHIFT;
- /* Free data blocks in filesystem in units of f_bsize. */
- size = get_nr_free_clusters(vol) << vol->cluster_size_bits >>
- PAGE_SHIFT;
- if (size < 0LL)
- size = 0LL;
- /* Free blocks avail to non-superuser, same as above on NTFS. */
- sfs->f_bavail = sfs->f_bfree = size;
- /* Serialize accesses to the inode bitmap. */
- down_read(&vol->mftbmp_lock);
- read_lock_irqsave(&mft_ni->size_lock, flags);
- size = i_size_read(vol->mft_ino) >> vol->mft_record_size_bits;
- /*
- * Convert the maximum number of set bits into bytes rounded up, then
- * convert into multiples of PAGE_SIZE, rounding up so that if we
- * have one full and one partial page max_index = 2.
- */
- max_index = ((((mft_ni->initialized_size >> vol->mft_record_size_bits)
- + 7) >> 3) + PAGE_SIZE - 1) >> PAGE_SHIFT;
- read_unlock_irqrestore(&mft_ni->size_lock, flags);
- /* Number of inodes in filesystem (at this point in time). */
- sfs->f_files = size;
- /* Free inodes in fs (based on current total count). */
- sfs->f_ffree = __get_nr_free_mft_records(vol, size, max_index);
- up_read(&vol->mftbmp_lock);
- /*
- * File system id. This is extremely *nix flavour dependent and even
- * within Linux itself all fs do their own thing. I interpret this to
- * mean a unique id associated with the mounted fs and not the id
- * associated with the filesystem driver, the latter is already given
- * by the filesystem type in sfs->f_type. Thus we use the 64-bit
- * volume serial number splitting it into two 32-bit parts. We enter
- * the least significant 32-bits in f_fsid[0] and the most significant
- * 32-bits in f_fsid[1].
- */
- sfs->f_fsid = u64_to_fsid(vol->serial_no);
- /* Maximum length of filenames. */
- sfs->f_namelen = NTFS_MAX_NAME_LEN;
- return 0;
-}
-
-#ifdef NTFS_RW
-static int ntfs_write_inode(struct inode *vi, struct writeback_control *wbc)
-{
- return __ntfs_write_inode(vi, wbc->sync_mode == WB_SYNC_ALL);
-}
-#endif
-
-/*
- * The complete super operations.
- */
-static const struct super_operations ntfs_sops = {
- .alloc_inode = ntfs_alloc_big_inode, /* VFS: Allocate new inode. */
- .free_inode = ntfs_free_big_inode, /* VFS: Deallocate inode. */
-#ifdef NTFS_RW
- .write_inode = ntfs_write_inode, /* VFS: Write dirty inode to
- disk. */
-#endif /* NTFS_RW */
- .put_super = ntfs_put_super, /* Syscall: umount. */
- .statfs = ntfs_statfs, /* Syscall: statfs */
- .remount_fs = ntfs_remount, /* Syscall: mount -o remount. */
- .evict_inode = ntfs_evict_big_inode, /* VFS: Called when an inode is
- removed from memory. */
- .show_options = ntfs_show_options, /* Show mount options in
- proc. */
-};
-
-/**
- * ntfs_fill_super - mount an ntfs filesystem
- * @sb: super block of ntfs filesystem to mount
- * @opt: string containing the mount options
- * @silent: silence error output
- *
- * ntfs_fill_super() is called by the VFS to mount the device described by @sb
- * with the mount otions in @data with the NTFS filesystem.
- *
- * If @silent is true, remain silent even if errors are detected. This is used
- * during bootup, when the kernel tries to mount the root filesystem with all
- * registered filesystems one after the other until one succeeds. This implies
- * that all filesystems except the correct one will quite correctly and
- * expectedly return an error, but nobody wants to see error messages when in
- * fact this is what is supposed to happen.
- *
- * NOTE: @sb->s_flags contains the mount options flags.
- */
-static int ntfs_fill_super(struct super_block *sb, void *opt, const int silent)
-{
- ntfs_volume *vol;
- struct buffer_head *bh;
- struct inode *tmp_ino;
- int blocksize, result;
-
- /*
- * We do a pretty difficult piece of bootstrap by reading the
- * MFT (and other metadata) from disk into memory. We'll only
- * release this metadata during umount, so the locking patterns
- * observed during bootstrap do not count. So turn off the
- * observation of locking patterns (strictly for this context
- * only) while mounting NTFS. [The validator is still active
- * otherwise, even for this context: it will for example record
- * lock class registrations.]
- */
- lockdep_off();
- ntfs_debug("Entering.");
-#ifndef NTFS_RW
- sb->s_flags |= SB_RDONLY;
-#endif /* ! NTFS_RW */
- /* Allocate a new ntfs_volume and place it in sb->s_fs_info. */
- sb->s_fs_info = kmalloc(sizeof(ntfs_volume), GFP_NOFS);
- vol = NTFS_SB(sb);
- if (!vol) {
- if (!silent)
- ntfs_error(sb, "Allocation of NTFS volume structure "
- "failed. Aborting mount...");
- lockdep_on();
- return -ENOMEM;
- }
- /* Initialize ntfs_volume structure. */
- *vol = (ntfs_volume) {
- .sb = sb,
- /*
- * Default is group and other don't have any access to files or
- * directories while owner has full access. Further, files by
- * default are not executable but directories are of course
- * browseable.
- */
- .fmask = 0177,
- .dmask = 0077,
- };
- init_rwsem(&vol->mftbmp_lock);
- init_rwsem(&vol->lcnbmp_lock);
-
- /* By default, enable sparse support. */
- NVolSetSparseEnabled(vol);
-
- /* Important to get the mount options dealt with now. */
- if (!parse_options(vol, (char*)opt))
- goto err_out_now;
-
- /* We support sector sizes up to the PAGE_SIZE. */
- if (bdev_logical_block_size(sb->s_bdev) > PAGE_SIZE) {
- if (!silent)
- ntfs_error(sb, "Device has unsupported sector size "
- "(%i). The maximum supported sector "
- "size on this architecture is %lu "
- "bytes.",
- bdev_logical_block_size(sb->s_bdev),
- PAGE_SIZE);
- goto err_out_now;
- }
- /*
- * Setup the device access block size to NTFS_BLOCK_SIZE or the hard
- * sector size, whichever is bigger.
- */
- blocksize = sb_min_blocksize(sb, NTFS_BLOCK_SIZE);
- if (blocksize < NTFS_BLOCK_SIZE) {
- if (!silent)
- ntfs_error(sb, "Unable to set device block size.");
- goto err_out_now;
- }
- BUG_ON(blocksize != sb->s_blocksize);
- ntfs_debug("Set device block size to %i bytes (block size bits %i).",
- blocksize, sb->s_blocksize_bits);
- /* Determine the size of the device in units of block_size bytes. */
- vol->nr_blocks = sb_bdev_nr_blocks(sb);
- if (!vol->nr_blocks) {
- if (!silent)
- ntfs_error(sb, "Unable to determine device size.");
- goto err_out_now;
- }
- /* Read the boot sector and return unlocked buffer head to it. */
- if (!(bh = read_ntfs_boot_sector(sb, silent))) {
- if (!silent)
- ntfs_error(sb, "Not an NTFS volume.");
- goto err_out_now;
- }
- /*
- * Extract the data from the boot sector and setup the ntfs volume
- * using it.
- */
- result = parse_ntfs_boot_sector(vol, (NTFS_BOOT_SECTOR*)bh->b_data);
- brelse(bh);
- if (!result) {
- if (!silent)
- ntfs_error(sb, "Unsupported NTFS filesystem.");
- goto err_out_now;
- }
- /*
- * If the boot sector indicates a sector size bigger than the current
- * device block size, switch the device block size to the sector size.
- * TODO: It may be possible to support this case even when the set
- * below fails, we would just be breaking up the i/o for each sector
- * into multiple blocks for i/o purposes but otherwise it should just
- * work. However it is safer to leave disabled until someone hits this
- * error message and then we can get them to try it without the setting
- * so we know for sure that it works.
- */
- if (vol->sector_size > blocksize) {
- blocksize = sb_set_blocksize(sb, vol->sector_size);
- if (blocksize != vol->sector_size) {
- if (!silent)
- ntfs_error(sb, "Unable to set device block "
- "size to sector size (%i).",
- vol->sector_size);
- goto err_out_now;
- }
- BUG_ON(blocksize != sb->s_blocksize);
- vol->nr_blocks = sb_bdev_nr_blocks(sb);
- ntfs_debug("Changed device block size to %i bytes (block size "
- "bits %i) to match volume sector size.",
- blocksize, sb->s_blocksize_bits);
- }
- /* Initialize the cluster and mft allocators. */
- ntfs_setup_allocators(vol);
- /* Setup remaining fields in the super block. */
- sb->s_magic = NTFS_SB_MAGIC;
- /*
- * Ntfs allows 63 bits for the file size, i.e. correct would be:
- * sb->s_maxbytes = ~0ULL >> 1;
- * But the kernel uses a long as the page cache page index which on
- * 32-bit architectures is only 32-bits. MAX_LFS_FILESIZE is kernel
- * defined to the maximum the page cache page index can cope with
- * without overflowing the index or to 2^63 - 1, whichever is smaller.
- */
- sb->s_maxbytes = MAX_LFS_FILESIZE;
- /* Ntfs measures time in 100ns intervals. */
- sb->s_time_gran = 100;
- /*
- * Now load the metadata required for the page cache and our address
- * space operations to function. We do this by setting up a specialised
- * read_inode method and then just calling the normal iget() to obtain
- * the inode for $MFT which is sufficient to allow our normal inode
- * operations and associated address space operations to function.
- */
- sb->s_op = &ntfs_sops;
- tmp_ino = new_inode(sb);
- if (!tmp_ino) {
- if (!silent)
- ntfs_error(sb, "Failed to load essential metadata.");
- goto err_out_now;
- }
- tmp_ino->i_ino = FILE_MFT;
- insert_inode_hash(tmp_ino);
- if (ntfs_read_inode_mount(tmp_ino) < 0) {
- if (!silent)
- ntfs_error(sb, "Failed to load essential metadata.");
- goto iput_tmp_ino_err_out_now;
- }
- mutex_lock(&ntfs_lock);
- /*
- * The current mount is a compression user if the cluster size is
- * less than or equal 4kiB.
- */
- if (vol->cluster_size <= 4096 && !ntfs_nr_compression_users++) {
- result = allocate_compression_buffers();
- if (result) {
- ntfs_error(NULL, "Failed to allocate buffers "
- "for compression engine.");
- ntfs_nr_compression_users--;
- mutex_unlock(&ntfs_lock);
- goto iput_tmp_ino_err_out_now;
- }
- }
- /*
- * Generate the global default upcase table if necessary. Also
- * temporarily increment the number of upcase users to avoid race
- * conditions with concurrent (u)mounts.
- */
- if (!default_upcase)
- default_upcase = generate_default_upcase();
- ntfs_nr_upcase_users++;
- mutex_unlock(&ntfs_lock);
- /*
- * From now on, ignore @silent parameter. If we fail below this line,
- * it will be due to a corrupt fs or a system error, so we report it.
- */
- /*
- * Open the system files with normal access functions and complete
- * setting up the ntfs super block.
- */
- if (!load_system_files(vol)) {
- ntfs_error(sb, "Failed to load system files.");
- goto unl_upcase_iput_tmp_ino_err_out_now;
- }
-
- /* We grab a reference, simulating an ntfs_iget(). */
- ihold(vol->root_ino);
- if ((sb->s_root = d_make_root(vol->root_ino))) {
- ntfs_debug("Exiting, status successful.");
- /* Release the default upcase if it has no users. */
- mutex_lock(&ntfs_lock);
- if (!--ntfs_nr_upcase_users && default_upcase) {
- ntfs_free(default_upcase);
- default_upcase = NULL;
- }
- mutex_unlock(&ntfs_lock);
- sb->s_export_op = &ntfs_export_ops;
- lockdep_on();
- return 0;
- }
- ntfs_error(sb, "Failed to allocate root directory.");
- /* Clean up after the successful load_system_files() call from above. */
- // TODO: Use ntfs_put_super() instead of repeating all this code...
- // FIXME: Should mark the volume clean as the error is most likely
- // -ENOMEM.
- iput(vol->vol_ino);
- vol->vol_ino = NULL;
- /* NTFS 3.0+ specific clean up. */
- if (vol->major_ver >= 3) {
-#ifdef NTFS_RW
- if (vol->usnjrnl_j_ino) {
- iput(vol->usnjrnl_j_ino);
- vol->usnjrnl_j_ino = NULL;
- }
- if (vol->usnjrnl_max_ino) {
- iput(vol->usnjrnl_max_ino);
- vol->usnjrnl_max_ino = NULL;
- }
- if (vol->usnjrnl_ino) {
- iput(vol->usnjrnl_ino);
- vol->usnjrnl_ino = NULL;
- }
- if (vol->quota_q_ino) {
- iput(vol->quota_q_ino);
- vol->quota_q_ino = NULL;
- }
- if (vol->quota_ino) {
- iput(vol->quota_ino);
- vol->quota_ino = NULL;
- }
-#endif /* NTFS_RW */
- if (vol->extend_ino) {
- iput(vol->extend_ino);
- vol->extend_ino = NULL;
- }
- if (vol->secure_ino) {
- iput(vol->secure_ino);
- vol->secure_ino = NULL;
- }
- }
- iput(vol->root_ino);
- vol->root_ino = NULL;
- iput(vol->lcnbmp_ino);
- vol->lcnbmp_ino = NULL;
- iput(vol->mftbmp_ino);
- vol->mftbmp_ino = NULL;
-#ifdef NTFS_RW
- if (vol->logfile_ino) {
- iput(vol->logfile_ino);
- vol->logfile_ino = NULL;
- }
- if (vol->mftmirr_ino) {
- iput(vol->mftmirr_ino);
- vol->mftmirr_ino = NULL;
- }
-#endif /* NTFS_RW */
- /* Throw away the table of attribute definitions. */
- vol->attrdef_size = 0;
- if (vol->attrdef) {
- ntfs_free(vol->attrdef);
- vol->attrdef = NULL;
- }
- vol->upcase_len = 0;
- mutex_lock(&ntfs_lock);
- if (vol->upcase == default_upcase) {
- ntfs_nr_upcase_users--;
- vol->upcase = NULL;
- }
- mutex_unlock(&ntfs_lock);
- if (vol->upcase) {
- ntfs_free(vol->upcase);
- vol->upcase = NULL;
- }
- if (vol->nls_map) {
- unload_nls(vol->nls_map);
- vol->nls_map = NULL;
- }
- /* Error exit code path. */
-unl_upcase_iput_tmp_ino_err_out_now:
- /*
- * Decrease the number of upcase users and destroy the global default
- * upcase table if necessary.
- */
- mutex_lock(&ntfs_lock);
- if (!--ntfs_nr_upcase_users && default_upcase) {
- ntfs_free(default_upcase);
- default_upcase = NULL;
- }
- if (vol->cluster_size <= 4096 && !--ntfs_nr_compression_users)
- free_compression_buffers();
- mutex_unlock(&ntfs_lock);
-iput_tmp_ino_err_out_now:
- iput(tmp_ino);
- if (vol->mft_ino && vol->mft_ino != tmp_ino)
- iput(vol->mft_ino);
- vol->mft_ino = NULL;
- /* Errors at this stage are irrelevant. */
-err_out_now:
- sb->s_fs_info = NULL;
- kfree(vol);
- ntfs_debug("Failed, returning -EINVAL.");
- lockdep_on();
- return -EINVAL;
-}
-
-/*
- * This is a slab cache to optimize allocations and deallocations of Unicode
- * strings of the maximum length allowed by NTFS, which is NTFS_MAX_NAME_LEN
- * (255) Unicode characters + a terminating NULL Unicode character.
- */
-struct kmem_cache *ntfs_name_cache;
-
-/* Slab caches for efficient allocation/deallocation of inodes. */
-struct kmem_cache *ntfs_inode_cache;
-struct kmem_cache *ntfs_big_inode_cache;
-
-/* Init once constructor for the inode slab cache. */
-static void ntfs_big_inode_init_once(void *foo)
-{
- ntfs_inode *ni = (ntfs_inode *)foo;
-
- inode_init_once(VFS_I(ni));
-}
-
-/*
- * Slab caches to optimize allocations and deallocations of attribute search
- * contexts and index contexts, respectively.
- */
-struct kmem_cache *ntfs_attr_ctx_cache;
-struct kmem_cache *ntfs_index_ctx_cache;
-
-/* Driver wide mutex. */
-DEFINE_MUTEX(ntfs_lock);
-
-static struct dentry *ntfs_mount(struct file_system_type *fs_type,
- int flags, const char *dev_name, void *data)
-{
- return mount_bdev(fs_type, flags, dev_name, data, ntfs_fill_super);
-}
-
-static struct file_system_type ntfs_fs_type = {
- .owner = THIS_MODULE,
- .name = "ntfs",
- .mount = ntfs_mount,
- .kill_sb = kill_block_super,
- .fs_flags = FS_REQUIRES_DEV,
-};
-MODULE_ALIAS_FS("ntfs");
-
-/* Stable names for the slab caches. */
-static const char ntfs_index_ctx_cache_name[] = "ntfs_index_ctx_cache";
-static const char ntfs_attr_ctx_cache_name[] = "ntfs_attr_ctx_cache";
-static const char ntfs_name_cache_name[] = "ntfs_name_cache";
-static const char ntfs_inode_cache_name[] = "ntfs_inode_cache";
-static const char ntfs_big_inode_cache_name[] = "ntfs_big_inode_cache";
-
-static int __init init_ntfs_fs(void)
-{
- int err = 0;
-
- /* This may be ugly but it results in pretty output so who cares. (-8 */
- pr_info("driver " NTFS_VERSION " [Flags: R/"
-#ifdef NTFS_RW
- "W"
-#else
- "O"
-#endif
-#ifdef DEBUG
- " DEBUG"
-#endif
-#ifdef MODULE
- " MODULE"
-#endif
- "].\n");
-
- ntfs_debug("Debug messages are enabled.");
-
- ntfs_index_ctx_cache = kmem_cache_create(ntfs_index_ctx_cache_name,
- sizeof(ntfs_index_context), 0 /* offset */,
- SLAB_HWCACHE_ALIGN, NULL /* ctor */);
- if (!ntfs_index_ctx_cache) {
- pr_crit("Failed to create %s!\n", ntfs_index_ctx_cache_name);
- goto ictx_err_out;
- }
- ntfs_attr_ctx_cache = kmem_cache_create(ntfs_attr_ctx_cache_name,
- sizeof(ntfs_attr_search_ctx), 0 /* offset */,
- SLAB_HWCACHE_ALIGN, NULL /* ctor */);
- if (!ntfs_attr_ctx_cache) {
- pr_crit("NTFS: Failed to create %s!\n",
- ntfs_attr_ctx_cache_name);
- goto actx_err_out;
- }
-
- ntfs_name_cache = kmem_cache_create(ntfs_name_cache_name,
- (NTFS_MAX_NAME_LEN+1) * sizeof(ntfschar), 0,
- SLAB_HWCACHE_ALIGN, NULL);
- if (!ntfs_name_cache) {
- pr_crit("Failed to create %s!\n", ntfs_name_cache_name);
- goto name_err_out;
- }
-
- ntfs_inode_cache = kmem_cache_create(ntfs_inode_cache_name,
- sizeof(ntfs_inode), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL);
- if (!ntfs_inode_cache) {
- pr_crit("Failed to create %s!\n", ntfs_inode_cache_name);
- goto inode_err_out;
- }
-
- ntfs_big_inode_cache = kmem_cache_create(ntfs_big_inode_cache_name,
- sizeof(big_ntfs_inode), 0,
- SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
- SLAB_ACCOUNT, ntfs_big_inode_init_once);
- if (!ntfs_big_inode_cache) {
- pr_crit("Failed to create %s!\n", ntfs_big_inode_cache_name);
- goto big_inode_err_out;
- }
-
- /* Register the ntfs sysctls. */
- err = ntfs_sysctl(1);
- if (err) {
- pr_crit("Failed to register NTFS sysctls!\n");
- goto sysctl_err_out;
- }
-
- err = register_filesystem(&ntfs_fs_type);
- if (!err) {
- ntfs_debug("NTFS driver registered successfully.");
- return 0; /* Success! */
- }
- pr_crit("Failed to register NTFS filesystem driver!\n");
-
- /* Unregister the ntfs sysctls. */
- ntfs_sysctl(0);
-sysctl_err_out:
- kmem_cache_destroy(ntfs_big_inode_cache);
-big_inode_err_out:
- kmem_cache_destroy(ntfs_inode_cache);
-inode_err_out:
- kmem_cache_destroy(ntfs_name_cache);
-name_err_out:
- kmem_cache_destroy(ntfs_attr_ctx_cache);
-actx_err_out:
- kmem_cache_destroy(ntfs_index_ctx_cache);
-ictx_err_out:
- if (!err) {
- pr_crit("Aborting NTFS filesystem driver registration...\n");
- err = -ENOMEM;
- }
- return err;
-}
-
-static void __exit exit_ntfs_fs(void)
-{
- ntfs_debug("Unregistering NTFS driver.");
-
- unregister_filesystem(&ntfs_fs_type);
-
- /*
- * Make sure all delayed rcu free inodes are flushed before we
- * destroy cache.
- */
- rcu_barrier();
- kmem_cache_destroy(ntfs_big_inode_cache);
- kmem_cache_destroy(ntfs_inode_cache);
- kmem_cache_destroy(ntfs_name_cache);
- kmem_cache_destroy(ntfs_attr_ctx_cache);
- kmem_cache_destroy(ntfs_index_ctx_cache);
- /* Unregister the ntfs sysctls. */
- ntfs_sysctl(0);
-}
-
-MODULE_AUTHOR("Anton Altaparmakov <anton@tuxera.com>");
-MODULE_DESCRIPTION("NTFS 1.2/3.x driver - Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.");
-MODULE_VERSION(NTFS_VERSION);
-MODULE_LICENSE("GPL");
-#ifdef DEBUG
-module_param(debug_msgs, bint, 0);
-MODULE_PARM_DESC(debug_msgs, "Enable debug messages.");
-#endif
-
-module_init(init_ntfs_fs)
-module_exit(exit_ntfs_fs)
diff --git a/fs/ntfs/sysctl.c b/fs/ntfs/sysctl.c
deleted file mode 100644
index 4e98017..0000000
--- a/fs/ntfs/sysctl.c
+++ /dev/null
@@ -1,58 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * sysctl.c - Code for sysctl handling in NTFS Linux kernel driver. Part of
- * the Linux-NTFS project. Adapted from the old NTFS driver,
- * Copyright (C) 1997 Martin von Löwis, Régis Duchesne
- *
- * Copyright (c) 2002-2005 Anton Altaparmakov
- */
-
-#ifdef DEBUG
-
-#include <linux/module.h>
-
-#ifdef CONFIG_SYSCTL
-
-#include <linux/proc_fs.h>
-#include <linux/sysctl.h>
-
-#include "sysctl.h"
-#include "debug.h"
-
-/* Definition of the ntfs sysctl. */
-static struct ctl_table ntfs_sysctls[] = {
- {
- .procname = "ntfs-debug",
- .data = &debug_msgs, /* Data pointer and size. */
- .maxlen = sizeof(debug_msgs),
- .mode = 0644, /* Mode, proc handler. */
- .proc_handler = proc_dointvec
- },
-};
-
-/* Storage for the sysctls header. */
-static struct ctl_table_header *sysctls_root_table;
-
-/**
- * ntfs_sysctl - add or remove the debug sysctl
- * @add: add (1) or remove (0) the sysctl
- *
- * Add or remove the debug sysctl. Return 0 on success or -errno on error.
- */
-int ntfs_sysctl(int add)
-{
- if (add) {
- BUG_ON(sysctls_root_table);
- sysctls_root_table = register_sysctl("fs", ntfs_sysctls);
- if (!sysctls_root_table)
- return -ENOMEM;
- } else {
- BUG_ON(!sysctls_root_table);
- unregister_sysctl_table(sysctls_root_table);
- sysctls_root_table = NULL;
- }
- return 0;
-}
-
-#endif /* CONFIG_SYSCTL */
-#endif /* DEBUG */
diff --git a/fs/ntfs/sysctl.h b/fs/ntfs/sysctl.h
deleted file mode 100644
index 96bb229..0000000
--- a/fs/ntfs/sysctl.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * sysctl.h - Defines for sysctl handling in NTFS Linux kernel driver. Part of
- * the Linux-NTFS project. Adapted from the old NTFS driver,
- * Copyright (C) 1997 Martin von Löwis, Régis Duchesne
- *
- * Copyright (c) 2002-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_SYSCTL_H
-#define _LINUX_NTFS_SYSCTL_H
-
-
-#if defined(DEBUG) && defined(CONFIG_SYSCTL)
-
-extern int ntfs_sysctl(int add);
-
-#else
-
-/* Just return success. */
-static inline int ntfs_sysctl(int add)
-{
- return 0;
-}
-
-#endif /* DEBUG && CONFIG_SYSCTL */
-#endif /* _LINUX_NTFS_SYSCTL_H */
diff --git a/fs/ntfs/time.h b/fs/ntfs/time.h
deleted file mode 100644
index 6b63261..0000000
--- a/fs/ntfs/time.h
+++ /dev/null
@@ -1,89 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * time.h - NTFS time conversion functions. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_TIME_H
-#define _LINUX_NTFS_TIME_H
-
-#include <linux/time.h> /* For current_kernel_time(). */
-#include <asm/div64.h> /* For do_div(). */
-
-#include "endian.h"
-
-#define NTFS_TIME_OFFSET ((s64)(369 * 365 + 89) * 24 * 3600 * 10000000)
-
-/**
- * utc2ntfs - convert Linux UTC time to NTFS time
- * @ts: Linux UTC time to convert to NTFS time
- *
- * Convert the Linux UTC time @ts to its corresponding NTFS time and return
- * that in little endian format.
- *
- * Linux stores time in a struct timespec64 consisting of a time64_t tv_sec
- * and a long tv_nsec where tv_sec is the number of 1-second intervals since
- * 1st January 1970, 00:00:00 UTC and tv_nsec is the number of 1-nano-second
- * intervals since the value of tv_sec.
- *
- * NTFS uses Microsoft's standard time format which is stored in a s64 and is
- * measured as the number of 100-nano-second intervals since 1st January 1601,
- * 00:00:00 UTC.
- */
-static inline sle64 utc2ntfs(const struct timespec64 ts)
-{
- /*
- * Convert the seconds to 100ns intervals, add the nano-seconds
- * converted to 100ns intervals, and then add the NTFS time offset.
- */
- return cpu_to_sle64((s64)ts.tv_sec * 10000000 + ts.tv_nsec / 100 +
- NTFS_TIME_OFFSET);
-}
-
-/**
- * get_current_ntfs_time - get the current time in little endian NTFS format
- *
- * Get the current time from the Linux kernel, convert it to its corresponding
- * NTFS time and return that in little endian format.
- */
-static inline sle64 get_current_ntfs_time(void)
-{
- struct timespec64 ts;
-
- ktime_get_coarse_real_ts64(&ts);
- return utc2ntfs(ts);
-}
-
-/**
- * ntfs2utc - convert NTFS time to Linux time
- * @time: NTFS time (little endian) to convert to Linux UTC
- *
- * Convert the little endian NTFS time @time to its corresponding Linux UTC
- * time and return that in cpu format.
- *
- * Linux stores time in a struct timespec64 consisting of a time64_t tv_sec
- * and a long tv_nsec where tv_sec is the number of 1-second intervals since
- * 1st January 1970, 00:00:00 UTC and tv_nsec is the number of 1-nano-second
- * intervals since the value of tv_sec.
- *
- * NTFS uses Microsoft's standard time format which is stored in a s64 and is
- * measured as the number of 100 nano-second intervals since 1st January 1601,
- * 00:00:00 UTC.
- */
-static inline struct timespec64 ntfs2utc(const sle64 time)
-{
- struct timespec64 ts;
-
- /* Subtract the NTFS time offset. */
- u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
- /*
- * Convert the time to 1-second intervals and the remainder to
- * 1-nano-second intervals.
- */
- ts.tv_nsec = do_div(t, 10000000) * 100;
- ts.tv_sec = t;
- return ts;
-}
-
-#endif /* _LINUX_NTFS_TIME_H */
diff --git a/fs/ntfs/types.h b/fs/ntfs/types.h
deleted file mode 100644
index 9a47859..0000000
--- a/fs/ntfs/types.h
+++ /dev/null
@@ -1,55 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * types.h - Defines for NTFS Linux kernel driver specific types.
- * Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_TYPES_H
-#define _LINUX_NTFS_TYPES_H
-
-#include <linux/types.h>
-
-typedef __le16 le16;
-typedef __le32 le32;
-typedef __le64 le64;
-typedef __u16 __bitwise sle16;
-typedef __u32 __bitwise sle32;
-typedef __u64 __bitwise sle64;
-
-/* 2-byte Unicode character type. */
-typedef le16 ntfschar;
-#define UCHAR_T_SIZE_BITS 1
-
-/*
- * Clusters are signed 64-bit values on NTFS volumes. We define two types, LCN
- * and VCN, to allow for type checking and better code readability.
- */
-typedef s64 VCN;
-typedef sle64 leVCN;
-typedef s64 LCN;
-typedef sle64 leLCN;
-
-/*
- * The NTFS journal $LogFile uses log sequence numbers which are signed 64-bit
- * values. We define our own type LSN, to allow for type checking and better
- * code readability.
- */
-typedef s64 LSN;
-typedef sle64 leLSN;
-
-/*
- * The NTFS transaction log $UsnJrnl uses usn which are signed 64-bit values.
- * We define our own type USN, to allow for type checking and better code
- * readability.
- */
-typedef s64 USN;
-typedef sle64 leUSN;
-
-typedef enum {
- CASE_SENSITIVE = 0,
- IGNORE_CASE = 1,
-} IGNORE_CASE_BOOL;
-
-#endif /* _LINUX_NTFS_TYPES_H */
diff --git a/fs/ntfs/unistr.c b/fs/ntfs/unistr.c
deleted file mode 100644
index a6b6c64..0000000
--- a/fs/ntfs/unistr.c
+++ /dev/null
@@ -1,384 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * unistr.c - NTFS Unicode string handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2006 Anton Altaparmakov
- */
-
-#include <linux/slab.h>
-
-#include "types.h"
-#include "debug.h"
-#include "ntfs.h"
-
-/*
- * IMPORTANT
- * =========
- *
- * All these routines assume that the Unicode characters are in little endian
- * encoding inside the strings!!!
- */
-
-/*
- * This is used by the name collation functions to quickly determine what
- * characters are (in)valid.
- */
-static const u8 legal_ansi_char_array[0x40] = {
- 0x00, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
- 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
-
- 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
- 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
-
- 0x17, 0x07, 0x18, 0x17, 0x17, 0x17, 0x17, 0x17,
- 0x17, 0x17, 0x18, 0x16, 0x16, 0x17, 0x07, 0x00,
-
- 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17,
- 0x17, 0x17, 0x04, 0x16, 0x18, 0x16, 0x18, 0x18,
-};
-
-/**
- * ntfs_are_names_equal - compare two Unicode names for equality
- * @s1: name to compare to @s2
- * @s1_len: length in Unicode characters of @s1
- * @s2: name to compare to @s1
- * @s2_len: length in Unicode characters of @s2
- * @ic: ignore case bool
- * @upcase: upcase table (only if @ic == IGNORE_CASE)
- * @upcase_size: length in Unicode characters of @upcase (if present)
- *
- * Compare the names @s1 and @s2 and return 'true' (1) if the names are
- * identical, or 'false' (0) if they are not identical. If @ic is IGNORE_CASE,
- * the @upcase table is used to performa a case insensitive comparison.
- */
-bool ntfs_are_names_equal(const ntfschar *s1, size_t s1_len,
- const ntfschar *s2, size_t s2_len, const IGNORE_CASE_BOOL ic,
- const ntfschar *upcase, const u32 upcase_size)
-{
- if (s1_len != s2_len)
- return false;
- if (ic == CASE_SENSITIVE)
- return !ntfs_ucsncmp(s1, s2, s1_len);
- return !ntfs_ucsncasecmp(s1, s2, s1_len, upcase, upcase_size);
-}
-
-/**
- * ntfs_collate_names - collate two Unicode names
- * @name1: first Unicode name to compare
- * @name2: second Unicode name to compare
- * @err_val: if @name1 contains an invalid character return this value
- * @ic: either CASE_SENSITIVE or IGNORE_CASE
- * @upcase: upcase table (ignored if @ic is CASE_SENSITIVE)
- * @upcase_len: upcase table size (ignored if @ic is CASE_SENSITIVE)
- *
- * ntfs_collate_names collates two Unicode names and returns:
- *
- * -1 if the first name collates before the second one,
- * 0 if the names match,
- * 1 if the second name collates before the first one, or
- * @err_val if an invalid character is found in @name1 during the comparison.
- *
- * The following characters are considered invalid: '"', '*', '<', '>' and '?'.
- */
-int ntfs_collate_names(const ntfschar *name1, const u32 name1_len,
- const ntfschar *name2, const u32 name2_len,
- const int err_val, const IGNORE_CASE_BOOL ic,
- const ntfschar *upcase, const u32 upcase_len)
-{
- u32 cnt, min_len;
- u16 c1, c2;
-
- min_len = name1_len;
- if (name1_len > name2_len)
- min_len = name2_len;
- for (cnt = 0; cnt < min_len; ++cnt) {
- c1 = le16_to_cpu(*name1++);
- c2 = le16_to_cpu(*name2++);
- if (ic) {
- if (c1 < upcase_len)
- c1 = le16_to_cpu(upcase[c1]);
- if (c2 < upcase_len)
- c2 = le16_to_cpu(upcase[c2]);
- }
- if (c1 < 64 && legal_ansi_char_array[c1] & 8)
- return err_val;
- if (c1 < c2)
- return -1;
- if (c1 > c2)
- return 1;
- }
- if (name1_len < name2_len)
- return -1;
- if (name1_len == name2_len)
- return 0;
- /* name1_len > name2_len */
- c1 = le16_to_cpu(*name1);
- if (c1 < 64 && legal_ansi_char_array[c1] & 8)
- return err_val;
- return 1;
-}
-
-/**
- * ntfs_ucsncmp - compare two little endian Unicode strings
- * @s1: first string
- * @s2: second string
- * @n: maximum unicode characters to compare
- *
- * Compare the first @n characters of the Unicode strings @s1 and @s2,
- * The strings in little endian format and appropriate le16_to_cpu()
- * conversion is performed on non-little endian machines.
- *
- * The function returns an integer less than, equal to, or greater than zero
- * if @s1 (or the first @n Unicode characters thereof) is found, respectively,
- * to be less than, to match, or be greater than @s2.
- */
-int ntfs_ucsncmp(const ntfschar *s1, const ntfschar *s2, size_t n)
-{
- u16 c1, c2;
- size_t i;
-
- for (i = 0; i < n; ++i) {
- c1 = le16_to_cpu(s1[i]);
- c2 = le16_to_cpu(s2[i]);
- if (c1 < c2)
- return -1;
- if (c1 > c2)
- return 1;
- if (!c1)
- break;
- }
- return 0;
-}
-
-/**
- * ntfs_ucsncasecmp - compare two little endian Unicode strings, ignoring case
- * @s1: first string
- * @s2: second string
- * @n: maximum unicode characters to compare
- * @upcase: upcase table
- * @upcase_size: upcase table size in Unicode characters
- *
- * Compare the first @n characters of the Unicode strings @s1 and @s2,
- * ignoring case. The strings in little endian format and appropriate
- * le16_to_cpu() conversion is performed on non-little endian machines.
- *
- * Each character is uppercased using the @upcase table before the comparison.
- *
- * The function returns an integer less than, equal to, or greater than zero
- * if @s1 (or the first @n Unicode characters thereof) is found, respectively,
- * to be less than, to match, or be greater than @s2.
- */
-int ntfs_ucsncasecmp(const ntfschar *s1, const ntfschar *s2, size_t n,
- const ntfschar *upcase, const u32 upcase_size)
-{
- size_t i;
- u16 c1, c2;
-
- for (i = 0; i < n; ++i) {
- if ((c1 = le16_to_cpu(s1[i])) < upcase_size)
- c1 = le16_to_cpu(upcase[c1]);
- if ((c2 = le16_to_cpu(s2[i])) < upcase_size)
- c2 = le16_to_cpu(upcase[c2]);
- if (c1 < c2)
- return -1;
- if (c1 > c2)
- return 1;
- if (!c1)
- break;
- }
- return 0;
-}
-
-void ntfs_upcase_name(ntfschar *name, u32 name_len, const ntfschar *upcase,
- const u32 upcase_len)
-{
- u32 i;
- u16 u;
-
- for (i = 0; i < name_len; i++)
- if ((u = le16_to_cpu(name[i])) < upcase_len)
- name[i] = upcase[u];
-}
-
-void ntfs_file_upcase_value(FILE_NAME_ATTR *file_name_attr,
- const ntfschar *upcase, const u32 upcase_len)
-{
- ntfs_upcase_name((ntfschar*)&file_name_attr->file_name,
- file_name_attr->file_name_length, upcase, upcase_len);
-}
-
-int ntfs_file_compare_values(FILE_NAME_ATTR *file_name_attr1,
- FILE_NAME_ATTR *file_name_attr2,
- const int err_val, const IGNORE_CASE_BOOL ic,
- const ntfschar *upcase, const u32 upcase_len)
-{
- return ntfs_collate_names((ntfschar*)&file_name_attr1->file_name,
- file_name_attr1->file_name_length,
- (ntfschar*)&file_name_attr2->file_name,
- file_name_attr2->file_name_length,
- err_val, ic, upcase, upcase_len);
-}
-
-/**
- * ntfs_nlstoucs - convert NLS string to little endian Unicode string
- * @vol: ntfs volume which we are working with
- * @ins: input NLS string buffer
- * @ins_len: length of input string in bytes
- * @outs: on return contains the allocated output Unicode string buffer
- *
- * Convert the input string @ins, which is in whatever format the loaded NLS
- * map dictates, into a little endian, 2-byte Unicode string.
- *
- * This function allocates the string and the caller is responsible for
- * calling kmem_cache_free(ntfs_name_cache, *@outs); when finished with it.
- *
- * On success the function returns the number of Unicode characters written to
- * the output string *@outs (>= 0), not counting the terminating Unicode NULL
- * character. *@outs is set to the allocated output string buffer.
- *
- * On error, a negative number corresponding to the error code is returned. In
- * that case the output string is not allocated. Both *@outs and *@outs_len
- * are then undefined.
- *
- * This might look a bit odd due to fast path optimization...
- */
-int ntfs_nlstoucs(const ntfs_volume *vol, const char *ins,
- const int ins_len, ntfschar **outs)
-{
- struct nls_table *nls = vol->nls_map;
- ntfschar *ucs;
- wchar_t wc;
- int i, o, wc_len;
-
- /* We do not trust outside sources. */
- if (likely(ins)) {
- ucs = kmem_cache_alloc(ntfs_name_cache, GFP_NOFS);
- if (likely(ucs)) {
- for (i = o = 0; i < ins_len; i += wc_len) {
- wc_len = nls->char2uni(ins + i, ins_len - i,
- &wc);
- if (likely(wc_len >= 0 &&
- o < NTFS_MAX_NAME_LEN)) {
- if (likely(wc)) {
- ucs[o++] = cpu_to_le16(wc);
- continue;
- } /* else if (!wc) */
- break;
- } /* else if (wc_len < 0 ||
- o >= NTFS_MAX_NAME_LEN) */
- goto name_err;
- }
- ucs[o] = 0;
- *outs = ucs;
- return o;
- } /* else if (!ucs) */
- ntfs_error(vol->sb, "Failed to allocate buffer for converted "
- "name from ntfs_name_cache.");
- return -ENOMEM;
- } /* else if (!ins) */
- ntfs_error(vol->sb, "Received NULL pointer.");
- return -EINVAL;
-name_err:
- kmem_cache_free(ntfs_name_cache, ucs);
- if (wc_len < 0) {
- ntfs_error(vol->sb, "Name using character set %s contains "
- "characters that cannot be converted to "
- "Unicode.", nls->charset);
- i = -EILSEQ;
- } else /* if (o >= NTFS_MAX_NAME_LEN) */ {
- ntfs_error(vol->sb, "Name is too long (maximum length for a "
- "name on NTFS is %d Unicode characters.",
- NTFS_MAX_NAME_LEN);
- i = -ENAMETOOLONG;
- }
- return i;
-}
-
-/**
- * ntfs_ucstonls - convert little endian Unicode string to NLS string
- * @vol: ntfs volume which we are working with
- * @ins: input Unicode string buffer
- * @ins_len: length of input string in Unicode characters
- * @outs: on return contains the (allocated) output NLS string buffer
- * @outs_len: length of output string buffer in bytes
- *
- * Convert the input little endian, 2-byte Unicode string @ins, of length
- * @ins_len into the string format dictated by the loaded NLS.
- *
- * If *@outs is NULL, this function allocates the string and the caller is
- * responsible for calling kfree(*@outs); when finished with it. In this case
- * @outs_len is ignored and can be 0.
- *
- * On success the function returns the number of bytes written to the output
- * string *@outs (>= 0), not counting the terminating NULL byte. If the output
- * string buffer was allocated, *@outs is set to it.
- *
- * On error, a negative number corresponding to the error code is returned. In
- * that case the output string is not allocated. The contents of *@outs are
- * then undefined.
- *
- * This might look a bit odd due to fast path optimization...
- */
-int ntfs_ucstonls(const ntfs_volume *vol, const ntfschar *ins,
- const int ins_len, unsigned char **outs, int outs_len)
-{
- struct nls_table *nls = vol->nls_map;
- unsigned char *ns;
- int i, o, ns_len, wc;
-
- /* We don't trust outside sources. */
- if (ins) {
- ns = *outs;
- ns_len = outs_len;
- if (ns && !ns_len) {
- wc = -ENAMETOOLONG;
- goto conversion_err;
- }
- if (!ns) {
- ns_len = ins_len * NLS_MAX_CHARSET_SIZE;
- ns = kmalloc(ns_len + 1, GFP_NOFS);
- if (!ns)
- goto mem_err_out;
- }
- for (i = o = 0; i < ins_len; i++) {
-retry: wc = nls->uni2char(le16_to_cpu(ins[i]), ns + o,
- ns_len - o);
- if (wc > 0) {
- o += wc;
- continue;
- } else if (!wc)
- break;
- else if (wc == -ENAMETOOLONG && ns != *outs) {
- unsigned char *tc;
- /* Grow in multiples of 64 bytes. */
- tc = kmalloc((ns_len + 64) &
- ~63, GFP_NOFS);
- if (tc) {
- memcpy(tc, ns, ns_len);
- ns_len = ((ns_len + 64) & ~63) - 1;
- kfree(ns);
- ns = tc;
- goto retry;
- } /* No memory so goto conversion_error; */
- } /* wc < 0, real error. */
- goto conversion_err;
- }
- ns[o] = 0;
- *outs = ns;
- return o;
- } /* else (!ins) */
- ntfs_error(vol->sb, "Received NULL pointer.");
- return -EINVAL;
-conversion_err:
- ntfs_error(vol->sb, "Unicode name contains characters that cannot be "
- "converted to character set %s. You might want to "
- "try to use the mount option nls=utf8.", nls->charset);
- if (ns != *outs)
- kfree(ns);
- if (wc != -ENAMETOOLONG)
- wc = -EILSEQ;
- return wc;
-mem_err_out:
- ntfs_error(vol->sb, "Failed to allocate name!");
- return -ENOMEM;
-}
diff --git a/fs/ntfs/upcase.c b/fs/ntfs/upcase.c
deleted file mode 100644
index 4ebe84a..0000000
--- a/fs/ntfs/upcase.c
+++ /dev/null
@@ -1,73 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * upcase.c - Generate the full NTFS Unicode upcase table in little endian.
- * Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001 Richard Russon <ntfs@flatcap.org>
- * Copyright (c) 2001-2006 Anton Altaparmakov
- */
-
-#include "malloc.h"
-#include "ntfs.h"
-
-ntfschar *generate_default_upcase(void)
-{
- static const int uc_run_table[][3] = { /* Start, End, Add */
- {0x0061, 0x007B, -32}, {0x0451, 0x045D, -80}, {0x1F70, 0x1F72, 74},
- {0x00E0, 0x00F7, -32}, {0x045E, 0x0460, -80}, {0x1F72, 0x1F76, 86},
- {0x00F8, 0x00FF, -32}, {0x0561, 0x0587, -48}, {0x1F76, 0x1F78, 100},
- {0x0256, 0x0258, -205}, {0x1F00, 0x1F08, 8}, {0x1F78, 0x1F7A, 128},
- {0x028A, 0x028C, -217}, {0x1F10, 0x1F16, 8}, {0x1F7A, 0x1F7C, 112},
- {0x03AC, 0x03AD, -38}, {0x1F20, 0x1F28, 8}, {0x1F7C, 0x1F7E, 126},
- {0x03AD, 0x03B0, -37}, {0x1F30, 0x1F38, 8}, {0x1FB0, 0x1FB2, 8},
- {0x03B1, 0x03C2, -32}, {0x1F40, 0x1F46, 8}, {0x1FD0, 0x1FD2, 8},
- {0x03C2, 0x03C3, -31}, {0x1F51, 0x1F52, 8}, {0x1FE0, 0x1FE2, 8},
- {0x03C3, 0x03CC, -32}, {0x1F53, 0x1F54, 8}, {0x1FE5, 0x1FE6, 7},
- {0x03CC, 0x03CD, -64}, {0x1F55, 0x1F56, 8}, {0x2170, 0x2180, -16},
- {0x03CD, 0x03CF, -63}, {0x1F57, 0x1F58, 8}, {0x24D0, 0x24EA, -26},
- {0x0430, 0x0450, -32}, {0x1F60, 0x1F68, 8}, {0xFF41, 0xFF5B, -32},
- {0}
- };
-
- static const int uc_dup_table[][2] = { /* Start, End */
- {0x0100, 0x012F}, {0x01A0, 0x01A6}, {0x03E2, 0x03EF}, {0x04CB, 0x04CC},
- {0x0132, 0x0137}, {0x01B3, 0x01B7}, {0x0460, 0x0481}, {0x04D0, 0x04EB},
- {0x0139, 0x0149}, {0x01CD, 0x01DD}, {0x0490, 0x04BF}, {0x04EE, 0x04F5},
- {0x014A, 0x0178}, {0x01DE, 0x01EF}, {0x04BF, 0x04BF}, {0x04F8, 0x04F9},
- {0x0179, 0x017E}, {0x01F4, 0x01F5}, {0x04C1, 0x04C4}, {0x1E00, 0x1E95},
- {0x018B, 0x018B}, {0x01FA, 0x0218}, {0x04C7, 0x04C8}, {0x1EA0, 0x1EF9},
- {0}
- };
-
- static const int uc_word_table[][2] = { /* Offset, Value */
- {0x00FF, 0x0178}, {0x01AD, 0x01AC}, {0x01F3, 0x01F1}, {0x0269, 0x0196},
- {0x0183, 0x0182}, {0x01B0, 0x01AF}, {0x0253, 0x0181}, {0x026F, 0x019C},
- {0x0185, 0x0184}, {0x01B9, 0x01B8}, {0x0254, 0x0186}, {0x0272, 0x019D},
- {0x0188, 0x0187}, {0x01BD, 0x01BC}, {0x0259, 0x018F}, {0x0275, 0x019F},
- {0x018C, 0x018B}, {0x01C6, 0x01C4}, {0x025B, 0x0190}, {0x0283, 0x01A9},
- {0x0192, 0x0191}, {0x01C9, 0x01C7}, {0x0260, 0x0193}, {0x0288, 0x01AE},
- {0x0199, 0x0198}, {0x01CC, 0x01CA}, {0x0263, 0x0194}, {0x0292, 0x01B7},
- {0x01A8, 0x01A7}, {0x01DD, 0x018E}, {0x0268, 0x0197},
- {0}
- };
-
- int i, r;
- ntfschar *uc;
-
- uc = ntfs_malloc_nofs(default_upcase_len * sizeof(ntfschar));
- if (!uc)
- return uc;
- memset(uc, 0, default_upcase_len * sizeof(ntfschar));
- /* Generate the little endian Unicode upcase table used by ntfs. */
- for (i = 0; i < default_upcase_len; i++)
- uc[i] = cpu_to_le16(i);
- for (r = 0; uc_run_table[r][0]; r++)
- for (i = uc_run_table[r][0]; i < uc_run_table[r][1]; i++)
- le16_add_cpu(&uc[i], uc_run_table[r][2]);
- for (r = 0; uc_dup_table[r][0]; r++)
- for (i = uc_dup_table[r][0]; i < uc_dup_table[r][1]; i += 2)
- le16_add_cpu(&uc[i + 1], -1);
- for (r = 0; uc_word_table[r][0]; r++)
- uc[uc_word_table[r][0]] = cpu_to_le16(uc_word_table[r][1]);
- return uc;
-}
diff --git a/fs/ntfs/usnjrnl.c b/fs/ntfs/usnjrnl.c
deleted file mode 100644
index 9097a0b..0000000
--- a/fs/ntfs/usnjrnl.c
+++ /dev/null
@@ -1,70 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * usnjrnl.h - NTFS kernel transaction log ($UsnJrnl) handling. Part of the
- * Linux-NTFS project.
- *
- * Copyright (c) 2005 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include <linux/fs.h>
-#include <linux/highmem.h>
-#include <linux/mm.h>
-
-#include "aops.h"
-#include "debug.h"
-#include "endian.h"
-#include "time.h"
-#include "types.h"
-#include "usnjrnl.h"
-#include "volume.h"
-
-/**
- * ntfs_stamp_usnjrnl - stamp the transaction log ($UsnJrnl) on an ntfs volume
- * @vol: ntfs volume on which to stamp the transaction log
- *
- * Stamp the transaction log ($UsnJrnl) on the ntfs volume @vol and return
- * 'true' on success and 'false' on error.
- *
- * This function assumes that the transaction log has already been loaded and
- * consistency checked by a call to fs/ntfs/super.c::load_and_init_usnjrnl().
- */
-bool ntfs_stamp_usnjrnl(ntfs_volume *vol)
-{
- ntfs_debug("Entering.");
- if (likely(!NVolUsnJrnlStamped(vol))) {
- sle64 stamp;
- struct page *page;
- USN_HEADER *uh;
-
- page = ntfs_map_page(vol->usnjrnl_max_ino->i_mapping, 0);
- if (IS_ERR(page)) {
- ntfs_error(vol->sb, "Failed to read from "
- "$UsnJrnl/$DATA/$Max attribute.");
- return false;
- }
- uh = (USN_HEADER*)page_address(page);
- stamp = get_current_ntfs_time();
- ntfs_debug("Stamping transaction log ($UsnJrnl): old "
- "journal_id 0x%llx, old lowest_valid_usn "
- "0x%llx, new journal_id 0x%llx, new "
- "lowest_valid_usn 0x%llx.",
- (long long)sle64_to_cpu(uh->journal_id),
- (long long)sle64_to_cpu(uh->lowest_valid_usn),
- (long long)sle64_to_cpu(stamp),
- i_size_read(vol->usnjrnl_j_ino));
- uh->lowest_valid_usn =
- cpu_to_sle64(i_size_read(vol->usnjrnl_j_ino));
- uh->journal_id = stamp;
- flush_dcache_page(page);
- set_page_dirty(page);
- ntfs_unmap_page(page);
- /* Set the flag so we do not have to do it again on remount. */
- NVolSetUsnJrnlStamped(vol);
- }
- ntfs_debug("Done.");
- return true;
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/usnjrnl.h b/fs/ntfs/usnjrnl.h
deleted file mode 100644
index 85f531b..0000000
--- a/fs/ntfs/usnjrnl.h
+++ /dev/null
@@ -1,191 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * usnjrnl.h - Defines for NTFS kernel transaction log ($UsnJrnl) handling.
- * Part of the Linux-NTFS project.
- *
- * Copyright (c) 2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_USNJRNL_H
-#define _LINUX_NTFS_USNJRNL_H
-
-#ifdef NTFS_RW
-
-#include "types.h"
-#include "endian.h"
-#include "layout.h"
-#include "volume.h"
-
-/*
- * Transaction log ($UsnJrnl) organization:
- *
- * The transaction log records whenever a file is modified in any way. So for
- * example it will record that file "blah" was written to at a particular time
- * but not what was written. If will record that a file was deleted or
- * created, that a file was truncated, etc. See below for all the reason
- * codes used.
- *
- * The transaction log is in the $Extend directory which is in the root
- * directory of each volume. If it is not present it means transaction
- * logging is disabled. If it is present it means transaction logging is
- * either enabled or in the process of being disabled in which case we can
- * ignore it as it will go away as soon as Windows gets its hands on it.
- *
- * To determine whether the transaction logging is enabled or in the process
- * of being disabled, need to check the volume flags in the
- * $VOLUME_INFORMATION attribute in the $Volume system file (which is present
- * in the root directory and has a fixed mft record number, see layout.h).
- * If the flag VOLUME_DELETE_USN_UNDERWAY is set it means the transaction log
- * is in the process of being disabled and if this flag is clear it means the
- * transaction log is enabled.
- *
- * The transaction log consists of two parts; the $DATA/$Max attribute as well
- * as the $DATA/$J attribute. $Max is a header describing the transaction
- * log whilst $J is the transaction log data itself as a sequence of variable
- * sized USN_RECORDs (see below for all the structures).
- *
- * We do not care about transaction logging at this point in time but we still
- * need to let windows know that the transaction log is out of date. To do
- * this we need to stamp the transaction log. This involves setting the
- * lowest_valid_usn field in the $DATA/$Max attribute to the usn to be used
- * for the next added USN_RECORD to the $DATA/$J attribute as well as
- * generating a new journal_id in $DATA/$Max.
- *
- * The journal_id is as of the current version (2.0) of the transaction log
- * simply the 64-bit timestamp of when the journal was either created or last
- * stamped.
- *
- * To determine the next usn there are two ways. The first is to parse
- * $DATA/$J and to find the last USN_RECORD in it and to add its record_length
- * to its usn (which is the byte offset in the $DATA/$J attribute). The
- * second is simply to take the data size of the attribute. Since the usns
- * are simply byte offsets into $DATA/$J, this is exactly the next usn. For
- * obvious reasons we use the second method as it is much simpler and faster.
- *
- * As an aside, note that to actually disable the transaction log, one would
- * need to set the VOLUME_DELETE_USN_UNDERWAY flag (see above), then go
- * through all the mft records on the volume and set the usn field in their
- * $STANDARD_INFORMATION attribute to zero. Once that is done, one would need
- * to delete the transaction log file, i.e. \$Extent\$UsnJrnl, and finally,
- * one would need to clear the VOLUME_DELETE_USN_UNDERWAY flag.
- *
- * Note that if a volume is unmounted whilst the transaction log is being
- * disabled, the process will continue the next time the volume is mounted.
- * This is why we can safely mount read-write when we see a transaction log
- * in the process of being deleted.
- */
-
-/* Some $UsnJrnl related constants. */
-#define UsnJrnlMajorVer 2
-#define UsnJrnlMinorVer 0
-
-/*
- * $DATA/$Max attribute. This is (always?) resident and has a fixed size of
- * 32 bytes. It contains the header describing the transaction log.
- */
-typedef struct {
-/*Ofs*/
-/* 0*/sle64 maximum_size; /* The maximum on-disk size of the $DATA/$J
- attribute. */
-/* 8*/sle64 allocation_delta; /* Number of bytes by which to increase the
- size of the $DATA/$J attribute. */
-/*0x10*/sle64 journal_id; /* Current id of the transaction log. */
-/*0x18*/leUSN lowest_valid_usn; /* Lowest valid usn in $DATA/$J for the
- current journal_id. */
-/* sizeof() = 32 (0x20) bytes */
-} __attribute__ ((__packed__)) USN_HEADER;
-
-/*
- * Reason flags (32-bit). Cumulative flags describing the change(s) to the
- * file since it was last opened. I think the names speak for themselves but
- * if you disagree check out the descriptions in the Linux NTFS project NTFS
- * documentation: http://www.linux-ntfs.org/
- */
-enum {
- USN_REASON_DATA_OVERWRITE = cpu_to_le32(0x00000001),
- USN_REASON_DATA_EXTEND = cpu_to_le32(0x00000002),
- USN_REASON_DATA_TRUNCATION = cpu_to_le32(0x00000004),
- USN_REASON_NAMED_DATA_OVERWRITE = cpu_to_le32(0x00000010),
- USN_REASON_NAMED_DATA_EXTEND = cpu_to_le32(0x00000020),
- USN_REASON_NAMED_DATA_TRUNCATION= cpu_to_le32(0x00000040),
- USN_REASON_FILE_CREATE = cpu_to_le32(0x00000100),
- USN_REASON_FILE_DELETE = cpu_to_le32(0x00000200),
- USN_REASON_EA_CHANGE = cpu_to_le32(0x00000400),
- USN_REASON_SECURITY_CHANGE = cpu_to_le32(0x00000800),
- USN_REASON_RENAME_OLD_NAME = cpu_to_le32(0x00001000),
- USN_REASON_RENAME_NEW_NAME = cpu_to_le32(0x00002000),
- USN_REASON_INDEXABLE_CHANGE = cpu_to_le32(0x00004000),
- USN_REASON_BASIC_INFO_CHANGE = cpu_to_le32(0x00008000),
- USN_REASON_HARD_LINK_CHANGE = cpu_to_le32(0x00010000),
- USN_REASON_COMPRESSION_CHANGE = cpu_to_le32(0x00020000),
- USN_REASON_ENCRYPTION_CHANGE = cpu_to_le32(0x00040000),
- USN_REASON_OBJECT_ID_CHANGE = cpu_to_le32(0x00080000),
- USN_REASON_REPARSE_POINT_CHANGE = cpu_to_le32(0x00100000),
- USN_REASON_STREAM_CHANGE = cpu_to_le32(0x00200000),
- USN_REASON_CLOSE = cpu_to_le32(0x80000000),
-};
-
-typedef le32 USN_REASON_FLAGS;
-
-/*
- * Source info flags (32-bit). Information about the source of the change(s)
- * to the file. For detailed descriptions of what these mean, see the Linux
- * NTFS project NTFS documentation:
- * http://www.linux-ntfs.org/
- */
-enum {
- USN_SOURCE_DATA_MANAGEMENT = cpu_to_le32(0x00000001),
- USN_SOURCE_AUXILIARY_DATA = cpu_to_le32(0x00000002),
- USN_SOURCE_REPLICATION_MANAGEMENT = cpu_to_le32(0x00000004),
-};
-
-typedef le32 USN_SOURCE_INFO_FLAGS;
-
-/*
- * $DATA/$J attribute. This is always non-resident, is marked as sparse, and
- * is of variabled size. It consists of a sequence of variable size
- * USN_RECORDS. The minimum allocated_size is allocation_delta as
- * specified in $DATA/$Max. When the maximum_size specified in $DATA/$Max is
- * exceeded by more than allocation_delta bytes, allocation_delta bytes are
- * allocated and appended to the $DATA/$J attribute and an equal number of
- * bytes at the beginning of the attribute are freed and made sparse. Note the
- * making sparse only happens at volume checkpoints and hence the actual
- * $DATA/$J size can exceed maximum_size + allocation_delta temporarily.
- */
-typedef struct {
-/*Ofs*/
-/* 0*/le32 length; /* Byte size of this record (8-byte
- aligned). */
-/* 4*/le16 major_ver; /* Major version of the transaction log used
- for this record. */
-/* 6*/le16 minor_ver; /* Minor version of the transaction log used
- for this record. */
-/* 8*/leMFT_REF mft_reference;/* The mft reference of the file (or
- directory) described by this record. */
-/*0x10*/leMFT_REF parent_directory;/* The mft reference of the parent
- directory of the file described by this
- record. */
-/*0x18*/leUSN usn; /* The usn of this record. Equals the offset
- within the $DATA/$J attribute. */
-/*0x20*/sle64 time; /* Time when this record was created. */
-/*0x28*/USN_REASON_FLAGS reason;/* Reason flags (see above). */
-/*0x2c*/USN_SOURCE_INFO_FLAGS source_info;/* Source info flags (see above). */
-/*0x30*/le32 security_id; /* File security_id copied from
- $STANDARD_INFORMATION. */
-/*0x34*/FILE_ATTR_FLAGS file_attributes; /* File attributes copied from
- $STANDARD_INFORMATION or $FILE_NAME (not
- sure which). */
-/*0x38*/le16 file_name_size; /* Size of the file name in bytes. */
-/*0x3a*/le16 file_name_offset; /* Offset to the file name in bytes from the
- start of this record. */
-/*0x3c*/ntfschar file_name[0]; /* Use when creating only. When reading use
- file_name_offset to determine the location
- of the name. */
-/* sizeof() = 60 (0x3c) bytes */
-} __attribute__ ((__packed__)) USN_RECORD;
-
-extern bool ntfs_stamp_usnjrnl(ntfs_volume *vol);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_USNJRNL_H */
diff --git a/fs/ntfs/volume.h b/fs/ntfs/volume.h
deleted file mode 100644
index 930a9ae..0000000
--- a/fs/ntfs/volume.h
+++ /dev/null
@@ -1,164 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * volume.h - Defines for volume structures in NTFS Linux kernel driver. Part
- * of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2006 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_VOLUME_H
-#define _LINUX_NTFS_VOLUME_H
-
-#include <linux/rwsem.h>
-#include <linux/uidgid.h>
-
-#include "types.h"
-#include "layout.h"
-
-/*
- * The NTFS in memory super block structure.
- */
-typedef struct {
- /*
- * FIXME: Reorder to have commonly used together element within the
- * same cache line, aiming at a cache line size of 32 bytes. Aim for
- * 64 bytes for less commonly used together elements. Put most commonly
- * used elements to front of structure. Obviously do this only when the
- * structure has stabilized... (AIA)
- */
- /* Device specifics. */
- struct super_block *sb; /* Pointer back to the super_block. */
- LCN nr_blocks; /* Number of sb->s_blocksize bytes
- sized blocks on the device. */
- /* Configuration provided by user at mount time. */
- unsigned long flags; /* Miscellaneous flags, see below. */
- kuid_t uid; /* uid that files will be mounted as. */
- kgid_t gid; /* gid that files will be mounted as. */
- umode_t fmask; /* The mask for file permissions. */
- umode_t dmask; /* The mask for directory
- permissions. */
- u8 mft_zone_multiplier; /* Initial mft zone multiplier. */
- u8 on_errors; /* What to do on filesystem errors. */
- /* NTFS bootsector provided information. */
- u16 sector_size; /* in bytes */
- u8 sector_size_bits; /* log2(sector_size) */
- u32 cluster_size; /* in bytes */
- u32 cluster_size_mask; /* cluster_size - 1 */
- u8 cluster_size_bits; /* log2(cluster_size) */
- u32 mft_record_size; /* in bytes */
- u32 mft_record_size_mask; /* mft_record_size - 1 */
- u8 mft_record_size_bits; /* log2(mft_record_size) */
- u32 index_record_size; /* in bytes */
- u32 index_record_size_mask; /* index_record_size - 1 */
- u8 index_record_size_bits; /* log2(index_record_size) */
- LCN nr_clusters; /* Volume size in clusters == number of
- bits in lcn bitmap. */
- LCN mft_lcn; /* Cluster location of mft data. */
- LCN mftmirr_lcn; /* Cluster location of copy of mft. */
- u64 serial_no; /* The volume serial number. */
- /* Mount specific NTFS information. */
- u32 upcase_len; /* Number of entries in upcase[]. */
- ntfschar *upcase; /* The upcase table. */
-
- s32 attrdef_size; /* Size of the attribute definition
- table in bytes. */
- ATTR_DEF *attrdef; /* Table of attribute definitions.
- Obtained from FILE_AttrDef. */
-
-#ifdef NTFS_RW
- /* Variables used by the cluster and mft allocators. */
- s64 mft_data_pos; /* Mft record number at which to
- allocate the next mft record. */
- LCN mft_zone_start; /* First cluster of the mft zone. */
- LCN mft_zone_end; /* First cluster beyond the mft zone. */
- LCN mft_zone_pos; /* Current position in the mft zone. */
- LCN data1_zone_pos; /* Current position in the first data
- zone. */
- LCN data2_zone_pos; /* Current position in the second data
- zone. */
-#endif /* NTFS_RW */
-
- struct inode *mft_ino; /* The VFS inode of $MFT. */
-
- struct inode *mftbmp_ino; /* Attribute inode for $MFT/$BITMAP. */
- struct rw_semaphore mftbmp_lock; /* Lock for serializing accesses to the
- mft record bitmap ($MFT/$BITMAP). */
-#ifdef NTFS_RW
- struct inode *mftmirr_ino; /* The VFS inode of $MFTMirr. */
- int mftmirr_size; /* Size of mft mirror in mft records. */
-
- struct inode *logfile_ino; /* The VFS inode of $LogFile. */
-#endif /* NTFS_RW */
-
- struct inode *lcnbmp_ino; /* The VFS inode of $Bitmap. */
- struct rw_semaphore lcnbmp_lock; /* Lock for serializing accesses to the
- cluster bitmap ($Bitmap/$DATA). */
-
- struct inode *vol_ino; /* The VFS inode of $Volume. */
- VOLUME_FLAGS vol_flags; /* Volume flags. */
- u8 major_ver; /* Ntfs major version of volume. */
- u8 minor_ver; /* Ntfs minor version of volume. */
-
- struct inode *root_ino; /* The VFS inode of the root
- directory. */
- struct inode *secure_ino; /* The VFS inode of $Secure (NTFS3.0+
- only, otherwise NULL). */
- struct inode *extend_ino; /* The VFS inode of $Extend (NTFS3.0+
- only, otherwise NULL). */
-#ifdef NTFS_RW
- /* $Quota stuff is NTFS3.0+ specific. Unused/NULL otherwise. */
- struct inode *quota_ino; /* The VFS inode of $Quota. */
- struct inode *quota_q_ino; /* Attribute inode for $Quota/$Q. */
- /* $UsnJrnl stuff is NTFS3.0+ specific. Unused/NULL otherwise. */
- struct inode *usnjrnl_ino; /* The VFS inode of $UsnJrnl. */
- struct inode *usnjrnl_max_ino; /* Attribute inode for $UsnJrnl/$Max. */
- struct inode *usnjrnl_j_ino; /* Attribute inode for $UsnJrnl/$J. */
-#endif /* NTFS_RW */
- struct nls_table *nls_map;
-} ntfs_volume;
-
-/*
- * Defined bits for the flags field in the ntfs_volume structure.
- */
-typedef enum {
- NV_Errors, /* 1: Volume has errors, prevent remount rw. */
- NV_ShowSystemFiles, /* 1: Return system files in ntfs_readdir(). */
- NV_CaseSensitive, /* 1: Treat file names as case sensitive and
- create filenames in the POSIX namespace.
- Otherwise be case insensitive but still
- create file names in POSIX namespace. */
- NV_LogFileEmpty, /* 1: $LogFile journal is empty. */
- NV_QuotaOutOfDate, /* 1: $Quota is out of date. */
- NV_UsnJrnlStamped, /* 1: $UsnJrnl has been stamped. */
- NV_SparseEnabled, /* 1: May create sparse files. */
-} ntfs_volume_flags;
-
-/*
- * Macro tricks to expand the NVolFoo(), NVolSetFoo(), and NVolClearFoo()
- * functions.
- */
-#define DEFINE_NVOL_BIT_OPS(flag) \
-static inline int NVol##flag(ntfs_volume *vol) \
-{ \
- return test_bit(NV_##flag, &(vol)->flags); \
-} \
-static inline void NVolSet##flag(ntfs_volume *vol) \
-{ \
- set_bit(NV_##flag, &(vol)->flags); \
-} \
-static inline void NVolClear##flag(ntfs_volume *vol) \
-{ \
- clear_bit(NV_##flag, &(vol)->flags); \
-}
-
-/* Emit the ntfs volume bitops functions. */
-DEFINE_NVOL_BIT_OPS(Errors)
-DEFINE_NVOL_BIT_OPS(ShowSystemFiles)
-DEFINE_NVOL_BIT_OPS(CaseSensitive)
-DEFINE_NVOL_BIT_OPS(LogFileEmpty)
-DEFINE_NVOL_BIT_OPS(QuotaOutOfDate)
-DEFINE_NVOL_BIT_OPS(UsnJrnlStamped)
-DEFINE_NVOL_BIT_OPS(SparseEnabled)
-
-#endif /* _LINUX_NTFS_VOLUME_H */
diff --git a/fs/ntfs3/attrib.c b/fs/ntfs3/attrib.c
index 63f7025..7aadf50 100644
--- a/fs/ntfs3/attrib.c
+++ b/fs/ntfs3/attrib.c
@@ -886,7 +886,7 @@ int attr_data_get_block(struct ntfs_inode *ni, CLST vcn, CLST clen, CLST *lcn,
struct runs_tree *run = &ni->file.run;
struct ntfs_sb_info *sbi;
u8 cluster_bits;
- struct ATTRIB *attr = NULL, *attr_b;
+ struct ATTRIB *attr, *attr_b;
struct ATTR_LIST_ENTRY *le, *le_b;
struct mft_inode *mi, *mi_b;
CLST hint, svcn, to_alloc, evcn1, next_svcn, asize, end, vcn0, alen;
@@ -904,12 +904,8 @@ int attr_data_get_block(struct ntfs_inode *ni, CLST vcn, CLST clen, CLST *lcn,
*len = 0;
up_read(&ni->file.run_lock);
- if (*len) {
- if (*lcn != SPARSE_LCN || !new)
- return 0; /* Fast normal way without allocation. */
- else if (clen > *len)
- clen = *len;
- }
+ if (*len && (*lcn != SPARSE_LCN || !new))
+ return 0; /* Fast normal way without allocation. */
/* No cluster in cache or we need to allocate cluster in hole. */
sbi = ni->mi.sbi;
@@ -918,6 +914,17 @@ int attr_data_get_block(struct ntfs_inode *ni, CLST vcn, CLST clen, CLST *lcn,
ni_lock(ni);
down_write(&ni->file.run_lock);
+ /* Repeat the code above (under write lock). */
+ if (!run_lookup_entry(run, vcn, lcn, len, NULL))
+ *len = 0;
+
+ if (*len) {
+ if (*lcn != SPARSE_LCN || !new)
+ goto out; /* normal way without allocation. */
+ if (clen > *len)
+ clen = *len;
+ }
+
le_b = NULL;
attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b);
if (!attr_b) {
@@ -1736,8 +1743,10 @@ int attr_allocate_frame(struct ntfs_inode *ni, CLST frame, size_t compr_size,
le_b = NULL;
attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL,
0, NULL, &mi_b);
- if (!attr_b)
- return -ENOENT;
+ if (!attr_b) {
+ err = -ENOENT;
+ goto out;
+ }
attr = attr_b;
le = le_b;
@@ -1818,13 +1827,15 @@ int attr_allocate_frame(struct ntfs_inode *ni, CLST frame, size_t compr_size,
ok:
run_truncate_around(run, vcn);
out:
- if (new_valid > data_size)
- new_valid = data_size;
+ if (attr_b) {
+ if (new_valid > data_size)
+ new_valid = data_size;
- valid_size = le64_to_cpu(attr_b->nres.valid_size);
- if (new_valid != valid_size) {
- attr_b->nres.valid_size = cpu_to_le64(valid_size);
- mi_b->dirty = true;
+ valid_size = le64_to_cpu(attr_b->nres.valid_size);
+ if (new_valid != valid_size) {
+ attr_b->nres.valid_size = cpu_to_le64(valid_size);
+ mi_b->dirty = true;
+ }
}
return err;
@@ -2073,7 +2084,7 @@ int attr_collapse_range(struct ntfs_inode *ni, u64 vbo, u64 bytes)
/* Update inode size. */
ni->i_valid = valid_size;
- ni->vfs_inode.i_size = data_size;
+ i_size_write(&ni->vfs_inode, data_size);
inode_set_bytes(&ni->vfs_inode, total_size);
ni->ni_flags |= NI_FLAG_UPDATE_PARENT;
mark_inode_dirty(&ni->vfs_inode);
@@ -2488,7 +2499,7 @@ int attr_insert_range(struct ntfs_inode *ni, u64 vbo, u64 bytes)
mi_b->dirty = true;
done:
- ni->vfs_inode.i_size += bytes;
+ i_size_write(&ni->vfs_inode, ni->vfs_inode.i_size + bytes);
ni->ni_flags |= NI_FLAG_UPDATE_PARENT;
mark_inode_dirty(&ni->vfs_inode);
diff --git a/fs/ntfs3/attrlist.c b/fs/ntfs3/attrlist.c
index 7c01735..9f4bd8d 100644
--- a/fs/ntfs3/attrlist.c
+++ b/fs/ntfs3/attrlist.c
@@ -29,7 +29,7 @@ static inline bool al_is_valid_le(const struct ntfs_inode *ni,
void al_destroy(struct ntfs_inode *ni)
{
run_close(&ni->attr_list.run);
- kfree(ni->attr_list.le);
+ kvfree(ni->attr_list.le);
ni->attr_list.le = NULL;
ni->attr_list.size = 0;
ni->attr_list.dirty = false;
@@ -127,12 +127,13 @@ struct ATTR_LIST_ENTRY *al_enumerate(struct ntfs_inode *ni,
{
size_t off;
u16 sz;
+ const unsigned le_min_size = le_size(0);
if (!le) {
le = ni->attr_list.le;
} else {
sz = le16_to_cpu(le->size);
- if (sz < sizeof(struct ATTR_LIST_ENTRY)) {
+ if (sz < le_min_size) {
/* Impossible 'cause we should not return such le. */
return NULL;
}
@@ -141,7 +142,7 @@ struct ATTR_LIST_ENTRY *al_enumerate(struct ntfs_inode *ni,
/* Check boundary. */
off = PtrOffset(ni->attr_list.le, le);
- if (off + sizeof(struct ATTR_LIST_ENTRY) > ni->attr_list.size) {
+ if (off + le_min_size > ni->attr_list.size) {
/* The regular end of list. */
return NULL;
}
@@ -149,8 +150,7 @@ struct ATTR_LIST_ENTRY *al_enumerate(struct ntfs_inode *ni,
sz = le16_to_cpu(le->size);
/* Check le for errors. */
- if (sz < sizeof(struct ATTR_LIST_ENTRY) ||
- off + sz > ni->attr_list.size ||
+ if (sz < le_min_size || off + sz > ni->attr_list.size ||
sz < le->name_off + le->name_len * sizeof(short)) {
return NULL;
}
@@ -318,7 +318,7 @@ int al_add_le(struct ntfs_inode *ni, enum ATTR_TYPE type, const __le16 *name,
memcpy(ptr, al->le, off);
memcpy(Add2Ptr(ptr, off + sz), le, old_size - off);
le = Add2Ptr(ptr, off);
- kfree(al->le);
+ kvfree(al->le);
al->le = ptr;
} else {
memmove(Add2Ptr(le, sz), le, old_size - off);
diff --git a/fs/ntfs3/bitmap.c b/fs/ntfs3/bitmap.c
index 63f14a0..845f9b2 100644
--- a/fs/ntfs3/bitmap.c
+++ b/fs/ntfs3/bitmap.c
@@ -124,7 +124,7 @@ void wnd_close(struct wnd_bitmap *wnd)
{
struct rb_node *node, *next;
- kfree(wnd->free_bits);
+ kvfree(wnd->free_bits);
wnd->free_bits = NULL;
run_close(&wnd->run);
@@ -1360,7 +1360,7 @@ int wnd_extend(struct wnd_bitmap *wnd, size_t new_bits)
memcpy(new_free, wnd->free_bits, wnd->nwnd * sizeof(short));
memset(new_free + wnd->nwnd, 0,
(new_wnd - wnd->nwnd) * sizeof(short));
- kfree(wnd->free_bits);
+ kvfree(wnd->free_bits);
wnd->free_bits = new_free;
}
diff --git a/fs/ntfs3/dir.c b/fs/ntfs3/dir.c
index ec0566b..5cf3d9d 100644
--- a/fs/ntfs3/dir.c
+++ b/fs/ntfs3/dir.c
@@ -309,11 +309,31 @@ static inline int ntfs_filldir(struct ntfs_sb_info *sbi, struct ntfs_inode *ni,
return 0;
}
- /* NTFS: symlinks are "dir + reparse" or "file + reparse" */
- if (fname->dup.fa & FILE_ATTRIBUTE_REPARSE_POINT)
- dt_type = DT_LNK;
- else
- dt_type = (fname->dup.fa & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
+ /*
+ * NTFS: symlinks are "dir + reparse" or "file + reparse"
+ * Unfortunately reparse attribute is used for many purposes (several dozens).
+ * It is not possible here to know is this name symlink or not.
+ * To get exactly the type of name we should to open inode (read mft).
+ * getattr for opened file (fstat) correctly returns symlink.
+ */
+ dt_type = (fname->dup.fa & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
+
+ /*
+ * It is not reliable to detect the type of name using duplicated information
+ * stored in parent directory.
+ * The only correct way to get the type of name - read MFT record and find ATTR_STD.
+ * The code below is not good idea.
+ * It does additional locks/reads just to get the type of name.
+ * Should we use additional mount option to enable branch below?
+ */
+ if ((fname->dup.fa & FILE_ATTRIBUTE_REPARSE_POINT) &&
+ ino != ni->mi.rno) {
+ struct inode *inode = ntfs_iget5(sbi->sb, &e->ref, NULL);
+ if (!IS_ERR_OR_NULL(inode)) {
+ dt_type = fs_umode_to_dtype(inode->i_mode);
+ iput(inode);
+ }
+ }
return !dir_emit(ctx, (s8 *)name, name_len, ino, dt_type);
}
@@ -495,11 +515,9 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
struct INDEX_HDR *hdr;
const struct ATTR_FILE_NAME *fname;
u32 e_size, off, end;
- u64 vbo = 0;
size_t drs = 0, fles = 0, bit = 0;
- loff_t i_size = ni->vfs_inode.i_size;
struct indx_node *node = NULL;
- u8 index_bits = ni->dir.index_bits;
+ size_t max_indx = i_size_read(&ni->vfs_inode) >> ni->dir.index_bits;
if (is_empty)
*is_empty = true;
@@ -518,8 +536,10 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
e = Add2Ptr(hdr, off);
e_size = le16_to_cpu(e->size);
if (e_size < sizeof(struct NTFS_DE) ||
- off + e_size > end)
+ off + e_size > end) {
+ /* Looks like corruption. */
break;
+ }
if (de_is_last(e))
break;
@@ -543,7 +563,7 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
fles += 1;
}
- if (vbo >= i_size)
+ if (bit >= max_indx)
goto out;
err = indx_used_bit(&ni->dir, ni, &bit);
@@ -553,8 +573,7 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
if (bit == MINUS_ONE_T)
goto out;
- vbo = (u64)bit << index_bits;
- if (vbo >= i_size)
+ if (bit >= max_indx)
goto out;
err = indx_read(&ni->dir, ni, bit << ni->dir.idx2vbn_bits,
@@ -564,7 +583,6 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
hdr = &node->index->ihdr;
bit += 1;
- vbo = (u64)bit << ni->dir.idx2vbn_bits;
}
out:
@@ -593,5 +611,9 @@ const struct file_operations ntfs_dir_operations = {
.iterate_shared = ntfs_readdir,
.fsync = generic_file_fsync,
.open = ntfs_file_open,
+ .unlocked_ioctl = ntfs_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ntfs_compat_ioctl,
+#endif
};
// clang-format on
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index a5a30a2..5418662 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -48,7 +48,7 @@ static int ntfs_ioctl_fitrim(struct ntfs_sb_info *sbi, unsigned long arg)
return 0;
}
-static long ntfs_ioctl(struct file *filp, u32 cmd, unsigned long arg)
+long ntfs_ioctl(struct file *filp, u32 cmd, unsigned long arg)
{
struct inode *inode = file_inode(filp);
struct ntfs_sb_info *sbi = inode->i_sb->s_fs_info;
@@ -61,7 +61,7 @@ static long ntfs_ioctl(struct file *filp, u32 cmd, unsigned long arg)
}
#ifdef CONFIG_COMPAT
-static long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg)
+long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg)
{
return ntfs_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
@@ -188,6 +188,7 @@ static int ntfs_zero_range(struct inode *inode, u64 vbo, u64 vbo_to)
u32 bh_next, bh_off, to;
sector_t iblock;
struct folio *folio;
+ bool dirty = false;
for (; idx < idx_end; idx += 1, from = 0) {
page_off = (loff_t)idx << PAGE_SHIFT;
@@ -223,29 +224,27 @@ static int ntfs_zero_range(struct inode *inode, u64 vbo, u64 vbo_to)
/* Ok, it's mapped. Make sure it's up-to-date. */
if (folio_test_uptodate(folio))
set_buffer_uptodate(bh);
-
- if (!buffer_uptodate(bh)) {
- err = bh_read(bh, 0);
- if (err < 0) {
- folio_unlock(folio);
- folio_put(folio);
- goto out;
- }
+ else if (bh_read(bh, 0) < 0) {
+ err = -EIO;
+ folio_unlock(folio);
+ folio_put(folio);
+ goto out;
}
mark_buffer_dirty(bh);
-
} while (bh_off = bh_next, iblock += 1,
head != (bh = bh->b_this_page));
folio_zero_segment(folio, from, to);
+ dirty = true;
folio_unlock(folio);
folio_put(folio);
cond_resched();
}
out:
- mark_inode_dirty(inode);
+ if (dirty)
+ mark_inode_dirty(inode);
return err;
}
@@ -261,6 +260,9 @@ static int ntfs_file_mmap(struct file *file, struct vm_area_struct *vma)
bool rw = vma->vm_flags & VM_WRITE;
int err;
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
if (is_encrypted(ni)) {
ntfs_inode_warn(inode, "mmap encrypted not supported");
return -EOPNOTSUPP;
@@ -499,10 +501,14 @@ static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
ni_lock(ni);
err = attr_punch_hole(ni, vbo, len, &frame_size);
ni_unlock(ni);
+ if (!err)
+ goto ok;
+
if (err != E_NTFS_NOTALIGNED)
goto out;
/* Process not aligned punch. */
+ err = 0;
mask = frame_size - 1;
vbo_a = (vbo + mask) & ~mask;
end_a = end & ~mask;
@@ -525,6 +531,8 @@ static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
ni_lock(ni);
err = attr_punch_hole(ni, vbo_a, end_a - vbo_a, NULL);
ni_unlock(ni);
+ if (err)
+ goto out;
}
} else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
/*
@@ -564,6 +572,8 @@ static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
ni_lock(ni);
err = attr_insert_range(ni, vbo, len);
ni_unlock(ni);
+ if (err)
+ goto out;
} else {
/* Check new size. */
u8 cluster_bits = sbi->cluster_bits;
@@ -633,11 +643,18 @@ static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
&ni->file.run, i_size, &ni->i_valid,
true, NULL);
ni_unlock(ni);
+ if (err)
+ goto out;
} else if (new_size > i_size) {
- inode->i_size = new_size;
+ i_size_write(inode, new_size);
}
}
+ok:
+ err = file_modified(file);
+ if (err)
+ goto out;
+
out:
if (map_locked)
filemap_invalidate_unlock(mapping);
@@ -663,6 +680,9 @@ int ntfs3_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
umode_t mode = inode->i_mode;
int err;
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
err = setattr_prepare(idmap, dentry, attr);
if (err)
goto out;
@@ -676,7 +696,7 @@ int ntfs3_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
goto out;
}
inode_dio_wait(inode);
- oldsize = inode->i_size;
+ oldsize = i_size_read(inode);
newsize = attr->ia_size;
if (newsize <= oldsize)
@@ -688,7 +708,7 @@ int ntfs3_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
goto out;
ni->ni_flags |= NI_FLAG_UPDATE_PARENT;
- inode->i_size = newsize;
+ i_size_write(inode, newsize);
}
setattr_copy(idmap, inode, attr);
@@ -718,6 +738,9 @@ static ssize_t ntfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
struct inode *inode = file->f_mapping->host;
struct ntfs_inode *ni = ntfs_i(inode);
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
if (is_encrypted(ni)) {
ntfs_inode_warn(inode, "encrypted i/o not supported");
return -EOPNOTSUPP;
@@ -752,6 +775,9 @@ static ssize_t ntfs_file_splice_read(struct file *in, loff_t *ppos,
struct inode *inode = in->f_mapping->host;
struct ntfs_inode *ni = ntfs_i(inode);
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
if (is_encrypted(ni)) {
ntfs_inode_warn(inode, "encrypted i/o not supported");
return -EOPNOTSUPP;
@@ -821,7 +847,7 @@ static ssize_t ntfs_compress_write(struct kiocb *iocb, struct iov_iter *from)
size_t count = iov_iter_count(from);
loff_t pos = iocb->ki_pos;
struct inode *inode = file_inode(file);
- loff_t i_size = inode->i_size;
+ loff_t i_size = i_size_read(inode);
struct address_space *mapping = inode->i_mapping;
struct ntfs_inode *ni = ntfs_i(inode);
u64 valid = ni->i_valid;
@@ -1028,6 +1054,8 @@ static ssize_t ntfs_compress_write(struct kiocb *iocb, struct iov_iter *from)
iocb->ki_pos += written;
if (iocb->ki_pos > ni->i_valid)
ni->i_valid = iocb->ki_pos;
+ if (iocb->ki_pos > i_size)
+ i_size_write(inode, iocb->ki_pos);
return written;
}
@@ -1041,8 +1069,12 @@ static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
ssize_t ret;
+ int err;
struct ntfs_inode *ni = ntfs_i(inode);
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
if (is_encrypted(ni)) {
ntfs_inode_warn(inode, "encrypted i/o not supported");
return -EOPNOTSUPP;
@@ -1068,6 +1100,12 @@ static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (ret <= 0)
goto out;
+ err = file_modified(iocb->ki_filp);
+ if (err) {
+ ret = err;
+ goto out;
+ }
+
if (WARN_ON(ni->ni_flags & NI_FLAG_COMPRESSED_MASK)) {
/* Should never be here, see ntfs_file_open(). */
ret = -EOPNOTSUPP;
@@ -1097,6 +1135,9 @@ int ntfs_file_open(struct inode *inode, struct file *file)
{
struct ntfs_inode *ni = ntfs_i(inode);
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
if (unlikely((is_compressed(ni) || is_encrypted(ni)) &&
(file->f_flags & O_DIRECT))) {
return -EOPNOTSUPP;
@@ -1138,7 +1179,8 @@ static int ntfs_file_release(struct inode *inode, struct file *file)
down_write(&ni->file.run_lock);
err = attr_set_size(ni, ATTR_DATA, NULL, 0, &ni->file.run,
- inode->i_size, &ni->i_valid, false, NULL);
+ i_size_read(inode), &ni->i_valid, false,
+ NULL);
up_write(&ni->file.run_lock);
ni_unlock(ni);
diff --git a/fs/ntfs3/frecord.c b/fs/ntfs3/frecord.c
index 3df2d9e..7f27382 100644
--- a/fs/ntfs3/frecord.c
+++ b/fs/ntfs3/frecord.c
@@ -778,7 +778,7 @@ static int ni_try_remove_attr_list(struct ntfs_inode *ni)
run_deallocate(sbi, &ni->attr_list.run, true);
run_close(&ni->attr_list.run);
ni->attr_list.size = 0;
- kfree(ni->attr_list.le);
+ kvfree(ni->attr_list.le);
ni->attr_list.le = NULL;
ni->attr_list.dirty = false;
@@ -927,7 +927,7 @@ int ni_create_attr_list(struct ntfs_inode *ni)
return 0;
out:
- kfree(ni->attr_list.le);
+ kvfree(ni->attr_list.le);
ni->attr_list.le = NULL;
ni->attr_list.size = 0;
return err;
@@ -2099,7 +2099,7 @@ int ni_readpage_cmpr(struct ntfs_inode *ni, struct page *page)
gfp_t gfp_mask;
struct page *pg;
- if (vbo >= ni->vfs_inode.i_size) {
+ if (vbo >= i_size_read(&ni->vfs_inode)) {
SetPageUptodate(page);
err = 0;
goto out;
@@ -2173,7 +2173,7 @@ int ni_decompress_file(struct ntfs_inode *ni)
{
struct ntfs_sb_info *sbi = ni->mi.sbi;
struct inode *inode = &ni->vfs_inode;
- loff_t i_size = inode->i_size;
+ loff_t i_size = i_size_read(inode);
struct address_space *mapping = inode->i_mapping;
gfp_t gfp_mask = mapping_gfp_mask(mapping);
struct page **pages = NULL;
@@ -2508,6 +2508,7 @@ int ni_read_frame(struct ntfs_inode *ni, u64 frame_vbo, struct page **pages,
err = -EOPNOTSUPP;
goto out1;
#else
+ loff_t i_size = i_size_read(&ni->vfs_inode);
u32 frame_bits = ni_ext_compress_bits(ni);
u64 frame64 = frame_vbo >> frame_bits;
u64 frames, vbo_data;
@@ -2548,7 +2549,7 @@ int ni_read_frame(struct ntfs_inode *ni, u64 frame_vbo, struct page **pages,
}
}
- frames = (ni->vfs_inode.i_size - 1) >> frame_bits;
+ frames = (i_size - 1) >> frame_bits;
err = attr_wof_frame_info(ni, attr, run, frame64, frames,
frame_bits, &ondisk_size, &vbo_data);
@@ -2556,8 +2557,7 @@ int ni_read_frame(struct ntfs_inode *ni, u64 frame_vbo, struct page **pages,
goto out2;
if (frame64 == frames) {
- unc_size = 1 + ((ni->vfs_inode.i_size - 1) &
- (frame_size - 1));
+ unc_size = 1 + ((i_size - 1) & (frame_size - 1));
ondisk_size = attr_size(attr) - vbo_data;
} else {
unc_size = frame_size;
@@ -3259,6 +3259,9 @@ int ni_write_inode(struct inode *inode, int sync, const char *hint)
if (is_bad_inode(inode) || sb_rdonly(sb))
return 0;
+ if (unlikely(ntfs3_forced_shutdown(sb)))
+ return -EIO;
+
if (!ni_trylock(ni)) {
/* 'ni' is under modification, skip for now. */
mark_inode_dirty_sync(inode);
@@ -3288,7 +3291,7 @@ int ni_write_inode(struct inode *inode, int sync, const char *hint)
modified = true;
}
- ts = inode_get_mtime(inode);
+ ts = inode_get_ctime(inode);
dup.c_time = kernel2nt(&ts);
if (std->c_time != dup.c_time) {
std->c_time = dup.c_time;
diff --git a/fs/ntfs3/fslog.c b/fs/ntfs3/fslog.c
index 98ccb66..8555197 100644
--- a/fs/ntfs3/fslog.c
+++ b/fs/ntfs3/fslog.c
@@ -465,7 +465,7 @@ static inline bool is_rst_area_valid(const struct RESTART_HDR *rhdr)
{
const struct RESTART_AREA *ra;
u16 cl, fl, ul;
- u32 off, l_size, file_dat_bits, file_size_round;
+ u32 off, l_size, seq_bits;
u16 ro = le16_to_cpu(rhdr->ra_off);
u32 sys_page = le32_to_cpu(rhdr->sys_page_size);
@@ -511,13 +511,15 @@ static inline bool is_rst_area_valid(const struct RESTART_HDR *rhdr)
/* Make sure the sequence number bits match the log file size. */
l_size = le64_to_cpu(ra->l_size);
- file_dat_bits = sizeof(u64) * 8 - le32_to_cpu(ra->seq_num_bits);
- file_size_round = 1u << (file_dat_bits + 3);
- if (file_size_round != l_size &&
- (file_size_round < l_size || (file_size_round / 2) > l_size)) {
- return false;
+ seq_bits = sizeof(u64) * 8 + 3;
+ while (l_size) {
+ l_size >>= 1;
+ seq_bits -= 1;
}
+ if (seq_bits != ra->seq_num_bits)
+ return false;
+
/* The log page data offset and record header length must be quad-aligned. */
if (!IS_ALIGNED(le16_to_cpu(ra->data_off), 8) ||
!IS_ALIGNED(le16_to_cpu(ra->rec_hdr_len), 8))
@@ -974,6 +976,16 @@ static inline void *alloc_rsttbl_from_idx(struct RESTART_TABLE **tbl, u32 vbo)
return e;
}
+struct restart_info {
+ u64 last_lsn;
+ struct RESTART_HDR *r_page;
+ u32 vbo;
+ bool chkdsk_was_run;
+ bool valid_page;
+ bool initialized;
+ bool restart;
+};
+
#define RESTART_SINGLE_PAGE_IO cpu_to_le16(0x0001)
#define NTFSLOG_WRAPPED 0x00000001
@@ -987,6 +999,7 @@ struct ntfs_log {
struct ntfs_inode *ni;
u32 l_size;
+ u32 orig_file_size;
u32 sys_page_size;
u32 sys_page_mask;
u32 page_size;
@@ -1040,6 +1053,8 @@ struct ntfs_log {
struct CLIENT_ID client_id;
u32 client_undo_commit;
+
+ struct restart_info rst_info, rst_info2;
};
static inline u32 lsn_to_vbo(struct ntfs_log *log, const u64 lsn)
@@ -1105,16 +1120,6 @@ static inline bool verify_client_lsn(struct ntfs_log *log,
lsn <= le64_to_cpu(log->ra->current_lsn) && lsn;
}
-struct restart_info {
- u64 last_lsn;
- struct RESTART_HDR *r_page;
- u32 vbo;
- bool chkdsk_was_run;
- bool valid_page;
- bool initialized;
- bool restart;
-};
-
static int read_log_page(struct ntfs_log *log, u32 vbo,
struct RECORD_PAGE_HDR **buffer, bool *usa_error)
{
@@ -1176,7 +1181,7 @@ static int read_log_page(struct ntfs_log *log, u32 vbo,
* restart page header. It will stop the first time we find a
* valid page header.
*/
-static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first,
+static int log_read_rst(struct ntfs_log *log, bool first,
struct restart_info *info)
{
u32 skip, vbo;
@@ -1192,7 +1197,7 @@ static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first,
}
/* Loop continuously until we succeed. */
- for (; vbo < l_size; vbo = 2 * vbo + skip, skip = 0) {
+ for (; vbo < log->l_size; vbo = 2 * vbo + skip, skip = 0) {
bool usa_error;
bool brst, bchk;
struct RESTART_AREA *ra;
@@ -1285,22 +1290,17 @@ static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first,
/*
* Ilog_init_pg_hdr - Init @log from restart page header.
*/
-static void log_init_pg_hdr(struct ntfs_log *log, u32 sys_page_size,
- u32 page_size, u16 major_ver, u16 minor_ver)
+static void log_init_pg_hdr(struct ntfs_log *log, u16 major_ver, u16 minor_ver)
{
- log->sys_page_size = sys_page_size;
- log->sys_page_mask = sys_page_size - 1;
- log->page_size = page_size;
- log->page_mask = page_size - 1;
- log->page_bits = blksize_bits(page_size);
+ log->sys_page_size = log->page_size;
+ log->sys_page_mask = log->page_mask;
log->clst_per_page = log->page_size >> log->ni->mi.sbi->cluster_bits;
if (!log->clst_per_page)
log->clst_per_page = 1;
- log->first_page = major_ver >= 2 ?
- 0x22 * page_size :
- ((sys_page_size << 1) + (page_size << 1));
+ log->first_page = major_ver >= 2 ? 0x22 * log->page_size :
+ 4 * log->page_size;
log->major_ver = major_ver;
log->minor_ver = minor_ver;
}
@@ -1308,12 +1308,11 @@ static void log_init_pg_hdr(struct ntfs_log *log, u32 sys_page_size,
/*
* log_create - Init @log in cases when we don't have a restart area to use.
*/
-static void log_create(struct ntfs_log *log, u32 l_size, const u64 last_lsn,
+static void log_create(struct ntfs_log *log, const u64 last_lsn,
u32 open_log_count, bool wrapped, bool use_multi_page)
{
- log->l_size = l_size;
/* All file offsets must be quadword aligned. */
- log->file_data_bits = blksize_bits(l_size) - 3;
+ log->file_data_bits = blksize_bits(log->l_size) - 3;
log->seq_num_mask = (8 << log->file_data_bits) - 1;
log->seq_num_bits = sizeof(u64) * 8 - log->file_data_bits;
log->seq_num = (last_lsn >> log->file_data_bits) + 2;
@@ -3720,10 +3719,8 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
struct ntfs_sb_info *sbi = ni->mi.sbi;
struct ntfs_log *log;
- struct restart_info rst_info, rst_info2;
- u64 rec_lsn, ra_lsn, checkpt_lsn = 0, rlsn = 0;
+ u64 rec_lsn, checkpt_lsn = 0, rlsn = 0;
struct ATTR_NAME_ENTRY *attr_names = NULL;
- struct ATTR_NAME_ENTRY *ane;
struct RESTART_TABLE *dptbl = NULL;
struct RESTART_TABLE *trtbl = NULL;
const struct RESTART_TABLE *rt;
@@ -3741,9 +3738,7 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
struct TRANSACTION_ENTRY *tr;
struct DIR_PAGE_ENTRY *dp;
u32 i, bytes_per_attr_entry;
- u32 l_size = ni->vfs_inode.i_size;
- u32 orig_file_size = l_size;
- u32 page_size, vbo, tail, off, dlen;
+ u32 vbo, tail, off, dlen;
u32 saved_len, rec_len, transact_id;
bool use_second_page;
struct RESTART_AREA *ra2, *ra = NULL;
@@ -3758,52 +3753,50 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
u16 t16;
u32 t32;
- /* Get the size of page. NOTE: To replay we can use default page. */
-#if PAGE_SIZE >= DefaultLogPageSize && PAGE_SIZE <= DefaultLogPageSize * 2
- page_size = norm_file_page(PAGE_SIZE, &l_size, true);
-#else
- page_size = norm_file_page(PAGE_SIZE, &l_size, false);
-#endif
- if (!page_size)
- return -EINVAL;
-
log = kzalloc(sizeof(struct ntfs_log), GFP_NOFS);
if (!log)
return -ENOMEM;
log->ni = ni;
- log->l_size = l_size;
- log->one_page_buf = kmalloc(page_size, GFP_NOFS);
+ log->l_size = log->orig_file_size = ni->vfs_inode.i_size;
+ /* Get the size of page. NOTE: To replay we can use default page. */
+#if PAGE_SIZE >= DefaultLogPageSize && PAGE_SIZE <= DefaultLogPageSize * 2
+ log->page_size = norm_file_page(PAGE_SIZE, &log->l_size, true);
+#else
+ log->page_size = norm_file_page(PAGE_SIZE, &log->l_size, false);
+#endif
+ if (!log->page_size) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ log->one_page_buf = kmalloc(log->page_size, GFP_NOFS);
if (!log->one_page_buf) {
err = -ENOMEM;
goto out;
}
- log->page_size = page_size;
- log->page_mask = page_size - 1;
- log->page_bits = blksize_bits(page_size);
+ log->page_mask = log->page_size - 1;
+ log->page_bits = blksize_bits(log->page_size);
/* Look for a restart area on the disk. */
- memset(&rst_info, 0, sizeof(struct restart_info));
- err = log_read_rst(log, l_size, true, &rst_info);
+ err = log_read_rst(log, true, &log->rst_info);
if (err)
goto out;
/* remember 'initialized' */
- *initialized = rst_info.initialized;
+ *initialized = log->rst_info.initialized;
- if (!rst_info.restart) {
- if (rst_info.initialized) {
+ if (!log->rst_info.restart) {
+ if (log->rst_info.initialized) {
/* No restart area but the file is not initialized. */
err = -EINVAL;
goto out;
}
- log_init_pg_hdr(log, page_size, page_size, 1, 1);
- log_create(log, l_size, 0, get_random_u32(), false, false);
-
- log->ra = ra;
+ log_init_pg_hdr(log, 1, 1);
+ log_create(log, 0, get_random_u32(), false, false);
ra = log_create_ra(log);
if (!ra) {
@@ -3820,25 +3813,26 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
* If the restart offset above wasn't zero then we won't
* look for a second restart.
*/
- if (rst_info.vbo)
+ if (log->rst_info.vbo)
goto check_restart_area;
- memset(&rst_info2, 0, sizeof(struct restart_info));
- err = log_read_rst(log, l_size, false, &rst_info2);
+ err = log_read_rst(log, false, &log->rst_info2);
if (err)
goto out;
/* Determine which restart area to use. */
- if (!rst_info2.restart || rst_info2.last_lsn <= rst_info.last_lsn)
+ if (!log->rst_info2.restart ||
+ log->rst_info2.last_lsn <= log->rst_info.last_lsn)
goto use_first_page;
use_second_page = true;
- if (rst_info.chkdsk_was_run && page_size != rst_info.vbo) {
+ if (log->rst_info.chkdsk_was_run &&
+ log->page_size != log->rst_info.vbo) {
struct RECORD_PAGE_HDR *sp = NULL;
bool usa_error;
- if (!read_log_page(log, page_size, &sp, &usa_error) &&
+ if (!read_log_page(log, log->page_size, &sp, &usa_error) &&
sp->rhdr.sign == NTFS_CHKD_SIGNATURE) {
use_second_page = false;
}
@@ -3846,52 +3840,43 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
}
if (use_second_page) {
- kfree(rst_info.r_page);
- memcpy(&rst_info, &rst_info2, sizeof(struct restart_info));
- rst_info2.r_page = NULL;
+ kfree(log->rst_info.r_page);
+ memcpy(&log->rst_info, &log->rst_info2,
+ sizeof(struct restart_info));
+ log->rst_info2.r_page = NULL;
}
use_first_page:
- kfree(rst_info2.r_page);
+ kfree(log->rst_info2.r_page);
check_restart_area:
/*
* If the restart area is at offset 0, we want
* to write the second restart area first.
*/
- log->init_ra = !!rst_info.vbo;
+ log->init_ra = !!log->rst_info.vbo;
/* If we have a valid page then grab a pointer to the restart area. */
- ra2 = rst_info.valid_page ?
- Add2Ptr(rst_info.r_page,
- le16_to_cpu(rst_info.r_page->ra_off)) :
+ ra2 = log->rst_info.valid_page ?
+ Add2Ptr(log->rst_info.r_page,
+ le16_to_cpu(log->rst_info.r_page->ra_off)) :
NULL;
- if (rst_info.chkdsk_was_run ||
+ if (log->rst_info.chkdsk_was_run ||
(ra2 && ra2->client_idx[1] == LFS_NO_CLIENT_LE)) {
bool wrapped = false;
bool use_multi_page = false;
u32 open_log_count;
/* Do some checks based on whether we have a valid log page. */
- if (!rst_info.valid_page) {
- open_log_count = get_random_u32();
- goto init_log_instance;
- }
- open_log_count = le32_to_cpu(ra2->open_log_count);
+ open_log_count = log->rst_info.valid_page ?
+ le32_to_cpu(ra2->open_log_count) :
+ get_random_u32();
- /*
- * If the restart page size isn't changing then we want to
- * check how much work we need to do.
- */
- if (page_size != le32_to_cpu(rst_info.r_page->sys_page_size))
- goto init_log_instance;
+ log_init_pg_hdr(log, 1, 1);
-init_log_instance:
- log_init_pg_hdr(log, page_size, page_size, 1, 1);
-
- log_create(log, l_size, rst_info.last_lsn, open_log_count,
- wrapped, use_multi_page);
+ log_create(log, log->rst_info.last_lsn, open_log_count, wrapped,
+ use_multi_page);
ra = log_create_ra(log);
if (!ra) {
@@ -3916,28 +3901,27 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
* use the log file. We must use the system page size instead of the
* default size if there is not a clean shutdown.
*/
- t32 = le32_to_cpu(rst_info.r_page->sys_page_size);
- if (page_size != t32) {
- l_size = orig_file_size;
- page_size =
- norm_file_page(t32, &l_size, t32 == DefaultLogPageSize);
+ t32 = le32_to_cpu(log->rst_info.r_page->sys_page_size);
+ if (log->page_size != t32) {
+ log->l_size = log->orig_file_size;
+ log->page_size = norm_file_page(t32, &log->l_size,
+ t32 == DefaultLogPageSize);
}
- if (page_size != t32 ||
- page_size != le32_to_cpu(rst_info.r_page->page_size)) {
+ if (log->page_size != t32 ||
+ log->page_size != le32_to_cpu(log->rst_info.r_page->page_size)) {
err = -EINVAL;
goto out;
}
/* If the file size has shrunk then we won't mount it. */
- if (l_size < le64_to_cpu(ra2->l_size)) {
+ if (log->l_size < le64_to_cpu(ra2->l_size)) {
err = -EINVAL;
goto out;
}
- log_init_pg_hdr(log, page_size, page_size,
- le16_to_cpu(rst_info.r_page->major_ver),
- le16_to_cpu(rst_info.r_page->minor_ver));
+ log_init_pg_hdr(log, le16_to_cpu(log->rst_info.r_page->major_ver),
+ le16_to_cpu(log->rst_info.r_page->minor_ver));
log->l_size = le64_to_cpu(ra2->l_size);
log->seq_num_bits = le32_to_cpu(ra2->seq_num_bits);
@@ -3945,7 +3929,7 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
log->seq_num_mask = (8 << log->file_data_bits) - 1;
log->last_lsn = le64_to_cpu(ra2->current_lsn);
log->seq_num = log->last_lsn >> log->file_data_bits;
- log->ra_off = le16_to_cpu(rst_info.r_page->ra_off);
+ log->ra_off = le16_to_cpu(log->rst_info.r_page->ra_off);
log->restart_size = log->sys_page_size - log->ra_off;
log->record_header_len = le16_to_cpu(ra2->rec_hdr_len);
log->ra_size = le16_to_cpu(ra2->ra_len);
@@ -4045,7 +4029,7 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
log->current_avail = current_log_avail(log);
/* Remember which restart area to write first. */
- log->init_ra = rst_info.vbo;
+ log->init_ra = log->rst_info.vbo;
process_log:
/* 1.0, 1.1, 2.0 log->major_ver/minor_ver - short values. */
@@ -4105,7 +4089,7 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
log->client_id.seq_num = cr->seq_num;
log->client_id.client_idx = client;
- err = read_rst_area(log, &rst, &ra_lsn);
+ err = read_rst_area(log, &rst, &checkpt_lsn);
if (err)
goto out;
@@ -4114,9 +4098,8 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
bytes_per_attr_entry = !rst->major_ver ? 0x2C : 0x28;
- checkpt_lsn = le64_to_cpu(rst->check_point_start);
- if (!checkpt_lsn)
- checkpt_lsn = ra_lsn;
+ if (rst->check_point_start)
+ checkpt_lsn = le64_to_cpu(rst->check_point_start);
/* Allocate and Read the Transaction Table. */
if (!rst->transact_table_len)
@@ -4330,23 +4313,20 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
lcb = NULL;
check_attribute_names2:
- if (!rst->attr_names_len)
- goto trace_attribute_table;
-
- ane = attr_names;
- if (!oatbl)
- goto trace_attribute_table;
- while (ane->off) {
- /* TODO: Clear table on exit! */
- oe = Add2Ptr(oatbl, le16_to_cpu(ane->off));
- t16 = le16_to_cpu(ane->name_bytes);
- oe->name_len = t16 / sizeof(short);
- oe->ptr = ane->name;
- oe->is_attr_name = 2;
- ane = Add2Ptr(ane, sizeof(struct ATTR_NAME_ENTRY) + t16);
+ if (rst->attr_names_len && oatbl) {
+ struct ATTR_NAME_ENTRY *ane = attr_names;
+ while (ane->off) {
+ /* TODO: Clear table on exit! */
+ oe = Add2Ptr(oatbl, le16_to_cpu(ane->off));
+ t16 = le16_to_cpu(ane->name_bytes);
+ oe->name_len = t16 / sizeof(short);
+ oe->ptr = ane->name;
+ oe->is_attr_name = 2;
+ ane = Add2Ptr(ane,
+ sizeof(struct ATTR_NAME_ENTRY) + t16);
+ }
}
-trace_attribute_table:
/*
* If the checkpt_lsn is zero, then this is a freshly
* formatted disk and we have no work to do.
@@ -5189,7 +5169,7 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
kfree(oatbl);
kfree(dptbl);
kfree(attr_names);
- kfree(rst_info.r_page);
+ kfree(log->rst_info.r_page);
kfree(ra);
kfree(log->one_page_buf);
diff --git a/fs/ntfs3/fsntfs.c b/fs/ntfs3/fsntfs.c
index fbfe21d..ae2ef5c 100644
--- a/fs/ntfs3/fsntfs.c
+++ b/fs/ntfs3/fsntfs.c
@@ -853,7 +853,8 @@ void ntfs_update_mftmirr(struct ntfs_sb_info *sbi, int wait)
/*
* sb can be NULL here. In this case sbi->flags should be 0 too.
*/
- if (!sb || !(sbi->flags & NTFS_FLAGS_MFTMIRR))
+ if (!sb || !(sbi->flags & NTFS_FLAGS_MFTMIRR) ||
+ unlikely(ntfs3_forced_shutdown(sb)))
return;
blocksize = sb->s_blocksize;
@@ -1006,6 +1007,30 @@ static inline __le32 security_hash(const void *sd, size_t bytes)
return cpu_to_le32(hash);
}
+/*
+ * simple wrapper for sb_bread_unmovable.
+ */
+struct buffer_head *ntfs_bread(struct super_block *sb, sector_t block)
+{
+ struct ntfs_sb_info *sbi = sb->s_fs_info;
+ struct buffer_head *bh;
+
+ if (unlikely(block >= sbi->volume.blocks)) {
+ /* prevent generic message "attempt to access beyond end of device" */
+ ntfs_err(sb, "try to read out of volume at offset 0x%llx",
+ (u64)block << sb->s_blocksize_bits);
+ return NULL;
+ }
+
+ bh = sb_bread_unmovable(sb, block);
+ if (bh)
+ return bh;
+
+ ntfs_err(sb, "failed to read volume at offset 0x%llx",
+ (u64)block << sb->s_blocksize_bits);
+ return NULL;
+}
+
int ntfs_sb_read(struct super_block *sb, u64 lbo, size_t bytes, void *buffer)
{
struct block_device *bdev = sb->s_bdev;
@@ -2128,8 +2153,8 @@ int ntfs_insert_security(struct ntfs_sb_info *sbi,
if (le32_to_cpu(d_security->size) == new_sec_size &&
d_security->key.hash == hash_key.hash &&
!memcmp(d_security + 1, sd, size_sd)) {
- *security_id = d_security->key.sec_id;
/* Such security already exists. */
+ *security_id = d_security->key.sec_id;
err = 0;
goto out;
}
diff --git a/fs/ntfs3/index.c b/fs/ntfs3/index.c
index cf92b24..daabaad 100644
--- a/fs/ntfs3/index.c
+++ b/fs/ntfs3/index.c
@@ -1462,7 +1462,7 @@ static int indx_create_allocate(struct ntfs_index *indx, struct ntfs_inode *ni,
goto out2;
if (in->name == I30_NAME) {
- ni->vfs_inode.i_size = data_size;
+ i_size_write(&ni->vfs_inode, data_size);
inode_set_bytes(&ni->vfs_inode, alloc_size);
}
@@ -1544,7 +1544,7 @@ static int indx_add_allocate(struct ntfs_index *indx, struct ntfs_inode *ni,
}
if (in->name == I30_NAME)
- ni->vfs_inode.i_size = data_size;
+ i_size_write(&ni->vfs_inode, data_size);
*vbn = bit << indx->idx2vbn_bits;
@@ -2090,7 +2090,7 @@ static int indx_shrink(struct ntfs_index *indx, struct ntfs_inode *ni,
return err;
if (in->name == I30_NAME)
- ni->vfs_inode.i_size = new_data;
+ i_size_write(&ni->vfs_inode, new_data);
bpb = bitmap_size(bit);
if (bpb * 8 == nbits)
@@ -2576,7 +2576,7 @@ int indx_delete_entry(struct ntfs_index *indx, struct ntfs_inode *ni,
err = attr_set_size(ni, ATTR_ALLOC, in->name, in->name_len,
&indx->alloc_run, 0, NULL, false, NULL);
if (in->name == I30_NAME)
- ni->vfs_inode.i_size = 0;
+ i_size_write(&ni->vfs_inode, 0);
err = ni_remove_attr(ni, ATTR_ALLOC, in->name, in->name_len,
false, NULL);
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 5e3d713..eb7a8c9 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -345,9 +345,7 @@ static struct inode *ntfs_read_mft(struct inode *inode,
inode->i_size = le16_to_cpu(rp.SymbolicLinkReparseBuffer
.PrintNameLength) /
sizeof(u16);
-
ni->i_valid = inode->i_size;
-
/* Clear directory bit. */
if (ni->ni_flags & NI_FLAG_DIR) {
indx_clear(&ni->dir);
@@ -412,7 +410,6 @@ static struct inode *ntfs_read_mft(struct inode *inode,
goto out;
if (!is_match && name) {
- /* Reuse rec as buffer for ascii name. */
err = -ENOENT;
goto out;
}
@@ -427,6 +424,7 @@ static struct inode *ntfs_read_mft(struct inode *inode,
if (names != le16_to_cpu(rec->hard_links)) {
/* Correct minor error on the fly. Do not mark inode as dirty. */
+ ntfs_inode_warn(inode, "Correct links count -> %u.", names);
rec->hard_links = cpu_to_le16(names);
ni->mi.dirty = true;
}
@@ -653,9 +651,10 @@ static noinline int ntfs_get_block_vbo(struct inode *inode, u64 vbo,
off = vbo & (PAGE_SIZE - 1);
folio_set_bh(bh, folio, off);
- err = bh_read(bh, 0);
- if (err < 0)
+ if (bh_read(bh, 0) < 0) {
+ err = -EIO;
goto out;
+ }
folio_zero_segment(folio, off + voff, off + block_size);
}
}
@@ -853,9 +852,13 @@ static int ntfs_resident_writepage(struct folio *folio,
struct writeback_control *wbc, void *data)
{
struct address_space *mapping = data;
- struct ntfs_inode *ni = ntfs_i(mapping->host);
+ struct inode *inode = mapping->host;
+ struct ntfs_inode *ni = ntfs_i(inode);
int ret;
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
ni_lock(ni);
ret = attr_data_write_resident(ni, &folio->page);
ni_unlock(ni);
@@ -869,7 +872,12 @@ static int ntfs_resident_writepage(struct folio *folio,
static int ntfs_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
- if (is_resident(ntfs_i(mapping->host)))
+ struct inode *inode = mapping->host;
+
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
+ if (is_resident(ntfs_i(inode)))
return write_cache_pages(mapping, wbc, ntfs_resident_writepage,
mapping);
return mpage_writepages(mapping, wbc, ntfs_get_block);
@@ -889,6 +897,9 @@ int ntfs_write_begin(struct file *file, struct address_space *mapping,
struct inode *inode = mapping->host;
struct ntfs_inode *ni = ntfs_i(inode);
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
*pagep = NULL;
if (is_resident(ni)) {
struct page *page =
@@ -974,7 +985,7 @@ int ntfs_write_end(struct file *file, struct address_space *mapping, loff_t pos,
}
if (pos + err > inode->i_size) {
- inode->i_size = pos + err;
+ i_size_write(inode, pos + err);
dirty = true;
}
@@ -1306,6 +1317,11 @@ struct inode *ntfs_create_inode(struct mnt_idmap *idmap, struct inode *dir,
goto out1;
}
+ if (unlikely(ntfs3_forced_shutdown(sb))) {
+ err = -EIO;
+ goto out2;
+ }
+
/* Mark rw ntfs as dirty. it will be cleared at umount. */
ntfs_set_state(sbi, NTFS_DIRTY_DIRTY);
diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c
index ee3093b..084d19d 100644
--- a/fs/ntfs3/namei.c
+++ b/fs/ntfs3/namei.c
@@ -181,6 +181,9 @@ static int ntfs_unlink(struct inode *dir, struct dentry *dentry)
struct ntfs_inode *ni = ntfs_i(dir);
int err;
+ if (unlikely(ntfs3_forced_shutdown(dir->i_sb)))
+ return -EIO;
+
ni_lock_dir(ni);
err = ntfs_unlink_inode(dir, dentry);
@@ -199,6 +202,9 @@ static int ntfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
u32 size = strlen(symname);
struct inode *inode;
+ if (unlikely(ntfs3_forced_shutdown(dir->i_sb)))
+ return -EIO;
+
inode = ntfs_create_inode(idmap, dir, dentry, NULL, S_IFLNK | 0777, 0,
symname, size, NULL);
@@ -227,6 +233,9 @@ static int ntfs_rmdir(struct inode *dir, struct dentry *dentry)
struct ntfs_inode *ni = ntfs_i(dir);
int err;
+ if (unlikely(ntfs3_forced_shutdown(dir->i_sb)))
+ return -EIO;
+
ni_lock_dir(ni);
err = ntfs_unlink_inode(dir, dentry);
@@ -264,6 +273,9 @@ static int ntfs_rename(struct mnt_idmap *idmap, struct inode *dir,
1024);
static_assert(PATH_MAX >= 4 * 1024);
+ if (unlikely(ntfs3_forced_shutdown(sb)))
+ return -EIO;
+
if (flags & ~RENAME_NOREPLACE)
return -EINVAL;
@@ -419,7 +431,7 @@ static int ntfs_atomic_open(struct inode *dir, struct dentry *dentry,
* fnd contains tree's path to insert to.
* If fnd is not NULL then dir is locked.
*/
- inode = ntfs_create_inode(mnt_idmap(file->f_path.mnt), dir, dentry, uni,
+ inode = ntfs_create_inode(file_mnt_idmap(file), dir, dentry, uni,
mode, 0, NULL, 0, fnd);
err = IS_ERR(inode) ? PTR_ERR(inode) :
finish_open(file, dentry, ntfs_file_open);
diff --git a/fs/ntfs3/ntfs.h b/fs/ntfs3/ntfs.h
index 86aecbb..9c74781 100644
--- a/fs/ntfs3/ntfs.h
+++ b/fs/ntfs3/ntfs.h
@@ -523,12 +523,10 @@ struct ATTR_LIST_ENTRY {
__le64 vcn; // 0x08: Starting VCN of this attribute.
struct MFT_REF ref; // 0x10: MFT record number with attribute.
__le16 id; // 0x18: struct ATTRIB ID.
- __le16 name[3]; // 0x1A: Just to align. To get real name can use bNameOffset.
+ __le16 name[]; // 0x1A: To get real name use name_off.
}; // sizeof(0x20)
-static_assert(sizeof(struct ATTR_LIST_ENTRY) == 0x20);
-
static inline u32 le_size(u8 name_len)
{
return ALIGN(offsetof(struct ATTR_LIST_ENTRY, name) +
diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h
index f670614..79356fd 100644
--- a/fs/ntfs3/ntfs_fs.h
+++ b/fs/ntfs3/ntfs_fs.h
@@ -61,6 +61,8 @@ enum utf16_endian;
/* sbi->flags */
#define NTFS_FLAGS_NODISCARD 0x00000001
+/* ntfs in shutdown state. */
+#define NTFS_FLAGS_SHUTDOWN_BIT 0x00000002 /* == 4*/
/* Set when LogFile is replaying. */
#define NTFS_FLAGS_LOG_REPLAYING 0x00000008
/* Set when we changed first MFT's which copy must be updated in $MftMirr. */
@@ -226,7 +228,7 @@ struct ntfs_sb_info {
u64 maxbytes; // Maximum size for normal files.
u64 maxbytes_sparse; // Maximum size for sparse file.
- u32 flags; // See NTFS_FLAGS_XXX.
+ unsigned long flags; // See NTFS_FLAGS_
CLST zone_max; // Maximum MFT zone length in clusters
CLST bad_clusters; // The count of marked bad clusters.
@@ -473,7 +475,7 @@ bool al_delete_le(struct ntfs_inode *ni, enum ATTR_TYPE type, CLST vcn,
int al_update(struct ntfs_inode *ni, int sync);
static inline size_t al_aligned(size_t size)
{
- return (size + 1023) & ~(size_t)1023;
+ return size_add(size, 1023) & ~(size_t)1023;
}
/* Globals from bitfunc.c */
@@ -500,6 +502,8 @@ int ntfs3_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
int ntfs_file_open(struct inode *inode, struct file *file);
int ntfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len);
+long ntfs_ioctl(struct file *filp, u32 cmd, unsigned long arg);
+long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg);
extern const struct inode_operations ntfs_special_inode_operations;
extern const struct inode_operations ntfs_file_inode_operations;
extern const struct file_operations ntfs_file_operations;
@@ -584,6 +588,7 @@ bool check_index_header(const struct INDEX_HDR *hdr, size_t bytes);
int log_replay(struct ntfs_inode *ni, bool *initialized);
/* Globals from fsntfs.c */
+struct buffer_head *ntfs_bread(struct super_block *sb, sector_t block);
bool ntfs_fix_pre_write(struct NTFS_RECORD_HEADER *rhdr, size_t bytes);
int ntfs_fix_post_read(struct NTFS_RECORD_HEADER *rhdr, size_t bytes,
bool simple);
@@ -872,7 +877,7 @@ int ntfs_init_acl(struct mnt_idmap *idmap, struct inode *inode,
int ntfs_acl_chmod(struct mnt_idmap *idmap, struct dentry *dentry);
ssize_t ntfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
-extern const struct xattr_handler * const ntfs_xattr_handlers[];
+extern const struct xattr_handler *const ntfs_xattr_handlers[];
int ntfs_save_wsl_perm(struct inode *inode, __le16 *ea_size);
void ntfs_get_wsl_perm(struct inode *inode);
@@ -999,6 +1004,11 @@ static inline struct ntfs_sb_info *ntfs_sb(struct super_block *sb)
return sb->s_fs_info;
}
+static inline int ntfs3_forced_shutdown(struct super_block *sb)
+{
+ return test_bit(NTFS_FLAGS_SHUTDOWN_BIT, &ntfs_sb(sb)->flags);
+}
+
/*
* ntfs_up_cluster - Align up on cluster boundary.
*/
@@ -1025,19 +1035,6 @@ static inline u64 bytes_to_block(const struct super_block *sb, u64 size)
return (size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
}
-static inline struct buffer_head *ntfs_bread(struct super_block *sb,
- sector_t block)
-{
- struct buffer_head *bh = sb_bread(sb, block);
-
- if (bh)
- return bh;
-
- ntfs_err(sb, "failed to read volume at offset 0x%llx",
- (u64)block << sb->s_blocksize_bits);
- return NULL;
-}
-
static inline struct ntfs_inode *ntfs_i(struct inode *inode)
{
return container_of(inode, struct ntfs_inode, vfs_inode);
diff --git a/fs/ntfs3/record.c b/fs/ntfs3/record.c
index 53629b1..6aa3a9d 100644
--- a/fs/ntfs3/record.c
+++ b/fs/ntfs3/record.c
@@ -279,7 +279,7 @@ struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr)
if (t16 > asize)
return NULL;
- if (t16 + le32_to_cpu(attr->res.data_size) > asize)
+ if (le32_to_cpu(attr->res.data_size) > asize - t16)
return NULL;
t32 = sizeof(short) * attr->name_len;
@@ -535,8 +535,20 @@ bool mi_remove_attr(struct ntfs_inode *ni, struct mft_inode *mi,
return false;
if (ni && is_attr_indexed(attr)) {
- le16_add_cpu(&ni->mi.mrec->hard_links, -1);
- ni->mi.dirty = true;
+ u16 links = le16_to_cpu(ni->mi.mrec->hard_links);
+ struct ATTR_FILE_NAME *fname =
+ attr->type != ATTR_NAME ?
+ NULL :
+ resident_data_ex(attr,
+ SIZEOF_ATTRIBUTE_FILENAME);
+ if (fname && fname->type == FILE_NAME_DOS) {
+ /* Do not decrease links count deleting DOS name. */
+ } else if (!links) {
+ /* minor error. Not critical. */
+ } else {
+ ni->mi.mrec->hard_links = cpu_to_le16(links - 1);
+ ni->mi.dirty = true;
+ }
}
used -= asize;
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index 9153dff..cef5467 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -122,13 +122,12 @@ void ntfs_inode_printk(struct inode *inode, const char *fmt, ...)
if (name) {
struct dentry *de = d_find_alias(inode);
- const u32 name_len = ARRAY_SIZE(s_name_buf) - 1;
if (de) {
spin_lock(&de->d_lock);
- snprintf(name, name_len, " \"%s\"", de->d_name.name);
+ snprintf(name, sizeof(s_name_buf), " \"%s\"",
+ de->d_name.name);
spin_unlock(&de->d_lock);
- name[name_len] = 0; /* To be sure. */
} else {
name[0] = 0;
}
@@ -625,7 +624,7 @@ static void ntfs3_free_sbi(struct ntfs_sb_info *sbi)
{
kfree(sbi->new_rec);
kvfree(ntfs_put_shared(sbi->upcase));
- kfree(sbi->def_table);
+ kvfree(sbi->def_table);
kfree(sbi->compress.lznt);
#ifdef CONFIG_NTFS3_LZX_XPRESS
xpress_free_decompressor(sbi->compress.xpress);
@@ -715,6 +714,14 @@ static int ntfs_show_options(struct seq_file *m, struct dentry *root)
}
/*
+ * ntfs_shutdown - super_operations::shutdown
+ */
+static void ntfs_shutdown(struct super_block *sb)
+{
+ set_bit(NTFS_FLAGS_SHUTDOWN_BIT, &ntfs_sb(sb)->flags);
+}
+
+/*
* ntfs_sync_fs - super_operations::sync_fs
*/
static int ntfs_sync_fs(struct super_block *sb, int wait)
@@ -724,6 +731,9 @@ static int ntfs_sync_fs(struct super_block *sb, int wait)
struct ntfs_inode *ni;
struct inode *inode;
+ if (unlikely(ntfs3_forced_shutdown(sb)))
+ return -EIO;
+
ni = sbi->security.ni;
if (ni) {
inode = &ni->vfs_inode;
@@ -763,6 +773,7 @@ static const struct super_operations ntfs_sops = {
.put_super = ntfs_put_super,
.statfs = ntfs_statfs,
.show_options = ntfs_show_options,
+ .shutdown = ntfs_shutdown,
.sync_fs = ntfs_sync_fs,
.write_inode = ntfs3_write_inode,
};
@@ -866,6 +877,7 @@ static int ntfs_init_from_boot(struct super_block *sb, u32 sector_size,
u16 fn, ao;
u8 cluster_bits;
u32 boot_off = 0;
+ sector_t boot_block = 0;
const char *hint = "Primary boot";
/* Save original dev_size. Used with alternative boot. */
@@ -873,11 +885,11 @@ static int ntfs_init_from_boot(struct super_block *sb, u32 sector_size,
sbi->volume.blocks = dev_size >> PAGE_SHIFT;
- bh = ntfs_bread(sb, 0);
+read_boot:
+ bh = ntfs_bread(sb, boot_block);
if (!bh)
- return -EIO;
+ return boot_block ? -EINVAL : -EIO;
-check_boot:
err = -EINVAL;
/* Corrupted image; do not read OOB */
@@ -1108,26 +1120,24 @@ static int ntfs_init_from_boot(struct super_block *sb, u32 sector_size,
}
out:
- if (err == -EINVAL && !bh->b_blocknr && dev_size0 > PAGE_SHIFT) {
+ brelse(bh);
+
+ if (err == -EINVAL && !boot_block && dev_size0 > PAGE_SHIFT) {
u32 block_size = min_t(u32, sector_size, PAGE_SIZE);
u64 lbo = dev_size0 - sizeof(*boot);
- /*
- * Try alternative boot (last sector)
- */
- brelse(bh);
-
- sb_set_blocksize(sb, block_size);
- bh = ntfs_bread(sb, lbo >> blksize_bits(block_size));
- if (!bh)
- return -EINVAL;
-
+ boot_block = lbo >> blksize_bits(block_size);
boot_off = lbo & (block_size - 1);
- hint = "Alternative boot";
- dev_size = dev_size0; /* restore original size. */
- goto check_boot;
+ if (boot_block && block_size >= boot_off + sizeof(*boot)) {
+ /*
+ * Try alternative boot (last sector)
+ */
+ sb_set_blocksize(sb, block_size);
+ hint = "Alternative boot";
+ dev_size = dev_size0; /* restore original size. */
+ goto read_boot;
+ }
}
- brelse(bh);
return err;
}
diff --git a/fs/ntfs3/xattr.c b/fs/ntfs3/xattr.c
index 4274b6f..53e7d1f 100644
--- a/fs/ntfs3/xattr.c
+++ b/fs/ntfs3/xattr.c
@@ -219,6 +219,9 @@ static ssize_t ntfs_list_ea(struct ntfs_inode *ni, char *buffer,
if (!ea->name_len)
break;
+ if (ea->name_len > ea_size)
+ break;
+
if (buffer) {
/* Check if we can use field ea->name */
if (off + ea_size > size)
@@ -744,6 +747,9 @@ static int ntfs_getxattr(const struct xattr_handler *handler, struct dentry *de,
int err;
struct ntfs_inode *ni = ntfs_i(inode);
+ if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+ return -EIO;
+
/* Dispatch request. */
if (!strcmp(name, SYSTEM_DOS_ATTRIB)) {
/* system.dos_attrib */
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 4d7efef..1bde128 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -213,7 +213,7 @@ struct o2hb_region {
unsigned int hr_num_pages;
struct page **hr_slot_data;
- struct bdev_handle *hr_bdev_handle;
+ struct file *hr_bdev_file;
struct o2hb_disk_slot *hr_slots;
/* live node map of this region */
@@ -263,7 +263,7 @@ struct o2hb_region {
static inline struct block_device *reg_bdev(struct o2hb_region *reg)
{
- return reg->hr_bdev_handle ? reg->hr_bdev_handle->bdev : NULL;
+ return reg->hr_bdev_file ? file_bdev(reg->hr_bdev_file) : NULL;
}
struct o2hb_bio_wait_ctxt {
@@ -1509,8 +1509,8 @@ static void o2hb_region_release(struct config_item *item)
kfree(reg->hr_slot_data);
}
- if (reg->hr_bdev_handle)
- bdev_release(reg->hr_bdev_handle);
+ if (reg->hr_bdev_file)
+ fput(reg->hr_bdev_file);
kfree(reg->hr_slots);
@@ -1569,7 +1569,7 @@ static ssize_t o2hb_region_block_bytes_store(struct config_item *item,
unsigned long block_bytes;
unsigned int block_bits;
- if (reg->hr_bdev_handle)
+ if (reg->hr_bdev_file)
return -EINVAL;
status = o2hb_read_block_input(reg, page, &block_bytes,
@@ -1598,7 +1598,7 @@ static ssize_t o2hb_region_start_block_store(struct config_item *item,
char *p = (char *)page;
ssize_t ret;
- if (reg->hr_bdev_handle)
+ if (reg->hr_bdev_file)
return -EINVAL;
ret = kstrtoull(p, 0, &tmp);
@@ -1623,7 +1623,7 @@ static ssize_t o2hb_region_blocks_store(struct config_item *item,
unsigned long tmp;
char *p = (char *)page;
- if (reg->hr_bdev_handle)
+ if (reg->hr_bdev_file)
return -EINVAL;
tmp = simple_strtoul(p, &p, 0);
@@ -1642,7 +1642,7 @@ static ssize_t o2hb_region_dev_show(struct config_item *item, char *page)
{
unsigned int ret = 0;
- if (to_o2hb_region(item)->hr_bdev_handle)
+ if (to_o2hb_region(item)->hr_bdev_file)
ret = sprintf(page, "%pg\n", reg_bdev(to_o2hb_region(item)));
return ret;
@@ -1753,7 +1753,7 @@ static int o2hb_populate_slot_data(struct o2hb_region *reg)
}
/*
- * this is acting as commit; we set up all of hr_bdev_handle and hr_task or
+ * this is acting as commit; we set up all of hr_bdev_file and hr_task or
* nothing
*/
static ssize_t o2hb_region_dev_store(struct config_item *item,
@@ -1769,7 +1769,7 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
ssize_t ret = -EINVAL;
int live_threshold;
- if (reg->hr_bdev_handle)
+ if (reg->hr_bdev_file)
goto out;
/* We can't heartbeat without having had our node number
@@ -1795,11 +1795,11 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
if (!S_ISBLK(f.file->f_mapping->host->i_mode))
goto out2;
- reg->hr_bdev_handle = bdev_open_by_dev(f.file->f_mapping->host->i_rdev,
+ reg->hr_bdev_file = bdev_file_open_by_dev(f.file->f_mapping->host->i_rdev,
BLK_OPEN_WRITE | BLK_OPEN_READ, NULL, NULL);
- if (IS_ERR(reg->hr_bdev_handle)) {
- ret = PTR_ERR(reg->hr_bdev_handle);
- reg->hr_bdev_handle = NULL;
+ if (IS_ERR(reg->hr_bdev_file)) {
+ ret = PTR_ERR(reg->hr_bdev_file);
+ reg->hr_bdev_file = NULL;
goto out2;
}
@@ -1903,8 +1903,8 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
out3:
if (ret < 0) {
- bdev_release(reg->hr_bdev_handle);
- reg->hr_bdev_handle = NULL;
+ fput(reg->hr_bdev_file);
+ reg->hr_bdev_file = NULL;
}
out2:
fdput(f);
diff --git a/fs/ocfs2/locks.c b/fs/ocfs2/locks.c
index f37174e..6de9448 100644
--- a/fs/ocfs2/locks.c
+++ b/fs/ocfs2/locks.c
@@ -27,7 +27,7 @@ static int ocfs2_do_flock(struct file *file, struct inode *inode,
struct ocfs2_file_private *fp = file->private_data;
struct ocfs2_lock_res *lockres = &fp->fp_flock;
- if (fl->fl_type == F_WRLCK)
+ if (lock_is_write(fl))
level = 1;
if (!IS_SETLKW(cmd))
trylock = 1;
@@ -53,8 +53,8 @@ static int ocfs2_do_flock(struct file *file, struct inode *inode,
*/
locks_init_lock(&request);
- request.fl_type = F_UNLCK;
- request.fl_flags = FL_FLOCK;
+ request.c.flc_type = F_UNLCK;
+ request.c.flc_flags = FL_FLOCK;
locks_lock_file_wait(file, &request);
ocfs2_file_unlock(file);
@@ -100,14 +100,14 @@ int ocfs2_flock(struct file *file, int cmd, struct file_lock *fl)
struct inode *inode = file->f_mapping->host;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- if (!(fl->fl_flags & FL_FLOCK))
+ if (!(fl->c.flc_flags & FL_FLOCK))
return -ENOLCK;
if ((osb->s_mount_opt & OCFS2_MOUNT_LOCALFLOCKS) ||
ocfs2_mount_local(osb))
return locks_lock_file_wait(file, fl);
- if (fl->fl_type == F_UNLCK)
+ if (lock_is_unlock(fl))
return ocfs2_do_funlock(file, cmd, fl);
else
return ocfs2_do_flock(file, inode, cmd, fl);
@@ -118,7 +118,7 @@ int ocfs2_lock(struct file *file, int cmd, struct file_lock *fl)
struct inode *inode = file->f_mapping->host;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
- if (!(fl->fl_flags & FL_POSIX))
+ if (!(fl->c.flc_flags & FL_POSIX))
return -ENOLCK;
return ocfs2_plock(osb->cconn, OCFS2_I(inode)->ip_blkno, file, cmd, fl);
diff --git a/fs/ocfs2/stack_user.c b/fs/ocfs2/stack_user.c
index 9b76ee6..c11406c 100644
--- a/fs/ocfs2/stack_user.c
+++ b/fs/ocfs2/stack_user.c
@@ -744,7 +744,7 @@ static int user_plock(struct ocfs2_cluster_connection *conn,
return dlm_posix_cancel(conn->cc_lockspace, ino, file, fl);
else if (IS_GETLK(cmd))
return dlm_posix_get(conn->cc_lockspace, ino, file, fl);
- else if (fl->fl_type == F_UNLCK)
+ else if (lock_is_unlock(fl))
return dlm_posix_unlock(conn->cc_lockspace, ino, file, fl);
else
return dlm_posix_lock(conn->cc_lockspace, ino, file, cmd, fl);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 6b90642..a70aff1 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -2027,8 +2027,8 @@ static int ocfs2_initialize_super(struct super_block *sb,
cbits = le32_to_cpu(di->id2.i_super.s_clustersize_bits);
bbits = le32_to_cpu(di->id2.i_super.s_blocksize_bits);
sb->s_maxbytes = ocfs2_max_file_offset(bbits, cbits);
- memcpy(&sb->s_uuid, di->id2.i_super.s_uuid,
- sizeof(di->id2.i_super.s_uuid));
+ super_set_uuid(sb, di->id2.i_super.s_uuid,
+ sizeof(di->id2.i_super.s_uuid));
osb->osb_dx_mask = (1 << (cbits - bbits)) - 1;
diff --git a/fs/open.c b/fs/open.c
index a84d21e..a7d4bb2 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -154,49 +154,52 @@ COMPAT_SYSCALL_DEFINE2(truncate, const char __user *, path, compat_off_t, length
}
#endif
-long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
+long do_ftruncate(struct file *file, loff_t length, int small)
{
struct inode *inode;
struct dentry *dentry;
+ int error;
+
+ /* explicitly opened as large or we are on 64-bit box */
+ if (file->f_flags & O_LARGEFILE)
+ small = 0;
+
+ dentry = file->f_path.dentry;
+ inode = dentry->d_inode;
+ if (!S_ISREG(inode->i_mode) || !(file->f_mode & FMODE_WRITE))
+ return -EINVAL;
+
+ /* Cannot ftruncate over 2^31 bytes without large file support */
+ if (small && length > MAX_NON_LFS)
+ return -EINVAL;
+
+ /* Check IS_APPEND on real upper inode */
+ if (IS_APPEND(file_inode(file)))
+ return -EPERM;
+ sb_start_write(inode->i_sb);
+ error = security_file_truncate(file);
+ if (!error)
+ error = do_truncate(file_mnt_idmap(file), dentry, length,
+ ATTR_MTIME | ATTR_CTIME, file);
+ sb_end_write(inode->i_sb);
+
+ return error;
+}
+
+long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
+{
struct fd f;
int error;
- error = -EINVAL;
if (length < 0)
- goto out;
- error = -EBADF;
+ return -EINVAL;
f = fdget(fd);
if (!f.file)
- goto out;
+ return -EBADF;
- /* explicitly opened as large or we are on 64-bit box */
- if (f.file->f_flags & O_LARGEFILE)
- small = 0;
+ error = do_ftruncate(f.file, length, small);
- dentry = f.file->f_path.dentry;
- inode = dentry->d_inode;
- error = -EINVAL;
- if (!S_ISREG(inode->i_mode) || !(f.file->f_mode & FMODE_WRITE))
- goto out_putf;
-
- error = -EINVAL;
- /* Cannot ftruncate over 2^31 bytes without large file support */
- if (small && length > MAX_NON_LFS)
- goto out_putf;
-
- error = -EPERM;
- /* Check IS_APPEND on real upper inode */
- if (IS_APPEND(file_inode(f.file)))
- goto out_putf;
- sb_start_write(inode->i_sb);
- error = security_file_truncate(f.file);
- if (!error)
- error = do_truncate(file_mnt_idmap(f.file), dentry, length,
- ATTR_MTIME | ATTR_CTIME, f.file);
- sb_end_write(inode->i_sb);
-out_putf:
fdput(f);
-out:
return error;
}
@@ -1364,7 +1367,7 @@ struct file *filp_open(const char *filename, int flags, umode_t mode)
{
struct filename *name = getname_kernel(filename);
struct file *file = ERR_CAST(name);
-
+
if (!IS_ERR(name)) {
file = file_open_name(name, flags, mode);
putname(name);
diff --git a/fs/openpromfs/inode.c b/fs/openpromfs/inode.c
index c4b65a6..4a0779e 100644
--- a/fs/openpromfs/inode.c
+++ b/fs/openpromfs/inode.c
@@ -446,7 +446,7 @@ static int __init init_openprom_fs(void)
sizeof(struct op_inode_info),
0,
(SLAB_RECLAIM_ACCOUNT |
- SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
op_inode_init_once);
if (!op_inode_cachep)
return -ENOMEM;
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index b8e25ca..8586e2f 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -265,20 +265,18 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
if (IS_ERR(old_file))
return PTR_ERR(old_file);
+ /* Try to use clone_file_range to clone up within the same fs */
+ cloned = vfs_clone_file_range(old_file, 0, new_file, 0, len, 0);
+ if (cloned == len)
+ goto out_fput;
+
+ /* Couldn't clone, so now we try to copy the data */
error = rw_verify_area(READ, old_file, &old_pos, len);
if (!error)
error = rw_verify_area(WRITE, new_file, &new_pos, len);
if (error)
goto out_fput;
- /* Try to use clone_file_range to clone up within the same fs */
- ovl_start_write(dentry);
- cloned = do_clone_file_range(old_file, 0, new_file, 0, len, 0);
- ovl_end_write(dentry);
- if (cloned == len)
- goto out_fput;
- /* Couldn't clone, so now we try to copy the data */
-
/* Check if lower fs supports seek operation */
if (old_file->f_mode & FMODE_LSEEK)
skip_hole = true;
diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c
index 112b4b1..36dcc53 100644
--- a/fs/overlayfs/params.c
+++ b/fs/overlayfs/params.c
@@ -280,12 +280,20 @@ static int ovl_mount_dir_check(struct fs_context *fc, const struct path *path,
{
struct ovl_fs_context *ctx = fc->fs_private;
- if (ovl_dentry_weird(path->dentry))
- return invalfc(fc, "filesystem on %s not supported", name);
-
if (!d_is_dir(path->dentry))
return invalfc(fc, "%s is not a directory", name);
+ /*
+ * Root dentries of case-insensitive capable filesystems might
+ * not have the dentry operations set, but still be incompatible
+ * with overlayfs. Check explicitly to prevent post-mount
+ * failures.
+ */
+ if (sb_has_encoding(path->mnt->mnt_sb))
+ return invalfc(fc, "case-insensitive capable filesystem on %s not supported", name);
+
+ if (ovl_dentry_weird(path->dentry))
+ return invalfc(fc, "filesystem on %s not supported", name);
/*
* Check whether upper path is read-only here to report failures
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 2eef6c7..36d4b8b 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -28,41 +28,38 @@ MODULE_LICENSE("GPL");
struct ovl_dir_cache;
-static struct dentry *ovl_d_real(struct dentry *dentry,
- const struct inode *inode)
+static struct dentry *ovl_d_real(struct dentry *dentry, enum d_real_type type)
{
- struct dentry *real = NULL, *lower;
+ struct dentry *upper, *lower;
int err;
- /*
- * vfs is only expected to call d_real() with NULL from d_real_inode()
- * and with overlay inode from file_dentry() on an overlay file.
- *
- * TODO: remove @inode argument from d_real() API, remove code in this
- * function that deals with non-NULL @inode and remove d_real() call
- * from file_dentry().
- */
- if (inode && d_inode(dentry) == inode)
- return dentry;
- else if (inode)
+ switch (type) {
+ case D_REAL_DATA:
+ case D_REAL_METADATA:
+ break;
+ default:
goto bug;
+ }
if (!d_is_reg(dentry)) {
/* d_real_inode() is only relevant for regular files */
return dentry;
}
- real = ovl_dentry_upper(dentry);
- if (real && (inode == d_inode(real)))
- return real;
+ upper = ovl_dentry_upper(dentry);
+ if (upper && (type == D_REAL_METADATA ||
+ ovl_has_upperdata(d_inode(dentry))))
+ return upper;
- if (real && !inode && ovl_has_upperdata(d_inode(dentry)))
- return real;
+ if (type == D_REAL_METADATA) {
+ lower = ovl_dentry_lower(dentry);
+ goto real_lower;
+ }
/*
- * Best effort lazy lookup of lowerdata for !inode case to return
+ * Best effort lazy lookup of lowerdata for D_REAL_DATA case to return
* the real lowerdata dentry. The only current caller of d_real() with
- * NULL inode is d_real_inode() from trace_uprobe and this caller is
+ * D_REAL_DATA is d_real_inode() from trace_uprobe and this caller is
* likely going to be followed reading from the file, before placing
* uprobes on offset within the file, so lowerdata should be available
* when setting the uprobe.
@@ -73,18 +70,13 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
lower = ovl_dentry_lowerdata(dentry);
if (!lower)
goto bug;
- real = lower;
- /* Handle recursion */
- real = d_real(real, inode);
+real_lower:
+ /* Handle recursion into stacked lower fs */
+ return d_real(lower, type);
- if (!inode || inode == d_inode(real))
- return real;
bug:
- WARN(1, "%s(%pd4, %s:%lu): real dentry (%p/%lu) not found\n",
- __func__, dentry, inode ? inode->i_sb->s_id : "NULL",
- inode ? inode->i_ino : 0, real,
- real && d_inode(real) ? d_inode(real)->i_ino : 0);
+ WARN(1, "%s(%pd4, %d): real dentry not found\n", __func__, dentry, type);
return dentry;
}
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index a8e17f1..d285d1d 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -774,13 +774,14 @@ bool ovl_init_uuid_xattr(struct super_block *sb, struct ovl_fs *ofs,
const struct path *upperpath)
{
bool set = false;
+ uuid_t uuid;
int res;
/* Try to load existing persistent uuid */
- res = ovl_path_getxattr(ofs, upperpath, OVL_XATTR_UUID, sb->s_uuid.b,
+ res = ovl_path_getxattr(ofs, upperpath, OVL_XATTR_UUID, uuid.b,
UUID_SIZE);
if (res == UUID_SIZE)
- return true;
+ goto set_uuid;
if (res != -ENODATA)
goto fail;
@@ -808,17 +809,20 @@ bool ovl_init_uuid_xattr(struct super_block *sb, struct ovl_fs *ofs,
}
/* Generate overlay instance uuid */
- uuid_gen(&sb->s_uuid);
+ uuid_gen(&uuid);
/* Try to store persistent uuid */
set = true;
- res = ovl_setxattr(ofs, upperpath->dentry, OVL_XATTR_UUID, sb->s_uuid.b,
+ res = ovl_setxattr(ofs, upperpath->dentry, OVL_XATTR_UUID, uuid.b,
UUID_SIZE);
- if (res == 0)
- return true;
+ if (res)
+ goto fail;
+
+set_uuid:
+ super_set_uuid(sb, uuid.b, sizeof(uuid));
+ return true;
fail:
- memset(sb->s_uuid.b, 0, UUID_SIZE);
ofs->config.uuid = OVL_UUID_NULL;
pr_warn("failed to %s uuid (%pd2, err=%i); falling back to uuid=null.\n",
set ? "set" : "get", upperpath->dentry, res);
diff --git a/fs/pidfs.c b/fs/pidfs.c
new file mode 100644
index 0000000..8fd71a0
--- /dev/null
+++ b/fs/pidfs.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/magic.h>
+#include <linux/mount.h>
+#include <linux/pid.h>
+#include <linux/pidfs.h>
+#include <linux/pid_namespace.h>
+#include <linux/poll.h>
+#include <linux/proc_fs.h>
+#include <linux/proc_ns.h>
+#include <linux/pseudo_fs.h>
+#include <linux/seq_file.h>
+#include <uapi/linux/pidfd.h>
+
+#include "internal.h"
+
+static int pidfd_release(struct inode *inode, struct file *file)
+{
+#ifndef CONFIG_FS_PID
+ struct pid *pid = file->private_data;
+
+ file->private_data = NULL;
+ put_pid(pid);
+#endif
+ return 0;
+}
+
+#ifdef CONFIG_PROC_FS
+/**
+ * pidfd_show_fdinfo - print information about a pidfd
+ * @m: proc fdinfo file
+ * @f: file referencing a pidfd
+ *
+ * Pid:
+ * This function will print the pid that a given pidfd refers to in the
+ * pid namespace of the procfs instance.
+ * If the pid namespace of the process is not a descendant of the pid
+ * namespace of the procfs instance 0 will be shown as its pid. This is
+ * similar to calling getppid() on a process whose parent is outside of
+ * its pid namespace.
+ *
+ * NSpid:
+ * If pid namespaces are supported then this function will also print
+ * the pid of a given pidfd refers to for all descendant pid namespaces
+ * starting from the current pid namespace of the instance, i.e. the
+ * Pid field and the first entry in the NSpid field will be identical.
+ * If the pid namespace of the process is not a descendant of the pid
+ * namespace of the procfs instance 0 will be shown as its first NSpid
+ * entry and no others will be shown.
+ * Note that this differs from the Pid and NSpid fields in
+ * /proc/<pid>/status where Pid and NSpid are always shown relative to
+ * the pid namespace of the procfs instance. The difference becomes
+ * obvious when sending around a pidfd between pid namespaces from a
+ * different branch of the tree, i.e. where no ancestral relation is
+ * present between the pid namespaces:
+ * - create two new pid namespaces ns1 and ns2 in the initial pid
+ * namespace (also take care to create new mount namespaces in the
+ * new pid namespace and mount procfs)
+ * - create a process with a pidfd in ns1
+ * - send pidfd from ns1 to ns2
+ * - read /proc/self/fdinfo/<pidfd> and observe that both Pid and NSpid
+ * have exactly one entry, which is 0
+ */
+static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
+{
+ struct pid *pid = pidfd_pid(f);
+ struct pid_namespace *ns;
+ pid_t nr = -1;
+
+ if (likely(pid_has_task(pid, PIDTYPE_PID))) {
+ ns = proc_pid_ns(file_inode(m->file)->i_sb);
+ nr = pid_nr_ns(pid, ns);
+ }
+
+ seq_put_decimal_ll(m, "Pid:\t", nr);
+
+#ifdef CONFIG_PID_NS
+ seq_put_decimal_ll(m, "\nNSpid:\t", nr);
+ if (nr > 0) {
+ int i;
+
+ /* If nr is non-zero it means that 'pid' is valid and that
+ * ns, i.e. the pid namespace associated with the procfs
+ * instance, is in the pid namespace hierarchy of pid.
+ * Start at one below the already printed level.
+ */
+ for (i = ns->level + 1; i <= pid->level; i++)
+ seq_put_decimal_ll(m, "\t", pid->numbers[i].nr);
+ }
+#endif
+ seq_putc(m, '\n');
+}
+#endif
+
+/*
+ * Poll support for process exit notification.
+ */
+static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
+{
+ struct pid *pid = pidfd_pid(file);
+ bool thread = file->f_flags & PIDFD_THREAD;
+ struct task_struct *task;
+ __poll_t poll_flags = 0;
+
+ poll_wait(file, &pid->wait_pidfd, pts);
+ /*
+ * Depending on PIDFD_THREAD, inform pollers when the thread
+ * or the whole thread-group exits.
+ */
+ guard(rcu)();
+ task = pid_task(pid, PIDTYPE_PID);
+ if (!task)
+ poll_flags = EPOLLIN | EPOLLRDNORM | EPOLLHUP;
+ else if (task->exit_state && (thread || thread_group_empty(task)))
+ poll_flags = EPOLLIN | EPOLLRDNORM;
+
+ return poll_flags;
+}
+
+static const struct file_operations pidfs_file_operations = {
+ .release = pidfd_release,
+ .poll = pidfd_poll,
+#ifdef CONFIG_PROC_FS
+ .show_fdinfo = pidfd_show_fdinfo,
+#endif
+};
+
+struct pid *pidfd_pid(const struct file *file)
+{
+ if (file->f_op != &pidfs_file_operations)
+ return ERR_PTR(-EBADF);
+#ifdef CONFIG_FS_PID
+ return file_inode(file)->i_private;
+#else
+ return file->private_data;
+#endif
+}
+
+#ifdef CONFIG_FS_PID
+static struct vfsmount *pidfs_mnt __ro_after_init;
+
+/*
+ * The vfs falls back to simple_setattr() if i_op->setattr() isn't
+ * implemented. Let's reject it completely until we have a clean
+ * permission concept for pidfds.
+ */
+static int pidfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
+ struct iattr *attr)
+{
+ return -EOPNOTSUPP;
+}
+
+static int pidfs_getattr(struct mnt_idmap *idmap, const struct path *path,
+ struct kstat *stat, u32 request_mask,
+ unsigned int query_flags)
+{
+ struct inode *inode = d_inode(path->dentry);
+
+ generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
+ return 0;
+}
+
+static const struct inode_operations pidfs_inode_operations = {
+ .getattr = pidfs_getattr,
+ .setattr = pidfs_setattr,
+};
+
+static void pidfs_evict_inode(struct inode *inode)
+{
+ struct pid *pid = inode->i_private;
+
+ clear_inode(inode);
+ put_pid(pid);
+}
+
+static const struct super_operations pidfs_sops = {
+ .drop_inode = generic_delete_inode,
+ .evict_inode = pidfs_evict_inode,
+ .statfs = simple_statfs,
+};
+
+static char *pidfs_dname(struct dentry *dentry, char *buffer, int buflen)
+{
+ return dynamic_dname(buffer, buflen, "pidfd:[%lu]",
+ d_inode(dentry)->i_ino);
+}
+
+static const struct dentry_operations pidfs_dentry_operations = {
+ .d_delete = always_delete_dentry,
+ .d_dname = pidfs_dname,
+ .d_prune = stashed_dentry_prune,
+};
+
+static void pidfs_init_inode(struct inode *inode, void *data)
+{
+ inode->i_private = data;
+ inode->i_flags |= S_PRIVATE;
+ inode->i_mode |= S_IRWXU;
+ inode->i_op = &pidfs_inode_operations;
+ inode->i_fop = &pidfs_file_operations;
+}
+
+static void pidfs_put_data(void *data)
+{
+ struct pid *pid = data;
+ put_pid(pid);
+}
+
+static const struct stashed_operations pidfs_stashed_ops = {
+ .init_inode = pidfs_init_inode,
+ .put_data = pidfs_put_data,
+};
+
+static int pidfs_init_fs_context(struct fs_context *fc)
+{
+ struct pseudo_fs_context *ctx;
+
+ ctx = init_pseudo(fc, PID_FS_MAGIC);
+ if (!ctx)
+ return -ENOMEM;
+
+ ctx->ops = &pidfs_sops;
+ ctx->dops = &pidfs_dentry_operations;
+ fc->s_fs_info = (void *)&pidfs_stashed_ops;
+ return 0;
+}
+
+static struct file_system_type pidfs_type = {
+ .name = "pidfs",
+ .init_fs_context = pidfs_init_fs_context,
+ .kill_sb = kill_anon_super,
+};
+
+struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
+{
+
+ struct file *pidfd_file;
+ struct path path;
+ int ret;
+
+ /*
+ * Inode numbering for pidfs start at RESERVED_PIDS + 1.
+ * This avoids collisions with the root inode which is 1
+ * for pseudo filesystems.
+ */
+ ret = path_from_stashed(&pid->stashed, pid->ino, pidfs_mnt,
+ get_pid(pid), &path);
+ if (ret < 0)
+ return ERR_PTR(ret);
+
+ pidfd_file = dentry_open(&path, flags, current_cred());
+ path_put(&path);
+ return pidfd_file;
+}
+
+void __init pidfs_init(void)
+{
+ pidfs_mnt = kern_mount(&pidfs_type);
+ if (IS_ERR(pidfs_mnt))
+ panic("Failed to mount pidfs pseudo filesystem");
+}
+
+bool is_pidfs_sb(const struct super_block *sb)
+{
+ return sb == pidfs_mnt->mnt_sb;
+}
+
+#else /* !CONFIG_FS_PID */
+
+struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
+{
+ struct file *pidfd_file;
+
+ pidfd_file = anon_inode_getfile("[pidfd]", &pidfs_file_operations, pid,
+ flags | O_RDWR);
+ if (IS_ERR(pidfd_file))
+ return pidfd_file;
+
+ get_pid(pid);
+ return pidfd_file;
+}
+
+void __init pidfs_init(void) { }
+bool is_pidfs_sb(const struct super_block *sb)
+{
+ return false;
+}
+#endif
diff --git a/fs/pipe.c b/fs/pipe.c
index f1adbfe..50c8a85 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -76,18 +76,20 @@ static unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR;
* -- Manfred Spraul <manfred@colorfullife.com> 2002-05-09
*/
-static void pipe_lock_nested(struct pipe_inode_info *pipe, int subclass)
+#define cmp_int(l, r) ((l > r) - (l < r))
+
+#ifdef CONFIG_PROVE_LOCKING
+static int pipe_lock_cmp_fn(const struct lockdep_map *a,
+ const struct lockdep_map *b)
{
- if (pipe->files)
- mutex_lock_nested(&pipe->mutex, subclass);
+ return cmp_int((unsigned long) a, (unsigned long) b);
}
+#endif
void pipe_lock(struct pipe_inode_info *pipe)
{
- /*
- * pipe_lock() nests non-pipe inode locks (for writing to a file)
- */
- pipe_lock_nested(pipe, I_MUTEX_PARENT);
+ if (pipe->files)
+ mutex_lock(&pipe->mutex);
}
EXPORT_SYMBOL(pipe_lock);
@@ -98,28 +100,16 @@ void pipe_unlock(struct pipe_inode_info *pipe)
}
EXPORT_SYMBOL(pipe_unlock);
-static inline void __pipe_lock(struct pipe_inode_info *pipe)
-{
- mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
-}
-
-static inline void __pipe_unlock(struct pipe_inode_info *pipe)
-{
- mutex_unlock(&pipe->mutex);
-}
-
void pipe_double_lock(struct pipe_inode_info *pipe1,
struct pipe_inode_info *pipe2)
{
BUG_ON(pipe1 == pipe2);
- if (pipe1 < pipe2) {
- pipe_lock_nested(pipe1, I_MUTEX_PARENT);
- pipe_lock_nested(pipe2, I_MUTEX_CHILD);
- } else {
- pipe_lock_nested(pipe2, I_MUTEX_PARENT);
- pipe_lock_nested(pipe1, I_MUTEX_CHILD);
- }
+ if (pipe1 > pipe2)
+ swap(pipe1, pipe2);
+
+ pipe_lock(pipe1);
+ pipe_lock(pipe2);
}
static void anon_pipe_buf_release(struct pipe_inode_info *pipe,
@@ -271,7 +261,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
return 0;
ret = 0;
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
/*
* We only wake up writers if the pipe was full when we started
@@ -368,7 +358,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
ret = -EAGAIN;
break;
}
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
/*
* We only get here if we didn't actually read anything.
@@ -400,13 +390,13 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
if (wait_event_interruptible_exclusive(pipe->rd_wait, pipe_readable(pipe)) < 0)
return -ERESTARTSYS;
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
wake_next_reader = true;
}
if (pipe_empty(pipe->head, pipe->tail))
wake_next_reader = false;
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
if (was_full)
wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
@@ -462,7 +452,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
if (unlikely(total_len == 0))
return 0;
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
if (!pipe->readers) {
send_sig(SIGPIPE, current, 0);
@@ -582,19 +572,19 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
* after waiting we need to re-check whether the pipe
* become empty while we dropped the lock.
*/
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
if (was_empty)
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
was_empty = pipe_empty(pipe->head, pipe->tail);
wake_next_writer = true;
}
out:
if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
wake_next_writer = false;
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
/*
* If we do do a wakeup event, we do a 'sync' wakeup, because we
@@ -629,7 +619,7 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
switch (cmd) {
case FIONREAD:
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
count = 0;
head = pipe->head;
tail = pipe->tail;
@@ -639,16 +629,16 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
count += pipe->bufs[tail & mask].len;
tail++;
}
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
return put_user(count, (int __user *)arg);
#ifdef CONFIG_WATCH_QUEUE
case IOC_WATCH_QUEUE_SET_SIZE: {
int ret;
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
ret = watch_queue_set_size(pipe, arg);
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
return ret;
}
@@ -734,7 +724,7 @@ pipe_release(struct inode *inode, struct file *file)
{
struct pipe_inode_info *pipe = file->private_data;
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
if (file->f_mode & FMODE_READ)
pipe->readers--;
if (file->f_mode & FMODE_WRITE)
@@ -747,7 +737,7 @@ pipe_release(struct inode *inode, struct file *file)
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
}
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
put_pipe_info(inode, pipe);
return 0;
@@ -759,7 +749,7 @@ pipe_fasync(int fd, struct file *filp, int on)
struct pipe_inode_info *pipe = filp->private_data;
int retval = 0;
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
if (filp->f_mode & FMODE_READ)
retval = fasync_helper(fd, filp, on, &pipe->fasync_readers);
if ((filp->f_mode & FMODE_WRITE) && retval >= 0) {
@@ -768,7 +758,7 @@ pipe_fasync(int fd, struct file *filp, int on)
/* this can happen only if on == T */
fasync_helper(-1, filp, 0, &pipe->fasync_readers);
}
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
return retval;
}
@@ -834,6 +824,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
pipe->nr_accounted = pipe_bufs;
pipe->user = user;
mutex_init(&pipe->mutex);
+ lock_set_cmp_fn(&pipe->mutex, pipe_lock_cmp_fn, NULL);
return pipe;
}
@@ -1144,7 +1135,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
filp->private_data = pipe;
/* OK, we have a pipe and it's pinned down */
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
/* We can only do regular read/write on fifos */
stream_open(inode, filp);
@@ -1214,7 +1205,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
}
/* Ok! */
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
return 0;
err_rd:
@@ -1230,7 +1221,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
goto err;
err:
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
put_pipe_info(inode, pipe);
return ret;
@@ -1411,7 +1402,7 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned int arg)
if (!pipe)
return -EBADF;
- __pipe_lock(pipe);
+ mutex_lock(&pipe->mutex);
switch (cmd) {
case F_SETPIPE_SZ:
@@ -1425,7 +1416,7 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned int arg)
break;
}
- __pipe_unlock(pipe);
+ mutex_unlock(&pipe->mutex);
return ret;
}
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index e1af208..6bf587d 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -786,12 +786,12 @@ struct posix_acl *posix_acl_from_xattr(struct user_namespace *userns,
return ERR_PTR(count);
if (count == 0)
return NULL;
-
+
acl = posix_acl_alloc(count, GFP_NOFS);
if (!acl)
return ERR_PTR(-ENOMEM);
acl_e = acl->a_entries;
-
+
for (end = entry + count; entry != end; acl_e++, entry++) {
acl_e->e_tag = le16_to_cpu(entry->e_tag);
acl_e->e_perm = le16_to_cpu(entry->e_perm);
diff --git a/fs/proc/array.c b/fs/proc/array.c
index ff08a89..34a47fb 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -477,13 +477,13 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
int permitted;
struct mm_struct *mm;
unsigned long long start_time;
- unsigned long cmin_flt = 0, cmaj_flt = 0;
- unsigned long min_flt = 0, maj_flt = 0;
- u64 cutime, cstime, utime, stime;
- u64 cgtime, gtime;
+ unsigned long cmin_flt, cmaj_flt, min_flt, maj_flt;
+ u64 cutime, cstime, cgtime, utime, stime, gtime;
unsigned long rsslim = 0;
unsigned long flags;
int exit_code = task->exit_code;
+ struct signal_struct *sig = task->signal;
+ unsigned int seq = 1;
state = *get_task_state(task);
vsize = eip = esp = 0;
@@ -511,12 +511,8 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
sigemptyset(&sigign);
sigemptyset(&sigcatch);
- cutime = cstime = utime = stime = 0;
- cgtime = gtime = 0;
if (lock_task_sighand(task, &flags)) {
- struct signal_struct *sig = task->signal;
-
if (sig->tty) {
struct pid *pgrp = tty_get_pgrp(sig->tty);
tty_pgrp = pid_nr_ns(pgrp, ns);
@@ -527,28 +523,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
num_threads = get_nr_threads(task);
collect_sigign_sigcatch(task, &sigign, &sigcatch);
- cmin_flt = sig->cmin_flt;
- cmaj_flt = sig->cmaj_flt;
- cutime = sig->cutime;
- cstime = sig->cstime;
- cgtime = sig->cgtime;
rsslim = READ_ONCE(sig->rlim[RLIMIT_RSS].rlim_cur);
- /* add up live thread stats at the group level */
if (whole) {
- struct task_struct *t;
-
- __for_each_thread(sig, t) {
- min_flt += t->min_flt;
- maj_flt += t->maj_flt;
- gtime += task_gtime(t);
- }
-
- min_flt += sig->min_flt;
- maj_flt += sig->maj_flt;
- thread_group_cputime_adjusted(task, &utime, &stime);
- gtime += sig->gtime;
-
if (sig->flags & (SIGNAL_GROUP_EXIT | SIGNAL_STOP_STOPPED))
exit_code = sig->group_exit_code;
}
@@ -562,10 +539,41 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
if (permitted && (!whole || num_threads < 2))
wchan = !task_is_running(task);
- if (!whole) {
+
+ do {
+ seq++; /* 2 on the 1st/lockless path, otherwise odd */
+ flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
+
+ cmin_flt = sig->cmin_flt;
+ cmaj_flt = sig->cmaj_flt;
+ cutime = sig->cutime;
+ cstime = sig->cstime;
+ cgtime = sig->cgtime;
+
+ if (whole) {
+ struct task_struct *t;
+
+ min_flt = sig->min_flt;
+ maj_flt = sig->maj_flt;
+ gtime = sig->gtime;
+
+ rcu_read_lock();
+ __for_each_thread(sig, t) {
+ min_flt += t->min_flt;
+ maj_flt += t->maj_flt;
+ gtime += task_gtime(t);
+ }
+ rcu_read_unlock();
+ }
+ } while (need_seqretry(&sig->stats_lock, seq));
+ done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
+
+ if (whole) {
+ thread_group_cputime_adjusted(task, &utime, &stime);
+ } else {
+ task_cputime_adjusted(task, &utime, &stime);
min_flt = task->min_flt;
maj_flt = task->maj_flt;
- task_cputime_adjusted(task, &utime, &stime);
gtime = task_gtime(task);
}
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 98a031a..18550c07 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1878,8 +1878,6 @@ void proc_pid_evict_inode(struct proc_inode *ei)
hlist_del_init_rcu(&ei->sibling_inodes);
spin_unlock(&pid->lock);
}
-
- put_pid(pid);
}
struct inode *proc_pid_make_inode(struct super_block *sb,
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index b33e490..dcd513d 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -30,7 +30,6 @@
static void proc_evict_inode(struct inode *inode)
{
- struct proc_dir_entry *de;
struct ctl_table_header *head;
struct proc_inode *ei = PROC_I(inode);
@@ -38,17 +37,8 @@ static void proc_evict_inode(struct inode *inode)
clear_inode(inode);
/* Stop tracking associated processes */
- if (ei->pid) {
+ if (ei->pid)
proc_pid_evict_inode(ei);
- ei->pid = NULL;
- }
-
- /* Let go of any associated proc directory entry */
- de = ei->pde;
- if (de) {
- pde_put(de);
- ei->pde = NULL;
- }
head = ei->sysctl;
if (head) {
@@ -80,6 +70,13 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
static void proc_free_inode(struct inode *inode)
{
+ struct proc_inode *ei = PROC_I(inode);
+
+ if (ei->pid)
+ put_pid(ei->pid);
+ /* Let go of any associated proc directory entry */
+ if (ei->pde)
+ pde_put(ei->pde);
kmem_cache_free(proc_inode_cachep, PROC_I(inode));
}
@@ -95,7 +92,7 @@ void __init proc_init_kmemcache(void)
proc_inode_cachep = kmem_cache_create("proc_inode_cache",
sizeof(struct proc_inode),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT|
+ SLAB_ACCOUNT|
SLAB_PANIC),
init_once);
pde_opener_cache =
diff --git a/fs/proc/root.c b/fs/proc/root.c
index b55dbc7..06a297a 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -271,7 +271,7 @@ static void proc_kill_sb(struct super_block *sb)
kill_anon_super(sb);
put_pid_ns(fs_info->pid_ns);
- kfree(fs_info);
+ kfree_rcu(fs_info, rcu);
}
static struct file_system_type proc_fs_type = {
diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c
index 6eb9bb3..7b5711f 100644
--- a/fs/qnx4/inode.c
+++ b/fs/qnx4/inode.c
@@ -21,6 +21,7 @@
#include <linux/buffer_head.h>
#include <linux/writeback.h>
#include <linux/statfs.h>
+#include <linux/fs_context.h>
#include "qnx4.h"
#define QNX4_VERSION 4
@@ -30,28 +31,33 @@ static const struct super_operations qnx4_sops;
static struct inode *qnx4_alloc_inode(struct super_block *sb);
static void qnx4_free_inode(struct inode *inode);
-static int qnx4_remount(struct super_block *sb, int *flags, char *data);
static int qnx4_statfs(struct dentry *, struct kstatfs *);
+static int qnx4_get_tree(struct fs_context *fc);
static const struct super_operations qnx4_sops =
{
.alloc_inode = qnx4_alloc_inode,
.free_inode = qnx4_free_inode,
.statfs = qnx4_statfs,
- .remount_fs = qnx4_remount,
};
-static int qnx4_remount(struct super_block *sb, int *flags, char *data)
+static int qnx4_reconfigure(struct fs_context *fc)
{
+ struct super_block *sb = fc->root->d_sb;
struct qnx4_sb_info *qs;
sync_filesystem(sb);
qs = qnx4_sb(sb);
qs->Version = QNX4_VERSION;
- *flags |= SB_RDONLY;
+ fc->sb_flags |= SB_RDONLY;
return 0;
}
+static const struct fs_context_operations qnx4_context_opts = {
+ .get_tree = qnx4_get_tree,
+ .reconfigure = qnx4_reconfigure,
+};
+
static int qnx4_get_block( struct inode *inode, sector_t iblock, struct buffer_head *bh, int create )
{
unsigned long phys;
@@ -183,12 +189,13 @@ static const char *qnx4_checkroot(struct super_block *sb,
return "bitmap file not found.";
}
-static int qnx4_fill_super(struct super_block *s, void *data, int silent)
+static int qnx4_fill_super(struct super_block *s, struct fs_context *fc)
{
struct buffer_head *bh;
struct inode *root;
const char *errmsg;
struct qnx4_sb_info *qs;
+ int silent = fc->sb_flags & SB_SILENT;
qs = kzalloc(sizeof(struct qnx4_sb_info), GFP_KERNEL);
if (!qs)
@@ -216,7 +223,7 @@ static int qnx4_fill_super(struct super_block *s, void *data, int silent)
errmsg = qnx4_checkroot(s, (struct qnx4_super_block *) bh->b_data);
brelse(bh);
if (errmsg != NULL) {
- if (!silent)
+ if (!silent)
printk(KERN_ERR "qnx4: %s\n", errmsg);
return -EINVAL;
}
@@ -235,6 +242,18 @@ static int qnx4_fill_super(struct super_block *s, void *data, int silent)
return 0;
}
+static int qnx4_get_tree(struct fs_context *fc)
+{
+ return get_tree_bdev(fc, qnx4_fill_super);
+}
+
+static int qnx4_init_fs_context(struct fs_context *fc)
+{
+ fc->ops = &qnx4_context_opts;
+
+ return 0;
+}
+
static void qnx4_kill_sb(struct super_block *sb)
{
struct qnx4_sb_info *qs = qnx4_sb(sb);
@@ -376,18 +395,12 @@ static void destroy_inodecache(void)
kmem_cache_destroy(qnx4_inode_cachep);
}
-static struct dentry *qnx4_mount(struct file_system_type *fs_type,
- int flags, const char *dev_name, void *data)
-{
- return mount_bdev(fs_type, flags, dev_name, data, qnx4_fill_super);
-}
-
static struct file_system_type qnx4_fs_type = {
- .owner = THIS_MODULE,
- .name = "qnx4",
- .mount = qnx4_mount,
- .kill_sb = qnx4_kill_sb,
- .fs_flags = FS_REQUIRES_DEV,
+ .owner = THIS_MODULE,
+ .name = "qnx4",
+ .kill_sb = qnx4_kill_sb,
+ .fs_flags = FS_REQUIRES_DEV,
+ .init_fs_context = qnx4_init_fs_context,
};
MODULE_ALIAS_FS("qnx4");
diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c
index a286c545..405913f 100644
--- a/fs/qnx6/inode.c
+++ b/fs/qnx6/inode.c
@@ -615,7 +615,7 @@ static int init_inodecache(void)
qnx6_inode_cachep = kmem_cache_create("qnx6_inode_cache",
sizeof(struct qnx6_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ SLAB_ACCOUNT),
init_once);
if (!qnx6_inode_cachep)
return -ENOMEM;
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index 171c912..6474529 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -2386,7 +2386,7 @@ static int journal_read(struct super_block *sb)
cur_dblock = SB_ONDISK_JOURNAL_1st_BLOCK(sb);
reiserfs_info(sb, "checking transaction log (%pg)\n",
- journal->j_bdev_handle->bdev);
+ file_bdev(journal->j_bdev_file));
start = ktime_get_seconds();
/*
@@ -2447,7 +2447,7 @@ static int journal_read(struct super_block *sb)
* device and journal device to be the same
*/
d_bh =
- reiserfs_breada(journal->j_bdev_handle->bdev, cur_dblock,
+ reiserfs_breada(file_bdev(journal->j_bdev_file), cur_dblock,
sb->s_blocksize,
SB_ONDISK_JOURNAL_1st_BLOCK(sb) +
SB_ONDISK_JOURNAL_SIZE(sb));
@@ -2588,9 +2588,9 @@ static void journal_list_init(struct super_block *sb)
static void release_journal_dev(struct reiserfs_journal *journal)
{
- if (journal->j_bdev_handle) {
- bdev_release(journal->j_bdev_handle);
- journal->j_bdev_handle = NULL;
+ if (journal->j_bdev_file) {
+ fput(journal->j_bdev_file);
+ journal->j_bdev_file = NULL;
}
}
@@ -2605,7 +2605,7 @@ static int journal_init_dev(struct super_block *super,
result = 0;
- journal->j_bdev_handle = NULL;
+ journal->j_bdev_file = NULL;
jdev = SB_ONDISK_JOURNAL_DEVICE(super) ?
new_decode_dev(SB_ONDISK_JOURNAL_DEVICE(super)) : super->s_dev;
@@ -2616,37 +2616,37 @@ static int journal_init_dev(struct super_block *super,
if ((!jdev_name || !jdev_name[0])) {
if (jdev == super->s_dev)
holder = NULL;
- journal->j_bdev_handle = bdev_open_by_dev(jdev, blkdev_mode,
+ journal->j_bdev_file = bdev_file_open_by_dev(jdev, blkdev_mode,
holder, NULL);
- if (IS_ERR(journal->j_bdev_handle)) {
- result = PTR_ERR(journal->j_bdev_handle);
- journal->j_bdev_handle = NULL;
+ if (IS_ERR(journal->j_bdev_file)) {
+ result = PTR_ERR(journal->j_bdev_file);
+ journal->j_bdev_file = NULL;
reiserfs_warning(super, "sh-458",
"cannot init journal device unknown-block(%u,%u): %i",
MAJOR(jdev), MINOR(jdev), result);
return result;
} else if (jdev != super->s_dev)
- set_blocksize(journal->j_bdev_handle->bdev,
+ set_blocksize(file_bdev(journal->j_bdev_file),
super->s_blocksize);
return 0;
}
- journal->j_bdev_handle = bdev_open_by_path(jdev_name, blkdev_mode,
+ journal->j_bdev_file = bdev_file_open_by_path(jdev_name, blkdev_mode,
holder, NULL);
- if (IS_ERR(journal->j_bdev_handle)) {
- result = PTR_ERR(journal->j_bdev_handle);
- journal->j_bdev_handle = NULL;
+ if (IS_ERR(journal->j_bdev_file)) {
+ result = PTR_ERR(journal->j_bdev_file);
+ journal->j_bdev_file = NULL;
reiserfs_warning(super, "sh-457",
"journal_init_dev: Cannot open '%s': %i",
jdev_name, result);
return result;
}
- set_blocksize(journal->j_bdev_handle->bdev, super->s_blocksize);
+ set_blocksize(file_bdev(journal->j_bdev_file), super->s_blocksize);
reiserfs_info(super,
"journal_init_dev: journal device: %pg\n",
- journal->j_bdev_handle->bdev);
+ file_bdev(journal->j_bdev_file));
return 0;
}
@@ -2804,7 +2804,7 @@ int journal_init(struct super_block *sb, const char *j_dev_name,
"journal header magic %x (device %pg) does "
"not match to magic found in super block %x",
jh->jh_journal.jp_journal_magic,
- journal->j_bdev_handle->bdev,
+ file_bdev(journal->j_bdev_file),
sb_jp_journal_magic(rs));
brelse(bhjh);
goto free_and_return;
@@ -2828,7 +2828,7 @@ int journal_init(struct super_block *sb, const char *j_dev_name,
reiserfs_info(sb, "journal params: device %pg, size %u, "
"journal first block %u, max trans len %u, max batch %u, "
"max commit age %u, max trans age %u\n",
- journal->j_bdev_handle->bdev,
+ file_bdev(journal->j_bdev_file),
SB_ONDISK_JOURNAL_SIZE(sb),
SB_ONDISK_JOURNAL_1st_BLOCK(sb),
journal->j_trans_max,
diff --git a/fs/reiserfs/procfs.c b/fs/reiserfs/procfs.c
index 83cb940..5c68a4a 100644
--- a/fs/reiserfs/procfs.c
+++ b/fs/reiserfs/procfs.c
@@ -354,7 +354,7 @@ static int show_journal(struct seq_file *m, void *unused)
"prepare: \t%12lu\n"
"prepare_retry: \t%12lu\n",
DJP(jp_journal_1st_block),
- SB_JOURNAL(sb)->j_bdev_handle->bdev,
+ file_bdev(SB_JOURNAL(sb)->j_bdev_file),
DJP(jp_journal_dev),
DJP(jp_journal_size),
DJP(jp_journal_trans_max),
diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h
index 7256678..0554903 100644
--- a/fs/reiserfs/reiserfs.h
+++ b/fs/reiserfs/reiserfs.h
@@ -299,7 +299,7 @@ struct reiserfs_journal {
/* oldest journal block. start here for traverse */
struct reiserfs_journal_cnode *j_first;
- struct bdev_handle *j_bdev_handle;
+ struct file *j_bdev_file;
/* first block on s_dev of reserved area journal */
int j_1st_reserved_block;
@@ -2810,10 +2810,10 @@ struct reiserfs_journal_header {
/* We need these to make journal.c code more readable */
#define journal_find_get_block(s, block) __find_get_block(\
- SB_JOURNAL(s)->j_bdev_handle->bdev, block, s->s_blocksize)
-#define journal_getblk(s, block) __getblk(SB_JOURNAL(s)->j_bdev_handle->bdev,\
+ file_bdev(SB_JOURNAL(s)->j_bdev_file), block, s->s_blocksize)
+#define journal_getblk(s, block) __getblk(file_bdev(SB_JOURNAL(s)->j_bdev_file),\
block, s->s_blocksize)
-#define journal_bread(s, block) __bread(SB_JOURNAL(s)->j_bdev_handle->bdev,\
+#define journal_bread(s, block) __bread(file_bdev(SB_JOURNAL(s)->j_bdev_file),\
block, s->s_blocksize)
enum reiserfs_bh_state_bits {
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 67b5510..2cc469d 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -670,7 +670,6 @@ static int __init init_inodecache(void)
sizeof(struct
reiserfs_inode_info),
0, (SLAB_RECLAIM_ACCOUNT|
- SLAB_MEM_SPREAD|
SLAB_ACCOUNT),
init_once);
if (reiserfs_inode_cachep == NULL)
diff --git a/fs/remap_range.c b/fs/remap_range.c
index f8c1120..de07f97 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -373,9 +373,9 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
}
EXPORT_SYMBOL(generic_remap_file_range_prep);
-loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
- struct file *file_out, loff_t pos_out,
- loff_t len, unsigned int remap_flags)
+loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
+ struct file *file_out, loff_t pos_out,
+ loff_t len, unsigned int remap_flags)
{
loff_t ret;
@@ -391,23 +391,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
if (!file_in->f_op->remap_file_range)
return -EOPNOTSUPP;
- ret = file_in->f_op->remap_file_range(file_in, pos_in,
- file_out, pos_out, len, remap_flags);
- if (ret < 0)
- return ret;
-
- fsnotify_access(file_in);
- fsnotify_modify(file_out);
- return ret;
-}
-EXPORT_SYMBOL(do_clone_file_range);
-
-loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
- struct file *file_out, loff_t pos_out,
- loff_t len, unsigned int remap_flags)
-{
- loff_t ret;
-
ret = remap_verify_area(file_in, pos_in, len, false);
if (ret)
return ret;
@@ -417,10 +400,14 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
return ret;
file_start_write(file_out);
- ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len,
- remap_flags);
+ ret = file_in->f_op->remap_file_range(file_in, pos_in,
+ file_out, pos_out, len, remap_flags);
file_end_write(file_out);
+ if (ret < 0)
+ return ret;
+ fsnotify_access(file_in);
+ fsnotify_modify(file_out);
return ret;
}
EXPORT_SYMBOL(vfs_clone_file_range);
diff --git a/fs/romfs/super.c b/fs/romfs/super.c
index 545ad44..2be2275 100644
--- a/fs/romfs/super.c
+++ b/fs/romfs/super.c
@@ -594,7 +594,7 @@ static void romfs_kill_sb(struct super_block *sb)
#ifdef CONFIG_ROMFS_ON_BLOCK
if (sb->s_bdev) {
sync_blockdev(sb->s_bdev);
- bdev_release(sb->s_bdev_handle);
+ fput(sb->s_bdev_file);
}
#endif
}
@@ -630,8 +630,8 @@ static int __init init_romfs_fs(void)
romfs_inode_cachep =
kmem_cache_create("romfs_i",
sizeof(struct romfs_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD |
- SLAB_ACCOUNT, romfs_i_init_once);
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+ romfs_i_init_once);
if (!romfs_inode_cachep) {
pr_err("Failed to initialise inode cache\n");
diff --git a/fs/select.c b/fs/select.c
index 0ee55af..9515c3f 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -476,7 +476,7 @@ static inline void wait_key_set(poll_table *wait, unsigned long in,
wait->_key |= POLLOUT_SET;
}
-static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
+static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
{
ktime_t expire, *to = NULL;
struct poll_wqueues table;
@@ -839,7 +839,7 @@ SYSCALL_DEFINE1(old_select, struct sel_arg_struct __user *, arg)
struct poll_list {
struct poll_list *next;
- int len;
+ unsigned int len;
struct pollfd entries[];
};
@@ -975,14 +975,15 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
struct timespec64 *end_time)
{
struct poll_wqueues table;
- int err = -EFAULT, fdcount, len;
+ int err = -EFAULT, fdcount;
/* Allocate small arguments on the stack to save memory and be
faster - use long to make sure the buffer is aligned properly
on 64 bit archs to avoid unaligned access */
long stack_pps[POLL_STACK_ALLOC/sizeof(long)];
struct poll_list *const head = (struct poll_list *)stack_pps;
struct poll_list *walk = head;
- unsigned long todo = nfds;
+ unsigned int todo = nfds;
+ unsigned int len;
if (nfds > rlimit(RLIMIT_NOFILE))
return -EINVAL;
@@ -998,9 +999,9 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
sizeof(struct pollfd) * walk->len))
goto out_fds;
- todo -= walk->len;
- if (!todo)
+ if (walk->len >= todo)
break;
+ todo -= walk->len;
len = min(todo, POLLFD_PER_PAGE);
walk = walk->next = kmalloc(struct_size(walk, entries, len),
@@ -1020,7 +1021,7 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
for (walk = head; walk; walk = walk->next) {
struct pollfd *fds = walk->entries;
- int j;
+ unsigned int j;
for (j = walk->len; j; fds++, ufds++, j--)
unsafe_put_user(fds->revents, &ufds->revents, Efault);
diff --git a/fs/smb/client/cached_dir.c b/fs/smb/client/cached_dir.c
index 1daeb57..3de5047a 100644
--- a/fs/smb/client/cached_dir.c
+++ b/fs/smb/client/cached_dir.c
@@ -242,6 +242,7 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
.desired_access = FILE_READ_DATA | FILE_READ_ATTRIBUTES,
.disposition = FILE_OPEN,
.fid = pfid,
+ .replay = !!(retries),
};
rc = SMB2_open_init(tcon, server,
diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 2a4a4e3..fb368b19 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -1085,7 +1085,7 @@ static loff_t cifs_llseek(struct file *file, loff_t offset, int whence)
}
static int
-cifs_setlease(struct file *file, int arg, struct file_lock **lease, void **priv)
+cifs_setlease(struct file *file, int arg, struct file_lease **lease, void **priv)
{
/*
* Note that this is called by vfs setlease with i_lock held to
@@ -1094,9 +1094,6 @@ cifs_setlease(struct file *file, int arg, struct file_lock **lease, void **priv)
struct inode *inode = file_inode(file);
struct cifsFileInfo *cfile = file->private_data;
- if (!(S_ISREG(inode->i_mode)))
- return -EINVAL;
-
/* Check if file is oplocked if this is request for new lease */
if (arg == F_UNLCK ||
((arg == F_RDLCK) && CIFS_CACHE_READ(CIFS_I(inode))) ||
@@ -1172,6 +1169,9 @@ const char *cifs_get_link(struct dentry *dentry, struct inode *inode,
{
char *target_path;
+ if (!dentry)
+ return ERR_PTR(-ECHILD);
+
target_path = kmalloc(PATH_MAX, GFP_KERNEL);
if (!target_path)
return ERR_PTR(-ENOMEM);
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h
index c86a72c..53c75cf 100644
--- a/fs/smb/client/cifsglob.h
+++ b/fs/smb/client/cifsglob.h
@@ -1378,6 +1378,7 @@ struct cifs_open_parms {
struct cifs_fid *fid;
umode_t mode;
bool reconnect:1;
+ bool replay:1; /* indicates that this open is for a replay */
};
struct cifs_fid {
diff --git a/fs/smb/client/cifssmb.c b/fs/smb/client/cifssmb.c
index 01e8907..5eb83ba 100644
--- a/fs/smb/client/cifssmb.c
+++ b/fs/smb/client/cifssmb.c
@@ -2066,20 +2066,20 @@ CIFSSMBPosixLock(const unsigned int xid, struct cifs_tcon *tcon,
parm_data = (struct cifs_posix_lock *)
((char *)&pSMBr->hdr.Protocol + data_offset);
if (parm_data->lock_type == cpu_to_le16(CIFS_UNLCK))
- pLockData->fl_type = F_UNLCK;
+ pLockData->c.flc_type = F_UNLCK;
else {
if (parm_data->lock_type ==
cpu_to_le16(CIFS_RDLCK))
- pLockData->fl_type = F_RDLCK;
+ pLockData->c.flc_type = F_RDLCK;
else if (parm_data->lock_type ==
cpu_to_le16(CIFS_WRLCK))
- pLockData->fl_type = F_WRLCK;
+ pLockData->c.flc_type = F_WRLCK;
pLockData->fl_start = le64_to_cpu(parm_data->start);
pLockData->fl_end = pLockData->fl_start +
(le64_to_cpu(parm_data->length) ?
le64_to_cpu(parm_data->length) - 1 : 0);
- pLockData->fl_pid = -le32_to_cpu(parm_data->pid);
+ pLockData->c.flc_pid = -le32_to_cpu(parm_data->pid);
}
}
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index bfd568f..ac95955 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -233,6 +233,12 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) {
/* check if iface is still active */
spin_lock(&ses->chan_lock);
+ if (cifs_ses_get_chan_index(ses, server) ==
+ CIFS_INVAL_CHAN_INDEX) {
+ spin_unlock(&ses->chan_lock);
+ continue;
+ }
+
if (!cifs_chan_is_iface_active(ses, server)) {
spin_unlock(&ses->chan_lock);
cifs_chan_update_iface(ses, server);
@@ -3438,8 +3444,18 @@ int cifs_mount_get_tcon(struct cifs_mount_ctx *mnt_ctx)
* the user on mount
*/
if ((cifs_sb->ctx->wsize == 0) ||
- (cifs_sb->ctx->wsize > server->ops->negotiate_wsize(tcon, ctx)))
- cifs_sb->ctx->wsize = server->ops->negotiate_wsize(tcon, ctx);
+ (cifs_sb->ctx->wsize > server->ops->negotiate_wsize(tcon, ctx))) {
+ cifs_sb->ctx->wsize =
+ round_down(server->ops->negotiate_wsize(tcon, ctx), PAGE_SIZE);
+ /*
+ * in the very unlikely event that the server sent a max write size under PAGE_SIZE,
+ * (which would get rounded down to 0) then reset wsize to absolute minimum eg 4096
+ */
+ if (cifs_sb->ctx->wsize == 0) {
+ cifs_sb->ctx->wsize = PAGE_SIZE;
+ cifs_dbg(VFS, "wsize too small, reset to minimum ie PAGE_SIZE, usually 4096\n");
+ }
+ }
if ((cifs_sb->ctx->rsize == 0) ||
(cifs_sb->ctx->rsize > server->ops->negotiate_rsize(tcon, ctx)))
cifs_sb->ctx->rsize = server->ops->negotiate_rsize(tcon, ctx);
@@ -4228,6 +4244,11 @@ int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const stru
/* only send once per connect */
spin_lock(&tcon->tc_lock);
+
+ /* if tcon is marked for needing reconnect, update state */
+ if (tcon->need_reconnect)
+ tcon->status = TID_NEED_TCON;
+
if (tcon->status == TID_GOOD) {
spin_unlock(&tcon->tc_lock);
return 0;
diff --git a/fs/smb/client/dfs.c b/fs/smb/client/dfs.c
index a8a1d38..449c598 100644
--- a/fs/smb/client/dfs.c
+++ b/fs/smb/client/dfs.c
@@ -565,6 +565,11 @@ int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const stru
/* only send once per connect */
spin_lock(&tcon->tc_lock);
+
+ /* if tcon is marked for needing reconnect, update state */
+ if (tcon->need_reconnect)
+ tcon->status = TID_NEED_TCON;
+
if (tcon->status == TID_GOOD) {
spin_unlock(&tcon->tc_lock);
return 0;
@@ -625,8 +630,8 @@ int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const stru
spin_lock(&tcon->tc_lock);
if (tcon->status == TID_IN_TCON)
tcon->status = TID_GOOD;
- spin_unlock(&tcon->tc_lock);
tcon->need_reconnect = false;
+ spin_unlock(&tcon->tc_lock);
}
return rc;
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c
index b75282c..c3b8e70 100644
--- a/fs/smb/client/file.c
+++ b/fs/smb/client/file.c
@@ -175,6 +175,9 @@ cifs_mark_open_files_invalid(struct cifs_tcon *tcon)
/* only send once per connect */
spin_lock(&tcon->tc_lock);
+ if (tcon->need_reconnect)
+ tcon->status = TID_NEED_RECON;
+
if (tcon->status != TID_NEED_RECON) {
spin_unlock(&tcon->tc_lock);
return;
@@ -1312,20 +1315,20 @@ cifs_lock_test(struct cifsFileInfo *cfile, __u64 offset, __u64 length,
down_read(&cinode->lock_sem);
exist = cifs_find_lock_conflict(cfile, offset, length, type,
- flock->fl_flags, &conf_lock,
+ flock->c.flc_flags, &conf_lock,
CIFS_LOCK_OP);
if (exist) {
flock->fl_start = conf_lock->offset;
flock->fl_end = conf_lock->offset + conf_lock->length - 1;
- flock->fl_pid = conf_lock->pid;
+ flock->c.flc_pid = conf_lock->pid;
if (conf_lock->type & server->vals->shared_lock_type)
- flock->fl_type = F_RDLCK;
+ flock->c.flc_type = F_RDLCK;
else
- flock->fl_type = F_WRLCK;
+ flock->c.flc_type = F_WRLCK;
} else if (!cinode->can_cache_brlcks)
rc = 1;
else
- flock->fl_type = F_UNLCK;
+ flock->c.flc_type = F_UNLCK;
up_read(&cinode->lock_sem);
return rc;
@@ -1401,16 +1404,16 @@ cifs_posix_lock_test(struct file *file, struct file_lock *flock)
{
int rc = 0;
struct cifsInodeInfo *cinode = CIFS_I(file_inode(file));
- unsigned char saved_type = flock->fl_type;
+ unsigned char saved_type = flock->c.flc_type;
- if ((flock->fl_flags & FL_POSIX) == 0)
+ if ((flock->c.flc_flags & FL_POSIX) == 0)
return 1;
down_read(&cinode->lock_sem);
posix_test_lock(file, flock);
- if (flock->fl_type == F_UNLCK && !cinode->can_cache_brlcks) {
- flock->fl_type = saved_type;
+ if (lock_is_unlock(flock) && !cinode->can_cache_brlcks) {
+ flock->c.flc_type = saved_type;
rc = 1;
}
@@ -1431,7 +1434,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
struct cifsInodeInfo *cinode = CIFS_I(file_inode(file));
int rc = FILE_LOCK_DEFERRED + 1;
- if ((flock->fl_flags & FL_POSIX) == 0)
+ if ((flock->c.flc_flags & FL_POSIX) == 0)
return rc;
cifs_down_write(&cinode->lock_sem);
@@ -1581,7 +1584,9 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
el = locks_to_send.next;
spin_lock(&flctx->flc_lock);
- list_for_each_entry(flock, &flctx->flc_posix, fl_list) {
+ for_each_file_lock(flock, &flctx->flc_posix) {
+ unsigned char ftype = flock->c.flc_type;
+
if (el == &locks_to_send) {
/*
* The list ended. We don't have enough allocated
@@ -1591,12 +1596,12 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
break;
}
length = cifs_flock_len(flock);
- if (flock->fl_type == F_RDLCK || flock->fl_type == F_SHLCK)
+ if (ftype == F_RDLCK || ftype == F_SHLCK)
type = CIFS_RDLCK;
else
type = CIFS_WRLCK;
lck = list_entry(el, struct lock_to_push, llist);
- lck->pid = hash_lockowner(flock->fl_owner);
+ lck->pid = hash_lockowner(flock->c.flc_owner);
lck->netfid = cfile->fid.netfid;
lck->length = length;
lck->type = type;
@@ -1663,42 +1668,43 @@ static void
cifs_read_flock(struct file_lock *flock, __u32 *type, int *lock, int *unlock,
bool *wait_flag, struct TCP_Server_Info *server)
{
- if (flock->fl_flags & FL_POSIX)
+ if (flock->c.flc_flags & FL_POSIX)
cifs_dbg(FYI, "Posix\n");
- if (flock->fl_flags & FL_FLOCK)
+ if (flock->c.flc_flags & FL_FLOCK)
cifs_dbg(FYI, "Flock\n");
- if (flock->fl_flags & FL_SLEEP) {
+ if (flock->c.flc_flags & FL_SLEEP) {
cifs_dbg(FYI, "Blocking lock\n");
*wait_flag = true;
}
- if (flock->fl_flags & FL_ACCESS)
+ if (flock->c.flc_flags & FL_ACCESS)
cifs_dbg(FYI, "Process suspended by mandatory locking - not implemented yet\n");
- if (flock->fl_flags & FL_LEASE)
+ if (flock->c.flc_flags & FL_LEASE)
cifs_dbg(FYI, "Lease on file - not implemented yet\n");
- if (flock->fl_flags &
+ if (flock->c.flc_flags &
(~(FL_POSIX | FL_FLOCK | FL_SLEEP |
FL_ACCESS | FL_LEASE | FL_CLOSE | FL_OFDLCK)))
- cifs_dbg(FYI, "Unknown lock flags 0x%x\n", flock->fl_flags);
+ cifs_dbg(FYI, "Unknown lock flags 0x%x\n",
+ flock->c.flc_flags);
*type = server->vals->large_lock_type;
- if (flock->fl_type == F_WRLCK) {
+ if (lock_is_write(flock)) {
cifs_dbg(FYI, "F_WRLCK\n");
*type |= server->vals->exclusive_lock_type;
*lock = 1;
- } else if (flock->fl_type == F_UNLCK) {
+ } else if (lock_is_unlock(flock)) {
cifs_dbg(FYI, "F_UNLCK\n");
*type |= server->vals->unlock_lock_type;
*unlock = 1;
/* Check if unlock includes more than one lock range */
- } else if (flock->fl_type == F_RDLCK) {
+ } else if (lock_is_read(flock)) {
cifs_dbg(FYI, "F_RDLCK\n");
*type |= server->vals->shared_lock_type;
*lock = 1;
- } else if (flock->fl_type == F_EXLCK) {
+ } else if (flock->c.flc_type == F_EXLCK) {
cifs_dbg(FYI, "F_EXLCK\n");
*type |= server->vals->exclusive_lock_type;
*lock = 1;
- } else if (flock->fl_type == F_SHLCK) {
+ } else if (flock->c.flc_type == F_SHLCK) {
cifs_dbg(FYI, "F_SHLCK\n");
*type |= server->vals->shared_lock_type;
*lock = 1;
@@ -1730,7 +1736,7 @@ cifs_getlk(struct file *file, struct file_lock *flock, __u32 type,
else
posix_lock_type = CIFS_WRLCK;
rc = CIFSSMBPosixLock(xid, tcon, netfid,
- hash_lockowner(flock->fl_owner),
+ hash_lockowner(flock->c.flc_owner),
flock->fl_start, length, flock,
posix_lock_type, wait_flag);
return rc;
@@ -1747,7 +1753,7 @@ cifs_getlk(struct file *file, struct file_lock *flock, __u32 type,
if (rc == 0) {
rc = server->ops->mand_lock(xid, cfile, flock->fl_start, length,
type, 0, 1, false);
- flock->fl_type = F_UNLCK;
+ flock->c.flc_type = F_UNLCK;
if (rc != 0)
cifs_dbg(VFS, "Error unlocking previously locked range %d during test of lock\n",
rc);
@@ -1755,7 +1761,7 @@ cifs_getlk(struct file *file, struct file_lock *flock, __u32 type,
}
if (type & server->vals->shared_lock_type) {
- flock->fl_type = F_WRLCK;
+ flock->c.flc_type = F_WRLCK;
return 0;
}
@@ -1767,12 +1773,12 @@ cifs_getlk(struct file *file, struct file_lock *flock, __u32 type,
if (rc == 0) {
rc = server->ops->mand_lock(xid, cfile, flock->fl_start, length,
type | server->vals->shared_lock_type, 0, 1, false);
- flock->fl_type = F_RDLCK;
+ flock->c.flc_type = F_RDLCK;
if (rc != 0)
cifs_dbg(VFS, "Error unlocking previously locked range %d during test of lock\n",
rc);
} else
- flock->fl_type = F_WRLCK;
+ flock->c.flc_type = F_WRLCK;
return 0;
}
@@ -1940,7 +1946,7 @@ cifs_setlk(struct file *file, struct file_lock *flock, __u32 type,
posix_lock_type = CIFS_UNLCK;
rc = CIFSSMBPosixLock(xid, tcon, cfile->fid.netfid,
- hash_lockowner(flock->fl_owner),
+ hash_lockowner(flock->c.flc_owner),
flock->fl_start, length,
NULL, posix_lock_type, wait_flag);
goto out;
@@ -1950,7 +1956,7 @@ cifs_setlk(struct file *file, struct file_lock *flock, __u32 type,
struct cifsLockInfo *lock;
lock = cifs_lock_init(flock->fl_start, length, type,
- flock->fl_flags);
+ flock->c.flc_flags);
if (!lock)
return -ENOMEM;
@@ -1989,7 +1995,7 @@ cifs_setlk(struct file *file, struct file_lock *flock, __u32 type,
rc = server->ops->mand_unlock_range(cfile, flock, xid);
out:
- if ((flock->fl_flags & FL_POSIX) || (flock->fl_flags & FL_FLOCK)) {
+ if ((flock->c.flc_flags & FL_POSIX) || (flock->c.flc_flags & FL_FLOCK)) {
/*
* If this is a request to remove all locks because we
* are closing the file, it doesn't matter if the
@@ -1998,7 +2004,7 @@ cifs_setlk(struct file *file, struct file_lock *flock, __u32 type,
*/
if (rc) {
cifs_dbg(VFS, "%s failed rc=%d\n", __func__, rc);
- if (!(flock->fl_flags & FL_CLOSE))
+ if (!(flock->c.flc_flags & FL_CLOSE))
return rc;
}
rc = locks_lock_file_wait(file, flock);
@@ -2019,7 +2025,7 @@ int cifs_flock(struct file *file, int cmd, struct file_lock *fl)
xid = get_xid();
- if (!(fl->fl_flags & FL_FLOCK)) {
+ if (!(fl->c.flc_flags & FL_FLOCK)) {
rc = -ENOLCK;
free_xid(xid);
return rc;
@@ -2070,7 +2076,8 @@ int cifs_lock(struct file *file, int cmd, struct file_lock *flock)
xid = get_xid();
cifs_dbg(FYI, "%s: %pD2 cmd=0x%x type=0x%x flags=0x%x r=%lld:%lld\n", __func__, file, cmd,
- flock->fl_flags, flock->fl_type, (long long)flock->fl_start,
+ flock->c.flc_flags, flock->c.flc_type,
+ (long long)flock->fl_start,
(long long)flock->fl_end);
cfile = (struct cifsFileInfo *)file->private_data;
@@ -2951,7 +2958,7 @@ static int cifs_writepages_region(struct address_space *mapping,
continue;
}
- folio_batch_release(&fbatch);
+ folio_batch_release(&fbatch);
cond_resched();
} while (wbc->nr_to_write > 0);
diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
index 52cbef2..4b2f5aa2 100644
--- a/fs/smb/client/fs_context.c
+++ b/fs/smb/client/fs_context.c
@@ -211,7 +211,7 @@ cifs_parse_security_flavors(struct fs_context *fc, char *value, struct smb3_fs_c
switch (match_token(value, cifs_secflavor_tokens, args)) {
case Opt_sec_krb5p:
- cifs_errorf(fc, "sec=krb5p is not supported!\n");
+ cifs_errorf(fc, "sec=krb5p is not supported. Use sec=krb5,seal instead\n");
return 1;
case Opt_sec_krb5i:
ctx->sign = true;
@@ -1111,6 +1111,17 @@ static int smb3_fs_context_parse_param(struct fs_context *fc,
case Opt_wsize:
ctx->wsize = result.uint_32;
ctx->got_wsize = true;
+ if (ctx->wsize % PAGE_SIZE != 0) {
+ ctx->wsize = round_down(ctx->wsize, PAGE_SIZE);
+ if (ctx->wsize == 0) {
+ ctx->wsize = PAGE_SIZE;
+ cifs_dbg(VFS, "wsize too small, reset to minimum %ld\n", PAGE_SIZE);
+ } else {
+ cifs_dbg(VFS,
+ "wsize rounded down to %d to multiple of PAGE_SIZE %ld\n",
+ ctx->wsize, PAGE_SIZE);
+ }
+ }
break;
case Opt_acregmax:
ctx->acregmax = HZ * result.uint_32;
diff --git a/fs/smb/client/namespace.c b/fs/smb/client/namespace.c
index a696857..4a517b2 100644
--- a/fs/smb/client/namespace.c
+++ b/fs/smb/client/namespace.c
@@ -168,6 +168,21 @@ static char *automount_fullpath(struct dentry *dentry, void *page)
return s;
}
+static void fs_context_set_ids(struct smb3_fs_context *ctx)
+{
+ kuid_t uid = current_fsuid();
+ kgid_t gid = current_fsgid();
+
+ if (ctx->multiuser) {
+ if (!ctx->uid_specified)
+ ctx->linux_uid = uid;
+ if (!ctx->gid_specified)
+ ctx->linux_gid = gid;
+ }
+ if (!ctx->cruid_specified)
+ ctx->cred_uid = uid;
+}
+
/*
* Create a vfsmount that we can automount
*/
@@ -205,6 +220,7 @@ static struct vfsmount *cifs_do_automount(struct path *path)
tmp.leaf_fullpath = NULL;
tmp.UNC = tmp.prepath = NULL;
tmp.dfs_root_ses = NULL;
+ fs_context_set_ids(&tmp);
rc = smb3_fs_context_dup(ctx, &tmp);
if (rc) {
diff --git a/fs/smb/client/readdir.c b/fs/smb/client/readdir.c
index 3b1b01d..b520eea 100644
--- a/fs/smb/client/readdir.c
+++ b/fs/smb/client/readdir.c
@@ -307,14 +307,16 @@ cifs_dir_info_to_fattr(struct cifs_fattr *fattr, FILE_DIRECTORY_INFO *info,
}
static void cifs_fulldir_info_to_fattr(struct cifs_fattr *fattr,
- SEARCH_ID_FULL_DIR_INFO *info,
+ const void *info,
struct cifs_sb_info *cifs_sb)
{
+ const FILE_FULL_DIRECTORY_INFO *di = info;
+
__dir_info_to_fattr(fattr, info);
- /* See MS-FSCC 2.4.19 FileIdFullDirectoryInformation */
+ /* See MS-FSCC 2.4.14, 2.4.19 */
if (fattr->cf_cifsattrs & ATTR_REPARSE)
- fattr->cf_cifstag = le32_to_cpu(info->EaSize);
+ fattr->cf_cifstag = le32_to_cpu(di->EaSize);
cifs_fill_common_info(fattr, cifs_sb);
}
@@ -396,7 +398,7 @@ _initiate_cifs_search(const unsigned int xid, struct file *file,
} else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) {
cifsFile->srch_inf.info_level = SMB_FIND_FILE_ID_FULL_DIR_INFO;
} else /* not srvinos - BB fixme add check for backlevel? */ {
- cifsFile->srch_inf.info_level = SMB_FIND_FILE_DIRECTORY_INFO;
+ cifsFile->srch_inf.info_level = SMB_FIND_FILE_FULL_DIRECTORY_INFO;
}
search_flags = CIFS_SEARCH_CLOSE_AT_END | CIFS_SEARCH_RETURN_RESUME;
@@ -987,10 +989,9 @@ static int cifs_filldir(char *find_entry, struct file *file,
(FIND_FILE_STANDARD_INFO *)find_entry,
cifs_sb);
break;
+ case SMB_FIND_FILE_FULL_DIRECTORY_INFO:
case SMB_FIND_FILE_ID_FULL_DIR_INFO:
- cifs_fulldir_info_to_fattr(&fattr,
- (SEARCH_ID_FULL_DIR_INFO *)find_entry,
- cifs_sb);
+ cifs_fulldir_info_to_fattr(&fattr, find_entry, cifs_sb);
break;
default:
cifs_dir_info_to_fattr(&fattr,
diff --git a/fs/smb/client/sess.c b/fs/smb/client/sess.c
index ed4bd88..8f37373 100644
--- a/fs/smb/client/sess.c
+++ b/fs/smb/client/sess.c
@@ -76,7 +76,7 @@ cifs_ses_get_chan_index(struct cifs_ses *ses,
unsigned int i;
/* if the channel is waiting for termination */
- if (server->terminate)
+ if (server && server->terminate)
return CIFS_INVAL_CHAN_INDEX;
for (i = 0; i < ses->chan_count; i++) {
@@ -88,7 +88,6 @@ cifs_ses_get_chan_index(struct cifs_ses *ses,
if (server)
cifs_dbg(VFS, "unable to get chan index for server: 0x%llx",
server->conn_id);
- WARN_ON(1);
return CIFS_INVAL_CHAN_INDEX;
}
diff --git a/fs/smb/client/smb2file.c b/fs/smb/client/smb2file.c
index e0ee96d..c23478a 100644
--- a/fs/smb/client/smb2file.c
+++ b/fs/smb/client/smb2file.c
@@ -228,7 +228,7 @@ smb2_unlock_range(struct cifsFileInfo *cfile, struct file_lock *flock,
* flock and OFD lock are associated with an open
* file description, not the process.
*/
- if (!(flock->fl_flags & (FL_FLOCK | FL_OFDLCK)))
+ if (!(flock->c.flc_flags & (FL_FLOCK | FL_OFDLCK)))
continue;
if (cinode->can_cache_brlcks) {
/*
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c
index 83c898a..4695433 100644
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -619,7 +619,7 @@ parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf,
goto out;
}
- while (bytes_left >= sizeof(*p)) {
+ while (bytes_left >= (ssize_t)sizeof(*p)) {
memset(&tmp_iface, 0, sizeof(tmp_iface));
tmp_iface.speed = le64_to_cpu(p->LinkSpeed);
tmp_iface.rdma_capable = le32_to_cpu(p->Capability & RDMA_CAPABLE) ? 1 : 0;
@@ -1204,6 +1204,7 @@ smb2_set_ea(const unsigned int xid, struct cifs_tcon *tcon,
.disposition = FILE_OPEN,
.create_options = cifs_create_options(cifs_sb, 0),
.fid = &fid,
+ .replay = !!(retries),
};
rc = SMB2_open_init(tcon, server,
@@ -1569,6 +1570,7 @@ smb2_ioctl_query_info(const unsigned int xid,
.disposition = FILE_OPEN,
.create_options = cifs_create_options(cifs_sb, create_options),
.fid = &fid,
+ .replay = !!(retries),
};
if (qi.flags & PASSTHRU_FSCTL) {
@@ -2295,6 +2297,7 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
.disposition = FILE_OPEN,
.create_options = cifs_create_options(cifs_sb, 0),
.fid = fid,
+ .replay = !!(retries),
};
rc = SMB2_open_init(tcon, server,
@@ -2681,6 +2684,7 @@ smb2_query_info_compound(const unsigned int xid, struct cifs_tcon *tcon,
.disposition = FILE_OPEN,
.create_options = cifs_create_options(cifs_sb, 0),
.fid = &fid,
+ .replay = !!(retries),
};
rc = SMB2_open_init(tcon, server,
@@ -5213,7 +5217,7 @@ static int smb2_create_reparse_symlink(const unsigned int xid,
struct inode *new;
struct kvec iov;
__le16 *path;
- char *sym;
+ char *sym, sep = CIFS_DIR_SEP(cifs_sb);
u16 len, plen;
int rc = 0;
@@ -5227,7 +5231,8 @@ static int smb2_create_reparse_symlink(const unsigned int xid,
.symlink_target = sym,
};
- path = cifs_convert_path_to_utf16(symname, cifs_sb);
+ convert_delimiter(sym, sep);
+ path = cifs_convert_path_to_utf16(sym, cifs_sb);
if (!path) {
rc = -ENOMEM;
goto out;
@@ -5250,7 +5255,10 @@ static int smb2_create_reparse_symlink(const unsigned int xid,
buf->PrintNameLength = cpu_to_le16(plen);
memcpy(buf->PathBuffer, path, plen);
buf->Flags = cpu_to_le32(*symname != '/' ? SYMLINK_FLAG_RELATIVE : 0);
+ if (*sym != sep)
+ buf->Flags = cpu_to_le32(SYMLINK_FLAG_RELATIVE);
+ convert_delimiter(sym, '/');
iov.iov_base = buf;
iov.iov_len = len;
new = smb2_get_reparse_inode(&data, inode->i_sb, xid,
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c
index c58fa44..608ee05 100644
--- a/fs/smb/client/smb2pdu.c
+++ b/fs/smb/client/smb2pdu.c
@@ -2404,8 +2404,13 @@ create_durable_v2_buf(struct cifs_open_parms *oparms)
*/
buf->dcontext.Timeout = cpu_to_le32(oparms->tcon->handle_timeout);
buf->dcontext.Flags = cpu_to_le32(SMB2_DHANDLE_FLAG_PERSISTENT);
- generate_random_uuid(buf->dcontext.CreateGuid);
- memcpy(pfid->create_guid, buf->dcontext.CreateGuid, 16);
+
+ /* for replay, we should not overwrite the existing create guid */
+ if (!oparms->replay) {
+ generate_random_uuid(buf->dcontext.CreateGuid);
+ memcpy(pfid->create_guid, buf->dcontext.CreateGuid, 16);
+ } else
+ memcpy(buf->dcontext.CreateGuid, pfid->create_guid, 16);
/* SMB2_CREATE_DURABLE_HANDLE_REQUEST is "DH2Q" */
buf->Name[0] = 'D';
@@ -3142,6 +3147,7 @@ SMB2_open(const unsigned int xid, struct cifs_open_parms *oparms, __le16 *path,
/* reinitialize for possible replay */
flags = 0;
server = cifs_pick_channel(ses);
+ oparms->replay = !!(retries);
cifs_dbg(FYI, "create/open\n");
if (!ses || !server)
@@ -5206,6 +5212,9 @@ int SMB2_query_directory_init(const unsigned int xid,
case SMB_FIND_FILE_POSIX_INFO:
req->FileInformationClass = SMB_FIND_FILE_POSIX_INFO;
break;
+ case SMB_FIND_FILE_FULL_DIRECTORY_INFO:
+ req->FileInformationClass = FILE_FULL_DIRECTORY_INFORMATION;
+ break;
default:
cifs_tcon_dbg(VFS, "info level %u isn't supported\n",
info_level);
@@ -5275,6 +5284,9 @@ smb2_parse_query_directory(struct cifs_tcon *tcon,
/* note that posix payload are variable size */
info_buf_size = sizeof(struct smb2_posix_info);
break;
+ case SMB_FIND_FILE_FULL_DIRECTORY_INFO:
+ info_buf_size = sizeof(FILE_FULL_DIRECTORY_INFO);
+ break;
default:
cifs_tcon_dbg(VFS, "info level %u isn't supported\n",
srch_inf->info_level);
diff --git a/fs/smb/server/misc.c b/fs/smb/server/misc.c
index 9e8afaa..1a5faa6 100644
--- a/fs/smb/server/misc.c
+++ b/fs/smb/server/misc.c
@@ -261,6 +261,7 @@ char *ksmbd_casefold_sharename(struct unicode_map *um, const char *name)
/**
* ksmbd_extract_sharename() - get share name from tree connect request
+ * @um: pointer to a unicode_map structure for character encoding handling
* @treename: buffer containing tree name and share name
*
* Return: share name on success, otherwise error
diff --git a/fs/smb/server/smb2pdu.c b/fs/smb/server/smb2pdu.c
index ba7a72a..089527a 100644
--- a/fs/smb/server/smb2pdu.c
+++ b/fs/smb/server/smb2pdu.c
@@ -6173,8 +6173,10 @@ static noinline int smb2_read_pipe(struct ksmbd_work *work)
err = ksmbd_iov_pin_rsp_read(work, (void *)rsp,
offsetof(struct smb2_read_rsp, Buffer),
aux_payload_buf, nbytes);
- if (err)
+ if (err) {
+ kvfree(aux_payload_buf);
goto out;
+ }
kvfree(rpc_resp);
} else {
err = ksmbd_iov_pin_rsp(work, (void *)rsp,
@@ -6384,8 +6386,10 @@ int smb2_read(struct ksmbd_work *work)
err = ksmbd_iov_pin_rsp_read(work, (void *)rsp,
offsetof(struct smb2_read_rsp, Buffer),
aux_payload_buf, nbytes);
- if (err)
+ if (err) {
+ kvfree(aux_payload_buf);
goto out;
+ }
ksmbd_fd_put(work, fp);
return 0;
@@ -6760,10 +6764,10 @@ struct file_lock *smb_flock_init(struct file *f)
locks_init_lock(fl);
- fl->fl_owner = f;
- fl->fl_pid = current->tgid;
- fl->fl_file = f;
- fl->fl_flags = FL_POSIX;
+ fl->c.flc_owner = f;
+ fl->c.flc_pid = current->tgid;
+ fl->c.flc_file = f;
+ fl->c.flc_flags = FL_POSIX;
fl->fl_ops = NULL;
fl->fl_lmops = NULL;
@@ -6780,30 +6784,30 @@ static int smb2_set_flock_flags(struct file_lock *flock, int flags)
case SMB2_LOCKFLAG_SHARED:
ksmbd_debug(SMB, "received shared request\n");
cmd = F_SETLKW;
- flock->fl_type = F_RDLCK;
- flock->fl_flags |= FL_SLEEP;
+ flock->c.flc_type = F_RDLCK;
+ flock->c.flc_flags |= FL_SLEEP;
break;
case SMB2_LOCKFLAG_EXCLUSIVE:
ksmbd_debug(SMB, "received exclusive request\n");
cmd = F_SETLKW;
- flock->fl_type = F_WRLCK;
- flock->fl_flags |= FL_SLEEP;
+ flock->c.flc_type = F_WRLCK;
+ flock->c.flc_flags |= FL_SLEEP;
break;
case SMB2_LOCKFLAG_SHARED | SMB2_LOCKFLAG_FAIL_IMMEDIATELY:
ksmbd_debug(SMB,
"received shared & fail immediately request\n");
cmd = F_SETLK;
- flock->fl_type = F_RDLCK;
+ flock->c.flc_type = F_RDLCK;
break;
case SMB2_LOCKFLAG_EXCLUSIVE | SMB2_LOCKFLAG_FAIL_IMMEDIATELY:
ksmbd_debug(SMB,
"received exclusive & fail immediately request\n");
cmd = F_SETLK;
- flock->fl_type = F_WRLCK;
+ flock->c.flc_type = F_WRLCK;
break;
case SMB2_LOCKFLAG_UNLOCK:
ksmbd_debug(SMB, "received unlock request\n");
- flock->fl_type = F_UNLCK;
+ flock->c.flc_type = F_UNLCK;
cmd = F_SETLK;
break;
}
@@ -6841,13 +6845,13 @@ static void smb2_remove_blocked_lock(void **argv)
struct file_lock *flock = (struct file_lock *)argv[0];
ksmbd_vfs_posix_lock_unblock(flock);
- wake_up(&flock->fl_wait);
+ locks_wake_up(flock);
}
static inline bool lock_defer_pending(struct file_lock *fl)
{
/* check pending lock waiters */
- return waitqueue_active(&fl->fl_wait);
+ return waitqueue_active(&fl->c.flc_wait);
}
/**
@@ -6938,8 +6942,8 @@ int smb2_lock(struct ksmbd_work *work)
list_for_each_entry(cmp_lock, &lock_list, llist) {
if (cmp_lock->fl->fl_start <= flock->fl_start &&
cmp_lock->fl->fl_end >= flock->fl_end) {
- if (cmp_lock->fl->fl_type != F_UNLCK &&
- flock->fl_type != F_UNLCK) {
+ if (cmp_lock->fl->c.flc_type != F_UNLCK &&
+ flock->c.flc_type != F_UNLCK) {
pr_err("conflict two locks in one request\n");
err = -EINVAL;
locks_free_lock(flock);
@@ -6987,12 +6991,12 @@ int smb2_lock(struct ksmbd_work *work)
list_for_each_entry(conn, &conn_list, conns_list) {
spin_lock(&conn->llist_lock);
list_for_each_entry_safe(cmp_lock, tmp2, &conn->lock_list, clist) {
- if (file_inode(cmp_lock->fl->fl_file) !=
- file_inode(smb_lock->fl->fl_file))
+ if (file_inode(cmp_lock->fl->c.flc_file) !=
+ file_inode(smb_lock->fl->c.flc_file))
continue;
- if (smb_lock->fl->fl_type == F_UNLCK) {
- if (cmp_lock->fl->fl_file == smb_lock->fl->fl_file &&
+ if (lock_is_unlock(smb_lock->fl)) {
+ if (cmp_lock->fl->c.flc_file == smb_lock->fl->c.flc_file &&
cmp_lock->start == smb_lock->start &&
cmp_lock->end == smb_lock->end &&
!lock_defer_pending(cmp_lock->fl)) {
@@ -7009,7 +7013,7 @@ int smb2_lock(struct ksmbd_work *work)
continue;
}
- if (cmp_lock->fl->fl_file == smb_lock->fl->fl_file) {
+ if (cmp_lock->fl->c.flc_file == smb_lock->fl->c.flc_file) {
if (smb_lock->flags & SMB2_LOCKFLAG_SHARED)
continue;
} else {
@@ -7051,7 +7055,7 @@ int smb2_lock(struct ksmbd_work *work)
}
up_read(&conn_list_lock);
out_check_cl:
- if (smb_lock->fl->fl_type == F_UNLCK && nolock) {
+ if (lock_is_unlock(smb_lock->fl) && nolock) {
pr_err("Try to unlock nolocked range\n");
rsp->hdr.Status = STATUS_RANGE_NOT_LOCKED;
goto out;
@@ -7175,7 +7179,7 @@ int smb2_lock(struct ksmbd_work *work)
struct file_lock *rlock = NULL;
rlock = smb_flock_init(filp);
- rlock->fl_type = F_UNLCK;
+ rlock->c.flc_type = F_UNLCK;
rlock->fl_start = smb_lock->start;
rlock->fl_end = smb_lock->end;
diff --git a/fs/smb/server/vfs.c b/fs/smb/server/vfs.c
index a6961bf..c487e83 100644
--- a/fs/smb/server/vfs.c
+++ b/fs/smb/server/vfs.c
@@ -337,18 +337,18 @@ static int check_lock_range(struct file *filp, loff_t start, loff_t end,
return 0;
spin_lock(&ctx->flc_lock);
- list_for_each_entry(flock, &ctx->flc_posix, fl_list) {
+ for_each_file_lock(flock, &ctx->flc_posix) {
/* check conflict locks */
if (flock->fl_end >= start && end >= flock->fl_start) {
- if (flock->fl_type == F_RDLCK) {
+ if (lock_is_read(flock)) {
if (type == WRITE) {
pr_err("not allow write by shared lock\n");
error = 1;
goto out;
}
- } else if (flock->fl_type == F_WRLCK) {
+ } else if (lock_is_write(flock)) {
/* check owner in lock */
- if (flock->fl_file != filp) {
+ if (flock->c.flc_file != filp) {
error = 1;
pr_err("not allow rw access by exclusive lock from other opens\n");
goto out;
@@ -1837,13 +1837,13 @@ int ksmbd_vfs_copy_file_ranges(struct ksmbd_work *work,
void ksmbd_vfs_posix_lock_wait(struct file_lock *flock)
{
- wait_event(flock->fl_wait, !flock->fl_blocker);
+ wait_event(flock->c.flc_wait, !flock->c.flc_blocker);
}
int ksmbd_vfs_posix_lock_wait_timeout(struct file_lock *flock, long timeout)
{
- return wait_event_interruptible_timeout(flock->fl_wait,
- !flock->fl_blocker,
+ return wait_event_interruptible_timeout(flock->c.flc_wait,
+ !flock->c.flc_blocker,
timeout);
}
diff --git a/fs/super.c b/fs/super.c
index d35e852..ee05ab6 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -274,9 +274,10 @@ static void destroy_super_work(struct work_struct *work)
{
struct super_block *s = container_of(work, struct super_block,
destroy_work);
- int i;
-
- for (i = 0; i < SB_FREEZE_LEVELS; i++)
+ security_sb_free(s);
+ put_user_ns(s->s_user_ns);
+ kfree(s->s_subtype);
+ for (int i = 0; i < SB_FREEZE_LEVELS; i++)
percpu_free_rwsem(&s->s_writers.rw_sem[i]);
kfree(s);
}
@@ -296,9 +297,6 @@ static void destroy_unused_super(struct super_block *s)
super_unlock_excl(s);
list_lru_destroy(&s->s_dentry_lru);
list_lru_destroy(&s->s_inode_lru);
- security_sb_free(s);
- put_user_ns(s->s_user_ns);
- kfree(s->s_subtype);
shrinker_free(s->s_shrink);
/* no delays needed */
destroy_super_work(&s->destroy_work);
@@ -409,9 +407,6 @@ static void __put_super(struct super_block *s)
WARN_ON(s->s_dentry_lru.node);
WARN_ON(s->s_inode_lru.node);
WARN_ON(!list_empty(&s->s_mounts));
- security_sb_free(s);
- put_user_ns(s->s_user_ns);
- kfree(s->s_subtype);
call_rcu(&s->rcu, destroy_super_rcu);
}
}
@@ -1532,16 +1527,16 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
struct fs_context *fc)
{
blk_mode_t mode = sb_open_mode(sb_flags);
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct block_device *bdev;
- bdev_handle = bdev_open_by_dev(sb->s_dev, mode, sb, &fs_holder_ops);
- if (IS_ERR(bdev_handle)) {
+ bdev_file = bdev_file_open_by_dev(sb->s_dev, mode, sb, &fs_holder_ops);
+ if (IS_ERR(bdev_file)) {
if (fc)
errorf(fc, "%s: Can't open blockdev", fc->source);
- return PTR_ERR(bdev_handle);
+ return PTR_ERR(bdev_file);
}
- bdev = bdev_handle->bdev;
+ bdev = file_bdev(bdev_file);
/*
* This really should be in blkdev_get_by_dev, but right now can't due
@@ -1549,7 +1544,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
* writable from userspace even for a read-only block device.
*/
if ((mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) {
- bdev_release(bdev_handle);
+ fput(bdev_file);
return -EACCES;
}
@@ -1560,11 +1555,11 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
if (atomic_read(&bdev->bd_fsfreeze_count) > 0) {
if (fc)
warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
- bdev_release(bdev_handle);
+ fput(bdev_file);
return -EBUSY;
}
spin_lock(&sb_lock);
- sb->s_bdev_handle = bdev_handle;
+ sb->s_bdev_file = bdev_file;
sb->s_bdev = bdev;
sb->s_bdi = bdi_get(bdev->bd_disk->bdi);
if (bdev_stable_writes(bdev))
@@ -1680,7 +1675,7 @@ void kill_block_super(struct super_block *sb)
generic_shutdown_super(sb);
if (bdev) {
sync_blockdev(bdev);
- bdev_release(sb->s_bdev_handle);
+ fput(sb->s_bdev_file);
}
}
diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c
index 5a915b2..76bc2d5 100644
--- a/fs/sysv/inode.c
+++ b/fs/sysv/inode.c
@@ -336,7 +336,7 @@ int __init sysv_init_icache(void)
{
sysv_inode_cachep = kmem_cache_create("sysv_inode_cache",
sizeof(struct sysv_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+ SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT,
init_once);
if (!sysv_inode_cachep)
return -ENOMEM;
diff --git a/fs/sysv/itree.c b/fs/sysv/itree.c
index 410ab2a..19bcb51 100644
--- a/fs/sysv/itree.c
+++ b/fs/sysv/itree.c
@@ -83,9 +83,6 @@ static inline sysv_zone_t *block_end(struct buffer_head *bh)
return (sysv_zone_t*)((char*)bh->b_data + bh->b_size);
}
-/*
- * Requires read_lock(&pointers_lock) or write_lock(&pointers_lock)
- */
static Indirect *get_branch(struct inode *inode,
int depth,
int offsets[],
@@ -105,15 +102,18 @@ static Indirect *get_branch(struct inode *inode,
bh = sb_bread(sb, block);
if (!bh)
goto failure;
+ read_lock(&pointers_lock);
if (!verify_chain(chain, p))
goto changed;
add_chain(++p, bh, (sysv_zone_t*)bh->b_data + *++offsets);
+ read_unlock(&pointers_lock);
if (!p->key)
goto no_block;
}
return NULL;
changed:
+ read_unlock(&pointers_lock);
brelse(bh);
*err = -EAGAIN;
goto no_block;
@@ -219,9 +219,7 @@ static int get_block(struct inode *inode, sector_t iblock, struct buffer_head *b
goto out;
reread:
- read_lock(&pointers_lock);
partial = get_branch(inode, depth, offsets, chain, &err);
- read_unlock(&pointers_lock);
/* Simplest case - block found, no allocation needed */
if (!partial) {
@@ -291,9 +289,9 @@ static Indirect *find_shared(struct inode *inode,
*top = 0;
for (k = depth; k > 1 && !offsets[k-1]; k--)
;
+ partial = get_branch(inode, k, offsets, chain, &err);
write_lock(&pointers_lock);
- partial = get_branch(inode, k, offsets, chain, &err);
if (!partial)
partial = chain + k-1;
/*
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index e413a9c..551148d 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -205,7 +205,6 @@ static struct dentry *ubifs_lookup(struct inode *dir, struct dentry *dentry,
dbg_gen("'%pd' in dir ino %lu", dentry, dir->i_ino);
err = fscrypt_prepare_lookup(dir, dentry, &nm);
- generic_set_encrypted_ci_d_ops(dentry);
if (err == -ENOENT)
return d_splice_alias(NULL, dentry);
if (err)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 09e270d..d288104 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2239,13 +2239,14 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
goto out_umount;
}
+ generic_set_sb_d_ops(sb);
sb->s_root = d_make_root(root);
if (!sb->s_root) {
err = -ENOMEM;
goto out_umount;
}
- import_uuid(&sb->s_uuid, c->uuid);
+ super_set_uuid(sb, c->uuid, sizeof(c->uuid));
mutex_unlock(&c->umount_mutex);
return 0;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 813f851..1698507d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -112,7 +112,7 @@ xfs_end_ioend(
* longer dirty. If we don't remove delalloc blocks here, they become
* stale and can corrupt free space accounting on unmount.
*/
- error = blk_status_to_errno(ioend->io_bio->bi_status);
+ error = blk_status_to_errno(ioend->io_bio.bi_status);
if (unlikely(error)) {
if (ioend->io_flags & IOMAP_F_SHARED) {
xfs_reflink_cancel_cow_range(ip, offset, size, true);
@@ -179,7 +179,7 @@ STATIC void
xfs_end_bio(
struct bio *bio)
{
- struct iomap_ioend *ioend = bio->bi_private;
+ struct iomap_ioend *ioend = iomap_ioend_from_bio(bio);
struct xfs_inode *ip = XFS_I(ioend->io_inode);
unsigned long flags;
@@ -276,7 +276,8 @@ static int
xfs_map_blocks(
struct iomap_writepage_ctx *wpc,
struct inode *inode,
- loff_t offset)
+ loff_t offset,
+ unsigned int len)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
@@ -444,7 +445,7 @@ xfs_prepare_ioend(
/* send ioends that might require a transaction to the completion wq */
if (xfs_ioend_is_append(ioend) || ioend->io_type == IOMAP_UNWRITTEN ||
(ioend->io_flags & IOMAP_F_SHARED))
- ioend->io_bio->bi_end_io = xfs_end_bio;
+ ioend->io_bio.bi_end_io = xfs_end_bio;
return status;
}
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 8e5bd50..01b41fa 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1951,7 +1951,7 @@ xfs_free_buftarg(
fs_put_dax(btp->bt_daxdev, btp->bt_mount);
/* the main block device is closed by kill_block_super */
if (btp->bt_bdev != btp->bt_mount->m_super->s_bdev)
- bdev_release(btp->bt_bdev_handle);
+ fput(btp->bt_bdev_file);
kmem_free(btp);
}
@@ -1994,7 +1994,7 @@ xfs_setsize_buftarg_early(
struct xfs_buftarg *
xfs_alloc_buftarg(
struct xfs_mount *mp,
- struct bdev_handle *bdev_handle)
+ struct file *bdev_file)
{
xfs_buftarg_t *btp;
const struct dax_holder_operations *ops = NULL;
@@ -2005,9 +2005,9 @@ xfs_alloc_buftarg(
btp = kmem_zalloc(sizeof(*btp), KM_NOFS);
btp->bt_mount = mp;
- btp->bt_bdev_handle = bdev_handle;
- btp->bt_dev = bdev_handle->bdev->bd_dev;
- btp->bt_bdev = bdev_handle->bdev;
+ btp->bt_bdev_file = bdev_file;
+ btp->bt_bdev = file_bdev(bdev_file);
+ btp->bt_dev = btp->bt_bdev->bd_dev;
btp->bt_daxdev = fs_dax_get_by_bdev(btp->bt_bdev, &btp->bt_dax_part_off,
mp, ops);
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index b470de0..304e858 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -98,7 +98,7 @@ typedef unsigned int xfs_buf_flags_t;
*/
typedef struct xfs_buftarg {
dev_t bt_dev;
- struct bdev_handle *bt_bdev_handle;
+ struct file *bt_bdev_file;
struct block_device *bt_bdev;
struct dax_device *bt_daxdev;
u64 bt_dax_part_off;
@@ -366,7 +366,7 @@ xfs_buf_update_cksum(struct xfs_buf *bp, unsigned long cksum_offset)
* Handling of buftargs.
*/
struct xfs_buftarg *xfs_alloc_buftarg(struct xfs_mount *mp,
- struct bdev_handle *bdev_handle);
+ struct file *bdev_file);
extern void xfs_free_buftarg(struct xfs_buftarg *);
extern void xfs_buftarg_wait(struct xfs_buftarg *);
extern void xfs_buftarg_drain(struct xfs_buftarg *);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index aabb25d..57fa21a 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -62,7 +62,7 @@ xfs_uuid_mount(
int hole, i;
/* Publish UUID in struct super_block */
- uuid_copy(&mp->m_super->s_uuid, uuid);
+ super_set_uuid(mp->m_super, uuid->b, sizeof(*uuid));
if (xfs_has_nouuid(mp))
return 0;
@@ -706,6 +706,8 @@ xfs_mountfs(
/* enable fail_at_unmount as default */
mp->m_fail_unmount = true;
+ super_set_sysfs_name_id(mp->m_super);
+
error = xfs_sysfs_init(&mp->m_kobj, &xfs_mp_ktype,
NULL, mp->m_super->s_id);
if (error)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 5a2512d..00fbd5b6 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -350,7 +350,6 @@ xfs_setup_dax_always(
return -EINVAL;
}
- xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
return 0;
disable_dax:
@@ -362,16 +361,16 @@ STATIC int
xfs_blkdev_get(
xfs_mount_t *mp,
const char *name,
- struct bdev_handle **handlep)
+ struct file **bdev_filep)
{
int error = 0;
- *handlep = bdev_open_by_path(name,
+ *bdev_filep = bdev_file_open_by_path(name,
BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
mp->m_super, &fs_holder_ops);
- if (IS_ERR(*handlep)) {
- error = PTR_ERR(*handlep);
- *handlep = NULL;
+ if (IS_ERR(*bdev_filep)) {
+ error = PTR_ERR(*bdev_filep);
+ *bdev_filep = NULL;
xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
}
@@ -436,26 +435,26 @@ xfs_open_devices(
{
struct super_block *sb = mp->m_super;
struct block_device *ddev = sb->s_bdev;
- struct bdev_handle *logdev_handle = NULL, *rtdev_handle = NULL;
+ struct file *logdev_file = NULL, *rtdev_file = NULL;
int error;
/*
* Open real time and log devices - order is important.
*/
if (mp->m_logname) {
- error = xfs_blkdev_get(mp, mp->m_logname, &logdev_handle);
+ error = xfs_blkdev_get(mp, mp->m_logname, &logdev_file);
if (error)
return error;
}
if (mp->m_rtname) {
- error = xfs_blkdev_get(mp, mp->m_rtname, &rtdev_handle);
+ error = xfs_blkdev_get(mp, mp->m_rtname, &rtdev_file);
if (error)
goto out_close_logdev;
- if (rtdev_handle->bdev == ddev ||
- (logdev_handle &&
- rtdev_handle->bdev == logdev_handle->bdev)) {
+ if (file_bdev(rtdev_file) == ddev ||
+ (logdev_file &&
+ file_bdev(rtdev_file) == file_bdev(logdev_file))) {
xfs_warn(mp,
"Cannot mount filesystem with identical rtdev and ddev/logdev.");
error = -EINVAL;
@@ -467,25 +466,25 @@ xfs_open_devices(
* Setup xfs_mount buffer target pointers
*/
error = -ENOMEM;
- mp->m_ddev_targp = xfs_alloc_buftarg(mp, sb->s_bdev_handle);
+ mp->m_ddev_targp = xfs_alloc_buftarg(mp, sb->s_bdev_file);
if (!mp->m_ddev_targp)
goto out_close_rtdev;
- if (rtdev_handle) {
- mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev_handle);
+ if (rtdev_file) {
+ mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev_file);
if (!mp->m_rtdev_targp)
goto out_free_ddev_targ;
}
- if (logdev_handle && logdev_handle->bdev != ddev) {
- mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev_handle);
+ if (logdev_file && file_bdev(logdev_file) != ddev) {
+ mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev_file);
if (!mp->m_logdev_targp)
goto out_free_rtdev_targ;
} else {
mp->m_logdev_targp = mp->m_ddev_targp;
/* Handle won't be used, drop it */
- if (logdev_handle)
- bdev_release(logdev_handle);
+ if (logdev_file)
+ fput(logdev_file);
}
return 0;
@@ -496,11 +495,11 @@ xfs_open_devices(
out_free_ddev_targ:
xfs_free_buftarg(mp->m_ddev_targp);
out_close_rtdev:
- if (rtdev_handle)
- bdev_release(rtdev_handle);
+ if (rtdev_file)
+ fput(rtdev_file);
out_close_logdev:
- if (logdev_handle)
- bdev_release(logdev_handle);
+ if (logdev_file)
+ fput(logdev_file);
return error;
}
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 6ab2318..3b10371 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -125,7 +125,8 @@ static void zonefs_readahead(struct readahead_control *rac)
* which implies that the page range can only be within the fixed inode size.
*/
static int zonefs_write_map_blocks(struct iomap_writepage_ctx *wpc,
- struct inode *inode, loff_t offset)
+ struct inode *inode, loff_t offset,
+ unsigned int len)
{
struct zonefs_zone *z = zonefs_inode_zone(inode);
@@ -348,7 +349,12 @@ static int zonefs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
struct zonefs_inode_info *zi = ZONEFS_I(inode);
if (error) {
- zonefs_io_error(inode, true);
+ /*
+ * For Sync IOs, error recovery is called from
+ * zonefs_file_dio_write().
+ */
+ if (!is_sync_kiocb(iocb))
+ zonefs_io_error(inode, true);
return error;
}
@@ -491,6 +497,14 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
ret = -EINVAL;
goto inode_unlock;
}
+ /*
+ * Advance the zone write pointer offset. This assumes that the
+ * IO will succeed, which is OK to do because we do not allow
+ * partial writes (IOMAP_DIO_PARTIAL is not set) and if the IO
+ * fails, the error path will correct the write pointer offset.
+ */
+ z->z_wpoffset += count;
+ zonefs_inode_account_active(inode);
mutex_unlock(&zi->i_truncate_mutex);
}
@@ -504,20 +518,19 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
if (ret == -ENOTBLK)
ret = -EBUSY;
- if (zonefs_zone_is_seq(z) &&
- (ret > 0 || ret == -EIOCBQUEUED)) {
- if (ret > 0)
- count = ret;
-
- /*
- * Update the zone write pointer offset assuming the write
- * operation succeeded. If it did not, the error recovery path
- * will correct it. Also do active seq file accounting.
- */
- mutex_lock(&zi->i_truncate_mutex);
- z->z_wpoffset += count;
- zonefs_inode_account_active(inode);
- mutex_unlock(&zi->i_truncate_mutex);
+ /*
+ * For a failed IO or partial completion, trigger error recovery
+ * to update the zone write pointer offset to a correct value.
+ * For asynchronous IOs, zonefs_file_write_dio_end_io() may already
+ * have executed error recovery if the IO already completed when we
+ * reach here. However, we cannot know that and execute error recovery
+ * again (that will not change anything).
+ */
+ if (zonefs_zone_is_seq(z)) {
+ if (ret > 0 && ret != count)
+ ret = -EIO;
+ if (ret < 0 && ret != -EIOCBQUEUED)
+ zonefs_io_error(inode, true);
}
inode_unlock:
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index cadb136..aadad16 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -246,16 +246,18 @@ static void zonefs_inode_update_mode(struct inode *inode)
z->z_mode = inode->i_mode;
}
-struct zonefs_ioerr_data {
- struct inode *inode;
- bool write;
-};
-
static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
void *data)
{
- struct zonefs_ioerr_data *err = data;
- struct inode *inode = err->inode;
+ struct blk_zone *z = data;
+
+ *z = *zone;
+ return 0;
+}
+
+static void zonefs_handle_io_error(struct inode *inode, struct blk_zone *zone,
+ bool write)
+{
struct zonefs_zone *z = zonefs_inode_zone(inode);
struct super_block *sb = inode->i_sb;
struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
@@ -270,8 +272,8 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
data_size = zonefs_check_zone_condition(sb, z, zone);
isize = i_size_read(inode);
if (!(z->z_flags & (ZONEFS_ZONE_READONLY | ZONEFS_ZONE_OFFLINE)) &&
- !err->write && isize == data_size)
- return 0;
+ !write && isize == data_size)
+ return;
/*
* At this point, we detected either a bad zone or an inconsistency
@@ -292,7 +294,7 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
* In all cases, warn about inode size inconsistency and handle the
* IO error according to the zone condition and to the mount options.
*/
- if (zonefs_zone_is_seq(z) && isize != data_size)
+ if (isize != data_size)
zonefs_warn(sb,
"inode %lu: invalid size %lld (should be %lld)\n",
inode->i_ino, isize, data_size);
@@ -352,8 +354,6 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
zonefs_i_size_write(inode, data_size);
z->z_wpoffset = data_size;
zonefs_inode_account_active(inode);
-
- return 0;
}
/*
@@ -367,23 +367,25 @@ void __zonefs_io_error(struct inode *inode, bool write)
{
struct zonefs_zone *z = zonefs_inode_zone(inode);
struct super_block *sb = inode->i_sb;
- struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
unsigned int noio_flag;
- unsigned int nr_zones = 1;
- struct zonefs_ioerr_data err = {
- .inode = inode,
- .write = write,
- };
+ struct blk_zone zone;
int ret;
/*
- * The only files that have more than one zone are conventional zone
- * files with aggregated conventional zones, for which the inode zone
- * size is always larger than the device zone size.
+ * Conventional zone have no write pointer and cannot become read-only
+ * or offline. So simply fake a report for a single or aggregated zone
+ * and let zonefs_handle_io_error() correct the zone inode information
+ * according to the mount options.
*/
- if (z->z_size > bdev_zone_sectors(sb->s_bdev))
- nr_zones = z->z_size >>
- (sbi->s_zone_sectors_shift + SECTOR_SHIFT);
+ if (!zonefs_zone_is_seq(z)) {
+ zone.start = z->z_sector;
+ zone.len = z->z_size >> SECTOR_SHIFT;
+ zone.wp = zone.start + zone.len;
+ zone.type = BLK_ZONE_TYPE_CONVENTIONAL;
+ zone.cond = BLK_ZONE_COND_NOT_WP;
+ zone.capacity = zone.len;
+ goto handle_io_error;
+ }
/*
* Memory allocations in blkdev_report_zones() can trigger a memory
@@ -394,12 +396,20 @@ void __zonefs_io_error(struct inode *inode, bool write)
* the GFP_NOIO context avoids both problems.
*/
noio_flag = memalloc_noio_save();
- ret = blkdev_report_zones(sb->s_bdev, z->z_sector, nr_zones,
- zonefs_io_error_cb, &err);
- if (ret != nr_zones)
+ ret = blkdev_report_zones(sb->s_bdev, z->z_sector, 1,
+ zonefs_io_error_cb, &zone);
+ memalloc_noio_restore(noio_flag);
+
+ if (ret != 1) {
zonefs_err(sb, "Get inode %lu zone information failed %d\n",
inode->i_ino, ret);
- memalloc_noio_restore(noio_flag);
+ zonefs_warn(sb, "remounting filesystem read-only\n");
+ sb->s_flags |= SB_RDONLY;
+ return;
+ }
+
+handle_io_error:
+ zonefs_handle_io_error(inode, &zone, write);
}
static struct kmem_cache *zonefs_inode_cachep;
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 961f4d8..0c06957 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -193,7 +193,6 @@ do { \
#ifndef smp_store_release
#define smp_store_release(p, v) \
do { \
- compiletime_assert_atomic_type(*p); \
barrier(); \
WRITE_ONCE(*p, v); \
} while (0)
@@ -203,7 +202,6 @@ do { \
#define smp_load_acquire(p) \
({ \
__unqual_scalar_typeof(*p) ___p1 = READ_ONCE(*p); \
- compiletime_assert_atomic_type(*p); \
barrier(); \
(typeof(*p))___p1; \
})
diff --git a/include/drm/bridge/aux-bridge.h b/include/drm/bridge/aux-bridge.h
index c4c423e..4453906 100644
--- a/include/drm/bridge/aux-bridge.h
+++ b/include/drm/bridge/aux-bridge.h
@@ -9,6 +9,8 @@
#include <drm/drm_connector.h>
+struct auxiliary_device;
+
#if IS_ENABLED(CONFIG_DRM_AUX_BRIDGE)
int drm_aux_bridge_register(struct device *parent);
#else
@@ -19,10 +21,23 @@ static inline int drm_aux_bridge_register(struct device *parent)
#endif
#if IS_ENABLED(CONFIG_DRM_AUX_HPD_BRIDGE)
+struct auxiliary_device *devm_drm_dp_hpd_bridge_alloc(struct device *parent, struct device_node *np);
+int devm_drm_dp_hpd_bridge_add(struct device *dev, struct auxiliary_device *adev);
struct device *drm_dp_hpd_bridge_register(struct device *parent,
struct device_node *np);
void drm_aux_hpd_bridge_notify(struct device *dev, enum drm_connector_status status);
#else
+static inline struct auxiliary_device *devm_drm_dp_hpd_bridge_alloc(struct device *parent,
+ struct device_node *np)
+{
+ return NULL;
+}
+
+static inline int devm_drm_dp_hpd_bridge_add(struct auxiliary_device *adev)
+{
+ return 0;
+}
+
static inline struct device *drm_dp_hpd_bridge_register(struct device *parent,
struct device_node *np)
{
diff --git a/include/kunit/test.h b/include/kunit/test.h
index fcb4a49..61637ef 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -579,12 +579,12 @@ void __printf(2, 3) kunit_log_append(struct string_stream *log, const char *fmt,
void __noreturn __kunit_abort(struct kunit *test);
-void __kunit_do_failed_assertion(struct kunit *test,
- const struct kunit_loc *loc,
- enum kunit_assert_type type,
- const struct kunit_assert *assert,
- assert_format_t assert_format,
- const char *fmt, ...);
+void __printf(6, 7) __kunit_do_failed_assertion(struct kunit *test,
+ const struct kunit_loc *loc,
+ enum kunit_assert_type type,
+ const struct kunit_assert *assert,
+ assert_format_t assert_format,
+ const char *fmt, ...);
#define _KUNIT_FAILED(test, assert_type, assert_class, assert_format, INITIALIZER, fmt, ...) do { \
static const struct kunit_loc __loc = KUNIT_CURRENT_LOC; \
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index ae12696..2ad2610 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -141,8 +141,6 @@ struct bdi_writeback {
struct delayed_work dwork; /* work item used for writeback */
struct delayed_work bw_dwork; /* work item used for bandwidth estimate */
- unsigned long dirty_sleep; /* last wait */
-
struct list_head bdi_node; /* anchored at bdi->wb_list */
#ifdef CONFIG_CGROUP_WRITEBACK
@@ -179,6 +177,11 @@ struct backing_dev_info {
* any dirty wbs, which is depended upon by bdi_has_dirty().
*/
atomic_long_t tot_write_bandwidth;
+ /*
+ * Jiffies when last process was dirty throttled on this bdi. Used by
+ * blk-wbt.
+ */
+ unsigned long last_bdp_sleep;
struct bdi_writeback wb; /* the root writeback info for this bdi */
struct list_head wb_list; /* list of all wbs */
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 1a97277..8e7af9a 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -38,7 +38,6 @@ struct backing_dev_info *bdi_alloc(int node_id);
void wb_start_background_writeback(struct bdi_writeback *wb);
void wb_workfn(struct work_struct *work);
-void wb_wakeup_delayed(struct bdi_writeback *wb);
void wb_wait_for_completion(struct wb_completion *done);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 390d35f..d3d8fd8 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -8,6 +8,7 @@
#include <linux/scatterlist.h>
#include <linux/prefetch.h>
#include <linux/srcu.h>
+#include <linux/rw_hint.h>
struct blk_mq_tags;
struct blk_flush_queue;
@@ -135,6 +136,7 @@ struct request {
struct blk_crypto_keyslot *crypt_keyslot;
#endif
+ enum rw_hint write_hint;
unsigned short ioprio;
enum mq_rq_state state;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 1c07848..cb1526e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -10,6 +10,7 @@
#include <linux/bvec.h>
#include <linux/device.h>
#include <linux/ktime.h>
+#include <linux/rw_hint.h>
struct bio_set;
struct bio;
@@ -227,6 +228,7 @@ struct bio {
*/
unsigned short bi_flags; /* BIO_* below */
unsigned short bi_ioprio;
+ enum rw_hint bi_write_hint;
blk_status_t bi_status;
atomic_t __bi_remaining;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a9a0bd6..f9b87c3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
#include <linux/sbitmap.h>
#include <linux/uuid.h>
#include <linux/xarray.h>
+#include <linux/file.h>
struct module;
struct request_queue;
@@ -1521,26 +1522,20 @@ extern const struct blk_holder_ops fs_holder_ops;
(BLK_OPEN_READ | BLK_OPEN_RESTRICT_WRITES | \
(((flags) & SB_RDONLY) ? 0 : BLK_OPEN_WRITE))
-struct bdev_handle {
- struct block_device *bdev;
- void *holder;
- blk_mode_t mode;
-};
-
-struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
const struct blk_holder_ops *hops);
-struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
+struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode,
void *holder, const struct blk_holder_ops *hops);
int bd_prepare_to_claim(struct block_device *bdev, void *holder,
const struct blk_holder_ops *hops);
void bd_abort_claiming(struct block_device *bdev, void *holder);
-void bdev_release(struct bdev_handle *handle);
/* just for blk-cgroup, don't use elsewhere */
struct block_device *blkdev_get_no_open(dev_t dev);
void blkdev_put_no_open(struct block_device *bdev);
struct block_device *I_BDEV(struct inode *inode);
+struct block_device *file_bdev(struct file *bdev_file);
#ifdef CONFIG_BLOCK
void invalidate_bdev(struct block_device *bdev);
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 555aae54..bd1e361 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -83,7 +83,7 @@ struct bvec_iter {
unsigned int bi_bvec_done; /* number of bytes completed in
current bvec */
-} __packed;
+} __packed __aligned(4);
struct bvec_iter_all {
struct bio_vec bv;
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index 2eaaabb..1717cc57 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -283,7 +283,7 @@ struct ceph_msg {
struct kref kref;
bool more_to_follow;
bool needs_out_seq;
- bool sparse_read;
+ u64 sparse_read_total;
int front_alloc_len;
struct ceph_msgpool *pool;
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index fa018d5..f66f6aa 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -45,6 +45,7 @@ enum ceph_sparse_read_state {
CEPH_SPARSE_READ_HDR = 0,
CEPH_SPARSE_READ_EXTENTS,
CEPH_SPARSE_READ_DATA_LEN,
+ CEPH_SPARSE_READ_DATA_PRE,
CEPH_SPARSE_READ_DATA,
};
@@ -64,7 +65,7 @@ struct ceph_sparse_read {
u64 sr_req_len; /* orig request length */
u64 sr_pos; /* current pos in buffer */
int sr_index; /* current extent index */
- __le32 sr_datalen; /* length of actual data */
+ u32 sr_datalen; /* length of actual data */
u32 sr_count; /* extent count in reply */
int sr_ext_len; /* length of extent array */
struct ceph_sparse_extent *sr_extent; /* extent array */
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index aebb65b..75bd169 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -64,6 +64,26 @@
__builtin_unreachable(); \
} while (0)
+/*
+ * GCC 'asm goto' with outputs miscompiles certain code sequences:
+ *
+ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
+ *
+ * Work around it via the same compiler barrier quirk that we used
+ * to use for the old 'asm goto' workaround.
+ *
+ * Also, always mark such 'asm goto' statements as volatile: all
+ * asm goto statements are supposed to be volatile as per the
+ * documentation, but some versions of gcc didn't actually do
+ * that for asms with outputs:
+ *
+ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
+ */
+#ifdef CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND
+#define asm_goto_output(x...) \
+ do { asm volatile goto(x); asm (""); } while (0)
+#endif
+
#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP)
#define __HAVE_BUILTIN_BSWAP32__
#define __HAVE_BUILTIN_BSWAP64__
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 6f1ca49..0caf354 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -362,8 +362,15 @@ struct ftrace_likely_data {
#define __member_size(p) __builtin_object_size(p, 1)
#endif
-#ifndef asm_volatile_goto
-#define asm_volatile_goto(x...) asm goto(x)
+/*
+ * Some versions of gcc do not mark 'asm goto' volatile:
+ *
+ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979
+ *
+ * We do it here by hand, because it doesn't hurt.
+ */
+#ifndef asm_goto_output
+#define asm_goto_output(x...) asm volatile goto(x)
#endif
#ifdef CONFIG_CC_HAS_ASM_INLINE
diff --git a/include/linux/cper.h b/include/linux/cper.h
index c1a7dc3..265b0f8 100644
--- a/include/linux/cper.h
+++ b/include/linux/cper.h
@@ -90,6 +90,29 @@ enum {
GUID_INIT(0x667DD791, 0xC6B3, 0x4c27, 0x8A, 0x6B, 0x0F, 0x8E, \
0x72, 0x2D, 0xEB, 0x41)
+/* CXL Event record UUIDs are formatted as GUIDs and reported in section type */
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CPER_SEC_CXL_GEN_MEDIA_GUID \
+ GUID_INIT(0xfbcd0a77, 0xc260, 0x417f, \
+ 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CPER_SEC_CXL_DRAM_GUID \
+ GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab, \
+ 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CPER_SEC_CXL_MEM_MODULE_GUID \
+ GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
+ 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
/*
* Flags bits definitions for flags in struct cper_record_header
* If set, the error has been recovered
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 91125ec..03fa6d5 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -140,22 +140,4 @@ struct cxl_cper_event_rec {
union cxl_event event;
} __packed;
-typedef void (*cxl_cper_callback)(enum cxl_event_type type,
- struct cxl_cper_event_rec *rec);
-
-#ifdef CONFIG_ACPI_APEI_GHES
-int cxl_cper_register_callback(cxl_cper_callback callback);
-int cxl_cper_unregister_callback(cxl_cper_callback callback);
-#else
-static inline int cxl_cper_register_callback(cxl_cper_callback callback)
-{
- return 0;
-}
-
-static inline int cxl_cper_unregister_callback(cxl_cper_callback callback)
-{
- return 0;
-}
-#endif
-
#endif /* _LINUX_CXL_EVENT_H */
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 1666c38..bf53e38 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -125,6 +125,11 @@ enum dentry_d_lock_class
DENTRY_D_LOCK_NESTED
};
+enum d_real_type {
+ D_REAL_DATA,
+ D_REAL_METADATA,
+};
+
struct dentry_operations {
int (*d_revalidate)(struct dentry *, unsigned int);
int (*d_weak_revalidate)(struct dentry *, unsigned int);
@@ -139,7 +144,7 @@ struct dentry_operations {
char *(*d_dname)(struct dentry *, char *, int);
struct vfsmount *(*d_automount)(struct path *);
int (*d_manage)(const struct path *, bool);
- struct dentry *(*d_real)(struct dentry *, const struct inode *);
+ struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
} ____cacheline_aligned;
/*
@@ -173,6 +178,7 @@ struct dentry_operations {
#define DCACHE_DONTCACHE BIT(7) /* Purge from memory on final dput() */
#define DCACHE_CANT_MOUNT BIT(8)
+#define DCACHE_GENOCIDE BIT(9)
#define DCACHE_SHRINK_LIST BIT(10)
#define DCACHE_OP_WEAK_REVALIDATE BIT(11)
@@ -546,24 +552,23 @@ static inline struct inode *d_backing_inode(const struct dentry *upper)
/**
* d_real - Return the real dentry
* @dentry: the dentry to query
- * @inode: inode to select the dentry from multiple layers (can be NULL)
+ * @type: the type of real dentry (data or metadata)
*
* If dentry is on a union/overlay, then return the underlying, real dentry.
* Otherwise return the dentry itself.
*
* See also: Documentation/filesystems/vfs.rst
*/
-static inline struct dentry *d_real(struct dentry *dentry,
- const struct inode *inode)
+static inline struct dentry *d_real(struct dentry *dentry, enum d_real_type type)
{
if (unlikely(dentry->d_flags & DCACHE_OP_REAL))
- return dentry->d_op->d_real(dentry, inode);
+ return dentry->d_op->d_real(dentry, type);
else
return dentry;
}
/**
- * d_real_inode - Return the real inode
+ * d_real_inode - Return the real inode hosting the data
* @dentry: The dentry to query
*
* If dentry is on a union/overlay, then return the underlying, real inode.
@@ -572,7 +577,7 @@ static inline struct dentry *d_real(struct dentry *dentry,
static inline struct inode *d_real_inode(const struct dentry *dentry)
{
/* This usage of d_real() results in const dentry */
- return d_backing_inode(d_real((struct dentry *) dentry, NULL));
+ return d_inode(d_real((struct dentry *) dentry, D_REAL_DATA));
}
struct name_snapshot {
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 772ab4d..82b2195 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -165,7 +165,7 @@ void dm_error(const char *message);
struct dm_dev {
struct block_device *bdev;
- struct bdev_handle *bdev_handle;
+ struct file *bdev_file;
struct dax_device *dax_dev;
blk_mode_t mode;
char name[16];
diff --git a/include/linux/dpll.h b/include/linux/dpll.h
index 9cf896e..e37344f 100644
--- a/include/linux/dpll.h
+++ b/include/linux/dpll.h
@@ -10,6 +10,8 @@
#include <uapi/linux/dpll.h>
#include <linux/device.h>
#include <linux/netlink.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
struct dpll_device;
struct dpll_pin;
@@ -120,15 +122,24 @@ struct dpll_pin_properties {
};
#if IS_ENABLED(CONFIG_DPLL)
-size_t dpll_msg_pin_handle_size(struct dpll_pin *pin);
-int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin);
+void dpll_netdev_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin);
+void dpll_netdev_pin_clear(struct net_device *dev);
+
+size_t dpll_netdev_pin_handle_size(const struct net_device *dev);
+int dpll_netdev_add_pin_handle(struct sk_buff *msg,
+ const struct net_device *dev);
#else
-static inline size_t dpll_msg_pin_handle_size(struct dpll_pin *pin)
+static inline void
+dpll_netdev_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin) { }
+static inline void dpll_netdev_pin_clear(struct net_device *dev) { }
+
+static inline size_t dpll_netdev_pin_handle_size(const struct net_device *dev)
{
return 0;
}
-static inline int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin)
+static inline int
+dpll_netdev_add_pin_handle(struct sk_buff *msg, const struct net_device *dev)
{
return 0;
}
diff --git a/include/linux/file.h b/include/linux/file.h
index 6834a29..169692c 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -24,6 +24,8 @@ struct inode;
struct path;
extern struct file *alloc_file_pseudo(struct inode *, struct vfsmount *,
const char *, int flags, const struct file_operations *);
+extern struct file *alloc_file_pseudo_noaccount(struct inode *, struct vfsmount *,
+ const char *, int flags, const struct file_operations *);
extern struct file *alloc_file_clone(struct file *, int flags,
const struct file_operations *);
diff --git a/include/linux/filelock.h b/include/linux/filelock.h
index 95e868e..daee999 100644
--- a/include/linux/filelock.h
+++ b/include/linux/filelock.h
@@ -27,6 +27,7 @@
#define FILE_LOCK_DEFERRED 1
struct file_lock;
+struct file_lease;
struct file_lock_operations {
void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
@@ -39,14 +40,17 @@ struct lock_manager_operations {
void (*lm_put_owner)(fl_owner_t);
void (*lm_notify)(struct file_lock *); /* unblock callback */
int (*lm_grant)(struct file_lock *, int);
- bool (*lm_break)(struct file_lock *);
- int (*lm_change)(struct file_lock *, int, struct list_head *);
- void (*lm_setup)(struct file_lock *, void **);
- bool (*lm_breaker_owns_lease)(struct file_lock *);
bool (*lm_lock_expirable)(struct file_lock *cfl);
void (*lm_expire_lock)(void);
};
+struct lease_manager_operations {
+ bool (*lm_break)(struct file_lease *);
+ int (*lm_change)(struct file_lease *, int, struct list_head *);
+ void (*lm_setup)(struct file_lease *, void **);
+ bool (*lm_breaker_owns_lease)(struct file_lease *);
+};
+
struct lock_manager {
struct list_head list;
/*
@@ -85,31 +89,31 @@ bool opens_in_grace(struct net *);
*
* Obviously, the last two criteria only matter for POSIX locks.
*/
-struct file_lock {
- struct file_lock *fl_blocker; /* The lock, that is blocking us */
- struct list_head fl_list; /* link into file_lock_context */
- struct hlist_node fl_link; /* node in global lists */
- struct list_head fl_blocked_requests; /* list of requests with
+
+struct file_lock_core {
+ struct file_lock_core *flc_blocker; /* The lock that is blocking us */
+ struct list_head flc_list; /* link into file_lock_context */
+ struct hlist_node flc_link; /* node in global lists */
+ struct list_head flc_blocked_requests; /* list of requests with
* ->fl_blocker pointing here
*/
- struct list_head fl_blocked_member; /* node in
+ struct list_head flc_blocked_member; /* node in
* ->fl_blocker->fl_blocked_requests
*/
- fl_owner_t fl_owner;
- unsigned int fl_flags;
- unsigned char fl_type;
- unsigned int fl_pid;
- int fl_link_cpu; /* what cpu's list is this on? */
- wait_queue_head_t fl_wait;
- struct file *fl_file;
+ fl_owner_t flc_owner;
+ unsigned int flc_flags;
+ unsigned char flc_type;
+ pid_t flc_pid;
+ int flc_link_cpu; /* what cpu's list is this on? */
+ wait_queue_head_t flc_wait;
+ struct file *flc_file;
+};
+
+struct file_lock {
+ struct file_lock_core c;
loff_t fl_start;
loff_t fl_end;
- struct fasync_struct * fl_fasync; /* for lease break notifications */
- /* for lease breaks: */
- unsigned long fl_break_time;
- unsigned long fl_downgrade_time;
-
const struct file_lock_operations *fl_ops; /* Callbacks for filesystems */
const struct lock_manager_operations *fl_lmops; /* Callbacks for lockmanagers */
union {
@@ -126,6 +130,15 @@ struct file_lock {
} fl_u;
} __randomize_layout;
+struct file_lease {
+ struct file_lock_core c;
+ struct fasync_struct * fl_fasync; /* for lease break notifications */
+ /* for lease breaks: */
+ unsigned long fl_break_time;
+ unsigned long fl_downgrade_time;
+ const struct lease_manager_operations *fl_lmops; /* Callbacks for lease managers */
+} __randomize_layout;
+
struct file_lock_context {
spinlock_t flc_lock;
struct list_head flc_flock;
@@ -147,11 +160,31 @@ int fcntl_setlk64(unsigned int, struct file *, unsigned int,
int fcntl_setlease(unsigned int fd, struct file *filp, int arg);
int fcntl_getlease(struct file *filp);
+static inline bool lock_is_unlock(struct file_lock *fl)
+{
+ return fl->c.flc_type == F_UNLCK;
+}
+
+static inline bool lock_is_read(struct file_lock *fl)
+{
+ return fl->c.flc_type == F_RDLCK;
+}
+
+static inline bool lock_is_write(struct file_lock *fl)
+{
+ return fl->c.flc_type == F_WRLCK;
+}
+
+static inline void locks_wake_up(struct file_lock *fl)
+{
+ wake_up(&fl->c.flc_wait);
+}
+
/* fs/locks.c */
void locks_free_lock_context(struct inode *inode);
void locks_free_lock(struct file_lock *fl);
void locks_init_lock(struct file_lock *);
-struct file_lock * locks_alloc_lock(void);
+struct file_lock *locks_alloc_lock(void);
void locks_copy_lock(struct file_lock *, struct file_lock *);
void locks_copy_conflock(struct file_lock *, struct file_lock *);
void locks_remove_posix(struct file *, fl_owner_t);
@@ -165,11 +198,16 @@ int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_l
int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
bool vfs_inode_has_locks(struct inode *inode);
int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
+
+void locks_init_lease(struct file_lease *);
+void locks_free_lease(struct file_lease *fl);
+struct file_lease *locks_alloc_lease(void);
int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
void lease_get_mtime(struct inode *, struct timespec64 *time);
-int generic_setlease(struct file *, int, struct file_lock **, void **priv);
-int vfs_setlease(struct file *, int, struct file_lock **, void **);
-int lease_modify(struct file_lock *, int, struct list_head *);
+int generic_setlease(struct file *, int, struct file_lease **, void **priv);
+int kernel_setlease(struct file *, int, struct file_lease **, void **);
+int vfs_setlease(struct file *, int, struct file_lease **, void **);
+int lease_modify(struct file_lease *, int, struct list_head *);
struct notifier_block;
int lease_register_notifier(struct notifier_block *);
@@ -223,6 +261,25 @@ static inline int fcntl_getlease(struct file *filp)
return F_UNLCK;
}
+static inline bool lock_is_unlock(struct file_lock *fl)
+{
+ return false;
+}
+
+static inline bool lock_is_read(struct file_lock *fl)
+{
+ return false;
+}
+
+static inline bool lock_is_write(struct file_lock *fl)
+{
+ return false;
+}
+
+static inline void locks_wake_up(struct file_lock *fl)
+{
+}
+
static inline void
locks_free_lock_context(struct inode *inode)
{
@@ -233,6 +290,11 @@ static inline void locks_init_lock(struct file_lock *fl)
return;
}
+static inline void locks_init_lease(struct file_lease *fl)
+{
+ return;
+}
+
static inline void locks_copy_conflock(struct file_lock *new, struct file_lock *fl)
{
return;
@@ -307,18 +369,24 @@ static inline void lease_get_mtime(struct inode *inode,
}
static inline int generic_setlease(struct file *filp, int arg,
- struct file_lock **flp, void **priv)
+ struct file_lease **flp, void **priv)
+{
+ return -EINVAL;
+}
+
+static inline int kernel_setlease(struct file *filp, int arg,
+ struct file_lease **lease, void **priv)
{
return -EINVAL;
}
static inline int vfs_setlease(struct file *filp, int arg,
- struct file_lock **lease, void **priv)
+ struct file_lease **lease, void **priv)
{
return -EINVAL;
}
-static inline int lease_modify(struct file_lock *fl, int arg,
+static inline int lease_modify(struct file_lease *fl, int arg,
struct list_head *dispose)
{
return -EINVAL;
@@ -341,6 +409,9 @@ locks_inode_context(const struct inode *inode)
#endif /* !CONFIG_FILE_LOCKING */
+/* for walking lists of file_locks linked by fl_list */
+#define for_each_file_lock(_fl, _head) list_for_each_entry(_fl, _head, c.flc_list)
+
static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
{
return locks_lock_inode_wait(file_inode(filp), fl);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ed5966a..0a22b72 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -43,6 +43,8 @@
#include <linux/cred.h>
#include <linux/mnt_idmapping.h>
#include <linux/slab.h>
+#include <linux/maple_tree.h>
+#include <linux/rw_hint.h>
#include <asm/byteorder.h>
#include <uapi/linux/fs.h>
@@ -309,19 +311,6 @@ struct address_space;
struct writeback_control;
struct readahead_control;
-/*
- * Write life time hint values.
- * Stored in struct inode as u8.
- */
-enum rw_hint {
- WRITE_LIFE_NOT_SET = 0,
- WRITE_LIFE_NONE = RWH_WRITE_LIFE_NONE,
- WRITE_LIFE_SHORT = RWH_WRITE_LIFE_SHORT,
- WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
- WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
- WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME,
-};
-
/* Match RWF_* bits to IOCB bits */
#define IOCB_HIPRI (__force int) RWF_HIPRI
#define IOCB_DSYNC (__force int) RWF_DSYNC
@@ -352,6 +341,8 @@ enum rw_hint {
* unrelated IO (like cache flushing, new IO generation, etc).
*/
#define IOCB_DIO_CALLER_COMP (1 << 22)
+/* kiocb is a read or write operation submitted by fs/aio.c. */
+#define IOCB_AIO_RW (1 << 23)
/* for use in trace events */
#define TRACE_IOCB_STRINGS \
@@ -482,10 +473,10 @@ struct address_space {
pgoff_t writeback_index;
const struct address_space_operations *a_ops;
unsigned long flags;
- struct rw_semaphore i_mmap_rwsem;
errseq_t wb_err;
spinlock_t i_private_lock;
struct list_head i_private_list;
+ struct rw_semaphore i_mmap_rwsem;
void * i_private_data;
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
/*
@@ -677,7 +668,7 @@ struct inode {
spinlock_t i_lock; /* i_blocks, i_bytes, maybe i_size */
unsigned short i_bytes;
u8 i_blkbits;
- u8 i_write_hint;
+ enum rw_hint i_write_hint;
blkcnt_t i_blocks;
#ifdef __NEED_I_SIZE_ORDERED
@@ -907,7 +898,8 @@ static inline loff_t i_size_read(const struct inode *inode)
preempt_enable();
return i_size;
#else
- return inode->i_size;
+ /* Pairs with smp_store_release() in i_size_write() */
+ return smp_load_acquire(&inode->i_size);
#endif
}
@@ -929,7 +921,12 @@ static inline void i_size_write(struct inode *inode, loff_t i_size)
inode->i_size = i_size;
preempt_enable();
#else
- inode->i_size = i_size;
+ /*
+ * Pairs with smp_load_acquire() in i_size_read() to ensure
+ * changes related to inode size (such as page contents) are
+ * visible before we see the changed inode size.
+ */
+ smp_store_release(&inode->i_size, i_size);
#endif
}
@@ -1064,6 +1061,7 @@ struct file *get_file_active(struct file **f);
typedef void *fl_owner_t;
struct file_lock;
+struct file_lease;
/* The following constant reflects the upper bound of the file/locking space */
#ifndef OFFSET_MAX
@@ -1078,9 +1076,20 @@ static inline struct inode *file_inode(const struct file *f)
return f->f_inode;
}
+/*
+ * file_dentry() is a relic from the days that overlayfs was using files with a
+ * "fake" path, meaning, f_path on overlayfs and f_inode on underlying fs.
+ * In those days, file_dentry() was needed to get the underlying fs dentry that
+ * matches f_inode.
+ * Files with "fake" path should not exist nowadays, so use an assertion to make
+ * sure that file_dentry() was not papering over filesystem bugs.
+ */
static inline struct dentry *file_dentry(const struct file *file)
{
- return d_real(file->f_path.dentry, file_inode(file));
+ struct dentry *dentry = file->f_path.dentry;
+
+ WARN_ON_ONCE(d_inode(dentry) != file_inode(file));
+ return dentry;
}
struct fasync_struct {
@@ -1228,8 +1237,8 @@ struct super_block {
#endif
struct hlist_bl_head s_roots; /* alternate root dentries for NFS */
struct list_head s_mounts; /* list of mounts; _not_ for fs use */
- struct block_device *s_bdev;
- struct bdev_handle *s_bdev_handle;
+ struct block_device *s_bdev; /* can go away once we use an accessor for @s_bdev_file */
+ struct file *s_bdev_file;
struct backing_dev_info *s_bdi;
struct mtd_info *s_mtd;
struct hlist_node s_instances;
@@ -1255,8 +1264,22 @@ struct super_block {
struct fsnotify_mark_connector __rcu *s_fsnotify_marks;
#endif
+ /*
+ * q: why are s_id and s_sysfs_name not the same? both are human
+ * readable strings that identify the filesystem
+ * a: s_id is allowed to change at runtime; it's used in log messages,
+ * and we want to when a device starts out as single device (s_id is dev
+ * name) but then a device is hot added and we have to switch to
+ * identifying it by UUID
+ * but s_sysfs_name is a handle for programmatic access, and can't
+ * change at runtime
+ */
char s_id[32]; /* Informational name */
uuid_t s_uuid; /* UUID */
+ u8 s_uuid_len; /* Default 16, possibly smaller for weird filesystems */
+
+ /* if set, fs shows up under sysfs at /sys/fs/$FSTYP/s_sysfs_name */
+ char s_sysfs_name[UUID_STRING_LEN + 1];
unsigned int s_max_links;
@@ -2005,7 +2028,7 @@ struct file_operations {
ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
void (*splice_eof)(struct file *file);
- int (*setlease)(struct file *, int, struct file_lock **, void **);
+ int (*setlease)(struct file *, int, struct file_lease **, void **);
long (*fallocate)(struct file *file, int mode, loff_t offset,
loff_t len);
void (*show_fdinfo)(struct seq_file *m, struct file *f);
@@ -2101,9 +2124,6 @@ int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t *count, unsigned int remap_flags);
-extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
- struct file *file_out, loff_t pos_out,
- loff_t len, unsigned int remap_flags);
extern loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t len, unsigned int remap_flags);
@@ -2532,6 +2552,44 @@ extern __printf(2, 3)
int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
extern int super_setup_bdi(struct super_block *sb);
+static inline void super_set_uuid(struct super_block *sb, const u8 *uuid, unsigned len)
+{
+ if (WARN_ON(len > sizeof(sb->s_uuid)))
+ len = sizeof(sb->s_uuid);
+ sb->s_uuid_len = len;
+ memcpy(&sb->s_uuid, uuid, len);
+}
+
+/* set sb sysfs name based on sb->s_bdev */
+static inline void super_set_sysfs_name_bdev(struct super_block *sb)
+{
+ snprintf(sb->s_sysfs_name, sizeof(sb->s_sysfs_name), "%pg", sb->s_bdev);
+}
+
+/* set sb sysfs name based on sb->s_uuid */
+static inline void super_set_sysfs_name_uuid(struct super_block *sb)
+{
+ WARN_ON(sb->s_uuid_len != sizeof(sb->s_uuid));
+ snprintf(sb->s_sysfs_name, sizeof(sb->s_sysfs_name), "%pU", sb->s_uuid.b);
+}
+
+/* set sb sysfs name based on sb->s_id */
+static inline void super_set_sysfs_name_id(struct super_block *sb)
+{
+ strscpy(sb->s_sysfs_name, sb->s_id, sizeof(sb->s_sysfs_name));
+}
+
+/* try to use something standard before you use this */
+__printf(2, 3)
+static inline void super_set_sysfs_name_generic(struct super_block *sb, const char *fmt, ...)
+{
+ va_list args;
+
+ va_start(args, fmt);
+ vsnprintf(sb->s_sysfs_name, sizeof(sb->s_sysfs_name), fmt, args);
+ va_end(args);
+}
+
extern int current_umask(void);
extern void ihold(struct inode * inode);
@@ -2928,6 +2986,17 @@ extern bool path_is_under(const struct path *, const struct path *);
extern char *file_path(struct file *, char *, int);
+/**
+ * is_dot_dotdot - returns true only if @name is "." or ".."
+ * @name: file name to check
+ * @len: length of file name, in bytes
+ */
+static inline bool is_dot_dotdot(const char *name, size_t len)
+{
+ return len && unlikely(name[0] == '.') &&
+ (len == 1 || (len == 2 && name[1] == '.'));
+}
+
#include <linux/err.h>
/* needed for stackable file system support */
@@ -3238,7 +3307,7 @@ extern int simple_write_begin(struct file *file, struct address_space *mapping,
extern const struct address_space_operations ram_aops;
extern int always_delete_dentry(const struct dentry *);
extern struct inode *alloc_anon_inode(struct super_block *);
-extern int simple_nosetlease(struct file *, int, struct file_lock **, void **);
+extern int simple_nosetlease(struct file *, int, struct file_lease **, void **);
extern const struct dentry_operations simple_dentry_operations;
extern struct dentry *simple_lookup(struct inode *, struct dentry *, unsigned int flags);
@@ -3260,13 +3329,14 @@ extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
const void __user *from, size_t count);
struct offset_ctx {
- struct xarray xa;
- u32 next_offset;
+ struct maple_tree mt;
+ unsigned long next_offset;
};
void simple_offset_init(struct offset_ctx *octx);
int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry);
void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry);
+int simple_offset_empty(struct dentry *dentry);
int simple_offset_rename_exchange(struct inode *old_dir,
struct dentry *old_dentry,
struct inode *new_dir,
@@ -3280,7 +3350,16 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
extern int generic_check_addressable(unsigned, u64);
-extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
+extern void generic_set_sb_d_ops(struct super_block *sb);
+
+static inline bool sb_has_encoding(const struct super_block *sb)
+{
+#if IS_ENABLED(CONFIG_UNICODE)
+ return !!sb->s_encoding;
+#else
+ return false;
+#endif
+}
int may_setattr(struct mnt_idmap *idmap, struct inode *inode,
unsigned int ia_valid);
@@ -3335,6 +3414,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
return 0;
if (unlikely(flags & ~RWF_SUPPORTED))
return -EOPNOTSUPP;
+ if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND)))
+ return -EINVAL;
if (flags & RWF_NOWAIT) {
if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
@@ -3345,6 +3426,12 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
if (flags & RWF_SYNC)
kiocb_flags |= IOCB_DSYNC;
+ if ((flags & RWF_NOAPPEND) && (ki->ki_flags & IOCB_APPEND)) {
+ if (IS_APPEND(file_inode(ki->ki_filp)))
+ return -EPERM;
+ ki->ki_flags &= ~IOCB_APPEND;
+ }
+
ki->ki_flags |= kiocb_flags;
return 0;
}
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index 12f9e45..772f822 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -192,6 +192,8 @@ struct fscrypt_operations {
unsigned int *num_devs);
};
+int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);
+
static inline struct fscrypt_inode_info *
fscrypt_get_inode_info(const struct inode *inode)
{
@@ -221,15 +223,29 @@ static inline bool fscrypt_needs_contents_encryption(const struct inode *inode)
}
/*
- * When d_splice_alias() moves a directory's no-key alias to its plaintext alias
- * as a result of the encryption key being added, DCACHE_NOKEY_NAME must be
- * cleared. Note that we don't have to support arbitrary moves of this flag
- * because fscrypt doesn't allow no-key names to be the source or target of a
- * rename().
+ * When d_splice_alias() moves a directory's no-key alias to its
+ * plaintext alias as a result of the encryption key being added,
+ * DCACHE_NOKEY_NAME must be cleared and there might be an opportunity
+ * to disable d_revalidate. Note that we don't have to support the
+ * inverse operation because fscrypt doesn't allow no-key names to be
+ * the source or target of a rename().
*/
static inline void fscrypt_handle_d_move(struct dentry *dentry)
{
- dentry->d_flags &= ~DCACHE_NOKEY_NAME;
+ /*
+ * VFS calls fscrypt_handle_d_move even for non-fscrypt
+ * filesystems.
+ */
+ if (dentry->d_flags & DCACHE_NOKEY_NAME) {
+ dentry->d_flags &= ~DCACHE_NOKEY_NAME;
+
+ /*
+ * Other filesystem features might be handling dentry
+ * revalidation, in which case it cannot be disabled.
+ */
+ if (dentry->d_op->d_revalidate == fscrypt_d_revalidate)
+ dentry->d_flags &= ~DCACHE_OP_REVALIDATE;
+ }
}
/**
@@ -261,6 +277,35 @@ static inline bool fscrypt_is_nokey_name(const struct dentry *dentry)
return dentry->d_flags & DCACHE_NOKEY_NAME;
}
+static inline void fscrypt_prepare_dentry(struct dentry *dentry,
+ bool is_nokey_name)
+{
+ /*
+ * This code tries to only take ->d_lock when necessary to write
+ * to ->d_flags. We shouldn't be peeking on d_flags for
+ * DCACHE_OP_REVALIDATE unlocked, but in the unlikely case
+ * there is a race, the worst it can happen is that we fail to
+ * unset DCACHE_OP_REVALIDATE and pay the cost of an extra
+ * d_revalidate.
+ */
+ if (is_nokey_name) {
+ spin_lock(&dentry->d_lock);
+ dentry->d_flags |= DCACHE_NOKEY_NAME;
+ spin_unlock(&dentry->d_lock);
+ } else if (dentry->d_flags & DCACHE_OP_REVALIDATE &&
+ dentry->d_op->d_revalidate == fscrypt_d_revalidate) {
+ /*
+ * Unencrypted dentries and encrypted dentries where the
+ * key is available are always valid from fscrypt
+ * perspective. Avoid the cost of calling
+ * fscrypt_d_revalidate unnecessarily.
+ */
+ spin_lock(&dentry->d_lock);
+ dentry->d_flags &= ~DCACHE_OP_REVALIDATE;
+ spin_unlock(&dentry->d_lock);
+ }
+}
+
/* crypto.c */
void fscrypt_enqueue_decrypt_work(struct work_struct *);
@@ -368,7 +413,6 @@ int fscrypt_fname_disk_to_usr(const struct inode *inode,
bool fscrypt_match_name(const struct fscrypt_name *fname,
const u8 *de_name, u32 de_name_len);
u64 fscrypt_fname_siphash(const struct inode *dir, const struct qstr *name);
-int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);
/* bio.c */
bool fscrypt_decrypt_bio(struct bio *bio);
@@ -425,6 +469,11 @@ static inline bool fscrypt_is_nokey_name(const struct dentry *dentry)
return false;
}
+static inline void fscrypt_prepare_dentry(struct dentry *dentry,
+ bool is_nokey_name)
+{
+}
+
/* crypto.c */
static inline void fscrypt_enqueue_decrypt_work(struct work_struct *work)
{
@@ -982,6 +1031,9 @@ static inline int fscrypt_prepare_lookup(struct inode *dir,
fname->usr_fname = &dentry->d_name;
fname->disk_name.name = (unsigned char *)dentry->d_name.name;
fname->disk_name.len = dentry->d_name.len;
+
+ fscrypt_prepare_dentry(dentry, false);
+
return 0;
}
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index de292a0..e2a916c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp)
return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
}
+/*
+ * Check if the gfp flags allow compaction - GFP_NOIO is a really
+ * tricky context because the migration might require IO.
+ */
+static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
+{
+ return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
+}
+
extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
#ifdef CONFIG_CONTIG_ALLOC
diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index 9a5c6c7..7f75c9a 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -819,6 +819,24 @@ static inline struct gpio_chip *gpiod_to_chip(const struct gpio_desc *desc)
return ERR_PTR(-ENODEV);
}
+static inline struct gpio_device *gpiod_to_gpio_device(struct gpio_desc *desc)
+{
+ WARN_ON(1);
+ return ERR_PTR(-ENODEV);
+}
+
+static inline int gpio_device_get_base(struct gpio_device *gdev)
+{
+ WARN_ON(1);
+ return -ENODEV;
+}
+
+static inline const char *gpio_device_get_label(struct gpio_device *gdev)
+{
+ WARN_ON(1);
+ return NULL;
+}
+
static inline int gpiochip_lock_as_irq(struct gpio_chip *gc,
unsigned int offset)
{
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 87e3bed..641c456 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -157,6 +157,7 @@ enum hrtimer_base_type {
* @max_hang_time: Maximum time spent in hrtimer_interrupt
* @softirq_expiry_lock: Lock which is taken while softirq based hrtimer are
* expired
+ * @online: CPU is online from an hrtimers point of view
* @timer_waiters: A hrtimer_cancel() invocation waits for the timer
* callback to finish.
* @expires_next: absolute time of the next event, is required for remote
@@ -179,7 +180,8 @@ struct hrtimer_cpu_base {
unsigned int hres_active : 1,
in_hrtirq : 1,
hang_detected : 1,
- softirq_activated : 1;
+ softirq_activated : 1,
+ online : 1;
#ifdef CONFIG_HIGH_RES_TIMERS
unsigned int nr_events;
unsigned short nr_retries;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2b00faf..6ef0557b 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -164,8 +164,28 @@ struct hv_ring_buffer {
u8 buffer[];
} __packed;
+
+/*
+ * If the requested ring buffer size is at least 8 times the size of the
+ * header, steal space from the ring buffer for the header. Otherwise, add
+ * space for the header so that is doesn't take too much of the ring buffer
+ * space.
+ *
+ * The factor of 8 is somewhat arbitrary. The goal is to prevent adding a
+ * relatively small header (4 Kbytes on x86) to a large-ish power-of-2 ring
+ * buffer size (such as 128 Kbytes) and so end up making a nearly twice as
+ * large allocation that will be almost half wasted. As a contrasting example,
+ * on ARM64 with 64 Kbyte page size, we don't want to take 64 Kbytes for the
+ * header from a 128 Kbyte allocation, leaving only 64 Kbytes for the ring.
+ * In this latter case, we must add 64 Kbytes for the header and not worry
+ * about what's wasted.
+ */
+#define VMBUS_HEADER_ADJ(payload_sz) \
+ ((payload_sz) >= 8 * sizeof(struct hv_ring_buffer) ? \
+ 0 : sizeof(struct hv_ring_buffer))
+
/* Calculate the proper size of a ringbuffer, it must be page-aligned */
-#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(sizeof(struct hv_ring_buffer) + \
+#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(VMBUS_HEADER_ADJ(payload_sz) + \
(payload_sz))
struct hv_ring_buffer_info {
diff --git a/include/linux/iio/adc/ad_sigma_delta.h b/include/linux/iio/adc/ad_sigma_delta.h
index 7852f6c..719cf9c 100644
--- a/include/linux/iio/adc/ad_sigma_delta.h
+++ b/include/linux/iio/adc/ad_sigma_delta.h
@@ -8,6 +8,8 @@
#ifndef __AD_SIGMA_DELTA_H__
#define __AD_SIGMA_DELTA_H__
+#include <linux/iio/iio.h>
+
enum ad_sigma_delta_mode {
AD_SD_MODE_CONTINUOUS = 0,
AD_SD_MODE_SINGLE = 1,
@@ -99,7 +101,7 @@ struct ad_sigma_delta {
* 'rx_buf' is up to 32 bits per sample + 64 bit timestamp,
* rounded to 16 bytes to take into account padding.
*/
- uint8_t tx_buf[4] ____cacheline_aligned;
+ uint8_t tx_buf[4] __aligned(IIO_DMA_MINALIGN);
uint8_t rx_buf[16] __aligned(8);
};
diff --git a/include/linux/iio/common/st_sensors.h b/include/linux/iio/common/st_sensors.h
index 607c3a8..f9ae5cd 100644
--- a/include/linux/iio/common/st_sensors.h
+++ b/include/linux/iio/common/st_sensors.h
@@ -258,9 +258,9 @@ struct st_sensor_data {
bool hw_irq_trigger;
s64 hw_timestamp;
- char buffer_data[ST_SENSORS_MAX_BUFFER_SIZE] ____cacheline_aligned;
-
struct mutex odr_lock;
+
+ char buffer_data[ST_SENSORS_MAX_BUFFER_SIZE] __aligned(IIO_DMA_MINALIGN);
};
#ifdef CONFIG_IIO_BUFFER
diff --git a/include/linux/iio/imu/adis.h b/include/linux/iio/imu/adis.h
index dc9ea29..8898966 100644
--- a/include/linux/iio/imu/adis.h
+++ b/include/linux/iio/imu/adis.h
@@ -11,6 +11,7 @@
#include <linux/spi/spi.h>
#include <linux/interrupt.h>
+#include <linux/iio/iio.h>
#include <linux/iio/types.h>
#define ADIS_WRITE_REG(reg) ((0x80 | (reg)))
@@ -131,7 +132,7 @@ struct adis {
unsigned long irq_flag;
void *buffer;
- u8 tx[10] ____cacheline_aligned;
+ u8 tx[10] __aligned(IIO_DMA_MINALIGN);
u8 rx[4];
};
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 854ad67..e248936 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -2,6 +2,7 @@
#define IO_URING_TYPES_H
#include <linux/blkdev.h>
+#include <linux/hashtable.h>
#include <linux/task_work.h>
#include <linux/bitmap.h>
#include <linux/llist.h>
@@ -240,12 +241,14 @@ struct io_ring_ctx {
unsigned int poll_activated: 1;
unsigned int drain_disabled: 1;
unsigned int compat: 1;
+ unsigned int iowq_limits_set : 1;
struct task_struct *submitter_task;
struct io_rings *rings;
struct percpu_ref refs;
enum task_work_notify_mode notify_method;
+ unsigned sq_thread_idle;
} ____cacheline_aligned_in_smp;
/* submission data */
@@ -274,10 +277,20 @@ struct io_ring_ctx {
*/
struct io_rsrc_node *rsrc_node;
atomic_t cancel_seq;
+
+ /*
+ * ->iopoll_list is protected by the ctx->uring_lock for
+ * io_uring instances that don't use IORING_SETUP_SQPOLL.
+ * For SQPOLL, only the single threaded io_sq_thread() will
+ * manipulate the list, hence no extra locking is needed there.
+ */
+ bool poll_multi_queue;
+ struct io_wq_work_list iopoll_list;
+
struct io_file_table file_table;
+ struct io_mapped_ubuf **user_bufs;
unsigned nr_user_files;
unsigned nr_user_bufs;
- struct io_mapped_ubuf **user_bufs;
struct io_submit_state submit_state;
@@ -289,15 +302,6 @@ struct io_ring_ctx {
struct io_alloc_cache netmsg_cache;
/*
- * ->iopoll_list is protected by the ctx->uring_lock for
- * io_uring instances that don't use IORING_SETUP_SQPOLL.
- * For SQPOLL, only the single threaded io_sq_thread() will
- * manipulate the list, hence no extra locking is needed there.
- */
- struct io_wq_work_list iopoll_list;
- bool poll_multi_queue;
-
- /*
* Any cancelable uring_cmd is added to this list in
* ->uring_cmd() by io_uring_cmd_insert_cancelable()
*/
@@ -343,8 +347,8 @@ struct io_ring_ctx {
spinlock_t completion_lock;
/* IRQ completion list, under ->completion_lock */
- struct io_wq_work_list locked_free_list;
unsigned int locked_free_nr;
+ struct io_wq_work_list locked_free_list;
struct list_head io_buffers_comp;
struct list_head cq_overflow_list;
@@ -366,9 +370,6 @@ struct io_ring_ctx {
unsigned int file_alloc_start;
unsigned int file_alloc_end;
- struct xarray personalities;
- u32 pers_next;
-
struct list_head io_buffers_cache;
/* deferred free list, protected by ->uring_lock */
@@ -389,6 +390,9 @@ struct io_ring_ctx {
struct wait_queue_head rsrc_quiesce_wq;
unsigned rsrc_quiesce;
+ u32 pers_next;
+ struct xarray personalities;
+
/* hashed buffered write serialization */
struct io_wq_hash *hash_map;
@@ -405,11 +409,22 @@ struct io_ring_ctx {
/* io-wq management, e.g. thread count */
u32 iowq_limits[2];
- bool iowq_limits_set;
struct callback_head poll_wq_task_work;
struct list_head defer_list;
- unsigned sq_thread_idle;
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ struct list_head napi_list; /* track busy poll napi_id */
+ spinlock_t napi_lock; /* napi_list lock */
+
+ /* napi busy poll default timeout */
+ unsigned int napi_busy_poll_to;
+ bool napi_prefer_busy_poll;
+ bool napi_enabled;
+
+ DECLARE_HASHTABLE(napi_ht, 4);
+#endif
+
/* protected by ->completion_lock */
unsigned evfd_last_cq_tail;
@@ -455,7 +470,6 @@ enum {
REQ_F_SKIP_LINK_CQES_BIT,
REQ_F_SINGLE_POLL_BIT,
REQ_F_DOUBLE_POLL_BIT,
- REQ_F_PARTIAL_IO_BIT,
REQ_F_APOLL_MULTISHOT_BIT,
REQ_F_CLEAR_POLLIN_BIT,
REQ_F_HASH_LOCKED_BIT,
@@ -463,75 +477,88 @@ enum {
REQ_F_SUPPORT_NOWAIT_BIT,
REQ_F_ISREG_BIT,
REQ_F_POLL_NO_LAZY_BIT,
+ REQ_F_CANCEL_SEQ_BIT,
+ REQ_F_CAN_POLL_BIT,
+ REQ_F_BL_EMPTY_BIT,
+ REQ_F_BL_NO_RECYCLE_BIT,
/* not a real bit, just to check we're not overflowing the space */
__REQ_F_LAST_BIT,
};
+typedef u64 __bitwise io_req_flags_t;
+#define IO_REQ_FLAG(bitno) ((__force io_req_flags_t) BIT_ULL((bitno)))
+
enum {
/* ctx owns file */
- REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT),
+ REQ_F_FIXED_FILE = IO_REQ_FLAG(REQ_F_FIXED_FILE_BIT),
/* drain existing IO first */
- REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT),
+ REQ_F_IO_DRAIN = IO_REQ_FLAG(REQ_F_IO_DRAIN_BIT),
/* linked sqes */
- REQ_F_LINK = BIT(REQ_F_LINK_BIT),
+ REQ_F_LINK = IO_REQ_FLAG(REQ_F_LINK_BIT),
/* doesn't sever on completion < 0 */
- REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT),
+ REQ_F_HARDLINK = IO_REQ_FLAG(REQ_F_HARDLINK_BIT),
/* IOSQE_ASYNC */
- REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT),
+ REQ_F_FORCE_ASYNC = IO_REQ_FLAG(REQ_F_FORCE_ASYNC_BIT),
/* IOSQE_BUFFER_SELECT */
- REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT),
+ REQ_F_BUFFER_SELECT = IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT),
/* IOSQE_CQE_SKIP_SUCCESS */
- REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT),
+ REQ_F_CQE_SKIP = IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT),
/* fail rest of links */
- REQ_F_FAIL = BIT(REQ_F_FAIL_BIT),
+ REQ_F_FAIL = IO_REQ_FLAG(REQ_F_FAIL_BIT),
/* on inflight list, should be cancelled and waited on exit reliably */
- REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT),
+ REQ_F_INFLIGHT = IO_REQ_FLAG(REQ_F_INFLIGHT_BIT),
/* read/write uses file position */
- REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT),
+ REQ_F_CUR_POS = IO_REQ_FLAG(REQ_F_CUR_POS_BIT),
/* must not punt to workers */
- REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT),
+ REQ_F_NOWAIT = IO_REQ_FLAG(REQ_F_NOWAIT_BIT),
/* has or had linked timeout */
- REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT),
+ REQ_F_LINK_TIMEOUT = IO_REQ_FLAG(REQ_F_LINK_TIMEOUT_BIT),
/* needs cleanup */
- REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT),
+ REQ_F_NEED_CLEANUP = IO_REQ_FLAG(REQ_F_NEED_CLEANUP_BIT),
/* already went through poll handler */
- REQ_F_POLLED = BIT(REQ_F_POLLED_BIT),
+ REQ_F_POLLED = IO_REQ_FLAG(REQ_F_POLLED_BIT),
/* buffer already selected */
- REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT),
+ REQ_F_BUFFER_SELECTED = IO_REQ_FLAG(REQ_F_BUFFER_SELECTED_BIT),
/* buffer selected from ring, needs commit */
- REQ_F_BUFFER_RING = BIT(REQ_F_BUFFER_RING_BIT),
+ REQ_F_BUFFER_RING = IO_REQ_FLAG(REQ_F_BUFFER_RING_BIT),
/* caller should reissue async */
- REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT),
+ REQ_F_REISSUE = IO_REQ_FLAG(REQ_F_REISSUE_BIT),
/* supports async reads/writes */
- REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT),
+ REQ_F_SUPPORT_NOWAIT = IO_REQ_FLAG(REQ_F_SUPPORT_NOWAIT_BIT),
/* regular file */
- REQ_F_ISREG = BIT(REQ_F_ISREG_BIT),
+ REQ_F_ISREG = IO_REQ_FLAG(REQ_F_ISREG_BIT),
/* has creds assigned */
- REQ_F_CREDS = BIT(REQ_F_CREDS_BIT),
+ REQ_F_CREDS = IO_REQ_FLAG(REQ_F_CREDS_BIT),
/* skip refcounting if not set */
- REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT),
+ REQ_F_REFCOUNT = IO_REQ_FLAG(REQ_F_REFCOUNT_BIT),
/* there is a linked timeout that has to be armed */
- REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT),
+ REQ_F_ARM_LTIMEOUT = IO_REQ_FLAG(REQ_F_ARM_LTIMEOUT_BIT),
/* ->async_data allocated */
- REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT),
+ REQ_F_ASYNC_DATA = IO_REQ_FLAG(REQ_F_ASYNC_DATA_BIT),
/* don't post CQEs while failing linked requests */
- REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT),
+ REQ_F_SKIP_LINK_CQES = IO_REQ_FLAG(REQ_F_SKIP_LINK_CQES_BIT),
/* single poll may be active */
- REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT),
+ REQ_F_SINGLE_POLL = IO_REQ_FLAG(REQ_F_SINGLE_POLL_BIT),
/* double poll may active */
- REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT),
- /* request has already done partial IO */
- REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT),
+ REQ_F_DOUBLE_POLL = IO_REQ_FLAG(REQ_F_DOUBLE_POLL_BIT),
/* fast poll multishot mode */
- REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT),
+ REQ_F_APOLL_MULTISHOT = IO_REQ_FLAG(REQ_F_APOLL_MULTISHOT_BIT),
/* recvmsg special flag, clear EPOLLIN */
- REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT),
+ REQ_F_CLEAR_POLLIN = IO_REQ_FLAG(REQ_F_CLEAR_POLLIN_BIT),
/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
- REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT),
+ REQ_F_HASH_LOCKED = IO_REQ_FLAG(REQ_F_HASH_LOCKED_BIT),
/* don't use lazy poll wake for this request */
- REQ_F_POLL_NO_LAZY = BIT(REQ_F_POLL_NO_LAZY_BIT),
+ REQ_F_POLL_NO_LAZY = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT),
+ /* cancel sequence is set and valid */
+ REQ_F_CANCEL_SEQ = IO_REQ_FLAG(REQ_F_CANCEL_SEQ_BIT),
+ /* file is pollable */
+ REQ_F_CAN_POLL = IO_REQ_FLAG(REQ_F_CAN_POLL_BIT),
+ /* buffer list was empty after selection of buffer */
+ REQ_F_BL_EMPTY = IO_REQ_FLAG(REQ_F_BL_EMPTY_BIT),
+ /* don't recycle provided buffers for this request */
+ REQ_F_BL_NO_RECYCLE = IO_REQ_FLAG(REQ_F_BL_NO_RECYCLE_BIT),
};
typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
@@ -592,15 +619,17 @@ struct io_kiocb {
* and after selection it points to the buffer ID itself.
*/
u16 buf_index;
- unsigned int flags;
+
+ unsigned nr_tw;
+
+ /* REQ_F_* flags */
+ io_req_flags_t flags;
struct io_cqe cqe;
struct io_ring_ctx *ctx;
struct task_struct *task;
- struct io_rsrc_node *rsrc_node;
-
union {
/* store used ubuf, so we can prevent reloading */
struct io_mapped_ubuf *imu;
@@ -621,10 +650,12 @@ struct io_kiocb {
/* cache ->apoll->events */
__poll_t apoll_events;
};
+
+ struct io_rsrc_node *rsrc_node;
+
atomic_t refs;
atomic_t poll_refs;
struct io_task_work io_task_work;
- unsigned nr_tw;
/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
struct hlist_node hash_node;
/* internal polling, see IORING_FEAT_FAST_POLL */
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 96dd0ac..6fc1c85 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -293,22 +293,32 @@ struct iomap_ioend {
struct list_head io_list; /* next ioend in chain */
u16 io_type;
u16 io_flags; /* IOMAP_F_* */
- u32 io_folios; /* folios added to ioend */
struct inode *io_inode; /* file being written to */
size_t io_size; /* size of the extent */
loff_t io_offset; /* offset in the file */
sector_t io_sector; /* start sector of ioend */
- struct bio *io_bio; /* bio being built */
- struct bio io_inline_bio; /* MUST BE LAST! */
+ struct bio io_bio; /* MUST BE LAST! */
};
+static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
+{
+ return container_of(bio, struct iomap_ioend, io_bio);
+}
+
struct iomap_writeback_ops {
/*
* Required, maps the blocks so that writeback can be performed on
* the range starting at offset.
+ *
+ * Can return arbitrarily large regions, but we need to call into it at
+ * least once per folio to allow the file systems to synchronize with
+ * the write path that could be invalidating mappings.
+ *
+ * An existing mapping from a previous call to this method can be reused
+ * by the file system if it is still valid.
*/
int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
- loff_t offset);
+ loff_t offset, unsigned len);
/*
* Optional, allows the file systems to perform actions just before
@@ -329,6 +339,7 @@ struct iomap_writepage_ctx {
struct iomap iomap;
struct iomap_ioend *ioend;
const struct iomap_writeback_ops *ops;
+ u32 nr_folios; /* folios added to the ioend */
};
void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1ea2a82..5e27cb3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -892,11 +892,14 @@ struct iommu_fwspec {
struct iommu_sva {
struct device *dev;
struct iommu_domain *domain;
+ struct list_head handle_item;
+ refcount_t users;
};
struct iommu_mm_data {
u32 pasid;
struct list_head sva_domains;
+ struct list_head sva_handles;
};
int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7e7fd25..179df96 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2031,6 +2031,32 @@ static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
return 1;
return 0;
}
+
+/*
+ * This lockless version of the range-based retry check *must* be paired with a
+ * call to the locked version after acquiring mmu_lock, i.e. this is safe to
+ * use only as a pre-check to avoid contending mmu_lock. This version *will*
+ * get false negatives and false positives.
+ */
+static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
+ unsigned long mmu_seq,
+ gfn_t gfn)
+{
+ /*
+ * Use READ_ONCE() to ensure the in-progress flag and sequence counter
+ * are always read from memory, e.g. so that checking for retry in a
+ * loop won't result in an infinite retry loop. Don't force loads for
+ * start+end, as the key to avoiding infinite retry loops is observing
+ * the 1=>0 transition of in-progress, i.e. getting false negatives
+ * due to stale start+end values is acceptable.
+ */
+ if (unlikely(READ_ONCE(kvm->mmu_invalidate_in_progress)) &&
+ gfn >= kvm->mmu_invalidate_range_start &&
+ gfn < kvm->mmu_invalidate_range_end)
+ return true;
+
+ return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
+}
#endif
#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index 9f56541..1b95fe3 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -375,12 +375,12 @@ static inline int nlm_privileged_requester(const struct svc_rqst *rqstp)
static inline int nlm_compare_locks(const struct file_lock *fl1,
const struct file_lock *fl2)
{
- return file_inode(fl1->fl_file) == file_inode(fl2->fl_file)
- && fl1->fl_pid == fl2->fl_pid
- && fl1->fl_owner == fl2->fl_owner
+ return file_inode(fl1->c.flc_file) == file_inode(fl2->c.flc_file)
+ && fl1->c.flc_pid == fl2->c.flc_pid
+ && fl1->c.flc_owner == fl2->c.flc_owner
&& fl1->fl_start == fl2->fl_start
&& fl1->fl_end == fl2->fl_end
- &&(fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
+ &&(fl1->c.flc_type == fl2->c.flc_type || fl2->c.flc_type == F_UNLCK);
}
extern const struct lock_manager_operations nlmsvc_lock_operations;
diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h
index b60fbcd..80cca94 100644
--- a/include/linux/lockd/xdr.h
+++ b/include/linux/lockd/xdr.h
@@ -52,7 +52,7 @@ struct nlm_lock {
* FreeBSD uses 16, Apple Mac OS X 10.3 uses 20. Therefore we set it to
* 32 bytes.
*/
-
+
struct nlm_cookie
{
unsigned char data[NLM_MAXCOOKIELEN];
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index b3d6312..a53ad4d 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -171,6 +171,7 @@ enum maple_type {
#define MT_FLAGS_LOCK_IRQ 0x100
#define MT_FLAGS_LOCK_BH 0x200
#define MT_FLAGS_LOCK_EXTERN 0x300
+#define MT_FLAGS_ALLOC_WRAPPED 0x0800
#define MAPLE_HEIGHT_MAX 31
@@ -319,6 +320,9 @@ int mtree_insert_range(struct maple_tree *mt, unsigned long first,
int mtree_alloc_range(struct maple_tree *mt, unsigned long *startp,
void *entry, unsigned long size, unsigned long min,
unsigned long max, gfp_t gfp);
+int mtree_alloc_cyclic(struct maple_tree *mt, unsigned long *startp,
+ void *entry, unsigned long range_lo, unsigned long range_hi,
+ unsigned long *next, gfp_t gfp);
int mtree_alloc_rrange(struct maple_tree *mt, unsigned long *startp,
void *entry, unsigned long size, unsigned long min,
unsigned long max, gfp_t gfp);
@@ -499,6 +503,9 @@ void *mas_find_range(struct ma_state *mas, unsigned long max);
void *mas_find_rev(struct ma_state *mas, unsigned long min);
void *mas_find_range_rev(struct ma_state *mas, unsigned long max);
int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp);
+int mas_alloc_cyclic(struct ma_state *mas, unsigned long *startp,
+ void *entry, unsigned long range_lo, unsigned long range_hi,
+ unsigned long *next, gfp_t gfp);
bool mas_nomem(struct ma_state *mas, gfp_t gfp);
void mas_pause(struct ma_state *mas);
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index b695f9e..e208224 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -121,6 +121,8 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
int memblock_physmem_add(phys_addr_t base, phys_addr_t size);
#endif
void memblock_trim_memory(phys_addr_t align);
+unsigned long memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
+ phys_addr_t base2, phys_addr_t size2);
bool memblock_overlaps_region(struct memblock_type *type,
phys_addr_t base, phys_addr_t size);
bool memblock_validate_numa_coverage(unsigned long threshold_bytes);
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index c726f90a..486b749 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1103,7 +1103,7 @@ struct mlx5_ifc_roce_cap_bits {
u8 sw_r_roce_src_udp_port[0x1];
u8 fl_rc_qp_when_roce_disabled[0x1];
u8 fl_rc_qp_when_roce_enabled[0x1];
- u8 reserved_at_7[0x1];
+ u8 roce_cc_general[0x1];
u8 qp_ooo_transmit_default[0x1];
u8 reserved_at_9[0x15];
u8 qp_ts_format[0x2];
@@ -10261,7 +10261,9 @@ struct mlx5_ifc_mcam_access_reg_bits {
u8 regs_63_to_46[0x12];
u8 mrtc[0x1];
- u8 regs_44_to_32[0xd];
+ u8 regs_44_to_41[0x4];
+ u8 mfrl[0x1];
+ u8 regs_39_to_32[0x8];
u8 regs_31_to_10[0x16];
u8 mtmp[0x1];
diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h
index bd53cf4..f0e55bf 100644
--- a/include/linux/mlx5/qp.h
+++ b/include/linux/mlx5/qp.h
@@ -269,7 +269,10 @@ struct mlx5_wqe_eth_seg {
union {
struct {
__be16 sz;
- u8 start[2];
+ union {
+ u8 start[2];
+ DECLARE_FLEX_ARRAY(u8, data);
+ };
} inline_hdr;
struct {
__be16 type;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 118c402..78a09af 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -79,8 +79,6 @@ struct xdp_buff;
struct xdp_frame;
struct xdp_metadata_ops;
struct xdp_md;
-/* DPLL specific */
-struct dpll_pin;
typedef u32 xdp_features_t;
@@ -2141,6 +2139,11 @@ struct net_device {
/* TXRX read-mostly hotpath */
__cacheline_group_begin(net_device_read_txrx);
+ union {
+ struct pcpu_lstats __percpu *lstats;
+ struct pcpu_sw_netstats __percpu *tstats;
+ struct pcpu_dstats __percpu *dstats;
+ };
unsigned int flags;
unsigned short hard_header_len;
netdev_features_t features;
@@ -2395,11 +2398,6 @@ struct net_device {
enum netdev_ml_priv_type ml_priv_type;
enum netdev_stat_type pcpu_stat_type:8;
- union {
- struct pcpu_lstats __percpu *lstats;
- struct pcpu_sw_netstats __percpu *tstats;
- struct pcpu_dstats __percpu *dstats;
- };
#if IS_ENABLED(CONFIG_GARP)
struct garp_port __rcu *garp_port;
@@ -2469,7 +2467,7 @@ struct net_device {
struct devlink_port *devlink_port;
#if IS_ENABLED(CONFIG_DPLL)
- struct dpll_pin *dpll_pin;
+ struct dpll_pin __rcu *dpll_pin;
#endif
#if IS_ENABLED(CONFIG_PAGE_POOL)
/** @page_pools: page pools created for this netdevice */
@@ -3499,6 +3497,16 @@ static inline void netdev_queue_set_dql_min_limit(struct netdev_queue *dev_queue
#endif
}
+static inline int netdev_queue_dql_avail(const struct netdev_queue *txq)
+{
+#ifdef CONFIG_BQL
+ /* Non-BQL migrated drivers will return 0, too. */
+ return dql_avail(&txq->dql);
+#else
+ return 0;
+#endif
+}
+
/**
* netdev_txq_bql_enqueue_prefetchw - prefetch bql data for write
* @dev_queue: pointer to transmit queue
@@ -4032,17 +4040,6 @@ int dev_get_mac_address(struct sockaddr *sa, struct net *net, char *dev_name);
int dev_get_port_parent_id(struct net_device *dev,
struct netdev_phys_item_id *ppid, bool recurse);
bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b);
-void netdev_dpll_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin);
-void netdev_dpll_pin_clear(struct net_device *dev);
-
-static inline struct dpll_pin *netdev_dpll_pin(const struct net_device *dev)
-{
-#if IS_ENABLED(CONFIG_DPLL)
- return dev->dpll_pin;
-#else
- return NULL;
-#endif
-}
struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev, bool *again);
struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 80900d9..ce660d5 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -474,6 +474,7 @@ struct nf_ct_hook {
const struct sk_buff *);
void (*attach)(struct sk_buff *nskb, const struct sk_buff *skb);
void (*set_closing)(struct nf_conntrack *nfct);
+ int (*confirm)(struct sk_buff *skb);
};
extern const struct nf_ct_hook __rcu *nf_ct_hook;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index cd797e0..92de074 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -124,6 +124,7 @@ struct nfs_client {
char cl_ipaddr[48];
struct net *cl_net;
struct list_head pending_cb_stateids;
+ struct rcu_head rcu;
};
/*
@@ -265,6 +266,7 @@ struct nfs_server {
const struct cred *cred;
bool has_sec_mnt_opts;
struct kobject kobj;
+ struct rcu_head rcu;
};
/* Server capabilities */
diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h
index 0f1d024..7d22ea5 100644
--- a/include/linux/ns_common.h
+++ b/include/linux/ns_common.h
@@ -7,7 +7,7 @@
struct proc_ns_operations;
struct ns_common {
- atomic_long_t stashed;
+ struct dentry *stashed;
const struct proc_ns_operations *ops;
unsigned int inum;
refcount_t count;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index ce0c114..4255732 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -644,6 +644,7 @@ enum {
NVME_CMD_EFFECTS_NCC = 1 << 2,
NVME_CMD_EFFECTS_NIC = 1 << 3,
NVME_CMD_EFFECTS_CCC = 1 << 4,
+ NVME_CMD_EFFECTS_CSER_MASK = GENMASK(15, 14),
NVME_CMD_EFFECTS_CSE_MASK = GENMASK(18, 16),
NVME_CMD_EFFECTS_UUID_SEL = 1 << 19,
NVME_CMD_EFFECTS_SCOPE_MASK = GENMASK(31, 20),
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 395cacc..c79a0ef 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -55,6 +55,10 @@ struct pid
refcount_t count;
unsigned int level;
spinlock_t lock;
+#ifdef CONFIG_FS_PID
+ struct dentry *stashed;
+ unsigned long ino;
+#endif
/* lists of tasks that use this pid */
struct hlist_head tasks[PIDTYPE_MAX];
struct hlist_head inodes;
@@ -66,15 +70,13 @@ struct pid
extern struct pid init_struct_pid;
-extern const struct file_operations pidfd_fops;
-
struct file;
-extern struct pid *pidfd_pid(const struct file *file);
+struct pid *pidfd_pid(const struct file *file);
struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
-int pidfd_create(struct pid *pid, unsigned int flags);
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
+void do_notify_pidfd(struct task_struct *task);
static inline struct pid *get_pid(struct pid *pid)
{
diff --git a/include/linux/pidfs.h b/include/linux/pidfs.h
new file mode 100644
index 0000000..40dd325
--- /dev/null
+++ b/include/linux/pidfs.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PID_FS_H
+#define _LINUX_PID_FS_H
+
+struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags);
+void __init pidfs_init(void);
+bool is_pidfs_sb(const struct super_block *sb);
+
+#endif /* _LINUX_PID_FS_H */
diff --git a/include/linux/pktcdvd.h b/include/linux/pktcdvd.h
index 79594ae..2f1b952 100644
--- a/include/linux/pktcdvd.h
+++ b/include/linux/pktcdvd.h
@@ -154,9 +154,9 @@ struct packet_stacked_data
struct pktcdvd_device
{
- struct bdev_handle *bdev_handle; /* dev attached */
+ struct file *bdev_file; /* dev attached */
/* handle acquired for bdev during pkt_open_dev() */
- struct bdev_handle *open_bdev_handle;
+ struct file *f_open_bdev;
dev_t pkt_dev; /* our dev */
struct packet_settings settings;
struct packet_stats stats;
diff --git a/include/linux/poison.h b/include/linux/poison.h
index 27a7dad..1f0ee24 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -92,4 +92,7 @@
/********** VFS **********/
#define VFS_PTR_POISON ((void *)(0xF5 + POISON_POINTER_DELTA))
+/********** lib/stackdepot.c **********/
+#define STACK_DEPOT_POISON ((void *)(0xD390 + POISON_POINTER_DELTA))
+
#endif
diff --git a/include/linux/poll.h b/include/linux/poll.h
index a9e0e1c..d1ea4f3 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -14,11 +14,7 @@
/* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
additional memory. */
-#ifdef __clang__
-#define MAX_STACK_ALLOC 768
-#else
#define MAX_STACK_ALLOC 832
-#endif
#define FRONTEND_STACK_ALLOC 256
#define SELECT_STACK_ALLOC FRONTEND_STACK_ALLOC
#define POLL_STACK_ALLOC FRONTEND_STACK_ALLOC
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index de407e7..0b2a898 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -65,6 +65,7 @@ struct proc_fs_info {
kgid_t pid_gid;
enum proc_hidepid hide_pid;
enum proc_pidonly pidonly;
+ struct rcu_head rcu;
};
static inline struct proc_fs_info *proc_sb_info(struct super_block *sb)
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 49539bc..5ea470e 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -66,7 +66,7 @@ static inline void proc_free_inum(unsigned int inum) {}
static inline int ns_alloc_inum(struct ns_common *ns)
{
- atomic_long_set(&ns->stashed, 0);
+ WRITE_ONCE(ns->stashed, NULL);
return proc_alloc_inum(&ns->inum);
}
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index eaaef3f..90507d4 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -393,6 +393,10 @@ static inline void user_single_step_report(struct pt_regs *regs)
#define current_user_stack_pointer() user_stack_pointer(current_pt_regs())
#endif
+#ifndef exception_ip
+#define exception_ip(x) instruction_pointer(x)
+#endif
+
extern int task_current_syscall(struct task_struct *target, struct syscall_info *info);
extern void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact);
diff --git a/include/linux/rw_hint.h b/include/linux/rw_hint.h
new file mode 100644
index 0000000..309ca72
--- /dev/null
+++ b/include/linux/rw_hint.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_RW_HINT_H
+#define _LINUX_RW_HINT_H
+
+#include <linux/build_bug.h>
+#include <linux/compiler_attributes.h>
+#include <uapi/linux/fcntl.h>
+
+/* Block storage write lifetime hint values. */
+enum rw_hint {
+ WRITE_LIFE_NOT_SET = RWH_WRITE_LIFE_NOT_SET,
+ WRITE_LIFE_NONE = RWH_WRITE_LIFE_NONE,
+ WRITE_LIFE_SHORT = RWH_WRITE_LIFE_SHORT,
+ WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
+ WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
+ WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME,
+} __packed;
+
+/* Sparse ignores __packed annotations on enums, hence the #ifndef below. */
+#ifndef __CHECKER__
+static_assert(sizeof(enum rw_hint) == 1);
+#endif
+
+#endif /* _LINUX_RW_HINT_H */
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 4b7664c..0a0e23c 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -735,8 +735,6 @@ static inline int thread_group_empty(struct task_struct *p)
#define delay_group_leader(p) \
(thread_group_leader(p) && !thread_group_empty(p))
-extern bool thread_group_exited(struct pid *pid);
-
extern struct sighand_struct *__lock_task_sighand(struct task_struct *task,
unsigned long *flags);
diff --git a/include/linux/seq_buf.h b/include/linux/seq_buf.h
index c44f4b4..fe41da0 100644
--- a/include/linux/seq_buf.h
+++ b/include/linux/seq_buf.h
@@ -2,7 +2,10 @@
#ifndef _LINUX_SEQ_BUF_H
#define _LINUX_SEQ_BUF_H
-#include <linux/fs.h>
+#include <linux/bug.h>
+#include <linux/minmax.h>
+#include <linux/seq_file.h>
+#include <linux/types.h>
/*
* Trace sequences are used to allow a function to call several other functions
@@ -10,7 +13,7 @@
*/
/**
- * seq_buf - seq buffer structure
+ * struct seq_buf - seq buffer structure
* @buffer: pointer to the buffer
* @size: size of the buffer
* @len: the amount of data inside the buffer
@@ -77,10 +80,10 @@ static inline unsigned int seq_buf_used(struct seq_buf *s)
}
/**
- * seq_buf_str - get %NUL-terminated C string from seq_buf
+ * seq_buf_str - get NUL-terminated C string from seq_buf
* @s: the seq_buf handle
*
- * This makes sure that the buffer in @s is nul terminated and
+ * This makes sure that the buffer in @s is NUL-terminated and
* safe to read as a string.
*
* Note, if this is called when the buffer has overflowed, then
@@ -90,7 +93,7 @@ static inline unsigned int seq_buf_used(struct seq_buf *s)
* After this function is called, s->buffer is safe to use
* in string operations.
*
- * Returns @s->buf after making sure it is terminated.
+ * Returns: @s->buf after making sure it is terminated.
*/
static inline const char *seq_buf_str(struct seq_buf *s)
{
@@ -110,7 +113,7 @@ static inline const char *seq_buf_str(struct seq_buf *s)
* @s: the seq_buf handle
* @bufp: the beginning of the buffer is stored here
*
- * Return the number of bytes available in the buffer, or zero if
+ * Returns: the number of bytes available in the buffer, or zero if
* there's no space.
*/
static inline size_t seq_buf_get_buf(struct seq_buf *s, char **bufp)
@@ -132,7 +135,7 @@ static inline size_t seq_buf_get_buf(struct seq_buf *s, char **bufp)
* @num: the number of bytes to commit
*
* Commit @num bytes of data written to a buffer previously acquired
- * by seq_buf_get. To signal an error condition, or that the data
+ * by seq_buf_get_buf(). To signal an error condition, or that the data
* didn't fit in the available space, pass a negative @num value.
*/
static inline void seq_buf_commit(struct seq_buf *s, int num)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 536b258..55b1f3b 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -748,8 +748,17 @@ struct uart_driver {
void uart_write_wakeup(struct uart_port *port);
-#define __uart_port_tx(uport, ch, tx_ready, put_char, tx_done, for_test, \
- for_post) \
+/**
+ * enum UART_TX_FLAGS -- flags for uart_port_tx_flags()
+ *
+ * @UART_TX_NOSTOP: don't call port->ops->stop_tx() on empty buffer
+ */
+enum UART_TX_FLAGS {
+ UART_TX_NOSTOP = BIT(0),
+};
+
+#define __uart_port_tx(uport, ch, flags, tx_ready, put_char, tx_done, \
+ for_test, for_post) \
({ \
struct uart_port *__port = (uport); \
struct circ_buf *xmit = &__port->state->xmit; \
@@ -777,7 +786,7 @@ void uart_write_wakeup(struct uart_port *port);
if (pending < WAKEUP_CHARS) { \
uart_write_wakeup(__port); \
\
- if (pending == 0) \
+ if (!((flags) & UART_TX_NOSTOP) && pending == 0) \
__port->ops->stop_tx(__port); \
} \
\
@@ -812,7 +821,7 @@ void uart_write_wakeup(struct uart_port *port);
*/
#define uart_port_tx_limited(port, ch, count, tx_ready, put_char, tx_done) ({ \
unsigned int __count = (count); \
- __uart_port_tx(port, ch, tx_ready, put_char, tx_done, __count, \
+ __uart_port_tx(port, ch, 0, tx_ready, put_char, tx_done, __count, \
__count--); \
})
@@ -826,8 +835,21 @@ void uart_write_wakeup(struct uart_port *port);
* See uart_port_tx_limited() for more details.
*/
#define uart_port_tx(port, ch, tx_ready, put_char) \
- __uart_port_tx(port, ch, tx_ready, put_char, ({}), true, ({}))
+ __uart_port_tx(port, ch, 0, tx_ready, put_char, ({}), true, ({}))
+
+/**
+ * uart_port_tx_flags -- transmit helper for uart_port with flags
+ * @port: uart port
+ * @ch: variable to store a character to be written to the HW
+ * @flags: %UART_TX_NOSTOP or similar
+ * @tx_ready: can HW accept more data function
+ * @put_char: function to write a character
+ *
+ * See uart_port_tx_limited() for more details.
+ */
+#define uart_port_tx_flags(port, ch, flags, tx_ready, put_char) \
+ __uart_port_tx(port, ch, flags, tx_ready, put_char, ({}), true, ({}))
/*
* Baud rate helpers.
*/
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4db00dd..378ab1c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -298,7 +298,7 @@ struct swap_info_struct {
unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */
struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
struct rb_root swap_extent_root;/* root of the swap extent rbtree */
- struct bdev_handle *bdev_handle;/* open handle of the bdev */
+ struct file *bdev_file; /* open handle of the bdev */
struct block_device *bdev; /* swap device or bdev of swap file */
struct file *swap_file; /* seldom referenced */
unsigned int old_block_size; /* seldom referenced */
@@ -549,6 +549,11 @@ static inline int swap_duplicate(swp_entry_t swp)
return 0;
}
+static inline int swapcache_prepare(swp_entry_t swp)
+{
+ return 0;
+}
+
static inline void swap_free(swp_entry_t swp)
{
}
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 89b290d..a1c47a6 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -221,8 +221,10 @@ struct tcp_sock {
u32 lost_out; /* Lost packets */
u32 sacked_out; /* SACK'd packets */
u16 tcp_header_len; /* Bytes of tcp header to send */
+ u8 scaling_ratio; /* see tcp_win_from_space() */
u8 chrono_type : 2, /* current chronograph type */
repair : 1,
+ tcp_usec_ts : 1, /* TSval values in usec */
is_sack_reneg:1, /* in recovery from loss with SACK reneg? */
is_cwnd_limited:1;/* forward progress limited by snd_cwnd? */
__cacheline_group_end(tcp_sock_read_txrx);
@@ -352,7 +354,6 @@ struct tcp_sock {
u32 compressed_ack_rcv_nxt;
struct list_head tsq_node; /* anchor in tsq_tasklet.head list */
- u8 scaling_ratio; /* see tcp_win_from_space() */
/* Information of the most recently (s)acked skb */
struct tcp_rack {
u64 mstamp; /* (Re)sent time of the skb */
@@ -368,8 +369,7 @@ struct tcp_sock {
u8 compressed_ack;
u8 dup_ack_counter:2,
tlp_retrans:1, /* TLP is a retransmission */
- tcp_usec_ts:1, /* TSval values in usec */
- unused:4;
+ unused:5;
u8 thin_lto : 1,/* Use linear timeouts for thin streams */
recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
index 9ec229d..1ef95c0 100644
--- a/include/linux/trace_seq.h
+++ b/include/linux/trace_seq.h
@@ -9,9 +9,15 @@
/*
* Trace sequences are used to allow a function to call several other functions
* to create a string of data to use.
+ *
+ * Have the trace seq to be 8K which is typically PAGE_SIZE * 2 on
+ * most architectures. The TRACE_SEQ_BUFFER_SIZE (which is
+ * TRACE_SEQ_SIZE minus the other fields of trace_seq), is the
+ * max size the output of a trace event may be.
*/
-#define TRACE_SEQ_BUFFER_SIZE (PAGE_SIZE * 2 - \
+#define TRACE_SEQ_SIZE 8192
+#define TRACE_SEQ_BUFFER_SIZE (TRACE_SEQ_SIZE - \
(sizeof(struct seq_buf) + sizeof(size_t) + sizeof(int)))
struct trace_seq {
diff --git a/include/linux/uio.h b/include/linux/uio.h
index bea9c89..00cebe2 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,7 +40,6 @@ struct iov_iter_state {
struct iov_iter {
u8 iter_type;
- bool copy_mc;
bool nofault;
bool data_source;
size_t iov_offset;
@@ -248,22 +247,8 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
#ifdef CONFIG_ARCH_HAS_COPY_MC
size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
-static inline void iov_iter_set_copy_mc(struct iov_iter *i)
-{
- i->copy_mc = true;
-}
-
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
- return i->copy_mc;
-}
#else
#define _copy_mc_to_iter _copy_to_iter
-static inline void iov_iter_set_copy_mc(struct iov_iter *i) { }
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
- return false;
-}
#endif
size_t iov_iter_zero(size_t bytes, struct iov_iter *);
@@ -355,7 +340,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter) {
.iter_type = ITER_UBUF,
- .copy_mc = false,
.data_source = direction,
.ubuf = buf,
.count = count,
diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h
index a771ccc..6532beb 100644
--- a/include/linux/usb/gadget.h
+++ b/include/linux/usb/gadget.h
@@ -236,7 +236,6 @@ struct usb_ep {
unsigned max_streams:16;
unsigned mult:2;
unsigned maxburst:5;
- unsigned fifo_mode:1;
u8 address;
const struct usb_endpoint_descriptor *desc;
const struct usb_ss_ep_comp_descriptor *comp_desc;
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h
index 4dabeb6..9b09aca 100644
--- a/include/net/busy_poll.h
+++ b/include/net/busy_poll.h
@@ -48,6 +48,10 @@ void napi_busy_loop(unsigned int napi_id,
bool (*loop_end)(void *, unsigned long),
void *loop_end_arg, bool prefer_busy_poll, u16 budget);
+void napi_busy_loop_rcu(unsigned int napi_id,
+ bool (*loop_end)(void *, unsigned long),
+ void *loop_end_arg, bool prefer_busy_poll, u16 budget);
+
#else /* CONFIG_NET_RX_BUSY_POLL */
static inline unsigned long net_busy_loop_on(void)
{
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index cf79656..2b54fdd 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -2910,6 +2910,8 @@ struct cfg80211_bss_ies {
* own the beacon_ies, but they're just pointers to the ones from the
* @hidden_beacon_bss struct)
* @proberesp_ies: the information elements from the last Probe Response frame
+ * @proberesp_ecsa_stuck: ECSA element is stuck in the Probe Response frame,
+ * cannot rely on it having valid data
* @hidden_beacon_bss: in case this BSS struct represents a probe response from
* a BSS that hides the SSID in its beacon, this points to the BSS struct
* that holds the beacon data. @beacon_ies is still valid, of course, and
@@ -2950,6 +2952,8 @@ struct cfg80211_bss {
u8 chains;
s8 chain_signal[IEEE80211_MAX_CHAINS];
+ u8 proberesp_ecsa_stuck:1;
+
u8 bssid_index;
u8 max_bssid_indicator;
diff --git a/include/net/mctp.h b/include/net/mctp.h
index da86e10..2bff5f4 100644
--- a/include/net/mctp.h
+++ b/include/net/mctp.h
@@ -249,6 +249,7 @@ struct mctp_route {
struct mctp_route *mctp_route_lookup(struct net *net, unsigned int dnet,
mctp_eid_t daddr);
+/* always takes ownership of skb */
int mctp_local_output(struct sock *sk, struct mctp_route *rt,
struct sk_buff *skb, mctp_eid_t daddr, u8 req_tag);
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 956c752..a763dd3 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -276,7 +276,7 @@ nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
}
void flow_offload_route_init(struct flow_offload *flow,
- const struct nf_flow_route *route);
+ struct nf_flow_route *route);
int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow);
void flow_offload_refresh(struct nf_flowtable *flow_table,
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 001226c3..510244c 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -808,10 +808,16 @@ static inline struct nft_set_elem_expr *nft_set_ext_expr(const struct nft_set_ex
return nft_set_ext(ext, NFT_SET_EXT_EXPRESSIONS);
}
-static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
+static inline bool __nft_set_elem_expired(const struct nft_set_ext *ext,
+ u64 tstamp)
{
return nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION) &&
- time_is_before_eq_jiffies64(*nft_set_ext_expiration(ext));
+ time_after_eq64(tstamp, *nft_set_ext_expiration(ext));
+}
+
+static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
+{
+ return __nft_set_elem_expired(ext, get_jiffies_64());
}
static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
@@ -1779,6 +1785,7 @@ struct nftables_pernet {
struct list_head notify_list;
struct mutex commit_mutex;
u64 table_handle;
+ u64 tstamp;
unsigned int base_seq;
unsigned int gc_seq;
u8 validate_state;
@@ -1791,6 +1798,11 @@ static inline struct nftables_pernet *nft_pernet(const struct net *net)
return net_generic(net, nf_tables_net_id);
}
+static inline u64 nft_net_tstamp(const struct net *net)
+{
+ return nft_pernet(net)->tstamp;
+}
+
#define __NFT_REDUCE_READONLY 1UL
#define NFT_REDUCE_READONLY (void *)__NFT_REDUCE_READONLY
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 934fdb9..cefe0c4 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -238,12 +238,7 @@ static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
static inline int qdisc_avail_bulklimit(const struct netdev_queue *txq)
{
-#ifdef CONFIG_BQL
- /* Non-BQL migrated drivers will return 0, too. */
- return dql_avail(&txq->dql);
-#else
- return 0;
-#endif
+ return netdev_queue_dql_avail(txq);
}
struct Qdisc_class_ops {
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index a43062d..8346b0d 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -308,6 +308,9 @@ void switchdev_deferred_process(void);
int switchdev_port_attr_set(struct net_device *dev,
const struct switchdev_attr *attr,
struct netlink_ext_ack *extack);
+bool switchdev_port_obj_act_is_deferred(struct net_device *dev,
+ enum switchdev_notifier_type nt,
+ const struct switchdev_obj *obj);
int switchdev_port_obj_add(struct net_device *dev,
const struct switchdev_obj *obj,
struct netlink_ext_ack *extack);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index dd78a11..f6eba96 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2506,7 +2506,7 @@ struct tcp_ulp_ops {
/* cleanup ulp */
void (*release)(struct sock *sk);
/* diagnostic */
- int (*get_info)(const struct sock *sk, struct sk_buff *skb);
+ int (*get_info)(struct sock *sk, struct sk_buff *skb);
size_t (*get_info_size)(const struct sock *sk);
/* clone ulp */
void (*clone)(const struct request_sock *req, struct sock *newsk,
diff --git a/include/net/tls.h b/include/net/tls.h
index 962f0c5..340ad43 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -97,9 +97,6 @@ struct tls_sw_context_tx {
struct tls_rec *open_rec;
struct list_head tx_list;
atomic_t encrypt_pending;
- /* protect crypto_wait with encrypt_pending */
- spinlock_t encrypt_compl_lock;
- int async_notify;
u8 async_capable:1;
#define BIT_TX_SCHEDULED 0
@@ -136,8 +133,6 @@ struct tls_sw_context_rx {
struct tls_strparser strp;
atomic_t decrypt_pending;
- /* protect crypto_wait with decrypt_pending*/
- spinlock_t decrypt_compl_lock;
struct sk_buff_head async_hold;
struct wait_queue_head wq;
};
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 5ec1e71..c38f4fe 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -100,10 +100,6 @@ struct scsi_vpd {
unsigned char data[];
};
-enum scsi_vpd_parameters {
- SCSI_VPD_HEADER_SIZE = 4,
-};
-
struct scsi_device {
struct Scsi_Host *host;
struct request_queue *request_queue;
@@ -208,6 +204,7 @@ struct scsi_device {
unsigned use_10_for_rw:1; /* first try 10-byte read / write */
unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
+ unsigned read_before_ms:1; /* perform a READ before MODE SENSE */
unsigned no_report_opcodes:1; /* no REPORT SUPPORTED OPERATION CODES */
unsigned no_write_same:1; /* no WRITE SAME command */
unsigned use_16_for_rw:1; /* Use read/write(16) over read/write(10) */
diff --git a/include/sound/soc-card.h b/include/sound/soc-card.h
index ecc02e95..1f4c399 100644
--- a/include/sound/soc-card.h
+++ b/include/sound/soc-card.h
@@ -30,6 +30,8 @@ static inline void snd_soc_card_mutex_unlock(struct snd_soc_card *card)
struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card,
const char *name);
+struct snd_kcontrol *snd_soc_card_get_kcontrol_locked(struct snd_soc_card *soc_card,
+ const char *name);
int snd_soc_card_jack_new(struct snd_soc_card *card, const char *id, int type,
struct snd_soc_jack *jack);
int snd_soc_card_jack_new_pins(struct snd_soc_card *card, const char *id,
diff --git a/include/sound/tas2781.h b/include/sound/tas2781.h
index b00d654..9aff384 100644
--- a/include/sound/tas2781.h
+++ b/include/sound/tas2781.h
@@ -142,6 +142,7 @@ struct tasdevice_priv {
void tas2781_reset(struct tasdevice_priv *tas_dev);
int tascodec_init(struct tasdevice_priv *tas_priv, void *codec,
+ struct module *module,
void (*cont)(const struct firmware *fw, void *context));
struct tasdevice_priv *tasdevice_kzalloc(struct i2c_client *i2c);
int tasdevice_init(struct tasdevice_priv *tas_priv);
diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h
index 08f2c93..450c44c 100644
--- a/include/trace/events/afs.h
+++ b/include/trace/events/afs.h
@@ -1189,8 +1189,8 @@ TRACE_EVENT(afs_flock_op,
__entry->from = fl->fl_start;
__entry->len = fl->fl_end - fl->fl_start + 1;
__entry->op = op;
- __entry->type = fl->fl_type;
- __entry->flags = fl->fl_flags;
+ __entry->type = fl->c.flc_type;
+ __entry->flags = fl->c.flc_flags;
__entry->debug_id = fl->fl_u.afs.debug_id;
),
diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h
index 1646dad..b8d1e00 100644
--- a/include/trace/events/filelock.h
+++ b/include/trace/events/filelock.h
@@ -68,11 +68,11 @@ DECLARE_EVENT_CLASS(filelock_lock,
__field(struct file_lock *, fl)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
- __field(struct file_lock *, fl_blocker)
- __field(fl_owner_t, fl_owner)
- __field(unsigned int, fl_pid)
- __field(unsigned int, fl_flags)
- __field(unsigned char, fl_type)
+ __field(struct file_lock_core *, blocker)
+ __field(fl_owner_t, owner)
+ __field(unsigned int, pid)
+ __field(unsigned int, flags)
+ __field(unsigned char, type)
__field(loff_t, fl_start)
__field(loff_t, fl_end)
__field(int, ret)
@@ -82,11 +82,11 @@ DECLARE_EVENT_CLASS(filelock_lock,
__entry->fl = fl ? fl : NULL;
__entry->s_dev = inode->i_sb->s_dev;
__entry->i_ino = inode->i_ino;
- __entry->fl_blocker = fl ? fl->fl_blocker : NULL;
- __entry->fl_owner = fl ? fl->fl_owner : NULL;
- __entry->fl_pid = fl ? fl->fl_pid : 0;
- __entry->fl_flags = fl ? fl->fl_flags : 0;
- __entry->fl_type = fl ? fl->fl_type : 0;
+ __entry->blocker = fl ? fl->c.flc_blocker : NULL;
+ __entry->owner = fl ? fl->c.flc_owner : NULL;
+ __entry->pid = fl ? fl->c.flc_pid : 0;
+ __entry->flags = fl ? fl->c.flc_flags : 0;
+ __entry->type = fl ? fl->c.flc_type : 0;
__entry->fl_start = fl ? fl->fl_start : 0;
__entry->fl_end = fl ? fl->fl_end : 0;
__entry->ret = ret;
@@ -94,9 +94,9 @@ DECLARE_EVENT_CLASS(filelock_lock,
TP_printk("fl=%p dev=0x%x:0x%x ino=0x%lx fl_blocker=%p fl_owner=%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
- __entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
- __entry->fl_pid, show_fl_flags(__entry->fl_flags),
- show_fl_type(__entry->fl_type),
+ __entry->i_ino, __entry->blocker, __entry->owner,
+ __entry->pid, show_fl_flags(__entry->flags),
+ show_fl_type(__entry->type),
__entry->fl_start, __entry->fl_end, __entry->ret)
);
@@ -117,59 +117,59 @@ DEFINE_EVENT(filelock_lock, flock_lock_inode,
TP_ARGS(inode, fl, ret));
DECLARE_EVENT_CLASS(filelock_lease,
- TP_PROTO(struct inode *inode, struct file_lock *fl),
+ TP_PROTO(struct inode *inode, struct file_lease *fl),
TP_ARGS(inode, fl),
TP_STRUCT__entry(
- __field(struct file_lock *, fl)
+ __field(struct file_lease *, fl)
__field(unsigned long, i_ino)
__field(dev_t, s_dev)
- __field(struct file_lock *, fl_blocker)
- __field(fl_owner_t, fl_owner)
- __field(unsigned int, fl_flags)
- __field(unsigned char, fl_type)
- __field(unsigned long, fl_break_time)
- __field(unsigned long, fl_downgrade_time)
+ __field(struct file_lock_core *, blocker)
+ __field(fl_owner_t, owner)
+ __field(unsigned int, flags)
+ __field(unsigned char, type)
+ __field(unsigned long, break_time)
+ __field(unsigned long, downgrade_time)
),
TP_fast_assign(
__entry->fl = fl ? fl : NULL;
__entry->s_dev = inode->i_sb->s_dev;
__entry->i_ino = inode->i_ino;
- __entry->fl_blocker = fl ? fl->fl_blocker : NULL;
- __entry->fl_owner = fl ? fl->fl_owner : NULL;
- __entry->fl_flags = fl ? fl->fl_flags : 0;
- __entry->fl_type = fl ? fl->fl_type : 0;
- __entry->fl_break_time = fl ? fl->fl_break_time : 0;
- __entry->fl_downgrade_time = fl ? fl->fl_downgrade_time : 0;
+ __entry->blocker = fl ? fl->c.flc_blocker : NULL;
+ __entry->owner = fl ? fl->c.flc_owner : NULL;
+ __entry->flags = fl ? fl->c.flc_flags : 0;
+ __entry->type = fl ? fl->c.flc_type : 0;
+ __entry->break_time = fl ? fl->fl_break_time : 0;
+ __entry->downgrade_time = fl ? fl->fl_downgrade_time : 0;
),
TP_printk("fl=%p dev=0x%x:0x%x ino=0x%lx fl_blocker=%p fl_owner=%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
__entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
- __entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
- show_fl_flags(__entry->fl_flags),
- show_fl_type(__entry->fl_type),
- __entry->fl_break_time, __entry->fl_downgrade_time)
+ __entry->i_ino, __entry->blocker, __entry->owner,
+ show_fl_flags(__entry->flags),
+ show_fl_type(__entry->type),
+ __entry->break_time, __entry->downgrade_time)
);
-DEFINE_EVENT(filelock_lease, break_lease_noblock, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, break_lease_noblock, TP_PROTO(struct inode *inode, struct file_lease *fl),
TP_ARGS(inode, fl));
-DEFINE_EVENT(filelock_lease, break_lease_block, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, break_lease_block, TP_PROTO(struct inode *inode, struct file_lease *fl),
TP_ARGS(inode, fl));
-DEFINE_EVENT(filelock_lease, break_lease_unblock, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, break_lease_unblock, TP_PROTO(struct inode *inode, struct file_lease *fl),
TP_ARGS(inode, fl));
-DEFINE_EVENT(filelock_lease, generic_delete_lease, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, generic_delete_lease, TP_PROTO(struct inode *inode, struct file_lease *fl),
TP_ARGS(inode, fl));
-DEFINE_EVENT(filelock_lease, time_out_leases, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, time_out_leases, TP_PROTO(struct inode *inode, struct file_lease *fl),
TP_ARGS(inode, fl));
TRACE_EVENT(generic_add_lease,
- TP_PROTO(struct inode *inode, struct file_lock *fl),
+ TP_PROTO(struct inode *inode, struct file_lease *fl),
TP_ARGS(inode, fl),
@@ -179,9 +179,9 @@ TRACE_EVENT(generic_add_lease,
__field(int, rcount)
__field(int, icount)
__field(dev_t, s_dev)
- __field(fl_owner_t, fl_owner)
- __field(unsigned int, fl_flags)
- __field(unsigned char, fl_type)
+ __field(fl_owner_t, owner)
+ __field(unsigned int, flags)
+ __field(unsigned char, type)
),
TP_fast_assign(
@@ -190,21 +190,21 @@ TRACE_EVENT(generic_add_lease,
__entry->wcount = atomic_read(&inode->i_writecount);
__entry->rcount = atomic_read(&inode->i_readcount);
__entry->icount = atomic_read(&inode->i_count);
- __entry->fl_owner = fl->fl_owner;
- __entry->fl_flags = fl->fl_flags;
- __entry->fl_type = fl->fl_type;
+ __entry->owner = fl->c.flc_owner;
+ __entry->flags = fl->c.flc_flags;
+ __entry->type = fl->c.flc_type;
),
TP_printk("dev=0x%x:0x%x ino=0x%lx wcount=%d rcount=%d icount=%d fl_owner=%p fl_flags=%s fl_type=%s",
MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
__entry->i_ino, __entry->wcount, __entry->rcount,
- __entry->icount, __entry->fl_owner,
- show_fl_flags(__entry->fl_flags),
- show_fl_type(__entry->fl_type))
+ __entry->icount, __entry->owner,
+ show_fl_flags(__entry->flags),
+ show_fl_type(__entry->type))
);
TRACE_EVENT(leases_conflict,
- TP_PROTO(bool conflict, struct file_lock *lease, struct file_lock *breaker),
+ TP_PROTO(bool conflict, struct file_lease *lease, struct file_lease *breaker),
TP_ARGS(conflict, lease, breaker),
@@ -220,11 +220,11 @@ TRACE_EVENT(leases_conflict,
TP_fast_assign(
__entry->lease = lease;
- __entry->l_fl_flags = lease->fl_flags;
- __entry->l_fl_type = lease->fl_type;
+ __entry->l_fl_flags = lease->c.flc_flags;
+ __entry->l_fl_type = lease->c.flc_type;
__entry->breaker = breaker;
- __entry->b_fl_flags = breaker->fl_flags;
- __entry->b_fl_type = breaker->fl_type;
+ __entry->b_fl_flags = breaker->c.flc_flags;
+ __entry->b_fl_type = breaker->c.flc_type;
__entry->conflict = conflict;
),
diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h
index 69454f1..e948df7 100644
--- a/include/trace/events/io_uring.h
+++ b/include/trace/events/io_uring.h
@@ -148,7 +148,7 @@ TRACE_EVENT(io_uring_queue_async_work,
__field( void *, req )
__field( u64, user_data )
__field( u8, opcode )
- __field( unsigned int, flags )
+ __field( unsigned long long, flags )
__field( struct io_wq_work *, work )
__field( int, rw )
@@ -159,7 +159,7 @@ TRACE_EVENT(io_uring_queue_async_work,
__entry->ctx = req->ctx;
__entry->req = req;
__entry->user_data = req->cqe.user_data;
- __entry->flags = req->flags;
+ __entry->flags = (__force unsigned long long) req->flags;
__entry->opcode = req->opcode;
__entry->work = &req->work;
__entry->rw = rw;
@@ -167,10 +167,10 @@ TRACE_EVENT(io_uring_queue_async_work,
__assign_str(op_str, io_uring_get_opcode(req->opcode));
),
- TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%x, %s queue, work %p",
+ TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%llx, %s queue, work %p",
__entry->ctx, __entry->req, __entry->user_data,
- __get_str(op_str),
- __entry->flags, __entry->rw ? "hashed" : "normal", __entry->work)
+ __get_str(op_str), __entry->flags,
+ __entry->rw ? "hashed" : "normal", __entry->work)
);
/**
@@ -378,7 +378,7 @@ TRACE_EVENT(io_uring_submit_req,
__field( void *, req )
__field( unsigned long long, user_data )
__field( u8, opcode )
- __field( u32, flags )
+ __field( unsigned long long, flags )
__field( bool, sq_thread )
__string( op_str, io_uring_get_opcode(req->opcode) )
@@ -389,16 +389,16 @@ TRACE_EVENT(io_uring_submit_req,
__entry->req = req;
__entry->user_data = req->cqe.user_data;
__entry->opcode = req->opcode;
- __entry->flags = req->flags;
+ __entry->flags = (__force unsigned long long) req->flags;
__entry->sq_thread = req->ctx->flags & IORING_SETUP_SQPOLL;
__assign_str(op_str, io_uring_get_opcode(req->opcode));
),
- TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%x, "
+ TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%llx, "
"sq_thread %d", __entry->ctx, __entry->req,
- __entry->user_data, __get_str(op_str),
- __entry->flags, __entry->sq_thread)
+ __entry->user_data, __get_str(op_str), __entry->flags,
+ __entry->sq_thread)
);
/*
@@ -602,29 +602,25 @@ TRACE_EVENT(io_uring_cqe_overflow,
*
* @tctx: pointer to a io_uring_task
* @count: how many functions it ran
- * @loops: how many loops it ran
*
*/
TRACE_EVENT(io_uring_task_work_run,
- TP_PROTO(void *tctx, unsigned int count, unsigned int loops),
+ TP_PROTO(void *tctx, unsigned int count),
- TP_ARGS(tctx, count, loops),
+ TP_ARGS(tctx, count),
TP_STRUCT__entry (
__field( void *, tctx )
__field( unsigned int, count )
- __field( unsigned int, loops )
),
TP_fast_assign(
__entry->tctx = tctx;
__entry->count = count;
- __entry->loops = loops;
),
- TP_printk("tctx %p, count %u, loops %u",
- __entry->tctx, __entry->count, __entry->loops)
+ TP_printk("tctx %p, count %u", __entry->tctx, __entry->count)
);
TRACE_EVENT(io_uring_short_write,
diff --git a/include/trace/events/qdisc.h b/include/trace/events/qdisc.h
index a399592..1f42583 100644
--- a/include/trace/events/qdisc.h
+++ b/include/trace/events/qdisc.h
@@ -81,14 +81,14 @@ TRACE_EVENT(qdisc_reset,
TP_ARGS(q),
TP_STRUCT__entry(
- __string( dev, qdisc_dev(q) )
- __string( kind, q->ops->id )
- __field( u32, parent )
- __field( u32, handle )
+ __string( dev, qdisc_dev(q)->name )
+ __string( kind, q->ops->id )
+ __field( u32, parent )
+ __field( u32, handle )
),
TP_fast_assign(
- __assign_str(dev, qdisc_dev(q));
+ __assign_str(dev, qdisc_dev(q)->name);
__assign_str(kind, q->ops->id);
__entry->parent = q->parent;
__entry->handle = q->handle;
@@ -106,14 +106,14 @@ TRACE_EVENT(qdisc_destroy,
TP_ARGS(q),
TP_STRUCT__entry(
- __string( dev, qdisc_dev(q) )
- __string( kind, q->ops->id )
- __field( u32, parent )
- __field( u32, handle )
+ __string( dev, qdisc_dev(q)->name )
+ __string( kind, q->ops->id )
+ __field( u32, parent )
+ __field( u32, handle )
),
TP_fast_assign(
- __assign_str(dev, qdisc_dev(q));
+ __assign_str(dev, qdisc_dev(q)->name);
__assign_str(kind, q->ops->id);
__entry->parent = q->parent;
__entry->handle = q->handle;
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 4c1ef7b..87b8de9 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -128,6 +128,7 @@
EM(rxrpc_skb_eaten_by_unshare_nomem, "ETN unshar-nm") \
EM(rxrpc_skb_get_conn_secured, "GET conn-secd") \
EM(rxrpc_skb_get_conn_work, "GET conn-work") \
+ EM(rxrpc_skb_get_last_nack, "GET last-nack") \
EM(rxrpc_skb_get_local_work, "GET locl-work") \
EM(rxrpc_skb_get_reject_work, "GET rej-work ") \
EM(rxrpc_skb_get_to_recvmsg, "GET to-recv ") \
@@ -141,6 +142,7 @@
EM(rxrpc_skb_put_error_report, "PUT error-rep") \
EM(rxrpc_skb_put_input, "PUT input ") \
EM(rxrpc_skb_put_jumbo_subpacket, "PUT jumbo-sub") \
+ EM(rxrpc_skb_put_last_nack, "PUT last-nack") \
EM(rxrpc_skb_put_purge, "PUT purge ") \
EM(rxrpc_skb_put_rotate, "PUT rotate ") \
EM(rxrpc_skb_put_unknown, "PUT unknown ") \
@@ -1552,7 +1554,7 @@ TRACE_EVENT(rxrpc_congest,
memcpy(&__entry->sum, summary, sizeof(__entry->sum));
),
- TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u nA=%u,%u+%u r=%u b=%u u=%u d=%u l=%x%s%s%s",
+ TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u nA=%u,%u+%u,%u b=%u u=%u d=%u l=%x%s%s%s",
__entry->call,
__entry->ack_serial,
__print_symbolic(__entry->sum.ack_reason, rxrpc_ack_names),
@@ -1560,9 +1562,9 @@ TRACE_EVENT(rxrpc_congest,
__print_symbolic(__entry->sum.mode, rxrpc_congest_modes),
__entry->sum.cwnd,
__entry->sum.ssthresh,
- __entry->sum.nr_acks, __entry->sum.saw_nacks,
+ __entry->sum.nr_acks, __entry->sum.nr_retained_nacks,
__entry->sum.nr_new_acks,
- __entry->sum.nr_rot_new_acks,
+ __entry->sum.nr_new_nacks,
__entry->top - __entry->hard_ack,
__entry->sum.cumulative_acks,
__entry->sum.dup_acks,
diff --git a/include/uapi/drm/ivpu_accel.h b/include/uapi/drm/ivpu_accel.h
index 63c4931..19a1346 100644
--- a/include/uapi/drm/ivpu_accel.h
+++ b/include/uapi/drm/ivpu_accel.h
@@ -305,6 +305,7 @@ struct drm_ivpu_submit {
/* drm_ivpu_bo_wait job status codes */
#define DRM_IVPU_JOB_STATUS_SUCCESS 0
+#define DRM_IVPU_JOB_STATUS_ABORTED 256
/**
* struct drm_ivpu_bo_wait - Wait for BO to become inactive
diff --git a/include/uapi/drm/nouveau_drm.h b/include/uapi/drm/nouveau_drm.h
index 0bade159..77d7ff0 100644
--- a/include/uapi/drm/nouveau_drm.h
+++ b/include/uapi/drm/nouveau_drm.h
@@ -54,6 +54,20 @@ extern "C" {
*/
#define NOUVEAU_GETPARAM_EXEC_PUSH_MAX 17
+/*
+ * NOUVEAU_GETPARAM_VRAM_BAR_SIZE - query bar size
+ *
+ * Query the VRAM BAR size.
+ */
+#define NOUVEAU_GETPARAM_VRAM_BAR_SIZE 18
+
+/*
+ * NOUVEAU_GETPARAM_VRAM_USED
+ *
+ * Get remaining VRAM size.
+ */
+#define NOUVEAU_GETPARAM_VRAM_USED 19
+
struct drm_nouveau_getparam {
__u64 param;
__u64 value;
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 9fa3ae3..bb0c8a9 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -831,11 +831,6 @@ struct drm_xe_vm_destroy {
* - %DRM_XE_VM_BIND_OP_PREFETCH
*
* and the @flags can be:
- * - %DRM_XE_VM_BIND_FLAG_READONLY
- * - %DRM_XE_VM_BIND_FLAG_ASYNC
- * - %DRM_XE_VM_BIND_FLAG_IMMEDIATE - Valid on a faulting VM only, do the
- * MAP operation immediately rather than deferring the MAP to the page
- * fault handler.
* - %DRM_XE_VM_BIND_FLAG_NULL - When the NULL flag is set, the page
* tables are setup with a special bit which indicates writes are
* dropped and all reads return zero. In the future, the NULL flags
@@ -928,9 +923,8 @@ struct drm_xe_vm_bind_op {
/** @op: Bind operation to perform */
__u32 op;
-#define DRM_XE_VM_BIND_FLAG_READONLY (1 << 0)
-#define DRM_XE_VM_BIND_FLAG_IMMEDIATE (1 << 1)
#define DRM_XE_VM_BIND_FLAG_NULL (1 << 2)
+#define DRM_XE_VM_BIND_FLAG_DUMPABLE (1 << 3)
/** @flags: Bind flags */
__u32 flags;
@@ -1045,20 +1039,6 @@ struct drm_xe_exec_queue_create {
#define DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY 0
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1
-#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PREEMPTION_TIMEOUT 2
-#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PERSISTENCE 3
-#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_JOB_TIMEOUT 4
-#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_TRIGGER 5
-#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_NOTIFY 6
-#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_GRANULARITY 7
-/* Monitor 128KB contiguous region with 4K sub-granularity */
-#define DRM_XE_ACC_GRANULARITY_128K 0
-/* Monitor 2MB contiguous region with 64KB sub-granularity */
-#define DRM_XE_ACC_GRANULARITY_2M 1
-/* Monitor 16MB contiguous region with 512KB sub-granularity */
-#define DRM_XE_ACC_GRANULARITY_16M 2
-/* Monitor 64MB contiguous region with 2M sub-granularity */
-#define DRM_XE_ACC_GRANULARITY_64M 3
/** @extensions: Pointer to the first extension struct, if any */
__u64 extensions;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 48ad69f..45e4e64 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -64,6 +64,24 @@ struct fstrim_range {
__u64 minlen;
};
+/*
+ * We include a length field because some filesystems (vfat) have an identifier
+ * that we do want to expose as a UUID, but doesn't have the standard length.
+ *
+ * We use a fixed size buffer beacuse this interface will, by fiat, never
+ * support "UUIDs" longer than 16 bytes; we don't want to force all downstream
+ * users to have to deal with that.
+ */
+struct fsuuid2 {
+ __u8 len;
+ __u8 uuid[16];
+};
+
+struct fs_sysfs_path {
+ __u8 len;
+ __u8 name[128];
+};
+
/* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
#define FILE_DEDUPE_RANGE_SAME 0
#define FILE_DEDUPE_RANGE_DIFFERS 1
@@ -215,6 +233,13 @@ struct fsxattr {
#define FS_IOC_FSSETXATTR _IOW('X', 32, struct fsxattr)
#define FS_IOC_GETFSLABEL _IOR(0x94, 49, char[FSLABEL_MAX])
#define FS_IOC_SETFSLABEL _IOW(0x94, 50, char[FSLABEL_MAX])
+/* Returns the external filesystem UUID, the same one blkid returns */
+#define FS_IOC_GETFSUUID _IOR(0x15, 0, struct fsuuid2)
+/*
+ * Returns the path component under /sys/fs/ that refers to this filesystem;
+ * also /sys/kernel/debug/ for filesystems with debugfs exports
+ */
+#define FS_IOC_GETFSSYSFSPATH _IOR(0x15, 1, struct fs_sysfs_path)
/*
* Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
@@ -301,9 +326,12 @@ typedef int __bitwise __kernel_rwf_t;
/* per-IO O_APPEND */
#define RWF_APPEND ((__force __kernel_rwf_t)0x00000010)
+/* per-IO negation of O_APPEND */
+#define RWF_NOAPPEND ((__force __kernel_rwf_t)0x00000020)
+
/* mask of flags supported by the kernel */
#define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
- RWF_APPEND)
+ RWF_APPEND | RWF_NOAPPEND)
/* Pagemap ioctl */
#define PAGEMAP_SCAN _IOWR('f', 16, struct pm_scan_arg)
diff --git a/include/uapi/linux/iio/types.h b/include/uapi/linux/iio/types.h
index 5060963..f2e0b2d 100644
--- a/include/uapi/linux/iio/types.h
+++ b/include/uapi/linux/iio/types.h
@@ -91,8 +91,6 @@ enum iio_modifier {
IIO_MOD_CO2,
IIO_MOD_VOC,
IIO_MOD_LIGHT_UV,
- IIO_MOD_LIGHT_UVA,
- IIO_MOD_LIGHT_UVB,
IIO_MOD_LIGHT_DUV,
IIO_MOD_PM1,
IIO_MOD_PM2P5,
@@ -107,6 +105,8 @@ enum iio_modifier {
IIO_MOD_PITCH,
IIO_MOD_YAW,
IIO_MOD_ROLL,
+ IIO_MOD_LIGHT_UVA,
+ IIO_MOD_LIGHT_UVB,
};
enum iio_event_type {
diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index c4c53a9..ff8d21f 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -145,7 +145,7 @@ struct in6_flowlabel_req {
#define IPV6_TLV_PADN 1
#define IPV6_TLV_ROUTERALERT 5
#define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
-#define IPV6_TLV_IOAM 49 /* TEMPORARY IANA allocation for IOAM */
+#define IPV6_TLV_IOAM 49 /* RFC 9486 */
#define IPV6_TLV_JUMBO 194
#define IPV6_TLV_HAO 201 /* home address option */
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7a673b5..7bd1020 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -255,6 +255,7 @@ enum io_uring_op {
IORING_OP_FUTEX_WAKE,
IORING_OP_FUTEX_WAITV,
IORING_OP_FIXED_FD_INSTALL,
+ IORING_OP_FTRUNCATE,
/* this goes last, obviously */
IORING_OP_LAST,
@@ -570,6 +571,10 @@ enum {
/* return status information for a buffer group */
IORING_REGISTER_PBUF_STATUS = 26,
+ /* set/clear busy poll settings */
+ IORING_REGISTER_NAPI = 27,
+ IORING_UNREGISTER_NAPI = 28,
+
/* this goes last */
IORING_REGISTER_LAST,
@@ -703,6 +708,14 @@ struct io_uring_buf_status {
__u32 resv[8];
};
+/* argument for IORING_(UN)REGISTER_NAPI */
+struct io_uring_napi {
+ __u32 busy_poll_to;
+ __u8 prefer_busy_poll;
+ __u8 pad[3];
+ __u64 resv;
+};
+
/*
* io_uring_restriction->opcode values
*/
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index 6325d1d..1b40a96 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -101,5 +101,6 @@
#define DMA_BUF_MAGIC 0x444d4142 /* "DMAB" */
#define DEVMEM_MAGIC 0x454d444d /* "DMEM" */
#define SECRETMEM_MAGIC 0x5345434d /* "SECM" */
+#define PID_FS_MAGIC 0x50494446 /* "PIDF" */
#endif /* __LINUX_MAGIC_H__ */
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index ca30232..117c6a9 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -285,9 +285,11 @@ enum nft_rule_attributes {
/**
* enum nft_rule_compat_flags - nf_tables rule compat flags
*
+ * @NFT_RULE_COMPAT_F_UNUSED: unused
* @NFT_RULE_COMPAT_F_INV: invert the check result
*/
enum nft_rule_compat_flags {
+ NFT_RULE_COMPAT_F_UNUSED = (1 << 0),
NFT_RULE_COMPAT_F_INV = (1 << 1),
NFT_RULE_COMPAT_F_MASK = NFT_RULE_COMPAT_F_INV,
};
diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
index 5406fbc..72ec000 100644
--- a/include/uapi/linux/pidfd.h
+++ b/include/uapi/linux/pidfd.h
@@ -7,6 +7,12 @@
#include <linux/fcntl.h>
/* Flags for pidfd_open(). */
-#define PIDFD_NONBLOCK O_NONBLOCK
+#define PIDFD_NONBLOCK O_NONBLOCK
+#define PIDFD_THREAD O_EXCL
+
+/* Flags for pidfd_send_signal(). */
+#define PIDFD_SIGNAL_THREAD (1UL << 0)
+#define PIDFD_SIGNAL_THREAD_GROUP (1UL << 1)
+#define PIDFD_SIGNAL_PROCESS_GROUP (1UL << 2)
#endif /* _UAPI_LINUX_PIDFD_H */
diff --git a/include/uapi/sound/asound.h b/include/uapi/sound/asound.h
index d5b9cfb..628d46a 100644
--- a/include/uapi/sound/asound.h
+++ b/include/uapi/sound/asound.h
@@ -142,7 +142,7 @@ struct snd_hwdep_dsp_image {
* *
*****************************************************************************/
-#define SNDRV_PCM_VERSION SNDRV_PROTOCOL_VERSION(2, 0, 16)
+#define SNDRV_PCM_VERSION SNDRV_PROTOCOL_VERSION(2, 0, 17)
typedef unsigned long snd_pcm_uframes_t;
typedef signed long snd_pcm_sframes_t;
@@ -416,7 +416,7 @@ struct snd_pcm_hw_params {
unsigned int rmask; /* W: requested masks */
unsigned int cmask; /* R: changed masks */
unsigned int info; /* R: Info flags for returned setup */
- unsigned int msbits; /* R: used most significant bits */
+ unsigned int msbits; /* R: used most significant bits (in sample bit-width) */
unsigned int rate_num; /* R: rate numerator */
unsigned int rate_den; /* R: rate denominator */
snd_pcm_uframes_t fifo_size; /* R: chip FIFO size in frames */
diff --git a/include/uapi/xen/gntalloc.h b/include/uapi/xen/gntalloc.h
index 48d2790..3109282 100644
--- a/include/uapi/xen/gntalloc.h
+++ b/include/uapi/xen/gntalloc.h
@@ -31,7 +31,10 @@ struct ioctl_gntalloc_alloc_gref {
__u64 index;
/* The grant references of the newly created grant, one per page */
/* Variable size, depending on count */
- __u32 gref_ids[1];
+ union {
+ __u32 gref_ids[1];
+ __DECLARE_FLEX_ARRAY(__u32, gref_ids_flex);
+ };
};
#define GNTALLOC_FLAG_WRITABLE 1
diff --git a/init/Kconfig b/init/Kconfig
index deda3d1..bee58f7 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -89,6 +89,15 @@
# Detect buggy gcc and clang, fixed in gcc-11 clang-14.
def_bool $(success,echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .": "+m"(*x) ::: bar); return *x; bar: return 0; }' | $CC -x c - -c -o /dev/null)
+config GCC_ASM_GOTO_OUTPUT_WORKAROUND
+ bool
+ depends on CC_IS_GCC && CC_HAS_ASM_GOTO_OUTPUT
+ # Fixed in GCC 14, 13.3, 12.4 and 11.5
+ # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
+ default y if GCC_VERSION < 110500
+ default y if GCC_VERSION >= 120000 && GCC_VERSION < 120400
+ default y if GCC_VERSION >= 130000 && GCC_VERSION < 130300
+
config TOOLS_SUPPORT_RELR
def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh)
@@ -867,14 +876,14 @@
default "-Wimplicit-fallthrough=5" if CC_IS_GCC && $(cc-option,-Wimplicit-fallthrough=5)
default "-Wimplicit-fallthrough" if CC_IS_CLANG && $(cc-option,-Wunreachable-code-fallthrough)
-# Currently, disable gcc-11+ array-bounds globally.
+# Currently, disable gcc-10+ array-bounds globally.
# It's still broken in gcc-13, so no upper bound yet.
-config GCC11_NO_ARRAY_BOUNDS
+config GCC10_NO_ARRAY_BOUNDS
def_bool y
config CC_NO_ARRAY_BOUNDS
bool
- default y if CC_IS_GCC && GCC_VERSION >= 110000 && GCC11_NO_ARRAY_BOUNDS
+ default y if CC_IS_GCC && GCC_VERSION >= 100000 && GCC10_NO_ARRAY_BOUNDS
# Currently, disable -Wstringop-overflow for GCC globally.
config GCC_NO_STRINGOP_OVERFLOW
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 279ad28..3c5fd99 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -208,6 +208,9 @@ void __init mount_root_generic(char *name, char *pretty_name, int flags)
goto out;
case -EACCES:
case -EINVAL:
+#ifdef CONFIG_BLOCK
+ init_flush_fput();
+#endif
continue;
}
/*
diff --git a/init/do_mounts.h b/init/do_mounts.h
index 15e372b..6069ea3 100644
--- a/init/do_mounts.h
+++ b/init/do_mounts.h
@@ -9,6 +9,8 @@
#include <linux/major.h>
#include <linux/root_dev.h>
#include <linux/init_syscalls.h>
+#include <linux/task_work.h>
+#include <linux/file.h>
void mount_root_generic(char *name, char *pretty_name, int flags);
void mount_root(char *root_device_name);
@@ -41,3 +43,10 @@ static inline bool initrd_load(char *root_device_name)
}
#endif
+
+/* Ensure that async file closing finished to prevent spurious errors. */
+static inline void init_flush_fput(void)
+{
+ flush_delayed_fput();
+ task_work_run();
+}
diff --git a/init/initramfs.c b/init/initramfs.c
index 76deb48..01dbd0e 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -16,9 +16,10 @@
#include <linux/mm.h>
#include <linux/namei.h>
#include <linux/init_syscalls.h>
-#include <linux/task_work.h>
#include <linux/umh.h>
+#include "do_mounts.h"
+
static __initdata bool csum_present;
static __initdata u32 io_csum;
@@ -679,8 +680,6 @@ static void __init populate_initrd_image(char *err)
struct file *file;
loff_t pos = 0;
- unpack_to_rootfs(__initramfs_start, __initramfs_size);
-
printk(KERN_INFO "rootfs image is not initramfs (%s); looks like an initrd\n",
err);
file = filp_open("/initrd.image", O_WRONLY | O_CREAT, 0700);
@@ -736,8 +735,7 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
initrd_start = 0;
initrd_end = 0;
- flush_delayed_fput();
- task_work_run();
+ init_flush_fput();
}
static ASYNC_DOMAIN_EXCLUSIVE(initramfs_domain);
diff --git a/init/main.c b/init/main.c
index e24b078..2fbf6a3 100644
--- a/init/main.c
+++ b/init/main.c
@@ -99,6 +99,7 @@
#include <linux/init_syscalls.h>
#include <linux/stackdepot.h>
#include <linux/randomize_kstack.h>
+#include <linux/pidfs.h>
#include <net/net_namespace.h>
#include <asm/io.h>
@@ -1059,6 +1060,7 @@ void start_kernel(void)
seq_file_init();
proc_root_init();
nsfs_init();
+ pidfs_init();
cpuset_init();
cgroup_init();
taskstats_init_early();
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 2cdc518..2e1d4e0 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -8,6 +8,7 @@
statx.o net.o msg_ring.o timeout.o \
sqpoll.o fdinfo.o tctx.o poll.o \
cancel.o kbuf.o rsrc.o rw.o opdef.o \
- notif.o waitid.o register.o
+ notif.o waitid.o register.o truncate.o
obj-$(CONFIG_IO_WQ) += io-wq.o
obj-$(CONFIG_FUTEX) += futex.o
+obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 8a8b07d..acfcdd7 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -58,9 +58,8 @@ bool io_cancel_req_match(struct io_kiocb *req, struct io_cancel_data *cd)
return false;
if (cd->flags & IORING_ASYNC_CANCEL_ALL) {
check_seq:
- if (cd->seq == req->work.cancel_seq)
+ if (io_cancel_match_sequence(req, cd->seq))
return false;
- req->work.cancel_seq = cd->seq;
}
return true;
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index c0a8e7c..76b32e6 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -25,4 +25,14 @@ void init_hash_table(struct io_hash_table *table, unsigned size);
int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg);
bool io_cancel_req_match(struct io_kiocb *req, struct io_cancel_data *cd);
+static inline bool io_cancel_match_sequence(struct io_kiocb *req, int sequence)
+{
+ if ((req->flags & REQ_F_CANCEL_SEQ) && sequence == req->work.cancel_seq)
+ return true;
+
+ req->flags |= REQ_F_CANCEL_SEQ;
+ req->work.cancel_seq = sequence;
+ return false;
+}
+
#endif
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 976e950..8d444dd 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -55,6 +55,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
struct io_ring_ctx *ctx = f->private_data;
struct io_overflow_cqe *ocqe;
struct io_rings *r = ctx->rings;
+ struct rusage sq_usage;
unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
unsigned int sq_head = READ_ONCE(r->sq.head);
unsigned int sq_tail = READ_ONCE(r->sq.tail);
@@ -64,6 +65,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
unsigned int sq_shift = 0;
unsigned int sq_entries, cq_entries;
int sq_pid = -1, sq_cpu = -1;
+ u64 sq_total_time = 0, sq_work_time = 0;
bool has_lock;
unsigned int i;
@@ -145,12 +147,24 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
struct io_sq_data *sq = ctx->sq_data;
- sq_pid = sq->task_pid;
- sq_cpu = sq->sq_cpu;
+ /*
+ * sq->thread might be NULL if we raced with the sqpoll
+ * thread termination.
+ */
+ if (sq->thread) {
+ sq_pid = sq->task_pid;
+ sq_cpu = sq->sq_cpu;
+ getrusage(sq->thread, RUSAGE_SELF, &sq_usage);
+ sq_total_time = (sq_usage.ru_stime.tv_sec * 1000000
+ + sq_usage.ru_stime.tv_usec);
+ sq_work_time = sq->work_time;
+ }
}
seq_printf(m, "SqThread:\t%d\n", sq_pid);
seq_printf(m, "SqThreadCpu:\t%d\n", sq_cpu);
+ seq_printf(m, "SqTotalTime:\t%llu\n", sq_total_time);
+ seq_printf(m, "SqWorkTime:\t%llu\n", sq_work_time);
seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files);
for (i = 0; has_lock && i < ctx->nr_user_files; i++) {
struct file *f = io_file_from_index(&ctx->file_table, i);
diff --git a/io_uring/filetable.h b/io_uring/filetable.h
index b47adf1..b2435c4 100644
--- a/io_uring/filetable.h
+++ b/io_uring/filetable.h
@@ -17,7 +17,7 @@ int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset);
int io_register_file_alloc_range(struct io_ring_ctx *ctx,
struct io_uring_file_index_range __user *arg);
-unsigned int io_file_get_flags(struct file *file);
+io_req_flags_t io_file_get_flags(struct file *file);
static inline void io_file_bitmap_clear(struct io_file_table *table, int bit)
{
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cd9a137..cf348c3 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -59,7 +59,6 @@
#include <linux/bvec.h>
#include <linux/net.h>
#include <net/sock.h>
-#include <net/af_unix.h>
#include <linux/anon_inodes.h>
#include <linux/sched/mm.h>
#include <linux/uaccess.h>
@@ -95,6 +94,7 @@
#include "notif.h"
#include "waitid.h"
#include "futex.h"
+#include "napi.h"
#include "timeout.h"
#include "poll.h"
@@ -122,11 +122,6 @@
#define IO_COMPL_BATCH 32
#define IO_REQ_ALLOC_BATCH 8
-enum {
- IO_CHECK_CQ_OVERFLOW_BIT,
- IO_CHECK_CQ_DROPPED_BIT,
-};
-
struct io_defer_entry {
struct list_head list;
struct io_kiocb *req;
@@ -349,6 +344,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
INIT_HLIST_HEAD(&ctx->cancelable_uring_cmd);
+ io_napi_init(ctx);
+
return ctx;
err:
kfree(ctx->cancel_table.hbs);
@@ -463,7 +460,6 @@ static void io_prep_async_work(struct io_kiocb *req)
req->work.list.next = NULL;
req->work.flags = 0;
- req->work.cancel_seq = atomic_read(&ctx->cancel_seq);
if (req->flags & REQ_F_FORCE_ASYNC)
req->work.flags |= IO_WQ_WORK_CONCURRENT;
@@ -670,7 +666,6 @@ static void io_cq_unlock_post(struct io_ring_ctx *ctx)
io_commit_cqring_flush(ctx);
}
-/* Returns true if there are no backlogged entries after the flush */
static void io_cqring_overflow_kill(struct io_ring_ctx *ctx)
{
struct io_overflow_cqe *ocqe;
@@ -949,6 +944,8 @@ bool io_fill_cqe_req_aux(struct io_kiocb *req, bool defer, s32 res, u32 cflags)
u64 user_data = req->cqe.user_data;
struct io_uring_cqe *cqe;
+ lockdep_assert(!io_wq_current_is_worker());
+
if (!defer)
return __io_post_aux_cqe(ctx, user_data, res, cflags, false);
@@ -1025,15 +1022,15 @@ static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
{
- if (req->ctx->task_complete && req->ctx->submitter_task != current) {
+ struct io_ring_ctx *ctx = req->ctx;
+
+ if (ctx->task_complete && ctx->submitter_task != current) {
req->io_task_work.func = io_req_task_complete;
io_req_task_work_add(req);
} else if (!(issue_flags & IO_URING_F_UNLOCKED) ||
- !(req->ctx->flags & IORING_SETUP_IOPOLL)) {
+ !(ctx->flags & IORING_SETUP_IOPOLL)) {
__io_req_complete_post(req, issue_flags);
} else {
- struct io_ring_ctx *ctx = req->ctx;
-
mutex_lock(&ctx->uring_lock);
__io_req_complete_post(req, issue_flags & ~IO_URING_F_UNLOCKED);
mutex_unlock(&ctx->uring_lock);
@@ -1174,40 +1171,44 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
percpu_ref_put(&ctx->refs);
}
-static unsigned int handle_tw_list(struct llist_node *node,
- struct io_ring_ctx **ctx,
- struct io_tw_state *ts,
- struct llist_node *last)
+/*
+ * Run queued task_work, returning the number of entries processed in *count.
+ * If more entries than max_entries are available, stop processing once this
+ * is reached and return the rest of the list.
+ */
+struct llist_node *io_handle_tw_list(struct llist_node *node,
+ unsigned int *count,
+ unsigned int max_entries)
{
- unsigned int count = 0;
+ struct io_ring_ctx *ctx = NULL;
+ struct io_tw_state ts = { };
- while (node && node != last) {
+ do {
struct llist_node *next = node->next;
struct io_kiocb *req = container_of(node, struct io_kiocb,
io_task_work.node);
- prefetch(container_of(next, struct io_kiocb, io_task_work.node));
-
- if (req->ctx != *ctx) {
- ctx_flush_and_put(*ctx, ts);
- *ctx = req->ctx;
+ if (req->ctx != ctx) {
+ ctx_flush_and_put(ctx, &ts);
+ ctx = req->ctx;
/* if not contended, grab and improve batching */
- ts->locked = mutex_trylock(&(*ctx)->uring_lock);
- percpu_ref_get(&(*ctx)->refs);
+ ts.locked = mutex_trylock(&ctx->uring_lock);
+ percpu_ref_get(&ctx->refs);
}
INDIRECT_CALL_2(req->io_task_work.func,
io_poll_task_func, io_req_rw_complete,
- req, ts);
+ req, &ts);
node = next;
- count++;
+ (*count)++;
if (unlikely(need_resched())) {
- ctx_flush_and_put(*ctx, ts);
- *ctx = NULL;
+ ctx_flush_and_put(ctx, &ts);
+ ctx = NULL;
cond_resched();
}
- }
+ } while (node && *count < max_entries);
- return count;
+ ctx_flush_and_put(ctx, &ts);
+ return node;
}
/**
@@ -1224,22 +1225,6 @@ static inline struct llist_node *io_llist_xchg(struct llist_head *head,
return xchg(&head->first, new);
}
-/**
- * io_llist_cmpxchg - possibly swap all entries in a lock-less list
- * @head: the head of lock-less list to delete all entries
- * @old: expected old value of the first entry of the list
- * @new: new entry as the head of the list
- *
- * perform a cmpxchg on the first entry of the list.
- */
-
-static inline struct llist_node *io_llist_cmpxchg(struct llist_head *head,
- struct llist_node *old,
- struct llist_node *new)
-{
- return cmpxchg(&head->first, old, new);
-}
-
static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
{
struct llist_node *node = llist_del_all(&tctx->task_list);
@@ -1268,45 +1253,41 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
}
}
-void tctx_task_work(struct callback_head *cb)
+struct llist_node *tctx_task_work_run(struct io_uring_task *tctx,
+ unsigned int max_entries,
+ unsigned int *count)
{
- struct io_tw_state ts = {};
- struct io_ring_ctx *ctx = NULL;
- struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
- task_work);
- struct llist_node fake = {};
struct llist_node *node;
- unsigned int loops = 0;
- unsigned int count = 0;
if (unlikely(current->flags & PF_EXITING)) {
io_fallback_tw(tctx, true);
- return;
+ return NULL;
}
- do {
- loops++;
- node = io_llist_xchg(&tctx->task_list, &fake);
- count += handle_tw_list(node, &ctx, &ts, &fake);
-
- /* skip expensive cmpxchg if there are items in the list */
- if (READ_ONCE(tctx->task_list.first) != &fake)
- continue;
- if (ts.locked && !wq_list_empty(&ctx->submit_state.compl_reqs)) {
- io_submit_flush_completions(ctx);
- if (READ_ONCE(tctx->task_list.first) != &fake)
- continue;
- }
- node = io_llist_cmpxchg(&tctx->task_list, &fake, NULL);
- } while (node != &fake);
-
- ctx_flush_and_put(ctx, &ts);
+ node = llist_del_all(&tctx->task_list);
+ if (node) {
+ node = llist_reverse_order(node);
+ node = io_handle_tw_list(node, count, max_entries);
+ }
/* relaxed read is enough as only the task itself sets ->in_cancel */
if (unlikely(atomic_read(&tctx->in_cancel)))
io_uring_drop_tctx_refs(current);
- trace_io_uring_task_work_run(tctx, count, loops);
+ trace_io_uring_task_work_run(tctx, *count);
+ return node;
+}
+
+void tctx_task_work(struct callback_head *cb)
+{
+ struct io_uring_task *tctx;
+ struct llist_node *ret;
+ unsigned int count = 0;
+
+ tctx = container_of(cb, struct io_uring_task, task_work);
+ ret = tctx_task_work_run(tctx, UINT_MAX, &count);
+ /* can't happen */
+ WARN_ON_ONCE(ret);
}
static inline void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
@@ -1389,6 +1370,15 @@ static void io_req_normal_work_add(struct io_kiocb *req)
if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+ /* SQPOLL doesn't need the task_work added, it'll run it itself */
+ if (ctx->flags & IORING_SETUP_SQPOLL) {
+ struct io_sq_data *sqd = ctx->sq_data;
+
+ if (wq_has_sleeper(&sqd->wait))
+ wake_up(&sqd->wait);
+ return;
+ }
+
if (likely(!task_work_add(req->task, &tctx->task_work, ctx->notify_method)))
return;
@@ -1420,7 +1410,20 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
}
}
-static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts)
+static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events,
+ int min_events)
+{
+ if (llist_empty(&ctx->work_llist))
+ return false;
+ if (events < min_events)
+ return true;
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+ atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+ return false;
+}
+
+static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts,
+ int min_events)
{
struct llist_node *node;
unsigned int loops = 0;
@@ -1440,7 +1443,6 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts)
struct llist_node *next = node->next;
struct io_kiocb *req = container_of(node, struct io_kiocb,
io_task_work.node);
- prefetch(container_of(next, struct io_kiocb, io_task_work.node));
INDIRECT_CALL_2(req->io_task_work.func,
io_poll_task_func, io_req_rw_complete,
req, ts);
@@ -1449,18 +1451,20 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts)
}
loops++;
- if (!llist_empty(&ctx->work_llist))
+ if (io_run_local_work_continue(ctx, ret, min_events))
goto again;
if (ts->locked) {
io_submit_flush_completions(ctx);
- if (!llist_empty(&ctx->work_llist))
+ if (io_run_local_work_continue(ctx, ret, min_events))
goto again;
}
+
trace_io_uring_local_work_run(ctx, ret, loops);
return ret;
}
-static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
+static inline int io_run_local_work_locked(struct io_ring_ctx *ctx,
+ int min_events)
{
struct io_tw_state ts = { .locked = true, };
int ret;
@@ -1468,20 +1472,20 @@ static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
if (llist_empty(&ctx->work_llist))
return 0;
- ret = __io_run_local_work(ctx, &ts);
+ ret = __io_run_local_work(ctx, &ts, min_events);
/* shouldn't happen! */
if (WARN_ON_ONCE(!ts.locked))
mutex_lock(&ctx->uring_lock);
return ret;
}
-static int io_run_local_work(struct io_ring_ctx *ctx)
+static int io_run_local_work(struct io_ring_ctx *ctx, int min_events)
{
struct io_tw_state ts = {};
int ret;
ts.locked = mutex_trylock(&ctx->uring_lock);
- ret = __io_run_local_work(ctx, &ts);
+ ret = __io_run_local_work(ctx, &ts, min_events);
if (ts.locked)
mutex_unlock(&ctx->uring_lock);
@@ -1677,7 +1681,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
io_task_work_pending(ctx)) {
u32 tail = ctx->cached_cq_tail;
- (void) io_run_local_work_locked(ctx);
+ (void) io_run_local_work_locked(ctx, min);
if (task_work_pending(current) ||
wq_list_empty(&ctx->iopoll_list)) {
@@ -1768,9 +1772,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
}
}
-unsigned int io_file_get_flags(struct file *file)
+io_req_flags_t io_file_get_flags(struct file *file)
{
- unsigned int res = 0;
+ io_req_flags_t res = 0;
if (S_ISREG(file_inode(file)->i_mode))
res |= REQ_F_ISREG;
@@ -1966,10 +1970,28 @@ void io_wq_submit_work(struct io_wq_work *work)
goto fail;
}
+ /*
+ * If DEFER_TASKRUN is set, it's only allowed to post CQEs from the
+ * submitter task context. Final request completions are handed to the
+ * right context, however this is not the case of auxiliary CQEs,
+ * which is the main mean of operation for multishot requests.
+ * Don't allow any multishot execution from io-wq. It's more restrictive
+ * than necessary and also cleaner.
+ */
+ if (req->flags & REQ_F_APOLL_MULTISHOT) {
+ err = -EBADFD;
+ if (!io_file_can_poll(req))
+ goto fail;
+ err = -ECANCELED;
+ if (io_arm_poll_handler(req, issue_flags) != IO_APOLL_OK)
+ goto fail;
+ return;
+ }
+
if (req->flags & REQ_F_FORCE_ASYNC) {
bool opcode_poll = def->pollin || def->pollout;
- if (opcode_poll && file_can_poll(req->file)) {
+ if (opcode_poll && io_file_can_poll(req)) {
needs_poll = true;
issue_flags |= IO_URING_F_NONBLOCK;
}
@@ -2171,7 +2193,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
/* req is partially pre-initialised, see io_preinit_req() */
req->opcode = opcode = READ_ONCE(sqe->opcode);
/* same numerical values with corresponding REQ_F_*, safe to copy */
- req->flags = sqe_flags = READ_ONCE(sqe->flags);
+ sqe_flags = READ_ONCE(sqe->flags);
+ req->flags = (io_req_flags_t) sqe_flags;
req->cqe.user_data = READ_ONCE(sqe->user_data);
req->file = NULL;
req->rsrc_node = NULL;
@@ -2475,33 +2498,6 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
return ret;
}
-struct io_wait_queue {
- struct wait_queue_entry wq;
- struct io_ring_ctx *ctx;
- unsigned cq_tail;
- unsigned nr_timeouts;
- ktime_t timeout;
-};
-
-static inline bool io_has_work(struct io_ring_ctx *ctx)
-{
- return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
- !llist_empty(&ctx->work_llist);
-}
-
-static inline bool io_should_wake(struct io_wait_queue *iowq)
-{
- struct io_ring_ctx *ctx = iowq->ctx;
- int dist = READ_ONCE(ctx->rings->cq.tail) - (int) iowq->cq_tail;
-
- /*
- * Wake up if we have enough events, or if a timeout occurred since we
- * started waiting. For timeouts, we always want to return to userspace,
- * regardless of event count.
- */
- return dist >= 0 || atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
-}
-
static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
int wake_flags, void *key)
{
@@ -2520,7 +2516,7 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
{
if (!llist_empty(&ctx->work_llist)) {
__set_current_state(TASK_RUNNING);
- if (io_run_local_work(ctx) > 0)
+ if (io_run_local_work(ctx, INT_MAX) > 0)
return 0;
}
if (io_run_task_work() > 0)
@@ -2588,7 +2584,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
if (!io_allowed_run_tw(ctx))
return -EEXIST;
if (!llist_empty(&ctx->work_llist))
- io_run_local_work(ctx);
+ io_run_local_work(ctx, min_events);
io_run_task_work();
io_cqring_overflow_flush(ctx);
/* if user messes with these they will just get an early return */
@@ -2621,16 +2617,19 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
if (get_timespec64(&ts, uts))
return -EFAULT;
+
iowq.timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
+ io_napi_adjust_timeout(ctx, &iowq, &ts);
}
+ io_napi_busy_loop(ctx, &iowq);
+
trace_io_uring_cqring_wait(ctx, min_events);
do {
+ int nr_wait = (int) iowq.cq_tail - READ_ONCE(ctx->rings->cq.tail);
unsigned long check_cq;
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
- int nr_wait = (int) iowq.cq_tail - READ_ONCE(ctx->rings->cq.tail);
-
atomic_set(&ctx->cq_wait_nr, nr_wait);
set_current_state(TASK_INTERRUPTIBLE);
} else {
@@ -2649,7 +2648,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
*/
io_run_task_work();
if (!llist_empty(&ctx->work_llist))
- io_run_local_work(ctx);
+ io_run_local_work(ctx, nr_wait);
/*
* Non-local task_work will be run on exit to userspace, but
@@ -2917,6 +2916,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
io_req_caches_free(ctx);
if (ctx->hash_map)
io_wq_put_hash(ctx->hash_map);
+ io_napi_free(ctx);
kfree(ctx->cancel_table.hbs);
kfree(ctx->cancel_table_locked.hbs);
kfree(ctx->io_bl);
@@ -3304,7 +3304,7 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
io_allowed_defer_tw_run(ctx))
- ret |= io_run_local_work(ctx) > 0;
+ ret |= io_run_local_work(ctx, INT_MAX) > 0;
ret |= io_cancel_defer_files(ctx, task, cancel_all);
mutex_lock(&ctx->uring_lock);
ret |= io_poll_remove_all(ctx, task, cancel_all);
@@ -3666,7 +3666,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
* it should handle ownership problems if any.
*/
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
- (void)io_run_local_work_locked(ctx);
+ (void)io_run_local_work_locked(ctx, min_complete);
}
mutex_unlock(&ctx->uring_lock);
}
@@ -4153,7 +4153,7 @@ static int __init io_uring_init(void)
BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
- BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+ BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof_field(struct io_kiocb, flags));
BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
@@ -4175,9 +4175,8 @@ static int __init io_uring_init(void)
SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU,
offsetof(struct io_kiocb, cmd.data),
sizeof_field(struct io_kiocb, cmd.data), NULL);
- io_buf_cachep = kmem_cache_create("io_buffer", sizeof(struct io_buffer), 0,
- SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT,
- NULL);
+ io_buf_cachep = KMEM_CACHE(io_buffer,
+ SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
#ifdef CONFIG_SYSCTL
register_sysctl_init("kernel", kernel_io_uring_disabled_table);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index d549571..6426ee3 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -5,6 +5,7 @@
#include <linux/lockdep.h>
#include <linux/resume_user_mode.h>
#include <linux/kasan.h>
+#include <linux/poll.h>
#include <linux/io_uring_types.h>
#include <uapi/linux/eventpoll.h>
#include "io-wq.h"
@@ -34,6 +35,32 @@ enum {
IOU_STOP_MULTISHOT = -ECANCELED,
};
+struct io_wait_queue {
+ struct wait_queue_entry wq;
+ struct io_ring_ctx *ctx;
+ unsigned cq_tail;
+ unsigned nr_timeouts;
+ ktime_t timeout;
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ unsigned int napi_busy_poll_to;
+ bool napi_prefer_busy_poll;
+#endif
+};
+
+static inline bool io_should_wake(struct io_wait_queue *iowq)
+{
+ struct io_ring_ctx *ctx = iowq->ctx;
+ int dist = READ_ONCE(ctx->rings->cq.tail) - (int) iowq->cq_tail;
+
+ /*
+ * Wake up if we have enough events, or if a timeout occurred since we
+ * started waiting. For timeouts, we always want to return to userspace,
+ * regardless of event count.
+ */
+ return dist >= 0 || atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
+}
+
bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow);
void io_req_cqe_overflow(struct io_kiocb *req);
int io_run_task_work_sig(struct io_ring_ctx *ctx);
@@ -56,6 +83,8 @@ void io_queue_iowq(struct io_kiocb *req, struct io_tw_state *ts_dont_use);
void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts);
void io_req_task_queue_fail(struct io_kiocb *req, int ret);
void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts);
+struct llist_node *io_handle_tw_list(struct llist_node *node, unsigned int *count, unsigned int max_entries);
+struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, unsigned int max_entries, unsigned int *count);
void tctx_task_work(struct callback_head *cb);
__cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd);
int io_uring_alloc_task_context(struct task_struct *task,
@@ -207,7 +236,7 @@ static inline void io_ring_submit_unlock(struct io_ring_ctx *ctx,
unsigned issue_flags)
{
lockdep_assert_held(&ctx->uring_lock);
- if (issue_flags & IO_URING_F_UNLOCKED)
+ if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
mutex_unlock(&ctx->uring_lock);
}
@@ -220,7 +249,7 @@ static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
* The only exception is when we've detached the request and issue it
* from an async worker thread, grab the lock for that case.
*/
- if (issue_flags & IO_URING_F_UNLOCKED)
+ if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
mutex_lock(&ctx->uring_lock);
lockdep_assert_held(&ctx->uring_lock);
}
@@ -274,6 +303,8 @@ static inline unsigned int io_sqring_entries(struct io_ring_ctx *ctx)
static inline int io_run_task_work(void)
{
+ bool ret = false;
+
/*
* Always check-and-clear the task_work notification signal. With how
* signaling works for task_work, we can find it set with nothing to
@@ -285,18 +316,26 @@ static inline int io_run_task_work(void)
* PF_IO_WORKER never returns to userspace, so check here if we have
* notify work that needs processing.
*/
- if (current->flags & PF_IO_WORKER &&
- test_thread_flag(TIF_NOTIFY_RESUME)) {
- __set_current_state(TASK_RUNNING);
- resume_user_mode_work(NULL);
+ if (current->flags & PF_IO_WORKER) {
+ if (test_thread_flag(TIF_NOTIFY_RESUME)) {
+ __set_current_state(TASK_RUNNING);
+ resume_user_mode_work(NULL);
+ }
+ if (current->io_uring) {
+ unsigned int count = 0;
+
+ tctx_task_work_run(current->io_uring, UINT_MAX, &count);
+ if (count)
+ ret = true;
+ }
}
if (task_work_pending(current)) {
__set_current_state(TASK_RUNNING);
task_work_run();
- return 1;
+ ret = true;
}
- return 0;
+ return ret;
}
static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
@@ -398,4 +437,26 @@ static inline size_t uring_sqe_size(struct io_ring_ctx *ctx)
return 2 * sizeof(struct io_uring_sqe);
return sizeof(struct io_uring_sqe);
}
+
+static inline bool io_file_can_poll(struct io_kiocb *req)
+{
+ if (req->flags & REQ_F_CAN_POLL)
+ return true;
+ if (file_can_poll(req->file)) {
+ req->flags |= REQ_F_CAN_POLL;
+ return true;
+ }
+ return false;
+}
+
+enum {
+ IO_CHECK_CQ_OVERFLOW_BIT,
+ IO_CHECK_CQ_DROPPED_BIT,
+};
+
+static inline bool io_has_work(struct io_ring_ctx *ctx)
+{
+ return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
+ !llist_empty(&ctx->work_llist);
+}
#endif
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 18df5a9..9be42bf 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -81,15 +81,6 @@ bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags)
struct io_buffer_list *bl;
struct io_buffer *buf;
- /*
- * For legacy provided buffer mode, don't recycle if we already did
- * IO to this buffer. For ring-mapped provided buffer mode, we should
- * increment ring->head to explicitly monopolize the buffer to avoid
- * multiple use.
- */
- if (req->flags & REQ_F_PARTIAL_IO)
- return false;
-
io_ring_submit_lock(ctx, issue_flags);
buf = req->kbuf;
@@ -102,10 +93,8 @@ bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags)
return true;
}
-unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags)
+void __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags)
{
- unsigned int cflags;
-
/*
* We can add this buffer back to two lists:
*
@@ -118,21 +107,17 @@ unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags)
* We migrate buffers from the comp_list to the issue cache list
* when we need one.
*/
- if (req->flags & REQ_F_BUFFER_RING) {
- /* no buffers to recycle for this case */
- cflags = __io_put_kbuf_list(req, NULL);
- } else if (issue_flags & IO_URING_F_UNLOCKED) {
+ if (issue_flags & IO_URING_F_UNLOCKED) {
struct io_ring_ctx *ctx = req->ctx;
spin_lock(&ctx->completion_lock);
- cflags = __io_put_kbuf_list(req, &ctx->io_buffers_comp);
+ __io_put_kbuf_list(req, &ctx->io_buffers_comp);
spin_unlock(&ctx->completion_lock);
} else {
lockdep_assert_held(&req->ctx->uring_lock);
- cflags = __io_put_kbuf_list(req, &req->ctx->io_buffers_cache);
+ __io_put_kbuf_list(req, &req->ctx->io_buffers_cache);
}
- return cflags;
}
static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
@@ -145,6 +130,8 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
list_del(&kbuf->list);
if (*len == 0 || *len > kbuf->len)
*len = kbuf->len;
+ if (list_empty(&bl->buf_list))
+ req->flags |= REQ_F_BL_EMPTY;
req->flags |= REQ_F_BUFFER_SELECTED;
req->kbuf = kbuf;
req->buf_index = kbuf->bid;
@@ -158,12 +145,16 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
unsigned int issue_flags)
{
struct io_uring_buf_ring *br = bl->buf_ring;
+ __u16 tail, head = bl->head;
struct io_uring_buf *buf;
- __u16 head = bl->head;
- if (unlikely(smp_load_acquire(&br->tail) == head))
+ tail = smp_load_acquire(&br->tail);
+ if (unlikely(tail == head))
return NULL;
+ if (head + 1 == tail)
+ req->flags |= REQ_F_BL_EMPTY;
+
head &= bl->mask;
/* mmaped buffers are always contig */
if (bl->is_mmap || head < IO_BUFFER_LIST_BUF_PER_PAGE) {
@@ -180,7 +171,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
req->buf_list = bl;
req->buf_index = buf->bid;
- if (issue_flags & IO_URING_F_UNLOCKED || !file_can_poll(req->file)) {
+ if (issue_flags & IO_URING_F_UNLOCKED || !io_file_can_poll(req)) {
/*
* If we came in unlocked, we have no choice but to consume the
* buffer here, otherwise nothing ensures that the buffer won't
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 53dfaa7..5218bfd 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -57,7 +57,7 @@ int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg);
void io_kbuf_mmap_list_free(struct io_ring_ctx *ctx);
-unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags);
+void __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags);
bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);
@@ -73,21 +73,9 @@ static inline bool io_kbuf_recycle_ring(struct io_kiocb *req)
* to monopolize the buffer.
*/
if (req->buf_list) {
- if (req->flags & REQ_F_PARTIAL_IO) {
- /*
- * If we end up here, then the io_uring_lock has
- * been kept held since we retrieved the buffer.
- * For the io-wq case, we already cleared
- * req->buf_list when the buffer was retrieved,
- * hence it cannot be set here for that case.
- */
- req->buf_list->head++;
- req->buf_list = NULL;
- } else {
- req->buf_index = req->buf_list->bgid;
- req->flags &= ~REQ_F_BUFFER_RING;
- return true;
- }
+ req->buf_index = req->buf_list->bgid;
+ req->flags &= ~REQ_F_BUFFER_RING;
+ return true;
}
return false;
}
@@ -101,6 +89,8 @@ static inline bool io_do_buffer_select(struct io_kiocb *req)
static inline bool io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
{
+ if (req->flags & REQ_F_BL_NO_RECYCLE)
+ return false;
if (req->flags & REQ_F_BUFFER_SELECTED)
return io_kbuf_recycle_legacy(req, issue_flags);
if (req->flags & REQ_F_BUFFER_RING)
@@ -108,41 +98,54 @@ static inline bool io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
return false;
}
-static inline unsigned int __io_put_kbuf_list(struct io_kiocb *req,
- struct list_head *list)
+static inline void __io_put_kbuf_ring(struct io_kiocb *req)
{
- unsigned int ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
+ if (req->buf_list) {
+ req->buf_index = req->buf_list->bgid;
+ req->buf_list->head++;
+ }
+ req->flags &= ~REQ_F_BUFFER_RING;
+}
+static inline void __io_put_kbuf_list(struct io_kiocb *req,
+ struct list_head *list)
+{
if (req->flags & REQ_F_BUFFER_RING) {
- if (req->buf_list) {
- req->buf_index = req->buf_list->bgid;
- req->buf_list->head++;
- }
- req->flags &= ~REQ_F_BUFFER_RING;
+ __io_put_kbuf_ring(req);
} else {
req->buf_index = req->kbuf->bgid;
list_add(&req->kbuf->list, list);
req->flags &= ~REQ_F_BUFFER_SELECTED;
}
-
- return ret;
}
static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req)
{
+ unsigned int ret;
+
lockdep_assert_held(&req->ctx->completion_lock);
if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
return 0;
- return __io_put_kbuf_list(req, &req->ctx->io_buffers_comp);
+
+ ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
+ __io_put_kbuf_list(req, &req->ctx->io_buffers_comp);
+ return ret;
}
static inline unsigned int io_put_kbuf(struct io_kiocb *req,
unsigned issue_flags)
{
+ unsigned int ret;
- if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
+ if (!(req->flags & (REQ_F_BUFFER_RING | REQ_F_BUFFER_SELECTED)))
return 0;
- return __io_put_kbuf(req, issue_flags);
+
+ ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
+ if (req->flags & REQ_F_BUFFER_RING)
+ __io_put_kbuf_ring(req);
+ else
+ __io_put_kbuf(req, issue_flags);
+ return ret;
}
#endif
diff --git a/io_uring/napi.c b/io_uring/napi.c
new file mode 100644
index 0000000..883a1a6
--- /dev/null
+++ b/io_uring/napi.c
@@ -0,0 +1,332 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "io_uring.h"
+#include "napi.h"
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+
+/* Timeout for cleanout of stale entries. */
+#define NAPI_TIMEOUT (60 * SEC_CONVERSION)
+
+struct io_napi_entry {
+ unsigned int napi_id;
+ struct list_head list;
+
+ unsigned long timeout;
+ struct hlist_node node;
+
+ struct rcu_head rcu;
+};
+
+static struct io_napi_entry *io_napi_hash_find(struct hlist_head *hash_list,
+ unsigned int napi_id)
+{
+ struct io_napi_entry *e;
+
+ hlist_for_each_entry_rcu(e, hash_list, node) {
+ if (e->napi_id != napi_id)
+ continue;
+ e->timeout = jiffies + NAPI_TIMEOUT;
+ return e;
+ }
+
+ return NULL;
+}
+
+void __io_napi_add(struct io_ring_ctx *ctx, struct socket *sock)
+{
+ struct hlist_head *hash_list;
+ unsigned int napi_id;
+ struct sock *sk;
+ struct io_napi_entry *e;
+
+ sk = sock->sk;
+ if (!sk)
+ return;
+
+ napi_id = READ_ONCE(sk->sk_napi_id);
+
+ /* Non-NAPI IDs can be rejected. */
+ if (napi_id < MIN_NAPI_ID)
+ return;
+
+ hash_list = &ctx->napi_ht[hash_min(napi_id, HASH_BITS(ctx->napi_ht))];
+
+ rcu_read_lock();
+ e = io_napi_hash_find(hash_list, napi_id);
+ if (e) {
+ e->timeout = jiffies + NAPI_TIMEOUT;
+ rcu_read_unlock();
+ return;
+ }
+ rcu_read_unlock();
+
+ e = kmalloc(sizeof(*e), GFP_NOWAIT);
+ if (!e)
+ return;
+
+ e->napi_id = napi_id;
+ e->timeout = jiffies + NAPI_TIMEOUT;
+
+ spin_lock(&ctx->napi_lock);
+ if (unlikely(io_napi_hash_find(hash_list, napi_id))) {
+ spin_unlock(&ctx->napi_lock);
+ kfree(e);
+ return;
+ }
+
+ hlist_add_tail_rcu(&e->node, hash_list);
+ list_add_tail(&e->list, &ctx->napi_list);
+ spin_unlock(&ctx->napi_lock);
+}
+
+static void __io_napi_remove_stale(struct io_ring_ctx *ctx)
+{
+ struct io_napi_entry *e;
+ unsigned int i;
+
+ spin_lock(&ctx->napi_lock);
+ hash_for_each(ctx->napi_ht, i, e, node) {
+ if (time_after(jiffies, e->timeout)) {
+ list_del(&e->list);
+ hash_del_rcu(&e->node);
+ kfree_rcu(e, rcu);
+ }
+ }
+ spin_unlock(&ctx->napi_lock);
+}
+
+static inline void io_napi_remove_stale(struct io_ring_ctx *ctx, bool is_stale)
+{
+ if (is_stale)
+ __io_napi_remove_stale(ctx);
+}
+
+static inline bool io_napi_busy_loop_timeout(unsigned long start_time,
+ unsigned long bp_usec)
+{
+ if (bp_usec) {
+ unsigned long end_time = start_time + bp_usec;
+ unsigned long now = busy_loop_current_time();
+
+ return time_after(now, end_time);
+ }
+
+ return true;
+}
+
+static bool io_napi_busy_loop_should_end(void *data,
+ unsigned long start_time)
+{
+ struct io_wait_queue *iowq = data;
+
+ if (signal_pending(current))
+ return true;
+ if (io_should_wake(iowq) || io_has_work(iowq->ctx))
+ return true;
+ if (io_napi_busy_loop_timeout(start_time, iowq->napi_busy_poll_to))
+ return true;
+
+ return false;
+}
+
+static bool __io_napi_do_busy_loop(struct io_ring_ctx *ctx,
+ void *loop_end_arg)
+{
+ struct io_napi_entry *e;
+ bool (*loop_end)(void *, unsigned long) = NULL;
+ bool is_stale = false;
+
+ if (loop_end_arg)
+ loop_end = io_napi_busy_loop_should_end;
+
+ list_for_each_entry_rcu(e, &ctx->napi_list, list) {
+ napi_busy_loop_rcu(e->napi_id, loop_end, loop_end_arg,
+ ctx->napi_prefer_busy_poll, BUSY_POLL_BUDGET);
+
+ if (time_after(jiffies, e->timeout))
+ is_stale = true;
+ }
+
+ return is_stale;
+}
+
+static void io_napi_blocking_busy_loop(struct io_ring_ctx *ctx,
+ struct io_wait_queue *iowq)
+{
+ unsigned long start_time = busy_loop_current_time();
+ void *loop_end_arg = NULL;
+ bool is_stale = false;
+
+ /* Singular lists use a different napi loop end check function and are
+ * only executed once.
+ */
+ if (list_is_singular(&ctx->napi_list))
+ loop_end_arg = iowq;
+
+ rcu_read_lock();
+ do {
+ is_stale = __io_napi_do_busy_loop(ctx, loop_end_arg);
+ } while (!io_napi_busy_loop_should_end(iowq, start_time) && !loop_end_arg);
+ rcu_read_unlock();
+
+ io_napi_remove_stale(ctx, is_stale);
+}
+
+/*
+ * io_napi_init() - Init napi settings
+ * @ctx: pointer to io-uring context structure
+ *
+ * Init napi settings in the io-uring context.
+ */
+void io_napi_init(struct io_ring_ctx *ctx)
+{
+ INIT_LIST_HEAD(&ctx->napi_list);
+ spin_lock_init(&ctx->napi_lock);
+ ctx->napi_prefer_busy_poll = false;
+ ctx->napi_busy_poll_to = READ_ONCE(sysctl_net_busy_poll);
+}
+
+/*
+ * io_napi_free() - Deallocate napi
+ * @ctx: pointer to io-uring context structure
+ *
+ * Free the napi list and the hash table in the io-uring context.
+ */
+void io_napi_free(struct io_ring_ctx *ctx)
+{
+ struct io_napi_entry *e;
+ LIST_HEAD(napi_list);
+ unsigned int i;
+
+ spin_lock(&ctx->napi_lock);
+ hash_for_each(ctx->napi_ht, i, e, node) {
+ hash_del_rcu(&e->node);
+ kfree_rcu(e, rcu);
+ }
+ spin_unlock(&ctx->napi_lock);
+}
+
+/*
+ * io_napi_register() - Register napi with io-uring
+ * @ctx: pointer to io-uring context structure
+ * @arg: pointer to io_uring_napi structure
+ *
+ * Register napi in the io-uring context.
+ */
+int io_register_napi(struct io_ring_ctx *ctx, void __user *arg)
+{
+ const struct io_uring_napi curr = {
+ .busy_poll_to = ctx->napi_busy_poll_to,
+ .prefer_busy_poll = ctx->napi_prefer_busy_poll
+ };
+ struct io_uring_napi napi;
+
+ if (copy_from_user(&napi, arg, sizeof(napi)))
+ return -EFAULT;
+ if (napi.pad[0] || napi.pad[1] || napi.pad[2] || napi.resv)
+ return -EINVAL;
+
+ if (copy_to_user(arg, &curr, sizeof(curr)))
+ return -EFAULT;
+
+ WRITE_ONCE(ctx->napi_busy_poll_to, napi.busy_poll_to);
+ WRITE_ONCE(ctx->napi_prefer_busy_poll, !!napi.prefer_busy_poll);
+ WRITE_ONCE(ctx->napi_enabled, true);
+ return 0;
+}
+
+/*
+ * io_napi_unregister() - Unregister napi with io-uring
+ * @ctx: pointer to io-uring context structure
+ * @arg: pointer to io_uring_napi structure
+ *
+ * Unregister napi. If arg has been specified copy the busy poll timeout and
+ * prefer busy poll setting to the passed in structure.
+ */
+int io_unregister_napi(struct io_ring_ctx *ctx, void __user *arg)
+{
+ const struct io_uring_napi curr = {
+ .busy_poll_to = ctx->napi_busy_poll_to,
+ .prefer_busy_poll = ctx->napi_prefer_busy_poll
+ };
+
+ if (arg && copy_to_user(arg, &curr, sizeof(curr)))
+ return -EFAULT;
+
+ WRITE_ONCE(ctx->napi_busy_poll_to, 0);
+ WRITE_ONCE(ctx->napi_prefer_busy_poll, false);
+ WRITE_ONCE(ctx->napi_enabled, false);
+ return 0;
+}
+
+/*
+ * __io_napi_adjust_timeout() - Add napi id to the busy poll list
+ * @ctx: pointer to io-uring context structure
+ * @iowq: pointer to io wait queue
+ * @ts: pointer to timespec or NULL
+ *
+ * Adjust the busy loop timeout according to timespec and busy poll timeout.
+ */
+void __io_napi_adjust_timeout(struct io_ring_ctx *ctx, struct io_wait_queue *iowq,
+ struct timespec64 *ts)
+{
+ unsigned int poll_to = READ_ONCE(ctx->napi_busy_poll_to);
+
+ if (ts) {
+ struct timespec64 poll_to_ts = ns_to_timespec64(1000 * (s64)poll_to);
+
+ if (timespec64_compare(ts, &poll_to_ts) > 0) {
+ *ts = timespec64_sub(*ts, poll_to_ts);
+ } else {
+ u64 to = timespec64_to_ns(ts);
+
+ do_div(to, 1000);
+ ts->tv_sec = 0;
+ ts->tv_nsec = 0;
+ }
+ }
+
+ iowq->napi_busy_poll_to = poll_to;
+}
+
+/*
+ * __io_napi_busy_loop() - execute busy poll loop
+ * @ctx: pointer to io-uring context structure
+ * @iowq: pointer to io wait queue
+ *
+ * Execute the busy poll loop and merge the spliced off list.
+ */
+void __io_napi_busy_loop(struct io_ring_ctx *ctx, struct io_wait_queue *iowq)
+{
+ iowq->napi_prefer_busy_poll = READ_ONCE(ctx->napi_prefer_busy_poll);
+
+ if (!(ctx->flags & IORING_SETUP_SQPOLL) && ctx->napi_enabled)
+ io_napi_blocking_busy_loop(ctx, iowq);
+}
+
+/*
+ * io_napi_sqpoll_busy_poll() - busy poll loop for sqpoll
+ * @ctx: pointer to io-uring context structure
+ *
+ * Splice of the napi list and execute the napi busy poll loop.
+ */
+int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
+{
+ LIST_HEAD(napi_list);
+ bool is_stale = false;
+
+ if (!READ_ONCE(ctx->napi_busy_poll_to))
+ return 0;
+ if (list_empty_careful(&ctx->napi_list))
+ return 0;
+
+ rcu_read_lock();
+ is_stale = __io_napi_do_busy_loop(ctx, NULL);
+ rcu_read_unlock();
+
+ io_napi_remove_stale(ctx, is_stale);
+ return 1;
+}
+
+#endif
diff --git a/io_uring/napi.h b/io_uring/napi.h
new file mode 100644
index 0000000..6fc0393
--- /dev/null
+++ b/io_uring/napi.h
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef IOU_NAPI_H
+#define IOU_NAPI_H
+
+#include <linux/kernel.h>
+#include <linux/io_uring.h>
+#include <net/busy_poll.h>
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+
+void io_napi_init(struct io_ring_ctx *ctx);
+void io_napi_free(struct io_ring_ctx *ctx);
+
+int io_register_napi(struct io_ring_ctx *ctx, void __user *arg);
+int io_unregister_napi(struct io_ring_ctx *ctx, void __user *arg);
+
+void __io_napi_add(struct io_ring_ctx *ctx, struct socket *sock);
+
+void __io_napi_adjust_timeout(struct io_ring_ctx *ctx,
+ struct io_wait_queue *iowq, struct timespec64 *ts);
+void __io_napi_busy_loop(struct io_ring_ctx *ctx, struct io_wait_queue *iowq);
+int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx);
+
+static inline bool io_napi(struct io_ring_ctx *ctx)
+{
+ return !list_empty(&ctx->napi_list);
+}
+
+static inline void io_napi_adjust_timeout(struct io_ring_ctx *ctx,
+ struct io_wait_queue *iowq,
+ struct timespec64 *ts)
+{
+ if (!io_napi(ctx))
+ return;
+ __io_napi_adjust_timeout(ctx, iowq, ts);
+}
+
+static inline void io_napi_busy_loop(struct io_ring_ctx *ctx,
+ struct io_wait_queue *iowq)
+{
+ if (!io_napi(ctx))
+ return;
+ __io_napi_busy_loop(ctx, iowq);
+}
+
+/*
+ * io_napi_add() - Add napi id to the busy poll list
+ * @req: pointer to io_kiocb request
+ *
+ * Add the napi id of the socket to the napi busy poll list and hash table.
+ */
+static inline void io_napi_add(struct io_kiocb *req)
+{
+ struct io_ring_ctx *ctx = req->ctx;
+ struct socket *sock;
+
+ if (!READ_ONCE(ctx->napi_busy_poll_to))
+ return;
+
+ sock = sock_from_file(req->file);
+ if (sock)
+ __io_napi_add(ctx, sock);
+}
+
+#else
+
+static inline void io_napi_init(struct io_ring_ctx *ctx)
+{
+}
+static inline void io_napi_free(struct io_ring_ctx *ctx)
+{
+}
+static inline int io_register_napi(struct io_ring_ctx *ctx, void __user *arg)
+{
+ return -EOPNOTSUPP;
+}
+static inline int io_unregister_napi(struct io_ring_ctx *ctx, void __user *arg)
+{
+ return -EOPNOTSUPP;
+}
+static inline bool io_napi(struct io_ring_ctx *ctx)
+{
+ return false;
+}
+static inline void io_napi_add(struct io_kiocb *req)
+{
+}
+static inline void io_napi_adjust_timeout(struct io_ring_ctx *ctx,
+ struct io_wait_queue *iowq,
+ struct timespec64 *ts)
+{
+}
+static inline void io_napi_busy_loop(struct io_ring_ctx *ctx,
+ struct io_wait_queue *iowq)
+{
+}
+static inline int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
+{
+ return 0;
+}
+#endif /* CONFIG_NET_RX_BUSY_POLL */
+
+#endif
diff --git a/io_uring/net.c b/io_uring/net.c
index 43bc9a5..19451f0 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -78,19 +78,6 @@ struct io_sr_msg {
*/
#define MULTISHOT_MAX_RETRY 32
-static inline bool io_check_multishot(struct io_kiocb *req,
- unsigned int issue_flags)
-{
- /*
- * When ->locked_cq is set we only allow to post CQEs from the original
- * task context. Usual request completions will be handled in other
- * generic paths but multipoll may decide to post extra cqes.
- */
- return !(issue_flags & IO_URING_F_IOWQ) ||
- !(issue_flags & IO_URING_F_MULTISHOT) ||
- !req->ctx->task_complete;
-}
-
int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_shutdown *shutdown = io_kiocb_to_cmd(req, struct io_shutdown);
@@ -204,16 +191,130 @@ static int io_setup_async_msg(struct io_kiocb *req,
return -EAGAIN;
}
-static int io_sendmsg_copy_hdr(struct io_kiocb *req,
- struct io_async_msghdr *iomsg)
+#ifdef CONFIG_COMPAT
+static int io_compat_msg_copy_hdr(struct io_kiocb *req,
+ struct io_async_msghdr *iomsg,
+ struct compat_msghdr *msg, int ddir)
+{
+ struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+ struct compat_iovec __user *uiov;
+ int ret;
+
+ if (copy_from_user(msg, sr->umsg_compat, sizeof(*msg)))
+ return -EFAULT;
+
+ uiov = compat_ptr(msg->msg_iov);
+ if (req->flags & REQ_F_BUFFER_SELECT) {
+ compat_ssize_t clen;
+
+ iomsg->free_iov = NULL;
+ if (msg->msg_iovlen == 0) {
+ sr->len = 0;
+ } else if (msg->msg_iovlen > 1) {
+ return -EINVAL;
+ } else {
+ if (!access_ok(uiov, sizeof(*uiov)))
+ return -EFAULT;
+ if (__get_user(clen, &uiov->iov_len))
+ return -EFAULT;
+ if (clen < 0)
+ return -EINVAL;
+ sr->len = clen;
+ }
+
+ return 0;
+ }
+
+ iomsg->free_iov = iomsg->fast_iov;
+ ret = __import_iovec(ddir, (struct iovec __user *)uiov, msg->msg_iovlen,
+ UIO_FASTIOV, &iomsg->free_iov,
+ &iomsg->msg.msg_iter, true);
+ if (unlikely(ret < 0))
+ return ret;
+
+ return 0;
+}
+#endif
+
+static int io_msg_copy_hdr(struct io_kiocb *req, struct io_async_msghdr *iomsg,
+ struct user_msghdr *msg, int ddir)
{
struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
int ret;
- iomsg->msg.msg_name = &iomsg->addr;
+ if (!user_access_begin(sr->umsg, sizeof(*sr->umsg)))
+ return -EFAULT;
+
+ ret = -EFAULT;
+ unsafe_get_user(msg->msg_name, &sr->umsg->msg_name, ua_end);
+ unsafe_get_user(msg->msg_namelen, &sr->umsg->msg_namelen, ua_end);
+ unsafe_get_user(msg->msg_iov, &sr->umsg->msg_iov, ua_end);
+ unsafe_get_user(msg->msg_iovlen, &sr->umsg->msg_iovlen, ua_end);
+ unsafe_get_user(msg->msg_control, &sr->umsg->msg_control, ua_end);
+ unsafe_get_user(msg->msg_controllen, &sr->umsg->msg_controllen, ua_end);
+ msg->msg_flags = 0;
+
+ if (req->flags & REQ_F_BUFFER_SELECT) {
+ if (msg->msg_iovlen == 0) {
+ sr->len = iomsg->fast_iov[0].iov_len = 0;
+ iomsg->fast_iov[0].iov_base = NULL;
+ iomsg->free_iov = NULL;
+ } else if (msg->msg_iovlen > 1) {
+ ret = -EINVAL;
+ goto ua_end;
+ } else {
+ /* we only need the length for provided buffers */
+ if (!access_ok(&msg->msg_iov[0].iov_len, sizeof(__kernel_size_t)))
+ goto ua_end;
+ unsafe_get_user(iomsg->fast_iov[0].iov_len,
+ &msg->msg_iov[0].iov_len, ua_end);
+ sr->len = iomsg->fast_iov[0].iov_len;
+ iomsg->free_iov = NULL;
+ }
+ ret = 0;
+ua_end:
+ user_access_end();
+ return ret;
+ }
+
+ user_access_end();
iomsg->free_iov = iomsg->fast_iov;
- ret = sendmsg_copy_msghdr(&iomsg->msg, sr->umsg, sr->msg_flags,
- &iomsg->free_iov);
+ ret = __import_iovec(ddir, msg->msg_iov, msg->msg_iovlen, UIO_FASTIOV,
+ &iomsg->free_iov, &iomsg->msg.msg_iter, false);
+ if (unlikely(ret < 0))
+ return ret;
+
+ return 0;
+}
+
+static int io_sendmsg_copy_hdr(struct io_kiocb *req,
+ struct io_async_msghdr *iomsg)
+{
+ struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+ struct user_msghdr msg;
+ int ret;
+
+ iomsg->msg.msg_name = &iomsg->addr;
+ iomsg->msg.msg_iter.nr_segs = 0;
+
+#ifdef CONFIG_COMPAT
+ if (unlikely(req->ctx->compat)) {
+ struct compat_msghdr cmsg;
+
+ ret = io_compat_msg_copy_hdr(req, iomsg, &cmsg, ITER_SOURCE);
+ if (unlikely(ret))
+ return ret;
+
+ return __get_compat_msghdr(&iomsg->msg, &cmsg, NULL);
+ }
+#endif
+
+ ret = io_msg_copy_hdr(req, iomsg, &msg, ITER_SOURCE);
+ if (unlikely(ret))
+ return ret;
+
+ ret = __copy_msghdr(&iomsg->msg, &msg, NULL);
+
/* save msg_control as sys_sendmsg() overwrites it */
sr->msg_control = iomsg->msg.msg_control_user;
return ret;
@@ -273,6 +374,8 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+ sr->done_io = 0;
+
if (req->opcode == IORING_OP_SEND) {
if (READ_ONCE(sqe->__pad3[0]))
return -EINVAL;
@@ -295,10 +398,20 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
#endif
- sr->done_io = 0;
return 0;
}
+static void io_req_msg_cleanup(struct io_kiocb *req,
+ struct io_async_msghdr *kmsg,
+ unsigned int issue_flags)
+{
+ req->flags &= ~REQ_F_NEED_CLEANUP;
+ /* fast path, check for non-NULL to avoid function call */
+ if (kmsg->free_iov)
+ kfree(kmsg->free_iov);
+ io_netmsg_recycle(req, issue_flags);
+}
+
int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
@@ -341,18 +454,14 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
kmsg->msg.msg_controllen = 0;
kmsg->msg.msg_control = NULL;
sr->done_io += ret;
- req->flags |= REQ_F_PARTIAL_IO;
+ req->flags |= REQ_F_BL_NO_RECYCLE;
return io_setup_async_msg(req, kmsg, issue_flags);
}
if (ret == -ERESTARTSYS)
ret = -EINTR;
req_set_fail(req);
}
- /* fast path, check for non-NULL to avoid function call */
- if (kmsg->free_iov)
- kfree(kmsg->free_iov);
- req->flags &= ~REQ_F_NEED_CLEANUP;
- io_netmsg_recycle(req, issue_flags);
+ io_req_msg_cleanup(req, kmsg, issue_flags);
if (ret >= 0)
ret += sr->done_io;
else if (sr->done_io)
@@ -420,7 +529,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
sr->len -= ret;
sr->buf += ret;
sr->done_io += ret;
- req->flags |= REQ_F_PARTIAL_IO;
+ req->flags |= REQ_F_BL_NO_RECYCLE;
return io_setup_async_addr(req, &__address, issue_flags);
}
if (ret == -ERESTARTSYS)
@@ -435,142 +544,77 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
return IOU_OK;
}
-static bool io_recvmsg_multishot_overflow(struct io_async_msghdr *iomsg)
+static int io_recvmsg_mshot_prep(struct io_kiocb *req,
+ struct io_async_msghdr *iomsg,
+ int namelen, size_t controllen)
{
- int hdr;
+ if ((req->flags & (REQ_F_APOLL_MULTISHOT|REQ_F_BUFFER_SELECT)) ==
+ (REQ_F_APOLL_MULTISHOT|REQ_F_BUFFER_SELECT)) {
+ int hdr;
- if (iomsg->namelen < 0)
- return true;
- if (check_add_overflow((int)sizeof(struct io_uring_recvmsg_out),
- iomsg->namelen, &hdr))
- return true;
- if (check_add_overflow(hdr, (int)iomsg->controllen, &hdr))
- return true;
+ if (unlikely(namelen < 0))
+ return -EOVERFLOW;
+ if (check_add_overflow(sizeof(struct io_uring_recvmsg_out),
+ namelen, &hdr))
+ return -EOVERFLOW;
+ if (check_add_overflow(hdr, controllen, &hdr))
+ return -EOVERFLOW;
- return false;
-}
-
-static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
- struct io_async_msghdr *iomsg)
-{
- struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
- struct user_msghdr msg;
- int ret;
-
- if (copy_from_user(&msg, sr->umsg, sizeof(*sr->umsg)))
- return -EFAULT;
-
- ret = __copy_msghdr(&iomsg->msg, &msg, &iomsg->uaddr);
- if (ret)
- return ret;
-
- if (req->flags & REQ_F_BUFFER_SELECT) {
- if (msg.msg_iovlen == 0) {
- sr->len = iomsg->fast_iov[0].iov_len = 0;
- iomsg->fast_iov[0].iov_base = NULL;
- iomsg->free_iov = NULL;
- } else if (msg.msg_iovlen > 1) {
- return -EINVAL;
- } else {
- if (copy_from_user(iomsg->fast_iov, msg.msg_iov, sizeof(*msg.msg_iov)))
- return -EFAULT;
- sr->len = iomsg->fast_iov[0].iov_len;
- iomsg->free_iov = NULL;
- }
-
- if (req->flags & REQ_F_APOLL_MULTISHOT) {
- iomsg->namelen = msg.msg_namelen;
- iomsg->controllen = msg.msg_controllen;
- if (io_recvmsg_multishot_overflow(iomsg))
- return -EOVERFLOW;
- }
- } else {
- iomsg->free_iov = iomsg->fast_iov;
- ret = __import_iovec(ITER_DEST, msg.msg_iov, msg.msg_iovlen, UIO_FASTIOV,
- &iomsg->free_iov, &iomsg->msg.msg_iter,
- false);
- if (ret > 0)
- ret = 0;
- }
-
- return ret;
-}
-
-#ifdef CONFIG_COMPAT
-static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
- struct io_async_msghdr *iomsg)
-{
- struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
- struct compat_msghdr msg;
- struct compat_iovec __user *uiov;
- int ret;
-
- if (copy_from_user(&msg, sr->umsg_compat, sizeof(msg)))
- return -EFAULT;
-
- ret = __get_compat_msghdr(&iomsg->msg, &msg, &iomsg->uaddr);
- if (ret)
- return ret;
-
- uiov = compat_ptr(msg.msg_iov);
- if (req->flags & REQ_F_BUFFER_SELECT) {
- compat_ssize_t clen;
-
- iomsg->free_iov = NULL;
- if (msg.msg_iovlen == 0) {
- sr->len = 0;
- } else if (msg.msg_iovlen > 1) {
- return -EINVAL;
- } else {
- if (!access_ok(uiov, sizeof(*uiov)))
- return -EFAULT;
- if (__get_user(clen, &uiov->iov_len))
- return -EFAULT;
- if (clen < 0)
- return -EINVAL;
- sr->len = clen;
- }
-
- if (req->flags & REQ_F_APOLL_MULTISHOT) {
- iomsg->namelen = msg.msg_namelen;
- iomsg->controllen = msg.msg_controllen;
- if (io_recvmsg_multishot_overflow(iomsg))
- return -EOVERFLOW;
- }
- } else {
- iomsg->free_iov = iomsg->fast_iov;
- ret = __import_iovec(ITER_DEST, (struct iovec __user *)uiov, msg.msg_iovlen,
- UIO_FASTIOV, &iomsg->free_iov,
- &iomsg->msg.msg_iter, true);
- if (ret < 0)
- return ret;
+ iomsg->namelen = namelen;
+ iomsg->controllen = controllen;
+ return 0;
}
return 0;
}
-#endif
static int io_recvmsg_copy_hdr(struct io_kiocb *req,
struct io_async_msghdr *iomsg)
{
+ struct user_msghdr msg;
+ int ret;
+
iomsg->msg.msg_name = &iomsg->addr;
iomsg->msg.msg_iter.nr_segs = 0;
#ifdef CONFIG_COMPAT
- if (req->ctx->compat)
- return __io_compat_recvmsg_copy_hdr(req, iomsg);
+ if (unlikely(req->ctx->compat)) {
+ struct compat_msghdr cmsg;
+
+ ret = io_compat_msg_copy_hdr(req, iomsg, &cmsg, ITER_DEST);
+ if (unlikely(ret))
+ return ret;
+
+ ret = __get_compat_msghdr(&iomsg->msg, &cmsg, &iomsg->uaddr);
+ if (unlikely(ret))
+ return ret;
+
+ return io_recvmsg_mshot_prep(req, iomsg, cmsg.msg_namelen,
+ cmsg.msg_controllen);
+ }
#endif
- return __io_recvmsg_copy_hdr(req, iomsg);
+ ret = io_msg_copy_hdr(req, iomsg, &msg, ITER_DEST);
+ if (unlikely(ret))
+ return ret;
+
+ ret = __copy_msghdr(&iomsg->msg, &msg, &iomsg->uaddr);
+ if (unlikely(ret))
+ return ret;
+
+ return io_recvmsg_mshot_prep(req, iomsg, msg.msg_namelen,
+ msg.msg_controllen);
}
int io_recvmsg_prep_async(struct io_kiocb *req)
{
+ struct io_async_msghdr *iomsg;
int ret;
if (!io_msg_alloc_async_prep(req))
return -ENOMEM;
- ret = io_recvmsg_copy_hdr(req, req->async_data);
+ iomsg = req->async_data;
+ ret = io_recvmsg_copy_hdr(req, iomsg);
if (!ret)
req->flags |= REQ_F_NEED_CLEANUP;
return ret;
@@ -582,6 +626,8 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+ sr->done_io = 0;
+
if (unlikely(sqe->file_index || sqe->addr2))
return -EINVAL;
@@ -618,7 +664,6 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (req->ctx->compat)
sr->msg_flags |= MSG_CMSG_COMPAT;
#endif
- sr->done_io = 0;
sr->nr_multishot_loops = 0;
return 0;
}
@@ -627,6 +672,7 @@ static inline void io_recv_prep_retry(struct io_kiocb *req)
{
struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+ req->flags &= ~REQ_F_BL_EMPTY;
sr->done_io = 0;
sr->len = 0; /* get from the provided buffer */
req->buf_index = sr->buf_group;
@@ -645,30 +691,22 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
unsigned int cflags;
cflags = io_put_kbuf(req, issue_flags);
- if (msg->msg_inq && msg->msg_inq != -1)
+ if (msg->msg_inq > 0)
cflags |= IORING_CQE_F_SOCK_NONEMPTY;
- if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
- io_req_set_res(req, *ret, cflags);
- *ret = IOU_OK;
- return true;
- }
-
- if (mshot_finished)
- goto finish;
-
/*
* Fill CQE for this receive and see if we should keep trying to
* receive from this socket.
*/
- if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER,
+ if ((req->flags & REQ_F_APOLL_MULTISHOT) && !mshot_finished &&
+ io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER,
*ret, cflags | IORING_CQE_F_MORE)) {
struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE;
io_recv_prep_retry(req);
/* Known not-empty or unknown state, retry */
- if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq == -1) {
+ if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq < 0) {
if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)
return false;
/* mshot retries exceeded, force a requeue */
@@ -681,8 +719,8 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
*ret = -EAGAIN;
return true;
}
- /* Otherwise stop multishot but use the current result. */
-finish:
+
+ /* Finish the request / stop multishot. */
io_req_set_res(req, *ret, cflags);
if (issue_flags & IO_URING_F_MULTISHOT)
@@ -803,8 +841,9 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
(sr->flags & IORING_RECVSEND_POLL_FIRST))
return io_setup_async_msg(req, kmsg, issue_flags);
- if (!io_check_multishot(req, issue_flags))
- return io_setup_async_msg(req, kmsg, issue_flags);
+ flags = sr->msg_flags;
+ if (force_nonblock)
+ flags |= MSG_DONTWAIT;
retry_multishot:
if (io_do_buffer_select(req)) {
@@ -826,10 +865,6 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
iov_iter_ubuf(&kmsg->msg.msg_iter, ITER_DEST, buf, len);
}
- flags = sr->msg_flags;
- if (force_nonblock)
- flags |= MSG_DONTWAIT;
-
kmsg->msg.msg_get_inq = 1;
kmsg->msg.msg_inq = -1;
if (req->flags & REQ_F_APOLL_MULTISHOT) {
@@ -855,7 +890,7 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
}
if (ret > 0 && io_net_retry(sock, flags)) {
sr->done_io += ret;
- req->flags |= REQ_F_PARTIAL_IO;
+ req->flags |= REQ_F_BL_NO_RECYCLE;
return io_setup_async_msg(req, kmsg, issue_flags);
}
if (ret == -ERESTARTSYS)
@@ -875,13 +910,10 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
if (!io_recv_finish(req, &ret, &kmsg->msg, mshot_finished, issue_flags))
goto retry_multishot;
- if (mshot_finished) {
- /* fast path, check for non-NULL to avoid function call */
- if (kmsg->free_iov)
- kfree(kmsg->free_iov);
- io_netmsg_recycle(req, issue_flags);
- req->flags &= ~REQ_F_NEED_CLEANUP;
- }
+ if (mshot_finished)
+ io_req_msg_cleanup(req, kmsg, issue_flags);
+ else if (ret == -EAGAIN)
+ return io_setup_async_msg(req, kmsg, issue_flags);
return ret;
}
@@ -900,9 +932,6 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
(sr->flags & IORING_RECVSEND_POLL_FIRST))
return -EAGAIN;
- if (!io_check_multishot(req, issue_flags))
- return -EAGAIN;
-
sock = sock_from_file(req->file);
if (unlikely(!sock))
return -ENOTSOCK;
@@ -915,6 +944,10 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
msg.msg_iocb = NULL;
msg.msg_ubuf = NULL;
+ flags = sr->msg_flags;
+ if (force_nonblock)
+ flags |= MSG_DONTWAIT;
+
retry_multishot:
if (io_do_buffer_select(req)) {
void __user *buf;
@@ -933,9 +966,6 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
msg.msg_inq = -1;
msg.msg_flags = 0;
- flags = sr->msg_flags;
- if (force_nonblock)
- flags |= MSG_DONTWAIT;
if (flags & MSG_WAITALL)
min_ret = iov_iter_count(&msg.msg_iter);
@@ -953,7 +983,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
sr->len -= ret;
sr->buf += ret;
sr->done_io += ret;
- req->flags |= REQ_F_PARTIAL_IO;
+ req->flags |= REQ_F_BL_NO_RECYCLE;
return -EAGAIN;
}
if (ret == -ERESTARTSYS)
@@ -1003,6 +1033,8 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
struct io_ring_ctx *ctx = req->ctx;
struct io_kiocb *notif;
+ zc->done_io = 0;
+
if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)))
return -EINVAL;
/* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
@@ -1055,8 +1087,6 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (zc->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
- zc->done_io = 0;
-
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
zc->msg_flags |= MSG_CMSG_COMPAT;
@@ -1196,7 +1226,7 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
zc->len -= ret;
zc->buf += ret;
zc->done_io += ret;
- req->flags |= REQ_F_PARTIAL_IO;
+ req->flags |= REQ_F_BL_NO_RECYCLE;
return io_setup_async_addr(req, &__address, issue_flags);
}
if (ret == -ERESTARTSYS)
@@ -1266,7 +1296,7 @@ int io_sendmsg_zc(struct io_kiocb *req, unsigned int issue_flags)
if (ret > 0 && io_net_retry(sock, flags)) {
sr->done_io += ret;
- req->flags |= REQ_F_PARTIAL_IO;
+ req->flags |= REQ_F_BL_NO_RECYCLE;
return io_setup_async_msg(req, kmsg, issue_flags);
}
if (ret == -ERESTARTSYS)
@@ -1301,7 +1331,7 @@ void io_sendrecv_fail(struct io_kiocb *req)
{
struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
- if (req->flags & REQ_F_PARTIAL_IO)
+ if (sr->done_io)
req->cqe.res = sr->done_io;
if ((req->flags & REQ_F_NEED_CLEANUP) &&
@@ -1351,8 +1381,6 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
struct file *file;
int ret, fd;
- if (!io_check_multishot(req, issue_flags))
- return -EAGAIN;
retry:
if (!fixed) {
fd = __get_unused_fd_flags(accept->flags, accept->nofile);
@@ -1372,7 +1400,7 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
* has already been done
*/
if (issue_flags & IO_URING_F_MULTISHOT)
- ret = IOU_ISSUE_SKIP_COMPLETE;
+ return IOU_ISSUE_SKIP_COMPLETE;
return ret;
}
if (ret == -ERESTARTSYS)
@@ -1397,7 +1425,8 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
ret, IORING_CQE_F_MORE))
goto retry;
- return -ECANCELED;
+ io_req_set_res(req, ret, 0);
+ return IOU_STOP_MULTISHOT;
}
int io_socket_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index b1ee3a9..9c080aa 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -35,6 +35,7 @@
#include "rw.h"
#include "waitid.h"
#include "futex.h"
+#include "truncate.h"
static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags)
{
@@ -474,6 +475,12 @@ const struct io_issue_def io_issue_defs[] = {
.prep = io_install_fixed_fd_prep,
.issue = io_install_fixed_fd,
},
+ [IORING_OP_FTRUNCATE] = {
+ .needs_file = 1,
+ .hash_reg_file = 1,
+ .prep = io_ftruncate_prep,
+ .issue = io_ftruncate,
+ },
};
const struct io_cold_def io_cold_defs[] = {
@@ -712,6 +719,9 @@ const struct io_cold_def io_cold_defs[] = {
[IORING_OP_FIXED_FD_INSTALL] = {
.name = "FIXED_FD_INSTALL",
},
+ [IORING_OP_FTRUNCATE] = {
+ .name = "FTRUNCATE",
+ },
};
const char *io_uring_get_opcode(u8 opcode)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 7513afc..5f77913 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -15,6 +15,7 @@
#include "io_uring.h"
#include "refs.h"
+#include "napi.h"
#include "opdef.h"
#include "kbuf.h"
#include "poll.h"
@@ -343,8 +344,8 @@ static int io_poll_check_events(struct io_kiocb *req, struct io_tw_state *ts)
* Release all references, retry if someone tried to restart
* task_work while we were executing it.
*/
- } while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs) &
- IO_POLL_REF_MASK);
+ v &= IO_POLL_REF_MASK;
+ } while (atomic_sub_return(v, &req->poll_refs) & IO_POLL_REF_MASK);
return IOU_POLL_NO_ACTION;
}
@@ -539,14 +540,6 @@ static void __io_queue_proc(struct io_poll *poll, struct io_poll_table *pt,
poll->wait.private = (void *) wqe_private;
if (poll->events & EPOLLEXCLUSIVE) {
- /*
- * Exclusive waits may only wake a limited amount of entries
- * rather than all of them, this may interfere with lazy
- * wake if someone does wait(events > 1). Ensure we don't do
- * lazy wake for those, as we need to process each one as they
- * come in.
- */
- req->flags |= REQ_F_POLL_NO_LAZY;
add_wait_queue_exclusive(head, &poll->wait);
} else {
add_wait_queue(head, &poll->wait);
@@ -588,10 +581,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
struct io_poll_table *ipt, __poll_t mask,
unsigned issue_flags)
{
- struct io_ring_ctx *ctx = req->ctx;
-
INIT_HLIST_NODE(&req->hash_node);
- req->work.cancel_seq = atomic_read(&ctx->cancel_seq);
io_init_poll_iocb(poll, mask);
poll->file = req->file;
req->apoll_events = poll->events;
@@ -618,6 +608,17 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
if (issue_flags & IO_URING_F_UNLOCKED)
req->flags &= ~REQ_F_HASH_LOCKED;
+
+ /*
+ * Exclusive waits may only wake a limited amount of entries
+ * rather than all of them, this may interfere with lazy
+ * wake if someone does wait(events > 1). Ensure we don't do
+ * lazy wake for those, as we need to process each one as they
+ * come in.
+ */
+ if (poll->events & EPOLLEXCLUSIVE)
+ req->flags |= REQ_F_POLL_NO_LAZY;
+
mask = vfs_poll(req->file, &ipt->pt) & poll->events;
if (unlikely(ipt->error || !ipt->nr_entries)) {
@@ -652,6 +653,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
__io_poll_execute(req, mask);
return 0;
}
+ io_napi_add(req);
if (ipt->owning) {
/*
@@ -727,7 +729,7 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
if (!def->pollin && !def->pollout)
return IO_APOLL_ABORTED;
- if (!file_can_poll(req->file))
+ if (!io_file_can_poll(req))
return IO_APOLL_ABORTED;
if (!(req->flags & REQ_F_APOLL_MULTISHOT))
mask |= EPOLLONESHOT;
@@ -818,9 +820,8 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
if (poll_only && req->opcode != IORING_OP_POLL_ADD)
continue;
if (cd->flags & IORING_ASYNC_CANCEL_ALL) {
- if (cd->seq == req->work.cancel_seq)
+ if (io_cancel_match_sequence(req, cd->seq))
continue;
- req->work.cancel_seq = cd->seq;
}
*out_bucket = hb;
return req;
diff --git a/io_uring/register.c b/io_uring/register.c
index 5e62c12..99c3777 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -26,6 +26,7 @@
#include "register.h"
#include "cancel.h"
#include "kbuf.h"
+#include "napi.h"
#define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
IORING_REGISTER_LAST + IORING_OP_LAST)
@@ -550,6 +551,18 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
break;
ret = io_register_pbuf_status(ctx, arg);
break;
+ case IORING_REGISTER_NAPI:
+ ret = -EINVAL;
+ if (!arg || nr_args != 1)
+ break;
+ ret = io_register_napi(ctx, arg);
+ break;
+ case IORING_UNREGISTER_NAPI:
+ ret = -EINVAL;
+ if (nr_args != 1)
+ break;
+ ret = io_unregister_napi(ctx, arg);
+ break;
default:
ret = -EINVAL;
break;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index c6f199b..e210002 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -2,8 +2,6 @@
#ifndef IOU_RSRC_H
#define IOU_RSRC_H
-#include <net/af_unix.h>
-
#include "alloc_cache.h"
#define IO_NODE_ALLOC_CACHE_MAX 32
diff --git a/io_uring/rw.c b/io_uring/rw.c
index d5e79d9..47e097a 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -11,6 +11,7 @@
#include <linux/nospec.h>
#include <linux/compat.h>
#include <linux/io_uring/cmd.h>
+#include <linux/indirect_call_wrapper.h>
#include <uapi/linux/io_uring.h>
@@ -274,7 +275,7 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
* current cycle.
*/
io_req_io_end(req);
- req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO;
+ req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
return true;
}
req_set_fail(req);
@@ -341,7 +342,7 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
io_req_end_write(req);
if (unlikely(res != req->cqe.res)) {
if (res == -EAGAIN && io_rw_should_reissue(req)) {
- req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO;
+ req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
return;
}
req->cqe.res = res;
@@ -682,7 +683,7 @@ static bool io_rw_should_retry(struct io_kiocb *req)
* just use poll if we can, and don't attempt if the fs doesn't
* support callback based unlocks
*/
- if (file_can_poll(req->file) || !(req->file->f_mode & FMODE_BUF_RASYNC))
+ if (io_file_can_poll(req) || !(req->file->f_mode & FMODE_BUF_RASYNC))
return false;
wait->wait.func = io_async_buf_func;
@@ -721,7 +722,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
struct file *file = req->file;
int ret;
- if (unlikely(!file || !(file->f_mode & mode)))
+ if (unlikely(!(file->f_mode & mode)))
return -EBADF;
if (!(req->flags & REQ_F_FIXED_FILE))
@@ -831,7 +832,7 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
* If we can poll, just do that. For a vectored read, we'll
* need to copy state first.
*/
- if (file_can_poll(req->file) && !io_issue_defs[req->opcode].vectored)
+ if (io_file_can_poll(req) && !io_issue_defs[req->opcode].vectored)
return -EAGAIN;
/* IOPOLL retry should happen for io-wq threads */
if (!force_nonblock && !(req->ctx->flags & IORING_SETUP_IOPOLL))
@@ -930,7 +931,7 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
/*
* Multishot MUST be used on a pollable file
*/
- if (!file_can_poll(req->file))
+ if (!io_file_can_poll(req))
return -EBADFD;
ret = __io_read(req, issue_flags);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 65b5dbe..363052b 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -15,9 +15,11 @@
#include <uapi/linux/io_uring.h>
#include "io_uring.h"
+#include "napi.h"
#include "sqpoll.h"
#define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
+#define IORING_TW_CAP_ENTRIES_VALUE 8
enum {
IO_SQ_THREAD_SHOULD_STOP = 0,
@@ -193,6 +195,9 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
ret = io_submit_sqes(ctx, to_submit);
mutex_unlock(&ctx->uring_lock);
+ if (io_napi(ctx))
+ ret += io_napi_sqpoll_busy_poll(ctx);
+
if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
wake_up(&ctx->sqo_sq_wait);
if (creds)
@@ -219,10 +224,52 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd)
return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
}
+/*
+ * Run task_work, processing the retry_list first. The retry_list holds
+ * entries that we passed on in the previous run, if we had more task_work
+ * than we were asked to process. Newly queued task_work isn't run until the
+ * retry list has been fully processed.
+ */
+static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ unsigned int count = 0;
+
+ if (*retry_list) {
+ *retry_list = io_handle_tw_list(*retry_list, &count, max_entries);
+ if (count >= max_entries)
+ return count;
+ max_entries -= count;
+ }
+
+ *retry_list = tctx_task_work_run(tctx, max_entries, &count);
+ return count;
+}
+
+static bool io_sq_tw_pending(struct llist_node *retry_list)
+{
+ struct io_uring_task *tctx = current->io_uring;
+
+ return retry_list || !llist_empty(&tctx->task_list);
+}
+
+static void io_sq_update_worktime(struct io_sq_data *sqd, struct rusage *start)
+{
+ struct rusage end;
+
+ getrusage(current, RUSAGE_SELF, &end);
+ end.ru_stime.tv_sec -= start->ru_stime.tv_sec;
+ end.ru_stime.tv_usec -= start->ru_stime.tv_usec;
+
+ sqd->work_time += end.ru_stime.tv_usec + end.ru_stime.tv_sec * 1000000;
+}
+
static int io_sq_thread(void *data)
{
+ struct llist_node *retry_list = NULL;
struct io_sq_data *sqd = data;
struct io_ring_ctx *ctx;
+ struct rusage start;
unsigned long timeout = 0;
char buf[TASK_COMM_LEN];
DEFINE_WAIT(wait);
@@ -251,18 +298,21 @@ static int io_sq_thread(void *data)
}
cap_entries = !list_is_singular(&sqd->ctx_list);
+ getrusage(current, RUSAGE_SELF, &start);
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
int ret = __io_sq_thread(ctx, cap_entries);
if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
sqt_spin = true;
}
- if (io_run_task_work())
+ if (io_sq_tw(&retry_list, IORING_TW_CAP_ENTRIES_VALUE))
sqt_spin = true;
if (sqt_spin || !time_after(jiffies, timeout)) {
- if (sqt_spin)
+ if (sqt_spin) {
+ io_sq_update_worktime(sqd, &start);
timeout = jiffies + sqd->sq_thread_idle;
+ }
if (unlikely(need_resched())) {
mutex_unlock(&sqd->lock);
cond_resched();
@@ -273,7 +323,7 @@ static int io_sq_thread(void *data)
}
prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
- if (!io_sqd_events_pending(sqd) && !task_work_pending(current)) {
+ if (!io_sqd_events_pending(sqd) && !io_sq_tw_pending(retry_list)) {
bool needs_sched = true;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
@@ -312,6 +362,9 @@ static int io_sq_thread(void *data)
timeout = jiffies + sqd->sq_thread_idle;
}
+ if (retry_list)
+ io_sq_tw(&retry_list, UINT_MAX);
+
io_uring_cancel_generic(true, sqd);
sqd->thread = NULL;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
index 8df37e8..4171666 100644
--- a/io_uring/sqpoll.h
+++ b/io_uring/sqpoll.h
@@ -16,6 +16,7 @@ struct io_sq_data {
pid_t task_pid;
pid_t task_tgid;
+ u64 work_time;
unsigned long state;
struct completion exited;
};
diff --git a/io_uring/truncate.c b/io_uring/truncate.c
new file mode 100644
index 0000000..62ee73d3
--- /dev/null
+++ b/io_uring/truncate.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/io_uring.h>
+
+#include <uapi/linux/io_uring.h>
+
+#include "../fs/internal.h"
+
+#include "io_uring.h"
+#include "truncate.h"
+
+struct io_ftrunc {
+ struct file *file;
+ loff_t len;
+};
+
+int io_ftruncate_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_ftrunc *ft = io_kiocb_to_cmd(req, struct io_ftrunc);
+
+ if (sqe->rw_flags || sqe->addr || sqe->len || sqe->buf_index ||
+ sqe->splice_fd_in || sqe->addr3)
+ return -EINVAL;
+
+ ft->len = READ_ONCE(sqe->off);
+
+ req->flags |= REQ_F_FORCE_ASYNC;
+ return 0;
+}
+
+int io_ftruncate(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_ftrunc *ft = io_kiocb_to_cmd(req, struct io_ftrunc);
+ int ret;
+
+ WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+
+ ret = do_ftruncate(req->file, ft->len, 1);
+
+ io_req_set_res(req, ret, 0);
+ return IOU_OK;
+}
diff --git a/io_uring/truncate.h b/io_uring/truncate.h
new file mode 100644
index 0000000..ec08829
--- /dev/null
+++ b/io_uring/truncate.h
@@ -0,0 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
+
+int io_ftruncate_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_ftruncate(struct io_kiocb *req, unsigned int issue_flags);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index c33fca5..42f63ad 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -5,6 +5,7 @@
#include <linux/io_uring/cmd.h>
#include <linux/security.h>
#include <linux/nospec.h>
+#include <net/sock.h>
#include <uapi/linux/io_uring.h>
#include <asm/ioctls.h>
diff --git a/io_uring/xattr.c b/io_uring/xattr.c
index e1c810e..44905b8 100644
--- a/io_uring/xattr.c
+++ b/io_uring/xattr.c
@@ -112,7 +112,7 @@ int io_fgetxattr(struct io_kiocb *req, unsigned int issue_flags)
WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
- ret = do_getxattr(mnt_idmap(req->file->f_path.mnt),
+ ret = do_getxattr(file_mnt_idmap(req->file),
req->file->f_path.dentry,
&ix->ctx);
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 8a0bb80..ef82ffc 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -178,7 +178,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
void **frames, int n,
struct xdp_cpumap_stats *stats)
{
- struct xdp_rxq_info rxq;
+ struct xdp_rxq_info rxq = {};
struct xdp_buff xdp;
int i, nframes = 0;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index be72824..d19cd86 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1101,6 +1101,7 @@ struct bpf_hrtimer {
struct bpf_prog *prog;
void __rcu *callback_fn;
void *value;
+ struct rcu_head rcu;
};
/* the actual struct hidden inside uapi struct bpf_timer */
@@ -1332,6 +1333,7 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
if (in_nmi())
return -EOPNOTSUPP;
+ rcu_read_lock();
__bpf_spin_lock_irqsave(&timer->lock);
t = timer->timer;
if (!t) {
@@ -1353,6 +1355,7 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
* if it was running.
*/
ret = ret ?: hrtimer_cancel(&t->timer);
+ rcu_read_unlock();
return ret;
}
@@ -1407,7 +1410,7 @@ void bpf_timer_cancel_and_free(void *val)
*/
if (this_cpu_read(hrtimer_running) != t)
hrtimer_cancel(&t->timer);
- kfree(t);
+ kfree_rcu(t, rcu);
}
BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index e5c3500..ec4e97c 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -978,6 +978,8 @@ __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it,
BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
__alignof__(struct bpf_iter_task));
+ kit->pos = NULL;
+
switch (flags) {
case BPF_TASK_ITER_ALL_THREADS:
case BPF_TASK_ITER_ALL_PROCS:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 65f5986..ddea956 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5227,7 +5227,9 @@ BTF_ID(struct, prog_test_ref_kfunc)
#ifdef CONFIG_CGROUPS
BTF_ID(struct, cgroup)
#endif
+#ifdef CONFIG_BPF_JIT
BTF_ID(struct, bpf_cpumask)
+#endif
BTF_ID(struct, task_struct)
BTF_SET_END(rcu_protected_types)
@@ -16600,6 +16602,9 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
{
int i;
+ if (old->callback_depth > cur->callback_depth)
+ return false;
+
for (i = 0; i < MAX_BPF_REG; i++)
if (!regsafe(env, &old->regs[i], &cur->regs[i],
&env->idmap_scratch, exact))
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ba36c07..927bef3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2562,7 +2562,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
update_partition_sd_lb(cs, old_prs);
out_free:
free_cpumasks(NULL, &tmp);
- return 0;
+ return retval;
}
/**
@@ -2598,9 +2598,6 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
if (cpumask_equal(cs->exclusive_cpus, trialcs->exclusive_cpus))
return 0;
- if (alloc_cpumasks(NULL, &tmp))
- return -ENOMEM;
-
if (*buf)
compute_effective_exclusive_cpumask(trialcs, NULL);
@@ -2615,6 +2612,9 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
if (retval)
return retval;
+ if (alloc_cpumasks(NULL, &tmp))
+ return -ENOMEM;
+
if (old_prs) {
if (cpumask_empty(trialcs->effective_xcpus)) {
invalidate = true;
diff --git a/kernel/exit.c b/kernel/exit.c
index 3988a02..41a1263 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -739,6 +739,13 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
kill_orphaned_pgrp(tsk->group_leader, NULL);
tsk->exit_state = EXIT_ZOMBIE;
+ /*
+ * sub-thread or delay_group_leader(), wake up the
+ * PIDFD_THREAD waiters.
+ */
+ if (!thread_group_empty(tsk))
+ do_notify_pidfd(tsk);
+
if (unlikely(tsk->ptrace)) {
int sig = thread_group_leader(tsk) &&
thread_group_empty(tsk) &&
@@ -1127,17 +1134,14 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
* and nobody can change them.
*
* psig->stats_lock also protects us from our sub-threads
- * which can reap other children at the same time. Until
- * we change k_getrusage()-like users to rely on this lock
- * we have to take ->siglock as well.
+ * which can reap other children at the same time.
*
* We use thread_group_cputime_adjusted() to get times for
* the thread group, which consolidates times for all threads
* in the group including the group leader.
*/
thread_group_cputime_adjusted(p, &tgutime, &tgstime);
- spin_lock_irq(¤t->sighand->siglock);
- write_seqlock(&psig->stats_lock);
+ write_seqlock_irq(&psig->stats_lock);
psig->cutime += tgutime + sig->cutime;
psig->cstime += tgstime + sig->cstime;
psig->cgtime += task_gtime(p) + sig->gtime + sig->cgtime;
@@ -1160,8 +1164,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
- write_sequnlock(&psig->stats_lock);
- spin_unlock_irq(¤t->sighand->siglock);
+ write_sequnlock_irq(&psig->stats_lock);
}
if (wo->wo_rusage)
@@ -1893,30 +1896,6 @@ COMPAT_SYSCALL_DEFINE5(waitid,
}
#endif
-/**
- * thread_group_exited - check that a thread group has exited
- * @pid: tgid of thread group to be checked.
- *
- * Test if the thread group represented by tgid has exited (all
- * threads are zombies, dead or completely gone).
- *
- * Return: true if the thread group has exited. false otherwise.
- */
-bool thread_group_exited(struct pid *pid)
-{
- struct task_struct *task;
- bool exited;
-
- rcu_read_lock();
- task = pid_task(pid, PIDTYPE_PID);
- exited = !task ||
- (READ_ONCE(task->exit_state) && thread_group_empty(task));
- rcu_read_unlock();
-
- return exited;
-}
-EXPORT_SYMBOL(thread_group_exited);
-
/*
* This needs to be __function_aligned as GCC implicitly makes any
* implementation of abort() cold and drops alignment specified by
diff --git a/kernel/fork.c b/kernel/fork.c
index 0d944e9..1af8dfd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -101,6 +101,8 @@
#include <linux/user_events.h>
#include <linux/iommu.h>
#include <linux/rseq.h>
+#include <uapi/linux/pidfd.h>
+#include <linux/pidfs.h>
#include <asm/pgalloc.h>
#include <linux/uaccess.h>
@@ -1985,119 +1987,6 @@ static inline void rcu_copy_process(struct task_struct *p)
#endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
}
-struct pid *pidfd_pid(const struct file *file)
-{
- if (file->f_op == &pidfd_fops)
- return file->private_data;
-
- return ERR_PTR(-EBADF);
-}
-
-static int pidfd_release(struct inode *inode, struct file *file)
-{
- struct pid *pid = file->private_data;
-
- file->private_data = NULL;
- put_pid(pid);
- return 0;
-}
-
-#ifdef CONFIG_PROC_FS
-/**
- * pidfd_show_fdinfo - print information about a pidfd
- * @m: proc fdinfo file
- * @f: file referencing a pidfd
- *
- * Pid:
- * This function will print the pid that a given pidfd refers to in the
- * pid namespace of the procfs instance.
- * If the pid namespace of the process is not a descendant of the pid
- * namespace of the procfs instance 0 will be shown as its pid. This is
- * similar to calling getppid() on a process whose parent is outside of
- * its pid namespace.
- *
- * NSpid:
- * If pid namespaces are supported then this function will also print
- * the pid of a given pidfd refers to for all descendant pid namespaces
- * starting from the current pid namespace of the instance, i.e. the
- * Pid field and the first entry in the NSpid field will be identical.
- * If the pid namespace of the process is not a descendant of the pid
- * namespace of the procfs instance 0 will be shown as its first NSpid
- * entry and no others will be shown.
- * Note that this differs from the Pid and NSpid fields in
- * /proc/<pid>/status where Pid and NSpid are always shown relative to
- * the pid namespace of the procfs instance. The difference becomes
- * obvious when sending around a pidfd between pid namespaces from a
- * different branch of the tree, i.e. where no ancestral relation is
- * present between the pid namespaces:
- * - create two new pid namespaces ns1 and ns2 in the initial pid
- * namespace (also take care to create new mount namespaces in the
- * new pid namespace and mount procfs)
- * - create a process with a pidfd in ns1
- * - send pidfd from ns1 to ns2
- * - read /proc/self/fdinfo/<pidfd> and observe that both Pid and NSpid
- * have exactly one entry, which is 0
- */
-static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
-{
- struct pid *pid = f->private_data;
- struct pid_namespace *ns;
- pid_t nr = -1;
-
- if (likely(pid_has_task(pid, PIDTYPE_PID))) {
- ns = proc_pid_ns(file_inode(m->file)->i_sb);
- nr = pid_nr_ns(pid, ns);
- }
-
- seq_put_decimal_ll(m, "Pid:\t", nr);
-
-#ifdef CONFIG_PID_NS
- seq_put_decimal_ll(m, "\nNSpid:\t", nr);
- if (nr > 0) {
- int i;
-
- /* If nr is non-zero it means that 'pid' is valid and that
- * ns, i.e. the pid namespace associated with the procfs
- * instance, is in the pid namespace hierarchy of pid.
- * Start at one below the already printed level.
- */
- for (i = ns->level + 1; i <= pid->level; i++)
- seq_put_decimal_ll(m, "\t", pid->numbers[i].nr);
- }
-#endif
- seq_putc(m, '\n');
-}
-#endif
-
-/*
- * Poll support for process exit notification.
- */
-static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
-{
- struct pid *pid = file->private_data;
- __poll_t poll_flags = 0;
-
- poll_wait(file, &pid->wait_pidfd, pts);
-
- /*
- * Inform pollers only when the whole thread group exits.
- * If the thread group leader exits before all other threads in the
- * group, then poll(2) should block, similar to the wait(2) family.
- */
- if (thread_group_exited(pid))
- poll_flags = EPOLLIN | EPOLLRDNORM;
-
- return poll_flags;
-}
-
-const struct file_operations pidfd_fops = {
- .release = pidfd_release,
- .poll = pidfd_poll,
-#ifdef CONFIG_PROC_FS
- .show_fdinfo = pidfd_show_fdinfo,
-#endif
-};
-
/**
* __pidfd_prepare - allocate a new pidfd_file and reserve a pidfd
* @pid: the struct pid for which to create a pidfd
@@ -2131,20 +2020,20 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
int pidfd;
struct file *pidfd_file;
- if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC))
- return -EINVAL;
-
- pidfd = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
+ pidfd = get_unused_fd_flags(O_CLOEXEC);
if (pidfd < 0)
return pidfd;
- pidfd_file = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
- flags | O_RDWR | O_CLOEXEC);
+ pidfd_file = pidfs_alloc_file(pid, flags | O_RDWR);
if (IS_ERR(pidfd_file)) {
put_unused_fd(pidfd);
return PTR_ERR(pidfd_file);
}
- get_pid(pid); /* held by pidfd_file now */
+ /*
+ * anon_inode_getfile() ignores everything outside of the
+ * O_ACCMODE | O_NONBLOCK mask, set PIDFD_THREAD manually.
+ */
+ pidfd_file->f_flags |= (flags & PIDFD_THREAD);
*ret = pidfd_file;
return pidfd;
}
@@ -2158,7 +2047,8 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
* Allocate a new file that stashes @pid and reserve a new pidfd number in the
* caller's file descriptor table. The pidfd is reserved but not installed yet.
*
- * The helper verifies that @pid is used as a thread group leader.
+ * The helper verifies that @pid is still in use, without PIDFD_THREAD the
+ * task identified by @pid must be a thread-group leader.
*
* If this function returns successfully the caller is responsible to either
* call fd_install() passing the returned pidfd and pidfd file as arguments in
@@ -2177,7 +2067,9 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
*/
int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
{
- if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
+ bool thread = flags & PIDFD_THREAD;
+
+ if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
return -EINVAL;
return __pidfd_prepare(pid, flags, ret);
@@ -2299,9 +2191,8 @@ __latent_entropy struct task_struct *copy_process(
/*
* - CLONE_DETACHED is blocked so that we can potentially
* reuse it later for CLONE_PIDFD.
- * - CLONE_THREAD is blocked until someone really needs it.
*/
- if (clone_flags & (CLONE_DETACHED | CLONE_THREAD))
+ if (clone_flags & CLONE_DETACHED)
return ERR_PTR(-EINVAL);
}
@@ -2524,8 +2415,10 @@ __latent_entropy struct task_struct *copy_process(
* if the fd table isn't shared).
*/
if (clone_flags & CLONE_PIDFD) {
+ int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+
/* Note that no task has been attached to @pid yet. */
- retval = __pidfd_prepare(pid, O_RDWR | O_CLOEXEC, &pidfile);
+ retval = __pidfd_prepare(pid, flags, &pidfile);
if (retval < 0)
goto bad_fork_free_pid;
pidfd = retval;
@@ -2876,8 +2769,8 @@ pid_t kernel_clone(struct kernel_clone_args *args)
* here has the advantage that we don't need to have a separate helper
* to check for legacy clone().
*/
- if ((args->flags & CLONE_PIDFD) &&
- (args->flags & CLONE_PARENT_SETTID) &&
+ if ((clone_flags & CLONE_PIDFD) &&
+ (clone_flags & CLONE_PARENT_SETTID) &&
(args->pidfd == args->parent_tid))
return -EINVAL;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index d5a0ee4..9d9095e 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1993,7 +1993,7 @@ NOKPROBE_SYMBOL(__kretprobe_find_ret_addr);
unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
struct llist_node **cur)
{
- struct kretprobe_instance *ri = NULL;
+ struct kretprobe_instance *ri;
kprobe_opcode_t *ret;
if (WARN_ON_ONCE(!cur))
@@ -2802,7 +2802,7 @@ static int show_kprobe_addr(struct seq_file *pi, void *v)
{
struct hlist_head *head;
struct kprobe *p, *kp;
- const char *sym = NULL;
+ const char *sym;
unsigned int i = *(loff_t *) v;
unsigned long offset = 0;
char *modname, namebuf[KSYM_NAME_LEN];
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 15781ac..6ec3dee 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -573,7 +573,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
if (proc_ns_file(f.file))
err = validate_ns(&nsset, ns);
else
- err = validate_nsset(&nsset, f.file->private_data);
+ err = validate_nsset(&nsset, pidfd_pid(f.file));
if (!err) {
commit_nsset(&nsset);
perf_event_namespaces(current);
diff --git a/kernel/pid.c b/kernel/pid.c
index b52b108..99a0c5e 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -42,6 +42,7 @@
#include <linux/sched/signal.h>
#include <linux/sched/task.h>
#include <linux/idr.h>
+#include <linux/pidfs.h>
#include <net/sock.h>
#include <uapi/linux/pidfd.h>
@@ -65,6 +66,13 @@ int pid_max = PID_MAX_DEFAULT;
int pid_max_min = RESERVED_PIDS + 1;
int pid_max_max = PID_MAX_LIMIT;
+#ifdef CONFIG_FS_PID
+/*
+ * Pseudo filesystems start inode numbering after one. We use Reserved
+ * PIDs as a natural offset.
+ */
+static u64 pidfs_ino = RESERVED_PIDS;
+#endif
/*
* PID-map pages start out as NULL, they get allocated upon
@@ -272,6 +280,10 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
spin_lock_irq(&pidmap_lock);
if (!(ns->pid_allocated & PIDNS_ADDING))
goto out_unlock;
+#ifdef CONFIG_FS_PID
+ pid->stashed = NULL;
+ pid->ino = ++pidfs_ino;
+#endif
for ( ; upid >= pid->numbers; --upid) {
/* Make the PID visible to find_pid_ns. */
idr_replace(&upid->ns->idr, pid, upid->nr);
@@ -349,6 +361,11 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
hlist_del_rcu(&task->pid_links[type]);
*pid_ptr = new;
+ if (type == PIDTYPE_PID) {
+ WARN_ON_ONCE(pid_has_task(pid, PIDTYPE_PID));
+ wake_up_all(&pid->wait_pidfd);
+ }
+
for (tmp = PIDTYPE_MAX; --tmp >= 0; )
if (pid_has_task(pid, tmp))
return;
@@ -391,8 +408,7 @@ void exchange_tids(struct task_struct *left, struct task_struct *right)
void transfer_pid(struct task_struct *old, struct task_struct *new,
enum pid_type type)
{
- if (type == PIDTYPE_PID)
- new->thread_pid = old->thread_pid;
+ WARN_ON_ONCE(type == PIDTYPE_PID);
hlist_replace_rcu(&old->pid_links[type], &new->pid_links[type]);
}
@@ -552,11 +568,6 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
* Return the task associated with @pidfd. The function takes a reference on
* the returned task. The caller is responsible for releasing that reference.
*
- * Currently, the process identified by @pidfd is always a thread-group leader.
- * This restriction currently exists for all aspects of pidfds including pidfd
- * creation (CLONE_PIDFD cannot be used with CLONE_THREAD) and pidfd polling
- * (only supports thread group leaders).
- *
* Return: On success, the task_struct associated with the pidfd.
* On error, a negative errno number will be returned.
*/
@@ -595,7 +606,7 @@ struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags)
* Return: On success, a cloexec pidfd is returned.
* On error, a negative errno number will be returned.
*/
-int pidfd_create(struct pid *pid, unsigned int flags)
+static int pidfd_create(struct pid *pid, unsigned int flags)
{
int pidfd;
struct file *pidfd_file;
@@ -615,11 +626,8 @@ int pidfd_create(struct pid *pid, unsigned int flags)
* @flags: flags to pass
*
* This creates a new pid file descriptor with the O_CLOEXEC flag set for
- * the process identified by @pid. Currently, the process identified by
- * @pid must be a thread-group leader. This restriction currently exists
- * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
- * be used with CLONE_THREAD) and pidfd polling (only supports thread group
- * leaders).
+ * the task identified by @pid. Without PIDFD_THREAD flag the target task
+ * must be a thread-group leader.
*
* Return: On success, a cloexec pidfd is returned.
* On error, a negative errno number will be returned.
@@ -629,7 +637,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
int fd;
struct pid *p;
- if (flags & ~PIDFD_NONBLOCK)
+ if (flags & ~(PIDFD_NONBLOCK | PIDFD_THREAD))
return -EINVAL;
if (pid <= 0)
@@ -682,7 +690,26 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd)
up_read(&task->signal->exec_update_lock);
- return file ?: ERR_PTR(-EBADF);
+ if (!file) {
+ /*
+ * It is possible that the target thread is exiting; it can be
+ * either:
+ * 1. before exit_signals(), which gives a real fd
+ * 2. before exit_files() takes the task_lock() gives a real fd
+ * 3. after exit_files() releases task_lock(), ->files is NULL;
+ * this has PF_EXITING, since it was set in exit_signals(),
+ * __pidfd_fget() returns EBADF.
+ * In case 3 we get EBADF, but that really means ESRCH, since
+ * the task is currently exiting and has freed its files
+ * struct, so we fix it up.
+ */
+ if (task->flags & PF_EXITING)
+ file = ERR_PTR(-ESRCH);
+ else
+ file = ERR_PTR(-EBADF);
+ }
+
+ return file;
}
static int pidfd_getfd(struct pid *pid, int fd)
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 6053ddd..692f12f 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -222,7 +222,7 @@ int swsusp_swap_in_use(void)
*/
static unsigned short root_swap = 0xffff;
-static struct bdev_handle *hib_resume_bdev_handle;
+static struct file *hib_resume_bdev_file;
struct hib_bio_batch {
atomic_t count;
@@ -276,7 +276,7 @@ static int hib_submit_io(blk_opf_t opf, pgoff_t page_off, void *addr,
struct bio *bio;
int error = 0;
- bio = bio_alloc(hib_resume_bdev_handle->bdev, 1, opf,
+ bio = bio_alloc(file_bdev(hib_resume_bdev_file), 1, opf,
GFP_NOIO | __GFP_HIGH);
bio->bi_iter.bi_sector = page_off * (PAGE_SIZE >> 9);
@@ -357,14 +357,14 @@ static int swsusp_swap_check(void)
return res;
root_swap = res;
- hib_resume_bdev_handle = bdev_open_by_dev(swsusp_resume_device,
+ hib_resume_bdev_file = bdev_file_open_by_dev(swsusp_resume_device,
BLK_OPEN_WRITE, NULL, NULL);
- if (IS_ERR(hib_resume_bdev_handle))
- return PTR_ERR(hib_resume_bdev_handle);
+ if (IS_ERR(hib_resume_bdev_file))
+ return PTR_ERR(hib_resume_bdev_file);
- res = set_blocksize(hib_resume_bdev_handle->bdev, PAGE_SIZE);
+ res = set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
if (res < 0)
- bdev_release(hib_resume_bdev_handle);
+ fput(hib_resume_bdev_file);
return res;
}
@@ -1523,10 +1523,10 @@ int swsusp_check(bool exclusive)
void *holder = exclusive ? &swsusp_holder : NULL;
int error;
- hib_resume_bdev_handle = bdev_open_by_dev(swsusp_resume_device,
+ hib_resume_bdev_file = bdev_file_open_by_dev(swsusp_resume_device,
BLK_OPEN_READ, holder, NULL);
- if (!IS_ERR(hib_resume_bdev_handle)) {
- set_blocksize(hib_resume_bdev_handle->bdev, PAGE_SIZE);
+ if (!IS_ERR(hib_resume_bdev_file)) {
+ set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
clear_page(swsusp_header);
error = hib_submit_io(REQ_OP_READ, swsusp_resume_block,
swsusp_header, NULL);
@@ -1551,11 +1551,11 @@ int swsusp_check(bool exclusive)
put:
if (error)
- bdev_release(hib_resume_bdev_handle);
+ fput(hib_resume_bdev_file);
else
pr_debug("Image signature found, resuming\n");
} else {
- error = PTR_ERR(hib_resume_bdev_handle);
+ error = PTR_ERR(hib_resume_bdev_file);
}
if (error)
@@ -1570,12 +1570,12 @@ int swsusp_check(bool exclusive)
void swsusp_close(void)
{
- if (IS_ERR(hib_resume_bdev_handle)) {
+ if (IS_ERR(hib_resume_bdev_file)) {
pr_debug("Image device not initialised\n");
return;
}
- bdev_release(hib_resume_bdev_handle);
+ fput(hib_resume_bdev_file);
}
/**
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 2ad881d..4e715b9 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -162,6 +162,9 @@
| MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK \
| MEMBARRIER_CMD_GET_REGISTRATIONS)
+static DEFINE_MUTEX(membarrier_ipi_mutex);
+#define SERIALIZE_IPI() guard(mutex)(&membarrier_ipi_mutex)
+
static void ipi_mb(void *info)
{
smp_mb(); /* IPIs should be serializing but paranoid. */
@@ -259,6 +262,7 @@ static int membarrier_global_expedited(void)
if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
return -ENOMEM;
+ SERIALIZE_IPI();
cpus_read_lock();
rcu_read_lock();
for_each_online_cpu(cpu) {
@@ -347,6 +351,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
if (cpu_id < 0 && !zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
return -ENOMEM;
+ SERIALIZE_IPI();
cpus_read_lock();
if (cpu_id >= 0) {
@@ -460,6 +465,7 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
* between threads which are users of @mm has its membarrier state
* updated.
*/
+ SERIALIZE_IPI();
cpus_read_lock();
rcu_read_lock();
for_each_online_cpu(cpu) {
diff --git a/kernel/signal.c b/kernel/signal.c
index c9c57d0..bdca529 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -47,6 +47,7 @@
#include <linux/cgroup.h>
#include <linux/audit.h>
#include <linux/sysctl.h>
+#include <uapi/linux/pidfd.h>
#define CREATE_TRACE_POINTS
#include <trace/events/signal.h>
@@ -1436,7 +1437,8 @@ void lockdep_assert_task_sighand_held(struct task_struct *task)
#endif
/*
- * send signal info to all the members of a group
+ * send signal info to all the members of a thread group or to the
+ * individual thread if type == PIDTYPE_PID.
*/
int group_send_sig_info(int sig, struct kernel_siginfo *info,
struct task_struct *p, enum pid_type type)
@@ -1478,7 +1480,8 @@ int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp)
return ret;
}
-int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
+static int kill_pid_info_type(int sig, struct kernel_siginfo *info,
+ struct pid *pid, enum pid_type type)
{
int error = -ESRCH;
struct task_struct *p;
@@ -1487,11 +1490,10 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
rcu_read_lock();
p = pid_task(pid, PIDTYPE_PID);
if (p)
- error = group_send_sig_info(sig, info, p, PIDTYPE_TGID);
+ error = group_send_sig_info(sig, info, p, type);
rcu_read_unlock();
if (likely(!p || error != -ESRCH))
return error;
-
/*
* The task was unhashed in between, try again. If it
* is dead, pid_task() will return NULL, if we race with
@@ -1500,6 +1502,11 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
}
}
+int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
+{
+ return kill_pid_info_type(sig, info, pid, PIDTYPE_TGID);
+}
+
static int kill_proc_info(int sig, struct kernel_siginfo *info, pid_t pid)
{
int error;
@@ -1898,16 +1905,19 @@ int send_sig_fault_trapno(int sig, int code, void __user *addr, int trapno,
return send_sig_info(info.si_signo, &info, t);
}
-int kill_pgrp(struct pid *pid, int sig, int priv)
+static int kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp)
{
int ret;
-
read_lock(&tasklist_lock);
- ret = __kill_pgrp_info(sig, __si_special(priv), pid);
+ ret = __kill_pgrp_info(sig, info, pgrp);
read_unlock(&tasklist_lock);
-
return ret;
}
+
+int kill_pgrp(struct pid *pid, int sig, int priv)
+{
+ return kill_pgrp_info(sig, __si_special(priv), pid);
+}
EXPORT_SYMBOL(kill_pgrp);
int kill_pid(struct pid *pid, int sig, int priv)
@@ -2019,13 +2029,14 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
return ret;
}
-static void do_notify_pidfd(struct task_struct *task)
+void do_notify_pidfd(struct task_struct *task)
{
- struct pid *pid;
+ struct pid *pid = task_pid(task);
WARN_ON(task->exit_state == 0);
- pid = task_pid(task);
- wake_up_all(&pid->wait_pidfd);
+
+ __wake_up(&pid->wait_pidfd, TASK_NORMAL, 0,
+ poll_to_key(EPOLLIN | EPOLLRDNORM));
}
/*
@@ -2050,9 +2061,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
WARN_ON_ONCE(!tsk->ptrace &&
(tsk->group_leader != tsk || !thread_group_empty(tsk)));
-
- /* Wake up all pidfd waiters */
- do_notify_pidfd(tsk);
+ /*
+ * tsk is a group leader and has no threads, wake up the
+ * non-PIDFD_THREAD waiters.
+ */
+ if (thread_group_empty(tsk))
+ do_notify_pidfd(tsk);
if (sig != SIGCHLD) {
/*
@@ -3789,12 +3803,13 @@ COMPAT_SYSCALL_DEFINE4(rt_sigtimedwait_time32, compat_sigset_t __user *, uthese,
#endif
#endif
-static inline void prepare_kill_siginfo(int sig, struct kernel_siginfo *info)
+static void prepare_kill_siginfo(int sig, struct kernel_siginfo *info,
+ enum pid_type type)
{
clear_siginfo(info);
info->si_signo = sig;
info->si_errno = 0;
- info->si_code = SI_USER;
+ info->si_code = (type == PIDTYPE_PID) ? SI_TKILL : SI_USER;
info->si_pid = task_tgid_vnr(current);
info->si_uid = from_kuid_munged(current_user_ns(), current_uid());
}
@@ -3808,7 +3823,7 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
{
struct kernel_siginfo info;
- prepare_kill_siginfo(sig, &info);
+ prepare_kill_siginfo(sig, &info, PIDTYPE_TGID);
return kill_something_info(sig, &info, pid);
}
@@ -3861,6 +3876,10 @@ static struct pid *pidfd_to_pid(const struct file *file)
return tgid_pidfd_to_pid(file);
}
+#define PIDFD_SEND_SIGNAL_FLAGS \
+ (PIDFD_SIGNAL_THREAD | PIDFD_SIGNAL_THREAD_GROUP | \
+ PIDFD_SIGNAL_PROCESS_GROUP)
+
/**
* sys_pidfd_send_signal - Signal a process through a pidfd
* @pidfd: file descriptor of the process
@@ -3868,14 +3887,10 @@ static struct pid *pidfd_to_pid(const struct file *file)
* @info: signal info
* @flags: future flags
*
- * The syscall currently only signals via PIDTYPE_PID which covers
- * kill(<positive-pid>, <signal>. It does not signal threads or process
- * groups.
- * In order to extend the syscall to threads and process groups the @flags
- * argument should be used. In essence, the @flags argument will determine
- * what is signaled and not the file descriptor itself. Put in other words,
- * grouping is a property of the flags argument not a property of the file
- * descriptor.
+ * Send the signal to the thread group or to the individual thread depending
+ * on PIDFD_THREAD.
+ * In the future extension to @flags may be used to override the default scope
+ * of @pidfd.
*
* Return: 0 on success, negative errno on failure
*/
@@ -3886,9 +3901,14 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
struct fd f;
struct pid *pid;
kernel_siginfo_t kinfo;
+ enum pid_type type;
/* Enforce flags be set to 0 until we add an extension. */
- if (flags)
+ if (flags & ~PIDFD_SEND_SIGNAL_FLAGS)
+ return -EINVAL;
+
+ /* Ensure that only a single signal scope determining flag is set. */
+ if (hweight32(flags & PIDFD_SEND_SIGNAL_FLAGS) > 1)
return -EINVAL;
f = fdget(pidfd);
@@ -3906,6 +3926,25 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
if (!access_pidfd_pidns(pid))
goto err;
+ switch (flags) {
+ case 0:
+ /* Infer scope from the type of pidfd. */
+ if (f.file->f_flags & PIDFD_THREAD)
+ type = PIDTYPE_PID;
+ else
+ type = PIDTYPE_TGID;
+ break;
+ case PIDFD_SIGNAL_THREAD:
+ type = PIDTYPE_PID;
+ break;
+ case PIDFD_SIGNAL_THREAD_GROUP:
+ type = PIDTYPE_TGID;
+ break;
+ case PIDFD_SIGNAL_PROCESS_GROUP:
+ type = PIDTYPE_PGID;
+ break;
+ }
+
if (info) {
ret = copy_siginfo_from_user_any(&kinfo, info);
if (unlikely(ret))
@@ -3917,15 +3956,17 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
/* Only allow sending arbitrary signals to yourself. */
ret = -EPERM;
- if ((task_pid(current) != pid) &&
+ if ((task_pid(current) != pid || type > PIDTYPE_TGID) &&
(kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL))
goto err;
} else {
- prepare_kill_siginfo(sig, &kinfo);
+ prepare_kill_siginfo(sig, &kinfo, type);
}
- ret = kill_pid_info(sig, &kinfo, pid);
-
+ if (type == PIDTYPE_PGID)
+ ret = kill_pgrp_info(sig, &kinfo, pid);
+ else
+ ret = kill_pid_info_type(sig, &kinfo, pid, type);
err:
fdput(f);
return ret;
@@ -3965,12 +4006,7 @@ static int do_tkill(pid_t tgid, pid_t pid, int sig)
{
struct kernel_siginfo info;
- clear_siginfo(&info);
- info.si_signo = sig;
- info.si_errno = 0;
- info.si_code = SI_TKILL;
- info.si_pid = task_tgid_vnr(current);
- info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ prepare_kill_siginfo(sig, &info, PIDTYPE_PID);
return do_send_specific(tgid, pid, sig, &info);
}
diff --git a/kernel/sys.c b/kernel/sys.c
index e219fcf..f8e543f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1785,21 +1785,24 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
struct task_struct *t;
unsigned long flags;
u64 tgutime, tgstime, utime, stime;
- unsigned long maxrss = 0;
+ unsigned long maxrss;
+ struct mm_struct *mm;
struct signal_struct *sig = p->signal;
+ unsigned int seq = 0;
- memset((char *)r, 0, sizeof (*r));
+retry:
+ memset(r, 0, sizeof(*r));
utime = stime = 0;
+ maxrss = 0;
if (who == RUSAGE_THREAD) {
task_cputime_adjusted(current, &utime, &stime);
accumulate_thread_rusage(p, r);
maxrss = sig->maxrss;
- goto out;
+ goto out_thread;
}
- if (!lock_task_sighand(p, &flags))
- return;
+ flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
switch (who) {
case RUSAGE_BOTH:
@@ -1819,9 +1822,6 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
fallthrough;
case RUSAGE_SELF:
- thread_group_cputime_adjusted(p, &tgutime, &tgstime);
- utime += tgutime;
- stime += tgstime;
r->ru_nvcsw += sig->nvcsw;
r->ru_nivcsw += sig->nivcsw;
r->ru_minflt += sig->min_flt;
@@ -1830,28 +1830,42 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
r->ru_oublock += sig->oublock;
if (maxrss < sig->maxrss)
maxrss = sig->maxrss;
+
+ rcu_read_lock();
__for_each_thread(sig, t)
accumulate_thread_rusage(t, r);
+ rcu_read_unlock();
+
break;
default:
BUG();
}
- unlock_task_sighand(p, &flags);
-out:
+ if (need_seqretry(&sig->stats_lock, seq)) {
+ seq = 1;
+ goto retry;
+ }
+ done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
+
+ if (who == RUSAGE_CHILDREN)
+ goto out_children;
+
+ thread_group_cputime_adjusted(p, &tgutime, &tgstime);
+ utime += tgutime;
+ stime += tgstime;
+
+out_thread:
+ mm = get_task_mm(p);
+ if (mm) {
+ setmax_mm_hiwater_rss(&maxrss, mm);
+ mmput(mm);
+ }
+
+out_children:
+ r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
r->ru_utime = ns_to_kernel_old_timeval(utime);
r->ru_stime = ns_to_kernel_old_timeval(stime);
-
- if (who != RUSAGE_CHILDREN) {
- struct mm_struct *mm = get_task_mm(p);
-
- if (mm) {
- setmax_mm_hiwater_rss(&maxrss, mm);
- mmput(mm);
- }
- }
- r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
}
SYSCALL_DEFINE2(getrusage, int, who, struct rusage __user *, ru)
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 7607939..edb0f82 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1085,6 +1085,7 @@ static int enqueue_hrtimer(struct hrtimer *timer,
enum hrtimer_mode mode)
{
debug_activate(timer, mode);
+ WARN_ON_ONCE(!base->cpu_base->online);
base->cpu_base->active_bases |= 1 << base->index;
@@ -2183,6 +2184,7 @@ int hrtimers_prepare_cpu(unsigned int cpu)
cpu_base->softirq_next_timer = NULL;
cpu_base->expires_next = KTIME_MAX;
cpu_base->softirq_expires_next = KTIME_MAX;
+ cpu_base->online = 1;
hrtimer_cpu_base_init_expiry_lock(cpu_base);
return 0;
}
@@ -2250,6 +2252,7 @@ int hrtimers_cpu_dying(unsigned int dying_cpu)
smp_call_function_single(ncpu, retrigger_next_event, NULL, 0);
raw_spin_unlock(&new_base->lock);
+ old_base->online = 0;
raw_spin_unlock(&old_base->lock);
return 0;
diff --git a/kernel/time/time_test.c b/kernel/time/time_test.c
index ca058c8..3e5d422 100644
--- a/kernel/time/time_test.c
+++ b/kernel/time/time_test.c
@@ -73,7 +73,7 @@ static void time64_to_tm_test_date_range(struct kunit *test)
days = div_s64(secs, 86400);
- #define FAIL_MSG "%05ld/%02d/%02d (%2d) : %ld", \
+ #define FAIL_MSG "%05ld/%02d/%02d (%2d) : %lld", \
year, month, mdday, yday, days
KUNIT_ASSERT_EQ_MSG(test, year - 1900, result.tm_year, FAIL_MSG);
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 6cd2a4e..9ff0182 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -189,9 +189,6 @@ static int fprobe_init_rethook(struct fprobe *fp, int num)
{
int size;
- if (num <= 0)
- return -EINVAL;
-
if (!fp->exit_handler) {
fp->rethook = NULL;
return 0;
@@ -199,15 +196,16 @@ static int fprobe_init_rethook(struct fprobe *fp, int num)
/* Initialize rethook if needed */
if (fp->nr_maxactive)
- size = fp->nr_maxactive;
+ num = fp->nr_maxactive;
else
- size = num * num_possible_cpus() * 2;
- if (size <= 0)
+ num *= num_possible_cpus() * 2;
+ if (num <= 0)
return -EINVAL;
+ size = sizeof(struct fprobe_rethook_node) + fp->entry_data_size;
+
/* Initialize rethook */
- fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler,
- sizeof(struct fprobe_rethook_node), size);
+ fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler, size, num);
if (IS_ERR(fp->rethook))
return PTR_ERR(fp->rethook);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index b01ae7d..83ba342 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5325,7 +5325,17 @@ static LIST_HEAD(ftrace_direct_funcs);
static int register_ftrace_function_nolock(struct ftrace_ops *ops);
+/*
+ * If there are multiple ftrace_ops, use SAVE_REGS by default, so that direct
+ * call will be jumped from ftrace_regs_caller. Only if the architecture does
+ * not support ftrace_regs_caller but direct_call, use SAVE_ARGS so that it
+ * jumps from ftrace_caller for multiple ftrace_ops.
+ */
+#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS
#define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS)
+#else
+#define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)
+#endif
static int check_direct_multi(struct ftrace_ops *ops)
{
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fd4bfe3..aa332ac 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -384,7 +384,6 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
- long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -756,8 +755,19 @@ static void rb_wake_up_waiters(struct irq_work *work)
wake_up_all(&rbwork->waiters);
if (rbwork->full_waiters_pending || rbwork->wakeup_full) {
+ /* Only cpu_buffer sets the above flags */
+ struct ring_buffer_per_cpu *cpu_buffer =
+ container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
+
+ /* Called from interrupt context */
+ raw_spin_lock(&cpu_buffer->reader_lock);
rbwork->wakeup_full = false;
rbwork->full_waiters_pending = false;
+
+ /* Waking up all waiters, they will reset the shortest full */
+ cpu_buffer->shortest_full = 0;
+ raw_spin_unlock(&cpu_buffer->reader_lock);
+
wake_up_all(&rbwork->full_waiters);
}
}
@@ -798,14 +808,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
rbwork = &cpu_buffer->irq_work;
}
- rbwork->wait_index++;
- /* make sure the waiters see the new index */
- smp_wmb();
-
/* This can be called in any context */
irq_work_queue(&rbwork->work);
}
+static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ bool ret = false;
+
+ /* Reads of all CPUs always waits for any data */
+ if (cpu == RING_BUFFER_ALL_CPUS)
+ return !ring_buffer_empty(buffer);
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ if (!ring_buffer_empty_cpu(buffer, cpu)) {
+ unsigned long flags;
+ bool pagebusy;
+
+ if (!full)
+ return true;
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+ ret = !pagebusy && full_hit(buffer, cpu, full);
+
+ if (!cpu_buffer->shortest_full ||
+ cpu_buffer->shortest_full > full)
+ cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ }
+ return ret;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -821,7 +857,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
- long wait_index;
int ret = 0;
/*
@@ -840,81 +875,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
- wait_index = READ_ONCE(work->wait_index);
+ if (full)
+ prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ else
+ prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
- while (true) {
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ /*
+ * The events can happen in critical sections where
+ * checking a work queue can cause deadlocks.
+ * After adding a task to the queue, this flag is set
+ * only to notify events to try to wake up the queue
+ * using irq_work.
+ *
+ * We don't clear it even if the buffer is no longer
+ * empty. The flag only causes the next event to run
+ * irq_work to do the work queue wake up. The worse
+ * that can happen if we race with !trace_empty() is that
+ * an event will cause an irq_work to try to wake up
+ * an empty queue.
+ *
+ * There's no reason to protect this flag either, as
+ * the work queue and irq_work logic will do the necessary
+ * synchronization for the wake ups. The only thing
+ * that is necessary is that the wake up happens after
+ * a task has been queued. It's OK for spurious wake ups.
+ */
+ if (full)
+ work->full_waiters_pending = true;
+ else
+ work->waiters_pending = true;
- /*
- * The events can happen in critical sections where
- * checking a work queue can cause deadlocks.
- * After adding a task to the queue, this flag is set
- * only to notify events to try to wake up the queue
- * using irq_work.
- *
- * We don't clear it even if the buffer is no longer
- * empty. The flag only causes the next event to run
- * irq_work to do the work queue wake up. The worse
- * that can happen if we race with !trace_empty() is that
- * an event will cause an irq_work to try to wake up
- * an empty queue.
- *
- * There's no reason to protect this flag either, as
- * the work queue and irq_work logic will do the necessary
- * synchronization for the wake ups. The only thing
- * that is necessary is that the wake up happens after
- * a task has been queued. It's OK for spurious wake ups.
- */
- if (full)
- work->full_waiters_pending = true;
- else
- work->waiters_pending = true;
+ if (rb_watermark_hit(buffer, cpu, full))
+ goto out;
- if (signal_pending(current)) {
- ret = -EINTR;
- break;
- }
-
- if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
- break;
-
- if (cpu != RING_BUFFER_ALL_CPUS &&
- !ring_buffer_empty_cpu(buffer, cpu)) {
- unsigned long flags;
- bool pagebusy;
- bool done;
-
- if (!full)
- break;
-
- raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- done = !pagebusy && full_hit(buffer, cpu, full);
-
- if (!cpu_buffer->shortest_full ||
- cpu_buffer->shortest_full > full)
- cpu_buffer->shortest_full = full;
- raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (done)
- break;
- }
-
- schedule();
-
- /* Make sure to see the new wait index */
- smp_rmb();
- if (wait_index != work->wait_index)
- break;
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ goto out;
}
+ schedule();
+ out:
if (full)
finish_wait(&work->full_waiters, &wait);
else
finish_wait(&work->waiters, &wait);
+ if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ ret = -EINTR;
+
return ret;
}
@@ -937,28 +945,33 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
- struct rb_irq_work *work;
+ struct rb_irq_work *rbwork;
if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
+ rbwork = &buffer->irq_work;
full = 0;
} else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return EPOLLERR;
cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
+ rbwork = &cpu_buffer->irq_work;
}
if (full) {
- poll_wait(filp, &work->full_waiters, poll_table);
- work->full_waiters_pending = true;
+ unsigned long flags;
+
+ poll_wait(filp, &rbwork->full_waiters, poll_table);
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
} else {
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
}
/*
@@ -5877,6 +5890,10 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
if (psize <= BUF_PAGE_HDR_SIZE)
return -EINVAL;
+ /* Size of a subbuf cannot be greater than the write counter */
+ if (psize > RB_WRITE_MASK + 1)
+ return -EINVAL;
+
old_order = buffer->subbuf_order;
old_size = buffer->subbuf_size;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2a7c6fd..c9c8983 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -39,6 +39,7 @@
#include <linux/ctype.h>
#include <linux/init.h>
#include <linux/panic_notifier.h>
+#include <linux/kmemleak.h>
#include <linux/poll.h>
#include <linux/nmi.h>
#include <linux/fs.h>
@@ -1532,7 +1533,7 @@ void disable_trace_on_warning(void)
bool tracer_tracing_is_on(struct trace_array *tr)
{
if (tr->array_buffer.buffer)
- return ring_buffer_record_is_on(tr->array_buffer.buffer);
+ return ring_buffer_record_is_set_on(tr->array_buffer.buffer);
return !tr->buffer_disabled;
}
@@ -2320,7 +2321,7 @@ struct saved_cmdlines_buffer {
unsigned *map_cmdline_to_pid;
unsigned cmdline_num;
int cmdline_idx;
- char *saved_cmdlines;
+ char saved_cmdlines[];
};
static struct saved_cmdlines_buffer *savedcmd;
@@ -2334,47 +2335,60 @@ static inline void set_cmdline(int idx, const char *cmdline)
strncpy(get_saved_cmdlines(idx), cmdline, TASK_COMM_LEN);
}
-static int allocate_cmdlines_buffer(unsigned int val,
- struct saved_cmdlines_buffer *s)
+static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
{
+ int order = get_order(sizeof(*s) + s->cmdline_num * TASK_COMM_LEN);
+
+ kfree(s->map_cmdline_to_pid);
+ kmemleak_free(s);
+ free_pages((unsigned long)s, order);
+}
+
+static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
+{
+ struct saved_cmdlines_buffer *s;
+ struct page *page;
+ int orig_size, size;
+ int order;
+
+ /* Figure out how much is needed to hold the given number of cmdlines */
+ orig_size = sizeof(*s) + val * TASK_COMM_LEN;
+ order = get_order(orig_size);
+ size = 1 << (order + PAGE_SHIFT);
+ page = alloc_pages(GFP_KERNEL, order);
+ if (!page)
+ return NULL;
+
+ s = page_address(page);
+ kmemleak_alloc(s, size, 1, GFP_KERNEL);
+ memset(s, 0, sizeof(*s));
+
+ /* Round up to actual allocation */
+ val = (size - sizeof(*s)) / TASK_COMM_LEN;
+ s->cmdline_num = val;
+
s->map_cmdline_to_pid = kmalloc_array(val,
sizeof(*s->map_cmdline_to_pid),
GFP_KERNEL);
- if (!s->map_cmdline_to_pid)
- return -ENOMEM;
-
- s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);
- if (!s->saved_cmdlines) {
- kfree(s->map_cmdline_to_pid);
- return -ENOMEM;
+ if (!s->map_cmdline_to_pid) {
+ free_saved_cmdlines_buffer(s);
+ return NULL;
}
s->cmdline_idx = 0;
- s->cmdline_num = val;
memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
sizeof(s->map_pid_to_cmdline));
memset(s->map_cmdline_to_pid, NO_CMDLINE_MAP,
val * sizeof(*s->map_cmdline_to_pid));
- return 0;
+ return s;
}
static int trace_create_savedcmd(void)
{
- int ret;
+ savedcmd = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT);
- savedcmd = kmalloc(sizeof(*savedcmd), GFP_KERNEL);
- if (!savedcmd)
- return -ENOMEM;
-
- ret = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT, savedcmd);
- if (ret < 0) {
- kfree(savedcmd);
- savedcmd = NULL;
- return -ENOMEM;
- }
-
- return 0;
+ return savedcmd ? 0 : -ENOMEM;
}
int is_tracing_stopped(void)
@@ -6056,26 +6070,14 @@ tracing_saved_cmdlines_size_read(struct file *filp, char __user *ubuf,
return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
}
-static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
-{
- kfree(s->saved_cmdlines);
- kfree(s->map_cmdline_to_pid);
- kfree(s);
-}
-
static int tracing_resize_saved_cmdlines(unsigned int val)
{
struct saved_cmdlines_buffer *s, *savedcmd_temp;
- s = kmalloc(sizeof(*s), GFP_KERNEL);
+ s = allocate_cmdlines_buffer(val);
if (!s)
return -ENOMEM;
- if (allocate_cmdlines_buffer(val, s) < 0) {
- kfree(s);
- return -ENOMEM;
- }
-
preempt_disable();
arch_spin_lock(&trace_cmdline_lock);
savedcmd_temp = savedcmd;
@@ -7291,6 +7293,8 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
return 0;
}
+#define TRACE_MARKER_MAX_SIZE 4096
+
static ssize_t
tracing_mark_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *fpos)
@@ -7318,6 +7322,9 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
if ((ssize_t)cnt < 0)
return -EINVAL;
+ if (cnt > TRACE_MARKER_MAX_SIZE)
+ cnt = TRACE_MARKER_MAX_SIZE;
+
meta_size = sizeof(*entry) + 2; /* add '\0' and possible '\n' */
again:
size = cnt + meta_size;
@@ -7326,11 +7333,6 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
if (cnt < FAULTED_SIZE)
size += FAULTED_SIZE - cnt;
- if (size > TRACE_SEQ_BUFFER_SIZE) {
- cnt -= size - TRACE_SEQ_BUFFER_SIZE;
- goto again;
- }
-
buffer = tr->array_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
tracing_gen_ctx());
@@ -8391,6 +8393,20 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
return size;
}
+static int tracing_buffers_flush(struct file *file, fl_owner_t id)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ return 0;
+}
+
static int tracing_buffers_release(struct inode *inode, struct file *file)
{
struct ftrace_buffer_info *info = file->private_data;
@@ -8402,12 +8418,6 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
-
- ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
-
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8623,6 +8633,7 @@ static const struct file_operations tracing_buffers_fops = {
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
+ .flush = tracing_buffers_flush,
.splice_read = tracing_buffers_splice_read,
.unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
diff --git a/kernel/trace/trace_btf.c b/kernel/trace/trace_btf.c
index ca224d5..5bbdbcb 100644
--- a/kernel/trace/trace_btf.c
+++ b/kernel/trace/trace_btf.c
@@ -91,8 +91,8 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
for_each_member(i, type, member) {
if (!member->name_off) {
/* Anonymous union/struct: push it for later use */
- type = btf_type_skip_modifiers(btf, member->type, &tid);
- if (type && top < BTF_ANON_STACK_MAX) {
+ if (btf_type_skip_modifiers(btf, member->type, &tid) &&
+ top < BTF_ANON_STACK_MAX) {
anon_stack[top].tid = tid;
anon_stack[top++].offset =
cur_offset + member->offset;
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index e7af286a..c82b401 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -441,8 +441,9 @@ static unsigned int trace_string(struct synth_trace_event *entry,
if (is_dynamic) {
union trace_synth_field *data = &entry->fields[*n_u64];
+ len = fetch_store_strlen((unsigned long)str_val);
data->as_dynamic.offset = struct_size(entry, fields, event->n_u64) + data_size;
- data->as_dynamic.len = fetch_store_strlen((unsigned long)str_val);
+ data->as_dynamic.len = len;
ret = fetch_store_string((unsigned long)str_val, &entry->fields[*n_u64], entry);
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 3e7fa44..d8b302d 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1587,12 +1587,11 @@ static enum print_line_t trace_print_print(struct trace_iterator *iter,
{
struct print_entry *field;
struct trace_seq *s = &iter->seq;
- int max = iter->ent_size - offsetof(struct print_entry, buf);
trace_assign_type(field, iter->ent);
seq_print_ip_sym(s, field->ip, flags);
- trace_seq_printf(s, ": %.*s", max, field->buf);
+ trace_seq_printf(s, ": %s", field->buf);
return trace_handle_return(s);
}
@@ -1601,11 +1600,10 @@ static enum print_line_t trace_print_raw(struct trace_iterator *iter, int flags,
struct trace_event *event)
{
struct print_entry *field;
- int max = iter->ent_size - offsetof(struct print_entry, buf);
trace_assign_type(field, iter->ent);
- trace_seq_printf(&iter->seq, "# %lx %.*s", field->ip, max, field->buf);
+ trace_seq_printf(&iter->seq, "# %lx %s", field->ip, field->buf);
return trace_handle_return(&iter->seq);
}
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 4dc74d7..34289f9 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1159,9 +1159,12 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
if (!(ctx->flags & TPARG_FL_TEVENT) &&
(strcmp(arg, "$comm") == 0 || strcmp(arg, "$COMM") == 0 ||
strncmp(arg, "\\\"", 2) == 0)) {
- /* The type of $comm must be "string", and not an array. */
- if (parg->count || (t && strcmp(t, "string")))
+ /* The type of $comm must be "string", and not an array type. */
+ if (parg->count || (t && strcmp(t, "string"))) {
+ trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0),
+ NEED_STRING_TYPE);
goto out;
+ }
parg->type = find_fetch_type("string", ctx->flags);
} else
parg->type = find_fetch_type(t, ctx->flags);
@@ -1169,18 +1172,6 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0), BAD_TYPE);
goto out;
}
- parg->offset = *size;
- *size += parg->type->size * (parg->count ?: 1);
-
- ret = -ENOMEM;
- if (parg->count) {
- len = strlen(parg->type->fmttype) + 6;
- parg->fmt = kmalloc(len, GFP_KERNEL);
- if (!parg->fmt)
- goto out;
- snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
- parg->count);
- }
code = tmp = kcalloc(FETCH_INSN_MAX, sizeof(*code), GFP_KERNEL);
if (!code)
@@ -1204,6 +1195,19 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
goto fail;
}
}
+ parg->offset = *size;
+ *size += parg->type->size * (parg->count ?: 1);
+
+ if (parg->count) {
+ len = strlen(parg->type->fmttype) + 6;
+ parg->fmt = kmalloc(len, GFP_KERNEL);
+ if (!parg->fmt) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
+ parg->count);
+ }
ret = -EINVAL;
/* Store operation */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 850d9ec..c1877d0 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -515,7 +515,8 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(BAD_HYPHEN, "Failed to parse single hyphen. Forgot '>'?"), \
C(NO_BTF_FIELD, "This field is not found."), \
C(BAD_BTF_TID, "Failed to get BTF type info."),\
- C(BAD_TYPE4STR, "This type does not fit for string."),
+ C(BAD_TYPE4STR, "This type does not fit for string."),\
+ C(NEED_STRING_TYPE, "$comm and immediate-string only accepts string type"),
#undef C
#define C(a, b) TP_ERR_##a
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 76e60fa..7b482a2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5786,13 +5786,9 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
list_for_each_entry(wq, &workqueues, list) {
if (!(wq->flags & WQ_UNBOUND))
continue;
-
/* creating multiple pwqs breaks ordering guarantee */
- if (!list_empty(&wq->pwqs)) {
- if (wq->flags & __WQ_ORDERED_EXPLICIT)
- continue;
- wq->flags &= ~__WQ_ORDERED;
- }
+ if (wq->flags & __WQ_ORDERED)
+ continue;
ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
if (IS_ERR(ctx)) {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 975a07f..f3b50b4 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2235,6 +2235,7 @@
config TEST_IOV_ITER
tristate "Test iov_iter operation" if !KUNIT_ALL_TESTS
depends on KUNIT
+ depends on MMU
default KUNIT_ALL_TESTS
help
Enable this to turn on testing of the operation of the I/O iterator
@@ -2857,28 +2858,6 @@
If unsure, say N.
-config TEST_LIVEPATCH
- tristate "Test livepatching"
- default n
- depends on DYNAMIC_DEBUG
- depends on LIVEPATCH
- depends on m
- help
- Test kernel livepatching features for correctness. The tests will
- load test modules that will be livepatched in various scenarios.
-
- To run all the livepatching tests:
-
- make -C tools/testing/selftests TARGETS=livepatch run_tests
-
- Alternatively, individual tests may be invoked:
-
- tools/testing/selftests/livepatch/test-callbacks.sh
- tools/testing/selftests/livepatch/test-livepatch.sh
- tools/testing/selftests/livepatch/test-shadow-vars.sh
-
- If unsure, say N.
-
config TEST_OBJAGG
tristate "Perform selftest on object aggreration manager"
default n
diff --git a/lib/Makefile b/lib/Makefile
index 6b09731..95ed57f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -134,8 +134,6 @@
obj-$(CONFIG_TEST_FPU) += test_fpu.o
CFLAGS_test_fpu.o += $(FPU_CFLAGS)
-obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/
-
# Some KUnit files (hooks.o) need to be built-in even when KUnit is a module,
# so we can't just use obj-$(CONFIG_KUNIT).
ifdef CONFIG_KUNIT
diff --git a/lib/checksum_kunit.c b/lib/checksum_kunit.c
index 225bb77..bf70850 100644
--- a/lib/checksum_kunit.c
+++ b/lib/checksum_kunit.c
@@ -215,7 +215,7 @@ static const u32 init_sums_no_overflow[] = {
0xffff0000, 0xfffffffb,
};
-static const __sum16 expected_csum_ipv6_magic[] = {
+static const u16 expected_csum_ipv6_magic[] = {
0x18d4, 0x3085, 0x2e4b, 0xd9f4, 0xbdc8, 0x78f, 0x1034, 0x8422, 0x6fc0,
0xd2f6, 0xbeb5, 0x9d3, 0x7e2a, 0x312e, 0x778e, 0xc1bb, 0x7cf2, 0x9d1e,
0xca21, 0xf3ff, 0x7569, 0xb02e, 0xca86, 0x7e76, 0x4539, 0x45e3, 0xf28d,
@@ -241,7 +241,7 @@ static const __sum16 expected_csum_ipv6_magic[] = {
0x3845, 0x1014
};
-static const __sum16 expected_fast_csum[] = {
+static const u16 expected_fast_csum[] = {
0xda83, 0x45da, 0x4f46, 0x4e4f, 0x34e, 0xe902, 0xa5e9, 0x87a5, 0x7187,
0x5671, 0xf556, 0x6df5, 0x816d, 0x8f81, 0xbb8f, 0xfbba, 0x5afb, 0xbe5a,
0xedbe, 0xabee, 0x6aac, 0xe6b, 0xea0d, 0x67ea, 0x7e68, 0x8a7e, 0x6f8a,
@@ -577,7 +577,8 @@ static void test_csum_no_carry_inputs(struct kunit *test)
static void test_ip_fast_csum(struct kunit *test)
{
- __sum16 csum_result, expected;
+ __sum16 csum_result;
+ u16 expected;
for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++) {
for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index++) {
@@ -586,7 +587,7 @@ static void test_ip_fast_csum(struct kunit *test)
expected_fast_csum[(len - IPv4_MIN_WORDS) *
NUM_IP_FAST_CSUM_TESTS +
index];
- CHECK_EQ(expected, csum_result);
+ CHECK_EQ(to_sum16(expected), csum_result);
}
}
}
@@ -598,7 +599,7 @@ static void test_csum_ipv6_magic(struct kunit *test)
const struct in6_addr *daddr;
unsigned int len;
unsigned char proto;
- unsigned int csum;
+ __wsum csum;
const int daddr_offset = sizeof(struct in6_addr);
const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr);
@@ -611,10 +612,10 @@ static void test_csum_ipv6_magic(struct kunit *test)
saddr = (const struct in6_addr *)(random_buf + i);
daddr = (const struct in6_addr *)(random_buf + i +
daddr_offset);
- len = *(unsigned int *)(random_buf + i + len_offset);
+ len = le32_to_cpu(*(__le32 *)(random_buf + i + len_offset));
proto = *(random_buf + i + proto_offset);
- csum = *(unsigned int *)(random_buf + i + csum_offset);
- CHECK_EQ(expected_csum_ipv6_magic[i],
+ csum = *(__wsum *)(random_buf + i + csum_offset);
+ CHECK_EQ(to_sum16(expected_csum_ipv6_magic[i]),
csum_ipv6_magic(saddr, daddr, len, proto, csum));
}
#endif /* !CONFIG_NET */
diff --git a/lib/cmdline_kunit.c b/lib/cmdline_kunit.c
index d4572db..705b827 100644
--- a/lib/cmdline_kunit.c
+++ b/lib/cmdline_kunit.c
@@ -124,7 +124,7 @@ static void cmdline_do_one_range_test(struct kunit *test, const char *in,
n, e[0], r[0]);
p = memchr_inv(&r[1], 0, sizeof(r) - sizeof(r[0]));
- KUNIT_EXPECT_PTR_EQ_MSG(test, p, NULL, "in test %u at %u out of bound", n, p - r);
+ KUNIT_EXPECT_PTR_EQ_MSG(test, p, NULL, "in test %u at %td out of bound", n, p - r);
}
static void cmdline_test_range(struct kunit *test)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index e0aa6b4..4a6a9f4 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -166,7 +166,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter) {
.iter_type = ITER_IOVEC,
- .copy_mc = false,
.nofault = false,
.data_source = direction,
.__iov = iov,
@@ -245,26 +244,8 @@ EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
#endif /* CONFIG_ARCH_HAS_COPY_MC */
static __always_inline
-size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
- size_t len, void *to, void *priv2)
-{
- return copy_mc_to_kernel(to + progress, iter_from, len);
-}
-
-static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
-{
- if (unlikely(i->count < bytes))
- bytes = i->count;
- if (unlikely(!bytes))
- return 0;
- return iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
-}
-
-static __always_inline
size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
- if (unlikely(iov_iter_is_copy_mc(i)))
- return __copy_from_iter_mc(addr, bytes, i);
return iterate_and_advance(i, bytes, addr,
copy_from_user_iter, memcpy_from_iter);
}
@@ -633,7 +614,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter){
.iter_type = ITER_KVEC,
- .copy_mc = false,
.data_source = direction,
.kvec = kvec,
.nr_segs = nr_segs,
@@ -650,7 +630,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
WARN_ON(direction & ~(READ | WRITE));
*i = (struct iov_iter){
.iter_type = ITER_BVEC,
- .copy_mc = false,
.data_source = direction,
.bvec = bvec,
.nr_segs = nr_segs,
@@ -679,7 +658,6 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
BUG_ON(direction & ~1);
*i = (struct iov_iter) {
.iter_type = ITER_XARRAY,
- .copy_mc = false,
.data_source = direction,
.xarray = xarray,
.xarray_start = start,
@@ -703,7 +681,6 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
BUG_ON(direction != READ);
*i = (struct iov_iter){
.iter_type = ITER_DISCARD,
- .copy_mc = false,
.data_source = false,
.count = count,
.iov_offset = 0
@@ -714,12 +691,11 @@ EXPORT_SYMBOL(iov_iter_discard);
static bool iov_iter_aligned_iovec(const struct iov_iter *i, unsigned addr_mask,
unsigned len_mask)
{
+ const struct iovec *iov = iter_iov(i);
size_t size = i->count;
size_t skip = i->iov_offset;
- unsigned k;
- for (k = 0; k < i->nr_segs; k++, skip = 0) {
- const struct iovec *iov = iter_iov(i) + k;
+ do {
size_t len = iov->iov_len - skip;
if (len > size)
@@ -729,34 +705,36 @@ static bool iov_iter_aligned_iovec(const struct iov_iter *i, unsigned addr_mask,
if ((unsigned long)(iov->iov_base + skip) & addr_mask)
return false;
+ iov++;
size -= len;
- if (!size)
- break;
- }
+ skip = 0;
+ } while (size);
+
return true;
}
static bool iov_iter_aligned_bvec(const struct iov_iter *i, unsigned addr_mask,
unsigned len_mask)
{
- size_t size = i->count;
+ const struct bio_vec *bvec = i->bvec;
unsigned skip = i->iov_offset;
- unsigned k;
+ size_t size = i->count;
- for (k = 0; k < i->nr_segs; k++, skip = 0) {
- size_t len = i->bvec[k].bv_len - skip;
+ do {
+ size_t len = bvec->bv_len;
if (len > size)
len = size;
if (len & len_mask)
return false;
- if ((unsigned long)(i->bvec[k].bv_offset + skip) & addr_mask)
+ if ((unsigned long)(bvec->bv_offset + skip) & addr_mask)
return false;
+ bvec++;
size -= len;
- if (!size)
- break;
- }
+ skip = 0;
+ } while (size);
+
return true;
}
@@ -800,13 +778,12 @@ EXPORT_SYMBOL_GPL(iov_iter_is_aligned);
static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
{
+ const struct iovec *iov = iter_iov(i);
unsigned long res = 0;
size_t size = i->count;
size_t skip = i->iov_offset;
- unsigned k;
- for (k = 0; k < i->nr_segs; k++, skip = 0) {
- const struct iovec *iov = iter_iov(i) + k;
+ do {
size_t len = iov->iov_len - skip;
if (len) {
res |= (unsigned long)iov->iov_base + skip;
@@ -814,30 +791,31 @@ static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
len = size;
res |= len;
size -= len;
- if (!size)
- break;
}
- }
+ iov++;
+ skip = 0;
+ } while (size);
return res;
}
static unsigned long iov_iter_alignment_bvec(const struct iov_iter *i)
{
+ const struct bio_vec *bvec = i->bvec;
unsigned res = 0;
size_t size = i->count;
unsigned skip = i->iov_offset;
- unsigned k;
- for (k = 0; k < i->nr_segs; k++, skip = 0) {
- size_t len = i->bvec[k].bv_len - skip;
- res |= (unsigned long)i->bvec[k].bv_offset + skip;
+ do {
+ size_t len = bvec->bv_len - skip;
+ res |= (unsigned long)bvec->bv_offset + skip;
if (len > size)
len = size;
res |= len;
+ bvec++;
size -= len;
- if (!size)
- break;
- }
+ skip = 0;
+ } while (size);
+
return res;
}
@@ -1166,11 +1144,12 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
EXPORT_SYMBOL(dup_iter);
static __noclone int copy_compat_iovec_from_user(struct iovec *iov,
- const struct iovec __user *uvec, unsigned long nr_segs)
+ const struct iovec __user *uvec, u32 nr_segs)
{
const struct compat_iovec __user *uiov =
(const struct compat_iovec __user *)uvec;
- int ret = -EFAULT, i;
+ int ret = -EFAULT;
+ u32 i;
if (!user_access_begin(uiov, nr_segs * sizeof(*uiov)))
return -EFAULT;
diff --git a/lib/kobject.c b/lib/kobject.c
index 59dbcbd..72fa20f 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -74,10 +74,12 @@ static int create_dir(struct kobject *kobj)
if (error)
return error;
- error = sysfs_create_groups(kobj, ktype->default_groups);
- if (error) {
- sysfs_remove_dir(kobj);
- return error;
+ if (ktype) {
+ error = sysfs_create_groups(kobj, ktype->default_groups);
+ if (error) {
+ sysfs_remove_dir(kobj);
+ return error;
+ }
}
/*
@@ -589,7 +591,8 @@ static void __kobject_del(struct kobject *kobj)
sd = kobj->sd;
ktype = get_ktype(kobj);
- sysfs_remove_groups(kobj, ktype->default_groups);
+ if (ktype)
+ sysfs_remove_groups(kobj, ktype->default_groups);
/* send "remove" if the caller did not do it but sent "add" */
if (kobj->state_add_uevent_sent && !kobj->state_remove_uevent_sent) {
@@ -666,6 +669,10 @@ static void kobject_cleanup(struct kobject *kobj)
pr_debug("'%s' (%p): %s, parent %p\n",
kobject_name(kobj), kobj, __func__, kobj->parent);
+ if (t && !t->release)
+ pr_debug("'%s' (%p): does not have a release() function, it is broken and must be fixed. See Documentation/core-api/kobject.rst.\n",
+ kobject_name(kobj), kobj);
+
/* remove from sysfs if the caller did not do it */
if (kobj->state_in_sysfs) {
pr_debug("'%s' (%p): auto cleanup kobject_del\n",
@@ -676,13 +683,10 @@ static void kobject_cleanup(struct kobject *kobj)
parent = NULL;
}
- if (t->release) {
+ if (t && t->release) {
pr_debug("'%s' (%p): calling ktype release\n",
kobject_name(kobj), kobj);
t->release(kobj);
- } else {
- pr_debug("'%s' (%p): does not have a release() function, it is broken and must be fixed. See Documentation/core-api/kobject.rst.\n",
- kobject_name(kobj), kobj);
}
/* free name if we allocated it */
@@ -1056,7 +1060,7 @@ const struct kobj_ns_type_operations *kobj_child_ns_ops(const struct kobject *pa
{
const struct kobj_ns_type_operations *ops = NULL;
- if (parent && parent->ktype->child_ns_type)
+ if (parent && parent->ktype && parent->ktype->child_ns_type)
ops = parent->ktype->child_ns_type(parent);
return ops;
diff --git a/lib/kunit/device-impl.h b/lib/kunit/device-impl.h
index 54bd558..5fcd48f 100644
--- a/lib/kunit/device-impl.h
+++ b/lib/kunit/device-impl.h
@@ -13,5 +13,7 @@
// For internal use only -- registers the kunit_bus.
int kunit_bus_init(void);
+// For internal use only -- unregisters the kunit_bus.
+void kunit_bus_shutdown(void);
#endif //_KUNIT_DEVICE_IMPL_H
diff --git a/lib/kunit/device.c b/lib/kunit/device.c
index 074c6dd..abc6037 100644
--- a/lib/kunit/device.c
+++ b/lib/kunit/device.c
@@ -10,6 +10,7 @@
*/
#include <linux/device.h>
+#include <linux/dma-mapping.h>
#include <kunit/test.h>
#include <kunit/device.h>
@@ -35,7 +36,7 @@ struct kunit_device {
#define to_kunit_device(d) container_of_const(d, struct kunit_device, dev)
-static struct bus_type kunit_bus_type = {
+static const struct bus_type kunit_bus_type = {
.name = "kunit",
};
@@ -54,6 +55,20 @@ int kunit_bus_init(void)
return error;
}
+/* Unregister the 'kunit_bus' in case the KUnit module is unloaded. */
+void kunit_bus_shutdown(void)
+{
+ /* Make sure the bus exists before we unregister it. */
+ if (IS_ERR_OR_NULL(kunit_bus_device))
+ return;
+
+ bus_unregister(&kunit_bus_type);
+
+ root_device_unregister(kunit_bus_device);
+
+ kunit_bus_device = NULL;
+}
+
/* Release a 'fake' KUnit device. */
static void kunit_device_release(struct device *d)
{
@@ -119,6 +134,9 @@ static struct kunit_device *kunit_device_register_internal(struct kunit *test,
return ERR_PTR(err);
}
+ kunit_dev->dev.dma_mask = &kunit_dev->dev.coherent_dma_mask;
+ kunit_dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+
kunit_add_action(test, device_unregister_wrapper, &kunit_dev->dev);
return kunit_dev;
diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
index 689fff2..70b9a43 100644
--- a/lib/kunit/executor.c
+++ b/lib/kunit/executor.c
@@ -33,13 +33,13 @@ static char *filter_glob_param;
static char *filter_param;
static char *filter_action_param;
-module_param_named(filter_glob, filter_glob_param, charp, 0400);
+module_param_named(filter_glob, filter_glob_param, charp, 0600);
MODULE_PARM_DESC(filter_glob,
"Filter which KUnit test suites/tests run at boot-time, e.g. list* or list*.*del_test");
-module_param_named(filter, filter_param, charp, 0400);
+module_param_named(filter, filter_param, charp, 0600);
MODULE_PARM_DESC(filter,
"Filter which KUnit test suites/tests run at boot-time using attributes, e.g. speed>slow");
-module_param_named(filter_action, filter_action_param, charp, 0400);
+module_param_named(filter_action, filter_action_param, charp, 0600);
MODULE_PARM_DESC(filter_action,
"Changes behavior of filtered tests using attributes, valid values are:\n"
"<none>: do not run filtered tests as normal\n"
diff --git a/lib/kunit/executor_test.c b/lib/kunit/executor_test.c
index 22d4ee8..3f7f967 100644
--- a/lib/kunit/executor_test.c
+++ b/lib/kunit/executor_test.c
@@ -129,7 +129,7 @@ static void parse_filter_attr_test(struct kunit *test)
GFP_KERNEL);
for (j = 0; j < filter_count; j++) {
parsed_filters[j] = kunit_next_attr_filter(&filter, &err);
- KUNIT_ASSERT_EQ_MSG(test, err, 0, "failed to parse filter '%s'", filters[j]);
+ KUNIT_ASSERT_EQ_MSG(test, err, 0, "failed to parse filter from '%s'", filters);
}
KUNIT_EXPECT_STREQ(test, kunit_attr_filter_name(parsed_filters[0]), "speed");
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 31a5a992e..1d14755 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -928,6 +928,9 @@ static void __exit kunit_exit(void)
#ifdef CONFIG_MODULES
unregister_module_notifier(&kunit_mod_nb);
#endif
+
+ kunit_bus_shutdown();
+
kunit_debugfs_cleanup();
}
module_exit(kunit_exit);
diff --git a/lib/livepatch/Makefile b/lib/livepatch/Makefile
deleted file mode 100644
index dcc912b..0000000
--- a/lib/livepatch/Makefile
+++ /dev/null
@@ -1,14 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-#
-# Makefile for livepatch test code.
-
-obj-$(CONFIG_TEST_LIVEPATCH) += test_klp_atomic_replace.o \
- test_klp_callbacks_demo.o \
- test_klp_callbacks_demo2.o \
- test_klp_callbacks_busy.o \
- test_klp_callbacks_mod.o \
- test_klp_livepatch.o \
- test_klp_shadow_vars.o \
- test_klp_state.o \
- test_klp_state2.o \
- test_klp_state3.o
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 6f241bb..af09702 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4290,6 +4290,56 @@ static inline void *mas_insert(struct ma_state *mas, void *entry)
}
+/**
+ * mas_alloc_cyclic() - Internal call to find somewhere to store an entry
+ * @mas: The maple state.
+ * @startp: Pointer to ID.
+ * @range_lo: Lower bound of range to search.
+ * @range_hi: Upper bound of range to search.
+ * @entry: The entry to store.
+ * @next: Pointer to next ID to allocate.
+ * @gfp: The GFP_FLAGS to use for allocations.
+ *
+ * Return: 0 if the allocation succeeded without wrapping, 1 if the
+ * allocation succeeded after wrapping, or -EBUSY if there are no
+ * free entries.
+ */
+int mas_alloc_cyclic(struct ma_state *mas, unsigned long *startp,
+ void *entry, unsigned long range_lo, unsigned long range_hi,
+ unsigned long *next, gfp_t gfp)
+{
+ unsigned long min = range_lo;
+ int ret = 0;
+
+ range_lo = max(min, *next);
+ ret = mas_empty_area(mas, range_lo, range_hi, 1);
+ if ((mas->tree->ma_flags & MT_FLAGS_ALLOC_WRAPPED) && ret == 0) {
+ mas->tree->ma_flags &= ~MT_FLAGS_ALLOC_WRAPPED;
+ ret = 1;
+ }
+ if (ret < 0 && range_lo > min) {
+ ret = mas_empty_area(mas, min, range_hi, 1);
+ if (ret == 0)
+ ret = 1;
+ }
+ if (ret < 0)
+ return ret;
+
+ do {
+ mas_insert(mas, entry);
+ } while (mas_nomem(mas, gfp));
+ if (mas_is_err(mas))
+ return xa_err(mas->node);
+
+ *startp = mas->index;
+ *next = *startp + 1;
+ if (*next == 0)
+ mas->tree->ma_flags |= MT_FLAGS_ALLOC_WRAPPED;
+
+ return ret;
+}
+EXPORT_SYMBOL(mas_alloc_cyclic);
+
static __always_inline void mas_rewalk(struct ma_state *mas, unsigned long index)
{
retry:
@@ -6443,6 +6493,49 @@ int mtree_alloc_range(struct maple_tree *mt, unsigned long *startp,
}
EXPORT_SYMBOL(mtree_alloc_range);
+/**
+ * mtree_alloc_cyclic() - Find somewhere to store this entry in the tree.
+ * @mt: The maple tree.
+ * @startp: Pointer to ID.
+ * @range_lo: Lower bound of range to search.
+ * @range_hi: Upper bound of range to search.
+ * @entry: The entry to store.
+ * @next: Pointer to next ID to allocate.
+ * @gfp: The GFP_FLAGS to use for allocations.
+ *
+ * Finds an empty entry in @mt after @next, stores the new index into
+ * the @id pointer, stores the entry at that index, then updates @next.
+ *
+ * @mt must be initialized with the MT_FLAGS_ALLOC_RANGE flag.
+ *
+ * Context: Any context. Takes and releases the mt.lock. May sleep if
+ * the @gfp flags permit.
+ *
+ * Return: 0 if the allocation succeeded without wrapping, 1 if the
+ * allocation succeeded after wrapping, -ENOMEM if memory could not be
+ * allocated, -EINVAL if @mt cannot be used, or -EBUSY if there are no
+ * free entries.
+ */
+int mtree_alloc_cyclic(struct maple_tree *mt, unsigned long *startp,
+ void *entry, unsigned long range_lo, unsigned long range_hi,
+ unsigned long *next, gfp_t gfp)
+{
+ int ret;
+
+ MA_STATE(mas, mt, 0, 0);
+
+ if (!mt_is_alloc(mt))
+ return -EINVAL;
+ if (WARN_ON_ONCE(mt_is_reserved(entry)))
+ return -EINVAL;
+ mtree_lock(mt);
+ ret = mas_alloc_cyclic(&mas, startp, entry, range_lo, range_hi,
+ next, gfp);
+ mtree_unlock(mt);
+ return ret;
+}
+EXPORT_SYMBOL(mtree_alloc_cyclic);
+
int mtree_alloc_rrange(struct maple_tree *mt, unsigned long *startp,
void *entry, unsigned long size, unsigned long min,
unsigned long max, gfp_t gfp)
diff --git a/lib/memcpy_kunit.c b/lib/memcpy_kunit.c
index 440aee7..30e00ef 100644
--- a/lib/memcpy_kunit.c
+++ b/lib/memcpy_kunit.c
@@ -32,7 +32,7 @@ struct some_bytes {
BUILD_BUG_ON(sizeof(instance.data) != 32); \
for (size_t i = 0; i < sizeof(instance.data); i++) { \
KUNIT_ASSERT_EQ_MSG(test, instance.data[i], v, \
- "line %d: '%s' not initialized to 0x%02x @ %d (saw 0x%02x)\n", \
+ "line %d: '%s' not initialized to 0x%02x @ %zu (saw 0x%02x)\n", \
__LINE__, #instance, v, i, instance.data[i]); \
} \
} while (0)
@@ -41,7 +41,7 @@ struct some_bytes {
BUILD_BUG_ON(sizeof(one) != sizeof(two)); \
for (size_t i = 0; i < sizeof(one); i++) { \
KUNIT_EXPECT_EQ_MSG(test, one.data[i], two.data[i], \
- "line %d: %s.data[%d] (0x%02x) != %s.data[%d] (0x%02x)\n", \
+ "line %d: %s.data[%zu] (0x%02x) != %s.data[%zu] (0x%02x)\n", \
__LINE__, #one, i, one.data[i], #two, i, two.data[i]); \
} \
kunit_info(test, "ok: " TEST_OP "() " name "\n"); \
diff --git a/lib/nlattr.c b/lib/nlattr.c
index ed2ab43..be9c576 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -30,6 +30,8 @@ static const u8 nla_attr_len[NLA_TYPE_MAX+1] = {
[NLA_S16] = sizeof(s16),
[NLA_S32] = sizeof(s32),
[NLA_S64] = sizeof(s64),
+ [NLA_BE16] = sizeof(__be16),
+ [NLA_BE32] = sizeof(__be32),
};
static const u8 nla_attr_minlen[NLA_TYPE_MAX+1] = {
@@ -43,6 +45,8 @@ static const u8 nla_attr_minlen[NLA_TYPE_MAX+1] = {
[NLA_S16] = sizeof(s16),
[NLA_S32] = sizeof(s32),
[NLA_S64] = sizeof(s64),
+ [NLA_BE16] = sizeof(__be16),
+ [NLA_BE32] = sizeof(__be32),
};
/*
diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 010c730..f3f3436 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -13,16 +13,26 @@
* seq_buf_init() more than once to reset the seq_buf to start
* from scratch.
*/
-#include <linux/uaccess.h>
-#include <linux/seq_file.h>
+
+#include <linux/bug.h>
+#include <linux/err.h>
+#include <linux/export.h>
+#include <linux/hex.h>
+#include <linux/minmax.h>
+#include <linux/printk.h>
#include <linux/seq_buf.h>
+#include <linux/seq_file.h>
+#include <linux/sprintf.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
/**
* seq_buf_can_fit - can the new data fit in the current buffer?
* @s: the seq_buf descriptor
* @len: The length to see if it can fit in the current buffer
*
- * Returns true if there's enough unused space in the seq_buf buffer
+ * Returns: true if there's enough unused space in the seq_buf buffer
* to fit the amount of new data according to @len.
*/
static bool seq_buf_can_fit(struct seq_buf *s, size_t len)
@@ -35,7 +45,7 @@ static bool seq_buf_can_fit(struct seq_buf *s, size_t len)
* @m: the seq_file descriptor that is the destination
* @s: the seq_buf descriptor that is the source.
*
- * Returns zero on success, non zero otherwise
+ * Returns: zero on success, non-zero otherwise.
*/
int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s)
{
@@ -50,9 +60,9 @@ int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s)
* @fmt: printf format string
* @args: va_list of arguments from a printf() type function
*
- * Writes a vnprintf() format into the sequencce buffer.
+ * Writes a vnprintf() format into the sequence buffer.
*
- * Returns zero on success, -1 on overflow.
+ * Returns: zero on success, -1 on overflow.
*/
int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)
{
@@ -78,7 +88,7 @@ int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)
*
* Writes a printf() format into the sequence buffer.
*
- * Returns zero on success, -1 on overflow.
+ * Returns: zero on success, -1 on overflow.
*/
int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
{
@@ -94,12 +104,12 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
EXPORT_SYMBOL_GPL(seq_buf_printf);
/**
- * seq_buf_do_printk - printk seq_buf line by line
+ * seq_buf_do_printk - printk() seq_buf line by line
* @s: seq_buf descriptor
* @lvl: printk level
*
* printk()-s a multi-line sequential buffer line by line. The function
- * makes sure that the buffer in @s is nul terminated and safe to read
+ * makes sure that the buffer in @s is NUL-terminated and safe to read
* as a string.
*/
void seq_buf_do_printk(struct seq_buf *s, const char *lvl)
@@ -139,7 +149,7 @@ EXPORT_SYMBOL_GPL(seq_buf_do_printk);
* This function will take the format and the binary array and finish
* the conversion into the ASCII string within the buffer.
*
- * Returns zero on success, -1 on overflow.
+ * Returns: zero on success, -1 on overflow.
*/
int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
{
@@ -167,7 +177,7 @@ int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
*
* Copy a simple string into the sequence buffer.
*
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
*/
int seq_buf_puts(struct seq_buf *s, const char *str)
{
@@ -196,7 +206,7 @@ EXPORT_SYMBOL_GPL(seq_buf_puts);
*
* Copy a single character into the sequence buffer.
*
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
*/
int seq_buf_putc(struct seq_buf *s, unsigned char c)
{
@@ -212,7 +222,7 @@ int seq_buf_putc(struct seq_buf *s, unsigned char c)
EXPORT_SYMBOL_GPL(seq_buf_putc);
/**
- * seq_buf_putmem - write raw data into the sequenc buffer
+ * seq_buf_putmem - write raw data into the sequence buffer
* @s: seq_buf descriptor
* @mem: The raw memory to copy into the buffer
* @len: The length of the raw memory to copy (in bytes)
@@ -221,7 +231,7 @@ EXPORT_SYMBOL_GPL(seq_buf_putc);
* buffer and a strcpy() would not work. Using this function allows
* for such cases.
*
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
*/
int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
{
@@ -249,7 +259,7 @@ int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
* raw memory into the buffer it writes its ASCII representation of it
* in hex characters.
*
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
*/
int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
unsigned int len)
@@ -297,7 +307,7 @@ int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
*
* Write a path name into the sequence buffer.
*
- * Returns the number of written bytes on success, -1 on overflow
+ * Returns: the number of written bytes on success, -1 on overflow.
*/
int seq_buf_path(struct seq_buf *s, const struct path *path, const char *esc)
{
@@ -332,6 +342,7 @@ int seq_buf_path(struct seq_buf *s, const struct path *path, const char *esc)
* or until it reaches the end of the content in the buffer (@s->len),
* whichever comes first.
*
+ * Returns:
* On success, it returns a positive number of the number of bytes
* it copied.
*
@@ -382,11 +393,11 @@ int seq_buf_to_user(struct seq_buf *s, char __user *ubuf, size_t start, int cnt)
* linebuf size is maximal length for one line.
* 32 * 3 - maximum bytes per line, each printed into 2 chars + 1 for
* separating space
- * 2 - spaces separating hex dump and ascii representation
- * 32 - ascii representation
+ * 2 - spaces separating hex dump and ASCII representation
+ * 32 - ASCII representation
* 1 - terminating '\0'
*
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
*/
int seq_buf_hex_dump(struct seq_buf *s, const char *prefix_str, int prefix_type,
int rowsize, int groupsize,
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 5caa1f5..4a7055a 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -22,6 +22,7 @@
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/mutex.h>
+#include <linux/poison.h>
#include <linux/printk.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
@@ -43,17 +44,7 @@
#define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
STACK_DEPOT_EXTRA_BITS)
-#if IS_ENABLED(CONFIG_KMSAN) && CONFIG_STACKDEPOT_MAX_FRAMES >= 32
-/*
- * KMSAN is frequently used in fuzzing scenarios and thus saves a lot of stack
- * traces. As KMSAN does not support evicting stack traces from the stack
- * depot, the stack depot capacity might be reached quickly with large stack
- * records. Adjust the maximum number of stack depot pools for this case.
- */
-#define DEPOT_POOLS_CAP (8192 * (CONFIG_STACKDEPOT_MAX_FRAMES / 16))
-#else
#define DEPOT_POOLS_CAP 8192
-#endif
#define DEPOT_MAX_POOLS \
(((1LL << (DEPOT_POOL_INDEX_BITS)) < DEPOT_POOLS_CAP) ? \
(1LL << (DEPOT_POOL_INDEX_BITS)) : DEPOT_POOLS_CAP)
@@ -93,9 +84,6 @@ struct stack_record {
};
};
-#define DEPOT_STACK_RECORD_SIZE \
- ALIGN(sizeof(struct stack_record), 1 << DEPOT_STACK_ALIGN)
-
static bool stack_depot_disabled;
static bool __stack_depot_early_init_requested __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
static bool __stack_depot_early_init_passed __initdata;
@@ -121,32 +109,31 @@ static void *stack_pools[DEPOT_MAX_POOLS];
static void *new_pool;
/* Number of pools in stack_pools. */
static int pools_num;
+/* Offset to the unused space in the currently used pool. */
+static size_t pool_offset = DEPOT_POOL_SIZE;
/* Freelist of stack records within stack_pools. */
static LIST_HEAD(free_stacks);
-/*
- * Stack depot tries to keep an extra pool allocated even before it runs out
- * of space in the currently used pool. This flag marks whether this extra pool
- * needs to be allocated. It has the value 0 when either an extra pool is not
- * yet allocated or if the limit on the number of pools is reached.
- */
-static bool new_pool_required = true;
/* The lock must be held when performing pool or freelist modifications. */
static DEFINE_RAW_SPINLOCK(pool_lock);
/* Statistics counters for debugfs. */
enum depot_counter_id {
- DEPOT_COUNTER_ALLOCS,
- DEPOT_COUNTER_FREES,
- DEPOT_COUNTER_INUSE,
+ DEPOT_COUNTER_REFD_ALLOCS,
+ DEPOT_COUNTER_REFD_FREES,
+ DEPOT_COUNTER_REFD_INUSE,
DEPOT_COUNTER_FREELIST_SIZE,
+ DEPOT_COUNTER_PERSIST_COUNT,
+ DEPOT_COUNTER_PERSIST_BYTES,
DEPOT_COUNTER_COUNT,
};
static long counters[DEPOT_COUNTER_COUNT];
static const char *const counter_names[] = {
- [DEPOT_COUNTER_ALLOCS] = "allocations",
- [DEPOT_COUNTER_FREES] = "frees",
- [DEPOT_COUNTER_INUSE] = "in_use",
+ [DEPOT_COUNTER_REFD_ALLOCS] = "refcounted_allocations",
+ [DEPOT_COUNTER_REFD_FREES] = "refcounted_frees",
+ [DEPOT_COUNTER_REFD_INUSE] = "refcounted_in_use",
[DEPOT_COUNTER_FREELIST_SIZE] = "freelist_size",
+ [DEPOT_COUNTER_PERSIST_COUNT] = "persistent_count",
+ [DEPOT_COUNTER_PERSIST_BYTES] = "persistent_bytes",
};
static_assert(ARRAY_SIZE(counter_names) == DEPOT_COUNTER_COUNT);
@@ -294,48 +281,52 @@ int stack_depot_init(void)
EXPORT_SYMBOL_GPL(stack_depot_init);
/*
- * Initializes new stack depot @pool, release all its entries to the freelist,
- * and update the list of pools.
+ * Initializes new stack pool, and updates the list of pools.
*/
-static void depot_init_pool(void *pool)
+static bool depot_init_pool(void **prealloc)
{
- int offset;
-
lockdep_assert_held(&pool_lock);
- /* Initialize handles and link stack records into the freelist. */
- for (offset = 0; offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
- offset += DEPOT_STACK_RECORD_SIZE) {
- struct stack_record *stack = pool + offset;
-
- stack->handle.pool_index = pools_num;
- stack->handle.offset = offset >> DEPOT_STACK_ALIGN;
- stack->handle.extra = 0;
-
- /*
- * Stack traces of size 0 are never saved, and we can simply use
- * the size field as an indicator if this is a new unused stack
- * record in the freelist.
- */
- stack->size = 0;
-
- INIT_LIST_HEAD(&stack->hash_list);
- /*
- * Add to the freelist front to prioritize never-used entries:
- * required in case there are entries in the freelist, but their
- * RCU cookie still belongs to the current RCU grace period
- * (there can still be concurrent readers).
- */
- list_add(&stack->free_list, &free_stacks);
- counters[DEPOT_COUNTER_FREELIST_SIZE]++;
+ if (unlikely(pools_num >= DEPOT_MAX_POOLS)) {
+ /* Bail out if we reached the pool limit. */
+ WARN_ON_ONCE(pools_num > DEPOT_MAX_POOLS); /* should never happen */
+ WARN_ON_ONCE(!new_pool); /* to avoid unnecessary pre-allocation */
+ WARN_ONCE(1, "Stack depot reached limit capacity");
+ return false;
}
+ if (!new_pool && *prealloc) {
+ /* We have preallocated memory, use it. */
+ WRITE_ONCE(new_pool, *prealloc);
+ *prealloc = NULL;
+ }
+
+ if (!new_pool)
+ return false; /* new_pool and *prealloc are NULL */
+
/* Save reference to the pool to be used by depot_fetch_stack(). */
- stack_pools[pools_num] = pool;
+ stack_pools[pools_num] = new_pool;
+
+ /*
+ * Stack depot tries to keep an extra pool allocated even before it runs
+ * out of space in the currently used pool.
+ *
+ * To indicate that a new preallocation is needed new_pool is reset to
+ * NULL; do not reset to NULL if we have reached the maximum number of
+ * pools.
+ */
+ if (pools_num < DEPOT_MAX_POOLS)
+ WRITE_ONCE(new_pool, NULL);
+ else
+ WRITE_ONCE(new_pool, STACK_DEPOT_POISON);
/* Pairs with concurrent READ_ONCE() in depot_fetch_stack(). */
WRITE_ONCE(pools_num, pools_num + 1);
ASSERT_EXCLUSIVE_WRITER(pools_num);
+
+ pool_offset = 0;
+
+ return true;
}
/* Keeps the preallocated memory to be used for a new stack depot pool. */
@@ -347,63 +338,51 @@ static void depot_keep_new_pool(void **prealloc)
* If a new pool is already saved or the maximum number of
* pools is reached, do not use the preallocated memory.
*/
- if (!new_pool_required)
+ if (new_pool)
return;
- /*
- * Use the preallocated memory for the new pool
- * as long as we do not exceed the maximum number of pools.
- */
- if (pools_num < DEPOT_MAX_POOLS) {
- new_pool = *prealloc;
- *prealloc = NULL;
- }
-
- /*
- * At this point, either a new pool is kept or the maximum
- * number of pools is reached. In either case, take note that
- * keeping another pool is not required.
- */
- WRITE_ONCE(new_pool_required, false);
+ WRITE_ONCE(new_pool, *prealloc);
+ *prealloc = NULL;
}
/*
- * Try to initialize a new stack depot pool from either a previous or the
- * current pre-allocation, and release all its entries to the freelist.
+ * Try to initialize a new stack record from the current pool, a cached pool, or
+ * the current pre-allocation.
*/
-static bool depot_try_init_pool(void **prealloc)
+static struct stack_record *depot_pop_free_pool(void **prealloc, size_t size)
{
+ struct stack_record *stack;
+ void *current_pool;
+ u32 pool_index;
+
lockdep_assert_held(&pool_lock);
- /* Check if we have a new pool saved and use it. */
- if (new_pool) {
- depot_init_pool(new_pool);
- new_pool = NULL;
-
- /* Take note that we might need a new new_pool. */
- if (pools_num < DEPOT_MAX_POOLS)
- WRITE_ONCE(new_pool_required, true);
-
- return true;
+ if (pool_offset + size > DEPOT_POOL_SIZE) {
+ if (!depot_init_pool(prealloc))
+ return NULL;
}
- /* Bail out if we reached the pool limit. */
- if (unlikely(pools_num >= DEPOT_MAX_POOLS)) {
- WARN_ONCE(1, "Stack depot reached limit capacity");
- return false;
- }
+ if (WARN_ON_ONCE(pools_num < 1))
+ return NULL;
+ pool_index = pools_num - 1;
+ current_pool = stack_pools[pool_index];
+ if (WARN_ON_ONCE(!current_pool))
+ return NULL;
- /* Check if we have preallocated memory and use it. */
- if (*prealloc) {
- depot_init_pool(*prealloc);
- *prealloc = NULL;
- return true;
- }
+ stack = current_pool + pool_offset;
- return false;
+ /* Pre-initialize handle once. */
+ stack->handle.pool_index = pool_index;
+ stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
+ stack->handle.extra = 0;
+ INIT_LIST_HEAD(&stack->hash_list);
+
+ pool_offset += size;
+
+ return stack;
}
-/* Try to find next free usable entry. */
+/* Try to find next free usable entry from the freelist. */
static struct stack_record *depot_pop_free(void)
{
struct stack_record *stack;
@@ -420,7 +399,7 @@ static struct stack_record *depot_pop_free(void)
* check the first entry.
*/
stack = list_first_entry(&free_stacks, struct stack_record, free_list);
- if (stack->size && !poll_state_synchronize_rcu(stack->rcu_state))
+ if (!poll_state_synchronize_rcu(stack->rcu_state))
return NULL;
list_del(&stack->free_list);
@@ -429,48 +408,73 @@ static struct stack_record *depot_pop_free(void)
return stack;
}
+static inline size_t depot_stack_record_size(struct stack_record *s, unsigned int nr_entries)
+{
+ const size_t used = flex_array_size(s, entries, nr_entries);
+ const size_t unused = sizeof(s->entries) - used;
+
+ WARN_ON_ONCE(sizeof(s->entries) < used);
+
+ return ALIGN(sizeof(struct stack_record) - unused, 1 << DEPOT_STACK_ALIGN);
+}
+
/* Allocates a new stack in a stack depot pool. */
static struct stack_record *
-depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
+depot_alloc_stack(unsigned long *entries, unsigned int nr_entries, u32 hash, depot_flags_t flags, void **prealloc)
{
- struct stack_record *stack;
+ struct stack_record *stack = NULL;
+ size_t record_size;
lockdep_assert_held(&pool_lock);
/* This should already be checked by public API entry points. */
- if (WARN_ON_ONCE(!size))
+ if (WARN_ON_ONCE(!nr_entries))
return NULL;
- /* Check if we have a stack record to save the stack trace. */
- stack = depot_pop_free();
- if (!stack) {
- /* No usable entries on the freelist - try to refill the freelist. */
- if (!depot_try_init_pool(prealloc))
- return NULL;
+ /* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
+ if (nr_entries > CONFIG_STACKDEPOT_MAX_FRAMES)
+ nr_entries = CONFIG_STACKDEPOT_MAX_FRAMES;
+
+ if (flags & STACK_DEPOT_FLAG_GET) {
+ /*
+ * Evictable entries have to allocate the max. size so they may
+ * safely be re-used by differently sized allocations.
+ */
+ record_size = depot_stack_record_size(stack, CONFIG_STACKDEPOT_MAX_FRAMES);
stack = depot_pop_free();
- if (WARN_ON(!stack))
+ } else {
+ record_size = depot_stack_record_size(stack, nr_entries);
+ }
+
+ if (!stack) {
+ stack = depot_pop_free_pool(prealloc, record_size);
+ if (!stack)
return NULL;
}
- /* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
- if (size > CONFIG_STACKDEPOT_MAX_FRAMES)
- size = CONFIG_STACKDEPOT_MAX_FRAMES;
-
/* Save the stack trace. */
stack->hash = hash;
- stack->size = size;
- /* stack->handle is already filled in by depot_init_pool(). */
- refcount_set(&stack->count, 1);
- memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
+ stack->size = nr_entries;
+ /* stack->handle is already filled in by depot_pop_free_pool(). */
+ memcpy(stack->entries, entries, flex_array_size(stack, entries, nr_entries));
+
+ if (flags & STACK_DEPOT_FLAG_GET) {
+ refcount_set(&stack->count, 1);
+ counters[DEPOT_COUNTER_REFD_ALLOCS]++;
+ counters[DEPOT_COUNTER_REFD_INUSE]++;
+ } else {
+ /* Warn on attempts to switch to refcounting this entry. */
+ refcount_set(&stack->count, REFCOUNT_SATURATED);
+ counters[DEPOT_COUNTER_PERSIST_COUNT]++;
+ counters[DEPOT_COUNTER_PERSIST_BYTES] += record_size;
+ }
/*
* Let KMSAN know the stored stack record is initialized. This shall
* prevent false positive reports if instrumented code accesses it.
*/
- kmsan_unpoison_memory(stack, DEPOT_STACK_RECORD_SIZE);
+ kmsan_unpoison_memory(stack, record_size);
- counters[DEPOT_COUNTER_ALLOCS]++;
- counters[DEPOT_COUNTER_INUSE]++;
return stack;
}
@@ -538,8 +542,8 @@ static void depot_free_stack(struct stack_record *stack)
list_add_tail(&stack->free_list, &free_stacks);
counters[DEPOT_COUNTER_FREELIST_SIZE]++;
- counters[DEPOT_COUNTER_FREES]++;
- counters[DEPOT_COUNTER_INUSE]--;
+ counters[DEPOT_COUNTER_REFD_FREES]++;
+ counters[DEPOT_COUNTER_REFD_INUSE]--;
printk_deferred_exit();
raw_spin_unlock_irqrestore(&pool_lock, flags);
@@ -660,7 +664,7 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
* Allocate memory for a new pool if required now:
* we won't be able to do that under the lock.
*/
- if (unlikely(can_alloc && READ_ONCE(new_pool_required))) {
+ if (unlikely(can_alloc && !READ_ONCE(new_pool))) {
/*
* Zero out zone modifiers, as we don't have specific zone
* requirements. Keep the flags related to allocation in atomic
@@ -681,7 +685,7 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
found = find_stack(bucket, entries, nr_entries, hash, depot_flags);
if (!found) {
struct stack_record *new =
- depot_alloc_stack(entries, nr_entries, hash, &prealloc);
+ depot_alloc_stack(entries, nr_entries, hash, depot_flags, &prealloc);
if (new) {
/*
diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c
index 29185ac5..399380d 100644
--- a/lib/test_maple_tree.c
+++ b/lib/test_maple_tree.c
@@ -3599,6 +3599,45 @@ static noinline void __init check_state_handling(struct maple_tree *mt)
mas_unlock(&mas);
}
+static noinline void __init alloc_cyclic_testing(struct maple_tree *mt)
+{
+ unsigned long location;
+ unsigned long next;
+ int ret = 0;
+ MA_STATE(mas, mt, 0, 0);
+
+ next = 0;
+ mtree_lock(mt);
+ for (int i = 0; i < 100; i++) {
+ mas_alloc_cyclic(&mas, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+ MAS_BUG_ON(&mas, i != location - 2);
+ MAS_BUG_ON(&mas, mas.index != location);
+ MAS_BUG_ON(&mas, mas.last != location);
+ MAS_BUG_ON(&mas, i != next - 3);
+ }
+
+ mtree_unlock(mt);
+ mtree_destroy(mt);
+ next = 0;
+ mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE);
+ for (int i = 0; i < 100; i++) {
+ mtree_alloc_cyclic(mt, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+ MT_BUG_ON(mt, i != location - 2);
+ MT_BUG_ON(mt, i != next - 3);
+ MT_BUG_ON(mt, mtree_load(mt, location) != mt);
+ }
+
+ mtree_destroy(mt);
+ /* Overflow test */
+ next = ULONG_MAX - 1;
+ ret = mtree_alloc_cyclic(mt, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+ MT_BUG_ON(mt, ret != 0);
+ ret = mtree_alloc_cyclic(mt, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+ MT_BUG_ON(mt, ret != 0);
+ ret = mtree_alloc_cyclic(mt, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+ MT_BUG_ON(mt, ret != 1);
+}
+
static DEFINE_MTREE(tree);
static int __init maple_tree_seed(void)
{
@@ -3880,6 +3919,11 @@ static int __init maple_tree_seed(void)
check_state_handling(&tree);
mtree_destroy(&tree);
+ mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
+ alloc_cyclic_testing(&tree);
+ mtree_destroy(&tree);
+
+
#if defined(BENCH)
skip:
#endif
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 1e3447b..5f2be8c 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -372,31 +372,6 @@ static int __init default_bdi_init(void)
}
subsys_initcall(default_bdi_init);
-/*
- * This function is used when the first inode for this wb is marked dirty. It
- * wakes-up the corresponding bdi thread which should then take care of the
- * periodic background write-out of dirty inodes. Since the write-out would
- * starts only 'dirty_writeback_interval' centisecs from now anyway, we just
- * set up a timer which wakes the bdi thread up later.
- *
- * Note, we wouldn't bother setting up the timer, but this function is on the
- * fast-path (used by '__mark_inode_dirty()'), so we save few context switches
- * by delaying the wake-up.
- *
- * We have to be careful not to postpone flush work if it is scheduled for
- * earlier. Thus we use queue_delayed_work().
- */
-void wb_wakeup_delayed(struct bdi_writeback *wb)
-{
- unsigned long timeout;
-
- timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
- spin_lock_irq(&wb->work_lock);
- if (test_bit(WB_registered, &wb->state))
- queue_delayed_work(bdi_wq, &wb->dwork, timeout);
- spin_unlock_irq(&wb->work_lock);
-}
-
static void wb_update_bandwidth_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(to_delayed_work(work),
@@ -436,7 +411,6 @@ static int wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi,
INIT_LIST_HEAD(&wb->work_list);
INIT_DELAYED_WORK(&wb->dwork, wb_workfn);
INIT_DELAYED_WORK(&wb->bw_dwork, wb_update_bandwidth_workfn);
- wb->dirty_sleep = jiffies;
err = fprop_local_init_percpu(&wb->completions, gfp);
if (err)
@@ -921,6 +895,7 @@ int bdi_init(struct backing_dev_info *bdi)
INIT_LIST_HEAD(&bdi->bdi_list);
INIT_LIST_HEAD(&bdi->wb_list);
init_waitqueue_head(&bdi->wb_waitq);
+ bdi->last_bdp_sleep = jiffies;
return cgwb_bdi_init(bdi);
}
diff --git a/mm/compaction.c b/mm/compaction.c
index 4add68d..b961db6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
unsigned int alloc_flags, const struct alloc_context *ac,
enum compact_priority prio, struct page **capture)
{
- int may_perform_io = (__force int)(gfp_mask & __GFP_IO);
struct zoneref *z;
struct zone *zone;
enum compact_result rc = COMPACT_SKIPPED;
- /*
- * Check if the GFP flags allow compaction - GFP_NOIO is really
- * tricky context because the migration might require IO
- */
- if (!may_perform_io)
+ if (!gfp_compaction_allowed(gfp_mask))
return COMPACT_SKIPPED;
trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 36f6f1d..5b32574 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1026,6 +1026,9 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
damon_for_each_scheme(s, c) {
struct damos_quota *quota = &s->quota;
+ if (c->passed_sample_intervals != s->next_apply_sis)
+ continue;
+
if (!s->wmarks.activated)
continue;
@@ -1176,10 +1179,6 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
if (c->passed_sample_intervals != s->next_apply_sis)
continue;
- s->next_apply_sis +=
- (s->apply_interval_us ? s->apply_interval_us :
- c->attrs.aggr_interval) / sample_interval;
-
if (!s->wmarks.activated)
continue;
@@ -1195,6 +1194,14 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
damon_for_each_region_safe(r, next_r, t)
damon_do_apply_schemes(c, t, r);
}
+
+ damon_for_each_scheme(s, c) {
+ if (c->passed_sample_intervals != s->next_apply_sis)
+ continue;
+ s->next_apply_sis +=
+ (s->apply_interval_us ? s->apply_interval_us :
+ c->attrs.aggr_interval) / sample_interval;
+ }
}
/*
diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index f2e5f94..3de2916 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -185,9 +185,21 @@ static struct damos *damon_lru_sort_new_cold_scheme(unsigned int cold_thres)
return damon_lru_sort_new_scheme(&pattern, DAMOS_LRU_DEPRIO);
}
+static void damon_lru_sort_copy_quota_status(struct damos_quota *dst,
+ struct damos_quota *src)
+{
+ dst->total_charged_sz = src->total_charged_sz;
+ dst->total_charged_ns = src->total_charged_ns;
+ dst->charged_sz = src->charged_sz;
+ dst->charged_from = src->charged_from;
+ dst->charge_target_from = src->charge_target_from;
+ dst->charge_addr_from = src->charge_addr_from;
+}
+
static int damon_lru_sort_apply_parameters(void)
{
- struct damos *scheme;
+ struct damos *scheme, *hot_scheme, *cold_scheme;
+ struct damos *old_hot_scheme = NULL, *old_cold_scheme = NULL;
unsigned int hot_thres, cold_thres;
int err = 0;
@@ -195,18 +207,35 @@ static int damon_lru_sort_apply_parameters(void)
if (err)
return err;
+ damon_for_each_scheme(scheme, ctx) {
+ if (!old_hot_scheme) {
+ old_hot_scheme = scheme;
+ continue;
+ }
+ old_cold_scheme = scheme;
+ }
+
hot_thres = damon_max_nr_accesses(&damon_lru_sort_mon_attrs) *
hot_thres_access_freq / 1000;
- scheme = damon_lru_sort_new_hot_scheme(hot_thres);
- if (!scheme)
+ hot_scheme = damon_lru_sort_new_hot_scheme(hot_thres);
+ if (!hot_scheme)
return -ENOMEM;
- damon_set_schemes(ctx, &scheme, 1);
+ if (old_hot_scheme)
+ damon_lru_sort_copy_quota_status(&hot_scheme->quota,
+ &old_hot_scheme->quota);
cold_thres = cold_min_age / damon_lru_sort_mon_attrs.aggr_interval;
- scheme = damon_lru_sort_new_cold_scheme(cold_thres);
- if (!scheme)
+ cold_scheme = damon_lru_sort_new_cold_scheme(cold_thres);
+ if (!cold_scheme) {
+ damon_destroy_scheme(hot_scheme);
return -ENOMEM;
- damon_add_scheme(ctx, scheme);
+ }
+ if (old_cold_scheme)
+ damon_lru_sort_copy_quota_status(&cold_scheme->quota,
+ &old_cold_scheme->quota);
+
+ damon_set_schemes(ctx, &hot_scheme, 1);
+ damon_add_scheme(ctx, cold_scheme);
return damon_set_region_biggest_system_ram_default(target,
&monitor_region_start,
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index ab974e4..66e190f 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -150,9 +150,20 @@ static struct damos *damon_reclaim_new_scheme(void)
&damon_reclaim_wmarks);
}
+static void damon_reclaim_copy_quota_status(struct damos_quota *dst,
+ struct damos_quota *src)
+{
+ dst->total_charged_sz = src->total_charged_sz;
+ dst->total_charged_ns = src->total_charged_ns;
+ dst->charged_sz = src->charged_sz;
+ dst->charged_from = src->charged_from;
+ dst->charge_target_from = src->charge_target_from;
+ dst->charge_addr_from = src->charge_addr_from;
+}
+
static int damon_reclaim_apply_parameters(void)
{
- struct damos *scheme;
+ struct damos *scheme, *old_scheme;
struct damos_filter *filter;
int err = 0;
@@ -164,6 +175,11 @@ static int damon_reclaim_apply_parameters(void)
scheme = damon_reclaim_new_scheme();
if (!scheme)
return -ENOMEM;
+ if (!list_empty(&ctx->schemes)) {
+ damon_for_each_scheme(old_scheme, ctx)
+ damon_reclaim_copy_quota_status(&scheme->quota,
+ &old_scheme->quota);
+ }
if (skip_anon) {
filter = damos_new_filter(DAMOS_FILTER_TYPE_ANON, true);
if (!filter) {
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 8dbaac6..ae0f0b3 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1905,6 +1905,10 @@ void damos_sysfs_set_quota_scores(struct damon_sysfs_schemes *sysfs_schemes,
damon_for_each_scheme(scheme, ctx) {
struct damon_sysfs_scheme *sysfs_scheme;
+ /* user could have removed the scheme sysfs dir */
+ if (i >= sysfs_schemes->nr)
+ break;
+
sysfs_scheme = sysfs_schemes->schemes_arr[i];
damos_sysfs_set_quota_score(sysfs_scheme->quotas->goals,
&scheme->quota);
@@ -2194,7 +2198,7 @@ static void damos_tried_regions_init_upd_status(
sysfs_regions->upd_timeout_jiffies = jiffies +
2 * usecs_to_jiffies(scheme->apply_interval_us ?
scheme->apply_interval_us :
- ctx->attrs.sample_interval);
+ ctx->attrs.aggr_interval);
}
}
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 5662e29..65c1902 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -362,6 +362,12 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args)
vaddr &= HPAGE_PUD_MASK;
pud = pfn_pud(args->pud_pfn, args->page_prot);
+ /*
+ * Some architectures have debug checks to make sure
+ * huge pud mapping are only found with devmap entries
+ * For now test with only devmap entries.
+ */
+ pud = pud_mkdevmap(pud);
set_pud_at(args->mm, vaddr, args->pudp, pud);
flush_dcache_page(page);
pudp_set_wrprotect(args->mm, vaddr, args->pudp);
@@ -374,6 +380,7 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args)
WARN_ON(!pud_none(pud));
#endif /* __PAGETABLE_PMD_FOLDED */
pud = pfn_pud(args->pud_pfn, args->page_prot);
+ pud = pud_mkdevmap(pud);
pud = pud_wrprotect(pud);
pud = pud_mkclean(pud);
set_pud_at(args->mm, vaddr, args->pudp, pud);
@@ -391,6 +398,7 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args)
#endif /* __PAGETABLE_PMD_FOLDED */
pud = pfn_pud(args->pud_pfn, args->page_prot);
+ pud = pud_mkdevmap(pud);
pud = pud_mkyoung(pud);
set_pud_at(args->mm, vaddr, args->pudp, pud);
flush_dcache_page(page);
diff --git a/mm/filemap.c b/mm/filemap.c
index 750e779..8df4797 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2609,15 +2609,6 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
/*
- * Pairs with a barrier in
- * block_write_end()->mark_buffer_dirty() or other page
- * dirtying routines like iomap_write_end() to ensure
- * changes to page contents are visible before we see
- * increased inode size.
- */
- smp_rmb();
-
- /*
* Once we start copying data, we don't want to be touching any
* cachelines that might be contended:
*/
@@ -4111,28 +4102,40 @@ static void filemap_cachestat(struct address_space *mapping,
rcu_read_lock();
xas_for_each(&xas, folio, last_index) {
+ int order;
unsigned long nr_pages;
pgoff_t folio_first_index, folio_last_index;
+ /*
+ * Don't deref the folio. It is not pinned, and might
+ * get freed (and reused) underneath us.
+ *
+ * We *could* pin it, but that would be expensive for
+ * what should be a fast and lightweight syscall.
+ *
+ * Instead, derive all information of interest from
+ * the rcu-protected xarray.
+ */
+
if (xas_retry(&xas, folio))
continue;
+ order = xa_get_order(xas.xa, xas.xa_index);
+ nr_pages = 1 << order;
+ folio_first_index = round_down(xas.xa_index, 1 << order);
+ folio_last_index = folio_first_index + nr_pages - 1;
+
+ /* Folios might straddle the range boundaries, only count covered pages */
+ if (folio_first_index < first_index)
+ nr_pages -= first_index - folio_first_index;
+
+ if (folio_last_index > last_index)
+ nr_pages -= folio_last_index - last_index;
+
if (xa_is_value(folio)) {
/* page is evicted */
void *shadow = (void *)folio;
bool workingset; /* not used */
- int order = xa_get_order(xas.xa, xas.xa_index);
-
- nr_pages = 1 << order;
- folio_first_index = round_down(xas.xa_index, 1 << order);
- folio_last_index = folio_first_index + nr_pages - 1;
-
- /* Folios might straddle the range boundaries, only count covered pages */
- if (folio_first_index < first_index)
- nr_pages -= first_index - folio_first_index;
-
- if (folio_last_index > last_index)
- nr_pages -= folio_last_index - last_index;
cs->nr_evicted += nr_pages;
@@ -4150,24 +4153,13 @@ static void filemap_cachestat(struct address_space *mapping,
goto resched;
}
- nr_pages = folio_nr_pages(folio);
- folio_first_index = folio_pgoff(folio);
- folio_last_index = folio_first_index + nr_pages - 1;
-
- /* Folios might straddle the range boundaries, only count covered pages */
- if (folio_first_index < first_index)
- nr_pages -= first_index - folio_first_index;
-
- if (folio_last_index > last_index)
- nr_pages -= folio_last_index - last_index;
-
/* page is in cache */
cs->nr_cache += nr_pages;
- if (folio_test_dirty(folio))
+ if (xas_get_mark(&xas, PAGECACHE_TAG_DIRTY))
cs->nr_dirty += nr_pages;
- if (folio_test_writeback(folio))
+ if (xas_get_mark(&xas, PAGECACHE_TAG_WRITEBACK))
cs->nr_writeback += nr_pages;
resched:
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 610efae..6ca63e8 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -65,8 +65,7 @@ void kasan_save_track(struct kasan_track *track, gfp_t flags)
{
depot_stack_handle_t stack;
- stack = kasan_save_stack(flags,
- STACK_DEPOT_FLAG_CAN_ALLOC | STACK_DEPOT_FLAG_GET);
+ stack = kasan_save_stack(flags, STACK_DEPOT_FLAG_CAN_ALLOC);
kasan_set_track(track, stack);
}
@@ -266,10 +265,9 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object,
return true;
/*
- * If the object is not put into quarantine, it will likely be quickly
- * reallocated. Thus, release its metadata now.
+ * Note: Keep per-object metadata to allow KASAN print stack traces for
+ * use-after-free-before-realloc bugs.
*/
- kasan_release_object_meta(cache, object);
/* Let slab put the object onto the freelist. */
return false;
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index df6627f624..1900f85 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -485,16 +485,6 @@ void kasan_init_object_meta(struct kmem_cache *cache, const void *object)
if (alloc_meta) {
/* Zero out alloc meta to mark it as invalid. */
__memset(alloc_meta, 0, sizeof(*alloc_meta));
-
- /*
- * Prepare the lock for saving auxiliary stack traces.
- * Temporarily disable KASAN bug reporting to allow instrumented
- * raw_spin_lock_init to access aux_lock, which resides inside
- * of a redzone.
- */
- kasan_disable_current();
- raw_spin_lock_init(&alloc_meta->aux_lock);
- kasan_enable_current();
}
/*
@@ -506,47 +496,23 @@ void kasan_init_object_meta(struct kmem_cache *cache, const void *object)
static void release_alloc_meta(struct kasan_alloc_meta *meta)
{
- /* Evict the stack traces from stack depot. */
- stack_depot_put(meta->alloc_track.stack);
- stack_depot_put(meta->aux_stack[0]);
- stack_depot_put(meta->aux_stack[1]);
-
- /*
- * Zero out alloc meta to mark it as invalid but keep aux_lock
- * initialized to avoid having to reinitialize it when another object
- * is allocated in the same slot.
- */
- __memset(&meta->alloc_track, 0, sizeof(meta->alloc_track));
- __memset(meta->aux_stack, 0, sizeof(meta->aux_stack));
+ /* Zero out alloc meta to mark it as invalid. */
+ __memset(meta, 0, sizeof(*meta));
}
static void release_free_meta(const void *object, struct kasan_free_meta *meta)
{
+ if (!kasan_arch_is_ready())
+ return;
+
/* Check if free meta is valid. */
if (*(u8 *)kasan_mem_to_shadow(object) != KASAN_SLAB_FREE_META)
return;
- /* Evict the stack trace from the stack depot. */
- stack_depot_put(meta->free_track.stack);
-
/* Mark free meta as invalid. */
*(u8 *)kasan_mem_to_shadow(object) = KASAN_SLAB_FREE;
}
-void kasan_release_object_meta(struct kmem_cache *cache, const void *object)
-{
- struct kasan_alloc_meta *alloc_meta;
- struct kasan_free_meta *free_meta;
-
- alloc_meta = kasan_get_alloc_meta(cache, object);
- if (alloc_meta)
- release_alloc_meta(alloc_meta);
-
- free_meta = kasan_get_free_meta(cache, object);
- if (free_meta)
- release_free_meta(object, free_meta);
-}
-
size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object)
{
struct kasan_cache *info = &cache->kasan_info;
@@ -571,8 +537,6 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
struct kmem_cache *cache;
struct kasan_alloc_meta *alloc_meta;
void *object;
- depot_stack_handle_t new_handle, old_handle;
- unsigned long flags;
if (is_kfence_address(addr) || !slab)
return;
@@ -583,33 +547,18 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
if (!alloc_meta)
return;
- new_handle = kasan_save_stack(0, depot_flags);
-
- /*
- * Temporarily disable KASAN bug reporting to allow instrumented
- * spinlock functions to access aux_lock, which resides inside of a
- * redzone.
- */
- kasan_disable_current();
- raw_spin_lock_irqsave(&alloc_meta->aux_lock, flags);
- old_handle = alloc_meta->aux_stack[1];
alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
- alloc_meta->aux_stack[0] = new_handle;
- raw_spin_unlock_irqrestore(&alloc_meta->aux_lock, flags);
- kasan_enable_current();
-
- stack_depot_put(old_handle);
+ alloc_meta->aux_stack[0] = kasan_save_stack(0, depot_flags);
}
void kasan_record_aux_stack(void *addr)
{
- return __kasan_record_aux_stack(addr,
- STACK_DEPOT_FLAG_CAN_ALLOC | STACK_DEPOT_FLAG_GET);
+ return __kasan_record_aux_stack(addr, STACK_DEPOT_FLAG_CAN_ALLOC);
}
void kasan_record_aux_stack_noalloc(void *addr)
{
- return __kasan_record_aux_stack(addr, STACK_DEPOT_FLAG_GET);
+ return __kasan_record_aux_stack(addr, 0);
}
void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
@@ -620,7 +569,7 @@ void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
if (!alloc_meta)
return;
- /* Evict previous stack traces (might exist for krealloc or mempool). */
+ /* Invalidate previous stack traces (might exist for krealloc or mempool). */
release_alloc_meta(alloc_meta);
kasan_save_track(&alloc_meta->alloc_track, flags);
@@ -634,7 +583,7 @@ void kasan_save_free_info(struct kmem_cache *cache, void *object)
if (!free_meta)
return;
- /* Evict previous stack trace (might exist for mempool). */
+ /* Invalidate previous stack trace (might exist for mempool). */
release_free_meta(object, free_meta);
kasan_save_track(&free_meta->free_track, 0);
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index d0f172f2..fb2b9ac 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -6,7 +6,6 @@
#include <linux/kasan.h>
#include <linux/kasan-tags.h>
#include <linux/kfence.h>
-#include <linux/spinlock.h>
#include <linux/stackdepot.h>
#if defined(CONFIG_KASAN_SW_TAGS) || defined(CONFIG_KASAN_HW_TAGS)
@@ -265,13 +264,6 @@ struct kasan_global {
struct kasan_alloc_meta {
struct kasan_track alloc_track;
/* Free track is stored in kasan_free_meta. */
- /*
- * aux_lock protects aux_stack from accesses from concurrent
- * kasan_record_aux_stack calls. It is a raw spinlock to avoid sleeping
- * on RT kernels, as kasan_record_aux_stack_noalloc can be called from
- * non-sleepable contexts.
- */
- raw_spinlock_t aux_lock;
depot_stack_handle_t aux_stack[2];
};
@@ -398,10 +390,8 @@ struct kasan_alloc_meta *kasan_get_alloc_meta(struct kmem_cache *cache,
struct kasan_free_meta *kasan_get_free_meta(struct kmem_cache *cache,
const void *object);
void kasan_init_object_meta(struct kmem_cache *cache, const void *object);
-void kasan_release_object_meta(struct kmem_cache *cache, const void *object);
#else
static inline void kasan_init_object_meta(struct kmem_cache *cache, const void *object) { }
-static inline void kasan_release_object_meta(struct kmem_cache *cache, const void *object) { }
#endif
depot_stack_handle_t kasan_save_stack(gfp_t flags, depot_flags_t depot_flags);
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 3ba02ef..6958aa7 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -145,7 +145,10 @@ static void qlink_free(struct qlist_node *qlink, struct kmem_cache *cache)
void *object = qlink_to_object(qlink, cache);
struct kasan_free_meta *free_meta = kasan_get_free_meta(cache, object);
- kasan_release_object_meta(cache, object);
+ /*
+ * Note: Keep per-object metadata to allow KASAN print stack traces for
+ * use-after-free-before-realloc bugs.
+ */
/*
* If init_on_free is enabled and KASAN's free metadata is stored in
diff --git a/mm/madvise.c b/mm/madvise.c
index 912155a..cfa5e72 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -429,6 +429,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
if (++batch_count == SWAP_CLUSTER_MAX) {
batch_count = 0;
if (need_resched()) {
+ arch_leave_lazy_mmu_mode();
pte_unmap_unlock(start_pte, ptl);
cond_resched();
goto restart;
diff --git a/mm/memblock.c b/mm/memblock.c
index 4dcb2ee3..d09136e 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -180,8 +180,9 @@ static inline phys_addr_t memblock_cap_size(phys_addr_t base, phys_addr_t *size)
/*
* Address comparison utilities
*/
-static unsigned long __init_memblock memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
- phys_addr_t base2, phys_addr_t size2)
+unsigned long __init_memblock
+memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1, phys_addr_t base2,
+ phys_addr_t size2)
{
return ((base1 < (base2 + size2)) && (base2 < (base1 + size1)));
}
@@ -2249,6 +2250,7 @@ static const char * const flagname[] = {
[ilog2(MEMBLOCK_MIRROR)] = "MIRROR",
[ilog2(MEMBLOCK_NOMAP)] = "NOMAP",
[ilog2(MEMBLOCK_DRIVER_MANAGED)] = "DRV_MNG",
+ [ilog2(MEMBLOCK_RSRV_NOINIT)] = "RSV_NIT",
};
static int memblock_debug_show(struct seq_file *m, void *private)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 46d8d02..61932c9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -621,6 +621,15 @@ static inline int memcg_events_index(enum vm_event_item idx)
}
struct memcg_vmstats_percpu {
+ /* Stats updates since the last flush */
+ unsigned int stats_updates;
+
+ /* Cached pointers for fast iteration in memcg_rstat_updated() */
+ struct memcg_vmstats_percpu *parent;
+ struct memcg_vmstats *vmstats;
+
+ /* The above should fit a single cacheline for memcg_rstat_updated() */
+
/* Local (CPU and cgroup) page state & events */
long state[MEMCG_NR_STAT];
unsigned long events[NR_MEMCG_EVENTS];
@@ -632,10 +641,7 @@ struct memcg_vmstats_percpu {
/* Cgroup1: threshold notifications & softlimit tree updates */
unsigned long nr_page_events;
unsigned long targets[MEM_CGROUP_NTARGETS];
-
- /* Stats updates since the last flush */
- unsigned int stats_updates;
-};
+} ____cacheline_aligned;
struct memcg_vmstats {
/* Aggregated (CPU and subtree) page state & events */
@@ -698,36 +704,35 @@ static void memcg_stats_unlock(void)
}
-static bool memcg_should_flush_stats(struct mem_cgroup *memcg)
+static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
{
- return atomic64_read(&memcg->vmstats->stats_updates) >
+ return atomic64_read(&vmstats->stats_updates) >
MEMCG_CHARGE_BATCH * num_online_cpus();
}
static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
+ struct memcg_vmstats_percpu *statc;
int cpu = smp_processor_id();
- unsigned int x;
if (!val)
return;
cgroup_rstat_updated(memcg->css.cgroup, cpu);
-
- for (; memcg; memcg = parent_mem_cgroup(memcg)) {
- x = __this_cpu_add_return(memcg->vmstats_percpu->stats_updates,
- abs(val));
-
- if (x < MEMCG_CHARGE_BATCH)
+ statc = this_cpu_ptr(memcg->vmstats_percpu);
+ for (; statc; statc = statc->parent) {
+ statc->stats_updates += abs(val);
+ if (statc->stats_updates < MEMCG_CHARGE_BATCH)
continue;
/*
* If @memcg is already flush-able, increasing stats_updates is
* redundant. Avoid the overhead of the atomic update.
*/
- if (!memcg_should_flush_stats(memcg))
- atomic64_add(x, &memcg->vmstats->stats_updates);
- __this_cpu_write(memcg->vmstats_percpu->stats_updates, 0);
+ if (!memcg_vmstats_needs_flush(statc->vmstats))
+ atomic64_add(statc->stats_updates,
+ &statc->vmstats->stats_updates);
+ statc->stats_updates = 0;
}
}
@@ -756,7 +761,7 @@ void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
if (!memcg)
memcg = root_mem_cgroup;
- if (memcg_should_flush_stats(memcg))
+ if (memcg_vmstats_needs_flush(memcg->vmstats))
do_flush_stats(memcg);
}
@@ -770,7 +775,7 @@ void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg)
static void flush_memcg_stats_dwork(struct work_struct *w)
{
/*
- * Deliberately ignore memcg_should_flush_stats() here so that flushing
+ * Deliberately ignore memcg_vmstats_needs_flush() here so that flushing
* in latency-sensitive paths is as cheap as possible.
*/
do_flush_stats(root_mem_cgroup);
@@ -5477,10 +5482,11 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
__mem_cgroup_free(memcg);
}
-static struct mem_cgroup *mem_cgroup_alloc(void)
+static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
{
+ struct memcg_vmstats_percpu *statc, *pstatc;
struct mem_cgroup *memcg;
- int node;
+ int node, cpu;
int __maybe_unused i;
long error = -ENOMEM;
@@ -5504,6 +5510,14 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
if (!memcg->vmstats_percpu)
goto fail;
+ for_each_possible_cpu(cpu) {
+ if (parent)
+ pstatc = per_cpu_ptr(parent->vmstats_percpu, cpu);
+ statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
+ statc->parent = parent ? pstatc : NULL;
+ statc->vmstats = memcg->vmstats;
+ }
+
for_each_node(node)
if (alloc_mem_cgroup_per_node_info(memcg, node))
goto fail;
@@ -5549,7 +5563,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
struct mem_cgroup *memcg, *old_memcg;
old_memcg = set_active_memcg(parent);
- memcg = mem_cgroup_alloc();
+ memcg = mem_cgroup_alloc(parent);
set_active_memcg(old_memcg);
if (IS_ERR(memcg))
return ERR_CAST(memcg);
@@ -7957,9 +7971,13 @@ bool mem_cgroup_swap_full(struct folio *folio)
static int __init setup_swap_account(char *s)
{
- pr_warn_once("The swapaccount= commandline option is deprecated. "
- "Please report your usecase to linux-mm@kvack.org if you "
- "depend on this functionality.\n");
+ bool res;
+
+ if (!kstrtobool(s, &res) && !res)
+ pr_warn_once("The swapaccount=0 commandline option is deprecated "
+ "in favor of configuring swap control via cgroupfs. "
+ "Please report your usecase to linux-mm@kvack.org if you "
+ "depend on this functionality.\n");
return 1;
}
__setup("swapaccount=", setup_swap_account);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 636280d..9349948 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1377,6 +1377,9 @@ void ClearPageHWPoisonTakenOff(struct page *page)
*/
static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
{
+ if (PageSlab(page))
+ return false;
+
/* Soft offline could migrate non-LRU movable pages */
if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
return true;
diff --git a/mm/memory.c b/mm/memory.c
index 89bcae0..0bfc8b0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3799,6 +3799,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
struct page *page;
struct swap_info_struct *si = NULL;
rmap_t rmap_flags = RMAP_NONE;
+ bool need_clear_cache = false;
bool exclusive = false;
swp_entry_t entry;
pte_t pte;
@@ -3867,6 +3868,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
if (!folio) {
if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
__swap_count(entry) == 1) {
+ /*
+ * Prevent parallel swapin from proceeding with
+ * the cache flag. Otherwise, another thread may
+ * finish swapin first, free the entry, and swapout
+ * reusing the same entry. It's undetectable as
+ * pte_same() returns true due to entry reuse.
+ */
+ if (swapcache_prepare(entry)) {
+ /* Relax a bit to prevent rapid repeated page faults */
+ schedule_timeout_uninterruptible(1);
+ goto out;
+ }
+ need_clear_cache = true;
+
/* skip swapcache */
folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
vma, vmf->address, false);
@@ -4117,6 +4132,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
if (vmf->pte)
pte_unmap_unlock(vmf->pte, vmf->ptl);
out:
+ /* Clear the swap cache pin for direct swapin after PTL unlock */
+ if (need_clear_cache)
+ swapcache_clear(si, entry);
if (si)
put_swap_device(si);
return ret;
@@ -4131,6 +4149,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
folio_unlock(swapcache);
folio_put(swapcache);
}
+ if (need_clear_cache)
+ swapcache_clear(si, entry);
if (si)
put_swap_device(si);
return ret;
@@ -5478,7 +5498,7 @@ static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs
return true;
if (regs && !user_mode(regs)) {
- unsigned long ip = instruction_pointer(regs);
+ unsigned long ip = exception_ip(regs);
if (!search_exception_tables(ip))
return false;
}
@@ -5503,7 +5523,7 @@ static inline bool upgrade_mmap_lock_carefully(struct mm_struct *mm, struct pt_r
{
mmap_read_unlock(mm);
if (regs && !user_mode(regs)) {
- unsigned long ip = instruction_pointer(regs);
+ unsigned long ip = exception_ip(regs);
if (!search_exception_tables(ip))
return false;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index cc9f2bcd..c27b1f8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2519,6 +2519,14 @@ static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
if (managed_zone(pgdat->node_zones + z))
break;
}
+
+ /*
+ * If there are no managed zones, it should not proceed
+ * further.
+ */
+ if (z < 0)
+ return 0;
+
wakeup_kswapd(pgdat->node_zones + z, 0,
folio_order(folio), ZONE_MOVABLE);
return 0;
diff --git a/mm/mmap.c b/mm/mmap.c
index d89770e..3281287 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -954,13 +954,21 @@ static struct vm_area_struct
} else if (merge_prev) { /* case 2 */
if (curr) {
vma_start_write(curr);
- err = dup_anon_vma(prev, curr, &anon_dup);
if (end == curr->vm_end) { /* case 7 */
+ /*
+ * can_vma_merge_after() assumed we would not be
+ * removing prev vma, so it skipped the check
+ * for vm_ops->close, but we are removing curr
+ */
+ if (curr->vm_ops && curr->vm_ops->close)
+ err = -EINVAL;
remove = curr;
} else { /* case 5 */
adjust = curr;
adj_start = (end - curr->vm_start);
}
+ if (!err)
+ err = dup_anon_vma(prev, curr, &anon_dup);
}
} else { /* merge_next */
vma_start_write(next);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 02147b6..3f25553 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1921,7 +1921,7 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
break;
}
__set_current_state(TASK_KILLABLE);
- wb->dirty_sleep = now;
+ bdi->last_bdp_sleep = jiffies;
io_schedule_timeout(pause);
current->dirty_paused_when = now + pause;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 150d4f2..a663202 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct alloc_context *ac)
{
bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+ bool can_compact = gfp_compaction_allowed(gfp_mask);
const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
struct page *page = NULL;
unsigned int alloc_flags;
@@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* Don't try this for allocations that are allowed to ignore
* watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
*/
- if (can_direct_reclaim &&
+ if (can_direct_reclaim && can_compact &&
(costly_order ||
(order > 0 && ac->migratetype != MIGRATE_MOVABLE))
&& !gfp_pfmemalloc_allowed(gfp_mask)) {
@@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
/*
* Do not retry costly high order allocations unless they are
- * __GFP_RETRY_MAYFAIL
+ * __GFP_RETRY_MAYFAIL and we can compact
*/
- if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
+ if (costly_order && (!can_compact ||
+ !(gfp_mask & __GFP_RETRY_MAYFAIL)))
goto nopage;
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
@@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* implementation of the compaction depends on the sufficient amount
* of free memory (see __compaction_suitable)
*/
- if (did_some_progress > 0 &&
+ if (did_some_progress > 0 && can_compact &&
should_compact_retry(ac, order, alloc_flags,
compact_result, &compact_priority,
&compaction_retries))
diff --git a/mm/shmem.c b/mm/shmem.c
index d7c84ff..29d5a02 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3374,7 +3374,7 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
{
- if (!simple_empty(dentry))
+ if (!simple_offset_empty(dentry))
return -ENOTEMPTY;
drop_nlink(d_inode(dentry));
@@ -3431,7 +3431,7 @@ static int shmem_rename2(struct mnt_idmap *idmap,
return simple_offset_rename_exchange(old_dir, old_dentry,
new_dir, new_dentry);
- if (!simple_empty(new_dentry))
+ if (!simple_offset_empty(new_dentry))
return -ENOTEMPTY;
if (flags & RENAME_WHITEOUT) {
@@ -4355,7 +4355,9 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
#ifdef CONFIG_TMPFS_POSIX_ACL
sb->s_flags |= SB_POSIXACL;
#endif
- uuid_gen(&sb->s_uuid);
+ uuid_t uuid;
+ uuid_gen(&uuid);
+ super_set_uuid(sb, uuid.b, sizeof(uuid));
#ifdef CONFIG_TMPFS_QUOTA
if (ctx->seen & SHMEM_SEEN_QUOTA) {
diff --git a/mm/swap.h b/mm/swap.h
index 758c46c..fc2f6ad 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -41,6 +41,7 @@ void __delete_from_swap_cache(struct folio *folio,
void delete_from_swap_cache(struct folio *folio);
void clear_shadow_from_swap_cache(int type, unsigned long begin,
unsigned long end);
+void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry);
struct folio *swap_cache_get_folio(swp_entry_t entry,
struct vm_area_struct *vma, unsigned long addr);
struct folio *filemap_get_incore_folio(struct address_space *mapping,
@@ -97,6 +98,10 @@ static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
return 0;
}
+static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+{
+}
+
static inline struct folio *swap_cache_get_folio(swp_entry_t entry,
struct vm_area_struct *vma, unsigned long addr)
{
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e671266..7255c01 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -680,9 +680,10 @@ struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
/* The page was likely read above, so no need for plugging here */
folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
&page_allocated, false);
- if (unlikely(page_allocated))
+ if (unlikely(page_allocated)) {
+ zswap_folio_swapin(folio);
swap_read_folio(folio, false, NULL);
- zswap_folio_swapin(folio);
+ }
return folio;
}
@@ -855,9 +856,10 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
/* The folio was likely read above, so no need for plugging here */
folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
&page_allocated, false);
- if (unlikely(page_allocated))
+ if (unlikely(page_allocated)) {
+ zswap_folio_swapin(folio);
swap_read_folio(folio, false, NULL);
- zswap_folio_swapin(folio);
+ }
return folio;
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 556ff73..573843d 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2532,10 +2532,10 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
exit_swap_address_space(p->type);
inode = mapping->host;
- if (p->bdev_handle) {
+ if (p->bdev_file) {
set_blocksize(p->bdev, old_block_size);
- bdev_release(p->bdev_handle);
- p->bdev_handle = NULL;
+ fput(p->bdev_file);
+ p->bdev_file = NULL;
}
inode_lock(inode);
@@ -2765,14 +2765,14 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
int error;
if (S_ISBLK(inode->i_mode)) {
- p->bdev_handle = bdev_open_by_dev(inode->i_rdev,
+ p->bdev_file = bdev_file_open_by_dev(inode->i_rdev,
BLK_OPEN_READ | BLK_OPEN_WRITE, p, NULL);
- if (IS_ERR(p->bdev_handle)) {
- error = PTR_ERR(p->bdev_handle);
- p->bdev_handle = NULL;
+ if (IS_ERR(p->bdev_file)) {
+ error = PTR_ERR(p->bdev_file);
+ p->bdev_file = NULL;
return error;
}
- p->bdev = p->bdev_handle->bdev;
+ p->bdev = file_bdev(p->bdev_file);
p->old_block_size = block_size(p->bdev);
error = set_blocksize(p->bdev, PAGE_SIZE);
if (error < 0)
@@ -3208,10 +3208,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
p->percpu_cluster = NULL;
free_percpu(p->cluster_next_cpu);
p->cluster_next_cpu = NULL;
- if (p->bdev_handle) {
+ if (p->bdev_file) {
set_blocksize(p->bdev, p->old_block_size);
- bdev_release(p->bdev_handle);
- p->bdev_handle = NULL;
+ fput(p->bdev_file);
+ p->bdev_file = NULL;
}
inode = NULL;
destroy_swap_extents(p);
@@ -3365,6 +3365,19 @@ int swapcache_prepare(swp_entry_t entry)
return __swap_duplicate(entry, SWAP_HAS_CACHE);
}
+void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+{
+ struct swap_cluster_info *ci;
+ unsigned long offset = swp_offset(entry);
+ unsigned char usage;
+
+ ci = lock_cluster_or_swap_info(si, offset);
+ usage = __swap_entry_free_locked(si, offset, SWAP_HAS_CACHE);
+ unlock_cluster_or_swap_info(si, ci);
+ if (!usage)
+ free_swap_slot(entry);
+}
+
struct swap_info_struct *swp_swap_info(swp_entry_t entry)
{
return swap_type_to_swap_info(swp_type(entry));
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 75fcf1f..313f1c4 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -902,8 +902,8 @@ static int move_present_pte(struct mm_struct *mm,
double_pt_lock(dst_ptl, src_ptl);
- if (!pte_same(*src_pte, orig_src_pte) ||
- !pte_same(*dst_pte, orig_dst_pte)) {
+ if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
+ !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
err = -EAGAIN;
goto out;
}
@@ -914,9 +914,6 @@ static int move_present_pte(struct mm_struct *mm,
goto out;
}
- folio_move_anon_rmap(src_folio, dst_vma);
- WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
-
orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
/* Folio got pinned from under us. Put it back and fail the move. */
if (folio_maybe_dma_pinned(src_folio)) {
@@ -925,6 +922,9 @@ static int move_present_pte(struct mm_struct *mm,
goto out;
}
+ folio_move_anon_rmap(src_folio, dst_vma);
+ WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
+
orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
/* Follow mremap() behavior and treat the entry dirty after the move */
orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte), dst_vma);
@@ -946,8 +946,8 @@ static int move_swap_pte(struct mm_struct *mm,
double_pt_lock(dst_ptl, src_ptl);
- if (!pte_same(*src_pte, orig_src_pte) ||
- !pte_same(*dst_pte, orig_dst_pte)) {
+ if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
+ !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
double_pt_unlock(dst_ptl, src_ptl);
return -EAGAIN;
}
@@ -1016,7 +1016,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
}
spin_lock(dst_ptl);
- orig_dst_pte = *dst_pte;
+ orig_dst_pte = ptep_get(dst_pte);
spin_unlock(dst_ptl);
if (!pte_none(orig_dst_pte)) {
err = -EEXIST;
@@ -1024,7 +1024,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
}
spin_lock(src_ptl);
- orig_src_pte = *src_pte;
+ orig_src_pte = ptep_get(src_pte);
spin_unlock(src_ptl);
if (pte_none(orig_src_pte)) {
if (!(mode & UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES))
@@ -1054,7 +1054,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
* page isn't freed under us
*/
spin_lock(src_ptl);
- if (!pte_same(orig_src_pte, *src_pte)) {
+ if (!pte_same(orig_src_pte, ptep_get(src_pte))) {
spin_unlock(src_ptl);
err = -EAGAIN;
goto out;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854c..4255619 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
/* Use reclaim/compaction for costly allocs or under memory pressure */
static bool in_reclaim_compaction(struct scan_control *sc)
{
- if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
+ if (gfp_compaction_allowed(sc->gfp_mask) && sc->order &&
(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
sc->priority < DEF_PRIORITY - 2))
return true;
@@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
{
unsigned long watermark;
+ if (!gfp_compaction_allowed(sc->gfp_mask))
+ return false;
+
/* Allocation can already succeed, nothing to do */
if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
sc->reclaim_idx, 0))
diff --git a/mm/zswap.c b/mm/zswap.c
index ca25b67..db4625a 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -377,10 +377,9 @@ void zswap_folio_swapin(struct folio *folio)
{
struct lruvec *lruvec;
- if (folio) {
- lruvec = folio_lruvec(folio);
- atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
- }
+ VM_WARN_ON_ONCE(!folio_test_locked(folio));
+ lruvec = folio_lruvec(folio);
+ atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
}
/*********************************
@@ -536,10 +535,6 @@ static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
*/
static void zswap_free_entry(struct zswap_entry *entry)
{
- if (entry->objcg) {
- obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
- obj_cgroup_put(entry->objcg);
- }
if (!entry->length)
atomic_dec(&zswap_same_filled_pages);
else {
@@ -548,6 +543,10 @@ static void zswap_free_entry(struct zswap_entry *entry)
atomic_dec(&entry->pool->nr_stored);
zswap_pool_put(entry->pool);
}
+ if (entry->objcg) {
+ obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
+ obj_cgroup_put(entry->objcg);
+ }
zswap_entry_cache_free(entry);
atomic_dec(&zswap_stored_pages);
zswap_update_total_size();
@@ -895,10 +894,8 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
* into the warmer region. We should terminate shrinking (if we're in the dynamic
* shrinker context).
*/
- if (writeback_result == -EEXIST && encountered_page_in_swapcache) {
- ret = LRU_SKIP;
+ if (writeback_result == -EEXIST && encountered_page_in_swapcache)
*encountered_page_in_swapcache = true;
- }
goto put_unlock;
}
@@ -1442,6 +1439,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
spin_unlock(&tree->lock);
delete_from_swap_cache(folio);
+ folio_unlock(folio);
+ folio_put(folio);
return -ENOMEM;
}
spin_unlock(&tree->lock);
@@ -1519,7 +1518,7 @@ bool zswap_store(struct folio *folio)
if (folio_test_large(folio))
return false;
- if (!zswap_enabled || !tree)
+ if (!tree)
return false;
/*
@@ -1534,6 +1533,10 @@ bool zswap_store(struct folio *folio)
zswap_invalidate_entry(tree, dupentry);
}
spin_unlock(&tree->lock);
+
+ if (!zswap_enabled)
+ return false;
+
objcg = get_obj_cgroup_from_folio(folio);
if (objcg && !obj_cgroup_may_zswap(objcg)) {
memcg = get_mem_cgroup_from_objcg(objcg);
diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
index 7b3341c..850d4a1 100644
--- a/net/6lowpan/core.c
+++ b/net/6lowpan/core.c
@@ -179,4 +179,5 @@ static void __exit lowpan_module_exit(void)
module_init(lowpan_module_init);
module_exit(lowpan_module_exit);
+MODULE_DESCRIPTION("IPv6 over Low-Power Wireless Personal Area Network core module");
MODULE_LICENSE("GPL");
diff --git a/net/atm/mpc.c b/net/atm/mpc.c
index 033871e..324e3ab 100644
--- a/net/atm/mpc.c
+++ b/net/atm/mpc.c
@@ -1532,4 +1532,5 @@ static void __exit atm_mpoa_cleanup(void)
module_init(atm_mpoa_init);
module_exit(atm_mpoa_cleanup);
+MODULE_DESCRIPTION("Multi-Protocol Over ATM (MPOA) driver");
MODULE_LICENSE("GPL");
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 65601aa..2821a42 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1049,6 +1049,7 @@ static void hci_error_reset(struct work_struct *work)
{
struct hci_dev *hdev = container_of(work, struct hci_dev, error_reset);
+ hci_dev_hold(hdev);
BT_DBG("%s", hdev->name);
if (hdev->hw_error)
@@ -1056,10 +1057,10 @@ static void hci_error_reset(struct work_struct *work)
else
bt_dev_err(hdev, "hardware error 0x%2.2x", hdev->hw_error_code);
- if (hci_dev_do_close(hdev))
- return;
+ if (!hci_dev_do_close(hdev))
+ hci_dev_do_open(hdev);
- hci_dev_do_open(hdev);
+ hci_dev_put(hdev);
}
void hci_uuids_clear(struct hci_dev *hdev)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index ef8c3be..2a5f5a7 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -5329,9 +5329,12 @@ static void hci_io_capa_request_evt(struct hci_dev *hdev, void *data,
hci_dev_lock(hdev);
conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
- if (!conn || !hci_conn_ssp_enabled(conn))
+ if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED))
goto unlock;
+ /* Assume remote supports SSP since it has triggered this event */
+ set_bit(HCI_CONN_SSP_ENABLED, &conn->flags);
+
hci_conn_hold(conn);
if (!hci_dev_test_flag(hdev, HCI_MGMT))
@@ -6794,6 +6797,10 @@ static void hci_le_remote_conn_param_req_evt(struct hci_dev *hdev, void *data,
return send_conn_param_neg_reply(hdev, handle,
HCI_ERROR_UNKNOWN_CONN_ID);
+ if (max > hcon->le_conn_max_interval)
+ return send_conn_param_neg_reply(hdev, handle,
+ HCI_ERROR_INVALID_LL_PARAMS);
+
if (hci_check_conn_params(min, max, latency, timeout))
return send_conn_param_neg_reply(hdev, handle,
HCI_ERROR_INVALID_LL_PARAMS);
@@ -7420,10 +7427,10 @@ static void hci_store_wake_reason(struct hci_dev *hdev, u8 event,
* keep track of the bdaddr of the connection event that woke us up.
*/
if (event == HCI_EV_CONN_REQUEST) {
- bacpy(&hdev->wake_addr, &conn_complete->bdaddr);
+ bacpy(&hdev->wake_addr, &conn_request->bdaddr);
hdev->wake_addr_type = BDADDR_BREDR;
} else if (event == HCI_EV_CONN_COMPLETE) {
- bacpy(&hdev->wake_addr, &conn_request->bdaddr);
+ bacpy(&hdev->wake_addr, &conn_complete->bdaddr);
hdev->wake_addr_type = BDADDR_BREDR;
} else if (event == HCI_EV_LE_META) {
struct hci_ev_le_meta *le_ev = (void *)skb->data;
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index a6fc8a2..5716345 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -2206,8 +2206,11 @@ static int hci_le_add_accept_list_sync(struct hci_dev *hdev,
/* During suspend, only wakeable devices can be in acceptlist */
if (hdev->suspended &&
- !(params->flags & HCI_CONN_FLAG_REMOTE_WAKEUP))
+ !(params->flags & HCI_CONN_FLAG_REMOTE_WAKEUP)) {
+ hci_le_del_accept_list_sync(hdev, ¶ms->addr,
+ params->addr_type);
return 0;
+ }
/* Select filter policy to accept all advertising */
if (*num_entries >= hdev->le_accept_list_size)
@@ -5559,7 +5562,7 @@ static int hci_inquiry_sync(struct hci_dev *hdev, u8 length)
bt_dev_dbg(hdev, "");
- if (hci_dev_test_flag(hdev, HCI_INQUIRY))
+ if (test_bit(HCI_INQUIRY, &hdev->flags))
return 0;
hci_dev_lock(hdev);
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 6029897..656f49b 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -5613,7 +5613,13 @@ static inline int l2cap_conn_param_update_req(struct l2cap_conn *conn,
memset(&rsp, 0, sizeof(rsp));
- err = hci_check_conn_params(min, max, latency, to_multiplier);
+ if (max > hcon->le_conn_max_interval) {
+ BT_DBG("requested connection interval exceeds current bounds.");
+ err = -EINVAL;
+ } else {
+ err = hci_check_conn_params(min, max, latency, to_multiplier);
+ }
+
if (err)
rsp.result = cpu_to_le16(L2CAP_CONN_PARAM_REJECTED);
else
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index bb72ff6..ee3b4aa 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -1045,6 +1045,8 @@ static void rpa_expired(struct work_struct *work)
hci_cmd_sync_queue(hdev, rpa_expired_sync, NULL, NULL);
}
+static int set_discoverable_sync(struct hci_dev *hdev, void *data);
+
static void discov_off(struct work_struct *work)
{
struct hci_dev *hdev = container_of(work, struct hci_dev,
@@ -1063,7 +1065,7 @@ static void discov_off(struct work_struct *work)
hci_dev_clear_flag(hdev, HCI_DISCOVERABLE);
hdev->discov_timeout = 0;
- hci_update_discoverable(hdev);
+ hci_cmd_sync_queue(hdev, set_discoverable_sync, NULL, NULL);
mgmt_new_settings(hdev);
diff --git a/net/bluetooth/rfcomm/core.c b/net/bluetooth/rfcomm/core.c
index 053ef8f..1d34d84 100644
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -1941,7 +1941,7 @@ static struct rfcomm_session *rfcomm_process_rx(struct rfcomm_session *s)
/* Get data directly from socket receive queue without copying it. */
while ((skb = skb_dequeue(&sk->sk_receive_queue))) {
skb_orphan(skb);
- if (!skb_linearize(skb)) {
+ if (!skb_linearize(skb) && sk->sk_state != BT_CLOSED) {
s = rfcomm_recv_frame(s, skb);
if (!s)
break;
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index ed17208..35e10c5 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -43,6 +43,10 @@
#include <linux/sysctl.h>
#endif
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack_core.h>
+#endif
+
static unsigned int brnf_net_id __read_mostly;
struct brnf_net {
@@ -553,6 +557,90 @@ static unsigned int br_nf_pre_routing(void *priv,
return NF_STOLEN;
}
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+/* conntracks' nf_confirm logic cannot handle cloned skbs referencing
+ * the same nf_conn entry, which will happen for multicast (broadcast)
+ * Frames on bridges.
+ *
+ * Example:
+ * macvlan0
+ * br0
+ * ethX ethY
+ *
+ * ethX (or Y) receives multicast or broadcast packet containing
+ * an IP packet, not yet in conntrack table.
+ *
+ * 1. skb passes through bridge and fake-ip (br_netfilter)Prerouting.
+ * -> skb->_nfct now references a unconfirmed entry
+ * 2. skb is broad/mcast packet. bridge now passes clones out on each bridge
+ * interface.
+ * 3. skb gets passed up the stack.
+ * 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb
+ * and schedules a work queue to send them out on the lower devices.
+ *
+ * The clone skb->_nfct is not a copy, it is the same entry as the
+ * original skb. The macvlan rx handler then returns RX_HANDLER_PASS.
+ * 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
+ *
+ * The Macvlan broadcast worker and normal confirm path will race.
+ *
+ * This race will not happen if step 2 already confirmed a clone. In that
+ * case later steps perform skb_clone() with skb->_nfct already confirmed (in
+ * hash table). This works fine.
+ *
+ * But such confirmation won't happen when eb/ip/nftables rules dropped the
+ * packets before they reached the nf_confirm step in postrouting.
+ *
+ * Work around this problem by explicit confirmation of the entry at
+ * LOCAL_IN time, before upper layer has a chance to clone the unconfirmed
+ * entry.
+ *
+ */
+static unsigned int br_nf_local_in(void *priv,
+ struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ struct nf_conntrack *nfct = skb_nfct(skb);
+ const struct nf_ct_hook *ct_hook;
+ struct nf_conn *ct;
+ int ret;
+
+ if (!nfct || skb->pkt_type == PACKET_HOST)
+ return NF_ACCEPT;
+
+ ct = container_of(nfct, struct nf_conn, ct_general);
+ if (likely(nf_ct_is_confirmed(ct)))
+ return NF_ACCEPT;
+
+ WARN_ON_ONCE(skb_shared(skb));
+ WARN_ON_ONCE(refcount_read(&nfct->use) != 1);
+
+ /* We can't call nf_confirm here, it would create a dependency
+ * on nf_conntrack module.
+ */
+ ct_hook = rcu_dereference(nf_ct_hook);
+ if (!ct_hook) {
+ skb->_nfct = 0ul;
+ nf_conntrack_put(nfct);
+ return NF_ACCEPT;
+ }
+
+ nf_bridge_pull_encap_header(skb);
+ ret = ct_hook->confirm(skb);
+ switch (ret & NF_VERDICT_MASK) {
+ case NF_STOLEN:
+ return NF_STOLEN;
+ default:
+ nf_bridge_push_encap_header(skb);
+ break;
+ }
+
+ ct = container_of(nfct, struct nf_conn, ct_general);
+ WARN_ON_ONCE(!nf_ct_is_confirmed(ct));
+
+ return ret;
+}
+#endif
/* PF_BRIDGE/FORWARD *************************************************/
static int br_nf_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
@@ -964,6 +1052,14 @@ static const struct nf_hook_ops br_nf_ops[] = {
.hooknum = NF_BR_PRE_ROUTING,
.priority = NF_BR_PRI_BRNF,
},
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+ {
+ .hook = br_nf_local_in,
+ .pf = NFPROTO_BRIDGE,
+ .hooknum = NF_BR_LOCAL_IN,
+ .priority = NF_BR_PRI_LAST,
+ },
+#endif
{
.hook = br_nf_forward,
.pf = NFPROTO_BRIDGE,
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index ee84e78..7b41ee8 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -595,21 +595,40 @@ br_switchdev_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
}
static int br_switchdev_mdb_queue_one(struct list_head *mdb_list,
+ struct net_device *dev,
+ unsigned long action,
enum switchdev_obj_id id,
const struct net_bridge_mdb_entry *mp,
struct net_device *orig_dev)
{
- struct switchdev_obj_port_mdb *mdb;
+ struct switchdev_obj_port_mdb mdb = {
+ .obj = {
+ .id = id,
+ .orig_dev = orig_dev,
+ },
+ };
+ struct switchdev_obj_port_mdb *pmdb;
- mdb = kzalloc(sizeof(*mdb), GFP_ATOMIC);
- if (!mdb)
+ br_switchdev_mdb_populate(&mdb, mp);
+
+ if (action == SWITCHDEV_PORT_OBJ_ADD &&
+ switchdev_port_obj_act_is_deferred(dev, action, &mdb.obj)) {
+ /* This event is already in the deferred queue of
+ * events, so this replay must be elided, lest the
+ * driver receives duplicate events for it. This can
+ * only happen when replaying additions, since
+ * modifications are always immediately visible in
+ * br->mdb_list, whereas actual event delivery may be
+ * delayed.
+ */
+ return 0;
+ }
+
+ pmdb = kmemdup(&mdb, sizeof(mdb), GFP_ATOMIC);
+ if (!pmdb)
return -ENOMEM;
- mdb->obj.id = id;
- mdb->obj.orig_dev = orig_dev;
- br_switchdev_mdb_populate(mdb, mp);
- list_add_tail(&mdb->obj.list, mdb_list);
-
+ list_add_tail(&pmdb->obj.list, mdb_list);
return 0;
}
@@ -677,51 +696,50 @@ br_switchdev_mdb_replay(struct net_device *br_dev, struct net_device *dev,
if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
return 0;
- /* We cannot walk over br->mdb_list protected just by the rtnl_mutex,
- * because the write-side protection is br->multicast_lock. But we
- * need to emulate the [ blocking ] calling context of a regular
- * switchdev event, so since both br->multicast_lock and RCU read side
- * critical sections are atomic, we have no choice but to pick the RCU
- * read side lock, queue up all our events, leave the critical section
- * and notify switchdev from blocking context.
- */
- rcu_read_lock();
+ if (adding)
+ action = SWITCHDEV_PORT_OBJ_ADD;
+ else
+ action = SWITCHDEV_PORT_OBJ_DEL;
- hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {
+ /* br_switchdev_mdb_queue_one() will take care to not queue a
+ * replay of an event that is already pending in the switchdev
+ * deferred queue. In order to safely determine that, there
+ * must be no new deferred MDB notifications enqueued for the
+ * duration of the MDB scan. Therefore, grab the write-side
+ * lock to avoid racing with any concurrent IGMP/MLD snooping.
+ */
+ spin_lock_bh(&br->multicast_lock);
+
+ hlist_for_each_entry(mp, &br->mdb_list, mdb_node) {
struct net_bridge_port_group __rcu * const *pp;
const struct net_bridge_port_group *p;
if (mp->host_joined) {
- err = br_switchdev_mdb_queue_one(&mdb_list,
+ err = br_switchdev_mdb_queue_one(&mdb_list, dev, action,
SWITCHDEV_OBJ_ID_HOST_MDB,
mp, br_dev);
if (err) {
- rcu_read_unlock();
+ spin_unlock_bh(&br->multicast_lock);
goto out_free_mdb;
}
}
- for (pp = &mp->ports; (p = rcu_dereference(*pp)) != NULL;
+ for (pp = &mp->ports; (p = mlock_dereference(*pp, br)) != NULL;
pp = &p->next) {
if (p->key.port->dev != dev)
continue;
- err = br_switchdev_mdb_queue_one(&mdb_list,
+ err = br_switchdev_mdb_queue_one(&mdb_list, dev, action,
SWITCHDEV_OBJ_ID_PORT_MDB,
mp, dev);
if (err) {
- rcu_read_unlock();
+ spin_unlock_bh(&br->multicast_lock);
goto out_free_mdb;
}
}
}
- rcu_read_unlock();
-
- if (adding)
- action = SWITCHDEV_PORT_OBJ_ADD;
- else
- action = SWITCHDEV_PORT_OBJ_DEL;
+ spin_unlock_bh(&br->multicast_lock);
list_for_each_entry(obj, &mdb_list, list) {
err = br_switchdev_mdb_replay_one(nb, dev,
@@ -786,6 +804,16 @@ static void nbp_switchdev_unsync_objs(struct net_bridge_port *p,
br_switchdev_mdb_replay(br_dev, dev, ctx, false, blocking_nb, NULL);
br_switchdev_vlan_replay(br_dev, ctx, false, blocking_nb, NULL);
+
+ /* Make sure that the device leaving this bridge has seen all
+ * relevant events before it is disassociated. In the normal
+ * case, when the device is directly attached to the bridge,
+ * this is covered by del_nbp(). If the association was indirect
+ * however, e.g. via a team or bond, and the device is leaving
+ * that intermediate device, then the bridge port remains in
+ * place.
+ */
+ switchdev_deferred_process();
}
/* Let the bridge know that this port is offloaded, so that it can assign a
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index abb090f..6f877e3 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -291,6 +291,30 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
return nf_conntrack_in(skb, &bridge_state);
}
+static unsigned int nf_ct_bridge_in(void *priv, struct sk_buff *skb,
+ const struct nf_hook_state *state)
+{
+ enum ip_conntrack_info ctinfo;
+ struct nf_conn *ct;
+
+ if (skb->pkt_type == PACKET_HOST)
+ return NF_ACCEPT;
+
+ /* nf_conntrack_confirm() cannot handle concurrent clones,
+ * this happens for broad/multicast frames with e.g. macvlan on top
+ * of the bridge device.
+ */
+ ct = nf_ct_get(skb, &ctinfo);
+ if (!ct || nf_ct_is_confirmed(ct) || nf_ct_is_template(ct))
+ return NF_ACCEPT;
+
+ /* let inet prerouting call conntrack again */
+ skb->_nfct = 0;
+ nf_ct_put(ct);
+
+ return NF_ACCEPT;
+}
+
static void nf_ct_bridge_frag_save(struct sk_buff *skb,
struct nf_bridge_frag_data *data)
{
@@ -386,6 +410,12 @@ static struct nf_hook_ops nf_ct_bridge_hook_ops[] __read_mostly = {
.priority = NF_IP_PRI_CONNTRACK,
},
{
+ .hook = nf_ct_bridge_in,
+ .pf = NFPROTO_BRIDGE,
+ .hooknum = NF_BR_LOCAL_IN,
+ .priority = NF_IP_PRI_CONNTRACK_CONFIRM,
+ },
+ {
.hook = nf_ct_bridge_post,
.pf = NFPROTO_BRIDGE,
.hooknum = NF_BR_POST_ROUTING,
diff --git a/net/can/j1939/j1939-priv.h b/net/can/j1939/j1939-priv.h
index 16af1a7..31a93ca 100644
--- a/net/can/j1939/j1939-priv.h
+++ b/net/can/j1939/j1939-priv.h
@@ -86,7 +86,7 @@ struct j1939_priv {
unsigned int tp_max_packet_size;
/* lock for j1939_socks list */
- spinlock_t j1939_socks_lock;
+ rwlock_t j1939_socks_lock;
struct list_head j1939_socks;
struct kref rx_kref;
@@ -301,6 +301,7 @@ struct j1939_sock {
int ifindex;
struct j1939_addr addr;
+ spinlock_t filters_lock;
struct j1939_filter *filters;
int nfilters;
pgn_t pgn_rx_filter;
diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
index ecff1c9..a6fb89f 100644
--- a/net/can/j1939/main.c
+++ b/net/can/j1939/main.c
@@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
return ERR_PTR(-ENOMEM);
j1939_tp_init(priv);
- spin_lock_init(&priv->j1939_socks_lock);
+ rwlock_init(&priv->j1939_socks_lock);
INIT_LIST_HEAD(&priv->j1939_socks);
mutex_lock(&j1939_netdev_lock);
diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index 14c4316..305dd72 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_priv *priv, struct j1939_sock *jsk)
jsk->state |= J1939_SOCK_BOUND;
j1939_priv_get(priv);
- spin_lock_bh(&priv->j1939_socks_lock);
+ write_lock_bh(&priv->j1939_socks_lock);
list_add_tail(&jsk->list, &priv->j1939_socks);
- spin_unlock_bh(&priv->j1939_socks_lock);
+ write_unlock_bh(&priv->j1939_socks_lock);
}
static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk)
{
- spin_lock_bh(&priv->j1939_socks_lock);
+ write_lock_bh(&priv->j1939_socks_lock);
list_del_init(&jsk->list);
- spin_unlock_bh(&priv->j1939_socks_lock);
+ write_unlock_bh(&priv->j1939_socks_lock);
j1939_priv_put(priv);
jsk->state &= ~J1939_SOCK_BOUND;
@@ -262,12 +262,17 @@ static bool j1939_sk_match_dst(struct j1939_sock *jsk,
static bool j1939_sk_match_filter(struct j1939_sock *jsk,
const struct j1939_sk_buff_cb *skcb)
{
- const struct j1939_filter *f = jsk->filters;
- int nfilter = jsk->nfilters;
+ const struct j1939_filter *f;
+ int nfilter;
+
+ spin_lock_bh(&jsk->filters_lock);
+
+ f = jsk->filters;
+ nfilter = jsk->nfilters;
if (!nfilter)
/* receive all when no filters are assigned */
- return true;
+ goto filter_match_found;
for (; nfilter; ++f, --nfilter) {
if ((skcb->addr.pgn & f->pgn_mask) != f->pgn)
@@ -276,9 +281,15 @@ static bool j1939_sk_match_filter(struct j1939_sock *jsk,
continue;
if ((skcb->addr.src_name & f->name_mask) != f->name)
continue;
- return true;
+ goto filter_match_found;
}
+
+ spin_unlock_bh(&jsk->filters_lock);
return false;
+
+filter_match_found:
+ spin_unlock_bh(&jsk->filters_lock);
+ return true;
}
static bool j1939_sk_recv_match_one(struct j1939_sock *jsk,
@@ -329,13 +340,13 @@ bool j1939_sk_recv_match(struct j1939_priv *priv, struct j1939_sk_buff_cb *skcb)
struct j1939_sock *jsk;
bool match = false;
- spin_lock_bh(&priv->j1939_socks_lock);
+ read_lock_bh(&priv->j1939_socks_lock);
list_for_each_entry(jsk, &priv->j1939_socks, list) {
match = j1939_sk_recv_match_one(jsk, skcb);
if (match)
break;
}
- spin_unlock_bh(&priv->j1939_socks_lock);
+ read_unlock_bh(&priv->j1939_socks_lock);
return match;
}
@@ -344,11 +355,11 @@ void j1939_sk_recv(struct j1939_priv *priv, struct sk_buff *skb)
{
struct j1939_sock *jsk;
- spin_lock_bh(&priv->j1939_socks_lock);
+ read_lock_bh(&priv->j1939_socks_lock);
list_for_each_entry(jsk, &priv->j1939_socks, list) {
j1939_sk_recv_one(jsk, skb);
}
- spin_unlock_bh(&priv->j1939_socks_lock);
+ read_unlock_bh(&priv->j1939_socks_lock);
}
static void j1939_sk_sock_destruct(struct sock *sk)
@@ -401,6 +412,7 @@ static int j1939_sk_init(struct sock *sk)
atomic_set(&jsk->skb_pending, 0);
spin_lock_init(&jsk->sk_session_queue_lock);
INIT_LIST_HEAD(&jsk->sk_session_queue);
+ spin_lock_init(&jsk->filters_lock);
/* j1939_sk_sock_destruct() depends on SOCK_RCU_FREE flag */
sock_set_flag(sk, SOCK_RCU_FREE);
@@ -703,9 +715,11 @@ static int j1939_sk_setsockopt(struct socket *sock, int level, int optname,
}
lock_sock(&jsk->sk);
+ spin_lock_bh(&jsk->filters_lock);
ofilters = jsk->filters;
jsk->filters = filters;
jsk->nfilters = count;
+ spin_unlock_bh(&jsk->filters_lock);
release_sock(&jsk->sk);
kfree(ofilters);
return 0;
@@ -1080,12 +1094,12 @@ void j1939_sk_errqueue(struct j1939_session *session,
}
/* spread RX notifications to all sockets subscribed to this session */
- spin_lock_bh(&priv->j1939_socks_lock);
+ read_lock_bh(&priv->j1939_socks_lock);
list_for_each_entry(jsk, &priv->j1939_socks, list) {
if (j1939_sk_recv_match_one(jsk, &session->skcb))
__j1939_sk_errqueue(session, &jsk->sk, type);
}
- spin_unlock_bh(&priv->j1939_socks_lock);
+ read_unlock_bh(&priv->j1939_socks_lock);
};
void j1939_sk_send_loop_abort(struct sock *sk, int err)
@@ -1273,7 +1287,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv)
struct j1939_sock *jsk;
int error_code = ENETDOWN;
- spin_lock_bh(&priv->j1939_socks_lock);
+ read_lock_bh(&priv->j1939_socks_lock);
list_for_each_entry(jsk, &priv->j1939_socks, list) {
jsk->sk.sk_err = error_code;
if (!sock_flag(&jsk->sk, SOCK_DEAD))
@@ -1281,7 +1295,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv)
j1939_sk_queue_drop_all(priv, jsk, error_code);
}
- spin_unlock_bh(&priv->j1939_socks_lock);
+ read_unlock_bh(&priv->j1939_socks_lock);
}
static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd,
diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
index f9a50d7..0cb61c7 100644
--- a/net/ceph/messenger_v1.c
+++ b/net/ceph/messenger_v1.c
@@ -160,8 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
{
/* Initialize data cursor if it's not a sparse read */
- if (!msg->sparse_read)
- ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
+ u64 len = msg->sparse_read_total ? : data_len;
+
+ ceph_msg_data_cursor_init(&msg->cursor, msg, len);
}
/*
@@ -991,7 +992,7 @@ static inline int read_partial_message_section(struct ceph_connection *con,
return read_partial_message_chunk(con, section, sec_len, crc);
}
-static int read_sparse_msg_extent(struct ceph_connection *con, u32 *crc)
+static int read_partial_sparse_msg_extent(struct ceph_connection *con, u32 *crc)
{
struct ceph_msg_data_cursor *cursor = &con->in_msg->cursor;
bool do_bounce = ceph_test_opt(from_msgr(con->msgr), RXBOUNCE);
@@ -1026,7 +1027,7 @@ static int read_sparse_msg_extent(struct ceph_connection *con, u32 *crc)
return 1;
}
-static int read_sparse_msg_data(struct ceph_connection *con)
+static int read_partial_sparse_msg_data(struct ceph_connection *con)
{
struct ceph_msg_data_cursor *cursor = &con->in_msg->cursor;
bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
@@ -1036,31 +1037,31 @@ static int read_sparse_msg_data(struct ceph_connection *con)
if (do_datacrc)
crc = con->in_data_crc;
- do {
+ while (cursor->total_resid) {
if (con->v1.in_sr_kvec.iov_base)
ret = read_partial_message_chunk(con,
&con->v1.in_sr_kvec,
con->v1.in_sr_len,
&crc);
else if (cursor->sr_resid > 0)
- ret = read_sparse_msg_extent(con, &crc);
-
- if (ret <= 0) {
- if (do_datacrc)
- con->in_data_crc = crc;
- return ret;
- }
+ ret = read_partial_sparse_msg_extent(con, &crc);
+ if (ret <= 0)
+ break;
memset(&con->v1.in_sr_kvec, 0, sizeof(con->v1.in_sr_kvec));
ret = con->ops->sparse_read(con, cursor,
(char **)&con->v1.in_sr_kvec.iov_base);
+ if (ret <= 0) {
+ ret = ret ? ret : 1; /* must return > 0 to indicate success */
+ break;
+ }
con->v1.in_sr_len = ret;
- } while (ret > 0);
+ }
if (do_datacrc)
con->in_data_crc = crc;
- return ret < 0 ? ret : 1; /* must return > 0 to indicate success */
+ return ret;
}
static int read_partial_msg_data(struct ceph_connection *con)
@@ -1253,8 +1254,8 @@ static int read_partial_message(struct ceph_connection *con)
if (!m->num_data_items)
return -EIO;
- if (m->sparse_read)
- ret = read_sparse_msg_data(con);
+ if (m->sparse_read_total)
+ ret = read_partial_sparse_msg_data(con);
else if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE))
ret = read_partial_msg_data_bounce(con);
else
diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index f8ec60e..bd608ff 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -1128,7 +1128,7 @@ static int decrypt_tail(struct ceph_connection *con)
struct sg_table enc_sgt = {};
struct sg_table sgt = {};
struct page **pages = NULL;
- bool sparse = con->in_msg->sparse_read;
+ bool sparse = !!con->in_msg->sparse_read_total;
int dpos = 0;
int tail_len;
int ret;
@@ -2034,6 +2034,9 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
if (!con_secure(con))
con->in_data_crc = -1;
+ ceph_msg_data_cursor_init(&con->v2.in_cursor, msg,
+ msg->sparse_read_total);
+
reset_in_kvecs(con);
con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
con->v2.data_len_remain = data_len(msg);
@@ -2060,7 +2063,7 @@ static int prepare_read_tail_plain(struct ceph_connection *con)
}
if (data_len(msg)) {
- if (msg->sparse_read)
+ if (msg->sparse_read_total)
con->v2.in_state = IN_S_PREPARE_SPARSE_DATA;
else
con->v2.in_state = IN_S_PREPARE_READ_DATA;
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 6256220..9d078b3 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5510,7 +5510,7 @@ static struct ceph_msg *get_reply(struct ceph_connection *con,
}
m = ceph_msg_get(req->r_reply);
- m->sparse_read = (bool)srlen;
+ m->sparse_read_total = srlen;
dout("get_reply tid %lld %p\n", tid, m);
@@ -5777,11 +5777,8 @@ static int prep_next_sparse_read(struct ceph_connection *con,
}
if (o->o_sparse_op_idx < 0) {
- u64 srlen = sparse_data_requested(req);
-
- dout("%s: [%d] starting new sparse read req. srlen=0x%llx\n",
- __func__, o->o_osd, srlen);
- ceph_msg_data_cursor_init(cursor, con->in_msg, srlen);
+ dout("%s: [%d] starting new sparse read req\n",
+ __func__, o->o_osd);
} else {
u64 end;
@@ -5857,8 +5854,8 @@ static int osd_sparse_read(struct ceph_connection *con,
struct ceph_osd *o = con->private;
struct ceph_sparse_read *sr = &o->o_sparse_read;
u32 count = sr->sr_count;
- u64 eoff, elen;
- int ret;
+ u64 eoff, elen, len = 0;
+ int i, ret;
switch (sr->sr_state) {
case CEPH_SPARSE_READ_HDR:
@@ -5903,8 +5900,20 @@ static int osd_sparse_read(struct ceph_connection *con,
convert_extent_map(sr);
ret = sizeof(sr->sr_datalen);
*pbuf = (char *)&sr->sr_datalen;
- sr->sr_state = CEPH_SPARSE_READ_DATA;
+ sr->sr_state = CEPH_SPARSE_READ_DATA_PRE;
break;
+ case CEPH_SPARSE_READ_DATA_PRE:
+ /* Convert sr_datalen to host-endian */
+ sr->sr_datalen = le32_to_cpu((__force __le32)sr->sr_datalen);
+ for (i = 0; i < count; i++)
+ len += sr->sr_extent[i].len;
+ if (sr->sr_datalen != len) {
+ pr_warn_ratelimited("data len %u != extent len %llu\n",
+ sr->sr_datalen, len);
+ return -EREMOTEIO;
+ }
+ sr->sr_state = CEPH_SPARSE_READ_DATA;
+ fallthrough;
case CEPH_SPARSE_READ_DATA:
if (sr->sr_index >= count) {
sr->sr_state = CEPH_SPARSE_READ_HDR;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 103d46f..a8b625a 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -751,7 +751,7 @@ size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
size_t len, void *from, void *priv2)
{
__wsum *csum = priv2;
- __wsum next = csum_partial_copy_nocheck(from, iter_to, len);
+ __wsum next = csum_partial_copy_nocheck(from + progress, iter_to, len);
*csum = csum_block_add(*csum, next, progress);
return 0;
diff --git a/net/core/dev.c b/net/core/dev.c
index cb2dab0..a892f72 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -336,7 +336,7 @@ int netdev_name_node_alt_create(struct net_device *dev, const char *name)
return -ENOMEM;
netdev_name_node_add(net, name_node);
/* The node that holds dev->name acts as a head of per-device list. */
- list_add_tail(&name_node->list, &dev->name_node->list);
+ list_add_tail_rcu(&name_node->list, &dev->name_node->list);
return 0;
}
@@ -6177,8 +6177,13 @@ static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
clear_bit(NAPI_STATE_SCHED, &napi->state);
}
-static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool prefer_busy_poll,
- u16 budget)
+enum {
+ NAPI_F_PREFER_BUSY_POLL = 1,
+ NAPI_F_END_ON_RESCHED = 2,
+};
+
+static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
+ unsigned flags, u16 budget)
{
bool skip_schedule = false;
unsigned long timeout;
@@ -6198,7 +6203,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
local_bh_disable();
- if (prefer_busy_poll) {
+ if (flags & NAPI_F_PREFER_BUSY_POLL) {
napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs);
timeout = READ_ONCE(napi->dev->gro_flush_timeout);
if (napi->defer_hard_irqs_count && timeout) {
@@ -6222,23 +6227,23 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
local_bh_enable();
}
-void napi_busy_loop(unsigned int napi_id,
- bool (*loop_end)(void *, unsigned long),
- void *loop_end_arg, bool prefer_busy_poll, u16 budget)
+static void __napi_busy_loop(unsigned int napi_id,
+ bool (*loop_end)(void *, unsigned long),
+ void *loop_end_arg, unsigned flags, u16 budget)
{
unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
int (*napi_poll)(struct napi_struct *napi, int budget);
void *have_poll_lock = NULL;
struct napi_struct *napi;
+ WARN_ON_ONCE(!rcu_read_lock_held());
+
restart:
napi_poll = NULL;
- rcu_read_lock();
-
napi = napi_by_id(napi_id);
if (!napi)
- goto out;
+ return;
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
preempt_disable();
@@ -6254,14 +6259,14 @@ void napi_busy_loop(unsigned int napi_id,
*/
if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
NAPIF_STATE_IN_BUSY_POLL)) {
- if (prefer_busy_poll)
+ if (flags & NAPI_F_PREFER_BUSY_POLL)
set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
goto count;
}
if (cmpxchg(&napi->state, val,
val | NAPIF_STATE_IN_BUSY_POLL |
NAPIF_STATE_SCHED) != val) {
- if (prefer_busy_poll)
+ if (flags & NAPI_F_PREFER_BUSY_POLL)
set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
goto count;
}
@@ -6281,12 +6286,15 @@ void napi_busy_loop(unsigned int napi_id,
break;
if (unlikely(need_resched())) {
+ if (flags & NAPI_F_END_ON_RESCHED)
+ break;
if (napi_poll)
- busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
+ busy_poll_stop(napi, have_poll_lock, flags, budget);
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
preempt_enable();
rcu_read_unlock();
cond_resched();
+ rcu_read_lock();
if (loop_end(loop_end_arg, start_time))
return;
goto restart;
@@ -6294,10 +6302,31 @@ void napi_busy_loop(unsigned int napi_id,
cpu_relax();
}
if (napi_poll)
- busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
+ busy_poll_stop(napi, have_poll_lock, flags, budget);
if (!IS_ENABLED(CONFIG_PREEMPT_RT))
preempt_enable();
-out:
+}
+
+void napi_busy_loop_rcu(unsigned int napi_id,
+ bool (*loop_end)(void *, unsigned long),
+ void *loop_end_arg, bool prefer_busy_poll, u16 budget)
+{
+ unsigned flags = NAPI_F_END_ON_RESCHED;
+
+ if (prefer_busy_poll)
+ flags |= NAPI_F_PREFER_BUSY_POLL;
+
+ __napi_busy_loop(napi_id, loop_end, loop_end_arg, flags, budget);
+}
+
+void napi_busy_loop(unsigned int napi_id,
+ bool (*loop_end)(void *, unsigned long),
+ void *loop_end_arg, bool prefer_busy_poll, u16 budget)
+{
+ unsigned flags = prefer_busy_poll ? NAPI_F_PREFER_BUSY_POLL : 0;
+
+ rcu_read_lock();
+ __napi_busy_loop(napi_id, loop_end, loop_end_arg, flags, budget);
rcu_read_unlock();
}
EXPORT_SYMBOL(napi_busy_loop);
@@ -9074,28 +9103,6 @@ bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b)
}
EXPORT_SYMBOL(netdev_port_same_parent_id);
-static void netdev_dpll_pin_assign(struct net_device *dev, struct dpll_pin *dpll_pin)
-{
-#if IS_ENABLED(CONFIG_DPLL)
- rtnl_lock();
- dev->dpll_pin = dpll_pin;
- rtnl_unlock();
-#endif
-}
-
-void netdev_dpll_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)
-{
- WARN_ON(!dpll_pin);
- netdev_dpll_pin_assign(dev, dpll_pin);
-}
-EXPORT_SYMBOL(netdev_dpll_pin_set);
-
-void netdev_dpll_pin_clear(struct net_device *dev)
-{
- netdev_dpll_pin_assign(dev, NULL);
-}
-EXPORT_SYMBOL(netdev_dpll_pin_clear);
-
/**
* dev_change_proto_down - set carrier according to proto_down.
*
@@ -11652,11 +11659,12 @@ static void __init net_dev_struct_check(void)
CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_tx, 160);
/* TXRX read-mostly hotpath */
+ CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, lstats);
CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, flags);
CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, hard_header_len);
CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, features);
CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, ip6_ptr);
- CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_txrx, 30);
+ CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_txrx, 38);
/* RX read-mostly hotpath */
CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, ptype_specific);
diff --git a/net/core/gso_test.c b/net/core/gso_test.c
index 4c2e77b..358c446 100644
--- a/net/core/gso_test.c
+++ b/net/core/gso_test.c
@@ -225,7 +225,7 @@ static void gso_test_func(struct kunit *test)
segs = skb_segment(skb, features);
if (IS_ERR(segs)) {
- KUNIT_FAIL(test, "segs error %lld", PTR_ERR(segs));
+ KUNIT_FAIL(test, "segs error %pe", segs);
goto free_gso_skb;
} else if (!segs) {
KUNIT_FAIL(test, "no segments");
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index ffe5244..278294a 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -94,11 +94,12 @@ netdev_nl_page_pool_get_dump(struct sk_buff *skb, struct netlink_callback *cb,
state->pp_id = pool->user.id;
err = fill(skb, pool, info);
if (err)
- break;
+ goto out;
}
state->pp_id = 0;
}
+out:
mutex_unlock(&page_pools_lock);
rtnl_unlock();
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f6f29eb0..bd50e9f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1020,14 +1020,17 @@ static size_t rtnl_xdp_size(void)
static size_t rtnl_prop_list_size(const struct net_device *dev)
{
struct netdev_name_node *name_node;
- size_t size;
+ unsigned int cnt = 0;
- if (list_empty(&dev->name_node->list))
+ rcu_read_lock();
+ list_for_each_entry_rcu(name_node, &dev->name_node->list, list)
+ cnt++;
+ rcu_read_unlock();
+
+ if (!cnt)
return 0;
- size = nla_total_size(0);
- list_for_each_entry(name_node, &dev->name_node->list, list)
- size += nla_total_size(ALTIFNAMSIZ);
- return size;
+
+ return nla_total_size(0) + cnt * nla_total_size(ALTIFNAMSIZ);
}
static size_t rtnl_proto_down_size(const struct net_device *dev)
@@ -1054,7 +1057,7 @@ static size_t rtnl_dpll_pin_size(const struct net_device *dev)
{
size_t size = nla_total_size(0); /* nest IFLA_DPLL_PIN */
- size += dpll_msg_pin_handle_size(netdev_dpll_pin(dev));
+ size += dpll_netdev_pin_handle_size(dev);
return size;
}
@@ -1789,7 +1792,7 @@ static int rtnl_fill_dpll_pin(struct sk_buff *skb,
if (!dpll_pin_nest)
return -EMSGSIZE;
- ret = dpll_msg_add_pin_handle(skb, netdev_dpll_pin(dev));
+ ret = dpll_netdev_add_pin_handle(skb, dev);
if (ret < 0)
goto nest_cancel;
@@ -5166,10 +5169,9 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
struct net *net = sock_net(skb->sk);
struct ifinfomsg *ifm;
struct net_device *dev;
- struct nlattr *br_spec, *attr = NULL;
+ struct nlattr *br_spec, *attr, *br_flags_attr = NULL;
int rem, err = -EOPNOTSUPP;
u16 flags = 0;
- bool have_flags = false;
if (nlmsg_len(nlh) < sizeof(*ifm))
return -EINVAL;
@@ -5187,11 +5189,11 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
if (br_spec) {
nla_for_each_nested(attr, br_spec, rem) {
- if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !have_flags) {
+ if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !br_flags_attr) {
if (nla_len(attr) < sizeof(flags))
return -EINVAL;
- have_flags = true;
+ br_flags_attr = attr;
flags = nla_get_u16(attr);
}
@@ -5235,8 +5237,8 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
}
}
- if (have_flags)
- memcpy(nla_data(attr), &flags, sizeof(flags));
+ if (br_flags_attr)
+ memcpy(nla_data(br_flags_attr), &flags, sizeof(flags));
out:
return err;
}
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 93ecfce..4d75ef9 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1226,8 +1226,11 @@ static void sk_psock_verdict_data_ready(struct sock *sk)
rcu_read_lock();
psock = sk_psock(sk);
- if (psock)
- psock->saved_data_ready(sk);
+ if (psock) {
+ read_lock_bh(&sk->sk_callback_lock);
+ sk_psock_data_ready(sk, psock);
+ read_unlock_bh(&sk->sk_callback_lock);
+ }
rcu_read_unlock();
}
}
diff --git a/net/core/sock.c b/net/core/sock.c
index 0a7f46c..5e78798 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1188,6 +1188,17 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
*/
WRITE_ONCE(sk->sk_txrehash, (u8)val);
return 0;
+ case SO_PEEK_OFF:
+ {
+ int (*set_peek_off)(struct sock *sk, int val);
+
+ set_peek_off = READ_ONCE(sock->ops)->set_peek_off;
+ if (set_peek_off)
+ ret = set_peek_off(sk, val);
+ else
+ ret = -EOPNOTSUPP;
+ return ret;
+ }
}
sockopt_lock_sock(sk);
@@ -1430,18 +1441,6 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool);
break;
- case SO_PEEK_OFF:
- {
- int (*set_peek_off)(struct sock *sk, int val);
-
- set_peek_off = READ_ONCE(sock->ops)->set_peek_off;
- if (set_peek_off)
- ret = set_peek_off(sk, val);
- else
- ret = -EOPNOTSUPP;
- break;
- }
-
case SO_NOFCS:
sock_valbool_flag(sk, SOCK_NOFCS, valbool);
break;
diff --git a/net/devlink/core.c b/net/devlink/core.c
index 4275a2b..7f0b093 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -46,7 +46,7 @@ struct devlink_rel {
u32 obj_index;
devlink_rel_notify_cb_t *notify_cb;
devlink_rel_cleanup_cb_t *cleanup_cb;
- struct work_struct notify_work;
+ struct delayed_work notify_work;
} nested_in;
};
@@ -70,7 +70,7 @@ static void __devlink_rel_put(struct devlink_rel *rel)
static void devlink_rel_nested_in_notify_work(struct work_struct *work)
{
struct devlink_rel *rel = container_of(work, struct devlink_rel,
- nested_in.notify_work);
+ nested_in.notify_work.work);
struct devlink *devlink;
devlink = devlinks_xa_get(rel->nested_in.devlink_index);
@@ -96,13 +96,13 @@ static void devlink_rel_nested_in_notify_work(struct work_struct *work)
return;
reschedule_work:
- schedule_work(&rel->nested_in.notify_work);
+ schedule_delayed_work(&rel->nested_in.notify_work, 1);
}
static void devlink_rel_nested_in_notify_work_schedule(struct devlink_rel *rel)
{
__devlink_rel_get(rel);
- schedule_work(&rel->nested_in.notify_work);
+ schedule_delayed_work(&rel->nested_in.notify_work, 0);
}
static struct devlink_rel *devlink_rel_alloc(void)
@@ -123,8 +123,8 @@ static struct devlink_rel *devlink_rel_alloc(void)
}
refcount_set(&rel->refcount, 1);
- INIT_WORK(&rel->nested_in.notify_work,
- &devlink_rel_nested_in_notify_work);
+ INIT_DELAYED_WORK(&rel->nested_in.notify_work,
+ &devlink_rel_nested_in_notify_work);
return rel;
}
@@ -529,14 +529,20 @@ static int __init devlink_init(void)
{
int err;
- err = genl_register_family(&devlink_nl_family);
- if (err)
- goto out;
err = register_pernet_subsys(&devlink_pernet_ops);
if (err)
goto out;
+ err = genl_register_family(&devlink_nl_family);
+ if (err)
+ goto out_unreg_pernet_subsys;
err = register_netdevice_notifier(&devlink_port_netdevice_nb);
+ if (!err)
+ return 0;
+ genl_unregister_family(&devlink_nl_family);
+
+out_unreg_pernet_subsys:
+ unregister_pernet_subsys(&devlink_pernet_ops);
out:
WARN_ON(err);
return err;
diff --git a/net/devlink/port.c b/net/devlink/port.c
index 78592912..4b2d46c 100644
--- a/net/devlink/port.c
+++ b/net/devlink/port.c
@@ -583,7 +583,7 @@ devlink_nl_port_get_dump_one(struct sk_buff *msg, struct devlink *devlink,
xa_for_each_start(&devlink->ports, port_index, devlink_port, state->idx) {
err = devlink_nl_port_fill(msg, devlink_port,
- DEVLINK_CMD_NEW,
+ DEVLINK_CMD_PORT_NEW,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq, flags,
cb->extack);
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index 16ed7bf..34fd1d9 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -471,7 +471,10 @@ static void handshake_req_destroy_test1(struct kunit *test)
handshake_req_cancel(sock->sk);
/* Act */
- fput(filp);
+ /* Ensure the close/release/put process has run to
+ * completion before checking the result.
+ */
+ __fput_sync(filp);
/* Assert */
KUNIT_EXPECT_PTR_EQ(test, handshake_req_destroy_test, req);
diff --git a/net/hsr/hsr_forward.c b/net/hsr/hsr_forward.c
index 80cdc6f..5d68cb1 100644
--- a/net/hsr/hsr_forward.c
+++ b/net/hsr/hsr_forward.c
@@ -83,7 +83,7 @@ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb)
return false;
/* Get next tlv */
- total_length += sizeof(struct hsr_sup_tlv) + hsr_sup_tag->tlv.HSR_TLV_length;
+ total_length += hsr_sup_tag->tlv.HSR_TLV_length;
if (!pskb_may_pull(skb, total_length))
return false;
skb_pull(skb, total_length);
@@ -435,7 +435,7 @@ static void hsr_forward_do(struct hsr_frame_info *frame)
continue;
/* Don't send frame over port where it has been sent before.
- * Also fro SAN, this shouldn't be done.
+ * Also for SAN, this shouldn't be done.
*/
if (!frame->is_from_san &&
hsr_register_frame_out(port, frame->node_src,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 4e635dd..a5a820e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1628,10 +1628,12 @@ EXPORT_SYMBOL(inet_current_timestamp);
int inet_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
{
- if (sk->sk_family == AF_INET)
+ unsigned int family = READ_ONCE(sk->sk_family);
+
+ if (family == AF_INET)
return ip_recv_error(sk, msg, len, addr_len);
#if IS_ENABLED(CONFIG_IPV6)
- if (sk->sk_family == AF_INET6)
+ if (family == AF_INET6)
return pingv6_ops.ipv6_recv_error(sk, msg, len, addr_len);
#endif
return -EINVAL;
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index a2e6e1f..64aec3d 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -597,5 +597,6 @@ static void __exit ah4_fini(void)
module_init(ah4_init);
module_exit(ah4_fini);
+MODULE_DESCRIPTION("IPv4 AH transformation library");
MODULE_LICENSE("GPL");
MODULE_ALIAS_XFRM_TYPE(AF_INET, XFRM_PROTO_AH);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 9456f5b..0d0d725 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1125,7 +1125,8 @@ static int arp_req_get(struct arpreq *r, struct net_device *dev)
if (neigh) {
if (!(READ_ONCE(neigh->nud_state) & NUD_NOARP)) {
read_lock_bh(&neigh->lock);
- memcpy(r->arp_ha.sa_data, neigh->ha, dev->addr_len);
+ memcpy(r->arp_ha.sa_data, neigh->ha,
+ min(dev->addr_len, sizeof(r->arp_ha.sa_data_min)));
r->arp_flags = arp_state_to_flags(neigh);
read_unlock_bh(&neigh->lock);
r->arp_ha.sa_family = dev->type;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index ca0ff15..bc74f13 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1825,6 +1825,21 @@ static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb,
return err;
}
+/* Combine dev_addr_genid and dev_base_seq to detect changes.
+ */
+static u32 inet_base_seq(const struct net *net)
+{
+ u32 res = atomic_read(&net->ipv4.dev_addr_genid) +
+ net->dev_base_seq;
+
+ /* Must not return 0 (see nl_dump_check_consistent()).
+ * Chose a value far away from 0.
+ */
+ if (!res)
+ res = 0x80000000;
+ return res;
+}
+
static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
{
const struct nlmsghdr *nlh = cb->nlh;
@@ -1876,8 +1891,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
idx = 0;
head = &tgt_net->dev_index_head[h];
rcu_read_lock();
- cb->seq = atomic_read(&tgt_net->ipv4.dev_addr_genid) ^
- tgt_net->dev_base_seq;
+ cb->seq = inet_base_seq(tgt_net);
hlist_for_each_entry_rcu(dev, head, index_hlist) {
if (idx < s_idx)
goto cont;
@@ -2278,8 +2292,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
idx = 0;
head = &net->dev_index_head[h];
rcu_read_lock();
- cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^
- net->dev_base_seq;
+ cb->seq = inet_base_seq(net);
hlist_for_each_entry_rcu(dev, head, index_hlist) {
if (idx < s_idx)
goto cont;
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 4ccfc10..4dd9e50 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -1247,5 +1247,6 @@ static void __exit esp4_fini(void)
module_init(esp4_init);
module_exit(esp4_fini);
+MODULE_DESCRIPTION("IPv4 ESP transformation library");
MODULE_LICENSE("GPL");
MODULE_ALIAS_XFRM_TYPE(AF_INET, XFRM_PROTO_ESP);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 93e9193..308ff34 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -1130,10 +1130,33 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
return 0;
error:
+ if (sk_hashed(sk)) {
+ spinlock_t *lock = inet_ehash_lockp(hinfo, sk->sk_hash);
+
+ sock_prot_inuse_add(net, sk->sk_prot, -1);
+
+ spin_lock(lock);
+ sk_nulls_del_node_init_rcu(sk);
+ spin_unlock(lock);
+
+ sk->sk_hash = 0;
+ inet_sk(sk)->inet_sport = 0;
+ inet_sk(sk)->inet_num = 0;
+
+ if (tw)
+ inet_twsk_bind_unhash(tw, hinfo);
+ }
+
spin_unlock(&head2->lock);
if (tb_created)
inet_bind_bucket_destroy(hinfo->bind_bucket_cachep, tb);
- spin_unlock_bh(&head->lock);
+ spin_unlock(&head->lock);
+
+ if (tw)
+ inet_twsk_deschedule_put(tw);
+
+ local_bh_enable();
+
return -ENOMEM;
}
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 5169c3c..6b9cf5a 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1793,6 +1793,7 @@ static void __exit ipgre_fini(void)
module_init(ipgre_init);
module_exit(ipgre_fini);
+MODULE_DESCRIPTION("IPv4 GRE tunnels over IP library");
MODULE_LICENSE("GPL");
MODULE_ALIAS_RTNL_LINK("gre");
MODULE_ALIAS_RTNL_LINK("gretap");
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 41537d1..67d8466 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -972,8 +972,8 @@ static int __ip_append_data(struct sock *sk,
unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
int csummode = CHECKSUM_NONE;
struct rtable *rt = (struct rtable *)cork->dst;
+ bool paged, hold_tskey, extra_uref = false;
unsigned int wmem_alloc_delta = 0;
- bool paged, extra_uref = false;
u32 tskey = 0;
skb = skb_peek_tail(queue);
@@ -982,10 +982,6 @@ static int __ip_append_data(struct sock *sk,
mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
paged = !!cork->gso_size;
- if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
- READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID)
- tskey = atomic_inc_return(&sk->sk_tskey) - 1;
-
hh_len = LL_RESERVED_SPACE(rt->dst.dev);
fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
@@ -1052,6 +1048,11 @@ static int __ip_append_data(struct sock *sk,
cork->length += length;
+ hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
+ READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
+ if (hold_tskey)
+ tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+
/* So, what's going on in the loop below?
*
* We use calculated fragment length to generate chained skb,
@@ -1274,6 +1275,8 @@ static int __ip_append_data(struct sock *sk,
cork->length -= length;
IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
+ if (hold_tskey)
+ atomic_dec(&sk->sk_tskey);
return err;
}
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index beeae62..1b6981d 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -554,6 +554,20 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
return 0;
}
+static void ip_tunnel_adj_headroom(struct net_device *dev, unsigned int headroom)
+{
+ /* we must cap headroom to some upperlimit, else pskb_expand_head
+ * will overflow header offsets in skb_headers_offset_update().
+ */
+ static const unsigned int max_allowed = 512;
+
+ if (headroom > max_allowed)
+ headroom = max_allowed;
+
+ if (headroom > READ_ONCE(dev->needed_headroom))
+ WRITE_ONCE(dev->needed_headroom, headroom);
+}
+
void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
u8 proto, int tunnel_hlen)
{
@@ -632,13 +646,13 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
}
headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len;
- if (headroom > READ_ONCE(dev->needed_headroom))
- WRITE_ONCE(dev->needed_headroom, headroom);
-
- if (skb_cow_head(skb, READ_ONCE(dev->needed_headroom))) {
+ if (skb_cow_head(skb, headroom)) {
ip_rt_put(rt);
goto tx_dropped;
}
+
+ ip_tunnel_adj_headroom(dev, headroom);
+
iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, tos, ttl,
df, !net_eq(tunnel->net, dev_net(dev)));
return;
@@ -818,16 +832,16 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr)
+ rt->dst.header_len + ip_encap_hlen(&tunnel->encap);
- if (max_headroom > READ_ONCE(dev->needed_headroom))
- WRITE_ONCE(dev->needed_headroom, max_headroom);
- if (skb_cow_head(skb, READ_ONCE(dev->needed_headroom))) {
+ if (skb_cow_head(skb, max_headroom)) {
ip_rt_put(rt);
DEV_STATS_INC(dev, tx_dropped);
kfree_skb(skb);
return;
}
+ ip_tunnel_adj_headroom(dev, max_headroom);
+
iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl,
df, !net_eq(tunnel->net, dev_net(dev)));
return;
@@ -1298,4 +1312,5 @@ void ip_tunnel_setup(struct net_device *dev, unsigned int net_id)
}
EXPORT_SYMBOL_GPL(ip_tunnel_setup);
+MODULE_DESCRIPTION("IPv4 tunnel implementation library");
MODULE_LICENSE("GPL");
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 586b1b3e..80ccd66 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -332,7 +332,7 @@ static int iptunnel_pmtud_build_icmpv6(struct sk_buff *skb, int mtu)
};
skb_reset_network_header(skb);
- csum = csum_partial(icmp6h, len, 0);
+ csum = skb_checksum(skb, skb_transport_offset(skb), len, 0);
icmp6h->icmp6_cksum = csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr, len,
IPPROTO_ICMPV6, csum);
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 9ab9b3e..d1d6bb2 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -721,6 +721,7 @@ static void __exit vti_fini(void)
module_init(vti_init);
module_exit(vti_fini);
+MODULE_DESCRIPTION("Virtual (secure) IP tunneling library");
MODULE_LICENSE("GPL");
MODULE_ALIAS_RTNL_LINK("vti");
MODULE_ALIAS_NETDEV("ip_vti0");
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 27b8f83..03afa38 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -658,6 +658,7 @@ static void __exit ipip_fini(void)
module_init(ipip_init);
module_exit(ipip_fini);
+MODULE_DESCRIPTION("IP/IP protocol decoder library");
MODULE_LICENSE("GPL");
MODULE_ALIAS_RTNL_LINK("ipip");
MODULE_ALIAS_NETDEV("tunl0");
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7e2481b..c82dc42 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4615,7 +4615,8 @@ static void __init tcp_struct_check(void)
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_txrx, prr_out);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_txrx, lost_out);
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_txrx, sacked_out);
- CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_read_txrx, 31);
+ CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_txrx, scaling_ratio);
+ CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_read_txrx, 32);
/* RX read-mostly hotpath cache lines */
CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_rx, copied_seq);
diff --git a/net/ipv4/tunnel4.c b/net/ipv4/tunnel4.c
index 5048c47..4c1f836 100644
--- a/net/ipv4/tunnel4.c
+++ b/net/ipv4/tunnel4.c
@@ -294,4 +294,5 @@ static void __exit tunnel4_fini(void)
module_init(tunnel4_init);
module_exit(tunnel4_fini);
+MODULE_DESCRIPTION("IPv4 XFRM tunnel library");
MODULE_LICENSE("GPL");
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f631b0a..e474b20 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1589,12 +1589,7 @@ int udp_init_sock(struct sock *sk)
void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
{
- if (unlikely(READ_ONCE(sk->sk_peek_off) >= 0)) {
- bool slow = lock_sock_fast(sk);
-
- sk_peek_offset_bwd(sk, len);
- unlock_sock_fast(sk, slow);
- }
+ sk_peek_offset_bwd(sk, len);
if (!skb_unref(skb))
return;
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index a87defb..860aff5 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -253,4 +253,5 @@ struct rtable *udp_tunnel_dst_lookup(struct sk_buff *skb,
}
EXPORT_SYMBOL_GPL(udp_tunnel_dst_lookup);
+MODULE_DESCRIPTION("IPv4 Foo over UDP tunnel driver");
MODULE_LICENSE("GPL");
diff --git a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c
index 8489fa1..8cb266a 100644
--- a/net/ipv4/xfrm4_tunnel.c
+++ b/net/ipv4/xfrm4_tunnel.c
@@ -114,5 +114,6 @@ static void __exit ipip_fini(void)
module_init(ipip_init);
module_exit(ipip_fini);
+MODULE_DESCRIPTION("IPv4 XFRM tunnel driver");
MODULE_LICENSE("GPL");
MODULE_ALIAS_XFRM_TYPE(AF_INET, XFRM_PROTO_IPIP);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 733ace1..055230b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -708,6 +708,22 @@ static int inet6_netconf_get_devconf(struct sk_buff *in_skb,
return err;
}
+/* Combine dev_addr_genid and dev_base_seq to detect changes.
+ */
+static u32 inet6_base_seq(const struct net *net)
+{
+ u32 res = atomic_read(&net->ipv6.dev_addr_genid) +
+ net->dev_base_seq;
+
+ /* Must not return 0 (see nl_dump_check_consistent()).
+ * Chose a value far away from 0.
+ */
+ if (!res)
+ res = 0x80000000;
+ return res;
+}
+
+
static int inet6_netconf_dump_devconf(struct sk_buff *skb,
struct netlink_callback *cb)
{
@@ -741,8 +757,7 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb,
idx = 0;
head = &net->dev_index_head[h];
rcu_read_lock();
- cb->seq = atomic_read(&net->ipv6.dev_addr_genid) ^
- net->dev_base_seq;
+ cb->seq = inet6_base_seq(net);
hlist_for_each_entry_rcu(dev, head, index_hlist) {
if (idx < s_idx)
goto cont;
@@ -5362,7 +5377,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
}
rcu_read_lock();
- cb->seq = atomic_read(&tgt_net->ipv6.dev_addr_genid) ^ tgt_net->dev_base_seq;
+ cb->seq = inet6_base_seq(tgt_net);
for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
idx = 0;
head = &tgt_net->dev_index_head[h];
@@ -5494,9 +5509,10 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
}
addr = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer);
- if (!addr)
- return -EINVAL;
-
+ if (!addr) {
+ err = -EINVAL;
+ goto errout;
+ }
ifm = nlmsg_data(nlh);
if (ifm->ifa_index)
dev = dev_get_by_index(tgt_net, ifm->ifa_index);
diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 2016e90..eb474f0 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -800,5 +800,6 @@ static void __exit ah6_fini(void)
module_init(ah6_init);
module_exit(ah6_fini);
+MODULE_DESCRIPTION("IPv6 AH transformation helpers");
MODULE_LICENSE("GPL");
MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_AH);
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 2cc1a45..6e6efe0 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -1301,5 +1301,6 @@ static void __exit esp6_fini(void)
module_init(esp6_init);
module_exit(esp6_fini);
+MODULE_DESCRIPTION("IPv6 ESP transformation helpers");
MODULE_LICENSE("GPL");
MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_ESP);
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 4952ae7..02e9ffb 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -177,6 +177,8 @@ static bool ip6_parse_tlv(bool hopbyhop,
case IPV6_TLV_IOAM:
if (!ipv6_hop_ioam(skb, off))
return false;
+
+ nh = skb_network_header(skb);
break;
case IPV6_TLV_JUMBO:
if (!ipv6_hop_jumbo(skb, off))
@@ -943,6 +945,14 @@ static bool ipv6_hop_ioam(struct sk_buff *skb, int optoff)
if (!skb_valid_dst(skb))
ip6_route_input(skb);
+ /* About to mangle packet header */
+ if (skb_ensure_writable(skb, optoff + 2 + hdr->opt_len))
+ goto drop;
+
+ /* Trace pointer may have changed */
+ trace = (struct ioam6_trace_hdr *)(skb_network_header(skb)
+ + optoff + sizeof(*hdr));
+
ioam6_fill_trace_data(skb, ns, trace, true);
break;
default:
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a722a43..31b86fe 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1424,11 +1424,11 @@ static int __ip6_append_data(struct sock *sk,
bool zc = false;
u32 tskey = 0;
struct rt6_info *rt = (struct rt6_info *)cork->dst;
+ bool paged, hold_tskey, extra_uref = false;
struct ipv6_txoptions *opt = v6_cork->opt;
int csummode = CHECKSUM_NONE;
unsigned int maxnonfragsize, headersize;
unsigned int wmem_alloc_delta = 0;
- bool paged, extra_uref = false;
skb = skb_peek_tail(queue);
if (!skb) {
@@ -1440,10 +1440,6 @@ static int __ip6_append_data(struct sock *sk,
mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
orig_mtu = mtu;
- if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
- READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID)
- tskey = atomic_inc_return(&sk->sk_tskey) - 1;
-
hh_len = LL_RESERVED_SPACE(rt->dst.dev);
fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
@@ -1538,6 +1534,11 @@ static int __ip6_append_data(struct sock *sk,
flags &= ~MSG_SPLICE_PAGES;
}
+ hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
+ READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
+ if (hold_tskey)
+ tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+
/*
* Let's try using as much space as possible.
* Use MTU if total length of the message fits into the MTU.
@@ -1794,6 +1795,8 @@ static int __ip6_append_data(struct sock *sk,
cork->length -= length;
IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
+ if (hold_tskey)
+ atomic_dec(&sk->sk_tskey);
return err;
}
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index a7bf032..c990531 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -182,4 +182,5 @@ struct dst_entry *udp_tunnel6_dst_lookup(struct sk_buff *skb,
}
EXPORT_SYMBOL_GPL(udp_tunnel6_dst_lookup);
+MODULE_DESCRIPTION("IPv6 Foo over UDP tunnel driver");
MODULE_LICENSE("GPL");
diff --git a/net/ipv6/mip6.c b/net/ipv6/mip6.c
index 83d2a8b..6a16a5b 100644
--- a/net/ipv6/mip6.c
+++ b/net/ipv6/mip6.c
@@ -405,6 +405,7 @@ static void __exit mip6_fini(void)
module_init(mip6_init);
module_exit(mip6_fini);
+MODULE_DESCRIPTION("IPv6 Mobility driver");
MODULE_LICENSE("GPL");
MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_DSTOPTS);
MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_ROUTING);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ea1dec8..ef815ba5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5332,19 +5332,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
err_nh = NULL;
list_for_each_entry(nh, &rt6_nh_list, next) {
err = __ip6_ins_rt(nh->fib6_info, info, extack);
- fib6_info_release(nh->fib6_info);
- if (!err) {
- /* save reference to last route successfully inserted */
- rt_last = nh->fib6_info;
-
- /* save reference to first route for notification */
- if (!rt_notif)
- rt_notif = nh->fib6_info;
- }
-
- /* nh->fib6_info is used or freed at this point, reset to NULL*/
- nh->fib6_info = NULL;
if (err) {
if (replace && nhn)
NL_SET_ERR_MSG_MOD(extack,
@@ -5352,6 +5340,12 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
err_nh = nh;
goto add_errout;
}
+ /* save reference to last route successfully inserted */
+ rt_last = nh->fib6_info;
+
+ /* save reference to first route for notification */
+ if (!rt_notif)
+ rt_notif = nh->fib6_info;
/* Because each route is added like a single route we remove
* these flags after the first nexthop: if there is a collision,
@@ -5412,8 +5406,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
cleanup:
list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, next) {
- if (nh->fib6_info)
- fib6_info_release(nh->fib6_info);
+ fib6_info_release(nh->fib6_info);
list_del(&nh->next);
kfree(nh);
}
diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c
index 29346a6..35508ab 100644
--- a/net/ipv6/seg6.c
+++ b/net/ipv6/seg6.c
@@ -512,22 +512,24 @@ int __init seg6_init(void)
{
int err;
- err = genl_register_family(&seg6_genl_family);
+ err = register_pernet_subsys(&ip6_segments_ops);
if (err)
goto out;
- err = register_pernet_subsys(&ip6_segments_ops);
+ err = genl_register_family(&seg6_genl_family);
if (err)
- goto out_unregister_genl;
+ goto out_unregister_pernet;
#ifdef CONFIG_IPV6_SEG6_LWTUNNEL
err = seg6_iptunnel_init();
if (err)
- goto out_unregister_pernet;
+ goto out_unregister_genl;
err = seg6_local_init();
- if (err)
- goto out_unregister_pernet;
+ if (err) {
+ seg6_iptunnel_exit();
+ goto out_unregister_genl;
+ }
#endif
#ifdef CONFIG_IPV6_SEG6_HMAC
@@ -548,11 +550,11 @@ int __init seg6_init(void)
#endif
#endif
#ifdef CONFIG_IPV6_SEG6_LWTUNNEL
-out_unregister_pernet:
- unregister_pernet_subsys(&ip6_segments_ops);
-#endif
out_unregister_genl:
genl_unregister_family(&seg6_genl_family);
+#endif
+out_unregister_pernet:
+ unregister_pernet_subsys(&ip6_segments_ops);
goto out;
}
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index cc24cefd..5e9f625 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1956,6 +1956,7 @@ static int __init sit_init(void)
module_init(sit_init);
module_exit(sit_cleanup);
+MODULE_DESCRIPTION("IPv6-in-IPv4 tunnel SIT driver");
MODULE_LICENSE("GPL");
MODULE_ALIAS_RTNL_LINK("sit");
MODULE_ALIAS_NETDEV("sit0");
diff --git a/net/ipv6/tunnel6.c b/net/ipv6/tunnel6.c
index 00e8d8b..dc4ea9b 100644
--- a/net/ipv6/tunnel6.c
+++ b/net/ipv6/tunnel6.c
@@ -302,4 +302,5 @@ static void __exit tunnel6_fini(void)
module_init(tunnel6_init);
module_exit(tunnel6_fini);
+MODULE_DESCRIPTION("IP-in-IPv6 tunnel driver");
MODULE_LICENSE("GPL");
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index 1323f2f..f6cb94f 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -401,5 +401,6 @@ static void __exit xfrm6_tunnel_fini(void)
module_init(xfrm6_tunnel_init);
module_exit(xfrm6_tunnel_fini);
+MODULE_DESCRIPTION("IPv6 XFRM tunnel driver");
MODULE_LICENSE("GPL");
MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_IPV6);
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c
index 6334f64..b0b3e9c 100644
--- a/net/iucv/iucv.c
+++ b/net/iucv/iucv.c
@@ -156,7 +156,7 @@ static char iucv_error_pathid[16] = "INVALID PATHID";
static LIST_HEAD(iucv_handler_list);
/*
- * iucv_path_table: an array of iucv_path structures.
+ * iucv_path_table: array of pointers to iucv_path structures.
*/
static struct iucv_path **iucv_path_table;
static unsigned long iucv_max_pathid;
@@ -544,7 +544,7 @@ static int iucv_enable(void)
cpus_read_lock();
rc = -ENOMEM;
- alloc_size = iucv_max_pathid * sizeof(struct iucv_path);
+ alloc_size = iucv_max_pathid * sizeof(*iucv_path_table);
iucv_path_table = kzalloc(alloc_size, GFP_KERNEL);
if (!iucv_path_table)
goto out;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index d68d018..f79fb99 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3924,5 +3924,6 @@ static int __init ipsec_pfkey_init(void)
module_init(ipsec_pfkey_init);
module_exit(ipsec_pfkey_exit);
+MODULE_DESCRIPTION("PF_KEY socket helpers");
MODULE_LICENSE("GPL");
MODULE_ALIAS_NETPROTO(PF_KEY);
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index dd31539..7bf14cf 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -627,7 +627,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
back_from_confirm:
lock_sock(sk);
- ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;
+ ulen = len + (skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0);
err = ip6_append_data(sk, ip_generic_getfrag, msg,
ulen, transhdrlen, &ipc6,
&fl6, (struct rt6_info *)dst,
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 489dd97..3276829 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -5,7 +5,7 @@
* Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net>
* Copyright 2013-2015 Intel Mobile Communications GmbH
* Copyright (C) 2015-2017 Intel Deutschland GmbH
- * Copyright (C) 2018-2022 Intel Corporation
+ * Copyright (C) 2018-2024 Intel Corporation
*/
#include <linux/ieee80211.h>
@@ -987,7 +987,8 @@ static int
ieee80211_set_unsol_bcast_probe_resp(struct ieee80211_sub_if_data *sdata,
struct cfg80211_unsol_bcast_probe_resp *params,
struct ieee80211_link_data *link,
- struct ieee80211_bss_conf *link_conf)
+ struct ieee80211_bss_conf *link_conf,
+ u64 *changed)
{
struct unsol_bcast_probe_resp_data *new, *old = NULL;
@@ -1011,7 +1012,8 @@ ieee80211_set_unsol_bcast_probe_resp(struct ieee80211_sub_if_data *sdata,
RCU_INIT_POINTER(link->u.ap.unsol_bcast_probe_resp, NULL);
}
- return BSS_CHANGED_UNSOL_BCAST_PROBE_RESP;
+ *changed |= BSS_CHANGED_UNSOL_BCAST_PROBE_RESP;
+ return 0;
}
static int ieee80211_set_ftm_responder_params(
@@ -1450,10 +1452,9 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev,
err = ieee80211_set_unsol_bcast_probe_resp(sdata,
¶ms->unsol_bcast_probe_resp,
- link, link_conf);
+ link, link_conf, &changed);
if (err < 0)
goto error;
- changed |= err;
err = drv_start_ap(sdata->local, sdata, link_conf);
if (err) {
@@ -1525,10 +1526,9 @@ static int ieee80211_change_beacon(struct wiphy *wiphy, struct net_device *dev,
err = ieee80211_set_unsol_bcast_probe_resp(sdata,
¶ms->unsol_bcast_probe_resp,
- link, link_conf);
+ link, link_conf, &changed);
if (err < 0)
return err;
- changed |= err;
if (beacon->he_bss_color_valid &&
beacon->he_bss_color.enabled != link_conf->he_bss_color.enabled) {
@@ -1869,6 +1869,8 @@ static int sta_link_apply_parameters(struct ieee80211_local *local,
sband->band);
}
+ ieee80211_sta_set_rx_nss(link_sta);
+
return ret;
}
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index dce5606..68596ef 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -997,8 +997,8 @@ static void add_link_files(struct ieee80211_link_data *link,
}
}
-void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata,
- bool mld_vif)
+static void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata,
+ bool mld_vif)
{
char buf[10+IFNAMSIZ];
diff --git a/net/mac80211/debugfs_netdev.h b/net/mac80211/debugfs_netdev.h
index b226b1a..a02ec0a 100644
--- a/net/mac80211/debugfs_netdev.h
+++ b/net/mac80211/debugfs_netdev.h
@@ -11,8 +11,6 @@
#include "ieee80211_i.h"
#ifdef CONFIG_MAC80211_DEBUGFS
-void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata,
- bool mld_vif);
void ieee80211_debugfs_remove_netdev(struct ieee80211_sub_if_data *sdata);
void ieee80211_debugfs_rename_netdev(struct ieee80211_sub_if_data *sdata);
void ieee80211_debugfs_recreate_netdev(struct ieee80211_sub_if_data *sdata,
@@ -24,9 +22,6 @@ void ieee80211_link_debugfs_remove(struct ieee80211_link_data *link);
void ieee80211_link_debugfs_drv_add(struct ieee80211_link_data *link);
void ieee80211_link_debugfs_drv_remove(struct ieee80211_link_data *link);
#else
-static inline void ieee80211_debugfs_add_netdev(
- struct ieee80211_sub_if_data *sdata, bool mld_vif)
-{}
static inline void ieee80211_debugfs_remove_netdev(
struct ieee80211_sub_if_data *sdata)
{}
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index e4e7c0b..11c4caa 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -1783,7 +1783,7 @@ static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
/* need to do this after the switch so vif.type is correct */
ieee80211_link_setup(&sdata->deflink);
- ieee80211_debugfs_add_netdev(sdata, false);
+ ieee80211_debugfs_recreate_netdev(sdata, false);
}
static int ieee80211_runtime_change_iftype(struct ieee80211_sub_if_data *sdata,
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 073105d..2022a26 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -8,7 +8,7 @@
* Copyright 2007, Michael Wu <flamingice@sourmilk.net>
* Copyright 2013-2014 Intel Mobile Communications GmbH
* Copyright (C) 2015 - 2017 Intel Deutschland GmbH
- * Copyright (C) 2018 - 2023 Intel Corporation
+ * Copyright (C) 2018 - 2024 Intel Corporation
*/
#include <linux/delay.h>
@@ -2918,6 +2918,7 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
/* other links will be destroyed */
sdata->deflink.u.mgd.bss = NULL;
+ sdata->deflink.smps_mode = IEEE80211_SMPS_OFF;
netif_carrier_off(sdata->dev);
@@ -5045,9 +5046,6 @@ static int ieee80211_prep_channel(struct ieee80211_sub_if_data *sdata,
if (!link)
return 0;
- /* will change later if needed */
- link->smps_mode = IEEE80211_SMPS_OFF;
-
/*
* If this fails (possibly due to channel context sharing
* on incompatible channels, e.g. 80+80 and 160 sharing the
@@ -7096,6 +7094,7 @@ void ieee80211_mgd_setup_link(struct ieee80211_link_data *link)
link->u.mgd.p2p_noa_index = -1;
link->u.mgd.conn_flags = 0;
link->conf->bssid = link->u.mgd.bssid;
+ link->smps_mode = IEEE80211_SMPS_OFF;
wiphy_work_init(&link->u.mgd.request_smps_work,
ieee80211_request_smps_mgd_work);
@@ -7309,6 +7308,75 @@ static int ieee80211_prep_connection(struct ieee80211_sub_if_data *sdata,
return err;
}
+static bool ieee80211_mgd_csa_present(struct ieee80211_sub_if_data *sdata,
+ const struct cfg80211_bss_ies *ies,
+ u8 cur_channel, bool ignore_ecsa)
+{
+ const struct element *csa_elem, *ecsa_elem;
+ struct ieee80211_channel_sw_ie *csa = NULL;
+ struct ieee80211_ext_chansw_ie *ecsa = NULL;
+
+ if (!ies)
+ return false;
+
+ csa_elem = cfg80211_find_elem(WLAN_EID_CHANNEL_SWITCH,
+ ies->data, ies->len);
+ if (csa_elem && csa_elem->datalen == sizeof(*csa))
+ csa = (void *)csa_elem->data;
+
+ ecsa_elem = cfg80211_find_elem(WLAN_EID_EXT_CHANSWITCH_ANN,
+ ies->data, ies->len);
+ if (ecsa_elem && ecsa_elem->datalen == sizeof(*ecsa))
+ ecsa = (void *)ecsa_elem->data;
+
+ if (csa && csa->count == 0)
+ csa = NULL;
+ if (csa && !csa->mode && csa->new_ch_num == cur_channel)
+ csa = NULL;
+
+ if (ecsa && ecsa->count == 0)
+ ecsa = NULL;
+ if (ecsa && !ecsa->mode && ecsa->new_ch_num == cur_channel)
+ ecsa = NULL;
+
+ if (ignore_ecsa && ecsa) {
+ sdata_info(sdata,
+ "Ignoring ECSA in probe response - was considered stuck!\n");
+ return csa;
+ }
+
+ return csa || ecsa;
+}
+
+static bool ieee80211_mgd_csa_in_process(struct ieee80211_sub_if_data *sdata,
+ struct cfg80211_bss *bss)
+{
+ u8 cur_channel;
+ bool ret;
+
+ cur_channel = ieee80211_frequency_to_channel(bss->channel->center_freq);
+
+ rcu_read_lock();
+ if (ieee80211_mgd_csa_present(sdata,
+ rcu_dereference(bss->beacon_ies),
+ cur_channel, false)) {
+ ret = true;
+ goto out;
+ }
+
+ if (ieee80211_mgd_csa_present(sdata,
+ rcu_dereference(bss->proberesp_ies),
+ cur_channel, bss->proberesp_ecsa_stuck)) {
+ ret = true;
+ goto out;
+ }
+
+ ret = false;
+out:
+ rcu_read_unlock();
+ return ret;
+}
+
/* config hooks */
int ieee80211_mgd_auth(struct ieee80211_sub_if_data *sdata,
struct cfg80211_auth_request *req)
@@ -7317,7 +7385,6 @@ int ieee80211_mgd_auth(struct ieee80211_sub_if_data *sdata,
struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
struct ieee80211_mgd_auth_data *auth_data;
struct ieee80211_link_data *link;
- const struct element *csa_elem, *ecsa_elem;
u16 auth_alg;
int err;
bool cont_auth;
@@ -7360,21 +7427,10 @@ int ieee80211_mgd_auth(struct ieee80211_sub_if_data *sdata,
if (ifmgd->assoc_data)
return -EBUSY;
- rcu_read_lock();
- csa_elem = ieee80211_bss_get_elem(req->bss, WLAN_EID_CHANNEL_SWITCH);
- ecsa_elem = ieee80211_bss_get_elem(req->bss,
- WLAN_EID_EXT_CHANSWITCH_ANN);
- if ((csa_elem &&
- csa_elem->datalen == sizeof(struct ieee80211_channel_sw_ie) &&
- ((struct ieee80211_channel_sw_ie *)csa_elem->data)->count != 0) ||
- (ecsa_elem &&
- ecsa_elem->datalen == sizeof(struct ieee80211_ext_chansw_ie) &&
- ((struct ieee80211_ext_chansw_ie *)ecsa_elem->data)->count != 0)) {
- rcu_read_unlock();
+ if (ieee80211_mgd_csa_in_process(sdata, req->bss)) {
sdata_info(sdata, "AP is in CSA process, reject auth\n");
return -EINVAL;
}
- rcu_read_unlock();
auth_data = kzalloc(sizeof(*auth_data) + req->auth_data_len +
req->ie_len, GFP_KERNEL);
@@ -7684,7 +7740,7 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
struct ieee80211_local *local = sdata->local;
struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
struct ieee80211_mgd_assoc_data *assoc_data;
- const struct element *ssid_elem, *csa_elem, *ecsa_elem;
+ const struct element *ssid_elem;
struct ieee80211_vif_cfg *vif_cfg = &sdata->vif.cfg;
ieee80211_conn_flags_t conn_flags = 0;
struct ieee80211_link_data *link;
@@ -7707,23 +7763,15 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
cbss = req->link_id < 0 ? req->bss : req->links[req->link_id].bss;
- rcu_read_lock();
- ssid_elem = ieee80211_bss_get_elem(cbss, WLAN_EID_SSID);
- if (!ssid_elem || ssid_elem->datalen > sizeof(assoc_data->ssid)) {
- rcu_read_unlock();
+ if (ieee80211_mgd_csa_in_process(sdata, cbss)) {
+ sdata_info(sdata, "AP is in CSA process, reject assoc\n");
kfree(assoc_data);
return -EINVAL;
}
- csa_elem = ieee80211_bss_get_elem(cbss, WLAN_EID_CHANNEL_SWITCH);
- ecsa_elem = ieee80211_bss_get_elem(cbss, WLAN_EID_EXT_CHANSWITCH_ANN);
- if ((csa_elem &&
- csa_elem->datalen == sizeof(struct ieee80211_channel_sw_ie) &&
- ((struct ieee80211_channel_sw_ie *)csa_elem->data)->count != 0) ||
- (ecsa_elem &&
- ecsa_elem->datalen == sizeof(struct ieee80211_ext_chansw_ie) &&
- ((struct ieee80211_ext_chansw_ie *)ecsa_elem->data)->count != 0)) {
- sdata_info(sdata, "AP is in CSA process, reject assoc\n");
+ rcu_read_lock();
+ ssid_elem = ieee80211_bss_get_elem(cbss, WLAN_EID_SSID);
+ if (!ssid_elem || ssid_elem->datalen > sizeof(assoc_data->ssid)) {
rcu_read_unlock();
kfree(assoc_data);
return -EINVAL;
@@ -7998,8 +8046,7 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
rcu_read_lock();
beacon_ies = rcu_dereference(req->bss->beacon_ies);
-
- if (beacon_ies) {
+ if (!beacon_ies) {
/*
* Wait up to one beacon interval ...
* should this be more if we miss one?
@@ -8080,6 +8127,7 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
ieee80211_report_disconnect(sdata, frame_buf,
sizeof(frame_buf), true,
req->reason_code, false);
+ drv_mgd_complete_tx(sdata->local, sdata, &info);
return 0;
}
diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
index d5ea5f5..9d33fd23 100644
--- a/net/mac80211/rate.c
+++ b/net/mac80211/rate.c
@@ -119,7 +119,8 @@ void rate_control_rate_update(struct ieee80211_local *local,
rcu_read_unlock();
}
- drv_sta_rc_update(local, sta->sdata, &sta->sta, changed);
+ if (sta->uploaded)
+ drv_sta_rc_update(local, sta->sdata, &sta->sta, changed);
}
int ieee80211_rate_control_register(const struct rate_control_ops *ops)
diff --git a/net/mac80211/scan.c b/net/mac80211/scan.c
index 645355e..f9d5842 100644
--- a/net/mac80211/scan.c
+++ b/net/mac80211/scan.c
@@ -9,7 +9,7 @@
* Copyright 2007, Michael Wu <flamingice@sourmilk.net>
* Copyright 2013-2015 Intel Mobile Communications GmbH
* Copyright 2016-2017 Intel Deutschland GmbH
- * Copyright (C) 2018-2023 Intel Corporation
+ * Copyright (C) 2018-2024 Intel Corporation
*/
#include <linux/if_arp.h>
@@ -237,14 +237,18 @@ ieee80211_bss_info_update(struct ieee80211_local *local,
}
static bool ieee80211_scan_accept_presp(struct ieee80211_sub_if_data *sdata,
+ struct ieee80211_channel *channel,
u32 scan_flags, const u8 *da)
{
if (!sdata)
return false;
- /* accept broadcast for OCE */
- if (scan_flags & NL80211_SCAN_FLAG_ACCEPT_BCAST_PROBE_RESP &&
- is_broadcast_ether_addr(da))
+
+ /* accept broadcast on 6 GHz and for OCE */
+ if (is_broadcast_ether_addr(da) &&
+ (channel->band == NL80211_BAND_6GHZ ||
+ scan_flags & NL80211_SCAN_FLAG_ACCEPT_BCAST_PROBE_RESP))
return true;
+
if (scan_flags & NL80211_SCAN_FLAG_RANDOM_ADDR)
return true;
return ether_addr_equal(da, sdata->vif.addr);
@@ -293,6 +297,12 @@ void ieee80211_scan_rx(struct ieee80211_local *local, struct sk_buff *skb)
wiphy_delayed_work_queue(local->hw.wiphy, &local->scan_work, 0);
}
+ channel = ieee80211_get_channel_khz(local->hw.wiphy,
+ ieee80211_rx_status_to_khz(rx_status));
+
+ if (!channel || channel->flags & IEEE80211_CHAN_DISABLED)
+ return;
+
if (ieee80211_is_probe_resp(mgmt->frame_control)) {
struct cfg80211_scan_request *scan_req;
struct cfg80211_sched_scan_request *sched_scan_req;
@@ -310,19 +320,15 @@ void ieee80211_scan_rx(struct ieee80211_local *local, struct sk_buff *skb)
/* ignore ProbeResp to foreign address or non-bcast (OCE)
* unless scanning with randomised address
*/
- if (!ieee80211_scan_accept_presp(sdata1, scan_req_flags,
+ if (!ieee80211_scan_accept_presp(sdata1, channel,
+ scan_req_flags,
mgmt->da) &&
- !ieee80211_scan_accept_presp(sdata2, sched_scan_req_flags,
+ !ieee80211_scan_accept_presp(sdata2, channel,
+ sched_scan_req_flags,
mgmt->da))
return;
}
- channel = ieee80211_get_channel_khz(local->hw.wiphy,
- ieee80211_rx_status_to_khz(rx_status));
-
- if (!channel || channel->flags & IEEE80211_CHAN_DISABLED)
- return;
-
bss = ieee80211_bss_info_update(local, rx_status,
mgmt, skb->len,
channel);
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 68a48ab..6fbb15b 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -5,7 +5,7 @@
* Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
* Copyright 2007 Johannes Berg <johannes@sipsolutions.net>
* Copyright 2013-2014 Intel Mobile Communications GmbH
- * Copyright (C) 2018-2022 Intel Corporation
+ * Copyright (C) 2018-2024 Intel Corporation
*
* Transmit and frame generation functions.
*/
@@ -3100,10 +3100,11 @@ void ieee80211_check_fast_xmit(struct sta_info *sta)
/* DA SA BSSID */
build.da_offs = offsetof(struct ieee80211_hdr, addr1);
build.sa_offs = offsetof(struct ieee80211_hdr, addr2);
+ rcu_read_lock();
link = rcu_dereference(sdata->link[tdls_link_id]);
- if (WARN_ON_ONCE(!link))
- break;
- memcpy(hdr->addr3, link->u.mgd.bssid, ETH_ALEN);
+ if (!WARN_ON_ONCE(!link))
+ memcpy(hdr->addr3, link->u.mgd.bssid, ETH_ALEN);
+ rcu_read_unlock();
build.hdr_len = 24;
break;
}
@@ -3926,6 +3927,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
goto begin;
skb = __skb_dequeue(&tx.skbs);
+ info = IEEE80211_SKB_CB(skb);
if (!skb_queue_empty(&tx.skbs)) {
spin_lock_bh(&fq->lock);
@@ -3970,7 +3972,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
}
encap_out:
- IEEE80211_SKB_CB(skb)->control.vif = vif;
+ info->control.vif = vif;
if (tx.sta &&
wiphy_ext_feature_isset(local->hw.wiphy, NL80211_EXT_FEATURE_AQL)) {
diff --git a/net/mac80211/wbrf.c b/net/mac80211/wbrf.c
index a05c5b9..3a86123 100644
--- a/net/mac80211/wbrf.c
+++ b/net/mac80211/wbrf.c
@@ -23,8 +23,6 @@ void ieee80211_check_wbrf_support(struct ieee80211_local *local)
return;
local->wbrf_supported = acpi_amd_wbrf_supported_producer(dev);
- dev_dbg(dev, "WBRF is %s supported\n",
- local->wbrf_supported ? "" : "not");
}
static void get_chan_freq_boundary(u32 center_freq, u32 bandwidth, u64 *start, u64 *end)
diff --git a/net/mctp/route.c b/net/mctp/route.c
index 7a47a58..ceee44e 100644
--- a/net/mctp/route.c
+++ b/net/mctp/route.c
@@ -663,7 +663,7 @@ struct mctp_sk_key *mctp_alloc_local_tag(struct mctp_sock *msk,
spin_unlock_irqrestore(&mns->keys_lock, flags);
if (!tagbits) {
- kfree(key);
+ mctp_key_unref(key);
return ERR_PTR(-EBUSY);
}
@@ -888,7 +888,7 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt,
dev = dev_get_by_index_rcu(sock_net(sk), cb->ifindex);
if (!dev) {
rcu_read_unlock();
- return rc;
+ goto out_free;
}
rt->dev = __mctp_dev_get(dev);
rcu_read_unlock();
@@ -903,7 +903,8 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt,
rt->mtu = 0;
} else {
- return -EINVAL;
+ rc = -EINVAL;
+ goto out_free;
}
spin_lock_irqsave(&rt->dev->addrs_lock, flags);
@@ -966,12 +967,17 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt,
rc = mctp_do_fragment_route(rt, skb, mtu, tag);
}
+ /* route output functions consume the skb, even on error */
+ skb = NULL;
+
out_release:
if (!ext_rt)
mctp_route_release(rt);
mctp_dev_put(tmp_rt.dev);
+out_free:
+ kfree_skb(skb);
return rc;
}
diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c
index a536586..7017dd6 100644
--- a/net/mptcp/diag.c
+++ b/net/mptcp/diag.c
@@ -13,17 +13,22 @@
#include <uapi/linux/mptcp.h>
#include "protocol.h"
-static int subflow_get_info(const struct sock *sk, struct sk_buff *skb)
+static int subflow_get_info(struct sock *sk, struct sk_buff *skb)
{
struct mptcp_subflow_context *sf;
struct nlattr *start;
u32 flags = 0;
+ bool slow;
int err;
+ if (inet_sk_state_load(sk) == TCP_LISTEN)
+ return 0;
+
start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP);
if (!start)
return -EMSGSIZE;
+ slow = lock_sock_fast(sk);
rcu_read_lock();
sf = rcu_dereference(inet_csk(sk)->icsk_ulp_data);
if (!sf) {
@@ -63,17 +68,19 @@ static int subflow_get_info(const struct sock *sk, struct sk_buff *skb)
sf->map_data_len) ||
nla_put_u32(skb, MPTCP_SUBFLOW_ATTR_FLAGS, flags) ||
nla_put_u8(skb, MPTCP_SUBFLOW_ATTR_ID_REM, sf->remote_id) ||
- nla_put_u8(skb, MPTCP_SUBFLOW_ATTR_ID_LOC, sf->local_id)) {
+ nla_put_u8(skb, MPTCP_SUBFLOW_ATTR_ID_LOC, subflow_get_local_id(sf))) {
err = -EMSGSIZE;
goto nla_failure;
}
rcu_read_unlock();
+ unlock_sock_fast(sk, slow);
nla_nest_end(skb, start);
return 0;
nla_failure:
rcu_read_unlock();
+ unlock_sock_fast(sk, slow);
nla_nest_cancel(skb, start);
return err;
}
diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c
index 7469858..ad28da6 100644
--- a/net/mptcp/fastopen.c
+++ b/net/mptcp/fastopen.c
@@ -59,13 +59,12 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
mptcp_data_unlock(sk);
}
-void mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
- const struct mptcp_options_received *mp_opt)
+void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
+ const struct mptcp_options_received *mp_opt)
{
struct sock *sk = (struct sock *)msk;
struct sk_buff *skb;
- mptcp_data_lock(sk);
skb = skb_peek_tail(&sk->sk_receive_queue);
if (skb) {
WARN_ON_ONCE(MPTCP_SKB_CB(skb)->end_seq);
@@ -77,5 +76,4 @@ void mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_
}
pr_debug("msk=%p ack_seq=%llx", msk, msk->ack_seq);
- mptcp_data_unlock(sk);
}
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index d2527d1..63fc075 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -962,9 +962,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
/* subflows are fully established as soon as we get any
* additional ack, including ADD_ADDR.
*/
- subflow->fully_established = 1;
- WRITE_ONCE(msk->fully_established, true);
- goto check_notify;
+ goto set_fully_established;
}
/* If the first established packet does not contain MP_CAPABLE + data
@@ -983,10 +981,13 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
if (mp_opt->deny_join_id0)
WRITE_ONCE(msk->pm.remote_deny_join_id0, true);
-set_fully_established:
if (unlikely(!READ_ONCE(msk->pm.server_side)))
pr_warn_once("bogus mpc option on established client sk");
- mptcp_subflow_fully_established(subflow, mp_opt);
+
+set_fully_established:
+ mptcp_data_lock((struct sock *)msk);
+ __mptcp_subflow_fully_established(msk, subflow, mp_opt);
+ mptcp_data_unlock((struct sock *)msk);
check_notify:
/* if the subflow is not already linked into the conn_list, we can't
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 287a603..58d17d9 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -396,19 +396,6 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
}
}
-static bool lookup_address_in_vec(const struct mptcp_addr_info *addrs, unsigned int nr,
- const struct mptcp_addr_info *addr)
-{
- int i;
-
- for (i = 0; i < nr; i++) {
- if (addrs[i].id == addr->id)
- return true;
- }
-
- return false;
-}
-
/* Fill all the remote addresses into the array addrs[],
* and return the array size.
*/
@@ -440,18 +427,34 @@ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk,
msk->pm.subflows++;
addrs[i++] = remote;
} else {
+ DECLARE_BITMAP(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1);
+
+ /* Forbid creation of new subflows matching existing
+ * ones, possibly already created by incoming ADD_ADDR
+ */
+ bitmap_zero(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1);
+ mptcp_for_each_subflow(msk, subflow)
+ if (READ_ONCE(subflow->local_id) == local->id)
+ __set_bit(subflow->remote_id, unavail_id);
+
mptcp_for_each_subflow(msk, subflow) {
ssk = mptcp_subflow_tcp_sock(subflow);
remote_address((struct sock_common *)ssk, &addrs[i]);
- addrs[i].id = subflow->remote_id;
+ addrs[i].id = READ_ONCE(subflow->remote_id);
if (deny_id0 && !addrs[i].id)
continue;
+ if (test_bit(addrs[i].id, unavail_id))
+ continue;
+
if (!mptcp_pm_addr_families_match(sk, local, &addrs[i]))
continue;
- if (!lookup_address_in_vec(addrs, i, &addrs[i]) &&
- msk->pm.subflows < subflows_max) {
+ if (msk->pm.subflows < subflows_max) {
+ /* forbid creating multiple address towards
+ * this id
+ */
+ __set_bit(addrs[i].id, unavail_id);
msk->pm.subflows++;
i++;
}
@@ -799,18 +802,18 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
mptcp_for_each_subflow_safe(msk, subflow, tmp) {
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+ u8 remote_id = READ_ONCE(subflow->remote_id);
int how = RCV_SHUTDOWN | SEND_SHUTDOWN;
- u8 id = subflow->local_id;
+ u8 id = subflow_get_local_id(subflow);
- if (rm_type == MPTCP_MIB_RMADDR && subflow->remote_id != rm_id)
+ if (rm_type == MPTCP_MIB_RMADDR && remote_id != rm_id)
continue;
if (rm_type == MPTCP_MIB_RMSUBFLOW && !mptcp_local_id_match(msk, id, rm_id))
continue;
pr_debug(" -> %s rm_list_ids[%d]=%u local_id=%u remote_id=%u mpc_id=%u",
rm_type == MPTCP_MIB_RMADDR ? "address" : "subflow",
- i, rm_id, subflow->local_id, subflow->remote_id,
- msk->mpc_endpoint_id);
+ i, rm_id, id, remote_id, msk->mpc_endpoint_id);
spin_unlock_bh(&msk->pm.lock);
mptcp_subflow_shutdown(sk, ssk, how);
@@ -901,7 +904,8 @@ static void __mptcp_pm_release_addr_entry(struct mptcp_pm_addr_entry *entry)
}
static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
- struct mptcp_pm_addr_entry *entry)
+ struct mptcp_pm_addr_entry *entry,
+ bool needs_id)
{
struct mptcp_pm_addr_entry *cur, *del_entry = NULL;
unsigned int addr_max;
@@ -949,7 +953,7 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
}
}
- if (!entry->addr.id) {
+ if (!entry->addr.id && needs_id) {
find_next:
entry->addr.id = find_next_zero_bit(pernet->id_bitmap,
MPTCP_PM_MAX_ADDR_ID + 1,
@@ -960,7 +964,7 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
}
}
- if (!entry->addr.id)
+ if (!entry->addr.id && needs_id)
goto out;
__set_bit(entry->addr.id, pernet->id_bitmap);
@@ -1092,7 +1096,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc
entry->ifindex = 0;
entry->flags = MPTCP_PM_ADDR_FLAG_IMPLICIT;
entry->lsk = NULL;
- ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
+ ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true);
if (ret < 0)
kfree(entry);
@@ -1285,6 +1289,18 @@ static int mptcp_nl_add_subflow_or_signal_addr(struct net *net)
return 0;
}
+static bool mptcp_pm_has_addr_attr_id(const struct nlattr *attr,
+ struct genl_info *info)
+{
+ struct nlattr *tb[MPTCP_PM_ADDR_ATTR_MAX + 1];
+
+ if (!nla_parse_nested_deprecated(tb, MPTCP_PM_ADDR_ATTR_MAX, attr,
+ mptcp_pm_address_nl_policy, info->extack) &&
+ tb[MPTCP_PM_ADDR_ATTR_ID])
+ return true;
+ return false;
+}
+
int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info)
{
struct nlattr *attr = info->attrs[MPTCP_PM_ENDPOINT_ADDR];
@@ -1326,7 +1342,8 @@ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info)
goto out_free;
}
}
- ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
+ ret = mptcp_pm_nl_append_new_local_addr(pernet, entry,
+ !mptcp_pm_has_addr_attr_id(attr, info));
if (ret < 0) {
GENL_SET_ERR_MSG_FMT(info, "too many addresses or duplicate one: %d", ret);
goto out_free;
@@ -1980,7 +1997,7 @@ static int mptcp_event_add_subflow(struct sk_buff *skb, const struct sock *ssk)
if (WARN_ON_ONCE(!sf))
return -EINVAL;
- if (nla_put_u8(skb, MPTCP_ATTR_LOC_ID, sf->local_id))
+ if (nla_put_u8(skb, MPTCP_ATTR_LOC_ID, subflow_get_local_id(sf)))
return -EMSGSIZE;
if (nla_put_u8(skb, MPTCP_ATTR_REM_ID, sf->remote_id))
diff --git a/net/mptcp/pm_userspace.c b/net/mptcp/pm_userspace.c
index efecbe3..bc97cc3 100644
--- a/net/mptcp/pm_userspace.c
+++ b/net/mptcp/pm_userspace.c
@@ -26,7 +26,8 @@ void mptcp_free_local_addr_list(struct mptcp_sock *msk)
}
static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
- struct mptcp_pm_addr_entry *entry)
+ struct mptcp_pm_addr_entry *entry,
+ bool needs_id)
{
DECLARE_BITMAP(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
struct mptcp_pm_addr_entry *match = NULL;
@@ -41,7 +42,7 @@ static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
spin_lock_bh(&msk->pm.lock);
list_for_each_entry(e, &msk->pm.userspace_pm_local_addr_list, list) {
addr_match = mptcp_addresses_equal(&e->addr, &entry->addr, true);
- if (addr_match && entry->addr.id == 0)
+ if (addr_match && entry->addr.id == 0 && needs_id)
entry->addr.id = e->addr.id;
id_match = (e->addr.id == entry->addr.id);
if (addr_match && id_match) {
@@ -64,7 +65,7 @@ static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
}
*e = *entry;
- if (!e->addr.id)
+ if (!e->addr.id && needs_id)
e->addr.id = find_next_zero_bit(id_bitmap,
MPTCP_PM_MAX_ADDR_ID + 1,
1);
@@ -130,10 +131,21 @@ int mptcp_userspace_pm_get_flags_and_ifindex_by_id(struct mptcp_sock *msk,
int mptcp_userspace_pm_get_local_id(struct mptcp_sock *msk,
struct mptcp_addr_info *skc)
{
- struct mptcp_pm_addr_entry new_entry;
+ struct mptcp_pm_addr_entry *entry = NULL, *e, new_entry;
__be16 msk_sport = ((struct inet_sock *)
inet_sk((struct sock *)msk))->inet_sport;
+ spin_lock_bh(&msk->pm.lock);
+ list_for_each_entry(e, &msk->pm.userspace_pm_local_addr_list, list) {
+ if (mptcp_addresses_equal(&e->addr, skc, false)) {
+ entry = e;
+ break;
+ }
+ }
+ spin_unlock_bh(&msk->pm.lock);
+ if (entry)
+ return entry->addr.id;
+
memset(&new_entry, 0, sizeof(struct mptcp_pm_addr_entry));
new_entry.addr = *skc;
new_entry.addr.id = 0;
@@ -142,7 +154,7 @@ int mptcp_userspace_pm_get_local_id(struct mptcp_sock *msk,
if (new_entry.addr.port == msk_sport)
new_entry.addr.port = 0;
- return mptcp_userspace_pm_append_new_local_addr(msk, &new_entry);
+ return mptcp_userspace_pm_append_new_local_addr(msk, &new_entry, true);
}
int mptcp_pm_nl_announce_doit(struct sk_buff *skb, struct genl_info *info)
@@ -187,7 +199,7 @@ int mptcp_pm_nl_announce_doit(struct sk_buff *skb, struct genl_info *info)
goto announce_err;
}
- err = mptcp_userspace_pm_append_new_local_addr(msk, &addr_val);
+ err = mptcp_userspace_pm_append_new_local_addr(msk, &addr_val, false);
if (err < 0) {
GENL_SET_ERR_MSG(info, "did not match address and id");
goto announce_err;
@@ -222,7 +234,7 @@ static int mptcp_userspace_pm_remove_id_zero_address(struct mptcp_sock *msk,
lock_sock(sk);
mptcp_for_each_subflow(msk, subflow) {
- if (subflow->local_id == 0) {
+ if (READ_ONCE(subflow->local_id) == 0) {
has_id_0 = true;
break;
}
@@ -367,7 +379,7 @@ int mptcp_pm_nl_subflow_create_doit(struct sk_buff *skb, struct genl_info *info)
}
local.addr = addr_l;
- err = mptcp_userspace_pm_append_new_local_addr(msk, &local);
+ err = mptcp_userspace_pm_append_new_local_addr(msk, &local, false);
if (err < 0) {
GENL_SET_ERR_MSG(info, "did not match address and id");
goto create_err;
@@ -483,6 +495,16 @@ int mptcp_pm_nl_subflow_destroy_doit(struct sk_buff *skb, struct genl_info *info
goto destroy_err;
}
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+ if (addr_l.family == AF_INET && ipv6_addr_v4mapped(&addr_r.addr6)) {
+ ipv6_addr_set_v4mapped(addr_l.addr.s_addr, &addr_l.addr6);
+ addr_l.family = AF_INET6;
+ }
+ if (addr_r.family == AF_INET && ipv6_addr_v4mapped(&addr_l.addr6)) {
+ ipv6_addr_set_v4mapped(addr_r.addr.s_addr, &addr_r.addr6);
+ addr_r.family = AF_INET6;
+ }
+#endif
if (addr_l.family != addr_r.family) {
GENL_SET_ERR_MSG(info, "address families do not match");
err = -EINVAL;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 028e8b4..7833a49 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -85,7 +85,7 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
subflow->subflow_id = msk->subflow_id++;
/* This is the first subflow, always with id 0 */
- subflow->local_id_valid = 1;
+ WRITE_ONCE(subflow->local_id, 0);
mptcp_sock_graft(msk->first, sk->sk_socket);
iput(SOCK_INODE(ssock));
@@ -1260,6 +1260,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
mpext = mptcp_get_ext(skb);
if (!mptcp_skb_can_collapse_to(data_seq, skb, mpext)) {
TCP_SKB_CB(skb)->eor = 1;
+ tcp_mark_push(tcp_sk(ssk), skb);
goto alloc_skb;
}
@@ -1505,8 +1506,11 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
void mptcp_check_and_set_pending(struct sock *sk)
{
- if (mptcp_send_head(sk))
- mptcp_sk(sk)->push_pending |= BIT(MPTCP_PUSH_PENDING);
+ if (mptcp_send_head(sk)) {
+ mptcp_data_lock(sk);
+ mptcp_sk(sk)->cb_flags |= BIT(MPTCP_PUSH_PENDING);
+ mptcp_data_unlock(sk);
+ }
}
static int __subflow_push_pending(struct sock *sk, struct sock *ssk,
@@ -1960,6 +1964,9 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
if (copied <= 0)
return;
+ if (!msk->rcvspace_init)
+ mptcp_rcv_space_init(msk, msk->first);
+
msk->rcvq_space.copied += copied;
mstamp = div_u64(tcp_clock_ns(), NSEC_PER_USEC);
@@ -3142,7 +3149,6 @@ static int mptcp_disconnect(struct sock *sk, int flags)
mptcp_destroy_common(msk, MPTCP_CF_FASTCLOSE);
WRITE_ONCE(msk->flags, 0);
msk->cb_flags = 0;
- msk->push_pending = 0;
msk->recovery = false;
msk->can_ack = false;
msk->fully_established = false;
@@ -3158,6 +3164,7 @@ static int mptcp_disconnect(struct sock *sk, int flags)
msk->bytes_received = 0;
msk->bytes_sent = 0;
msk->bytes_retrans = 0;
+ msk->rcvspace_init = 0;
WRITE_ONCE(sk->sk_shutdown, 0);
sk_error_report(sk);
@@ -3171,8 +3178,50 @@ static struct ipv6_pinfo *mptcp_inet6_sk(const struct sock *sk)
return (struct ipv6_pinfo *)(((u8 *)sk) + offset);
}
+
+static void mptcp_copy_ip6_options(struct sock *newsk, const struct sock *sk)
+{
+ const struct ipv6_pinfo *np = inet6_sk(sk);
+ struct ipv6_txoptions *opt;
+ struct ipv6_pinfo *newnp;
+
+ newnp = inet6_sk(newsk);
+
+ rcu_read_lock();
+ opt = rcu_dereference(np->opt);
+ if (opt) {
+ opt = ipv6_dup_options(newsk, opt);
+ if (!opt)
+ net_warn_ratelimited("%s: Failed to copy ip6 options\n", __func__);
+ }
+ RCU_INIT_POINTER(newnp->opt, opt);
+ rcu_read_unlock();
+}
#endif
+static void mptcp_copy_ip_options(struct sock *newsk, const struct sock *sk)
+{
+ struct ip_options_rcu *inet_opt, *newopt = NULL;
+ const struct inet_sock *inet = inet_sk(sk);
+ struct inet_sock *newinet;
+
+ newinet = inet_sk(newsk);
+
+ rcu_read_lock();
+ inet_opt = rcu_dereference(inet->inet_opt);
+ if (inet_opt) {
+ newopt = sock_kmalloc(newsk, sizeof(*inet_opt) +
+ inet_opt->opt.optlen, GFP_ATOMIC);
+ if (newopt)
+ memcpy(newopt, inet_opt, sizeof(*inet_opt) +
+ inet_opt->opt.optlen);
+ else
+ net_warn_ratelimited("%s: Failed to copy ip options\n", __func__);
+ }
+ RCU_INIT_POINTER(newinet->inet_opt, newopt);
+ rcu_read_unlock();
+}
+
struct sock *mptcp_sk_clone_init(const struct sock *sk,
const struct mptcp_options_received *mp_opt,
struct sock *ssk,
@@ -3180,6 +3229,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
{
struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req);
struct sock *nsk = sk_clone_lock(sk, GFP_ATOMIC);
+ struct mptcp_subflow_context *subflow;
struct mptcp_sock *msk;
if (!nsk)
@@ -3192,6 +3242,13 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
__mptcp_init_sock(nsk);
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+ if (nsk->sk_family == AF_INET6)
+ mptcp_copy_ip6_options(nsk, sk);
+ else
+#endif
+ mptcp_copy_ip_options(nsk, sk);
+
msk = mptcp_sk(nsk);
msk->local_key = subflow_req->local_key;
msk->token = subflow_req->token;
@@ -3203,7 +3260,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
msk->write_seq = subflow_req->idsn + 1;
msk->snd_nxt = msk->write_seq;
msk->snd_una = msk->write_seq;
- msk->wnd_end = msk->snd_nxt + req->rsk_rcv_wnd;
+ msk->wnd_end = msk->snd_nxt + tcp_sk(ssk)->snd_wnd;
msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq;
mptcp_init_sched(msk, mptcp_sk(sk)->sched);
@@ -3220,7 +3277,8 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
/* The msk maintain a ref to each subflow in the connections list */
WRITE_ONCE(msk->first, ssk);
- list_add(&mptcp_subflow_ctx(ssk)->node, &msk->conn_list);
+ subflow = mptcp_subflow_ctx(ssk);
+ list_add(&subflow->node, &msk->conn_list);
sock_hold(ssk);
/* new mpc subflow takes ownership of the newly
@@ -3235,6 +3293,9 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
__mptcp_propagate_sndbuf(nsk, ssk);
mptcp_rcv_space_init(msk, ssk);
+
+ if (mp_opt->suboptions & OPTION_MPTCP_MPC_ACK)
+ __mptcp_subflow_fully_established(msk, subflow, mp_opt);
bh_unlock_sock(nsk);
/* note: the newly allocated socket refcount is 2 now */
@@ -3245,6 +3306,7 @@ void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
{
const struct tcp_sock *tp = tcp_sk(ssk);
+ msk->rcvspace_init = 1;
msk->rcvq_space.copied = 0;
msk->rcvq_space.rtt_us = 0;
@@ -3255,8 +3317,6 @@ void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
TCP_INIT_CWND * tp->advmss);
if (msk->rcvq_space.space == 0)
msk->rcvq_space.space = TCP_INIT_CWND * TCP_MSS_DEFAULT;
-
- WRITE_ONCE(msk->wnd_end, msk->snd_nxt + tcp_sk(ssk)->snd_wnd);
}
void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
@@ -3330,8 +3390,7 @@ static void mptcp_release_cb(struct sock *sk)
struct mptcp_sock *msk = mptcp_sk(sk);
for (;;) {
- unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED) |
- msk->push_pending;
+ unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED);
struct list_head join_list;
if (!flags)
@@ -3347,7 +3406,6 @@ static void mptcp_release_cb(struct sock *sk)
* datapath acquires the msk socket spinlock while helding
* the subflow socket lock
*/
- msk->push_pending = 0;
msk->cb_flags &= ~flags;
spin_unlock_bh(&sk->sk_lock.slock);
@@ -3475,13 +3533,8 @@ void mptcp_finish_connect(struct sock *ssk)
* accessing the field below
*/
WRITE_ONCE(msk->local_key, subflow->local_key);
- WRITE_ONCE(msk->write_seq, subflow->idsn + 1);
- WRITE_ONCE(msk->snd_nxt, msk->write_seq);
- WRITE_ONCE(msk->snd_una, msk->write_seq);
mptcp_pm_new_connection(msk, ssk, 0);
-
- mptcp_rcv_space_init(msk, ssk);
}
void mptcp_sock_graft(struct sock *sk, struct socket *parent)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 3517f2d..07f6242 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -286,7 +286,6 @@ struct mptcp_sock {
int rmem_released;
unsigned long flags;
unsigned long cb_flags;
- unsigned long push_pending;
bool recovery; /* closing subflow write queue reinjected */
bool can_ack;
bool fully_established;
@@ -305,7 +304,8 @@ struct mptcp_sock {
nodelay:1,
fastopening:1,
in_accept_queue:1,
- free_first:1;
+ free_first:1,
+ rcvspace_init:1;
struct work_struct work;
struct sk_buff *ooo_last_skb;
struct rb_root out_of_order_queue;
@@ -491,10 +491,9 @@ struct mptcp_subflow_context {
remote_key_valid : 1, /* received the peer key from */
disposable : 1, /* ctx can be free at ulp release time */
stale : 1, /* unable to snd/rcv data, do not use for xmit */
- local_id_valid : 1, /* local_id is correctly initialized */
valid_csum_seen : 1, /* at least one csum validated */
is_mptfo : 1, /* subflow is doing TFO */
- __unused : 9;
+ __unused : 10;
bool data_avail;
bool scheduled;
u32 remote_nonce;
@@ -505,7 +504,7 @@ struct mptcp_subflow_context {
u8 hmac[MPTCPOPT_HMAC_LEN]; /* MPJ subflow only */
u64 iasn; /* initial ack sequence number, MPC subflows only */
};
- u8 local_id;
+ s16 local_id; /* if negative not initialized yet */
u8 remote_id;
u8 reset_seen:1;
u8 reset_transient:1;
@@ -556,6 +555,7 @@ mptcp_subflow_ctx_reset(struct mptcp_subflow_context *subflow)
{
memset(&subflow->reset, 0, sizeof(subflow->reset));
subflow->request_mptcp = 1;
+ WRITE_ONCE(subflow->local_id, -1);
}
static inline u64
@@ -622,8 +622,9 @@ unsigned int mptcp_stale_loss_cnt(const struct net *net);
unsigned int mptcp_close_timeout(const struct sock *sk);
int mptcp_get_pm_type(const struct net *net);
const char *mptcp_get_scheduler(const struct net *net);
-void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
- const struct mptcp_options_received *mp_opt);
+void __mptcp_subflow_fully_established(struct mptcp_sock *msk,
+ struct mptcp_subflow_context *subflow,
+ const struct mptcp_options_received *mp_opt);
bool __mptcp_retransmit_pending_data(struct sock *sk);
void mptcp_check_and_set_pending(struct sock *sk);
void __mptcp_push_pending(struct sock *sk, unsigned int flags);
@@ -789,6 +790,16 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt);
}
+static inline void mptcp_write_space(struct sock *sk)
+{
+ if (sk_stream_is_writeable(sk)) {
+ /* pairs with memory barrier in mptcp_poll */
+ smp_mb();
+ if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
+ sk_stream_write_space(sk);
+ }
+}
+
static inline void __mptcp_sync_sndbuf(struct sock *sk)
{
struct mptcp_subflow_context *subflow;
@@ -807,6 +818,7 @@ static inline void __mptcp_sync_sndbuf(struct sock *sk)
/* the msk max wmem limit is <nr_subflows> * tcp wmem[2] */
WRITE_ONCE(sk->sk_sndbuf, new_sndbuf);
+ mptcp_write_space(sk);
}
/* The called held both the msk socket and the subflow socket locks,
@@ -837,16 +849,6 @@ static inline void mptcp_propagate_sndbuf(struct sock *sk, struct sock *ssk)
local_bh_enable();
}
-static inline void mptcp_write_space(struct sock *sk)
-{
- if (sk_stream_is_writeable(sk)) {
- /* pairs with memory barrier in mptcp_poll */
- smp_mb();
- if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
- sk_stream_write_space(sk);
- }
-}
-
void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags);
#define MPTCP_TOKEN_MAX_RETRIES 4
@@ -952,8 +954,8 @@ void mptcp_event_pm_listener(const struct sock *ssk,
enum mptcp_event_type event);
bool mptcp_userspace_pm_active(const struct mptcp_sock *msk);
-void mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
- const struct mptcp_options_received *mp_opt);
+void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
+ const struct mptcp_options_received *mp_opt);
void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subflow,
struct request_sock *req);
@@ -1021,6 +1023,15 @@ int mptcp_pm_get_local_id(struct mptcp_sock *msk, struct sock_common *skc);
int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc);
int mptcp_userspace_pm_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc);
+static inline u8 subflow_get_local_id(const struct mptcp_subflow_context *subflow)
+{
+ int local_id = READ_ONCE(subflow->local_id);
+
+ if (local_id < 0)
+ return 0;
+ return local_id;
+}
+
void __init mptcp_pm_nl_init(void);
void mptcp_pm_nl_work(struct mptcp_sock *msk);
void mptcp_pm_nl_rm_subflow_received(struct mptcp_sock *msk,
@@ -1128,7 +1139,8 @@ static inline bool subflow_simultaneous_connect(struct sock *sk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
- return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_FIN_WAIT1) &&
+ return (1 << sk->sk_state) &
+ (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING) &&
is_active_ssk(subflow) &&
!subflow->conn_finished;
}
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 0dcb721..71ba862 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -421,29 +421,26 @@ static bool subflow_use_different_dport(struct mptcp_sock *msk, const struct soc
void __mptcp_sync_state(struct sock *sk, int state)
{
+ struct mptcp_subflow_context *subflow;
struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sock *ssk = msk->first;
- __mptcp_propagate_sndbuf(sk, msk->first);
+ subflow = mptcp_subflow_ctx(ssk);
+ __mptcp_propagate_sndbuf(sk, ssk);
+ if (!msk->rcvspace_init)
+ mptcp_rcv_space_init(msk, ssk);
+
if (sk->sk_state == TCP_SYN_SENT) {
+ /* subflow->idsn is always available is TCP_SYN_SENT state,
+ * even for the FASTOPEN scenarios
+ */
+ WRITE_ONCE(msk->write_seq, subflow->idsn + 1);
+ WRITE_ONCE(msk->snd_nxt, msk->write_seq);
mptcp_set_state(sk, state);
sk->sk_state_change(sk);
}
}
-static void mptcp_propagate_state(struct sock *sk, struct sock *ssk)
-{
- struct mptcp_sock *msk = mptcp_sk(sk);
-
- mptcp_data_lock(sk);
- if (!sock_owned_by_user(sk)) {
- __mptcp_sync_state(sk, ssk->sk_state);
- } else {
- msk->pending_state = ssk->sk_state;
- __set_bit(MPTCP_SYNC_STATE, &msk->cb_flags);
- }
- mptcp_data_unlock(sk);
-}
-
static void subflow_set_remote_key(struct mptcp_sock *msk,
struct mptcp_subflow_context *subflow,
const struct mptcp_options_received *mp_opt)
@@ -465,6 +462,31 @@ static void subflow_set_remote_key(struct mptcp_sock *msk,
atomic64_set(&msk->rcv_wnd_sent, subflow->iasn);
}
+static void mptcp_propagate_state(struct sock *sk, struct sock *ssk,
+ struct mptcp_subflow_context *subflow,
+ const struct mptcp_options_received *mp_opt)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+
+ mptcp_data_lock(sk);
+ if (mp_opt) {
+ /* Options are available only in the non fallback cases
+ * avoid updating rx path fields otherwise
+ */
+ WRITE_ONCE(msk->snd_una, subflow->idsn + 1);
+ WRITE_ONCE(msk->wnd_end, subflow->idsn + 1 + tcp_sk(ssk)->snd_wnd);
+ subflow_set_remote_key(msk, subflow, mp_opt);
+ }
+
+ if (!sock_owned_by_user(sk)) {
+ __mptcp_sync_state(sk, ssk->sk_state);
+ } else {
+ msk->pending_state = ssk->sk_state;
+ __set_bit(MPTCP_SYNC_STATE, &msk->cb_flags);
+ }
+ mptcp_data_unlock(sk);
+}
+
static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
@@ -499,10 +521,9 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
if (mp_opt.deny_join_id0)
WRITE_ONCE(msk->pm.remote_deny_join_id0, true);
subflow->mp_capable = 1;
- subflow_set_remote_key(msk, subflow, &mp_opt);
MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEACTIVEACK);
mptcp_finish_connect(sk);
- mptcp_propagate_state(parent, sk);
+ mptcp_propagate_state(parent, sk, subflow, &mp_opt);
} else if (subflow->request_join) {
u8 hmac[SHA256_DIGEST_SIZE];
@@ -514,7 +535,7 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
subflow->backup = mp_opt.backup;
subflow->thmac = mp_opt.thmac;
subflow->remote_nonce = mp_opt.nonce;
- subflow->remote_id = mp_opt.join_id;
+ WRITE_ONCE(subflow->remote_id, mp_opt.join_id);
pr_debug("subflow=%p, thmac=%llu, remote_nonce=%u backup=%d",
subflow, subflow->thmac, subflow->remote_nonce,
subflow->backup);
@@ -545,8 +566,7 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
}
} else if (mptcp_check_fallback(sk)) {
fallback:
- mptcp_rcv_space_init(msk, sk);
- mptcp_propagate_state(parent, sk);
+ mptcp_propagate_state(parent, sk, subflow, NULL);
}
return;
@@ -557,8 +577,8 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
static void subflow_set_local_id(struct mptcp_subflow_context *subflow, int local_id)
{
- subflow->local_id = local_id;
- subflow->local_id_valid = 1;
+ WARN_ON_ONCE(local_id < 0 || local_id > 255);
+ WRITE_ONCE(subflow->local_id, local_id);
}
static int subflow_chk_local_id(struct sock *sk)
@@ -567,7 +587,7 @@ static int subflow_chk_local_id(struct sock *sk)
struct mptcp_sock *msk = mptcp_sk(subflow->conn);
int err;
- if (likely(subflow->local_id_valid))
+ if (likely(subflow->local_id >= 0))
return 0;
err = mptcp_pm_get_local_id(msk, (struct sock_common *)sk);
@@ -731,17 +751,16 @@ void mptcp_subflow_drop_ctx(struct sock *ssk)
kfree_rcu(ctx, rcu);
}
-void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
- const struct mptcp_options_received *mp_opt)
+void __mptcp_subflow_fully_established(struct mptcp_sock *msk,
+ struct mptcp_subflow_context *subflow,
+ const struct mptcp_options_received *mp_opt)
{
- struct mptcp_sock *msk = mptcp_sk(subflow->conn);
-
subflow_set_remote_key(msk, subflow, mp_opt);
subflow->fully_established = 1;
WRITE_ONCE(msk->fully_established, true);
if (subflow->is_mptfo)
- mptcp_fastopen_gen_msk_ackseq(msk, subflow, mp_opt);
+ __mptcp_fastopen_gen_msk_ackseq(msk, subflow, mp_opt);
}
static struct sock *subflow_syn_recv_sock(const struct sock *sk,
@@ -834,7 +853,6 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
* mpc option
*/
if (mp_opt.suboptions & OPTION_MPTCP_MPC_ACK) {
- mptcp_subflow_fully_established(ctx, &mp_opt);
mptcp_pm_fully_established(owner, child);
ctx->pm_notified = 1;
}
@@ -1549,7 +1567,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
pr_debug("msk=%p remote_token=%u local_id=%d remote_id=%d", msk,
remote_token, local_id, remote_id);
subflow->remote_token = remote_token;
- subflow->remote_id = remote_id;
+ WRITE_ONCE(subflow->remote_id, remote_id);
subflow->request_join = 1;
subflow->request_bkup = !!(flags & MPTCP_PM_ADDR_FLAG_BACKUP);
subflow->subflow_id = msk->subflow_id++;
@@ -1713,6 +1731,7 @@ static struct mptcp_subflow_context *subflow_create_ctx(struct sock *sk,
pr_debug("subflow=%p", ctx);
ctx->tcp_sock = sk;
+ WRITE_ONCE(ctx->local_id, -1);
return ctx;
}
@@ -1744,10 +1763,9 @@ static void subflow_state_change(struct sock *sk)
msk = mptcp_sk(parent);
if (subflow_simultaneous_connect(sk)) {
mptcp_do_fallback(sk);
- mptcp_rcv_space_init(msk, sk);
pr_fallback(msk);
subflow->conn_finished = 1;
- mptcp_propagate_state(parent, sk);
+ mptcp_propagate_state(parent, sk, subflow, NULL);
}
/* as recvmsg() does not acquire the subflow socket for ssk selection
@@ -1949,14 +1967,14 @@ static void subflow_ulp_clone(const struct request_sock *req,
new_ctx->idsn = subflow_req->idsn;
/* this is the first subflow, id is always 0 */
- new_ctx->local_id_valid = 1;
+ subflow_set_local_id(new_ctx, 0);
} else if (subflow_req->mp_join) {
new_ctx->ssn_offset = subflow_req->ssn_offset;
new_ctx->mp_join = 1;
new_ctx->fully_established = 1;
new_ctx->remote_key_valid = 1;
new_ctx->backup = subflow_req->backup;
- new_ctx->remote_id = subflow_req->remote_id;
+ WRITE_ONCE(new_ctx->remote_id, subflow_req->remote_id);
new_ctx->token = subflow_req->token;
new_ctx->thmac = subflow_req->thmac;
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index bcaad9c..3184cc6 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1154,6 +1154,7 @@ static int ip_set_create(struct sk_buff *skb, const struct nfnl_info *info,
return ret;
cleanup:
+ set->variant->cancel_gc(set);
set->variant->destroy(set);
put_out:
module_put(set->type->me);
@@ -2378,6 +2379,7 @@ ip_set_net_exit(struct net *net)
set = ip_set(inst, i);
if (set) {
ip_set(inst, i) = NULL;
+ set->variant->cancel_gc(set);
ip_set_destroy_set(set);
}
}
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 1136510..cf3ce72 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -432,7 +432,7 @@ mtype_ahash_destroy(struct ip_set *set, struct htable *t, bool ext_destroy)
u32 i;
for (i = 0; i < jhash_size(t->htable_bits); i++) {
- n = __ipset_dereference(hbucket(t, i));
+ n = (__force struct hbucket *)hbucket(t, i);
if (!n)
continue;
if (set->extensions & IPSET_EXT_DESTROY && ext_destroy)
@@ -452,7 +452,7 @@ mtype_destroy(struct ip_set *set)
struct htype *h = set->data;
struct list_head *l, *lt;
- mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true);
+ mtype_ahash_destroy(set, (__force struct htable *)h->table, true);
list_for_each_safe(l, lt, &h->ad) {
list_del(l);
kfree(l);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 2e5f386..5b876fa 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2756,6 +2756,7 @@ static const struct nf_ct_hook nf_conntrack_hook = {
.get_tuple_skb = nf_conntrack_get_tuple_skb,
.attach = nf_conntrack_attach,
.set_closing = nf_conntrack_set_closing,
+ .confirm = __nf_conntrack_confirm,
};
void nf_conntrack_init_end(void)
diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c
index e697a82..540d977 100644
--- a/net/netfilter/nf_conntrack_h323_asn1.c
+++ b/net/netfilter/nf_conntrack_h323_asn1.c
@@ -533,6 +533,8 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
/* Get fields bitmap */
if (nf_h323_error_boundary(bs, 0, f->sz))
return H323_ERROR_BOUND;
+ if (f->sz > 32)
+ return H323_ERROR_RANGE;
bmp = get_bitmap(bs, f->sz);
if (base)
*(unsigned int *)base = bmp;
@@ -589,6 +591,8 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
bmp2_len = get_bits(bs, 7) + 1;
if (nf_h323_error_boundary(bs, 0, bmp2_len))
return H323_ERROR_BOUND;
+ if (bmp2_len > 32)
+ return H323_ERROR_RANGE;
bmp2 = get_bitmap(bs, bmp2_len);
bmp |= bmp2 >> f->sz;
if (base)
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 0c22a02..3b846cb 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -876,6 +876,7 @@ struct ctnetlink_filter_u32 {
struct ctnetlink_filter {
u8 family;
+ bool zone_filter;
u_int32_t orig_flags;
u_int32_t reply_flags;
@@ -992,9 +993,12 @@ ctnetlink_alloc_filter(const struct nlattr * const cda[], u8 family)
if (err)
goto err_filter;
- err = ctnetlink_parse_zone(cda[CTA_ZONE], &filter->zone);
- if (err < 0)
- goto err_filter;
+ if (cda[CTA_ZONE]) {
+ err = ctnetlink_parse_zone(cda[CTA_ZONE], &filter->zone);
+ if (err < 0)
+ goto err_filter;
+ filter->zone_filter = true;
+ }
if (!cda[CTA_FILTER])
return filter;
@@ -1148,7 +1152,7 @@ static int ctnetlink_filter_match(struct nf_conn *ct, void *data)
if (filter->family && nf_ct_l3num(ct) != filter->family)
goto ignore_entry;
- if (filter->zone.id != NF_CT_DEFAULT_ZONE_ID &&
+ if (filter->zone_filter &&
!nf_ct_zone_equal_any(ct, &filter->zone))
goto ignore_entry;
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 920a5a2..a057133 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -87,12 +87,22 @@ static u32 flow_offload_dst_cookie(struct flow_offload_tuple *flow_tuple)
return 0;
}
+static struct dst_entry *nft_route_dst_fetch(struct nf_flow_route *route,
+ enum flow_offload_tuple_dir dir)
+{
+ struct dst_entry *dst = route->tuple[dir].dst;
+
+ route->tuple[dir].dst = NULL;
+
+ return dst;
+}
+
static int flow_offload_fill_route(struct flow_offload *flow,
- const struct nf_flow_route *route,
+ struct nf_flow_route *route,
enum flow_offload_tuple_dir dir)
{
struct flow_offload_tuple *flow_tuple = &flow->tuplehash[dir].tuple;
- struct dst_entry *dst = route->tuple[dir].dst;
+ struct dst_entry *dst = nft_route_dst_fetch(route, dir);
int i, j = 0;
switch (flow_tuple->l3proto) {
@@ -122,6 +132,7 @@ static int flow_offload_fill_route(struct flow_offload *flow,
ETH_ALEN);
flow_tuple->out.ifidx = route->tuple[dir].out.ifindex;
flow_tuple->out.hw_ifidx = route->tuple[dir].out.hw_ifindex;
+ dst_release(dst);
break;
case FLOW_OFFLOAD_XMIT_XFRM:
case FLOW_OFFLOAD_XMIT_NEIGH:
@@ -146,7 +157,7 @@ static void nft_flow_dst_release(struct flow_offload *flow,
}
void flow_offload_route_init(struct flow_offload *flow,
- const struct nf_flow_route *route)
+ struct nf_flow_route *route)
{
flow_offload_fill_route(flow, route, FLOW_OFFLOAD_DIR_ORIGINAL);
flow_offload_fill_route(flow, route, FLOW_OFFLOAD_DIR_REPLY);
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index c3d7ecb..016c816 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -551,8 +551,11 @@ static void nf_nat_l4proto_unique_tuple(struct nf_conntrack_tuple *tuple,
find_free_id:
if (range->flags & NF_NAT_RANGE_PROTO_OFFSET)
off = (ntohs(*keyptr) - ntohs(range->base_proto.all));
- else
+ else if ((range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL) ||
+ maniptype != NF_NAT_MANIP_DST)
off = get_random_u16();
+ else
+ off = 0;
attempts = range_size;
if (attempts > NF_NAT_MAX_ATTEMPTS)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index fc016be..1683dc1 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -684,15 +684,16 @@ static int nft_delobj(struct nft_ctx *ctx, struct nft_object *obj)
return err;
}
-static int nft_trans_flowtable_add(struct nft_ctx *ctx, int msg_type,
- struct nft_flowtable *flowtable)
+static struct nft_trans *
+nft_trans_flowtable_add(struct nft_ctx *ctx, int msg_type,
+ struct nft_flowtable *flowtable)
{
struct nft_trans *trans;
trans = nft_trans_alloc(ctx, msg_type,
sizeof(struct nft_trans_flowtable));
if (trans == NULL)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
if (msg_type == NFT_MSG_NEWFLOWTABLE)
nft_activate_next(ctx->net, flowtable);
@@ -701,22 +702,22 @@ static int nft_trans_flowtable_add(struct nft_ctx *ctx, int msg_type,
nft_trans_flowtable(trans) = flowtable;
nft_trans_commit_list_add_tail(ctx->net, trans);
- return 0;
+ return trans;
}
static int nft_delflowtable(struct nft_ctx *ctx,
struct nft_flowtable *flowtable)
{
- int err;
+ struct nft_trans *trans;
- err = nft_trans_flowtable_add(ctx, NFT_MSG_DELFLOWTABLE, flowtable);
- if (err < 0)
- return err;
+ trans = nft_trans_flowtable_add(ctx, NFT_MSG_DELFLOWTABLE, flowtable);
+ if (IS_ERR(trans))
+ return PTR_ERR(trans);
nft_deactivate_next(ctx->net, flowtable);
nft_use_dec(&ctx->table->use);
- return err;
+ return 0;
}
static void __nft_reg_track_clobber(struct nft_regs_track *track, u8 dreg)
@@ -1251,6 +1252,7 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
return 0;
err_register_hooks:
+ ctx->table->flags |= NFT_TABLE_F_DORMANT;
nft_trans_destroy(trans);
return ret;
}
@@ -2080,7 +2082,7 @@ static struct nft_hook *nft_netdev_hook_alloc(struct net *net,
struct nft_hook *hook;
int err;
- hook = kmalloc(sizeof(struct nft_hook), GFP_KERNEL_ACCOUNT);
+ hook = kzalloc(sizeof(struct nft_hook), GFP_KERNEL_ACCOUNT);
if (!hook) {
err = -ENOMEM;
goto err_hook_alloc;
@@ -2503,19 +2505,15 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
RCU_INIT_POINTER(chain->blob_gen_0, blob);
RCU_INIT_POINTER(chain->blob_gen_1, blob);
- err = nf_tables_register_hook(net, table, chain);
- if (err < 0)
- goto err_destroy_chain;
-
if (!nft_use_inc(&table->use)) {
err = -EMFILE;
- goto err_use;
+ goto err_destroy_chain;
}
trans = nft_trans_chain_add(ctx, NFT_MSG_NEWCHAIN);
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
- goto err_unregister_hook;
+ goto err_trans;
}
nft_trans_chain_policy(trans) = NFT_CHAIN_POLICY_UNSET;
@@ -2523,17 +2521,22 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
nft_trans_chain_policy(trans) = policy;
err = nft_chain_add(table, chain);
- if (err < 0) {
- nft_trans_destroy(trans);
- goto err_unregister_hook;
- }
+ if (err < 0)
+ goto err_chain_add;
+
+ /* This must be LAST to ensure no packets are walking over this chain. */
+ err = nf_tables_register_hook(net, table, chain);
+ if (err < 0)
+ goto err_register_hook;
return 0;
-err_unregister_hook:
+err_register_hook:
+ nft_chain_del(chain);
+err_chain_add:
+ nft_trans_destroy(trans);
+err_trans:
nft_use_dec_restore(&table->use);
-err_use:
- nf_tables_unregister_hook(net, table, chain);
err_destroy_chain:
nf_tables_chain_destroy(ctx);
@@ -4998,6 +5001,12 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
if ((flags & (NFT_SET_EVAL | NFT_SET_OBJECT)) ==
(NFT_SET_EVAL | NFT_SET_OBJECT))
return -EOPNOTSUPP;
+ if ((flags & (NFT_SET_ANONYMOUS | NFT_SET_TIMEOUT | NFT_SET_EVAL)) ==
+ (NFT_SET_ANONYMOUS | NFT_SET_TIMEOUT))
+ return -EOPNOTSUPP;
+ if ((flags & (NFT_SET_CONSTANT | NFT_SET_TIMEOUT)) ==
+ (NFT_SET_CONSTANT | NFT_SET_TIMEOUT))
+ return -EOPNOTSUPP;
}
desc.dtype = 0;
@@ -5421,6 +5430,7 @@ static void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
if (list_empty(&set->bindings) && nft_set_is_anonymous(set)) {
list_del_rcu(&set->list);
+ set->dead = 1;
if (event)
nf_tables_set_notify(ctx, set, NFT_MSG_DELSET,
GFP_KERNEL);
@@ -8455,9 +8465,9 @@ static int nf_tables_newflowtable(struct sk_buff *skb,
u8 family = info->nfmsg->nfgen_family;
const struct nf_flowtable_type *type;
struct nft_flowtable *flowtable;
- struct nft_hook *hook, *next;
struct net *net = info->net;
struct nft_table *table;
+ struct nft_trans *trans;
struct nft_ctx ctx;
int err;
@@ -8537,34 +8547,34 @@ static int nf_tables_newflowtable(struct sk_buff *skb,
err = nft_flowtable_parse_hook(&ctx, nla, &flowtable_hook, flowtable,
extack, true);
if (err < 0)
- goto err4;
+ goto err_flowtable_parse_hooks;
list_splice(&flowtable_hook.list, &flowtable->hook_list);
flowtable->data.priority = flowtable_hook.priority;
flowtable->hooknum = flowtable_hook.num;
+ trans = nft_trans_flowtable_add(&ctx, NFT_MSG_NEWFLOWTABLE, flowtable);
+ if (IS_ERR(trans)) {
+ err = PTR_ERR(trans);
+ goto err_flowtable_trans;
+ }
+
+ /* This must be LAST to ensure no packets are walking over this flowtable. */
err = nft_register_flowtable_net_hooks(ctx.net, table,
&flowtable->hook_list,
flowtable);
- if (err < 0) {
- nft_hooks_destroy(&flowtable->hook_list);
- goto err4;
- }
-
- err = nft_trans_flowtable_add(&ctx, NFT_MSG_NEWFLOWTABLE, flowtable);
if (err < 0)
- goto err5;
+ goto err_flowtable_hooks;
list_add_tail_rcu(&flowtable->list, &table->flowtables);
return 0;
-err5:
- list_for_each_entry_safe(hook, next, &flowtable->hook_list, list) {
- nft_unregister_flowtable_hook(net, flowtable, hook);
- list_del_rcu(&hook->list);
- kfree_rcu(hook, rcu);
- }
-err4:
+
+err_flowtable_hooks:
+ nft_trans_destroy(trans);
+err_flowtable_trans:
+ nft_hooks_destroy(&flowtable->hook_list);
+err_flowtable_parse_hooks:
flowtable->data.type->free(&flowtable->data);
err3:
module_put(type->owner);
@@ -9827,6 +9837,7 @@ struct nft_trans_gc *nft_trans_gc_catchall_async(struct nft_trans_gc *gc,
struct nft_trans_gc *nft_trans_gc_catchall_sync(struct nft_trans_gc *gc)
{
struct nft_set_elem_catchall *catchall, *next;
+ u64 tstamp = nft_net_tstamp(gc->net);
const struct nft_set *set = gc->set;
struct nft_elem_priv *elem_priv;
struct nft_set_ext *ext;
@@ -9836,7 +9847,7 @@ struct nft_trans_gc *nft_trans_gc_catchall_sync(struct nft_trans_gc *gc)
list_for_each_entry_safe(catchall, next, &set->catchall_list, list) {
ext = nft_set_elem_ext(set, catchall->elem);
- if (!nft_set_elem_expired(ext))
+ if (!__nft_set_elem_expired(ext, tstamp))
continue;
gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
@@ -10622,6 +10633,7 @@ static bool nf_tables_valid_genid(struct net *net, u32 genid)
bool genid_ok;
mutex_lock(&nft_net->commit_mutex);
+ nft_net->tstamp = get_jiffies_64();
genid_ok = genid == 0 || nft_net->base_seq == genid;
if (!genid_ok)
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 171d1f5..5cf38fc 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -232,18 +232,25 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
if (verdict == NF_ACCEPT ||
verdict == NF_REPEAT ||
verdict == NF_STOP) {
+ unsigned int ct_verdict = verdict;
+
rcu_read_lock();
ct_hook = rcu_dereference(nf_ct_hook);
if (ct_hook)
- verdict = ct_hook->update(entry->state.net, entry->skb);
+ ct_verdict = ct_hook->update(entry->state.net, entry->skb);
rcu_read_unlock();
- switch (verdict & NF_VERDICT_MASK) {
+ switch (ct_verdict & NF_VERDICT_MASK) {
+ case NF_ACCEPT:
+ /* follow userspace verdict, could be REPEAT */
+ break;
case NF_STOLEN:
nf_queue_entry_free(entry);
return;
+ default:
+ verdict = ct_verdict & NF_VERDICT_MASK;
+ break;
}
-
}
nf_reinject(entry, verdict);
}
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index f0eeda9..d3d11de 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -135,7 +135,7 @@ static void nft_target_eval_bridge(const struct nft_expr *expr,
static const struct nla_policy nft_target_policy[NFTA_TARGET_MAX + 1] = {
[NFTA_TARGET_NAME] = { .type = NLA_NUL_STRING },
- [NFTA_TARGET_REV] = { .type = NLA_U32 },
+ [NFTA_TARGET_REV] = NLA_POLICY_MAX(NLA_BE32, 255),
[NFTA_TARGET_INFO] = { .type = NLA_BINARY },
};
@@ -200,6 +200,7 @@ static const struct nla_policy nft_rule_compat_policy[NFTA_RULE_COMPAT_MAX + 1]
static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv)
{
struct nlattr *tb[NFTA_RULE_COMPAT_MAX+1];
+ u32 l4proto;
u32 flags;
int err;
@@ -212,12 +213,18 @@ static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv)
return -EINVAL;
flags = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_FLAGS]));
- if (flags & ~NFT_RULE_COMPAT_F_MASK)
+ if (flags & NFT_RULE_COMPAT_F_UNUSED ||
+ flags & ~NFT_RULE_COMPAT_F_MASK)
return -EINVAL;
if (flags & NFT_RULE_COMPAT_F_INV)
*inv = true;
- *proto = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_PROTO]));
+ l4proto = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_PROTO]));
+ if (l4proto > U16_MAX)
+ return -EINVAL;
+
+ *proto = l4proto;
+
return 0;
}
@@ -352,10 +359,20 @@ static int nft_target_validate(const struct nft_ctx *ctx,
if (ctx->family != NFPROTO_IPV4 &&
ctx->family != NFPROTO_IPV6 &&
+ ctx->family != NFPROTO_INET &&
ctx->family != NFPROTO_BRIDGE &&
ctx->family != NFPROTO_ARP)
return -EOPNOTSUPP;
+ ret = nft_chain_validate_hooks(ctx->chain,
+ (1 << NF_INET_PRE_ROUTING) |
+ (1 << NF_INET_LOCAL_IN) |
+ (1 << NF_INET_FORWARD) |
+ (1 << NF_INET_LOCAL_OUT) |
+ (1 << NF_INET_POST_ROUTING));
+ if (ret)
+ return ret;
+
if (nft_is_base_chain(ctx->chain)) {
const struct nft_base_chain *basechain =
nft_base_chain(ctx->chain);
@@ -419,7 +436,7 @@ static void nft_match_eval(const struct nft_expr *expr,
static const struct nla_policy nft_match_policy[NFTA_MATCH_MAX + 1] = {
[NFTA_MATCH_NAME] = { .type = NLA_NUL_STRING },
- [NFTA_MATCH_REV] = { .type = NLA_U32 },
+ [NFTA_MATCH_REV] = NLA_POLICY_MAX(NLA_BE32, 255),
[NFTA_MATCH_INFO] = { .type = NLA_BINARY },
};
@@ -603,10 +620,20 @@ static int nft_match_validate(const struct nft_ctx *ctx,
if (ctx->family != NFPROTO_IPV4 &&
ctx->family != NFPROTO_IPV6 &&
+ ctx->family != NFPROTO_INET &&
ctx->family != NFPROTO_BRIDGE &&
ctx->family != NFPROTO_ARP)
return -EOPNOTSUPP;
+ ret = nft_chain_validate_hooks(ctx->chain,
+ (1 << NF_INET_PRE_ROUTING) |
+ (1 << NF_INET_LOCAL_IN) |
+ (1 << NF_INET_FORWARD) |
+ (1 << NF_INET_LOCAL_OUT) |
+ (1 << NF_INET_POST_ROUTING));
+ if (ret)
+ return ret;
+
if (nft_is_base_chain(ctx->chain)) {
const struct nft_base_chain *basechain =
nft_base_chain(ctx->chain);
@@ -724,7 +751,7 @@ static int nfnl_compat_get_rcu(struct sk_buff *skb,
static const struct nla_policy nfnl_compat_policy_get[NFTA_COMPAT_MAX+1] = {
[NFTA_COMPAT_NAME] = { .type = NLA_NUL_STRING,
.len = NFT_COMPAT_NAME_MAX-1 },
- [NFTA_COMPAT_REV] = { .type = NLA_U32 },
+ [NFTA_COMPAT_REV] = NLA_POLICY_MAX(NLA_BE32, 255),
[NFTA_COMPAT_TYPE] = { .type = NLA_U32 },
};
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index aac98a3..2556400 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -476,6 +476,9 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
break;
#endif
case NFT_CT_ID:
+ if (tb[NFTA_CT_DIRECTION])
+ return -EINVAL;
+
len = sizeof(u32);
break;
default:
@@ -1253,14 +1256,13 @@ static int nft_ct_expect_obj_init(const struct nft_ctx *ctx,
switch (priv->l3num) {
case NFPROTO_IPV4:
case NFPROTO_IPV6:
- if (priv->l3num != ctx->family)
- return -EINVAL;
+ if (priv->l3num == ctx->family || ctx->family == NFPROTO_INET)
+ break;
- fallthrough;
- case NFPROTO_INET:
- break;
+ return -EINVAL;
+ case NFPROTO_INET: /* tuple.src.l3num supports NFPROTO_IPV4/6 only */
default:
- return -EOPNOTSUPP;
+ return -EAFNOSUPPORT;
}
priv->l4proto = nla_get_u8(tb[NFTA_CT_EXPECT_L4PROTO]);
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 397351f..ab95760 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -361,6 +361,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
}
+ __set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
ret = flow_offload_add(flowtable, flow);
if (ret < 0)
goto err_flow_add;
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 6c2061b..6968a3b 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -36,6 +36,7 @@ struct nft_rhash_cmp_arg {
const struct nft_set *set;
const u32 *key;
u8 genmask;
+ u64 tstamp;
};
static inline u32 nft_rhash_key(const void *data, u32 len, u32 seed)
@@ -62,7 +63,7 @@ static inline int nft_rhash_cmp(struct rhashtable_compare_arg *arg,
return 1;
if (nft_set_elem_is_dead(&he->ext))
return 1;
- if (nft_set_elem_expired(&he->ext))
+ if (__nft_set_elem_expired(&he->ext, x->tstamp))
return 1;
if (!nft_set_elem_active(&he->ext, x->genmask))
return 1;
@@ -87,6 +88,7 @@ bool nft_rhash_lookup(const struct net *net, const struct nft_set *set,
.genmask = nft_genmask_cur(net),
.set = set,
.key = key,
+ .tstamp = get_jiffies_64(),
};
he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
@@ -106,6 +108,7 @@ nft_rhash_get(const struct net *net, const struct nft_set *set,
.genmask = nft_genmask_cur(net),
.set = set,
.key = elem->key.val.data,
+ .tstamp = get_jiffies_64(),
};
he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
@@ -131,6 +134,7 @@ static bool nft_rhash_update(struct nft_set *set, const u32 *key,
.genmask = NFT_GENMASK_ANY,
.set = set,
.key = key,
+ .tstamp = get_jiffies_64(),
};
he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
@@ -175,6 +179,7 @@ static int nft_rhash_insert(const struct net *net, const struct nft_set *set,
.genmask = nft_genmask_next(net),
.set = set,
.key = elem->key.val.data,
+ .tstamp = nft_net_tstamp(net),
};
struct nft_rhash_elem *prev;
@@ -216,6 +221,7 @@ nft_rhash_deactivate(const struct net *net, const struct nft_set *set,
.genmask = nft_genmask_next(net),
.set = set,
.key = elem->key.val.data,
+ .tstamp = nft_net_tstamp(net),
};
rcu_read_lock();
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index efd5234..aa1d9e9 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -342,9 +342,6 @@
#include "nft_set_pipapo_avx2.h"
#include "nft_set_pipapo.h"
-/* Current working bitmap index, toggled between field matches */
-static DEFINE_PER_CPU(bool, nft_pipapo_scratch_index);
-
/**
* pipapo_refill() - For each set bit, set bits from selected mapping table item
* @map: Bitmap to be scanned for set bits
@@ -412,6 +409,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext)
{
struct nft_pipapo *priv = nft_set_priv(set);
+ struct nft_pipapo_scratch *scratch;
unsigned long *res_map, *fill_map;
u8 genmask = nft_genmask_cur(net);
const u8 *rp = (const u8 *)key;
@@ -422,15 +420,17 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
local_bh_disable();
- map_index = raw_cpu_read(nft_pipapo_scratch_index);
-
m = rcu_dereference(priv->match);
if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
goto out;
- res_map = *raw_cpu_ptr(m->scratch) + (map_index ? m->bsize_max : 0);
- fill_map = *raw_cpu_ptr(m->scratch) + (map_index ? 0 : m->bsize_max);
+ scratch = *raw_cpu_ptr(m->scratch);
+
+ map_index = scratch->map_index;
+
+ res_map = scratch->map + (map_index ? m->bsize_max : 0);
+ fill_map = scratch->map + (map_index ? 0 : m->bsize_max);
memset(res_map, 0xff, m->bsize_max * sizeof(*res_map));
@@ -460,7 +460,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
b = pipapo_refill(res_map, f->bsize, f->rules, fill_map, f->mt,
last);
if (b < 0) {
- raw_cpu_write(nft_pipapo_scratch_index, map_index);
+ scratch->map_index = map_index;
local_bh_enable();
return false;
@@ -477,7 +477,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
* current inactive bitmap is clean and can be reused as
* *next* bitmap (not initial) for the next packet.
*/
- raw_cpu_write(nft_pipapo_scratch_index, map_index);
+ scratch->map_index = map_index;
local_bh_enable();
return true;
@@ -504,6 +504,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
* @set: nftables API set representation
* @data: Key data to be matched against existing elements
* @genmask: If set, check that element is active in given genmask
+ * @tstamp: timestamp to check for expired elements
*
* This is essentially the same as the lookup function, except that it matches
* key data against the uncommitted copy and doesn't use preallocated maps for
@@ -513,7 +514,8 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
*/
static struct nft_pipapo_elem *pipapo_get(const struct net *net,
const struct nft_set *set,
- const u8 *data, u8 genmask)
+ const u8 *data, u8 genmask,
+ u64 tstamp)
{
struct nft_pipapo_elem *ret = ERR_PTR(-ENOENT);
struct nft_pipapo *priv = nft_set_priv(set);
@@ -566,7 +568,7 @@ static struct nft_pipapo_elem *pipapo_get(const struct net *net,
goto out;
if (last) {
- if (nft_set_elem_expired(&f->mt[b].e->ext))
+ if (__nft_set_elem_expired(&f->mt[b].e->ext, tstamp))
goto next_match;
if ((genmask &&
!nft_set_elem_active(&f->mt[b].e->ext, genmask)))
@@ -603,10 +605,10 @@ static struct nft_elem_priv *
nft_pipapo_get(const struct net *net, const struct nft_set *set,
const struct nft_set_elem *elem, unsigned int flags)
{
- static struct nft_pipapo_elem *e;
+ struct nft_pipapo_elem *e;
e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
- nft_genmask_cur(net));
+ nft_genmask_cur(net), get_jiffies_64());
if (IS_ERR(e))
return ERR_CAST(e);
@@ -1109,6 +1111,25 @@ static void pipapo_map(struct nft_pipapo_match *m,
}
/**
+ * pipapo_free_scratch() - Free per-CPU map at original (not aligned) address
+ * @m: Matching data
+ * @cpu: CPU number
+ */
+static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int cpu)
+{
+ struct nft_pipapo_scratch *s;
+ void *mem;
+
+ s = *per_cpu_ptr(m->scratch, cpu);
+ if (!s)
+ return;
+
+ mem = s;
+ mem -= s->align_off;
+ kfree(mem);
+}
+
+/**
* pipapo_realloc_scratch() - Reallocate scratch maps for partial match results
* @clone: Copy of matching data with pending insertions and deletions
* @bsize_max: Maximum bucket size, scratch maps cover two buckets
@@ -1121,12 +1142,13 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
int i;
for_each_possible_cpu(i) {
- unsigned long *scratch;
+ struct nft_pipapo_scratch *scratch;
#ifdef NFT_PIPAPO_ALIGN
- unsigned long *scratch_aligned;
+ void *scratch_aligned;
+ u32 align_off;
#endif
-
- scratch = kzalloc_node(bsize_max * sizeof(*scratch) * 2 +
+ scratch = kzalloc_node(struct_size(scratch, map,
+ bsize_max * 2) +
NFT_PIPAPO_ALIGN_HEADROOM,
GFP_KERNEL, cpu_to_node(i));
if (!scratch) {
@@ -1140,14 +1162,25 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
return -ENOMEM;
}
- kfree(*per_cpu_ptr(clone->scratch, i));
-
- *per_cpu_ptr(clone->scratch, i) = scratch;
+ pipapo_free_scratch(clone, i);
#ifdef NFT_PIPAPO_ALIGN
- scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch);
- *per_cpu_ptr(clone->scratch_aligned, i) = scratch_aligned;
+ /* Align &scratch->map (not the struct itself): the extra
+ * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node()
+ * above guarantee we can waste up to those bytes in order
+ * to align the map field regardless of its offset within
+ * the struct.
+ */
+ BUILD_BUG_ON(offsetof(struct nft_pipapo_scratch, map) > NFT_PIPAPO_ALIGN_HEADROOM);
+
+ scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
+ scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);
+ align_off = scratch_aligned - (void *)scratch;
+
+ scratch = scratch_aligned;
+ scratch->align_off = align_off;
#endif
+ *per_cpu_ptr(clone->scratch, i) = scratch;
}
return 0;
@@ -1173,6 +1206,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
struct nft_pipapo_match *m = priv->clone;
u8 genmask = nft_genmask_next(net);
struct nft_pipapo_elem *e, *dup;
+ u64 tstamp = nft_net_tstamp(net);
struct nft_pipapo_field *f;
const u8 *start_p, *end_p;
int i, bsize_max, err = 0;
@@ -1182,7 +1216,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
else
end = start;
- dup = pipapo_get(net, set, start, genmask);
+ dup = pipapo_get(net, set, start, genmask, tstamp);
if (!IS_ERR(dup)) {
/* Check if we already have the same exact entry */
const struct nft_data *dup_key, *dup_end;
@@ -1204,7 +1238,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
if (PTR_ERR(dup) == -ENOENT) {
/* Look for partially overlapping entries */
- dup = pipapo_get(net, set, end, nft_genmask_next(net));
+ dup = pipapo_get(net, set, end, nft_genmask_next(net), tstamp);
}
if (PTR_ERR(dup) != -ENOENT) {
@@ -1301,11 +1335,6 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
if (!new->scratch)
goto out_scratch;
-#ifdef NFT_PIPAPO_ALIGN
- new->scratch_aligned = alloc_percpu(*new->scratch_aligned);
- if (!new->scratch_aligned)
- goto out_scratch;
-#endif
for_each_possible_cpu(i)
*per_cpu_ptr(new->scratch, i) = NULL;
@@ -1357,10 +1386,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
}
out_scratch_realloc:
for_each_possible_cpu(i)
- kfree(*per_cpu_ptr(new->scratch, i));
-#ifdef NFT_PIPAPO_ALIGN
- free_percpu(new->scratch_aligned);
-#endif
+ pipapo_free_scratch(new, i);
out_scratch:
free_percpu(new->scratch);
kfree(new);
@@ -1560,6 +1586,7 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
{
struct nft_pipapo *priv = nft_set_priv(set);
struct net *net = read_pnet(&set->net);
+ u64 tstamp = nft_net_tstamp(net);
int rules_f0, first_rule = 0;
struct nft_pipapo_elem *e;
struct nft_trans_gc *gc;
@@ -1594,7 +1621,7 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
/* synchronous gc never fails, there is no need to set on
* NFT_SET_ELEM_DEAD_BIT.
*/
- if (nft_set_elem_expired(&e->ext)) {
+ if (__nft_set_elem_expired(&e->ext, tstamp)) {
priv->dirty = true;
gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
@@ -1640,13 +1667,9 @@ static void pipapo_free_match(struct nft_pipapo_match *m)
int i;
for_each_possible_cpu(i)
- kfree(*per_cpu_ptr(m->scratch, i));
+ pipapo_free_scratch(m, i);
-#ifdef NFT_PIPAPO_ALIGN
- free_percpu(m->scratch_aligned);
-#endif
free_percpu(m->scratch);
-
pipapo_free_fields(m);
kfree(m);
@@ -1769,7 +1792,7 @@ static void *pipapo_deactivate(const struct net *net, const struct nft_set *set,
{
struct nft_pipapo_elem *e;
- e = pipapo_get(net, set, data, nft_genmask_next(net));
+ e = pipapo_get(net, set, data, nft_genmask_next(net), nft_net_tstamp(net));
if (IS_ERR(e))
return NULL;
@@ -2132,7 +2155,7 @@ static int nft_pipapo_init(const struct nft_set *set,
m->field_count = field_count;
m->bsize_max = 0;
- m->scratch = alloc_percpu(unsigned long *);
+ m->scratch = alloc_percpu(struct nft_pipapo_scratch *);
if (!m->scratch) {
err = -ENOMEM;
goto out_scratch;
@@ -2140,16 +2163,6 @@ static int nft_pipapo_init(const struct nft_set *set,
for_each_possible_cpu(i)
*per_cpu_ptr(m->scratch, i) = NULL;
-#ifdef NFT_PIPAPO_ALIGN
- m->scratch_aligned = alloc_percpu(unsigned long *);
- if (!m->scratch_aligned) {
- err = -ENOMEM;
- goto out_free;
- }
- for_each_possible_cpu(i)
- *per_cpu_ptr(m->scratch_aligned, i) = NULL;
-#endif
-
rcu_head_init(&m->rcu);
nft_pipapo_for_each_field(f, i, m) {
@@ -2180,9 +2193,6 @@ static int nft_pipapo_init(const struct nft_set *set,
return 0;
out_free:
-#ifdef NFT_PIPAPO_ALIGN
- free_percpu(m->scratch_aligned);
-#endif
free_percpu(m->scratch);
out_scratch:
kfree(m);
@@ -2236,11 +2246,8 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
nft_set_pipapo_match_destroy(ctx, set, m);
-#ifdef NFT_PIPAPO_ALIGN
- free_percpu(m->scratch_aligned);
-#endif
for_each_possible_cpu(cpu)
- kfree(*per_cpu_ptr(m->scratch, cpu));
+ pipapo_free_scratch(m, cpu);
free_percpu(m->scratch);
pipapo_free_fields(m);
kfree(m);
@@ -2253,11 +2260,8 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
if (priv->dirty)
nft_set_pipapo_match_destroy(ctx, set, m);
-#ifdef NFT_PIPAPO_ALIGN
- free_percpu(priv->clone->scratch_aligned);
-#endif
for_each_possible_cpu(cpu)
- kfree(*per_cpu_ptr(priv->clone->scratch, cpu));
+ pipapo_free_scratch(priv->clone, cpu);
free_percpu(priv->clone->scratch);
pipapo_free_fields(priv->clone);
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index 1040223..3842c73 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -131,20 +131,28 @@ struct nft_pipapo_field {
};
/**
+ * struct nft_pipapo_scratch - percpu data used for lookup and matching
+ * @map_index: Current working bitmap index, toggled between field matches
+ * @align_off: Offset to get the originally allocated address
+ * @map: store partial matching results during lookup
+ */
+struct nft_pipapo_scratch {
+ u8 map_index;
+ u32 align_off;
+ unsigned long map[];
+};
+
+/**
* struct nft_pipapo_match - Data used for lookup and matching
- * @field_count Amount of fields in set
+ * @field_count: Amount of fields in set
* @scratch: Preallocated per-CPU maps for partial matching results
- * @scratch_aligned: Version of @scratch aligned to NFT_PIPAPO_ALIGN bytes
* @bsize_max: Maximum lookup table bucket size of all fields, in longs
- * @rcu Matching data is swapped on commits
+ * @rcu: Matching data is swapped on commits
* @f: Fields, with lookup and mapping tables
*/
struct nft_pipapo_match {
int field_count;
-#ifdef NFT_PIPAPO_ALIGN
- unsigned long * __percpu *scratch_aligned;
-#endif
- unsigned long * __percpu *scratch;
+ struct nft_pipapo_scratch * __percpu *scratch;
size_t bsize_max;
struct rcu_head rcu;
struct nft_pipapo_field f[] __counted_by(field_count);
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 52e0d02..a3a8ddc 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -57,7 +57,7 @@
/* Jump to label if @reg is zero */
#define NFT_PIPAPO_AVX2_NOMATCH_GOTO(reg, label) \
- asm_volatile_goto("vptest %%ymm" #reg ", %%ymm" #reg ";" \
+ asm goto("vptest %%ymm" #reg ", %%ymm" #reg ";" \
"je %l[" #label "]" : : : : label)
/* Store 256 bits from YMM register into memory. Contrary to bucket load
@@ -71,9 +71,6 @@
#define NFT_PIPAPO_AVX2_ZERO(reg) \
asm volatile("vpxor %ymm" #reg ", %ymm" #reg ", %ymm" #reg)
-/* Current working bitmap index, toggled between field matches */
-static DEFINE_PER_CPU(bool, nft_pipapo_avx2_scratch_index);
-
/**
* nft_pipapo_avx2_prepare() - Prepare before main algorithm body
*
@@ -1120,11 +1117,12 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext)
{
struct nft_pipapo *priv = nft_set_priv(set);
- unsigned long *res, *fill, *scratch;
+ struct nft_pipapo_scratch *scratch;
u8 genmask = nft_genmask_cur(net);
const u8 *rp = (const u8 *)key;
struct nft_pipapo_match *m;
struct nft_pipapo_field *f;
+ unsigned long *res, *fill;
bool map_index;
int i, ret = 0;
@@ -1141,15 +1139,16 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
*/
kernel_fpu_begin_mask(0);
- scratch = *raw_cpu_ptr(m->scratch_aligned);
+ scratch = *raw_cpu_ptr(m->scratch);
if (unlikely(!scratch)) {
kernel_fpu_end();
return false;
}
- map_index = raw_cpu_read(nft_pipapo_avx2_scratch_index);
- res = scratch + (map_index ? m->bsize_max : 0);
- fill = scratch + (map_index ? 0 : m->bsize_max);
+ map_index = scratch->map_index;
+
+ res = scratch->map + (map_index ? m->bsize_max : 0);
+ fill = scratch->map + (map_index ? 0 : m->bsize_max);
/* Starting map doesn't need to be set for this implementation */
@@ -1221,7 +1220,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
out:
if (i % 2)
- raw_cpu_write(nft_pipapo_avx2_scratch_index, !map_index);
+ scratch->map_index = !map_index;
kernel_fpu_end();
return ret >= 0;
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index baa3fea..9944fe4 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -234,7 +234,7 @@ static void nft_rbtree_gc_elem_remove(struct net *net, struct nft_set *set,
static const struct nft_rbtree_elem *
nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv,
- struct nft_rbtree_elem *rbe, u8 genmask)
+ struct nft_rbtree_elem *rbe)
{
struct nft_set *set = (struct nft_set *)__set;
struct rb_node *prev = rb_prev(&rbe->node);
@@ -253,7 +253,7 @@ nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv,
while (prev) {
rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node);
if (nft_rbtree_interval_end(rbe_prev) &&
- nft_set_elem_active(&rbe_prev->ext, genmask))
+ nft_set_elem_active(&rbe_prev->ext, NFT_GENMASK_ANY))
break;
prev = rb_prev(prev);
@@ -313,6 +313,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
struct nft_rbtree *priv = nft_set_priv(set);
u8 cur_genmask = nft_genmask_cur(net);
u8 genmask = nft_genmask_next(net);
+ u64 tstamp = nft_net_tstamp(net);
int d;
/* Descend the tree to search for an existing element greater than the
@@ -360,11 +361,11 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
/* perform garbage collection to avoid bogus overlap reports
* but skip new elements in this transaction.
*/
- if (nft_set_elem_expired(&rbe->ext) &&
+ if (__nft_set_elem_expired(&rbe->ext, tstamp) &&
nft_set_elem_active(&rbe->ext, cur_genmask)) {
const struct nft_rbtree_elem *removed_end;
- removed_end = nft_rbtree_gc_elem(set, priv, rbe, genmask);
+ removed_end = nft_rbtree_gc_elem(set, priv, rbe);
if (IS_ERR(removed_end))
return PTR_ERR(removed_end);
@@ -551,6 +552,7 @@ nft_rbtree_deactivate(const struct net *net, const struct nft_set *set,
const struct nft_rbtree *priv = nft_set_priv(set);
const struct rb_node *parent = priv->root.rb_node;
u8 genmask = nft_genmask_next(net);
+ u64 tstamp = nft_net_tstamp(net);
int d;
while (parent != NULL) {
@@ -571,7 +573,7 @@ nft_rbtree_deactivate(const struct net *net, const struct nft_set *set,
nft_rbtree_interval_end(this)) {
parent = parent->rb_right;
continue;
- } else if (nft_set_elem_expired(&rbe->ext)) {
+ } else if (__nft_set_elem_expired(&rbe->ext, tstamp)) {
break;
} else if (!nft_set_elem_active(&rbe->ext, genmask)) {
parent = parent->rb_left;
@@ -624,9 +626,10 @@ static void nft_rbtree_gc(struct nft_set *set)
{
struct nft_rbtree *priv = nft_set_priv(set);
struct nft_rbtree_elem *rbe, *rbe_end = NULL;
+ struct net *net = read_pnet(&set->net);
+ u64 tstamp = nft_net_tstamp(net);
struct rb_node *node, *next;
struct nft_trans_gc *gc;
- struct net *net;
set = nft_set_container_of(priv);
net = read_pnet(&set->net);
@@ -648,7 +651,7 @@ static void nft_rbtree_gc(struct nft_set *set)
rbe_end = rbe;
continue;
}
- if (!nft_set_elem_expired(&rbe->ext))
+ if (!__nft_set_elem_expired(&rbe->ext, tstamp))
continue;
gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 9c96234..ff31535 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -167,7 +167,7 @@ static inline u32 netlink_group_mask(u32 group)
static struct sk_buff *netlink_to_full_skb(const struct sk_buff *skb,
gfp_t gfp_mask)
{
- unsigned int len = skb_end_offset(skb);
+ unsigned int len = skb->len;
struct sk_buff *new;
new = alloc_skb(len, gfp_mask);
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 0eed00184..104a80b 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -453,16 +453,16 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr_init_timers(sk);
nr->t1 =
- msecs_to_jiffies(sysctl_netrom_transport_timeout);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_timeout));
nr->t2 =
- msecs_to_jiffies(sysctl_netrom_transport_acknowledge_delay);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_acknowledge_delay));
nr->n2 =
- msecs_to_jiffies(sysctl_netrom_transport_maximum_tries);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_maximum_tries));
nr->t4 =
- msecs_to_jiffies(sysctl_netrom_transport_busy_delay);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_busy_delay));
nr->idle =
- msecs_to_jiffies(sysctl_netrom_transport_no_activity_timeout);
- nr->window = sysctl_netrom_transport_requested_window_size;
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_no_activity_timeout));
+ nr->window = READ_ONCE(sysctl_netrom_transport_requested_window_size);
nr->bpqext = 1;
nr->state = NR_STATE_0;
@@ -954,7 +954,7 @@ int nr_rx_frame(struct sk_buff *skb, struct net_device *dev)
* G8PZT's Xrouter which is sending packets with command type 7
* as an extension of the protocol.
*/
- if (sysctl_netrom_reset_circuit &&
+ if (READ_ONCE(sysctl_netrom_reset_circuit) &&
(frametype != NR_RESET || flags != 0))
nr_transmit_reset(skb, 1);
diff --git a/net/netrom/nr_dev.c b/net/netrom/nr_dev.c
index 3aaac4a..2c34389 100644
--- a/net/netrom/nr_dev.c
+++ b/net/netrom/nr_dev.c
@@ -81,7 +81,7 @@ static int nr_header(struct sk_buff *skb, struct net_device *dev,
buff[6] |= AX25_SSSID_SPARE;
buff += AX25_ADDR_LEN;
- *buff++ = sysctl_netrom_network_ttl_initialiser;
+ *buff++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
*buff++ = NR_PROTO_IP;
*buff++ = NR_PROTO_IP;
diff --git a/net/netrom/nr_in.c b/net/netrom/nr_in.c
index 2f084b6..97944db 100644
--- a/net/netrom/nr_in.c
+++ b/net/netrom/nr_in.c
@@ -97,7 +97,7 @@ static int nr_state1_machine(struct sock *sk, struct sk_buff *skb,
break;
case NR_RESET:
- if (sysctl_netrom_reset_circuit)
+ if (READ_ONCE(sysctl_netrom_reset_circuit))
nr_disconnect(sk, ECONNRESET);
break;
@@ -128,7 +128,7 @@ static int nr_state2_machine(struct sock *sk, struct sk_buff *skb,
break;
case NR_RESET:
- if (sysctl_netrom_reset_circuit)
+ if (READ_ONCE(sysctl_netrom_reset_circuit))
nr_disconnect(sk, ECONNRESET);
break;
@@ -262,7 +262,7 @@ static int nr_state3_machine(struct sock *sk, struct sk_buff *skb, int frametype
break;
case NR_RESET:
- if (sysctl_netrom_reset_circuit)
+ if (READ_ONCE(sysctl_netrom_reset_circuit))
nr_disconnect(sk, ECONNRESET);
break;
diff --git a/net/netrom/nr_out.c b/net/netrom/nr_out.c
index 4492965..5e53139 100644
--- a/net/netrom/nr_out.c
+++ b/net/netrom/nr_out.c
@@ -204,7 +204,7 @@ void nr_transmit_buffer(struct sock *sk, struct sk_buff *skb)
dptr[6] |= AX25_SSSID_SPARE;
dptr += AX25_ADDR_LEN;
- *dptr++ = sysctl_netrom_network_ttl_initialiser;
+ *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
if (!nr_route_frame(skb, NULL)) {
kfree_skb(skb);
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index baea3cb..7048086 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -153,7 +153,7 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
nr_neigh->digipeat = NULL;
nr_neigh->ax25 = NULL;
nr_neigh->dev = dev;
- nr_neigh->quality = sysctl_netrom_default_path_quality;
+ nr_neigh->quality = READ_ONCE(sysctl_netrom_default_path_quality);
nr_neigh->locked = 0;
nr_neigh->count = 0;
nr_neigh->number = nr_neigh_no++;
@@ -728,7 +728,7 @@ void nr_link_failed(ax25_cb *ax25, int reason)
nr_neigh->ax25 = NULL;
ax25_cb_put(ax25);
- if (++nr_neigh->failed < sysctl_netrom_link_fails_count) {
+ if (++nr_neigh->failed < READ_ONCE(sysctl_netrom_link_fails_count)) {
nr_neigh_put(nr_neigh);
return;
}
@@ -766,7 +766,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
if (ax25 != NULL) {
ret = nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
ax25->ax25_dev->dev, 0,
- sysctl_netrom_obsolescence_count_initialiser);
+ READ_ONCE(sysctl_netrom_obsolescence_count_initialiser));
if (ret)
return ret;
}
@@ -780,7 +780,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
return ret;
}
- if (!sysctl_netrom_routing_control && ax25 != NULL)
+ if (!READ_ONCE(sysctl_netrom_routing_control) && ax25 != NULL)
return 0;
/* Its Time-To-Live has expired */
diff --git a/net/netrom/nr_subr.c b/net/netrom/nr_subr.c
index e2d2af9..c3bbd58 100644
--- a/net/netrom/nr_subr.c
+++ b/net/netrom/nr_subr.c
@@ -182,7 +182,8 @@ void nr_write_internal(struct sock *sk, int frametype)
*dptr++ = nr->my_id;
*dptr++ = frametype;
*dptr++ = nr->window;
- if (nr->bpqext) *dptr++ = sysctl_netrom_network_ttl_initialiser;
+ if (nr->bpqext)
+ *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
break;
case NR_DISCREQ:
@@ -236,7 +237,7 @@ void __nr_transmit_reply(struct sk_buff *skb, int mine, unsigned char cmdflags)
dptr[6] |= AX25_SSSID_SPARE;
dptr += AX25_ADDR_LEN;
- *dptr++ = sysctl_netrom_network_ttl_initialiser;
+ *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
if (mine) {
*dptr++ = 0;
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 88965e2..ebc5728 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -48,6 +48,7 @@ struct ovs_len_tbl {
#define OVS_ATTR_NESTED -1
#define OVS_ATTR_VARIABLE -2
+#define OVS_COPY_ACTIONS_MAX_DEPTH 16
static bool actions_may_change_flow(const struct nlattr *actions)
{
@@ -2545,13 +2546,15 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
const struct sw_flow_key *key,
struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci,
- u32 mpls_label_count, bool log);
+ u32 mpls_label_count, bool log,
+ u32 depth);
static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
const struct sw_flow_key *key,
struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci,
- u32 mpls_label_count, bool log, bool last)
+ u32 mpls_label_count, bool log, bool last,
+ u32 depth)
{
const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
const struct nlattr *probability, *actions;
@@ -2602,7 +2605,8 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
return err;
err = __ovs_nla_copy_actions(net, actions, key, sfa,
- eth_type, vlan_tci, mpls_label_count, log);
+ eth_type, vlan_tci, mpls_label_count, log,
+ depth + 1);
if (err)
return err;
@@ -2617,7 +2621,8 @@ static int validate_and_copy_dec_ttl(struct net *net,
const struct sw_flow_key *key,
struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci,
- u32 mpls_label_count, bool log)
+ u32 mpls_label_count, bool log,
+ u32 depth)
{
const struct nlattr *attrs[OVS_DEC_TTL_ATTR_MAX + 1];
int start, action_start, err, rem;
@@ -2660,7 +2665,8 @@ static int validate_and_copy_dec_ttl(struct net *net,
return action_start;
err = __ovs_nla_copy_actions(net, actions, key, sfa, eth_type,
- vlan_tci, mpls_label_count, log);
+ vlan_tci, mpls_label_count, log,
+ depth + 1);
if (err)
return err;
@@ -2674,7 +2680,8 @@ static int validate_and_copy_clone(struct net *net,
const struct sw_flow_key *key,
struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci,
- u32 mpls_label_count, bool log, bool last)
+ u32 mpls_label_count, bool log, bool last,
+ u32 depth)
{
int start, err;
u32 exec;
@@ -2694,7 +2701,8 @@ static int validate_and_copy_clone(struct net *net,
return err;
err = __ovs_nla_copy_actions(net, attr, key, sfa,
- eth_type, vlan_tci, mpls_label_count, log);
+ eth_type, vlan_tci, mpls_label_count, log,
+ depth + 1);
if (err)
return err;
@@ -3063,7 +3071,7 @@ static int validate_and_copy_check_pkt_len(struct net *net,
struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci,
u32 mpls_label_count,
- bool log, bool last)
+ bool log, bool last, u32 depth)
{
const struct nlattr *acts_if_greater, *acts_if_lesser_eq;
struct nlattr *a[OVS_CHECK_PKT_LEN_ATTR_MAX + 1];
@@ -3111,7 +3119,8 @@ static int validate_and_copy_check_pkt_len(struct net *net,
return nested_acts_start;
err = __ovs_nla_copy_actions(net, acts_if_lesser_eq, key, sfa,
- eth_type, vlan_tci, mpls_label_count, log);
+ eth_type, vlan_tci, mpls_label_count, log,
+ depth + 1);
if (err)
return err;
@@ -3124,7 +3133,8 @@ static int validate_and_copy_check_pkt_len(struct net *net,
return nested_acts_start;
err = __ovs_nla_copy_actions(net, acts_if_greater, key, sfa,
- eth_type, vlan_tci, mpls_label_count, log);
+ eth_type, vlan_tci, mpls_label_count, log,
+ depth + 1);
if (err)
return err;
@@ -3152,12 +3162,16 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
const struct sw_flow_key *key,
struct sw_flow_actions **sfa,
__be16 eth_type, __be16 vlan_tci,
- u32 mpls_label_count, bool log)
+ u32 mpls_label_count, bool log,
+ u32 depth)
{
u8 mac_proto = ovs_key_mac_proto(key);
const struct nlattr *a;
int rem, err;
+ if (depth > OVS_COPY_ACTIONS_MAX_DEPTH)
+ return -EOVERFLOW;
+
nla_for_each_nested(a, attr, rem) {
/* Expected argument lengths, (u32)-1 for variable length. */
static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = {
@@ -3355,7 +3369,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
err = validate_and_copy_sample(net, a, key, sfa,
eth_type, vlan_tci,
mpls_label_count,
- log, last);
+ log, last, depth);
if (err)
return err;
skip_copy = true;
@@ -3426,7 +3440,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
err = validate_and_copy_clone(net, a, key, sfa,
eth_type, vlan_tci,
mpls_label_count,
- log, last);
+ log, last, depth);
if (err)
return err;
skip_copy = true;
@@ -3440,7 +3454,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
eth_type,
vlan_tci,
mpls_label_count,
- log, last);
+ log, last,
+ depth);
if (err)
return err;
skip_copy = true;
@@ -3450,7 +3465,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
case OVS_ACTION_ATTR_DEC_TTL:
err = validate_and_copy_dec_ttl(net, a, key, sfa,
eth_type, vlan_tci,
- mpls_label_count, log);
+ mpls_label_count, log,
+ depth);
if (err)
return err;
skip_copy = true;
@@ -3495,7 +3511,8 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
(*sfa)->orig_len = nla_len(attr);
err = __ovs_nla_copy_actions(net, attr, key, sfa, key->eth.type,
- key->eth.vlan.tci, mpls_label_count, log);
+ key->eth.vlan.tci, mpls_label_count, log,
+ 0);
if (err)
ovs_nla_free_flow_actions(*sfa);
diff --git a/net/phonet/datagram.c b/net/phonet/datagram.c
index 3aa50dc..976fe25 100644
--- a/net/phonet/datagram.c
+++ b/net/phonet/datagram.c
@@ -34,10 +34,10 @@ static int pn_ioctl(struct sock *sk, int cmd, int *karg)
switch (cmd) {
case SIOCINQ:
- lock_sock(sk);
+ spin_lock_bh(&sk->sk_receive_queue.lock);
skb = skb_peek(&sk->sk_receive_queue);
*karg = skb ? skb->len : 0;
- release_sock(sk);
+ spin_unlock_bh(&sk->sk_receive_queue.lock);
return 0;
case SIOCPNADDRESOURCE:
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index faba31f..3dd5f52 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -917,6 +917,37 @@ static int pep_sock_enable(struct sock *sk, struct sockaddr *addr, int len)
return 0;
}
+static unsigned int pep_first_packet_length(struct sock *sk)
+{
+ struct pep_sock *pn = pep_sk(sk);
+ struct sk_buff_head *q;
+ struct sk_buff *skb;
+ unsigned int len = 0;
+ bool found = false;
+
+ if (sock_flag(sk, SOCK_URGINLINE)) {
+ q = &pn->ctrlreq_queue;
+ spin_lock_bh(&q->lock);
+ skb = skb_peek(q);
+ if (skb) {
+ len = skb->len;
+ found = true;
+ }
+ spin_unlock_bh(&q->lock);
+ }
+
+ if (likely(!found)) {
+ q = &sk->sk_receive_queue;
+ spin_lock_bh(&q->lock);
+ skb = skb_peek(q);
+ if (skb)
+ len = skb->len;
+ spin_unlock_bh(&q->lock);
+ }
+
+ return len;
+}
+
static int pep_ioctl(struct sock *sk, int cmd, int *karg)
{
struct pep_sock *pn = pep_sk(sk);
@@ -929,15 +960,7 @@ static int pep_ioctl(struct sock *sk, int cmd, int *karg)
break;
}
- lock_sock(sk);
- if (sock_flag(sk, SOCK_URGINLINE) &&
- !skb_queue_empty(&pn->ctrlreq_queue))
- *karg = skb_peek(&pn->ctrlreq_queue)->len;
- else if (!skb_queue_empty(&sk->sk_receive_queue))
- *karg = skb_peek(&sk->sk_receive_queue)->len;
- else
- *karg = 0;
- release_sock(sk);
+ *karg = pep_first_packet_length(sk);
ret = 0;
break;
diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index fba82d3..a4e3c5d 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -301,6 +301,9 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args,
kfree(sg);
}
ret = PTR_ERR(trans_private);
+ /* Trigger connection so that its ready for the next retry */
+ if (ret == -ENODEV)
+ rds_conn_connect_if_down(cp->cp_conn);
goto out;
}
diff --git a/net/rds/recv.c b/net/rds/recv.c
index c71b923..5627f80 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -425,6 +425,7 @@ static int rds_still_queued(struct rds_sock *rs, struct rds_incoming *inc,
struct sock *sk = rds_rs_to_sk(rs);
int ret = 0;
unsigned long flags;
+ struct rds_incoming *to_drop = NULL;
write_lock_irqsave(&rs->rs_recv_lock, flags);
if (!list_empty(&inc->i_item)) {
@@ -435,11 +436,14 @@ static int rds_still_queued(struct rds_sock *rs, struct rds_incoming *inc,
-be32_to_cpu(inc->i_hdr.h_len),
inc->i_hdr.h_dport);
list_del_init(&inc->i_item);
- rds_inc_put(inc);
+ to_drop = inc;
}
}
write_unlock_irqrestore(&rs->rs_recv_lock, flags);
+ if (to_drop)
+ rds_inc_put(to_drop);
+
rdsdebug("inc %p rs %p still %d dropped %d\n", inc, rs, ret, drop);
return ret;
}
@@ -758,16 +762,21 @@ void rds_clear_recv_queue(struct rds_sock *rs)
struct sock *sk = rds_rs_to_sk(rs);
struct rds_incoming *inc, *tmp;
unsigned long flags;
+ LIST_HEAD(to_drop);
write_lock_irqsave(&rs->rs_recv_lock, flags);
list_for_each_entry_safe(inc, tmp, &rs->rs_recv_queue, i_item) {
rds_recv_rcvbuf_delta(rs, sk, inc->i_conn->c_lcong,
-be32_to_cpu(inc->i_hdr.h_len),
inc->i_hdr.h_dport);
+ list_move(&inc->i_item, &to_drop);
+ }
+ write_unlock_irqrestore(&rs->rs_recv_lock, flags);
+
+ list_for_each_entry_safe(inc, tmp, &to_drop, i_item) {
list_del_init(&inc->i_item);
rds_inc_put(inc);
}
- write_unlock_irqrestore(&rs->rs_recv_lock, flags);
}
/*
diff --git a/net/rds/send.c b/net/rds/send.c
index 5e57a15..2899def 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -1313,12 +1313,8 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
/* Parse any control messages the user may have included. */
ret = rds_cmsg_send(rs, rm, msg, &allocated_mr, &vct);
- if (ret) {
- /* Trigger connection so that its ready for the next retry */
- if (ret == -EAGAIN)
- rds_conn_connect_if_down(conn);
+ if (ret)
goto out;
- }
if (rm->rdma.op_active && !conn->c_trans->xmit_rdma) {
printk_ratelimited(KERN_NOTICE "rdma_op %p conn xmit_rdma %p\n",
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index dbeb75c..7818aae 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -199,11 +199,19 @@ struct rxrpc_host_header {
*/
struct rxrpc_skb_priv {
struct rxrpc_connection *conn; /* Connection referred to (poke packet) */
- u16 offset; /* Offset of data */
- u16 len; /* Length of data */
- u8 flags;
+ union {
+ struct {
+ u16 offset; /* Offset of data */
+ u16 len; /* Length of data */
+ u8 flags;
#define RXRPC_RX_VERIFIED 0x01
-
+ };
+ struct {
+ rxrpc_seq_t first_ack; /* First packet in acks table */
+ u8 nr_acks; /* Number of acks+nacks */
+ u8 nr_nacks; /* Number of nacks */
+ };
+ };
struct rxrpc_host_header hdr; /* RxRPC packet header from this packet */
};
@@ -510,7 +518,7 @@ struct rxrpc_connection {
enum rxrpc_call_completion completion; /* Completion condition */
s32 abort_code; /* Abort code of connection abort */
int debug_id; /* debug ID for printks */
- atomic_t serial; /* packet serial number counter */
+ rxrpc_serial_t tx_serial; /* Outgoing packet serial number counter */
unsigned int hi_serial; /* highest serial number received */
u32 service_id; /* Service ID, possibly upgraded */
u32 security_level; /* Security level selected */
@@ -692,11 +700,11 @@ struct rxrpc_call {
u8 cong_dup_acks; /* Count of ACKs showing missing packets */
u8 cong_cumul_acks; /* Cumulative ACK count */
ktime_t cong_tstamp; /* Last time cwnd was changed */
+ struct sk_buff *cong_last_nack; /* Last ACK with nacks received */
/* Receive-phase ACK management (ACKs we send). */
u8 ackr_reason; /* reason to ACK */
u16 ackr_sack_base; /* Starting slot in SACK table ring */
- rxrpc_serial_t ackr_serial; /* serial of packet being ACK'd */
rxrpc_seq_t ackr_window; /* Base of SACK window */
rxrpc_seq_t ackr_wtop; /* Base of SACK window */
unsigned int ackr_nr_unacked; /* Number of unacked packets */
@@ -730,7 +738,8 @@ struct rxrpc_call {
struct rxrpc_ack_summary {
u16 nr_acks; /* Number of ACKs in packet */
u16 nr_new_acks; /* Number of new ACKs in packet */
- u16 nr_rot_new_acks; /* Number of rotated new ACKs */
+ u16 nr_new_nacks; /* Number of new nacks in packet */
+ u16 nr_retained_nacks; /* Number of nacks retained between ACKs */
u8 ack_reason;
bool saw_nacks; /* Saw NACKs in packet */
bool new_low_nack; /* T if new low NACK found */
@@ -823,6 +832,20 @@ static inline bool rxrpc_sending_to_client(const struct rxrpc_txbuf *txb)
#include <trace/events/rxrpc.h>
/*
+ * Allocate the next serial number on a connection. 0 must be skipped.
+ */
+static inline rxrpc_serial_t rxrpc_get_next_serial(struct rxrpc_connection *conn)
+{
+ rxrpc_serial_t serial;
+
+ serial = conn->tx_serial;
+ if (serial == 0)
+ serial = 1;
+ conn->tx_serial = serial + 1;
+ return serial;
+}
+
+/*
* af_rxrpc.c
*/
extern atomic_t rxrpc_n_rx_skbs;
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index e363f21a..0f78544 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -43,8 +43,6 @@ void rxrpc_propose_delay_ACK(struct rxrpc_call *call, rxrpc_serial_t serial,
unsigned long expiry = rxrpc_soft_ack_delay;
unsigned long now = jiffies, ack_at;
- call->ackr_serial = serial;
-
if (rxrpc_soft_ack_delay < expiry)
expiry = rxrpc_soft_ack_delay;
if (call->peer->srtt_us != 0)
@@ -114,6 +112,7 @@ static void rxrpc_congestion_timeout(struct rxrpc_call *call)
void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb)
{
struct rxrpc_ackpacket *ack = NULL;
+ struct rxrpc_skb_priv *sp;
struct rxrpc_txbuf *txb;
unsigned long resend_at;
rxrpc_seq_t transmitted = READ_ONCE(call->tx_transmitted);
@@ -141,14 +140,15 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb)
* explicitly NAK'd packets.
*/
if (ack_skb) {
+ sp = rxrpc_skb(ack_skb);
ack = (void *)ack_skb->data + sizeof(struct rxrpc_wire_header);
- for (i = 0; i < ack->nAcks; i++) {
+ for (i = 0; i < sp->nr_acks; i++) {
rxrpc_seq_t seq;
if (ack->acks[i] & 1)
continue;
- seq = ntohl(ack->firstPacket) + i;
+ seq = sp->first_ack + i;
if (after(txb->seq, transmitted))
break;
if (after(txb->seq, seq))
@@ -373,7 +373,6 @@ static void rxrpc_send_initial_ping(struct rxrpc_call *call)
bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
{
unsigned long now, next, t;
- rxrpc_serial_t ackr_serial;
bool resend = false, expired = false;
s32 abort_code;
@@ -423,8 +422,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
if (time_after_eq(now, t)) {
trace_rxrpc_timer(call, rxrpc_timer_exp_ack, now);
cmpxchg(&call->delay_ack_at, t, now + MAX_JIFFY_OFFSET);
- ackr_serial = xchg(&call->ackr_serial, 0);
- rxrpc_send_ACK(call, RXRPC_ACK_DELAY, ackr_serial,
+ rxrpc_send_ACK(call, RXRPC_ACK_DELAY, 0,
rxrpc_propose_ack_ping_for_lost_ack);
}
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 0943e54..9fc9a6c 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -686,6 +686,7 @@ static void rxrpc_destroy_call(struct work_struct *work)
del_timer_sync(&call->timer);
+ rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
rxrpc_cleanup_ring(call);
while ((txb = list_first_entry_or_null(&call->tx_sendmsg,
struct rxrpc_txbuf, call_link))) {
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index 95f4bc2..1f251d7 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -95,6 +95,14 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
_enter("%d", conn->debug_id);
+ if (sp && sp->hdr.type == RXRPC_PACKET_TYPE_ACK) {
+ if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header),
+ &pkt.ack, sizeof(pkt.ack)) < 0)
+ return;
+ if (pkt.ack.reason == RXRPC_ACK_PING_RESPONSE)
+ return;
+ }
+
chan = &conn->channels[channel];
/* If the last call got moved on whilst we were waiting to run, just
@@ -117,7 +125,7 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
iov[2].iov_base = &ack_info;
iov[2].iov_len = sizeof(ack_info);
- serial = atomic_inc_return(&conn->serial);
+ serial = rxrpc_get_next_serial(conn);
pkt.whdr.epoch = htonl(conn->proto.epoch);
pkt.whdr.cid = htonl(conn->proto.cid | channel);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 92495e7..9691de0 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -45,11 +45,9 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
}
cumulative_acks += summary->nr_new_acks;
- cumulative_acks += summary->nr_rot_new_acks;
if (cumulative_acks > 255)
cumulative_acks = 255;
- summary->mode = call->cong_mode;
summary->cwnd = call->cong_cwnd;
summary->ssthresh = call->cong_ssthresh;
summary->cumulative_acks = cumulative_acks;
@@ -151,6 +149,7 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
cwnd = RXRPC_TX_MAX_WINDOW;
call->cong_cwnd = cwnd;
call->cong_cumul_acks = cumulative_acks;
+ summary->mode = call->cong_mode;
trace_rxrpc_congest(call, summary, acked_serial, change);
if (resend)
rxrpc_resend(call, skb);
@@ -213,7 +212,6 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
list_for_each_entry_rcu(txb, &call->tx_buffer, call_link, false) {
if (before_eq(txb->seq, call->acks_hard_ack))
continue;
- summary->nr_rot_new_acks++;
if (test_bit(RXRPC_TXBUF_LAST, &txb->flags)) {
set_bit(RXRPC_CALL_TX_LAST, &call->flags);
rot_last = true;
@@ -254,6 +252,11 @@ static void rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,
{
ASSERT(test_bit(RXRPC_CALL_TX_LAST, &call->flags));
+ if (unlikely(call->cong_last_nack)) {
+ rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
+ call->cong_last_nack = NULL;
+ }
+
switch (__rxrpc_call_state(call)) {
case RXRPC_CALL_CLIENT_SEND_REQUEST:
case RXRPC_CALL_CLIENT_AWAIT_REPLY:
@@ -703,6 +706,43 @@ static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb,
}
/*
+ * Determine how many nacks from the previous ACK have now been satisfied.
+ */
+static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call,
+ struct rxrpc_ack_summary *summary,
+ rxrpc_seq_t seq)
+{
+ struct sk_buff *skb = call->cong_last_nack;
+ struct rxrpc_ackpacket ack;
+ struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+ unsigned int i, new_acks = 0, retained_nacks = 0;
+ rxrpc_seq_t old_seq = sp->first_ack;
+ u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(ack);
+
+ if (after_eq(seq, old_seq + sp->nr_acks)) {
+ summary->nr_new_acks += sp->nr_nacks;
+ summary->nr_new_acks += seq - (old_seq + sp->nr_acks);
+ summary->nr_retained_nacks = 0;
+ } else if (seq == old_seq) {
+ summary->nr_retained_nacks = sp->nr_nacks;
+ } else {
+ for (i = 0; i < sp->nr_acks; i++) {
+ if (acks[i] == RXRPC_ACK_TYPE_NACK) {
+ if (before(old_seq + i, seq))
+ new_acks++;
+ else
+ retained_nacks++;
+ }
+ }
+
+ summary->nr_new_acks += new_acks;
+ summary->nr_retained_nacks = retained_nacks;
+ }
+
+ return old_seq + sp->nr_acks;
+}
+
+/*
* Process individual soft ACKs.
*
* Each ACK in the array corresponds to one packet and can be either an ACK or
@@ -711,25 +751,51 @@ static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb,
* the timer on the basis that the peer might just not have processed them at
* the time the ACK was sent.
*/
-static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks,
- rxrpc_seq_t seq, int nr_acks,
- struct rxrpc_ack_summary *summary)
+static void rxrpc_input_soft_acks(struct rxrpc_call *call,
+ struct rxrpc_ack_summary *summary,
+ struct sk_buff *skb,
+ rxrpc_seq_t seq,
+ rxrpc_seq_t since)
{
- unsigned int i;
+ struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+ unsigned int i, old_nacks = 0;
+ rxrpc_seq_t lowest_nak = seq + sp->nr_acks;
+ u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);
- for (i = 0; i < nr_acks; i++) {
+ for (i = 0; i < sp->nr_acks; i++) {
if (acks[i] == RXRPC_ACK_TYPE_ACK) {
summary->nr_acks++;
- summary->nr_new_acks++;
+ if (after_eq(seq, since))
+ summary->nr_new_acks++;
} else {
- if (!summary->saw_nacks &&
- call->acks_lowest_nak != seq + i) {
- call->acks_lowest_nak = seq + i;
- summary->new_low_nack = true;
- }
summary->saw_nacks = true;
+ if (before(seq, since)) {
+ /* Overlap with previous ACK */
+ old_nacks++;
+ } else {
+ summary->nr_new_nacks++;
+ sp->nr_nacks++;
+ }
+
+ if (before(seq, lowest_nak))
+ lowest_nak = seq;
}
+ seq++;
}
+
+ if (lowest_nak != call->acks_lowest_nak) {
+ call->acks_lowest_nak = lowest_nak;
+ summary->new_low_nack = true;
+ }
+
+ /* We *can* have more nacks than we did - the peer is permitted to drop
+ * packets it has soft-acked and re-request them. Further, it is
+ * possible for the nack distribution to change whilst the number of
+ * nacks stays the same or goes down.
+ */
+ if (old_nacks < summary->nr_retained_nacks)
+ summary->nr_new_acks += summary->nr_retained_nacks - old_nacks;
+ summary->nr_retained_nacks = old_nacks;
}
/*
@@ -773,7 +839,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
struct rxrpc_ackinfo info;
rxrpc_serial_t ack_serial, acked_serial;
- rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt;
+ rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt, since;
int nr_acks, offset, ioffset;
_enter("");
@@ -789,6 +855,8 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
prev_pkt = ntohl(ack.previousPacket);
hard_ack = first_soft_ack - 1;
nr_acks = ack.nAcks;
+ sp->first_ack = first_soft_ack;
+ sp->nr_acks = nr_acks;
summary.ack_reason = (ack.reason < RXRPC_ACK__INVALID ?
ack.reason : RXRPC_ACK__INVALID);
@@ -858,6 +926,16 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
if (nr_acks > 0)
skb_condense(skb);
+ if (call->cong_last_nack) {
+ since = rxrpc_input_check_prev_ack(call, &summary, first_soft_ack);
+ rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
+ call->cong_last_nack = NULL;
+ } else {
+ summary.nr_new_acks = first_soft_ack - call->acks_first_seq;
+ call->acks_lowest_nak = first_soft_ack + nr_acks;
+ since = first_soft_ack;
+ }
+
call->acks_latest_ts = skb->tstamp;
call->acks_first_seq = first_soft_ack;
call->acks_prev_seq = prev_pkt;
@@ -866,7 +944,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
case RXRPC_ACK_PING:
break;
default:
- if (after(acked_serial, call->acks_highest_serial))
+ if (acked_serial && after(acked_serial, call->acks_highest_serial))
call->acks_highest_serial = acked_serial;
break;
}
@@ -905,8 +983,9 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
if (nr_acks > 0) {
if (offset > (int)skb->len - nr_acks)
return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_short_sack);
- rxrpc_input_soft_acks(call, skb->data + offset, first_soft_ack,
- nr_acks, &summary);
+ rxrpc_input_soft_acks(call, &summary, skb, first_soft_ack, since);
+ rxrpc_get_skb(skb, rxrpc_skb_get_last_nack);
+ call->cong_last_nack = skb;
}
if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) &&
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index a090614..4a292f8 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -216,7 +216,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
iov[0].iov_len = sizeof(txb->wire) + sizeof(txb->ack) + n;
len = iov[0].iov_len;
- serial = atomic_inc_return(&conn->serial);
+ serial = rxrpc_get_next_serial(conn);
txb->wire.serial = htonl(serial);
trace_rxrpc_tx_ack(call->debug_id, serial,
ntohl(txb->ack.firstPacket),
@@ -302,7 +302,7 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call)
iov[0].iov_base = &pkt;
iov[0].iov_len = sizeof(pkt);
- serial = atomic_inc_return(&conn->serial);
+ serial = rxrpc_get_next_serial(conn);
pkt.whdr.serial = htonl(serial);
iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, sizeof(pkt));
@@ -334,7 +334,7 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
_enter("%x,{%d}", txb->seq, txb->len);
/* Each transmission of a Tx packet needs a new serial number */
- serial = atomic_inc_return(&conn->serial);
+ serial = rxrpc_get_next_serial(conn);
txb->wire.serial = htonl(serial);
if (test_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags) &&
@@ -558,7 +558,7 @@ void rxrpc_send_conn_abort(struct rxrpc_connection *conn)
len = iov[0].iov_len + iov[1].iov_len;
- serial = atomic_inc_return(&conn->serial);
+ serial = rxrpc_get_next_serial(conn);
whdr.serial = htonl(serial);
iov_iter_kvec(&msg.msg_iter, WRITE, iov, 2, len);
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index 6c86cbb..26dc2f2 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -181,7 +181,7 @@ static int rxrpc_connection_seq_show(struct seq_file *seq, void *v)
atomic_read(&conn->active),
state,
key_serial(conn->key),
- atomic_read(&conn->serial),
+ conn->tx_serial,
conn->hi_serial,
conn->channels[0].call_id,
conn->channels[1].call_id,
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index b52dedc..6b32d61 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -664,7 +664,7 @@ static int rxkad_issue_challenge(struct rxrpc_connection *conn)
len = iov[0].iov_len + iov[1].iov_len;
- serial = atomic_inc_return(&conn->serial);
+ serial = rxrpc_get_next_serial(conn);
whdr.serial = htonl(serial);
ret = kernel_sendmsg(conn->local->socket, &msg, iov, 2, len);
@@ -721,7 +721,7 @@ static int rxkad_send_response(struct rxrpc_connection *conn,
len = iov[0].iov_len + iov[1].iov_len + iov[2].iov_len;
- serial = atomic_inc_return(&conn->serial);
+ serial = rxrpc_get_next_serial(conn);
whdr.serial = htonl(serial);
rxrpc_local_dont_fragment(conn->local, false);
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 12386f5..6faa7d0 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -232,18 +232,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
return err;
}
-static bool is_mirred_nested(void)
-{
- return unlikely(__this_cpu_read(mirred_nest_level) > 1);
-}
-
-static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
+static int
+tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
{
int err;
if (!want_ingress)
err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
- else if (is_mirred_nested())
+ else if (!at_ingress)
err = netif_rx(skb);
else
err = netif_receive_skb(skb);
@@ -270,8 +266,7 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
if (unlikely(!(dev->flags & IFF_UP)) || !netif_carrier_ok(dev)) {
net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
dev->name);
- err = -ENODEV;
- goto out;
+ goto err_cant_do;
}
/* we could easily avoid the clone only if called by ingress and clsact;
@@ -283,10 +278,8 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
tcf_mirred_can_reinsert(retval);
if (!dont_clone) {
skb_to_send = skb_clone(skb, GFP_ATOMIC);
- if (!skb_to_send) {
- err = -ENOMEM;
- goto out;
- }
+ if (!skb_to_send)
+ goto err_cant_do;
}
want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
@@ -319,19 +312,20 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
- err = tcf_mirred_forward(want_ingress, skb_to_send);
+ err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
} else {
- err = tcf_mirred_forward(want_ingress, skb_to_send);
+ err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
}
-
- if (err) {
-out:
+ if (err)
tcf_action_inc_overlimit_qstats(&m->common);
- if (is_redirect)
- retval = TC_ACT_SHOT;
- }
return retval;
+
+err_cant_do:
+ if (is_redirect)
+ retval = TC_ACT_SHOT;
+ tcf_action_inc_overlimit_qstats(&m->common);
+ return retval;
}
static int tcf_blockcast_redir(struct sk_buff *skb, struct tcf_mirred *m,
@@ -533,8 +527,6 @@ static int mirred_device_event(struct notifier_block *unused,
* net_device are already rcu protected.
*/
RCU_INIT_POINTER(m->tcfm_dev, NULL);
- } else if (m->tcfm_blockid) {
- m->tcfm_blockid = 0;
}
spin_unlock_bh(&m->tcf_lock);
}
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index efb9d28..6ee7064 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -2460,8 +2460,11 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
}
errout_idr:
- if (!fold)
+ if (!fold) {
+ spin_lock(&tp->lock);
idr_remove(&head->handle_idr, fnew->handle);
+ spin_unlock(&tp->lock);
+ }
__fl_put(fnew);
errout_tb:
kfree(tb);
diff --git a/net/sched/em_canid.c b/net/sched/em_canid.c
index 5ea84de..5337bc4 100644
--- a/net/sched/em_canid.c
+++ b/net/sched/em_canid.c
@@ -222,6 +222,7 @@ static void __exit exit_em_canid(void)
tcf_em_unregister(&em_canid_ops);
}
+MODULE_DESCRIPTION("ematch classifier to match CAN IDs embedded in skb CAN frames");
MODULE_LICENSE("GPL");
module_init(init_em_canid);
diff --git a/net/sched/em_cmp.c b/net/sched/em_cmp.c
index f17b049..c90ad7e 100644
--- a/net/sched/em_cmp.c
+++ b/net/sched/em_cmp.c
@@ -87,6 +87,7 @@ static void __exit exit_em_cmp(void)
tcf_em_unregister(&em_cmp_ops);
}
+MODULE_DESCRIPTION("ematch classifier for basic data types(8/16/32 bit) against skb data");
MODULE_LICENSE("GPL");
module_init(init_em_cmp);
diff --git a/net/sched/em_meta.c b/net/sched/em_meta.c
index 09d8afd..8996c73 100644
--- a/net/sched/em_meta.c
+++ b/net/sched/em_meta.c
@@ -1006,6 +1006,7 @@ static void __exit exit_em_meta(void)
tcf_em_unregister(&em_meta_ops);
}
+MODULE_DESCRIPTION("ematch classifier for various internal kernel metadata, skb metadata and sk metadata");
MODULE_LICENSE("GPL");
module_init(init_em_meta);
diff --git a/net/sched/em_nbyte.c b/net/sched/em_nbyte.c
index a83b237..4f9f21a 100644
--- a/net/sched/em_nbyte.c
+++ b/net/sched/em_nbyte.c
@@ -68,6 +68,7 @@ static void __exit exit_em_nbyte(void)
tcf_em_unregister(&em_nbyte_ops);
}
+MODULE_DESCRIPTION("ematch classifier for arbitrary skb multi-bytes");
MODULE_LICENSE("GPL");
module_init(init_em_nbyte);
diff --git a/net/sched/em_text.c b/net/sched/em_text.c
index f176afb..420c662 100644
--- a/net/sched/em_text.c
+++ b/net/sched/em_text.c
@@ -147,6 +147,7 @@ static void __exit exit_em_text(void)
tcf_em_unregister(&em_text_ops);
}
+MODULE_DESCRIPTION("ematch classifier for embedded text in skbs");
MODULE_LICENSE("GPL");
module_init(init_em_text);
diff --git a/net/sched/em_u32.c b/net/sched/em_u32.c
index 71b070d..fdec4db 100644
--- a/net/sched/em_u32.c
+++ b/net/sched/em_u32.c
@@ -52,6 +52,7 @@ static void __exit exit_em_u32(void)
tcf_em_unregister(&em_u32_ops);
}
+MODULE_DESCRIPTION("ematch skb classifier using 32 bit chunks of data");
MODULE_LICENSE("GPL");
module_init(init_em_u32);
diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index 7182c5a..5c16521 100644
--- a/net/sctp/inqueue.c
+++ b/net/sctp/inqueue.c
@@ -38,6 +38,14 @@ void sctp_inq_init(struct sctp_inq *queue)
INIT_WORK(&queue->immediate, NULL);
}
+/* Properly release the chunk which is being worked on. */
+static inline void sctp_inq_chunk_free(struct sctp_chunk *chunk)
+{
+ if (chunk->head_skb)
+ chunk->skb = chunk->head_skb;
+ sctp_chunk_free(chunk);
+}
+
/* Release the memory associated with an SCTP inqueue. */
void sctp_inq_free(struct sctp_inq *queue)
{
@@ -53,7 +61,7 @@ void sctp_inq_free(struct sctp_inq *queue)
* free it as well.
*/
if (queue->in_progress) {
- sctp_chunk_free(queue->in_progress);
+ sctp_inq_chunk_free(queue->in_progress);
queue->in_progress = NULL;
}
}
@@ -130,9 +138,7 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
goto new_skb;
}
- if (chunk->head_skb)
- chunk->skb = chunk->head_skb;
- sctp_chunk_free(chunk);
+ sctp_inq_chunk_free(chunk);
chunk = queue->in_progress = NULL;
} else {
/* Nothing to do. Next chunk in the packet, please. */
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index a2cb30a..0f53a5c 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -924,6 +924,7 @@ static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
smc->clcsock->file->private_data = smc->clcsock;
smc->clcsock->wq.fasync_list =
smc->sk.sk_socket->wq.fasync_list;
+ smc->sk.sk_socket->wq.fasync_list = NULL;
/* There might be some wait entries remaining
* in smc sk->sk_wq and they should be woken up
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 5b04528..c9189a9 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -19,6 +19,35 @@
#include <linux/rtnetlink.h>
#include <net/switchdev.h>
+static bool switchdev_obj_eq(const struct switchdev_obj *a,
+ const struct switchdev_obj *b)
+{
+ const struct switchdev_obj_port_vlan *va, *vb;
+ const struct switchdev_obj_port_mdb *ma, *mb;
+
+ if (a->id != b->id || a->orig_dev != b->orig_dev)
+ return false;
+
+ switch (a->id) {
+ case SWITCHDEV_OBJ_ID_PORT_VLAN:
+ va = SWITCHDEV_OBJ_PORT_VLAN(a);
+ vb = SWITCHDEV_OBJ_PORT_VLAN(b);
+ return va->flags == vb->flags &&
+ va->vid == vb->vid &&
+ va->changed == vb->changed;
+ case SWITCHDEV_OBJ_ID_PORT_MDB:
+ case SWITCHDEV_OBJ_ID_HOST_MDB:
+ ma = SWITCHDEV_OBJ_PORT_MDB(a);
+ mb = SWITCHDEV_OBJ_PORT_MDB(b);
+ return ma->vid == mb->vid &&
+ ether_addr_equal(ma->addr, mb->addr);
+ default:
+ break;
+ }
+
+ BUG();
+}
+
static LIST_HEAD(deferred);
static DEFINE_SPINLOCK(deferred_lock);
@@ -307,6 +336,50 @@ int switchdev_port_obj_del(struct net_device *dev,
}
EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
+/**
+ * switchdev_port_obj_act_is_deferred - Is object action pending?
+ *
+ * @dev: port device
+ * @nt: type of action; add or delete
+ * @obj: object to test
+ *
+ * Returns true if a deferred item is pending, which is
+ * equivalent to the action @nt on an object @obj.
+ *
+ * rtnl_lock must be held.
+ */
+bool switchdev_port_obj_act_is_deferred(struct net_device *dev,
+ enum switchdev_notifier_type nt,
+ const struct switchdev_obj *obj)
+{
+ struct switchdev_deferred_item *dfitem;
+ bool found = false;
+
+ ASSERT_RTNL();
+
+ spin_lock_bh(&deferred_lock);
+
+ list_for_each_entry(dfitem, &deferred, list) {
+ if (dfitem->dev != dev)
+ continue;
+
+ if ((dfitem->func == switchdev_port_obj_add_deferred &&
+ nt == SWITCHDEV_PORT_OBJ_ADD) ||
+ (dfitem->func == switchdev_port_obj_del_deferred &&
+ nt == SWITCHDEV_PORT_OBJ_DEL)) {
+ if (switchdev_obj_eq((const void *)dfitem->data, obj)) {
+ found = true;
+ break;
+ }
+ }
+ }
+
+ spin_unlock_bh(&deferred_lock);
+
+ return found;
+}
+EXPORT_SYMBOL_GPL(switchdev_port_obj_act_is_deferred);
+
static ATOMIC_NOTIFIER_HEAD(switchdev_notif_chain);
static BLOCKING_NOTIFIER_HEAD(switchdev_blocking_notif_chain);
diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 2cde375..878415c 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -1086,6 +1086,12 @@ int tipc_nl_bearer_add(struct sk_buff *skb, struct genl_info *info)
#ifdef CONFIG_TIPC_MEDIA_UDP
if (attrs[TIPC_NLA_BEARER_UDP_OPTS]) {
+ if (b->media->type_id != TIPC_MEDIA_TYPE_UDP) {
+ rtnl_unlock();
+ NL_SET_ERR_MSG(info->extack, "UDP option is unsupported");
+ return -EINVAL;
+ }
+
err = tipc_udp_nl_bearer_add(b,
attrs[TIPC_NLA_BEARER_UDP_OPTS]);
if (err) {
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 1c2c680..b4674f0 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -1003,7 +1003,7 @@ static u16 tls_user_config(struct tls_context *ctx, bool tx)
return 0;
}
-static int tls_get_info(const struct sock *sk, struct sk_buff *skb)
+static int tls_get_info(struct sock *sk, struct sk_buff *skb)
{
u16 version, cipher_type;
struct tls_context *ctx;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 31e8a94..211f571 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -52,6 +52,7 @@ struct tls_decrypt_arg {
struct_group(inargs,
bool zc;
bool async;
+ bool async_done;
u8 tail;
);
@@ -63,6 +64,7 @@ struct tls_decrypt_ctx {
u8 iv[TLS_MAX_IV_SIZE];
u8 aad[TLS_MAX_AAD_SIZE];
u8 tail;
+ bool free_sgout;
struct scatterlist sg[];
};
@@ -187,7 +189,6 @@ static void tls_decrypt_done(void *data, int err)
struct aead_request *aead_req = data;
struct crypto_aead *aead = crypto_aead_reqtfm(aead_req);
struct scatterlist *sgout = aead_req->dst;
- struct scatterlist *sgin = aead_req->src;
struct tls_sw_context_rx *ctx;
struct tls_decrypt_ctx *dctx;
struct tls_context *tls_ctx;
@@ -196,6 +197,17 @@ static void tls_decrypt_done(void *data, int err)
struct sock *sk;
int aead_size;
+ /* If requests get too backlogged crypto API returns -EBUSY and calls
+ * ->complete(-EINPROGRESS) immediately followed by ->complete(0)
+ * to make waiting for backlog to flush with crypto_wait_req() easier.
+ * First wait converts -EBUSY -> -EINPROGRESS, and the second one
+ * -EINPROGRESS -> 0.
+ * We have a single struct crypto_async_request per direction, this
+ * scheme doesn't help us, so just ignore the first ->complete().
+ */
+ if (err == -EINPROGRESS)
+ return;
+
aead_size = sizeof(*aead_req) + crypto_aead_reqsize(aead);
aead_size = ALIGN(aead_size, __alignof__(*dctx));
dctx = (void *)((u8 *)aead_req + aead_size);
@@ -213,7 +225,7 @@ static void tls_decrypt_done(void *data, int err)
}
/* Free the destination pages if skb was not decrypted inplace */
- if (sgout != sgin) {
+ if (dctx->free_sgout) {
/* Skip the first S/G entry as it points to AAD */
for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
if (!sg)
@@ -224,10 +236,17 @@ static void tls_decrypt_done(void *data, int err)
kfree(aead_req);
- spin_lock_bh(&ctx->decrypt_compl_lock);
- if (!atomic_dec_return(&ctx->decrypt_pending))
+ if (atomic_dec_and_test(&ctx->decrypt_pending))
complete(&ctx->async_wait.completion);
- spin_unlock_bh(&ctx->decrypt_compl_lock);
+}
+
+static int tls_decrypt_async_wait(struct tls_sw_context_rx *ctx)
+{
+ if (!atomic_dec_and_test(&ctx->decrypt_pending))
+ crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+ atomic_inc(&ctx->decrypt_pending);
+
+ return ctx->async_wait.err;
}
static int tls_do_decryption(struct sock *sk,
@@ -253,20 +272,33 @@ static int tls_do_decryption(struct sock *sk,
aead_request_set_callback(aead_req,
CRYPTO_TFM_REQ_MAY_BACKLOG,
tls_decrypt_done, aead_req);
+ DEBUG_NET_WARN_ON_ONCE(atomic_read(&ctx->decrypt_pending) < 1);
atomic_inc(&ctx->decrypt_pending);
} else {
+ DECLARE_CRYPTO_WAIT(wait);
+
aead_request_set_callback(aead_req,
CRYPTO_TFM_REQ_MAY_BACKLOG,
- crypto_req_done, &ctx->async_wait);
+ crypto_req_done, &wait);
+ ret = crypto_aead_decrypt(aead_req);
+ if (ret == -EINPROGRESS || ret == -EBUSY)
+ ret = crypto_wait_req(ret, &wait);
+ return ret;
}
ret = crypto_aead_decrypt(aead_req);
- if (ret == -EINPROGRESS) {
- if (darg->async)
- return 0;
+ if (ret == -EINPROGRESS)
+ return 0;
- ret = crypto_wait_req(ret, &ctx->async_wait);
+ if (ret == -EBUSY) {
+ ret = tls_decrypt_async_wait(ctx);
+ darg->async_done = true;
+ /* all completions have run, we're not doing async anymore */
+ darg->async = false;
+ return ret;
}
+
+ atomic_dec(&ctx->decrypt_pending);
darg->async = false;
return ret;
@@ -439,9 +471,10 @@ static void tls_encrypt_done(void *data, int err)
struct tls_rec *rec = data;
struct scatterlist *sge;
struct sk_msg *msg_en;
- bool ready = false;
struct sock *sk;
- int pending;
+
+ if (err == -EINPROGRESS) /* see the comment in tls_decrypt_done() */
+ return;
msg_en = &rec->msg_encrypted;
@@ -476,23 +509,25 @@ static void tls_encrypt_done(void *data, int err)
/* If received record is at head of tx_list, schedule tx */
first_rec = list_first_entry(&ctx->tx_list,
struct tls_rec, list);
- if (rec == first_rec)
- ready = true;
+ if (rec == first_rec) {
+ /* Schedule the transmission */
+ if (!test_and_set_bit(BIT_TX_SCHEDULED,
+ &ctx->tx_bitmask))
+ schedule_delayed_work(&ctx->tx_work.work, 1);
+ }
}
- spin_lock_bh(&ctx->encrypt_compl_lock);
- pending = atomic_dec_return(&ctx->encrypt_pending);
-
- if (!pending && ctx->async_notify)
+ if (atomic_dec_and_test(&ctx->encrypt_pending))
complete(&ctx->async_wait.completion);
- spin_unlock_bh(&ctx->encrypt_compl_lock);
+}
- if (!ready)
- return;
+static int tls_encrypt_async_wait(struct tls_sw_context_tx *ctx)
+{
+ if (!atomic_dec_and_test(&ctx->encrypt_pending))
+ crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+ atomic_inc(&ctx->encrypt_pending);
- /* Schedule the transmission */
- if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
- schedule_delayed_work(&ctx->tx_work.work, 1);
+ return ctx->async_wait.err;
}
static int tls_do_encryption(struct sock *sk,
@@ -541,9 +576,14 @@ static int tls_do_encryption(struct sock *sk,
/* Add the record in tx_list */
list_add_tail((struct list_head *)&rec->list, &ctx->tx_list);
+ DEBUG_NET_WARN_ON_ONCE(atomic_read(&ctx->encrypt_pending) < 1);
atomic_inc(&ctx->encrypt_pending);
rc = crypto_aead_encrypt(aead_req);
+ if (rc == -EBUSY) {
+ rc = tls_encrypt_async_wait(ctx);
+ rc = rc ?: -EINPROGRESS;
+ }
if (!rc || rc != -EINPROGRESS) {
atomic_dec(&ctx->encrypt_pending);
sge->offset -= prot->prepend_size;
@@ -984,7 +1024,6 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
int num_zc = 0;
int orig_size;
int ret = 0;
- int pending;
if (!eor && (msg->msg_flags & MSG_EOR))
return -EINVAL;
@@ -1163,24 +1202,12 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
if (!num_async) {
goto send_end;
} else if (num_zc) {
+ int err;
+
/* Wait for pending encryptions to get completed */
- spin_lock_bh(&ctx->encrypt_compl_lock);
- ctx->async_notify = true;
-
- pending = atomic_read(&ctx->encrypt_pending);
- spin_unlock_bh(&ctx->encrypt_compl_lock);
- if (pending)
- crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
- else
- reinit_completion(&ctx->async_wait.completion);
-
- /* There can be no concurrent accesses, since we have no
- * pending encrypt operations
- */
- WRITE_ONCE(ctx->async_notify, false);
-
- if (ctx->async_wait.err) {
- ret = ctx->async_wait.err;
+ err = tls_encrypt_async_wait(ctx);
+ if (err) {
+ ret = err;
copied = 0;
}
}
@@ -1229,7 +1256,6 @@ void tls_sw_splice_eof(struct socket *sock)
ssize_t copied = 0;
bool retrying = false;
int ret = 0;
- int pending;
if (!ctx->open_rec)
return;
@@ -1264,22 +1290,7 @@ void tls_sw_splice_eof(struct socket *sock)
}
/* Wait for pending encryptions to get completed */
- spin_lock_bh(&ctx->encrypt_compl_lock);
- ctx->async_notify = true;
-
- pending = atomic_read(&ctx->encrypt_pending);
- spin_unlock_bh(&ctx->encrypt_compl_lock);
- if (pending)
- crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
- else
- reinit_completion(&ctx->async_wait.completion);
-
- /* There can be no concurrent accesses, since we have no pending
- * encrypt operations
- */
- WRITE_ONCE(ctx->async_notify, false);
-
- if (ctx->async_wait.err)
+ if (tls_encrypt_async_wait(ctx))
goto unlock;
/* Transmit if any encryptions have completed */
@@ -1581,12 +1592,16 @@ static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov,
} else if (out_sg) {
memcpy(sgout, out_sg, n_sgout * sizeof(*sgout));
}
+ dctx->free_sgout = !!pages;
/* Prepare and submit AEAD request */
err = tls_do_decryption(sk, sgin, sgout, dctx->iv,
data_len + prot->tail_size, aead_req, darg);
- if (err)
+ if (err) {
+ if (darg->async_done)
+ goto exit_free_skb;
goto exit_free_pages;
+ }
darg->skb = clear_skb ?: tls_strp_msg(ctx);
clear_skb = NULL;
@@ -1598,6 +1613,9 @@ static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov,
return err;
}
+ if (unlikely(darg->async_done))
+ return 0;
+
if (prot->tail_size)
darg->tail = dctx->tail;
@@ -1769,7 +1787,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
u8 *control,
size_t skip,
size_t len,
- bool is_peek)
+ bool is_peek,
+ bool *more)
{
struct sk_buff *skb = skb_peek(&ctx->rx_list);
struct tls_msg *tlm;
@@ -1782,7 +1801,7 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
err = tls_record_content_type(msg, tlm, control);
if (err <= 0)
- goto out;
+ goto more;
if (skip < rxm->full_len)
break;
@@ -1800,12 +1819,12 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
err = tls_record_content_type(msg, tlm, control);
if (err <= 0)
- goto out;
+ goto more;
err = skb_copy_datagram_msg(skb, rxm->offset + skip,
msg, chunk);
if (err < 0)
- goto out;
+ goto more;
len = len - chunk;
copied = copied + chunk;
@@ -1841,6 +1860,10 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
out:
return copied ? : err;
+more:
+ if (more)
+ *more = true;
+ goto out;
}
static bool
@@ -1940,10 +1963,12 @@ int tls_sw_recvmsg(struct sock *sk,
struct strp_msg *rxm;
struct tls_msg *tlm;
ssize_t copied = 0;
+ ssize_t peeked = 0;
bool async = false;
int target, err;
bool is_kvec = iov_iter_is_kvec(&msg->msg_iter);
bool is_peek = flags & MSG_PEEK;
+ bool rx_more = false;
bool released = true;
bool bpf_strp_enabled;
bool zc_capable;
@@ -1963,12 +1988,12 @@ int tls_sw_recvmsg(struct sock *sk,
goto end;
/* Process pending decrypted records. It must be non-zero-copy */
- err = process_rx_list(ctx, msg, &control, 0, len, is_peek);
+ err = process_rx_list(ctx, msg, &control, 0, len, is_peek, &rx_more);
if (err < 0)
goto end;
copied = err;
- if (len <= copied)
+ if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA) || rx_more)
goto end;
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
@@ -2061,6 +2086,8 @@ int tls_sw_recvmsg(struct sock *sk,
decrypted += chunk;
len -= chunk;
__skb_queue_tail(&ctx->rx_list, skb);
+ if (unlikely(control != TLS_RECORD_TYPE_DATA))
+ break;
continue;
}
@@ -2084,8 +2111,10 @@ int tls_sw_recvmsg(struct sock *sk,
if (err < 0)
goto put_on_rx_list_err;
- if (is_peek)
+ if (is_peek) {
+ peeked += chunk;
goto put_on_rx_list;
+ }
if (partially_consumed) {
rxm->offset += chunk;
@@ -2109,16 +2138,10 @@ int tls_sw_recvmsg(struct sock *sk,
recv_end:
if (async) {
- int ret, pending;
+ int ret;
/* Wait for all previously submitted records to be decrypted */
- spin_lock_bh(&ctx->decrypt_compl_lock);
- reinit_completion(&ctx->async_wait.completion);
- pending = atomic_read(&ctx->decrypt_pending);
- spin_unlock_bh(&ctx->decrypt_compl_lock);
- ret = 0;
- if (pending)
- ret = crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+ ret = tls_decrypt_async_wait(ctx);
__skb_queue_purge(&ctx->async_hold);
if (ret) {
@@ -2130,12 +2153,11 @@ int tls_sw_recvmsg(struct sock *sk,
/* Drain records from the rx_list & copy if required */
if (is_peek || is_kvec)
- err = process_rx_list(ctx, msg, &control, copied,
- decrypted, is_peek);
+ err = process_rx_list(ctx, msg, &control, copied + peeked,
+ decrypted - peeked, is_peek, NULL);
else
err = process_rx_list(ctx, msg, &control, 0,
- async_copy_bytes, is_peek);
- decrypted += max(err, 0);
+ async_copy_bytes, is_peek, NULL);
}
copied += decrypted;
@@ -2435,16 +2457,9 @@ void tls_sw_release_resources_tx(struct sock *sk)
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
struct tls_rec *rec, *tmp;
- int pending;
/* Wait for any pending async encryptions to complete */
- spin_lock_bh(&ctx->encrypt_compl_lock);
- ctx->async_notify = true;
- pending = atomic_read(&ctx->encrypt_pending);
- spin_unlock_bh(&ctx->encrypt_compl_lock);
-
- if (pending)
- crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+ tls_encrypt_async_wait(ctx);
tls_tx_records(sk, -1);
@@ -2607,7 +2622,7 @@ static struct tls_sw_context_tx *init_ctx_tx(struct tls_context *ctx, struct soc
}
crypto_init_wait(&sw_ctx_tx->async_wait);
- spin_lock_init(&sw_ctx_tx->encrypt_compl_lock);
+ atomic_set(&sw_ctx_tx->encrypt_pending, 1);
INIT_LIST_HEAD(&sw_ctx_tx->tx_list);
INIT_DELAYED_WORK(&sw_ctx_tx->tx_work.work, tx_work_handler);
sw_ctx_tx->tx_work.sk = sk;
@@ -2628,7 +2643,7 @@ static struct tls_sw_context_rx *init_ctx_rx(struct tls_context *ctx)
}
crypto_init_wait(&sw_ctx_rx->async_wait);
- spin_lock_init(&sw_ctx_rx->decrypt_compl_lock);
+ atomic_set(&sw_ctx_rx->decrypt_pending, 1);
init_waitqueue_head(&sw_ctx_rx->wq);
skb_queue_head_init(&sw_ctx_rx->rx_list);
skb_queue_head_init(&sw_ctx_rx->async_hold);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 30b178e..0748e7e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -782,19 +782,6 @@ static int unix_seqpacket_sendmsg(struct socket *, struct msghdr *, size_t);
static int unix_seqpacket_recvmsg(struct socket *, struct msghdr *, size_t,
int);
-static int unix_set_peek_off(struct sock *sk, int val)
-{
- struct unix_sock *u = unix_sk(sk);
-
- if (mutex_lock_interruptible(&u->iolock))
- return -EINTR;
-
- WRITE_ONCE(sk->sk_peek_off, val);
- mutex_unlock(&u->iolock);
-
- return 0;
-}
-
#ifdef CONFIG_PROC_FS
static int unix_count_nr_fds(struct sock *sk)
{
@@ -862,7 +849,7 @@ static const struct proto_ops unix_stream_ops = {
.read_skb = unix_stream_read_skb,
.mmap = sock_no_mmap,
.splice_read = unix_stream_splice_read,
- .set_peek_off = unix_set_peek_off,
+ .set_peek_off = sk_set_peek_off,
.show_fdinfo = unix_show_fdinfo,
};
@@ -886,7 +873,7 @@ static const struct proto_ops unix_dgram_ops = {
.read_skb = unix_read_skb,
.recvmsg = unix_dgram_recvmsg,
.mmap = sock_no_mmap,
- .set_peek_off = unix_set_peek_off,
+ .set_peek_off = sk_set_peek_off,
.show_fdinfo = unix_show_fdinfo,
};
@@ -909,7 +896,7 @@ static const struct proto_ops unix_seqpacket_ops = {
.sendmsg = unix_seqpacket_sendmsg,
.recvmsg = unix_seqpacket_recvmsg,
.mmap = sock_no_mmap,
- .set_peek_off = unix_set_peek_off,
+ .set_peek_off = sk_set_peek_off,
.show_fdinfo = unix_show_fdinfo,
};
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 2405f0f..2a81880 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -284,9 +284,17 @@ void unix_gc(void)
* which are creating the cycle(s).
*/
skb_queue_head_init(&hitlist);
- list_for_each_entry(u, &gc_candidates, link)
+ list_for_each_entry(u, &gc_candidates, link) {
scan_children(&u->sk, inc_inflight, &hitlist);
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+ if (u->oob_skb) {
+ kfree_skb(u->oob_skb);
+ u->oob_skb = NULL;
+ }
+#endif
+ }
+
/* not_cycle_list contains those sockets which do not make up a
* cycle. Restore these to the inflight list.
*/
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 409d74c..3fb1b637 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -5,7 +5,7 @@
* Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net>
* Copyright 2013-2014 Intel Mobile Communications GmbH
* Copyright 2015-2017 Intel Deutschland GmbH
- * Copyright (C) 2018-2023 Intel Corporation
+ * Copyright (C) 2018-2024 Intel Corporation
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -1661,6 +1661,7 @@ void wiphy_delayed_work_queue(struct wiphy *wiphy,
unsigned long delay)
{
if (!delay) {
+ del_timer(&dwork->timer);
wiphy_work_queue(wiphy, &dwork->work);
return;
}
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index b097004..bd54a928 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -4197,6 +4197,8 @@ static int nl80211_set_interface(struct sk_buff *skb, struct genl_info *info)
if (ntype != NL80211_IFTYPE_MESH_POINT)
return -EINVAL;
+ if (otype != NL80211_IFTYPE_MESH_POINT)
+ return -EINVAL;
if (netif_running(dev))
return -EBUSY;
diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index 2249b1a8..389a52c 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -1731,6 +1731,61 @@ static void cfg80211_update_hidden_bsses(struct cfg80211_internal_bss *known,
}
}
+static void cfg80211_check_stuck_ecsa(struct cfg80211_registered_device *rdev,
+ struct cfg80211_internal_bss *known,
+ const struct cfg80211_bss_ies *old)
+{
+ const struct ieee80211_ext_chansw_ie *ecsa;
+ const struct element *elem_new, *elem_old;
+ const struct cfg80211_bss_ies *new, *bcn;
+
+ if (known->pub.proberesp_ecsa_stuck)
+ return;
+
+ new = rcu_dereference_protected(known->pub.proberesp_ies,
+ lockdep_is_held(&rdev->bss_lock));
+ if (WARN_ON(!new))
+ return;
+
+ if (new->tsf - old->tsf < USEC_PER_SEC)
+ return;
+
+ elem_old = cfg80211_find_elem(WLAN_EID_EXT_CHANSWITCH_ANN,
+ old->data, old->len);
+ if (!elem_old)
+ return;
+
+ elem_new = cfg80211_find_elem(WLAN_EID_EXT_CHANSWITCH_ANN,
+ new->data, new->len);
+ if (!elem_new)
+ return;
+
+ bcn = rcu_dereference_protected(known->pub.beacon_ies,
+ lockdep_is_held(&rdev->bss_lock));
+ if (bcn &&
+ cfg80211_find_elem(WLAN_EID_EXT_CHANSWITCH_ANN,
+ bcn->data, bcn->len))
+ return;
+
+ if (elem_new->datalen != elem_old->datalen)
+ return;
+ if (elem_new->datalen < sizeof(struct ieee80211_ext_chansw_ie))
+ return;
+ if (memcmp(elem_new->data, elem_old->data, elem_new->datalen))
+ return;
+
+ ecsa = (void *)elem_new->data;
+
+ if (!ecsa->mode)
+ return;
+
+ if (ecsa->new_ch_num !=
+ ieee80211_frequency_to_channel(known->pub.channel->center_freq))
+ return;
+
+ known->pub.proberesp_ecsa_stuck = 1;
+}
+
static bool
cfg80211_update_known_bss(struct cfg80211_registered_device *rdev,
struct cfg80211_internal_bss *known,
@@ -1750,8 +1805,10 @@ cfg80211_update_known_bss(struct cfg80211_registered_device *rdev,
/* Override possible earlier Beacon frame IEs */
rcu_assign_pointer(known->pub.ies,
new->pub.proberesp_ies);
- if (old)
+ if (old) {
+ cfg80211_check_stuck_ecsa(rdev, known, old);
kfree_rcu((struct cfg80211_bss_ies *)old, rcu_head);
+ }
}
if (rcu_access_pointer(new->pub.beacon_ies)) {
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 1eadfac..b78c0e0 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -722,7 +722,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
memcpy(vaddr, buffer, len);
kunmap_local(vaddr);
- skb_add_rx_frag(skb, nr_frags, page, 0, len, 0);
+ skb_add_rx_frag(skb, nr_frags, page, 0, len, PAGE_SIZE);
+ refcount_add(PAGE_SIZE, &xs->sk.sk_wmem_alloc);
}
if (first_frag && desc->options & XDP_TX_METADATA) {
diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 41533c6..e6da7e8 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -858,4 +858,5 @@ int xfrm_count_pfkey_enc_supported(void)
}
EXPORT_SYMBOL_GPL(xfrm_count_pfkey_enc_supported);
+MODULE_DESCRIPTION("XFRM Algorithm interface");
MODULE_LICENSE("GPL");
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 3784534..653e51a 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -407,7 +407,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
struct net_device *dev = x->xso.dev;
- if (!x->type_offload || x->encap)
+ if (!x->type_offload)
return false;
if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET ||
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 662c83b..e5722c9 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -704,9 +704,13 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
{
struct net *net = dev_net(skb_dst(skb)->dev);
struct xfrm_state *x = skb_dst(skb)->xfrm;
+ int family;
int err;
- switch (x->outer_mode.family) {
+ family = (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) ? x->outer_mode.family
+ : skb_dst(skb)->ops->family;
+
+ switch (family) {
case AF_INET:
memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 1b7e751..da6ecc6 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2694,7 +2694,9 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
if (xfrm[i]->props.smark.v || xfrm[i]->props.smark.m)
mark = xfrm_smark_get(fl->flowi_mark, xfrm[i]);
- family = xfrm[i]->props.family;
+ if (xfrm[i]->xso.type != XFRM_DEV_OFFLOAD_PACKET)
+ family = xfrm[i]->props.family;
+
oif = fl->flowi_oif ? : fl->flowi_l3mdev;
dst = xfrm_dst_lookup(xfrm[i], tos, oif,
&saddr, &daddr, family, mark);
@@ -3416,7 +3418,7 @@ decode_session4(const struct xfrm_flow_keys *flkeys, struct flowi *fl, bool reve
}
fl4->flowi4_proto = flkeys->basic.ip_proto;
- fl4->flowi4_tos = flkeys->ip.tos;
+ fl4->flowi4_tos = flkeys->ip.tos & ~INET_ECN_MASK;
}
#if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index ad01997..912c118 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2017,6 +2017,9 @@ static int copy_to_user_tmpl(struct xfrm_policy *xp, struct sk_buff *skb)
if (xp->xfrm_nr == 0)
return 0;
+ if (xp->xfrm_nr > XFRM_MAX_DEPTH)
+ return -ENOBUFS;
+
for (i = 0; i < xp->xfrm_nr; i++) {
struct xfrm_user_tmpl *up = &vec[i];
struct xfrm_tmpl *kp = &xp->xfrm_vec[i];
@@ -3888,5 +3891,6 @@ static void __exit xfrm_user_exit(void)
module_init(xfrm_user_init);
module_exit(xfrm_user_exit);
+MODULE_DESCRIPTION("XFRM User interface");
MODULE_LICENSE("GPL");
MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_XFRM);
diff --git a/samples/bpf/asm_goto_workaround.h b/samples/bpf/asm_goto_workaround.h
index 7048bb3..634e81d 100644
--- a/samples/bpf/asm_goto_workaround.h
+++ b/samples/bpf/asm_goto_workaround.h
@@ -4,14 +4,14 @@
#define __ASM_GOTO_WORKAROUND_H
/*
- * This will bring in asm_volatile_goto and asm_inline macro definitions
+ * This will bring in asm_goto_output and asm_inline macro definitions
* if enabled by compiler and config options.
*/
#include <linux/types.h>
-#ifdef asm_volatile_goto
-#undef asm_volatile_goto
-#define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto")
+#ifdef asm_goto_output
+#undef asm_goto_output
+#define asm_goto_output(x...) asm volatile("invalid use of asm_goto_output")
#endif
/*
diff --git a/scripts/Kconfig.include b/scripts/Kconfig.include
index 5a84b64..3ee8ecf 100644
--- a/scripts/Kconfig.include
+++ b/scripts/Kconfig.include
@@ -33,7 +33,7 @@
# $(as-instr,<instr>)
# Return y if the assembler supports <instr>, n otherwise
-as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler-with-cpp -o /dev/null -)
+as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -Wa$(comma)--fatal-warnings -c -x assembler-with-cpp -o /dev/null -)
# check if $(CC) and $(LD) exist
$(error-if,$(failure,command -v $(CC)),C compiler '$(CC)' not found)
diff --git a/scripts/Makefile.compiler b/scripts/Makefile.compiler
index 8fcb427..92be0c9 100644
--- a/scripts/Makefile.compiler
+++ b/scripts/Makefile.compiler
@@ -38,7 +38,7 @@
# Usage: aflags-y += $(call as-instr,instr,option1,option2)
as-instr = $(call try-run,\
- printf "%b\n" "$(1)" | $(CC) -Werror $(CLANG_FLAGS) $(KBUILD_AFLAGS) -c -x assembler-with-cpp -o "$$TMP" -,$(2),$(3))
+ printf "%b\n" "$(1)" | $(CC) -Werror $(CLANG_FLAGS) $(KBUILD_AFLAGS) -Wa$(comma)--fatal-warnings -c -x assembler-with-cpp -o "$$TMP" -,$(2),$(3))
# __cc-option
# Usage: MY_CFLAGS += $(call __cc-option,$(CC),$(MY_CFLAGS),-march=winchip-c6,-march=i586)
diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
index 61b7ddd..0669bac 100755
--- a/scripts/bpf_doc.py
+++ b/scripts/bpf_doc.py
@@ -513,7 +513,7 @@
instructions to the kernel when the programs are loaded. The format for that
string is identical to the one in use for kernel modules (Dual licenses, such
as "Dual BSD/GPL", may be used). Some helper functions are only accessible to
-programs that are compatible with the GNU Privacy License (GPL).
+programs that are compatible with the GNU General Public License (GNU GPL).
In order to use such helpers, the eBPF program must be loaded with the correct
license string passed (via **attr**) to the **bpf**\\ () system call, and this
diff --git a/scripts/clang-tools/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
index 5dea447..e4fb686 100755
--- a/scripts/clang-tools/gen_compile_commands.py
+++ b/scripts/clang-tools/gen_compile_commands.py
@@ -170,7 +170,7 @@
# escape the pound sign '#', either as '\#' or '$(pound)' (depending on the
# kernel version). The compile_commands.json file is not interepreted
# by Make, so this code replaces the escaped version with '#'.
- prefix = command_prefix.replace('\#', '#').replace('$(pound)', '#')
+ prefix = command_prefix.replace(r'\#', '#').replace('$(pound)', '#')
# Return the canonical path, eliminating any symbolic links encountered in the path.
abs_path = os.path.realpath(os.path.join(root_directory, file_path))
diff --git a/scripts/gdb/linux/symbols.py b/scripts/gdb/linux/symbols.py
index c8047f44..e8316be 100644
--- a/scripts/gdb/linux/symbols.py
+++ b/scripts/gdb/linux/symbols.py
@@ -82,7 +82,7 @@
self.module_files_updated = True
def _get_module_file(self, module_name):
- module_pattern = ".*/{0}\.ko(?:.debug)?$".format(
+ module_pattern = r".*/{0}\.ko(?:.debug)?$".format(
module_name.replace("_", r"[_\-]"))
for name in self.module_files:
if re.match(module_pattern, name) and os.path.exists(name):
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index a432b17..7862a81 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -135,8 +135,13 @@
${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
--strip-all ${1} ${2} 2>/dev/null
# Change e_type to ET_REL so that it can be used to link final vmlinux.
- # Unlike GNU ld, lld does not allow an ET_EXEC input.
- printf '\1' | dd of=${2} conv=notrunc bs=1 seek=16 status=none
+ # GNU ld 2.35+ and lld do not allow an ET_EXEC input.
+ if is_enabled CONFIG_CPU_BIG_ENDIAN; then
+ et_rel='\0\1'
+ else
+ et_rel='\1\0'
+ fi
+ printf "${et_rel}" | dd of=${2} conv=notrunc bs=1 seek=16 status=none
}
# Create ${2} .S file with all symbols from the ${1} object file
diff --git a/scripts/mksysmap b/scripts/mksysmap
index 9ba1c9d..57ff565 100755
--- a/scripts/mksysmap
+++ b/scripts/mksysmap
@@ -48,17 +48,8 @@
/ __kvm_nvhe_\\$/d
/ __kvm_nvhe_\.L/d
-# arm64 lld
-/ __AArch64ADRPThunk_/d
-
-# arm lld
-/ __ARMV5PILongThunk_/d
-/ __ARMV7PILongThunk_/d
-/ __ThumbV7PILongThunk_/d
-
-# mips lld
-/ __LA25Thunk_/d
-/ __microLA25Thunk_/d
+# lld arm/aarch64/mips thunks
+/ __[[:alnum:]]*Thunk_/d
# CFI type identifiers
/ __kcfi_typeid_/d
diff --git a/scripts/mod/sumversion.c b/scripts/mod/sumversion.c
index 31066bf..dc48785 100644
--- a/scripts/mod/sumversion.c
+++ b/scripts/mod/sumversion.c
@@ -326,7 +326,12 @@ static int parse_source_files(const char *objfile, struct md4_ctx *md)
/* Sum all files in the same dir or subdirs. */
while ((line = get_line(&pos))) {
- char* p = line;
+ char* p;
+
+ /* trim the leading spaces away */
+ while (isspace(*line))
+ line++;
+ p = line;
if (strncmp(line, "source_", sizeof("source_")-1) == 0) {
p = strrchr(line, ' ');
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 98e1150..9a3dcaa 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -784,7 +784,7 @@ static int apparmor_getselfattr(unsigned int attr, struct lsm_ctx __user *lx,
int error = -ENOENT;
struct aa_task_ctx *ctx = task_ctx(current);
struct aa_label *label = NULL;
- char *value;
+ char *value = NULL;
switch (attr) {
case LSM_ATTR_CURRENT:
diff --git a/security/integrity/digsig.c b/security/integrity/digsig.c
index df387de..45c3e5d 100644
--- a/security/integrity/digsig.c
+++ b/security/integrity/digsig.c
@@ -179,7 +179,8 @@ static int __init integrity_add_key(const unsigned int id, const void *data,
KEY_ALLOC_NOT_IN_QUOTA);
if (IS_ERR(key)) {
rc = PTR_ERR(key);
- pr_err("Problem loading X.509 certificate %d\n", rc);
+ if (id != INTEGRITY_KEYRING_MACHINE)
+ pr_err("Problem loading X.509 certificate %d\n", rc);
} else {
pr_notice("Loaded X.509 cert '%s'\n",
key_ref_to_ptr(key)->description);
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index fc520a0..0171f7e 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -737,8 +737,8 @@ static int current_check_refer_path(struct dentry *const old_dentry,
bool allow_parent1, allow_parent2;
access_mask_t access_request_parent1, access_request_parent2;
struct path mnt_dir;
- layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS],
- layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS];
+ layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS] = {},
+ layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS] = {};
if (!dom)
return 0;
diff --git a/security/security.c b/security/security.c
index 3aaad75..7035ee3 100644
--- a/security/security.c
+++ b/security/security.c
@@ -29,6 +29,7 @@
#include <linux/backing-dev.h>
#include <linux/string.h>
#include <linux/msg.h>
+#include <linux/overflow.h>
#include <net/flow.h>
/* How many LSMs were built into the kernel? */
@@ -4015,6 +4016,7 @@ int security_setselfattr(unsigned int attr, struct lsm_ctx __user *uctx,
struct security_hook_list *hp;
struct lsm_ctx *lctx;
int rc = LSM_RET_DEFAULT(setselfattr);
+ u64 required_len;
if (flags)
return -EINVAL;
@@ -4027,8 +4029,9 @@ int security_setselfattr(unsigned int attr, struct lsm_ctx __user *uctx,
if (IS_ERR(lctx))
return PTR_ERR(lctx);
- if (size < lctx->len || size < lctx->ctx_len + sizeof(*lctx) ||
- lctx->len < lctx->ctx_len + sizeof(*lctx)) {
+ if (size < lctx->len ||
+ check_add_overflow(sizeof(*lctx), lctx->ctx_len, &required_len) ||
+ lctx->len < required_len) {
rc = -EINVAL;
goto free_out;
}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index a6bf90a..338b023 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6559,7 +6559,7 @@ static int selinux_getselfattr(unsigned int attr, struct lsm_ctx __user *ctx,
size_t *size, u32 flags)
{
int rc;
- char *val;
+ char *val = NULL;
int val_len;
val_len = selinux_lsm_getattr(attr, current, &val);
diff --git a/security/tomoyo/common.c b/security/tomoyo/common.c
index 57ee70a..ea3140d5 100644
--- a/security/tomoyo/common.c
+++ b/security/tomoyo/common.c
@@ -2649,13 +2649,14 @@ ssize_t tomoyo_write_control(struct tomoyo_io_buffer *head,
{
int error = buffer_len;
size_t avail_len = buffer_len;
- char *cp0 = head->write_buf;
+ char *cp0;
int idx;
if (!head->write)
return -EINVAL;
if (mutex_lock_interruptible(&head->io_sem))
return -EINTR;
+ cp0 = head->write_buf;
head->read_user_buf_avail = 0;
idx = tomoyo_read_lock();
/* Read a line and dispatch it to the policy handler. */
diff --git a/sound/core/Makefile b/sound/core/Makefile
index a6b444e..f6526b3 100644
--- a/sound/core/Makefile
+++ b/sound/core/Makefile
@@ -32,7 +32,6 @@
snd-ump-$(CONFIG_SND_UMP_LEGACY_RAWMIDI) += ump_convert.o
snd-timer-objs := timer.o
snd-hrtimer-objs := hrtimer.o
-snd-rtctimer-objs := rtctimer.o
snd-hwdep-objs := hwdep.o
snd-seq-device-objs := seq_device.o
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index f5ff00f..21baf6b 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -486,6 +486,11 @@ static int fixup_unreferenced_params(struct snd_pcm_substream *substream,
i = hw_param_interval_c(params, SNDRV_PCM_HW_PARAM_SAMPLE_BITS);
if (snd_interval_single(i))
params->msbits = snd_interval_value(i);
+ m = hw_param_mask_c(params, SNDRV_PCM_HW_PARAM_FORMAT);
+ if (snd_mask_single(m)) {
+ snd_pcm_format_t format = (__force snd_pcm_format_t)snd_mask_min(m);
+ params->msbits = snd_pcm_format_width(format);
+ }
}
if (params->msbits) {
diff --git a/sound/core/ump.c b/sound/core/ump.c
index 3bef194..fe79114 100644
--- a/sound/core/ump.c
+++ b/sound/core/ump.c
@@ -985,7 +985,7 @@ static int snd_ump_legacy_open(struct snd_rawmidi_substream *substream)
struct snd_ump_endpoint *ump = substream->rmidi->private_data;
int dir = substream->stream;
int group = ump->legacy_mapping[substream->number];
- int err;
+ int err = 0;
mutex_lock(&ump->open_mutex);
if (ump->legacy_substreams[dir][group]) {
@@ -1009,7 +1009,7 @@ static int snd_ump_legacy_open(struct snd_rawmidi_substream *substream)
spin_unlock_irq(&ump->legacy_locks[dir]);
unlock:
mutex_unlock(&ump->open_mutex);
- return 0;
+ return err;
}
static int snd_ump_legacy_close(struct snd_rawmidi_substream *substream)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index a13c0b4..7be17bc 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -951,7 +951,7 @@ static int generate_tx_packet_descs(struct amdtp_stream *s, struct pkt_desc *des
// to the reason.
unsigned int safe_cycle = increment_ohci_cycle_count(next_cycle,
IR_JUMBO_PAYLOAD_MAX_SKIP_CYCLES);
- lost = (compare_ohci_cycle_count(safe_cycle, cycle) > 0);
+ lost = (compare_ohci_cycle_count(safe_cycle, cycle) < 0);
}
if (lost) {
dev_err(&s->unit->device, "Detect discontinuity of cycle: %d %d\n",
diff --git a/sound/pci/hda/Kconfig b/sound/pci/hda/Kconfig
index 21a90b3..8e0ff70 100644
--- a/sound/pci/hda/Kconfig
+++ b/sound/pci/hda/Kconfig
@@ -156,7 +156,7 @@
depends on I2C
depends on ACPI || COMPILE_TEST
depends on SND_SOC
- select CS_DSP
+ select FW_CS_DSP
select SND_HDA_GENERIC
select SND_SOC_CS35L56_SHARED
select SND_HDA_SCODEC_CS35L56
@@ -171,7 +171,7 @@
depends on SPI_MASTER
depends on ACPI || COMPILE_TEST
depends on SND_SOC
- select CS_DSP
+ select FW_CS_DSP
select SND_HDA_GENERIC
select SND_SOC_CS35L56_SHARED
select SND_HDA_SCODEC_CS35L56
diff --git a/sound/pci/hda/cs35l41_hda_property.c b/sound/pci/hda/cs35l41_hda_property.c
index d74cf11..e436d4d 100644
--- a/sound/pci/hda/cs35l41_hda_property.c
+++ b/sound/pci/hda/cs35l41_hda_property.c
@@ -83,6 +83,7 @@ static const struct cs35l41_config cs35l41_config_table[] = {
{ "104317F3", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
{ "10431863", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
{ "104318D3", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
+ { "10431A83", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
{ "10431C9F", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
{ "10431CAF", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
{ "10431CCF", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
@@ -91,10 +92,14 @@ static const struct cs35l41_config cs35l41_config_table[] = {
{ "10431D1F", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
{ "10431DA2", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
{ "10431E02", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+ { "10431E12", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
{ "10431EE2", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, -1, -1, 0, 0, 0 },
{ "10431F12", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
{ "10431F1F", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, -1, 0, 0, 0, 0 },
{ "10431F62", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+ { "17AA386F", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, -1, -1, 0, 0, 0 },
+ { "17AA38A9", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 2, -1, 0, 0, 0 },
+ { "17AA38AB", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 2, -1, 0, 0, 0 },
{ "17AA38B4", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
{ "17AA38B5", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
{ "17AA38B6", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
@@ -419,6 +424,7 @@ static const struct cs35l41_prop_model cs35l41_prop_model_table[] = {
{ "CSC3551", "104317F3", generic_dsd_config },
{ "CSC3551", "10431863", generic_dsd_config },
{ "CSC3551", "104318D3", generic_dsd_config },
+ { "CSC3551", "10431A83", generic_dsd_config },
{ "CSC3551", "10431C9F", generic_dsd_config },
{ "CSC3551", "10431CAF", generic_dsd_config },
{ "CSC3551", "10431CCF", generic_dsd_config },
@@ -427,10 +433,14 @@ static const struct cs35l41_prop_model cs35l41_prop_model_table[] = {
{ "CSC3551", "10431D1F", generic_dsd_config },
{ "CSC3551", "10431DA2", generic_dsd_config },
{ "CSC3551", "10431E02", generic_dsd_config },
+ { "CSC3551", "10431E12", generic_dsd_config },
{ "CSC3551", "10431EE2", generic_dsd_config },
{ "CSC3551", "10431F12", generic_dsd_config },
{ "CSC3551", "10431F1F", generic_dsd_config },
{ "CSC3551", "10431F62", generic_dsd_config },
+ { "CSC3551", "17AA386F", generic_dsd_config },
+ { "CSC3551", "17AA38A9", generic_dsd_config },
+ { "CSC3551", "17AA38AB", generic_dsd_config },
{ "CSC3551", "17AA38B4", generic_dsd_config },
{ "CSC3551", "17AA38B5", generic_dsd_config },
{ "CSC3551", "17AA38B6", generic_dsd_config },
diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c
index 3e7bfee..efe98f6 100644
--- a/sound/pci/hda/hda_controller.c
+++ b/sound/pci/hda/hda_controller.c
@@ -1207,6 +1207,9 @@ int azx_probe_codecs(struct azx *chip, unsigned int max_slots)
dev_warn(chip->card->dev,
"Codec #%d probe error; disabling it...\n", c);
bus->codec_mask &= ~(1 << c);
+ /* no codecs */
+ if (bus->codec_mask == 0)
+ break;
/* More badly, accessing to a non-existing
* codec often screws up the controller chip,
* and disturbs the further communications.
diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c
index e8819e8..e820917 100644
--- a/sound/pci/hda/patch_conexant.c
+++ b/sound/pci/hda/patch_conexant.c
@@ -344,6 +344,7 @@ enum {
CXT_FIXUP_HP_ZBOOK_MUTE_LED,
CXT_FIXUP_HEADSET_MIC,
CXT_FIXUP_HP_MIC_NO_PRESENCE,
+ CXT_PINCFG_SWS_JS201D,
};
/* for hda_fixup_thinkpad_acpi() */
@@ -841,6 +842,17 @@ static const struct hda_pintbl cxt_pincfg_lemote[] = {
{}
};
+/* SuoWoSi/South-holding JS201D with sn6140 */
+static const struct hda_pintbl cxt_pincfg_sws_js201d[] = {
+ { 0x16, 0x03211040 }, /* hp out */
+ { 0x17, 0x91170110 }, /* SPK/Class_D */
+ { 0x18, 0x95a70130 }, /* Internal mic */
+ { 0x19, 0x03a11020 }, /* Headset Mic */
+ { 0x1a, 0x40f001f0 }, /* Not used */
+ { 0x21, 0x40f001f0 }, /* Not used */
+ {}
+};
+
static const struct hda_fixup cxt_fixups[] = {
[CXT_PINCFG_LENOVO_X200] = {
.type = HDA_FIXUP_PINS,
@@ -996,6 +1008,10 @@ static const struct hda_fixup cxt_fixups[] = {
.chained = true,
.chain_id = CXT_FIXUP_HEADSET_MIC,
},
+ [CXT_PINCFG_SWS_JS201D] = {
+ .type = HDA_FIXUP_PINS,
+ .v.pins = cxt_pincfg_sws_js201d,
+ },
};
static const struct snd_pci_quirk cxt5045_fixups[] = {
@@ -1069,6 +1085,7 @@ static const struct snd_pci_quirk cxt5066_fixups[] = {
SND_PCI_QUIRK(0x103c, 0x8457, "HP Z2 G4 mini", CXT_FIXUP_HP_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x103c, 0x8458, "HP Z2 G4 mini premium", CXT_FIXUP_HP_MIC_NO_PRESENCE),
SND_PCI_QUIRK(0x1043, 0x138d, "Asus", CXT_FIXUP_HEADPHONE_MIC_PIN),
+ SND_PCI_QUIRK(0x14f1, 0x0265, "SWS JS201D", CXT_PINCFG_SWS_JS201D),
SND_PCI_QUIRK(0x152d, 0x0833, "OLPC XO-1.5", CXT_FIXUP_OLPC_XO),
SND_PCI_QUIRK(0x17aa, 0x20f2, "Lenovo T400", CXT_PINCFG_LENOVO_TP410),
SND_PCI_QUIRK(0x17aa, 0x215e, "Lenovo T410", CXT_PINCFG_LENOVO_TP410),
@@ -1109,6 +1126,7 @@ static const struct hda_model_fixup cxt5066_fixup_models[] = {
{ .id = CXT_FIXUP_HP_ZBOOK_MUTE_LED, .name = "hp-zbook-mute-led" },
{ .id = CXT_FIXUP_HP_MIC_NO_PRESENCE, .name = "hp-mic-fix" },
{ .id = CXT_PINCFG_LENOVO_NOTEBOOK, .name = "lenovo-20149" },
+ { .id = CXT_PINCFG_SWS_JS201D, .name = "sws-js201d" },
{}
};
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 6994c4c..a1facdb 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -3684,6 +3684,7 @@ static void alc285_hp_init(struct hda_codec *codec)
int i, val;
int coef38, coef0d, coef36;
+ alc_write_coefex_idx(codec, 0x58, 0x00, 0x1888); /* write default value */
alc_update_coef_idx(codec, 0x4a, 1<<15, 1<<15); /* Reset HP JD */
coef38 = alc_read_coef_idx(codec, 0x38); /* Amp control */
coef0d = alc_read_coef_idx(codec, 0x0d); /* Digital Misc control */
@@ -7444,6 +7445,7 @@ enum {
ALC287_FIXUP_LEGION_15IMHG05_AUTOMUTE,
ALC287_FIXUP_YOGA7_14ITL_SPEAKERS,
ALC298_FIXUP_LENOVO_C940_DUET7,
+ ALC287_FIXUP_LENOVO_14IRP8_DUETITL,
ALC287_FIXUP_13S_GEN2_SPEAKERS,
ALC256_FIXUP_SET_COEF_DEFAULTS,
ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE,
@@ -7495,6 +7497,26 @@ static void alc298_fixup_lenovo_c940_duet7(struct hda_codec *codec,
__snd_hda_apply_fixup(codec, id, action, 0);
}
+/* A special fixup for Lenovo Slim/Yoga Pro 9 14IRP8 and Yoga DuetITL 2021;
+ * 14IRP8 PCI SSID will mistakenly be matched with the DuetITL codec SSID,
+ * so we need to apply a different fixup in this case. The only DuetITL codec
+ * SSID reported so far is the 17aa:3802 while the 14IRP8 has the 17aa:38be
+ * and 17aa:38bf. If it weren't for the PCI SSID, the 14IRP8 models would
+ * have matched correctly by their codecs.
+ */
+static void alc287_fixup_lenovo_14irp8_duetitl(struct hda_codec *codec,
+ const struct hda_fixup *fix,
+ int action)
+{
+ int id;
+
+ if (codec->core.subsystem_id == 0x17aa3802)
+ id = ALC287_FIXUP_YOGA7_14ITL_SPEAKERS; /* DuetITL */
+ else
+ id = ALC287_FIXUP_TAS2781_I2C; /* 14IRP8 */
+ __snd_hda_apply_fixup(codec, id, action, 0);
+}
+
static const struct hda_fixup alc269_fixups[] = {
[ALC269_FIXUP_GPIO2] = {
.type = HDA_FIXUP_FUNC,
@@ -9379,6 +9401,10 @@ static const struct hda_fixup alc269_fixups[] = {
.type = HDA_FIXUP_FUNC,
.v.func = alc298_fixup_lenovo_c940_duet7,
},
+ [ALC287_FIXUP_LENOVO_14IRP8_DUETITL] = {
+ .type = HDA_FIXUP_FUNC,
+ .v.func = alc287_fixup_lenovo_14irp8_duetitl,
+ },
[ALC287_FIXUP_13S_GEN2_SPEAKERS] = {
.type = HDA_FIXUP_VERBS,
.v.verbs = (const struct hda_verb[]) {
@@ -9585,7 +9611,7 @@ static const struct hda_fixup alc269_fixups[] = {
.type = HDA_FIXUP_FUNC,
.v.func = tas2781_fixup_i2c,
.chained = true,
- .chain_id = ALC269_FIXUP_THINKPAD_ACPI,
+ .chain_id = ALC285_FIXUP_THINKPAD_HEADSET_JACK,
},
[ALC287_FIXUP_YOGA7_14ARB7_I2C] = {
.type = HDA_FIXUP_FUNC,
@@ -9737,13 +9763,16 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1028, 0x0b71, "Dell Inspiron 16 Plus 7620", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS),
SND_PCI_QUIRK(0x1028, 0x0beb, "Dell XPS 15 9530 (2023)", ALC289_FIXUP_DELL_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1028, 0x0c03, "Dell Precision 5340", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x1028, 0x0c0b, "Dell Oasis 14 RPL-P", ALC289_FIXUP_RTK_AMP_DUAL_SPK),
SND_PCI_QUIRK(0x1028, 0x0c0d, "Dell Oasis", ALC289_FIXUP_RTK_AMP_DUAL_SPK),
+ SND_PCI_QUIRK(0x1028, 0x0c0e, "Dell Oasis 16", ALC289_FIXUP_RTK_AMP_DUAL_SPK),
SND_PCI_QUIRK(0x1028, 0x0c19, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS),
SND_PCI_QUIRK(0x1028, 0x0c1a, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS),
SND_PCI_QUIRK(0x1028, 0x0c1b, "Dell Precision 3440", ALC236_FIXUP_DELL_DUAL_CODECS),
SND_PCI_QUIRK(0x1028, 0x0c1c, "Dell Precision 3540", ALC236_FIXUP_DELL_DUAL_CODECS),
SND_PCI_QUIRK(0x1028, 0x0c1d, "Dell Precision 3440", ALC236_FIXUP_DELL_DUAL_CODECS),
SND_PCI_QUIRK(0x1028, 0x0c1e, "Dell Precision 3540", ALC236_FIXUP_DELL_DUAL_CODECS),
+ SND_PCI_QUIRK(0x1028, 0x0c28, "Dell Inspiron 16 Plus 7630", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS),
SND_PCI_QUIRK(0x1028, 0x0c4d, "Dell", ALC287_FIXUP_CS35L41_I2C_4),
SND_PCI_QUIRK(0x1028, 0x0cbd, "Dell Oasis 13 CS MTL-U", ALC289_FIXUP_DELL_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1028, 0x0cbe, "Dell Oasis 13 2-IN-1 MTL-U", ALC289_FIXUP_DELL_CS35L41_SPI_2),
@@ -9900,6 +9929,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x8973, "HP EliteBook 860 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8974, "HP EliteBook 840 Aero G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8975, "HP EliteBook x360 840 Aero G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x897d, "HP mt440 Mobile Thin Client U74", ALC236_FIXUP_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8981, "HP Elite Dragonfly G3", ALC245_FIXUP_CS35L41_SPI_4),
SND_PCI_QUIRK(0x103c, 0x898e, "HP EliteBook 835 G9", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x103c, 0x898f, "HP EliteBook 835 G9", ALC287_FIXUP_CS35L41_I2C_2),
@@ -9925,16 +9955,20 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x8aa3, "HP ProBook 450 G9 (MB 8AA1)", ALC236_FIXUP_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8aa8, "HP EliteBook 640 G9 (MB 8AA6)", ALC236_FIXUP_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8aab, "HP EliteBook 650 G9 (MB 8AA9)", ALC236_FIXUP_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x8ab9, "HP EliteBook 840 G8 (MB 8AB8)", ALC285_FIXUP_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8abb, "HP ZBook Firefly 14 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8ad1, "HP EliteBook 840 14 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8ad2, "HP EliteBook 860 16 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x8b0f, "HP Elite mt645 G7 Mobile Thin Client U81", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x8b2f, "HP 255 15.6 inch G10 Notebook PC", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2),
+ SND_PCI_QUIRK(0x103c, 0x8b3f, "HP mt440 Mobile Thin Client U91", ALC236_FIXUP_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8b42, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8b43, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8b44, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8b45, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8b46, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8b47, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x8b59, "HP Elite mt645 G7 Mobile Thin Client U89", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x8b5d, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x8b5e, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x8b63, "HP Elite Dragonfly 13.5 inch G4", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED),
@@ -9962,6 +9996,10 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x103c, 0x8c70, "HP EliteBook 835 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8c71, "HP EliteBook 845 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8c72, "HP EliteBook 865 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x8c8a, "HP EliteBook 630", ALC236_FIXUP_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x8c8c, "HP EliteBook 660", ALC236_FIXUP_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x8c90, "HP EliteBook 640", ALC236_FIXUP_HP_GPIO_LED),
+ SND_PCI_QUIRK(0x103c, 0x8c91, "HP EliteBook 660", ALC236_FIXUP_HP_GPIO_LED),
SND_PCI_QUIRK(0x103c, 0x8c96, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x8c97, "HP ZBook", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
SND_PCI_QUIRK(0x103c, 0x8ca1, "HP ZBook Power", ALC236_FIXUP_HP_GPIO_LED),
@@ -10003,6 +10041,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1043, 0x1662, "ASUS GV301QH", ALC294_FIXUP_ASUS_DUAL_SPK),
SND_PCI_QUIRK(0x1043, 0x1663, "ASUS GU603ZI/ZJ/ZQ/ZU/ZV", ALC285_FIXUP_ASUS_HEADSET_MIC),
SND_PCI_QUIRK(0x1043, 0x1683, "ASUS UM3402YAR", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x1043, 0x16a3, "ASUS UX3402VA", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x16b2, "ASUS GU603", ALC289_FIXUP_ASUS_GA401),
SND_PCI_QUIRK(0x1043, 0x16d3, "ASUS UX5304VA", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x16e3, "ASUS UX50", ALC269_FIXUP_STEREO_DMIC),
@@ -10046,14 +10085,12 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x1043, 0x1d4e, "ASUS TM420", ALC256_FIXUP_ASUS_HPE),
SND_PCI_QUIRK(0x1043, 0x1da2, "ASUS UP6502ZA/ZD", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x1e02, "ASUS UX3402ZA", ALC245_FIXUP_CS35L41_SPI_2),
- SND_PCI_QUIRK(0x1043, 0x16a3, "ASUS UX3402VA", ALC245_FIXUP_CS35L41_SPI_2),
- SND_PCI_QUIRK(0x1043, 0x1f62, "ASUS UX7602ZM", ALC245_FIXUP_CS35L41_SPI_2),
SND_PCI_QUIRK(0x1043, 0x1e11, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA502),
- SND_PCI_QUIRK(0x1043, 0x1e12, "ASUS UM6702RA/RC", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x1043, 0x1e12, "ASUS UM3402", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x1e51, "ASUS Zephyrus M15", ALC294_FIXUP_ASUS_GU502_PINS),
SND_PCI_QUIRK(0x1043, 0x1e5e, "ASUS ROG Strix G513", ALC294_FIXUP_ASUS_G513_PINS),
SND_PCI_QUIRK(0x1043, 0x1e8e, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA401),
- SND_PCI_QUIRK(0x1043, 0x1ee2, "ASUS UM3402", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x1043, 0x1ee2, "ASUS UM6702RA/RC", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x1043, 0x1c52, "ASUS Zephyrus G15 2022", ALC289_FIXUP_ASUS_GA401),
SND_PCI_QUIRK(0x1043, 0x1f11, "ASUS Zephyrus G14", ALC289_FIXUP_ASUS_GA401),
SND_PCI_QUIRK(0x1043, 0x1f12, "ASUS UM5302", ALC287_FIXUP_CS35L41_I2C_2),
@@ -10244,7 +10281,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x17aa, 0x31af, "ThinkCentre Station", ALC623_FIXUP_LENOVO_THINKSTATION_P340),
SND_PCI_QUIRK(0x17aa, 0x334b, "Lenovo ThinkCentre M70 Gen5", ALC283_FIXUP_HEADSET_MIC),
SND_PCI_QUIRK(0x17aa, 0x3801, "Lenovo Yoga9 14IAP7", ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN),
- SND_PCI_QUIRK(0x17aa, 0x3802, "Lenovo Yoga DuetITL 2021", ALC287_FIXUP_YOGA7_14ITL_SPEAKERS),
+ SND_PCI_QUIRK(0x17aa, 0x3802, "Lenovo Yoga Pro 9 14IRP8 / DuetITL 2021", ALC287_FIXUP_LENOVO_14IRP8_DUETITL),
SND_PCI_QUIRK(0x17aa, 0x3813, "Legion 7i 15IMHG05", ALC287_FIXUP_LEGION_15IMHG05_SPEAKERS),
SND_PCI_QUIRK(0x17aa, 0x3818, "Lenovo C940 / Yoga Duet 7", ALC298_FIXUP_LENOVO_C940_DUET7),
SND_PCI_QUIRK(0x17aa, 0x3819, "Lenovo 13s Gen2 ITL", ALC287_FIXUP_13S_GEN2_SPEAKERS),
@@ -10260,6 +10297,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x17aa, 0x3853, "Lenovo Yoga 7 15ITL5", ALC287_FIXUP_YOGA7_14ITL_SPEAKERS),
SND_PCI_QUIRK(0x17aa, 0x3855, "Legion 7 16ITHG6", ALC287_FIXUP_LEGION_16ITHG6),
SND_PCI_QUIRK(0x17aa, 0x3869, "Lenovo Yoga7 14IAL7", ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN),
+ SND_PCI_QUIRK(0x17aa, 0x386f, "Legion 7i 16IAX7", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x17aa, 0x3870, "Lenovo Yoga 7 14ARB7", ALC287_FIXUP_YOGA7_14ARB7_I2C),
SND_PCI_QUIRK(0x17aa, 0x387d, "Yoga S780-16 pro Quad AAC", ALC287_FIXUP_TAS2781_I2C),
SND_PCI_QUIRK(0x17aa, 0x387e, "Yoga S780-16 pro Quad YC", ALC287_FIXUP_TAS2781_I2C),
@@ -10269,6 +10307,8 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x17aa, 0x3886, "Y780 VECO DUAL", ALC287_FIXUP_TAS2781_I2C),
SND_PCI_QUIRK(0x17aa, 0x38a7, "Y780P AMD YG dual", ALC287_FIXUP_TAS2781_I2C),
SND_PCI_QUIRK(0x17aa, 0x38a8, "Y780P AMD VECO dual", ALC287_FIXUP_TAS2781_I2C),
+ SND_PCI_QUIRK(0x17aa, 0x38a9, "Thinkbook 16P", ALC287_FIXUP_CS35L41_I2C_2),
+ SND_PCI_QUIRK(0x17aa, 0x38ab, "Thinkbook 16P", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x17aa, 0x38b4, "Legion Slim 7 16IRH8", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x17aa, 0x38b5, "Legion Slim 7 16IRH8", ALC287_FIXUP_CS35L41_I2C_2),
SND_PCI_QUIRK(0x17aa, 0x38b6, "Legion Slim 7 16APH8", ALC287_FIXUP_CS35L41_I2C_2),
@@ -10961,6 +11001,8 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
* at most one tbl is allowed to define for the same vendor and same codec
*/
static const struct snd_hda_pin_quirk alc269_fallback_pin_fixup_tbl[] = {
+ SND_HDA_PIN_QUIRK(0x10ec0256, 0x1025, "Acer", ALC2XX_FIXUP_HEADSET_MIC,
+ {0x19, 0x40000000}),
SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x19, 0x40000000},
{0x1b, 0x40000000}),
@@ -11650,8 +11692,7 @@ static void alc897_hp_automute_hook(struct hda_codec *codec,
snd_hda_gen_hp_automute(codec, jack);
vref = spec->gen.hp_jack_present ? (PIN_HP | AC_PINCTL_VREF_100) : PIN_HP;
- snd_hda_codec_write(codec, 0x1b, 0, AC_VERB_SET_PIN_WIDGET_CONTROL,
- vref);
+ snd_hda_set_pin_ctl(codec, 0x1b, vref);
}
static void alc897_fixup_lenovo_headset_mic(struct hda_codec *codec,
@@ -11660,6 +11701,10 @@ static void alc897_fixup_lenovo_headset_mic(struct hda_codec *codec,
struct alc_spec *spec = codec->spec;
if (action == HDA_FIXUP_ACT_PRE_PROBE) {
spec->gen.hp_automute_hook = alc897_hp_automute_hook;
+ spec->no_shutup_pins = 1;
+ }
+ if (action == HDA_FIXUP_ACT_PROBE) {
+ snd_hda_set_pin_ctl_cache(codec, 0x1a, PIN_IN | AC_PINCTL_VREF_100);
}
}
diff --git a/sound/pci/hda/tas2781_hda_i2c.c b/sound/pci/hda/tas2781_hda_i2c.c
index 2dd809d..1bfb001 100644
--- a/sound/pci/hda/tas2781_hda_i2c.c
+++ b/sound/pci/hda/tas2781_hda_i2c.c
@@ -710,7 +710,7 @@ static int tas2781_hda_bind(struct device *dev, struct device *master,
strscpy(comps->name, dev_name(dev), sizeof(comps->name));
- ret = tascodec_init(tas_hda->priv, codec, tasdev_fw_ready);
+ ret = tascodec_init(tas_hda->priv, codec, THIS_MODULE, tasdev_fw_ready);
if (!ret)
comps->playback_hook = tas2781_hda_playback_hook;
diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c
index 23d44a5..90360f8 100644
--- a/sound/soc/amd/yc/acp6x-mach.c
+++ b/sound/soc/amd/yc/acp6x-mach.c
@@ -203,6 +203,20 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
.driver_data = &acp6x_card,
.matches = {
DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "21J2"),
+ }
+ },
+ {
+ .driver_data = &acp6x_card,
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "21J0"),
+ }
+ },
+ {
+ .driver_data = &acp6x_card,
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
DMI_MATCH(DMI_PRODUCT_NAME, "21J5"),
}
},
@@ -238,6 +252,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
.driver_data = &acp6x_card,
.matches = {
DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "82UU"),
+ }
+ },
+ {
+ .driver_data = &acp6x_card,
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
DMI_MATCH(DMI_PRODUCT_NAME, "82V2"),
}
},
@@ -251,6 +272,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
{
.driver_data = &acp6x_card,
.matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "83AS"),
+ }
+ },
+ {
+ .driver_data = &acp6x_card,
+ .matches = {
DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK COMPUTER INC."),
DMI_MATCH(DMI_PRODUCT_NAME, "UM5302TA"),
}
@@ -391,6 +419,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
{
.driver_data = &acp6x_card,
.matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "HP"),
+ DMI_MATCH(DMI_BOARD_NAME, "8BD6"),
+ }
+ },
+ {
+ .driver_data = &acp6x_card,
+ .matches = {
DMI_MATCH(DMI_BOARD_VENDOR, "MECHREVO"),
DMI_MATCH(DMI_BOARD_NAME, "MRID6"),
}
diff --git a/sound/soc/amd/yc/pci-acp6x.c b/sound/soc/amd/yc/pci-acp6x.c
index 7af6a34..694b8e31 100644
--- a/sound/soc/amd/yc/pci-acp6x.c
+++ b/sound/soc/amd/yc/pci-acp6x.c
@@ -162,6 +162,7 @@ static int snd_acp6x_probe(struct pci_dev *pci,
/* Yellow Carp device check */
switch (pci->revision) {
case 0x60:
+ case 0x63:
case 0x6f:
break;
default:
diff --git a/sound/soc/codecs/cs35l45.c b/sound/soc/codecs/cs35l45.c
index 44c2217..2392c6e 100644
--- a/sound/soc/codecs/cs35l45.c
+++ b/sound/soc/codecs/cs35l45.c
@@ -184,7 +184,7 @@ static int cs35l45_activate_ctl(struct snd_soc_component *component,
else
snprintf(name, SNDRV_CTL_ELEM_ID_NAME_MAXLEN, "%s", ctl_name);
- kcontrol = snd_soc_card_get_kcontrol(component->card, name);
+ kcontrol = snd_soc_card_get_kcontrol_locked(component->card, name);
if (!kcontrol) {
dev_err(component->dev, "Can't find kcontrol %s\n", name);
return -EINVAL;
diff --git a/sound/soc/codecs/cs35l56-shared.c b/sound/soc/codecs/cs35l56-shared.c
index 02fba4b..cb4e831 100644
--- a/sound/soc/codecs/cs35l56-shared.c
+++ b/sound/soc/codecs/cs35l56-shared.c
@@ -51,7 +51,6 @@ static const struct reg_default cs35l56_reg_defaults[] = {
{ CS35L56_SWIRE_DP3_CH2_INPUT, 0x00000019 },
{ CS35L56_SWIRE_DP3_CH3_INPUT, 0x00000029 },
{ CS35L56_SWIRE_DP3_CH4_INPUT, 0x00000028 },
- { CS35L56_IRQ1_CFG, 0x00000000 },
{ CS35L56_IRQ1_MASK_1, 0x83ffffff },
{ CS35L56_IRQ1_MASK_2, 0xffff7fff },
{ CS35L56_IRQ1_MASK_4, 0xe0ffffff },
@@ -336,6 +335,7 @@ void cs35l56_wait_min_reset_pulse(void)
EXPORT_SYMBOL_NS_GPL(cs35l56_wait_min_reset_pulse, SND_SOC_CS35L56_SHARED);
static const struct reg_sequence cs35l56_system_reset_seq[] = {
+ REG_SEQ0(CS35L56_DSP1_HALO_STATE, 0),
REG_SEQ0(CS35L56_DSP_VIRTUAL1_MBOX_1, CS35L56_MBOX_CMD_SYSTEM_RESET),
};
diff --git a/sound/soc/codecs/cs35l56.c b/sound/soc/codecs/cs35l56.c
index c23e29d..6dd0319 100644
--- a/sound/soc/codecs/cs35l56.c
+++ b/sound/soc/codecs/cs35l56.c
@@ -5,6 +5,7 @@
// Copyright (C) 2023 Cirrus Logic, Inc. and
// Cirrus Logic International Semiconductor Ltd.
+#include <linux/acpi.h>
#include <linux/completion.h>
#include <linux/debugfs.h>
#include <linux/delay.h>
@@ -15,6 +16,7 @@
#include <linux/module.h>
#include <linux/pm.h>
#include <linux/pm_runtime.h>
+#include <linux/property.h>
#include <linux/regmap.h>
#include <linux/regulator/consumer.h>
#include <linux/slab.h>
@@ -68,6 +70,66 @@ static const char * const cs35l56_asp1_mux_control_names[] = {
"ASP1 TX1 Source", "ASP1 TX2 Source", "ASP1 TX3 Source", "ASP1 TX4 Source"
};
+static int cs35l56_sync_asp1_mixer_widgets_with_firmware(struct cs35l56_private *cs35l56)
+{
+ struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(cs35l56->component);
+ const char *prefix = cs35l56->component->name_prefix;
+ char full_name[SNDRV_CTL_ELEM_ID_NAME_MAXLEN];
+ const char *name;
+ struct snd_kcontrol *kcontrol;
+ struct soc_enum *e;
+ unsigned int val[4];
+ int i, item, ret;
+
+ if (cs35l56->asp1_mixer_widgets_initialized)
+ return 0;
+
+ /*
+ * Resume so we can read the registers from silicon if the regmap
+ * cache has not yet been populated.
+ */
+ ret = pm_runtime_resume_and_get(cs35l56->base.dev);
+ if (ret < 0)
+ return ret;
+
+ /* Wait for firmware download and reboot */
+ cs35l56_wait_dsp_ready(cs35l56);
+
+ ret = regmap_bulk_read(cs35l56->base.regmap, CS35L56_ASP1TX1_INPUT,
+ val, ARRAY_SIZE(val));
+
+ pm_runtime_mark_last_busy(cs35l56->base.dev);
+ pm_runtime_put_autosuspend(cs35l56->base.dev);
+
+ if (ret) {
+ dev_err(cs35l56->base.dev, "Failed to read ASP1 mixer regs: %d\n", ret);
+ return ret;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(cs35l56_asp1_mux_control_names); ++i) {
+ name = cs35l56_asp1_mux_control_names[i];
+
+ if (prefix) {
+ snprintf(full_name, sizeof(full_name), "%s %s", prefix, name);
+ name = full_name;
+ }
+
+ kcontrol = snd_soc_card_get_kcontrol_locked(dapm->card, name);
+ if (!kcontrol) {
+ dev_warn(cs35l56->base.dev, "Could not find control %s\n", name);
+ continue;
+ }
+
+ e = (struct soc_enum *)kcontrol->private_value;
+ item = snd_soc_enum_val_to_item(e, val[i] & CS35L56_ASP_TXn_SRC_MASK);
+ snd_soc_dapm_mux_update_power(dapm, kcontrol, item, e, NULL);
+ }
+
+ cs35l56->asp1_mixer_widgets_initialized = true;
+
+ return 0;
+}
+
static int cs35l56_dspwait_asp1tx_get(struct snd_kcontrol *kcontrol,
struct snd_ctl_elem_value *ucontrol)
{
@@ -78,9 +140,9 @@ static int cs35l56_dspwait_asp1tx_get(struct snd_kcontrol *kcontrol,
unsigned int addr, val;
int ret;
- /* Wait for mux to be initialized */
- cs35l56_wait_dsp_ready(cs35l56);
- flush_work(&cs35l56->mux_init_work);
+ ret = cs35l56_sync_asp1_mixer_widgets_with_firmware(cs35l56);
+ if (ret)
+ return ret;
addr = cs35l56_asp1_mixer_regs[index];
ret = regmap_read(cs35l56->base.regmap, addr, &val);
@@ -106,16 +168,16 @@ static int cs35l56_dspwait_asp1tx_put(struct snd_kcontrol *kcontrol,
bool changed;
int ret;
- /* Wait for mux to be initialized */
- cs35l56_wait_dsp_ready(cs35l56);
- flush_work(&cs35l56->mux_init_work);
+ ret = cs35l56_sync_asp1_mixer_widgets_with_firmware(cs35l56);
+ if (ret)
+ return ret;
addr = cs35l56_asp1_mixer_regs[index];
val = snd_soc_enum_item_to_val(e, item);
ret = regmap_update_bits_check(cs35l56->base.regmap, addr,
CS35L56_ASP_TXn_SRC_MASK, val, &changed);
- if (!ret)
+ if (ret)
return ret;
if (changed)
@@ -124,70 +186,6 @@ static int cs35l56_dspwait_asp1tx_put(struct snd_kcontrol *kcontrol,
return changed;
}
-static void cs35l56_mark_asp1_mixer_widgets_dirty(struct cs35l56_private *cs35l56)
-{
- struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(cs35l56->component);
- const char *prefix = cs35l56->component->name_prefix;
- char full_name[SNDRV_CTL_ELEM_ID_NAME_MAXLEN];
- const char *name;
- struct snd_kcontrol *kcontrol;
- struct soc_enum *e;
- unsigned int val[4];
- int i, item, ret;
-
- /*
- * Resume so we can read the registers from silicon if the regmap
- * cache has not yet been populated.
- */
- ret = pm_runtime_resume_and_get(cs35l56->base.dev);
- if (ret < 0)
- return;
-
- ret = regmap_bulk_read(cs35l56->base.regmap, CS35L56_ASP1TX1_INPUT,
- val, ARRAY_SIZE(val));
-
- pm_runtime_mark_last_busy(cs35l56->base.dev);
- pm_runtime_put_autosuspend(cs35l56->base.dev);
-
- if (ret) {
- dev_err(cs35l56->base.dev, "Failed to read ASP1 mixer regs: %d\n", ret);
- return;
- }
-
- snd_soc_card_mutex_lock(dapm->card);
- WARN_ON(!dapm->card->instantiated);
-
- for (i = 0; i < ARRAY_SIZE(cs35l56_asp1_mux_control_names); ++i) {
- name = cs35l56_asp1_mux_control_names[i];
-
- if (prefix) {
- snprintf(full_name, sizeof(full_name), "%s %s", prefix, name);
- name = full_name;
- }
-
- kcontrol = snd_soc_card_get_kcontrol(dapm->card, name);
- if (!kcontrol) {
- dev_warn(cs35l56->base.dev, "Could not find control %s\n", name);
- continue;
- }
-
- e = (struct soc_enum *)kcontrol->private_value;
- item = snd_soc_enum_val_to_item(e, val[i] & CS35L56_ASP_TXn_SRC_MASK);
- snd_soc_dapm_mux_update_power(dapm, kcontrol, item, e, NULL);
- }
-
- snd_soc_card_mutex_unlock(dapm->card);
-}
-
-static void cs35l56_mux_init_work(struct work_struct *work)
-{
- struct cs35l56_private *cs35l56 = container_of(work,
- struct cs35l56_private,
- mux_init_work);
-
- cs35l56_mark_asp1_mixer_widgets_dirty(cs35l56);
-}
-
static DECLARE_TLV_DB_SCALE(vol_tlv, -10000, 25, 0);
static const struct snd_kcontrol_new cs35l56_controls[] = {
@@ -936,14 +934,6 @@ static void cs35l56_dsp_work(struct work_struct *work)
else
cs35l56_patch(cs35l56, firmware_missing);
-
- /*
- * Set starting value of ASP1 mux widgets. Updating a mux takes
- * the DAPM mutex. Post this to a separate job so that DAPM
- * power-up can wait for dsp_work to complete without deadlocking
- * on the DAPM mutex.
- */
- queue_work(cs35l56->dsp_wq, &cs35l56->mux_init_work);
err:
pm_runtime_mark_last_busy(cs35l56->base.dev);
pm_runtime_put_autosuspend(cs35l56->base.dev);
@@ -989,6 +979,13 @@ static int cs35l56_component_probe(struct snd_soc_component *component)
debugfs_create_bool("can_hibernate", 0444, debugfs_root, &cs35l56->base.can_hibernate);
debugfs_create_bool("fw_patched", 0444, debugfs_root, &cs35l56->base.fw_patched);
+ /*
+ * The widgets for the ASP1TX mixer can't be initialized
+ * until the firmware has been downloaded and rebooted.
+ */
+ regcache_drop_region(cs35l56->base.regmap, CS35L56_ASP1TX1_INPUT, CS35L56_ASP1TX4_INPUT);
+ cs35l56->asp1_mixer_widgets_initialized = false;
+
queue_work(cs35l56->dsp_wq, &cs35l56->dsp_work);
return 0;
@@ -999,7 +996,6 @@ static void cs35l56_component_remove(struct snd_soc_component *component)
struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component);
cancel_work_sync(&cs35l56->dsp_work);
- cancel_work_sync(&cs35l56->mux_init_work);
if (cs35l56->dsp.cs_dsp.booted)
wm_adsp_power_down(&cs35l56->dsp);
@@ -1070,10 +1066,8 @@ int cs35l56_system_suspend(struct device *dev)
dev_dbg(dev, "system_suspend\n");
- if (cs35l56->component) {
+ if (cs35l56->component)
flush_work(&cs35l56->dsp_work);
- cancel_work_sync(&cs35l56->mux_init_work);
- }
/*
* The interrupt line is normally shared, but after we start suspending
@@ -1224,7 +1218,6 @@ static int cs35l56_dsp_init(struct cs35l56_private *cs35l56)
return -ENOMEM;
INIT_WORK(&cs35l56->dsp_work, cs35l56_dsp_work);
- INIT_WORK(&cs35l56->mux_init_work, cs35l56_mux_init_work);
dsp = &cs35l56->dsp;
cs35l56_init_cs_dsp(&cs35l56->base, &dsp->cs_dsp);
@@ -1269,6 +1262,94 @@ static int cs35l56_get_firmware_uid(struct cs35l56_private *cs35l56)
return 0;
}
+/*
+ * Some SoundWire laptops have a spk-id-gpios property but it points to
+ * the wrong ACPI Device node so can't be used to get the GPIO. Try to
+ * find the SDCA node containing the GpioIo resource and add a GPIO
+ * mapping to it.
+ */
+static const struct acpi_gpio_params cs35l56_af01_first_gpio = { 0, 0, false };
+static const struct acpi_gpio_mapping cs35l56_af01_spkid_gpios_mapping[] = {
+ { "spk-id-gpios", &cs35l56_af01_first_gpio, 1 },
+ { }
+};
+
+static void cs35l56_acpi_dev_release_driver_gpios(void *adev)
+{
+ acpi_dev_remove_driver_gpios(adev);
+}
+
+static int cs35l56_try_get_broken_sdca_spkid_gpio(struct cs35l56_private *cs35l56)
+{
+ struct fwnode_handle *af01_fwnode;
+ const union acpi_object *obj;
+ struct gpio_desc *desc;
+ int ret;
+
+ /* Find the SDCA node containing the GpioIo */
+ af01_fwnode = device_get_named_child_node(cs35l56->base.dev, "AF01");
+ if (!af01_fwnode) {
+ dev_dbg(cs35l56->base.dev, "No AF01 node\n");
+ return -ENOENT;
+ }
+
+ ret = acpi_dev_get_property(ACPI_COMPANION(cs35l56->base.dev),
+ "spk-id-gpios", ACPI_TYPE_PACKAGE, &obj);
+ if (ret) {
+ dev_dbg(cs35l56->base.dev, "Could not get spk-id-gpios package: %d\n", ret);
+ return -ENOENT;
+ }
+
+ /* The broken properties we can handle are a 4-element package (one GPIO) */
+ if (obj->package.count != 4) {
+ dev_warn(cs35l56->base.dev, "Unexpected spk-id element count %d\n",
+ obj->package.count);
+ return -ENOENT;
+ }
+
+ /* Add a GPIO mapping if it doesn't already have one */
+ if (!fwnode_property_present(af01_fwnode, "spk-id-gpios")) {
+ struct acpi_device *adev = to_acpi_device_node(af01_fwnode);
+
+ /*
+ * Can't use devm_acpi_dev_add_driver_gpios() because the
+ * mapping isn't being added to the node pointed to by
+ * ACPI_COMPANION().
+ */
+ ret = acpi_dev_add_driver_gpios(adev, cs35l56_af01_spkid_gpios_mapping);
+ if (ret) {
+ return dev_err_probe(cs35l56->base.dev, ret,
+ "Failed to add gpio mapping to AF01\n");
+ }
+
+ ret = devm_add_action_or_reset(cs35l56->base.dev,
+ cs35l56_acpi_dev_release_driver_gpios,
+ adev);
+ if (ret)
+ return ret;
+
+ dev_dbg(cs35l56->base.dev, "Added spk-id-gpios mapping to AF01\n");
+ }
+
+ desc = fwnode_gpiod_get_index(af01_fwnode, "spk-id", 0, GPIOD_IN, NULL);
+ if (IS_ERR(desc)) {
+ ret = PTR_ERR(desc);
+ return dev_err_probe(cs35l56->base.dev, ret, "Get GPIO from AF01 failed\n");
+ }
+
+ ret = gpiod_get_value_cansleep(desc);
+ gpiod_put(desc);
+
+ if (ret < 0) {
+ dev_err_probe(cs35l56->base.dev, ret, "Error reading spk-id GPIO\n");
+ return ret;
+ }
+
+ dev_info(cs35l56->base.dev, "Got spk-id from AF01\n");
+
+ return ret;
+}
+
int cs35l56_common_probe(struct cs35l56_private *cs35l56)
{
int ret;
@@ -1313,6 +1394,9 @@ int cs35l56_common_probe(struct cs35l56_private *cs35l56)
}
ret = cs35l56_get_speaker_id(&cs35l56->base);
+ if (ACPI_COMPANION(cs35l56->base.dev) && cs35l56->sdw_peripheral && (ret == -ENOENT))
+ ret = cs35l56_try_get_broken_sdca_spkid_gpio(cs35l56);
+
if ((ret < 0) && (ret != -ENOENT))
goto err;
diff --git a/sound/soc/codecs/cs35l56.h b/sound/soc/codecs/cs35l56.h
index 596b141..b000e73 100644
--- a/sound/soc/codecs/cs35l56.h
+++ b/sound/soc/codecs/cs35l56.h
@@ -34,7 +34,6 @@ struct cs35l56_private {
struct wm_adsp dsp; /* must be first member */
struct cs35l56_base base;
struct work_struct dsp_work;
- struct work_struct mux_init_work;
struct workqueue_struct *dsp_wq;
struct snd_soc_component *component;
struct regulator_bulk_data supplies[CS35L56_NUM_BULK_SUPPLIES];
@@ -52,6 +51,7 @@ struct cs35l56_private {
u8 asp_slot_count;
bool tdm_mode;
bool sysclk_set;
+ bool asp1_mixer_widgets_initialized;
u8 old_sdw_clock_scale;
};
diff --git a/sound/soc/codecs/cs42l43.c b/sound/soc/codecs/cs42l43.c
index 6a64681..a97ccb5 100644
--- a/sound/soc/codecs/cs42l43.c
+++ b/sound/soc/codecs/cs42l43.c
@@ -2257,7 +2257,10 @@ static int cs42l43_codec_probe(struct platform_device *pdev)
pm_runtime_use_autosuspend(priv->dev);
pm_runtime_set_active(priv->dev);
pm_runtime_get_noresume(priv->dev);
- devm_pm_runtime_enable(priv->dev);
+
+ ret = devm_pm_runtime_enable(priv->dev);
+ if (ret)
+ goto err_pm;
for (i = 0; i < ARRAY_SIZE(cs42l43_irqs); i++) {
ret = cs42l43_request_irq(priv, dom, cs42l43_irqs[i].name,
@@ -2333,8 +2336,47 @@ static int cs42l43_codec_runtime_resume(struct device *dev)
return 0;
}
-static DEFINE_RUNTIME_DEV_PM_OPS(cs42l43_codec_pm_ops, NULL,
- cs42l43_codec_runtime_resume, NULL);
+static int cs42l43_codec_suspend(struct device *dev)
+{
+ struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+
+ disable_irq(cs42l43->irq);
+
+ return 0;
+}
+
+static int cs42l43_codec_suspend_noirq(struct device *dev)
+{
+ struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+
+ enable_irq(cs42l43->irq);
+
+ return 0;
+}
+
+static int cs42l43_codec_resume(struct device *dev)
+{
+ struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+
+ enable_irq(cs42l43->irq);
+
+ return 0;
+}
+
+static int cs42l43_codec_resume_noirq(struct device *dev)
+{
+ struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+
+ disable_irq(cs42l43->irq);
+
+ return 0;
+}
+
+static const struct dev_pm_ops cs42l43_codec_pm_ops = {
+ SYSTEM_SLEEP_PM_OPS(cs42l43_codec_suspend, cs42l43_codec_resume)
+ NOIRQ_SYSTEM_SLEEP_PM_OPS(cs42l43_codec_suspend_noirq, cs42l43_codec_resume_noirq)
+ RUNTIME_PM_OPS(NULL, cs42l43_codec_runtime_resume, NULL)
+};
static const struct platform_device_id cs42l43_codec_id_table[] = {
{ "cs42l43-codec", },
diff --git a/sound/soc/codecs/madera.c b/sound/soc/codecs/madera.c
index b9f19fb..b24d647 100644
--- a/sound/soc/codecs/madera.c
+++ b/sound/soc/codecs/madera.c
@@ -3884,7 +3884,7 @@ static inline int madera_set_fll_clks(struct madera_fll *fll, int base, bool ena
return madera_set_fll_clks_reg(fll, ena,
base + MADERA_FLL_CONTROL_6_OFFS,
MADERA_FLL1_REFCLK_SRC_MASK,
- MADERA_FLL1_REFCLK_DIV_SHIFT);
+ MADERA_FLL1_REFCLK_SRC_SHIFT);
}
static inline int madera_set_fllao_clks(struct madera_fll *fll, int base, bool ena)
diff --git a/sound/soc/codecs/rt5645.c b/sound/soc/codecs/rt5645.c
index 5150d6e..20191a4 100644
--- a/sound/soc/codecs/rt5645.c
+++ b/sound/soc/codecs/rt5645.c
@@ -3317,6 +3317,7 @@ static void rt5645_jack_detect_work(struct work_struct *work)
report, SND_JACK_HEADPHONE);
snd_soc_jack_report(rt5645->mic_jack,
report, SND_JACK_MICROPHONE);
+ mutex_unlock(&rt5645->jd_mutex);
return;
case 4:
val = snd_soc_component_read(rt5645->component, RT5645_A_JD_CTRL1) & 0x0020;
@@ -3692,6 +3693,11 @@ static const struct rt5645_platform_data jd_mode3_monospk_platform_data = {
.mono_speaker = true,
};
+static const struct rt5645_platform_data jd_mode3_inv_data = {
+ .jd_mode = 3,
+ .inv_jd1_1 = true,
+};
+
static const struct rt5645_platform_data jd_mode3_platform_data = {
.jd_mode = 3,
};
@@ -3837,6 +3843,16 @@ static const struct dmi_system_id dmi_platform_data[] = {
DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"),
DMI_EXACT_MATCH(DMI_BOARD_NAME, "Cherry Trail CR"),
DMI_EXACT_MATCH(DMI_BOARD_VERSION, "Default string"),
+ /*
+ * Above strings are too generic, LattePanda BIOS versions for
+ * all 4 hw revisions are:
+ * DF-BI-7-S70CR100-*
+ * DF-BI-7-S70CR110-*
+ * DF-BI-7-S70CR200-*
+ * LP-BS-7-S70CR700-*
+ * Do a partial match for S70CR to avoid false positive matches.
+ */
+ DMI_MATCH(DMI_BIOS_VERSION, "S70CR"),
},
.driver_data = (void *)&lattepanda_board_platform_data,
},
@@ -3871,6 +3887,16 @@ static const struct dmi_system_id dmi_platform_data[] = {
},
.driver_data = (void *)&intel_braswell_platform_data,
},
+ {
+ .ident = "Meegopad T08",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Default string"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Default string"),
+ DMI_MATCH(DMI_BOARD_NAME, "T3 MRD"),
+ DMI_MATCH(DMI_BOARD_VERSION, "V1.1"),
+ },
+ .driver_data = (void *)&jd_mode3_inv_data,
+ },
{ }
};
diff --git a/sound/soc/codecs/tas2781-comlib.c b/sound/soc/codecs/tas2781-comlib.c
index b7e56ce..5d0e534 100644
--- a/sound/soc/codecs/tas2781-comlib.c
+++ b/sound/soc/codecs/tas2781-comlib.c
@@ -267,6 +267,7 @@ void tas2781_reset(struct tasdevice_priv *tas_dev)
EXPORT_SYMBOL_GPL(tas2781_reset);
int tascodec_init(struct tasdevice_priv *tas_priv, void *codec,
+ struct module *module,
void (*cont)(const struct firmware *fw, void *context))
{
int ret = 0;
@@ -280,7 +281,7 @@ int tascodec_init(struct tasdevice_priv *tas_priv, void *codec,
tas_priv->dev_name, tas_priv->ndev);
crc8_populate_msb(tas_priv->crc8_lkp_tbl, TASDEVICE_CRC8_POLYNOMIAL);
tas_priv->codec = codec;
- ret = request_firmware_nowait(THIS_MODULE, FW_ACTION_UEVENT,
+ ret = request_firmware_nowait(module, FW_ACTION_UEVENT,
tas_priv->rca_binaryname, tas_priv->dev, GFP_KERNEL, tas_priv,
cont);
if (ret)
diff --git a/sound/soc/codecs/tas2781-i2c.c b/sound/soc/codecs/tas2781-i2c.c
index 32913bd..b5abff2 100644
--- a/sound/soc/codecs/tas2781-i2c.c
+++ b/sound/soc/codecs/tas2781-i2c.c
@@ -566,7 +566,7 @@ static int tasdevice_codec_probe(struct snd_soc_component *codec)
{
struct tasdevice_priv *tas_priv = snd_soc_component_get_drvdata(codec);
- return tascodec_init(tas_priv, codec, tasdevice_fw_ready);
+ return tascodec_init(tas_priv, codec, THIS_MODULE, tasdevice_fw_ready);
}
static void tasdevice_deinit(void *context)
diff --git a/sound/soc/codecs/wm8962.c b/sound/soc/codecs/wm8962.c
index fb90ae6..7c6ed298 100644
--- a/sound/soc/codecs/wm8962.c
+++ b/sound/soc/codecs/wm8962.c
@@ -2229,6 +2229,9 @@ SND_SOC_DAPM_PGA_E("HPOUT", SND_SOC_NOPM, 0, 0, NULL, 0, hp_event,
SND_SOC_DAPM_OUTPUT("HPOUTL"),
SND_SOC_DAPM_OUTPUT("HPOUTR"),
+
+SND_SOC_DAPM_PGA("SPKOUTL Output", WM8962_CLASS_D_CONTROL_1, 6, 0, NULL, 0),
+SND_SOC_DAPM_PGA("SPKOUTR Output", WM8962_CLASS_D_CONTROL_1, 7, 0, NULL, 0),
};
static const struct snd_soc_dapm_widget wm8962_dapm_spk_mono_widgets[] = {
@@ -2236,7 +2239,6 @@ SND_SOC_DAPM_MIXER("Speaker Mixer", WM8962_MIXER_ENABLES, 1, 0,
spkmixl, ARRAY_SIZE(spkmixl)),
SND_SOC_DAPM_MUX_E("Speaker PGA", WM8962_PWR_MGMT_2, 4, 0, &spkoutl_mux,
out_pga_event, SND_SOC_DAPM_POST_PMU),
-SND_SOC_DAPM_PGA("Speaker Output", WM8962_CLASS_D_CONTROL_1, 7, 0, NULL, 0),
SND_SOC_DAPM_OUTPUT("SPKOUT"),
};
@@ -2251,9 +2253,6 @@ SND_SOC_DAPM_MUX_E("SPKOUTL PGA", WM8962_PWR_MGMT_2, 4, 0, &spkoutl_mux,
SND_SOC_DAPM_MUX_E("SPKOUTR PGA", WM8962_PWR_MGMT_2, 3, 0, &spkoutr_mux,
out_pga_event, SND_SOC_DAPM_POST_PMU),
-SND_SOC_DAPM_PGA("SPKOUTR Output", WM8962_CLASS_D_CONTROL_1, 7, 0, NULL, 0),
-SND_SOC_DAPM_PGA("SPKOUTL Output", WM8962_CLASS_D_CONTROL_1, 6, 0, NULL, 0),
-
SND_SOC_DAPM_OUTPUT("SPKOUTL"),
SND_SOC_DAPM_OUTPUT("SPKOUTR"),
};
@@ -2366,12 +2365,18 @@ static const struct snd_soc_dapm_route wm8962_spk_mono_intercon[] = {
{ "Speaker PGA", "Mixer", "Speaker Mixer" },
{ "Speaker PGA", "DAC", "DACL" },
- { "Speaker Output", NULL, "Speaker PGA" },
- { "Speaker Output", NULL, "SYSCLK" },
- { "Speaker Output", NULL, "TOCLK" },
- { "Speaker Output", NULL, "TEMP_SPK" },
+ { "SPKOUTL Output", NULL, "Speaker PGA" },
+ { "SPKOUTL Output", NULL, "SYSCLK" },
+ { "SPKOUTL Output", NULL, "TOCLK" },
+ { "SPKOUTL Output", NULL, "TEMP_SPK" },
- { "SPKOUT", NULL, "Speaker Output" },
+ { "SPKOUTR Output", NULL, "Speaker PGA" },
+ { "SPKOUTR Output", NULL, "SYSCLK" },
+ { "SPKOUTR Output", NULL, "TOCLK" },
+ { "SPKOUTR Output", NULL, "TEMP_SPK" },
+
+ { "SPKOUT", NULL, "SPKOUTL Output" },
+ { "SPKOUT", NULL, "SPKOUTR Output" },
};
static const struct snd_soc_dapm_route wm8962_spk_stereo_intercon[] = {
@@ -2914,8 +2919,12 @@ static int wm8962_set_fll(struct snd_soc_component *component, int fll_id, int s
switch (fll_id) {
case WM8962_FLL_MCLK:
case WM8962_FLL_BCLK:
+ fll1 |= (fll_id - 1) << WM8962_FLL_REFCLK_SRC_SHIFT;
+ break;
case WM8962_FLL_OSC:
fll1 |= (fll_id - 1) << WM8962_FLL_REFCLK_SRC_SHIFT;
+ snd_soc_component_update_bits(component, WM8962_PLL2,
+ WM8962_OSC_ENA, WM8962_OSC_ENA);
break;
case WM8962_FLL_INT:
snd_soc_component_update_bits(component, WM8962_FLL_CONTROL_1,
@@ -2924,7 +2933,7 @@ static int wm8962_set_fll(struct snd_soc_component *component, int fll_id, int s
WM8962_FLL_FRC_NCO, WM8962_FLL_FRC_NCO);
break;
default:
- dev_err(component->dev, "Unknown FLL source %d\n", ret);
+ dev_err(component->dev, "Unknown FLL source %d\n", source);
return -EINVAL;
}
diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index f0fb33d..c46f645 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -174,7 +174,9 @@ static int fsl_xcvr_activate_ctl(struct snd_soc_dai *dai, const char *name,
struct snd_kcontrol *kctl;
bool enabled;
- kctl = snd_soc_card_get_kcontrol(card, name);
+ lockdep_assert_held(&card->snd_card->controls_rwsem);
+
+ kctl = snd_soc_card_get_kcontrol_locked(card, name);
if (kctl == NULL)
return -ENOENT;
@@ -576,10 +578,14 @@ static int fsl_xcvr_startup(struct snd_pcm_substream *substream,
xcvr->streams |= BIT(substream->stream);
if (!xcvr->soc_data->spdif_only) {
+ struct snd_soc_card *card = dai->component->card;
+
/* Disable XCVR controls if there is stream started */
+ down_read(&card->snd_card->controls_rwsem);
fsl_xcvr_activate_ctl(dai, fsl_xcvr_mode_kctl.name, false);
fsl_xcvr_activate_ctl(dai, fsl_xcvr_arc_mode_kctl.name, false);
fsl_xcvr_activate_ctl(dai, fsl_xcvr_earc_capds_kctl.name, false);
+ up_read(&card->snd_card->controls_rwsem);
}
return 0;
@@ -598,11 +604,15 @@ static void fsl_xcvr_shutdown(struct snd_pcm_substream *substream,
/* Enable XCVR controls if there is no stream started */
if (!xcvr->streams) {
if (!xcvr->soc_data->spdif_only) {
+ struct snd_soc_card *card = dai->component->card;
+
+ down_read(&card->snd_card->controls_rwsem);
fsl_xcvr_activate_ctl(dai, fsl_xcvr_mode_kctl.name, true);
fsl_xcvr_activate_ctl(dai, fsl_xcvr_arc_mode_kctl.name,
(xcvr->mode == FSL_XCVR_MODE_ARC));
fsl_xcvr_activate_ctl(dai, fsl_xcvr_earc_capds_kctl.name,
(xcvr->mode == FSL_XCVR_MODE_EARC));
+ up_read(&card->snd_card->controls_rwsem);
}
ret = regmap_update_bits(xcvr->regmap, FSL_XCVR_EXT_IER0,
FSL_XCVR_IRQ_EARC_ALL, 0);
diff --git a/sound/soc/intel/avs/core.c b/sound/soc/intel/avs/core.c
index 59c3793..db78eb2 100644
--- a/sound/soc/intel/avs/core.c
+++ b/sound/soc/intel/avs/core.c
@@ -477,6 +477,9 @@ static int avs_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
return 0;
err_i915_init:
+ pci_free_irq(pci, 0, adev);
+ pci_free_irq(pci, 0, bus);
+ pci_free_irq_vectors(pci);
pci_clear_master(pci);
pci_set_drvdata(pci, NULL);
err_acquire_irq:
diff --git a/sound/soc/intel/avs/topology.c b/sound/soc/intel/avs/topology.c
index 778236d..48b3c67 100644
--- a/sound/soc/intel/avs/topology.c
+++ b/sound/soc/intel/avs/topology.c
@@ -857,7 +857,7 @@ assign_copier_gtw_instance(struct snd_soc_component *comp, struct avs_tplg_modcf
}
/* If topology sets value don't overwrite it */
- if (cfg->copier.vindex.i2s.instance)
+ if (cfg->copier.vindex.val)
return;
mach = dev_get_platdata(comp->card->dev);
diff --git a/sound/soc/intel/boards/bytcht_cx2072x.c b/sound/soc/intel/boards/bytcht_cx2072x.c
index 10a84a2..c014d85 100644
--- a/sound/soc/intel/boards/bytcht_cx2072x.c
+++ b/sound/soc/intel/boards/bytcht_cx2072x.c
@@ -241,7 +241,8 @@ static int snd_byt_cht_cx2072x_probe(struct platform_device *pdev)
/* fix index of codec dai */
for (i = 0; i < ARRAY_SIZE(byt_cht_cx2072x_dais); i++) {
- if (!strcmp(byt_cht_cx2072x_dais[i].codecs->name,
+ if (byt_cht_cx2072x_dais[i].codecs->name &&
+ !strcmp(byt_cht_cx2072x_dais[i].codecs->name,
"i2c-14F10720:00")) {
dai_index = i;
break;
diff --git a/sound/soc/intel/boards/bytcht_da7213.c b/sound/soc/intel/boards/bytcht_da7213.c
index 7e5eea6..f4ac3dd 100644
--- a/sound/soc/intel/boards/bytcht_da7213.c
+++ b/sound/soc/intel/boards/bytcht_da7213.c
@@ -245,7 +245,8 @@ static int bytcht_da7213_probe(struct platform_device *pdev)
/* fix index of codec dai */
for (i = 0; i < ARRAY_SIZE(dailink); i++) {
- if (!strcmp(dailink[i].codecs->name, "i2c-DLGS7213:00")) {
+ if (dailink[i].codecs->name &&
+ !strcmp(dailink[i].codecs->name, "i2c-DLGS7213:00")) {
dai_index = i;
break;
}
diff --git a/sound/soc/intel/boards/bytcht_es8316.c b/sound/soc/intel/boards/bytcht_es8316.c
index 1564a88..2fcec2e 100644
--- a/sound/soc/intel/boards/bytcht_es8316.c
+++ b/sound/soc/intel/boards/bytcht_es8316.c
@@ -546,7 +546,8 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev)
/* fix index of codec dai */
for (i = 0; i < ARRAY_SIZE(byt_cht_es8316_dais); i++) {
- if (!strcmp(byt_cht_es8316_dais[i].codecs->name,
+ if (byt_cht_es8316_dais[i].codecs->name &&
+ !strcmp(byt_cht_es8316_dais[i].codecs->name,
"i2c-ESSX8316:00")) {
dai_index = i;
break;
diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c
index 42466b4..05f38d1 100644
--- a/sound/soc/intel/boards/bytcr_rt5640.c
+++ b/sound/soc/intel/boards/bytcr_rt5640.c
@@ -685,6 +685,18 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = {
BYT_RT5640_SSP0_AIF1 |
BYT_RT5640_MCLK_EN),
},
+ { /* Chuwi Vi8 dual-boot (CWI506) */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Insyde"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "i86"),
+ /* The above are too generic, also match BIOS info */
+ DMI_MATCH(DMI_BIOS_VERSION, "CHUWI2.D86JHBNR02"),
+ },
+ .driver_data = (void *)(BYTCR_INPUT_DEFAULTS |
+ BYT_RT5640_MONO_SPEAKER |
+ BYT_RT5640_SSP0_AIF1 |
+ BYT_RT5640_MCLK_EN),
+ },
{
/* Chuwi Vi10 (CWI505) */
.matches = {
@@ -1652,7 +1664,8 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev)
/* fix index of codec dai */
for (i = 0; i < ARRAY_SIZE(byt_rt5640_dais); i++) {
- if (!strcmp(byt_rt5640_dais[i].codecs->name,
+ if (byt_rt5640_dais[i].codecs->name &&
+ !strcmp(byt_rt5640_dais[i].codecs->name,
"i2c-10EC5640:00")) {
dai_index = i;
break;
diff --git a/sound/soc/intel/boards/bytcr_rt5651.c b/sound/soc/intel/boards/bytcr_rt5651.c
index f9fe841..80c841b 100644
--- a/sound/soc/intel/boards/bytcr_rt5651.c
+++ b/sound/soc/intel/boards/bytcr_rt5651.c
@@ -910,7 +910,8 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev)
/* fix index of codec dai */
for (i = 0; i < ARRAY_SIZE(byt_rt5651_dais); i++) {
- if (!strcmp(byt_rt5651_dais[i].codecs->name,
+ if (byt_rt5651_dais[i].codecs->name &&
+ !strcmp(byt_rt5651_dais[i].codecs->name,
"i2c-10EC5651:00")) {
dai_index = i;
break;
diff --git a/sound/soc/intel/boards/bytcr_wm5102.c b/sound/soc/intel/boards/bytcr_wm5102.c
index 6978ebd..cccb5e9 100644
--- a/sound/soc/intel/boards/bytcr_wm5102.c
+++ b/sound/soc/intel/boards/bytcr_wm5102.c
@@ -605,7 +605,8 @@ static int snd_byt_wm5102_mc_probe(struct platform_device *pdev)
/* find index of codec dai */
for (i = 0; i < ARRAY_SIZE(byt_wm5102_dais); i++) {
- if (!strcmp(byt_wm5102_dais[i].codecs->name,
+ if (byt_wm5102_dais[i].codecs->name &&
+ !strcmp(byt_wm5102_dais[i].codecs->name,
"wm5102-codec")) {
dai_index = i;
break;
diff --git a/sound/soc/intel/boards/cht_bsw_rt5645.c b/sound/soc/intel/boards/cht_bsw_rt5645.c
index c952a96..eb41b71 100644
--- a/sound/soc/intel/boards/cht_bsw_rt5645.c
+++ b/sound/soc/intel/boards/cht_bsw_rt5645.c
@@ -40,7 +40,6 @@ struct cht_acpi_card {
struct cht_mc_private {
struct snd_soc_jack jack;
struct cht_acpi_card *acpi_card;
- char codec_name[SND_ACPI_I2C_ID_LEN];
struct clk *mclk;
};
@@ -567,14 +566,14 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
}
card->dev = &pdev->dev;
- sprintf(drv->codec_name, "i2c-%s:00", drv->acpi_card->codec_id);
/* set correct codec name */
for (i = 0; i < ARRAY_SIZE(cht_dailink); i++)
- if (!strcmp(card->dai_link[i].codecs->name,
+ if (cht_dailink[i].codecs->name &&
+ !strcmp(cht_dailink[i].codecs->name,
"i2c-10EC5645:00")) {
- card->dai_link[i].codecs->name = drv->codec_name;
dai_index = i;
+ break;
}
/* fixup codec name based on HID */
diff --git a/sound/soc/intel/boards/cht_bsw_rt5672.c b/sound/soc/intel/boards/cht_bsw_rt5672.c
index 8cf0b33..be2d1a8 100644
--- a/sound/soc/intel/boards/cht_bsw_rt5672.c
+++ b/sound/soc/intel/boards/cht_bsw_rt5672.c
@@ -466,7 +466,8 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
/* find index of codec dai */
for (i = 0; i < ARRAY_SIZE(cht_dailink); i++) {
- if (!strcmp(cht_dailink[i].codecs->name, RT5672_I2C_DEFAULT)) {
+ if (cht_dailink[i].codecs->name &&
+ !strcmp(cht_dailink[i].codecs->name, RT5672_I2C_DEFAULT)) {
dai_index = i;
break;
}
diff --git a/sound/soc/qcom/lpass-cdc-dma.c b/sound/soc/qcom/lpass-cdc-dma.c
index 48b03e6..8106c58 100644
--- a/sound/soc/qcom/lpass-cdc-dma.c
+++ b/sound/soc/qcom/lpass-cdc-dma.c
@@ -259,7 +259,7 @@ static int lpass_cdc_dma_daiops_trigger(struct snd_pcm_substream *substream,
int cmd, struct snd_soc_dai *dai)
{
struct snd_soc_pcm_runtime *soc_runtime = snd_soc_substream_to_rtd(substream);
- struct lpaif_dmactl *dmactl;
+ struct lpaif_dmactl *dmactl = NULL;
int ret = 0, id;
switch (cmd) {
diff --git a/sound/soc/qcom/qdsp6/q6apm-dai.c b/sound/soc/qcom/qdsp6/q6apm-dai.c
index 052e40c..00bbd29 100644
--- a/sound/soc/qcom/qdsp6/q6apm-dai.c
+++ b/sound/soc/qcom/qdsp6/q6apm-dai.c
@@ -123,7 +123,7 @@ static struct snd_pcm_hardware q6apm_dai_hardware_playback = {
.fifo_size = 0,
};
-static void event_handler(uint32_t opcode, uint32_t token, uint32_t *payload, void *priv)
+static void event_handler(uint32_t opcode, uint32_t token, void *payload, void *priv)
{
struct q6apm_dai_rtd *prtd = priv;
struct snd_pcm_substream *substream = prtd->substream;
@@ -157,7 +157,7 @@ static void event_handler(uint32_t opcode, uint32_t token, uint32_t *payload, vo
}
static void event_handler_compr(uint32_t opcode, uint32_t token,
- uint32_t *payload, void *priv)
+ void *payload, void *priv)
{
struct q6apm_dai_rtd *prtd = priv;
struct snd_compr_stream *substream = prtd->cstream;
@@ -352,7 +352,7 @@ static int q6apm_dai_open(struct snd_soc_component *component,
spin_lock_init(&prtd->lock);
prtd->substream = substream;
- prtd->graph = q6apm_graph_open(dev, (q6apm_cb)event_handler, prtd, graph_id);
+ prtd->graph = q6apm_graph_open(dev, event_handler, prtd, graph_id);
if (IS_ERR(prtd->graph)) {
dev_err(dev, "%s: Could not allocate memory\n", __func__);
ret = PTR_ERR(prtd->graph);
@@ -496,7 +496,7 @@ static int q6apm_dai_compr_open(struct snd_soc_component *component,
return -ENOMEM;
prtd->cstream = stream;
- prtd->graph = q6apm_graph_open(dev, (q6apm_cb)event_handler_compr, prtd, graph_id);
+ prtd->graph = q6apm_graph_open(dev, event_handler_compr, prtd, graph_id);
if (IS_ERR(prtd->graph)) {
ret = PTR_ERR(prtd->graph);
kfree(prtd);
diff --git a/sound/soc/sh/rcar/adg.c b/sound/soc/sh/rcar/adg.c
index 230c486..afd69c6 100644
--- a/sound/soc/sh/rcar/adg.c
+++ b/sound/soc/sh/rcar/adg.c
@@ -111,6 +111,13 @@ static u32 rsnd_adg_ssi_ws_timing_gen2(struct rsnd_dai_stream *io)
ws = 7;
break;
}
+ } else {
+ /*
+ * SSI8 is not connected to ADG.
+ * Thus SSI9 is using ws = 8
+ */
+ if (id == 9)
+ ws = 8;
}
return (0x6 + ws) << 8;
diff --git a/sound/soc/soc-card.c b/sound/soc/soc-card.c
index 285ab4c..8a2f163 100644
--- a/sound/soc/soc-card.c
+++ b/sound/soc/soc-card.c
@@ -5,6 +5,9 @@
// Copyright (C) 2019 Renesas Electronics Corp.
// Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
//
+
+#include <linux/lockdep.h>
+#include <linux/rwsem.h>
#include <sound/soc.h>
#include <sound/jack.h>
@@ -26,12 +29,15 @@ static inline int _soc_card_ret(struct snd_soc_card *card,
return ret;
}
-struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card,
- const char *name)
+struct snd_kcontrol *snd_soc_card_get_kcontrol_locked(struct snd_soc_card *soc_card,
+ const char *name)
{
struct snd_card *card = soc_card->snd_card;
struct snd_kcontrol *kctl;
+ /* must be held read or write */
+ lockdep_assert_held(&card->controls_rwsem);
+
if (unlikely(!name))
return NULL;
@@ -40,6 +46,20 @@ struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card,
return kctl;
return NULL;
}
+EXPORT_SYMBOL_GPL(snd_soc_card_get_kcontrol_locked);
+
+struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card,
+ const char *name)
+{
+ struct snd_card *card = soc_card->snd_card;
+ struct snd_kcontrol *kctl;
+
+ down_read(&card->controls_rwsem);
+ kctl = snd_soc_card_get_kcontrol_locked(soc_card, name);
+ up_read(&card->controls_rwsem);
+
+ return kctl;
+}
EXPORT_SYMBOL_GPL(snd_soc_card_get_kcontrol);
static int jack_new(struct snd_soc_card *card, const char *id, int type,
diff --git a/sound/soc/sof/amd/acp-ipc.c b/sound/soc/sof/amd/acp-ipc.c
index 2743f07..b44b1b1 100644
--- a/sound/soc/sof/amd/acp-ipc.c
+++ b/sound/soc/sof/amd/acp-ipc.c
@@ -188,11 +188,13 @@ irqreturn_t acp_sof_ipc_irq_thread(int irq, void *context)
dsp_ack = snd_sof_dsp_read(sdev, ACP_DSP_BAR, ACP_SCRATCH_REG_0 + dsp_ack_write);
if (dsp_ack) {
+ spin_lock_irq(&sdev->ipc_lock);
/* handle immediate reply from DSP core */
acp_dsp_ipc_get_reply(sdev);
snd_sof_ipc_reply(sdev, 0);
/* set the done bit */
acp_dsp_ipc_dsp_done(sdev);
+ spin_unlock_irq(&sdev->ipc_lock);
ipc_irq = true;
}
diff --git a/sound/soc/sof/amd/acp.c b/sound/soc/sof/amd/acp.c
index 32a741f..07632ae 100644
--- a/sound/soc/sof/amd/acp.c
+++ b/sound/soc/sof/amd/acp.c
@@ -355,21 +355,20 @@ static irqreturn_t acp_irq_thread(int irq, void *context)
unsigned int count = ACP_HW_SEM_RETRY_COUNT;
spin_lock_irq(&sdev->ipc_lock);
- while (snd_sof_dsp_read(sdev, ACP_DSP_BAR, desc->hw_semaphore_offset)) {
- /* Wait until acquired HW Semaphore lock or timeout */
- count--;
- if (!count) {
- dev_err(sdev->dev, "%s: Failed to acquire HW lock\n", __func__);
- spin_unlock_irq(&sdev->ipc_lock);
- return IRQ_NONE;
- }
+ /* Wait until acquired HW Semaphore lock or timeout */
+ while (snd_sof_dsp_read(sdev, ACP_DSP_BAR, desc->hw_semaphore_offset) && --count)
+ ;
+ spin_unlock_irq(&sdev->ipc_lock);
+
+ if (!count) {
+ dev_err(sdev->dev, "%s: Failed to acquire HW lock\n", __func__);
+ return IRQ_NONE;
}
sof_ops(sdev)->irq_thread(irq, sdev);
/* Unlock or Release HW Semaphore */
snd_sof_dsp_write(sdev, ACP_DSP_BAR, desc->hw_semaphore_offset, 0x0);
- spin_unlock_irq(&sdev->ipc_lock);
return IRQ_HANDLED;
};
diff --git a/sound/soc/sof/intel/pci-lnl.c b/sound/soc/sof/intel/pci-lnl.c
index 78a57eb..b26ffe7 100644
--- a/sound/soc/sof/intel/pci-lnl.c
+++ b/sound/soc/sof/intel/pci-lnl.c
@@ -36,7 +36,7 @@ static const struct sof_dev_desc lnl_desc = {
[SOF_IPC_TYPE_4] = "intel/sof-ipc4/lnl",
},
.default_tplg_path = {
- [SOF_IPC_TYPE_4] = "intel/sof-ace-tplg",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
},
.default_fw_filename = {
[SOF_IPC_TYPE_4] = "sof-lnl.ri",
diff --git a/sound/soc/sof/intel/pci-tgl.c b/sound/soc/sof/intel/pci-tgl.c
index 0660d4b..a361ee9 100644
--- a/sound/soc/sof/intel/pci-tgl.c
+++ b/sound/soc/sof/intel/pci-tgl.c
@@ -33,18 +33,18 @@ static const struct sof_dev_desc tgl_desc = {
.dspless_mode_supported = true, /* Only supported for HDaudio */
.default_fw_path = {
[SOF_IPC_TYPE_3] = "intel/sof",
- [SOF_IPC_TYPE_4] = "intel/avs/tgl",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4/tgl",
},
.default_lib_path = {
- [SOF_IPC_TYPE_4] = "intel/avs-lib/tgl",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/tgl",
},
.default_tplg_path = {
[SOF_IPC_TYPE_3] = "intel/sof-tplg",
- [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
},
.default_fw_filename = {
[SOF_IPC_TYPE_3] = "sof-tgl.ri",
- [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+ [SOF_IPC_TYPE_4] = "sof-tgl.ri",
},
.nocodec_tplg_filename = "sof-tgl-nocodec.tplg",
.ops = &sof_tgl_ops,
@@ -66,18 +66,18 @@ static const struct sof_dev_desc tglh_desc = {
.dspless_mode_supported = true, /* Only supported for HDaudio */
.default_fw_path = {
[SOF_IPC_TYPE_3] = "intel/sof",
- [SOF_IPC_TYPE_4] = "intel/avs/tgl-h",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4/tgl-h",
},
.default_lib_path = {
- [SOF_IPC_TYPE_4] = "intel/avs-lib/tgl-h",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/tgl-h",
},
.default_tplg_path = {
[SOF_IPC_TYPE_3] = "intel/sof-tplg",
- [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
},
.default_fw_filename = {
[SOF_IPC_TYPE_3] = "sof-tgl-h.ri",
- [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+ [SOF_IPC_TYPE_4] = "sof-tgl-h.ri",
},
.nocodec_tplg_filename = "sof-tgl-nocodec.tplg",
.ops = &sof_tgl_ops,
@@ -98,18 +98,18 @@ static const struct sof_dev_desc ehl_desc = {
.dspless_mode_supported = true, /* Only supported for HDaudio */
.default_fw_path = {
[SOF_IPC_TYPE_3] = "intel/sof",
- [SOF_IPC_TYPE_4] = "intel/avs/ehl",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4/ehl",
},
.default_lib_path = {
- [SOF_IPC_TYPE_4] = "intel/avs-lib/ehl",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/ehl",
},
.default_tplg_path = {
[SOF_IPC_TYPE_3] = "intel/sof-tplg",
- [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
},
.default_fw_filename = {
[SOF_IPC_TYPE_3] = "sof-ehl.ri",
- [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+ [SOF_IPC_TYPE_4] = "sof-ehl.ri",
},
.nocodec_tplg_filename = "sof-ehl-nocodec.tplg",
.ops = &sof_tgl_ops,
@@ -131,18 +131,18 @@ static const struct sof_dev_desc adls_desc = {
.dspless_mode_supported = true, /* Only supported for HDaudio */
.default_fw_path = {
[SOF_IPC_TYPE_3] = "intel/sof",
- [SOF_IPC_TYPE_4] = "intel/avs/adl-s",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4/adl-s",
},
.default_lib_path = {
- [SOF_IPC_TYPE_4] = "intel/avs-lib/adl-s",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/adl-s",
},
.default_tplg_path = {
[SOF_IPC_TYPE_3] = "intel/sof-tplg",
- [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
},
.default_fw_filename = {
[SOF_IPC_TYPE_3] = "sof-adl-s.ri",
- [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+ [SOF_IPC_TYPE_4] = "sof-adl-s.ri",
},
.nocodec_tplg_filename = "sof-adl-nocodec.tplg",
.ops = &sof_tgl_ops,
@@ -164,18 +164,18 @@ static const struct sof_dev_desc adl_desc = {
.dspless_mode_supported = true, /* Only supported for HDaudio */
.default_fw_path = {
[SOF_IPC_TYPE_3] = "intel/sof",
- [SOF_IPC_TYPE_4] = "intel/avs/adl",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4/adl",
},
.default_lib_path = {
- [SOF_IPC_TYPE_4] = "intel/avs-lib/adl",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/adl",
},
.default_tplg_path = {
[SOF_IPC_TYPE_3] = "intel/sof-tplg",
- [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
},
.default_fw_filename = {
[SOF_IPC_TYPE_3] = "sof-adl.ri",
- [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+ [SOF_IPC_TYPE_4] = "sof-adl.ri",
},
.nocodec_tplg_filename = "sof-adl-nocodec.tplg",
.ops = &sof_tgl_ops,
@@ -197,18 +197,18 @@ static const struct sof_dev_desc adl_n_desc = {
.dspless_mode_supported = true, /* Only supported for HDaudio */
.default_fw_path = {
[SOF_IPC_TYPE_3] = "intel/sof",
- [SOF_IPC_TYPE_4] = "intel/avs/adl-n",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4/adl-n",
},
.default_lib_path = {
- [SOF_IPC_TYPE_4] = "intel/avs-lib/adl-n",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/adl-n",
},
.default_tplg_path = {
[SOF_IPC_TYPE_3] = "intel/sof-tplg",
- [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
},
.default_fw_filename = {
[SOF_IPC_TYPE_3] = "sof-adl-n.ri",
- [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+ [SOF_IPC_TYPE_4] = "sof-adl-n.ri",
},
.nocodec_tplg_filename = "sof-adl-nocodec.tplg",
.ops = &sof_tgl_ops,
@@ -230,18 +230,18 @@ static const struct sof_dev_desc rpls_desc = {
.dspless_mode_supported = true, /* Only supported for HDaudio */
.default_fw_path = {
[SOF_IPC_TYPE_3] = "intel/sof",
- [SOF_IPC_TYPE_4] = "intel/avs/rpl-s",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4/rpl-s",
},
.default_lib_path = {
- [SOF_IPC_TYPE_4] = "intel/avs-lib/rpl-s",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/rpl-s",
},
.default_tplg_path = {
[SOF_IPC_TYPE_3] = "intel/sof-tplg",
- [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
},
.default_fw_filename = {
[SOF_IPC_TYPE_3] = "sof-rpl-s.ri",
- [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+ [SOF_IPC_TYPE_4] = "sof-rpl-s.ri",
},
.nocodec_tplg_filename = "sof-rpl-nocodec.tplg",
.ops = &sof_tgl_ops,
@@ -263,18 +263,18 @@ static const struct sof_dev_desc rpl_desc = {
.dspless_mode_supported = true, /* Only supported for HDaudio */
.default_fw_path = {
[SOF_IPC_TYPE_3] = "intel/sof",
- [SOF_IPC_TYPE_4] = "intel/avs/rpl",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4/rpl",
},
.default_lib_path = {
- [SOF_IPC_TYPE_4] = "intel/avs-lib/rpl",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/rpl",
},
.default_tplg_path = {
[SOF_IPC_TYPE_3] = "intel/sof-tplg",
- [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+ [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
},
.default_fw_filename = {
[SOF_IPC_TYPE_3] = "sof-rpl.ri",
- [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+ [SOF_IPC_TYPE_4] = "sof-rpl.ri",
},
.nocodec_tplg_filename = "sof-rpl-nocodec.tplg",
.ops = &sof_tgl_ops,
diff --git a/sound/soc/sof/ipc3-topology.c b/sound/soc/sof/ipc3-topology.c
index a8832a1..d47698f4 100644
--- a/sound/soc/sof/ipc3-topology.c
+++ b/sound/soc/sof/ipc3-topology.c
@@ -2360,6 +2360,44 @@ static int sof_tear_down_left_over_pipelines(struct snd_sof_dev *sdev)
return 0;
}
+static int sof_ipc3_free_widgets_in_list(struct snd_sof_dev *sdev, bool include_scheduler,
+ bool *dyn_widgets, bool verify)
+{
+ struct sof_ipc_fw_version *v = &sdev->fw_ready.version;
+ struct snd_sof_widget *swidget;
+ int ret;
+
+ list_for_each_entry(swidget, &sdev->widget_list, list) {
+ if (swidget->dynamic_pipeline_widget) {
+ *dyn_widgets = true;
+ continue;
+ }
+
+ /* Do not free widgets for static pipelines with FW older than SOF2.2 */
+ if (!verify && !swidget->dynamic_pipeline_widget &&
+ SOF_FW_VER(v->major, v->minor, v->micro) < SOF_FW_VER(2, 2, 0)) {
+ mutex_lock(&swidget->setup_mutex);
+ swidget->use_count = 0;
+ mutex_unlock(&swidget->setup_mutex);
+ if (swidget->spipe)
+ swidget->spipe->complete = 0;
+ continue;
+ }
+
+ if (include_scheduler && swidget->id != snd_soc_dapm_scheduler)
+ continue;
+
+ if (!include_scheduler && swidget->id == snd_soc_dapm_scheduler)
+ continue;
+
+ ret = sof_widget_free(sdev, swidget);
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
+
/*
* For older firmware, this function doesn't free widgets for static pipelines during suspend.
* It only resets use_count for all widgets.
@@ -2376,29 +2414,18 @@ static int sof_ipc3_tear_down_all_pipelines(struct snd_sof_dev *sdev, bool verif
* This function is called during suspend and for one-time topology verification during
* first boot. In both cases, there is no need to protect swidget->use_count and
* sroute->setup because during suspend all running streams are suspended and during
- * topology loading the sound card unavailable to open PCMs.
+ * topology loading the sound card unavailable to open PCMs. Do not free the scheduler
+ * widgets yet so that the secondary cores do not get powered down before all the widgets
+ * associated with the scheduler are freed.
*/
- list_for_each_entry(swidget, &sdev->widget_list, list) {
- if (swidget->dynamic_pipeline_widget) {
- dyn_widgets = true;
- continue;
- }
+ ret = sof_ipc3_free_widgets_in_list(sdev, false, &dyn_widgets, verify);
+ if (ret < 0)
+ return ret;
- /* Do not free widgets for static pipelines with FW older than SOF2.2 */
- if (!verify && !swidget->dynamic_pipeline_widget &&
- SOF_FW_VER(v->major, v->minor, v->micro) < SOF_FW_VER(2, 2, 0)) {
- mutex_lock(&swidget->setup_mutex);
- swidget->use_count = 0;
- mutex_unlock(&swidget->setup_mutex);
- if (swidget->spipe)
- swidget->spipe->complete = 0;
- continue;
- }
-
- ret = sof_widget_free(sdev, swidget);
- if (ret < 0)
- return ret;
- }
+ /* free all the scheduler widgets now */
+ ret = sof_ipc3_free_widgets_in_list(sdev, true, &dyn_widgets, verify);
+ if (ret < 0)
+ return ret;
/*
* Tear down all pipelines associated with PCMs that did not get suspended
diff --git a/sound/soc/sof/ipc3.c b/sound/soc/sof/ipc3.c
index fb40378..c03dd51 100644
--- a/sound/soc/sof/ipc3.c
+++ b/sound/soc/sof/ipc3.c
@@ -1067,7 +1067,7 @@ static void sof_ipc3_rx_msg(struct snd_sof_dev *sdev)
return;
}
- if (hdr.size < sizeof(hdr)) {
+ if (hdr.size < sizeof(hdr) || hdr.size > SOF_IPC_MSG_MAX_SIZE) {
dev_err(sdev->dev, "The received message size is invalid\n");
return;
}
diff --git a/sound/soc/sof/ipc4-pcm.c b/sound/soc/sof/ipc4-pcm.c
index 85d3f39..07eb5c6 100644
--- a/sound/soc/sof/ipc4-pcm.c
+++ b/sound/soc/sof/ipc4-pcm.c
@@ -413,7 +413,18 @@ static int sof_ipc4_trigger_pipelines(struct snd_soc_component *component,
ret = sof_ipc4_set_multi_pipeline_state(sdev, state, trigger_list);
if (ret < 0) {
dev_err(sdev->dev, "failed to set final state %d for all pipelines\n", state);
- goto free;
+ /*
+ * workaround: if the firmware is crashed while setting the
+ * pipelines to reset state we must ignore the error code and
+ * reset it to 0.
+ * Since the firmware is crashed we will not send IPC messages
+ * and we are going to see errors printed, but the state of the
+ * widgets will be correct for the next boot.
+ */
+ if (sdev->fw_state != SOF_FW_CRASHED || state != SOF_IPC4_PIPE_RESET)
+ goto free;
+
+ ret = 0;
}
/* update RUNNING/RESET state for all pipelines that were just triggered */
diff --git a/sound/usb/midi.c b/sound/usb/midi.c
index 6b09932..c1f2e5a 100644
--- a/sound/usb/midi.c
+++ b/sound/usb/midi.c
@@ -1742,50 +1742,44 @@ static void snd_usbmidi_get_port_info(struct snd_rawmidi *rmidi, int number,
}
}
-static struct usb_midi_in_jack_descriptor *find_usb_in_jack_descriptor(
- struct usb_host_interface *hostif, uint8_t jack_id)
+/* return iJack for the corresponding jackID */
+static int find_usb_ijack(struct usb_host_interface *hostif, uint8_t jack_id)
{
unsigned char *extra = hostif->extra;
int extralen = hostif->extralen;
+ struct usb_descriptor_header *h;
+ struct usb_midi_out_jack_descriptor *outjd;
+ struct usb_midi_in_jack_descriptor *injd;
+ size_t sz;
while (extralen > 4) {
- struct usb_midi_in_jack_descriptor *injd =
- (struct usb_midi_in_jack_descriptor *)extra;
+ h = (struct usb_descriptor_header *)extra;
+ if (h->bDescriptorType != USB_DT_CS_INTERFACE)
+ goto next;
- if (injd->bLength >= sizeof(*injd) &&
- injd->bDescriptorType == USB_DT_CS_INTERFACE &&
- injd->bDescriptorSubtype == UAC_MIDI_IN_JACK &&
- injd->bJackID == jack_id)
- return injd;
- if (!extra[0])
- break;
- extralen -= extra[0];
- extra += extra[0];
- }
- return NULL;
-}
-
-static struct usb_midi_out_jack_descriptor *find_usb_out_jack_descriptor(
- struct usb_host_interface *hostif, uint8_t jack_id)
-{
- unsigned char *extra = hostif->extra;
- int extralen = hostif->extralen;
-
- while (extralen > 4) {
- struct usb_midi_out_jack_descriptor *outjd =
- (struct usb_midi_out_jack_descriptor *)extra;
-
- if (outjd->bLength >= sizeof(*outjd) &&
- outjd->bDescriptorType == USB_DT_CS_INTERFACE &&
+ outjd = (struct usb_midi_out_jack_descriptor *)h;
+ if (h->bLength >= sizeof(*outjd) &&
outjd->bDescriptorSubtype == UAC_MIDI_OUT_JACK &&
- outjd->bJackID == jack_id)
- return outjd;
+ outjd->bJackID == jack_id) {
+ sz = USB_DT_MIDI_OUT_SIZE(outjd->bNrInputPins);
+ if (outjd->bLength < sz)
+ goto next;
+ return *(extra + sz - 1);
+ }
+
+ injd = (struct usb_midi_in_jack_descriptor *)h;
+ if (injd->bLength >= sizeof(*injd) &&
+ injd->bDescriptorSubtype == UAC_MIDI_IN_JACK &&
+ injd->bJackID == jack_id)
+ return injd->iJack;
+
+next:
if (!extra[0])
break;
extralen -= extra[0];
extra += extra[0];
}
- return NULL;
+ return 0;
}
static void snd_usbmidi_init_substream(struct snd_usb_midi *umidi,
@@ -1796,13 +1790,10 @@ static void snd_usbmidi_init_substream(struct snd_usb_midi *umidi,
const char *name_format;
struct usb_interface *intf;
struct usb_host_interface *hostif;
- struct usb_midi_in_jack_descriptor *injd;
- struct usb_midi_out_jack_descriptor *outjd;
uint8_t jack_name_buf[32];
uint8_t *default_jack_name = "MIDI";
uint8_t *jack_name = default_jack_name;
uint8_t iJack;
- size_t sz;
int res;
struct snd_rawmidi_substream *substream =
@@ -1816,21 +1807,7 @@ static void snd_usbmidi_init_substream(struct snd_usb_midi *umidi,
intf = umidi->iface;
if (intf && jack_id >= 0) {
hostif = intf->cur_altsetting;
- iJack = 0;
- if (stream != SNDRV_RAWMIDI_STREAM_OUTPUT) {
- /* in jacks connect to outs */
- outjd = find_usb_out_jack_descriptor(hostif, jack_id);
- if (outjd) {
- sz = USB_DT_MIDI_OUT_SIZE(outjd->bNrInputPins);
- if (outjd->bLength >= sz)
- iJack = *(((uint8_t *) outjd) + sz - sizeof(uint8_t));
- }
- } else {
- /* and out jacks connect to ins */
- injd = find_usb_in_jack_descriptor(hostif, jack_id);
- if (injd)
- iJack = injd->iJack;
- }
+ iJack = find_usb_ijack(hostif, jack_id);
if (iJack != 0) {
res = usb_string(umidi->dev, iJack, jack_name_buf,
ARRAY_SIZE(jack_name_buf));
diff --git a/tools/arch/x86/include/asm/rmwcc.h b/tools/arch/x86/include/asm/rmwcc.h
index 11ff975..e2ff22b 100644
--- a/tools/arch/x86/include/asm/rmwcc.h
+++ b/tools/arch/x86/include/asm/rmwcc.h
@@ -4,7 +4,7 @@
#define __GEN_RMWcc(fullop, var, cc, ...) \
do { \
- asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \
+ asm goto (fullop "; j" cc " %l[cc_label]" \
: : "m" (var), ## __VA_ARGS__ \
: "memory" : cc_label); \
return 0; \
diff --git a/tools/include/linux/compiler_types.h b/tools/include/linux/compiler_types.h
index 1bdd834..d09f9dc 100644
--- a/tools/include/linux/compiler_types.h
+++ b/tools/include/linux/compiler_types.h
@@ -36,8 +36,8 @@
#include <linux/compiler-gcc.h>
#endif
-#ifndef asm_volatile_goto
-#define asm_volatile_goto(x...) asm goto(x)
+#ifndef asm_goto_output
+#define asm_goto_output(x...) asm goto(x)
#endif
#endif /* __LINUX_COMPILER_TYPES_H */
diff --git a/tools/net/ynl/lib/ynl.c b/tools/net/ynl/lib/ynl.c
index c82a7f4..45e4967 100644
--- a/tools/net/ynl/lib/ynl.c
+++ b/tools/net/ynl/lib/ynl.c
@@ -466,6 +466,8 @@ ynl_gemsg_start_dump(struct ynl_sock *ys, __u32 id, __u8 cmd, __u8 version)
int ynl_recv_ack(struct ynl_sock *ys, int ret)
{
+ struct ynl_parse_arg yarg = { .ys = ys, };
+
if (!ret) {
yerr(ys, YNL_ERROR_EXPECT_ACK,
"Expecting an ACK but nothing received");
@@ -478,7 +480,7 @@ int ynl_recv_ack(struct ynl_sock *ys, int ret)
return ret;
}
return mnl_cb_run(ys->rx_buf, ret, ys->seq, ys->portid,
- ynl_cb_null, ys);
+ ynl_cb_null, &yarg);
}
int ynl_cb_null(const struct nlmsghdr *nlh, void *data)
@@ -521,6 +523,7 @@ ynl_get_family_info_mcast(struct ynl_sock *ys, const struct nlattr *mcasts)
ys->mcast_groups[i].name[GENL_NAMSIZ - 1] = 0;
}
}
+ i++;
}
return 0;
@@ -586,7 +589,13 @@ static int ynl_sock_read_family(struct ynl_sock *ys, const char *family_name)
return err;
}
- return ynl_recv_ack(ys, err);
+ err = ynl_recv_ack(ys, err);
+ if (err < 0) {
+ free(ys->mcast_groups);
+ return err;
+ }
+
+ return 0;
}
struct ynl_sock *
@@ -741,11 +750,14 @@ static int ynl_ntf_parse(struct ynl_sock *ys, const struct nlmsghdr *nlh)
static int ynl_ntf_trampoline(const struct nlmsghdr *nlh, void *data)
{
- return ynl_ntf_parse((struct ynl_sock *)data, nlh);
+ struct ynl_parse_arg *yarg = data;
+
+ return ynl_ntf_parse(yarg->ys, nlh);
}
int ynl_ntf_check(struct ynl_sock *ys)
{
+ struct ynl_parse_arg yarg = { .ys = ys, };
ssize_t len;
int err;
@@ -767,7 +779,7 @@ int ynl_ntf_check(struct ynl_sock *ys)
return len;
err = mnl_cb_run2(ys->rx_buf, len, ys->seq, ys->portid,
- ynl_ntf_trampoline, ys,
+ ynl_ntf_trampoline, &yarg,
ynl_cb_array, NLMSG_MIN_TYPE);
if (err < 0)
return err;
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index caff383..030b388 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -13,6 +13,7 @@
ldflags-y += --wrap=cxl_dvsec_rr_decode
ldflags-y += --wrap=devm_cxl_add_rch_dport
ldflags-y += --wrap=cxl_rcd_component_reg_phys
+ldflags-y += --wrap=cxl_endpoint_parse_cdat
DRIVERS := ../../../drivers
CXL_SRC := $(DRIVERS)/cxl
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index a3cdbb2..908e0d0 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -15,6 +15,8 @@
static int interleave_arithmetic;
+#define FAKE_QTG_ID 42
+
#define NR_CXL_HOST_BRIDGES 2
#define NR_CXL_SINGLE_HOST 1
#define NR_CXL_RCH 1
@@ -209,7 +211,7 @@ static struct {
.granularity = 4,
.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
- .qtg_id = 0,
+ .qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 4UL,
},
.target = { 0 },
@@ -224,7 +226,7 @@ static struct {
.granularity = 4,
.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
- .qtg_id = 1,
+ .qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 8UL,
},
.target = { 0, 1, },
@@ -239,7 +241,7 @@ static struct {
.granularity = 4,
.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
- .qtg_id = 2,
+ .qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 4UL,
},
.target = { 0 },
@@ -254,7 +256,7 @@ static struct {
.granularity = 4,
.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
- .qtg_id = 3,
+ .qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 8UL,
},
.target = { 0, 1, },
@@ -269,7 +271,7 @@ static struct {
.granularity = 4,
.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
- .qtg_id = 4,
+ .qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 4UL,
},
.target = { 2 },
@@ -284,7 +286,7 @@ static struct {
.granularity = 4,
.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
- .qtg_id = 5,
+ .qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M,
},
.target = { 3 },
@@ -301,7 +303,7 @@ static struct {
.granularity = 4,
.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
- .qtg_id = 0,
+ .qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 8UL,
},
.target = { 0, },
@@ -317,7 +319,7 @@ static struct {
.granularity = 0,
.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
- .qtg_id = 1,
+ .qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 8UL,
},
.target = { 0, 1, },
@@ -333,7 +335,7 @@ static struct {
.granularity = 0,
.restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
- .qtg_id = 0,
+ .qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 16UL,
},
.target = { 0, 1, 0, 1, },
@@ -976,6 +978,48 @@ static int mock_cxl_port_enumerate_dports(struct cxl_port *port)
return 0;
}
+/*
+ * Faking the cxl_dpa_perf for the memdev when appropriate.
+ */
+static void dpa_perf_setup(struct cxl_port *endpoint, struct range *range,
+ struct cxl_dpa_perf *dpa_perf)
+{
+ dpa_perf->qos_class = FAKE_QTG_ID;
+ dpa_perf->dpa_range = *range;
+ dpa_perf->coord.read_latency = 500;
+ dpa_perf->coord.write_latency = 500;
+ dpa_perf->coord.read_bandwidth = 1000;
+ dpa_perf->coord.write_bandwidth = 1000;
+}
+
+static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
+{
+ struct cxl_root *cxl_root __free(put_cxl_root) =
+ find_cxl_root(port);
+ struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+ struct range pmem_range = {
+ .start = cxlds->pmem_res.start,
+ .end = cxlds->pmem_res.end,
+ };
+ struct range ram_range = {
+ .start = cxlds->ram_res.start,
+ .end = cxlds->ram_res.end,
+ };
+
+ if (!cxl_root)
+ return;
+
+ if (range_len(&ram_range))
+ dpa_perf_setup(port, &ram_range, &mds->ram_perf);
+
+ if (range_len(&pmem_range))
+ dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
+
+ cxl_memdev_update_perf(cxlmd);
+}
+
static struct cxl_mock_ops cxl_mock_ops = {
.is_mock_adev = is_mock_adev,
.is_mock_bridge = is_mock_bridge,
@@ -989,6 +1033,7 @@ static struct cxl_mock_ops cxl_mock_ops = {
.devm_cxl_setup_hdm = mock_cxl_setup_hdm,
.devm_cxl_add_passthrough_decoder = mock_cxl_add_passthrough_decoder,
.devm_cxl_enumerate_decoders = mock_cxl_enumerate_decoders,
+ .cxl_endpoint_parse_cdat = mock_cxl_endpoint_parse_cdat,
.list = LIST_HEAD_INIT(cxl_mock_ops.list),
};
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index 1a61e68..6f73794 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -285,6 +285,20 @@ resource_size_t __wrap_cxl_rcd_component_reg_phys(struct device *dev,
}
EXPORT_SYMBOL_NS_GPL(__wrap_cxl_rcd_component_reg_phys, CXL);
+void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
+{
+ int index;
+ struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
+ struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
+
+ if (ops && ops->is_mock_dev(cxlmd->dev.parent))
+ ops->cxl_endpoint_parse_cdat(port);
+ else
+ cxl_endpoint_parse_cdat(port);
+ put_cxl_mock_ops(index);
+}
+EXPORT_SYMBOL_NS_GPL(__wrap_cxl_endpoint_parse_cdat, CXL);
+
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(ACPI);
MODULE_IMPORT_NS(CXL);
diff --git a/tools/testing/cxl/test/mock.h b/tools/testing/cxl/test/mock.h
index a942237..d1b0271 100644
--- a/tools/testing/cxl/test/mock.h
+++ b/tools/testing/cxl/test/mock.h
@@ -25,6 +25,7 @@ struct cxl_mock_ops {
int (*devm_cxl_add_passthrough_decoder)(struct cxl_port *port);
int (*devm_cxl_enumerate_decoders)(
struct cxl_hdm *hdm, struct cxl_endpoint_dvsec_info *info);
+ void (*cxl_endpoint_parse_cdat)(struct cxl_port *port);
};
void register_cxl_mock_ops(struct cxl_mock_ops *ops);
diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index 0b6488e..7254c110 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -146,6 +146,7 @@
"""Runs the Linux UML binary. Must be named 'linux'."""
linux_bin = os.path.join(build_dir, 'linux')
params.extend(['mem=1G', 'console=tty', 'kunit_shutdown=halt'])
+ print('Running tests with:\n$', linux_bin, ' '.join(shlex.quote(arg) for arg in params))
return subprocess.Popen([linux_bin] + params,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 15b6a11..cd9ae57 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -67,6 +67,7 @@
TARGETS += perf_events
TARGETS += pidfd
TARGETS += pid_namespace
+TARGETS += power_supply
TARGETS += powerpc
TARGETS += prctl
TARGETS += proc
@@ -78,6 +79,7 @@
TARGETS += rlimits
TARGETS += rseq
TARGETS += rtc
+TARGETS += rust
TARGETS += seccomp
TARGETS += sgx
TARGETS += sigaltstack
@@ -236,6 +238,7 @@
install -m 744 kselftest/module.sh $(INSTALL_PATH)/kselftest/
install -m 744 kselftest/runner.sh $(INSTALL_PATH)/kselftest/
install -m 744 kselftest/prefix.pl $(INSTALL_PATH)/kselftest/
+ install -m 744 kselftest/ktap_helpers.sh $(INSTALL_PATH)/kselftest/
install -m 744 run_kselftest.sh $(INSTALL_PATH)/
rm -f $(TEST_LIST)
@ret=1; \
diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c
index bf84d4a..3c44037 100644
--- a/tools/testing/selftests/bpf/prog_tests/iters.c
+++ b/tools/testing/selftests/bpf/prog_tests/iters.c
@@ -193,6 +193,7 @@ static void subtest_task_iters(void)
ASSERT_EQ(skel->bss->procs_cnt, 1, "procs_cnt");
ASSERT_EQ(skel->bss->threads_cnt, thread_num + 1, "threads_cnt");
ASSERT_EQ(skel->bss->proc_threads_cnt, thread_num + 1, "proc_threads_cnt");
+ ASSERT_EQ(skel->bss->invalid_cnt, 0, "invalid_cnt");
pthread_mutex_unlock(&do_nothing_mutex);
for (int i = 0; i < thread_num; i++)
ASSERT_OK(pthread_join(thread_ids[i], &ret), "pthread_join");
diff --git a/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c b/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
new file mode 100644
index 0000000..3405923
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include "test_progs.h"
+#include "read_vsyscall.skel.h"
+
+#if defined(__x86_64__)
+/* For VSYSCALL_ADDR */
+#include <asm/vsyscall.h>
+#else
+/* To prevent build failure on non-x86 arch */
+#define VSYSCALL_ADDR 0UL
+#endif
+
+struct read_ret_desc {
+ const char *name;
+ int ret;
+} all_read[] = {
+ { .name = "probe_read_kernel", .ret = -ERANGE },
+ { .name = "probe_read_kernel_str", .ret = -ERANGE },
+ { .name = "probe_read", .ret = -ERANGE },
+ { .name = "probe_read_str", .ret = -ERANGE },
+ { .name = "probe_read_user", .ret = -EFAULT },
+ { .name = "probe_read_user_str", .ret = -EFAULT },
+ { .name = "copy_from_user", .ret = -EFAULT },
+ { .name = "copy_from_user_task", .ret = -EFAULT },
+};
+
+void test_read_vsyscall(void)
+{
+ struct read_vsyscall *skel;
+ unsigned int i;
+ int err;
+
+#if !defined(__x86_64__)
+ test__skip();
+ return;
+#endif
+ skel = read_vsyscall__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "read_vsyscall open_load"))
+ return;
+
+ skel->bss->target_pid = getpid();
+ err = read_vsyscall__attach(skel);
+ if (!ASSERT_EQ(err, 0, "read_vsyscall attach"))
+ goto out;
+
+ /* userspace may don't have vsyscall page due to LEGACY_VSYSCALL_NONE,
+ * but it doesn't affect the returned error codes.
+ */
+ skel->bss->user_ptr = (void *)VSYSCALL_ADDR;
+ usleep(1);
+
+ for (i = 0; i < ARRAY_SIZE(all_read); i++)
+ ASSERT_EQ(skel->bss->read_ret[i], all_read[i].ret, all_read[i].name);
+out:
+ read_vsyscall__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c
index 760ad96..d66687f 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer.c
@@ -4,10 +4,29 @@
#include "timer.skel.h"
#include "timer_failure.skel.h"
+#define NUM_THR 8
+
+static void *spin_lock_thread(void *arg)
+{
+ int i, err, prog_fd = *(int *)arg;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ for (i = 0; i < 10000; i++) {
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ if (!ASSERT_OK(err, "test_run_opts err") ||
+ !ASSERT_OK(topts.retval, "test_run_opts retval"))
+ break;
+ }
+
+ pthread_exit(arg);
+}
+
static int timer(struct timer *timer_skel)
{
- int err, prog_fd;
+ int i, err, prog_fd;
LIBBPF_OPTS(bpf_test_run_opts, topts);
+ pthread_t thread_id[NUM_THR];
+ void *ret;
err = timer__attach(timer_skel);
if (!ASSERT_OK(err, "timer_attach"))
@@ -43,6 +62,20 @@ static int timer(struct timer *timer_skel)
/* check that code paths completed */
ASSERT_EQ(timer_skel->bss->ok, 1 | 2 | 4, "ok");
+ prog_fd = bpf_program__fd(timer_skel->progs.race);
+ for (i = 0; i < NUM_THR; i++) {
+ err = pthread_create(&thread_id[i], NULL,
+ &spin_lock_thread, &prog_fd);
+ if (!ASSERT_OK(err, "pthread_create"))
+ break;
+ }
+
+ while (i) {
+ err = pthread_join(thread_id[--i], &ret);
+ if (ASSERT_OK(err, "pthread_join"))
+ ASSERT_EQ(ret, (void *)&prog_fd, "pthread_join");
+ }
+
return 0;
}
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
index c3b4574..6d8b541 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -511,7 +511,7 @@ static void test_xdp_bonding_features(struct skeletons *skeletons)
if (!ASSERT_OK(err, "bond bpf_xdp_query"))
goto out;
- if (!ASSERT_EQ(query_opts.feature_flags, NETDEV_XDP_ACT_MASK,
+ if (!ASSERT_EQ(query_opts.feature_flags, 0,
"bond query_opts.feature_flags"))
goto out;
@@ -601,7 +601,7 @@ static void test_xdp_bonding_features(struct skeletons *skeletons)
if (!ASSERT_OK(err, "bond bpf_xdp_query"))
goto out;
- ASSERT_EQ(query_opts.feature_flags, NETDEV_XDP_ACT_MASK,
+ ASSERT_EQ(query_opts.feature_flags, 0,
"bond query_opts.feature_flags");
out:
bpf_link__destroy(link);
diff --git a/tools/testing/selftests/bpf/progs/iters_task.c b/tools/testing/selftests/bpf/progs/iters_task.c
index c9b4055..e4d53e4 100644
--- a/tools/testing/selftests/bpf/progs/iters_task.c
+++ b/tools/testing/selftests/bpf/progs/iters_task.c
@@ -10,7 +10,7 @@
char _license[] SEC("license") = "GPL";
pid_t target_pid;
-int procs_cnt, threads_cnt, proc_threads_cnt;
+int procs_cnt, threads_cnt, proc_threads_cnt, invalid_cnt;
void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;
@@ -26,6 +26,16 @@ int iter_task_for_each_sleep(void *ctx)
procs_cnt = threads_cnt = proc_threads_cnt = 0;
bpf_rcu_read_lock();
+ bpf_for_each(task, pos, NULL, ~0U) {
+ /* Below instructions shouldn't be executed for invalid flags */
+ invalid_cnt++;
+ }
+
+ bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC_THREADS) {
+ /* Below instructions shouldn't be executed for invalid task__nullable */
+ invalid_cnt++;
+ }
+
bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_PROCS)
if (pos->pid == target_pid)
procs_cnt++;
diff --git a/tools/testing/selftests/bpf/progs/read_vsyscall.c b/tools/testing/selftests/bpf/progs/read_vsyscall.c
new file mode 100644
index 0000000..986f966
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/read_vsyscall.c
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <bpf/bpf_helpers.h>
+
+#include "bpf_misc.h"
+
+int target_pid = 0;
+void *user_ptr = 0;
+int read_ret[8];
+
+char _license[] SEC("license") = "GPL";
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int do_probe_read(void *ctx)
+{
+ char buf[8];
+
+ if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+ return 0;
+
+ read_ret[0] = bpf_probe_read_kernel(buf, sizeof(buf), user_ptr);
+ read_ret[1] = bpf_probe_read_kernel_str(buf, sizeof(buf), user_ptr);
+ read_ret[2] = bpf_probe_read(buf, sizeof(buf), user_ptr);
+ read_ret[3] = bpf_probe_read_str(buf, sizeof(buf), user_ptr);
+ read_ret[4] = bpf_probe_read_user(buf, sizeof(buf), user_ptr);
+ read_ret[5] = bpf_probe_read_user_str(buf, sizeof(buf), user_ptr);
+
+ return 0;
+}
+
+SEC("fentry.s/" SYS_PREFIX "sys_nanosleep")
+int do_copy_from_user(void *ctx)
+{
+ char buf[8];
+
+ if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+ return 0;
+
+ read_ret[6] = bpf_copy_from_user(buf, sizeof(buf), user_ptr);
+ read_ret[7] = bpf_copy_from_user_task(buf, sizeof(buf), user_ptr,
+ bpf_get_current_task_btf(), 0);
+
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index 8b946c8..f615da9 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -51,7 +51,8 @@ struct {
__uint(max_entries, 1);
__type(key, int);
__type(value, struct elem);
-} abs_timer SEC(".maps"), soft_timer_pinned SEC(".maps"), abs_timer_pinned SEC(".maps");
+} abs_timer SEC(".maps"), soft_timer_pinned SEC(".maps"), abs_timer_pinned SEC(".maps"),
+ race_array SEC(".maps");
__u64 bss_data;
__u64 abs_data;
@@ -390,3 +391,34 @@ int BPF_PROG2(test5, int, a)
return 0;
}
+
+static int race_timer_callback(void *race_array, int *race_key, struct bpf_timer *timer)
+{
+ bpf_timer_start(timer, 1000000, 0);
+ return 0;
+}
+
+SEC("syscall")
+int race(void *ctx)
+{
+ struct bpf_timer *timer;
+ int err, race_key = 0;
+ struct elem init;
+
+ __builtin_memset(&init, 0, sizeof(struct elem));
+ bpf_map_update_elem(&race_array, &race_key, &init, BPF_ANY);
+
+ timer = bpf_map_lookup_elem(&race_array, &race_key);
+ if (!timer)
+ return 1;
+
+ err = bpf_timer_init(timer, &race_array, CLOCK_MONOTONIC);
+ if (err && err != -EBUSY)
+ return 1;
+
+ bpf_timer_set_callback(timer, race_timer_callback);
+ bpf_timer_start(timer, 0, 0);
+ bpf_timer_cancel(timer);
+
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
index 5905e03..a955a63 100644
--- a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
+++ b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
@@ -239,4 +239,74 @@ int bpf_loop_iter_limit_nested(void *unused)
return 1000 * a + b + c;
}
+struct iter_limit_bug_ctx {
+ __u64 a;
+ __u64 b;
+ __u64 c;
+};
+
+static __naked void iter_limit_bug_cb(void)
+{
+ /* This is the same as C code below, but written
+ * in assembly to control which branches are fall-through.
+ *
+ * switch (bpf_get_prandom_u32()) {
+ * case 1: ctx->a = 42; break;
+ * case 2: ctx->b = 42; break;
+ * default: ctx->c = 42; break;
+ * }
+ */
+ asm volatile (
+ "r9 = r2;"
+ "call %[bpf_get_prandom_u32];"
+ "r1 = r0;"
+ "r2 = 42;"
+ "r0 = 0;"
+ "if r1 == 0x1 goto 1f;"
+ "if r1 == 0x2 goto 2f;"
+ "*(u64 *)(r9 + 16) = r2;"
+ "exit;"
+ "1: *(u64 *)(r9 + 0) = r2;"
+ "exit;"
+ "2: *(u64 *)(r9 + 8) = r2;"
+ "exit;"
+ :
+ : __imm(bpf_get_prandom_u32)
+ : __clobber_all
+ );
+}
+
+SEC("tc")
+__failure
+__flag(BPF_F_TEST_STATE_FREQ)
+int iter_limit_bug(struct __sk_buff *skb)
+{
+ struct iter_limit_bug_ctx ctx = { 7, 7, 7 };
+
+ bpf_loop(2, iter_limit_bug_cb, &ctx, 0);
+
+ /* This is the same as C code below,
+ * written in assembly to guarantee checks order.
+ *
+ * if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7)
+ * asm volatile("r1 /= 0;":::"r1");
+ */
+ asm volatile (
+ "r1 = *(u64 *)%[ctx_a];"
+ "if r1 != 42 goto 1f;"
+ "r1 = *(u64 *)%[ctx_b];"
+ "if r1 != 42 goto 1f;"
+ "r1 = *(u64 *)%[ctx_c];"
+ "if r1 != 7 goto 1f;"
+ "r1 /= 0;"
+ "1:"
+ :
+ : [ctx_a]"m"(ctx.a),
+ [ctx_b]"m"(ctx.b),
+ [ctx_c]"m"(ctx.c)
+ : "r1"
+ );
+ return 0;
+}
+
char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c
index 534576f..c59e4ad 100644
--- a/tools/testing/selftests/core/close_range_test.c
+++ b/tools/testing/selftests/core/close_range_test.c
@@ -12,6 +12,7 @@
#include <syscall.h>
#include <unistd.h>
#include <sys/resource.h>
+#include <linux/close_range.h>
#include "../kselftest_harness.h"
#include "../clone3/clone3_selftests.h"
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
index d508486..9a3d3c3 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
@@ -62,6 +62,8 @@
# create bond
bond_reset "${param}"
+ # set active_slave to primary eth1 specifically
+ ip -n ${s_ns} link set bond0 type bond active_slave eth1
# check bonding member prio value
ip -n ${s_ns} link set eth0 type bond_slave prio 0
diff --git a/tools/testing/selftests/dt/Makefile b/tools/testing/selftests/dt/Makefile
index 62dc00e..2d33ee9e 100644
--- a/tools/testing/selftests/dt/Makefile
+++ b/tools/testing/selftests/dt/Makefile
@@ -4,7 +4,7 @@
TEST_PROGS := test_unprobed_devices.sh
TEST_GEN_FILES := compatible_list
-TEST_FILES := compatible_ignore_list ktap_helpers.sh
+TEST_FILES := compatible_ignore_list
include ../lib.mk
diff --git a/tools/testing/selftests/dt/test_unprobed_devices.sh b/tools/testing/selftests/dt/test_unprobed_devices.sh
index b07af2a..2d7e70c 100755
--- a/tools/testing/selftests/dt/test_unprobed_devices.sh
+++ b/tools/testing/selftests/dt/test_unprobed_devices.sh
@@ -15,16 +15,12 @@
DIR="$(dirname $(readlink -f "$0"))"
-source "${DIR}"/ktap_helpers.sh
+source "${DIR}"/../kselftest/ktap_helpers.sh
PDT=/proc/device-tree/
COMPAT_LIST="${DIR}"/compatible_list
IGNORE_LIST="${DIR}"/compatible_ignore_list
-KSFT_PASS=0
-KSFT_FAIL=1
-KSFT_SKIP=4
-
ktap_print_header
if [[ ! -d "${PDT}" ]]; then
@@ -33,8 +29,8 @@
fi
nodes_compatible=$(
- for node_compat in $(find ${PDT} -name compatible); do
- node=$(dirname "${node_compat}")
+ for node in $(find ${PDT} -type d); do
+ [ ! -f "${node}"/compatible ] && continue
# Check if node is available
if [[ -e "${node}"/status ]]; then
status=$(tr -d '\000' < "${node}"/status)
@@ -46,10 +42,11 @@
nodes_dev_bound=$(
IFS=$'\n'
- for uevent in $(find /sys/devices -name uevent); do
- if [[ -d "$(dirname "${uevent}")"/driver ]]; then
- grep '^OF_FULLNAME=' "${uevent}" | sed -e 's|OF_FULLNAME=||'
- fi
+ for dev_dir in $(find /sys/devices -type d); do
+ [ ! -f "${dev_dir}"/uevent ] && continue
+ [ ! -d "${dev_dir}"/driver ] && continue
+
+ grep '^OF_FULLNAME=' "${dev_dir}"/uevent | sed -e 's|OF_FULLNAME=||'
done
)
diff --git a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
index e19ab0e..759f86e 100644
--- a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
+++ b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
@@ -10,7 +10,6 @@
#include <linux/mount.h>
#include <sys/syscall.h>
#include <sys/stat.h>
-#include <sys/mount.h>
#include <sys/mman.h>
#include <sched.h>
#include <fcntl.h>
@@ -32,7 +31,11 @@ static int sys_fsmount(int fd, unsigned int flags, unsigned int attr_flags)
{
return syscall(__NR_fsmount, fd, flags, attr_flags);
}
-
+static int sys_mount(const char *src, const char *tgt, const char *fst,
+ unsigned long flags, const void *data)
+{
+ return syscall(__NR_mount, src, tgt, fst, flags, data);
+}
static int sys_move_mount(int from_dfd, const char *from_pathname,
int to_dfd, const char *to_pathname,
unsigned int flags)
@@ -166,8 +169,7 @@ int main(int argc, char **argv)
ksft_test_result_skip("unable to create a new mount namespace\n");
return 1;
}
-
- if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
+ if (sys_mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
pr_perror("mount");
return 1;
}
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index c778d4d..25d4e0f 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -504,7 +504,7 @@
if [ "$KTAP" = "1" ]; then
echo -n "# Totals:"
echo -n " pass:"`echo $PASSED_CASES | wc -w`
- echo -n " faii:"`echo $FAILED_CASES | wc -w`
+ echo -n " fail:"`echo $FAILED_CASES | wc -w`
echo -n " xfail:"`echo $XFAILED_CASES | wc -w`
echo -n " xpass:0"
echo -n " skip:"`echo $UNTESTED_CASES $UNSUPPORTED_CASES | wc -w`
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
index add7d5b..c45094d 100644
--- a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
+++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
@@ -1,6 +1,6 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
-# description: Test file and directory owership changes for eventfs
+# description: Test file and directory ownership changes for eventfs
original_group=`stat -c "%g" .`
original_owner=`stat -c "%u" .`
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
new file mode 100644
index 0000000..ccfbfde3
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
@@ -0,0 +1,42 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+# description: ftrace - function trace across cpu hotplug
+# requires: function:tracer
+
+if ! which nproc ; then
+ nproc() {
+ ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l
+ }
+fi
+
+NP=`nproc`
+
+if [ $NP -eq 1 ] ;then
+ echo "We cannot test cpu hotplug in UP environment"
+ exit_unresolved
+fi
+
+# Find online cpu
+for i in /sys/devices/system/cpu/cpu[1-9]*; do
+ if [ -f $i/online ] && [ "$(cat $i/online)" = "1" ]; then
+ cpu=$i
+ break
+ fi
+done
+
+if [ -z "$cpu" ]; then
+ echo "We cannot test cpu hotplug with a single cpu online"
+ exit_unresolved
+fi
+
+echo 0 > tracing_on
+echo > trace
+
+: "Set $(basename $cpu) offline/online with function tracer enabled"
+echo function > current_tracer
+echo 1 > tracing_on
+(echo 0 > $cpu/online)
+(echo "forked"; sleep 1)
+(echo 1 > $cpu/online)
+echo 0 > tracing_on
+echo nop > current_tracer
diff --git a/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc
index 4562e13..7178988 100644
--- a/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc
+++ b/tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc
@@ -40,7 +40,7 @@
reset_trigger
-echo "Test histgram with log2 modifier"
+echo "Test histogram with log2 modifier"
echo 'hist:keys=bytes_req.log2' > events/kmem/kmalloc/trigger
for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
diff --git a/tools/testing/selftests/futex/functional/futex_requeue_pi.c b/tools/testing/selftests/futex/functional/futex_requeue_pi.c
index 1ee5518..7f3ca5c 100644
--- a/tools/testing/selftests/futex/functional/futex_requeue_pi.c
+++ b/tools/testing/selftests/futex/functional/futex_requeue_pi.c
@@ -17,6 +17,8 @@
*
*****************************************************************************/
+#define _GNU_SOURCE
+
#include <errno.h>
#include <limits.h>
#include <pthread.h>
@@ -358,6 +360,7 @@ int unit_test(int broadcast, long lock, int third_party_owner, long timeout_ns)
int main(int argc, char *argv[])
{
+ const char *test_name;
int c, ret;
while ((c = getopt(argc, argv, "bchlot:v:")) != -1) {
@@ -397,6 +400,14 @@ int main(int argc, char *argv[])
"\tArguments: broadcast=%d locked=%d owner=%d timeout=%ldns\n",
broadcast, locked, owner, timeout_ns);
+ ret = asprintf(&test_name,
+ "%s broadcast=%d locked=%d owner=%d timeout=%ldns",
+ TEST_NAME, broadcast, locked, owner, timeout_ns);
+ if (ret < 0) {
+ ksft_print_msg("Failed to generate test name\n");
+ test_name = TEST_NAME;
+ }
+
/*
* FIXME: unit_test is obsolete now that we parse options and the
* various style of runs are done by run.sh - simplify the code and move
@@ -404,6 +415,6 @@ int main(int argc, char *argv[])
*/
ret = unit_test(broadcast, locked, owner, timeout_ns);
- print_result(TEST_NAME, ret);
+ print_result(test_name, ret);
return ret;
}
diff --git a/tools/testing/selftests/iommu/config b/tools/testing/selftests/iommu/config
index 6c4f901..110d739 100644
--- a/tools/testing/selftests/iommu/config
+++ b/tools/testing/selftests/iommu/config
@@ -1,2 +1,3 @@
-CONFIG_IOMMUFD
-CONFIG_IOMMUFD_TEST
+CONFIG_IOMMUFD=y
+CONFIG_FAULT_INJECTION=y
+CONFIG_IOMMUFD_TEST=y
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 1a881e7..edf1c99 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -12,6 +12,7 @@
static unsigned long HUGEPAGE_SIZE;
#define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
+#define MOCK_HUGE_PAGE_SIZE (512 * MOCK_PAGE_SIZE)
static unsigned long get_huge_page_size(void)
{
@@ -1716,10 +1717,12 @@ FIXTURE(iommufd_dirty_tracking)
FIXTURE_VARIANT(iommufd_dirty_tracking)
{
unsigned long buffer_size;
+ bool hugepages;
};
FIXTURE_SETUP(iommufd_dirty_tracking)
{
+ int mmap_flags;
void *vrc;
int rc;
@@ -1732,25 +1735,41 @@ FIXTURE_SETUP(iommufd_dirty_tracking)
variant->buffer_size, rc);
}
+ mmap_flags = MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED;
+ if (variant->hugepages) {
+ /*
+ * MAP_POPULATE will cause the kernel to fail mmap if THPs are
+ * not available.
+ */
+ mmap_flags |= MAP_HUGETLB | MAP_POPULATE;
+ }
assert((uintptr_t)self->buffer % HUGEPAGE_SIZE == 0);
vrc = mmap(self->buffer, variant->buffer_size, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+ mmap_flags, -1, 0);
assert(vrc == self->buffer);
self->page_size = MOCK_PAGE_SIZE;
self->bitmap_size =
variant->buffer_size / self->page_size / BITS_PER_BYTE;
- /* Provision with an extra (MOCK_PAGE_SIZE) for the unaligned case */
+ /* Provision with an extra (PAGE_SIZE) for the unaligned case */
rc = posix_memalign(&self->bitmap, PAGE_SIZE,
- self->bitmap_size + MOCK_PAGE_SIZE);
+ self->bitmap_size + PAGE_SIZE);
assert(!rc);
assert(self->bitmap);
assert((uintptr_t)self->bitmap % PAGE_SIZE == 0);
test_ioctl_ioas_alloc(&self->ioas_id);
- test_cmd_mock_domain(self->ioas_id, &self->stdev_id, &self->hwpt_id,
- &self->idev_id);
+ /* Enable 1M mock IOMMU hugepages */
+ if (variant->hugepages) {
+ test_cmd_mock_domain_flags(self->ioas_id,
+ MOCK_FLAGS_DEVICE_HUGE_IOVA,
+ &self->stdev_id, &self->hwpt_id,
+ &self->idev_id);
+ } else {
+ test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
+ &self->hwpt_id, &self->idev_id);
+ }
}
FIXTURE_TEARDOWN(iommufd_dirty_tracking)
@@ -1784,12 +1803,26 @@ FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128M)
.buffer_size = 128UL * 1024UL * 1024UL,
};
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128M_huge)
+{
+ /* 4K bitmap (128M IOVA range) */
+ .buffer_size = 128UL * 1024UL * 1024UL,
+ .hugepages = true,
+};
+
FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256M)
{
/* 8K bitmap (256M IOVA range) */
.buffer_size = 256UL * 1024UL * 1024UL,
};
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256M_huge)
+{
+ /* 8K bitmap (256M IOVA range) */
+ .buffer_size = 256UL * 1024UL * 1024UL,
+ .hugepages = true,
+};
+
TEST_F(iommufd_dirty_tracking, enforce_dirty)
{
uint32_t ioas_id, stddev_id, idev_id;
@@ -1849,65 +1882,80 @@ TEST_F(iommufd_dirty_tracking, device_dirty_capability)
TEST_F(iommufd_dirty_tracking, get_dirty_bitmap)
{
- uint32_t stddev_id;
+ uint32_t page_size = MOCK_PAGE_SIZE;
uint32_t hwpt_id;
uint32_t ioas_id;
+ if (variant->hugepages)
+ page_size = MOCK_HUGE_PAGE_SIZE;
+
test_ioctl_ioas_alloc(&ioas_id);
test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
variant->buffer_size, MOCK_APERTURE_START);
test_cmd_hwpt_alloc(self->idev_id, ioas_id,
IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
- test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
test_cmd_set_dirty_tracking(hwpt_id, true);
test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
- MOCK_APERTURE_START, self->page_size,
+ MOCK_APERTURE_START, self->page_size, page_size,
self->bitmap, self->bitmap_size, 0, _metadata);
/* PAGE_SIZE unaligned bitmap */
test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
- MOCK_APERTURE_START, self->page_size,
+ MOCK_APERTURE_START, self->page_size, page_size,
self->bitmap + MOCK_PAGE_SIZE,
self->bitmap_size, 0, _metadata);
- test_ioctl_destroy(stddev_id);
+ /* u64 unaligned bitmap */
+ test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+ MOCK_APERTURE_START, self->page_size, page_size,
+ self->bitmap + 0xff1, self->bitmap_size, 0,
+ _metadata);
+
test_ioctl_destroy(hwpt_id);
}
TEST_F(iommufd_dirty_tracking, get_dirty_bitmap_no_clear)
{
- uint32_t stddev_id;
+ uint32_t page_size = MOCK_PAGE_SIZE;
uint32_t hwpt_id;
uint32_t ioas_id;
+ if (variant->hugepages)
+ page_size = MOCK_HUGE_PAGE_SIZE;
+
test_ioctl_ioas_alloc(&ioas_id);
test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
variant->buffer_size, MOCK_APERTURE_START);
test_cmd_hwpt_alloc(self->idev_id, ioas_id,
IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
- test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
test_cmd_set_dirty_tracking(hwpt_id, true);
test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
- MOCK_APERTURE_START, self->page_size,
+ MOCK_APERTURE_START, self->page_size, page_size,
self->bitmap, self->bitmap_size,
IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
_metadata);
/* Unaligned bitmap */
test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
- MOCK_APERTURE_START, self->page_size,
+ MOCK_APERTURE_START, self->page_size, page_size,
self->bitmap + MOCK_PAGE_SIZE,
self->bitmap_size,
IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
_metadata);
- test_ioctl_destroy(stddev_id);
+ /* u64 unaligned bitmap */
+ test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+ MOCK_APERTURE_START, self->page_size, page_size,
+ self->bitmap + 0xff1, self->bitmap_size,
+ IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
+ _metadata);
+
test_ioctl_destroy(hwpt_id);
}
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index c646264..8d2b46b 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -344,16 +344,19 @@ static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
page_size, bitmap, nr))
static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
- __u64 iova, size_t page_size, __u64 *bitmap,
+ __u64 iova, size_t page_size,
+ size_t pte_page_size, __u64 *bitmap,
__u64 bitmap_size, __u32 flags,
struct __test_metadata *_metadata)
{
- unsigned long i, nbits = bitmap_size * BITS_PER_BYTE;
- unsigned long nr = nbits / 2;
+ unsigned long npte = pte_page_size / page_size, pteset = 2 * npte;
+ unsigned long nbits = bitmap_size * BITS_PER_BYTE;
+ unsigned long j, i, nr = nbits / pteset ?: 1;
__u64 out_dirty = 0;
/* Mark all even bits as dirty in the mock domain */
- for (i = 0; i < nbits; i += 2)
+ memset(bitmap, 0, bitmap_size);
+ for (i = 0; i < nbits; i += pteset)
set_bit(i, (unsigned long *)bitmap);
test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, page_size,
@@ -365,8 +368,12 @@ static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
test_cmd_get_dirty_bitmap(fd, hwpt_id, length, iova, page_size, bitmap,
flags);
/* Beware ASSERT_EQ() is two statements -- braces are not redundant! */
- for (i = 0; i < nbits; i++) {
- ASSERT_EQ(!(i % 2), test_bit(i, (unsigned long *)bitmap));
+ for (i = 0; i < nbits; i += pteset) {
+ for (j = 0; j < pteset; j++) {
+ ASSERT_EQ(j < npte,
+ test_bit(i + j, (unsigned long *)bitmap));
+ }
+ ASSERT_EQ(!(i % pteset), test_bit(i, (unsigned long *)bitmap));
}
memset(bitmap, 0, bitmap_size);
@@ -374,19 +381,23 @@ static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
flags);
/* It as read already -- expect all zeroes */
- for (i = 0; i < nbits; i++) {
- ASSERT_EQ(!(i % 2) && (flags &
- IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR),
- test_bit(i, (unsigned long *)bitmap));
+ for (i = 0; i < nbits; i += pteset) {
+ for (j = 0; j < pteset; j++) {
+ ASSERT_EQ(
+ (j < npte) &&
+ (flags &
+ IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR),
+ test_bit(i + j, (unsigned long *)bitmap));
+ }
}
return 0;
}
-#define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, bitmap, \
- bitmap_size, flags, _metadata) \
+#define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, pte_size,\
+ bitmap, bitmap_size, flags, _metadata) \
ASSERT_EQ(0, _test_mock_dirty_bitmaps(self->fd, hwpt_id, length, iova, \
- page_size, bitmap, bitmap_size, \
- flags, _metadata))
+ page_size, pte_size, bitmap, \
+ bitmap_size, flags, _metadata))
static int _test_cmd_create_access(int fd, unsigned int ioas_id,
__u32 *access_id, unsigned int flags)
diff --git a/tools/testing/selftests/dt/ktap_helpers.sh b/tools/testing/selftests/kselftest/ktap_helpers.sh
similarity index 66%
rename from tools/testing/selftests/dt/ktap_helpers.sh
rename to tools/testing/selftests/kselftest/ktap_helpers.sh
index 8dfae51..f2fbb91 100644
--- a/tools/testing/selftests/dt/ktap_helpers.sh
+++ b/tools/testing/selftests/kselftest/ktap_helpers.sh
@@ -9,14 +9,27 @@
KTAP_CNT_FAIL=0
KTAP_CNT_SKIP=0
+KSFT_PASS=0
+KSFT_FAIL=1
+KSFT_XFAIL=2
+KSFT_XPASS=3
+KSFT_SKIP=4
+
+KSFT_NUM_TESTS=0
+
ktap_print_header() {
echo "TAP version 13"
}
-ktap_set_plan() {
- num_tests="$1"
+ktap_print_msg()
+{
+ echo "#" $@
+}
- echo "1..$num_tests"
+ktap_set_plan() {
+ KSFT_NUM_TESTS="$1"
+
+ echo "1..$KSFT_NUM_TESTS"
}
ktap_skip_all() {
@@ -65,6 +78,34 @@
KTAP_CNT_FAIL=$((KTAP_CNT_FAIL+1))
}
+ktap_test_result() {
+ description="$1"
+ shift
+
+ if $@; then
+ ktap_test_pass "$description"
+ else
+ ktap_test_fail "$description"
+ fi
+}
+
+ktap_exit_fail_msg() {
+ echo "Bail out! " $@
+ ktap_print_totals
+
+ exit "$KSFT_FAIL"
+}
+
+ktap_finished() {
+ ktap_print_totals
+
+ if [ $(("$KTAP_CNT_PASS" + "$KTAP_CNT_SKIP")) -eq "$KSFT_NUM_TESTS" ]; then
+ exit "$KSFT_PASS"
+ else
+ exit "$KSFT_FAIL"
+ fi
+}
+
ktap_print_totals() {
echo "# Totals: pass:$KTAP_CNT_PASS fail:$KTAP_CNT_FAIL xfail:0 xpass:0 skip:$KTAP_CNT_SKIP error:0"
}
diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c b/tools/testing/selftests/kvm/aarch64/arch_timer.c
index 274b846..2cb8dd1 100644
--- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
+++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
@@ -248,7 +248,7 @@ static void *test_vcpu_run(void *arg)
REPORT_GUEST_ASSERT(uc);
break;
default:
- TEST_FAIL("Unexpected guest exit\n");
+ TEST_FAIL("Unexpected guest exit");
}
return NULL;
@@ -287,7 +287,7 @@ static int test_migrate_vcpu(unsigned int vcpu_idx)
/* Allow the error where the vCPU thread is already finished */
TEST_ASSERT(ret == 0 || ret == ESRCH,
- "Failed to migrate the vCPU:%u to pCPU: %u; ret: %d\n",
+ "Failed to migrate the vCPU:%u to pCPU: %u; ret: %d",
vcpu_idx, new_pcpu, ret);
return ret;
@@ -326,12 +326,12 @@ static void test_run(struct kvm_vm *vm)
pthread_mutex_init(&vcpu_done_map_lock, NULL);
vcpu_done_map = bitmap_zalloc(test_args.nr_vcpus);
- TEST_ASSERT(vcpu_done_map, "Failed to allocate vcpu done bitmap\n");
+ TEST_ASSERT(vcpu_done_map, "Failed to allocate vcpu done bitmap");
for (i = 0; i < (unsigned long)test_args.nr_vcpus; i++) {
ret = pthread_create(&pt_vcpu_run[i], NULL, test_vcpu_run,
(void *)(unsigned long)i);
- TEST_ASSERT(!ret, "Failed to create vCPU-%d pthread\n", i);
+ TEST_ASSERT(!ret, "Failed to create vCPU-%d pthread", i);
}
/* Spawn a thread to control the vCPU migrations */
@@ -340,7 +340,7 @@ static void test_run(struct kvm_vm *vm)
ret = pthread_create(&pt_vcpu_migration, NULL,
test_vcpu_migration, NULL);
- TEST_ASSERT(!ret, "Failed to create the migration pthread\n");
+ TEST_ASSERT(!ret, "Failed to create the migration pthread");
}
@@ -384,7 +384,7 @@ static struct kvm_vm *test_vm_create(void)
if (kvm_has_cap(KVM_CAP_COUNTER_OFFSET))
vm_ioctl(vm, KVM_ARM_SET_COUNTER_OFFSET, &test_args.offset);
else
- TEST_FAIL("no support for global offset\n");
+ TEST_FAIL("no support for global offset");
}
for (i = 0; i < nr_vcpus; i++)
diff --git a/tools/testing/selftests/kvm/aarch64/hypercalls.c b/tools/testing/selftests/kvm/aarch64/hypercalls.c
index 31f66ba..27c10e7 100644
--- a/tools/testing/selftests/kvm/aarch64/hypercalls.c
+++ b/tools/testing/selftests/kvm/aarch64/hypercalls.c
@@ -175,18 +175,18 @@ static void test_fw_regs_before_vm_start(struct kvm_vcpu *vcpu)
/* First 'read' should be an upper limit of the features supported */
vcpu_get_reg(vcpu, reg_info->reg, &val);
TEST_ASSERT(val == FW_REG_ULIMIT_VAL(reg_info->max_feat_bit),
- "Expected all the features to be set for reg: 0x%lx; expected: 0x%lx; read: 0x%lx\n",
+ "Expected all the features to be set for reg: 0x%lx; expected: 0x%lx; read: 0x%lx",
reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit), val);
/* Test a 'write' by disabling all the features of the register map */
ret = __vcpu_set_reg(vcpu, reg_info->reg, 0);
TEST_ASSERT(ret == 0,
- "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
+ "Failed to clear all the features of reg: 0x%lx; ret: %d",
reg_info->reg, errno);
vcpu_get_reg(vcpu, reg_info->reg, &val);
TEST_ASSERT(val == 0,
- "Expected all the features to be cleared for reg: 0x%lx\n", reg_info->reg);
+ "Expected all the features to be cleared for reg: 0x%lx", reg_info->reg);
/*
* Test enabling a feature that's not supported.
@@ -195,7 +195,7 @@ static void test_fw_regs_before_vm_start(struct kvm_vcpu *vcpu)
if (reg_info->max_feat_bit < 63) {
ret = __vcpu_set_reg(vcpu, reg_info->reg, BIT(reg_info->max_feat_bit + 1));
TEST_ASSERT(ret != 0 && errno == EINVAL,
- "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx\n",
+ "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx",
errno, reg_info->reg);
}
}
@@ -216,7 +216,7 @@ static void test_fw_regs_after_vm_start(struct kvm_vcpu *vcpu)
*/
vcpu_get_reg(vcpu, reg_info->reg, &val);
TEST_ASSERT(val == 0,
- "Expected all the features to be cleared for reg: 0x%lx\n",
+ "Expected all the features to be cleared for reg: 0x%lx",
reg_info->reg);
/*
@@ -226,7 +226,7 @@ static void test_fw_regs_after_vm_start(struct kvm_vcpu *vcpu)
*/
ret = __vcpu_set_reg(vcpu, reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit));
TEST_ASSERT(ret != 0 && errno == EBUSY,
- "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx\n",
+ "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx",
errno, reg_info->reg);
}
}
@@ -265,7 +265,7 @@ static void test_guest_stage(struct kvm_vm **vm, struct kvm_vcpu **vcpu)
case TEST_STAGE_HVC_IFACE_FALSE_INFO:
break;
default:
- TEST_FAIL("Unknown test stage: %d\n", prev_stage);
+ TEST_FAIL("Unknown test stage: %d", prev_stage);
}
}
@@ -294,7 +294,7 @@ static void test_run(void)
REPORT_GUEST_ASSERT(uc);
break;
default:
- TEST_FAIL("Unexpected guest exit\n");
+ TEST_FAIL("Unexpected guest exit");
}
}
diff --git a/tools/testing/selftests/kvm/aarch64/page_fault_test.c b/tools/testing/selftests/kvm/aarch64/page_fault_test.c
index 08a5ca5..53fddad 100644
--- a/tools/testing/selftests/kvm/aarch64/page_fault_test.c
+++ b/tools/testing/selftests/kvm/aarch64/page_fault_test.c
@@ -414,10 +414,10 @@ static bool punch_hole_in_backing_store(struct kvm_vm *vm,
if (fd != -1) {
ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
0, paging_size);
- TEST_ASSERT(ret == 0, "fallocate failed\n");
+ TEST_ASSERT(ret == 0, "fallocate failed");
} else {
ret = madvise(hva, paging_size, MADV_DONTNEED);
- TEST_ASSERT(ret == 0, "madvise failed\n");
+ TEST_ASSERT(ret == 0, "madvise failed");
}
return true;
@@ -501,7 +501,7 @@ static bool handle_cmd(struct kvm_vm *vm, int cmd)
void fail_vcpu_run_no_handler(int ret)
{
- TEST_FAIL("Unexpected vcpu run failure\n");
+ TEST_FAIL("Unexpected vcpu run failure");
}
void fail_vcpu_run_mmio_no_syndrome_handler(int ret)
diff --git a/tools/testing/selftests/kvm/aarch64/smccc_filter.c b/tools/testing/selftests/kvm/aarch64/smccc_filter.c
index f4ceae9..2d189f3 100644
--- a/tools/testing/selftests/kvm/aarch64/smccc_filter.c
+++ b/tools/testing/selftests/kvm/aarch64/smccc_filter.c
@@ -178,7 +178,7 @@ static void expect_call_denied(struct kvm_vcpu *vcpu)
struct ucall uc;
if (get_ucall(vcpu, &uc) != UCALL_SYNC)
- TEST_FAIL("Unexpected ucall: %lu\n", uc.cmd);
+ TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
TEST_ASSERT(uc.args[1] == SMCCC_RET_NOT_SUPPORTED,
"Unexpected SMCCC return code: %lu", uc.args[1]);
diff --git a/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c b/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
index 9d51b56..5f97133 100644
--- a/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
+++ b/tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
@@ -517,11 +517,11 @@ static void test_create_vpmu_vm_with_pmcr_n(uint64_t pmcr_n, bool expect_fail)
if (expect_fail)
TEST_ASSERT(pmcr_orig == pmcr,
- "PMCR.N modified by KVM to a larger value (PMCR: 0x%lx) for pmcr_n: 0x%lx\n",
+ "PMCR.N modified by KVM to a larger value (PMCR: 0x%lx) for pmcr_n: 0x%lx",
pmcr, pmcr_n);
else
TEST_ASSERT(pmcr_n == get_pmcr_n(pmcr),
- "Failed to update PMCR.N to %lu (received: %lu)\n",
+ "Failed to update PMCR.N to %lu (received: %lu)",
pmcr_n, get_pmcr_n(pmcr));
}
@@ -594,12 +594,12 @@ static void run_pmregs_validity_test(uint64_t pmcr_n)
*/
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(set_reg_id), ®_val);
TEST_ASSERT((reg_val & (~valid_counters_mask)) == 0,
- "Initial read of set_reg: 0x%llx has unimplemented counters enabled: 0x%lx\n",
+ "Initial read of set_reg: 0x%llx has unimplemented counters enabled: 0x%lx",
KVM_ARM64_SYS_REG(set_reg_id), reg_val);
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(clr_reg_id), ®_val);
TEST_ASSERT((reg_val & (~valid_counters_mask)) == 0,
- "Initial read of clr_reg: 0x%llx has unimplemented counters enabled: 0x%lx\n",
+ "Initial read of clr_reg: 0x%llx has unimplemented counters enabled: 0x%lx",
KVM_ARM64_SYS_REG(clr_reg_id), reg_val);
/*
@@ -611,12 +611,12 @@ static void run_pmregs_validity_test(uint64_t pmcr_n)
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(set_reg_id), ®_val);
TEST_ASSERT((reg_val & (~valid_counters_mask)) == 0,
- "Read of set_reg: 0x%llx has unimplemented counters enabled: 0x%lx\n",
+ "Read of set_reg: 0x%llx has unimplemented counters enabled: 0x%lx",
KVM_ARM64_SYS_REG(set_reg_id), reg_val);
vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(clr_reg_id), ®_val);
TEST_ASSERT((reg_val & (~valid_counters_mask)) == 0,
- "Read of clr_reg: 0x%llx has unimplemented counters enabled: 0x%lx\n",
+ "Read of clr_reg: 0x%llx has unimplemented counters enabled: 0x%lx",
KVM_ARM64_SYS_REG(clr_reg_id), reg_val);
}
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index 09c116a..bf3609f 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -45,10 +45,10 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
/* Let the guest access its memory */
ret = _vcpu_run(vcpu);
- TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
+ TEST_ASSERT(ret == 0, "vcpu_run failed: %d", ret);
if (get_ucall(vcpu, NULL) != UCALL_SYNC) {
TEST_ASSERT(false,
- "Invalid guest sync status: exit_reason=%s\n",
+ "Invalid guest sync status: exit_reason=%s",
exit_reason_str(run->exit_reason));
}
diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c
index d374dbc..504f6fe 100644
--- a/tools/testing/selftests/kvm/dirty_log_perf_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c
@@ -88,9 +88,9 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
ret = _vcpu_run(vcpu);
ts_diff = timespec_elapsed(start);
- TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
+ TEST_ASSERT(ret == 0, "vcpu_run failed: %d", ret);
TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
- "Invalid guest sync status: exit_reason=%s\n",
+ "Invalid guest sync status: exit_reason=%s",
exit_reason_str(run->exit_reason));
pr_debug("Got sync event from vCPU %d\n", vcpu_idx);
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 6cbecf4..eaad5b2 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -262,7 +262,7 @@ static void default_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
"vcpu run failed: errno=%d", err);
TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
- "Invalid guest sync status: exit_reason=%s\n",
+ "Invalid guest sync status: exit_reason=%s",
exit_reason_str(run->exit_reason));
vcpu_handle_sync_stop();
@@ -376,7 +376,10 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
- /* Cleared pages should be the same as collected */
+ /*
+ * Cleared pages should be the same as collected, as KVM is supposed to
+ * clear only the entries that have been harvested.
+ */
TEST_ASSERT(cleared == count, "Reset dirty pages (%u) mismatch "
"with collected (%u)", cleared, count);
@@ -410,17 +413,11 @@ static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
pr_info("vcpu continues now.\n");
} else {
TEST_ASSERT(false, "Invalid guest sync status: "
- "exit_reason=%s\n",
+ "exit_reason=%s",
exit_reason_str(run->exit_reason));
}
}
-static void dirty_ring_before_vcpu_join(void)
-{
- /* Kick another round of vcpu just to make sure it will quit */
- sem_post(&sem_vcpu_cont);
-}
-
struct log_mode {
const char *name;
/* Return true if this mode is supported, otherwise false */
@@ -433,7 +430,6 @@ struct log_mode {
uint32_t *ring_buf_idx);
/* Hook to call when after each vcpu run */
void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
- void (*before_vcpu_join) (void);
} log_modes[LOG_MODE_NUM] = {
{
.name = "dirty-log",
@@ -452,7 +448,6 @@ struct log_mode {
.supported = dirty_ring_supported,
.create_vm_done = dirty_ring_create_vm_done,
.collect_dirty_pages = dirty_ring_collect_dirty_pages,
- .before_vcpu_join = dirty_ring_before_vcpu_join,
.after_vcpu_run = dirty_ring_after_vcpu_run,
},
};
@@ -513,14 +508,6 @@ static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
mode->after_vcpu_run(vcpu, ret, err);
}
-static void log_mode_before_vcpu_join(void)
-{
- struct log_mode *mode = &log_modes[host_log_mode];
-
- if (mode->before_vcpu_join)
- mode->before_vcpu_join();
-}
-
static void generate_random_array(uint64_t *guest_array, uint64_t size)
{
uint64_t i;
@@ -719,6 +706,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
struct kvm_vm *vm;
unsigned long *bmap;
uint32_t ring_buf_idx = 0;
+ int sem_val;
if (!log_mode_supported()) {
print_skip("Log mode '%s' not supported",
@@ -788,12 +776,22 @@ static void run_test(enum vm_guest_mode mode, void *arg)
/* Start the iterations */
iteration = 1;
sync_global_to_guest(vm, iteration);
- host_quit = false;
+ WRITE_ONCE(host_quit, false);
host_dirty_count = 0;
host_clear_count = 0;
host_track_next_count = 0;
WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
+ /*
+ * Ensure the previous iteration didn't leave a dangling semaphore, i.e.
+ * that the main task and vCPU worker were synchronized and completed
+ * verification of all iterations.
+ */
+ sem_getvalue(&sem_vcpu_stop, &sem_val);
+ TEST_ASSERT_EQ(sem_val, 0);
+ sem_getvalue(&sem_vcpu_cont, &sem_val);
+ TEST_ASSERT_EQ(sem_val, 0);
+
pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
while (iteration < p->iterations) {
@@ -819,15 +817,21 @@ static void run_test(enum vm_guest_mode mode, void *arg)
assert(host_log_mode == LOG_MODE_DIRTY_RING ||
atomic_read(&vcpu_sync_stop_requested) == false);
vm_dirty_log_verify(mode, bmap);
- sem_post(&sem_vcpu_cont);
- iteration++;
+ /*
+ * Set host_quit before sem_vcpu_cont in the final iteration to
+ * ensure that the vCPU worker doesn't resume the guest. As
+ * above, the dirty ring test may stop and wait even when not
+ * explicitly request to do so, i.e. would hang waiting for a
+ * "continue" if it's allowed to resume the guest.
+ */
+ if (++iteration == p->iterations)
+ WRITE_ONCE(host_quit, true);
+
+ sem_post(&sem_vcpu_cont);
sync_global_to_guest(vm, iteration);
}
- /* Tell the vcpu thread to quit */
- host_quit = true;
- log_mode_before_vcpu_join();
pthread_join(vcpu_thread, NULL);
pr_info("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), "
diff --git a/tools/testing/selftests/kvm/get-reg-list.c b/tools/testing/selftests/kvm/get-reg-list.c
index 8274ef0..91f05f7 100644
--- a/tools/testing/selftests/kvm/get-reg-list.c
+++ b/tools/testing/selftests/kvm/get-reg-list.c
@@ -152,7 +152,7 @@ static void check_supported(struct vcpu_reg_list *c)
continue;
__TEST_REQUIRE(kvm_has_cap(s->capability),
- "%s: %s not available, skipping tests\n",
+ "%s: %s not available, skipping tests",
config_name(c), s->name);
}
}
diff --git a/tools/testing/selftests/kvm/guest_print_test.c b/tools/testing/selftests/kvm/guest_print_test.c
index 41230b7..3502caa 100644
--- a/tools/testing/selftests/kvm/guest_print_test.c
+++ b/tools/testing/selftests/kvm/guest_print_test.c
@@ -98,7 +98,7 @@ static void ucall_abort(const char *assert_msg, const char *expected_assert_msg)
int offset = len_str - len_substr;
TEST_ASSERT(len_substr <= len_str,
- "Expected '%s' to be a substring of '%s'\n",
+ "Expected '%s' to be a substring of '%s'",
assert_msg, expected_assert_msg);
TEST_ASSERT(strcmp(&assert_msg[offset], expected_assert_msg) == 0,
@@ -116,7 +116,7 @@ static void run_test(struct kvm_vcpu *vcpu, const char *expected_printf,
vcpu_run(vcpu);
TEST_ASSERT(run->exit_reason == UCALL_EXIT_REASON,
- "Unexpected exit reason: %u (%s),\n",
+ "Unexpected exit reason: %u (%s),",
run->exit_reason, exit_reason_str(run->exit_reason));
switch (get_ucall(vcpu, &uc)) {
@@ -161,11 +161,11 @@ static void test_limits(void)
vcpu_run(vcpu);
TEST_ASSERT(run->exit_reason == UCALL_EXIT_REASON,
- "Unexpected exit reason: %u (%s),\n",
+ "Unexpected exit reason: %u (%s),",
run->exit_reason, exit_reason_str(run->exit_reason));
TEST_ASSERT(get_ucall(vcpu, &uc) == UCALL_ABORT,
- "Unexpected ucall command: %lu, Expected: %u (UCALL_ABORT)\n",
+ "Unexpected ucall command: %lu, Expected: %u (UCALL_ABORT)",
uc.cmd, UCALL_ABORT);
kvm_vm_free(vm);
diff --git a/tools/testing/selftests/kvm/hardware_disable_test.c b/tools/testing/selftests/kvm/hardware_disable_test.c
index f5d59b9..decc521f 100644
--- a/tools/testing/selftests/kvm/hardware_disable_test.c
+++ b/tools/testing/selftests/kvm/hardware_disable_test.c
@@ -41,7 +41,7 @@ static void *run_vcpu(void *arg)
vcpu_run(vcpu);
- TEST_ASSERT(false, "%s: exited with reason %d: %s\n",
+ TEST_ASSERT(false, "%s: exited with reason %d: %s",
__func__, run->exit_reason,
exit_reason_str(run->exit_reason));
pthread_exit(NULL);
@@ -55,7 +55,7 @@ static void *sleeping_thread(void *arg)
fd = open("/dev/null", O_RDWR);
close(fd);
}
- TEST_ASSERT(false, "%s: exited\n", __func__);
+ TEST_ASSERT(false, "%s: exited", __func__);
pthread_exit(NULL);
}
@@ -118,7 +118,7 @@ static void run_test(uint32_t run)
for (i = 0; i < VCPU_NUM; ++i)
check_join(threads[i], &b);
/* Should not be reached */
- TEST_ASSERT(false, "%s: [%d] child escaped the ninja\n", __func__, run);
+ TEST_ASSERT(false, "%s: [%d] child escaped the ninja", __func__, run);
}
void wait_for_child_setup(pid_t pid)
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 71a41fa..50a5e31b 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -195,4 +195,6 @@ __printf(3, 4) int guest_snprintf(char *buf, int n, const char *fmt, ...);
char *strdup_printf(const char *fmt, ...) __attribute__((format(printf, 1, 2), nonnull(1)));
+char *sys_get_cur_clocksource(void);
+
#endif /* SELFTEST_KVM_TEST_UTIL_H */
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index a848635..5bca8c9 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1271,4 +1271,6 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
#define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
+bool sys_clocksource_is_based_on_tsc(void);
+
#endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/kvm_create_max_vcpus.c b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
index 31b3cb2..b9e2326 100644
--- a/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
+++ b/tools/testing/selftests/kvm/kvm_create_max_vcpus.c
@@ -65,7 +65,7 @@ int main(int argc, char *argv[])
int r = setrlimit(RLIMIT_NOFILE, &rl);
__TEST_REQUIRE(r >= 0,
- "RLIMIT_NOFILE hard limit is too low (%d, wanted %d)\n",
+ "RLIMIT_NOFILE hard limit is too low (%d, wanted %d)",
old_rlim_max, nr_fds_wanted);
} else {
TEST_ASSERT(!setrlimit(RLIMIT_NOFILE, &rl), "setrlimit() failed!");
diff --git a/tools/testing/selftests/kvm/kvm_page_table_test.c b/tools/testing/selftests/kvm/kvm_page_table_test.c
index e37dc9c..e0ba97a 100644
--- a/tools/testing/selftests/kvm/kvm_page_table_test.c
+++ b/tools/testing/selftests/kvm/kvm_page_table_test.c
@@ -204,9 +204,9 @@ static void *vcpu_worker(void *data)
ret = _vcpu_run(vcpu);
ts_diff = timespec_elapsed(start);
- TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
+ TEST_ASSERT(ret == 0, "vcpu_run failed: %d", ret);
TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
- "Invalid guest sync status: exit_reason=%s\n",
+ "Invalid guest sync status: exit_reason=%s",
exit_reason_str(vcpu->run->exit_reason));
pr_debug("Got sync event from vCPU %d\n", vcpu->id);
diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c b/tools/testing/selftests/kvm/lib/aarch64/processor.c
index 41c776b6..43b9a72 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/processor.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/processor.c
@@ -398,7 +398,7 @@ void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
int i;
TEST_ASSERT(num >= 1 && num <= 8, "Unsupported number of args,\n"
- " num: %u\n", num);
+ " num: %u", num);
va_start(ap, num);
diff --git a/tools/testing/selftests/kvm/lib/aarch64/vgic.c b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
index b5f28d2..184378d5 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/vgic.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/vgic.c
@@ -38,7 +38,7 @@ int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
struct list_head *iter;
unsigned int nr_gic_pages, nr_vcpus_created = 0;
- TEST_ASSERT(nr_vcpus, "Number of vCPUs cannot be empty\n");
+ TEST_ASSERT(nr_vcpus, "Number of vCPUs cannot be empty");
/*
* Make sure that the caller is infact calling this
@@ -47,7 +47,7 @@ int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
list_for_each(iter, &vm->vcpus)
nr_vcpus_created++;
TEST_ASSERT(nr_vcpus == nr_vcpus_created,
- "Number of vCPUs requested (%u) doesn't match with the ones created for the VM (%u)\n",
+ "Number of vCPUs requested (%u) doesn't match with the ones created for the VM (%u)",
nr_vcpus, nr_vcpus_created);
/* Distributor setup */
diff --git a/tools/testing/selftests/kvm/lib/elf.c b/tools/testing/selftests/kvm/lib/elf.c
index 266f387..f34d926d 100644
--- a/tools/testing/selftests/kvm/lib/elf.c
+++ b/tools/testing/selftests/kvm/lib/elf.c
@@ -184,7 +184,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
"Seek to program segment offset failed,\n"
" program header idx: %u errno: %i\n"
" offset_rv: 0x%jx\n"
- " expected: 0x%jx\n",
+ " expected: 0x%jx",
n1, errno, (intmax_t) offset_rv,
(intmax_t) phdr.p_offset);
test_read(fd, addr_gva2hva(vm, phdr.p_vaddr),
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index e066d58..1b19742 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -27,7 +27,8 @@ int open_path_or_exit(const char *path, int flags)
int fd;
fd = open(path, flags);
- __TEST_REQUIRE(fd >= 0, "%s not available (errno: %d)", path, errno);
+ __TEST_REQUIRE(fd >= 0 || errno != ENOENT, "Cannot open %s: %s", path, strerror(errno));
+ TEST_ASSERT(fd >= 0, "Failed to open '%s'", path);
return fd;
}
@@ -320,7 +321,7 @@ static uint64_t vm_nr_pages_required(enum vm_guest_mode mode,
uint64_t nr_pages;
TEST_ASSERT(nr_runnable_vcpus,
- "Use vm_create_barebones() for VMs that _never_ have vCPUs\n");
+ "Use vm_create_barebones() for VMs that _never_ have vCPUs");
TEST_ASSERT(nr_runnable_vcpus <= kvm_check_cap(KVM_CAP_MAX_VCPUS),
"nr_vcpus = %d too large for host, max-vcpus = %d",
@@ -491,7 +492,7 @@ void kvm_pin_this_task_to_pcpu(uint32_t pcpu)
CPU_ZERO(&mask);
CPU_SET(pcpu, &mask);
r = sched_setaffinity(0, sizeof(mask), &mask);
- TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
+ TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.", pcpu);
}
static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
@@ -499,7 +500,7 @@ static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
uint32_t pcpu = atoi_non_negative("CPU number", cpu_str);
TEST_ASSERT(CPU_ISSET(pcpu, allowed_mask),
- "Not allowed to run on pCPU '%d', check cgroups?\n", pcpu);
+ "Not allowed to run on pCPU '%d', check cgroups?", pcpu);
return pcpu;
}
@@ -529,7 +530,7 @@ void kvm_parse_vcpu_pinning(const char *pcpus_string, uint32_t vcpu_to_pcpu[],
int i, r;
cpu_list = strdup(pcpus_string);
- TEST_ASSERT(cpu_list, "strdup() allocation failed.\n");
+ TEST_ASSERT(cpu_list, "strdup() allocation failed.");
r = sched_getaffinity(0, sizeof(allowed_mask), &allowed_mask);
TEST_ASSERT(!r, "sched_getaffinity() failed");
@@ -538,7 +539,7 @@ void kvm_parse_vcpu_pinning(const char *pcpus_string, uint32_t vcpu_to_pcpu[],
/* 1. Get all pcpus for vcpus. */
for (i = 0; i < nr_vcpus; i++) {
- TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'\n", i);
+ TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'", i);
vcpu_to_pcpu[i] = parse_pcpu(cpu, &allowed_mask);
cpu = strtok(NULL, delim);
}
@@ -1057,7 +1058,7 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
" rc: %i errno: %i\n"
" slot: %u flags: 0x%x\n"
- " guest_phys_addr: 0x%lx size: 0x%lx guest_memfd: %d\n",
+ " guest_phys_addr: 0x%lx size: 0x%lx guest_memfd: %d",
ret, errno, slot, flags,
guest_paddr, (uint64_t) region->region.memory_size,
region->region.guest_memfd);
@@ -1222,7 +1223,7 @@ void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t base, uint64_t size,
len = min_t(uint64_t, end - gpa, region->region.memory_size - offset);
ret = fallocate(region->region.guest_memfd, mode, fd_offset, len);
- TEST_ASSERT(!ret, "fallocate() failed to %s at %lx (len = %lu), fd = %d, mode = %x, offset = %lx\n",
+ TEST_ASSERT(!ret, "fallocate() failed to %s at %lx (len = %lu), fd = %d, mode = %x, offset = %lx",
punch_hole ? "punch hole" : "allocate", gpa, len,
region->region.guest_memfd, mode, fd_offset);
}
@@ -1265,7 +1266,7 @@ struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id)
struct kvm_vcpu *vcpu;
/* Confirm a vcpu with the specified id doesn't already exist. */
- TEST_ASSERT(!vcpu_exists(vm, vcpu_id), "vCPU%d already exists\n", vcpu_id);
+ TEST_ASSERT(!vcpu_exists(vm, vcpu_id), "vCPU%d already exists", vcpu_id);
/* Allocate and initialize new vcpu structure. */
vcpu = calloc(1, sizeof(*vcpu));
diff --git a/tools/testing/selftests/kvm/lib/memstress.c b/tools/testing/selftests/kvm/lib/memstress.c
index d05487e..cf2c739 100644
--- a/tools/testing/selftests/kvm/lib/memstress.c
+++ b/tools/testing/selftests/kvm/lib/memstress.c
@@ -192,7 +192,7 @@ struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus,
TEST_ASSERT(guest_num_pages < region_end_gfn,
"Requested more guest memory than address space allows.\n"
" guest pages: %" PRIx64 " max gfn: %" PRIx64
- " nr_vcpus: %d wss: %" PRIx64 "]\n",
+ " nr_vcpus: %d wss: %" PRIx64 "]",
guest_num_pages, region_end_gfn - 1, nr_vcpus, vcpu_memory_bytes);
args->gpa = (region_end_gfn - guest_num_pages - 1) * args->guest_page_size;
diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
index 7ca736f..2bb33a8 100644
--- a/tools/testing/selftests/kvm/lib/riscv/processor.c
+++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
@@ -327,7 +327,7 @@ void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
int i;
TEST_ASSERT(num >= 1 && num <= 8, "Unsupported number of args,\n"
- " num: %u\n", num);
+ " num: %u", num);
va_start(ap, num);
diff --git a/tools/testing/selftests/kvm/lib/s390x/processor.c b/tools/testing/selftests/kvm/lib/s390x/processor.c
index 1594512..f6d2278 100644
--- a/tools/testing/selftests/kvm/lib/s390x/processor.c
+++ b/tools/testing/selftests/kvm/lib/s390x/processor.c
@@ -198,7 +198,7 @@ void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
int i;
TEST_ASSERT(num >= 1 && num <= 5, "Unsupported number of args,\n"
- " num: %u\n",
+ " num: %u",
num);
va_start(ap, num);
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 5d7f28b..5a8f8be 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -392,3 +392,28 @@ char *strdup_printf(const char *fmt, ...)
return str;
}
+
+#define CLOCKSOURCE_PATH "/sys/devices/system/clocksource/clocksource0/current_clocksource"
+
+char *sys_get_cur_clocksource(void)
+{
+ char *clk_name;
+ struct stat st;
+ FILE *fp;
+
+ fp = fopen(CLOCKSOURCE_PATH, "r");
+ TEST_ASSERT(fp, "failed to open clocksource file, errno: %d", errno);
+
+ TEST_ASSERT(!fstat(fileno(fp), &st), "failed to stat clocksource file, errno: %d",
+ errno);
+
+ clk_name = malloc(st.st_size);
+ TEST_ASSERT(clk_name, "failed to allocate buffer to read file");
+
+ TEST_ASSERT(fgets(clk_name, st.st_size, fp), "failed to read clocksource file: %d",
+ ferror(fp));
+
+ fclose(fp);
+
+ return clk_name;
+}
diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
index 271f638..f4eef6e 100644
--- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c
+++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
@@ -69,7 +69,7 @@ static void *uffd_handler_thread_fn(void *arg)
if (pollfd[1].revents & POLLIN) {
r = read(pollfd[1].fd, &tmp_chr, 1);
TEST_ASSERT(r == 1,
- "Error reading pipefd in UFFD thread\n");
+ "Error reading pipefd in UFFD thread");
break;
}
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index d8288374..f639b3e 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -170,10 +170,10 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
* this level.
*/
TEST_ASSERT(current_level != target_level,
- "Cannot create hugepage at level: %u, vaddr: 0x%lx\n",
+ "Cannot create hugepage at level: %u, vaddr: 0x%lx",
current_level, vaddr);
TEST_ASSERT(!(*pte & PTE_LARGE_MASK),
- "Cannot create page table at level: %u, vaddr: 0x%lx\n",
+ "Cannot create page table at level: %u, vaddr: 0x%lx",
current_level, vaddr);
}
return pte;
@@ -220,7 +220,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
/* Fill in page table entry. */
pte = virt_get_pte(vm, pde, vaddr, PG_LEVEL_4K);
TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
- "PTE already present for 4k page at vaddr: 0x%lx\n", vaddr);
+ "PTE already present for 4k page at vaddr: 0x%lx", vaddr);
*pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
}
@@ -253,7 +253,7 @@ static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
if (*pte & PTE_LARGE_MASK) {
TEST_ASSERT(*level == PG_LEVEL_NONE ||
*level == current_level,
- "Unexpected hugepage at level %d\n", current_level);
+ "Unexpected hugepage at level %d", current_level);
*level = current_level;
}
@@ -825,7 +825,7 @@ void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
struct kvm_regs regs;
TEST_ASSERT(num >= 1 && num <= 6, "Unsupported number of args,\n"
- " num: %u\n",
+ " num: %u",
num);
va_start(ap, num);
@@ -1299,3 +1299,14 @@ void kvm_selftest_arch_init(void)
host_cpu_is_intel = this_cpu_is_intel();
host_cpu_is_amd = this_cpu_is_amd();
}
+
+bool sys_clocksource_is_based_on_tsc(void)
+{
+ char *clk_name = sys_get_cur_clocksource();
+ bool ret = !strcmp(clk_name, "tsc\n") ||
+ !strcmp(clk_name, "hyperv_clocksource_tsc_page\n");
+
+ free(clk_name);
+
+ return ret;
+}
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index 59d9753..089b892 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -54,7 +54,7 @@ int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
/* KVM should return supported EVMCS version range */
TEST_ASSERT(((evmcs_ver >> 8) >= (evmcs_ver & 0xff)) &&
(evmcs_ver & 0xff) > 0,
- "Incorrect EVMCS version range: %x:%x\n",
+ "Incorrect EVMCS version range: %x:%x",
evmcs_ver & 0xff, evmcs_ver >> 8);
return evmcs_ver;
@@ -387,10 +387,10 @@ static void nested_create_pte(struct kvm_vm *vm,
* this level.
*/
TEST_ASSERT(current_level != target_level,
- "Cannot create hugepage at level: %u, nested_paddr: 0x%lx\n",
+ "Cannot create hugepage at level: %u, nested_paddr: 0x%lx",
current_level, nested_paddr);
TEST_ASSERT(!pte->page_size,
- "Cannot create page table at level: %u, nested_paddr: 0x%lx\n",
+ "Cannot create page table at level: %u, nested_paddr: 0x%lx",
current_level, nested_paddr);
}
}
diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
index 9855c41..1563619 100644
--- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c
+++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
@@ -45,7 +45,7 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
/* Let the guest access its memory until a stop signal is received */
while (!READ_ONCE(memstress_args.stop_vcpus)) {
ret = _vcpu_run(vcpu);
- TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
+ TEST_ASSERT(ret == 0, "vcpu_run failed: %d", ret);
if (get_ucall(vcpu, NULL) == UCALL_SYNC)
continue;
diff --git a/tools/testing/selftests/kvm/memslot_perf_test.c b/tools/testing/selftests/kvm/memslot_perf_test.c
index 8698d1a..579a64f 100644
--- a/tools/testing/selftests/kvm/memslot_perf_test.c
+++ b/tools/testing/selftests/kvm/memslot_perf_test.c
@@ -175,11 +175,11 @@ static void wait_for_vcpu(void)
struct timespec ts;
TEST_ASSERT(!clock_gettime(CLOCK_REALTIME, &ts),
- "clock_gettime() failed: %d\n", errno);
+ "clock_gettime() failed: %d", errno);
ts.tv_sec += 2;
TEST_ASSERT(!sem_timedwait(&vcpu_ready, &ts),
- "sem_timedwait() failed: %d\n", errno);
+ "sem_timedwait() failed: %d", errno);
}
static void *vm_gpa2hva(struct vm_data *data, uint64_t gpa, uint64_t *rempages)
@@ -336,7 +336,7 @@ static bool prepare_vm(struct vm_data *data, int nslots, uint64_t *maxslots,
gpa = vm_phy_pages_alloc(data->vm, npages, guest_addr, slot);
TEST_ASSERT(gpa == guest_addr,
- "vm_phy_pages_alloc() failed\n");
+ "vm_phy_pages_alloc() failed");
data->hva_slots[slot - 1] = addr_gpa2hva(data->vm, guest_addr);
memset(data->hva_slots[slot - 1], 0, npages * guest_page_size);
diff --git a/tools/testing/selftests/kvm/riscv/get-reg-list.c b/tools/testing/selftests/kvm/riscv/get-reg-list.c
index 6652108..6435e7a 100644
--- a/tools/testing/selftests/kvm/riscv/get-reg-list.c
+++ b/tools/testing/selftests/kvm/riscv/get-reg-list.c
@@ -49,15 +49,42 @@ bool filter_reg(__u64 reg)
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVPBMT:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBA:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBB:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBC:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBKB:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBKC:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBKX:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBS:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZFA:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZFH:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZFHMIN:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICBOM:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICBOZ:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICNTR:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICOND:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICSR:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZIFENCEI:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZIHINTNTL:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZIHINTPAUSE:
case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZIHPM:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKND:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKNE:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKNH:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKR:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKSED:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKSH:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKT:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVBB:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVBC:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVFH:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVFHMIN:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKB:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKG:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKNED:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKNHA:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKNHB:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKSED:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKSH:
+ case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKT:
/*
* Like ISA_EXT registers, SBI_EXT registers are only visible when the
* host supports them and disabling them does not affect the visibility
@@ -150,7 +177,7 @@ void finalize_vcpu(struct kvm_vcpu *vcpu, struct vcpu_reg_list *c)
/* Double check whether the desired extension was enabled */
__TEST_REQUIRE(vcpu_has_ext(vcpu, feature),
- "%s not available, skipping tests\n", s->name);
+ "%s not available, skipping tests", s->name);
}
}
@@ -394,15 +421,42 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
KVM_ISA_EXT_ARR(SVPBMT),
KVM_ISA_EXT_ARR(ZBA),
KVM_ISA_EXT_ARR(ZBB),
+ KVM_ISA_EXT_ARR(ZBC),
+ KVM_ISA_EXT_ARR(ZBKB),
+ KVM_ISA_EXT_ARR(ZBKC),
+ KVM_ISA_EXT_ARR(ZBKX),
KVM_ISA_EXT_ARR(ZBS),
+ KVM_ISA_EXT_ARR(ZFA),
+ KVM_ISA_EXT_ARR(ZFH),
+ KVM_ISA_EXT_ARR(ZFHMIN),
KVM_ISA_EXT_ARR(ZICBOM),
KVM_ISA_EXT_ARR(ZICBOZ),
KVM_ISA_EXT_ARR(ZICNTR),
KVM_ISA_EXT_ARR(ZICOND),
KVM_ISA_EXT_ARR(ZICSR),
KVM_ISA_EXT_ARR(ZIFENCEI),
+ KVM_ISA_EXT_ARR(ZIHINTNTL),
KVM_ISA_EXT_ARR(ZIHINTPAUSE),
KVM_ISA_EXT_ARR(ZIHPM),
+ KVM_ISA_EXT_ARR(ZKND),
+ KVM_ISA_EXT_ARR(ZKNE),
+ KVM_ISA_EXT_ARR(ZKNH),
+ KVM_ISA_EXT_ARR(ZKR),
+ KVM_ISA_EXT_ARR(ZKSED),
+ KVM_ISA_EXT_ARR(ZKSH),
+ KVM_ISA_EXT_ARR(ZKT),
+ KVM_ISA_EXT_ARR(ZVBB),
+ KVM_ISA_EXT_ARR(ZVBC),
+ KVM_ISA_EXT_ARR(ZVFH),
+ KVM_ISA_EXT_ARR(ZVFHMIN),
+ KVM_ISA_EXT_ARR(ZVKB),
+ KVM_ISA_EXT_ARR(ZVKG),
+ KVM_ISA_EXT_ARR(ZVKNED),
+ KVM_ISA_EXT_ARR(ZVKNHA),
+ KVM_ISA_EXT_ARR(ZVKNHB),
+ KVM_ISA_EXT_ARR(ZVKSED),
+ KVM_ISA_EXT_ARR(ZVKSH),
+ KVM_ISA_EXT_ARR(ZVKT),
};
if (reg_off >= ARRAY_SIZE(kvm_isa_ext_reg_name))
@@ -888,15 +942,42 @@ KVM_ISA_EXT_SIMPLE_CONFIG(svnapot, SVNAPOT);
KVM_ISA_EXT_SIMPLE_CONFIG(svpbmt, SVPBMT);
KVM_ISA_EXT_SIMPLE_CONFIG(zba, ZBA);
KVM_ISA_EXT_SIMPLE_CONFIG(zbb, ZBB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zbc, ZBC);
+KVM_ISA_EXT_SIMPLE_CONFIG(zbkb, ZBKB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zbkc, ZBKC);
+KVM_ISA_EXT_SIMPLE_CONFIG(zbkx, ZBKX);
KVM_ISA_EXT_SIMPLE_CONFIG(zbs, ZBS);
+KVM_ISA_EXT_SIMPLE_CONFIG(zfa, ZFA);
+KVM_ISA_EXT_SIMPLE_CONFIG(zfh, ZFH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zfhmin, ZFHMIN);
KVM_ISA_EXT_SUBLIST_CONFIG(zicbom, ZICBOM);
KVM_ISA_EXT_SUBLIST_CONFIG(zicboz, ZICBOZ);
KVM_ISA_EXT_SIMPLE_CONFIG(zicntr, ZICNTR);
KVM_ISA_EXT_SIMPLE_CONFIG(zicond, ZICOND);
KVM_ISA_EXT_SIMPLE_CONFIG(zicsr, ZICSR);
KVM_ISA_EXT_SIMPLE_CONFIG(zifencei, ZIFENCEI);
+KVM_ISA_EXT_SIMPLE_CONFIG(zihintntl, ZIHINTNTL);
KVM_ISA_EXT_SIMPLE_CONFIG(zihintpause, ZIHINTPAUSE);
KVM_ISA_EXT_SIMPLE_CONFIG(zihpm, ZIHPM);
+KVM_ISA_EXT_SIMPLE_CONFIG(zknd, ZKND);
+KVM_ISA_EXT_SIMPLE_CONFIG(zkne, ZKNE);
+KVM_ISA_EXT_SIMPLE_CONFIG(zknh, ZKNH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zkr, ZKR);
+KVM_ISA_EXT_SIMPLE_CONFIG(zksed, ZKSED);
+KVM_ISA_EXT_SIMPLE_CONFIG(zksh, ZKSH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zkt, ZKT);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvbb, ZVBB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvbc, ZVBC);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvfh, ZVFH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvfhmin, ZVFHMIN);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvkb, ZVKB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvkg, ZVKG);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvkned, ZVKNED);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvknha, ZVKNHA);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvknhb, ZVKNHB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvksed, ZVKSED);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvksh, ZVKSH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvkt, ZVKT);
struct vcpu_reg_list *vcpu_configs[] = {
&config_sbi_base,
@@ -914,14 +995,41 @@ struct vcpu_reg_list *vcpu_configs[] = {
&config_svpbmt,
&config_zba,
&config_zbb,
+ &config_zbc,
+ &config_zbkb,
+ &config_zbkc,
+ &config_zbkx,
&config_zbs,
+ &config_zfa,
+ &config_zfh,
+ &config_zfhmin,
&config_zicbom,
&config_zicboz,
&config_zicntr,
&config_zicond,
&config_zicsr,
&config_zifencei,
+ &config_zihintntl,
&config_zihintpause,
&config_zihpm,
+ &config_zknd,
+ &config_zkne,
+ &config_zknh,
+ &config_zkr,
+ &config_zksed,
+ &config_zksh,
+ &config_zkt,
+ &config_zvbb,
+ &config_zvbc,
+ &config_zvfh,
+ &config_zvfhmin,
+ &config_zvkb,
+ &config_zvkg,
+ &config_zvkned,
+ &config_zvknha,
+ &config_zvknhb,
+ &config_zvksed,
+ &config_zvksh,
+ &config_zvkt,
};
int vcpu_configs_n = ARRAY_SIZE(vcpu_configs);
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index f74e76d..28f97fb 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -245,7 +245,7 @@ int main(int argc, char *argv[])
} while (snapshot != atomic_read(&seq_cnt));
TEST_ASSERT(rseq_cpu == cpu,
- "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);
+ "rseq CPU = %d, sched CPU = %d", rseq_cpu, cpu);
}
/*
@@ -256,7 +256,7 @@ int main(int argc, char *argv[])
* migrations given the 1us+ delay in the migration task.
*/
TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
- "Only performed %d KVM_RUNs, task stalled too much?\n", i);
+ "Only performed %d KVM_RUNs, task stalled too much?", i);
pthread_join(migration_thread, NULL);
diff --git a/tools/testing/selftests/kvm/s390x/resets.c b/tools/testing/selftests/kvm/s390x/resets.c
index e41e2cb..357943f 100644
--- a/tools/testing/selftests/kvm/s390x/resets.c
+++ b/tools/testing/selftests/kvm/s390x/resets.c
@@ -78,7 +78,7 @@ static void assert_noirq(struct kvm_vcpu *vcpu)
* (notably, the emergency call interrupt we have injected) should
* be cleared by the resets, so this should be 0.
*/
- TEST_ASSERT(irqs >= 0, "Could not fetch IRQs: errno %d\n", errno);
+ TEST_ASSERT(irqs >= 0, "Could not fetch IRQs: errno %d", errno);
TEST_ASSERT(!irqs, "IRQ pending");
}
@@ -199,7 +199,7 @@ static void inject_irq(struct kvm_vcpu *vcpu)
irq->type = KVM_S390_INT_EMERGENCY;
irq->u.emerg.code = vcpu->id;
irqs = __vcpu_ioctl(vcpu, KVM_S390_SET_IRQ_STATE, &irq_state);
- TEST_ASSERT(irqs >= 0, "Error injecting EMERGENCY IRQ errno %d\n", errno);
+ TEST_ASSERT(irqs >= 0, "Error injecting EMERGENCY IRQ errno %d", errno);
}
static struct kvm_vm *create_vm(struct kvm_vcpu **vcpu)
diff --git a/tools/testing/selftests/kvm/s390x/sync_regs_test.c b/tools/testing/selftests/kvm/s390x/sync_regs_test.c
index 636a70d..43fb25d 100644
--- a/tools/testing/selftests/kvm/s390x/sync_regs_test.c
+++ b/tools/testing/selftests/kvm/s390x/sync_regs_test.c
@@ -39,13 +39,13 @@ static void guest_code(void)
#define REG_COMPARE(reg) \
TEST_ASSERT(left->reg == right->reg, \
"Register " #reg \
- " values did not match: 0x%llx, 0x%llx\n", \
+ " values did not match: 0x%llx, 0x%llx", \
left->reg, right->reg)
#define REG_COMPARE32(reg) \
TEST_ASSERT(left->reg == right->reg, \
"Register " #reg \
- " values did not match: 0x%x, 0x%x\n", \
+ " values did not match: 0x%x, 0x%x", \
left->reg, right->reg)
@@ -82,14 +82,14 @@ void test_read_invalid(struct kvm_vcpu *vcpu)
run->kvm_valid_regs = INVALID_SYNC_FIELD;
rv = _vcpu_run(vcpu);
TEST_ASSERT(rv < 0 && errno == EINVAL,
- "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d\n",
+ "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d",
rv);
run->kvm_valid_regs = 0;
run->kvm_valid_regs = INVALID_SYNC_FIELD | TEST_SYNC_FIELDS;
rv = _vcpu_run(vcpu);
TEST_ASSERT(rv < 0 && errno == EINVAL,
- "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d\n",
+ "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d",
rv);
run->kvm_valid_regs = 0;
}
@@ -103,14 +103,14 @@ void test_set_invalid(struct kvm_vcpu *vcpu)
run->kvm_dirty_regs = INVALID_SYNC_FIELD;
rv = _vcpu_run(vcpu);
TEST_ASSERT(rv < 0 && errno == EINVAL,
- "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d\n",
+ "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d",
rv);
run->kvm_dirty_regs = 0;
run->kvm_dirty_regs = INVALID_SYNC_FIELD | TEST_SYNC_FIELDS;
rv = _vcpu_run(vcpu);
TEST_ASSERT(rv < 0 && errno == EINVAL,
- "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d\n",
+ "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d",
rv);
run->kvm_dirty_regs = 0;
}
@@ -125,12 +125,12 @@ void test_req_and_verify_all_valid_regs(struct kvm_vcpu *vcpu)
/* Request and verify all valid register sets. */
run->kvm_valid_regs = TEST_SYNC_FIELDS;
rv = _vcpu_run(vcpu);
- TEST_ASSERT(rv == 0, "vcpu_run failed: %d\n", rv);
+ TEST_ASSERT(rv == 0, "vcpu_run failed: %d", rv);
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_S390_SIEIC);
TEST_ASSERT(run->s390_sieic.icptcode == 4 &&
(run->s390_sieic.ipa >> 8) == 0x83 &&
(run->s390_sieic.ipb >> 16) == 0x501,
- "Unexpected interception code: ic=%u, ipa=0x%x, ipb=0x%x\n",
+ "Unexpected interception code: ic=%u, ipa=0x%x, ipb=0x%x",
run->s390_sieic.icptcode, run->s390_sieic.ipa,
run->s390_sieic.ipb);
@@ -161,7 +161,7 @@ void test_set_and_verify_various_reg_values(struct kvm_vcpu *vcpu)
}
rv = _vcpu_run(vcpu);
- TEST_ASSERT(rv == 0, "vcpu_run failed: %d\n", rv);
+ TEST_ASSERT(rv == 0, "vcpu_run failed: %d", rv);
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_S390_SIEIC);
TEST_ASSERT(run->s.regs.gprs[11] == 0xBAD1DEA + 1,
"r11 sync regs value incorrect 0x%llx.",
@@ -193,7 +193,7 @@ void test_clear_kvm_dirty_regs_bits(struct kvm_vcpu *vcpu)
run->s.regs.gprs[11] = 0xDEADBEEF;
run->s.regs.diag318 = 0x4B1D;
rv = _vcpu_run(vcpu);
- TEST_ASSERT(rv == 0, "vcpu_run failed: %d\n", rv);
+ TEST_ASSERT(rv == 0, "vcpu_run failed: %d", rv);
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_S390_SIEIC);
TEST_ASSERT(run->s.regs.gprs[11] != 0xDEADBEEF,
"r11 sync regs value incorrect 0x%llx.",
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index 075b80d..06b43ed 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -98,11 +98,11 @@ static void wait_for_vcpu(void)
struct timespec ts;
TEST_ASSERT(!clock_gettime(CLOCK_REALTIME, &ts),
- "clock_gettime() failed: %d\n", errno);
+ "clock_gettime() failed: %d", errno);
ts.tv_sec += 2;
TEST_ASSERT(!sem_timedwait(&vcpu_ready, &ts),
- "sem_timedwait() failed: %d\n", errno);
+ "sem_timedwait() failed: %d", errno);
/* Wait for the vCPU thread to reenter the guest. */
usleep(100000);
@@ -302,7 +302,7 @@ static void test_delete_memory_region(void)
if (run->exit_reason == KVM_EXIT_INTERNAL_ERROR)
TEST_ASSERT(regs.rip >= final_rip_start &&
regs.rip < final_rip_end,
- "Bad rip, expected 0x%lx - 0x%lx, got 0x%llx\n",
+ "Bad rip, expected 0x%lx - 0x%lx, got 0x%llx",
final_rip_start, final_rip_end, regs.rip);
kvm_vm_free(vm);
@@ -367,11 +367,21 @@ static void test_invalid_memory_region_flags(void)
}
if (supported_flags & KVM_MEM_GUEST_MEMFD) {
+ int guest_memfd = vm_create_guest_memfd(vm, MEM_REGION_SIZE, 0);
+
r = __vm_set_user_memory_region2(vm, 0,
KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_GUEST_MEMFD,
- 0, MEM_REGION_SIZE, NULL, 0, 0);
+ 0, MEM_REGION_SIZE, NULL, guest_memfd, 0);
TEST_ASSERT(r && errno == EINVAL,
"KVM_SET_USER_MEMORY_REGION2 should have failed, dirty logging private memory is unsupported");
+
+ r = __vm_set_user_memory_region2(vm, 0,
+ KVM_MEM_READONLY | KVM_MEM_GUEST_MEMFD,
+ 0, MEM_REGION_SIZE, NULL, guest_memfd, 0);
+ TEST_ASSERT(r && errno == EINVAL,
+ "KVM_SET_USER_MEMORY_REGION2 should have failed, read-only GUEST_MEMFD memslots are unsupported");
+
+ close(guest_memfd);
}
}
diff --git a/tools/testing/selftests/kvm/system_counter_offset_test.c b/tools/testing/selftests/kvm/system_counter_offset_test.c
index 7f5b330..513d421 100644
--- a/tools/testing/selftests/kvm/system_counter_offset_test.c
+++ b/tools/testing/selftests/kvm/system_counter_offset_test.c
@@ -108,7 +108,7 @@ static void enter_guest(struct kvm_vcpu *vcpu)
handle_abort(&uc);
return;
default:
- TEST_ASSERT(0, "unhandled ucall %ld\n",
+ TEST_ASSERT(0, "unhandled ucall %ld",
get_ucall(vcpu, &uc));
}
}
diff --git a/tools/testing/selftests/kvm/x86_64/amx_test.c b/tools/testing/selftests/kvm/x86_64/amx_test.c
index 11329e5..eae521f 100644
--- a/tools/testing/selftests/kvm/x86_64/amx_test.c
+++ b/tools/testing/selftests/kvm/x86_64/amx_test.c
@@ -221,7 +221,7 @@ int main(int argc, char *argv[])
vm_vaddr_t amx_cfg, tiledata, xstate;
struct ucall uc;
u32 amx_offset;
- int stage, ret;
+ int ret;
/*
* Note, all off-by-default features must be enabled before anything
@@ -263,7 +263,7 @@ int main(int argc, char *argv[])
memset(addr_gva2hva(vm, xstate), 0, PAGE_SIZE * DIV_ROUND_UP(XSAVE_SIZE, PAGE_SIZE));
vcpu_args_set(vcpu, 3, amx_cfg, tiledata, xstate);
- for (stage = 1; ; stage++) {
+ for (;;) {
vcpu_run(vcpu);
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
@@ -296,7 +296,7 @@ int main(int argc, char *argv[])
void *tiles_data = (void *)addr_gva2hva(vm, tiledata);
/* Only check TMM0 register, 1 tile */
ret = memcmp(amx_start, tiles_data, TILE_SIZE);
- TEST_ASSERT(ret == 0, "memcmp failed, ret=%d\n", ret);
+ TEST_ASSERT(ret == 0, "memcmp failed, ret=%d", ret);
kvm_x86_state_cleanup(state);
break;
case 9:
diff --git a/tools/testing/selftests/kvm/x86_64/cpuid_test.c b/tools/testing/selftests/kvm/x86_64/cpuid_test.c
index 3b34d81..8c579ce 100644
--- a/tools/testing/selftests/kvm/x86_64/cpuid_test.c
+++ b/tools/testing/selftests/kvm/x86_64/cpuid_test.c
@@ -84,7 +84,7 @@ static void compare_cpuids(const struct kvm_cpuid2 *cpuid1,
TEST_ASSERT(e1->function == e2->function &&
e1->index == e2->index && e1->flags == e2->flags,
- "CPUID entries[%d] mismtach: 0x%x.%d.%x vs. 0x%x.%d.%x\n",
+ "CPUID entries[%d] mismtach: 0x%x.%d.%x vs. 0x%x.%d.%x",
i, e1->function, e1->index, e1->flags,
e2->function, e2->index, e2->flags);
@@ -170,7 +170,7 @@ static void test_get_cpuid2(struct kvm_vcpu *vcpu)
vcpu_ioctl(vcpu, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(cpuid->nent == vcpu->cpuid->nent,
- "KVM didn't update nent on success, wanted %u, got %u\n",
+ "KVM didn't update nent on success, wanted %u, got %u",
vcpu->cpuid->nent, cpuid->nent);
for (i = 0; i < vcpu->cpuid->nent; i++) {
diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
index 634c6bf..ee3b384 100644
--- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
+++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
@@ -92,7 +92,6 @@ static void run_test(enum vm_guest_mode mode, void *unused)
uint64_t host_num_pages;
uint64_t pages_per_slot;
int i;
- uint64_t total_4k_pages;
struct kvm_page_stats stats_populated;
struct kvm_page_stats stats_dirty_logging_enabled;
struct kvm_page_stats stats_dirty_pass[ITERATIONS];
@@ -107,6 +106,9 @@ static void run_test(enum vm_guest_mode mode, void *unused)
guest_num_pages = vm_adjust_num_guest_pages(mode, guest_num_pages);
host_num_pages = vm_num_host_pages(mode, guest_num_pages);
pages_per_slot = host_num_pages / SLOTS;
+ TEST_ASSERT_EQ(host_num_pages, pages_per_slot * SLOTS);
+ TEST_ASSERT(!(host_num_pages % 512),
+ "Number of pages, '%lu' not a multiple of 2MiB", host_num_pages);
bitmaps = memstress_alloc_bitmaps(SLOTS, pages_per_slot);
@@ -165,10 +167,8 @@ static void run_test(enum vm_guest_mode mode, void *unused)
memstress_free_bitmaps(bitmaps, SLOTS);
memstress_destroy_vm(vm);
- /* Make assertions about the page counts. */
- total_4k_pages = stats_populated.pages_4k;
- total_4k_pages += stats_populated.pages_2m * 512;
- total_4k_pages += stats_populated.pages_1g * 512 * 512;
+ TEST_ASSERT_EQ((stats_populated.pages_2m * 512 +
+ stats_populated.pages_1g * 512 * 512), host_num_pages);
/*
* Check that all huge pages were split. Since large pages can only
@@ -180,19 +180,22 @@ static void run_test(enum vm_guest_mode mode, void *unused)
*/
if (dirty_log_manual_caps) {
TEST_ASSERT_EQ(stats_clear_pass[0].hugepages, 0);
- TEST_ASSERT_EQ(stats_clear_pass[0].pages_4k, total_4k_pages);
+ TEST_ASSERT(stats_clear_pass[0].pages_4k >= host_num_pages,
+ "Expected at least '%lu' 4KiB pages, found only '%lu'",
+ host_num_pages, stats_clear_pass[0].pages_4k);
TEST_ASSERT_EQ(stats_dirty_logging_enabled.hugepages, stats_populated.hugepages);
} else {
TEST_ASSERT_EQ(stats_dirty_logging_enabled.hugepages, 0);
- TEST_ASSERT_EQ(stats_dirty_logging_enabled.pages_4k, total_4k_pages);
+ TEST_ASSERT(stats_dirty_logging_enabled.pages_4k >= host_num_pages,
+ "Expected at least '%lu' 4KiB pages, found only '%lu'",
+ host_num_pages, stats_dirty_logging_enabled.pages_4k);
}
/*
* Once dirty logging is disabled and the vCPUs have touched all their
- * memory again, the page counts should be the same as they were
+ * memory again, the hugepage counts should be the same as they were
* right after initial population of memory.
*/
- TEST_ASSERT_EQ(stats_populated.pages_4k, stats_repopulated.pages_4k);
TEST_ASSERT_EQ(stats_populated.pages_2m, stats_repopulated.pages_2m);
TEST_ASSERT_EQ(stats_populated.pages_1g, stats_repopulated.pages_1g);
}
diff --git a/tools/testing/selftests/kvm/x86_64/flds_emulation.h b/tools/testing/selftests/kvm/x86_64/flds_emulation.h
index 0a1573d..37b1a9f 100644
--- a/tools/testing/selftests/kvm/x86_64/flds_emulation.h
+++ b/tools/testing/selftests/kvm/x86_64/flds_emulation.h
@@ -41,7 +41,7 @@ static inline void handle_flds_emulation_failure_exit(struct kvm_vcpu *vcpu)
insn_bytes = run->emulation_failure.insn_bytes;
TEST_ASSERT(insn_bytes[0] == 0xd9 && insn_bytes[1] == 0,
- "Expected 'flds [eax]', opcode '0xd9 0x00', got opcode 0x%02x 0x%02x\n",
+ "Expected 'flds [eax]', opcode '0xd9 0x00', got opcode 0x%02x 0x%02x",
insn_bytes[0], insn_bytes[1]);
vcpu_regs_get(vcpu, ®s);
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
index f5e1e98..e058bc6 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
@@ -212,6 +212,7 @@ int main(void)
int stage;
TEST_REQUIRE(kvm_has_cap(KVM_CAP_HYPERV_TIME));
+ TEST_REQUIRE(sys_clocksource_is_based_on_tsc());
vm = vm_create_with_one_vcpu(&vcpu, guest_main);
@@ -220,7 +221,7 @@ int main(void)
tsc_page_gva = vm_vaddr_alloc_page(vm);
memset(addr_gva2hva(vm, tsc_page_gva), 0x0, getpagesize());
TEST_ASSERT((addr_gva2gpa(vm, tsc_page_gva) & (getpagesize() - 1)) == 0,
- "TSC page has to be page aligned\n");
+ "TSC page has to be page aligned");
vcpu_args_set(vcpu, 2, tsc_page_gva, addr_gva2gpa(vm, tsc_page_gva));
host_check_tsc_msr_rdtsc(vcpu);
@@ -237,7 +238,7 @@ int main(void)
break;
case UCALL_DONE:
/* Keep in sync with guest_main() */
- TEST_ASSERT(stage == 11, "Testing ended prematurely, stage %d\n",
+ TEST_ASSERT(stage == 11, "Testing ended prematurely, stage %d",
stage);
goto out;
default:
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_features.c b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
index 4f4193f..b923a28 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_features.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_features.c
@@ -454,7 +454,7 @@ static void guest_test_msrs_access(void)
case 44:
/* MSR is not available when CPUID feature bit is unset */
if (!has_invtsc)
- continue;
+ goto next_stage;
msr->idx = HV_X64_MSR_TSC_INVARIANT_CONTROL;
msr->write = false;
msr->fault_expected = true;
@@ -462,7 +462,7 @@ static void guest_test_msrs_access(void)
case 45:
/* MSR is vailable when CPUID feature bit is set */
if (!has_invtsc)
- continue;
+ goto next_stage;
vcpu_set_cpuid_feature(vcpu, HV_ACCESS_TSC_INVARIANT);
msr->idx = HV_X64_MSR_TSC_INVARIANT_CONTROL;
msr->write = false;
@@ -471,7 +471,7 @@ static void guest_test_msrs_access(void)
case 46:
/* Writing bits other than 0 is forbidden */
if (!has_invtsc)
- continue;
+ goto next_stage;
msr->idx = HV_X64_MSR_TSC_INVARIANT_CONTROL;
msr->write = true;
msr->write_val = 0xdeadbeef;
@@ -480,7 +480,7 @@ static void guest_test_msrs_access(void)
case 47:
/* Setting bit 0 enables the feature */
if (!has_invtsc)
- continue;
+ goto next_stage;
msr->idx = HV_X64_MSR_TSC_INVARIANT_CONTROL;
msr->write = true;
msr->write_val = 1;
@@ -513,6 +513,7 @@ static void guest_test_msrs_access(void)
return;
}
+next_stage:
stage++;
kvm_vm_free(vm);
}
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c b/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
index 65e5f4c..f161776 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
@@ -289,7 +289,7 @@ int main(int argc, char *argv[])
switch (get_ucall(vcpu[0], &uc)) {
case UCALL_SYNC:
TEST_ASSERT(uc.args[1] == stage,
- "Unexpected stage: %ld (%d expected)\n",
+ "Unexpected stage: %ld (%d expected)",
uc.args[1], stage);
break;
case UCALL_DONE:
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
index c4443f7..05b5609 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
@@ -658,7 +658,7 @@ int main(int argc, char *argv[])
switch (get_ucall(vcpu[0], &uc)) {
case UCALL_SYNC:
TEST_ASSERT(uc.args[1] == stage,
- "Unexpected stage: %ld (%d expected)\n",
+ "Unexpected stage: %ld (%d expected)",
uc.args[1], stage);
break;
case UCALL_ABORT:
diff --git a/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c b/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
index 1778704..5bc1222 100644
--- a/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
+++ b/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
@@ -92,7 +92,7 @@ static void setup_clock(struct kvm_vm *vm, struct test_case *test_case)
break;
} while (errno == EINTR);
- TEST_ASSERT(!r, "clock_gettime() failed: %d\n", r);
+ TEST_ASSERT(!r, "clock_gettime() failed: %d", r);
data.realtime = ts.tv_sec * NSEC_PER_SEC;
data.realtime += ts.tv_nsec;
@@ -127,47 +127,11 @@ static void enter_guest(struct kvm_vcpu *vcpu)
handle_abort(&uc);
return;
default:
- TEST_ASSERT(0, "unhandled ucall: %ld\n", uc.cmd);
+ TEST_ASSERT(0, "unhandled ucall: %ld", uc.cmd);
}
}
}
-#define CLOCKSOURCE_PATH "/sys/devices/system/clocksource/clocksource0/current_clocksource"
-
-static void check_clocksource(void)
-{
- char *clk_name;
- struct stat st;
- FILE *fp;
-
- fp = fopen(CLOCKSOURCE_PATH, "r");
- if (!fp) {
- pr_info("failed to open clocksource file: %d; assuming TSC.\n",
- errno);
- return;
- }
-
- if (fstat(fileno(fp), &st)) {
- pr_info("failed to stat clocksource file: %d; assuming TSC.\n",
- errno);
- goto out;
- }
-
- clk_name = malloc(st.st_size);
- TEST_ASSERT(clk_name, "failed to allocate buffer to read file\n");
-
- if (!fgets(clk_name, st.st_size, fp)) {
- pr_info("failed to read clocksource file: %d; assuming TSC.\n",
- ferror(fp));
- goto out;
- }
-
- TEST_ASSERT(!strncmp(clk_name, "tsc\n", st.st_size),
- "clocksource not supported: %s", clk_name);
-out:
- fclose(fp);
-}
-
int main(void)
{
struct kvm_vcpu *vcpu;
@@ -179,7 +143,7 @@ int main(void)
flags = kvm_check_cap(KVM_CAP_ADJUST_CLOCK);
TEST_REQUIRE(flags & KVM_CLOCK_REALTIME);
- check_clocksource();
+ TEST_REQUIRE(sys_clocksource_is_based_on_tsc());
vm = vm_create_with_one_vcpu(&vcpu, guest_main);
diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
index 83e25bc..17bbb96 100644
--- a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
+++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
@@ -257,9 +257,9 @@ int main(int argc, char **argv)
TEST_REQUIRE(kvm_has_cap(KVM_CAP_VM_DISABLE_NX_HUGE_PAGES));
__TEST_REQUIRE(token == MAGIC_TOKEN,
- "This test must be run with the magic token %d.\n"
- "This is done by nx_huge_pages_test.sh, which\n"
- "also handles environment setup for the test.", MAGIC_TOKEN);
+ "This test must be run with the magic token via '-t %d'.\n"
+ "Running via nx_huge_pages_test.sh, which also handles "
+ "environment setup, is strongly recommended.", MAGIC_TOKEN);
run_test(reclaim_period_ms, false, reboot_permissions);
run_test(reclaim_period_ms, true, reboot_permissions);
diff --git a/tools/testing/selftests/kvm/x86_64/platform_info_test.c b/tools/testing/selftests/kvm/x86_64/platform_info_test.c
index c9a0796..8701196 100644
--- a/tools/testing/selftests/kvm/x86_64/platform_info_test.c
+++ b/tools/testing/selftests/kvm/x86_64/platform_info_test.c
@@ -44,7 +44,7 @@ static void test_msr_platform_info_enabled(struct kvm_vcpu *vcpu)
get_ucall(vcpu, &uc);
TEST_ASSERT(uc.cmd == UCALL_SYNC,
- "Received ucall other than UCALL_SYNC: %lu\n", uc.cmd);
+ "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
TEST_ASSERT((uc.args[1] & MSR_PLATFORM_INFO_MAX_TURBO_RATIO) ==
MSR_PLATFORM_INFO_MAX_TURBO_RATIO,
"Expected MSR_PLATFORM_INFO to have max turbo ratio mask: %i.",
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index 283cc55..a3bd54b 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -866,7 +866,7 @@ static void __test_fixed_counter_bitmap(struct kvm_vcpu *vcpu, uint8_t idx,
* userspace doesn't set any pmu filter.
*/
count = run_vcpu_to_sync(vcpu);
- TEST_ASSERT(count, "Unexpected count value: %ld\n", count);
+ TEST_ASSERT(count, "Unexpected count value: %ld", count);
for (i = 0; i < BIT(nr_fixed_counters); i++) {
bitmap = BIT(i);
diff --git a/tools/testing/selftests/kvm/x86_64/sev_migrate_tests.c b/tools/testing/selftests/kvm/x86_64/sev_migrate_tests.c
index c7ef975..a49828a 100644
--- a/tools/testing/selftests/kvm/x86_64/sev_migrate_tests.c
+++ b/tools/testing/selftests/kvm/x86_64/sev_migrate_tests.c
@@ -91,7 +91,7 @@ static void sev_migrate_from(struct kvm_vm *dst, struct kvm_vm *src)
int ret;
ret = __sev_migrate_from(dst, src);
- TEST_ASSERT(!ret, "Migration failed, ret: %d, errno: %d\n", ret, errno);
+ TEST_ASSERT(!ret, "Migration failed, ret: %d, errno: %d", ret, errno);
}
static void test_sev_migrate_from(bool es)
@@ -113,7 +113,7 @@ static void test_sev_migrate_from(bool es)
/* Migrate the guest back to the original VM. */
ret = __sev_migrate_from(src_vm, dst_vms[NR_MIGRATE_TEST_VMS - 1]);
TEST_ASSERT(ret == -1 && errno == EIO,
- "VM that was migrated from should be dead. ret %d, errno: %d\n", ret,
+ "VM that was migrated from should be dead. ret %d, errno: %d", ret,
errno);
kvm_vm_free(src_vm);
@@ -172,7 +172,7 @@ static void test_sev_migrate_parameters(void)
vm_no_sev = aux_vm_create(true);
ret = __sev_migrate_from(vm_no_vcpu, vm_no_sev);
TEST_ASSERT(ret == -1 && errno == EINVAL,
- "Migrations require SEV enabled. ret %d, errno: %d\n", ret,
+ "Migrations require SEV enabled. ret %d, errno: %d", ret,
errno);
if (!have_sev_es)
@@ -187,25 +187,25 @@ static void test_sev_migrate_parameters(void)
ret = __sev_migrate_from(sev_vm, sev_es_vm);
TEST_ASSERT(
ret == -1 && errno == EINVAL,
- "Should not be able migrate to SEV enabled VM. ret: %d, errno: %d\n",
+ "Should not be able migrate to SEV enabled VM. ret: %d, errno: %d",
ret, errno);
ret = __sev_migrate_from(sev_es_vm, sev_vm);
TEST_ASSERT(
ret == -1 && errno == EINVAL,
- "Should not be able migrate to SEV-ES enabled VM. ret: %d, errno: %d\n",
+ "Should not be able migrate to SEV-ES enabled VM. ret: %d, errno: %d",
ret, errno);
ret = __sev_migrate_from(vm_no_vcpu, sev_es_vm);
TEST_ASSERT(
ret == -1 && errno == EINVAL,
- "SEV-ES migrations require same number of vCPUS. ret: %d, errno: %d\n",
+ "SEV-ES migrations require same number of vCPUS. ret: %d, errno: %d",
ret, errno);
ret = __sev_migrate_from(vm_no_vcpu, sev_es_vm_no_vmsa);
TEST_ASSERT(
ret == -1 && errno == EINVAL,
- "SEV-ES migrations require UPDATE_VMSA. ret %d, errno: %d\n",
+ "SEV-ES migrations require UPDATE_VMSA. ret %d, errno: %d",
ret, errno);
kvm_vm_free(sev_vm);
@@ -227,7 +227,7 @@ static void sev_mirror_create(struct kvm_vm *dst, struct kvm_vm *src)
int ret;
ret = __sev_mirror_create(dst, src);
- TEST_ASSERT(!ret, "Copying context failed, ret: %d, errno: %d\n", ret, errno);
+ TEST_ASSERT(!ret, "Copying context failed, ret: %d, errno: %d", ret, errno);
}
static void verify_mirror_allowed_cmds(int vm_fd)
@@ -259,7 +259,7 @@ static void verify_mirror_allowed_cmds(int vm_fd)
ret = __sev_ioctl(vm_fd, cmd_id, NULL, &fw_error);
TEST_ASSERT(
ret == -1 && errno == EINVAL,
- "Should not be able call command: %d. ret: %d, errno: %d\n",
+ "Should not be able call command: %d. ret: %d, errno: %d",
cmd_id, ret, errno);
}
@@ -301,18 +301,18 @@ static void test_sev_mirror_parameters(void)
ret = __sev_mirror_create(sev_vm, sev_vm);
TEST_ASSERT(
ret == -1 && errno == EINVAL,
- "Should not be able copy context to self. ret: %d, errno: %d\n",
+ "Should not be able copy context to self. ret: %d, errno: %d",
ret, errno);
ret = __sev_mirror_create(vm_no_vcpu, vm_with_vcpu);
TEST_ASSERT(ret == -1 && errno == EINVAL,
- "Copy context requires SEV enabled. ret %d, errno: %d\n", ret,
+ "Copy context requires SEV enabled. ret %d, errno: %d", ret,
errno);
ret = __sev_mirror_create(vm_with_vcpu, sev_vm);
TEST_ASSERT(
ret == -1 && errno == EINVAL,
- "SEV copy context requires no vCPUS on the destination. ret: %d, errno: %d\n",
+ "SEV copy context requires no vCPUS on the destination. ret: %d, errno: %d",
ret, errno);
if (!have_sev_es)
@@ -322,13 +322,13 @@ static void test_sev_mirror_parameters(void)
ret = __sev_mirror_create(sev_vm, sev_es_vm);
TEST_ASSERT(
ret == -1 && errno == EINVAL,
- "Should not be able copy context to SEV enabled VM. ret: %d, errno: %d\n",
+ "Should not be able copy context to SEV enabled VM. ret: %d, errno: %d",
ret, errno);
ret = __sev_mirror_create(sev_es_vm, sev_vm);
TEST_ASSERT(
ret == -1 && errno == EINVAL,
- "Should not be able copy context to SEV-ES enabled VM. ret: %d, errno: %d\n",
+ "Should not be able copy context to SEV-ES enabled VM. ret: %d, errno: %d",
ret, errno);
kvm_vm_free(sev_es_vm);
diff --git a/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c b/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
index 06edf00..1a46dd7 100644
--- a/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
@@ -74,7 +74,7 @@ int main(int argc, char *argv[])
MEM_REGION_SIZE / PAGE_SIZE, 0);
gpa = vm_phy_pages_alloc(vm, MEM_REGION_SIZE / PAGE_SIZE,
MEM_REGION_GPA, MEM_REGION_SLOT);
- TEST_ASSERT(gpa == MEM_REGION_GPA, "Failed vm_phy_pages_alloc\n");
+ TEST_ASSERT(gpa == MEM_REGION_GPA, "Failed vm_phy_pages_alloc");
virt_map(vm, MEM_REGION_GVA, MEM_REGION_GPA, 1);
hva = addr_gpa2hva(vm, MEM_REGION_GPA);
memset(hva, 0, PAGE_SIZE);
@@ -102,7 +102,7 @@ int main(int argc, char *argv[])
case UCALL_DONE:
break;
default:
- TEST_FAIL("Unrecognized ucall: %lu\n", uc.cmd);
+ TEST_FAIL("Unrecognized ucall: %lu", uc.cmd);
}
kvm_vm_free(vm);
diff --git a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
index 00965ba..a91b5b1 100644
--- a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
@@ -46,7 +46,7 @@ static void compare_regs(struct kvm_regs *left, struct kvm_regs *right)
#define REG_COMPARE(reg) \
TEST_ASSERT(left->reg == right->reg, \
"Register " #reg \
- " values did not match: 0x%llx, 0x%llx\n", \
+ " values did not match: 0x%llx, 0x%llx", \
left->reg, right->reg)
REG_COMPARE(rax);
REG_COMPARE(rbx);
@@ -230,14 +230,14 @@ int main(int argc, char *argv[])
run->kvm_valid_regs = INVALID_SYNC_FIELD;
rv = _vcpu_run(vcpu);
TEST_ASSERT(rv < 0 && errno == EINVAL,
- "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d\n",
+ "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d",
rv);
run->kvm_valid_regs = 0;
run->kvm_valid_regs = INVALID_SYNC_FIELD | TEST_SYNC_FIELDS;
rv = _vcpu_run(vcpu);
TEST_ASSERT(rv < 0 && errno == EINVAL,
- "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d\n",
+ "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d",
rv);
run->kvm_valid_regs = 0;
@@ -245,14 +245,14 @@ int main(int argc, char *argv[])
run->kvm_dirty_regs = INVALID_SYNC_FIELD;
rv = _vcpu_run(vcpu);
TEST_ASSERT(rv < 0 && errno == EINVAL,
- "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d\n",
+ "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d",
rv);
run->kvm_dirty_regs = 0;
run->kvm_dirty_regs = INVALID_SYNC_FIELD | TEST_SYNC_FIELDS;
rv = _vcpu_run(vcpu);
TEST_ASSERT(rv < 0 && errno == EINVAL,
- "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d\n",
+ "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d",
rv);
run->kvm_dirty_regs = 0;
diff --git a/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
index 0ed32ec..dcbb3c2 100644
--- a/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
+++ b/tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
@@ -143,7 +143,7 @@ static void run_vcpu_expect_gp(struct kvm_vcpu *vcpu)
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
TEST_ASSERT(get_ucall(vcpu, &uc) == UCALL_SYNC,
- "Expect UCALL_SYNC\n");
+ "Expect UCALL_SYNC");
TEST_ASSERT(uc.args[1] == SYNC_GP, "#GP is expected.");
printf("vCPU received GP in guest.\n");
}
@@ -188,7 +188,7 @@ static void *run_ucna_injection(void *arg)
TEST_ASSERT_KVM_EXIT_REASON(params->vcpu, KVM_EXIT_IO);
TEST_ASSERT(get_ucall(params->vcpu, &uc) == UCALL_SYNC,
- "Expect UCALL_SYNC\n");
+ "Expect UCALL_SYNC");
TEST_ASSERT(uc.args[1] == SYNC_FIRST_UCNA, "Injecting first UCNA.");
printf("Injecting first UCNA at %#x.\n", FIRST_UCNA_ADDR);
@@ -198,7 +198,7 @@ static void *run_ucna_injection(void *arg)
TEST_ASSERT_KVM_EXIT_REASON(params->vcpu, KVM_EXIT_IO);
TEST_ASSERT(get_ucall(params->vcpu, &uc) == UCALL_SYNC,
- "Expect UCALL_SYNC\n");
+ "Expect UCALL_SYNC");
TEST_ASSERT(uc.args[1] == SYNC_SECOND_UCNA, "Injecting second UCNA.");
printf("Injecting second UCNA at %#x.\n", SECOND_UCNA_ADDR);
@@ -208,7 +208,7 @@ static void *run_ucna_injection(void *arg)
TEST_ASSERT_KVM_EXIT_REASON(params->vcpu, KVM_EXIT_IO);
if (get_ucall(params->vcpu, &uc) == UCALL_ABORT) {
- TEST_ASSERT(false, "vCPU assertion failure: %s.\n",
+ TEST_ASSERT(false, "vCPU assertion failure: %s.",
(const char *)uc.args[0]);
}
diff --git a/tools/testing/selftests/kvm/x86_64/userspace_io_test.c b/tools/testing/selftests/kvm/x86_64/userspace_io_test.c
index 255c50b..9481cbc 100644
--- a/tools/testing/selftests/kvm/x86_64/userspace_io_test.c
+++ b/tools/testing/selftests/kvm/x86_64/userspace_io_test.c
@@ -71,7 +71,7 @@ int main(int argc, char *argv[])
break;
TEST_ASSERT(run->io.port == 0x80,
- "Expected I/O at port 0x80, got port 0x%x\n", run->io.port);
+ "Expected I/O at port 0x80, got port 0x%x", run->io.port);
/*
* Modify the rep string count in RCX: 2 => 1 and 3 => 8192.
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_apic_access_test.c b/tools/testing/selftests/kvm/x86_64/vmx_apic_access_test.c
index 2bed5fb..a81a247 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_apic_access_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_apic_access_test.c
@@ -99,7 +99,7 @@ int main(int argc, char *argv[])
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_INTERNAL_ERROR);
TEST_ASSERT(run->internal.suberror ==
KVM_INTERNAL_ERROR_EMULATION,
- "Got internal suberror other than KVM_INTERNAL_ERROR_EMULATION: %u\n",
+ "Got internal suberror other than KVM_INTERNAL_ERROR_EMULATION: %u",
run->internal.suberror);
break;
}
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c
index e4ad5fe..7f6f5f2 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c
@@ -128,17 +128,17 @@ int main(int argc, char *argv[])
*/
kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
if (uc.args[1]) {
- TEST_ASSERT(test_bit(0, bmap), "Page 0 incorrectly reported clean\n");
- TEST_ASSERT(host_test_mem[0] == 1, "Page 0 not written by guest\n");
+ TEST_ASSERT(test_bit(0, bmap), "Page 0 incorrectly reported clean");
+ TEST_ASSERT(host_test_mem[0] == 1, "Page 0 not written by guest");
} else {
- TEST_ASSERT(!test_bit(0, bmap), "Page 0 incorrectly reported dirty\n");
- TEST_ASSERT(host_test_mem[0] == 0xaaaaaaaaaaaaaaaaULL, "Page 0 written by guest\n");
+ TEST_ASSERT(!test_bit(0, bmap), "Page 0 incorrectly reported dirty");
+ TEST_ASSERT(host_test_mem[0] == 0xaaaaaaaaaaaaaaaaULL, "Page 0 written by guest");
}
- TEST_ASSERT(!test_bit(1, bmap), "Page 1 incorrectly reported dirty\n");
- TEST_ASSERT(host_test_mem[4096 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 1 written by guest\n");
- TEST_ASSERT(!test_bit(2, bmap), "Page 2 incorrectly reported dirty\n");
- TEST_ASSERT(host_test_mem[8192 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 2 written by guest\n");
+ TEST_ASSERT(!test_bit(1, bmap), "Page 1 incorrectly reported dirty");
+ TEST_ASSERT(host_test_mem[4096 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 1 written by guest");
+ TEST_ASSERT(!test_bit(2, bmap), "Page 2 incorrectly reported dirty");
+ TEST_ASSERT(host_test_mem[8192 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 2 written by guest");
break;
case UCALL_DONE:
done = true;
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_exception_with_invalid_guest_state.c b/tools/testing/selftests/kvm/x86_64/vmx_exception_with_invalid_guest_state.c
index a9b827c..fad3634 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_exception_with_invalid_guest_state.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_exception_with_invalid_guest_state.c
@@ -28,7 +28,7 @@ static void __run_vcpu_with_invalid_state(struct kvm_vcpu *vcpu)
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_INTERNAL_ERROR);
TEST_ASSERT(run->emulation_failure.suberror == KVM_INTERNAL_ERROR_EMULATION,
- "Expected emulation failure, got %d\n",
+ "Expected emulation failure, got %d",
run->emulation_failure.suberror);
}
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
index e710b6e..1759fa5 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
@@ -116,23 +116,6 @@ static void l1_guest_code(struct vmx_pages *vmx_pages)
GUEST_DONE();
}
-static bool system_has_stable_tsc(void)
-{
- bool tsc_is_stable;
- FILE *fp;
- char buf[4];
-
- fp = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
- if (fp == NULL)
- return false;
-
- tsc_is_stable = fgets(buf, sizeof(buf), fp) &&
- !strncmp(buf, "tsc", sizeof(buf));
-
- fclose(fp);
- return tsc_is_stable;
-}
-
int main(int argc, char *argv[])
{
struct kvm_vcpu *vcpu;
@@ -148,7 +131,7 @@ int main(int argc, char *argv[])
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
TEST_REQUIRE(kvm_has_cap(KVM_CAP_TSC_CONTROL));
- TEST_REQUIRE(system_has_stable_tsc());
+ TEST_REQUIRE(sys_clocksource_is_based_on_tsc());
/*
* We set L1's scale factor to be a random number from 2 to 10.
diff --git a/tools/testing/selftests/kvm/x86_64/xapic_ipi_test.c b/tools/testing/selftests/kvm/x86_64/xapic_ipi_test.c
index 67ac2a3..725c206 100644
--- a/tools/testing/selftests/kvm/x86_64/xapic_ipi_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xapic_ipi_test.c
@@ -216,7 +216,7 @@ static void *vcpu_thread(void *arg)
"Halting vCPU halted %lu times, woke %lu times, received %lu IPIs.\n"
"Halter TPR=%#x PPR=%#x LVR=%#x\n"
"Migrations attempted: %lu\n"
- "Migrations completed: %lu\n",
+ "Migrations completed: %lu",
vcpu->id, (const char *)uc.args[0],
params->data->ipis_sent, params->data->hlt_count,
params->data->wake_count,
@@ -288,7 +288,7 @@ void do_migrations(struct test_data_page *data, int run_secs, int delay_usecs,
}
TEST_ASSERT(nodes > 1,
- "Did not find at least 2 numa nodes. Can't do migration\n");
+ "Did not find at least 2 numa nodes. Can't do migration");
fprintf(stderr, "Migrating amongst %d nodes found\n", nodes);
@@ -347,7 +347,7 @@ void do_migrations(struct test_data_page *data, int run_secs, int delay_usecs,
wake_count != data->wake_count,
"IPI, HLT and wake count have not increased "
"in the last %lu seconds. "
- "HLTer is likely hung.\n", interval_secs);
+ "HLTer is likely hung.", interval_secs);
ipis_sent = data->ipis_sent;
hlt_count = data->hlt_count;
@@ -381,7 +381,7 @@ void get_cmdline_args(int argc, char *argv[], int *run_secs,
"-m adds calls to migrate_pages while vCPUs are running."
" Default is no migrations.\n"
"-d <delay microseconds> - delay between migrate_pages() calls."
- " Default is %d microseconds.\n",
+ " Default is %d microseconds.",
DEFAULT_RUN_SECS, DEFAULT_DELAY_USECS);
}
}
diff --git a/tools/testing/selftests/kvm/x86_64/xcr0_cpuid_test.c b/tools/testing/selftests/kvm/x86_64/xcr0_cpuid_test.c
index dc62174..25a0b0d 100644
--- a/tools/testing/selftests/kvm/x86_64/xcr0_cpuid_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xcr0_cpuid_test.c
@@ -116,7 +116,7 @@ int main(int argc, char *argv[])
vcpu_run(vcpu);
TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
- "Unexpected exit reason: %u (%s),\n",
+ "Unexpected exit reason: %u (%s),",
run->exit_reason,
exit_reason_str(run->exit_reason));
diff --git a/tools/testing/selftests/kvm/x86_64/xss_msr_test.c b/tools/testing/selftests/kvm/x86_64/xss_msr_test.c
index e0ddf47..167c97a 100644
--- a/tools/testing/selftests/kvm/x86_64/xss_msr_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xss_msr_test.c
@@ -29,7 +29,7 @@ int main(int argc, char *argv[])
xss_val = vcpu_get_msr(vcpu, MSR_IA32_XSS);
TEST_ASSERT(xss_val == 0,
- "MSR_IA32_XSS should be initialized to zero\n");
+ "MSR_IA32_XSS should be initialized to zero");
vcpu_set_msr(vcpu, MSR_IA32_XSS, xss_val);
diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h
index 5b79758..e64bbdf 100644
--- a/tools/testing/selftests/landlock/common.h
+++ b/tools/testing/selftests/landlock/common.h
@@ -9,6 +9,7 @@
#include <errno.h>
#include <linux/landlock.h>
+#include <linux/securebits.h>
#include <sys/capability.h>
#include <sys/socket.h>
#include <sys/syscall.h>
@@ -115,11 +116,16 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all)
/* clang-format off */
CAP_DAC_OVERRIDE,
CAP_MKNOD,
+ CAP_NET_ADMIN,
+ CAP_NET_BIND_SERVICE,
CAP_SYS_ADMIN,
CAP_SYS_CHROOT,
- CAP_NET_BIND_SERVICE,
/* clang-format on */
};
+ const unsigned int noroot = SECBIT_NOROOT | SECBIT_NOROOT_LOCKED;
+
+ if ((cap_get_secbits() & noroot) != noroot)
+ EXPECT_EQ(0, cap_set_secbits(noroot));
cap_p = cap_get_proc();
EXPECT_NE(NULL, cap_p)
@@ -137,6 +143,8 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all)
TH_LOG("Failed to cap_set_flag: %s", strerror(errno));
}
}
+
+ /* Automatically resets ambient capabilities. */
EXPECT_NE(-1, cap_set_proc(cap_p))
{
TH_LOG("Failed to cap_set_proc: %s", strerror(errno));
@@ -145,6 +153,9 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all)
{
TH_LOG("Failed to cap_free: %s", strerror(errno));
}
+
+ /* Quickly checks that ambient capabilities are cleared. */
+ EXPECT_NE(-1, cap_get_ambient(caps[0]));
}
/* We cannot put such helpers in a library because of kselftest_harness.h . */
@@ -158,8 +169,9 @@ static void __maybe_unused drop_caps(struct __test_metadata *const _metadata)
_init_caps(_metadata, true);
}
-static void _effective_cap(struct __test_metadata *const _metadata,
- const cap_value_t caps, const cap_flag_value_t value)
+static void _change_cap(struct __test_metadata *const _metadata,
+ const cap_flag_t flag, const cap_value_t cap,
+ const cap_flag_value_t value)
{
cap_t cap_p;
@@ -168,7 +180,7 @@ static void _effective_cap(struct __test_metadata *const _metadata,
{
TH_LOG("Failed to cap_get_proc: %s", strerror(errno));
}
- EXPECT_NE(-1, cap_set_flag(cap_p, CAP_EFFECTIVE, 1, &caps, value))
+ EXPECT_NE(-1, cap_set_flag(cap_p, flag, 1, &cap, value))
{
TH_LOG("Failed to cap_set_flag: %s", strerror(errno));
}
@@ -183,15 +195,35 @@ static void _effective_cap(struct __test_metadata *const _metadata,
}
static void __maybe_unused set_cap(struct __test_metadata *const _metadata,
- const cap_value_t caps)
+ const cap_value_t cap)
{
- _effective_cap(_metadata, caps, CAP_SET);
+ _change_cap(_metadata, CAP_EFFECTIVE, cap, CAP_SET);
}
static void __maybe_unused clear_cap(struct __test_metadata *const _metadata,
- const cap_value_t caps)
+ const cap_value_t cap)
{
- _effective_cap(_metadata, caps, CAP_CLEAR);
+ _change_cap(_metadata, CAP_EFFECTIVE, cap, CAP_CLEAR);
+}
+
+static void __maybe_unused
+set_ambient_cap(struct __test_metadata *const _metadata, const cap_value_t cap)
+{
+ _change_cap(_metadata, CAP_INHERITABLE, cap, CAP_SET);
+
+ EXPECT_NE(-1, cap_set_ambient(cap, CAP_SET))
+ {
+ TH_LOG("Failed to set ambient capability %d: %s", cap,
+ strerror(errno));
+ }
+}
+
+static void __maybe_unused clear_ambient_cap(
+ struct __test_metadata *const _metadata, const cap_value_t cap)
+{
+ EXPECT_EQ(1, cap_get_ambient(cap));
+ _change_cap(_metadata, CAP_INHERITABLE, cap, CAP_CLEAR);
+ EXPECT_EQ(0, cap_get_ambient(cap));
}
/* Receives an FD from a UNIX socket. Returns the received FD, or -errno. */
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 5081890..2d6d9b4 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -241,9 +241,11 @@ struct mnt_opt {
const char *const data;
};
-const struct mnt_opt mnt_tmp = {
+#define MNT_TMP_DATA "size=4m,mode=700"
+
+static const struct mnt_opt mnt_tmp = {
.type = "tmpfs",
- .data = "size=4m,mode=700",
+ .data = MNT_TMP_DATA,
};
static int mount_opt(const struct mnt_opt *const mnt, const char *const target)
@@ -4632,7 +4634,10 @@ FIXTURE_VARIANT(layout3_fs)
/* clang-format off */
FIXTURE_VARIANT_ADD(layout3_fs, tmpfs) {
/* clang-format on */
- .mnt = mnt_tmp,
+ .mnt = {
+ .type = "tmpfs",
+ .data = MNT_TMP_DATA,
+ },
.file_path = file1_s1d1,
};
diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c
index ea5f727..936cfc8 100644
--- a/tools/testing/selftests/landlock/net_test.c
+++ b/tools/testing/selftests/landlock/net_test.c
@@ -17,6 +17,7 @@
#include <string.h>
#include <sys/prctl.h>
#include <sys/socket.h>
+#include <sys/syscall.h>
#include <sys/un.h>
#include "common.h"
@@ -54,6 +55,11 @@ struct service_fixture {
};
};
+static pid_t sys_gettid(void)
+{
+ return syscall(__NR_gettid);
+}
+
static int set_service(struct service_fixture *const srv,
const struct protocol_variant prot,
const unsigned short index)
@@ -88,7 +94,7 @@ static int set_service(struct service_fixture *const srv,
case AF_UNIX:
srv->unix_addr.sun_family = prot.domain;
sprintf(srv->unix_addr.sun_path,
- "_selftests-landlock-net-tid%d-index%d", gettid(),
+ "_selftests-landlock-net-tid%d-index%d", sys_gettid(),
index);
srv->unix_addr_len = SUN_LEN(&srv->unix_addr);
srv->unix_addr.sun_path[0] = '\0';
@@ -101,8 +107,11 @@ static void setup_loopback(struct __test_metadata *const _metadata)
{
set_cap(_metadata, CAP_SYS_ADMIN);
ASSERT_EQ(0, unshare(CLONE_NEWNET));
- ASSERT_EQ(0, system("ip link set dev lo up"));
clear_cap(_metadata, CAP_SYS_ADMIN);
+
+ set_ambient_cap(_metadata, CAP_NET_ADMIN);
+ ASSERT_EQ(0, system("ip link set dev lo up"));
+ clear_ambient_cap(_metadata, CAP_NET_ADMIN);
}
static bool is_restricted(const struct protocol_variant *const prot,
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index aa646e0..286ce0e 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -58,7 +58,8 @@
TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
-all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
+all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) \
+ $(if $(TEST_GEN_MODS_DIR),gen_mods_dir)
define RUN_TESTS
BASE_DIR="$(selfdir)"; \
@@ -71,8 +72,8 @@
run_tests: all
ifdef building_out_of_srctree
- @if [ "X$(TEST_PROGS)$(TEST_PROGS_EXTENDED)$(TEST_FILES)" != "X" ]; then \
- rsync -aq --copy-unsafe-links $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(OUTPUT); \
+ @if [ "X$(TEST_PROGS)$(TEST_PROGS_EXTENDED)$(TEST_FILES)$(TEST_GEN_MODS_DIR)" != "X" ]; then \
+ rsync -aq --copy-unsafe-links $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(TEST_GEN_MODS_DIR) $(OUTPUT); \
fi
@if [ "X$(TEST_PROGS)" != "X" ]; then \
$(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) \
@@ -84,11 +85,22 @@
@$(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(TEST_PROGS))
endif
+gen_mods_dir:
+ $(Q)$(MAKE) -C $(TEST_GEN_MODS_DIR)
+
+clean_mods_dir:
+ $(Q)$(MAKE) -C $(TEST_GEN_MODS_DIR) clean
+
define INSTALL_SINGLE_RULE
$(if $(INSTALL_LIST),@mkdir -p $(INSTALL_PATH))
$(if $(INSTALL_LIST),rsync -a --copy-unsafe-links $(INSTALL_LIST) $(INSTALL_PATH)/)
endef
+define INSTALL_MODS_RULE
+ $(if $(INSTALL_LIST),@mkdir -p $(INSTALL_PATH)/$(INSTALL_LIST))
+ $(if $(INSTALL_LIST),rsync -a --copy-unsafe-links $(INSTALL_LIST)/*.ko $(INSTALL_PATH)/$(INSTALL_LIST))
+endef
+
define INSTALL_RULE
$(eval INSTALL_LIST = $(TEST_PROGS)) $(INSTALL_SINGLE_RULE)
$(eval INSTALL_LIST = $(TEST_PROGS_EXTENDED)) $(INSTALL_SINGLE_RULE)
@@ -97,6 +109,7 @@
$(eval INSTALL_LIST = $(TEST_CUSTOM_PROGS)) $(INSTALL_SINGLE_RULE)
$(eval INSTALL_LIST = $(TEST_GEN_PROGS_EXTENDED)) $(INSTALL_SINGLE_RULE)
$(eval INSTALL_LIST = $(TEST_GEN_FILES)) $(INSTALL_SINGLE_RULE)
+ $(eval INSTALL_LIST = $(notdir $(TEST_GEN_MODS_DIR))) $(INSTALL_MODS_RULE)
$(eval INSTALL_LIST = $(wildcard config settings)) $(INSTALL_SINGLE_RULE)
endef
@@ -122,7 +135,7 @@
$(RM) -r $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) $(EXTRA_CLEAN)
endef
-clean:
+clean: $(if $(TEST_GEN_MODS_DIR),clean_mods_dir)
$(CLEAN)
# Enables to extend CFLAGS and LDFLAGS from command line, e.g.
@@ -153,4 +166,4 @@
$(LINK.S) $^ $(LDLIBS) -o $@
endif
-.PHONY: run_tests all clean install emit_tests
+.PHONY: run_tests all clean install emit_tests gen_mods_dir clean_mods_dir
diff --git a/tools/testing/selftests/livepatch/.gitignore b/tools/testing/selftests/livepatch/.gitignore
new file mode 100644
index 0000000..f1e9c2a2
--- /dev/null
+++ b/tools/testing/selftests/livepatch/.gitignore
@@ -0,0 +1 @@
+test_klp-call_getpid
diff --git a/tools/testing/selftests/livepatch/Makefile b/tools/testing/selftests/livepatch/Makefile
index 02fadc9..35418a4 100644
--- a/tools/testing/selftests/livepatch/Makefile
+++ b/tools/testing/selftests/livepatch/Makefile
@@ -1,5 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
+TEST_GEN_FILES := test_klp-call_getpid
+TEST_GEN_MODS_DIR := test_modules
TEST_PROGS_EXTENDED := functions.sh
TEST_PROGS := \
test-livepatch.sh \
@@ -7,7 +9,8 @@
test-shadow-vars.sh \
test-state.sh \
test-ftrace.sh \
- test-sysfs.sh
+ test-sysfs.sh \
+ test-syscall.sh
TEST_FILES := settings
diff --git a/tools/testing/selftests/livepatch/README b/tools/testing/selftests/livepatch/README
index 0942dd5..d2035dd 100644
--- a/tools/testing/selftests/livepatch/README
+++ b/tools/testing/selftests/livepatch/README
@@ -13,23 +13,36 @@
Config
------
-Set these config options and their prerequisites:
+Set CONFIG_LIVEPATCH=y option and it's prerequisites.
-CONFIG_LIVEPATCH=y
-CONFIG_TEST_LIVEPATCH=m
+Building the tests
+------------------
+
+To only build the tests without running them, run:
+
+ % make -C tools/testing/selftests/livepatch
+
+The command above will compile all test modules and test programs, making them
+ready to be packaged if so desired.
Running the tests
-----------------
-Test kernel modules are built as part of lib/ (make modules) and need to
-be installed (make modules_install) as the test scripts will modprobe
-them.
+Test kernel modules are built before running the livepatch selftests. The
+modules are located under test_modules directory, and are built as out-of-tree
+modules. This is specially useful since the same sources can be built and
+tested on systems with different kABI, ensuring they the tests are backwards
+compatible. The modules will be loaded by the test scripts using insmod.
To run the livepatch selftests, from the top of the kernel source tree:
% make -C tools/testing/selftests TARGETS=livepatch run_tests
+or
+
+ % make kselftest TARGETS=livepatch
+
Adding tests
------------
diff --git a/tools/testing/selftests/livepatch/config b/tools/testing/selftests/livepatch/config
index ad23100..e88bf51 100644
--- a/tools/testing/selftests/livepatch/config
+++ b/tools/testing/selftests/livepatch/config
@@ -1,3 +1,2 @@
CONFIG_LIVEPATCH=y
CONFIG_DYNAMIC_DEBUG=y
-CONFIG_TEST_LIVEPATCH=m
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index b1fd736..fc4c6a0 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -34,6 +34,18 @@
fi
}
+# Check if we can compile the modules before loading them
+function has_kdir() {
+ if [ -z "$KDIR" ]; then
+ KDIR="/lib/modules/$(uname -r)/build"
+ fi
+
+ if [ ! -d "$KDIR" ]; then
+ echo "skip all tests: KDIR ($KDIR) not available to compile modules."
+ exit $ksft_skip
+ fi
+}
+
# die(msg) - game over, man
# msg - dying words
function die() {
@@ -96,6 +108,7 @@
# the ftrace_enabled sysctl.
function setup_config() {
is_root
+ has_kdir
push_config
set_dynamic_debug
set_ftrace_enabled 1
@@ -115,16 +128,14 @@
done
}
-function assert_mod() {
- local mod="$1"
-
- modprobe --dry-run "$mod" &>/dev/null
-}
-
function is_livepatch_mod() {
local mod="$1"
- if [[ $(modinfo "$mod" | awk '/^livepatch:/{print $NF}') == "Y" ]]; then
+ if [[ ! -f "test_modules/$mod.ko" ]]; then
+ die "Can't find \"test_modules/$mod.ko\", try \"make\""
+ fi
+
+ if [[ $(modinfo "test_modules/$mod.ko" | awk '/^livepatch:/{print $NF}') == "Y" ]]; then
return 0
fi
@@ -134,9 +145,9 @@
function __load_mod() {
local mod="$1"; shift
- local msg="% modprobe $mod $*"
+ local msg="% insmod test_modules/$mod.ko $*"
log "${msg%% }"
- ret=$(modprobe "$mod" "$@" 2>&1)
+ ret=$(insmod "test_modules/$mod.ko" "$@" 2>&1)
if [[ "$ret" != "" ]]; then
die "$ret"
fi
@@ -149,13 +160,10 @@
# load_mod(modname, params) - load a kernel module
# modname - module name to load
-# params - module parameters to pass to modprobe
+# params - module parameters to pass to insmod
function load_mod() {
local mod="$1"; shift
- assert_mod "$mod" ||
- skip "unable to load module ${mod}, verify CONFIG_TEST_LIVEPATCH=m and run self-tests as root"
-
is_livepatch_mod "$mod" &&
die "use load_lp() to load the livepatch module $mod"
@@ -165,13 +173,10 @@
# load_lp_nowait(modname, params) - load a kernel module with a livepatch
# but do not wait on until the transition finishes
# modname - module name to load
-# params - module parameters to pass to modprobe
+# params - module parameters to pass to insmod
function load_lp_nowait() {
local mod="$1"; shift
- assert_mod "$mod" ||
- skip "unable to load module ${mod}, verify CONFIG_TEST_LIVEPATCH=m and run self-tests as root"
-
is_livepatch_mod "$mod" ||
die "module $mod is not a livepatch"
@@ -184,7 +189,7 @@
# load_lp(modname, params) - load a kernel module with a livepatch
# modname - module name to load
-# params - module parameters to pass to modprobe
+# params - module parameters to pass to insmod
function load_lp() {
local mod="$1"; shift
@@ -197,13 +202,13 @@
# load_failing_mod(modname, params) - load a kernel module, expect to fail
# modname - module name to load
-# params - module parameters to pass to modprobe
+# params - module parameters to pass to insmod
function load_failing_mod() {
local mod="$1"; shift
- local msg="% modprobe $mod $*"
+ local msg="% insmod test_modules/$mod.ko $*"
log "${msg%% }"
- ret=$(modprobe "$mod" "$@" 2>&1)
+ ret=$(insmod "test_modules/$mod.ko" "$@" 2>&1)
if [[ "$ret" == "" ]]; then
die "$mod unexpectedly loaded"
fi
diff --git a/tools/testing/selftests/livepatch/test-callbacks.sh b/tools/testing/selftests/livepatch/test-callbacks.sh
index 90b26db..32b150e 100755
--- a/tools/testing/selftests/livepatch/test-callbacks.sh
+++ b/tools/testing/selftests/livepatch/test-callbacks.sh
@@ -34,9 +34,9 @@
unload_lp $MOD_LIVEPATCH
unload_mod $MOD_TARGET
-check_result "% modprobe $MOD_TARGET
+check_result "% insmod test_modules/$MOD_TARGET.ko
$MOD_TARGET: ${MOD_TARGET}_init
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -81,7 +81,7 @@
unload_lp $MOD_LIVEPATCH
unload_mod $MOD_TARGET
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -89,7 +89,7 @@
livepatch: '$MOD_LIVEPATCH': completing patching transition
$MOD_LIVEPATCH: post_patch_callback: vmlinux
livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
$MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
$MOD_LIVEPATCH: post_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
@@ -129,9 +129,9 @@
disable_lp $MOD_LIVEPATCH
unload_lp $MOD_LIVEPATCH
-check_result "% modprobe $MOD_TARGET
+check_result "% insmod test_modules/$MOD_TARGET.ko
$MOD_TARGET: ${MOD_TARGET}_init
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -177,7 +177,7 @@
disable_lp $MOD_LIVEPATCH
unload_lp $MOD_LIVEPATCH
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -185,7 +185,7 @@
livepatch: '$MOD_LIVEPATCH': completing patching transition
$MOD_LIVEPATCH: post_patch_callback: vmlinux
livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
$MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
$MOD_LIVEPATCH: post_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
@@ -219,7 +219,7 @@
disable_lp $MOD_LIVEPATCH
unload_lp $MOD_LIVEPATCH
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -254,9 +254,9 @@
load_failing_mod $MOD_LIVEPATCH pre_patch_ret=-19
unload_mod $MOD_TARGET
-check_result "% modprobe $MOD_TARGET
+check_result "% insmod test_modules/$MOD_TARGET.ko
$MOD_TARGET: ${MOD_TARGET}_init
-% modprobe $MOD_LIVEPATCH pre_patch_ret=-19
+% insmod test_modules/$MOD_LIVEPATCH.ko pre_patch_ret=-19
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
test_klp_callbacks_demo: pre_patch_callback: vmlinux
@@ -265,7 +265,7 @@
livepatch: '$MOD_LIVEPATCH': canceling patching transition, going to unpatch
livepatch: '$MOD_LIVEPATCH': completing unpatching transition
livepatch: '$MOD_LIVEPATCH': unpatching complete
-modprobe: ERROR: could not insert '$MOD_LIVEPATCH': No such device
+insmod: ERROR: could not insert module test_modules/$MOD_LIVEPATCH.ko: No such device
% rmmod $MOD_TARGET
$MOD_TARGET: ${MOD_TARGET}_exit"
@@ -295,7 +295,7 @@
disable_lp $MOD_LIVEPATCH
unload_lp $MOD_LIVEPATCH
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -304,12 +304,12 @@
$MOD_LIVEPATCH: post_patch_callback: vmlinux
livepatch: '$MOD_LIVEPATCH': patching complete
% echo -19 > /sys/module/$MOD_LIVEPATCH/parameters/pre_patch_ret
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
$MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
livepatch: pre-patch callback failed for object '$MOD_TARGET'
livepatch: patch '$MOD_LIVEPATCH' failed for module '$MOD_TARGET', refusing to load module '$MOD_TARGET'
-modprobe: ERROR: could not insert '$MOD_TARGET': No such device
+insmod: ERROR: could not insert module test_modules/$MOD_TARGET.ko: No such device
% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH/enabled
livepatch: '$MOD_LIVEPATCH': initializing unpatching transition
$MOD_LIVEPATCH: pre_unpatch_callback: vmlinux
@@ -340,11 +340,11 @@
unload_lp $MOD_LIVEPATCH
unload_mod $MOD_TARGET_BUSY
-check_result "% modprobe $MOD_TARGET_BUSY block_transition=N
+check_result "% insmod test_modules/$MOD_TARGET_BUSY.ko block_transition=N
$MOD_TARGET_BUSY: ${MOD_TARGET_BUSY}_init
$MOD_TARGET_BUSY: busymod_work_func enter
$MOD_TARGET_BUSY: busymod_work_func exit
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -354,7 +354,7 @@
$MOD_LIVEPATCH: post_patch_callback: vmlinux
$MOD_LIVEPATCH: post_patch_callback: $MOD_TARGET_BUSY -> [MODULE_STATE_LIVE] Normal state
livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
$MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
$MOD_LIVEPATCH: post_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
@@ -421,16 +421,16 @@
unload_lp $MOD_LIVEPATCH
unload_mod $MOD_TARGET_BUSY
-check_result "% modprobe $MOD_TARGET_BUSY block_transition=Y
+check_result "% insmod test_modules/$MOD_TARGET_BUSY.ko block_transition=Y
$MOD_TARGET_BUSY: ${MOD_TARGET_BUSY}_init
$MOD_TARGET_BUSY: busymod_work_func enter
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
$MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET_BUSY -> [MODULE_STATE_LIVE] Normal state
livepatch: '$MOD_LIVEPATCH': starting patching transition
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
$MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
$MOD_TARGET: ${MOD_TARGET}_init
@@ -467,7 +467,7 @@
unload_lp $MOD_LIVEPATCH2
unload_lp $MOD_LIVEPATCH
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -475,7 +475,7 @@
livepatch: '$MOD_LIVEPATCH': completing patching transition
$MOD_LIVEPATCH: post_patch_callback: vmlinux
livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_LIVEPATCH2
+% insmod test_modules/$MOD_LIVEPATCH2.ko
livepatch: enabling patch '$MOD_LIVEPATCH2'
livepatch: '$MOD_LIVEPATCH2': initializing patching transition
$MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -523,7 +523,7 @@
unload_lp $MOD_LIVEPATCH2
unload_lp $MOD_LIVEPATCH
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -531,7 +531,7 @@
livepatch: '$MOD_LIVEPATCH': completing patching transition
$MOD_LIVEPATCH: post_patch_callback: vmlinux
livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_LIVEPATCH2 replace=1
+% insmod test_modules/$MOD_LIVEPATCH2.ko replace=1
livepatch: enabling patch '$MOD_LIVEPATCH2'
livepatch: '$MOD_LIVEPATCH2': initializing patching transition
$MOD_LIVEPATCH2: pre_patch_callback: vmlinux
diff --git a/tools/testing/selftests/livepatch/test-ftrace.sh b/tools/testing/selftests/livepatch/test-ftrace.sh
index 825540a..730218b 100755
--- a/tools/testing/selftests/livepatch/test-ftrace.sh
+++ b/tools/testing/selftests/livepatch/test-ftrace.sh
@@ -35,7 +35,7 @@
unload_lp $MOD_LIVEPATCH
check_result "livepatch: kernel.ftrace_enabled = 0
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
livepatch: failed to register ftrace handler for function 'cmdline_proc_show' (-16)
@@ -44,9 +44,9 @@
livepatch: '$MOD_LIVEPATCH': canceling patching transition, going to unpatch
livepatch: '$MOD_LIVEPATCH': completing unpatching transition
livepatch: '$MOD_LIVEPATCH': unpatching complete
-modprobe: ERROR: could not insert '$MOD_LIVEPATCH': Device or resource busy
+insmod: ERROR: could not insert module test_modules/$MOD_LIVEPATCH.ko: Device or resource busy
livepatch: kernel.ftrace_enabled = 1
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
livepatch: '$MOD_LIVEPATCH': starting patching transition
diff --git a/tools/testing/selftests/livepatch/test-livepatch.sh b/tools/testing/selftests/livepatch/test-livepatch.sh
index 5fe79ac..e3455a6 100755
--- a/tools/testing/selftests/livepatch/test-livepatch.sh
+++ b/tools/testing/selftests/livepatch/test-livepatch.sh
@@ -31,7 +31,7 @@
die "livepatch kselftest(s) failed"
fi
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
livepatch: '$MOD_LIVEPATCH': starting patching transition
@@ -75,14 +75,14 @@
grep 'live patched' /proc/cmdline > /dev/kmsg
grep 'live patched' /proc/meminfo > /dev/kmsg
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
livepatch: '$MOD_LIVEPATCH': starting patching transition
livepatch: '$MOD_LIVEPATCH': completing patching transition
livepatch: '$MOD_LIVEPATCH': patching complete
$MOD_LIVEPATCH: this has been live patched
-% modprobe $MOD_REPLACE replace=0
+% insmod test_modules/$MOD_REPLACE.ko replace=0
livepatch: enabling patch '$MOD_REPLACE'
livepatch: '$MOD_REPLACE': initializing patching transition
livepatch: '$MOD_REPLACE': starting patching transition
@@ -135,14 +135,14 @@
grep 'live patched' /proc/cmdline > /dev/kmsg
grep 'live patched' /proc/meminfo > /dev/kmsg
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
livepatch: '$MOD_LIVEPATCH': starting patching transition
livepatch: '$MOD_LIVEPATCH': completing patching transition
livepatch: '$MOD_LIVEPATCH': patching complete
$MOD_LIVEPATCH: this has been live patched
-% modprobe $MOD_REPLACE replace=1
+% insmod test_modules/$MOD_REPLACE.ko replace=1
livepatch: enabling patch '$MOD_REPLACE'
livepatch: '$MOD_REPLACE': initializing patching transition
livepatch: '$MOD_REPLACE': starting patching transition
diff --git a/tools/testing/selftests/livepatch/test-shadow-vars.sh b/tools/testing/selftests/livepatch/test-shadow-vars.sh
index e04cb35..1218c15 100755
--- a/tools/testing/selftests/livepatch/test-shadow-vars.sh
+++ b/tools/testing/selftests/livepatch/test-shadow-vars.sh
@@ -16,7 +16,7 @@
load_mod $MOD_TEST
unload_mod $MOD_TEST
-check_result "% modprobe $MOD_TEST
+check_result "% insmod test_modules/$MOD_TEST.ko
$MOD_TEST: klp_shadow_get(obj=PTR1, id=0x1234) = PTR0
$MOD_TEST: got expected NULL result
$MOD_TEST: shadow_ctor: PTR3 -> PTR2
diff --git a/tools/testing/selftests/livepatch/test-state.sh b/tools/testing/selftests/livepatch/test-state.sh
index 3865672..10a52ac 100755
--- a/tools/testing/selftests/livepatch/test-state.sh
+++ b/tools/testing/selftests/livepatch/test-state.sh
@@ -19,7 +19,7 @@
disable_lp $MOD_LIVEPATCH
unload_lp $MOD_LIVEPATCH
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -51,7 +51,7 @@
disable_lp $MOD_LIVEPATCH2
unload_lp $MOD_LIVEPATCH2
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
$MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -61,7 +61,7 @@
$MOD_LIVEPATCH: post_patch_callback: vmlinux
$MOD_LIVEPATCH: fix_console_loglevel: fixing console_loglevel
livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_LIVEPATCH2
+% insmod test_modules/$MOD_LIVEPATCH2.ko
livepatch: enabling patch '$MOD_LIVEPATCH2'
livepatch: '$MOD_LIVEPATCH2': initializing patching transition
$MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -96,7 +96,7 @@
unload_lp $MOD_LIVEPATCH2
unload_lp $MOD_LIVEPATCH3
-check_result "% modprobe $MOD_LIVEPATCH2
+check_result "% insmod test_modules/$MOD_LIVEPATCH2.ko
livepatch: enabling patch '$MOD_LIVEPATCH2'
livepatch: '$MOD_LIVEPATCH2': initializing patching transition
$MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -106,7 +106,7 @@
$MOD_LIVEPATCH2: post_patch_callback: vmlinux
$MOD_LIVEPATCH2: fix_console_loglevel: fixing console_loglevel
livepatch: '$MOD_LIVEPATCH2': patching complete
-% modprobe $MOD_LIVEPATCH3
+% insmod test_modules/$MOD_LIVEPATCH3.ko
livepatch: enabling patch '$MOD_LIVEPATCH3'
livepatch: '$MOD_LIVEPATCH3': initializing patching transition
$MOD_LIVEPATCH3: pre_patch_callback: vmlinux
@@ -117,7 +117,7 @@
$MOD_LIVEPATCH3: fix_console_loglevel: taking over the console_loglevel change
livepatch: '$MOD_LIVEPATCH3': patching complete
% rmmod $MOD_LIVEPATCH2
-% modprobe $MOD_LIVEPATCH2
+% insmod test_modules/$MOD_LIVEPATCH2.ko
livepatch: enabling patch '$MOD_LIVEPATCH2'
livepatch: '$MOD_LIVEPATCH2': initializing patching transition
$MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -149,7 +149,7 @@
disable_lp $MOD_LIVEPATCH2
unload_lp $MOD_LIVEPATCH2
-check_result "% modprobe $MOD_LIVEPATCH2
+check_result "% insmod test_modules/$MOD_LIVEPATCH2.ko
livepatch: enabling patch '$MOD_LIVEPATCH2'
livepatch: '$MOD_LIVEPATCH2': initializing patching transition
$MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -159,9 +159,9 @@
$MOD_LIVEPATCH2: post_patch_callback: vmlinux
$MOD_LIVEPATCH2: fix_console_loglevel: fixing console_loglevel
livepatch: '$MOD_LIVEPATCH2': patching complete
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: Livepatch patch ($MOD_LIVEPATCH) is not compatible with the already installed livepatches.
-modprobe: ERROR: could not insert '$MOD_LIVEPATCH': Invalid argument
+insmod: ERROR: could not insert module test_modules/$MOD_LIVEPATCH.ko: Invalid parameters
% echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH2/enabled
livepatch: '$MOD_LIVEPATCH2': initializing unpatching transition
$MOD_LIVEPATCH2: pre_unpatch_callback: vmlinux
diff --git a/tools/testing/selftests/livepatch/test-syscall.sh b/tools/testing/selftests/livepatch/test-syscall.sh
new file mode 100755
index 0000000..b76a881
--- /dev/null
+++ b/tools/testing/selftests/livepatch/test-syscall.sh
@@ -0,0 +1,53 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 SUSE
+# Author: Marcos Paulo de Souza <mpdesouza@suse.com>
+
+. $(dirname $0)/functions.sh
+
+MOD_SYSCALL=test_klp_syscall
+
+setup_config
+
+# - Start _NRPROC processes calling getpid and load a livepatch to patch the
+# getpid syscall. Check if all the processes transitioned to the livepatched
+# state.
+
+start_test "patch getpid syscall while being heavily hammered"
+
+for i in $(seq 1 $(getconf _NPROCESSORS_ONLN)); do
+ ./test_klp-call_getpid &
+ pids[$i]="$!"
+done
+
+pid_list=$(echo ${pids[@]} | tr ' ' ',')
+load_lp $MOD_SYSCALL klp_pids=$pid_list
+
+# wait for all tasks to transition to patched state
+loop_until 'grep -q '^0$' /sys/kernel/test_klp_syscall/npids'
+
+pending_pids=$(cat /sys/kernel/test_klp_syscall/npids)
+log "$MOD_SYSCALL: Remaining not livepatched processes: $pending_pids"
+
+for pid in ${pids[@]}; do
+ kill $pid || true
+done
+
+disable_lp $MOD_SYSCALL
+unload_lp $MOD_SYSCALL
+
+check_result "% insmod test_modules/$MOD_SYSCALL.ko klp_pids=$pid_list
+livepatch: enabling patch '$MOD_SYSCALL'
+livepatch: '$MOD_SYSCALL': initializing patching transition
+livepatch: '$MOD_SYSCALL': starting patching transition
+livepatch: '$MOD_SYSCALL': completing patching transition
+livepatch: '$MOD_SYSCALL': patching complete
+$MOD_SYSCALL: Remaining not livepatched processes: 0
+% echo 0 > /sys/kernel/livepatch/$MOD_SYSCALL/enabled
+livepatch: '$MOD_SYSCALL': initializing unpatching transition
+livepatch: '$MOD_SYSCALL': starting unpatching transition
+livepatch: '$MOD_SYSCALL': completing unpatching transition
+livepatch: '$MOD_SYSCALL': unpatching complete
+% rmmod $MOD_SYSCALL"
+
+exit 0
diff --git a/tools/testing/selftests/livepatch/test-sysfs.sh b/tools/testing/selftests/livepatch/test-sysfs.sh
index 7f76f28..6c646af 100755
--- a/tools/testing/selftests/livepatch/test-sysfs.sh
+++ b/tools/testing/selftests/livepatch/test-sysfs.sh
@@ -27,7 +27,7 @@
unload_lp $MOD_LIVEPATCH
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
livepatch: enabling patch '$MOD_LIVEPATCH'
livepatch: '$MOD_LIVEPATCH': initializing patching transition
livepatch: '$MOD_LIVEPATCH': starting patching transition
@@ -56,7 +56,7 @@
disable_lp $MOD_LIVEPATCH
unload_lp $MOD_LIVEPATCH
-check_result "% modprobe test_klp_callbacks_demo
+check_result "% insmod test_modules/test_klp_callbacks_demo.ko
livepatch: enabling patch 'test_klp_callbacks_demo'
livepatch: 'test_klp_callbacks_demo': initializing patching transition
test_klp_callbacks_demo: pre_patch_callback: vmlinux
@@ -64,7 +64,7 @@
livepatch: 'test_klp_callbacks_demo': completing patching transition
test_klp_callbacks_demo: post_patch_callback: vmlinux
livepatch: 'test_klp_callbacks_demo': patching complete
-% modprobe test_klp_callbacks_mod
+% insmod test_modules/test_klp_callbacks_mod.ko
livepatch: applying patch 'test_klp_callbacks_demo' to loading module 'test_klp_callbacks_mod'
test_klp_callbacks_demo: pre_patch_callback: test_klp_callbacks_mod -> [MODULE_STATE_COMING] Full formed, running module_init
test_klp_callbacks_demo: post_patch_callback: test_klp_callbacks_mod -> [MODULE_STATE_COMING] Full formed, running module_init
diff --git a/tools/testing/selftests/livepatch/test_klp-call_getpid.c b/tools/testing/selftests/livepatch/test_klp-call_getpid.c
new file mode 100644
index 0000000..ce321a2
--- /dev/null
+++ b/tools/testing/selftests/livepatch/test_klp-call_getpid.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 SUSE
+ * Authors: Libor Pechacek <lpechacek@suse.cz>
+ * Marcos Paulo de Souza <mpdesouza@suse.com>
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <signal.h>
+
+static int stop;
+static int sig_int;
+
+void hup_handler(int signum)
+{
+ stop = 1;
+}
+
+void int_handler(int signum)
+{
+ stop = 1;
+ sig_int = 1;
+}
+
+int main(int argc, char *argv[])
+{
+ long count = 0;
+
+ signal(SIGHUP, &hup_handler);
+ signal(SIGINT, &int_handler);
+
+ while (!stop) {
+ (void)syscall(SYS_getpid);
+ count++;
+ }
+
+ if (sig_int)
+ printf("%ld iterations done\n", count);
+
+ return 0;
+}
diff --git a/tools/testing/selftests/livepatch/test_modules/Makefile b/tools/testing/selftests/livepatch/test_modules/Makefile
new file mode 100644
index 0000000..e6e638c
--- /dev/null
+++ b/tools/testing/selftests/livepatch/test_modules/Makefile
@@ -0,0 +1,26 @@
+TESTMODS_DIR := $(realpath $(dir $(abspath $(lastword $(MAKEFILE_LIST)))))
+KDIR ?= /lib/modules/$(shell uname -r)/build
+
+obj-m += test_klp_atomic_replace.o \
+ test_klp_callbacks_busy.o \
+ test_klp_callbacks_demo.o \
+ test_klp_callbacks_demo2.o \
+ test_klp_callbacks_mod.o \
+ test_klp_livepatch.o \
+ test_klp_state.o \
+ test_klp_state2.o \
+ test_klp_state3.o \
+ test_klp_shadow_vars.o \
+ test_klp_syscall.o
+
+# Ensure that KDIR exists, otherwise skip the compilation
+modules:
+ifneq ("$(wildcard $(KDIR))", "")
+ $(Q)$(MAKE) -C $(KDIR) modules KBUILD_EXTMOD=$(TESTMODS_DIR)
+endif
+
+# Ensure that KDIR exists, otherwise skip the clean target
+clean:
+ifneq ("$(wildcard $(KDIR))", "")
+ $(Q)$(MAKE) -C $(KDIR) clean KBUILD_EXTMOD=$(TESTMODS_DIR)
+endif
diff --git a/lib/livepatch/test_klp_atomic_replace.c b/tools/testing/selftests/livepatch/test_modules/test_klp_atomic_replace.c
similarity index 100%
rename from lib/livepatch/test_klp_atomic_replace.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_atomic_replace.c
diff --git a/lib/livepatch/test_klp_callbacks_busy.c b/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_busy.c
similarity index 100%
rename from lib/livepatch/test_klp_callbacks_busy.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_busy.c
diff --git a/lib/livepatch/test_klp_callbacks_demo.c b/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo.c
similarity index 100%
rename from lib/livepatch/test_klp_callbacks_demo.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo.c
diff --git a/lib/livepatch/test_klp_callbacks_demo2.c b/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo2.c
similarity index 100%
rename from lib/livepatch/test_klp_callbacks_demo2.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo2.c
diff --git a/lib/livepatch/test_klp_callbacks_mod.c b/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_mod.c
similarity index 100%
rename from lib/livepatch/test_klp_callbacks_mod.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_mod.c
diff --git a/lib/livepatch/test_klp_livepatch.c b/tools/testing/selftests/livepatch/test_modules/test_klp_livepatch.c
similarity index 100%
rename from lib/livepatch/test_klp_livepatch.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_livepatch.c
diff --git a/lib/livepatch/test_klp_shadow_vars.c b/tools/testing/selftests/livepatch/test_modules/test_klp_shadow_vars.c
similarity index 100%
rename from lib/livepatch/test_klp_shadow_vars.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_shadow_vars.c
diff --git a/lib/livepatch/test_klp_state.c b/tools/testing/selftests/livepatch/test_modules/test_klp_state.c
similarity index 100%
rename from lib/livepatch/test_klp_state.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_state.c
diff --git a/lib/livepatch/test_klp_state2.c b/tools/testing/selftests/livepatch/test_modules/test_klp_state2.c
similarity index 100%
rename from lib/livepatch/test_klp_state2.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_state2.c
diff --git a/lib/livepatch/test_klp_state3.c b/tools/testing/selftests/livepatch/test_modules/test_klp_state3.c
similarity index 100%
rename from lib/livepatch/test_klp_state3.c
rename to tools/testing/selftests/livepatch/test_modules/test_klp_state3.c
diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c b/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
new file mode 100644
index 0000000..dd80278
--- /dev/null
+++ b/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2017-2023 SUSE
+ * Authors: Libor Pechacek <lpechacek@suse.cz>
+ * Nicolai Stange <nstange@suse.de>
+ * Marcos Paulo de Souza <mpdesouza@suse.com>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/livepatch.h>
+
+#if defined(__x86_64__)
+#define FN_PREFIX __x64_
+#elif defined(__s390x__)
+#define FN_PREFIX __s390x_
+#elif defined(__aarch64__)
+#define FN_PREFIX __arm64_
+#else
+/* powerpc does not select ARCH_HAS_SYSCALL_WRAPPER */
+#define FN_PREFIX
+#endif
+
+/* Protects klp_pids */
+static DEFINE_MUTEX(kpid_mutex);
+
+static unsigned int npids, npids_pending;
+static int klp_pids[NR_CPUS];
+module_param_array(klp_pids, int, &npids_pending, 0);
+MODULE_PARM_DESC(klp_pids, "Array of pids to be transitioned to livepatched state.");
+
+static ssize_t npids_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%u\n", npids_pending);
+}
+
+static struct kobj_attribute klp_attr = __ATTR_RO(npids);
+static struct kobject *klp_kobj;
+
+static asmlinkage long lp_sys_getpid(void)
+{
+ int i;
+
+ mutex_lock(&kpid_mutex);
+ if (npids_pending > 0) {
+ for (i = 0; i < npids; i++) {
+ if (current->pid == klp_pids[i]) {
+ klp_pids[i] = 0;
+ npids_pending--;
+ break;
+ }
+ }
+ }
+ mutex_unlock(&kpid_mutex);
+
+ return task_tgid_vnr(current);
+}
+
+static struct klp_func vmlinux_funcs[] = {
+ {
+ .old_name = __stringify(FN_PREFIX) "sys_getpid",
+ .new_func = lp_sys_getpid,
+ }, {}
+};
+
+static struct klp_object objs[] = {
+ {
+ /* name being NULL means vmlinux */
+ .funcs = vmlinux_funcs,
+ }, {}
+};
+
+static struct klp_patch patch = {
+ .mod = THIS_MODULE,
+ .objs = objs,
+};
+
+static int livepatch_init(void)
+{
+ int ret;
+
+ klp_kobj = kobject_create_and_add("test_klp_syscall", kernel_kobj);
+ if (!klp_kobj)
+ return -ENOMEM;
+
+ ret = sysfs_create_file(klp_kobj, &klp_attr.attr);
+ if (ret) {
+ kobject_put(klp_kobj);
+ return ret;
+ }
+
+ /*
+ * Save the number pids to transition to livepatched state before the
+ * number of pending pids is decremented.
+ */
+ npids = npids_pending;
+
+ return klp_enable_patch(&patch);
+}
+
+static void livepatch_exit(void)
+{
+ kobject_put(klp_kobj);
+}
+
+module_init(livepatch_init);
+module_exit(livepatch_exit);
+MODULE_LICENSE("GPL");
+MODULE_INFO(livepatch, "Y");
+MODULE_AUTHOR("Libor Pechacek <lpechacek@suse.cz>");
+MODULE_AUTHOR("Nicolai Stange <nstange@suse.de>");
+MODULE_AUTHOR("Marcos Paulo de Souza <mpdesouza@suse.com>");
+MODULE_DESCRIPTION("Livepatch test: syscall transition");
diff --git a/tools/testing/selftests/mm/uffd-unit-tests.c b/tools/testing/selftests/mm/uffd-unit-tests.c
index cce90a1..2b9f8cc 100644
--- a/tools/testing/selftests/mm/uffd-unit-tests.c
+++ b/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -1517,6 +1517,12 @@ int main(int argc, char *argv[])
continue;
uffd_test_start("%s on %s", test->name, mem_type->name);
+ if ((mem_type->mem_flag == MEM_HUGETLB ||
+ mem_type->mem_flag == MEM_HUGETLB_PRIVATE) &&
+ (default_huge_page_size() == 0)) {
+ uffd_test_skip("huge page size is 0, feature missing?");
+ continue;
+ }
if (!uffd_feature_supported(test)) {
uffd_test_skip("feature missing");
continue;
diff --git a/tools/testing/selftests/move_mount_set_group/move_mount_set_group_test.c b/tools/testing/selftests/move_mount_set_group/move_mount_set_group_test.c
index 50ed5d4..bcf51d7 100644
--- a/tools/testing/selftests/move_mount_set_group/move_mount_set_group_test.c
+++ b/tools/testing/selftests/move_mount_set_group/move_mount_set_group_test.c
@@ -218,7 +218,7 @@ static bool move_mount_set_group_supported(void)
if (mount(NULL, SET_GROUP_FROM, NULL, MS_SHARED, 0))
return -1;
- ret = syscall(SYS_move_mount, AT_FDCWD, SET_GROUP_FROM,
+ ret = syscall(__NR_move_mount, AT_FDCWD, SET_GROUP_FROM,
AT_FDCWD, SET_GROUP_TO, MOVE_MOUNT_SET_GROUP);
umount2("/tmp", MNT_DETACH);
@@ -363,7 +363,7 @@ TEST_F(move_mount_set_group, complex_sharing_copying)
CLONE_VM | CLONE_FILES); ASSERT_GT(pid, 0);
ASSERT_EQ(wait_for_pid(pid), 0);
- ASSERT_EQ(syscall(SYS_move_mount, ca_from.mntfd, "",
+ ASSERT_EQ(syscall(__NR_move_mount, ca_from.mntfd, "",
ca_to.mntfd, "", MOVE_MOUNT_SET_GROUP
| MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH),
0);
diff --git a/tools/testing/selftests/mqueue/setting b/tools/testing/selftests/mqueue/setting
new file mode 100644
index 0000000..a953c96
--- /dev/null
+++ b/tools/testing/selftests/mqueue/setting
@@ -0,0 +1 @@
+timeout=180
diff --git a/tools/testing/selftests/net/big_tcp.sh b/tools/testing/selftests/net/big_tcp.sh
index cde9a91..2db9d15c 100755
--- a/tools/testing/selftests/net/big_tcp.sh
+++ b/tools/testing/selftests/net/big_tcp.sh
@@ -122,7 +122,9 @@
local netns=$1
[ "$NF" = "6" ] && serip=$SERVER_IP6
- ip net exec $netns netperf -$NF -t TCP_STREAM -H $serip 2>&1 >/dev/null
+
+ # use large write to be sure to generate big tcp packets
+ ip net exec $netns netperf -$NF -t TCP_STREAM -l 1 -H $serip -- -m 262144 2>&1 >/dev/null
}
do_test() {
diff --git a/tools/testing/selftests/net/cmsg_ipv6.sh b/tools/testing/selftests/net/cmsg_ipv6.sh
index f30bd57..8bc23fb 100755
--- a/tools/testing/selftests/net/cmsg_ipv6.sh
+++ b/tools/testing/selftests/net/cmsg_ipv6.sh
@@ -89,7 +89,7 @@
check_result $? 0 "TCLASS $prot $ovr - pass"
while [ -d /proc/$BG ]; do
- $NSEXE ./cmsg_sender -6 -p u $TGT6 1234
+ $NSEXE ./cmsg_sender -6 -p $p $m $((TOS2)) $TGT6 1234
done
tcpdump -r $TMPF -v 2>&1 | grep "class $TOS2" >> /dev/null
@@ -126,7 +126,7 @@
check_result $? 0 "HOPLIMIT $prot $ovr - pass"
while [ -d /proc/$BG ]; do
- $NSEXE ./cmsg_sender -6 -p u $TGT6 1234
+ $NSEXE ./cmsg_sender -6 -p $p $m $LIM $TGT6 1234
done
tcpdump -r $TMPF -v 2>&1 | grep "hlim $LIM[^0-9]" >> /dev/null
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index 3b749ad..5e4390c 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -24,10 +24,14 @@
CONFIG_INET_DIAG=y
CONFIG_INET_ESP=y
CONFIG_INET_ESP_OFFLOAD=y
+CONFIG_NET_FOU=y
+CONFIG_NET_FOU_IP_TUNNELS=y
CONFIG_IP_GRE=m
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_NF_CONNTRACK=m
+CONFIG_IPV6_SIT=y
+CONFIG_IP_DCCP=m
CONFIG_NF_NAT=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP_NF_IPTABLES=m
@@ -62,6 +66,7 @@
CONFIG_NET_CLS_U32=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IPGRE=m
+CONFIG_NET_IPIP=y
CONFIG_NET_SCH_FQ_CODEL=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_FQ=m
@@ -78,7 +83,6 @@
CONFIG_TRACEPOINTS=y
CONFIG_NET_DROP_MONITOR=m
CONFIG_NETDEVSIM=m
-CONFIG_NET_FOU=m
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m
CONFIG_NET_SCH_INGRESS=m
diff --git a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh
index 9af9f69..c62331b2 100755
--- a/tools/testing/selftests/net/forwarding/bridge_locked_port.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_locked_port.sh
@@ -327,10 +327,10 @@
RET=0
check_port_mab_support || return 0
- bridge link set dev $swp1 learning on locked on mab on
tc qdisc add dev $swp1 clsact
tc filter add dev $swp1 ingress protocol all pref 1 handle 101 flower \
action mirred egress redirect dev $swp2
+ bridge link set dev $swp1 learning on locked on mab on
ping_do $h1 192.0.2.2
check_err $? "Ping did not work with redirection"
@@ -349,8 +349,8 @@
check_err $? "Locked entry not created after deleting filter"
bridge fdb del `mac_get $h1` vlan 1 dev $swp1 master
- tc qdisc del dev $swp1 clsact
bridge link set dev $swp1 learning off locked off mab off
+ tc qdisc del dev $swp1 clsact
log_test "Locked port MAB redirect"
}
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
index 61348f7..d9d5874 100755
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
@@ -329,7 +329,7 @@
bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q " 0.00"
check_err $? "(*, G) \"permanent\" entry has a pending group timer"
- bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "\/0.00"
+ bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00"
check_err $? "\"permanent\" source entry has a pending source timer"
bridge mdb del dev br0 port $swp1 grp $grp vid 10
@@ -346,7 +346,7 @@
bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q " 0.00"
check_fail $? "(*, G) EXCLUDE entry does not have a pending group timer"
- bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "\/0.00"
+ bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00"
check_err $? "\"blocked\" source entry has a pending source timer"
bridge mdb del dev br0 port $swp1 grp $grp vid 10
@@ -363,7 +363,7 @@
bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q " 0.00"
check_err $? "(*, G) INCLUDE entry has a pending group timer"
- bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "\/0.00"
+ bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00"
check_fail $? "Source entry does not have a pending source timer"
bridge mdb del dev br0 port $swp1 grp $grp vid 10
@@ -1252,14 +1252,17 @@
echo
log_info "# Forwarding tests"
+ # Set the Max Response Delay to 100 centiseconds (1 second) so that the
+ # bridge will start forwarding according to its MDB soon after a
+ # multicast querier is enabled.
+ ip link set dev br0 type bridge mcast_query_response_interval 100
+
# Forwarding according to MDB entries only takes place when the bridge
# detects that there is a valid querier in the network. Set the bridge
# as the querier and assign it a valid IPv6 link-local address to be
# used as the source address for MLD queries.
ip -6 address add fe80::1/64 nodad dev br0
ip link set dev br0 type bridge mcast_querier 1
- # Wait the default Query Response Interval (10 seconds) for the bridge
- # to determine that there are no other queriers in the network.
sleep 10
fwd_test_host
@@ -1267,6 +1270,7 @@
ip link set dev br0 type bridge mcast_querier 0
ip -6 address del fe80::1/64 dev br0
+ ip link set dev br0 type bridge mcast_query_response_interval 1000
}
ctrl_igmpv3_is_in_test()
diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh
index b0f5e55..5896296 100755
--- a/tools/testing/selftests/net/forwarding/tc_actions.sh
+++ b/tools/testing/selftests/net/forwarding/tc_actions.sh
@@ -235,9 +235,6 @@
check_err $? "didn't mirred redirect ICMP"
tc_check_packets "dev $h1 ingress" 102 10
check_err $? "didn't drop mirred ICMP"
- local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
- test ${overlimits} = 10
- check_err $? "wrong overlimits, expected 10 got ${overlimits}"
tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
diff --git a/tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh b/tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh
index 20a7cb7..c2420bb 100755
--- a/tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh
+++ b/tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh
@@ -209,14 +209,17 @@
# both registered and unregistered multicast traffic.
bridge link set dev $swp2 mcast_router 2
+ # Set the Max Response Delay to 100 centiseconds (1 second) so that the
+ # bridge will start forwarding according to its MDB soon after a
+ # multicast querier is enabled.
+ ip link set dev br1 type bridge mcast_query_response_interval 100
+
# Forwarding according to MDB entries only takes place when the bridge
# detects that there is a valid querier in the network. Set the bridge
# as the querier and assign it a valid IPv6 link-local address to be
# used as the source address for MLD queries.
ip link set dev br1 type bridge mcast_querier 1
ip -6 address add fe80::1/64 nodad dev br1
- # Wait the default Query Response Interval (10 seconds) for the bridge
- # to determine that there are no other queriers in the network.
sleep 10
test_l2_miss_multicast_ipv4
@@ -224,6 +227,7 @@
ip -6 address del fe80::1/64 dev br1
ip link set dev br1 type bridge mcast_querier 0
+ ip link set dev br1 type bridge mcast_query_response_interval 1000
bridge link set dev $swp2 mcast_router 1
}
diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
index 19352f1..02c21ff 100755
--- a/tools/testing/selftests/net/gro.sh
+++ b/tools/testing/selftests/net/gro.sh
@@ -31,6 +31,11 @@
1>>log.txt
wait "${server_pid}"
exit_code=$?
+ if [[ ${test} == "large" && -n "${KSFT_MACHINE_SLOW}" && \
+ ${exit_code} -ne 0 ]]; then
+ echo "Ignoring errors due to slow environment" 1>&2
+ exit_code=0
+ fi
if [[ "${exit_code}" -eq 0 ]]; then
break;
fi
diff --git a/tools/testing/selftests/net/ioam6.sh b/tools/testing/selftests/net/ioam6.sh
index fe59ca3e..12491850 100755
--- a/tools/testing/selftests/net/ioam6.sh
+++ b/tools/testing/selftests/net/ioam6.sh
@@ -367,14 +367,12 @@
local desc=$2
local node_src=$3
local node_dst=$4
- local ip6_src=$5
- local ip6_dst=$6
- local if_dst=$7
- local trace_type=$8
- local ioam_ns=$9
+ local ip6_dst=$5
+ local trace_type=$6
+ local ioam_ns=$7
+ local type=$8
- ip netns exec $node_dst ./ioam6_parser $if_dst $name $ip6_src $ip6_dst \
- $trace_type $ioam_ns &
+ ip netns exec $node_dst ./ioam6_parser $name $trace_type $ioam_ns $type &
local spid=$!
sleep 0.1
@@ -489,7 +487,7 @@
trace prealloc type 0x800000 ns 0 size 4 dev veth0
run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
- db01::2 db01::1 veth0 0x800000 0
+ db01::1 0x800000 0 $1
[ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
}
@@ -509,7 +507,7 @@
trace prealloc type 0xc00000 ns 123 size 4 dev veth0
run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
- db01::2 db01::1 veth0 0xc00000 123
+ db01::1 0xc00000 123 $1
[ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
}
@@ -543,14 +541,14 @@
if [ $cmd_res != 0 ]
then
npassed=$((npassed+1))
- log_test_passed "$descr"
+ log_test_passed "$descr ($1 mode)"
else
nfailed=$((nfailed+1))
- log_test_failed "$descr"
+ log_test_failed "$descr ($1 mode)"
fi
else
run_test "out_bit$i" "$descr ($1 mode)" $ioam_node_alpha \
- $ioam_node_beta db01::2 db01::1 veth0 ${bit2type[$i]} 123
+ $ioam_node_beta db01::1 ${bit2type[$i]} 123 $1
fi
done
@@ -574,7 +572,7 @@
trace prealloc type 0xfff002 ns 123 size 100 dev veth0
run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
- db01::2 db01::1 veth0 0xfff002 123
+ db01::1 0xfff002 123 $1
[ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
}
@@ -604,7 +602,7 @@
trace prealloc type 0x800000 ns 0 size 4 dev veth0
run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
- db01::2 db01::1 veth0 0x800000 0
+ db01::1 0x800000 0 $1
[ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
}
@@ -624,7 +622,7 @@
trace prealloc type 0xc00000 ns 123 size 4 dev veth0
run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
- db01::2 db01::1 veth0 0xc00000 123
+ db01::1 0xc00000 123 $1
[ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
}
@@ -651,7 +649,7 @@
dev veth0
run_test "in_bit$i" "${desc/<n>/$i} ($1 mode)" $ioam_node_alpha \
- $ioam_node_beta db01::2 db01::1 veth0 ${bit2type[$i]} 123
+ $ioam_node_beta db01::1 ${bit2type[$i]} 123 $1
done
[ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
@@ -679,7 +677,7 @@
trace prealloc type 0xc00000 ns 123 size 4 dev veth0
run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
- db01::2 db01::1 veth0 0xc00000 123
+ db01::1 0xc00000 123 $1
[ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
@@ -703,7 +701,7 @@
trace prealloc type 0xfff002 ns 123 size 80 dev veth0
run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
- db01::2 db01::1 veth0 0xfff002 123
+ db01::1 0xfff002 123 $1
[ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
}
@@ -731,7 +729,7 @@
trace prealloc type 0xfff002 ns 123 size 244 via db01::1 dev veth0
run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_gamma \
- db01::2 db02::2 veth0 0xfff002 123
+ db02::2 0xfff002 123 $1
[ "$1" = "encap" ] && ip -netns $ioam_node_gamma link set ip6tnl0 down
}
diff --git a/tools/testing/selftests/net/ioam6_parser.c b/tools/testing/selftests/net/ioam6_parser.c
index d9d1d41..895e5bb 100644
--- a/tools/testing/selftests/net/ioam6_parser.c
+++ b/tools/testing/selftests/net/ioam6_parser.c
@@ -8,7 +8,6 @@
#include <errno.h>
#include <limits.h>
#include <linux/const.h>
-#include <linux/if_ether.h>
#include <linux/ioam6.h>
#include <linux/ipv6.h>
#include <stdlib.h>
@@ -512,14 +511,6 @@ static int str2id(const char *tname)
return -1;
}
-static int ipv6_addr_equal(const struct in6_addr *a1, const struct in6_addr *a2)
-{
- return ((a1->s6_addr32[0] ^ a2->s6_addr32[0]) |
- (a1->s6_addr32[1] ^ a2->s6_addr32[1]) |
- (a1->s6_addr32[2] ^ a2->s6_addr32[2]) |
- (a1->s6_addr32[3] ^ a2->s6_addr32[3])) == 0;
-}
-
static int get_u32(__u32 *val, const char *arg, int base)
{
unsigned long res;
@@ -603,70 +594,80 @@ static int (*func[__TEST_MAX])(int, struct ioam6_trace_hdr *, __u32, __u16) = {
int main(int argc, char **argv)
{
- int fd, size, hoplen, tid, ret = 1;
- struct in6_addr src, dst;
+ int fd, size, hoplen, tid, ret = 1, on = 1;
struct ioam6_hdr *opt;
- struct ipv6hdr *ip6h;
- __u8 buffer[400], *p;
- __u16 ioam_ns;
+ struct cmsghdr *cmsg;
+ struct msghdr msg;
+ struct iovec iov;
+ __u8 buffer[512];
__u32 tr_type;
+ __u16 ioam_ns;
+ __u8 *ptr;
- if (argc != 7)
+ if (argc != 5)
goto out;
- tid = str2id(argv[2]);
+ tid = str2id(argv[1]);
if (tid < 0 || !func[tid])
goto out;
- if (inet_pton(AF_INET6, argv[3], &src) != 1 ||
- inet_pton(AF_INET6, argv[4], &dst) != 1)
+ if (get_u32(&tr_type, argv[2], 16) ||
+ get_u16(&ioam_ns, argv[3], 0))
goto out;
- if (get_u32(&tr_type, argv[5], 16) ||
- get_u16(&ioam_ns, argv[6], 0))
+ fd = socket(PF_INET6, SOCK_RAW,
+ !strcmp(argv[4], "encap") ? IPPROTO_IPV6 : IPPROTO_ICMPV6);
+ if (fd < 0)
goto out;
- fd = socket(AF_PACKET, SOCK_DGRAM, __cpu_to_be16(ETH_P_IPV6));
- if (!fd)
- goto out;
+ setsockopt(fd, IPPROTO_IPV6, IPV6_RECVHOPOPTS, &on, sizeof(on));
- if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
- argv[1], strlen(argv[1])))
+ iov.iov_len = 1;
+ iov.iov_base = malloc(CMSG_SPACE(sizeof(buffer)));
+ if (!iov.iov_base)
goto close;
-
recv:
- size = recv(fd, buffer, sizeof(buffer), 0);
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_iov = &iov;
+ msg.msg_iovlen = 1;
+ msg.msg_control = buffer;
+ msg.msg_controllen = CMSG_SPACE(sizeof(buffer));
+
+ size = recvmsg(fd, &msg, 0);
if (size <= 0)
goto close;
- ip6h = (struct ipv6hdr *)buffer;
+ for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+ if (cmsg->cmsg_level != IPPROTO_IPV6 ||
+ cmsg->cmsg_type != IPV6_HOPOPTS ||
+ cmsg->cmsg_len < sizeof(struct ipv6_hopopt_hdr))
+ continue;
- if (!ipv6_addr_equal(&ip6h->saddr, &src) ||
- !ipv6_addr_equal(&ip6h->daddr, &dst))
- goto recv;
+ ptr = (__u8 *)CMSG_DATA(cmsg);
- if (ip6h->nexthdr != IPPROTO_HOPOPTS)
- goto close;
+ hoplen = (ptr[1] + 1) << 3;
+ ptr += sizeof(struct ipv6_hopopt_hdr);
- p = buffer + sizeof(*ip6h);
- hoplen = (p[1] + 1) << 3;
- p += sizeof(struct ipv6_hopopt_hdr);
+ while (hoplen > 0) {
+ opt = (struct ioam6_hdr *)ptr;
- while (hoplen > 0) {
- opt = (struct ioam6_hdr *)p;
+ if (opt->opt_type == IPV6_TLV_IOAM &&
+ opt->type == IOAM6_TYPE_PREALLOC) {
+ ptr += sizeof(*opt);
+ ret = func[tid](tid,
+ (struct ioam6_trace_hdr *)ptr,
+ tr_type, ioam_ns);
+ goto close;
+ }
- if (opt->opt_type == IPV6_TLV_IOAM &&
- opt->type == IOAM6_TYPE_PREALLOC) {
- p += sizeof(*opt);
- ret = func[tid](tid, (struct ioam6_trace_hdr *)p,
- tr_type, ioam_ns);
- break;
+ ptr += opt->opt_len + 2;
+ hoplen -= opt->opt_len + 2;
}
-
- p += opt->opt_len + 2;
- hoplen -= opt->opt_len + 2;
}
+
+ goto recv;
close:
+ free(iov.iov_base);
close(fd);
out:
return ret;
diff --git a/tools/testing/selftests/net/ip_local_port_range.c b/tools/testing/selftests/net/ip_local_port_range.c
index 0f217a1..6ebd588 100644
--- a/tools/testing/selftests/net/ip_local_port_range.c
+++ b/tools/testing/selftests/net/ip_local_port_range.c
@@ -16,6 +16,10 @@
#define IP_LOCAL_PORT_RANGE 51
#endif
+#ifndef IPPROTO_MPTCP
+#define IPPROTO_MPTCP 262
+#endif
+
static __u32 pack_port_range(__u16 lo, __u16 hi)
{
return (hi << 16) | (lo << 0);
diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index 04fcb8a..75fc956 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -20,7 +20,7 @@
ip netns pids "${ns}" | xargs --no-run-if-empty kill -SIGUSR1 &>/dev/null
- for _ in $(seq 10); do
+ for _ in $(seq $((timeout_poll * 10))); do
[ -z "$(ip netns pids "${ns}")" ] && break
sleep 0.1
done
@@ -62,14 +62,14 @@
nr=$(eval $command)
printf "%-50s" "$msg"
- if [ $nr != $expected ]; then
- if [ $nr = "$skip" ] && ! mptcp_lib_expect_all_features; then
+ if [ "$nr" != "$expected" ]; then
+ if [ "$nr" = "$skip" ] && ! mptcp_lib_expect_all_features; then
echo "[ skip ] Feature probably not supported"
mptcp_lib_result_skip "${msg}"
else
echo "[ fail ] expected $expected found $nr"
mptcp_lib_result_fail "${msg}"
- ret=$test_cnt
+ ret=${KSFT_FAIL}
fi
else
echo "[ ok ]"
@@ -91,6 +91,15 @@
__chk_msk_nr "grep -c token:" "$@"
}
+chk_listener_nr()
+{
+ local expected=$1
+ local msg="$2"
+
+ __chk_nr "ss -nlHMON $ns | wc -l" "$expected" "$msg - mptcp" 0
+ __chk_nr "ss -nlHtON $ns | wc -l" "$expected" "$msg - subflows"
+}
+
wait_msk_nr()
{
local condition="grep -c token:"
@@ -115,11 +124,11 @@
if [ $i -ge $timeout ]; then
echo "[ fail ] timeout while expecting $expected max $max last $nr"
mptcp_lib_result_fail "${msg} # timeout"
- ret=$test_cnt
+ ret=${KSFT_FAIL}
elif [ $nr != $expected ]; then
echo "[ fail ] expected $expected found $nr"
mptcp_lib_result_fail "${msg} # unexpected result"
- ret=$test_cnt
+ ret=${KSFT_FAIL}
else
echo "[ ok ]"
mptcp_lib_result_pass "${msg}"
@@ -166,9 +175,13 @@
chk_msk_inuse()
{
local expected=$1
- local msg="$2"
+ local msg="....chk ${2:-${expected}} msk in use"
local listen_nr
+ if [ "${expected}" -eq 0 ]; then
+ msg+=" after flush"
+ fi
+
listen_nr=$(ss -N "${ns}" -Ml | grep -c LISTEN)
expected=$((expected + listen_nr))
@@ -179,16 +192,21 @@
sleep 0.1
done
- __chk_nr get_msk_inuse $expected "$msg" 0
+ __chk_nr get_msk_inuse $expected "${msg}" 0
}
# $1: cestab nr
chk_msk_cestab()
{
- local cestab=$1
+ local expected=$1
+ local msg="....chk ${2:-${expected}} cestab"
+
+ if [ "${expected}" -eq 0 ]; then
+ msg+=" after flush"
+ fi
__chk_nr "mptcp_lib_get_counter ${ns} MPTcpExtMPCurrEstab" \
- "${cestab}" "....chk ${cestab} cestab" ""
+ "${expected}" "${msg}" ""
}
wait_connected()
@@ -227,12 +245,12 @@
chk_msk_nr 2 "after MPC handshake "
chk_msk_remote_key_nr 2 "....chk remote_key"
chk_msk_fallback_nr 0 "....chk no fallback"
-chk_msk_inuse 2 "....chk 2 msk in use"
+chk_msk_inuse 2
chk_msk_cestab 2
flush_pids
-chk_msk_inuse 0 "....chk 0 msk in use after flush"
-chk_msk_cestab 0
+chk_msk_inuse 0 "2->0"
+chk_msk_cestab 0 "2->0"
echo "a" | \
timeout ${timeout_test} \
@@ -247,12 +265,12 @@
127.0.0.1 >/dev/null &
wait_connected $ns 10001
chk_msk_fallback_nr 1 "check fallback"
-chk_msk_inuse 1 "....chk 1 msk in use"
+chk_msk_inuse 1
chk_msk_cestab 1
flush_pids
-chk_msk_inuse 0 "....chk 0 msk in use after flush"
-chk_msk_cestab 0
+chk_msk_inuse 0 "1->0"
+chk_msk_cestab 0 "1->0"
NR_CLIENTS=100
for I in `seq 1 $NR_CLIENTS`; do
@@ -273,12 +291,28 @@
done
wait_msk_nr $((NR_CLIENTS*2)) "many msk socket present"
-chk_msk_inuse $((NR_CLIENTS*2)) "....chk many msk in use"
-chk_msk_cestab $((NR_CLIENTS*2))
+chk_msk_inuse $((NR_CLIENTS*2)) "many"
+chk_msk_cestab $((NR_CLIENTS*2)) "many"
flush_pids
-chk_msk_inuse 0 "....chk 0 msk in use after flush"
-chk_msk_cestab 0
+chk_msk_inuse 0 "many->0"
+chk_msk_cestab 0 "many->0"
+
+chk_listener_nr 0 "no listener sockets"
+NR_SERVERS=100
+for I in $(seq 1 $NR_SERVERS); do
+ ip netns exec $ns ./mptcp_connect -p $((I + 20001)) \
+ -t ${timeout_poll} -l 0.0.0.0 >/dev/null 2>&1 &
+done
+mptcp_lib_wait_local_port_listen $ns $((NR_SERVERS + 20001))
+
+chk_listener_nr $NR_SERVERS "many listener sockets"
+
+# graceful termination
+for I in $(seq 1 $NR_SERVERS); do
+ echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 >/dev/null 2>&1 &
+done
+flush_pids
mptcp_lib_result_print_all_tap
exit $ret
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index c07386e2..e4581b0 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -161,6 +161,11 @@
exit $ksft_skip
fi
+ if ! ss -h | grep -q MPTCP; then
+ echo "SKIP: ss tool does not support MPTCP"
+ exit $ksft_skip
+ fi
+
# Use the legacy version if available to support old kernel versions
if iptables-legacy -V &> /dev/null; then
iptables="iptables-legacy"
@@ -3333,16 +3338,17 @@
{
local evts=$evts_ns1
local t=${3:-1}
- local ip=4
+ local ip
local tk da dp sp
local cnt
[ "$1" == "$ns2" ] && evts=$evts_ns2
- if mptcp_lib_is_v6 $2; then ip=6; fi
+ [ -n "$(mptcp_lib_evts_get_info "saddr4" "$evts" $t)" ] && ip=4
+ [ -n "$(mptcp_lib_evts_get_info "saddr6" "$evts" $t)" ] && ip=6
tk=$(mptcp_lib_evts_get_info token "$evts")
- da=$(mptcp_lib_evts_get_info "daddr$ip" "$evts" $t)
- dp=$(mptcp_lib_evts_get_info dport "$evts" $t)
- sp=$(mptcp_lib_evts_get_info sport "$evts" $t)
+ da=$(mptcp_lib_evts_get_info "daddr$ip" "$evts" $t $2)
+ dp=$(mptcp_lib_evts_get_info dport "$evts" $t $2)
+ sp=$(mptcp_lib_evts_get_info sport "$evts" $t $2)
cnt=$(rm_sf_count ${1})
ip netns exec $1 ./pm_nl_ctl dsf lip $2 lport $sp \
@@ -3429,20 +3435,23 @@
if reset_with_events "userspace pm add & remove address" &&
continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
set_userspace_pm $ns1
- pm_nl_set_limits $ns2 1 1
+ pm_nl_set_limits $ns2 2 2
speed=5 \
run_tests $ns1 $ns2 10.0.1.1 &
local tests_pid=$!
wait_mpj $ns1
userspace_pm_add_addr $ns1 10.0.2.1 10
- chk_join_nr 1 1 1
- chk_add_nr 1 1
- chk_mptcp_info subflows 1 subflows 1
- chk_subflows_total 2 2
- chk_mptcp_info add_addr_signal 1 add_addr_accepted 1
+ userspace_pm_add_addr $ns1 10.0.3.1 20
+ chk_join_nr 2 2 2
+ chk_add_nr 2 2
+ chk_mptcp_info subflows 2 subflows 2
+ chk_subflows_total 3 3
+ chk_mptcp_info add_addr_signal 2 add_addr_accepted 2
userspace_pm_rm_addr $ns1 10
userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED
- chk_rm_nr 1 1 invert
+ userspace_pm_rm_addr $ns1 20
+ userspace_pm_rm_sf $ns1 10.0.3.1 $SUB_ESTABLISHED
+ chk_rm_nr 2 2 invert
chk_mptcp_info subflows 0 subflows 0
chk_subflows_total 1 1
kill_events_pids
diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
index 3a2abae..3777d66 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
@@ -213,9 +213,9 @@
grep "${2}" | sed -n 's/.*\('"${1}"':\)\([0-9a-f:.]*\).*$/\2/p;q'
}
-# $1: info name ; $2: evts_ns ; $3: event type
+# $1: info name ; $2: evts_ns ; [$3: event type; [$4: addr]]
mptcp_lib_evts_get_info() {
- mptcp_lib_get_info_value "${1}" "^type:${3:-1}," < "${2}"
+ grep "${4:-}" "${2}" | mptcp_lib_get_info_value "${1}" "^type:${3:-1},"
}
# $1: PID
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index 8f4ff12..71899a3 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -183,7 +183,7 @@
subflow 10.0.1.1" " (nobackup)"
# fullmesh support has been added later
-ip netns exec $ns1 ./pm_nl_ctl set id 1 flags fullmesh
+ip netns exec $ns1 ./pm_nl_ctl set id 1 flags fullmesh 2>/dev/null
if ip netns exec $ns1 ./pm_nl_ctl dump | grep -q "fullmesh" ||
mptcp_lib_expect_all_features; then
check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
@@ -194,6 +194,12 @@
ip netns exec $ns1 ./pm_nl_ctl set id 1 flags backup,fullmesh
check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
subflow,backup,fullmesh 10.0.1.1" " (backup,fullmesh)"
+else
+ for st in fullmesh nofullmesh backup,fullmesh; do
+ st=" (${st})"
+ printf "%-50s%s\n" "${st}" "[SKIP]"
+ mptcp_lib_result_skip "${st}"
+ done
fi
mptcp_lib_result_print_all_tap
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 0cc964e..8f9ddb3 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -250,7 +250,8 @@
[ $bail -eq 0 ] || exit $ret
fi
- printf "%-60s" "$msg - reverse direction"
+ msg+=" - reverse direction"
+ printf "%-60s" "${msg}"
do_transfer $large $small $time
lret=$?
mptcp_lib_result_code "${lret}" "${msg}"
diff --git a/tools/testing/selftests/net/mptcp/userspace_pm.sh b/tools/testing/selftests/net/mptcp/userspace_pm.sh
index 6167837..1b94a75 100755
--- a/tools/testing/selftests/net/mptcp/userspace_pm.sh
+++ b/tools/testing/selftests/net/mptcp/userspace_pm.sh
@@ -75,7 +75,7 @@
{
test_name="${1}"
- _printf "%-63s" "${test_name}"
+ _printf "%-68s" "${test_name}"
}
print_results()
@@ -542,7 +542,7 @@
local remid
local info
- info="${e_saddr} (${e_from}) => ${e_daddr} (${e_to})"
+ info="${e_saddr} (${e_from}) => ${e_daddr}:${e_dport} (${e_to})"
if [ "$e_type" = "$SUB_ESTABLISHED" ]
then
diff --git a/tools/testing/selftests/net/net_helper.sh b/tools/testing/selftests/net/net_helper.sh
index 4fe0bef..6596fe0 100644
--- a/tools/testing/selftests/net/net_helper.sh
+++ b/tools/testing/selftests/net/net_helper.sh
@@ -8,13 +8,16 @@
local listener_ns="${1}"
local port="${2}"
local protocol="${3}"
- local port_hex
+ local pattern
local i
- port_hex="$(printf "%04X" "${port}")"
+ pattern=":$(printf "%04X" "${port}") "
+
+ # for tcp protocol additionally check the socket state
+ [ ${protocol} = "tcp" ] && pattern="${pattern}0A"
for i in $(seq 10); do
- if ip netns exec "${listener_ns}" cat /proc/net/"${protocol}"* | \
- grep -q "${port_hex}"; then
+ if ip netns exec "${listener_ns}" awk '{print $2" "$4}' \
+ /proc/net/"${protocol}"* | grep -q "${pattern}"; then
break
fi
sleep 0.1
diff --git a/tools/testing/selftests/net/openvswitch/openvswitch.sh b/tools/testing/selftests/net/openvswitch/openvswitch.sh
index f8499d4..36e4025 100755
--- a/tools/testing/selftests/net/openvswitch/openvswitch.sh
+++ b/tools/testing/selftests/net/openvswitch/openvswitch.sh
@@ -502,9 +502,22 @@
wc -l) == 2 ] || \
return 1
+ info "Checking clone depth"
ERR_MSG="Flow actions may not be safe on all matching packets"
PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
ovs_add_flow "test_netlink_checks" nv0 \
+ 'in_port(1),eth(),eth_type(0x800),ipv4()' \
+ 'clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(drop)))))))))))))))))' \
+ >/dev/null 2>&1 && return 1
+ POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
+
+ if [ "$PRE_TEST" == "$POST_TEST" ]; then
+ info "failed - clone depth too large"
+ return 1
+ fi
+
+ PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
+ ovs_add_flow "test_netlink_checks" nv0 \
'in_port(1),eth(),eth_type(0x0806),arp()' 'drop(0),2' \
&> /dev/null && return 1
POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
diff --git a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
index b97e621..5e0e539 100644
--- a/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
+++ b/tools/testing/selftests/net/openvswitch/ovs-dpctl.py
@@ -299,7 +299,7 @@
("OVS_ACTION_ATTR_PUSH_NSH", "none"),
("OVS_ACTION_ATTR_POP_NSH", "flag"),
("OVS_ACTION_ATTR_METER", "none"),
- ("OVS_ACTION_ATTR_CLONE", "none"),
+ ("OVS_ACTION_ATTR_CLONE", "recursive"),
("OVS_ACTION_ATTR_CHECK_PKT_LEN", "none"),
("OVS_ACTION_ATTR_ADD_MPLS", "none"),
("OVS_ACTION_ATTR_DEC_TTL", "none"),
@@ -465,29 +465,42 @@
print_str += "pop_mpls"
else:
datum = self.get_attr(field[0])
- print_str += datum.dpstr(more)
+ if field[0] == "OVS_ACTION_ATTR_CLONE":
+ print_str += "clone("
+ print_str += datum.dpstr(more)
+ print_str += ")"
+ else:
+ print_str += datum.dpstr(more)
return print_str
def parse(self, actstr):
+ totallen = len(actstr)
while len(actstr) != 0:
parsed = False
+ parencount = 0
if actstr.startswith("drop"):
# If no reason is provided, the implicit drop is used (i.e no
# action). If some reason is given, an explicit action is used.
- actstr, reason = parse_extract_field(
- actstr,
- "drop(",
- "([0-9]+)",
- lambda x: int(x, 0),
- False,
- None,
- )
+ reason = None
+ if actstr.startswith("drop("):
+ parencount += 1
+
+ actstr, reason = parse_extract_field(
+ actstr,
+ "drop(",
+ "([0-9]+)",
+ lambda x: int(x, 0),
+ False,
+ None,
+ )
+
if reason is not None:
self["attrs"].append(["OVS_ACTION_ATTR_DROP", reason])
parsed = True
else:
- return
+ actstr = actstr[len("drop"): ]
+ return (totallen - len(actstr))
elif parse_starts_block(actstr, "^(\d+)", False, True):
actstr, output = parse_extract_field(
@@ -504,6 +517,7 @@
False,
0,
)
+ parencount += 1
self["attrs"].append(["OVS_ACTION_ATTR_RECIRC", recircid])
parsed = True
@@ -516,12 +530,22 @@
for flat_act in parse_flat_map:
if parse_starts_block(actstr, flat_act[0], False):
- actstr += len(flat_act[0])
+ actstr = actstr[len(flat_act[0]):]
self["attrs"].append([flat_act[1]])
actstr = actstr[strspn(actstr, ", ") :]
parsed = True
- if parse_starts_block(actstr, "ct(", False):
+ if parse_starts_block(actstr, "clone(", False):
+ parencount += 1
+ subacts = ovsactions()
+ actstr = actstr[len("clone("):]
+ parsedLen = subacts.parse(actstr)
+ lst = []
+ self["attrs"].append(("OVS_ACTION_ATTR_CLONE", subacts))
+ actstr = actstr[parsedLen:]
+ parsed = True
+ elif parse_starts_block(actstr, "ct(", False):
+ parencount += 1
actstr = actstr[len("ct(") :]
ctact = ovsactions.ctact()
@@ -553,6 +577,7 @@
natact = ovsactions.ctact.natattr()
if actstr.startswith("("):
+ parencount += 1
t = None
actstr = actstr[1:]
if actstr.startswith("src"):
@@ -607,15 +632,29 @@
actstr = actstr[strspn(actstr, ", ") :]
ctact["attrs"].append(["OVS_CT_ATTR_NAT", natact])
- actstr = actstr[strspn(actstr, ",) ") :]
+ actstr = actstr[strspn(actstr, ", ") :]
self["attrs"].append(["OVS_ACTION_ATTR_CT", ctact])
parsed = True
- actstr = actstr[strspn(actstr, "), ") :]
+ actstr = actstr[strspn(actstr, ", ") :]
+ while parencount > 0:
+ parencount -= 1
+ actstr = actstr[strspn(actstr, " "):]
+ if len(actstr) and actstr[0] != ")":
+ raise ValueError("Action str: '%s' unbalanced" % actstr)
+ actstr = actstr[1:]
+
+ if len(actstr) and actstr[0] == ")":
+ return (totallen - len(actstr))
+
+ actstr = actstr[strspn(actstr, ", ") :]
+
if not parsed:
raise ValueError("Action str: '%s' not supported" % actstr)
+ return (totallen - len(actstr))
+
class ovskey(nla):
nla_flags = NLA_F_NESTED
@@ -2111,6 +2150,8 @@
ovsflow = OvsFlow()
ndb = NDB()
+ sys.setrecursionlimit(100000)
+
if hasattr(args, "showdp"):
found = False
for iface in ndb.interfaces:
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 3f118e3..cfc8495 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -199,6 +199,7 @@
# Same as above but with IPv6
source lib.sh
+source net_helper.sh
PAUSE_ON_FAIL=no
VERBOSE=0
@@ -1335,12 +1336,14 @@
else
TCPDST="TCP:[${dst}]:50000"
fi
- ${ns_b} socat -T 3 -u -6 TCP-LISTEN:50000 STDOUT > $tmpoutfile &
+ ${ns_b} socat -T 3 -u -6 TCP-LISTEN:50000,reuseaddr STDOUT > $tmpoutfile &
+ local socat_pid=$!
- sleep 1
+ wait_local_port_listen ${NS_B} 50000 tcp
dd if=/dev/zero status=none bs=1M count=1 | ${target} socat -T 3 -u STDIN $TCPDST,connect-timeout=3
+ wait ${socat_pid}
size=$(du -sb $tmpoutfile)
size=${size%%/tmp/*}
@@ -1954,6 +1957,13 @@
return 0
}
+check_running() {
+ pid=${1}
+ cmd=${2}
+
+ [ "$(cat /proc/${pid}/cmdline 2>/dev/null | tr -d '\0')" = "{cmd}" ]
+}
+
test_cleanup_vxlanX_exception() {
outer="${1}"
encap="vxlan"
@@ -1984,11 +1994,12 @@
${ns_a} ip link del dev veth_A-R1 &
iplink_pid=$!
- sleep 1
- if [ "$(cat /proc/${iplink_pid}/cmdline 2>/dev/null | tr -d '\0')" = "iplinkdeldevveth_A-R1" ]; then
- err " can't delete veth device in a timely manner, PMTU dst likely leaked"
- return 1
- fi
+ for i in $(seq 1 20); do
+ check_running ${iplink_pid} "iplinkdeldevveth_A-R1" || return 0
+ sleep 0.1
+ done
+ err " can't delete veth device in a timely manner, PMTU dst likely leaked"
+ return 1
}
test_cleanup_ipv6_exception() {
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 4667d74..874a295 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -440,7 +440,6 @@
local ret=0
vxlan="test-vxlan0"
vlan="test-vlan0"
- testns="$1"
run_cmd ip -netns "$testns" link add "$vxlan" type vxlan id 42 group 239.1.1.1 \
dev "$devdummy" dstport 4789
if [ $? -ne 0 ]; then
@@ -485,7 +484,6 @@
{
local ret=0
name="test-fou"
- testns="$1"
run_cmd_grep 'Usage: ip fou' ip fou help
if [ $? -ne 0 ];then
end_test "SKIP: fou: iproute2 too old"
@@ -526,8 +524,8 @@
run_cmd ip -netns "$testns" link set lo up
run_cmd ip -netns "$testns" link add name "$devdummy" type dummy
run_cmd ip -netns "$testns" link set "$devdummy" up
- run_cmd kci_test_encap_vxlan "$testns"
- run_cmd kci_test_encap_fou "$testns"
+ run_cmd kci_test_encap_vxlan
+ run_cmd kci_test_encap_fou
ip netns del "$testns"
return $ret
diff --git a/tools/testing/selftests/net/so_txtime.sh b/tools/testing/selftests/net/so_txtime.sh
index 3f06f4d..5e861ad 100755
--- a/tools/testing/selftests/net/so_txtime.sh
+++ b/tools/testing/selftests/net/so_txtime.sh
@@ -5,6 +5,7 @@
set -e
+readonly ksft_skip=4
readonly DEV="veth0"
readonly BIN="./so_txtime"
@@ -46,7 +47,7 @@
ip -netns "${NS1}" addr add fd::1/64 dev "${DEV}" nodad
ip -netns "${NS2}" addr add fd::2/64 dev "${DEV}" nodad
-do_test() {
+run_test() {
local readonly IP="$1"
local readonly CLOCK="$2"
local readonly TXARGS="$3"
@@ -64,12 +65,25 @@
fi
local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
+
ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"
wait "$!"
}
+do_test() {
+ run_test $@
+ [ $? -ne 0 ] && ret=1
+}
+
+do_fail_test() {
+ run_test $@
+ [ $? -eq 0 ] && ret=1
+}
+
ip netns exec "${NS1}" tc qdisc add dev "${DEV}" root fq
+set +e
+ret=0
do_test 4 mono a,-1 a,-1
do_test 6 mono a,0 a,0
do_test 6 mono a,10 a,10
@@ -77,13 +91,20 @@
do_test 6 mono a,20,b,10 b,20,a,20
if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta 400000; then
- ! do_test 4 tai a,-1 a,-1
- ! do_test 6 tai a,0 a,0
+ do_fail_test 4 tai a,-1 a,-1
+ do_fail_test 6 tai a,0 a,0
do_test 6 tai a,10 a,10
do_test 4 tai a,10,b,20 a,10,b,20
do_test 6 tai a,20,b,10 b,10,a,20
else
echo "tc ($(tc -V)) does not support qdisc etf. skipping"
+ [ $ret -eq 0 ] && ret=$ksft_skip
fi
-echo OK. All tests passed
+if [ $ret -eq 0 ]; then
+ echo OK. All tests passed
+elif [[ $ret -ne $ksft_skip && -n "$KSFT_MACHINE_SLOW" ]]; then
+ echo "Ignoring errors due to slow environment" 1>&2
+ ret=0
+fi
+exit $ret
diff --git a/tools/testing/selftests/net/tcp_ao/unsigned-md5.c b/tools/testing/selftests/net/tcp_ao/unsigned-md5.c
index c5b568cd..6b59a65 100644
--- a/tools/testing/selftests/net/tcp_ao/unsigned-md5.c
+++ b/tools/testing/selftests/net/tcp_ao/unsigned-md5.c
@@ -110,9 +110,9 @@ static void try_accept(const char *tst_name, unsigned int port,
test_tcp_ao_counters_cmp(tst_name, &ao_cnt1, &ao_cnt2, cnt_expected);
out:
- synchronize_threads(); /* close() */
+ synchronize_threads(); /* test_kill_sk() */
if (sk > 0)
- close(sk);
+ test_kill_sk(sk);
}
static void server_add_routes(void)
@@ -302,10 +302,10 @@ static void try_connect(const char *tst_name, unsigned int port,
test_ok("%s: connected", tst_name);
out:
- synchronize_threads(); /* close() */
+ synchronize_threads(); /* test_kill_sk() */
/* _test_connect_socket() cleans up on failure */
if (ret > 0)
- close(sk);
+ test_kill_sk(sk);
}
#define PREINSTALL_MD5_FIRST BIT(0)
@@ -486,10 +486,10 @@ static void try_to_add(const char *tst_name, unsigned int port,
}
out:
- synchronize_threads(); /* close() */
+ synchronize_threads(); /* test_kill_sk() */
/* _test_connect_socket() cleans up on failure */
if (ret > 0)
- close(sk);
+ test_kill_sk(sk);
}
static void client_add_ip(union tcp_addr *client, const char *ip)
diff --git a/tools/testing/selftests/net/test_bridge_backup_port.sh b/tools/testing/selftests/net/test_bridge_backup_port.sh
index 70a7d87..1b3f89e 100755
--- a/tools/testing/selftests/net/test_bridge_backup_port.sh
+++ b/tools/testing/selftests/net/test_bridge_backup_port.sh
@@ -124,6 +124,16 @@
[[ $pkts == $count ]]
}
+bridge_link_check()
+{
+ local ns=$1; shift
+ local dev=$1; shift
+ local state=$1; shift
+
+ bridge -n $ns -d -j link show dev $dev | \
+ jq -e ".[][\"state\"] == \"$state\"" &> /dev/null
+}
+
################################################################################
# Setup
@@ -259,6 +269,7 @@
log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -268,6 +279,7 @@
log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier on"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding
log_test $? 0 "swp1 carrier on"
# Configure vx0 as the backup port of swp1 and check that packets are
@@ -284,6 +296,7 @@
log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -293,6 +306,7 @@
log_test $? 0 "Forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier on"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding
log_test $? 0 "swp1 carrier on"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -314,6 +328,7 @@
log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -369,6 +384,7 @@
log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -382,6 +398,7 @@
log_test $? 0 "Forwarding using VXLAN FDB entry"
run_cmd "ip -n $sw1 link set dev swp1 carrier on"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding
log_test $? 0 "swp1 carrier on"
# Configure nexthop ID 10 as the backup nexthop ID of swp1 and check
@@ -398,6 +415,7 @@
log_test $? 0 "No forwarding out of vx0"
run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -411,6 +429,7 @@
log_test $? 0 "No forwarding using VXLAN FDB entry"
run_cmd "ip -n $sw1 link set dev swp1 carrier on"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding
log_test $? 0 "swp1 carrier on"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -441,6 +460,7 @@
log_test $? 0 "No forwarding using VXLAN FDB entry"
run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -497,6 +517,7 @@
log_test $? 0 "Valid nexthop as backup nexthop"
run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
log_test $? 0 "swp1 carrier off"
run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -604,7 +625,9 @@
run_cmd "bridge -n $sw2 link set dev swp1 backup_nhid 10"
run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
run_cmd "ip -n $sw2 link set dev swp1 carrier off"
+ busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw2 swp1 disabled
run_cmd "ip netns exec $sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66"
log_test $? 0 "Ping with backup nexthop ID"
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 7799e04..b95c249 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -1002,12 +1002,12 @@ TEST_F(tls, recv_partial)
memset(recv_mem, 0, sizeof(recv_mem));
EXPECT_EQ(send(self->fd, test_str, send_len, 0), send_len);
- EXPECT_NE(recv(self->cfd, recv_mem, strlen(test_str_first),
- MSG_WAITALL), -1);
+ EXPECT_EQ(recv(self->cfd, recv_mem, strlen(test_str_first),
+ MSG_WAITALL), strlen(test_str_first));
EXPECT_EQ(memcmp(test_str_first, recv_mem, strlen(test_str_first)), 0);
memset(recv_mem, 0, sizeof(recv_mem));
- EXPECT_NE(recv(self->cfd, recv_mem, strlen(test_str_second),
- MSG_WAITALL), -1);
+ EXPECT_EQ(recv(self->cfd, recv_mem, strlen(test_str_second),
+ MSG_WAITALL), strlen(test_str_second));
EXPECT_EQ(memcmp(test_str_second, recv_mem, strlen(test_str_second)),
0);
}
@@ -1485,6 +1485,51 @@ TEST_F(tls, control_msg)
EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
}
+TEST_F(tls, control_msg_nomerge)
+{
+ char *rec1 = "1111";
+ char *rec2 = "2222";
+ int send_len = 5;
+ char buf[15];
+
+ if (self->notls)
+ SKIP(return, "no TLS support");
+
+ EXPECT_EQ(tls_send_cmsg(self->fd, 100, rec1, send_len, 0), send_len);
+ EXPECT_EQ(tls_send_cmsg(self->fd, 100, rec2, send_len, 0), send_len);
+
+ EXPECT_EQ(tls_recv_cmsg(_metadata, self->cfd, 100, buf, sizeof(buf), MSG_PEEK), send_len);
+ EXPECT_EQ(memcmp(buf, rec1, send_len), 0);
+
+ EXPECT_EQ(tls_recv_cmsg(_metadata, self->cfd, 100, buf, sizeof(buf), MSG_PEEK), send_len);
+ EXPECT_EQ(memcmp(buf, rec1, send_len), 0);
+
+ EXPECT_EQ(tls_recv_cmsg(_metadata, self->cfd, 100, buf, sizeof(buf), 0), send_len);
+ EXPECT_EQ(memcmp(buf, rec1, send_len), 0);
+
+ EXPECT_EQ(tls_recv_cmsg(_metadata, self->cfd, 100, buf, sizeof(buf), 0), send_len);
+ EXPECT_EQ(memcmp(buf, rec2, send_len), 0);
+}
+
+TEST_F(tls, data_control_data)
+{
+ char *rec1 = "1111";
+ char *rec2 = "2222";
+ char *rec3 = "3333";
+ int send_len = 5;
+ char buf[15];
+
+ if (self->notls)
+ SKIP(return, "no TLS support");
+
+ EXPECT_EQ(send(self->fd, rec1, send_len, 0), send_len);
+ EXPECT_EQ(tls_send_cmsg(self->fd, 100, rec2, send_len, 0), send_len);
+ EXPECT_EQ(send(self->fd, rec3, send_len, 0), send_len);
+
+ EXPECT_EQ(recv(self->cfd, buf, sizeof(buf), MSG_PEEK), send_len);
+ EXPECT_EQ(recv(self->cfd, buf, sizeof(buf), MSG_PEEK), send_len);
+}
+
TEST_F(tls, shutdown)
{
char const *test_str = "test_read";
@@ -1874,13 +1919,13 @@ TEST_F(tls_err, poll_partial_rec_async)
/* Child should sleep in poll(), never get a wake */
pfd.fd = self->cfd2;
pfd.events = POLLIN;
- EXPECT_EQ(poll(&pfd, 1, 5), 0);
+ EXPECT_EQ(poll(&pfd, 1, 20), 0);
EXPECT_EQ(write(p[1], &token, 1), 1); /* Barrier #1 */
pfd.fd = self->cfd2;
pfd.events = POLLIN;
- EXPECT_EQ(poll(&pfd, 1, 5), 1);
+ EXPECT_EQ(poll(&pfd, 1, 20), 1);
exit(!_metadata->passed);
}
diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh
index d6b9c75..9cd5e88 100755
--- a/tools/testing/selftests/net/udpgro_fwd.sh
+++ b/tools/testing/selftests/net/udpgro_fwd.sh
@@ -39,6 +39,10 @@
for ns in $NS_SRC $NS_DST; do
ip netns add $ns
ip -n $ns link set dev lo up
+
+ # disable route solicitations to decrease 'noise' traffic
+ ip netns exec $ns sysctl -qw net.ipv6.conf.default.router_solicitations=0
+ ip netns exec $ns sysctl -qw net.ipv6.conf.all.router_solicitations=0
done
ip link add name veth$SRC type veth peer name veth$DST
@@ -80,6 +84,12 @@
create_vxlan_endpoint $BASE$ns veth$ns $BM_NET_V6$((3 - $ns)) vxlan6$ns 6
ip -n $BASE$ns addr add dev vxlan6$ns $OL_NET_V6$ns/24 nodad
done
+
+ # preload neighbur cache, do avoid some noisy traffic
+ local addr_dst=$(ip -j -n $BASE$DST link show dev vxlan6$DST |jq -r '.[]["address"]')
+ local addr_src=$(ip -j -n $BASE$SRC link show dev vxlan6$SRC |jq -r '.[]["address"]')
+ ip -n $BASE$DST neigh add dev vxlan6$DST lladdr $addr_src $OL_NET_V6$SRC
+ ip -n $BASE$SRC neigh add dev vxlan6$SRC lladdr $addr_dst $OL_NET_V6$DST
}
is_ipv6() {
@@ -119,7 +129,7 @@
# not enable GRO
ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 4789
ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 8000
- ip netns exec $NS_DST ./udpgso_bench_rx -C 1000 -R 10 -n 10 -l 1300 $rx_args &
+ ip netns exec $NS_DST ./udpgso_bench_rx -C 2000 -R 100 -n 10 -l 1300 $rx_args &
local spid=$!
wait_local_port_listen "$NS_DST" 8000 udp
ip netns exec $NS_SRC ./udpgso_bench_tx $family -M 1 -s 13000 -S 1300 -D $dst
@@ -168,7 +178,7 @@
# bind the sender and the receiver to different CPUs to try
# get reproducible results
ip netns exec $NS_DST bash -c "echo 2 > /sys/class/net/veth$DST/queues/rx-0/rps_cpus"
- ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 1000 -R 10 &
+ ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 2000 -R 100 &
local spid=$!
wait_local_port_listen "$NS_DST" 8000 udp
ip netns exec $NS_SRC taskset 0x1 ./udpgso_bench_tx $family -l 3 -S 1300 -D $dst
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index f35a924..1cbadd2 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -375,7 +375,7 @@ static void do_recv(void)
do_flush_udp(fd);
tnow = gettimeofday_ms();
- if (tnow > treport) {
+ if (!cfg_expected_pkt_nr && tnow > treport) {
if (packets)
fprintf(stderr,
"%s rx: %6lu MB/s %8lu calls/s\n",
diff --git a/tools/testing/selftests/net/veth.sh b/tools/testing/selftests/net/veth.sh
index 27574bb..5ae85de 100755
--- a/tools/testing/selftests/net/veth.sh
+++ b/tools/testing/selftests/net/veth.sh
@@ -247,6 +247,20 @@
cleanup
create_ns
+ip -n $NS_DST link set dev veth$DST up
+ip -n $NS_DST link set dev veth$DST xdp object ${BPF_FILE} section xdp
+chk_gro_flag "gro vs xdp while down - gro flag on" $DST on
+ip -n $NS_DST link set dev veth$DST down
+chk_gro_flag " - after down" $DST on
+ip -n $NS_DST link set dev veth$DST xdp off
+chk_gro_flag " - after xdp off" $DST off
+ip -n $NS_DST link set dev veth$DST up
+chk_gro_flag " - after up" $DST off
+ip -n $NS_SRC link set dev veth$SRC xdp object ${BPF_FILE} section xdp
+chk_gro_flag " - after peer xdp" $DST off
+cleanup
+
+create_ns
chk_channels "default channels" $DST 1 1
ip -n $NS_DST link set dev veth$DST down
diff --git a/tools/testing/selftests/netfilter/Makefile b/tools/testing/selftests/netfilter/Makefile
index db27153..936c308 100644
--- a/tools/testing/selftests/netfilter/Makefile
+++ b/tools/testing/selftests/netfilter/Makefile
@@ -7,7 +7,8 @@
nft_queue.sh nft_meta.sh nf_nat_edemux.sh \
ipip-conntrack-mtu.sh conntrack_tcp_unreplied.sh \
conntrack_vrf.sh nft_synproxy.sh rpath.sh nft_audit.sh \
- conntrack_sctp_collision.sh xt_string.sh
+ conntrack_sctp_collision.sh xt_string.sh \
+ bridge_netfilter.sh
HOSTPKG_CONFIG := pkg-config
diff --git a/tools/testing/selftests/netfilter/bridge_netfilter.sh b/tools/testing/selftests/netfilter/bridge_netfilter.sh
new file mode 100644
index 0000000..659b3ab
--- /dev/null
+++ b/tools/testing/selftests/netfilter/bridge_netfilter.sh
@@ -0,0 +1,188 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test bridge netfilter + conntrack, a combination that doesn't really work,
+# with multicast/broadcast packets racing for hash table insertion.
+
+# eth0 br0 eth0
+# setup is: ns1 <->,ns0 <-> ns3
+# ns2 <-' `'-> ns4
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+ret=0
+
+sfx=$(mktemp -u "XXXXXXXX")
+ns0="ns0-$sfx"
+ns1="ns1-$sfx"
+ns2="ns2-$sfx"
+ns3="ns3-$sfx"
+ns4="ns4-$sfx"
+
+ebtables -V > /dev/null 2>&1
+if [ $? -ne 0 ];then
+ echo "SKIP: Could not run test without ebtables"
+ exit $ksft_skip
+fi
+
+ip -Version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+ echo "SKIP: Could not run test without ip tool"
+ exit $ksft_skip
+fi
+
+for i in $(seq 0 4); do
+ eval ip netns add \$ns$i
+done
+
+cleanup() {
+ for i in $(seq 0 4); do eval ip netns del \$ns$i;done
+}
+
+trap cleanup EXIT
+
+do_ping()
+{
+ fromns="$1"
+ dstip="$2"
+
+ ip netns exec $fromns ping -c 1 -q $dstip > /dev/null
+ if [ $? -ne 0 ]; then
+ echo "ERROR: ping from $fromns to $dstip"
+ ip netns exec ${ns0} nft list ruleset
+ ret=1
+ fi
+}
+
+bcast_ping()
+{
+ fromns="$1"
+ dstip="$2"
+
+ for i in $(seq 1 1000); do
+ ip netns exec $fromns ping -q -f -b -c 1 -q $dstip > /dev/null 2>&1
+ if [ $? -ne 0 ]; then
+ echo "ERROR: ping -b from $fromns to $dstip"
+ ip netns exec ${ns0} nft list ruleset
+ fi
+ done
+}
+
+ip link add veth1 netns ${ns0} type veth peer name eth0 netns ${ns1}
+if [ $? -ne 0 ]; then
+ echo "SKIP: Can't create veth device"
+ exit $ksft_skip
+fi
+
+ip link add veth2 netns ${ns0} type veth peer name eth0 netns $ns2
+ip link add veth3 netns ${ns0} type veth peer name eth0 netns $ns3
+ip link add veth4 netns ${ns0} type veth peer name eth0 netns $ns4
+
+ip -net ${ns0} link set lo up
+
+for i in $(seq 1 4); do
+ ip -net ${ns0} link set veth$i up
+done
+
+ip -net ${ns0} link add br0 type bridge stp_state 0 forward_delay 0 nf_call_iptables 1 nf_call_ip6tables 1 nf_call_arptables 1
+if [ $? -ne 0 ]; then
+ echo "SKIP: Can't create bridge br0"
+ exit $ksft_skip
+fi
+
+# make veth0,1,2 part of bridge.
+for i in $(seq 1 3); do
+ ip -net ${ns0} link set veth$i master br0
+done
+
+# add a macvlan on top of the bridge.
+MACVLAN_ADDR=ba:f3:13:37:42:23
+ip -net ${ns0} link add link br0 name macvlan0 type macvlan mode private
+ip -net ${ns0} link set macvlan0 address ${MACVLAN_ADDR}
+ip -net ${ns0} link set macvlan0 up
+ip -net ${ns0} addr add 10.23.0.1/24 dev macvlan0
+
+# add a macvlan on top of veth4.
+MACVLAN_ADDR=ba:f3:13:37:42:24
+ip -net ${ns0} link add link veth4 name macvlan4 type macvlan mode vepa
+ip -net ${ns0} link set macvlan4 address ${MACVLAN_ADDR}
+ip -net ${ns0} link set macvlan4 up
+
+# make the macvlan part of the bridge.
+# veth4 is not a bridge port, only the macvlan on top of it.
+ip -net ${ns0} link set macvlan4 master br0
+
+ip -net ${ns0} link set br0 up
+ip -net ${ns0} addr add 10.0.0.1/24 dev br0
+ip netns exec ${ns0} sysctl -q net.bridge.bridge-nf-call-iptables=1
+ret=$?
+if [ $ret -ne 0 ] ; then
+ echo "SKIP: bridge netfilter not available"
+ ret=$ksft_skip
+fi
+
+# for testing, so namespaces will reply to ping -b probes.
+ip netns exec ${ns0} sysctl -q net.ipv4.icmp_echo_ignore_broadcasts=0
+
+# enable conntrack in ns0 and drop broadcast packets in forward to
+# avoid them from getting confirmed in the postrouting hook before
+# the cloned skb is passed up the stack.
+ip netns exec ${ns0} nft -f - <<EOF
+table ip filter {
+ chain input {
+ type filter hook input priority 1; policy accept
+ iifname br0 counter
+ ct state new accept
+ }
+}
+
+table bridge filter {
+ chain forward {
+ type filter hook forward priority 0; policy accept
+ meta pkttype broadcast ip protocol icmp counter drop
+ }
+}
+EOF
+
+# place 1, 2 & 3 in same subnet, connected via ns0:br0.
+# ns4 is placed in same subnet as well, but its not
+# part of the bridge: the corresponding veth4 is not
+# part of the bridge, only its macvlan interface.
+for i in $(seq 1 4); do
+ eval ip -net \$ns$i link set lo up
+ eval ip -net \$ns$i link set eth0 up
+done
+for i in $(seq 1 2); do
+ eval ip -net \$ns$i addr add 10.0.0.1$i/24 dev eth0
+done
+
+ip -net ${ns3} addr add 10.23.0.13/24 dev eth0
+ip -net ${ns4} addr add 10.23.0.14/24 dev eth0
+
+# test basic connectivity
+do_ping ${ns1} 10.0.0.12
+do_ping ${ns3} 10.23.0.1
+do_ping ${ns4} 10.23.0.1
+
+if [ $ret -eq 0 ];then
+ echo "PASS: netns connectivity: ns1 can reach ns2, ns3 and ns4 can reach ns0"
+fi
+
+bcast_ping ${ns1} 10.0.0.255
+
+# This should deliver broadcast to macvlan0, which is on top of ns0:br0.
+bcast_ping ${ns3} 10.23.0.255
+
+# same, this time via veth4:macvlan4.
+bcast_ping ${ns4} 10.23.0.255
+
+read t < /proc/sys/kernel/tainted
+
+if [ $t -eq 0 ];then
+ echo PASS: kernel not tainted
+else
+ echo ERROR: kernel is tainted
+ ret=1
+fi
+
+exit $ret
diff --git a/tools/testing/selftests/netfilter/conntrack_dump_flush.c b/tools/testing/selftests/netfilter/conntrack_dump_flush.c
index f18c6db..b11ea8e 100644
--- a/tools/testing/selftests/netfilter/conntrack_dump_flush.c
+++ b/tools/testing/selftests/netfilter/conntrack_dump_flush.c
@@ -13,7 +13,7 @@
#include "../kselftest_harness.h"
#define TEST_ZONE_ID 123
-#define CTA_FILTER_F_CTA_TUPLE_ZONE (1 << 2)
+#define NF_CT_DEFAULT_ZONE_ID 0
static int reply_counter;
@@ -336,6 +336,9 @@ FIXTURE_SETUP(conntrack_dump_flush)
ret = conntrack_data_generate_v4(self->sock, 0xf4f4f4f4, 0xf5f5f5f5,
TEST_ZONE_ID + 2);
EXPECT_EQ(ret, 0);
+ ret = conntrack_data_generate_v4(self->sock, 0xf6f6f6f6, 0xf7f7f7f7,
+ NF_CT_DEFAULT_ZONE_ID);
+ EXPECT_EQ(ret, 0);
src = (struct in6_addr) {{
.__u6_addr32 = {
@@ -395,6 +398,26 @@ FIXTURE_SETUP(conntrack_dump_flush)
TEST_ZONE_ID + 2);
EXPECT_EQ(ret, 0);
+ src = (struct in6_addr) {{
+ .__u6_addr32 = {
+ 0xb80d0120,
+ 0x00000000,
+ 0x00000000,
+ 0x07000000
+ }
+ }};
+ dst = (struct in6_addr) {{
+ .__u6_addr32 = {
+ 0xb80d0120,
+ 0x00000000,
+ 0x00000000,
+ 0x08000000
+ }
+ }};
+ ret = conntrack_data_generate_v6(self->sock, src, dst,
+ NF_CT_DEFAULT_ZONE_ID);
+ EXPECT_EQ(ret, 0);
+
ret = conntracK_count_zone(self->sock, TEST_ZONE_ID);
EXPECT_GE(ret, 2);
if (ret > 2)
@@ -425,6 +448,24 @@ TEST_F(conntrack_dump_flush, test_flush_by_zone)
EXPECT_EQ(ret, 2);
ret = conntracK_count_zone(self->sock, TEST_ZONE_ID + 2);
EXPECT_EQ(ret, 2);
+ ret = conntracK_count_zone(self->sock, NF_CT_DEFAULT_ZONE_ID);
+ EXPECT_EQ(ret, 2);
+}
+
+TEST_F(conntrack_dump_flush, test_flush_by_zone_default)
+{
+ int ret;
+
+ ret = conntrack_flush_zone(self->sock, NF_CT_DEFAULT_ZONE_ID);
+ EXPECT_EQ(ret, 0);
+ ret = conntracK_count_zone(self->sock, TEST_ZONE_ID);
+ EXPECT_EQ(ret, 2);
+ ret = conntracK_count_zone(self->sock, TEST_ZONE_ID + 1);
+ EXPECT_EQ(ret, 2);
+ ret = conntracK_count_zone(self->sock, TEST_ZONE_ID + 2);
+ EXPECT_EQ(ret, 2);
+ ret = conntracK_count_zone(self->sock, NF_CT_DEFAULT_ZONE_ID);
+ EXPECT_EQ(ret, 0);
}
TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/pidfd/pidfd_getfd_test.c b/tools/testing/selftests/pidfd/pidfd_getfd_test.c
index 0930e24..cd51d54 100644
--- a/tools/testing/selftests/pidfd/pidfd_getfd_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_getfd_test.c
@@ -5,6 +5,7 @@
#include <fcntl.h>
#include <limits.h>
#include <linux/types.h>
+#include <poll.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
@@ -129,6 +130,7 @@ FIXTURE(child)
* When it is closed, the child will exit.
*/
int sk;
+ bool ignore_child_result;
};
FIXTURE_SETUP(child)
@@ -165,10 +167,14 @@ FIXTURE_SETUP(child)
FIXTURE_TEARDOWN(child)
{
+ int ret;
+
EXPECT_EQ(0, close(self->pidfd));
EXPECT_EQ(0, close(self->sk));
- EXPECT_EQ(0, wait_for_pid(self->pid));
+ ret = wait_for_pid(self->pid);
+ if (!self->ignore_child_result)
+ EXPECT_EQ(0, ret);
}
TEST_F(child, disable_ptrace)
@@ -235,6 +241,29 @@ TEST(flags_set)
EXPECT_EQ(errno, EINVAL);
}
+TEST_F(child, no_strange_EBADF)
+{
+ struct pollfd fds;
+
+ self->ignore_child_result = true;
+
+ fds.fd = self->pidfd;
+ fds.events = POLLIN;
+
+ ASSERT_EQ(kill(self->pid, SIGKILL), 0);
+ ASSERT_EQ(poll(&fds, 1, 5000), 1);
+
+ /*
+ * It used to be that pidfd_getfd() could race with the exiting thread
+ * between exit_files() and release_task(), and get a non-null task
+ * with a NULL files struct, and you'd get EBADF, which was slightly
+ * confusing.
+ */
+ errno = 0;
+ EXPECT_EQ(sys_pidfd_getfd(self->pidfd, self->remote_fd, 0), -1);
+ EXPECT_EQ(errno, ESRCH);
+}
+
#if __NR_pidfd_getfd == -1
int main(void)
{
diff --git a/tools/testing/selftests/power_supply/Makefile b/tools/testing/selftests/power_supply/Makefile
new file mode 100644
index 0000000..44f0658
--- /dev/null
+++ b/tools/testing/selftests/power_supply/Makefile
@@ -0,0 +1,4 @@
+TEST_PROGS := test_power_supply_properties.sh
+TEST_FILES := helpers.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/power_supply/helpers.sh b/tools/testing/selftests/power_supply/helpers.sh
new file mode 100644
index 0000000..1ec90d7
--- /dev/null
+++ b/tools/testing/selftests/power_supply/helpers.sh
@@ -0,0 +1,178 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, 2024 Collabora Ltd
+SYSFS_SUPPLIES=/sys/class/power_supply
+
+calc() {
+ awk "BEGIN { print $* }";
+}
+
+test_sysfs_prop() {
+ PROP="$1"
+ VALUE="$2" # optional
+
+ PROP_PATH="$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP"
+ TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+ if [ -z "$VALUE" ]; then
+ ktap_test_result "$TEST_NAME" [ -f "$PROP_PATH" ]
+ else
+ ktap_test_result "$TEST_NAME" grep -q "$VALUE" "$PROP_PATH"
+ fi
+}
+
+to_human_readable_unit() {
+ VALUE="$1"
+ UNIT="$2"
+
+ case "$VALUE" in
+ *[!0-9]* ) return ;; # Not a number
+ esac
+
+ if [ "$UNIT" = "uA" ]; then
+ new_unit="mA"
+ div=1000
+ elif [ "$UNIT" = "uV" ]; then
+ new_unit="V"
+ div=1000000
+ elif [ "$UNIT" = "uAh" ]; then
+ new_unit="Ah"
+ div=1000000
+ elif [ "$UNIT" = "uW" ]; then
+ new_unit="mW"
+ div=1000
+ elif [ "$UNIT" = "uWh" ]; then
+ new_unit="Wh"
+ div=1000000
+ else
+ return
+ fi
+
+ value_converted=$(calc "$VALUE"/"$div")
+ echo "$value_converted" "$new_unit"
+}
+
+_check_sysfs_prop_available() {
+ PROP=$1
+
+ PROP_PATH="$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP"
+ TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+ if [ ! -e "$PROP_PATH" ] ; then
+ ktap_test_skip "$TEST_NAME"
+ return 1
+ fi
+
+ if ! cat "$PROP_PATH" >/dev/null; then
+ ktap_print_msg "Failed to read"
+ ktap_test_fail "$TEST_NAME"
+ return 1
+ fi
+
+ return 0
+}
+
+test_sysfs_prop_optional() {
+ PROP=$1
+ UNIT=$2 # optional
+
+ TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+ _check_sysfs_prop_available "$PROP" || return
+ DATA=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP")
+
+ ktap_print_msg "Reported: '$DATA' $UNIT ($(to_human_readable_unit "$DATA" "$UNIT"))"
+ ktap_test_pass "$TEST_NAME"
+}
+
+test_sysfs_prop_optional_range() {
+ PROP=$1
+ MIN=$2
+ MAX=$3
+ UNIT=$4 # optional
+
+ TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+ _check_sysfs_prop_available "$PROP" || return
+ DATA=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP")
+
+ if [ "$DATA" -lt "$MIN" ] || [ "$DATA" -gt "$MAX" ]; then
+ ktap_print_msg "'$DATA' is out of range (min=$MIN, max=$MAX)"
+ ktap_test_fail "$TEST_NAME"
+ else
+ ktap_print_msg "Reported: '$DATA' $UNIT ($(to_human_readable_unit "$DATA" "$UNIT"))"
+ ktap_test_pass "$TEST_NAME"
+ fi
+}
+
+test_sysfs_prop_optional_list() {
+ PROP=$1
+ LIST=$2
+
+ TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+ _check_sysfs_prop_available "$PROP" || return
+ DATA=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP")
+
+ valid=0
+
+ OLDIFS=$IFS
+ IFS=","
+ for item in $LIST; do
+ if [ "$DATA" = "$item" ]; then
+ valid=1
+ break
+ fi
+ done
+ if [ "$valid" -eq 1 ]; then
+ ktap_print_msg "Reported: '$DATA'"
+ ktap_test_pass "$TEST_NAME"
+ else
+ ktap_print_msg "'$DATA' is not a valid value for this property"
+ ktap_test_fail "$TEST_NAME"
+ fi
+ IFS=$OLDIFS
+}
+
+dump_file() {
+ FILE="$1"
+ while read -r line; do
+ ktap_print_msg "$line"
+ done < "$FILE"
+}
+
+__test_uevent_prop() {
+ PROP="$1"
+ OPTIONAL="$2"
+ VALUE="$3" # optional
+
+ UEVENT_PATH="$SYSFS_SUPPLIES"/"$DEVNAME"/uevent
+ TEST_NAME="$DEVNAME".uevent."$PROP"
+
+ if ! grep -q "POWER_SUPPLY_$PROP=" "$UEVENT_PATH"; then
+ if [ "$OPTIONAL" -eq 1 ]; then
+ ktap_test_skip "$TEST_NAME"
+ else
+ ktap_print_msg "Missing property"
+ ktap_test_fail "$TEST_NAME"
+ fi
+ return
+ fi
+
+ if ! grep -q "POWER_SUPPLY_$PROP=$VALUE" "$UEVENT_PATH"; then
+ ktap_print_msg "Invalid value for uevent property, dumping..."
+ dump_file "$UEVENT_PATH"
+ ktap_test_fail "$TEST_NAME"
+ else
+ ktap_test_pass "$TEST_NAME"
+ fi
+}
+
+test_uevent_prop() {
+ __test_uevent_prop "$1" 0 "$2"
+}
+
+test_uevent_prop_optional() {
+ __test_uevent_prop "$1" 1 "$2"
+}
diff --git a/tools/testing/selftests/power_supply/test_power_supply_properties.sh b/tools/testing/selftests/power_supply/test_power_supply_properties.sh
new file mode 100755
index 0000000..df272df
--- /dev/null
+++ b/tools/testing/selftests/power_supply/test_power_supply_properties.sh
@@ -0,0 +1,114 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, 2024 Collabora Ltd
+#
+# This test validates the power supply uAPI: namely, the files in sysfs and
+# lines in uevent that expose the power supply properties.
+#
+# By default all power supplies available are tested. Optionally the name of a
+# power supply can be passed as a parameter to test only that one instead.
+DIR="$(dirname "$(readlink -f "$0")")"
+
+. "${DIR}"/../kselftest/ktap_helpers.sh
+
+. "${DIR}"/helpers.sh
+
+count_tests() {
+ SUPPLIES=$1
+
+ # This needs to be updated every time a new test is added.
+ NUM_TESTS=33
+
+ total_tests=0
+
+ for i in $SUPPLIES; do
+ total_tests=$(("$total_tests" + "$NUM_TESTS"))
+ done
+
+ echo "$total_tests"
+}
+
+ktap_print_header
+
+SYSFS_SUPPLIES=/sys/class/power_supply/
+
+if [ $# -eq 0 ]; then
+ supplies=$(ls "$SYSFS_SUPPLIES")
+else
+ supplies=$1
+fi
+
+ktap_set_plan "$(count_tests "$supplies")"
+
+for DEVNAME in $supplies; do
+ ktap_print_msg Testing device "$DEVNAME"
+
+ if [ ! -d "$SYSFS_SUPPLIES"/"$DEVNAME" ]; then
+ ktap_test_fail "$DEVNAME".exists
+ ktap_exit_fail_msg Device does not exist
+ fi
+
+ ktap_test_pass "$DEVNAME".exists
+
+ test_uevent_prop NAME "$DEVNAME"
+
+ test_sysfs_prop type
+ SUPPLY_TYPE=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/type)
+ # This fails on kernels < 5.8 (needs 2ad3d74e3c69f)
+ test_uevent_prop TYPE "$SUPPLY_TYPE"
+
+ test_sysfs_prop_optional usb_type
+
+ test_sysfs_prop_optional_range online 0 2
+ test_sysfs_prop_optional_range present 0 1
+
+ test_sysfs_prop_optional_list status "Unknown","Charging","Discharging","Not charging","Full"
+
+ # Capacity is reported as percentage, thus any value less than 0 and
+ # greater than 100 are not allowed.
+ test_sysfs_prop_optional_range capacity 0 100 "%"
+
+ test_sysfs_prop_optional_list capacity_level "Unknown","Critical","Low","Normal","High","Full"
+
+ test_sysfs_prop_optional model_name
+ test_sysfs_prop_optional manufacturer
+ test_sysfs_prop_optional serial_number
+ test_sysfs_prop_optional_list technology "Unknown","NiMH","Li-ion","Li-poly","LiFe","NiCd","LiMn"
+
+ test_sysfs_prop_optional cycle_count
+
+ test_sysfs_prop_optional_list scope "Unknown","System","Device"
+
+ test_sysfs_prop_optional input_current_limit "uA"
+ test_sysfs_prop_optional input_voltage_limit "uV"
+
+ # Technically the power-supply class does not limit reported values.
+ # E.g. one could expose an RTC backup-battery, which goes below 1.5V or
+ # an electric vehicle battery with over 300V. But most devices do not
+ # have a step-up capable regulator behind the battery and operate with
+ # voltages considered safe to touch, so we limit the allowed range to
+ # 1.8V-60V to catch drivers reporting incorrectly scaled values. E.g. a
+ # common mistake is reporting data in mV instead of µV.
+ test_sysfs_prop_optional_range voltage_now 1800000 60000000 "uV"
+ test_sysfs_prop_optional_range voltage_min 1800000 60000000 "uV"
+ test_sysfs_prop_optional_range voltage_max 1800000 60000000 "uV"
+ test_sysfs_prop_optional_range voltage_min_design 1800000 60000000 "uV"
+ test_sysfs_prop_optional_range voltage_max_design 1800000 60000000 "uV"
+
+ # current based systems
+ test_sysfs_prop_optional current_now "uA"
+ test_sysfs_prop_optional current_max "uA"
+ test_sysfs_prop_optional charge_now "uAh"
+ test_sysfs_prop_optional charge_full "uAh"
+ test_sysfs_prop_optional charge_full_design "uAh"
+
+ # power based systems
+ test_sysfs_prop_optional power_now "uW"
+ test_sysfs_prop_optional energy_now "uWh"
+ test_sysfs_prop_optional energy_full "uWh"
+ test_sysfs_prop_optional energy_full_design "uWh"
+ test_sysfs_prop_optional energy_full_design "uWh"
+done
+
+ktap_finished
diff --git a/tools/testing/selftests/powerpc/math/fpu_signal.c b/tools/testing/selftests/powerpc/math/fpu_signal.c
index 7b1addd..8a64f63 100644
--- a/tools/testing/selftests/powerpc/math/fpu_signal.c
+++ b/tools/testing/selftests/powerpc/math/fpu_signal.c
@@ -18,6 +18,7 @@
#include <pthread.h>
#include "utils.h"
+#include "fpu.h"
/* Number of times each thread should receive the signal */
#define ITERATIONS 10
@@ -27,9 +28,7 @@
*/
#define THREAD_FACTOR 8
-__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
- 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
- 2.1};
+__thread double darray[32];
bool bad_context;
int threads_starting;
@@ -43,9 +42,9 @@ void signal_fpu_sig(int sig, siginfo_t *info, void *context)
ucontext_t *uc = context;
mcontext_t *mc = &uc->uc_mcontext;
- /* Only the non volatiles were loaded up */
- for (i = 14; i < 32; i++) {
- if (mc->fp_regs[i] != darray[i - 14]) {
+ // Don't check f30/f31, they're used as scratches in check_all_fprs()
+ for (i = 0; i < 30; i++) {
+ if (mc->fp_regs[i] != darray[i]) {
bad_context = true;
break;
}
@@ -54,7 +53,6 @@ void signal_fpu_sig(int sig, siginfo_t *info, void *context)
void *signal_fpu_c(void *p)
{
- int i;
long rc;
struct sigaction act;
act.sa_sigaction = signal_fpu_sig;
@@ -64,9 +62,7 @@ void *signal_fpu_c(void *p)
return p;
srand(pthread_self());
- for (i = 0; i < 21; i++)
- darray[i] = rand();
-
+ randomise_darray(darray, ARRAY_SIZE(darray));
rc = preempt_fpu(darray, &threads_starting, &running);
return (void *) rc;
diff --git a/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c b/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
index 98cbb91..505294d 100644
--- a/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
+++ b/tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
@@ -263,10 +263,10 @@ static int papr_vpd_system_loc_code(void)
off_t size;
int fd;
- SKIP_IF_MSG(get_system_loc_code(&lc),
- "Cannot determine system location code");
SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
DEVPATH " not present");
+ SKIP_IF_MSG(get_system_loc_code(&lc),
+ "Cannot determine system location code");
FAIL_IF(devfd < 0);
diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
index bcbca35..1b339d6 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -3,106 +3,59 @@
#include <stdint.h>
#include "resctrl.h"
-struct read_format {
- __u64 nr; /* The number of events */
- struct {
- __u64 value; /* The value of the event */
- } values[2];
-};
-
-static struct perf_event_attr pea_llc_miss;
-static struct read_format rf_cqm;
-static int fd_lm;
char llc_occup_path[1024];
-static void initialize_perf_event_attr(void)
+void perf_event_attr_initialize(struct perf_event_attr *pea, __u64 config)
{
- pea_llc_miss.type = PERF_TYPE_HARDWARE;
- pea_llc_miss.size = sizeof(struct perf_event_attr);
- pea_llc_miss.read_format = PERF_FORMAT_GROUP;
- pea_llc_miss.exclude_kernel = 1;
- pea_llc_miss.exclude_hv = 1;
- pea_llc_miss.exclude_idle = 1;
- pea_llc_miss.exclude_callchain_kernel = 1;
- pea_llc_miss.inherit = 1;
- pea_llc_miss.exclude_guest = 1;
- pea_llc_miss.disabled = 1;
+ memset(pea, 0, sizeof(*pea));
+ pea->type = PERF_TYPE_HARDWARE;
+ pea->size = sizeof(*pea);
+ pea->read_format = PERF_FORMAT_GROUP;
+ pea->exclude_kernel = 1;
+ pea->exclude_hv = 1;
+ pea->exclude_idle = 1;
+ pea->exclude_callchain_kernel = 1;
+ pea->inherit = 1;
+ pea->exclude_guest = 1;
+ pea->disabled = 1;
+ pea->config = config;
}
-static void ioctl_perf_event_ioc_reset_enable(void)
+/* Start counters to log values */
+int perf_event_reset_enable(int pe_fd)
{
- ioctl(fd_lm, PERF_EVENT_IOC_RESET, 0);
- ioctl(fd_lm, PERF_EVENT_IOC_ENABLE, 0);
-}
+ int ret;
-static int perf_event_open_llc_miss(pid_t pid, int cpu_no)
-{
- fd_lm = perf_event_open(&pea_llc_miss, pid, cpu_no, -1,
- PERF_FLAG_FD_CLOEXEC);
- if (fd_lm == -1) {
- perror("Error opening leader");
- ctrlc_handler(0, NULL, NULL);
- return -1;
- }
-
- return 0;
-}
-
-static void initialize_llc_perf(void)
-{
- memset(&pea_llc_miss, 0, sizeof(struct perf_event_attr));
- memset(&rf_cqm, 0, sizeof(struct read_format));
-
- /* Initialize perf_event_attr structures for HW_CACHE_MISSES */
- initialize_perf_event_attr();
-
- pea_llc_miss.config = PERF_COUNT_HW_CACHE_MISSES;
-
- rf_cqm.nr = 1;
-}
-
-static int reset_enable_llc_perf(pid_t pid, int cpu_no)
-{
- int ret = 0;
-
- ret = perf_event_open_llc_miss(pid, cpu_no);
+ ret = ioctl(pe_fd, PERF_EVENT_IOC_RESET, 0);
if (ret < 0)
return ret;
- /* Start counters to log values */
- ioctl_perf_event_ioc_reset_enable();
+ ret = ioctl(pe_fd, PERF_EVENT_IOC_ENABLE, 0);
+ if (ret < 0)
+ return ret;
return 0;
}
-/*
- * get_llc_perf: llc cache miss through perf events
- * @llc_perf_miss: LLC miss counter that is filled on success
- *
- * Perf events like HW_CACHE_MISSES could be used to validate number of
- * cache lines allocated.
- *
- * Return: =0 on success. <0 on failure.
- */
-static int get_llc_perf(unsigned long *llc_perf_miss)
+void perf_event_initialize_read_format(struct perf_event_read *pe_read)
{
- __u64 total_misses;
- int ret;
+ memset(pe_read, 0, sizeof(*pe_read));
+ pe_read->nr = 1;
+}
- /* Stop counters after one span to get miss rate */
+int perf_open(struct perf_event_attr *pea, pid_t pid, int cpu_no)
+{
+ int pe_fd;
- ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
-
- ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
- if (ret == -1) {
- perror("Could not get llc misses through perf");
+ pe_fd = perf_event_open(pea, pid, cpu_no, -1, PERF_FLAG_FD_CLOEXEC);
+ if (pe_fd == -1) {
+ ksft_perror("Error opening leader");
return -1;
}
- total_misses = rf_cqm.values[0].value;
- *llc_perf_miss = total_misses;
+ perf_event_reset_enable(pe_fd);
- return 0;
+ return pe_fd;
}
/*
@@ -124,12 +77,12 @@ static int get_llc_occu_resctrl(unsigned long *llc_occupancy)
fp = fopen(llc_occup_path, "r");
if (!fp) {
- perror("Failed to open results file");
+ ksft_perror("Failed to open results file");
- return errno;
+ return -1;
}
if (fscanf(fp, "%lu", llc_occupancy) <= 0) {
- perror("Could not get llc occupancy");
+ ksft_perror("Could not get llc occupancy");
fclose(fp);
return -1;
@@ -146,163 +99,91 @@ static int get_llc_occu_resctrl(unsigned long *llc_occupancy)
* @llc_value: perf miss value /
* llc occupancy value reported by resctrl FS
*
- * Return: 0 on success. non-zero on failure.
+ * Return: 0 on success, < 0 on error.
*/
-static int print_results_cache(char *filename, int bm_pid,
- unsigned long llc_value)
+static int print_results_cache(const char *filename, int bm_pid, __u64 llc_value)
{
FILE *fp;
if (strcmp(filename, "stdio") == 0 || strcmp(filename, "stderr") == 0) {
- printf("Pid: %d \t LLC_value: %lu\n", bm_pid,
- llc_value);
+ printf("Pid: %d \t LLC_value: %llu\n", bm_pid, llc_value);
} else {
fp = fopen(filename, "a");
if (!fp) {
- perror("Cannot open results file");
+ ksft_perror("Cannot open results file");
- return errno;
+ return -1;
}
- fprintf(fp, "Pid: %d \t llc_value: %lu\n", bm_pid, llc_value);
+ fprintf(fp, "Pid: %d \t llc_value: %llu\n", bm_pid, llc_value);
fclose(fp);
}
return 0;
}
-int measure_cache_vals(struct resctrl_val_param *param, int bm_pid)
+/*
+ * perf_event_measure - Measure perf events
+ * @filename: Filename for writing the results
+ * @bm_pid: PID that runs the benchmark
+ *
+ * Measures perf events (e.g., cache misses) and writes the results into
+ * @filename. @bm_pid is written to the results file along with the measured
+ * value.
+ *
+ * Return: =0 on success. <0 on failure.
+ */
+int perf_event_measure(int pe_fd, struct perf_event_read *pe_read,
+ const char *filename, int bm_pid)
{
- unsigned long llc_perf_miss = 0, llc_occu_resc = 0, llc_value = 0;
int ret;
- /*
- * Measure cache miss from perf.
- */
- if (!strncmp(param->resctrl_val, CAT_STR, sizeof(CAT_STR))) {
- ret = get_llc_perf(&llc_perf_miss);
- if (ret < 0)
- return ret;
- llc_value = llc_perf_miss;
- }
-
- /*
- * Measure llc occupancy from resctrl.
- */
- if (!strncmp(param->resctrl_val, CMT_STR, sizeof(CMT_STR))) {
- ret = get_llc_occu_resctrl(&llc_occu_resc);
- if (ret < 0)
- return ret;
- llc_value = llc_occu_resc;
- }
- ret = print_results_cache(param->filename, bm_pid, llc_value);
- if (ret)
+ /* Stop counters after one span to get miss rate */
+ ret = ioctl(pe_fd, PERF_EVENT_IOC_DISABLE, 0);
+ if (ret < 0)
return ret;
- return 0;
+ ret = read(pe_fd, pe_read, sizeof(*pe_read));
+ if (ret == -1) {
+ ksft_perror("Could not get perf value");
+ return -1;
+ }
+
+ return print_results_cache(filename, bm_pid, pe_read->values[0].value);
}
/*
- * cache_val: execute benchmark and measure LLC occupancy resctrl
- * and perf cache miss for the benchmark
- * @param: parameters passed to cache_val()
- * @span: buffer size for the benchmark
+ * measure_llc_resctrl - Measure resctrl LLC value from resctrl
+ * @filename: Filename for writing the results
+ * @bm_pid: PID that runs the benchmark
*
- * Return: 0 on success. non-zero on failure.
+ * Measures LLC occupancy from resctrl and writes the results into @filename.
+ * @bm_pid is written to the results file along with the measured value.
+ *
+ * Return: =0 on success. <0 on failure.
*/
-int cat_val(struct resctrl_val_param *param, size_t span)
+int measure_llc_resctrl(const char *filename, int bm_pid)
{
- int memflush = 1, operation = 0, ret = 0;
- char *resctrl_val = param->resctrl_val;
- pid_t bm_pid;
+ unsigned long llc_occu_resc = 0;
+ int ret;
- if (strcmp(param->filename, "") == 0)
- sprintf(param->filename, "stdio");
-
- bm_pid = getpid();
-
- /* Taskset benchmark to specified cpu */
- ret = taskset_benchmark(bm_pid, param->cpu_no);
- if (ret)
+ ret = get_llc_occu_resctrl(&llc_occu_resc);
+ if (ret < 0)
return ret;
- /* Write benchmark to specified con_mon grp, mon_grp in resctrl FS*/
- ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
- resctrl_val);
- if (ret)
- return ret;
-
- initialize_llc_perf();
-
- /* Test runs until the callback setup() tells the test to stop. */
- while (1) {
- ret = param->setup(param);
- if (ret == END_OF_TESTS) {
- ret = 0;
- break;
- }
- if (ret < 0)
- break;
- ret = reset_enable_llc_perf(bm_pid, param->cpu_no);
- if (ret)
- break;
-
- if (run_fill_buf(span, memflush, operation, true)) {
- fprintf(stderr, "Error-running fill buffer\n");
- ret = -1;
- goto pe_close;
- }
-
- sleep(1);
- ret = measure_cache_vals(param, bm_pid);
- if (ret)
- goto pe_close;
- }
-
- return ret;
-
-pe_close:
- close(fd_lm);
- return ret;
+ return print_results_cache(filename, bm_pid, llc_occu_resc);
}
/*
- * show_cache_info: show cache test result information
- * @sum_llc_val: sum of LLC cache result data
- * @no_of_bits: number of bits
- * @cache_span: cache span in bytes for CMT or in lines for CAT
- * @max_diff: max difference
- * @max_diff_percent: max difference percentage
- * @num_of_runs: number of runs
- * @platform: show test information on this platform
- * @cmt: CMT test or CAT test
- *
- * Return: 0 on success. non-zero on failure.
+ * show_cache_info - Show generic cache test information
+ * @no_of_bits: Number of bits
+ * @avg_llc_val: Average of LLC cache result data
+ * @cache_span: Cache span
+ * @lines: @cache_span in lines or bytes
*/
-int show_cache_info(unsigned long sum_llc_val, int no_of_bits,
- size_t cache_span, unsigned long max_diff,
- unsigned long max_diff_percent, unsigned long num_of_runs,
- bool platform, bool cmt)
+void show_cache_info(int no_of_bits, __u64 avg_llc_val, size_t cache_span, bool lines)
{
- unsigned long avg_llc_val = 0;
- float diff_percent;
- long avg_diff = 0;
- int ret;
-
- avg_llc_val = sum_llc_val / num_of_runs;
- avg_diff = (long)abs(cache_span - avg_llc_val);
- diff_percent = ((float)cache_span - avg_llc_val) / cache_span * 100;
-
- ret = platform && abs((int)diff_percent) > max_diff_percent &&
- (cmt ? (abs(avg_diff) > max_diff) : true);
-
- ksft_print_msg("%s Check cache miss rate within %lu%%\n",
- ret ? "Fail:" : "Pass:", max_diff_percent);
-
- ksft_print_msg("Percent diff=%d\n", abs((int)diff_percent));
ksft_print_msg("Number of bits: %d\n", no_of_bits);
- ksft_print_msg("Average LLC val: %lu\n", avg_llc_val);
- ksft_print_msg("Cache span (%s): %zu\n", cmt ? "bytes" : "lines",
+ ksft_print_msg("Average LLC val: %llu\n", avg_llc_val);
+ ksft_print_msg("Cache span (%s): %zu\n", lines ? "lines" : "bytes",
cache_span);
-
- return ret;
}
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index 224ba85..4cb991b 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -11,108 +11,254 @@
#include "resctrl.h"
#include <unistd.h>
-#define RESULT_FILE_NAME1 "result_cat1"
-#define RESULT_FILE_NAME2 "result_cat2"
+#define RESULT_FILE_NAME "result_cat"
#define NUM_OF_RUNS 5
-#define MAX_DIFF_PERCENT 4
-#define MAX_DIFF 1000000
/*
- * Change schemata. Write schemata to specified
- * con_mon grp, mon_grp in resctrl FS.
- * Run 5 times in order to get average values.
+ * Minimum difference in LLC misses between a test with n+1 bits CBM to the
+ * test with n bits is MIN_DIFF_PERCENT_PER_BIT * (n - 1). With e.g. 5 vs 4
+ * bits in the CBM mask, the minimum difference must be at least
+ * MIN_DIFF_PERCENT_PER_BIT * (4 - 1) = 3 percent.
+ *
+ * The relationship between number of used CBM bits and difference in LLC
+ * misses is not expected to be linear. With a small number of bits, the
+ * margin is smaller than with larger number of bits. For selftest purposes,
+ * however, linear approach is enough because ultimately only pass/fail
+ * decision has to be made and distinction between strong and stronger
+ * signal is irrelevant.
*/
-static int cat_setup(struct resctrl_val_param *p)
+#define MIN_DIFF_PERCENT_PER_BIT 1UL
+
+static int show_results_info(__u64 sum_llc_val, int no_of_bits,
+ unsigned long cache_span,
+ unsigned long min_diff_percent,
+ unsigned long num_of_runs, bool platform,
+ __s64 *prev_avg_llc_val)
{
- char schemata[64];
+ __u64 avg_llc_val = 0;
+ float avg_diff;
int ret = 0;
- /* Run NUM_OF_RUNS times */
- if (p->num_of_runs >= NUM_OF_RUNS)
- return END_OF_TESTS;
+ avg_llc_val = sum_llc_val / num_of_runs;
+ if (*prev_avg_llc_val) {
+ float delta = (__s64)(avg_llc_val - *prev_avg_llc_val);
- if (p->num_of_runs == 0) {
- sprintf(schemata, "%lx", p->mask);
- ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no,
- p->resctrl_val);
+ avg_diff = delta / *prev_avg_llc_val;
+ ret = platform && (avg_diff * 100) < (float)min_diff_percent;
+
+ ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
+ ret ? "Fail:" : "Pass:", (float)min_diff_percent);
+
+ ksft_print_msg("Percent diff=%.1f\n", avg_diff * 100);
}
- p->num_of_runs++;
+ *prev_avg_llc_val = avg_llc_val;
+
+ show_cache_info(no_of_bits, avg_llc_val, cache_span, true);
return ret;
}
-static int check_results(struct resctrl_val_param *param, size_t span)
+/* Remove the highest bit from CBM */
+static unsigned long next_mask(unsigned long current_mask)
+{
+ return current_mask & (current_mask >> 1);
+}
+
+static int check_results(struct resctrl_val_param *param, const char *cache_type,
+ unsigned long cache_total_size, unsigned long full_cache_mask,
+ unsigned long current_mask)
{
char *token_array[8], temp[512];
- unsigned long sum_llc_perf_miss = 0;
- int runs = 0, no_of_bits = 0;
+ __u64 sum_llc_perf_miss = 0;
+ __s64 prev_avg_llc_val = 0;
+ unsigned long alloc_size;
+ int runs = 0;
+ int fail = 0;
+ int ret;
FILE *fp;
ksft_print_msg("Checking for pass/fail\n");
fp = fopen(param->filename, "r");
if (!fp) {
- perror("# Cannot open file");
+ ksft_perror("Cannot open file");
- return errno;
+ return -1;
}
while (fgets(temp, sizeof(temp), fp)) {
char *token = strtok(temp, ":\t");
int fields = 0;
+ int bits;
while (token) {
token_array[fields++] = token;
token = strtok(NULL, ":\t");
}
- /*
- * Discard the first value which is inaccurate due to monitoring
- * setup transition phase.
- */
- if (runs > 0)
- sum_llc_perf_miss += strtoul(token_array[3], NULL, 0);
+
+ sum_llc_perf_miss += strtoull(token_array[3], NULL, 0);
runs++;
+
+ if (runs < NUM_OF_RUNS)
+ continue;
+
+ if (!current_mask) {
+ ksft_print_msg("Unexpected empty cache mask\n");
+ break;
+ }
+
+ alloc_size = cache_portion_size(cache_total_size, current_mask, full_cache_mask);
+
+ bits = count_bits(current_mask);
+
+ ret = show_results_info(sum_llc_perf_miss, bits,
+ alloc_size / 64,
+ MIN_DIFF_PERCENT_PER_BIT * (bits - 1),
+ runs, get_vendor() == ARCH_INTEL,
+ &prev_avg_llc_val);
+ if (ret)
+ fail = 1;
+
+ runs = 0;
+ sum_llc_perf_miss = 0;
+ current_mask = next_mask(current_mask);
}
fclose(fp);
- no_of_bits = count_bits(param->mask);
- return show_cache_info(sum_llc_perf_miss, no_of_bits, span / 64,
- MAX_DIFF, MAX_DIFF_PERCENT, runs - 1,
- get_vendor() == ARCH_INTEL, false);
+ return fail;
}
void cat_test_cleanup(void)
{
- remove(RESULT_FILE_NAME1);
- remove(RESULT_FILE_NAME2);
+ remove(RESULT_FILE_NAME);
}
-int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
+/*
+ * cat_test - Execute CAT benchmark and measure cache misses
+ * @test: Test information structure
+ * @uparams: User supplied parameters
+ * @param: Parameters passed to cat_test()
+ * @span: Buffer size for the benchmark
+ * @current_mask Start mask for the first iteration
+ *
+ * Run CAT selftest by varying the allocated cache portion and comparing the
+ * impact on cache misses (the result analysis is done in check_results()
+ * and show_results_info(), not in this function).
+ *
+ * One bit is removed from the CAT allocation bit mask (in current_mask) for
+ * each subsequent test which keeps reducing the size of the allocated cache
+ * portion. A single test flushes the buffer, reads it to warm up the cache,
+ * and reads the buffer again. The cache misses are measured during the last
+ * read pass.
+ *
+ * Return: 0 when the test was run, < 0 on error.
+ */
+static int cat_test(const struct resctrl_test *test,
+ const struct user_params *uparams,
+ struct resctrl_val_param *param,
+ size_t span, unsigned long current_mask)
{
- unsigned long l_mask, l_mask_1;
- int ret, pipefd[2], sibling_cpu_no;
- unsigned long cache_size = 0;
- unsigned long long_mask;
- char cbm_mask[256];
- int count_of_bits;
- char pipe_message;
- size_t span;
+ char *resctrl_val = param->resctrl_val;
+ struct perf_event_read pe_read;
+ struct perf_event_attr pea;
+ cpu_set_t old_affinity;
+ unsigned char *buf;
+ char schemata[64];
+ int ret, i, pe_fd;
+ pid_t bm_pid;
- /* Get default cbm mask for L3/L2 cache */
- ret = get_cbm_mask(cache_type, cbm_mask);
+ if (strcmp(param->filename, "") == 0)
+ sprintf(param->filename, "stdio");
+
+ bm_pid = getpid();
+
+ /* Taskset benchmark to specified cpu */
+ ret = taskset_benchmark(bm_pid, uparams->cpu, &old_affinity);
if (ret)
return ret;
- long_mask = strtoul(cbm_mask, NULL, 16);
+ /* Write benchmark to specified con_mon grp, mon_grp in resctrl FS*/
+ ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
+ resctrl_val);
+ if (ret)
+ goto reset_affinity;
+
+ perf_event_attr_initialize(&pea, PERF_COUNT_HW_CACHE_MISSES);
+ perf_event_initialize_read_format(&pe_read);
+ pe_fd = perf_open(&pea, bm_pid, uparams->cpu);
+ if (pe_fd < 0) {
+ ret = -1;
+ goto reset_affinity;
+ }
+
+ buf = alloc_buffer(span, 1);
+ if (!buf) {
+ ret = -1;
+ goto pe_close;
+ }
+
+ while (current_mask) {
+ snprintf(schemata, sizeof(schemata), "%lx", param->mask & ~current_mask);
+ ret = write_schemata("", schemata, uparams->cpu, test->resource);
+ if (ret)
+ goto free_buf;
+ snprintf(schemata, sizeof(schemata), "%lx", current_mask);
+ ret = write_schemata(param->ctrlgrp, schemata, uparams->cpu, test->resource);
+ if (ret)
+ goto free_buf;
+
+ for (i = 0; i < NUM_OF_RUNS; i++) {
+ mem_flush(buf, span);
+ fill_cache_read(buf, span, true);
+
+ ret = perf_event_reset_enable(pe_fd);
+ if (ret)
+ goto free_buf;
+
+ fill_cache_read(buf, span, true);
+
+ ret = perf_event_measure(pe_fd, &pe_read, param->filename, bm_pid);
+ if (ret)
+ goto free_buf;
+ }
+ current_mask = next_mask(current_mask);
+ }
+
+free_buf:
+ free(buf);
+pe_close:
+ close(pe_fd);
+reset_affinity:
+ taskset_restore(bm_pid, &old_affinity);
+
+ return ret;
+}
+
+static int cat_run_test(const struct resctrl_test *test, const struct user_params *uparams)
+{
+ unsigned long long_mask, start_mask, full_cache_mask;
+ unsigned long cache_total_size = 0;
+ int n = uparams->bits;
+ unsigned int start;
+ int count_of_bits;
+ size_t span;
+ int ret;
+
+ ret = get_full_cbm(test->resource, &full_cache_mask);
+ if (ret)
+ return ret;
+ /* Get the largest contiguous exclusive portion of the cache */
+ ret = get_mask_no_shareable(test->resource, &long_mask);
+ if (ret)
+ return ret;
/* Get L3/L2 cache size */
- ret = get_cache_size(cpu_no, cache_type, &cache_size);
+ ret = get_cache_size(uparams->cpu, test->resource, &cache_total_size);
if (ret)
return ret;
- ksft_print_msg("Cache size :%lu\n", cache_size);
+ ksft_print_msg("Cache size :%lu\n", cache_total_size);
- /* Get max number of bits from default-cabm mask */
- count_of_bits = count_bits(long_mask);
+ count_of_bits = count_contiguous_bits(long_mask, &start);
if (!n)
n = count_of_bits / 2;
@@ -123,89 +269,124 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
count_of_bits - 1);
return -1;
}
-
- /* Get core id from same socket for running another thread */
- sibling_cpu_no = get_core_sibling(cpu_no);
- if (sibling_cpu_no < 0)
- return -1;
+ start_mask = create_bit_mask(start, n);
struct resctrl_val_param param = {
.resctrl_val = CAT_STR,
- .cpu_no = cpu_no,
- .setup = cat_setup,
+ .ctrlgrp = "c1",
+ .filename = RESULT_FILE_NAME,
+ .num_of_runs = 0,
};
-
- l_mask = long_mask >> n;
- l_mask_1 = ~l_mask & long_mask;
-
- /* Set param values for parent thread which will be allocated bitmask
- * with (max_bits - n) bits
- */
- span = cache_size * (count_of_bits - n) / count_of_bits;
- strcpy(param.ctrlgrp, "c2");
- strcpy(param.mongrp, "m2");
- strcpy(param.filename, RESULT_FILE_NAME2);
- param.mask = l_mask;
- param.num_of_runs = 0;
-
- if (pipe(pipefd)) {
- perror("# Unable to create pipe");
- return errno;
- }
-
- fflush(stdout);
- bm_pid = fork();
-
- /* Set param values for child thread which will be allocated bitmask
- * with n bits
- */
- if (bm_pid == 0) {
- param.mask = l_mask_1;
- strcpy(param.ctrlgrp, "c1");
- strcpy(param.mongrp, "m1");
- span = cache_size * n / count_of_bits;
- strcpy(param.filename, RESULT_FILE_NAME1);
- param.num_of_runs = 0;
- param.cpu_no = sibling_cpu_no;
- }
+ param.mask = long_mask;
+ span = cache_portion_size(cache_total_size, start_mask, full_cache_mask);
remove(param.filename);
- ret = cat_val(¶m, span);
- if (ret == 0)
- ret = check_results(¶m, span);
+ ret = cat_test(test, uparams, ¶m, span, start_mask);
+ if (ret)
+ goto out;
- if (bm_pid == 0) {
- /* Tell parent that child is ready */
- close(pipefd[0]);
- pipe_message = 1;
- if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
- sizeof(pipe_message))
- /*
- * Just print the error message.
- * Let while(1) run and wait for itself to be killed.
- */
- perror("# failed signaling parent process");
-
- close(pipefd[1]);
- while (1)
- ;
- } else {
- /* Parent waits for child to be ready. */
- close(pipefd[1]);
- pipe_message = 0;
- while (pipe_message != 1) {
- if (read(pipefd[0], &pipe_message,
- sizeof(pipe_message)) < sizeof(pipe_message)) {
- perror("# failed reading from child process");
- break;
- }
- }
- close(pipefd[0]);
- kill(bm_pid, SIGKILL);
- }
-
+ ret = check_results(¶m, test->resource,
+ cache_total_size, full_cache_mask, start_mask);
+out:
cat_test_cleanup();
return ret;
}
+
+static int noncont_cat_run_test(const struct resctrl_test *test,
+ const struct user_params *uparams)
+{
+ unsigned long full_cache_mask, cont_mask, noncont_mask;
+ unsigned int eax, ebx, ecx, edx, sparse_masks;
+ int bit_center, ret;
+ char schemata[64];
+
+ /* Check to compare sparse_masks content to CPUID output. */
+ ret = resource_info_unsigned_get(test->resource, "sparse_masks", &sparse_masks);
+ if (ret)
+ return ret;
+
+ if (!strcmp(test->resource, "L3"))
+ __cpuid_count(0x10, 1, eax, ebx, ecx, edx);
+ else if (!strcmp(test->resource, "L2"))
+ __cpuid_count(0x10, 2, eax, ebx, ecx, edx);
+ else
+ return -EINVAL;
+
+ if (sparse_masks != ((ecx >> 3) & 1)) {
+ ksft_print_msg("CPUID output doesn't match 'sparse_masks' file content!\n");
+ return 1;
+ }
+
+ /* Write checks initialization. */
+ ret = get_full_cbm(test->resource, &full_cache_mask);
+ if (ret < 0)
+ return ret;
+ bit_center = count_bits(full_cache_mask) / 2;
+
+ /*
+ * The bit_center needs to be at least 3 to properly calculate the CBM
+ * hole in the noncont_mask. If it's smaller return an error since the
+ * cache mask is too short and that shouldn't happen.
+ */
+ if (bit_center < 3)
+ return -EINVAL;
+ cont_mask = full_cache_mask >> bit_center;
+
+ /* Contiguous mask write check. */
+ snprintf(schemata, sizeof(schemata), "%lx", cont_mask);
+ ret = write_schemata("", schemata, uparams->cpu, test->resource);
+ if (ret) {
+ ksft_print_msg("Write of contiguous CBM failed\n");
+ return 1;
+ }
+
+ /*
+ * Non-contiguous mask write check. CBM has a 0xf hole approximately in the middle.
+ * Output is compared with support information to catch any edge case errors.
+ */
+ noncont_mask = ~(0xfUL << (bit_center - 2)) & full_cache_mask;
+ snprintf(schemata, sizeof(schemata), "%lx", noncont_mask);
+ ret = write_schemata("", schemata, uparams->cpu, test->resource);
+ if (ret && sparse_masks)
+ ksft_print_msg("Non-contiguous CBMs supported but write of non-contiguous CBM failed\n");
+ else if (ret && !sparse_masks)
+ ksft_print_msg("Non-contiguous CBMs not supported and write of non-contiguous CBM failed as expected\n");
+ else if (!ret && !sparse_masks)
+ ksft_print_msg("Non-contiguous CBMs not supported but write of non-contiguous CBM succeeded\n");
+
+ return !ret == !sparse_masks;
+}
+
+static bool noncont_cat_feature_check(const struct resctrl_test *test)
+{
+ if (!resctrl_resource_exists(test->resource))
+ return false;
+
+ return resource_info_file_exists(test->resource, "sparse_masks");
+}
+
+struct resctrl_test l3_cat_test = {
+ .name = "L3_CAT",
+ .group = "CAT",
+ .resource = "L3",
+ .feature_check = test_resource_feature_check,
+ .run_test = cat_run_test,
+};
+
+struct resctrl_test l3_noncont_cat_test = {
+ .name = "L3_NONCONT_CAT",
+ .group = "CAT",
+ .resource = "L3",
+ .feature_check = noncont_cat_feature_check,
+ .run_test = noncont_cat_run_test,
+};
+
+struct resctrl_test l2_noncont_cat_test = {
+ .name = "L2_NONCONT_CAT",
+ .group = "CAT",
+ .resource = "L2",
+ .feature_check = noncont_cat_feature_check,
+ .run_test = noncont_cat_run_test,
+};
diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
index 50bdbce..a81f912 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -16,7 +16,9 @@
#define MAX_DIFF 2000000
#define MAX_DIFF_PERCENT 15
-static int cmt_setup(struct resctrl_val_param *p)
+static int cmt_setup(const struct resctrl_test *test,
+ const struct user_params *uparams,
+ struct resctrl_val_param *p)
{
/* Run NUM_OF_RUNS times */
if (p->num_of_runs >= NUM_OF_RUNS)
@@ -27,6 +29,33 @@ static int cmt_setup(struct resctrl_val_param *p)
return 0;
}
+static int show_results_info(unsigned long sum_llc_val, int no_of_bits,
+ unsigned long cache_span, unsigned long max_diff,
+ unsigned long max_diff_percent, unsigned long num_of_runs,
+ bool platform)
+{
+ unsigned long avg_llc_val = 0;
+ float diff_percent;
+ long avg_diff = 0;
+ int ret;
+
+ avg_llc_val = sum_llc_val / num_of_runs;
+ avg_diff = (long)abs(cache_span - avg_llc_val);
+ diff_percent = ((float)cache_span - avg_llc_val) / cache_span * 100;
+
+ ret = platform && abs((int)diff_percent) > max_diff_percent &&
+ abs(avg_diff) > max_diff;
+
+ ksft_print_msg("%s Check cache miss rate within %lu%%\n",
+ ret ? "Fail:" : "Pass:", max_diff_percent);
+
+ ksft_print_msg("Percent diff=%d\n", abs((int)diff_percent));
+
+ show_cache_info(no_of_bits, avg_llc_val, cache_span, false);
+
+ return ret;
+}
+
static int check_results(struct resctrl_val_param *param, size_t span, int no_of_bits)
{
char *token_array[8], temp[512];
@@ -37,9 +66,9 @@ static int check_results(struct resctrl_val_param *param, size_t span, int no_of
ksft_print_msg("Checking for pass/fail\n");
fp = fopen(param->filename, "r");
if (!fp) {
- perror("# Error in opening file\n");
+ ksft_perror("Error in opening file");
- return errno;
+ return -1;
}
while (fgets(temp, sizeof(temp), fp)) {
@@ -58,9 +87,8 @@ static int check_results(struct resctrl_val_param *param, size_t span, int no_of
}
fclose(fp);
- return show_cache_info(sum_llc_occu_resc, no_of_bits, span,
- MAX_DIFF, MAX_DIFF_PERCENT, runs - 1,
- true, true);
+ return show_results_info(sum_llc_occu_resc, no_of_bits, span,
+ MAX_DIFF, MAX_DIFF_PERCENT, runs - 1, true);
}
void cmt_test_cleanup(void)
@@ -68,28 +96,26 @@ void cmt_test_cleanup(void)
remove(RESULT_FILE_NAME);
}
-int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd)
+static int cmt_run_test(const struct resctrl_test *test, const struct user_params *uparams)
{
- const char * const *cmd = benchmark_cmd;
+ const char * const *cmd = uparams->benchmark_cmd;
const char *new_cmd[BENCHMARK_ARGS];
- unsigned long cache_size = 0;
+ unsigned long cache_total_size = 0;
+ int n = uparams->bits ? : 5;
unsigned long long_mask;
char *span_str = NULL;
- char cbm_mask[256];
int count_of_bits;
size_t span;
int ret, i;
- ret = get_cbm_mask("L3", cbm_mask);
+ ret = get_full_cbm("L3", &long_mask);
if (ret)
return ret;
- long_mask = strtoul(cbm_mask, NULL, 16);
-
- ret = get_cache_size(cpu_no, "L3", &cache_size);
+ ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
if (ret)
return ret;
- ksft_print_msg("Cache size :%lu\n", cache_size);
+ ksft_print_msg("Cache size :%lu\n", cache_total_size);
count_of_bits = count_bits(long_mask);
@@ -103,19 +129,18 @@ int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd)
.resctrl_val = CMT_STR,
.ctrlgrp = "c1",
.mongrp = "m1",
- .cpu_no = cpu_no,
.filename = RESULT_FILE_NAME,
.mask = ~(long_mask << n) & long_mask,
.num_of_runs = 0,
.setup = cmt_setup,
};
- span = cache_size * n / count_of_bits;
+ span = cache_portion_size(cache_total_size, param.mask, long_mask);
if (strcmp(cmd[0], "fill_buf") == 0) {
/* Duplicate the command to be able to replace span in it */
- for (i = 0; benchmark_cmd[i]; i++)
- new_cmd[i] = benchmark_cmd[i];
+ for (i = 0; uparams->benchmark_cmd[i]; i++)
+ new_cmd[i] = uparams->benchmark_cmd[i];
new_cmd[i] = NULL;
ret = asprintf(&span_str, "%zu", span);
@@ -127,11 +152,13 @@ int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd)
remove(RESULT_FILE_NAME);
- ret = resctrl_val(cmd, ¶m);
+ ret = resctrl_val(test, uparams, cmd, ¶m);
if (ret)
goto out;
ret = check_results(¶m, span, n);
+ if (ret && (get_vendor() == ARCH_INTEL))
+ ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
out:
cmt_test_cleanup();
@@ -139,3 +166,16 @@ int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd)
return ret;
}
+
+static bool cmt_feature_check(const struct resctrl_test *test)
+{
+ return test_resource_feature_check(test) &&
+ resctrl_mon_feature_exists("L3_MON", "llc_occupancy");
+}
+
+struct resctrl_test cmt_test = {
+ .name = "CMT",
+ .resource = "L3",
+ .feature_check = cmt_feature_check,
+ .run_test = cmt_run_test,
+};
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 0d425f2..ae120f1 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -38,7 +38,7 @@ static void cl_flush(void *p)
#endif
}
-static void mem_flush(unsigned char *buf, size_t buf_size)
+void mem_flush(unsigned char *buf, size_t buf_size)
{
unsigned char *cp = buf;
size_t i = 0;
@@ -51,39 +51,38 @@ static void mem_flush(unsigned char *buf, size_t buf_size)
sb();
}
-static void *malloc_and_init_memory(size_t buf_size)
-{
- void *p = NULL;
- uint64_t *p64;
- size_t s64;
- int ret;
-
- ret = posix_memalign(&p, PAGE_SIZE, buf_size);
- if (ret < 0)
- return NULL;
-
- p64 = (uint64_t *)p;
- s64 = buf_size / sizeof(uint64_t);
-
- while (s64 > 0) {
- *p64 = (uint64_t)rand();
- p64 += (CL_SIZE / sizeof(uint64_t));
- s64 -= (CL_SIZE / sizeof(uint64_t));
- }
-
- return p;
-}
+/*
+ * Buffer index step advance to workaround HW prefetching interfering with
+ * the measurements.
+ *
+ * Must be a prime to step through all indexes of the buffer.
+ *
+ * Some primes work better than others on some architectures (from MBA/MBM
+ * result stability point of view).
+ */
+#define FILL_IDX_MULT 23
static int fill_one_span_read(unsigned char *buf, size_t buf_size)
{
- unsigned char *end_ptr = buf + buf_size;
- unsigned char sum, *p;
+ unsigned int size = buf_size / (CL_SIZE / 2);
+ unsigned int i, idx = 0;
+ unsigned char sum = 0;
- sum = 0;
- p = buf;
- while (p < end_ptr) {
- sum += *p;
- p += (CL_SIZE / 2);
+ /*
+ * Read the buffer in an order that is unexpected by HW prefetching
+ * optimizations to prevent them interfering with the caching pattern.
+ *
+ * The read order is (in terms of halves of cachelines):
+ * i * FILL_IDX_MULT % size
+ * The formula is open-coded below to avoiding modulo inside the loop
+ * as it improves MBA/MBM result stability on some architectures.
+ */
+ for (i = 0; i < size; i++) {
+ sum += buf[idx * (CL_SIZE / 2)];
+
+ idx += FILL_IDX_MULT;
+ while (idx >= size)
+ idx -= size;
}
return sum;
@@ -101,10 +100,9 @@ static void fill_one_span_write(unsigned char *buf, size_t buf_size)
}
}
-static int fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
+void fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
{
int ret = 0;
- FILE *fp;
while (1) {
ret = fill_one_span_read(buf, buf_size);
@@ -113,67 +111,59 @@ static int fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
}
/* Consume read result so that reading memory is not optimized out. */
- fp = fopen("/dev/null", "w");
- if (!fp) {
- perror("Unable to write to /dev/null");
- return -1;
- }
- fprintf(fp, "Sum: %d ", ret);
- fclose(fp);
-
- return 0;
+ *value_sink = ret;
}
-static int fill_cache_write(unsigned char *buf, size_t buf_size, bool once)
+static void fill_cache_write(unsigned char *buf, size_t buf_size, bool once)
{
while (1) {
fill_one_span_write(buf, buf_size);
if (once)
break;
}
-
- return 0;
}
-static int fill_cache(size_t buf_size, int memflush, int op, bool once)
+unsigned char *alloc_buffer(size_t buf_size, int memflush)
{
- unsigned char *buf;
+ void *buf = NULL;
+ uint64_t *p64;
+ size_t s64;
int ret;
- buf = malloc_and_init_memory(buf_size);
- if (!buf)
- return -1;
+ ret = posix_memalign(&buf, PAGE_SIZE, buf_size);
+ if (ret < 0)
+ return NULL;
+
+ /* Initialize the buffer */
+ p64 = buf;
+ s64 = buf_size / sizeof(uint64_t);
+
+ while (s64 > 0) {
+ *p64 = (uint64_t)rand();
+ p64 += (CL_SIZE / sizeof(uint64_t));
+ s64 -= (CL_SIZE / sizeof(uint64_t));
+ }
/* Flush the memory before using to avoid "cache hot pages" effect */
if (memflush)
mem_flush(buf, buf_size);
- if (op == 0)
- ret = fill_cache_read(buf, buf_size, once);
- else
- ret = fill_cache_write(buf, buf_size, once);
-
- free(buf);
-
- if (ret) {
- printf("\n Error in fill cache read/write...\n");
- return -1;
- }
-
-
- return 0;
+ return buf;
}
-int run_fill_buf(size_t span, int memflush, int op, bool once)
+int run_fill_buf(size_t buf_size, int memflush, int op, bool once)
{
- size_t cache_size = span;
- int ret;
+ unsigned char *buf;
- ret = fill_cache(cache_size, memflush, op, once);
- if (ret) {
- printf("\n Error in fill cache\n");
+ buf = alloc_buffer(buf_size, memflush);
+ if (!buf)
return -1;
- }
+
+ if (op == 0)
+ fill_cache_read(buf, buf_size, once);
+ else
+ fill_cache_write(buf, buf_size, once);
+ free(buf);
return 0;
}
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index d3bf436..7946e32 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -22,7 +22,9 @@
* con_mon grp, mon_grp in resctrl FS.
* For each allocation, run 5 times in order to get average values.
*/
-static int mba_setup(struct resctrl_val_param *p)
+static int mba_setup(const struct resctrl_test *test,
+ const struct user_params *uparams,
+ struct resctrl_val_param *p)
{
static int runs_per_allocation, allocation = 100;
char allocation_str[64];
@@ -40,8 +42,7 @@ static int mba_setup(struct resctrl_val_param *p)
sprintf(allocation_str, "%d", allocation);
- ret = write_schemata(p->ctrlgrp, allocation_str, p->cpu_no,
- p->resctrl_val);
+ ret = write_schemata(p->ctrlgrp, allocation_str, uparams->cpu, test->resource);
if (ret < 0)
return ret;
@@ -109,9 +110,9 @@ static int check_results(void)
fp = fopen(output, "r");
if (!fp) {
- perror(output);
+ ksft_perror(output);
- return errno;
+ return -1;
}
runs = 0;
@@ -141,13 +142,12 @@ void mba_test_cleanup(void)
remove(RESULT_FILE_NAME);
}
-int mba_schemata_change(int cpu_no, const char * const *benchmark_cmd)
+static int mba_run_test(const struct resctrl_test *test, const struct user_params *uparams)
{
struct resctrl_val_param param = {
.resctrl_val = MBA_STR,
.ctrlgrp = "c1",
.mongrp = "m1",
- .cpu_no = cpu_no,
.filename = RESULT_FILE_NAME,
.bw_report = "reads",
.setup = mba_setup
@@ -156,7 +156,7 @@ int mba_schemata_change(int cpu_no, const char * const *benchmark_cmd)
remove(RESULT_FILE_NAME);
- ret = resctrl_val(benchmark_cmd, ¶m);
+ ret = resctrl_val(test, uparams, uparams->benchmark_cmd, ¶m);
if (ret)
goto out;
@@ -167,3 +167,17 @@ int mba_schemata_change(int cpu_no, const char * const *benchmark_cmd)
return ret;
}
+
+static bool mba_feature_check(const struct resctrl_test *test)
+{
+ return test_resource_feature_check(test) &&
+ resctrl_mon_feature_exists("L3_MON", "mbm_local_bytes");
+}
+
+struct resctrl_test mba_test = {
+ .name = "MBA",
+ .resource = "MB",
+ .vendor_specific = ARCH_INTEL,
+ .feature_check = mba_feature_check,
+ .run_test = mba_run_test,
+};
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 741533f..d67ffa3 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -59,9 +59,9 @@ static int check_results(size_t span)
fp = fopen(output, "r");
if (!fp) {
- perror(output);
+ ksft_perror(output);
- return errno;
+ return -1;
}
runs = 0;
@@ -86,7 +86,9 @@ static int check_results(size_t span)
return ret;
}
-static int mbm_setup(struct resctrl_val_param *p)
+static int mbm_setup(const struct resctrl_test *test,
+ const struct user_params *uparams,
+ struct resctrl_val_param *p)
{
int ret = 0;
@@ -95,9 +97,8 @@ static int mbm_setup(struct resctrl_val_param *p)
return END_OF_TESTS;
/* Set up shemata with 100% allocation on the first run. */
- if (p->num_of_runs == 0 && validate_resctrl_feature_request("MB", NULL))
- ret = write_schemata(p->ctrlgrp, "100", p->cpu_no,
- p->resctrl_val);
+ if (p->num_of_runs == 0 && resctrl_resource_exists("MB"))
+ ret = write_schemata(p->ctrlgrp, "100", uparams->cpu, test->resource);
p->num_of_runs++;
@@ -109,13 +110,12 @@ void mbm_test_cleanup(void)
remove(RESULT_FILE_NAME);
}
-int mbm_bw_change(int cpu_no, const char * const *benchmark_cmd)
+static int mbm_run_test(const struct resctrl_test *test, const struct user_params *uparams)
{
struct resctrl_val_param param = {
.resctrl_val = MBM_STR,
.ctrlgrp = "c1",
.mongrp = "m1",
- .cpu_no = cpu_no,
.filename = RESULT_FILE_NAME,
.bw_report = "reads",
.setup = mbm_setup
@@ -124,14 +124,30 @@ int mbm_bw_change(int cpu_no, const char * const *benchmark_cmd)
remove(RESULT_FILE_NAME);
- ret = resctrl_val(benchmark_cmd, ¶m);
+ ret = resctrl_val(test, uparams, uparams->benchmark_cmd, ¶m);
if (ret)
goto out;
ret = check_results(DEFAULT_SPAN);
+ if (ret && (get_vendor() == ARCH_INTEL))
+ ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
out:
mbm_test_cleanup();
return ret;
}
+
+static bool mbm_feature_check(const struct resctrl_test *test)
+{
+ return resctrl_mon_feature_exists("L3_MON", "mbm_total_bytes") &&
+ resctrl_mon_feature_exists("L3_MON", "mbm_local_bytes");
+}
+
+struct resctrl_test mbm_test = {
+ .name = "MBM",
+ .resource = "MB",
+ .vendor_specific = ARCH_INTEL,
+ .feature_check = mbm_feature_check,
+ .run_test = mbm_run_test,
+};
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index a33f414f..2051bd1 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -28,6 +28,12 @@
#define PHYS_ID_PATH "/sys/devices/system/cpu/cpu"
#define INFO_PATH "/sys/fs/resctrl/info"
+/*
+ * CPU vendor IDs
+ *
+ * Define as bits because they're used for vendor_specific bitmask in
+ * the struct resctrl_test.
+ */
#define ARCH_INTEL 1
#define ARCH_AMD 2
@@ -37,20 +43,52 @@
#define DEFAULT_SPAN (250 * MB)
-#define PARENT_EXIT(err_msg) \
+#define PARENT_EXIT() \
do { \
- perror(err_msg); \
kill(ppid, SIGKILL); \
umount_resctrlfs(); \
exit(EXIT_FAILURE); \
} while (0)
/*
+ * user_params: User supplied parameters
+ * @cpu: CPU number to which the benchmark will be bound to
+ * @bits: Number of bits used for cache allocation size
+ * @benchmark_cmd: Benchmark command to run during (some of the) tests
+ */
+struct user_params {
+ int cpu;
+ int bits;
+ const char *benchmark_cmd[BENCHMARK_ARGS];
+};
+
+/*
+ * resctrl_test: resctrl test definition
+ * @name: Test name
+ * @group: Test group - a common name for tests that share some characteristic
+ * (e.g., L3 CAT test belongs to the CAT group). Can be NULL
+ * @resource: Resource to test (e.g., MB, L3, L2, etc.)
+ * @vendor_specific: Bitmask for vendor-specific tests (can be 0 for universal tests)
+ * @disabled: Test is disabled
+ * @feature_check: Callback to check required resctrl features
+ * @run_test: Callback to run the test
+ */
+struct resctrl_test {
+ const char *name;
+ const char *group;
+ const char *resource;
+ unsigned int vendor_specific;
+ bool disabled;
+ bool (*feature_check)(const struct resctrl_test *test);
+ int (*run_test)(const struct resctrl_test *test,
+ const struct user_params *uparams);
+};
+
+/*
* resctrl_val_param: resctrl test parameters
* @resctrl_val: Resctrl feature (Eg: mbm, mba.. etc)
* @ctrlgrp: Name of the control monitor group (con_mon grp)
* @mongrp: Name of the monitor group (mon grp)
- * @cpu_no: CPU number to which the benchmark would be binded
* @filename: Name of file to which the o/p should be written
* @bw_report: Bandwidth report type (reads vs writes)
* @setup: Call back function to setup test environment
@@ -59,12 +97,20 @@ struct resctrl_val_param {
char *resctrl_val;
char ctrlgrp[64];
char mongrp[64];
- int cpu_no;
char filename[64];
char *bw_report;
unsigned long mask;
int num_of_runs;
- int (*setup)(struct resctrl_val_param *param);
+ int (*setup)(const struct resctrl_test *test,
+ const struct user_params *uparams,
+ struct resctrl_val_param *param);
+};
+
+struct perf_event_read {
+ __u64 nr; /* The number of events */
+ struct {
+ __u64 value; /* The value of the event */
+ } values[2];
};
#define MBM_STR "mbm"
@@ -72,6 +118,13 @@ struct resctrl_val_param {
#define CMT_STR "cmt"
#define CAT_STR "cat"
+/*
+ * Memory location that consumes values compiler must not optimize away.
+ * Volatile ensures writes to this location cannot be optimized away by
+ * compiler.
+ */
+extern volatile int *value_sink;
+
extern pid_t bm_pid, ppid;
extern char llc_occup_path[1024];
@@ -79,42 +132,84 @@ extern char llc_occup_path[1024];
int get_vendor(void);
bool check_resctrlfs_support(void);
int filter_dmesg(void);
-int get_resource_id(int cpu_no, int *resource_id);
+int get_domain_id(const char *resource, int cpu_no, int *domain_id);
int mount_resctrlfs(void);
int umount_resctrlfs(void);
int validate_bw_report_request(char *bw_report);
-bool validate_resctrl_feature_request(const char *resource, const char *feature);
+bool resctrl_resource_exists(const char *resource);
+bool resctrl_mon_feature_exists(const char *resource, const char *feature);
+bool resource_info_file_exists(const char *resource, const char *file);
+bool test_resource_feature_check(const struct resctrl_test *test);
char *fgrep(FILE *inf, const char *str);
-int taskset_benchmark(pid_t bm_pid, int cpu_no);
-int write_schemata(char *ctrlgrp, char *schemata, int cpu_no,
- char *resctrl_val);
+int taskset_benchmark(pid_t bm_pid, int cpu_no, cpu_set_t *old_affinity);
+int taskset_restore(pid_t bm_pid, cpu_set_t *old_affinity);
+int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, const char *resource);
int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
char *resctrl_val);
int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
int group_fd, unsigned long flags);
-int run_fill_buf(size_t span, int memflush, int op, bool once);
-int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *param);
-int mbm_bw_change(int cpu_no, const char * const *benchmark_cmd);
+unsigned char *alloc_buffer(size_t buf_size, int memflush);
+void mem_flush(unsigned char *buf, size_t buf_size);
+void fill_cache_read(unsigned char *buf, size_t buf_size, bool once);
+int run_fill_buf(size_t buf_size, int memflush, int op, bool once);
+int resctrl_val(const struct resctrl_test *test,
+ const struct user_params *uparams,
+ const char * const *benchmark_cmd,
+ struct resctrl_val_param *param);
void tests_cleanup(void);
void mbm_test_cleanup(void);
-int mba_schemata_change(int cpu_no, const char * const *benchmark_cmd);
void mba_test_cleanup(void);
-int get_cbm_mask(char *cache_type, char *cbm_mask);
-int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size);
+unsigned long create_bit_mask(unsigned int start, unsigned int len);
+unsigned int count_contiguous_bits(unsigned long val, unsigned int *start);
+int get_full_cbm(const char *cache_type, unsigned long *mask);
+int get_mask_no_shareable(const char *cache_type, unsigned long *mask);
+int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size);
+int resource_info_unsigned_get(const char *resource, const char *filename, unsigned int *val);
void ctrlc_handler(int signum, siginfo_t *info, void *ptr);
int signal_handler_register(void);
void signal_handler_unregister(void);
-int cat_val(struct resctrl_val_param *param, size_t span);
void cat_test_cleanup(void);
-int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type);
-int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd);
unsigned int count_bits(unsigned long n);
void cmt_test_cleanup(void);
-int get_core_sibling(int cpu_no);
-int measure_cache_vals(struct resctrl_val_param *param, int bm_pid);
-int show_cache_info(unsigned long sum_llc_val, int no_of_bits,
- size_t cache_span, unsigned long max_diff,
- unsigned long max_diff_percent, unsigned long num_of_runs,
- bool platform, bool cmt);
+
+void perf_event_attr_initialize(struct perf_event_attr *pea, __u64 config);
+void perf_event_initialize_read_format(struct perf_event_read *pe_read);
+int perf_open(struct perf_event_attr *pea, pid_t pid, int cpu_no);
+int perf_event_reset_enable(int pe_fd);
+int perf_event_measure(int pe_fd, struct perf_event_read *pe_read,
+ const char *filename, int bm_pid);
+int measure_llc_resctrl(const char *filename, int bm_pid);
+void show_cache_info(int no_of_bits, __u64 avg_llc_val, size_t cache_span, bool lines);
+
+/*
+ * cache_portion_size - Calculate the size of a cache portion
+ * @cache_size: Total cache size in bytes
+ * @portion_mask: Cache portion mask
+ * @full_cache_mask: Full Cache Bit Mask (CBM) for the cache
+ *
+ * Return: The size of the cache portion in bytes.
+ */
+static inline unsigned long cache_portion_size(unsigned long cache_size,
+ unsigned long portion_mask,
+ unsigned long full_cache_mask)
+{
+ unsigned int bits = count_bits(full_cache_mask);
+
+ /*
+ * With no bits the full CBM, assume cache cannot be split into
+ * smaller portions. To avoid divide by zero, return cache_size.
+ */
+ if (!bits)
+ return cache_size;
+
+ return cache_size * count_bits(portion_mask) / bits;
+}
+
+extern struct resctrl_test mbm_test;
+extern struct resctrl_test mba_test;
+extern struct resctrl_test cmt_test;
+extern struct resctrl_test l3_cat_test;
+extern struct resctrl_test l3_noncont_cat_test;
+extern struct resctrl_test l2_noncont_cat_test;
#endif /* RESCTRL_H */
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 2bbe304..f3dc1b9 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -10,6 +10,19 @@
*/
#include "resctrl.h"
+/* Volatile memory sink to prevent compiler optimizations */
+static volatile int sink_target;
+volatile int *value_sink = &sink_target;
+
+static struct resctrl_test *resctrl_tests[] = {
+ &mbm_test,
+ &mba_test,
+ &cmt_test,
+ &l3_cat_test,
+ &l3_noncont_cat_test,
+ &l2_noncont_cat_test,
+};
+
static int detect_vendor(void)
{
FILE *inf = fopen("/proc/cpuinfo", "r");
@@ -49,11 +62,20 @@ int get_vendor(void)
static void cmd_help(void)
{
+ int i;
+
printf("usage: resctrl_tests [-h] [-t test list] [-n no_of_bits] [-b benchmark_cmd [option]...]\n");
printf("\t-b benchmark_cmd [option]...: run specified benchmark for MBM, MBA and CMT\n");
printf("\t default benchmark is builtin fill_buf\n");
- printf("\t-t test list: run tests specified in the test list, ");
+ printf("\t-t test list: run tests/groups specified by the list, ");
printf("e.g. -t mbm,mba,cmt,cat\n");
+ printf("\t\tSupported tests (group):\n");
+ for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++) {
+ if (resctrl_tests[i]->group)
+ printf("\t\t\t%s (%s)\n", resctrl_tests[i]->name, resctrl_tests[i]->group);
+ else
+ printf("\t\t\t%s\n", resctrl_tests[i]->name);
+ }
printf("\t-n no_of_bits: run cache tests using specified no of bits in cache bit mask\n");
printf("\t-p cpu_no: specify CPU number to run the test. 1 is default\n");
printf("\t-h: help\n");
@@ -92,116 +114,63 @@ static void test_cleanup(void)
signal_handler_unregister();
}
-static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
+static bool test_vendor_specific_check(const struct resctrl_test *test)
{
- int res;
+ if (!test->vendor_specific)
+ return true;
- ksft_print_msg("Starting MBM BW change ...\n");
+ return get_vendor() & test->vendor_specific;
+}
+
+static void run_single_test(const struct resctrl_test *test, const struct user_params *uparams)
+{
+ int ret;
+
+ if (test->disabled)
+ return;
+
+ if (!test_vendor_specific_check(test)) {
+ ksft_test_result_skip("Hardware does not support %s\n", test->name);
+ return;
+ }
+
+ ksft_print_msg("Starting %s test ...\n", test->name);
if (test_prepare()) {
ksft_exit_fail_msg("Abnormal failure when preparing for the test\n");
return;
}
- if (!validate_resctrl_feature_request("L3_MON", "mbm_total_bytes") ||
- !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") ||
- (get_vendor() != ARCH_INTEL)) {
- ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n");
+ if (!test->feature_check(test)) {
+ ksft_test_result_skip("Hardware does not support %s or %s is disabled\n",
+ test->name, test->name);
goto cleanup;
}
- res = mbm_bw_change(cpu_no, benchmark_cmd);
- ksft_test_result(!res, "MBM: bw change\n");
- if ((get_vendor() == ARCH_INTEL) && res)
- ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
+ ret = test->run_test(test, uparams);
+ ksft_test_result(!ret, "%s: test\n", test->name);
cleanup:
test_cleanup();
}
-static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
+static void init_user_params(struct user_params *uparams)
{
- int res;
+ memset(uparams, 0, sizeof(*uparams));
- ksft_print_msg("Starting MBA Schemata change ...\n");
-
- if (test_prepare()) {
- ksft_exit_fail_msg("Abnormal failure when preparing for the test\n");
- return;
- }
-
- if (!validate_resctrl_feature_request("MB", NULL) ||
- !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") ||
- (get_vendor() != ARCH_INTEL)) {
- ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n");
- goto cleanup;
- }
-
- res = mba_schemata_change(cpu_no, benchmark_cmd);
- ksft_test_result(!res, "MBA: schemata change\n");
-
-cleanup:
- test_cleanup();
-}
-
-static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)
-{
- int res;
-
- ksft_print_msg("Starting CMT test ...\n");
-
- if (test_prepare()) {
- ksft_exit_fail_msg("Abnormal failure when preparing for the test\n");
- return;
- }
-
- if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy") ||
- !validate_resctrl_feature_request("L3", NULL)) {
- ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n");
- goto cleanup;
- }
-
- res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd);
- ksft_test_result(!res, "CMT: test\n");
- if ((get_vendor() == ARCH_INTEL) && res)
- ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
-
-cleanup:
- test_cleanup();
-}
-
-static void run_cat_test(int cpu_no, int no_of_bits)
-{
- int res;
-
- ksft_print_msg("Starting CAT test ...\n");
-
- if (test_prepare()) {
- ksft_exit_fail_msg("Abnormal failure when preparing for the test\n");
- return;
- }
-
- if (!validate_resctrl_feature_request("L3", NULL)) {
- ksft_test_result_skip("Hardware does not support CAT or CAT is disabled\n");
- goto cleanup;
- }
-
- res = cat_perf_miss_val(cpu_no, no_of_bits, "L3");
- ksft_test_result(!res, "CAT: test\n");
-
-cleanup:
- test_cleanup();
+ uparams->cpu = 1;
+ uparams->bits = 0;
}
int main(int argc, char **argv)
{
- bool mbm_test = true, mba_test = true, cmt_test = true;
- const char *benchmark_cmd[BENCHMARK_ARGS] = {};
- int c, cpu_no = 1, i, no_of_bits = 0;
+ int tests = ARRAY_SIZE(resctrl_tests);
+ bool test_param_seen = false;
+ struct user_params uparams;
char *span_str = NULL;
- bool cat_test = true;
- int tests = 0;
- int ret;
+ int ret, c, i;
+
+ init_user_params(&uparams);
while ((c = getopt(argc, argv, "ht:b:n:p:")) != -1) {
char *token;
@@ -219,32 +188,35 @@ int main(int argc, char **argv)
/* Extract benchmark command from command line. */
for (i = 0; i < argc - optind; i++)
- benchmark_cmd[i] = argv[i + optind];
- benchmark_cmd[i] = NULL;
+ uparams.benchmark_cmd[i] = argv[i + optind];
+ uparams.benchmark_cmd[i] = NULL;
goto last_arg;
case 't':
token = strtok(optarg, ",");
- mbm_test = false;
- mba_test = false;
- cmt_test = false;
- cat_test = false;
+ if (!test_param_seen) {
+ for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++)
+ resctrl_tests[i]->disabled = true;
+ tests = 0;
+ test_param_seen = true;
+ }
while (token) {
- if (!strncmp(token, MBM_STR, sizeof(MBM_STR))) {
- mbm_test = true;
- tests++;
- } else if (!strncmp(token, MBA_STR, sizeof(MBA_STR))) {
- mba_test = true;
- tests++;
- } else if (!strncmp(token, CMT_STR, sizeof(CMT_STR))) {
- cmt_test = true;
- tests++;
- } else if (!strncmp(token, CAT_STR, sizeof(CAT_STR))) {
- cat_test = true;
- tests++;
- } else {
- printf("invalid argument\n");
+ bool found = false;
+
+ for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++) {
+ if (!strcasecmp(token, resctrl_tests[i]->name) ||
+ (resctrl_tests[i]->group &&
+ !strcasecmp(token, resctrl_tests[i]->group))) {
+ if (resctrl_tests[i]->disabled)
+ tests++;
+ resctrl_tests[i]->disabled = false;
+ found = true;
+ }
+ }
+
+ if (!found) {
+ printf("invalid test: %s\n", token);
return -1;
}
@@ -252,11 +224,11 @@ int main(int argc, char **argv)
}
break;
case 'p':
- cpu_no = atoi(optarg);
+ uparams.cpu = atoi(optarg);
break;
case 'n':
- no_of_bits = atoi(optarg);
- if (no_of_bits <= 0) {
+ uparams.bits = atoi(optarg);
+ if (uparams.bits <= 0) {
printf("Bail out! invalid argument for no_of_bits\n");
return -1;
}
@@ -291,32 +263,23 @@ int main(int argc, char **argv)
filter_dmesg();
- if (!benchmark_cmd[0]) {
+ if (!uparams.benchmark_cmd[0]) {
/* If no benchmark is given by "-b" argument, use fill_buf. */
- benchmark_cmd[0] = "fill_buf";
+ uparams.benchmark_cmd[0] = "fill_buf";
ret = asprintf(&span_str, "%u", DEFAULT_SPAN);
if (ret < 0)
ksft_exit_fail_msg("Out of memory!\n");
- benchmark_cmd[1] = span_str;
- benchmark_cmd[2] = "1";
- benchmark_cmd[3] = "0";
- benchmark_cmd[4] = "false";
- benchmark_cmd[5] = NULL;
+ uparams.benchmark_cmd[1] = span_str;
+ uparams.benchmark_cmd[2] = "1";
+ uparams.benchmark_cmd[3] = "0";
+ uparams.benchmark_cmd[4] = "false";
+ uparams.benchmark_cmd[5] = NULL;
}
- ksft_set_plan(tests ? : 4);
+ ksft_set_plan(tests);
- if (mbm_test)
- run_mbm_test(benchmark_cmd, cpu_no);
-
- if (mba_test)
- run_mba_test(benchmark_cmd, cpu_no);
-
- if (cmt_test)
- run_cmt_test(benchmark_cmd, cpu_no);
-
- if (cat_test)
- run_cat_test(cpu_no, no_of_bits);
+ for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++)
+ run_single_test(resctrl_tests[i], &uparams);
free(span_str);
ksft_finished();
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 88789678..5a49f07 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -156,12 +156,12 @@ static int read_from_imc_dir(char *imc_dir, int count)
sprintf(imc_counter_type, "%s%s", imc_dir, "type");
fp = fopen(imc_counter_type, "r");
if (!fp) {
- perror("Failed to open imc counter type file");
+ ksft_perror("Failed to open iMC counter type file");
return -1;
}
if (fscanf(fp, "%u", &imc_counters_config[count][READ].type) <= 0) {
- perror("Could not get imc type");
+ ksft_perror("Could not get iMC type");
fclose(fp);
return -1;
@@ -175,12 +175,12 @@ static int read_from_imc_dir(char *imc_dir, int count)
sprintf(imc_counter_cfg, "%s%s", imc_dir, READ_FILE_NAME);
fp = fopen(imc_counter_cfg, "r");
if (!fp) {
- perror("Failed to open imc config file");
+ ksft_perror("Failed to open iMC config file");
return -1;
}
if (fscanf(fp, "%s", cas_count_cfg) <= 0) {
- perror("Could not get imc cas count read");
+ ksft_perror("Could not get iMC cas count read");
fclose(fp);
return -1;
@@ -193,12 +193,12 @@ static int read_from_imc_dir(char *imc_dir, int count)
sprintf(imc_counter_cfg, "%s%s", imc_dir, WRITE_FILE_NAME);
fp = fopen(imc_counter_cfg, "r");
if (!fp) {
- perror("Failed to open imc config file");
+ ksft_perror("Failed to open iMC config file");
return -1;
}
if (fscanf(fp, "%s", cas_count_cfg) <= 0) {
- perror("Could not get imc cas count write");
+ ksft_perror("Could not get iMC cas count write");
fclose(fp);
return -1;
@@ -262,12 +262,12 @@ static int num_of_imcs(void)
}
closedir(dp);
if (count == 0) {
- perror("Unable find iMC counters!\n");
+ ksft_print_msg("Unable to find iMC counters\n");
return -1;
}
} else {
- perror("Unable to open PMU directory!\n");
+ ksft_perror("Unable to open PMU directory");
return -1;
}
@@ -339,14 +339,14 @@ static int get_mem_bw_imc(int cpu_no, char *bw_report, float *bw_imc)
if (read(r->fd, &r->return_value,
sizeof(struct membw_read_format)) == -1) {
- perror("Couldn't get read b/w through iMC");
+ ksft_perror("Couldn't get read b/w through iMC");
return -1;
}
if (read(w->fd, &w->return_value,
sizeof(struct membw_read_format)) == -1) {
- perror("Couldn't get write bw through iMC");
+ ksft_perror("Couldn't get write bw through iMC");
return -1;
}
@@ -387,20 +387,20 @@ static int get_mem_bw_imc(int cpu_no, char *bw_report, float *bw_imc)
return 0;
}
-void set_mbm_path(const char *ctrlgrp, const char *mongrp, int resource_id)
+void set_mbm_path(const char *ctrlgrp, const char *mongrp, int domain_id)
{
if (ctrlgrp && mongrp)
sprintf(mbm_total_path, CON_MON_MBM_LOCAL_BYTES_PATH,
- RESCTRL_PATH, ctrlgrp, mongrp, resource_id);
+ RESCTRL_PATH, ctrlgrp, mongrp, domain_id);
else if (!ctrlgrp && mongrp)
sprintf(mbm_total_path, MON_MBM_LOCAL_BYTES_PATH, RESCTRL_PATH,
- mongrp, resource_id);
+ mongrp, domain_id);
else if (ctrlgrp && !mongrp)
sprintf(mbm_total_path, CON_MBM_LOCAL_BYTES_PATH, RESCTRL_PATH,
- ctrlgrp, resource_id);
+ ctrlgrp, domain_id);
else if (!ctrlgrp && !mongrp)
sprintf(mbm_total_path, MBM_LOCAL_BYTES_PATH, RESCTRL_PATH,
- resource_id);
+ domain_id);
}
/*
@@ -413,23 +413,23 @@ void set_mbm_path(const char *ctrlgrp, const char *mongrp, int resource_id)
static void initialize_mem_bw_resctrl(const char *ctrlgrp, const char *mongrp,
int cpu_no, char *resctrl_val)
{
- int resource_id;
+ int domain_id;
- if (get_resource_id(cpu_no, &resource_id) < 0) {
- perror("Could not get resource_id");
+ if (get_domain_id("MB", cpu_no, &domain_id) < 0) {
+ ksft_print_msg("Could not get domain ID\n");
return;
}
if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)))
- set_mbm_path(ctrlgrp, mongrp, resource_id);
+ set_mbm_path(ctrlgrp, mongrp, domain_id);
if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
if (ctrlgrp)
sprintf(mbm_total_path, CON_MBM_LOCAL_BYTES_PATH,
- RESCTRL_PATH, ctrlgrp, resource_id);
+ RESCTRL_PATH, ctrlgrp, domain_id);
else
sprintf(mbm_total_path, MBM_LOCAL_BYTES_PATH,
- RESCTRL_PATH, resource_id);
+ RESCTRL_PATH, domain_id);
}
}
@@ -449,12 +449,12 @@ static int get_mem_bw_resctrl(unsigned long *mbm_total)
fp = fopen(mbm_total_path, "r");
if (!fp) {
- perror("Failed to open total bw file");
+ ksft_perror("Failed to open total bw file");
return -1;
}
if (fscanf(fp, "%lu", mbm_total) <= 0) {
- perror("Could not get mbm local bytes");
+ ksft_perror("Could not get mbm local bytes");
fclose(fp);
return -1;
@@ -495,7 +495,7 @@ int signal_handler_register(void)
if (sigaction(SIGINT, &sigact, NULL) ||
sigaction(SIGTERM, &sigact, NULL) ||
sigaction(SIGHUP, &sigact, NULL)) {
- perror("# sigaction");
+ ksft_perror("sigaction");
ret = -1;
}
return ret;
@@ -515,7 +515,7 @@ void signal_handler_unregister(void)
if (sigaction(SIGINT, &sigact, NULL) ||
sigaction(SIGTERM, &sigact, NULL) ||
sigaction(SIGHUP, &sigact, NULL)) {
- perror("# sigaction");
+ ksft_perror("sigaction");
}
}
@@ -526,7 +526,7 @@ void signal_handler_unregister(void)
* @bw_imc: perf imc counter value
* @bw_resc: memory bandwidth value
*
- * Return: 0 on success. non-zero on failure.
+ * Return: 0 on success, < 0 on error.
*/
static int print_results_bw(char *filename, int bm_pid, float bw_imc,
unsigned long bw_resc)
@@ -540,16 +540,16 @@ static int print_results_bw(char *filename, int bm_pid, float bw_imc,
} else {
fp = fopen(filename, "a");
if (!fp) {
- perror("Cannot open results file");
+ ksft_perror("Cannot open results file");
- return errno;
+ return -1;
}
if (fprintf(fp, "Pid: %d \t Mem_BW_iMC: %f \t Mem_BW_resc: %lu \t Difference: %lu\n",
bm_pid, bw_imc, bw_resc, diff) <= 0) {
+ ksft_print_msg("Could not log results\n");
fclose(fp);
- perror("Could not log results.");
- return errno;
+ return -1;
}
fclose(fp);
}
@@ -582,19 +582,20 @@ static void set_cmt_path(const char *ctrlgrp, const char *mongrp, char sock_num)
static void initialize_llc_occu_resctrl(const char *ctrlgrp, const char *mongrp,
int cpu_no, char *resctrl_val)
{
- int resource_id;
+ int domain_id;
- if (get_resource_id(cpu_no, &resource_id) < 0) {
- perror("# Unable to resource_id");
+ if (get_domain_id("L3", cpu_no, &domain_id) < 0) {
+ ksft_print_msg("Could not get domain ID\n");
return;
}
if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR)))
- set_cmt_path(ctrlgrp, mongrp, resource_id);
+ set_cmt_path(ctrlgrp, mongrp, domain_id);
}
-static int
-measure_vals(struct resctrl_val_param *param, unsigned long *bw_resc_start)
+static int measure_vals(const struct user_params *uparams,
+ struct resctrl_val_param *param,
+ unsigned long *bw_resc_start)
{
unsigned long bw_resc, bw_resc_end;
float bw_imc;
@@ -607,7 +608,7 @@ measure_vals(struct resctrl_val_param *param, unsigned long *bw_resc_start)
* Compare the two values to validate resctrl value.
* It takes 1sec to measure the data.
*/
- ret = get_mem_bw_imc(param->cpu_no, param->bw_report, &bw_imc);
+ ret = get_mem_bw_imc(uparams->cpu, param->bw_report, &bw_imc);
if (ret < 0)
return ret;
@@ -647,20 +648,24 @@ static void run_benchmark(int signum, siginfo_t *info, void *ucontext)
* stdio (console)
*/
fp = freopen("/dev/null", "w", stdout);
- if (!fp)
- PARENT_EXIT("Unable to direct benchmark status to /dev/null");
+ if (!fp) {
+ ksft_perror("Unable to direct benchmark status to /dev/null");
+ PARENT_EXIT();
+ }
if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
/* Execute default fill_buf benchmark */
span = strtoul(benchmark_cmd[1], NULL, 10);
memflush = atoi(benchmark_cmd[2]);
operation = atoi(benchmark_cmd[3]);
- if (!strcmp(benchmark_cmd[4], "true"))
+ if (!strcmp(benchmark_cmd[4], "true")) {
once = true;
- else if (!strcmp(benchmark_cmd[4], "false"))
+ } else if (!strcmp(benchmark_cmd[4], "false")) {
once = false;
- else
- PARENT_EXIT("Invalid once parameter");
+ } else {
+ ksft_print_msg("Invalid once parameter\n");
+ PARENT_EXIT();
+ }
if (run_fill_buf(span, memflush, operation, once))
fprintf(stderr, "Error in running fill buffer\n");
@@ -668,22 +673,28 @@ static void run_benchmark(int signum, siginfo_t *info, void *ucontext)
/* Execute specified benchmark */
ret = execvp(benchmark_cmd[0], benchmark_cmd);
if (ret)
- perror("wrong\n");
+ ksft_perror("execvp");
}
fclose(stdout);
- PARENT_EXIT("Unable to run specified benchmark");
+ ksft_print_msg("Unable to run specified benchmark\n");
+ PARENT_EXIT();
}
/*
* resctrl_val: execute benchmark and measure memory bandwidth on
* the benchmark
+ * @test: test information structure
+ * @uparams: user supplied parameters
* @benchmark_cmd: benchmark command and its arguments
* @param: parameters passed to resctrl_val()
*
- * Return: 0 on success. non-zero on failure.
+ * Return: 0 when the test was run, < 0 on error.
*/
-int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *param)
+int resctrl_val(const struct resctrl_test *test,
+ const struct user_params *uparams,
+ const char * const *benchmark_cmd,
+ struct resctrl_val_param *param)
{
char *resctrl_val = param->resctrl_val;
unsigned long bw_resc_start = 0;
@@ -709,7 +720,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
ppid = getpid();
if (pipe(pipefd)) {
- perror("# Unable to create pipe");
+ ksft_perror("Unable to create pipe");
return -1;
}
@@ -721,7 +732,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
fflush(stdout);
bm_pid = fork();
if (bm_pid == -1) {
- perror("# Unable to fork");
+ ksft_perror("Unable to fork");
return -1;
}
@@ -738,15 +749,17 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
sigact.sa_flags = SA_SIGINFO;
/* Register for "SIGUSR1" signal from parent */
- if (sigaction(SIGUSR1, &sigact, NULL))
- PARENT_EXIT("Can't register child for signal");
+ if (sigaction(SIGUSR1, &sigact, NULL)) {
+ ksft_perror("Can't register child for signal");
+ PARENT_EXIT();
+ }
/* Tell parent that child is ready */
close(pipefd[0]);
pipe_message = 1;
if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
sizeof(pipe_message)) {
- perror("# failed signaling parent process");
+ ksft_perror("Failed signaling parent process");
close(pipefd[1]);
return -1;
}
@@ -755,7 +768,8 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
/* Suspend child until delivery of "SIGUSR1" from parent */
sigsuspend(&sigact.sa_mask);
- PARENT_EXIT("Child is done");
+ ksft_perror("Child is done");
+ PARENT_EXIT();
}
ksft_print_msg("Benchmark PID: %d\n", bm_pid);
@@ -769,7 +783,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
value.sival_ptr = (void *)benchmark_cmd;
/* Taskset benchmark to specified cpu */
- ret = taskset_benchmark(bm_pid, param->cpu_no);
+ ret = taskset_benchmark(bm_pid, uparams->cpu, NULL);
if (ret)
goto out;
@@ -786,17 +800,17 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
goto out;
initialize_mem_bw_resctrl(param->ctrlgrp, param->mongrp,
- param->cpu_no, resctrl_val);
+ uparams->cpu, resctrl_val);
} else if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR)))
initialize_llc_occu_resctrl(param->ctrlgrp, param->mongrp,
- param->cpu_no, resctrl_val);
+ uparams->cpu, resctrl_val);
/* Parent waits for child to be ready. */
close(pipefd[1]);
while (pipe_message != 1) {
if (read(pipefd[0], &pipe_message, sizeof(pipe_message)) <
sizeof(pipe_message)) {
- perror("# failed reading message from child process");
+ ksft_perror("Failed reading message from child process");
close(pipefd[0]);
goto out;
}
@@ -805,8 +819,8 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
/* Signal child to start benchmark */
if (sigqueue(bm_pid, SIGUSR1, value) == -1) {
- perror("# sigqueue SIGUSR1 to child");
- ret = errno;
+ ksft_perror("sigqueue SIGUSR1 to child");
+ ret = -1;
goto out;
}
@@ -815,7 +829,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
/* Test runs until the callback setup() tells the test to stop. */
while (1) {
- ret = param->setup(param);
+ ret = param->setup(test, uparams, param);
if (ret == END_OF_TESTS) {
ret = 0;
break;
@@ -825,12 +839,12 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
- ret = measure_vals(param, &bw_resc_start);
+ ret = measure_vals(uparams, param, &bw_resc_start);
if (ret)
break;
} else if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
sleep(1);
- ret = measure_cache_vals(param, bm_pid);
+ ret = measure_llc_resctrl(param->filename, bm_pid);
if (ret)
break;
}
diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index 5ebd436..1cade75 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -20,7 +20,7 @@ static int find_resctrl_mount(char *buffer)
mounts = fopen("/proc/mounts", "r");
if (!mounts) {
- perror("/proc/mounts");
+ ksft_perror("/proc/mounts");
return -ENXIO;
}
while (!feof(mounts)) {
@@ -56,7 +56,7 @@ static int find_resctrl_mount(char *buffer)
* Mounts resctrl FS. Fails if resctrl FS is already mounted to avoid
* pre-existing settings interfering with the test results.
*
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
*/
int mount_resctrlfs(void)
{
@@ -69,7 +69,7 @@ int mount_resctrlfs(void)
ksft_print_msg("Mounting resctrl to \"%s\"\n", RESCTRL_PATH);
ret = mount("resctrl", RESCTRL_PATH, "resctrl", 0, NULL);
if (ret)
- perror("# mount");
+ ksft_perror("mount");
return ret;
}
@@ -86,41 +86,67 @@ int umount_resctrlfs(void)
return ret;
if (umount(mountpoint)) {
- perror("# Unable to umount resctrl");
+ ksft_perror("Unable to umount resctrl");
- return errno;
+ return -1;
}
return 0;
}
/*
- * get_resource_id - Get socket number/l3 id for a specified CPU
+ * get_cache_level - Convert cache level from string to integer
+ * @cache_type: Cache level as string
+ *
+ * Return: cache level as integer or -1 if @cache_type is invalid.
+ */
+static int get_cache_level(const char *cache_type)
+{
+ if (!strcmp(cache_type, "L3"))
+ return 3;
+ if (!strcmp(cache_type, "L2"))
+ return 2;
+
+ ksft_print_msg("Invalid cache level\n");
+ return -1;
+}
+
+static int get_resource_cache_level(const char *resource)
+{
+ /* "MB" use L3 (LLC) as resource */
+ if (!strcmp(resource, "MB"))
+ return 3;
+ return get_cache_level(resource);
+}
+
+/*
+ * get_domain_id - Get resctrl domain ID for a specified CPU
+ * @resource: resource name
* @cpu_no: CPU number
- * @resource_id: Socket number or l3_id
+ * @domain_id: domain ID (cache ID; for MB, L3 cache ID)
*
* Return: >= 0 on success, < 0 on failure.
*/
-int get_resource_id(int cpu_no, int *resource_id)
+int get_domain_id(const char *resource, int cpu_no, int *domain_id)
{
char phys_pkg_path[1024];
+ int cache_num;
FILE *fp;
- if (get_vendor() == ARCH_AMD)
- sprintf(phys_pkg_path, "%s%d/cache/index3/id",
- PHYS_ID_PATH, cpu_no);
- else
- sprintf(phys_pkg_path, "%s%d/topology/physical_package_id",
- PHYS_ID_PATH, cpu_no);
+ cache_num = get_resource_cache_level(resource);
+ if (cache_num < 0)
+ return cache_num;
+
+ sprintf(phys_pkg_path, "%s%d/cache/index%d/id", PHYS_ID_PATH, cpu_no, cache_num);
fp = fopen(phys_pkg_path, "r");
if (!fp) {
- perror("Failed to open physical_package_id");
+ ksft_perror("Failed to open cache id file");
return -1;
}
- if (fscanf(fp, "%d", resource_id) <= 0) {
- perror("Could not get socket number or l3 id");
+ if (fscanf(fp, "%d", domain_id) <= 0) {
+ ksft_perror("Could not get domain ID");
fclose(fp);
return -1;
@@ -138,31 +164,26 @@ int get_resource_id(int cpu_no, int *resource_id)
*
* Return: = 0 on success, < 0 on failure.
*/
-int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size)
+int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size)
{
char cache_path[1024], cache_str[64];
int length, i, cache_num;
FILE *fp;
- if (!strcmp(cache_type, "L3")) {
- cache_num = 3;
- } else if (!strcmp(cache_type, "L2")) {
- cache_num = 2;
- } else {
- perror("Invalid cache level");
- return -1;
- }
+ cache_num = get_cache_level(cache_type);
+ if (cache_num < 0)
+ return cache_num;
sprintf(cache_path, "/sys/bus/cpu/devices/cpu%d/cache/index%d/size",
cpu_no, cache_num);
fp = fopen(cache_path, "r");
if (!fp) {
- perror("Failed to open cache size");
+ ksft_perror("Failed to open cache size");
return -1;
}
if (fscanf(fp, "%s", cache_str) <= 0) {
- perror("Could not get cache_size");
+ ksft_perror("Could not get cache_size");
fclose(fp);
return -1;
@@ -196,30 +217,29 @@ int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size)
#define CORE_SIBLINGS_PATH "/sys/bus/cpu/devices/cpu"
/*
- * get_cbm_mask - Get cbm mask for given cache
- * @cache_type: Cache level L2/L3
- * @cbm_mask: cbm_mask returned as a string
+ * get_bit_mask - Get bit mask from given file
+ * @filename: File containing the mask
+ * @mask: The bit mask returned as unsigned long
*
* Return: = 0 on success, < 0 on failure.
*/
-int get_cbm_mask(char *cache_type, char *cbm_mask)
+static int get_bit_mask(const char *filename, unsigned long *mask)
{
- char cbm_mask_path[1024];
FILE *fp;
- if (!cbm_mask)
+ if (!filename || !mask)
return -1;
- sprintf(cbm_mask_path, "%s/%s/cbm_mask", INFO_PATH, cache_type);
-
- fp = fopen(cbm_mask_path, "r");
+ fp = fopen(filename, "r");
if (!fp) {
- perror("Failed to open cache level");
-
+ ksft_print_msg("Failed to open bit mask file '%s': %s\n",
+ filename, strerror(errno));
return -1;
}
- if (fscanf(fp, "%s", cbm_mask) <= 0) {
- perror("Could not get max cbm_mask");
+
+ if (fscanf(fp, "%lx", mask) <= 0) {
+ ksft_print_msg("Could not read bit mask file '%s': %s\n",
+ filename, strerror(errno));
fclose(fp);
return -1;
@@ -230,63 +250,182 @@ int get_cbm_mask(char *cache_type, char *cbm_mask)
}
/*
- * get_core_sibling - Get sibling core id from the same socket for given CPU
- * @cpu_no: CPU number
+ * resource_info_unsigned_get - Read an unsigned value from
+ * /sys/fs/resctrl/info/@resource/@filename
+ * @resource: Resource name that matches directory name in
+ * /sys/fs/resctrl/info
+ * @filename: File in /sys/fs/resctrl/info/@resource
+ * @val: Contains read value on success.
*
- * Return: > 0 on success, < 0 on failure.
+ * Return: = 0 on success, < 0 on failure. On success the read
+ * value is saved into @val.
*/
-int get_core_sibling(int cpu_no)
+int resource_info_unsigned_get(const char *resource, const char *filename,
+ unsigned int *val)
{
- char core_siblings_path[1024], cpu_list_str[64];
- int sibling_cpu_no = -1;
+ char file_path[PATH_MAX];
FILE *fp;
- sprintf(core_siblings_path, "%s%d/topology/core_siblings_list",
- CORE_SIBLINGS_PATH, cpu_no);
+ snprintf(file_path, sizeof(file_path), "%s/%s/%s", INFO_PATH, resource,
+ filename);
- fp = fopen(core_siblings_path, "r");
+ fp = fopen(file_path, "r");
if (!fp) {
- perror("Failed to open core siblings path");
-
+ ksft_print_msg("Error opening %s: %m\n", file_path);
return -1;
}
- if (fscanf(fp, "%s", cpu_list_str) <= 0) {
- perror("Could not get core_siblings list");
+
+ if (fscanf(fp, "%u", val) <= 0) {
+ ksft_print_msg("Could not get contents of %s: %m\n", file_path);
fclose(fp);
-
return -1;
}
+
fclose(fp);
+ return 0;
+}
- char *token = strtok(cpu_list_str, "-,");
+/*
+ * create_bit_mask- Create bit mask from start, len pair
+ * @start: LSB of the mask
+ * @len Number of bits in the mask
+ */
+unsigned long create_bit_mask(unsigned int start, unsigned int len)
+{
+ return ((1UL << len) - 1UL) << start;
+}
- while (token) {
- sibling_cpu_no = atoi(token);
- /* Skipping core 0 as we don't want to run test on core 0 */
- if (sibling_cpu_no != 0 && sibling_cpu_no != cpu_no)
- break;
- token = strtok(NULL, "-,");
+/*
+ * count_contiguous_bits - Returns the longest train of bits in a bit mask
+ * @val A bit mask
+ * @start The location of the least-significant bit of the longest train
+ *
+ * Return: The length of the contiguous bits in the longest train of bits
+ */
+unsigned int count_contiguous_bits(unsigned long val, unsigned int *start)
+{
+ unsigned long last_val;
+ unsigned int count = 0;
+
+ while (val) {
+ last_val = val;
+ val &= (val >> 1);
+ count++;
}
- return sibling_cpu_no;
+ if (start) {
+ if (count)
+ *start = ffsl(last_val) - 1;
+ else
+ *start = 0;
+ }
+
+ return count;
+}
+
+/*
+ * get_full_cbm - Get full Cache Bit Mask (CBM)
+ * @cache_type: Cache type as "L2" or "L3"
+ * @mask: Full cache bit mask representing the maximal portion of cache
+ * available for allocation, returned as unsigned long.
+ *
+ * Return: = 0 on success, < 0 on failure.
+ */
+int get_full_cbm(const char *cache_type, unsigned long *mask)
+{
+ char cbm_path[PATH_MAX];
+ int ret;
+
+ if (!cache_type)
+ return -1;
+
+ snprintf(cbm_path, sizeof(cbm_path), "%s/%s/cbm_mask",
+ INFO_PATH, cache_type);
+
+ ret = get_bit_mask(cbm_path, mask);
+ if (ret || !*mask)
+ return -1;
+
+ return 0;
+}
+
+/*
+ * get_shareable_mask - Get shareable mask from shareable_bits
+ * @cache_type: Cache type as "L2" or "L3"
+ * @shareable_mask: Shareable mask returned as unsigned long
+ *
+ * Return: = 0 on success, < 0 on failure.
+ */
+static int get_shareable_mask(const char *cache_type, unsigned long *shareable_mask)
+{
+ char mask_path[PATH_MAX];
+
+ if (!cache_type)
+ return -1;
+
+ snprintf(mask_path, sizeof(mask_path), "%s/%s/shareable_bits",
+ INFO_PATH, cache_type);
+
+ return get_bit_mask(mask_path, shareable_mask);
+}
+
+/*
+ * get_mask_no_shareable - Get Cache Bit Mask (CBM) without shareable bits
+ * @cache_type: Cache type as "L2" or "L3"
+ * @mask: The largest exclusive portion of the cache out of the
+ * full CBM, returned as unsigned long
+ *
+ * Parts of a cache may be shared with other devices such as GPU. This function
+ * calculates the largest exclusive portion of the cache where no other devices
+ * besides CPU have access to the cache portion.
+ *
+ * Return: = 0 on success, < 0 on failure.
+ */
+int get_mask_no_shareable(const char *cache_type, unsigned long *mask)
+{
+ unsigned long full_mask, shareable_mask;
+ unsigned int start, len;
+
+ if (get_full_cbm(cache_type, &full_mask) < 0)
+ return -1;
+ if (get_shareable_mask(cache_type, &shareable_mask) < 0)
+ return -1;
+
+ len = count_contiguous_bits(full_mask & ~shareable_mask, &start);
+ if (!len)
+ return -1;
+
+ *mask = create_bit_mask(start, len);
+
+ return 0;
}
/*
* taskset_benchmark - Taskset PID (i.e. benchmark) to a specified cpu
- * @bm_pid: PID that should be binded
- * @cpu_no: CPU number at which the PID would be binded
+ * @bm_pid: PID that should be binded
+ * @cpu_no: CPU number at which the PID would be binded
+ * @old_affinity: When not NULL, set to old CPU affinity
*
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
*/
-int taskset_benchmark(pid_t bm_pid, int cpu_no)
+int taskset_benchmark(pid_t bm_pid, int cpu_no, cpu_set_t *old_affinity)
{
cpu_set_t my_set;
+ if (old_affinity) {
+ CPU_ZERO(old_affinity);
+ if (sched_getaffinity(bm_pid, sizeof(*old_affinity),
+ old_affinity)) {
+ ksft_perror("Unable to read CPU affinity");
+ return -1;
+ }
+ }
+
CPU_ZERO(&my_set);
CPU_SET(cpu_no, &my_set);
if (sched_setaffinity(bm_pid, sizeof(cpu_set_t), &my_set)) {
- perror("Unable to taskset benchmark");
+ ksft_perror("Unable to taskset benchmark");
return -1;
}
@@ -295,12 +434,29 @@ int taskset_benchmark(pid_t bm_pid, int cpu_no)
}
/*
+ * taskset_restore - Taskset PID to the earlier CPU affinity
+ * @bm_pid: PID that should be reset
+ * @old_affinity: The old CPU affinity to restore
+ *
+ * Return: 0 on success, < 0 on error.
+ */
+int taskset_restore(pid_t bm_pid, cpu_set_t *old_affinity)
+{
+ if (sched_setaffinity(bm_pid, sizeof(*old_affinity), old_affinity)) {
+ ksft_perror("Unable to restore CPU affinity");
+ return -1;
+ }
+
+ return 0;
+}
+
+/*
* create_grp - Create a group only if one doesn't exist
* @grp_name: Name of the group
* @grp: Full path and name of the group
* @parent_grp: Full path and name of the parent group
*
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
*/
static int create_grp(const char *grp_name, char *grp, const char *parent_grp)
{
@@ -325,7 +481,7 @@ static int create_grp(const char *grp_name, char *grp, const char *parent_grp)
}
closedir(dp);
} else {
- perror("Unable to open resctrl for group");
+ ksft_perror("Unable to open resctrl for group");
return -1;
}
@@ -333,7 +489,7 @@ static int create_grp(const char *grp_name, char *grp, const char *parent_grp)
/* Requested grp doesn't exist, hence create it */
if (found_grp == 0) {
if (mkdir(grp, 0) == -1) {
- perror("Unable to create group");
+ ksft_perror("Unable to create group");
return -1;
}
@@ -348,12 +504,12 @@ static int write_pid_to_tasks(char *tasks, pid_t pid)
fp = fopen(tasks, "w");
if (!fp) {
- perror("Failed to open tasks file");
+ ksft_perror("Failed to open tasks file");
return -1;
}
if (fprintf(fp, "%d\n", pid) < 0) {
- perror("Failed to wr pid to tasks file");
+ ksft_print_msg("Failed to write pid to tasks file\n");
fclose(fp);
return -1;
@@ -376,7 +532,7 @@ static int write_pid_to_tasks(char *tasks, pid_t pid)
* pid is not written, this means that pid is in con_mon grp and hence
* should consult con_mon grp's mon_data directory for results.
*
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
*/
int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
char *resctrl_val)
@@ -420,7 +576,7 @@ int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
out:
ksft_print_msg("Writing benchmark parameters to resctrl FS\n");
if (ret)
- perror("# writing to resctrlfs");
+ ksft_print_msg("Failed writing to resctrlfs\n");
return ret;
}
@@ -430,23 +586,17 @@ int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
* @ctrlgrp: Name of the con_mon grp
* @schemata: Schemata that should be updated to
* @cpu_no: CPU number that the benchmark PID is binded to
- * @resctrl_val: Resctrl feature (Eg: mbm, mba.. etc)
+ * @resource: Resctrl resource (Eg: MB, L3, L2, etc.)
*
- * Update schemata of a con_mon grp *only* if requested resctrl feature is
+ * Update schemata of a con_mon grp *only* if requested resctrl resource is
* allocation type
*
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
*/
-int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, char *resctrl_val)
+int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, const char *resource)
{
char controlgroup[1024], reason[128], schema[1024] = {};
- int resource_id, fd, schema_len = -1, ret = 0;
-
- if (strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR)) &&
- strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) &&
- strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR)) &&
- strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR)))
- return -ENOENT;
+ int domain_id, fd, schema_len, ret = 0;
if (!schemata) {
ksft_print_msg("Skipping empty schemata update\n");
@@ -454,8 +604,8 @@ int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, char *resctrl_val)
return -1;
}
- if (get_resource_id(cpu_no, &resource_id) < 0) {
- sprintf(reason, "Failed to get resource id");
+ if (get_domain_id(resource, cpu_no, &domain_id) < 0) {
+ sprintf(reason, "Failed to get domain ID");
ret = -1;
goto out;
@@ -466,14 +616,8 @@ int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, char *resctrl_val)
else
sprintf(controlgroup, "%s/schemata", RESCTRL_PATH);
- if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR)) ||
- !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR)))
- schema_len = snprintf(schema, sizeof(schema), "%s%d%c%s\n",
- "L3:", resource_id, '=', schemata);
- if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR)) ||
- !strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)))
- schema_len = snprintf(schema, sizeof(schema), "%s%d%c%s\n",
- "MB:", resource_id, '=', schemata);
+ schema_len = snprintf(schema, sizeof(schema), "%s:%d=%s\n",
+ resource, domain_id, schemata);
if (schema_len < 0 || schema_len >= sizeof(schema)) {
snprintf(reason, sizeof(reason),
"snprintf() failed with return value : %d", schema_len);
@@ -564,20 +708,16 @@ char *fgrep(FILE *inf, const char *str)
}
/*
- * validate_resctrl_feature_request - Check if requested feature is valid.
- * @resource: Required resource (e.g., MB, L3, L2, L3_MON, etc.)
- * @feature: Required monitor feature (in mon_features file). Can only be
- * set for L3_MON. Must be NULL for all other resources.
+ * resctrl_resource_exists - Check if a resource is supported.
+ * @resource: Resctrl resource (e.g., MB, L3, L2, L3_MON, etc.)
*
- * Return: True if the resource/feature is supported, else false. False is
+ * Return: True if the resource is supported, else false. False is
* also returned if resctrl FS is not mounted.
*/
-bool validate_resctrl_feature_request(const char *resource, const char *feature)
+bool resctrl_resource_exists(const char *resource)
{
char res_path[PATH_MAX];
struct stat statbuf;
- char *res;
- FILE *inf;
int ret;
if (!resource)
@@ -592,8 +732,25 @@ bool validate_resctrl_feature_request(const char *resource, const char *feature)
if (stat(res_path, &statbuf))
return false;
- if (!feature)
- return true;
+ return true;
+}
+
+/*
+ * resctrl_mon_feature_exists - Check if requested monitoring feature is valid.
+ * @resource: Resource that uses the mon_features file. Currently only L3_MON
+ * is valid.
+ * @feature: Required monitor feature (in mon_features file).
+ *
+ * Return: True if the feature is supported, else false.
+ */
+bool resctrl_mon_feature_exists(const char *resource, const char *feature)
+{
+ char res_path[PATH_MAX];
+ char *res;
+ FILE *inf;
+
+ if (!feature || !resource)
+ return false;
snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource);
inf = fopen(res_path, "r");
@@ -607,6 +764,36 @@ bool validate_resctrl_feature_request(const char *resource, const char *feature)
return !!res;
}
+/*
+ * resource_info_file_exists - Check if a file is present inside
+ * /sys/fs/resctrl/info/@resource.
+ * @resource: Required resource (Eg: MB, L3, L2, etc.)
+ * @file: Required file.
+ *
+ * Return: True if the /sys/fs/resctrl/info/@resource/@file exists, else false.
+ */
+bool resource_info_file_exists(const char *resource, const char *file)
+{
+ char res_path[PATH_MAX];
+ struct stat statbuf;
+
+ if (!file || !resource)
+ return false;
+
+ snprintf(res_path, sizeof(res_path), "%s/%s/%s", INFO_PATH, resource,
+ file);
+
+ if (stat(res_path, &statbuf))
+ return false;
+
+ return true;
+}
+
+bool test_resource_feature_check(const struct resctrl_test *test)
+{
+ return resctrl_resource_exists(test->resource);
+}
+
int filter_dmesg(void)
{
char line[1024];
@@ -617,7 +804,7 @@ int filter_dmesg(void)
ret = pipe(pipefds);
if (ret) {
- perror("pipe");
+ ksft_perror("pipe");
return ret;
}
fflush(stdout);
@@ -626,13 +813,13 @@ int filter_dmesg(void)
close(pipefds[0]);
dup2(pipefds[1], STDOUT_FILENO);
execlp("dmesg", "dmesg", NULL);
- perror("executing dmesg");
+ ksft_perror("Executing dmesg");
exit(1);
}
close(pipefds[1]);
fp = fdopen(pipefds[0], "r");
if (!fp) {
- perror("fdopen(pipe)");
+ ksft_perror("fdopen(pipe)");
kill(pid, SIGTERM);
return -1;
diff --git a/tools/testing/selftests/rust/Makefile b/tools/testing/selftests/rust/Makefile
new file mode 100644
index 0000000..fce1584
--- /dev/null
+++ b/tools/testing/selftests/rust/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_PROGS += test_probe_samples.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/rust/config b/tools/testing/selftests/rust/config
new file mode 100644
index 0000000..b4002ac
--- /dev/null
+++ b/tools/testing/selftests/rust/config
@@ -0,0 +1,5 @@
+CONFIG_RUST=y
+CONFIG_SAMPLES=y
+CONFIG_SAMPLES_RUST=y
+CONFIG_SAMPLE_RUST_MINIMAL=m
+CONFIG_SAMPLE_RUST_PRINT=m
\ No newline at end of file
diff --git a/tools/testing/selftests/rust/test_probe_samples.sh b/tools/testing/selftests/rust/test_probe_samples.sh
new file mode 100755
index 0000000..ad0397e
--- /dev/null
+++ b/tools/testing/selftests/rust/test_probe_samples.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2023 Collabora Ltd
+#
+# This script tests whether the rust sample modules can
+# be added and removed correctly.
+#
+DIR="$(dirname "$(readlink -f "$0")")"
+
+KTAP_HELPERS="${DIR}/../kselftest/ktap_helpers.sh"
+if [ -e "$KTAP_HELPERS" ]; then
+ source "$KTAP_HELPERS"
+else
+ echo "$KTAP_HELPERS file not found [SKIP]"
+ exit 4
+fi
+
+rust_sample_modules=("rust_minimal" "rust_print")
+
+ktap_print_header
+
+for sample in "${rust_sample_modules[@]}"; do
+ if ! /sbin/modprobe -n -q "$sample"; then
+ ktap_skip_all "module $sample is not found in /lib/modules/$(uname -r)"
+ exit "$KSFT_SKIP"
+ fi
+done
+
+ktap_set_plan "${#rust_sample_modules[@]}"
+
+for sample in "${rust_sample_modules[@]}"; do
+ if /sbin/modprobe -q "$sample"; then
+ /sbin/modprobe -q -r "$sample"
+ ktap_test_pass "$sample"
+ else
+ ktap_test_fail "$sample"
+ fi
+done
+
+ktap_finished
diff --git a/tools/testing/selftests/sched/cs_prctl_test.c b/tools/testing/selftests/sched/cs_prctl_test.c
index 7ba05715..62fba73 100644
--- a/tools/testing/selftests/sched/cs_prctl_test.c
+++ b/tools/testing/selftests/sched/cs_prctl_test.c
@@ -276,7 +276,7 @@ int main(int argc, char *argv[])
if (setpgid(0, 0) != 0)
handle_error("process group");
- printf("\n## Create a thread/process/process group hiearchy\n");
+ printf("\n## Create a thread/process/process group hierarchy\n");
create_processes(num_processes, num_threads, procs);
need_cleanup = 1;
disp_processes(num_processes, procs);
diff --git a/tools/testing/selftests/thermal/intel/power_floor/.gitignore b/tools/testing/selftests/thermal/intel/power_floor/.gitignore
new file mode 100644
index 0000000..1b9a764
--- /dev/null
+++ b/tools/testing/selftests/thermal/intel/power_floor/.gitignore
@@ -0,0 +1 @@
+power_floor_test
diff --git a/tools/testing/selftests/thermal/intel/workload_hint/.gitignore b/tools/testing/selftests/thermal/intel/workload_hint/.gitignore
new file mode 100644
index 0000000..d697b03
--- /dev/null
+++ b/tools/testing/selftests/thermal/intel/workload_hint/.gitignore
@@ -0,0 +1 @@
+workload_hint_test
diff --git a/tools/testing/selftests/uevent/.gitignore b/tools/testing/selftests/uevent/.gitignore
new file mode 100644
index 0000000..382afb7
--- /dev/null
+++ b/tools/testing/selftests/uevent/.gitignore
@@ -0,0 +1 @@
+uevent_filtering
diff --git a/tools/tracing/rtla/Makefile b/tools/tracing/rtla/Makefile
index 2456a39..afd18c6 100644
--- a/tools/tracing/rtla/Makefile
+++ b/tools/tracing/rtla/Makefile
@@ -28,10 +28,15 @@
-fasynchronous-unwind-tables -fstack-clash-protection
WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
+ifeq ($(CC),clang)
+ FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
+ WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+endif
+
TRACEFS_HEADERS := $$($(PKG_CONFIG) --cflags libtracefs)
CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS)
-LDFLAGS := -ggdb $(EXTRA_LDFLAGS)
+LDFLAGS := -flto=auto -ggdb $(EXTRA_LDFLAGS)
LIBS := $$($(PKG_CONFIG) --libs libtracefs)
SRC := $(wildcard src/*.c)
diff --git a/tools/tracing/rtla/src/osnoise_hist.c b/tools/tracing/rtla/src/osnoise_hist.c
index 8f81fa0..01870d5 100644
--- a/tools/tracing/rtla/src/osnoise_hist.c
+++ b/tools/tracing/rtla/src/osnoise_hist.c
@@ -135,8 +135,7 @@ static void osnoise_hist_update_multiple(struct osnoise_tool *tool, int cpu,
if (params->output_divisor)
duration = duration / params->output_divisor;
- if (data->bucket_size)
- bucket = duration / data->bucket_size;
+ bucket = duration / data->bucket_size;
total_duration = duration * count;
@@ -480,7 +479,11 @@ static void osnoise_hist_usage(char *usage)
for (i = 0; msg[i]; i++)
fprintf(stderr, "%s\n", msg[i]);
- exit(1);
+
+ if (usage)
+ exit(EXIT_FAILURE);
+
+ exit(EXIT_SUCCESS);
}
/*
diff --git a/tools/tracing/rtla/src/osnoise_top.c b/tools/tracing/rtla/src/osnoise_top.c
index f7c959b..457360d 100644
--- a/tools/tracing/rtla/src/osnoise_top.c
+++ b/tools/tracing/rtla/src/osnoise_top.c
@@ -331,7 +331,11 @@ static void osnoise_top_usage(struct osnoise_top_params *params, char *usage)
for (i = 0; msg[i]; i++)
fprintf(stderr, "%s\n", msg[i]);
- exit(1);
+
+ if (usage)
+ exit(EXIT_FAILURE);
+
+ exit(EXIT_SUCCESS);
}
/*
diff --git a/tools/tracing/rtla/src/timerlat_hist.c b/tools/tracing/rtla/src/timerlat_hist.c
index 47d3d8b..dbf1540 100644
--- a/tools/tracing/rtla/src/timerlat_hist.c
+++ b/tools/tracing/rtla/src/timerlat_hist.c
@@ -178,8 +178,7 @@ timerlat_hist_update(struct osnoise_tool *tool, int cpu,
if (params->output_divisor)
latency = latency / params->output_divisor;
- if (data->bucket_size)
- bucket = latency / data->bucket_size;
+ bucket = latency / data->bucket_size;
if (!context) {
hist = data->hist[cpu].irq;
@@ -546,7 +545,11 @@ static void timerlat_hist_usage(char *usage)
for (i = 0; msg[i]; i++)
fprintf(stderr, "%s\n", msg[i]);
- exit(1);
+
+ if (usage)
+ exit(EXIT_FAILURE);
+
+ exit(EXIT_SUCCESS);
}
/*
diff --git a/tools/tracing/rtla/src/timerlat_top.c b/tools/tracing/rtla/src/timerlat_top.c
index 1640f12..3e9af2c 100644
--- a/tools/tracing/rtla/src/timerlat_top.c
+++ b/tools/tracing/rtla/src/timerlat_top.c
@@ -375,7 +375,11 @@ static void timerlat_top_usage(char *usage)
for (i = 0; msg[i]; i++)
fprintf(stderr, "%s\n", msg[i]);
- exit(1);
+
+ if (usage)
+ exit(EXIT_FAILURE);
+
+ exit(EXIT_SUCCESS);
}
/*
diff --git a/tools/tracing/rtla/src/utils.c b/tools/tracing/rtla/src/utils.c
index c769d7b..9ac71a6 100644
--- a/tools/tracing/rtla/src/utils.c
+++ b/tools/tracing/rtla/src/utils.c
@@ -238,12 +238,6 @@ static inline int sched_setattr(pid_t pid, const struct sched_attr *attr,
return syscall(__NR_sched_setattr, pid, attr, flags);
}
-static inline int sched_getattr(pid_t pid, struct sched_attr *attr,
- unsigned int size, unsigned int flags)
-{
- return syscall(__NR_sched_getattr, pid, attr, size, flags);
-}
-
int __set_sched_attr(int pid, struct sched_attr *attr)
{
int flags = 0;
@@ -479,13 +473,13 @@ int parse_prio(char *arg, struct sched_attr *sched_param)
if (prio == INVALID_VAL)
return -1;
- if (prio < sched_get_priority_min(SCHED_OTHER))
+ if (prio < MIN_NICE)
return -1;
- if (prio > sched_get_priority_max(SCHED_OTHER))
+ if (prio > MAX_NICE)
return -1;
sched_param->sched_policy = SCHED_OTHER;
- sched_param->sched_priority = prio;
+ sched_param->sched_nice = prio;
break;
default:
return -1;
@@ -536,7 +530,7 @@ int set_cpu_dma_latency(int32_t latency)
*/
static const int find_mount(const char *fs, char *mp, int sizeof_mp)
{
- char mount_point[MAX_PATH];
+ char mount_point[MAX_PATH+1];
char type[100];
int found = 0;
FILE *fp;
diff --git a/tools/tracing/rtla/src/utils.h b/tools/tracing/rtla/src/utils.h
index 04ed1e6..d44513e 100644
--- a/tools/tracing/rtla/src/utils.h
+++ b/tools/tracing/rtla/src/utils.h
@@ -9,6 +9,8 @@
*/
#define BUFF_U64_STR_SIZE 24
#define MAX_PATH 1024
+#define MAX_NICE 20
+#define MIN_NICE -19
#define container_of(ptr, type, member)({ \
const typeof(((type *)0)->member) *__mptr = (ptr); \
diff --git a/tools/verification/rv/Makefile b/tools/verification/rv/Makefile
index 3d0f388..485f8ae 100644
--- a/tools/verification/rv/Makefile
+++ b/tools/verification/rv/Makefile
@@ -28,10 +28,15 @@
-fasynchronous-unwind-tables -fstack-clash-protection
WOPTS := -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
+ifeq ($(CC),clang)
+ FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
+ WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+endif
+
TRACEFS_HEADERS := $$($(PKG_CONFIG) --cflags libtracefs)
CFLAGS := -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include
-LDFLAGS := -ggdb $(EXTRA_LDFLAGS)
+LDFLAGS := -flto=auto -ggdb $(EXTRA_LDFLAGS)
LIBS := $$($(PKG_CONFIG) --libs libtracefs)
SRC := $(wildcard src/*.c)
diff --git a/tools/verification/rv/src/in_kernel.c b/tools/verification/rv/src/in_kernel.c
index ad28582..f04479e 100644
--- a/tools/verification/rv/src/in_kernel.c
+++ b/tools/verification/rv/src/in_kernel.c
@@ -210,9 +210,9 @@ static char *ikm_read_reactor(char *monitor_name)
static char *ikm_get_current_reactor(char *monitor_name)
{
char *reactors = ikm_read_reactor(monitor_name);
+ char *curr_reactor = NULL;
char *start;
char *end;
- char *curr_reactor;
if (!reactors)
return NULL;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 10bfc88..0f50960 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1615,7 +1615,13 @@ static int check_memory_region_flags(struct kvm *kvm,
valid_flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
#ifdef __KVM_HAVE_READONLY_MEM
- valid_flags |= KVM_MEM_READONLY;
+ /*
+ * GUEST_MEMFD is incompatible with read-only memslots, as writes to
+ * read-only memslots have emulated MMIO, not page fault, semantics,
+ * and KVM doesn't allow emulated MMIO for private memory.
+ */
+ if (!(mem->flags & KVM_MEM_GUEST_MEMFD))
+ valid_flags |= KVM_MEM_READONLY;
#endif
if (mem->flags & ~valid_flags)