path: root/net/sunrpc
Age | Commit message | Author | Lines
2026-02-27 | xprtrdma: Decrement re_receiving on the early exit paths | Eric Badger | -3/+4
In the event that rpcrdma_post_recvs() fails to create a work request (due to memory allocation failure, say) or otherwise exits early, we should decrement ep->re_receiving before returning. Otherwise we will hang in rpcrdma_xprt_drain() as re_receiving will never reach zero and the completion will never be triggered.

On a system with high memory pressure, this can appear as the following hung task:

    INFO: task kworker/u385:17:8393 blocked for more than 122 seconds.
          Tainted: G S E 6.19.0 #3
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:kworker/u385:17 state:D stack:0 pid:8393 tgid:8393 ppid:2 task_flags:0x4248060 flags:0x00080000
    Workqueue: xprtiod xprt_autoclose [sunrpc]
    Call Trace:
     <TASK>
     __schedule+0x48b/0x18b0
     ? ib_post_send_mad+0x247/0xae0 [ib_core]
     schedule+0x27/0xf0
     schedule_timeout+0x104/0x110
     __wait_for_common+0x98/0x180
     ? __pfx_schedule_timeout+0x10/0x10
     wait_for_completion+0x24/0x40
     rpcrdma_xprt_disconnect+0x444/0x460 [rpcrdma]
     xprt_rdma_close+0x12/0x40 [rpcrdma]
     xprt_autoclose+0x5f/0x120 [sunrpc]
     process_one_work+0x191/0x3e0
     worker_thread+0x2e3/0x420
     ? __pfx_worker_thread+0x10/0x10
     kthread+0x10d/0x230
     ? __pfx_kthread+0x10/0x10
     ret_from_fork+0x273/0x2b0
     ? __pfx_kthread+0x10/0x10
     ret_from_fork_asm+0x1a/0x30

Fixes: 15788d1d1077 ("xprtrdma: Do not refresh Receive Queue while it is draining")
Signed-off-by: Eric Badger <ebadger@purestorage.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
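The pattern behind this fix can be sketched in user-space C. This is only an illustration, not the xprtrdma code: `struct sketch_ep`, `sketch_post_recvs()` and `sketch_can_drain()` are hypothetical names, and a plain integer stands in for the counter. The point is that every exit path leaves through one label that drops the in-flight count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the endpoint; only the counter matters here. */
struct sketch_ep {
	int receiving;	/* callers currently inside sketch_post_recvs() */
	int posted;	/* receives successfully posted so far */
};

/*
 * Post `needed` receives. Every exit path, including the early ones,
 * leaves through `out`, so the in-flight counter is always dropped.
 */
static int sketch_post_recvs(struct sketch_ep *ep, int needed, bool alloc_fails)
{
	int rc = 0;

	ep->receiving++;

	if (needed <= 0)
		goto out;		/* nothing to do: early exit */

	if (alloc_fails) {
		rc = -1;		/* simulated allocation failure */
		goto out;
	}

	ep->posted += needed;
out:
	ep->receiving--;
	return rc;
}

/* A drain can only finish once no poster is in flight. */
static bool sketch_can_drain(const struct sketch_ep *ep)
{
	return ep->receiving == 0;
}
```

If either early exit returned directly instead of jumping to `out`, `receiving` would stay above zero and a waiter on `sketch_can_drain()` would never make progress, which is the shape of the hang described above.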
2026-02-21 | Convert more 'alloc_obj' cases to default GFP_KERNEL arguments | Linus Torvalds | -6/+3
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that for a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | Convert 'alloc_flex' family to use the new default GFP_KERNEL argument | Linus Torvalds | -2/+2
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | Convert 'alloc_obj' family to use the new default GFP_KERNEL argument | Linus Torvalds | -31/+31
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 | treewide: Replace kmalloc with kmalloc_obj for non-scalar types | Kees Cook | -57/+56
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances:

Single allocations:

    kmalloc(sizeof(TYPE), ...)

are replaced with:

    kmalloc_obj(TYPE, ...)

Array allocations:

    kmalloc_array(COUNT, sizeof(TYPE), ...)

are replaced with:

    kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:

    kmalloc(struct_size(PTR, FAM, COUNT), ...)

are replaced with:

    kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning "TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
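To show why a typed return value helps, here is a rough user-space analogue of the three forms. The `sketch_alloc_*` macros are hypothetical and are not the kernel definitions: `calloc()` stands in for the kernel allocator, there is no GFP argument, and the flex variant does no overflow checking. `__typeof__` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space analogues; TYPE may be a type name or *VAR. */
#define sketch_alloc_obj(TYPE) \
	((__typeof__(TYPE) *)calloc(1, sizeof(TYPE)))

#define sketch_alloc_objs(TYPE, COUNT) \
	((__typeof__(TYPE) *)calloc((COUNT), sizeof(TYPE)))

/* No overflow checking here; a production helper would need it. */
#define sketch_alloc_flex(TYPE, FAM, COUNT) \
	((__typeof__(TYPE) *)calloc(1, sizeof(TYPE) + \
		(COUNT) * sizeof(((__typeof__(TYPE) *)0)->FAM[0])))

struct sketch_msg {
	int len;
	char data[];	/* flexible array member */
};

struct sketch_pair {
	int a, b;
};
```

Because each macro yields a `TYPE *` rather than `void *`, assigning the result to a pointer of the wrong type draws a compiler diagnostic instead of passing silently.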
2026-02-12 | Merge tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs | Linus Torvalds | -2/+4
Pull NFS client updates from Anna Schumaker:

"New Features:
 - Use an LRU list for returning unused delegations
 - Introduce a KConfig option to disable NFS v4.0 and make NFS v4.1 the default

Bugfixes:
 - NFS/localio:
    - Handle short writes by retrying
    - Prevent direct reclaim recursion into NFS via nfs_writepages
    - Use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit
    - Remove -EAGAIN handling in nfs_local_doio()
 - pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN
 - fs/nfs: Fix a readdir slow-start regression
 - SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path

Other cleanups and improvements:
 - A few other NFS/localio cleanups
 - Various other delegation handling cleanups from Christoph
 - Unify security_inode_listsecurity() calls
 - Improvements to NFSv4 lease handling
 - Clean up SUNRPC *_debug fields when CONFIG_SUNRPC_DEBUG is not set"

* tag 'nfs-for-7.0-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (60 commits)
  SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path
  nfs: nfs4proc: Convert comma to semicolon
  SUNRPC: Change list definition method
  sunrpc: rpc_debug and others are defined even if CONFIG_SUNRPC_DEBUG unset
  NFSv4: limit lease period in nfs4_set_lease_period()
  NFSv4: pass lease period in seconds to nfs4_set_lease_period()
  nfs: unify security_inode_listsecurity() calls
  fs/nfs: Fix readdir slow-start regression
  pNFS: fix a missing wake up while waiting on NFS_LAYOUT_DRAIN
  NFS: fix delayed delegation return handling
  NFS: simplify error handling in nfs_end_delegation_return
  NFS: fold nfs_abort_delegation_return into nfs_end_delegation_return
  NFS: remove the delegation == NULL check in nfs_end_delegation_return
  NFS: use bool for the issync argument to nfs_end_delegation_return
  NFS: return void from ->return_delegation
  NFS: return void from nfs4_inode_make_writeable
  NFS: Merge CONFIG_NFS_V4_1 with CONFIG_NFS_V4
  NFS: Add a way to disable NFS v4.0 via KConfig
  NFS: Move sequence slot operations into minorversion operations
  NFS: Pass a struct nfs_client to nfs4_init_sequence()
  ...
2026-02-12 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma | Linus Torvalds | -71/+92
Pull rdma updates from Jason Gunthorpe:

"Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now.

 - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe
 - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma
 - Robustness improvements in completion processing for EFA
 - New query_port_speed() verb to move past limited IBA defined speed steps
 - Support for SG_GAPS in rts and many other small improvements
 - Rare list corruption fix in iwcm
 - Better support different page sizes in rxe
 - Device memory support for mana
 - Direct bio vec to kernel MR for use by NFS-RDMA
 - QP rate limiting for bnxt_re
 - Remote triggerable NULL pointer crash in siw
 - DMA-buf exporter support for RDMA mmaps like doorbells"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits)
  RDMA/mlx5: Implement DMABUF export ops
  RDMA/uverbs: Add DMABUF object type and operations
  RDMA/uverbs: Support external FD uobjects
  RDMA/siw: Fix potential NULL pointer dereference in header processing
  RDMA/umad: Reject negative data_len in ib_umad_write
  IB/core: Extend rate limit support for RC QPs
  RDMA/mlx5: Support rate limit only for Raw Packet QP
  RDMA/bnxt_re: Report QP rate limit in debugfs
  RDMA/bnxt_re: Report packet pacing capabilities when querying device
  RDMA/bnxt_re: Add support for QP rate limiting
  MAINTAINERS: Drop RDMA files from Hyper-V section
  RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc
  svcrdma: use bvec-based RDMA read/write API
  RDMA/core: add rdma_rw_max_sge() helper for SQ sizing
  RDMA/core: add MR support for bvec-based RDMA operations
  RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations
  RDMA/core: add bio_vec based RDMA read/write API
  RDMA/irdma: Use kvzalloc for paged memory DMA address array
  RDMA/rxe: Fix race condition in QP timer handlers
  RDMA/mana_ib: Add device‑memory support
  ...
2026-02-12 | Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds | -0/+1
Pull non-MM updates from Andrew Morton:

 - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao)
 - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar)
 - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder)
 - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek)
 - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli)
 - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport)
 - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain)
 - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav)
 - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov)
 - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov)
 - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin)

* tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits)
  watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency
  procfs: fix missing RCU protection when reading real_parent in do_task_stat()
  watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()
  kcsan, compiler_types: avoid duplicate type issues in BPF Type Format
  kho: fix doc for kho_restore_pages()
  tests/liveupdate: add in-kernel liveupdate test
  liveupdate: luo_flb: introduce File-Lifecycle-Bound global state
  liveupdate: luo_file: Use private list
  list: add kunit test for private list primitives
  list: add primitives for private list manipulations
  delayacct: fix uapi timespec64 definition
  panic: add panic_force_cpu= parameter to redirect panic to a specific CPU
  netclassid: use thread_group_leader(p) in update_classid_task()
  RDMA/umem: don't abuse current->group_leader
  drm/pan*: don't abuse current->group_leader
  drm/amd: kill the outdated "Only the pthreads threading model is supported" checks
  drm/amdgpu: don't abuse current->group_leader
  android/binder: use same_thread_group(proc->tsk, current) in binder_mmap()
  android/binder: don't abuse current->group_leader
  kho: skip memoryless NUMA nodes when reserving scratch areas
  ...
2026-02-09 | SUNRPC: fix gss_auth kref leak in gss_alloc_msg error path | Daniel Hodges | -0/+3
Commit 5940d1cf9f42 ("SUNRPC: Rebalance a kref in auth_gss.c") added a kref_get(&gss_auth->kref) call to balance the gss_put_auth() done in gss_release_msg(), but forgot to add a corresponding kref_put() on the error path when kstrdup_const() fails. If service_name is non-NULL and kstrdup_const() fails, the function jumps to err_put_pipe_version which calls put_pipe_version() and kfree(gss_msg), but never releases the gss_auth reference. This leads to a kref leak where the gss_auth structure is never freed. Add a forward declaration for gss_free_callback() and call kref_put() in the err_put_pipe_version error path to properly release the reference taken earlier. Fixes: 5940d1cf9f42 ("SUNRPC: Rebalance a kref in auth_gss.c") Cc: stable@vger.kernel.org Signed-off-by: Daniel Hodges <git@danielhodges.dev> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2026-02-09 | SUNRPC: Change list definition method | Chenguang Zhao | -2/+1
The LIST_HEAD macro can both define a linked list and initialize it in one step. To simplify code, we replace the separate operations of linked list definition and manual initialization with the LIST_HEAD macro. Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
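For readers unfamiliar with the macro, the following user-space sketch mirrors the shape of the kernel's `LIST_HEAD()` and `LIST_HEAD_INIT()` and contrasts it with the two-step form it replaces. The `sketch_*` helper names are made up for the illustration.

```c
#include <assert.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head whose next and prev point back at itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

/* The separate initialization step that the one-line LIST_HEAD() replaces. */
static void sketch_init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static int sketch_list_empty(const struct list_head *head)
{
	return head->next == head;
}
```

A caller can then write `LIST_HEAD(pending);` in place of a declaration followed by a separate initialization call, and both forms produce the same empty list.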
2026-01-28 | sunrpc: allow svc_recv() to return -ETIMEDOUT and -EBUSY | Jeff Layton | -9/+42
To dynamically adjust the thread count, nfsd requires some information about how busy things are. Change svc_recv() to take a timeout value, and then allow the wait for work to time out if it's set. If a timeout is not defined, then the schedule will be set to MAX_SCHEDULE_TIMEOUT. If the task waits for the full timeout, then have it return -ETIMEDOUT to the caller. If it wakes up, finds that there is more work and that no threads are available, then attempt to set SP_TASK_STARTING. If it wasn't already set, have the task return -EBUSY to cue to the caller that the service could use more threads. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
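The return-value rules described here can be written down as a small decision function. This is a sketch of the described behaviour only, not the svc_recv() implementation: `struct sketch_pool` and `sketch_recv_result()` are hypothetical, and a plain `bool` stands in for the SP_TASK_STARTING bit, which real code would have to test and set atomically.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical pool state; task_starting stands in for SP_TASK_STARTING. */
struct sketch_pool {
	bool task_starting;
};

/*
 * Decide what a woken (or timed-out) thread reports to its caller.
 * Returns 0 to carry on, -ETIMEDOUT after a full idle wait, and -EBUSY
 * when there is more work, no idle thread, and this thread set the flag.
 */
static int sketch_recv_result(struct sketch_pool *pool, bool waited_full_timeout,
			      bool more_work, bool idle_thread_available)
{
	if (waited_full_timeout)
		return -ETIMEDOUT;

	if (more_work && !idle_thread_available) {
		bool was_set = pool->task_starting;

		/* Not atomic: a real version needs an atomic test-and-set. */
		pool->task_starting = true;
		if (!was_set)
			return -EBUSY;
	}
	return 0;
}
```

Only the first thread to notice the backlog reports -EBUSY; later ones see the flag already set and return 0, so the caller is cued once per episode rather than once per thread.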
2026-01-28 | sunrpc: split new thread creation into a separate function | Jeff Layton | -29/+46
Break out the part of svc_start_kthreads() that creates a thread into svc_new_thread(), as a new exported helper function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 | sunrpc: introduce the concept of a minimum number of threads per pool | Jeff Layton | -8/+37
Add a new pool->sp_nrthrmin field to track the minimum number of threads in a pool. Add min_threads parameters to both svc_set_num_threads() and svc_set_pool_threads(). If min_threads is non-zero and less than the max, svc_set_num_threads() will ensure that the number of running threads is between the min and the max. If the min is 0 or greater than the max, then it is ignored, and the maximum number of threads will be started, and never spun down. For now, the min_threads is always 0, but a later patch will pass the proper value through from nfsd. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
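The min/max rule stated above reduces to a clamp with one special case. The helper below is a hypothetical sketch of that rule, not code from svc_set_num_threads(); `sketch_target_threads()` and its parameters are invented for the example.

```c
#include <assert.h>

/*
 * Return how many threads should be running, given the current count and
 * the configured bounds. A min of 0, or a min above max, disables the
 * dynamic range and pins the pool at max.
 */
static unsigned int sketch_target_threads(unsigned int running,
					  unsigned int min_threads,
					  unsigned int max_threads)
{
	if (min_threads == 0 || min_threads > max_threads)
		return max_threads;
	if (running < min_threads)
		return min_threads;
	if (running > max_threads)
		return max_threads;
	return running;
}
```

With `min_threads` fixed at 0, as the commit says it is for now, the function always returns `max_threads`, which matches the pre-existing static behaviour.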
2026-01-28 | sunrpc: track the max number of requested threads in a pool | Jeff Layton | -0/+1
The kernel currently tracks the number of threads running in a pool in the "sp_nrthreads" field. In the future, where threads are dynamically spun up and down, it'll be necessary to keep track of the maximum number of requested threads separately from the actual number running. Add a pool->sp_nrthrmax parameter to track this. When userland changes the number of threads in a pool, update that value accordingly. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 | sunrpc: remove special handling of NULL pool from svc_start/stop_kthreads() | Jeff Layton | -46/+7
Now that svc_set_num_threads() handles distributing the threads among the available pools, remove the special handling of a NULL pool pointer from svc_start_kthreads() and svc_stop_kthreads(). Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-28 | sunrpc: split svc_set_num_threads() into two functions | Jeff Layton | -15/+53
svc_set_num_threads() will set the number of running threads for a given pool. If the pool argument is set to NULL however, it will distribute the threads among all of the pools evenly. These divergent codepaths complicate the move to dynamic threading. Simplify the API by splitting these two cases into different helpers: Add a new svc_set_pool_threads() function that sets the number of threads in a single, given pool. Modify svc_set_num_threads() to distribute the threads evenly between all of the pools and then call svc_set_pool_threads() for each. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
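One way to picture the even distribution is the sketch below. It is an assumption-laden illustration: the commit message does not say how a remainder is handled, so giving leftover threads to the lowest-numbered pools is a choice made for this example, and `sketch_distribute_threads()` is a hypothetical name.

```c
#include <assert.h>

/*
 * Share nrservs threads across npools pools; pool_threads[] receives the
 * per-pool count. Assumption: leftover threads go to the lowest-numbered
 * pools.
 */
static void sketch_distribute_threads(unsigned int nrservs, unsigned int npools,
				      unsigned int *pool_threads)
{
	unsigned int base, extra, i;

	assert(npools > 0);
	base = nrservs / npools;
	extra = nrservs % npools;

	for (i = 0; i < npools; i++)
		pool_threads[i] = base + (i < extra ? 1 : 0);
}
```

Each entry of `pool_threads[]` would then be handed to the single-pool setter, which is the split the commit describes: one function decides the distribution, the other applies a count to one pool.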
2026-01-28 | svcrdma: use bvec-based RDMA read/write API | Chuck Lever | -69/+86
Convert svcrdma to the bvec-based RDMA API introduced earlier in this series. The bvec-based RDMA API eliminates the intermediate scatterlist conversion step, allowing direct DMA mapping from bio_vec arrays. This simplifies the svc_rdma_rw_ctxt structure by removing the chained SG table management. The structure retains an inline array approach similar to the previous scatterlist implementation: an inline bvec array sized to max_send_sge handles most I/O operations without additional allocation. Larger requests fall back to dynamic allocation. This preserves the allocation-free fast path for typical NFS operations while supporting arbitrarily large transfers. The bvec API handles all device types internally, including iWARP devices which require memory registration. No explicit fallback path is needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260128005400.25147-6-cel@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-28 | RDMA/core: add rdma_rw_max_sge() helper for SQ sizing | Chuck Lever | -2/+6
svc_rdma_accept() computes sc_sq_depth as the sum of rq_depth and the number of rdma_rw contexts (ctxts). This value is used to allocate the Send CQ and to initialize the sc_sq_avail credit pool. However, when the device uses memory registration for RDMA operations, rdma_rw_init_qp() inflates the QP's max_send_wr by a factor of three per context to account for REG and INV work requests. The Send CQ and credit pool remain sized for only one work request per context, causing Send Queue exhaustion under heavy NFS WRITE workloads. Introduce rdma_rw_max_sge() to compute the actual number of Send Queue entries required for a given number of rdma_rw contexts. Upper layer protocols call this helper before creating a Queue Pair so that their Send CQs and credit accounting match the QP's true capacity. Update svc_rdma_accept() to use rdma_rw_max_sge() when computing sc_sq_depth, ensuring the credit pool reflects the work requests that rdma_rw_init_qp() will reserve. Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 00bd1439f464 ("RDMA/rw: Support threshold for registration vs scattering to local pages") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260128005400.25147-5-cel@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
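The sizing mismatch is easy to see with numbers. The functions below are hypothetical and do not reproduce the signature or internals of rdma_rw_max_sge(); they only encode the arithmetic stated in the message, one work request per context normally and three per context when memory registration is in use.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Send Queue entries needed for `ctxts` contexts: one work request each,
 * or three (REG + RDMA + INV) when the device uses memory registration.
 */
static unsigned int sketch_rw_sq_entries(unsigned int ctxts, bool uses_mr)
{
	return ctxts * (uses_mr ? 3u : 1u);
}

/* Total Send Queue depth: receive depth plus the rdma_rw share. */
static unsigned int sketch_sq_depth(unsigned int rq_depth, unsigned int ctxts,
				    bool uses_mr)
{
	return rq_depth + sketch_rw_sq_entries(ctxts, uses_mr);
}
```

For example, with an `rq_depth` of 128 and 32 contexts, the old sizing gives 160 entries, while a device using memory registration needs 224. A credit pool sized at 160 would run out under load in that case.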
2026-01-26 | SUNRPC: auth_gss: fix memory leaks in XDR decoding error paths | Chuck Lever | -18/+64
The gssx_dec_ctx(), gssx_dec_status(), and gssx_dec_name() functions allocate memory via gssx_dec_buffer(), which calls kmemdup(). When a subsequent decode operation fails, these functions return immediately without freeing previously allocated buffers, causing memory leaks.

The leak in gssx_dec_ctx() is particularly relevant because the caller (gssp_accept_sec_context_upcall) initializes several buffer length fields to non-zero values, resulting in memory allocation:

    struct gssx_ctx rctxh = {
        .exported_context_token.len = GSSX_max_output_handle_sz,
        .mech.len = GSS_OID_MAX_LEN,
        .src_name.display_name.len = GSSX_max_princ_sz,
        .targ_name.display_name.len = GSSX_max_princ_sz
    };

If, for example, gssx_dec_name() succeeds for src_name but fails for targ_name, the memory allocated for exported_context_token, mech, and src_name.display_name remains unreferenced and cannot be reclaimed.

Add error handling with goto-based cleanup to free any previously allocated buffers before returning an error.

Reported-by: Xingjing Deng <micro6947@gmail.com>
Closes: https://lore.kernel.org/linux-nfs/CAK+ZN9qttsFDu6h1FoqGadXjMx1QXqPMoYQ=6O9RY4SxVTvKng@mail.gmail.com/
Fixes: 1d658336b05f ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
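The goto-based cleanup mentioned here follows a common C idiom, sketched below in user space. None of these names come from the gss_rpc_xdr code: `struct sketch_buf`, `sketch_dec_buffer()` and `sketch_dec_ctx()` are hypothetical, and `malloc()` plus `memcpy()` stand in for kmemdup().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_buf {
	char *data;
	size_t len;
};

struct sketch_ctx {
	struct sketch_buf token;
	struct sketch_buf mech;
	struct sketch_buf name;
};

/* Duplicate src into buf; fails on NULL input or allocation failure. */
static int sketch_dec_buffer(struct sketch_buf *buf, const char *src)
{
	size_t len;

	if (!src)
		return -1;
	len = strlen(src) + 1;
	buf->data = malloc(len);
	if (!buf->data)
		return -1;
	memcpy(buf->data, src, len);
	buf->len = len;
	return 0;
}

static void sketch_free_buffer(struct sketch_buf *buf)
{
	free(buf->data);
	buf->data = NULL;
	buf->len = 0;
}

/* Decode three fields; on failure, unwind in reverse order of allocation. */
static int sketch_dec_ctx(struct sketch_ctx *ctx, const char *token,
			  const char *mech, const char *name)
{
	int err;

	memset(ctx, 0, sizeof(*ctx));
	err = sketch_dec_buffer(&ctx->token, token);
	if (err)
		return err;
	err = sketch_dec_buffer(&ctx->mech, mech);
	if (err)
		goto out_free_token;
	err = sketch_dec_buffer(&ctx->name, name);
	if (err)
		goto out_free_mech;
	return 0;

out_free_mech:
	sketch_free_buffer(&ctx->mech);
out_free_token:
	sketch_free_buffer(&ctx->token);
	return err;
}
```

The labels are ordered so that a failure at step N falls through every cleanup for steps N-1 down to 1. Returning directly from the second or third failure, as the pre-fix code is described as doing, would leave the earlier buffers allocated with nothing pointing at them.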
2026-01-20 | kernel.h: drop hex.h and update all hex.h users | Randy Dunlap | -0/+1
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of hex.h interfaces to directly #include <linux/hex.h> as part of the process of putting kernel.h on a diet. Removing hex.h from kernel.h means that 36K C source files don't have to pay the price of parsing hex.h for the roughly 120 C source files that need it. This change has been build-tested with allmodconfig on most ARCHes. Also, all users/callers of <linux/hex.h> in the entire source tree have been updated if needed (if not already #included). Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-12-24 | Merge tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux | Linus Torvalds | -3/+7
Pull nfsd fixes from Chuck Lever:

"A set of NFSD fixes that arrived just a bit late for the 6.19 merge window.

Regression fixes:
 - Mark variable __maybe_unused to avoid W=1 build break

Stable fixes:
 - NFSv4 file creation neglects setting ACL
 - Clear TIME_DELEG in the suppattr_exclcreat bitmap
 - Clear SECLABEL in the suppattr_exclcreat bitmap
 - Fix memory leak in nfsd_create_serv error paths
 - Bound check rq_pages index in inline path
 - Return 0 on success from svc_rdma_copy_inline_range
 - Use rc_pageoff for memcpy byte offset
 - Avoid NULL deref on zero length gss_token in gss_read_proxy_verf"

* tag 'nfsd-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: NFSv4 file creation neglects setting ACL
  NFSD: Clear TIME_DELEG in the suppattr_exclcreat bitmap
  NFSD: Clear SECLABEL in the suppattr_exclcreat bitmap
  nfsd: fix memory leak in nfsd_create_serv error paths
  nfsd: Mark variable __maybe_unused to avoid W=1 build break
  svcrdma: bound check rq_pages index in inline path
  svcrdma: return 0 on success from svc_rdma_copy_inline_range
  svcrdma: use rc_pageoff for memcpy byte offset
  SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf
2025-12-12 | Merge tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | -9/+34
Pull NFS client updates from Trond Myklebust:

"Bugfixes:
 - Fix 'nlink' attribute update races when unlinking a file
 - Add missing initialisers for the directory verifier in various places
 - Don't regress the NFSv4 open state due to misordered racing replies
 - Ensure the NFSv4.x callback server uses the correct transport connection
 - Fix potential use-after-free races when shutting down the NFSv4.x callback server
 - Fix a pNFS layout commit crash
 - Assorted fixes to ensure correct propagation of mount options when the client crosses a filesystem boundary and triggers the VFS automount code
 - More localio fixes

Features and cleanups:
 - Add initial support for basic directory delegations
 - SunRPC back channel code cleanups"

* tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
  NFSv4: Handle NFS4ERR_NOTSUPP errors for directory delegations
  nfs/localio: remove 61 byte hole from needless ____cacheline_aligned
  nfs/localio: remove alignment size checking in nfs_is_local_dio_possible
  NFS: Fix up the automount fs_context to use the correct cred
  NFS: Fix inheritance of the block sizes when automounting
  NFS: Automounted filesystems should inherit ro,noexec,nodev,sync flags
  Revert "nfs: ignore SB_RDONLY when mounting nfs"
  Revert "nfs: clear SB_RDONLY before getting superblock"
  Revert "nfs: ignore SB_RDONLY when remounting nfs"
  NFS: Add a module option to disable directory delegations
  NFS: Shortcut lookup revalidations if we have a directory delegation
  NFS: Request a directory delegation during RENAME
  NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK
  NFS: Add support for sending GDD_GETATTR
  NFSv4/pNFS: Clear NFS_INO_LAYOUTCOMMIT in pnfs_mark_layout_stateid_invalid
  NFSv4.1: protect destroying and nullifying bc_serv structure
  SUNRPC: new helper function for stopping backchannel server
  SUNRPC: cleanup common code in backchannel request
  NFSv4.1: pass transport for callback shutdown
  NFSv4: ensure the open stateid seqid doesn't go backwards
  ...
2025-12-08 | svcrdma: bound check rq_pages index in inline path | Joshua Rogers | -0/+3
svc_rdma_copy_inline_range indexed rqstp->rq_pages[rc_curpage] without verifying rc_curpage stays within the allocated page array. Add guards before the first use and after advancing to a new page. Fixes: d7cc73972661 ("svcrdma: support multiple Read chunks per RPC") Cc: stable@vger.kernel.org Signed-off-by: Joshua Rogers <linux@joshua.hu> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-12-08 | svcrdma: return 0 on success from svc_rdma_copy_inline_range | Joshua Rogers | -1/+1
The function comment specifies 0 on success and -EINVAL on invalid parameters. Make the tail return 0 after a successful copy loop. Fixes: d7cc73972661 ("svcrdma: support multiple Read chunks per RPC") Cc: stable@vger.kernel.org Signed-off-by: Joshua Rogers <linux@joshua.hu> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-12-08 | svcrdma: use rc_pageoff for memcpy byte offset | Joshua Rogers | -1/+1
svc_rdma_copy_inline_range added rc_curpage (page index) to the page base instead of the byte offset rc_pageoff. Use rc_pageoff so copies land within the current page. Found by ZeroPath (https://zeropath.com) Fixes: 8e122582680c ("svcrdma: Move svc_rdma_read_info::ri_pageno to struct svc_rdma_recv_ctxt") Cc: stable@vger.kernel.org Signed-off-by: Joshua Rogers <linux@joshua.hu> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-12-08 | SUNRPC: svcauth_gss: avoid NULL deref on zero length gss_token in gss_read_proxy_verf | Joshua Rogers | -1/+2
A zero length gss_token results in pages == 0 and in_token->pages[0] is NULL. The code unconditionally evaluates page_address(in_token->pages[0]) for the initial memcpy, which can dereference NULL even when the copy length is 0. Guard the first memcpy so it only runs when length > 0. Fixes: 5866efa8cbfb ("SUNRPC: Fix svcauth_gss_proxy_init()") Cc: stable@vger.kernel.org Signed-off-by: Joshua Rogers <linux@joshua.hu> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
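The guard amounts to not touching the source pointer unless there is something to copy. A minimal user-space sketch follows, in which `sketch_copy_first_page()` is a hypothetical helper and a plain pointer array stands in for the page array:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the first `length` bytes of the token out of pages[0].
 * With a zero-length token pages[0] may be NULL, so the source pointer
 * is only used when length > 0.
 */
static size_t sketch_copy_first_page(char *dst, char *const *pages, size_t length)
{
	if (length > 0)
		memcpy(dst, pages[0], length);
	return length;
}
```

Even in plain C, passing a NULL source to `memcpy()` with a length of zero has traditionally been undefined behaviour, so the guard is the safer form in user space as well.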
2025-12-06 | Merge tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux | Linus Torvalds | -21/+60
Pull nfsd updates from Chuck Lever:

 - Mike Snitzer's mechanism for disabling I/O caching introduced in v6.18 is extended to include using direct I/O. The goal is to further reduce the memory footprint consumed by NFS clients accessing large data sets via NFSD.
 - The NFSD community adopted a maintainer entry profile during this cycle. See Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst
 - Work continues on hardening NFSD's implementation of the pNFS block layout type. This type enables pNFS clients to directly access the underlying block devices that contain an exported file system, reducing server overhead and increasing data throughput.
 - The remaining patches are clean-ups and minor optimizations.

Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.19 NFSD development cycle.

* tag 'nfsd-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (38 commits)
  NFSD: nfsd-io-modes: Separate lists
  NFSD: nfsd-io-modes: Wrap shell snippets in literal code blocks
  NFSD: Add toctree entry for NFSD IO modes docs
  NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst
  NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
  NFSD: Make FILE_SYNC WRITEs comply with spec
  NFSD: Add trace point for SCSI fencing operation.
  NFSD: use correct reservation type in nfsd4_scsi_fence_client
  xdrgen: Don't generate unnecessary semicolon
  xdrgen: Fix union declarations
  NFSD: don't start nfsd if sv_permsocks is empty
  xdrgen: handle _XdrString in union encoder/decoder
  xdrgen: Fix the variable-length opaque field decoder template
  xdrgen: Make the xdrgen script location-independent
  xdrgen: Generalize/harden pathname construction
  lockd: don't allow locking on reexported NFSv2/3
  MAINTAINERS: add a nfsd blocklayout reviewer
  nfsd: Use MD5 library instead of crypto_shash
  nfsd: stop pretending that we cache the SEQUENCE reply.
  NFS: nfsd-maintainer-entry-profile: Inline function name prefixes
  ...
2025-12-05 | Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds | -15/+12
Pull persistent dentry infrastructure and conversion from Al Viro:

"Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). A reference is grabbed and not released, but it's not actually _stored_ anywhere.

That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self).

Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount.

The end result this series is aiming for:

 - get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag.
 - instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set.

After that kill_litter_super() becomes an equivalent of kill_anon_super().

Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series.

This work has really started early in 2024; quite a few preliminary pieces have already gone into mainline. This chunk is finally getting to the meat of that stuff - infrastructure and most of the conversions to it.

Some pieces are still sitting in the local branches, but the bulk of that stuff is here"

* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  d_make_discardable(): warn if given a non-persistent dentry
  kill securityfs_recursive_remove()
  convert securityfs
  get rid of kill_litter_super()
  convert rust_binderfs
  convert nfsctl
  convert rpc_pipefs
  convert hypfs
  hypfs: swich hypfs_create_u64() to returning int
  hypfs: switch hypfs_create_str() to returning int
  hypfs: don't pin dentries twice
  convert gadgetfs
  gadgetfs: switch to simple_remove_by_name()
  convert functionfs
  functionfs: switch to simple_remove_by_name()
  functionfs: fix the open/removal races
  functionfs: need to cancel ->reset_work in ->kill_sb()
  functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
  functionfs: don't abuse ffs_data_closed() on fs shutdown
  convert selinuxfs
  ...
2025-11-23SUNRPC: new helper function for stopping backchannel serverOlga Kornievskaia-0/+16
Create a new backchannel function to stop the backchannel server and clear the bc_serv in transport protected under the bc_pa_lock. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23SUNRPC: cleanup common code in backchannel requestOlga Kornievskaia-9/+18
Create a helper function for common code between rdma and tcp backchannel handling of the backchannel request. Make sure that access is protected by the bc_pa_lock lock. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-17convert rpc_pipefsAl Viro-15/+12
Just use d_make_persistent() + dput() (and fold the latter into simple_finish_creating()) and that's it... NOTE: pipe->dentry is a borrowed reference - it does not contribute to dentry refcount. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16sunrpc: allocate a separate bvec array for socket sendsJeff Layton-7/+48
svc_tcp_sendmsg() calls xdr_buf_to_bvec() with the second slot of rq_bvec as the start, but doesn't reduce the array length by one, which could lead to an array overrun. Also, rq_bvec is always rq_maxpages in length, which can be too short in some cases, since the TCP record marker consumes a slot. Fix both problems by adding a separate bvec array to the svc_sock that is specifically for sending. For TCP, make this array one slot longer than rq_maxpages, to account for the record marker. For UDP, only allocate as large an array as we need since it's limited to 64k of payload. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
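The slot accounting behind this fix can be illustrated with a small userspace model. This is a sketch, not the kernel code: `struct slot`, `fill_payload()`, and `build_tcp_send()` are hypothetical stand-ins, and `fill_payload()` only loosely plays the role of xdr_buf_to_bvec().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one bio_vec slot; not the kernel's struct. */
struct slot {
	const void *base;
	size_t len;
};

/*
 * Write `npages` payload entries into `vec`, which has room for
 * `nslots` entries. Returns the number of slots used, or -1 if the
 * payload does not fit.
 */
static int fill_payload(struct slot *vec, size_t nslots, size_t npages)
{
	if (npages > nslots)
		return -1;
	for (size_t i = 0; i < npages; i++) {
		vec[i].base = NULL;   /* placeholder page address */
		vec[i].len = 4096;    /* illustrative page size */
	}
	return (int)npages;
}

/*
 * Build a TCP send vector of `total` slots: slot 0 carries the record
 * marker and the payload starts at slot 1. The capacity handed to
 * fill_payload() must therefore be total - 1; passing `total` would
 * let it write one slot past the end of the array.
 */
static int build_tcp_send(struct slot *vec, size_t total, size_t npages)
{
	static const unsigned char marker[4];
	int n;

	assert(total >= 1);
	vec[0].base = marker;
	vec[0].len = sizeof(marker);
	n = fill_payload(vec + 1, total - 1, npages);
	return n < 0 ? -1 : n + 1;
}
```

In this model, an array sized for exactly the maximum number of pages cannot hold a maximum-size payload plus the marker, which is why the commit sizes the TCP send array one slot larger than rq_maxpages.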
2025-11-16SUNRPC: Improve "fragment too large" warningChuck Lever-3/+4
Including the client IP address that generated the overrun traffic seems like it would be helpful. The message now reads: kernel: svc: nfsd oversized RPC fragment (1064958 octets) from 100.64.0.11:45866 Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-16svcrdma: Release transport resources synchronouslyChuck Lever-11/+8
NFSD has always supported adding network listeners. The new netlink protocol now enables the removal of listeners. Olga noticed that if an RDMA listener is removed and immediately re-added, the deferred __svc_rdma_free() function might not have run yet, so some or all of the old listener's RDMA resources linger, which prevents a new listener on the same address from being created. Also, svc_xprt_free() does a module_put() just after calling ->xpo_free(). That means if there is deferred work going on, the module could be unloaded before that work is even started, resulting in a UAF. Neil asks: > What particular part of __svc_rdma_free() needs to run in order for a > subsequent registration to succeed? > Can that bit be run directly from svc_rdma_free() rather than be > delayed? > (I know almost nothing about rdma so forgive me if the answers to these > questions seem obvious) The reasons I can recall are: - Some of the transport tear-down work can sleep - Releasing a cm_id is tricky and can deadlock We might be able to mitigate the second issue with judicious application of transport reference counting. Reported-by: Olga Kornievskaia <okorniev@redhat.com> Closes: https://lore.kernel.org/linux-nfs/20250821204328.89218-1-okorniev@redhat.com/ Suggested-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski-2/+1
Cross-merge networking fixes after downstream PR (net-6.18-rc6). No conflicts, adjacent changes in: drivers/net/phy/micrel.c 96a9178a29a6 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface") 61b7ade9ba8c ("net: phy: micrel: Add support for non PTP SKUs for lan8814") and a trivial one in tools/testing/selftests/drivers/net/Makefile. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-12Merge tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds-2/+1
Pull nfsd fixes from Chuck Lever: "Address recently reported issues or issues found at the recent NFS bake-a-thon held in Raleigh, NC.
Issues reported with v6.18-rc:
- Address a kernel build issue
- Reorder SEQUENCE processing to avoid spurious NFS4ERR_SEQ_MISORDERED
Issues that need expedient stable backports:
- Close a refcount leak exposure
- Report support for NFSv4.2 CLONE correctly
- Fix oops during COPY_NOTIFY processing
- Prevent rare crash after XDR encoding failure
- Prevent crash due to confused or malicious NFSv4.1 client"
* tag 'nfsd-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it"
  nfsd: ensure SEQUENCE replay sends a valid reply.
  NFSD: Never cache a COMPOUND when the SEQUENCE operation fails
  NFSD: Skip close replay processing if XDR encoding fails
  NFSD: free copynotify stateid in nfs4_free_ol_stateid()
  nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes
  nfsd: fix refcount leak in nfsd_set_fh_dentry()
2025-11-10Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it"Chuck Lever-2/+1
Geert reports: > This is now commit d8e97cc476e33037 ("SUNRPC: Make RPCSEC_GSS_KRB5 > select CRYPTO instead of depending on it") in v6.18-rc1. > As RPCSEC_GSS_KRB5 defaults to "y", CRYPTO is now auto-enabled in > defconfigs that didn't enable it before. Revert while we work out a proper solution and then test it. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/linux-nfs/b97cea29-4ab7-4fb6-85ba-83f9830e524f@kernel.org/T/#t Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-11-04net: Convert proto_ops connect() callbacks to use sockaddr_unsizedKees Cook-3/+4
Update all struct proto_ops connect() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops bind() callbacks to use sockaddr_unsizedKees Cook-5/+5
Update all struct proto_ops bind() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-06Merge tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds-20/+42
Pull nfsd updates from Chuck Lever: "Mike Snitzer has prototyped a mechanism for disabling I/O caching in NFSD. This is introduced in v6.18 as an experimental feature. This enables scaling NFSD in /both/ directions:
- NFS service can be supported on systems with small memory footprints, such as low-cost cloud instances
- Large NFS workloads will be less likely to force the eviction of server-local activity, helping it avoid thrashing
Jeff Layton contributed a number of fixes to the new attribute delegation implementation (based on a pending Internet RFC) that we hope will make attribute delegation reliable enough to enable by default, as it is on the Linux NFS client.
The remaining patches in this pull request are clean-ups and minor optimizations. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.18 NFSD development cycle"
* tag 'nfsd-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
  nfsd: discard nfserr_dropit
  SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it
  NFSD: Add io_cache_{read,write} controls to debugfs
  NFSD: Do the grace period check in ->proc_layoutget
  nfsd: delete unnecessary NULL check in __fh_verify()
  NFSD: Allow layoutcommit during grace period
  NFSD: Disallow layoutget during grace period
  sunrpc: fix "occurence"->"occurrence"
  nfsd: Don't force CRYPTO_LIB_SHA256 to be built-in
  nfsd: nfserr_jukebox in nlm_fopen should lead to a retry
  NFSD: Reduce DRC bucket size
  NFSD: Delay adding new entries to LRU
  SUNRPC: Move the svc_rpcb_cleanup() call sites
  NFS: Remove rpcbind cleanup for NFSv4.0 callback
  nfsd: unregister with rpcbind when deleting a transport
  NFSD: Drop redundant conversion to bool
  sunrpc: eliminate return pointer in svc_tcp_sendmsg()
  sunrpc: fix pr_notice in svc_tcp_sendto() to show correct length
  nfsd: decouple the xprtsec policy check from check_nfsd_access()
  NFSD: Fix destination buffer size in nfsd4_ssc_setup_dul()
  ...
2025-10-01SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on itEric Biggers-1/+2
Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it. This unblocks the eventual removal of the selection of CRYPTO from NFSD_V4, which will no longer be needed by nfsd itself due to switching to the crypto library functions. But NFSD_V4 selects RPCSEC_GSS_KRB5, which still needs CRYPTO. It makes more sense for RPCSEC_GSS_KRB5 to select CRYPTO itself, like most other kconfig options that need CRYPTO do. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-30sunrpc: unexport rpc_malloc() and rpc_free()Jeff Layton-2/+0
These are not used outside of sunrpc code. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23SUNRPC: Update gssx_accept_sec_context() to use xdr_set_scratch_folio()Anna Schumaker-4/+4
This was the last caller of xdr_set_scratch_page(), so I remove this function while I'm at it. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23SUNRPC: Update svcxdr_init_decode() to call xdr_set_scratch_folio()Anna Schumaker-5/+5
The only snag here is that __folio_alloc_node() doesn't handle NUMA_NO_NODE, so I also need to update svc_pool_map_get_node() to return numa_mem_id() instead. I arrived at this approach by looking at what other users of __folio_alloc_node() do for this case. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23SUNRPC: Remove redundant __GFP_NOWARNQianfeng Rong-2/+2
GFP_NOWAIT already includes __GFP_NOWARN, so let's remove the redundant __GFP_NOWARN. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
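Why the extra flag is redundant comes down to bitmask arithmetic: OR-ing in a bit that the mask already contains changes nothing. The sketch below uses made-up bit values and hypothetical `MODEL_*` names; the kernel's real GFP bit assignments differ.

```c
#include <assert.h>

/* Illustrative bit values only; not the kernel's definitions. */
#define MODEL_KSWAPD_RECLAIM 0x1u
#define MODEL_NOWARN         0x2u
/* Per the commit message, the NOWAIT mask already carries NOWARN. */
#define MODEL_GFP_NOWAIT     (MODEL_KSWAPD_RECLAIM | MODEL_NOWARN)

/* Add the no-warn bit to an allocation mask. */
static unsigned int model_add_nowarn(unsigned int gfp)
{
	return gfp | MODEL_NOWARN;
}
```

Applied to `MODEL_GFP_NOWAIT`, `model_add_nowarn()` returns its argument unchanged, so dropping the explicit flag at such call sites does not alter the resulting mask.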
2025-09-23SUNRPC: Move the svc_rpcb_cleanup() call sitesChuck Lever-2/+6
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23sunrpc: add a Kconfig option to redirect dfprintk() output to trace bufferJeff Layton-0/+14
We have a lot of old dprintk() call sites that aren't going anywhere anytime soon. At the same time, turning them up is a serious burden on the host due to the console locking overhead. Add a new Kconfig option that redirects dfprintk() output to the trace buffer. This is more efficient than logging to the console and allows for proper interleaving of dprintk and static tracepoint events. Since using trace_printk() causes scary warnings to pop at boot time, this new option defaults to "n". Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-21sunrpc: fix "occurence"->"occurrence"Xichao Zhao-1/+1
Trivial fix to spelling mistake in comment text. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Reviewed-by: Joe Damato <joe@dama.to> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21SUNRPC: Move the svc_rpcb_cleanup() call sitesChuck Lever-2/+6
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21nfsd: unregister with rpcbind when deleting a transportOlga Kornievskaia-0/+15
When a listener is added, a part of creation of transport also registers program/port with rpcbind. However, when the listener is removed, while transport goes away, rpcbind still has the entry for that port/type. When deleting the transport, unregister with rpcbind when appropriate. ---v2 created a new xpt_flag XPT_RPCB_UNREG to mark TCP and UDP transport and at xprt destroy send rpcbind unregister if flag set. Suggested-by: Chuck Lever <chuck.lever@oracle.com> Fixes: d093c9089260 ("nfsd: fix management of listener transports") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
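The flag-based approach described in the v2 note can be sketched in userspace as follows. Everything here is a model under assumptions: `struct xprt_model`, the `model_*` functions, and the bit value of `MODEL_XPT_RPCB_UNREG` are hypothetical, and the counter stands in for the actual rpcbind unregister request.

```c
#include <assert.h>

/* Bit value chosen for illustration; the kernel's definition may differ. */
#define MODEL_XPT_RPCB_UNREG (1u << 0)

enum model_proto { MODEL_TCP, MODEL_UDP, MODEL_RDMA };

/* Hypothetical stand-in for a server transport. */
struct xprt_model {
	enum model_proto proto;
	unsigned int flags;
};

/* Counts how often the stand-in for an rpcbind unregister ran. */
static int model_rpcb_unregister_calls;

static void model_rpcb_unregister(const struct xprt_model *x)
{
	(void)x;
	model_rpcb_unregister_calls++;
}

/* On creation, mark TCP and UDP transports as needing unregistration. */
static void model_xprt_create(struct xprt_model *x, enum model_proto proto)
{
	x->proto = proto;
	x->flags = 0;
	if (proto == MODEL_TCP || proto == MODEL_UDP)
		x->flags |= MODEL_XPT_RPCB_UNREG;
}

/* On destroy, unregister with rpcbind only if the flag was set. */
static void model_xprt_destroy(struct xprt_model *x)
{
	if (x->flags & MODEL_XPT_RPCB_UNREG)
		model_rpcb_unregister(x);
	x->flags = 0;
}
```

Recording the decision at creation time keeps the destroy path free of protocol checks: it only has to test one bit.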