aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4/tcp_input.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-10-21mm/damon/core: fix list_add_tail() call on damon_call()SeongJae Park1-1/+1
Each damon_ctx maintains callback requests using a linked list (damon_ctx->call_controls). When a new callback request is received via damon_call(), the new request should be added to the list. However, the function is making a mistake at list_add_tail() invocation: putting the new item to add and the list head to add it before, in the opposite order. Because of the linked list manipulation implementation, the new request can still be reached from the context's list head. But the list items that were added before the new request are dropped from the list. As a result, the callbacks are unexpectedly not invocated. Worse yet, if the dropped callback requests were dynamically allocated, the memory is leaked. Actually DAMON sysfs interface is using a dynamically allocated repeat-mode callback request for automatic essential stats update. And because the online DAMON parameters commit is using a non-repeat-mode callback request, the issue can easily be reproduced, like below. # damo start --damos_action stat --refresh_stat 1s # damo tune --damos_action stat --refresh_stat 1s The first command dynamically allocates the repeat-mode callback request for automatic essential stat update. Users can see the essential stats are automatically updated for every second, using the sysfs interface. The second command calls damon_commit() with a new callback request that was made for the commit. As a result, the previously added repeat-mode callback request is dropped from the list. The automatic stats refresh stops working, and the memory for the repeat-mode callback request is leaked. It can be confirmed using kmemleak. Fix the mistake on the list_add_tail() call. Link: https://lkml.kernel.org/r/20251014205939.1206-1-sj@kernel.org Fixes: 004ded6bee11 ("mm/damon: accept parallel damon_call() requests") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.17+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-21mm/mremap: correctly account old mapping after MREMAP_DONTUNMAP remapLorenzo Stoakes1-9/+6
Commit b714ccb02a76 ("mm/mremap: complete refactor of move_vma()") mistakenly introduced a new behaviour - clearing the VM_ACCOUNT flag of the old mapping when a mapping is mremap()'d with the MREMAP_DONTUNMAP flag set. While we always clear the VM_LOCKED and VM_LOCKONFAULT flags for the old mapping (the page tables have been moved, so there is no data that could possibly be locked in memory), there is no reason to touch any other VMA flags. This is because after the move the old mapping is in a state as if it were freshly mapped. This implies that the attributes of the mapping ought to remain the same, including whether or not the mapping is accounted. Link: https://lkml.kernel.org/r/20251013165836.273113-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Fixes: b714ccb02a76 ("mm/mremap: complete refactor of move_vma()") Reviewed-by: Pedro Falcato <pfalcato@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-21bpf: Sync pending IRQ work before freeing ring bufferNoorain Eqbal1-0/+2
Fix a race where irq_work can be queued in bpf_ringbuf_commit() but the ring buffer is freed before the work executes. In the syzbot reproducer, a BPF program attached to sched_switch triggers bpf_ringbuf_commit(), queuing an irq_work. If the ring buffer is freed before this work executes, the irq_work thread may accesses freed memory. Calling `irq_work_sync(&rb->work)` ensures that all pending irq_work complete before freeing the buffer. Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: syzbot+2617fc732430968b45d2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2617fc732430968b45d2 Tested-by: syzbot+2617fc732430968b45d2@syzkaller.appspotmail.com Signed-off-by: Noorain Eqbal <nooraineqbal@gmail.com> Link: https://lore.kernel.org/r/20251020180301.103366-1-nooraineqbal@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-21Revert "NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"Chuck Lever5-3/+20
I've found that pynfs COMP6 now leaves the connection or lease in a strange state, which causes CLOSE9 to hang indefinitely. I've dug into it a little, but I haven't been able to root-cause it yet. However, I bisected to commit 48aab1606fa8 ("NFSD: Remove the cap on number of operations per NFSv4 COMPOUND"). Tianshuo Han also reports a potential vulnerability when decoding an NFSv4 COMPOUND. An attacker can place an arbitrarily large op count in the COMPOUND header, which results in: [ 51.410584] nfsd: vmalloc error: size 1209533382144, exceeds total pages, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0 when NFSD attempts to allocate the COMPOUND op array. Let's restore the operation-per-COMPOUND limit, but increased to 200 for now. Reported-by: tianshuo han <hantianshuo233@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Tested-by: Tianshuo Han <hantianshuo233@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21nfsd: Avoid strlen conflict in nfsd4_encode_components_esc()Nathan Chancellor1-6/+3
There is an error building nfs4xdr.c with CONFIG_SUNRPC_DEBUG_TRACE=y and CONFIG_FORTIFY_SOURCE=n due to the local variable strlen conflicting with the function strlen(): In file included from include/linux/cpumask.h:11, from arch/x86/include/asm/paravirt.h:21, from arch/x86/include/asm/irqflags.h:102, from include/linux/irqflags.h:18, from include/linux/spinlock.h:59, from include/linux/mmzone.h:8, from include/linux/gfp.h:7, from include/linux/slab.h:16, from fs/nfsd/nfs4xdr.c:37: fs/nfsd/nfs4xdr.c: In function 'nfsd4_encode_components_esc': include/linux/kernel.h:321:46: error: called object 'strlen' is not a function or function pointer 321 | __trace_puts(_THIS_IP_, str, strlen(str)); \ | ^~~~~~ include/linux/kernel.h:265:17: note: in expansion of macro 'trace_puts' 265 | trace_puts(fmt); \ | ^~~~~~~~~~ include/linux/sunrpc/debug.h:34:41: note: in expansion of macro 'trace_printk' 34 | # define __sunrpc_printk(fmt, ...) trace_printk(fmt, ##__VA_ARGS__) | ^~~~~~~~~~~~ include/linux/sunrpc/debug.h:42:17: note: in expansion of macro '__sunrpc_printk' 42 | __sunrpc_printk(fmt, ##__VA_ARGS__); \ | ^~~~~~~~~~~~~~~ include/linux/sunrpc/debug.h:25:9: note: in expansion of macro 'dfprintk' 25 | dfprintk(FACILITY, fmt, ##__VA_ARGS__) | ^~~~~~~~ fs/nfsd/nfs4xdr.c:2646:9: note: in expansion of macro 'dprintk' 2646 | dprintk("nfsd4_encode_components(%s)\n", components); | ^~~~~~~ fs/nfsd/nfs4xdr.c:2643:13: note: declared here 2643 | int strlen, count=0; | ^~~~~~ This dprintk() instance is not particularly useful, so just remove it altogether to get rid of the immediate strlen() conflict. At the same time, eliminate the local strlen variable to avoid potential conflicts with strlen() in the future. Fixes: ec7d8e68ef0e ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21NFSD: Fix crash in nfsd4_read_release()Chuck Lever1-3/+4
When tracing is enabled, the trace_nfsd_read_done trace point crashes during the pynfs read.testNoFh test. Fixes: 15a8b55dbb1b ("nfsd: call op_release, even when op_func returns an error") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21NFSD: Define actions for the new time_deleg FATTR4 attributesChuck Lever1-0/+8
NFSv4 clients won't send legitimate GETATTR requests for these new attributes because they are intended to be used only with CB_GETATTR and SETATTR. But NFSD has to do something besides crashing if it ever sees a GETATTR request that queries these attributes. RFC 8881 Section 18.7.3 states: > The server MUST return a value for each attribute that the client > requests if the attribute is supported by the server for the > target file system. If the server does not support a particular > attribute on the target file system, then it MUST NOT return the > attribute value and MUST NOT set the attribute bit in the result > bitmap. The server MUST return an error if it supports an > attribute on the target but cannot obtain its value. In that case, > no attribute values will be returned. Further, RFC 9754 Section 5 states: > These new attributes are invalid to be used with GETATTR, VERIFY, > and NVERIFY, and they can only be used with CB_GETATTR and SETATTR > by a client holding an appropriate delegation. Thus there does not appear to be a specific server response mandated by specification. Taking the guidance that querying these attributes via GETATTR is "invalid", NFSD will return nfserr_inval, failing the request entirely. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/linux-nfs/7819419cf0cb50d8130dc6b747765d2b8febc88a.camel@kernel.org/T/#t Fixes: 51c0d4f7e317 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS") Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-10-21arm64, mm: avoid always making PTE dirty in pte_mkwrite()Huang Ying1-1/+2
Current pte_mkwrite_novma() makes PTE dirty unconditionally. This may mark some pages that are never written dirty wrongly. For example, do_swap_page() may map the exclusive pages with writable and clean PTEs if the VMA is writable and the page fault is for read access. However, current pte_mkwrite_novma() implementation always dirties the PTE. This may cause unnecessary disk writing if the pages are never written before being reclaimed. So, change pte_mkwrite_novma() to clear the PTE_RDONLY bit only if the PTE_DIRTY bit is set to make it possible to make the PTE writable and clean. The current behavior was introduced in commit 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"). Before that, pte_mkwrite() only sets the PTE_WRITE bit, while set_pte_at() only clears the PTE_RDONLY bit if both the PTE_WRITE and the PTE_DIRTY bits are set. To test the performance impact of the patch, on an arm64 server machine, run 16 redis-server processes on socket 1 and 16 memtier_benchmark processes on socket 0 with mostly get transactions (that is, redis-server will mostly read memory only). The memory footprint of redis-server is larger than the available memory, so swap out/in will be triggered. Test results show that the patch can avoid most swapping out because the pages are mostly clean. And the benchmark throughput improves ~23.9% in the test. Fixes: 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()") Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-10-21ACPICA: Work around bogus -Wstringop-overread warning since GCC 11Xi Ruoyao1-0/+6
When ACPI_MISALIGNMENT_NOT_SUPPORTED is set, GCC can produce a bogus -Wstringop-overread warning, see [1]. To me, it's very clear that we have a compiler bug here, thus just disable the warning. Fixes: a9d13433fe17 ("LoongArch: Align ACPI structures if ARCH_STRICT_ALIGN enabled") Link: https://lore.kernel.org/all/899f2dec-e8b9-44f4-ab8d-001e160a2aed@roeck-us.net/ Link: https://github.com/acpica/acpica/commit/abf5b573 Link: https://gcc.gnu.org/PR122073 [1] Co-developed-by: Saket Dumbre <saket.dumbre@intel.com> Signed-off-by: Saket Dumbre <saket.dumbre@intel.com> Signed-off-by: Xi Ruoyao <xry111@xry111.site> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Cc: All applicable <stable@vger.kernel.org> [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20251021092825.822007-1-xry111@xry111.site Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-21drm/amd/display: use GFP_NOWAIT for allocation in interrupt handlerAurabindo Pillai1-2/+2
schedule_dc_vmin_vmax() is called by dm_crtc_high_irq(). Hence, we cannot have the former sleep. Use GFP_NOWAIT for allocation in this function. Fixes: c210b757b400 ("drm/amd/display: fix dmub access race condition") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c04812cbe2f247a1c1e53a9b6c5e659963fe4065) Cc: stable@vger.kernel.org
2025-10-21drm/amd/display: increase max link count and fix link->enc NULL pointer accessCharlene Liu2-1/+10
[why] 1.) dc->links[MAX_LINKS] array size smaller than actual requested. max_connector + max_dpia + 4 virtual = 14. increase from 12 to 14. 2.) hw_init() access null LINK_ENC for dpia non display_endpoint. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Reviewed-by: Chris Park <chris.park@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d7f5a61e1b04ed87b008c8d327649d184dc5bb45) Cc: stable@vger.kernel.org
2025-10-21drm/amd/display: Fix NULL pointer dereferenceMeenakshikumar Somasundaram1-1/+2
[Why] On a mst branch with multi display setup, dc context is obselete after updating the first stream. Referencing the same dc context for the next stream update to fetch dc pointer leads to NULL pointer dereference. [How] Get the dc pointer from the link rather than context. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dc69b48988b171d6ccb3a083607e4dff015e2c0d) Cc: stable@vger.kernel.org
2025-10-21slab: Avoid race on slab->obj_exts in alloc_slab_obj_extsHao Ge1-3/+6
If two competing threads enter alloc_slab_obj_exts() and one of them fails to allocate the object extension vector, it might override the valid slab->obj_exts allocated by the other thread with OBJEXTS_ALLOC_FAIL. This will cause the thread that lost this race and expects a valid pointer to dereference a NULL pointer later on. Update slab->obj_exts atomically using cmpxchg() to avoid slab->obj_exts overrides by racing threads. Thanks for Vlastimil and Suren's help with debugging. Fixes: f7381b911640 ("slab: mark slab->obj_exts allocation failures unconditionally") Cc: <stable@vger.kernel.org> Suggested-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Hao Ge <gehao@kylinos.cn> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Link: https://patch.msgid.link/20251021010353.1187193-1-hao.ge@linux.dev Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-10-21x86/bugs: Qualify RETBLEED_INTEL_MSGDavid Kaplan1-1/+3
When retbleed mitigation is disabled, the kernel already prints an info message that the system is vulnerable. Recent code restructuring also inadvertently led to RETBLEED_INTEL_MSG being printed as an error, which is unnecessary as retbleed mitigation was already explicitly disabled (by config option, cmdline, etc.). Qualify this print statement so the warning is not printed unless an actual retbleed mitigation was selected and is being disabled due to incompatibility with spectre_v2. Fixes: e3b78a7ad5ea ("x86/bugs: Restructure retbleed mitigation") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220624 Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251003171936.155391-1-david.kaplan@amd.com
2025-10-21net: ethernet: ti: am65-cpts: fix timestamp loss due to race conditionsAksh Garg1-20/+43
Resolve race conditions in timestamp events list handling between TX and RX paths causing missed timestamps. The current implementation uses a single events list for both TX and RX timestamps. The am65_cpts_find_ts() function acquires the lock, splices all events (TX as well as RX events) to a temporary list, and releases the lock. This function performs matching of timestamps for TX packets only. Before it acquires the lock again to put the non-TX events back to the main events list, a concurrent RX processing thread could acquire the lock (as observed in practice), find an empty events list, and fail to attach timestamp to it, even though a relevant event exists in the spliced list which is yet to be restored to the main list. Fix this by creating separate events lists to handle TX and RX timestamps independently. Fixes: c459f606f66df ("net: ethernet: ti: am65-cpts: Enable RX HW timestamp for PTP packets using CPTS FIFO") Signed-off-by: Aksh Garg <a-garg7@ti.com> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Link: https://patch.msgid.link/20251016115755.1123646-1-a-garg7@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-21x86/microcode: Fix Entrysign revision check for Zen1/NaplesAndrew Cooper1-1/+1
... to match AMD's statement here: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7033.html Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://patch.msgid.link/20251020144124.2930784-1-andrew.cooper3@citrix.com
2025-10-21gpio: pci-idio-16: Define maximum valid register address offsetWilliam Breathitt Gray1-0/+1
Attempting to load the pci-idio-16 module fails during regmap initialization with a return error -EINVAL. This is a result of the regmap cache failing initialization. Set the idio_16_regmap_config max_register member to fix this failure. Fixes: 73d8f3efc5c2 ("gpio: pci-idio-16: Migrate to the regmap API") Reported-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Closes: https://lore.kernel.org/r/9b0375fd-235f-4ee1-a7fa-daca296ef6bf@nutanix.com Suggested-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: William Breathitt Gray <wbg@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20251020-fix-gpio-idio-16-regmap-v2-2-ebeb50e93c33@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-10-21gpio: 104-idio-16: Define maximum valid register address offsetWilliam Breathitt Gray1-0/+1
Attempting to load the 104-idio-16 module fails during regmap initialization with a return error -EINVAL. This is a result of the regmap cache failing initialization. Set the idio_16_regmap_config max_register member to fix this failure. Fixes: 2c210c9a34a3 ("gpio: 104-idio-16: Migrate to the regmap API") Reported-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Closes: https://lore.kernel.org/r/9b0375fd-235f-4ee1-a7fa-daca296ef6bf@nutanix.com Suggested-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: William Breathitt Gray <wbg@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20251020-fix-gpio-idio-16-regmap-v2-1-ebeb50e93c33@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-10-21xfs: don't use __GFP_NOFAIL in xfs_init_fs_contextChristoph Hellwig1-1/+1
With enough debug options enabled, struct xfs_mount is larger than 4k and thus NOFAIL allocations won't work for it. xfs_init_fs_context is early in the mount process, and if we really are out of memory there we'd better give up ASAP anyway. Fixes: 7b77b46a6137 ("xfs: use kmem functions for struct xfs_mount") Reported-by: syzbot+359a67b608de1ef72f65@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21xfs: cache open zone in inode->i_privateChristoph Hellwig4-85/+53
The MRU cache for open zones is unfortunately still not ideal, as it can time out pretty easily when doing heavy I/O to hard disks using up most or all open zones. One option would be to just increase the timeout, but while looking into that I realized we're just better off caching it indefinitely as there is no real downside to that once we don't hold a reference to the cache open zone. So switch the open zone to RCU freeing, and then stash the last used open zone into inode->i_private. This helps to significantly reduce fragmentation by keeping I/O localized to zones for workloads that write using many open files to HDD. Fixes: 4e4d52075577 ("xfs: add the zoned space allocator") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21xfs: avoid busy loops in GCDChristoph Hellwig1-35/+46
When GCD has no new work to handle, but read, write or reset commands are outstanding, it currently busy loops, which is a bit suboptimal, and can lead to softlockup warnings in case of stuck commands. Change the code so that the task state is only set to running when work is performed, which looks a bit tricky due to the design of the reading/writing/resetting lists that contain both in-flight and finished commands. Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21drm/panic: Fix 24bit pixel crossing page boundariesJocelyn Falempe1-2/+44
When using page list framebuffer, and using RGB888 format, some pixels can cross the page boundaries, and this case was not handled, leading to writing 1 or 2 bytes on the next virtual address. Add a check and a specific function to handle this case. Fixes: c9ff2808790f0 ("drm/panic: Add support to scanout buffer as array of pages") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-7-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21drm/panic: Fix divide by 0 if the screen width < font widthJocelyn Falempe1-1/+1
In the unlikely case that the screen is tiny, and smaller than the font width, it leads to a divide by 0: draw_line_with_wrap() chars_per_row = sb->width / font->width = 0 line_wrap.len = line->len % chars_per_row; This will trigger a divide by 0 Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-6-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21drm/panic: Fix kmsg text drawing rectangleJocelyn Falempe1-1/+1
The rectangle height was larger than the screen size. This has no real impact. Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-5-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21drm/panic: Fix qr_code, ensure vmargin is positiveJocelyn Falempe1-1/+4
Depending on qr_code size and screen size, the vertical margin can be negative, that means there is not enough room to draw the qr_code. So abort early, to avoid a segfault by trying to draw at negative coordinates. Fixes: cb5164ac43d0f ("drm/panic: Add a QR code panic screen") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-4-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21drm/panic: Fix overlap between qr code and logoJocelyn Falempe1-1/+1
The borders of the qr code was not taken into account to check if it overlap with the logo, leading to the logo being partially covered. Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-3-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21drm/panic: Fix drawing the logo on a small narrow screenJocelyn Falempe1-0/+3
If the logo width is bigger than the framebuffer width, and the height is big enough to hold the logo and the message, it will draw at x coordinate that are higher than the width, and ends up in a corrupted image. Fixes: 4b570ac2eb54 ("drm/rect: Add drm_rect_overlap()") Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://lore.kernel.org/r/20251009122955.562888-2-jfalempe@redhat.com Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
2025-10-21xfs: XFS_ONLINE_SCRUB_STATS should depend on DEBUG_FSGeert Uytterhoeven1-1/+1
Currently, XFS_ONLINE_SCRUB_STATS selects DEBUG_FS. However, DEBUG_FS is meant for debugging, and people may want to disable it on production systems. Since commit 0ff51a1fd786f47b ("xfs: enable online fsck by default in Kconfig")), XFS_ONLINE_SCRUB_STATS is enabled by default, forcing DEBUG_FS enabled too. Fix this by replacing the selection of DEBUG_FS by a dependency on DEBUG_FS, which is what most other options controlling the gathering and exposing of statistics do. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21xfs: do not tightly pack-write large filesDamien Le Moal1-4/+15
When using a zoned realtime device, tightly packing of data blocks belonging to multiple closed files into the same realtime group (RTG) is very efficient at improving write performance. This is especially true with SMR HDDs as this can reduce, and even suppress, disk head seeks. However, such tight packing does not make sense for large files that require at least a full RTG. If tight packing placement is applied for such files, the VM writeback thread switching between inodes result in the large files to be fragmented, thus increasing the garbage collection penalty later when the RTG needs to be reclaimed. This problem can be avoided with a simple heuristic: if the size of the inode being written back is at least equal to the RTG size, do not use tight-packing. Modify xfs_zoned_pack_tight() to always return false in this case. With this change, a multi-writer workload writing files of 256 MB on a file system backed by an SMR HDD with 256 MB zone size as a realtime device sees all files occupying exactly one RTG (i.e. one device zone), thus completely removing the heavy fragmentation observed without this change. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-21xfs: Improve CONFIG_XFS_RT Kconfig helpDamien Le Moal1-0/+9
Improve the description of the XFS_RT configuration option to document that this option is required for zoned block devices. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-10-20net/smc: fix general protection fault in __smc_diag_dumpWang Liang1-13/+0
The syzbot report a crash: Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f] CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025 RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline] RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89 Call Trace: <TASK> smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217 smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234 netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327 __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442 netlink_dump_start include/linux/netlink.h:341 [inline] smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251 __sock_diag_cmd net/core/sock_diag.c:249 [inline] sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346 netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:714 [inline] __sock_sendmsg net/socket.c:729 [inline] ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668 __sys_sendmsg+0x16d/0x220 net/socket.c:2700 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> The process like this: (CPU1) | (CPU2) ---------------------------------|------------------------------- inet_create() | // init clcsock to NULL | sk = sk_alloc() | | // unexpectedly change clcsock | inet_init_csk_locks() | | // add sk to hash table | smc_inet_init_sock() | smc_sk_init() | smc_hash_sk() | | // traverse the hash table | smc_diag_dump_proto | __smc_diag_dump() | // visit wrong clcsock | smc_diag_msg_common_fill() // alloc clcsock | smc_create_clcsk | sock_create_kern | With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed in inet_init_csk_locks(). The INET_PROTOSW_ICSK flag is no need by smc, just remove it. After removing the INET_PROTOSW_ICSK flag, this patch alse revert commit 6fd27ea183c2 ("net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC") to avoid casting smc_sock to inet_connection_sock. Reported-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f775be4458668f7d220e Tested-by: syzbot+f775be4458668f7d220e@syzkaller.appspotmail.com Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Link: https://patch.msgid.link/20251017024827.3137512-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-20net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQAmery Hung1-3/+23
XDP programs can change the layout of an xdp_buff through bpf_xdp_adjust_tail() and bpf_xdp_adjust_head(). Therefore, the driver cannot assume the size of the linear data area nor fragments. Fix the bug in mlx5 by generating skb according to xdp_buff after XDP programs run. Currently, when handling multi-buf XDP, the mlx5 driver assumes the layout of an xdp_buff to be unchanged. That is, the linear data area continues to be empty and fragments remain the same. This may cause the driver to generate erroneous skb or triggering a kernel warning. When an XDP program added linear data through bpf_xdp_adjust_head(), the linear data will be ignored as mlx5e_build_linear_skb() builds an skb without linear data and then pull data from fragments to fill the linear data area. When an XDP program has shrunk the non-linear data through bpf_xdp_adjust_tail(), the delta passed to __pskb_pull_tail() may exceed the actual nonlinear data size and trigger the BUG_ON in it. To fix the issue, first record the original number of fragments. If the number of fragments changes after the XDP program runs, rewind the end fragment pointer by the difference and recalculate the truesize. Then, build the skb with the linear data area matching the xdp_buff. Finally, only pull data in if there is non-linear data and fill the linear part up to 256 bytes. Fixes: f52ac7028bec ("net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ") Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1760644540-899148-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-20net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQAmery Hung1-6/+19
XDP programs can release xdp_buff fragments when calling bpf_xdp_adjust_tail(). The driver currently assumes the number of fragments to be unchanged and may generate skb with wrong truesize or containing invalid frags. Fix the bug by generating skb according to xdp_buff after the XDP program runs. Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ") Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1760644540-899148-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-20drm/xe/uapi: Hide the madvise autoreset behind a VM_BIND flagThomas Hellström4-3/+30
The madvise implementation currently resets the SVM madvise if the underlying CPU map is unmapped. This is in an attempt to mimic the CPU madvise behaviour. However, it's not clear that this is a desired behaviour since if the end app user relies on it for malloc()ed objects or stack objects, it may not work as intended. Instead of having the autoreset functionality being a direct application-facing implicit UAPI, make the UMD explicitly choose this behaviour if it wants to expose it by introducing DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET, and add a semantics description. v2: - Kerneldoc fixes. Fix a commit log message. Fixes: a2eb8aec3ebe ("drm/xe: Reset VMA attributes to default in SVM garbage collector") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: "Falkowski, John" <john.falkowski@intel.com> Cc: "Mrozek, Michal" <michal.mrozek@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251015170726.178685-2-thomas.hellstrom@linux.intel.com (cherry picked from commit 59a2d3f38ab23cce4cd9f0c4a5e08fdfe9e67ae7) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-20drm/xe: Retain vma flags when recreating and splitting vmas for madviseThomas Hellström3-67/+32
When splitting and restoring vmas for madvise, we only copied the XE_VMA_SYSTEM_ALLOCATOR flag. That meant we lost flags for read_only, dumpable and sparse (in case anyone would call madvise for the latter). Instead, define a mask of relevant flags and ensure all are replicated, To simplify this and make the code a bit less fragile, remove the conversion to VMA_CREATE flags and instead just pass around the gpuva flags after initial conversion from user-space. Fixes: a2eb8aec3ebe ("drm/xe: Reset VMA attributes to default in SVM garbage collector") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251015170726.178685-1-thomas.hellstrom@linux.intel.com (cherry picked from commit b3af8658ec70f2196190c66103478352286aba3b) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-20selftests: net: fix server bind failure in sctp_vrf.shXin Long2-43/+47
sctp_vrf.sh could fail: TEST 12: bind vrf-2 & 1 in server, connect from client 1 & 2, N [FAIL] not ok 1 selftests: net: sctp_vrf.sh # exit=3 The failure happens when the server bind in a new run conflicts with an existing association from the previous run: [1] ip netns exec $SERVER_NS ./sctp_hello server ... [2] ip netns exec $CLIENT_NS ./sctp_hello client ... [3] ip netns exec $SERVER_NS pkill sctp_hello ... [4] ip netns exec $SERVER_NS ./sctp_hello server ... It occurs if the client in [2] sends a message and closes immediately. With the message unacked, no SHUTDOWN is sent. Killing the server in [3] triggers a SHUTDOWN the client also ignores due to the unacked message, leaving the old association alive. This causes the bind at [4] to fail until the message is acked and the client responds to a second SHUTDOWN after the server’s T2 timer expires (3s). This patch fixes the issue by preventing the client from sending data. Instead, the client blocks on recv() and waits for the server to close. It also waits until both the server and the client sockets are fully released in stop_server and wait_client before restarting. Additionally, replace 2>&1 >/dev/null with -q in sysctl and grep, and drop other redundant 2>&1 >/dev/null redirections, and fix a typo from N to Y (connect successfully) in the description of the last test. Fixes: a61bd7b9fef3 ("selftests: add a selftest for sctp vrf") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/be2dacf52d0917c4ba5e2e8c5a9cb640740ad2b6.1760731574.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-20iommufd/selftest: Fix ioctl return value in _test_cmd_trigger_vevents()Nicolin Chen1-2/+2
The ioctl returns 0 upon success, so !0 returning -1 breaks the selftest. Drop the '!' to fix it. Fixes: 1d235d849425 ("iommu/selftest: prevent use of uninitialized variable") Link: https://patch.msgid.link/r/20251014214847.1113759-1-nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-10-20iommufd: Don't overflow during division for dirty trackingJason Gunthorpe1-3/+2
If pgshift is 63 then BITS_PER_TYPE(*bitmap->bitmap) * pgsize will overflow to 0 and this triggers divide by 0. In this case the index should just be 0, so reorganize things to divide by shift and avoid hitting any overflows. Link: https://patch.msgid.link/r/0-v1-663679b57226+172-iommufd_dirty_div0_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: 58ccf0190d19 ("vfio: Add an IOVA bitmap support") Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reported-by: syzbot+093a8a8b859472e6c257@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=093a8a8b859472e6c257 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-10-20cifs: Add a couple of missing smb3_rw_credits tracepointsDavid Howells1-0/+8
Add missing smb3_rw_credits tracepoints to cifs_readv_callback() (for SMB1) to match those of SMB2/3. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.org> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-10-20MAINTAINERS: Update Alex Williamson's email addressAlex Williamson2-2/+3
Switch to a personal email account as I'll be leaving Red Hat soon. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20251013152613.3088777-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-10-20ACPI: property: Fix argument order in __acpi_node_get_property_reference()Sunil V L1-1/+1
A refactoring bug introduced an argument order mistake in the call to acpi_fwnode_get_reference_args() from __acpi_node_get_property_reference(). This caused incorrect behavior when resolving ACPI property references. Fix the issue by correcting the argument order. Fixes: e121be784d35 ("ACPI: property: Refactor acpi_fwnode_get_reference_args() to support nargs_prop") Reported-by: Thomas Richard <thomas.richard@bootlin.com> Closes: https://lore.kernel.org/all/1241f2b6-9b4e-4623-8a83-77db8774ac32@bootlin.com/ Tested-by: Thomas Richard <thomas.richard@bootlin.com> Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251017100744.71871-1-sunilvl@ventanamicro.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-20Revert "cpuidle: menu: Avoid discarding useful information"Rafael J. Wysocki1-12/+9
It is reported that commit 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") led to a performance regression on Intel Jasper Lake systems because it reduced the time spent by CPUs in idle state C7 which is correlated to the maximum frequency the CPUs can get to because of an average running power limit [1]. Before that commit, get_typical_interval() would have returned UINT_MAX whenever it had been unable to make a high-confidence prediction which had led to selecting the deepest available idle state too often and both power and performance had been inadequate as a result of that on some systems. However, this had not been a problem on systems with relatively aggressive average running power limits, like the Jasper Lake systems in question, because on those systems it was compensated by the ability to run CPUs faster. It was addressed by causing get_typical_interval() to return a number based on the recent idle duration information available to it even if it could not make a high-confidence prediction, but that clearly did not take the possible correlation between idle power and available CPU capacity into account. For this reason, revert most of the changes made by commit 85975daeaa4d, except for one cosmetic cleanup, and add a comment explaining the rationale for returning UINT_MAX from get_typical_interval() when it is unable to make a high-confidence prediction. Fixes: 85975daeaa4d ("cpuidle: menu: Avoid discarding useful information") Closes: https://lore.kernel.org/linux-pm/36iykr223vmcfsoysexug6s274nq2oimcu55ybn6ww4il3g3cv@cohflgdbpnq7/ [1] Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/3663603.iIbC2pHGDl@rafael.j.wysocki
2025-10-20kunit: test_dev_action: Correctly cast 'priv' pointer to long*Florian Schmaus1-1/+1
The previous implementation incorrectly assumed the original type of 'priv' was void**, leading to an unnecessary and misleading cast. Correct the cast of the 'priv' pointer in test_dev_action() to its actual type, long*, removing an unnecessary cast. As an additional benefit, this fixes an out-of-bounds CHERI fault on hardware with architectural capabilities. The original implementation tried to store a capability-sized pointer using the priv pointer. However, the priv pointer's capability only granted access to the memory region of its original long type, leading to a bounds violation since the size of a long is smaller than the size of a capability. This change ensures that the pointer usage respects the capabilities' bounds. Link: https://lore.kernel.org/r/20251017092814.80022-1-florian.schmaus@codasip.com Fixes: d03c720e03bd ("kunit: Add APIs for managing devices") Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Florian Schmaus <florian.schmaus@codasip.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2025-10-20timekeeping: Fix aux clocks sysfs initialization loop boundHaofeng Li1-1/+1
The loop in tk_aux_sysfs_init() uses `i <= MAX_AUX_CLOCKS` as the termination condition, which results in 9 iterations (i=0 to 8) when MAX_AUX_CLOCKS is defined as 8. However, the kernel is designed to support only up to 8 auxiliary clocks. This off-by-one error causes the creation of a 9th sysfs entry that exceeds the intended auxiliary clock range. Fix the loop bound to use `i < MAX_AUX_CLOCKS` to ensure exactly 8 auxiliary clock entries are created, matching the design specification. Fixes: 7b95663a3d96 ("timekeeping: Provide interface to control auxiliary clocks") Signed-off-by: Haofeng Li <lihaofeng@kylinos.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/tencent_2376993D9FC06A3616A4F981B3DE1C599607@qq.com
2025-10-20drm/i915/panic: fix panic structure allocation memory leakJani Nikula1-12/+13
Separating the panic allocation from framebuffer allocation in commit 729c5f7ffa83 ("drm/{i915,xe}/panic: move framebuffer allocation where it belongs") failed to deallocate the panic structure anywhere. The fix is two-fold. First, free the panic structure in intel_user_framebuffer_destroy() in the general case. Second, move the panic allocation later to intel_framebuffer_init() to not leak the panic structure in error paths (if any, now or later) between intel_framebuffer_alloc() and intel_framebuffer_init(). v2: Rebase Fixes: 729c5f7ffa83 ("drm/{i915,xe}/panic: move framebuffer allocation where it belongs") Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: Maarten Lankhorst <dev@lankhorst.se> Reported-by: Michał Grzelak <michal.grzelak@intel.com> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Michał Grzelak <michal.grzelak@intel.com> # v1 Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20251015095135.2183415-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 8f8ef09fcf6a3b00369bfc704e8f68d7474eca94) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-10-20x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in ↵Babu Moger2-10/+17
mbm_event mode The following NULL pointer dereference is encountered on mount of resctrl fs after booting a system that supports assignable counters with the "rdt=!mbmtotal,!mbmlocal" kernel parameters: BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:mbm_cntr_get Call Trace: rdtgroup_assign_cntr_event rdtgroup_assign_cntrs rdt_get_tree Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features and the MBM events they represent. This results in the per-domain MBM event related data structures to not be allocated during early initialization. resctrl fs initialization follows by implicitly enabling both MBM total and local events on a system that supports assignable counters (mbm_event mode), but this enabling occurs after the per-domain data structures have been created. After booting, resctrl fs assumes that an enabled event can access all its state. This results in NULL pointer dereference when resctrl attempts to access the un-allocated structures of an enabled event. Remove the late MBM event enabling from resctrl fs. This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter (mbm_event) mode is enabled without any events to support. Switching between the "default" and "mbm_event" mode without any events is not practical. Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL or X86_FEATURE_CQM_MBM_LOCAL. This ensures all needed MBM related data structures are created before use and that it is only possible to switch between "default" and "mbm_event" mode when the same events are available in both modes. This dependency does not exist in the hardware but this usage of these feature settings work for known systems. [ bp: Massage commit message. ] Fixes: 13390861b426e ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details") Co-developed-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com
2025-10-20scsi: core: Fix a regression triggered by scsi_host_busy()Bart Van Assche1-2/+3
Commit 995412e23bb2 ("blk-mq: Replace tags->lock with SRCU for tag iterators") introduced the following regression: Call trace: __srcu_read_lock+0x30/0x80 (P) blk_mq_tagset_busy_iter+0x44/0x300 scsi_host_busy+0x38/0x70 ufshcd_print_host_state+0x34/0x1bc ufshcd_link_startup.constprop.0+0xe4/0x2e0 ufshcd_init+0x944/0xf80 ufshcd_pltfrm_init+0x504/0x820 ufs_rockchip_probe+0x2c/0x88 platform_probe+0x5c/0xa4 really_probe+0xc0/0x38c __driver_probe_device+0x7c/0x150 driver_probe_device+0x40/0x120 __driver_attach+0xc8/0x1e0 bus_for_each_dev+0x7c/0xdc driver_attach+0x24/0x30 bus_add_driver+0x110/0x230 driver_register+0x68/0x130 __platform_driver_register+0x20/0x2c ufs_rockchip_pltform_init+0x1c/0x28 do_one_initcall+0x60/0x1e0 kernel_init_freeable+0x248/0x2c4 kernel_init+0x20/0x140 ret_from_fork+0x10/0x20 Fix this regression by making scsi_host_busy() check whether the SCSI host tag set has already been initialized. tag_set->ops is set by scsi_mq_setup_tags() just before blk_mq_alloc_tag_set() is called. This fix is based on the assumption that scsi_host_busy() and scsi_mq_setup_tags() calls are serialized. This is the case in the UFS driver. Reported-by: Sebastian Reichel <sebastian.reichel@collabora.com> Closes: https://lore.kernel.org/linux-block/pnezafputodmqlpumwfbn644ohjybouveehcjhz2hmhtcf2rka@sdhoiivync4y/ Cc: Ming Lei <ming.lei@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://patch.msgid.link/20251007214800.1678255-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-20Revert "PCI: qcom: Prepare for the DWC ECAM enablement"Krishna Chaitanya Chundru1-68/+0
This reverts commit 4660e50cf81800f82eeecf743ad1e3e97ab72190. Commit f6fd357f7afb ("PCI: dwc: Prepare the driver for enabling ECAM mechanism using iATU 'CFG Shift Feature'") enabled ECAM access by using the config space start as DBI address. However, this approach breaks vendor drivers that rely on the DBI address for internal accesses, especially when the vendor config space is 256MB aligned. To resolve this, avoid using the DBI as the start of config space and instead introduce a custom ECAM PCI ops implementation. Revert the qcom specific ECAM preparation logic in 4660e50cf818 ("PCI: qcom: Prepare for the DWC ECAM enablement") since it's no longer necessary. Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251017-ecam_fix-v1-2-f6faa3d0edf3@oss.qualcomm.com
2025-10-20PCI: dwc: Use custom pci_ops for root bus DBI vs ECAM config accessKrishna Chaitanya Chundru1-4/+24
When the vendor configuration space is 256MB aligned, the DesignWare PCIe host driver enables ECAM access and sets the DBI base to the start of the config space. This causes vendor drivers to incorrectly program iATU regions, as they rely on the DBI address for internal accesses. To fix this, avoid overwriting the DBI base when ECAM is enabled. Instead, introduce a custom pci_ops that accesses the DBI region directly for the root bus and uses ECAM for other buses. Fixes: f6fd357f7afb ("PCI: dwc: Prepare the driver for enabling ECAM mechanism using iATU 'CFG Shift Feature'") Reported-by: Ron Economos <re@w6rz.net> Closes: https://lore.kernel.org/all/eac81c57-1164-4d74-a1b4-6f353c577731@w6rz.net/ Suggested-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Ron Economos <re@w6rz.net> Link: https://patch.msgid.link/20251017-ecam_fix-v1-1-f6faa3d0edf3@oss.qualcomm.com
2025-10-20nbd: override creds to kernel when calling sock_{send,recv}msg()Ondrej Mosnacek1-0/+15
sock_{send,recv}msg() internally calls security_socket_{send,recv}msg(), which does security checks (e.g. SELinux) for socket access against the current task. However, _sock_xmit() in drivers/block/nbd.c may be called indirectly from a userspace syscall, where the NBD socket access would be incorrectly checked against the calling userspace task (which simply tries to read/write a file that happens to reside on an NBD device). To fix this, temporarily override creds to kernel ones before calling the sock_*() functions. This allows the security modules to recognize this as internal access by the kernel, which will normally be allowed. A way to trigger the issue is to do the following (on a system with SELinux set to enforcing): ### Create nbd device: truncate -s 256M /tmp/testfile nbd-server localhost:10809 /tmp/testfile ### Connect to the nbd server: nbd-client localhost ### Create mdraid array mdadm --create -l 1 -n 2 /dev/md/testarray /dev/nbd0 missing After these steps, assuming the SELinux policy doesn't allow the unexpected access pattern, errors will be visible on the kernel console: [ 142.204243] nbd0: detected capacity change from 0 to 524288 [ 165.189967] md: async del_gendisk mode will be removed in future, please upgrade to mdadm-4.5+ [ 165.252299] md/raid1:md127: active with 1 out of 2 mirrors [ 165.252725] md127: detected capacity change from 0 to 522240 [ 165.255434] block nbd0: Send control failed (result -13) [ 165.255718] block nbd0: Request send failed, requeueing [ 165.256006] block nbd0: Dead connection, failed to find a fallback [ 165.256041] block nbd0: Receive control failed (result -32) [ 165.256423] block nbd0: shutting down sockets [ 165.257196] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.257736] Buffer I/O error on dev md127, logical block 0, async page read [ 165.258263] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.259376] Buffer I/O error on dev md127, logical block 0, async page read [ 165.259920] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.260628] Buffer I/O error on dev md127, logical block 0, async page read [ 165.261661] ldm_validate_partition_table(): Disk read failed. [ 165.262108] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.262769] Buffer I/O error on dev md127, logical block 0, async page read [ 165.263697] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.264412] Buffer I/O error on dev md127, logical block 0, async page read [ 165.265412] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.265872] Buffer I/O error on dev md127, logical block 0, async page read [ 165.266378] I/O error, dev nbd0, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.267168] Buffer I/O error on dev md127, logical block 0, async page read [ 165.267564] md127: unable to read partition table [ 165.269581] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.269960] Buffer I/O error on dev nbd0, logical block 0, async page read [ 165.270316] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.270913] Buffer I/O error on dev nbd0, logical block 0, async page read [ 165.271253] I/O error, dev nbd0, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2 [ 165.271809] Buffer I/O error on dev nbd0, logical block 0, async page read [ 165.272074] ldm_validate_partition_table(): Disk read failed. [ 165.272360] nbd0: unable to read partition table [ 165.289004] ldm_validate_partition_table(): Disk read failed. [ 165.289614] nbd0: unable to read partition table The corresponding SELinux denial on Fedora/RHEL will look like this (assuming it's not silenced): type=AVC msg=audit(1758104872.510:116): avc: denied { write } for pid=1908 comm="mdadm" laddr=::1 lport=32772 faddr=::1 fport=10809 scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=tcp_socket permissive=0 The respective backtrace looks like this: @security[mdadm, -13, handshake_exit+221615650 handshake_exit+221615650 handshake_exit+221616465 security_socket_sendmsg+5 sock_sendmsg+106 handshake_exit+221616150 sock_sendmsg+5 __sock_xmit+162 nbd_send_cmd+597 nbd_handle_cmd+377 nbd_queue_rq+63 blk_mq_dispatch_rq_list+653 __blk_mq_do_dispatch_sched+184 __blk_mq_sched_dispatch_requests+333 blk_mq_sched_dispatch_requests+38 blk_mq_run_hw_queue+239 blk_mq_dispatch_plug_list+382 blk_mq_flush_plug_list.part.0+55 __blk_flush_plug+241 __submit_bio+353 submit_bio_noacct_nocheck+364 submit_bio_wait+84 __blkdev_direct_IO_simple+232 blkdev_read_iter+162 vfs_read+591 ksys_read+95 do_syscall_64+92 entry_SYSCALL_64_after_hwframe+120 ]: 1 The issue has started to appear since commit 060406c61c7c ("block: add plug while submitting IO"). Cc: Ming Lei <ming.lei@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2348878 Fixes: 060406c61c7c ("block: add plug while submitting IO") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>