aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/stackcollapse.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-08-19xfs: reject swapon for inodes on a zoned file system earlierChristoph Hellwig1-0/+3
No point in going down into the iomap mapping loop when we know it will be rejected. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-19xfs: kick off inodegc when failing to reserve zoned blocksChristoph Hellwig1-0/+6
XFS processes truncating unlinked inodes asynchronously and thus the free space pool only sees them with a delay. The non-zoned write path thus calls into inodegc to accelerate this processing before failing an allocation due the lack of free blocks. Do the same for the zoned space reservation. Fixes: 0bb2193056b5 ("xfs: add support for zoned space reservations") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-19xfs: remove xfs_last_used_zoneChristoph Hellwig1-43/+2
This was my first attempt at caching the last used zone. But it turns out for O_DIRECT or RWF_DONTCACHE that operate concurrently or in very short sequence, the bmap btree does not record a written extent yet, so it fails. Because it then still finds the last written zone it can lead to a weird ping-pong around a few zones with writers seeing different values. Remove it entirely as the later added xfs_cached_zone actually does a much better job enforcing the locality as the zone is associated with the inode in the MRU cache as soon as the zone is selected. Fixes: 4e4d52075577 ("xfs: add the zoned space allocator") Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-19xfs: Default XFS_RT to Y if CONFIG_BLK_DEV_ZONED is enabledDamien Le Moal1-0/+1
XFS support for zoned block devices requires the realtime subvolume support (XFS_RT) to be enabled. Change the default configuration value of XFS_RT from N to CONFIG_BLK_DEV_ZONED to align with this requirement. This change still allows the user to disable XFS_RT if this feature is not desired for the user use case. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11xfs: split xfs_zone_record_blocksChristoph Hellwig2-13/+30
xfs_zone_record_blocks not only records successfully written blocks that now back file data, but is also used for blocks speculatively written by garbage collection that were never linked to an inode and instantly become invalid. Split the latter functionality out to be easier to understand. This also make it clear that we don't need to attach the rmap inode to a transaction for the skipped blocks case as we never dirty any peristent data structure. Also make the argument order to xfs_zone_record_blocks a bit more natural. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11xfs: fix scrub trace with null pointer in quotacheckAndrey Albershteyn1-1/+1
The quotacheck doesn't initialize sc->ip. Cc: stable@vger.kernel.org # v6.8 Fixes: 21d7500929c8a0 ("xfs: improve dquot iteration for scrub") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11xfs: reject max_atomic_write mount option for no reflinkJohn Garry1-0/+19
If the FS has no reflink, then atomic writes greater than 1x block are not supported. As such, for no reflink it is pointless to accept setting max_atomic_write when it cannot be supported, so reject max_atomic_write mount option in this case. It could be still possible to accept max_atomic_write option of size 1x block if HW atomics are supported, so check for this specifically. Fixes: 4528b9052731 ("xfs: allow sysadmins to specify a maximum atomic write limit at mount time") Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11xfs: disallow atomic writes on DAXJohn Garry3-5/+17
Atomic writes are not currently supported for DAX, but two problems exist: - we may go down DAX write path for IOCB_ATOMIC, which does not handle IOCB_ATOMIC properly - we report non-zero atomic write limits in statx (for DAX inodes) We may want atomic writes support on DAX in future, but just disallow for now. For this, ensure when IOCB_ATOMIC is set that we check the write size versus the atomic write min and max before branching off to the DAX write path. This is not strictly required for DAX, as we should not get this far in the write path as FMODE_CAN_ATOMIC_WRITE should not be set. In addition, due to reflink being supported for DAX, we automatically get CoW-based atomic writes support being advertised. Remedy this by disallowing atomic writes for a DAX inode for both sw and hw modes. Reported-by: Darrick J. Wong <djwong@kernel.org> Fixes: 9dffc58f2384 ("xfs: update atomic write limits") Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11fs/dax: Reject IOCB_ATOMIC in dax_iomap_rw()John Garry1-0/+3
The DAX write path does not support IOCB_ATOMIC, so reject it when set. Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11xfs: remove XFS_IBULK_SAME_AGChristoph Hellwig3-17/+7
Add a new field to struct xfs_ibulk to directly pass XFS_IWALK* flags, and thus remove the need to indirect the SAME_AG flag through XFS_IBULK*. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11xfs: fully decouple XFS_IBULK* flags from XFS_IWALK* flagsChristoph Hellwig1-1/+5
Fix up xfs_inumbers to now pass in the XFS_IBULK* flags into the flags argument to xfs_inobt_walk, which expects the XFS_IWALK* flags. Currently passing the wrong flags works for non-debug builds because the only XFS_IWALK* flag has the same encoding as the corresponding XFS_IBULK* flag, but in debug builds it can trigger an assert that no incorrect flag is passed. Instead just extra the relevant flag. Fixes: 5b35d922c52798 ("xfs: Decouple XFS_IBULK flags from XFS_IWALK flags") Cc: <stable@vger.kernel.org> # v5.19 Reported-by: cen zhang <zzzccc427@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-11xfs: fix frozen file system assert in xfs_trans_allocChristoph Hellwig1-1/+1
Commit 83a80e95e797 ("xfs: decouple xfs_trans_alloc_empty from xfs_trans_alloc") move the place of the assert for a frozen file system after the sb_start_intwrite call that ensures it doesn't run on frozen file systems, and thus allows to incorrect trigger it. Fix that by moving it back to where it belongs. Fixes: 83a80e95e797 ("xfs: decouple xfs_trans_alloc_empty from xfs_trans_alloc") Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-08-10Linux 6.17-rc1v6.17-rc1Linus Torvalds1-2/+2
2025-08-09tools/power turbostat: version 2025.09.09Len Brown1-1/+1
Probe and display L3 Cache topology Add ability to average an added counter (useful for pre-integrated "counters", such as Watts) Break the limit of 64 built-in counters. Assorted bug fixes and minor feature tweaks Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09tools/power turbostat: Handle non-root legacy-uncore sysfs permissionsLen Brown1-1/+2
/sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/ may be readable by all, but /sys/devices/system/cpu/intel_uncore_frequency/package_X_die_Y/current_freq_khz may be readable only by root. Non-root turbostat users see complaints in this scenario. Fail probe of the interface if we can't read current_freq_khz. Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Original-patch-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09tools/power turbostat: standardize PER_THREAD_PARAMSLen Brown1-20/+22
use a macro for PER_THREAD_PARAMS to make adding one later more clear. no functional change Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09tools/power turbostat: Fix DMR supportZhang Rui1-14/+15
Together with the RAPL MSRs, there are more MSRs gone on DMR, including PLR (Perf Limit Reasons), and IRTL (Package cstate Interrupt Response Time Limit) MSRs. The configurable TDP info should also be retrieved from TPMI based Intel Speed Select Technology feature. Remove the access of these MSRs for DMR. Improve the DMR platform feature table to make it more readable at the same time. Fixes: 83075bd59de2 ("tools/power turbostat: Add initial support for DMR") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09tools/power turbostat: add format "average" for external attributesMichael Hebenstreit2-11/+22
External atributes with format "raw" are not printed in summary lines for nodes/packages (or with option -S). The new format "average" behaves like "raw" but also adds the summary data Signed-off-by: Michael Hebenstreit <michael.hebenstreit@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09tools/power turbostat: delete GET_PKG()Len Brown1-15/+6
pkg_base[pkg_id] is a simple array of structure pointers, let the compiler treat it that way. Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09tools/power turbostat: probe and display L3 cache topologyLen Brown1-3/+31
Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09tools/power turbostat: Support more than 64 built-in-countersLen Brown1-154/+406
We have out-grown the ability to use a 64-bit memory location to inventory every possible built-in counter. Leverage the the CPU_SET(3) macros to break this barrier. Also, break the Joules & Watts counters into two, since we can no longer 'or' them together... Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-09tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columnsLen Brown1-0/+8
Explain the meaning of the Totl%C0, Any%C0, GFX%C0, CPUGFX% columns. Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08tools/power turbostat: Fix bogus SysWatt for forked programZhang Rui1-0/+1
Similar to delta_cpu(), delta_platform() is called in turbostat main loop. This ensures accurate SysWatt readings in periodic monitoring mode $ sudo turbostat -S -q --show power -i 1 CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt 60 61 6.21 1.13 0.16 0.00 0.00 0.00 13.07 58 61 6.00 1.07 0.18 0.00 0.00 0.00 12.75 58 61 5.74 1.05 0.17 0.00 0.00 0.00 12.22 58 60 6.27 1.11 0.24 0.00 0.00 0.00 13.55 However, delta_platform() is missing for forked program and causes bogus SysWatt reporting, $ sudo turbostat -S -q --show power sleep 1 1.004736 sec CoreTmp PkgTmp PkgWatt CorWatt GFXWatt RAMWatt PKG_% RAM_% SysWatt 57 58 6.05 1.02 0.16 0.00 0.00 0.00 0.03 Add missing delta_platform() for forked program. Fixes: e5f687b89bc2 ("tools/power turbostat: Add RAPL psys as a built-in counter") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08tools/power turbostat: Handle cap_get_proc() ENOSYSCalvin Owens1-1/+9
Kernels configured with CONFIG_MULTIUSER=n have no cap_get_proc(). Check for ENOSYS to recognize this case, and continue on to attempt to access the requested MSRs (such as temperature). Signed-off-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08tools/power turbostat: Fix build with muslCalvin Owens1-0/+1
turbostat.c: In function 'parse_int_file': turbostat.c:5567:19: error: 'PATH_MAX' undeclared (first use in this function) 5567 | char path[PATH_MAX]; | ^~~~~~~~ turbostat.c: In function 'probe_graphics': turbostat.c:6787:19: error: 'PATH_MAX' undeclared (first use in this function) 6787 | char path[PATH_MAX]; | ^~~~~~~~ Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08tools/power turbostat: verify arguments to params --show and --hideLen Brown1-2/+31
$ sudo turbostat --quiet --show junk turbostat: Counter 'junk' can not be added. Previously, invalid arguments to --show and --hide were silently ignored Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-08-08io_uring/memmap: cast nr_pages to size_t before shiftingJens Axboe1-1/+1
If the allocated size exceeds UINT_MAX, then it's necessary to cast the mr->nr_pages value to size_t to prevent it from overflowing. In practice this isn't much of a concern as the required memory size will have been validated upfront, and accounted to the user. And > 4GB sizes will be necessary to make the lack of a cast a problem, which greatly exceeds normal user locked_vm settings that are generally in the kb to mb range. However, if root is used, then accounting isn't done, and then it's possible to hit this issue. Link: https://lore.kernel.org/all/6895b298.050a0220.7f033.0059.GAE@google.com/ Cc: stable@vger.kernel.org Reported-by: syzbot+23727438116feb13df15@syzkaller.appspotmail.com Fixes: 087f997870a9 ("io_uring/memmap: implement mmap for regions") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-07mailbox/pcc: support mailbox management of the shared bufferAdam Young2-4/+127
Define a new, optional, callback that allows the driver to specify how the return data buffer is allocated. If that callback is set, mailbox/pcc.c is now responsible for reading from and writing to the PCC shared buffer. This also allows for proper checks of the Commnand complete flag between the PCC sender and receiver. For Type 4 channels, initialize the command complete flag prior to accepting messages. Since the mailbox does not know what memory allocation scheme to use for response messages, the client now has an optional callback that allows it to allocate the buffer for a response message. When an outbound message is written to the buffer, the mailbox checks for the flag indicating the client wants an tx complete notification via IRQ. Upon receipt of the interrupt It will pair it with the outgoing message. The expected use is to free the kernel memory buffer for the previous outgoing message. Signed-off-by: Adam Young <admiyo@os.amperecomputing.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2025-08-07smb: server: Fix extension string in ksmbd_extract_shortname()Thorsten Blum1-1/+1
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the length of the source string (excluding the NUL terminator) rather than the size of the destination buffer. This results in "__" being copied to 'extension' rather than "___" (two underscores instead of three). Use the destination buffer size instead to ensure that the string "___" (three underscores) is copied correctly. Cc: stable@vger.kernel.org Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07ksmbd: limit repeated connections from clients with the same IPNamjae Jeon2-0/+18
Repeated connections from clients with the same IP address may exhaust the max connections and prevent other normal client connections. This patch limit repeated connections from clients with the same IP. Reported-by: tianshuo han <hantianshuo233@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07smb: client: only use a single wait_queue to monitor smbdirect connection statusStefan Metzmacher2-11/+9
There's no need for separate conn_wait and disconn_wait queues. This will simplify the move to common code, the server code already a single wait_queue for this. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07smb: client: don't call init_waitqueue_head(&info->conn_wait) twice in ↵Stefan Metzmacher1-1/+0
_smbd_get_connection It is already called long before we may hit this cleanup code path. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07smb: client: improve logging in smbd_conn_upcall()Stefan Metzmacher1-4/+10
Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07smb: client: return an error if rdma_connect does not return within 5 secondsStefan Metzmacher1-2/+4
This matches the timeout for tcp connections. Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Fixes: f198186aa9bb ("CIFS: SMBD: Establish SMB Direct connection") Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-08-07PCI: vmd: Fix wrong kfree() in vmd_msi_free()Nam Cao1-1/+3
vmd_msi_alloc() allocates struct vmd_irq and stashes it into irq_data->chip_data associated with the VMD's interrupt domain. vmd_msi_free() extracts the pointer by calling irq_get_chip_data() and frees it. irq_get_chip_data() returns the chip_data associated with the top interrupt domain. This worked in the past because VMD's interrupt domain was the top domain. But d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()") changed the interrupt domain hierarchy so VMD's interrupt domain is not the top domain anymore. irq_get_chip_data() now returns the chip_data at the MSI devices' interrupt domains. It is therefore broken for vmd_msi_free() to kfree() this chip_data. Fix by extracting the chip_data associated with the VMD's interrupt domain. Fixes: d7d8ab87e3e7 ("PCI: vmd: Switch to msi_create_parent_irq_domain()") Reported-by: Kenneth Crudup <kenny@panix.com> Closes: https://lore.kernel.org/linux-pci/dfa40e48-8840-4e61-9fda-25cdb3ad81c1@panix.com/ Reported-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Closes: https://lore.kernel.org/linux-pci/ed53280ed15d1140700b96cca2734bf327ee92539e5eb68e80f5bbbf0f01@linux.gnuweeb.org/ Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Tested-by: Kenneth Crudup <kenny@panix.com> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20250807081051.2253962-1-namcao@linutronix.de
2025-08-07perf bpf-filter: Enable events manuallyIlya Leoshkevich1-1/+4
On s390, and, in general, on all platforms where the respective event supports auxiliary data gathering, the command: # ./perf record -u 0 -aB --synth=no -- ./perf test -w thloop [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.011 MB perf.data ] # ./perf report --stats | grep SAMPLE # does not generate samples in the perf.data file. On x86 the command: # sudo perf record -e intel_pt// -u 0 ls is broken too. Looking at the sequence of calls in 'perf record' reveals this behavior: 1. The event 'cycles' is created and enabled: record__open() +-> evlist__apply_filters() +-> perf_bpf_filter__prepare() +-> bpf_program.attach_perf_event() +-> bpf_program.attach_perf_event_opts() +-> __GI___ioctl(..., PERF_EVENT_IOC_ENABLE, ...) The event 'cycles' is enabled and active now. However the event's ring-buffer to store the samples generated by hardware is not allocated yet. 2. The event's fd is mmap()ed to create the ring buffer: record__open() +-> record__mmap() +-> record__mmap_evlist() +-> evlist__mmap_ex() +-> perf_evlist__mmap_ops() +-> mmap_per_cpu() +-> mmap_per_evsel() +-> mmap__mmap() +-> perf_mmap__mmap() +-> mmap() This allocates the ring buffer for the event 'cycles'. With mmap() the kernel creates the ring buffer: perf_mmap(): kernel function to create the event's ring | buffer to save the sampled data. | +-> ring_buffer_attach(): Allocates memory for ring buffer. | The PMU has auxiliary data setup function. The | has_aux(event) condition is true and the PMU's | stop() is called to stop sampling. It is not | restarted: | | if (has_aux(event)) | perf_event_stop(event, 0); | +-> cpumsf_pmu_stop(): Hardware sampling is stopped. No samples are generated and saved anymore. 3. After the event 'cycles' has been mapped, the event is enabled a second time in: __cmd_record() +-> evlist__enable() +-> __evlist__enable() +-> evsel__enable_cpu() +-> perf_evsel__enable_cpu() +-> perf_evsel__run_ioctl() +-> perf_evsel__ioctl() +-> __GI___ioctl(., PERF_EVENT_IOC_ENABLE, .) The second ioctl(fd, PERF_EVENT_IOC_ENABLE, 0); is just a NOP in this case. The first invocation in (1.) sets the event::state to PERF_EVENT_STATE_ACTIVE. The kernel functions perf_ioctl() +-> _perf_ioctl() +-> _perf_event_enable() +-> __perf_event_enable() return immediately because event::state is already set to PERF_EVENT_STATE_ACTIVE. This happens on s390, because the event 'cycles' offers the possibility to save auxilary data. The PMU callbacks setup_aux() and free_aux() are defined. Without both callback functions, cpumsf_pmu_stop() is not invoked and sampling continues. To remedy this, remove the first invocation of ioctl(..., PERF_EVENT_IOC_ENABLE, ...). in step (1.) Create the event in step (1.) and enable it in step (3.) after the ring buffer has been mapped. Output after: # ./perf record -aB --synth=no -u 0 -- ./perf test -w thloop 2 [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.876 MB perf.data ] # ./perf report --stats | grep SAMPLE SAMPLE events: 16200 (99.5%) SAMPLE events: 16200 # The software event succeeded both before and after the patch: # ./perf record -e cpu-clock -aB --synth=no -u 0 -- \ ./perf test -w thloop 2 [ perf record: Woken up 7 times to write data ] [ perf record: Captured and wrote 2.870 MB perf.data ] # ./perf report --stats | grep SAMPLE SAMPLE events: 53506 (99.8%) SAMPLE events: 53506 # Fixes: b4c658d4d63d61 ("perf target: Remove uid from target") Suggested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Co-developed-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250806162417.19666-3-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-07libbpf: Add the ability to suppress perf event enablementIlya Leoshkevich2-6/+11
Automatically enabling a perf event after attaching a BPF prog to it is not always desirable. Add a new "dont_enable" field to struct bpf_perf_event_opts. While introducing "enable" instead would be nicer in that it would avoid a double negation in the implementation, it would make DECLARE_LIBBPF_OPTS() less efficient. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Suggested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Co-developed-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20250806162417.19666-2-iii@linux.ibm.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-07pptp: fix pptp_xmit() error pathEric Dumazet1-3/+4
I accidentally added a bug in pptp_xmit() that syzbot caught for us. Only call ip_rt_put() if a route has been allocated. BUG: unable to handle page fault for address: ffffffffffffffdb PGD df3b067 P4D df3b067 PUD df3d067 PMD 0 Oops: Oops: 0002 [#1] SMP KASAN PTI CPU: 1 UID: 0 PID: 6346 Comm: syz.0.336 Not tainted 6.16.0-next-20250804-syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025 RIP: 0010:arch_atomic_add_return arch/x86/include/asm/atomic.h:85 [inline] RIP: 0010:raw_atomic_sub_return_release include/linux/atomic/atomic-arch-fallback.h:846 [inline] RIP: 0010:atomic_sub_return_release include/linux/atomic/atomic-instrumented.h:327 [inline] RIP: 0010:__rcuref_put include/linux/rcuref.h:109 [inline] RIP: 0010:rcuref_put+0x172/0x210 include/linux/rcuref.h:173 Call Trace: <TASK> dst_release+0x24/0x1b0 net/core/dst.c:167 ip_rt_put include/net/route.h:285 [inline] pptp_xmit+0x14b/0x1a90 drivers/net/ppp/pptp.c:267 __ppp_channel_push+0xf2/0x1c0 drivers/net/ppp/ppp_generic.c:2166 ppp_channel_push+0x123/0x660 drivers/net/ppp/ppp_generic.c:2198 ppp_write+0x2b0/0x400 drivers/net/ppp/ppp_generic.c:544 vfs_write+0x27b/0xb30 fs/read_write.c:684 ksys_write+0x145/0x250 fs/read_write.c:738 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: de9c4861fb42 ("pptp: ensure minimal skb length in pptp_xmit()") Reported-by: syzbot+27d7cfbc93457e472e00@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/689095a5.050a0220.1fc43d.0009.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250807142146.2877060-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-07lib/sbitmap: make sbitmap_get_shallow() internalYu Kuai2-19/+16
Because it's only used in sbitmap.c Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250807032413.1469456-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-07lib/sbitmap: convert shallow_depth from one word to the whole sbitmapYu Kuai6-73/+52
Currently elevators will record internal 'async_depth' to throttle asynchronous requests, and they both calculate shallow_dpeth based on sb->shift, with the respect that sb->shift is the available tags in one word. However, sb->shift is not the availbale tags in the last word, see __map_depth: if (index == sb->map_nr - 1) return sb->depth - (index << sb->shift); For consequence, if the last word is used, more tags can be get than expected, for example, assume nr_requests=256 and there are four words, in the worst case if user set nr_requests=32, then the first word is the last word, and still use bits per word, which is 64, to calculate async_depth is wrong. One the ohter hand, due to cgroup qos, bfq can allow only one request to be allocated, and set shallow_dpeth=1 will still allow the number of words request to be allocated. Fix this problems by using shallow_depth to the whole sbitmap instead of per word, also change kyber, mq-deadline and bfq to follow this, a new helper __map_depth_with_shallow() is introduced to calculate available bits in each word. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250807032413.1469456-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-07nvmet: exit debugfs after discovery subsystem exitsMohamed Khalfella1-1/+1
Commit 528589947c180 ("nvmet: initialize discovery subsys after debugfs is initialized") changed nvmet_init() to initialize nvme discovery after "nvmet" debugfs directory is initialized. The change broke nvmet_exit() because discovery subsystem now depends on debugfs. Debugfs should be destroyed after discovery subsystem. Fix nvmet_exit() to do that. Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/CAHj4cs96AfFQpyDKF_MdfJsnOEo=2V7dQgqjFv+k3t7H-=yGhA@mail.gmail.com/ Fixes: 528589947c180 ("nvmet: initialize discovery subsys after debugfs is initialized") Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/r/20250807053507.2794335-1-mkhalfella@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-08-07treewide: rename GPIO set callbacks back to their original namesBartosz Golaszewski282-356/+355
The conversion of all GPIO drivers to using the .set_rv() and .set_multiple_rv() callbacks from struct gpio_chip (which - unlike their predecessors - return an integer and allow the controller drivers to indicate failures to users) is now complete and the legacy ones have been removed. Rename the new callbacks back to their original names in one sweeping change. Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-08-07gpio: remove legacy GPIO line value setter callbacksBartosz Golaszewski2-28/+6
With no more users of the legacy GPIO line value setters - .set() and .set_multiple() - we can now remove them from the kernel. Link: https://lore.kernel.org/r/20250725074651.14002-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-08-07ALSA: hda/cirrus: Restrict prompt only for CONFIG_EXPERTTakashi Iwai1-8/+12
The split of Cirrus HD-audio codec driver may confuse users when migrating from the previous kernel configs and leave the needed drivers disabled. Although we've already set y as default, it's still safer to paper over the wrong choices. This patch marks the prompt of split CS420x and CS421x codec drivers with CONFIG_EXPERT, so that they are all enabled when the top-level CONFIG_SND_HDA_CODEC_CIRRUS is set. For users who really care about the minimalistic configuration, they can turn each driver on/off individually after setting CONFIG_EXPERT=y. This patch adds the missing help text to the top-level CONFIG_SND_HDA_CIRRUS_CODEC together with the explanation of individual choices, and corrects the help texts that don't fit well nowadays, too. Fixes: 1cb8744a36c7 ("ALSA: hda/cirrus: Split to cs420x and cs421x drivers") Link: https://lore.kernel.org/10172c80-daec-4e20-ab57-a483cf1afc02@molgen.mpg.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250806192541.21949-4-tiwai@suse.de
2025-08-07ALSA: hda/hdmi: Restrict prompt only for CONFIG_EXPERTTakashi Iwai1-7/+13
The split of HDMI codec driver may confuse users when migrating from the previous kernel configs and leave some drivers disabled unexpectedly. Although we've already set y to all HDMI codec drivers as default, it's still safer to paper over the wrong choices. This patch marks the prompt of each HDMI codec driver with CONFIG_EXPERT, so that they are all enabled when the top-level CONFIG_SND_HDA_CODEC_HDMI is set. For users who really care about the minimalistic configuration, they can turn each driver on/off individually after setting CONFIG_EXPERT=y. The patch also adds the missing help text to the top-level CONFIG_SND_HDA_CODEC_HDMI together with the explanation of individual choices, too. Fixes: 73cd0490819d ("ALSA: hda/hdmi: Split vendor codec drivers") Link: https://lore.kernel.org/10172c80-daec-4e20-ab57-a483cf1afc02@molgen.mpg.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250806192541.21949-3-tiwai@suse.de
2025-08-07ALSA: hda/realtek: Restrict prompt only for CONFIG_EXPERTTakashi Iwai1-12/+16
The split of Realtek HD-audio codec driver may cause confusions especially when migrating from the previous kernel configurations because it's hard to know which driver to be enabled. Although we've already set default=y for those codec drivers, it may still make people changing the stuff unnecessarily without knowing its side effect. This patch is for avoiding such pitfalls by marking the prompt of each Realtek codec driver with CONFIG_EXPERT. For "normal" users (that is, unless CONFIG_EXPERT is set), all Realtek HD-audio codecs are enabled together with CONFIG_SND_HDA_CODEC_REALTEK; this is the very same situation like the previous kernels, after all. For users who really care about the minimalistic configuration, they can turn each driver on/off individually after setting CONFIG_EXPERT=y. The patch also adds the missing help text to the top-level CONFIG_SND_HDA_CODEC_REALTEK together with the explanation of individual choices, too. Fixes: aeeb85f26c3b ("ALSA: hda: Split Realtek HD-audio codec driver") Link: https://lore.kernel.org/10172c80-daec-4e20-ab57-a483cf1afc02@molgen.mpg.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250806192541.21949-2-tiwai@suse.de
2025-08-06drm/amdgpu: add missing vram lost check for LEGACY RESETAlex Deucher1-0/+1
Legacy resets reset the memory controllers so VRAM contents may be unreliable after reset. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit aae94897b6661a2a4b1de2d328090fc388b3e0af) Cc: stable@vger.kernel.org
2025-08-06drm/amdgpu/discovery: fix fw based ip discoveryAlex Deucher2-36/+41
We only need the fw based discovery table for sysfs. No need to parse it. Additionally parsing some of the board specific tables may result in incorrect data on some boards. just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62eedd150fa11aefc2d377fc746633fdb1baeb55) Cc: stable@vger.kernel.org
2025-08-06drm/amdkfd: Destroy KFD debugfs after destroy KFD wqAmber Lin1-1/+1
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line debugfs_remove_recursive(entry->proc_dentry); tries to remove /sys/kernel/debug/kfd/proc/<pid> while /sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel NULL pointer. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Eric Huang <jinhuieric.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0333052d90683d88531558dcfdbf2525cc37c233) Cc: stable@vger.kernel.org
2025-08-06amdgpu/amdgpu_discovery: increase timeout limit for IFWI initXaver Hugl1-2/+2
With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <xaver.hugl@kde.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 9ed3d7bdf2dcdf1a1196630fab89a124526e9cc2) Cc: stable@vger.kernel.org