aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/stackcollapse.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2023-12-28netfs: Implement a write-through caching optionDavid Howells7-12/+162
Provide a flag whereby a filesystem may request that cifs_perform_write() perform write-through caching. This involves putting pages directly into writeback rather than dirty and attaching them to a write operation as we go. Further, the writes being made are limited to the byte range being written rather than whole folios being written. This can be used by cifs, for example, to deal with strict byte-range locking. This can't be used with content encryption as that may require expansion of the write RPC beyond the write being made. This doesn't affect writes via mmap - those are written back in the normal way; similarly failed writethrough writes are marked dirty and left to writeback to retry. Another option would be to simply invalidate them, but the contents can be simultaneously accessed by read() and through mmap. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Provide a launder_folio implementationDavid Howells4-0/+80
Provide a launder_folio implementation for netfslib. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Provide a writepages implementationDavid Howells2-0/+638
Provide an implementation of writepages for network filesystems to delegate to. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs, cachefiles: Pass upper bound length to allow expansionDavid Howells9-26/+25
Make netfslib pass the maximum length to the ->prepare_write() op to tell the cache how much it can expand the length of a write to. This allows a write to the server at the end of a file to be limited to a few bytes whilst writing an entire block to the cache (something required by direct I/O). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Provide netfs_file_read_iter()David Howells2-0/+75
Provide a top-level-ish function that can be pointed to directly by ->read_iter file op. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Allow buffered shared-writeable mmap through netfs_page_mkwrite()David Howells2-0/+63
Provide an entry point to delegate a filesystem's ->page_mkwrite() to. This checks for conflicting writes, then attached any netfs-specific group marking (e.g. ceph snap) to the page to be considered dirty. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Implement buffered write APIDavid Howells2-0/+86
Institute a netfs write helper, netfs_file_write_iter(), to be pointed at by the network filesystem ->write_iter() call. Make it handled buffered writes by calling the previously defined netfs_perform_write() to copy the source data into the pagecache. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Implement unbuffered/DIO write supportDavid Howells11-10/+224
Implement support for unbuffered writes and direct I/O writes. If the write is misaligned with respect to the fscrypt block size, then RMW cycles are performed if necessary. DIO writes are a special case of unbuffered writes with extra restriction imposed, such as block size alignment requirements. Also provide a field that can tell the code to add some extra space onto the bounce buffer for use by the filesystem in the case of a content-encrypted file. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Implement unbuffered/DIO read supportDavid Howells10-11/+226
Implement support for unbuffered and DIO reads in the netfs library, utilising the existing read helper code to do block splitting and individual queuing. The code also handles extraction of the destination buffer from the supplied iterator, allowing async unbuffered reads to take place. The read will be split up according to the rsize setting and, if supplied, the ->clamp_length() method. Note that the next subrequest will be issued as soon as issue_op returns, without waiting for previous ones to finish. The network filesystem needs to pause or handle queuing them if it doesn't want to fire them all at the server simultaneously. Once all the subrequests have finished, the state will be assessed and the amount of data to be indicated as having being obtained will be determined. As the subrequests may finish in any order, if an intermediate subrequest is short, any further subrequests may be copied into the buffer and then abandoned. In the future, this will also take care of doing an unbuffered read from encrypted content, with the decryption being done by the library. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Allocate multipage folios in the writepathDavid Howells1-2/+7
Allocate a multipage folio when copying data into the pagecache if possible if there's sufficient data to warrant it. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Make netfs_read_folio() handle streaming-write pagesDavid Howells2-3/+60
netfs_read_folio() needs to handle partially-valid pages that are marked dirty, but not uptodate in the event that someone tries to read a page was used to cache data by a streaming write. In such a case, make netfs_read_folio() set up a bvec iterator that points to the parts of the folio that need filling and to a sink page for the data that should be discarded and use that instead of i_pages as the iterator to be written to. This requires netfs_rreq_unlock_folios() to convert the page into a normal dirty uptodate page, getting rid of the partial write record and bumping the group pointer over to folio->private. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Provide func to copy data to pagecache for buffered writeDavid Howells7-0/+461
Provide a netfs write helper, netfs_perform_write() to buffer data to be written in the pagecache and mark the modified folios dirty. It will perform "streaming writes" for folios that aren't currently resident, if possible, storing data in partially modified folios that are marked dirty, but not uptodate. It will also tag pages as belonging to fs-specific write groups if so directed by the filesystem. This is derived from generic_perform_write(), but doesn't use ->write_begin() and ->write_end(), having that logic rolled in instead. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Dispatch write requests to process a writeback sliceDavid Howells5-3/+432
Dispatch one or more write reqeusts to process a writeback slice, where a slice is tailored more to logical block divisions within the file (such as crypto blocks, an object layout or cache granules) than the protocol RPC maximum capacity. The dispatch doesn't happen until throttling allows, at which point the entire writeback slice is processed and queued. A slice may be written to multiple destinations (one or more servers and the local cache) and the writes to each destination might be split up along different lines. The writeback slice holds the required folios pinned. An iov_iter is provided in netfs_write_request that describes the buffer to be used. This may be part of the pagecache, may have auxiliary padding pages attached or may be a bounce buffer resulting from crypto or compression. Consequently, the filesystem must not twiddle the folio markings directly. The following API is available to the filesystem: (1) The ->create_write_requests() method is called to ask the filesystem to create the requests it needs. This is passed the writeback slice to be processed. (2) The filesystem should then call netfs_create_write_request() to create the requests it needs. (3) Once a request is initialised, netfs_queue_write_request() can be called to dispatch it asynchronously, if not completed immediately. (4) netfs_write_request_completed() should be called to note the completion of a request. (5) netfs_get_write_request() and netfs_put_write_request() are provided to refcount a request. These take constants from the netfs_wreq_trace enum for logging into ftrace. (6) The ->free_write_request is method is called to ask the filesystem to clean up a request. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Prep to use folio->private for write grouping and streaming writeDavid Howells3-0/+115
Prepare to use folio->private to hold information write grouping and streaming write. These are implemented in the same commit as they both make use of folio->private and will be both checked at the same time in several places. "Write grouping" involves ordering the writeback of groups of writes, such as is needed for ceph snaps. A group is represented by a filesystem-supplied object which must contain a netfs_group struct. This contains just a refcount and a pointer to a destructor. "Streaming write" is the storage of data in folios that are marked dirty, but not uptodate, to avoid unnecessary reads of data. This is represented by a netfs_folio struct. This contains the offset and length of the modified region plus the otherwise displaced write grouping pointer. The way folio->private is multiplexed is: (1) If private is NULL then neither is in operation on a dirty folio. (2) If private is set, with bit 0 clear, then this points to a group. (3) If private is set, with bit 0 set, then this points to a netfs_folio struct (with bit 0 AND'ed out). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Make the refcounting of netfs_begin_read() easier to useDavid Howells3-20/+23
Make the refcounting of netfs_begin_read() easier to use by not eating the caller's ref on the netfs_io_request it's given. This makes it easier to use when we need to look in the request struct after. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Make netfs_put_request() handle a NULL pointerDavid Howells1-10/+13
Make netfs_put_request() just return if given a NULL request pointer. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Add a hook to allow tell the netfs to update its i_sizeDavid Howells1-0/+4
Add a hook for netfslib's write helpers to call to tell the network filesystem that it should update its i_size. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Extend the netfs_io_*request structs to handle writesDavid Howells6-7/+47
Modify the netfs_io_request struct to act as a point around which writes can be coordinated. It represents and pins a range of pages that need writing and a list of regions of dirty data in that range of pages. If RMW is required, the original data can be downloaded into the bounce buffer, decrypted if necessary, the modifications made, then the modified data can be reencrypted/recompressed and sent back to the server. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Limit subrequest by size or number of segmentsDavid Howells3-0/+20
Limit a subrequest to a maximum size and/or a maximum number of contiguous physical regions. This permits, for instance, an subreq's iterator to be limited to the number of DMA'able segments that a large RDMA request can handle. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Add func to calculate pagecount/size-limited span of an iteratorDavid Howells2-0/+99
Add a function to work out how much of an ITER_BVEC or ITER_XARRAY iterator we can use in a pagecount-limited and size-limited span. This will be used, for example, to limit the number of segments in a subrequest to the maximum number of elements that an RDMA transfer can handle. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Provide tools to create a buffer in an xarrayDavid Howells3-0/+98
Provide tools to create a buffer in an xarray, with a function to add new folios with a mark. This will be used to create bounce buffer and can be used more easily to create a list of folios the span of which would require more than a page's worth of bio_vec structs. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Add support for DIO bufferingDavid Howells2-0/+13
Add a bvec array pointer and an iterator to netfs_io_request for either holding a copy of a DIO iterator or a list of all the bits of buffer pointed to by a DIO iterator. There are two problems: Firstly, if an iovec-class iov_iter is passed to ->read_iter() or ->write_iter(), this cannot be passed directly to kernel_sendmsg() or kernel_recvmsg() as that may cause locking recursion if a fault is generated, so we need to keep track of the pages involved separately. Secondly, if the I/O is asynchronous, we must copy the iov_iter describing the buffer before returning to the caller as it may be immediately deallocated. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs: Add iov_iters to (sub)requests to describe various buffersDavid Howells4-24/+67
Add three iov_iter structs: (1) Add an iov_iter (->iter) to the I/O request to describe the unencrypted-side buffer. (2) Add an iov_iter (->io_iter) to the I/O request to describe the encrypted-side I/O buffer. This may be a different size to the buffer in (1). (3) Add an iov_iter (->io_iter) to the I/O subrequest to describe the part of the I/O buffer for that subrequest. This will allow future patches to point to a bounce buffer instead for purposes of handling oversize writes, decryption (where we want to save the encrypted data to the cache) and decompression. These iov_iters persist for the lifetime of the (sub)request, and so can be accessed multiple times without worrying about them being deallocated upon return to the caller. The network filesystem must appropriately advance the iterator before terminating the request. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs: Implement unbuffered/DIO vs buffered I/O lockingDavid Howells3-0/+227
Borrow NFS's direct-vs-buffered I/O locking into netfslib. Similar code is also used in ceph. Modify it to have the correct checker annotations for i_rwsem lock acquisition/release and to return -ERESTARTSYS if waits are interrupted. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs: Provide invalidate_folio and release_folio callsDavid Howells6-114/+54
Provide default invalidate_folio and release_folio calls. These will need to interact with invalidation correctly at some point. They will be needed if netfslib is to make use of folio->private for its own purposes. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24afs: Don't use folio->private to record partial modificationDavid Howells4-285/+42
AFS currently uses folio->private to store the range of bytes within a folio that have been modified - the idea being that if we have, say, a 2MiB folio and someone writes a single byte, we only have to write back that single page and not the whole 2MiB folio - thereby saving on network bandwidth. Remove this, at least for now, and accept the extra network load (which doesn't matter in the common case of writing a whole file at a time from beginning to end). This makes folio->private available for netfslib to use. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs: Add a ->free_subrequest() opDavid Howells2-0/+3
Add a ->free_subrequest() op so that the netfs can clean up data attached to a subrequest. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs: Allow the netfs to make the io (sub)request alloc largerDavid Howells2-2/+7
Allow the network filesystem to specify extra space to be allocated on the end of the io (sub)request. This allows cifs, for example, to use this space rather than allocating its own cifs_readdata struct. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs: Add a procfile to list in-progress requestsDavid Howells4-3/+98
Add a procfile, /proc/fs/netfs/requests, to list in-progress netfslib I/O requests. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs: Move pinning-for-writeback from fscache to netfsDavid Howells20-187/+124
Move the resource pinning-for-writeback from fscache code to netfslib code. This is used to keep a cache backing object pinned whilst we have dirty pages on the netfs inode in the pagecache such that VM writeback will be able to reach it. Whilst we're at it, switch the parameters of netfs_unpin_writeback() to match ->write_inode() so that it can be used for that directly. Note that this mechanism could be more generically useful than that for network filesystems. Quite often they have to keep around other resources (e.g. authentication tokens or network connections) until the writeback is complete. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a symlinkDavid Howells7-32/+62
Rename /proc/fs/fscache to "netfs" and make a symlink from fscache to that. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com
2023-12-24netfs, fscache: Remove ->begin_cache_operationDavid Howells9-89/+23
Remove ->begin_cache_operation() in favour of just calling fscache directly. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com
2023-12-24netfs, fscache: Combine fscache with netfsDavid Howells18-311/+237
Now that the fscache code is moved to be colocated with the netfslib code so that they combined into one module, do the combining. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com cc: linux-nfs@vger.kernel.org, cc: linux-erofs@lists.ozlabs.org
2023-12-24netfs, fscache: Move fs/fscache/* into fs/netfs/David Howells17-69/+73
There's a problem with dependencies between netfslib and fscache as each wants to access some functions of the other. Deal with this by moving fs/fscache/* into fs/netfs/ and renaming those files to begin with "fscache-". For the moment, the moved files are changed as little as possible and an fscache module is still built. A subsequent patch will integrate them. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com
2023-12-24afs: Automatically generate trace tag enumsDavid Howells1-206/+27
Automatically generate trace tag enums from the symbol -> string mapping tables rather than having the enums as well, thereby reducing duplicated data. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org
2023-12-24afs: Remove whitespace before most ')' from the trace headerDavid Howells1-121/+121
checkpatch objects to whitespace before ')', so remove most of it from the afs trace header. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org
2023-12-23Linux 6.7-rc7v6.7-rc7Linus Torvalds1-1/+1
2023-12-22Input: soc_button_array - add mapping for airplane mode buttonChristoffer Sandberg1-0/+5
This add a mapping for the airplane mode button on the TUXEDO Pulse Gen3. While it is physically a key it behaves more like a switch, sending a key down on first press and a key up on 2nd press. Therefor the switch event is used here. Besides this behaviour it uses the HID usage-id 0xc6 (Wireless Radio Button) and not 0xc8 (Wireless Radio Slider Switch), but since neither 0xc6 nor 0xc8 are currently implemented at all in soc_button_array this not to standard behaviour is not put behind a quirk for the moment. Signed-off-by: Christoffer Sandberg <cs@tuxedo.de> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Link: https://lore.kernel.org/r/20231215171718.80229-1-wse@tuxedocomputers.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2023-12-22debugfs: initialize cancellations earlierJohannes Berg1-2/+4
Tetsuo Handa pointed out that in the (now reverted) lockdep commit I initialized the data too late. The same is true for the cancellation data, it must be initialized before the cmpxchg(), otherwise it may be done twice and possibly even overwriting data in there already when there's a race. Fix that, which also requires destroying the mutex in case we lost the race. Fixes: 8c88a474357e ("debugfs: add API to allow debugfs operations cancellation") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20231221150444.1e47a0377f80.If7e8ba721ba2956f12c6e8405e7d61e154aa7ae7@changeid Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-21afs: Fix use-after-free due to get/remove race in volume treeDavid Howells2-3/+25
When an afs_volume struct is put, its refcount is reduced to 0 before the cell->volume_lock is taken and the volume removed from the cell->volumes tree. Unfortunately, this means that the lookup code can race and see a volume with a zero ref in the tree, resulting in a use-after-free: refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 130782 at lib/refcount.c:25 refcount_warn_saturate+0x7a/0xda ... RIP: 0010:refcount_warn_saturate+0x7a/0xda ... Call Trace: afs_get_volume+0x3d/0x55 afs_create_volume+0x126/0x1de afs_validate_fc+0xfe/0x130 afs_get_tree+0x20/0x2e5 vfs_get_tree+0x1d/0xc9 do_new_mount+0x13b/0x22e do_mount+0x5d/0x8a __do_sys_mount+0x100/0x12a do_syscall_64+0x3a/0x94 entry_SYSCALL_64_after_hwframe+0x62/0x6a Fix this by: (1) When putting, use a flag to indicate if the volume has been removed from the tree and skip the rb_erase if it has. (2) When looking up, use a conditional ref increment and if it fails because the refcount is 0, replace the node in the tree and set the removal flag. Fixes: 20325960f875 ("afs: Reorganise volume and server trees to be rooted on the cell") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-21ida: Fix crash in ida_free when the bitmap is emptyMatthew Wilcox (Oracle)2-1/+41
The IDA usually detects double-frees, but that detection failed to consider the case when there are no nearby IDs allocated and so we have a NULL bitmap rather than simply having a clear bit. Add some tests to the test-suite to be sure we don't inadvertently reintroduce this problem. Unfortunately they're quite noisy so include a message to disregard the warnings. Reported-by: Zhenghan Wang <wzhmmmmm@gmail.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-21afs: Fix overwriting of result of DNS queryDavid Howells1-2/+4
In afs_update_cell(), ret is the result of the DNS lookup and the errors are to be handled by a switch - however, the value gets clobbered in between by setting it to -ENOMEM in case afs_alloc_vlserver_list() fails. Fix this by moving the setting of -ENOMEM into the error handling for OOM failure. Further, only do it if we don't have an alternative error to return. Found by Linux Verification Center (linuxtesting.org) with SVACE. Based on a patch from Anastasia Belova [1]. Fixes: d5c32c89b208 ("afs: Fix cell DNS lookup") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Anastasia Belova <abelova@astralinux.ru> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: lvc-project@linuxtesting.org Link: https://lore.kernel.org/r/20231221085849.1463-1-abelova@astralinux.ru/ [1] Link: https://lore.kernel.org/r/1700862.1703168632@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-12-21USB: serial: option: add Quectel EG912Y module supportAlper Ak1-0/+2
Add Quectel EG912Y "DIAG, AT, MODEM" 0x6001: ECM / RNDIS + DIAG + AT + MODEM T: Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=2c7c ProdID=6001 Rev= 3.18 S: Manufacturer=Android S: Product=Android S: SerialNumber=0000 C:* #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA A: FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=06 Prot=00 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=cdc_ether E: Ad=87(I) Atr=03(Int.) MxPS= 64 Ivl=4096ms I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether I:* If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=0c(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=0b(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=89(I) Atr=03(Int.) MxPS= 64 Ivl=4096ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=88(I) Atr=03(Int.) MxPS= 64 Ivl=4096ms E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=0a(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Alper Ak <alperyasinak1@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold <johan@kernel.org>
2023-12-21tracing / synthetic: Disable events after testing in synth_event_gen_test_init()Steven Rostedt (Google)1-0/+11
The synth_event_gen_test module can be built in, if someone wants to run the tests at boot up and not have to load them. The synth_event_gen_test_init() function creates and enables the synthetic events and runs its tests. The synth_event_gen_test_exit() disables the events it created and destroys the events. If the module is builtin, the events are never disabled. The issue is, the events should be disable after the tests are run. This could be an issue if the rest of the boot up tests are enabled, as they expect the events to be in a known state before testing. That known state happens to be disabled. When CONFIG_SYNTH_EVENT_GEN_TEST=y and CONFIG_EVENT_TRACE_STARTUP_TEST=y a warning will trigger: Running tests on trace events: Testing event create_synth_test: Enabled event during self test! ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1 at kernel/trace/trace_events.c:4150 event_trace_self_tests+0x1c2/0x480 Modules linked in: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc2-test-00031-gb803d7c664d5-dirty #276 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:event_trace_self_tests+0x1c2/0x480 Code: bb e8 a2 ab 5d fc 48 8d 7b 48 e8 f9 3d 99 fc 48 8b 73 48 40 f6 c6 01 0f 84 d6 fe ff ff 48 c7 c7 20 b6 ad bb e8 7f ab 5d fc 90 <0f> 0b 90 48 89 df e8 d3 3d 99 fc 48 8b 1b 4c 39 f3 0f 85 2c ff ff RSP: 0000:ffffc9000001fdc0 EFLAGS: 00010246 RAX: 0000000000000029 RBX: ffff88810399ca80 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffffb9f19478 RDI: ffff88823c734e64 RBP: ffff88810399f300 R08: 0000000000000000 R09: fffffbfff79eb32a R10: ffffffffbcf59957 R11: 0000000000000001 R12: ffff888104068090 R13: ffffffffbc89f0a0 R14: ffffffffbc8a0f08 R15: 0000000000000078 FS: 0000000000000000(0000) GS:ffff88823c700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001f6282001 CR4: 0000000000170ef0 Call Trace: <TASK> ? __warn+0xa5/0x200 ? event_trace_self_tests+0x1c2/0x480 ? report_bug+0x1f6/0x220 ? handle_bug+0x6f/0x90 ? exc_invalid_op+0x17/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? tracer_preempt_on+0x78/0x1c0 ? event_trace_self_tests+0x1c2/0x480 ? __pfx_event_trace_self_tests_init+0x10/0x10 event_trace_self_tests_init+0x27/0xe0 do_one_initcall+0xd6/0x3c0 ? __pfx_do_one_initcall+0x10/0x10 ? kasan_set_track+0x25/0x30 ? rcu_is_watching+0x38/0x60 kernel_init_freeable+0x324/0x450 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x1f/0x1e0 ? _raw_spin_unlock_irq+0x33/0x50 ret_from_fork+0x34/0x60 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This is because the synth_event_gen_test_init() left the synthetic events that it created enabled. By having it disable them after testing, the other selftests will run fine. Link: https://lore.kernel.org/linux-trace-kernel/20231220111525.2f0f49b0@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 9fe41efaca084 ("tracing: Add synth event generation test module") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Alexander Graf <graf@amazon.com> Tested-by: Alexander Graf <graf@amazon.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-21eventfs: Have event files and directories default to parent uid and gidSteven Rostedt (Google)1-3/+9
Dongliang reported: I found that in the latest version, the nodes of tracefs have been changed to dynamically created. This has caused me to encounter a problem where the gid I specified in the mounting parameters cannot apply to all files, as in the following situation: /data/tmp/events # mount | grep tracefs tracefs on /data/tmp type tracefs (rw,seclabel,relatime,gid=3012) gid 3012 = readtracefs /data/tmp # ls -lh total 0 -r--r----- 1 root readtracefs 0 1970-01-01 08:00 README -r--r----- 1 root readtracefs 0 1970-01-01 08:00 available_events ums9621_1h10:/data/tmp/events # ls -lh total 0 drwxr-xr-x 2 root root 0 2023-12-19 00:56 alarmtimer drwxr-xr-x 2 root root 0 2023-12-19 00:56 asoc It will prevent certain applications from accessing tracefs properly, I try to avoid this issue by making the following modifications. To fix this, have the files created default to taking the ownership of the parent dentry unless the ownership was previously set by the user. Link: https://lore.kernel.org/linux-trace-kernel/1703063706-30539-1-git-send-email-dongliang.cui@unisoc.com/ Link: https://lore.kernel.org/linux-trace-kernel/20231220105017.1489d790@gandalf.local.home Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Hongyu Jin <hongyu.jin@unisoc.com> Fixes: 28e12c09f5aa0 ("eventfs: Save ownership and mode") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Dongliang Cui <cuidongliang390@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-12-21keys, dns: Allow key types (eg. DNS) to be reclaimed immediately on expiryDavid Howells6-23/+47
If a key has an expiration time, then when that time passes, the key is left around for a certain amount of time before being collected (5 mins by default) so that EKEYEXPIRED can be returned instead of ENOKEY. This is a problem for DNS keys because we want to redo the DNS lookup immediately at that point. Fix this by allowing key types to be marked such that keys of that type don't have this extra period, but are reclaimed as soon as they expire and turn this on for dns_resolver-type keys. To make this easier to handle, key->expiry is changed to be permanent if TIME64_MAX rather than 0. Furthermore, give such new-style negative DNS results a 1s default expiry if no other expiry time is set rather than allowing it to stick around indefinitely. This shouldn't be zero as ls will follow a failing stat call immediately with a second with AT_SYMLINK_NOFOLLOW added. Fixes: 1a4240f4764a ("DNS: Separate out CIFS DNS Resolver code") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Markus Suvanto <markus.suvanto@gmail.com> cc: Wang Lei <wang840925@gmail.com> cc: Jeff Layton <jlayton@redhat.com> cc: Steve French <smfrench@gmail.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jarkko Sakkinen <jarkko@kernel.org> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: keyrings@vger.kernel.org cc: netdev@vger.kernel.org
2023-12-21gpio: dwapb: mask/unmask IRQ when disable/enale itxiongxin1-4/+8
In the hardware implementation of the I2C HID driver based on DesignWare GPIO IRQ chip, when the user continues to use the I2C HID device in the suspend process, the I2C HID interrupt will be masked after the resume process is finished. This is because the disable_irq()/enable_irq() of the DesignWare GPIO driver does not synchronize the IRQ mask register state. In normal use of the I2C HID procedure, the GPIO IRQ irq_mask()/irq_unmask() functions are called in pairs. In case of an exception, i2c_hid_core_suspend() calls disable_irq() to disable the GPIO IRQ. With low probability, this causes irq_unmask() to not be called, which causes the GPIO IRQ to be masked and not unmasked in enable_irq(), raising an exception. Add synchronization to the masked register state in the dwapb_irq_enable()/dwapb_irq_disable() function. mask the GPIO IRQ before disabling it. After enabling the GPIO IRQ, unmask the IRQ. Fixes: 7779b3455697 ("gpio: add a driver for the Synopsys DesignWare APB GPIO block") Cc: stable@kernel.org Co-developed-by: Riwen Lu <luriwen@kylinos.cn> Signed-off-by: Riwen Lu <luriwen@kylinos.cn> Signed-off-by: xiongxin <xiongxin@kylinos.cn> Acked-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2023-12-21gpiolib: cdev: add gpio_device locking wrapper around gpio_ioctl()Kent Gibson1-4/+12
While the GPIO cdev gpio_ioctl() call is in progress, the kernel can call gpiochip_remove() which will set gdev->chip to NULL, after which any subsequent access will cause a crash. gpio_ioctl() was overlooked by the previous fix to protect syscalls (bdbbae241a04), so add protection for that. Fixes: bdbbae241a04 ("gpiolib: protect the GPIO device against being dropped while in use by user-space") Fixes: d7c51b47ac11 ("gpio: userspace ABI for reading/writing GPIO lines") Fixes: 3c0d9c635ae2 ("gpiolib: cdev: support GPIO_V2_GET_LINE_IOCTL and GPIO_V2_LINE_GET_VALUES_IOCTL") Fixes: aad955842d1c ("gpiolib: cdev: support GPIO_V2_GET_LINEINFO_IOCTL and GPIO_V2_GET_LINEINFO_WATCH_IOCTL") Signed-off-by: Kent Gibson <warthog618@gmail.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2023-12-21net: check dev->gso_max_size in gso_features_check()Eric Dumazet1-0/+3
Some drivers might misbehave if TSO packets get too big. GVE for instance uses a 16bit field in its TX descriptor, and will do bad things if a packet is bigger than 2^16 bytes. Linux TCP stack honors dev->gso_max_size, but there are other ways for too big packets to reach an ndo_start_xmit() handler : virtio_net, af_packet, GRO... Add a generic check in gso_features_check() and fallback to GSO when needed. gso_max_size was added in the blamed commit. Fixes: 82cc1a7a5687 ("[NET]: Add per-connection option to set max TSO frame size") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231219125331.4127498-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21kselftest: rtnetlink.sh: use grep_fail when expecting the cmd failHangbin Liu1-1/+1
run_cmd_grep_fail should be used when expecting the cmd fail, or the ret will be set to 1, and the total test return 1 when exiting. This would cause the result report to fail if run via run_kselftest.sh. Before fix: # ./rtnetlink.sh -t kci_test_addrlft PASS: preferred_lft addresses have expired # echo $? 1 After fix: # ./rtnetlink.sh -t kci_test_addrlft PASS: preferred_lft addresses have expired # echo $? 0 Fixes: 9c2a19f71515 ("kselftest: rtnetlink.sh: add verbose flag") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231219065737.1725120-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>