linux - Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

Age	Commit message	Author	Files	Lines
2023-12-29	thermal/sysfs: Update governors when the 'weight' has changed	Lukasz Luba	2	-0/+6
	Support updating governors when the thermal instance's weight has changed. This allows the governor to adjust its internal state. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Add two empty code lines around the locking ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal/sysfs: Update instance->weight under tz lock	Lukasz Luba	1	-0/+4
	User space can change the weight of a thermal instance via sysfs while the .throttle() callback is running for a governor, because weight_store() does not use the zone lock. The IPA governor uses instance weight values for power calculations and caches the sum of them as total_weight, so it gets confused when one of them changes while its .throttle() callback is running. To prevent that from happening, use thermal zone locking in weight_store(). Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
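The race described above can be illustrated with a minimal user-space sketch. It assumes a pthread mutex in place of the thermal zone lock, and `struct zone`, `zone_set_weight()` and `zone_share_percent()` are hypothetical names, not kernel APIs. For brevity the cached sum is refreshed in the same critical section as the weight, which is a simplification of what the governor does.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_INSTANCES 3

/* Hypothetical stand-in for a thermal zone; not the kernel's structure. */
struct zone {
	pthread_mutex_t lock;          /* plays the role of the zone lock */
	int weight[NUM_INSTANCES];     /* per-instance weights */
	int total_weight;              /* sum cached for power calculations */
};

/* Writer path (the analogue of weight_store()): update under the lock. */
static void zone_set_weight(struct zone *z, int idx, int weight)
{
	pthread_mutex_lock(&z->lock);
	z->total_weight += weight - z->weight[idx];
	z->weight[idx] = weight;
	pthread_mutex_unlock(&z->lock);
}

/* Reader path (the analogue of .throttle()): sees a consistent pair. */
static int zone_share_percent(struct zone *z, int idx)
{
	int share = 0;

	pthread_mutex_lock(&z->lock);
	if (z->total_weight > 0)
		share = z->weight[idx] * 100 / z->total_weight;
	pthread_mutex_unlock(&z->lock);
	return share;
}
```

Because both paths take the same lock, the reader can never observe a new weight combined with a stale sum.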
2023-12-29	thermal: gov_power_allocator: Simplify checks for valid power actor	Lukasz Luba	1	-23/+17
	There is a need to check if the cooling device in the thermal zone supports IPA callback and is set for control trip point. Refactor the code which validates the power actor capabilities and make it more consistent in all places. No intentional functional impact. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: gov_power_allocator: Move memory allocation out of throttle()	Lukasz Luba	1	-71/+136
	The new thermal callback makes it possible to react to changes of the cooling instances in the thermal zone. Move the memory allocation to that new callback and save CPU cycles in the throttle() code path. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: gov_power_allocator: Change trace functions	Lukasz Luba	2	-23/+32
	Change trace event trace_thermal_power_allocator() to not use a dynamic array for requested power and granted power for all power actors. Instead, simplify the trace event and print other simple values. Add a new trace event to print power actor information of requested power and granted power. That trace event would be called in a loop for each power actor. The trace data would be easier to parse compared to the dynamic array implementation. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: gov_power_allocator: Refactor checks in divvy_up_power()	Lukasz Luba	1	-10/+10
	Simplify the code and remove one extra 'if' block. No intentional functional impact. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: gov_power_allocator: Refactor check_power_actors()	Lukasz Luba	1	-4/+6
	In preparation for a subsequent change, rearrange check_power_actors(). No intentional functional impact. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29	thermal: core: Add governor callback for thermal zone change	Lukasz Luba	3	-0/+22
	Add a new callback to the struct thermal_governor. It can be used for updating governors when there is a change in the thermal zone internals, e.g. a thermal cooling device is bound to the thermal zone. That makes it possible to move some heavy operations like memory allocations related to the number of cooling instances out of the throttle() callback. Both callback code paths (throttle() and update_tz()) are protected with the same thermal zone lock, which guarantees consistency. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
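The idea of an optional "zone changed" callback can be sketched in plain C as follows. All names here (`struct governor`, `zone_bind_instance()`, `demo_update_tz()` and so on) are hypothetical and only mirror the shape of the change; the real structures and signatures live in the kernel sources. Locking is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct zone;

enum zone_update_reason { ZONE_INSTANCE_ADDED, ZONE_INSTANCE_REMOVED };

/* Hypothetical governor ops: update_tz is optional and may be NULL. */
struct governor {
	int  (*throttle)(struct zone *z);
	void (*update_tz)(struct zone *z, enum zone_update_reason reason);
};

struct zone {
	const struct governor *gov;
	int num_instances;
	int *scratch;                  /* per-instance buffer owned by the governor */
};

/* Slow path: resize the buffer only when the set of instances changes. */
static void demo_update_tz(struct zone *z, enum zone_update_reason reason)
{
	(void)reason;
	free(z->scratch);
	z->scratch = z->num_instances ?
		calloc((size_t)z->num_instances, sizeof(int)) : NULL;
}

/* Hot path: no allocation, just use the preallocated buffer. */
static int demo_throttle(struct zone *z)
{
	if (!z->scratch)
		return -1;
	for (int i = 0; i < z->num_instances; i++)
		z->scratch[i] = i;
	return 0;
}

static const struct governor demo_governor = {
	.throttle  = demo_throttle,
	.update_tz = demo_update_tz,
};

/* Core side: notify the governor after the zone internals have changed. */
static void zone_bind_instance(struct zone *z)
{
	z->num_instances++;
	if (z->gov->update_tz)
		z->gov->update_tz(z, ZONE_INSTANCE_ADDED);
}
```

Keeping the allocation in the rarely invoked callback means the frequently invoked throttle path only touches memory that already exists.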
2023-12-28	thermal: netlink: Add thermal_group_has_listeners() helper	Stanislaw Gruszka	1	-0/+11
	Add a helper function to check if there are listeners for thermal_gnl_family multicast groups. For now use it to avoid unnecessary allocations and sending thermal genl messages when there are no recipients. In the future, in conjunction with (not yet implemented) notification of change in the netlink socket group membership, this helper can be used to open/close hardware interfaces based on the presence of user space subscribers. Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28	thermal: netlink: Add enum for multicast groups indexes	Stanislaw Gruszka	1	-4/+9
	Use enum instead of hard-coded numbers for indexing multicast groups. Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
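A small sketch of the pattern, assuming two groups with made-up names: the enumerators and the `demo_mcgrp_names` table are hypothetical and only show how an enum keeps array indices and their users in sync.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical group indexes; the names are illustrative only. */
enum demo_mcgrp {
	DEMO_MCGRP_SAMPLING,
	DEMO_MCGRP_EVENT,
};

/* Designated initializers tie each entry to its enumerator. */
static const char *const demo_mcgrp_names[] = {
	[DEMO_MCGRP_SAMPLING] = "sampling",
	[DEMO_MCGRP_EVENT]    = "event",
};

/* Look up a group name by index; returns NULL for an unknown index. */
static const char *demo_mcgrp_name(enum demo_mcgrp grp)
{
	size_t n = sizeof(demo_mcgrp_names) / sizeof(demo_mcgrp_names[0]);

	if ((size_t)grp >= n)
		return NULL;
	return demo_mcgrp_names[grp];
}
```

With hard-coded `0` and `1`, reordering the table would silently change which group each caller addresses; with the enum, the compiler keeps both sides aligned.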
2023-12-28	thermal: core: Resume thermal zones asynchronously	Rafael J. Wysocki	1	-4/+26
	The resume of thermal zones in thermal_pm_notify() is carried out sequentially, which may be a problem if __thermal_zone_device_update() takes a significant time to run for some thermal zones, because some other thermal zones may then need to wait for them to resume, and if any other PM notifiers are going to be invoked after the thermal one, they will need to wait for it too. To address this, make thermal_pm_notify() switch the poll_queue delayed work over to a one-shot thermal_zone_device_resume() work function that will restore the original one during the thermal zone resume and queue up poll_queue without a delay for each thermal zone. Link: https://lore.kernel.org/linux-pm/20231120234015.3273143-1-radusolea@google.com/ Reported-by: Radu Solea <radusolea@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28	thermal: core: Initialize poll_queue in thermal_zone_device_init()	Rafael J. Wysocki	1	-10/+13
	In preparation for a subsequent change, move the initialization of the poll_queue delayed work from thermal_zone_device_register_with_trips() to thermal_zone_device_init() which is called by the former. However, because thermal_zone_device_init() is also called by thermal_pm_notify(), make the latter call cancel_delayed_work() on poll_queue before invoking the former, so as to allow the work item to be re-initialized safely. Also move thermal_zone_device_check() which needs to be defined before thermal_zone_device_init(), so the latter can pass it to the INIT_DELAYED_WORK() macro. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28	thermal: core: Fix thermal zone suspend-resume synchronization	Rafael J. Wysocki	2	-7/+25
	There are 3 synchronization issues with thermal zone suspend-resume during system-wide transitions: 1. The resume code runs in a PM notifier which is invoked after user space has been thawed, so it can run concurrently with user space which can trigger a thermal zone device removal. If that happens, the thermal zone resume code may use a stale pointer to the next list element and crash, because it does not hold thermal_list_lock while walking thermal_tz_list. 2. The thermal zone resume code calls thermal_zone_device_init() outside the zone lock, so user space or an update triggered by the platform firmware may see an inconsistent state of a thermal zone leading to unexpected behavior. 3. Clearing the in_suspend global variable in thermal_pm_notify() allows __thermal_zone_device_update() to continue for all thermal zones and it may as well run before the thermal_tz_list walk (or at any point during the list walk for that matter) and attempt to operate on a thermal zone that has not been resumed yet. It may also race destructively with thermal_zone_device_init(). To address these issues, add thermal_list_lock locking to thermal_pm_notify(), especially around the thermal_tz_list, make it call thermal_zone_device_init() back-to-back with __thermal_zone_device_update() under the zone lock and replace in_suspend with per-zone bool "suspend" indicators set and unset under the given zone's lock. Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/ Reported-by: Bo Ye <bo.ye@mediatek.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-21	thermal: cpuidle_cooling: fix kernel-doc warning and a spello	Randy Dunlap	1	-2/+2
	Correct one misuse of kernel-doc notation and one spelling error as reported by codespell. cpuidle_cooling.c:152: warning: cannot understand function prototype: 'struct thermal_cooling_device_ops cpuidle_cooling_ops = ' For the kernel-doc warning, don't use "/**" for a comment on data. kernel-doc can be used for structure declarations but not definitions. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-15	thermal: core: Fix NULL pointer dereference in zone registration error path	Rafael J. Wysocki	1	-1/+0
	If device_register() in thermal_zone_device_register_with_trips() returns an error, the tz variable is set to NULL and subsequently dereferenced in kfree(tz->tzp). Commit adc8749b150c ("thermal/drivers/core: Use put_device() if device_register() fails") added the tz = NULL assignment in question to avoid a possible double-free after dropping the reference to the zone device. However, after commit 4649620d9404 ("thermal: core: Make thermal_zone_device_unregister() return after freeing the zone"), that assignment has become redundant, because dropping the reference to the zone device does not cause the zone object to be freed any more. Drop it to address the NULL pointer dereference. Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2023-12-13	thermal/core: Check get_temp ops is present when registering a tz	Daniel Lezcano	1	-6/+1
	Initially the check against the get_temp ops in thermal_zone_device_update() was put there in order to catch drivers not providing this method. Instead of checking again and again in the update function whether the ops exists, let's do the check at registration time, so it is checked once and for all. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-13	thermal: trip: Send trip change notifications on all trip updates	Rafael J. Wysocki	5	-8/+27
	The _store callbacks of the trip point temperature and hysteresis sysfs attributes invoke thermal_notify_tz_trip_change() to send a notification regarding the trip point change, but when trip points are updated by the platform firmware, trip point change notifications are not sent. To make the behavior after a trip point change more consistent, modify all of the 3 places where trip point temperature is updated to use a new function called thermal_zone_set_trip_temp() for this purpose and make that function call thermal_notify_tz_trip_change(). Note that trip point hysteresis can only be updated via sysfs and trip_point_hyst_store() calls thermal_notify_tz_trip_change() already, so this code path need not be changed. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-12-13	thermal: netlink: Use for_each_trip() in thermal_genl_cmd_tz_get_trip()	Rafael J. Wysocki	1	-12/+8
	Make thermal_genl_cmd_tz_get_trip() use for_each_trip() instead of an open-coded loop over trip indices. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2023-12-13	thermal: helpers: Use for_each_trip() in __thermal_zone_get_temp()	Rafael J. Wysocki	1	-7/+5
	Make __thermal_zone_get_temp() use for_each_trip() instead of an open-coded loop over trip indices. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2023-12-13	thermal: trip: Use for_each_trip() in __thermal_zone_set_trips()	Rafael J. Wysocki	1	-11/+7
	Make __thermal_zone_set_trips() use for_each_trip() instead of an open-coded loop over trip indices. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
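The three conversions above share one pattern: replacing an index-based loop with an iterator macro. A minimal sketch, assuming a simplified zone with an array of trips; `demo_for_each_trip()` is a hypothetical macro modeled on the idea of for_each_trip(), not a copy of the kernel definition.

```c
#include <assert.h>

struct trip { int temperature; };

struct zone {
	struct trip *trips;
	int num_trips;
};

/* Hypothetical iterator: walk every trip of a zone by pointer. */
#define demo_for_each_trip(z, t) \
	for ((t) = (z)->trips; (t) - (z)->trips < (z)->num_trips; (t)++)

/* Return the lowest trip temperature, or 'fallback' if none is lower. */
static int zone_lowest_trip_temp(const struct zone *z, int fallback)
{
	const struct trip *t;
	int lowest = fallback;

	demo_for_each_trip(z, t) {
		if (t->temperature < lowest)
			lowest = t->temperature;
	}
	return lowest;
}
```

Callers no longer handle an index or repeat the bounds check, which is what makes such conversions free of functional impact.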
2023-12-13	thermal: trip: Drop redundant __thermal_zone_get_trip() header	Rafael J. Wysocki	1	-2/+0
	The __thermal_zone_get_trip() header in drivers/thermal/thermal_core.h is redundant, because there is one already in thermal.h, so drop it. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-12-12	thermal: core: Rework thermal zone availability check	Rafael J. Wysocki	1	-3/+12
	In order to avoid running __thermal_zone_device_update() for thermal zones going away, the thermal zone lock is held around device_del() in thermal_zone_device_unregister() and thermal_zone_device_update() passes the given thermal zone device to device_is_registered(). This allows thermal_zone_device_update() to skip the __thermal_zone_device_update() if device_del() has already run for the thermal zone at hand. However, instead of looking at driver core internals, the thermal subsystem may as well rely on its own data structures for this purpose. Namely, if the thermal zone is not present in thermal_tz_list, it can be regarded as unavailable, which in fact is already the case in thermal_zone_device_unregister(). Accordingly, the device_is_registered() check in thermal_zone_device_update() can be replaced with checking whether or not the node list_head in struct thermal_zone_device is empty, in which case it is not there in thermal_tz_list. To make this work, though, it is necessary to initialize tz->node in thermal_zone_device_register_with_trips() before registering the thermal zone device and it needs to be added to thermal_tz_list and deleted from it under its zone lock. After the above modifications, the zone lock does not need to be held around device_del() in thermal_zone_device_unregister() any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
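The "is the zone still on the list" test can be sketched with a self-linked list node. This is a user-space approximation under stated assumptions: `struct list_node` and its helpers are hypothetical stand-ins for the kernel's list primitives, and the zone lock that the commit requires around add and delete is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list node (hypothetical stand-in). */
struct list_node { struct list_node *next, *prev; };

static void node_init(struct list_node *n) { n->next = n; n->prev = n; }

/* A node that points to itself is not on any list. */
static bool node_unlinked(const struct list_node *n) { return n->next == n; }

static void node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink and reinitialize, so node_unlinked() is true afterwards. */
static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

struct zone { struct list_node node; int updates; };

/* Skip the update when the zone is not (or no longer) on the list. */
static void zone_update(struct zone *z)
{
	if (node_unlinked(&z->node))
		return;
	z->updates++;
}
```

This only works if the node is initialized before the zone becomes visible, which matches the requirement stated in the commit message that tz->node be initialized before registration.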
2023-12-12	thermal: Drop redundant and confusing device_is_registered() checks	Rafael J. Wysocki	4	-72/+7
	Multiple places in the thermal subsystem (most importantly, sysfs attribute callback functions) check if the given thermal zone device is still registered in order to return early in case the device_del() in thermal_zone_device_unregister() has run already. However, after thermal_zone_device_unregister() has been made wait for all of the zone-related activity to complete before returning, it is not necessary to do that any more, because all of the code holding a reference to the thermal zone device object will be waited for even if it does not do anything special to enforce this. Accordingly, drop all of the device_is_registered() checks that are now redundant and get rid of the zone locking that is not necessary any more after dropping them. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-12-11	thermal: core: Make thermal_zone_device_unregister() return after freeing the zone	Rafael J. Wysocki	2	-1/+7
	Make thermal_zone_device_unregister() wait until all of the references to the given thermal zone object have been dropped and free it before returning. This guarantees that when thermal_zone_device_unregister() returns, there is no leftover activity regarding the thermal zone in question which is required by some of its callers (for instance, modular driver code that wants to know when it is safe to let the module go away). Subsequently, this will allow some confusing device_is_registered() checks to be dropped from the thermal sysfs and core code. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-12-06	thermal: sysfs: Rework the reading of trip point attributes	Rafael J. Wysocki	1	-27/+25
	Rework the _show() callback functions for the trip point temperature, hysteresis and type attributes to avoid copying the values of struct thermal_trip fields that they do not use and make them carry out the same validation checks as the corresponding _store() callback functions. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-12-06	thermal: sysfs: Rework the handling of trip point updates	Rafael J. Wysocki	4	-56/+47
	Both trip_point_temp_store() and trip_point_hyst_store() use thermal_zone_set_trip() to update a given trip point, but none of them actually needs to change more than one field in struct thermal_trip representing it. However, each of them effectively calls __thermal_zone_get_trip() twice in a row for the same trip index value, once directly and once via thermal_zone_set_trip(), which is not particularly efficient, and the way in which thermal_zone_set_trip() carries out the update is not particularly straightforward. Moreover, input processing need not be done under the thermal zone lock in any of these functions. Rework trip_point_temp_store() and trip_point_hyst_store() to address the above, move the part of thermal_zone_set_trip() that is still useful to a new function called thermal_zone_trip_updated() and drop the rest of it. While at it, make trip_point_hyst_store() reject negative hysteresis values. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-11-30	thermal: trip: Drop a redundant check from thermal_zone_set_trip()	Rafael J. Wysocki	1	-3/+0
	After recent changes in the thermal framework, a trip points array is required for registering a thermal zone that is not tripless, so the tz->trips pointer in thermal_zone_set_trip() is never NULL and the check involving it is redundant. Drop that check. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-11-28	thermal: gov_power_allocator: Rearrange initialization of local variables	Lukasz Luba	1	-9/+6
	Rearrange the initialization of local variables in allocate_power() so as to improve code clarity and the visibility of the initial values. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-28	thermal: gov_power_allocator: Remove excessive local variables	Lukasz Luba	1	-6/+5
	Local variable 'ret' in allocate_power() is only used in the return statement, so drop it. Local variable 'trip_max' in allocate_power() is only used for caching the params->trip_max value which may as well be accessed directly as needed, so drop it as well. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-28	thermal: gov_power_allocator: Use shorter paths to access data when possible	Lukasz Luba	1	-3/+3
	The 'cdev' pointer in allow_maximum_power() is valid, so there is no need to use 'instance->cdev' instead of it. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-28	thermal: gov_power_allocator: Rearrange local variables	Lukasz Luba	1	-19/+20
	Rearrange the order of local variable definitions in multiple functions so as to follow the kernel coding style in that respect. Also, move local variable definitions located in nested code blocks to the beginning of each function to improve the visibility of all local variables in use. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-28	thermal: gov_power_allocator: Check the cooling devices only for trip_max	Lukasz Luba	1	-2/+7
	The throttling logic only cares about the last passive trip point and the cooling devices attached to it. Therefore, there is no need to bail out if other trip points have cooling devices which are not supported by the IPA. Check the cooling devices only for 'trip_max' during the binding. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-28	thermal: gov_power_allocator: Set up trip points earlier	Lukasz Luba	1	-10/+17
	Set up the trip points at the beginning of the binding function. This simplifies the code a bit and allows for further cleanups. Also add a check to fail the binding if the last passive trip point is not found. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-28	thermal: gov_power_allocator: Rename trip_max_desired_temperature	Lukasz Luba	1	-22/+18
	Refactor the code and rename the last passive trip point field. There is a comment describing the field properly. Use a shorter field name to make the code clearer. This change is not expected to alter the general functionality. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-11-20	thermal: core: Add trip thresholds for trip crossing detection	Rafael J. Wysocki	2	-7/+38
	The trip crossing detection in handle_thermal_trip() does not work correctly in the cases when a trip point is crossed on the way up and then the zone temperature stays above its low temperature (that is, its temperature decreased by its hysteresis). The trip temperature may be passed by the zone temperature subsequently in that case, even multiple times, but that does not count as the trip crossing as long as the zone temperature does not fall below the trip's low temperature or, in other words, until the trip is crossed on the way down.
	|-----------low--------high------------|
	             |<--------->|
	             |    hyst   |
	             |           |
	             |          -|--> crossed on the way up
	             |
	         <---|-- crossed on the way down
	However, handle_thermal_trip() will invoke thermal_notify_tz_trip_up() every time the trip temperature is passed by the zone temperature on the way up regardless of whether or not the trip has been crossed on the way down yet. Moreover, it will not call thermal_notify_tz_trip_down() if the last zone temperature was between the trip's temperature and its low temperature, so some "trip crossed on the way down" events may not be reported. To address this issue, introduce trip thresholds equal to either the temperature of the given trip, or its low temperature, such that if the trip's threshold is passed by the zone temperature on the way up, its value will be set to the trip's low temperature and thermal_notify_tz_trip_up() will be called, and if the trip's threshold is passed by the zone temperature on the way down, its value will be set to the trip's temperature (high) and thermal_notify_tz_trip_down() will be called. Accordingly, if the threshold is passed on the way up, it cannot be passed on the way up again until it is passed on the way down and if it is passed on the way down, it cannot be passed on the way down again until it is passed on the way up which guarantees correct triggering of trip crossing notifications.
	If the last temperature of the zone is invalid, the trip's threshold will be set depending on the zone's current temperature: If that temperature is above the trip's temperature, its threshold will be set to its low temperature or otherwise its threshold will be set to its (high) temperature. Because the zone temperature is initially set to invalid and tz->last_temperature is only updated by update_temperature(), this is sufficient to set the correct initial threshold values for all trips. Link: https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
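The threshold scheme described above can be sketched as a small state machine. This is an illustrative reading of the changelog, not the kernel code: `trip_check()`, `TEMP_INVALID` and `enum crossing` are hypothetical names, and treating a temperature equal to the threshold as "at or above" is an assumption about the boundary case.

```c
#include <assert.h>
#include <limits.h>

#define TEMP_INVALID INT_MIN       /* marks "no previous reading" */

enum crossing { CROSS_NONE, CROSS_UP, CROSS_DOWN };

struct trip {
	int temperature;               /* "high" in the diagram */
	int hysteresis;                /* high - low */
	int threshold;                 /* currently armed threshold */
};

/* Compare the last and current zone temperature against the threshold. */
static enum crossing trip_check(struct trip *t, int last_temp, int temp)
{
	int low = t->temperature - t->hysteresis;

	if (last_temp == TEMP_INVALID) {
		/* First reading: arm the threshold, report nothing. */
		t->threshold = temp >= t->temperature ? low : t->temperature;
		return CROSS_NONE;
	}
	if (last_temp < t->threshold && temp >= t->threshold) {
		t->threshold = low;        /* next event must be a way down */
		return CROSS_UP;
	}
	if (last_temp >= t->threshold && temp < t->threshold) {
		t->threshold = t->temperature; /* next event must be a way up */
		return CROSS_DOWN;
	}
	return CROSS_NONE;
}
```

With a trip at 50 and a hysteresis of 5, the sequence 40, 52, 47, 51, 44 yields one "up" event at 52 and one "down" event at 44; the second pass above 50 (at 51) is not reported because the threshold is still armed at the low temperature.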
2023-11-19	Linux 6.7-rc2 (tag: v6.7-rc2)	Linus Torvalds	1	-1/+1

2023-11-18	prctl: Disable prctl(PR_SET_MDWE) on parisc	Helge Deller	1	-0/+4
	systemd-254 tries to use prctl(PR_SET_MDWE) for its MemoryDenyWriteExecute functionality, but fails on parisc which still needs executable stacks in certain combinations of gcc/glibc/kernel. Disable prctl(PR_SET_MDWE) by returning -EINVAL for now on parisc, until userspace has caught up. Signed-off-by: Helge Deller <deller@gmx.de> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Sam James <sam@gentoo.org> Closes: https://github.com/systemd/systemd/issues/29775 Tested-by: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/all/875y2jro9a.fsf@gentoo.org/ Cc: <stable@vger.kernel.org> # v6.3+
2023-11-18	parisc/power: Fix power soft-off when running on qemu	Helge Deller	1	-1/+1
	Firmware returns the physical address of the power switch, so need to use gsc_writel() instead of direct memory access. Fixes: d0c219472980 ("parisc/power: Add power soft-off when running on qemu") Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # v6.0+
2023-11-18	parisc: Replace strlcpy() with strscpy()	Kees Cook	1	-1/+1
	strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated[1]. Additionally, it returns the size of the source string, not the resulting size of the destination string. In an effort to remove strlcpy() completely[2], replace strlcpy() here with strscpy(). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [1] Link: https://github.com/KSPP/linux/issues/89 [2] Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Azeem Shaikh <azeemshaikh38@gmail.com> Cc: linux-parisc@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Helge Deller <deller@gmx.de>
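The difference in semantics can be shown with a user-space sketch. `demo_strscpy()` is a hypothetical reimplementation written for illustration, not the kernel function; it assumes the documented behavior of returning the number of copied characters or a negative E2BIG value on truncation, and `DEMO_E2BIG` is a local constant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_E2BIG 7               /* local stand-in for E2BIG */

/*
 * Copy at most size - 1 characters and always NUL-terminate.
 * Reads at most 'size' bytes from src, so an unterminated source
 * is never scanned past the destination size.
 */
static long demo_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -DEMO_E2BIG;
	for (i = 0; i < size - 1; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (long)i;    /* whole string fit */
	}
	dst[i] = '\0';
	/* Fits exactly if the next source byte is the terminator. */
	return src[i] == '\0' ? (long)i : -DEMO_E2BIG;
}
```

Unlike strlcpy(), the return value describes the destination, so a caller detects truncation with a single sign check instead of comparing lengths.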
2023-11-17	NFSD: Fix checksum mismatches in the duplicate reply cache	Chuck Lever	3	-24/+54
	nfsd_cache_csum() currently assumes that the server's RPC layer has been advancing rq_arg.head[0].iov_base as it decodes an incoming request, because that's the way it used to work. On entry, it expects that buf->head[0].iov_base points to the start of the NFS header, and excludes the already-decoded RPC header. These days however, head[0].iov_base now points to the start of the RPC header during all processing. It no longer points at the NFS Call header when execution arrives at nfsd_cache_csum(). In a retransmitted RPC the XID and the NFS header are supposed to be the same as the original message, but the contents of the retransmitted RPC header can be different. For example, for krb5, the GSS sequence number will be different between the two. Thus if the RPC header is always included in the DRC checksum computation, the checksum of the retransmitted message might not match the checksum of the original message, even though the NFS part of these messages is identical. The result is that, even if a matching XID is found in the DRC, the checksum mismatch causes the server to execute the retransmitted RPC transaction again. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
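The mismatch described above can be reproduced with a toy checksum. This sketch assumes a fixed, known RPC header length and a simple additive checksum; `demo_cache_csum()` and its parameters are hypothetical and do not reflect how the server locates the NFS header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy additive checksum; stands in for the real checksum routine. */
static uint32_t demo_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/* Checksum only the part of the message that follows the RPC header. */
static uint32_t demo_cache_csum(const uint8_t *msg, size_t len,
				size_t rpc_hdr_len)
{
	if (rpc_hdr_len > len)
		return 0;
	return demo_csum(msg + rpc_hdr_len, len - rpc_hdr_len);
}
```

Two messages whose first four bytes differ (for example, a changed sequence number) but whose remaining bytes match produce equal checksums with `rpc_hdr_len = 4` and different ones with `rpc_hdr_len = 0`.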
2023-11-17	NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update()	Chuck Lever	1	-1/+3
	The "statp + 1" pointer that is passed to nfsd_cache_update() is supposed to point to the start of the egress NFS Reply header. In fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests. But both krb5i and krb5p add fields between the RPC header's accept_stat field and the start of the NFS Reply header. In those cases, "statp + 1" points at the extra fields instead of the Reply. The result is that nfsd_cache_update() caches what looks to the client like garbage. A connection break can occur for a number of reasons, but the most common reason when using krb5i/p is a GSS sequence number window underrun. When an underrun is detected, the server is obliged to drop the RPC and the connection to force a retransmit with a fresh GSS sequence number. The client presents the same XID, it hits in the server's DRC, and the server returns the garbage cache entry. The "statp + 1" argument has been used since the oldest changeset in the kernel history repo, so it has been in nfsd_dispatch() literally since before history began. The problem arose only when the server-side GSS implementation was added twenty years ago. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-11-17	NFSD: Update nfsd_cache_append() to use xdr_stream	Chuck Lever	1	-15/+8
	When inserting a DRC-cached response into the reply buffer, ensure that the reply buffer's xdr_stream is updated properly. Otherwise the server will send a garbage response. Cc: stable@vger.kernel.org # v6.3+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-11-17	nfsd: fix file memleak on client_opens_release	Mahmoud Adam	1	-1/+1
	seq_release should be called to free the allocated seq_file Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Mahmoud Adam <mngyadam@amazon.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: 78599c42ae3c ("nfsd4: add file to display list of client's opens") Reviewed-by: NeilBrown <neilb@suse.de> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-11-17	dm-crypt: start allocating with MAX_ORDER	Mikulas Patocka	1	-1/+1
	Commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") changed the meaning of MAX_ORDER from exclusive to inclusive. So, we can allocate compound pages with up to 1 << MAX_ORDER pages. Reflect this change in dm-crypt and start trying to allocate compound pages with MAX_ORDER. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
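What "inclusive" means for the starting order can be sketched with a helper that picks the largest usable order for a request. `DEMO_MAX_ORDER` (set to 10 here) and `demo_start_order()` are assumptions made for illustration; the real constant is configuration dependent.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_ORDER 10          /* assumed value, treated as inclusive */

/*
 * Largest order such that (1 << order) pages do not exceed the request,
 * capped at DEMO_MAX_ORDER itself. With the older exclusive meaning,
 * the cap would have been DEMO_MAX_ORDER - 1.
 */
static unsigned int demo_start_order(size_t pages)
{
	unsigned int order = DEMO_MAX_ORDER;

	while (order > 0 && ((size_t)1 << order) > pages)
		order--;
	return order;
}
```

A request for 1024 or more pages therefore starts at order 10, and an allocator following this scheme would retry with smaller orders if that attempt fails.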
2023-11-17	dm-verity: don't use blocking calls from tasklets	Mikulas Patocka	3	-14/+15
	The commit 5721d4e5a9cd enhanced dm-verity, so that it can verify blocks from tasklets rather than from workqueues. This reportedly improves performance significantly. However, dm-verity was using the flag CRYPTO_TFM_REQ_MAY_SLEEP from tasklets which resulted in warnings about sleeping function being called from non-sleeping context.
	BUG: sleeping function called from invalid context at crypto/internal.h:206
	in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 14, name: ksoftirqd/0
	preempt_count: 100, expected: 0
	RCU nest depth: 0, expected: 0
	CPU: 0 PID: 14 Comm: ksoftirqd/0 Tainted: G W 6.7.0-rc1 #1
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
	Call Trace:
	 <TASK>
	 dump_stack_lvl+0x32/0x50
	 __might_resched+0x110/0x160
	 crypto_hash_walk_done+0x54/0xb0
	 shash_ahash_update+0x51/0x60
	 verity_hash_update.isra.0+0x4a/0x130 [dm_verity]
	 verity_verify_io+0x165/0x550 [dm_verity]
	 ? free_unref_page+0xdf/0x170
	 ? psi_group_change+0x113/0x390
	 verity_tasklet+0xd/0x70 [dm_verity]
	 tasklet_action_common.isra.0+0xb3/0xc0
	 __do_softirq+0xaf/0x1ec
	 ? smpboot_thread_fn+0x1d/0x200
	 ? sort_range+0x20/0x20
	 run_ksoftirqd+0x15/0x30
	 smpboot_thread_fn+0xed/0x200
	 kthread+0xdc/0x110
	 ? kthread_complete_and_exit+0x20/0x20
	 ret_from_fork+0x28/0x40
	 ? kthread_complete_and_exit+0x20/0x20
	 ret_from_fork_asm+0x11/0x20
	 </TASK>
	This commit fixes dm-verity so that it doesn't use the flags CRYPTO_TFM_REQ_MAY_SLEEP and CRYPTO_TFM_REQ_MAY_BACKLOG from tasklets. The crypto API would do GFP_ATOMIC allocation instead, it could return -ENOMEM and we catch -ENOMEM in verity_tasklet and requeue the request to the workqueue. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v6.0+ Fixes: 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature") Signed-off-by: Mike Snitzer <snitzer@kernel.org>
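The "try without sleeping, fall back to a sleepable context on -ENOMEM" flow can be modeled in user space. Everything in this sketch is hypothetical (`struct demo_io`, `demo_tasklet()`, `demo_worker()`); the allocation failure is simulated with a flag rather than produced by a real allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request state; the flags simulate the environment. */
struct demo_io {
	bool may_sleep;            /* true only in the worker context */
	bool atomic_alloc_fails;   /* simulate a failed non-sleeping allocation */
	bool queued_to_worker;
	bool verified;
};

/* Verification that must not block unless may_sleep is set. */
static int demo_verify(struct demo_io *io)
{
	if (!io->may_sleep && io->atomic_alloc_fails)
		return -ENOMEM;
	io->verified = true;
	return 0;
}

/* Atomic context: never sleep; hand the request over on -ENOMEM. */
static void demo_tasklet(struct demo_io *io)
{
	io->may_sleep = false;
	if (demo_verify(io) == -ENOMEM)
		io->queued_to_worker = true;
}

/* Process context: sleeping is allowed, so the retry can succeed. */
static void demo_worker(struct demo_io *io)
{
	if (!io->queued_to_worker)
		return;
	io->queued_to_worker = false;
	io->may_sleep = true;
	demo_verify(io);
}
```

The fast path stays in the atomic context whenever the allocation succeeds, and only the rare failure pays the cost of a context switch to the worker.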
2023-11-17	dm-bufio: fix no-sleep mode	Mikulas Patocka	1	-25/+62
	dm-bufio has a no-sleep mode. When activated (with the DM_BUFIO_CLIENT_NO_SLEEP flag), the bufio client is read-only and we can call dm_bufio_get from tasklets. This is used by dm-verity. Unfortunately, commit 450e8dee51aa ("dm bufio: improve concurrent IO performance") broke this and the kernel would warn that cache_get() was calling down_read() from non-sleeping context. The bug can be reproduced by using "veritysetup open" with the "--use-tasklets" flag. This commit fixes dm-bufio, so that the tasklet mode works again, by expanding use of the 'no_sleep_enabled' static_key to conditionally use either a rw_semaphore or rwlock_t (which are colocated in the buffer_tree structure using a union). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v6.4 Fixes: 450e8dee51aa ("dm bufio: improve concurrent IO performance") Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-11-17	dm-delay: avoid duplicate logic	Mikulas Patocka	1	-44/+21
	This is a small refactoring of dm-delay - we avoid duplicate logic in flush_delayed_bios and flush_delayed_bios_fast and join these two functions into one. We also add cond_resched() to flush_delayed_bios because the list may have an unbounded number of entries. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-11-17	dm-delay: fix bugs introduced by kthread mode	Mikulas Patocka	1	-26/+35
	This commit fixes the following bugs introduced by commit 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq"):
	* the function flush_worker_fn has no exit path - on unload, this function will just loop and consume 100% CPU without any progress
	* the wake-up mechanism in flush_worker_fn is racy - a wake up will be missed if the process adds entries to the delayed_bios list just before set_current_state(TASK_INTERRUPTIBLE)
	* flush_delayed_bios_fast submits a bio while holding a global mutex; this may deadlock if we have multiple stacked dm-delay devices and the underlying device attempts to acquire the mutex too
	* if the target constructor fails, it will call delay_dtr. delay_dtr would attempt to free dc->timer_lock without it being initialized by the constructor.
	* if the target constructor's kthread allocation fails, delay_dtr would crash trying to dereference dc->worker because it is non-NULL due to ERR_PTR.
	Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-11-17	dm-delay: fix a race between delay_presuspend and delay_bio	Mikulas Patocka	1	-5/+11
	In delay_presuspend, we set the atomic variable may_delay to zero and then stop the timer and flush pending bios. The intention here is to prevent the delay target from re-arming the timer. However, this test is racy. Suppose that one thread goes to delay_bio, sees that dc->may_delay is one and proceeds; now, another thread executes delay_presuspend, sets dc->may_delay to zero, deletes the timer and flushes pending bios. Then, the first thread continues and adds the bio to delayed->list despite the fact that dc->may_delay is false. Fix this bug by changing may_delay's type from atomic_t to bool and only accessing it while holding the delayed_bios_lock mutex. Note that we don't have to grab the mutex in delay_resume because there are no bios in flight at this point. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@kernel.org>
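	A minimal userspace sketch of the locking rule described above, using POSIX threads in place of kernel primitives. It is an illustration under assumptions, not the dm-delay implementation: struct delay_ctx, delay_queue and delay_presuspend_sketch are hypothetical names, the delayed list is reduced to a counter, and the flush is done under the same lock for brevity.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-target state. */
struct delay_ctx {
    pthread_mutex_t lock;  /* plays the role of the delayed_bios_lock mutex */
    bool may_delay;        /* read and written only with lock held */
    size_t queued;         /* number of entries on the delayed list */
};

/*
 * Queue one item if delaying is still allowed.
 * Returns true if it was queued, false if the caller must submit it
 * immediately. The flag check and the list update share one critical
 * section, so a concurrent presuspend cannot slip in between them.
 */
bool delay_queue(struct delay_ctx *dc)
{
    bool queued = false;

    pthread_mutex_lock(&dc->lock);
    if (dc->may_delay) {
        dc->queued++;
        queued = true;
    }
    pthread_mutex_unlock(&dc->lock);
    return queued;
}

/*
 * Forbid further delaying and flush what is pending.
 * Returns the number of entries that were flushed.
 */
size_t delay_presuspend_sketch(struct delay_ctx *dc)
{
    size_t flushed;

    pthread_mutex_lock(&dc->lock);
    dc->may_delay = false;
    flushed = dc->queued;
    dc->queued = 0;
    pthread_mutex_unlock(&dc->lock);
    return flushed;
}
```

	Because both functions take the same mutex, any item queued before the flag is cleared is seen by the flush, and no item can be queued after it.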
2023-11-17	drm/amdgpu/gmc9: disable AGP aperture	Alex Deucher	1	-1/+1
	We've had misc reports of random IOMMU page faults when this is used. It's just a rarely used optimization anyway, so let's just disable it. It can still be toggled via the module parameter for testing. v2: leave it configurable via module parameter Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1) Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>