<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/cpufreq, branch v5.11</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.11</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.11'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-02-08T12:45:51Z</updated>
<entry>
<title>cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there</title>
<updated>2021-02-08T12:45:51Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2021-02-04T17:34:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d11a1d08a082a7dc0ada423d2b2e26e9b6f2525c'/>
<id>urn:sha1:d11a1d08a082a7dc0ada423d2b2e26e9b6f2525c</id>
<content type='text'>
If the maximum performance level taken for computing the
arch_max_freq_ratio value used in the x86 scale-invariance code is
higher than the one corresponding to the cpuinfo.max_freq value
coming from the acpi_cpufreq driver, the scale-invariant utilization
falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly
faster, which causes the schedutil governor to select a frequency
below cpuinfo.max_freq.  That frequency corresponds to a frequency
table entry below the maximum performance level necessary to get to
the "boost" range of CPU frequencies which prevents "boost"
frequencies from being used in some workloads.

While this issue is related to scale-invariance, it may be amplified
by commit db865272d9c4 ("cpufreq: Avoid configuring old governors as
default with intel_pstate") from the 5.10 development cycle which
made it extremely easy to default to schedutil even if the preferred
driver is acpi_cpufreq as long as intel_pstate is built too, because
the mere presence of the latter effectively removes the ondemand
governor from the defaults.  Distro kernels are likely to include
both intel_pstate and acpi_cpufreq on x86, so their users who cannot
use intel_pstate or choose to use acpi_cpufreq may easily be
affectecd by this issue.

If CPPC is available, it can be used to address this issue by
extending the frequency tables created by acpi_cpufreq to cover the
entire available frequency range (including "boost" frequencies) for
each CPU, but if CPPC is not there, acpi_cpufreq has no idea what
the maximum "boost" frequency is and the frequency tables created by
it cannot be extended in a meaningful way, so in that case make it
ask the arch scale-invariance code to to use the "nominal" performance
level for CPU utilization scaling in order to avoid the issue at hand.

Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Giovanni Gherdovich &lt;ggherdovich@suse.cz&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>cpufreq: ACPI: Extend frequency tables to cover boost frequencies</title>
<updated>2021-02-08T12:45:51Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2021-02-04T17:25:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3c55e94c0adea4a5389c4b80f6ae9927dd6a4501'/>
<id>urn:sha1:3c55e94c0adea4a5389c4b80f6ae9927dd6a4501</id>
<content type='text'>
A severe performance regression on AMD EPYC processors when using
the schedutil scaling governor was discovered by Phoronix.com and
attributed to the following commits:

  41ea667227ba ("x86, sched: Calculate frequency invariance for AMD
  systems")

  976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for
  frequency invariance on AMD EPYC")

The source of the problem is that the maximum performance level taken
for computing the arch_max_freq_ratio value used in the x86 scale-
invariance code is higher than the one corresponding to the
cpuinfo.max_freq value coming from the acpi_cpufreq driver.

This effectively causes the scale-invariant utilization to fall below
100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so
the schedutil governor selects a frequency below cpuinfo.max_freq
then.  That frequency corresponds to a frequency table entry below
the maximum performance level necessary to get to the "boost" range
of CPU frequencies.

However, if the cpuinfo.max_freq value coming from acpi_cpufreq was
higher, the schedutil governor would select higher frequencies which
in turn would allow acpi_cpufreq to set more adequate performance
levels and to get to the "boost" range of CPU frequencies more often.

This issue affects any systems where acpi_cpufreq is used and the
"boost" (or "turbo") frequencies are enabled, not just AMD EPYC.
Moreover, commit db865272d9c4 ("cpufreq: Avoid configuring old
governors as default with intel_pstate") from the 5.10 development
cycle made it extremely easy to default to schedutil even if the
preferred driver is acpi_cpufreq as long as intel_pstate is built
too, because the mere presence of the latter effectively removes the
ondemand governor from the defaults.  Distro kernels are likely to
include both intel_pstate and acpi_cpufreq on x86, so their users
who cannot use intel_pstate or choose to use acpi_cpufreq may
easily be affectecd by this issue.

To address this issue, extend the frequency table constructed by
acpi_cpufreq for each CPU to cover the entire range of available
frequencies (including the "boost" ones) if CPPC is available and
indicates that "boost" (or "turbo") frequencies are enabled.  That
causes cpuinfo.max_freq to become the maximum "boost" frequency of
the given CPU (instead of the maximum frequency returned by the ACPI
_PSS object that corresponds to the "nominal" performance level).

Fixes: 41ea667227ba ("x86, sched: Calculate frequency invariance for AMD systems")
Fixes: 976df7e5730e ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC")
Fixes: db865272d9c4 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
Link: https://www.phoronix.com/scan.php?page=article&amp;item=linux511-amd-schedutil&amp;num=1
Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/
Reported-by: Michael Larabel &lt;Michael@phoronix.com&gt;
Diagnosed-by: Giovanni Gherdovich &lt;ggherdovich@suse.cz&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Tested-by: Giovanni Gherdovich &lt;ggherdovich@suse.cz&gt;
Reviewed-by: Giovanni Gherdovich &lt;ggherdovich@suse.cz&gt;
Tested-by: Michael Larabel &lt;Michael@phoronix.com&gt;
</content>
</entry>
<entry>
<title>cpufreq: intel_pstate: remove obsolete functions</title>
<updated>2021-01-07T17:22:46Z</updated>
<author>
<name>Lukas Bulwahn</name>
<email>lukas.bulwahn@gmail.com</email>
</author>
<published>2020-12-21T05:13:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c4151604f0603d5700072183a05828ff87d764e4'/>
<id>urn:sha1:c4151604f0603d5700072183a05828ff87d764e4</id>
<content type='text'>
percent_fp() was used in intel_pstate_pid_reset(), which was removed in
commit 9d0ef7af1f2d ("cpufreq: intel_pstate: Do not use PID-based P-state
selection") and hence, percent_fp() is unused since then.

percent_ext_fp() was last used in intel_pstate_update_perf_limits(), which
was refactored in commit 1a4fe38add8b ("cpufreq: intel_pstate: Remove
max/min fractions to limit performance"), and hence, percent_ext_fp() is
unused since then.

make CC=clang W=1 points us those unused functions:

drivers/cpufreq/intel_pstate.c:79:23: warning: unused function 'percent_fp' [-Wunused-function]
static inline int32_t percent_fp(int percent)
                      ^

drivers/cpufreq/intel_pstate.c:94:23: warning: unused function 'percent_ext_fp' [-Wunused-function]
static inline int32_t percent_ext_fp(int percent)
                      ^

Remove those obsolete functions.

Signed-off-by: Lukas Bulwahn &lt;lukas.bulwahn@gmail.com&gt;
Reviewed-by: Nathan Chancellor &lt;natechancellor@gmail.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get()</title>
<updated>2021-01-07T16:37:33Z</updated>
<author>
<name>Colin Ian King</name>
<email>colin.king@canonical.com</email>
</author>
<published>2021-01-05T10:19:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=943bdd0cecad06da8392a33093230e30e501eccc'/>
<id>urn:sha1:943bdd0cecad06da8392a33093230e30e501eccc</id>
<content type='text'>
Currently there is an unlikely case where cpufreq_cpu_get() returns a
NULL policy and this will cause a NULL pointer dereference later on.

Fix this by passing the policy to transition_frequency_fidvid() from
the caller and hence eliminating the need for the cpufreq_cpu_get()
and cpufreq_cpu_put().

Thanks to Viresh Kumar for suggesting the fix.

Addresses-Coverity: ("Dereference null return")
Fixes: b43a7ffbf33b ("cpufreq: Notify all policy-&gt;cpus in cpufreq_notify_transition()")
Suggested-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>cpufreq: intel_pstate: Use HWP capabilities in intel_cpufreq_adjust_perf()</title>
<updated>2021-01-07T16:34:32Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2021-01-05T18:20:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=17ffd35809c34b9564edb10727d02eb62958ba5c'/>
<id>urn:sha1:17ffd35809c34b9564edb10727d02eb62958ba5c</id>
<content type='text'>
If turbo P-states cannot be used, either due to the configuration of
the processor, or because intel_pstate is not allowed to used them,
the maximum available P-state with HWP enabled corresponds to the
HWP_CAP.GUARANTEED value which is not static.  It can be adjusted by
an out-of-band agent or during an Intel Speed Select performance
level change, so long as it remains less than or equal to
HWP_CAP.MAX.

However, if turbo P-states cannot be used, intel_cpufreq_adjust_perf()
always uses pstate.max_pstate (set during the initialization of the
driver only) as the maximum available P-state, so it may miss a change
of the HWP_CAP.GUARANTEED value.

Prevent that from happening by modifyig intel_cpufreq_adjust_perf()
to always read the "guaranteed" and "maximum turbo" performance
levels from the cached HWP_CAP value.

Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the -&gt;adjust_perf() callback")
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
</content>
</entry>
<entry>
<title>cpufreq: intel_pstate: Fix fast-switch fallback path</title>
<updated>2020-12-30T17:22:17Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2020-12-29T17:08:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=be1283454b61a1f3b089f1a74b73e20532262e32'/>
<id>urn:sha1:be1283454b61a1f3b089f1a74b73e20532262e32</id>
<content type='text'>
When sugov_update_single_perf() falls back to the "frequency"
path due to the missing scale-invariance, it will call
cpufreq_driver_fast_switch() via sugov_fast_switch()
and the driver's -&gt;fast_switch() callback will be invoked,
so it must not be NULL.

However, after commit a365ab6b9dfb ("cpufreq: intel_pstate: Implement
the -&gt;adjust_perf() callback") intel_pstate sets -&gt;fast_switch() to
NULL when it is going to use intel_cpufreq_adjust_perf(), which is a
mistake, because on x86 the scale-invariance may be turned off
dynamically, so modify it to retain the original -&gt;adjust_perf()
callback pointer.

Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the -&gt;adjust_perf() callback")
Reported-by: Kenneth R. Crudup &lt;kenny@panix.com&gt;
Tested-by: Kenneth R. Crudup &lt;kenny@panix.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'pm-cpufreq'</title>
<updated>2020-12-22T16:59:11Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2020-12-22T16:59:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c3a74f8e25e97166ca0f954414825ae98a3209f6'/>
<id>urn:sha1:c3a74f8e25e97166ca0f954414825ae98a3209f6</id>
<content type='text'>
* pm-cpufreq:
  cpufreq: intel_pstate: Use most recent guaranteed performance values
  cpufreq: intel_pstate: Implement the -&gt;adjust_perf() callback
  cpufreq: Add special-purpose fast-switching callback for drivers
  cpufreq: schedutil: Add util to struct sg_cpu
  cppc_cpufreq: replace per-cpu data array with a list
  cppc_cpufreq: expose information on frequency domains
  cppc_cpufreq: clarify support for coordination types
  cppc_cpufreq: use policy-&gt;cpu as driver of frequency setting
  ACPI: processor: fix NONE coordination for domain mapping failure
  ACPI: processor: Drop duplicate setting of shared_cpu_map
</content>
</entry>
<entry>
<title>cpufreq: intel_pstate: Use most recent guaranteed performance values</title>
<updated>2020-12-21T09:51:00Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2020-12-17T19:17:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e40ad84c26b4deeee46666492ec66b9a534b8e59'/>
<id>urn:sha1:e40ad84c26b4deeee46666492ec66b9a534b8e59</id>
<content type='text'>
When turbo has been disabled by the BIOS, but HWP_CAP.GUARANTEED is
changed later, user space may want to take advantage of this increased
guaranteed performance.

HWP_CAP.GUARANTEED is not a static value.  It can be adjusted by an
out-of-band agent or during an Intel Speed Select performance level
change.  The HWP_CAP.MAX is still the maximum achievable performance
with turbo disabled by the BIOS, so HWP_CAP.GUARANTEED can still
change as long as it remains less than or equal to HWP_CAP.MAX.

When HWP_CAP.GUARANTEED is changed, the sysfs base_frequency
attribute shows the most recent guaranteed frequency value. This
attribute can be used by user space software to update the scaling
min/max limits of the CPU.

Currently, the -&gt;setpolicy() callback already uses the latest
HWP_CAP values when setting HWP_REQ, but the -&gt;verify() callback will
restrict the user settings to the to old guaranteed performance value
which prevents user space from making use of the extra CPU capacity
theoretically available to it after increasing HWP_CAP.GUARANTEED.

To address this, read HWP_CAP in intel_pstate_verify_cpu_policy()
to obtain the maximum P-state that can be used and use that to
confine the policy max limit instead of using the cached and
possibly stale pstate.max_freq value for this purpose.

For consistency, update intel_pstate_update_perf_limits() to use the
maximum available P-state returned by intel_pstate_get_hwp_max() to
compute the maximum frequency instead of using the return value of
intel_pstate_get_max_freq() which, again, may be stale.

This issue is a side-effect of fixing the scaling frequency limits in
commit eacc9c5a927e ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max()
for turbo disabled") which corrected the setting of the reduced scaling
frequency values, but caused stale HWP_CAP.GUARANTEED to be used in
the case at hand.

Fixes: eacc9c5a927e ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled")
Reported-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Tested-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Cc: 5.8+ &lt;stable@vger.kernel.org&gt; # 5.8+
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>cpufreq: intel_pstate: Implement the -&gt;adjust_perf() callback</title>
<updated>2020-12-15T18:24:19Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2020-12-14T20:09:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a365ab6b9dfbaf8fb4fb4cd5d8a4c55dc4fb8b1c'/>
<id>urn:sha1:a365ab6b9dfbaf8fb4fb4cd5d8a4c55dc4fb8b1c</id>
<content type='text'>
Make intel_pstate expose the -&gt;adjust_perf() callback when it
operates in the passive mode with HWP enabled which causes the
schedutil governor to use that callback instead of -&gt;fast_switch().

The minimum and target performance-level values passed by the
governor to -&gt;adjust_perf() are converted to HWP.REQ.MIN and
HWP.REQ.DESIRED, respectively, which allows the processor to
adjust its configuration to maximize energy-efficiency while
providing sufficient capacity.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
</content>
</entry>
<entry>
<title>cpufreq: Add special-purpose fast-switching callback for drivers</title>
<updated>2020-12-15T18:24:18Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2020-12-14T20:08:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ee2cc4276ba4909438f5894a218877660e1536d9'/>
<id>urn:sha1:ee2cc4276ba4909438f5894a218877660e1536d9</id>
<content type='text'>
First off, some cpufreq drivers (eg. intel_pstate) can pass hints
beyond the current target frequency to the hardware and there are no
provisions for doing that in the cpufreq framework.  In particular,
today the driver has to assume that it should not allow the frequency
to fall below the one requested by the governor (or the required
capacity may not be provided) which may not be the case and which may
lead to excessive energy usage in some scenarios.

Second, the hints passed by these drivers to the hardware need not be
in terms of the frequency, so representing the utilization numbers
coming from the scheduler as frequency before passing them to those
drivers is not really useful.

Address the two points above by adding a special-purpose replacement
for the -&gt;fast_switch callback, called -&gt;adjust_perf, allowing the
governor to pass abstract performance level (rather than frequency)
values for the minimum (required) and target (desired) performance
along with the CPU capacity to compare them to.

Also update the schedutil governor to use the new callback instead
of -&gt;fast_switch if present and if the utilization mertics are
frequency-invariant (that is requisite for the direct mapping
between the utilization and the CPU performance levels to be a
reasonable approximation).

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
</content>
</entry>
</feed>
