<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/idle, branch v6.7</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.7</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.7'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-10-07T09:33:28Z</updated>
<entry>
<title>intel_idle: Add ibrs_off module parameter to force-disable IBRS</title>
<updated>2023-10-07T09:33:28Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2023-07-27T18:46:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=aa1567a7e6440b8c3af4b0d8a8219d8fc5028c5f'/>
<id>urn:sha1:aa1567a7e6440b8c3af4b0d8a8219d8fc5028c5f</id>
<content type='text'>
Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the cstate is 6 or lower. However, there are
some use cases where a customer may want to use max_cstate=1 to
lower latency. Such use cases will suffer from the performance
degradation caused by the enabling of IBRS in the sibling idle thread.
Add a "ibrs_off" module parameter to force disable IBRS and the
CPUIDLE_FLAG_IRQ_ENABLE flag if set.

In the case of a Skylake server with max_cstate=1, this new ibrs_off
option will likely increase the IRQ response latency as IRQ will now
be disabled.

When running SPECjbb2015 with cstates set to C1 on a Skylake system.

First test when the kernel is booted with: "intel_idle.ibrs_off":

  max-jOPS = 117828, critical-jOPS = 66047

Then retest when the kernel is booted without the "intel_idle.ibrs_off"
added:

  max-jOPS = 116408, critical-jOPS = 58958

That means booting with "intel_idle.ibrs_off" improves performance by:

  max-jOPS:      +1.2%, which could be considered noise range.
  critical-jOPS: +12%,  which is definitely a solid improvement.

The admin-guide/pm/intel_idle.rst file is updated to add a description
about the new "ibrs_off" module parameter.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/20230727184600.26768-5-longman@redhat.com
</content>
</entry>
<entry>
<title>intel_idle: Use __update_spec_ctrl() in intel_idle_ibrs()</title>
<updated>2023-10-07T09:33:28Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2023-07-27T18:45:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7506203089dceb1d9e1f35d37ad2e46d44798a6d'/>
<id>urn:sha1:7506203089dceb1d9e1f35d37ad2e46d44798a6d</id>
<content type='text'>
When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to 0
in order disable IBRS. However, the new MSR value isn't reflected in
x86_spec_ctrl_current which is at odd with the other code that keep track
of its state in that percpu variable.  Use the new __update_spec_ctrl()
to have the x86_spec_ctrl_current percpu value properly updated.

Since spec-ctrl.h includes both msr.h and nospec-branch.h, we can remove
those from the include file list.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/20230727184600.26768-4-longman@redhat.com
</content>
</entry>
<entry>
<title>Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2023-08-28T23:35:01Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-08-28T23:35:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1a7c611546e552193180941ecf6b191e659db979'/>
<id>urn:sha1:1a7c611546e552193180941ecf6b191e659db979</id>
<content type='text'>
Pull perf event updates from Ingo Molnar:

 - AMD IBS improvements

 - Intel PMU driver updates

 - Extend core perf facilities &amp; the ARM PMU driver to better handle ARM big.LITTLE events

 - Micro-optimize software events and the ring-buffer code

 - Misc cleanups &amp; fixes

* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
  perf/x86/intel: Add Crestmont PMU
  x86/cpu: Update Hybrids
  x86/cpu: Fix Crestmont uarch
  x86/cpu: Fix Gracemont uarch
  perf: Remove unused extern declaration arch_perf_get_page_size()
  perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
  locking/arch: Avoid variable shadowing in local_try_cmpxchg()
  perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
  perf/x86: Use local64_try_cmpxchg
  perf/amd: Prevent grouping of IBS events
</content>
</entry>
<entry>
<title>x86/cpu: Fix Gracemont uarch</title>
<updated>2023-08-09T19:51:06Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-08-07T12:38:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=882cdb06b668488a42ef717a260c05ba7dc43a49'/>
<id>urn:sha1:882cdb06b668488a42ef717a260c05ba7dc43a49</id>
<content type='text'>
Alderlake N is an E-core only product using Gracemont
micro-architecture. It fits the pre-existing naming scheme perfectly
fine, adhere to it.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Acked-by: Hans de Goede &lt;hdegoede@redhat.com&gt;
Link: https://lore.kernel.org/r/20230807150405.686834933@infradead.org
</content>
</entry>
<entry>
<title>Revert "intel_idle: Add support for using intel_idle in a VM guest using just hlt"</title>
<updated>2023-07-19T18:10:03Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-07-19T18:10:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5534f44627418897cd901d725303ce3dedd7bc1e'/>
<id>urn:sha1:5534f44627418897cd901d725303ce3dedd7bc1e</id>
<content type='text'>
This reverts commit 2f3d08f074b0 ("intel_idle: Add support for using
intel_idle in a VM guest using just hlt"), because it causes functional
issues to appear and it is not really useful without a related commit
that got reverted previously.

Link: https://lore.kernel.org/linux-pm/5c7de6d5-7706-c4a5-7c41-146db1269aff@intel.com
Reported-by: Xiaoyao Li &lt;xiaoyao.li@intel.com&gt;
Requested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>Revert "intel_idle: Add a "Long HLT" C1 state for the VM guest mode"</title>
<updated>2023-07-19T18:05:43Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-07-19T17:57:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a5155c023d6a2c9ff5f4fad5f75117578a58affc'/>
<id>urn:sha1:a5155c023d6a2c9ff5f4fad5f75117578a58affc</id>
<content type='text'>
This reverts commit 0fac214bb75e ("intel_idle: Add a "Long HLT" C1 state
for the VM guest mode"), because there is a coding mistake in it and its
validity is questioned.

Link: https://lore.kernel.org/all/20230711132553.GN3062772@hirez.programming.kicks-ass.net
Requested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>Revert "intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()"</title>
<updated>2023-07-19T17:56:08Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-07-19T17:56:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d46b0a05bdc849d39c75268ddaf654c59bd6785c'/>
<id>urn:sha1:d46b0a05bdc849d39c75268ddaf654c59bd6785c</id>
<content type='text'>
This reverts commit b2918089d5cb ("intel_idle: Add __init annotation to
matchup_vm_state_with_baremetal()"), because the commit fixed by it will
be reverted.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()</title>
<updated>2023-06-28T17:09:55Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2023-06-27T17:50:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b2918089d5cb452e928ad9f86c5ee601c592cee3'/>
<id>urn:sha1:b2918089d5cb452e928ad9f86c5ee601c592cee3</id>
<content type='text'>
The caller of (recently added) matchup_vm_state_with_baremetal() is an
__init function and it uses some __initdata data structures, so add the
__init annotation to it for consistency.

This addresses the following build warnings:

WARNING: modpost: vmlinux: section mismatch in reference: matchup_vm_state_with_baremetal+0x51 (section: .text) -&gt; intel_idle_max_cstate_reached (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: matchup_vm_state_with_baremetal+0x62 (section: .text) -&gt; cpuidle_state_table (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: matchup_vm_state_with_baremetal+0x79 (section: .text) -&gt; icpu (section: .init.data)

Fixes: 0fac214bb75e ("intel_idle: Add a "Long HLT" C1 state for the VM guest mode")
Reported-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Tested-by: Randy Dunlap &lt;rdunlap@infradead.org&gt; # build-tested
Reviewed-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
</content>
</entry>
<entry>
<title>intel_idle: Add a "Long HLT" C1 state for the VM guest mode</title>
<updated>2023-06-21T17:46:58Z</updated>
<author>
<name>Arjan van de Ven</name>
<email>arjan@linux.intel.com</email>
</author>
<published>2023-06-20T18:03:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0fac214bb75ee64d7b9e6521d53f8251a4d324ea'/>
<id>urn:sha1:0fac214bb75ee64d7b9e6521d53f8251a4d324ea</id>
<content type='text'>
intel_idle will, for the bare metal case, usually have one or more deep
power states that have the CPUIDLE_FLAG_TLB_FLUSHED flag set. When
a state with this flag is selected by the cpuidle framework, it will also
flush the TLBs as part of entering this state. The benefit of doing this is
that the kernel does not need to wake the cpu out of this deep power state
just to flush the TLBs... for which the latency can be very high due to
the exit latency of deep power states.

In a VM guest currently, this benefit of avoiding the wakeup does not exist,
while the problem (long exit latency) is even more severe. Linux will need
to wake up a vCPU (causing the host to either come out of a deep C state,
or the VMM to have to deschedule something else to schedule the vCPU) which
can take a very long time.. adding a lot of latency to tlb flush operations
(including munmap and others).

To solve this, add a "Long HLT" C state to the state table for the VM guest
case that has the CPUIDLE_FLAG_TLB_FLUSHED flag set.  The result of that is
that for long idle periods (where the VMM is likely to do things that cause
large latency) the cpuidle framework will flush the TLBs (and avoid the
wakeups), while for short/quick idle durations, the existing behavior is
retained.

Now, there is still only "hlt" available in the guest, but for long idle,
the host can go to a deeper state (say C6).  There is a reasonable debate
one can have to what to set for the exit_latency and break even point for
this "Long HLT" state.  The good news is that intel_idle has these values
available for the underlying CPU (even when mwait is not exposed).  The
solution thus is to just use the latency and break even of the deepest state
from the bare metal CPU.  This is under the assumption that this is a pretty
reasonable estimate of what the VMM would do to cause latency.

Signed-off-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>intel_idle: Add support for using intel_idle in a VM guest using just hlt</title>
<updated>2023-06-16T17:12:36Z</updated>
<author>
<name>Arjan van de Ven</name>
<email>arjan@linux.intel.com</email>
</author>
<published>2023-06-05T15:47:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2f3d08f074b02aa449de27238fda72496c789034'/>
<id>urn:sha1:2f3d08f074b02aa449de27238fda72496c789034</id>
<content type='text'>
In a typical VM guest, the mwait instruction is not available, leaving
only the 'hlt' instruction (which causes a VMEXIT to the host).

So for this common case, intel_idle will detect the lack of mwait, and
fail to initialize (after which another idle method would step in which
will just use hlt always).

Other (non-common) cases exist; the table below shows the before/after
for these:

+------------+--------------------------+-------------------------+
| Hypervisor | Idle method before patch | Idle method after patch |
| exposes    |                          |                         |
+============+==========================+=========================+
| nothing    | default_idle fallback    | intel_idle VM table     |
| (common)   | (straight "hlt")         |                         |
+------------+--------------------------+-------------------------+
| mwait      | intel_idle mwait table   | intel_idle mwait table  |
+------------+--------------------------+-------------------------+
| ACPI       | ACPI C1 state ("hlt")    | intel_idle VM table     |
+------------+--------------------------+-------------------------+

This is only applicable to CPUs known by intel_idle. For the bare metal
case, unknown CPU models will use the ACPI tables (when available) to
get estimates for latency and break even point for longer idle states.
In guests, the common case is that ACPI tables are not available, but
even when they are available, they can't and don't provide the latency
information for the longer (mwait based) states. For this scenario
(unknown CPU model), the default_idle mode (no ACPI) or ACPI C1 (ACPI
avaible) will be used.

By providing capability to do this with the intel_idle driver, we can
do better than the fallback or ACPI table methods. While this current
change only gets us to the existing behavior, later patches in this
series will add new capabilities such as optimized TLB flushing.

In order to do this, a simplified version of the initialization
function for VM guests is created, and this will be called if the CPU
is recognized, but mwait is not supported, and we're in a VM guest.

One thing to note is that the max latency (and break even) of this C1
state is higher than the typical bare metal C1 state. Because hlt causes
a vmexit, and the cost of vmexit + hypervisor overhead + vmenter is
typically in the order of upto 5 microseconds... even if the hypervisor
does not actually goes into a hardware power saving state.

Signed-off-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
[ rjw: Dropped redundant checks from should_verify_mwait() ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
</feed>
