<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/Documentation/virt, branch v5.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-08-13T07:32:14Z</updated>
<entry>
<title>KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock</title>
<updated>2021-08-13T07:32:14Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-08-12T18:18:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ce25681d59ffc4303321e555a2d71b1946af07da'/>
<id>urn:sha1:ce25681d59ffc4303321e555a2d71b1946af07da</id>
<content type='text'>
Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync.  When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault.  The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.

Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations.  But underflow is the best case scenario.  The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.

Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok.  Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.

For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write.  If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.

Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault.  I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled.  Holding mmu_lock for read also
prevents the indirect shadow page from being freed.  But as above, keep
it simple and always take the lock.

Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP.  Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).

Alternative #2, the unsync logic could be made thread safe.  In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice.  However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).

Fixes: a2855afc7ee8 ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon &lt;bgardon@google.com&gt;
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210812181815.3378104-1-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>docs: virt: kvm: api.rst: replace some characters</title>
<updated>2021-07-26T12:26:06Z</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab+huawei@kernel.org</email>
</author>
<published>2021-07-22T09:50:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3b1c8c5682672d73c1e977944af8c3ebed4a0ce1'/>
<id>urn:sha1:3b1c8c5682672d73c1e977944af8c3ebed4a0ce1</id>
<content type='text'>
The conversion tools used during DocBook/LaTeX/html/Markdown-&gt;ReST
conversion and some cut-and-pasted text contain some characters that
aren't easily reachable on standard keyboards and/or could cause
troubles when parsed by the documentation build system.

Replace the occurences of the following characters:

	- U+00a0 (' '): NO-BREAK SPACE
	  as it can cause lines being truncated on PDF output

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab+huawei@kernel.org&gt;
Message-Id: &lt;ff70cb42d63f3a1da66af1b21b8d038418ed5189.1626947264.git.mchehab+huawei@kernel.org&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Documentation: Fix KVM_CAP_ENFORCE_PV_FEATURE_CPUID name</title>
<updated>2021-07-26T12:24:30Z</updated>
<author>
<name>Vitaly Kuznetsov</name>
<email>vkuznets@redhat.com</email>
</author>
<published>2021-07-22T09:26:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0e691ee7b5034c91a31b565d3ff9a50e01dde445'/>
<id>urn:sha1:0e691ee7b5034c91a31b565d3ff9a50e01dde445</id>
<content type='text'>
'KVM_CAP_ENFORCE_PV_CPUID' doesn't match the define in
include/uapi/linux/kvm.h.

Signed-off-by: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Message-Id: &lt;20210722092628.236474-1-vkuznets@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'docs-5.14' of git://git.lwn.net/linux</title>
<updated>2021-06-28T23:53:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-06-28T23:53:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=233a806b00e31b3ab8d57a68f1aab40cf1e5eaea'/>
<id>urn:sha1:233a806b00e31b3ab8d57a68f1aab40cf1e5eaea</id>
<content type='text'>
Pull documentation updates from Jonathan Corbet:
 "This was a reasonably active cycle for documentation; this includes:

   - Some kernel-doc cleanups. That script is still regex onslaught from
     hell, but it has gotten a little better.

   - Improvements to the checkpatch docs, which are also used by the
     tool itself.

   - A major update to the pathname lookup documentation.

   - Elimination of :doc: markup, since our automarkup magic can create
     references from filenames without all the extra noise.

   - The flurry of Chinese translation activity continues.

  Plus, of course, the usual collection of updates, typo fixes, and
  warning fixes"

* tag 'docs-5.14' of git://git.lwn.net/linux: (115 commits)
  docs: path-lookup: use bare function() rather than literals
  docs: path-lookup: update symlink description
  docs: path-lookup: update get_link() -&gt;follow_link description
  docs: path-lookup: update WALK_GET, WALK_PUT desc
  docs: path-lookup: no get_link()
  docs: path-lookup: update i_op-&gt;put_link and cookie description
  docs: path-lookup: i_op-&gt;follow_link replaced with i_op-&gt;get_link
  docs: path-lookup: Add macro name to symlink limit description
  docs: path-lookup: remove filename_mountpoint
  docs: path-lookup: update do_last() part
  docs: path-lookup: update path_mountpoint() part
  docs: path-lookup: update path_to_nameidata() part
  docs: path-lookup: update follow_managed() part
  docs: Makefile: Use CONFIG_SHELL not SHELL
  docs: Take a little noise out of the build process
  docs: x86: avoid using ReST :doc:`foo` markup
  docs: virt: kvm: s390-pv-boot.rst: avoid using ReST :doc:`foo` markup
  docs: userspace-api: landlock.rst: avoid using ReST :doc:`foo` markup
  docs: trace: ftrace.rst: avoid using ReST :doc:`foo` markup
  docs: trace: coresight: coresight.rst: avoid using ReST :doc:`foo` markup
  ...
</content>
</entry>
<entry>
<title>Merge tag 'kvmarm-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD</title>
<updated>2021-06-25T15:24:24Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2021-06-25T15:24:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b8917b4ae44d1b945f6fba3d8ee6777edb44633b'/>
<id>urn:sha1:b8917b4ae44d1b945f6fba3d8ee6777edb44633b</id>
<content type='text'>
KVM/arm64 updates for v5.14.

- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
  and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
</content>
</entry>
<entry>
<title>kvm: x86: Allow userspace to handle emulation errors</title>
<updated>2021-06-24T22:00:48Z</updated>
<author>
<name>Aaron Lewis</name>
<email>aaronlewis@google.com</email>
</author>
<published>2021-05-10T14:48:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=19238e75bd8ed8ffe784bf5b37586e77b2093742'/>
<id>urn:sha1:19238e75bd8ed8ffe784bf5b37586e77b2093742</id>
<content type='text'>
Add a fallback mechanism to the in-kernel instruction emulator that
allows userspace the opportunity to process an instruction the emulator
was unable to.  When the in-kernel instruction emulator fails to process
an instruction it will either inject a #UD into the guest or exit to
userspace with exit reason KVM_INTERNAL_ERROR.  This is because it does
not know how to proceed in an appropriate manner.  This feature lets
userspace get involved to see if it can figure out a better path
forward.

Signed-off-by: Aaron Lewis &lt;aaronlewis@google.com&gt;
Reviewed-by: David Edmondson &lt;david.edmondson@oracle.com&gt;
Message-Id: &lt;20210510144834.658457-2-aaronlewis@google.com&gt;
Reviewed-by: Jim Mattson &lt;jmattson@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans</title>
<updated>2021-06-24T22:00:41Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-06-22T17:57:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=167f8a5cae99fb2050d3d674ca84457a526e23dd'/>
<id>urn:sha1:167f8a5cae99fb2050d3d674ca84457a526e23dd</id>
<content type='text'>
Rename "nxe" to "efer_nx" so that future macro magic can use the pattern
&lt;reg&gt;_&lt;bit&gt; for all CR0, CR4, and EFER bits that included in the role.
Using "efer_nx" also makes it clear that the role bit reflects EFER.NX,
not the NX bit in the corresponding PTE.

Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210622175739.3610207-25-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86/mmu: Use MMU role to check for matching guest page sizes</title>
<updated>2021-06-24T22:00:37Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-06-22T17:56:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=00a669780ffa8c4b5f3e37346b5bf45508dd15bb'/>
<id>urn:sha1:00a669780ffa8c4b5f3e37346b5bf45508dd15bb</id>
<content type='text'>
Originally, __kvm_sync_page used to check the cr4_pae bit in the role
to avoid zapping 4-byte kvm_mmu_pages when guest page size are 8-byte
or the other way round.  However, in commit 47c42e6b4192 ("KVM: x86: fix
handling of role.cr4_pae and rename it to 'gpte_size'", 2019-03-28) it
was observed that this did not work for nested EPT, where the page table
size would be 8 bytes even if CR4.PAE=0.  (Note that the check still
has to be done for nested *NPT*, so it is not possible to use tdp_enabled
or similar).

Therefore, a hack was introduced to identify nested EPT shadow pages
and unconditionally call __kvm_sync_page() on them.  However, it is
possible to do without the hack to identify nested EPT shadow pages:
if EPT is active, there will be no shadow pages in non-EPT format,
and all of them will have gpte_is_8_bytes set to true; we can just
check the MMU role directly, and the test will always be true.

Even for non-EPT shadow MMUs, this test should really always be true
now that __kvm_sync_page() is called if and only if the role is an
exact match (kvm_mmu_get_page()) or is part of the current MMU context
(kvm_mmu_sync_roots()).  A future commit will convert the likely-pointless
check into a meaningful WARN to enforce that the mmu_roles of the current
context and the shadow page are compatible.

Cc: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210622175739.3610207-11-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken</title>
<updated>2021-06-24T22:00:36Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-06-22T17:56:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63f5a1909f9e465eb446274969f65471794deafb'/>
<id>urn:sha1:63f5a1909f9e465eb446274969f65471794deafb</id>
<content type='text'>
Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest
instability.  Initialize last_vmentry_cpu to -1 and use it to detect if
the vCPU has been run at least once when its CPUID model is changed.

KVM does not correctly handle changes to paging related settings in the
guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc...  KVM
could theoretically zap all shadow pages, but actually making that happen
is a mess due to lock inversion (vcpu-&gt;mutex is held).  And even then,
updating paging settings on the fly would only work if all vCPUs are
stopped, updated in concert with identical settings, then restarted.

To support running vCPUs with different vCPU models (that affect paging),
KVM would need to track all relevant information in kvm_mmu_page_role.
Note, that's the _page_ role, not the full mmu_role.  Updating mmu_role
isn't sufficient as a vCPU can reuse a shadow page translation that was
created by a vCPU with different settings and thus completely skip the
reserved bit checks (that are tied to CPUID).

Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as
it would require doubling gfn_track from a u16 to a u32, i.e. would
increase KVM's memory footprint by 2 bytes for every 4kb of guest memory.
E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT
would all need to be tracked.

In practice, there is no remotely sane use case for changing any paging
related CPUID entries on the fly, so just sweep it under the rug (after
yelling at userspace).

Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20210622175739.3610207-8-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: stats: Add documentation for binary statistics interface</title>
<updated>2021-06-24T22:00:23Z</updated>
<author>
<name>Jing Zhang</name>
<email>jingzhangos@google.com</email>
</author>
<published>2021-06-18T22:27:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fdc09ddd40645b0e3f245e4512fd4b4c34cde5e5'/>
<id>urn:sha1:fdc09ddd40645b0e3f245e4512fd4b4c34cde5e5</id>
<content type='text'>
This new API provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
The statistics descriptors are defined by KVM in kernel and can be
by userspace to discover VM/VCPU statistics during the one-time setup
stage.
The statistics data itself could be read out by userspace telemetry
periodically without any extra parsing or setup effort.
There are a few existed interface protocols and definitions, but no
one can fulfil all the requirements this interface implemented as
below:
1. During high frequency periodic stats reading, there should be no
   extra efforts except the stats data read itself.
2. Support stats annotation, like type (cumulative, instantaneous,
   peak, histogram, etc) and unit (counter, time, size, cycles, etc).
3. The stats data reading should be free of lock/synchronization. We
   don't care about the consistency between all the stats data. All
   stats data can not be read out at exactly the same time. We really
   care about the change or trend of the stats data. The lock-free
   solution is not just for efficiency and scalability, also for the
   stats data accuracy and usability. For example, in the situation
   that all the stats data readings are protected by a global lock,
   if one VCPU died somehow with that lock held, then all stats data
   reading would be blocked, then we have no way from stats data that
   which VCPU has died.
4. The stats data reading workload can be handed over to other
   unprivileged process.

Reviewed-by: David Matlack &lt;dmatlack@google.com&gt;
Reviewed-by: Ricardo Koller &lt;ricarkol@google.com&gt;
Reviewed-by: Krish Sadhukhan &lt;krish.sadhukhan@oracle.com&gt;
Reviewed-by: Fuad Tabba &lt;tabba@google.com&gt;
Signed-off-by: Jing Zhang &lt;jingzhangos@google.com&gt;
Message-Id: &lt;20210618222709.1858088-6-jingzhangos@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
