<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/virt, branch v6.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-02-04T16:27:45Z</updated>
<entry>
<title>KVM: remove kvm_arch_post_init_vm</title>
<updated>2025-02-04T16:27:45Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2025-01-24T15:26:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6f61269495260531e15d84d090ee63618110c470'/>
<id>urn:sha1:6f61269495260531e15d84d090ee63618110c470</id>
<content type='text'>
The only statement in a kvm_arch_post_init_vm implementation
can be moved into the x86 kvm_arch_init_vm.  Do so and remove all
traces from architecture-independent code.

Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Do not restrict the size of KVM-internal memory regions</title>
<updated>2025-01-31T11:03:52Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-01-23T14:46:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=66119f8ce135de664cb2fb88d9aaa322d7451a1f'/>
<id>urn:sha1:66119f8ce135de664cb2fb88d9aaa322d7451a1f</id>
<content type='text'>
Exempt KVM-internal memslots from the KVM_MEM_MAX_NR_PAGES restriction, as
the limit on the number of pages exists purely to play nice with dirty
bitmap operations, which use 32-bit values to index the bitmaps, and dirty
logging isn't supported for KVM-internal memslots.

Link: https://lore.kernel.org/all/20240802205003.353672-6-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Reviewed-by: Christoph Schlameuss &lt;schlameuss@linux.ibm.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Link: https://lore.kernel.org/r/20250123144627.312456-2-imbrenda@linux.ibm.com
Signed-off-by: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Message-ID: &lt;20250123144627.312456-2-imbrenda@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'kvm-mirror-page-tables' into HEAD</title>
<updated>2025-01-20T12:15:58Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2025-01-20T12:15:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=86eb1aef7279ec68fe9b7a44685efc09aa56a8f0'/>
<id>urn:sha1:86eb1aef7279ec68fe9b7a44685efc09aa56a8f0</id>
<content type='text'>
As part of enabling TDX virtual machines, support support separation of
private/shared EPT into separate roots.

Confidential computing solutions almost invariably have concepts of
private and shared memory, but they may different a lot in the details.
In SEV, for example, the bit is handled more like a permission bit as
far as the page tables are concerned: the private/shared bit is not
included in the physical address.

For TDX, instead, the bit is more like a physical address bit, with
the host mapping private memory in one half of the address space and
shared in another.  Furthermore, the two halves are mapped by different
EPT roots and only the shared half is managed by KVM; the private half
(also called Secure EPT in Intel documentation) gets managed by the
privileged TDX Module via SEAMCALLs.

As a result, the operations that actually change the private half of
the EPT are limited and relatively slow compared to reading a PTE. For
this reason the design for KVM is to keep a mirror of the private EPT in
host memory.  This allows KVM to quickly walk the EPT and only perform the
slower private EPT operations when it needs to actually modify mid-level
private PTEs.

There are thus three sets of EPT page tables: external, mirror and
direct.  In the case of TDX (the only user of this framework) the
first two cover private memory, whereas the third manages shared
memory:

  external EPT - Hidden within the TDX module, modified via TDX module
                 calls.

  mirror EPT   - Bookkeeping tree used as an optimization by KVM, not
                 used by the processor.

  direct EPT   - Normal EPT that maps unencrypted shared memory.
                 Managed like the EPT of a normal VM.

Modifying external EPT
----------------------

Modifications to the mirrored page tables need to also perform the
same operations to the private page tables, which will be handled via
kvm_x86_ops.  Although this prep series does not interact with the TDX
module at all to actually configure the private EPT, it does lay the
ground work for doing this.

In some ways updating the private EPT is as simple as plumbing PTE
modifications through to also call into the TDX module; however, the
locking is more complicated because inserting a single PTE cannot anymore
be done atomically with a single CMPXCHG.  For this reason, the existing
FROZEN_SPTE mechanism is used whenever a call to the TDX module updates the
private EPT.  FROZEN_SPTE acts basically as a spinlock on a PTE.  Besides
protecting operation of KVM, it limits the set of cases in which the
TDX module will encounter contention on its own PTE locks.

Zapping external EPT
--------------------
While the framework tries to be relatively generic, and to be
understandable without knowing TDX much in detail, some requirements of
TDX sometimes leak; for example the private page tables also cannot be
zapped while the range has anything mapped, so the mirrored/private page
tables need to be protected from KVM operations that zap any non-leaf
PTEs, for example kvm_mmu_reset_context() or kvm_mmu_zap_all_fast().

For normal VMs, guest memory is zapped for several reasons: user
memory getting paged out by the guest, memslots getting deleted,
passthrough of devices with non-coherent DMA.  Confidential computing
adds to these the conversion of memory between shared and privates. These
operations must not zap any private memory that is in use by the guest.

This is possible because the only zapping that is out of the control
of KVM/userspace is paging out userspace memory, which cannot apply to
guestmemfd operations.  Thus a TDX VM will only zap private memory from
memslot deletion and from conversion between private and shared memory
which is triggered by the guest.

To avoid zapping too much memory, enums are introduced so that operations
can choose to target only private or shared memory, and thus only
direct or mirror EPT.  For example:

  Memslot deletion           - Private and shared
  MMU notifier based zapping - Shared only
  Conversion to shared       - Private only
  Conversion to private      - Shared only

Other cases of zapping will not be supported for KVM, for example
APICv update or non-coherent DMA status update; for the latter, TDX will
simply require that the CPU supports self-snoop and honor guest PAT
unconditionally for shared memory.
</content>
</entry>
<entry>
<title>Merge tag 'kvm-x86-vcpu_array-6.14' of https://github.com/kvm-x86/linux into HEAD</title>
<updated>2025-01-20T11:36:40Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2025-01-20T11:36:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7a9164dc69fd56d0f5c4af6fce6552837b2b0bad'/>
<id>urn:sha1:7a9164dc69fd56d0f5c4af6fce6552837b2b0bad</id>
<content type='text'>
KVM vcpu_array fixes and cleanups for 6.14:

 - Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug
   where KVM would return a pointer to a vCPU prior to it being fully online,
   and give kvm_for_each_vcpu() similar treatment to fix a similar flaw.

 - Wait for a vCPU to come online prior to executing a vCPU ioctl to fix a
   bug where userspace could coerce KVM into handling the ioctl on a vCPU that
   isn't yet onlined.

 - Gracefully handle xa_insert() failures even though such failuires should be
   impossible in practice.
</content>
</entry>
<entry>
<title>KVM: Disallow all flags for KVM-internal memslots</title>
<updated>2025-01-15T01:36:16Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-01-11T00:20:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0cc3cb2151f9830274e7bef39a23dc1da1ecd34a'/>
<id>urn:sha1:0cc3cb2151f9830274e7bef39a23dc1da1ecd34a</id>
<content type='text'>
Disallow all flags for KVM-internal memslots as all existing flags require
some amount of userspace interaction to have any meaning.  In addition to
guarding against KVM goofs, explicitly disallowing dirty logging of KVM-
internal memslots will (hopefully) allow exempting KVM-internal memslots
from the KVM_MEM_MAX_NR_PAGES limit, which appears to exist purely because
the dirty bitmap operations use a 32-bit index.

Cc: Xiaoyao Li &lt;xiaoyao.li@intel.com&gt;
Cc: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Reviewed-by: Xiaoyao Li &lt;xiaoyao.li@intel.com&gt;
Reviewed-by: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Acked-by: Christoph Schlameuss &lt;schlameuss@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250111002022.1230573-6-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86: Drop double-underscores from __kvm_set_memory_region()</title>
<updated>2025-01-15T01:36:16Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-01-11T00:20:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=344315e93dbc9c4733d5c9fd605ecfc46ef97180'/>
<id>urn:sha1:344315e93dbc9c4733d5c9fd605ecfc46ef97180</id>
<content type='text'>
Now that there's no outer wrapper for __kvm_set_memory_region() and it's
static, drop its double-underscore prefix.

No functional change intended.

Cc: Tao Su &lt;tao1.su@linux.intel.com&gt;
Reviewed-by: Xiaoyao Li &lt;xiaoyao.li@intel.com&gt;
Reviewed-by: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Acked-by: Christoph Schlameuss &lt;schlameuss@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250111002022.1230573-5-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Add a dedicated API for setting KVM-internal memslots</title>
<updated>2025-01-15T01:36:15Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-01-11T00:20:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=156bffdb2b49fc0c869bf160a57378886f5fa92d'/>
<id>urn:sha1:156bffdb2b49fc0c869bf160a57378886f5fa92d</id>
<content type='text'>
Add a dedicated API for setting internal memslots, and have it explicitly
disallow setting userspace memslots.  Setting a userspace memslots without
a direct command from userspace would result in all manner of issues.

No functional change intended.

Cc: Tao Su &lt;tao1.su@linux.intel.com&gt;
Cc: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Reviewed-by: Xiaoyao Li &lt;xiaoyao.li@intel.com&gt;
Reviewed-by: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Acked-by: Christoph Schlameuss &lt;schlameuss@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250111002022.1230573-4-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Assert slots_lock is held when setting memory regions</title>
<updated>2025-01-15T01:36:15Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-01-11T00:20:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d131f0042f466d558f7b75af5e100b62a91a5803'/>
<id>urn:sha1:d131f0042f466d558f7b75af5e100b62a91a5803</id>
<content type='text'>
Add proper lockdep assertions in __kvm_set_memory_region() and
__x86_set_memory_region() instead of relying comments.

Opportunistically delete __kvm_set_memory_region()'s entire function
comment as the API doesn't allocate memory or select a gfn, and the
"mostly for framebuffers" comment hasn't been true for a very long time.

Cc: Tao Su &lt;tao1.su@linux.intel.com&gt;
Reviewed-by: Xiaoyao Li &lt;xiaoyao.li@intel.com&gt;
Reviewed-by: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Acked-by: Christoph Schlameuss &lt;schlameuss@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250111002022.1230573-3-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API)</title>
<updated>2025-01-15T01:36:15Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-01-11T00:20:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f81a6d12bf8b262f4c8ce5e856a4d399d97612ee'/>
<id>urn:sha1:f81a6d12bf8b262f4c8ce5e856a4d399d97612ee</id>
<content type='text'>
Open code kvm_set_memory_region() into its sole caller in preparation for
adding a dedicated API for setting internal memslots.

Oppurtunistically use the fancy new guard(mutex) to avoid a local 'r'
variable.

Cc: Tao Su &lt;tao1.su@linux.intel.com&gt;
Reviewed-by: Xiaoyao Li &lt;xiaoyao.li@intel.com&gt;
Reviewed-by: Claudio Imbrenda &lt;imbrenda@linux.ibm.com&gt;
Acked-by: Christoph Schlameuss &lt;schlameuss@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250111002022.1230573-2-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: Add member to struct kvm_gfn_range to indicate private/shared</title>
<updated>2024-12-23T13:28:55Z</updated>
<author>
<name>Isaku Yamahata</name>
<email>isaku.yamahata@intel.com</email>
</author>
<published>2024-07-18T21:12:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dca6c88532322830d5d92486467fcc91b67a9ad8'/>
<id>urn:sha1:dca6c88532322830d5d92486467fcc91b67a9ad8</id>
<content type='text'>
Add new members to strut kvm_gfn_range to indicate which mapping
(private-vs-shared) to operate on: enum kvm_gfn_range_filter
attr_filter. Update the core zapping operations to set them appropriately.

TDX utilizes two GPA aliases for the same memslots, one for memory that is
for private memory and one that is for shared. For private memory, KVM
cannot always perform the same operations it does on memory for default
VMs, such as zapping pages and having them be faulted back in, as this
requires guest coordination. However, some operations such as guest driven
conversion of memory between private and shared should zap private memory.

Internally to the MMU, private and shared mappings are tracked on separate
roots. Mapping and zapping operations will operate on the respective GFN
alias for each root (private or shared). So zapping operations will by
default zap both aliases. Add fields in struct kvm_gfn_range to allow
callers to specify which aliases so they can only target the aliases
appropriate for their specific operation.

There was feedback that target aliases should be specified such that the
default value (0) is to operate on both aliases. Several options were
considered. Several variations of having separate bools defined such
that the default behavior was to process both aliases. They either allowed
nonsensical configurations, or were confusing for the caller. A simple
enum was also explored and was close, but was hard to process in the
caller. Instead, use an enum with the default value (0) reserved as a
disallowed value. Catch ranges that didn't have the target aliases
specified by looking for that specific value.

Set target alias with enum appropriately for these MMU operations:
 - For KVM's mmu notifier callbacks, zap shared pages only because private
   pages won't have a userspace mapping
 - For setting memory attributes, kvm_arch_pre_set_memory_attributes()
   chooses the aliases based on the attribute.
 - For guest_memfd invalidations, zap private only.

Link: https://lore.kernel.org/kvm/ZivIF9vjKcuGie3s@google.com/
Signed-off-by: Isaku Yamahata &lt;isaku.yamahata@intel.com&gt;
Co-developed-by: Rick Edgecombe &lt;rick.p.edgecombe@intel.com&gt;
Signed-off-by: Rick Edgecombe &lt;rick.p.edgecombe@intel.com&gt;
Message-ID: &lt;20240718211230.1492011-3-rick.p.edgecombe@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
