<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/virt, branch v6.3</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.3</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.3'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2023-03-31T15:19:05Z</updated>
<entry>
<title>KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent</title>
<updated>2023-03-31T15:19:05Z</updated>
<author>
<name>Alexey Kardashevskiy</name>
<email>aik@ozlabs.ru</email>
</author>
<published>2022-05-04T07:48:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=52882b9c7a761b2b4e44717d6fbd1ed94c601b7f'/>
<id>urn:sha1:52882b9c7a761b2b4e44717d6fbd1ed94c601b7f</id>
<content type='text'>
When introduced, IRQFD resampling worked on POWER8 with XICS. However
KVM on POWER9 has never implemented it - the compatibility mode code
("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native
XIVE mode does not handle INTx in KVM at all.

This moves the capability support advertising to platforms and stops
advertising it on XIVE, i.e. POWER9 and later.

Signed-off-by: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Acked-by: Anup Patel &lt;anup@brainfault.org&gt;
Acked-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Message-Id: &lt;20220504074807.3616813-1-aik@ozlabs.ru&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking</title>
<updated>2023-03-27T14:13:28Z</updated>
<author>
<name>Dmytro Maluka</name>
<email>dmy@semihalf.com</email>
</author>
<published>2023-03-22T20:43:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fef8f2b90edbd7089a4278021314f11f056b0cbb'/>
<id>urn:sha1:fef8f2b90edbd7089a4278021314f11f056b0cbb</id>
<content type='text'>
KVM irqfd based emulation of level-triggered interrupts doesn't work
quite correctly in some cases, particularly in the case of interrupts
that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT).
Such an interrupt is acked to the device in its threaded irq handler,
i.e. later than it is acked to the interrupt controller (EOI at the end
of hardirq), not earlier.

Linux keeps such an interrupt masked until its threaded handler finishes,
to prevent the EOI from re-asserting an unacknowledged interrupt.
However, with KVM + vfio (or whatever is listening on the resamplefd)
we always notify resamplefd at the EOI, so vfio prematurely unmasks the
host physical IRQ, thus a new physical interrupt is fired in the host.
This extra interrupt in the host is not a problem per se. The problem is
that it is unconditionally queued for injection into the guest, so the
guest sees an extra bogus interrupt. [*]

At least 2 user-visible issues caused by those extra erroneous
interrupts for a oneshot irq in the guest have been observed:

1. System suspend aborted due to a pending wakeup interrupt from
   ChromeOS EC (drivers/platform/chrome/cros_ec.c).
2. Annoying "invalid report id data" errors from ELAN0000 touchpad
   (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
   every time the touchpad is touched.

The core issue here is that by the time the guest unmasks the IRQ,
the physical IRQ line is no longer asserted (since the guest has
acked the interrupt to the device in the meantime), yet we
unconditionally inject the interrupt queued into the guest by the
previous resampling. So to fix the issue, we need a way to detect that
the IRQ is no longer pending, and cancel the queued interrupt in this
case.

With IOAPIC we are not able to probe the physical IRQ line state
directly (at least not if the underlying physical interrupt controller
is an IOAPIC too), so in this patch we use the irqfd resampler for that.
Namely, instead of injecting the queued interrupt, we just notify the
resampler that this interrupt is done. If the IRQ line is actually
already deasserted, we are done. If it is still asserted, a new
interrupt will be shortly triggered through irqfd and injected into the
guest.

If there is no irqfd resampler registered for this IRQ, we
cannot fix the issue, so we keep the existing behavior: immediately and
unconditionally inject the queued interrupt.
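
As a rough illustration only, the unmask decision described above can be
modelled in a few lines of Python. This is a toy sketch, not the kernel
code: Line, unmask() and the "queued"/"resampled" markers are hypothetical
names invented here, and the irqfd round trip is collapsed into a direct
check of the line state.

```python
# Toy model of the unmask decision; all names here are hypothetical.
class Line:
    """Physical level-triggered IRQ line as seen by the resampler."""

    def __init__(self, asserted):
        self.asserted = asserted


def unmask(queued, resampler_line, injected):
    """Handle a guest unmask of an IRQ that may have a queued interrupt.

    queued: True if an interrupt was queued while the IRQ was masked.
    resampler_line: a Line if a resampler is registered, else None.
    injected: list collecting the interrupts delivered to the guest.
    """
    if not queued:
        return
    if resampler_line is None:
        # No resampler: keep the old behavior and inject unconditionally.
        injected.append("queued")
        return
    # Resampler present: drop the queued interrupt and resample instead.
    # If the line is still asserted, a fresh interrupt arrives via irqfd.
    if resampler_line.asserted:
        injected.append("resampled")
```

With a deasserted line the guest sees nothing, with an asserted line it
sees one freshly resampled interrupt, and with no resampler it still sees
the queued one.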

This patch fixes the issue for x86 IOAPIC only. In the long run, we can
fix it for other irqchips and other architectures too, possibly taking
advantage of reading the physical state of the IRQ line, which is
possible with some other irqchips (e.g. with arm64 GIC, maybe even with
the legacy x86 PIC).

[*] In this description we assume that the interrupt is a physical host
    interrupt forwarded to the guest e.g. by vfio. Potentially the same
    issue may occur also with a purely virtual interrupt from an
    emulated device, e.g. if the guest handles this interrupt, again, as
    a oneshot interrupt.

Signed-off-by: Dmytro Maluka &lt;dmy@semihalf.com&gt;
Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee-d2fde2700083@semihalf.com/
Link: https://lore.kernel.org/lkml/87o7wrug0w.wl-maz@kernel.org/
Message-Id: &lt;20230322204344.50138-3-dmy@semihalf.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: irqfd: Make resampler_list an RCU list</title>
<updated>2023-03-27T14:13:28Z</updated>
<author>
<name>Dmytro Maluka</name>
<email>dmy@semihalf.com</email>
</author>
<published>2023-03-22T20:43:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d583fbd7066a2dea43050521a95d9770f7d7593e'/>
<id>urn:sha1:d583fbd7066a2dea43050521a95d9770f7d7593e</id>
<content type='text'>
It is useful to be able to do read-only traversal of the list of all the
registered irqfd resamplers without locking the resampler_lock mutex.
In particular, we are going to traverse it to search for a resampler
registered for the given irq of an irqchip, and that will be done with
an irqchip spinlock (ioapic-&gt;lock) held, so it is undesirable to lock a
mutex in this context. So turn this list into an RCU list.

For protecting the read side, reuse kvm-&gt;irq_srcu which is already used
for protecting a number of irq related things (kvm-&gt;irq_routing,
irqfd-&gt;resampler-&gt;list, kvm-&gt;irq_ack_notifier_list,
kvm-&gt;arch.mask_notifier_list).

Signed-off-by: Dmytro Maluka &lt;dmy@semihalf.com&gt;
Message-Id: &lt;20230322204344.50138-2-dmy@semihalf.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'kvm-riscv-6.3-1' of https://github.com/kvm-riscv/linux into HEAD</title>
<updated>2023-02-15T17:33:28Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2023-02-15T17:33:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=33436335e93a1788a58443fc99c5ab320ce4b9d9'/>
<id>urn:sha1:33436335e93a1788a58443fc99c5ab320ce4b9d9</id>
<content type='text'>
KVM/riscv changes for 6.3

- Fix wrong usage of PGDIR_SIZE to check page sizes
- Fix privilege mode setting in kvm_riscv_vcpu_trap_redirect()
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
</content>
</entry>
<entry>
<title>KVM: Destroy target device if coalesced MMIO unregistration fails</title>
<updated>2023-02-01T19:25:05Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2022-12-19T17:19:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b1cb1fac22abf102ffeb29dd3eeca208a3869d54'/>
<id>urn:sha1:b1cb1fac22abf102ffeb29dd3eeca208a3869d54</id>
<content type='text'>
Destroy and free the target coalesced MMIO device if unregistering said
device fails.  As clearly noted in the code, kvm_io_bus_unregister_dev()
does not destroy the target device.

  BUG: memory leak
  unreferenced object 0xffff888112a54880 (size 64):
    comm "syz-executor.2", pid 5258, jiffies 4297861402 (age 14.129s)
    hex dump (first 32 bytes):
      38 c7 67 15 00 c9 ff ff 38 c7 67 15 00 c9 ff ff  8.g.....8.g.....
      e0 c7 e1 83 ff ff ff ff 00 30 67 15 00 c9 ff ff  .........0g.....
    backtrace:
      [&lt;0000000006995a8a&gt;] kmalloc include/linux/slab.h:556 [inline]
      [&lt;0000000006995a8a&gt;] kzalloc include/linux/slab.h:690 [inline]
      [&lt;0000000006995a8a&gt;] kvm_vm_ioctl_register_coalesced_mmio+0x8e/0x3d0 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:150
      [&lt;00000000022550c2&gt;] kvm_vm_ioctl+0x47d/0x1600 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3323
      [&lt;000000008a75102f&gt;] vfs_ioctl fs/ioctl.c:46 [inline]
      [&lt;000000008a75102f&gt;] file_ioctl fs/ioctl.c:509 [inline]
      [&lt;000000008a75102f&gt;] do_vfs_ioctl+0xbab/0x1160 fs/ioctl.c:696
      [&lt;0000000080e3f669&gt;] ksys_ioctl+0x76/0xa0 fs/ioctl.c:713
      [&lt;0000000059ef4888&gt;] __do_sys_ioctl fs/ioctl.c:720 [inline]
      [&lt;0000000059ef4888&gt;] __se_sys_ioctl fs/ioctl.c:718 [inline]
      [&lt;0000000059ef4888&gt;] __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:718
      [&lt;000000006444fa05&gt;] do_syscall_64+0x9f/0x4e0 arch/x86/entry/common.c:290
      [&lt;000000009a4ed50b&gt;] entry_SYSCALL_64_after_hwframe+0x49/0xbe

  BUG: leak checking failed

Fixes: 5d3c4c79384a ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed")
Cc: stable@vger.kernel.org
Reported-by: 柳菁峰 &lt;liujingfeng@qianxin.com&gt;
Reported-by: Michal Luczaj &lt;mhal@rbox.co&gt;
Link: https://lore.kernel.org/r/20221219171924.67989-1-seanjc@google.com
Link: https://lore.kernel.org/all/20230118220003.1239032-1-mhal@rbox.co
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'kvm-v6.2-rc4-fixes' into HEAD</title>
<updated>2023-01-24T11:05:23Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2023-01-13T16:27:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dc7c31e922787466957cadf2c0ad21c0f9a4091f'/>
<id>urn:sha1:dc7c31e922787466957cadf2c0ad21c0f9a4091f</id>
<content type='text'>
ARM:

* Fix the PMCR_EL0 reset value after the PMU rework

* Correctly handle S2 fault triggered by a S1 page table walk
  by not always classifying it as a write, as this breaks on
  R/O memslots

* Document why we cannot exit with KVM_EXIT_MMIO when taking
  a write fault from a S1 PTW on a R/O memslot

* Put the Apple M2 on the naughty list for not being able to
  correctly implement the vgic SEIS feature, just like the M1
  before it

* Reviewer updates: Alex is stepping down, replaced by Zenghui

x86:

* Fix various rare locking issues in Xen emulation and teach lockdep
  to detect them

* Documentation improvements

* Do not return host topology information from KVM_GET_SUPPORTED_CPUID
</content>
</entry>
<entry>
<title>Merge tag 'vfio-v6.2-rc6' of https://github.com/awilliam/linux-vfio</title>
<updated>2023-01-23T19:56:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-01-23T19:56:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7bf70dbb18820b37406fdfa2aaf14c2f5c71a11a'/>
<id>urn:sha1:7bf70dbb18820b37406fdfa2aaf14c2f5c71a11a</id>
<content type='text'>
Pull VFIO fixes from Alex Williamson:

 - Honor reserved regions when testing for IOMMU fine grained super page
   support, avoiding a regression on s390 for a firmware device where
   the existence of the mapping, even if unused, can trigger an error
   state. (Niklas Schnelle)

 - Fix a deadlock in releasing KVM references by using the alternate
   .release() rather than .destroy() callback for the kvm-vfio device.
   (Yi Liu)

* tag 'vfio-v6.2-rc6' of https://github.com/awilliam/linux-vfio:
  kvm/vfio: Fix potential deadlock on vfio group_lock
  vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()
</content>
</entry>
<entry>
<title>kvm/vfio: Fix potential deadlock on vfio group_lock</title>
<updated>2023-01-20T15:50:05Z</updated>
<author>
<name>Yi Liu</name>
<email>yi.l.liu@intel.com</email>
</author>
<published>2023-01-20T15:05:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=51cdc8bc120ef6e42f6fb758341f5d91bc955952'/>
<id>urn:sha1:51cdc8bc120ef6e42f6fb758341f5d91bc955952</id>
<content type='text'>
Currently it is possible that the final put of a KVM reference comes from
vfio during its device close operation.  This occurs while the vfio group
lock is held; however, if the vfio device is still in the kvm device list,
then the following call chain could result in a deadlock:

VFIO holds group-&gt;group_lock/group_rwsem
  -&gt; kvm_put_kvm
   -&gt; kvm_destroy_vm
    -&gt; kvm_destroy_devices
     -&gt; kvm_vfio_destroy
      -&gt; kvm_vfio_file_set_kvm
       -&gt; vfio_file_set_kvm
        -&gt; try to hold group-&gt;group_lock/group_rwsem

The key function is kvm_destroy_devices(), which triggers the destroy cb
of kvm_device_ops. It calls back into vfio and tries to take group_lock.
So if this path doesn't call back into vfio, the deadlock is avoided.
KVM provides another point at which the kvm-vfio device can be freed:
when the device file descriptor is closed. This can be achieved by
providing the release cb instead of the destroy cb. Also rename
kvm_vfio_destroy() to kvm_vfio_release().
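
The self-deadlock in the call chain above can be sketched with an
ordinary non-reentrant lock. This is a simplified Python model under
stated assumptions, not kernel code: group_lock here is a
threading.Lock, final_put() and its device_in_kvm_list flag are
hypothetical, and the inner acquire is non-blocking so that the sketch
reports the problem instead of hanging.

```python
import threading

group_lock = threading.Lock()  # stands in for the vfio group lock


def vfio_file_set_kvm():
    # Non-blocking attempt, so the sketch reports the problem
    # instead of hanging the way a real deadlock would.
    got = group_lock.acquire(blocking=False)
    if got:
        group_lock.release()
    return got


def final_put(device_in_kvm_list):
    """Final KVM reference dropped by vfio while holding group_lock."""
    with group_lock:
        if device_in_kvm_list:
            # destroy cb path: calls back into vfio for the same lock
            return vfio_file_set_kvm()
        # release cb already ran at fd close; no callback into vfio
        return True
```

final_put(True) returns False because the lock is already held by the
caller, while final_put(False) returns True, which corresponds to the
device having been freed earlier through the release cb.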

	/*
	 * Destroy is responsible for freeing dev.
	 *
	 * Destroy may be called before or after destructors are called
	 * on emulated I/O regions, depending on whether a reference is
	 * held by a vcpu or other kvm component that gets destroyed
	 * after the emulated I/O.
	 */
	void (*destroy)(struct kvm_device *dev);

	/*
	 * Release is an alternative method to free the device. It is
	 * called when the device file descriptor is closed. Once
	 * release is called, the destroy method will not be called
	 * anymore as the device is removed from the device list of
	 * the VM. kvm-&gt;lock is held.
	 */
	void (*release)(struct kvm_device *dev);

Fixes: 421cfe6596f6 ("vfio: remove VFIO_GROUP_NOTIFY_SET_KVM")
Reported-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
Suggested-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Yi Liu &lt;yi.l.liu@intel.com&gt;
Reviewed-by: Matthew Rosato &lt;mjrosato@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20230114000351.115444-1-mjrosato@linux.ibm.com
Link: https://lore.kernel.org/r/20230120150528.471752-1-yi.l.liu@intel.com
[aw: update comment as well, s/destroy/release/]
Signed-off-by: Alex Williamson &lt;alex.williamson@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Ensure lockdep knows about kvm-&gt;lock vs. vcpu-&gt;mutex ordering rule</title>
<updated>2023-01-11T18:32:21Z</updated>
<author>
<name>David Woodhouse</name>
<email>dwmw@amazon.co.uk</email>
</author>
<published>2023-01-11T18:06:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=42a90008f890afc41837dfeec1f0b1e7bcecf94a'/>
<id>urn:sha1:42a90008f890afc41837dfeec1f0b1e7bcecf94a</id>
<content type='text'>
Documentation/virt/kvm/locking.rst tells us that kvm-&gt;lock is taken outside
vcpu-&gt;mutex. But that doesn't actually happen very often; it's only in
some esoteric cases like migration with AMD SEV. This means that lockdep
usually doesn't notice, and doesn't do its job of keeping us honest.

Ensure that lockdep *always* knows about the ordering of these two locks,
by briefly taking vcpu-&gt;mutex in kvm_vm_ioctl_create_vcpu() while kvm-&gt;lock
is held.
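
The idea of priming an ordering checker can be shown with a toy model.
This Python sketch is only an illustration of the principle, not how
lockdep is implemented: OrderChecker and the lock names "kvm_lock" and
"vcpu_mutex" are hypothetical stand-ins for kvm-&gt;lock and
vcpu-&gt;mutex.

```python
class OrderChecker:
    """Records observed 'outer then inner' lock pairs."""

    def __init__(self):
        self.seen = set()   # (outer, inner) pairs observed so far
        self.held = []      # locks currently held, in acquisition order

    def acquire(self, name):
        for outer in self.held:
            # Complain if the opposite order was recorded earlier.
            if (name, outer) in self.seen:
                raise RuntimeError("inversion: " + outer + " then " + name)
            self.seen.add((outer, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)
```

If "kvm_lock" then "vcpu_mutex" is taken once up front, a later attempt
to take "kvm_lock" while holding "vcpu_mutex" raises. Without that first
pass the checker has nothing recorded and stays silent, which is the gap
this patch closes for the real locks.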

Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Message-Id: &lt;20230111180651.14394-3-dwmw2@infradead.org&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Clean up error labels in kvm_init()</title>
<updated>2022-12-29T20:48:37Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2022-11-30T23:09:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9f1a4c004869d3c8061f286fec4d8096dd099b84'/>
<id>urn:sha1:9f1a4c004869d3c8061f286fec4d8096dd099b84</id>
<content type='text'>
Convert the last two "out" labels to "err" labels now that the dust has
settled, i.e. now that there are no more planned changes to the order
of things in kvm_init().

Use "err" instead of "out" as it's easier to describe what failed than it
is to describe what needs to be unwound, e.g. if allocating a per-CPU kick
mask fails, KVM needs to free any masks that were allocated, and of course
needs to unwind previous operations.

Reported-by: Chao Gao &lt;chao.gao@intel.com&gt;
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Message-Id: &lt;20221130230934.1014142-51-seanjc@google.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
