<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/virtio, branch v5.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-12-11T13:14:07Z</updated>
<entry>
<title>virtio_balloon: divide/multiply instead of shifts</title>
<updated>2019-12-11T13:14:07Z</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@redhat.com</email>
</author>
<published>2019-11-19T10:25:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63b9b80e9f5b2c463d98d6e550e0d0e3ace66033'/>
<id>urn:sha1:63b9b80e9f5b2c463d98d6e550e0d0e3ace66033</id>
<content type='text'>
We managed to get confused about the shift direction at least once.
Let's switch to division/multiplcation instead. Add a number of pages
macro for this purpose.  We still keep the order macro around too since
this is what alloc/free pages want.

Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Reviewed-by: Wei Wang &lt;wei.w.wang@intel.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
</content>
</entry>
<entry>
<title>virtio_balloon: name cleanups</title>
<updated>2019-12-11T13:14:06Z</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@redhat.com</email>
</author>
<published>2019-11-19T10:21:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2a946fa1c8bc26df03bacebb6811b656d5e0cf1c'/>
<id>urn:sha1:2a946fa1c8bc26df03bacebb6811b656d5e0cf1c</id>
<content type='text'>
free_page_order is a confusing name. It's not a page order
actually, it's the order of the block of memory we are hinting.
Rename to hint_block_order. Also, rename SIZE to BYTES
to make it clear it's the block size in bytes.

Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Reviewed-by: Wei Wang &lt;wei.w.wang@intel.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
</content>
</entry>
<entry>
<title>virtio-balloon: fix managed page counts when migrating pages between zones</title>
<updated>2019-12-11T13:14:06Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-12-11T11:11:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63341ab03706e11a31e3dd8ccc0fbc9beaf723f0'/>
<id>urn:sha1:63341ab03706e11a31e3dd8ccc0fbc9beaf723f0</id>
<content type='text'>
In case we have to migrate a ballon page to a newpage of another zone, the
managed page count of both zones is wrong. Paired with memory offlining
(which will adjust the managed page count), we can trigger kernel crashes
and all kinds of different symptoms.

One way to reproduce:
1. Start a QEMU guest with 4GB, no NUMA
2. Hotplug a 1GB DIMM and online the memory to ZONE_NORMAL
3. Inflate the balloon to 1GB
4. Unplug the DIMM (be quick, otherwise unmovable data ends up on it)
5. Observe /proc/zoneinfo
  Node 0, zone   Normal
    pages free     16810
          min      24848885473806
          low      18471592959183339
          high     36918337032892872
          spanned  262144
          present  262144
          managed  18446744073709533486
6. Do anything that requires some memory (e.g., inflate the balloon some
more). The OOM goes crazy and the system crashes
  [  238.324946] Out of memory: Killed process 537 (login) total-vm:27584kB, anon-rss:860kB, file-rss:0kB, shmem-rss:00
  [  238.338585] systemd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
  [  238.339420] CPU: 0 PID: 1 Comm: systemd Tainted: G      D W         5.4.0-next-20191204+ #75
  [  238.340139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
  [  238.341121] Call Trace:
  [  238.341337]  dump_stack+0x8f/0xd0
  [  238.341630]  dump_header+0x61/0x5ea
  [  238.341942]  oom_kill_process.cold+0xb/0x10
  [  238.342299]  out_of_memory+0x24d/0x5a0
  [  238.342625]  __alloc_pages_slowpath+0xd12/0x1020
  [  238.343024]  __alloc_pages_nodemask+0x391/0x410
  [  238.343407]  pagecache_get_page+0xc3/0x3a0
  [  238.343757]  filemap_fault+0x804/0xc30
  [  238.344083]  ? ext4_filemap_fault+0x28/0x42
  [  238.344444]  ext4_filemap_fault+0x30/0x42
  [  238.344789]  __do_fault+0x37/0x1a0
  [  238.345087]  __handle_mm_fault+0x104d/0x1ab0
  [  238.345450]  handle_mm_fault+0x169/0x360
  [  238.345790]  do_user_addr_fault+0x20d/0x490
  [  238.346154]  do_page_fault+0x31/0x210
  [  238.346468]  async_page_fault+0x43/0x50
  [  238.346797] RIP: 0033:0x7f47eba4197e
  [  238.347110] Code: Bad RIP value.
  [  238.347387] RSP: 002b:00007ffd7c0c1890 EFLAGS: 00010293
  [  238.347834] RAX: 0000000000000002 RBX: 000055d196a20a20 RCX: 00007f47eba4197e
  [  238.348437] RDX: 0000000000000033 RSI: 00007ffd7c0c18c0 RDI: 0000000000000004
  [  238.349047] RBP: 00007ffd7c0c1c20 R08: 0000000000000000 R09: 0000000000000033
  [  238.349660] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001
  [  238.350261] R13: ffffffffffffffff R14: 0000000000000000 R15: 00007ffd7c0c18c0
  [  238.350878] Mem-Info:
  [  238.351085] active_anon:3121 inactive_anon:51 isolated_anon:0
  [  238.351085]  active_file:12 inactive_file:7 isolated_file:0
  [  238.351085]  unevictable:0 dirty:0 writeback:0 unstable:0
  [  238.351085]  slab_reclaimable:5565 slab_unreclaimable:10170
  [  238.351085]  mapped:3 shmem:111 pagetables:155 bounce:0
  [  238.351085]  free:720717 free_pcp:2 free_cma:0
  [  238.353757] Node 0 active_anon:12484kB inactive_anon:204kB active_file:48kB inactive_file:28kB unevictable:0kB iss
  [  238.355979] Node 0 DMA free:11556kB min:36kB low:48kB high:60kB reserved_highatomic:0KB active_anon:152kB inactivB
  [  238.358345] lowmem_reserve[]: 0 2955 2884 2884 2884
  [  238.358761] Node 0 DMA32 free:2677864kB min:7004kB low:10028kB high:13052kB reserved_highatomic:0KB active_anon:0B
  [  238.361202] lowmem_reserve[]: 0 0 72057594037927865 72057594037927865 72057594037927865
  [  238.361888] Node 0 Normal free:193448kB min:99395541895224kB low:73886371836733356kB high:147673348131571488kB reB
  [  238.364765] lowmem_reserve[]: 0 0 0 0 0
  [  238.365101] Node 0 DMA: 7*4kB (U) 5*8kB (UE) 6*16kB (UME) 2*32kB (UM) 1*64kB (U) 2*128kB (UE) 3*256kB (UME) 2*512B
  [  238.366379] Node 0 DMA32: 0*4kB 1*8kB (U) 2*16kB (UM) 2*32kB (UM) 2*64kB (UM) 1*128kB (U) 1*256kB (U) 1*512kB (U)B
  [  238.367654] Node 0 Normal: 1985*4kB (UME) 1321*8kB (UME) 844*16kB (UME) 524*32kB (UME) 300*64kB (UME) 138*128kB (B
  [  238.369184] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
  [  238.369915] 130 total pagecache pages
  [  238.370241] 0 pages in swap cache
  [  238.370533] Swap cache stats: add 0, delete 0, find 0/0
  [  238.370981] Free swap  = 0kB
  [  238.371239] Total swap = 0kB
  [  238.371488] 1048445 pages RAM
  [  238.371756] 0 pages HighMem/MovableOnly
  [  238.372090] 306992 pages reserved
  [  238.372376] 0 pages cma reserved
  [  238.372661] 0 pages hwpoisoned

In another instance (older kernel), I was able to observe this
(negative page count :/):
  [  180.896971] Offlined Pages 32768
  [  182.667462] Offlined Pages 32768
  [  184.408117] Offlined Pages 32768
  [  186.026321] Offlined Pages 32768
  [  187.684861] Offlined Pages 32768
  [  189.227013] Offlined Pages 32768
  [  190.830303] Offlined Pages 32768
  [  190.833071] Built 1 zonelists, mobility grouping on.  Total pages: -36920272750453009

In another instance (older kernel), I was no longer able to start any
process:
  [root@vm ~]# [  214.348068] Offlined Pages 32768
  [  215.973009] Offlined Pages 32768
  cat /proc/meminfo
  -bash: fork: Cannot allocate memory
  [root@vm ~]# cat /proc/meminfo
  -bash: fork: Cannot allocate memory

Fix it by properly adjusting the managed page count when migrating if
the zone changed. The managed page count of the zones now looks after
unplug of the DIMM (and after deflating the balloon) just like before
inflating the balloon (and plugging+onlining the DIMM).

We'll temporarily modify the totalram page count. If this ever becomes a
problem, we can fine tune by providing helpers that don't touch
the totalram pages (e.g., adjust_zone_managed_page_count()).

Please note that fixing up the managed page count is only necessary when
we adjusted the managed page count when inflating - only if we
don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the
managed page count is not touched when inflating/deflating.

Reported-by: Yumei Huang &lt;yuhuang@redhat.com&gt;
Fixes: 3dcc0571cd64 ("mm: correctly update zone-&gt;managed_pages")
Cc: &lt;stable@vger.kernel.org&gt; # v3.11+
Cc: "Michael S. Tsirkin" &lt;mst@redhat.com&gt;
Cc: Jason Wang &lt;jasowang@redhat.com&gt;
Cc: Jiang Liu &lt;liuj97@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Igor Mammedov &lt;imammedo@redhat.com&gt;
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>virtio_balloon: fix shrinker count</title>
<updated>2019-11-20T07:15:57Z</updated>
<author>
<name>Wei Wang</name>
<email>wei.w.wang@intel.com</email>
</author>
<published>2019-11-19T10:02:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c9a6820fc0da2603be3054ee7590eb9f350508a7'/>
<id>urn:sha1:c9a6820fc0da2603be3054ee7590eb9f350508a7</id>
<content type='text'>
Instead of multiplying by page order, virtio balloon divided by page
order. The result is that it can return 0 if there are a bit less
than MAX_ORDER - 1 pages in use, and then shrinker scan won't be called.

Cc: stable@vger.kernel.org
Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
Signed-off-by: Wei Wang &lt;wei.w.wang@intel.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
</content>
</entry>
<entry>
<title>virtio_balloon: fix shrinker scan number of pages</title>
<updated>2019-11-20T07:15:57Z</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@redhat.com</email>
</author>
<published>2019-11-19T09:50:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=60bd04f258b7816af851d22821d5e6ce5b09c326'/>
<id>urn:sha1:60bd04f258b7816af851d22821d5e6ce5b09c326</id>
<content type='text'>
virtio_balloon_shrinker_scan should return number of system pages freed,
but because it's calling functions that deal with balloon pages, it gets
confused and sometimes returns the number of balloon pages.

It does not matter practically as the exact number isn't
used, but it seems better to be consistent in case someone
starts using this API.

Further, if we ever tried to iteratively leak pages as
virtio_balloon_shrinker_scan tries to do, we'd run into issues - this is
because freed_pages was accumulating total freed pages, but was also
subtracted on each iteration from pages_to_free, which can result in
either leaking less memory than we were supposed to free, or more if
pages_to_free underruns.

On a system with 4K pages we are lucky that we are never asked to leak
more than 128 pages while we can leak up to 256 at a time,
but it looks like a real issue for systems with page size != 4K.

Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
Reported-by: Khazhismel Kumykov &lt;khazhy@google.com&gt;
Reviewed-by: Wei Wang &lt;wei.w.wang@intel.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>virtio_ring: fix return code on DMA mapping fails</title>
<updated>2019-11-19T10:13:49Z</updated>
<author>
<name>Halil Pasic</name>
<email>pasic@linux.ibm.com</email>
</author>
<published>2019-11-14T12:46:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f7728002c1c7bfa787b276a31c3ef458739b8e7c'/>
<id>urn:sha1:f7728002c1c7bfa787b276a31c3ef458739b8e7c</id>
<content type='text'>
Commit 780bc7903a32 ("virtio_ring: Support DMA APIs")  makes
virtqueue_add() return -EIO when we fail to map our I/O buffers. This is
a very realistic scenario for guests with encrypted memory, as swiotlb
may run out of space, depending on it's size and the I/O load.

The virtio-blk driver interprets -EIO form virtqueue_add() as an IO
error, despite the fact that swiotlb full is in absence of bugs a
recoverable condition.

Let us change the return code to -ENOMEM, and make the block layer
recover form these failures when virtio-blk encounters the condition
described above.

Cc: stable@vger.kernel.org
Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs")
Signed-off-by: Halil Pasic &lt;pasic@linux.ibm.com&gt;
Tested-by: Michael Mueller &lt;mimu@linux.ibm.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>virtio_ring: fix stalls for packed rings</title>
<updated>2019-10-28T08:24:46Z</updated>
<author>
<name>Marvin Liu</name>
<email>yong.liu@intel.com</email>
</author>
<published>2019-10-21T17:10:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=40ce7919d8730f5936da2bc8a21b46bd07db6411'/>
<id>urn:sha1:40ce7919d8730f5936da2bc8a21b46bd07db6411</id>
<content type='text'>
When VIRTIO_F_RING_EVENT_IDX is negotiated, virtio devices can
use virtqueue_enable_cb_delayed_packed to reduce the number of device
interrupts.  At the moment, this is the case for virtio-net when the
napi_tx module parameter is set to false.

In this case, the virtio driver selects an event offset and expects that
the device will send a notification when rolling over the event offset
in the ring.  However, if this roll-over happens before the event
suppression structure update, the notification won't be sent. To address
this race condition the driver needs to check wether the device rolled
over the offset after updating the event suppression structure.

With VIRTIO_F_RING_PACKED, the virtio driver did this by reading the
flags field of the descriptor at the specified offset.

Unfortunately, checking at the event offset isn't reliable: if
descriptors are chained (e.g. when INDIRECT is off) not all descriptors
are overwritten by the device, so it's possible that the device skipped
the specific descriptor driver is checking when writing out used
descriptors. If this happens, the driver won't detect the race condition
and will incorrectly expect the device to send a notification.

For virtio-net, the result will be a TX queue stall, with the
transmission getting blocked forever.

With the packed ring, it isn't easy to find a location which is
guaranteed to change upon the roll-over, except the next device
descriptor, as described in the spec:

        Writes of device and driver descriptors can generally be
        reordered, but each side (driver and device) are only required to
        poll (or test) a single location in memory: the next device descriptor after
        the one they processed previously, in circular order.

while this might be sub-optimal, let's do exactly this for now.

Cc: stable@vger.kernel.org
Cc: Jason Wang &lt;jasowang@redhat.com&gt;
Fixes: f51f982682e2a ("virtio_ring: leverage event idx in packed ring")
Signed-off-by: Marvin Liu &lt;yong.liu@intel.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>virtio_ring: fix unmap of indirect descriptors</title>
<updated>2019-09-09T14:43:15Z</updated>
<author>
<name>Matthias Lange</name>
<email>matthias.lange@kernkonzept.com</email>
</author>
<published>2019-09-06T14:59:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cf8f1696709ad5bb3138ed8c771c2eb98950cd8a'/>
<id>urn:sha1:cf8f1696709ad5bb3138ed8c771c2eb98950cd8a</id>
<content type='text'>
The function virtqueue_add_split() DMA-maps the scatterlist buffers. In
case a mapping error occurs the already mapped buffers must be unmapped.
This happens by jumping to the 'unmap_release' label.

In case of indirect descriptors the release is wrong and may leak kernel
memory. Because the implementation assumes that the head descriptor is
already mapped it starts iterating over the descriptor list starting
from the head descriptor. However for indirect descriptors the head
descriptor is never mapped in case of an error.

The fix is to initialize the start index with zero in case of indirect
descriptors and use the 'desc' pointer directly for iterating over the
descriptor chain.

Signed-off-by: Matthias Lange &lt;matthias.lange@kernkonzept.com&gt;
Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2019-07-19T17:42:02Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-07-19T17:42:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=933a90bf4f3505f8ec83bda21a3c7d70d7c2b426'/>
<id>urn:sha1:933a90bf4f3505f8ec83bda21a3c7d70d7c2b426</id>
<content type='text'>
Pull vfs mount updates from Al Viro:
 "The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...
</content>
</entry>
<entry>
<title>Merge tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2019-07-18T17:52:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-07-18T17:52:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f8c3500cd137867927bc080f4a6e02e0222dd1b8'/>
<id>urn:sha1:f8c3500cd137867927bc080f4a6e02e0222dd1b8</id>
<content type='text'>
Pull libnvdimm updates from Dan Williams:
 "Primarily just the virtio_pmem driver:

   - virtio_pmem

     The new virtio_pmem facility introduces a paravirtualized
     persistent memory device that allows a guest VM to use DAX
     mechanisms to access a host-file with host-page-cache. It arranges
     for MAP_SYNC to be disabled and instead triggers a host fsync()
     when a 'write-cache flush' command is sent to the virtual disk
     device.

   - Miscellaneous small fixups"

* tag 'libnvdimm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  virtio_pmem: fix sparse warning
  xfs: disable map_sync for async flush
  ext4: disable map_sync for async flush
  dax: check synchronous mapping is supported
  dm: enable synchronous dax
  libnvdimm: add dax_dev sync flag
  virtio-pmem: Add virtio pmem driver
  libnvdimm: nd_region flush callback support
  libnvdimm, namespace: Drop uuid_t implementation detail
</content>
</entry>
</feed>
