<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/include/net/page_pool, branch v6.16</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.16</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.16'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-04-17T01:17:57Z</updated>
<entry>
<title>eth: bnxt: add support rx side device memory TCP</title>
<updated>2025-04-17T01:17:57Z</updated>
<author>
<name>Taehee Yoo</name>
<email>ap420073@gmail.com</email>
</author>
<published>2025-04-15T05:24:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd1fafe7da1f6f2aa25723e317f6e8e9d0c050a1'/>
<id>urn:sha1:cd1fafe7da1f6f2aa25723e317f6e8e9d0c050a1</id>
<content type='text'>
Currently, bnxt_en driver satisfies the requirements of the Device
memory TCP, which is HDS.
So, it implements rx-side Device memory TCP for bnxt_en driver.
It requires only converting the page API to netmem API.
`struct page` of agg rings are changed to `netmem_ref netmem` and
corresponding functions are changed to a variant of netmem API.

It also passes PP_FLAG_ALLOW_UNREADABLE_NETMEM flag to a parameter of
page_pool.
The netmem will be activated only when a user requests devmem TCP.

When netmem is activated, received data is unreadable and netmem is
disabled, received data is readable.
But drivers don't need to handle both cases because netmem core API will
handle it properly.
So, using proper netmem API is enough for drivers.

Device memory TCP can be tested with
tools/testing/selftests/drivers/net/hw/ncdevmem.
This is tested with BCM57504-N425G and firmware version 232.0.155.8/pkg
232.1.132.8.

Reviewed-by: Mina Almasry &lt;almasrymina@google.com&gt;
Tested-by: David Wei &lt;dw@davidwei.uk&gt;
Signed-off-by: Taehee Yoo &lt;ap420073@gmail.com&gt;
Link: https://patch.msgid.link/20250415052458.1260575-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>page_pool: Track DMA-mapped pages and unmap them when destroying the pool</title>
<updated>2025-04-14T23:30:29Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2025-04-09T10:41:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ee62ce7a1d909ccba0399680a03c2dee83bcae95'/>
<id>urn:sha1:ee62ce7a1d909ccba0399680a03c2dee83bcae95</id>
<content type='text'>
When enabling DMA mapping in page_pool, pages are kept DMA mapped until
they are released from the pool, to avoid the overhead of re-mapping the
pages every time they are used. This causes resource leaks and/or
crashes when there are pages still outstanding while the device is torn
down, because page_pool will attempt an unmap through a non-existent DMA
device on the subsequent page return.

To fix this, implement a simple tracking of outstanding DMA-mapped pages
in page pool using an xarray. This was first suggested by Mina[0], and
turns out to be fairly straight forward: We simply store pointers to
pages directly in the xarray with xa_alloc() when they are first DMA
mapped, and remove them from the array on unmap. Then, when a page pool
is torn down, it can simply walk the xarray and unmap all pages still
present there before returning, which also allows us to get rid of the
get/put_device() calls in page_pool. Using xa_cmpxchg(), no additional
synchronisation is needed, as a page will only ever be unmapped once.

To avoid having to walk the entire xarray on unmap to find the page
reference, we stash the ID assigned by xa_alloc() into the page
structure itself, using the upper bits of the pp_magic field. This
requires a couple of defines to avoid conflicting with the
POINTER_POISON_DELTA define, but this is all evaluated at compile-time,
so does not affect run-time performance. The bitmap calculations in this
patch gives the following number of bits for different architectures:

- 23 bits on 32-bit architectures
- 21 bits on PPC64 (because of the definition of ILLEGAL_POINTER_VALUE)
- 32 bits on other 64-bit architectures

Stashing a value into the unused bits of pp_magic does have the effect
that it can make the value stored there lie outside the unmappable
range (as governed by the mmap_min_addr sysctl), for architectures that
don't define ILLEGAL_POINTER_VALUE. This means that if one of the
pointers that is aliased to the pp_magic field (such as page-&gt;lru.next)
is dereferenced while the page is owned by page_pool, that could lead to
a dereference into userspace, which is a security concern. The risk of
this is mitigated by the fact that (a) we always clear pp_magic before
releasing a page from page_pool, and (b) this would need a
use-after-free bug for struct page, which can have many other risks
since page-&gt;lru.next is used as a generic list pointer in multiple
places in the kernel. As such, with this patch we take the position that
this risk is negligible in practice. For more discussion, see[1].

Since all the tracking added in this patch is performed on DMA
map/unmap, no additional code is needed in the fast path, meaning the
performance overhead of this tracking is negligible there. A
micro-benchmark shows that the total overhead of the tracking itself is
about 400 ns (39 cycles(tsc) 395.218 ns; sum for both map and unmap[2]).
Since this cost is only paid on DMA map and unmap, it seems like an
acceptable cost to fix the late unmap issue. Further optimisation can
narrow the cases where this cost is paid (for instance by eliding the
tracking when DMA map/unmap is a no-op).

The extra memory needed to track the pages is neatly encapsulated inside
xarray, which uses the 'struct xa_node' structure to track items. This
structure is 576 bytes long, with slots for 64 items, meaning that a
full node occurs only 9 bytes of overhead per slot it tracks (in
practice, it probably won't be this efficient, but in any case it should
be an acceptable overhead).

[0] https://lore.kernel.org/all/CAHS8izPg7B5DwKfSuzz-iOop_YRbk3Sd6Y4rX7KBG9DcVJcyWg@mail.gmail.com/
[1] https://lore.kernel.org/r/20250320023202.GA25514@openwall.com
[2] https://lore.kernel.org/r/ae07144c-9295-4c9d-a400-153bb689fe9e@huawei.com

Reported-by: Yonglong Liu &lt;liuyonglong@huawei.com&gt;
Closes: https://lore.kernel.org/r/8743264a-9700-4227-a556-5f931c720211@huawei.com
Fixes: ff7d6b27f894 ("page_pool: refurbish version of page_pool code")
Suggested-by: Mina Almasry &lt;almasrymina@google.com&gt;
Reviewed-by: Mina Almasry &lt;almasrymina@google.com&gt;
Reviewed-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Tested-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Tested-by: Qiuling Ren &lt;qren@redhat.com&gt;
Tested-by: Yuying Ma &lt;yuma@redhat.com&gt;
Tested-by: Yonglong Liu &lt;liuyonglong@huawei.com&gt;
Acked-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://patch.msgid.link/20250409-page-pool-track-dma-v9-2-6a9ef2e0cba8@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: move mp dev config validation to __net_mp_open_rxq()</title>
<updated>2025-04-04T14:35:38Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-04-03T01:34:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ec304b70d46bd2ed66541c5b57b63276529e05b1'/>
<id>urn:sha1:ec304b70d46bd2ed66541c5b57b63276529e05b1</id>
<content type='text'>
devmem code performs a number of safety checks to avoid having
to reimplement all of them in the drivers. Move those to
__net_mp_open_rxq() and reuse that function for binding to make
sure that io_uring ZC also benefits from them.

While at it rename the queue ID variable to rxq_idx in
__net_mp_open_rxq(), we touch most of the relevant lines.

The XArray insertion is reordered after the netdev_rx_queue_restart()
call, otherwise we'd need to duplicate the queue index check
or risk inserting an invalid pointer. The XArray allocation
failures should be extremely rare.

Reviewed-by: Mina Almasry &lt;almasrymina@google.com&gt;
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Fixes: 6e18ed929d3b ("net: add helpers for setting a memory provider on an rx queue")
Link: https://patch.msgid.link/20250403013405.2827250-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: add helpers for setting a memory provider on an rx queue</title>
<updated>2025-02-07T00:27:31Z</updated>
<author>
<name>David Wei</name>
<email>dw@davidwei.uk</email>
</author>
<published>2025-02-04T21:56:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6e18ed929d3ba9b3b92ba5894f9233686b3e3ec1'/>
<id>urn:sha1:6e18ed929d3ba9b3b92ba5894f9233686b3e3ec1</id>
<content type='text'>
Add helpers that properly prep or remove a memory provider for an rx
queue then restart the queue.

Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: David Wei &lt;dw@davidwei.uk&gt;
Link: https://patch.msgid.link/20250204215622.695511-11-dw@davidwei.uk
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: page_pool: add memory provider helpers</title>
<updated>2025-02-07T00:27:31Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2025-02-04T21:56:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=56102c013fa7b8dbba8c5d5f7e042ad5f18cf4ec'/>
<id>urn:sha1:56102c013fa7b8dbba8c5d5f7e042ad5f18cf4ec</id>
<content type='text'>
Add helpers for memory providers to interact with page pools.
net_mp_niov_{set,clear}_page_pool() serve to [dis]associate a net_iov
with a page pool. If used, the memory provider is responsible to match
"set" calls with "clear" once a net_iov is not going to be used by a page
pool anymore, changing a page pool, etc.

Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: David Wei &lt;dw@davidwei.uk&gt;
Link: https://patch.msgid.link/20250204215622.695511-10-dw@davidwei.uk
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: page_pool: add a mp hook to unregister_netdevice*</title>
<updated>2025-02-07T00:27:31Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2025-02-04T21:56:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f8350a4358fc9c4801f2f4229cf9b6e678055d9a'/>
<id>urn:sha1:f8350a4358fc9c4801f2f4229cf9b6e678055d9a</id>
<content type='text'>
Devmem TCP needs a hook in unregister_netdevice_many_notify() to upkeep
the set tracking queues it's bound to, i.e. -&gt;bound_rxqs. Instead of
devmem sticking directly out of the genetic path, add a mp function.

Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reviewed-by: Mina Almasry &lt;almasrymina@google.com&gt;
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: David Wei &lt;dw@davidwei.uk&gt;
Link: https://patch.msgid.link/20250204215622.695511-8-dw@davidwei.uk
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: page_pool: add callback for mp info printing</title>
<updated>2025-02-07T00:27:31Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2025-02-04T21:56:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2508a46f920a3130e35ab2183a70fc93f0aaaee4'/>
<id>urn:sha1:2508a46f920a3130e35ab2183a70fc93f0aaaee4</id>
<content type='text'>
Add a mandatory callback that prints information about the memory
provider to netlink.

Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: David Wei &lt;dw@davidwei.uk&gt;
Link: https://patch.msgid.link/20250204215622.695511-7-dw@davidwei.uk
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: page_pool: create hooks for custom memory providers</title>
<updated>2025-02-07T00:27:30Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2025-02-04T21:56:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=57afb483015768903029c8336ee287f4b03c1235'/>
<id>urn:sha1:57afb483015768903029c8336ee287f4b03c1235</id>
<content type='text'>
A spin off from the original page pool memory providers patch by Jakub,
which allows extending page pools with custom allocators. One of such
providers is devmem TCP, and the other is io_uring zerocopy added in
following patches.

Link: https://lore.kernel.org/netdev/20230707183935.997267-7-kuba@kernel.org/
Co-developed-by: Jakub Kicinski &lt;kuba@kernel.org&gt; # initial mp proposal
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: David Wei &lt;dw@davidwei.uk&gt;
Link: https://patch.msgid.link/20250204215622.695511-5-dw@davidwei.uk
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: page_pool: don't try to stash the napi id</title>
<updated>2025-01-27T22:37:41Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-01-23T23:16:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=67e4bb2ced0f2d8fbd414f932daea94ba63ae4c4'/>
<id>urn:sha1:67e4bb2ced0f2d8fbd414f932daea94ba63ae4c4</id>
<content type='text'>
Page ppol tried to cache the NAPI ID in page pool info to avoid
having a dependency on the life cycle of the NAPI instance.
Since commit under Fixes the NAPI ID is not populated until
napi_enable() and there's a good chance that page pool is
created before NAPI gets enabled.

Protect the NAPI pointer with the existing page pool mutex,
the reading path already holds it. napi_id itself we need
to READ_ONCE(), it's protected by netdev_lock() which are
not holding in page pool.

Before this patch napi IDs were missing for mlx5:

 # ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get

 [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912},
  {'id': 143, 'ifindex': 2, 'inflight': 5568, 'inflight-mem': 22806528},
  {'id': 142, 'ifindex': 2, 'inflight': 5120, 'inflight-mem': 20971520},
  {'id': 141, 'ifindex': 2, 'inflight': 4992, 'inflight-mem': 20447232},
  ...

After:

 [{'id': 144, 'ifindex': 2, 'inflight': 3072, 'inflight-mem': 12582912,
   'napi-id': 565},
  {'id': 143, 'ifindex': 2, 'inflight': 4224, 'inflight-mem': 17301504,
   'napi-id': 525},
  {'id': 142, 'ifindex': 2, 'inflight': 4288, 'inflight-mem': 17563648,
   'napi-id': 524},
  ...

Fixes: 86e25f40aa1e ("net: napi: Add napi_config")
Reviewed-by: Mina Almasry &lt;almasrymina@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://patch.msgid.link/20250123231620.1086401-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2025-01-16T18:34:59Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-01-16T18:30:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2ee738e90e80850582cbe10f34c6447965c1d87b'/>
<id>urn:sha1:2ee738e90e80850582cbe10f34c6447965c1d87b</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-6.13-rc8).

Conflicts:

drivers/net/ethernet/realtek/r8169_main.c
  1f691a1fc4be ("r8169: remove redundant hwmon support")
  152d00a91396 ("r8169: simplify setting hwmon attribute visibility")
https://lore.kernel.org/20250115122152.760b4e8d@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  152f4da05aee ("bnxt_en: add support for rx-copybreak ethtool command")
  f0aa6a37a3db ("eth: bnxt: always recalculate features after XDP clearing, fix null-deref")

drivers/net/ethernet/intel/ice/ice_type.h
  50327223a8bb ("ice: add lock to protect low latency interface")
  dc26548d729e ("ice: Fix quad registers read on E825")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
