<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/hashtab.c, branch v6.15</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.15</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.15'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-04-25T15:36:59Z</updated>
<entry>
<title>bpf: fix possible endless loop in BPF map iteration</title>
<updated>2025-04-25T15:36:59Z</updated>
<author>
<name>Brandon Kammerdiener</name>
<email>brandon.kammerdiener@intel.com</email>
</author>
<published>2025-04-24T15:32:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=75673fda0c557ae26078177dd14d4857afbf128d'/>
<id>urn:sha1:75673fda0c557ae26078177dd14d4857afbf128d</id>
<content type='text'>
The iteration callback may modify the bucket list, for example by deleting
and re-inserting the element being visited, so a plain list walk can keep
revisiting the same element. The _safe variant used here gets the next
element before running the callback, avoiding the endless loop condition.
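
As an illustration, here is a minimal userland sketch (not the kernel code
itself) of the safe-iteration pattern: the successor is cached before the
callback gets a chance to mutate the list.

  struct node { struct node *next; };

  static void for_each_safe(struct node *head, void (*cb)(struct node *))
  {
          struct node *n = head, *tmp;

          while (n) {
                  tmp = n-&gt;next;  /* fetch the successor BEFORE the callback */
                  cb(n);            /* cb may unlink or re-insert n */
                  n = tmp;          /* advance via the cached pointer */
          }
  }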

Signed-off-by: Brandon Kammerdiener &lt;brandon.kammerdiener@intel.com&gt;
Link: https://lore.kernel.org/r/20250424153246.141677-2-brandon.kammerdiener@intel.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Hou Tao &lt;houtao1@huawei.com&gt;
</content>
</entry>
<entry>
<title>bpf: Convert hashtab.c to rqspinlock</title>
<updated>2025-03-19T15:03:05Z</updated>
<author>
<name>Kumar Kartikeya Dwivedi</name>
<email>memxor@gmail.com</email>
</author>
<published>2025-03-16T04:05:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4fa8d68aa53e6d76f66f3ed21e06c52cf8912074'/>
<id>urn:sha1:4fa8d68aa53e6d76f66f3ed21e06c52cf8912074</id>
<content type='text'>
Convert hashtab.c from raw_spinlock to rqspinlock, and drop the
now-unnecessary hashed per-cpu counter machinery from the code base.
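
As a hedged sketch of the shape of the conversion (the helper name
lock_bucket and the bucket layout are illustrative; raw_res_spin_lock_irqsave()
is the rqspinlock acquire, which can fail instead of spinning forever, so
callers must check and propagate the error):

  #include &lt;asm-generic/rqspinlock.h&gt;

  struct bucket {
          rqspinlock_t raw_lock;                  /* was: raw_spinlock_t */
  };

  static int lock_bucket(struct bucket *b, unsigned long *pflags)
  {
          unsigned long flags;
          int ret;

          ret = raw_res_spin_lock_irqsave(&amp;b-&gt;raw_lock, flags);
          if (ret)
                  return ret;     /* deadlock or timeout detected: bail out */
          *pflags = flags;
          return 0;
  }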

Closes: https://lore.kernel.org/bpf/675302fd.050a0220.2477f.0004.GAE@google.com
Closes: https://lore.kernel.org/bpf/000000000000b3e63e061eed3f6b@google.com
Signed-off-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20250316040541.108729-20-memxor@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Check map-&gt;record at the beginning of check_and_free_fields()</title>
<updated>2025-03-15T19:06:50Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2025-03-15T15:09:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bb2243f4328bc2e4aa4d8566a5a0a7f9ce947570'/>
<id>urn:sha1:bb2243f4328bc2e4aa4d8566a5a0a7f9ce947570</id>
<content type='text'>
When there are no special fields in the map value, there is no need to
invoke bpf_obj_free_fields(). Therefore, check the validity of
map-&gt;record in advance.

After the change, the benchmark result of the per-cpu update case in
map_perf_test increased by 40% on a 16-CPU VM.
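
A minimal userland model of the resulting pattern (struct and function
names are stand-ins for the kernel's, not actual kernel code):

  struct record;                          /* stand-in for struct btf_record */
  struct map { struct record *record; };

  static void free_special_fields(struct record *r) { (void)r; /* expensive */ }

  static void check_and_free_fields(struct map *m)
  {
          if (!m-&gt;record)
                  return;                 /* common case: nothing to free */
          free_special_fields(m-&gt;record);
  }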

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Acked-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20250315150930.1511727-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix kmemleak warning for percpu hashmap</title>
<updated>2025-02-24T20:11:00Z</updated>
<author>
<name>Yonghong Song</name>
<email>yonghong.song@linux.dev</email>
</author>
<published>2025-02-24T17:55:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=11ba7ce076e5903e7bdc1fd1498979c331b3c286'/>
<id>urn:sha1:11ba7ce076e5903e7bdc1fd1498979c331b3c286</id>
<content type='text'>
Vlad Poenaru reported the following kmemleak issue:

  unreferenced object 0x606fd7c44ac8 (size 32):
    backtrace (crc 0):
      pcpu_alloc_noprof+0x730/0xeb0
      bpf_map_alloc_percpu+0x69/0xc0
      prealloc_init+0x9d/0x1b0
      htab_map_alloc+0x363/0x510
      map_create+0x215/0x3a0
      __sys_bpf+0x16b/0x3e0
      __x64_sys_bpf+0x18/0x20
      do_syscall_64+0x7b/0x150
      entry_SYSCALL_64_after_hwframe+0x4b/0x53

Further investigation shows that the cause is the store of the percpu
pointer in htab_elem_set_ptr() not being 8-byte aligned:
  *(void __percpu **)(l-&gt;key + key_size) = pptr;

Note that the whole htab_elem is 8-byte aligned (on x86_64). If the key_size
is 4, pptr is stored at a location which is 4-byte aligned but not 8-byte
aligned. In mm/kmemleak.c, scan_block() scans memory with an 8-byte stride,
so it never sees the above pptr and hence reports the memory as leaked.

In htab_map_alloc(), we already have

        htab-&gt;elem_size = sizeof(struct htab_elem) +
                          round_up(htab-&gt;map.key_size, 8);
        if (percpu)
                htab-&gt;elem_size += sizeof(void *);
        else
                htab-&gt;elem_size += round_up(htab-&gt;map.value_size, 8);

So storing pptr with 8-byte alignment won't cause any problems and fixes
the kmemleak false positive as well.
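
A small standalone demonstration of the offset change (assuming, per the
message above, that the pointer slot moves from key_size to
round_up(key_size, 8) past l-&gt;key):

  #include &lt;stdio.h&gt;
  #include &lt;stddef.h&gt;

  /* mirrors the kernel's round_up() for power-of-two alignments */
  #define ROUND_UP(x, a)  (((x) + (a) - 1) &amp; ~((size_t)(a) - 1))

  int main(void)
  {
          size_t key_size = 4;

          printf("old pptr offset: %zu (only 4-byte aligned)\n", key_size);
          printf("new pptr offset: %zu (8-byte aligned)\n",
                 ROUND_UP(key_size, 8));
          return 0;
  }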

The issue can be reproduced with the bpf selftests as well:
  1. Enable the CONFIG_DEBUG_KMEMLEAK config.
  2. Add a getchar() before the skel destroy in test_hash_map() in
     prog_tests/for_each.c. The purpose is to keep the map alive so the
     kmemleak scan can detect the unreferenced pointer.
  3. Run './test_progs -t for_each/hash_map &amp;' and a kmemleak warning
     should be reported.

Reported-by: Vlad Poenaru &lt;thevlad@meta.com&gt;
Signed-off-by: Yonghong Song &lt;yonghong.song@linux.dev&gt;
Acked-by: Martin KaFai Lau &lt;martin.lau@kernel.org&gt;
Link: https://lore.kernel.org/r/20250224175514.2207227-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Free element after unlock in __htab_map_lookup_and_delete_elem()</title>
<updated>2025-01-20T17:09:01Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2025-01-17T10:18:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=47363f1553e69b8c2e3269f9883799a4ea898cd4'/>
<id>urn:sha1:47363f1553e69b8c2e3269f9883799a4ea898cd4</id>
<content type='text'>
The freeing of special fields in a map value may acquire a spin-lock
(e.g., the freeing of bpf_timer). However, the lookup_and_delete_elem
procedure already holds a raw spin-lock, so acquiring a spin-lock under
it violates the lockdep rules.

The running context of __htab_map_lookup_and_delete_elem() has already
disabled migration. Therefore, it is OK to invoke free_htab_elem()
after unlocking the bucket lock.

Fix the potential problem by freeing the element after unlocking the
bucket lock in __htab_map_lookup_and_delete_elem(), as sketched below.
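
A userland model of the reordering (all names are stand-ins): unlink under
the lock, free after the unlock, so the free path may take other locks
without nesting inside the bucket lock.

  #include &lt;pthread.h&gt;
  #include &lt;stdlib.h&gt;

  struct elem { struct elem *next; };
  struct bucket { pthread_mutex_t lock; struct elem *head; };

  static void delete_one(struct bucket *b)
  {
          struct elem *l;

          pthread_mutex_lock(&amp;b-&gt;lock);
          l = b-&gt;head;
          if (l)
                  b-&gt;head = l-&gt;next;    /* unlink under the bucket lock */
          pthread_mutex_unlock(&amp;b-&gt;lock);

          free(l);        /* free after unlock: this path may take other locks */
  }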

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20250117101816.2101857-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Bail out early in __htab_map_lookup_and_delete_elem()</title>
<updated>2025-01-20T17:09:01Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2025-01-17T10:18:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=588c6ead325aecc9894c9925cf1f771b77437bee'/>
<id>urn:sha1:588c6ead325aecc9894c9925cf1f771b77437bee</id>
<content type='text'>
Use a goto statement to bail out early when the target element is not
found, instead of handling the more likely case in a large else branch.
This change doesn't affect functionality and simply makes the code
cleaner, as in the sketch below.
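
A sketch of the goto-based early exit (find() and unlink_elem() are
illustrative stubs, not the kernel functions):

  #include &lt;errno.h&gt;
  #include &lt;stddef.h&gt;

  struct elem;
  struct bucket;

  struct elem *find(struct bucket *b, int key);
  void unlink_elem(struct bucket *b, struct elem *l);

  static int lookup_and_delete(struct bucket *b, int key, struct elem **out)
  {
          struct elem *l = find(b, key);
          int ret = 0;

          if (!l) {
                  ret = -ENOENT;
                  goto out;       /* bail out early: element not found */
          }

          unlink_elem(b, l);      /* the common, likely case stays flat */
          *out = l;
  out:
          return ret;
  }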

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@kernel.org&gt;
Link: https://lore.kernel.org/r/20250117101816.2101857-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Free special fields after unlock in htab_lru_map_delete_node()</title>
<updated>2025-01-20T17:09:01Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2025-01-17T10:18:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=45dc92c32a476daf60eaef855014d0ea35780ba5'/>
<id>urn:sha1:45dc92c32a476daf60eaef855014d0ea35780ba5</id>
<content type='text'>
When bpf_timer is used in an LRU hash map, calling check_and_free_fields()
in htab_lru_map_delete_node() will invoke bpf_timer_cancel_and_free() to
free the bpf_timer. If the timer is running on another CPU,
hrtimer_cancel() will invoke hrtimer_cancel_wait_running() to spin on the
current CPU, waiting for the hrtimer callback to complete.

The deletion has already acquired a raw spin-lock (the bucket lock). To
reduce the time the bucket lock is held, move the invocation of
check_and_free_fields() out of the bucket lock, as sketched below. However,
because htab_lru_map_delete_node() is invoked with the LRU raw spin-lock
held, the freeing of special fields still happens in a locked scope.
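
A userland model of shrinking the critical section (all names are
stand-ins; per the message above, the caller may still hold the LRU lock
around the whole function):

  #include &lt;pthread.h&gt;

  struct elem { struct elem *next; int has_timer; };
  struct bucket { pthread_mutex_t lock; struct elem *head; };

  /* stand-in for bpf_timer_cancel_and_free(), which may spin waiting
   * for a timer callback running on another CPU */
  static void cancel_and_free_timer(struct elem *l) { (void)l; }

  static struct elem *lru_delete_node(struct bucket *b)
  {
          struct elem *l;

          pthread_mutex_lock(&amp;b-&gt;lock);
          l = b-&gt;head;
          if (l)
                  b-&gt;head = l-&gt;next;
          pthread_mutex_unlock(&amp;b-&gt;lock); /* drop the bucket lock first */

          if (l &amp;&amp; l-&gt;has_timer)
                  cancel_and_free_timer(l);  /* long wait happens unlocked now */
          return l;
  }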

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@kernel.org&gt;
Link: https://lore.kernel.org/r/20250117101816.2101857-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Disable migration before calling ops-&gt;map_free()</title>
<updated>2025-01-09T02:06:36Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2025-01-08T01:07:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4b7e7cd1c105cc5881f4054805dfbb92aa24eb78'/>
<id>urn:sha1:4b7e7cd1c105cc5881f4054805dfbb92aa24eb78</id>
<content type='text'>
The freeing of all map elements may invoke bpf_obj_free_fields() to free
the special fields in the map value. Since these special fields may be
allocated from the bpf memory allocator, migrate_{disable|enable} pairs
are necessary when freeing them.

To simplify reasoning about when migrate_disable() is needed for the
freeing of these special fields, let the caller guarantee that migration
is disabled before invoking bpf_obj_free_fields(). Therefore, disable
migration before calling ops-&gt;map_free() to simplify the freeing of map
values or special fields allocated from the bpf memory allocator, as
sketched below.
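
A sketch of the caller-side guarantee (the function name is illustrative;
migrate_disable()/migrate_enable() are the kernel primitives, wrapped once
around the -&gt;map_free() invocation):

  static void bpf_map_free_sketch(struct bpf_map *map)
  {
          migrate_disable();
          map-&gt;ops-&gt;map_free(map);      /* callbacks may assume migration is off */
          migrate_enable();
  }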

After disabling migration in bpf_map_free(), there is no need for
additional migrate_{disable|enable} pairs in these -&gt;map_free()
callbacks. Remove these redundant invocations.

The migrate_{disable|enable} pairs in the underlying implementation of
bpf_obj_free_fields() will be removed by the following patch.

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20250108010728.207536-11-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Remove migrate_{disable|enable} in htab_elem_free</title>
<updated>2025-01-09T02:06:36Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2025-01-08T01:07:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=53f2ba0b1cc087a597b43e63d35f355e9348bd61'/>
<id>urn:sha1:53f2ba0b1cc087a597b43e63d35f355e9348bd61</id>
<content type='text'>
htab_elem_free() has two call-sites: delete_all_elements(), which has
already disabled migration, and free_htab_elem(), which in turn is invoked
by 4 other functions: __htab_map_lookup_and_delete_elem,
__htab_map_lookup_and_delete_batch, htab_map_update_elem and
htab_map_delete_elem.

The BPF syscall has already disabled migration before invoking the
-&gt;map_update_elem, -&gt;map_delete_elem, and -&gt;map_lookup_and_delete_elem
callbacks for the hash map. __htab_map_lookup_and_delete_batch() also
disables migration before invoking free_htab_elem(). -&gt;map_update_elem()
and -&gt;map_delete_elem() of the hash map may also be invoked by a BPF
program, and the running context of a BPF program has already disabled
migration. Therefore, it is safe to remove the migrate_{disable|enable}
pair in htab_elem_free().

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20250108010728.207536-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Remove migrate_{disable|enable} in -&gt;map_for_each_callback</title>
<updated>2025-01-09T02:06:35Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2025-01-08T01:07:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ea5b229630a631ee6a72e1f58bc40029efc1daf8'/>
<id>urn:sha1:ea5b229630a631ee6a72e1f58bc40029efc1daf8</id>
<content type='text'>
A BPF program may call bpf_for_each_map_elem(), which in turn calls the
-&gt;map_for_each_callback callback of the related bpf map. Considering that
the running context of a bpf program has already disabled migration,
remove the unnecessary migrate_{disable|enable} pair from the
implementations of -&gt;map_for_each_callback. To ensure this guarantee is
not violated later, also add a cant_migrate() check in the
implementations, as sketched below.
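
A sketch of the added guard (the function name and body are illustrative;
cant_migrate() is the kernel assertion that fires on debug kernels when
migration is still enabled):

  static long htab_for_each_sketch(struct bpf_map *map)
  {
          cant_migrate();   /* assert: caller already disabled migration */
          /* walk the buckets and run the bpf callback on each element */
          return 0;
  }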

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Link: https://lore.kernel.org/r/20250108010728.207536-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
