<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/hashtab.c, branch v5.8</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.8</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.8'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-07-29T23:30:22Z</updated>
<entry>
<title>bpf: Fix map leak in HASH_OF_MAPS map</title>
<updated>2020-07-29T23:30:22Z</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andriin@fb.com</email>
</author>
<published>2020-07-29T04:09:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1d4e1eab456e1ee92a94987499b211db05f900ea'/>
<id>urn:sha1:1d4e1eab456e1ee92a94987499b211db05f900ea</id>
<content type='text'>
Fix HASH_OF_MAPS bug of not putting inner map pointer on bpf_map_elem_update()
operation. This is due to per-cpu extra_elems optimization, which bypassed
free_htab_elem() logic doing proper clean ups. Make sure that inner map is put
properly in optimized case as well.

Fixes: 8c290e60fa2a ("bpf: fix hashmap extra_elems logic")
Signed-off-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200729040913.2815687-1-andriin@fb.com
</content>
</entry>
<entry>
<title>bpf: Implement CAP_BPF</title>
<updated>2020-05-15T15:29:41Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2020-05-13T23:03:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2c78ee898d8f10ae6fb2fa23a3fbaec96b1b7366'/>
<id>urn:sha1:2c78ee898d8f10ae6fb2fa23a3fbaec96b1b7366</id>
<content type='text'>
Implement permissions as stated in uapi/linux/capability.h
In order to do that the verifier allow_ptr_leaks flag is split
into four flags and they are set as:
  env-&gt;allow_ptr_leaks = bpf_allow_ptr_leaks();
  env-&gt;bypass_spec_v1 = bpf_bypass_spec_v1();
  env-&gt;bypass_spec_v4 = bpf_bypass_spec_v4();
  env-&gt;bpf_capable = bpf_capable();

The first three currently equivalent to perfmon_capable(), since leaking kernel
pointers and reading kernel memory via side channel attacks is roughly
equivalent to reading kernel memory with cap_perfmon.

'bpf_capable' enables bounded loops, precision tracking, bpf to bpf calls and
other verifier features. 'allow_ptr_leaks' enable ptr leaks, ptr conversions,
subtraction of pointers. 'bypass_spec_v1' disables speculative analysis in the
verifier, run time mitigations in bpf array, and enables indirect variable
access in bpf programs. 'bypass_spec_v4' disables emission of sanitation code
by the verifier.

That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN
will have speculative checks done by the verifier and other spectre mitigation
applied. Such networking BPF program will not be able to leak kernel pointers
and will not be able to access arbitrary kernel memory.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Replace zero-length array with flexible-array member</title>
<updated>2020-02-28T00:21:02Z</updated>
<author>
<name>Gustavo A. R. Silva</name>
<email>gustavo@embeddedor.com</email>
</author>
<published>2020-02-27T00:17:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d7f10df86202273155a9d8f8553bc2ad28e0dd46'/>
<id>urn:sha1:d7f10df86202273155a9d8f8553bc2ad28e0dd46</id>
<content type='text'>
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva &lt;gustavo@embeddedor.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200227001744.GA3317@embeddedor
</content>
</entry>
<entry>
<title>bpf: Prepare hashtab locking for PREEMPT_RT</title>
<updated>2020-02-25T00:20:10Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-02-24T14:01:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7f805d17f1523c7b2c0da319ddb427d6c5d94ff1'/>
<id>urn:sha1:7f805d17f1523c7b2c0da319ddb427d6c5d94ff1</id>
<content type='text'>
PREEMPT_RT forbids certain operations like memory allocations (even with
GFP_ATOMIC) from atomic contexts. This is required because even with
GFP_ATOMIC the memory allocator calls into code pathes which acquire locks
with long held lock sections. To ensure the deterministic behaviour these
locks are regular spinlocks, which are converted to 'sleepable' spinlocks
on RT. The only true atomic contexts on an RT kernel are the low level
hardware handling, scheduling, low level interrupt handling, NMIs etc. None
of these contexts should ever do memory allocations.

As regular device interrupt handlers and soft interrupts are forced into
thread context, the existing code which does
  spin_lock*(); alloc(GPF_ATOMIC); spin_unlock*();
just works.

In theory the BPF locks could be converted to regular spinlocks as well,
but the bucket locks and percpu_freelist locks can be taken from arbitrary
contexts (perf, kprobes, tracepoints) which are required to be atomic
contexts even on RT. These mechanisms require preallocated maps, so there
is no need to invoke memory allocations within the lock held sections.

BPF maps which need dynamic allocation are only used from (forced) thread
context on RT and can therefore use regular spinlocks which in turn allows
to invoke memory allocations from the lock held section.

To achieve this make the hash bucket lock a union of a raw and a regular
spinlock and initialize and lock/unlock either the raw spinlock for
preallocated maps or the regular variant for maps which require memory
allocations.

On a non RT kernel this distinction is neither possible nor required.
spinlock maps to raw_spinlock and the extra code and conditional is
optimized out by the compiler. No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200224145644.509685912@linutronix.de
</content>
</entry>
<entry>
<title>bpf: Factor out hashtab bucket lock operations</title>
<updated>2020-02-25T00:20:10Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-02-24T14:01:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d01f9b198ca985b49717d8cd0b1f57806cb1319a'/>
<id>urn:sha1:d01f9b198ca985b49717d8cd0b1f57806cb1319a</id>
<content type='text'>
As a preparation for making the BPF locking RT friendly, factor out the
hash bucket lock operations into inline functions. This allows to do the
necessary RT modification in one place instead of sprinkling it all over
the place. No functional change.

The now unused htab argument of the lock/unlock functions will be used in
the next step which adds PREEMPT_RT support.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200224145644.420416916@linutronix.de
</content>
</entry>
<entry>
<title>bpf: Use recursion prevention helpers in hashtab code</title>
<updated>2020-02-25T00:20:10Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-02-24T14:01:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=085fee1a72a9fba101a4a68a2c02fa8bd2b6f913'/>
<id>urn:sha1:085fee1a72a9fba101a4a68a2c02fa8bd2b6f913</id>
<content type='text'>
The required protection is that the caller cannot be migrated to a
different CPU as these places take either a hash bucket lock or might
trigger a kprobe inside the memory allocator. Both scenarios can lead to
deadlocks. The deadlock prevention is per CPU by incrementing a per CPU
variable which temporarily blocks the invocation of BPF programs from perf
and kprobes.

Replace the open coded preempt_disable/enable() and this_cpu_inc/dec()
pairs with the new recursion prevention helpers to prepare BPF to work on
PREEMPT_RT enabled kernels. On a non-RT kernel the migrate disable/enable
in the helpers map to preempt_disable/enable(), i.e. no functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200224145644.211208533@linutronix.de
</content>
</entry>
<entry>
<title>bpf: Remove recursion prevention from rcu free callback</title>
<updated>2020-02-25T00:18:20Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-02-24T14:01:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8a37963c7ac9ecb7f86f8ebda020e3f8d6d7b8a0'/>
<id>urn:sha1:8a37963c7ac9ecb7f86f8ebda020e3f8d6d7b8a0</id>
<content type='text'>
If an element is freed via RCU then recursion into BPF instrumentation
functions is not a concern. The element is already detached from the map
and the RCU callback does not hold any locks on which a kprobe, perf event
or tracepoint attached BPF program could deadlock.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200224145643.259118710@linutronix.de
</content>
</entry>
<entry>
<title>bpf: Update locking comment in hashtab code</title>
<updated>2020-02-25T00:12:20Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-02-24T14:01:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dbca151cad736c99f4817076daf9fd02ed0c2daa'/>
<id>urn:sha1:dbca151cad736c99f4817076daf9fd02ed0c2daa</id>
<content type='text'>
The comment where the bucket lock is acquired says:

  /* bpf_map_update_elem() can be called in_irq() */

which is not really helpful and aside of that it does not explain the
subtle details of the hash bucket locks expecially in the context of BPF
and perf, kprobes and tracing.

Add a comment at the top of the file which explains the protection scopes
and the details how potential deadlocks are prevented.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200224145642.755793061@linutronix.de
</content>
</entry>
<entry>
<title>bpf: Fix a potential deadlock with bpf_map_do_batch</title>
<updated>2020-02-20T00:01:25Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2020-02-19T23:47:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b9aff38de2cb166476988020428985c5f7412ffc'/>
<id>urn:sha1:b9aff38de2cb166476988020428985c5f7412ffc</id>
<content type='text'>
Commit 057996380a42 ("bpf: Add batch ops to all htab bpf map")
added lookup_and_delete batch operation for hash table.
The current implementation has bpf_lru_push_free() inside
the bucket lock, which may cause a deadlock.

syzbot reports:
   -&gt; #2 (&amp;htab-&gt;buckets[i].lock#2){....}:
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
       htab_lru_map_delete_node+0xce/0x2f0 kernel/bpf/hashtab.c:593
       __bpf_lru_list_shrink_inactive kernel/bpf/bpf_lru_list.c:220 [inline]
       __bpf_lru_list_shrink+0xf9/0x470 kernel/bpf/bpf_lru_list.c:266
       bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:340 [inline]
       bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
       bpf_lru_pop_free+0x87c/0x1670 kernel/bpf/bpf_lru_list.c:499
       prealloc_lru_pop+0x2c/0xa0 kernel/bpf/hashtab.c:132
       __htab_lru_percpu_map_update_elem+0x67e/0xa90 kernel/bpf/hashtab.c:1069
       bpf_percpu_hash_update+0x16e/0x210 kernel/bpf/hashtab.c:1585
       bpf_map_update_value.isra.0+0x2d7/0x8e0 kernel/bpf/syscall.c:181
       generic_map_update_batch+0x41f/0x610 kernel/bpf/syscall.c:1319
       bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348
       __do_sys_bpf+0x9b7/0x41e0 kernel/bpf/syscall.c:3460
       __se_sys_bpf kernel/bpf/syscall.c:3355 [inline]
       __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

   -&gt; #0 (&amp;loc_l-&gt;lock){....}:
       check_prev_add kernel/locking/lockdep.c:2475 [inline]
       check_prevs_add kernel/locking/lockdep.c:2580 [inline]
       validate_chain kernel/locking/lockdep.c:2970 [inline]
       __lock_acquire+0x2596/0x4a00 kernel/locking/lockdep.c:3954
       lock_acquire+0x190/0x410 kernel/locking/lockdep.c:4484
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:159
       bpf_common_lru_push_free kernel/bpf/bpf_lru_list.c:516 [inline]
       bpf_lru_push_free+0x250/0x5b0 kernel/bpf/bpf_lru_list.c:555
       __htab_map_lookup_and_delete_batch+0x8d4/0x1540 kernel/bpf/hashtab.c:1374
       htab_lru_map_lookup_and_delete_batch+0x34/0x40 kernel/bpf/hashtab.c:1491
       bpf_map_do_batch+0x3f5/0x510 kernel/bpf/syscall.c:3348
       __do_sys_bpf+0x1f7d/0x41e0 kernel/bpf/syscall.c:3456
       __se_sys_bpf kernel/bpf/syscall.c:3355 [inline]
       __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:3355
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Possible unsafe locking scenario:

          CPU0                    CPU2
          ----                    ----
     lock(&amp;htab-&gt;buckets[i].lock#2);
                                  lock(&amp;l-&gt;lock);
                                  lock(&amp;htab-&gt;buckets[i].lock#2);
     lock(&amp;loc_l-&gt;lock);

    *** DEADLOCK ***

To fix the issue, for htab_lru_map_lookup_and_delete_batch() in CPU0,
let us do bpf_lru_push_free() out of the htab bucket lock. This can
avoid the above deadlock scenario.

Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
Reported-by: syzbot+a38ff3d9356388f2fb83@syzkaller.appspotmail.com
Reported-by: syzbot+122b5421d14e68f29cd1@syzkaller.appspotmail.com
Suggested-by: Hillf Danton &lt;hdanton@sina.com&gt;
Suggested-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: Brian Vazquez &lt;brianvv@google.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200219234757.3544014-1-yhs@fb.com
</content>
</entry>
<entry>
<title>bpf: Do not grab the bucket spinlock by default on htab batch ops</title>
<updated>2020-02-19T23:59:30Z</updated>
<author>
<name>Brian Vazquez</name>
<email>brianvv@google.com</email>
</author>
<published>2020-02-18T17:25:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=492e0d0d6f2eb4badfd2868addf9da0f651eba0e'/>
<id>urn:sha1:492e0d0d6f2eb4badfd2868addf9da0f651eba0e</id>
<content type='text'>
Grabbing the spinlock for every bucket even if it's empty, was causing
significant perfomance cost when traversing htab maps that have only a
few entries. This patch addresses the issue by checking first the
bucket_cnt, if the bucket has some entries then we go and grab the
spinlock and proceed with the batching.

Tested with a htab of size 50K and different value of populated entries.

Before:
  Benchmark             Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1       2759655        2752033
  BM_DumpHashMap/10      2933722        2930825
  BM_DumpHashMap/200     3171680        3170265
  BM_DumpHashMap/500     3639607        3635511
  BM_DumpHashMap/1000    4369008        4364981
  BM_DumpHashMap/5k     11171919       11134028
  BM_DumpHashMap/20k    69150080       69033496
  BM_DumpHashMap/39k   190501036      190226162

After:
  Benchmark             Time(ns)        CPU(ns)
  ---------------------------------------------
  BM_DumpHashMap/1        202707         200109
  BM_DumpHashMap/10       213441         210569
  BM_DumpHashMap/200      478641         472350
  BM_DumpHashMap/500      980061         967102
  BM_DumpHashMap/1000    1863835        1839575
  BM_DumpHashMap/5k      8961836        8902540
  BM_DumpHashMap/20k    69761497       69322756
  BM_DumpHashMap/39k   187437830      186551111

Fixes: 057996380a42 ("bpf: Add batch ops to all htab bpf map")
Signed-off-by: Brian Vazquez &lt;brianvv@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200218172552.215077-1-brianvv@google.com
</content>
</entry>
</feed>
