<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/bpf/arraymap.c, branch v5.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2021-10-28T17:43:58Z</updated>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2021-10-28T17:43:58Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2021-10-28T17:43:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7df621a3eea6761bc83e641aaca6963210c7290d'/>
<id>urn:sha1:7df621a3eea6761bc83e641aaca6963210c7290d</id>
<content type='text'>
include/net/sock.h
  7b50ecfcc6cd ("net: Rename -&gt;stream_memory_read to -&gt;sock_is_readable")
  4c1e34c0dbff ("vsock: Enable y2038 safe timeval for timeout")

drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
  0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table")
  e77bcdd1f639 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.")

Adjacent code addition in both cases, keep both.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix potential race in tail call compatibility check</title>
<updated>2021-10-26T19:37:28Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2021-10-26T11:00:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=54713c85f536048e685258f880bf298a74c3620d'/>
<id>urn:sha1:54713c85f536048e685258f880bf298a74c3620d</id>
<content type='text'>
Lorenzo noticed that the code testing for program type compatibility of
tail call maps is potentially racy in that two threads could encounter a
map with an unset type simultaneously and both return true even though they
are inserting incompatible programs.

The race window is quite small, but artificially enlarging it by adding a
usleep_range() inside the check in bpf_prog_array_compatible() makes it
trivial to trigger from userspace with a program that does, essentially:

        map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0);
        pid = fork();
        if (pid) {
                key = 0;
                value = xdp_fd;
        } else {
                key = 1;
                value = tc_fd;
        }
        err = bpf_map_update_elem(map_fd, &amp;key, &amp;value, 0);

While the race window is small, it has potentially serious ramifications in
that triggering it would allow a BPF program to tail call to a program of a
different type. So let's get rid of it by protecting the update with a
spinlock. The commit in the Fixes tag is the last commit that touches the
code in question.

v2:
- Use a spinlock instead of an atomic variable and cmpxchg() (Alexei)
v3:
- Put lock and the members it protects into an embedded 'owner' struct (Daniel)

Fixes: 3324b584b6f6 ("ebpf: misc core cleanup")
Reported-by: Lorenzo Bianconi &lt;lorenzo.bianconi@redhat.com&gt;
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
</content>
</entry>
<entry>
<title>bpf: Replace callers of BPF_CAST_CALL with proper function typedef</title>
<updated>2021-09-28T23:27:18Z</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2021-09-28T23:09:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=102acbacfd9a96d101abd96d1a7a5bf92b7c3e8e'/>
<id>urn:sha1:102acbacfd9a96d101abd96d1a7a5bf92b7c3e8e</id>
<content type='text'>
In order to keep ahead of cases in the kernel where Control Flow
Integrity (CFI) may trip over function call casts, enabling
-Wcast-function-type is helpful. To that end, BPF_CAST_CALL causes
various warnings and is one of the last places in the kernel
triggering this warning.

For actual function calls, replace BPF_CAST_CALL() with a typedef, which
captures the same details about the given function pointers.

This change results in no object code difference.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Link: https://github.com/KSPP/linux/issues/20
Link: https://lore.kernel.org/lkml/CAEf4Bzb46=-J5Fxc3mMZ8JQPtK1uoE0q6+g6WPz53Cvx=CBEhw@mail.gmail.com
Link: https://lore.kernel.org/bpf/20210928230946.4062144-3-keescook@chromium.org
</content>
</entry>
<entry>
<title>bpf: Add map side support for bpf timers.</title>
<updated>2021-07-15T20:31:10Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2021-07-15T00:54:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=68134668c17f31f51930478f75495b552a411550'/>
<id>urn:sha1:68134668c17f31f51930478f75495b552a411550</id>
<content type='text'>
Restrict bpf timers to array, hash (both preallocated and kmalloced), and
lru map types. The per-cpu maps with timers don't make sense, since 'struct
bpf_timer' is a part of map value. bpf timers in per-cpu maps would mean that
the number of timers depends on number of possible cpus and timers would not be
accessible from all cpus. lpm map support can be added in the future.
The timers in inner maps are supported.

The bpf_map_update/delete_elem() helpers and sys_bpf commands cancel and free
bpf_timer in a given map element.

Similar to 'struct bpf_spin_lock' BTF is required and it is used to validate
that map element indeed contains 'struct bpf_timer'.

Make check_and_init_map_value() init both bpf_spin_lock and bpf_timer when
map element data is reused in preallocated htab and lru maps.

Teach copy_map_value() to support both bpf_spin_lock and bpf_timer in a single
map element. There could be one of each, but not more than one. Due to 'one
bpf_timer in one element' restriction do not support timers in global data,
since global data is a map of single element, but from bpf program side it's
seen as many global variables and restriction of single global timer would be
odd. The sys_bpf map_freeze and sys_mmap syscalls are not allowed on maps with
timers, since user space could have corrupted mmap element and crashed the
kernel. The maps with timers cannot be readonly. Due to these restrictions
search for bpf_timer in datasec BTF in case it was placed in the global data to
report clear error.

The previous patch allowed 'struct bpf_timer' as a first field in a map
element only. Relax this restriction.

Refactor lru map to s/bpf_lru_push_free/htab_lru_push_free/ to cancel and free
the timer when lru map deletes an element as a part of it eviction algorithm.

Make sure that bpf program cannot access 'struct bpf_timer' via direct load/store.
The timer operation are done through helpers only.
This is similar to 'struct bpf_spin_lock'.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/bpf/20210715005417.78572-5-alexei.starovoitov@gmail.com
</content>
</entry>
<entry>
<title>bpf: Add batched ops support for percpu array</title>
<updated>2021-04-27T23:17:45Z</updated>
<author>
<name>Pedro Tammela</name>
<email>pctammela@gmail.com</email>
</author>
<published>2021-04-24T21:45:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f008d732ab181fd00d95c2e8a6e479d2f7c634b3'/>
<id>urn:sha1:f008d732ab181fd00d95c2e8a6e479d2f7c634b3</id>
<content type='text'>
Uses the already in-place infrastructure provided by the
'generic_map_*_batch' functions.

No tweak was needed as it transparently handles the percpu variant.

As arrays don't have delete operations, let it return a error to
user space (default behaviour).

Suggested-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: Pedro Tammela &lt;pctammela@mojatatu.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20210424214510.806627-2-pctammela@mojatatu.com
</content>
</entry>
<entry>
<title>bpf: Add arraymap support for bpf_for_each_map_elem() helper</title>
<updated>2021-02-26T21:23:52Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2021-02-26T20:49:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=06dcdcd4b9e84ee78d865c928b1d43bb71829251'/>
<id>urn:sha1:06dcdcd4b9e84ee78d865c928b1d43bb71829251</id>
<content type='text'>
This patch added support for arraymap and percpu arraymap.

Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20210226204928.3885192-1-yhs@fb.com
</content>
</entry>
<entry>
<title>bpf: Eliminate rlimit-based memory accounting for arraymap maps</title>
<updated>2020-12-03T02:32:46Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-12-01T21:58:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1bc5975613ed155fc57ee321041d3463e580b4a3'/>
<id>urn:sha1:1bc5975613ed155fc57ee321041d3463e580b4a3</id>
<content type='text'>
Do not use rlimit-based memory accounting for arraymap maps.
It has been replaced with the memcg-based memory accounting.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20201201215900.3569844-19-guro@fb.com
</content>
</entry>
<entry>
<title>bpf: Refine memcg-based memory accounting for arraymap maps</title>
<updated>2020-12-03T02:32:45Z</updated>
<author>
<name>Roman Gushchin</name>
<email>guro@fb.com</email>
</author>
<published>2020-12-01T21:58:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6d192c7938b7e53a6bb55b90b86bd02ea0153731'/>
<id>urn:sha1:6d192c7938b7e53a6bb55b90b86bd02ea0153731</id>
<content type='text'>
Include percpu arrays and auxiliary data into the memcg-based memory
accounting.

Signed-off-by: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20201201215900.3569844-9-guro@fb.com
</content>
</entry>
<entry>
<title>bpf: Allow for map-in-map with dynamic inner array map entries</title>
<updated>2020-10-11T17:21:04Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2020-10-10T23:40:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4a8f87e60f6db40e640f1db555d063b2c4dea5f1'/>
<id>urn:sha1:4a8f87e60f6db40e640f1db555d063b2c4dea5f1</id>
<content type='text'>
Recent work in f4d05259213f ("bpf: Add map_meta_equal map ops") and 134fede4eecf
("bpf: Relax max_entries check for most of the inner map types") added support
for dynamic inner max elements for most map-in-map types. Exceptions were maps
like array or prog array where the map_gen_lookup() callback uses the maps'
max_entries field as a constant when emitting instructions.

We recently implemented Maglev consistent hashing into Cilium's load balancer
which uses map-in-map with an outer map being hash and inner being array holding
the Maglev backend table for each service. This has been designed this way in
order to reduce overall memory consumption given the outer hash map allows to
avoid preallocating a large, flat memory area for all services. Also, the
number of service mappings is not always known a-priori.

The use case for dynamic inner array map entries is to further reduce memory
overhead, for example, some services might just have a small number of back
ends while others could have a large number. Right now the Maglev backend table
for small and large number of backends would need to have the same inner array
map entries which adds a lot of unneeded overhead.

Dynamic inner array map entries can be realized by avoiding the inlined code
generation for their lookup. The lookup will still be efficient since it will
be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips
inline code generation and relaxes array_map_meta_equal() check to ignore both
maps' max_entries. This also still allows to have faster lookups for map-in-map
when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed.

Example code generation where inner map is dynamic sized array:

  # bpftool p d x i 125
  int handle__sys_enter(void * ctx):
  ; int handle__sys_enter(void *ctx)
     0: (b4) w1 = 0
  ; int key = 0;
     1: (63) *(u32 *)(r10 -4) = r1
     2: (bf) r2 = r10
  ;
     3: (07) r2 += -4
  ; inner_map = bpf_map_lookup_elem(&amp;outer_arr_dyn, &amp;key);
     4: (18) r1 = map[id:468]
     6: (07) r1 += 272
     7: (61) r0 = *(u32 *)(r2 +0)
     8: (35) if r0 &gt;= 0x3 goto pc+5
     9: (67) r0 &lt;&lt;= 3
    10: (0f) r0 += r1
    11: (79) r0 = *(u64 *)(r0 +0)
    12: (15) if r0 == 0x0 goto pc+1
    13: (05) goto pc+1
    14: (b7) r0 = 0
    15: (b4) w6 = -1
  ; if (!inner_map)
    16: (15) if r0 == 0x0 goto pc+6
    17: (bf) r2 = r10
  ;
    18: (07) r2 += -4
  ; val = bpf_map_lookup_elem(inner_map, &amp;key);
    19: (bf) r1 = r0                               | No inlining but instead
    20: (85) call array_map_lookup_elem#149280     | call to array_map_lookup_elem()
  ; return val ? *val : -1;                        | for inner array lookup.
    21: (15) if r0 == 0x0 goto pc+1
  ; return val ? *val : -1;
    22: (61) r6 = *(u32 *)(r0 +0)
  ; }
    23: (bc) w0 = w6
    24: (95) exit

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
</content>
</entry>
<entry>
<title>bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array</title>
<updated>2020-10-01T06:18:12Z</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2020-09-30T22:49:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=792caccc4526bb489e054f9ab61d7c024b15dea2'/>
<id>urn:sha1:792caccc4526bb489e054f9ab61d7c024b15dea2</id>
<content type='text'>
Currently, perf event in perf event array is removed from the array when
the map fd used to add the event is closed. This behavior makes it
difficult to the share perf events with perf event array.

Introduce perf event map that keeps the perf event open with a new flag
BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not
removed when the original map fd is closed. Instead, the perf event will
stay in the map until 1) it is explicitly removed from the array; or 2)
the array is freed.

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com
</content>
</entry>
</feed>
