| author | Alexei Starovoitov <ast@kernel.org> | 2021-08-25 17:40:36 -0700 |
|---|---|---|
| committer | Alexei Starovoitov <ast@kernel.org> | 2021-08-25 17:40:36 -0700 |
| commit | 0584e965fb2517f41b7057ffa26f3b6e15a53754 (patch) | |
| tree | cad9561b3fc80ffe9254a66bcf36159a0d5141c8 /kernel | |
| parent | Merge branch 'selftests: xsk: various simplifications' (diff) | |
| parent | bpf: selftests: Add dctcp fallback test (diff) | |
Merge branch 'bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt'
Martin KaFai Lau says:
====================
This set allows a bpf-tcp-cc to call bpf_setsockopt(). One use
case is to let a bpf-tcp-cc switch to another cc during init().
For example, when the TCP flow is not ECN-ready, bpf_dctcp
can switch to another cc by calling setsockopt(TCP_CONGESTION).
bpf_getsockopt() is also added so that the API is symmetrical
and causes less usage surprise.
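As a hedged sketch of that use case (not the actual selftest from this
set; the "ecn_fallback" name, the trivial cwnd logic, and the locally
defined constants are illustrative assumptions), an init() that hands
non-ECN flows over to a regular cc could look like:

```c
// SPDX-License-Identifier: GPL-2.0
/* Sketch of a bpf-tcp-cc whose init() falls back to another cc via
 * bpf_setsockopt(TCP_CONGESTION) when the flow did not negotiate ECN.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

#define SOL_TCP		6	/* from <netinet/in.h>, not in vmlinux.h */
#define TCP_CONGESTION	13	/* from <netinet/tcp.h> */
#define TCP_ECN_OK	1	/* kernel's tp->ecn_flags bit */

static char fallback_cc[] = "cubic";

SEC("struct_ops/ecn_fb_init")
void BPF_PROG(ecn_fb_init, struct sock *sk)
{
	struct tcp_sock *tp = (struct tcp_sock *)sk;

	/* Non-ECN flow: hand the socket to a conventional cc.  This is
	 * the call that this series makes legal from a struct_ops prog.
	 */
	if (!(tp->ecn_flags & TCP_ECN_OK))
		bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
			       fallback_cc, sizeof(fallback_cc));
}

SEC("struct_ops/ecn_fb_ssthresh")
__u32 BPF_PROG(ecn_fb_ssthresh, struct sock *sk)
{
	struct tcp_sock *tp = (struct tcp_sock *)sk;

	/* Placeholder halving logic, just to make the ops complete. */
	return tp->snd_cwnd > 2 ? tp->snd_cwnd >> 1 : 2;
}

SEC("struct_ops/ecn_fb_undo_cwnd")
__u32 BPF_PROG(ecn_fb_undo_cwnd, struct sock *sk)
{
	return ((struct tcp_sock *)sk)->snd_cwnd;
}

SEC("struct_ops/ecn_fb_cong_avoid")
void BPF_PROG(ecn_fb_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
{
	/* Deliberately a no-op in this sketch. */
}

SEC(".struct_ops")
struct tcp_congestion_ops ecn_fallback = {
	.init		= (void *)ecn_fb_init,
	.ssthresh	= (void *)ecn_fb_ssthresh,
	.undo_cwnd	= (void *)ecn_fb_undo_cwnd,
	.cong_avoid	= (void *)ecn_fb_cong_avoid,
	.name		= "ecn_fallback",
};
```

Note from the changelog below that the same bpf_setsockopt() call is
rejected when made from release().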
v2:
- Disallow switching to the kernel's tcp_cdg because it is the only
  kernel tcp-cc that stores a pointer in icsk_ca_priv.
  Please see the commit log in patch 1 for details.
  A test is added in patch 4 to check switching to tcp_cdg.
- Refactor the logic that finds the offset of a func ptr
  in "struct tcp_congestion_ops" into prog_ops_moff()
  in patch 1.
- bpf_setsockopt() has been disabled in release() since v1 (please
  see the commit log in patch 1 for the reason). In v2, bpf_getsockopt()
  is disabled in release() as well to avoid usage surprise,
  because the two are usually expected to be available together.
  For reads, a bpf-tcp-cc can already use PTR_TO_BTF_ID to load from
  tcp_sock directly, as the sketch below shows.
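On that last point, a hedged illustration (the demo_* names are mine,
assuming the same includes as the earlier sketch): reads from tcp_sock
go through plain BTF-typed pointer loads, so no helper call is needed:

```c
SEC("struct_ops/demo_cong_avoid")
void BPF_PROG(demo_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
{
	/* PTR_TO_BTF_ID: the verifier allows direct loads through the
	 * BTF-typed socket, so reading state needs no bpf_getsockopt().
	 */
	const struct tcp_sock *tp = (const struct tcp_sock *)sk;

	if (tp->snd_cwnd < tp->snd_ssthresh)
		bpf_printk("slow start, cwnd=%u", tp->snd_cwnd);
}
```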
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'kernel')
| -rw-r--r-- | kernel/bpf/bpf_struct_ops.c | 22 |
1 file changed, 21 insertions(+), 1 deletion(-)
```diff
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 70f6fd4fa305..d6731c32864e 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -28,6 +28,7 @@ struct bpf_struct_ops_value {
 
 struct bpf_struct_ops_map {
 	struct bpf_map map;
+	struct rcu_head rcu;
 	const struct bpf_struct_ops *st_ops;
 	/* protect map_update */
 	struct mutex lock;
@@ -622,6 +623,14 @@ bool bpf_struct_ops_get(const void *kdata)
 	return refcount_inc_not_zero(&kvalue->refcnt);
 }
 
+static void bpf_struct_ops_put_rcu(struct rcu_head *head)
+{
+	struct bpf_struct_ops_map *st_map;
+
+	st_map = container_of(head, struct bpf_struct_ops_map, rcu);
+	bpf_map_put(&st_map->map);
+}
+
 void bpf_struct_ops_put(const void *kdata)
 {
 	struct bpf_struct_ops_value *kvalue;
@@ -632,6 +641,17 @@ void bpf_struct_ops_put(const void *kdata)
 
 		st_map = container_of(kvalue, struct bpf_struct_ops_map,
 				      kvalue);
-		bpf_map_put(&st_map->map);
+		/* The struct_ops's function may switch to another struct_ops.
+		 *
+		 * For example, bpf_tcp_cc_x->init() may switch to
+		 * another tcp_cc_y by calling
+		 * setsockopt(TCP_CONGESTION, "tcp_cc_y").
+		 * During the switch, bpf_struct_ops_put(tcp_cc_x) is called
+		 * and its map->refcnt may reach 0 which then free its
+		 * trampoline image while tcp_cc_x is still running.
+		 *
+		 * Thus, a rcu grace period is needed here.
+		 */
+		call_rcu(&st_map->rcu, bpf_struct_ops_put_rcu);
 	}
 }
```
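Why the grace period: the struct_ops map owns the trampoline image the
current call chain may be executing from, so freeing it synchronously
inside bpf_struct_ops_put() would pull memory out from under a running
program. Below is a minimal kernel-style sketch of the same deferral
idiom, with illustrative demo_* names rather than real kernel API:

```c
#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* An object with an embedded rcu_head, mirroring the rcu member the
 * diff adds to struct bpf_struct_ops_map.
 */
struct demo_obj {
	struct rcu_head rcu;
	/* ... memory that callers may still be executing from ... */
};

static void demo_free_rcu(struct rcu_head *head)
{
	/* Runs only after a grace period has elapsed, i.e. after every
	 * CPU has passed a quiescent state and can no longer be inside
	 * the object.  Freeing is safe now.
	 */
	struct demo_obj *obj = container_of(head, struct demo_obj, rcu);

	kfree(obj);
}

static void demo_put(struct demo_obj *obj)
{
	/* Freeing synchronously here would be the bug the diff avoids:
	 * the caller's return path may live inside obj.  Defer instead.
	 */
	call_rcu(&obj->rcu, demo_free_rcu);
}
```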
