<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4/ip_input.c, branch v6.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2022-07-16T01:50:35Z</updated>
<entry>
<title>tcp/udp: Make early_demux back namespacified.</title>
<updated>2022-07-16T01:50:35Z</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.com</email>
</author>
<published>2022-07-13T17:52:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=11052589cf5c0bab3b4884d423d5f60c38fcf25d'/>
<id>urn:sha1:11052589cf5c0bab3b4884d423d5f60c38fcf25d</id>
<content type='text'>
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made
it possible to enable/disable early_demux on a per-netns basis.  Then, we
introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for
TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for
tcp and udp").  However, the .proc_handler() was wrong and actually
disabled us from changing the behaviour in each netns.

We can execute early_demux if net.ipv4.ip_early_demux is on and each proto
.early_demux() handler is not NULL.  When we toggle (tcp|udp)_early_demux,
the change itself is saved in each netns variable, but the .early_demux()
handler is a global variable, so the handler is switched based on the
init_net's sysctl variable.  Thus, netns (tcp|udp)_early_demux knobs have
nothing to do with the logic.  Whether we CAN execute proto .early_demux()
is always decided by init_net's sysctl knob, and whether we DO it or not is
by each netns ip_early_demux knob.

This patch namespacifies (tcp|udp)_early_demux again.  For now, the users
of the .early_demux() handler are TCP and UDP only, and they are called
directly to avoid retpoline.  So, we can remove the .early_demux() handler
from inet6?_protos and need not dereference them in ip6?_rcv_finish_core().
If another proto needs .early_demux(), we can restore it at that time.

Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net-core: rx_otherhost_dropped to core_stats</title>
<updated>2022-04-08T03:32:49Z</updated>
<author>
<name>Jeffrey Ji</name>
<email>jeffreyji@google.com</email>
</author>
<published>2022-04-06T17:26:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=794c24e9921f32ded4422833a990ccf11dc3c00e'/>
<id>urn:sha1:794c24e9921f32ded4422833a990ccf11dc3c00e</id>
<content type='text'>
Increment rx_otherhost_dropped counter when packet dropped due to
mismatched dest MAC addr.

An example when this drop can occur is when manually crafting raw
packets that will be consumed by a user space application via a tap
device. For testing purposes local traffic was generated using trafgen
for the client and netcat to start a server

Tested: Created 2 netns, sent 1 packet using trafgen from 1 to the other
with "{eth(daddr=$INCORRECT_MAC...}", verified that iproute2 showed the
counter was incremented. (Also had to modify iproute2 to show the stat,
additional patch for that coming next.)

Signed-off-by: Jeffrey Ji &lt;jeffreyji@google.com&gt;
Reviewed-by: Brian Vazquez &lt;brianvv@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20220406172600.1141083-1-jeffreyjilinux@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: Postpone skb_clear_delivery_time() until knowing the skb is delivered locally</title>
<updated>2022-03-03T14:38:48Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2022-03-02T19:56:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cd14e9b7b8d312dfbf75ce1f78552902e51b9045'/>
<id>urn:sha1:cd14e9b7b8d312dfbf75ce1f78552902e51b9045</id>
<content type='text'>
The previous patches handled the delivery_time in the ingress path
before the routing decision is made.  This patch can postpone clearing
delivery_time in a skb until knowing it is delivered locally and also
set the (rcv) timestamp if needed.  This patch moves the
skb_clear_delivery_time() from dev.c to ip_local_deliver_finish()
and ip6_input_finish().

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ipv4: use kfree_skb_reason() in ip_protocol_deliver_rcu()</title>
<updated>2022-02-07T11:18:49Z</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2022-02-05T07:47:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=10580c4791902571777603cb980414892dd5f149'/>
<id>urn:sha1:10580c4791902571777603cb980414892dd5f149</id>
<content type='text'>
Replace kfree_skb() with kfree_skb_reason() in ip_protocol_deliver_rcu().
Following new drop reasons are introduced:

SKB_DROP_REASON_XFRM_POLICY
SKB_DROP_REASON_IP_NOPROTO

Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ipv4: use kfree_skb_reason() in ip_rcv_finish_core()</title>
<updated>2022-02-07T11:18:49Z</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2022-02-05T07:47:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c1f166d1f7eef212096a98b22f5acf92f9af353d'/>
<id>urn:sha1:c1f166d1f7eef212096a98b22f5acf92f9af353d</id>
<content type='text'>
Replace kfree_skb() with kfree_skb_reason() in ip_rcv_finish_core(),
following drop reasons are introduced:

SKB_DROP_REASON_IP_RPFILTER
SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST

Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ipv4: use kfree_skb_reason() in ip_rcv_core()</title>
<updated>2022-02-07T11:18:49Z</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2022-02-05T07:47:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=33cba42985c8144eef78d618fc1e51aaa074b169'/>
<id>urn:sha1:33cba42985c8144eef78d618fc1e51aaa074b169</id>
<content type='text'>
Replace kfree_skb() with kfree_skb_reason() in ip_rcv_core(). Three new
drop reasons are introduced:

SKB_DROP_REASON_OTHERHOST
SKB_DROP_REASON_IP_CSUM
SKB_DROP_REASON_IP_INHDR

Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: use indirect call helpers for dst_input</title>
<updated>2021-02-03T22:51:39Z</updated>
<author>
<name>Brian Vazquez</name>
<email>brianvv@google.com</email>
</author>
<published>2021-02-01T17:41:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e43b21906439ed14dda84f9784d38c03d0464607'/>
<id>urn:sha1:e43b21906439ed14dda84f9784d38c03d0464607</id>
<content type='text'>
This patch avoids the indirect call for the common case:
ip_local_deliver and ip6_input

Signed-off-by: Brian Vazquez &lt;brianvv@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Add socket assign support</title>
<updated>2020-03-30T20:45:04Z</updated>
<author>
<name>Joe Stringer</name>
<email>joe@wand.net.nz</email>
</author>
<published>2020-03-29T22:53:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cf7fbe660f2dbd738ab58aea8e9b0ca6ad232449'/>
<id>urn:sha1:cf7fbe660f2dbd738ab58aea8e9b0ca6ad232449</id>
<content type='text'>
Add support for TPROXY via a new bpf helper, bpf_sk_assign().

This helper requires the BPF program to discover the socket via a call
to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
helper takes its own reference to the socket in addition to any existing
reference that may or may not currently be obtained for the duration of
BPF processing. For the destination socket to receive the traffic, the
traffic must be routed towards that socket via local route. The
simplest example route is below, but in practice you may want to route
traffic more narrowly (eg by CIDR):

  $ ip route add local default dev lo

This patch avoids trying to introduce an extra bit into the skb-&gt;sk, as
that would require more invasive changes to all code interacting with
the socket to ensure that the bit is handled correctly, such as all
error-handling cases along the path from the helper in BPF through to
the orphan path in the input. Instead, we opt to use the destructor
variable to switch on the prefetch of the socket.

Signed-off-by: Joe Stringer &lt;joe@wand.net.nz&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200329225342.16317-2-joe@wand.net.nz
</content>
</entry>
<entry>
<title>ipv4: use dst hint for ipv4 list receive</title>
<updated>2019-11-21T22:45:55Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2019-11-20T12:47:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02b24941619fcce3d280311ac73b1e461552e9c8'/>
<id>urn:sha1:02b24941619fcce3d280311ac73b1e461552e9c8</id>
<content type='text'>
This is alike the previous change, with some additional ipv4 specific
quirk. Even when using the route hint we still have to do perform
additional per packet checks about source address validity: a new
helper is added to wrap them.

Hints are explicitly disabled if the destination is a local broadcast,
that keeps the code simple and local broadcast are a slower path anyway.

UDP flood performances vs recvmmsg() receiver:

vanilla		patched		delta
Kpps		Kpps		%
1683		1871		+11

In the worst case scenario - each packet has a different
destination address - the performance delta is within noise
range.

v3 -&gt; v4:
 - re-enable hints for forward

v2 -&gt; v3:
 - really fix build (sic) and hint usage check
 - use fib4_has_custom_rules() helpers (David A.)
 - add ip_extract_route_hint() helper (Edward C.)
 - use prev skb as hint instead of copying data (Willem)

v1 -&gt; v2:
 - fix build issue with !CONFIG_IP_MULTIPLE_TABLES

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: do not call sublist_rcv on empty list</title>
<updated>2019-10-30T00:54:29Z</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2019-10-29T00:44:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=51210ad5a558dcc7511d0c083f5cd796077b4e4d'/>
<id>urn:sha1:51210ad5a558dcc7511d0c083f5cd796077b4e4d</id>
<content type='text'>
syzbot triggered struct net NULL deref in NF_HOOK_LIST:
RIP: 0010:NF_HOOK_LIST include/linux/netfilter.h:331 [inline]
RIP: 0010:ip6_sublist_rcv+0x5c9/0x930 net/ipv6/ip6_input.c:292
 ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:328
 __netif_receive_skb_list_ptype net/core/dev.c:5274 [inline]

Reason:
void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
                   struct net_device *orig_dev)
[..]
        list_for_each_entry_safe(skb, next, head, list) {
		/* iterates list */
                skb = ip6_rcv_core(skb, dev, net);
		/* ip6_rcv_core drops skb -&gt; NULL is returned */
                if (skb == NULL)
                        continue;
	[..]
	}
	/* sublist is empty -&gt; curr_net is NULL */
        ip6_sublist_rcv(&amp;sublist, curr_dev, curr_net);

Before the recent change NF_HOOK_LIST did a list iteration before
struct net deref, i.e. it was a no-op in the empty list case.

List iteration now happens after *net deref, causing crash.

Follow the same pattern as the ip(v6)_list_rcv loop and add a list_empty
test for the final sublist dispatch too.

Cc: Edward Cree &lt;ecree@solarflare.com&gt;
Reported-by: syzbot+c54f457cad330e57e967@syzkaller.appspotmail.com
Fixes: ca58fbe06c54 ("netfilter: add and use nf_hook_slow_list()")
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Tested-by: Leon Romanovsky &lt;leonro@mellanox.com&gt;
Tested-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
