<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4/ip_input.c, branch v4.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.1</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2015-04-07T19:25:55Z</updated>
<entry>
<title>netfilter: Pass socket pointer down through okfn().</title>
<updated>2015-04-07T19:25:55Z</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-04-06T02:19:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7026b1ddb6b8d4e6ee33dc2bd06c0ca8746fa7ab'/>
<id>urn:sha1:7026b1ddb6b8d4e6ee33dc2bd06c0ca8746fa7ab</id>
<content type='text'>
On the output paths in particular, we have to sometimes deal with two
socket contexts.  First, and usually skb-&gt;sk, is the local socket that
generated the frame.

And second, is potentially the socket used to control a tunneling
socket, such as one the encapsulates using UDP.

We do not want to disassociate skb-&gt;sk when encapsulating in order
to fix this, because that would break socket memory accounting.

The most extreme case where this can cause huge problems is an
AF_PACKET socket transmitting over a vxlan device.  We hit code
paths doing checks that assume they are dealing with an ipv4
socket, but are actually operating upon the AF_PACKET one.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: coding style: comparison for inequality with NULL</title>
<updated>2015-04-03T16:11:15Z</updated>
<author>
<name>Ian Morris</name>
<email>ipm@chirality.org.uk</email>
</author>
<published>2015-04-03T08:17:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=00db41243e8d5032c2e0f5bf6063bb19324bfdb3'/>
<id>urn:sha1:00db41243e8d5032c2e0f5bf6063bb19324bfdb3</id>
<content type='text'>
The ipv4 code uses a mixture of coding styles. In some instances check
for non-NULL pointer is done as x != NULL and sometimes as x. x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.

Signed-off-by: Ian Morris &lt;ipm@chirality.org.uk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: coding style: comparison for equality with NULL</title>
<updated>2015-04-03T16:11:15Z</updated>
<author>
<name>Ian Morris</name>
<email>ipm@chirality.org.uk</email>
</author>
<published>2015-04-03T08:17:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=51456b2914a34d16b1255b7c55d5cbf6a681d306'/>
<id>urn:sha1:51456b2914a34d16b1255b7c55d5cbf6a681d306</id>
<content type='text'>
The ipv4 code uses a mixture of coding styles. In some instances check
for NULL pointer is done as x == NULL and sometimes as !x. !x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.

Signed-off-by: Ian Morris &lt;ipm@chirality.org.uk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Fix memory leak if TPROXY used with TCP early demux</title>
<updated>2014-01-28T00:22:11Z</updated>
<author>
<name>Holger Eitzenberger</name>
<email>holger@eitzenberger.org</email>
</author>
<published>2014-01-27T09:33:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a452ce345d63ddf92cd101e4196569f8718ad319'/>
<id>urn:sha1:a452ce345d63ddf92cd101e4196569f8718ad319</id>
<content type='text'>
I see a memory leak when using a transparent HTTP proxy using TPROXY
together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):

unreferenced object 0xffff88008cba4a40 (size 1696):
  comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
  hex dump (first 32 bytes):
    0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
    02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffffff810b710a&gt;] kmem_cache_alloc+0xad/0xb9
    [&lt;ffffffff81270185&gt;] sk_prot_alloc+0x29/0xc5
    [&lt;ffffffff812702cf&gt;] sk_clone_lock+0x14/0x283
    [&lt;ffffffff812aaf3a&gt;] inet_csk_clone_lock+0xf/0x7b
    [&lt;ffffffff8129a893&gt;] netlink_broadcast+0x14/0x16
    [&lt;ffffffff812c1573&gt;] tcp_create_openreq_child+0x1b/0x4c3
    [&lt;ffffffff812c033e&gt;] tcp_v4_syn_recv_sock+0x38/0x25d
    [&lt;ffffffff812c13e4&gt;] tcp_check_req+0x25c/0x3d0
    [&lt;ffffffff812bf87a&gt;] tcp_v4_do_rcv+0x287/0x40e
    [&lt;ffffffff812a08a7&gt;] ip_route_input_noref+0x843/0xa55
    [&lt;ffffffff812bfeca&gt;] tcp_v4_rcv+0x4c9/0x725
    [&lt;ffffffff812a26f4&gt;] ip_local_deliver_finish+0xe9/0x154
    [&lt;ffffffff8127a927&gt;] __netif_receive_skb+0x4b2/0x514
    [&lt;ffffffff8127aa77&gt;] process_backlog+0xee/0x1c5
    [&lt;ffffffff8127c949&gt;] net_rx_action+0xa7/0x200
    [&lt;ffffffff81209d86&gt;] add_interrupt_randomness+0x39/0x157

But there are many more, resulting in the machine going OOM after some
days.

From looking at the TPROXY code, and with help from Florian, I see
that the memory leak is introduced in tcp_v4_early_demux():

  void tcp_v4_early_demux(struct sk_buff *skb)
  {
    /* ... */

    iph = ip_hdr(skb);
    th = tcp_hdr(skb);

    if (th-&gt;doff &lt; sizeof(struct tcphdr) / 4)
        return;

    sk = __inet_lookup_established(dev_net(skb-&gt;dev), &amp;tcp_hashinfo,
                       iph-&gt;saddr, th-&gt;source,
                       iph-&gt;daddr, ntohs(th-&gt;dest),
                       skb-&gt;skb_iif);
    if (sk) {
        skb-&gt;sk = sk;

where the socket is assigned unconditionally to skb-&gt;sk, also bumping
the refcnt on it.  This is problematic, because in our case the skb
has already a socket assigned in the TPROXY target.  This then results
in the leak I see.

The very same issue seems to be with IPv6, but haven't tested.

Reviewed-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Holger Eitzenberger &lt;holger@eitzenberger.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: add SNMP counters tracking incoming ECN bits</title>
<updated>2013-08-09T05:24:59Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-08-06T10:32:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1f07d03e2069df2ecd82301936b598a3b257c6d6'/>
<id>urn:sha1:1f07d03e2069df2ecd82301936b598a3b257c6d6</id>
<content type='text'>
With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP
counters do not count the number of frames, but number of aggregated
segments.

Its probably too late to change this now.

This patch adds four new counters, tracking number of frames, regardless
of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.

Ip[6]NoECTPkts : Number of packets received with NOECT
Ip[6]ECT1Pkts  : Number of packets received with ECT(1)
Ip[6]ECT0Pkts  : Number of packets received with ECT(0)
Ip[6]CEPkts    : Number of packets received with Congestion Experienced

lph37:~# nstat | egrep "Pkts|InReceive"
IpInReceives                    1634137            0.0
Ip6InReceives                   3714107            0.0
Ip6InNoECTPkts                  19205              0.0
Ip6InECT0Pkts                   52651828           0.0
IpExtInNoECTPkts                33630              0.0
IpExtInECT0Pkts                 15581379           0.0
IpExtInCEPkts                   6                  0.0

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: set transport header earlier</title>
<updated>2013-07-16T19:59:28Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-07-16T03:03:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=21d1196a35f5686c4323e42a62fdb4b23b0ab4a3'/>
<id>urn:sha1:21d1196a35f5686c4323e42a62fdb4b23b0ab4a3</id>
<content type='text'>
commit 45f00f99d6e ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
performance regression for non GRO traffic, basically disabling
IP early demux.

IPv6 stack resets transport header in ip6_rcv() before calling
IP early demux in ip6_rcv_finish(), while IPv4 does this only in
ip_local_deliver_finish(), _after_ IP early demux.

GRO traffic happened to enable IP early demux because transport header
is also set in inet_gro_receive()

Instead of reverting the faulty commit, we can make IPv4/IPv6 behave the
same : transport_header should be set in ip_rcv() instead of
ip_local_deliver_finish()

ip_local_deliver_finish() can also use skb_network_header_len() which is
faster than ip_hdrlen()

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Add MIB counters for checksum errors</title>
<updated>2013-04-29T19:14:03Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-04-29T08:39:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6a5dc9e598fe90160fee7de098fa319665f5253e'/>
<id>urn:sha1:6a5dc9e598fe90160fee7de098fa319665f5253e</id>
<content type='text'>
Add MIB counters for checksum errors in IP layer,
and TCP/UDP/ICMP layers, to help diagnose problems.

$ nstat -a | grep  Csum
IcmpInCsumErrors                72                 0.0
TcpInCsumErrors                 382                0.0
UdpInCsumErrors                 463221             0.0
Icmp6InCsumErrors               75                 0.0
Udp6InCsumErrors                173442             0.0
IpExtInCsumErrors               10884              0.0

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv[4|6]: correct dropwatch false positive in local_deliver_finish</title>
<updated>2013-03-01T20:56:29Z</updated>
<author>
<name>Neil Horman</name>
<email>nhorman@tuxdriver.com</email>
</author>
<published>2013-03-01T07:44:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d8c6f4b9b7848bca8babfc0ae43a50c8ab22fbb9'/>
<id>urn:sha1:d8c6f4b9b7848bca8babfc0ae43a50c8ab22fbb9</id>
<content type='text'>
I had a report recently of a user trying to use dropwatch to localise some frame
loss, and they were getting false positives.  Turned out they were using a user
space SCTP stack that used raw sockets to grab frames.  When we don't have a
registered protocol for a given packet, we record it as a drop, even if a raw
socket receieves the frame.  We should only record the drop in the event a raw
socket doesnt exist to receive the frames

Tested by the reported successfully

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Reported-by: William Reich &lt;reich@ulticom.com&gt;
Tested-by: William Reich &lt;reich@ulticom.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: William Reich &lt;reich@ulticom.com&gt;
CC: eric.dumazet@gmail.com
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: Disallow non-namespace aware protocols to register.</title>
<updated>2013-02-05T19:42:23Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2013-02-05T19:42:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=547472b8e1da72ae226430c0c4273e36fc8ca768'/>
<id>urn:sha1:547472b8e1da72ae226430c0c4273e36fc8ca768</id>
<content type='text'>
All in-tree ipv4 protocol implementations are now namespace
aware.  Therefore all the run-time checks are superfluous.

Reject registry of any non-namespace aware ipv4 protocol.
Eventually we'll remove prot-&gt;netns_ok and this registry
time check as well.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: TCP early demux cleanup</title>
<updated>2012-07-30T21:53:21Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-07-29T21:06:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cca32e4bf999a34ac08d959f351f2b30bcd02460'/>
<id>urn:sha1:cca32e4bf999a34ac08d959f351f2b30bcd02460</id>
<content type='text'>
early_demux() handlers should be called in RCU context, and as we
use skb_dst_set_noref(skb, dst), caller must not exit from RCU context
before dst use (skb_dst(skb)) or release (skb_drop(dst))

Therefore, rcu_read_lock()/rcu_read_unlock() pairs around
-&gt;early_demux() are confusing and not needed :

Protocol handlers are already in an RCU read lock section.
(__netif_receive_skb() does the rcu_read_lock() )

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
