<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ipv4, branch v3.5</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.5</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.5'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2012-07-18T16:01:12Z</updated>
<entry>
<title>cipso: don't follow a NULL pointer when setsockopt() is called</title>
<updated>2012-07-18T16:01:12Z</updated>
<author>
<name>Paul Moore</name>
<email>pmoore@redhat.com</email>
</author>
<published>2012-07-17T11:07:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=89d7ae34cdda4195809a5a987f697a517a2a3177'/>
<id>urn:sha1:89d7ae34cdda4195809a5a987f697a517a2a3177</id>
<content type='text'>
As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).

This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface.  In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.

A simple reproducer, kindly supplied by Lin Ming, although you must
have the CIPSO DOI #3 configure on the system first or you will be
caught early in cipso_v4_validate():

	#include &lt;sys/types.h&gt;
	#include &lt;sys/socket.h&gt;
	#include &lt;linux/ip.h&gt;
	#include &lt;linux/in.h&gt;
	#include &lt;string.h&gt;

	struct local_tag {
		char type;
		char length;
		char info[4];
	};

	struct cipso {
		char type;
		char length;
		char doi[4];
		struct local_tag local;
	};

	int main(int argc, char **argv)
	{
		int sockfd;
		struct cipso cipso = {
			.type = IPOPT_CIPSO,
			.length = sizeof(struct cipso),
			.local = {
				.type = 128,
				.length = sizeof(struct local_tag),
			},
		};

		memset(cipso.doi, 0, 4);
		cipso.doi[3] = 3;

		sockfd = socket(AF_INET, SOCK_DGRAM, 0);
		#define SOL_IP 0
		setsockopt(sockfd, SOL_IP, IP_OPTIONS,
			&amp;cipso, sizeof(struct cipso));

		return 0;
	}

CC: Lin Ming &lt;mlin@ss.pku.edu.cn&gt;
Reported-by: Alan Cox &lt;alan@lxorguk.ukuu.org.uk&gt;
Signed-off-by: Paul Moore &lt;pmoore@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>snmp: fix OutOctets counter to include forwarded datagrams</title>
<updated>2012-06-07T21:50:56Z</updated>
<author>
<name>Vincent Bernat</name>
<email>bernat@luffy.cx</email>
</author>
<published>2012-06-05T03:41:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2d8dbb04c63e5369988f008bc4df3359c01d8812'/>
<id>urn:sha1:2d8dbb04c63e5369988f008bc4df3359c01d8812</id>
<content type='text'>
RFC 4293 defines ipIfStatsOutOctets (similar definition for
ipSystemStatsOutOctets):

   The total number of octets in IP datagrams delivered to the lower
   layers for transmission.  Octets from datagrams counted in
   ipIfStatsOutTransmits MUST be counted here.

And ipIfStatsOutTransmits:

   The total number of IP datagrams that this entity supplied to the
   lower layers for transmission.  This includes datagrams generated
   locally and those forwarded by this entity.

Therefore, IPSTATS_MIB_OUTOCTETS must be incremented when incrementing
IPSTATS_MIB_OUTFORWDATAGRAMS.

IP_UPD_PO_STATS is not used since ipIfStatsOutRequests must not
include forwarded datagrams:

   The total number of IP datagrams that local IP user-protocols
   (including ICMP) supplied to IP in requests for transmission.  Note
   that this counter does not include any datagrams counted in
   ipIfStatsOutForwDatagrams.

Signed-off-by: Vincent Bernat &lt;bernat@luffy.cx&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: fix a race in inetpeer_gc_worker()</title>
<updated>2012-06-06T17:45:15Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-06-05T03:00:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=55432d2b543a4b6dfae54f5c432a566877a85d90'/>
<id>urn:sha1:55432d2b543a4b6dfae54f5c432a566877a85d90</id>
<content type='text'>
commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with
the routing cache) added a race :

Before freeing an inetpeer, we must respect a RCU grace period, and make
sure no user will attempt to increase refcnt.

inetpeer_invalidate_tree() waits for a RCU grace period before inserting
inetpeer tree into gc_list and waking the worker. At that time, no
concurrent lookup can find a inetpeer in this tree.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Acked-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: reflect SYN queue_mapping into SYNACK packets</title>
<updated>2012-06-01T18:22:11Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-06-01T01:47:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fff3269907897ee91406ece125795f53e722677e'/>
<id>urn:sha1:fff3269907897ee91406ece125795f53e722677e</id>
<content type='text'>
While testing how linux behaves on SYNFLOOD attack on multiqueue device
(ixgbe), I found that SYNACK messages were dropped at Qdisc level
because we send them all on a single queue.

Obvious choice is to reflect incoming SYN packet @queue_mapping to
SYNACK packet.

Under stress, my machine could only send 25.000 SYNACK per second (for
200.000 incoming SYN per second). NIC : ixgbe with 16 rx/tx queues.

After patch, not a single SYNACK is dropped.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Hans Schillstrom &lt;hans.schillstrom@ericsson.com&gt;
Cc: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: do not create inetpeer on SYNACK message</title>
<updated>2012-06-01T18:22:11Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-05-31T21:00:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7433819a1eefd4e74711fffd6d54e30a644ef240'/>
<id>urn:sha1:7433819a1eefd4e74711fffd6d54e30a644ef240</id>
<content type='text'>
Another problem on SYNFLOOD/DDOS attack is the inetpeer cache getting
larger and larger, using lots of memory and cpu time.

tcp_v4_send_synack()
-&gt;inet_csk_route_req()
 -&gt;ip_route_output_flow()
  -&gt;rt_set_nexthop()
   -&gt;rt_init_metrics()
    -&gt;inet_getpeer( create = true)

This is a side effect of commit a4daad6b09230 (net: Pre-COW metrics for
TCP) added in 2.6.39

Possible solution :

Instruct inet_csk_route_req() to remove FLOWI_FLAG_PRECOW_METRICS

Before patch :

# grep peer /proc/slabinfo
inet_peer_cache   4175430 4175430    192   42    2 : tunables    0    0    0 : slabdata  99415  99415      0

Samples: 41K of event 'cycles', Event count (approx.): 30716565122
+  20,24%      ksoftirqd/0  [kernel.kallsyms]           [k] inet_getpeer
+   8,19%      ksoftirqd/0  [kernel.kallsyms]           [k] peer_avl_rebalance.isra.1
+   4,81%      ksoftirqd/0  [kernel.kallsyms]           [k] sha_transform
+   3,64%      ksoftirqd/0  [kernel.kallsyms]           [k] fib_table_lookup
+   2,36%      ksoftirqd/0  [ixgbe]                     [k] ixgbe_poll
+   2,16%      ksoftirqd/0  [kernel.kallsyms]           [k] __ip_route_output_key
+   2,11%      ksoftirqd/0  [kernel.kallsyms]           [k] kernel_map_pages
+   2,11%      ksoftirqd/0  [kernel.kallsyms]           [k] ip_route_input_common
+   2,01%      ksoftirqd/0  [kernel.kallsyms]           [k] __inet_lookup_established
+   1,83%      ksoftirqd/0  [kernel.kallsyms]           [k] md5_transform
+   1,75%      ksoftirqd/0  [kernel.kallsyms]           [k] check_leaf.isra.9
+   1,49%      ksoftirqd/0  [kernel.kallsyms]           [k] ipt_do_table
+   1,46%      ksoftirqd/0  [kernel.kallsyms]           [k] hrtimer_interrupt
+   1,45%      ksoftirqd/0  [kernel.kallsyms]           [k] kmem_cache_alloc
+   1,29%      ksoftirqd/0  [kernel.kallsyms]           [k] inet_csk_search_req
+   1,29%      ksoftirqd/0  [kernel.kallsyms]           [k] __netif_receive_skb
+   1,16%      ksoftirqd/0  [kernel.kallsyms]           [k] copy_user_generic_string
+   1,15%      ksoftirqd/0  [kernel.kallsyms]           [k] kmem_cache_free
+   1,02%      ksoftirqd/0  [kernel.kallsyms]           [k] tcp_make_synack
+   0,93%      ksoftirqd/0  [kernel.kallsyms]           [k] _raw_spin_lock_bh
+   0,87%      ksoftirqd/0  [kernel.kallsyms]           [k] __call_rcu
+   0,84%      ksoftirqd/0  [kernel.kallsyms]           [k] rt_garbage_collect
+   0,84%      ksoftirqd/0  [kernel.kallsyms]           [k] fib_rules_lookup

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Hans Schillstrom &lt;hans.schillstrom@ericsson.com&gt;
Cc: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2012-05-31T17:32:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-05-31T17:32:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=13199a0845729492fc51d1ba87938cdfe341b141'/>
<id>urn:sha1:13199a0845729492fc51d1ba87938cdfe341b141</id>
<content type='text'>
Pull networking changes from David S. Miller:

 1) Fix IPSEC header length calculation for transport mode in ESP.  The
    issue is whether to do the calculation before or after alignment.
    Fix from Benjamin Poirier.

 2) Fix regression in IPV6 IPSEC fragment length calculations, from Gao
    Feng.  This is another transport vs tunnel mode issue.

 3) Handle AF_UNSPEC connect()s properly in L2TP to avoid OOPSes.  Fix
    from James Chapman.

 4) Fix USB ASIX driver's reception of full sized VLAN packets, from
    Eric Dumazet.

 5) Allow drop monitor (and, more generically, all generic netlink
    protocols) to be automatically loaded as a module.  From Neil
    Horman.

Fix up trivial conflict in Documentation/feature-removal-schedule.txt
due to new entries added next to each other at the end. As usual.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
  net/smsc911x: Repair broken failure paths
  virtio-net: remove useless disable on freeze
  netdevice: Update netif_dbg for CONFIG_DYNAMIC_DEBUG
  drop_monitor: Add module alias to enable automatic module loading
  genetlink: Build a generic netlink family module alias
  net: add MODULE_ALIAS_NET_PF_PROTO_NAME
  r6040: Do a Proper deinit at errorpath and also when driver unloads (calling r6040_remove_one)
  r6040: disable pci device if the subsequent calls (after pci_enable_device) fails
  skb: avoid unnecessary reallocations in __skb_cow
  net: sh_eth: fix the rxdesc pointer when rx descriptor empty happens
  asix: allow full size 8021Q frames to be received
  rds_rdma: don't assume infiniband device is PCI
  l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case
  mac80211: fix ADDBA declined after suspend with wowlan
  wlcore: fix undefined symbols when CONFIG_PM is not defined
  mac80211: fix flag check for QoS NOACK frames
  ath9k_hw: apply internal regulator settings on AR933x
  ath9k_hw: update AR933x initvals to fix issues with high power devices
  ath9k: fix a use-after-free-bug when ath_tx_setup_buffer() fails
  ath9k: stop rx dma before stopping tx
  ...
</content>
</entry>
<entry>
<title>memcg: decrement static keys at real destroy time</title>
<updated>2012-05-29T23:22:28Z</updated>
<author>
<name>Glauber Costa</name>
<email>glommer@parallels.com</email>
</author>
<published>2012-05-29T22:07:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3f134619393cb6c6dfab7890a617d0ceca6d05d7'/>
<id>urn:sha1:3f134619393cb6c6dfab7890a617d0ceca6d05d7</id>
<content type='text'>
We call the destroy function when a cgroup starts to be removed, such as
by a rmdir event.

However, because of our reference counters, some objects are still
inflight.  Right now, we are decrementing the static_keys at destroy()
time, meaning that if we get rid of the last static_key reference, some
objects will still have charges, but the code to properly uncharge them
won't be run.

This becomes a problem specially if it is ever enabled again, because now
new charges will be added to the staled charges making keeping it pretty
much impossible.

We just need to be careful with the static branch activation: since there
is no particular preferred order of their activation, we need to make sure
that we only start using it after all call sites are active.  This is
achieved by having a per-memcg flag that is only updated after
static_key_slow_inc() returns.  At this time, we are sure all sites are
active.

This is made per-memcg, not global, for a reason: it also has the effect
of making socket accounting more consistent.  The first memcg to be
limited will trigger static_key() activation, therefore, accounting.  But
all the others will then be accounted no matter what.  After this patch,
only limited memcgs will have its sockets accounted.

[akpm@linux-foundation.org: move enum sock_flag_bits into sock.h,
                            document enum sock_flag_bits,
                            convert memcg_proto_active() and memcg_proto_activated() to test_bit(),
                            redo tcp_update_limit() comment to 80 cols]
Signed-off-by: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Li Zefan &lt;lizefan@huawei.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>xfrm: take net hdr len into account for esp payload size calculation</title>
<updated>2012-05-27T05:08:29Z</updated>
<author>
<name>Benjamin Poirier</name>
<email>bpoirier@suse.de</email>
</author>
<published>2012-05-24T11:32:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=91657eafb64b4cb53ec3a2fbc4afc3497f735788'/>
<id>urn:sha1:91657eafb64b4cb53ec3a2fbc4afc3497f735788</id>
<content type='text'>
Corrects the function that determines the esp payload size. The calculations
done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for
certain mtu values and suboptimal frames for others.

According to what is done, mainly in esp{,6}_output() and tcp_mtu_to_mss(),
net_header_len must be taken into account before doing the alignment
calculation.

Signed-off-by: Benjamin Poirier &lt;bpoirier@suse.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2012-05-24T18:54:29Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-05-24T18:54:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=28f3d717618156c0dcd2f497d791b578a7931d87'/>
<id>urn:sha1:28f3d717618156c0dcd2f497d791b578a7931d87</id>
<content type='text'>
Pull more networking updates from David Miller:
 "Ok, everything from here on out will be bug fixes."

1) One final sync of wireless and bluetooth stuff from John Linville.
   These changes have all been in his tree for more than a week, and
   therefore have had the necessary -next exposure.  John was just away
   on a trip and didn't have a change to send the pull request until a
   day or two ago.

2) Put back some defines in user exposed header file areas that were
   removed during the tokenring purge.  From Stephen Hemminger and Paul
   Gortmaker.

3) A bug fix for UDP hash table allocation got lost in the pile due to
   one of those "you got it..  no I've got it.." situations.  :-)

   From Tim Bird.

4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll
   try to coalesce overlapping frags and crash.  Fix from Eric Dumazet.

5) RCU routing table lookups can race with free_fib_info(), causing
   crashes when we deref the device pointers in the route.  Fix by
   releasing the net device in the RCU callback.  From Yanmin Zhang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits)
  tcp: take care of overlaps in tcp_try_coalesce()
  ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
  mm: add a low limit to alloc_large_system_hash
  ipx: restore token ring define to include/linux/ipx.h
  if: restore token ring ARP type to header
  xen: do not disable netfront in dom0
  phy/micrel: Fix ID of KSZ9021
  mISDN: Add X-Tensions USB ISDN TA XC-525
  gianfar:don't add FCB length to hard_header_len
  Bluetooth: Report proper error number in disconnection
  Bluetooth: Create flags for bt_sk()
  Bluetooth: report the right security level in getsockopt
  Bluetooth: Lock the L2CAP channel when sending
  Bluetooth: Restore locking semantics when looking up L2CAP channels
  Bluetooth: Fix a redundant and problematic incoming MTU check
  Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C
  Bluetooth: Fix EIR data generation for mgmt_device_found
  Bluetooth: Fix Inquiry with RSSI event mask
  Bluetooth: improve readability of l2cap_seq_list code
  Bluetooth: Fix skb length calculation
  ...
</content>
</entry>
<entry>
<title>tcp: take care of overlaps in tcp_try_coalesce()</title>
<updated>2012-05-24T04:28:21Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-05-23T17:51:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1ca7ee30630e1022dbcf1b51be20580815ffab73'/>
<id>urn:sha1:1ca7ee30630e1022dbcf1b51be20580815ffab73</id>
<content type='text'>
Sergio Correia reported following warning :

WARNING: at net/ipv4/tcp.c:1301 tcp_cleanup_rbuf+0x4f/0x110()

WARN(skb &amp;&amp; !before(tp-&gt;copied_seq, TCP_SKB_CB(skb)-&gt;end_seq),
     "cleanup rbuf bug: copied %X seq %X rcvnxt %X\n",
     tp-&gt;copied_seq, TCP_SKB_CB(skb)-&gt;end_seq, tp-&gt;rcv_nxt);

It appears TCP coalescing, and more specifically commit b081f85c297
(net: implement tcp coalescing in tcp_queue_rcv()) should take care of
possible segment overlaps in receive queue. This was properly done in
the case of out_or_order_queue by the caller.

For example, segment at tail of queue have sequence 1000-2000, and we
add a segment with sequence 1500-2500.
This can happen in case of retransmits.

In this case, just don't do the coalescing.

Reported-by: Sergio Correia &lt;lists@uece.net&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Tested-by: Sergio Correia &lt;lists@uece.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
