linux/net, branch v3.5

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

2012-07-19T23:11:28Z

Pull last minute Ceph fixes from Sage Weil:

 "The important one fixes a bug in the socket failure handling behavior that was turned up in some recent failure injection testing. The other two are minor bug fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: endian bug in rbd_req_cb()
  rbd: Fix ceph_snap_context size calculation
  libceph: fix messenger retry

net: Statically initialize init_net.dev_base_head

2012-07-18T20:32:27Z

This change eliminates an initialization-order hazard most recently seen when netprio_cgroup is built into the kernel. With thanks to Eric Dumazet for catching a bug. Signed-off-by: Mark Rustad Acked-by: Eric Dumazet Signed-off-by: David S. Miller
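As an illustration of why static initialization removes the ordering hazard, here is a minimal userspace sketch. The `struct list_head` and `LIST_HEAD_INIT` definitions mirror the kernel's, but `struct net_sketch` and `init_net_sketch` are hypothetical stand-ins, not the real `struct net` or `init_net`.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same shape as the kernel macro: an empty list points at itself. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Hypothetical, cut-down namespace object (the real struct net is much larger). */
struct net_sketch {
	struct list_head dev_base_head;
};

/*
 * The head is filled in at build time, so it is already a valid empty
 * list before any initcall runs, whatever order the initcalls use.
 */
struct net_sketch init_net_sketch = {
	.dev_base_head = LIST_HEAD_INIT(init_net_sketch.dev_base_head),
};

/* Returns nonzero when the list has no entries. */
int list_empty(const struct list_head *head)
{
	assert(head != NULL);
	return head->next == head;
}
```

With a runtime initializer instead, any code that walked the list before that initializer ran would see NULL pointers rather than an empty list.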

cipso: don't follow a NULL pointer when setsockopt() is called

2012-07-18T16:01:12Z

As reported by Alan Cox, and verified by Lin Ming, when a user attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL tag the kernel dies a terrible death when it attempts to follow a NULL pointer (the skb argument to cipso_v4_validate() is NULL when called via the setsockopt() syscall).

This patch fixes this by first checking to ensure that the skb is non-NULL before using it to find the incoming network interface. In the unlikely case where the skb is NULL and the user attempts to add a CIPSO option with the _TAG_LOCAL tag we return an error as this is not something we want to allow.

A simple reproducer, kindly supplied by Lin Ming, although you must have the CIPSO DOI #3 configured on the system first or you will be caught early in cipso_v4_validate():

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <linux/ip.h>
	#include <linux/in.h>
	#include <string.h>

	struct local_tag {
		char type;
		char length;
		char info[4];
	};

	struct cipso {
		char type;
		char length;
		char doi[4];
		struct local_tag local;
	};

	int main(int argc, char **argv)
	{
		int sockfd;
		struct cipso cipso = {
			.type = IPOPT_CIPSO,
			.length = sizeof(struct cipso),
			.local = {
				.type = 128,
				.length = sizeof(struct local_tag),
			},
		};

		memset(cipso.doi, 0, 4);
		cipso.doi[3] = 3;

		sockfd = socket(AF_INET, SOCK_DGRAM, 0);
		#define SOL_IP 0
		setsockopt(sockfd, SOL_IP, IP_OPTIONS,
			&cipso, sizeof(struct cipso));

		return 0;
	}

CC: Lin Ming Reported-by: Alan Cox Signed-off-by: Paul Moore Signed-off-by: David S. Miller
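The check described above can be sketched in userspace roughly as follows. All `sketch_*` names and the `-EINVAL` return value are assumptions made for illustration; the real cipso_v4_validate() works on `struct sk_buff` and reports errors differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_IFF_LOOPBACK 0x8u /* same bit value as IFF_LOOPBACK */

/* Hypothetical stand-ins for struct net_device and struct sk_buff. */
struct sketch_dev { unsigned int flags; };
struct sketch_skb { struct sketch_dev *dev; };

/*
 * Decide whether a local tag is acceptable. A NULL skb models the
 * setsockopt() path, where no packet (and so no incoming interface)
 * exists. Returns 0 when allowed, -EINVAL otherwise.
 */
int sketch_local_tag_check(const struct sketch_skb *skb)
{
	/* Assumption: a non-NULL skb always carries its receiving device. */
	assert(skb == NULL || skb->dev != NULL);

	/* Test the pointer first; only then is it safe to look at skb->dev. */
	if (skb == NULL || !(skb->dev->flags & SKETCH_IFF_LOOPBACK))
		return -EINVAL;
	return 0;
}
```

Because `||` short-circuits, `skb->dev` is never evaluated when `skb` is NULL, which is the property the fix relies on.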

libceph: fix messenger retry

2012-07-18T02:35:59Z

In ancient times, the messenger could both initiate and accept connections. An artifact of that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones.

Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retry with the same connect_seq as last time. This bug is pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io.

Signed-off-by: Sage Weil

Merge branch 'master' of git://1984.lsi.us.es/nf

2012-07-17T10:19:33Z

Pablo Neira Ayuso says:

====================
I know that we're in fairly late stage to request pulls, but the IPVS people pinged me with little patches with oops fixes last week.

One of them was recently introduced (during the 3.4 development cycle) while cleaning up the IPVS netns support. They are:

* Fix one regression introduced in 3.4 while cleaning up the netns support for IPVS, from Julian Anastasov.

* Fix one oops triggered due to resetting the conntrack attached to the skb instead of just putting it in the forward hook, from Lin Ming. This problem seems to be there since 2.6.37 according to Simon Horman.
====================

Signed-off-by: David S. Miller

ipvs: fix oops in ip_vs_dst_event on rmmod

2012-07-17T10:00:58Z

After commit 39f618b4fd95ae243d940ec64c961009c74e3333 (3.4) "ipvs: reset ipvs pointer in netns" we can oops in ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup is called after the ipvs_core_ops subsys is unregistered and net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event if ipvs is NULL. It is safe because all services and dests for the net are already freed. Signed-off-by: Julian Anastasov Signed-off-by: Simon Horman Signed-off-by: Pablo Neira Ayuso

ax25: Fix missing break

2012-07-17T06:22:36Z

At least there seems to be no reason to disallow ROSE sockets when NETROM is loaded. Signed-off-by: Alan Cox Signed-off-by: David S. Miller

caif: Fix access to freed pernet memory

2012-07-17T06:06:20Z

unregister_netdevice_notifier() must be called before unregister_pernet_subsys() to avoid accessing already freed pernet memory. This fixes the following oops when doing rmmod:

Call Trace:
[] caif_device_notify+0x4d/0x5a0 [caif]
[] unregister_netdevice_notifier+0xb9/0x100
[] caif_device_exit+0x1c/0x250 [caif]
[] sys_delete_module+0x1a4/0x300
[] ? trace_hardirqs_on_caller+0x15d/0x1e0
[] ? trace_hardirqs_on_thunk+0x3a/0x3
[] system_call_fastpath+0x1a/0x1f
RIP [] caif_get+0x51/0xb0 [caif]

Signed-off-by: Sjur Brændeland Acked-by: "Eric W. Biederman" Signed-off-by: David S. Miller
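The ordering rule can be modeled in a few lines of userspace C. Every `sketch_*` name here is hypothetical; the only behavior taken from the report is that unregistering the notifier invokes the callback one last time, as the call trace shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the module's teardown-relevant state. */
struct sketch_module {
	int pernet_alive;        /* 1 while per-namespace memory is valid */
	int notifier_registered; /* 1 while the device notifier is installed */
	int stale_accesses;      /* callbacks that ran after pernet was freed */
};

/* The notifier callback needs per-namespace state to do its work. */
static void sketch_device_notify(struct sketch_module *m)
{
	if (!m->pernet_alive)
		m->stale_accesses++; /* in the kernel this is a use-after-free */
}

void sketch_load(struct sketch_module *m)
{
	assert(m != NULL);
	m->pernet_alive = 1;
	m->notifier_registered = 1;
	m->stale_accesses = 0;
}

/* Unregistering replays a final event to the callback. */
static void sketch_unregister_notifier(struct sketch_module *m)
{
	if (m->notifier_registered) {
		sketch_device_notify(m);
		m->notifier_registered = 0;
	}
}

static void sketch_unregister_pernet(struct sketch_module *m)
{
	m->pernet_alive = 0;
}

/* Fixed order: notifier first, then per-namespace state. */
int sketch_exit_fixed(struct sketch_module *m)
{
	sketch_unregister_notifier(m);
	sketch_unregister_pernet(m);
	return m->stale_accesses;
}

/* Buggy order: per-namespace state is gone when the callback runs. */
int sketch_exit_buggy(struct sketch_module *m)
{
	sketch_unregister_pernet(m);
	sketch_unregister_notifier(m);
	return m->stale_accesses;
}
```

The general pattern is to tear things down in the reverse of their dependency order: anything that can still call into a subsystem goes away before the memory it relies on.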

net: cgroup: fix access the unallocated memory in netprio cgroup

2012-07-17T06:00:43Z

There are some out-of-bounds accesses in the netprio cgroup.

Currently, before accessing the dev->priomap.priomap array, we only check whether dev->priomap exists. Because we don't want additional bounds checks in the fast path, we should make sure that either dev->priomap is NULL or the size of the dev->priomap.priomap array is equal to max_prioidx + 1.

So in the write_priomap logic, we should call extend_netdev_table when dev->priomap is NULL or dev->priomap.priomap_len < max_len. In the cgrp_create->update_netdev_tables logic, we should call extend_netdev_table only when dev->priomap exists and dev->priomap.priomap_len < max_len.

There is no need to call update_netdev_tables in write_priomap; we only need to allocate the priomap of the net device that is being changed through net_prio.ifpriomap.

This patch also adds a return value to update_netdev_tables and extend_netdev_table, so that when allocating new_priomap fails, write_priomap stops accessing the priomap and returns -ENOMEM to userspace to tell the user what happened.

Changes from v3:
1. Add rtnl protection when reading max_prioidx in write_priomap.
2. Only call extend_netdev_table when map->priomap_len < max_len; this makes sure the size of the dev->map->priomap array is always bigger than any prioidx.
3. Add a function write_update_netdev_table to make the code clearer.

Changes from v2:
1. Protect extend_netdev_table with RTNL.
2. When extend_netdev_table fails, call dev_put to drop the device's refcount.

Signed-off-by: Gao feng Cc: Neil Horman Cc: Eric Dumazet Acked-by: Neil Horman Signed-off-by: David S. Miller
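The invariant described above (the map is either absent or large enough for every valid index) can be sketched in userspace as follows. The `sketch_*` names, the flexible-array layout, and the `-EINVAL` check are assumptions for illustration; locking (RTNL/RCU) is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-device priority map. */
struct sketch_priomap {
	unsigned int len;    /* number of usable entries in prio[] */
	unsigned int prio[]; /* indexed by cgroup prioidx */
};

/*
 * Make sure *mapp holds at least max_len entries. Returns 0 on success
 * and -ENOMEM if the larger table cannot be allocated, in which case
 * the old table is left untouched.
 */
int sketch_extend_table(struct sketch_priomap **mapp, unsigned int max_len)
{
	struct sketch_priomap *old, *new_map;

	assert(mapp != NULL);
	old = *mapp;
	if (old && old->len >= max_len)
		return 0; /* already big enough */

	new_map = calloc(1, sizeof(*new_map) + max_len * sizeof(new_map->prio[0]));
	if (!new_map)
		return -ENOMEM;

	if (old)
		memcpy(new_map->prio, old->prio, old->len * sizeof(old->prio[0]));
	new_map->len = max_len;
	free(old);
	*mapp = new_map;
	return 0;
}

/* Write path: extend when the map is NULL or too short, then store. */
int sketch_write_prio(struct sketch_priomap **mapp, unsigned int idx,
		      unsigned int max_prioidx, unsigned int prio)
{
	int ret;

	if (idx > max_prioidx)
		return -EINVAL; /* assumption: callers never exceed max_prioidx */

	ret = sketch_extend_table(mapp, max_prioidx + 1);
	if (ret)
		return ret; /* do not touch the map after a failed allocation */

	(*mapp)->prio[idx] = prio;
	return 0;
}
```

Checking the size once on the slow write path is what lets the fast path index the array after only a NULL test.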

sctp: Fix list corruption resulting from freeing an association on a list

2012-07-17T05:32:26Z

A few days ago Dave Jones reported this oops:

[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140] ffff880147c03a10
[22766.387140] ffffffffa169f2b6
[22766.387140] ffff88013ed95728
[22766.387143] 0000000000000002
[22766.387143] 0000000000000000
[22766.387143] ffff880003fad062
[22766.387144] ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145]
[22766.387150] [] ? __sctp_lookup_association+0x62/0xd0 [sctp]
[22766.387154] [] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157] [] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161] [] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163] [] ? nf_hook_slow+0x133/0x210
[22766.387166] [] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168] [] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169] [] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171] [] ip_local_deliver+0x47/0x80
[22766.387172] [] ip_rcv_finish+0x150/0x680
[22766.387174] [] ip_rcv+0x214/0x320
[22766.387176] [] __netif_receive_skb+0x7b7/0x910
[22766.387178] [] ? __netif_receive_skb+0x11c/0x910
[22766.387180] [] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182] [] netif_receive_skb+0x23/0x1f0
[22766.387183] [] ? dev_gro_receive+0x139/0x440
[22766.387185] [] napi_skb_finish+0x70/0xa0
[22766.387187] [] napi_gro_receive+0xf5/0x130
[22766.387218] [] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242] [] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266] [] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268] [] net_rx_action+0x1aa/0x3d0
[22766.387270] [] ? account_system_vtime+0x10f/0x130
[22766.387273] [] __do_softirq+0xe0/0x420
[22766.387275] [] call_softirq+0x1c/0x30
[22766.387278] [] do_softirq+0xd5/0x110
[22766.387279] [] irq_exit+0xd5/0xe0
[22766.387281] [] do_IRQ+0x63/0xd0
[22766.387283] [] common_interrupt+0x6f/0x6f
[22766.387283]
[22766.387284]
[22766.387285] [] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311] [] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311] RSP
[22766.387142] ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt

It appears from his analysis and some staring at the code that this is likely occurring because an association is getting freed while still on the sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable while a freed node corrupts part of the list.

Nominally I would think that an imbalanced refcount was responsible for this, but I can't seem to find any obvious imbalance. What I did note however was that the two places where we create an association using sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg) have failure paths which free a newly created association after calling sctp_primitive_ASSOCIATE. sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to the aforementioned hash table. The sctp command interpreter that processes side effects has no way to unwind previously processed commands, so freeing the association from the __sctp_connect or sctp_sendmsg error path would lead to a freed association remaining on this hash table.

I've fixed this by modifying sctp_[un]hash_established to use hlist_del_init, which allows us to properly use hlist_unhashed to check if the node is on a hashlist safely during a delete. That in turn allows us to safely call sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths before freeing them, regardless of what the association's state is on the hash list.

I noted, while I was doing this, that the __sctp_unhash_endpoint was using hlist_unhashed in a similar fashion, but never nullified any removed node's pointers to make that function work properly, so I fixed that up in a similar fashion.

I attempted to test this using a virtual guest running the SCTP_RR test from netperf and the trinity fuzzer, both in a loop. I wasn't able to recreate the problem prior to this fix, nor was I able to trigger the failure after (neither of which I suppose is surprising). Given the trace above however, I think it's likely that this is what we hit.

Signed-off-by: Neil Horman Reported-by: davej@redhat.com CC: davej@redhat.com CC: "David S. Miller" CC: Vlad Yasevich CC: Sridhar Samudrala CC: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller
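The property the fix depends on, that a node removed with hlist_del_init can be tested with hlist_unhashed and removed again harmlessly, can be demonstrated with a userspace reimplementation. The function names below mirror the kernel's hlist helpers, but this is a simplified sketch written for illustration, not the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copies of the kernel's hash-list node and head layouts. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

/* Mark a node as "not on any list". */
void hlist_node_init(struct hlist_node *n)
{
	assert(n != NULL);
	n->next = NULL;
	n->pprev = NULL;
}

/* A node is unhashed exactly when its back-pointer is NULL. */
int hlist_unhashed(const struct hlist_node *n)
{
	return !n->pprev;
}

void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Unlink the node if it is on a list, then reset its pointers so that
 * hlist_unhashed() reports the truth and a second call is a no-op.
 */
void hlist_del_init(struct hlist_node *n)
{
	if (!hlist_unhashed(n)) {
		struct hlist_node *next = n->next;
		struct hlist_node **pprev = n->pprev;

		*pprev = next;
		if (next)
			next->pprev = pprev;
		hlist_node_init(n);
	}
}
```

An error path can therefore call the unhash routine unconditionally before freeing an object, whether or not the object ever made it onto the hash table.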