summaryrefslogtreecommitdiffstats
path: root/include/net
AgeCommit message (Collapse)AuthorLines
2016-02-11inet: create IPv6-equivalent inet_hash functionCraig Gallek-0/+2
In order to support fast lookups for TCP sockets with SO_REUSEPORT, the function that adds sockets to the listening hash set needs to be able to check receive address equality. Since this equality check is different for IPv4 and IPv6, we will need two different socket hashing functions. This patch adds inet6_hash identical to the existing inet_hash function and updates the appropriate references. A following patch will differentiate the two by passing different comparison functions to __inet_hash. Additionally, in order to use the IPv6 address equality function from inet6_hashtables (which is compiled as a built-in object when IPv6 is enabled) it also needs to be in a built-in object file as well. This moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11sock: struct proto hash function may errorCraig Gallek-8/+9
In order to support fast reuseport lookups in TCP, the hash function defined in struct proto must be capable of returning an error code. This patch changes the function signature of all related hash functions to return an integer and handles or propagates this return value at all call sites. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-10vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devicesDavid Wragg-0/+1
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default MTU values are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg <david@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09bonding: 3ad: apply ad_actor settings changes immediatelyNikolay Aleksandrov-0/+1
Currently the bonding allows to set ad_actor_system and prio while the bond device is down, but these are actually applied only if there aren't any slaves yet (applied to bond device when first slave shows up, and to slaves at 3ad bind time). After this patch changes are applied immediately and the new values can be used/seen after the bond's upped so it's not necessary anymore to release all and enslave again to see the changes. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09tcp: do not drop syn_recv on all icmp reportsEric Dumazet-1/+1
Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets were leading to RST. This is of course incorrect. A specific list of ICMP messages should be able to drop a SYN_RECV. For instance, a REDIRECT on SYN_RECV shall be ignored, as we do not hold a dst per SYN_RECV pseudo request. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751 Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Reported-by: Petr Novopashenniy <pety@rusnet.ru> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-08unix: correctly track in-flight fds in sending process user_structHannes Frederic Sowa-2/+3
The commit referenced in the Fixes tag incorrectly accounted the number of in-flight fds over a unix domain socket to the original opener of the file-descriptor. This allows another process to arbitrary deplete the original file-openers resource limit for the maximum of open files. Instead the sending processes and its struct cred should be credited. To do so, we add a reference counted struct user_struct pointer to the scm_fp_list and use it to account for the number of inflight unix fds. Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets") Reported-by: David Herrmann <dh.herrmann@gmail.com> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07ipv4: Namespaceify tcp_notsent_lowat sysctl knobNikolay Borisov-2/+3
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07ipv4: Namespaceify tcp_fin_timeout sysctl knobNikolay Borisov-2/+2
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07ipv4: Namespaceify tcp_orphan_retries sysctl knobNikolay Borisov-1/+1
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07ipv4: Namespaceify tcp_retries2 sysctl knobNikolay Borisov-1/+1
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07ipv4: Namespaceify tcp_retries1 sysctl knobNikolay Borisov-1/+1
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07ipv4: Namespaceify tcp reordering sysctl knobNikolay Borisov-2/+4
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07ipv4: Namespaceify tcp syncookies sysctl knobNikolay Borisov-1/+2
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07ipv4: Namespaceify tcp synack retries sysctl knobNikolay Borisov-1/+1
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07ipv4: Namespaceify tcp syn retries sysctl knobNikolay Borisov-1/+2
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07vxlan: restructure vxlan.h definitionsJiri Benc-41/+63
RCO and GBP are VXLAN extensions, not specified in RFC 7348. Because of that, they need to be explicitly enabled when creating vxlan interface. By default, those extensions are not used and plain VXLAN header is sent and received. Reflect this in vxlan.h: first, the plain VXLAN header is defined. Following it, RCO is documented and defined, and likewise for GBP. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07vxlan: remove duplicated macrosJiri Benc-3/+0
VNI_HASH_BITS and VNI_HASH_SIZE are defined twice. Remove the extra definitions. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07vxlan: cleanup typesJiri Benc-7/+7
include/net/vxlan.h is a kernel header, no need to prefix fixed size types with double underscore. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06tcp: fastopen: call tcp_fin() if FIN present in SYNACKEric Dumazet-0/+1
When we acknowledge a FIN, it is not enough to ack the sequence number and queue the skb into receive queue. We also have to call tcp_fin() to properly update socket state and send proper poll() notifications. It seems we also had the problem if we received a SYN packet with the FIN flag set, but it does not seem an urgent issue, as no known implementation can do that. Fixes: 61d2bcae99f6 ("tcp: fastopen: accept data/FIN present in SYNACK message") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06tcp: fastopen: accept data/FIN present in SYNACK messageEric Dumazet-0/+1
RFC 7413 (TCP Fast Open) 4.2.2 states that the SYNACK message MAY include data and/or FIN This patch adds support for the client side : If we receive a SYNACK with payload or FIN, queue the skb instead of ignoring it. Since we already support the same for SYN, we refactor the existing code and reuse it. Note we need to clone the skb, so this operation might fail under memory pressure. Sara Dickinson pointed out FreeBSD server Fast Open implementation was planned to generate such SYNACK in the future. The server side might be implemented on linux later. Reported-by: Sara Dickinson <sara@sinodun.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds-24/+41
Pull networking fixes from David Miller: "This looks like a lot but it's a mixture of regression fixes as well as fixes for longer standing issues. 1) Fix on-channel cancellation in mac80211, from Johannes Berg. 2) Handle CHECKSUM_COMPLETE properly in xt_TCPMSS netfilter xtables module, from Eric Dumazet. 3) Avoid infinite loop in UDP SO_REUSEPORT logic, also from Eric Dumazet. 4) Avoid a NULL deref if we try to set SO_REUSEPORT after a socket is bound, from Craig Gallek. 5) GRO key comparisons don't take lightweight tunnels into account, from Jesse Gross. 6) Fix struct pid leak via SCM credentials in AF_UNIX, from Eric Dumazet. 7) We need to set the rtnl_link_ops of ipv6 SIT tunnels before we register them, otherwise the NEWLINK netlink message is missing the proper attributes. From Thadeu Lima de Souza Cascardo. 8) Several Spectrum chip bug fixes for mlxsw switch driver, from Ido Schimmel 9) Handle fragments properly in ipv4 easly socket demux, from Eric Dumazet. 10) Don't ignore the ifindex key specifier on ipv6 output route lookups, from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (128 commits) tcp: avoid cwnd undo after receiving ECN irda: fix a potential use-after-free in ircomm_param_request net: tg3: avoid uninitialized variable warning net: nb8800: avoid uninitialized variable warning net: vxge: avoid unused function warnings net: bgmac: clarify CONFIG_BCMA dependency net: hp100: remove unnecessary #ifdefs net: davinci_cpdma: use dma_addr_t for DMA address ipv6/udp: use sticky pktinfo egress ifindex on connect() ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() netlink: not trim skb for mmaped socket when dump vxlan: fix a out of bounds access in __vxlan_find_mac net: dsa: mv88e6xxx: fix port VLAN maps fib_trie: Fix shift by 32 in fib_table_lookup net: moxart: use correct accessors for DMA memory ipv4: ipconfig: avoid unused ic_proto_used symbol bnxt_en: Fix crash in bnxt_free_tx_skbs() during tx timeout. bnxt_en: Exclude rx_drop_pkts hw counter from the stack's rx_dropped counter. bnxt_en: Ring free response from close path should use completion ring net_sched: drr: check for NULL pointer in drr_dequeue ...
2016-01-30Merge branch 'for-upstream' of ↵David S. Miller-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2016-01-30 Here's a set of important Bluetooth fixes for the 4.5 kernel: - Two fixes to 6LoWPAN code (one fixing a potential crash) - Fix LE pairing with devices using both public and random addresses - Fix allocation of dynamic LE PSM values - Fix missing COMPATIBLE_IOCTL for UART line discipline Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-29ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()Paolo Abeni-2/+10
The current implementation of ip6_dst_lookup_tail basically ignore the egress ifindex match: if the saddr is set, ip6_route_output() purposefully ignores flowi6_oif, due to the commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set"), if the saddr is 'any' the first route lookup in ip6_dst_lookup_tail fails, but upon failure a second lookup will be performed with saddr set, thus ignoring the ifindex constraint. This commit adds an output route lookup function variant, which allows the caller to specify lookup flags, and modify ip6_dst_lookup_tail() to enforce the ifindex match on the second lookup via said helper. ip6_route_output() becames now a static inline function build on top of ip6_route_output_flags(); as a side effect, out-of-tree modules need now a GPL license to access the output route lookup functionality. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-29tcp: Change reference to experimental CWND RFC.Jörg Thalheim-1/+1
Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-29cfg80211/wext: fix message orderingJohannes Berg-0/+6
Since cfg80211 frequently takes actions from its netdev notifier call, wireless extensions messages could still be ordered badly since the wext netdev notifier, since wext is built into the kernel, runs before the cfg80211 netdev notifier. For example, the following can happen: 5: wlan1: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default link/ether 02:00:00:00:01:00 brd ff:ff:ff:ff:ff:ff 5: wlan1: <BROADCAST,MULTICAST,UP> link/ether when setting the interface down causes the wext message. To also fix this, export the wireless_nlevent_flush() function and also call it from the cfg80211 notifier. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-01-29Bluetooth: L2CAP: Introduce proper defines for PSM rangesJohan Hedberg-0/+6
Having proper defines makes the code a bit readable, it also avoids duplicating hard-coded values since these are also needed when auto-allocating PSM values (in a subsequent patch). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2016-01-28sctp: remove the dead field of sctp_transportXin Long-2/+1
After we use refcnt to check if transport is alive, the dead can be removed from sctp_transport. The traversal of transport_addr_list in procfs dump is using list_for_each_entry_rcu, no need to check if it has been freed. sctp_generate_t3_rtx_event and sctp_generate_heartbeat_event is protected by sock lock, it's not necessary to check dead, either. also, the timers are cancelled when sctp_transport_free() is called, that it doesn't wait for refcnt to reach 0 to cancel them. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-28sctp: fix the transport dead race check by using atomic_add_unless on refcntXin Long-1/+1
Now when __sctp_lookup_association is running in BH, it will try to check if t->dead is set, but meanwhile other CPUs may be freeing this transport and this assoc and if it happens that __sctp_lookup_association checked t->dead a bit too early, it may think that the association is still good while it was already freed. So we fix this race by using atomic_add_unless in sctp_transport_hold. After we get one transport from hashtable, we will hold it only when this transport's refcnt is not 0, so that we can make sure t->asoc cannot be freed before we hold the asoc again. Note that sctp association is not freed using RCU so we can't use atomic_add_unless() with it as it may just be too late for that either. Fixes: 4f0087812648 ("sctp: apply rhashtable api to send/recv path") Reported-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-27tcp: Use ahashHerbert Xu-5/+1
This patch replaces uses of the long obsolete hash interface with ahash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
2016-01-27sctp: Use shashHerbert Xu-5/+5
This patch replaces uses of the long obsolete hash interface with shash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
2016-01-21net: sock: remove dead cgroup methods from struct protoJohannes Weiner-12/+0
The cgroup methods are no longer used after baac50bbc3cd ("net: tcp_memcontrol: simplify linkage between socket and page counter"). The hunk to delete them was included in the original patch but must have gotten lost during conflict resolution on the way upstream. Fixes: baac50bbc3cd ("net: tcp_memcontrol: simplify linkage between socket and page counter") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller-5/+3
2016-01-20gro: Make GRO aware of lightweight tunnels.Jesse Gross-0/+18
GRO is currently not aware of tunnel metadata generated by lightweight tunnels and stored in the dst. This leads to two possible problems: * Incorrectly merging two frames that have different metadata. * Leaking of allocated metadata from merged frames. This avoids those problems by comparing the tunnel information before merging, similar to how we handle other metadata (such as vlan tags), and releasing any state when we are done. Reported-by: John <john.phillips5@hpe.com> Fixes: 2e15ea39 ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-20net: drop tcp_memcontrol.cVladimir Davydov-10/+0
tcp_memcontrol.c only contains legacy memory.tcp.kmem.* file definitions and mem_cgroup->tcp_mem init/destroy stuff. This doesn't belong to network subsys. Let's move it to memcontrol.c. This also allows us to reuse generic code for handling legacy memcg files. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20mm: memcontrol: drop unused @css argument in memcg_init_kmemJohannes Weiner-1/+2
This series adds accounting of the historical "kmem" memory consumers to the cgroup2 memory controller. These consumers include the dentry cache, the inode cache, kernel stack pages, and a few others that are pointed out in patch 7/8. The footprint of these consumers is directly tied to userspace activity in common workloads, and so they have to be part of the minimally viable configuration in order to present a complete feature to our users. The cgroup2 interface of the memory controller is far from complete, but this series, along with the socket memory accounting series, provides the final semantic changes for the existing memory knobs in the cgroup2 interface, which is scheduled for initial release in the next merge window. This patch (of 8): Remove unused css argument frmo memcg_init_kmem() Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20netfilter: nf_conntrack: use safer way to lock all bucketsSasha Levin-5/+3
When we need to lock all buckets in the connection hashtable we'd attempt to lock 1024 spinlocks, which is way more preemption levels than supported by the kernel. Furthermore, this behavior was hidden by checking if lockdep is enabled, and if it was - use only 8 buckets(!). Fix this by using a global lock and synchronize all buckets on it when we need to lock them all. This is pretty heavyweight, but is only done when we need to resize the hashtable, and that doesn't happen often enough (or at all). Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-01-19soreuseport: fix NULL ptr dereference SO_REUSEPORT after bindCraig Gallek-1/+1
Marc Dionne discovered a NULL pointer dereference when setting SO_REUSEPORT on a socket after it is bound. This patch removes the assumption that at least one socket in the reuseport group is bound with the SO_REUSEPORT option before other bind calls occur. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Reported-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: Craig Gallek <kraig@google.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-17tcp_memcontrol: Forward declare cgroup_subsys and mem_cgroup stuctsGeert Uytterhoeven-0/+3
In file included from net/ipv4/tcp_ipv4.c:77 (and many more): include/net/tcp_memcontrol.h:5: warning: ‘struct cgroup_subsys’ declared inside parameter list include/net/tcp_memcontrol.h:5: warning: its scope is only this definition or declaration, which is probably not what you want Add forward declarations for all used structures to fix this. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds-3/+16
Pull networking fixes from David Miller: "A quick set of bug fixes after there initial networking merge: 1) Netlink multicast group storage allocator only was tested with nr_groups equal to 1, make it work for other values too. From Matti Vaittinen. 2) Check build_skb() return value in macb and hip04_eth drivers, from Weidong Wang. 3) Don't leak x25_asy on x25_asy_open() failure. 4) More DMA map/unmap fixes in 3c59x from Neil Horman. 5) Don't clobber IP skb control block during GSO segmentation, from Konstantin Khlebnikov. 6) ECN helpers for ipv6 don't fixup the checksum, from Eric Dumazet. 7) Fix SKB segment utilization estimation in xen-netback, from David Vrabel. 8) Fix lockdep splat in bridge addrlist handling, from Nikolay Aleksandrov" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) bgmac: Fix reversed test of build_skb() return value. bridge: fix lockdep addr_list_lock false positive splat net: smsc: Add support h8300 xen-netback: free queues after freeing the net device xen-netback: delete NAPI instance when queue fails to initialize xen-netback: use skb to determine number of required guest Rx requests net: sctp: Move sequence start handling into sctp_transport_get_idx() ipv6: update skb->csum when CE mark is propagated net: phy: turn carrier off on phy attach net: macb: clear interrupts when disabling them sctp: support to lookup with ep+paddr in transport rhashtable net: hns: fixes no syscon error when init mdio dts: hisi: fixes no syscon fault when init mdio net: preserve IP control block during GSO segmentation fsl/fman: Delete one function call "put_device" in dtsec_config() hip04_eth: fix missing error handle for build_skb failed 3c59x: fix another page map/single unmap imbalance 3c59x: balance page maps and unmaps x25_asy: Free x25_asy on x25_asy_open() failure. mlxsw: fix SWITCHDEV_OBJ_ID_PORT_MDB ...
2016-01-15ipv6: update skb->csum when CE mark is propagatedEric Dumazet-3/+16
When a tunnel decapsulates the outer header, it has to comply with RFC 6080 and eventually propagate CE mark into inner header. It turns out IP6_ECN_set_ce() does not correctly update skb->csum for CHECKSUM_COMPLETE packets, triggering infamous "hw csum failure" messages and stack traces. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-14mm: memcontrol: generalize the socket accounting jump labelJohannes Weiner-7/+0
The unified hierarchy memory controller is going to use this jump label as well to control the networking callbacks. Move it to the memory controller code and give it a more generic name. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14net: tcp_memcontrol: simplify linkage between socket and page counterJohannes Weiner-24/+6
There won't be any separate counters for socket memory consumed by protocols other than TCP in the future. Remove the indirection and link sockets directly to their owning memory cgroup. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14net: tcp_memcontrol: sanitize tcp memory accounting callbacksJohannes Weiner-58/+11
There won't be a tcp control soft limit, so integrating the memcg code into the global skmem limiting scheme complicates things unnecessarily. Replace this with simple and clear charge and uncharge calls--hidden behind a jump label--to account skb memory. Note that this is not purely aesthetic: as a result of shoehorning the per-memcg code into the same memory accounting functions that handle the global level, the old code would compare the per-memcg consumption against the smaller of the per-memcg limit and the global limit. This allowed the total consumption of multiple sockets to exceed the global limit, as long as the individual sockets stayed within bounds. After this change, the code will always compare the per-memcg consumption to the per-memcg limit, and the global consumption to the global limit, and thus close this loophole. Without a soft limit, the per-memcg memory pressure state in sockets is generally questionable. However, we did it until now, so we continue to enter it when the hard limit is hit, and packets are dropped, to let other sockets in the cgroup know that they shouldn't grow their transmit windows, either. However, keep it simple in the new callback model and leave memory pressure lazily when the next packet is accepted (as opposed to doing it synchroneously when packets are processed). When packets are dropped, network performance will already be in the toilet, so that should be a reasonable trade-off. As described above, consumption is now checked on the per-memcg level and the global level separately. Likewise, memory pressure states are maintained on both the per-memcg level and the global level, and a socket is considered under pressure when either level asserts as much. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14net: tcp_memcontrol: simplify the per-memcg limit accessJohannes Weiner-3/+5
tcp_memcontrol replicates the global sysctl_mem limit array per cgroup, but it only ever sets these entries to the value of the memory_allocated page_counter limit. Use the latter directly. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14net: tcp_memcontrol: remove dead per-memcg count of allocated socketsJohannes Weiner-36/+3
The number of allocated sockets is used for calculations in the soft limit phase, where packets are accepted but the socket is under memory pressure. Since there is no soft limit phase in tcp_memcontrol, and memory pressure is only entered when packets are already dropped, this is actually dead code. Remove it. As this is the last user of parent_cg_proto(), remove that too. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14net: tcp_memcontrol: remove bogus hierarchy pressure propagationJohannes Weiner-15/+4
When a cgroup currently breaches its socket memory limit, it enters memory pressure mode for itself and its *ancestors*. This throttles transmission in unrelated sibling and cousin subtrees that have nothing to do with the breached limit. On the contrary, breaching a limit should make that group and its *children* enter memory pressure mode. But this happens already, albeit lazily: if an ancestor limit is breached, siblings will enter memory pressure on their own once the next packet arrives for them. So no additional hierarchy code is needed. Remove the bogus stuff. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14net: tcp_memcontrol: properly detect ancestor socket pressureJohannes Weiner-4/+6
When charging socket memory, the code currently checks only the local page counter for excess to determine whether the memcg is under socket pressure. But even if the local counter is fine, one of the ancestors could have breached its limit, which should also force this child to enter socket pressure. This currently doesn't happen. Fix this by using page_counter_try_charge() first. If that fails, it means that either the local counter or one of the ancestors are in excess of their limit, and the child should enter socket pressure. Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14mac80211: pass block ack session timeout to to driverSara Sharon-14/+30
Currently mac80211 does not inform the driver of the session block ack timeout when starting a rx aggregation session. Drivers that manage the reorder buffer need to know this parameter. Seeing that there are now too many arguments for the drv_ampdu_action() function, wrap them inside a structure. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-01-14mac80211: document status.freq restrictionsJohannes Berg-0/+2
It's not always necessary to set the status.freq field, for example when this would be an expensive calculation. It must be set for all management frames (as they might be reported to userspace), but for data frames it's not really required. Document this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-01-14mac80211: pass RX aggregation window size to driverSara Sharon-3/+5
Currently mac80211 does not inform the driver of the window size when starting an RX aggregation session. To enable managing the reorder buffer in the driver or hardware the window size is needed. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>