2014-10-30  mlx4: Avoid leaking steering rules on flow creation error flow  (Or Gerlitz, 2 files, -2/+12)
If mlx4_ib_create_flow() attempts to create more than one rule with the firmware and one of these registrations fails, the already created flow rules are leaked. One example of the leak is when registration of the VXLAN ghost steering rule fails: we didn't unregister the original rule requested by the user, introduced in commit d2fce8a9060d ("mlx4: Set user-space raw Ethernet QPs to properly handle VXLAN traffic"). While here, add a dump of the VXLAN portion of steering rules so it can actually be seen when flow creation fails.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
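Where a driver has to register several hardware rules for a single request, the usual shape of this kind of fix is to unwind whatever the firmware already accepted before returning the error. A minimal sketch of that pattern; struct rule_spec, register_rule() and unregister_rule() are hypothetical stand-ins, not the actual mlx4 code:

    #include <linux/types.h>

    struct rule_spec { u32 type; };     /* reduced, illustrative rule description */

    /* Hypothetical firmware helpers, stand-ins for the driver's real calls. */
    extern int register_rule(struct rule_spec *spec, u64 *id);
    extern void unregister_rule(u64 id);

    static int create_flow_rules(struct rule_spec *specs, int nrules, u64 *ids)
    {
        int i, err;

        for (i = 0; i < nrules; i++) {
            err = register_rule(&specs[i], &ids[i]);
            if (err)
                goto unwind;
        }
        return 0;

    unwind:
        /* Drop every rule the firmware already accepted before reporting failure. */
        while (--i >= 0)
            unregister_rule(ids[i]);
        return err;
    }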
2014-10-30  net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN  (Or Gerlitz, 1 file, -2/+5)
For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading both the outer UDP TX checksum and the inner TCP/UDP TX checksum. The driver doesn't advertise SKB_GSO_UDP_TUNNEL_CSUM, yet we were wrongly telling the HW to offload the outer UDP checksum for encapsulated packets; fix that.
Fixes: 837052d0ccc5 ('net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30  ipv4: Do not cache routing failures due to disabled forwarding.  (Nicolas Cavallari, 1 file, -0/+1)
If we cache them, the kernel will reuse them, independently of whether forwarding is enabled or not. Which means that if forwarding is disabled on the input interface where the first routing request comes from, then that unreachable result will be cached and reused for other interfaces, even if forwarding is enabled on them. The opposite is also true. This can be verified with two interfaces A and B and an output interface C, where B has forwarding enabled, but not A, and trying:
  ip route get $dst iif A from $src && ip route get $dst iif B from $src
Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30  cxgb4 : Fix missing initialization of win0_lock  (Anish Bhatt, 1 file, -0/+1)
win0_lock was being used uninitialized, resulting in warning traces being seen when lock debugging is enabled (and just wrong).
Fixes: fc5ab0209650 ('cxgb4: Replaced the backdoor mechanism to access the HW memory with PCIe Window method')
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30  r8152: check WORK_ENABLE in suspend function  (hayeswang, 1 file, -1/+1)
Avoid unnecessary behavior when autosuspend occurs during open(). The related processing should only be run after open() has finished.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30  r8152: reset tp->speed before autoresuming in open function  (hayeswang, 1 file, -0/+5)
If (tp->speed & LINK_STATUS) is not zero, rtl8152_resume() would call rtl_start_rx() before enabling tx/rx. Avoid this by resetting it to zero.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30  r8152: clear SELECTIVE_SUSPEND when autoresuming  (hayeswang, 1 file, -4/+3)
The SELECTIVE_SUSPEND flag should be cleared when autoresuming; otherwise, when a system suspend and resume occurs, the driver may follow the wrong path. Besides, because the SELECTIVE_SUSPEND flag can't be used to check whether the hardware has the corresponding feature enabled, it should always be disabled in close().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30  ixgbe: fix race when setting advertised speed  (Emil Tantilov, 1 file, -0/+4)
The following commands:
  modprobe ixgbe
  ifconfig ethX up
  ethtool -s ethX advertise 0x020
can lead to a "setup link failed with code -14" error due to the setup_link call racing with the SFP detection routine in the watchdog. This patch resolves the issue by protecting the setup_link call with a check for __IXGBE_IN_SFP_INIT.
Reported-by: Scott Harrison <scoharr2@cisco.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
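The race is resolved by treating a state bit as a simple mutual-exclusion flag between the ethtool path and the SFP detection work. A sketch of that shape; my_adapter, ADAPTER_SFP_INIT_BIT and setup_link_locked() are illustrative names, not the driver's exact code:

    #include <linux/bitops.h>
    #include <linux/delay.h>
    #include <linux/types.h>

    struct my_adapter {
        unsigned long state;            /* illustrative atomic state bits */
    };
    #define ADAPTER_SFP_INIT_BIT    0

    extern int setup_link_locked(struct my_adapter *adapter, u32 advertised);

    static int set_advertised_speed(struct my_adapter *adapter, u32 advertised)
    {
        int err;

        /* Wait until the SFP detection path is no longer touching link setup. */
        while (test_and_set_bit(ADAPTER_SFP_INIT_BIT, &adapter->state))
            usleep_range(1000, 2000);

        err = setup_link_locked(adapter, advertised);

        clear_bit(ADAPTER_SFP_INIT_BIT, &adapter->state);
        return err;
    }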
2014-10-30  ixgbe: need not repeat init skb with NULL  (Junwei Zhang, 1 file, -1/+1)
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-10-30  igb: don't reuse pages with pfmemalloc flag  (Roman Gushchin, 1 file, -1/+5)
An incoming packet is dropped silently by sk_filter() if its skb was allocated from pfmemalloc reserves and the corresponding socket is not marked with the SOCK_MEMALLOC flag. The igb driver allocates pages for DMA with __skb_alloc_page(), which calls alloc_pages_node() with the __GFP_MEMALLOC flag, so under OOM conditions igb can get pages with the pfmemalloc flag set. If an incoming packet hits a pfmemalloc page and is large enough (small packets are copied into memory allocated with netdev_alloc_skb_ip_align(), so they are not affected), it will be dropped. This behavior is acceptable under high memory pressure, but the problem is that the igb driver reuses these mapped pages, so packets keep being dropped even after the memory issues are gone and there is plenty of free memory. In my case, some TCP sessions hung on a small percentage (< 0.1%) of machines days after the OOMs. Fix this by avoiding reuse of such pages.
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
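The reuse test described here boils down to refusing to recycle a page that was handed out from the emergency reserves. A minimal sketch of the idea, assuming the 3.18-era page->pfmemalloc flag (modern kernels expose page_is_pfmemalloc() instead); the helper name is illustrative:

    #include <linux/mm.h>
    #include <linux/topology.h>

    /* Illustrative helper: decide whether an RX page may be recycled. */
    static bool rx_page_can_be_reused(struct page *page)
    {
        /* Never recycle a page that came from the pfmemalloc reserves:
         * frames built from it keep getting dropped by sk_filter() on
         * ordinary sockets long after the memory pressure is gone.
         */
        if (unlikely(page->pfmemalloc))
            return false;

        /* Avoid keeping pages that were allocated on a remote NUMA node. */
        if (unlikely(page_to_nid(page) != numa_node_id()))
            return false;

        return true;
    }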
2014-10-30  e1000: unset IFF_UNICAST_FLT on WMware 82545EM  (Francesco Ruggeri, 1 file, -1/+4)
VMware's e1000 implementation does not seem to support unicast filtering. This can be observed by configuring a macvlan interface on eth0 in a VM in VMware Fusion 5.0.5 and trying to use that interface instead of eth0. Tested on 3.16.
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-10-29  inet: frags: remove the WARN_ON from inet_evict_bucket  (Nikolay Aleksandrov, 1 file, -1/+0)
The WARN_ON in inet_evict_bucket can be triggered by a valid case: inet_frag_kill and inet_evict_bucket can be running in parallel on the same queue, which means that there has been at least one more ref added by a previous inet_frag_find call, but inet_frag_kill can delete the timer before inet_evict_bucket, which will cause the WARN_ON() there to trigger since we'll have refcnt != 1. Now, this case is valid because the queue is being "killed" for some reason (removed from the chain list and its timer deleted), so it will get destroyed in the end by one of the inet_frag_put() calls that reaches 0, i.e. the refcnt is still valid.
CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>
Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29  inet: frags: fix a race between inet_evict_bucket and inet_frag_kill  (Nikolay Aleksandrov, 1 file, -1/+2)
When the evictor is running, it adds some chosen frags to a local list to be evicted once the chain lock has been released, but at the same time the *frag_queue can be running for some of the same queues and may call inet_frag_kill, which will wait on the chain lock and will then delete the queue from the wrong list, since it was added to the eviction one. The fix is simple: check whether the queue has the evict flag set, under the chain lock, before deleting it. This is safe because the evict flag is set only under that lock, and having the flag set also means that the queue has been detached from the chain list, so there is no need to delete it again. An important note is that we're safe w.r.t. the refcnt because inet_frag_kill and inet_evict_bucket will sync on the del_timer operation, where only one of the two can succeed (or, if the timer is executing, neither of them). The cases are:
1. inet_frag_kill succeeds in del_timer - the timer ref is removed, but inet_evict_bucket will not add this queue to its expire list and will instead restart eviction in that chain.
2. inet_evict_bucket succeeds in del_timer - the timer ref is kept until the evictor "expires" the queue, but inet_frag_kill will remove the initial ref and will set INET_FRAG_COMPLETE, which will make the frag_expire fn just remove its ref.
In the end all of the queue users will do an inet_frag_put, and the one that reaches 0 will free it. The refcount balance should be okay.
CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>
Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Tested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
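The fix amounts to re-checking, under the chain lock, whether the evictor has already claimed the queue before unlinking it. A sketch of that check; the fq->flags, INET_FRAG_EVICTED and hb->chain_lock names follow the 3.18-era inet_frag code as I recall it and should be treated as illustrative:

    #include <net/inet_frag.h>

    static void frag_unlink_unless_evicted(struct inet_frag_queue *fq,
                                           struct inet_frag_bucket *hb)
    {
        spin_lock(&hb->chain_lock);
        /* If the evictor already moved this queue onto its private expire
         * list, it is no longer on the chain; unlinking it again would
         * corrupt the evictor's list. (Flag/field names are illustrative.)
         */
        if (!(fq->flags & INET_FRAG_EVICTED))
            hlist_del(&fq->list);
        spin_unlock(&hb->chain_lock);
    }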
2014-10-29  cnic: Update the rcu_access_pointer() usages  (Tej Parkash, 1 file, -4/+1)
1. Remove the rcu_read_lock/unlock around rcu_access_pointer
2. Replace the rcu_dereference with rcu_access_pointer
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
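The distinction the patch relies on: rcu_dereference() is for dereferencing the pointer inside an RCU read-side critical section, while rcu_access_pointer() may be used without rcu_read_lock() when the value is only compared against NULL and never dereferenced. A small illustrative fragment (the data structures are placeholders, not cnic's):

    #include <linux/rcupdate.h>
    #include <linux/types.h>

    struct handler;                     /* placeholder for the RCU-protected object */
    struct dev_ctx {
        struct handler __rcu *h;
    };

    extern void handler_run(struct handler *h);     /* hypothetical consumer */

    /* Only tests for presence: no RCU read-side critical section required. */
    static bool handler_registered(struct dev_ctx *ctx)
    {
        return rcu_access_pointer(ctx->h) != NULL;
    }

    /* Actually dereferences the pointer: must run under rcu_read_lock(). */
    static void handler_call(struct dev_ctx *ctx)
    {
        struct handler *h;

        rcu_read_lock();
        h = rcu_dereference(ctx->h);
        if (h)
            handler_run(h);
        rcu_read_unlock();
    }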
2014-10-29  cxgb4vf: Replace repetitive pci device ID's with right ones  (Hariprasad Shenai, 1 file, -8/+8)
Replace the repetitive device IDs which were added in commit b961f9a48844ecf3 ("cxgb4vf: Remove superfluous "idx" parameter of CH_DEVICE() macro").
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29  ipv6: notify userspace when we added or changed an ipv6 token  (Lubomir Rintel, 1 file, -0/+1)
NetworkManager might want to know that the token changed when the router advertisement arrives.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-29  sch_pie: schedule the timer after all init succeed  (WANG Cong, 1 file, -1/+1)
Cc: Vijay Subramanian <vijaynsu@cisco.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
2014-10-28  cdc-ether: handle promiscuous mode with a set_rx_mode callback  (Olivier Blin, 1 file, -0/+5)
Promiscuous mode was not supported anymore with my Lenovo adapters (RTL8153) since commit c472ab68ad67db23c9907a27649b7dc0899b61f9 ("cdc-ether: clean packet filter upon probe"). It was not possible to use them in a bridge anymore.
Signed-off-by: Olivier Blin <olivier.blin@softathome.com>
Also-analyzed-by: Loïc Yhuel <loic.yhuel@softathome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28  cdc-ether: extract usbnet_cdc_update_filter function  (Olivier Blin, 1 file, -14/+28)
This will be used by the set_rx_mode callback. Also move a comment about multicast filtering into this new function.
Signed-off-by: Olivier Blin <olivier.blin@softathome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28  usbnet: add a callback for set_rx_mode  (Olivier Blin, 2 files, -0/+24)
To delegate promiscuous mode and multicast filtering to the subdriver.
Signed-off-by: Olivier Blin <olivier.blin@softathome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28  net: systemport: reset UniMAC coming out of a suspend cycle  (Florian Fainelli, 1 file, -0/+2)
bcm_sysport_resume() was missing a UniMAC reset, which can lead to various receive FIFO corruptions coming out of a suspend cycle. If the RX FIFO is stuck, it will deliver corrupted/duplicate packets towards the host CPU interface. This could be reproduced on a crowded network and when Wake-on-LAN is enabled for this particular interface, because the switch still forwards packets towards the host CPU interface (SYSTEMPORT), and we had to leave the UniMAC RX enable bit on to allow matching MagicPackets. Once we re-enter the resume function, there is a small window during which the UniMAC receive is still enabled and we start queueing packets, but the RDMA and RBUF engines are not ready, which leads to packets getting stuck in the UniMAC RX FIFO and ultimately being delivered towards the host CPU as corrupted.
Fixes: 40755a0fce17 ("net: systemport: add suspend and resume support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28  net: systemport: enable RX interrupts after NAPI  (Florian Fainelli, 1 file, -6/+3)
There is currently a small window during which the SYSTEMPORT adapter enables its RX interrupts without having enabled its NAPI handler, which can result in packets being discarded during interface bringup. A similar but more serious window exists in bcm_sysport_resume(), during which we can have the RDMA engine not fully prepared to receive packets yet RX interrupts enabled. Fix this by moving the RX interrupt enable down to bcm_sysport_netif_start(), after napi_enable() for the RX path is called, which fixes both call sites: bcm_sysport_open() and bcm_sysport_resume().
Fixes: b02e6d9ba7ad ("net: systemport: add bcm_sysport_netif_{enable,stop}")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
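The ordering constraint is generic: the NAPI context must be live before the RX interrupt source is unmasked, otherwise an early interrupt has nothing to schedule and traffic can be lost. A sketch of the safe bring-up order; my_priv and rx_irq_unmask() are stand-ins for the driver's private structure and its device-specific register write:

    #include <linux/netdevice.h>

    struct my_priv {                    /* reduced, illustrative private struct */
        struct napi_struct napi;
    };

    /* Hypothetical stand-in for the device-specific write that unmasks RX
     * interrupts (e.g. clearing bits in an INTRL2 mask register).
     */
    extern void rx_irq_unmask(struct my_priv *priv);

    static void netif_start_rx_path(struct my_priv *priv)
    {
        /* 1. Make the software side ready to accept and process work ... */
        napi_enable(&priv->napi);

        /* 2. ... and only then let the hardware raise RX interrupts. */
        rx_irq_unmask(priv);
    }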
2014-10-28  skbuff.h: fix kernel-doc warning for headers_end  (Randy Dunlap, 1 file, -0/+4)
Fix kernel-doc warning in <linux/skbuff.h> by making both headers_start and headers_end private fields:
  Warning(..//include/linux/skbuff.h:654): No description found for parameter 'headers_end[0]'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
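Kernel-doc supports private:/public: markers inside a struct body for exactly this case: fields that intentionally have no @-description. A self-contained example of the markup (demo_buf is an illustration, not the actual skbuff.h hunk):

    #include <linux/types.h>

    /**
     * struct demo_buf - example of hiding zero-length marker fields from kernel-doc
     * @len: number of valid bytes in @data
     * @data: payload
     *
     * headers_start[0]/headers_end[0] below get no @-description, so they are
     * wrapped in kernel-doc private:/public: comments, which silences the
     * "No description found for parameter" warning.
     */
    struct demo_buf {
        unsigned int len;

        /* private: */
        __u32 headers_start[0];
        /* public: */

        __u8 data[64];

        /* private: */
        __u32 headers_end[0];
        /* public: */
    };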
2014-10-28  net: phy: Add SGMII Configuration for Marvell 88E1145 Initialization  (Vince Bridgers, 1 file, -0/+19)
Marvell phy 88E1145 configuration & initialization was missing a case for initializing SGMII mode. This patch adds that case.
Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28  drivers: net:cpsw: fix probe_dt when only slave 1 is pinned out  (Mugunthan V N, 1 file, -9/+9)
When slave 0 has no phy and slave 1 is connected to a phy, driver probe will fail because there is no phy id present for slave 0 in the device tree. So continue even when no phy-id is found, and also move the mac-id read later to ensure the mac-id is read from the device tree even when a phy-id entry is not found.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28  dsa: mv88e6171: Fix tagging protocol/Kconfig  (Andrew Lunn, 1 file, -1/+1)
The mv88e6171 can support two different tagging protocols, DSA and EDSA. The switch driver structure only allows one protocol to be enumerated, and DSA was chosen. However the Kconfig entry ensures the EDSA tagging code is built. With a minimal configuration, we then end up with a mismatch. The probe is successful, EDSA tagging is used, but the switch is configured for DSA, resulting in mangled packets. Change the switch driver structure to enumerate EDSA, fixing the mismatch.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 42f272539487 ("net: DSA: Marvell mv88e6171 switch driver")
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28  net: dsa: Error out on tagging protocol mismatches  (Andrew Lunn, 1 file, -1/+4)
If there is a mismatch between enabled tagging protocols and the protocol the switch supports, error out, rather than continue with a situation which is unlikely to work.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
cc: alexander.h.duyck@intel.com
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27  bpf: split eBPF out of NET  (Alexei Starovoitov, 5 files, -5/+28)
Introduce two configs:
- hidden CONFIG_BPF to select the eBPF interpreter that classic socket filters depend on
- visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use
That solves several problems:
- tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from the kernel in NET-less configs.
- in 3.18 programs cannot be attached to events yet, so don't force it on
- when the rest of the eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size
bloat-o-meter on x64 shows:
  add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601)
Tested with many different config combinations. Hopefully didn't miss anything.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27  cxgb4 : Handle dcb enable correctly  (Anish Bhatt, 2 files, -2/+11)
Disabling DCBx in firmware automatically enables DCBx for control via host lldp agents. Wait for an explicit setstate call from an lldp agent to enable DCBx instead.
Fixes: 76bcb31efc06 ("cxgb4 : Add DCBx support codebase and dcbnl_ops")
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27  cxgb4 : Improve handling of DCB negotiation or loss thereof  (Anish Bhatt, 1 file, -3/+45)
Clear out any DCB apps we might have added to the kernel table when we lose DCB sync (or the IEEE equivalent event). These were previously left behind and not cleaned up correctly. IEEE allows individual components to work independently, so improve the check for IEEE completion by specifying individual components.
Fixes: 10b0046685ab ("cxgb4: IEEE fixes for DCBx state machine")
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-27  netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()  (Arturo Borrero, 1 file, -1/+1)
The code looks for an already loaded target, and the correct list to search is nft_target_list, not nft_match_list.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-26  net: napi_reuse_skb() should check pfmemalloc  (Eric Dumazet, 1 file, -0/+4)
Do not reuse an skb if it was pfmemalloc tainted, otherwise future frames might be dropped anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
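The check itself is small: an skb built from pfmemalloc reserves must not go back into the reuse slot; it should simply be freed so the next frame starts from a clean skb. A sketch of the idea, not the exact call site in net/core/dev.c; stash_skb_for_reuse() is a hypothetical stand-in for what napi_reuse_skb() does:

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    /* Hypothetical stand-in for the "keep this skb around for the next frame"
     * step of the GRO frags path.
     */
    extern void stash_skb_for_reuse(struct napi_struct *napi, struct sk_buff *skb);

    static void maybe_reuse_rx_skb(struct napi_struct *napi, struct sk_buff *skb)
    {
        if (unlikely(skb->pfmemalloc)) {
            /* Tainted by the emergency reserves: reusing it would taint
             * every future frame built on it, so free it instead.
             */
            consume_skb(skb);
            return;
        }
        stash_skb_for_reuse(napi, skb);
    }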
2014-10-26  net/mlx4_core: Call synchronize_irq() before freeing EQ buffer  (Eli Cohen, 1 file, -0/+1)
After moving the EQ ownership to software, effectively destroying it, call synchronize_irq() to ensure that any handler routines running on other CPU cores finish execution. Only then free the EQ buffer. The same thing is done when we destroy a CQ, which is one of the sources generating interrupts. In the case of the CQ we want to avoid completion handlers on a CQ that was destroyed. In the case of the EQ we do the same to avoid receiving asynchronous events after the EQ has been destroyed and its buffers freed.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26  net/mlx5_core: Call synchronize_irq() before freeing EQ buffer  (Eli Cohen, 1 file, -0/+1)
After destroying the EQ, the object responsible for generating interrupts, call synchronize_irq() to ensure that any handler routines running on other CPU cores finish execution. Only then free the EQ buffer. This patch solves a very rare case where we get a panic on driver unload. The same thing is done when we destroy a CQ, which is one of the sources generating interrupts. In the case of the CQ we want to avoid completion handlers on a CQ that was destroyed. In the case of the EQ we do the same to avoid receiving asynchronous events after the EQ has been destroyed and its buffers freed.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
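Both EQ fixes above follow the same teardown-ordering rule: once the hardware can no longer generate events for the EQ, wait for any handler that might still be running on another CPU, and only then free the memory it could touch. Sketched with hypothetical helpers for the device-specific steps:

    #include <linux/interrupt.h>

    struct my_eq {                      /* reduced, illustrative EQ bookkeeping */
        unsigned int irq;
        void *buf;
    };

    /* Hypothetical device-specific steps. */
    extern void hw_destroy_eq(struct my_eq *eq);
    extern void free_eq_buf(struct my_eq *eq);

    static void teardown_eq(struct my_eq *eq)
    {
        /* 1. Tell the HW/firmware the EQ is gone, so no new events arrive. */
        hw_destroy_eq(eq);

        /* 2. Wait for any handler still running on another CPU to finish
         *    before the data it may touch disappears.
         */
        synchronize_irq(eq->irq);

        /* 3. Only now is it safe to free the EQ buffer. */
        free_eq_buf(eq);
    }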
2014-10-25  drivers: net: xgene: Rewrite buggy loop in xgene_enet_ecc_init()  (Geert Uytterhoeven, 1 file, -9/+7)
drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c: In function ‘xgene_enet_ecc_init’:
drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c:126: warning: ‘data’ may be used uninitialized in this function
Depending on the arbitrary value on the stack, the loop may terminate too early, and cause a bogus -ENODEV failure.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25  i40e: _MASK vs _SHIFT typo in i40e_handle_mdd_event()  (Dan Carpenter, 1 file, -2/+2)
We accidentally mask by the _SHIFT variable. It means that "event" is always zero.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25  macvlan: fix a race on port dismantle and possible skb leaks  (Eric Dumazet, 1 file, -2/+8)
We need to cancel the work queue after the rcu grace period, otherwise it can be rescheduled by incoming packets. We need to purge the queue if some skbs are still in it. We can use the __skb_queue_head_init() variant in macvlan_process_broadcast().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 412ca1550cbec ("macvlan: Move broadcasts into a work queue")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
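The changelog translates into an ordering rule for teardown: only cancel the broadcast work once no new packets can reschedule it (i.e. after the RCU grace period), then purge whatever is still sitting on the queue. A rough sketch, with the port reduced to the relevant members rather than macvlan's actual structure:

    #include <linux/rcupdate.h>
    #include <linux/skbuff.h>
    #include <linux/slab.h>
    #include <linux/workqueue.h>

    struct bc_port {                    /* reduced, illustrative port structure */
        struct sk_buff_head bc_queue;
        struct work_struct bc_work;
    };

    static void bc_port_destroy(struct bc_port *port)
    {
        /* Wait for the RCU grace period first: after it, no RX path can
         * still see the port and reschedule the broadcast work.
         */
        synchronize_rcu();

        /* Now the work cannot come back; wait for a running instance to
         * finish, then drop whatever is still sitting on the queue.
         */
        cancel_work_sync(&port->bc_work);
        __skb_queue_purge(&port->bc_queue);

        kfree(port);
    }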
2014-10-25  tcp: md5: do not use alloc_percpu()  (Eric Dumazet, 1 file, -39/+20)
The percpu tcp_md5sig_pool contains memory blobs that ultimately go through sg_set_buf():
  -> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
This requires that the whole area is in a physically contiguous portion of memory, and that @buf is not backed by vmalloc(). Given that alloc_percpu() can use vmalloc() areas, this does not fit the requirements. Replace alloc_percpu() by a static DEFINE_PER_CPU(), as tcp_md5sig_pool is small anyway; there is no gain in dynamically allocating it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 765cf9976e93 ("tcp: md5: remove one indirection level in tcp_md5sig_pool")
Reported-by: Crestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
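The replacement is the classic static per-CPU definition, which places the pool in the kernel's per-CPU area (linear-mapped, never vmalloc-backed) instead of a dynamically allocated region. Schematically, with md5_pool standing in for the real tcp_md5sig_pool:

    #include <linux/percpu.h>

    struct md5_pool {                   /* stand-in for tcp_md5sig_pool */
        char scratch[128];
    };

    /* Statically reserved per-CPU storage lives in the linear mapping, so it
     * is safe to feed into sg_set_buf()/virt_to_page(), unlike an
     * alloc_percpu() area which may be vmalloc-backed.
     */
    static DEFINE_PER_CPU(struct md5_pool, md5_pool);

    static struct md5_pool *get_local_pool(void)
    {
        /* Caller must keep preemption disabled while using the pointer. */
        return this_cpu_ptr(&md5_pool);
    }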
2014-10-25  xen-netback: reintroduce guest Rx stall detection  (David Vrabel, 4 files, -1/+86)
If a frontend is not receiving packets, it is useful to detect this and turn off the carrier so packets are dropped early instead of being queued and drained when they expire. A to-guest queue is stalled if it doesn't have enough free slots for an extended period of time (default 60 s). If at least one queue is stalled, the carrier is turned off (in the expectation that the other queues will soon stall as well). The carrier is only turned on once all queues are ready. When the frontend connects, all the queues start in the stalled state and only become ready once the frontend queues enough Rx requests.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25  xen-netback: fix unlimited guest Rx internal queue and carrier flapping  (David Vrabel, 4 files, -178/+161)
Netback needs to discard old to-guest skbs (guest Rx queue drain) and it needs to detect guest Rx stalls (to disable the carrier so packets are discarded earlier), but the current implementation is very broken.
1. The check in hard_start_xmit of the slot availability did not consider the number of packets that were already in the guest Rx queue. This could allow the queue to grow without bound: the guest stops consuming packets and the ring is allowed to fill, leaving S slots free; netback queues a packet requiring more than S slots (ensuring that the ring stays with S slots free); netback then queues packets indefinitely, provided they require S or fewer slots.
2. The Rx stall detection is not triggered in this case since the (host) Tx queue is not stopped.
3. If the Tx queue is stopped and a guest Rx interrupt occurs, netback will consider this an Rx purge event, which may result in it taking the carrier down unnecessarily. It also considers a queue with only 1 slot free as unstalled (even though the next packet might not fit in this).
The internal guest Rx queue is now limited by a byte length (to 512 KiB, enough for half the ring). The (host) Tx queue is stopped and started based on this limit. This sets an upper bound on the amount of memory used by packets on the internal queue. This allows the code estimating the number of slots needed for an skb to be removed (it wasn't a very good estimate anyway). Instead, the guest Rx thread just waits for enough free slots for a maximum sized packet. skbs queued on the internal queue have an 'expires' time (set to the current time plus the drain timeout). The guest Rx thread will detect when the skb at the head of the queue has expired and discard expired skbs. This sets a clear upper bound on the length of time an skb can be queued for. For a guest being destroyed, the maximum time needed to wait for all the packets it sent to be dropped is still the drain timeout (10 s), since it will not be sending new packets. Rx stall detection is reintroduced in a later commit.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
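The 'expires' handling described above reduces to a head-of-queue check: packets carry an absolute deadline when queued, and the Rx thread throws away anything whose deadline has passed. A sketch using jiffies-based deadlines and an illustrative skb->cb layout (not xen-netback's actual code):

    #include <linux/jiffies.h>
    #include <linux/skbuff.h>

    struct rx_cb {                      /* illustrative use of skb->cb[] */
        unsigned long expires;
    };
    #define RX_CB(skb)  ((struct rx_cb *)(skb)->cb)

    /* Queue an skb with an absolute deadline (drain_timeout is in jiffies). */
    static void rx_queue_tail(struct sk_buff_head *queue, struct sk_buff *skb,
                              unsigned long drain_timeout)
    {
        RX_CB(skb)->expires = jiffies + drain_timeout;
        skb_queue_tail(queue, skb);
    }

    /* Called by the Rx thread: drop everything that has waited too long. */
    static void rx_queue_drop_expired(struct sk_buff_head *queue)
    {
        struct sk_buff *skb;

        spin_lock_irq(&queue->lock);
        while ((skb = skb_peek(queue)) &&
               time_after(jiffies, RX_CB(skb)->expires)) {
            __skb_unlink(skb, queue);
            kfree_skb(skb);
        }
        spin_unlock_irq(&queue->lock);
    }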
2014-10-25  xen-netback: make feature-rx-notify mandatory  (David Vrabel, 3 files, -25/+5)
Frontends that do not provide feature-rx-notify may stall, because netback depends on the notification from the frontend to wake the guest Rx thread (even if can_queue is false). This could be fixed, but feature-rx-notify was introduced in 2006 and I am not aware of any frontends that do not implement it.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>