linux/lib/rhashtable.c, branch v4.8

rhashtable: fix a memory leak in alloc_bucket_locks()

2016-08-27T04:59:53Z

If vmalloc() was successful, do not attempt a kmalloc_array().

Fixes: 4cf0b354d92e ("rhashtable: avoid large lock-array allocations")
Reported-by: CAI Qian
Signed-off-by: Eric Dumazet
Cc: Florian Westphal
Acked-by: Herbert Xu
Tested-by: CAI Qian
Signed-off-by: David S. Miller

rhashtable: fix shift by 64 when shrinking

2016-08-15T18:10:09Z

I got this:

================================================================================
UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Workqueue: events rht_deferred_worker
 0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3
 ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0
 0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640
Call Trace:
 dump_stack+0xac/0xfc
 ? _atomic_dec_and_lock+0xc4/0xc4
 ubsan_epilogue+0xd/0x8a
 __ubsan_handle_shift_out_of_bounds+0x255/0x29a
 ? __ubsan_handle_out_of_bounds+0x180/0x180
 ? nl80211_req_set_reg+0x256/0x2f0
 ? print_context_stack+0x8a/0x160
 ? amd_pmu_reset+0x341/0x380
 rht_deferred_worker+0x1618/0x1790
 ? rht_deferred_worker+0x1618/0x1790
 ? rhashtable_jhash2+0x370/0x370
 ? process_one_work+0x6fd/0x1970
 process_one_work+0x79f/0x1970
 ? process_one_work+0x6fd/0x1970
 ? try_to_grab_pending+0x4c0/0x4c0
 ? worker_thread+0x1c4/0x1340
 worker_thread+0x55f/0x1340
 ? __schedule+0x4df/0x1d40
 ? process_one_work+0x1970/0x1970
 ? process_one_work+0x1970/0x1970
 kthread+0x237/0x390
 ? __kthread_parkme+0x280/0x280
 ? _raw_spin_unlock_irq+0x33/0x50
 ret_from_fork+0x1f/0x40
 ? __kthread_parkme+0x280/0x280
================================================================================

roundup_pow_of_two() is undefined when called with an argument of 0, so let's avoid the call and just fall back to ht->p.min_size (which should never be smaller than HASH_MIN_SIZE).

Cc: Herbert Xu
Signed-off-by: Vegard Nossum
Acked-by: Herbert Xu
Signed-off-by: David S. Miller

rhashtable: avoid large lock-array allocations

2016-08-15T04:12:57Z

Sander reports the following splat after the netfilter nat bysrc table got converted to rhashtable:

swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1
[..]
 warn_alloc_failed+0xdd/0x140
 __alloc_pages_nodemask+0x3e1/0xcf0
 alloc_pages_current+0x8d/0x110
 kmalloc_order+0x1f/0x70
 __kmalloc+0x129/0x140
 bucket_table_alloc+0xc1/0x1d0
 rhashtable_insert_rehash+0x5d/0xe0
 nf_nat_setup_info+0x2ef/0x400

The failure happens when allocating the spinlock array. Even with GFP_KERNEL it's unlikely for such a large allocation to succeed.

Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition to adding NOWARN for atomic allocations this also makes the lock-array sizing more conservative.

In commit 095dc8e0c3686 ("tcp: fix/cleanup inet_ehash_locks_alloc()"), Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'". IOW, consider the size of a single spinlock when determining the number of locks per cpu. So with 64 bytes per cache line and 4 bytes per spinlock this gives 32 locks per cpu.

Resulting size of the lock-array (sizeof(spinlock) == 4):

cpus:    1    2    4    8   16   32   64
old:    1k   1k   4k   8k  16k  16k  16k
new:   128  256  512   1k   2k   4k   8k

An 8k allocation should have a decent chance of success even with GFP_ATOMIC, and should not fail with GFP_KERNEL.

With 72-byte spinlock (LOCKDEP):

cpus:    1    2
old:    9k  18k
new:   ~2k  ~4k

Reported-by: Sander Eikelenboom
Suggested-by: Thomas Graf
Signed-off-by: Florian Westphal
Signed-off-by: David S. Miller

rhashtable: accept GFP flags in rhashtable_walk_init

2016-04-05T08:56:32Z

In certain cases, the 802.11 mesh pathtable code wants to iterate over all of the entries in the forwarding table from the receive path, which is inside an RCU read-side critical section. Enable walks inside atomic sections by allowing GFP_ATOMIC allocations for the walker state. Change all existing callsites to pass in GFP_KERNEL.

Acked-by: Thomas Graf
Signed-off-by: Bob Copeland
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

2015-12-31T23:20:10Z

rhashtable: Kill harmless RCU warning in rhashtable_walk_init

2015-12-19T04:44:18Z

The commit c6ff5268293ef98e48a99597e765ffc417e39fa5 ("rhashtable: Fix walker list corruption") causes a suspicious RCU usage warning because we no longer hold ht->mutex when we dereference ht->tbl.

However, this is a false positive because we now hold ht->lock, which also guarantees that ht->tbl won't disappear from under us.

This patch kills the warning by using rcu_dereference_protected.

Reported-by: kernel test robot
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

2015-12-18T03:08:28Z

Conflicts:

drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats bump was being removed whilst in 'net-next' the final argument to udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller

rhashtable: Fix walker list corruption

2015-12-16T16:13:14Z

The commit ba7c95ea3870fe7b847466d39a049ab6f156aa2c ("rhashtable: Fix sleeping inside RCU critical section in walk_stop") introduced a new spinlock for the walker list. However, it did not convert all existing users of the list over to the new spin lock. Some continued to use the old mutex for this purpose. This obviously led to corruption of the list.

The fix is to use the spin lock everywhere we touch the list.

This also allows us to do rcu_read_lock before we take the lock in rhashtable_walk_start. With the old mutex this would've deadlocked, but it's safe with the new spin lock.

Fixes: ba7c95ea3870 ("rhashtable: Fix sleeping inside RCU...")
Reported-by: Colin Ian King
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

rhashtable: Enforce minimum size on initial hash table

2015-12-16T15:44:08Z

William Hua wrote:
>
> I wasn't aware there was an enforced minimum size. I simply set the
> nelem_hint in the rhashtable_params struct to 1, expecting it to grow as
> needed. This caused a segfault afterwards when trying to insert an
> element.

OK, we're doing the size computation before we enforce the limit on min_size.

---8<---
We need to do the initial hash table size computation after we have obtained the correct min_size/max_size parameters. Otherwise we may end up with a hash table whose size is outside the allowed envelope.

Fixes: a998f712f77e ("rhashtable: Round up/down min/max_size to...")
Reported-by: William Hua
Signed-off-by: Herbert Xu
Signed-off-by: David S. Miller

rhashtable: Remove unnecessary wmb for future_tbl

2015-12-09T03:46:32Z

The patch 9497df88ab5567daa001829051c5f87161a81ff0 ("rhashtable: Fix reader/rehash race") added a pair of barriers. In fact the wmb is superfluous because every subsequent write to the old or new hash table uses rcu_assign_pointer, which itself carries a full barrier prior to the assignment.

Therefore we may remove the explicit wmb.

Signed-off-by: Herbert Xu
Acked-by: Thomas Graf
Signed-off-by: David S. Miller