<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/futex.c, branch v5.7</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.7</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.7'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-03-30T23:17:15Z</updated>
<entry>
<title>Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-03-30T23:17:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-03-30T23:17:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4b9fd8a829a1eec7442e38afff21d610604de56a'/>
<id>urn:sha1:4b9fd8a829a1eec7442e38afff21d610604de56a</id>
<content type='text'>
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Continued user-access cleanups in the futex code.

   - percpu-rwsem rewrite that uses its own waitqueue and atomic_t
     instead of an embedded rwsem. This addresses a couple of
     weaknesses, but the primary motivation was complications on the -rt
     kernel.

   - Introduce raw lock nesting detection on lockdep
     (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal
     lock differences. This too originates from -rt.

   - Reuse lockdep zapped chain_hlocks entries, to conserve RAM
     footprint on distro-ish kernels running into the "BUG:
     MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep
     chain-entries pool.

   - Misc cleanups, smaller fixes and enhancements - see the changelog
     for details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
  fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
  thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
  Documentation/locking/locktypes: Minor copy editor fixes
  Documentation/locking/locktypes: Further clarifications and wordsmithing
  m68knommu: Remove mm.h include from uaccess_no.h
  x86: get rid of user_atomic_cmpxchg_inatomic()
  generic arch_futex_atomic_op_inuser() doesn't need access_ok()
  x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
  x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
  objtool: whitelist __sanitizer_cov_trace_switch()
  [parisc, s390, sparc64] no need for access_ok() in futex handling
  sh: no need of access_ok() in arch_futex_atomic_op_inuser()
  futex: arch_futex_atomic_op_inuser() calling conventions change
  completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
  lockdep: Add posixtimer context tracing bits
  lockdep: Annotate irq_work
  lockdep: Add hrtimer context tracing bits
  lockdep: Introduce wait-type checks
  completion: Use simple wait queues
  sched/swait: Prepare usage in completions
  ...
</content>
</entry>
<entry>
<title>Merge branch 'uaccess.futex' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into locking/core</title>
<updated>2020-03-28T10:59:24Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-03-28T10:59:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cf226c42b2d66b0f60d18fc2e44e68091fee6cef'/>
<id>urn:sha1:cf226c42b2d66b0f60d18fc2e44e68091fee6cef</id>
<content type='text'>
Pull uaccess futex cleanups for Al Viro:

     Consolidate access_ok() usage and the futex uaccess function zoo.
</content>
</entry>
<entry>
<title>futex: arch_futex_atomic_op_inuser() calling conventions change</title>
<updated>2020-03-28T03:58:51Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2020-02-16T15:17:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a08971e9488d12a10a46eb433612229767b61fd5'/>
<id>urn:sha1:a08971e9488d12a10a46eb433612229767b61fd5</id>
<content type='text'>
Move access_ok() in and pagefault_enable()/pagefault_disable() out.
Mechanical conversion only - some instances don't really need
a separate access_ok() at all (e.g. the ones only using
get_user()/put_user(), or architectures where access_ok()
is always true); we'll deal with that in followups.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>futex: Unbreak futex hashing</title>
<updated>2020-03-09T21:33:09Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-03-08T18:07:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8d67743653dce5a0e7aa500fcccb237cde7ad88e'/>
<id>urn:sha1:8d67743653dce5a0e7aa500fcccb237cde7ad88e</id>
<content type='text'>
The recent futex inode life time fix changed the ordering of the futex key
union struct members, but forgot to adjust the hash function accordingly,

As a result the hashing omits the leading 64bit and even hashes beyond the
futex key causing a bad hash distribution which led to a ~100% performance
regression.

Hand in the futex key pointer instead of a random struct member and make
the size calculation based of the struct offset.

Fixes: 8019ad13ef7f ("futex: Fix inode life-time issue")
Reported-by: Rong Chen &lt;rong.a.chen@intel.com&gt;
Decoded-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Rong Chen &lt;rong.a.chen@intel.com&gt;
Link: https://lkml.kernel.org/r/87h7yy90ve.fsf@nanos.tec.linutronix.de
</content>
</entry>
<entry>
<title>futex: Remove {get,drop}_futex_key_refs()</title>
<updated>2020-03-06T10:06:19Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-03-04T12:24:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4b39f99c222a2aff6a52fddfa6d8d4aef1771737'/>
<id>urn:sha1:4b39f99c222a2aff6a52fddfa6d8d4aef1771737</id>
<content type='text'>
Now that {get,drop}_futex_key_refs() have become a glorified NOP,
remove them entirely.

The only thing get_futex_key_refs() is still doing is an smp_mb(), and
now that we don't need to (ab)use existing atomic ops to obtain them,
we can place it explicitly where we need it.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>futex: Remove pointless mmgrap() + mmdrop()</title>
<updated>2020-03-06T10:06:18Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-03-04T12:02:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=222993395ed38f3751287f4bd82ef46b3eb3a66d'/>
<id>urn:sha1:222993395ed38f3751287f4bd82ef46b3eb3a66d</id>
<content type='text'>
We always set 'key-&gt;private.mm' to 'current-&gt;mm', getting an extra
reference on 'current-&gt;mm' is quite pointless, because as long as the
task is blocked it isn't going to go away.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>futex: Fix inode life-time issue</title>
<updated>2020-03-06T10:06:15Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-03-04T10:28:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8019ad13ef7f64be44d4f892af9c840179009254'/>
<id>urn:sha1:8019ad13ef7f64be44d4f892af9c840179009254</id>
<content type='text'>
As reported by Jann, ihold() does not in fact guarantee inode
persistence. And instead of making it so, replace the usage of inode
pointers with a per boot, machine wide, unique inode identifier.

This sequence number is global, but shared (file backed) futexes are
rare enough that this should not become a performance issue.

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>futex: Fix kernel-doc notation warning</title>
<updated>2020-01-09T12:23:40Z</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2019-12-09T04:26:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=51bfb1d11d6daf095addf9fe8471c20992caae0b'/>
<id>urn:sha1:51bfb1d11d6daf095addf9fe8471c20992caae0b</id>
<content type='text'>
Fix a kernel-doc warning in kernel/futex.c by adding notation
for @ret.

../kernel/futex.c:1187: warning: Function parameter or member 'ret' not described in 'wait_for_owner_exiting'

Fixes: 3ef240eaff36 ("futex: Prevent exit livelock")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/223be78c-f3c8-52df-836d-c5fb8e7907e9@infradead.org

</content>
</entry>
<entry>
<title>futex: Prevent exit livelock</title>
<updated>2019-11-20T08:40:38Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-11-06T21:55:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3ef240eaff36b8119ac9e2ea17cbf41179c930ba'/>
<id>urn:sha1:3ef240eaff36b8119ac9e2ea17cbf41179c930ba</id>
<content type='text'>
Oleg provided the following test case:

int main(void)
{
	struct sched_param sp = {};

	sp.sched_priority = 2;
	assert(sched_setscheduler(0, SCHED_FIFO, &amp;sp) == 0);

	int lock = vfork();
	if (!lock) {
		sp.sched_priority = 1;
		assert(sched_setscheduler(0, SCHED_FIFO, &amp;sp) == 0);
		_exit(0);
	}

	syscall(__NR_futex, &amp;lock, FUTEX_LOCK_PI, 0,0,0);
	return 0;
}

This creates an unkillable RT process spinning in futex_lock_pi() on a UP
machine or if the process is affine to a single CPU. The reason is:

 parent	    	    			child

  set FIFO prio 2

  vfork()			-&gt;	set FIFO prio 1
   implies wait_for_child()	 	sched_setscheduler(...)
 			   		exit()
					do_exit()
 					....
					mm_release()
					  tsk-&gt;futex_state = FUTEX_STATE_EXITING;
					  exit_futex(); (NOOP in this case)
					  complete() --&gt; wakes parent
  sys_futex()
    loop infinite because
    tsk-&gt;futex_state == FUTEX_STATE_EXITING

The same problem can happen just by regular preemption as well:

  task holds futex
  ...
  do_exit()
    tsk-&gt;futex_state = FUTEX_STATE_EXITING;

  --&gt; preemption (unrelated wakeup of some other higher prio task, e.g. timer)

  switch_to(other_task)

  return to user
  sys_futex()
	loop infinite as above

Just for the fun of it the futex exit cleanup could trigger the wakeup
itself before the task sets its futex state to DEAD.

To cure this, the handling of the exiting owner is changed so:

   - A refcount is held on the task

   - The task pointer is stored in a caller visible location

   - The caller drops all locks (hash bucket, mmap_sem) and blocks
     on task::futex_exit_mutex. When the mutex is acquired then
     the exiting task has completed the cleanup and the state
     is consistent and can be reevaluated.

This is not a pretty solution, but there is no choice other than returning
an error code to user space, which would break the state consistency
guarantee and open another can of problems including regressions.

For stable backports the preparatory commits ac31c7ff8624 .. ba31c1a48538
are required as well, but for anything older than 5.3.y the backports are
going to be provided when this hits mainline as the other dependencies for
those kernels are definitely not stable material.

Fixes: 778e9a9c3e71 ("pi-futex: fix exit races and locking problems")
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Stable Team &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20191106224557.041676471@linutronix.de
</content>
</entry>
<entry>
<title>futex: Provide distinct return value when owner is exiting</title>
<updated>2019-11-20T08:40:10Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-11-06T21:55:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ac31c7ff8624409ba3c4901df9237a616c187a5d'/>
<id>urn:sha1:ac31c7ff8624409ba3c4901df9237a616c187a5d</id>
<content type='text'>
attach_to_pi_owner() returns -EAGAIN for various cases:

 - Owner task is exiting
 - Futex value has changed

The caller drops the held locks (hash bucket, mmap_sem) and retries the
operation. In case of the owner task exiting this can result in a live
lock.

As a preparatory step for seperating those cases, provide a distinct return
value (EBUSY) for the owner exiting case.

No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20191106224556.935606117@linutronix.de


</content>
</entry>
</feed>
