<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel, branch v3.18</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.18</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.18'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-12-04T04:55:58Z</updated>
<entry>
<title>context_tracking: Restore previous state in schedule_user</title>
<updated>2014-12-04T04:55:58Z</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2014-12-03T23:37:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7cc78f8fa02c2485104b86434acbc1538a3bd807'/>
<id>urn:sha1:7cc78f8fa02c2485104b86434acbc1538a3bd807</id>
<content type='text'>
It appears that some SCHEDULE_USER (asm for schedule_user) callers
in arch/x86/kernel/entry_64.S are called from RCU kernel context,
and schedule_user will return in RCU user context.  This causes RCU
warnings and possible failures.

This is intended to be a minimal fix suitable for 3.18.

Reported-and-tested-by: Dave Jones &lt;davej@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Frédéric Weisbecker &lt;fweisbec@gmail.com&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME</title>
<updated>2014-11-23T22:25:28Z</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2014-11-21T21:26:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=82975bc6a6df743b9a01810fb32cb65d0ec5d60b'/>
<id>urn:sha1:82975bc6a6df743b9a01810fb32cb65d0ec5d60b</id>
<content type='text'>
x86 calls do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns.  I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug.  With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.

Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Acked-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Provide update_curr callbacks for stop/idle scheduling classes</title>
<updated>2014-11-23T22:14:40Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-11-23T22:04:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=90e362f4a75d0911ca75e5cd95591a6cf1f169dc'/>
<id>urn:sha1:90e362f4a75d0911ca75e5cd95591a6cf1f169dc</id>
<content type='text'>
Chris bisected a NULL pointer dereference in task_sched_runtime() to
commit 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime()
inconsistency'.

Chris observed crashes in atop or other /proc walking programs when he
started fork bombs on his machine.  He assumed that this is a new exit
race, but that does not make any sense when looking at that commit.

What's interesting is that the commit provides update_curr callbacks
for all scheduling classes except stop_task and idle_task.

While nothing can ever hit that via the clock_nanosleep() and
clock_gettime() interfaces, which have been the target of the commit in
question, the author obviously forgot that there are other code paths
which invoke task_sched_runtime():

do_task_stat()
 thread_group_cputime_adjusted()
   thread_group_cputime()
     task_cputime()
       task_sched_runtime()
        if (task_current(rq, p) &amp;&amp; task_on_rq_queued(p)) {
          update_rq_clock(rq);
          p-&gt;sched_class-&gt;update_curr(rq);
        }

If the stats are read for a stop machine task, aka 'migration/N' and
that task is current on its cpu, this will happily call the NULL pointer
of stop_task-&gt;update_curr.  Ooops.

Chris's observation that this happens faster when he runs the fork bomb
makes sense, as the fork bomb will kick migration threads more often, so
the probability of hitting the issue will increase.

Add the missing update_curr callbacks to the scheduler classes stop_task
and idle_task.  While idle tasks cannot be monitored via /proc we have
other means to hit the idle case.
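
As a toy illustration of the failure mode (a Python sketch, not kernel code;
the SchedClass type and the empty callback are assumptions made for the
example), a class that leaves update_curr unset breaks the unconditional
call, and a do-nothing callback is enough to make the call safe:

```python
class SchedClass:
    """Hypothetical stand-in for a scheduling class with one optional callback."""
    def __init__(self, name, update_curr=None):
        self.name = name
        self.update_curr = update_curr

def task_sched_runtime(sched_class, rq):
    # Mirrors the call chain above: the callback is invoked without a NULL check.
    sched_class.update_curr(rq)
    return "ok"

def update_curr_noop(rq):
    # Assumed fix for the toy model: a callback that has nothing to update.
    pass

broken = SchedClass("stop_task")
fixed = SchedClass("stop_task", update_curr=update_curr_noop)

try:
    task_sched_runtime(broken, rq=None)
    outcome = "no crash"
except TypeError:
    outcome = "crash"  # calling None is the Python analogue of the NULL call

print(outcome, task_sched_runtime(fixed, rq=None))
```

In this model the unset callback only fails when the affected class is the
one being queried, which matches the description of the crash depending on
which task happens to be current.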

Fixes: 6e998916dfe3 'sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency'
Reported-by: Chris Mason &lt;clm@fb.com&gt;
Reported-and-tested-by: Borislav Petkov &lt;bp@alien8.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-11-21T23:44:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-11-21T23:44:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8b2ed21e846c63d8f1bdee0d8df0645721a604a1'/>
<id>urn:sha1:8b2ed21e846c63d8f1bdee0d8df0645721a604a1</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "Misc fixes: two NUMA fixes, two cputime fixes and an RCU/lockdep fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency
  sched/cputime: Fix cpu_timer_sample_group() double accounting
  sched/numa: Avoid selecting oneself as swap target
  sched/numa: Fix out of bounds read in sched_init_numa()
  sched: Remove lockdep check in sched_move_task()
</content>
</entry>
<entry>
<title>Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2014-11-21T23:44:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-11-21T23:44:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=13f5004c94785af107dd702d9fbbe160f1004064'/>
<id>urn:sha1:13f5004c94785af107dd702d9fbbe160f1004064</id>
<content type='text'>
Pull perf fixes from Ingo Molnar:
 "Misc fixes: two Intel uncore driver fixes, a CPU-hotplug fix and a
  build dependencies fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
  perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
  perf: Fix corruption of sibling list with hotplug
  perf/x86: Fix embarrasing typo
</content>
</entry>
<entry>
<title>sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency</title>
<updated>2014-11-16T09:04:20Z</updated>
<author>
<name>Stanislaw Gruszka</name>
<email>sgruszka@redhat.com</email>
</author>
<published>2014-11-12T15:58:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6e998916dfe327e785e7c2447959b2c1a3ea4930'/>
<id>urn:sha1:6e998916dfe327e785e7c2447959b2c1a3ea4930</id>
<content type='text'>
Commit d670ec13178d0 "posix-cpu-timers: Cure SMP wobbles" fixes one glibc
test case at the cost of breaking another one. After that commit, calling
clock_nanosleep(TIMER_ABSTIME, X) and then clock_gettime(&amp;Y) can result
in Y being smaller than X.

The reproducer/tester can be found further below; it can be compiled and run with:

	gcc -o tst-cpuclock2 tst-cpuclock2.c -pthread
	while ./tst-cpuclock2 ; do : ; done

This reproducer, when running on a buggy kernel, will complain
about "clock_gettime difference too small".

The issue happens because, on start in thread_group_cputimer(), we initialize
the cputimer's sum_exec_runtime with thread runtime that is not yet accounted,
and then add that runtime to the running cputimer again on the scheduler
tick, making its sum_exec_runtime bigger than the actual thread runtime.

KOSAKI Motohiro posted a fix for this problem, but that patch was never
applied: https://lkml.org/lkml/2013/5/26/191 .

This patch takes a different approach to cure the problem. It calls
update_curr() when the cputimer starts, which ensures that the stats of
running threads are up to date and that the next scheduler tick accounts
only the runtime that elapsed since the cputimer started. That also ensures
a consistent state between the CPU times of individual threads and the CPU
time of the process consisting of those threads.
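
The accounting problem can be sketched with a toy model (Python, not kernel
code; the Thread class, the run() helper and all numbers are assumptions
made for the illustration). Starting the timer from a runtime that includes
a not yet accounted delta, and then adding that delta again on the tick,
leaves the timer ahead of the thread; flushing the delta first keeps both
values equal:

```python
class Thread:
    """Hypothetical model of one running thread's runtime bookkeeping."""
    def __init__(self, accounted_ns, pending_ns):
        self.accounted_ns = accounted_ns  # runtime already folded into the stats
        self.pending_ns = pending_ns      # runtime executed but not yet accounted

    def update_curr(self):
        # Fold the pending delta into the accounted runtime and return it.
        delta = self.pending_ns
        self.accounted_ns += delta
        self.pending_ns = 0
        return delta

def run(flush_on_start, elapsed_ns):
    thread = Thread(accounted_ns=1_000_000, pending_ns=200_000)
    if flush_on_start:
        thread.update_curr()
    # The timer starts from the thread's runtime, pending delta included.
    timer_ns = thread.accounted_ns + thread.pending_ns
    # Scheduler tick: more time has elapsed, and the whole delta hits the timer.
    thread.pending_ns += elapsed_ns
    timer_ns += thread.update_curr()
    return timer_ns, thread.accounted_ns

print(run(flush_on_start=False, elapsed_ns=50_000))
print(run(flush_on_start=True, elapsed_ns=50_000))
```

Without the flush the timer ends 200000 ns ahead of the thread's runtime,
which is exactly the delta that was pending when the timer started.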

Full reproducer (tst-cpuclock2.c):

	#define _GNU_SOURCE
	#include &lt;unistd.h&gt;
	#include &lt;sys/syscall.h&gt;
	#include &lt;stdio.h&gt;
	#include &lt;time.h&gt;
	#include &lt;pthread.h&gt;
	#include &lt;stdint.h&gt;
	#include &lt;inttypes.h&gt;

	/* Parameters for the Linux kernel ABI for CPU clocks.  */
	#define CPUCLOCK_SCHED          2
	#define MAKE_PROCESS_CPUCLOCK(pid, clock) \
		((~(clockid_t) (pid) &lt;&lt; 3) | (clockid_t) (clock))

	static pthread_barrier_t barrier;

	/* Help advance the clock.  */
	static void *chew_cpu(void *arg)
	{
		pthread_barrier_wait(&amp;barrier);
		while (1) ;

		return NULL;
	}

	/* Don't use the glibc wrapper.  */
	static int do_nanosleep(int flags, const struct timespec *req)
	{
		clockid_t clock_id = MAKE_PROCESS_CPUCLOCK(0, CPUCLOCK_SCHED);

		return syscall(SYS_clock_nanosleep, clock_id, flags, req, NULL);
	}

	static int64_t tsdiff(const struct timespec *before, const struct timespec *after)
	{
		int64_t before_i = before-&gt;tv_sec * 1000000000ULL + before-&gt;tv_nsec;
		int64_t after_i = after-&gt;tv_sec * 1000000000ULL + after-&gt;tv_nsec;

		return after_i - before_i;
	}

	int main(void)
	{
		int result = 0;
		pthread_t th;

		pthread_barrier_init(&amp;barrier, NULL, 2);

		if (pthread_create(&amp;th, NULL, chew_cpu, NULL) != 0) {
			perror("pthread_create");
			return 1;
		}

		pthread_barrier_wait(&amp;barrier);

		/* The test.  */
		struct timespec before, after, sleeptimeabs;
		int64_t sleepdiff, diffabs;
		const struct timespec sleeptime = {.tv_sec = 0,.tv_nsec = 100000000 };

		/* The relative nanosleep.  Not sure why this is needed, but its presence
		   seems to make it easier to reproduce the problem.  */
		if (do_nanosleep(0, &amp;sleeptime) != 0) {
			perror("clock_nanosleep");
			return 1;
		}

		/* Get the current time.  */
		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &amp;before) &lt; 0) {
			perror("clock_gettime[2]");
			return 1;
		}

		/* Compute the absolute sleep time based on the current time.  */
		uint64_t nsec = before.tv_nsec + sleeptime.tv_nsec;
		sleeptimeabs.tv_sec = before.tv_sec + nsec / 1000000000;
		sleeptimeabs.tv_nsec = nsec % 1000000000;

		/* Sleep for the computed time.  */
		if (do_nanosleep(TIMER_ABSTIME, &amp;sleeptimeabs) != 0) {
			perror("absolute clock_nanosleep");
			return 1;
		}

		/* Get the time after the sleep.  */
		if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &amp;after) &lt; 0) {
			perror("clock_gettime[3]");
			return 1;
		}

		/* The time after sleep should always be equal to or after the absolute sleep
		   time passed to clock_nanosleep.  */
		sleepdiff = tsdiff(&amp;sleeptimeabs, &amp;after);
		if (sleepdiff &lt; 0) {
			printf("absolute clock_nanosleep woke too early: %" PRId64 "\n", sleepdiff);
			result = 1;

			printf("Before %lld.%09ld\n", (long long) before.tv_sec, before.tv_nsec);
			printf("After  %lld.%09ld\n", (long long) after.tv_sec, after.tv_nsec);
			printf("Sleep  %lld.%09ld\n", (long long) sleeptimeabs.tv_sec, sleeptimeabs.tv_nsec);
		}

		/* The difference between the timestamps taken before and after the
		   clock_nanosleep call should be equal to or more than the duration of the
		   sleep.  */
		diffabs = tsdiff(&amp;before, &amp;after);
		if (diffabs &lt; sleeptime.tv_nsec) {
			printf("clock_gettime difference too small: %" PRId64 "\n", diffabs);
			result = 1;
		}

		pthread_cancel(th);

		return result;
	}

Signed-off-by: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20141112155843.GA24803@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/cputime: Fix cpu_timer_sample_group() double accounting</title>
<updated>2014-11-16T09:04:18Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-11-12T11:37:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=23cfa361f3e54a3e184a5e126bbbdd95f984881a'/>
<id>urn:sha1:23cfa361f3e54a3e184a5e126bbbdd95f984881a</id>
<content type='text'>
While looking over the cpu-timer code I found that we appear to add
the delta for the calling task twice, through:

  cpu_timer_sample_group()
    thread_group_cputimer()
      thread_group_cputime()
        times-&gt;sum_exec_runtime += task_sched_runtime();

    *sample = cputime.sum_exec_runtime + task_delta_exec();

Which would make the sample run ahead, making the sleep short.
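
A toy model of the double addition (Python, not kernel code; the function
names reuse the ones above purely as labels, and the numbers are made up
for the illustration):

```python
def task_sched_runtime(accounted_ns, pending_delta_ns):
    # Accounted runtime plus the not yet accounted delta of the running task.
    return accounted_ns + pending_delta_ns

def sample_double_counted(accounted_ns, pending_delta_ns):
    group_total = task_sched_runtime(accounted_ns, pending_delta_ns)
    # The delta is already part of group_total, so this adds it a second time.
    return group_total + pending_delta_ns

def sample_single_counted(accounted_ns, pending_delta_ns):
    return task_sched_runtime(accounted_ns, pending_delta_ns)

accounted, delta = 5_000_000, 300_000
ahead_ns = (sample_double_counted(accounted, delta)
            - sample_single_counted(accounted, delta))
print(ahead_ns)
```

In this model the sample runs ahead by exactly the calling task's pending
delta, so an absolute sleep measured against it would end that much early.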

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Stanislaw Gruszka &lt;sgruszka@redhat.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/numa: Avoid selecting oneself as swap target</title>
<updated>2014-11-16T09:04:17Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-11-10T09:54:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7af683350cb0ddd0e9d3819b4eb7abe9e2d3e709'/>
<id>urn:sha1:7af683350cb0ddd0e9d3819b4eb7abe9e2d3e709</id>
<content type='text'>
Because the whole numa task selection stuff runs with preemption
enabled (it's long and expensive) we can end up migrating and selecting
oneself as a swap target. This doesn't really work out well -- we end
up trying to acquire the same lock twice for the swap migrate -- so
avoid this.

Reported-and-Tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20141110100328.GF29390@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Fix corruption of sibling list with hotplug</title>
<updated>2014-11-16T08:45:46Z</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2014-11-05T16:11:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=226424eee809251ec23bd4b09d8efba09c10fc3c'/>
<id>urn:sha1:226424eee809251ec23bd4b09d8efba09c10fc3c</id>
<content type='text'>
When a CPU is hotplugged out, we call perf_remove_from_context() (via
perf_event_exit_cpu()) to rip each CPU-bound event out of its PMU's cpu
context, but leave siblings grouped together. Freeing of these events is
left to the mercy of the usual refcounting.

When a CPU-bound event's refcount drops to zero we cross-call to
__perf_remove_from_context() to clean it up, detaching grouped siblings.

This works when the relevant CPU is online, but will fail if the CPU is
currently offline, and we won't detach the event from its siblings
before freeing the event, leaving the sibling list corrupt. If the
sibling list is later walked (e.g. because the CPU came online again
before a remaining sibling's refcount drops to zero), we will walk the
now corrupted siblings list, potentially dereferencing garbage values.

Given that the events should never be scheduled again (as we removed
them from their context), we can simply detach siblings when the CPU
goes down in the first place. If the CPU comes back online, the
redundant call to __perf_remove_from_context() is safe.

Reported-by: Drew Richardson &lt;drew.richardson@arm.com&gt;
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: vincent.weaver@maine.edu
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1415203904-25308-2-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'pm+acpi-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm</title>
<updated>2014-11-14T21:38:02Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-11-14T21:38:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=78646f62dbaba99ab0ae50afbe8c2d1876cb4d33'/>
<id>urn:sha1:78646f62dbaba99ab0ae50afbe8c2d1876cb4d33</id>
<content type='text'>
Pull ACPI and power management fixes from Rafael Wysocki:
 "These are three regression fixes, two recent (generic power domains,
  suspend-to-idle) and one older (cpufreq), an ACPI blacklist entry for
  one more machine having problems with Windows 8 compatibility, a minor
  cpufreq driver fix (cpufreq-dt) and a fixup for new callback
  definitions (generic power domains).

  Specifics:

   - Fix a crash in the suspend-to-idle code path introduced by a recent
     commit that forgot to check a pointer against NULL before
     dereferencing it (Dmitry Eremin-Solenikov).

   - Fix a boot crash on Exynos5 introduced by a recent commit making
     that platform use generic Device Tree bindings for power domains
     which exposed a weakness in the generic power domains framework
     leading to that crash (Ulf Hansson).

   - Fix a crash during system resume on systems where cpufreq depends
     on Operating Performance Points (OPP) for functionality, but
     CONFIG_OPP is not set.  This leads the cpufreq driver registration
     to fail, but the resume code attempts to restore the pre-suspend
     cpufreq configuration (which does not exist) nevertheless and
     crashes.  From Geert Uytterhoeven.

   - Add a new ACPI blacklist entry for Dell Vostro 3546 that has
     problems if it is reported as Windows 8 compatible to the BIOS
     (Adam Lee).

   - Fix swapped arguments in an error message in the cpufreq-dt driver
     (Abhilash Kesavan).

   - Fix up the prototypes of new callbacks in struct generic_pm_domain
     to make them more useful.  Users of those callbacks will be added
     in 3.19 and it's better for them to be based on the correct struct
     definition in mainline from the start.  From Ulf Hansson and Kevin
     Hilman"

* tag 'pm+acpi-3.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Domains: Fix initial default state of the need_restore flag
  PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is set
  PM / Domains: Change prototype for the attach and detach callbacks
  cpufreq: Avoid crash in resume on SMP without OPP
  cpufreq: cpufreq-dt: Fix arguments in clock failure error message
  ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546
</content>
</entry>
</feed>
