<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel, branch v3.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.1</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2011-10-18T09:36:59Z</updated>
<entry>
<title>cputimer: Cure lock inversion</title>
<updated>2011-10-18T09:36:59Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-10-17T09:50:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bcd5cff7216f9b2de0a148cc355eac199dc6f1cf'/>
<id>urn:sha1:bcd5cff7216f9b2de0a148cc355eac199dc6f1cf</id>
<content type='text'>
There's a lock inversion between the cputimer-&gt;lock and rq-&gt;lock;
notably the two callchains involved are:

 update_rlimit_cpu()
   sighand-&gt;siglock
   set_process_cpu_timer()
     cpu_timer_sample_group()
       thread_group_cputimer()
         cputimer-&gt;lock
         thread_group_cputime()
           task_sched_runtime()
             -&gt;pi_lock
             rq-&gt;lock

 scheduler_tick()
   rq-&gt;lock
   task_tick_fair()
     update_curr()
       account_group_exec()
         cputimer-&gt;lock

Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
the second one is keeping up-to-date.

This problem was introduced by e8abccb7193 ("posix-cpu-timers: Cure
SMP accounting oddities").

Cure the problem by removing the cputimer-&gt;lock and rq-&gt;lock nesting,
this leaves concurrent enablers doing duplicate work, but the time
wasted should be on the same order otherwise wasted spinning on the
lock and the greater-than assignment filter should ensure we preserve
monotonicity.

Reported-by: Dave Jones &lt;davej@redhat.com&gt;
Reported-by: Simon Kirby &lt;sim@hostway.ca&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: stable@kernel.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>Avoid using variable-length arrays in kernel/sys.c</title>
<updated>2011-10-17T15:24:24Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-10-17T15:24:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a84a79e4d369a73c0130b5858199e949432da4c6'/>
<id>urn:sha1:a84a79e4d369a73c0130b5858199e949432da4c6</id>
<content type='text'>
The size is always valid, but variable-length arrays generate worse code
for no good reason (unless the function happens to be inlined and the
compiler sees the length for the simple constant it is).

Also, there seems to be some code generation problem on POWER, where
Henrik Bakken reports that register r28 can get corrupted under some
subtle circumstances (interrupt happening at the wrong time?).  That all
indicates some seriously broken compiler issues, but since variable
length arrays are bad regardless, there's little point in trying to
chase it down.

"Just don't do that, then".

Reported-by: Henrik Grindal Bakken &lt;henribak@cisco.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip</title>
<updated>2011-10-01T15:37:25Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-10-01T15:37:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f72a209a3e694ecb8d3ceed4671d98c4364e00e3'/>
<id>urn:sha1:f72a209a3e694ecb8d3ceed4671d98c4364e00e3</id>
<content type='text'>
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  irq: Fix check for already initialized irq_domain in irq_domain_add
  irq: Add declaration of irq_domain_simple_ops to irqdomain.h

* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86/rtc: Don't recursively acquire rtc_lock

* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  posix-cpu-timers: Cure SMP wobbles
  sched: Fix up wchan borkage
  sched/rt: Migrate equal priority tasks to available CPUs
</content>
</entry>
<entry>
<title>posix-cpu-timers: Cure SMP wobbles</title>
<updated>2011-09-30T12:07:06Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-09-01T10:42:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d670ec13178d0fd8680e6742a2bc6e04f28f87d8'/>
<id>urn:sha1:d670ec13178d0fd8680e6742a2bc6e04f28f87d8</id>
<content type='text'>
David reported:

  Attached below is a watered-down version of rt/tst-cpuclock2.c from
  GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
  similar.

  Run it several times, and you will see cases where the main thread
  will measure a process clock difference before and after the nanosleep
  which is smaller than the cpu-burner thread's individual thread clock
  difference.  This doesn't make any sense since the cpu-burner thread
  is part of the top-level process's thread group.

  I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
  64-bit binaries).

  For example:

  [davem@boricha build-x86_64-linux]$ ./test
  process: before(0.001221967) after(0.498624371) diff(497402404)
  thread:  before(0.000081692) after(0.498316431) diff(498234739)
  self:    before(0.001223521) after(0.001240219) diff(16698)
  [davem@boricha build-x86_64-linux]$ 

  The diff of 'process' should always be &gt;= the diff of 'thread'.

  I make sure to wrap the 'thread' clock measurements the most tightly
  around the nanosleep() call, and that the 'process' clock measurements
  are the outer-most ones.

  ---
  #include &lt;unistd.h&gt;
  #include &lt;stdio.h&gt;
  #include &lt;stdlib.h&gt;
  #include &lt;time.h&gt;
  #include &lt;fcntl.h&gt;
  #include &lt;string.h&gt;
  #include &lt;errno.h&gt;
  #include &lt;pthread.h&gt;

  static pthread_barrier_t barrier;

  static void *chew_cpu(void *arg)
  {
	  pthread_barrier_wait(&amp;barrier);
	  while (1)
		  __asm__ __volatile__("" : : : "memory");
	  return NULL;
  }

  int main(void)
  {
	  clockid_t process_clock, my_thread_clock, th_clock;
	  struct timespec process_before, process_after;
	  struct timespec me_before, me_after;
	  struct timespec th_before, th_after;
	  struct timespec sleeptime;
	  unsigned long diff;
	  pthread_t th;
	  int err;

	  err = clock_getcpuclockid(0, &amp;process_clock);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(pthread_self(), &amp;my_thread_clock);
	  if (err)
		  return 1;

	  pthread_barrier_init(&amp;barrier, NULL, 2);
	  err = pthread_create(&amp;th, NULL, chew_cpu, NULL);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(th, &amp;th_clock);
	  if (err)
		  return 1;

	  pthread_barrier_wait(&amp;barrier);

	  err = clock_gettime(process_clock, &amp;process_before);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &amp;me_before);
	  if (err)
		  return 1;

	  err = clock_gettime(th_clock, &amp;th_before);
	  if (err)
		  return 1;

	  sleeptime.tv_sec = 0;
	  sleeptime.tv_nsec = 500000000;
	  nanosleep(&amp;sleeptime, NULL);

	  err = clock_gettime(th_clock, &amp;th_after);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &amp;me_after);
	  if (err)
		  return 1;

	  err = clock_gettime(process_clock, &amp;process_after);
	  if (err)
		  return 1;

	  diff = process_after.tv_nsec - process_before.tv_nsec;
	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 process_before.tv_sec, process_before.tv_nsec,
		 process_after.tv_sec, process_after.tv_nsec, diff);
	  diff = th_after.tv_nsec - th_before.tv_nsec;
	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 th_before.tv_sec, th_before.tv_nsec,
		 th_after.tv_sec, th_after.tv_nsec, diff);
	  diff = me_after.tv_nsec - me_before.tv_nsec;
	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 me_before.tv_sec, me_before.tv_nsec,
		 me_after.tv_sec, me_after.tv_nsec, diff);

	  return 0;
  }

This is due to us using p-&gt;se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Aside of that it makes the function safe on 32 bit systems. The old
code added t-&gt;se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.

Reported-by: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>Resource: fix wrong resource window calculation</title>
<updated>2011-09-30T03:04:34Z</updated>
<author>
<name>Ram Pai</name>
<email>linuxram@us.ibm.com</email>
</author>
<published>2011-09-22T07:48:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=47ea91b4052d9e94b9dca5d7a3d947fbebd07ba9'/>
<id>urn:sha1:47ea91b4052d9e94b9dca5d7a3d947fbebd07ba9</id>
<content type='text'>
__find_resource() incorrectly returns a resource window which overlaps
an existing allocated window.  This happens when the parent's
resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
to all its children resource-windows.

__find_resource() looks for gaps in resource allocation among the
children resource windows.  When it encounters the last child window it
blindly tries the range next to one allocated to the last child.  Since
the last child's window ends at 0xffffffff the calculation overflows,
leading the algorithm to believe that any window in the range 0x0000000
to 0xfffffff is available for allocation.  This leads to a conflicting
window allocation.

Michal Ludvig reported this issue seen on his platform.  The following
patch fixes the problem and has been verified by Michal.  I believe this
bug has been there for ages.  It got exposed by git commit 2bbc6942273b
("PCI : ability to relocate assigned pci-resources")

Signed-off-by: Ram Pai &lt;linuxram@us.ibm.com&gt;
Tested-by: Michal Ludvig &lt;mludvig@logix.net.nz&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix up wchan borkage</title>
<updated>2011-09-26T10:51:08Z</updated>
<author>
<name>Simon Kirby</name>
<email>sim@hostway.ca</email>
</author>
<published>2011-09-23T00:03:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6ebbe7a07b3bc40b168d2afc569a6543c020d2e3'/>
<id>urn:sha1:6ebbe7a07b3bc40b168d2afc569a6543c020d2e3</id>
<content type='text'>
Commit c259e01a1ec ("sched: Separate the scheduler entry for
preemption") contained a boo-boo wrecking wchan output. It forgot to
put the new schedule() function in the __sched section and thereby
doesn't get properly ignored for things like wchan.

Tested-by: Simon Kirby &lt;sim@hostway.ca&gt;
Cc: stable@kernel.org # 2.6.39+
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/20110923000346.GA25425@hostway.ca
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>ptrace: PTRACE_LISTEN forgets to unlock -&gt;siglock</title>
<updated>2011-09-25T18:02:00Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2011-09-25T17:46:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f9d81f61c84aca693bc353dfef4b8c36c2e5e1b5'/>
<id>urn:sha1:f9d81f61c84aca693bc353dfef4b8c36c2e5e1b5</id>
<content type='text'>
If PTRACE_LISTEN fails after lock_task_sighand() it doesn't drop -&gt;siglock.

Reported-by: Matt Fleming &lt;matt.fleming@intel.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>irq: Fix check for already initialized irq_domain in irq_domain_add</title>
<updated>2011-09-20T10:16:22Z</updated>
<author>
<name>Rob Herring</name>
<email>robherring2@gmail.com</email>
</author>
<published>2011-09-14T16:31:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=eef24afb28561a5a9f4be8f8da97735b7e6a826f'/>
<id>urn:sha1:eef24afb28561a5a9f4be8f8da97735b7e6a826f</id>
<content type='text'>
The sanity check in irq_domain_add() tests desc-&gt;irq_data != NULL or
irq_data-&gt;domain != NULL. This prevents adding an irq_domain to a irq
descriptor when irq_data exists, which true when the irq descriptor
exists.

This went unnoticed so far as the simple domain code did not enter
this code path because domain-&gt;nr_irqs is always 0 for the simple domains.

Split the check for irq_data == NULL out and have a separate warning
for it.

[ tglx: Made the check for irq_data == NULL separate ]

Signed-off-by: Rob Herring &lt;rob.herring@calxeda.com&gt;
Cc: Grant Likely &lt;grant.likely@secretlab.ca&gt;
Cc: marc.zyngier@arm.com
Cc: thomas.abraham@linaro.org
Cc: jamie@jamieiles.com
Cc: b-cousson@ti.com
Cc: shawn.guo@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: devicetree-discuss@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1316017900-19918-3-git-send-email-robherring2@gmail.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>Merge branch 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip</title>
<updated>2011-09-20T00:23:41Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-09-20T00:23:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9d037a777695993ec7437e5f451647dea7919d4c'/>
<id>urn:sha1:9d037a777695993ec7437e5f451647dea7919d4c</id>
<content type='text'>
* 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86, iommu: Mark DMAR IRQ as non-threaded
  genirq: Make irq_shutdown() symmetric vs. irq_startup again
</content>
</entry>
<entry>
<title>Make taskstats round statistics down to nearest 1k bytes/events</title>
<updated>2011-09-20T00:10:57Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-09-20T00:10:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=58c3c3aa01b455ecb99d61ce73f1444274af696b'/>
<id>urn:sha1:58c3c3aa01b455ecb99d61ce73f1444274af696b</id>
<content type='text'>
Even with just the interface limited to admin, there really is little to
reason to give byte-per-byte counts for taskstats.  So round it down to
something less intrusive.

Acked-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
