<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/watchdog.c, branch v3.6</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.6</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.6'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2012-08-08T18:49:45Z</updated>
<entry>
<title>Revert "NMI watchdog: fix for lockup detector breakage on resume"</title>
<updated>2012-08-08T18:49:45Z</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rjw@sisk.pl</email>
</author>
<published>2012-08-07T11:50:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=300d3739e873d50d4c6e3656f89007a217fb1d29'/>
<id>urn:sha1:300d3739e873d50d4c6e3656f89007a217fb1d29</id>
<content type='text'>
Revert commit 45226e9 (NMI watchdog: fix for lockup detector breakage
on resume) which breaks resume from system suspend on my SH7372
Mackerel board (by causing a NULL pointer dereference to happen) and
is generally wrong, because it abuses the CPU hotplug functionality
in a shamelessly blatant way.

The original issue should be addressed through appropriate syscore
resume callback instead.

Signed-off-by: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
</content>
</entry>
<entry>
<title>NMI watchdog: fix for lockup detector breakage on resume</title>
<updated>2012-07-31T00:25:13Z</updated>
<author>
<name>Sameer Nanda</name>
<email>snanda@chromium.org</email>
</author>
<published>2012-07-30T21:40:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=45226e944ce071d0231949f2fea90969437cd2dc'/>
<id>urn:sha1:45226e944ce071d0231949f2fea90969437cd2dc</id>
<content type='text'>
On the suspend/resume path the boot CPU does not go though an
offline-&gt;online transition.  This breaks the NMI detector post-resume
since it depends on PMU state that is lost when the system gets
suspended.

Fix this by forcing a CPU offline-&gt;online transition for the lockup
detector on the boot CPU during resume.

To provide more context, we enable NMI watchdog on Chrome OS.  We have
seen several reports of systems freezing up completely which indicated
that the NMI watchdog was not firing for some reason.

Debugging further, we found a simple way of repro'ing system freezes --
issuing the command 'tasket 1 sh -c "echo nmilockup &gt; /proc/breakme"'
after the system has been suspended/resumed one or more times.

With this patch in place, the system freeze result in panics, as
expected.

These panics provide a nice stack trace for us to debug the actual issue
causing the freeze.

[akpm@linux-foundation.org: fiddle with code comment]
[akpm@linux-foundation.org: make lockup_detector_bootcpu_resume() conditional on CONFIG_SUSPEND]
[akpm@linux-foundation.org: fix section errors]
Signed-off-by: Sameer Nanda &lt;snanda@chromium.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Cc: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Mandeep Singh Baines &lt;msb@chromium.org&gt;
Cc: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Cc: Anshuman Khandual &lt;khandual@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>watchdog: Quiet down the boot messages</title>
<updated>2012-06-14T10:20:50Z</updated>
<author>
<name>Don Zickus</name>
<email>dzickus@redhat.com</email>
</author>
<published>2012-06-13T13:35:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a70270468234749741c5893ae78e5bb524771402'/>
<id>urn:sha1:a70270468234749741c5893ae78e5bb524771402</id>
<content type='text'>
A bunch of bugzillas have complained how noisy the nmi_watchdog
is during boot-up especially with its expected failure cases
(like virt and bios resource contention).

This is my attempt to quiet them down and keep it less confusing
for the end user.  What I did is print the message for cpu0 and
save it for future comparisons.  If future cpus have an
identical message as cpu0, then don't print the redundant info.
However, if a future cpu has a different message, happily print
that loudly.

Before the change, you would see something like:

    ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
    CPU0: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz stepping 0a
    Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
    ... version:                2
    ... bit width:              40
    ... generic registers:      2
    ... value mask:             000000ffffffffff
    ... max period:             000000007fffffff
    ... fixed-purpose events:   3
    ... event mask:             0000000700000003
    NMI watchdog enabled, takes one hw-pmu counter.
    Booting Node   0, Processors  #1
    NMI watchdog enabled, takes one hw-pmu counter.
     #2
    NMI watchdog enabled, takes one hw-pmu counter.
     #3 Ok.
    NMI watchdog enabled, takes one hw-pmu counter.
    Brought up 4 CPUs
    Total of 4 processors activated (22607.24 BogoMIPS).

After the change, it is simplified to:

    ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
    CPU0: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz stepping 0a
    Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
    ... version:                2
    ... bit width:              40
    ... generic registers:      2
    ... value mask:             000000ffffffffff
    ... max period:             000000007fffffff
    ... fixed-purpose events:   3
    ... event mask:             0000000700000003
    NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
    Booting Node   0, Processors  #1 #2 #3 Ok.
    Brought up 4 CPUs

V2: little changes based on Joe Perches' feedback
V3: printk cleanup based on Ingo's feedback; checkpatch fix
V4: keep printk as one long line
V5: Ingo fix ups

Reported-and-tested-by: Nathan Zimmer &lt;nzimmer@sgi.com&gt;
Signed-off-by: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: nzimmer@sgi.com
Cc: joe@perches.com
Link: http://lkml.kernel.org/r/1339594548-17227-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>watchdog: add check for suspended vm in softlockup detector</title>
<updated>2012-04-08T09:49:03Z</updated>
<author>
<name>Eric B Munson</name>
<email>emunson@mgebm.net</email>
</author>
<published>2012-03-10T19:37:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5d1c0f4a80a6df73395fb3fc2c302510f8f09d36'/>
<id>urn:sha1:5d1c0f4a80a6df73395fb3fc2c302510f8f09d36</id>
<content type='text'>
A suspended VM can cause spurious soft lockup warnings.  To avoid these, the
watchdog now checks if the kernel knows it was stopped by the host and skips
the warning if so.  When the watchdog is reset successfully, clear the guest
paused flag.

Signed-off-by: Eric B Munson &lt;emunson@mgebm.net&gt;
Signed-off-by: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Signed-off-by: Avi Kivity &lt;avi@redhat.com&gt;

</content>
</entry>
<entry>
<title>kernel/watchdog.c: add comment to watchdog() exit path</title>
<updated>2012-03-23T23:58:32Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2012-03-23T22:01:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b60f796c4ca72545327a069f12938360d833cce7'/>
<id>urn:sha1:b60f796c4ca72545327a069f12938360d833cce7</id>
<content type='text'>
Revelation from Peter.

Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@tglx.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/watchdog.c: convert to pr_foo()</title>
<updated>2012-03-23T23:58:32Z</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2012-03-23T22:01:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4501980aae221ed8120dee3491f799ecd75187ad'/>
<id>urn:sha1:4501980aae221ed8120dee3491f799ecd75187ad</id>
<content type='text'>
It fixes some 80-col wordwrappings and adds some consistency.

Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>watchdog: make sure the watchdog thread gets CPU on loaded system</title>
<updated>2012-03-23T23:58:32Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2012-03-23T22:01:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7a05c0f7bbae91d08b7d0acf016fdb42dbc912ae'/>
<id>urn:sha1:7a05c0f7bbae91d08b7d0acf016fdb42dbc912ae</id>
<content type='text'>
If the system is loaded while hotplugging a CPU we might end up with a
bogus hardlockup detection.  This has been seen during LTP pounder test
executed in parallel with hotplug test.

The main problem is that enable_watchdog (called when CPU is brought up)
registers perf event which periodically checks per-cpu counter
(hrtimer_interrupts), updated from a hrtimer callback, but the hrtimer
is fired from the kernel thread.

This means that while we already do check for the hard lockup the kernel
thread might be sitting on the runqueue with zillions of tasks so there
is nobody to update the value we rely on and so we KABOOM.

Let's fix this by boosting the watchdog thread priority before we wake
it up rather than when it's already running.  This still doesn't handle
a case where we have the same amount of high prio FIFO tasks but that
doesn't seem to be common.  The current implementation doesn't handle
that case anyway so this is not worse at least.

Unfortunately, we cannot start perf counter from the watchdog thread
because we could miss a real lock up and also we cannot start the
hrtimer watchdog_enable because we there is no way (at least I don't
know any) to start a hrtimer from a different CPU.

[dzickus@redhat.com: fix compile issue with param]
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Reviewed-by: Mandeep Singh Baines &lt;msb@chromium.org&gt;
Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Don Zickus &lt;dzickus@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>watchdog: Fix code/comments mismatches</title>
<updated>2012-02-11T14:11:33Z</updated>
<author>
<name>Fernando Luis Vázquez Cao</name>
<email>fernando@oss.ntt.co.jp</email>
</author>
<published>2012-02-09T22:42:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=86f5e6a7b192721995ece919985ac75222402351'/>
<id>urn:sha1:86f5e6a7b192721995ece919985ac75222402351</id>
<content type='text'>
Reflect the change in the soft and hard lockup thresholds and
their relation to the frequency of the hrtimer and NMI events in
the code comments. While at it, remove references to files that
do not exist anymore.

Signed-off-by: Fernando Luis Vazquez Cao &lt;fernando@oss.ntt.co.jp&gt;
Signed-off-by: Don Zickus &lt;dzickus@redhat.com&gt;
Link: http://lkml.kernel.org/r/1328827342-6253-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>bugs, x86: Fix printk levels for panic, softlockups and stack dumps</title>
<updated>2012-01-26T20:28:45Z</updated>
<author>
<name>Prarit Bhargava</name>
<email>prarit@redhat.com</email>
</author>
<published>2012-01-26T13:55:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b0f4c4b32c8e3aa0d44fc4dd6c40a9a9a8d66b63'/>
<id>urn:sha1:b0f4c4b32c8e3aa0d44fc4dd6c40a9a9a8d66b63</id>
<content type='text'>
rsyslog will display KERN_EMERG messages on a connected
terminal.  However, these messages are useless/undecipherable
for a general user.

For example, after a softlockup we get:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Stack:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Call Trace:

 Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
 kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89
 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 &lt;e8&gt; ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89

This happens because the printk levels for these messages are
incorrect. Only an informational message should be displayed on
a terminal.

I modified the printk levels for various messages in the kernel
and tested the output by using the drivers/misc/lkdtm.c kernel
modules (ie, softlockups, panics, hard lockups, etc.) and
confirmed that the console output was still the same and that
the output to the terminals was correct.

For example, in the case of a softlockup we now see the much
more informative:

 Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ...
 BUG: soft lockup - CPU4 stuck for 60s!

instead of the above confusing messages.

AFAICT, the messages no longer have to be KERN_EMERG.  In the
most important case of a panic we set console_verbose().  As for
the other less severe cases the correct data is output to the
console and /var/log/messages.

Successfully tested by me using the drivers/misc/lkdtm.c module.

Signed-off-by: Prarit Bhargava &lt;prarit@redhat.com&gt;
Cc: dzickus@redhat.com
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>watchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL</title>
<updated>2011-11-01T00:30:53Z</updated>
<author>
<name>Vasily Averin</name>
<email>vvs@sw.ru</email>
</author>
<published>2011-11-01T00:11:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4ff819515b203f937cc6c8a0215a37a68d1ee71f'/>
<id>urn:sha1:4ff819515b203f937cc6c8a0215a37a68d1ee71f</id>
<content type='text'>
Fix compilation warnings for CONFIG_SYSCTL=n:

fixed compilation warnings in case of disabled CONFIG_SYSCTL
kernel/watchdog.c:483:13: warning: `watchdog_enable_all_cpus' defined but not used
kernel/watchdog.c:500:13: warning: `watchdog_disable_all_cpus' defined but not used

these functions are static and are used only in sysctl handler, so move
them inside #ifdef CONFIG_SYSCTL too

Signed-off-by: Vasily Averin &lt;vvs@sw.ru&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
