<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/time/timekeeping.c, branch v3.17</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.17</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.17'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2014-09-06T10:58:18Z</updated>
<entry>
<title>timekeeping: Update timekeeper before updating vsyscall and pvclock</title>
<updated>2014-09-06T10:58:18Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-09-06T10:24:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9bf2419fa7bffa16ce58a4d5c20399eff8c970c9'/>
<id>urn:sha1:9bf2419fa7bffa16ce58a4d5c20399eff8c970c9</id>
<content type='text'>
The update_walltime() code works on the shadow timekeeper to make the
seqcount protected region as short as possible. But that update to the
shadow timekeeper does not update all timekeeper fields because it's
sufficient to do that once before it becomes life. One of these fields
is tkr.base_mono. That stays stale in the shadow timekeeper unless an
operation happens which copies the real timekeeper to the shadow.

The update function is called after the update calls to vsyscall and
pvclock. While not correct, it did not cause any problems because none
of the invoked update functions used base_mono.

commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread()
nanoseconds based) changed that in the kvm pvclock update function, so
the stale mono_base value got used and caused kvm-clock to malfunction.

Put the update where it belongs and fix the issue.

Reported-by: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Reported-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Gleb Natapov &lt;gleb@kernel.org&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>timekeeping: Another fix to the VSYSCALL_OLD update_vsyscall</title>
<updated>2014-08-14T17:04:11Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2014-08-13T19:47:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0680eb1f485ba5aac2ee02c9f0622239c9a4b16c'/>
<id>urn:sha1:0680eb1f485ba5aac2ee02c9f0622239c9a4b16c</id>
<content type='text'>
Benjamin Herrenschmidt pointed out that I further missed modifying
update_vsyscall after the wall_to_mono value was changed to a
timespec64.  This causes issues on powerpc32, which expects a 32bit
timespec.

This patch fixes the problem by properly converting from a timespec64 to
a timespec before passing the value on to the arch-specific vsyscall
logic.

[ Thomas is currently on vacation, but reviewed it and wanted me to send
  this fix on to you directly. ]

Cc: LKML &lt;linux-kernel@vger.kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Reported-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Use cached ntp_tick_length when accumulating error</title>
<updated>2014-07-23T22:01:57Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2014-04-24T03:53:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=375f45b5b53a91dfa8f0c11328e0e044f82acbed'/>
<id>urn:sha1:375f45b5b53a91dfa8f0c11328e0e044f82acbed</id>
<content type='text'>
By caching the ntp_tick_length() when we correct the frequency error,
and then using that cached value to accumulate error, we avoid large
initial errors when the tick length is changed.

This makes convergence happen much faster in the simulator, since the
initial error doesn't have to be slowly whittled away.

This initially seems like an accounting error, but Miroslav pointed out
that ntp_tick_length() can change mid-tick, so when we apply it in the
error accumulation, we are applying any recent change to the entire tick.

This approach chooses to apply changes in the ntp_tick_length() only to
the next tick, which allows us to calculate the freq correction before
using the new tick length, which avoids accummulating error.

Credit to Miroslav for pointing this out and providing the original patch
this functionality has been pulled out from, along with the rational.

Cc: Miroslav Lichvar &lt;mlichvar@redhat.com&gt;
Cc: Richard Cochran &lt;richardcochran@gmail.com&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Reported-by: Miroslav Lichvar &lt;mlichvar@redhat.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Rework frequency adjustments to work better w/ nohz</title>
<updated>2014-07-23T22:01:56Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2013-12-07T01:25:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=dc491596f6394382fbc74ad331156207d619fa0a'/>
<id>urn:sha1:dc491596f6394382fbc74ad331156207d619fa0a</id>
<content type='text'>
The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct it so it doesn't cause
much accumulated error.

2) Then make a small single unit adjustment to correct any cumulative
error that has collected over time.

This method performs fairly well in the simulator Miroslav created.

Major credit to Miroslav for pointing out the issue, providing the
original patch to resolve this, a simulator for testing, as well as
helping debug and resolve issues in my implementation so that it
performed closer to his original implementation.

Cc: Miroslav Lichvar &lt;mlichvar@redhat.com&gt;
Cc: Richard Cochran &lt;richardcochran@gmail.com&gt;
Cc: Prarit Bhargava &lt;prarit@redhat.com&gt;
Reported-by: Miroslav Lichvar &lt;mlichvar@redhat.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Minor fixup for timespec64-&gt;timespec assignment</title>
<updated>2014-07-23T22:01:56Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2014-07-23T21:35:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e2dff1ec0cc81fcf3e0696604bacc3e1c816538c'/>
<id>urn:sha1:e2dff1ec0cc81fcf3e0696604bacc3e1c816538c</id>
<content type='text'>
In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation,
we take the tk_xtime() value, which returns a timespec64, and
store it in a timespec.

This luckily is ok, since the only architectures that use
GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both
64 bit systems where timespec64 is the same as a timespec.

Even so, for cleanliness reasons, use the conversion function
to assign the proper type.

Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC</title>
<updated>2014-07-23T22:01:55Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-07-16T21:05:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4396e058c52e167729729cf64ea3dfa229637086'/>
<id>urn:sha1:4396e058c52e167729729cf64ea3dfa229637086</id>
<content type='text'>
Tracers want a correlated time between the kernel instrumentation and
user space. We really do not want to export sched_clock() to user
space, so we need to provide something sensible for this.

Using separate data structures with an non blocking sequence count
based update mechanism allows us to do that. The data structure
required for the readout has a sequence counter and two copies of the
timekeeping data.

On the update side:

  smp_wmb();
  tkf-&gt;seq++;
  smp_wmb();
  update(tkf-&gt;base[0], tk);
  smp_wmb();
  tkf-&gt;seq++;
  smp_wmb();
  update(tkf-&gt;base[1], tk);

On the reader side:

  do {
     seq = tkf-&gt;seq;
     smp_rmb();
     idx = seq &amp; 0x01;
     now = now(tkf-&gt;base[idx]);
     smp_rmb();
  } while (seq != tkf-&gt;seq)

So if a NMI hits the update of base[0] it will use base[1] which is
still consistent, but this timestamp is not guaranteed to be monotonic
across an update.

The timestamp is calculated by:

	now = base_mono + clock_delta * slope

So if the update lowers the slope, readers who are forced to the
not yet updated second array are still using the old steeper slope.

 tmono
 ^
 |    o  n
 |   o n
 |  u
 | o
 |o
 |12345678---&gt; reader order

 o = old slope
 u = update
 n = new slope

So reader 6 will observe time going backwards versus reader 5.

While other CPUs are likely to be able observe that, the only way
for a CPU local observation is when an NMI hits in the middle of
the update. Timestamps taken from that NMI context might be ahead
of the following timestamps. Callers need to be aware of that and
deal with it.

V2: Got rid of clock monotonic raw and reorganized the data
    structures. Folded in the barrier fix from Mathieu.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Use tk_read_base as argument for timekeeping_get_ns()</title>
<updated>2014-07-23T22:01:53Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-07-16T21:05:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0e5ac3a8b100469ea154f87dd57b685fbdd356f6'/>
<id>urn:sha1:0e5ac3a8b100469ea154f87dd57b685fbdd356f6</id>
<content type='text'>
All the function needs is in the tk_read_base struct. No functional
change for the current code, just a preparatory patch for the NMI safe
accessor to clock monotonic which will use struct tk_read_base as well.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Create struct tk_read_base and use it in struct timekeeper</title>
<updated>2014-07-23T22:01:53Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-07-16T21:05:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d28ede83791defee9a81e558540699dc46dbbe13'/>
<id>urn:sha1:d28ede83791defee9a81e558540699dc46dbbe13</id>
<content type='text'>
The members of the new struct are the required ones for the new NMI
safe accessor to clcok monotonic. In order to reuse the existing
timekeeping code and to make the update of the fast NMI safe
timekeepers a simple memcpy use the struct for the timekeeper as well
and convert all users.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>timekeeping: Restructure the timekeeper some more</title>
<updated>2014-07-23T22:01:52Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-07-16T21:05:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6d3aadf3e180e09dbefab16478c6876b584ce16e'/>
<id>urn:sha1:6d3aadf3e180e09dbefab16478c6876b584ce16e</id>
<content type='text'>
Access to time requires to touch two cachelines at minimum

   1) The timekeeper data structure

   2) The clocksource data structure

The access to the clocksource data structure can be avoided as almost
all clocksource implementations ignore the argument to the read
callback, which is a pointer to the clocksource.

But the core needs to touch it to access the members @read and @mask.

So we are better off by copying the @read function pointer and the
@mask from the clocksource to the core data structure itself.

For the most used ktime_get() access all required data including the
@read and @mask copies fits together with the sequence counter into a
single 64 byte cacheline.

For the other time access functions we touch in the current code three
cache lines in the worst case. But with the clocksource data copies we
can reduce that to two adjacent cachelines, which is more efficient
than disjunct cache lines.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
<entry>
<title>clocksource: Get rid of cycle_last</title>
<updated>2014-07-23T22:01:52Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2014-07-16T21:05:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4a0e637738f06673725792d74eed67f8779b62c7'/>
<id>urn:sha1:4a0e637738f06673725792d74eed67f8779b62c7</id>
<content type='text'>
cycle_last was added to the clocksource to support the TSC
validation. We moved that to the core code, so we can get rid of the
extra copy.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
</content>
</entry>
</feed>
