<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/softirq.c, branch v3.9</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v3.9</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v3.9'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2013-03-06T02:10:04Z</updated>
<entry>
<title>Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2013-03-06T02:10:04Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-03-06T02:10:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e3b59518c10e08eeb06215abf06f50e8f83b51dc'/>
<id>urn:sha1:e3b59518c10e08eeb06215abf06f50e8f83b51dc</id>
<content type='text'>
Pull irq fixes and cleanups from Thomas Gleixner:
 "Commit e5ab012c3271 ("nohz: Make tick_nohz_irq_exit() irq safe") is
  the first commit in the series and the minimal necessary bugfix, which
  needs to go back into stable.

  The remanining commits enforce irq disabling in irq_exit(), sanitize
  the hardirq/softirq preempt count transition and remove a bunch of no
  longer necessary conditionals."

I personally love getting rid of the very subtle and confusing
IRQ_EXIT_OFFSET thing.  Even apart from the whole "more lines removed
than added" thing.

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irq: Don't re-enable interrupts at the end of irq_exit
  irq: Remove IRQ_EXIT_OFFSET workaround
  Revert "nohz: Make tick_nohz_irq_exit() irq safe"
  irq: Sanitize invoke_softirq
  irq: Ensure irq_exit() code runs with interrupts disabled
  nohz: Make tick_nohz_irq_exit() irq safe
</content>
</entry>
<entry>
<title>irq: Don't re-enable interrupts at the end of irq_exit</title>
<updated>2013-02-28T19:00:43Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2013-02-28T19:00:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4cd5d1115c2f752ca94a0eb461b36d88fb37ed1e'/>
<id>urn:sha1:4cd5d1115c2f752ca94a0eb461b36d88fb37ed1e</id>
<content type='text'>
Commit 74eed0163d0def3fce27228d9ccf3d36e207b286
"irq: Ensure irq_exit() code runs with interrupts disabled"
restore interrupts flags in the end of irq_exit() for archs
that don't define __ARCH_IRQ_EXIT_IRQS_DISABLED.

However always returning from irq_exit() with interrupts
disabled should not be a problem for these archs. Prior to
this commit this was already happening anytime we processed
pending softirqs anyway.

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>irq: Remove IRQ_EXIT_OFFSET workaround</title>
<updated>2013-02-21T23:05:07Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2013-02-21T23:05:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4d4c4e24cf48400a24d33feffc7cca4f4e8cabe1'/>
<id>urn:sha1:4d4c4e24cf48400a24d33feffc7cca4f4e8cabe1</id>
<content type='text'>
The IRQ_EXIT_OFFSET trick was used to make sure the irq
doesn't get preempted after we substract the HARDIRQ_OFFSET
until we are entirely done with any code in irq_exit().

This workaround was necessary because some archs may call
irq_exit() with irqs enabled and there is still some code
in the end of this function that is not covered by the
HARDIRQ_OFFSET but want to stay non-preemptible.

Now that irq are always disabled in irq_exit(), the whole code
is guaranteed not to be preempted. We can thus remove this hack.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>irq: Sanitize invoke_softirq</title>
<updated>2013-02-21T19:52:24Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2013-02-21T17:17:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=facd8b80c67a3cf64a467c4a2ac5fb31f2e6745b'/>
<id>urn:sha1:facd8b80c67a3cf64a467c4a2ac5fb31f2e6745b</id>
<content type='text'>
With the irq protection in irq_exit, we can remove the #ifdeffery and
the bh_disable/enable dance in invoke_softirq()

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos

</content>
</entry>
<entry>
<title>irq: Ensure irq_exit() code runs with interrupts disabled</title>
<updated>2013-02-21T19:52:24Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2013-02-20T21:00:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=74eed0163d0def3fce27228d9ccf3d36e207b286'/>
<id>urn:sha1:74eed0163d0def3fce27228d9ccf3d36e207b286</id>
<content type='text'>
We had already a few problems with code called from irq_exit() when
interrupted from a nesting interrupt. This can happen on architectures
which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED.

__ARCH_IRQ_EXIT_IRQS_DISABLED should go away and we want to make it
mandatory to call irq_exit() with interrupts disabled.

As a temporary protection disable interrupts for those architectures
which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED and add a WARN_ONCE
when an architecture which defines __ARCH_IRQ_EXIT_IRQS_DISABLED calls
irq_exit() with interrupts enabled.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos

</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next</title>
<updated>2013-02-21T02:58:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-02-21T02:58:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a0b1c42951dd06ec83cc1bc2c9788131d9fefcd8'/>
<id>urn:sha1:a0b1c42951dd06ec83cc1bc2c9788131d9fefcd8</id>
<content type='text'>
Pull networking update from David Miller:

 1) Checkpoint/restarted TCP sockets now can properly propagate the TCP
    timestamp offset.  From Andrey Vagin.

 2) VMWARE VM VSOCK layer, from Andy King.

 3) Much improved support for virtual functions and SR-IOV in bnx2x,
    from Ariel ELior.

 4) All protocols on ipv4 and ipv6 are now network namespace aware, and
    all the compatability checks for initial-namespace-only protocols is
    removed.  Thanks to Tom Parkin for helping deal with the last major
    holdout, L2TP.

 5) IPV6 support in netpoll and network namespace support in pktgen,
    from Cong Wang.

 6) Multiple Registration Protocol (MRP) and Multiple VLAN Registration
    Protocol (MVRP) support, from David Ward.

 7) Compute packet lengths more accurately in the packet scheduler, from
    Eric Dumazet.

 8) Use per-task page fragment allocator in skb_append_datato_frags(),
    also from Eric Dumazet.

 9) Add support for connection tracking labels in netfilter, from
    Florian Westphal.

10) Fix default multicast group joining on ipv6, and add anti-spoofing
    checks to 6to4 and 6rd.  From Hannes Frederic Sowa.

11) Make ipv4/ipv6 fragmentation memory limits more reasonable in modern
    times, rearrange inet frag datastructures for better cacheline
    locality, and move more operations outside of locking.  From Jesper
    Dangaard Brouer.

12) Instead of strict master &lt;--&gt; slave relationships, allow arbitrary
    scenerios with "upper device lists".  From Jiri Pirko.

13) Improve rate limiting accuracy in TBF and act_police, also from Jiri
    Pirko.

14) Add a BPF filter netfilter match target, from Willem de Bruijn.

15) Orphan and delete a bunch of pre-historic networking drivers from
    Paul Gortmaker.

16) Add TSO support for GRE tunnels, from Pravin B SHelar.  Although
    this still needs some minor bug fixing before it's %100 correct in
    all cases.

17) Handle unresolved IPSEC states like ARP, with a resolution packet
    queue.  From Steffen Klassert.

18) Remove TCP Appropriate Byte Count support (ABC), from Stephen
    Hemminger.  This was long overdue.

19) Support SO_REUSEPORT, from Tom Herbert.

20) Allow locking a socket BPF filter, so that it cannot change after a
    process drops capabilities.

21) Add VLAN filtering to bridge, from Vlad Yasevich.

22) Bring ipv6 on-par with ipv4 and do not cache neighbour entries in
    the ipv6 routes, from YOSHIFUJI Hideaki.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1538 commits)
  ipv6: fix race condition regarding dst-&gt;expires and dst-&gt;from.
  net: fix a wrong assignment in skb_split()
  ip_gre: remove an extra dst_release()
  ppp: set qdisc_tx_busylock to avoid LOCKDEP splat
  atl1c: restore buffer state
  net: fix a build failure when !CONFIG_PROC_FS
  net: ipv4: fix waring -Wunused-variable
  net: proc: fix build failed when procfs is not configured
  Revert "xen: netback: remove redundant xenvif_put"
  net: move procfs code to net/core/net-procfs.c
  qmi_wwan, cdc-ether: add ADU960S
  bonding: set sysfs device_type to 'bond'
  bonding: fix bond_release_all inconsistencies
  b44: use netdev_alloc_skb_ip_align()
  xen: netback: remove redundant xenvif_put
  net: fec: Do a sanity check on the gpio number
  ip_gre: propogate target device GSO capability to the tunnel device
  ip_gre: allow CSUM capable devices to handle packets
  bonding: Fix initialize after use for 3ad machine state spinlock
  bonding: Fix race condition between bond_enslave() and bond_3ad_update_lacp_rate()
  ...
</content>
</entry>
<entry>
<title>cputime: Safely read cputime of full dynticks CPUs</title>
<updated>2013-01-27T19:35:47Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2012-12-16T19:00:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6a61671bb2f3a1bd12cd17b8fca811a624782632'/>
<id>urn:sha1:6a61671bb2f3a1bd12cd17b8fca811a624782632</id>
<content type='text'>
While remotely reading the cputime of a task running in a
full dynticks CPU, the values stored in utime/stime fields
of struct task_struct may be stale. Its values may be those
of the last kernel &lt;-&gt; user transition time snapshot and
we need to add the tickless time spent since this snapshot.

To fix this, flush the cputime of the dynticks CPUs on
kernel &lt;-&gt; user transition and record the time / context
where we did this. Then on top of this snapshot and the current
time, perform the fixup on the reader side from task_times()
accessors.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Zhong &lt;zhong@linux.vnet.ibm.com&gt;
Cc: Namhyung Kim &lt;namhyung.kim@lge.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
[fixed kvm module related build errors]
Signed-off-by: Sedat Dilek &lt;sedat.dilek@gmail.com&gt;
</content>
</entry>
<entry>
<title>softirq: reduce latencies</title>
<updated>2013-01-10T23:33:01Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2013-01-10T23:26:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c10d73671ad30f54692f7f69f0e09e75d3a8926a'/>
<id>urn:sha1:c10d73671ad30f54692f7f69f0e09e75d3a8926a</id>
<content type='text'>
In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the fallback to ksoftirqd condition to :

- A time limit of 2 ms.
- need_resched() being set on current task

When one of this condition is met, we wakeup ksoftirqd for further
softirq processing if we still have pending softirqs.

Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude) :

In following bench, 200 antagonist "netperf -t TCP_RR" are started in
background, using all available cpus.

Then we start one "netperf -t TCP_RR", bound to the cpu handling the NIC
IRQ (hard+soft)

Before patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

# netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Tom Herbert &lt;therbert@google.com&gt;
Cc: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>cputime: Specialize irq vtime hooks</title>
<updated>2012-10-29T20:31:32Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2012-10-06T02:07:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa5058f3b63153e0147ef65bcdb3a4ee63581346'/>
<id>urn:sha1:fa5058f3b63153e0147ef65bcdb3a4ee63581346</id>
<content type='text'>
With CONFIG_VIRT_CPU_ACCOUNTING, when vtime_account()
is called in irq entry/exit, we perform a check on the
context: if we are interrupting the idle task we
account the pending cputime to idle, otherwise account
to system time or its sub-areas: tsk-&gt;stime, hardirq time,
softirq time, ...

However this check for idle only concerns the hardirq entry
and softirq entry:

* Hardirq may directly interrupt the idle task, in which case
we need to flush the pending CPU time to idle.

* The idle task may be directly interrupted by a softirq if
it calls local_bh_enable(). There is probably no such call
in any idle task but we need to cover every case. Ksoftirqd
is not concerned because the idle time is flushed on context
switch and softirq in the end of hardirq have the idle time
already flushed from the hardirq entry.

In the other cases we always account to system/irq time:

* On hardirq exit we account the time to hardirq time.
* On softirq exit we account the time to softirq time.

To optimize this and avoid the indirect call to vtime_account()
and the checks it performs, specialize the vtime irq APIs and
only perform the check on irq entry. Irq exit can directly call
vtime_account_system().

CONFIG_IRQ_TIME_ACCOUNTING behaviour doesn't change and directly
maps to its own vtime_account() implementation. One may want
to take benefits from the new APIs to optimize irq time accounting
as well in the future.

Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2012-10-01T17:43:39Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-01T17:43:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0b981cb94bc63a2d0e5eccccdca75fe57643ffce'/>
<id>urn:sha1:0b981cb94bc63a2d0e5eccccdca75fe57643ffce</id>
<content type='text'>
Pull scheduler changes from Ingo Molnar:
 "Continued quest to clean up and enhance the cputime code by Frederic
  Weisbecker, in preparation for future tickless kernel features.

  Other than that, smallish changes."

Fix up trivial conflicts due to additions next to each other in arch/{x86/}Kconfig

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  cputime: Make finegrained irqtime accounting generally available
  cputime: Gather time/stats accounting config options into a single menu
  ia64: Reuse system and user vtime accounting functions on task switch
  ia64: Consolidate user vtime accounting
  vtime: Consolidate system/idle context detection
  cputime: Use a proper subsystem naming for vtime related APIs
  sched: cpu_power: enable ARCH_POWER
  sched/nohz: Clean up select_nohz_load_balancer()
  sched: Fix load avg vs. cpu-hotplug
  sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW
  sched: Fix nohz_idle_balance()
  sched: Remove useless code in yield_to()
  sched: Add time unit suffix to sched sysctl knobs
  sched/debug: Limit sd-&gt;*_idx range on sysctl
  sched: Remove AFFINE_WAKEUPS feature flag
  s390: Remove leftover account_tick_vtime() header
  cputime: Consolidate vtime handling on context switch
  sched: Move cputime code to its own file
  cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
  tile: Remove SD_PREFER_LOCAL leftover
  ...
</content>
</entry>
</feed>
