<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/irq, branch v5.6</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.6</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.6'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-03-21T16:32:46Z</updated>
<entry>
<title>genirq: Fix reference leaks on irq affinity notifiers</title>
<updated>2020-03-21T16:32:46Z</updated>
<author>
<name>Edward Cree</name>
<email>ecree@solarflare.com</email>
</author>
<published>2020-03-13T20:33:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df81dfcfd6991d547653d46c051bac195cd182c1'/>
<id>urn:sha1:df81dfcfd6991d547653d46c051bac195cd182c1</id>
<content type='text'>
The handling of notify-&gt;work did not properly maintain notify-&gt;kref in two
 cases:
1) where the work was already scheduled, another irq_set_affinity_locked()
   would get the ref and (no-op-ly) schedule the work.  Thus when
   irq_affinity_notify() ran, it would drop the original ref but not the
   additional one.
2) when cancelling the (old) work in irq_set_affinity_notifier(), if there
   was outstanding work a ref had been got for it but was never put.
Fix both by checking the return values of the work handling functions
 (schedule_work() for (1) and cancel_work_sync() for (2)) and put the
 extra ref if the return value indicates preexisting work.

Fixes: cd7eab44e994 ("genirq: Add IRQ affinity notifiers")
Fixes: 59c39840f5ab ("genirq: Prevent use-after-free and work list corruption")
Signed-off-by: Edward Cree &lt;ecree@solarflare.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Link: https://lkml.kernel.org/r/24f5983f-2ab5-e83a-44ee-a45b5f9300f5@solarflare.com

</content>
</entry>
<entry>
<title>genirq/proc: Reject invalid affinity masks (again)</title>
<updated>2020-02-14T08:43:17Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-02-12T11:19:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cba6437a1854fde5934098ec3bd0ee83af3129f5'/>
<id>urn:sha1:cba6437a1854fde5934098ec3bd0ee83af3129f5</id>
<content type='text'>
Qian Cai reported that the WARN_ON() in the x86/msi affinity setting code,
which catches cases where the affinity setting is not done on the CPU which
is the current target of the interrupt, triggers during CPU hotplug stress
testing.

It turns out that the warning which was added with the commit addressing
the MSI affinity race unearthed yet another long standing bug.

If user space writes a bogus affinity mask, i.e. it contains no online CPUs,
then it calls irq_select_affinity_usr(). This was introduced for ALPHA in

  eee45269b0f5 ("[PATCH] Alpha: convert to generic irq framework (generic part)")

and subsequently made available for all architectures in

  18404756765c ("genirq: Expose default irq affinity mask (take 3)")

which introduced the circumvention of the affinity setting restrictions for
interrupt which cannot be moved in process context.

The whole exercise is bogus in various aspects:

  1) If the interrupt is already started up then there is absolutely
     no point to honour a bogus interrupt affinity setting from user
     space. The interrupt is already assigned to an online CPU and it
     does not make any sense to reassign it to some other randomly
     chosen online CPU.

  2) If the interupt is not yet started up then there is no point
     either. A subsequent startup of the interrupt will invoke
     irq_setup_affinity() anyway which will chose a valid target CPU.

So the only correct solution is to just return -EINVAL in case user space
wrote an affinity mask which does not contain any online CPUs, except for
ALPHA which has it's own magic sauce for this.

Fixes: 18404756765c ("genirq: Expose default irq affinity mask (take 3)")
Reported-by: Qian Cai &lt;cai@lca.pw&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Qian Cai &lt;cai@lca.pw&gt;
Link: https://lkml.kernel.org/r/878sl8xdbm.fsf@nanos.tec.linutronix.de


</content>
</entry>
<entry>
<title>Merge tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2020-02-09T20:11:12Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-02-09T20:11:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1a2a76c2685a29e46d7b37e752ccea7b15aa8e24'/>
<id>urn:sha1:1a2a76c2685a29e46d7b37e752ccea7b15aa8e24</id>
<content type='text'>
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for X86:

   - Ensure that the PIT is set up when the local APIC is disable or
     configured in legacy mode. This is caused by an ordering issue
     introduced in the recent changes which skip PIT initialization when
     the TSC and APIC frequencies are already known.

   - Handle malformed SRAT tables during early ACPI parsing which caused
     an infinite loop anda boot hang.

   - Fix a long standing race in the affinity setting code which affects
     PCI devices with non-maskable MSI interrupts. The problem is caused
     by the non-atomic writes of the MSI address (destination APIC id)
     and data (vector) fields which the device uses to construct the MSI
     message. The non-atomic writes are mandated by PCI.

     If both fields change and the device raises an interrupt after
     writing address and before writing data, then the MSI block
     constructs a inconsistent message which causes interrupts to be
     lost and subsequent malfunction of the device.

     The fix is to redirect the interrupt to the new vector on the
     current CPU first and then switch it over to the new target CPU.
     This allows to observe an eventually raised interrupt in the
     transitional stage (old CPU, new vector) to be observed in the APIC
     IRR and retriggered on the new target CPU and the new vector.

     The potential spurious interrupts caused by this are harmless and
     can in the worst case expose a buggy driver (all handlers have to
     be able to deal with spurious interrupts as they can and do happen
     for various reasons).

   - Add the missing suspend/resume mechanism for the HYPERV hypercall
     page which prevents resume hibernation on HYPERV guests. This
     change got lost before the merge window.

   - Mask the IOAPIC before disabling the local APIC to prevent
     potentially stale IOAPIC remote IRR bits which cause stale
     interrupt lines after resume"

* tag 'x86-urgent-2020-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Mask IOAPIC entries when disabling the local APIC
  x86/hyperv: Suspend/resume the hypercall page for hibernation
  x86/apic/msi: Plug non-maskable MSI affinity race
  x86/boot: Handle malformed SRAT tables during early ACPI parsing
  x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
</content>
</entry>
<entry>
<title>genirq: Clarify that irq wake state is orthogonal to enable/disable</title>
<updated>2020-02-07T20:37:08Z</updated>
<author>
<name>Stephen Boyd</name>
<email>swboyd@chromium.org</email>
</author>
<published>2020-02-06T19:15:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f9f21cea311340f38074ff93a8d89b4a9cae6bcc'/>
<id>urn:sha1:f9f21cea311340f38074ff93a8d89b4a9cae6bcc</id>
<content type='text'>
There's some confusion around if an irq that's disabled with disable_irq()
can still wake the system from sleep states such as "suspend to RAM".

Clarify this in the kernel documentation for irq_set_irq_wake() so that
it's clear that an irq can be disabled and still wake the system if it has
been marked for wakeup.

Signed-off-by: Stephen Boyd &lt;swboyd@chromium.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Link: https://lkml.kernel.org/r/20200206191521.94559-1-swboyd@chromium.org

</content>
</entry>
<entry>
<title>proc: convert everything to "struct proc_ops"</title>
<updated>2020-02-04T03:05:26Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2020-02-04T01:37:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=97a32539b9568bb653683349e5a76d02ff3c3e2c'/>
<id>urn:sha1:97a32539b9568bb653683349e5a76d02ff3c3e2c</id>
<content type='text'>
The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
seq_file.h.

Conversion rule is:

	llseek		=&gt; proc_lseek
	unlocked_ioctl	=&gt; proc_ioctl

	xxx		=&gt; proc_xxx

	delete ".owner = THIS_MODULE" line

[akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
[sfr@canb.auug.org.au: fix kernel/sched/psi.c]
  Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>x86/apic/msi: Plug non-maskable MSI affinity race</title>
<updated>2020-02-01T08:31:47Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-01-31T14:26:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6f1a4891a5928a5969c87fa5a584844c983ec823'/>
<id>urn:sha1:6f1a4891a5928a5969c87fa5a584844c983ec823</id>
<content type='text'>
Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space.

   - Write address low 32bits
   - Write address high 32bits (If supported by device)
   - Write data

When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes then a MSI
message is sent built from half updated state.

On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.

Evan tried to handle that by disabling MSI accross an MSI message
update. That's not feasible because disabling MSI has issues on its own:

 If MSI is disabled the PCI device is routing an interrupt to the legacy
 INTx mechanism. The INTx delivery can be disabled, but the disablement is
 not working on all devices.

 Some devices lose interrupts when both MSI and INTx delivery are disabled.

Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.

Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.

That makes it possible to solve it with a two step update:

  1) Target the MSI msg to the new vector on the current target CPU

  2) Target the MSI msg to the new vector on the new target CPU

In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.

After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.

This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.

This can cause spurious interrupts on both the local and the new target
CPU.

 1) If the new vector is not in use on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then interrupt entry code will
    ignore that spurious interrupt. The vector is marked so that the
    'No irq handler for vector' warning is supressed once.

 2) If the new vector is in use already on the local CPU then the IRR check
    might see an pending interrupt from the device which is using this
    vector. The IPI to the new target CPU will then invoke the handler of
    the device, which got the affinity change, even if that device did not
    issue an interrupt

 3) If the new vector is in use already on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then the handler of the device which
    uses that vector on the local CPU will be invoked.

expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.

Reported-by: Evan Green &lt;evgreen@chromium.org&gt;
Debugged-by: Evan Green &lt;evgreen@chromium.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Evan Green &lt;evgreen@chromium.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
</content>
</entry>
<entry>
<title>Merge tag 'irqchip-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core</title>
<updated>2020-01-24T19:08:51Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2020-01-24T19:08:51Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=43ee74487bd2842cb4d37b5c62f074fbed2366b9'/>
<id>urn:sha1:43ee74487bd2842cb4d37b5c62f074fbed2366b9</id>
<content type='text'>
Pull irqchip updates from Marc Zyngier:

- Conversion of the SiFive PLIC to hierarchical domains
- New SiFive GPIO irqchip driver
- New Aspeed SCI irqchip driver
- New NXP INTMUX irqchip driver
- Additional support for the Meson A1 GPIO irqchip
- First part of the GICv4.1 support
- Assorted fixes
</content>
</entry>
<entry>
<title>genirq, sched/isolation: Isolate from handling managed interrupts</title>
<updated>2020-01-22T15:29:49Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2020-01-20T09:16:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=11ea68f553e244851d15793a7fa33a97c46d8271'/>
<id>urn:sha1:11ea68f553e244851d15793a7fa33a97c46d8271</id>
<content type='text'>
The affinity of managed interrupts is completely handled in the kernel and
cannot be changed via the /proc/irq/* interfaces from user space. As the
kernel tries to spread out interrupts evenly accross CPUs on x86 to prevent
vector exhaustion, it can happen that a managed interrupt whose affinity
mask contains both isolated and housekeeping CPUs is routed to an isolated
CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts
on the isolated CPU.

Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding
logic in the interrupt affinity selection code.

The subparameter indicates to the interrupt affinity selection logic that
it should try to avoid the above scenario.

This isolation is best effort and only effective if the automatically
assigned interrupt mask of a device queue contains isolated and
housekeeping CPUs. If housekeeping CPUs are online then such interrupts are
directed to the housekeeping CPU so that IO submitted on the housekeeping
CPU cannot disturb the isolated CPU.

If a queue's affinity mask contains only isolated CPUs then this parameter
has no effect on the interrupt routing decision, though interrupts are only
happening when tasks running on those isolated CPUs submit IO. IO submitted
on housekeeping CPUs has no influence on those queues.

If the affinity mask contains both housekeeping and isolated CPUs, but none
of the contained housekeeping CPUs is online, then the interrupt is also
routed to an isolated CPU. Interrupts are only delivered when one of the
isolated CPUs in the affinity mask submits IO. If one of the contained
housekeeping CPUs comes online, the CPU hotplug logic migrates the
interrupt automatically back to the upcoming housekeeping CPU. Depending on
the type of interrupt controller, this can require that at least one
interrupt is delivered to the isolated CPU in order to complete the
migration.

[ tglx: Removed unused parameter, added and edited comments/documentation
  	and rephrased the changelog so it contains more details. ]

Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20200120091625.17912-1-ming.lei@redhat.com

</content>
</entry>
<entry>
<title>irqdomain: Fix a memory leak in irq_domain_push_irq()</title>
<updated>2020-01-20T19:10:05Z</updated>
<author>
<name>Kevin Hao</name>
<email>haokexin@gmail.com</email>
</author>
<published>2020-01-20T04:35:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0f394daef89b38d58c91118a2b08b8a1b316703b'/>
<id>urn:sha1:0f394daef89b38d58c91118a2b08b8a1b316703b</id>
<content type='text'>
Fix a memory leak reported by kmemleak:
unreferenced object 0xffff000bc6f50e80 (size 128):
  comm "kworker/23:2", pid 201, jiffies 4294894947 (age 942.132s)
  hex dump (first 32 bytes):
    00 00 00 00 41 00 00 00 86 c0 03 00 00 00 00 00  ....A...........
    00 a0 b2 c6 0b 00 ff ff 40 51 fd 10 00 80 ff ff  ........@Q......
  backtrace:
    [&lt;00000000e62d2240&gt;] kmem_cache_alloc_trace+0x1a4/0x320
    [&lt;00000000279143c9&gt;] irq_domain_push_irq+0x7c/0x188
    [&lt;00000000d9f4c154&gt;] thunderx_gpio_probe+0x3ac/0x438
    [&lt;00000000fd09ec22&gt;] pci_device_probe+0xe4/0x198
    [&lt;00000000d43eca75&gt;] really_probe+0xdc/0x320
    [&lt;00000000d3ebab09&gt;] driver_probe_device+0x5c/0xf0
    [&lt;000000005b3ecaa0&gt;] __device_attach_driver+0x88/0xc0
    [&lt;000000004e5915f5&gt;] bus_for_each_drv+0x7c/0xc8
    [&lt;0000000079d4db41&gt;] __device_attach+0xe4/0x140
    [&lt;00000000883bbda9&gt;] device_initial_probe+0x18/0x20
    [&lt;000000003be59ef6&gt;] bus_probe_device+0x98/0xa0
    [&lt;0000000039b03d3f&gt;] deferred_probe_work_func+0x74/0xa8
    [&lt;00000000870934ce&gt;] process_one_work+0x1c8/0x470
    [&lt;00000000e3cce570&gt;] worker_thread+0x1f8/0x428
    [&lt;000000005d64975e&gt;] kthread+0xfc/0x128
    [&lt;00000000f0eaa764&gt;] ret_from_fork+0x10/0x18

Fixes: 495c38d3001f ("irqdomain: Add irq_domain_{push,pop}_irq() functions")
Signed-off-by: Kevin Hao &lt;haokexin@gmail.com&gt;
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200120043547.22271-1-haokexin@gmail.com
</content>
</entry>
<entry>
<title>genirq: Introduce irq_domain_translate_onecell</title>
<updated>2020-01-20T09:19:33Z</updated>
<author>
<name>Yash Shah</name>
<email>yash.shah@sifive.com</email>
</author>
<published>2019-12-10T11:11:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b01ecceaf2c0c4b3f2d24aa0adcf096ab1648253'/>
<id>urn:sha1:b01ecceaf2c0c4b3f2d24aa0adcf096ab1648253</id>
<content type='text'>
Add a new function irq_domain_translate_onecell() that is to be used as
the translate function in struct irq_domain_ops.

Signed-off-by: Yash Shah &lt;yash.shah@sifive.com&gt;
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Link: https://lore.kernel.org/r/1575976274-13487-2-git-send-email-yash.shah@sifive.com
</content>
</entry>
</feed>
