| Age | Commit message (Collapse) | Author | Lines |
|
If an error is encountered while mapping TX buffers, the driver should
unmap any buffers already mapped for that skb.
Because count is incremented after a successful mapping, it will always
match the correct number of unmappings needed when dma_error is reached.
Decrementing count before the while loop in dma_error causes an
off-by-one error. If any mapping was successful before an unsuccessful
mapping, exactly one DMA mapping would leak.
In these commits, a faulty while condition caused an infinite loop in
dma_error:
Commit 03b1320dfcee ("e1000e: remove use of skb_dma_map from e1000e
driver")
Commit 602c0554d7b0 ("e1000: remove use of skb_dma_map from e1000 driver")
Commit c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of
unsigned in *_tx_map()") fixed the infinite loop, but introduced the
off-by-one error.
This issue may still exist in the igbvf driver, but I did not address it
in this patch.
Fixes: c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()")
Assisted-by: Claude:claude-4.6-opus
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix following issues in the IPv4 and IPv6 cloud filter handling logic in
both the add and delete paths:
- The source-IP mask check incorrectly compares mask.src_ip[0] against
tcf.dst_ip[0]. Update it to compare against tcf.src_ip[0]. This likely
goes unnoticed because the check is in an "else if" path that only
executes when dst_ip is not set, most cloud filter use cases focus on
destination-IP matching, and the buggy condition can accidentally
evaluate true in some cases.
- memcpy() for the IPv4 source address incorrectly uses
ARRAY_SIZE(tcf.dst_ip) instead of ARRAY_SIZE(tcf.src_ip), although
both arrays are the same size.
- The IPv4 memcpy operations used ARRAY_SIZE(tcf.dst_ip) and ARRAY_SIZE
(tcf.src_ip), Update these to use sizeof(cfilter->ip.v4.dst_ip) and
sizeof(cfilter->ip.v4.src_ip) to ensure correct and explicit copy size.
- In the IPv6 delete path, memcmp() uses sizeof(src_ip6) when comparing
dst_ip6 fields. Replace this with sizeof(dst_ip6) to make the intent
explicit, even though both fields are struct in6_addr.
Fixes: e284fc280473 ("i40e: Add and delete cloud filter")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Three driver callbacks schedule a reset and wait for its completion:
ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels().
Waiting for reset in ndo_change_mtu() and set_ringparam() was added by
commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger
it") to fix a race condition where adding an interface to bonding
immediately after MTU or ring parameter change failed because the
interface was still in __RESETTING state. The same commit also added
waiting in iavf_set_priv_flags(), which was later removed by commit
53844673d555 ("iavf: kill "legacy-rx" for good").
Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13
("iavf: Fix return of set the new channel count") to ensure the PF has
enough time to complete the VF reset when changing channel count, and to
return correct error codes to userspace.
Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added
net_shaper_ops to iavf, which required reset_task to use _locked NAPI
variants (napi_enable_locked, napi_disable_locked) that need the netdev
instance lock.
Later, commit 7e4d784f5810 ("net: hold netdev instance lock during
rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to
protect all callback with netdev instance lock") started holding the
netdev instance lock during ndo and ethtool callbacks for drivers with
net_shaper_ops.
Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock")
replaced the driver's crit_lock with netdev_lock in reset_task, causing
incorrect behavior: the callback holds netdev_lock and waits for
reset_task, but reset_task needs the same lock:
Thread 1 (callback) Thread 2 (reset_task)
------------------- ---------------------
netdev_lock() [blocked on workqueue]
ndo_change_mtu() or ethtool op
iavf_schedule_reset()
iavf_wait_for_reset() iavf_reset_task()
waiting... netdev_lock() <- blocked
This does not strictly deadlock because iavf_wait_for_reset() uses
wait_event_interruptible_timeout() with a 5-second timeout. The wait
eventually times out, the callback returns an error to userspace, and
after the lock is released reset_task completes the reset. This leads to
incorrect behavior: userspace sees an error even though the configuration
change silently takes effect after the timeout.
Fix this by extracting the reset logic from iavf_reset_task() into a new
iavf_reset_step() function that expects netdev_lock to be already held.
The three callbacks now call iavf_reset_step() directly instead of
scheduling the work and waiting, performing the reset synchronously in
the caller's context which already holds netdev_lock. This eliminates
both the incorrect error reporting and the need for
iavf_wait_for_reset(), which is removed along with the now-unused
reset_waitqueue.
The workqueue-based iavf_reset_task() becomes a thin wrapper that
acquires netdev_lock and calls iavf_reset_step(), preserving its use
for PF-initiated resets.
The callbacks may block for several seconds while iavf_reset_step()
polls hardware registers, but this is acceptable since netdev_lock is a
per-device mutex and only serializes operations on the same interface.
v3:
- Remove netif_running() guard from iavf_set_channels(). Unlike
set_ringparam where descriptor counts are picked up by iavf_open()
directly, num_req_queues is only consumed during
iavf_reinit_interrupt_scheme() in the reset path. Skipping the reset
on a down device would silently discard the channel count change.
- Remove dead reset_waitqueue code (struct field, init, and all
wake_up calls) since iavf_wait_for_reset() was the only consumer.
Fixes: 120f28a6f314 ("iavf: get rid of the crit lock")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit 7c01dbfc8a1c5f ("iavf: periodically cache PHC time") introduced a
worker to cache PHC time, but failed to stop it during reset or disable.
This creates a race condition where `iavf_reset_task()` or
`iavf_disable_vf()` free adapter resources (AQ) while the worker is still
running. If the worker triggers `iavf_queue_ptp_cmd()` during teardown, it
accesses freed memory/locks, leading to a crash.
Fix this by calling `iavf_ptp_release()` before tearing down the adapter.
This ensures `ptp_clock_unregister()` synchronously cancels the worker and
cleans up the chardev before the backing resources are destroyed.
Fixes: 7c01dbfc8a1c5f ("iavf: periodically cache PHC time")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
If CONFIG_IRDMA isn't enabled but there are ice NICs in the system, the
driver will prevent full devlink dev param show dump because its rdma get
callbacks return ENODEV and stop the dump. For example:
$ devlink dev param show
pci/0000:82:00.0:
name msix_vec_per_pf_max type generic
values:
cmode driverinit value 2
name msix_vec_per_pf_min type generic
values:
cmode driverinit value 2
kernel answers: No such device
Returning EOPNOTSUPP allows the dump to continue so we can see all devices'
devlink parameters.
Fixes: c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The only user of frag_size field in XDP RxQ info is
bpf_xdp_frags_increase_tail(). It clearly expects whole buffer size instead
of DMA write size. Different assumptions in idpf driver configuration lead
to negative tailroom.
To make it worse, buffer sizes are not actually uniform in idpf when
splitq is enabled, as there are several buffer queues, so rxq->rx_buf_size
is meaningless in this case.
Use truesize of the first bufq in AF_XDP ZC, as there is only one. Disable
growing tail for regular splitq.
Fixes: ac8a861f632e ("idpf: prepare structures to support XDP")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-8-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The only user of frag_size field in XDP RxQ info is
bpf_xdp_frags_increase_tail(). It clearly expects whole buffer size instead
of DMA write size. Different assumptions in i40e driver configuration lead
to negative tailroom.
Set frag_size to the same value as frame_sz in shared pages mode, use new
helper to set frag_size when AF_XDP ZC is active.
Fixes: a045d2f2d03d ("i40e: set xdp_rxq_info::frag_size")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-7-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current way of handling XDP RxQ info in i40e has a problem, where frag_size
is not updated when xsk_buff_pool is detached or when MTU is changed, this
leads to growing tail always failing for multi-buffer packets.
Couple XDP RxQ info registering with buffer allocations and unregistering
with cleaning the ring.
Fixes: a045d2f2d03d ("i40e: set xdp_rxq_info::frag_size")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-6-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The only user of frag_size field in XDP RxQ info is
bpf_xdp_frags_increase_tail(). It clearly expects whole buff size instead
of DMA write size. Different assumptions in ice driver configuration lead
to negative tailroom.
This allows to trigger kernel panic, when using
XDP_ADJUST_TAIL_GROW_MULTI_BUFF xskxceiver test and changing packet size to
6912 and the requested offset to a huge value, e.g.
XSK_UMEM__MAX_FRAME_SIZE * 100.
Due to other quirks of the ZC configuration in ice, panic is not observed
in ZC mode, but tailroom growing still fails when it should not.
Use fill queue buffer truesize instead of DMA write size in XDP RxQ info.
Fix ZC mode too by using the new helper.
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-5-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
XDP RxQ info contains frag_size, which depends on the MTU. This makes the
old way of registering RxQ info before calculating new buffer sizes
invalid. Currently, it leads to frag_size being outdated, making it
sometimes impossible to grow tailroom in a mbuf packet. E.g. fragments are
actually 3K+, but frag size is still as if MTU was 1500.
Always register new XDP RxQ info after reconfiguring memory pools.
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20260305111253.2317394-4-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch addresses the issue where the igc_xsk_wakeup function
was triggering an incorrect IRQ for tx-0 when the i226 is configured
with only 2 combined queues or in an environment with 2 active CPU cores.
This prevented XDP Zero-copy send functionality in such split IRQ
configurations.
The fix implements the correct logic for extracting q_vectors saved
during rx and tx ring allocation and utilizes flags provided by the
ndo_xsk_wakeup API to trigger the appropriate IRQ.
Fixes: fc9df2a0b520 ("igc: Enable RX via AF_XDP zero-copy")
Fixes: 15fd021bc427 ("igc: Add Tx hardware timestamp request for AF_XDP zero-copy packet")
Signed-off-by: Vivek Behera <vivek.behera@siemens.com>
Reviewed-by: Jacob Keller <jacob.keller@intel.com>
Reviewed-by: Aleksandr loktinov <aleksandr.loktionov@intel.com>
Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Reviewed-by: Song Yoong Siang <yoong.siang.song@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The current implementation in the igb_xsk_wakeup expects
the Rx and Tx queues to share the same irq. This would lead
to triggering of incorrect irq in split irq configuration.
This patch addresses this issue which could impact environments
with 2 active cpu cores
or when the number of queues is reduced to 2 or less
cat /proc/interrupts | grep eno2
167: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
0-edge eno2
168: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
1-edge eno2-rx-0
169: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
2-edge eno2-rx-1
170: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
3-edge eno2-tx-0
171: 0 0 0 0 IR-PCI-MSIX-0000:08:00.0
4-edge eno2-tx-1
Furthermore it uses the flags input argument to trigger either rx, tx or
both rx and tx irqs as specified in the ndo_xsk_wakeup api documentation
Fixes: 80f6ccf9f116 ("igb: Introduce XSK data structures and helpers")
Signed-off-by: Vivek Behera <vivek.behera@siemens.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Saritha Sanigani <sarithax.sanigani@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
iavf sets LIBIE_MAX_MTU as netdev->max_mtu, ignoring vf_res->max_mtu
from PF [1]. This allows setting an MTU beyond the actual hardware
limit, causing TX queue timeouts [2].
Set correct netdev->max_mtu using vf_res->max_mtu from the PF.
Note that currently PF drivers such as ice/i40e set the frame size in
vf_res->max_mtu, not MTU. Convert vf_res->max_mtu to MTU before setting
netdev->max_mtu.
[1]
# ip -j -d link show $DEV | jq '.[0].max_mtu'
16356
[2]
iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5692 ms
iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex
iavf 0000:00:05.0 enp0s5: NETDEV WATCHDOG: CPU: 6: transmit queue 3 timed out 5312 ms
iavf 0000:00:05.0 enp0s5: NIC Link is Up Speed is 10 Gbps Full Duplex
...
Fixes: 5fa4caff59f2 ("iavf: switch to Page Pool")
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The libie_fwlog_deinit() function can be called during driver unload
even when firmware logging was never properly initialized. This led to call
trace:
[ 148.576156] Oops: Oops: 0000 [#1] SMP NOPTI
[ 148.576167] CPU: 80 UID: 0 PID: 12843 Comm: rmmod Kdump: loaded Not tainted 6.17.0-rc7next-queue-3oct-01915-g06d79d51cf51 #1 PREEMPT(full)
[ 148.576177] Hardware name: HPE ProLiant DL385 Gen10 Plus/ProLiant DL385 Gen10 Plus, BIOS A42 07/18/2020
[ 148.576182] RIP: 0010:__dev_printk+0x16/0x70
[ 148.576196] Code: 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 55 41 54 49 89 d4 55 48 89 fd 53 48 85 f6 74 3c <4c> 8b 6e 50 48 89 f3 4d 85 ed 75 03 4c 8b 2e 48 89 df e8 f3 27 98
[ 148.576204] RSP: 0018:ffffd2fd7ea17a48 EFLAGS: 00010202
[ 148.576211] RAX: ffffd2fd7ea17aa0 RBX: ffff8eb288ae2000 RCX: 0000000000000000
[ 148.576217] RDX: ffffd2fd7ea17a70 RSI: 00000000000000c8 RDI: ffffffffb68d3d88
[ 148.576222] RBP: ffffffffb68d3d88 R08: 0000000000000000 R09: 0000000000000000
[ 148.576227] R10: 00000000000000c8 R11: ffff8eb2b1a49400 R12: ffffd2fd7ea17a70
[ 148.576231] R13: ffff8eb3141fb000 R14: ffffffffc1215b48 R15: ffffffffc1215bd8
[ 148.576236] FS: 00007f5666ba6740(0000) GS:ffff8eb2472b9000(0000) knlGS:0000000000000000
[ 148.576242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 148.576247] CR2: 0000000000000118 CR3: 000000011ad17000 CR4: 0000000000350ef0
[ 148.576252] Call Trace:
[ 148.576258] <TASK>
[ 148.576269] _dev_warn+0x7c/0x96
[ 148.576290] libie_fwlog_deinit+0x112/0x117 [libie_fwlog]
[ 148.576303] ixgbe_remove+0x63/0x290 [ixgbe]
[ 148.576342] pci_device_remove+0x42/0xb0
[ 148.576354] device_release_driver_internal+0x19c/0x200
[ 148.576365] driver_detach+0x48/0x90
[ 148.576372] bus_remove_driver+0x6d/0xf0
[ 148.576383] pci_unregister_driver+0x2e/0xb0
[ 148.576393] ixgbe_exit_module+0x1c/0xd50 [ixgbe]
[ 148.576430] __do_sys_delete_module.isra.0+0x1bc/0x2e0
[ 148.576446] do_syscall_64+0x7f/0x980
It can be reproduced by trying to unload ixgbe driver in recovery mode.
Fix that by checking if fwlog is supported before doing unroll.
Fixes: 641585bc978e ("ixgbe: fwlog support for e610")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In ice_set_ringparam, tx_rings and xdp_rings are allocated before
rx_rings. If the allocation of rx_rings fails, the code jumps to
the done label leaking both tx_rings and xdp_rings. Furthermore, if
the setup of an individual Rx ring fails during the loop, the code jumps
to the free_tx label which releases tx_rings but leaks xdp_rings.
Fix this by introducing a free_xdp label and updating the error paths to
ensure both xdp_rings and tx_rings are properly freed if rx_rings
allocation or setup fails.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")
Fixes: efc2214b6047 ("ice: Add support for XDP")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Executing ethtool -m can fail reporting a netlink I/O error while firmware
link management holds the i2c bus used to communicate with the module.
According to Intel(R) Ethernet Controller E810 Datasheet Rev 2.8 [1]
Section 3.3.10.4 Read/Write SFF EEPROM (0x06EE)
request should to be retried upon receiving EBUSY from firmware.
Commit e9c9692c8a81 ("ice: Reimplement module reads used by ethtool")
implemented it only for part of ice_get_module_eeprom(), leaving all other
calls to ice_aq_sff_eeprom() vulnerable to returning early on getting
EBUSY without retrying.
Remove the retry loop from ice_get_module_eeprom() and add Admin Queue
(AQ) command with opcode 0x06EE to the list of commands that should be
retried on receiving EBUSY from firmware.
Cc: stable@vger.kernel.org
Fixes: e9c9692c8a81 ("ice: Reimplement module reads used by ethtool")
Signed-off-by: Jakub Staniszewski <jakub.staniszewski@linux.intel.com>
Co-developed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://www.intel.com/content/www/us/en/content-details/613875/intel-ethernet-controller-e810-datasheet.html [1]
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add retry mechanism for indirect Admin Queue (AQ) commands. To do so we
need to keep the command buffer.
This technically reverts commit 43a630e37e25
("ice: remove unused buffer copy code in ice_sq_send_cmd_retry()"),
but combines it with a fix in the logic by using a kmemdup() call,
making it more robust and less likely to break in the future due to
programmer error.
Cc: Michal Schmidt <mschmidt@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 3056df93f7a8 ("ice: Re-send some AQ commands, as result of EBUSY AQ error")
Signed-off-by: Jakub Staniszewski <jakub.staniszewski@linux.intel.com>
Co-developed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The referenced commit came from a misunderstanding of the FW LLDP filter
AQ (Admin Queue) command due to the error in the internal documentation.
Contrary to the assumptions in the original commit, VFs can be added and
deleted from this filter without any problems. Introduced dev_info message
proved to be useful, so reverting the whole commit does not make sense.
Without this fix, trusted VFs do not receive LLDP traffic, if there is an
AQ LLDP filter on PF. When trusted VF attempts to add an LLDP multicast
MAC address, the following message can be seen in dmesg on host:
ice 0000:33:00.0: Failed to add Rx LLDP rule on VSI 20 error: -95
Revert checking VSI type when adding LLDP filter through AQ.
Fixes: 4d5a1c4e6d49 ("ice: do not add LLDP-specific filter if not necessary")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-02-19 (idpf, ice, i40e, ixgbevf, e1000e)
For idpf:
Li Li moves the check for software marker to occur after incrementing
next to clean to avoid re-encountering the same packet. He also adds a
couple of checks to prevent NULL pointer dereferences and NULLs rss_key,
after free, in error path so that later checks are properly evaluated.
Brian Vazquez adjusts IRQ naming to have correlation with netdev naming.
Sreedevi removes validation of action type as part of ntuple rule
deletion.
For ice:
Aaron Ma breaks RDMA initialization into two steps and adjusts calls so
that VSIs are entirely configured before plugging.
Michal Schmidt fixes initialization of loopback VSI to have proper
resources allocated to allow for loopback testing to occur.
For i40e:
Thomas Gleixner fixes a leak of preempt count by replacing get_cpu()
with smp_processor_id().
For ixgbevf:
Jedrzej adds a check for mailbox version before attempting to call an
associated link state call that is supported in that mailbox version.
For e1000e:
Vitaly clears power gating feature for Panther Lake systems to avoid
packet issues.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000e: clear DPG_EN after reset to avoid autonomous power-gating
e1000e: introduce new board type for Panther Lake PCH
ixgbevf: fix link setup issue
i40e: Fix preempt count leak in napi poll tracepoint
ice: fix crash in ethtool offline loopback test
ice: recap the VSI and QoS info after rebuild
idpf: Fix flow rule delete failure due to invalid validation
idpf: change IRQ naming to match netdev and ethtool queue numbering
idpf: nullify pointers after they are freed
idpf: skip deallocating txq group's txqs if it is NULL
idpf: skip deallocating bufq_sets from rx_qgrp if it is NULL
idpf: increment completion queue next_to_clean in sw marker wait routine
====================
Link: https://patch.msgid.link/20260225211546.1949260-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Panther Lake systems introduced an autonomous power gating feature for
the integrated Gigabit Ethernet in shutdown state (S5) state. As part of
it, the reset value of DPG_EN bit was changed to 1. Clear this bit after
performing hardware reset to avoid errors such as Tx/Rx hangs, or packet
loss/corruption.
Fixes: 0c9183ce61bc ("e1000e: Add support for the next LOM generation")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add new board type for Panther Lake devices for separating device-specific
features and flows.
Additionally, remove the deprecated device IDs 0x57B5 and 0x57B6, which
are not used by any existing devices.
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
It may happen that VF spawned for E610 adapter has problem with setting
link up. This happens when ixgbevf supporting mailbox API 1.6 cooperates
with PF driver which doesn't support this version of API, and hence
doesn't support new approach for getting PF link data.
In that case VF asks PF to provide link data but as PF doesn't support
it, returns -EOPNOTSUPP what leads to early bail from link configuration
sequence.
Avoid such situation by using legacy VFLINKS approach whenever negotiated
API version is less than 1.6.
To reproduce the issue just create VF and set its link up - adapter must
be any from the E610 family, ixgbevf must support API 1.6 or higher while
ixgbevf must not.
Fixes: 53f0eb62b4d2 ("ixgbevf: fix getting link speed data for E610 devices")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Using get_cpu() in the tracepoint assignment causes an obvious preempt
count leak because nothing invokes put_cpu() to undo it:
softirq: huh, entered softirq 3 NET_RX with preempt_count 00000100, exited with 00000101?
This clearly has seen a lot of testing in the last 3+ years...
Use smp_processor_id() instead.
Fixes: 6d4d584a7ea8 ("i40e: Add i40e_napi_poll tracepoint")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Reviewed-by: Joe Damato <joe@dama.to>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Since the conversion of ice to page pool, the ethtool loopback test
crashes:
BUG: kernel NULL pointer dereference, address: 000000000000000c
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1100f1067 P4D 0
Oops: Oops: 0002 [#1] SMP NOPTI
CPU: 23 UID: 0 PID: 5904 Comm: ethtool Kdump: loaded Not tainted 6.19.0-0.rc7.260128g1f97d9dcf5364.49.eln154.x86_64 #1 PREEMPT(lazy)
Hardware name: [...]
RIP: 0010:ice_alloc_rx_bufs+0x1cd/0x310 [ice]
Code: 83 6c 24 30 01 66 41 89 47 08 0f 84 c0 00 00 00 41 0f b7 dc 48 8b 44 24 18 48 c1 e3 04 41 bb 00 10 00 00 48 8d 2c 18 8b 04 24 <89> 45 0c 41 8b 4d 00 49 d3 e3 44 3b 5c 24 24 0f 83 ac fe ff ff 44
RSP: 0018:ff7894738aa1f768 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000700 RDI: 0000000000000000
RBP: 0000000000000000 R08: ff16dcae79880200 R09: 0000000000000019
R10: 0000000000000001 R11: 0000000000001000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ff16dcae6c670000
FS: 00007fcf428850c0(0000) GS:ff16dcb149710000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000000c CR3: 0000000121227005 CR4: 0000000000773ef0
PKRU: 55555554
Call Trace:
<TASK>
ice_vsi_cfg_rxq+0xca/0x460 [ice]
ice_vsi_cfg_rxqs+0x54/0x70 [ice]
ice_loopback_test+0xa9/0x520 [ice]
ice_self_test+0x1b9/0x280 [ice]
ethtool_self_test+0xe5/0x200
__dev_ethtool+0x1106/0x1a90
dev_ethtool+0xbe/0x1a0
dev_ioctl+0x258/0x4c0
sock_do_ioctl+0xe3/0x130
__x64_sys_ioctl+0xb9/0x100
do_syscall_64+0x7c/0x700
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[...]
It crashes because we have not initialized libeth for the rx ring.
Fix it by treating ICE_VSI_LB VSIs slightly more like normal PF VSIs and
letting them have a q_vector. It's just a dummy, because the loopback
test does not use interrupts, but it contains a napi struct that can be
passed to libeth_rx_fq_create() called from ice_vsi_cfg_rxq() ->
ice_rxq_pp_create().
Fixes: 93f53db9f9dc ("ice: switch to Page Pool")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix IRDMA hardware initialization timeout (-110) after resume by
separating VSI-dependent configuration from RDMA resource allocation,
ensuring VSI is rebuilt before IRDMA accesses it.
After resume from suspend, IRDMA hardware initialization fails:
ice: IRDMA hardware initialization FAILED init_state=4 status=-110
Separate RDMA initialization into two phases:
1. ice_init_rdma() - Allocate resources only (no VSI/QoS access, no plug)
2. ice_rdma_finalize_setup() - Assign VSI/QoS info and plug device
This allows:
- ice_init_rdma() to stay in ice_resume() (mirrors ice_deinit_rdma()
in ice_suspend())
- VSI assignment deferred until after ice_vsi_rebuild() completes
- QoS info updated after ice_dcb_rebuild() completes
- Device plugged only when control queues, VSI, and DCB are all ready
Fixes: bc69ad74867db ("ice: avoid IRQ collision to fix init failure on ACPI S3 resume")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When deleting a flow rule using "ethtool -N <dev> delete <location>",
idpf_sideband_action_ena() incorrectly validates fsp->ring_cookie even
though ethtool doesn't populate this field for delete operations. The
uninitialized ring_cookie may randomly match RX_CLS_FLOW_DISC or
RX_CLS_FLOW_WAKE, causing validation to fail and preventing legitimate
rule deletions. Remove the unnecessary sideband action enable check and
ring_cookie validation during delete operations since action validation
is not required when removing existing rules.
Fixes: ada3e24b84a0 ("idpf: add flow steering support")
Signed-off-by: Sreedevi Joshi <sreedevi.joshi@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The code uses the vidx for the IRQ name but that doesn't match ethtool
reporting nor netdev naming, this makes it hard to tune the device and
associate queues with IRQs. Sequentially requesting irqs starting from
'0' makes the output consistent.
This commit changes the interrupt numbering but preserves the name
format, maintaining ABI compatibility. Existing tools relying on the old
numbering are already non-functional, as they lack a useful correlation
to the interrupts.
Before:
ethtool -L eth1 tx 1 combined 3
grep . /proc/irq/*/*idpf*/../smp_affinity_list
/proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167
/proc/irq/68/idpf-eth1-TxRx-1/../smp_affinity_list:0
/proc/irq/70/idpf-eth1-TxRx-3/../smp_affinity_list:1
/proc/irq/71/idpf-eth1-TxRx-4/../smp_affinity_list:2
/proc/irq/72/idpf-eth1-Tx-5/../smp_affinity_list:3
ethtool -S eth1 | grep -v ': 0'
NIC statistics:
tx_q-0_pkts: 1002
tx_q-1_pkts: 2679
tx_q-2_pkts: 1113
tx_q-3_pkts: 1192 <----- tx_q-3 vs idpf-eth1-Tx-5
rx_q-0_pkts: 1143
rx_q-1_pkts: 3172
rx_q-2_pkts: 1074
After:
ethtool -L eth1 tx 1 combined 3
grep . /proc/irq/*/*idpf*/../smp_affinity_list
/proc/irq/67/idpf-Mailbox-0/../smp_affinity_list:0-55,112-167
/proc/irq/68/idpf-eth1-TxRx-0/../smp_affinity_list:0
/proc/irq/70/idpf-eth1-TxRx-1/../smp_affinity_list:1
/proc/irq/71/idpf-eth1-TxRx-2/../smp_affinity_list:2
/proc/irq/72/idpf-eth1-Tx-3/../smp_affinity_list:3
ethtool -S eth1 | grep -v ': 0'
NIC statistics:
tx_q-0_pkts: 118
tx_q-1_pkts: 134
tx_q-2_pkts: 228
tx_q-3_pkts: 138 <--- tx_q-3 matches idpf-eth1-Tx-3
rx_q-0_pkts: 111
rx_q-1_pkts: 366
rx_q-2_pkts: 120
Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
rss_data->rss_key needs to be nullified after it is freed.
Checks like "if (!rss_data->rss_key)" in the code could fail
if it is not nullified.
Tested: built and booted the kernel.
Fixes: 83f38f210b85 ("idpf: Fix RSS LUT NULL pointer crash on early ethtool operations")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In idpf_txq_group_alloc(), if any txq group's txqs failed to
allocate memory:
for (j = 0; j < tx_qgrp->num_txq; j++) {
tx_qgrp->txqs[j] = kzalloc(sizeof(*tx_qgrp->txqs[j]),
GFP_KERNEL);
if (!tx_qgrp->txqs[j])
goto err_alloc;
}
It would cause a NULL ptr kernel panic in idpf_txq_group_rel():
for (j = 0; j < txq_grp->num_txq; j++) {
if (flow_sch_en) {
kfree(txq_grp->txqs[j]->refillq);
txq_grp->txqs[j]->refillq = NULL;
}
kfree(txq_grp->txqs[j]);
txq_grp->txqs[j] = NULL;
}
[ 6.532461] BUG: kernel NULL pointer dereference, address: 0000000000000058
...
[ 6.534433] RIP: 0010:idpf_txq_group_rel+0xc9/0x110
...
[ 6.538513] Call Trace:
[ 6.538639] <TASK>
[ 6.538760] idpf_vport_queues_alloc+0x75/0x550
[ 6.538978] idpf_vport_open+0x4d/0x3f0
[ 6.539164] idpf_open+0x71/0xb0
[ 6.539324] __dev_open+0x142/0x260
[ 6.539506] netif_open+0x2f/0xe0
[ 6.539670] dev_open+0x3d/0x70
[ 6.539827] bond_enslave+0x5ed/0xf50
[ 6.540005] ? rcutree_enqueue+0x1f/0xb0
[ 6.540193] ? call_rcu+0xde/0x2a0
[ 6.540375] ? barn_get_empty_sheaf+0x5c/0x80
[ 6.540594] ? __kfree_rcu_sheaf+0xb6/0x1a0
[ 6.540793] ? nla_put_ifalias+0x3d/0x90
[ 6.540981] ? kvfree_call_rcu+0xb5/0x3b0
[ 6.541173] ? kvfree_call_rcu+0xb5/0x3b0
[ 6.541365] do_set_master+0x114/0x160
[ 6.541547] do_setlink+0x412/0xfb0
[ 6.541717] ? security_sock_rcv_skb+0x2a/0x50
[ 6.541931] ? sk_filter_trim_cap+0x7c/0x320
[ 6.542136] ? skb_queue_tail+0x20/0x50
[ 6.542322] ? __nla_validate_parse+0x92/0xe50
ro[o t t o6 .d5e4f2a5u4l0t]- ? security_capable+0x35/0x60
[ 6.542792] rtnl_newlink+0x95c/0xa00
[ 6.542972] ? __rtnl_unlock+0x37/0x70
[ 6.543152] ? netdev_run_todo+0x63/0x530
[ 6.543343] ? allocate_slab+0x280/0x870
[ 6.543531] ? security_capable+0x35/0x60
[ 6.543722] rtnetlink_rcv_msg+0x2e6/0x340
[ 6.543918] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 6.544138] netlink_rcv_skb+0x16a/0x1a0
[ 6.544328] netlink_unicast+0x20a/0x320
[ 6.544516] netlink_sendmsg+0x304/0x3b0
[ 6.544748] __sock_sendmsg+0x89/0xb0
[ 6.544928] ____sys_sendmsg+0x167/0x1c0
[ 6.545116] ? ____sys_recvmsg+0xed/0x150
[ 6.545308] ___sys_sendmsg+0xdd/0x120
[ 6.545489] ? ___sys_recvmsg+0x124/0x1e0
[ 6.545680] ? rcutree_enqueue+0x1f/0xb0
[ 6.545867] ? rcutree_enqueue+0x1f/0xb0
[ 6.546055] ? call_rcu+0xde/0x2a0
[ 6.546222] ? evict+0x286/0x2d0
[ 6.546389] ? rcutree_enqueue+0x1f/0xb0
[ 6.546577] ? kmem_cache_free+0x2c/0x350
[ 6.546784] __x64_sys_sendmsg+0x72/0xc0
[ 6.546972] do_syscall_64+0x6f/0x890
[ 6.547150] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 6.547393] RIP: 0033:0x7fc1a3347bd0
...
[ 6.551375] RIP: 0010:idpf_txq_group_rel+0xc9/0x110
...
[ 6.578856] Rebooting in 10 seconds..
We should skip deallocating txqs[j] if it is NULL in the first place.
Tested: with this patch, the kernel panic no longer appears.
Fixes: 1c325aac10a8 ("idpf: configure resources for TX queues")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In idpf_rxq_group_alloc(), if rx_qgrp->splitq.bufq_sets failed to get
allocated:
rx_qgrp->splitq.bufq_sets = kcalloc(vport->num_bufqs_per_qgrp,
sizeof(struct idpf_bufq_set),
GFP_KERNEL);
if (!rx_qgrp->splitq.bufq_sets) {
err = -ENOMEM;
goto err_alloc;
}
idpf_rxq_group_rel() would attempt to deallocate it in
idpf_rxq_sw_queue_rel(), causing a kernel panic:
```
[ 7.967242] early-network-sshd-n-rexd[3148]: knetbase: Info: [ 8.127804] BUG: kernel NULL pointer dereference, address: 00000000000000c0
...
[ 8.129779] RIP: 0010:idpf_rxq_group_rel+0x101/0x170
...
[ 8.133854] Call Trace:
[ 8.133980] <TASK>
[ 8.134092] idpf_vport_queues_alloc+0x286/0x500
[ 8.134313] idpf_vport_open+0x4d/0x3f0
[ 8.134498] idpf_open+0x71/0xb0
[ 8.134668] __dev_open+0x142/0x260
[ 8.134840] netif_open+0x2f/0xe0
[ 8.135004] dev_open+0x3d/0x70
[ 8.135166] bond_enslave+0x5ed/0xf50
[ 8.135345] ? nla_put_ifalias+0x3d/0x90
[ 8.135533] ? kvfree_call_rcu+0xb5/0x3b0
[ 8.135725] ? kvfree_call_rcu+0xb5/0x3b0
[ 8.135916] do_set_master+0x114/0x160
[ 8.136098] do_setlink+0x412/0xfb0
[ 8.136269] ? security_sock_rcv_skb+0x2a/0x50
[ 8.136509] ? sk_filter_trim_cap+0x7c/0x320
[ 8.136714] ? skb_queue_tail+0x20/0x50
[ 8.136899] ? __nla_validate_parse+0x92/0xe50
[ 8.137112] ? security_capable+0x35/0x60
[ 8.137304] rtnl_newlink+0x95c/0xa00
[ 8.137483] ? __rtnl_unlock+0x37/0x70
[ 8.137664] ? netdev_run_todo+0x63/0x530
[ 8.137855] ? allocate_slab+0x280/0x870
[ 8.138044] ? security_capable+0x35/0x60
[ 8.138235] rtnetlink_rcv_msg+0x2e6/0x340
[ 8.138431] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 8.138650] netlink_rcv_skb+0x16a/0x1a0
[ 8.138840] netlink_unicast+0x20a/0x320
[ 8.139028] netlink_sendmsg+0x304/0x3b0
[ 8.139217] __sock_sendmsg+0x89/0xb0
[ 8.139399] ____sys_sendmsg+0x167/0x1c0
[ 8.139588] ? ____sys_recvmsg+0xed/0x150
[ 8.139780] ___sys_sendmsg+0xdd/0x120
[ 8.139960] ? ___sys_recvmsg+0x124/0x1e0
[ 8.140152] ? rcutree_enqueue+0x1f/0xb0
[ 8.140341] ? rcutree_enqueue+0x1f/0xb0
[ 8.140528] ? call_rcu+0xde/0x2a0
[ 8.140695] ? evict+0x286/0x2d0
[ 8.140856] ? rcutree_enqueue+0x1f/0xb0
[ 8.141043] ? kmem_cache_free+0x2c/0x350
[ 8.141236] __x64_sys_sendmsg+0x72/0xc0
[ 8.141424] do_syscall_64+0x6f/0x890
[ 8.141603] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 8.141841] RIP: 0033:0x7f2799d21bd0
...
[ 8.149905] Kernel panic - not syncing: Fatal exception
[ 8.175940] Kernel Offset: 0xf800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 8.176425] Rebooting in 10 seconds..
```
Tested: With this patch, the kernel panic no longer appears.
Fixes: 95af467d9a4e ("idpf: configure resources for RX queues")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently, in idpf_wait_for_sw_marker_completion(), when an
IDPF_TXD_COMPLT_SW_MARKER packet is found, the routine breaks out of
the for loop and does not increment the next_to_clean counter. This
causes the subsequent NAPI polls to run into the same
IDPF_TXD_COMPLT_SW_MARKER packet again and print out the following:
[ 23.261341] idpf 0000:05:00.0 eth1: Unknown TX completion type: 5
Instead, we should increment next_to_clean regardless when an
IDPF_TXD_COMPLT_SW_MARKER packet is found.
Tested: with the patch applied, we do not see the errors above from NAPI
polls anymore.
Fixes: 9d39447051a0 ("idpf: remove SW marker handling from NAPI")
Signed-off-by: Li Li <boolli@google.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Conversion performed via this Coccinelle script:
// SPDX-License-Identifier: GPL-2.0-only
// Options: --include-headers-for-types --all-includes --include-headers --keep-comments
virtual patch
@gfp depends on patch && !(file in "tools") && !(file in "samples")@
identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
kzalloc_obj,kzalloc_objs,kzalloc_flex,
kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
@@
ALLOC(...
- , GFP_KERNEL
)
$ make coccicheck MODE=patch COCCI=gfp.cocci
Build and boot tested x86_64 with Fedora 42's GCC and Clang:
Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
The ID 8086:104f is matched by both i40e and ipw2200. The same device
ID should not be in more than one driver, because in that case, which
driver is used is unpredictable. Fix this by taking advantage of the
fact that i40e devices use PCI_CLASS_NETWORK_ETHERNET and ipw2200
devices use PCI_CLASS_NETWORK_OTHER to differentiate the devices.
Fixes: 2e45d3f4677a ("i40e: Add support for X710 B/P & SFP+ cards")
Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Link: https://patch.msgid.link/20260210021235.16315-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the ice TX path, that used to check and remove
HBH.
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-8-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.19-rc9).
No adjacent changes, conflicts:
drivers/net/ethernet/spacemit/k1_emac.c
3125fc1701694 ("net: spacemit: k1-emac: fix jumbo frame support")
f66086798f91f ("net: spacemit: Remove broken flow control support")
https://lore.kernel.org/aYIysFIE9ooavWia@sirena.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement SyncE support for the E825-C Ethernet controller using the
DPLL subsystem. Unlike E810, the E825-C architecture relies on platform
firmware (ACPI) to describe connections between the NIC's recovered clock
outputs and external DPLL inputs.
Implement the following mechanisms to support this architecture:
1. Discovery Mechanism: The driver parses the 'dpll-pins' and 'dpll-pin names'
firmware properties to identify the external DPLL pins (parents)
corresponding to its RCLK outputs ("rclk0", "rclk1"). It uses
fwnode_dpll_pin_find() to locate these parent pins in the DPLL core.
2. Asynchronous Registration: Since the platform DPLL driver (e.g.
zl3073x) may probe independently of the network driver, utilize
the DPLL notifier chain The driver listens for DPLL_PIN_CREATED
events to detect when the parent MUX pins become available, then
registers its own Recovered Clock (RCLK) pins as children of those
parents.
3. Hardware Configuration: Implement the specific register access logic
for E825-C CGU (Clock Generation Unit) registers (R10, R11). This
includes configuring the bypass MUXes and clock dividers required to
drive SyncE signals.
4. Split Initialization: Refactor `ice_dpll_init()` to separate the
static initialization path of E810 from the dynamic, firmware-driven
path required for E825-C.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20260203174002.705176-10-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Update existing DPLL drivers to utilize the DPLL reference count
tracking infrastructure.
Add dpll_tracker fields to the drivers' internal device and pin
structures. Pass pointers to these trackers when calling
dpll_device_get/put() and dpll_pin_get/put().
This allows developers to inspect the specific references held by this
driver via debugfs when CONFIG_DPLL_REFCNT_TRACKER is enabled, aiding
in the debugging of resource leaks.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20260203174002.705176-9-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for the REF_TRACKER infrastructure to the DPLL subsystem.
When enabled, this allows developers to track and debug reference counting
leaks or imbalances for dpll_device and dpll_pin objects. It records stack
traces for every get/put operation and exposes this information via
debugfs at:
/sys/kernel/debug/ref_tracker/dpll_device_*
/sys/kernel/debug/ref_tracker/dpll_pin_*
The following API changes are made to support this:
1. dpll_device_get() / dpll_device_put() now accept a 'dpll_tracker *'
(which is a typedef to 'struct ref_tracker *' when enabled, or an empty
struct otherwise).
2. dpll_pin_get() / dpll_pin_put() and fwnode_dpll_pin_find() similarly
accept the tracker argument.
3. Internal registration structures now hold a tracker to associate the
reference held by the registration with the specific owner.
All existing in-tree drivers (ice, mlx5, ptp_ocp, zl3073x) are updated
to pass NULL for the new tracker argument, maintaining current behavior
while enabling future debugging capabilities.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20260203174002.705176-8-ivecera@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The i40e driver calls udp_tunnel_get_rx_info() during i40e_open().
This is redundant because UDP tunnel RX offload state is preserved
across device down/up cycles. The udp_tunnel core handles
synchronization automatically when required.
Furthermore, recent changes in the udp_tunnel infrastructure require
querying RX info while holding the udp_tunnel lock. Calling it
directly from the ndo_open path violates this requirement,
triggering the following lockdep warning:
Call Trace:
<TASK>
? __udp_tunnel_nic_assert_locked+0x39/0x40 [udp_tunnel]
i40e_open+0x135/0x14f [i40e]
__dev_open+0x121/0x2e0
__dev_change_flags+0x227/0x270
dev_change_flags+0x3d/0xb0
devinet_ioctl+0x56f/0x860
sock_do_ioctl+0x7b/0x130
__x64_sys_ioctl+0x91/0xd0
do_syscall_64+0x90/0x170
...
</TASK>
Remove the redundant and unsafe call to udp_tunnel_get_rx_info() from
i40e_open() resolve the locking violation.
Fixes: 1ead7501094c ("udp_tunnel: remove rtnl_lock dependency")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice driver calls udp_tunnel_get_rx_info() during ice_open_internal().
This is redundant because UDP tunnel RX offload state is preserved
across device down/up cycles. The udp_tunnel core handles
synchronization automatically when required.
Furthermore, recent changes in the udp_tunnel infrastructure require
querying RX info while holding the udp_tunnel lock. Calling it
directly from the ndo_open path violates this requirement,
triggering the following lockdep warning:
Call Trace:
<TASK>
ice_open_internal+0x253/0x350 [ice]
__udp_tunnel_nic_assert_locked+0x86/0xb0 [udp_tunnel]
__dev_open+0x2f5/0x880
__dev_change_flags+0x44c/0x660
netif_change_flags+0x80/0x160
devinet_ioctl+0xd21/0x15f0
inet_ioctl+0x311/0x350
sock_ioctl+0x114/0x220
__x64_sys_ioctl+0x131/0x1a0
...
</TASK>
Remove the redundant and unsafe call to udp_tunnel_get_rx_info() from
ice_open_internal() to resolve the locking violation
Fixes: 1ead7501094c ("udp_tunnel: remove rtnl_lock dependency")
Signed-off-by: Mohammad Heib <mheib@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix race condition where PTP periodic work runs while VSI is being
rebuilt, accessing NULL vsi->rx_rings.
The sequence was:
1. ice_ptp_prepare_for_reset() cancels PTP work
2. ice_ptp_rebuild() immediately queues PTP work
3. VSI rebuild happens AFTER ice_ptp_rebuild()
4. PTP work runs and accesses NULL vsi->rx_rings
Fix: Keep PTP work cancelled during rebuild, only queue it after
VSI rebuild completes in ice_rebuild().
Added ice_ptp_queue_work() helper function to encapsulate the logic
for queuing PTP work, ensuring it's only queued when PTP is supported
and the state is ICE_PTP_READY.
Error log:
[ 121.392544] ice 0000:60:00.1: PTP reset successful
[ 121.392692] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 121.392712] #PF: supervisor read access in kernel mode
[ 121.392720] #PF: error_code(0x0000) - not-present page
[ 121.392727] PGD 0
[ 121.392734] Oops: Oops: 0000 [#1] SMP NOPTI
[ 121.392746] CPU: 8 UID: 0 PID: 1005 Comm: ice-ptp-0000:60 Tainted: G S 6.19.0-rc6+ #4 PREEMPT(voluntary)
[ 121.392761] Tainted: [S]=CPU_OUT_OF_SPEC
[ 121.392773] RIP: 0010:ice_ptp_update_cached_phctime+0xbf/0x150 [ice]
[ 121.393042] Call Trace:
[ 121.393047] <TASK>
[ 121.393055] ice_ptp_periodic_work+0x69/0x180 [ice]
[ 121.393202] kthread_worker_fn+0xa2/0x260
[ 121.393216] ? __pfx_ice_ptp_periodic_work+0x10/0x10 [ice]
[ 121.393359] ? __pfx_kthread_worker_fn+0x10/0x10
[ 121.393371] kthread+0x10d/0x230
[ 121.393382] ? __pfx_kthread+0x10/0x10
[ 121.393393] ret_from_fork+0x273/0x2b0
[ 121.393407] ? __pfx_kthread+0x10/0x10
[ 121.393417] ret_from_fork_asm+0x1a/0x30
[ 121.393432] </TASK>
Fixes: 803bef817807d ("ice: factor out ice_ptp_rebuild_owner()")
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The E825 hardware currently has each PF handle the PFINT_TSYN_TX cause of
the miscellaneous OICR interrupt vector. The actual interrupt cause
underlying this is shared by all ports on the same quad:
┌─────────────────────────────────┐
│ │
│ ┌────┐ ┌────┐ ┌────┐ ┌────┐ │
│ │PF 0│ │PF 1│ │PF 2│ │PF 3│ │
│ └────┘ └────┘ └────┘ └────┘ │
│ │
└────────────────▲────────────────┘
│
│
┌────────────────┼────────────────┐
│ PHY QUAD │
└───▲────────▲────────▲────────▲──┘
│ │ │ │
┌───┼──┐ ┌───┴──┐ ┌───┼──┐ ┌───┼──┐
│Port 0│ │Port 1│ │Port 2│ │Port 3│
└──────┘ └──────┘ └──────┘ └──────┘
If multiple PFs issue Tx timestamp requests near simultaneously, it is
possible that the correct PF will not be interrupted and will miss its
timestamp. Understanding why is somewhat complex.
Consider the following sequence of events:
CPU 0:
Send Tx packet on PF 0
...
PF 0 enqueues packet with Tx request CPU 1, PF1:
... Send Tx packet on PF1
... PF 1 enqueues packet with Tx request
HW:
PHY Port 0 sends packet
PHY raises Tx timestamp event interrupt
MAC raises each PF interrupt
CPU 0, PF0: CPU 1, PF1:
ice_misc_intr() checks for Tx timestamps ice_misc_intr() checks for Tx timestamp
Sees packet ready bit set Sees nothing available
... Exits
...
...
HW:
PHY port 1 sends packet
PHY interrupt ignored because not all packet timestamps read yet.
...
Read timestamp, report to stack
Because the interrupt event is shared for all ports on the same quad, the
PHY will not raise a new interrupt for any PF until all timestamps are
read.
In the example above, the second timestamp comes in for port 1 before the
timestamp from port 0 is read. At this point, there is no longer an
interrupt thread running that will read the timestamps, because each PF has
checked and found that there was no work to do. Applications such as ptp4l
will timeout after waiting a few milliseconds. Eventually, the watchdog
service task will re-check for all quads and notice that there are
outstanding timestamps, and issue a software interrupt to recover. However,
by this point it is far too late, and applications have already failed.
All of this occurs because of the underlying hardware behavior. The PHY
cannot raise a new interrupt signal until all outstanding timestamps have
been read.
As a first step to fix this, switch the E825C hardware to the
ICE_PTP_TX_INTERRUPT_ALL mode. In this mode, only the clock owner PF will
respond to the PFINT_TSYN_TX cause. Other PFs disable this cause and will
not wake. In this mode, the clock owner will iterate over all ports and
handle timestamps for each connected port.
This matches the E822 behavior, and is a necessary but insufficient step to
resolve the missing timestamps.
Even with use of the ICE_PTP_TX_INTERRUPT_ALL mode, we still sometimes miss
a timestamp event. The ice_ptp_tx_tstamp_owner() does re-check the ready
bitmap, but does so before re-enabling the OICR interrupt vector. It also
only checks the ready bitmap, but not the software Tx timestamp tracker.
To avoid risk of losing a timestamp, refactor the logic to check both the
software Tx timestamp tracker bitmap *and* the hardware ready bitmap.
Additionally, do this outside of ice_ptp_process_ts() after we have already
re-enabled the OICR interrupt.
Remove the checks from the ice_ptp_tx_tstamp(), ice_ptp_tx_tstamp_owner(),
and the ice_ptp_process_ts() functions. This results in ice_ptp_tx_tstamp()
being nothing more than a wrapper around ice_ptp_process_tx_tstamp() so we
can remove it.
Add the ice_ptp_tx_tstamps_pending() function which returns a boolean
indicating if there are any pending Tx timestamps. First, check the
software timestamp tracker bitmap. In ICE_PTP_TX_INTERRUPT_ALL mode, check
*all* ports software trackers. If a tracker has outstanding timestamp
requests, return true. Additionally, check the PHY ready bitmap to confirm
if the PHY indicates any outstanding timestamps.
In the ice_misc_thread_fn(), call ice_ptp_tx_tstamps_pending() just before
returning from the IRQ thread handler. If it returns true, write to
PFINT_OICR to trigger a PFINT_OICR_TSYN_TX_M software interrupt. This will
force the handler to interrupt again and complete the work even if the PHY
hardware did not interrupt for any reason.
This results in the following new flow for handling Tx timestamps:
1) send Tx packet
2) PHY captures timestamp
3) PHY triggers MAC interrupt
4) clock owner executes ice_misc_intr() with PFINT_OICR_TSYN_TX flag set
5) ice_ptp_ts_irq() returns IRQ_WAKE_THREAD
7) The interrupt thread wakes up and kernel calls ice_misc_intr_thread_fn()
8) ice_ptp_process_ts() is called to handle any outstanding timestamps
9) ice_irq_dynamic_ena() is called to re-enable the OICR hardware interrupt
cause
10) ice_ptp_tx_tstamps_pending() is called to check if we missed any more
outstanding timestamps, checking both software and hardware indicators.
With this change, it should no longer be possible for new timestamps to
come in such a way that we lose an interrupt. If a timestamp comes in
before the ice_ptp_tx_tstamps_pending() call, it will be noticed by at
least one of the software bitmap check or the hardware bitmap check. If the
timestamp comes in *after* this check, it should cause a timestamp
interrupt as we have already read all timestamps from the PHY and the OICR
vector has been re-enabled.
Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
Tested-by: Vitaly Grinberg <vgrinber@redhat.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Modify PTP (Precision Time Protocol) configuration on link down flow.
Previously, PHY_REG_TX_OFFSET_READY register was cleared in such case.
This register is used to determine if the timestamp is valid or not on
the hardware side.
However, there is a possibility that there is still the packet in the
HW queue which originally was supposed to be timestamped but the link
is already down and given register is cleared.
This potentially might lead to the situation in which that 'delayed'
packet's timestamp is treated as invalid one when the link is up
again.
This in turn leads to the situation in which the driver is not able to
effectively clean timestamp memory and interrupt configuration.
From the hardware perspective, that 'old' interrupt was not handled
properly and even if new timestamp packets are processed, no new
interrupts is generated. As a result, providing timestamps to the user
applications (like ptp4l) is not possible.
The solution for this problem is implemented at the driver level rather
than the firmware, and maintains the tx_ready bit high, even during
link down events. This avoids entering a potential inconsistent state
between the driver and the timestamp hardware.
Testing hints:
- run PTP traffic at higher rate (like 16 PTP messages per second)
- observe ptp4l behaviour at the client side in the following
conditions:
a) trigger link toggle events. It needs to be physiscal
link down/up events
b) link speed change
In all above cases, PTP processing at ptp4l application should resume
always. In failure case, the following permanent error message in ptp4l
log was observed:
controller-0 ptp4l: err [6175.116] ptp4l-legacy timed out while polling
for tx timestamp
Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.19-rc8).
No adjacent changes, conflicts:
drivers/net/ethernet/spacemit/k1_emac.c
2c84959167d64 ("net: spacemit: Check for netif_carrier_ok() in emac_stats_update()")
f66086798f91f ("net: spacemit: Remove broken flow control support")
https://lore.kernel.org/aXjAqZA3iEWD_DGM@sirena.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the beginning, the Intel ice driver has counted receive checksum
offload mismatches into the rx_errors member of the rtnl_link_stats64
struct. In ethtool -S these show up as rx_csum_bad.nic.
I believe counting these in rx_errors is fundamentally wrong, as it's
pretty clear from the comments in if_link.h and from every other statistic
the driver is summing into rx_errors, that all of them would cause a
"hardware drop" except for the UDP checksum mismatch, as well as the fact
that all the other causes for rx_errors are L2 reasons, and this L4 UDP
"mismatch" is an outlier.
A last nail in the coffin is that rx_errors is monitored in production and
can indicate a bad NIC/cable/Switch port, but instead some random series of
UDP packets with bad checksums will now trigger this alert. This false
positive makes the alert useless and affects us as well as other companies.
This packet with presumably a bad UDP checksum is *already* passed to the
stack, just not marked as offloaded by the hardware/driver. If it is
dropped by the stack it will show up as UDP_MIB_CSUMERRORS.
And one more thing, none of the other Intel drivers, and at least bnxt_en
and mlx5 both don't appear to count UDP offload mismatches as rx_errors.
Here is a related customer complaint:
https://community.intel.com/t5/Ethernet-Products/ice-rx-errros-is-too-sensitive-to-IP-TCP-attack-packets-Intel/td-p/1662125
Fixes: 4f1fe43c920b ("ice: Add more Rx errors to netdev's rx_error counter")
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Jake Keller <jacob.e.keller@intel.com>
Cc: IWL <intel-wired-lan@lists.osuosl.org>
Signed-off-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add NULL pointer checks in ice_vsi_set_napi_queues() to prevent crashes
during resume from suspend when rings[q_idx]->q_vector is NULL.
Tested adaptor:
60:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller E810-XXV for SFP [8086:159b] (rev 02)
Subsystem: Intel Corporation Ethernet Network Adapter E810-XXV-2 [8086:4003]
SR-IOV state: both disabled and enabled can reproduce this issue.
kernel version: v6.18
Reproduce steps:
Boot up and execute suspend like systemctl suspend or rtcwake.
Log:
<1>[ 231.443607] BUG: kernel NULL pointer dereference, address: 0000000000000040
<1>[ 231.444052] #PF: supervisor read access in kernel mode
<1>[ 231.444484] #PF: error_code(0x0000) - not-present page
<6>[ 231.444913] PGD 0 P4D 0
<4>[ 231.445342] Oops: Oops: 0000 [#1] SMP NOPTI
<4>[ 231.446635] RIP: 0010:netif_queue_set_napi+0xa/0x170
<4>[ 231.447067] Code: 31 f6 31 ff c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 c9 74 0b <48> 83 79 30 00 0f 84 39 01 00 00 55 41 89 d1 49 89 f8 89 f2 48 89
<4>[ 231.447513] RSP: 0018:ffffcc780fc078c0 EFLAGS: 00010202
<4>[ 231.447961] RAX: ffff8b848ca30400 RBX: ffff8b848caf2028 RCX: 0000000000000010
<4>[ 231.448443] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8b848dbd4000
<4>[ 231.448896] RBP: ffffcc780fc078e8 R08: 0000000000000000 R09: 0000000000000000
<4>[ 231.449345] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
<4>[ 231.449817] R13: ffff8b848dbd4000 R14: ffff8b84833390c8 R15: 0000000000000000
<4>[ 231.450265] FS: 00007c7b29e9d740(0000) GS:ffff8b8c068e2000(0000) knlGS:0000000000000000
<4>[ 231.450715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 231.451179] CR2: 0000000000000040 CR3: 000000030626f004 CR4: 0000000000f72ef0
<4>[ 231.451629] PKRU: 55555554
<4>[ 231.452076] Call Trace:
<4>[ 231.452549] <TASK>
<4>[ 231.452996] ? ice_vsi_set_napi_queues+0x4d/0x110 [ice]
<4>[ 231.453482] ice_resume+0xfd/0x220 [ice]
<4>[ 231.453977] ? __pfx_pci_pm_resume+0x10/0x10
<4>[ 231.454425] pci_pm_resume+0x8c/0x140
<4>[ 231.454872] ? __pfx_pci_pm_resume+0x10/0x10
<4>[ 231.455347] dpm_run_callback+0x5f/0x160
<4>[ 231.455796] ? dpm_wait_for_superior+0x107/0x170
<4>[ 231.456244] device_resume+0x177/0x270
<4>[ 231.456708] dpm_resume+0x209/0x2f0
<4>[ 231.457151] dpm_resume_end+0x15/0x30
<4>[ 231.457596] suspend_devices_and_enter+0x1da/0x2b0
<4>[ 231.458054] enter_state+0x10e/0x570
Add defensive checks for both the ring pointer and its q_vector
before dereferencing, allowing the system to resume successfully even when
q_vectors are unmapped.
Fixes: 2a5dc090b92cf ("ice: move netif_queue_set_napi to rtnl-protected sections")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|