|
The hw_info uAPI will support a bidirectional data_type field that can be
used as an input field for user space to request a specific info data type.
To prepare for the uAPI update, change the iommu layer first:
- Add a new IOMMU_HW_INFO_TYPE_DEFAULT as an input, for which a driver can
output its only (or first) supported type
- Update the kdoc accordingly
- Roll out the type validation in the existing drivers
Link: https://patch.msgid.link/r/00f4a2d3d930721f61367014717b3ba2d1e82a81.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The UVC driver provides two metadata types: V4L2_META_FMT_UVC and
V4L2_META_FMT_D4XX. The only difference between the two is that
V4L2_META_FMT_UVC only copies PTS, SCR, size and flags, and
V4L2_META_FMT_D4XX copies the whole metadata section.
Now we only enable V4L2_META_FMT_D4XX for the Intel D4xx family of
devices, but it is useful to have the whole metadata payload for any
device where vendors include other metadata, such as the one described by
Microsoft:
https://learn.microsoft.com/en-us/windows-hardware/drivers/stream/mf-capture-metadata
This patch introduces a new format, V4L2_META_FMT_UVC_MSXU_1_5, which is
identical to V4L2_META_FMT_D4XX.
Let the user enable this format with a quirk for now. This way they can
test if their devices provide useful metadata without rebuilding the
kernel. They can later contribute patches to auto-quirk their devices.
We will also work on methods to auto-detect devices compatible with this
new metadata format.
Suggested-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250707-uvc-meta-v8-4-ed17f8b1218b@chromium.org
Signed-off-by: Hans de Goede <hansg@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
Introduce a new IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl for user space to allocate
a HW QUEUE object for a vIOMMU-specific HW-accelerated queue, e.g.:
- NVIDIA's Virtual Command Queue
- AMD vIOMMU's Command Buffer, Event Log Buffers, and PPR Log Buffers
Since this is introduced with NVIDIA's VCMDQs that access the guest memory
in the physical address space, add an iommufd_hw_queue_alloc_phys() helper
that will create an access object to the queue memory in the IOAS, to
prevent the mappings of the guest memory from being unmapped during the
life cycle of the HW queue object.
AMD's HW will need a hw_queue_init op that is mutually exclusive with the
hw_queue_init_phys op, and that case will bypass the access part, i.e. no
iommufd_hw_queue_alloc_phys() call.
Link: https://patch.msgid.link/r/dab4ace747deb46c1fe70a5c663307f46990ae56.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Add IOMMUFD_OBJ_HW_QUEUE with an iommufd_hw_queue structure, representing
a HW-accelerated queue type of IOMMU's physical queue that can be passed
through to a user space VM for direct hardware control, such as:
- NVIDIA's Virtual Command Queue
- AMD vIOMMU's Command Buffer, Event Log Buffers, and PPR Log Buffers
Add new viommu ops for iommufd to communicate with IOMMU drivers to fetch
the supported HW queue structure size and to forward user space ioctls to the
IOMMU drivers for initialization/destruction.
Among the existing HW, NVIDIA's VCMDQs access the guest memory via physical
addresses, while AMD's Buffers access the guest memory via guest physical
addresses (i.e. the IOVA of the nesting parent HWPT). Separate two mutually
exclusive hw_queue_init and hw_queue_init_phys ops to indicate whether a
vIOMMU HW accesses the guest queue in the guest physical space (via iova)
or the host physical space (via pa).
In the latter case, the iommufd core will validate the physical pages of a
given guest queue, to ensure the underlying physical pages are contiguous
and pinned.
Since this is introduced with NVIDIA's VCMDQs, add hw_queue_init_phys for
now, and leave some notes for hw_queue_init in the near future (for AMD).
Both NVIDIA's and AMD's HW use a multi-queue model: NVIDIA's will be only
one type in enum iommu_hw_queue_type, while AMD's will be three different
types (two of which will have multiple queues). Compared to letting the core
manage multiple queues with three types per vIOMMU object, it'd be easier
for the driver to manage that by having three different driver-structure
arrays per vIOMMU object. Thus, pass in the index to the init op.
Link: https://patch.msgid.link/r/6939b73699e278e60ce167e911b3d9be68882bad.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Pranjal Shrivastava <praan@google.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The FH_FLAG_IMMUTABLE flag was meant to avoid the reference counting on
the private hash and so to avoid the performance regression on big
machines.
With the switch to a per-CPU counter this is no longer needed. That flag
was never usable on any released kernel.
Remove any support for IMMUTABLE while preserving the flags argument and
enforcing it to be zero.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250710110011.384614-5-bigeasy@linutronix.de
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
- Documentation fixes (Shuicheng)
Cross-subsystem Changes:
- MTD intel-dg driver for dgfx non-volatile memory device (Sasha)
- i2c: designware changes to allow i2c integration with BMG (Heikki)
Core Changes:
- Restructure migration in preparation for multi-device (Brost, Thomas)
- Expose fan control and voltage regulator version on sysfs (Raag)
Driver Changes:
- Add WildCat Lake support (Roper)
- Add aux bus child device driver for NVM on DGFX (Sasha)
- Some refactor and fixes to allow cleaner BMG w/a (Lucas, Maarten, Auld)
- BMG w/a (Vinay)
- Improve handling of aborted probe (Michal)
- Do not wedge device on killed exec queues (Brost)
- Init changes for flicker-free boot (Maarten)
- Fix out-of-bounds field write in MI_STORE_DATA_IMM (Jia)
- Enable the GuC Dynamic Inhibit Context Switch optimization (Daniele)
- Drop bo->size (Brost)
- Builds and KConfig fixes (Harry, Maarten)
- Consolidate LRC offset calculations (Tvrtko)
- Fix potential leak in hw_engine_group (Michal)
- Future-proof for multi-tile + multi-GT cases (Roper)
- Validate gt in pmu event (Riana)
- SRIOV PF: Clear all LMTT pages on alloc (Michal)
- Allocate PF queue size on pow2 boundary (Brost)
- SRIOV VF: Make multi-GT migration less error prone (Tomasz)
- Revert indirect ring state patch to fix random LRC context switches failures (Brost)
- Fix compressed VRAM handling (Auld)
- Add one additional BMG PCI ID (Ravi)
- Recommend GuC v70.46.2 for BMG, LNL, DG2 (Julia)
- Add GuC and HuC to PTL (Daniele)
- Drop PTL force_probe requirement (Atwood)
- Fix error flow in display suspend (Shuicheng)
- Disable GuC communication on hardware initialization error (Zhanjun)
- Devcoredump fixes and clean up (Shuicheng)
- SRIOV PF: Downgrade some info to debug (Michal)
- Don't allocate temporary GuC policies object (Michal)
- Support for I2C attached MCUs (Heikki, Raag, Riana)
- Add GPU memory bo trace points (Juston)
- SRIOV VF: Skip some W/a (Michal)
- Correct comment of xe_pm_set_vram_threshold (Shuicheng)
- Cancel ongoing H2G requests when stopping CT (Michal)
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/aHA7184UnWlONORU@intel.com
|
|
Implement ETHTOOL_GRXFH over Netlink. The number of flow types is
reasonable (around 20) so report all of them at once for simplicity.
Do not maintain the flow ID mapping with ioctl at the uAPI level.
This gives us a chance to clean up the confusion that comes from
RxNFC vs RxFH (flow direction vs hashing) in the ioctl.
Try to align with the names used in the ethtool CLI; they seem to have
stood the test of time just fine. One annoyance is that we still
call L4 ports by their weird names, but I guess they also apply to IPsec
(where they cover the SPI), so it is what it is.
$ ynl --family ethtool --dump rss-get
{
  "header": {
    "dev-index": 1,
    "dev-name": "enp1s0"
  },
  "hfunc": 1,
  "hkey": b"...",
  "indir": [0, 1, ...],
  "flow-hash": {
    "ether": {"l2da"},
    "ah-esp4": {"ip-src", "ip-dst"},
    "ah-esp6": {"ip-src", "ip-dst"},
    "ah4": {"ip-src", "ip-dst"},
    "ah6": {"ip-src", "ip-dst"},
    "esp4": {"ip-src", "ip-dst"},
    "esp6": {"ip-src", "ip-dst"},
    "ip4": {"ip-src", "ip-dst"},
    "ip6": {"ip-src", "ip-dst"},
    "sctp4": {"ip-src", "ip-dst"},
    "sctp6": {"ip-src", "ip-dst"},
    "udp4": {"ip-src", "ip-dst"},
    "udp6": {"ip-src", "ip-dst"},
    "tcp4": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"},
    "tcp6": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"}
  }
}
Link: https://patch.msgid.link/20250708220640.2738464-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Looks like some drivers (ena, enetc, fbnic... there are probably more)
consider ETHER_FLOW to be a legitimate target for flow hashing.
I'm not sure how intentional that is from the uAPI perspective
vs just an effect of the ethtool IOCTL doing minimal input validation.
But Netlink will do strict validation, so we need to decide whether
we allow this use case or not. I don't see a strong reason against
it, and rejecting it would potentially regress a number of drivers.
So update the comments and flow_type_hashable().
Link: https://patch.msgid.link/20250708220640.2738464-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://github.com/pabeni/linux-devel
Paolo Abeni says:
====================
virtio: introduce GSO over UDP tunnel
Some virtualized deployments use UDP tunnels pervasively and are
negatively impacted by the lack of GSO support for such traffic in the
virtual NIC driver.
The virtio_net specification recently introduced support for GSO over
UDP tunnel; this series updates the virtio implementation to support
such a feature.
Currently the kernel virtio support limits the feature space to 64 bits,
while the virtio specification allows for a larger number of features.
Specifically the GSO-over-UDP-tunnel-related virtio features use bits
65-69.
The first four patches in this series rework the virtio and vhost
feature support to cope with up to 128 bits. The limit is set by
a define and could be easily raised in future, as needed.
This implementation choice is aimed at keeping the code churn as
limited as possible. For the same reason, only the virtio_net driver is
reworked to leverage the extended feature space; all other
virtio/vhost drivers are unaffected, but could be upgraded to support
the extended feature space at a later time.
The last four patches bring in the actual GSO over UDP tunnel support.
As per specification, some additional fields are introduced into the
virtio net header to support the new offload. The presence of such
fields depends on the negotiated features.
New helpers are introduced to convert the UDP-tunneled skb metadata to
an extended virtio net header and vice versa. Such helpers are used by
the tun and virtio_net driver to cope with the newly supported offloads.
Tested with basic stream transfer with all the possible permutations of
host kernel/qemu/guest kernel with/without GSO over UDP tunnel support.
====================
Link: https://patch.msgid.link/cover.1751874094.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc6).
No conflicts.
Adjacent changes:
Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml
0a12c435a1d6 ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible")
b3603c0466a8 ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull KVM fixes from Paolo Bonzini:
"Many patches, pretty much all of them small, that accumulated while I
was on vacation.
ARM:
- Remove the last leftovers of the ill-fated FPSIMD host state
mapping at EL2 stage-1
- Fix unexpected advertisement to the guest of unimplemented S2 base
granule sizes
- Gracefully fail initialising pKVM if the interrupt controller isn't
GICv3
- Also gracefully fail initialising pKVM if the carveout allocation
fails
- Fix the computing of the minimum MMIO range required for the host
on stage-2 fault
- Fix the generation of the GICv3 Maintenance Interrupt in nested
mode
x86:
- Reject SEV{-ES} intra-host migration if one or more vCPUs are
actively being created, so as not to create a non-SEV{-ES} vCPU in
an SEV{-ES} VM
- Use a pre-allocated, per-vCPU buffer for handling de-sparsification
of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too
large" issue
- Allow out-of-range/invalid Xen event channel ports when configuring
IRQ routing, to avoid dictating a specific ioctl() ordering to
userspace
- Conditionally reschedule when setting memory attributes to avoid
soft lockups when userspace converts huge swaths of memory to/from
private
- Add back MWAIT as a required feature for the MONITOR/MWAIT selftest
- Add a missing field in struct sev_data_snp_launch_start that
resulted in the guest-visible workarounds field being filled at the
wrong offset
- Skip non-canonical address when processing Hyper-V PV TLB flushes
to avoid VM-Fail on INVVPID
- Advertise supported TDX TDVMCALLs to userspace
- Pass SetupEventNotifyInterrupt arguments to userspace
- Fix TSC frequency underflow"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: avoid underflow when scaling TSC frequency
KVM: arm64: Remove kvm_arch_vcpu_run_map_fp()
KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
KVM: arm64: Don't free hyp pages with pKVM on GICv2
KVM: arm64: Fix error path in init_hyp_mode()
KVM: arm64: Adjust range correctly during host stage-2 faults
KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi()
KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
KVM: SVM: Add missing member in SNP_LAUNCH_START command structure
Documentation: KVM: Fix unexpected unindent warnings
KVM: selftests: Add back the missing check of MONITOR/MWAIT availability
KVM: Allow CPU to reschedule while setting per-page memory attributes
KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.
KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks
KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL
KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt
|
|
The new vIOMMU type for the tegra241-cmdqv driver needs driver-specific
user data. So, add data_len/uptr to the iommu_viommu_alloc uAPI and pass
it in via the viommu_init iommu op.
Link: https://patch.msgid.link/r/2315b0e164b355746387e960745ac9154caec124.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Pranjal Shrivastava <praan@google.com>
Acked-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The userspace-api iommufd.rst describes it correctly, but the uAPI doc
remained uncorrected. Thus, fix it.
Link: https://patch.msgid.link/r/2cdcecaf2babee16fda7545ccad4e5bed7a5032d.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This patch provides a setsockopt method that lets applications adjust how
many descs can be handled at most in one send syscall. It mitigates the
situation where the default value (32) is too small and leads to a higher
frequency of send syscalls.
Considering the diversity/complexity of applications, there is no
absolutely ideal value fitting all cases, so keep 32 as the default
value like before.
The patch does the following things:
- Add XDP_MAX_TX_SKB_BUDGET socket option.
- Set max_tx_budget to 32 by default in the initialization phase as a
per-socket granular control.
- Set the range of max_tx_budget as [32, xs->tx->nentries].
The idea behind this comes out of real workloads in production. We use a
user-level stack with xsk support to accelerate sending packets and
minimize triggering syscalls. When the packets are aggregated, it's not
hard to hit the upper bound (namely, 32). The moment the user-space stack
gets the -EAGAIN error from sendto(), it will loop and try again
until all the expected descs from the tx ring are sent out to the driver.
Enlarging XDP_MAX_TX_SKB_BUDGET results in fewer sendto() calls and
higher throughput/PPS.
Here is what I did in production, along with some numbers as follows:
For one application I saw lately, I suggested using 128 as max_tx_budget
because I saw two limitations without changing any default configuration:
1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by
net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, the scenario behind
this was that I counted how many descs are transmitted to the driver per
sendto() call based on patch [1] and then calculated the
probability of hitting the upper bound. Finally I chose 128 as a
suitable value because 1) it covers most of the cases, 2) a higher
number would not bring evident improvement. After tweaking the parameters,
a stable improvement of around 4% for both PPS and throughput and lower
resource consumption were observed via strace -c -p xxx:
1) %time was decreased by 7.8%
2) error counter was decreased from 18367 to 572
[1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/
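For illustration, a minimal userspace sketch of setting the new option; the
option name comes from this patch, SOL_XDP is the level used by the other
AF_XDP socket options, and the helper name is just for this example:

#include <sys/socket.h>
#include <linux/if_xdp.h>	/* XDP_MAX_TX_SKB_BUDGET, once headers are updated */

/* Illustrative helper: raise the per-socket TX budget of an AF_XDP socket.
 * The accepted range is [32, tx ring size] per the description above. */
static int xsk_set_tx_budget(int xsk_fd, unsigned int budget)
{
	return setsockopt(xsk_fd, SOL_XDP, XDP_MAX_TX_SKB_BUDGET,
			  &budget, sizeof(budget));
}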
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
For 12-bit packed Bayer formats, every two consecutive samples are
packed into three bytes. Fix the corresponding comment.
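For reference, a sketch of the packing arithmetic being described (byte
layout as documented for the V4L2 12-bit packed Bayer formats, where the
third byte carries the low nibbles of both samples; a full line is
width * 3 / 2 bytes):

/* Two consecutive 12-bit samples s0 and s1 packed into three bytes. */
static void pack_two_12bit_samples(unsigned char *dst,
				   unsigned int s0, unsigned int s1)
{
	dst[0] = s0 >> 4;				/* 8 high bits of sample 0 */
	dst[1] = s1 >> 4;				/* 8 high bits of sample 1 */
	dst[2] = (s0 & 0xf) | ((s1 & 0xf) << 4);	/* low nibbles of both */
}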
Signed-off-by: Mehdi Djait <mehdi.djait@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
The root inode of /proc having a fixed inode number has been part of the
core kernel ABI since its inception, and recently some userspace
programs (mainly container runtimes) have started to explicitly depend
on this behaviour.
The main reason this is useful to userspace is that by checking that a
suspect /proc handle has fstype PROC_SUPER_MAGIC and is PROCFS_ROOT_INO,
they can then use openat2(RESOLVE_{NO_{XDEV,MAGICLINK},BENEATH}) to
ensure that there isn't a bind-mount that replaces some procfs file with
a different one. This kind of attack has led to security issues in
container runtimes in the past (such as CVE-2019-19921) and libraries
like libpathrs[1] use this feature of procfs to provide safe procfs
handling functions.
There was also some trailing whitespace in the "struct proc_dir_entry"
initialiser, so fix that up as well.
[1]: https://github.com/openSUSE/libpathrs
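A hedged sketch of the userspace check described above (PROCFS_ROOT_INO is
the new uapi constant; the procfs root inode number has historically been 1):

#include <sys/vfs.h>
#include <sys/stat.h>
#include <linux/magic.h>	/* PROC_SUPER_MAGIC */

/* Return non-zero if fd refers to the root of a procfs instance, before
 * resolving through it with
 * openat2(RESOLVE_NO_XDEV | RESOLVE_NO_MAGICLINKS | RESOLVE_BENEATH). */
static int is_procfs_root(int fd)
{
	struct statfs sfs;
	struct stat st;

	if (fstatfs(fd, &sfs) < 0 || fstat(fd, &st) < 0)
		return 0;
	return sfs.f_type == PROC_SUPER_MAGIC &&
	       st.st_ino == 1 /* PROCFS_ROOT_INO */;
}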
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Link: https://lore.kernel.org/20250708-uapi-procfs-root-ino-v1-1-6ae61e97c79b@cyphar.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs
without interception.
The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not
handled at all. The MSR values are not zeroed on vCPU creation, saved
on suspend, or restored on resume. No accommodation is made for
processor migration or for sharing a logical processor with other
tasks. No adjustments are made for non-unit TSC multipliers. The MSRs
do not account for time the same way as the comparable PMU events,
whether the PMU is virtualized by the traditional emulation method or
the new mediated pass-through approach.
Nonetheless, in a properly constrained environment, this capability
can be combined with a guest CPUID table that advertises support for
CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the
effective physical CPU frequency in /proc/cpuinfo. Moreover, there is
no performance cost for this capability.
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250530185239.2335185-3-jmattson@google.com
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250626001225.744268-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
to pick up the __GENMASK() fix, otherwise the AUX clock VDSO patches fail
to compile for compat.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Runtime PM updates related to autosuspend for 6.17
Make several autosuspend functions mark last busy stamp and update
the documentation accordingly (Sakari Ailus).
|
|
We have an application that uses almost the same code for TCP and
AF_UNIX (SOCK_STREAM).
TCP can use TCP_INQ, but AF_UNIX doesn't have it and requires an
extra syscall, ioctl(SIOCINQ) or getsockopt(SO_MEMINFO) as an
alternative.
Let's introduce the generic version of TCP_INQ.
If SO_INQ is enabled, recvmsg() will put a cmsg of SCM_INQ that
contains the exact value of ioctl(SIOCINQ). The cmsg is also
included when msg->msg_get_inq is non-zero to make sockets
io_uring-friendly.
Note that SOCK_CUSTOM_SOCKOPT is flagged only for SOCK_STREAM to
override setsockopt() for SOL_SOCKET.
By having the flag in struct unix_sock, instead of struct sock, we
can later add SO_INQ support for TCP and reuse tcp_sk(sk)->recvmsg_inq.
Note also that supporting custom getsockopt() for SOL_SOCKET will need
preparation for other SOCK_CUSTOM_SOCKOPT users (UDP, vsock, MPTCP).
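A hedged sketch of the intended usage on an AF_UNIX SOCK_STREAM socket
(assuming SO_INQ/SCM_INQ are available in the updated uapi headers):

#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Enable once:  int one = 1; setsockopt(fd, SOL_SOCKET, SO_INQ, &one, sizeof(one)); */
static ssize_t recv_with_inq(int fd, void *buf, size_t len, int *inq)
{
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	ssize_t ret = recvmsg(fd, &msg, 0);
	struct cmsghdr *c;

	/* SCM_INQ carries the same value ioctl(SIOCINQ) would report. */
	for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c))
		if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_INQ)
			memcpy(inq, CMSG_DATA(c), sizeof(*inq));
	return ret;
}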
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250702223606.1054680-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add new tun features to represent the newly introduced virtio
GSO over UDP tunnel offload. Allow detection and selection of
such features via the existing TUNSETOFFLOAD ioctl, and compute
the expected virtio header size and tunnel header offset using
the current netdev features, so that we can plug in the newly
introduced virtio helpers almost seamlessly to serialize the extended
virtio header.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v6 -> v7:
- rebased
v4 -> v5:
- encapsulate the guest feature guessing in a tun helper
- dropped irrelevant check on xdp buff headroom
- do not remove unrelated blank line
- avoid line len > 80 char
v3 -> v4:
- virtio tnl-related fields are at fixed offset, cleanup
the code accordingly.
- use netdev features instead of flags bit to check for
the configured offload
- drop packet in case of enabled features/configured hdr
size mismatch
v2 -> v3:
- cleaned-up uAPI comments
- use explicit struct layout instead of raw buf.
|
|
The virtio specification is introducing support for GSO over UDP
tunnel.
This patch brings in the needed defines and the additional virtio hdr
parsing/building helpers.
The UDP tunnel support uses additional fields in the virtio hdr, and the
location of such fields can change depending on other negotiated features -
specifically VIRTIO_NET_F_HASH_REPORT.
Try to be as conservative as possible with the new field validation.
The existing implementation for plain GSO offloads allows for invalid/
self-contradictory values of such fields. With GSO over UDP tunnel we can
be more strict, with no need to deal with legacy implementations.
Since the checksum-related field validation is asymmetric in the driver
and in the device, introduce a separate helper to implement the new checks
(to be used only on the driver side).
Note that while the feature space exceeds the 64-bit boundaries, the
guest offload space is fixed by the specification of the
VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET command to a 64-bit size.
Prior to the UDP tunnel GSO support, each guest offload bit corresponded
to the feature bit with the same value and vice versa.
Due to the limited 'guest offload' space, relevant features in the high
64 bits are 'mapped' to free bits in the lower range. That is simpler
than defining a new command (and associated features) to exchange an
extended guest offloads set.
As a consequence, the uAPIs also specify the mapped guest offload value
corresponding to the UDP tunnel GSO features.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
--
v4 -> v5:
- avoid lines above 80 chars
v3 -> v4:
- fixed offset for UDP GSO tunnel, update accordingly the helpers
- tried to clarified vlan_hlen semantic
- virtio_net_chk_data_valid() -> virtio_net_handle_csum_offload()
v2 -> v3:
- add definitions for possible vnet hdr layouts with tunnel support
v1 -> v2:
- 'relay' -> 'rely' typo
- less unclear comment WRT enforced inner GSO checks
- inner header fields are allowed only with 'modern' virtio,
thus are always le
- clarified in the commit message the need for 'mapped features'
defines
- assume little_endian is true when UDP GSO is enabled.
- fix inner proto type value
|
|
Use the extended feature type for 'acked_features' and implement
two new ioctl operations allowing user space to set/query an
unbounded amount of features.
The actual number of processed features is limited by VIRTIO_FEATURES_MAX
and attempts to set features above such limit fail with
EOPNOTSUPP.
Note that the legacy ioctls implicitly truncate the negotiated
features to the lower 64-bit range, and the 'acked_backend_features'
field doesn't need conversion, as the only negotiated feature there
is in the low 64-bit range.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
BITS_PER_LONG does not exist in UAPI headers, so can't be used by the UAPI
__GENMASK(). Instead __BITS_PER_LONG needs to be used.
When __GENMASK() was introduced in commit 3c7a8e190bc5 ("uapi: introduce uapi-friendly macros for GENMASK"),
the code was fine. A broken revert in 1e7933a575ed ("uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"")
introduced the incorrect usage of BITS_PER_LONG.
That was fixed in commit 11fcf368506d ("uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again").
But a broken sync of the kernel headers with the tools/ headers in
commit fc92099902fb ("tools headers: Synchronize linux/bits.h with the kernel sources")
undid the fix.
Reapply the fix and while at it also fix the tools header.
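For reference, a sketch of what the corrected uapi macro looks like (the
authoritative definition lives in include/uapi/linux/bits.h):

/* uapi-safe: __BITS_PER_LONG comes from <asm/bitsperlong.h>, unlike the
 * kernel-only BITS_PER_LONG. */
#define __GENMASK(h, l) \
	(((~_UL(0)) - (_UL(1) << (l)) + 1) & \
	 (~_UL(0) >> (__BITS_PER_LONG - 1 - (h))))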
Fixes: fc92099902fb ("tools headers: Synchronize linux/bits.h with the kernel sources")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
|
|
Add a new netlink parameter 'HANDSHAKE_A_ACCEPT_KEYRING' to provide
the serial number of the keyring to use.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250701144657.104401-1-hare@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://gitlab.freedesktop.org/drm/msm into drm-next
Updates for v6.17
CI:
- uprev mesa and ci-templates
- use shallow clone to speed up build jobs
- remove sdm845/cheza jobs. These runners are no more (RIP
dear chezas)
- fix runner tag for i915 cml runners
- uprev igt to pull in msm test fixes
Core:
- VM_BIND support!
- single source of truth for UBWC configuration. Adds a global soc
driver for UBWC config which is used from display and GPU. (And
later vidc/camera/etc)
- Decouple ties between GPU and KMS, adding a `separate_gpu_kms`
modparam to allow the GPU and KMS to bind to separate DRM devices.
This should better deal with more exotic SoC configurations where
the number of GPUs is different from number of DPUs. The default
behavior is to still come up as a single unified DRM device to
avoid surprising userspace.
DP:
- major rework of the I/O accessors
DPU:
- use version checks instead of feature bits
- SM8750 support
- set min_prefill_lines for SC8180X
DSI:
- SM8750 support
GPU:
- speedbin support for X1-85
- X1-45 support
MDSS:
- SM8750 support
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Robin Clark <robin.clark@oss.qualcomm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CACSVV0217R+kpoWQJeuYGHf6q_4aFyEJuKa=dZZKOnLQzFwppg@mail.gmail.com
|
|
This change allows for gateway routing, where a route table entry
may reference a routable endpoint (by network and EID), instead of
routing directly to a netdevice.
We add support for an RTM_GATEWAY attribute for netlink route updates,
with an attribute format of:
struct mctp_fq_addr {
unsigned int net;
mctp_eid_t eid;
}
- we need the net here to uniquely identify the target EID, as we no
longer have the device reference directly (which would provide the net
id in the case of direct routes).
This makes route lookups recursive, as a route lookup that returns a
gateway route must be resolved into a direct route (ie, to a device)
eventually. We provide a limit to the route lookups, to prevent infinite
loop routing.
The route lookup populates a new 'nexthop' field in the dst structure,
which now specifies the key for the neighbour table lookup on device
output, rather than using the packet destination address directly.
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20250702-dev-forwarding-v5-13-1468191da8a4@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Users can configure or display the bonding broadcast_neighbor option via
iproute2/netlink.
Cc: Jay Vosburgh <jv@jvosburgh.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com>
Signed-off-by: Zengbing Tu <tuzengbing@didiglobal.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/76b90700ba5b98027dfb51a2f3c5cfea0440a21b.1751031306.git.tonghao@bamaicloud.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Introduce a new KVM capability to expose to userspace whether
cacheable mapping of PFNMAP is supported.
The ability to safely do the cacheable mapping of PFNMAP is contingent
on S2FWB and ARM64_HAS_CACHE_DIC. S2FWB allows KVM to avoid flushing
the D-cache, and ARM64_HAS_CACHE_DIC allows KVM to avoid flushing the icache
and turns icache_inval_pou() into a NOP. The cap would be false if
those requirements are missing and is checked by making use of
kvm_arch_supports_cacheable_pfnmap.
This capability would allow userspace to discover the support.
It could for instance be used by userspace to prevent live-migration
across FWB and non-FWB hosts.
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Jason Gunthorpe <jgg@nvidia.com>
CC: Oliver Upton <oliver.upton@linux.dev>
CC: David Hildenbrand <david@redhat.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250705071717.5062-7-ankita@nvidia.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
When a packet enters OVS datapath and there is no flow to handle it,
packet goes to userspace through a MISS upcall. With per-CPU upcall
dispatch mechanism, we're using the current CPU id to select the
Netlink PID on which to send this packet. This allows us to send
packets from the same traffic flow through the same handler.
The handler will process the packet, install required flow into the
kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE.
While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a
recirculation action that will pass the (likely modified) packet
through the flow lookup again. And if the flow is not found, the
packet will be sent to userspace again through another MISS upcall.
However, the handler thread in userspace is likely running on a
different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled
in the syscall context of that thread. So, when the time comes to
send the packet through another upcall, the per-CPU dispatch will
choose a different Netlink PID, and this packet will end up processed
by a different handler thread on a different CPU.
The process continues as long as there are new recirculations; each
time the packet goes to a different handler thread before it is sent
out of the OVS datapath to the destination port. In real setups the
number of recirculations can go up to 4 or 5, sometimes more.
There is always a chance to re-order packets while processing upcalls,
because userspace will first install the flow and then re-inject the
original packet. So, there is a race window when the flow is already
installed and the second packet can match it and be forwarded to the
destination before the first packet is re-injected. But the fact that
packets are going through multiple upcalls handled by different
userspace threads makes the reordering noticeably more likely, because
we not only have a race between the kernel and a userspace handler
(which is hard to avoid), but also between multiple userspace handlers.
For example, let's assume that 10 packets got enqueued through a MISS
upcall for handler-1, it will start processing them, will install the
flow into the kernel and start re-injecting packets back, from where
they will go through another MISS to handler-2. Handler-2 will install
the flow into the kernel and start re-injecting the packets, while
handler-1 continues to re-inject the last of the 10 packets, they will
hit the flow installed by handler-2 and be forwarded without going to
handler-2, while handler-2 still re-injects the first of these 10
packets. Given multiple recirculations and misses, these 10 packets
may end up completely mixed up on the output from the datapath.
Let's allow userspace to specify on which Netlink PID the packets
should be upcalled while processing OVS_PACKET_CMD_EXECUTE.
This makes it possible to ensure that all the packets are processed
by the same handler thread in the userspace even with them being
upcalled multiple times in the process. Packets will remain in order
since they will be enqueued to the same socket and re-injected in the
same order. This doesn't eliminate re-ordering as stated above, since
we still have a race between the kernel and the userspace thread, but it
eliminates races between multiple userspace threads.
Userspace knows the PID of the socket on which the original upcall is
received, so there is no need to send it up from the kernel.
The solution requires storing the value somewhere for the duration of the
packet processing. There are two potential places for this: our skb
extension or the per-CPU storage. It's not clear which is better,
so just follow the currently used scheme of storing this kind of thing
along with the skb. We still have a decent amount of space in the cb.
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20250702155043.2331772-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Merge series from Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>:
This is preparation for hiding snd_soc_dapm_context inside soc-dapm.c
|
|
With this change each pwmchip defining the new-style waveform callbacks
can be accessed from userspace via a character device. Compared to the
sysfs API this is faster and allows passing the whole configuration in a
single ioctl, allowing atomic application and thus reducing glitches.
On an STM32MP13 I see:
root@DistroKit:~ time pwmtestperf
real 0m 1.27s
user 0m 0.02s
sys 0m 1.21s
root@DistroKit:~ rm /dev/pwmchip0
root@DistroKit:~ time pwmtestperf
real 0m 3.61s
user 0m 0.27s
sys 0m 3.26s
pwmtestperf does essentially:
for i in 0 .. 50000:
pwm_set_waveform(duty_length_ns=i, period_length_ns=50000, duty_offset_ns=0)
and in the presence of /dev/pwmchip0 it uses the ioctls introduced here,
without that device it uses /sys/class/pwm/pwmchip0.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Link: https://lore.kernel.org/r/ad4a4e49ae3f8ea81e23cac1ac12b338c3bf5c5b.1746010245.git.u.kleine-koenig@baylibre.com
Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
|
|
Add a VM_BIND ioctl for binding/unbinding buffers into a VM. This is
only supported if userspace has opted in to MSM_PARAM_EN_VM_BIND.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661524/
|
|
This submitqueue type isn't tied to a hw ringbuffer, but instead
executes on the CPU for performing async VM_BIND ops.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661517/
|
|
We'll be re-using these for the VM_BIND ioctl.
Also, rename a few things in the uapi header to reflect that syncobj use
is not specific to the submit ioctl.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661512/
|
|
Buffers that are not shared between contexts can share a single resv
object. This way drm_gpuvm will not track them as external objects, and
submit-time validation overhead will be O(1) for all N non-shared BOs,
instead of O(n).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661497/
|
|
Add a SET_PARAM for userspace to request to manage the VM itself,
instead of getting a kernel managed VM.
In order to transition to a userspace managed VM, this param must be set
before any mappings are created.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661494/
|
|
Add PRR (Partial Resident Region) support: a bypass address which makes
GPU writes go to /dev/null and reads return zero. This is used to implement
Vulkan sparse residency.
To support PRR/NULL mappings, we allocate a page to reserve a physical
address which we know will not be used as part of a GEM object, and
configure the SMMU to use this address for PRR/NULL mappings.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661486/
|
|
The link to the libcap library is outdated. Instead, use a link to the
libcap2 library.
Also, give the complete reference for the POSIX compliance.
Signed-off-by: Ariel Otilibili <ariel.otilibili-anieli@eurecom.fr>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Reviewed-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Serge Hallyn <sergeh@kernel.org>
|
|
Now that we expose struct file_attr as our uapi struct, rename the
internal struct to struct file_kattr to clearly communicate that it is a
kernel-internal struct. This is similar to struct mount_{k}attr and
others.
Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc5).
No conflicts.
No adjacent changes.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add support for a stream API to the kernel and expose related kfuncs to
BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
can be used for printing messages that can be consumed from user space,
and are thus similar in spirit to the existing trace_pipe interface.
The kernel will use the BPF_STDERR stream to notify the program of any
errors encountered at runtime. BPF programs themselves may use both
streams for writing debug messages. BPF library-like code may use
BPF_STDERR to print warnings or errors on misuse at runtime.
The implementation of a stream is as follows. Every time a message is
emitted from the kernel (directly, or through a BPF program), a record
is allocated by bump allocating from a per-CPU region backed by a page
obtained using alloc_pages_nolock(). This ensures that we can allocate
memory from any context. The eventual plan is to discard this scheme in
favor of Alexei's kmalloc_nolock() [0].
This record is then locklessly inserted into a list (llist_add()) so
that the printing side doesn't require holding any locks, and works in
any context. Each stream has a maximum capacity of 4MB of text, and each
printed message is accounted against this limit.
Messages from a program are emitted using the bpf_stream_vprintk kfunc,
which takes a stream_id argument in addition to working otherwise
similar to bpf_trace_vprintk.
The bprintf buffer helpers are extracted out to be reused for printing
the string into them before copying it into the stream, so that we can
(with the defined max limit) format a string and know its true length
before performing allocations of the stream element.
For consuming elements from a stream, we expose a bpf(2) syscall command
named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the
stream of a given prog_fd into a user space buffer. The main logic is
implemented in bpf_stream_read(). The log messages are queued in
bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and
ordered correctly in the stream backlog.
For this purpose, we hold a lock around bpf_stream_backlog_peek(), as
llist_del_first() (if we maintained a second lockless list for the
backlog) wouldn't be safe from multiple threads anyway. Then, if we
fail to find something in the backlog log, we splice out everything from
the lockless log, and place it in the backlog log, and then return the
head of the backlog. Once the full length of the element is consumed, we
will pop it and free it.
The lockless list bpf_stream::log is a LIFO stack. Elements obtained
using a llist_del_all() operation are in LIFO order, thus would break
the chronological ordering if printed directly. Hence, this batch of
messages is first reversed. Then, it is stashed into a separate list in
the stream, i.e. the backlog_log. The head of this list is the actual
message that should always be returned to the caller. All of this is
done in bpf_stream_backlog_fill().
From the kernel side, the writing into the stream will be a bit more
involved than the typical printk. First, the kernel typically may print
a collection of messages into the stream, and parallel writers into the
stream may suffer from interleaving of messages. To ensure each group of
messages is visible atomically, we can lift the advantage of using a
lockless list for pushing in messages.
To enable this, we add a bpf_stream_stage() macro, and require kernel
users to use bpf_stream_printk statements for the passed expression to
write into the stream. Underneath the macro, we have a message staging
API, where a bpf_stream_stage object on the stack accumulates the
messages being printed into a local llist_head, and then a commit
operation splices the whole batch into the stream's lockless log list.
This is especially pertinent for rqspinlock deadlock messages printed to
program streams. After this change, we see each deadlock invocation as a
non-interleaving contiguous message without any confusion on the
reader's part, improving their user experience in debugging the fault.
While programs cannot benefit from this staged stream writing API, they
could just as well hold an rqspinlock around their print statements to
serialize messages, hence this is kept kernel-internal for now.
Overall, this infrastructure provides NMI-safe any context printing of
messages to two dedicated streams.
Later patches will add support for printing splats in case of BPF arena
page faults, rqspinlock deadlocks, and cond_break timeouts, and
integration of this facility into bpftool for dumping messages to user
space.
[0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@gmail.com
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250703204818.925464-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- bridge: More reference counting
- dp: Implement backlight control helpers
- fourcc: Add half-float and 32b float formats, RGB161616, BGR161616
- mipi-dsi: Drop MIPI_DSI_MODE_VSYNC_FLUSH flag
- ttm: Improve eviction
Driver Changes:
- i915: Use backlight control helpers for eDP
- tidss: Add AM65x OLDI bridge support
- panels:
- panel-edp: Add CMN N116BCJ-EAK support
- raydium-rm67200: misc cleanups, optional reset
- new panel: DJN HX83112B
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250703-chirpy-lilac-dalmatian-2c5838@houat
|
|
The Renesas Camera Receiver Unit in the RZ/V2H SoC can output RAW
data captured from an image sensor without conversion to an RGB/YUV
format. In that case the data are packed into 64-bit blocks, with a
variable amount of padding in the most significant bits depending on
the bitdepth of the data. Add new V4L2 pixel format codes for the new
formats, along with documentation to describe them.
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Daniel Scally <dan.scally+renesas@ideasonboard.com>
Link: https://lore.kernel.org/r/20250630222734.2712390-1-dan.scally@ideasonboard.com
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
Use the clamp() function from minmax.h and provide a define for the max
sizes as they will be used in subsequent patches.
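For context, a trivial illustration of the pattern (the macro comes from
include/linux/minmax.h; the define and function names here are hypothetical):

#include <linux/types.h>
#include <linux/minmax.h>

#define UVC_EXAMPLE_MAX_SIZE	64	/* hypothetical max-size define */

/* clamp(val, lo, hi) bounds val to [lo, hi] with compile-time type checking. */
static u16 bound_ctrl_size(u16 size)
{
	return clamp(size, (u16)1, (u16)UVC_EXAMPLE_MAX_SIZE);
}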
Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com>
Reviewed-by: Stefan Klug <stefan.klug@ideasonboard.com>
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
|
|
__kernel_rwf_t is defined as int, the actual size of which is
implementation defined. It won't go well if some compiler / arch
ever defines it as i64, so replace it with __u32, hoping that
there is no one using i16 for it.
Cc: stable@vger.kernel.org
Fixes: 2b188cc1bb857 ("Add io_uring IO interface")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/47c666c4ee1df2018863af3a2028af18feef11ed.1751412511.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Introduce support for specifying relative bandwidth shares between
traffic classes (TC) in the devlink-rate API. This new option allows
users to allocate bandwidth across multiple traffic classes in a
single command.
This feature provides a more granular control over traffic management,
especially for scenarios requiring Enhanced Transmission Selection.
Users can now define a relative bandwidth share for each traffic class.
For example, assigning share values of 20 to TC0 (TCP/UDP) and 80 to TC5
(RoCE) will result in TC0 receiving 20% and TC5 receiving 80% of the
total bandwidth. The actual percentage each class receives depends on
the ratio of its share value to the sum of all shares.
Example:
DEV=pci/0000:08:00.0
$ devlink port function rate add $DEV/vfs_group tx_share 10Gbit \
tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0
$ devlink port function rate set $DEV/vfs_group \
tc-bw 0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0
Example usage with ynl:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
--do rate-set --json '{
"bus-name": "pci",
"dev-name": "0000:08:00.0",
"port-index": 1,
"rate-tc-bws": [
{"rate-tc-index": 0, "rate-tc-bw": 50},
{"rate-tc-index": 1, "rate-tc-bw": 50},
{"rate-tc-index": 2, "rate-tc-bw": 0},
{"rate-tc-index": 3, "rate-tc-bw": 0},
{"rate-tc-index": 4, "rate-tc-bw": 0},
{"rate-tc-index": 5, "rate-tc-bw": 0},
{"rate-tc-index": 6, "rate-tc-bw": 0},
{"rate-tc-index": 7, "rate-tc-bw": 0}
]
}'
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
--do rate-get --json '{
"bus-name": "pci",
"dev-name": "0000:08:00.0",
"port-index": 1
}'
output for rate-get:
{'bus-name': 'pci',
 'dev-name': '0000:08:00.0',
 'port-index': 1,
 'rate-tc-bws': [{'rate-tc-bw': 50, 'rate-tc-index': 0},
                 {'rate-tc-bw': 50, 'rate-tc-index': 1},
                 {'rate-tc-bw': 0, 'rate-tc-index': 2},
                 {'rate-tc-bw': 0, 'rate-tc-index': 3},
                 {'rate-tc-bw': 0, 'rate-tc-index': 4},
                 {'rate-tc-bw': 0, 'rate-tc-index': 5},
                 {'rate-tc-bw': 0, 'rate-tc-index': 6},
                 {'rate-tc-bw': 0, 'rate-tc-index': 7}],
 'rate-tx-max': 0,
 'rate-tx-priority': 0,
 'rate-tx-share': 0,
 'rate-tx-weight': 0,
 'rate-type': 'leaf'}
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
PR_MTE_STORE_ONLY is used to restrict the MTE tag check to store
operations only.
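A hedged userspace sketch, assuming the new flag is passed via
PR_SET_TAGGED_ADDR_CTRL alongside the existing PR_MTE_* bits:

#include <sys/prctl.h>
#include <linux/prctl.h>

/* Request synchronous MTE tag checking for stores only; the tag mask
 * excludes tag 0 as in the existing MTE examples. */
static int enable_store_only_mte(void)
{
	unsigned long ctrl = PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
			     PR_MTE_STORE_ONLY |
			     (0xfffeUL << PR_MTE_TAG_SHIFT);

	return prctl(PR_SET_TAGGED_ADDR_CTRL, ctrl, 0, 0, 0);
}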
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250618092957.2069907-3-yeoreum.yun@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Introduce file_getattr() and file_setattr() syscalls to manipulate inode
extended attributes. The syscalls take a pair of a file descriptor and a
pathname, and operate on the inode opened according to openat()
semantics. The struct file_attr is passed to obtain/change extended
attributes.
This is an alternative to the FS_IOC_FSSETXATTR ioctl, with the difference
that the file doesn't need to be open, as we can reference it with a path
instead of an fd. By having this we can manipulate inode extended
attributes not only on regular files but also on special ones. This
is not possible with the FS_IOC_FSSETXATTR ioctl, as with special files
we cannot call ioctl() directly on the filesystem inode using an fd.
This patch adds two new syscalls which allow userspace to get/set
extended inode attributes on special files by using a parent directory
and a path - *at()-like syscalls.
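A hedged userspace sketch, assuming the five-argument prototype described in
this series (dirfd, pathname, attr buffer, buffer size, AT_* flags);
__NR_file_getattr and struct file_attr come from the new uapi headers:

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/fs.h>		/* struct file_attr, once the uapi header lands */

static int get_inode_attrs(int dirfd, const char *path, struct file_attr *fa)
{
	/* Flags follow openat() semantics, e.g. AT_SYMLINK_NOFOLLOW; 0 here. */
	return syscall(__NR_file_getattr, dirfd, path, fa, sizeof(*fa), 0);
}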
CC: linux-api@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: linux-xfs@vger.kernel.org
Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org>
Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-6-c4e3bc35227b@kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a flag that enables polling on the mock file. For now it trivially
says that there is always data available; it'll be extended in the
future.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f16de043ec4876d65fae294fc99ade57415fba0c.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|