path: root/include/uapi
Age | Commit message | Author | Lines
2023-05-30 | spi: add SPI_MOSI_IDLE_LOW mode bit | Boerge Struempfel | -1/+2
Some SPI controllers switch the MOSI line high whenever they are idle. This may not be desired in all use cases. For example, neopixel LEDs can get confused and flicker due to misinterpreting the idle state. Therefore, we introduce a new SPI mode bit with which the idle behaviour can be overridden on a per-device basis. Signed-off-by: Boerge Struempfel <boerge.struempfel@gmail.com> Link: https://lore.kernel.org/r/20230530141641.1155691-2-boerge.struempfel@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
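A minimal sketch of how a user-space program might request the new behaviour in a 32-bit SPI mode word. The helper name is hypothetical, and the fallback value (bit 17) is an assumption for headers that predate this commit; prefer the definition from <linux/spi/spi.h> when it is available. Whether a given controller honours the bit is up to its driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: fallback for older headers. Bit 17 is believed to match
 * include/uapi/linux/spi/spi.h, but verify against your kernel headers. */
#ifndef SPI_MOSI_IDLE_LOW
#define SPI_MOSI_IDLE_LOW (1UL << 17)
#endif

/* Hypothetical helper: return the mode word with the idle-low request
 * added, leaving every other mode bit (CPOL, CPHA, ...) untouched. */
static uint32_t spi_mode_request_idle_low(uint32_t mode)
{
    return mode | (uint32_t)SPI_MOSI_IDLE_LOW;
}
```

The resulting word would typically be handed to the device through the spidev 32-bit mode ioctl, or set by a kernel driver in the device's mode field.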
2023-05-30 | firewire: cdev: add new event to notify phy packet with time stamp | Takashi Sakamoto | -10/+57
This commit adds a new event that reports a phy packet with a time stamp field. Unlike fw_cdev_event_request3 and fw_cdev_event_response2, the size of the new structure, fw_cdev_event_phy_packet2, is a multiple of 8, so no padding is required to keep the same size across the System V ABI for different architectures. Note that, for a ping request, the 1394 OHCI controller does not record the isochronous cycle at which the packet was sent for the request subaction. Instead, it records the round-trip count measured by hardware at 42.195 MHz resolution. Cc: kunit-dev@googlegroups.com Link: https://lore.kernel.org/r/20230529113406.986289-12-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2023-05-30 | firewire: cdev: add new event to notify response subaction with time stamp | Takashi Sakamoto | -10/+49
This commit adds a new event that reports a response subaction with a time stamp field. The current compiler implementation of the System V ABI uses the structure member with the largest alignment to decide the size of the structure. In the case of the fw_cdev_event_request3 structure, it is the closure member, which has 8 byte storage. The alignment for a type with 8 byte storage differs between architectures: 4 bytes on i386 and 8 bytes on the others, including x32. A structure layout that varies between architectures is inconvenient for device driver developers, since they have to take care of the ioctl compat layer. This commit adds a 32 bit member for padding to keep the size of the structure a multiple of 8. Cc: kunit-dev@googlegroups.com Link: https://lore.kernel.org/r/20230529113406.986289-9-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2023-05-30 | firewire: cdev: add new event to notify request subaction with time stamp | Takashi Sakamoto | -2/+51
This commit adds a new event that reports a request subaction with a time stamp field. The current compiler implementation of the System V ABI uses the structure member with the largest alignment to decide the size of the structure. In the case of the fw_cdev_event_request3 structure, it is the closure member, which has 8 byte storage. The alignment for a type with 8 byte storage differs between architectures: 4 bytes on i386 and 8 bytes on the others, including x32. A structure layout that varies between architectures is inconvenient for device driver developers, since they have to take care of the ioctl compat layer. This commit adds a 32 bit member for padding to keep the size of the structure a multiple of 8. Cc: kunit-dev@googlegroups.com Link: https://lore.kernel.org/r/20230529113406.986289-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
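The padding argument in the firewire commits above can be sketched with a small, hypothetical structure. The layout below is illustrative only and is not the real fw_cdev_event_request3; it only assumes an 8 byte member followed by 32 bit members and a trailing payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event layout, NOT the real firewire UAPI structure. */
struct demo_event {
    uint64_t closure;  /* 8 byte storage: 4 byte aligned on i386, 8 elsewhere */
    uint32_t type;
    uint32_t tstamp;
    uint32_t length;
    uint32_t padding;  /* explicit pad keeps the fixed part at 24 bytes */
    uint32_t data[];   /* variable-length payload follows */
};

/* Without 'padding' the fixed part would be 20 bytes: sizeof() would stay
 * 20 where uint64_t is 4 byte aligned but round up to 24 where it is
 * 8 byte aligned. With it, both cases agree. */
_Static_assert(sizeof(struct demo_event) == 24, "fixed part must be 24 bytes");
_Static_assert(offsetof(struct demo_event, data) % 8 == 0, "payload starts on an 8 byte boundary");

/* Hypothetical helper exposing the fixed size at run time. */
static size_t demo_event_fixed_size(void)
{
    return sizeof(struct demo_event);
}
```

Making the pad explicit is what lets a compat layer treat 32 bit and 64 bit callers identically, since neither compiler has to insert tail padding of its own.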
2023-05-30 | firewire: cdev: add new version of ABI to notify time stamp at request/response subaction of transaction | Takashi Sakamoto | -0/+1
This commit adds a new version of the ABI for future new events with time stamp for request/response subaction of asynchronous transaction to user space. Link: https://lore.kernel.org/r/20230529113406.986289-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2023-05-28 | uapi: wireless: Replace zero-length array with flexible-array member | Gustavo A. R. Silva | -1/+1
Zero-length and one-element arrays are deprecated, and we are moving towards adopting C99 flexible-array members, instead. Address the following warnings seen under GCC-13 and -fstrict-flex-arrays=3 enabled: drivers/staging/ks7010/ks_wlan_net.c:1597:50: warning: array subscript 0 is outside array bounds of ‘__u8[0]’ {aka ‘unsigned char[]’} [-Warray-bounds=] drivers/staging/ks7010/ks_wlan_net.c:1603:61: warning: array subscript 16 is outside array bounds of ‘__u8[0]’ {aka ‘unsigned char[]’} [-Warray-bounds=] drivers/staging/ks7010/ks_wlan_net.c:1604:61: warning: array subscript 24 is outside array bounds of ‘__u8[0]’ {aka ‘unsigned char[]’} [-Warray-bounds=] drivers/staging/ks7010/ks_wlan_net.c:1600:61: warning: array subscript 16 is outside array bounds of ‘__u8[0]’ {aka ‘unsigned char[]’} [-Warray-bounds=] drivers/staging/ks7010/ks_wlan_net.c:1586:50: warning: array subscript 0 is outside array bounds of ‘__u8[0]’ {aka ‘unsigned char[]’} [-Warray-bounds=] This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [1]. This results in no differences in binary output. Link: https://github.com/KSPP/linux/issues/21 Link: https://github.com/KSPP/linux/issues/261 Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1] Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
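A minimal sketch of the C99 flexible-array member pattern that this commit moves towards. The structure and helper are hypothetical and unrelated to the wireless UAPI header; they only show the declaration style and the usual allocation idiom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length record. */
struct demo_blob {
    unsigned short len;
    unsigned char data[];  /* flexible-array member; the deprecated form was data[0] */
};

/* Hypothetical helper: allocate the header plus 'len' payload bytes in
 * one block and copy the payload in. Returns NULL on allocation failure. */
static struct demo_blob *demo_blob_new(const void *src, unsigned short len)
{
    struct demo_blob *b = malloc(sizeof(*b) + len);

    if (!b)
        return NULL;
    b->len = len;
    memcpy(b->data, src, len);
    return b;
}
```

With the flexible-array form the compiler knows the array has no fixed bound, so bounds-checking options such as the one named in the commit message no longer flag accesses to the trailing data as out of range.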
2023-05-29 | Merge tag 'drm-intel-gt-next-2023-05-24' of ↵ | Dave Airlie | -1/+50
git://anongit.freedesktop.org/drm/drm-intel into drm-next UAPI Changes: - New getparam for querying PXP support and load status Cross-subsystem Changes: - GSC/MEI proxy driver Driver Changes: Fixes/improvements/new stuff: - Avoid clearing pre-allocated framebuffers with the TTM backend (Nirmoy Das) - Implement framebuffer mmap support (Nirmoy Das) - Disable sampler indirect state in bindless heap (Lionel Landwerlin) - Avoid out-of-bounds access when loading HuC (Lucas De Marchi) - Actually return an error if GuC version range check fails (John Harrison) - Get mutex and rpm ref just once in hwm_power_max_write (Ashutosh Dixit) - Disable PL1 power limit when loading GuC firmware (Ashutosh Dixit) - Block in hwmon while waiting for GuC reset to complete (Ashutosh Dixit) - Provide sysfs for SLPC efficient freq (Vinay Belgaumkar) - Add support for total context runtime for GuC back-end (Umesh Nerlige Ramappa) - Enable fdinfo for GuC backends (Umesh Nerlige Ramappa) - Don't capture Gen8 regs on Xe devices (John Harrison) - Fix error capture for virtual engines (John Harrison) - Track patch level versions on reduced version firmware files (John Harrison) - Decode another GuC load failure case (John Harrison) - GuC loading and firmware table handling fixes (John Harrison) - Fix confused register capture list creation (John Harrison) - Dump error capture to kernel log (John Harrison) - Dump error capture to dmesg on CTB error (John Harrison) - Disable rps_boost debugfs when SLPC is used (Vinay Belgaumkar) Future platform enablement: - Disable stolen memory backed FB for A0 [mtl] (Nirmoy Das) - Various refactors for multi-tile enablement (Andi Shyti, Tejas Upadhyay) - Extend Wa_22011802037 to MTL A-step (Madhumitha Tolakanahalli Pradeep) - WA to clear RDOP clock gating [mtl] (Haridhar Kalvala) - Set has_llc=0 [mtl] (Fei Yang) - Define MOCS and PAT tables for MTL (Madhumitha Tolakanahalli Pradeep) - Add PTE encode function [mtl] (Fei Yang) - fix mocs selftest [mtl] (Fei Yang) 
- Workaround coherency issue for Media [mtl] (Fei Yang) - Add workaround 14018778641 [mtl] (Tejas Upadhyay) - Implement Wa_14019141245 [mtl] (Radhakrishna Sripada) - Fix the wa number for Wa_22016670082 [mtl] (Radhakrishna Sripada) - Use correct huge page manager for MTL (Jonathan Cavitt) - GSC/MEI support for Meteorlake (Alexander Usyskin, Daniele Ceraolo Spurio) - Define GuC firmware version for MTL (John Harrison) - Drop FLAT CCS check [mtl] (Pallavi Mishra) - Add MTL for remapping CCS FBs [mtl] (Clint Taylor) - Meteorlake PXP enablement (Alan Previn) - Do not enable render power-gating on MTL (Andrzej Hajda) - Add MTL performance tuning changes (Radhakrishna Sripada) - Extend Wa_16014892111 to MTL A-step (Radhakrishna Sripada) - PMU multi-tile support (Tvrtko Ursulin) - End support for set caching ioctl [mtl] (Fei Yang) Driver refactors: - Use i915 instead of dev_priv insied the file_priv structure (Andi Shyti) - Use proper parameter naming in for_each_engine() (Andi Shyti) - Use gt_err for GT info (Tejas Upadhyay) - Consolidate duplicated capture list code (John Harrison) - Capture list naming clean up (John Harrison) - Use kernel-doc -Werror when CONFIG_DRM_I915_WERROR=y (Jani Nikula) - Preparation for using PAT index (Fei Yang) - Use pat_index instead of cache_level (Fei Yang) Miscellaneous: - Fix memory leaks in i915 selftests (Cong Liu) - Record GT error for gt failure (Tejas Upadhyay) - Migrate platform-dependent mock hugepage selftests to live (Jonathan Cavitt) - Update the SLPC selftest (Vinay Belgaumkar) - Throw out set() wrapper (Jani Nikula) - Large driver kernel doc cleanup (Jani Nikula) - Fix probe injection CI failures after recent change (John Harrison) - Make unexpected firmware versions an error in debug builds (John Harrison) - Silence UBSAN uninitialized bool variable warning (Ashutosh Dixit) - Fix memory leaks in function live_nop_switch (Cong Liu) Merges: - Merge drm/drm-next into drm-intel-gt-next (Joonas Lahtinen) Signed-off-by: Dave 
Airlie <airlied@redhat.com> # Conflicts: # drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZG5SxCWRSkZhTDtY@tursulin-desk
2023-05-26 | Merge tag 'for-netdev' of ↵ | Jakub Kicinski | -0/+10
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-05-26 We've added 54 non-merge commits during the last 10 day(s) which contain a total of 76 files changed, 2729 insertions(+), 1003 deletions(-). The main changes are: 1) Add the capability to destroy sockets in BPF through a new kfunc, from Aditi Ghag. 2) Support O_PATH fds in BPF_OBJ_PIN and BPF_OBJ_GET commands, from Andrii Nakryiko. 3) Add capability for libbpf to resize datasec maps when backed via mmap, from JP Kobryn. 4) Move all the test kfuncs for CI out of the kernel and into bpf_testmod, from Jiri Olsa. 5) Big batch of xsk selftest improvements to prep for multi-buffer testing, from Magnus Karlsson. 6) Show the target_{obj,btf}_id in tracing link's fdinfo and dump it via bpftool, from Yafang Shao. 7) Various misc BPF selftest improvements to work with upcoming LLVM 17, from Yonghong Song. 8) Extend bpftool to specify netdevice for resolving XDP hints, from Larysa Zaremba. 9) Document masking in shift operations for the insn set document, from Dave Thaler. 10) Extend BPF selftests to check xdp_feature support for bond driver, from Lorenzo Bianconi. 
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits) bpf: Fix bad unlock balance on freeze_mutex libbpf: Ensure FD >= 3 during bpf_map__reuse_fd() libbpf: Ensure libbpf always opens files with O_CLOEXEC selftests/bpf: Check whether to run selftest libbpf: Change var type in datasec resize func bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command libbpf: Selftests for resizing datasec maps libbpf: Add capability for resizing datasec maps selftests/bpf: Add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands libbpf: Start v1.3 development cycle bpf: Validate BPF object in BPF_OBJ_PIN before calling LSM bpftool: Specify XDP Hints ifname when loading program selftests/bpf: Add xdp_feature selftest for bond device selftests/bpf: Test bpf_sock_destroy selftests/bpf: Add helper to get port using getsockname bpf: Add bpf_sock_destroy kfunc bpf: Add kfunc filter function to 'struct btf_kfunc_id_set' bpf: udp: Implement batching for sockets iterator ... ==================== Link: https://lore.kernel.org/r/20230526222747.17775-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | -2/+4
Cross-merge networking fixes after downstream PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | -0/+1
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/raw.c 3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol") c85be08fc4fa ("raw: Stop using RTO_ONLINK.") https://lore.kernel.org/all/20230525110037.2b532b83@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/freescale/fec_main.c 9025944fddfe ("net: fec: add dma_wmb to ensure correct descriptor values") 144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-25 | Merge tag 'net-6.4-rc4' of ↵ | Linus Torvalds | -0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth and bpf. Current release - regressions: - net: fix skb leak in __skb_tstamp_tx() - eth: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs Current release - new code bugs: - handshake: - fix sock->file allocation - fix handshake_dup() ref counting - bluetooth: - fix potential double free caused by hci_conn_unlink - fix UAF in hci_conn_hash_flush Previous releases - regressions: - core: fix stack overflow when LRO is disabled for virtual interfaces - tls: fix strparser rx issues - bpf: - fix many sockmap/TCP related issues - fix a memory leak in the LRU and LRU_PERCPU hash maps - init the offload table earlier - eth: mlx5e: - do as little as possible in napi poll when budget is 0 - fix using eswitch mapping in nic mode - fix deadlock in tc route query code Previous releases - always broken: - udplite: fix NULL pointer dereference in __sk_mem_raise_allocated() - raw: fix output xfrm lookup wrt protocol - smc: reset connection when trying to use SMCRv2 fails - phy: mscc: enable VSC8501/2 RGMII RX clock - eth: octeontx2-pf: fix TSOv6 offload - eth: cdc_ncm: deal with too low values of dwNtbOutMaxSize" * tag 'net-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits) udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated(). 
net: phy: mscc: enable VSC8501/2 RGMII RX clock net: phy: mscc: remove unnecessary phydev locking net: phy: mscc: add support for VSC8501 net: phy: mscc: add VSC8502 to MODULE_DEVICE_TABLE net/handshake: Enable the SNI extension to work properly net/handshake: Unpin sock->file if a handshake is cancelled net/handshake: handshake_genl_notify() shouldn't ignore @flags net/handshake: Fix uninitialized local variable net/handshake: Fix handshake_dup() ref counting net/handshake: Remove unneeded check from handshake_dup() ipv6: Fix out-of-bounds access in ipv6_find_tlv() net: ethernet: mtk_eth_soc: fix QoS on DSA MAC on non MTK_NETSYS_V2 SoCs docs: netdev: document the existence of the mail bot net: fix skb leak in __skb_tstamp_tx() r8169: Use a raw_spinlock_t for the register locks. page_pool: fix inconsistency for page_pool_ring_[un]lock() bpf, sockmap: Test progs verifier error with latest clang bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer ...
2023-05-25 | media: uapi: Use unsigned int values for assigning bits in u32 fields | Sakari Ailus | -14/+14
Use unsigned int values annotated with "U" for u32 fields. While this is a good practice, there doesn't appear to be a bug that this patch would fix. The patch has been generated using the following command: perl -i -pe 's/\([0-9]+\K <</U <</g; s/\|\s*0\K\)/U\)/' \ include/uapi/linux/media.h Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
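A short sketch of why the "U" suffix is considered good practice for bit definitions in u32 fields, even though the commit notes that no actual bug was fixed. The flag names are hypothetical and are not taken from media.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags. With a plain int constant, (1 << 31) shifts into
 * the sign bit of a 32 bit int, which is undefined behaviour in C.
 * The unsigned form is well defined for every bit position 0..31. */
#define DEMO_FLAG_LOW (1U << 0)
#define DEMO_FLAG_TOP (1U << 31)

/* Hypothetical helper: set the top flag in a u32 flags word. */
static uint32_t demo_set_top(uint32_t flags)
{
    return flags | DEMO_FLAG_TOP;
}
```

Using unsigned constants throughout also avoids accidental sign extension when a flag is widened to a 64 bit type.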
2023-05-25 | media: uapi: HEVC: Add num_delta_pocs_of_ref_rps_idx field | Benjamin Gaignard | -1/+5
Some drivers' firmware parses the slice header by itself and needs the num_delta_pocs_of_ref_rps_idx value to parse the slice header's short_term_ref_pic_set(). Use one of the 4 reserved bytes to store this value without changing the v4l2_ctrl_hevc_decode_params structure size and padding. This value also exists in the DXVA API. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Signed-off-by: Yunfei Dong <yunfei.dong@mediatek.com> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> [hverkuil: fix typo in num_delta_pocs_of_ref_rps_idx doc]
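The technique of carving a new field out of reserved bytes can be sketched as follows. Both structures are hypothetical and much smaller than v4l2_ctrl_hevc_decode_params; the only point is that the size and the offsets of all existing members stay the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout with four spare bytes at the tail. */
struct demo_params_old {
    uint32_t pic_order_cnt;
    uint16_t bit_size;
    uint8_t  num_before;
    uint8_t  num_after;
    uint8_t  reserved[4];
};

/* Hypothetical "after" layout: one reserved byte becomes a real field. */
struct demo_params_new {
    uint32_t pic_order_cnt;
    uint16_t bit_size;
    uint8_t  num_before;
    uint8_t  num_after;
    uint8_t  num_delta_pocs;  /* takes the place of reserved[0] */
    uint8_t  reserved[3];
};

/* The ABI is preserved only if the size and member offsets are unchanged. */
_Static_assert(sizeof(struct demo_params_old) == sizeof(struct demo_params_new),
               "structure size must not change");
_Static_assert(offsetof(struct demo_params_new, num_delta_pocs) ==
               offsetof(struct demo_params_old, reserved),
               "new field must occupy the first reserved byte");
```

Old user space that zeroes the reserved bytes then simply passes 0 for the new field, which is why reserved areas are conventionally required to be zero.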
2023-05-25 | media: videodev2.h: Fix struct v4l2_input tuner index comment | Marek Vasut | -1/+1
VIDIOC_ENUMINPUT documentation describes the tuner field of struct v4l2_input as an index: Documentation/userspace-api/media/v4l/vidioc-enuminput.rst " * - __u32 - ``tuner`` - Capture devices can have zero or more tuners (RF demodulators). When the ``type`` is set to ``V4L2_INPUT_TYPE_TUNER`` this is an RF connector and this field identifies the tuner. It corresponds to struct :c:type:`v4l2_tuner` field ``index``. For details on tuners see :ref:`tuner`. " Drivers I could find also use the 'tuner' field as an index, e.g.: drivers/media/pci/bt8xx/bttv-driver.c bttv_enum_input() drivers/media/usb/go7007/go7007-v4l2.c vidioc_enum_input() However, the UAPI comment claims this field is 'enum v4l2_tuner_type': include/uapi/linux/videodev2.h This field being 'enum v4l2_tuner_type' is unlikely as it seems to be never used that way in drivers, and documentation confirms it. It seems this comment got in accidentally in the commit which this patch fixes. Fix the UAPI comment to stop confusion. This was pointed out by Dmitry while reviewing VIDIOC_ENUMINPUT support for strace. Fixes: 6016af82eafc ("[media] v4l2: use __u32 rather than enums in ioctl() structs") Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-05-25 | media: videodev2.h: Fix p_s32 and p_s64 pointer types | Daniel Lundberg Pedersen | -2/+2
Use the intended pointer types for p_s32 and p_s64 in the union of the struct v4l2_ext_control. Fixes: e77eb66342c7 ("videodev2.h: add p_s32 and p_s64 pointers") Signed-off-by: Daniel Lundberg Pedersen <dlp@qtec.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2023-05-25 | exportfs: allow exporting non-decodeable file handles to userspace | Amir Goldstein | -0/+5
Some userspace programs use st_ino as a unique object identifier, even though inode numbers may be recyclable. This issue has been addressed for NFS export long ago using the exportfs file handle API and the unique file handle identifiers are also exported to userspace via name_to_handle_at(2). fanotify also uses file handles to identify objects in events, but only for filesystems that support NFS export. Relax the requirement for NFS export support and allow more filesystems to export a unique object identifier via name_to_handle_at(2) with the flag AT_HANDLE_FID. A file handle requested with the AT_HANDLE_FID flag may or may not be usable as an argument to open_by_handle_at(2). To allow filesystems to opt-in to supporting AT_HANDLE_FID, a struct export_operations is required, but even an empty struct is sufficient for encoding FIDs. Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230502124817.3070545-4-amir73il@gmail.com>
2023-05-24 | net/handshake: Enable the SNI extension to work properly | Chuck Lever | -0/+1
Enable the upper layer protocol to specify the SNI peername. This avoids the need for tlshd to use a DNS lookup, which can return a hostname that doesn't match the incoming certificate's SubjectName. Fixes: 2fd5532044a8 ("net/handshake: Add a kernel API for requesting a TLSv1.3 handshake") Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24 | net: mdio: add clause 73 to ethtool conversion helper | Russell King (Oracle) | -0/+24
Add a helper to convert a clause 73 advertisement to an ethtool bitmap. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-23 | vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X | Reinette Chatre | -0/+3
Dynamic MSI-X is supported. Clear VFIO_IRQ_INFO_NORESIZE to provide guidance to user space. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/fd1ef2bf6ae972da8e2805bc95d5155af5a8fb0a.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-05-23 | bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands | Andrii Nakryiko | -0/+10
The current UAPI of the BPF_OBJ_PIN and BPF_OBJ_GET commands of the bpf() syscall forces users to specify the pinning location as a string-based absolute or relative (to the current working directory) path. This has various implications related to security (e.g., symlink-based attacks) and forces BPF FS to be exposed in the file system, which can cause races with other applications. One piece of feedback we got from folks working heavily with containers was that the inability to use a purely FD-based location specification was an unfortunate limitation and hindrance for the BPF_OBJ_PIN and BPF_OBJ_GET commands. This patch addresses this oversight, adding a path_fd field to the BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by *at() syscalls for dirfd + pathname combinations. This now allows interesting possibilities like working with a detached BPF FS mount (e.g., to perform multiple pinnings without running a risk of someone interfering with them), and generally makes pinning/getting more secure and not prone to any races and/or security attacks. This is demonstrated by a selftest added in a subsequent patch that takes advantage of the new mount APIs (fsopen, fsconfig, fsmount) to demonstrate creating a detached BPF FS mount, pinning, and then getting a BPF map out of it, all while never exposing this private instance of BPF FS to the outside world. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/bpf/20230523170013.728457-4-andrii@kernel.org
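A rough sketch of how the dirfd + pathname convention described above might look when filling in the pin request. Everything prefixed with demo_ or DEMO_ is hypothetical: the structure is a local mirror of the relevant attribute fields rather than the real union bpf_attr, and the flag value is an assumption that should be checked against BPF_F_PATH_FD in <linux/bpf.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the flag that asks the kernel to honour path_fd.
 * Verify the name and value against your <linux/bpf.h>. */
#define DEMO_BPF_F_PATH_FD (1U << 14)

/* Hypothetical mirror of the fields used by BPF_OBJ_PIN / BPF_OBJ_GET. */
struct demo_bpf_obj_attr {
    uint64_t pathname;    /* user pointer to a NUL-terminated path */
    uint32_t bpf_fd;      /* object to pin (unused for BPF_OBJ_GET) */
    uint32_t file_flags;
    int32_t  path_fd;     /* directory FD the path is resolved against */
};

/* Hypothetical helper: describe "pin bpf_fd at relpath under dir_fd". */
static struct demo_bpf_obj_attr demo_obj_pin_attr(int dir_fd, int bpf_fd,
                                                  const char *relpath)
{
    struct demo_bpf_obj_attr attr = {0};

    attr.pathname = (uint64_t)(uintptr_t)relpath;
    attr.bpf_fd = (uint32_t)bpf_fd;
    attr.file_flags = DEMO_BPF_F_PATH_FD;
    attr.path_fd = dir_fd;
    return attr;
}
```

In a real program the same values would be written into union bpf_attr and passed to the bpf() syscall. Because the path is resolved relative to dir_fd, the BPF FS instance behind that FD never has to be mounted anywhere visible.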
2023-05-23 | ipv{4,6}/raw: fix output xfrm lookup wrt protocol | Nicolas Dichtel | -0/+1
With a raw socket bound to IPPROTO_RAW (i.e. with hdrincl enabled), the protocol field of the flow structure, built by raw_sendmsg() / rawv6_sendmsg(), is set to IPPROTO_RAW. This breaks the ipsec policy lookup when some policies are defined with a protocol in the selector. For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to specify the protocol. Just accept all values for IPPROTO_RAW socket. For ipv4, the sin_port field of 'struct sockaddr_in' could not be used without breaking backward compatibility (the value of this field was never checked). Let's add a new kind of control message, so that the userland could specify which protocol is used. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") CC: stable@vger.kernel.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-23 | Merge branch 'topic/midi20' into for-next | Takashi Iwai | -19/+126
This is a (largish) patch set for adding the support of MIDI 2.0 functionality, mainly targeted for USB devices. MIDI 2.0 is a complete overhaul of the 40-year-old MIDI 1.0. Unlike the MIDI 1.0 byte stream, MIDI 2.0 uses packets in 32bit words for the Universal MIDI Packet (UMP) protocol. It supports both MIDI 1.0 commands for compatibility and the extended MIDI 2.0 commands for higher resolutions and more functions. For supporting the UMP, the patch set extends the existing ALSA rawmidi and sequencer interfaces, and adds the USB MIDI 2.0 support to the standard USB-audio driver. The rawmidi for UMP has a different device name (/dev/snd/umpC*D*) and it reads/writes UMP packet data in 32bit CPU-native endianness. For the old MIDI 1.0 applications, the legacy rawmidi interface is provided, too. By default, the USB-audio driver will take the alternate setting for the MIDI 2.0 interface, and the compatibility with MIDI 1.0 is provided via the rawmidi common layer. However, the user may let the driver fall back to the old MIDI 1.0 interface by a module option, too. A UMP-capable rawmidi device can create the corresponding ALSA sequencer client(s) to support the UMP Endpoint and UMP Group connections. By the nature of the ALSA sequencer, arbitrary connections between clients/ports are allowed, and the ALSA sequencer core performs the automatic conversions for the connections between a new UMP sequencer client and a legacy MIDI 1.0 sequencer client. It allows the existing application to use MIDI 2.0 devices without changes. The MIDI-CI, which is another major extension in MIDI 2.0, isn't covered by this patch set. It would rather be implemented in user-space. Roughly speaking, the first half of this patch set is for extending the rawmidi and USB-audio, and the second half is for extending the ALSA sequencer interface. 
The patch set is based on 6.4-rc2 kernel, but all patches can be cleanly applicable on 6.2 and 6.3 kernels, too (while 6.1 and older kernels would need minor adjustment for uapi header changes). The updates for alsa-lib and alsa-utils will follow shortly later. The author thanks members of MIDI Association OS/API Working Group, especially Andrew Mee, for great helps for the initial design and debugging / testing the drivers. Link: https://lore.kernel.org/r/20230523075358.9672-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23 | ALSA: seq: Add UMP group filter | Takashi Iwai | -1/+2
Add a new filter bitmap for UMP groups for reducing the unnecessary read/write when the client is connected to a UMP EP seq port. The new group_filter field contains the bitmap for the groups, i.e. when the bit is set, the corresponding group is filtered out and the messages to that group won't be delivered. Each bit in the filter bitmap corresponds to a 1-based UMP Group number. Bit 0 is reserved for future use. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-37-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
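The bit layout described above (one bit per 1-based UMP Group, bit 0 reserved) could be handled with helpers like the following. The function names are hypothetical, and the upper bound of 16 groups is an assumption based on the fixed 16 ports per UMP endpoint mentioned elsewhere in this series.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_UMP_MAX_GROUPS 16u  /* assumption: groups are numbered 1..16 */

/* Hypothetical helper: mark a 1-based group as filtered out.
 * Out-of-range group numbers leave the bitmap unchanged, so the
 * reserved bit 0 is never set. */
static uint32_t demo_ump_filter_out_group(uint32_t filter, unsigned int group)
{
    if (group < 1 || group > DEMO_UMP_MAX_GROUPS)
        return filter;
    return filter | (1U << group);
}

/* Hypothetical helper: 1 if messages to 'group' would be dropped. */
static int demo_ump_group_is_filtered(uint32_t filter, unsigned int group)
{
    if (group < 1 || group > DEMO_UMP_MAX_GROUPS)
        return 0;
    return (int)((filter >> group) & 1U);
}
```

A client that only cares about Group 1 would set the bits for groups 2 to 16 and store the result in the group_filter field.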
2023-05-23 | ALSA: seq: Add ioctls for client UMP info query and setup | Takashi Iwai | -0/+14
Add new ioctls for sequencer clients to query and set the UMP endpoint and block information. As a sequencer client corresponds to a UMP Endpoint, one UMP Endpoint information can be assigned at most to a single sequencer client while multiple UMP block infos can be assigned by passing the type with the offset of block id (i.e. type = block_id + 1). For the kernel client, only SNDRV_SEQ_IOCTL_GET_CLIENT_UMP_INFO is allowed. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-35-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23 | ALSA: seq: Bind UMP device | Takashi Iwai | -0/+1
This patch introduces a new ALSA sequencer client for the kernel UMP object, snd-seq-ump-client. It's a UMP version of snd-seq-midi driver, while this driver creates a sequencer client per UMP endpoint which contains (fixed) 16 ports. The UMP rawmidi device is opened in APPEND mode for output, so that multiple sequencer clients can share the same UMP endpoint, as well as the legacy UMP rawmidi devices that are opened in APPEND mode, too. For input, on the other hand, the incoming data is processed on the fly in the dedicated hook, hence it doesn't open a rawmidi device. The UMP packet group is updated upon delivery depending on the target sequencer port (which corresponds to the actual UMP group). Each sequencer port sets a new port type bit, SNDRV_SEQ_PORT_TYPE_MIDI_UMP, in addition to the other standard types for MIDI. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-33-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23 | ALSA: seq: Allow suppressing UMP conversions | Takashi Iwai | -0/+1
A sequencer client like seq_dummy doesn't want to convert UMP events, but rather receives and sends them as is. Add a new event filter flag to suppress the automatic UMP conversion and apply it accordingly. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-32-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23 | ALSA: seq: Add UMP group number to snd_seq_port_info | Takashi Iwai | -1/+2
Add yet another new field, "ump_group", to snd_seq_port_info for specifying the associated UMP Group number for each sequencer port. This will be referred to in the upcoming automatic UMP conversion in the sequencer core. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-30-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23 | ALSA: seq: Add port direction to snd_seq_port_info | Takashi Iwai | -1/+8
Add a new field "direction" to snd_seq_port_info for allowing a client to tell the expected direction of the port access. A port might still allow subscriptions for read/write (e.g. for MIDI-CI) even if the primary usage of the port is a single direction (either input or output only). This new "direction" field can help to indicate such cases. When the direction is unspecified at creating a port and the port has either read or write capability, the corresponding direction bits are set automatically as default. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-29-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23 | ALSA: seq: Support MIDI 2.0 UMP Endpoint port | Takashi Iwai | -0/+1
This is an extension to ALSA sequencer infrastructure to support the MIDI 2.0 UMP Endpoint port. It's a "catch-all" port that is supposed to be present for each UMP Endpoint. When this port is read via subscription, it sends any events from all ports (UMP Groups) found in the same client. A UMP Endpoint port can be created with the new capability bit SNDRV_SEQ_PORT_CAP_UMP_ENDPOINT. Although the port assignment isn't strictly defined, it should be the port number 0. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-28-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23 | ALSA: seq: Add port inactive flag | Takashi Iwai | -0/+1
This extends the ALSA sequencer port capability bit to indicate the "inactive" flag. When this flag is set, the port is essentially invisible, and doesn't appear in the port query ioctls, while the direct access and the connection to this port are still allowed. The active/inactive state can be flipped dynamically, so that it can be visible at any time later. This feature is introduced basically for UMP; some UMP Groups in a UMP Block may be unassigned, hence those are practically invisible. On ALSA sequencer, the corresponding sequencer ports will get this new "inactive" flag to indicate the invisible state. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-27-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23ALSA: seq: Add UMP supportTakashi Iwai-16/+37
Starting from this commit, we add the basic support of UMP (Universal MIDI Packet) events on ALSA sequencer infrastructure. The biggest change here is that, for transferring UMP packets that are up to 128 bits, we extend the data payload of the ALSA sequencer event record when the client has declared support for the new UMP events. A new event flag bit, SNDRV_SEQ_EVENT_UMP, is defined and it shall be set for the UMP packet events that have the larger payload of 128 bits, defined as struct snd_seq_ump_event. For controlling the UMP feature enablement in the kernel, a new Kconfig, CONFIG_SND_SEQ_UMP, is introduced. The extended event for UMP is available only when this Kconfig item is set. Similarly, the size of the internal snd_seq_event_cell also increases (by 4 bytes) when the Kconfig item is set. (But the size increase is effective only for 32bit architectures; 64bit archs already have padding there.) Overall, when CONFIG_SND_SEQ_UMP isn't set, there is no change in the event and cell, keeping the old sizes. For applications that want to access the UMP packets, first of all, a sequencer client has to declare the user-protocol to match with the latest one via the new SNDRV_SEQ_IOCTL_USER_PVERSION; otherwise it's treated as a legacy client without UMP support. Then the client can switch to the new UMP mode (MIDI 1.0 or MIDI 2.0) with a new field, midi_version, in snd_seq_client_info. When switched to UMP mode (midi_version = 1 or 2), the client can write the UMP events with the SNDRV_SEQ_EVENT_UMP flag. For reads, the alignment size is changed from snd_seq_event (28 bytes) to snd_seq_ump_event (32 bytes). When a UMP sequencer event is delivered to a legacy sequencer client, it's ignored or handled as an error. Conceptually, ALSA sequencer client and port correspond to the UMP Endpoint and Group, respectively; each client may have multiple ports and each port has a fixed number (16) of channels, for a total of up to 256 channels. 
As of this commit, ALSA sequencer core just sends and receives the UMP events as-is from/to clients. The automatic conversions between the legacy events and the new UMP events will be implemented in a later patch. Along with this commit, bump the sequencer protocol version to 1.0.3. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-26-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
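The size arithmetic in the message above can be sketched in a few lines of C. This is only an illustration built from the numbers quoted in the commit text (28-byte legacy records, 32-byte UMP records, 16 channels per port); the ex_* names are hypothetical and are not part of the ALSA headers.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes as quoted in the commit message; ex_* names are hypothetical. */
enum { EX_LEGACY_EVENT_SIZE = 28, EX_UMP_EVENT_SIZE = 32 };
enum { EX_CHANNELS_PER_PORT = 16 };

/* Record stride a reader steps by: legacy mode (midi_version 0) uses the
 * 28-byte record, UMP mode (midi_version 1 or 2) uses the 32-byte one. */
size_t ex_read_stride(int midi_version)
{
    assert(midi_version >= 0 && midi_version <= 2);
    return midi_version == 0 ? EX_LEGACY_EVENT_SIZE : EX_UMP_EVENT_SIZE;
}

/* Number of whole event records that fit in a read buffer. */
size_t ex_events_in_buffer(size_t nbytes, int midi_version)
{
    return nbytes / ex_read_stride(midi_version);
}

/* Channels addressable through a client with the given number of ports. */
unsigned ex_total_channels(unsigned nports)
{
    return nports * EX_CHANNELS_PER_PORT;
}
```

With 16 ports (one per UMP Group) ex_total_channels() gives the 256 channels mentioned in the message, and a 128-byte read buffer holds four UMP-sized records.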
2023-05-23ALSA: seq: Introduce SNDRV_SEQ_IOCTL_USER_PVERSION ioctlTakashi Iwai-0/+1
For the future extension of the ALSA sequencer ABI, introduce a new ioctl SNDRV_SEQ_IOCTL_USER_PVERSION. This is similar to the ioctls used in PCM and other interfaces, and lets an application specify the ABI version it supports. The use of this ioctl will be mandatory for the upcoming UMP support. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-25-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23ALSA: ump: Add ioctls to inquiry UMP EP and Block info via control APITakashi Iwai-0/+2
It'd be convenient to have ioctls to query the UMP Endpoint and UMP Block information directly via the control API without opening the rawmidi interface, just like SNDRV_CTL_IOCTL_RAWMIDI_INFO. This patch extends the rawmidi ioctl handler to support those; new ioctls, SNDRV_CTL_IOCTL_UMP_ENDPOINT_INFO and SNDRV_CTL_IOCTL_UMP_BLOCK_INFO, return the snd_ump_endpoint and snd_ump_block data that is specified by the device field, respectively. Suggested-by: Jaroslav Kysela <perex@perex.cz> Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-6-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23ALSA: rawmidi: Skip UMP devices at SNDRV_CTL_IOCTL_RAWMIDI_NEXT_DEVICETakashi Iwai-1/+2
Applications may look for rawmidi devices with the ioctl SNDRV_CTL_IOCTL_RAWMIDI_NEXT_DEVICE. Returning a UMP device from this ioctl may confuse the existing applications that support only the legacy rawmidi. This patch changes the code to skip the UMP devices from the lookup for avoiding the confusion, and introduces a new ioctl to look for the UMP devices instead. Along with this change, bump the CTL protocol version to 2.0.9. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-5-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-05-23ALSA: rawmidi: UMP supportTakashi Iwai-1/+56
This patch adds the support helpers for UMP (Universal MIDI Packet) in ALSA core. The basic design is that a rawmidi instance is assigned to each UMP Endpoint. A UMP Endpoint provides a UMP stream, typically bidirectional (but it can also be unidirectional), which may hold up to 16 UMP Groups, where each UMP (input/output) Group corresponds to the traditional MIDI I/O Endpoint. Additionally, the ALSA UMP abstraction provides multiple UMP Blocks that can be assigned to each UMP Endpoint. A UMP Block is metadata that holds the UMP Group clusters, and can represent the functions assigned to each UMP Group. A typical implementation of a UMP Block is the Group Terminal Blocks of the USB MIDI 2.0 specification. For distinguishing from the legacy byte-stream MIDI device, a new device "umpC*D*" will be created, instead of the standard (MIDI 1.0) devices "midiC*D*". The UMP instance can be identified by the new rawmidi info bit SNDRV_RAWMIDI_INFO_UMP, too. A UMP rawmidi device reads and writes only in 4-byte word alignment, stored in CPU native endianness. The transmit and receive functions take care of the input/output data alignment, and may return zero or an aligned size, and the params ioctl may return -EINVAL when the given input/output buffer size isn't aligned. A few new UMP-specific ioctls are added for obtaining the new UMP endpoint and block information. As of this commit, no ALSA sequencer instance is attached to UMP devices yet. They will be supported by later patches. Along with those changes, the protocol version for rawmidi is bumped to 2.0.3. Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230523075358.9672-4-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
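The 4-byte alignment rule described above can be modelled as follows. This is a rough sketch of the stated behaviour (reject unaligned buffer sizes with -EINVAL, transfer only whole words), not the driver code; the ex_* helpers are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EX_UMP_WORD_SIZE 4u   /* UMP data moves in 32-bit words */

/* Model of the params check: unaligned buffer sizes are refused. */
int ex_ump_check_buffer_size(size_t nbytes)
{
    return (nbytes % EX_UMP_WORD_SIZE) ? -EINVAL : 0;
}

/* Model of a transfer: only whole words are moved, so the returned
 * length is the request rounded down to a word boundary (possibly 0). */
size_t ex_ump_aligned_len(size_t nbytes)
{
    size_t len = nbytes - (nbytes % EX_UMP_WORD_SIZE);

    assert(len % EX_UMP_WORD_SIZE == 0);
    return len;
}
```

A caller that asks for 10 bytes would, under this model, get 8 back and keep the remaining partial word for the next call.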
2023-05-22scsi: block: Introduce ioprio hintsDamien Le Moal-0/+49
I/O priorities currently only use 6 bits of the 16-bit ioprio value: the 3 upper bits are used to define up to 8 priority classes (4 of which are valid) and the 3 lower bits of the value are used to define a priority level for the real-time and best-effort class. The remaining 10 bits between the I/O priority class and level are unused, and in fact, cannot be used by the user as doing so would either result in the value being completely ignored, or in an error returned by ioprio_check_cap(). Use these 10 bits of an ioprio value to allow a user to specify I/O hints. An I/O hint is defined as a 10-bit value, allowing up to 1023 different hints to be specified, with the value 0 being reserved as the "no hint" case. An I/O hint can apply to any I/O that specifies a valid priority class other than NONE, regardless of the I/O priority level specified. To do so, the macros IOPRIO_PRIO_HINT() and IOPRIO_PRIO_VALUE_HINT() are introduced in include/uapi/linux/ioprio.h to respectively allow a user to get and set a hint in an ioprio value. To support the ATA and SCSI command duration limits feature, 7 hints are defined: IOPRIO_HINT_DEV_DURATION_LIMIT_1 to IOPRIO_HINT_DEV_DURATION_LIMIT_7, allowing a user to specify which command duration limit descriptor should be applied to the commands serving an I/O. Specifying these hints has for now no effect whatsoever if the target block devices do not support the command duration limits feature. However, in the future, block I/O schedulers can be modified to optimize I/O issuing order based on these hints, even for devices that do not support the command duration limits feature. Given that the 7 duration limits hints defined have no effect on any block layer component, the actual definition of the duration limits implied by these hints remains at the device level. 
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-3-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
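The bit layout described above (3 class bits on top, 10 hint bits in the middle, 3 level bits at the bottom of a 16-bit value) can be sketched as below. The EX_* macros and ex_* functions are local stand-ins written for this illustration; they follow the description in the message rather than quoting include/uapi/linux/ioprio.h, and the class number used in the usage note is an assumption, since the message does not list class values.

```c
#include <assert.h>
#include <stdint.h>

/* Layout per the commit text: [15:13] class, [12:3] hint, [2:0] level. */
#define EX_CLASS_SHIFT 13
#define EX_LEVEL_BITS  3
#define EX_LEVEL_MASK  ((1u << EX_LEVEL_BITS) - 1)
#define EX_HINT_SHIFT  EX_LEVEL_BITS
#define EX_HINT_BITS   10
#define EX_HINT_MASK   ((1u << EX_HINT_BITS) - 1)

/* Pack class, level and hint into one 16-bit ioprio-style value. */
uint16_t ex_ioprio_pack(unsigned cls, unsigned level, unsigned hint)
{
    assert(cls <= 7 && level <= EX_LEVEL_MASK && hint <= EX_HINT_MASK);
    return (uint16_t)((cls << EX_CLASS_SHIFT) |
                      (hint << EX_HINT_SHIFT) | level);
}

unsigned ex_ioprio_class(uint16_t v) { return v >> EX_CLASS_SHIFT; }
unsigned ex_ioprio_hint(uint16_t v)  { return (v >> EX_HINT_SHIFT) & EX_HINT_MASK; }
unsigned ex_ioprio_level(uint16_t v) { return v & EX_LEVEL_MASK; }
```

For instance, ex_ioprio_pack(2, 4, 7) stores class 2, level 4 and hint 7 in a single value, and each accessor recovers its own field without disturbing the others. A hint of 0 corresponds to the reserved "no hint" case.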
2023-05-22scsi: block: ioprio: Clean up interface definitionDamien Le Moal-5/+14
The I/O priority user interface defines the 16-bits ioprio values as the combination of the upper 3-bits for an I/O priority class and the lower 13-bits as priority data. However, the kernel only uses the lower 3-bits of the priority data to define priority levels for the RT and BE priority classes. The data part of an ioprio value is completely ignored for the IDLE and NONE classes. This is enforced by checks done in ioprio_check_cap(), which is called for all paths that allow defining an I/O priority for I/Os: the per-context ioprio_set() system call, aio interface and io_uring interface. Clarify this fact in the uapi ioprio.h header file and introduce the IOPRIO_PRIO_LEVEL_MASK and IOPRIO_PRIO_LEVEL() macros for users to define and get priority levels in an ioprio value. The coarser macro IOPRIO_PRIO_DATA() is retained for backward compatibility with old applications already using it. There is no functional change introduced with this. In-kernel users of the IOPRIO_PRIO_DATA() macro which are explicitly handling I/O priority data as a priority level are modified to use the new IOPRIO_PRIO_LEVEL() macro without any functional change. Since f2fs is the only user of this macro not explicitly using that value as a priority level, it is left unchanged. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230511011356.227789-2-nks@flawful.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-05-22drm/i915/pmu: Prepare for multi-tile non-engine countersTvrtko Ursulin-1/+16
Reserve some bits in the counter config namespace which will carry the tile id and prepare the code to handle this. No per tile counters have been added yet. v2: - Fix checkpatch issues - Use 4 bits for gt id in non-engine counters. Drop FIXME. - Set MAX GTs to 4. Drop FIXME. v3: (Ashutosh, Tvrtko) - Drop BUG_ON that would never fire - Make enable u64 - Pull in some code from next patch v4: Set I915_PMU_MAX_GTS to 2 (Tvrtko) v5: s/u64/u32 where needed (Ashutosh) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230519154946.3751971-7-umesh.nerlige.ramappa@intel.com
2023-05-22ASoC: Intel: Skylake: Fix declaration of enum skl_ch_cfgCezary Rojewski-1/+2
Constant 'C4_CHANNEL' does not exist on the firmware side. Value 0xC is reserved for 'C7_1' instead. Fixes: 04afbbbb1cba ("ASoC: Intel: Skylake: Update the topology interface structure") Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Link: https://lore.kernel.org/r/20230519201711.4073845-4-amadeuszx.slawinski@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-05-19ublk: support user copyMing Lei-0/+3
Currently the copy between the io request buffer (pages) and the userspace buffer is done inside ublk_map_io() or ublk_unmap_io(). This performs very well with a pre-allocated userspace io buffer. For a dynamically allocated or external userspace backend io buffer, UBLK_F_NEED_GET_DATA was added so that the ublk server can provide the buffer via one extra command round trip for WRITE requests. For READ, userspace simply provides the buffer, but can't know when the buffer is done[1]. Add UBLK_F_USER_COPY, which moves the io data copy out of the kernel by providing read()/write() on /dev/ublkcN, and simply lets the ublk server do the io data copy. This makes both sides cleaner; the cost is one extra syscall for copying io data between the request and the backend buffer. With UBLK_F_USER_COPY, it actually becomes possible to run per-io zero copy now, for example doing zero copy only for large IO, so it can be thought of as a prep patch for supporting zero copy. Meanwhile zero copy still needs to expose the read()/write() buffer for some corner cases, such as passthrough IO. [1] READ buffer in UBLK_F_NEED_GET_DATA https://lore.kernel.org/linux-block/116d8a56-0881-56d3-9bcc-78ff3e1dc4e5@linux.alibaba.com/T/#m23bd4b8634c0a054e6797063167b469949a247bb ublksrv loop usercopy code: https://github.com/ming1/ubdsrv/commits/usercopy Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-19ublk: add read()/write() support for ublk char deviceMing Lei-1/+21
Support pread()/pwrite() on ublk char device for reading/writing request io buffer, so data copy between io request buffer and userspace buffer can be moved to ublk server from ublk driver. Then UBLK_F_NEED_GET_DATA becomes not necessary, so ublk server can allocate buffer without one extra round uring command communication for userspace to provide buffer. IO buffer can be located by iocb->ki_pos which encodes buffer offset, io tag and queue id info, and type of iocb->ki_pos is u64, so it is big enough for holding reasonable queue depth, nr_queues and max io buffer size. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230519065030.351216-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
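To illustrate how a single 64-bit file position can carry a buffer offset, an io tag and a queue id, here is a minimal sketch. The field widths (25, 16 and 12 bits) and the ex_* names are assumptions chosen for the example; the layout the driver actually uses is defined in the ublk uapi header and may differ, for instance by adding a base offset.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths, for illustration only. */
#define EX_BUF_BITS  25
#define EX_TAG_BITS  16
#define EX_QID_BITS  12
#define EX_TAG_SHIFT EX_BUF_BITS
#define EX_QID_SHIFT (EX_BUF_BITS + EX_TAG_BITS)

/* Encode (queue id, io tag, offset inside the io buffer) into one
 * position value of the kind passed to pread()/pwrite(). */
uint64_t ex_ublk_pos(unsigned qid, unsigned tag, uint32_t buf_off)
{
    assert(qid < (1u << EX_QID_BITS));
    assert(tag < (1u << EX_TAG_BITS));
    assert(buf_off < (1u << EX_BUF_BITS));
    return ((uint64_t)qid << EX_QID_SHIFT) |
           ((uint64_t)tag << EX_TAG_SHIFT) | buf_off;
}

/* Decoders, as the receiving side would apply them to the position. */
unsigned ex_ublk_qid(uint64_t pos)
{
    return (unsigned)(pos >> EX_QID_SHIFT) & ((1u << EX_QID_BITS) - 1);
}

unsigned ex_ublk_tag(uint64_t pos)
{
    return (unsigned)(pos >> EX_TAG_SHIFT) & ((1u << EX_TAG_BITS) - 1);
}

uint32_t ex_ublk_buf_off(uint64_t pos)
{
    return (uint32_t)(pos & ((1u << EX_BUF_BITS) - 1));
}
```

Under these assumed widths the three fields use 53 bits, which shows why a u64 position leaves room for a reasonable queue depth, queue count and maximum io buffer size.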
2023-05-19fs: allow to mount beneath top mountChristian Brauner-1/+2
Various distributions are adding or are in the process of adding support for system extensions and in the future configuration extensions through various tools. A more detailed explanation on system and configuration extensions can be found on the manpage which is listed below at [1]. System extension images may – dynamically at runtime — extend the /usr/ and /opt/ directory hierarchies with additional files. This is particularly useful on immutable system images where a /usr/ and/or /opt/ hierarchy residing on a read-only file system shall be extended temporarily at runtime without making any persistent modifications. When one or more system extension images are activated, their /usr/ and /opt/ hierarchies are combined via overlayfs with the same hierarchies of the host OS, and the host /usr/ and /opt/ overmounted with it ("merging"). When they are deactivated, the mount point is disassembled — again revealing the unmodified original host version of the hierarchy ("unmerging"). Merging thus makes the extension's resources suddenly appear below the /usr/ and /opt/ hierarchies as if they were included in the base OS image itself. Unmerging makes them disappear again, leaving in place only the files that were shipped with the base OS image itself. System configuration images are similar but operate on directories containing system or service configuration. On nearly all modern distributions mount propagation plays a crucial role and the rootfs of the OS is a shared mount in a peer group (usually with peer group id 1): TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID / / ext4 shared:1 29 1 On such systems all services and containers run in a separate mount namespace and are pivot_root()ed into their rootfs. A separate mount namespace is almost always used as it is the minimal isolation mechanism services have. But usually they are even much more isolated up to the point where they almost become indistinguishable from containers. 
Mount propagation again plays a crucial role here. The rootfs of all these services is a slave mount to the peer group of the host rootfs. This is done so the service will receive mount propagation events from the host when certain files or directories are updated. In addition, the rootfs of each service, container, and sandbox is also a shared mount in its separate peer group: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID / / ext4 shared:24 master:1 71 47 For people not too familiar with mount propagation, the master:1 means that this is a slave mount to peer group 1. Which as one can see is the host rootfs as indicated by shared:1 above. The shared:24 indicates that the service rootfs is a shared mount in a separate peer group with peer group id 24. A service may run other services. Such nested services will also have a rootfs mount that is a slave to the peer group of the outer service rootfs mount. For containers things are just slightly different. A container's rootfs isn't a slave to the service's or host rootfs' peer group. The rootfs mount of a container is simply a shared mount in its own peer group: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID /home/ubuntu/debian-tree / ext4 shared:99 61 60 So whereas services are isolated OS components a container is treated like a separate world and mount propagation into it is restricted to a single well known mount that is a slave to the peer group of the shared mount /run on the host: TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID /propagate/debian-tree /run/host/incoming tmpfs master:5 71 68 Here, the master:5 indicates that this mount is a slave to the peer group with peer group id 5. This allows mounts to be propagated into the container and served as a workaround for not being able to insert mounts into mount namespaces directly. But the new mount api does support inserting mounts directly. 
For the interested reader the blogpost in [2] might be worth reading where I explain the old and the new approach to inserting mounts into mount namespaces. Containers of course, can themselves be run as services. They often run full systems themselves which means they again run services and containers with the exact same propagation settings explained above. The whole system is designed so that it can be easily updated, including all services in various fine-grained ways without having to enter every single service's mount namespace which would be prohibitively expensive. The mount propagation layout has been carefully chosen so it is possible to propagate updates for system extensions and configurations from the host into all services. The simplest model to update the whole system is to mount on top of /usr, /opt, or /etc on the host. The new mount on /usr, /opt, or /etc will then propagate into every service. This works cleanly the first time. However, when the system is updated multiple times it becomes necessary to unmount the first update on /opt, /usr, /etc and then propagate the new update. But this means, there's an interval where the old base system is accessible. This has to be avoided to protect against downgrade attacks. The vfs already exposes a mechanism to userspace whereby mounts can be mounted beneath an existing mount. Such mounts are internally referred to as "tucked". The patch series exposes the ability to mount beneath a top mount through the new MOVE_MOUNT_BENEATH flag for the move_mount() system call. This allows userspace to seamlessly upgrade mounts. After this series the only thing that will have changed is that mounting beneath an existing mount can be done explicitly instead of just implicitly. Today, there are two scenarios where a mount can be mounted beneath an existing mount instead of on top of it: (1) When a service or container is started in a new mount namespace and pivot_root()s into its new rootfs. 
The way this is done is by mounting the new rootfs beneath the old rootfs: fd_newroot = open("/var/lib/machines/fedora", ...); fd_oldroot = open("/", ...); fchdir(fd_newroot); pivot_root(".", "."); After the pivot_root(".", ".") call the new rootfs is mounted beneath the old rootfs which can then be unmounted to reveal the underlying mount: fchdir(fd_oldroot); umount2(".", MNT_DETACH); Since pivot_root() moves the caller into a new rootfs no mounts must be propagated out of the new rootfs as a consequence of the pivot_root() call. Thus, the mounts cannot be shared. (2) When a mount is propagated to a mount that already has another mount mounted on the same dentry. The easiest example for this is to create a new mount namespace. The following commands will create a mount namespace where the rootfs mount / will be a slave to the peer group of the host rootfs / mount. IOW, it will receive propagation from the host: mount --make-shared / unshare --mount --propagation=slave Now a new mount on the /mnt dentry in that mount namespace is created. (As it can be confusing it should be spelled out that the tmpfs mount on the /mnt dentry that was just created doesn't propagate back to the host because the rootfs mount / of the mount namespace isn't a peer of the host rootfs.): mount -t tmpfs tmpfs /mnt TARGET SOURCE FSTYPE PROPAGATION └─/mnt tmpfs tmpfs Now another terminal in the host mount namespace can observe that the mount indeed hasn't propagated back into the host mount namespace. 
A new mount can now be created on top of the /mnt dentry with the rootfs mount / as its parent: mount --bind /opt /mnt TARGET SOURCE FSTYPE PROPAGATION └─/mnt /dev/sda2[/opt] ext4 shared:1 The mount namespace that was created earlier can now observe that the bind mount created on the host has propagated into it: TARGET SOURCE FSTYPE PROPAGATION └─/mnt /dev/sda2[/opt] ext4 master:1 └─/mnt tmpfs tmpfs But instead of having been mounted on top of the tmpfs mount at the /mnt dentry the /opt mount has been mounted on top of the rootfs mount at the /mnt dentry. And the tmpfs mount has been remounted on top of the propagated /opt mount at the /opt dentry. So in other words, the propagated mount has been mounted beneath the preexisting mount in that mount namespace. Mount namespaces make this easy to illustrate but it's also easy to mount beneath an existing mount in the same mount namespace (The following example assumes a shared rootfs mount / with peer group id 1): mount --bind /opt /opt TARGET SOURCE FSTYPE MNT_ID PARENT_ID PROPAGATION └─/opt /dev/sda2[/opt] ext4 188 29 shared:1 If another mount is mounted on top of the /opt mount at the /opt dentry: mount --bind /tmp /opt The following clunky mount tree will result: TARGET SOURCE FSTYPE MNT_ID PARENT_ID PROPAGATION └─/opt /dev/sda2[/tmp] ext4 405 29 shared:1 └─/opt /dev/sda2[/opt] ext4 188 405 shared:1 └─/opt /dev/sda2[/tmp] ext4 404 188 shared:1 The /tmp mount is mounted beneath the /opt mount and another copy is mounted on top of the /opt mount. This happens because the rootfs / and the /opt mount are shared mounts in the same peer group. When the new /tmp mount is supposed to be mounted at the /opt dentry then the /tmp mount first propagates to the root mount at the /opt dentry. But there already is the /opt mount mounted at the /opt dentry. So the old /opt mount at the /opt dentry will be mounted on top of the new /tmp mount at the /tmp dentry, i.e. 
@opt->mnt_parent is @tmp and @opt->mnt_mountpoint is /tmp (Note that @opt->mnt_root is /opt which is what shows up as /opt under SOURCE). So again, a mount will be mounted beneath a preexisting mount. (Fwiw, a few iterations of mount --bind /opt /opt in a loop on a shared rootfs is a good example of what could be referred to as mount explosion.) The main point is that such mounts allow userspace to umount a top mount and reveal an underlying mount. So for example, umounting the tmpfs mount on /mnt that was created in example (1) using mount namespaces reveals the /opt mount which was mounted beneath it. In (2) where a mount was mounted beneath the top mount in the same mount namespace unmounting the top mount would unmount both the top mount and the mount beneath. In the process the original mount would be remounted on top of the rootfs mount / at the /opt dentry again. This, again, is a result of mount propagation, only this time it's umount propagation. However, this can be avoided by simply making the parent mount / of the @opt mount a private or slave mount. Then the top mount and the original mount can be unmounted to reveal the mount beneath. These two examples are fairly arcane and are merely added to make it clear how mount propagation has effects on current and future features. More common use-cases will just be things like: mount -t btrfs /dev/sdA /mnt mount -t xfs /dev/sdB --beneath /mnt umount /mnt after which we'll have updated from a btrfs filesystem to an xfs filesystem without ever revealing the underlying mountpoint. The crux is that the proposed mechanism already exists and that it is so powerful as to cover cases where mounts are supposed to be updated with new versions. Crucially, it offers an important flexibility. Namely that updates to a system may either be forced or can be delayed and the umount of the top mount be left to a service if it is a cooperative one. 
This adds a new flag to move_mount() that allows a mount to be explicitly moved beneath the top mount, adhering to the following semantics: * Mounts cannot be mounted beneath the rootfs. This restriction encompasses the rootfs but also chroots via chroot() and pivot_root(). To mount a mount beneath the rootfs or a chroot, pivot_root() can be used as illustrated above. * The source mount must be a private mount to force the kernel to allocate a new, unused peer group id. This isn't a required restriction but a voluntary one. It avoids repeating a semantical quirk that already exists today. If bind mounts which already have a peer group id are inserted into mount trees that have the same peer group id this can cause a lot of mount propagation events to be generated (For example, consider running mount --bind /opt /opt in a loop where the parent mount is a shared mount.). * Avoid getting rid of the top mount in the kernel. Cooperative services need to be able to unmount the top mount themselves. This also avoids a good deal of additional complexity. The umount would have to be propagated which would be another rather expensive operation. So namespace_lock() and lock_mount_hash() would potentially have to be held for a long time for both a mount and umount propagation. That should be avoided. * The path to mount beneath must be mounted and attached. * The top mount and its parent must be in the caller's mount namespace and the caller must be able to mount in that mount namespace. * The caller must be able to unmount the top mount to prove that they could reveal the underlying mount. * The propagation tree is calculated based on the destination mount's parent mount and the destination mount's mountpoint on the parent mount. Of course, if the parent of the destination mount and the destination mount are shared mounts in the same peer group and the mountpoint of the new mount to be mounted is a subdir of their ->mnt_root then both will receive a mount of /opt. 
That's probably easier to understand with an example. Assuming a standard shared rootfs /: mount --bind /opt /opt mount --bind /tmp /opt will cause the same mount tree as: mount --bind /opt /opt mount --beneath /tmp /opt because both / and /opt are shared mounts/peers in the same peer group and the /opt dentry is a subdirectory of both the parent's and the child's ->mnt_root. If a mount tree like that is created it almost always is an accident or abuse of mount propagation. Realistically what most people probably mean in this scenario is: mount --bind /opt /opt mount --make-private /opt mount --make-shared /opt This forces the allocation of a new separate peer group for the /opt mount. Afterwards a mount --bind or mount --beneath actually makes sense as the / and /opt mount belong to different peer groups. Before that it's likely just confusion about what the user wanted to achieve. * Refuse MOVE_MOUNT_BENEATH if: (1) the @mnt_from has been overmounted in between path resolution and acquiring @namespace_sem when locking @mnt_to. This avoids the proliferation of shadow mounts. (2) if @to_mnt is moved to a different mountpoint while acquiring @namespace_sem to lock @to_mnt. (3) if @to_mnt is unmounted while acquiring @namespace_sem to lock @to_mnt. (4) if the parent of the target mount propagates to the target mount at the same mountpoint. This would mean mounting @mnt_from on @mnt_to->mnt_parent and then propagating a copy @c of @mnt_from onto @mnt_to. This defeats the whole purpose of mounting @mnt_from beneath @mnt_to. (5) if the parent mount @mnt_to->mnt_parent propagates to @mnt_from at the same mountpoint. If @mnt_to->mnt_parent propagates to @mnt_from this would mean propagating a copy @c of @mnt_from on top of @mnt_from. Afterwards @mnt_from would be mounted on top of @mnt_to->mnt_parent and @mnt_to would be unmounted from @mnt->mnt_parent and remounted on @mnt_from. 
But since @c is already mounted on @mnt_from, @mnt_to would ultimately be remounted on top of @c. Afterwards, @mnt_from would be covered by a copy @c of @mnt_from and @c would be covered by @mnt_from itself. This defeats the whole purpose of mounting @mnt_from beneath @mnt_to. Cases (1) to (3) are required as they deal with races that would cause bugs or unexpected behavior for users. Cases (4) and (5) refuse semantical quirks that would not be a bug but would cause weird mount trees to be created. While they can already be created via other means (mount --bind /opt /opt x n) there's no reason to repeat past mistakes in new features. Link: https://man7.org/linux/man-pages/man8/systemd-sysext.8.html [1] Link: https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html [2] Link: https://github.com/flatcar/sysext-bakery Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_1 Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_2 Link: https://github.com/systemd/systemd/pull/26013 Reviewed-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Message-Id: <20230202-fs-move-mount-replace-v4-4-98f3d80d7eaa@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
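A userspace call using the new flag might look roughly like the sketch below. It assumes the flag values and syscall numbers given in the fallback defines (used only when the installed headers lack them), and ex_mount_beneath() is a hypothetical helper, not an existing API. The semantics listed above still apply: the kernel refuses the move if, for example, the source mount is not private or the caller could not unmount the top mount.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers; values assumed from the generic
 * syscall table and the mount uapi header. */
#ifndef SYS_open_tree
#define SYS_open_tree 428
#endif
#ifndef SYS_move_mount
#define SYS_move_mount 429
#endif
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif
#ifndef MOVE_MOUNT_BENEATH
#define MOVE_MOUNT_BENEATH 0x00000200
#endif

/* Hypothetical helper: take a detached copy of the mount at 'source'
 * and try to attach it beneath the top mount at 'target'.
 * Returns 0 on success, -1 with errno set on failure. */
int ex_mount_beneath(const char *source, const char *target)
{
    assert(source && target);

    int fd = (int)syscall(SYS_open_tree, AT_FDCWD, source,
                          OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
    if (fd < 0)
        return -1;

    long ret = syscall(SYS_move_mount, fd, "", AT_FDCWD, target,
                       MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_BENEATH);
    int saved = errno;      /* keep the move_mount() error across close() */
    close(fd);
    errno = saved;
    return ret < 0 ? -1 : 0;
}
```

After a successful call the caller, or a cooperative service, would unmount the top mount at the target to reveal the newly attached mount, as in the btrfs to xfs example in the message.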
2023-05-18netfilter: nft_exthdr: add boolean DCCP option matchingJeremy Sowden-0/+2
The xt_dccp iptables module supports the matching of DCCP packets based on the presence or absence of DCCP options. Extend nft_exthdr to add this functionality to nftables. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930 Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-05-17Merge drm/drm-next into drm-intel-nextRodrigo Vivi-387/+1829
Backmerge to get some hwmon dependencies. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-05-16KVM: arm64: Add KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZERicardo Koller-0/+2
Add a capability for userspace to specify the eager split chunk size. The chunk size specifies how many pages to break at a time, using a single allocation. The bigger the chunk size, the more pages need to be allocated ahead of time. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20230426172330.1439644-6-ricarkol@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-05-16io_uring: Add io_uring_setup flag to pre-register ring fd and never install itJosh Triplett-0/+7
With IORING_REGISTER_USE_REGISTERED_RING, an application can register the ring fd and use it via registered index rather than installed fd. This allows using a registered ring for everything *except* the initial mmap. With IORING_SETUP_NO_MMAP, io_uring_setup uses buffers allocated by the user, rather than requiring a subsequent mmap. The combination of the two allows a user to operate *entirely* via a registered ring fd, making it unnecessary to ever install the fd in the first place. So, add a flag IORING_SETUP_REGISTERED_FD_ONLY to make io_uring_setup register the fd and return a registered index, without installing the fd. This allows an application to avoid touching the fd table at all, and allows a library to never even momentarily install a file descriptor. This splits out an io_ring_add_registered_file helper from io_ring_add_registered_fd, for use by io_uring_setup. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Link: https://lore.kernel.org/r/bc8f431bada371c183b95a83399628b605e978a3.1682699803.git.josh@joshtriplett.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-16 io_uring: support for user allocated memory for rings/sqes Jens Axboe -2/+7
Currently io_uring applications must call mmap(2) twice to map the rings themselves, and the sqes array. This works fine, but it does not support using huge pages to back the rings/sqes. Provide a way for the application to pass in pre-allocated memory for the rings/sqes, which can then suitably be allocated from shmfs or via mmap to get huge page support. Particularly for larger rings, this reduces the TLBs needed. If an application wishes to take advantage of that, it must pre-allocate the memory needed for the sq/cq ring, and the sqes. The former must be passed in via the io_uring_params->cq_off.user_data field, while the latter is passed in via the io_uring_params->sq_off.user_data field. Then it must set IORING_SETUP_NO_MMAP in the io_uring_params->flags field, and io_uring will then map the existing memory into the kernel for shared use. The application must not call mmap(2) to map rings as it otherwise would have, that will now fail with -EINVAL if this setup flag was used. The pages used for the rings and sqes must be contiguous. The intent here is clearly that huge pages should be used, otherwise the normal setup procedure works fine as-is. The application may use one huge page for both the rings and sqes. Outside of those initialization changes, everything works like it did before. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-15 ALSA: emu10k1: enable bit-exact playback, part 1: DSP attenuation Oswald Buddenhagen -3/+5
Fractional multiplication with the maximal value 2^31-1 causes some tiny distortion. Instead, we want to multiply with the full 2^31. The catch is of course that this cannot be represented in the DSP's signed 32 bit registers. One way to deal with this is to encode 1.0 as a negative number and special-case it. As a matter of fact, the SbLive! code path already contained such code, though the controls never actually exercised it. A more efficient approach is to use negative values, which actually extend to -2^31. Accordingly, for all the volume adjustments we now use the MAC1 instruction which negates the X operand. The range of the controls in highres mode is extended downwards, so -1 is the new zero/mute. At maximal excursion, real zero is not mute any more, but I don't think anyone will notice this behavior change. ;-) That also required making the min/max/values in the control structs signed. This technically changes the user space interface, but it seems implausible that someone would notice - the numbers were actually treated as if they were signed anyway (and in the actual mixer iface they _are_). And without this change, the min value didn't even make sense in the first place (and no-one noticed, because it was always 0). Tested-by: Jonathan Dowland <jon@dow.land> Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Link: https://lore.kernel.org/r/20230514170323.3408834-7-oswald.buddenhagen@gmx.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
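The arithmetic behind this can be checked with a small model. The sketch below is an assumption-laden illustration, not the emu10k1 DSP's actual instruction semantics: it models a fractional multiply as a 64-bit product divided by 2^31 with truncation toward zero, and the function names q31_mul and q31_mul_negx are made up for the example. It shows that the largest positive coefficient, 2^31-1, loses one LSB, while a coefficient of -2^31 combined with a negated X operand reproduces the input exactly.

```c
#include <assert.h>
#include <stdint.h>

/* 1.0 in this fixed-point model; it does not fit in a signed 32-bit value. */
#define Q31_ONE (INT64_C(1) << 31)

/* Narrow a 64-bit intermediate back to 32 bits, checking that it fits. */
static int32_t q31_narrow(int64_t v)
{
    assert(v >= INT32_MIN && v <= INT32_MAX);
    return (int32_t)v;
}

/* Plain fractional multiply: x * coeff / 2^31, truncated toward zero. */
static int32_t q31_mul(int32_t x, int32_t coeff)
{
    return q31_narrow(((int64_t)x * coeff) / Q31_ONE);
}

/*
 * Fractional multiply with the X operand negated first (the role the
 * commit message assigns to MAC1). With coeff = INT32_MIN (-2^31) the
 * product is exactly x * 2^31, so the division has no remainder.
 */
static int32_t q31_mul_negx(int32_t x, int32_t coeff)
{
    return q31_narrow((-(int64_t)x * coeff) / Q31_ONE);
}
```

In this model, q31_mul(1000, INT32_MAX) yields 999 rather than 1000, while q31_mul_negx(1000, INT32_MIN) yields 1000. Real hardware may round or saturate differently, so the exact size of the error is specific to the model; the point is only that unity gain needs the full 2^31 magnitude.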
2023-05-15 ASoC: SOF: Separate the tokens for input and output pin index Ranjani Sridharan -1/+2
Using the same token ID for both input and output format pin index results in collisions and incorrect pin index getting parsed from topology. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Paul Olaru <paul.olaru@oss.nxp.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://lore.kernel.org/r/20230515104403.32207-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-05-15 drm/fourcc: define Intel Meteorlake related ccs modifiers Juha-Pekka Heikkila -0/+43
Add Tile4 type ccs modifiers with aux buffer needed for MTL. Bspec: 49251, 49252, 49253 Cc: dri-devel@lists.freedesktop.org Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230514184240.6184-1-juhapekka.heikkila@gmail.com