| Age | Commit message (Collapse) | Author | Lines |
|
* Add enforce_fs() for defining and enforcing a ruleset in one step
* In some places, dropped "ASSERT_LE(0, fd)" checks after
create_ruleset() call -- create_ruleset() already checks that.
* In some places, rename "file_fd" to "fd" if it is not needed to
disambiguate any more.
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-12-gnoack3000@gmail.com
[mic: Tweak subjet]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Even when a process is restricted with the new
LANDLOCK_ACCESS_FS_RESOLVE_UNIX right, the kernel can continue writing
its coredump to the configured coredump socket.
In the test, we create a local server and rewire the system to write
coredumps into it. We then create a child process within a Landlock
domain where LANDLOCK_ACCESS_FS_RESOLVE_UNIX is restricted and make
the process crash. The test uses SO_PEERCRED to check that the
connecting client process is the expected one.
Includes a fix by Mickaël Salaün for setting the EUID to 0 (see [1]).
Link[1]: https://lore.kernel.org/all/20260218.ohth8theu8Yi@digikod.net/
Suggested-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-11-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add an audit test to check that Landlock denials from
LANDLOCK_ACCESS_FS_RESOLVE_UNIX result in audit logs in the expected
format. (There is one audit test for each filesystem access right, so
we should add one for LANDLOCK_ACCESS_FS_RESOLVE_UNIX as well.)
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-10-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
* Extract common helpers from an existing IOCTL test that
also uses pathname unix(7) sockets.
* These tests use the common scoped domains fixture which is also used
in other Landlock scoping tests and which was used in Tingmao Wang's
earlier patch set in [1].
These tests exercise the cross product of the following scenarios:
* Stream connect(), Datagram connect(), Datagram sendmsg() and
Seqpacket connect().
* Child-to-parent and parent-to-child communication
* The Landlock policy configuration as listed in the scoped_domains
fixture.
* In the default variant, Landlock domains are only placed where
prescribed in the fixture.
* In the "ALL_DOMAINS" variant, Landlock domains are also placed in
the places where the fixture says to omit them, but with a
LANDLOCK_RULE_PATH_BENEATH that allows connection.
Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Tingmao Wang <m@maowtm.org>
Cc: Mickaël Salaün <mic@digikod.net>
Link[1]: https://lore.kernel.org/all/53b9883648225d5a08e82d2636ab0b4fda003bc9.1767115163.git.m@maowtm.org/
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-9-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
The access_fs_16 variable was originally intended to stay frozen at 16
access rights so that audit tests would not need updating when new
access rights are added. Now that we have 17 access rights, the name
is confusing.
Replace all uses of access_fs_16 with ACCESS_ALL and delete the
variable.
Suggested-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260327164838.38231-8-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
* Add a new access right LANDLOCK_ACCESS_FS_RESOLVE_UNIX, which
controls the lookup operations for named UNIX domain sockets. The
resolution happens during connect() and sendmsg() (depending on
socket type).
* Change access_mask_t from u16 to u32 (see below)
* Hook into the path lookup in unix_find_bsd() in af_unix.c, using a
LSM hook. Make policy decisions based on the new access rights
* Increment the Landlock ABI version.
* Minor test adaptations to keep the tests working.
* Document the design rationale for scoped access rights,
and cross-reference it from the header documentation.
With this access right, access is granted if either of the following
conditions is met:
* The target socket's filesystem path was allow-listed using a
LANDLOCK_RULE_PATH_BENEATH rule, *or*:
* The target socket was created in the same Landlock domain in which
LANDLOCK_ACCESS_FS_RESOLVE_UNIX was restricted.
In case of a denial, connect() and sendmsg() return EACCES, which is
the same error as it is returned if the user does not have the write
bit in the traditional UNIX file system permissions of that file.
The access_mask_t type grows from u16 to u32 to make space for the new
access right. This also doubles the size of struct layer_access_masks
from 32 byte to 64 byte. To avoid memory layout inconsistencies between
architectures (especially m68k), pack and align struct access_masks [2].
Document the (possible future) interaction between scoped flags and
other access rights in struct landlock_ruleset_attr, and summarize the
rationale, as discussed in code review leading up to [3].
This feature was created with substantial discussion and input from
Justin Suess, Tingmao Wang and Mickaël Salaün.
Cc: Tingmao Wang <m@maowtm.org>
Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Suggested-by: Jann Horn <jannh@google.com>
Link[1]: https://github.com/landlock-lsm/linux/issues/36
Link[2]: https://lore.kernel.org/all/20260401.Re1Eesu1Yaij@digikod.net/
Link[3]: https://lore.kernel.org/all/20260205.8531e4005118@gnoack.org/
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20260327164838.38231-5-gnoack3000@gmail.com
[mic: Fix kernel-doc formatting, pack and align access_masks]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
On architectures where __u64 is unsigned long (e.g. powerpc64), using
%llx to format a __u64 triggers a -Wformat warning because %llx expects
unsigned long long. Cast the argument to unsigned long long.
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: a549d055a22e ("selftests/landlock: Add network tests")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202604020206.62zgOTeP-lkp@intel.com/
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260402192608.1458252-6-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Domain deallocation records are emitted asynchronously from kworker
threads (via free_ruleset_work()). Stale deallocation records from a
previous test can arrive during the current test's deallocation read
loop and be picked up by audit_match_record() instead of the expected
record, causing a domain ID mismatch. The audit.layers test (which
creates 16 nested domains) is particularly vulnerable because it reads
16 deallocation records in sequence, providing a large window for stale
records to interleave.
The same issue affects audit_flags.signal, where deallocation records
from a previous test (audit.layers) can leak into the next test and be
picked up by audit_match_record() instead of the expected record.
Fix this by continuing to read records when the type matches but the
content pattern does not. Stale records are silently consumed, and the
loop only stops when both type and pattern match (or the socket times
out with -EAGAIN).
Additionally, extend matches_log_domain_deallocated() with an
expected_domain_id parameter. When set, the regex pattern includes the
specific domain ID as a literal hex value, so that deallocation records
for a different domain do not match the pattern at all. This handles
the case where the stale record has the same denial count as the
expected one (e.g. both have denials=1), which the type+pattern loop
alone cannot distinguish. Callers that already know the expected domain
ID (from a prior denial or allocation record) now pass it to filter
precisely.
When expected_domain_id is set, matches_log_domain_deallocated() also
temporarily increases the socket timeout to audit_tv_dom_drop (1 second)
to wait for the asynchronous kworker deallocation, and restores
audit_tv_default afterward. This removes the need for callers to manage
the timeout switch manually.
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Link: https://lore.kernel.org/r/20260402192608.1458252-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Non-audit Landlock tests generate audit records as side effects when
audit_enabled is non-zero (e.g. from boot configuration). These records
accumulate in the kernel audit backlog while no audit daemon socket is
open. When the next test opens a new netlink socket and registers as
the audit daemon, the stale backlog is delivered, causing baseline
record count checks to fail spuriously.
Fix this by draining all pending records in audit_init() right after
setting the receive timeout. The 1-usec SO_RCVTIMEO causes audit_recv()
to return -EAGAIN once the backlog is empty, naturally terminating the
drain loop.
Domain deallocation records are emitted asynchronously from a work
queue, so they may still arrive after the drain. Remove records.domain
== 0 checks that are not preceded by audit_match_record() calls, which
would otherwise consume stale records before the count. Document this
constraint above audit_count_records().
Increasing the drain timeout to catch in-flight deallocation records was
considered but rejected: a longer timeout adds latency to every
audit_init() call even when no stale record is pending, and any fixed
timeout is still not guaranteed to catch all records under load.
Removing the unprotected checks is simpler and avoids the spurious
failures.
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260402192608.1458252-4-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
audit_init() opens a netlink socket and configures it, but leaks the
file descriptor if audit_set_status() or setsockopt() fails. Fix this
by jumping to an error path that closes the socket before returning.
Apply the same fix to audit_init_with_exe_filter(), which leaks the file
descriptor from audit_init() if audit_init_filter_exe() or
audit_filter_exe() fails, and to audit_cleanup(), which leaks it if
audit_init_filter_exe() fails in FIXTURE_TEARDOWN_PARENT().
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260402192608.1458252-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
snprintf() returns the number of characters that would have been
written, excluding the terminating NUL byte. When the output is
truncated, this return value equals or exceeds the buffer size. Fix
matches_log_domain_allocated() and matches_log_domain_deallocated() to
detect truncation with ">=" instead of ">".
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260402192608.1458252-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
LANDLOCK_RESTRICT_SELF_TSYNC does not allow
LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF with ruleset_fd=-1, preventing
a multithreaded process from atomically propagating subdomain log muting
to all threads without creating a domain layer. Relax the fd=-1
condition to accept TSYNC alongside LOG_SUBDOMAINS_OFF, and update the
documentation accordingly.
Add flag validation tests for all TSYNC combinations with ruleset_fd=-1,
and audit tests verifying both transition directions: muting via TSYNC
(logged to not logged) and override via TSYNC (not logged to logged).
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Fixes: 42fc7e6543f6 ("landlock: Multithreading support for landlock_restrict_self()")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260407164107.2012589-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
hook_cred_transfer() only copies the Landlock security blob when the
source credential has a domain. This is inconsistent with
landlock_restrict_self() which can set LOG_SUBDOMAINS_OFF on a
credential without creating a domain (via the ruleset_fd=-1 path): the
field is committed but not preserved across fork() because the child's
prepare_creds() calls hook_cred_transfer() which skips the copy when
domain is NULL.
This breaks the documented use case where a process mutes subdomain logs
before forking sandboxed children: the children lose the muting and
their domains produce unexpected audit records.
Fix this by unconditionally copying the Landlock credential blob.
Cc: Günther Noack <gnoack@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: stable@vger.kernel.org
Fixes: ead9079f7569 ("landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF")
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260407164107.2012589-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add tsync_interrupt test to exercise the signal interruption path in
landlock_restrict_sibling_threads(). When a signal interrupts
wait_for_completion_interruptible() while the calling thread waits for
sibling threads to finish credential preparation, the kernel:
1. Sets ERESTARTNOINTR to request a transparent syscall restart.
2. Calls cancel_tsync_works() to opportunistically dequeue task works
that have not started running yet.
3. Breaks out of the preparation loop, then unblocks remaining
task works via complete_all() and waits for them to finish.
4. Returns the error, causing abort_creds() in the syscall handler.
Specifically, cancel_tsync_works() in its entirety, the ERESTARTNOINTR
error branch in landlock_restrict_sibling_threads(), and the
abort_creds() error branch in the landlock_restrict_self() syscall
handler are timing-dependent and not exercised by the existing tsync
tests, making code coverage measurements non-deterministic.
The test spawns a signaler thread that rapidly sends SIGUSR1 to the
calling thread while it performs landlock_restrict_self() with
LANDLOCK_RESTRICT_SELF_TSYNC. Since ERESTARTNOINTR causes a
transparent restart, userspace always sees the syscall succeed.
This is a best-effort coverage test: the interruption path is exercised
when the signal lands during the preparation wait, which depends on
thread scheduling. The test creates enough idle sibling threads (200)
to ensure multiple serialized waves of credential preparation even on
machines with many cores (e.g., 64), widening the window for the
signaler. Deterministic coverage would require wrapping the wait call
with ALLOW_ERROR_INJECTION() and using CONFIG_FAIL_FUNCTION.
Test coverage for security/landlock was 90.2% of 2105 lines according to
LLVM 21, and it is now 91.1% of 2105 lines with this new test.
Cc: Günther Noack <gnoack@google.com>
Cc: Justin Suess <utilityemal77@gmail.com>
Cc: Tingmao Wang <m@maowtm.org>
Cc: Yihan Ding <dingyihan@uniontech.com>
Link: https://lore.kernel.org/r/20260310190416.1913908-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
fs_bench benchmarks the performance of Landlock's path walk
by exercising it in a scenario that amplifies Landlock's overhead:
* Create a large number of nested directories
* Enforce a Landlock policy in which a rule is associated with each of
these subdirectories
* Benchmark openat() applied to the deepest directory,
forcing Landlock to walk the entire path.
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260206151154.97915-3-gnoack3000@gmail.com
[mic: Fix missing mode with O_CREAT, improve text consistency, sort
includes]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Exercise various scenarios where Landlock domains are enforced across
all of a processes' threads.
Test coverage for security/landlock is 91.6% of 2130 lines according to
LLVM 21.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-3-gnoack@google.com
[mic: Fix subject, use EXPECT_EQ(close()), make helpers static, add test
coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Introduce the LANDLOCK_RESTRICT_SELF_TSYNC flag. With this flag, a
given Landlock ruleset is applied to all threads of the calling
process, instead of only the current one.
Without this flag, multithreaded userspace programs currently resort
to using the nptl(7)/libpsx hack for multithreaded policy enforcement,
which is also used by libcap and for setuid(2). Using this
userspace-based scheme, the threads of a process enforce the same
Landlock policy, but the resulting Landlock domains are still
separate. The domains being separate causes multiple problems:
* When using Landlock's "scoped" access rights, the domain identity is
used to determine whether an operation is permitted. As a result,
when using LANLDOCK_SCOPE_SIGNAL, signaling between sibling threads
stops working. This is a problem for programming languages and
frameworks which are inherently multithreaded (e.g. Go).
* In audit logging, the domains of separate threads in a process will
get logged with different domain IDs, even when they are based on
the same ruleset FD, which might confuse users.
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251127115136.3064948-2-gnoack@google.com
[mic: Fix restrict_self_flags test, clean up Makefile, allign comments,
reduce local variable scope, add missing includes]
Closes: https://github.com/landlock-lsm/linux/issues/2
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add a missing close(srv_fd) call, and use EXPECT_EQ() to check the
result.
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Fixes: f83d51a5bdfe ("selftests/landlock: Check IOCTL restrictions for named UNIX domain sockets")
Link: https://lore.kernel.org/r/20260101134102.25938-2-gnoack3000@gmail.com
[mic: Use EXPECT_EQ() and update commit message]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
ptrace_test.c currently contains a duplicated version of the
scoped_domains fixture variants. This patch removes that and make it use
the shared scoped_base_variants.h instead, like in
scoped_abstract_unix_test and scoped_signal_test.
This required renaming the hierarchy fixture to scoped_domains, but the
test is otherwise the same.
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/48148f0134f95f819a25277486a875a6fd88ecf9.1766885035.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add missing semicolon after EXPECT_EQ(0, close(stream_server_child)) in
the scoped_vs_unscoped test. I suspect currently it's just not executing
the close statement after the line, but this causes no observable
difference.
Fixes: fefcf0f7cf47 ("selftests/landlock: Test abstract UNIX socket scoping")
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/d9e968e4cd4ecc9bf487593d7b7220bffbb3b5f5.1766885035.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/62d18e06eeb26f62bc49d24c4467b3793c5ba32b.1766885035.git.m@maowtm.org
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
The size of Unix pathname addresses is computed in selftests using
offsetof(struct sockaddr_un, sun_path) + strlen(xxx). It should have
been that +1, which makes addresses passed to the libc and kernel
non-NULL-terminated. unix_mkname_bsd() fixes that in Linux so there is
no harm, but just using sizeof(the address struct) should improve
readability.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251202215141.689986-1-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Remove bind() call on a client socket that doesn't make sense.
Since strlen(cli_un.sun_path) returns a random value depending on stack
garbage, that many uninitialized bytes are read from the stack as an
unix socket address. This creates random test failures due to the bind
address being invalid or already in use if the same stack value comes up
twice.
Fixes: f83d51a5bdfe ("selftests/landlock: Check IOCTL restrictions for named UNIX domain sockets")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20251201003631.190817-1-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
connect_variant(unspec_any0) is called twice. Both calls end
up in connect_variant_addrlen() with an address length of
get_addrlen(minimal=false).
However, the connect() syscall and its variants (e.g.
iouring/compat) accept much shorter addresses of 4 bytes
and that behaviour was not tested.
Replace one of these calls with one using a minimal address
length (just a bare sa_family=AF_UNSPEC field with no actual
address). Also add a call using a truncated address for good
measure.
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251027190726.626244-3-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
The nominal error code for bind(AF_UNSPEC) on an IPv6 socket
is -EAFNOSUPPORT, not -EINVAL. -EINVAL is only returned when
the supplied address struct is too short, which happens to be
the case in current selftests because they treat AF_UNSPEC
like IPv4 sockets do: as an alias for AF_INET (which is a
16-byte struct instead of the 24 bytes required by IPv6
sockets).
Make the union large enough for any address (by adding struct
sockaddr_storage to the union), and make AF_UNSPEC addresses
large enough for any family.
Test for -EAFNOSUPPORT instead, and add a dedicated test case
for truncated inputs with -EINVAL.
Fixes: a549d055a22e ("selftests/landlock: Add network tests")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251027190726.626244-2-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
fixes a build issue and does some cleanup in ib/sys_info.c
- "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
enhances the 64-bit math code on behalf of a PWM driver and beefs up
the test module for these library functions
- "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
makes BPF symbol names, sizes, and line numbers available to the GDB
debugger
- "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
adds a sysctl which can be used to cause additional info dumping when
the hung-task and lockup detectors fire
- "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
adds a general base64 encoder/decoder to lib/ and migrates several
users away from their private implementations
- "rbree: inline rb_first() and rb_last()" (Eric Dumazet)
makes TCP a little faster
- "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
reworks the KEXEC Handover interfaces in preparation for Live Update
Orchestrator (LUO), and possibly for other future clients
- "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
increases the flexibility of KEXEC Handover. Also preparation for LUO
- "Live Update Orchestrator" (Pasha Tatashin)
is a major new feature targeted at cloud environments. Quoting the
cover letter:
This series introduces the Live Update Orchestrator, a kernel
subsystem designed to facilitate live kernel updates using a
kexec-based reboot. This capability is critical for cloud
environments, allowing hypervisors to be updated with minimal
downtime for running virtual machines. LUO achieves this by
preserving the state of selected resources, such as memory,
devices and their dependencies, across the kernel transition.
As a key feature, this series includes support for preserving
memfd file descriptors, which allows critical in-memory data, such
as guest RAM or any other large memory region, to be maintained in
RAM across the kexec reboot.
Mike Rappaport merits a mention here, for his extensive review and
testing work.
- "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
moves the kexec and kdump sysfs entries from /sys/kernel/ to
/sys/kernel/kexec/ and adds back-compatibility symlinks which can
hopefully be removed one day
- "kho: fixes for vmalloc restoration" (Mike Rapoport)
fixes a BUG which was being hit during KHO restoration of vmalloc()
regions
* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
calibrate: update header inclusion
Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
vmcoreinfo: track and log recoverable hardware errors
kho: fix restoring of contiguous ranges of order-0 pages
kho: kho_restore_vmalloc: fix initialization of pages array
MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
init: replace simple_strtoul with kstrtoul to improve lpj_setup
KHO: fix boot failure due to kmemleak access to non-PRESENT pages
Documentation/ABI: new kexec and kdump sysfs interface
Documentation/ABI: mark old kexec sysfs deprecated
kexec: move sysfs entries to /sys/kernel/kexec
test_kho: always print restore status
kho: free chunks using free_page() instead of kfree()
selftests/liveupdate: add kexec test for multiple and empty sessions
selftests/liveupdate: add simple kexec-based selftest for LUO
selftests/liveupdate: add userspace API selftests
docs: add documentation for memfd preservation via LUO
mm: memfd_luo: allow preserving memfd
liveupdate: luo_file: add private argument to store runtime state
mm: shmem: export some functions to internal.h
...
|
|
Test disconnected directories with two test suites
(layout4_disconnected_leafs and layout5_disconnected_branch) and 43
variants to cover the main corner cases.
These tests are complementary to the previous commit.
Add test_renameat() and test_exchangeat() helpers.
Test coverage for security/landlock is 92.1% of 1927 lines according to
LLVM 20.
Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/20251128172200.760753-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
This adds tests for the edge case discussed in [1], with specific ones
for rename and link operations when the operands are through
disconnected paths, as that go through a separate code path in Landlock.
This has resulted in a warning, due to collect_domain_accesses() not
expecting to reach a different root from path->mnt:
# RUN layout1_bind.path_disconnected ...
# OK layout1_bind.path_disconnected
ok 96 layout1_bind.path_disconnected
# RUN layout1_bind.path_disconnected_rename ...
[..] ------------[ cut here ]------------
[..] WARNING: CPU: 3 PID: 385 at security/landlock/fs.c:1065 collect_domain_accesses
[..] ...
[..] RIP: 0010:collect_domain_accesses (security/landlock/fs.c:1065 (discriminator 2) security/landlock/fs.c:1031 (discriminator 2))
[..] current_check_refer_path (security/landlock/fs.c:1205)
[..] ...
[..] hook_path_rename (security/landlock/fs.c:1526)
[..] security_path_rename (security/security.c:2026 (discriminator 1))
[..] do_renameat2 (fs/namei.c:5264)
# OK layout1_bind.path_disconnected_rename
ok 97 layout1_bind.path_disconnected_rename
Move the const char definitions a bit above so that we can use the path
for s4d1 in cleanup code.
Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/027d5190-b37a-40a8-84e9-4ccbc352bcdf@maowtm.org [1]
Signed-off-by: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/20251128172200.760753-4-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
This follow-up patch completes centralization of kselftest.h and
ksefltest_harness.h includes in remaining seltests files, replacing all
relative paths with a non-relative paths using shared -I include path in
lib.mk
Tested with gcc-13.3 and clang-18.1, and cross-compiled successfully on
riscv, arm64, x86_64 and powerpc arch.
[reddybalavignesh9979@gmail.com: add selftests include path for kselftest.h]
Link: https://lkml.kernel.org/r/20251017090201.317521-1-reddybalavignesh9979@gmail.com
Link: https://lkml.kernel.org/r/20251016104409.68985-1-reddybalavignesh9979@gmail.com
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Günther Noack <gnoack@google.com>
Cc: Jakub Kacinski <kuba@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mickael Salaun <mic@digikod.net>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Make all headers part of make's dependencies computations.
Otherwise, updating audit.h, common.h, scoped_base_variants.h,
scoped_common.h, scoped_multiple_domain_variants.h, or wrappers.h,
re-running make and running selftests could lead to testing stale headers.
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Fixes: fefcf0f7cf47 ("selftests/landlock: Test abstract UNIX socket scoping")
Fixes: 5147779d5e1b ("selftests/landlock: Add wrappers.h")
Signed-off-by: Matthieu Buffet <matthieu@buffet.re>
Link: https://lore.kernel.org/r/20251027011440.1838514-1-matthieu@buffet.re
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Several selftests subdirectories duplicated the define __maybe_unused,
leading to redundant code. Move to kselftest.h header and remove other
definitions.
This addresses the duplication noted in the proc-pid-vm warning fix
Link: https://lkml.kernel.org/r/20250821101159.2238-1-reddybalavignesh9979@gmail.com
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Link:https://lore.kernel.org/lkml/20250820143954.33d95635e504e94df01930d0@linux-foundation.org/
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mickal Salan <mic@digikod.net> [landlock]
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This test checks that a rule on a directory used as a mount point does
not grant access to the mount covering it. It is a generalization of
the bind mount case in layout3_fs.hostfs.release_inodes [1] that tests
hidden mount points.
Cc: Günther Noack <gnoack@google.com>
Cc: Song Liu <song@kernel.org>
Cc: Tingmao Wang <m@maowtm.org>
Link: https://lore.kernel.org/r/20250606.zo5aekae6Da6@digikod.net [1]
Link: https://lore.kernel.org/r/20250606110811.211297-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
We are hitting build error on CentOS 9:
audit_test.c:232:40: error: ‘O_CLOEXEC’ undeclared (...)
Fix this by including fcntl.h.
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250605214416.1885878-1-song@kernel.org
Fixes: 6b4566400a29 ("selftests/landlock: Add PID tests for audit records")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
The audit_init_filter_exe() helper incorrectly checks the readlink(2)
error because an unsigned integer is used to store the result. Use a
signed integer for this check.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/aDbFwyZ_fM-IO7sC@stanley.mountain
Fixes: 6a500b22971c ("selftests/landlock: Add tests for audit flags and domain IDs")
Reviewed-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250528144426.1709063-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit.thread tests to check that the PID tied to a domain is not a
thread ID but the thread group ID. These new tests would not pass
without the previous TGID fix.
Extend matches_log_domain_allocated() to check against the PID that
created the domain.
Test coverage for security/landlock is 93.6% of 1524 lines according to
gcc/gcov-14.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250410171725.1265860-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
The audit fixture needlessly stores and manages domain_stack. Move it
to the audit.layers tests. This will be useful to reuse the audit
fixture with the next patch.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250410171725.1265860-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Test all network blockers:
- net.bind_tcp
- net.connect_tcp
Test coverage for security/landlock is 94.0% of 1525 lines according to
gcc/gcov-14.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-28-mic@digikod.net
[mic: Update test coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Test all filesystem blockers, including events with several records, and
record with several blockers:
- fs.execute
- fs.write_file
- fs.read_file
- fs_read_dir
- fs.remove_dir
- fs.remove_file
- fs.make_char
- fs.make_dir
- fs.make_reg
- fs.make_sock
- fs.make_fifo
- fs.make_block
- fs.make_sym
- fs.refer
- fs.truncate
- fs.ioctl_dev
- fs.change_topology
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-27-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add a new scoped_audit.connect_to_child test to check the abstract UNIX
socket blocker.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-26-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add tests for all ptrace actions checking "blockers=ptrace" records.
This also improves PTRACE_TRACEME and PTRACE_ATTACH tests by making sure
that the restrictions comes from Landlock, and with the expected
process. These extended tests are like enhanced errno checks that make
sure Landlock enforcement is consistent.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-25-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit_exec tests to filter Landlock denials according to
cross-execution or muted subdomains.
Add a wait-pipe-sandbox.c test program to sandbox itself and send a
(denied) signals to its parent.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-24-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit_test.c to check with and without LANDLOCK_RESTRICT_SELF_*
flags against the two Landlock audit record types:
AUDIT_LANDLOCK_ACCESS and AUDIT_LANDLOCK_DOMAIN.
Check consistency of domain IDs per layer in AUDIT_LANDLOCK_ACCESS and
AUDIT_LANDLOCK_DOMAIN messages: denied access, domain allocation, and
domain deallocation.
These tests use signal scoping to make it simple. They are not in the
scoped_signal_test.c file but in the new dedicated audit_test.c file.
Tests are run with audit filters to ensure the audit records come from
the test program. Moreover, because there can only be one audit
process, tests would failed if run in parallel. Because of audit
limitations, tests can only be run in the initial namespace.
The audit test helpers were inspired by libaudit and
tools/testing/selftests/net/netfilter/audit_logread.c
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Phil Sutter <phil@nwl.cc>
Link: https://lore.kernel.org/r/20250320190717.2287696-23-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add the base_test's restrict_self_fd_flags tests to align with previous
restrict_self_fd tests but with the new
LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF flag.
Add the restrict_self_flags tests to check that
LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF,
LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON, and
LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF are valid but not the next
bit. Some checks are similar to restrict_self_checks_ordering's ones.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-22-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
To align with fs_test's layout1.inval and layout0.proc_nsfs which test
EBADFD for landlock_add_rule(2), create a new base_test's
restrict_self_fd which test EBADFD for landlock_restrict_self(2).
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-21-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Most of the time we want to log denied access because they should not
happen and such information helps diagnose issues. However, when
sandboxing processes that we know will try to access denied resources
(e.g. unknown, bogus, or malicious binary), we might want to not log
related access requests that might fill up logs.
By default, denied requests are logged until the task call execve(2).
If the LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF flag is set, denied
requests will not be logged for the same executed file.
If the LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON flag is set, denied
requests from after an execve(2) call will be logged.
The rationale is that a program should know its own behavior, but not
necessarily the behavior of other programs.
Because LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF is set for a specific
Landlock domain, it makes it possible to selectively mask some access
requests that would be logged by a parent domain, which might be handy
for unprivileged processes to limit logs. However, system
administrators should still use the audit filtering mechanism. There is
intentionally no audit nor sysctl configuration to re-enable these logs.
This is delegated to the user space program.
Increment the Landlock ABI version to reflect this interface change.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-18-mic@digikod.net
[mic: Rename variables and fix __maybe_unused]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
The new signal_scoping_thread_setuid tests check that the libc's
setuid() function works as expected even when a thread is sandboxed with
scoped signal restrictions.
Before the signal scoping fix, this test would have failed with the
setuid() call:
[pid 65] getpid() = 65
[pid 65] tgkill(65, 66, SIGRT_1) = -1 EPERM (Operation not permitted)
[pid 65] futex(0x40a66cdc, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 65] setuid(1001) = 0
After the fix, tgkill(2) is successfully leveraged to synchronize
credentials update across threads:
[pid 65] getpid() = 65
[pid 65] tgkill(65, 66, SIGRT_1) = 0
[pid 66] <... read resumed>0x40a65eb7, 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 66] --- SIGRT_1 {si_signo=SIGRT_1, si_code=SI_TKILL, si_pid=65, si_uid=1000} ---
[pid 66] getpid() = 65
[pid 66] setuid(1001) = 0
[pid 66] futex(0x40a66cdc, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 66] rt_sigreturn({mask=[]}) = 0
[pid 66] read(3, <unfinished ...>
[pid 65] setuid(1001) = 0
Test coverage for security/landlock is 92.9% of 1137 lines according to
gcc/gcov-14.
Fixes: c8994965013e ("selftests/landlock: Test signal scoping for threads")
Cc: Günther Noack <gnoack@google.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-8-mic@digikod.net
[mic: Update test coverage]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Split signal_scoping_threads tests into signal_scoping_thread_before
and signal_scoping_thread_after.
Use local variables for thread synchronization. Fix exported function.
Replace some asserts with expects.
Fixes: c8994965013e ("selftests/landlock: Test signal scoping for threads")
Cc: Günther Noack <gnoack@google.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-7-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Because Linux credentials are managed per thread, user space relies on
some hack to synchronize credential update across threads from the same
process. This is required by the Native POSIX Threads Library and
implemented by set*id(2) wrappers and libcap(3) to use tgkill(2) to
synchronize threads. See nptl(7) and libpsx(3). Furthermore, some
runtimes like Go do not enable developers to have control over threads
[1].
To avoid potential issues, and because threads are not security
boundaries, let's relax the Landlock (optional) signal scoping to always
allow signals sent between threads of the same process. This exception
is similar to the __ptrace_may_access() one.
hook_file_set_fowner() now checks if the target task is part of the same
process as the caller. If this is the case, then the related signal
triggered by the socket will always be allowed.
Scoping of abstract UNIX sockets is not changed because kernel objects
(e.g. sockets) should be tied to their creator's domain at creation
time.
Note that creating one Landlock domain per thread puts each of these
threads (and their future children) in their own scope, which is
probably not what users expect, especially in Go where we do not control
threads. However, being able to drop permissions on all threads should
not be restricted by signal scoping. We are working on a way to make it
possible to atomically restrict all threads of a process with the same
domain [2].
Add erratum for signal scoping.
Closes: https://github.com/landlock-lsm/go-landlock/issues/36
Fixes: 54a6e6bbf3be ("landlock: Add signal scoping")
Fixes: c8994965013e ("selftests/landlock: Test signal scoping for threads")
Depends-on: 26f204380a3c ("fs: Fix file_set_fowner LSM hook inconsistencies")
Link: https://pkg.go.dev/kernel.org/pub/linux/libs/security/libcap/psx [1]
Link: https://github.com/landlock-lsm/linux/issues/2 [2]
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20250318161443.279194-6-mic@digikod.net
[mic: Add extra pointer check and RCU guard, and ease backport]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Some fixes may require user space to check if they are applied on the
running kernel before using a specific feature. For instance, this
applies when a restriction was previously too restrictive and is now
getting relaxed (e.g. for compatibility reasons). However, non-visible
changes for legitimate use (e.g. security fixes) do not require an
erratum.
Because fixes are backported down to a specific Landlock ABI, we need a
way to avoid cherry-pick conflicts. The solution is to only update a
file related to the lower ABI impacted by this issue. All the ABI files
are then used to create a bitmask of fixes.
The new errata interface is similar to the one used to get the supported
Landlock ABI version, but it returns a bitmask instead because the order
of fixes may not match the order of versions, and not all fixes may
apply to all versions.
The actual errata will come with dedicated commits. The description is
not actually used in the code but serves as documentation.
Create the landlock_abi_version symbol and use its value to check errata
consistency.
Update test_base's create_ruleset_checks_ordering tests and add errata
tests.
This commit is backportable down to the first version of Landlock.
Fixes: 3532b0b4352c ("landlock: Enable user space to infer supported features")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Building the test creates binaries 'wait-pipe' and
'sandbox-and-launch' which need to be gitignore'd.
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com>
Link: https://lore.kernel.org/r/20250210161101.6024-1-bharadwaj.raju777@gmail.com
[mic: Sort entries]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|