<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/dma-buf/heaps, branch master</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=master</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2026-04-17T18:12:42Z</updated>
<entry>
<title>Merge tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux</title>
<updated>2026-04-17T18:12:42Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-17T18:12:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=87768582a440e7049a04e8af7383b86738d15b38'/>
<id>urn:sha1:87768582a440e7049a04e8af7383b86738d15b38</id>
<content type='text'>
Pull dma-mapping updates from Marek Szyprowski:

 - added support for batched cache sync, which improves the performance
   of dma_map_sg()/dma_unmap_sg() operations on the ARM64 architecture
   (Barry Song)

 - introduced the DMA_ATTR_CC_SHARED attribute for explicitly shared
   memory used in confidential computing (Jiri Pirko)

 - refactored spaghetti-like code in drivers/of/of_reserved_mem.c and
   its clients (Marek Szyprowski, shared branch with device-tree updates
   to avoid merge conflicts)

 - prepared the Contiguous Memory Allocator related code for making
   dma-buf heap drivers modular (Maxime Ripard)

 - added support for benchmarking dma_map_sg() calls to the tools/dma
   utility (Qinxin Xia)

* tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: (24 commits)
  dma-buf: heaps: system: document system_cc_shared heap
  dma-buf: heaps: system: add system_cc_shared heap for explicitly shared memory
  dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory
  mm: cma: Export cma_alloc(), cma_release() and cma_get_name()
  dma: contiguous: Export dev_get_cma_area()
  dma: contiguous: Make dma_contiguous_default_area static
  dma: contiguous: Make dev_get_cma_area() a proper function
  dma: contiguous: Turn heap registration logic around
  of: reserved_mem: rework fdt_init_reserved_mem_node()
  of: reserved_mem: clarify fdt_scan_reserved_mem*() functions
  of: reserved_mem: rearrange code a bit
  of: reserved_mem: replace CMA quirks by generic methods
  of: reserved_mem: switch to ops based OF_DECLARE()
  of: reserved_mem: use -ENODEV instead of -ENOENT
  of: reserved_mem: remove fdt node from the structure
  dma-mapping: fix false kernel-doc comment marker
  dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg
  dma-mapping: Separate DMA sync issuing and completion waiting
  arm64: Provide dcache_inval_poc_nosync helper
  arm64: Provide dcache_clean_poc_nosync helper
  ...
</content>
</entry>
<entry>
<title>dma-buf: heaps: system: add system_cc_shared heap for explicitly shared memory</title>
<updated>2026-04-02T05:29:33Z</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@nvidia.com</email>
</author>
<published>2026-03-25T19:23:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=78b30c50a7ac9b8fbec678d71f81dec80bf8eed6'/>
<id>urn:sha1:78b30c50a7ac9b8fbec678d71f81dec80bf8eed6</id>
<content type='text'>
Add a new "system_cc_shared" dma-buf heap to allow userspace to
allocate shared (decrypted) memory for confidential computing (CoCo)
VMs.

On CoCo VMs, guest memory is private by default. The hardware uses an
encryption bit in page table entries (C-bit on AMD SEV, "shared" bit on
Intel TDX) to control whether a given memory access is private or
shared. The kernel's direct map is set up as private,
so pages returned by alloc_pages() are private in the direct map
by default. To make this memory usable for devices that do not support
DMA to private memory (no TDISP support), it has to be explicitly
shared. Three things are needed to properly handle shared memory for
the dma-buf use case (the first two are sketched in code after the
list):

- set_memory_decrypted() on the direct map after allocation:
  Besides clearing the encryption bit in the direct map PTEs, this
  also notifies the hypervisor about the page state change. On free,
  the inverse set_memory_encrypted() must be called before returning
  pages to the allocator. If re-encryption fails, pages
  are intentionally leaked to prevent shared memory from being
  reused as private.

- pgprot_decrypted() for userspace and kernel virtual mappings:
  Any new mapping of the shared pages, be it to userspace via
  mmap or to kernel vmalloc space via vmap, creates PTEs independent
  of the direct map. These must also have the encryption bit cleared,
  otherwise accesses through them would see encrypted (garbage) data.

- DMA_ATTR_CC_SHARED for DMA mapping:
  Since the pages are already shared, the DMA API needs to be
  informed via DMA_ATTR_CC_SHARED so it can map them correctly
  as unencrypted for device access.
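
A minimal sketch of the first two points. The cc_shared_*() helper
names are illustrative, not the actual heap code; set_memory_decrypted(),
set_memory_encrypted(), vmap() and pgprot_decrypted() are the existing
kernel APIs the text above refers to:

  #include &lt;linux/mm.h&gt;
  #include &lt;linux/set_memory.h&gt;
  #include &lt;linux/vmalloc.h&gt;

  static struct page *cc_shared_alloc(unsigned int order)
  {
          struct page *page = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);

          if (!page)
                  return NULL;
          /* Clear the encryption bit in the direct map PTEs and
           * notify the hypervisor of the private -&gt; shared change. */
          if (set_memory_decrypted((unsigned long)page_address(page),
                                   1 &lt;&lt; order))
                  return NULL;    /* PTE state unknown: leak, do not free */
          return page;
  }

  static void cc_shared_free(struct page *page, unsigned int order)
  {
          /* Inverse transition before returning pages to the allocator. */
          if (set_memory_encrypted((unsigned long)page_address(page),
                                   1 &lt;&lt; order))
                  return;    /* intentional leak, never reuse as private */
          __free_pages(page, order);
  }

  /* Any new kernel mapping of the shared pages must clear the bit too. */
  static void *cc_shared_vmap(struct page **pages, unsigned int npages)
  {
          return vmap(pages, npages, VM_MAP, pgprot_decrypted(PAGE_KERNEL));
  }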

On non-CoCo VMs, the system_cc_shared heap is not registered
to prevent misuse by userspace that does not understand
the security implications of explicitly shared memory.

Signed-off-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: T.J. Mercier &lt;tjmercier@google.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Acked-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260325192352.437608-3-jiri@resnulli.us
</content>
</entry>
<entry>
<title>dma: contiguous: Turn heap registration logic around</title>
<updated>2026-03-31T11:27:20Z</updated>
<author>
<name>Maxime Ripard</name>
<email>mripard@kernel.org</email>
</author>
<published>2026-03-31T10:00:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=25bd73562941b04cfba1a278d8c84f2b1c69b8e9'/>
<id>urn:sha1:25bd73562941b04cfba1a278d8c84f2b1c69b8e9</id>
<content type='text'>
The CMA heap instantiation was initially developed by having the
contiguous DMA code call into the CMA heap to create a new instance
every time a reserved memory area is probed.

Turning the CMA heap into a module would create a dependency of the
kernel on a module, which doesn't work.

Let's turn the logic around: store all the reserved-memory CMA regions
in the contiguous DMA code, and provide an iterator for the heap to use
when it probes.
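
A sketch of the inverted flow; the function names here
(dma_contiguous_add_area(), dma_contiguous_for_each_area()) are
illustrative, not necessarily the final API:

  /* kernel/dma/contiguous.c: record every reserved-memory CMA region */
  static struct cma *dma_contiguous_areas[MAX_CMA_AREAS];
  static unsigned int dma_contiguous_nareas;

  void dma_contiguous_add_area(struct cma *cma)
  {
          dma_contiguous_areas[dma_contiguous_nareas++] = cma;
  }

  /* iterator the (possibly modular) CMA heap walks at probe time */
  int dma_contiguous_for_each_area(int (*fn)(struct cma *cma, void *data),
                                   void *data)
  {
          unsigned int i;
          int ret;

          for (i = 0; i &lt; dma_contiguous_nareas; i++) {
                  ret = fn(dma_contiguous_areas[i], data);
                  if (ret)
                          return ret;
          }
          return 0;
  }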

Signed-off-by: Maxime Ripard &lt;mripard@kernel.org&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260331-dma-buf-heaps-as-modules-v4-1-e18fda504419@kernel.org
</content>
</entry>
<entry>
<title>dma-buf: heaps: Clear CMA highpages using helper</title>
<updated>2026-03-11T09:06:32Z</updated>
<author>
<name>Linus Walleij</name>
<email>linusw@kernel.org</email>
</author>
<published>2026-03-10T08:53:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=252b0e790d22dfef8c4d0c39a5bfc04e40b615ad'/>
<id>urn:sha1:252b0e790d22dfef8c4d0c39a5bfc04e40b615ad</id>
<content type='text'>
Currently the CMA heap clears highmem pages using
kmap()-&gt;clear_page()-&gt;kunmap(), but &lt;linux/highmem.h&gt; already
provides a static inline helper that does the same for us, so use
clear_highpage() instead of open coding it.
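
As a before/after illustration (page is a generic struct page pointer):

  /* before: open-coded highmem clear */
  void *vaddr = kmap(page);
  clear_page(vaddr);
  kunmap(page);

  /* after: the &lt;linux/highmem.h&gt; helper */
  clear_highpage(page);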

Suggested-by: T.J. Mercier &lt;tjmercier@google.com&gt;
Reviewed-by: T.J. Mercier &lt;tjmercier@google.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Maxime Ripard &lt;mripard@kernel.org&gt;
Signed-off-by: Linus Walleij &lt;linusw@kernel.org&gt;
Link: https://patch.msgid.link/20260310-cma-heap-clear-pages-v2-2-ecbbed3d7e6d@kernel.org
</content>
</entry>
<entry>
<title>dma-buf: heaps: Clear CMA pages with clear_pages()</title>
<updated>2026-03-11T09:06:13Z</updated>
<author>
<name>Linus Walleij</name>
<email>linusw@kernel.org</email>
</author>
<published>2026-03-10T08:53:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e669d2f601919aedc6cadca673109684aa499326'/>
<id>urn:sha1:e669d2f601919aedc6cadca673109684aa499326</id>
<content type='text'>
As of commit 62a9f5a85b98 ("mm: introduce clear_pages() and
clear_user_pages()"), we can clear a range of pages with a potentially
assembly-optimized call.

Instead of using a memset, use this helper to clear the whole
range of pages from the CMA allocation.
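
For illustration, assuming the clear_pages(addr, npages) signature from
that commit (page and nr_pages are generic names):

  /* before: byte-wise clear of the whole allocation */
  memset(page_address(page), 0, nr_pages * PAGE_SIZE);

  /* after: range clear, assembly-optimized where available */
  clear_pages(page_address(page), nr_pages);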

Reviewed-by: T.J. Mercier &lt;tjmercier@google.com&gt;
Reviewed-by: Maxime Ripard &lt;mripard@kernel.org&gt;
Signed-off-by: Linus Walleij &lt;linusw@kernel.org&gt;
Link: https://patch.msgid.link/20260310-cma-heap-clear-pages-v2-1-ecbbed3d7e6d@kernel.org
</content>
</entry>
<entry>
<title>Convert 'alloc_obj' family to use the new default GFP_KERNEL argument</title>
<updated>2026-02-22T01:09:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-22T00:37:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43'/>
<id>urn:sha1:bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43</id>
<content type='text'>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
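
For example (ctx being a stand-in variable), the script turns

    ctx = kzalloc_obj(*ctx, GFP_KERNEL);

into

    ctx = kzalloc_obj(*ctx);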

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>treewide: Replace kmalloc with kmalloc_obj for non-scalar types</title>
<updated>2026-02-21T09:02:28Z</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2026-02-21T07:49:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=69050f8d6d075dc01af7a5f2f550a8067510366f'/>
<id>urn:sha1:69050f8d6d075dc01af7a5f2f550a8067510366f</id>
<content type='text'>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>dma-buf: system_heap: account for system heap allocation in memcg</title>
<updated>2026-01-19T13:39:52Z</updated>
<author>
<name>Eric Chanudet</name>
<email>echanude@redhat.com</email>
</author>
<published>2026-01-16T20:05:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3c227be9065902f496a748068e0b2b6bd6f3f0a6'/>
<id>urn:sha1:3c227be9065902f496a748068e0b2b6bd6f3f0a6</id>
<content type='text'>
The system dma-buf heap lets userspace allocate buffers from the page
allocator. However, these allocations are not accounted for in memcg,
allowing processes to escape limits that may be configured.

Pass __GFP_ACCOUNT for system heap allocations, gated by the
dma_heap.mem_accounting parameter, so they are charged to the
allocating process's memcg.
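
A minimal sketch of the gating, assuming the module parameter lives in
the dma_heap core (the helper name is illustrative):

  #include &lt;linux/gfp.h&gt;
  #include &lt;linux/moduleparam.h&gt;

  static bool mem_accounting;
  module_param(mem_accounting, bool, 0444);

  static struct page *system_heap_alloc_page(unsigned int order)
  {
          gfp_t gfp = GFP_KERNEL;

          if (mem_accounting)
                  gfp |= __GFP_ACCOUNT;   /* charge to the caller's memcg */

          return alloc_pages(gfp, order);
  }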

Signed-off-by: Eric Chanudet &lt;echanude@redhat.com&gt;
Reviewed-by: T.J. Mercier &lt;tjmercier@google.com&gt;
Reviewed-by: Maxime Ripard &lt;mripard@kernel.org&gt;
Signed-off-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Link: https://patch.msgid.link/20260116-dmabuf-heap-system-memcg-v3-2-ecc6b62cc446@redhat.com
</content>
</entry>
<entry>
<title>dma-buf: heaps: Clear CMA pages with clear_page()</title>
<updated>2026-01-08T07:52:29Z</updated>
<author>
<name>Linus Walleij</name>
<email>linus.walleij@linaro.org</email>
</author>
<published>2025-11-30T10:54:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d42d701e132905d69be3a7808f2ec86bdb4c48d8'/>
<id>urn:sha1:d42d701e132905d69be3a7808f2ec86bdb4c48d8</id>
<content type='text'>
clear_page() translates into memset(page, 0, PAGE_SIZE) on some
architectures, but on the major architectures it calls an optimized
assembly routine, so use it instead of open coding a memset().
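
Illustrated on a single page (page is a generic page pointer):

  /* before: open coded */
  memset(page_address(page), 0, PAGE_SIZE);

  /* after: dispatches to the arch-optimized implementation */
  clear_page(page_address(page));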

Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
Reviewed-by: Nirmoy Das &lt;nirmoyd@nvidia.com&gt;
Reviewed-by: T.J. Mercier &lt;tjmercier@google.com&gt;
Signed-off-by: Linus Walleij &lt;linusw@kernel.org&gt;
Link: https://patch.msgid.link/20251130-dma-buf-heap-clear-page-v1-1-a8dcea2a88ee@linaro.org
</content>
</entry>
<entry>
<title>dma-buf: system_heap: use larger contiguous mappings instead of per-page mmap</title>
<updated>2025-11-21T05:18:00Z</updated>
<author>
<name>Barry Song</name>
<email>v-songbaohua@oppo.com</email>
</author>
<published>2025-10-21T04:20:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=04c7adb5871ad04c9e3fd645570e21c93f1b2f54'/>
<id>urn:sha1:04c7adb5871ad04c9e3fd645570e21c93f1b2f54</id>
<content type='text'>
We can allocate high-order pages, but mapping them one by
one is inefficient. This patch changes the code to map
as large a chunk as possible. The code looks somewhat
complicated mainly because supporting mmap with a
non-zero offset is a bit tricky.
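
A simplified sketch of the approach, ignoring the non-zero offset
handling mentioned above (the function name is illustrative):

  static int system_heap_mmap_chunks(struct vm_area_struct *vma,
                                     struct sg_table *table)
  {
          unsigned long addr = vma-&gt;vm_start;
          struct scatterlist *sg;
          int i, ret;

          /* map each physically contiguous chunk with a single call,
           * instead of inserting its pages one by one */
          for_each_sgtable_sg(table, sg, i) {
                  ret = remap_pfn_range(vma, addr, page_to_pfn(sg_page(sg)),
                                        sg-&gt;length, vma-&gt;vm_page_prot);
                  if (ret)
                          return ret;
                  addr += sg-&gt;length;
          }
          return 0;
  }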

Using the micro-benchmark below, we see that mmap becomes
35x faster:

  #include &lt;stdio.h&gt;
  #include &lt;fcntl.h&gt;
  #include &lt;linux/dma-heap.h&gt;
  #include &lt;sys/ioctl.h&gt;
  #include &lt;sys/mman.h&gt;
  #include &lt;time.h&gt;
  #include &lt;unistd.h&gt;
  #include &lt;stdlib.h&gt;

  #define SIZE   (512UL * 1024 * 1024)
  #define PAGE   4096
  #define STRIDE (PAGE/sizeof(int))
  #define PAGES  (SIZE/PAGE)

  int main(void) {
      int heap = open("/dev/dma_heap/system", O_RDONLY);
      struct dma_heap_allocation_data d =
            { .len = SIZE, .fd_flags = O_RDWR|O_CLOEXEC };
      ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &amp;d);

      struct timespec t0, t1;
      clock_gettime(CLOCK_MONOTONIC, &amp;t0);
      int *p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, d.fd, 0);
      clock_gettime(CLOCK_MONOTONIC, &amp;t1);

      for (int i = 0; i &lt; PAGES; i++) p[i*STRIDE] = i;
      for (int i = 0; i &lt; PAGES; i++)
          if (p[i*STRIDE] != i) {
              fprintf(stderr, "mismatch at page %d\n", i);
              exit(1);
          }

      long ns = (t1.tv_sec-t0.tv_sec)*1000000000L +
                (t1.tv_nsec-t0.tv_nsec);
      printf("mmap 512MB took %.3f us, verify OK\n", ns/1000.0);
      return 0;
  }

W/ patch:

~ # ./a.out
mmap 512MB took 200266.000 us, verify OK
~ # ./a.out
mmap 512MB took 198151.000 us, verify OK
~ # ./a.out
mmap 512MB took 197069.000 us, verify OK
~ # ./a.out
mmap 512MB took 196781.000 us, verify OK
~ # ./a.out
mmap 512MB took 198102.000 us, verify OK
~ # ./a.out
mmap 512MB took 195552.000 us, verify OK

W/o patch:

~ # ./a.out
mmap 512MB took 6987470.000 us, verify OK
~ # ./a.out
mmap 512MB took 6970739.000 us, verify OK
~ # ./a.out
mmap 512MB took 6984383.000 us, verify OK
~ # ./a.out
mmap 512MB took 6971311.000 us, verify OK
~ # ./a.out
mmap 512MB took 6991680.000 us, verify OK

Signed-off-by: Barry Song &lt;v-songbaohua@oppo.com&gt;
Acked-by: John Stultz &lt;jstultz@google.com&gt;
Reviewed-by: Maxime Ripard &lt;mripard@kernel.org&gt;
Signed-off-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
  [sumits: correct from 3.5x to 35x]
Link: https://patch.msgid.link/20251021042022.47919-1-21cnbao@gmail.com
</content>
</entry>
</feed>
