<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/lib/maple_tree.c, branch v6.16</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v6.16</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v6.16'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2025-07-10T04:07:52Z</updated>
<entry>
<title>maple_tree: fix mt_destroy_walk() on root leaf node</title>
<updated>2025-07-10T04:07:52Z</updated>
<author>
<name>Wei Yang</name>
<email>richard.weiyang@gmail.com</email>
</author>
<published>2025-06-24T19:18:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ea9b77f98d94c4d5c1bd1ac1db078f78b40e8bf5'/>
<id>urn:sha1:ea9b77f98d94c4d5c1bd1ac1db078f78b40e8bf5</id>
<content type='text'>
On destroy, we should set each node dead.  But current code miss this when
the maple tree has only the root node.

The reason is mt_destroy_walk() leverage mte_destroy_descend() to set node
dead, but this is skipped since the only root node is a leaf.

Fixes this by setting the node dead if it is a leaf.

Link: https://lore.kernel.org/all/20250407231354.11771-1-richard.weiyang@gmail.com/
Link: https://lkml.kernel.org/r/20250624191841.64682-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Signed-off-by: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Reviewed-by: Dev Jain &lt;dev.jain@arm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>maple_tree: fix MA_STATE_PREALLOC flag in mas_preallocate()</title>
<updated>2025-06-20T03:48:04Z</updated>
<author>
<name>Liam R. Howlett</name>
<email>Liam.Howlett@oracle.com</email>
</author>
<published>2025-06-16T18:45:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fba46a5d83ca8decb338722fb4899026d8d9ead2'/>
<id>urn:sha1:fba46a5d83ca8decb338722fb4899026d8d9ead2</id>
<content type='text'>
Temporarily clear the preallocation flag when explicitly requesting
allocations.  Pre-existing allocations are already counted against the
request through mas_node_count_gfp(), but the allocations will not happen
if the MA_STATE_PREALLOC flag is set.  This flag is meant to avoid
re-allocating in bulk allocation mode, and to detect issues with
preallocation calculations.

The MA_STATE_PREALLOC flag should also always be set on zero allocations
so that detection of underflow allocations will print a WARN_ON() during
consumption.

User visible effect of this flaw is a WARN_ON() followed by a null pointer
dereference when subsequent requests for larger number of nodes is
ignored, such as the vma merge retry in mmap_region() caused by drivers
altering the vma flags (which happens in v6.6, at least)

Link: https://lkml.kernel.org/r/20250616184521.3382795-3-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Reported-by: Zhaoyang Huang &lt;zhaoyang.huang@unisoc.com&gt;
Reported-by: Hailong Liu &lt;hailong.liu@oppo.com&gt;
Link: https://lore.kernel.org/all/1652f7eb-a51b-4fee-8058-c73af63bacd1@oppo.com/
Link: https://lore.kernel.org/all/20250428184058.1416274-1-Liam.Howlett@oracle.com/
Link: https://lore.kernel.org/all/20250429014754.1479118-1-Liam.Howlett@oracle.com/
Cc: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Hailong Liu &lt;hailong.liu@oppo.com&gt;
Cc: zhangpeng.00@bytedance.com &lt;zhangpeng.00@bytedance.com&gt;
Cc: Steve Kang &lt;Steve.Kang@unisoc.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Sidhartha Kumar &lt;sidhartha.kumar@oracle.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>maple_tree: reorder mas-&gt;store_type case statements</title>
<updated>2025-05-12T00:48:29Z</updated>
<author>
<name>Sidhartha Kumar</name>
<email>sidhartha.kumar@oracle.com</email>
</author>
<published>2025-04-10T19:14:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2a6ed1b411c5a037033a4dccc65cf4a90e4ae40a'/>
<id>urn:sha1:2a6ed1b411c5a037033a4dccc65cf4a90e4ae40a</id>
<content type='text'>
Move the unlikely case that mas-&gt;store_type is invalid to be the last
evaluated case and put liklier cases higher up.

Link: https://lkml.kernel.org/r/20250410191446.2474640-7-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar &lt;sidhartha.kumar@oracle.com&gt;
Suggested-by: Liam R. Howlett &lt;liam.howlett@oracle.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>maple_tree: add sufficient height</title>
<updated>2025-05-12T00:48:29Z</updated>
<author>
<name>Sidhartha Kumar</name>
<email>sidhartha.kumar@oracle.com</email>
</author>
<published>2025-04-10T19:14:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=271152a973cb01c135d29e91d1a05f51fbd88a9c'/>
<id>urn:sha1:271152a973cb01c135d29e91d1a05f51fbd88a9c</id>
<content type='text'>
In order to support rebalancing and spanning stores using less than the
worst case number of nodes, we need to track more than just the vacant
height.  Using only vacant height to reduce the worst case maple node
allocation count can lead to a shortcoming of nodes in the following
scenarios.

For rebalancing writes, when a leaf node becomes insufficient, it may be
combined with a sibling into a single node.  This means that the parent
node which has entries for this children will lose one entry.  If this
parent node was just meeting the minimum entries, losing one entry will
now cause this parent node to be insufficient.  This leads to a cascading
operation of rebalancing at different levels and can lead to more node
allocations than simply using vacant height can return.

For spanning writes, a similar situation occurs.  At the location at which
a spanning write is detected, the number of ancestor nodes may similarly
need to rebalanced into a smaller number of nodes and the same cascading
situation could occur.

To use less than the full height of the tree for the number of
allocations, we also need to track the height at which a non-leaf node
cannot become insufficient.  This means even if a rebalance occurs to a
child of this node, it currently has enough entries that it can lose one
without any further action.  This field is stored in the maple write state
as sufficient height.  In mas_prealloc_calc() when figuring out how many
nodes to allocate, we check if the vacant node is lower in the tree than a
sufficient node (has a larger value).  If it is, we cannot use the vacant
height and must use the difference in the height and sufficient height as
the basis for the number of nodes needed.

An off by one bug was also discovered in mast_overflow() where it is using
&gt;= rather than &gt;.  This caused extra iterations of the
mas_spanning_rebalance() loop and lead to unneeded allocations.  A test is
also added to check the number of allocations is correct.

Link: https://lkml.kernel.org/r/20250410191446.2474640-6-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar &lt;sidhartha.kumar@oracle.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>maple_tree: break on convergence in mas_spanning_rebalance()</title>
<updated>2025-05-12T00:48:28Z</updated>
<author>
<name>Sidhartha Kumar</name>
<email>sidhartha.kumar@oracle.com</email>
</author>
<published>2025-04-10T19:14:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=300a5b4ffedf826998ed1f1b5d107e9fb0ef7579'/>
<id>urn:sha1:300a5b4ffedf826998ed1f1b5d107e9fb0ef7579</id>
<content type='text'>
This allows support for using the vacant height to calculate the worst
case number of nodes needed for wr_rebalance operation. 
mas_spanning_rebalance() was seen to perform unnecessary node allocations.
We can reduce allocations by breaking early during the rebalancing loop
once we realize that we have ascended to a common ancestor.

Link: https://lkml.kernel.org/r/20250410191446.2474640-5-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar &lt;sidhartha.kumar@oracle.com&gt;
Suggested-by: Liam Howlett &lt;liam.howlett@oracle.com&gt;
Reviewed-by: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>maple_tree: use vacant nodes to reduce worst case allocations</title>
<updated>2025-05-12T00:48:28Z</updated>
<author>
<name>Sidhartha Kumar</name>
<email>sidhartha.kumar@oracle.com</email>
</author>
<published>2025-04-10T19:14:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ad88fc17d2dafe45e40de2af80207f4b2e3b1f71'/>
<id>urn:sha1:ad88fc17d2dafe45e40de2af80207f4b2e3b1f71</id>
<content type='text'>
In order to determine the store type for a maple tree operation, a walk of
the tree is done through mas_wr_walk().  This function descends the tree
until a spanning write is detected or we reach a leaf node.  While
descending, keep track of the height at which we encounter a node with
available space.  This is done by checking if mas-&gt;end is less than the
number of slots a given node type can fit.

Now that the height of the vacant node is tracked, we can use the
difference between the height of the tree and the height of the vacant
node to know how many levels we will have to propagate creating new nodes.
Update mas_prealloc_calc() to consider the vacant height and reduce the
number of worst-case allocations.

Rebalancing and spanning stores are not supported and fall back to using
the full height of the tree for allocations.

Update preallocation testing assertions to take into account vacant
height.

Link: https://lkml.kernel.org/r/20250410191446.2474640-4-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar &lt;sidhartha.kumar@oracle.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>maple_tree: use height and depth consistently</title>
<updated>2025-05-12T00:48:28Z</updated>
<author>
<name>Sidhartha Kumar</name>
<email>sidhartha.kumar@oracle.com</email>
</author>
<published>2025-04-10T19:14:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f9d3a963fef4d3377b7ee122408cf2cdf37b3181'/>
<id>urn:sha1:f9d3a963fef4d3377b7ee122408cf2cdf37b3181</id>
<content type='text'>
For the maple tree, the root node is defined to have a depth of 0 with a
height of 1.  Each level down from the node, these values are incremented
by 1.  Various code paths define a root with depth 1 which is inconsisent
with the definition.  Modify the code to be consistent with this
definition.

In mas_spanning_rebalance(), l_mas.depth was being used to track the
height based on the number of iterations done in the main loop.  This
information was then used in mas_put_in_tree() to set the height.  Rather
than overload the l_mas.depth field to track height, simply keep track of
height in the local variable new_height and directly pass this to
mas_wmb_replace() which will be passed into mas_put_in_tree().  This
allows up to remove writes to l_mas.depth.

Link: https://lkml.kernel.org/r/20250410191446.2474640-3-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar &lt;sidhartha.kumar@oracle.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>maple_tree: convert mas_prealloc_calc() to take in a maple write state</title>
<updated>2025-05-12T00:48:28Z</updated>
<author>
<name>Sidhartha Kumar</name>
<email>sidhartha.kumar@oracle.com</email>
</author>
<published>2025-04-10T19:14:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=28092a652f9c17773aa6ca0f62f73738d756b914'/>
<id>urn:sha1:28092a652f9c17773aa6ca0f62f73738d756b914</id>
<content type='text'>
Patch series "Track node vacancy to reduce worst case allocation counts", v5.

================ overview ========================
Currently, the maple tree preallocates the worst case number of nodes for
given store type by taking into account the whole height of the tree. 
This comes from a worst case scenario of every node in the tree being full
and having to propagate node allocation upwards until we reach the root of
the tree.  This can be optimized if there are vacancies in nodes that are
at a lower depth than the root node.  This series implements tracking the
level at which there is a vacant node so we only need to allocate until
this level is reached, rather than always using the full height of the
tree.  The ma_wr_state struct is modified to add a field which keeps track
of the vacant height and is updated during walks of the tree.  This value
is then read in mas_prealloc_calc() when we decide how many nodes to
allocate.

For rebalancing and spanning stores, we also need to track the lowest
height at which a node has 1 more entry than the minimum sufficient number
of entries.  This is because rebalancing can cause a parent node to become
insufficient which results in further node allocations.  In this case, we
need to use the sufficient height as the worst case rather than the vacant
height.

patch 1-2: preparatory patches
patch 3: implement vacant height tracking + update the tests
patch 4: support vacant height tracking for rebalancing writes
patch 5: implement sufficient height tracking
patch 6: reorder switch case statements

================ results =========================
Bpftrace was used to profile the allocation path for requesting new maple
nodes while running stress-ng mmap 120s.  The histograms below represent
requests to kmem_cache_alloc_bulk() and show the count argument.  This
represnts how many maple nodes the caller is requesting in
kmem_cache_alloc_bulk()

command: stress-ng --mmap 4 --timeout 120

mm-unstable 

@bulk_alloc_req:
[3, 4)                 4 |                                                    |
[4, 5)             54170 |@                                                   |
[5, 6)                 0 |                                                    |
[6, 7)            893057 |@@@@@@@@@@@@@@@@@@@@                                |
[7, 8)                 4 |                                                    |
[8, 9)           2230287 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10)            55811 |@                                                   |
[10, 11)           77834 |@                                                   |
[11, 12)               0 |                                                    |
[12, 13)         1368684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[13, 14)               0 |                                                    |
[14, 15)               0 |                                                    |
[15, 16)          367197 |@@@@@@@@                                            |


@maple_node_total: 46,630,160
@total_vmas: 46184591

mm-unstable + this series

@bulk_alloc_req:
[2, 3)               198 |                                                    |
[3, 4)                 4 |                                                    |
[4, 5)                43 |                                                    |
[5, 6)                 0 |                                                    |
[6, 7)           1069503 |@@@@@@@@@@@@@@@@@@@@@                               |
[7, 8)                 4 |                                                    |
[8, 9)           2597268 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10)           472191 |@@@@@@@@@                                           |
[10, 11)          191904 |@@@                                                 |
[11, 12)               0 |                                                    |
[12, 13)          247316 |@@@@                                                |
[13, 14)               0 |                                                    |
[14, 15)               0 |                                                    |
[15, 16)           98769 |@                                                   |


@maple_node_total: 37,813,856
@total_vmas: 43493287

This represents a ~19% reduction in the number of bulk maple nodes allocated.

For more reproducible results, a historgram of the return value of
mas_prealloc_calc() is displayed while running the maple_tree_tests whcih
have a deterministic store pattern

mas_prealloc_calc() return value mm-unstable
1   :                                                    (12068)
3   :                                                    (11836)
5   : *****                                              (271192)
7   : ************************************************** (2329329)
9   : ***********                                        (534186)
10  :                                                    (435)
11  : ***************                                    (704306)
13  : ********                                           (409781)

mas_prealloc_calc() return value mm-unstable + this series
1   :                                                    (12070)
3   : ************************************************** (3548777)
5   : ********                                           (633458)
7   :                                                    (65081)
9   :                                                    (11224)
10  :                                                    (341)
11  :                                                    (2973)
13  :                                                    (68)

do_mmap latency was also measured for regressions:
command: stress-ng --mmap 4 --timeout 120

mm-unstable:
avg = 7162 nsecs, total: 16101821292 nsecs, count: 2248034

mm-unstable + this series:
avg = 6689 nsecs, total: 15135391764 nsecs, count: 2262726


stress-ng --mmap4 --timeout 120

with vacant_height:
stress-ng: info:  [257]                   21526312 Maple Tree Read                0.176 M/sec
stress-ng: info:  [257]                  339979348 Maple Tree Write               2.774 M/sec

without vacant_height:
stress-ng: info:  [8228]                   20968900 Maple Tree Read                0.171 M/sec
stress-ng: info:  [8228]                  312214648 Maple Tree Write               2.547 M/sec

This represents an increase of ~3% read throughput and ~9% increase in
write throughput.


This patch (of 6):

In a subsequent patch, mas_prealloc_calc() will need to access fields only
in the ma_wr_state.  Convert the function to take in a ma_wr_state and
modify all callers.  There is no functional change.

Link: https://lkml.kernel.org/r/20250410191446.2474640-1-sidhartha.kumar@oracle.com
Link: https://lkml.kernel.org/r/20250410191446.2474640-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar &lt;sidhartha.kumar@oracle.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>maple_tree: remove a BUG_ON() in mas_alloc_nodes()</title>
<updated>2025-03-17T05:06:15Z</updated>
<author>
<name>Petr Tesarik</name>
<email>ptesarik@suse.com</email>
</author>
<published>2025-02-13T11:44:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=18ea595a07bcf931ec6efae5c1f8a0d9a440ed97'/>
<id>urn:sha1:18ea595a07bcf931ec6efae5c1f8a0d9a440ed97</id>
<content type='text'>
Remove a BUG_ON() right before a WARN_ON() with the same condition.

Calling WARN_ON() and BUG_ON() here is definitely wrong.  Since the goal is
generally to remove BUG_ON() invocations from the kernel, keep only the
WARN_ON().

Link: https://lkml.kernel.org/r/20250213114453.1078318-1-ptesarik@suse.com
Fixes: 067311d33e65 ("maple_tree: separate ma_state node from status")
Signed-off-by: Petr Tesarik &lt;ptesarik@suse.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>maple_tree: use ma_dead_node() in mte_dead_node()</title>
<updated>2025-03-17T05:06:13Z</updated>
<author>
<name>I Hsin Cheng</name>
<email>richard120310@gmail.com</email>
</author>
<published>2025-02-11T07:18:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6fbea85271c6764d512a5c5b6c1b44b762693a18'/>
<id>urn:sha1:6fbea85271c6764d512a5c5b6c1b44b762693a18</id>
<content type='text'>
Utilize ma_dead_node() in mte_dead_node().  It can prevent decoding the
maple enode for a second time.  Use the "node" to find parent for
comparison.

Link: https://lkml.kernel.org/r/20250211071850.330632-1-richard120310@gmail.com
Signed-off-by: I Hsin Cheng &lt;richard120310@gmail.com&gt;
Reviewed-by: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Cc: Ching-Chun (Jim) Huang &lt;jserv@ccns.ncku.edu.tw&gt;
Cc: Shuah khan &lt;skhan@linuxfoundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
