<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/dax, branch v5.1</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.1</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.1'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-03-16T20:05:32Z</updated>
<entry>
<title>Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2019-03-16T20:05:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-03-16T20:05:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f67e3fb4891287b8248ebb3320f794b9f5e782d4'/>
<id>urn:sha1:f67e3fb4891287b8248ebb3320f794b9f5e782d4</id>
<content type='text'>
Pull device-dax updates from Dan Williams:
 "New device-dax infrastructure to allow persistent memory and other
  "reserved" / performance differentiated memories, to be assigned to
  the core-mm as "System RAM".

  Some users want to use persistent memory as additional volatile
  memory. They are willing to cope with potential performance
  differences, for example between DRAM and 3D Xpoint, and want to use
  typical Linux memory management apis rather than a userspace memory
  allocator layered over an mmap() of a dax file. The administration
  model is to decide how much Persistent Memory (pmem) to use as System
  RAM, create a device-dax-mode namespace of that size, and then assign
  it to the core-mm. The rationale for device-dax is that it is a
  generic memory-mapping driver that can be layered over any "special
  purpose" memory, not just pmem. On subsequent boots udev rules can be
  used to restore the memory assignment.

  One implication of using pmem as RAM is that mlock() no longer keeps
  data off persistent media. For this reason it is recommended to enable
  NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
  at rest. We considered making this recommendation an actively enforced
  requirement, but in the end decided to leave it as a distribution /
  administrator policy to allow for emulation and test environments that
  lack security capable NVDIMMs.

  Summary:

   - Replace the /sys/class/dax device model with /sys/bus/dax, and
     include a compat driver so distributions can opt-in to the new ABI.

   - Allow for an alternative driver for the device-dax address-range

   - Introduce the 'kmem' driver to hotplug / assign a device-dax
     address-range to the core-mm.

   - Arrange for the device-dax target-node to be onlined so that the
     newly added memory range can be uniquely referenced by numa apis"

NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.

And apparently the PMEM modules can cause that a lot more than regular
RAM.  The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.

Quoting Dan from another email:
 "The exposure can be reduced in the volatile-RAM case by scanning for
  and clearing errors before it is onlined as RAM. The userspace tooling
  for that can be in place before v5.1-final. There's also runtime
  notifications of errors via acpi_nfit_uc_error_notify() from
  background scrubbers on the DIMM devices. With that mechanism the
  kernel could proactively clear newly discovered poison in the volatile
  case, but that would be additional development more suitable for v5.2.

  I understand the concern, and the need to highlight this issue by
  tapping the brakes on feature development, but I don't see PMEM as RAM
  making the situation worse when the exposure is also there via DAX in
  the PMEM case. Volatile-RAM is arguably a safer use case since it's
  possible to repair pages where the persistent case needs active
  application coordination"

* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: "Hotplug" persistent memory for use like normal RAM
  mm/resource: Let walk_system_ram_range() search child resources
  mm/memory-hotplug: Allow memory resources to be children
  mm/resource: Move HMM pr_debug() deeper into resource code
  mm/resource: Return real error codes from walk failures
  device-dax: Add a 'modalias' attribute to DAX 'bus' devices
  device-dax: Add a 'target_node' attribute
  device-dax: Auto-bind device after successful new_id
  acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
  device-dax: Add /sys/class/dax backwards compatibility
  device-dax: Add support for a dax override driver
  device-dax: Move resource pinning+mapping into the common driver
  device-dax: Introduce bus + driver model
  device-dax: Start defining a dax bus model
  device-dax: Remove multi-resource infrastructure
  device-dax: Kill dax_region base
  device-dax: Kill dax_region ida
</content>
</entry>
<entry>
<title>device-dax: "Hotplug" persistent memory for use like normal RAM</title>
<updated>2019-02-28T18:41:23Z</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@linux.intel.com</email>
</author>
<published>2019-02-25T18:57:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c221c0b0308fd01d9fb33a16f64d2fd95f8830a4'/>
<id>urn:sha1:c221c0b0308fd01d9fb33a16f64d2fd95f8830a4</id>
<content type='text'>
This is intended for use with NVDIMMs that are physically persistent
(physically like flash) so that they can be used as a cost-effective
RAM replacement.  Intel Optane DC persistent memory is one
implementation of this kind of NVDIMM.

Currently, a persistent memory region is "owned" by a device driver,
either the "Direct DAX" or "Filesystem DAX" drivers.  These drivers
allow applications to explicitly use persistent memory, generally
by being modified to use special, new libraries. (DIMM-based
persistent memory hardware/software is described in great detail
here: Documentation/nvdimm/nvdimm.txt).

However, this limits persistent memory use to applications which
*have* been modified.  To make it more broadly usable, this driver
"hotplugs" memory into the kernel, to be managed and used just like
normal RAM would be.

To make this work, management software must remove the device from
being controlled by the "Device DAX" infrastructure:

	echo dax0.0 &gt; /sys/bus/dax/drivers/device_dax/unbind

and then tell the new driver that it can bind to the device:

	echo dax0.0 &gt; /sys/bus/dax/drivers/kmem/new_id

After this, there will be a number of new memory sections visible
in sysfs that can be onlined, or that may get onlined by existing
udev-initiated memory hotplug rules.

This rebinding procedure is currently a one-way trip.  Once memory
is bound to "kmem", it's there permanently and can not be
unbound and assigned back to device_dax.

The kmem driver will never bind to a dax device unless the device
is *explicitly* bound to the driver.  There are two reasons for
this: One, since it is a one-way trip, it can not be undone if
bound incorrectly.  Two, the kmem driver destroys data on the
device.  Think of if you had good data on a pmem device.  It
would be catastrophic if you compile-in "kmem", but leave out
the "device_dax" driver.  kmem would take over the device and
write volatile data all over your good data.

This inherits any existing NUMA information for the newly-added
memory from the persistent memory device that came from the
firmware.  On Intel platforms, the firmware has guarantees that
require each socket's persistent memory to be in a separate
memory-only NUMA node.  That means that this patch is not expected
to create NUMA nodes, but will simply hotplug memory into existing
nodes.

Because NUMA nodes are created, the existing NUMA APIs and tools
are sufficient to create policies for applications or memory areas
to have affinity for or an aversion to using this memory.

There is currently some metadata at the beginning of pmem regions.
The section-size memory hotplug restrictions, plus this small
reserved area can cause the "loss" of a section or two of capacity.
This should be fixable in follow-on patches.  But, as a first step,
losing 256MB of memory (worst case) out of hundreds of gigabytes
is a good tradeoff vs. the required code to fix this up precisely.
This calculation is also the reason we export
memory_block_size_bytes().

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Ross Zwisler &lt;zwisler@kernel.org&gt;
Cc: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Cc: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: linux-nvdimm@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Yaowei Bai &lt;baiyaowei@cmss.chinamobile.com&gt;
Cc: Takashi Iwai &lt;tiwai@suse.de&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Reviewed-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>device-dax: Add a 'modalias' attribute to DAX 'bus' devices</title>
<updated>2019-02-28T05:03:48Z</updated>
<author>
<name>Vishal Verma</name>
<email>vishal.l.verma@intel.com</email>
</author>
<published>2019-02-22T23:58:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=c347bd71dcdb2d0ac8b3a771486584dca8c8dd80'/>
<id>urn:sha1:c347bd71dcdb2d0ac8b3a771486584dca8c8dd80</id>
<content type='text'>
Add a 'modalias' attribute to devices under the DAX bus so that userspace
is able to dynamically load modules as needed.

Normally, udev can get the modalias from 'uevent', and that is correctly
set up by the DAX bus. However other tooling such as 'libndctl' for
interacting with drivers/nvdimm/, and 'libdaxctl' for drivers/dax/ can
also use the modalias to dynamically load modules via libkmod lookups.

The 'nd' bus set up by the libnvdimm subsystem exports a modalias
attribute. Imitate this to export the same for the 'dax' bus.

Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>dax: Check the end of the block-device capacity with dax_direct_access()</title>
<updated>2019-02-21T05:12:50Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2019-02-21T05:12:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ad428cdb525a97d15c0349fdc80f3d58befb50df'/>
<id>urn:sha1:ad428cdb525a97d15c0349fdc80f3d58befb50df</id>
<content type='text'>
The checks in __bdev_dax_supported() helped mitigate a potential data
corruption bug in the pmem driver's handling of section alignment
padding. Strengthen the checks, including checking the end of the range,
to validate the dev_pagemap, Xarray entries, and sector-to-pfn
translation established for pmem namespaces.

Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: "Darrick J. Wong" &lt;darrick.wong@oracle.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;</content>
</entry>
<entry>
<title>device-dax: Add a 'target_node' attribute</title>
<updated>2019-02-20T19:39:36Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2019-02-20T19:39:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=21c75763a3ae18679e5c4e2260aa9379b073566b'/>
<id>urn:sha1:21c75763a3ae18679e5c4e2260aa9379b073566b</id>
<content type='text'>
The target-node attribute is the Linux numa-node that a device-dax
instance may create when it is online. Prior to being online the
device's 'numa_node' property reflects the closest online cpu node which
is the typical expectation of a device 'numa_node'. Once it is online it
becomes its own distinct numa node, i.e. 'target_node'.

Export the 'target_node' property to give userspace tooling the ability
to predict the effective numa-node from a device-dax instance configured
to provide 'System RAM' capacity.

Cc: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Reported-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>device-dax: Auto-bind device after successful new_id</title>
<updated>2019-01-24T21:12:04Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2019-01-24T21:12:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=664525b2d84abca1074c9546654ae9689de8a818'/>
<id>urn:sha1:664525b2d84abca1074c9546654ae9689de8a818</id>
<content type='text'>
The typical 'new_id' attribute behavior is to immediately attach a
device to its driver after a new device-id is added. Implement this
behavior for the dax bus.

Reported-by: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Reported-by: Brice Goglin &lt;Brice.Goglin@inria.fr&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node</title>
<updated>2019-01-07T05:41:57Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2018-11-09T20:43:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8fc5c73554db0ac18c0c6ac5b2099ab917f83bdf'/>
<id>urn:sha1:8fc5c73554db0ac18c0c6ac5b2099ab917f83bdf</id>
<content type='text'>
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
Interface Table), is the first known instance of a memory range
described by a unique "target" proximity domain. Where "initiator" and
"target" proximity domains is an approach that the ACPI HMAT
(Heterogeneous Memory Attributes Table) uses to described the unique
performance properties of a memory range relative to a given initiator
(e.g. CPU or DMA device).

Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
char-device follows the traditional notion of 'numa-node' where the
attribute conveys the closest online numa-node. That numa-node attribute
is useful for cpu-binding and memory-binding processes *near* the
device. However, when the memory range backing a 'pmem', or 'dax' device
is onlined (memory hot-add) the memory-only-numa-node representing that
address needs to be differentiated from the set of online nodes. In
other words, the numa-node association of the device depends on whether
you can bind processes *near* the cpu-numa-node in the offline
device-case, or bind process *on* the memory-range directly after the
backing address range is onlined.

Allow for the case that platform firmware describes persistent memory
with a unique proximity domain, i.e. when it is distinct from the
proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
numa-node translation of that proximity through the libnvdimm region
device to namespaces that are in device-dax mode. With this in place the
proposed kmem driver [1] can optionally discover a unique numa-node
number for the address range as it transitions the memory from an
offline state managed by a device-driver to an online memory range
managed by the core-mm.

[1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com

Reported-by: Fan Du &lt;fan.du@intel.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Oliver O'Halloran" &lt;oohall@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Jérôme Glisse &lt;jglisse@redhat.com&gt;
Reviewed-by: Yang Shi &lt;yang.shi@linux.alibaba.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>device-dax: Add /sys/class/dax backwards compatibility</title>
<updated>2019-01-07T05:41:57Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2017-07-16T20:51:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=730926c3b0998943654019f00296cf8e3b02277e'/>
<id>urn:sha1:730926c3b0998943654019f00296cf8e3b02277e</id>
<content type='text'>
On the expectation that some environments may not upgrade libdaxctl
(userspace component that depends on the /sys/class/dax hierarchy),
provide a default / legacy dax_pmem_compat driver. The dax_pmem_compat
driver implements the original /sys/class/dax sysfs layout rather than
/sys/bus/dax. When userspace is upgraded it can blacklist this module
and switch to the dax_pmem driver going forward.

CONFIG_DEV_DAX_PMEM_COMPAT and supporting code will be deleted according
to the dax_pmem entry in Documentation/ABI/obsolete/.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
<entry>
<title>device-dax: Add support for a dax override driver</title>
<updated>2019-01-07T05:41:55Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2018-11-07T23:31:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d200781ef237a354d918ceff5cee350d88a93d42'/>
<id>urn:sha1:d200781ef237a354d918ceff5cee350d88a93d42</id>
<content type='text'>
Introduce the 'new_id' concept for enabling a custom device-driver attach
policy for dax-bus drivers. The intended use is to have a mechanism for
hot-plugging device-dax ranges into the page allocator on-demand. With
this in place the default policy of using device-dax for performance
differentiated memory can be overridden by user-space policy that can
arrange for the memory range to be managed as 'System RAM' with
user-defined NUMA and other performance attributes.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;</content>
</entry>
<entry>
<title>device-dax: Move resource pinning+mapping into the common driver</title>
<updated>2019-01-07T05:26:21Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2018-10-29T22:52:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=89ec9f2cfa36cc5fca2fb445ed221bb9add7b536'/>
<id>urn:sha1:89ec9f2cfa36cc5fca2fb445ed221bb9add7b536</id>
<content type='text'>
Move the responsibility of calling devm_request_resource() and
devm_memremap_pages() into the common device-dax driver. This is another
preparatory step to allowing an alternate personality driver for a
device-dax range.

Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;</content>
</entry>
</feed>
