<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ceph, branch v5.4</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.4</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.4'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-09-16T10:06:25Z</updated>
<entry>
<title>libceph: use ceph_kvmalloc() for osdmap arrays</title>
<updated>2019-09-16T10:06:25Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-09-04T13:40:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cf73d882cc51c1f245a890cccb79952a260302d3'/>
<id>urn:sha1:cf73d882cc51c1f245a890cccb79952a260302d3</id>
<content type='text'>
osdmap has a bunch of arrays that grow linearly with the number of
OSDs.  osd_state, osd_weight and osd_primary_affinity take 4 bytes per
OSD.  osd_addr takes 136 bytes per OSD because of sockaddr_storage.
The CRUSH workspace area also grows linearly with the number of OSDs.

Normally these arrays are allocated at client startup.  The osdmap is
usually updated in small incrementals, but once in a while a full map
may need to be processed.  For a cluster with 10000 OSDs, this means
a bunch of 40K allocations followed by a 1.3M allocation, all of which
are currently required to be physically contiguous.  This results in
sporadic ENOMEM errors, hanging the client.

Go back to manually (re)allocating arrays and use ceph_kvmalloc() to
fall back to non-contiguous allocation when necessary.

Link: https://tracker.ceph.com/issues/40481
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
<entry>
<title>libceph: avoid a __vmalloc() deadlock in ceph_kvmalloc()</title>
<updated>2019-09-16T10:06:25Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-08-30T18:44:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=10c12851a022662bf6085bd4384b4ebed4c447ce'/>
<id>urn:sha1:10c12851a022662bf6085bd4384b4ebed4c447ce</id>
<content type='text'>
The vmalloc allocator doesn't fully respect the specified gfp mask:
while the actual pages are allocated as requested, the page table pages
are always allocated with GFP_KERNEL.  ceph_kvmalloc() may be called
with GFP_NOFS and GFP_NOIO (for ceph and rbd respectively), so this may
result in a deadlock.

There is no real reason for the current PAGE_ALLOC_COSTLY_ORDER logic,
it's just something that seemed sensible at the time (ceph_kvmalloc()
predates kvmalloc()).  kvmalloc() is smarter: in an attempt to reduce
long term fragmentation, it first tries to kmalloc non-disruptively.

Switch to kvmalloc() and set the respective PF_MEMALLOC_* flag using
the scope API to avoid the deadlock.  Note that kvmalloc() needs to be
passed GFP_KERNEL to enable the fallback.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
<entry>
<title>libceph: drop unused con parameter of calc_target()</title>
<updated>2019-09-16T10:06:25Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-08-21T11:57:18Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8edf84ba4d0ecad7d1c5b393ba004b959dae76ea'/>
<id>urn:sha1:8edf84ba4d0ecad7d1c5b393ba004b959dae76ea</id>
<content type='text'>
This bit was omitted from a561372405cf ("libceph: fix PG split vs OSD
(re)connect race") to avoid backport conflicts.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: handle OSD op ceph_pagelist_append() errors</title>
<updated>2019-09-16T10:06:25Z</updated>
<author>
<name>David Disseldorp</name>
<email>ddiss@suse.de</email>
</author>
<published>2019-07-03T15:59:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4766815b1179dbe4fe263a5f95795710e29276e2'/>
<id>urn:sha1:4766815b1179dbe4fe263a5f95795710e29276e2</id>
<content type='text'>
osd_req_op_cls_init() and osd_req_op_xattr_init() currently propagate
ceph_pagelist_alloc() ENOMEM errors but ignore ceph_pagelist_append()
memory allocation failures. Add these checks and cleanup on error.

Signed-off-by: David Disseldorp &lt;ddiss@suse.de&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: add function that clears osd client's abort_err</title>
<updated>2019-09-16T10:06:23Z</updated>
<author>
<name>Yan, Zheng</name>
<email>zyan@redhat.com</email>
</author>
<published>2019-07-25T12:16:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2cef0ba8032ce0df3f29621921fe7d6372f68d3e'/>
<id>urn:sha1:2cef0ba8032ce0df3f29621921fe7d6372f68d3e</id>
<content type='text'>
Signed-off-by: "Yan, Zheng" &lt;zyan@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: add function that reset client's entity addr</title>
<updated>2019-09-16T10:06:23Z</updated>
<author>
<name>Yan, Zheng</name>
<email>zyan@redhat.com</email>
</author>
<published>2019-07-25T12:16:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=120a75ea9f4ba02f852171e75d44f29139b9c83e'/>
<id>urn:sha1:120a75ea9f4ba02f852171e75d44f29139b9c83e</id>
<content type='text'>
This function also re-open connections to OSD/MON, and re-send in-flight
OSD requests after re-opening connections to OSD.

Signed-off-by: "Yan, Zheng" &lt;zyan@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: don't call crypto_free_sync_skcipher() on a NULL tfm</title>
<updated>2019-08-28T10:33:46Z</updated>
<author>
<name>Jia-Ju Bai</name>
<email>baijiaju1990@gmail.com</email>
</author>
<published>2019-07-24T09:43:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e8c99200b4d117c340c392ebd5e62d85dfeed027'/>
<id>urn:sha1:e8c99200b4d117c340c392ebd5e62d85dfeed027</id>
<content type='text'>
In set_secret(), key-&gt;tfm is assigned to NULL on line 55, and then
ceph_crypto_key_destroy(key) is executed.

ceph_crypto_key_destroy(key)
  crypto_free_sync_skcipher(key-&gt;tfm)
    crypto_free_skcipher(&amp;tfm-&gt;base);

This happens to work because crypto_sync_skcipher is a trivial wrapper
around crypto_skcipher: &amp;tfm-&gt;base is still 0 and crypto_free_skcipher()
handles that.  Let's not rely on the layout of crypto_sync_skcipher.

This bug is found by a static analysis tool STCheck written by us.

Fixes: 69d6302b65a8 ("libceph: Remove VLA usage of skcipher").
Signed-off-by: Jia-Ju Bai &lt;baijiaju1990@gmail.com&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: fix PG split vs OSD (re)connect race</title>
<updated>2019-08-22T08:47:41Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-08-20T14:40:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=a561372405cf6bc6f14239b3a9e57bb39f2788b0'/>
<id>urn:sha1:a561372405cf6bc6f14239b3a9e57bb39f2788b0</id>
<content type='text'>
We can't rely on -&gt;peer_features in calc_target() because it may be
called both when the OSD session is established and open and when it's
not.  -&gt;peer_features is not valid unless the OSD session is open.  If
this happens on a PG split (pg_num increase), that could mean we don't
resend a request that should have been resent, hanging the client
indefinitely.

In userspace this was fixed by looking at require_osd_release and
get_xinfo[osd].features fields of the osdmap.  However these fields
belong to the OSD section of the osdmap, which the kernel doesn't
decode (only the client section is decoded).

Instead, let's drop this feature check.  It effectively checks for
luminous, so only pre-luminous OSDs would be affected in that on a PG
split the kernel might resend a request that should not have been
resent.  Duplicates can occur in other scenarios, so both sides should
already be prepared for them: see dup/replay logic on the OSD side and
retry_attempt check on the client side.

Cc: stable@vger.kernel.org
Fixes: 7de030d6b10a ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
Link: https://tracker.ceph.com/issues/41162
Reported-by: Jerry Lee &lt;leisurelysw24@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Tested-by: Jerry Lee &lt;leisurelysw24@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client</title>
<updated>2019-07-18T18:05:25Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-07-18T18:05:25Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=d9b9c893048e9d308a833619f0866f1f52778cf5'/>
<id>urn:sha1:d9b9c893048e9d308a833619f0866f1f52778cf5</id>
<content type='text'>
Pull ceph updates from Ilya Dryomov:
 "Lots of exciting things this time!

   - support for rbd object-map and fast-diff features (myself). This
     will speed up reads, discards and things like snap diffs on sparse
     images.

   - ceph.snap.btime vxattr to expose snapshot creation time (David
     Disseldorp). This will be used to integrate with "Restore Previous
     Versions" feature added in Windows 7 for folks who reexport ceph
     through SMB.

   - security xattrs for ceph (Zheng Yan). Only selinux is supported for
     now due to the limitations of -&gt;dentry_init_security().

   - support for MSG_ADDR2, FS_BTIME and FS_CHANGE_ATTR features (Jeff
     Layton). This is actually a single feature bit which was missing
     because of the filesystem pieces. With this in, the kernel client
     will finally be reported as "luminous" by "ceph features" -- it is
     still being reported as "jewel" even though all required Luminous
     features were implemented in 4.13.

   - stop NULL-terminating ceph vxattrs (Jeff Layton). The convention
     with xattrs is to not terminate and this was causing
     inconsistencies with ceph-fuse.

   - change filesystem time granularity from 1 us to 1 ns, again fixing
     an inconsistency with ceph-fuse (Luis Henriques).

  On top of this there are some additional dentry name handling and cap
  flushing fixes from Zheng. Finally, Jeff is formally taking over for
  Zheng as the filesystem maintainer"

* tag 'ceph-for-5.3-rc1' of git://github.com/ceph/ceph-client: (71 commits)
  ceph: fix end offset in truncate_inode_pages_range call
  ceph: use generic_delete_inode() for -&gt;drop_inode
  ceph: use ceph_evict_inode to cleanup inode's resource
  ceph: initialize superblock s_time_gran to 1
  MAINTAINERS: take over for Zheng as CephFS kernel client maintainer
  rbd: setallochint only if object doesn't exist
  rbd: support for object-map and fast-diff
  rbd: call rbd_dev_mapping_set() from rbd_dev_image_probe()
  libceph: export osd_req_op_data() macro
  libceph: change ceph_osdc_call() to take page vector for response
  libceph: bump CEPH_MSG_MAX_DATA_LEN (again)
  rbd: new exclusive lock wait/wake code
  rbd: quiescing lock should wait for image requests
  rbd: lock should be quiesced on reacquire
  rbd: introduce copyup state machine
  rbd: rename rbd_obj_setup_*() to rbd_obj_init_*()
  rbd: move OSD request allocation into object request state machines
  rbd: factor out __rbd_osd_setup_discard_ops()
  rbd: factor out rbd_osd_setup_copyup()
  rbd: introduce obj_req-&gt;osd_reqs list
  ...
</content>
</entry>
<entry>
<title>Merge tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core</title>
<updated>2019-07-12T19:24:03Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-07-12T19:24:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=f632a8170a6b667ee4e3f552087588f0fe13c4bb'/>
<id>urn:sha1:f632a8170a6b667ee4e3f552087588f0fe13c4bb</id>
<content type='text'>
Pull driver core and debugfs updates from Greg KH:
 "Here is the "big" driver core and debugfs changes for 5.3-rc1

  It's a lot of different patches, all across the tree due to some api
  changes and lots of debugfs cleanups.

  Other than the debugfs cleanups, in this set of changes we have:

   - bus iteration function cleanups

   - scripts/get_abi.pl tool to display and parse Documentation/ABI
     entries in a simple way

   - cleanups to Documenatation/ABI/ entries to make them parse easier
     due to typos and other minor things

   - default_attrs use for some ktype users

   - driver model documentation file conversions to .rst

   - compressed firmware file loading

   - deferred probe fixes

  All of these have been in linux-next for a while, with a bunch of
  merge issues that Stephen has been patient with me for"

* tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits)
  debugfs: make error message a bit more verbose
  orangefs: fix build warning from debugfs cleanup patch
  ubifs: fix build warning after debugfs cleanup patch
  driver: core: Allow subsystems to continue deferring probe
  drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT
  arch_topology: Remove error messages on out-of-memory conditions
  lib: notifier-error-inject: no need to check return value of debugfs_create functions
  swiotlb: no need to check return value of debugfs_create functions
  ceph: no need to check return value of debugfs_create functions
  sunrpc: no need to check return value of debugfs_create functions
  ubifs: no need to check return value of debugfs_create functions
  orangefs: no need to check return value of debugfs_create functions
  nfsd: no need to check return value of debugfs_create functions
  lib: 842: no need to check return value of debugfs_create functions
  debugfs: provide pr_fmt() macro
  debugfs: log errors when something goes wrong
  drivers: s390/cio: Fix compilation warning about const qualifiers
  drivers: Add generic helper to match by of_node
  driver_find_device: Unify the match function with class_find_device()
  bus_find_device: Unify the match callback with class_find_device
  ...
</content>
</entry>
</feed>
