<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/net/ceph, branch v5.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.0</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2019-02-18T17:05:33Z</updated>
<entry>
<title>libceph: handle an empty authorize reply</title>
<updated>2019-02-18T17:05:33Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-02-05T19:30:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=0fd3fd0a9bb0b02b6435bb7070e9f7b82a23f068'/>
<id>urn:sha1:0fd3fd0a9bb0b02b6435bb7070e9f7b82a23f068</id>
<content type='text'>
The authorize reply can be empty, for example when the ticket used to
build the authorizer is too old and TAG_BADAUTHORIZER is returned from
the service.  Calling -&gt;verify_authorizer_reply() results in an attempt
to decrypt and validate (somewhat) random data in au-&gt;buf (most likely
the signature block from calc_signature()), which fails and ends up in
con_fault_finish() with !con-&gt;auth_retry.  The ticket isn't invalidated
and the connection is retried again and again until a new ticket is
obtained from the monitor:

  libceph: osd2 192.168.122.1:6809 bad authorize reply
  libceph: osd2 192.168.122.1:6809 bad authorize reply
  libceph: osd2 192.168.122.1:6809 bad authorize reply
  libceph: osd2 192.168.122.1:6809 bad authorize reply

Let TAG_BADAUTHORIZER handler kick in and increment con-&gt;auth_retry.

Cc: stable@vger.kernel.org
Fixes: 5c056fdc5b47 ("libceph: verify authorize reply on connect")
Link: https://tracker.ceph.com/issues/20164
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Sage Weil &lt;sage@redhat.com&gt;
</content>
</entry>
<entry>
<title>libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()</title>
<updated>2019-01-21T13:53:12Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2019-01-14T20:13:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=4aac9228d16458cedcfd90c7fb37211cf3653ac3'/>
<id>urn:sha1:4aac9228d16458cedcfd90c7fb37211cf3653ac3</id>
<content type='text'>
con_fault() can transition the connection into STANDBY right after
ceph_con_keepalive() clears STANDBY in clear_standby():

    libceph user thread               ceph-msgr worker

ceph_con_keepalive()
  mutex_lock(&amp;con-&gt;mutex)
  clear_standby(con)
  mutex_unlock(&amp;con-&gt;mutex)
                                mutex_lock(&amp;con-&gt;mutex)
                                con_fault()
                                  ...
                                  if KEEPALIVE_PENDING isn't set
                                    set state to STANDBY
                                  ...
                                mutex_unlock(&amp;con-&gt;mutex)
  set KEEPALIVE_PENDING
  set WRITE_PENDING

This triggers warnings in clear_standby() when either ceph_con_send()
or ceph_con_keepalive() get to clearing STANDBY next time.

I don't see a reason to condition queue_con() call on the previous
value of KEEPALIVE_PENDING, so move the setting of KEEPALIVE_PENDING
into the critical section -- unlike WRITE_PENDING, KEEPALIVE_PENDING
could have been a non-atomic flag.

Reported-by: syzbot+acdeb633f6211ccdf886@syzkaller.appspotmail.com
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Tested-by: Myungho Jung &lt;mhjungk@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: allow setting abort_on_full for rbd</title>
<updated>2019-01-07T21:47:48Z</updated>
<author>
<name>Dongsheng Yang</name>
<email>dongsheng.yang@easystack.cn</email>
</author>
<published>2018-12-18T09:31:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=02b2f549d502b46e68b97ea1452fb8853b3327dd'/>
<id>urn:sha1:02b2f549d502b46e68b97ea1452fb8853b3327dd</id>
<content type='text'>
Introduce a new option abort_on_full, default to false. Then
we can get -ENOSPC when the pool is full, or reaches quota.

[ Don't show abort_on_full in /proc/mounts. ]

Signed-off-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
Reviewed-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: switch more to bool in ceph_tcp_sendmsg()</title>
<updated>2018-12-26T14:56:04Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2018-11-21T17:56:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=87349cdad963163b55cf7d327f5d47a647339838'/>
<id>urn:sha1:87349cdad963163b55cf7d327f5d47a647339838</id>
<content type='text'>
Unlike in ceph_tcp_sendpage(), it's a bool.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: use MSG_SENDPAGE_NOTLAST with ceph_tcp_sendpage()</title>
<updated>2018-12-26T14:56:04Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2018-11-20T14:44:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=433b0a12953bc1dfcb52febb186136395a65aad0'/>
<id>urn:sha1:433b0a12953bc1dfcb52febb186136395a65aad0</id>
<content type='text'>
Prevent do_tcp_sendpages() from calling tcp_push() (at least) once per
page.  Instead, arrange for tcp_push() to be called (at least) once per
data payload.  This results in more MSS-sized packets and fewer packets
overall (5-10% reduction in my tests with typical OSD request sizes).
See commits 2f5338442425 ("tcp: allow splice() to build full TSO
packets"), 35f9c09fe9c7 ("tcp: tcp_sendpages() should call tcp_push()
once") and ae62ca7b0321 ("tcp: fix MSG_SENDPAGE_NOTLAST logic") for
details.

Here is an example of a packet size histogram for 128K OSD requests
(MSS = 1448, top 5):

Before:

     SIZE    COUNT
     1448   777700
      952   127915
     1200    39238
     1219     9806
       21     5675

After:

     SIZE    COUNT
     1448   897280
       21     6201
     1019     2797
      643     2739
      376     2479

We could do slightly better by explicitly corking the socket but it's
not clear it's worth it.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: use sock_no_sendpage() as a fallback in ceph_tcp_sendpage()</title>
<updated>2018-12-26T14:56:04Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2018-11-16T10:58:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3239eb5215ebdef593a79316c9dbbdf8849166ec'/>
<id>urn:sha1:3239eb5215ebdef593a79316c9dbbdf8849166ec</id>
<content type='text'>
sock_no_sendpage() makes the code cleaner.

Also, don't set MSG_EOR.  sendpage doesn't act on MSG_EOR on its own,
it just honors the setting from the preceding sendmsg call by looking
at -&gt;eor in tcp_skb_can_collapse_to().

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: drop last_piece logic from write_partial_message_data()</title>
<updated>2018-12-26T14:56:04Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2018-11-14T11:24:01Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=1f6b821aef78e3d79e8d598ae59fc7e23fb6c563'/>
<id>urn:sha1:1f6b821aef78e3d79e8d598ae59fc7e23fb6c563</id>
<content type='text'>
last_piece is for the last piece in the current data item, not in the
entire data payload of the message.  This is harmful for messages with
multiple data items.  On top of that, we don't need to signal the end
of a data payload either because it is always followed by a footer.

We used to signal "more" unconditionally, until commit fe38a2b67bc6
("libceph: start defining message data cursor").  Part of a large
series, it introduced cursor-&gt;last_piece and also mistakenly inverted
the hint by passing last_piece for "more".  This was corrected with
commit c2cfa1940097 ("libceph: Fix ceph_tcp_sendpage()'s more boolean
usage").

As it is, last_piece is not helping at all: because Nagle algorithm is
disabled, for a simple message with two 512-byte data items we end up
emitting three packets: front + first data item, second data item and
footer.  Go back to the original pre-fe38a2b67bc6 behavior -- a single
packet in most cases.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>libceph: fall back to sendmsg for slab pages</title>
<updated>2018-11-19T16:59:47Z</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2018-11-08T14:55:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7e241f647dc7087a0401418a187f3f5b527cc690'/>
<id>urn:sha1:7e241f647dc7087a0401418a187f3f5b527cc690</id>
<content type='text'>
skb_can_coalesce() allows coalescing neighboring slab objects into
a single frag:

  return page == skb_frag_page(frag) &amp;&amp;
         off == frag-&gt;page_offset + skb_frag_size(frag);

ceph_tcp_sendpage() can be handed slab pages.  One example of this is
XFS: it passes down sector sized slab objects for its metadata I/O.  If
the kernel client is co-located on the OSD node, the skb may go through
loopback and pop on the receive side with the exact same set of frags.
When tcp_recvmsg() attempts to copy out such a frag, hardened usercopy
complains because the size exceeds the object's allocated size:

  usercopy: kernel memory exposure attempt detected from ffff9ba917f20a00 (kmalloc-512) (1024 bytes)

Although skb_can_coalesce() could be taught to return false if the
resulting frag would cross a slab object boundary, we already have
a fallback for non-refcounted pages.  Utilize it for slab pages too.

Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2018-11-02T02:58:52Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-11-02T02:58:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9931a07d518e86eb58a75e508ed9626f86359303'/>
<id>urn:sha1:9931a07d518e86eb58a75e508ed9626f86359303</id>
<content type='text'>
Pull AFS updates from Al Viro:
 "AFS series, with some iov_iter bits included"

* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  missing bits of "iov_iter: Separate type from direction and use accessor functions"
  afs: Probe multiple fileservers simultaneously
  afs: Fix callback handling
  afs: Eliminate the address pointer from the address list cursor
  afs: Allow dumping of server cursor on operation failure
  afs: Implement YFS support in the fs client
  afs: Expand data structure fields to support YFS
  afs: Get the target vnode in afs_rmdir() and get a callback on it
  afs: Calc callback expiry in op reply delivery
  afs: Fix FS.FetchStatus delivery from updating wrong vnode
  afs: Implement the YFS cache manager service
  afs: Remove callback details from afs_callback_break struct
  afs: Commit the status on a new file/dir/symlink
  afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
  afs: Don't invoke the server to read data beyond EOF
  afs: Add a couple of tracepoints to log I/O errors
  afs: Handle EIO from delivery function
  afs: Fix TTL on VL server and address lists
  afs: Implement VL server rotation
  afs: Improve FS server rotation error handling
  ...
</content>
</entry>
<entry>
<title>Merge tag 'ceph-for-4.20-rc1' of git://github.com/ceph/ceph-client</title>
<updated>2018-10-31T21:42:31Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-10-31T21:42:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=31990f0f5366a8f66688edae8688723b22034108'/>
<id>urn:sha1:31990f0f5366a8f66688edae8688723b22034108</id>
<content type='text'>
Pull ceph updates from Ilya Dryomov:
 "The highlights are:

   - a series that fixes some old memory allocation issues in libceph
     (myself). We no longer allocate memory in places where allocation
     failures cannot be handled and BUG when the allocation fails.

   - support for copy_file_range() syscall (Luis Henriques). If size and
     alignment conditions are met, it leverages RADOS copy-from
     operation. Otherwise, a local copy is performed.

   - a patch that reduces memory requirement of ceph_sync_read() from
     the size of the entire read to the size of one object (Zheng Yan).

   - fallocate() syscall is now restricted to FALLOC_FL_PUNCH_HOLE (Luis
     Henriques)"

* tag 'ceph-for-4.20-rc1' of git://github.com/ceph/ceph-client: (25 commits)
  ceph: new mount option to disable usage of copy-from op
  ceph: support copy_file_range file operation
  libceph: support the RADOS copy-from operation
  ceph: add non-blocking parameter to ceph_try_get_caps()
  libceph: check reply num_data_items in setup_request_data()
  libceph: preallocate message data items
  libceph, rbd, ceph: move ceph_osdc_alloc_messages() calls
  libceph: introduce alloc_watch_request()
  libceph: assign cookies in linger_submit()
  libceph: enable fallback to ceph_msg_new() in ceph_msgpool_get()
  ceph: num_ops is off by one in ceph_aio_retry_work()
  libceph: no need to call osd_req_opcode_valid() in osd_req_encode_op()
  ceph: set timeout conditionally in __cap_delay_requeue
  libceph: don't consume a ref on pagelist in ceph_msg_data_add_pagelist()
  libceph: introduce ceph_pagelist_alloc()
  libceph: osd_req_op_cls_init() doesn't need to take opcode
  libceph: bump CEPH_MSG_MAX_DATA_LEN
  ceph: only allow punch hole mode in fallocate
  ceph: refactor ceph_sync_read()
  ceph: check if LOOKUPNAME request was aborted when filling trace
  ...
</content>
</entry>
</feed>
