<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/fs/fuse, branch v5.11</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v5.11</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v5.11'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2020-12-10T14:33:14Z</updated>
<entry>
<title>fuse: fix bad inode</title>
<updated>2020-12-10T14:33:14Z</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2020-12-10T14:33:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5d069dbe8aaf2a197142558b6fb2978189ba3454'/>
<id>urn:sha1:5d069dbe8aaf2a197142558b6fb2978189ba3454</id>
<content type='text'>
Jan Kara's analysis of the syzbot report (edited):

  The reproducer opens a directory on FUSE filesystem, it then attaches
  dnotify mark to the open directory.  After that a fuse_do_getattr() call
  finds that attributes returned by the server are inconsistent, and calls
  make_bad_inode() which, among other things does:

          inode-&gt;i_mode = S_IFREG;

  This then confuses dnotify which doesn't tear down its structures
  properly and eventually crashes.

Avoid calling make_bad_inode() on a live inode: switch to a private flag on
the fuse inode.  Also add the test to ops which the bad_inode_ops would
have caught.

This bug goes back to the initial merge of fuse in 2.6.14...

Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Tested-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;
</content>
</entry>
<entry>
<title>fuse: support SB_NOSEC flag to improve write performance</title>
<updated>2020-11-11T16:22:33Z</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2020-10-09T18:15:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9d769e6aa2524e1762e3b8681e0ed78f8acf6cad'/>
<id>urn:sha1:9d769e6aa2524e1762e3b8681e0ed78f8acf6cad</id>
<content type='text'>
Virtiofs can be slow with small writes if xattr are enabled and we are
doing cached writes (No direct I/O).  Ganesh Mahalingam noticed this.

Some debugging showed that file_remove_privs() is called in cached write
path on every write.  And everytime it calls security_inode_need_killpriv()
which results in call to __vfs_getxattr(XATTR_NAME_CAPS).  And this goes to
file server to fetch xattr.  This extra round trip for every write slows
down writes tremendously.

Normally to avoid paying this penalty on every write, vfs has the notion of
caching this information in inode (S_NOSEC).  So vfs sets S_NOSEC, if
filesystem opted for it using super block flag SB_NOSEC.  And S_NOSEC is
cleared when setuid/setgid bit is set or when security xattr is set on
inode so that next time a write happens, we check inode again for clearing
setuid/setgid bits as well clear any security.capability xattr.

This seems to work well for local file systems but for remote file systems
it is possible that VFS does not have full picture and a different client
sets setuid/setgid bit or security.capability xattr on file and that means
VFS information about S_NOSEC on another client will be stale.  So for
remote filesystems SB_NOSEC was disabled by default.

Commit 9e1f1de02c22 ("more conservative S_NOSEC handling") mentioned that
these filesystems can still make use of SB_NOSEC as long as they clear
S_NOSEC when they are refreshing inode attriutes from server.

So this patch tries to enable SB_NOSEC on fuse (regular fuse as well as
virtiofs).  And clear SB_NOSEC when we are refreshing inode attributes.

This is enabled only if server supports FUSE_HANDLE_KILLPRIV_V2.  This says
that server will clear setuid/setgid/security.capability on
chown/truncate/write as apporpriate.

This should provide tighter coherency because now suid/sgid/
security.capability will be cleared even if fuse client cache has not seen
these attrs.

Basic idea is that fuse client will trigger suid/sgid/security.capability
clearing based on its attr cache.  But even if cache has gone stale, it is
fine because FUSE_HANDLE_KILLPRIV_V2 will make sure WRITE clear
suid/sgid/security.capability.

We make this change only if server supports FUSE_HANDLE_KILLPRIV_V2.  This
should make sure that existing filesystems which might be relying on
seucurity.capability always being queried from server are not impacted.

This tighter coherency relies on WRITE showing up on server (and not being
cached in guest).  So writeback_cache mode will not provide that tight
coherency and it is not recommended to use two together.  Having said that
it might work reasonably well for lot of use cases.

This change improves random write performance very significantly.  Running
virtiofsd with cache=auto and following fio command:

fio --ioengine=libaio --direct=1  --name=test --filename=/mnt/virtiofs/random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randwrite

Bandwidth increases from around 50MB/s to around 250MB/s as a result of
applying this patch.  So improvement is very significant.

Link: https://github.com/kata-containers/runtime/issues/2815
Reported-by: "Mahalingam, Ganesh" &lt;ganesh.mahalingam@intel.com&gt;
Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>fuse: add a flag FUSE_OPEN_KILL_SUIDGID for open() request</title>
<updated>2020-11-11T16:22:33Z</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2020-10-09T18:15:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=643a666a89c358ef588d2b3ef9f2dc1efc421e61'/>
<id>urn:sha1:643a666a89c358ef588d2b3ef9f2dc1efc421e61</id>
<content type='text'>
With FUSE_HANDLE_KILLPRIV_V2 support, server will need to kill suid/sgid/
security.capability on open(O_TRUNC), if server supports
FUSE_ATOMIC_O_TRUNC.

But server needs to kill suid/sgid only if caller does not have CAP_FSETID.
Given server does not have this information, client needs to send this info
to server.

So add a flag FUSE_OPEN_KILL_SUIDGID to fuse_open_in request which tells
server to kill suid/sgid (only if group execute is set).

This flag is added to the FUSE_OPEN request, as well as the FUSE_CREATE
request if the create was non-exclusive, since that might result in an
existing file being opened/truncated.

Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>fuse: don't send ATTR_MODE to kill suid/sgid for handle_killpriv_v2</title>
<updated>2020-11-11T16:22:33Z</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2020-10-09T18:15:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=8981bdfda7445af5d5a8c277c923bf91873a0c98'/>
<id>urn:sha1:8981bdfda7445af5d5a8c277c923bf91873a0c98</id>
<content type='text'>
If client does a write() on a suid/sgid file, VFS will first call
fuse_setattr() with ATTR_KILL_S[UG]ID set.  This requires sending setattr
to file server with ATTR_MODE set to kill suid/sgid.  But to do that client
needs to know latest mode otherwise it is racy.

To reduce the race window, current code first call fuse_do_getattr() to get
latest -&gt;i_mode and then resets suid/sgid bits and sends rest to server
with setattr(ATTR_MODE).  This does not reduce the race completely but
narrows race window significantly.

With fc-&gt;handle_killpriv_v2 enabled, it should be possible to remove this
race completely.  Do not kill suid/sgid with ATTR_MODE at all.  It will be
killed by server when WRITE request is sent to server soon.  This is
similar to fc-&gt;handle_killpriv logic.  V2 is just more refined version of
protocol.  Hence this patch does not send ATTR_MODE to kill suid/sgid if
fc-&gt;handle_killpriv_v2 is enabled.

This creates an issue if fc-&gt;writeback_cache is enabled.  In that case
WRITE can be cached in guest and server might not see WRITE request and
hence will not kill suid/sgid.  Miklos suggested that in such cases, we
should fallback to a writethrough WRITE instead and that will generate
WRITE request and kill suid/sgid.  This patch implements that too.

But this relies on client seeing the suid/sgid set.  If another client sets
suid/sgid and this client does not see it immideately, then we will not
fallback to writethrough WRITE.  So this is one limitation with both
fc-&gt;handle_killpriv_v2 and fc-&gt;writeback_cache enabled.  Both the options
are not fully compatible.  But might be good enough for many use cases.

Note: This patch is not checking whether security.capability is set or not
      when falling back to writethrough path.  If suid/sgid is not set and
      only security.capability is set, that will be taken care of by
      file_remove_privs() call in -&gt;writeback_cache path.

Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>fuse: setattr should set FATTR_KILL_SUIDGID</title>
<updated>2020-11-11T16:22:33Z</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2020-10-09T18:15:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3179216135ec09825d7c7875580951a6e69dc5df'/>
<id>urn:sha1:3179216135ec09825d7c7875580951a6e69dc5df</id>
<content type='text'>
If fc-&gt;handle_killpriv_v2 is enabled, we expect file server to clear
suid/sgid/security.capbility upon chown/truncate/write as appropriate.

Upon truncate (ATTR_SIZE), suid/sgid are cleared only if caller does not
have CAP_FSETID.  File server does not know whether caller has CAP_FSETID
or not.  Hence set FATTR_KILL_SUIDGID upon truncate to let file server know
that caller does not have CAP_FSETID and it should kill suid/sgid as
appropriate.

On chown (ATTR_UID/ATTR_GID) suid/sgid need to be cleared irrespective of
capabilities of calling process, so set FATTR_KILL_SUIDGID unconditionally
in that case.

Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>fuse: set FUSE_WRITE_KILL_SUIDGID in cached write path</title>
<updated>2020-11-11T16:22:33Z</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2020-10-09T18:15:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b866739596ae3c3c60c43f1cf04a516c5aa20fd1'/>
<id>urn:sha1:b866739596ae3c3c60c43f1cf04a516c5aa20fd1</id>
<content type='text'>
With HANDLE_KILLPRIV_V2, server will need to kill suid/sgid if caller does
not have CAP_FSETID.  We already have a flag FUSE_WRITE_KILL_SUIDGID in
WRITE request and we already set it in direct I/O path.

To make it work in cached write path also, start setting
FUSE_WRITE_KILL_SUIDGID in this path too.

Set it only if fc-&gt;handle_killpriv_v2 is set.  Otherwise client is
responsible for kill suid/sgid.

In case of direct I/O we set FUSE_WRITE_KILL_SUIDGID unconditionally
because we don't call file_remove_privs() in that path (with cache=none
option).

Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>fuse: rename FUSE_WRITE_KILL_PRIV to FUSE_WRITE_KILL_SUIDGID</title>
<updated>2020-11-11T16:22:32Z</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2020-11-11T16:22:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=10c52c84e3f4872689a64ac7666b34d67e630691'/>
<id>urn:sha1:10c52c84e3f4872689a64ac7666b34d67e630691</id>
<content type='text'>
Kernel has:
ATTR_KILL_PRIV -&gt; clear "security.capability"
ATTR_KILL_SUID -&gt; clear S_ISUID
ATTR_KILL_SGID -&gt; clear S_ISGID if executable

Fuse has:
FUSE_WRITE_KILL_PRIV -&gt; clear S_ISUID and S_ISGID if executable

So FUSE_WRITE_KILL_PRIV implies the complement of ATTR_KILL_PRIV, which is
somewhat confusing.  Also PRIV implies all privileges, including
"security.capability".

Change the name to FUSE_WRITE_KILL_SUIDGID and make FUSE_WRITE_KILL_PRIV an
alias to perserve API compatibility

Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>fuse: introduce the notion of FUSE_HANDLE_KILLPRIV_V2</title>
<updated>2020-11-11T16:22:32Z</updated>
<author>
<name>Vivek Goyal</name>
<email>vgoyal@redhat.com</email>
</author>
<published>2020-10-09T18:15:07Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=63f9909ff602082597849f684655e93336c50b11'/>
<id>urn:sha1:63f9909ff602082597849f684655e93336c50b11</id>
<content type='text'>
We already have FUSE_HANDLE_KILLPRIV flag that says that file server will
remove suid/sgid/caps on truncate/chown/write. But that's little different
from what Linux VFS implements.

To be consistent with Linux VFS behavior what we want is.

- caps are always cleared on chown/write/truncate
- suid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID.
- sgid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID as well as file has group execute
  permission.

As previous flag did not provide above semantics. Implement a V2 of the
protocol with above said constraints.

Server does not know if caller has CAP_FSETID or not. So for the case
of write()/truncate(), client will send information in special flag to
indicate whether to kill priviliges or not. These changes are in subsequent
patches.

FUSE_HANDLE_KILLPRIV_V2 relies on WRITE being sent to server to clear
suid/sgid/security.capability. But with -&gt;writeback_cache, WRITES are
cached in guest. So it is not recommended to use FUSE_HANDLE_KILLPRIV_V2
and writeback_cache together. Though it probably might be good enough
for lot of use cases.

Signed-off-by: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>fuse: always revalidate if exclusive create</title>
<updated>2020-11-11T16:22:32Z</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2020-11-11T16:22:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=df8629af293493757beccac2d3168fe5a315636e'/>
<id>urn:sha1:df8629af293493757beccac2d3168fe5a315636e</id>
<content type='text'>
Failure to do so may result in EEXIST even if the file only exists in the
cache and not in the filesystem.

The atomic nature of O_EXCL mandates that the cached state should be
ignored and existence verified anew.

Reported-by: Ken Schalk &lt;kschalk@nvidia.com&gt;
Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
<entry>
<title>virtiofs: clean up error handling in virtio_fs_get_tree()</title>
<updated>2020-11-11T16:22:32Z</updated>
<author>
<name>Miklos Szeredi</name>
<email>mszeredi@redhat.com</email>
</author>
<published>2020-11-11T16:22:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=833c5a42e28beeefa1f9bd476a63fe8050c1e8ca'/>
<id>urn:sha1:833c5a42e28beeefa1f9bd476a63fe8050c1e8ca</id>
<content type='text'>
Avoid duplicating error cleanup.

Signed-off-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
</content>
</entry>
</feed>
