<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/kernel/auditsc.c, branch v4.14</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v4.14</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v4.14'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2017-09-15T00:37:26Z</updated>
<entry>
<title>Merge branch 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2017-09-15T00:37:26Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-09-15T00:37:26Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=cc73fee0bae2d66594d1fa2df92bbd783aa98e04'/>
<id>urn:sha1:cc73fee0bae2d66594d1fa2df92bbd783aa98e04</id>
<content type='text'>
Pull ipc compat cleanup and 64-bit time_t from Al Viro:
 "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
  Dinamani"

* 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  utimes: Make utimes y2038 safe
  ipc: shm: Make shmid_kernel timestamps y2038 safe
  ipc: sem: Make sem_array timestamps y2038 safe
  ipc: msg: Make msg_queue timestamps y2038 safe
  ipc: mqueue: Replace timespec with timespec64
  ipc: Make sys_semtimedop() y2038 safe
  get rid of SYSVIPC_COMPAT on ia64
  semtimedop(): move compat to native
  shmat(2): move compat to native
  msgrcv(2), msgsnd(2): move compat to native
  ipc(2): move compat to native
  ipc: make use of compat ipc_perm helpers
  semctl(): move compat to native
  semctl(): separate all layout-dependent copyin/copyout
  msgctl(): move compat to native
  msgctl(): split the actual work from copyin/copyout
  ipc: move compat shmctl to native
  shmctl: split the work from copyin/copyout
</content>
</entry>
<entry>
<title>audit: update the function comments</title>
<updated>2017-09-05T13:46:59Z</updated>
<author>
<name>Geliang Tang</name>
<email>geliangtang@gmail.com</email>
</author>
<published>2017-08-07T13:44:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=196a5085592c62ffa4eb739d7ce49c040c2953a1'/>
<id>urn:sha1:196a5085592c62ffa4eb739d7ce49c040c2953a1</id>
<content type='text'>
Update the function comments to match the code.

Signed-off-by: Geliang Tang &lt;geliangtang@gmail.com&gt;
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</content>
</entry>
<entry>
<title>audit: Reduce overhead using a coarse clock</title>
<updated>2017-09-05T13:46:54Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2017-07-04T12:11:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=e832bf48c8e12f3b39e40fee35c4ea269d685875'/>
<id>urn:sha1:e832bf48c8e12f3b39e40fee35c4ea269d685875</id>
<content type='text'>
Commit 2115bb250f26 ("audit: Use timespec64 to represent audit timestamps")
noted that audit timestamps were not y2038 safe and used a 64-bit
timestamp. In itself, this makes sense but the conversion was from
CURRENT_TIME to ktime_get_real_ts64() which is a heavier call to record
an accurate timestamp which is required in some, but not all, cases. The
impact is that when auditd is running without any rules that all syscalls
have higher overhead. This is visible in the sysbench-thread benchmark as
a 11.5% performance hit. That benchmark is dumb as rocks but it's also
visible in redis as an 8-10% hit on all operations which is of greater
concern. It is somewhat stupid of audit to track syscalls without any
rules related to syscalls but that is how it behaves.

The overhead can be directly measured with perf comparing 4.9 with 4.12

4.9
     7.76%  sysbench         [kernel.vmlinux]    [k] __schedule
     7.62%  sysbench         [kernel.vmlinux]    [k] _raw_spin_lock
     7.37%  sysbench         libpthread-2.22.so  [.] __lll_lock_elision
     7.29%  sysbench         [kernel.vmlinux]    [.] syscall_return_via_sysret
     6.59%  sysbench         [kernel.vmlinux]    [k] native_sched_clock
     5.21%  sysbench         libc-2.22.so        [.] __sched_yield
     4.38%  sysbench         [kernel.vmlinux]    [k] entry_SYSCALL_64
     4.28%  sysbench         [kernel.vmlinux]    [k] do_syscall_64
     3.49%  sysbench         libpthread-2.22.so  [.] __lll_unlock_elision
     3.13%  sysbench         [kernel.vmlinux]    [k] __audit_syscall_exit
     2.87%  sysbench         [kernel.vmlinux]    [k] update_curr
     2.73%  sysbench         [kernel.vmlinux]    [k] pick_next_task_fair
     2.31%  sysbench         [kernel.vmlinux]    [k] syscall_trace_enter
     2.20%  sysbench         [kernel.vmlinux]    [k] __audit_syscall_entry
.....
     0.00%  swapper          [kernel.vmlinux]    [k] read_tsc

4.12
     7.84%  sysbench         [kernel.vmlinux]    [k] __schedule
     7.05%  sysbench         [kernel.vmlinux]    [k] _raw_spin_lock
     6.57%  sysbench         libpthread-2.22.so  [.] __lll_lock_elision
     6.50%  sysbench         [kernel.vmlinux]    [.] syscall_return_via_sysret
     5.95%  sysbench         [kernel.vmlinux]    [k] read_tsc
     5.71%  sysbench         [kernel.vmlinux]    [k] native_sched_clock
     4.78%  sysbench         libc-2.22.so        [.] __sched_yield
     4.30%  sysbench         [kernel.vmlinux]    [k] entry_SYSCALL_64
     3.94%  sysbench         [kernel.vmlinux]    [k] do_syscall_64
     3.37%  sysbench         libpthread-2.22.so  [.] __lll_unlock_elision
     3.32%  sysbench         [kernel.vmlinux]    [k] __audit_syscall_exit
     2.91%  sysbench         [kernel.vmlinux]    [k] __getnstimeofday64

Note the additional overhead from read_tsc which goes from 0% to 5.95%.
This is on a single-socket E3-1230 but similar overheads have been measured
on an older machine which the patch also eliminates.

The patch in question has no explanation as to why a fully-accurate timestamp
is required and is likely an oversight.  Using a coarser, but monotically
increasing, timestamp the overhead can be eliminated.  While it can be
worked around by configuring or disabling audit, it's tricky enough to
detect that a kernel fix is justified. With this patch, we see the following;

sysbenchthread
                              4.9.0                 4.12.0                 4.12.0
                            vanilla                vanilla            coarse-v1r1
Amean     1         1.49 (   0.00%)        1.66 ( -11.42%)        1.51 (  -1.34%)
Amean     3         1.48 (   0.00%)        1.65 ( -11.45%)        1.50 (  -0.96%)
Amean     5         1.49 (   0.00%)        1.67 ( -12.31%)        1.51 (  -1.83%)
Amean     7         1.49 (   0.00%)        1.66 ( -11.72%)        1.50 (  -0.67%)
Amean     12        1.48 (   0.00%)        1.65 ( -11.57%)        1.52 (  -2.89%)
Amean     16        1.49 (   0.00%)        1.65 ( -11.13%)        1.51 (  -1.73%)

The benchmark is reporting the time required for different thread counts to
lock/unlock a private mutex which, while dense, demonstrates the syscall
overhead. This is showing that 4.12 took a 11-12% hit but the overhead is
almost eliminated by the patch. While the variance is not reported here,
it's well within the noise with the patch applied.

Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Deepa Dinamani &lt;deepa.kernel@gmail.com&gt;
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</content>
</entry>
<entry>
<title>ipc: mqueue: Replace timespec with timespec64</title>
<updated>2017-09-04T00:21:24Z</updated>
<author>
<name>Deepa Dinamani</name>
<email>deepa.kernel@gmail.com</email>
</author>
<published>2017-08-03T02:51:11Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b9047726386bb538cf5e4f52a9f04eb556eebc67'/>
<id>urn:sha1:b9047726386bb538cf5e4f52a9f04eb556eebc67</id>
<content type='text'>
struct timespec is not y2038 safe. Replace
all uses of timespec by y2038 safe struct timespec64.

Even though timespec is used here to represent timeouts,
replace these with timespec64 so that it facilitates
in verification by creating a y2038 safe kernel image
that is free of timespec.

The syscall interfaces themselves are not changed as part
of the patch. They will be part of a different series.

Signed-off-by: Deepa Dinamani &lt;deepa.kernel@gmail.com&gt;
Cc: Paul Moore &lt;paul@paul-moore.com&gt;
Cc: Richard Guy Briggs &lt;rgb@redhat.com&gt;
Reviewed-by: Richard Guy Briggs &lt;rgb@redhat.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit</title>
<updated>2017-07-05T18:24:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-05T18:24:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7391786a64dcfe9c609a1f8e2204c1abf42ded23'/>
<id>urn:sha1:7391786a64dcfe9c609a1f8e2204c1abf42ded23</id>
<content type='text'>
Pull audit updates from Paul Moore:
 "Things are relatively quiet on the audit front for v4.13, just five
  patches for a total diffstat of 102 lines.

  There are two patches from Richard to consistently record the POSIX
  capabilities and add the ambient capability information as well.

  I also chipped in two patches to fix a race condition with the auditd
  tracking code and ensure we don't skip sending any records to the
  audit multicast group.

  Finally a single style fix that I accepted because I must have been in
  a good mood that day.

  Everything passes our test suite, and should be relatively harmless,
  please merge for v4.13"

* 'stable-4.13' of git://git.infradead.org/users/pcmoore/audit:
  audit: make sure we never skip the multicast broadcast
  audit: fix a race condition with the auditd tracking code
  audit: style fix
  audit: add ambient capabilities to CAPSET and BPRM_FCAPS records
  audit: unswing cap_* fields in PATH records
</content>
</entry>
<entry>
<title>audit: add ambient capabilities to CAPSET and BPRM_FCAPS records</title>
<updated>2017-05-30T21:36:11Z</updated>
<author>
<name>Richard Guy Briggs</name>
<email>rgb@redhat.com</email>
</author>
<published>2017-04-07T14:17:27Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7786f6b6dfc12d17eea2df04116de6ebac50c884'/>
<id>urn:sha1:7786f6b6dfc12d17eea2df04116de6ebac50c884</id>
<content type='text'>
Capabilities were augmented to include ambient capabilities in v4.3
commit 58319057b784 ("capabilities: ambient capabilities").

Add ambient capabilities to the audit BPRM_FCAPS and CAPSET records.

The record contains fields "old_pp", "old_pi", "old_pe", "new_pp",
"new_pi", "new_pe" so in keeping with the previous record
normalizations, change the "new_*" variants to simply drop the "new_"
prefix.

A sample of the replaced BPRM_FCAPS record:
RAW: type=BPRM_FCAPS msg=audit(1491468034.252:237): fver=2
fp=0000000000200000 fi=0000000000000000 fe=1 old_pp=0000000000000000
old_pi=0000000000000000 old_pe=0000000000000000 old_pa=0000000000000000
pp=0000000000200000 pi=0000000000000000 pe=0000000000200000
pa=0000000000000000

INTERPRET: type=BPRM_FCAPS msg=audit(04/06/2017 04:40:34.252:237):
fver=2 fp=sys_admin fi=none fe=chown old_pp=none old_pi=none
old_pe=none old_pa=none pp=sys_admin pi=none pe=sys_admin pa=none

A sample of the replaced CAPSET record:
RAW: type=CAPSET msg=audit(1491469502.371:242): pid=833
cap_pi=0000003fffffffff cap_pp=0000003fffffffff cap_pe=0000003fffffffff
cap_pa=0000000000000000

INTERPRET: type=CAPSET msg=audit(04/06/2017 05:05:02.371:242) : pid=833
cap_pi=chown,dac_override,dac_read_search,fowner,fsetid,kill,
setgid,setuid,setpcap,linux_immutable,net_bind_service,net_broadcast,
net_admin,net_raw,ipc_lock,ipc_owner,sys_module,sys_rawio,sys_chroot,
sys_ptrace,sys_pacct,sys_admin,sys_boot,sys_nice,sys_resource,sys_time,
sys_tty_config,mknod,lease,audit_write,audit_control,setfcap,
mac_override,mac_admin,syslog,wake_alarm,block_suspend,audit_read
cap_pp=chown,dac_override,dac_read_search,fowner,fsetid,kill,setgid,
setuid,setpcap,linux_immutable,net_bind_service,net_broadcast,
net_admin,net_raw,ipc_lock,ipc_owner,sys_module,sys_rawio,sys_chroot,
sys_ptrace,sys_pacct,sys_admin,sys_boot,sys_nice,sys_resource,
sys_time,sys_tty_config,mknod,lease,audit_write,audit_control,setfcap,
mac_override,mac_admin,syslog,wake_alarm,block_suspend,audit_read
cap_pe=chown,dac_override,dac_read_search,fowner,fsetid,kill,setgid,
setuid,setpcap,linux_immutable,net_bind_service,net_broadcast,
net_admin,net_raw,ipc_lock,ipc_owner,sys_module,sys_rawio,sys_chroot,
sys_ptrace,sys_pacct,sys_admin,sys_boot,sys_nice,sys_resource,
sys_time,sys_tty_config,mknod,lease,audit_write,audit_control,setfcap,
mac_override,mac_admin,syslog,wake_alarm,block_suspend,audit_read
cap_pa=none

See: https://github.com/linux-audit/audit-kernel/issues/40

Signed-off-by: Richard Guy Briggs &lt;rgb@redhat.com&gt;
Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs</title>
<updated>2017-05-03T18:05:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-05-03T18:05:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=5133cd7518758211e827481e7d5053333bb926f0'/>
<id>urn:sha1:5133cd7518758211e827481e7d5053333bb926f0</id>
<content type='text'>
Pull fsnotify updates from Jan Kara:
 "The branch contains mainly a rework of fsnotify infrastructure fixing
  a shortcoming that we have waited for response to fanotify permission
  events with SRCU read lock held and when the process consuming events
  was slow to respond the kernel has stalled.

  It also contains several cleanups of unnecessary indirections in
  fsnotify framework and a bugfix from Amir fixing leakage of kernel
  internal errno to userspace"

* 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (37 commits)
  fanotify: don't expose EOPENSTALE to userspace
  fsnotify: remove a stray unlock
  fsnotify: Move -&gt;free_mark callback to fsnotify_ops
  fsnotify: Add group pointer in fsnotify_init_mark()
  fsnotify: Drop inode_mark.c
  fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark()
  fsnotify: Remove fsnotify_detach_group_marks()
  fsnotify: Rename fsnotify_clear_marks_by_group_flags()
  fsnotify: Inline fsnotify_clear_{inode|vfsmount}_mark_group()
  fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask()
  fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked()
  fanotify: Release SRCU lock when waiting for userspace response
  fsnotify: Pass fsnotify_iter_info into handle_event handler
  fsnotify: Provide framework for dropping SRCU lock in -&gt;handle_event
  fsnotify: Remove special handling of mark destruction on group shutdown
  fsnotify: Detach mark from object list when last reference is dropped
  fsnotify: Move queueing of mark for destruction into fsnotify_put_mark()
  inotify: Do not drop mark reference under idr_lock
  fsnotify: Free fsnotify_mark_connector when there is no mark attached
  fsnotify: Lock object list with connector lock
  ...
</content>
</entry>
<entry>
<title>audit: Use timespec64 to represent audit timestamps</title>
<updated>2017-05-02T14:16:05Z</updated>
<author>
<name>Deepa Dinamani</name>
<email>deepa.kernel@gmail.com</email>
</author>
<published>2017-05-02T14:16:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=2115bb250f260089743e26decfb5f271ba71ca37'/>
<id>urn:sha1:2115bb250f260089743e26decfb5f271ba71ca37</id>
<content type='text'>
struct timespec is not y2038 safe.
Audit timestamps are recorded in string format into
an audit buffer for a given context.
These mark the entry timestamps for the syscalls.
Use y2038 safe struct timespec64 to represent the times.
The log strings can handle this transition as strings can
hold upto 1024 characters.

Signed-off-by: Deepa Dinamani &lt;deepa.kernel@gmail.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Acked-by: Richard Guy Briggs &lt;rgb@redhat.com&gt;
Signed-off-by: Paul Moore &lt;paul@paul-moore.com&gt;
</content>
</entry>
<entry>
<title>fsnotify: Free fsnotify_mark_connector when there is no mark attached</title>
<updated>2017-04-10T15:37:35Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2017-02-01T08:21:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=08991e83b7286635167bab40927665a90fb00d81'/>
<id>urn:sha1:08991e83b7286635167bab40927665a90fb00d81</id>
<content type='text'>
Currently we free fsnotify_mark_connector structure only when inode /
vfsmount is getting freed. This can however impose noticeable memory
overhead when marks get attached to inodes only temporarily. So free the
connector structure once the last mark is detached from the object.
Since notification infrastructure can be working with the connector
under the protection of fsnotify_mark_srcu, we have to be careful and
free the fsnotify_mark_connector only after SRCU period passes.

Reviewed-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
<entry>
<title>fsnotify: Move mark list head from object into dedicated structure</title>
<updated>2017-04-10T15:37:34Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2017-03-14T11:31:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=9dd813c15b2c101168808d4f5941a29985758973'/>
<id>urn:sha1:9dd813c15b2c101168808d4f5941a29985758973</id>
<content type='text'>
Currently notification marks are attached to object (inode or vfsmnt) by
a hlist_head in the object. The list is also protected by a spinlock in
the object. So while there is any mark attached to the list of marks,
the object must be pinned in memory (and thus e.g. last iput() deleting
inode cannot happen). Also for list iteration in fsnotify() to work, we
must hold fsnotify_mark_srcu lock so that mark itself and
mark-&gt;obj_list.next cannot get freed. Thus we are required to wait for
response to fanotify events from userspace process with
fsnotify_mark_srcu lock held. That causes issues when userspace process
is buggy and does not reply to some event - basically the whole
notification subsystem gets eventually stuck.

So to be able to drop fsnotify_mark_srcu lock while waiting for
response, we have to pin the mark in memory and make sure it stays in
the object list (as removing the mark waiting for response could lead to
lost notification events for groups later in the list). However we don't
want inode reclaim to block on such mark as that would lead to system
just locking up elsewhere.

This commit is the first in the series that paves way towards solving
these conflicting lifetime needs. Instead of anchoring the list of marks
directly in the object, we anchor it in a dedicated structure
(fsnotify_mark_connector) and just point to that structure from the
object. The following commits will also add spinlock protecting the list
and object pointer to the structure.

Reviewed-by: Miklos Szeredi &lt;mszeredi@redhat.com&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
</content>
</entry>
</feed>
