<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/ipc, branch v2.6.38</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
</subtitle>
<id>https://git.shady.money/linux/atom?h=v2.6.38</id>
<link rel='self' href='https://git.shady.money/linux/atom?h=v2.6.38'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/'/>
<updated>2011-01-07T06:50:26Z</updated>
<entry>
<title>fs: icache RCU free inodes</title>
<updated>2011-01-07T06:50:26Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@kernel.dk</email>
</author>
<published>2011-01-07T06:49:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=fa0d7e3de6d6fc5004ad9dea0dd6b286af8f03e9'/>
<id>urn:sha1:fa0d7e3de6d6fc5004ad9dea0dd6b286af8f03e9</id>
<content type='text'>
RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page-&gt;mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin &lt;npiggin@kernel.dk&gt;
</content>
</entry>
<entry>
<title>ipc: shm: fix information leak to userland</title>
<updated>2010-10-30T15:25:51Z</updated>
<author>
<name>Vasiliy Kulikov</name>
<email>segooon@gmail.com</email>
</author>
<published>2010-10-30T14:22:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=3af54c9bd9e6f14f896aac1bb0e8405ae0bc7a44'/>
<id>urn:sha1:3af54c9bd9e6f14f896aac1bb0e8405ae0bc7a44</id>
<content type='text'>
The shmid_ds structure is copied to userland with shm_unused{,2,3}
fields unitialized.  It leads to leaking of contents of kernel stack
memory.

Signed-off-by: Vasiliy Kulikov &lt;segooon@gmail.com&gt;
Acked-by: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>switch get_sb_ns() users</title>
<updated>2010-10-29T08:17:03Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2010-07-26T09:16:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=ceefda6931806972ecf550bd8231dce4a4178953'/>
<id>urn:sha1:ceefda6931806972ecf550bd8231dce4a4178953</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>ipc: initialize structure memory to zero for compat functions</title>
<updated>2010-10-28T01:03:13Z</updated>
<author>
<name>Dan Rosenberg</name>
<email>drosenberg@vsecurity.com</email>
</author>
<published>2010-10-27T22:34:17Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=03145beb455cf5c20a761e8451e30b8a74ba58d9'/>
<id>urn:sha1:03145beb455cf5c20a761e8451e30b8a74ba58d9</id>
<content type='text'>
This takes care of leaking uninitialized kernel stack memory to
userspace from non-zeroed fields in structs in compat ipc functions.

Signed-off-by: Dan Rosenberg &lt;drosenberg@vsecurity.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/shm.c: add RSS and swap size information to /proc/sysvipc/shm</title>
<updated>2010-10-28T01:03:13Z</updated>
<author>
<name>Helge Deller</name>
<email>deller@gmx.de</email>
</author>
<published>2010-10-27T22:34:16Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=b795218075a1e1183169abb66a90dcdcf30367f9'/>
<id>urn:sha1:b795218075a1e1183169abb66a90dcdcf30367f9</id>
<content type='text'>
The kernel currently provides no functionality to analyze the RSS and swap
space usage of each individual sysvipc shared memory segment.

This patch adds this info for each existing shm segment by extending the
output of /proc/sysvipc/shm by two columns for RSS and swap.

Since shmctl(SHM_INFO) already provides a similiar calculation (it
currently sums up all RSS/swap info for all segments), I did split out a
static function which is now used by the /proc/sysvipc/shm output and
shmctl(SHM_INFO).

SAP products (esp.  the SAP Netweaver ABAP Kernel) uses lots of big shared
memory segments (we often have Linux systems with &gt;= 16GB shm usage).
Sometimes we get customer reports about "slow" system responses and while
looking into their configurations we often find massive swapping activity
on the system.  With this patch it's now easy to see from the command line
if and which shm segments gets swapped out (and how much) and can more
easily give recommendations for system tuning.  Without the patch it's
currently not possible to do such shm analysis at all.

Also...

Add some spaces in front of the "size" field for 64bit kernels to get the
columns correct if you cat the contents of the file.  In
sysvipc_shm_proc_show() the kernel prints the size value in "SPEC_SIZE"
format, which is defined like this:

#if BITS_PER_LONG &lt;= 32
#define SIZE_SPEC "%10lu"
#else
#define SIZE_SPEC "%21lu"
#endif

So, if the header is not adjusted, the columns are not correctly aligned.
I actually tested this on 32- and 64-bit and it seems correct now.

Signed-off-by: Helge Deller &lt;deller@gmx.de&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs: do not assign default i_ino in new_inode</title>
<updated>2010-10-26T01:26:11Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2010-10-23T15:19:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=85fe4025c616a7c0ed07bc2fc8c5371b07f3888c'/>
<id>urn:sha1:85fe4025c616a7c0ed07bc2fc8c5371b07f3888c</id>
<content type='text'>
Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves.  For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Dave Chinner &lt;dchinner@redhat.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>new helper: ihold()</title>
<updated>2010-10-26T01:26:11Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2010-10-23T15:11:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=7de9c6ee3ecffd99e1628e81a5ea5468f7581a1f'/>
<id>urn:sha1:7de9c6ee3ecffd99e1628e81a5ea5468f7581a1f</id>
<content type='text'>
Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl</title>
<updated>2010-10-22T17:52:56Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2010-10-22T17:52:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=092e0e7e520a1fca03e13c9f2d157432a8657ff2'/>
<id>urn:sha1:092e0e7e520a1fca03e13c9f2d157432a8657ff2</id>
<content type='text'>
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  vfs: make no_llseek the default
  vfs: don't use BKL in default_llseek
  llseek: automatically add .llseek fop
  libfs: use generic_file_llseek for simple_attr
  mac80211: disallow seeks in minstrel debug code
  lirc: make chardev nonseekable
  viotape: use noop_llseek
  raw: use explicit llseek file operations
  ibmasmfs: use generic_file_llseek
  spufs: use llseek in all file operations
  arm/omap: use generic_file_llseek in iommu_debug
  lkdtm: use generic_file_llseek in debugfs
  net/wireless: use generic_file_llseek in debugfs
  drm: use noop_llseek
</content>
</entry>
<entry>
<title>llseek: automatically add .llseek fop</title>
<updated>2010-10-15T13:53:27Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2010-08-15T16:52:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=6038f373a3dc1f1c26496e60b6c40b164716f07e'/>
<id>urn:sha1:6038f373a3dc1f1c26496e60b6c40b164716f07e</id>
<content type='text'>
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time.  Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
//   but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
&lt;+...
nonseekable_open(...)
...+&gt;
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
&lt;+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+&gt;
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
&lt;+...
(
   *off = E
|
   *off += E
|
   func(..., off, ...)
|
   E = *off
)
...+&gt;
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
&lt;+...
(
  *off = E
|
  *off += E
|
  func(..., off, ...)
|
  E = *off
)
...+&gt;
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
 ...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
 .llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
 .read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
 .write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
 .open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek &amp;&amp; has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
...  .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
...  .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
...  .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek &amp;&amp; !nonseekable1 &amp;&amp; !nonseekable2 &amp;&amp; !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 &amp;&amp; !has_llseek &amp;&amp; !nonseekable1 &amp;&amp; !nonseekable2 &amp;&amp; !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 &amp;&amp; !fops2 &amp;&amp; !has_llseek &amp;&amp; !nonseekable1 &amp;&amp; !nonseekable2 &amp;&amp; !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+	.llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 &amp;&amp; !fops2 &amp;&amp; !fops3 &amp;&amp; !has_llseek &amp;&amp; !nonseekable1 &amp;&amp; !nonseekable2 &amp;&amp; !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
 .write = write_f,
 .read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write &amp;&amp; !has_read &amp;&amp; !fops1 &amp;&amp; !fops2 &amp;&amp; !has_llseek &amp;&amp; !nonseekable1 &amp;&amp; !nonseekable2 &amp;&amp; !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read &amp;&amp; !has_write &amp;&amp; !fops1 &amp;&amp; !fops2 &amp;&amp; !has_llseek &amp;&amp; !nonseekable1 &amp;&amp; !nonseekable2 &amp;&amp; !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read &amp;&amp; !has_write &amp;&amp; !fops1 &amp;&amp; !fops2 &amp;&amp; !has_llseek &amp;&amp; !nonseekable1 &amp;&amp; !nonseekable2 &amp;&amp; !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Julia Lawall &lt;julia@diku.dk&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
</content>
</entry>
<entry>
<title>sys_semctl: fix kernel stack leakage</title>
<updated>2010-10-01T17:50:58Z</updated>
<author>
<name>Dan Rosenberg</name>
<email>drosenberg@vsecurity.com</email>
</author>
<published>2010-09-30T22:15:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/linux/commit/?id=982f7c2b2e6a28f8f266e075d92e19c0dd4c6e56'/>
<id>urn:sha1:982f7c2b2e6a28f8f266e075d92e19c0dd4c6e56</id>
<content type='text'>
The semctl syscall has several code paths that lead to the leakage of
uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
version of the semid_ds struct.

The copy_semid_to_user() function declares a semid_ds struct on the stack
and copies it back to the user without initializing or zeroing the
"sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
allowing the leakage of 16 bytes of kernel stack memory.

The code is still reachable on 32-bit systems - when calling semctl()
newer glibc's automatically OR the IPC command with the IPC_64 flag, but
invoking the syscall directly allows users to use the older versions of
the struct.

Signed-off-by: Dan Rosenberg &lt;dan.j.rosenberg@gmail.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
