<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/tempfile.c, branch v2.16.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.16.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.16.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2017-09-06T08:19:54Z</updated>
<entry>
<title>tempfile: auto-allocate tempfiles on heap</title>
<updated>2017-09-06T08:19:54Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:15:08Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=076aa2cbda5782426c45cd65017b81d77876297a'/>
<id>urn:sha1:076aa2cbda5782426c45cd65017b81d77876297a</id>
<content type='text'>
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:

  struct tempfile t;

  create_tempfile(&amp;t, ...);
  ...
  if (!err)
          rename_tempfile(&amp;t, ...);
  else
          delete_tempfile(&amp;t);

But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.

Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).

But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it.  Let's
see if we can make it harder to get this wrong.

Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.

Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).

This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.

The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:

  1. Storing a pointer instead of the struct itself.

  2. Passing the pointer instead of taking the struct
     address.

  3. Handling a "struct tempfile *" return instead of a file
     descriptor.

We could play games to make this less noisy. For example, by
defining the tempfile like this:

  struct tempfile {
	struct heap_allocated_part_of_tempfile {
                int fd;
                ...etc
        } *actual_data;
  }

Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tempfile: remove deactivated list entries</title>
<updated>2017-09-06T08:19:54Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:15:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=422a21c6a086ec8c05c96b04bdccd960da141c04'/>
<id>urn:sha1:422a21c6a086ec8c05c96b04bdccd960da141c04</id>
<content type='text'>
Once a "struct tempfile" is added to the global cleanup
list, it is never removed. This means that its storage must
remain valid for the lifetime of the program. For single-use
tempfiles and locks, this isn't a big deal: we just declare
the struct static. But for library code which may take
multiple simultaneous locks (like the ref code), they're
forced to allocate a struct on the heap and leak it.

This is mostly OK in practice. The size of the leak is
bounded by the number of refs, and most programs exit after
operating on a fixed number of refs (and allocate
simultaneous memory proportional to the number of ref
updates in the first place). But:

  1. It isn't hard to imagine a real leak: a program which
     runs for a long time taking a series of ref update
     instructions and fulfilling them one by one. I don't
     think we have such a program now, but it's certainly
     plausible.

  2. The leaked entries appear as false positives to
     tools like valgrind.

Let's relax this rule by keeping only "active" tempfiles on
the list. We can do this easily by moving the list-add
operation from prepare_tempfile_object to activate_tempfile,
and adding a deletion in deactivate_tempfile.

Existing callers do not need to be updated immediately.
They'll continue to leak any tempfile objects they may have
allocated, but that's no different than the status quo. We
can clean them up individually.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tempfile: use list.h for linked list</title>
<updated>2017-09-06T08:19:54Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:15:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=24d82185d267daafc29ca61af1ed1dc746ba3463'/>
<id>urn:sha1:24d82185d267daafc29ca61af1ed1dc746ba3463</id>
<content type='text'>
The tempfile API keeps to-be-cleaned tempfiles in a
singly-linked list and never removes items from the list.  A
future patch would like to start removing items, but removal
from a singly linked list is O(n), as we have to walk the
list to find the predecessor element. This means that a
process which takes "n" simultaneous lockfiles (for example,
an atomic transaction on "n" refs) may end up quadratic in
"n".

Before we start allowing items to be removed, it would be
nice to have a way to cover this case in linear time.

The simplest solution is to make an assumption about the
order in which tempfiles are added and removed from the
list. If both operations iterate over the tempfiles in the
same order, then by putting new items at the end of the list
our removal search will always find its items at the
beginning of the list. And indeed, that would work for the
case of refs. But it creates a hidden dependency between
unrelated parts of the code. If anybody changes the ref code
(or if we add a new caller that opens multiple simultaneous
tempfiles) they may unknowingly introduce a performance
regression.

Another solution is to use a better data structure. A
doubly-linked list works fine, and we already have an
implementation in list.h. But there's one snag: the elements
of "struct tempfile" are all marked as "volatile", since a
signal handler may interrupt us and iterate over the list at
any moment (even if we were in the middle of adding a new
entry).

We can declare a "volatile struct list_head", but we can't
actually use it with the normal list functions. The compiler
complains about passing a pointer-to-volatile via a regular
pointer argument. And rightfully so, as the sub-function
would potentially need different code to deal with the
volatile case.

That leaves us with a few options:

  1. Drop the "volatile" modifier for the list items.

     This is probably a bad idea. I checked the assembly
     output from "gcc -O2", and the "volatile" really does
     impact the order in which it updates memory.

  2. Use macros instead of inline functions. The irony here
     is that list.h is entirely implemented as trivial
     inline functions. So we basically are already
     generating custom code for each call. But sadly there's no
     way in C to declare the inline function to take a more
     generic type.

     We could do so by switching the inline functions to
     macros, but it does make the end result harder to read.
     And it doesn't fully solve the problem (for instance,
     the declaration of list_head needs to change so that
     its "prev" and "next" pointers point to other volatile
     structs).

  3. Don't use list.h, and just make our own ad-hoc
     doubly-linked list. It's not that much code to
     implement the basics that we need here. But if we're
     going to do so, why not add the few extra lines
     required to model it after the actual list.h interface?
     We can even reuse a few of the macro helpers.

So this patch takes option 3, but actually implements a
parallel "volatile list" interface in list.h, where it could
potentially be reused by other code. This implements just
enough for tempfile.c's use, though we could easily port
other functions later if need be.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tempfile: release deactivated strbufs instead of resetting</title>
<updated>2017-09-06T08:19:54Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:14:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=102cf7a6aa1f09177c09536a81dbaef44e13e038'/>
<id>urn:sha1:102cf7a6aa1f09177c09536a81dbaef44e13e038</id>
<content type='text'>
When a tempfile is deactivated, we reset its strbuf to the
empty string, which means we hold onto the memory for later
reuse.

Since we'd like to move to a system where tempfile structs
can actually be freed, deactivating one should drop all
resources it is currently using. And thus "release" rather
than "reset" is the appropriate function to call.

In theory the reset may have saved a malloc() when a
tempfile (or a lockfile) is reused multiple times. But in
practice this happened rarely. Most of our tempfiles are
single-use, since in cases where we might actually use many
(like ref locking) we xcalloc() a fresh one for each ref. In
fact, we leak those locks (to appease the rule that tempfile
storage can never be freed). Which means that using reset is
actively hurting us: instead of leaking just the tempfile
struct, we're leaking the extra heap chunk for the filename,
too.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tempfile: robustify cleanup handler</title>
<updated>2017-09-06T08:19:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:14:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=6b935066960d19713d85e7bbebd442dd8d35e0c6'/>
<id>urn:sha1:6b935066960d19713d85e7bbebd442dd8d35e0c6</id>
<content type='text'>
We may call remove_tempfiles() from an atexit handler, or
from a signal handler. In the latter case we must take care
to avoid functions which may deadlock if the process is in
an unknown state, including looking at any stdio handles
(which may be in the middle of doing I/O and locked) or
calling malloc() or free().

The current implementation calls delete_tempfile(). We unset
the tempfile's stdio handle (if any) to avoid deadlocking
there. But delete_tempfile() still calls unlink_or_warn(),
which can deadlock writing to stderr if the unlink fails.

Since delete_tempfile() isn't very long, let's just
open-code our own simple conservative version of the same
thing.  Notably:

  1. The "skip_fclose" flag is now called "in_signal_handler",
     because it should inform more decisions than just the
     fclose handling.

  2. We can replace close_tempfile() with just close(fd).
     That skips the fclose() question altogether. This is
     fine for the atexit() case, too; there's no point
     flushing data to a file which we're about to delete
     anyway.

  3. We can choose between unlink/unlink_or_warn based on
     whether it's safe to use stderr.

  4. We can replace the deactivate_tempfile() call with a
     simple setting of the active flag. There's no need to
     do any further cleanup since we know the program is
     exiting.  And even though the current deactivation code
     is safe in a signal handler, this frees us up in future
     patches to make non-signal deactivation more
     complicated (e.g., by freeing resources).

  5. There's no need to remove items from the tempfile_list.
     The "active" flag is the ultimate answer to whether an
     entry has been handled or not. Manipulating the list
     just introduces more chance of recursive signals
     stomping on each other, and the whole list will go away
     when the program exits anyway. Less is more.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tempfile: factor out deactivation</title>
<updated>2017-09-06T08:19:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:14:50Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b5f4dcb598bb369c960f715fcd5c9ccf4112e026'/>
<id>urn:sha1:b5f4dcb598bb369c960f715fcd5c9ccf4112e026</id>
<content type='text'>
When we deactivate a tempfile, we also have to clean up the
"filename" strbuf. Let's pull this out into its own function
to keep the logic in one place (which will become more
important when a future patch makes it more complicated).

Note that we can use the same function when deactivating an
object that _isn't_ actually active yet (like when we hit an
error creating a tempfile). These callsites don't currently
reset the "active" flag to 0, but it's OK to do so (it's
just a noop for these cases).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tempfile: factor out activation</title>
<updated>2017-09-06T08:19:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:14:47Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2933ebbac1967a677eaed1945068941bc3ff7751'/>
<id>urn:sha1:2933ebbac1967a677eaed1945068941bc3ff7751</id>
<content type='text'>
There are a few steps required to "activate" a tempfile
struct. Let's pull these out into a function. That saves a
few repeated lines now, but more importantly will make it
easier to change the activation scheme later.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tempfile: replace die("BUG") with BUG()</title>
<updated>2017-09-06T08:19:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:14:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9b028aa45a2016ae0dbdfeb85ad9d43f2017db0d'/>
<id>urn:sha1:9b028aa45a2016ae0dbdfeb85ad9d43f2017db0d</id>
<content type='text'>
Compared to die(), using BUG() triggers abort(). That may
give us an actual coredump, which should make it easier to
get a stack trace. And since the programming error for these
assertions is not in the functions themselves but in their
callers, such a stack trace is needed to actually find the
source of the bug.

In addition, abort() raises SIGABRT, which is more likely to
be caught by our test suite.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tempfile: handle NULL tempfile pointers gracefully</title>
<updated>2017-09-06T08:19:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:14:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f5b4dc7668b6c8d71432af9f9ddad6f7c62d284e'/>
<id>urn:sha1:f5b4dc7668b6c8d71432af9f9ddad6f7c62d284e</id>
<content type='text'>
The tempfile functions all take pointers to tempfile
objects, but do not check whether the argument is NULL.
This isn't a big deal in practice, since the lifetime of any
tempfile object is defined to last for the whole program. So
even if we try to call delete_tempfile() on an
already-deleted tempfile, our "active" check will tell us
that it's a noop.

In preparation for transitioning to a new system that
loosens the "tempfile objects can never be freed" rule,
let's tighten up our active checks:

  1. A NULL pointer is now defined as "inactive" (so it will
     BUG for most functions, but works as a silent noop for
     things like delete_tempfile).

  2. Functions should always do the "active" check before
     looking at any of the struct fields.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>tempfile: prefer is_tempfile_active to bare access</title>
<updated>2017-09-06T08:19:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2017-09-05T12:14:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e6fc267314d24fe9a81875b85b28a4b5d0fb78b1'/>
<id>urn:sha1:e6fc267314d24fe9a81875b85b28a4b5d0fb78b1</id>
<content type='text'>
The tempfile code keeps an "active" flag, and we have a
number of assertions to make sure that the objects are being
used in the right order. Most of these directly check
"active" rather than using the is_tempfile_active()
accessor.

Let's prefer using the accessor, in preparation for it
growing more complicated logic (like checking for NULL).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
