<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/blob.c, branch v2.2.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.2.0</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.2.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2014-07-14T01:59:05Z</updated>
<entry>
<title>add object_as_type helper for casting objects</title>
<updated>2014-07-14T01:59:05Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-07-13T06:42:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8ff226a9d5ee065fe52752e6032f63cb6e4beccb'/>
<id>urn:sha1:8ff226a9d5ee065fe52752e6032f63cb6e4beccb</id>
<content type='text'>
When we call lookup_commit, lookup_tree, etc., the logic goes
something like:

  1. Look for an existing object struct. If we don't have
     one, allocate and return a new one.

  2. Double check that any object we have is the expected
     type (and complain and return NULL otherwise).

  3. Convert an object with type OBJ_NONE (from a prior
     call to lookup_unknown_object) to the expected type.

We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.
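
<![CDATA[A minimal sketch of such a helper, assuming simplified stand-in
types. The toy_* names are hypothetical and are not git's actual
definitions; the real helper may differ in its details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for git's object types. */
enum toy_object_type {
	TOY_OBJ_NONE = 0,
	TOY_OBJ_COMMIT,
	TOY_OBJ_TREE,
	TOY_OBJ_BLOB,
	TOY_OBJ_TAG
};

struct toy_object {
	enum toy_object_type type;
};

/*
 * Steps 2 and 3 in one place: return obj if it already has the
 * expected type, convert it if its type is still unknown, and
 * otherwise complain (unless quiet) and return NULL.
 */
struct toy_object *toy_object_as_type(struct toy_object *obj,
				      enum toy_object_type type,
				      int quiet)
{
	assert(obj != NULL);
	if (obj->type == type)
		return obj;
	if (obj->type == TOY_OBJ_NONE) {
		obj->type = type;	/* step 3: convert the unknown object */
		return obj;
	}
	if (!quiet)
		fprintf(stderr, "object has type %d, expected %d\n",
			(int)obj->type, (int)type);
	return NULL;			/* step 2: type mismatch */
}
```

A lookup function would then do step 1 itself and hand whatever
existing object it found to this helper.]]>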

Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.

Since this is a refactoring, we want it to behave exactly
as the current code does. It takes a little reasoning to
see that this is the case:

  - for lookup_{commit,tree,etc} functions, we are just
    pulling steps 2 and 3 into a function that does the same
    thing.

  - for the call in peel_object, we currently only do step 3
    (but we want to consolidate it with the others, as
    mentioned above). However, step 2 is a noop here, as the
    surrounding conditional makes sure we have OBJ_NONE
    (which we want to keep to avoid an extraneous call to
    sha1_object_info).

  - for the call in lookup_commit_reference_gently, we are
    currently doing step 2 but not step 3. However, step 3
    is a noop here. The object we got will have just come
    from deref_tag, which must have figured out the type for
    each object in order to know when to stop peeling.
    Therefore the type will never be OBJ_NONE.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>move setting of object-&gt;type to alloc_* functions</title>
<updated>2014-07-14T01:59:05Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-07-13T06:41:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d36f51c13b54a872cdaf08a1765a23afab26ae51'/>
<id>urn:sha1:d36f51c13b54a872cdaf08a1765a23afab26ae51</id>
<content type='text'>
The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
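
<![CDATA[A rough sketch of the idea, assuming simplified stand-in types.
The toy_* names are hypothetical, and plain calloc stands in for
whatever allocator is really used.

```c
#include <assert.h>
#include <stdlib.h>

enum toy_type { TOY_NONE = 0, TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

/* Every concrete type starts with the common header. */
struct toy_object {
	enum toy_type type;
};

struct toy_blob {
	struct toy_object object;
};

struct toy_commit {
	struct toy_object object;
	unsigned int index;
};

/* Allocate zeroed memory and set the type before any caller sees it. */
static void *toy_alloc_node(size_t size, enum toy_type type)
{
	struct toy_object *obj;

	assert(size >= sizeof(struct toy_object));
	obj = calloc(1, size);
	if (!obj)
		abort();
	obj->type = type;
	return obj;
}

struct toy_blob *toy_alloc_blob_node(void)
{
	return toy_alloc_node(sizeof(struct toy_blob), TOY_BLOB);
}

/* Each commit gets the next unused index along with its type. */
struct toy_commit *toy_alloc_commit_node(void)
{
	static unsigned int next_index;
	struct toy_commit *c;

	c = toy_alloc_node(sizeof(struct toy_commit), TOY_COMMIT);
	c->index = next_index++;
	return c;
}
```

Because the type is set inside the allocation function, a caller
cannot end up with a commit-sized allocation whose type field says
something else.]]>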

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Replace parse_blob() with an explanatory comment</title>
<updated>2010-01-19T01:04:02Z</updated>
<author>
<name>Daniel Barkalow</name>
<email>barkalow@iabervon.org</email>
</author>
<published>2010-01-18T18:06:28Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=837d395a5c0b98ab938d71db8e2b6b9f69ddcc4d'/>
<id>urn:sha1:837d395a5c0b98ab938d71db8e2b6b9f69ddcc4d</id>
<content type='text'>
parse_blob() has never actually been used; it has served simply to
avoid having a confusing gap in the API. Instead of leaving it, put in
a comment that explains what "parsing a blob" entails (making sure the
object is actually readable), and why code might care whether a blob
has been parsed or not.

Signed-off-by: Daniel Barkalow &lt;barkalow@iabervon.org&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Clean up object creation to use more common code</title>
<updated>2007-04-17T06:36:16Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2007-04-17T05:11:43Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=100c5f3b0b27ec6617de1a785c4ff481e92636c1'/>
<id>urn:sha1:100c5f3b0b27ec6617de1a785c4ff481e92636c1</id>
<content type='text'>
This replaces the fairly odd "created_object()" function that did _most_
of the object setup with a more complete "create_object()" function that
also has a more natural calling convention.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>convert object type handling from a string to a number</title>
<updated>2007-02-27T09:34:21Z</updated>
<author>
<name>Nicolas Pitre</name>
<email>nico@cam.org</email>
</author>
<published>2007-02-26T19:55:59Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=21666f1aae4e890d8f50924f9e80763b27e6a45d'/>
<id>urn:sha1:21666f1aae4e890d8f50924f9e80763b27e6a45d</id>
<content type='text'>
We currently have two parallel notations for dealing with object types
in the code: a string and a numerical value.  One of them is obviously
redundant, and the most used one requires more stack space and a bunch
of strcmp() all over the place.

This is an initial step for the removal of the version using a char array
found in object reading code paths.  The patch is unfortunately large but
there is no sane way to split it into smaller parts without breaking the
system.

Signed-off-by: Nicolas Pitre &lt;nico@cam.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>simplify inclusion of system header files.</title>
<updated>2006-12-20T17:51:35Z</updated>
<author>
<name>Junio C Hamano</name>
<email>junkio@cox.net</email>
</author>
<published>2006-12-19T22:34:12Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=85023577a8f4b540aa64aa37f6f44578c0c305a3'/>
<id>urn:sha1:85023577a8f4b540aa64aa37f6f44578c0c305a3</id>
<content type='text'>
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header files that includes it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Remove TYPE_* constant macros and use object_type enums consistently.</title>
<updated>2006-07-13T06:18:03Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-07-12T03:45:31Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1974632c664c2d573b36a00fa993c1c13dd8a967'/>
<id>urn:sha1:1974632c664c2d573b36a00fa993c1c13dd8a967</id>
<content type='text'>
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.

Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumerations.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Add specialized object allocator</title>
<updated>2006-06-20T01:42:21Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-06-19T17:44:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=855419f764a65e92f1d5dd1b3d50ee987db1d9de'/>
<id>urn:sha1:855419f764a65e92f1d5dd1b3d50ee987db1d9de</id>
<content type='text'>
This creates a simple specialized object allocator for basic
objects.

This avoids wasting space with malloc overhead (metadata and
extra alignment), since the specialized allocator knows the
alignment, and that objects, once allocated, are never freed.

It also allows us to track some basic statistics about object
allocations. For example, for the mozilla import, it shows
object usage as follows:

     blobs:   627629 (14710 kB)
     trees:  1119035 (34969 kB)
   commits:   196423  (8440 kB)
      tags:     1336    (46 kB)

and the simpler allocator shaves about 2.5% off the memory
footprint of a "git-rev-list --all --objects", and is a bit
faster too.
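
<![CDATA[A minimal sketch of that kind of allocator, for illustration
only. The toy_* names and the block size of 1024 nodes are
assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_BLOCKING 1024	/* nodes per block (assumed value) */

struct toy_alloc_state {
	size_t node_size;	/* fixed size of every node handed out */
	size_t nr_left;		/* nodes still free in the current block */
	size_t count;		/* total nodes allocated, for statistics */
	char *next;		/* next free node in the current block */
};

/*
 * Hand out one zeroed node. node_size is expected to be the sizeof
 * of the node's struct, so consecutive nodes stay suitably aligned.
 */
void *toy_alloc_node(struct toy_alloc_state *s)
{
	void *ret;

	assert(s->node_size > 0);
	if (!s->nr_left) {
		/* One allocation per block; blocks are never freed. */
		s->next = calloc(TOY_BLOCKING, s->node_size);
		if (!s->next)
			abort();
		s->nr_left = TOY_BLOCKING;
	}
	ret = s->next;
	s->next += s->node_size;
	s->nr_left--;
	s->count++;
	return ret;
}

/* Memory handed out so far, in kB, like the figures above. */
size_t toy_alloc_kb(const struct toy_alloc_state *s)
{
	return s->count * s->node_size / 1024;
}
```

One state per object type keeps the node size fixed, so there is
no per-object malloc metadata and the counters come for free.]]>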

[ Side note: this concludes the series of "save memory in object storage".
  The thing is, there simply isn't much more to be saved on the objects.

  Doing "git-rev-list --all --objects" on the mozilla archive has a final
  total RSS of 131498 pages for me: that's about 513MB. Of that, the
  object overhead is now just 56MB, the rest is going somewhere else (put
  another way: the fact that this patch shaves off 2.5% of the total
  memory overhead, considering that objects are now not much more than 10%
  of the total shows how big the wasted space really was: this makes
  object allocations much more memory- and time-efficient).

  I haven't looked at where the rest is, but I suspect the bulk of it is
  just the pack-file loading. It may be that we should pack the tree
  objects separately from the blob objects: for git-rev-list --objects, we
  don't actually ever need to even look at the blobs, but since trees and
  blobs are interspersed in the pack-file, we end up not being dense in
  the tree accesses, so we end up looking at more pages than we strictly
  need to.

  So with a 535MB pack-file, it's entirely possible - even likely - that
  most of the remaining RSS is just the mmap of the pack-file itself. We
  don't need to map in _all_ of it, but we do end up mapping a fair
  amount. ]

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Shrink "struct object" a bit</title>
<updated>2006-06-18T01:49:18Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@osdl.org</email>
</author>
<published>2006-06-14T23:45:13Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=885a86abe2e9f7b96a4e2012183c6751635840aa'/>
<id>urn:sha1:885a86abe2e9f7b96a4e2012183c6751635840aa</id>
<content type='text'>
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.

In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.
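
<![CDATA[As a rough illustration of the layout change, assuming only the
fields that matter here (the toy_* names are hypothetical, the
real struct has more members, and the 27-bit flags width is an
assumption chosen to fill a 32-bit word):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TYPE_BITS 3
#define TOY_FLAG_BITS 27

/* Before: a pointer for the type, plus separate bitfields and flags. */
struct toy_object_old {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned int flags;
	unsigned char sha1[20];
	const char *type;	/* one of the constant type strings */
};

/* After: parsed, used, type and flags share a single 32-bit word. */
struct toy_object_new {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned type : TOY_TYPE_BITS;
	unsigned flags : TOY_FLAG_BITS;
	unsigned char sha1[20];
};

/* Store a small integer type, checking that it fits in the bitfield. */
void toy_set_type(struct toy_object_new *obj, unsigned type)
{
	assert(type < (1u << TOY_TYPE_BITS));
	obj->type = type;
}

/* Bytes saved per object by the new layout on this platform. */
size_t toy_bytes_saved(void)
{
	return sizeof(struct toy_object_old) - sizeof(struct toy_object_new);
}
```

The exact saving depends on pointer size and padding, so it is
computed with sizeof rather than stated here.]]>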

Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.

This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.

There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.

Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.

Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
<entry>
<title>Replace xmalloc+memset(0) with xcalloc.</title>
<updated>2006-04-04T07:11:19Z</updated>
<author>
<name>Peter Eriksen</name>
<email>s022018@student.dtu.dk</email>
</author>
<published>2006-04-03T18:30:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=90321c106ca6e36c0e884ca677c9a52dea47bdde'/>
<id>urn:sha1:90321c106ca6e36c0e884ca677c9a52dea47bdde</id>
<content type='text'>
Signed-off-by: Peter Eriksen &lt;s022018@student.dtu.dk&gt;
Signed-off-by: Junio C Hamano &lt;junkio@cox.net&gt;
</content>
</entry>
</feed>
