<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/object-store.h, branch v2.40.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.40.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.40.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-01-08T01:52:55Z</updated>
<entry>
<title>repo_read_object_file(): stop wrapping read_object_file_extended()</title>
<updated>2023-01-08T01:52:55Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-01-07T13:50:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0ba05cf2e0d077bedbc1ee2521b3e5b5dc883250'/>
<id>urn:sha1:0ba05cf2e0d077bedbc1ee2521b3e5b5dc883250</id>
<content type='text'>
The only caller of read_object_file_extended() is the thin wrapper of
repo_read_object_file(). Instead of wrapping, let's just rename the
inner function and let people call it directly. This cleans up the
namespace and reduces confusion.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>read_object_file_extended(): drop lookup_replace option</title>
<updated>2023-01-08T01:52:55Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-01-07T13:50:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7be13f5f743978180ba377e12a312b773ed9af2b'/>
<id>urn:sha1:7be13f5f743978180ba377e12a312b773ed9af2b</id>
<content type='text'>
Our sole caller always passes in "1", so we can just drop the parameter
entirely. Anybody who doesn't want this behavior could easily call
oid_object_info_extended() themselves, as we're just a thin wrapper
around it.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: inline calls to read_object()</title>
<updated>2023-01-08T01:52:54Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2023-01-07T13:48:55Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b25562e63fe8afaf0f103362a4e672e9ccdc2d68'/>
<id>urn:sha1:b25562e63fe8afaf0f103362a4e672e9ccdc2d68</id>
<content type='text'>
Since read_object() is these days just a thin wrapper around
oid_object_info_extended(), and since it only has two callers, let's
just inline those calls. This has a few positive outcomes:

  - it's a net reduction in source code lines

  - even though the callers end up with a few extra lines, they're now
    more flexible and can use object_info flags directly. So no more
    need to convert die_if_corrupt between parameter/flag, and we can
    ask for lookup replacement with a flag rather than doing it
    ourselves.

  - there's one fewer function in an already crowded namespace (e.g.,
    the difference between read_object() and read_object_file() was not
    immediately obvious; now we only have one of them).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: emit corruption errors when detected</title>
<updated>2022-12-15T00:05:55Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2022-12-14T19:17:42Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9e59b38c88cdd79964cc92860906a0737bee3797'/>
<id>urn:sha1:9e59b38c88cdd79964cc92860906a0737bee3797</id>
<content type='text'>
Instead of relying on errno being preserved across function calls, teach
do_oid_object_info_extended() to itself report object corruption when
it first detects it. There are 3 types of corruption being detected:
 - when a replacement object is missing
 - when a loose object is corrupt
 - when a packed object is corrupt and the object cannot be read
   in another way

Note that in the RHS of this patch's diff, a check for ENOENT that was
introduced in 3ba7a06552 (A loose object is not corrupt if it cannot
be read due to EMFILE, 2010-10-28) is also removed. The purpose of this
check is to avoid a false report of corruption if the errno contains
something like EMFILE (or anything that is not ENOENT), in which case
a more generic report is presented. Because, as of this patch, we no
longer rely on such a heuristic to determine corruption, but surface
the error message at the point when we read something that we did not
expect, this check is no longer necessary.

Besides being more resilient, this also prepares for a future patch in
which an indirect caller of do_oid_object_info_extended() will need
such functionality.

Helped-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file: remove OBJECT_INFO_IGNORE_LOOSE</title>
<updated>2022-12-15T00:05:55Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2022-12-14T19:17:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=acd6f0d973e6cf27188d80ad3e8e6876bf0e3da9'/>
<id>urn:sha1:acd6f0d973e6cf27188d80ad3e8e6876bf0e3da9</id>
<content type='text'>
Its last user was removed in 97b2fa08b6 (fetch-pack: drop
custom loose object cache, 2018-11-12), so we can remove it.

Helped-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>git-compat-util.h: use "UNUSED", not "UNUSED(var)"</title>
<updated>2022-09-01T17:49:48Z</updated>
<author>
<name>Ævar Arnfjörð Bjarmason</name>
<email>avarab@gmail.com</email>
</author>
<published>2022-08-25T17:09:48Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5cf88fd8b059235b21ee2f72b17bf1f421a9c4e7'/>
<id>urn:sha1:5cf88fd8b059235b21ee2f72b17bf1f421a9c4e7</id>
<content type='text'>
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.

Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.

This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b240347543 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hashmap: mark unused callback parameters</title>
<updated>2022-08-19T19:18:55Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2022-08-19T10:08:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=02c3c59e62a30771d209a171edfc75a4d7387ebe'/>
<id>urn:sha1:02c3c59e62a30771d209a171edfc75a4d7387ebe</id>
<content type='text'>
Hashmap comparison functions must conform to a particular callback
interface, but many don't use all of their parameters. Especially the
void cmp_data pointer, but some do not use keydata either (because they
can easily form a full struct to pass when doing lookups). Let's mark
these to make -Wunused-parameter happy.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>object-file.c: add "stream_loose_object()" to handle large object</title>
<updated>2022-06-13T17:22:35Z</updated>
<author>
<name>Han Xin</name>
<email>hanxin.hx@alibaba-inc.com</email>
</author>
<published>2022-06-11T02:44:19Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2b6070ac4c408cce86daf21cbce60096f8b78216'/>
<id>urn:sha1:2b6070ac4c408cce86daf21cbce60096f8b78216</id>
<content type='text'>
If we want unpack and write a loose object using "write_loose_object",
we have to feed it with a buffer with the same size of the object, which
will consume lots of memory and may cause OOM. This can be improved by
feeding data to "stream_loose_object()" in a stream.

Add a new function "stream_loose_object()", which is a stream version of
"write_loose_object()" but with a low memory footprint. We will use this
function to unpack large blob object in later commit.

Another difference with "write_loose_object()" is that we have no chance
to run "write_object_file_prepare()" to calculate the oid in advance.
In "write_loose_object()", we know the oid and we can write the
temporary file in the same directory as the final object, but for an
object with an undetermined oid, we don't know the exact directory for
the object.

Still, we need to save the temporary file we're preparing
somewhere. We'll do that in the top-level ".git/objects/"
directory (or whatever "GIT_OBJECT_DIRECTORY" is set to). Once we've
streamed it we'll know the OID, and will move it to its canonical
path.

"freshen_packed_object()" or "freshen_loose_object()" will be called
inside "stream_loose_object()" after obtaining the "oid". After the
temporary file is written, we wants to mark the object to recent and we
may find that where indeed is already the object. We should remove the
temporary and do not leave a new copy of the object.

Helped-by: René Scharfe &lt;l.s.r@web.de&gt;
Helped-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Helped-by: Jiang Xin &lt;zhiyou.jx@alibaba-inc.com&gt;
Signed-off-by: Han Xin &lt;chiyutianyi@gmail.com&gt;
Signed-off-by: Ævar Arnfjörð Bjarmason &lt;avarab@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/cruft-packs'</title>
<updated>2022-06-03T21:30:37Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-06-03T21:30:37Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a50036da1a39806a8ae1aba2e2f2fea6f7fb8e08'/>
<id>urn:sha1:a50036da1a39806a8ae1aba2e2f2fea6f7fb8e08</id>
<content type='text'>
A mechanism to pack unreachable objects into a "cruft pack",
instead of ejecting them into loose form to be reclaimed later, has
been introduced.

* tb/cruft-packs:
  sha1-file.c: don't freshen cruft packs
  builtin/gc.c: conditionally avoid pruning objects via loose
  builtin/repack.c: add cruft packs to MIDX during geometric repack
  builtin/repack.c: use named flags for existing_packs
  builtin/repack.c: allow configuring cruft pack generation
  builtin/repack.c: support generating a cruft pack
  builtin/pack-objects.c: --cruft with expiration
  reachable: report precise timestamps from objects in cruft packs
  reachable: add options to add_unseen_recent_objects_to_traversal
  builtin/pack-objects.c: --cruft without expiration
  builtin/pack-objects.c: return from create_object_entry()
  t/helper: add 'pack-mtimes' test-tool
  pack-mtimes: support writing pack .mtimes files
  chunk-format.h: extract oid_version()
  pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
  pack-mtimes: support reading .mtimes files
  Documentation/technical: add cruft-packs.txt
</content>
</entry>
<entry>
<title>builtin/pack-objects.c: --cruft without expiration</title>
<updated>2022-05-26T22:48:26Z</updated>
<author>
<name>Taylor Blau</name>
<email>me@ttaylorr.com</email>
</author>
<published>2022-05-20T23:17:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b757353676d6ffe1ac275366fa8b5b42b5d9727d'/>
<id>urn:sha1:b757353676d6ffe1ac275366fa8b5b42b5d9727d</id>
<content type='text'>
Teach `pack-objects` how to generate a cruft pack when no objects are
dropped (i.e., `--cruft-expiration=never`). Later patches will teach
`pack-objects` how to generate a cruft pack that prunes objects.

When generating a cruft pack which does not prune objects, we want to
collect all unreachable objects into a single pack (noting and updating
their mtimes as we accumulate them). Ordinary use will pass the result
of a `git repack -A` as a kept pack, so when this patch says "kept
pack", readers should think "reachable objects".

Generating a non-expiring cruft packs works as follows:

  - Callers provide a list of every pack they know about, and indicate
    which packs are about to be removed.

  - All packs which are going to be removed (we'll call these the
    redundant ones) are marked as kept in-core.

    Any packs the caller did not mention (but are known to the
    `pack-objects` process) are also marked as kept in-core. Packs not
    mentioned by the caller are assumed to be unknown to them, i.e.,
    they entered the repository after the caller decided which packs
    should be kept and which should be discarded.

    Since we do not want to include objects in these "unknown" packs
    (because we don't know which of their objects are or aren't
    reachable), these are also marked as kept in-core.

  - Then, we enumerate all objects in the repository, and add them to
    our packing list if they do not appear in an in-core kept pack.

This results in a new cruft pack which contains all known objects that
aren't included in the kept packs. When the kept pack is the result of
`git repack -A`, the resulting pack contains all unreachable objects.

Signed-off-by: Taylor Blau &lt;me@ttaylorr.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
