<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/object-store.h, branch v2.30.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.30.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.30.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2020-08-06T20:01:02Z</updated>
<entry>
<title>sha1-file: introduce no-lazy-fetch has_object()</title>
<updated>2020-08-06T20:01:02Z</updated>
<author>
<name>Jonathan Tan</name>
<email>jonathantanmy@google.com</email>
</author>
<published>2020-08-05T23:06:49Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1d8d9cb62099e1524ce1269ea88faad871c2197f'/>
<id>urn:sha1:1d8d9cb62099e1524ce1269ea88faad871c2197f</id>
<content type='text'>
There have been a few bugs wherein Git fetches missing objects whenever
the existence of an object is checked, even though it does not need to
perform such a fetch. To resolve these bugs, we could look at all the
places that has_object_file() (or a similar function) is used. As a
first step, introduce a new function has_object() that checks for the
existence of an object, with a default behavior of not fetching if the
object is missing and the repository is a partial clone. As we verify
each has_object_file() (or similar) usage, we can replace it with
has_object(), and we will know that we are done when we can delete
has_object_file() (and the other similar functions).

Also, the new function has_object() has more appropriate defaults:
besides not fetching, it also does not recheck packed storage.

Signed-off-by: Jonathan Tan &lt;jonathantanmy@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packfile: compute and use the index CRC offset</title>
<updated>2020-05-27T17:07:07Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2020-05-25T19:59:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=629dffc461f3631bb7acfe905d805caa38b49dfa'/>
<id>urn:sha1:629dffc461f3631bb7acfe905d805caa38b49dfa</id>
<content type='text'>
Both v2 pack index files and the v3 format specified as part of the
NewHash work have similar data starting at the CRC table.  Much of the
existing code wants to read either this table or the offset entries
following it, and in doing so computes the offset each time.

In order to share as much code between v2 and v3, compute the offset of
the CRC table and store it when the pack is opened.  Use this value to
compute offsets to not only the CRC table, but to the offset entries
beyond it.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>oid_array: rename source file from sha1-array</title>
<updated>2020-03-30T17:59:08Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-03-30T14:03:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fe299ec5ae7b419990bbc3efd4e6bfa3f0773b45'/>
<id>urn:sha1:fe299ec5ae7b419990bbc3efd4e6bfa3f0773b45</id>
<content type='text'>
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to
oid_array, 2017-03-31), but the file is still called sha1-array. Besides
being slightly confusing, it makes it more annoying to grep for leftover
occurrences of "sha1" in various files, because the header is included
in so many places.

Let's complete the transition by renaming the source and header files
(and fixing up a few comment references).

I kept the "-" in the name, as that seems to be our style; cf.
fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10).
We also have oidmap.h and oidset.h without any punctuation, but those
are "struct oidmap" and "struct oidset" in the code. We _could_ make
this "oidarray" to match, but somehow it looks uglier to me because of
the length of "array" (plus it would be a very invasive patch for little
gain).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>packed_object_info(): use object_id for returning delta base</title>
<updated>2020-02-24T20:55:53Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2020-02-24T04:36:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b99b6bcc57faf5c989fc18c3b8d4d92df3407cec'/>
<id>urn:sha1:b99b6bcc57faf5c989fc18c3b8d4d92df3407cec</id>
<content type='text'>
If a caller sets the object_info.delta_base_sha1 to a non-NULL pointer,
we'll write the oid of the object's delta base to it. But we can
increase our type safety by switching this to a real object_id struct.
All of our callers are just pointing into the hash member of an
object_id anyway, so there's no inconvenience.

Note that we do still keep it as a pointer-to-struct, because the NULL
sentinel value tells us whether the caller is even interested in the
information.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'mt/use-passed-repo-more-in-funcs'</title>
<updated>2020-02-14T20:54:22Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-02-14T20:54:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=78e67cda42a440f216fc4cbfd8a9e190c5eece5a'/>
<id>urn:sha1:78e67cda42a440f216fc4cbfd8a9e190c5eece5a</id>
<content type='text'>
Some codepaths were given a repository instance as a parameter to
work in the repository, but passed the_repository instance to its
callees, which has been cleaned up (somewhat).

* mt/use-passed-repo-more-in-funcs:
  sha1-file: allow check_object_signature() to handle any repo
  sha1-file: pass git_hash_algo to hash_object_file()
  sha1-file: pass git_hash_algo to write_object_file_prepare()
  streaming: allow open_istream() to handle any repo
  pack-check: use given repo's hash_algo at verify_packfile()
  cache-tree: use given repo's hash_algo at verify_one()
  diff: make diff_populate_filespec() honor its repo argument
</content>
</entry>
<entry>
<title>Merge branch 'mt/threaded-grep-in-object-store'</title>
<updated>2020-02-14T20:54:20Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-02-14T20:54:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=56ceb64eb0377acb2884226d4fb7d7c3f4032354'/>
<id>urn:sha1:56ceb64eb0377acb2884226d4fb7d7c3f4032354</id>
<content type='text'>
Traditionally, we avoided threaded grep while searching in objects
(as opposed to files in the working tree) as accesses to the object
layer is not thread-safe.  This limitation is getting lifted.

* mt/threaded-grep-in-object-store:
  grep: use no. of cores as the default no. of threads
  grep: move driver pre-load out of critical section
  grep: re-enable threads in non-worktree case
  grep: protect packed_git [re-]initialization
  grep: allow submodule functions to run in parallel
  submodule-config: add skip_if_read option to repo_read_gitmodules()
  grep: replace grep_read_mutex by internal obj read lock
  object-store: allow threaded access to object reading
  replace-object: make replace operations thread-safe
  grep: fix racy calls in grep_objects()
  grep: fix race conditions at grep_submodule()
  grep: fix race conditions on userdiff calls
</content>
</entry>
<entry>
<title>Merge branch 'jn/pretend-object-doc'</title>
<updated>2020-02-12T20:41:35Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-02-12T20:41:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b486d2ee811d7471e7a31cd315eddca1be79fc19'/>
<id>urn:sha1:b486d2ee811d7471e7a31cd315eddca1be79fc19</id>
<content type='text'>
Warn programmers about pretend_object_file() that allows the code
to tentatively use in-core objects.

* jn/pretend-object-doc:
  sha1-file: document how to use pretend_object_file
</content>
</entry>
<entry>
<title>sha1-file: pass git_hash_algo to hash_object_file()</title>
<updated>2020-01-31T18:45:39Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2020-01-30T20:32:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=2dcde20e1c55fc2e3f9e9e6d48e93c39ec5661d2'/>
<id>urn:sha1:2dcde20e1c55fc2e3f9e9e6d48e93c39ec5661d2</id>
<content type='text'>
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'jt/sha1-file-remove-oi-skip-cached'</title>
<updated>2020-01-22T23:07:30Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2020-01-22T23:07:30Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=e26bd14c8d7c7983d741cd59c076d33306b7050e'/>
<id>urn:sha1:e26bd14c8d7c7983d741cd59c076d33306b7050e</id>
<content type='text'>
has_object_file() said "no" given an object registered to the
system via pretend_object_file(), making it inconsistent with
read_object_file(), causing lazy fetch to attempt fetching an
empty tree from promisor remotes.

* jt/sha1-file-remove-oi-skip-cached:
  sha1-file: remove OBJECT_INFO_SKIP_CACHED
</content>
</entry>
<entry>
<title>object-store: allow threaded access to object reading</title>
<updated>2020-01-17T21:52:14Z</updated>
<author>
<name>Matheus Tavares</name>
<email>matheus.bernardino@usp.br</email>
</author>
<published>2020-01-16T02:39:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=31877c9aec21e0824fd4fcf415069cf8dfae4b72'/>
<id>urn:sha1:31877c9aec21e0824fd4fcf415069cf8dfae4b72</id>
<content type='text'>
Allow object reading to be performed by multiple threads protecting it
with an internal lock, the obj_read_mutex. The lock usage can be toggled
with enable_obj_read_lock() and disable_obj_read_lock(). Currently, the
functions which can be safely called in parallel are:
read_object_file_extended(), repo_read_object_file(),
read_object_file(), read_object_with_reference(), read_object(),
oid_object_info() and oid_object_info_extended(). It's also possible
to use obj_read_lock() and obj_read_unlock() to protect other sections
that cannot execute in parallel with object reading.

Probably there are many spots in the functions listed above that could
be executed unlocked (and thus, in parallel). But, for now, we are most
interested in allowing parallel access to zlib inflation. This is one of
the sections where object reading spends most of the time in (e.g. up to
one-third of git-grep's execution time in the chromium repo corresponds
to inflation) and it's already thread-safe. So, to take advantage of
that, the obj_read_mutex is released when calling git_inflate() and
re-acquired right after, for every calling spot in
oid_object_info_extended()'s call chain. We may refine this lock to also
exploit other possible parallel spots in the future, but for now,
threaded zlib inflation should already give great speedups for threaded
object reading callers.

Note that add_delta_base_cache() was also modified to skip adding
already present entries to the cache. This wasn't possible before, but
it would be now, with the parallel inflation. Take for example the
following situation, where two threads - A and B - are executing the
code at unpack_entry():

1. Thread A is performing the decompression of a base O (which is not
   yet in the cache) at PHASE II. Thread B is simultaneously trying to
   unpack O, but just starting at PHASE I.
2. Since O is not yet in the cache, B will go to PHASE II to also
   perform the decompression.
3. When they finish decompressing, one of them will get the object
   reading mutex and go to PHASE III while the other waits for the
   mutex. Let’s say A got the mutex first.
4. Thread A will add O to the cache, go throughout the rest of PHASE III
   and return.
5. Thread B gets the mutex, also add O to the cache (if the check wasn't
   there) and returns.

Finally, it is also important to highlight that the object reading lock
can only ensure thread-safety in the mentioned functions thanks to two
complementary mechanisms: the use of 'struct raw_object_store's
replace_mutex, which guards sections in the object reading machinery
that would otherwise be thread-unsafe; and the 'struct pack_window's
inuse_cnt, which protects window reading operations (such as the one
performed during the inflation of a packed object), allowing them to
execute without the acquisition of the obj_read_mutex.

Signed-off-by: Matheus Tavares &lt;matheus.bernardino@usp.br&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
