<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/cache.h, branch v2.37.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.37.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.37.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2022-06-03T21:30:35Z</updated>
<entry>
<title>Merge branch 'ds/sparse-sparse-checkout'</title>
<updated>2022-06-03T21:30:35Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-06-03T21:30:35Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c276c21da6d055060b1c216de1b1c04fb058425f'/>
<id>urn:sha1:c276c21da6d055060b1c216de1b1c04fb058425f</id>
<content type='text'>
"sparse-checkout" learns to work well with the sparse-index
feature.

* ds/sparse-sparse-checkout:
  sparse-checkout: integrate with sparse index
  p2000: add test for 'git sparse-checkout [add|set]'
  sparse-index: complete partial expansion
  sparse-index: partially expand directories
  sparse-checkout: --no-sparse-index needs a full index
  cache-tree: implement cache_tree_find_path()
  sparse-index: introduce partially-sparse indexes
  sparse-index: create expand_index()
  t1092: stress test 'git sparse-checkout set'
  t1092: refactor 'sparse-index contents' test
</content>
</entry>
<entry>
<title>Merge branch 'ns/batch-fsync'</title>
<updated>2022-06-03T21:30:34Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-06-03T21:30:34Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=83937e9592832408670da38bfe6e96c90ad63521'/>
<id>urn:sha1:83937e9592832408670da38bfe6e96c90ad63521</id>
<content type='text'>
Introduce a filesystem-dependent mechanism to optimize the way the
bits for many loose object files are ensured to hit the disk
platter.

* ns/batch-fsync:
  core.fsyncmethod: performance tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: tests for batch mode
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsync: use batch mode and sync loose objects by default on Windows
  unpack-objects: use the bulk-checkin infrastructure
  update-index: use the bulk-checkin infrastructure
  builtin/add: add ODB transaction around add_files_to_cache
  cache-tree: use ODB transaction around writing a tree
  core.fsyncmethod: batched disk flushes for loose-objects
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
</content>
</entry>
<entry>
<title>sparse-index: introduce partially-sparse indexes</title>
<updated>2022-05-23T18:08:21Z</updated>
<author>
<name>Derrick Stolee</name>
<email>dstolee@microsoft.com</email>
</author>
<published>2022-05-23T13:48:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9fadb373dd4a670e761776560d3c40f6fcc80360'/>
<id>urn:sha1:9fadb373dd4a670e761776560d3c40f6fcc80360</id>
<content type='text'>
A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be a an 'enum sparse_index_mode' with three modes:

* INDEX_EXPANDED (0): No sparse directories exist. This is always the
  case for repositories that do not use cone-mode sparse-checkout.

* INDEX_COLLAPSED: Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.

* INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file
  entries outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
INDEX_EXPANDED mode but to actually do the necessary work when in
INDEX_PARTIALLY_SPARSE mode.

The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ds/midx-normalize-pathname-before-comparison'</title>
<updated>2022-05-04T16:51:29Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-05-04T16:51:29Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8b28e2e2e4abd0e9cffe0d85afd6f0c0346bb948'/>
<id>urn:sha1:8b28e2e2e4abd0e9cffe0d85afd6f0c0346bb948</id>
<content type='text'>
The path taken by "git multi-pack-index" command from the end user
was compared with path internally prepared by the tool withut first
normalizing, which lead to duplicated paths not being noticed,
which has been corrected.

* ds/midx-normalize-pathname-before-comparison:
  cache: use const char * for get_object_directory()
  multi-pack-index: use --object-dir real path
  midx: use real paths in lookup_multi_pack_index()
</content>
</entry>
<entry>
<title>cache: use const char * for get_object_directory()</title>
<updated>2022-04-25T18:31:13Z</updated>
<author>
<name>Derrick Stolee</name>
<email>derrickstolee@github.com</email>
</author>
<published>2022-04-25T18:27:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=11f9e8de3d1a3d484c452141f1fcdbd707457ec0'/>
<id>urn:sha1:11f9e8de3d1a3d484c452141f1fcdbd707457ec0</id>
<content type='text'>
The get_object_directory() method returns the exact string stored at
the_repository-&gt;objects-&gt;odb-&gt;path. The return type of "char *" implies
that the caller must keep track of the buffer and free() it when
complete. This causes significant problems later when the ODB is
accessed.

Use "const char *" as the return type to avoid this confusion. There are
no current callers that care about the non-const definition.

Signed-off-by: Derrick Stolee &lt;derrickstolee@github.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>core.fsync: use batch mode and sync loose objects by default on Windows</title>
<updated>2022-04-06T20:13:26Z</updated>
<author>
<name>Neeraj Singh</name>
<email>neerajsi@microsoft.com</email>
</author>
<published>2022-04-05T05:20:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8a94d833497f3b8232684271c80a7da56f930939'/>
<id>urn:sha1:8a94d833497f3b8232684271c80a7da56f930939</id>
<content type='text'>
Git for Windows has defaulted to core.fsyncObjectFiles=true since
September 2017. We turn on syncing of loose object files with batch mode
in upstream Git so that we can get broad coverage of the new code
upstream.

We don't actually do fsyncs in the most of the test suite, since
GIT_TEST_FSYNC is set to 0. However, we do exercise all of the
surrounding batch mode code since GIT_TEST_FSYNC merely makes the
maybe_fsync wrapper always appear to succeed.

Signed-off-by: Neeraj Singh &lt;neerajsi@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>core.fsyncmethod: batched disk flushes for loose-objects</title>
<updated>2022-04-06T20:13:01Z</updated>
<author>
<name>Neeraj Singh</name>
<email>neerajsi@microsoft.com</email>
</author>
<published>2022-04-05T05:20:09Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c0f4752ed2ffb89d10a9e469a83f089cc7e9df9d'/>
<id>urn:sha1:c0f4752ed2ffb89d10a9e469a83f089cc7e9df9d</id>
<content type='text'>
When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin odb-transaction functionality, takes
advantage of tmp-objdir, and uses the writeout-only support code.

When the new mode is enabled, we do the following for each new object:
1a. Create the object in a tmp-objdir.
2a. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:
1b. Issue an fsync against a dummy file to flush the log and hardware
   writeback cache, which should by now have seen the tmp-objdir writes.
2b. Rename all of the tmp-objdir files to their final names.
3b. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the default
   today, but the user now has the option of syncing the index and there
   is a separate patch series to implement syncing of refs.

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns. This sequence also ensures that no object files
appear in the main object store unless they are fsync-durable.

Batch mode is only enabled if core.fsync includes loose-objects. If
the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does
not include loose-objects, we will use file-by-file fsyncing.

In step (1a) of the sequence, the tmp-objdir is created lazily to avoid
work if no loose objects are ever added to the ODB. We use a tmp-objdir
to maintain the invariant that no loose-objects are visible in the main
ODB unless they are properly fsync-durable. This is important since
future ODB operations that try to create an object with specific
contents will silently drop the new data if an object with the target
hash exists without checking that the loose-object contents match the
hash. Only a full git-fsck would restore the ODB to a functional state
where dataloss doesn't occur.

In step (1b) of the sequence, we issue a fsync against a dummy file
created specifically for the purpose. This method has a little higher
cost than using one of the input object files, but makes adding new
callers of this mechanism easier, since we don't need to figure out
which object file is "last" or risk sharing violations by caching the fd
of the last object file.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.

Adding 500 files to the repo with 'git add' Times reported in seconds.

object file syncing | Linux | Mac   | Windows
--------------------|-------|-------|--------
           disabled | 0.06  |  0.35 | 0.61
              fsync | 1.88  | 11.18 | 2.47
              batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh &lt;neerajsi@microsoft.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ns/core-fsyncmethod' into ns/batch-fsync</title>
<updated>2022-04-06T20:01:54Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-04-06T20:01:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fca85986bb936346d362c4802ebce5692fd257ee'/>
<id>urn:sha1:fca85986bb936346d362c4802ebce5692fd257ee</id>
<content type='text'>
* ns/core-fsyncmethod:
  configure.ac: fix HAVE_SYNC_FILE_RANGE definition
  core.fsyncmethod: correctly camel-case warning message
  core.fsync: fix incorrect expression for default configuration
  core.fsync: documentation and user-friendly aggregate options
  core.fsync: new option to harden the index
  core.fsync: add configuration parsing
  core.fsync: introduce granular fsync control infrastructure
  core.fsyncmethod: add writeout-only mode
  wrapper: make inclusion of Windows csprng header tightly scoped
</content>
</entry>
<entry>
<title>Merge branch 'jh/builtin-fsmonitor-part2'</title>
<updated>2022-04-04T17:56:24Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-04-04T17:56:24Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=439c1e6d5d8ad4d1134fc6ff5e514d28ff9ecac4'/>
<id>urn:sha1:439c1e6d5d8ad4d1134fc6ff5e514d28ff9ecac4</id>
<content type='text'>
Built-in fsmonitor (part 2).

* jh/builtin-fsmonitor-part2: (30 commits)
  t7527: test status with untracked-cache and fsmonitor--daemon
  fsmonitor: force update index after large responses
  fsmonitor--daemon: use a cookie file to sync with file system
  fsmonitor--daemon: periodically truncate list of modified files
  t/perf/p7519: add fsmonitor--daemon test cases
  t/perf/p7519: speed up test on Windows
  t/perf/p7519: fix coding style
  t/helper/test-chmtime: skip directories on Windows
  t/perf: avoid copying builtin fsmonitor files into test repo
  t7527: create test for fsmonitor--daemon
  t/helper/fsmonitor-client: create IPC client to talk to FSMonitor Daemon
  help: include fsmonitor--daemon feature flag in version info
  fsmonitor--daemon: implement handle_client callback
  compat/fsmonitor/fsm-listen-darwin: implement FSEvent listener on MacOS
  compat/fsmonitor/fsm-listen-darwin: add MacOS header files for FSEvent
  compat/fsmonitor/fsm-listen-win32: implement FSMonitor backend on Windows
  fsmonitor--daemon: create token-based changed path cache
  fsmonitor--daemon: define token-ids
  fsmonitor--daemon: add pathname classification
  fsmonitor--daemon: implement 'start' command
  ...
</content>
</entry>
<entry>
<title>Merge branch 'ns/core-fsyncmethod'</title>
<updated>2022-04-04T17:56:22Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-04-04T17:56:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=27dd46079980a8f80c5e70bc8fa5ee4c9b54995e'/>
<id>urn:sha1:27dd46079980a8f80c5e70bc8fa5ee4c9b54995e</id>
<content type='text'>
A couple of fix-up to a topic that is now in 'master'.

* ns/core-fsyncmethod:
  core.fsyncmethod: correctly camel-case warning message
  core.fsync: fix incorrect expression for default configuration
</content>
</entry>
</feed>
