aboutsummaryrefslogtreecommitdiffstats
path: root/pack.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-07-16object-file: get rid of `the_repository` in `finalize_object_file()`Patrick Steinhardt1-1/+2
We implicitly depend on `the_repository` when moving an object file into place in `finalize_object_file()`. Get rid of this global dependency by passing in a repository. Note that one might be pressed to inject an object database instead of a repository. But the function doesn't really care about the ODB at all. All it does is to move a file into place while checking whether there is any collision. As such, the functionality it provides is independent of the object database and only needs the repository as parameter so that it can adjust permissions of the file we are about to finalize. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10pack-write: stop depending on `the_repository` and `the_hash_algo`Patrick Steinhardt1-5/+6
There are a couple of functions in "pack-write.c" that implicitly depend on `the_repository` or `the_hash_algo`. Remove this dependency by injecting the repository via a parameter and adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-03Merge branch 'kn/pack-write-with-reduced-globals'Junio C Hamano1-6/+24
Code clean-up. * kn/pack-write-with-reduced-globals: pack-write: pass hash_algo to internal functions pack-write: pass hash_algo to `write_rev_file()` pack-write: pass hash_algo to `write_idx_file()` pack-write: pass repository to `index_pack_lockfile()` pack-write: pass hash_algo to `fixup_pack_header_footer()`
2025-01-28Merge branch 'jk/pack-header-parse-alignment-fix'Junio C Hamano1-1/+2
It was possible for "git unpack-objects" and "git index-pack" to make an unaligned access, which has been corrected. * jk/pack-header-parse-alignment-fix: index-pack, unpack-objects: use skip_prefix to avoid magic number index-pack, unpack-objects: use get_be32() for reading pack header parse_pack_header_option(): avoid unaligned memory writes packfile: factor out --pack_header argument parsing bswap.h: squelch potential sparse -Wcast-truncate warnings
2025-01-21pack-write: pass hash_algo to `write_rev_file()`Karthik Nayak1-2/+12
The `write_rev_file()` function uses the global `the_hash_algo` variable to access the repository's hash_algo. To avoid global variable usage, pass a hash_algo from the layers above. Also modify children functions `write_rev_file_order()` and `write_rev_header()` to accept 'the_hash_algo'. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. However, in `midx-write.c`, since all usage of global variables is removed, don't reintroduce them and instead use the `repo` available in the context. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21pack-write: pass hash_algo to `write_idx_file()`Karthik Nayak1-2/+8
The `write_idx_file()` function uses the global `the_hash_algo` variable to access the repository's hash_algo. To avoid global variable usage, pass a hash_algo from the layers above. Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls `write_idx_file()`, update it to accept a `struct git_hash_algo` as a parameter and pass it through to the callee. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21pack-write: pass repository to `index_pack_lockfile()`Karthik Nayak1-1/+1
The `index_pack_lockfile()` function uses the global `the_repository` variable to access the repository. To avoid global variable usage, pass the repository from the layers above. Altough the layers above could have access to the repository internally, simply pass in `the_repository`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21pack-write: pass hash_algo to `fixup_pack_header_footer()`Karthik Nayak1-1/+3
The `fixup_pack_header_footer()` function uses the global `the_hash_algo` variable to access the repository's hash function. To avoid global variable usage, pass a hash_algo from the layers above. Altough the layers above could have access to the hash_algo internally, simply pass in `the_hash_algo`. This avoids any compatibility issues and bubbles up global variable usage to upper layers which can be eventually resolved. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21index-pack, unpack-objects: use get_be32() for reading pack headerJeff King1-1/+2
Both of these commands read the incoming pack into a static unsigned char buffer in BSS, and then parse it by casting the start of the buffer to a struct pack_header. This can result in SIGBUS on some platforms if the compiler doesn't place the buffer in a position that is properly aligned for 4-byte integers. This reportedly happens with unpack-objects (but not index-pack) on sparc64 when compiled with clang (but not gcc). But we are definitely in the wrong in both spots; since the buffer's type is unsigned char, we can't depend on larger alignment. When it works it is only because we are lucky. We'll fix this by switching to get_be32() to read the headers (just like the last few commits similarly switched us to put_be32() for writing into the same buffer). It would be nice to factor this out into a common helper function, but the interface ends up quite awkward. Either the caller needs to hardcode how many bytes we'll need, or it needs to pass us its fill()/use() functions as pointers. So I've just fixed both spots in the same way; this is not code that is likely to be repeated a third time (most of the pack reading code uses an mmap'd buffer, which should be properly aligned). I did make one tweak to the shared code: our pack_version_ok() macro expects us to pass the big-endian value we'd get by casting. We can introduce a "native" variant which uses the host integer ordering. Reported-by: Koakuma <koachan@protonmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-04config: make `delta_base_cache_limit` a non-global variableKarthik Nayak1-0/+2
The `delta_base_cache_limit` variable is a global config variable used by multiple subsystems. Let's make this non-global, by adding this variable independently to the subsystems where it is used. First, add the setting to the `repo_settings` struct, this provides access to the config in places where the repository is available. Use this in `packfile.c`. In `index-pack.c` we add it to the `pack_idx_option` struct and its constructor. While the repository struct is available here, it may not be set because `git index-pack` can be used without a repository. In `gc.c` add it to the `gc_config` struct and also the constructor function. The gc functions currently do not have direct access to a repository struct. These changes are made to remove the usage of `delta_base_cache_limit` as a global variable in `packfile.c`. This brings us one step closer to removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c` which we complete in the next patch. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-30pack-write: fix return parameter of `write_rev_file_order()`Patrick Steinhardt1-2/+2
While the return parameter of `write_rev_file_order()` is a string constant, the function may indeed return an allocated string when its first parameter is a `NULL` pointer. This makes for a confusing calling convention, where callers need to be aware of these intricate ownership rules and cast away the constness to free the string in some cases. Adapt the function and its caller `write_rev_file()` to always return an allocated string and adapt callers to always free the return value. Note that this requires us to also adapt `rename_tmp_packfile()`, which compares the pointers to packfile data with each other. Now that the path of the reverse index file gets allocated unconditionally the check will always fail. This is fixed by using strcmp(3P) instead, which also feels way less fragile. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21csum-file.h: remove unnecessary inclusion of cache.hElijah Newren1-0/+2
With the change in the last commit to move several functions to write-or-die.h, csum-file.h no longer needs to include cache.h. However, removing that include forces several other C files, which directly or indirectly dependend upon csum-file.h's inclusion of cache.h, to now be more explicit about their dependencies. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26pack-mtimes: support writing pack .mtimes filesTaylor Blau1-0/+1
Now that the `.mtimes` format is defined, supplement the pack-write API to be able to conditionally write an `.mtimes` file along with a pack by setting an additional flag and passing an oidmap that contains the timestamps corresponding to each object in the pack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'Taylor Blau1-0/+3
This structure will be used to communicate the per-object mtimes when writing a cruft pack. Here, we need the full packing_data structure because the mtime information is stored in an array there, not on the individual object_entry's themselves (to avoid paying the overhead in structure width for operations which do not generate a cruft pack). We haven't passed this information down before because one of the two callers (in bulk-checkin.c) does not have a packing_data structure at all. In that case (where no cruft pack will be generated), NULL is passed instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09pack-write: split up finish_tmp_packfile() functionÆvar Arnfjörð Bjarmason1-2/+5
Split up the finish_tmp_packfile() function and use the split-up version in pack-objects.c in preparation for moving the step of renaming the *.idx file later as part of a function change. Since the only other caller of finish_tmp_packfile() was in bulk-checkin.c, and it won't be needing a change to its *.idx renaming, provide a thin wrapper for the old function as a static function in that file. If other callers end up needing the simpler version it could be moved back to "pack-write.c" and "pack.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09pack.h: line-wrap the definition of finish_tmp_packfile()Ævar Arnfjörð Bjarmason1-1/+6
Line-wrap the definition of finish_tmp_packfile() to make subsequent changes easier to read. See 0e990530ae (finish_tmp_packfile(): a helper function, 2011-10-28) for the commit that introduced this overly long line. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08Merge branch 'tb/reverse-midx'Junio C Hamano1-0/+1
An on-disk reverse-index to map the in-pack location of an object back to its object name across multiple packfiles is introduced. * tb/reverse-midx: midx.c: improve cache locality in midx_pack_order_cmp() pack-revindex: write multi-pack reverse indexes pack-write.c: extract 'write_rev_file_order' pack-revindex: read multi-pack reverse indexes Documentation/technical: describe multi-pack reverse indexes midx: make some functions non-static midx: keep track of the checksum midx: don't free midx_name early midx: allow marking a pack as preferred t/helper/test-read-midx.c: add '--show-objects' builtin/multi-pack-index.c: display usage on unrecognized command builtin/multi-pack-index.c: don't enter bogus cmd_mode builtin/multi-pack-index.c: split sub-commands builtin/multi-pack-index.c: define common usage with a macro builtin/multi-pack-index.c: don't handle 'progress' separately builtin/multi-pack-index.c: inline 'flags' with options
2021-04-01pack-write.c: extract 'write_rev_file_order'Taylor Blau1-0/+1
Existing callers provide the reverse index code with an array of 'struct pack_idx_entry *'s, which is then sorted by pack order (comparing the offsets of each object within the pack). Prepare for the multi-pack index to write a .rev file by providing a way to write the reverse index without an array of pack_idx_entry (which the MIDX code does not have). Instead, callers can invoke 'write_rev_index_positions()', which takes an array of uint32_t's. The ith entry in this array specifies the ith object's (in index order) position within the pack (in pack order). Expose this new function for use in a later patch, and rewrite the existing write_rev_file() in terms of this new function. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-01Merge branch 'jt/transfer-fsck-across-packs'Junio C Hamano1-1/+1
The approach to "fsck" the incoming objects in "index-pack" is attractive for performance reasons (we have them already in core, inflated and ready to be inspected), but fundamentally cannot be applied fully when we receive more than one pack stream, as a tree object in one pack may refer to a blob object in another pack as ".gitmodules", when we want to inspect blobs that are used as ".gitmodules" file, for example. Teach "index-pack" to emit objects that must be inspected later and check them in the calling "fetch-pack" process. * jt/transfer-fsck-across-packs: fetch-pack: print and use dangling .gitmodules fetch-pack: with packfile URIs, use index-pack arg http-fetch: allow custom index-pack args http: allow custom index-pack args
2021-02-22fetch-pack: print and use dangling .gitmodulesJonathan Tan1-1/+1
Teach index-pack to print dangling .gitmodules links after its "keep" or "pack" line instead of declaring an error, and teach fetch-pack to check such lines printed. This allows the tree side of the .gitmodules link to be in one packfile and the blob side to be in another without failing the fsck check, because it is now fetch-pack which checks such objects after all packfiles have been downloaded and indexed (and not index-pack on an individual packfile, as it is before this commit). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25pack-write.c: prepare to write 'pack-*.rev' filesTaylor Blau1-0/+4
This patch prepares for callers to be able to write reverse index files to disk. It adds the necessary machinery to write a format-compliant .rev file from within 'write_rev_file()', which is called from 'finish_tmp_packfile()'. Similar to the process by which the reverse index is computed in memory, these new paths also have to sort a list of objects by their offsets within a packfile. These new paths use a qsort() (as opposed to a radix sort), since our specialized radix sort requires a full revindex_entry struct per object, which is more memory than we need to allocate. The qsort is obviously slower, but the theoretical slowdown would require a repository with a large amount of objects, likely implying that the time spent in, say, pack-objects during a repack would dominate the overall runtime. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-12fetch-pack: refactor writing promisor fileChristian Couder1-0/+4
Let's replace the 2 different pieces of code that write a promisor file in 'builtin/repack.c' and 'fetch-pack.c' with a new function called 'write_promisor_file()' in 'pack-write.c' and 'pack.h'. This might also help us in the future, if we want to put back the ref names and associated hashes that were in the promisor files we are repacking in 'builtin/repack.c' as suggested by a NEEDSWORK comment just above the code we are refactoring. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-05*.[ch]: manually align parameter listsDenton Liu1-1/+1
In previous patches, extern was mechanically removed from function declarations without care to formatting, causing parameter lists to be misaligned. Manually format changed sections such that the parameter lists should be realigned. Viewing this patch with 'git diff -w' should produce no output. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-05*.[ch]: remove extern from function declarations using spatchDenton Liu1-12/+12
There has been a push to remove extern from function declarations. Remove some instances of "extern" for function declarations which are caught by Coccinelle. Note that Coccinelle has some difficulty with processing functions with `__attribute__` or varargs so some `extern` declarations are left behind to be dealt with in a future patch. This was the Coccinelle patch used: @@ type T; identifier f; @@ - extern T f(...); and it was run with: $ git ls-files \*.{c,h} | grep -v ^compat/ | xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place Files under `compat/` are intentionally excluded as some are directly copied from external sources and we should avoid churning them as much as possible. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12pack-check.c: remove the_repository referencesNguyễn Thái Ngọc Duy1-1/+3
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02csum-file: rename sha1file to hashfilebrian m. carlson1-2/+2
Rename struct sha1file to struct hashfile, along with all of its related functions. The transformation in this commit was made by global search-and-replace. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08Convert the verify_pack callback to struct object_idbrian m. carlson1-1/+1
Make the verify_pack_callback take a pointer to struct object_id. Change the pack checksum to use GIT_MAX_RAWSZ, even though it is not strictly an object ID. Doing so ensures resilience against future hash size changes, and allows us to remove hard-coded assumptions about how big the buffer needs to be. Also, use a union to convert the pointer from nth_packed_object_sha1 to to a pointer to struct object_id. This behavior is compatible with GCC and clang and explicitly sanctioned by C11. The alternatives are to just perform a cast, which would run afoul of strict aliasing rules, but should just work, and changing the pointer into an instance of struct object_id and copying the value. The latter operation could seriously bloat memory usage on fsck, which already uses a lot of memory on some repositories. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08pack: convert struct pack_idx_entry to struct object_idbrian m. carlson1-1/+1
Convert struct pack_idx_entry to use struct object_id by changing the definition and applying the following semantic patch, plus the standard object_id transforms: @@ struct pack_idx_entry E1; @@ - E1.sha1 + E1.oid.hash @@ struct pack_idx_entry *E1; @@ - E1->sha1 + E1->oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-24pack.h: define largest possible encoded object sizeJeff King1-0/+6
Several callers use fixed buffers for storing the pack object header, and they've picked 10 as a magic number. This is reasonable, since it handles objects up to 2^67. But let's give them a constant so it's clear that the number isn't pulled out of thin air. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-24encode_in_pack_object_header: respect output buffer lengthJeff King1-1/+2
The encode_in_pack_object_header() writes a variable-length header to an output buffer, but it doesn't actually know long the buffer is. At first glance, this looks like it might be possible to overflow. In practice, this is probably impossible. The smallest buffer we use is 10 bytes, which would hold the header for an object up to 2^67 bytes. Obviously we're not likely to see such an object, but we might worry that an object could lie about its size (causing us to overflow before we realize it does not actually have that many bytes). But the argument is passed as a uintmax_t. Even on systems that have __int128 available, uintmax_t is typically restricted to 64-bit by the ABI. So it's unlikely that a system exists where this could be exploited. Still, it's easy enough to use a normal out/len pair and make sure we don't write too far. That protects the hypothetical 128-bit system, makes it harder for callers to accidentally specify a too-small buffer, and makes the resulting code easier to audit. Note that the one caller in fast-import tried to catch such a case, but did so _after_ the call (at which point we'd have already overflowed!). This check can now go away. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13fsck: use streaming interface for large blobs in packNguyễn Thái Ngọc Duy1-0/+1
For blobs, we want to make sure the on-disk data is not corrupted (i.e. can be inflated and produce the expected SHA-1). Blob content is opaque, there's nothing else inside to check for. For really large blobs, we may want to avoid unpacking the entire blob in memory, just to check whether it produces the same SHA-1. On 32-bit systems, we may not have enough virtual address space for such memory allocation. And even on 64-bit where it's not a problem, allocating a lot more memory could result in kicking other parts of systems to swap file, generating lots of I/O and slowing everything down. For this particular operation, not unpacking the blob and letting check_sha1_signature, which supports streaming interface, do the job is sufficient. check_sha1_signature() is not shown in the diff, unfortunately. But if will be called when "data_valid && !data" is false. We will call the callback function "fn" with NULL as "data". The only callback of this function is fsck_obj_buffer(), which does not touch "data" at all if it's a blob. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03finish_tmp_packfile():use strbuf for pathname constructionSun He1-1/+1
The old version fixes a maximum length on the buffer, which could be a problem if one is not certain of the length of get_object_directory(). Using strbuf can avoid the protential bug. Helped-by: Michael Haggerty <mhagger@alum.mit.edu> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Sun He <sunheehnus@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05pack-objects: name pack files after trailer hashJeff King1-1/+1
Our current scheme for naming packfiles is to calculate the sha1 hash of the sorted list of objects contained in the packfile. This gives us a unique name, so we are reasonably sure that two packs with the same name will contain the same objects. It does not, however, tell us that two such packs have the exact same bytes. This makes things awkward if we repack the same set of objects. Due to run-to-run variations, the bytes may not be identical (e.g., changed zlib or git versions, different source object reuse due to new packs in the repository, or even different deltas due to races during a multi-threaded delta search). In theory, this could be helpful to a program that cares that the packfile contains a certain set of objects, but does not care about the particular representation. In practice, no part of git makes use of that, and in many cases it is potentially harmful. For example, if a dumb http client fetches the .idx file, it must be sure to get the exact .pack that matches it. Similarly, a partial transfer of a .pack file cannot be safely resumed, as the actual bytes may have changed. This could also affect a local client which opened the .idx and .pack files, closes the .pack file (due to memory or file descriptor limits), and then re-opens a changed packfile. In all of these cases, git can detect the problem, as we have the sha1 of the bytes themselves in the pack trailer (which we verify on transfer), and the .idx file references the trailer from the matching packfile. But it would be simpler and more efficient to actually get the correct bytes, rather than noticing the problem and having to restart the operation. This patch simply uses the pack trailer sha1 as the pack name. It should be similarly unique, but covers the exact representation of the objects. Other parts of git should not care, as the pack name is returned by pack-objects and is essentially opaque. One test needs to be updated, because it actually corrupts a pack and expects that re-packing the corrupted bytes will use the same name. It won't anymore, but we can easily just use the name that pack-objects hands back. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-16Merge branch 'jc/stream-to-pack'Junio C Hamano1-0/+6
* jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h
2011-12-05Merge branch 'jc/index-pack-reject-dups'Junio C Hamano1-1/+2
* jc/index-pack-reject-dups: receive-pack, fetch-pack: reject bogus pack that records objects twice
2011-11-16receive-pack, fetch-pack: reject bogus pack that records objects twiceJunio C Hamano1-1/+2
When receive-pack & fetch-pack are run and store the pack obtained over the wire to a local repository, they internally run the index-pack command with the --strict option. Make sure that we reject incoming packfile that records objects twice to avoid spreading such a damage. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-06fsck: print progressNguyễn Thái Ngọc Duy1-1/+2
fsck is usually a long process and it would be nice if it prints progress from time to time. Progress meter is not printed when --verbose is given because --verbose prints a lot, there's no need for "alive" indicator. Progress meter may provide "% complete" information but it would be lost anyway in the flood of text. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-06fsck: avoid reading every object twiceNguyễn Thái Ngọc Duy1-1/+4
During verify_pack() all objects are read for SHA-1 check. Then fsck_sha1() is called on every object, which read the object again (fsck_sha1 -> parse_object -> read_sha1_file). Avoid reading an object twice, do fsck_sha1 while we have an object uncompressed data in verify_pack. On git.git, with this patch I got: $ /usr/bin/time ./git fsck >/dev/null 98.97user 0.90system 1:40.01elapsed 99%CPU (0avgtext+0avgdata 616624maxresident)k 0inputs+0outputs (0major+194186minor)pagefaults 0swaps Without it: $ /usr/bin/time ./git fsck >/dev/null 231.23user 2.35system 3:53.82elapsed 99%CPU (0avgtext+0avgdata 636688maxresident)k 0inputs+0outputs (0major+461629minor)pagefaults 0swaps Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-28finish_tmp_packfile(): a helper functionJunio C Hamano1-0/+1
Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c. This changes the order of finishing multi-pack generation slightly. The code used to - adjust shared perm of temporary packfile - rename temporary packfile to the final name - update mtime of the packfile under the final name - adjust shared perm of temporary idxfile - rename temporary idxfile to the final name but because the helper does not want to do the mtime thing, the updated code does that step first and then all the rest. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-28create_tmp_packfile(): a helper functionJunio C Hamano1-0/+3
Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-28write_pack_header(): a helper functionJunio C Hamano1-0/+2
Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27index-pack --verify: read anomalous offsets from v2 idx fileJunio C Hamano1-0/+8
A pack v2 .idx file usually records offset using 64-bit representation only when the offset does not fit within 31-bit, but you can handcraft your .idx file to record smaller offset using 64-bit, storing all zero in the upper 4-byte. By inspecting the original idx file when running index-pack --verify, encode such low offsets that do not need to be in 64-bit but are encoded using 64-bit just like the original idx file so that we can still validate the pack/idx pair by comparing the idx file recomputed with the original. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27index-pack: --verifyJunio C Hamano1-0/+4
Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-index the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31-bit. You can create such an .idx file by giving a custom offset to --index-version option to the command. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27write_idx_file: introduce a struct to hold idx customization optionsJunio C Hamano1-4/+7
Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-21Merge branch 'sp/maint-dumb-http-pack-reidx'Junio C Hamano1-0/+1
* sp/maint-dumb-http-pack-reidx: http.c::new_http_pack_request: do away with the temp variable filename http-fetch: Use temporary files for pack-*.idx until verified http-fetch: Use index-pack rather than verify-pack to check packs Allow parse_pack_index on temporary files Extract verify_pack_index for reuse from verify_pack Introduce close_pack_index to permit replacement http.c: Remove unnecessary strdup of sha1_to_hex result http.c: Don't store destination name in request structures http.c: Drop useless != NULL test in finish_http_pack_request http.c: Tiny refactoring of finish_http_pack_request t5550-http-fetch: Use subshell for repository operations http.c: Remove bad free of static block
2010-04-19Extract verify_pack_index for reuse from verify_packShawn O. Pearce1-0/+1
The dumb HTTP transport should verify an index is completely valid before trying to use it. That requires checking the header/footer but also checking the complete content SHA-1. All of this logic is already in the front half of verify_pack, so pull it out into a new function that can be reused. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-23move encode_in_pack_object_header() to a better placeNicolas Pitre1-0/+1
Commit 1b22b6c897 made duplicated versions of encode_header() into a common version called encode_in_pack_object_header(). There is however a better location that sha1_file.c for such a function though, as sha1_file.c contains nothing related to the creation of packs, and it is quite populated already. Also the comment that was moved to the header file should really remain near the function as it covers implementation details and provides no information about the actual function interface. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-22make "index-pack" a built-inLinus Torvalds1-1/+1
This required some fairly trivial packfile function 'const' cleanup, since the builtin commands get a const char *argv[] array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-29improve reliability of fixup_pack_header_footer()Nicolas Pitre1-1/+1
Currently, this function has the potential to read corrupted pack data from disk and give it a valid SHA1 checksum. Let's add the ability to validate SHA1 checksum of existing data along the way, including before and after any arbitrary point in the pack. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24verify-pack: check packed object CRC when using index version 2Nicolas Pitre1-1/+1
To do so, check_pack_crc() moved from builtin-pack-objects.c to pack-check.c where it is more logical to share. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>