Diffstat (limited to '')
| -rw-r--r-- | Documentation/gitformat-loose.adoc | 157 |
1 files changed, 157 insertions, 0 deletions
diff --git a/Documentation/gitformat-loose.adoc b/Documentation/gitformat-loose.adoc
new file mode 100644
index 0000000000..4850c91669
--- /dev/null
+++ b/Documentation/gitformat-loose.adoc
@@ -0,0 +1,157 @@
+gitformat-loose(5)
+==================
+
+NAME
+----
+gitformat-loose - Git loose object format
+
+
+SYNOPSIS
+--------
+[verse]
+$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+$GIT_DIR/objects/loose-object-idx
+$GIT_DIR/objects/loose-map/map-*.map
+
+DESCRIPTION
+-----------
+
+Loose objects are how Git stores individual objects, where every object is
+written as a separate file.
+
+Over the lifetime of a repository, objects are usually written as loose objects
+initially. Eventually, these loose objects will be compacted into packfiles
+via repository maintenance to improve disk space usage and speed up the lookup
+of these objects.
+
+== Loose objects
+
+Each loose object contains a prefix, followed immediately by the data of the
+object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
+`tree`, `commit`, or `tag`, and `<size>` is the size of the data (without the
+prefix) as a decimal integer expressed in ASCII.
+
+The entire contents, prefix and data concatenated, is then compressed with zlib
+and the compressed data is stored in the file. The object ID of the object is
+the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
+
+The file for the loose object is stored under the `objects` directory, with the
+first two hex characters of the object ID being the directory and the remaining
+characters being the file name. This is done to shard the data and avoid too
+many files being in one directory, since some file systems perform poorly with
+many items in a directory.
+
+As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
+and, in a SHA-256 repository, would have the object ID
+`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
+stored under
+`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
+
+Similarly, a blob containing the contents `abc` would have the uncompressed
+data of `blob 3\0abc`.
+
+== Loose object mapping
+
+When the `compatObjectFormat` option is used, Git needs to store a mapping
+between the repository's main algorithm and the compatibility algorithm. There
+are two formats for this: the legacy mapping and the modern mapping.
+
+=== Legacy mapping
+
+The compatibility mapping is stored in a file called
+`$GIT_DIR/objects/loose-object-idx`. The format of this file looks like this:
+
+  # loose-object-idx
+  (main-name SP compat-name LF)*
+
+`main-name` refers to the hexadecimal object ID of the object in the main
+repository format and `compat-name` refers to the same thing, but for the
+compatibility format.
+
+This format is read if it exists but is not written.
+
+Note that carriage returns are not permitted in this file, regardless of the
+host system or configuration.
+
+=== Modern mapping
+
+The modern mapping consists of a set of files under `$GIT_DIR/objects/loose`
+ending in `.map`. The portion of the filename before the extension is that of
+the hash checksum in hex format.
+
+`git pack-objects` will repack existing entries into one file, removing any
+unnecessary objects, such as obsolete shallow entries or loose objects that
+have been packed.
+
+==== Mapping file format
+
+- A header appears at the beginning and consists of the following:
+  * A 4-byte mapping signature: `LMAP`
+  * 4-byte version number: 1
+  * 4-byte length of the header section.
+  * 4-byte number of objects declared in this map file.
+  * 4-byte number of object formats declared in this map file.
+  * For each object format:
+    ** 4-byte format identifier (e.g., `sha1` for SHA-1)
+    ** 4-byte length in bytes of shortened object names. This is the
+       shortest possible length needed to make names in the shortened
+       object name table unambiguous.
+    ** 8-byte integer, recording where tables relating to this format
+       are stored in this index file, as an offset from the beginning.
+  * 8-byte offset to the trailer from the beginning of this file.
+  * Zero or more additional key/value pairs (4-byte key, 4-byte value), which
+    may optionally declare one or more chunks. No chunks are currently
+    defined. Readers must ignore unrecognized keys.
+- Zero or more NUL bytes. These are used to improve the alignment of the
+  4-byte quantities below.
+- Tables for the first object format:
+  * A sorted table of shortened object names. These are prefixes of the names
+    of all objects in this file, packed together without offset values to
+    reduce the cache footprint of the binary search for a specific object name.
+  * A sorted table of full object names.
+  * A table of 4-byte metadata values.
+  * Zero or more chunks. A chunk starts with a four-byte chunk identifier and
+    a four-byte parameter (which, if unneeded, is all zeros) and an eight-byte
+    size (not including the identifier, parameter, or size), plus the chunk
+    data.
+- Zero or more NUL bytes.
+- Tables for subsequent object formats:
+  * A sorted table of shortened object names. These are prefixes of the names
+    of all objects in this file, packed together without offset values to
+    reduce the cache footprint of the binary search for a specific object name.
+  * A table of full object names in the order specified by the first object format.
+  * A table of 4-byte values mapping object name order to the order of the
+    first object format. For an object in the table of sorted shortened object
+    names, the value at the corresponding index in this table is the index in
+    the previous table for that same object.
+  * Zero or more NUL bytes.
+- The trailer consists of the following:
+  * Hash checksum of all of the above.
+
+The lower six bits of each metadata value contain a type field indicating the
+reason that this object is stored:
+
+0::
+	Reserved.
+1::
+	This object is stored as a loose object in the repository.
+2::
+	This object is a shallow entry. The mapping refers to a shallow value
+	returned by a remote server.
+3::
+	This object is a submodule entry. The mapping refers to the commit stored
+	representing a submodule.
+
+Other data may be stored in this field in the future. Bits that are not used
+must be zero.
+
+All 4-byte numbers are in network order and must be 4-byte aligned in the file,
+so NUL padding may be required in some cases.
+
+Note that the hash at the end of the file is in whatever the repository's main
+algorithm is. In the usual case when there are multiple algorithms, the main
+algorithm will be SHA-256 and the compatibility algorithm will be SHA-1.
+
+GIT
+---
+Part of the linkgit:git[1] suite
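
The "Loose objects" section of the patch above can be checked with a short Python sketch. It is not part of the patch and is not Git's implementation. The function names (`encode_loose_object`, `object_id`, `loose_object_path`, `decode_loose_object`) are hypothetical and exist only for illustration. The sketch builds the `<type> <size>\0` prefix, hashes the uncompressed bytes, derives the sharded path, and round-trips the contents through zlib.

```python
import hashlib
import zlib
from typing import Tuple

# Object types listed in the documentation.
OBJECT_TYPES = ("blob", "tree", "commit", "tag")


def encode_loose_object(obj_type: str, data: bytes) -> bytes:
    """Return the uncompressed contents: `<type> <size>\\0` followed by data."""
    if obj_type not in OBJECT_TYPES:
        raise ValueError(f"unknown object type: {obj_type!r}")
    # The size is the length of the data only, as ASCII decimal.
    prefix = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return prefix + data


def object_id(raw: bytes, algo: str = "sha256") -> str:
    """Hash the uncompressed prefix plus data with SHA-1 or SHA-256."""
    if algo not in ("sha1", "sha256"):
        raise ValueError(f"unsupported hash algorithm: {algo!r}")
    return hashlib.new(algo, raw).hexdigest()


def loose_object_path(oid: str) -> str:
    """Shard by the first two hex characters, relative to $GIT_DIR."""
    return f"objects/{oid[:2]}/{oid[2:]}"


def decode_loose_object(compressed: bytes) -> Tuple[str, bytes]:
    """Inflate a loose object file's bytes and split prefix from data."""
    raw = zlib.decompress(compressed)
    header, sep, data = raw.partition(b"\0")
    if not sep:
        raise ValueError("missing NUL after the object prefix")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(data):
        raise ValueError("size in prefix does not match data length")
    return obj_type, data


if __name__ == "__main__":
    # The empty tree from the documentation's example, in a SHA-256 repository.
    raw = encode_loose_object("tree", b"")
    oid = object_id(raw, "sha256")
    print(loose_object_path(oid))
    # What would be written to the file: the zlib-compressed contents.
    on_disk = zlib.compress(raw)
    print(decode_loose_object(on_disk))
```

For the empty tree, the printed path should match the one quoted in the documentation, `objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`. The object ID is computed before compression, so the zlib compression level does not affect it.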
