<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/compat/precompose_utf8.c, branch v2.45.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.45.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.45.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2023-06-21T20:39:53Z</updated>
<entry>
<title>cache.h: remove this no-longer-used header</title>
<updated>2023-06-21T20:39:53Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:33:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=bc5c5ec0446895f5c4139cd470066beb3c4ac6d5'/>
<id>urn:sha1:bc5c5ec0446895f5c4139cd470066beb3c4ac6d5</id>
<content type='text'>
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.

Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first).  This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>read-cache*.h: move declarations for read-cache.c functions from cache.h</title>
<updated>2023-06-21T20:39:53Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-05-16T06:33:56Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=08c46a499aec5b6459fb1d55ff90403c7dc2ee5a'/>
<id>urn:sha1:08c46a499aec5b6459fb1d55ff90403c7dc2ee5a</id>
<content type='text'>
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h.  Also move some related inline
functions from cache.h to read-cache.h.  The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>hash-ll.h: split out of hash.h to remove dependency on repository.h</title>
<updated>2023-04-24T19:47:32Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-04-22T20:17:20Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d1cbe1e6d8a9cab2b4ffe8a17d34db214dce1e49'/>
<id>urn:sha1:d1cbe1e6d8a9cab2b4ffe8a17d34db214dce1e49</id>
<content type='text'>
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository-&gt;hash_algo).  However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id.  Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers.  It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>environment.h: move declarations for environment.c functions from cache.h</title>
<updated>2023-03-21T17:56:53Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-03-21T06:26:03Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=32a8f510614312cc8b81bbc6a982d08ab7562ab4'/>
<id>urn:sha1:32a8f510614312cc8b81bbc6a982d08ab7562ab4</id>
<content type='text'>
Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>treewide: be explicit about dependence on gettext.h</title>
<updated>2023-03-21T17:56:51Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2023-03-21T06:25:54Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f394e093df10f1867d9bb2180b3789ee61124aed'/>
<id>urn:sha1:f394e093df10f1867d9bb2180b3789ee61124aed</id>
<content type='text'>
Dozens of files made use of gettext functions, without explicitly
including gettext.h.  This made it more difficult to find which files
could remove a dependence on cache.h.  Make C files explicitly include
gettext.h if they are using it.

However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>precompose_utf8: make precompose_string_if_needed() public</title>
<updated>2021-04-06T00:30:04Z</updated>
<author>
<name>Torsten Bögershausen</name>
<email>tboegi@web.de</email>
</author>
<published>2021-04-04T06:17:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5020774aef3fc80c89fac2e04d4ed0aeac66cdf0'/>
<id>urn:sha1:5020774aef3fc80c89fac2e04d4ed0aeac66cdf0</id>
<content type='text'>
commit 5c327502 (MacOS: precompose_argv_prefix(), 2021-02-03) uses
the function precompose_string_if_needed() internally.  It is only
used from precompose_argv_prefix() and therefore static in
compat/precompose_utf8.c

Expose this function, it will be used in the next commit.

While there, allow passing a NULL pointer, which will return NULL.

Signed-off-by: Torsten Bögershausen &lt;tboegi@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>MacOS: precompose_argv_prefix()</title>
<updated>2021-02-03T22:09:37Z</updated>
<author>
<name>Torsten Bögershausen</name>
<email>tboegi@web.de</email>
</author>
<published>2021-02-03T16:28:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5c327502dbf7a27c8784c20037851206a87857c1'/>
<id>urn:sha1:5c327502dbf7a27c8784c20037851206a87857c1</id>
<content type='text'>
The following sequence leads to a "BUG" assertion running under MacOS:

  DIR=git-test-restore-p
  Adiarnfd=$(printf 'A\314\210')
  DIRNAME=xx${Adiarnfd}yy
  mkdir $DIR &amp;&amp;
  cd $DIR &amp;&amp;
  git init &amp;&amp;
  mkdir $DIRNAME &amp;&amp;
  cd $DIRNAME &amp;&amp;
  echo "Initial" &gt;file &amp;&amp;
  git add file &amp;&amp;
  echo "One more line" &gt;&gt;file &amp;&amp;
  echo y | git restore -p .

 Initialized empty Git repository in /tmp/git-test-restore-p/.git/
 BUG: pathspec.c:495: error initializing pathspec_item
 Cannot close git diff-index --cached --numstat
 [snip]

The command `git restore` is run from a directory inside a Git repo.
Git needs to split the $CWD into 2 parts:
The path to the repo and "the rest", if any.
"The rest" becomes a "prefix" later used inside the pathspec code.

As an example, "/path/to/repo/dir-inside-repå" would determine
"/path/to/repo" as the root of the repo, the place where the
configuration file .git/config is found.

The rest becomes the prefix ("dir-inside-repå"), from where the
pathspec machinery expands the ".", more about this later.
If there is a decomposed form, (making the decomposing visible like this),
"dir-inside-rep°a" doesn't match "dir-inside-repå".

Git commands need to:

 (a) read the configuration variable "core.precomposeunicode"
 (b) precocompose argv[]
 (c) precompose the prefix, if there was any

The first commit,
76759c7dff53 "git on Mac OS and precomposed unicode"
addressed (a) and (b).

The call to precompose_argv() was added into parse-options.c,
because that seemed to be a good place when the patch was written.

Commands that don't use parse-options need to do (a) and (b) themselfs.

The commands `diff-files`, `diff-index`, `diff-tree` and `diff`
learned (a) and (b) in
commit 90a78b83e0b8 "diff: run arguments through precompose_argv"

Branch names (or refs in general) using decomposed code points
resulting in decomposed file names had been fixed in
commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places"

The bug report from above shows 2 things:
- more commands need to handle precomposed unicode
- (c) should be implemented for all commands using pathspecs

Solution:
precompose_argv() now handles the prefix (if needed), and is renamed into
precompose_argv_prefix().

Inside this function the config variable core.precomposeunicode is read
into the global variable precomposed_unicode, as before.
This reading is skipped if precomposed_unicode had been read before.

The original patch for preocomposed unicode, 76759c7dff53, placed
precompose_argv() into parse-options.c

Now add it into git.c::run_builtin() as well.  Existing precompose
calls in diff-files.c and others may become redundant, and if we
audit the callflows that reach these places to make sure that they
can never be reached without going through the new call added to
run_builtin(), we might be able to remove these existing ones.

But in this commit, we do not bother to do so and leave these
precompose callsites as they are.  Because precompose() is
idempotent and can be called on an already precomposed string
safely, this is safer than removing existing calls without fully
vetting the callflows.

There is certainly room for cleanups - this change intends to be a bug fix.
Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should
be done in future commits.

[1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
[2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/

Reported-by: Daniel Troger &lt;random_n0body@icloud.com&gt;
Helped-By: Philippe Blain &lt;levraiphilippeblain@gmail.com&gt;
Signed-off-by: Torsten Bögershausen &lt;tboegi@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Support working-tree-encoding "UTF-16LE-BOM"</title>
<updated>2019-01-31T18:27:52Z</updated>
<author>
<name>Torsten Bögershausen</name>
<email>tboegi@web.de</email>
</author>
<published>2019-01-30T15:01:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=aab2a1ae48ff65781a5379a01a4abb4f75e5641d'/>
<id>urn:sha1:aab2a1ae48ff65781a5379a01a4abb4f75e5641d</id>
<content type='text'>
Users who want UTF-16 files in the working tree set the .gitattributes
like this:
test.txt working-tree-encoding=UTF-16

The unicode standard itself defines 3 allowed ways how to encode UTF-16.
The following 3 versions convert all back to 'g' 'i' 't' in UTF-8:

a) UTF-16, without BOM, big endian:
$ printf "\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

b) UTF-16, with BOM, little endian:
$ printf "\377\376g\000i\000t\000" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

c) UTF-16, with BOM, big endian:
$ printf "\376\377\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

Git uses libiconv to convert from UTF-8 in the index into ITF-16 in the
working tree.
After a checkout, the resulting file has a BOM and is encoded in "UTF-16",
in the version (c) above.
This is what iconv generates, more details follow below.

iconv (and libiconv) can generate UTF-16, UTF-16LE or UTF-16BE:

d) UTF-16
$ printf 'git' | iconv -f UTF-8 -t UTF-16 | od -c
0000000  376 377  \0   g  \0   i  \0   t

e) UTF-16LE
$ printf 'git' | iconv -f UTF-8 -t UTF-16LE | od -c
0000000    g  \0   i  \0   t  \0

f)  UTF-16BE
$ printf 'git' | iconv -f UTF-8 -t UTF-16BE | od -c
0000000   \0   g  \0   i  \0   t

There is no way to generate version (b) from above in a Git working tree,
but that is what some applications need.
(All fully unicode aware applications should be able to read all 3 variants,
but in practise we are not there yet).

When producing UTF-16 as an output, iconv generates the big endian version
with a BOM. (big endian is probably chosen for historical reasons).

iconv can produce UTF-16 files with little endianess by using "UTF-16LE"
as encoding, and that file does not have a BOM.

Not all users (especially under Windows) are happy with this.
Some tools are not fully unicode aware and can only handle version (b).

Today there is no way to produce version (b) with iconv (or libiconv).
Looking into the history of iconv, it seems as if version (c) will
be used in all future iconv versions (for compatibility reasons).

Solve this dilemma and introduce a Git-specific "UTF-16LE-BOM".
libiconv can not handle the encoding, so Git pick it up, handles the BOM
and uses libiconv to convert the rest of the stream.
(UTF-16BE-BOM is added for consistency)

Rported-by: Adrián Gimeno Balaguer &lt;adrigibal@gmail.com&gt;
Signed-off-by: Torsten Bögershausen &lt;tboegi@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>config: don't include config.h by default</title>
<updated>2017-06-15T19:56:22Z</updated>
<author>
<name>Brandon Williams</name>
<email>bmwill@google.com</email>
</author>
<published>2017-06-14T18:07:36Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=b2141fc1d20e659810245ec6ca1c143c60e033ec'/>
<id>urn:sha1:b2141fc1d20e659810245ec6ca1c143c60e033ec</id>
<content type='text'>
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams &lt;bmwill@google.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>typofix: assorted typofixes in comments, documentation and messages</title>
<updated>2016-05-06T20:16:37Z</updated>
<author>
<name>Li Peng</name>
<email>lip@dtdream.com</email>
</author>
<published>2016-05-06T12:36:46Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=832c0e5e63a0f61c3788847d4a7abb82d9e86ef4'/>
<id>urn:sha1:832c0e5e63a0f61c3788847d4a7abb82d9e86ef4</id>
<content type='text'>
Many instances of duplicate words (e.g. "the the path") and
a few typoes are fixed, originally in multiple patches.

    wildmatch: fix duplicate words of "the"
    t: fix duplicate words of "output"
    transport-helper: fix duplicate words of "read"
    Git.pm: fix duplicate words of "return"
    path: fix duplicate words of "look"
    pack-protocol.txt: fix duplicate words of "the"
    precompose-utf8: fix typo of "sequences"
    split-index: fix typo
    worktree.c: fix typo
    remote-ext: fix typo
    utf8: fix duplicate words of "the"
    git-cvsserver: fix duplicate words

Signed-off-by: Li Peng &lt;lip@dtdream.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
