<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/utf8.c, branch v2.41.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.41.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.41.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2022-12-13T12:09:40Z</updated>
<entry>
<title>Sync with Git 2.31.6</title>
<updated>2022-12-13T12:09:40Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2022-12-13T12:09:40Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=8a755eddf5bf256613bc584f32cd44401a25897c'/>
<id>urn:sha1:8a755eddf5bf256613bc584f32cd44401a25897c</id>
<content type='text'>
</content>
</entry>
<entry>
<title>utf8: refactor `strbuf_utf8_replace` to not rely on preallocated buffer</title>
<updated>2022-12-09T05:26:21Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2022-12-01T14:47:15Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f930a2394303b902e2973f4308f96529f736b8bc'/>
<id>urn:sha1:f930a2394303b902e2973f4308f96529f736b8bc</id>
<content type='text'>
In `strbuf_utf8_replace`, we preallocate the destination buffer and then
use `memcpy` to copy bytes into it at computed offsets. This feels
rather fragile and is hard to understand at times. Refactor the code to
instead use `strbuf_add` and `strbuf_addstr` so that we can be sure that
there is no possibility to perform an out-of-bounds write.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8: fix checking for glyph width in `strbuf_utf8_replace()`</title>
<updated>2022-12-09T05:26:21Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2022-12-01T14:47:10Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=81c2d4c3a5ba0e6ab8c348708441fed170e63a82'/>
<id>urn:sha1:81c2d4c3a5ba0e6ab8c348708441fed170e63a82</id>
<content type='text'>
In `strbuf_utf8_replace()`, we call `utf8_width()` to compute the width
of the current glyph. If the glyph is a control character though it can
be that `utf8_width()` returns `-1`, but because we assign this value to
a `size_t` the conversion will cause us to underflow. This bug can
easily be triggered with the following command:

    $ git log --pretty='format:xxx%&lt;|(1,trunc)%x10'

&gt;From all I can see though this seems to be a benign underflow that has
no security-related consequences.

Fix the bug by using an `int` instead. When we see a control character,
we now copy it into the target buffer but don't advance the current
width of the string.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8: fix overflow when returning string width</title>
<updated>2022-12-09T05:26:21Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2022-12-01T14:47:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=937b71cc8b5b998963a7f9a33312ba3549d55510'/>
<id>urn:sha1:937b71cc8b5b998963a7f9a33312ba3549d55510</id>
<content type='text'>
The return type of both `utf8_strwidth()` and `utf8_strnwidth()` is
`int`, but we operate on string lengths which are typically of type
`size_t`. This means that when the string is longer than `INT_MAX`, we
will overflow and thus return a negative result.

This can lead to an out-of-bounds write with `--pretty=format:%&lt;1)%B`
and a commit message that is 2^31+1 bytes long:

    =================================================================
    ==26009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000001168 at pc 0x7f95c4e5f427 bp 0x7ffd8541c900 sp 0x7ffd8541c0a8
    WRITE of size 2147483649 at 0x603000001168 thread T0
        #0 0x7f95c4e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
        #1 0x5612bbb1068c in format_and_pad_commit pretty.c:1763
        #2 0x5612bbb1087a in format_commit_item pretty.c:1801
        #3 0x5612bbc33bab in strbuf_expand strbuf.c:429
        #4 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
        #5 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
        #6 0x5612bba0a4d5 in show_log log-tree.c:781
        #7 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
        #8 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
        #9 0x5612bb69235b in cmd_log_walk builtin/log.c:549
        #10 0x5612bb6951a2 in cmd_log builtin/log.c:883
        #11 0x5612bb56c993 in run_builtin git.c:466
        #12 0x5612bb56d397 in handle_builtin git.c:721
        #13 0x5612bb56db07 in run_argv git.c:788
        #14 0x5612bb56e8a7 in cmd_main git.c:923
        #15 0x5612bb803682 in main common-main.c:57
        #16 0x7f95c4c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #17 0x7f95c4c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #18 0x5612bb5680e4 in _start ../sysdeps/x86_64/start.S:115

    0x603000001168 is located 0 bytes to the right of 24-byte region [0x603000001150,0x603000001168)
    allocated by thread T0 here:
        #0 0x7f95c4ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x5612bbcdd556 in xrealloc wrapper.c:136
        #2 0x5612bbc310a3 in strbuf_grow strbuf.c:99
        #3 0x5612bbc32acd in strbuf_add strbuf.c:298
        #4 0x5612bbc33aec in strbuf_expand strbuf.c:418
        #5 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
        #6 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
        #7 0x5612bba0a4d5 in show_log log-tree.c:781
        #8 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
        #9 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
        #10 0x5612bb69235b in cmd_log_walk builtin/log.c:549
        #11 0x5612bb6951a2 in cmd_log builtin/log.c:883
        #12 0x5612bb56c993 in run_builtin git.c:466
        #13 0x5612bb56d397 in handle_builtin git.c:721
        #14 0x5612bb56db07 in run_argv git.c:788
        #15 0x5612bb56e8a7 in cmd_main git.c:923
        #16 0x5612bb803682 in main common-main.c:57
        #17 0x7f95c4c3c28f  (/usr/lib/libc.so.6+0x2328f)

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
    Shadow bytes around the buggy address:
      0x0c067fff81d0: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
      0x0c067fff81e0: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
      0x0c067fff81f0: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
      0x0c067fff8200: fd fd fd fa fa fa fd fd fd fd fa fa 00 00 00 fa
      0x0c067fff8210: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
    =&gt;0x0c067fff8220: fd fa fa fa fd fd fd fa fa fa 00 00 00[fa]fa fa
      0x0c067fff8230: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8240: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8250: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8260: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8270: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==26009==ABORTING

Now the proper fix for this would be to convert both functions to return
an `size_t` instead of an `int`. But given that this commit may be part
of a security release, let's instead do the minimal viable fix and die
in case we see an overflow.

Add a test that would have previously caused us to crash.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8: fix returning negative string width</title>
<updated>2022-12-09T05:26:21Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2022-12-01T14:47:00Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=17d23e8a3812a5ca3dd6564e74d5250f22e5d76d'/>
<id>urn:sha1:17d23e8a3812a5ca3dd6564e74d5250f22e5d76d</id>
<content type='text'>
The `utf8_strnwidth()` function calls `utf8_width()` in a loop and adds
its returned width to the end result. `utf8_width()` can return `-1`
though in case it reads a control character, which means that the
computed string width is going to be wrong. In the worst case where
there are more control characters than non-control characters, we may
even return a negative string width.

Fix this bug by treating control characters as having zero width.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8: fix truncated string lengths in `utf8_strnwidth()`</title>
<updated>2022-12-09T05:26:21Z</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2022-12-01T14:46:53Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=522cc87fdc25449222a5894a428eebf4b8d5eaa9'/>
<id>urn:sha1:522cc87fdc25449222a5894a428eebf4b8d5eaa9</id>
<content type='text'>
The `utf8_strnwidth()` function accepts an optional string length as
input parameter. This parameter can either be set to `-1`, in which case
we call `strlen()` on the input. Or it can be set to a positive integer
that indicates a precomputed length, which callers typically compute by
calling `strlen()` at some point themselves.

The input parameter is an `int` though, whereas `strlen()` returns a
`size_t`. This can lead to implementation-defined behaviour though when
the `size_t` cannot be represented by the `int`. In the general case
though this leads to wrap-around and thus to negative string sizes,
which is sure enough to not lead to well-defined behaviour.

Fix this by accepting a `size_t` instead of an `int` as string length.
While this takes away the ability of callers to simply pass in `-1` as
string length, it really is trivial enough to convert them to instead
pass in `strlen()` instead.

Signed-off-by: Patrick Steinhardt &lt;ps@pks.im&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>t0060: test ntfs/hfs-obscured dotfiles</title>
<updated>2021-05-04T02:52:02Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-05-03T20:43:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=801ed010bf13465bf67608beabbaa1ec2550204f'/>
<id>urn:sha1:801ed010bf13465bf67608beabbaa1ec2550204f</id>
<content type='text'>
We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).

Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used in some fsck checks, let's make sure they actually work by
throwing a few basic tests at them. Likewise, let's cover .mailmap
(which does need matching code added).

I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.

Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8: use skip_iprefix() in same_utf_encoding()</title>
<updated>2019-11-10T07:04:36Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2019-11-08T20:25:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=89f8cabaf35f8a5f7e893f190764597ad5c44ef9'/>
<id>urn:sha1:89f8cabaf35f8a5f7e893f190764597ad5c44ef9</id>
<content type='text'>
Get rid of magic numbers by using skip_iprefix() and skip_prefix() for
parsing the leading "[uU][tT][fF]-?" of both strings instead of checking
with istarts_with() and an explicit comparison.

Signed-off-by: René Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8: use ARRAY_SIZE() in git_wcwidth()</title>
<updated>2019-10-12T01:57:39Z</updated>
<author>
<name>Beat Bolli</name>
<email>dev+git@drbeat.li</email>
</author>
<published>2019-10-11T18:41:23Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=fa364ad790ae0a778ba14b5e5fc9fe910ab4b67e'/>
<id>urn:sha1:fa364ad790ae0a778ba14b5e5fc9fe910ab4b67e</id>
<content type='text'>
This macro has been available globally since b4f2a6ac92 ("Use #define
ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))", 2006-03-09), so let's use it.

Signed-off-by: Beat Bolli &lt;dev+git@drbeat.li&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8: handle systems that don't write BOM for UTF-16</title>
<updated>2019-02-12T02:20:07Z</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2019-02-12T00:52:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=79444c92943048f9ac62e9311038ebe43f5f0982'/>
<id>urn:sha1:79444c92943048f9ac62e9311038ebe43f5f0982</id>
<content type='text'>
When serializing UTF-16 (and UTF-32), there are three possible ways to
write the stream. One can write the data with a BOM in either big-endian
or little-endian format, or one can write the data without a BOM in
big-endian format.

Most systems' iconv implementations choose to write it with a BOM in
some endianness, since this is the most foolproof, and it is resistant
to misinterpretation on Windows, where UTF-16 and the little-endian
serialization are very common. For compatibility with Windows and to
avoid accidental misuse there, Git always wants to write UTF-16 with a
BOM, and will refuse to read UTF-16 without it.

However, musl's iconv implementation writes UTF-16 without a BOM,
relying on the user to interpret it as big-endian. This causes t0028 and
the related functionality to fail, since Git won't read the file without
a BOM.

Add a Makefile and #define knob, ICONV_OMITS_BOM, that can be set if the
iconv implementation has this behavior. When set, Git will write a BOM
manually for UTF-16 and UTF-32 and then force the data to be written in
UTF-16BE or UTF-32BE. We choose big-endian behavior here because the
tests use the raw "UTF-16" encoding, which will be big-endian when the
implementation requires this knob to be set.

Update the tests to detect this case and write test data with an added
BOM if necessary. Always write the BOM in the tests in big-endian
format, since all iconv implementations that omit a BOM must use
big-endian serialization according to the Unicode standard.

Preserve the existing behavior for systems which do not have this knob
enabled, since they may use optimized implementations, including
defaulting to the native endianness, which may improve performance.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
