| field | value | date |
|---|---|---|
| author | Elijah Newren <newren@gmail.com> | 2026-04-17 22:45:10 +0000 |
| committer | Junio C Hamano <gitster@pobox.com> | 2026-04-17 17:38:31 -0700 |
| commit | 41f1ee06d5d6163ae2dcbca1fbd97ef39d8a76bc (patch) | |
| tree | 5a943137ee54a14acb6210605a94399f740f3cc5 /t/t4013/diff.format-patch_--stdout_--numbered_initial..main | |
| parent | e8955061076952cc5eab0300424fc48b601fe12d (diff) | |
diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation
f85b49f3d4a (diff: improve scaling of filenames in diffstat to handle
UTF-8 chars, 2026-01-16) introduced a loop in show_stats() that calls
utf8_width() repeatedly to skip leading characters until the displayed
width fits. However, utf8_width() can return problematic values:
  - For invalid UTF-8 sequences, pick_one_utf8_char() sets the name
    pointer to NULL and utf8_width() returns 0. Since name_len does
    not change, the loop iterates once more and pick_one_utf8_char()
    dereferences the NULL pointer, crashing.
  - For control characters, utf8_width() returns -1, so name_len
    grows when it is expected to shrink. This can cause the loop to
    consume more characters than the string contains, reading past
    the trailing NUL.
By default, fill_print_name() C-quotes filenames, which escapes
control characters and invalid bytes to printable text. That prevents
this bug from being triggered; however, with core.quotePath=false,
raw bytes can reach this code.
Add tests exercising both failure modes with core.quotePath=false and
a narrow --stat-name-width to force truncation: one with a bare 0xC0
byte (invalid UTF-8 lead byte, triggers NULL deref) and one with a
0x01 byte (control character, causes the loop to read past the end
of the string).
Fix both issues by introducing utf8_ish_width(), a thin wrapper
around utf8_width() that guarantees the pointer always advances and
the returned width is never negative:
  - On invalid UTF-8 it restores the pointer, advances by one byte,
    and returns width 1 (matching the strlen()-based fallback used
    by utf8_strwidth()).
  - On a control character it returns 0 (matching utf8_strnwidth()
    which skips them).
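The two rules above can be illustrated with a small, self-contained sketch. It is not the code from this patch: toy_utf8_width() is a hypothetical stand-in that only imitates the behaviour described in this message (NULL pointer and width 0 on invalid input, -1 for control characters), and utf8_ish_width_sketch() is an assumed shape for the wrapper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for utf8_width(), reduced to ASCII and two-byte
 * sequences. It imitates only the behaviour described in this message:
 *   - invalid sequence:  *start becomes NULL, returns 0
 *   - control character: pointer advances, returns -1
 */
static int toy_utf8_width(const char **start)
{
	const unsigned char *s = (const unsigned char *)*start;

	if (s[0] < 0x80) {
		*start += 1;
		return (s[0] < 0x20 || s[0] == 0x7f) ? -1 : 1;
	}
	if (s[0] >= 0xC2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
		*start += 2;
		return 1;
	}
	*start = NULL;
	return 0;
}

/*
 * Assumed shape of the wrapper: the pointer always advances and the
 * returned width is never negative. The caller must not pass a pointer
 * to the terminating NUL.
 */
static int utf8_ish_width_sketch(const char **start)
{
	const char *old;
	int width;

	assert(start && *start && **start);
	old = *start;
	width = toy_utf8_width(start);

	if (!*start) {
		/* Invalid UTF-8: restore, step over one byte, count it as width 1. */
		*start = old + 1;
		return 1;
	}
	/* Control character: skip it without contributing any width. */
	return width < 0 ? 0 : width;
}
```

With this shape, a caller that subtracts the returned width from a running total can never see the total grow, and can never be handed a NULL pointer for the next iteration.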
Also add a "&& *name" guard to the while-loop condition so it
terminates at end-of-string even when utf8_strwidth()'s strlen()
fallback causes name_len to exceed the sum of per-character widths.
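As a rough illustration of why the "&& *name" guard matters, the sketch below uses a simplified loop rather than the real one in show_stats(). Both skip_to_fit() and step_width() are hypothetical names; step_width() stands in for a per-character step that always advances and never returns a negative width.

```c
#include <assert.h>

/*
 * Hypothetical per-character step: consumes one character and returns a
 * non-negative width. Control bytes count as 0, a valid two-byte
 * sequence as 1, and any other byte (including a stray lead byte) as 1.
 * Must be called with **p != '\0'.
 */
static int step_width(const char **p)
{
	unsigned char c = (unsigned char)**p;

	if ((c & 0xE0) == 0xC0 && ((unsigned char)(*p)[1] & 0xC0) == 0x80) {
		*p += 2;
		return 1;
	}
	*p += 1;
	return (c < 0x20 || c == 0x7f) ? 0 : 1;
}

/*
 * Simplified truncation loop: skip leading characters until the
 * remaining width fits. name_len may be a byte count (strlen()
 * fallback) and therefore larger than the sum of per-character widths,
 * so the "*name" test is what stops the loop at end-of-string.
 */
static const char *skip_to_fit(const char *name, int name_len, int name_width)
{
	assert(name);
	while (name_len > name_width && *name)
		name_len -= step_width(&name);
	return name;
}
```

For a name made only of 0x01 bytes, every step has width 0, so name_len never shrinks; without the "*name" test the loop would keep reading beyond the trailing NUL.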
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
