diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation - git - Mirror of https://git.kernel.org/pub/scm/git/git.git/


author	Elijah Newren <newren@gmail.com>	2026-04-17 22:45:10 +0000
committer	Junio C Hamano <gitster@pobox.com>	2026-04-17 17:38:31 -0700
commit	41f1ee06d5d6163ae2dcbca1fbd97ef39d8a76bc (patch)
tree	5a943137ee54a14acb6210605a94399f740f3cc5 /t/t4013/diff.format-patch_--stdout_--numbered_initial..main
parent	e8955061076952cc5eab0300424fc48b601fe12d (diff)

diff: fix out-of-bounds reads and NULL deref in diffstat UTF-8 truncation

f85b49f3d4a (diff: improve scaling of filenames in diffstat to handle UTF-8 chars, 2026-01-16) introduced a loop in show_stats() that calls utf8_width() repeatedly to skip leading characters until the displayed width fits. However, utf8_width() can return problematic values:

  - For invalid UTF-8 sequences, pick_one_utf8_char() sets the name pointer to NULL and utf8_width() returns 0. Since name_len does not change, the loop iterates once more and pick_one_utf8_char() dereferences the NULL pointer, crashing.

  - For control characters, utf8_width() returns -1, so name_len grows when it is expected to shrink. This can cause the loop to consume more characters than the string contains, reading past the trailing NUL.

By default, fill_print_name() C-quotes filenames, which escapes control characters and invalid bytes to printable text. That keeps this bug from being triggered; however, with core.quotePath=false, raw bytes can reach this code.

Add tests exercising both failure modes with core.quotePath=false and a narrow --stat-name-width to force truncation: one with a bare 0xC0 byte (invalid UTF-8 lead byte, triggers NULL deref) and one with a 0x01 byte (control character, causes the loop to read past the end of the string).

Fix both issues by introducing utf8_ish_width(), a thin wrapper around utf8_width() that guarantees the pointer always advances and the returned width is never negative:

  - On invalid UTF-8 it restores the pointer, advances by one byte, and returns width 1 (matching the strlen()-based fallback used by utf8_strwidth()).

  - On a control character it returns 0 (matching utf8_strnwidth() which skips them).

Also add a "&& *name" guard to the while-loop condition so it terminates at end-of-string even when utf8_strwidth()'s strlen() fallback causes name_len to exceed the sum of per-character widths.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
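
The wrapper and loop guard described above can be sketched roughly as follows. This is not the patch itself: toy_utf8_width() is a hypothetical stand-in that only mimics the utf8_width() behaviour stated in the message (NULL pointer and 0 on invalid input, -1 on control characters), it decodes only 1- and 2-byte sequences, and it assumes every printable character is one column wide. The signature of utf8_ish_width() and the helper skip_leading() are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for utf8_width(), reduced to the contract the
 * commit message describes (NOT git's implementation):
 *   - invalid sequence:  set *start to NULL and return 0
 *   - control character: advance one byte and return -1
 *   - otherwise:         advance past the character and return 1
 */
static int toy_utf8_width(const char **start)
{
	const unsigned char *s = (const unsigned char *)*start;

	if (s[0] < 0x80) {
		*start += 1;
		return (s[0] < 0x20 || s[0] == 0x7f) ? -1 : 1;
	}
	if (s[0] >= 0xc2 && s[0] <= 0xdf && (s[1] & 0xc0) == 0x80) {
		*start += 2;
		return 1;
	}
	*start = NULL;
	return 0;
}

/*
 * Sketch of the wrapper: the pointer always advances and the returned
 * width is never negative. Must not be called at the trailing NUL.
 */
static int utf8_ish_width(const char **name)
{
	const char *old = *name;
	int w;

	assert(old && *old);
	w = toy_utf8_width(name);
	if (!*name) {
		/* Invalid UTF-8: restore, step over one byte, width 1. */
		*name = old + 1;
		return 1;
	}
	/* Control character: counts as width 0. */
	return w < 0 ? 0 : w;
}

/*
 * Sketch of the truncation loop (hypothetical helper): skip leading
 * characters until the remaining width fits, stopping at end-of-string
 * even if name_len was overestimated.
 */
static const char *skip_leading(const char *name, int name_len, int max_len)
{
	while (name_len > max_len && *name)
		name_len -= utf8_ish_width(&name);
	return name;
}
```

With this sketch, skip_leading("ab\xc0" "cd", 5, 2) steps over the bare 0xC0 byte as a single width-1 unit instead of leaving a NULL pointer behind, and a call with an overestimated name_len stops at the NUL because of the "&& *name" guard.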

Diffstat (limited to 't/t4013/diff.format-patch_--stdout_--numbered_initial..main')

0 files changed, 0 insertions, 0 deletions

