<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/utf8.h, branch v2.32.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.32.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.32.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2021-05-04T02:52:02Z</updated>
<entry>
<title>t0060: test ntfs/hfs-obscured dotfiles</title>
<updated>2021-05-04T02:52:02Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2021-05-03T20:43:22Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=801ed010bf13465bf67608beabbaa1ec2550204f'/>
<id>urn:sha1:801ed010bf13465bf67608beabbaa1ec2550204f</id>
<content type='text'>
We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).

Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used in some fsck checks, let's make sure they actually work by
throwing a few basic tests at them. Likewise, let's cover .mailmap
(which does need matching code added).

I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.

Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>*.[ch]: remove extern from function declarations using spatch</title>
<updated>2019-05-05T06:20:06Z</updated>
<author>
<name>Denton Liu</name>
<email>liu.denton@gmail.com</email>
</author>
<published>2019-04-29T08:28:14Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=554544276a604c144df45efcb060c80aa322088c'/>
<id>urn:sha1:554544276a604c144df45efcb060c80aa322088c</id>
<content type='text'>
There has been a push to remove extern from function declarations.
Remove some instances of "extern" for function declarations which are
caught by Coccinelle. Note that Coccinelle has some difficulty with
processing functions with `__attribute__` or varargs so some `extern`
declarations are left behind to be dealt with in a future patch.

This was the Coccinelle patch used:

	@@
	type T;
	identifier f;
	@@
	- extern
	  T f(...);

and it was run with:

	$ git ls-files \*.{c,h} |
		grep -v ^compat/ |
		xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place

Files under `compat/` are intentionally excluded as some are directly
copied from external sources and we should avoid churning them as much
as possible.

Signed-off-by: Denton Liu &lt;liu.denton@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Support working-tree-encoding "UTF-16LE-BOM"</title>
<updated>2019-01-31T18:27:52Z</updated>
<author>
<name>Torsten Bögershausen</name>
<email>tboegi@web.de</email>
</author>
<published>2019-01-30T15:01:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=aab2a1ae48ff65781a5379a01a4abb4f75e5641d'/>
<id>urn:sha1:aab2a1ae48ff65781a5379a01a4abb4f75e5641d</id>
<content type='text'>
Users who want UTF-16 files in the working tree set the .gitattributes
like this:
test.txt working-tree-encoding=UTF-16

The unicode standard itself defines 3 allowed ways how to encode UTF-16.
The following 3 versions convert all back to 'g' 'i' 't' in UTF-8:

a) UTF-16, without BOM, big endian:
$ printf "\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

b) UTF-16, with BOM, little endian:
$ printf "\377\376g\000i\000t\000" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

c) UTF-16, with BOM, big endian:
$ printf "\376\377\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
0000000    g   i   t

Git uses libiconv to convert from UTF-8 in the index into ITF-16 in the
working tree.
After a checkout, the resulting file has a BOM and is encoded in "UTF-16",
in the version (c) above.
This is what iconv generates, more details follow below.

iconv (and libiconv) can generate UTF-16, UTF-16LE or UTF-16BE:

d) UTF-16
$ printf 'git' | iconv -f UTF-8 -t UTF-16 | od -c
0000000  376 377  \0   g  \0   i  \0   t

e) UTF-16LE
$ printf 'git' | iconv -f UTF-8 -t UTF-16LE | od -c
0000000    g  \0   i  \0   t  \0

f)  UTF-16BE
$ printf 'git' | iconv -f UTF-8 -t UTF-16BE | od -c
0000000   \0   g  \0   i  \0   t

There is no way to generate version (b) from above in a Git working tree,
but that is what some applications need.
(All fully unicode aware applications should be able to read all 3 variants,
but in practise we are not there yet).

When producing UTF-16 as an output, iconv generates the big endian version
with a BOM. (big endian is probably chosen for historical reasons).

iconv can produce UTF-16 files with little endianess by using "UTF-16LE"
as encoding, and that file does not have a BOM.

Not all users (especially under Windows) are happy with this.
Some tools are not fully unicode aware and can only handle version (b).

Today there is no way to produce version (b) with iconv (or libiconv).
Looking into the history of iconv, it seems as if version (c) will
be used in all future iconv versions (for compatibility reasons).

Solve this dilemma and introduce a Git-specific "UTF-16LE-BOM".
libiconv can not handle the encoding, so Git pick it up, handles the BOM
and uses libiconv to convert the rest of the stream.
(UTF-16BE-BOM is added for consistency)

Rported-by: Adrián Gimeno Balaguer &lt;adrigibal@gmail.com&gt;
Signed-off-by: Torsten Bögershausen &lt;tboegi@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'en/incl-forward-decl'</title>
<updated>2018-08-20T19:41:32Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2018-08-20T19:41:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=5ade03446430f1190fdf1eca8b8fde5f63367110'/>
<id>urn:sha1:5ade03446430f1190fdf1eca8b8fde5f63367110</id>
<content type='text'>
Code hygiene improvement for the header files.

* en/incl-forward-decl:
  Remove forward declaration of an enum
  compat/precompose_utf8.h: use more common include guard style
  urlmatch.h: fix include guard
  Move definition of enum branch_track from cache.h to branch.h
  alloc: make allocate_alloc_state and clear_alloc_state more consistent
  Add missing includes and forward declarations
</content>
</entry>
<entry>
<title>Add missing includes and forward declarations</title>
<updated>2018-08-15T18:52:09Z</updated>
<author>
<name>Elijah Newren</name>
<email>newren@gmail.com</email>
</author>
<published>2018-08-15T17:54:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=ef3ca95475ce467ae883cc8175ed40e6f7d27800'/>
<id>urn:sha1:ef3ca95475ce467ae883cc8175ed40e6f7d27800</id>
<content type='text'>
I looped over the toplevel header files, creating a temporary two-line C
program for each consisting of
  #include "git-compat-util.h"
  #include $HEADER
This patch is the result of manually fixing errors in compiling those
tiny programs.

Signed-off-by: Elijah Newren &lt;newren@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>reencode_string: use size_t for string lengths</title>
<updated>2018-07-24T17:19:29Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2018-07-24T10:50:33Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c7d017d7e1cca37ca20f73c11fa9f1b319a2c3a5'/>
<id>urn:sha1:c7d017d7e1cca37ca20f73c11fa9f1b319a2c3a5</id>
<content type='text'>
The iconv interface takes a size_t, which is the appropriate
type for an in-memory buffer. But our reencode_string_*
functions use integers, meaning we may get confusing results
when the sizes exceed INT_MAX. Let's use size_t
consistently.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Sync with Git 2.17.1</title>
<updated>2018-05-29T08:10:05Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2018-05-29T08:09:58Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=7913f53b5628997165e075008d6142da1c04271a'/>
<id>urn:sha1:7913f53b5628997165e075008d6142da1c04271a</id>
<content type='text'>
* maint: (25 commits)
  Git 2.17.1
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  fsck: complain when .gitmodules is a symlink
  index-pack: check .gitmodules files with --strict
  unpack-objects: call fsck_finish() after fscking objects
  fsck: call fsck_finish() after fscking objects
  fsck: check .gitmodules content
  fsck: handle promisor objects in .gitmodules check
  fsck: detect gitmodules files
  fsck: actually fsck blob data
  fsck: simplify ".git" check
  index-pack: make fsck error message more specific
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  ...
</content>
</entry>
<entry>
<title>is_hfs_dotgit: match other .git files</title>
<updated>2018-05-22T03:50:11Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2018-05-02T19:23:45Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=0fc333ba20b43a8afee5023e92cb3384ff4e59a6'/>
<id>urn:sha1:0fc333ba20b43a8afee5023e92cb3384ff4e59a6</id>
<content type='text'>
Both verify_path() and fsck match ".git", ".GIT", and other
variants specific to HFS+. Let's allow matching other
special files like ".gitmodules", which we'll later use to
enforce extra restrictions via verify_path() and fsck.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
</content>
</entry>
<entry>
<title>utf8: add function to detect a missing UTF-16/32 BOM</title>
<updated>2018-04-16T02:40:56Z</updated>
<author>
<name>Lars Schneider</name>
<email>larsxschneider@gmail.com</email>
</author>
<published>2018-04-15T18:16:06Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=c6e48652f69f6955bbbb423100e0df2a49467db8'/>
<id>urn:sha1:c6e48652f69f6955bbbb423100e0df2a49467db8</id>
<content type='text'>
If the endianness is not defined in the encoding name, then let's
be strict and require a BOM to avoid any encoding confusion. The
is_missing_required_utf_bom() function returns true if a required BOM
is missing.

The Unicode standard instructs to assume big-endian if there in no BOM
for UTF-16/32 [1][2]. However, the W3C/WHATWG encoding standard used
in HTML5 recommends to assume little-endian to "deal with deployed
content" [3]. Strictly requiring a BOM seems to be the safest option
for content in Git.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#gen6
[2] http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf
     Section 3.10, D98, page 132
[3] https://encoding.spec.whatwg.org/#utf-16le

Signed-off-by: Lars Schneider &lt;larsxschneider@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8: add function to detect prohibited UTF-16/32 BOM</title>
<updated>2018-04-16T02:40:56Z</updated>
<author>
<name>Lars Schneider</name>
<email>larsxschneider@gmail.com</email>
</author>
<published>2018-04-15T18:16:05Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=10ecb82e4f1f507d5f122e00fd4829b30953f853'/>
<id>urn:sha1:10ecb82e4f1f507d5f122e00fd4829b30953f853</id>
<content type='text'>
Whenever a data stream is declared to be UTF-16BE, UTF-16LE, UTF-32BE
or UTF-32LE a BOM must not be used [1]. The function returns true if
this is the case.

This function is used in a subsequent commit.

[1] http://unicode.org/faq/utf_bom.html#bom10

Signed-off-by: Lars Schneider &lt;larsxschneider@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
