<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/utf8.c, branch v2.2.0</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.2.0</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.2.0'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2014-09-19T18:38:39Z</updated>
<entry>
<title>Merge branch 'rs/export-strbuf-addchars'</title>
<updated>2014-09-19T18:38:39Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-09-19T18:38:39Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=56feed1c7641bbf7920efe6607c6a04309073baa'/>
<id>urn:sha1:56feed1c7641bbf7920efe6607c6a04309073baa</id>
<content type='text'>
Code clean-up.

* rs/export-strbuf-addchars:
  strbuf: use strbuf_addchars() for adding a char multiple times
  strbuf: export strbuf_addchars()
</content>
</entry>
<entry>
<title>Merge branch 'nd/strbuf-utf8-replace'</title>
<updated>2014-09-09T19:54:02Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-09-09T19:54:02Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=1764e8124e441a84b4764da0f8e6b0c900c44941'/>
<id>urn:sha1:1764e8124e441a84b4764da0f8e6b0c900c44941</id>
<content type='text'>
* nd/strbuf-utf8-replace:
  utf8.c: fix strbuf_utf8_replace() consuming data beyond input string
</content>
</entry>
<entry>
<title>strbuf: export strbuf_addchars()</title>
<updated>2014-09-08T18:26:45Z</updated>
<author>
<name>René Scharfe</name>
<email>l.s.r@web.de</email>
</author>
<published>2014-09-07T07:03:32Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d07235a027e0c7ed2fee1aeb3bee8a858bf5ca58'/>
<id>urn:sha1:d07235a027e0c7ed2fee1aeb3bee8a858bf5ca58</id>
<content type='text'>
Move strbuf_addchars() to strbuf.c, where it belongs, and make it
available for other callers.

Signed-off-by: Rene Scharfe &lt;l.s.r@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8.c: fix strbuf_utf8_replace() consuming data beyond input string</title>
<updated>2014-08-11T18:52:22Z</updated>
<author>
<name>Nguyễn Thái Ngọc Duy</name>
<email>pclouds@gmail.com</email>
</author>
<published>2014-08-10T07:05:21Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=430875969a5229c1d306e4cc5acc8c8afe2b50a3'/>
<id>urn:sha1:430875969a5229c1d306e4cc5acc8c8afe2b50a3</id>
<content type='text'>
The main loop in strbuf_utf8_replace() could be summed up as:

  while ('src' is still valid) {
    1) advance 'src' to copy ANSI escape sequences
    2) advance 'src' to copy/replace visible characters
  }

The problem is that after #1, 'src' may have reached the end of the
string (so 'src' points to NUL), and #2 will continue to copy that NUL
as if it were a normal character. Because the output is stored in a
strbuf, this NUL is counted in the 'len' field as well. Check after #1
and break out of the loop if necessary.
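
As an illustration only, the loop shape described above can be modelled
in Python. This is not the C code in utf8.c: copy_with_escapes(), the
ANSI_ESCAPE pattern and the check_after_escapes flag are hypothetical
names, and the sketch only copies characters, whereas the real function
also replaces a range of display columns.

```python
import re

# Hypothetical pattern: one ANSI color escape such as "\x1b[31m" or "\x1b[m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def copy_with_escapes(text, check_after_escapes=True):
    """Copy `text` using the two-step loop, modelling a NUL-terminated string."""
    src = text + "\0"   # stand-in for a C string with its terminating NUL
    end = len(text)     # index of the terminating NUL
    out = []
    i = 0
    while end > i:
        # Step 1: copy any run of escape sequences verbatim.
        match = ANSI_ESCAPE.match(src, i)
        while match:
            out.append(match.group())
            i = match.end()
            match = ANSI_ESCAPE.match(src, i)
        # The fix: step 1 may already have consumed the rest of the input.
        if check_after_escapes and i == end:
            break
        # Step 2: copy one visible character.
        out.append(src[i])
        i += 1
    return "".join(out)
```

With check_after_escapes=False, an input that ends in an escape
sequence (for example "ab\x1b[m") comes back with a trailing "\0",
which mirrors the stray NUL described above.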

The test does not look obvious, but the combination of %&gt;&gt;() should
produce a call trace like this:

  show_log()
  pretty_print_commit()
  format_commit_message()
  strbuf_expand()
  format_commit_item()
  format_and_pad_commit()
  strbuf_utf8_replace()

where %C(auto)%d would insert a color reset escape sequence at the end
of the string given to strbuf_utf8_replace(), and show_log() uses
fwrite() to send everything to stdout (including the incorrect NUL
inserted by strbuf_utf8_replace()).

Signed-off-by: Nguyễn Thái Ngọc Duy &lt;pclouds@gmail.com&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/unicode-6.3-zero-width'</title>
<updated>2014-06-06T18:29:38Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-06-06T18:29:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=334d40e951fa3b3961135b3183633706d976c4bd'/>
<id>urn:sha1:334d40e951fa3b3961135b3183633706d976c4bd</id>
<content type='text'>
Update the logic to compute the display width needed for utf8
strings and allow us to more easily maintain the tables used in
that logic.

We may want to let the users choose if codepoints with ambiguous
widths are treated as a double or single width in a follow-up patch.

* tb/unicode-6.3-zero-width:
  utf8: make it easier to auto-update git_wcwidth()
  utf8.c: use a table for double_width
</content>
</entry>
<entry>
<title>utf8: make it easier to auto-update git_wcwidth()</title>
<updated>2014-05-12T17:38:01Z</updated>
<author>
<name>Torsten Bögershausen</name>
<email>tboegi@web.de</email>
</author>
<published>2014-05-09T21:51:44Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9c94389c3ee02df891100b894c1790a524268d91'/>
<id>urn:sha1:9c94389c3ee02df891100b894c1790a524268d91</id>
<content type='text'>
The function git_wcwidth() returns the display width of a given
Unicode code point:

 -1 for control characters,
  0 for combining or other non-visible code points,
  1 for e.g. ASCII,
  2 for double-width code points.

This table had originally been extracted for one Unicode
version, probably 3.2.

We use two tables these days, one for zero-width and another for
double-width.  Make it easier to update these tables to a later
version of Unicode by factoring out the table from utf8.c into
unicode_width.h and add the script update_unicode.sh to update the
table based on the latest Unicode specification files.
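
As a rough sketch of the idea, and not the code in utf8.c, a width
lookup over two sorted interval tables might look like this in Python.
The table contents are short illustrative excerpts, the names
ZERO_WIDTH, DOUBLE_WIDTH, in_table() and char_width() are hypothetical,
and the control-character ranges are an assumption based on the usual
wcwidth() convention.

```python
from bisect import bisect_right

# Illustrative excerpts only; real tables are generated from Unicode data files.
ZERO_WIDTH = [(0x0300, 0x036F), (0x0483, 0x0487)]
DOUBLE_WIDTH = [(0x1100, 0x115F), (0x3040, 0x30FF), (0xAC00, 0xD7A3)]


def in_table(cp, table):
    """Binary-search sorted, non-overlapping (first, last) intervals for cp."""
    starts = [first for first, _ in table]
    idx = bisect_right(starts, cp) - 1  # last interval starting at or before cp
    return idx != -1 and cp in range(table[idx][0], table[idx][1] + 1)


def char_width(cp):
    """Return -1, 0, 1 or 2 for a code point, following the list above."""
    if cp == 0:
        return 0
    # Assumed control ranges: C0 controls, DEL, and C1 controls.
    if cp in range(1, 0x20) or cp in range(0x7F, 0xA0):
        return -1
    if in_table(cp, ZERO_WIDTH):
        return 0
    if in_table(cp, DOUBLE_WIDTH):
        return 2
    return 1
```

Keeping the intervals as plain data is what makes regeneration by a
script practical: only the tables change, not the lookup logic.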

Thanks to Peter Krefting &lt;peter@softwolves.pp.se&gt; and Kevin Bracey
&lt;kevin@bracey.fi&gt; for helping with their Unicode knowledge.

Signed-off-by: Torsten Bögershausen &lt;tboegi@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8.c: use a table for double_width</title>
<updated>2014-05-12T17:20:46Z</updated>
<author>
<name>Torsten Bögershausen</name>
<email>tboegi@web.de</email>
</author>
<published>2014-05-09T21:51:38Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=08460345b5a221c0076a48c6be875d64b78b6015'/>
<id>urn:sha1:08460345b5a221c0076a48c6be875d64b78b6015</id>
<content type='text'>
Refactor git_wcwidth() and replace the if-else-if chain.
Use the table double_width, scanned by the bisearch() function that
is already used to find combining code points.

Signed-off-by: Torsten Bögershausen &lt;tboegi@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'tb/unicode-6.3-zero-width'</title>
<updated>2014-04-16T20:38:57Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-04-16T20:38:57Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=9fd911a8101bf4d3bff4e2e61a138505eb39eace'/>
<id>urn:sha1:9fd911a8101bf4d3bff4e2e61a138505eb39eace</id>
<content type='text'>
Teach our display-column-counting logic about decomposed umlauts
and friends.

* tb/unicode-6.3-zero-width:
  utf8.c: partially update to version 6.3
</content>
</entry>
<entry>
<title>utf8.c: partially update to version 6.3</title>
<updated>2014-04-09T17:14:05Z</updated>
<author>
<name>Torsten Bögershausen</name>
<email>tboegi@web.de</email>
</author>
<published>2014-04-07T19:39:41Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=d813ab970db8b57b70bdd1b7e5feddec1c3fd84e'/>
<id>urn:sha1:d813ab970db8b57b70bdd1b7e5feddec1c3fd84e</id>
<content type='text'>
Unicode 6.3 defines more code points as combining or accents.  For
example, the character "ö" could be expressed as an "o" followed by
U+0308 COMBINING DIAERESIS (aka umlaut, double-dot-above).  We should
consider that such a sequence of two codepoints occupies one display
column for the alignment purposes, and for that, git_wcwidth()
should return 0 for them.  Affected codepoints are:

    U+0358..U+035C
    U+0487
    U+05A2, U+05BA, U+05C5, U+05C7
    U+0604, U+0616..U+061A, U+0659..U+065F

Earlier Unicode standards had defined these as "reserved".

Only the range 0..U+07FF has been checked to see which codepoints
need to be marked as 0-width while preparing for this commit; more
updates may be needed.
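
The "o" plus U+0308 example can be checked with a small Python sketch.
It relies on Python's own unicodedata module, whose Unicode version
depends on the interpreter, rather than on git's tables, and
display_columns() is a hypothetical helper that ignores double-width
and control characters.

```python
import unicodedata


def display_columns(text):
    """Count columns, treating combining code points as zero-width."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


composed = "\u00f6"                                  # precomposed o-umlaut
decomposed = unicodedata.normalize("NFD", composed)  # "o" followed by U+0308
```

Both forms should count as one display column even though the
decomposed form holds two code points.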

Signed-off-by: Torsten Bögershausen &lt;tboegi@web.de&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>utf8: use correct type for values in interval table</title>
<updated>2014-02-18T23:51:40Z</updated>
<author>
<name>John Keeping</name>
<email>john@keeping.me.uk</email>
</author>
<published>2014-02-16T16:06:04Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=a68a67dea399f15305b059aa36c007cfdde2890e'/>
<id>urn:sha1:a68a67dea399f15305b059aa36c007cfdde2890e</id>
<content type='text'>
We treat these as unsigned everywhere and compare against unsigned
values, so declare them using the typedef we already have for this.

While we're here, fix the indentation as well.

Signed-off-by: John Keeping &lt;john@keeping.me.uk&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
