<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/t/t9835-git-p4-metadata-encoding-python2.sh, branch v2.43.2</title>
<subtitle>Mirror of https://git.kernel.org/pub/scm/git/git.git/
</subtitle>
<id>https://git.shady.money/git/atom?h=v2.43.2</id>
<link rel='self' href='https://git.shady.money/git/atom?h=v2.43.2'/>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/'/>
<updated>2022-05-04T17:30:01Z</updated>
<entry>
<title>git-p4: improve encoding handling to support inconsistent encodings</title>
<updated>2022-05-04T17:30:01Z</updated>
<author>
<name>Tao Klerks</name>
<email>tao@klerks.biz</email>
</author>
<published>2022-04-30T19:26:52Z</published>
<link rel='alternate' type='text/html' href='https://git.shady.money/git/commit/?id=f7b5ff607fa1a62a480399afb6ccb9691735bf79'/>
<id>urn:sha1:f7b5ff607fa1a62a480399afb6ccb9691735bf79</id>
<content type='text'>
git-p4 is designed to run correctly under python2.7 and python3, but
its functional behavior wrt importing user-entered text differs across
these environments:

Under python2, git-p4 "naively" writes the Perforce bytestream into git
metadata (and does not set an "encoding" header on the commits); this
means that any non-utf-8 byte sequences end up creating invalidly-encoded
commit metadata in git.

Under python3, git-p4 attempts to decode the Perforce bytestream as utf-8
data, and fails badly (with an unhelpful error) when non-utf-8 data is
encountered.

Perforce clients (especially p4v) encourage user entry of changelist
descriptions (and user full names) in OS-local encoding, and store the
resulting bytestream to the server unmodified - such that different
clients can end up creating mutually-unintelligible messages. The most
common inconsistency, in many Perforce environments, is likely to be utf-8
(typical in linux) vs cp-1252 (typical in windows).

Make the changelist-description- and user-fullname-handling code
python-runtime-agnostic, introducing three "strategies" selectable via
config:
- 'passthrough', behaving as previously under python2,
- 'strict', behaving as previously under python3, and
- 'fallback', favoring utf-8 but supporting a secondary encoding when
utf-8 decoding fails, and finally escaping high-range bytes if the
decoding with the secondary encoding also fails.

Keep the python2 default behavior as-is ('legacy' strategy), but switch
the python3 default strategy to 'fallback' with default fallback encoding
'cp1252'.

Also include tests exercising these encoding strategies, documentation for
the new config, and improve the user-facing error messages when decoding
does fail.

Signed-off-by: Tao Klerks &lt;tao@klerks.biz&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
