[Qemu-commits] [qemu/qemu] cb2744: unicode: New mod_utf8_codepoint()
From: GitHub
Date: Sat, 13 Apr 2013 13:00:10 -0700
Branch: refs/heads/master
Home: https://github.com/qemu/qemu
Commit: cb2744ea343d8cb96bab0389f6b7d6e1a3ddf6c1
https://github.com/qemu/qemu/commit/cb2744ea343d8cb96bab0389f6b7d6e1a3ddf6c1
Author: Markus Armbruster <address@hidden>
Date: 2013-04-13 (Sat, 13 Apr 2013)
Changed paths:
M include/qemu-common.h
M util/Makefile.objs
A util/unicode.c
Log Message:
-----------
unicode: New mod_utf8_codepoint()
Signed-off-by: Markus Armbruster <address@hidden>
Reviewed-by: Laszlo Ersek <address@hidden>
Signed-off-by: Blue Swirl <address@hidden>
Commit: d6244e2ce48b353402eff271d382ee6fd47ce166
https://github.com/qemu/qemu/commit/d6244e2ce48b353402eff271d382ee6fd47ce166
Author: Markus Armbruster <address@hidden>
Date: 2013-04-13 (Sat, 13 Apr 2013)
Changed paths:
M tests/check-qjson.c
Log Message:
-----------
check-qjson: Improve a few comments, delete bogus ones
Signed-off-by: Markus Armbruster <address@hidden>
Reviewed-by: Laszlo Ersek <address@hidden>
Signed-off-by: Blue Swirl <address@hidden>
Commit: 1d50c8e947180174acb02bad9ff95e0aee6249ea
https://github.com/qemu/qemu/commit/1d50c8e947180174acb02bad9ff95e0aee6249ea
Author: Markus Armbruster <address@hidden>
Date: 2013-04-13 (Sat, 13 Apr 2013)
Changed paths:
M tests/check-qjson.c
Log Message:
-----------
check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings
Existing test cases cover only the noncharacters U+FFFE and U+FFFF.
Add tests for the other 64 noncharacters.
Three existing test cases involve noncharacters U+FFFF and U+10FFFF.
Instead of deleting them as now duplicates, adjust them to use U+FFFC
and U+10FFFD.
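For readers checking the count: Unicode defines 66 noncharacters, namely U+FDD0..U+FDEF plus the last two code points of each of the 17 planes. The following is a minimal sketch of that definition, not code from this commit; the helper names is_noncharacter() and count_noncharacters() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not taken from QEMU: true if cp is one of the
 * Unicode noncharacters. */
static bool is_noncharacter(uint32_t cp)
{
    assert(cp <= 0x10FFFF);           /* caller passes a valid code point */
    if (cp >= 0xFDD0 && cp <= 0xFDEF) {
        return true;                  /* 32 contiguous noncharacters */
    }
    return (cp & 0xFFFE) == 0xFFFE;   /* U+nFFFE, U+nFFFF in all 17 planes */
}

/* Count the noncharacters by brute force over the whole code space. */
static unsigned count_noncharacters(void)
{
    unsigned n = 0;

    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        n += is_noncharacter(cp);
    }
    return n;
}
```

With this predicate, U+FFFC and U+10FFFD are ordinary code points, while U+FFFF and U+10FFFF are noncharacters.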
Signed-off-by: Markus Armbruster <address@hidden>
Reviewed-by: Laszlo Ersek <address@hidden>
Signed-off-by: Blue Swirl <address@hidden>
Commit: e2ec3f976803b360c70d9ae2ba13852fa5d11665
https://github.com/qemu/qemu/commit/e2ec3f976803b360c70d9ae2ba13852fa5d11665
Author: Markus Armbruster <address@hidden>
Date: 2013-04-13 (Sat, 13 Apr 2013)
Changed paths:
M qobject/qjson.c
M tests/check-qjson.c
Log Message:
-----------
qjson: to_json() case QTYPE_QSTRING is buggy, rewrite
Known bugs in to_json():
* A start byte for a three-byte sequence followed by fewer than two
continuation bytes is split into one-byte sequences.
* Start bytes for sequences longer than three bytes get misinterpreted
as start bytes for three-byte sequences. Continuation bytes beyond
byte three become one-byte sequences.
This means all characters outside the BMP are decoded incorrectly.
* One-byte sequences with the MSB set are put into the JSON string
verbatim when char is unsigned, producing invalid UTF-8. When char
is signed, they're replaced by "\\uFFFF" instead.
This includes \xFE, \xFF, and stray continuation bytes.
* Overlong sequences are happily accepted, unless screwed up by the
bugs above.
* Likewise, sequences encoding surrogate code points or noncharacters.
* Unlike other control characters, ASCII DEL is not escaped. Except
in overlong encodings.
My rewrite fixes them as follows:
* Malformed UTF-8 sequences are replaced.
Except the overlong encoding \xC0\x80 of U+0000 is still accepted.
This permits embedding NUL characters in C strings, a trick known
as "Modified UTF-8".
* Sequences encoding code points beyond Unicode range are replaced.
* Sequences encoding code points beyond the BMP produce a surrogate
pair.
* Sequences encoding surrogate code points are replaced.
* Sequences encoding noncharacters are replaced.
* ASCII DEL is now always escaped.
The replacement character is U+FFFD.
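The rules above can be sketched in a few dozen lines of C. This is an illustration under stated assumptions, not the code in qobject/qjson.c: decode_mod_utf8() and json_escape() are hypothetical names, the decoder is not QEMU's mod_utf8_codepoint(), every character outside printable ASCII (and also '"' and '\') is emitted as a \uXXXX escape for simplicity, and how many bytes a malformed sequence consumes is a policy choice made here only for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical decoder: decode one sequence at s, store the number of
 * bytes consumed in *len, and return the code point, or -1 if the
 * sequence is malformed, overlong, or beyond U+10FFFF.
 * Relies on s being NUL-terminated.
 */
static long decode_mod_utf8(const unsigned char *s, size_t *len)
{
    static const long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    long cp;

    *len = 1;
    if (s[0] < 0x80) {
        return s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        need = 2;
        cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3;
        cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4;
        cp = s[0] & 0x07;
    } else {
        return -1;              /* stray continuation byte or \xF8..\xFF */
    }
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;          /* truncated; resume at the offending byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
        *len = i + 1;
    }
    if (cp > 0x10FFFF) {
        return -1;              /* beyond the Unicode range */
    }
    if (cp < min_cp[need] && !(need == 2 && cp == 0)) {
        return -1;              /* overlong, except \xC0\x80 for U+0000 */
    }
    return cp;
}

/* Write the escaped form of str to out; return 0, or -1 if out is too small. */
static int json_escape(const char *str, char *out, size_t out_size)
{
    const unsigned char *s = (const unsigned char *)str;
    const unsigned char *end = s + strlen(str);
    size_t used = 0, len;
    int n;

    assert(out_size > 0);
    out[0] = '\0';
    while (s < end) {
        long cp = decode_mod_utf8(s, &len);

        s += len;
        if (cp < 0 || (cp >= 0xD800 && cp <= 0xDFFF)
            || (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE) {
            cp = 0xFFFD;        /* malformed, surrogate, or noncharacter */
        }
        if (cp > 0xFFFF) {      /* beyond the BMP: emit a surrogate pair */
            cp -= 0x10000;
            n = snprintf(out + used, out_size - used, "\\u%04lX\\u%04lX",
                         0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        } else if (cp < 0x20 || cp >= 0x7F || cp == '"' || cp == '\\') {
            n = snprintf(out + used, out_size - used, "\\u%04lX", cp);
        } else {
            n = snprintf(out + used, out_size - used, "%c", (int)cp);
        }
        if (n < 0 || (size_t)n >= out_size - used) {
            return -1;          /* output buffer too small */
        }
        used += (size_t)n;
    }
    return 0;
}
```

Under these assumptions, ASCII DEL comes out as \u007F, the two-byte sequence \xC0\x80 as \u0000, and a four-byte sequence for a code point beyond the BMP as two \uXXXX escapes forming a surrogate pair. Invalid input of any kind collapses to \uFFFD.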
Signed-off-by: Markus Armbruster <address@hidden>
Reviewed-by: Laszlo Ersek <address@hidden>
Signed-off-by: Blue Swirl <address@hidden>
Compare: https://github.com/qemu/qemu/compare/75312e745ad1...e2ec3f976803