bug#50391: 28.0.50; json-read non-ascii data results in malformed string
From: Zhiwei Chen
Subject: bug#50391: 28.0.50; json-read non-ascii data results in malformed string
Date: Sun, 05 Sep 2021 12:19:56 +0800
When I fetch JSON from Youdao (a dictionary service in China) and write the response body straight to a file:
#+begin_src elisp
(url-retrieve
 "https://dict.youdao.com/suggest?q=accumulate&le=eng&num=80&doctype=json"
 (lambda (_status)
   (goto-char (1+ url-http-end-of-headers))
   (write-region (point) (point-max) "/tmp/acc1.json")))
#+end_src
Then C-x C-f "/tmp/acc1.json", the file is correctly encoded without
But if the data goes through `json-read' and then `json-insert', the file is
malformed, even though uchardet reports the file's encoding as UTF-8.
#+begin_src elisp
(url-retrieve
 "https://dict.youdao.com/suggest?q=accumulate&le=eng&num=80&doctype=json"
 (lambda (_status)
   (goto-char (1+ url-http-end-of-headers))
   (let ((j (json-read)))
     (with-temp-buffer
       (json-insert j)
       (write-region (point-min) (point-max) "/tmp/acc2.json")))))
#+end_src
#+begin_src shell
diff -u <(hexdump -C /tmp/acc1.json | head -n10) \
        <(hexdump -C /tmp/acc2.json | head -n10) | diff-so-fancy
#+end_src
Screenshot: https://pb.nichi.co/jazz-estate-brave
The diff shows that the first word, "累积", is encoded incorrectly in
"/tmp/acc2.json" (it uses `c3 a7 c2 b4 c2 af').
The correct bytes can be checked with:
#+begin_src shell
echo -n "累积" | hexdump -C
#+end_src
The output should be `e7 b4 af e7 a7 af' in UTF-8, where "累" is represented by
`e7 b4 af' and "积" is represented by `e7 a7 af'.
The environment variable LANG is `en_US.UTF-8', and everything was tested with `emacs -Q'.
--
Zhiwei Chen