From: | Dmitry Gutov |
Subject: | Re: [elpa] 02/04: company-clang: handle multibyte chars between bol and point |
Date: | Fri, 21 Mar 2014 05:47:11 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 |
On 20.03.2014 18:11, Eli Zaretskii wrote:
> I needed to look in their sources, but the information there isn't clear-cut, either (or maybe I didn't understand the code ;-). Some functions that convert file offsets to columns count bytes from the beginning of the line, others count characters, assuming a UTF-8 encoding. But since you say the attempt to count characters in non-UTF-8 encoding failed, I guess clang needs byte counts of UTF-8 encoding.

Yes. And from what I've read (http://stackoverflow.com/a/8259610/615245), non-ANSI encoding support was added piecewise, so maybe the relevant code still hasn't settled.

> In any case, please note that UTF-8 and the internal encoding used by Emacs are not exactly identical, so IMO you should encode into UTF-8 and then use 'length' to compute the "column".
This makes sense. I don't think anyone's likely to encounter a source file with characters that are encoded differently between utf-8 and utf-8-emacs, but I guess the latter is unspecced, so it could change in the future.
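
For illustration only, here is a minimal sketch of the "encode into UTF-8, then take the length" idea, written in Python rather than Emacs Lisp so it can be run standalone. The function name `utf8_byte_column` and the 1-based column convention are assumptions made for this example; it is not the company-clang code.

```python
def utf8_byte_column(line: str, char_offset: int) -> int:
    """Return a 1-based column for char_offset, counted in UTF-8 bytes.

    line        -- text of the current line (hypothetical input)
    char_offset -- number of characters between beginning of line and point
    """
    # Take the text between the beginning of the line and point,
    # encode it explicitly as UTF-8, and count the resulting bytes.
    prefix = line[:char_offset]
    return len(prefix.encode("utf-8")) + 1  # +1: columns assumed to start at 1


if __name__ == "__main__":
    line = "é."
    # Character-based column: two characters before point.
    print(len(line) + 1)
    # Byte-based column: "é" occupies two bytes in UTF-8.
    print(utf8_byte_column(line, len(line)))
```

The two numbers differ as soon as a multibyte character sits between the beginning of the line and point, which is the case the commit is about.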