Re: [Help-source-highlight] Unicode files ?
From: Dario Teixeira
Subject: Re: [Help-source-highlight] Unicode files ?
Date: Fri, 2 Apr 2010 07:03:04 -0700 (PDT)
Hi,
> the html might bring also bad encoding in the head, but I
> guess it is also due to the fact that source-highlight reads
> two bytes, which in unicode represent a single character,
> and interprets them as two characters instead of one.
> This is unicode, am I right? Sorry for my ignorance,
> but with unicode in a text file every character is
> represented by two bytes, right?
Nope. There is not one standard Unicode encoding, but several. The most
common one is UTF-8, a variable-length encoding where each Unicode
character takes from 1 to 4 bytes (originally it was up to 6, but that's
deprecated now). Another variable-length encoding is UTF-16, where each
character occupies either 2 or 4 bytes. Of these, the only fixed-length
encoding is UTF-32 (UCS-4), where each character requires 4 bytes.
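These sizes are easy to check with a few lines of Python. This is only an
illustrative sketch and has nothing to do with Source-highlight's own code;
the function name encoded_sizes is made up for the example.

```python
def encoded_sizes(ch):
    """Return how many bytes one character needs in each encoding."""
    # The "-le" variants are used so that no byte order mark is prepended.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}


# An ASCII letter, an accented letter, the euro sign, and an emoji
# that lies outside the Basic Multilingual Plane.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print("U+%04X" % ord(ch), encoded_sizes(ch))
```

The UTF-8 column goes from 1 byte up to 4, the UTF-16 column is 2 bytes for
everything except the last character (which needs 4), and the UTF-32 column
is always 4. A program that treats each byte as one character will therefore
miscount anything that is not plain ASCII.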
> I'd like to try with wstring and see whether this solves
> something.
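I haven't used C++ in a long time, but isn't wstring based on wchar_t,
whose size depends on the platform (typically 2 bytes on Windows and 4 on
Linux)? Where it is 2 bytes, it won't solve anything by itself: no Unicode
encoding covers every character with a fixed length of 2 bytes! (UCS-2 is
fixed at 2 bytes, but it stops at the Basic Multilingual Plane.)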
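To see why a single 2-byte unit is not enough for every character, one can
look at the 16-bit code units that UTF-16 produces. Again, this is just a
Python sketch for illustration; utf16_units is a hypothetical helper name.

```python
import struct


def utf16_units(ch):
    """Return the 16-bit code units that UTF-16 uses for one character."""
    data = ch.encode("utf-16-le")  # little-endian, no byte order mark
    # Unpack the bytes as a sequence of unsigned 16-bit integers.
    return struct.unpack("<%dH" % (len(data) // 2), data)


# The euro sign fits in one unit; the emoji needs two (a surrogate pair).
for ch in ("\u20ac", "\U0001F600"):
    print("U+%04X" % ord(ch), [hex(u) for u in utf16_units(ch)])
```

So even with a 2-byte wchar_t, code that assumes "one element equals one
character" still breaks on characters that need a surrogate pair.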
Lorenzo, I think we can give you a hand in implementing this. However,
if you read through this entire thread you will notice that the best
course of action depends on a crucial piece of information that you are
the person best placed to provide: we need a list of the manipulations
that Source-highlight applies to strings.
Hope that helps!
Best regards,
Dario Teixeira