[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH, resend] Handle multibyte codepoint width properly

From: Vladimir 'φ-coder/phcoder' Serbinenko
Subject: Re: [PATCH, resend] Handle multibyte codepoint width properly
Date: Thu, 05 Apr 2012 16:12:34 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120329 Icedove/10.0.3

On 05.04.2012 14:32, Bruno Haible wrote:
> Hi Vladimir,
>> I'm not sure if previous time I sent with or without \0 bugfix. Resending
> Thanks. I apologize for not following up on the update that you sent in
> [1] and the test case in [2].
> The approach of the patch looks fine now; but there are a couple of minor
> problems that would be nice to fix first. Take it as a way to learn gnulib
> code style conventions. I think you will have more opportunities to contribute
> to gnulib in the future.
> - In lib/argp-fmtstream.h, the function __argp_get_display_len is declared
>   twice.
Fixed locally
> - In lib/argp-fmtstream.c: Both functions should have a comment before
>   the function, describing what the function does, what are the arguments,
>   and what is the return type.
Fixed locаlly
> - The functions __argp_get_display_len and add_length don't write to
>   the memory delimited by 'beg', 'end', 'ptr, 'end'. Therefore it is good
>   style to declare these parameters as being 'const char *' rather than
>   'char *'. The general rule of thumb is: Use 'const char *' instead of
>   'char *' as long as it does not lead to warnings with "gcc -Wall" and
>   does not force you to introduce casts.
I'll have to change return types to offset instead of pointer to avoid
> - Naming of the functions: The term "len" or "length" designates a byte
>   count (or multibyte-character count, when used in functions with prefix
>   "mbs"). The term "display length" is therefore a misnomer. When a
>   functions counts the number of screen columns needed to display a string,
>   the right term is "width".
Fixed locally
> - Naming of the second function: You call it 'add_length', but it does
>   a subtraction. Either the function name is irritating, or the code is
>   wrong?
Neither. The function returns pointer to the first non-fitting character
hence it's like ptr + l just that l is expressed in columns.
> - Handling of incomplete multibyte characters: When mbrtowc() encounters
>   the end of the character sequence with an incomplete byte sequence, it
>   returns (size_t)(-2). (See "man mbrtowc".) In this case, your code
>   accesses the uninitialized variable 'wc'.
Fixed. I didn't think of it because in GRUB we decode such sequences to
a replacement character.
> - The function __argp_get_display_len looks very similar to mbsnwidth(),
>   from module 'mbswidth'. Could you use that function? One of the gnulib
>   principles is to reuse code that is already in gnulib, where it makes sense.
mbsnwidth returns plainly -1 in case of incorrect multibyte sequence.
This is unfortunately not uncommon that the .mo claims incorrect
encoding, use poorly supported encoding or use incorrect encoding (like
8-byte character in otherwise UTF-8 file or characters from C1 used for
western symbols in windows way). I'd prefer to be able to recover
somehow in this case, at least in the case of UTF-8 and 8-byte encoding
(I'm aware that in case some encodings like EUC-JP it's not possible to
reliably resynchronise).
What is the best course of action in this case? Just failing may result
in the options descriptions completely missing due to one character
corruption. While it gives incencitive for developper to fix the problem
it creates a big problem for end user.
Perhaps we can use an approximation one byte = one column in case of
failure and output some message at the end telling the user about the
> - Your patch introduces tabs. While other GNU projects like tabs, in gnulib
>   we have banned tabs from all *.c, *.h, *.m4, *.sh, *.texi files. See
>   the gnulib/README for instructions how to ensure you don't accidentally
>   introduce tabs.
Will look into it
> - You have shown a test case as a Cyrillic string. But what is the C code
>   to make the behaviour explicit? Could you add this code to 
> tests/test-argp.c,
>   or create a new test file tests/test-argp-3.c?
Will do.
> - Last not least, we will need a copyright assignment from you for the
>   changes, so that the FSF can act as copyright holder of argp-fmtstream.c
>   and enforce the copyright when there is need to. To this effect, can you
>   fill out the form at gnulib/doc/Copyright/request-assign.future and submit
>   it, please?
I have the assignment covering any contribution to GRUB. So this should
go pretty quickly. Alternatively I can commit it in GRUB and you'll lift
it from there.
> Bruno
> [1] http://lists.gnu.org/archive/html/bug-gnulib/2012-02/msg00072.html
> [2] http://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00170.html

Vladimir 'φ-coder/phcoder' Serbinenko

Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]