bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH, resend] Handle multibyte codepoint width properly


From: Bruno Haible
Subject: Re: [PATCH, resend] Handle multibyte codepoint width properly
Date: Thu, 05 Apr 2012 14:32:45 +0200
User-agent: KMail/4.7.4 (Linux/3.1.0-1.2-desktop; KDE/4.7.4; x86_64; ; )

Hi Vladimir,

> I'm not sure if previous time I sent with or without \0 bugfix. Resending

Thanks. I apologize for not following up on the update that you sent in
[1] and the test case in [2].

The approach of the patch looks fine now; but there are a couple of minor
problems that would be nice to fix first. Take it as a way to learn gnulib
code style conventions. I think you will have more opportunities to contribute
to gnulib in the future.

- In lib/argp-fmtstream.h, the function __argp_get_display_len is declared
  twice.

- In lib/argp-fmtstream.c: Both functions should have a comment before
  the function, describing what the function does, what are the arguments,
  and what is the return type.

- The functions __argp_get_display_len and add_length don't write to
  the memory delimited by 'beg', 'end', 'ptr, 'end'. Therefore it is good
  style to declare these parameters as being 'const char *' rather than
  'char *'. The general rule of thumb is: Use 'const char *' instead of
  'char *' as long as it does not lead to warnings with "gcc -Wall" and
  does not force you to introduce casts.

- Naming of the functions: The term "len" or "length" designates a byte
  count (or multibyte-character count, when used in functions with prefix
  "mbs"). The term "display length" is therefore a misnomer. When a
  functions counts the number of screen columns needed to display a string,
  the right term is "width".

- Naming of the second function: You call it 'add_length', but it does
  a subtraction. Either the function name is irritating, or the code is
  wrong?

- Handling of incomplete multibyte characters: When mbrtowc() encounters
  the end of the character sequence with an incomplete byte sequence, it
  returns (size_t)(-2). (See "man mbrtowc".) In this case, your code
  accesses the uninitialized variable 'wc'.

- The function __argp_get_display_len looks very similar to mbsnwidth(),
  from module 'mbswidth'. Could you use that function? One of the gnulib
  principles is to reuse code that is already in gnulib, where it makes sense.

- Your patch introduces tabs. While other GNU projects like tabs, in gnulib
  we have banned tabs from all *.c, *.h, *.m4, *.sh, *.texi files. See
  the gnulib/README for instructions how to ensure you don't accidentally
  introduce tabs.

- You have shown a test case as a Cyrillic string. But what is the C code
  to make the behaviour explicit? Could you add this code to tests/test-argp.c,
  or create a new test file tests/test-argp-3.c?

- Last not least, we will need a copyright assignment from you for the
  changes, so that the FSF can act as copyright holder of argp-fmtstream.c
  and enforce the copyright when there is need to. To this effect, can you
  fill out the form at gnulib/doc/Copyright/request-assign.future and submit
  it, please?

Bruno

[1] http://lists.gnu.org/archive/html/bug-gnulib/2012-02/msg00072.html
[2] http://lists.gnu.org/archive/html/bug-gnulib/2012-03/msg00170.html





reply via email to

[Prev in Thread] Current Thread [Next in Thread]