bug-gnulib

Re: guarantees of u8_mbtouc/u8_strmbtouc

From:	Bruno Haible
Subject:	Re: guarantees of u8_mbtouc/u8_strmbtouc
Date:	Sat, 31 Jul 2010 22:24:21 +0200
User-agent:	KMail/1.9.9

Hi Paolo,

> Still, without safety u8_strmbtouc(puc, s) uses the same code as 
> u8_mbtouc(puc, s, SIZE_MAX), which makes pretty much my point.  I think 
> it is safe and actually very useful to document u8_mbtouc/u16_mbtouc as 
> looking only one byte (resp. one short) beyond the first complete character.

I find it better to have clear specifications that the programmer can easily
remember. The libunistring manual [1] states:
  "Argument pairs (s, n) denote a string s[0..n-1] with exactly n units."

If we were to document "u8_mbtouc accesses only as many bytes as the first
Unicode character makes up", the question immediately arises: what about
invalid and incomplete Unicode characters, such as
   { 0xC3 }, n = 1
or { 0xE4, 0x30 }, n = 2?
You can see how quickly such a definition becomes ambiguous. Such ambiguities
later lead to bugs in programs.

Bruno

[1] http://www.gnu.org/software/libunistring/manual/html_node/Conventions.html
