help-bison

Re[2]: UNICODE


From: Hans Aberg
Subject: Re[2]: UNICODE
Date: Tue, 12 Feb 2002 19:04:17 +0100

[Please keep the cc to Help Bison, as more good folks can help.]
At 17:02 +0300 2002/02/12, pecherin wrote:
>>Formally, the Bison generated parser reads token numbers provided by the
>>lexer and knows nothing about character codes, so from that point of view,
>>there is no difference between using one character encoding and another.
>
>Thanks. I think you are right.
>I looked at code generated by Bison and
>found a lot of char constants and it
>confused me.

There is another limitation, which you may or may not hit, but which you
should be aware of when working with Unicode: both Bison and the parser it
generates use a "short" for state numbers. This means that if you put in a
lot of Unicode tokens which are parsed by different states, the number of
states might overflow that type.
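
To make the concern concrete, here is a minimal sketch in C of the kind of
range check involved. The function names are made up for illustration and
are not part of Bison; the only assumption taken from the above is that
state numbers are stored in a "short".

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not part of Bison: returns 1 if a parser with
   state_count states (numbered 0 .. state_count - 1) can store every
   state number in a signed short, and 0 otherwise. */
int states_fit_in_short(long state_count)
{
    return state_count >= 1 && state_count - 1 <= SHRT_MAX;
}

/* Hypothetical helper: narrow a state number to short, refusing values
   that would not survive the conversion. */
short checked_state(long state)
{
    assert(state >= 0 && state <= SHRT_MAX);
    return (short) state;
}
```

SHRT_MAX is only guaranteed to be at least 32767, so a grammar that needs
more states than that cannot portably be numbered with a "short".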

The reason one might want to generate a lot of Unicode tokens is that the
C/C++ support for Unicode is really lousy, so those who write Unicode
applications that must build with several compilers (like WWW
servers/browsers) give the Unicode characters identifier names, and write
out the character codes explicitly. This seems to be the only way to ensure
portability right now.
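
A rough sketch in C of what that approach might look like. All the
identifier names and the token numbers are hypothetical (in a real grammar
Bison assigns the token numbers); only the code points themselves are taken
from the Unicode standard.

```c
/* Named Unicode characters with the codes written out explicitly, so the
   source does not depend on how a compiler treats non-ASCII characters. */
enum {
    UC_LATIN_SMALL_A_WITH_DIAERESIS = 0x00E4,
    UC_GREEK_SMALL_ALPHA            = 0x03B1,
    UC_EURO_SIGN                    = 0x20AC
};

/* Hypothetical token numbers standing in for what Bison would generate
   from %token declarations. */
enum token {
    TOK_OTHER = 1000,
    TOK_LATIN_SMALL_A_WITH_DIAERESIS,
    TOK_GREEK_SMALL_ALPHA,
    TOK_EURO_SIGN
};

/* Map an already decoded code point to a token number. The parser only
   ever sees the token number, never the character code. */
int token_for_code_point(unsigned long code_point)
{
    switch (code_point) {
    case UC_LATIN_SMALL_A_WITH_DIAERESIS:
        return TOK_LATIN_SMALL_A_WITH_DIAERESIS;
    case UC_GREEK_SMALL_ALPHA:
        return TOK_GREEK_SMALL_ALPHA;
    case UC_EURO_SIGN:
        return TOK_EURO_SIGN;
    default:
        return TOK_OTHER;
    }
}
```

Each such character that gets its own token adds to the size of the
grammar, which is how the limit on state numbers can come into play.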

  Hans Aberg




