Re[2]: UNICODE

help-bison

From:	Hans Aberg
Subject:	Re[2]: UNICODE
Date:	Tue, 12 Feb 2002 19:04:17 +0100

[Please keep the cc to Help Bison, as more good folks can help.]
At 17:02 +0300 2002/02/12, pecherin wrote:
>>Formally, the Bison-generated parser reads token numbers provided by the
>>lexer and knows nothing about character codes, so from that point of view,
>>there is no difference between using one character encoding and another.
>
>Thanks. I think you are right.
>I looked at code generated by Bison and
>found a lot of char constants and it
>confused me.

There is another limitation, which I do not know whether you will hit, but
which you should be aware of when working with Unicode: both Bison and the
parser it generates use a "short" for state numbers. This means that if you
put in a lot of Unicode tokens that are parsed by different states, the
number of states might overflow that type.
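
To give a feel for the size of that limit, here is a minimal sketch in C. The
helper name state_count_fits_short is made up for illustration and is not part
of Bison; the only facts it relies on are that the states are numbered from 0
and stored in a "short", whose largest value is SHRT_MAX (the C standard
guarantees at least 32767, and the exact value depends on your platform).

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not part of Bison: reports whether a parser with
   `state_count` states, numbered 0 .. state_count - 1, can store every
   state number in a `short`. */
bool state_count_fits_short(long state_count)
{
    /* The highest state number is state_count - 1; it must not exceed
       SHRT_MAX, the largest value a `short` can hold on this platform. */
    return state_count >= 0 && state_count - 1 <= (long)SHRT_MAX;
}
```

On a platform where SHRT_MAX is 32767, a grammar producing up to 32768 states
would pass this check, and one more state would not.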

The reason one might want to generate a lot of Unicode tokens is that the
C/C++ support for Unicode is really lousy, so those who write such Unicode
multi-compiler applications (like WWW servers/browsers) give the Unicode
characters identifier names and write out the character codes explicitly.
This seems to be the only way to ensure portability right now.
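
As a rough sketch of that naming approach in C: every identifier below (the
UC_ and TOK_ names and the function token_for_code_point) is invented for
illustration, and the token numbers are placeholders for whatever your grammar
actually defines. The only real data are the Unicode code points, written out
as plain integers so that no compiler-specific character or string syntax is
needed.

```c
/* Hypothetical names for a few Unicode characters. The values are the
   Unicode code points, written out explicitly as integers. */
enum {
    UC_GREEK_SMALL_ALPHA = 0x03B1,  /* U+03B1 GREEK SMALL LETTER ALPHA */
    UC_GREEK_SMALL_BETA  = 0x03B2,  /* U+03B2 GREEK SMALL LETTER BETA */
    UC_RIGHTWARDS_ARROW  = 0x2192   /* U+2192 RIGHTWARDS ARROW */
};

/* Placeholder token numbers (an assumption for this sketch); in a real
   project they would come from the grammar's token declarations. */
enum { TOK_OTHER = 258, TOK_ALPHA, TOK_BETA, TOK_ARROW };

/* Hypothetical lexer helper: maps an already decoded code point to the
   token number handed to the parser. The parser itself only ever sees
   the token number, never the character code. */
int token_for_code_point(unsigned long code_point)
{
    switch (code_point) {
    case UC_GREEK_SMALL_ALPHA: return TOK_ALPHA;
    case UC_GREEK_SMALL_BETA:  return TOK_BETA;
    case UC_RIGHTWARDS_ARROW:  return TOK_ARROW;
    default:                   return TOK_OTHER;
    }
}
```

With these placeholder definitions, token_for_code_point(0x03B1) yields
TOK_ALPHA, and any code point without a name of its own yields TOK_OTHER.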

  Hans Aberg
