From: Hans Aberg
Subject: Re: Issue 2159 in lilypond: Patch: lexer.ll: Warn about non-UTF-8 characters
Date: Sun, 1 Jan 2012 23:18:50 +0100

On 1 Jan 2012, at 21:06, David Kastrup wrote:
>>> Updates:
>>> Labels: Patch-new
>>>
>>> Comment #2 on issue 2159 by address@hidden: Patch: lexer.ll: Warn
>>> about non-UTF-8 characters
>>> http://code.google.com/p/lilypond/issues/detail?id=2159#c2
>>>
>>> lexer.ll: Warn about non-UTF-8 characters
>>>
>>> Making the warnings point to the exact bad byte rather than the
>>> enclosing construct would be nice.
>>
>> One way to implement this might be to use the Haskell program for Flex
>> like UTF-8 regular expressions I made:
>> http://xcybercloud.blogspot.com/2009/04/unicode-support-in-flex.html
>>
>> First make rules for the Unicode characters you want to admit, followed
>> by a '.' rule which picks up single excluded bytes.
>
> The "unicode characters we want to admit" are not single characters, but
> part of things like identifiers, strings and other stuff. Cf.
> <URL:http://codereview.appspot.com/5505090#msg5>
> for a reasoning about the current approach for this patch.
The program translates Unicode character classes into Flex UTF-8 regular
expressions, so you can apply the other Flex regex operators to them to build
identifiers, strings and the other constructs.
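
To illustrate the idea, here is a minimal sketch in Python rather than Flex.
It is not the output of the Haskell program and not the approach taken in the
patch; the names UTF8_CHAR, IDENT and bad_byte_offsets are made up for this
example. It writes the well-formed UTF-8 byte sequences as a byte-level
regular expression, applies an ordinary '+' operator to a character class
expressed in bytes, and uses a one-byte fallback (the role of the '.' rule)
to report the exact offset of each bad byte.

```python
import re

# Well-formed UTF-8 byte sequences (Unicode Standard, Table 3-7),
# written as a byte-level regular expression.
UTF8_CHAR = re.compile(
    rb"[\x00-\x7F]"                         # U+0000..U+007F
    rb"|[\xC2-\xDF][\x80-\xBF]"             # U+0080..U+07FF
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"         # U+0800..U+0FFF
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"  # U+1000..U+CFFF, U+E000..U+FFFF
    rb"|\xED[\x80-\x9F][\x80-\xBF]"         # U+D000..U+D7FF (no surrogates)
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"      # U+10000..U+3FFFF
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"          # U+40000..U+FFFFF
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"      # U+100000..U+10FFFF
)

# A character class expressed in UTF-8 bytes can be combined with the usual
# regex operators. Here: ASCII letters plus the code point range
# U+00C0..U+00FF (bytes C3 80 .. C3 BF), repeated with '+'.
# This range is an arbitrary choice for the example.
IDENT = re.compile(rb"(?:[A-Za-z]|\xC3[\x80-\xBF])+")


def bad_byte_offsets(data):
    """Return the offsets of bytes that do not start a well-formed
    UTF-8 sequence. Each bad byte is skipped individually, which is
    what a trailing single-byte '.' rule would do in a lexer."""
    offsets = []
    pos = 0
    while pos < len(data):
        m = UTF8_CHAR.match(data, pos)
        if m:
            pos = m.end()        # consume one whole character
        else:
            offsets.append(pos)  # report the exact bad byte
            pos += 1             # and resume right after it
    return offsets


if __name__ == "__main__":
    print(bad_byte_offsets("café".encode("utf-8")))  # valid UTF-8
    print(bad_byte_offsets(b"caf\xe9 d"))            # Latin-1 byte in input
    print(IDENT.fullmatch("café".encode("utf-8")) is not None)
```

In a Flex lexer the same alternatives would appear as rule patterns, with the
single-byte rule placed last so that it only fires on bytes that no
well-formed sequence covers.
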
Hans