monotone-devel

Re: [Monotone-devel] iconv diffs [Was: Why is utf8...]


From: Patrick Georgi
Subject: Re: [Monotone-devel] iconv diffs [Was: Why is utf8...]
Date: Sat, 17 Feb 2007 08:45:46 +0100
User-agent: Thunderbird 1.5.0.8 (X11/20061204)

Nathaniel Smith wrote:
> have no idea what's going to happen on, say, OSX or *BSD or Solaris.
For Solaris: it will fail, because it can't find the table you refer to ("ASCII//whatever"), which is non-standard. The same goes for BSD, unless they reimplemented the GNU extension (in which case you'd better watch out for implementation differences).
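One way to cope with a non-GNU iconv rejecting the suffixed name is to try it and fall back. This is a minimal sketch, assuming (as described above) that such an iconv fails in iconv_open() rather than ignoring the suffix. The helper open_with_fallback is a hypothetical name, and "//TRANSLIT" stands in for the "//whatever" extension.

```c
#include <assert.h>
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper: ask for "<tocode><suffix>" (e.g. the GNU-only
 * "ASCII//TRANSLIT") and, if the local iconv rejects that name, fall back to
 * the plain, standard "<tocode>". Returns (iconv_t)-1 if both fail. */
static iconv_t open_with_fallback(const char *tocode, const char *suffix,
                                  const char *fromcode)
{
    char extended[128];
    int n;

    assert(tocode != NULL && suffix != NULL && fromcode != NULL);
    n = snprintf(extended, sizeof extended, "%s%s", tocode, suffix);
    if (n >= 0 && (size_t)n < sizeof extended) {
        iconv_t cd = iconv_open(extended, fromcode);
        if (cd != (iconv_t)-1)
            return cd;                 /* extension understood here */
        /* Typically errno == EINVAL: this iconv doesn't know that name. */
    }
    /* Standard name only: unconvertible input will now fail with EILSEQ. */
    return iconv_open(tocode, fromcode);
}
```

Where the suffix is unknown, this degrades to a strict converter. Unconvertible characters then stop iconv() with EILSEQ, so an error-tolerant conversion loop is still needed.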
> One option is just to write our own "//IGNORE"-style iconv wrapper.
> iconv's normal API is that it does as much work as it can, then it
> tells you where it bombed out.  It's perfectly possible at that point
> to skip ahead a byte or more on the input, stick a question mark in
> the output string, and then try again from there.  Not the most
> efficient thing in the world, but probably a lot easier than trying to
> ship iconv conversion tables.
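The quoted "//IGNORE"-style loop can be sketched as follows. This assumes a POSIX iconv. The name convert_skip_byte is hypothetical, and the sketch treats an incomplete trailing sequence (EINVAL) the same way as an illegal one (EILSEQ). The cast to char * matches the POSIX prototype; some older implementations declare the input argument as const char **, so it may need adjusting there.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Naive "//IGNORE"-style loop: convert as much as possible; when iconv stops
 * at an invalid or unconvertible sequence, skip ONE input byte, emit '?', and
 * retry. Returns the number of output bytes (output is NUL-terminated), or
 * (size_t)-1 if `out` is too small. */
static size_t convert_skip_byte(iconv_t cd, const char *in, size_t in_len,
                                char *out, size_t out_size)
{
    char *inp = (char *)in;   /* POSIX iconv takes char ** for the input */
    size_t in_left = in_len;
    char *outp = out;
    size_t out_left;

    assert(out_size > 0);
    out_left = out_size - 1;  /* reserve room for the terminating NUL */

    while (in_left > 0) {
        if (iconv(cd, &inp, &in_left, &outp, &out_left) != (size_t)-1)
            break;                        /* rest of the input converted */
        if (errno == E2BIG || out_left == 0)
            return (size_t)-1;            /* caller's buffer is too small */
        /* EILSEQ (bad or unconvertible input) or EINVAL (truncated input):
         * skip a single byte and substitute a question mark. */
        inp++;
        in_left--;
        *outp++ = '?';
        out_left--;
    }
    /* Write any sequence needed to return to the initial shift state. */
    if (iconv(cd, NULL, NULL, &outp, &out_left) == (size_t)-1)
        return (size_t)-1;
    *outp = '\0';
    return (size_t)(outp - out);
}
```

With cd = iconv_open("ASCII", "UTF-8"), the input "caf\xc3\xa9" comes out as "caf??". After skipping the first byte of the two-byte é, the loop restarts on its trailing byte, which is illegal on its own and yields a second question mark. This is the multibyte problem raised in the reply that follows.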
"Skip ahead a byte" is troublesome: if your illegal sequence is a multibyte character (or, in some of the obscure encodings, a sequence that changes the state machine), your next character will be wrong or illegal, too.

But skipping a character should be possible:
- Build another iconv descriptor that converts from the input encoding into the input encoding (unless that enables a fast path, which I'm not sure about; an alternative might be some encoding that is the ultimate superset, if such an encoding exists). Push the first unknown byte into it. If that already produces output, discard it (it might be some header sequence) and restart with the same byte in the next step; otherwise continue with the next byte.
- Push byte after byte into it until iconv emits output.
- Skip that many bytes in the input and replace them with a single "?".
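The character-skipping steps above can be sketched like this, assuming a POSIX iconv. The names char_length, convert_skip_char and MAX_CHAR_BYTES are hypothetical. The sketch deviates from the steps in one way: iconv() reports an incomplete sequence at the end of its input with EINVAL instead of buffering it. So rather than pushing single bytes into a persistent state, it resets the probe and offers a growing prefix (1 byte, 2 bytes, ...) until that prefix produces output. It also omits the header-sequence check from the first step, and falls back to skipping one byte when the bytes are invalid even in the input encoding.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

#define MAX_CHAR_BYTES 16   /* hypothetical upper bound on one character */

/* How many bytes form the character at `in`? `probe` converts the input
 * encoding into itself (or into a superset). Returns 1 if no prefix up to
 * MAX_CHAR_BYTES forms a valid character. */
static size_t char_length(iconv_t probe, const char *in, size_t in_left)
{
    char scratch[4 * MAX_CHAR_BYTES];
    size_t k;

    for (k = 1; k <= in_left && k <= MAX_CHAR_BYTES; k++) {
        char *ip = (char *)in;
        size_t il = k;
        char *op = scratch;
        size_t ol = sizeof scratch;

        iconv(probe, NULL, NULL, NULL, NULL);   /* reset to initial state */
        if (iconv(probe, &ip, &il, &op, &ol) != (size_t)-1) {
            if (op != scratch)
                return k;          /* these k bytes produced output */
            continue;              /* consumed silently (shift sequence) */
        }
        if (errno != EINVAL)
            break;                 /* EILSEQ: not valid even in the source */
        /* EINVAL: incomplete sequence, so offer one more byte */
    }
    return 1;
}

/* Like a "//IGNORE" conversion, but replaces each bad CHARACTER with one '?'.
 * Returns output length (NUL-terminated), or (size_t)-1 if `out` is full. */
static size_t convert_skip_char(iconv_t cd, iconv_t probe, const char *in,
                                size_t in_len, char *out, size_t out_size)
{
    char *inp = (char *)in;
    size_t in_left = in_len;
    char *outp = out;
    size_t out_left;

    assert(out_size > 0);
    out_left = out_size - 1;               /* room for the final NUL */

    while (in_left > 0) {
        size_t skip;

        if (iconv(cd, &inp, &in_left, &outp, &out_left) != (size_t)-1)
            break;                         /* rest of the input converted */
        if (errno == E2BIG || out_left == 0)
            return (size_t)-1;             /* output buffer too small */
        /* Invalid or unconvertible character: drop all of its bytes. */
        skip = char_length(probe, inp, in_left);
        inp += skip;
        in_left -= skip;
        *outp++ = '?';
        out_left--;
    }
    if (iconv(cd, NULL, NULL, &outp, &out_left) == (size_t)-1)
        return (size_t)-1;                 /* could not reset shift state */
    *outp = '\0';
    return (size_t)(outp - out);
}
```

Consider cd = iconv_open("ASCII", "UTF-8") and probe = iconv_open("UTF-8", "UTF-8"). The input "caf\xc3\xa9!" becomes "caf?!", with the two-byte é replaced by a single question mark. The approach depends on the probe actually validating rather than copying bytes through on a fast path. On glibc, UTF-8 to UTF-8 conversion is commonly used to validate UTF-8, but other implementations would need checking.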

Not so simple anymore, but IMHO still easier than integrating GNU iconv.


patrick georgi



