guile-user

Re: iconv or something like that


From: Konrad Makowski
Subject: Re: iconv or something like that
Date: Sat, 25 Oct 2014 10:24:34 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

OK, the problem is resolved. It was not related to MySQL or the locale; it was my own mistake.

Konrad

On 25.10.2014 at 09:03, Konrad Makowski wrote:
I'm using MySQL. I found that if I send the query "SET NAMES utf8" or "SET NAMES utf8 COLLATE utf8_general_ci" to the database (in a terminal, for example), MySQL converts the charset of the returned data for me. But if I do the same in my Guile script, it reports an error:
In ice-9/boot-9.scm:
 157: 9 [catch #t #<catch-closure 1cff400> ...]
In unknown file:
   ?: 8 [apply-smob/1 #<catch-closure 1cff400>]
In ice-9/boot-9.scm:
  63: 7 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
 432: 6 [eval # #]
 432: 5 [eval # #]
 387: 4 [eval # #]
 387: 3 [eval # #]
 387: 2 [eval # #]
 387: 1 [eval # #]
In unknown file:
   ?: 0 [utf8->string #vu8(80 65 87 69 163)]

ERROR: In procedure utf8->string:
ERROR: Throw to key `decoding-error' with args `("scm_from_stringn" "input locale conversion error" 84 #vu8(80 65 87 69 163))'.

My locale settings are:
LANG=pl_PL.UTF-8
LANGUAGE=pl:en
LC_CTYPE="pl_PL.UTF-8"
LC_NUMERIC="pl_PL.UTF-8"
LC_TIME="pl_PL.UTF-8"
LC_COLLATE="pl_PL.UTF-8"
LC_MONETARY="pl_PL.UTF-8"
LC_MESSAGES="pl_PL.UTF-8"
LC_PAPER="pl_PL.UTF-8"
LC_NAME="pl_PL.UTF-8"
LC_ADDRESS="pl_PL.UTF-8"
LC_TELEPHONE="pl_PL.UTF-8"
LC_MEASUREMENT="pl_PL.UTF-8"
LC_IDENTIFICATION="pl_PL.UTF-8"
LC_ALL=pl_PL.UTF-8

Any idea?

Konrad

On 23.10.2014 at 20:00, Mark H Weaver wrote:
Konrad Makowski <address@hidden> writes:
> Is there any solution to convert charset from one encoding to another?
Yes, but character encodings are only relevant when converting between a
sequence of _bytes_ (a bytevector), and a sequence of _characters_ [*]
(a string).  These conversions happen implicitly while performing I/O,
converting Scheme strings to/from C, etc.

[*] More precisely, Scheme strings are sequences of Unicode code points.

It doesn't make sense to talk about the encoding of a Scheme string, or
to convert a Scheme string from one encoding to another, because they
are not byte sequences.

It sounds like you already have a Scheme string that was incorrectly
decoded from bytes, and are asking how to fix it up. Unfortunately,
this won't work, because many valid ISO-8859-2 byte sequences are not
valid UTF-8, and will therefore lead to decoding errors.
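
The bytes-versus-characters distinction can be sketched with the bytevector from the backtrace quoted above. This is only an illustration, not something taken from guile-dbi: it assumes a Guile version that ships the (ice-9 iconv) module, and the variable names are made up for the example.

```scheme
(use-modules (rnrs bytevectors)   ; u8-list->bytevector, utf8->string, string->utf8
             (ice-9 iconv))       ; bytevector->string, string->bytevector

;; The five bytes shown in the backtrace.
(define raw (u8-list->bytevector '(80 65 87 69 163)))

;; Byte 163 (#xA3) cannot start a UTF-8 sequence, so decoding these bytes
;; as UTF-8 signals an error, as in the backtrace. Catch it and record #f.
(define decodes-as-utf8?
  (catch #t
    (lambda () (utf8->string raw) #t)
    (lambda args #f)))

;; Decoding the same bytes as ISO-8859-2 gives a five-character string.
(define name (bytevector->string raw "ISO-8859-2"))

;; The string itself has no encoding. Encoding it again produces bytes,
;; and the bytes differ depending on which encoding is requested.
(define name-utf8   (string->utf8 name))
(define name-latin2 (string->bytevector name "ISO-8859-2"))
```

Here `name` ends in U+0141 (LATIN CAPITAL LETTER L WITH STROKE), `name-utf8` is six bytes long because that character takes two bytes in UTF-8, and `name-latin2` is equal to `raw`. The conversion is only possible because the original bytes are still available; once a decode has failed or produced the wrong characters, there is nothing reliable left to convert.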

> I have a database in ISO-8859-2, but my script runs in UTF-8. I use the dbi module.
Having looked at the guile-dbi source code, I see that it always uses
the current locale encoding when talking to databases. Specifically, it
always uses 'scm_from_locale_string' and 'scm_to_locale_string'.  For
your purposes, you'd like it to use 'scm_from_stringn' and
'scm_to_stringn' instead, with "ISO-8859-2" as the 'encoding' argument.

My knowledge of modern databases is limited, so I'm not sure how this
problem is normally dealt with.  It seems to me that, ideally, strings
in databases should be sequences of Unicode code points, rather than
sequences of bytes.  If that were the case, then this problem wouldn't
arise.

It would be good if someone with more knowledge of databases would chime
in here.

In the meantime, I can see a few possible solutions/workarounds:

* Enhance guile-dbi by adding an 'encoding' field to its database
   handles and a new API procedure to set it, and use that field in all
   the appropriate places.  This only makes sense if database strings are
   conceptually byte sequences; otherwise it should probably be fixed in
   some other way.

* Hack your local copy of guile-dbi to use 'scm_from_stringn' and
   'scm_to_stringn' with a hard-coded "ISO-8859-2" in the appropriate
   places.

* Use 'setlocale' to set an ISO-8859-2 locale temporarily while
   performing database queries.
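
The last workaround could look roughly like the sketch below. `call-with-locale` is a hypothetical helper written for this example, not part of Guile or guile-dbi. The locale name in the usage comment is an assumption and must actually be installed on the system, otherwise 'setlocale' signals an error.

```scheme
;; Hypothetical helper: run THUNK with LOCALE installed for all categories,
;; then restore whatever locale was active before, even on non-local exit.
(define (call-with-locale locale thunk)
  (let ((saved (setlocale LC_ALL)))        ; one argument: query current locale
    (dynamic-wind
      (lambda () (setlocale LC_ALL locale))
      thunk
      (lambda () (setlocale LC_ALL saved)))))

;; Assumed usage, where `db` is an already opened guile-dbi handle and the
;; query text is a placeholder:
;;
;; (call-with-locale "pl_PL.ISO-8859-2"
;;   (lambda ()
;;     (dbi-query db "SELECT ...")
;;     (dbi-get_row db)))
```

Note that 'setlocale' changes the locale for the whole process, so this is only reasonable in a single-threaded script, and any other locale-dependent work done inside the thunk is affected as well.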

Which database are you using?

      Mark







