pspp-users

Re: status of UTF-8 support?


From: Rob Messer
Subject: Re: status of UTF-8 support?
Date: Tue, 26 Oct 2010 09:24:00 -0700

John,

Ok, we will try it with your latest changes and then reply back and let you 
know how it goes.  Thanks much,

Rob

On Oct 26, 2010, at 8:46 AM, John Darrington wrote:

> On Tue, Oct 26, 2010 at 10:40:29AM +0000, John Darrington wrote:
>     On Mon, Oct 25, 2010 at 07:51:56PM -0700, Ben Pfaff wrote:
>          Rob Messer <address@hidden> writes:
> 
>> What is the current status of support for including UTF-8 characters
>> in PSPP output?  My company is using the Perl interface to import
>> survey data into PSPP, and generally it works very well.  However,
>> we've never been able to use it when our dataset includes labels and
>> records in languages like Japanese and Chinese.  I know there have
>> been some recent updates to PSPP, so last week we upgraded to 0.7.5
>> and tried that, but it still didn't seem to work for our test Japanese
>> and Chinese data.  Is it supposed to be supported?  And if not in
>> 0.7.5, perhaps in the latest development snapshot?  Thanks,
> 
>          John Darrington and I talked about this briefly in IRC this
>          morning.  We didn't know a reason that UTF-8 shouldn't work.
> 
>     I had another look today and have to modify my opinion.  Currently,
>     non-ASCII characters will not work with the Perl module.   :(
> 
> 
> OK.   I've just pushed a quick fix which should address this problem.  I
> tested this new version writing UTF-8 strings in:
> 
> Variable Names;
> Variable Labels;
> Value Labels (both the key and the value);
> Values of string variables.
> 
> 
> So now, assuming you have a string variable defined, you can write a string
> value using a literal UTF-8 string like:
> 
> # German word for "Cylindrical concrete billboard"
> $sysfile->append_case ( ["Litfaßsäule"]);
> 
> or using escape sequences like:
> 
> # The Chinese representation of the name of the city of Taipei
> $sysfile->append_case ( ["\x{53F0}\x{5317}"]);
> 
> 
> However, in most real-life uses, I imagine you will not be using string
> literals, but will be receiving the data from some other Perl module.  In
> this case, what needs to be done is:
> 
> use Encode;
> 
> $s = get_string_data_from_some_source ();
> $enc = get_encoding_of_string_data ();
> 
> $sysfile->append_case ([decode ($enc, $s)]);
> 
> 
> As always with i18n, things are never without caveats.  In particular:
> 
> * You must remember that a variable's "width" is the maximum number of BYTES
>  (not characters).
> 
> 
> * For rather convoluted reasons, which you need to read "man Encode" in order
>  to understand, the code ...
> 
>  use utf8;
>  use Encode;
> 
>  $sysfile->append_case ([decode ('UTF-8', "some-utf8-encoded-string")]);
> 
>  .... won't work.  Instead,  you would have to write:
> 
>  $sysfile->append_case ([decode ('UTF-8',
>                                  encode ('UTF-8', "some-utf8-encoded-string"))]);
> 
> 
> I haven't had a chance to look at reading non-ASCII data from a .sav file
> into Perl.
> 
> J'
> 
> 
> 
> 
> -- 
> PGP Public key ID: 1024D/2DE827B3 
> fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
> See http://pgp.mit.edu or any PGP keyserver for public key.
> 
> 
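On the byte-width caveat quoted above: a rough way to check whether a string
fits a variable's width before appending it is to compare against its encoded
byte length rather than its character count.  This is only a sketch using the
core Encode module; fits_width is a hypothetical helper, not something the
PSPP Perl module provides, and it assumes the data is stored as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of the PSPP Perl module): true if the
# character string $str needs at most $width bytes when encoded as UTF-8.
sub fits_width {
    my ($str, $width) = @_;
    my $bytes = encode('UTF-8', $str);    # character string -> byte string
    return length($bytes) <= $width;      # length() of a byte string counts bytes
}

my $city = "\x{53F0}\x{5317}";            # two characters, three bytes each in UTF-8
printf "%d characters, %d bytes\n",
    length($city), length(encode('UTF-8', $city));   # prints "2 characters, 6 bytes"
```

So a string variable meant to hold that two-character value would need a width
of at least 6, not 2.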



