pspp-dev

Re: Filename Encoding


From: Ben Pfaff
Subject: Re: Filename Encoding
Date: Wed, 11 Dec 2013 07:38:46 -0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Wed, Dec 11, 2013 at 09:05:16AM +0100, John Darrington wrote:
> On Tue, Dec 10, 2013 at 12:38:04PM -0800, Ben Pfaff wrote:
> 
>      I understand now.  However, in other places in PSPP, and in particular
>      in syntax and the output engine, we tend to convert everything we
>      receive externally into UTF-8 for internal processing, and then convert
>      back to other encodings as necessary.  It would be convenient for some
>      purposes to do this for filenames also (e.g. to include file names in
>      output), and it would avoid needing to keep around two pieces of
>      information (file name plus encoding) when one (UTF-8 file name) would
>      do.  
> 
>      Do you think that storing file name plus encoding is superior?
> 
> Both solutions have advantages and disadvantages.
> 
> The converting-all-filenames-to-utf8 solution has two disadvantages that I
> can see:
> 
> *.  Unnecessary recoding - often it will be necessary to convert from
>     "filename encoding" to utf8 and then, back to "filename encoding".

Is the concern here about performance, or something else?  I doubt that
there is a real performance problem with doing one or two conversions of
a file name, once per file open.  Also, on GNU/Linux the filename
encoding is usually UTF-8 anyway, so there is typically no actual
conversion.
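
To make the cost under discussion concrete, here is a minimal sketch of
the round trip using POSIX iconv().  The helper name recode_name is
hypothetical and is not a PSPP function, and the buffer sizing assumes
stateless 8-bit or UTF-8 encodings.  When both encodings are the same
it only copies the string, which is the "no actual conversion" case.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of PSPP: converts the NUL-terminated
   string IN from encoding FROM to encoding TO.  Returns a malloc()'d
   string that the caller must free, or NULL on failure. */
static char *
recode_name (const char *to, const char *from, const char *in)
{
  assert (to != NULL && from != NULL && in != NULL);

  /* Same encoding on both sides: nothing to convert, just copy. */
  if (!strcmp (to, from))
    {
      char *copy = malloc (strlen (in) + 1);
      if (copy != NULL)
        strcpy (copy, in);
      return copy;
    }

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return NULL;

  /* Assumption: four output bytes per input byte is enough for the
     8-bit and UTF-8 encodings this sketch is meant for. */
  size_t in_left = strlen (in);
  size_t out_size = in_left * 4 + 1;
  char *out = malloc (out_size);
  if (out == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_p = (char *) in;   /* iconv() takes a non-const pointer. */
  char *out_p = out;
  size_t out_left = out_size - 1;
  size_t rc = iconv (cd, &in_p, &in_left, &out_p, &out_left);
  iconv_close (cd);
  if (rc == (size_t) -1)
    {
      free (out);
      return NULL;
    }

  *out_p = '\0';
  return out;
}
```

A caller would convert once on the way in, for example
recode_name ("UTF-8", "ISO-8859-1", name), and once more on the way out
just before opening the file, so the work is proportional to the length
of one file name per open.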

> *.  The bigger disadvantage, is that it will be very easy simply to forget
>     to do the necessary conversion.  If the programmer forgets - the
>     compiler won't complain - it is just a char * - Passing a struct
>     file_handle * one cannot forget - there'll be a compiler error.

That's true.  In data, we use uint8_t instead of char to remind
ourselves that the data is in the dictionary encoding.  We could use
int8_t for UTF-8 data, but that doesn't match either libunistring or
glib practice so it would probably cause a lot of friction at
interfaces.
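
As an illustration of the type-safety argument, the following is a
minimal sketch of a file name carried together with its encoding.
struct encoded_name and its functions are hypothetical; they are not
PSPP's struct file_handle.  The point is only that a function taking
the wrapper cannot silently be handed a bare char *: the compiler has
to diagnose the mismatch (as an error or a warning, depending on the
compiler and its version).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, not PSPP's struct file_handle: a file name
   bundled with the name of the encoding its bytes are in. */
struct encoded_name
  {
    char *bytes;      /* File name bytes, NUL-terminated. */
    char *encoding;   /* E.g. "UTF-8" or "ISO-8859-1". */
  };

/* Returns a malloc()'d copy of S, or NULL if allocation fails. */
static char *
dup_string (const char *s)
{
  char *copy = malloc (strlen (s) + 1);
  if (copy != NULL)
    strcpy (copy, s);
  return copy;
}

/* Creates a name from BYTES, recording that they are in ENCODING.
   Either member is NULL if allocation failed. */
static struct encoded_name
encoded_name_create (const char *bytes, const char *encoding)
{
  struct encoded_name name;
  name.bytes = dup_string (bytes);
  name.encoding = dup_string (encoding);
  return name;
}

/* Frees NAME's storage and clears its members. */
static void
encoded_name_destroy (struct encoded_name *name)
{
  free (name->bytes);
  free (name->encoding);
  name->bytes = NULL;
  name->encoding = NULL;
}

/* Returns nonzero if NAME is recorded as UTF-8.  Because the parameter
   is the wrapper type, a call such as encoded_name_is_utf8 ("data.sav")
   does not pass unnoticed: char * is not compatible with
   struct encoded_name *. */
static int
encoded_name_is_utf8 (const struct encoded_name *name)
{
  assert (name != NULL && name->encoding != NULL);
  return strcmp (name->encoding, "UTF-8") == 0;
}
```

The trade-off is the one raised above: the wrapper carries two pieces of
information and every interface has to accept it, whereas a plain UTF-8
char * is lighter but relies on the programmer remembering to convert.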


