pspp-dev

Re: Filename Encoding


From: Ben Pfaff
Subject: Re: Filename Encoding
Date: Tue, 10 Dec 2013 12:38:04 -0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Dec 10, 2013 at 07:24:39PM +0100, John Darrington wrote:
> On Mon, Dec 09, 2013 at 08:30:17PM -0800, Ben Pfaff wrote:
>      On Sun, Dec 08, 2013 at 07:43:25PM +0100, John Darrington wrote:
>      > Thanks to Harry's feedback, I've started to understand the issues
>      > concerning bug #33255.
>      
>      Can you explain the issue as you see it?  I don't see any recent
>      substantive comments in that bug report.
> 
> 
> There are two issues when dealing with filenames on Windows.
> 
> Firstly, if not using ASCII, it is unsafe to open a file using fopen.
> Instead, one must convert the filename to a wchar_t (UTF-16) and use the
> special Windows function _wfopen.  However, if you want to convert a string
> to UTF-16, you have to know what encoding the string currently is.
> 
> Now the filename for (say) a GET FILE command can in PSPP come from a
> variety of sources:
> 
> * From a syntax file.  In which case the filename is presumably in the
>   encoding of the file.
> 
> * From the command line.  In which case the encoding is probably that of the
>   current locale.
> 
> * From the GUI.  In which case, the encoding is either UTF-8 or the value of
>   G_FILENAME_ENCODING.  See
>   https://developer.gnome.org/glib/stable/glib-running.html
> 
> 
> Since there are many different sources of a filename, unless we know the
> source, we cannot know the encoding.
> 
> So my proposed solution is to save the encoding and the filename together in
> a struct, and pass them around in that struct.  Fortunately, we already have
> such a struct which is in most cases used to open/close/access files, viz.
> struct file_handle.
> 
> I hope this makes sense.

I understand now.  However, in other places in PSPP, and in particular
in syntax and the output engine, we tend to convert everything we
receive externally into UTF-8 for internal processing, and then convert
back to other encodings as necessary.  It would be convenient for some
purposes to do this for filenames also (e.g. to include file names in
output), and it would avoid needing to keep around two pieces of
information (file name plus encoding) when one (UTF-8 file name) would
do.  Do you think that storing file name plus encoding is superior?
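
To make the UTF-8 alternative concrete, here is a rough sketch of what the
opening side could look like if file names were always held as UTF-8
internally.  This is only an illustration, not code from PSPP: the names
utf8_to_utf16 and open_utf8_filename are hypothetical, the converter is
hand-written rather than taken from any library, and the non-Windows branch
simply assumes the file system accepts the UTF-8 bytes as they are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#ifdef _WIN32
#include <wchar.h>
#endif

/* Hypothetical helper: decodes the UTF-8 string S into UTF-16 code units
   in OUT, which has room for OUT_CAP units including the terminator.
   Returns the number of units written, not counting the terminator, or
   (size_t) -1 if S is malformed or OUT is too small.  Overlong encodings
   are not rejected; a production version should reject them. */
static size_t
utf8_to_utf16 (const char *s, uint16_t *out, size_t out_cap)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t n = 0;

  assert (s != NULL && out != NULL);
  if (out_cap == 0)
    return (size_t) -1;

  while (*p != '\0')
    {
      uint32_t cp;
      int extra;

      /* The lead byte determines how many continuation bytes follow. */
      if (*p < 0x80)                { cp = *p;        extra = 0; }
      else if ((*p & 0xe0) == 0xc0) { cp = *p & 0x1f; extra = 1; }
      else if ((*p & 0xf0) == 0xe0) { cp = *p & 0x0f; extra = 2; }
      else if ((*p & 0xf8) == 0xf0) { cp = *p & 0x07; extra = 3; }
      else
        return (size_t) -1;
      p++;

      for (; extra > 0; extra--, p++)
        {
          /* Also catches a string that ends in the middle of a sequence. */
          if ((*p & 0xc0) != 0x80)
            return (size_t) -1;
          cp = (cp << 6) | (*p & 0x3f);
        }

      if (cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
        return (size_t) -1;

      if (cp < 0x10000)
        {
          if (n + 1 >= out_cap)
            return (size_t) -1;
          out[n++] = (uint16_t) cp;
        }
      else
        {
          /* Code points above the BMP become a surrogate pair. */
          if (n + 2 >= out_cap)
            return (size_t) -1;
          cp -= 0x10000;
          out[n++] = (uint16_t) (0xd800 | (cp >> 10));
          out[n++] = (uint16_t) (0xdc00 | (cp & 0x3ff));
        }
    }
  out[n] = 0;
  return n;
}

/* Hypothetical wrapper: opens a file whose name is held as UTF-8. */
static FILE *
open_utf8_filename (const char *utf8_name, const char *mode)
{
#ifdef _WIN32
  uint16_t wname[1024];
  wchar_t wmode[16];
  size_t i;

  if (utf8_to_utf16 (utf8_name, wname, 1024) == (size_t) -1)
    return NULL;
  /* MODE is plain ASCII ("r", "wb", ...), so widen it byte by byte. */
  for (i = 0; mode[i] != '\0' && i < 15; i++)
    wmode[i] = (wchar_t) (unsigned char) mode[i];
  wmode[i] = L'\0';
  /* wchar_t is a 16-bit type on Windows, so the units can be passed on. */
  return _wfopen ((const wchar_t *) wname, wmode);
#else
  /* Assumption: the file system takes UTF-8 bytes unchanged. */
  return fopen (utf8_name, mode);
#endif
}
```

With something like this, the source encoding only matters at the point
where a file name enters PSPP (syntax file, command line, or GUI); everything
downstream, including output, would see a single UTF-8 string.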


