From: John Darrington
Subject: Re: [bug33254 1/6] encoding-guesser: New function encoding_guess_whole_file().
Date: Thu, 12 May 2011 17:19:47 +0000
User-agent: Mutt/1.5.18 (2008-05-17)

These patches look good to me.  However, it's a shame that Latin-1 etc. cannot
be detected.  Other minor issues:

The text "Character Encoding:" sits hard up against the combo box, which looks
a bit awkward.  Either add padding to the hbox or just append a space to the
string.

The "(Auto)" in "Automatically Detect (Auto)" seems somewhat redundant to me.


On Wed, May 11, 2011 at 10:42:02PM -0700, Ben Pfaff wrote:
     This will be used for the first time in an upcoming commit.
      src/libpspp/encoding-guesser.c |   27 +++++++++++++++++++++++++++
      src/libpspp/encoding-guesser.h |    4 ++++
      2 files changed, 31 insertions(+), 0 deletions(-)
     diff --git a/src/libpspp/encoding-guesser.c 
     index 298861e..7d10015 100644
     --- a/src/libpspp/encoding-guesser.c
     +++ b/src/libpspp/encoding-guesser.c
     @@ -283,3 +283,30 @@ encoding_guess_tail_is_utf8 (const void *data, size_t 
                : is_all_utf8_text (data, n));
     +/* Attempts to guess the encoding of a text file based on ENCODING, an encoding
     +   name in one of the forms described at the top of encoding-guesser.h, and the
     +   SIZE bytes in DATA, which contains the entire contents of the file.  Returns
     +   the guessed encoding, which might be ENCODING itself or a suffix of it or a
     +   statically allocated string.
     +   Encoding autodetection only takes place if ENCODING actually specifies
     +   autodetection.  See encoding-guesser.h for details. */
     +const char *
     +encoding_guess_whole_file (const char *encoding, const void *text, size_t 
     +  const char *guess;
     +  guess = encoding_guess_head_encoding (encoding, text, size);
     +  if (!strcmp (guess, "ASCII") && encoding_guess_encoding_is_auto 
     +    {
     +      size_t ofs = encoding_guess_count_ascii (text, size);
     +      if (ofs < size)
     +        return encoding_guess_tail_encoding (encoding,
     +                                             (const char *) text + ofs,
     +                                             size - ofs);
     +      else
     +        return encoding_guess_parse_encoding (encoding);
     +    }
     +  else
     +    return guess;
     diff --git a/src/libpspp/encoding-guesser.h 
     index 2ec2fee..0a7d1f9 100644
     --- a/src/libpspp/encoding-guesser.h
     +++ b/src/libpspp/encoding-guesser.h
     @@ -115,6 +115,10 @@ bool encoding_guess_tail_is_utf8 (const void *, 
      const char *encoding_guess_tail_encoding (const char *encoding,
                                                const void *, size_t);
     +/* Guessing from entire file contents. */
     +const char *encoding_guess_whole_file (const char *encoding,
     +                                       const void *, size_t);
      /* Returns true if C is a byte that might appear in an ASCII text file,
         false otherwise. */
      static inline bool
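
For readers following the thread, the whole-file strategy in the quoted patch
can be sketched roughly as below.  This is a simplified, standalone
illustration and not the PSPP code: count_ascii(), is_utf8() and
guess_whole_file() are hypothetical names, the head check (byte-order marks
and the like) is left out, and the UTF-8 test does not reject overlong forms
or surrogates.  Returning the fallback for a pure-ASCII file is also an
assumption.  The sketch suggests why Latin-1 cannot be positively detected
this way: practically any byte sequence is acceptable Latin-1, so a validity
check can only rule UTF-8 in or out and then fall back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of leading bytes in DATA
   that are 7-bit ASCII. */
static size_t
count_ascii (const unsigned char *data, size_t n)
{
  size_t i = 0;
  while (i < n && data[i] < 0x80)
    i++;
  return i;
}

/* Hypothetical helper: returns true if the N bytes in DATA form
   structurally valid UTF-8 sequences.  Simplified: overlong forms and
   surrogates are not rejected. */
static bool
is_utf8 (const unsigned char *data, size_t n)
{
  size_t i = 0;
  while (i < n)
    {
      unsigned char c = data[i];
      size_t len = (c < 0x80 ? 1
                    : (c & 0xe0) == 0xc0 ? 2
                    : (c & 0xf0) == 0xe0 ? 3
                    : (c & 0xf8) == 0xf0 ? 4
                    : 0);
      if (len == 0 || len > n - i)
        return false;
      for (size_t j = 1; j < len; j++)
        if ((data[i + j] & 0xc0) != 0x80)
          return false;
      i += len;
    }
  return true;
}

/* Sketch of a whole-file guess: skip the ASCII prefix, then decide
   between UTF-8 and FALLBACK based on the first non-ASCII data.
   FALLBACK is a caller-supplied encoding name (an assumption of this
   sketch, standing in for the fallback part of ENCODING). */
static const char *
guess_whole_file (const char *fallback, const void *text, size_t size)
{
  const unsigned char *p = text;
  size_t ofs = count_ascii (p, size);
  if (ofs == size)
    return fallback;            /* Nothing but ASCII: no evidence. */
  return is_utf8 (p + ofs, size - ofs) ? "UTF-8" : fallback;
}
```

With a fallback of "ISO-8859-1", a file containing the two-byte sequence
C3 A9 would be reported as UTF-8, while one containing a lone E9 byte would
drop back to the fallback without any positive evidence for it.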

PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See or any PGP keyserver for public key.
