bug-gettext

Re: autoconf test for finding UTF-8 locale?


From: Werner LEMBERG
Subject: Re: autoconf test for finding UTF-8 locale?
Date: Mon, 21 Nov 2022 12:20:01 +0000 (UTC)

>> I'm searching for an autoconf test that checks whether a 'neutral'
>> locale with UTF-8 encoding is available.  [...]

> You can do so in a way similar to the Gnulib-provided macros
>   gt_LOCALE_FR_UTF8 [1]
>   gt_LOCALE_TR_UTF8 [2]
> or the gettext internal macro
>   gt_LOCALE_DE_UTF8 [3]
> 
> E.g. replace 'French_France'/'Turkish_Turkey'/'German_Germany' with
> 'English_United States'.

Thanks!  I have attached my current attempt; it is completely untested,
and I'm fairly sure that it won't work as is.

Parts marked with 'XXX' are places where I am completely clueless what
to do.

Finally, I wonder how this could be tested on various platforms.  Is
there an OS 'farm' to which a configure script could be sent and that
collects the results from all of its machines?


    Werner
dnl locale-c-utf8.m4   -*-shell-script-*-

dnl Copyright (C) 2003, 2005-2018, 2022 Free Software Foundation, Inc.
dnl
dnl This file is free software; the Free Software Foundation
dnl gives unlimited permission to copy and/or distribute it,
dnl with or without modifications, as long as this notice is preserved.

dnl From Bruno Haible and Werner Lemberg.

dnl Find a neutral locale with UTF-8 encoding, such as 'C.UTF-8'.
dnl This file is based on `locale-de.m4` from 'gnulib'.
AC_DEFUN([LOCALE_C_UTF8],
[
  AC_REQUIRE([AC_CANONICAL_HOST])
  AC_REQUIRE([AM_LANGINFO_CODESET])
  AC_CACHE_CHECK([for a 'C.UTF-8' locale], [ac_cv_locale_c_utf8], [
    AC_LANG_CONFTEST([AC_LANG_SOURCE([[
#include <locale.h>
#if HAVE_LANGINFO_CODESET
# include <langinfo.h>
#endif
#include <stdlib.h>
#include <string.h>
int main () {
  /* On BeOS and Haiku, locales are not implemented in libc.  Rather, libintl
     imitates locale dependent behaviour by looking at the environment
     variables, and all locales use the UTF-8 encoding.  */
#if !(defined __BEOS__ || defined __HAIKU__)
  /* Check whether the given locale name is recognized by the system.  */
# if defined _WIN32 && !defined __CYGWIN__
  /* On native Windows, setlocale(category, "") looks at the system settings,
     not at the environment variables.  Also, when an encoding suffix such
     as ".65001" or ".54936" is specified, it succeeds but sets the LC_CTYPE
     category of the locale to "C".  */
  if (setlocale (LC_ALL, getenv ("LC_ALL")) == NULL
      || strcmp (setlocale (LC_CTYPE, NULL), "C") == 0)
    return 1;
# else
  if (setlocale (LC_ALL, "") == NULL) return 1;
# endif


  /* Check whether nl_langinfo(CODESET) is nonempty and not "ASCII" or "646".
     On Mac OS X 10.3.5 (Darwin 7.5) in the de_DE locale, nl_langinfo(CODESET)
     is empty, and the behaviour of Tcl 8.4 in this locale is not useful.
     On OpenBSD 4.0, when an unsupported locale is specified, setlocale()
     succeeds but then nl_langinfo(CODESET) is "646". In this situation,
     some unit tests fail.  */
# if 0 && HAVE_LANGINFO_CODESET
  /* XXX: What should this look like for 'C.utf8' or 'en_US.UTF-8'? */
  {
    const char *cs = nl_langinfo (CODESET);
    if (cs[0] == '\0' || strcmp (cs, "ASCII") == 0 || strcmp (cs, "646") == 0)
      return 1;
  }
# endif


# ifdef __CYGWIN__
  /* On Cygwin, avoid locale names without encoding suffix, because the
     locale_charset() function relies on the encoding suffix.  Note that
     LC_ALL is set on the command line.  */
  if (strchr (getenv ("LC_ALL"), '.') == NULL) return 1;
# endif


  /* XXX How can I test that UTF-8 encoding actually works? */


#endif
  return 0;
}
      ]])])
    if AC_TRY_EVAL([ac_link]) && test -s conftest$ac_exeext; then
      case "$host_os" in
        # Handle native Windows specially, because there setlocale() interprets
        # "ar" as "Arabic" or "Arabic_Saudi Arabia.1256",
        # "fr" or "fra" as "French" or "French_France.1252",
        # "ge"(!) or "deu"(!) as "German" or "German_Germany.1252",
        # "ja" as "Japanese" or "Japanese_Japan.932",
        # and similar.
        mingw*)
          if (LC_ALL=.65001 \
              LC_TIME= \
              LC_CTYPE= \
              ./conftest; exit) 2>/dev/null; then
            ac_cv_locale_c_utf8=.65001
          # Test for the hypothetical native Windows locale name.
          # XXX Shouldn't this rather be 'English_US.65001'?
          elif (LC_ALL="English_United States.65001" \
                LC_TIME= \
                LC_CTYPE= \
                ./conftest; exit) 2>/dev/null; then
            ac_cv_locale_c_utf8="English_United States.65001"
          else
            # None found.
            ac_cv_locale_c_utf8=none
          fi
          ;;
        *)
          if (LC_ALL=C \
              ./conftest; exit) 2>/dev/null; then
            ac_cv_locale_c_utf8=C
          # Setting LC_ALL is not enough. Need to set LC_TIME to empty, because
          # otherwise on Mac OS X 10.3.5 the LC_TIME=C from the beginning of the
          # configure script would override the LC_ALL setting. Likewise for
          # LC_CTYPE, which is also set at the beginning of the configure
          # script.
          # Test for the usual locale name.
          elif (LC_ALL=en_US \
                LC_TIME= \
                LC_CTYPE= \
                ./conftest; exit) 2>/dev/null; then
            ac_cv_locale_c_utf8=en_US
          else
            # Test for the locale name with explicit encoding suffix.
            if (LC_ALL=C.UTF-8 \
                LC_TIME= \
                LC_CTYPE= \
                ./conftest; exit) 2>/dev/null; then
              ac_cv_locale_c_utf8=C.UTF-8
            elif (LC_ALL=en_US.UTF-8 \
                  LC_TIME= \
                  LC_CTYPE= \
                  ./conftest; exit) 2>/dev/null; then
              ac_cv_locale_c_utf8=en_US.UTF-8
            else
              # Test for the Solaris 7 locale name.
              if (LC_ALL=en.UTF-8 \
                  LC_TIME= \
                  LC_CTYPE= \
                  ./conftest; exit) 2>/dev/null; then
                ac_cv_locale_c_utf8=en.UTF-8
              else
                # None found.
                ac_cv_locale_c_utf8=none
              fi
            fi
          fi
          ;;
      esac
    fi
    rm -fr conftest*
  ])
  LOCALE_C_UTF8=$ac_cv_locale_c_utf8
  AC_SUBST([LOCALE_C_UTF8])
])
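
Regarding the two 'XXX' questions inside the C test program: one possible
approach, sketched here under assumptions and just as untested on real
platforms as the macro itself, is to combine a codeset-name check with a
decoding check.  The helper names is_utf8_codeset, locale_decodes_utf8 and
current_locale_is_utf8 are hypothetical and not part of gnulib or gettext;
only nl_langinfo() and mbrtowc() are standard functions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#if HAVE_LANGINFO_CODESET
# include <langinfo.h>
#endif

/* Hypothetical helper: return 1 if CS names the UTF-8 codeset, ignoring
   case and hyphens ("UTF-8", "utf8", "UTF8", ...), else 0.  */
static int
is_utf8_codeset (const char *cs)
{
  const char *ref = "UTF8";

  if (cs == NULL)
    return 0;
  for (; *cs != '\0'; cs++)
    {
      if (*cs == '-')
        continue;
      if (toupper ((unsigned char) *cs) != *ref)
        return 0;
      ref++;
    }
  return *ref == '\0';
}

/* Hypothetical helper: return 1 if the current LC_CTYPE locale consumes
   the two-byte UTF-8 sequence for U+00E4 as one multibyte character.
   Some non-UTF-8 double-byte encodings may also pass this, so it is only
   meaningful together with the codeset-name check.  */
static int
locale_decodes_utf8 (void)
{
  static const char input[] = "\xc3\xa4";
  mbstate_t state;
  wchar_t wc;

  memset (&state, 0, sizeof state);
  return mbrtowc (&wc, input, 2, &state) == 2;
}

/* Hypothetical helper: combine both checks for the current locale.  */
static int
current_locale_is_utf8 (void)
{
#if HAVE_LANGINFO_CODESET
  if (!is_utf8_codeset (nl_langinfo (CODESET)))
    return 0;
#endif
  return locale_decodes_utf8 ();
}
```

In the test program this could be called right after setlocale() has
succeeded, for example as 'if (!current_locale_is_utf8 ()) return 1;'.
With such a check in place, the plain 'C' and 'en_US' candidates would
presumably only be accepted on systems where they really use UTF-8.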
