On Wed, Sep 15, 2021 at 11:40:31AM -0400, John Snow wrote:
> A few new annoyances. Of note is the new warning for an unspecified
> encoding when opening a text file, which actually does indicate a
> potentially real problem; see
> https://www.python.org/dev/peps/pep-0597/#motivation
>
> Use LC_CTYPE to determine an encoding to use for interpreting QEMU's
> terminal output. Note that Python states: "language code and encoding
> may be None if their values cannot be determined" -- use a platform
> default as a backup.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
> python/qemu/machine/machine.py | 9 ++++++++-
> python/setup.cfg | 1 +
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index a7081b1845..51b6e79a13 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -19,6 +19,7 @@
>
> import errno
> from itertools import chain
> +import locale
> import logging
> import os
> import shutil
> @@ -290,8 +291,14 @@ def get_pid(self) -> Optional[int]:
> return self._subp.pid
>
> def _load_io_log(self) -> None:
> + # Assume that the output encoding of QEMU's terminal output
> + # is defined by our locale. If indeterminate, use a platform default.
> + _, encoding = locale.getlocale()
> + if encoding is None:
> + encoding = locale.getpreferredencoding(do_setlocale=False)
Do we really need this getpreferredencoding ? IIUC, this is a sign
that the application is buggy by not calling
locale.setlocale(locale.LC_ALL, '')
during its main() method, which I think we can just delegate to the
code in question to fix. Missing setlocale will affect everything
they run, so doing workarounds in only 1 place is not worth it IMHO
I genuinely don't know! (And, I try to keep the Python code free from assuming Linux as much as I can help it.)
Python's getlocale documentation states: "language code and encoding may be None if their values cannot be determined."
But it is quiet as to the circumstances under which this may happen. Browsing the cpython source code, (3.9ish):
```
def getlocale(category=LC_CTYPE):
localename = _setlocale(category)
if category == LC_ALL and ';' in localename:
raise TypeError('category LC_ALL is not supported')
return _parse_localename(localename)
```
_setlocale is ultimately a call to (I think) _localemodule.c's PyLocale_setlocale(PyObject *self, PyObject *args) C function.
It calls `result = setlocale(category, locale)` where the category is going to be LC_CTYPE, so this should be equivalent to locale(3) (LC_CTYPE, NULL).
locale(3) says that "The return value is NULL if the request cannot be honored."
Python parses that string according to _parse_localename, which in turn calls normalize(localename).
Normalization looks quite involved, but has a fallback of returning the string verbatim. If the normalized locale string is "C", we return the tuple (None, None)!
So I figured there was a non-zero chance that we'd see a value of `None` here.
Source code is in cpython/Lib/locale.py and cpython/Modules/_localemodule.c if you want to nose around yourself.
I also have no idea how this will all shake out on Windows, so I decided to add the fallback here just in case. (Does the Python package work on Windows? I don't know, but I avoid assuming it won't EVER run there... Certainly, I have an interest in having the QMP packages I am building work on all platforms.)
Thoughts?