emacs-devel

From: Alexis
Subject: Re: "args-out-of-range" error when using data from external process on Windows
Date: Thu, 18 Apr 2024 17:07:25 +1000
User-agent: mu4e 1.12.4; emacs 29.3


Eli Zaretskii <eliz@gnu.org> writes:

Crystal ball says the package assumes UTF-8 encoding of the text from the sub-process, which is generally not what happens on Windows. Or maybe the package assumes that UTF-8 text from a sub-process will necessarily be decoded as UTF-8, which again can fail if the default coding-systems are not UTF-8 (which happens on Windows). The upshot is that the Lisp code expects some number of characters, but gets a different number of characters instead.

But this is all basically stabbing in the dark, since I have no idea what that package does and what the program whose output it reads does.

Hi Eli,

Thanks for your prompt reply. Sorry that my email wasn't more descriptive and self-contained. I linked to the GitHub issue:

 https://github.com/flexibeast/ebuku/issues/32

as there is already an extended discussion there about this issue, which itself links to a previous issue and discussion:

 https://github.com/flexibeast/ebuku/issues/31

in which the user first reported an "Invalid string for collation" error. That issue was addressed, after some discussion, by setting LC_ALL to the same value the user had set for LANG, i.e. "zh_CN.UTF-8". That left us with issue 32, which is the one I'm asking about here.

Some more background on the software involved:

`buku` provides a command-line interface to an SQLite-based database of Web bookmarks, allowing one to save, delete and search for bookmarks, with each bookmark able to have a comment and tags associated with it.

`Ebuku` is a package that provides an Emacs-based UI for buku. It allows the user to add, edit, remove and search bookmarks without leaving Emacs. It does so by using `call-process` to run `buku` with the appropriate options, receiving the resulting output in a buffer, and then processing the data in that buffer in order to present the relevant results to the user.

ebuku.el has a function:

  (defun ebuku--call-buku (args)
    "Internal function for calling `buku' with list ARGS."
    (unless ebuku-buku-path
      (error "Couldn't find buku: check 'ebuku-buku-path'"))
    (apply #'call-process
           `(,ebuku-buku-path nil t nil
             "--np" "--nc" "--db" ,ebuku-database-path
             ,@args)))

which gets called in several places (e.g. https://github.com/flexibeast/ebuku/blob/c854d128cba8576fe9693c19109b5deafb573e99/ebuku.el#L534) to put buku's output into a temp buffer, which is then 'parsed' for the information to be presented to the user.

In a comment from a couple of days ago, and after having noted in a comment on issue 31:

 https://github.com/flexibeast/ebuku/issues/31#issuecomment-2053557703

that they'd set LANG on their system to "zh_CN.UTF-8", the user wrote (https://github.com/flexibeast/ebuku/issues/32#issuecomment-2058289816):

I set the value with (set-language-environment "UTF-8"). I remember I set up this value because I don't want my files containing Chinese to be encoded by GBK encoding.

Then, in https://github.com/flexibeast/ebuku/issues/32#issuecomment-2058498373, I wrote:

If I remember correctly, the default encoding used by Windows is UTF-16, not UTF-8. So I'm wondering if that's somehow being used to transfer data from the buku process to the Emacs process, regardless of the values of LANG and LC_ALL, and regardless of the encoding of the buku database itself?

to which the user responded:

I think PowerShell will use UTF-16 to encode instead of UTF-8.

Is that correct? Is that the case even though the user has specified "zh_CN.UTF-8"? And if so, why does removing the CRAB emoji from the text being operated on by `string-match` / `match-string` make the issue disappear? Is it perhaps something to do with the code point for the CRAB emoji (U+1F980) being outside the BMP?
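To make the BMP question concrete, here is a small sketch (written for this message, not taken from ebuku; the function name `my-crab-encoding-lengths` is hypothetical) showing how the CRAB emoji is one character inside Emacs but four bytes in both UTF-8 and UTF-16, and how decoding its UTF-8 bytes with a mismatched coding system changes the number of characters that `string-match` / `match-string` would see. Latin-1 is used here only as an arbitrary example of a wrong coding system.

```elisp
;; Hypothetical helper, for illustration only (not part of ebuku).
(defun my-crab-encoding-lengths ()
  "Return a list of lengths for the CRAB emoji (U+1F980).
The elements are: its length as an Emacs string, its length in
bytes as UTF-8 and as UTF-16LE, and the number of characters
obtained by wrongly decoding its UTF-8 bytes as Latin-1."
  (let* ((crab (string #x1F980))   ; > #xFFFF, so outside the BMP
         (utf8 (encode-coding-string crab 'utf-8))
         ;; UTF-16 needs a surrogate pair (2 code units, 4 bytes).
         (utf16 (encode-coding-string crab 'utf-16le)))
    (list (length crab)
          (length utf8)
          (length utf16)
          ;; Each of the 4 UTF-8 bytes becomes a separate character.
          (length (decode-coding-string utf8 'latin-1)))))

(my-crab-encoding-lengths)
;; => (1 4 4 4)
```

So if the bytes buku writes and the coding system Emacs uses to decode them disagree, an emoji outside the BMP is exactly the kind of text where character counts and match positions stop lining up with what the parsing code expects.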

Suggest that you ask the user who reported that to show the actual output of the sub-process (e.g., by running the same command outside of Emacs and redirecting output to a file), and if the output looks correct, examine the Lisp code which processes that output, with an eye on how the text is decoded. For example, if the text from the sub-process is supposed to be UTF-8 encoded, your Lisp code should bind coding-system-for-read to 'utf-8', to make sure it is decoded correctly.

Thanks, I can certainly do that, modulo the question of whether the LANG and LC_ALL variables have any effect on data transferred between the `buku` sub-process and Emacs. But what should I do to handle the more general case of an arbitrary encoding? Do I need a defcustom, with 'reasonable defaults', that the user can set if necessary, and whose value I bind `coding-system-for-read` to?
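One possible shape for such an option, as a minimal sketch: the names `my-buku-coding-system` and `my-call-buku` are hypothetical and not part of ebuku, and the `utf-8` default assumes that buku emits UTF-8, which still needs confirming on the user's system. The essential step is the one suggested above: binding `coding-system-for-read` around `call-process`, which controls how the process output is decoded.

```elisp
;; Hypothetical names, for illustration only (not part of ebuku).
(defcustom my-buku-coding-system 'utf-8
  "Coding system used to decode output from the buku program.
If nil, Emacs chooses the coding system as it does for any other
process output."
  :type '(choice (const :tag "Emacs default" nil)
                 coding-system)
  :group 'external)

(defun my-call-buku (program &rest args)
  "Run PROGRAM with ARGS, inserting its output into the current buffer.
The output is decoded with `my-buku-coding-system' when that is non-nil.
Return whatever `call-process' returns (normally the exit status)."
  ;; Binding `coding-system-for-read' around `call-process' overrides
  ;; the coding system Emacs would otherwise pick for the output.
  (let ((coding-system-for-read my-buku-coding-system))
    (apply #'call-process program nil t nil args)))
```

Setting the option to nil falls back to Emacs's usual selection for process output (via `process-coding-system-alist` and the defaults established by the language environment), so users whose setup already works are unaffected.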

Btw: using UTF-8 by default on MS-Windows is not a very good idea, even with Windows 11 where one can enable UTF-8 support (did they do it, btw?). Windows still doesn't support UTF-8 well, even after the improvements in Windows 11, so the above settings might very well cause trouble. Suggest to ask the user to try the same recipe in "emacs -Q", and if the zh_CN.UTF-8 stuff is set up outside Emacs, to try without it.

As I interpret their comments in the above discussions so far, yes, they had themselves set LANG to "zh_CN.UTF-8" (and yes, as described above, they had definitely used `set-language-environment` to set the language environment to "UTF-8").

I'll certainly take your suggestions back to the user.

Thanks again,


Alexis.


