Re: dired-do-find-regexp failure with latin-1 encoding
From: Juri Linkov
Subject: Re: dired-do-find-regexp failure with latin-1 encoding
Date: Sun, 29 Nov 2020 21:37:23 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (x86_64-pc-linux-gnu)

>>>> Adding -a probably cannot do any harm, but its support should be
>>>> detected, since I don't think it's portable enough (it isn't in the
>>>> latest Posix spec, at least).
>>>
>>> Are you sure about that? Are we sure it won't make searching binary
>>> files slower, for example?
>> It will be slower, but more useful: by default Grep just says "Binary
>> file foo matches".
>
> Do we want to search the "binary" files at all? Right now we simply filter
> such matches out (see the definition of xref-matches-in-files), and I have
> seen no complaints.
There are two cases: a truly binary file, and a legitimate ASCII file
with an occasional ^@ (NUL) character. Grep can't distinguish one from the other.
There is an option --binary-files=binary, but unfortunately it doesn't help:
it still outputs "Binary file matches".
So when '-a' is used, the xref parser needs to be smart enough to detect
whether a matched line contains binary garbage or is pure ASCII.
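A minimal sketch of such a heuristic, written in Python for illustration rather than as xref's Emacs Lisp. It assumes (hypothetically) that a line is "binary garbage" only when control bytes other than tab, form feed, and carriage return make up more than some fraction of it, so that a text line with a stray ^@ still counts as text. The name `looks_binary` and the 10% threshold are assumptions, not existing xref code.

```python
# Hypothetical heuristic for illustration; not the actual xref parser.
# Control bytes that may legitimately appear in a text line.
_ALLOWED_CONTROLS = {0x09, 0x0C, 0x0D}  # tab, form feed, carriage return


def looks_binary(line: bytes, max_ratio: float = 0.1) -> bool:
    """Return True if LINE (without its newline) looks like binary garbage.

    A few stray NUL (^@) bytes are tolerated; the line is classified as
    binary only when other control bytes exceed MAX_RATIO of its length.
    """
    if not line:
        return False
    controls = sum(1 for b in line if b < 0x20 and b not in _ALLOWED_CONTROLS)
    return controls / len(line) > max_ratio


print(looks_binary(b"foo = 1  # comment\x00"))        # text with one stray ^@
print(looks_binary(b"\x7fELF\x02\x01\x01\x00\x00\x00"))  # ELF header bytes
```

A ratio rather than a plain "any NUL means binary" test is what lets the ASCII-with-occasional-^@ case through; the right threshold would need tuning on real files.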
Moreover, I think we should apply the same heuristics to the grep output
in grep.el and add '-a' to the grep command by default. Then grep.el
should prettify lines containing real binary garbage, e.g. by hiding runs of
bytes between 0 and 32, or by adding a 'display' property with an ellipsis.
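To show the idea, here is a rough Python sketch, not grep.el code. It assumes "bytes between 0 and 32" means the C0 control bytes 0x00-0x1F, and it keeps tab and newline visible. `garbage_spans` returns the offsets where an Emacs 'display' property could be placed without changing the buffer text. `prettify` renders the same runs as "..." for comparison. Both names are hypothetical.

```python
import re
from typing import List, Tuple

# Runs of C0 control bytes, excluding tab (0x09) and newline (0x0A).
# The exact byte range is an assumption.
_GARBAGE_RUN = re.compile(rb"[\x00-\x08\x0b-\x1f]+")


def garbage_spans(line: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of each run of control bytes in LINE."""
    return [m.span() for m in _GARBAGE_RUN.finditer(line)]


def prettify(line: bytes, ellipsis: str = "...") -> str:
    """Render LINE with every control-byte run collapsed to ELLIPSIS."""
    out, pos = [], 0
    for start, end in garbage_spans(line):
        out.append(line[pos:start].decode("utf-8", errors="replace"))
        out.append(ellipsis)
        pos = end
    out.append(line[pos:].decode("utf-8", errors="replace"))
    return "".join(out)


print(garbage_spans(b"abc\x00\x01\x02def"))
print(prettify(b"abc\x00\x01\x02def"))
```

Returning spans instead of rewritten text mirrors the 'display' property approach: the matched line stays intact for navigation, and only its on-screen rendering changes.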
>>> Also, the manual has this warning:
>>>
>>> Warning: The -a option might output binary garbage, which can have
>>> nasty side effects if the output is a terminal and if the terminal
>>> driver interprets some of it as commands.
>>>
>>> ...which might conceivably mess up our parsing of Grep output sometimes?
>> This is not relevant: since we read that output ourselves, there's no
>> terminal device driver to interpret it and get messed up.
>
> Our interpreter is our regexp with which we parse. But I suppose as long as
> Grep doesn't insert unexpected newlines, the parser will be fine.
For grep output, a bigger problem is that grep on binary data
might output extremely long lines before the terminating newline.
>> I actually don't think I understand why we need -a in this case, since
>> Grep looks for null bytes to decide that a file is binary, and encoded
>> non-ASCII characters don't have null bytes (except if they are in
>> UTF-16).
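The point about null bytes can be checked directly. This small Python sketch (the helper `has_nul` is hypothetical) encodes the same non-ASCII text three ways. Only the UTF-16 encoding produces NUL bytes, because every ASCII character becomes a two-byte unit with a zero byte.

```python
def has_nul(text: str, encoding: str) -> bool:
    """Return True if TEXT, encoded with ENCODING, contains a NUL byte."""
    return b"\x00" in text.encode(encoding)


for enc in ("latin-1", "utf-8", "utf-16-le"):
    print(enc, has_nul("café über", enc))
```

So a latin-1 file is not flagged by the null-byte rule alone; if Grep still calls it binary, the cause has to be something else.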
>
> Good question.
The grep manual says that binary data are either output bytes that
are improperly encoded for the current locale, or null input bytes.
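That second criterion explains the latin-1 case. Below is a rough Python approximation of the rule as the manual states it. The function name and the locale-encoding parameter are assumptions. It also simplifies the rule: real grep checks NULs in the input and the encoding of the lines it would output, while this sketch applies both checks to one chunk of data. Under it, a latin-1 "é" (byte 0xE9) is invalid in a UTF-8 locale and counts as binary. The same bytes are fine in a latin-1 locale.

```python
def grep_would_call_binary(data: bytes, locale_encoding: str = "utf-8") -> bool:
    """Approximate the manual's rule: DATA is binary if it contains a NUL
    byte or is not validly encoded in LOCALE_ENCODING."""
    if b"\x00" in data:
        return True
    try:
        data.decode(locale_encoding)
    except UnicodeDecodeError:
        return True
    return False


latin1 = "café".encode("latin-1")  # b"caf\xe9"
print(grep_would_call_binary(latin1))             # UTF-8 locale
print(grep_would_call_binary(latin1, "latin-1"))  # latin-1 locale
```

This suggests why '-a' (or running Grep in a locale matching the file's encoding) makes a difference here even though the file contains no null bytes.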