From: Dmitry Gutov
Subject: Re: dired-do-find-regexp failure with latin-1 encoding
Date: Sun, 29 Nov 2020 18:07:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0
On 29.11.2020 17:06, Eli Zaretskii wrote:
Do we want to search the "binary" files at all?

We don't. I still hope to understand why -a was needed in this case. Stephen?
Looks like it actually depends on the encoding of the _output_. So if it can print some lines well but not others it can even print a line from a file and then later say it's a binary:
$ grep "prem" latin1.txt
premie?re is slightly different
Binary file latin1.txt matches

Adding -a or prepending 'LC_ALL=C' changes that:

$ LC_ALL=C grep "prem" latin1.txt
premi�re is first
premie?re is slightly different

So... looks like Grep searches through all files anyway. Just modifies its output in cases where it looks iffy.
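
For anyone who wants to reproduce this, here is a minimal sketch. It assumes GNU grep and a hypothetical two-line sample file; this is not the original latin1.txt, whose exact bytes are not shown in this message. The default run is left without an expected output, since what it prints depends on the current locale and the grep version.

```shell
# Build a two-line Latin-1 sample (hypothetical stand-in for latin1.txt).
# \350 is the single Latin-1 byte for e-grave; alone it is not valid UTF-8.
tmpdir=$(mktemp -d)
sample="$tmpdir/latin1.txt"
printf 'premi\350re is first\npremiere is slightly different\n' > "$sample"

# Default run: in a UTF-8 locale, grep may print some lines and then
# report "Binary file ... matches". Output varies by locale and version.
grep "prem" "$sample" || true

# The two workarounds mentioned above: both make grep treat the bytes as text.
count_c=$(LC_ALL=C grep -c "prem" "$sample")
count_a=$(grep -a -c "prem" "$sample")
echo "LC_ALL=C: $count_c matching lines; -a: $count_a matching lines"
```

Counting with -c sidesteps the question of how the terminal renders the raw Latin-1 byte.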
We should support Grep regardless, since not everyone will have ripgrep. And in any case, "C-x RET c" will be needed with it as well, no?

I'd have to test it explicitly to say for sure, but:

ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specifically specified with the -E/--encoding flag.)

https://blog.burntsushi.net/ripgrep/#pitch

What is not clear to me is whether the _output_ is always in some fixed encoding, like UTF-8. That doesn't seem to be stated in the docs there.
Judging by a small experiment, rg's output is in the same encoding as input, for each file. Which can be a nuisance when looking at the search results, but that's probably all.
In any case, if one takes the pre-processing route, the end encoding will be UTF-8.
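
One way the pre-processing route could look, as a rough sketch only: it assumes iconv is available, that the input encoding is known in advance to be ISO-8859-1, and it uses a hypothetical sample file. It is not meant as what dired-do-find-regexp does or should do.

```shell
# Hypothetical Latin-1 sample; \350 is the Latin-1 byte for e-grave.
tmpdir=$(mktemp -d)
sample="$tmpdir/latin1.txt"
printf 'premi\350re is first\npremiere is slightly different\n' > "$sample"

# Convert to UTF-8 first, then search the converted stream, so that
# whatever grep prints is UTF-8 regardless of the file's encoding.
matches=$(iconv -f ISO-8859-1 -t UTF-8 "$sample" | grep "prem")
printf '%s\n' "$matches"

# Sanity checks: both lines matched, and the first now carries the
# two-byte UTF-8 sequence for e-grave (\303\250).
utf8_count=$(printf '%s\n' "$matches" | grep -c "prem")
egrave_count=$(printf '%s\n' "$matches" | LC_ALL=C grep -c "$(printf 'premi\303\250re')")
```

The obvious cost is that the source encoding has to be known or guessed per file before the search runs.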