Re: dired-do-find-regexp failure with latin-1 encoding

emacs-devel


From:	Dmitry Gutov
Subject:	Re: dired-do-find-regexp failure with latin-1 encoding
Date:	Sun, 29 Nov 2020 18:07:38 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 29.11.2020 17:06, Eli Zaretskii wrote:

Do we want to search the "binary" files at all?


We don't.  I still hope to understand why -a was needed in this case.
Stephen?

Looks like it actually depends on the encoding of the _output_. So if it can print some lines well but not others, it can even print a line from a file and then later say it's a binary:


$ grep "prem" latin1.txt
premie?re is slightly different
Binary file latin1.txt matches

Adding -a or prepending 'LC_ALL=C' changes that:
$ LC_ALL=C grep "prem" latin1.txt
premi�re is first
premie?re is slightly different

So... looks like Grep searches through all files anyway. It just modifies its output in cases where it looks iffy.

We should support Grep regardless, since not everyone will have
ripgrep.  And in any case, "C-x RET c" will be needed with it as well,
no?


I'd have to test it explicitly to say for sure, but:

    ripgrep supports searching files in text encodings other than UTF-8,
    such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some
    support for automatically detecting UTF-16 is provided. Other text
    encodings must be specifically specified with the -E/--encoding flag.)

https://blog.burntsushi.net/ripgrep/#pitch


What is not clear to me is whether the _output_ is always in some
fixed encoding, like UTF-8.  That doesn't seem to be stated in the
docs there.

Judging by a small experiment, rg's output is in the same encoding as the input, for each file. Which can be a nuisance when looking at the search results, but that's probably all.

In any case, if one takes the pre-processing route, the end encoding will be UTF-8.
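
To make the pre-processing route concrete, here is a rough sketch in Python (not what Emacs or xref actually does): decode each file from its known encoding first, match on the decoded text, and only then emit the hits, so the results come out as UTF-8 regardless of the input. The function names, the sample file contents, and the assumption that the accented character in the example above is "è" are all mine, for illustration only.

```python
import re
import tempfile
from pathlib import Path


def grep_transcoded(path, pattern, encoding="latin-1"):
    """Return (line_number, line) pairs for lines of `path` matching `pattern`.

    The file is decoded from `encoding` before matching, so the returned
    strings are ordinary Unicode text and can be written out as UTF-8.
    """
    regex = re.compile(pattern)
    hits = []
    # Text mode decodes on the fly and splits on \n, \r and \r\n only.
    with open(path, "r", encoding=encoding) as f:
        for number, line in enumerate(f, start=1):
            if regex.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


def demo():
    """Build a small latin-1 file (hypothetical contents) and search it."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "latin1.txt"
        # Assumed contents: one latin-1 encoded line with "è", one unrelated line.
        sample.write_bytes("première is first\nunrelated line\n".encode("latin-1"))
        return grep_transcoded(sample, "prem")


if __name__ == "__main__":
    for number, line in demo():
        # Encoding explicitly shows that the emitted bytes are UTF-8.
        print(number, line.encode("utf-8"))
```

The obvious cost is that the encoding has to be known (or guessed) per file up front, which is the same information "C-x RET c" or rg's -E flag would need.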
