emacs-devel
Re: dired-do-find-regexp failure with latin-1 encoding


From: Dmitry Gutov
Subject: Re: dired-do-find-regexp failure with latin-1 encoding
Date: Sun, 29 Nov 2020 19:19:43 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 29.11.2020 19:12, Eli Zaretskii wrote:
Cc: stephen.berman@gmx.net, emacs-devel@gnu.org
From: Dmitry Gutov <dgutov@yandex.ru>
Date: Sun, 29 Nov 2020 18:07:38 +0200

Adding -a or prepending 'LC_ALL=C' changes that:
$ LC_ALL=C grep "prem" latin1.txt
premi�re is first
premie?re is slightly different

Is that � what Grep actually produced?

That's copied from a terminal emulator.

If I run it with shell-command, I get this:

premi\350re is first
premie?re is slightly different

(\350 being a raw char)
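
For anyone who wants to reproduce this, here is a minimal sketch. The file contents are made up for illustration and are not the original latin1.txt. It assumes a POSIX shell with grep and od available.

```shell
# Sketch only: the file contents below are invented for illustration.
dir=$(mktemp -d)

# \350 is octal for byte 0xE8, which is "è" in Latin-1.
# Followed by plain ASCII, that byte is not valid UTF-8.
printf 'premi\350re is first\nplain premise\n' > "$dir/latin1.txt"

# In the C locale every byte is a character, so grep sees no
# encoding errors and counts both lines as matches.
matches=$(LC_ALL=C grep -c "prem" "$dir/latin1.txt")
echo "matching lines: $matches"

# Dump the first match byte by byte; the raw 350 byte is passed
# through unchanged, which is what shell-command shows as \350.
LC_ALL=C grep "premi" "$dir/latin1.txt" | od -An -c
```

How a terminal emulator renders that byte depends on the terminal's own encoding, which would explain seeing a replacement character there.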

What is not clear to me is whether the _output_ is always in some
fixed encoding, like UTF-8.  That doesn't seem to be stated in the
docs there.

Judging by a small experiment, rg's output is in the same encoding as
input, for each file.

So in this respect it is not better than Grep: it is still impractical
to search through files that have different encodings.

It's not optimal, but the important thing is to get matches from all of them, even if some are printed in a not-so-readable way.

In any case, if one takes the pre-processing route, the end encoding
will be UTF-8.

But then the pre-processor will have to guess the encoding (if it is
not the same for all the files), which we know is not simple.

Yes.
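
To make the pre-processing route concrete, here is a rough sketch, not a proposal for how Emacs should do it. The helper names to_utf8 and search_utf8 are hypothetical. The guessing step is deliberately crude: a file that validates as UTF-8 is passed through, and anything else is assumed to be Latin-1, which is exactly the kind of guess that can go wrong. It assumes iconv is installed.

```shell
# Hypothetical helper: write FILE to stdout as UTF-8.
# Crude guess: if the file is valid UTF-8, keep it as-is;
# otherwise ASSUME it is ISO-8859-1 (this can be wrong).
to_utf8() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        cat "$1"
    else
        iconv -f ISO-8859-1 -t UTF-8 "$1"
    fi
}

# Hypothetical helper: search each file after conversion and
# prefix every match with its file name, so all output is UTF-8.
search_utf8() {
    pattern=$1
    shift
    for f in "$@"; do
        to_utf8 "$f" | LC_ALL=C grep -- "$pattern" |
            while IFS= read -r line; do
                printf '%s:%s\n' "$f" "$line"
            done
    done
}
```

Usage would look like `search_utf8 "prem" latin1.txt utf8.txt`. The pattern is matched bytewise here, so a non-ASCII pattern would itself have to be given in UTF-8.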


