Re: dired-do-find-regexp failure with latin-1 encoding

emacs-devel



From:	Dmitry Gutov
Subject:	Re: dired-do-find-regexp failure with latin-1 encoding
Date:	Sun, 29 Nov 2020 21:48:04 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 29.11.2020 20:42, Eli Zaretskii wrote:

Cc: stephen.berman@gmx.net, emacs-devel@gnu.org
From: Dmitry Gutov <dgutov@yandex.ru>
Date: Sun, 29 Nov 2020 19:32:17 +0200

If the calls to the conversion program are done in parallel to the
subsequent searches, reading the file twice might not be a problem (with
the benefit of a disk cache).


How do you mean "in parallel"?  You cannot start searching until you
decide on the encoding, so it must not be in parallel.

Since we're passing multiple files to Grep or RG at the same time, it could start deciding on the encoding of the next file while still searching the previous one.
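A rough sketch of that overlap, in Python rather than Emacs Lisp, assuming a single background worker that guesses the encoding of file N+1 while file N is being searched. `search_all`, `search_file` and the `detect` callback are hypothetical names; a real implementation would drive Grep or RG processes instead of reading the files in Python.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Hit = Tuple[str, int, str]  # (path, line number, line text)


def search_file(path: str, encoding: str, regex: "re.Pattern[str]") -> List[Hit]:
    """Search one file, decoding it with the encoding chosen for it."""
    with open(path, encoding=encoding, errors="replace") as f:
        return [(path, n, line.rstrip("\n"))
                for n, line in enumerate(f, 1) if regex.search(line)]


def search_all(paths: List[str], pattern: str,
               detect: Callable[[str], str]) -> List[Hit]:
    """Search files in order, guessing the next file's encoding in the background.

    `detect` is a hypothetical stand-in for whatever encoding guesser is used.
    """
    regex = re.compile(pattern)
    results: List[Hit] = []
    if not paths:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(detect, paths[0])
        for i, path in enumerate(paths):
            # Searching a file still has to wait until its own encoding is known.
            encoding = pending.result()
            if i + 1 < len(paths):
                # Overlap: start guessing the next file's encoding right away.
                pending = pool.submit(detect, paths[i + 1])
            results.extend(search_file(path, encoding, regex))
    return results
```

The only parallelism here is between guessing one file and searching another; each individual search still starts after its encoding has been decided.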

How does Emacs do it? Does it read until the end of the file?


No, just a small initial part of it.  That's one reason why the
results are not guaranteed to be correct.
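To illustrate why a prefix-only guess can go wrong, here is a deliberately simple, hypothetical heuristic (not Emacs's actual coding-system detection): it tries to decode only the first `prefix_size` bytes as UTF-8 and falls back to Latin-1. A file whose first few kilobytes are plain ASCII gets guessed as UTF-8 even when Latin-1 bytes appear later.

```python
import codecs


def guess_encoding_from_prefix(data: bytes, prefix_size: int = 4096) -> str:
    """Hypothetical guesser: inspect only the first prefix_size bytes of data."""
    head = data[:prefix_size]
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # When the prefix is truncated, final=False lets a multibyte sequence
        # cut off at the boundary pass instead of raising.
        decoder.decode(head, final=len(data) <= prefix_size)
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


# Usage: a Latin-1 byte that appears only after the prefix is never seen.
late_latin1 = b"x" * 5000 + "café\n".encode("latin-1")
print(guess_encoding_from_prefix(late_latin1))  # utf-8, which is wrong for this file
```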


But if we consider that approach good enough for Emacs, it should
probably be good enough for doing a search from inside Emacs.


It's good enough when the encoding is the locale's codeset, and in a
few other (not very important) cases.  For an arbitrary combination of
file's encoding and locale's codeset, the result can be wrong every
single time.

And searching in non-ASCII files whose encoding is not the locale's
native one is precisely the case where this will fail.  Granted, it's
a relatively rare use case, but when it does happen, all bets are off.
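A simplified model of that failure, assuming a byte-oriented matcher such as Grep running in a UTF-8 locale: the search string is encoded in the locale's codeset and compared against the file's raw bytes. `naive_byte_search` is a hypothetical helper that ignores how real Grep handles regexps and invalid multibyte sequences; it only shows that ASCII text still matches while non-ASCII text in a Latin-1 file does not.

```python
def naive_byte_search(pattern: str, file_bytes: bytes, locale_codeset: str) -> bool:
    """Encode the pattern in the locale's codeset, then look it up as raw bytes."""
    return pattern.encode(locale_codeset) in file_bytes


latin1_file = "Grüße aus Köln\n".encode("latin-1")

print(naive_byte_search("aus", latin1_file, "utf-8"))     # True: ASCII bytes agree
print(naive_byte_search("Köln", latin1_file, "utf-8"))    # False: b"K\xc3\xb6ln" is absent
print(naive_byte_search("Köln", latin1_file, "latin-1"))  # True
```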

Which will likely have affected the user (who is foremost an Emacs user) already, before he/she did the search.

So reading just a small part, as Emacs does, will yield a similar
percentage of wrong guesses.


...so that seems like a good thing.

Anyway, that should work, but you don't seem to be crazy about the approach, and I'm not in love with the potential implementation. So maybe we should stop and let it brew for a little while.
