bug-gnu-emacs

bug#42602: Wrong (not-)casechars value for "polish" in ispell-dictionary


From: Sebastian Urban
Subject: bug#42602: Wrong (not-)casechars value for "polish" in ispell-dictionary-base-alist
Date: Thu, 30 Jul 2020 13:39:55 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

> I don't understand this change.  Values above octal 377 cannot be
> right in the above regexps, because they are supposed to be in
> Latin-2 encoding, which is a single-byte encoding, and so can only
> handle values below octal 400.  How did you come up with those
> values?

Basically, I used C-x = on each char, which gave me octal values.  I
thought it was recognising only A-z + ó/Ó and some other chars that I'm
not interested in, so I swapped those values for the ones corresponding
to the Polish chars.  That's the whole story.
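
To illustrate the mismatch discussed above: C-x = reports a character's
Unicode code point, whereas a Latin-2 regexp needs the single-byte
ISO-8859-2 value.  The following is only a Python sketch, not Emacs
code, and the helper name `describe` is made up for illustration.  It
prints both values in octal for the Polish letters.

```python
# Compare Unicode code points (what C-x = reports) with ISO-8859-2 bytes.
POLISH = "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ"

def describe(ch):
    """Return (code point in octal, Latin-2 byte in octal) for one char."""
    codepoint = ord(ch)
    latin2_byte = ch.encode("iso-8859-2")[0]  # Latin-2 is single-byte
    return format(codepoint, "o"), format(latin2_byte, "o")

for ch in POLISH:
    cp, byte = describe(ch)
    print(f"{ch}: code point \\{cp}, Latin-2 byte \\{byte}")
```

Only ó/Ó have the same value in both columns.  The other letters have
code points above octal 377 but Latin-2 bytes below octal 400.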

> Anyway, I'm quite sure some other factor is at work here.

Well, I did some tests, e.g. switched back to the original value of
"polish" in my "pl" dictionary, and... it works.  And if I change from
iso-8859-2 to utf-8 in my "pl" (with original value from "polish") it
doesn't work.  So, as you later wrote - wrong character encoding,
I guess.
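
A rough illustration of why the declared character set matters, again
in plain Python.  This is not how ispell.el talks to Aspell, and
`roundtrip` is a hypothetical helper.  If one side encodes a word in
one encoding and the other side decodes the bytes as a different one,
the non-ASCII Polish letters are corrupted while ASCII survives.

```python
def roundtrip(word, sent_as, read_as):
    """Encode `word` as `sent_as`, then decode the bytes as `read_as`."""
    data = word.encode(sent_as)
    # errors="replace" substitutes U+FFFD for undecodable bytes
    return data.decode(read_as, errors="replace")

word = "żółw"
print(roundtrip(word, "utf-8", "utf-8"))        # encodings agree
print(roundtrip(word, "utf-8", "iso-8859-2"))   # mismatch
print(roundtrip(word, "iso-8859-2", "utf-8"))   # mismatch
```

Only the first call returns the word unchanged.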

Looking for a cause (in default settings), I think I found it in
ispell-dictionary-base-alist and ispell-dictionary-alist.  During
"transfer" from *-base-* to ispell-dictionary-alist, the value of
CHARACTER-SET is changed in all cases from iso-* or cp1255 to utf-8,
then ispell uses these (from ispell-dictionary-alist) when it "talks"
with Aspell.

On the other hand, if I use Emacs 26.3 from Cygwin, everything works
out of the box, I don't even have to set "polish" as default
dictionary. But there, in Cygwin command line, "env | grep LANG" gives
"LANG=pl_PL.UTF-8".

> Your Emacs is a native MinGW build, whereas Aspell seems to be
> a Cygwin build?

Both Emacses are official Win builds, and Aspell is installed through
Cygwin.

> If so, you could have incompatibility in character encoding.  What
> is your Windows locale?

"Polish" everywhere in "Control Panel" -> "Regional and Language".

> And what does M-: (getenv "LANG") RET yield inside Emacs?

"PLK"


S. U.

P.S.
Moreover, if I type in regexp-builder "[\363\323]" it won't
recognize ó/Ó, but it doesn't have a problem with other Polish
chars, like "ł" ("[\502]") or "ż" ("[\574]").

In the "Character List" buffer for unicode-bmp, regexp-builder
(numbers are octal values):
- 0-177 and 400-777 - highlights chars
- 240-377 - doesn't highlight chars (it highlights them if I use hex
  value, or insert them directly)
I didn't check "80h-9Fh" chars.  Chars like C-a were checked by
inserting them with quoted-insert in another buffer.





