From: | Reuben Thomas |
Subject: | bug#17742: Acknowledgement (Support for enchant?) |
Date: | Tue, 20 Dec 2016 21:43:32 +0000 |
> From: Reuben Thomas <rrt@sc3d.org>
> Date: Mon, 19 Dec 2016 21:47:42 +0000
> Cc: 17742@debbugs.gnu.org
>
> neither GNU Aspell nor hunspell offer any way to get this information (about character classes of dictionaries) via their APIs.
>
> They provide this information in the dictionaries, and we glean it
> from there. See ispell-parse-hunspell-affix-file and
> ispell-aspell-find-dictionary.
>
> The dictionaries are not part of the API (even where the format is documented, the location may not be fixed), so it's not a good idea to rely on them.
If there's no better way, then I see no problem in relying on the
dictionaries, and de-facto the results are satisfactory.
> Having discovered that Aspell does not provide this information (I checked again, and ispell-aspell-find-dictionary does not find this information in the dictionaries, except for limited information about otherchars; for casechars and not-casechars it defaults to [:alpha:]), I shall investigate with the hunspell maintainers.
Aspell provides some of that, and there's no reason to ignore what it
does provide.
Whether it's good enough depends on the dictionary and on what "(XP)"
means. It could be that "(XP)", including the parentheses, is a word
the dictionary recognizes, something akin to "(C)", i.e. copyright
sign.
I don't see why it would be fragile with Enchant when it isn't with
its back-ends.
And avoiding even fragile methods is worse than using
them, when there's no better way of gleaning the same information, and
the information is important (as it is in this case).
I think you are drawing overly radical conclusions from trying that
with a single word and a single dictionary. Which string was sent to
the speller in this case, and is that the string you expected to be
sent?
> Moreover, even when we send entire lines to the speller, we want to
> skip lines that include only non-word characters.
>
> Why?
To avoid false positives and false negatives, as explained above.
First, Enchant could be using Hunspell as its engine, right?
And second, AFAIU this discussion started by you proposing to get rid
of CASECHARS etc., for all spellers, not just for Enchant, something
that will definitely cause degradation.
It sounds like the important part of our disagreement is in the last
sentence. If so, I hope I've succeeded in changing your mind. Failing
that, all I can suggest is to study the spelling rules of a modern
speller, such as Hunspell, and see how this information is used there.
I tried to explain that above: you will get false positives and/or
irrelevant or missing corrections from the speller. For example, if
you send "foo.bar", and the speller doesn't support '.' as a
word-constituent character, you will get separate suggestions for
"foo" and "bar", and won't get "foobar".
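To make the "foo.bar" case concrete, here is a minimal sketch of how
word-constituent information changes what counts as one word. It is
not the ispell.el code: split_words and has_word_chars are
hypothetical helpers, and casechars/otherchars are simplified
stand-ins for CASECHARS and OTHERCHARS.

```python
import re


def split_words(text, casechars="A-Za-z", otherchars=""):
    """Split text into candidate words (hypothetical helper).

    casechars is the body of a regex character class; otherchars is a
    plain string of characters allowed *inside* a word, between runs of
    casechars. Both are assumptions made for this illustration.
    """
    if otherchars:
        other = "".join(re.escape(c) for c in otherchars)
        # Runs of casechars, optionally joined by a single otherchar.
        pattern = f"[{casechars}]+(?:[{other}][{casechars}]+)*"
    else:
        pattern = f"[{casechars}]+"
    return re.findall(pattern, text)


def has_word_chars(line, casechars="A-Za-z"):
    """Return True if the line contains at least one casechar."""
    return re.search(f"[{casechars}]", line) is not None


# Without '.' as a word constituent, "foo.bar" is checked as two words.
print(split_words("foo.bar"))
# With '.' declared as an otherchar, it is checked as a single word.
print(split_words("foo.bar", otherchars="."))
# A line of only non-word characters can be skipped entirely.
print(has_word_chars("-----"))
```

Under these assumptions, the first call yields two separate words and
the second yields one, which is the difference between getting
suggestions for "foo" and "bar" and getting a suggestion for the whole
string.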
I also don't understand why you want to remove this information, which
is already there, is no harder to get with Enchant than it is without
it, and is supported by code that is already there.