bug#13639: [emacs] ispell.el: hunspell dicts autodetection under Emacs.
From: Eli Zaretskii
Subject: bug#13639: [emacs] ispell.el: hunspell dicts autodetection under Emacs.
Date: Wed, 20 Feb 2013 21:00:41 +0200

> Date: Wed, 20 Feb 2013 18:50:45 +0100
> From: Agustin Martin <agustin.martin@hispalinux.es>
>
> > > > > Sorry, I should have written WORDCHARS.
> > > >
> > > > Why do we need that?
> > >
> > > This is what ispell.el calls otherchars. Parsing WORDCHARS ensures that
> > > both hunspell and ispell.el think about the same characters in that
> > > category.
> >
> > I think you are mistaken, that's not my reading of hunspell(4).
>
> Sorry for the late reply,
>
> (Opening a new thread specifically about hunspell dicts autodetection and
> using new cloned bugreport #13639 specific about this)
>
> Although WORDCHARS description in hunspell(4)
>
> WORDCHARS characters
> WORDCHARS extends tokenizer of Hunspell command line interface
> with additional word character. For example, dot, dash, n-dash, numbers,
> percent sign are word character in Hungarian.
>
> is too Hungarian-biased and does not mention the usual apostrophe, AFAIK it
> mostly refers to the same thing as 'otherchars', although hunspell may
> accept those characters in locations other than the middle of a word.

I didn't just read the man page, I also looked into several *.aff
files that install with Hunspell dictionaries. It is clear to me that
WORDCHARS is at least unreliable, even if your interpretation is
correct (of which I'm still unconvinced): some *.aff files don't have
that entry at all (e.g., en_GB.aff, whose OTHERCHARS should include
the ' character, and also ru_RU.aff); others, like he_IL.aff, have
that entry mention all the CASECHARS, in addition to OTHERCHARS. I
wouldn't bet my money on what that entry gives us.
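
To make the point concrete, here is a minimal sketch (in Python, only for
illustration; it is not what ispell.el does) of looking up the WORDCHARS
entry in the contents of an *.aff file. The function name and the sample
lines are hypothetical and are not copied from any real dictionary. The
sketch assumes the lines were already decoded using the encoding named on
the file's SET line.

```python
def read_wordchars(aff_lines):
    """Return the value of the first WORDCHARS entry, or None if absent.

    aff_lines is any iterable of already-decoded lines from an .aff file.
    """
    for line in aff_lines:
        fields = line.split()
        # An entry has the keyword in the first field and the
        # characters in the second one.
        if len(fields) >= 2 and fields[0] == "WORDCHARS":
            return fields[1]
    return None


# Hypothetical file contents, invented for this example.
with_entry = ["SET UTF-8", "TRY esianrtolcdugmphbyfvkwz", "WORDCHARS 0123456789'"]
without_entry = ["SET UTF-8", "TRY esianrtolcdugmphbyfvkwz"]

print(read_wordchars(with_entry))
print(read_wordchars(without_entry))
```

A caller has to handle the None case, where the file has no such entry at
all, and even a non-None value cannot be assumed to list only OTHERCHARS.
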
> The good news is that I started working on hunspell dicts autodetection.

Good news, indeed! Thanks!