
Re: [Aspell-user] Hyphens and apostrophes in words


From: Carlo Traverso
Subject: Re: [Aspell-user] Hyphens and apostrophes in words
Date: Sun, 19 May 2013 12:19:14 +0200 (CEST)

>>>>> "ciaran" == Ciarán Ó Duibhín writes:

    ciaran> I'd like to know which, if any, spellcheckers can be
    ciaran> configured to act like this.  (The examples are from
    ciaran> English but the real need comes from other languages.)
    ciaran> Asking here about aspell particularly, of course.

    ciaran> First, if necessary, allow the dictionary to contain words
    ciaran> with apostrophe "'" and hyphen "-" in any position. (I am
    ciaran> aware of the side-effects of this and am not worried by
    ciaran> them.)

    ciaran> Now, when checking text:

    ciaran> 1. Accept a word containing a hyphen if EITHER the
    ciaran> dictionary contains the whole word including the hyphen
    ciaran> ("hotch-potch") OR if the dictionary contains both parts
    ciaran> separately ("half-moon").

    ciaran> 2. With a dictionary containing "'twas" but not "twas",
    ciaran> accept "'twas".

    ciaran> 3. With a dictionary containing "well" but not "'well",
    ciaran> not accept "'well".

aspell can do 2 and 3, but you have to change the handling of ' in
the .dat file, recompile the English dictionary, and of course add
the acceptable words; this is the aspell way to handle your "First"
point.
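
To illustrate why the handling of ' matters for 2 and 3, here is a
minimal Python sketch. It does not call aspell; the function names,
the regular expressions, and the toy word list are assumptions made
up for the example, with the regular expression standing in for the
setting in the .dat file.

```python
import re

def tokenize_words(text, apostrophe_at_start):
    # Hypothetical tokenizer: "'" is always allowed between letters,
    # and at the start of a word only when apostrophe_at_start is True.
    body = r"[A-Za-z]+(?:'[A-Za-z]+)*"
    pattern = r"'?" + body if apostrophe_at_start else body
    return re.findall(pattern, text)

def rejected_words(text, dictionary, apostrophe_at_start):
    # A token is accepted only if it is in the dictionary exactly as tokenized.
    tokens = tokenize_words(text, apostrophe_at_start)
    return [t for t in tokens if t not in dictionary]

# Toy dictionary: contains "'twas" but not "twas", "well" but not "'well".
toy_words = {"'twas", "well", "and"}

# With a leading ' kept as part of the word, "'twas" is accepted
# and "'well" is rejected.
strict = rejected_words("'twas well and 'well", toy_words, True)

# With a leading ' dropped by the tokenizer, "'well" is checked as "well"
# and accepted, while "'twas" is checked as "twas" and rejected.
loose = rejected_words("'twas well and 'well", toy_words, False)
```

The point of the sketch is only that the same word list gives
different results depending on where the tokenizer permits the
apostrophe, which is why the dictionary has to be rebuilt after the
.dat change.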

For 1, you should modify the .dat file again to allow - in the middle
of a word, add the compound words, and run the spell-checker twice:
once with the modified dictionary (to accept the listed words with -),
and once with the original one (or rather the one modified in the
first step) to accept the two components. The first pass will reject
the hyphenated words that are not in the dictionary; the second pass
will split them into their components and check those.
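
The two-pass procedure can be sketched in Python as follows. This is
an illustration of the logic only, not of aspell itself: the function
names, the tokenizing pattern, and the toy word lists are assumptions
invented for the example.

```python
import re

def first_pass(text, hyphen_dictionary):
    # Pass 1: "-" counts as a word character, so "half-moon" is one token.
    # Tokens missing from the dictionary (which lists the compounds) are rejected.
    tokens = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    return [t for t in tokens if t not in hyphen_dictionary]

def second_pass(rejected, plain_dictionary):
    # Pass 2: "-" is a separator, so each rejected word is split into its
    # components; it stays rejected only if some component is unknown.
    return [w for w in rejected
            if any(part not in plain_dictionary for part in w.split("-"))]

plain = {"a", "and", "half", "moon"}
with_hyphens = plain | {"hotch-potch"}

sample = "a hotch-potch and a half-moon and a hotch-moon"
after_first = first_pass(sample, with_hyphens)
after_second = second_pass(after_first, plain)
```

Here "hotch-potch" passes in the first step because it is listed
whole, "half-moon" passes in the second step because both parts are
listed, and "hotch-moon" is rejected by both.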

I don't think it is possible to do this in one pass by combining the
two dictionaries in one .multi file, since the .dat files have to be
different (and hence the word tokens will be different).

Carlo Traverso


