Re: [Aspell-user] Configuring spell check in mult language documents


From: Mahesh T. Pai
Subject: Re: [Aspell-user] Configuring spell check in mult language documents
Date: Sat, 9 Jul 2011 00:19:20 +0530
User-agent: Mutt/1.5.21 (2010-09-15)

Carlo Traverso said on Fri, Jul 08, 2011 at 08:24:54PM +0200,:

 > aspell list -l lang1 | aspell list -l lang2

That would take the words out of their context, no? 
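
To make the concern concrete, here is a minimal Python sketch of what
the pipeline amounts to. It does not call aspell; the word sets and the
list_unknown helper are hypothetical stand-ins for the two dictionaries
and for `aspell list`, which prints the misspelled words it finds on
standard input.

```python
def list_unknown(text, known_words):
    """Return the whitespace-separated words of `text` that are not in
    `known_words`, roughly the shape of what `aspell list` reports."""
    return [w for w in text.split() if w.lower() not in known_words]

# Hypothetical, tiny "dictionaries" for illustration only.
english = {"the", "budget", "report"}
hindi = {"नमस्ते"}

text = "The budjet नमस्ते report"

# Stage 1: words unknown to the first language.
stage1 = list_unknown(text, english)

# Stage 2: feed that word list to the second language.
stage2 = list_unknown("\n".join(stage1), hindi)

print(stage1)  # words the English stand-in does not know
print(stage2)  # words neither stand-in knows
```

What comes out of the second stage is a bare word list: nothing in it
says where in the document each word occurred, which is the loss of
context asked about above.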

 > I did not check the hindi dictionaries, but probably hindi accepts
 > both latin and hindi characters as word components (this is how
 > ancient greek, grc, does). The solution of your problem could be to
 > define a variant of hindi that only accepts hindi characters.

AFAICT, no, especially if you mean that in the linguistic sense.


Hindi (and most Indic languages) use code points that fit in 16 bits
but not in 8; in UTF-8, each such character is encoded as a three-byte
sequence.
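
As a quick check of the byte widths involved, the following Python
sketch prints how many bytes UTF-8 uses for one Latin, one Devanagari
and one Malayalam letter. The helper name utf8_len is made up for this
example. The Indic code points fit in 16 bits, but UTF-8 stores each of
them as three bytes.

```python
def utf8_len(ch):
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))

samples = {
    "Latin small a": "a",            # U+0061
    "Devanagari ka": "\u0915",       # U+0915
    "Malayalam ka": "\u0d15",        # U+0D15
}

for label, ch in samples.items():
    # Show the code point, the byte count and the raw bytes in hex.
    print(f"{label}: U+{ord(ch):04X}, {utf8_len(ch)} byte(s), "
          f"{ch.encode('utf-8').hex()}")
```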

I suspect that the difficulties mentioned by Kevin have more to do
with aspell being "internally 8 bit", as Kevin put it some months back. 

Probably, the difficulty is in distinguishing between a few bytes of
8-bit characters followed by a few bytes of multi-byte characters. Of
course, I am no expert or even a programmer, and I may be way off the
mark.
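
For what it is worth, once text is decoded into characters rather than
handled as bytes, telling the scripts apart can be sketched quite
simply. The Python below is only an illustration and says nothing about
how aspell works internally; script_of and script_runs are hypothetical
helpers, and guessing the script from the first word of a character's
Unicode name is a rough heuristic that assumes words start with a
letter.

```python
import unicodedata
from itertools import groupby

def script_of(word):
    """Guess a word's script from the Unicode name of its first
    character, e.g. 'LATIN SMALL LETTER A' gives 'LATIN'."""
    try:
        return unicodedata.name(word[0]).split()[0]
    except ValueError:  # the character has no name in the database
        return "UNKNOWN"

def script_runs(text):
    """Group consecutive whitespace-separated words sharing a script."""
    words = text.split()
    return [(script, " ".join(group))
            for script, group in groupby(words, key=script_of)]

mixed = "Finance department वित्त विभाग circular"
for script, run in script_runs(mixed):
    print(script, "->", run)
```

Each run could then, in principle, be handed to a checker for the
matching language while keeping its place in the document.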

If you want to see the kind of documents we have in mind, have a
look at

http://finance.kerala.gov.in/
index.php?option=com_docman&task=doc_download&gid=3047&Itemid=34

(Watch out for the broken line; the URL is split to avoid problems in
mailers.)

That is a PDF file containing both English and Malayalam script. We
use plenty of documents like that. The PDF itself is unlikely to use
UTF-8, so do not use it as an example of anything except the visual
representation of the text.



-- 
Mahesh T. Pai   ||
DICTIONARY, n.  A malevolent literary device for cramping the
  growth of a language and making it hard and inelastic.


