[silpa-discuss] Dictionary Index Generator script. v2

From:	Vasudev Kamath
Subject:	[silpa-discuss] Dictionary Index Generator script. v2
Date:	Tue, 27 Apr 2010 18:59:13 +0530
User-agent:	KMail/1.12.4 (Linux/2.6.33-2.slh.10-sidux-686; KDE/4.3.4; i686; ; )

Hi all,
As Santhosh mentioned, the new script now saves the index as a human-readable 
file in the following format:
A=1
B=2000
...
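A minimal Python sketch of writing and reading an index in this KEY=VALUE format follows. It is not the attached indexer_v2.py. It assumes the number after each letter is the 1-based line number where words starting with that letter begin in a sorted, one-word-per-line dictionary. The names build_index, write_index and read_index are hypothetical.

```python
# Sketch of a first-letter index stored as "KEY=VALUE" lines.
# ASSUMPTION: VALUE is the 1-based line number of the first dictionary word
# that starts with KEY (the real indexer_v2.py may store something else).
import io


def build_index(words):
    """Map each first character to the line number of its first word."""
    index = {}
    for lineno, word in enumerate(words, start=1):
        if word and word[0] not in index:
            index[word[0]] = lineno
    return index


def write_index(index, out):
    """Write one KEY=VALUE pair per line, in order of first appearance."""
    for key, lineno in index.items():
        out.write("%s=%d\n" % (key, lineno))


def read_index(inp):
    """Parse KEY=VALUE lines back into a dict.

    rpartition splits on the last "=", so a key that is itself "=" still
    parses, because the value part is always a plain integer.
    """
    index = {}
    for line in inp:
        line = line.rstrip("\n")
        if not line:
            continue
        key, _, value = line.rpartition("=")
        index[key] = int(value)
    return index


if __name__ == "__main__":
    words = ["Aachen", "Aaron", "Babel", "Bacon", "Cairo"]
    buf = io.StringIO()
    write_index(build_index(words), buf)
    print(buf.getvalue())
```

Because the file is plain text, it can be inspected or edited by hand, which is the point of the human-readable format.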
I've tested it with multiple languages and it works fine. I'm attaching the new 
script; please test it and let me know of any bugs.

The English dictionary file can now be converted to UTF-8 with the 
following command:

iconv -f ISO-8859-1 -t UTF-8 en_US.dic > en_US_utf-8.dic 

The new file can then be renamed and moved into the proper English directory. 
After the UTF-8 conversion, codecs.open with the utf-8 encoding works fine and 
all words are read without any issues.
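The same conversion and the subsequent read can also be done from Python. The sketch below assumes a plain one-word-per-line file. The helper names convert_latin1_to_utf8 and read_words and the temporary file names are illustrative only. It also shows why the unconverted file causes trouble: a Latin-1 byte such as 0xE9 ("é") is not valid UTF-8 on its own, so reading it with codecs.open and utf-8 raises UnicodeDecodeError.

```python
# Python equivalent of the iconv step, plus reading the result with
# codecs.open. Helper names and file names are illustrative only.
import codecs
import os
import tempfile


def convert_latin1_to_utf8(src_path, dst_path):
    """Re-encode an ISO-8859-1 file as UTF-8 (like iconv -f ISO-8859-1 -t UTF-8)."""
    with codecs.open(src_path, "r", encoding="iso-8859-1") as src:
        text = src.read()
    with codecs.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)


def read_words(path):
    """Read a one-word-per-line UTF-8 file with codecs.open, skipping blank lines."""
    with codecs.open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "en_US.dic")
    dst = os.path.join(workdir, "en_US_utf-8.dic")
    # Two words containing Latin-1 bytes: 0xE9 is "é", 0xEF is "ï".
    with open(src, "wb") as f:
        f.write(b"caf\xe9\nna\xefve\n")
    try:
        read_words(src)  # the unconverted file is not valid UTF-8
    except UnicodeDecodeError as exc:
        print("before conversion:", exc.reason)
    convert_latin1_to_utf8(src, dst)
    print("after conversion:", read_words(dst))
```

In Python 3 the built-in open(path, encoding="utf-8") does the same job as codecs.open here.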
Santhosh mentioned that after converting en_US to UTF-8, the spell checker 
module was flagging words that are present in the dictionary as misspelled. I 
found that the issue is not caused by the UTF-8 conversion of en_US. Here is 
what is actually going on:
We changed the spell checker module to convert input words to lower case. For 
example, the input "Hello" was reported as wrong because the dictionary 
contained only "hello", so to deal with this we started converting input words 
to lower case. This gave rise to a new issue:
New issue: the input "AOL" is converted to "aol" before being checked against 
the dictionary, but the dictionary contains only "AOL" and not "aol", so "aol" 
is flagged as a spelling mistake.
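Both behaviours can be reproduced with a toy dictionary. This sketch assumes the dictionary is a plain set of words. DICTIONARY, check_exact and check_lowercased are hypothetical names, not the Silpa spell checker's actual code.

```python
# Toy reproduction of the case problem. DICTIONARY stands in for
# en_US.dic; the function names are hypothetical.
DICTIONARY = {"hello", "AOL"}


def check_exact(word):
    """Original behaviour: case-sensitive lookup of the input as typed."""
    return word in DICTIONARY


def check_lowercased(word):
    """Changed behaviour: lower-case the input before the lookup."""
    return word.lower() in DICTIONARY


for word in ["hello", "Hello", "AOL"]:
    print(word, check_exact(word), check_lowercased(word))
# Prints:
#   hello True True
#   Hello False True
#   AOL True False
```

Lower-casing fixes "Hello" but breaks "AOL", because the transformation only works when the dictionary entry itself is lower case.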

We need to come up with a new strategy to deal with this. Note that this issue 
affects only the en_US dictionary.
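One common approach, offered here only as a candidate and not something decided in this thread, is to never lower-case the dictionary's capitalised entries. Instead, the checker allows the input to be a capitalised or all-caps form of an entry. The sketch assumes a plain set of words, and is_correct is a hypothetical name.

```python
# Candidate case-aware lookup (an assumption, not the thread's decision).
DICTIONARY = {"hello", "AOL", "Paris"}


def is_correct(word, dictionary=DICTIONARY):
    """Accept a word if it, or an allowed case variant of it, is listed.

    - An exact match is always accepted ("AOL", "hello", "Paris").
    - A title-case word is accepted if its lower-case form is listed
      ("Hello" -> "hello"), e.g. at the start of a sentence.
    - An all-caps word is accepted if its lower-case or capitalised form
      is listed ("HELLO" -> "hello", "PARIS" -> "Paris").
    - Input is never matched against a *more* capitalised entry, so "aol"
      is still rejected when the dictionary only contains "AOL".
    """
    if word in dictionary:
        return True
    if word[:1].isupper() and word[1:].islower():
        # Title case such as "Hello": try the all-lower-case form.
        if word.lower() in dictionary:
            return True
    if word.isupper():
        # All caps such as "HELLO" or "PARIS".
        if word.lower() in dictionary or word.capitalize() in dictionary:
            return True
    return False


for word in ["Hello", "HELLO", "AOL", "aol", "PARIS", "paris"]:
    print(word, is_correct(word))
```

The asymmetry is the design choice: capitalisation may be added to a listed word, but never removed from one, which keeps "Hello" working without accepting "aol".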

I'm going to work on integrating this new indexing approach with Silpa; once 
I have working code, I'll share it here.

Thanks and Regards
Vasudev Kamath

indexer_v2.py
Description: Text Data
