silpa-discuss

[silpa-discuss] Re: Dictionary Index Generator


From: Vasudev Kamath
Subject: [silpa-discuss] Re: Dictionary Index Generator
Date: Mon, 26 Apr 2010 23:36:00 +0530



On Mon, Apr 26, 2010 at 10:54 PM, Santhosh Thottingal <address@hidden> wrote:
On Sun, Apr 25, 2010 at 5:00 PM, Vasudev Kamath <address@hidden> wrote:
> Hi,
> Finally I was able to write a script which generates index file for a given
> dictionary.

Great!

Thanks :)
> Here are some assumptions:
> 1. If the file is an English dictionary, it is opened with the normal
> open, since the English dictionary encoding is ISO-8859; otherwise files are
> opened with UTF-8 encoding.

We need to fix this. All our dictionaries should be in UTF-8

I found a way to convert the ISO-8859 encoded file to UTF-8, and after converting it, reading works fine. But as you showed me, there is still an issue when checking spelling. I have already written a sample script which reads the new UTF-8 English dictionary and searches for a word that the user enters on the console. It uses the same "i in list" logic to search for the word in the list. For any word I try, such as Aaliyah's, it reports the word as found. So I'm still looking into what is going wrong in the spell checker module.
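
For reference, the conversion and lookup described above can be sketched roughly as follows. This is only an illustration written for present-day Python 3, not the actual script. The function names `convert_to_utf8` and `load_words`, the file names, and the one-word-per-line layout are all assumptions.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Re-encode a word list from src_encoding to UTF-8 (hypothetical helper)."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)


def load_words(path):
    """Read a UTF-8 word list, assuming one word per line."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    # Build a tiny ISO-8859-1 sample dictionary so the sketch is self-contained.
    tmp_dir = tempfile.mkdtemp()
    latin1_path = os.path.join(tmp_dir, "en_latin1.dic")
    utf8_path = os.path.join(tmp_dir, "en_utf8.dic")
    with open(latin1_path, "wb") as handle:
        handle.write("Aaliyah's\nIndia\ncaf\xe9\n".encode("iso-8859-1"))

    convert_to_utf8(latin1_path, utf8_path)
    words = load_words(utf8_path)

    # Same "i in list" style membership test as in the sample script.
    print("Aaliyah's" in words)
```

Because both the word list and the query are decoded to text before comparison, the membership test compares like with like; comparing raw bytes in one encoding against text in another is a common way for such a lookup to fail.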
 
> 2. For English, lowercase and capital letters are treated differently, since
> words starting with a and A begin at different locations in the dictionary.
> Changing this would require fixing the dictionary itself.

We need to consider a and A as different for English. india is a
spelling mistake but India is correct.


Great, then I don't have to worry about capital letters.
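
A small illustration of the case-sensitive behaviour agreed on here, assuming the dictionary is held as a plain Python collection of words (`is_correct` is a hypothetical name, not an existing SILPA function):

```python
def is_correct(word, dictionary_words):
    """Exact, case-sensitive membership test (no lowercasing)."""
    return word in dictionary_words


words = ["India", "apple"]
print(is_correct("India", words))  # True
print(is_correct("india", words))  # False
```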

> 3. I used cPickle instead of saving the index as a normal file. cPickle with
> protocol 2 is used for efficiency, so the index file won't be human readable.
> The reason for using Python pickles as the file format is simply to reduce the
> complexity of processing the index file. If desired, we can create the index
> as a normal file. I need suggestions on this.

I would prefer a plain text index. Keep in mind that one of the SILPA
project's goals is to develop algorithms in a generic way so that they can
be adopted in other technologies and programming languages. So let us keep
our index data in plain text. It won't make much difference, since
the index files are going to be very small.

Yeah, OK. Then I'll write the logic to create the index in the manner we discussed in our previous mail.
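
As a rough sketch of what a plain text index could look like: the layout below (one line per first character, followed by a tab and the byte offset where words with that character begin) is only an assumption for illustration and is not necessarily the format from the previous mail. The function names are hypothetical, and the code targets present-day Python 3.

```python
import os
import tempfile


def build_index(dict_path):
    """Map each first character to the byte offset of its first word.

    Assumes a UTF-8 file with one word per line, grouped by first character.
    Keys are case-sensitive, so 'a' and 'A' get separate entries.
    """
    index = {}
    offset = 0
    with open(dict_path, "rb") as handle:
        for raw_line in handle:
            word = raw_line.decode("utf-8").strip()
            if word and word[0] not in index:
                index[word[0]] = offset
            offset += len(raw_line)
    return index


def write_index(index, index_path):
    """Write the index as plain text: <character><TAB><byte offset>."""
    with open(index_path, "w", encoding="utf-8") as handle:
        for char, offset in sorted(index.items(), key=lambda item: item[1]):
            handle.write("%s\t%d\n" % (char, offset))


def read_index(index_path):
    """Parse the plain text index back into a dict."""
    index = {}
    with open(index_path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                char, offset = line.split("\t")
                index[char] = int(offset)
    return index


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    dict_path = os.path.join(tmp_dir, "sample.dic")
    index_path = os.path.join(tmp_dir, "sample.idx")
    with open(dict_path, "wb") as handle:
        handle.write(b"Apple\nIndia\napple\nindia\n")

    write_index(build_index(dict_path), index_path)
    print(read_index(index_path))
```

Since the file is just tab-separated text, it can be read from any programming language without depending on Python's pickle format.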

> I'm attaching the dictionary indexing script. Please test it and let me know of
> any changes that need to be made.

I will do a detailed testing over this weekend and let you know. If
you have the code for integration with existing spellchecker, please
share.

OK. I haven't written any logic for integrating with the existing SILPA module yet, as I was waiting for your reply. I'll go ahead with the logic for integrating this idea with SILPA, and once I have some working code I'll share it with you.
Thanks
Santhosh


Thanks and Regards
--
Vasudev Kamath
address@hidden
address@hidden
http://vasudevkamath.blogspot.com/


