silpa-discuss

[silpa-discuss] Re: Dictionary Index Generator


From: Santhosh Thottingal
Subject: [silpa-discuss] Re: Dictionary Index Generator
Date: Mon, 26 Apr 2010 22:54:13 +0530

On Sun, Apr 25, 2010 at 5:00 PM, Vasudev Kamath <address@hidden> wrote:
> Hi,
> Finally I was able to write a script which generates an index file for a
> given dictionary.

Great!

> Here are some assumptions:
> 1. If the file is an English dictionary, it is opened with the normal
> open, since the English dictionary encoding is ISO-8859; otherwise files
> are opened with UTF-8 encoding.

We need to fix this. All our dictionaries should be in UTF-8.
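
A one-time re-encoding of the English dictionary could look roughly like
the sketch below. It is only an illustration: the function name and the
paths are hypothetical, it assumes the source file really is ISO-8859-1
(Latin-1), and it is written for Python 3 rather than for the attached
script.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Re-encode a word list from src_encoding to UTF-8.

    The source encoding is an assumption; check the real file first.
    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Once every dictionary is stored as UTF-8, the indexing script can open all
of them the same way and the special case for English goes away.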

> 2. For English, small and capital letters are treated differently, since
> words with a and A start at different locations in the dictionary. To fix
> this, the dictionary itself needs to be fixed.

We need to consider a and A as different for English. "india" is a
spelling mistake, but "India" is correct.


> 3. I used cPickle instead of saving the index as a normal file. cPickle
> with protocol 2 is used for efficiency, and hence the index file won't be
> human readable. The reason for using Python pickles as the file format is
> just to reduce the complexity of processing the index file. If desired, we
> can create the index as a normal file. I need suggestions on this.

I would prefer a plain text index. Keep in mind that one of the SILPA
project's goals is to develop algorithms in a generic way, so that they
can be adopted in other technologies and programming languages. So let us
keep our index data in plain text. It won't make much difference, since
the index files are going to be very small.

> I'm attaching the dictionary indexing script. Please test it and let me
> know of any changes that need to be done.

I will do detailed testing over this weekend and let you know. If
you have the code for integration with the existing spellchecker, please
share.

Thanks
Santhosh



