[silpa-discuss] Modified spellchecker


From: Vasudev Kamath
Subject: [silpa-discuss] Modified spellchecker
Date: Wed, 5 May 2010 22:38:30 +0530
User-agent: KMail/1.12.4 (Linux/2.6.33-3.slh.4-sidux-686; KDE/4.3.4; i686; ; )

Hi,
Please find attached the diff for spellchecker.py in the spellchecker module.
The patch changes the logic to integrate indexing. I'm also attaching the
indexer script, which must be run first to generate an index file for each
dictionary. Please note that, for now, the indexer script has to be placed
inside the spellchecker module to work properly. In the coming days I'll modify
it so that it works independently of its location. Also note that the indexer
script fails for mr_IN.dic. I'm still not sure of the reason; here is the
traceback, followed by the output of the file command for mr_IN.dic:
Traceback (most recent call last):
  File "indexer.py", line 129, in <module>
    index.createIndex("mr_IN.dic")
  File "indexer.py", line 60, in createIndex
    item = self.fp.readline()
  File "/usr/lib/python2.5/codecs.py", line 622, in readline
    return self.reader.readline(size)
  File "/usr/lib/python2.5/codecs.py", line 477, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/lib/python2.5/codecs.py", line 424, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data

dicts/mr_IN.dic: UTF-8 Unicode text, with CRLF, LF line terminators
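
One way to narrow this down is to read the file in binary mode and decode each
line separately, which reports the exact line and bytes that the utf-8 codec
rejects. The sketch below is only an illustration: the function name
find_undecodable_lines is hypothetical, it is not part of indexer.py, and it is
written for a current Python 3 rather than the Python 2.5 shown in the traceback.

```python
def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, raw_bytes, error_message) for each line of
    `path` that cannot be decoded with `encoding`.

    Hypothetical helper for diagnosis only; not part of indexer.py.
    """
    bad = []
    # Binary mode: nothing is decoded while reading, so one bad byte
    # sequence cannot abort the whole scan.
    with open(path, "rb") as fp:
        for lineno, raw in enumerate(fp, 1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as err:
                bad.append((lineno, raw, str(err)))
    return bad
```

Running it on dicts/mr_IN.dic and printing each raw value with repr() should
point at the entries that trip the decoder, assuming the failure comes from the
file contents and not from how indexer.py opens the file.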

One more thing: we need to convert the English dictionary's encoding to UTF-8.
It is currently ISO-8859-1, which causes data loss while reading. If that's OK,
I'll convert the encoding and commit the dictionary to the repo.
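
A minimal sketch of such a conversion, assuming the file really is ISO-8859-1
throughout. The function name convert_to_utf8 and both path parameters are
placeholders, and the code targets a current Python 3.

```python
import io


def convert_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Re-encode a text file from `src_encoding` to UTF-8.

    Hypothetical helper; the paths are supplied by the caller.
    newline="" turns off newline translation so the original line
    endings are written back unchanged.
    """
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

The conversion should be run only once per file. Every byte sequence is valid
ISO-8859-1, so running it again on the UTF-8 output would not raise an error and
would silently double-encode the non-ASCII characters.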

A note on the performance improvement: from what I have seen, the new version of
the spellchecker is noticeably faster than the existing code. The first call
has a slight delay (possibly because the index is loaded from the file).
I tested the performance by running the new code standalone and the existing
silpa over Apache. Please verify this.
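
For readers without the attachments, here is a rough illustration of how a
per-dictionary index can speed up lookups. This is not the attached indexer.py.
It is one possible scheme that assumes a UTF-8 dictionary with one plain word
per line, sorted so that words sharing a first character are contiguous. The
names create_index and lookup and the JSON index format are all hypothetical.

```python
import json


def create_index(dict_path, index_path):
    """Map each first character to the byte offset of the first line
    that starts with it, and save the mapping as JSON.

    Assumes one word per line and a sorted dictionary (hypothetical layout).
    """
    index = {}
    offset = 0
    with open(dict_path, "rb") as fp:
        for raw in fp:
            word = raw.decode("utf-8").strip()
            if word and word[0] not in index:
                index[word[0]] = offset
            offset += len(raw)  # byte offset of the next line
    with open(index_path, "w", encoding="utf-8") as out:
        json.dump(index, out, ensure_ascii=False)
    return index


def lookup(word, dict_path, index):
    """Return True if `word` is in the dictionary, scanning only the
    block of lines that share its first character."""
    if not word:
        return False
    start = index.get(word[0])
    if start is None:
        return False
    with open(dict_path, "rb") as fp:
        fp.seek(start)  # jump straight to the relevant block
        for raw in fp:
            entry = raw.decode("utf-8").strip()
            if not entry.startswith(word[0]):
                break  # past the block (relies on the sorted assumption)
            if entry == word:
                return True
    return False
```

With a scheme like this, the one-time cost is reading the index file, which
would fit the slight delay seen on the first call. After that, each lookup
scans only a fraction of the dictionary.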

Thanks and Regards
Vasudev Kamath

Attachment: spellchecker.diff
Description: Text Data

Attachment: indexer.py
Description: Text Data
