Mon, 7 Jul 2014 23:51:35 +0530
I'll put this on the blog soon. Here is a quick summary of what I did:
1) The stemmer now looks at the syllable preceding the suffix before stemming. This solved a lot of problems with stem rule parsing. It is done through an exceptions table in ml.vst. Each row contains a stem rule and its exception. The stem rule is not applied if the syllable preceding the suffix is the exception.
2) It will not stem very short words, and will not stem if the length of the word minus the length of the suffix is less than 2 syllables.
3) Created a data set of 1000 words from the history section of Malayalam Wikipedia and tested stemming on it. It stems with 94% accuracy (when a word is already in base form, leaving it unstemmed counts as a correct result).
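As a rough illustration of the three points above, here is a minimal Python sketch. It is not the actual Varnam code or the layout of the table in ml.vst. The rule list, the exceptions mapping, the syllables and the function names are all made-up placeholders, and words are assumed to be already split into syllables.

```python
# Hypothetical sketch of the rules described above. Nothing here is taken
# from Varnam itself; all names and data are placeholders.

MIN_STEM_SYLLABLES = 2  # point 2: at least 2 syllables must remain

# Each stem rule maps a suffix (tuple of syllables) to its replacement.
STEM_RULES = [
    (("kal",), ()),
    (("yil",), ()),
]

# Point 1: for each suffix, the preceding syllables that block the rule.
EXCEPTIONS = {
    ("kal",): {"na"},
}


def stem(syllables):
    """Return the stemmed word as a tuple of syllables."""
    word = tuple(syllables)
    for suffix, replacement in STEM_RULES:
        n = len(suffix)
        if len(word) <= n or word[-n:] != suffix:
            continue  # suffix does not match
        remaining = word[:-n]
        if len(remaining) < MIN_STEM_SYLLABLES:
            continue  # what is left would be too short
        if remaining[-1] in EXCEPTIONS.get(suffix, set()):
            continue  # syllable before the suffix is an exception
        return remaining + replacement
    return word  # no rule applied; word is left as it is


def accuracy(pairs):
    """Fraction of (word, expected_stem) pairs stemmed correctly.

    A base-form word that is left unstemmed counts as correct,
    because its expected stem is the word itself (point 3).
    """
    correct = sum(1 for word, expected in pairs
                  if stem(word) == tuple(expected))
    return correct / len(pairs)
```

With these placeholder rules, ("pa", "ra", "kal") loses its suffix, while ("pa", "na", "kal") is blocked by the exception and ("pa", "kal") is left alone because only one syllable would remain.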
Within the next few days:
1) More data sets. Approximately 5000 words in total for testing.
2) Tweak stem rules even more.
3) Finally test the learning.
- [Varnamproject-discuss] Update,
Kevin Martin <=