I was testing the accuracy of transliteration before and after applying the stem patch. With a very small paragraph, after learning only the words in 0.txt, the improvement is just 1 word. But that would only be testing transliteration, right? Wouldn't it be more meaningful to feed the entire word corpus into varnam, export the suggestions database, and compare it with the original word corpus? The newly exported corpus should be larger than the original one.
For more accurate metrics, I could do the same with a corpus of 1000 words and count how many new meaningful words get added to the corpus. What do you think?
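
The comparison itself could be something like the rough sketch below. It assumes both the original corpus and the exported suggestions have been dumped to plain-text files with one word per line; that format, the file names, and the helper names `load_words` and `new_words` are my assumptions for illustration, not varnam's actual export format or API.

```python
import sys
from pathlib import Path


def load_words(path):
    """Read a UTF-8 word list (one word per line) into a set, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def new_words(original, exported):
    """Return the words found in the exported set but not in the original, sorted."""
    return sorted(exported - original)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare.py original.txt exported.txt
    original = load_words(sys.argv[1])
    exported = load_words(sys.argv[2])
    added = new_words(original, exported)
    print(f"original: {len(original)}  exported: {len(exported)}  new: {len(added)}")
    for word in added:
        print(word)
```

This only tells us which words are new. Whether each one is actually meaningful would still have to be checked by hand.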