varnamproject-discuss

Re: [Varnamproject-discuss] Varnam GSOC ideas


From: Kevin Martin
Subject: Re: [Varnamproject-discuss] Varnam GSOC ideas
Date: Tue, 17 Feb 2015 14:22:11 +0530

I've been thinking about this idea. It's true that it is relevant only until varnam supports all languages. But the rules for the stemmer also go into the scheme file. To improve the stemmer or add one for other languages, we have to edit the scheme file again. For someone not well versed in Ruby, the syntax (all the square brackets and curlies) might look a bit intimidating. I remember soorej having difficulty reading the scheme file. So if we come up with an editor/GUI tool that makes editing the scheme file more intuitive for the end user, I think it will eventually lift the burden of adding language/stemmer support from the developer. No matter how many comments we include in the scheme file, an end user will always prefer a GUI to a command-line interface.

Also, can varnam be adapted to handle non-Indic languages? I know that the database now contains Sanskrit-based entries like swaras and viramas. But if someone wants to, say, add support for Arabic, will it need changes to the underlying logic?

Also, does varnam now compile under Windows?

On Fri, Feb 13, 2015 at 10:51 AM, Navaneeth K N <address@hidden> wrote:
We learned more words in Kannada. The whole Wikipedia dump was fed into varnam. I haven’t released it yet.

---
Navaneeth



> On 13-Feb-2015, at 10:48 am, Kevin Martin <address@hidden> wrote:
>
> 21 GB? That's way too much! But how come Kannada takes so much space when Malayalam is less than 1 GB?
>
> But I think we should get varnam into mobile. That will result in the project getting the popularity it deserves.
>
> On Fri, Feb 13, 2015 at 10:34 AM, Navaneeth K N <address@hidden> wrote:
> Hi Kevin,
>
> Yes. That is a good idea. But it is usable only until varnam supports all the languages. I think as part of GSOC, we should target something that can serve long term, like the stemmer implementation.
>
> I was thinking of getting varnam into the mobile space and integrating it with Indic keyboard, or building a new varnam keyboard altogether. What do you think about this?
>
> Or rewrite the learning algorithm so that the learned data takes less space. This is critical when getting into the mobile space and offline editing. Currently, the Kannada learned file takes about 21 GB of space.
>
> thoughts?
>
> —
> Navaneeth
>
> > On 12-Feb-2015, at 9:20 pm, Kevin Martin <address@hidden> wrote:
> >
> > Hi,
> >
> > I was wondering about a few project ideas related to varnam for students this year. When soorej and I were working on the inscript support, soorej had to modify the scheme file. He mentioned that it would be nice if we had a GUI tool that makes it easy to write the scheme file. Can this be a GSOC project idea, perhaps of a lower priority? Maybe we can add a few more related tasks to this.
>
>
>



