[silpa-discuss] Fwd: Re: Fw:GSOC'13


From: Vasudev Kamath
Subject: [silpa-discuss] Fwd: Re: Fw:GSOC'13
Date: Thu, 11 Apr 2013 19:54:02 +0530
User-agent: alot/0.3.4

To everyone who has doubts about what the "improving transliteration"
project involves: please read through this mail.


Forwarded message from Vasudev Kamath (2013-04-09 09:38:54):
> Hello Yash,
> 
> On Tue, Apr 9, 2013 at 2:31 AM, Yash Sinha <address@hidden> wrote:
> > Sir,
> > I am Yash from BITS Pilani. I have experience in Python programming. I am
> > interested in "Improving cross language transliteration system." As of now,
> > I have joined the mailing list and am trying to install and understand the
> > transliteration module. I have also sent this mail to the mailing list.
> 
> Thank you for taking an interest in the project. And yes, you can post
> to the mailing list; we are both subscribed there, along with Santhosh,
> the main developer of the project. For now I'm keeping him in CC on
> this mail.
> 
> >
> > 1. Please let me know which Linux version to use to install the
> > transliteration module.
> 
> The operating system is not a constraint; please choose a distro of
> your liking. We only require Python, pip, virtualenv and a few more
> packages. The transliteration module currently lives in its own git
> repo, which can be found here [1].
> 
> [1] https://github.com/Project-SILPA/Transliteration
> 
> > 2. What approach is being considered for developing a better method for
> > English to Indic transliteration?
> 
> As mentioned on the ideas page, we currently use CMUDict, a
> pronunciation dictionary [2]. Let me explain the basic idea behind the
> transliteration. Consider this entry:
> 
> ABDUCTING AE B D AH K T IH NG
> 
> See the code here [3]. Basically, the word is the key in a dictionary
> and the remaining part is how the word sounds when you pronounce it.
> We already have a mapping for each such sound in cmumapping.py[4]
> 
> For example, consider AE. In Kannada it is mapped like this:
> 
> "AE": "ಏ"
> 
> So we first find the equivalent for each sound and then apply some
> language-specific normalizations to construct a proper transliterated
> word. I'm just giving the basic idea; you can read the code to see how
> it really works.
> 
> As you can see, we currently only have mappings for Kannada and
> Malayalam, since Santhosh and I know these two languages well. All
> other languages go through the cycle en -> ml -> language, which is
> error prone because of differences between the languages. We basically
> want to fix this first by improving the existing mappings and adding
> mappings for other Indian languages.
> 
> Secondly, the CMUDict version we use is probably old, or a newer
> version has not been released (I'm not sure). This likely limits the
> number of English words that can be transliterated, so we probably
> need to improve this dictionary or find another approach to handle
> this.
> 
> @Santhosh, can you explain what ideas you have on this?
> 
> [2] https://raw.github.com/Project-SILPA/Transliteration/master/transliteration/cmudict.0.7a_SPHINX_40
> [3] https://github.com/Project-SILPA/Transliteration/blob/master/transliteration/cmudict.py
> 
> > 3. Any other relevant information which I may be missing out?
> >
> 
> You will probably need some knowledge of Git, as it is the version
> control system we use.
> 
> 
> @Santhosh, I've forgotten the password for the mailing list admin
> area. Can you please moderate Yash's last mail?
> 
> Yash, from the next mail onwards let's move this discussion to the
> mailing list.
> 
> Best Regards
> --
> 
> Vasudev Kamath
> http://copyninja.info
> address@hidden|vasudev.homelinux.net}
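
The lookup-then-map idea described in the forwarded message can be
sketched roughly as below. This is only an illustration, not the code
in cmudict.py or cmumapping.py: the function names, the SAMPLE_MAPPING
table and the bracket convention for unmapped sounds are all
assumptions made up for this sketch. Only the "AE" entry and the
ABDUCTING line come from the mail, and the language-specific
normalization step is left out.

```python
# Hypothetical sketch: names and the tiny mapping table are illustrative,
# not the actual contents of cmudict.py or cmumapping.py.


def parse_cmudict_line(line):
    """Split one CMUDict entry into (word, list of phoneme symbols)."""
    word, *phonemes = line.split()
    return word, phonemes


def load_pronunciations(lines):
    """Build a word -> phonemes dictionary, skipping blanks and comments."""
    table = {}
    for line in lines:
        line = line.strip()
        # CMUDict comment lines start with ";;;"
        if not line or line.startswith(";;;"):
            continue
        word, phonemes = parse_cmudict_line(line)
        table[word] = phonemes
    return table


# Only the "AE" entry is taken from the mail; a real table would cover
# every phoneme for each target language.
SAMPLE_MAPPING = {"AE": "ಏ"}


def map_phonemes(phonemes, mapping):
    """Map each phoneme to the target script.

    Phonemes without a mapping are kept in square brackets so that
    gaps in the table stay visible.
    """
    return [mapping.get(p, "[" + p + "]") for p in phonemes]


pronunciations = load_pronunciations(["ABDUCTING AE B D AH K T IH NG"])
print(map_phonemes(pronunciations["ABDUCTING"], SAMPLE_MAPPING))
```

Keeping unmapped phonemes visible, rather than dropping them, is one
way to spot which entries a new language mapping is still missing.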

Vasudev Kamath
http://copyninja.info
Connect on ~friendica: address@hidden | vasudev.homelinux.net}
IRC nick: copyninja | vasudev {irc.oftc.net | irc.freenode.net}
GPG Key: C517 C25D E408 759D 98A4  C96B 6C8F 74AE 8770 0B7E
