varnamproject-discuss

[Varnamproject-discuss] tool to build and analyze text corpus


From: kiran ps
Subject: [Varnamproject-discuss] tool to build and analyze text corpus
Date: Sun, 16 Mar 2014 13:05:37 +0530

Adding new words to the varnam corpus only ever increases its size. Words in the corpus are not proofread, so the corpus grows day by day without any quality check. Each word needs to be reviewed before it is added to the corpus. The current word corpus consists only of each word and its frequency.
 
My idea is to create a tool to build and analyze text corpora for Indian languages: software that allows editors to write accurate and meaningful entries and annotations, track and record the latest developments in the language, find how new words and senses are emerging, and spot other trends in usage, spelling, world English, and so on.
 
Data can be added to the corpus by crawling the web, drawing on everything from literary novels and specialist journals to everyday newspapers, magazines, blogs, emails, and social media. Words learned by offline IMEs can also be dumped into this corpus.
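
As a rough illustration of the idea, here is a minimal Python sketch of a corpus that records, for each word, its frequency, the sources it came from, and whether an editor has reviewed it. This is not varnam's actual storage format or API; the names Entry, tokenize, add_text and pending_review and the source labels "web" and "ime" are all made up for this example. Tokenizing is deliberately naive (split on whitespace, trim punctuation) so that combining vowel signs inside Indic words stay attached.

```python
import unicodedata
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical corpus record: word and frequency, plus review metadata."""
    word: str
    frequency: int = 0
    sources: Counter = field(default_factory=Counter)  # count per source label
    reviewed: bool = False  # set by an editor after proofreading


def tokenize(text):
    """Split on whitespace and trim punctuation/symbols from both ends."""
    for raw in text.split():
        start, end = 0, len(raw)
        while start < end and unicodedata.category(raw[start])[0] in "PS":
            start += 1
        while end > start and unicodedata.category(raw[end - 1])[0] in "PS":
            end -= 1
        if start < end:
            yield raw[start:end]


def add_text(corpus, text, source):
    """Count every word of `text` into `corpus`, tagged with its source."""
    for word in tokenize(text):
        entry = corpus.setdefault(word, Entry(word))
        entry.frequency += 1
        entry.sources[source] += 1


def pending_review(corpus):
    """Unreviewed entries, most frequent first, for editors to work through."""
    unreviewed = (e for e in corpus.values() if not e.reviewed)
    return sorted(unreviewed, key=lambda e: (-e.frequency, e.word))


corpus = {}
add_text(corpus, "മലയാളം ഭാഷ, മലയാളം.", "web")  # e.g. crawled text
add_text(corpus, "ഭാഷ", "ime")                   # e.g. word learned by an IME
corpus["മലയാളം"].reviewed = True                 # an editor approves one word
print([e.word for e in pending_review(corpus)])
```

Keeping the source labels next to the frequency would let editors see where a new word is turning up, and the reviewed flag keeps unproofread words apart from approved ones.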
 
Reference
Oxford English Corpus
http://www.oxforddictionaries.com/words/the-oxford-english-corpus
