From: Navaneeth
Subject: [Varnamproject-discuss] [bug #41902] [libvarnam] Normalization of words while learning
Date: Wed, 19 Mar 2014 04:50:09 +0000
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:27.0) Gecko/20100101 Firefox/27.0

URL: <http://savannah.nongnu.org/bugs/?41902>

Summary: [libvarnam] Normalization of words while learning
Project: Varnamproject
Submitted by: navaneethkn
Submitted on: Wed 19 Mar 2014 01:50:08 PM TLT
Category: libvarnam
Severity: 3 - Normal
Item Group: Bug
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any

_______________________________________________________

Details:

In Unicode, there are characters that look the same but have different code points. The atomic chillus in Malayalam are an example: a chillu can be encoded either as a single atomic code point or as a consonant + virama + ZERO WIDTH JOINER sequence. Unicode text can also contain invisible formatting characters such as SOFT HYPHEN (U+00AD). Such characters have to be removed, and the word normalized to a standard form, while learning a word.

The scheme file will define the word normalization rules, and varnam will apply them while learning.

_______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/bugs/?41902>

_______________________________________________
Message sent via/by Savannah
http://savannah.nongnu.org/
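
The normalization described in the report could look roughly like the sketch below. It is written in Python for brevity and is not libvarnam code: the names NORMALIZATION_RULES, IGNORABLE and normalize_word are hypothetical, the rule table stands in for whatever the scheme file would define, and the choice of the atomic chillu as the standard form is an assumption (the report does not say which form is preferred).

```python
# Hypothetical rule table standing in for rules a scheme file might define.
# Assumption: the atomic chillu is treated as the standard form.
NORMALIZATION_RULES = {
    "\u0d23\u0d4d\u200d": "\u0d7a",  # NNA + VIRAMA + ZWJ -> CHILLU NN
    "\u0d28\u0d4d\u200d": "\u0d7b",  # NA  + VIRAMA + ZWJ -> CHILLU N
    "\u0d32\u0d4d\u200d": "\u0d7d",  # LA  + VIRAMA + ZWJ -> CHILLU L
    "\u0d33\u0d4d\u200d": "\u0d7e",  # LLA + VIRAMA + ZWJ -> CHILLU LL
}

# Characters dropped outright. Only SOFT HYPHEN is listed, because other
# invisible characters (ZWJ, ZWNJ) can be meaningful in Malayalam text.
IGNORABLE = {"\u00ad"}


def normalize_word(word, rules=NORMALIZATION_RULES, ignorable=IGNORABLE):
    """Return `word` with ignorable characters removed and rules applied."""
    # Strip ignorable characters first so they cannot split a sequence
    # that a rule is meant to match.
    cleaned = "".join(ch for ch in word if ch not in ignorable)
    # Apply each replacement rule to the cleaned word.
    for source, target in rules.items():
        cleaned = cleaned.replace(source, target)
    return cleaned


if __name__ == "__main__":
    legacy = "\u0d05\u0d35\u0d28\u0d4d\u200d"   # chillu as a ZWJ sequence
    atomic = "\u0d05\u0d35\u0d7b"               # same word, atomic chillu
    hyphenated = "\u0d05\u00ad\u0d35\u0d7b"     # atomic form with SOFT HYPHEN
    print(normalize_word(legacy) == normalize_word(atomic))
    print(normalize_word(hyphenated) == atomic)
```

With rules like these, all three spellings of the same word would be learned as a single entry rather than as three distinct words.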