[Savannah-hackers] submission of Book Index Generator for Thai Book - s
From: b4205072
Subject: [Savannah-hackers] submission of Book Index Generator for Thai Book - savannah.nongnu.org
Date: Sun, 05 Jan 2003 02:14:04 -0500
User-agent: Mozilla/5.0 Galeon/1.2.7 (X11; Linux i686; U; ) Gecko/20021208 Debian/1.2.7-5
A package was submitted to savannah.nongnu.org
This mail was sent to address@hidden, address@hidden
Vee Satayamas <address@hidden> described the package as follows:
License: gpl
Other License:
Package: Book Index Generator for Thai Book
System name: thbookidx
Type: non-GNU
Description:
Book Index Generator for Thai Book automatically generates the index
at the back of a book. It requires Thai text processing.
Thai is an Asian language that has no spaces between
words; spaces are used to separate sentences.
This project generates a back-of-book index based on the Salton
algorithm, which calculates the weight of
each word to determine whether the word is important enough to
be an index entry. The major task of this project, however, is
to process Thai text, which requires:
1. Word segmentation, because there are no spaces between
Thai words. Nowadays the effective algorithms for segmenting Thai
words are based on a dictionary, but adding every word to the
dictionary is not possible, and there is a lot of ambiguity in
determining word boundaries. I am trying to improve this process, and
it has become a subproject, which can be found at
http://thaiwordseg.sourceforge.net/
2. Noun phrase analysis and word formation.
A back-of-book index contains not only words but also phrases,
so phrases and complex words need to be found as well.
It is also important to identify index entries and their subentries.
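To illustrate the dictionary-based segmentation and the boundary ambiguity described in step 1, here is a minimal sketch of greedy longest matching. It is not the thaiwordseg implementation; the function name `longest_match_segment` and the four-word dictionary are hypothetical and chosen only for this example.

```python
def longest_match_segment(text, dictionary):
    """Greedy longest-match segmentation against a set of known words."""
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                break
        else:
            # No dictionary word starts here: emit a single character.
            candidate = text[i]
        words.append(candidate)
        i += len(candidate)
    return words


# Hypothetical toy dictionary. The string "ตากลม" can be read as
# "ตา กลม" or as "ตาก ลม"; greedy matching commits to the second.
toy_dictionary = {"ตา", "กลม", "ตาก", "ลม"}
segments = longest_match_segment("ตากลม", toy_dictionary)
```

Because the greedy strategy always takes the longest word first, it cannot choose between the two readings on its own, and any word missing from the dictionary falls apart into single characters. These are the two weaknesses the description points out.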
There are some class diagrams and a proposal for this project,
written in Thai, at
http://vivaldi.cpe.ku.ac.th/~vee/wiki.php/BookIndex
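The word weighting mentioned in the description can be sketched as follows. The submission does not say which formula is meant by "Salton algorithm", so this example assumes the common tf-idf weighting associated with Salton's work (term frequency times the logarithm of inverse document frequency). The function name `tf_idf`, the idea of treating each book section as one document, and the sample words are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(sections):
    """Return one {word: weight} dict per section (each a list of words)."""
    n = len(sections)
    # Document frequency: in how many sections does each word occur?
    df = Counter()
    for words in sections:
        df.update(set(words))
    weights = []
    for words in sections:
        tf = Counter(words)  # raw term frequency within this section
        weights.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return weights


# Hypothetical, already segmented input: two sections of a book.
sections = [["index", "word", "word"], ["index", "phrase"]]
weights = tf_idf(sections)
```

A word that appears in every section, such as "index" here, gets weight zero, while a word concentrated in one section gets a higher weight. Under this assumption, words whose weight exceeds some chosen threshold would be the candidates for index entries.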
Other Software Required:
Other Comments: