[Varnamproject-discuss] Loading the words database


From: Navaneeth K N
Subject: [Varnamproject-discuss] Loading the words database
Date: Tue, 12 Aug 2014 18:46:00 +0530

Hello Sooraj,

Here is how you can set up the word corpus. Currently, learning is a bit
slow, so loading the full corpus will take around 1 hour. If you prefer,
you can learn individual files first and add the other files incrementally.

Here are the steps:

Download:

        wget http://download.savannah.gnu.org/releases/varnamproject/words/ml/word-corpus.tar.gz

        tar -xvf word-corpus.tar.gz

If you want to set up individual files:

        cd words
        varnamc -s ml --learn-from 0.txt

This learns all the words in 0.txt. You can do the other files one by one.

If you want to learn all the words (this takes more time and is CPU intensive):

        varnamc -s ml --learn-from words

This reads all the files under the 'words' directory and learns all the
words from them. Run it from the directory that contains 'words', not
from inside it.
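
If the full run is too long to do in one sitting, the one-file-at-a-time
approach can be scripted. The following is only a rough sketch, not part of
varnam: the function name learn_incrementally and the progress file
(<dir>.learned, written next to the corpus directory) are made up for
illustration, and it assumes the other corpus files are named *.txt like
0.txt. It uses only the varnamc options shown above.

```shell
#!/bin/sh
# Sketch: learn each file of a corpus directory separately and remember
# which files are done, so an interrupted run can be resumed later.
# learn_incrementally and the ".learned" progress file are hypothetical
# helpers, not something varnamc provides.
learn_incrementally() {
    dir=${1%/}
    done_list="$dir.learned"    # progress file kept outside the corpus dir
    touch "$done_list"
    for f in "$dir"/*.txt; do   # assumes corpus files end in .txt
        [ -e "$f" ] || continue
        # Skip files that were already learned in an earlier run
        if grep -Fxq -- "$f" "$done_list"; then
            continue
        fi
        # Stop at the first failure so the file is retried next time
        varnamc -s ml --learn-from "$f" || return 1
        printf '%s\n' "$f" >> "$done_list"
    done
}
```

After sourcing the script, run "learn_incrementally words" from the
directory that contains 'words'. Running it again later continues with the
files that have not been learned yet.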

Hope that helps. Let me know if you need any other help. This process
will be simplified once I am done with the new algorithm I am working on,
which will also reduce the storage space requirements.

-- 
Cheers,
Navaneeth