varnamproject-discuss

[Varnamproject-discuss] Frequency calculation


From: Kevin Martin
Subject: [Varnamproject-discuss] Frequency calculation
Date: Thu, 24 Apr 2014 23:30:38 +0530

I want to get more familiar with the code base and was hoping to work on this issue:

https://savannah.nongnu.org/bugs/?40401

A simple but inefficient solution would be to use a float instead of an int and increment the frequency by 0.001 instead of 1. I guess that would make the whole program slower, since working with floats tends to have more overhead.

I believe that we are only interested in the relative frequencies here. We could set a frequency threshold of, say, 1000. If the frequency of the most-used word exceeds that of the word with the second highest (or third, or whatever) frequency by 1000 or more, we apply a normalization function. As a result, rarely used words are reset to a frequency of 0 (or 1) and the frequencies of the other words are scaled down in proportion. It is sort of like a percentile system, except that it keeps resetting.
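
To make the idea concrete, here is a rough sketch in Python (not libvarnam code). The function name normalize_frequencies, its parameters, and the choice of integer division as the normalization function are all assumptions made for illustration. The proposal above does not fix a particular normalization function, so treat this as one possible reading of it.

```python
def normalize_frequencies(freqs, threshold=1000, rank=2, floor=1):
    """Rescale word frequencies when the top word leads by `threshold` or more.

    freqs     -- dict mapping word -> integer frequency (hypothetical storage)
    threshold -- gap that triggers normalization (1000, as in the proposal)
    rank      -- which word to compare the top word against (2 = second highest)
    floor     -- value rarely used words are reset to (the proposal says 0 or 1)
    """
    if len(freqs) < rank:
        return dict(freqs)

    ordered = sorted(freqs.values(), reverse=True)
    top, reference = ordered[0], ordered[rank - 1]
    if top - reference < threshold:
        # Gap is still small: leave the frequencies untouched.
        return dict(freqs)

    # Assumed normalization: integer-divide by the smallest divisor that
    # brings the top frequency below the threshold. Only integer arithmetic
    # is used, so no floats are needed.
    divisor = top // threshold + 1
    return {word: max(floor, count // divisor) for word, count in freqs.items()}


if __name__ == "__main__":
    sample = {"a": 5000, "b": 1200, "c": 3}
    print(normalize_frequencies(sample))
```

With the sample above the gap between the top two words is 3800, so the divisor is 6. The relative order of the words is preserved, and the rarely used word "c" is reset to the floor value. Keeping the arithmetic in integers avoids the float overhead mentioned earlier, at the cost of losing resolution among low-frequency words.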

