aspell-user

Neural nets and spelling


From: Asger K. Alstrup Nielsen
Subject: Neural nets and spelling
Date: Tue, 10 Nov 1998 10:05:47 +0100 (MET)

> The only problem is that when used as a library or through aspell -a
> mode the application will have to communicate the replacement word used
> to the spell checker.

In the case of a library, it would be easy to add: just add a function
that, given a string, returns either the same string or a replacement.

In the case of ispell compatibility mode, just ignore it.
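A minimal sketch of such a library function, in Python for illustration
(the function name and the corrections table are my assumptions, not
Aspell's actual API):

```python
# Hypothetical sketch: a function that, given a string, returns either
# the same string or a replacement.  The corrections table stands in for
# whatever replacements the spell checker has recorded from the user.
def check(word, corrections):
    """Return the recorded replacement for `word`, or `word` unchanged."""
    return corrections.get(word, word)

corrections = {"teh": "the", "recieve": "receive"}
print(check("teh", corrections))    # a replaced string: "the"
print(check("hello", corrections))  # the same string back: "hello"
```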

> Some one else also discussed using a neural network that Aspell could
> adapt it self to how the user misspells.  What do you think of that
> idea?

A neural net is nothing more than a function learner.  Given some input,
it learns to produce a specific output.  The interesting part is that, in
doing so, it also gains the ability to generalize the function, so that it
can provide an estimate for unknown inputs.

If we were to use a neural net to accomplish the above task in one go, we
would need a huge neural net.  Assuming a feed-forward net (the most common
and best investigated kind), encoding each letter will require at least one
input, or up to as many inputs as there are letters.  We need such an
encoding for every letter in the word, and then the same for the output, or
arguably more, because nothing guarantees that the replacement word has the
same number of letters as the input word.

In between, we need a hidden layer of nodes.  The number of nodes needed
is difficult to determine in advance; it depends on the number of input
nodes used and on the problem at hand.

Ignoring all of these unknowns, let's speculate and look at how many
connections we'll need in an optimistic setting:

Assume that each letter is encoded using 5 inputs, and that we have 26
letters.  We want to restrict ourselves to words of up to 10 letters.
Let's be optimistic and assume that we can settle for 10 hidden nodes.

26*5*10 inputs connected to 10 hidden nodes that are connected 
to 26*5*10 outputs.

That gives us 1300*10 + 10*1300 connections: a total of 26000 connections
in an optimistic scenario.
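The arithmetic above can be checked in a few lines (Python here purely
for the bookkeeping; the sizes are the optimistic assumptions from the
text):

```python
# Optimistic sizing assumptions from the text.
inputs_per_letter = 5
alphabet = 26
max_word_length = 10
hidden_nodes = 10

# Input and output layers each have 26*5*10 nodes.
io_nodes = alphabet * inputs_per_letter * max_word_length

# Fully connected: input->hidden plus hidden->output connections.
connections = io_nodes * hidden_nodes + hidden_nodes * io_nodes

print(io_nodes)     # 1300
print(connections)  # 26000
```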

In order to train the network, we have to collect a body of examples from
which it can learn.  Given that the space is huge (it's a function from a
1300-dimensional space to another 1300-dimensional space), we need a huge
body of examples if we want predictable behaviour for even a minor
fraction of the input space.

Judging from this, I doubt a neural net is useful in this situation with
the setup described above.

One could arguably design a different neural net that solves a smaller
task within the bigger task of suggesting spelling corrections.  That
would probably be more feasible.  For instance, instead of feeding it the
entire word, one would feed a part of the word, and instead of getting a
complete replacement word, we would get only one letter at a time.

In this way, the neural net reduces to something that we can handle, but
the cost is that the net isn't as general.

However, it might work, and it's worth trying out.
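To make the reduced formulation concrete, here is a sketch of the data
preparation it implies: slide a fixed window over the misspelled word and
pair each window with one letter of the correction.  The window size, the
padding character, and the same-length assumption are all my own
illustrative choices, not a worked-out design:

```python
PAD = "_"

def window_pairs(misspelled, corrected, window=3):
    """Yield (window, target_letter) training pairs for a small net,
    assuming the misspelling and its correction have the same length."""
    # Pad so every letter position gets a full-width window.
    pad = PAD * (window // 2)
    padded = pad + misspelled + pad
    return [(padded[i:i + window], target)
            for i, target in enumerate(corrected)]

print(window_pairs("teh", "the"))
# [('_te', 't'), ('teh', 'h'), ('eh_', 'e')]
```

Each pair is one training example for the smaller net: a few letters of
context in, a single corrected letter out.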

There are a few related articles about hyphenation and neural nets on the net:

http://ilk.kub.nl/~antalb/pubs.html

http://www.neuroinformatik.ruhr-uni-bochum.de/ini/PEOPLE/fritzke/papers/papers.html

The first is the best.

If you need a general introduction to neural nets, check out the FAQ at:

http://wwwipd.ira.uka.de/~prechelt/FAQ/neural-net-faq.html

(This points to the newer FAQ, but when I tried it, the link was too slow
for me.)

In the FAQ, there are references to general neural net frameworks that can
help build such a thing.

Good luck,

Asger Alstrup


