aspell-user

Re: Neural nets and spelling


From: Kevin Atkinson
Subject: Re: Neural nets and spelling
Date: Tue, 10 Nov 1998 12:00:31 -0500

Ian Clarke wrote:
> 
> > But I agree with you that we can get a useful subset of generalization
> > from a few thousand examples, provided we have a sane network topology.
> > However, that's still a lot.
> 
> True, we could possibly make use of a web page to collect training data
> though.
> 
> > The net is trivially not as general, because with a window, we will
> > not be able to generalize beyond the window size.
> 
> Ah, I see now, we are talking about different things.  You are talking
> about using the neural net to detect spelling mistakes, I am talking about
> using them to translate words into their 'soundalikes'.  At present a very
> simple algorithm is used to do this which involves removing vowels and
> converting k's to c's or something like that.  A neural network should be
> able to perform this task much better than this simple algorithm.

The algorithm is actually a little more complicated than that.  Look at
metaphone.cc.  But the problem is that it is too rough.  Thus fine,
fain, fan, phone, fone, etc. all get the same key.
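
To make the collision concrete, here is a minimal sketch of a crude key
function along the lines Ian described (drop vowels, turn k into c).  It is
NOT the code in metaphone.cc; the name rough_key and the "ph" to "f" rule
are assumptions added only for illustration.

```python
VOWELS = set("aeiou")

def rough_key(word: str) -> str:
    """Toy phonetic key. A stand-in for illustration, not the real metaphone."""
    w = word.lower().replace("ph", "f").replace("k", "c")
    if not w:
        return ""
    # Keep the first letter, then drop every later vowel.
    return (w[0] + "".join(ch for ch in w[1:] if ch not in VOWELS)).upper()

words = ["fine", "fain", "fan", "phone", "fone"]
keys = {w: rough_key(w) for w in words}
for w, k in keys.items():
    print(f"{w:6} -> {k}")
```

Every word in the list collapses to one key, so the key alone cannot tell
phone apart from fine when ranking suggestions for fone.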

The problem right now is that while my spell checker does a good job of
coming up with suggestions, it is not very accurate.  That is, the most
likely candidate is not first on the list.

For example, "fone" would produce the following list:

fine
fain
fan
fans
faun
fawn
fen
fin
fines
fins
fun
phone
...(about 20 more)

As you can see, the most likely candidate, phone, is near the middle of the
list.

What I really need is an algorithm to convert fone to its phonemes so
that fone and phone will have the same phonemes and thus phone will be
at the top of the list.

My current system of scoring is a weighted average between the score of
the actual spelling of the word and the score of the metaphone.  However,
because all the words shown above have the same metaphone, fine comes
first because it differs from fone by only one letter.  There is nothing
I can really do to correct it without some sort of other scoring algorithm.

One way to improve this is to also score the phonemes of the two words.
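
A rough sketch of why the extra phoneme term helps.  Everything here is
an assumption made for illustration: edit distance stands in for the real
scoring functions, lower scores are taken to be better, the weights are
arbitrary, and the metaphone keys and ARPAbet-style phoneme lists are
written by hand rather than produced by Aspell or any dictionary.

```python
def edit_distance(a, b):
    """Levenshtein distance over two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

# Hand-written, hypothetical data: all three words share one metaphone key.
METAPHONE = {"fone": "FN", "fine": "FN", "phone": "FN"}
PHONEMES = {
    "fone":  ["F", "OW", "N"],  # assumed intended pronunciation
    "phone": ["F", "OW", "N"],
    "fine":  ["F", "AY", "N"],
}

def score(word, cand, w_spell, w_meta, w_phon):
    """Weighted sum of the three distances; lower means a closer match."""
    return (w_spell * edit_distance(word, cand)
            + w_meta * edit_distance(METAPHONE[word], METAPHONE[cand])
            + w_phon * edit_distance(PHONEMES[word], PHONEMES[cand]))

def rank(word, candidates, weights):
    return sorted(candidates, key=lambda c: score(word, c, *weights))

candidates = ["fine", "phone"]
spelling_and_metaphone = rank("fone", candidates, (0.5, 0.5, 0.0))
with_phonemes = rank("fone", candidates, (0.25, 0.25, 0.5))
print(spelling_and_metaphone)
print(with_phonemes)
```

With only spelling and metaphone, the metaphone term is zero for both
candidates, so the one-letter spelling difference decides and fine ranks
ahead of phone.  Once the phoneme distance carries weight, phone matches
exactly on that term and moves to the front.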

(1) However, if a neural net can be used to score the whole thing and
get better results, then go for it.  Just keep in mind that the neural
network will only be scoring the words; it would not be coming up with
new ones, as I will use my current method for that (unless it can
somehow do a better job).  See the How it Works chapter.  You could use my
test data as a start for training, but you would probably need a lot more
than that.

(2) Then again, a to-phoneme algorithm would also work.  The only
question is getting enough of the right kind of test data.  Does anyone
know of a publicly available pronunciation dictionary?

My only question is how fast would it be compared to my current method?
(Sorry, I am really AI illiterate.)  In particular, how fast would an
AI-based to-phoneme conversion be compared to a hand-tuned one?  Also,
which one do you think would be better, the first or the second?

> 
> > First step is to collect a test body.  Then it's easier to see what obvious
> > limitations any given net configuration will have.
> 
> I agree.  I think a webpage is a good way to do this.  Does anyone know
> whether the hit-rate on the Aspell home page would be sufficient to do
> this?  Perhaps Rob Malda of SlashDot could direct some hits our way for
> this?
> 

Go to http://metalab.unc.edu/stats/
and enter /kevina/aspell (NO TRAILING SLASH)
and select the date range.  You can ignore the error messages, as the
stats are still correct for the most part.
Traffic before I posted to freshmeat was low (middle of October or so).

-- 
Kevin Atkinson
address@hidden
http://sunsite.unc.edu/kevina/


