
Learning Corpus Patterns Using Finite State Automata

Event date: 
Thursday, 4 April, 2013 - 11:00
Sala Consiglio Scientifico, Edificio Ovest, ground floor
Octavian Popescu

In this talk we argue that natural language has computationally discoverable regular properties that characterize a certain type of phrase. While a word in isolation can potentially express various senses, in certain phrases this potential is restricted to the point that only one sense is possible. A phrase is called sense stable if none of the words composing it changes its sense, irrespective of the context that could be added to its left or to its right. By comparing sense-stable phrases we can extract corpus patterns. These patterns have slots filled by semantic types, which capture the information relevant for disambiguation. The slots are related in such a way that a chain-like disambiguation process is possible. Acquiring patterns of this kind from corpora benefits NLP, because it alleviates problems such as data sparseness, noise, and learning complexity.

We present a set of experiments with these patterns, carried out on various corpora: learning patterns from examples, recognizing them, and applying them in word sense disambiguation (WSD), textual entailment (TE), and machine translation (MT).