Identifying Patterns for Unsupervised Grammar Induction.
Jesús Santamaría, Lourdes Araujo

Proc. of the Fourteenth Conference on Computational Natural Language Learning (CoNLL 2010)
Association for Computational Linguistics, pp. 38-45 (2010)

This paper describes a new method for unsupervised grammar induction based on
the automatic extraction of certain patterns in the texts. Our starting hypothesis
is that there exist some classes of words that function as separators, marking
the beginning or the end of new constituents. Among these separators we distinguish
those which trigger new levels in the parse tree. If we are able to detect these
separators we can follow a very simple procedure to identify the constituents of a
sentence by taking the classes of words between separators. This paper is devoted to
describing the process we have followed to automatically identify the set of separators
from a corpus annotated only with Part-of-Speech (POS) tags. The proposed
approach has allowed us to improve on the results of previous proposals when parsing
sentences from the Wall Street Journal corpus.
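
The idea of taking the classes of words between separators can be illustrated with a minimal sketch. It is not the procedure of the paper: the separator set below is hypothetical and hand-picked, whereas the paper extracts it automatically from the corpus, and the sketch produces only a flat segmentation, ignoring the separators that trigger new levels in the parse tree. It also assumes that every separator marks the beginning of a constituent. The function name and the example tags are illustrative.

```python
from typing import List, Set


def split_on_separators(tags: List[str], separators: Set[str]) -> List[List[str]]:
    """Group a POS-tag sequence into flat chunks.

    Assumption: each separator tag opens a new chunk, so the current
    chunk is closed whenever a separator is met.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    for tag in tags:
        # Close the running chunk before a separator starts a new one.
        if tag in separators and current:
            chunks.append(current)
            current = []
        current.append(tag)
    if current:
        chunks.append(current)
    return chunks


# Hypothetical separator set, chosen by hand for illustration only.
SEPARATORS = {"DT", "IN", "VBD"}

tags = ["DT", "NN", "VBD", "DT", "JJ", "NN", "IN", "NNP"]
chunks = split_on_separators(tags, SEPARATORS)
print(chunks)
```

With this separator set the example sequence is split into four chunks, each opened by a separator tag. A sequence with no separators comes back as a single chunk.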