Semi-supervised Constituent Grammar Induction Based on Text Chunking Information.
Jesus Santamaria and Lourdes Araujo.
International Conference on Intelligent Text Processing and Computational Linguistics CICLing-2013.
Lecture Notes in Computer Science 7816, pp. 258-269, Springer-Verlag.

There is a growing interest in unsupervised grammar induction, which
does not require syntactic annotations, but provides less accurate results than the
supervised approach. To improve the accuracy of the unsupervised approach,
we resort to additional information that can be obtained more
easily. Shallow parsing, or chunking, identifies the sentence constituents (noun
phrases, verb phrases, etc.), but without specifying their internal structure. There
exist highly accurate systems to perform this task, and thus this information is
available even for languages for which large syntactically annotated corpora are
lacking. In this work we have investigated how the results of a pattern-based unsupervised
grammar induction system improve as data on new kinds of phrases are
added, leading to a significant improvement in performance. We have analyzed
the results for three different languages. We have also shown that the system
significantly improves on the results of the unsupervised system when using the
chunks provided by automatic chunkers.
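
As an illustration of what chunking information looks like, the following minimal sketch turns per-token chunk tags into flat, labeled spans with no internal structure. It assumes BIO-style tags (B-NP, I-NP, O), a common chunker output format; the abstract does not state which format the system actually consumes, and the function names `bio_to_spans` and `bracket` are hypothetical.

```python
from typing import List, Tuple

# (label, start, end) with end exclusive; an assumed representation.
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect flat chunk spans from BIO-style tags (assumed input format)."""
    spans: List[Span] = []
    label, start = None, 0
    # A trailing "O" acts as a sentinel so the last open chunk is flushed.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and tag[2:] == label
        if label is not None and not inside:
            spans.append((label, start, i))
            label = None
        # "B-" always opens a chunk; a stray "I-" opens one if none is open.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def bracket(words: List[str], spans: List[Span]) -> str:
    """Render spans as flat brackets: chunk boundaries only, no nesting."""
    out = list(words)
    for label, start, end in spans:
        out[start] = f"[{label} " + out[start]
        out[end - 1] = out[end - 1] + "]"
    return " ".join(out)


words = ["The", "cat", "chased", "a", "mouse", "."]
tags = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O"]
print(bracket(words, bio_to_spans(tags)))
```

Each span marks a phrase boundary and type but says nothing about the structure inside the phrase, which is the part a grammar induction system would still have to infer.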