Studying the Advantages of a Messy Evolutionary Algorithm for Natural Language Tagging.
Lourdes Araujo.
Genetic and Evolutionary Computation Conference (GECCO 2003).
Lecture Notes in Computer Science 2724. pp. 1951-1962, Springer-Verlag.

The process of labeling each word in a sentence with one of its lexical
categories (noun, verb, etc.) is called tagging and is a key step in parsing
and many other language processing and generation applications. Automatic
lexical taggers are usually based on statistical methods, such as Hidden
Markov Models, which work with information extracted from the large tagged
corpora available. This information consists of the frequencies of the
contexts of the words, that is, of the sequence of their neighbouring
tags. Thus, these methods rely on the assumption that the tag of a word depends
only on its surrounding tags. This work proposes the use of a Messy
Evolutionary Algorithm to investigate the validity of this assumption. This
algorithm is an extension of the fast messy genetic algorithms, a variety
of Genetic Algorithms that improve the survival of high-quality partial
solutions, or building blocks. Messy GAs do not require every gene to be
present in a chromosome, and a gene may also appear more than once.
This allows us to study the kind of building blocks that arise, thus
obtaining information about possible relationships between the tag of a word and
other tags corresponding to any position in the sentence. The paper describes
the design of a messy evolutionary algorithm for the tagging problem and a
number of experiments on the performance of the system and the parameters
of the algorithm.
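
As an illustration of the messy representation described above, the following is a minimal sketch, not the paper's implementation. It assumes that a gene is a (word position, tag) pair, that a position specified more than once is resolved left to right (the first occurrence wins), and that positions with no gene are filled from a template tagging. The names Gene, decode and template, the tag labels, and the example sentence are all hypothetical.

```python
from typing import List, Tuple

# Assumption: a gene pairs a word position with a proposed tag.
Gene = Tuple[int, str]


def decode(chromosome: List[Gene], template: List[str]) -> List[str]:
    """Express a messy chromosome as a complete tag sequence.

    Overspecified positions: the first gene read left to right wins.
    Underspecified positions: the tag is taken from the template.
    """
    tags = list(template)          # start from the template tagging
    seen = set()                   # positions already fixed by a gene
    for position, tag in chromosome:
        if position in seen:       # later duplicates are ignored
            continue
        seen.add(position)
        tags[position] = tag
    return tags


# Hypothetical three-word sentence: "the can rusts".
template = ["DET", "VERB", "VERB"]                    # assumed default tags
chromosome = [(1, "NOUN"), (1, "VERB"), (0, "DET")]   # position 2 is missing
print(decode(chromosome, template))                   # ['DET', 'NOUN', 'VERB']
```

In this sketch the chromosome names position 1 twice and omits position 2, yet it still decodes to a full tagging that a fitness function could score, which is what makes it possible to inspect which partial gene groups survive.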