Discovering taxonomies in Wikipedia by means of grammatical evolution.
Lourdes Araujo, Juan Martinez-Romo, Andres Duque
Soft Comput. 22(9): 2907-2919 (2018)

This work applies grammatical evolution to identify taxonomic hierarchies of concepts from Wikipedia. Each article in
Wikipedia covers a topic and is linked through hyperlinks to articles on related topics. Hierarchical taxonomies, and
their generalization to ontologies, are a highly useful resource for many applications because they enable semantic
search and reasoning. Thus, the automatic identification of taxonomies composed of concepts associated with linked
Wikipedia pages has attracted much attention. We have developed a system that arranges a set of Wikipedia concepts into
a taxonomy. The technique is based on the relationships among a set of features extracted from the contents of the
Wikipedia pages.
We have used a grammatical evolution algorithm to discover the best way of combining these features into an explicit
function. Each candidate function is evaluated by applying a genetic algorithm that approximates the optimal taxonomy
the function can produce for each of a number of training cases. The fitness of the function is the precision of each
resulting taxonomy with respect to the reference taxonomy of its training case, averaged over all training cases.
Experimental results show that the proposal is able to find functions that yield high-quality taxonomies.
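The evaluation loop described in the abstract (grammatical evolution derives a feature-combining function, which is then scored by the average precision of the taxonomies it produces) can be illustrated with a minimal sketch. It is not the authors' implementation. The grammar, the feature names f0, f1 and f2, the helper names (map_genotype, build_taxonomy, precision, fitness) and the toy data are all hypothetical. Precision is assumed to be measured over (parent, child) links. The greedy parent selection in build_taxonomy is a deliberately simple stand-in for the paper's genetic algorithm and does not enforce a tree structure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical BNF grammar over three pairwise features f0, f1, f2; the paper's
# actual grammar and feature set are not reproduced here.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["f0"], ["f1"], ["f2"]],
}

Edge = Tuple[str, str]  # (parent, child) link in a taxonomy
Case = Tuple[Dict[Edge, Dict[str, float]], Set[Edge]]  # (pair features, reference taxonomy)


def map_genotype(genotype: List[int], start: str = "<expr>", max_wraps: int = 2) -> str:
    """GE genotype-to-phenotype mapping: expand the leftmost non-terminal with
    rule index codon % len(rules), wrapping around the genotype if needed."""
    pending = [start]
    output: List[str] = []
    used = 0
    budget = len(genotype) * (max_wraps + 1)
    while pending:
        symbol = pending.pop(0)
        if symbol not in GRAMMAR:  # terminal symbol: emit it
            output.append(symbol)
            continue
        if used >= budget:
            raise ValueError("genotype exhausted before the mapping completed")
        rules = GRAMMAR[symbol]
        production = rules[genotype[used % len(genotype)] % len(rules)]
        used += 1
        pending = list(production) + pending  # keep the derivation leftmost
    return " ".join(output)


def build_taxonomy(expr: str, pair_features: Dict[Edge, Dict[str, float]]) -> Set[Edge]:
    """Simplified stand-in for the paper's genetic algorithm: give each child the
    candidate parent with the highest score. Does not enforce a tree structure."""
    best: Dict[str, Tuple[float, str]] = {}
    for (parent, child), feats in pair_features.items():
        score = eval(expr, {"__builtins__": {}}, dict(feats))
        if child not in best or score > best[child][0]:
            best[child] = (score, parent)
    return {(parent, child) for child, (_, parent) in best.items()}


def precision(predicted: Set[Edge], reference: Set[Edge]) -> float:
    """Fraction of predicted parent-child links that appear in the reference."""
    return len(predicted & reference) / len(predicted) if predicted else 0.0


def fitness(genotype: List[int], cases: List[Case]) -> float:
    """Average precision of the taxonomies produced by the mapped function."""
    expr = map_genotype(genotype)
    scores = [precision(build_taxonomy(expr, feats), ref) for feats, ref in cases]
    return sum(scores) / len(scores)


# Toy training case with made-up feature values (illustration only).
TOY_CASES: List[Case] = [(
    {
        ("animal", "dog"): {"f0": 0.9, "f1": 0.1, "f2": 0.5},
        ("animal", "cat"): {"f0": 0.8, "f1": 0.0, "f2": 0.4},
        ("dog", "cat"): {"f0": 0.3, "f1": 0.9, "f2": 0.3},
    },
    {("animal", "dog"), ("animal", "cat")},
)]

if __name__ == "__main__":
    print(map_genotype([0, 1, 0, 0, 1, 2]))  # f0 + f2
    print(fitness([0, 1, 0, 0, 1, 2], TOY_CASES))  # 1.0
    print(fitness([1, 1], TOY_CASES))  # 0.5
```

On the toy case, the function f0 + f2 recovers both reference links. The function f1 attaches "cat" to "dog", so its fitness drops to 0.5. Python's eval is used here only to keep the sketch short. A real system would more likely compile the derived expression into a restricted evaluator.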