This research develops and applies algorithms that combine lexical information with web directories in order to associate WordNet word senses with ODP (Open Directory Project) directories. Such associations can serve as rich domain labels and can be used to acquire sense-tagged corpora automatically, cluster topically related senses, and detect sense specializations.
Our current algorithm has been evaluated on the 29 nouns (147 senses) used in the Senseval 2 competition. It produced 148 word sense/Internet directory associations, covering 88% of the domain-specific word senses in the test data with 86% accuracy.
The richness of Internet directories as sense characterizations is evaluated in a supervised Word Sense Disambiguation task with the Senseval 2 test suite. The results indicate that, when the directory/word sense association is correct, the training samples acquired automatically from the Internet directories are as valid for training as the original Senseval 2 training instances.
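As an illustration of how the coverage and accuracy figures above can be read, the following minimal sketch computes both measures over a toy set of associations. It assumes that coverage is the fraction of domain-specific senses receiving at least one association and that accuracy is the fraction of proposed associations judged correct; these definitions, the `Association` type, the sense identifiers, and the directory paths are all hypothetical and are not taken from the evaluation itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    sense: str      # hypothetical sense identifier, e.g. "circuit#1"
    directory: str  # hypothetical directory path, made up for this example


def coverage(associations, domain_specific_senses):
    """Fraction of domain-specific senses with at least one association (assumed definition)."""
    if not domain_specific_senses:
        return 0.0
    covered = {a.sense for a in associations} & set(domain_specific_senses)
    return len(covered) / len(domain_specific_senses)


def accuracy(associations, judged_correct):
    """Fraction of proposed associations judged correct (assumed definition)."""
    if not associations:
        return 0.0
    hits = sum(1 for a in associations if a in judged_correct)
    return hits / len(associations)


# Toy data only; none of these values come from the Senseval 2 evaluation.
proposed = [
    Association("circuit#1", "Science/Technology/Electronics"),
    Association("circuit#5", "Sports/Motorsports"),
    Association("bar#1", "Business/Hospitality"),
    Association("bar#1", "Recreation/Food"),
]
judged_correct = {proposed[0], proposed[1], proposed[2]}
domain_senses = ["circuit#1", "circuit#5", "bar#1", "bar#10"]

cov = coverage(proposed, domain_senses)   # 3 of 4 senses are covered
acc = accuracy(proposed, judged_correct)  # 3 of 4 associations are correct
print(f"coverage={cov:.2f} accuracy={acc:.2f}")
```

Note that one sense may be linked to several directories, which is why the number of associations can exceed the number of senses.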
The following data is currently available: