Co-occurrence graphs for word sense disambiguation in the biomedical domain.
Andres Duque Fernandez, Mark Stevenson, Juan Martinez-Romo, Lourdes Araujo
Artificial Intelligence in Medicine 87: 9-19 (2018)

Word sense disambiguation is a key step for many natural language processing tasks (e.g. summarization, text
classification, relation extraction) and presents a challenge to any system that aims to process documents from the
biomedical domain. In this paper, we present a new graph-based unsupervised technique to address this problem. The
knowledge base used in this work is a graph built from co-occurrence information about medical concepts found in
scientific abstracts, and is hence adapted to the specific domain. Unlike other unsupervised approaches based on static
graphs such as UMLS, in this work the knowledge base takes the context of the ambiguous terms into account. Abstracts
downloaded from PubMed are used for building the graph and disambiguation is performed using the personalized PageRank
algorithm. Evaluation is carried out over two test datasets widely explored in the literature. Different parameters of
the system are also evaluated to test robustness and scalability. Results show that the system outperforms
state-of-the-art knowledge-based systems, improving accuracy by more than 10% in some cases, while requiring only
minimal external resources.
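
The pipeline described above (a co-occurrence graph over concepts, then personalized PageRank seeded with the context of the ambiguous term) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how edges are weighted or filtered, so plain document-level co-occurrence counts are assumed here, and the function names, the damping value of 0.85, the iteration count, and the toy concept labels are all hypothetical. Real input would be concept identifiers extracted from PubMed abstracts.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Build an undirected weighted graph from lists of concepts.

    Assumption: an edge weight is the number of documents in which
    both concepts appear together (simple counting, no filtering).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration in which the random walk restarts only at the seed nodes."""
    nodes = list(graph)
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        raise ValueError("none of the seed concepts occur in the graph")
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Restart mass goes to the seeds; the rest flows along weighted edges.
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            for m, weight in graph[n].items():
                new_rank[m] += damping * rank[n] * weight / total
        rank = new_rank
    return rank


def disambiguate(graph, context, candidates):
    """Pick the candidate sense that ranks highest when seeding with the context."""
    rank = personalized_pagerank(graph, context)
    return max(candidates, key=lambda c: rank.get(c, 0.0))


# Toy "abstracts", each reduced to the concepts it mentions (hypothetical labels).
documents = [
    ["common_cold", "virus", "fever"],
    ["common_cold", "virus", "cough"],
    ["cold_temperature", "hypothermia", "frostbite"],
    ["cold_temperature", "hypothermia", "exposure"],
    ["virus", "exposure"],
]
graph = build_cooccurrence_graph(documents)

# The ambiguous term "cold" appears in a context mentioning a virus and fever.
print(disambiguate(graph, ["virus", "fever"], ["common_cold", "cold_temperature"]))
```

In this toy graph the context concepts are directly linked to `common_cold`, while `cold_temperature` is reachable only through the single `virus`/`exposure` edge, so the seeded walk concentrates rank on the former sense.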