Deep neural models for extracting entities and relationships in the new RDD corpus relating disabilities and rare diseases.
Hermenegildo Fabregat, Lourdes Araujo, Juan Martínez-Romo
Comput. Methods Programs Biomed. 164: 121-129 (2018)

Background and Objective: There are a large number of rare diseases, many of which are associated with significant
disabilities. Knowing in advance how a disease will evolve is essential in order to limit and prevent the
appearance of disabilities and to prepare the patient to manage future difficulties. Rare disease associations
are making an effort to collect this information manually, but it is a slow process. Much information about the
consequences of rare diseases is published in scientific papers, and our goal is to automatically extract from them
the disabilities associated with diseases.
Methods: This work presents a new corpus of abstracts from scientific papers related to rare diseases, manually
annotated with disabilities. The corpus makes it possible to train machine learning and deep learning systems that
can automatically process other papers and thus extract new information about the relations between rare diseases
and disabilities. The corpus is also annotated with negation and speculation where they affect disabilities.
The corpus has been made publicly accessible.
Results: We have devised experiments using deep learning techniques to show the usefulness of the developed
corpus. Specifically, we have designed a long short-term memory (LSTM) based architecture for identifying disabilities,
as well as a convolutional neural network (CNN) for detecting their relationships to diseases. The systems do not
need any preprocessing of the data; they require only low-dimensional vectors representing the words.
Conclusions: The developed corpus will make it possible to train systems to identify disabilities in biomedical
documents, which current annotation systems are not able to detect. Such systems could also be trained to detect
relationships between disabilities and diseases, as well as negation and speculation, which can change the meaning
of the text. The deep learning models designed for identifying disabilities and their relationships to diseases in
new documents show that the corpus allows obtaining an F-measure of around 81% for disability recognition and 75%
for relation extraction.
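
For readers unfamiliar with the metric, the following is a minimal sketch of how an F-measure can be computed from
annotated and predicted entity spans. It assumes the balanced F1 variant and exact span matching, neither of which is
stated in the abstract; the function name f_measure and the example spans are hypothetical and are not taken from the
RDD corpus or from the paper's evaluation.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts.

    beta = 1.0 gives the balanced F1 (an assumption; the abstract only says "F-measure").
    """
    if tp == 0:
        # No correct predictions: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical annotations as (start_token, end_token, label) triples.
gold = {(0, 2, "Disability"), (5, 7, "Disability"), (10, 11, "Disability")}
predicted = {
    (0, 2, "Disability"),
    (5, 6, "Disability"),    # wrong boundary, so it counts as an error under exact matching
    (10, 11, "Disability"),
    (14, 15, "Disability"),  # spurious prediction
}

tp = len(gold & predicted)   # spans predicted exactly as annotated
fp = len(predicted - gold)   # predicted spans with no exact gold match
fn = len(gold - predicted)   # gold spans the system missed

score = f_measure(tp, fp, fn)
print(f"precision={tp / (tp + fp):.3f} recall={tp / (tp + fn):.3f} F1={score:.3f}")
```

Under exact matching a span with a wrong boundary is penalised twice, once as a false positive and once as a false
negative; a more lenient partial-match criterion would yield higher scores on the same predictions.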