Disability annotation on documents from the biomedical domain




"Disabilities is an umbrella term, covering impairments, activity limitations, and participation restrictions. An impairment is a problem in body function or structure; an activity limitation is a difficulty encountered by an individual in executing a task or action; while a participation restriction is a problem experienced by an individual in involvement in life situations."
―― World Health Organization

The main goal of the DIANN task is the annotation of disabilities. These conditions affect a large part of the population; for example, they are present in many rare diseases. Therefore, it is extremely important to collect information related to them.
There are some tools for the annotation of medical concepts, especially in English, such as MetaMap. However, they do not treat disabilities as a distinct concept but as just another sign. Thus, they cannot distinguish a disability, usually a permanent condition, from other signs associated with diseases.

The organizing committee will provide participants with a training corpus for developing their systems. A test corpus will be released later to evaluate the performance of those systems. These corpora include the annotation of negation when it affects a disability.
The task is divided into two sub-tasks, corresponding to entity detection in English and in Spanish. Participants must submit at least their results for the Spanish sub-task.
We encourage the entire research community, and especially groups working on entity detection, to participate by adapting their systems to this new task. To facilitate the task, any resource, such as UMLS, MetaMap or word embeddings, may be used. The organizers will provide a basic list (not including possible variants) of the disabilities considered and of the human functions whose impairment gives rise to a disability.
The annotations of the participating systems must follow the output format explained in the Dataset section.

Registration

If you are interested in participating, send us an email at diannibereval [at] gmail.com and we will reply with detailed instructions for participation. All participants will be automatically subscribed to our mailing list. Information about the submission of results and their format will be available soon.
We invite potential participants to contact us in order to be kept up to date with the latest news about the task. The organizers will assist you with any issues that may arise.

Schedule

15 March 2018: Trial data release
15 April 2018: Test data release
3 May 2018: End of the evaluation period
15 May 2018: Results posted
1 June 2018: End of the period for the reception of working notes
15 June 2018: Release of the working notes reviews
29 June 2018: Release of the working notes and the task overview for the corresponding proceedings

Dataset

The dataset was collected between 2017 and 2018. The DIANN corpus consists of 500 abstracts from Elsevier journal papers in the biomedical domain. We selected abstracts for which both the Spanish and the English versions are available.
We have annotated the disabilities appearing in these abstracts using the XML tag <dis>. Disabilities are usually expressed either with a specific word, such as "blindness", or as the limitation or lack of a human function, such as "lack of vision".

  • English:
    • Fragile-X syndrome is an inherited form of <dis>mental retardation</dis> with a connective tissue component involving mitral valve prolapse.
    • Age related macular degeneration (AMD) is the leading cause of <dis>blindness</dis> in individuals older than 65 years of age.
  • Spanish:
    • El síndrome X frágil es una forma hereditaria de <dis>retraso mental</dis> con una afectación de tejido conectivo que produce prolapso de la válvula mitral.
    • La degeneración macular asociada a la edad (DMAE) constituye la causa principal de <dis>ceguera</dis> en personas mayores de 65 años.
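To make the inline format concrete, the following Python sketch turns an annotated sentence into plain text plus a list of character-offset spans, a convenient representation for training or evaluating a tagger. The function name `extract_disabilities` is hypothetical, and the sketch assumes that <dis> elements are never nested inside each other; the <scp> and <neg> negation tags are simply stripped.

```python
import re

# Any of the inline tags used in the annotated examples, opening or closing.
TAG_RE = re.compile(r"</?(?:dis|scp|neg)>")


def extract_disabilities(annotated):
    """Strip inline tags and return (plain_text, spans).

    Each span is (start, end, surface_text) with offsets into plain_text.
    Assumption: <dis> elements are never nested inside each other.
    """
    plain_parts = []
    offsets = []
    pos = 0            # current offset in the plain (tag-free) text
    last = 0           # current offset in the annotated text
    open_start = None  # plain-text offset of the currently open <dis>
    for m in TAG_RE.finditer(annotated):
        chunk = annotated[last:m.start()]
        plain_parts.append(chunk)
        pos += len(chunk)
        last = m.end()
        tag = m.group(0)
        if tag == "<dis>":
            open_start = pos
        elif tag == "</dis>" and open_start is not None:
            offsets.append((open_start, pos))
            open_start = None
    plain_parts.append(annotated[last:])
    plain = "".join(plain_parts)
    return plain, [(s, e, plain[s:e]) for s, e in offsets]


if __name__ == "__main__":
    sentence = ("Age related macular degeneration (AMD) is the leading cause of "
                "<dis>blindness</dis> in individuals older than 65 years of age.")
    plain, spans = extract_disabilities(sentence)
    print(spans)  # [(63, 72, 'blindness')]
```

Working with offsets rather than token labels keeps the representation independent of any particular tokenizer, which matters because the same procedure has to handle both English and Spanish text.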

The boundaries among diseases, disabilities and signs are often unclear. In order to clarify what has been considered a disability in the annotations for this task, we provide lists of disability terms and lists of functions whose absence or limitation has been considered a disability.

Functions (English)
Attention, Autonomy, Cognition, Communication, Behaviour, Day-to-day living activities, Development, Emotions, Executive functioning, Feeding, Functional capacity, Gait, Hearing, Language, Learning, Mental capabilities, Mobility, Perception, Psychological capabilities, Sensory, Sight, Sleep, Social cognition, Speech, Swallowing, Occupational functioning

Functions (Spanish)
Alimentación, Aprendizaje, Atención, Audición, Autonomía, Capacidad funcional, Capacidad intelectual, Capacidad ocupacional, Capacidades afectivas, Capacidades psicológicas, Capacidades sociales, Cognición, Comportamiento, Comunicación, Deglución, Desarrollo, Emociones, Habilidad para realizar actividades de la vida cotidiana, Lenguaje, Memoria, Movilidad, Percepción, Sensibilidad, Sueño, Visión

Disability terms
English         Spanish
Aphasia         Afasia
Apraxia         Apraxia
Ataxia          Ataxia
Blindness       Ceguera
Deaf-mute       Sordomudez
Deafness        Sordera
Dementia        Demencia
Dysarthria      Disartria
Dysautonomia    Disautonomía
Dyskinesias     Discinesias
Dysphagia       Disfagia
Hemiplegia      Hemiplejia
Hyperactivity   Hiperactividad
Paralysis       Parálisis

Notice that these lists only include one representative term for each concept, but different variants of these terms can appear in the corpus. For example, "sightlessness" does not appear in the list because its representative term is "blindness". Functions can also appear in the corpus in a wide variety of forms. For example:

  • English:
    • Sight: One month later, she presented with complaints of systemic oedema and <dis>loss of vision </dis>.
  • Spanish:
    • Movilidad: Fampridina-LP a dosis de 10mg cada 12 h es actualmente el único fármaco autorizado para mejorar el <dis>trastorno de la marcha </dis> en adultos con EM.
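Because the lists only contain representative terms, a lookup based on them alone can only serve as a weak baseline. The sketch below, an illustration rather than a recommended system, wraps whole-word, case-insensitive occurrences of a hypothetical subset `TERMS` of the English terms in <dis> tags. It cannot find variants such as "loss of vision" and does not extend spans to include modifiers, both of which the annotation guidelines require.

```python
import re

# Hypothetical subset of the English representative terms listed above.
TERMS = ["aphasia", "apraxia", "ataxia", "blindness", "deaf-mute", "deafness",
         "dementia", "dysarthria", "dysphagia", "hemiplegia", "paralysis"]


def build_pattern(terms):
    """Compile one alternation; longer terms first so they win over shorter ones."""
    alternatives = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in alternatives)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


PATTERN = build_pattern(TERMS)


def tag_terms(text, pattern=PATTERN):
    """Wrap every listed term found in text in <dis> tags, keeping its original case."""
    return pattern.sub(lambda m: f"<dis>{m.group(0)}</dis>", text)


if __name__ == "__main__":
    print(tag_terms("AMD is the leading cause of blindness in older adults."))
    # Variants are missed: nothing is tagged here.
    print(tag_terms("She presented with systemic oedema and loss of vision."))
```

Extending this with the function list (for example, matching "loss of" or "impairment" next to a listed function) would be the natural next step, but any such pattern set would be a design choice of the participant, not part of the task definition.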

Modifiers affecting a disability have been annotated as part of it.

English:

  • aphasia / isolated aphasia / ...
  • dementia / advanced dementia / ...

Spanish:

  • afasia / afasia aislada / ...
  • demencia / demencia avanzada / ...

Acronyms referring to a disability have also been annotated with the <dis> tag, but only if the expanded form also appears in the same abstract.

  • English:
    • To establish the validity and reliability of the Montreal Cognitive Assessment in Spanish (MoCA-S) to identify <dis>mild cognitive impairment</dis> (<dis>MCI</dis>)...
  • Spanish:
    • Establecer la validez y confiabilidad del Montreal Evaluación Cognitiva en Español (MoCA-E) para identificar <dis>deterioro cognitivo leve</dis> (<dis>DCL</dis>)...
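The acronym rule can be checked mechanically on an annotated abstract. The sketch below uses a deliberately simple heuristic that is an assumption, not the official guideline: a <dis> span counts as an acronym if it is a single all-uppercase token, and its expansion is any other <dis> span whose word initials spell it. This works for both examples above ("mild cognitive impairment" gives MCI, "deterioro cognitivo leve" gives DCL) but will miss acronyms built differently. The function names are hypothetical.

```python
import re

DIS_RE = re.compile(r"<dis>(.*?)</dis>")


def initials(phrase):
    """Uppercased first letters of each word, e.g. 'mild cognitive impairment' -> 'MCI'."""
    return "".join(word[0] for word in phrase.split()).upper()


def unsupported_acronyms(abstract):
    """Annotated acronyms with no matching annotated expansion in the same abstract.

    Heuristic (assumption): an acronym is a single all-uppercase <dis> span, and
    its expansion is another <dis> span whose word initials spell the acronym.
    """
    spans = DIS_RE.findall(abstract)
    acronyms = [s for s in spans if s.isupper() and " " not in s]
    expansions = {initials(s) for s in spans if s not in acronyms}
    return [a for a in acronyms if a not in expansions]


if __name__ == "__main__":
    ok = "identificar <dis>deterioro cognitivo leve</dis> (<dis>DCL</dis>)"
    bad = "Patients with <dis>MCI</dis> were excluded."
    print(unsupported_acronyms(ok), unsupported_acronyms(bad))
```

A check like this is mainly useful for validating a system's output against the guideline before submission, since an acronym tagged without its expansion would count as a false positive.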

Note that disabilities appearing in isolation, without any context (e.g., in search queries), and disabilities appearing in the names of associations or entities (e.g., Care Unit for Dementia Patients) have not been annotated, as in the following fragments:

  • Search query: A search strategy in PubMed was designed using the following keywords: (gene OR genomics OR GWAS OR high throughput) AND (hearing loss OR chronic otitis media OR age-related hearing loss OR otosclerosis OR Meniere's disease) during the last 5 years.
  • Name of an entity: ... in VLBW infants included in the Universal Hearing Loss Screening Programme at the University Mother-Child Hospital of Gran Canaria (Spain) in the 2007–2010 period.

Negation is annotated when it affects one or more disabilities. For English, we have adopted the negation criteria of the BioScope corpus; for Spanish, we have annotated the scope corresponding to the English annotation. The negation scope is annotated with the <scp> tag and, for the sake of clarity, negation triggers have also been annotated with the <neg> tag.

  • English:
    • In the patients <scp><neg>without</neg> <dis>dementia</dis></scp>, significant differences were obtained in terms of functional and cognitive status (Barthel index of 52.34±38 and Pfeiffer test with an average score of 1.48 ±3.2 (P<.001)).
  • Spanish:
    • En pacientes <scp> <neg>sin</neg> <dis>demencia</dis></scp>, se obtienen diferencias significativas en cuanto a la situación funcional y cognitiva (índice de Barthel de 52,34±38 y test de Pfeiffer con una puntuación media de 1,48±3,2 (p<0,001)).
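The nested tags can be read back programmatically. A minimal sketch, assuming that negation scopes are not nested inside each other (true for the examples above, but this may not hold in general), lists each disability with a flag indicating whether it falls inside a <scp> scope, together with that scope's triggers. The helper name is hypothetical.

```python
import re

SCOPE_RE = re.compile(r"<scp>(.*?)</scp>", re.DOTALL)
DIS_RE = re.compile(r"<dis>(.*?)</dis>")
NEG_RE = re.compile(r"<neg>(.*?)</neg>")


def disabilities_with_polarity(sentence):
    """Return (disability, negated, triggers) tuples in order of appearance.

    Assumption: <scp> elements are not nested inside each other.
    """
    result = []
    last = 0
    for scope in SCOPE_RE.finditer(sentence):
        # Disabilities between the previous scope and this one are not negated.
        for d in DIS_RE.findall(sentence[last:scope.start()]):
            result.append((d, False, []))
        inner = scope.group(1)
        triggers = NEG_RE.findall(inner)
        # Every disability inside the scope is affected by its triggers.
        for d in DIS_RE.findall(inner):
            result.append((d, True, list(triggers)))
        last = scope.end()
    for d in DIS_RE.findall(sentence[last:]):
        result.append((d, False, []))
    return result


if __name__ == "__main__":
    es = "En pacientes <scp> <neg>sin</neg> <dis>demencia</dis></scp>, se obtienen..."
    print(disabilities_with_polarity(es))
```

Keeping the trigger text alongside the flag makes it easy to compare English and Spanish annotations of the same abstract, since the Spanish scopes were derived from the English ones.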

Evaluation and Results

Two types of matching will be used for the evaluation: exact and partial. For instance, if "severe cognitive impairment" is annotated in the gold standard, an exact match requires the system to output "severe cognitive impairment", whereas an output such as "cognitive impairment" would count as a partial match. Results for each system will be reported for both matching types.
The metrics used to evaluate the systems will be precision, recall and their harmonic mean, the F-measure.
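Exact matching is unambiguous, but the description above does not spell out how partial matches are decided. The sketch below assumes that a prediction partially matches a gold annotation when their character spans overlap, and computes precision, recall and F-measure, F = 2PR / (P + R), over (start, end) spans. It is not the official evaluation script, the function names are hypothetical, and it does not enforce a one-to-one alignment between gold and predicted spans.

```python
def overlaps(a, b):
    """True if two (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def exact(a, b):
    """True if two spans have identical boundaries."""
    return a == b


def evaluate(gold, predicted, partial=False):
    """Return (precision, recall, f_measure) for lists of (start, end) spans.

    Partial mode (an assumption) accepts any overlap between spans.
    """
    match = overlaps if partial else exact
    correct_pred = sum(1 for p in predicted if any(match(g, p) for g in gold))
    found_gold = sum(1 for g in gold if any(match(g, p) for p in predicted))
    precision = correct_pred / len(predicted) if predicted else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Gold "severe cognitive impairment" = (0, 27); system output
    # "cognitive impairment" = (7, 27) in the same text.
    gold, pred = [(0, 27)], [(7, 27)]
    print(evaluate(gold, pred))                # (0.0, 0.0, 0.0)
    print(evaluate(gold, pred, partial=True))  # (1.0, 1.0, 1.0)
```

The usage reproduces the example from the text: the shorter output scores zero under exact matching but is fully credited under the overlap-based partial criterion.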

Task Coordinators & Contact


Lourdes Araujo Serna

Juan Martínez Romo

Hermenegildo Fabregat Marcos

If you have any questions regarding the task, please do not hesitate to contact us at diannibereval [at] gmail.com