Machine Reading of Biomedical Texts about Alzheimer's Disease


This pilot task sets questions in the biomedical domain, with a special focus on one disease, namely Alzheimer's disease. It will explore the ability of a system to answer questions using scientific language. Texts will be taken from MEDLINE abstracts. MEDLINE (Medical Literature Analysis and Retrieval System Online) is a bibliographic database of life sciences and biomedical information. It is compiled by the United States National Library of Medicine (NLM) and is freely available on the Internet. In order to keep the task reasonably simple for systems, participants will be given the background collection already processed with tokenization, lemmatization, part-of-speech (POS) tagging, named entity recognition (NER), and dependency parsing. A development set will also be provided to participants.

The task will be offered in English only and will be coordinated by the University of Antwerp, Belgium.

Task Description

This task aims at exploring the ability of a machine reading system to answer questions about a scientific topic, namely Alzheimer's disease. As in the main QA4MRE task, this task focuses on the reading of single documents and the identification of the answers to a set of questions about information that is stated or implied in the text. Questions are multiple choice, each with five options and only one correct answer. The detection of correct answers is specifically designed to require various kinds of inference and the consideration of previously acquired background knowledge from reference document collections provided by the organization. Although the additional knowledge obtained through the background collection may be used to assist with answering the questions, the answer itself is to be found primarily among the facts contained in the given test documents.

Participants will be provided with a background collection, the Alzheimer's Disease Literature Corpus, and test documents about Alzheimer's disease. To solve the task, participants can make use of existing resources, such as ontologies or databases, and tools, such as named entity taggers, event extractors, and parsers. In order to keep the task reasonably simple for systems, the organization will provide the texts of the background collection and the test documents processed at several levels of linguistic analysis (lemmas, part-of-speech, named entities, chunking, dependency parsing). Publicly available state-of-the-art tools will be used for this purpose.

This is the second edition of the task. The first edition was run as a pilot task of the QA4MRE Lab at CLEF 2012.

Background Collection

The background collection is a reference corpus consisting of documents related to the topic. As in the main task, the background collection should be used by the systems to acquire the reading capabilities and the knowledge needed to answer questions about the test documents. The collection consists of abstracts and full articles about Alzheimer's disease:

(i) Around 66,000 abstracts from PubMed. PubMed is a free resource that is developed and maintained by the National Center for Biotechnology Information (NCBI), at the U.S. National Library of Medicine (NLM), located at the National Institutes of Health (NIH).

(ii) Around 8,000 Open Access full articles from PubMed Central. PubMed Central is a free full-text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM).

(iii) Full articles about the key hypotheses in Alzheimer's disease published by Elsevier.

Test data

The test set will be composed of 4 reading tests. Each reading test will consist of one single document, with 10 questions and a set of five options per question. In total, there will be 40 questions and 200 options.

Participating systems will be required to answer these 40 questions by choosing in each case one answer from the five alternatives. There will always be one and only one correct option. Systems will also have the chance to leave some questions unanswered if they are not confident about the correctness of their response.
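As an illustration of the setup described above, the following minimal Python sketch represents a reading test and shows one possible way for a system to decide between answering and leaving a question unanswered. The class names, the `choose_answer` helper, the per-option confidence scores, and the threshold value are all assumptions made for this example; they are not part of the task specification or of the distributed data format.

```python
from dataclasses import dataclass
from typing import List, Optional

OPTIONS_PER_QUESTION = 5   # from the task description
QUESTIONS_PER_TEST = 10    # from the task description
NUMBER_OF_TESTS = 4        # from the task description


@dataclass
class Question:
    text: str
    options: List[str]  # five candidate answers, exactly one is correct


@dataclass
class ReadingTest:
    document: str              # the single test document
    questions: List[Question]  # ten questions about that document


def choose_answer(scores: List[float], threshold: float = 0.5) -> Optional[int]:
    """Pick the highest-scoring option, or return None to leave the question unanswered.

    `scores` holds one hypothetical confidence value per option; `threshold`
    is an arbitrary cut-off chosen for illustration only.
    """
    if len(scores) != OPTIONS_PER_QUESTION:
        raise ValueError("expected one score per option")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best if scores[best] >= threshold else None


total_questions = NUMBER_OF_TESTS * QUESTIONS_PER_TEST    # 40
total_options = total_questions * OPTIONS_PER_QUESTION    # 200
```

With these assumptions, `choose_answer([0.1, 0.2, 0.9, 0.3, 0.0])` selects the third option (index 2), while `choose_answer([0.1, 0.2, 0.3, 0.3, 0.0])` returns `None`, meaning the system leaves the question unanswered.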


Document collections and reading tests will be available in English.


This pilot task will be evaluated following the same criteria as the main task. Evaluation will be performed automatically by comparing the answers given by systems to the ones given by humans. No manual assessment will be required. Each test will receive an evaluation score between 0 and 1 using c@1.
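The c@1 measure rewards systems for leaving a question unanswered rather than answering it incorrectly. As commonly defined for QA4MRE, it is c@1 = (n_R + n_U * n_R / n) / n, where n is the total number of questions, n_R the number of correctly answered questions, and n_U the number of unanswered questions. The sketch below is a minimal Python illustration of that formula; the function name and argument names are chosen for this example and do not come from any official evaluation script.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """Compute c@1 = (n_R + n_U * n_R / n) / n.

    n_correct    -- questions answered correctly (n_R)
    n_unanswered -- questions deliberately left unanswered (n_U)
    n_total      -- all questions in the test (n)
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_unanswered < 0 or n_correct + n_unanswered > n_total:
        raise ValueError("inconsistent counts")
    # Unanswered questions are credited at the system's overall accuracy rate.
    return (n_correct + n_unanswered * n_correct / n_total) / n_total
```

For a single reading test of 10 questions with 6 correct answers, 2 wrong answers, and 2 questions left unanswered, this gives approximately 0.72, compared with 0.6 if the two unanswered questions had instead been answered incorrectly. When no question is left unanswered, c@1 reduces to plain accuracy.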

Example Tests

An example test will be provided so that participants can form an idea of the type of texts and questions that will make up the evaluation set. A preprocessed version of the test set will also be provided, so that participants can form an idea of the format and annotations that will be supplied. The example test will be available from the download site of the main task.



  • Roser Morante, Walter Daelemans - CLiPS, University of Antwerp, Belgium
  • Martin Krallinger and Alfonso Valencia - CNIO, Madrid, Spain
Technical support:
  • Vincent Van Asch - CLiPS, University of Antwerp, Belgium
  • Florian Leitner - CNIO, Madrid, Spain
  • Cartic Ramakrishnan - Information Sciences Institute of the University of Southern California, USA
  • Gully A.P.C. Burns - Information Sciences Institute of the University of Southern California, USA
Domain advisor:
  • Tim Clark - Massachusetts Alzheimer's Disease Research Center, USA
Data providers:
  • Elsevier, PubMed Central, MEDLINE
General coordinators of QA4MRE:
  • Anselmo Peñas - IR&NLP Group, UNED, Madrid, Spain
  • Eduard Hovy - Information Sciences Institute of the University of Southern California, USA
Technical Management and data collection infrastructure:
  • Pamela Forner, Giovanni Moretti - CELCT, Italy
  • Roser Morante - roser.morante[at]
  • Walter Daelemans - walter.daelemans[at]