READERS: Evaluation and Development of Reading Systems
READERS is a three-year project (2013-2015) financially supported by MINECO (PCIN-2013-002-C02-01), ANR (convention ANR-12-CHRI-0004-03) and EPSRC (EP/K017845/1) within the framework of ERA-NET CHIST-ERA.
The READERS project proposes new unsupervised computational models that automatically extract background knowledge by reading large amounts of unstructured text. This knowledge will take the form of classes, categorized entities, and predicates whose arguments are typed by probability distributions over classes. The classes themselves will be automatically organized into taxonomies related to the predicates in which they participate.
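As an illustration only, the kind of structure described above, a predicate whose argument slots are typed by probability distributions over induced classes, might be sketched as follows. All names, class labels, and probabilities here are hypothetical and not taken from the project itself:

```python
from dataclasses import dataclass, field

@dataclass
class Predicate:
    """Illustrative sketch: a predicate with one class distribution
    per argument slot (class label -> probability)."""
    name: str
    arg_types: list = field(default_factory=list)

    def most_likely_class(self, slot: int) -> str:
        """Return the most probable class for a given argument slot."""
        dist = self.arg_types[slot]
        return max(dist, key=dist.get)

# Hypothetical predicate "acquire", as it might be induced from text:
# the subject is almost certainly an organization; the object is most
# often an organization, sometimes a product.
acquire = Predicate(
    name="acquire",
    arg_types=[
        {"organization": 0.9, "person": 0.1},                     # subject slot
        {"organization": 0.6, "product": 0.35, "person": 0.05},   # object slot
    ],
)

print(acquire.most_likely_class(0))  # organization
print(acquire.most_likely_class(1))  # organization
```

A taxonomy over these induced classes would then relate, for example, "organization" and "person" to the predicates in which they participate.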
In this way, new methods and models based on extensional definitions of concepts will be developed and deployed for the automatic creation of knowledge bases. Importantly, these will be closely related to textual representations and instrumental in enabling textual inference. The extracted knowledge will also be linked to external human-made resources such as Freebase, DBpedia and WordNet, and the knowledge bases will be interfaced with several engines for disambiguation, relation extraction, term expansion, and relatedness measurement.
A key part of the project will be the development of a reading machine that uses all these resources and tools. The purpose of our reading machine is to answer queries about a given text. Texts are never self-contained, and their interpretation always requires recovering large amounts of background knowledge. Thus, the Machine Reading technology under development must incorporate not only language processing but also the recovery and use of large amounts of background knowledge.
This Machine Reading technology will be evaluated through Multiple-Choice Reading Comprehension (MRC) tests written by humans over unseen documents. MRC tests enable objective and reproducible evaluation experiments, and will be fully reusable as benchmarks available to the international community. Notably, the industrial partner in charge of the Machine Reading system development will apply the technology in reverse to automatically generate MRC tests for the automatic assessment of children's reading abilities. The reading machine will work with at least two languages, English and French.
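To make the evaluation setting concrete, a minimal lexical-overlap baseline for answering an MRC test might look like the sketch below. This is purely illustrative and not the project's system: the whole point of the proposal is that a real reading machine must also recover background knowledge, which a surface-overlap baseline like this does not:

```python
import re

def tokenize(text):
    """Lowercase word tokens; a crude stand-in for real NLP preprocessing."""
    return set(re.findall(r"[a-z]+", text.lower()))

def answer_mrc(passage, question, choices):
    """Pick the choice whose words overlap most with the passage and question.

    An illustrative baseline only: it uses no background knowledge,
    so it fails exactly where Machine Reading is supposed to help.
    """
    context = tokenize(passage) | tokenize(question)
    scores = [len(tokenize(c) & context) for c in choices]
    return scores.index(max(scores))

# Hypothetical test item (not from the project's benchmarks):
passage = "The READERS project develops machine reading systems in English and French."
question = "Which languages does the reading machine cover?"
choices = ["German and Spanish", "English and French", "Latin only"]
print(answer_mrc(passage, question, choices))  # 1
```

A benchmark built from such items is objective and reproducible: systems are scored simply by the fraction of questions for which they select the correct choice.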
The support and coordination of an international evaluation campaign for Machine Reading in multiple languages is part of the proposal. This evaluation campaign will serve to measure progress in the development of Machine Reading technology in a comparative/competitive environment. Evaluation exercises in specific domains such as biomedicine will also provide a venue for technology transfer and allow us to assess the portability of the proposed technology.
The objectives of the project are divided into four groups:
1. Background reading
1.1. Find natural representations of background knowledge after processing large amounts of text
1.2. Explore different unsupervised and distantly supervised algorithms for background reading, such as the induction of semantic roles and the discovery of class-instance relations
1.3. Automatically create propositional background knowledge bases
2. Knowledge Linking and Integration
2.1. Integrate the acquired background knowledge with existing knowledge repositories
2.2. Develop distantly supervised techniques for Entity Linking and Relation Extraction
2.3. Extend existing knowledge bases with new taxonomical relations and instances
3. Machine Reading system development and application
3.1. Develop a Machine Reading system in several languages able to solve reading comprehension tests about a single document
3.2. Apply the Machine Reading technology to assist the assessment of children's reading abilities
4. Evaluation of Machine Reading systems
4.1. Develop a rigorous evaluation methodology for Machine Reading systems
4.2. Coordinate the generation of benchmarks for measuring progress of Machine Reading technology
4.3. Coordinate an international comparative/competitive evaluation task for the assessment of Machine Reading systems