Current Projects

EXTRAE

Duration: 2018-2019
Financing institution: IMIENS
Summary: In this project we aim to design algorithms that help identify relevant relationships between different diseases. This information is very useful for making new diagnoses, testing new treatments or drugs, or anticipating the likely course of a disease. Many diseases share one or more aspects, such as symptoms, course or treatment, but this does not always mean that they are related. We therefore propose a system capable of detecting relationships between diseases that can be considered significant. Significance will be determined by the coincidence of aspects beyond what chance would produce, which will be captured by defining an appropriate statistical model. Relationships between diseases can be established on the basis of different patterns, separately or jointly: co-occurrence, common symptoms, similarity of treatments, etc. These relationships can be encoded as Association Rules (AR), which can be regarded as a way of representing the medical knowledge underlying the set of electronic health records (EHR; HCE in Spanish) stored in the clinical information repository.
This project is funded under the IMIENS Call for Grants for Joint Research Projects between research groups at UNED and the Instituto de Salud Carlos III.

VEMODALEN

Análisis Automático del Significado y la Autoridad en Social Media

Duration: 2016-2018 
Financing institution: Ministerio de Economía y Competitividad
2015 Call, Modality 1: R&D&I Projects, of the Programa Estatal de Investigación, Desarrollo e Innovación Orientada a los Retos de la Sociedad.
Summary: For an average citizen of our digital era, the problem is no longer finding relevant information, but assimilating the massive amount of relevant available information at any moment in time. This is not possible without the help of a new generation of machines able to digest all relevant sources into a readable, personalized synthesis of the stream of relevant information. And such machines need to acquire two crucial, interdependent skills: (i) the ability to automatically discern when different texts convey approximately the same message; and (ii) the ability to discern the credibility of messages.
Our goal is to address the challenge of computing both textual similarity and source authority in online media, focusing on three challenging tasks in three relevant application scenarios: identification and synthesis of controversy in the medical domain, generation of reputation profiles for companies and brands, and recommendation of instructional materials in e-learning environments.

MUSACCES

Museología e integración social: la difusión del patrimonio artístico y cultural del Museo del Prado a colectivos con especial accesibilidad (invidentes, sordos y reclusos)

Duration: 2016-2018 
Financing institution: 2015 Call for R&D Activity Programmes among Research Groups of the Comunidad de Madrid, organized by the Dirección General de Universidades e Investigación of the Consejería de Educación, Juventud y Deporte, Comunidad de Madrid (S2015/HUM3494)

Summary: The work is structured around three focal points. The first will detect the specific needs and interests of the different groups. The second will deal with the design and creation of applications, systems and virtual exhibitions adapted for these three groups, based on virtual thematic tours or visits of the Museo del Prado. The third will seek to invigorate an international network that links the social projection of museology with its application to making culture accessible to specific groups, all through the development of new technological resources.

Our concern for the heritage of the Community of Madrid, especially the art collection of the Museo del Prado, leads us to consider the museum as a "cultural artifact" that goes beyond its research and conservation functions. The aim is to bring the museum to viewers, whatever their diversity and condition, letting them share in the contact with artistic reality and inviting them not only to contemplate a work of art directly but also to interact with the institution and its collections. The purpose is to break down the barrier of the sacredness of works of art and to overcome the elitist character that the nineteenth-century perception of traditional collections can imply.

EXTRECM

EXTracción de RElaciones entre Conceptos Médicos en fuentes de información heterogéneas

Duration: 2014-2017
Financing institution: MINECO (TIN2013-46616-C2-2-R)
Summary: The overall objective of this project is to address the generation of techniques and tools to allow efficient and intelligent access to the contents of medical documents of multilingual nature such as i) general scientific documents, ii) medical records and iii) general information on the Internet. The project will demonstrate, through a series of use cases, the benefits of the application of language technology in the health sector, using advanced Natural Language Processing techniques such as information retrieval applied to large amounts of resources about medical information on the Internet.

Voxpopuli

Duration: 2014-2016 
Financing institution: Ministerio de Economía y Competitividad (TIN2013-4709-C3-1P)
Summary: Online Reputation Management has recently become a fundamental aspect of Public Relations for organizations, personalities and entities in general. The very reason why the online dimension of reputation is now essential (the fact that it is the biggest, richest and most up-to-date source of information, opinions and attitudes around any entity) is also the reason why manual analysis of information streams in media and social networks is not viable. Automatic processing of online information crucially depends on advances in many research fields (data structures and algorithms for real-time Natural Language Processing, Opinion Mining, Textual Synthesis, Novelty Detection and Recommendation, multimedia search, social network analysis, etc.) that, up to now, have paid little attention to the online reputation scenario. For instance, opinion mining has focused on product reviews, and its results are not applicable to the (much more complex) problem of evaluating how the content of information streams in social networks may affect the reputation of a company. The project aims to create a new generation of online reputation monitoring systems, able to understand, process, aggregate and synthesize, in real time, facts, opinions and attitudes around an entity, to present such information in multiple dimensions, and to interact with reputation experts so that they can accomplish their task better and faster. Our research will range from fundamental problems, such as textual similarity or data structures for real-time Natural Language Processing, to prototype validation with reputation experts. Besides algorithms and prototypes, we will also create and distribute test collections to evaluate all relevant technologies in the reputation management scenario.

Past Projects

READERS

Readers: Evaluation And DEvelopment of Reading Systems

Duration: 2013 - 2015
Financing institution: EU (CHIST-ERA 2011) + MINECO (PCIN-2013-002-C02-01)
Summary: The READERS project proposes new unsupervised computational models to automatically extract background knowledge after reading large amounts of unstructured text. This knowledge will be in the form of classes, categorized entities and predicates whose arguments are typified by probability distributions over classes. Classes themselves will be automatically organized into taxonomies related to the predicates in which they participate.

LiMoSINe

Linguistically Motivated Semantic Aggregation Engines

Duration: 2011-2014
Financing institution: European Commission, FP7-ICT
Summary: The LiMoSINe vision is to transition access to online information from a document-centric search paradigm focused on returning disconnected atomic pieces to a truly semantic aggregation paradigm. In this new paradigm, machines will understand a user's intent, discover and organize facts, identify opinions, experiences and trends, all from inherently multilingual online sources and open knowledge repositories. LiMoSINe's aggregation engines will automatically organize search results in semantically meaningful ways.

ELIAS

Evaluating Information Access Systems

Duration: 2011-2016
Financing institution: European Science Foundation
Summary: ELIAS will define a new measurement paradigm for the evaluation of search engines based on so-called living laboratories. This paradigm involves (i) exploitation of novel market places and forums where large numbers of users are recruited into early stage evaluation experiments to test a particular aspect of an information access system; and (ii) using operational systems as experimental platforms on which to conduct user-based experiments at scale.

HOLOPEDIA

The automatic encyclopedia of people and organizations.

Duration: 2010-2012
Financing institution: MICINN (TIN2010-21128-C02)
Summary: The main goal of the project is to develop algorithms, techniques and systems able to mine and aggregate information relative to people and organizations from unstructured and structured web sources, such as social networks, blogs, news, semantic web data, and websites in general.

MA2VICMR

Mejorando el Acceso, el Análisis y la Visibilidad de la Información y los Contenidos Multilingüe y Multimedia en Red para la Comunidad de Madrid

Duration: 2010-2013
Financing institution: Regional Government of Madrid (S2009/TIC-1542)
Summary: Improving access, analysis and visibility of multilingual and multimedia Web contents.

Buscamedia

Duration: 2009-2012
Financing institution: CDTI (CEN-20091026)
Summary: Development of a true Multimedia Semantic Search Engine.

Existes/WebOpinion

Financing institution: Sub-contracts by Grupo ALMA
Summary: Online reputation management.

QEAVis

Quantitative Evaluation of Academic Websites Visibility

Duration: 2008-2010
Financing institution: CICYT (TIN 2007-67581-C02-01)
Summary: Automated classification of academic websites by topic and language in order to build rankings of them. The main goal of the project is to improve the accessibility and visibility of academic information on the World Wide Web.

TrebleCLEF

Evaluation Best Practice and Collaboration for Multilingual Information Access

Financing institution: European Commission
Summary: TrebleCLEF supports the development and consolidation of expertise in the multidisciplinary research area of multilingual information access (MLIA) and disseminates this know-how to the application communities through a set of complementary activities.

Text-Mess (INES subproject)

Duration: 2007-2009
Financing institution: CICYT (TIN2006-15265-C06-02)

MultiMatch

Multilingual/Multimedia Access To Cultural Heritage

Duration: 2006-2009
Financing institution: European Commission, 6FP (STREP 033104)
Summary: MultiMatch plans to develop a multilingual search engine specifically designed for access, organisation and personalised presentation of cultural heritage information.

MAVIR

Mejorando el acceso y visibilidad de la información multilingüe en red para la Comunidad de Madrid

Duration: 2006-2009
Financing institution: Comunidad de Madrid, IV PRICIT, (S-0505/TID/0267)
Summary: MAVIR is a research network formed by a multidisciplinary team of scientists, engineers, linguists and documentation specialists to carry out an integrated effort in research, training and technology transfer.

MedIEQ

Quality Labelling of Medical Web Content using Multilingual Information Extraction.

Duration: 2006-2008
Financing institution: European Commission (EC Programme: Public Health 61383)
Summary: Quality Labelling of Medical Web Content using Multilingual Information Extraction

SWIISA

Speech Web and Images Interactive Search Assistants

Duration: 2006-2007
Financing institution: UNED
Summary: Study of the application of interactive assistants to three lines of work: cross-lingual search over images, over the Web, and over automatic transcriptions produced by speech recognizers.

R2D2 (Syembra subproject)

Recuperación de Respuestas en Documentos Digitalizados

Duration: 2003-2006
Financing institution: CICYT (TIC2003-07158-C04)
Summary: Evaluation of cross-lingual answer retrieval systems.

RIBIDI

Recuperación de Información en Bibliotecas Digitales

Duration: 2001-2004
Financing institution: CYTED VII.19
Summary: Ibero-American cooperation in research and development of technologies for information retrieval and digital libraries.

CLEF

Cross-Language Evaluation Forum

Duration: 2001-2003
Financing institution: European Commission, 5FP (IST-2000-31002)
Summary: Evaluation of Cross-Language Information Retrieval Systems for European Languages

ETB

European Schools Treasury Browser

Duration: 2000-2002
Financing institution: European Commission, 5FP (IST Programme)
Summary: Access to meta-information about educational resources and new technologies in Europe.

DELOS

DELOS: a Network of Excellence on Digital Libraries

Duration: 2000-2002
Financing institution: European Commission, IST Programme
Summary: The main objective of DELOS is to coordinate a joint programme of activities of the major European teams working in digital library related areas.

NAMIC

News Agencies Multilingual Information Categorization

Duration: 1999-2002
Financing institution: European Commission, 5FP (IST-1999-12392)
Summary: NAMIC's main objective is to develop and bring to a marketable stage advanced NLP technologies for multilingual news customization and broadcasting throughout distributed services.

EuroWordnet

Duration: 1996-1999
Financing institution: European Commission, 4FP (Telematics, LE 4003)
Summary: The project aimed at building a multilingual lexical database with semantic relations between words in eight European languages (Spanish, English, Italian, Dutch, French, German, Estonian and Czech). Every monolingual wordnet is linked to the others via an InterLingual Index derived from WordNet 1.5.

ELSNET LE Training Showcase

Financing institution: ACO*HUM (Socrates), ELSNET, European Commission
Summary: A project under the auspices of the ELSNET and ACO*HUM excellence networks to develop six specialization courses around Natural Language Processing and Speech Recognition and Synthesis. Our task was to develop an open distance learning course on Natural Language Processing and Information Retrieval.

Hermes

Duration: 2001-2003
Financing institution: CICYT (TIC2000-0335-C03-01)
Summary: Multilingual named-entity recognition, hyperlinking, phrase extraction, summarization and semantic indexing for information access on a digital news archive.

RILE

Servidor de Recursos para el Desarrollo de la Ingeniería Lingüística en Español

Duration: 1999-2000
Financing institution: M.I.N.E.R.
Summary: The goal of RILE is to develop a pilot for a server with resources, tools and information related to the development of applications in the field of Language Engineering for Spanish.

ITEM

Recuperación de Información Textual en un Entorno Multilingüe

Duration: 1996-1999
Financing institution: CICyT (TIC96-1243-C03-01)
Summary: Development and integration of Language Engineering resources and tools for Spanish, Catalan, Basque and English and demonstration of such tools in a multilingual search engine with NLP capabilities.

ACQUILEX II

Duration: 1993-1995
Financing institution: European Commission (Esprit BRA 7315)
Summary: The goal was to explore the utility of constructing a multilingual lexical knowledge base from machine-readable versions of conventional dictionaries. The project did so by exploring machine-readable text corpora as a source of lexical information not coded in conventional dictionaries, and by adding dictionary publishing partners to exploit the lexical database and corpus extraction software developed by the projects for conventional lexicography.