Welcome to my personal website. I am an Assistant Professor in the Department of Languages and Information Systems at UNED, and a Researcher in Language Technologies in the NLP & IR Research Group. I completed my Ph.D. on the use of linguistic and semantic information for modeling emotions in text for polarity classification.
My research interests are Natural Language Processing, especially Social Networks Data Mining and eHealth in Social Networks, and Systems Evaluation. At present, I am working on Controversy Detection and Sexism Detection in Social Networks in the FairTransNLP Project. I am also working on EvALL, an online service for Information System Evaluation.
During 2022 and 2023 I am collaborating with Damiano Spina as a Visiting Research Fellow at RMIT.
Ph.D. in Computer Science, 2011
Universidad Complutense de Madrid (UCM)
MSc in Artificial Intelligence, 2008
Universidad Complutense de Madrid (UCM)
BSc in Computer Science, 2006
Universidad Complutense de Madrid (UCM)
Associate Professor, School of Computer Science
Researcher in Language Technologies, School of Computer Science (UNED)
Visiting Research Fellow, School of Computing Technologies
Sexism comprises any form of oppression or prejudice against women because of their sex. The aim of the EXIST dataset is to cover sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours.
The MeTwo dataset is a corpus for the detection of sexist expressions and attitudes on Twitter. MeTwo is the first corpus in Spanish designed to identify sexism in a broad sense, from hostile to much more subtle sexism.
We propose a new metric for Ordinal Classification, the Closeness Evaluation Measure (CEM), which is rooted in Measurement Theory and Information Theory.
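As an illustration, here is a minimal Python sketch of a closeness-based score for ordinal labels. It assumes a proximity of the form prox(a, b) = -log2((n_a / 2 + number of gold items in the classes from a towards b, including b) / N), normalized by the score of a perfect prediction. This page does not state that formula, so treat it as an assumption; the function name `cem_ord` and the example labels are hypothetical.

```python
import math
from collections import Counter


def cem_ord(gold, pred, classes):
    """Closeness-style score for ordinal classification (illustrative sketch).

    gold, pred: equal-length lists of labels.
    classes: all labels, ordered from lowest to highest.
    The proximity formula used here is an assumption, see the lead-in.
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = Counter(gold)
    n = [counts.get(c, 0) for c in classes]  # gold items per class
    total = len(gold)

    def prox(a, b):
        # Information content of the gold items lying between the
        # predicted class a and the gold class b (half of a, all of b).
        i, j = index[a], index[b]
        if i <= j:
            between = sum(n[i + 1 : j + 1])
        else:
            between = sum(n[j:i])
        return -math.log2((n[i] / 2 + between) / total)

    achieved = sum(prox(p, g) for p, g in zip(pred, gold))
    ideal = sum(prox(g, g) for g in gold)
    return achieved / ideal


classes = ["negative", "neutral", "positive"]
gold = ["negative", "neutral", "positive", "positive"]
near_miss = ["neutral", "neutral", "positive", "positive"]
far_miss = ["positive", "neutral", "positive", "positive"]

print(cem_ord(gold, gold, classes))       # perfect prediction
print(cem_ord(gold, near_miss, classes))  # error to an adjacent class
print(cem_ord(gold, far_miss, classes))   # error to a distant class
```

Under these assumptions, a mistake into an adjacent class is penalized less than a mistake into a distant one, which is the behaviour expected from an ordinal metric.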
The RepLab summarization dataset contains company data from the RepLab 2013 dataset. The collection comprises tweets about 31 entities from two domains: automotive and banking. In total, our subset of RepLab 2013 comprises 71,303 English and Spanish tweets.
The eDiseases dataset contains patient data from MedHelp. We extracted 146 posts on allergies, 191 posts on Crohn's disease, and 142 posts on breast cancer, which include 983 sentences on allergies, 1,780 sentences on Crohn's disease, and 1,029 sentences on breast cancer. Each sentence in the dataset is labeled with Factuality (OPINION, FACT, EXPERIENCE) and Polarity (POSITIVE, NEUTRAL, NEGATIVE).
We define the Rank-Biased Utility (RBU) metric, an adaptation of the well-known Rank-Biased Precision metric, that takes into account redundancy and the user effort associated with inspecting documents in the ranking with diversity task.
The EvALL online evaluation service aims to provide a unified evaluation framework for Information Access systems. EvALL allows users to: (i) evaluate results in a way compliant with measurement theory; (ii) provide their results as reusable data to the scientific community; (iii) automatically generate evaluation figures and (low-level) interpretation of the results, both as a PDF report and as a LaTeX source.
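For context, the sketch below computes plain Rank-Biased Precision (RBP), the metric that RBU adapts. It is not RBU: the redundancy and user-effort components are not defined on this page, so they are left out. The function name `rbp` and the default persistence value `p=0.8` are illustrative choices.

```python
def rbp(relevances, p=0.8):
    """Rank-Biased Precision for a ranked list of relevance values.

    relevances: relevance of each ranked document, in rank order (0..1).
    p: persistence, the probability that the user moves on to the next
       document; 0.8 is an arbitrary illustrative default.
    """
    # Each rank i (0-based) is weighted by p**i; (1 - p) normalizes the
    # score so that an all-relevant infinite ranking approaches 1.
    return (1 - p) * sum(r * p**i for i, r in enumerate(relevances))


# Relevant documents at ranks 1 and 3, with persistence 0.5.
print(rbp([1, 0, 1], p=0.5))  # 0.625
```

Lower values of `p` model an impatient user and concentrate the weight on the top ranks.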
RepLab 2014 focuses on Reputation Monitoring on Twitter, targeting two new tasks: the categorization of messages with respect to standard reputation dimensions (Performance, Leadership, Innovation, etc.) and the characterization of Twitter profiles (author profiling) with respect to a certain activity domain.
We present a semi-automatic tool that assists experts in their daily work of monitoring the reputation of entities (companies, organizations or public figures) on Twitter.
The RepLab 2013 task is a (multilingual) evaluation exercise for Online Reputation Management systems. RepLab 2013 focused on monitoring the reputation of entities (companies, organizations, etc.) on Twitter. The monitoring task consists of filtering the tweets that refer to the entity, detecting topics (i.e., clustering tweets by subject) and ranking them based on the degree to which they signal reputation alerts.
The SentiSense Affective Lexicon consists of 5,496 words and 2,190 synsets from WordNet 2.1 labeled with an emotional category. The main part of the lexicon consists of nouns and adjectives, followed by verbs and a small set of adverbs. SentiSense is available in English (WordNet 2.1 and WordNet 3.0) and in Spanish (WordNet 3.0). Polar words are also provided in both languages.
SentiSense is endowed with a set of tools that allow users to visualize the lexicon and some statistics about the distribution of synsets and emotions in SentiSense, as well as to easily expand the lexicon. These tools are only available for the English version of SentiSense that uses WordNet 2.1.
The HotelReview Corpus is a corpus of 1,000 reviews extracted from booking.com, where each review has been manually tagged with a five-class category (Excellent, Good, Fair, Poor, Very poor) and with a three-class category (Good, Fair, Poor).