Jorge Carrillo-de-Albornoz

Associate Professor and Researcher in Language Technologies

UNED

Bio

Welcome to my personal website. I’m an Assistant Professor in the Department of Languages and Information Systems at UNED, and a Researcher in Language Technologies at the NLP & IR Research Group. I completed my Ph.D. on the use of linguistic and semantic information for modeling emotions in text for polarity classification.

My research interests are Natural Language Processing, especially Social Network Data Mining and eHealth in Social Networks, and Systems Evaluation. At present, I am working on Controversy Detection and Sexism Detection in Social Networks in the FairTransNLP Project. I am also working on EvALL, an online service for Information System Evaluation.

During 2022 and 2023 I am collaborating with Damiano Spina as a Visiting Research Fellow at RMIT.

Interests

  • Controversy Detection
  • Sexism Identification
  • Bias Understanding
  • Natural Language Processing
  • Systems Evaluation

Education

  • Ph.D. in Computer Science, 2011

    Universidad Complutense de Madrid (UCM)

  • MSc in Artificial Intelligence, 2008

    Universidad Complutense de Madrid (UCM)

  • BSc in Computer Science, 2006

    Universidad Complutense de Madrid (UCM)

Affiliations

UNED

Associate Professor, School of Computer Science

NLP & IR Research Group

Researcher in Language Technologies, School of Computer Science (UNED)

RMIT University

Visiting Research Fellow, School of Computing Technologies

Recent Publications

(2021). A Multi-Task and Multilingual Model for Sexism Identification in Social Networks. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), XXXVII International Conference of the Spanish Society for Natural Language Processing, Málaga, Spain, September 2021.

(2021). Combining Transformer-Based Models with Traditional Machine Learning Approaches for Sexism Identification in Social Networks at EXIST 2021. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), XXXVII International Conference of the Spanish Society for Natural Language Processing, Málaga, Spain, September 2021.

Resources

EXIST 2021 Dataset

Sexism comprises any form of oppression or prejudice against women because of their sex. The aim of the EXIST dataset is to cover sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours.

MeTwo Dataset

The MeTwo dataset is a corpus for the detection of sexist expressions and attitudes in Twitter. MeTwo is the first corpus in Spanish designed to identify sexism in a broad sense, from hostile to much more subtle sexism.

CEM-Ord metric

We propose a new metric for Ordinal Classification, the Closeness Evaluation Measure, which is rooted in Measurement Theory and Information Theory.

RepLab Summarization Dataset

The RepLab summarization dataset contains company data from the RepLab 2013 dataset. The collection comprises tweets about 31 entities from two domains: automotive and banking. In total, our subset of RepLab 2013 comprises 71,303 English and Spanish tweets.

eDiseases Dataset

The eDiseases dataset contains patient data from MedHelp. We extracted 146 posts on allergies, 191 posts on Crohn’s disease, and 142 posts on breast cancer, which include 983 sentences for allergies, 1,780 sentences for Crohn’s disease, and 1,029 sentences for breast cancer. Each sentence in the dataset is labeled with Factuality (OPINION, FACT, EXPERIENCE) and Polarity (POSITIVE, NEUTRAL, NEGATIVE).

RBU metric

We define the Rank-Biased Utility (RBU) metric, an adaptation of the well-known Rank-Biased Precision metric, which takes into account redundancy and the user effort associated with inspecting ranked documents in the diversity ranking task.

EvALL

The EvALL online evaluation service aims to provide a unified evaluation framework for Information Access systems. EvALL allows users to: (i) evaluate results in a way compliant with measurement theory; (ii) provide their results as reusable data to the scientific community; (iii) automatically generate evaluation figures and a (low-level) interpretation of the results, both as a PDF report and as LaTeX source.

The RepLab 2014 Dataset

RepLab 2014 focuses on Reputation Monitoring on Twitter, targeting two new tasks: the categorization of messages with respect to standard reputation dimensions (Performance, Leadership, Innovation, etc.) and the characterization of Twitter profiles (author profiling) with respect to a certain activity domain.

ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter

We present a semi-automatic tool that assists experts in their daily work of monitoring the reputation of entities (companies, organizations or public figures) on Twitter.

The RepLab 2013 Dataset

The RepLab 2013 task is a (multilingual) evaluation exercise for Online Reputation Management systems. RepLab 2013 focused on monitoring the reputation of entities (companies, organizations, etc.) on Twitter. The monitoring task consists of filtering the tweets that refer to the entity, detecting topics (i.e., clustering tweets by subject) and ranking those topics based on the degree to which they signal reputation alerts.

SentiSense Affective Lexicon

The SentiSense Affective Lexicon consists of 5,496 words and 2,190 synsets from WordNet 2.1 labeled with an emotional category. The main part of the lexicon consists of nouns and adjectives, followed by verbs and a small set of adverbs. SentiSense is available in English (WordNet 2.1 and WordNet 3.0) and in Spanish (WordNet 3.0). Polar words are also provided in both languages.

SentiSense Affective Tools

SentiSense comes with a set of tools that allow users to visualize the lexicon and some statistics about the distribution of synsets and emotions in SentiSense, as well as to easily expand the lexicon. These tools are only available for the English version of SentiSense that uses WordNet 2.1.

Hotel Review Corpus

The HotelReview Corpus is a corpus of 1,000 reviews extracted from booking.com, where each review has been manually tagged with a five-class category (Excellent, Good, Fair, Poor, Very poor) and with a three-class category (Good, Fair, Poor).

Projects

Principal Investigator

FairTransNLP: Fairness and Transparency for equitable NLP applications in social media

Jan 2022 – Present Plan Nacional I+D+I Generación de Conocimiento
Artificial Intelligence (AI) applications often inadvertently perpetuate and accentuate unfair biases that can originate from multiple sources, such as data sampling, labelling process, training data, etc. We will develop equitable machine learning / deep learning systems which are able to reflect multiple perspectives, and operate over data that have “conflicting” labels, in order not to marginalise minority views.

Research Team

Artificial Intelligence Observatory for Spanish

Jan 2022 – Present In collaboration with RED.ES
The project focuses on monitoring the comparative state of the art of Natural Language Processing in English and Spanish, as well as quantifying the comparative adoption and use of Artificial Intelligence solutions in English and Spanish by citizens and organizations.

Principal Investigator

WOHA: Wild Online Health Assistant

Jan 2020 – Dec 2020 UNED Research Funds
The goal of the project is to build a Big Data service for patients (citizens at large) that automatically harvests, analyzes, aggregates and organizes online medical information, and that distinguishes trustworthy sources from suspicious ones.

Research Team

MISMIS: Misinformation and Miscommunication in Social Media

Jan 2019 – Dec 2021 Plan Nacional I+D+I Generación de Conocimiento
The general objectives of the project are to address and monitor misinformation (biased and fake news) and miscommunication (aggressive language and hate speech) in social media. We propose a methodological standard for the whole research community (i) by developing rich annotated datasets, a data repository and online evaluation services; (ii) by proposing suitable evaluation metrics; and (iii) by organizing evaluation campaigns to foster research on the above issues.

Research Team

Vemodalen: Automatic analysis of meaning and authority in social media

Jan 2016 – Dec 2018 Plan Nacional I+D+I Retos
For an average citizen of our digital era, the problem is no longer finding relevant information, but assimilating the massive amount of relevant available information at any moment in time. This is not possible without the help of a new generation of machines able to digest all relevant sources into a readable, personalized synthesis of the stream of relevant information.

Research Team

Holopedia: The automatic encyclopedia of people and organizations

Nov 2014 – Dec 2015 Plan Nacional I+D+I Retos
The main goal of the project is to develop algorithms, techniques and systems able to mine and aggregate information relative to people and organizations from unstructured and structured web sources, such as social networks, blogs, news, semantic web data, and websites in general.

Research Team

VoxPopuli: Efficient analysis of reputation, propagation and recommendation in social media environments

Jan 2014 – Dec 2016 Plan Nacional I+D+I Retos
The project aims to create a new generation of online reputation monitoring systems that can understand, process, aggregate and synthesize, in real time, facts, opinions and attitudes around an entity, present such information in multiple dimensions, and interact with reputation experts so that they can accomplish their task better and faster. Our research will range from fundamental problems, such as textual similarity or data structures for real-time Natural Language Processing, to prototype validation with reputation experts.

Research Team

Limosine: Linguistically Motivated Semantic aggregatIon engiNes

Jan 2011 – Oct 2014 European FP7-ICT
The LiMoSINe vision is to transition access to online information from a document-centric search paradigm focused on returning disconnected atomic pieces to a truly semantic aggregation paradigm. In this new paradigm, machines will understand a user’s intent, discover and organize facts, identify opinions, experiences and trends, all from inherently multilingual online sources and open knowledge repositories.

Research Team

MILES: Models of Interaction centred on Language, spacE and computational Semantics

Jan 2010 – Dec 2012 Plan Nacional I+D+I Retos
The main goal of this project is to develop an architecture for interactive systems that combines a dialogue engine, a natural language generator, and a semantic representation based on ontologies covering both the real (or virtual) physical space and the user located within it.

Research Team

Galante: Natural language generation for texts with emotions

Jan 2006 – Dec 2009 Plan Nacional I+D+I Retos
The project proposal contemplated two top level goals: the development of a natural language generation (NLG) application, and its validation in the context of a dialogue system (DS). The proposal outlined a coordinated project plan aimed at achieving these goals by bringing together the NLG expertise of the NIL research group at Universidad Complutense de Madrid (UCM), and the experience of the Julietta research group of the University of Seville (USE) in the development of dialogue systems.

Contact