Jorge Carrillo-de-Albornoz

Associate Professor & Lead Developer of AI Solutions

UNED

Bio

Welcome to my personal website. I am an Associate Professor at the Department of Languages and Information Systems at UNED and a senior researcher at the NLP & IR Research Group.

My work bridges academic research and industrial application through the technical leadership of production-grade AI platforms. I am the lead architect and technical coordinator of the main applications within the ODESIA initiative (funded by Red.es), including the Spanish NLP Portal (ODESIA Portal) and the AI Model Leaderboard (ODESIA Leaderboard), both designed to support large-scale benchmarking and informed decision-making around LLMs. I am also the principal lead of EvALL, a specialized service for the comprehensive evaluation of Information Systems, with a strong focus on language technologies. In addition to leading the platform, I am the main developer of PyEvALL, the evaluation framework powering EvALL and enabling reproducible, extensible, and fine-grained assessment of AI systems.

My current research and development focus on Human-Centric AI (HCAI) and the design of advanced RAG (Retrieval-Augmented Generation) architectures to improve the reliability and factual accuracy of generative models. Through the ANNOTATE project, I also work on detecting sexism and bias in multimedia environments.

I am actively open to industrial collaborations, technical consulting, and R&D partnerships, specifically in LLM implementation, RAG optimization, and AI fairness auditing.

Interests

  • Strategic AI Consulting
  • RAG (Retrieval-Augmented Generation)
  • Human-Centric AI (HCAI)
  • Bias & Sexism Detection
  • LLM Evaluation & Benchmarking

Education

  • Ph.D. in Computer Science, 2011

    Universidad Complutense de Madrid (UCM)

  • MSc in Artificial Intelligence, 2008

    Universidad Complutense de Madrid (UCM)

  • BSc in Computer Science, 2006

    Universidad Complutense de Madrid (UCM)

High-impact NLP solutions for reliable, auditable, and multilingual AI systems

AI Solutions & Consulting

I help organizations deploy reliable, auditable, and high-performing language technologies when off-the-shelf AI is not enough. My work bridges state-of-the-art NLP research and business-critical deployment, especially in high-risk, multilingual, or regulated environments.

I have served as technical lead and Principal Investigator in competitive R&D projects for over six years, coordinating multidisciplinary teams and delivering results in large-scale public and industry-funded initiatives. Previously, I worked as a Systems Analyst at SATEC, gaining hands-on experience with enterprise IT environments and real-world deployment constraints.


Proven Impact

  • Large-scale Benchmarking: Technical lead of the 2.1M€ ODESIA initiative (Red.es), conducting exhaustive benchmarking of AI systems to quantify performance gaps between English and Spanish, providing evidence to support strategic decisions in public procurement and AI policy.
  • Reputation Analytics: Collaborated with LLYC (Llorente & Cuenca) within the European LIMOSINE project, enabling early detection of reputational risk and public discourse trends at scale.
  • Principal Investigator: Lead researcher in multiple competitive R&D projects on Human-Centric AI, LLM evaluation, and Social Media Mining, including the 500k€+ ANNOTATE project.

Technical Expertise

  • Advanced RAG Systems: Design and optimization of Retrieval-Augmented Generation architectures to ensure factuality, traceability, and secure AI deployment.
  • AI Evaluation & Benchmarking: Rigorous assessment of LLM capabilities, focusing on robustness, bias, multilingual degradation, and real-world performance.
  • Social Media Intelligence: Large-scale data mining for online reputation, narrative detection, and trend analysis.
  • Safety & Ethics Auditing: Detection of hate speech, propaganda, sexism, and algorithmic bias to support compliance and risk mitigation.

Collaboration Models

  • Technical Consulting: Expert advice on AI strategy, RAG implementation, and LLM evaluation.
  • Joint R&D Projects: Innovation partnerships through University–Industry transfer contracts (Art. 60 LOSU).
  • Industrial PhDs: Development of deep-tech solutions leveraging public funding schemes and tax incentives.

Interested in a partnership?
Initial conversations are exploratory and non-binding.
Contact me via email to discuss how advanced NLP can support your organization’s needs.

Projects

Principal Investigator

ANNOTATE - integrAting disagreemeNt and seNsOr data for NexT-generation Ai sysTEms

Sep 2025 – Present Plan Nacional I+D+I Generación de Conocimiento - 550K
The ANNOTATE project delivers human-centric AI solutions designed for real-world deployment, embedding human feedback and contextual signals across the AI lifecycle. Its outcomes enable organizations to build more reliable, explainable, and inclusive AI systems, reducing bias, improving user trust, and supporting informed decision-making beyond traditional accuracy-driven models.

Principal Investigator

FairTransNLP: Fairness and Transparency for equitable NLP applications in social media

Sep 2022 – Dec 2025 Plan Nacional I+D+I Generación de Conocimiento - 350K
Artificial Intelligence (AI) systems often unintentionally amplify existing biases arising from data collection, annotation processes, and model training. This work focuses on developing equitable machine learning and deep learning approaches that explicitly account for multiple and potentially conflicting human perspectives, enabling AI systems to operate on diverse annotations without marginalising minority views.

Lead Architect and Technical Coordinator

ODESIA

Jan 2022 – Present In collaboration with RED.ES - 2.1M
The project provides a systematic and exhaustive assessment of the state of the art in Natural Language Processing for English and Spanish, quantifying performance gaps across AI systems and tasks. In parallel, it analyzes the adoption and real-world use of language-based AI solutions by citizens and organizations, generating evidence to support informed policy, procurement, and strategic decision-making.

Principal Investigator

WOHA: Wild Online Health Assistant

Jan 2020 – Dec 2020 UNED Research Funds - 4K
The goal of this project is to develop a Big Data service that automatically collects, analyzes, aggregates, and structures online medical information, enabling the identification of trustworthy and potentially misleading sources to support informed decision-making by patients and the general public.

Research Team

MISMIS: Misinformation and Miscommunication in Social Media

Jan 2019 – Dec 2021 Plan Nacional I+D+I Generación de Conocimiento - 300K
The project aims to monitor and address misinformation and harmful communication on social media, including biased or fake news, aggressive language, and hate speech. It establishes a methodological standard for the research community through the development of richly annotated datasets, shared data repositories, and online evaluation services, complemented by robust evaluation metrics and coordinated benchmarking campaigns to advance research and practical solutions in this area.

Research Team

Vemodalen: Automatic analysis of meaning and authority in social media

Jan 2016 – Dec 2018 Plan Nacional I+D+I Retos - 100K
In today’s digital environment, the main challenge for citizens is no longer accessing information, but processing and making sense of overwhelming volumes of relevant content. This work addresses the need for a new generation of AI systems capable of analyzing multiple sources and delivering clear, personalized, and actionable information syntheses in real time.

Research Team

Holopedia: The automatic encyclopedia of people and organizations

Nov 2014 – Dec 2015 Plan Nacional I+D+I Retos - 234K
The project focuses on developing algorithms and scalable systems for mining and aggregating information about people and organizations, integrating structured and unstructured web sources (including social networks, news outlets, blogs, semantic web data, and general websites) to support comprehensive analysis and informed decision-making.

Research Team

VoxPopuli: Efficient analysis of reputation, propagation and recommendation in social media environments

Jan 2014 – Dec 2016 Plan Nacional I+D+I Retos - 300K
The project aims to develop algorithms and systems for large-scale information mining and aggregation about people and organizations, combining structured and unstructured web sources such as social networks, news media, blogs, semantic web data, and general websites to enable comprehensive and actionable insights.

Research Team

Limosine: Linguistically Motivated Semantic aggregatIon engiNes

Jan 2011 – Oct 2014 European FP7-ICT - 3.4M
The LiMoSINe project aims to move beyond document-centric search toward semantic information aggregation, enabling AI systems to understand user intent, organize facts, and identify opinions, experiences, and trends. By integrating multilingual online sources and open knowledge repositories, the project supports more coherent, contextual, and insight-driven access to information.

Research Team

MILES: Models of Interaction centred on Language, spacE and computational Semantics

Jan 2010 – Dec 2012 Plan Nacional I+D+I Retos - 277K
The project aims to develop an architecture for interactive systems that integrates a dialogue engine, natural language generation, and ontology-based semantic representations of both the physical (or virtual) environment and the user, enabling more context-aware, adaptive, and meaningful human–machine interaction.

Research Team

Galante: Natural language generation for texts with emotions

Jan 2006 – Dec 2009 Plan Nacional I+D+I Retos - 72K
The project pursued two primary objectives: the development of a natural language generation (NLG) application and its validation within a dialogue system. These goals were addressed through a coordinated research plan that combined the NLG expertise of the NIL research group (Universidad Complutense de Madrid) with the dialogue systems experience of the Julietta research group (University of Seville).

Resources

EXIST Datasets

Sexism comprises any form of oppression or prejudice against women because of their sex. The aim of the EXIST datasets is to cover sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours.

PyEvALL

PyEvALL (The Python library to Evaluate ALL) is an evaluation tool for information systems that allows assessing a wide range of metrics covering various evaluation contexts, including classification, ranking, or LeWiDi (Learning with disagreement).

MeTwo Dataset

The MeTwo dataset is a corpus for the detection of sexist expressions and attitudes in Twitter. MeTwo is the first corpus in Spanish designed to identify sexism in a broad sense, from hostile to much more subtle sexism.

CEM-Ord metric

We propose a new metric for Ordinal Classification, the Closeness Evaluation Measure (CEM-Ord), which is rooted in Measurement Theory and Information Theory.

RepLab Summarization Dataset

The RepLab summarization dataset contains company data from the RepLab 2013 dataset. The collection comprises tweets about 31 entities from two domains: automotive and banking. In total, our subset of RepLab 2013 comprises 71,303 English and Spanish tweets.

eDiseases Dataset

The eDiseases dataset contains patient data from MedHelp. We extracted 146 posts for allergies, 191 posts for Crohn's disease, and 142 posts for breast cancer, which include 983 sentences for allergies, 1,780 sentences for Crohn's disease, and 1,029 sentences for breast cancer. Each sentence in the dataset is labeled with Factuality (OPINION, FACT, EXPERIENCE) and Polarity (POSITIVE, NEUTRAL, NEGATIVE).

RBU metric

We define the Rank-Biased Utility (RBU) metric, an adaptation of the well-known Rank-Biased Precision metric, that takes into account redundancy and the user effort associated with inspecting documents in the ranking with diversity task.

EvALL

The EvALL online evaluation service aims to provide a unified evaluation framework for Information Access systems. EvALL allows users to: (i) evaluate results in a way compliant with measurement theory; (ii) provide their results as reusable data to the scientific community; (iii) automatically generate evaluation figures and (low-level) interpretation of the results, both as a PDF report and as LaTeX source.

The RepLab 2014 Dataset

RepLab 2014 focuses on Reputation Monitoring on Twitter, targeting two new tasks: the categorization of messages with respect to standard reputation dimensions (Performance, Leadership, Innovation, etc.) and the characterization of Twitter profiles (author profiling) with respect to a certain activity domain.

ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter

We present a semi-automatic tool that assists experts in their daily work of monitoring the reputation of entities (companies, organizations, or public figures) on Twitter.

The RepLab 2013 Dataset

The RepLab 2013 task is a (multilingual) evaluation exercise for Online Reputation Management systems. RepLab 2013 focused on monitoring the reputation of entities (companies, organizations, etc.) on Twitter. The monitoring task consists of filtering tweets that refer to the entity, detecting topics (i.e., clustering tweets by subject), and ranking them based on the degree to which they signal reputation alerts.

SentiSense Affective Lexicon

The SentiSense Affective Lexicon consists of 5,496 words and 2,190 synsets from WordNet 2.1 labeled with an emotional category. The main part of the lexicon consists of nouns and adjectives, followed by verbs and a small set of adverbs. SentiSense is available in English (WordNet 2.1 and WordNet 3.0) and in Spanish (WordNet 3.0). Also, Polar words are provided in both languages.

SentiSense Affective Tools

SentiSense is endowed with a set of tools that allow users to visualize the lexicon and some statistics about the distribution of synsets and emotions in SentiSense, as well as to easily expand the lexicon. These tools are only available for the English version of SentiSense that uses WordNet 2.1.

Hotel Review Corpus

The HotelReview Corpus contains 1,000 reviews extracted from booking.com. Each review has been manually tagged with a five-class label (Excellent, Good, Fair, Poor, Very poor) and with a three-class label (Good, Fair, Poor).

Recent Publications

(2025). Evaluating Sequence Labeling on the basis of Information Theory. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, Austria, July 27 - August 1, 2025.


Contact