Laura Plaza

Associate Professor and Researcher in Language Technologies

UNED

Bio

I am an Associate Professor at the Department of Languages and Information Systems at UNED and a researcher at the NLP & IR UNED group.

My expertise spans several areas of Natural Language Processing, with a special interest in practical applications in the biomedical domain and in social networks. Currently, my research focuses on extracting and summarizing information from online patient forums, as well as on detecting and analyzing sexist expressions and behaviors in social networks.

I am also a member of the Observatory for AI in Spanish, which aims to promote research on language technologies and resources; accordingly, an important part of my research is devoted to developing Spanish text corpora for training NLP systems.

During 2022 and 2023, I am collaborating with Damiano Spina as a Visiting Research Fellow at the Royal Melbourne Institute of Technology (RMIT).

Interests

  • Biomedical Summarization
  • Sexism Identification
  • Controversy Detection
  • Bias Understanding
  • Natural Language Processing

Education

  • BSc in Business Administration, 2016

    Universidad Nacional de Educación a Distancia (UNED)

  • Ph.D. in Computer Science, 2011

    Universidad Complutense de Madrid (UCM)

  • MSc in Artificial Intelligence, 2008

    Universidad Complutense de Madrid (UCM)

  • BSc in Computer Science, 2006

    Universidad Carlos III de Madrid (UC3M)

Affiliations

UNED

Associate Professor, School of Computer Science

NLP & IR Research Group

Researcher in Language Technologies, School of Computer Science (UNED)

RMIT University

Visiting Research Fellow, School of Computing Technologies

Recent Publications

(2022). Self-Assessment tool with topic-driven navigation for algorithms learning. IEEE Global Engineering Education Conference, EDUCON 2022, Tunis, Tunisia, March 28-31, 2022.

(2021). Combining Transformer-Based Models with Traditional Machine Learning Approaches for Sexism Identification in Social Networks at EXIST 2021. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), XXXVII International Conference of the Spanish Society for Natural Language Processing, Málaga, Spain, September 2021.

(2021). UNEDBiasTeam at IberLEF 2021's EXIST Task: Detecting Sexism Using Bias Techniques. Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2021) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2021), XXXVII International Conference of the Spanish Society for Natural Language Processing, Málaga, Spain, September 2021.

Resources

EXIST 2021 Dataset

Sexism comprises any form of oppression or prejudice against women because of their sex. The aim of the EXIST dataset is to cover sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours.

MeTwo Dataset

The MeTwo dataset is a corpus for detecting sexist expressions and attitudes on Twitter. MeTwo is the first Spanish corpus designed to identify sexism in a broad sense, from hostile to much more subtle sexism.

RepLab Summarization Dataset

The RepLab Summarization dataset contains company-related data from the RepLab 2013 dataset. The collection comprises tweets about 31 entities from two domains, automotive and banking. In total, this subset of RepLab 2013 comprises 71,303 English and Spanish tweets.

eDiseases Dataset

The eDiseases dataset contains patient posts from the MedHelp online health community. We extracted 146 posts on allergies, 191 posts on Crohn's disease, and 142 posts on breast cancer, comprising 983, 1,780, and 1,029 sentences, respectively. Each sentence in the dataset is labeled with Factuality (OPINION, FACT, EXPERIENCE) and Polarity (POSITIVE, NEUTRAL, NEGATIVE).

SentiSense Affective Lexicon

The SentiSense Affective Lexicon consists of 5,496 words and 2,190 synsets from WordNet 2.1 labeled with an emotional category. Most of the lexicon consists of nouns and adjectives, followed by verbs and a small set of adverbs. SentiSense is available in English (WordNet 2.1 and WordNet 3.0) and in Spanish (WordNet 3.0). Polar words are also provided in both languages.

SentiSense Affective Tools

SentiSense comes with a set of tools that let users visualize the lexicon and statistics on the distribution of synsets and emotions in SentiSense, and easily expand the lexicon. These tools are only available for the English version of SentiSense that uses WordNet 2.1.

Hotel Review Corpus

The HotelReview Corpus is a corpus of 1,000 reviews extracted from booking.com, each manually tagged with a five-class category (Excellent, Good, Fair, Poor, Very poor) and a three-class category (Good, Fair, Poor).

Projects

Principal Investigator

FairTransNLP: Fairness and Transparency for equitable NLP applications in social media

Jan 2022 – Present Plan Nacional I+D+I Generación de Conocimiento
Artificial Intelligence (AI) applications often inadvertently perpetuate and accentuate unfair biases that can originate from multiple sources, such as data sampling, the labelling process, or the training data. We will develop equitable machine learning and deep learning systems that can reflect multiple perspectives and operate on data with “conflicting” labels, so as not to marginalise minority views.

Research Team

Artificial Intelligence Observatory for Spanish

Jan 2022 – Present In collaboration with RED.ES
The project monitors the state of the art of Natural Language Processing in Spanish compared with English, and quantifies how citizens and organizations adopt and use Artificial Intelligence solutions in Spanish and in English.

Principal Investigator

MISMIS: Misinformation and Miscommunication in Social Media

Jan 2019 – Dec 2021 Plan Nacional I+D+I Generación de Conocimiento
The general objectives of the project are to address and monitor misinformation (biased and fake news) and miscommunication (aggressive language and hate speech) in social media. We propose a methodological standard for the whole research community (i) by developing rich annotated datasets, a data repository and online evaluation services; (ii) by proposing suitable evaluation metrics; and (iii) by organizing evaluation campaigns to foster research on the above issues.

Research Team

Vemodalen: Automatic analysis of meaning and authority in social media

Jan 2016 – Dec 2018 Plan Nacional I+D+I Retos
For an average citizen of our digital era, the problem is no longer finding relevant information, but assimilating the massive amount of relevant available information at any moment in time. This is not possible without the help of a new generation of machines able to digest all relevant sources into a readable, personalized synthesis of the stream of relevant information.

Principal Investigator

Automatic modelling and synthesis of user opinions in social networks

Jan 2014 – Dec 2015 UNED Research Funds
The project aims to create automatic summarization systems specifically designed for reputational data exchanged on Twitter.

Research Team

VoxPopuli: Efficient analysis of reputation, propagation and recommendation in social media environments

Jan 2014 – Dec 2016 Plan Nacional I+D+I Retos
The project aims to create a new generation of online reputation monitoring systems that can understand, process, aggregate and synthesize, in real time, facts, opinions and attitudes around an entity; present this information in multiple dimensions; and interact with reputation experts so that they can accomplish their task better and faster. Our research will range from fundamental problems, such as textual similarity or data structures for real-time Natural Language Processing, to prototype validation with reputation experts.

Research Team

MILES: Models of Interaction centred on Language, spacE and computational Semantics

Jan 2010 – Dec 2012 Plan Nacional I+D+I Retos
The main goal of this project is to develop an architecture for interactive systems that combines a dialogue engine, a natural language generator, and a semantic representation based on ontologies covering both the real (or virtual) physical space and the user located within it.

Research Team

Galante: Natural language generation for texts with emotions

Jan 2006 – Dec 2009 Plan Nacional I+D+I Retos
The project proposal set out two top-level goals: the development of a natural language generation (NLG) application and its validation in the context of a dialogue system (DS). The proposal outlined a coordinated project plan to achieve these goals by combining the NLG expertise of the NIL research group at Universidad Complutense de Madrid (UCM) with the experience of the Julietta research group at the University of Seville (USE) in developing dialogue systems.

Contact