Why EXIST?

Welcome to the website of EXIST 2025, the fifth edition of the sEXism Identification in Social neTworks task at CLEF 2025.

EXIST is a series of scientific events and shared tasks on sexism identification in social networks. EXIST aims to foster the automatic detection of sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours (EXIST 2021, EXIST 2022, EXIST 2023, EXIST 2024). The fifth edition of the EXIST shared task will be held as a Lab in CLEF 2025, on September 9-12, 2025, at UNED, Madrid, Spain.

Social networks are the main platforms for social complaint, activism, etc. Movements like #MeToo, #8M or #TimesUp have spread rapidly. Under the umbrella of social networks, many women all around the world have reported the abuse, discrimination and other sexist experiences they have suffered in real life. Social networks are also contributing to the transmission of sexism and other disrespectful and hateful behaviours. In this context, automatic tools may help not only to detect and raise alerts against sexist behaviours and discourses, but also to estimate how often sexist and abusive situations occur on social media platforms, which forms of sexism are most frequent, and how sexism is expressed in these media. This Lab will contribute to developing applications to detect sexism.

In the 2024 EXIST campaign, the datasets contained multimedia content in the form of memes, advancing research on more robust techniques to identify sexism in social networks. Following this line, this year the challenge will focus on TikTok videos, so that the dataset includes the three most important multimedia elements used to spread sexism: text, images and videos. Consequently, it is essential to develop automated multimodal tools capable of detecting sexism in text, images, and videos in order to raise alarms or automatically remove such content from social networks, because platforms’ algorithms often amplify content that perpetuates gender stereotypes and internalized misogyny. This Lab will contribute to the creation of applications that identify sexist content in social media across all three formats.

Similar to the approach in the 2023 and 2024 editions, this edition will also embrace the Learning With Disagreement (LeWiDi) paradigm for both the development of the dataset and the evaluation of the systems. The LeWiDi paradigm does not rely on a single “correct” label for each example. Instead, models are trained to handle and learn from conflicting or diverse annotations. This enables systems to consider the various annotators’ perspectives, biases and interpretations, resulting in a fairer learning process.

In previous editions, 223 teams from more than 50 countries submitted their results, achieving impressive performance, especially in the sexism detection task. However, there is still room for improvement, especially when the problem is addressed under the LeWiDi paradigm in a multimedia context.


Tasks

Participants will be asked to identify and characterize sexism in social networks across different sources. This year the lab comprises nine subtasks in two languages, English and Spanish: the same three tasks (sexism identification, source intention detection, and sexism categorization) applied to three different types of data: text (tweets), images (memes) and videos (TikToks). This multimedia approach will help identify trends and patterns in sexism across media formats and user interactions, contributing to a deeper understanding of the underlying social dynamics. In addition, approaches submitted to all tasks will be evaluated to analyze their capacity to detect sexism in multimodal sources.

A condensed schema of all tasks included this year in the lab is presented in the following table:

            Sexism identification   Source intention   Sexism categorization
  Tweets    Subtask 1.1             Subtask 1.2        Subtask 1.3
  Memes     Subtask 2.1             Subtask 2.2        Subtask 2.3
  Videos    Subtask 3.1             Subtask 3.2        Subtask 3.3

For a more detailed description of each subtask, as well as some examples, check the next sections.

Subtask 1.1: Sexism Identification in Tweets

The first subtask is a binary classification task. Systems have to decide whether or not a given tweet contains sexist expressions or behaviours (i.e., it is sexist itself, describes a sexist situation, or criticizes a sexist behaviour), and classify it according to two categories: YES and NO.

  • YES:
    • “Mujer al volante, tenga cuidado!” (“Woman at the wheel, watch out!”)
    • “People really try to convince women with little to no ass that they should go out and buy a body. Like bih, I don’t need a fat ass to get a man. Never have.”
  • NO:
    • “Alguien me explica que zorra hace la gente en el cajero que se demora tanto.” (“Can someone explain to me what the hell people are doing at the ATM that takes so long.”)
    • "@messyworldorder it’s honestly so embarrassing to watch and they’ll be like “not all white women are like that”"

Subtask 1.2: Source Intention in Tweets

Once a message has been classified as sexist, the second subtask aims to categorize the message according to the intention of the author, which provides insights into the role played by social networks in the emission and dissemination of sexist messages. In this subtask, we propose a ternary classification task:

  • DIRECT: the intention was to write a message that is sexist by itself or incites sexism, as in:

    • A woman needs love, to fill the fridge, if a man can give this to her in return for her services (housework, cooking, etc), I don’t see what else she needs.
    • Women shouldn’t code…perhaps be influencer/creator instead…it’s their natural strength.
  • REPORTED: the intention is to report and share a sexist situation suffered by a woman or women in first or third person, as in:

    • I doze in the subway, I open my eyes feeling something weird: the hand of the man sat next to me on my leg #SquealOnYourPig.
    • Today, one of my year 1 class pupils could not believe he’d lost a race against a girl.
  • JUDGEMENTAL: the intention was to judge: the tweet describes sexist situations or behaviours with the aim of condemning them, as in:

    • As usual, the woman was the one quitting her job for the family’s welfare…
    • 21st century and we are still earning 25% less than men #Idonotrenounce.

Subtask 1.3: Sexism Categorization in Tweets

Many facets of a woman’s life may be the focus of sexist attitudes, including domestic and parenting roles, career opportunities, sexual image, and life expectations, to name a few. Automatically detecting which of these facets of women are most frequently attacked in social networks will facilitate the development of policies to fight against sexism. Accordingly, each sexist tweet must be categorized into one or more of the following categories:

  • IDEOLOGICAL AND INEQUALITY: The text discredits the feminist movement, rejects inequality between men and women, or presents men as victims of gender-based oppression.

    • “Mi hermana y mi madre se burlan de mí por defender todo el tiempo los derechos de todos y me acaban de decir feminazi, la completaron”. (“My sister and my mother mock me for defending everyone’s rights all the time, and they just called me a feminazi, that tops it all.”)
    • “I think the whole equality thing is getting out of hand. We are different, thats how were made!”.
  • STEREOTYPING AND DOMINANCE: The text expresses false ideas about women that suggest they are more suitable to fulfill certain roles (mother, wife, family caregiver, faithful, tender, loving, submissive, etc.), or inappropriate for certain tasks (driving, hard work, etc.), or claims that men are somehow superior to women.

    • “@Paula2R @faber_acuria A las mujeres hay que amarlas…solo eso… Nunca las entenderás.”. (“Women just have to be loved…that’s all… You will never understand them.”)
    • “Most women no longer have the desire or the knowledge to develop a high quality character, even if they wanted to.”.
  • OBJECTIFICATION: The text presents women as objects apart from their dignity and personal aspects, or assumes or describes certain physical qualities that women must have in order to fulfill traditional gender roles (compliance with beauty standards, hypersexualization of female attributes, women’s bodies at the disposal of men, etc.).

    • ““Pareces una puta con ese pantalón” - Mi hermano de 13 cuando me vio con un pantalón de cuero”. (“‘You look like a whore in those trousers’: my 13-year-old brother when he saw me wearing leather trousers.”)
    • “Don’t get married than blame all woman for your poor investment. You should of got a hooker but instead you choose to go get a wedding ring.”.
  • SEXUAL VIOLENCE: Sexual suggestions, requests for sexual favors or harassment of a sexual nature (rape or sexual assault) are made.

    • “#MeToo Estas 4 no han conseguido su objetivo. El juez estima que se abrieron de patas https://t.co/GSHiiwqY6A Cuánta lagartona hay en este \"metoo\"! 👇🏻👇🏻🤔🤔🤔 https://t.co/8t5VmFIUFn” (“#MeToo These 4 did not achieve their goal. The judge deems that they spread their legs… How many hussies there are in this ‘metoo’!”)
    • “fuck that cunt, I would with my fist”.
  • MISOGYNY AND NON-SEXUAL VIOLENCE: The text expresses hatred and violence towards women.

    • “Las mujeres de hoy en dia te enseñar a querer… estar soltero” (“Today’s women teach you to love… being single.”)
    • “Some woman are so toxic they don’t even know they are draining everyone around them in poison. If you lack self awareness you won’t even notice how toxic you really are”.

Subtask 2.1: Sexism Identification in Memes

This is a binary classification subtask consisting of determining whether a meme is sexist (i.e., it is sexist itself, describes a sexist situation, or criticizes a sexist behaviour), and classifying it into two categories: YES and NO. The following figures show some examples of both types of memes, respectively.

[Figure: two example memes, (a) sexist (YES) and (b) not sexist (NO).]

Subtask 2.2: Source Intention in Memes

As in subtask 1.2, this subtask aims to categorize the meme according to the intention of the author, which provides insights into the role played by social networks in the emission and dissemination of sexist messages. Due to the characteristics of memes, the REPORTED label is virtually absent, so in this task systems should classify memes only with the DIRECT or JUDGEMENTAL labels. The following figures show an example of each.

[Figure: two example memes, (a) direct and (b) judgemental.]

Subtask 2.3: Sexism Categorization in Memes

This task aims to classify sexist memes according to the categorization provided for subtask 1.3: (i) IDEOLOGICAL AND INEQUALITY, (ii) STEREOTYPING AND DOMINANCE, (iii) OBJECTIFICATION, (iv) SEXUAL VIOLENCE and (v) MISOGYNY AND NON-SEXUAL VIOLENCE. The following figures are some examples of categorized memes.

[Figures: example memes for each category: (a) Stereotyping, (b) Sexual violence, (c) Objectification, (d) Misogyny, (e) Ideological.]

Subtask 3.1: Sexism Identification in Videos

This subtask is the same as subtasks 1.1 and 2.1, but applied to TikTok videos. The following are some examples of videos classified as YES or NO.

[Embedded TikTok videos:
(a) YES: @cayleecresta, “#stitch with @goodbrobadbro easy should never be the word used to describe womanhood #fyp #foryou #foryoupage #womenempowerment #women #feminism”
(b) NO: @dailyhealth2, “#haha #kidnapped #bigredswifesarmy #oregon #victimcard #victimblaming #bodyguard #loved #smile #lagrandeoregon”]

Subtask 3.2: Source Intention in Videos

This subtask replicates subtask 2.2 for memes, but takes videos as its source. The following examples show videos representing each category.

[Embedded TikTok videos:
(a) Direct: @yourgirlhaylie, “#duet with @michaelkoz #sexist #foryou #FitCheck #throwhimaway”
(b) Judgemental: @zantyoo, “#womenpower #humiliation #power #womencant #womencantoo #womencan”]

Subtask 3.3: Sexism Categorization in Videos

This subtask aims to classify sexist videos according to the categorization provided for Task 1.3: (i) IDEOLOGICAL AND INEQUALITY, (ii) STEREOTYPING AND DOMINANCE, (iii) OBJECTIFICATION, (iv) SEXUAL VIOLENCE and (v) MISOGYNY AND NON-SEXUAL VIOLENCE. The following figures are some examples of categorized videos.

[Embedded TikTok videos:
(a) Stereotyping: @streaminfreedom, “I’m an idiot! @streaminfreedom #truestory #menvswomen #relationshipcomedy”
(b) Ideological: @itslindobaby, “I’m getting so use to this now 😒 can people just like me for my music? #golddigger #rapper #hiphop #golddiggerprank”
(c) Objectification: @zo3tv, “#duet with @lenatheplug #noJumper #dunked #in #theRight #goal #she #is #beautiful & #babygirl #isTo #swimsuit #never #gotTight #bodySnatched #congrats”
(d) Misogyny: @alt_acc393, “IT'S A JOKEEEEE. #fyp #foryoupage #foryou”
(e) Sexual violence: @caitlinnrowe_, “proud of adelaide today 🤍 #justicforwomen #saraheverard #notallmen #fyp #protest #adelaide #southaustralia #australia #foryoupage”]

How to participate

If you want to participate in the EXIST 2025 shared task at CLEF 2025, please register for the lab at the CLEF 2025 Labs Registration site. Once you have filled out the form, you will receive an email with information on how to join the EXIST 2025 Discord Forum, where EXIST-Datasets, EXIST-Communications, EXIST-Questions/Answers, and EXIST-Guidelines will be made available to participants. This is a manual process, so it might take some time; please don’t worry :-).

Participants will be required to submit their runs and will have the possibility to provide a technical report that should include a brief description of their approach, focusing on the adopted algorithms, models and resources, a summary of their experiments, and an analysis of the obtained results. Although we recommend participating in all subtasks and in both languages, participants are allowed to take part in just one subtask (e.g., subtask 1) and in one language (e.g., English).

Publications

Technical reports will be published in the CLEF 2025 Proceedings at CEUR-WS.org.

Important dates

  • 18 November 2024: Registration opens.
  • 3 February 2025 (extended to 10 February 2025): Training and development sets available.
  • 7 April 2025: Test set available.
  • 25 April 2025: Registration closes.
  • 18 May 2025 (extended to 23 May 2025): Runs submission due to organizers.
  • 8 June 2025: Results notification to participants.
  • 15 June 2025: Submission of Working Notes by participants.
  • 29 June 2025: Notification of acceptance (peer reviews).
  • 7 July 2025: Camera-ready participant papers due to organizers.
  • 9-12 September 2025: EXIST 2025 at CLEF Conference.

Note: All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).

Dataset

Since 2021, the primary goal of the EXIST campaigns has been to identify sexism in tweets, resulting in the creation of three annotated tweet corpora for various EXIST tasks.

In 2024, the EXIST evaluation campaign expanded into multimedia environments. This year, with the inclusion of TikTok videos, the EXIST 2025 dataset aims to provide the research community with the first comprehensive multimedia dataset (encompassing tweets, memes, and videos) for sexism detection and categorization in social media.

Crawling

The TikTok dataset was collected using Apify’s TikTok Hashtag Scraper tool, focusing on hashtags associated with potentially sexist content. A rigorous manual selection process was carried out to ensure an appropriate balance between positive and negative seed hashtags. In total, 185 Spanish hashtags and 61 English hashtags were chosen, guaranteeing a broad and representative collection of sexism-related content in both languages.

The collected TikTok videos were divided into training and test sets following a chronological and author-based partitioning strategy. This approach ensured temporal coherence while preventing data leakage. To achieve this, authors present in the training set were excluded from the test set, preventing the model from learning author-specific patterns and enhancing its generalization capabilities. Additionally, each hashtag (seed) was required to contribute a minimum number of videos, ensuring a more uniform distribution across the dataset. The final selection of videos was conducted randomly but maintained a temporal distribution to ensure diversity and avoid overrepresentation of any specific time period.
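
The partitioning can be thought of as in the following minimal sketch. It is only illustrative: the field names ("author", "timestamp") and the exact selection logic are our assumptions, not the official partitioning code.

    def author_disjoint_split(videos, test_fraction=0.2):
        """Chronological split in which no author appears in both sets."""
        videos = sorted(videos, key=lambda v: v["timestamp"])
        cut = int(len(videos) * (1 - test_fraction))
        train = videos[:cut]
        train_authors = {v["author"] for v in train}
        # Drop test candidates whose author already appears in training,
        # so systems cannot exploit author-specific patterns.
        test = [v for v in videos[cut:] if v["author"] not in train_authors]
        return train, test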

The final dataset comprises more than 3,000 videos. The training set consists of 2,524 videos (1,524 Spanish and 1,000 English). The test set contains 674 videos (304 Spanish and 370 English).

Labeling process

The annotation process was conducted through the Servipoli service at the Universitat Politècnica de València (UPV), with a total of eight students. Given the complexity of video labeling, this year’s methodology relied on annotators who received specialized training through multiple sessions and followed carefully designed guidelines. Additionally, preliminary experiments were conducted on a small set of TikTok videos to ensure a thorough understanding of the task and to guarantee the quality of the annotations. Due to this new labelling methodology, the 2025 video dataset includes only the annotators’ gender as demographic data.

The labeling process was performed in pairs, with each annotator responsible for labeling 1,000 TikTok videos while maintaining close communication with experts throughout the process. As a result, each TikTok video was labeled by two annotators. To ensure a rigorous evaluation of the dataset in a challenging context, while minimizing data loss, any disagreements between annotators were resolved by a member of the research team, who made the final decision.

Learning with disagreements

The idea that natural language expressions have a single, clearly identifiable interpretation in a given context is a convenient simplification but does not reflect reality, particularly in highly subjective tasks such as sexism identification. The learning with disagreements paradigm addresses this challenge by allowing systems to learn from datasets that do not rely on a single “gold” annotation but instead incorporate the perspectives of multiple annotators, capturing the diversity of interpretations.

Following approaches designed to train models directly from data with disagreements, rather than using an aggregated label, we will provide all annotations per instance for the different annotators.
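
As an illustration, a single instance could look as follows. This layout is only a sketch: the field names are our assumptions, not the official dataset schema; the point is that every annotator’s labels are preserved instead of a single aggregated gold label.

    instance = {
        "id": "300001",
        "lang": "en",
        "video": "300001.mp4",                    # tweet text or meme image in Tasks 1 and 2
        "labels_task3_1": ["YES", "NO"],          # one label per annotator (mono-label)
        "labels_task3_3": [["STEREOTYPING-DOMINANCE", "OBJECTIFICATION"], ["-"]],
        "annotators_gender": ["F", "M"],          # only demographic data released for videos
    }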

More details about the dataset will be provided in the task overview (bias consideration, annotation process, quality experiments, inter-annotator agreement, etc.).

Evaluation

From the point of view of evaluation metrics, our nine subtasks can be described as:

  • Subtasks 1.1, 2.1 and 3.1 (sexism identification): binary classification, mono label.
  • Subtasks 1.2, 2.2 and 3.2 (source intention): multiclass hierarchical classification, mono-label. The hierarchy of classes has a first level with YES/NO, and a second level with mutually exclusive sexist subcategories: DIRECT, REPORTED and JUDGEMENTAL for tweets, and DIRECT and JUDGEMENTAL for memes and videos. A suitable evaluation metric must reflect the fact that a confusion between not sexist and a sexist category is more severe than a confusion between two sexist subcategories.
  • Subtasks 1.3, 2.3 and 3.3 (sexism categorization): multiclass hierarchical classification, multi-label. Again, the first level is a binary distinction between YES/NO, and there is a second level for the sexist category that includes “ideological and inequality”, “stereotyping and dominance”, “objectification”, “sexual violence” and “misogyny and non-sexual violence”. These classes are not mutually exclusive: a tweet/meme/video may belong to several subcategories at the same time.

The learning with disagreements paradigm can be considered on both sides of the evaluation process:

  • The ground truth. In a “hard” setting, the variability in the human annotations is reduced to a gold standard set of categories (hard labels) assigned to each item (e.g., using majority vote). In a “soft” setting, the gold standard is the full set of human annotations with their variability, and the evaluation metric incorporates the proportion of human annotators that selected each category (soft labels). Note that in subtasks 1.1, 2.1, 3.1, 1.2, 2.2 and 3.2, which are mono-label problems, the probabilities of the classes must sum to one. But in subtasks 1.3, 2.3 and 3.3, which are multi-label, each annotator may select more than one category for a single item, so the probabilities of the classes may sum to more than one.
  • The system output. In a “hard”, traditional setting, the system predicts one or more categories for each item. In a “soft” setting, the system predicts a probability for each category, for each item. The evaluation score is maximized when the predicted probabilities match the actual probabilities in a soft ground truth. Again, note that in subtasks 1.3, 2.3 and 3.3, which are multi-label problems, the probabilities predicted by the system for the different categories do not necessarily add up to one (see the sketch after this list).
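
The following sketch shows how soft labels can be derived from the raw annotations in both settings; the function names are ours, for illustration only.

    from collections import Counter

    def soft_labels_mono(votes):
        """Mono-label subtasks: one label per annotator; proportions sum to 1."""
        counts = Counter(votes)                  # e.g. ["YES", "YES", "NO"]
        return {label: c / len(votes) for label, c in counts.items()}

    def soft_labels_multi(votes, categories):
        """Multi-label subtasks: each annotator selects a set of categories,
        so the per-category proportions may sum to more than 1."""
        return {cat: sum(cat in v for v in votes) / len(votes)
                for cat in categories}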

For each of the tasks, two types of evaluation will be reported:

  • Hard-hard: hard system output and hard ground truth.
  • Soft-soft: soft system output and soft ground truth.

For all tasks and all types of evaluation (hard-hard and soft-soft) we will use the same official metric: ICM (Information Contrast Measure) (Amigó and Delgado, 2022). ICM is a similarity function that generalizes Pointwise Mutual Information (PMI), and can be used to evaluate system outputs in classification problems by computing their similarity to the ground truth categories. As, to the best of our knowledge, no existing metric fits hierarchical multi-label classification problems in a learning with disagreements scenario, we have defined an extension of ICM (ICM-soft) that accepts both soft system outputs and soft ground truth assignments.
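
For intuition, a minimal sketch of ICM for flat, hard classification follows. It assumes the general form ICM(A, B) = a1·IC(A) + a2·IC(B) - b·IC(A ∪ B) with the parameterization (a1 = a2 = 2, b = 3) proposed by Amigó and Delgado (2022), treats categories as independent, and estimates category probabilities from their frequency in the ground truth; the actual metric also propagates probabilities through the class hierarchy.

    import math

    def ic(labels, p):
        """Information content of a set of categories, assuming independence:
        IC(C) = -log2 P(C), i.e. the sum of the per-category -log2 p(c)."""
        return sum(-math.log2(p[c]) for c in labels)

    def icm(system, gold, p, a1=2.0, a2=2.0, beta=3.0):
        """ICM is positive when the output (a set of labels) matches the
        ground truth, and negative when the two label sets are unrelated."""
        return a1 * ic(system, p) + a2 * ic(gold, p) - beta * ic(system | gold, p)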

For each of the tasks, the evaluation will be performed in the two modes described above, as follows:

  • Hard-hard evaluation. For systems that provide a hard, conventional output, we will provide a hard-hard evaluation. To derive the hard labels in the ground truth from the different annotators’ labels, we use a probabilistic threshold computed for each task. As a result, for subtasks 1.1 and 2.1, the class annotated by more than 3 annotators is selected; for subtasks 1.2 and 2.2, the class annotated by more than 2 annotators is selected; and for subtasks 1.3 and 2.3 (multi-label), the classes annotated by more than 1 annotator are selected. Due to the nature of subtasks 3.1, 3.2 and 3.3 and the complexity of video labeling, the labeling methodology changed for these subtasks, so the hard labels included are those annotated by more than 1 annotator (see the sketch after this list). Items for which there is no majority class (i.e., no class receives more probability than the threshold) will be removed from this evaluation scheme. The official metric will be the original ICM (as defined in (Amigó and Delgado, 2022)). We will also report and compare systems with F1 (the harmonic mean of precision and recall). In subtasks 1.1, 2.1 and 3.1, we will use F1 for the positive class. In the remaining subtasks, we will use the average of F1 over all classes. Note, however, that F1 is not ideal in our experimental setting: although it can handle multi-label situations, it does not consider the relationships between classes: a mistake between not sexist and any of the sexist subclasses and a mistake between two of the positive subclasses are penalized equally, although the former is a more severe error.
  • Soft-soft evaluation. For systems that provide probabilities for each category, we will provide a soft-soft evaluation that compares the probabilities assigned by the system with the probabilities assigned by the set of human annotators. As in the previous case, we will use ICM-soft as the official evaluation metric in this variant. We may also report additional metrics in the final report.
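
The derivation of hard labels from annotator votes described above can be sketched as follows; the threshold table and function are illustrative, not the official evaluation code.

    from collections import Counter

    # Strictly-greater-than vote thresholds per subtask, as described above.
    THRESHOLDS = {"1.1": 3, "2.1": 3,
                  "1.2": 2, "2.2": 2,
                  "1.3": 1, "2.3": 1,            # multi-label
                  "3.1": 1, "3.2": 1, "3.3": 1}  # two annotators per video

    def hard_labels(votes, subtask):
        """Derive the hard ground truth; an empty result means no class
        clears the threshold and the item is removed from the hard-hard
        evaluation."""
        counts = Counter(votes)                  # one vote per label selected
        return [label for label, c in counts.items()
                if c > THRESHOLDS[subtask]]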

Enrique Amigó and Agustín Delgado. 2022. Evaluating Extreme Hierarchical Multi-label Classification. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5809–5819, Dublin, Ireland. Association for Computational Linguistics.

Results

A total of 244 teams from 38 countries registered for EXIST 2025, of which 114 teams from 23 countries submitted at least one run. This year’s edition reflects a remarkable level of engagement, with 873 runs received in total — 589 hard-label and 284 soft-label submissions.

Most participants focused on the textual analysis tasks, particularly Task 1, which alone received 596 evaluated runs. However, Task 3 also attracted significant interest, with 211 runs evaluated, highlighting the increasing relevance of multimodal approaches.

Below are the official leaderboards for all participants and tasks in all evaluation contexts:

Link to Subtask 1.1 Leaderboard

Link to Subtask 1.2 Leaderboard

Link to Subtask 1.3 Leaderboard

Link to Subtask 2.1 Leaderboard

Link to Subtask 2.2 Leaderboard

Link to Subtask 2.3 Leaderboard

Link to Subtask 3.1 Leaderboard

Link to Subtask 3.2 Leaderboard

Link to Subtask 3.3 Leaderboard

Details:

  • Hard-hard: hard system output and hard ground truth.
    • Metrics:
      • ICM-Hard: ICM is the official metric for the ranking (as defined in Amigó and Delgado, 2022).
      • ICM-Hard Norm: ICM hard normalized.
      • F1: for the identification subtasks (1.1, 2.1 and 3.1), we provide results for F1 for the positive class, “YES”. For the remaining subtasks, we provide results for the average of F1 over all classes.
    • Baselines:
      • Majority class: non-informative baseline that classifies all instances as the majority class.
      • Minority class: non-informative baseline that classifies all instances as the minority class.
  • Soft-soft: soft system output and soft ground truth.
    • Metrics:
      • ICM-Soft: ICM soft is the official metric for the ranking (as adapted from Amigó and Delgado, 2022).
      • ICM-Soft Norm: ICM soft normalized.
      • Cross Entropy: in Subtasks 1 and 2, we provide results for the cross-entropy measure (a sketch is given after this list).
    • Baselines:
      • Majority class: non-informative baseline that classifies all instances as the majority class. Note that the probability of the class has been set to 1.
      • Minority class: non-informative baseline that classifies all instances as the minority class. Note that the probability of the class has been set to 1.
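
As a reference, one standard formulation of the cross entropy between the soft ground truth and the system’s predicted probabilities is sketched below (ours for illustration; lower is better):

    import math

    def cross_entropy(gold, system, eps=1e-12):
        """-sum over classes of g(c) * log p(c); eps guards against log(0)."""
        return -sum(g * math.log(system.get(c, 0.0) + eps)
                    for c, g in gold.items())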


EXIST 2025 Lab Program

EXIST 2025 is co-located with the CLEF Conference, and will be held face-to-face on Wednesday, September 10th, 2025 in Madrid.

Wednesday, September 10th

11:30 – 13:15
Overview of EXIST 2025: Learning with Disagreement for Sexism Identification and Characterization in Tweets, Memes, and TikTok Videos
Laura Plaza, Jorge Carrillo-de-Albornoz, Iván Arcos, Paolo Rosso, Damiano Spina, Enrique Amigó, Julio Gonzalo, Roser Morante


14:15 – 15:45
EXIST 2025 Parallel Session 1: Sexism detection and categorization in Multimedia Content

  • 14:15 – 14:20: Welcome and Opening Remarks
  • 14:20 – 15:05: Keynote Speaker: To be announced
  • 15:05 – 15:15: Mario at EXIST 2025: A Simple Gateway to Effective Multilingual Sexism Detection
    Lin Tian, Johanne R. Trippas, Marian-Andrei Rizoiu
  • 15:15 – 15:25: CLiC at EXIST 2025: Combining Fine-tuning and Prompting with Learning with Disagreement for Sexism Detection
    Pol Pastells, Mauro Vázquez, Mireia Farrús, Mariona Taulé
  • 15:25 – 15:35: ANLP-Uniso at EXIST 2025: Sexism Identification and Characterization in Tweets
    Ghada Ben Amor, Nawres Medimagh, Sawssen Ben Chaabene, Omar Trigui
  • 15:35 – 15:45: Identifying Sexism in Memes with Multimodal Deep Learning: Fusing Text and Visual Cues
    Iván Arcos

15:45 – 16:30
Poster Session

  • NLPDame at EXIST: Sexism Categorization in Tweets via Multi-Head Multi-Task Models, LLM & RAG Voting Synergy
    Christina Christodoulou

16:30 – 18:00
EXIST 2025 Parallel Session 2: Sexism detection and categorization in Multimedia Content

  • 16:30 – 16:40: ECA-SIMM-UVa at EXIST 2025: A Segmentation Oriented Approach to Sexism Detection in TikTok Videos Based on a “One Is Enough” Paradigm
    David Fernández, Enrique Amigó, Valentín Cardeñoso
  • 16:40 – 16:50: FHSTP@EXIST 2025 Benchmark: Sexism Detection with Transparent Speech Concept Bottleneck Models
    Roberto Labadie-Tamayo, Adrian Jaques Böck, Djordje Slijepčević, Xihui Chen, Andreas Babic, Matthias Zeppelzauer
  • 16:50 – 17:00: GrootWatch at EXIST 2025: Automatic Sexism Detection on Social Networks – Classification of Tweets and Memes
    Nathan Nowakowski, Lorenzo Calogiuri, Elöd Egyed-Zsigmond, Diana Nurbakova, Johan Erbani, Sylvie Calabretto
  • 17:00 – 17:10: Tackling Sexism in Multimodal Social Media: Exploring Hybrid Generative-Transformer Models
    Moiz Ali, Lakshmi Yendapalli, Bishoy Tawfik, Matt Winzenried
  • 17:10 – 17:20: UMUTeam at EXIST 2025: Multimodal Transformer Architectures and Soft-Label Learning for Sexism Detection
    Ronghao Pan, Tomás Bernal-Beltrán, José Antonio García-Díaz, Rafael Valencia-García
  • 17:20 – 17:50: EXIST 2026: What’s Next?
    A Multimodal Sensor-Data Approach to Sexism Identification in Memes using Heart-Rate Variation, Eye-Tracking and EEG Signals
    Iván Arcos
  • 17:50 – 18:00: Final discussion and suggestions

Organizers

Damiano Spina

RMIT University

Senior Lecturer

Enrique Amigó

UNED

Associate Professor

Iván Arcos

Universitat Politècnica de València

Researcher in Computational Linguistics

Jorge Carrillo-de-Albornoz

UNED

RMIT University

Associate Professor

Julio Gonzalo

UNED

Full Professor

Laura Plaza

UNED

RMIT University

Associate Professor

Paolo Rosso

Universitat Politècnica de València

Full Professor

Roser Morante

UNED

Researcher in Computational Linguistics

Sponsors

ARC Centre of Excellence for Automated Decision-Making and Society (ADM+S) (CE200100005)

RMIT University

FairTransNLP Project

(PID2021-124361OB-C31 and PID2021-124361OB-C32)

Spanish Ministry of Science and Innovation

Pattern Recognition and Human Language Technologies (PRHLT) Research Center

Universitat Politècnica de València

Contact

For any questions concerning the shared task, please write to Jorge Carrillo-de-Albornoz.

Related Work

Overviews of previous LeWiDi EXIST editions:

Extended overviews of previous LeWiDi EXIST editions:

Working notes of previous LeWiDi EXIST editions:

Video annotation related work

Iván Arcos and Paolo Rosso (2024). Sexism Identification on TikTok: A Multimodal AI Approach with Text, Audio, and Video. In: Proceedings of the 15th International Conference of the CLEF Association (CLEF 2024), Grenoble, France, September 9-12. Springer-Verlag, LNCS 14958, pp. 61-73.