Welcome to the website of EXIST 2025, the fifth edition of the sEXism Identification in Social neTworks task at CLEF 2025.
EXIST is a series of scientific events and shared tasks on sexism identification in social networks. EXIST aims to foster the automatic detection of sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours (EXIST 2021, EXIST 2022, EXIST 2023, EXIST 2024). The fifth edition of the EXIST shared task will be held as a Lab in CLEF 2025, on September 9-12, 2025, at UNED, Madrid, Spain.
Social networks are the main platforms for social complaint and activism. Movements like #MeToo, #8M or #TimesUp have spread rapidly. Under the umbrella of social networks, many women all around the world have reported the abuse, discrimination and other sexist experiences they have suffered in real life. Social networks are also contributing to the transmission of sexism and other disrespectful and hateful behaviours. In this context, automatic tools may help not only to detect and raise the alarm against sexist behaviours and discourses, but also to estimate how often sexist and abusive situations appear on social media platforms, which forms of sexism are most frequent, and how sexism is expressed in these media. This Lab will contribute to developing applications to detect sexism.
In the 2024 EXIST campaign the datasets included multimedia content in the form of memes, advancing research on more robust techniques to identify sexism in social networks. Following this line, this year the challenge focuses on TikTok videos, so that the dataset covers the three most important multimedia elements used to spread sexism: text, images and videos. It is therefore essential to develop automated multimodal tools capable of detecting sexism in text, images and videos, to raise alarms or automatically remove such content from social networks, because platforms' algorithms often amplify content that perpetuates gender stereotypes and internalized misogyny. This lab will contribute to the creation of applications that identify sexist content in social media across all three formats.
As in the 2023 and 2024 editions, this edition will also embrace the Learning With Disagreement (LeWiDi) paradigm for both the development of the dataset and the evaluation of the systems. The LeWiDi paradigm does not rely on a single "correct" label for each example. Instead, the model is trained to handle and learn from conflicting or diverse annotations. This enables the system to take into account the different annotators' perspectives, biases and interpretations, resulting in a fairer learning process.
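As a rough illustration of what LeWiDi means in practice, the sketch below derives a soft target from several hypothetical annotator labels and trains against it instead of a hard label; the encoder size and the annotations are invented for the example.

```python
import torch
import torch.nn as nn

# Hypothetical Task 1.1 example: six annotators, four said YES, two said NO.
annotations = ["YES", "YES", "NO", "YES", "NO", "YES"]
soft_target = torch.tensor([
    annotations.count("NO") / len(annotations),   # P(NO)  = 2/6
    annotations.count("YES") / len(annotations),  # P(YES) = 4/6
])

# A linear probe stands in for a fine-tuned encoder; 768 is an assumed size.
model = nn.Linear(768, 2)
features = torch.randn(1, 768)  # placeholder for a sentence embedding
logits = model(features)

# Cross-entropy against a probability distribution instead of a hard label.
loss = -(soft_target * torch.log_softmax(logits, dim=-1)).sum()
loss.backward()
```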
In previous editions, 223 teams from more than 50 countries submitted runs, achieving impressive results, especially in the sexism detection task. However, there is still room for improvement, especially when the problem is addressed under the LeWiDi paradigm in a multimedia context.
Participants will be asked to identify and characterize sexism in social networks from different sources. This year the lab comprises nine subtasks in two languages, English and Spanish: the same three tasks (sexism identification, source intention detection, and sexism categorization) applied to three different types of data: text (tweets), images (memes) and videos (TikToks). This multimedia approach will help identify trends and patterns in sexism across media formats and user interactions, contributing to a deeper understanding of the underlying social dynamics. In addition, approaches submitted to all tasks will be evaluated to analyze their capacity to detect sexism in multimodal sources.
A condensed schema of all tasks included in this year's lab is presented in the following table:

| | Task 1: Tweets | Task 2: Memes | Task 3: Videos (TikToks) |
|---|---|---|---|
| Sexism identification | Subtask 1.1 | Subtask 2.1 | Subtask 3.1 |
| Source intention detection | Subtask 1.2 | Subtask 2.2 | Subtask 3.2 |
| Sexism categorization | Subtask 1.3 | Subtask 2.3 | Subtask 3.3 |
For a more detailed description of each subtask, as well as some examples, check the next sections.
The first subtask is a binary classification task: systems must decide whether or not a given tweet contains sexist expressions or behaviours (i.e., it is sexist itself, describes a sexist situation, or criticizes a sexist behaviour), and classify it accordingly into two categories: YES and NO.
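For illustration only, here is a minimal YES/NO baseline sketch using a bag-of-words classifier; the texts and labels are invented placeholders, and competitive systems would typically fine-tune a multilingual transformer instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented placeholder data; real runs would use the EXIST training set.
train_texts = ["example sexist tweet", "example neutral tweet"]
train_labels = ["YES", "NO"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)
print(clf.predict(["another tweet to classify"]))  # -> ['YES'] or ['NO']
```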
Once a message has been classified as sexist, the second subtask aims to categorize the message according to the intention of its author, which provides insights into the role played by social networks in the emission and dissemination of sexist messages. In this subtask, we propose a ternary classification:
DIRECT: the intention is to write a message that is sexist by itself or that incites sexist behaviour.
REPORTED: the intention is to report and share a sexist situation suffered by a woman or women, in first or third person.
JUDGEMENTAL: the intention is to judge, since the tweet describes sexist situations or behaviours with the aim of condemning them.
Many facets of a woman's life may be the target of sexist attitudes, including domestic and parenting roles, career opportunities, sexual image, and life expectations, to name a few. Automatically detecting which of these facets are most frequently attacked in social networks will facilitate the development of policies to fight sexism. Accordingly, each sexist tweet must be categorized into one or more of the following categories (a minimal sketch of the resulting multi-label setup follows the list):
IDEOLOGICAL AND INEQUALITY: The text discredits the feminist movement, rejects inequality between men and women, or presents men as victims of gender-based oppression.
STEREOTYPING AND DOMINANCE: The text expresses false ideas about women that suggest they are more suitable to fulfill certain roles (mother, wife, family caregiver, faithful, tender, loving, submissive, etc.), or inappropriate for certain tasks (driving, hard work, etc.), or claims that men are somehow superior to women.
OBJECTIFICATION: The text presents women as objects apart from their dignity and personal aspects, or assumes or describes certain physical qualities that women must have in order to fulfill traditional gender roles (compliance with beauty standards, hypersexualization of female attributes, women’s bodies at the disposal of men, etc.).
SEXUAL VIOLENCE: Sexual suggestions, requests for sexual favors or harassment of a sexual nature (rape or sexual assault) are made.
MISOGYNY AND NON-SEXUAL VIOLENCE: The text expresses hatred and violence towards women.
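Since a tweet may belong to several categories at once, this is a multi-label rather than multi-class task. A minimal sketch of such a setup, with an assumed encoder size and placeholder features:

```python
import torch
import torch.nn as nn

CATEGORIES = [
    "IDEOLOGICAL AND INEQUALITY", "STEREOTYPING AND DOMINANCE",
    "OBJECTIFICATION", "SEXUAL VIOLENCE", "MISOGYNY AND NON-SEXUAL VIOLENCE",
]

head = nn.Linear(768, len(CATEGORIES))  # 768 = assumed encoder hidden size
features = torch.randn(1, 768)          # placeholder embedding

# A tweet labelled with the first two categories at once.
target = torch.tensor([[1., 1., 0., 0., 0.]])

# One sigmoid per label (binary cross-entropy), not a softmax over labels.
loss = nn.BCEWithLogitsLoss()(head(features), target)
predicted = torch.sigmoid(head(features)).squeeze(0) > 0.5
print([c for c, p in zip(CATEGORIES, predicted) if p])
```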
This is a binary classification subtask consisting of determining whether a meme is sexist (i.e., it is sexist itself, describes a sexist situation or criticizes a sexist behaviour), and classifying it into two categories: YES and NO. The following figures are some examples of both types of memes, respectively.
As in subtask 1.2, this subtask aims to categorize the meme according to the intention of its author, which provides insights into the role played by social networks in the emission and dissemination of sexist messages. Due to the characteristics of memes, the REPORTED label is virtually absent, so in this task systems should only classify memes with the DIRECT or JUDGEMENTAL labels. The following figures show some examples of each, respectively.
This task aims to classify sexist memes according to the categorization provided for subtask 1.3: (i) IDEOLOGICAL AND INEQUALITY, (ii) STEREOTYPING AND DOMINANCE, (iii) OBJECTIFICATION, (iv) SEXUAL VIOLENCE and (v) MISOGYNY AND NON-SEXUAL VIOLENCE. The following figures are some examples of categorized memes.
Figure: example memes for each category: (a) Stereotyping, (b) Sexual violence, (c) Objectification, (d) Misogyny, (e) Ideological.
This subtask is the video counterpart of subtasks 1.1 and 2.1: systems must decide whether a TikTok video is sexist or not (YES/NO). The following are some examples of videos classified as YES or NO.
- @cayleecresta: "#stitch with @goodbrobadbro easy should never be the word used to describe womanhood #fyp #foryou #foryoupage #womenempowerment #women #feminism" (original sound - Caylee Cresta)
- @dailyhealth2: "#haha #kidnapped #bigredswifesarmy #oregon #victimcard #victimblaming #bodyguard #loved #smile #lagrandeoregon" (original sound)
This subtask replicates subtask 2.2 for memes, but its source is videos. The following examples are some videos representing each category.
- @yourgirlhaylie: "#duet with @michaelkoz #sexist #foryou #FitCheck #throwhimaway" (original sound - Mike Koz)
- @zantyoo: "#womenpower #humiliation #power #womencant #womencantoo #womencan" (original sound - Amizan Words)
This subtask aims to classify sexist videos according to the categorization provided for Task 1.3: (i) IDEOLOGICAL AND INEQUALITY, (ii) STEREOTYPING AND DOMINANCE, (iii) OBJECTIFICATION, (iv) SEXUAL VIOLENCE and (v) MISOGYNY AND NON-SEXUAL VIOLENCE. The following are some examples of categorized videos; a rough feature-extraction sketch follows the examples.
- @streaminfreedom: "I'm an idiot! @streaminfreedom #truestory #menvswomen #relationshipcomedy" (original sound - leanne_lou)
- @itslindobaby: "I'm getting so use to this now 😒 can people just like me for my music? #golddigger #rapper #hiphop #golddiggerprank" (original sound - Lindo)
- @zo3tv: "#duet with @lenatheplug #noJumper #dunked #in #theRight #goal #she #is #beautiful & #babygirl #isTo #swimsuit #never #gotTight #bodySnatched #congrats" (sound: Aesthetic Girl - Yusei)
- @alt_acc393: "IT'S A JOKEEEEE. #fyp #foryoupage #foryou" (original sound - alt acc)
- @caitlinnrowe_: "proud of adelaide today 🤍 #justicforwomen #saraheverard #notallmen #fyp #protest #adelaide #southaustralia #australia #foryoupage" (sound: THISISNOTMYREMIX - Thewizardliz)
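The examples above are full TikTok videos. As a rough, non-official sketch of how such a video might be turned into features for subtasks 3.1-3.3, the snippet below samples frames and averages CLIP image embeddings; the model choice, sampling rate, and the omission of audio/transcript processing are all assumptions of this example.

```python
import cv2
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_video(path, every_n=30):
    """Sample one frame every `every_n` frames and average their CLIP
    image embeddings into a single video-level feature vector."""
    cap, frames, i = cv2.VideoCapture(path), [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # BGR -> RGB
        i += 1
    cap.release()
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs).mean(dim=0)  # shape: (512,)

# Five Task 3.3 categories; the untrained linear head is only a stand-in.
head = torch.nn.Linear(512, 5)
logits = head(embed_video("example.mp4"))  # "example.mp4" is a placeholder
```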
If you want to participate in the EXIST 2025 shared task at CLEF 2025, please register for the lab at the CLEF 2025 Labs Registration site. Once you have filled out the form, you will receive an email with information on how to join the EXIST 2025 Discord Forum, where EXIST-Datasets, EXIST-Communications, EXIST-Questions/Answers, and EXIST-Guidelines will be made available to participants. This is a manual process, so it might take some time. Please don't worry. :-)
Participants will be required to submit their runs and will have the possibility to provide a technical report that should include a brief description of their approach, focusing on the adopted algorithms, models and resources, a summary of their experiments, and an analysis of the obtained results. Although we recommend participating in all subtasks and in both languages, participants are allowed to take part in just one of them (e.g., subtask 1) and in one language (e.g., English).
Technical reports will be published in the CLEF 2025 Proceedings at CEUR-WS.org.
Note: All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).
Since 2021, the primary goal of the EXIST campaigns has been to identify sexism in tweets, resulting in the creation of three annotated tweet corpora for various EXIST tasks.
In 2024, the EXIST evaluation campaign expanded into multimedia environments. This year, with the inclusion of TikTok videos, the EXIST 2025 Dataset aims to provide the research community with the first comprehensive multimedia dataset, encompassing tweets, memes, and videos, for sexism detection and categorization in social media.
The TikTok dataset was collected using Apify's TikTok Hashtag Scraper tool, focusing on hashtags associated with potentially sexist content. A rigorous manual selection process was carried out to ensure an appropriate balance between positive and negative seed hashtags. In total, 185 Spanish hashtags and 61 English hashtags were chosen, guaranteeing a broad and representative collection of sexism-related content in both languages.
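For readers who want to replicate a similar collection pipeline, a hedged sketch using the apify-client Python package is shown below; the actor ID, input fields, and output field names are assumptions based on Apify's public TikTok scrapers, not the exact configuration used by the organizers.

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")  # placeholder token

# Actor ID and input fields are assumptions, not the organizers' exact setup.
run = client.actor("clockworks/tiktok-hashtag-scraper").call(
    run_input={"hashtags": ["example_seed_hashtag"], "resultsPerPage": 100}
)

# Iterate over the scraped items stored in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("webVideoUrl"))  # field name is an assumption
```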
The collected TikTok videos were divided into training and test sets following a chronological and author-based partitioning strategy. This approach ensured temporal coherence while preventing data leakage. To achieve this, authors present in the training set were excluded from the test set, preventing the model from learning author-specific patterns and enhancing its generalization capabilities. Additionally, each hashtag (seed) was required to contribute a minimum number of videos, ensuring a more uniform distribution across the dataset. The final selection of videos was conducted randomly but maintained a temporal distribution to ensure diversity and avoid overrepresentation of any specific time period.
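A minimal sketch of such a chronological, author-disjoint split (the "author" and "date" field names are assumptions of this example):

```python
def split_author_disjoint(videos, train_fraction=0.8):
    """Chronological cut, then drop test videos whose author already
    appears in training, so no author spans both sets. The "date" and
    "author" keys are assumptions of this sketch."""
    videos = sorted(videos, key=lambda v: v["date"])
    cut = int(len(videos) * train_fraction)
    train = videos[:cut]
    train_authors = {v["author"] for v in train}
    test = [v for v in videos[cut:] if v["author"] not in train_authors]
    return train, test
```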
The final dataset comprises more than 3,000 videos. The training set consists of 2,524 videos, including 1,524 Spanish videos and 1,000 English videos. The test set contains 674 videos, with a subset of 304 Spanish videos and 370 English videos.
The annotation process was conducted through the Servipoli service at the Universitat Politècnica de València (UPV), with a total of eight students. Given the complexity of video labeling, this year's methodology relied on annotators who received specialized expert training through multiple sessions and followed carefully designed guidelines. Additionally, preliminary experiments were conducted on a small set of TikTok videos to ensure a thorough understanding of the task and to guarantee the quality of the annotations. Due to this new labelling methodology, the 2025 video dataset includes only the gender of the annotators as demographic data.
The labeling process was performed in pairs, with each annotator responsible for labeling 1,000 TikTok videos while maintaining close communication with experts throughout the process. As a result, each TikTok video was labeled by two annotators. To ensure a rigorous evaluation of the dataset in a challenging context, while minimizing data loss, any disagreements between annotators were resolved by a member of the research team, who made the final decision.
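As an illustration of this double-annotation setup, the sketch below measures pairwise agreement with Cohen's kappa and flags the disagreements that would go to a research-team member for adjudication; the labels are invented placeholders.

```python
from sklearn.metrics import cohen_kappa_score

# Invented placeholder labels from the two annotators of one pair.
ann_a = ["YES", "NO", "YES", "YES"]
ann_b = ["YES", "NO", "NO", "YES"]

print("kappa:", cohen_kappa_score(ann_a, ann_b))

# Disagreements are passed to a research-team member for the final decision.
needs_adjudication = [i for i, (a, b) in enumerate(zip(ann_a, ann_b)) if a != b]
print("items to adjudicate:", needs_adjudication)
```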
The idea that natural language expressions have a single, clearly identifiable interpretation in a given context is a convenient simplification but does not reflect reality, particularly in highly subjective tasks such as sexism identification. The learning with disagreements paradigm addresses this challenge by allowing systems to learn from datasets that do not rely on a single “gold” annotation but instead incorporate the perspectives of multiple annotators, capturing the diversity of interpretations.
Following approaches designed to train models directly from data containing disagreements, rather than from an aggregated label, we will provide all of the individual annotators' labels for each instance.
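A hypothetical per-instance record in this style might look as follows; the field names are illustrative, not the official EXIST 2025 schema.

```python
# Illustrative per-instance record keeping every annotator's label
# (field names are hypothetical, not the official schema).
instance = {
    "id": "100001",
    "lang": "en",
    "text": "...",                          # tweet, meme text, or caption
    "labels_task1": ["YES", "YES", "NO"],   # one label per annotator
    "annotators_gender": ["F", "M", "F"],   # only demographic for videos
}

# Systems may aggregate however they see fit, e.g. into a soft distribution:
n = len(instance["labels_task1"])
soft = {lab: instance["labels_task1"].count(lab) / n for lab in ("YES", "NO")}
print(soft)  # approximately {'YES': 0.667, 'NO': 0.333}
```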
More details about the dataset will be provided in the task overview (bias consideration, annotation process, quality experiments, inter-annotator agreement, etc.).
For any questions concerning the shared task, please write to Jorge Carrillo-de-Albornoz.