Welcome to the website of EXIST 2022, the second edition of the sEXism Identification in Social neTworks task at IberLEF 2022.
EXIST 2021 was the first shared task aiming to detect sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours.
Detecting online sexism may be difficult, as it can be expressed in very different forms. Sexism may sound “friendly”: the statement “Women must be loved and respected, always treat them like a fragile glass” may seem positive, but it actually assumes that women are weaker than men. Sexism may sound “funny”, as in sexist jokes or humour (“You have to love women… just that… You will never understand them.”). Sexism may sound “offensive” and “hateful”, as in “Humiliate, expose and degrade yourself as the fucking bitch you are if you want a real man to give you attention”. The automatic identification of sexism may help to create, design and determine the evolution of new equality policies, as well as encourage better behaviours in society.
In the previous edition, more than 31 teams submitted their results, achieving impressive performance, especially in the first task. However, there is still room for improvement, especially in Task 2, where the aim is to categorize sexism according to the facet of women that is undermined. We encourage researchers to participate and beat last edition’s results.
Participants will be asked to classify “tweets” (in English and Spanish) according to the following two tasks:
The first subtask is a binary classification task. Systems must decide whether or not a given tweet contains sexist expressions or behaviours (i.e., it is sexist itself, describes a sexist situation or criticizes a sexist behaviour). The following tweets show examples of sexist and non-sexist messages.
Once a message has been classified as sexist, the second task aims to categorize it according to the type of sexism, following a categorization proposed by experts that takes into account the different facets of women that are undermined. In particular, we propose a five-class classification task:
IDEOLOGICAL AND INEQUALITY: The text discredits the feminist movement, rejects inequality between men and women, or presents men as victims of gender-based oppression.
STEREOTYPING AND DOMINANCE: The text expresses false ideas about women that suggest they are more suitable for certain roles (mother, wife, family caregiver, faithful, tender, loving, submissive, etc.), or unsuited to certain tasks (driving, hard work, etc.), or claims that men are somehow superior to women.
OBJECTIFICATION: The text presents women as objects apart from their dignity and personal aspects, or assumes or describes certain physical qualities that women must have in order to fulfill traditional gender roles (compliance with beauty standards, hypersexualization of female attributes, women’s bodies at the disposal of men, etc.).
SEXUAL VIOLENCE: Sexual suggestions, requests for sexual favors or harassment of a sexual nature (rape or sexual assault) are made.
MISOGYNY AND NON-SEXUAL VIOLENCE: The text expresses hatred and violence towards women.
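The two tasks chain naturally: Task 2 applies only to messages that Task 1 flags as sexist. A minimal sketch of that pipeline follows; the keyword cues, function names and placeholder fallbacks below are purely illustrative assumptions, not an official baseline or submission format.

```python
# Hypothetical two-stage sketch of the EXIST setup.
# Keyword cues and the fallback category are toy placeholders, NOT the
# official annotation criteria or an official baseline.
from typing import Optional, Tuple

TASK2_LABELS = [
    "IDEOLOGICAL AND INEQUALITY",
    "STEREOTYPING AND DOMINANCE",
    "OBJECTIFICATION",
    "SEXUAL VIOLENCE",
    "MISOGYNY AND NON-SEXUAL VIOLENCE",
]

def classify_task1(tweet: str) -> str:
    """Task 1: toy binary decision ('sexist' vs 'non-sexist') via keyword cues."""
    cues = ("women must", "fragile glass", "fucking bitch")  # placeholder cues
    return "sexist" if any(cue in tweet.lower() for cue in cues) else "non-sexist"

def classify_task2(tweet: str) -> str:
    """Task 2: toy 5-way categorization, applied only to tweets flagged in Task 1."""
    if "bitch" in tweet.lower():
        return "MISOGYNY AND NON-SEXUAL VIOLENCE"
    return "STEREOTYPING AND DOMINANCE"  # placeholder fallback category

def run_pipeline(tweet: str) -> Tuple[str, Optional[str]]:
    """Chain the two tasks: a Task 2 label is produced only for sexist tweets."""
    t1 = classify_task1(tweet)
    return t1, classify_task2(tweet) if t1 == "sexist" else None
```

A real system would replace the keyword rules with trained classifiers; the point here is only the two-stage control flow.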
If you want to participate in the EXIST2022@IberLEF-2022 shared task, please fill in this form. You will receive information about how to join our Google Group, where EXIST-Datasets, EXIST-Communications, EXIST-Questions/Answers, and EXIST-Guidelines will be provided to participants.
Participants will be required to submit their runs and will have the option to provide a technical report that includes a brief description of their approach, focusing on the adopted algorithms, models and resources; a summary of their experiments; and an analysis of the obtained results. Although we recommend participating in both tasks, participants may take part in just one of them (e.g., Task 1).
Technical reports will be published in IberLEF 2022 Proceedings at CEUR-WS.org.
Note: All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).
Sexism comprises any form of oppression or prejudice against women because of their sex. As stated in (Rodríguez-Sánchez et al., 2020), sexism is frequently found in social networks, covers a wide range of behaviours (such as stereotyping, ideological issues, sexual violence, etc. (Donoso-Vázquez and Rebollo-Catalán, 2018; Manne, 2018)), and may be expressed in different forms: direct, indirect, descriptive or reported (Miller, 2009; Chiril et al., 2020). While previous studies have focused on identifying explicit hatred or violence towards women (Waseem and Hovy, 2016; Waseem, 2016; Anzovino et al., 2018; Frenda et al., 2019), the aim of the EXIST dataset is to cover sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours. The EXIST dataset incorporates any type of sexist content, expressions or related phenomena, including descriptive or reported assertions where the sexist message is a report or a description of a sexist behaviour.
To this end, we collected a number of popular expressions and terms, both in English and Spanish, commonly used to belittle the role of women in our society. These were extracted from several Twitter accounts that collect phrases and expressions that women (Twitter users) receive on a day-to-day basis, as well as from terms used in previous state-of-the-art approaches. The final set contains more than 200 expressions that can be used in sexist contexts.
In this new edition, the EXIST 2022 challenge will use the EXIST 2021 dataset as training data. The entire EXIST 2021 dataset contains 11,345 labeled texts: 5,644 in English and 5,701 in Spanish. The final EXIST dataset consists of 6,977 tweets for training and 3,386 tweets for testing, where both sets were randomly selected from the 9,000 and 4,000 sampled sets (training and test, respectively) to ensure class balancing according to Task 1. Gab data was labeled following the same process, obtaining 492 gabs in English and 490 in Spanish. More details about the EXIST 2021 dataset (bias considerations, annotation process, quality experiments, inter-annotator agreement, etc.) are available in the task overview.
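As a quick consistency check, the figures above add up, assuming the 11,345 total counts both the Twitter splits and the Gab portion:

```python
# Sanity-check the EXIST 2021 counts quoted above (pure arithmetic, no data needed).
english_texts, spanish_texts = 5644, 5701    # labeled texts per language
train_tweets, test_tweets = 6977, 3386       # final Twitter train/test splits
gabs = 492 + 490                             # English + Spanish Gab posts

assert english_texts + spanish_texts == 11345        # per-language totals
assert train_tweets + test_tweets + gabs == 11345    # tweets + gabs = full dataset
```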
For the test set, we will collect and label around 1,058 tweets from Twitter following the procedure used for the EXIST 2021 dataset. Crawling started on January 1st 2022 and ended on January 31st 2022, covering both languages, Spanish and English, over one month. The labelling will be carried out by 6 experts with several years of experience analyzing sexist content, balanced by gender (3 women and 3 men) in order to avoid gender bias in the labelling process.
If you want to access the EXIST datasets for research purposes, please fill in this form.
Rodríguez-Sánchez, F., Carrillo-de-Albornoz, J., Plaza, L., Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data. IEEE Access (2020).
Donoso-Vázquez, Trinidad; Rebollo-Catalán, Ángeles. (coordinadoras) (2018). Violencias de género en entornos virtuales. Ediciones Octaedro, S.L.
Manne, K., DOWN GIRL: The logic of misogyny. Oxford University Press (2018)
Miller, S., Language and Sexism. Cambridge University Press (2009)
Chiril, P., Moriceau, V., Benamara, F., He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist. In proceedings of the ACL (2020)
Waseem, Z., Hovy, D., Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In Proceedings of the NAACL Student Research Workshop (2016)
Waseem, Z., Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science (2016)
Anzovino, M., Fersini, E., Rosso, P., Automatic Identification and Classification of Misogynistic Language on Twitter, Springer (2018)
Frenda S., Ghanem B., Montes-y-Gómez M., Rosso P., Online Hate Speech against Women: Automatic Identification of Misogyny and Sexism on Twitter. In: Journal of Intelligent & Fuzzy Systems, vol. 36, num. 5, pp. 4743–4752 (2019)
In order to evaluate the performance of the different approaches proposed by the participants, we will use the evaluation framework EvALL, www.evall.uned.es (Amigó et al., 2017; Amigó et al., 2018; Amigó et al., 2020). Within this framework, we will evaluate the system outputs as classification tasks (binary and multiclass, respectively) with the following measures: Accuracy, Precision, Recall and F-measure (using macro average over all classes for the last three).
In the first task, Sexism Identification, participants' results will be ranked by Accuracy. In addition, other measures such as Precision, Recall and F1 will be computed, and further analyses based on the two different social networks will be performed.
For the second task, Sexism Categorization, we will use the macro-averaged F-measure to rank the system outputs, analyzing the results according to the different categories and distributions. Similarly, we will compute other measures such as Precision and Recall.
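Both ranking measures can be reproduced by direct counting. The following is a minimal sketch (not EvALL itself) of Accuracy and macro-averaged F1 over parallel gold/predicted label lists:

```python
def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred):
    """Compute F1 per class, then average over all classes with equal weight."""
    f1s = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall (0 when both are 0).
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Note that this sketch averages over the classes observed in the gold labels; EvALL's exact handling of edge cases (e.g., classes never predicted) may differ.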
More details about the evaluation and additional experiments will be provided in the task overview.
Amigó, E., Carrillo-de-Albornoz, J., Almagro-Cádiz, M., Gonzalo, J., Rodríguez-Vidal, J., and Verdejo, F. (2017). EvALL: Open Access Evaluation for Information Access Systems. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017.
Amigó, E., Spina, D., and Carrillo-de-Albornoz, J.. An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ‘18). ACM, New York, NY, USA, 625-634.
Amigó, E., Gonzalo, J., Mizzaro, S., and Carrillo-de-Albornoz, J.. An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).
Below are the official test scores for all participants and tasks. The ranking for Task 1 is based on Accuracy, while F1 is used for ranking in Task 2.
The evaluation framework EvALL (www.evall.uned.es) has been used to generate the evaluation for both tasks. Evaluations by language have also been generated for all runs. All evaluation reports generated by EvALL are available for download below the table.
Ranking | Task 1: Sexism Identification (Run) | Acc | F1 | Task 2: Sexism Categorization (Run) | Acc | F1 |
---|---|---|---|---|---|---|
1 | task1_avacaondata_1 | 0,7996 | 0,7978 | task2_avacaondata_1 | 0,7013 | 0,5106 | |
2 | task1_avacaondata_3 | 0,7996 | 0,7978 | task2_avacaondata_3 | 0,7013 | 0,5106 | |
3 | task1_CIMATCOLMEX_1 | 0,7949 | 0,7940 | task2_ELiRF-VRAIN_3 | 0,7042 | 0,4991 | |
4 | task1_CIMATCOLMEX_3 | 0,7911 | 0,7904 | task2_ELiRF-VRAIN_1 | 0,7013 | 0,4963 | |
5 | task1_CIMATCOLMEX_2 | 0,7883 | 0,7877 | task2_ELiRF-VRAIN_2 | 0,6862 | 0,4787 | |
6 | task1_I2C_1 | 0,7883 | 0,7880 | task2_avacaondata_2 | 0,6607 | 0,4747 | |
7 | task1_SINAI-TL_1 | 0,7845 | 0,7841 | task2_UMU_2 | 0,6767 | 0,4741 | |
8 | task1_SINAI-TL_3 | 0,7845 | 0,7839 | task2_UMU_1 | 0,6730 | 0,4724 | |
9 | task1_multiaztertest_1 | 0,7836 | 0,7830 | task2_multiaztertest_1 | 0,6786 | 0,4706 | |
10 | task1_I2C_3 | 0,7807 | 0,7788 | task2_ThangCIC_8 | 0,6626 | 0,4706 | |
11 | task1_multiaztertest_2 | 0,7732 | 0,7708 | task2_I2C_1 | 0,6465 | 0,4700 | |
12 | task1_ELiRF-VRAIN_2 | 0,7694 | 0,7686 | task2_UMU_3 | 0,6720 | 0,4680 | |
13 | task1_ELiRF-VRAIN_3 | 0,7684 | 0,7679 | task2_AIT_FHSTP_3 | 0,6522 | 0,4675 | |
14 | task1_ELiRF-VRAIN_1 | 0,7656 | 0,7655 | task2_LPtower_1 | 0,6569 | 0,4635 | |
15 | task1_I2C_2 | 0,7656 | 0,7656 | task2_ThangCIC_4 | 0,6541 | 0,4612 | |
16 | task1_UMU_1 | 0,7647 | 0,7642 | task2_ThangCIC_2 | 0,6626 | 0,4562 | |
17 | task1_2539404758 | 0,7637 | 0,7623 | task2_AI-UPV_3 | 0,6267 | 0,4516 | |
18 | task1_AI-UPV_3 | 0,7637 | 0,7635 | task2_AI-UPV_2 | 0,6257 | 0,4485 | |
19 | task1_UMU_3 | 0,7637 | 0,7628 | task2_AIT_FHSTP_1 | 0,6418 | 0,4366 | |
20 | task1_UMU_2 | 0,7618 | 0,7605 | task2_LPtower_2_major | 0,6371 | 0,4325 | |
21 | task1_ThangCIC_3 | 0,7609 | 0,7600 | task2_AI-UPV_1 | 0,6125 | 0,4299 | |
22 | task1_ThangCIC_7 | 0,7609 | 0,7608 | task2_besiguenza_1 | 0,6285 | 0,4198 | |
23 | task1_LPtower_1 | 0,7580 | 0,7559 | task2_2539404758 | 0,6153 | 0,3809 | |
24 | task1_ThangCIC_1 | 0,7580 | 0,7553 | task2_UNED-UPM_1 | 0,5274 | 0,3708 | |
25 | task1_shm2022_1 | 0,7533 | 0,7530 | task2_AIT_FHSTP_2 | 0,5255 | 0,3571 | |
26 | task1_AIT_FHSTP_3 | 0,7505 | 0,7496 | task2_BASELINE | 0,5784 | 0,3420 | |
27 | task1_CompLingKnJ_1 | 0,7457 | 0,7448 | task2_UNED-UPM_2 | 0,4924 | 0,3283 | |
28 | task1_LPtower_2_major | 0,7457 | 0,7426 | task2_NIT Agartala NLP Team_1 | 0,6229 | 0,3194 | |
29 | task1_AIT_FHSTP_1 | 0,7420 | 0,7410 | task2_LPtower_3 | 0,3110 | 0,1508 | |
30 | task1_AI-UPV_1 | 0,7410 | 0,7410 | Majority Class | 0,5539 | 0,1018 | |
31 | task1_AI-UPV_2 | 0,7410 | 0,7410 | task2_shm2022_1 | 0,1380 | 0,0560 | |
32 | task1_SINAI_1 | 0,7316 | 0,7315 | ||||
33 | task1_besiguenza_1 | 0,7306 | 0,7269 | ||||
34 | task1_SINAI_2 | 0,7278 | 0,7272 | ||||
35 | task1_SINAI_3 | 0,7202 | 0,7184 | ||||
36 | task1_AIT_FHSTP_2 | 0,7183 | 0,7181 | ||||
37 | task1_shm2022_2 | 0,7183 | 0,7183 | ||||
38 | task1_NIT Agartala NLP Team_1 | 0,7098 | 0,7065 | ||||
39 | task1_BASELINE | 0,6928 | 0,6859 | ||||
40 | task1_SINAI-TL_2 | 0,6928 | 0,6882 | ||||
41 | task1_UNED-UPM_1 | 0,6824 | 0,6792 | ||||
42 | task1_CompLingKnJ_2 | 0,6815 | 0,6770 | ||||
43 | task1_UNED-UPM_2 | 0,6664 | 0,6624 | ||||
44 | Majority Class | 0,5444 | 0,3525 | ||||
45 | task1_LPtower_3 | 0,4905 | 0,4872 | ||||
46 | task1_xaiTUD_1 | 0,4811 | 0,4600 | ||||
47 | task1_avacaondata_2 | 0,0491 | 0,0473 |
The EvALL reports for Task 1 are available for download:
EvALL Tsv Report Task 1 English
EvALL Tsv Report Task 1 Spanish
The EvALL reports for Task 2 are available for download:
EXIST 2022 is co-located with the IberLEF Conference and will be held face-to-face on Tuesday, 20 September 2022, from 9:30 to 19:30 CET.
11:50 - 12:10: Overview of EXIST 2022: sEXism Identification in Social neTworks. Francisco Rodríguez-Sánchez, Jorge Carrillo-de-Albornoz, Laura Plaza, Adrián Mendieta-Aragón, Guillermo Marco-Remón, Maryna Makeienko, María Plaza, Julio Gonzalo, Damiano Spina, and Paolo Rosso.
18:15 - 19:15: EXIST 2022 Parallel Session.
Overview Paper:
Working Notes:
If you have any specific question about the EXIST 2022 task, please let us know through the Google Group existiberlef2022.
For any other question that does not directly concern the shared task, please write to Jorge Carrillo-de-Albornoz.