Why EXIST?

Welcome to the website of EXIST 2022, the second edition of the sEXism Identification in Social neTworks task at IberLEF 2022.

EXIST 2021 was the first shared task aimed at detecting sexism in a broad sense, from explicit misogyny to subtler expressions involving implicit sexist behaviour.

Detecting online sexism can be difficult because it is expressed in very different forms. Sexism may sound “friendly”: the statement “Women must be loved and respected, always treat them like a fragile glass” may seem positive, but it actually implies that women are weaker than men. Sexism may sound “funny”, as is the case with sexist jokes or humour (“You have to love women… just that… You will never understand them.”). Sexism may sound “offensive” and “hateful”, as in “Humiliate, expose and degrade yourself as the fucking bitch you are if you want a real man to give you attention”. The automatic identification of sexism may help to create, design and determine the evolution of new equality policies, as well as encourage better behaviour in society.

In the previous edition, more than 31 teams submitted runs and achieved impressive results, especially in Task 1. However, there is still room for improvement, particularly in Task 2, which aims to categorize sexism according to the facet of women that is undermined. We encourage researchers to participate and beat the results of the last edition.

Tasks

Participants will be asked to classify “tweets” (in English and Spanish) according to the following two tasks:

TASK 1: Sexism Identification

The first task is a binary classification problem. Systems have to decide whether or not a given tweet contains sexist expressions or behaviours (i.e., it is sexist itself, describes a sexist situation or criticizes a sexist behaviour). The following tweets are examples of sexist and non-sexist messages.

  • SEXIST:
    • “Mujer al volante, tenga cuidado!” (“Woman at the wheel, be careful!”)
    • “People really try to convince women with little to no ass that they should go out and buy a body. Like bih, I don’t need a fat ass to get a man. Never have.”
  • NOT SEXIST:
    • “Alguien me explica que zorra hace la gente en el cajero que se demora tanto.” (“Can someone explain to me what the hell people are doing at the ATM that takes them so long?”)
    • "@messyworldorder it’s honestly so embarrassing to watch and they’ll be like “not all white women are like that”"

TASK 2: Sexism Categorization

Once a message has been classified as sexist, the second task aims to categorize it according to the type of sexism, following a categorization proposed by experts that takes into account the different facets of women that are undermined. In particular, we propose a five-class classification task:

  • IDEOLOGICAL AND INEQUALITY: The text discredits the feminist movement, denies the existence of inequality between men and women, or presents men as victims of gender-based oppression.

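    • “Mi hermana y mi madre se burlan de mí por defender todo el tiempo los derechos de todos y me acaban de decir feminazi, la completaron” (“My sister and my mother make fun of me for always defending everyone’s rights, and they just called me a feminazi; that topped it off”).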
    • “I think the whole equality thing is getting out of hand. We are different, thats how were made!”.
  • STEREOTYPING AND DOMINANCE: The text expresses false ideas about women suggesting that they are more suitable for certain roles (mother, wife, family caregiver, faithful, tender, loving, submissive, etc.) or unsuitable for certain tasks (driving, hard work, etc.), or claims that men are somehow superior to women.

    • “@Paula2R @faber_acuria A las mujeres hay que amarlas…solo eso… Nunca las entenderás.” (“You have to love women… just that… You will never understand them.”).
    • “Most women no longer have the desire or the knowledge to develop a high quality character, even if they wanted to.”.
  • OBJECTIFICATION: The text presents women as objects apart from their dignity and personal aspects, or assumes or describes certain physical qualities that women must have in order to fulfill traditional gender roles (compliance with beauty standards, hypersexualization of female attributes, women’s bodies at the disposal of men, etc.).

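    • “‘Pareces una puta con ese pantalón’ - Mi hermano de 13 cuando me vio con un pantalón de cuero” (“‘You look like a whore in those trousers’ - My 13-year-old brother when he saw me in leather trousers”).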
    • “Don’t get married than blame all woman for your poor investment. You should of got a hooker but instead you choose to go get a wedding ring.”.
  • SEXUAL VIOLENCE: Sexual suggestions, requests for sexual favors or harassment of a sexual nature (rape or sexual assault) are made.

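    • “#MeToo Estas 4 no han conseguido su objetivo.El juez estima que se abrieron de patas https://t.co/GSHiiwqY6Aánta lagartona hay en este \metoo"!👇🏻👇🏻🤔🤔🤔 https://t.co/8t5VmFIUFn”” (opening sentences: “#MeToo These 4 have not achieved their goal. The judge considers that they spread their legs”)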
    • “fuck that cunt, I would with my fist”.
  • MISOGYNY AND NON-SEXUAL VIOLENCE: The text expresses hatred and violence towards women.

    • “Las mujeres de hoy en dia te enseñar a querer… estar soltero” (“Women nowadays teach you to love… being single”).
    • “Some woman are so toxic they don’t even know they are draining everyone around them in poison. If you lack self awareness you won’t even notice how toxic you really are”.
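Because Task 2 only applies to messages that have already been classified as sexist, a system covering both tasks can be organised as a two-stage pipeline. The Python sketch below illustrates this under stated assumptions: the label strings in `SexismCategory` and the toy functions `toy_is_sexist` and `toy_categorize` are hypothetical placeholders, not the official dataset identifiers or any participant's method.

```python
from enum import Enum
from typing import Callable, Optional


class SexismCategory(Enum):
    # Hypothetical identifiers for the five Task 2 categories; the official
    # dataset files may spell these labels differently.
    IDEOLOGICAL_INEQUALITY = "ideological-inequality"
    STEREOTYPING_DOMINANCE = "stereotyping-dominance"
    OBJECTIFICATION = "objectification"
    SEXUAL_VIOLENCE = "sexual-violence"
    MISOGYNY_NON_SEXUAL_VIOLENCE = "misogyny-non-sexual-violence"


def two_stage_predict(
    text: str,
    is_sexist: Callable[[str], bool],
    categorize: Callable[[str], SexismCategory],
) -> Optional[SexismCategory]:
    """Apply the Task 1 classifier first; only texts flagged as sexist reach
    the Task 2 categorizer. Non-sexist texts get no category (None)."""
    if not is_sexist(text):
        return None
    return categorize(text)


# Toy stand-ins for trained models (hypothetical, for illustration only).
def toy_is_sexist(text: str) -> bool:
    return "mujer al volante" in text.lower()


def toy_categorize(text: str) -> SexismCategory:
    return SexismCategory.STEREOTYPING_DOMINANCE


print(two_stage_predict("Mujer al volante, tenga cuidado!", toy_is_sexist, toy_categorize))
print(two_stage_predict("Buenos días a todos", toy_is_sexist, toy_categorize))
```

Mapping the driving example to STEREOTYPING_DOMINANCE follows the category definition above, which lists driving among the tasks women are wrongly deemed unsuitable for. In a real system both callables would be trained models.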

How to participate

If you want to participate in the EXIST2022@IberLEF-2022 shared task, please fill in this form. You will then receive information about how to join our Google Group, where EXIST-Datasets, EXIST-Communications, EXIST-Questions/Answers and EXIST-Guidelines will be provided to participants.

Participants will be required to submit their runs and may also provide a technical report that briefly describes their approach (focusing on the adopted algorithms, models and resources), summarizes their experiments, and analyses the obtained results. Although we recommend participating in both tasks, participants may take part in just one of them (e.g., Task 1).

Publications

Technical reports will be published in IberLEF 2022 Proceedings at CEUR-WS.org.

Important dates

  • 1 Feb 2022 Registration open.
  • 15 Feb 2022 Training set available.
  • 29 Mar 2022 Test set available (originally 22 Mar 2022).
  • 19 Apr 2022 System results due to organizers (originally 12 Apr 2022).
  • 26 Apr 2022 Results notification to participants.
  • 24 May 2022 Submission of Working Notes by participants (EXTENDED from 17 May 2022).
  • 31 May 2022 Reviews to participants (peer reviews).
  • 16 Jun 2022 Camera-ready due to organizers.
  • Sep 2022 EXIST@IberLEF 2022.

Note: All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).
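Since “anywhere on Earth” is the fixed offset UTC-12:00, converting a deadline to UTC (or to your local time) is simple offset arithmetic. A minimal sketch using only Python's standard `datetime` module; the helper name `aoe_deadline_in_utc` is illustrative, not part of any official tooling:

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12))


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the 11:59 PM AoE deadline of the given date, expressed in UTC."""
    deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return deadline.astimezone(timezone.utc)


# Deadline for system results: 19 Apr 2022, 11:59 PM AoE.
print(aoe_deadline_in_utc(2022, 4, 19).isoformat())  # 2022-04-20T11:59:00+00:00
```

In other words, a deadline date only closes once it has ended everywhere on Earth, which is 11:59 UTC on the following day.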

Dataset

Sexism comprises any form of oppression or prejudice against women because of their sex. As stated by Rodríguez-Sánchez et al. (2020), sexism is frequently found on social networks, covers a wide range of behaviours (such as stereotyping, ideological issues, sexual violence, etc. (Donoso-Vázquez and Rebollo-Catalán, 2018; Manne, 2018)), and may be expressed in different ways: direct, indirect, descriptive or reported (Miller, 2009; Chiril et al., 2020). While previous studies have focused on identifying explicit hatred or violence towards women (Zeerak and Dirk, 2016; Zeerak, 2016; Anzovino et al., 2018; Frenda et al., 2019), the EXIST dataset aims to cover sexism in a broad sense, from explicit misogyny to subtler expressions involving implicit sexist behaviours. It incorporates any type of sexist content, expression or related phenomenon, including descriptive or reported assertions in which the sexist message is a report or description of a sexist behaviour.

To this end, we collected popular expressions and terms in English and Spanish that are commonly used to undervalue the role of women in society. They were extracted from several Twitter accounts that collect phrases and expressions women (Twitter users) receive on a daily basis, and complemented with terms used in previous state-of-the-art approaches. The final set contains more than 200 expressions that can be used in sexist contexts.
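A list of expressions like this can be applied as a keyword filter over collected tweets. The Python sketch below shows one plausible way to do so with whole-word regular-expression matching; the page does not describe the exact matching or query mechanism, and the two entries in the hypothetical `SEED_EXPRESSIONS` list are simply borrowed from the example tweets on this page.

```python
import re
from typing import Iterable, List

# Hypothetical seed expressions taken from the example tweets on this page;
# the actual EXIST list contains more than 200 English and Spanish expressions.
SEED_EXPRESSIONS = ["mujer al volante", "feminazi"]


def build_matcher(expressions: Iterable[str]) -> re.Pattern:
    """Compile a case-insensitive pattern matching any expression as whole words."""
    alternatives = "|".join(re.escape(expr) for expr in expressions)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def select_candidates(tweets: Iterable[str], matcher: re.Pattern) -> List[str]:
    """Keep only the tweets that contain at least one seed expression."""
    return [tweet for tweet in tweets if matcher.search(tweet)]


tweets = [
    "Mujer al volante, tenga cuidado!",
    "Good morning everyone",
    "Me acaban de decir FEMINAZI",
]
print(select_candidates(tweets, build_matcher(SEED_EXPRESSIONS)))
```

Matching a term does not make a tweet sexist (the non-sexist example in Task 1 contains a potentially offensive word), which is why the collected tweets are then labelled by annotators.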

In EXIST 2022, the EXIST 2021 dataset will be used as training data. The complete EXIST 2021 dataset contains 11,345 labeled texts: 5,644 in English and 5,701 in Spanish. It consists of 6,977 tweets for training and 3,386 tweets for testing, randomly selected from sampled sets of 9,000 and 4,000 tweets, respectively, to ensure class balance for Task 1, plus texts from the Gab social network labeled following the same process (492 gabs in English and 490 in Spanish). More details about the EXIST 2021 dataset (bias considerations, annotation process, quality experiments, inter-annotator agreement, etc.) are available in the task overview.
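The reported figures are mutually consistent: the per-language totals and the per-source totals (training tweets, test tweets and gabs) both add up to the 11,345 labeled texts.

```latex
\underbrace{5644}_{\text{English}} + \underbrace{5701}_{\text{Spanish}}
  = 11345
  = \underbrace{6977}_{\text{training tweets}} + \underbrace{3386}_{\text{test tweets}} + \underbrace{492 + 490}_{\text{gabs}}
```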

For the test set, we will collect and label around 1058 tweets from Twitter, following the procedure used for the EXIST 2021 dataset. Tweets in both Spanish and English were crawled over one month, from January 1st to January 31st, 2022. The labelling will be carried out by six experts with several years of experience analysing sexist content; the group is gender-balanced (three women and three men) to avoid gender bias in the labelling process.

References

Rodríguez-Sánchez, F., Carrillo-de-Albornoz, J., Plaza, L., Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data. IEEE Access (2020).

Donoso-Vázquez, T., Rebollo-Catalán, Á. (Eds.), Violencias de género en entornos virtuales. Ediciones Octaedro (2018)

Manne, K., Down Girl: The Logic of Misogyny. Oxford University Press (2018)

Miller, S., Language and Sexism. Cambridge University Press (2009)

Chiril, P., Moriceau, V., Benamara, F., He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist. In Proceedings of ACL (2020)

Zeerak, W., Dirk, H., Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In Proceedings of ACL (2016)

Zeerak, W., Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter. In Proceedings of ACL (2016)

Anzovino, M., Fersini, E., Rosso, P., Automatic Identification and Classification of Misogynistic Language on Twitter, Springer (2018)

Frenda S., Ghanem B., Montes-y-Gómez M., Rosso P., Online Hate Speech against Women: Automatic Identification of Misogyny and Sexism on Twitter. In: Journal of Intelligent & Fuzzy Systems, vol. 36, num. 5, pp. 4743–4752 (2019)

Evaluation

To evaluate the performance of the approaches proposed by participants, we will use the EvALL evaluation framework, www.evall.uned.es (Amigó et al., 2017; Amigó et al., 2018; Amigó et al., 2020). Within this framework, system outputs will be evaluated as classification tasks (binary for Task 1 and multiclass for Task 2) with the following measures: Accuracy, Precision, Recall and F-measure, the last three macro-averaged over all classes.
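For participants who want to check their systems locally, the Python sketch below computes Accuracy and macro-averaged Precision, Recall and F1 from scratch. It is not EvALL's implementation: it assumes the macro average is taken over the union of gold and predicted classes and that macro F1 is the mean of the per-class F1 scores (one common definition), so small differences from the official reports are possible. The label strings are illustrative.

```python
from collections import Counter
from typing import Dict, Sequence


def evaluate(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged Precision, Recall and F1 over all classes."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    classes = sorted(set(gold) | set(pred))
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        # A class never predicted (or never in gold) gets precision (or recall) 0.
        p = true_pos[c] / pred_counts[c] if pred_counts[c] else 0.0
        r = true_pos[c] / gold_counts[c] if gold_counts[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    return {
        "accuracy": sum(true_pos.values()) / len(gold),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


gold = ["sexist", "sexist", "non-sexist", "non-sexist"]
pred = ["sexist", "non-sexist", "non-sexist", "non-sexist"]
print(evaluate(gold, pred))
```

Here Accuracy is 0.75, while macro F1 averages the per-class scores 0.667 (sexist) and 0.8 (non-sexist), giving about 0.733: every class counts equally, regardless of its size.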

In the first task, Sexism Identification, participants' results will be ranked by Accuracy. Other measures, such as Precision, Recall and F1, will also be computed, and further analyses will be performed for the two different social networks.

For the second task, Sexism Categorization, we will use macro-average F-measure to rank the system outputs, analyzing the results according to the different categories and distributions. Similarly, we will compute other measures such as Precision and Recall.

More details about the evaluation and additional experiments will be provided in the task overview.

References

Amigó, E., Carrillo-de-Albornoz, J., Almagro-Cádiz, M., Gonzalo, J., Rodríguez-Vidal, J., Verdejo, F., EvALL: Open Access Evaluation for Information Access Systems. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11 (2017)

Amigó, E., Spina, D., Carrillo-de-Albornoz, J., An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ‘18), ACM, New York, NY, USA, pp. 625-634 (2018)

Amigó, E., Gonzalo, J., Mizzaro, S., Carrillo-de-Albornoz, J., An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

Results

Below are the official test scores for all participants and tasks. Task 1 runs are ranked by Accuracy, while Task 2 runs are ranked by macro-averaged F1.

The EvALL evaluation framework (www.evall.uned.es) was used to generate the evaluations for both tasks. Per-language evaluations were also generated for all runs. All evaluation reports generated by EvALL can be downloaded below the table.
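The ranking rule amounts to sorting runs on the primary metric. This Python sketch uses three Task 2 rows copied from the table below and shows why ranking by F1 can differ from ordering by Accuracy: task2_ELiRF-VRAIN_3 has the highest Accuracy of the three but places behind task2_avacaondata_1 on F1. How the organizers break exact ties is not stated on this page; the sketch simply relies on Python's stable sort, and the `Run` type and `rank` helper are illustrative names.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    name: str
    acc: float
    f1: float


def rank(runs: List[Run], metric: str) -> List[Run]:
    """Return runs sorted by the given metric ("acc" or "f1"), best first.
    sorted() is stable, so runs with equal scores keep their input order."""
    return sorted(runs, key=lambda run: getattr(run, metric), reverse=True)


# Three Task 2 runs from the official results table.
task2_runs = [
    Run("task2_ELiRF-VRAIN_3", acc=0.7042, f1=0.4991),
    Run("task2_ELiRF-VRAIN_1", acc=0.7013, f1=0.4963),
    Run("task2_avacaondata_1", acc=0.7013, f1=0.5106),
]

for position, run in enumerate(rank(task2_runs, "f1"), start=1):
    print(position, run.name, f"Acc={run.acc:.4f}", f"F1={run.f1:.4f}")
```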

TASK 1: Sexism Identification

Ranking  Run                            Acc     F1
1        task1_avacaondata_1            0.7996  0.7978
2        task1_avacaondata_3            0.7996  0.7978
3        task1_CIMATCOLMEX_1            0.7949  0.7940
4        task1_CIMATCOLMEX_3            0.7911  0.7904
5        task1_CIMATCOLMEX_2            0.7883  0.7877
6        task1_I2C_1                    0.7883  0.7880
7        task1_SINAI-TL_1               0.7845  0.7841
8        task1_SINAI-TL_3               0.7845  0.7839
9        task1_multiaztertest_1         0.7836  0.7830
10       task1_I2C_3                    0.7807  0.7788
11       task1_multiaztertest_2         0.7732  0.7708
12       task1_ELiRF-VRAIN_2            0.7694  0.7686
13       task1_ELiRF-VRAIN_3            0.7684  0.7679
14       task1_ELiRF-VRAIN_1            0.7656  0.7655
15       task1_I2C_2                    0.7656  0.7656
16       task1_UMU_1                    0.7647  0.7642
17       task1_2539404758               0.7637  0.7623
18       task1_AI-UPV_3                 0.7637  0.7635
19       task1_UMU_3                    0.7637  0.7628
20       task1_UMU_2                    0.7618  0.7605
21       task1_ThangCIC_3               0.7609  0.7600
22       task1_ThangCIC_7               0.7609  0.7608
23       task1_LPtower_1                0.7580  0.7559
24       task1_ThangCIC_1               0.7580  0.7553
25       task1_shm2022_1                0.7533  0.7530
26       task1_AIT_FHSTP_3              0.7505  0.7496
27       task1_CompLingKnJ_1            0.7457  0.7448
28       task1_LPtower_2_major          0.7457  0.7426
29       task1_AIT_FHSTP_1              0.7420  0.7410
30       task1_AI-UPV_1                 0.7410  0.7410
31       task1_AI-UPV_2                 0.7410  0.7410
32       task1_SINAI_1                  0.7316  0.7315
33       task1_besiguenza_1             0.7306  0.7269
34       task1_SINAI_2                  0.7278  0.7272
35       task1_SINAI_3                  0.7202  0.7184
36       task1_AIT_FHSTP_2              0.7183  0.7181
37       task1_shm2022_2                0.7183  0.7183
38       task1_NIT Agartala NLP Team_1  0.7098  0.7065
39       task1_BASELINE                 0.6928  0.6859
40       task1_SINAI-TL_2               0.6928  0.6882
41       task1_UNED-UPM_1               0.6824  0.6792
42       task1_CompLingKnJ_2            0.6815  0.6770
43       task1_UNED-UPM_2               0.6664  0.6624
44       Majority Class                 0.5444  0.3525
45       task1_LPtower_3                0.4905  0.4872
46       task1_xaiTUD_1                 0.4811  0.4600
47       task1_avacaondata_2            0.0491  0.0473

TASK 2: Sexism Categorization

Ranking  Run                            Acc     F1
1        task2_avacaondata_1            0.7013  0.5106
2        task2_avacaondata_3            0.7013  0.5106
3        task2_ELiRF-VRAIN_3            0.7042  0.4991
4        task2_ELiRF-VRAIN_1            0.7013  0.4963
5        task2_ELiRF-VRAIN_2            0.6862  0.4787
6        task2_avacaondata_2            0.6607  0.4747
7        task2_UMU_2                    0.6767  0.4741
8        task2_UMU_1                    0.6730  0.4724
9        task2_multiaztertest_1         0.6786  0.4706
10       task2_ThangCIC_8               0.6626  0.4706
11       task2_I2C_1                    0.6465  0.4700
12       task2_UMU_3                    0.6720  0.4680
13       task2_AIT_FHSTP_3              0.6522  0.4675
14       task2_LPtower_1                0.6569  0.4635
15       task2_ThangCIC_4               0.6541  0.4612
16       task2_ThangCIC_2               0.6626  0.4562
17       task2_AI-UPV_3                 0.6267  0.4516
18       task2_AI-UPV_2                 0.6257  0.4485
19       task2_AIT_FHSTP_1              0.6418  0.4366
20       task2_LPtower_2_major          0.6371  0.4325
21       task2_AI-UPV_1                 0.6125  0.4299
22       task2_besiguenza_1             0.6285  0.4198
23       task2_2539404758               0.6153  0.3809
24       task2_UNED-UPM_1               0.5274  0.3708
25       task2_AIT_FHSTP_2              0.5255  0.3571
26       task2_BASELINE                 0.5784  0.3420
27       task2_UNED-UPM_2               0.4924  0.3283
28       task2_NIT Agartala NLP Team_1  0.6229  0.3194
29       task2_LPtower_3                0.3110  0.1508
30       Majority Class                 0.5539  0.1018
31       task2_shm2022_1                0.1380  0.0560
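The Task 1 Majority Class row illustrates why Accuracy and macro F1 can diverge. Assuming the baseline assigns the majority label to every text and that F1 is the macro average of the two per-class F1 scores, its precision on the majority class equals its Accuracy, its recall on that class is 1, and the minority class gets F1 = 0. Macro F1 then depends only on Accuracy, and plugging in the reported Accuracy of 0.5444 reproduces the reported F1:

```latex
\begin{aligned}
P_{\mathrm{maj}} &= \mathrm{Acc}, \qquad R_{\mathrm{maj}} = 1, \qquad F_{1,\mathrm{min}} = 0,\\
F_{1,\mathrm{maj}} &= \frac{2\,P_{\mathrm{maj}}\,R_{\mathrm{maj}}}{P_{\mathrm{maj}} + R_{\mathrm{maj}}} = \frac{2\,\mathrm{Acc}}{1 + \mathrm{Acc}},\\
\text{Macro-}F_1 &= \frac{F_{1,\mathrm{maj}} + F_{1,\mathrm{min}}}{2} = \frac{\mathrm{Acc}}{1 + \mathrm{Acc}} = \frac{0.5444}{1.5444} \approx 0.3525.
\end{aligned}
```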

The EvALL reports for Task 1 are available for download:

EvALL TSV Report Task 1 ALL

EvALL TSV Report Task 1 English

EvALL TSV Report Task 1 Spanish

The EvALL reports for Task 2 are available for download:

EvALL TSV Report Task 2 ALL

EvALL TSV Report Task 2 English

EvALL TSV Report Task 2 Spanish

Organizers

  • Adrián Mendieta-Aragón, UNED (PhD student)
  • Damiano Spina, RMIT University (Senior Lecturer)
  • Francisco Rodríguez-Sánchez, UNED (Data developer and PhD student)
  • Guillermo Marco-Remón, UNED (PhD student)
  • Jorge Carrillo-de-Albornoz, UNED (Associate Professor)
  • Julio Gonzalo, UNED (Full Professor)
  • Laura Plaza, UNED (Associate Professor)
  • Maryna Makeienko, UNED (PhD student)
  • María Plaza (Collaborator)
  • Paolo Rosso, Universitat Politècnica de València (Full Professor)

Contact

If you have any specific questions about the EXIST 2022 task, please let us know through the Google Group existiberlef2022.

For any other question that does not directly concern the shared task, please write to Jorge Carrillo-de-Albornoz.