Why EXIST?

Welcome to the website of EXIST, the first shared task on sEXism Identification in Social neTworks at IberLEF 2021.

The Oxford English Dictionary defines sexism as “prejudice, stereotyping or discrimination, typically against women, on the basis of sex”. Inequality and discrimination against women that remain embedded in society are increasingly being replicated online.

Detecting online sexism may be difficult, as it can be expressed in very different forms. Sexism may sound “friendly”: the statement “Women must be loved and respected, always treat them like a fragile glass” may seem positive, but it actually implies that women are weaker than men. Sexism may sound “funny”, as is the case with sexist jokes or humour (“You have to love women… just that… You will never understand them.”). Sexism may sound “offensive” and “hateful”, as in “Humiliate, expose and degrade yourself as the fucking bitch you are if you want a real man to give you attention”. Our aim is the detection of sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours.

However, even the most subtle forms of sexism can be as pernicious as the most violent ones and affect women in many facets of their lives, including domestic and parenting roles, career opportunities, sexual image and life expectations, to name a few. The automatic identification of sexism in a broad sense may help to create, design and determine the evolution of new equality policies, as well as encourage better behaviours in society.

Tasks

Participants will be asked to classify “tweets” and “gab posts” (in English and Spanish) according to the following two tasks:

TASK 1: Sexism Identification

The first subtask is a binary classification. The systems have to decide whether or not a given text (tweet or gab) is sexist (i.e., it is sexist itself, describes a sexist situation or criticizes a sexist behaviour). The following tweets show examples of sexist and not sexist messages.

  • SEXIST:
    • “Mujer al volante, tenga cuidado!”
    • “People really try to convince women with little to no ass that they should go out and buy a body. Like bih, I don’t need a fat ass to get a man. Never have.”
  • NOT SEXIST:
    • “Alguien me explica que zorra hace la gente en el cajero que se demora tanto.”
    • “@messyworldorder it’s honestly so embarrassing to watch and they’ll be like ‘not all white women are like that’”

TASK 2: Sexism Categorization

Once a message has been classified as sexist, the second task aims to categorize it according to the type of sexism, following a categorization proposed by experts that takes into account the different facets of women that are undermined. In particular, we propose a five-class classification task:

  • IDEOLOGICAL AND INEQUALITY: The text discredits the feminist movement, denies the existence of inequality between men and women, or presents men as victims of gender-based oppression.

    • “Mi hermana y mi madre se burlan de mí por defender todo el tiempo los derechos de todos y me acaban de decir feminazi, la completaron”.
    • “I think the whole equality thing is getting out of hand. We are different, thats how were made!”.
  • STEREOTYPING AND DOMINANCE: The text expresses false ideas about women that suggest they are more suitable to fulfill certain roles (mother, wife, family caregiver, faithful, tender, loving, submissive, etc.), or inappropriate for certain tasks (driving, hard work, etc.), or claims that men are somehow superior to women.

    • “@Paula2R @faber_acuria A las mujeres hay que amarlas…solo eso… Nunca las entenderás.”.
    • “Most women no longer have the desire or the knowledge to develop a high quality character, even if they wanted to.”.
  • OBJECTIFICATION: The text presents women as objects apart from their dignity and personal aspects, or assumes or describes certain physical qualities that women must have in order to fulfill traditional gender roles (compliance with beauty standards, hypersexualization of female attributes, women’s bodies at the disposal of men, etc.).

    • ““Pareces una puta con ese pantalón” - Mi hermano de 13 cuando me vio con un pantalón de cuero”.
    • “Don’t get married than blame all woman for your poor investment. You should of got a hooker but instead you choose to go get a wedding ring.”.
  • SEXUAL VIOLENCE: Sexual suggestions, requests for sexual favors or harassment of a sexual nature (rape or sexual assault) are made.

    • “#MeToo Estas 4 no han conseguido su objetivo.El juez estima que se abrieron de patas https://t.co/GSHiiwqY6Aánta lagartona hay en este \metoo"!👇🏻👇🏻🤔🤔🤔 https://t.co/8t5VmFIUFn””
    • “fuck that cunt, I would with my fist”.
  • MISOGYNY AND NON-SEXUAL VIOLENCE: The text expresses hatred and violence towards women.

    • “Las mujeres de hoy en dia te enseñar a querer… estar soltero”
    • “Some woman are so toxic they don’t even know they are draining everyone around them in poison. If you lack self awareness you won’t even notice how toxic you really are”.
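The relationship between the two tasks can be sketched as a small label schema. This is only an illustration: the label strings and the `combine_predictions` helper are hypothetical names chosen here, not an official EXIST data format.

```python
from enum import Enum
from typing import Optional


class SexismCategory(Enum):
    """The five Task 2 categories (label strings are illustrative)."""
    IDEOLOGICAL_INEQUALITY = "ideological-inequality"
    STEREOTYPING_DOMINANCE = "stereotyping-dominance"
    OBJECTIFICATION = "objectification"
    SEXUAL_VIOLENCE = "sexual-violence"
    MISOGYNY_NON_SEXUAL_VIOLENCE = "misogyny-non-sexual-violence"


def combine_predictions(is_sexist: bool,
                        category: Optional[SexismCategory]) -> str:
    """Merge a Task 1 decision and a Task 2 category into one label.

    A text judged not sexist in Task 1 never receives a Task 2 category;
    a text judged sexist must have one.
    """
    if not is_sexist:
        return "non-sexist"
    if category is None:
        raise ValueError("a sexist text needs a Task 2 category")
    return category.value
```

For instance, `combine_predictions(True, SexismCategory.OBJECTIFICATION)` yields the objectification label, while any text with `is_sexist=False` maps to `"non-sexist"` regardless of the category passed.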

How to participate

If you want to participate in the EXIST@IberLEF-2021 shared task, please fill in this form. You will receive information about how to join our Google Group, where EXIST-Datasets, EXIST-Communications, EXIST-Questions/Answers, and EXIST-Guidelines will be provided to the participants.

Participants will be required to submit their runs and will have the possibility to provide a technical report for publication in the Proceedings. The report should include a brief description of their approach, focusing on the adopted algorithms, models and resources, a summary of their experiments, and an analysis of the obtained results. Although we recommend participating in both tasks, participants are allowed to participate in just one of them (e.g. Task 1).

Publications

Technical reports will be published in IberLEF 2021 Proceedings at CEUR-WS.org.

Important dates

  • 1 Feb 2021 Registration open.
  • 8 Mar 2021 Training set available.
  • 14 Apr 2021 Testing set available.
  • 28 Apr 2021 Systems results due to organizers. Extended deadline: 5 May 2021.
  • 12 May 2021 Results notification to participants.
  • 26 May 2021 Submission of Working Notes by participants. Extended deadline: 2 Jun 2021.
  • 16 Jun 2021 Reviews to participants (peer-reviews).
  • 30 Jun 2021 Camera-ready due to organizers.
  • Sep 2021 EXIST@IberLEF 2021.

Note: All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).
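As a convenience, an “anywhere on Earth” deadline can be converted to UTC with the standard library. The sketch below assumes the deadline falls at 23:59 on the listed day in the fixed offset UTC-12:00; the helper name `deadline_in_utc` is illustrative.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" is the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")


def deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 23:59 AoE on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Extended deadline for system results: 5 May 2021 (AoE).
print(deadline_in_utc(2021, 5, 5).isoformat())  # 2021-05-06T11:59:00+00:00
```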

Dataset

Sexism comprises any form of oppression or prejudice against women because of their sex. As stated in (Rodríguez-Sánchez et al. 2020), sexism is frequently found in many forms in social networks, includes a wide range of behaviours (such as stereotyping, ideological issues, sexual violence, etc. (Donoso-Vázquez and Rebollo-Catalán, 2018; Manne, 2018)), and may be expressed in different forms: direct, indirect, descriptive or reported (Miller, 2009; Chiril et al. 2020). While previous studies have focused on identifying explicit hatred or violence towards women (Zeerak and Dirk, 2016; Zeerak, 2016; Anzovino et al., 2018; Frenda et al., 2019), the aim of the EXIST dataset is to cover sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours. The EXIST dataset incorporates any type of sexist expression or related phenomena, including descriptive or reported assertions where the sexist message is a report or a description of a sexist behaviour.

To this aim, we have collected a number of popular expressions and terms, both in English and Spanish, commonly used to underestimate the role of women in our society. They were extracted from several Twitter accounts that collect phrases and expressions that women (Twitter users) have received on a day-to-day basis, as well as from terms used in previous state-of-the-art approaches. These terms were analyzed and filtered by two experts in gender issues, Trinidad Donoso and Miriam Comet, who examined examples of tweets extracted using these terms as seeds. The final set contains more than 200 expressions that can be used in sexist contexts.

The final set of sexism terms was used to extract tweets both in English and Spanish (more than 800,000 tweets were downloaded). Crawling was performed from 1st December 2020 to 28th February 2021. To ensure an appropriate balance between seeds, we have removed those with fewer than 60 tweets. The final set of seeds used contains 94 seeds for Spanish and 91 seeds for English.
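The seed-filtering step can be illustrated with a short sketch. It assumes each downloaded tweet is tagged with the seed that retrieved it; the function name and input shape are hypothetical, and only the threshold of 60 comes from the description above.

```python
from collections import Counter
from typing import Iterable, Set

MIN_TWEETS_PER_SEED = 60  # threshold stated in the dataset description


def keep_frequent_seeds(tweet_seeds: Iterable[str],
                        min_count: int = MIN_TWEETS_PER_SEED) -> Set[str]:
    """Return the seeds that retrieved at least `min_count` tweets.

    `tweet_seeds` holds one entry per downloaded tweet: the seed term
    that matched it (an assumed representation).
    """
    counts = Counter(tweet_seeds)
    return {seed for seed, n in counts.items() if n >= min_count}
```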

For each seed, approximately 50 tweets were randomly selected within the period from 1st to 31st December 2020 for the training set, and 22 tweets per seed within the period from 1st to 28th February 2021 for the test set. This distribution was set to allow a temporal separation between the training and test data. As a result, we have 4,500 tweets per language for the training set and 2,000 tweets per language for the test set.
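A minimal sketch of this per-seed, time-separated sampling follows. The tuple layout `(seed, day, text)` and the function name are assumptions made for illustration; the actual sampling code is not described on this page.

```python
import random
from datetime import date
from typing import List, Tuple

Tweet = Tuple[str, date, str]  # (seed, posting day, text): assumed layout


def sample_split(tweets: List[Tweet], start: date, end: date,
                 per_seed: int, rng: random.Random) -> List[Tweet]:
    """Randomly pick up to `per_seed` tweets per seed posted in [start, end]."""
    by_seed = {}
    for tweet in tweets:
        seed, day, _text = tweet
        if start <= day <= end:
            by_seed.setdefault(seed, []).append(tweet)
    sample = []
    for seed in sorted(by_seed):  # fixed order keeps runs reproducible
        pool = by_seed[seed]
        sample.extend(rng.sample(pool, min(per_seed, len(pool))))
    return sample
```

Calling it once with the December 2020 window and `per_seed=50`, and once with the February 2021 window and `per_seed=22`, gives training and test samples that cannot share a posting date.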

Each tweet was annotated by 5 crowdsourcing annotators, following the guidelines developed by Trinidad Donoso and Miriam Comet (different experiments were done to ensure quality), and an inter-annotator agreement test was carried out. Final labels were selected by majority vote among the crowdsourcing annotators, but tweets with a 3-to-2 vote split were manually reviewed by two people (a man and a woman) with more than two years of experience analyzing sexist content in social networks. The final EXIST dataset consists of 6,977 tweets for training and 3,386 tweets for testing; both sets were randomly selected from the 9,000 and 4,000 labeled tweets (training and test, respectively) to ensure class balance for Task 1.
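The label aggregation rule can be sketched as follows for the binary Task 1 labels. The function name and the returned `needs_review` flag are illustrative; the sketch only encodes what is stated above (five votes, majority wins, 3-to-2 splits go to manual review).

```python
from collections import Counter
from typing import Sequence, Tuple


def aggregate_votes(votes: Sequence[str]) -> Tuple[str, bool]:
    """Return (majority_label, needs_review) for five binary votes.

    With two possible labels and five annotators, the winning label has
    3, 4 or 5 votes; exactly 3 means a 3-to-2 split, which is flagged
    for manual review.
    """
    if len(votes) != 5:
        raise ValueError("expected exactly five votes")
    label, top = Counter(votes).most_common(1)[0]
    return label, top == 3
```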

In addition, we have collected 492 “gabs” in English and 490 in Spanish from the uncensored social network Gab.com, following a procedure similar to the one described above. This set will be included in the EXIST test set in order to measure the difference between social networks with and without “content control” (Twitter and Gab.com, respectively).

More details about the dataset will be provided in the task overview (bias consideration, annotation process, quality experiments, inter-annotator agreement, etc.).

References

Rodríguez-Sánchez, F., Carrillo-de-Albornoz, J., Plaza, L., Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data. IEEE Access (2020).

Donoso-Vázquez, Trinidad; Rebollo-Catalán, Ángeles. (coordinadoras) (2018). Violencias de género en entornos virtuales. Ediciones Octaedro, S.L.

Manne, K., DOWN GIRL: The logic of misogyny. Oxford University Press (2018)

Miller, S., Language and Sexism. Cambridge University Press (2009)

Chiril, P., Moriceau, V., Benamara, F., He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist. In proceedings of the ACL (2020)

Zeerak, W., Dirk, H., Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In proceedings of the ACL (2016)

Zeerak, W., Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter. In proceedings of the ACL (2016)

Anzovino, M., Fersini, E., Rosso, P., Automatic Identification and Classification of Misogynistic Language on Twitter, Springer (2018)

Frenda S., Ghanem B., Montes-y-Gómez M., Rosso P., Online Hate Speech against Women: Automatic Identification of Misogyny and Sexism on Twitter. In: Journal of Intelligent & Fuzzy Systems, vol. 36, num. 5, pp. 4743–4752 (2019)

Evaluation

In order to evaluate the performance of the different approaches proposed by the participants, we will use the EvALL evaluation framework, www.evall.uned.es (Amigó et al., 2017; Amigó et al., 2018; Amigó et al., 2020). Within this framework, we will evaluate the system outputs as classification tasks (binary and multiclass, respectively) with the following measures: Accuracy, Precision, Recall and F-measure (the last three macro-averaged over all classes).
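For readers who want to sanity-check their runs locally, the measures can be sketched in plain Python. This is not EvALL's implementation: the zero-division convention (a score of 0.0 when a denominator is empty) and averaging over the union of gold and predicted labels are assumptions of this sketch.

```python
from typing import Sequence, Tuple


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_prf(gold: Sequence[str],
              pred: Sequence[str]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 over all observed labels."""
    labels = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Macro-averaging gives each class the same weight regardless of its frequency, which is why it suits the unevenly distributed categories of the multiclass task.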

In the first task, Sexism Identification, participants' results will be ranked using Accuracy, as the distribution between sexist and non-sexist categories is balanced. Other measures, such as Precision, Recall and F1, will also be computed, and further analyses based on the two different social networks will be performed.
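A per-network breakdown of the kind mentioned here could look like the sketch below. The record layout `(source, gold, pred)` and the source names are assumptions; how the organizers actually split the analysis is left to the task overview.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_source(
        records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute accuracy separately for each source network.

    Each record is (source, gold_label, predicted_label), for example
    ("twitter", "sexist", "non-sexist").
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for source, gold, pred in records:
        totals[source] += 1
        hits[source] += int(gold == pred)
    return {source: hits[source] / totals[source] for source in totals}
```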

For the second task, Sexism Categorization, we will use macro-average F-measure to rank the system outputs, analyzing the results according to the different categories and distributions. Similarly, we will compute other measures such as Precision and Recall.

More details about the evaluation and additional experiments will be provided in the task overview.

References

Amigó, E., Carrillo-de-Albornoz, J., Almagro-Cádiz, M., Gonzalo, J., Rodríguez-Vidal, J., and Verdejo, F. (2017). EvALL: Open Access Evaluation for Information Access Systems. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017.

Amigó, E., Spina, D., and Carrillo-de-Albornoz, J. (2018). An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ’18). ACM, New York, NY, USA, 625-634.

Amigó, E., Gonzalo, J., Mizzaro, S., and Carrillo-de-Albornoz, J. (2020). An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).

Results

Below are the official test scores for all participants and tasks. The ranking for Task 1 is based on Accuracy, while F1 is used for the ranking in Task 2.

The Evaluation Framework EvALL (www.evall.uned.es) has been used to generate the evaluation for both tasks. Evaluations by language have also been generated for all runs. All evaluation reports generated by EvALL are available for download below the table.

TASK 1: Sexism Identification (columns 2 to 4) and TASK 2: Sexism Categorization (columns 5 to 7)

| Ranking | Task 1 Run | Task 1 Acc | Task 1 F1 | Task 2 Run | Task 2 Acc | Task 2 F1 |
|---|---|---|---|---|---|---|
| 1 | task1_AI-UPV_1 | 0.7804 | 0.7802 | task2_AI-UPV_1 | 0.6577 | 0.5787 |
| 2 | task1_SINAI_TL_1 | 0.78 | 0.7797 | task2_LHZ_1 | 0.6509 | 0.5706 |
| 3 | task1_SINAI_TL_3 | 0.777 | 0.7757 | task2_SINAI_TL_1 | 0.6527 | 0.5667 |
| 4 | task1_SINAI_TL_2 | 0.7766 | 0.7761 | task2_SINAI_TL_3 | 0.6497 | 0.5632 |
| 5 | task1_AIT_FHSTP_2 | 0.7754 | 0.7752 | task2_QMUL-SDS_1 | 0.6426 | 0.5594 |
| 6 | task1_multiaztertest_1 | 0.774 | 0.7731 | task2_AIT_FHSTP_2 | 0.6445 | 0.5589 |
| 7 | task1_nlp_uned_team_1 | 0.772 | 0.7702 | task2_Alclatos_1 | 0.6369 | 0.5578 |
| 8 | task1_free_1 | 0.7708 | 0.7708 | task2_AIT_FHSTP_3 | 0.6445 | 0.5559 |
| 9 | task1_GuillemGSubies_2 | 0.7683 | 0.7683 | task2_IREL_hatespeech_group_3 | 0.6403 | 0.5556 |
| 10 | task1_AIT_FHSTP_3 | 0.7665 | 0.7656 | task2_zk_1 | 0.649 | 0.5521 |
| 11 | task1_LHZ_1 | 0.7665 | 0.7661 | task2_nlp_uned_team_3 | 0.6232 | 0.5509 |
| 12 | task1_zk_1 | 0.7647 | 0.7645 | task2_recognai_1 | 0.6243 | 0.55 |
| 13 | task1_Alclatos_1 | 0.7637 | 0.7636 | task2_QMUL-SDS_2 | 0.6351 | 0.5464 |
| 14 | task1_QMUL-SDS_2 | 0.761 | 0.7609 | task2_QMUL-SDS_3 | 0.6351 | 0.5464 |
| 15 | task1_GuillemGSubies_1 | 0.7603 | 0.7603 | task2_nlp_uned_team_1 | 0.6314 | 0.544 |
| 16 | task1_s_exist_1 | 0.7598 | 0.7598 | task2_IREL_hatespeech_group_1 | 0.6344 | 0.5419 |
| 17 | task1_nlp_uned_team_3 | 0.7571 | 0.7563 | task2_IREL_hatespeech_group_2 | 0.6305 | 0.5408 |
| 18 | task1_QMUL-SDS_3 | 0.7557 | 0.7555 | task2_UMUTEAM_2 | 0.617 | 0.5362 |
| 19 | task1_MiniTrue_1 | 0.7553 | 0.7551 | task2_codec_1 | 0.6239 | 0.5354 |
| 20 | task1_IREL_hatespeech_group_3 | 0.7532 | 0.7532 | task2_s_exist_1 | 0.5682 | 0.5342 |
| 21 | task1_zzw_1 | 0.7527 | 0.7526 | task2_GuillemGSubies_2 | 0.6293 | 0.5295 |
| 22 | task1_UMUTEAM_3 | 0.7514 | 0.7514 | task2_nlp_uned_team_2 | 0.6209 | 0.5246 |
| 23 | task1_QMUL-SDS_1 | 0.7502 | 0.7489 | task2_UMUTEAM_3 | 0.5911 | 0.524 |
| 24 | task1_GuillemGSubies_3 | 0.7479 | 0.7479 | task2_LaSTUS_1 | 0.612 | 0.5227 |
| 25 | task1_IREL_hatespeech_group_1 | 0.747 | 0.7469 | task2_GuillemGSubies_1 | 0.6234 | 0.5218 |
| 26 | task1_IREL_hatespeech_group_2 | 0.7457 | 0.7455 | task2_Zimtstern_1 | 0.6108 | 0.5208 |
| 27 | task1_UMUTEAM_2 | 0.744 | 0.744 | task2_Zimtstern_3 | 0.6133 | 0.5206 |
| 28 | task1_Zimtstern_3 | 0.7356 | 0.7354 | task2_Andrea_Lisa_1 | 0.6129 | 0.5204 |
| 29 | task1_nlp_uned_team_2 | 0.7324 | 0.7317 | task2_AIT_FHSTP_1 | 0.6074 | 0.5195 |
| 30 | task1_LaSTUS_1 | 0.7317 | 0.7316 | task2_zzw_1 | 0.6296 | 0.5192 |
| 31 | task1_CIC_1 | 0.7278 | 0.727 | task2_recognai_2 | 0.5996 | 0.5177 |
| 32 | Task1_MessGroupELL_3 | 0.7237 | 0.7237 | task2_GuillemGSubies_3 | 0.6145 | 0.5174 |
| 33 | Task1_MessGroupELL_1 | 0.723 | 0.7225 | task2_Zimtstern_2 | 0.6033 | 0.5121 |
| 34 | task1_Andrea_Lisa_1 | 0.7186 | 0.7182 | task2_CIC_2 | 0.5527 | 0.4908 |
| 35 | task1_Zimtstern_1 | 0.7184 | 0.7165 | task2_Nerin_3 | 0.6046 | 0.4817 |
| 36 | task1_AIT_FHSTP_1 | 0.7182 | 0.7121 | task2_Nerin_2 | 0.6005 | 0.471 |
| 37 | task1_MB-Courage_1 | 0.7145 | 0.7145 | task2_UNEDBiasTeam_3 | 0.5797 | 0.4704 |
| 38 | task1_Soumya_2 | 0.7115 | 0.7114 | task2_Alclatos_2 | 0.5826 | 0.4673 |
| 39 | Task1_MessGroupELL_2 | 0.7111 | 0.7109 | task2_UNEDBiasTeam_2 | 0.5689 | 0.4621 |
| 40 | task1_MB-Courage_2 | 0.7083 | 0.7072 | task2_MB-Courage_2 | 0.5946 | 0.459 |
| 41 | task1_Zimtstern_2 | 0.7076 | 0.7066 | task2_SINAI_TL_2 | 0.6049 | 0.4549 |
| 42 | task1_Nerin_1 | 0.7072 | 0.7068 | task2_CIC_3 | 0.5838 | 0.4543 |
| 43 | task1_s_exist_2 | 0.707 | 0.7066 | task2_Soumya_1 | 0.5923 | 0.4504 |
| 44 | task1_UNEDBiasTeam_2 | 0.7056 | 0.7056 | task2_MB-Courage_1 | 0.5897 | 0.4496 |
| 45 | task1_Soumya_1 | 0.7047 | 0.7045 | task2_CIC_1 | 0.565 | 0.4489 |
| 46 | task1_recognai_1 | 0.7044 | 0.7041 | task2_Soumya_2 | 0.595 | 0.4415 |
| 47 | task1_Nerin_2 | 0.7022 | 0.7016 | task2_s_exist_2 | 0.5817 | 0.4357 |
| 48 | task1_Alclatos_3 | 0.6962 | 0.6959 | task2_Nerin_1 | 0.582 | 0.4234 |
| 49 | task1_Alclatos_2 | 0.6944 | 0.6926 | task2_MB-Courage_3 | 0.5923 | 0.4214 |
| 50 | task1_Nerin_3 | 0.6918 | 0.6906 | task2_free_1 | 0.5847 | 0.4194 |
| 51 | task1_UNEDBiasTeam_3 | 0.6905 | 0.6898 | Baseline_svm_tfidf | 0.5222 | 0.395 |
| 52 | Baseline_svm_tfidf | 0.6845 | 0.6832 | task2_s_exist_3 | 0.4492 | 0.3853 |
| 53 | task1_MB-Courage_3 | 0.6809 | 0.6792 | task2_BilaUnwanPk1_1 | 0.5062 | 0.3788 |
| 54 | task1_BilaUnwanPk1_3 | 0.6763 | 0.6759 | task2_BilaUnwanPk1_3 | 0.5064 | 0.3772 |
| 55 | task1_Soumya_3 | 0.6761 | 0.6761 | task2_BilaUnwanPk1_2 | 0.5046 | 0.3756 |
| 56 | task1_recognai_2 | 0.6726 | 0.6717 | task2_UMUTEAM_1 | 0.2905 | 0.2812 |
| 57 | task1_BilaUnwanPk1_2 | 0.6717 | 0.6712 | task2_UNEDBiasTeam_1 | 0.4444 | 0.165 |
| 58 | task1_CIC_2 | 0.6699 | 0.6677 | task2_Alclatos_3 | 0.4311 | 0.1585 |
| 59 | task1_s_exist_3 | 0.6648 | 0.6619 | task2_ORDS_CLAN_1 | 0.4833 | 0.1244 |
| 60 | task1_multiaztertest_3 | 0.6582 | 0.6482 | task2_ORDS_CLAN_2 | 0.4833 | 0.1237 |
| 61 | task1_CIC_3 | 0.6367 | 0.6366 | task2_ORDS_CLAN_3 | 0.4833 | 0.1237 |
| 62 | task1_BilaUnwanPk1_1 | 0.6126 | 0.6027 | Majority Class | 0.4778 | 0.1078 |
| 63 | task1_UMUTEAM_1 | 0.5966 | 0.5964 | task2_almuoes3.0_1 | 0.1291 | 0.1069 |
| 64 | task1_multiaztertest_2 | 0.5948 | 0.5944 | | | |
| 65 | task1_UNEDBiasTeam_1 | 0.543 | 0.5359 | | | |
| 66 | Majority Class | 0.5222 | 0.3431 | | | |
| 67 | task1_uja_1 | 0.519 | 0.5035 | | | |
| 68 | task1_ORDS_CLAN_1 | 0.4924 | 0.3934 | | | |
| 69 | task1_ORDS_CLAN_2 | 0.4908 | 0.39 | | | |
| 70 | task1_ORDS_CLAN_3 | 0.4908 | 0.39 | | | |
| 71 | task1_almuoes3.0_1 | 0.4876 | 0.3979 | | | |
| 72 | task1_codec_1 | 0.4096 | 0.3892 | | | |
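The two Majority Class rows can be checked with a little arithmetic. A system that always outputs the most frequent label has precision equal to that label's share p, recall 1, and therefore F1 = 2p / (1 + p) on that label and 0 on every other label. The sketch below assumes that Task 2 is scored over six labels (the five categories plus non-sexist) and that its majority label is non-sexist; neither is stated on this page, but the reported values are consistent with both.

```python
def majority_baseline_macro_f1(majority_share: float, n_labels: int) -> float:
    """Macro F1 of a system that always predicts the most frequent label.

    Only the majority label gets a non-zero F1, namely 2p / (1 + p);
    macro-averaging divides it by the number of labels.
    """
    f1_majority = 2 * majority_share / (1 + majority_share)
    return f1_majority / n_labels


# Task 1: two labels, majority share equal to the reported accuracy.
print(round(majority_baseline_macro_f1(0.5222, 2), 4))  # 0.3431
# Task 2: six labels assumed, majority share equal to the reported accuracy.
print(round(majority_baseline_macro_f1(0.4778, 6), 4))  # 0.1078
```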

The EvALL reports for Task 1 are available for download:

EvALL Tsv Report Task 1 ALL

EvALL Tsv Report Task 1 English

EvALL Tsv Report Task 1 Spanish

The EvALL reports for Task 2 are available for download:

EvALL Tsv Report Task 2 ALL

EvALL Tsv Report Task 2 English

EvALL Tsv Report Task 2 Spanish

EXIST 2021 Workshop Program

EXIST 2021 is co-located with the IberLEF Conference, and will be held online (due to COVID-19) on Tuesday, 21 September 2021, from 11:00 to 22:00h CET.

  • 12:30 - 14:00 EXIST Parallel Session

    • 12:30 - 12:35: Welcome and Opening Remarks.
    • 12:35 - 12:50: Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models. Angel Felipe Magnossão de Paula, Roberto Fray da Silva, Ipek Baris Schlicht.
    • 12:50 - 13:05: Sexism Identification in Social Networks using a Multi-Task Learning System. Flor Miriam Plaza-del-Arco, M. Dolores Molina-González, L. Alfonso Ureña López, M. Teresa Martín-Valdivia.
    • 13:05 - 13:20: Automatic Sexism Detection with Multilingual Transformer Models AIT_FHSTP@EXIST2021. Schütz Mina, Boeck Jaqueline, Liakhovets Daria, Slijepcevic Djordje, Kirchknopf Armin, Hecht Manuel, Bogensperger Johannes, Schlarb Sven, Schindler Alexander, and Zeppelzauer Matthias.
    • 13:20 - 13:35: MultiAzterTest@Exist-IberLEF 2021: Linguistically Motivated Sexism Identification. Kepa Bengoetxea, Itziar Gonzalez-Dios.
    • 13:35 - 13:50: EXIST2021: Detecting Sexism with Transformers and Translation-Augmented Data. Guillem García Subies.
    • 13:50 - 14:00: Discussion.
  • 17:15 - 17:30 Overview of EXIST. Francisco Rodríguez-Sánchez, Jorge Carrillo-de-Albornoz, Laura Plaza, Julio Gonzalo, Paolo Rosso, Miriam Comet, Trinidad Donoso.

EXIST 2021 Proceedings

Overview Paper:

Working Notes:

Organizers

Francisco Rodríguez-Sánchez

UNED

Data developer and PhD student

Jorge Carrillo-de-Albornoz

UNED

Associate Professor and Researcher in NLP

Julio Gonzalo

UNED

Full Professor

Laura Plaza

UNED

Associate Professor and Researcher in NLP

Miriam Comet

Universidad de Barcelona

Assistant Professor

Paolo Rosso

Universitat Politècnica de València

Full Professor

Trinidad Donoso

Universidad de Barcelona

Full Professor

Sponsors

MISMIS-Language Project

(PGC2018-096212-B)

Ministerio de Ciencia, Innovación y Universidades

Symanto Research

Francisco Rangel

Head of Product

Contact

If you have any specific question about the EXIST task, please let us know through the Google Group existiberlef2021.

For any other question that does not directly concern the shared task, please write to Jorge Carrillo-de-Albornoz.