EXIST: sEXism Identification in Social neTworks

First Shared Task at IberLEF 2021

Why EXIST?

Welcome to the website of EXIST, the first shared task on sEXism Identification in Social neTworks at IberLEF 2021.

The Oxford English Dictionary defines sexism as “prejudice, stereotyping or discrimination, typically against women, on the basis of sex”. The inequality and discrimination against women that remain embedded in society are increasingly being replicated online.

Detecting online sexism can be difficult because it is expressed in very different forms. Sexism may sound “friendly”: the statement “Women must be loved and respected, always treat them like a fragile glass” may seem positive, but it actually implies that women are weaker than men. Sexism may sound “funny”, as is the case with sexist jokes or humour (“You have to love women… just that… You will never understand them.”). Sexism may sound “offensive” and “hateful”, as in “Humiliate, expose and degrade yourself as the fucking bitch you are if you want a real man to give you attention”. Our aim is the detection of sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours.

Even the most subtle forms of sexism can be as pernicious as the most violent ones and affect women in many facets of their lives, including domestic and parenting roles, career opportunities, sexual image and life expectations, to name a few. The automatic identification of sexism in a broad sense may help to create, design and determine the evolution of new equality policies, as well as encourage better behaviours in society.

Tasks

Participants will be asked to classify tweets and Gab posts (“gabs”), in English and Spanish, according to the following two tasks:

TASK 1: Sexism Identification

The first task is a binary classification task. Systems have to decide whether or not a given text (tweet or gab) is sexist (i.e., it is sexist itself, describes a sexist situation or criticizes a sexist behaviour). The following tweets show examples of sexist and non-sexist messages.

SEXIST:
- “Mujer al volante, tenga cuidado!"
- “People really try to convince women with little to no ass that they should go out and buy a body. Like bih, I don’t need a fat ass to get a man. Never have."
NOT SEXIST:
- “Alguien me explica que zorra hace la gente en el cajero que se demora tanto."
- "@messyworldorder it’s honestly so embarrassing to watch and they’ll be like “not all white women are like that”"

TASK 2: Sexism Categorization

Once a message has been classified as sexist, the second task aims to categorize it according to the type of sexism, following a categorization proposed by experts that takes into account the different facets of women that are undermined. In particular, we propose a five-class classification task:

IDEOLOGICAL AND INEQUALITY: The text discredits the feminist movement, denies the existence of inequality between men and women, or presents men as victims of gender-based oppression.
- “Mi hermana y mi madre se burlan de mí por defender todo el tiempo los derechos de todos y me acaban de decir feminazi, la completaron”.
- “I think the whole equality thing is getting out of hand. We are different, thats how were made!”.
STEREOTYPING AND DOMINANCE: The text expresses false ideas about women that suggest they are more suitable to fulfill certain roles (mother, wife, family caregiver, faithful, tender, loving, submissive, etc.), or inappropriate for certain tasks (driving, hard work, etc.), or claims that men are somehow superior to women.
- “@Paula2R @faber_acuria A las mujeres hay que amarlas…solo eso… Nunca las entenderás.”.
- “Most women no longer have the desire or the knowledge to develop a high quality character, even if they wanted to.”.
OBJECTIFICATION: The text presents women as objects apart from their dignity and personal aspects, or assumes or describes certain physical qualities that women must have in order to fulfill traditional gender roles (compliance with beauty standards, hypersexualization of female attributes, women’s bodies at the disposal of men, etc.).
- ““Pareces una puta con ese pantalón” - Mi hermano de 13 cuando me vio con un pantalón de cuero”.
- “Don’t get married than blame all woman for your poor investment. You should of got a hooker but instead you choose to go get a wedding ring.”.
SEXUAL VIOLENCE: Sexual suggestions, requests for sexual favors or harassment of a sexual nature (rape or sexual assault) are made.
- “#MeToo Estas 4 no han conseguido su objetivo.El juez estima que se abrieron de patas https://t.co/GSHiiwqY6Aánta lagartona hay en este \metoo"!👇🏻👇🏻🤔🤔🤔 https://t.co/8t5VmFIUFn””
- “fuck that cunt, I would with my fist”.
MISOGYNY AND NON-SEXUAL VIOLENCE: The text expresses hatred and violence towards women.
- “Las mujeres de hoy en dia te enseñar a querer… estar soltero”
- “Some woman are so toxic they don’t even know they are draining everyone around them in poison. If you lack self awareness you won’t even notice how toxic you really are”.

How to participate

If you want to participate in the EXIST@IberLEF-2021 shared task, please fill in this form. You will receive information about how to join our Google Group, where EXIST-Datasets, EXIST-Communications, EXIST-Questions/Answers, and EXIST-Guidelines will be provided to the participants.

Participants will be required to submit their runs and will have the possibility to provide a technical report for publication in the Proceedings. The report should include a brief description of their approach, focusing on the adopted algorithms, models and resources, a summary of their experiments, and an analysis of the obtained results. Although we recommend participating in both tasks, participants are allowed to take part in just one of them (e.g. Task 1).

Publications

Technical reports will be published in IberLEF 2021 Proceedings at CEUR-WS.org.

Important dates

1 Feb 2021 Registration open.
8 Mar 2021 Training set available.
14 Apr 2021 Test set available.
28 Apr 2021 System results due to organizers (extended deadline: 5 May 2021).
12 May 2021 Results notification to participants.
26 May 2021 Submission of working notes by participants (extended deadline: 2 Jun 2021).
16 Jun 2021 Reviews to participants (peer reviews).
30 Jun 2021 Camera-ready due to organizers.
Sep 2021 EXIST@IberLEF 2021.

Note: All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).

Dataset

Sexism comprises any form of oppression or prejudice against women because of their sex. As stated in (Rodríguez-Sánchez et al. 2020), sexism is frequently found in many forms in social networks, includes a wide range of behaviours (such as stereotyping, ideological issues, sexual violence, etc. (Donoso-Vázquez and Rebollo-Catalán, 2018; Manne, 2018)), and may be expressed in different forms: direct, indirect, descriptive or reported (Miller, 2009; Chiril et al. 2020). While previous studies have focused on identifying explicit hatred or violence towards women (Zeerak and Dirk, 2016; Zeerak, 2016; Anzovino et al., 2018; Frenda et al., 2019), the aim of the EXIST dataset is to cover sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours. The EXIST dataset incorporates any type of sexist expression or related phenomena, including descriptive or reported assertions where the sexist message is a report or a description of a sexist behaviour.

To this end, we have collected a number of popular expressions and terms, both in English and Spanish, that are commonly used to undervalue the role of women in our society. They were extracted from several Twitter accounts that collect phrases and expressions that women (Twitter users) have received on a day-to-day basis, as well as from terms used in previous state-of-the-art approaches. These terms were analyzed and filtered by two experts in gender issues, Trinidad Donoso and Miriam Comet, who examined examples of tweets extracted using these terms as seeds. The final set contains more than 200 expressions that can be used in sexist contexts.

The final set of sexism terms was used to extract tweets both in English and Spanish (more than 800,000 tweets were downloaded). Crawling was performed from 1st December 2020 to 28th February 2021. To ensure an appropriate balance between seeds, we removed those with fewer than 60 tweets. The final set contains 94 seeds for Spanish and 91 seeds for English.

For each seed, approximately 50 tweets were randomly selected within the period from 1st to 31st December 2020 for the training set, and 22 tweets per seed within the period from 1st to 28th February 2021 for the test set. This distribution was chosen to allow a temporal separation between the training and test data. As a result, we have 4,500 tweets per language for the training set and 2,000 tweets per language for the test set.
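
The seed filtering and per-seed sampling described above can be sketched roughly as follows. This is only an illustration, not the organizers' actual pipeline: the record layout (dictionaries with `seed`, `date` and `text` keys) and the function names are assumptions made for the example.

```python
import random
from collections import defaultdict
from datetime import date

MIN_TWEETS_PER_SEED = 60   # seeds with fewer tweets are discarded
TRAIN_PER_SEED = 50        # "approximately 50" in the text
TEST_PER_SEED = 22

TRAIN_WINDOW = (date(2020, 12, 1), date(2020, 12, 31))
TEST_WINDOW = (date(2021, 2, 1), date(2021, 2, 28))


def in_window(day, window):
    """Return True if `day` falls inside the inclusive (start, end) window."""
    start, end = window
    return start <= day <= end


def sample_split(tweets, rng):
    """Sample per-seed training and test tweets from disjoint time windows.

    `tweets` is a list of dicts with the hypothetical keys
    'seed', 'date' (a datetime.date) and 'text'.
    """
    by_seed = defaultdict(list)
    for tweet in tweets:
        by_seed[tweet["seed"]].append(tweet)

    train, test = [], []
    for items in by_seed.values():
        if len(items) < MIN_TWEETS_PER_SEED:
            continue  # under-represented seed: drop it entirely
        december = [t for t in items if in_window(t["date"], TRAIN_WINDOW)]
        february = [t for t in items if in_window(t["date"], TEST_WINDOW)]
        train += rng.sample(december, min(TRAIN_PER_SEED, len(december)))
        test += rng.sample(february, min(TEST_PER_SEED, len(february)))
    return train, test
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps the selection reproducible.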

Each tweet was annotated by 5 crowdsourcing annotators, following the guidelines developed by Trinidad Donoso and Miriam Comet (several experiments were run to ensure quality), and an inter-annotator agreement test was carried out. Final labels were selected by majority vote among the crowdsourcing annotators, but tweets with a 3-to-2 vote split were manually reviewed by two people (a man and a woman) with more than two years of experience analyzing sexist content in social networks. The final EXIST dataset consists of 6,977 tweets for training and 3,386 tweets for testing; both sets were randomly selected from the 9,000 and 4,000 labeled tweets (training and test, respectively) to ensure class balance for Task 1.
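
As an illustration of the labeling rule, the following minimal sketch aggregates the five annotations of one tweet and flags narrow votes for manual review. It assumes binary Task 1 labels; the function name and label strings are hypothetical.

```python
from collections import Counter

ANNOTATORS_PER_TWEET = 5


def aggregate_labels(labels):
    """Return (majority_label, needs_review) for one tweet.

    Assumes exactly five binary labels, so a winning count of 3
    corresponds to a 3-to-2 split that goes to manual review.
    """
    if len(labels) != ANNOTATORS_PER_TWEET:
        raise ValueError("expected exactly five annotations per tweet")
    majority_label, votes = Counter(labels).most_common(1)[0]
    return majority_label, votes == 3
```

For example, `aggregate_labels(["sexist", "sexist", "sexist", "non-sexist", "non-sexist"])` returns `("sexist", True)`, while a 4-to-1 vote returns the majority label with `False`.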

In addition, we have collected 492 “gabs” in English and 490 in Spanish from the uncensored social network Gab.com, following a procedure similar to the one described above. This set will be included in the EXIST test set in order to measure the difference between social networks with and without “content control” (Twitter and Gab.com, respectively).

More details about the dataset will be provided in the task overview (bias considerations, annotation process, quality experiments, inter-annotator agreement, etc.).

Download EXIST Dataset

If you want to access the EXIST Dataset for research purposes, please fill in this form.

References

Rodríguez-Sánchez, F., Carrillo-de-Albornoz, J., Plaza, L., Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data. IEEE Access (2020).

Donoso-Vázquez, Trinidad; Rebollo-Catalán, Ángeles. (coordinadoras) (2018). Violencias de género en entornos virtuales. Ediciones Octaedro, S.L.

Manne, K., DOWN GIRL: The logic of misogyny. Oxford University Press (2018)

Miller, S., Language and Sexism. Cambridge University Press (2009)

Chiril, P., Moriceau, V., Benamara, F., He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist. In proceedings of the ACL (2020)

Zeerak, W., Dirk, H., Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In proceedings of the ACL (2016)

Zeerak, W., Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter. In proceedings of the ACL (2016)

Anzovino, M., Fersini, E., Rosso, P., Automatic Identification and Classification of Misogynistic Language on Twitter, Springer (2018)

Frenda S., Ghanem B., Montes-y-Gómez M., Rosso P., Online Hate Speech against Women: Automatic Identification of Misogyny and Sexism on Twitter. In: Journal of Intelligent & Fuzzy Systems, vol. 36, num. 5, pp. 4743–4752 (2019)

Evaluation

To evaluate the approaches proposed by the participants, we will use the EvALL evaluation framework, www.evall.uned.es (Amigó et al., 2017; Amigó et al., 2018; Amigó et al., 2020). Within this framework, we will evaluate the system outputs as classification tasks (binary and multiclass, respectively) with the following measures: Accuracy, Precision, Recall and F-measure (the last three macro-averaged over all classes).
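
For readers who want to check a run locally, the four measures can be sketched in plain Python as below. This is not EvALL's code. It assumes that the macro F-measure is the unweighted mean of the per-class F1 scores, and the function name is hypothetical.

```python
def macro_scores(gold, pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    `gold` and `pred` are equal-length sequences of class labels.
    A class with no predictions (or no gold items) scores 0 for the
    undefined measure.
    """
    labels = sorted(set(gold) | set(pred))
    pairs = list(zip(gold, pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(g == p for g, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Because every class contributes equally to the macro averages, a system that ignores a rare class is penalized more heavily than it would be under Accuracy.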

In the first task, Sexism Identification, participants' results will be ranked by Accuracy, since the distribution between the sexist and non-sexist categories is balanced. Other measures, such as Precision, Recall and F1, will also be computed, and further analyses will compare results on the two social networks.

For the second task, Sexism Categorization, we will use macro-average F-measure to rank the system outputs, analyzing the results according to the different categories and distributions. Similarly, we will compute other measures such as Precision and Recall.
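
A short worked example may help show why a macro-averaged F-measure is a stricter ranking criterion than Accuracy. For a run that always predicts the most frequent class, the macro F1 can be computed in closed form. The sketch below assumes that macro F1 is the unweighted mean of per-class F1 scores and that Task 2 is scored over six classes (the five sexism categories plus non-sexist); neither assumption is stated explicitly on this page.

```python
def majority_baseline_macro_f1(majority_share, num_classes):
    """Macro F1 of a classifier that always predicts the majority class.

    `majority_share` is the fraction of test items in the majority class,
    which is also the accuracy of this baseline.
    """
    precision = majority_share  # every prediction is the majority label
    recall = 1.0                # every majority item is retrieved
    f1_majority = 2 * precision * recall / (precision + recall)
    # All other classes are never predicted, so their F1 is 0.
    return f1_majority / num_classes
```

With a majority share of 0.4778 and six classes this gives about 0.1078, and with 0.5222 and two classes about 0.3431. Both values match the Majority Class rows in the results table, which is consistent with the assumptions above but does not confirm them.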

More details about the evaluation and additional experiments will be provided in the task overview.

References

Amigó, E., Carrillo-de-Albornoz, J., Almagro-Cádiz, M., Gonzalo, J., Rodríguez-Vidal, J., and Verdejo, F. (2017). EvALL: Open Access Evaluation for Information Access Systems. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017.

Amigó, E., Spina, D., and Carrillo-de-Albornoz, J. (2018). An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '18). ACM, New York, NY, USA, 625-634.

Amigó, E., Gonzalo, J., Mizzaro, S., and Carrillo-de-Albornoz, J. (2020). An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020).

Results

Below are the official test scores for all participants and tasks. The ranking for Task 1 is based on Accuracy, while F1 is used for ranking in Task 2.

The Evaluation Framework EvALL (www.evall.uned.es) has been used to generate the evaluation for both tasks. Evaluations by language have also been generated for all runs. All evaluation reports generated by EvALL are available for download below the table.

	TASK 1: Sexism Identification			TASK 2: Sexism Categorization
Ranking	Run	Acc	F1	Run	Acc	F1
1	task1_AI-UPV_1	0,7804	0,7802	task2_AI-UPV_1	0,6577	0,5787
2	task1_SINAI_TL_1	0,78	0,7797	task2_LHZ_1	0,6509	0,5706
3	task1_SINAI_TL_3	0,777	0,7757	task2_SINAI_TL_1	0,6527	0,5667
4	task1_SINAI_TL_2	0,7766	0,7761	task2_SINAI_TL_3	0,6497	0,5632
5	task1_AIT_FHSTP_2	0,7754	0,7752	task2_QMUL-SDS_1	0,6426	0,5594
6	task1_multiaztertest_1	0,774	0,7731	task2_AIT_FHSTP_2	0,6445	0,5589
7	task1_nlp_uned_team_1	0,772	0,7702	task2_Alclatos_1	0,6369	0,5578
8	task1_free_1	0,7708	0,7708	task2_AIT_FHSTP_3	0,6445	0,5559
9	task1_GuillemGSubies_2	0,7683	0,7683	task2_IREL_hatespeech_group_3	0,6403	0,5556
10	task1_AIT_FHSTP_3	0,7665	0,7656	task2_zk_1	0,649	0,5521
11	task1_LHZ_1	0,7665	0,7661	task2_nlp_uned_team_3	0,6232	0,5509
12	task1_zk_1	0,7647	0,7645	task2_recognai_1	0,6243	0,55
13	task1_Alclatos_1	0,7637	0,7636	task2_QMUL-SDS_2	0,6351	0,5464
14	task1_QMUL-SDS_2	0,761	0,7609	task2_QMUL-SDS_3	0,6351	0,5464
15	task1_GuillemGSubies_1	0,7603	0,7603	task2_nlp_uned_team_1	0,6314	0,544
16	task1_s_exist_1	0,7598	0,7598	task2_IREL_hatespeech_group_1	0,6344	0,5419
17	task1_nlp_uned_team_3	0,7571	0,7563	task2_IREL_hatespeech_group_2	0,6305	0,5408
18	task1_QMUL-SDS_3	0,7557	0,7555	task2_UMUTEAM_2	0,617	0,5362
19	task1_MiniTrue_1	0,7553	0,7551	task2_codec_1	0,6239	0,5354
20	task1_IREL_hatespeech_group_3	0,7532	0,7532	task2_s_exist_1	0,5682	0,5342
21	task1_zzw_1	0,7527	0,7526	task2_GuillemGSubies_2	0,6293	0,5295
22	task1_UMUTEAM_3	0,7514	0,7514	task2_nlp_uned_team_2	0,6209	0,5246
23	task1_QMUL-SDS_1	0,7502	0,7489	task2_UMUTEAM_3	0,5911	0,524
24	task1_GuillemGSubies_3	0,7479	0,7479	task2_LaSTUS_1	0,612	0,5227
25	task1_IREL_hatespeech_group_1	0,747	0,7469	task2_GuillemGSubies_1	0,6234	0,5218
26	task1_IREL_hatespeech_group_2	0,7457	0,7455	task2_Zimtstern_1	0,6108	0,5208
27	task1_UMUTEAM_2	0,744	0,744	task2_Zimtstern_3	0,6133	0,5206
28	task1_Zimtstern_3	0,7356	0,7354	task2_Andrea_Lisa_1	0,6129	0,5204
29	task1_nlp_uned_team_2	0,7324	0,7317	task2_AIT_FHSTP_1	0,6074	0,5195
30	task1_LaSTUS_1	0,7317	0,7316	task2_zzw_1	0,6296	0,5192
31	task1_CIC_1	0,7278	0,727	task2_recognai_2	0,5996	0,5177
32	Task1_MessGroupELL_3	0,7237	0,7237	task2_GuillemGSubies_3	0,6145	0,5174
33	Task1_MessGroupELL_1	0,723	0,7225	task2_Zimtstern_2	0,6033	0,5121
34	task1_Andrea_Lisa_1	0,7186	0,7182	task2_CIC_2	0,5527	0,4908
35	task1_Zimtstern_1	0,7184	0,7165	task2_Nerin_3	0,6046	0,4817
36	task1_AIT_FHSTP_1	0,7182	0,7121	task2_Nerin_2	0,6005	0,471
37	task1_MB-Courage_1	0,7145	0,7145	task2_UNEDBiasTeam_3	0,5797	0,4704
38	task1_Soumya_2	0,7115	0,7114	task2_Alclatos_2	0,5826	0,4673
39	Task1_MessGroupELL_2	0,7111	0,7109	task2_UNEDBiasTeam_2	0,5689	0,4621
40	task1_MB-Courage_2	0,7083	0,7072	task2_MB-Courage_2	0,5946	0,459
41	task1_Zimtstern_2	0,7076	0,7066	task2_SINAI_TL_2	0,6049	0,4549
42	task1_Nerin_1	0,7072	0,7068	task2_CIC_3	0,5838	0,4543
43	task1_s_exist_2	0,707	0,7066	task2_Soumya_1	0,5923	0,4504
44	task1_UNEDBiasTeam_2	0,7056	0,7056	task2_MB-Courage_1	0,5897	0,4496
45	task1_Soumya_1	0,7047	0,7045	task2_CIC_1	0,565	0,4489
46	task1_recognai_1	0,7044	0,7041	task2_Soumya_2	0,595	0,4415
47	task1_Nerin_2	0,7022	0,7016	task2_s_exist_2	0,5817	0,4357
48	task1_Alclatos_3	0,6962	0,6959	task2_Nerin_1	0,582	0,4234
49	task1_Alclatos_2	0,6944	0,6926	task2_MB-Courage_3	0,5923	0,4214
50	task1_Nerin_3	0,6918	0,6906	task2_free_1	0,5847	0,4194
51	task1_UNEDBiasTeam_3	0,6905	0,6898	Baseline_svm_tfidf	0,5222	0,395
52	Baseline_svm_tfidf	0,6845	0,6832	task2_s_exist_3	0,4492	0,3853
53	task1_MB-Courage_3	0,6809	0,6792	task2_BilaUnwanPk1_1	0,5062	0,3788
54	task1_BilaUnwanPk1_3	0,6763	0,6759	task2_BilaUnwanPk1_3	0,5064	0,3772
55	task1_Soumya_3	0,6761	0,6761	task2_BilaUnwanPk1_2	0,5046	0,3756
56	task1_recognai_2	0,6726	0,6717	task2_UMUTEAM_1	0,2905	0,2812
57	task1_BilaUnwanPk1_2	0,6717	0,6712	task2_UNEDBiasTeam_1	0,4444	0,165
58	task1_CIC_2	0,6699	0,6677	task2_Alclatos_3	0,4311	0,1585
59	task1_s_exist_3	0,6648	0,6619	task2_ORDS_CLAN_1	0,4833	0,1244
60	task1_multiaztertest_3	0,6582	0,6482	task2_ORDS_CLAN_2	0,4833	0,1237
61	task1_CIC_3	0,6367	0,6366	task2_ORDS_CLAN_3	0,4833	0,1237
62	task1_BilaUnwanPk1_1	0,6126	0,6027	Majority Class	0,4778	0,1078
63	task1_UMUTEAM_1	0,5966	0,5964	task2_almuoes3.0_1	0,1291	0,1069
64	task1_multiaztertest_2	0,5948	0,5944
65	task1_UNEDBiasTeam_1	0,543	0,5359
66	Majority Class	0,5222	0,3431
67	task1_uja_1	0,519	0,5035
68	task1_ORDS_CLAN_1	0,4924	0,3934
69	task1_ORDS_CLAN_2	0,4908	0,39
70	task1_ORDS_CLAN_3	0,4908	0,39
71	task1_almuoes3.0_1	0,4876	0,3979
72	task1_codec_1	0,4096	0,3892
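
The scores in the table use a decimal comma (for example 0,7804). The following minimal sketch shows one way to convert such values and re-rank runs by either measure. It assumes that the rows have already been split into (run, accuracy, F1) string triples; the function names are hypothetical and not part of EvALL.

```python
def parse_score(text):
    """Convert a decimal-comma string such as '0,7804' to a float."""
    return float(text.replace(",", "."))


def rank_runs(rows, key="acc"):
    """Sort (run, acc, f1) string triples by `key`, best score first."""
    parsed = [
        {"run": run, "acc": parse_score(acc), "f1": parse_score(f1)}
        for run, acc, f1 in rows
    ]
    return sorted(parsed, key=lambda row: row[key], reverse=True)


# Three rows copied from the Task 1 columns of the table above.
rows = [
    ("task1_SINAI_TL_1", "0,78", "0,7797"),
    ("Majority Class", "0,5222", "0,3431"),
    ("task1_AI-UPV_1", "0,7804", "0,7802"),
]
ranking = rank_runs(rows, key="acc")
```

Here `ranking[0]["run"]` is `task1_AI-UPV_1`; passing `key="f1"` applies the Task 2 ranking criterion instead.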

The EvALL reports for Task 1 are available for download:

EvALL Tsv Report Task 1 ALL

EvALL Tsv Report Task 1 English

EvALL Tsv Report Task 1 Spanish

The EvALL reports for Task 2 are available for download:

EvALL Tsv Report Task 2 ALL

EvALL Tsv Report Task 2 English

EvALL Tsv Report Task 2 Spanish

EXIST 2021 Workshop Program

EXIST 2021 is co-located with the IberLEF Conference, and will be held online (due to COVID-19) on Tuesday, 21 September 2021, from 11:00 to 22:00h CET.

12:30 - 14:00 EXIST Parallel Session
- 12:30 - 12:35: Welcome and Opening Remarks.
- 12:35 - 12:50: Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models. Angel Felipe Magnossão de Paula, Roberto Fray da Silva, Ipek Baris Schlicht.
- 12:50 - 13:05: Sexism Identification in Social Networks using a Multi-Task Learning System. Flor Miriam Plaza-del-Arco, M. Dolores Molina-González, L. Alfonso Ureña López, M. Teresa Martín-Valdivia.
- 13:05 - 13:20: Automatic Sexism Detection with Multilingual Transformer Models AIT_FHSTP@EXIST2021. Schütz Mina, Boeck Jaqueline, Liakhovets Daria, Slijepcevic Djordje, Kirchknopf Armin, Hecht Manuel, Bogensperger Johannes, Schlarb Sven, Schindler Alexander, and Zeppelzauer Matthias.
- 13:20 - 13:35: MultiAzterTest@Exist-IberLEF 2021: Linguistically Motivated Sexism Identification. Kepa Bengoetxea, Itziar Gonzalez-Dios.
- 13:35 - 13:50: EXIST2021: Detecting Sexism with Transformers and Translation-Augmented Data. Guillem García Subies.
- 13:50 - 14:00: Discussion.
17:15 - 17:30 Overview of EXIST. Francisco Rodríguez-Sánchez, Jorge Carrillo-de-Albornoz, Laura Plaza, Julio Gonzalo, Paolo Rosso, Miriam Comet, Trinidad Donoso.

EXIST 2021 Proceedings

Overview Paper:

Francisco Rodríguez-Sánchez, Jorge Carrillo-de-Albornoz, Laura Plaza, Julio Gonzalo, Paolo Rosso, Miriam Comet, Trinidad Donoso. Overview of EXIST 2021: sEXism Identification in Social neTworks. Procesamiento del Lenguaje Natural, Vol 67, septiembre 2021.

Working Notes:

All proceedings available at: EXIST 2021 Proceedings.

Organizers

Francisco Rodríguez-Sánchez

UNED

Data developer and PhD student

Jorge Carrillo-de-Albornoz

UNED

Associate Professor and Researcher in NLP

Julio Gonzalo

UNED

Full Professor

Laura Plaza

UNED

Associate Professor and Researcher in NLP

Miriam Comet

Universidad de Barcelona

Assistant Professor

Paolo Rosso

Universitat Politècnica de València

Full Professor

Trinidad Donoso

Universidad de Barcelona

Full Professor

Sponsors

MISMIS-Language Project

(PGC2018-096212-B)

Ministerio de Ciencia, Innovación y Universidades

Symanto Research

Francisco Rangel

Head of Product

Contact

If you have any specific question about the EXIST task, please let us know through the Google Group existiberlef2021.

For any other question that does not directly concern the shared task, please write to Jorge Carrillo-de-Albornoz.

jcalbornoz@lsi.uned.es
