RepLab is a competitive evaluation exercise for Online Reputation Management systems organized as an activity of CLEF. RepLab 2013 focused on the task of monitoring the reputation of entities (companies, organizations, celebrities, etc.) on Twitter. The monitoring task for analysts consists of searching the stream of tweets for potential mentions of the entity, filtering those that actually refer to the entity, detecting topics (i.e., clustering tweets by subject), and ranking the topics based on the degree to which they signal reputation alerts (i.e., issues that may have a substantial impact on the reputation of the entity). The RepLab 2013 task is defined, accordingly, as (multilingual) topic detection combined with priority ranking of the topics, as input for reputation monitoring experts. The detection of reputational polarity (does the tweet have negative/positive implications for the reputation of the entity?) is an essential step in assigning priority, and was evaluated as a standalone subtask.


Participants presented systems that attempted the full monitoring task (filtering + topic detection + topic ranking) or modules that addressed only part of the problem. These modules correspond to the following components of the overall reputation management task:

  1. Filtering: Systems were asked to determine which tweets are related to the entity and which are not, for instance, distinguishing tweets that contain the word "Stanford" referring to Stanford University from tweets about Stanford as a place, which should be filtered out. Manual annotations were provided with two possible values: related/unrelated.
  2. Reputational polarity: The goal was to decide if the tweet content has positive or negative implications for the company's reputation. Manual annotations are: positive/negative/neutral.
  3. Topic Detection: Systems were asked to cluster related tweets about the entity by topic with the objective of grouping together tweets referring to the same subject.
  4. Assigning priority: The full task involved detecting the relative priority of topics. To evaluate priority independently of the clustering task, we also evaluated the subtask of predicting the priority of the cluster a tweet belongs to.
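To make the filtering subtask concrete, here is a minimal sketch of a naive keyword-overlap baseline. It is purely illustrative and not one of the participant systems; the cue terms are assumptions chosen for the "Stanford" example above.

```python
# Hypothetical keyword-based filtering baseline (not an official RepLab
# system): a tweet is labelled "related" if it mentions any cue term
# associated with the entity beyond its (possibly ambiguous) name.
ENTITY_TERMS = {"university", "campus", "research", "students"}  # assumed cues

def filter_tweet(text: str) -> str:
    """Return the RepLab filtering label for a single tweet."""
    tokens = {t.strip(".,!?#@").lower() for t in text.split()}
    return "related" if tokens & ENTITY_TERMS else "unrelated"

print(filter_tweet("Stanford students publish new research"))  # related
print(filter_tweet("Driving through Stanford today"))          # unrelated
```

Real participant systems used richer signals (entity metadata, language models, external resources), but the input/output contract is the same: one related/unrelated decision per tweet.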


The RepLab 2013 dataset uses Twitter data in English and Spanish (more than 142,000 tweets). The balance between the two languages depends on the availability of data for each entity included in the dataset. The corpus consists of a collection of tweets referring to a selected set of 61 entities from four domains: automotive, banking, universities, and music/artists. The domains were selected to offer a variety of scenarios for reputation studies.

Crawling was performed from 1 June 2012 to 31 December 2012, using each entity's canonical name as the query. For each entity, at least 2,200 tweets were collected: at least the first 700 tweets in the timeline form the training set, and at least the last 1,500 tweets are reserved for the test set. The corpus also comprises additional background tweets for each entity (up to 50,000 tweets, with large variability across entities). This split was designed to obtain a temporal separation (ideally of several months) between the training and test data.

Note that the final number of available tweets in these sets may be lower, since some posts may have been deleted by their authors: in order to respect Twitter's terms of service, we do not distribute the contents of the tweets. Instead, the tweet identifiers can be used to retrieve the texts of the posts. We provide a download tool similar to the mechanism used in the TREC Microblog Track in 2011 and 2012.
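As an illustration of working with an ID-only distribution, the sketch below parses a tweet-ID list and batches it for lookup requests. The file format assumed here (one numeric tweet ID per line, with blank lines and `#` comments allowed) is hypothetical, and the actual REST calls made by the official download tool are not reproduced.

```python
# Sketch of consuming a tweet-ID list distributed with the corpus.
# File format (one ID per line, '#' comments) is an assumption, not
# the documented format of the RepLab download tool.
def load_tweet_ids(lines):
    """Parse tweet IDs, skipping blank lines and comments."""
    ids = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(int(line))
    return ids

def batch(ids, size=100):
    """Split IDs into chunks; lookup endpoints typically cap IDs per request."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

ids = load_tweet_ids(["123456789\n", "\n", "# header\n", "987654321\n"])
print(batch(ids))
```

Each batch would then be sent to a tweet-lookup endpoint with the user's own API credentials; deleted tweets simply come back missing, which is why the usable corpus shrinks over time.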

For more information, please refer to the RepLab 2013 overview paper.


The RepLab 2013 dataset can be downloaded through the following links:

Note that the version with "external links" includes a copy of each URL contained in a tweet in the corpus, and it is much larger than the plain version (8.6 GB vs. 76 MB).

The tool to download the tweets can be downloaded through the following link:

Please refer to the README included in each package for a description of the contents. For any questions, please write to: or

Evaluation via EvALL

The RepLab 2013 gold standard is now available via EvALL. EvALL is an evaluation web service that allows researchers to evaluate their system outputs according to several metrics. EvALL also allows comparing system outputs against the state of the art already stored in EvALL for this dataset. You can find more information about EvALL in the following video.
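A minimal sketch of serializing system predictions as tab-separated values before uploading them for evaluation. The column layout used here (entity ID, tweet ID, label) and the identifiers are assumptions for illustration only; consult the EvALL documentation for the exact input format the service expects.

```python
import csv
import io

# Hypothetical filtering predictions as (entity_id, tweet_id, label) triples.
# Both the IDs and the three-column layout are assumptions, not the
# documented EvALL format.
predictions = [
    ("RL2013_E001", "123456789", "RELATED"),
    ("RL2013_E001", "987654321", "UNRELATED"),
]

buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerows(predictions)
output = buf.getvalue()
print(output)
```

Writing through the csv module rather than joining strings by hand keeps the output safe if a field ever contains a tab or quote character.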


Please cite the article below if you use this resource in your research:

Enrique Amigó, Jorge Carrillo-de-Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Tamara Martín,
Edgar Meij, Maarten de Rijke, Damiano Spina.
Overview of RepLab 2013: Evaluating Online Reputation Monitoring Systems.
Proceedings of the Fourth International Conference of the CLEF Initiative. 2013.
[working notes version]


@inproceedings{replab2013overview,
  title = {{Overview of RepLab 2013: Evaluating Online Reputation Monitoring Systems}},
  author = {Amig{\'o}, E. and {Carrillo de Albornoz}, J. and Chugur, I. and Corujo, A. and Gonzalo, J. and Mart{\'i}n, T. and Meij, E. and de Rijke, M. and Spina, D.},
  pages = {333--352},
  booktitle = {{Proceedings of the Fourth International Conference of the CLEF Initiative}},
  year = {2013}
}