RepLab 2014

0. Program

Session #1 - Wednesday, 17 Sept 13:30-15:30 (High Tor 3)
13:30-14:10	Invited Talk I: Understanding the Social Context: New Trends in Managing Corporate Reputation, Juan Cardona, Llorente & Cuenca.
14:10-14:45	Overview of RepLab 2014, Enrique Amigó, UNED.
14:45-15:30	RepLab 2014: Author Profiling
- UAMCLyR at RepLab 2014: Author Profiling Task, Christian Sánchez-Sánchez
- LIA@REPLAB 2014, Jean-Valère Cossu
- LyS at CLEF RepLab 2014: Creating the State of the Art in Author Influence Ranking and Reputation Classification on Twitter, Jesús Vilares
Break (Dining Room)
Session #2 - Wednesday, 17 Sept 16:00-18:00 (High Tor 3)
16:00-17:00	Invited Talk II: Features and Target Tasks - What Are We Aiming For and What Do We Learn?, Jussi Karlgren, Gavagai.
17:00-17:30	RepLab 2014: Reputation Dimensions
- University of Glasgow Terrier Team / Project Abacá at RepLab 2014: Reputation Dimensions Task, Graham McDonald
- Feature Selection and Data Sampling Methods for Learning Reputation Dimensions - The University of Amsterdam at RepLab 2014, Manos Tsagkias
17:30-18:00	Experiments using RepLab 2013 Data
- Learning Similarity Functions for Topic Detection in Online Reputation Monitoring, Damiano Spina
- Formal Concept Analysis as an alternative in Topic Detection Task, Juan M. Cigarrán
Happy 15th Birthday CLEF and refreshments (Dining Room and Bar Area)
Session #3 - Thursday, 18 Sept 10:30-12:30 (High Tor 3)
10:30-11:30	Invited Talk III: Reputation Management: The Gap Between Research and Reality, Miguel Martínez, Signal.
11:30-12:20	Panel: What comes after RepLab?
12:20-12:30	Wrap-Up

1. About RepLab

RepLab is a competitive evaluation exercise for Online Reputation Management systems. As in previous years, the third RepLab campaign (RepLab 2014) will be organized as an activity of CLEF, and the results of the exercise will be discussed at the CLEF 2014 conference in Sheffield on 15-18 September. In 2012 and 2013, RepLab focused on the problem of monitoring the reputation of entities (typically companies) on Twitter, and dealt with the tasks of entity name disambiguation (Is the tweet about the entity?), reputation polarity (Does the tweet have positive or negative implications for the entity’s reputation?), topic detection (What issue relative to the entity is discussed in the tweet?) and topic ranking (Is the topic a reputation alert that deserves immediate attention?).

RepLab 2014 will still focus on Reputation Monitoring on Twitter, targeting two new tasks: the categorization of messages with respect to standard reputation dimensions (Performance, Leadership, Innovation, etc.) and the characterization of Twitter profiles (author profiling) with respect to a certain activity domain, classifying authors as journalists, professionals, etc. and finding the opinion makers in the domain. The dataset will contain tweets in two languages: English and Spanish.

Note that Twitter profile classification forms part of the shared PAN-RepLab author profiling task. Besides the characterization of profiles from a reputation analysis perspective, participants can also attempt the classification of authors by gender and age, which is the focus of PAN 2014.

The papers of RepLab 2014 --including the overview-- are available online at the CLEF 2014 Working Notes:
http://ceur-ws.org/Vol-1180/.

2. Tasks

RepLab 2014 will include two tasks: (1) classification of Twitter posts and (2) search and classification of Twitter profiles. Participants are welcome to present systems that attempt one or both tasks.

Reputation Dimensions: This is a classification task that consists of categorizing tweets according to their reputational dimension. We will use the standard categorization provided by the Reputation Institute (http://www.reputationinstitute.com/about-reputation-institute/the-reptrak-framework): (1) Products/Services, (2) Innovation, (3) Workplace, (4) Citizenship, (5) Governance, (6) Leadership, (7) Performance, and (8) Undefined. These categories are intended to facilitate reputation analysis. For instance:

Workplace: "We are sadly going to be loosing Sarah Smith from HSBC Bank, as she has been successful in moving forward into a...http://fb.me/18FKDLQIr"
Innovation: "HSBC to upgrade 10,000 POS terminals for contactless payments http://bit.ly/K9h6QW"

Author Profiling: This task consists of two subtasks that will be evaluated separately.

Author Categorization: Participants will be asked to classify Twitter profiles by type of author: journalist, professional, authority, activist, investor, company or celebrity. The systems’ output will be a list of profile identifiers with the assigned categories, one per profile. Note that this subtask will be evaluated only over the profiles annotated as “Influencer” in the gold standard.
Author Ranking: Using the same set of Twitter profiles, systems will be expected to find out which authors have more reputational influence (who the influencers or opinion makers are) and which profiles are less influential or have no influence at all. For a given domain (e.g. automotive or banking), the systems’ output will be a ranking of profiles according to their probability of being an opinion maker in that domain, optionally including the corresponding weights.

Some aspects that determine the influence of an author in Twitter – from a reputation analysis perspective – can be the number of followers, the number of comments on a domain or the type of author. As an example, below is the profile description of an influential financial journalist:

Description: New York Times Columnist & CNBC Squawk Box (@SquawkCNBC) Co-Anchor. Author, Too Big To Fail. Founder, @DealBook. Proud father. RTs ≠ endorsements
Location: New York, New York · nytimes.com/dealbook
Tweets: 1,423
Tweet examples:

Whitney Tilson: Evaluating the Dearth of Female Hedge Fund Managers http://nyti.ms/1gpClRq @dealbook

Dina Powell, Goldman’s Charitable Foundation Chief to Lead the Firm's Urban Investment Group http://nyti.ms/1fpdTxn @dealbook

Systems can also participate in the shared author profiling task RepLab@PAN. In order to do so, participants will need to classify profiles by gender and age. Two categories, female and male, will be used for gender. Regarding age, the following classes will be considered: 18-24, 25-34, 35-49, 50-64, and 65+.

3. Dataset

RepLab 2014 used Twitter data in English and Spanish.

For the reputation dimensions task, the data set is the same as in RepLab 2013. This corpus consists of a collection of tweets referring to a selected set of 61 entities from four domains: automotive, banking, universities and music/artists. RepLab 2014 will use only the automotive and banking subsets. Crawling was performed from 1 June 2012 to 31 December 2012 using the entity’s canonical name as query. For each entity, at least 2,200 tweets were collected: the first tweets in the timeline (at least 700) are used as the training set, and the last tweets (at least 1,500) are reserved for the test set. The corpus also comprises additional background tweets for each entity (up to 50,000, with a large variability across entities). Note that the final number of available tweets in these sets may be lower, since some posts may have been deleted or made private by the authors: in order to respect Twitter’s terms of service, the organizers do not provide the contents of the tweets. The tweet identifiers can be used to retrieve the texts of the posts, similarly to the mechanism used in the TREC Microblog Track in 2011 and 2012. Each tweet is categorized into one of the following reputation dimensions: Products/Services, Innovation, Workplace, Citizenship, Governance, Leadership, Performance and Undefined.

For the author profiling task, the data set consists of over 8,000 Twitter profiles (all with at least 1,000 followers) related to the automotive and banking domains. Each profile consists of (i) author name; (ii) profile URL and (iii) the last 600 tweets published by the author at crawling time. Reputation experts will manually identify the opinion makers (i.e. authors with reputational influence) and annotate them as “Influencer”. All those profiles that are not considered opinion makers will be assigned the “Non-Influencer” label. In case a profile cannot be classified into one of these categories, it will be labelled as “Undecidable”.

Each opinion maker will be categorized as journalist, professional, authority, activist, investor, company, or celebrity. The data set will be split into training and test sets. The estimated proportion is 30% and 70% respectively, although the exact splits will be given later.

A subset of the author profiling data set will be used in the shared task RepLab@PAN.

The RepLab 2014 Dataset is publicly available at http://nlp.uned.es/replab2014/replab2014-dataset.tar.gz.

4. Evaluation Measures

The reputation dimensions task and the categorization of profiles by type of author in the author profiling task will be evaluated as classification problems. Accuracy and precision/recall measures over each class will be reported, using accuracy as the main measure.

Note that for the categorization subtask, systems are expected to return the type of author category for every profile. However, as pointed out above, this categorization will be evaluated only over the profiles annotated as “Influencer” in the gold standard.
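As a rough illustration of these measures, the following Python sketch computes accuracy and per-class precision/recall from two dictionaries that map item identifiers to labels. The function names, the dictionary layout and the choice to count missing predictions as errors are assumptions made for this example; the evaluation package distributed with the data remains the reference. For author categorization, a gold dictionary that contains only the “Influencer” profiles restricts scoring to them, so predictions for other profiles are ignored.

```python
from collections import Counter


def accuracy(gold: dict, pred: dict) -> float:
    """Fraction of gold items whose predicted label matches the gold label.

    Assumption: a gold item with no prediction counts as an error.
    """
    if not gold:
        return 0.0
    correct = sum(1 for key, label in gold.items() if pred.get(key) == label)
    return correct / len(gold)


def precision_recall(gold: dict, pred: dict) -> dict:
    """Per-class (precision, recall), computed only over items present in gold."""
    true_pos, pred_count, gold_count = Counter(), Counter(), Counter()
    for key, label in gold.items():
        gold_count[label] += 1
        guess = pred.get(key)
        if guess is not None:
            pred_count[guess] += 1
            if guess == label:
                true_pos[label] += 1
    scores = {}
    for label in set(gold_count) | set(pred_count):
        # A class that was never predicted (or never in gold) gets 0.0.
        p = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        r = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        scores[label] = (p, r)
    return scores
```

The same two functions apply to the reputation dimensions task if the dictionaries are keyed by tweet identifier instead of profile identifier.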

In the author profiling task, the detection of opinion makers will be evaluated as a traditional ranking information retrieval problem, using the MAP, DCG, RBP and Reliability/Sensitivity measures. The systems’ output will be a ranking of profiles.
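MAP can be sketched as follows, assuming binary relevance (a profile is relevant if it is annotated as an opinion maker), one ranking per domain and no duplicate profiles in a ranking. The function names and input layout are hypothetical, and the official evaluation package may differ in details; DCG, RBP and Reliability/Sensitivity are not covered here.

```python
def average_precision(ranking: list, relevant: set) -> float:
    """Average precision of a ranked list of profile ids.

    Relevant profiles that never appear in the ranking contribute zero.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, profile_id in enumerate(ranking, start=1):
        if profile_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(rankings: dict, gold: dict) -> float:
    """Mean of the per-domain average precision values.

    Both arguments are keyed by domain name (e.g. "automotive", "banking").
    A domain without a submitted ranking scores zero.
    """
    if not gold:
        return 0.0
    scores = [average_precision(rankings.get(d, []), gold[d]) for d in gold]
    return sum(scores) / len(scores)
```

For example, a ranking `["a", "b", "c", "d"]` with relevant profiles `{"a", "c"}` has precision 1/1 at the first hit and 2/3 at the second, giving an average precision of 5/6.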

5. Important Dates

March 1: Release of training data
March 17: Release of test data
May 9: System results due
May 19: Official results released
June 7: Deadline for paper submissions
September 15-18: CLEF 2014 Conference in Sheffield

6. How to Submit Runs

The number of runs allowed:
Each group may submit up to 5 runs for the Reputation Dimensions task and up to 5 runs for each Author Profiling subtask. Groups that participate in all tasks can therefore submit at most 15 runs.
How to format your submission:
1. Each group must choose a group id (alphanumeric string, preferably short).
2. All runs must be packed in a directory named replab2014-<group_id>.
3. Inside this directory, each run should be in a separate file named:
  <group_id>_<task>_<run_id>.txt
  
  where run_id is a number between 1 and 5 and <task> can be RD (Reputation Dimensions), AC (Author Categorization) or AR (Author Ranking). For instance:
  
  replab2014-UNED/UNED_RD_2.txt
4. Files must follow the specifications of the evaluation package distributed with the data.
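The naming rules above can be checked with a small helper such as the following Python sketch. The function name `run_path` and the strict reading of "alphanumeric" as ASCII letters and digits are assumptions made for this example; the file contents themselves are governed by the evaluation package and are not validated here.

```python
import re

# Task codes listed in the naming rules: Reputation Dimensions,
# Author Categorization and Author Ranking.
TASKS = {"RD", "AC", "AR"}


def run_path(group_id: str, task: str, run_id: int) -> str:
    """Return the relative path of a run file inside the submission directory."""
    # Assumption: "alphanumeric" means ASCII letters and digits only.
    if not re.fullmatch(r"[A-Za-z0-9]+", group_id):
        raise ValueError("group id must be an alphanumeric string")
    if task not in TASKS:
        raise ValueError(f"unknown task code: {task}")
    if not 1 <= run_id <= 5:
        raise ValueError("run id must be between 1 and 5")
    return f"replab2014-{group_id}/{group_id}_{task}_{run_id}.txt"
```

Calling `run_path("UNED", "RD", 2)` reproduces the example path given above.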
How to submit:

The compressed directory with your runs must be sent as a single file to enrique_at_lsi.uned.es and jcalbornoz_at_lsi.uned.es (preferably as a download URL), together with a separate spreadsheet containing metadata about your runs.
How to prepare your paper for the Workshop Notes:

Each group must prepare one paper describing all experiments in all subtasks, following the formatting guidelines on the CLEF 2014 website. If you feel that your work should be split into more than one report (in cases of disjoint experiments with disjoint authors, for instance), please ask the lab organizers (email to enrique_at_lsi.uned.es and jcalbornoz_at_lsi.uned.es).

7. TORM - Track for Online Reputation Management

This year, RepLab will explore new scenarios and offer new tasks: classification of tweets by reputation dimension and author profiling (http://nlp.uned.es/replab2014/). However, RepLab 2014 will also include the Track for Online Reputation Management (TORM), which offers an opportunity to keep working on the data sets of past campaigns (http://nlp.uned.es/replab2013/).

This track will focus on work that makes substantial progress in one or more tasks addressed in the first two RepLab campaigns. It will serve as a basis for a special issue on Online Reputation Management in an indexed journal. The deadline for paper submission is June 7. The LNCS proceedings format (http://www.springer.com/computer/lncs?SGWID=0-164-7-72376-0) will be used. Papers can be submitted by sending them to enrique_at_lsi.uned.es and jcalbornoz_at_lsi.uned.es.

If you have any questions or problems, please send an e-mail to enrique_at_lsi.uned.es and jcalbornoz_at_lsi.uned.es.

8. Organizers

RepLab is an activity sponsored by the EU project LiMoSINe.

Organizers:

Julio Gonzalo (UNED, Madrid)
Adolfo Corujo (Llorente & Cuenca, Madrid)
Maarten de Rijke (University of Amsterdam)
Edgar Meij (Yahoo Labs)
Enrique Amigó (UNED, Madrid)
Jorge Carrillo de Albornoz (UNED, Madrid)
Damiano Spina (UNED, Madrid)
Irina Chugur (UNED, Madrid)
Vanessa Álvarez (Llorente & Cuenca, Madrid)
Ana Pitart (Llorente & Cuenca, Madrid)

Steering Committee:

Eugene Agichtein, Emory University, USA
Alexandra Balahur, JRC, Italy
Krisztian Balog, U. Stavanger, Norway
Donna Harman, NIST, USA
Eduard Hovy, Carnegie Mellon University, USA
Radu Jurca, Google, Zurich
Jussi Karlgren, Gavagai/SICS, Sweden
Mounia Lalmas, Yahoo Labs, London
Jochen Leidner, Thomson Reuters, Switzerland
Bing Liu, U. Illinois at Chicago, USA
Alessandro Moschitti, U. Trento, Italy
Miles Osborne, U. Edinburgh, UK
Hans Uszkoreit, U. Saarbrücken, Germany
James Shanahan, Boston University, USA
Belle Tseng, Apple Inc.
Julio Villena, Daedalus/U. Carlos III, Spain

News

RepLab 2014 included in EvALL

28/07/2017

The RepLab 2014 gold standard is now available via EvALL (www.evall.uned.es). EvALL is an evaluation web service that allows researchers to evaluate their system outputs according to several metrics. EvALL also lets researchers compare their system outputs against the state-of-the-art results already stored in EvALL for this dataset.

RepLab 2014 Dataset available

08/10/2014

The RepLab 2014 Dataset is publicly available at http://nlp.uned.es/replab2014/replab2014-dataset.tar.gz.

RepLab 2014 papers online!

12/09/2014

The papers of RepLab 2014 --including the overview-- are available online at the CLEF 2014 Working Notes: http://ceur-ws.org/Vol-1180/.

Deadline Extended!

2/05/2014

Please note that the deadline for submitting system results has been extended to May 9th! Happy hacking!

Call for papers TORM

31/03/2014

TORM, Track for Online Reputation Management, will focus on work that makes substantial progress in one or more tasks addressed in the first two RepLab campaigns. It will serve as a basis for a special issue on Online Reputation Management in an indexed journal. The deadline for paper submission is June 7. For more information, please see TORM.

RepLab 2014 Facebook event!

21/03/2014

As a reminder, we have created a Facebook event for RepLab 2014 to share experiences, doubts, problems, etc. Please join us at the following link: https://www.facebook.com/events/593775794030878/

RepLab 2014 test set and evaluation package available!

18/03/2014

We are pleased to announce that the RepLab 2014 test set and evaluation package are now available. To access the dataset and the evaluation package, please register for the lab at CLEF. If you have already registered and have not received an email from the organizers, please contact enrique_at_lsi.uned.es and jcalbornoz_at_lsi.uned.es.

Registration!

10/03/2014

Due to technical problems with the registration form on the CLEF website, we are opening an alternative way of registering for RepLab until the CLEF server is up again. If you want to participate and could not register through the CLEF website, please write to the lab organizers (enrique_at_lsi.uned.es and jcalbornoz_at_lsi.uned.es), indicating the task(s) you want to participate in:

- Task 1: Reputation Dimensions

- Task 2: Author Profiling

RepLab 2014 training dataset already available!!!

04/03/2014

We are pleased to announce that the RepLab 2014 training set is now available. To access the dataset, please register for the lab at CLEF. If you have already registered and have not received an email from the organizers, please contact enrique_at_lsi.uned.es and jcalbornoz_at_lsi.uned.es.

Virtual machines available

26/02/2014

Thanks to the shared author profiling task PAN-RepLab, RepLab participants can claim a virtual machine offered by the PAN organisers. This allows research groups to submit running software and deploy it on a virtual machine at PAN's site. Instructions on how to prepare the software are given at http://pan.webis.de/, in the submission box of the author profiling task.

If you wish to claim your virtual machine now, please write to pan@webis.de, indicating which of the following operating systems you prefer:

- Ubuntu 12.04 Server (accessible via SSH)
- Ubuntu 12.04 Desktop (accessible via SSH and remote desktop)
- Windows 7 (accessible via SSH and remote desktop)

A corresponding virtual machine will be set up for you and you will receive further information on how to gain access.