Third WePS Evaluation Workshop:
Searching Information about Entities in the Web
Call for Participation
Previous WePS campaigns have focused on the people search task: the first campaign addressed the name ambiguity problem, defining the task as clustering the web search results for a given person name, with the goal of one cluster per person sharing the name. The second campaign used a refined version of the evaluation metrics and added an attribute extraction task for web documents returned by the search engine for a given person name.
In WePS-3 we aim to merge both problems into a single task, where the system must return both the documents and the attributes for each of the different people sharing a given name. This is not a trivial step from the point of view of evaluation: a system may correctly extract attribute values from different URLs but then incorrectly merge them into person profiles.
In addition, WePS-3 adds a task which considers, for the first time, another relevant type of named entity: organizations. We will focus on name ambiguity for organizations, which is a highly relevant problem for Online Reputation Management systems. Take, for instance, the online company Amazon. In order to track mentions of and opinions about Amazon in web data (including news and blog feeds and input from social networks), the system must filter out alternative senses of “Amazon” (the South American river, the nation of female warriors, etc.). Such filtering cannot simply be done by adding keywords to the query (e.g. “amazon online store”), because that may harm recall, and recall is crucial for reputation management.
WePS-3 will be a competitive evaluation campaign with two tasks concerning the Web entity search problem:
Task 1: Clustering and Attribute Extraction for Web People Search
Task 1 is related to Web People Search and focuses on person name ambiguity and person attribute extraction on Web pages.
Given a set of web search results for a person name, the task is to cluster the pages according to the different people sharing the name and extract certain biographical attributes for each person (i.e., for each cluster of documents).
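To make the Task 1 output concrete, the following is a minimal sketch, assuming a toy representation in which each page is a set of words plus attribute values already extracted from that page alone. The greedy Jaccard-threshold (single-link) clustering, the threshold value and all names (`Page`, `cluster_pages`, `build_profiles`) are hypothetical illustrations, not the official data format or a reference system. It also shows the evaluation issue mentioned above: attribute values are only as good as the clusters they are merged into.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One search result for the ambiguous name (hypothetical representation)."""
    url: str
    words: set[str]
    # Attribute values extracted from this page alone, e.g. {"occupation": {"chemist"}}.
    attributes: dict[str, set[str]] = field(default_factory=dict)


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity between two pages."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pages(pages: list[Page], threshold: float = 0.5) -> list[list[Page]]:
    """Single-link clustering via union-find: pages connected (directly or
    transitively) by similarity >= threshold share a cluster, ideally one per person."""
    parent = list(range(len(pages)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pages)):
        for j in range(i + 1, len(pages)):
            if jaccard(pages[i].words, pages[j].words) >= threshold:
                parent[find(i)] = find(j)

    clusters: dict[int, list[Page]] = {}
    for i, page in enumerate(pages):
        clusters.setdefault(find(i), []).append(page)
    return list(clusters.values())


def build_profiles(clusters: list[list[Page]]) -> list[dict[str, set[str]]]:
    """Merge per-page attribute values into one profile per cluster. If the
    clustering is wrong, correctly extracted values land in the wrong profile."""
    profiles = []
    for cluster in clusters:
        profile: dict[str, set[str]] = {}
        for page in cluster:
            for attr, values in page.attributes.items():
                profile.setdefault(attr, set()).update(values)
        profiles.append(profile)
    return profiles


# Toy input: two pages about a chemist, one about a guitarist, all named "John Smith".
pages = [
    Page("http://a.example", {"john", "smith", "chemist", "oxford"}, {"occupation": {"chemist"}}),
    Page("http://b.example", {"john", "smith", "chemist", "lab"}, {"affiliation": {"Oxford"}}),
    Page("http://c.example", {"john", "smith", "guitarist", "band"}, {"occupation": {"guitarist"}}),
]
clusters = cluster_pages(pages)
profiles = build_profiles(clusters)
print([[p.url for p in c] for c in clusters])
print(profiles)
```

Single-link clustering is used here only because it is short to write; note that merging the guitarist page into the chemist cluster would silently produce a profile with two occupations even though every individual extraction was correct.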
Task 2: Name ambiguity resolution for Online Reputation Management (ORM)
Task 2 is related to Online Reputation Management (ORM) for organizations and focuses on the ambiguity of organization names and on the relevance of Web data for reputation management purposes. The motivation is to help experts in reputation management and alert services, for whom name ambiguity is currently an important bottleneck. Twitter has been chosen as the target data source because it is a critical source for real-time reputation management and because ambiguity resolution there is challenging: tweets are very short, so little context is available for resolving name ambiguity.
The task is defined as follows: given a set of Twitter entries containing an (ambiguous) company name, and given the home page of the company, the task is to discriminate entries that do not refer to the company. Entries will be given in two languages: English and Spanish.
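To illustrate the Task 2 input and output, here is a deliberately naive baseline sketch. It assumes the text of the company home page has already been fetched and parsed, and it keeps a tweet only if it shares at least one content word with that page besides the company name itself. The function names, the tiny stopword list and the overlap rule are illustrative assumptions, not the official task format or a recommended method.

```python
import re

# Tiny illustrative stopword list (English + Spanish); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "for", "is",
             "el", "la", "los", "las", "de", "del", "en", "y", "que"}


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens minus stopwords (Python's re is Unicode-aware,
    so accented Spanish words are kept intact)."""
    return {t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS}


def refers_to_company(tweet: str, company_name: str, homepage_text: str,
                      min_overlap: int = 1) -> bool:
    """Keep a tweet if it shares at least `min_overlap` content words with the
    company's home page, ignoring the ambiguous name itself."""
    name_tokens = tokenize(company_name)
    context = tokenize(homepage_text) - name_tokens
    return len((tokenize(tweet) - name_tokens) & context) >= min_overlap


# Hypothetical home page text; fetching and parsing the real page is out of scope here.
homepage = "Amazon.com: online shopping for books, electronics, music and more. Free shipping on orders."
tweets = [
    "Just ordered two books on Amazon, free shipping!",
    "Deforestation in the Amazon rainforest keeps rising",
    "La selva del Amazonas pierde árboles cada año",
]
for t in tweets:
    print(refers_to_company(t, "Amazon", homepage), t)
```

This rule favours precision: a tweet that really is about the company but shares no vocabulary with its home page is discarded, which is exactly the recall problem that makes the task hard for reputation management.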
A team can choose to participate in both Task 1 and Task 2 or in only one of them. In Task 1, Clustering is mandatory and Attribute Extraction is optional (i.e. teams that perform the Attribute Extraction subtask must also complete the Clustering subtask).
The organizers will provide annotated data for developing and training systems (see the task guidelines for more details). In a second stage, an unannotated corpus will be distributed, system outputs will be collected and evaluation results will be returned to the participants. Each team can submit up to five runs for each task (Clustering, Attribute Extraction and ORM). Every team is expected to write a paper describing their system and discussing the evaluation results.
The results of the evaluation campaign will be discussed in a one-day workshop held as a CLEF 2010 Lab in Padua (Italy) on 22 or 23 September 2010.
How do I register?
Please send an email expressing your interest to the task organizers. State the name of your research group, a contact e-mail and the task(s) in which you intend to participate (Task 1 clustering only, Task 1 clustering + attribute extraction, Task 2).
|15 February 2010|
|7 June 2010|
|21 June 2010|
- Release of official results
|15 July 2010|
|15 August 2010|
|23 September (CLEF 2010, Padua)|
The general lab coordinators are:
- Julio Gonzalo (UNED, Madrid)
- Satoshi Sekine (NYU, New York)
The coordinators for Task 1 (people search) are:
- Javier Artiles (UNED, Madrid)
- Andrew Borthwick (Intelius Corp., Palo Alto)
The coordinators for Task 2 (organizations search) are:
- Bing Liu (University of Illinois at Chicago)
- Enrique Amigó (UNED, Madrid)
- Adolfo Corujo (Llorente & Cuenca, Madrid)
- Eneko Agirre, EHU, Spain
- Breck Baldwin, Alias-i, USA
- Danushka Bollegala, Tokyo University, Japan
- Jeremy Ellman, Northumbria University, UK
- Donna Harman, National Institute of Standards and Technology (NIST), USA
- Eduard Hovy, ISI, USA
- Dmitri Kalashnikov, University of California, USA
- Paul Kalmar, USA
- Bernardo Magnini, FBK-irst, Italy
- Gideon Mann, Google, USA
- Yutaka Matsuo, Tokyo University, Japan
- Manabu Okumura, Tokyo Inst. of Tech., Japan
- Ted Pedersen, University of Minnesota, USA
- Massimo Poesio, University of Essex, UK
- Maarten de Rijke, University of Amsterdam, Netherlands
- Jamie Taylor, Freebase, USA
- Mark Sanderson, University of Sheffield, UK
- Arjen P. de Vries, Centrum Wiskunde & Informatica, Netherlands