Third WePS Evaluation Workshop:
Searching Information about Entities in the Web
Call for Participation
Previous WePS campaigns focused on the people search task. The first campaign addressed the name ambiguity problem, defining the task as clustering web search results for a given person name, with the goal of one cluster per person sharing that name. The second campaign used a refined version of the evaluation metrics and added an attribute extraction task over the web documents returned by the search engine for a given person name.
In addition, WePS-3 adds a task which considers, for the first time, another relevant type of named entity: organizations. We will focus on name ambiguity for organizations, which is a highly relevant problem faced by Online Reputation Management systems. Take, for instance, the online company Amazon. In order to trace mentions and opinions about Amazon in web data (including news and blog feeds and input from social networks), the system must filter out alternative senses of “Amazon” (the South American river, the nation of female warriors, etc.). But such filtering cannot be done by liberally adding keywords to a query (e.g. “amazon online store”), because that may harm recall, and recall is crucial for reputation management.
Task 1 (People Search): given a set of web search results for a person name, the task is to cluster the pages according to the different people sharing the name and to extract certain biographical attributes for each person (i.e., for each cluster of documents).
- Guidelines for the WePS-3 Person Name Disambiguation Task
- Guidelines for the WePS-3 Attribute Extraction Subtask
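To make the clustering task concrete, here is a purely illustrative toy baseline (not an official WePS baseline): documents are represented as word sets, and pages whose Jaccard similarity exceeds a threshold are merged into the same cluster (single-link). All page texts and the threshold value below are invented for illustration.

```python
# Toy single-link clustering of search results for one person name.
# Assumption: lexical overlap between pages about the same person is
# higher than between pages about different people sharing the name.

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_pages(pages: list[str], threshold: float = 0.2) -> list[list[int]]:
    """Single-link clustering; returns clusters of page indices."""
    word_sets = [set(p.lower().split()) for p in pages]
    clusters: list[list[int]] = []
    for i, ws in enumerate(word_sets):
        merged = None
        for c in clusters:
            if any(jaccard(ws, word_sets[j]) >= threshold for j in c):
                if merged is None:
                    c.append(i)       # join the first matching cluster
                    merged = c
                else:
                    merged.extend(c)  # transitively merge further matches
                    c.clear()
        clusters = [c for c in clusters if c]
        if merged is None:
            clusters.append([i])
    return clusters

pages = [
    "john smith professor of linguistics at example university",
    "professor john smith linguistics department publications",
    "john smith guitarist tour dates and albums",
]
print(cluster_pages(pages))  # → [[0, 1], [2]]
```

Real participant systems used far richer features (named entities, URLs, attribute values); this sketch only shows the input/output shape of the task.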
Task 2 (Online Reputation Management): given a set of Twitter entries containing an (ambiguous) company name, and given the home page of the company, the task is to identify the entries that do not refer to the company. Entries will be provided in two languages: English and Spanish.
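One naive way to approach this filtering task, sketched here only to illustrate the task setup, is to keep a tweet when it shares enough content words with the company's home page. The home-page vocabulary, stopword list, and tweets below are all invented for illustration.

```python
# Hedged sketch of a vocabulary-overlap filter for the ORM task.
# Assumption: tweets about the company reuse words from its home page,
# while other senses of the name (e.g. the Amazon river) do not.

def refers_to_company(tweet: str, homepage_words: set[str],
                      min_overlap: int = 1) -> bool:
    """True if the tweet shares at least `min_overlap` content words
    with the company's home page (stopwords and the name excluded)."""
    stop = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "amazon"}
    tweet_words = {w.strip(".,!?").lower() for w in tweet.split()} - stop
    return len(tweet_words & homepage_words) >= min_overlap

# Invented home-page vocabulary for the online store sense of "Amazon".
homepage_words = {"books", "shopping", "orders", "kindle", "prime", "store"}
tweets = [
    "Just got my Kindle from Amazon, free Prime shipping!",
    "The Amazon river dolphins were amazing on our trip",
]
print([refers_to_company(t, homepage_words) for t in tweets])  # → [True, False]
```

As the call notes, such keyword-style filtering trades recall for precision; the point of Task 2 is precisely to do better than this.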
Participation
A team can choose to participate in both Task 1 and Task 2, or in only one of them. In Task 1, Clustering is mandatory and Attribute Extraction is optional (i.e., teams that perform the Attribute Extraction subtask must also complete the Clustering task).
The organizers will provide annotated data for developing/training systems (see the task guidelines for details). In a second stage, an unannotated corpus will be distributed, system outputs will be collected, and evaluation results will be returned to the participants. Each team may submit up to five runs per task (Clustering, Attribute Extraction, and ORM). Every team is expected to write a paper describing its system and discussing the evaluation results.
How do I register?
Please send an email expressing your interest to the task organizers. State the name of your research group, a contact e-mail, and the task(s) in which you intend to participate (Task 1 clustering only, Task 1 clustering + attribute extraction, Task 2).
Important Dates
- 15 February 2010
- 7 June 2010
- 21 June 2010
- 15 July 2010
- 15 August 2010
- 23 September 2010 (CLEF 2010, Padua)
Organizers
- Julio Gonzalo (UNED, Madrid)
- Satoshi Sekine (NYU, New York)
- Javier Artiles (UNED, Madrid)
- Andrew Borthwick (Intelius Corp., Palo Alto)
- Bing Liu (University of Illinois at Chicago)
- Enrique Amigó (UNED, Madrid)
- Adolfo Corujo (Llorente & Cuenca, Madrid)
Program Committee
- Eneko Agirre, EHU, Spain
- Breck Baldwin, Alias-i, USA
- Danushka Bollegala, Tokyo University, Japan
- Jeremy Ellman, Northumbria University, UK
- Donna Harman, National Institute of Standards and Technology (NIST), USA
- Eduard Hovy, ISI, USA
- Dmitri Kalashnikov, University of California, USA
- Paul Kalmar, USA
- Bernardo Magnini, FBK-irst, Italy
- Gideon Mann, Google, USA
- Yutaka Matsuo, Tokyo University, Japan
- Manabu Okumura, Tokyo Inst. of Tech., Japan
- Ted Pedersen, University of Minnesota, USA
- Massimo Poesio, University of Essex, UK
- Maarten de Rijke, University of Amsterdam, Netherlands
- Mark Sanderson, University of Sheffield, UK
- Jamie Taylor, Freebase, USA
- Arjen P. de Vries, Centrum Wiskunde & Informatica, Netherlands