WePS 3: searching information about entities in the Web
The results of the evaluation campaign will be discussed in a one day workshop as a CLEF 2010 Lab in Padova (Italy), 22 or 23 September 2010. See the WePS-3 Call for Participation.

Background

The WePS campaign focused on the Web People Search problem in its first two editions: WePS 1 was run as a SemEval 1 task in 2007, where 16 teams submitted results (making it one of the largest tasks in SemEval), and WePS 2 was run as a workshop of the WWW 2009 Conference, with the participation of 19 research teams.

The Web People Search task was defined in WePS as the problem of organizing web search results for a given person name. Web search engines return a ranked list of URLs which typically refer to various people sharing the same name. Ideally, the user would rather see the documents grouped into clusters, each containing the documents that refer to the same individual, possibly with a list of person attributes that help decide which person the user actually intended.

From a practical point of view, the task is highly relevant: between 11 and 17% of web queries include a person name, 4% of web queries are just a person name, and person names are highly ambiguous: according to the US Census Bureau, only 90,000 different names are shared by more than 100,000,000 people. An indirect proof of the relevance of the problem is the fact that, since 2005, a number of web startups have been created precisely to address it (Spock.com and Zoominfo.com being the best known).

From a research point of view, the task is challenging (the number of clusters is not known a priori; the degree of ambiguity does not seem to follow a normal distribution; and web pages are noisy sources from which attributes and other indexes are difficult to extract) and has connections with Natural Language Processing and Information Retrieval tasks (Text Clustering, Information Extraction, Word Sense Discrimination) in the context of the WWW as data source.
Goals

Our current proposal represents a third step in a growth path for WePS which is illustrated in the following figure.

WePS 3 Tasks

WePS 1 and WePS 2 were focused on the people search task: in the first campaign we addressed only the name coreference problem, defining the task as clustering of web search results for a given person name. In the second campaign we refined the evaluation metrics and added an attribute extraction task for web documents returned by the search engine for a given person name.

For this third campaign we aim at merging both problems into one single task, where the system must return both the documents and the attributes for each of the different people sharing a given name. This is not a trivial step from the point of view of evaluation: a system may correctly extract attribute profiles from different URLs but then incorrectly merge profiles.

In addition, we want to consider another type of entity: organizations. Name ambiguity for organizations is a highly relevant problem faced by Online Reputation Management systems. Take, for instance, the online company Amazon. In order to trace mentions and opinions about Amazon in web data (including news and blog feeds and input from social networks), the system must filter out alternative senses of “Amazon” (the South American river, the nation of female warriors, etc.). But such filtering cannot be done by liberally adding keywords to a query (e.g. “amazon online store”), because that may harm recall, and recall is crucial for reputation management.
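The recall problem with keyword filtering can be made concrete with a small worked example. The sketch below is only illustrative: the five mentions, their labels, and the helper names `keyword_filter` and `precision_recall` are hypothetical and do not come from the WePS data or guidelines. It keeps a mention only when it contains the extra query keywords and then measures precision and recall against the labels.

```python
# Hypothetical toy data (not WePS data): each mention of "amazon" is paired
# with a label saying whether it actually refers to the company.
mentions = [
    ("Amazon online store cuts prices on books", True),
    ("Amazon shares rise after earnings report", True),
    ("New Kindle announced by Amazon", True),
    ("Boat trip along the Amazon river", False),
    ("Amazon warriors in Greek mythology", False),
]


def keyword_filter(text, keywords=("online", "store")):
    """Keep a mention only if it contains every extra query keyword."""
    words = text.lower().split()
    return all(keyword in words for keyword in keywords)


def precision_recall(labeled_mentions, predicate):
    """Compute precision and recall of a filter against the labels."""
    selected = [is_company for text, is_company in labeled_mentions if predicate(text)]
    true_positives = sum(selected)
    relevant = sum(is_company for _, is_company in labeled_mentions)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


precision, recall = precision_recall(mentions, keyword_filter)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy set the filter keeps only the first mention, so every kept mention is about the company but two of the three company mentions are lost. That is the trade-off described above: the added keywords remove the river and the warriors, but also discard most of the mentions a reputation management system needs to see.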
WePS 3 Focus: involvement of industrial stakeholders

WePS 1 and WePS 2 focused on consolidating a research community around the problem and an optimal evaluation methodology. In WePS 3 the focus is on involving industrial stakeholders in the evaluation campaign, both as providers of input to the task design phase and as providers of realistic-scale datasets. To reach this goal we have included a representative from industry in each of the tasks:
Organizers

The general lab coordinators are:
The coordinators for Task 1 (people search) are:
The coordinators for Task 2 (organizations search) are:
Besides the track coordinators, WePS has a representative Steering Committee.

WePS 3 Agenda

This is the tentative agenda for WePS 3:
The workshop will follow the successful model used for WePS 2, and will include (i) overviews of the two tasks, (ii) selected presentations from participants, focusing on successful strategies and innovative proposals, (iii) invited talks by leading researchers, industrial stakeholders and experts in evaluation methodologies, (iv) a poster session where all participants can present and discuss their approaches, and (v) discussion sessions to shape future WePS campaigns.

WePS-3 is sponsored by Intelius

Person attribute extraction and clustering are core technologies for Intelius. Intelius' support of WePS-3 continues its history of support for research in this area, as shown by the $50,000 Spock Challenge (2007), which was sponsored by the Intelius subsidiary spock.com. Intelius is actively hiring people with expertise in people record linkage and attribute extraction for its data research team. Those interested should see our ad or contact Dr. Borthwick for more information.
06/14/2010 - Task-1 submission checker and XML DTDs.
06/09/2010 - Person name frequencies for task-1 test data.
06/08/2010
06/07/2010
05/24/2010 - Training data for Task-2 - WePS 1 and 2 person name frequencies from Intelius
02/22/2010 - Trial data for tasks 1 and 2 is now available
01/20/2010
01/20/2010 - WePS-3 guidelines are now available