Build a system that will allow real people to find information that is written in languages that they have not mastered, and then measure how well representative users are able to use the system that you have built.
iCLEF 2005 will study the problem of Cross-Language Question Answering from a user-inclusive perspective. Depending on the perspective, the challenge is twofold: from the point of view of Q&A as a machine task (Q&A systems), interaction with the user may help a Q&A engine to retrieve better answers; from the point of view of Q&A as a user task, a search assistant may help the user in locating the answer faster, more easily and more accurately.
For Cross-Language Q&A engines, the issue is how best a Q&A system can interact with the user to elicit details about a question that facilitate the automatic cross-language search for an answer in a given document collection.
For Cross-Language search assistants, the issue is how best a system can assist a user in the task of finding and recognizing the answer to a question by searching a foreign-language document collection. Note that, in monolingual searches, users can efficiently retrieve answers using standard document or passage retrieval engines, as shown in the interactive TREC experiments. The Cross-Language version of the problem, however, seems to demand much more assistance from the computer.
Groups participating in iCLEF will share a common experiment design involving users, exploring different aspects of the above research challenges. The experiment is coordinated with the CLEF Q&A track.
We welcome research teams with interests in Cross-Language IR, Human-Computer Interaction, Question Answering, and Machine Translation. The organizers will foster synergies between interested parties; expertise in all of these fields is not required to participate. As in previous iCLEF tracks, participants will adapt a shared user study design to test a hypothesis of their choice, comparing reference and contrastive systems. Descriptions of experiments carried out at iCLEF 2004 can be downloaded from the workshop notes at the CLEF web page.
In coordination with the Cross-Language Image Retrieval task, iCLEF will also organize an interactive image retrieval task.
Go to the main CLEF page and follow the general registration procedure. Please e-mail Julio@lsi.uned.es with any enquiries regarding the task.
Register as CLEF participants (follow instructions in http://clef.isti.cnr.it/). Upon registration, every participant will receive instructions to download the appropriate document collections from the CLEF ftp site.
E-mail the track organizers (Paul Clough, Alessandro Vallin and Julio Gonzalo) indicating your wish to participate and the languages (user and document languages) that will be used in your experiment. Once registration for CLEF is confirmed, participants will receive instructions to download the iCLEF question set.
Formulate a hypothesis about the task, and design two interactive CL QA systems intended to test it. Usually, one system is taken as a reference or baseline, and the other is a proposed or contrastive approach. You can find examples of this methodology in previous iCLEF tracks: iCLEF 2004, iCLEF 2002 and iCLEF 2001. Ensure that both systems keep a log for post-experiment analysis that is as rich as possible.
These are examples of baseline CL QA systems that can be used for the iCLEF 2005 (QA) task:
Recruit subjects for the experiment: a minimum of eight, with additional subjects added in groups of eight. Make sure that the (source and target) language skills of the subjects are homogeneous. The usual setup is that the subjects are native speakers of the question language and have no (or very low) skills in the document language.
Perform the experiment (which takes approximately three hours per subject) and submit the results to iCLEF organizers, following the experiment design shown above.
An experiment consists of a number of search sessions. A search session has three parameters: which user is searching [1-8], which question is being searched [1-16], and which system is being used [reference, contrastive]. The user has a fixed amount of time to find the answer in the document collection using the system. Once the answer is found, the user writes it down (in their native language) in a questionnaire. Check the experiment design for details.
Which user/question/system combinations must be carried out? We use a within-subject design like the one used in the early years of the TREC interactive track, but with a different number of topics and a different task. Check the experiment design for details. Overall, every user searches the full set of 16 questions (half with one system, half with the other) in around three hours, including training, questionnaires and searches.
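For illustration, a counterbalanced within-subject schedule of this kind can be sketched as follows. The official experiment design fixes the exact user/question/system matrix; the rotation and alternation used here are only assumptions to show the shape of such a schedule:

```python
# Sketch of a counterbalanced within-subject assignment:
# 8 users, 16 questions, each user searches all questions,
# half with each system. (Illustrative only; the official
# iCLEF design specifies the actual matrix.)

def build_schedule(n_users=8, n_questions=16):
    """Return {user: [(question, system), ...]} with system order
    alternated and question order rotated across users."""
    systems = ["reference", "contrastive"]
    half = n_questions // 2
    schedule = {}
    for user in range(n_users):
        # Alternate which system comes first, to counterbalance order effects.
        first, second = systems if user % 2 == 0 else reversed(systems)
        # Rotate the question list so question blocks differ across users.
        questions = [(q + user) % n_questions + 1 for q in range(n_questions)]
        schedule[user + 1] = (
            [(q, first) for q in questions[:half]]
            + [(q, second) for q in questions[half:]]
        )
    return schedule

schedule = build_schedule()
```

Each user thus covers all 16 questions, 8 per system, with both the system order and the question blocks varying across users.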
No formal coordination of hypotheses or comparison of systems across sites is planned for iCLEF 2005, but groups are encouraged to seek out and exploit synergies. As a first step, groups are strongly encouraged to make the focus of their planned investigations known to other track participants as soon as possible, preferably via the track listserv at email@example.com. Contact firstname.lastname@example.org to join the list.
Submit the results to the track organizers following the submission format.
After receiving the official results, write a paper describing your experiment for the CLEF working notes and submit the paper to Julio Gonzalo and the track organizers.
Participants are encouraged to log as many details as possible about every search session. However, only minimal information (basically, the answer provided by the user for every question/system/user combination) has to be submitted to the organizers. Check the submission format for details.
The main evaluation score for a system will be accuracy, i.e., the fraction of correct answers. Accuracy will be measured for every searcher/system pair, and then averaged over searchers to obtain a single accuracy measure for each of the two systems being compared.
The assessment will be done following the CLEF QA guidelines, except for one issue:
The nature of any further detailed analysis is up to each site, but sites are strongly encouraged to take advantage of the experimental design and undertake exploratory data analysis to examine the patterns of correlation and interaction among factors that can be observed in the experiment, and to explore the potential for gaining additional insight through alternative evaluation measures. The computation of analysis of variance (ANOVA), where appropriate, can provide useful insights into the separate contributions of searcher, topic and system as a first step in understanding why the results of one search are different from those of another.
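As a toy illustration of the variance decomposition that ANOVA performs, the sketch below computes a one-way F statistic over the system factor alone, with hypothetical per-searcher accuracies. A real analysis would model searcher, topic and system jointly with a statistics package; nothing here comes from the official iCLEF tooling:

```python
# One-way ANOVA F statistic in plain Python (system factor only).

def one_way_anova_F(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    # Between-group sum of squares: group sizes times squared mean offsets.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

reference = [0.50, 0.62, 0.44, 0.56]    # hypothetical per-searcher accuracy
contrastive = [0.69, 0.75, 0.62, 0.81]
F = one_way_anova_F([reference, contrastive])
```

A large F relative to the relevant F distribution would suggest the system effect dominates the searcher-to-searcher noise; with real iCLEF data the searcher and topic factors should be included as well.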
Test Sets Release: May 23
Submission of Runs: June 22
Release of individual results: July 30
Submission of papers for working notes: August 21
CLEF workshop (Vienna): 21-23 September 2005
Document languages allowed: BG, DE, EN, ES, FI, FR, IT, NL, PT
User languages allowed: unrestricted
Fernando López Ostenero - Webmaster