Track: The iCLEF challenge

Build a system that will allow real people to find information that is written in languages that they have not mastered. Then measure how well representative users are able to use that system.

The goal of iCLEF, then, is to study the interactive aspects of Cross-Language Information Retrieval systems. Standard CLEF, NTCIR and Cross-Language TREC tasks evaluate the ability of systems to automatically retrieve target-language documents from source-language queries; iCLEF evaluates how well systems help users locate and identify relevant foreign-language information.

iCLEF 2004 task: Interactive Cross-Language Q&A

This year, the interactive CLEF track will study the problem of Cross-Language Question Answering (CL-QA) from a user-inclusive perspective. The challenge is twofold:

For a CL-QA system, the issue is how the system can best interact with the user to obtain details about a question that facilitate the automatic search for an answer in the document collection. For instance, in cases of ambiguity, the system may request additional information from the user, thereby avoiding incorrect translations (for translation ambiguity) or incorrect inferences (for semantic ambiguity).

For a Cross-Language search system, the issue is how a system can best assist a user with the task of finding and recognizing the answer to a question by searching the document collection. In monolingual searches, users can often easily find answers using standard document or passage retrieval systems. The cross-language case seems, however, to demand much more assistance from the system:

We welcome research teams with interests in cross-language information retrieval, human-computer interaction, question answering, and machine translation. The organizers hope to foster synergies between interested parties, so expertise in one or more of these fields should be sufficient to participate.

Research teams participating in iCLEF are expected to study some of the issues above by comparing two systems in a CL QA search task involving a number of topics (provided by iCLEF) and a number of searchers (recruited locally by the participating team). The two systems should differ in the facilities provided for any of the tasks listed above. The iCLEF experiment design will allow groups to estimate the effect of system differences by suppressing the (additive) effects of participant and topic, and by somewhat reducing the effects of interactions between these factors.

Participating teams should focus on one of the following user groups (if both groups are studied, separate experiments should be run for each):
  1. searchers with passive language abilities in the foreign language (i.e. who can at least roughly understand documents in that language, but cannot form accurate queries in that language without assistance). For example, a native speaker of Italian who is searching Spanish documents might be a member of this user group.
  2. searchers with no useful language abilities in the foreign language. For example, a monolingual Spanish speaker who is searching German documents might be a member of this user group.

Track - How to - Data provided - Data to be submitted - Evaluation - Schedule

How to participate

Research groups interested in joining the iCLEF 2004 task should follow these steps:

Register as CLEF participants (follow instructions in Upon registration, every participant will receive instructions to download the appropriate document collections from the CLEF ftp site.

E-mail the track organizers (Doug Oard and Julio Gonzalo) indicating your wish to participate and the languages (user and document languages) that will be used in your experiment. Once registration for CLEF is confirmed, participants will receive instructions to download the iCLEF question set.

Formulate some hypothesis about the task, and design two interactive CL QA systems intended to test your hypothesis. Usually, one of the systems is taken as a reference or baseline, and the other system is a proposed or contrastive approach. You can find examples of this methodology in previous iCLEF tracks: iCLEF 2003, iCLEF 2002 and iCLEF 2001. Ensure that both systems keep a log for post-experiment analysis that is as rich as possible.
These are examples of baseline CL QA systems that can be used for the iCLEF 2004 task:

Recruit subjects for the experiment: a minimum of eight subjects is required, and more can be added in groups of eight. Make sure that the (source and target) language skills of the subjects are homogeneous. The usual setup is that the subjects are native speakers of the question language and have no (or very low) skills in the document language.

Perform the experiment (which takes approximately three hours per subject) and submit the results to the iCLEF organizers, following the experiment design.

An experiment consists of a number of search sessions. A search session has three parameters: which user is searching [1-8], which question is being searched [1-16], and which system is being used [reference, contrastive]. The user has a fixed amount of time to find the answer in the document collection using the system. Once the answer is found, the user writes it down (in their own native language) in a questionnaire. Check the experiment design for details.

Which user/question/system combinations must be carried out? We use a within-subject design like that used in the early years of the TREC interactive track, but with a different number of topics and a different task. Check the experiment design for details. Overall, every user will search the full set of 16 questions (half with one system, half with the other) in a total time (including training, questionnaires and searches) of around 3 hours.
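The shape of such a within-subject design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the official iCLEF design matrix is not reproduced here, and the function name `build_design` and the particular balancing scheme (alternating system order and question blocks across users) are hypothetical choices made for the example.

```python
# Illustrative within-subject assignment: NOT the official iCLEF design matrix.
# Assumptions: 8 users, 16 questions, two systems, questions split into two
# blocks of 8; system order and block order alternate across users.
SYSTEMS = ("reference", "contrastive")


def build_design(n_users=8, n_questions=16):
    """Return {user: [(question, system), ...]} in presentation order."""
    half = n_questions // 2
    questions = list(range(1, n_questions + 1))
    design = {}
    for user in range(1, n_users + 1):
        # Odd-numbered users start with the reference system, even with the contrastive one.
        first, second = SYSTEMS if user % 2 == 1 else SYSTEMS[::-1]
        # Every second pair of users searches the second block of questions first.
        block_a, block_b = questions[:half], questions[half:]
        if ((user - 1) // 2) % 2 == 1:
            block_a, block_b = block_b, block_a
        design[user] = [(q, first) for q in block_a] + [(q, second) for q in block_b]
    return design


if __name__ == "__main__":
    for user, sessions in build_design().items():
        print(user, sessions)
```

With this scheme every user searches all 16 questions, eight with each system, and every question is searched with each system by the same number of users, which is what lets the additive effects of user and question be separated from the system effect.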

No formal coordination of hypotheses or comparison of systems across sites is planned for iCLEF 2004, but groups are encouraged to seek out and exploit synergies. As a first step, groups are strongly encouraged to make the focus of their planned investigations known to other track participants as soon as possible, preferably via the track listserv at Contact to join the list.

Submit the results to the track organizers following the submission format.

After receiving the official results, write a paper describing your experiment for the CLEF working notes and submit the paper to Carol Peters and the track organizers.

Data Provided by the Organizers

Data to be Submitted to the Organizers

Participants are encouraged to log as many details as possible about every search session. However, only minimal information (basically, the answer provided by the user for every question/system/user combination) has to be submitted to the organizers. Check the submission format for details.

Evaluation measures

The main evaluation score for a system will be accuracy, i.e., the fraction of correct answers. Accuracy will be measured for every searcher/system pair, and then averaged over searchers to obtain a single accuracy measure for each of the two systems being compared.
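As a minimal sketch of this computation, the following Python function first computes accuracy for each searcher/system pair and then averages over searchers. The input layout (tuples of searcher, system and a correctness flag) and the name `system_accuracy` are assumptions made for the example; they are not the official submission or assessment format.

```python
from collections import defaultdict


def system_accuracy(judgments):
    """Average per-searcher accuracy for each system.

    judgments: iterable of (searcher, system, is_correct) tuples, one per
    search session (hypothetical layout, not the official format).
    """
    # (searcher, system) -> [number of correct answers, number of sessions]
    counts = defaultdict(lambda: [0, 0])
    for searcher, system, is_correct in judgments:
        cell = counts[(searcher, system)]
        cell[0] += int(bool(is_correct))
        cell[1] += 1

    # Accuracy per searcher/system pair, grouped by system.
    per_system = defaultdict(list)
    for (searcher, system), (correct, total) in counts.items():
        per_system[system].append(correct / total)

    # Average over searchers to get one score per system.
    return {system: sum(accs) / len(accs) for system, accs in per_system.items()}
```

Because every searcher answers the same number of questions with each system in a balanced design, this average over searchers coincides with the overall fraction of correct answers; the two-step form simply mirrors the description above.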

The assessment will be done following the CLEF QA guidelines, except for two issues:

The nature of any further detailed analysis is up to each site, but sites are strongly encouraged to take advantage of the experimental design and undertake exploratory data analysis to examine the patterns of correlation and interaction among factors that can be observed in the experiment, and to explore the potential for gaining additional insight through alternative evaluation measures. The computation of analysis of variance (ANOVA), where appropriate, can provide useful insights into the separate contributions of searcher, topic and system as a first step in understanding why the results of one search are different from those of another.
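One simple exploratory step before a formal ANOVA is to look at how far the mean score of each searcher, topic and system deviates from the grand mean. The sketch below does only that; it is not an ANOVA and performs no significance test. The session layout and the name `marginal_effects` are assumptions made for illustration.

```python
from collections import defaultdict


def marginal_effects(sessions):
    """Grand mean and per-level deviations for searcher, topic and system.

    sessions: iterable of (searcher, topic, system, score) tuples, where
    score is 1 for a correct answer and 0 otherwise (hypothetical layout).
    """
    sessions = list(sessions)
    grand = sum(row[3] for row in sessions) / len(sessions)
    effects = {}
    for name, idx in (("searcher", 0), ("topic", 1), ("system", 2)):
        groups = defaultdict(list)
        for row in sessions:
            groups[row[idx]].append(row[3])
        # Deviation of each level's mean score from the grand mean.
        effects[name] = {
            level: sum(scores) / len(scores) - grand
            for level, scores in groups.items()
        }
    return grand, effects
```

Large deviations for particular topics or searchers suggest where a fuller analysis of variance, or a closer look at the session logs, is likely to be most informative.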

Schedule

Registration opens: January 15, 2004
Document release: February 2004
Question release: May 20, 2004
Submission of runs by participants: June 10, 2004
Release of individual results: July 15, 2004
Submission of papers for working notes: August 15, 2004
CLEF workshop (Bath, UK, after ECDL): September 16-17, 2004

Document languages (choose at least one): Dutch, French, German, Italian, Spanish, English
User languages: Unrestricted

