Guidelines
Track: The iCLEF challenge
The Cross-Language Evaluation Forum (CLEF) is an annual evaluation of Cross-Language Information Retrieval (CLIR) systems with a focus on European languages. Interest in the interactive aspects of the CLIR problem is increasing, so the CLEF 2001 evaluation will include an experimental track for teams interested in evaluating interactive CLIR systems.
The goal of the interactive track at CLEF 2001 is to explore evaluation methods for interactive CLIR and to establish baselines against which future research progress can be measured. Participating teams will be asked to run a mutually agreed experiment protocol involving an interactive CLIR system of their own design and a relatively small number (4-8) of human subjects.
The CLEF 2001 interactive track will most likely focus on interactive selection of documents that have been automatically translated from a language that the searcher would otherwise have been unable to read. The details of the task and the evaluation design will be developed through discussions on the interactive track mailing list.
Participation in the mailing list is open to all interested parties, and an archive of postings to the mailing list can be found on the track's Web site. In order to facilitate participation in both tasks, the interactive track's result submission deadline will be one month after the main CLEF deadline.
To join the interactive track mailing list, please send an e-mail to Doug Oard and Julio Gonzalo with the name of your organization and the email address(es) that you would like added to the list.
The tentative schedule for the interactive track is:
| Date | Milestone |
| --- | --- |
| ASAP | Join the iCLEF mailing list |
| 6 June | Topics and documents available; Systran translations available |
| 10 July | Submit relevance judgments to UNED |
| 25 July | Results available from UNED |
| 6 August | Submit notebook papers to CNR |
| 13 August | Submit additional results to UNED |
| 3-4 September | CLEF Workshop in Darmstadt, Germany |
For further information, please visit the CLEF Web site at http://www.clef-campaign.org and select the link for the interactive track.
Data Provided by the Organizers
- Document collection: sites will use a subset of the CLEF 2001 collection that contains French, English, or German documents. In each case, the documents are newswire and/or newspaper articles from major news services that were generated during 1994. Each participant will select one of these collections that meets the needs of their chosen user group. See the CLEF web page for information about how to obtain the documents.
- Translated documents: The Systran machine translation system will be used at the University of Maryland to translate the German documents into English and the English documents into Spanish. These translations will be made available to participants through the CLEF organizers for use in their baseline system if desired. Use of these translations is not required.
- Topics: Written topic descriptions will be made available in Chinese, Dutch, English, Finnish, French, German, Italian, Russian, Japanese, Spanish, Swedish and Thai. Participating teams may use the full topic description (title, description, and narrative fields) or any subset of that topic description as a basis for interactive query translation and as a basis for interactive document selection. Each participating team will use the topics in the native language of the searchers.
- Ranked lists: For teams that do not wish to investigate interactive query translation, a standard ranked list of documents for each topic will be provided. The ranked list will be generated using a CLIR system with no user interaction. In order to maximize the potential for cross-site comparison, use of these standard ranked lists is required if interactive query translation is not performed.
- Experiment design: The experiment will use a within-subject design like that used for the TREC interactive track, but with a different number of topics and a different task. The design and detailed instructions are available here; an illustrative counterbalanced assignment is sketched after this list.
- Searcher instructions: A standard example of instructions to be given to the searchers is provided to help standardize the conditions under which the experiments are run. These instructions should be modified as necessary by participating teams to account for system design and local conditions, but care should be taken to minimize substantive changes where possible. Searcher instructions will be made available here.
- Questionnaires: Standardized questionnaires should be completed by searchers at the start of their session, after each topic, when switching systems, and at the end of their session. An additional form will be provided on which observers can note their observations. Participating teams are encouraged (but not required) to provide one observer per searcher if sufficient resources are available in order to maximize the value of the observational notes. The questionnaires will be made available here.
- Result format: A standard format is provided for submitting data collected during the experiment to the organizers. This submitted data will be used as a basis for computing standard measures of effectiveness, and will be made available to any participating team upon request to facilitate more detailed cross-site comparisons. The formatting instructions are available here.
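Since the authoritative design and orderings are distributed separately, the following is only a rough Python sketch of what a within-subject, counterbalanced assignment can look like; the searcher, system, and topic labels are hypothetical and do not reflect the official design document.

```python
# Illustrative only: a within-subject assignment in which every searcher uses
# both systems, with system order and the pairing of topic blocks to systems
# counterbalanced across searchers. All labels are hypothetical.
from itertools import product

systems = ["baseline", "experimental"]        # hypothetical system labels
topic_blocks = [["T1", "T2"], ["T3", "T4"]]   # hypothetical topic identifiers

# Four conditions: (which system comes first) x (which topic block comes first).
conditions = list(product([0, 1], [0, 1]))

for i in range(8):                            # e.g. eight searchers, S1..S8
    sys_first, block_first = conditions[i % len(conditions)]
    plan = [
        (systems[sys_first], topic_blocks[block_first]),
        (systems[1 - sys_first], topic_blocks[1 - block_first]),
    ]
    print(f"S{i + 1}:", plan)
```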
Data to be Submitted to the Organizers
For every search (topic/searcher/system combination), two types of data will be collected:
- If query translation is supported, the top 100 documents in every ranked list generated by the system during interactive query translation (only one list is required if the user does not reformulate the query and search again).
- If document selection is performed, the list of documents that are judged to be relevant by the user.
Information about where to submit this data will be provided to the participating teams about one month before the submission deadline. The format of the data to be submitted can be downloaded here.
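As a rough illustration only (the downloadable formatting instructions remain authoritative), the per-search record each site needs to capture might be organized along these lines; every name below is hypothetical.

```python
# Hypothetical in-memory record for one search (topic/searcher/system combination).
# The authoritative submission format is the one distributed by the organizers.

def truncate_ranked_list(ranked_docnos, cutoff=100):
    """Keep only the top 100 documents of a ranked list produced during
    interactive query translation (one list per query reformulation)."""
    return ranked_docnos[:cutoff]

search_record = {
    "topic": "C041",            # hypothetical topic identifier
    "searcher": "S3",           # hypothetical searcher identifier
    "system": "experimental",   # hypothetical system label
    "ranked_lists": [],         # one truncated list per reformulation, if any
    "judged_relevant": [],      # document identifiers the searcher marked relevant
}
```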
Evaluation Measures
The CLEF 2001 relevance assessments for the chosen document language will be used as ground truth. The primary measure of interactive query translation effectiveness will be the mean uninterpolated average precision in the top 100 documents. Because searchers may elect to iteratively refine their query, a sequence of mean uninterpolated average precision values will be used to characterize the effect of query refinement. The primary measure of a searcher's document selection effectiveness will be van Rijsbergen's F_ALPHA measure (the same measure used in iCLEF 2001):
F_ALPHA = 1 / (ALPHA/P + (1 - ALPHA)/R)
where P is precision and R is recall. Values of ALPHA below 0.5 emphasize recall, while values greater than 0.5 emphasize precision. For this evaluation, ALPHA=0.8 will be the default value, modeling the case in which missing some relevant documents would be less objectionable than paying to obtain fluent translations of many documents that later turn out not to be relevant. Relevance judgments of 2 will be treated as relevant, and all other relevance judgments will be treated as not relevant, for purposes of computing the F_ALPHA measure.
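For concreteness, here is a minimal Python sketch of both primary measures. The function names are hypothetical, the official scores will be computed by the organizers from the submitted data, and the average precision sketch assumes the usual convention of dividing by the total number of relevant documents for the topic.

```python
# Minimal sketch of the two primary measures; illustrative only.

def uninterpolated_average_precision(ranked_docnos, relevant, cutoff=100):
    """Sum the precision at the rank of each relevant document retrieved within
    the cutoff, then divide by the total number of relevant documents for the
    topic (assumed convention)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, docno in enumerate(ranked_docnos[:cutoff], start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def f_alpha(selected, relevant, alpha=0.8):
    """van Rijsbergen's F_ALPHA = 1 / (ALPHA/P + (1 - ALPHA)/R)."""
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(selected)
    recall = true_positives / len(relevant)
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)
```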
The nature of any further detailed analysis is up to each site, but sites are strongly encouraged to take advantage of the experimental design and undertake exploratory data analysis to examine the patterns of correlation and interaction among factors that can be observed in the experiment, and to explore the potential for gaining additional insight through alternative evaluation measures. Some example plots for the TREC-6 interactive data (recall or precision by searcher or topic) are available on the Interactive Track web site at http://www.itl.nist.gov/iad/894.02/projects/t10i/ under "Interactive Track History." The computation of analysis of variance (ANOVA), where appropriate, can provide useful insights into the separate contributions of searcher, topic and system as a first step in understanding why the results of one search are different from those of another.
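As one concrete possibility (not a requirement), per-search scores gathered into a table with searcher, topic, and system columns can be fed to an off-the-shelf ANOVA; the file and column names below are hypothetical.

```python
# Hedged sketch: a three-factor ANOVA over per-search F_ALPHA scores,
# assuming a CSV with hypothetical columns searcher, topic, system, f_alpha.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

results = pd.read_csv("iclef_results.csv")  # hypothetical file name

# Treat searcher, topic, and system as categorical factors, fit a linear model,
# and print a type-II ANOVA table showing each factor's contribution.
model = ols("f_alpha ~ C(searcher) + C(topic) + C(system)", data=results).fit()
print(sm.stats.anova_lm(model, typ=2))
```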
Schedule
| Date | Milestone |
| --- | --- |
| ASAP | Join the iCLEF mailing list |
| 20 Mar | Topics and documents available |
| 22 Apr | Systran translations available |
| 15 Jun | Submit relevance judgments to UNED |
| 1 Aug | Results available from UNED |
| 1 Sep | Submit notebook papers to CNR |
| 19-20 Sep | CLEF Workshop in Rome |