Guidelines

Track: The iCLEF challenge

The Cross-Language Evaluation Forum (CLEF) is an annual evaluation of Cross-Language Information Retrieval (CLIR) systems with a focus on European languages. Interest in interactive aspects of the CLIR problem is increasing, so the CLEF 2001 evaluation will include an experimental track for teams interested in the evaluation of interactive CLIR systems.

The goal of the interactive track at CLEF 2001 is to explore evaluation methods for interactive CLIR and to establish baselines against which future research progress can be measured. Participating teams will be asked to run a mutually agreed experiment protocol involving an interactive CLIR system of their own design and a relatively small number (4-8) of human subjects.

The CLEF 2001 interactive track will most likely focus on interactive selection of documents that have been automatically translated from a language that the searcher would otherwise have been unable to read. The details of the task and the evaluation design will be developed through discussions on the interactive track mailing list.

Participation in the mailing list is open to all interested parties, and an archive of postings to the mailing list can be found on the track's Web site. In order to facilitate participation in both tasks, the interactive track's result submission deadline will be one month after the main CLEF deadline.

To join the interactive track mailing list, please send an e-mail to Doug Oard and Julio Gonzalo with the name of your organization and the email address(es) that you would like added to the list.

The tentative schedule for the interactive track is:

ASAP - Join the iCLEF mailing list
6 June - Topics and documents available
Systran translations available
10 July - Submit relevance judgments to UNED
25 July - Results available from UNED
6 August - Submit notebook papers to CNR
13 August - Submit additional results to UNED
3-4 September - CLEF Workshop in Darmstadt, Germany

For further information, please visit the CLEF Web site at http://www.clef-campaign.org and select the link for the interactive track.


Data Provided by the Organizers


Data to be Submitted to the Organizers

For every search (topic/searcher/system combination), two types of data will be collected:
  1. If query translation is supported, the top 100 documents in every ranked list generated by the system during interactive query translation (only one list is required if the user does not reformulate the query and search again).
  2. If document selection is performed, the list of documents that are judged to be relevant by the user.
Information about where to submit this data will be provided to the participating teams about one month before the submission deadline. The format of the data to be submitted can be downloaded from the track's Web site.


Evaluation Measures

The CLEF-2001 relevance assessment for the chosen document language will be used as ground truth. The primary measure of interactive query translation effectiveness will be the mean uninterpolated average precision in the top 100 documents. Because searchers may elect to iteratively refine their query, a sequence of mean uninterpolated average precision values will be used to characterize the effect of query refinement. The primary measure of a searcher's document selection effectiveness will be van Rijsbergen's F_ALPHA measure (the same measure used in iCLEF 2001):

F_ALPHA = 1 / [ALPHA/P + (1-ALPHA)/R]

where P is precision and R is recall. Values of ALPHA below 0.5 emphasize recall, and values greater than 0.5 emphasize precision. For this evaluation, ALPHA=0.8 will be the default value, modeling the case in which missing some relevant documents would be less objectionable than paying to obtain fluent translations of many documents that later turn out not to be relevant. RELEVANCEJUDGMENT values of 2 will be treated as relevant, and all other RELEVANCEJUDGMENT values will be treated as not relevant for purposes of computing the F_ALPHA measure.
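
As an informal illustration only (not an official scoring script), the following Python sketch shows how the two measures described above could be computed for a single search. The function names, input structures, and example data are assumptions for this example, not part of the submission format.

    # Illustrative sketch only: uninterpolated average precision over a top-100
    # ranked list, and van Rijsbergen's F_ALPHA for a set of selected documents.
    # Input structures (lists of document IDs, a set of relevant IDs) are
    # assumptions for this example, not the official submission format.

    def average_precision(ranked_docs, relevant, cutoff=100):
        """Mean of precision values at the rank of each relevant document retrieved."""
        hits = 0
        precision_sum = 0.0
        for rank, doc_id in enumerate(ranked_docs[:cutoff], start=1):
            if doc_id in relevant:
                hits += 1
                precision_sum += hits / rank
        # Uninterpolated average precision divides by the total number of relevant
        # documents for the topic, not just those retrieved in the top 100.
        return precision_sum / len(relevant) if relevant else 0.0

    def f_alpha(selected, relevant, alpha=0.8):
        """F_ALPHA = 1 / (ALPHA/P + (1-ALPHA)/R); alpha=0.8 emphasizes precision."""
        selected = set(selected)
        true_pos = len(selected & relevant)
        if true_pos == 0:
            return 0.0
        p = true_pos / len(selected)
        r = true_pos / len(relevant)
        return 1.0 / (alpha / p + (1.0 - alpha) / r)

    # Hypothetical data for one search (topic/searcher/system combination).
    ranked = ["d3", "d7", "d1", "d9", "d2"]   # system's ranked list
    chosen = ["d3", "d9", "d5"]               # documents the searcher judged relevant
    truth = {"d3", "d9", "d4"}                # assessments with RELEVANCEJUDGMENT == 2

    print(average_precision(ranked, truth))   # 0.5
    print(f_alpha(chosen, truth, alpha=0.8))  # ~0.667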

The nature of any further detailed analysis is up to each site. Sites are strongly encouraged, however, to take advantage of the experimental design: to undertake exploratory data analysis that examines the patterns of correlation and interaction among factors observed in the experiment, and to explore the potential for gaining additional insight through alternative evaluation measures. Some example plots for the TREC-6 interactive data (recall or precision by searcher or topic) are available on the Interactive Track web site at http://www.itl.nist.gov/iad/894.02/projects/t10i/ under "Interactive Track History." Analysis of variance (ANOVA), where appropriate, can provide useful insight into the separate contributions of searcher, topic, and system as a first step toward understanding why the results of one search differ from those of another.
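
As a hedged illustration of the kind of exploratory analysis suggested above (not a required or prescribed procedure), the following Python sketch fits a three-factor ANOVA with statsmodels. The column names and the per-search score table are assumptions made for this example.

    # Illustrative sketch: a main-effects ANOVA separating the contributions of
    # searcher, topic, and system to a per-search effectiveness score.
    # The DataFrame columns used here are assumptions for this example.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    # One row per search (topic/searcher/system combination), with the measured
    # effectiveness score (e.g., F_ALPHA or average precision) in 'score'.
    results = pd.DataFrame({
        "searcher": ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"],
        "topic":    ["t1", "t2", "t1", "t2", "t1", "t2", "t1", "t2"],
        "system":   ["A",  "B",  "B",  "A",  "A",  "B",  "B",  "A"],
        "score":    [0.42, 0.55, 0.38, 0.61, 0.47, 0.52, 0.33, 0.58],
    })

    # Treat each factor as categorical and fit a main-effects model.
    model = ols("score ~ C(searcher) + C(topic) + C(system)", data=results).fit()
    print(sm.stats.anova_lm(model, typ=2))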


Schedule

ASAP - Join the iCLEF mailing list
20 Mar - Topics and documents available
22 Apr - Systran translations available
15 Jun - Submit relevance judgments to UNED
1 Aug - Results available from UNED
1 Sep - Submit notebook papers to CNR
19-20 Sep - CLEF Workshop in Rome
