Guidelines

Track: The iCLEF challenge

The high-level goal of the Interactive Track in CLEF-2001 is to investigate cross-language searching (by users who cannot read the document language) as an interactive task, examining the process as well as the outcome. To this end, an experimental framework has been designed with the following common features:

The framework will allow groups to estimate the effect of their experimental manipulation free of the main (additive) effects of participant and topic, and it will reduce the effect of interactions.

In CLEF 2001, the emphasis will be on each group's exploration of different approaches to supporting the common searcher task and understanding the reasons for the results they get. No formal coordination of hypotheses or comparison of systems across sites is planned for CLEF 2001, but groups are encouraged to seek out and exploit synergies. As a first step, groups are strongly encouraged to make the focus of their planned investigations known to other track participants as soon as possible, preferably via the track listserv at iclef@listserv.uned.es. Contact Julio Gonzalo to join.


Questions

The track will look at two types of questions:

  1. Broad questions, asking about a general subject
  2. Narrow questions, asking about a specific event

The questions will be selected from the CLEF-2000 topics that had good coverage in the English and French collections, and they will be balanced between the two types.


Data provided


Searcher task

The searcher's task will be to begin at the top of a ranked list produced by a cross-language retrieval system and examine a translation of each foreign-language document in the list to determine whether the document is relevant, somewhat relevant, or irrelevant to a topic described by a written topic description. A maximum of 20 minutes is allowed for each ranked list. Searchers may also indicate that they are unsure of their assessment for particular documents, and they may choose to leave some documents unassessed.
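The guidelines do not prescribe a logging format at this point (the submission format is covered under "Data to be collected and submitted to UNED"), but the following minimal Python sketch illustrates one way a site might record a searcher's judgments for a single ranked list. The numeric coding (2 = relevant, 1 = somewhat relevant, 0 = irrelevant) is an assumption chosen to match the evaluation rule below, under which only judgments of 2 count as relevant; the class and field names are hypothetical.

from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class DocumentJudgment:
    docid: str                       # document identifier from the ranked list
    rank: int                        # position in the ranked list (1-based)
    judgment: Optional[int] = None   # 2 = relevant, 1 = somewhat relevant, 0 = irrelevant, None = unassessed
    unsure: bool = False             # searcher flagged this assessment as unsure

@dataclass
class SearchSession:
    searcher_id: str
    system_id: str
    topic_id: str
    judgments: List[DocumentJudgment] = field(default_factory=list)
    elapsed_seconds: float = 0.0     # should stay within the 20-minute limit

    def within_time_limit(self) -> bool:
        return self.elapsed_seconds <= 20 * 60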


Instructions to be given to the searchers

The goal of this experiment is to determine how well an information retrieval system can provide you with information about foreign language documents that would allow you to reliably decide whether each document is relevant to a topic (which we define as "a written statement of a searcher's information need").

You will be asked to judge documents with respect to two topics with one system and two topics with another. For each topic, you will be shown information about 50 documents. The documents are arranged so that the documents that an automatic search system has determined are most likely to be relevant appear in the most prominent positions (for example, near the top of a ranked list). You may select any individual document for closer examination, or you may judge the relevance of a document based on the summary information that is initially displayed. More credit will be awarded for accurately assessing relevant documents than for the number of documents that are assessed, because in a real application you might need to pay for a high-quality translation of each selected document. You may mark each document as relevant, not relevant, or unsure, or you may leave it unassessed. You will have twenty minutes for each search, with one brief break in the middle of the session.

You will also be asked to complete several additional questionnaires, described in the following section.


Searcher questionnaires

The questionnaires can be downloaded here.

In the questionnaires, <ENGLISH/SPANISH> must be replaced with the native language of the searchers.


Data to be collected and submitted to UNED

Several sorts of result data will be collected for evaluation/analysis (for all questions unless otherwise specified):


Instructions about where to submit all data will be mailed to the distribution list.


Evaluation of data submitted to UNED

The CLEF-2000 relevance assessments for the document language used by the participant will be used as ground truth. The primary measure of a searcher's effectiveness will be van Rijsbergen's F_ALPHA measure: F_ALPHA = 1/[ALPHA/P + (1-ALPHA)/R], where P is precision and R is recall. Values of ALPHA greater than 0.5 emphasize precision (ALPHA weights the 1/P term), and values below 0.5 emphasize recall. For this evaluation, ALPHA=0.8 will be the default value, modeling the case in which missing some relevant documents would be less objectionable than paying to obtain fluent translations of many documents that later turn out not to be relevant. RELEVANCEJUDGMENTs of 2 will be treated as relevant and all other RELEVANCEJUDGMENTs will be treated as not relevant for purposes of computing the F_ALPHA measure. This is an exploratory track in which one of our most important goals is to develop good evaluation measures for this task. Participating teams are therefore encouraged to compute and report any other measures that they believe offer a useful degree of insight.
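As a concrete illustration of the measure, the short Python sketch below computes F_ALPHA for one search from a searcher's numeric judgments and the CLEF-2000 ground truth. It assumes that only a judgment of 2 counts as relevant (as stated above) and computes recall against the relevant documents among the documents shown to the searcher; that denominator is an assumption, since the official computation is performed by UNED.

def f_alpha(precision: float, recall: float, alpha: float = 0.8) -> float:
    """van Rijsbergen's F_ALPHA = 1 / (ALPHA/P + (1-ALPHA)/R)."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)

def score_search(judged: dict, ground_truth: set, alpha: float = 0.8) -> float:
    # judged maps docid -> numeric judgment for the documents shown to the searcher;
    # ground_truth is the set of docids judged relevant in the CLEF-2000 assessments.
    selected = {d for d, j in judged.items() if j == 2}
    relevant_shown = {d for d in judged if d in ground_truth}
    if not selected or not relevant_shown:
        return 0.0
    hits = len(selected & ground_truth)
    return f_alpha(hits / len(selected), hits / len(relevant_shown), alpha)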


Experiment schedule

The design will be a within-subject design like that used for the TREC interactive track, but with a different number of topics and a different task.

Each user will be presented with all of the topics. The presentation order for topics will be varied systematically, with 2 variations to ensure that each topic is searched in a different position, but that the same presentation order is used for each system. The topic presentation order will be

The experiment will take about three hours. Each topic will take about 25 minutes: 2 minutes beforehand to examine the topic description, 20 minutes for the search, and 3 minutes afterwards to complete the post-search survey. Searchers should not be asked to work for more than an hour without a break. An example schedule for an experimental session would be as follows:

Introductory stuff 10 minutes
Initial survey 5 minutes
Tutorials (2 systems) 30 minutes total
Break 10 minutes
Searching (system A, 2 topics) 50 minutes
Post-system survey 5 minutes
Break
Searching (system B, 2 topics) 50 minutes
Post-system survey 5 minutes
Final survey 10 minutes
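The exact presentation orders for iCLEF 2001 are distributed with the topics and are not reproduced in this section, but the following Python sketch shows the general shape of the counterbalancing: two system orders crossed with two topic orders, so that each topic is searched in more than one position and each system is seen first by half of the searchers. All identifiers (T1-T4, A, B) are placeholders, not the official topic or system labels.

# Illustrative counterbalancing only; the official design matrix is provided
# separately by the track organizers.
TOPIC_ORDERS = [["T1", "T2", "T3", "T4"],
                ["T3", "T4", "T1", "T2"]]
SYSTEM_ORDERS = [["A", "B"], ["B", "A"]]

def session_plan(searcher_index: int):
    """Rotate each searcher through one of the four system-order x topic-order
    combinations: the first two topics are searched with the first system,
    the last two with the second."""
    systems = SYSTEM_ORDERS[searcher_index % 2]
    topics = TOPIC_ORDERS[(searcher_index // 2) % 2]
    return [(systems[0], topics[:2]), (systems[1], topics[2:])]

for s in range(4):
    print("searcher", s + 1, session_plan(s))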

 


Analysis

The nature of the detailed analysis is up to each site, but sites are strongly encouraged to take advantage of the experimental design and undertake exploratory data analysis to examine the patterns of correlation, interaction, etc. involving the major factors. Some example plots for the TREC-6 interactive data (recall or precision by searcher or topic) are available on the Interactive Track web site at http://www.itl.nist.gov/iad/894.02/projects/t10i/ under "Interactive Track History." The computation of analysis of variance (ANOVA), where appropriate, can provide useful insights into the separate contributions of searcher, topic and system as a first step in understanding why the results of one search are different from those of another.
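For sites that want to run the suggested ANOVA, the sketch below shows one way to do it in Python with pandas and statsmodels. The file name and column names (searcher, system, topic, f_score) are assumptions about how a site might tabulate its own per-search results; they are not part of the track's data formats.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One row per search: which searcher, which system, which topic, and the F score.
df = pd.read_csv("results.csv")

# Fit a linear model with searcher, topic, and system as categorical factors,
# then compute a type-II ANOVA table to separate their contributions.
model = ols("f_score ~ C(system) + C(searcher) + C(topic)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))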


Schedule

ASAP Join the iCLEF mailing list
6 Jun Topics and documents available
Systran translations available
10 July Submit relevance judgments to UNED
25 July Results available from UNED
6 August Submit notebook papers to CNR
13 August Submit additional results to UNED
3-4 September CLEF Workshop in Darmstadt, Germany

 

 
