The high-level goal of the Interactive Track in CLEF-2001 is investigation of cross-language searching (by users who cannot read the document language) as an interactive task, examining the process as well as the outcome. To this end, an experimental framework has been designed with the following common features:
In CLEF 2001, the emphasis will be on each group's exploration of different approaches to supporting the common searcher task and understanding the reasons for the results they get. No formal coordination of hypotheses or comparison of systems across sites is planned for CLEF 2001, but groups are encouraged to seek out and exploit synergies. As a first step, groups are strongly encouraged to make the focus of their planned investigations known to other track participants as soon as possible, preferably via the track listserv at email@example.com. Contact Julio Gonzalo to join.
The track will look at two types of questions:
The questions will be selected from the CLEF-2000 topics that had good coverage in the English and French collections, and will be balanced between the two types.
You will be asked to judge documents with respect to 2 topics with one system and 2 topics with another. For each topic, you will be shown information about 50 documents. The documents are arranged so that those that an automatic search system has determined are most likely to be relevant will be in the most prominent positions (for example, they may be placed near the top of a ranked list). You may select any individual document for closer examination, or you may judge the relevance of a document based on the summary information that is initially displayed. More credit will be awarded for accurately assessing relevant documents than for the number of documents that are assessed, because in a real application you might need to pay for a high-quality translation prepared for each selected document. You may mark each document as relevant, not relevant, or unsure, or you may leave it unassessed. You will have twenty minutes for each search, with one brief break in the middle of the session.
You will also be asked to complete several additional questionnaires:
The questionnaires can be downloaded here.
In the questionnaires, the placeholder <ENGLISH/SPANISH> must be replaced with the native language of the searchers.
Several sorts of result data will be collected for evaluation/analysis (for all questions unless otherwise specified):
Instructions about where to submit all data will be mailed to the distribution list.
The CLEF-2000 relevance assessments for the document language used by the participant will be used as ground truth. The primary measure of a searcher's effectiveness will be van Rijsbergen's F_ALPHA measure: F_ALPHA = 1/[ALPHA/P + (1-ALPHA)/R], where P is precision and R is recall. Because ALPHA weights the precision term, values of ALPHA above 0.5 emphasize precision and values below 0.5 emphasize recall. For this evaluation, ALPHA=0.8 will be the default value, modeling the case in which missing some relevant documents would be less objectionable than paying to obtain fluent translations of many documents that later turn out not to be relevant. RELEVANCEJUDGMENTs of 2 will be treated as relevant and all other RELEVANCEJUDGMENTs will be treated as not relevant for purposes of computing the F_ALPHA measure. This is an exploratory track in which one of our most important goals is to develop good evaluation measures for this task. Participating teams are therefore encouraged to compute and report any other measures that they believe offer a useful degree of insight.
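The computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the track's official scoring script: the function names, the dictionary layout, and the document identifiers are hypothetical, and the convention of returning 0 when precision or recall is 0 is an assumption that the guidelines do not specify. The only detail taken from the text is that a RELEVANCEJUDGMENT of 2 counts as relevant and every other value counts as not relevant. ALPHA is passed explicitly rather than defaulted.

```python
def precision_recall(judgments, relevant_ids):
    """Compute (precision, recall) for one searcher on one topic.

    judgments    -- dict mapping document id -> RELEVANCEJUDGMENT code
                    (hypothetical layout; only the code 2 means "relevant")
    relevant_ids -- set of document ids that are relevant in the ground truth
    """
    # Documents the searcher marked as relevant (code 2); all other codes
    # and all unassessed documents are treated as not relevant.
    selected = {doc for doc, code in judgments.items() if code == 2}
    hits = len(selected & relevant_ids)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def f_alpha(precision, recall, alpha):
    """F_ALPHA = 1 / [ALPHA/P + (1-ALPHA)/R].

    Returns 0.0 when P or R is 0 (an assumed convention that avoids
    division by zero).
    """
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Made-up example: the searcher marks d1 and d2 as relevant (code 2).
# The codes 1 and 0 are placeholders for "any value other than 2".
judgments = {"d1": 2, "d2": 2, "d3": 1, "d4": 0}
ground_truth = {"d1", "d3", "d5", "d6"}

p, r = precision_recall(judgments, ground_truth)
score = f_alpha(p, r, alpha=0.5)  # alpha=0.5 is the balanced F measure
```

In this made-up example one of the two selected documents is relevant, so P = 0.5, and one of the four relevant documents was found, so R = 0.25.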
The design will be a within-subject design like that used for the TREC interactive track, but with a different number of topics and a different task.
Each user will be presented with all of the topics. The presentation order for topics will be varied systematically, with 2 variations to ensure that each topic is searched in a different position, but that the same presentation order is used for each system. The topic presentation order will be
The experiment will take about three hours. Each topic will take about 25 minutes: 2 minutes beforehand to examine the topic description, 20 minutes for the search, and 3 minutes afterwards to complete the post-search survey. Searchers should not be asked to work for more than an hour without a break. An example schedule for an experimental session would be as follows:
|Introductory stuff||10 minutes|
|Initial survey||5 minutes|
|Tutorials (2 systems)||30 minutes total|
|Searching (system A, 2 topics)||50 minutes|
|Post-system survey||5 minutes|
|Searching (system B, 2 topics)||50 minutes|
|Post-system survey||5 minutes|
|Final survey||10 minutes|
The nature of the detailed analysis is up to each site, but sites are strongly encouraged to take advantage of the experimental design and undertake exploratory data analysis to examine the patterns of correlation, interaction, etc. involving the major factors. Some example plots for the TREC-6 interactive data (recall or precision by searcher or topic) are available on the Interactive Track web site at http://www.itl.nist.gov/iad/894.02/projects/t10i/ under "Interactive Track History." The computation of analysis of variance (ANOVA), where appropriate, can provide useful insights into the separate contributions of searcher, topic and system as a first step in understanding why the results of one search are different from those of another.
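As a starting point for the kind of analysis mentioned above, the following is a minimal sketch of a one-way ANOVA F statistic in plain Python. It is a deliberate simplification: it looks at a single factor (for example, system) and ignores the within-subject structure of the design, so searcher and topic effects end up in the error term. A fuller analysis would model searcher, topic and system together. The function name and the input layout are hypothetical, and the example scores are made up purely to show the call.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic.

    groups -- dict mapping a factor level (e.g. a system name) to the
              list of scores observed at that level (hypothetical layout)
    """
    values = [x for scores in groups.values() for x in scores]
    k = len(groups)   # number of factor levels
    n = len(values)   # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least 2 levels and more observations than levels")

    grand_mean = mean(values)
    # Variation of the level means around the grand mean.
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Variation of individual scores around their own level mean.
    ss_within = sum(
        (x - mean(scores)) ** 2
        for scores in groups.values()
        for x in scores
    )
    if ss_within == 0:
        raise ValueError("no within-level variation; F is undefined")

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Made-up scores for two hypothetical systems, for illustration only.
example = {"system A": [0.10, 0.20, 0.30], "system B": [0.20, 0.30, 0.40]}
f_stat = one_way_anova_f(example)
```

The resulting statistic would then be compared against an F distribution with (k-1, n-k) degrees of freedom to obtain a p-value; that step is left out here because it requires a statistics library.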
|ASAP||Join the iCLEF mailing list|
|6 June||Topics and documents available; Systran translations available|
|10 July||Submit relevance judgments to UNED|
|25 July||Results available from UNED|
|6 August||Submit notebook papers to CNR|
|13 August||Submit additional results to UNED|
|3-4 September||CLEF Workshop in Darmstadt, Germany|
Fernando López Ostenero - Webmaster