Guidelines
Track: The iCLEF challenge
The goal of iCLEF is to study the interactive aspects of Cross-Language Information Retrieval (CLIR) systems. Standard CLEF, NTCIR and Cross-Language TREC tasks evaluate the ability of systems to automatically retrieve target-language documents from source-language queries; iCLEF evaluates how well systems help users locate and identify relevant foreign-language documents.
Essentially, interactive CLIR systems can help users to:
- Formulate and/or translate the query.
- Refine the formulation and/or translation of the query depending on the outcome of the system.
- Identify foreign-language documents as relevant (cross-language document selection).
Research teams participating in iCLEF are expected to study some of these issues by comparing two systems in a search task involving a number of topics (provided by iCLEF) and a number of searchers (recruited locally by the participating team). The two systems should differ in the facilities provided for any of the tasks listed above. The iCLEF experiment design allows groups to estimate the effect of system differences by suppressing the (additive) effects of searcher and topic, and by somewhat reducing the effects of interactions between these factors.
There are two possibilities for the search task that the compared systems support:
- If the focus is on query formulation, translation and refinement, the search task consists of finding as many relevant documents as possible for every topic, system and searcher combination prescribed by the iCLEF experiment design.
- If the focus is on cross-language document selection, the (partial) search task may consist of scanning a fixed ranked list of foreign-language documents (returned by some CLIR system) and selecting the relevant ones. Again, this task will be performed for every topic/system/searcher combination prescribed in the shared experiment design.
Participating teams should focus on one of the following user groups (if both groups are studied, separate experiments should be run for each):
- searchers with passive language abilities in the foreign language (i.e. who can at least roughly understand documents in that language, but cannot formulate accurate queries in it without assistance). For example, a native speaker of Italian searching Spanish documents might be a member of this user group.
- searchers with no useful language abilities in the foreign language. For example, a monolingual Spanish speaker searching German documents might be a member of this user group.
In 2003, iCLEF has eight broad questions, each of which asks about a topic that includes multiple aspects. The questions were selected from the CLEF 2002 topics that match a reasonable number of documents in all CLEF languages.
How to participate
Research groups interested in joining the iCLEF 2003 task should follow these steps:
1. Register as CLEF participants (follow the instructions at http://www.clef-campaign.org). Upon registration, every participant will receive instructions for downloading the appropriate document collections from the CLEF ftp site.
2. E-mail the track organizers (Doug Oard and Julio Gonzalo) indicating their wish to participate and the languages (user and document languages) that will be used in the intended study. Once registration for CLEF is confirmed, participants will receive instructions to download iCLEF-specific data.
No formal coordination of hypotheses or comparison of systems across sites is planned for iCLEF 2003, but groups are encouraged to seek out and exploit synergies. As a first step, groups are strongly encouraged to make the focus of their planned investigations known to other track participants as soon as possible, preferably via the track listserv at iclef@listserv.uned.es. Contact Julio Gonzalo (julio@lsi.uned.es) to join the list.
The CLEF 2002 working notes contain descriptions of the experiments carried out in iCLEF 2002, which may be used as guidance for iCLEF 2003 experiments.
Data Provided by the Organizers
- Document collection: Participants may choose between any of the CLEF 2002 document languages. In each case, the documents are newswire and/or newspaper articles from major news services that were generated during 1994. Each participant will select one of these collections that meets the needs of their chosen user group.
- Translated documents: The Systran machine translation system will be used at the University of Maryland to translate Spanish documents into English and English documents into Spanish. These translations will be made available to participants through the CLEF organizers for use in their baseline system if desired. Use of these translations is not required. Translated documents will be available here.
- Topics: Written topic descriptions will be made available in German, English, Spanish, Finnish, French, Italian, Japanese, Dutch, Portuguese, Russian, Swedish and Chinese. Participating teams may use the full topic description (title, description, and narrative fields) or any subset of that topic description as a basis for interactive query translation and as a basis for interactive document selection. Each participating team will use the topics in the native language of the searchers. Topics will be made available here.
- Ranked lists: For teams that do not wish to investigate interactive query translation, a standard ranked list of documents for each topic will be provided. The ranked list will be generated using a CLIR system with no user interaction. In order to maximize the potential for cross-site comparison, use of these standard ranked lists is required if interactive query translation is not performed. Fixed ranked lists will be made available here.
- Experiment design: The experiment will use a within-subject design like that used for the TREC interactive track, but with a different number of topics and a different task. The design, with detailed instructions, is available here.
- Searcher instructions: A standard example of the instructions to be given to the searchers is provided to help standardize the conditions under which the experiments are run. These instructions should be modified as necessary by participating teams to account for system design and local conditions, but care should be taken to minimize substantive changes where possible. Searcher instructions will be made available here.
- Questionnaires: Questionnaires should be completed by searchers at the start of their session, after each topic, when switching systems, and at the end of their session. An additional form will be provided on which observers can note their observations. Participating teams are encouraged (but not required) to provide one observer per searcher if sufficient resources are available, in order to maximize the value of the observational notes. No standard questionnaire is provided by the iCLEF organization, but some samples are available here.
- Result format: A standard format is provided for submitting the data collected during the experiment to the organizers. The submitted data will be used as a basis for computing standard measures of effectiveness, and will be made available to any participating team upon request to facilitate more detailed cross-site comparisons. The formatting instructions are available here.
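To illustrate the within-subject design mentioned above, the following sketch shows how system order and topic blocks might be counterbalanced across searchers. This is purely illustrative (the function and variable names are invented here, and the official design distributed with the track materials should be used in practice), but it captures the idea that every searcher uses both systems, with order and system/topic pairings balanced across searchers:

```python
def counterbalanced_plan(searchers, systems=("A", "B"), topics=range(1, 9)):
    """Illustrative within-subject assignment: each searcher uses both
    systems; half start with system A, half with system B.  The topic
    set is split into two fixed halves, so across searchers each system
    is paired with each topic half equally often."""
    topics = list(topics)
    half = len(topics) // 2
    first, second = topics[:half], topics[half:]
    plan = {}
    for i, searcher in enumerate(searchers):
        if i % 2 == 0:
            # even-numbered searchers: system A first
            plan[searcher] = [(systems[0], first), (systems[1], second)]
        else:
            # odd-numbered searchers: system B first
            plan[searcher] = [(systems[1], first), (systems[0], second)]
    return plan

# Example with two searchers and the eight iCLEF topics:
plan = counterbalanced_plan(["searcher1", "searcher2"])
# searcher1: A on topics 1-4, then B on topics 5-8
# searcher2: B on topics 1-4, then A on topics 5-8
```

A design like this suppresses the additive effects of searcher and topic when comparing the two systems, since every system is used by every searcher and paired with every topic block equally often.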
Data to be Submitted to the Organizers
For every search (topic/searcher/system combination), two types of data will be collected:
- [REQUIRED] The list of documents that the user identified as relevant.
- [OPTIONAL] If query translation is supported, the top 100 documents in every ranked list generated by the system during interactive query translation (only one list is required if the user does not reformulate the query and search again).
Information about where to submit this data will be provided to the participating teams about one month before the submission deadline. The format of the data to be submitted can be downloaded here.
Evaluation Measures
The CLEF 2002 relevance assessments for the chosen document language will be used as ground truth:
- The primary measure of a searcher's document selection effectiveness will be van Rijsbergen's F_ALPHA measure (the same measure used in iCLEF 2001 and 2002):
F_ALPHA = 1 / (ALPHA/P + (1-ALPHA)/R)
where P is precision and R is recall, measured over the set of documents retrieved and found to be relevant by the user, given an interactive CLIR system and a search topic.
Values of ALPHA below 0.5 emphasize recall, values greater than 0.5 emphasize precision. ALPHA=0.8 will be the default value, modeling the case in which missing some relevant documents would be less objectionable than paying to obtain fluent translations of many documents that later turn out not to be relevant.
- The primary measure of interactive query translation effectiveness will be the mean uninterpolated average precision in the top 100 documents. Because searchers may elect to iteratively refine their query, a sequence of mean uninterpolated average precision values will be used to characterize the effect of query refinement.
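The two measures above can be sketched in Python (the helper names are ours, not official iCLEF scoring code; the official result format and scoring follow the track materials):

```python
def f_alpha(selected, relevant, alpha=0.8):
    """Van Rijsbergen's F_ALPHA over the set of documents the searcher
    selected as relevant, against the ground-truth relevant set.
    ALPHA > 0.5 emphasizes precision; ALPHA < 0.5 emphasizes recall."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    if hits == 0:
        return 0.0
    p = hits / len(selected)   # precision of the user's selections
    r = hits / len(relevant)   # recall of the user's selections
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

def uninterpolated_ap(ranked, relevant, cutoff=100):
    """Uninterpolated average precision over the top `cutoff` documents
    of a ranked list: average of precision at each relevant document's
    rank, normalized by the total number of relevant documents."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked[:cutoff], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

# Example: user selects 4 documents, 2 of which are relevant (P = R = 0.5)
score = f_alpha(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5", "d6"])
# With ALPHA = 0.8: 1 / (0.8/0.5 + 0.2/0.5) = 0.5
```

With the default ALPHA = 0.8, a missed relevant document hurts the score less than a selected non-relevant one, matching the scenario described above in which obtaining fluent translations of non-relevant documents is the more costly error.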
The nature of any further detailed analysis is up to each site, but sites are strongly encouraged to take advantage of the experimental design and undertake exploratory data analysis to examine the patterns of correlation and interaction among factors that can be observed in the experiment, and to explore the potential for gaining additional insight through alternative evaluation measures. The computation of analysis of variance (ANOVA), where appropriate, can provide useful insights into the separate contributions of searcher, topic and system as a first step in understanding why the results of one search are different from those of another.
Schedule
30 March | Topics and documents available
15 May | Deadline for submission of data
7 July | Results submitted to participants
20 July | Submit notebook papers
21-22 August | CLEF Workshop in Trondheim