Build a system that will allow real people to find information that is written in languages that they have not mastered, and then measure how well representative users are able to use the system that you have built.
iCLEF 2005 will study the problem of Cross-Language Question Answering from a user-inclusive perspective. Depending on the perspective, the challenge is twofold: from the point of view of Q&A as a machine task (Q&A systems), interaction with the user may help a Q&A engine to retrieve better answers; from the point of view of Q&A as a user task, a search assistant may help the user in locating the answer faster, more easily and more accurately.
For Cross-Language Q&A engines, the issue is how best a Q&A system can interact with the user to elicit details about a question that facilitate the automatic cross-language search for an answer in a given document collection.
For Cross-Language search assistants, the issue is how best a system can assist a user in the task of finding and recognizing the answer to a question by searching a foreign-language document collection. Note that, in monolingual searches, users can efficiently retrieve answers using standard document or passage retrieval engines, as shown in the interactive TREC experiments. The Cross-Language version of the problem, however, seems to demand much more assistance from the computer.
Groups participating in iCLEF will share a common experiment design involving users, exploring different aspects of the above research challenges. The experiment is coordinated with the CLEF Q&A track.
We welcome research teams with interests in Cross-Language IR, Human-Computer Interaction, Question Answering, and Machine Translation. The organizers will foster synergies between interested parties; expertise in all of these fields is not required to participate. As in previous iCLEF tracks, participants will adapt a shared user study design to test a hypothesis of their choice, comparing reference and contrastive systems. Descriptions of experiments carried out at iCLEF 2004 can be downloaded from the workshop notes at the CLEF web page.
In coordination with the Cross-Language Image Retrieval task, iCLEF will also organize an interactive image retrieval task.
Go to the main CLEF page and follow the general registration procedure. Please e-mail Julio@lsi.uned.es with any enquiries regarding the task.
Register as CLEF participants (follow instructions in http://clef.isti.cnr.it/). Upon registration, every participant will receive instructions to download the appropriate document collections from the CLEF ftp site.
E-mail the track organizers (Paul Clough, Alessandro Vallin and Julio Gonzalo) indicating your wish to participate and the languages (user and document languages) that will be used in your experiment. Once registration for CLEF is confirmed, participants will receive instructions to download the iCLEF question set.
Formulate a hypothesis about the task, and design two interactive CL QA systems intended to test it. Usually, one system is taken as a reference or baseline, and the other is a proposed or contrastive approach. You can find examples of this methodology in previous iCLEF tracks: iCLEF 2004, iCLEF 2002 and iCLEF 2001. Ensure that both systems keep a log for post-experiment analysis that is as rich as possible.
These are examples of baseline CL QA systems that can be used for the iCLEF 2005 (QA) task:
Recruit subjects for the experiment: a minimum of eight, with additional subjects added in groups of eight. Make sure that the (source and target) language skills of the subjects are homogeneous. The usual setup is that the subjects are native speakers of the question language and have no (or very low) skills in the document language.
Perform the experiment (which takes approximately three hours per subject) and submit the results to iCLEF organizers, following the experiment design shown above.
An experiment consists of a number of search sessions. A search session has three parameters: which user is searching [1-8], which question is being searched [1-16], and which system is being used [reference, contrastive]. The user has a fixed amount of time to find the answer in the document collection using the system. Once the answer is found, the user writes it down (in their native language) in a questionnaire. Check the experiment design for details.
Which user/question/system combinations must be carried out? We use a within-subject design like the one used in the early years of the TREC interactive track, but with a different number of topics and a different task. Check the experiment design for details. Overall, every user searches the full set of 16 questions (half with one system, half with the other) in around three hours, including training, questionnaires and searches.
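For illustration, a counterbalanced within-subject schedule of this kind can be sketched as follows. The official experiment design fixes the exact user/question/system matrix; the rotation and alternation used here are only assumptions to show the shape of such a schedule:

```python
# Sketch of a counterbalanced within-subject assignment:
# 8 users, 16 questions, each user searches all questions,
# half with each system. (Illustrative only; the official
# iCLEF design specifies the actual matrix.)

def build_schedule(n_users=8, n_questions=16):
    """Return {user: [(question, system), ...]} with system order
    alternated and question order rotated across users."""
    systems = ["reference", "contrastive"]
    half = n_questions // 2
    schedule = {}
    for user in range(n_users):
        # Alternate which system comes first, to counterbalance order effects.
        first, second = systems if user % 2 == 0 else reversed(systems)
        # Rotate the question list so question blocks differ across users.
        questions = [(q + user) % n_questions + 1 for q in range(n_questions)]
        schedule[user + 1] = (
            [(q, first) for q in questions[:half]]
            + [(q, second) for q in questions[half:]]
        )
    return schedule

schedule = build_schedule()
```

Each user thus covers all 16 questions, 8 per system, with both the system order and the question blocks varying across users.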
No formal coordination of hypotheses or comparison of systems across sites is planned for iCLEF 2005, but groups are encouraged to seek out and exploit synergies. As a first step, groups are strongly encouraged to make the focus of their planned investigations known to other track participants as soon as possible, preferably via the track listserv at email@example.com. Contact firstname.lastname@example.org to join the list.
Submit the results to the track organizers following the submission format.
After receiving the official results, write a paper describing your experiment for the CLEF working notes and submit the paper to Julio Gonzalo and the track organizers.
Participants are encouraged to log as many details as possible about every search session. However, only minimal information (basically, the answer provided by the user for every question/system/user combination) has to be submitted to the organizers. Check the submission format for details.
The main evaluation score for a system will be accuracy, i.e., the fraction of correct answers. Accuracy will be measured for every searcher/system pair, and then averaged over searchers to obtain a single accuracy measure for each of the two systems being compared.
The assessment will be done following the CLEF QA guidelines, except for one issue:
The nature of any further detailed analysis is up to each site, but sites are strongly encouraged to take advantage of the experimental design and undertake exploratory data analysis to examine the patterns of correlation and interaction among factors that can be observed in the experiment, and to explore the potential for gaining additional insight through alternative evaluation measures. The computation of analysis of variance (ANOVA), where appropriate, can provide useful insights into the separate contributions of searcher, topic and system as a first step in understanding why the results of one search are different from those of another.
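As a toy illustration of the variance decomposition that ANOVA performs, the sketch below computes a one-way F statistic over the system factor alone, with hypothetical per-searcher accuracies. A real analysis would model searcher, topic and system jointly with a statistics package; nothing here comes from the official iCLEF tooling:

```python
# One-way ANOVA F statistic in plain Python (system factor only).

def one_way_anova_F(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    # Between-group sum of squares: group sizes times squared mean offsets.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

reference = [0.50, 0.62, 0.44, 0.56]    # hypothetical per-searcher accuracy
contrastive = [0.69, 0.75, 0.62, 0.81]
F = one_way_anova_F([reference, contrastive])
```

A large F relative to the relevant F distribution would suggest the system effect dominates the searcher-to-searcher noise; with real iCLEF data the searcher and topic factors should be included as well.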
Test Sets Release: May 23
Submission of Runs: June 22
Release of individual results: July 30
Submission of papers for working notes: August 21
CLEF workshop (Vienna): 21-23 September 2005
Document languages allowed: BG, DE, EN, ES, FI, FR, IT, NL, PT
User languages allowed: unrestricted
Fernando López Ostenero - Webmaster