ISCORPUS

How was ISCORPUS created?

ISCORPUS was created at the Department of Computer Languages and Systems, UNED, for qualitative and quantitative studies of Information Synthesis tasks. It is described in the following paper (provided with the ISCORPUS package):

Title: "An Empirical Study of Information Synthesis Task"
Authors: Enrique Amigo, Julio Gonzalo, Victor Peinado, Anselmo Peñas and Felisa Verdejo
Proceedings: 42nd Meeting of the Association for Computational Linguistics (ACL), July 2004, Barcelona

This corpus may be used for research purposes only.

WHAT DOES ISCORPUS CONTAIN?

This corpus contains the following directories:

Traces

Download Traces.

This directory contains, for each query and user, all the monitored actions performed by the user during the synthesis process, logged in files named:

/Traces/#user name#/consulta#query number#.#user name#.file

Each line contains three fields:

time||action||index of the treated sentence or document|
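As an illustration, a trace line in this format could be split as shown below. This is a minimal Python sketch, not part of the ISCORPUS package: the function name parse_trace_line is hypothetical, and the time and index fields are kept as raw strings because their exact format is not specified here.

```python
def parse_trace_line(line):
    """Split one trace line of the form time||action||index| into its fields.

    Returns a (time, action, index) tuple of strings, or None when the line
    does not have three '||'-separated fields (for example, the form answers
    stored at the end of a trace file).
    """
    text = line.strip()
    # Assumption: the single trailing '|' shown in the format closes the line.
    if text.endswith("|"):
        text = text[:-1]
    parts = text.split("||")
    if len(parts) != 3:
        return None
    time, action, index = (part.strip() for part in parts)
    return time, action, index
```

Lines that do not match the three-field layout are returned as None rather than raising an error, so a caller can skip them or hand them to a separate routine.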

The action field values mean:

VISUALIZANDO DOCUMENTO: The user opens a new document.
ANOTANDO FRAGMENTO: The user adds the sentence to the report.
BORRANDO FRAGMENTO: The user deletes the sentence from the report.
SALIENDO DEL DOCUMENTO: The user leaves the document.
CUESTIONARIO REALIZADO: The user has completed the form.
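The action values above suggest that a user's final selection can be rebuilt by replaying the add and delete actions in order. The following Python sketch shows one way to do this. It assumes that the actions have already been read as (action, index) pairs, that a deletion always refers to a sentence added earlier, and that adding the same sentence twice keeps a single copy; the function name replay_report is hypothetical.

```python
def replay_report(events):
    """Rebuild the list of selected sentence indexes from (action, index) pairs.

    Only ANOTANDO FRAGMENTO (add) and BORRANDO FRAGMENTO (delete) change the
    report; every other action is ignored.
    """
    report = []
    for action, index in events:
        if action == "ANOTANDO FRAGMENTO" and index not in report:
            report.append(index)
        elif action == "BORRANDO FRAGMENTO" and index in report:
            report.remove(index)
    return report
```

The result keeps the order in which the sentences were added, which may or may not match the order used in the report files.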

The form answers are recorded at the end of each file:

PERSONAS: Answer to the question "Who are the main people involved in the topic?"
ORGANIZACIONES: Answer to the question "What are the main organizations participating in the topic?"
FACTORES: Answer to the question "What are the key factors in the topic?"

P1: Were you familiar with the topic?
P2: Was it hard for you to write the report?
P3: Did you miss the possibility of introducing annotations or rewriting parts of the report by hand?
P4: Do you consider that you generated a good report?
P5: Are you tired?

NADA = Not at all.
POCO = A little.
ALGO = Somewhat.
MUCHO = A lot.
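For quantitative analysis, these four labels can be mapped to an ordinal scale. The Python sketch below is only an illustration: the numeric values 0 to 3 and the name scale_value are assumptions, not part of the corpus, and the layout of the answer lines in the trace files is not assumed here.

```python
# Assumed ordinal encoding of the four answer labels (not defined by the corpus).
SCALE = {"NADA": 0, "POCO": 1, "ALGO": 2, "MUCHO": 3}


def scale_value(label):
    """Return the assumed ordinal value for an answer label, or None if unknown."""
    return SCALE.get(label.strip().upper())
```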

Contents

Download Contents.

This directory contains, for each query, the indexes and contents of the sentence sets explored by the users. These data are stored in files:

/contents/originalTexts/consulta#query number#.content.text

The lemmatized contents are in

/contents/lemmatizedTexts/consulta#query number#.content.text

The lemmatized contents without stopwords are in

/contents/cleanTexts/consulta#query number#.content.text
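The three naming patterns above differ only in the subdirectory, so the paths for one query can be built together. The Python sketch below assumes that the query number appears in the file name without zero padding and that the package is unpacked under a root directory chosen by the user; content_paths and the root argument are hypothetical names.

```python
from pathlib import PurePosixPath

# Subdirectories named in this description: original, lemmatized, and
# lemmatized without stopwords.
CONTENT_VARIANTS = ("originalTexts", "lemmatizedTexts", "cleanTexts")


def content_paths(root, query):
    """Return a dict mapping each content variant to its file path for a query."""
    name = f"consulta{query}.content.text"
    return {
        variant: PurePosixPath(root) / "contents" / variant / name
        for variant in CONTENT_VARIANTS
    }
```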

Reports

Download Reports.

The /reports/ directory contains, for each query and user, the indexes of the sentences selected by the user during the synthesis process, in files:

/reports/consulta#query number#.idFrags.#user name#.file
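When iterating over the /reports/ directory, the query number and user name can be recovered from each file name. The Python sketch below assumes that ".idFrags." and the ".file" ending appear literally in the name, as in the pattern above; parse_report_name is a hypothetical helper and the sample names in the test are made up.

```python
import re

# Pattern derived from: consulta#query number#.idFrags.#user name#.file
REPORT_NAME = re.compile(r"^consulta(?P<query>.+?)\.idFrags\.(?P<user>.+)\.file$")


def parse_report_name(filename):
    """Return (query, user) as strings, or None if the name does not match."""
    match = REPORT_NAME.match(filename)
    if match is None:
        return None
    return match.group("query"), match.group("user")
```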

Query texts

Download Query Texts.

The queryText file contains the texts shown to the users to guide their task.