Information Retrieval & Natural Language Processing

Introduction


 

What is this course about?

This course reviews attempts to apply Natural Language resources and techniques to Information Retrieval or, more precisely, to Text Retrieval: the problem of finding the documents in a text collection that are relevant to a user's query stated in some language. It assumes that the student has some background in Linguistic Engineering and is interested in its potential applications in the field of Text Retrieval.

The main pedagogical effort in the course is currently to provide or assemble on-line Internet resources that allow students to experiment in practice with the issues the course considers. These facilities can be software hosted on this site, including:

or external resources such as Internet Retrieval Engines, Machine Translation Systems, etc.

This practical experimentation is complemented by short introductions to every main topic, references to books and papers available on the web (which are the primary material to be studied), and self-tests (where they are appropriate to the topic).
 

What will you not find here?

This course is not a textbook. The primary references for study should be the papers and books suggested here. The text of this web course provides just enough context to set up the practical experiments carried out during the course.

Nor is this an up-to-date review of the state of the art in Text Retrieval with NLP techniques and resources. The references have been chosen for pedagogical purposes, in addition to their representativeness or relevance. In its current state, the course is also biased by the authors' working experience in the field.

Finally, this course does not address all the issues related to algorithms and data structures for efficient Information Retrieval. Such topics, while of great importance in an IR course taught from a computer-science perspective, are of secondary interest when the focus is on how language technologies may improve IR processes, and they deserve a course of their own.
 

In which ways can this site be exploited for learning purposes?

We foresee at least the following uses, which are not mutually exclusive: