Information Retrieval & Natural Language Processing
What is this course about?
This course reviews attempts to apply Natural Language resources and techniques to Information Retrieval or, more precisely, to Text Retrieval: the problem of finding the relevant documents in a text collection given a user's query stated in some language. It assumes that the student has some background in Linguistic Engineering and is interested in potential applications in the field of Text Retrieval.
The main pedagogical effort in the course is currently to provide or assemble on-line Internet resources that permit practical experimentation with the issues considered in the course. Such facilities can be on-site software, including:
- Morphological analyzers
- POS Taggers for different languages
- Multilingual lexical databases
- Cross-Language mapping of queries
or external resources such as Internet Retrieval Engines, Machine Translation Systems, etc.
Such practical experimentation is complemented by short introductions to every main topic, references to books and papers available on the web (which are to be used as the primary study material), and self-tests (to the extent they are appropriate for this topic).
What will you not find here?
This course is not a textbook. The primary references for study should be the papers and books suggested here. The text contained in this web course is just enough to assemble the practical experiments to be carried out during the course.
Nor is this an up-to-date review of the state of the art in Text Retrieval with NLP techniques and resources. The references have been chosen for pedagogical purposes, in addition to their representativeness or relevance. In its current state, the course is also biased by the authors' working experience in the field.
Finally, this course does not address all issues related to the algorithms and data structures for efficient Information Retrieval. Such topics, while of great importance in an IR course from a computer-science perspective, are of secondary interest when the focus is on how language technologies may improve IR processes, and deserve a course of their own.
In which ways can this site be exploited for learning purposes?
We foresee at least the following non-exclusive uses:
- As a full course, to be followed in self-study modality.
If this is your case, you should navigate the site following the recommendations provided in the study guide.
- As a source of material for practical work: examples and exercises.
This could be the case for a professor or student following a regular course. Our site provides buttons that lead directly to this kind of material. Each list is organized by topic. A pedagogical description of each exercise is included to help you make an appropriate selection. Many of the exercises propose small-scale but authentic tasks to give students direct insight into the real problems they will face in professional practice.
- As a repository of data and tools to perform NLP & IR tasks.
- As a virtual workspace to carry out small NLP & IR projects in collaboration.