Expected Results

The Web has changed the way in which researchers access scientific information, conduct research, communicate their findings and share data. There is now a need to assess the impact of Web publication in order to promote wider and better use of this new medium.

Quantitative approaches have been very popular and relatively successful in scientific evaluation. Although the ranking of researchers, institutions, journals or countries according to output indicators has been heavily criticized, policy makers, academic authorities and funding bodies increasingly use these indicators in decision-making.

Collecting and processing information from websites to generate indicators must scale to cope with the continuing growth of the information space. In principle, Human Language Technologies (HLT) are mature enough to be applied successfully to a well-defined task and context. This case is an opportunity both to test the potential of HLT and to develop an approach that is feasible in terms of the resources and techniques needed, the processing time, and the quality of the automatic processing outcomes.

The main scientific contribution expected from the project is the creation of models and methods of web mining for cybermetric purposes. The results obtained in the project will lead to several scientific publications in international journals concerning the role of mediators and of geographic and thematic factors, as well as data about the Spanish humanities community. Non-academic papers will also be produced. A PhD dissertation will be written on Web scholarly communication in Spanish. All the results will be presented at national and international meetings.

The main expected technological contribution is the development of classification and information extraction (IE) techniques tailored to a web-scale task. These techniques would support interactive systems in multilingual environments that help the user to solve a specific problem. They will be implemented as open source code, and the prototypes will be freely distributed for research purposes.