Analyzing Information Retrieval Methods to Recover Broken Web Links.
Juan Martinez-Romo, Lourdes Araujo
Proc. European Conf. on Information Retrieval (ECIR 2010)
LNCS 5993, pp. 26-37, Springer (2010)

In this work we compare different techniques for automatically finding candidate web pages to replace broken links.
We extract information from the anchor text, the content of the page containing the link, and the cached copy of the
page in a digital library. The selected information is processed and submitted to a search engine. We have compared
different information retrieval methods both for the selection of terms used to construct the queries
submitted to the search engine and for the ranking of the candidate pages that it returns, in order to help the
user find the best replacement. In particular, we have used term frequencies and a language model
approach for the selection of terms, and co-occurrence measures and a language model approach for ranking the
final results. To test the different methods, we have also defined a methodology that does not require
user judgments, which increases the objectivity of the results.
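
The pipeline described above (select terms, build a query, rank the returned candidates) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tiny stopword list, the choice of the Dice coefficient as the co-occurrence measure, and the decision to score candidates against the anchor text are all assumptions made for illustration. The search engine call itself is left out, and the language model variants are not shown.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is", "at"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_query(anchor_text, page_text, n_extra=3):
    """Return the anchor terms plus the n_extra most frequent terms of the
    page containing the link that do not already occur in the anchor."""
    anchor_terms = list(dict.fromkeys(tokenize(anchor_text)))  # dedupe, keep order
    counts = Counter(t for t in tokenize(page_text) if t not in anchor_terms)
    extra = [term for term, _ in counts.most_common(n_extra)]
    return anchor_terms + extra


def dice(a_terms, b_terms):
    """Dice coefficient between two term sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a_terms), set(b_terms)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_candidates(anchor_text, candidates):
    """Sort (url, text) candidates by Dice overlap with the anchor text,
    best match first. The candidates would come from a search engine."""
    reference = tokenize(anchor_text)
    return sorted(
        candidates,
        key=lambda cand: dice(reference, tokenize(cand[1])),
        reverse=True,
    )
```

In use, the list returned by `build_query` would be joined into a query string and sent to a search engine, and the returned pages would then be passed to `rank_candidates` as `(url, text)` pairs.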