Updating Broken Web Links: an Automatic Recommendation System.
Juan Martinez-Romo, Lourdes Araujo.
Information Processing & Management, 48(2), 183-203 (2012).

Broken hypertext links are a frequent problem on the Web. Sometimes the page a link points
to has disappeared forever, but in many other cases it has simply been moved to another
location on the same web site or to a different one. In some cases, besides being moved, the
page has also been updated, so that it differs somewhat from the original while remaining
quite similar. In all these cases it can be very useful to have a tool that suggests pages
closely related to the broken link, from which the most appropriate one can be selected. The
relationship between a broken link and its candidate replacement pages can be defined as a
function of many factors. In this work we have used several resources, both in the context of
the link and on the Web, to look for pages related to a broken link. From the context of the
link, we have analyzed several sources of information, such as the anchor text, the text
surrounding the anchor, the URL and the page containing the link. We have also extracted
information about the link from the Web infrastructure, such as search engines, Internet
archives and social tagging systems. We have combined all of these resources to design a
system that recommends pages that can be used to recover the broken link. A novel
methodology is presented to evaluate the system without resorting to user judgments, which
makes the results more objective and helps to adjust the parameters of the algorithm. We have
also compiled a collection of web pages containing real broken links, which human evaluators
used to test the full system.
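To make the context-based sources of information more concrete, the following Python sketch shows one simple way to turn a broken link's anchor text, surrounding text and URL into a search-engine query. It is not the authors' query-extraction procedure. The functions `tokenize`, `url_terms` and `build_query`, the tiny stopword list and the frequency-based term selection are all illustrative assumptions.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "an", "of", "and", "in", "on", "to", "for", "is", "at", "by",
             "with", "our", "html", "htm", "php", "www", "index"}


def tokenize(text):
    """Lowercase alphabetic tokens, dropping stopwords and one-letter tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 1]


def url_terms(url):
    """Split the path of a (broken) URL into word-like tokens."""
    return tokenize(urlparse(url).path)


def build_query(anchor_text, url, surrounding_text, extra_terms=3):
    """Hypothetical query builder: keep every anchor term, then append the
    `extra_terms` most frequent terms from the surrounding text and URL path."""
    query = []
    for term in tokenize(anchor_text):
        if term not in query:
            query.append(term)
    counts = Counter(tokenize(surrounding_text) + url_terms(url))
    added = 0
    # most_common() orders ties by first occurrence, so the result is deterministic.
    for term, _ in counts.most_common():
        if added == extra_terms:
            break
        if term not in query:
            query.append(term)
            added += 1
    return " ".join(query)


print(build_query(
    "Broken link recovery",
    "http://example.org/research/link-recovery.html",
    "Research papers on link recovery and search engines; research on recovery.",
))
# prints: broken link recovery research papers search
```

Anchor terms are kept unconditionally here because the anchor is usually the most direct description of the target page. In this sketch, the surrounding text and URL tokens only add a few of their most frequent terms.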
Results show that the system is able to recommend the correct page among the first ten
results when the page has been moved, and to recommend highly related pages when the
original one has disappeared.
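As a rough illustration of the recommendation step and of the "correct page among the first ten results" criterion, the sketch below ranks candidate pages by TF-IDF cosine similarity to the broken link's context and checks whether a known target appears in the top k. The paper combines several sources of information, so plain TF-IDF cosine similarity is only a stand-in. The functions `rank_candidates` and `hit_at_k` and the example URLs are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphabetic tokens (no stopword filtering, for brevity)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """One sparse {term: tf * idf} vector per document, with idf = log(N / df)."""
    counts = [Counter(tokens(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(context, candidates):
    """Return candidate URLs sorted by similarity to the link context.
    `candidates` maps each candidate URL to the text of its page."""
    urls = list(candidates)
    vecs = tfidf_vectors([context] + [candidates[u] for u in urls])
    scores = {u: cosine(vecs[0], vecs[i + 1]) for i, u in enumerate(urls)}
    return sorted(urls, key=lambda u: scores[u], reverse=True)


def hit_at_k(ranking, target, k=10):
    """True if the known correct page appears among the first k recommendations."""
    return target in ranking[:k]


ranking = rank_candidates("broken link recovery search engines", {
    "http://example.org/pasta.html": "cooking recipes for pasta",
    "http://example.org/search.html": "search engines",
    "http://example.org/new/link-recovery.html": "link recovery and search engines",
})
print(ranking[0])  # prints http://example.org/new/link-recovery.html
print(hit_at_k(ranking, "http://example.org/new/link-recovery.html", k=1))  # prints True
```

With a set of links whose correct target is known, averaging `hit_at_k(..., k=10)` gives a success rate at ten results.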