Informational queries, where the user has a more or less complex need for information, are also well addressed by web search engines when some web document aggregates the relevant information in a structured and up-to-date way. For this reason, Wikipedia entries dominate the results for informational queries of a certain degree of popularity. There is, however, a substantial proportion of information needs that cannot be answered with a few web links:
- The vast majority of web queries (around 88% according to a recent estimation by Yahoo!) are unique; this is referred to as the “long tail” of search engine logs. This implies, for instance, that when the query is the name of a person, it is highly probable that this person will not have their own entry in Wikipedia.
- Sometimes the relevant information for a topic becomes available only after a manual process of reading, analyzing and synthesizing the most relevant sources. On the web there is also implicit information that can only be aggregated automatically. Some increasingly relevant examples are online popularity, authority and reputation, which require aggregating the number, relevance and polarity of mentions of, and opinions about, a person or organization. Even in Wikipedia itself there is implicit information that has to be mined by processing several entries, such as “European airports with only one landing strip”.
Our hypothesis is that such information needs call for a different paradigm of web search engine, one able to answer any informational need with the equivalent of an automatically generated (and expanded) Wikipedia entry. In particular, we focus on persons and organizations, which represent a large and relevant share of the specialized mining and search capabilities required of this new generation of web search engines.
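As an illustration of the kind of aggregation mentioned above for online reputation, the following minimal sketch combines the number, relevance and polarity of mentions of an entity into a single summary. It is not the method proposed here: the `Mention` record, the value ranges for `relevance` and `polarity`, and the relevance-weighted average are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One mention of an entity in a web document (hypothetical record)."""
    entity: str
    relevance: float  # assumed to lie in [0, 1]
    polarity: float   # assumed to lie in [-1, 1]; negative means unfavourable


def reputation_summary(mentions, entity):
    """Aggregate number, relevance and polarity of the mentions of `entity`.

    The relevance-weighted mean polarity is one possible aggregate,
    chosen here only for illustration.
    """
    selected = [m for m in mentions if m.entity == entity]
    total_relevance = sum(m.relevance for m in selected)
    if total_relevance == 0:
        weighted_polarity = 0.0  # no usable evidence: treat as neutral
    else:
        weighted_polarity = (
            sum(m.relevance * m.polarity for m in selected) / total_relevance
        )
    return {
        "mentions": len(selected),
        "total_relevance": total_relevance,
        "weighted_polarity": weighted_polarity,
    }


if __name__ == "__main__":
    # Invented sample data, for demonstration only.
    sample = [
        Mention("Acme Corp", relevance=1.0, polarity=1.0),
        Mention("Acme Corp", relevance=1.0, polarity=-0.5),
        Mention("Other Org", relevance=1.0, polarity=-1.0),
    ]
    print(reputation_summary(sample, "Acme Corp"))
```

Weighting polarity by relevance keeps marginal mentions from dominating the summary; a real system would also need to obtain the mentions and their scores, which this sketch takes as given.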