Wiki10+ Dataset

Wiki10+ is a dataset created during April 2009 with data retrieved from the social bookmarking site Delicious and Wikipedia. It is available for research purposes.

Statistics

This dataset consists of 20,764 unique URLs, each with its corresponding social tags. All of them are English Wikipedia articles with at least 10 annotations on Delicious. The dataset therefore includes both the tag information and the text content of each of these Wikipedia articles.

To learn more about the dataset generation process, please read the paper referenced at the end of this page.

Metadata Format

All the metadata for the dataset documents is provided in XML format, following this pattern:

<articles>
  ...
  <article>
    <hash>MD5 hash for document's URL</hash>
    <title>The title of the article</title>
    <users>Number of users annotating it</users>
    <tags>
      ...
      <tag>
        <name>Tag name</name>
        <count># of users who annotated the tag</count>
      </tag>
      ...
    </tags>
  </article>
  ...
</articles>
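
The pattern above can be read with a standard XML parser. The following is a minimal sketch in Python, assuming the metadata follows the structure shown exactly. The sample values, the `parse_articles` and `url_hash` helper names, and the assumption that the hash is the hexadecimal MD5 digest of the UTF-8 encoded URL are illustrative and not taken from the dataset documentation.

```python
import hashlib
import xml.etree.ElementTree as ET

# Tiny stand-in for the real metadata file; all values are invented.
SAMPLE = """<articles>
  <article>
    <hash>0123456789abcdef0123456789abcdef</hash>
    <title>Example article</title>
    <users>12</users>
    <tags>
      <tag><name>wiki</name><count>7</count></tag>
      <tag><name>reference</name><count>3</count></tag>
    </tags>
  </article>
</articles>"""


def parse_articles(xml_text):
    """Return one dict per <article> element found under <articles>."""
    root = ET.fromstring(xml_text)
    articles = []
    for node in root.findall("article"):
        # Map each tag name to the number of users who applied it.
        tags = {
            tag.findtext("name"): int(tag.findtext("count"))
            for tag in node.findall("tags/tag")
        }
        articles.append({
            "hash": node.findtext("hash"),
            "title": node.findtext("title"),
            "users": int(node.findtext("users")),
            "tags": tags,
        })
    return articles


def url_hash(url):
    """Hex MD5 digest of a URL (assumed encoding: UTF-8, no normalization)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for article in parse_articles(SAMPLE):
        print(article["title"], article["users"], article["tags"])
```

Whether `url_hash` reproduces the values in the `<hash>` field depends on the exact form of the URL that was hashed when the dataset was built, which is not stated here, so the result should be checked against a known entry before relying on it.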

Legal Information

By downloading and using this dataset you acknowledge that:

  • The data has been compiled exclusively for scientific research purposes.
  • The copyright holders retain ownership and reserve all rights.

Reference

Please consider citing the following paper if you use this dataset in your research:

Arkaitz Zubiaga. "Enhancing Navigation on Wikipedia with Social Tags". Wikimania 2009. Buenos Aires, Argentina. 2009.

Download