This dataset consists of 20,764 unique URLs, each with its corresponding social tags. All of the URLs are English Wikipedia articles with at least 10 annotations on Delicious. The dataset therefore provides both the tag information and the text content of each of these Wikipedia articles.
If you want to know more about the dataset generation process, please read the paper referenced at the end of this page.
All the metadata for the dataset documents is provided in XML format, following this pattern:
<hash>MD5 hash for document's URL</hash>
<title>The title of the article</title>
<users>Number of users annotating it</users>
<count># of users who annotated the tag</count>
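As a rough illustration of how a metadata record might be read, the following Python sketch parses the fields listed above with the standard library. The `<document>` wrapper element, the sample URL, the sample values, and the function names `url_md5` and `parse_record` are assumptions made for this example and are not part of the dataset description; the sketch also assumes the hash is the hexadecimal MD5 digest of the UTF-8 encoded URL. Adjust element paths to match the actual files.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical URL, used only to build the sample record below.
SAMPLE_URL = "http://en.wikipedia.org/wiki/Example"


def url_md5(url: str) -> str:
    """Return the hex MD5 digest of a URL (assumed to match the <hash> field)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# Hypothetical record: the <document> wrapper, the placement of <count>,
# and all values are assumptions. Only <hash>, <title>, <users>, and
# <count> come from the pattern in the dataset description.
SAMPLE = f"""
<document>
  <hash>{url_md5(SAMPLE_URL)}</hash>
  <title>Example article</title>
  <users>12</users>
  <count>7</count>
</document>
"""


def parse_record(xml_text: str) -> dict:
    """Extract the documented fields from one metadata record."""
    root = ET.fromstring(xml_text)
    return {
        "hash": root.findtext("hash"),
        "title": root.findtext("title"),
        "users": int(root.findtext("users")),
        # iter() finds <count> at any depth, since the nesting is not known here.
        "counts": [int(c.text) for c in root.iter("count")],
    }


if __name__ == "__main__":
    record = parse_record(SAMPLE)
    print(record["title"], record["users"], record["counts"])
    # Check whether a given URL corresponds to this record.
    print(record["hash"] == url_md5(SAMPLE_URL))
```

Looking up a document by URL then amounts to hashing the URL and comparing the result with the `<hash>` field, provided the assumption about how the hash was computed holds for the real data.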
By downloading and using this dataset you acknowledge that:
- The data has been compiled exclusively for scientific research purposes.
- The copyright holders retain ownership and reserve all rights.
Please consider citing the following paper if you use this dataset in your research:
Arkaitz Zubiaga. "Enhancing Navigation on Wikipedia with Social Tags". Wikimania 2009. Buenos Aires, Argentina. 2009.