DeliciousT140 Dataset

DeliciousT140 is a dataset created during June 2008 with data retrieved from the social bookmarking site Delicious and the Web. It is available for research purposes.

Statistics

This dataset is made up of 144,574 unique URLs, each with its corresponding social tags retrieved from Delicious in June 2008. This set of documents is annotated with 67,104 different tags.

To learn more about the dataset generation process, please read the paper referenced at the end of this page.

Metadata Format

All the metadata for the dataset documents is provided in XML format, following this pattern:

<documents>
  ...
  <document>
    <url>Document's URL</url>
    <hash>MD5 hash for document's URL</hash>
    <filetype>File extension: html, pdf, xml or swf</filetype>
    <filename>Filename of the document in the dataset</filename>
    <users>Number of users who bookmarked it</users>
    <tags>
      ...
      <tag>
        <name>Tag name</name>
        <count>Number of users who annotated the document with this tag</count>
      </tag>
      ...
    </tags>
  </document>
  ...
</documents>
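
As an illustration of how this metadata might be read, here is a minimal Python sketch that streams over the <document> elements and turns each one into a dictionary. The element names come from the pattern above; the function name iter_documents, the sample record, and the assumption that <users> and <count> always hold plain integers are ours and are not part of the dataset documentation.

```python
import io
import xml.etree.ElementTree as ET


def iter_documents(source):
    """Yield one dict per <document> element (hypothetical helper).

    Streaming with iterparse avoids loading the whole metadata file
    into memory at once.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "document":
            continue
        yield {
            "url": elem.findtext("url"),
            "hash": elem.findtext("hash"),
            "filetype": elem.findtext("filetype"),
            "filename": elem.findtext("filename"),
            # Assumption: the count fields contain plain integers.
            "users": int(elem.findtext("users")),
            "tags": {
                tag.findtext("name"): int(tag.findtext("count"))
                for tag in elem.iterfind("tags/tag")
            },
        }
        elem.clear()  # drop the children of the processed <document>


# Made-up record that follows the pattern above; not real dataset content.
SAMPLE = b"""<documents>
  <document>
    <url>http://example.org/</url>
    <hash>0123456789abcdef0123456789abcdef</hash>
    <filetype>html</filetype>
    <filename>0123456789abcdef0123456789abcdef.html</filename>
    <users>3</users>
    <tags>
      <tag><name>example</name><count>2</count></tag>
      <tag><name>web</name><count>1</count></tag>
    </tags>
  </document>
</documents>"""

docs = list(iter_documents(io.BytesIO(SAMPLE)))
for doc in docs:
    print(doc["url"], doc["users"], doc["tags"])
```

To process the real metadata, the path of the downloaded XML file can be passed to iter_documents in place of the in-memory sample.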

Legal Information

By downloading and using this dataset you acknowledge that:

  • The data has been compiled exclusively for scientific research purposes.
  • The copyright holders retain ownership and reserve all rights.

Reference

Please consider citing the following paper if you use this dataset in your research:

Arkaitz Zubiaga, Alberto P. García-Plaza, Víctor Fresno, and Raquel Martínez. "Content-based Clustering for Tag Cloud Visualization". Proceedings of ASONAM 2009, International Conference on Advances in Social Networks Analysis and Mining. 2009.

Download