DeliciousT140 is a dataset created in June 2008 from data retrieved from the social bookmarking site Delicious and from the Web. It is available for research purposes.
The dataset is made up of 144,574 unique URLs, each with its corresponding social tags retrieved from Delicious in June 2008. Together, these documents are annotated with 67,104 distinct tags.
For more details on how the dataset was generated, please read the paper referenced at the end of this page.
The metadata for the documents in the dataset is provided in XML format, following this pattern:
<hash>MD5 hash for document's URL</hash>
<filetype>File extension: html, pdf, xml or swf</filetype>
<filename>Filename of the document in the dataset</filename>
<users># of users who bookmarked it</users>
<count># of users who annotated the document with the tag</count>
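As a rough illustration of how these fields might be read, here is a minimal Python sketch using only the standard library. It rests on several assumptions not stated on this page: each document's metadata sits under a single root element (called `<document>` here, a hypothetical name), the `<hash>` value is the hex MD5 digest of the URL string encoded as UTF-8, and `<count>` entries may appear nested inside per-tag wrapper elements (shown as a hypothetical `<tag>` in the sample). Because the pattern above does not show how counts are paired with tag names, the sketch only collects the counts in file order.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class DocumentMeta:
    hash: str
    filetype: str   # html, pdf, xml or swf
    filename: str
    users: int      # number of users who bookmarked the URL
    # Annotator counts for each tag, in file order. How they pair with
    # tag names is not documented here, so names are not parsed.
    tag_counts: list[int] = field(default_factory=list)


def url_hash(url: str) -> str:
    """Hex MD5 digest of a URL (assumption: the URL is hashed as UTF-8)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def parse_document(xml_text: str) -> DocumentMeta:
    """Parse one document's metadata.

    Assumes all fields live under a single root element; <count> elements
    are searched at any depth, so per-tag wrappers are tolerated.
    """
    root = ET.fromstring(xml_text)
    return DocumentMeta(
        hash=root.findtext("hash", default=""),
        filetype=root.findtext("filetype", default=""),
        filename=root.findtext("filename", default=""),
        users=int(root.findtext("users", default="0")),
        tag_counts=[int(c.text or 0) for c in root.iter("count")],
    )


if __name__ == "__main__":
    # Hypothetical record: the <document> root, the <tag> wrappers and the
    # filename are illustrative assumptions, not taken from the dataset.
    h = url_hash("http://example.com/")
    sample = f"""<document>
  <hash>{h}</hash>
  <filetype>html</filetype>
  <filename>example.html</filename>
  <users>12</users>
  <tag><count>7</count></tag>
  <tag><count>3</count></tag>
</document>"""
    doc = parse_document(sample)
    print(doc.filetype, doc.users, doc.tag_counts)  # prints: html 12 [7, 3]
```

Recomputing `url_hash` for a known URL and comparing it with the stored `<hash>` is one way to match a URL to its record, provided the hashing assumption above holds for the actual files.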
By downloading and using this dataset, you acknowledge that:
- The data has been compiled exclusively for scientific research purposes.
- The copyright holders retain ownership and reserve all rights.
Please consider citing the following paper if you use this dataset in your research:
Arkaitz Zubiaga, Alberto P. García-Plaza, Víctor Fresno, and Raquel Martínez. "Content-based Clustering for Tag Cloud Visualization". Proceedings of ASONAM 2009, International Conference on Advances in Social Networks Analysis and Mining. 2009.