Social-ODP-2k9 is a dataset created during December 2008 and January 2009 with data retrieved from the social bookmarking sites Delicious and StumbleUpon, the Open Directory Project and the Web. It is available for research purposes.
This dataset is made up of 12,616 unique URLs, each with its corresponding social annotations:
- Number of users annotating it [1].
- Top 10 list of tags [1].
- Full Tag Activity (FTA) [1].
[1] From Delicious
[2] From StumbleUpon
Moreover, the category for each URL, extracted from the Open Directory Project, is also available.
To learn more about the dataset generation process, please read the paper referenced at the end of this page.
All the metadata for the dataset documents is provided in XML format, following this pattern:
<hash>MD5 hash for document's URL</hash>
<usercount>Number of users annotating it</usercount>
<count># of users who annotated the tag</count>
<review>A review from StumbleUpon</review>
<note>A note from Delicious</note>
<tag>Tags assigned by a user</tag>
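As an illustration of how these elements might be read, here is a minimal Python sketch. Only the element names (hash, usercount, count, review, note, tag) come from the pattern above. The enclosing `<document>` element, the flat layout of the sample, the helper names `url_hash` and `parse_document`, and the idea that the hash is the hexadecimal MD5 digest of the UTF-8 encoded URL string are all assumptions made for the example, so check them against the actual files before relying on this.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical sample record. The <document> wrapper and the flat layout
# are assumptions; only the inner element names come from the pattern.
SAMPLE = """<document>
  <hash>{hash}</hash>
  <usercount>3</usercount>
  <count>2</count>
  <review>A review from StumbleUpon</review>
  <note>A note from Delicious</note>
  <tag>python</tag>
  <tag>programming</tag>
</document>"""


def url_hash(url: str) -> str:
    """Return the hexadecimal MD5 digest of a URL.

    Assumption: the dataset hashes the raw URL string encoded as UTF-8,
    with no extra normalization.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def parse_document(xml_text: str) -> dict:
    """Collect the known elements of one record, at any nesting depth."""
    root = ET.fromstring(xml_text)
    return {
        "hash": root.findtext(".//hash"),
        "usercount": int(root.findtext(".//usercount", default="0")),
        "tags": [el.text for el in root.iter("tag")],
        "notes": [el.text for el in root.iter("note")],
        "reviews": [el.text for el in root.iter("review")],
    }


if __name__ == "__main__":
    url = "http://example.com/"  # placeholder URL, not taken from the dataset
    record = parse_document(SAMPLE.format(hash=url_hash(url)))
    print(record["hash"] == url_hash(url))
    print(record["usercount"], record["tags"])
```

Because the elements are searched at any depth below the record's root, the sketch should tolerate additional wrapper elements around the tags, notes, and reviews, although it discards whatever grouping those wrappers express.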
By downloading and using this dataset you acknowledge that:
- The data has been compiled exclusively for scientific research purposes.
- The copyright holders retain ownership and reserve all rights.
Please consider citing the following paper if you use this dataset in your research:
Arkaitz Zubiaga, Raquel Martínez, and Víctor Fresno. "Getting the Most Out of Social Annotations for Web Page Classification". Proceedings of DocEng 2009, the 9th ACM Symposium on Document Engineering, pp. 74-83, Munich, Germany. 2009.