Social-ODP-2k9 is a dataset created during December 2008 and January 2009 with data retrieved from the social bookmarking sites Delicious and StumbleUpon, the Open Directory Project and the Web. It is available for research purposes.
This dataset is made up of 12,616 unique URLs, each with its corresponding social annotations:
1. From Delicious
2. From StumbleUpon
Moreover, the category for each URL, extracted from the Open Directory Project, is also available.
If you want to know more about the dataset generation process, please read the paper referenced at the end of this page.
All the metadata for the dataset documents is provided in XML format, following this pattern:
<documents>
  ...
  <document>
    <hash>MD5 hash for document's URL</hash>
    <url>Document's URL</url>
    <category>ODP Category</category>
    <usercount>Number of users annotating it</usercount>
    <tags>
      ...
      <tag>
        <name>Tag name</name>
        <count># of users who annotated the tag</count>
      </tag>
      ...
    </tags>
    <reviews>
      ...
      <review>A review from StumbleUpon</review>
      ...
    </reviews>
    <notes>
      ...
      <note>A note from Delicious</note>
      ...
    </notes>
    <detailedtags>
      ...
      <user>
        ...
        <tag>Tags assigned by a user</tag>
        ...
      </user>
      ...
    </detailedtags>
  </document>
  ...
</documents>
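As an illustration of how metadata in this pattern could be read, here is a minimal Python sketch using only the standard library. It is not part of the dataset distribution: the function name `iter_documents` and the inline `SAMPLE` record are made up for the example, and the sketch assumes the real files follow the pattern above exactly (same element names, integer counts).

```python
import io
import xml.etree.ElementTree as ET


def iter_documents(source):
    """Yield one dict per <document> element, streaming to keep memory low.

    `source` is a file path or a binary file object (hypothetical helper).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "document":
            continue
        yield {
            "hash": elem.findtext("hash"),
            "url": elem.findtext("url"),
            "category": elem.findtext("category"),
            "usercount": int(elem.findtext("usercount") or 0),
            # Aggregated tags: tag name -> number of users who used it.
            "tags": {
                t.findtext("name"): int(t.findtext("count") or 0)
                for t in elem.findall("tags/tag")
            },
            "reviews": [r.text or "" for r in elem.findall("reviews/review")],
            "notes": [n.text or "" for n in elem.findall("notes/note")],
            # Per-user tag lists: one inner list per <user> element.
            "detailedtags": [
                [t.text or "" for t in u.findall("tag")]
                for u in elem.findall("detailedtags/user")
            ],
        }
        elem.clear()  # release the subtree once it has been converted


# Invented record that follows the pattern above; not real dataset content.
SAMPLE = """<documents>
<document>
<hash>abc</hash>
<url>http://example.org/</url>
<category>Top/Computers</category>
<usercount>3</usercount>
<tags><tag><name>web</name><count>2</count></tag></tags>
<reviews><review>Nice site</review></reviews>
<notes><note>Read later</note></notes>
<detailedtags><user><tag>web</tag><tag>dev</tag></user></detailedtags>
</document>
</documents>"""

docs = list(iter_documents(io.BytesIO(SAMPLE.encode("utf-8"))))
print(docs[0]["url"], docs[0]["tags"])
```

To process a downloaded file, the path to it can be passed in place of the `io.BytesIO` object. Streaming with `iterparse` is a design choice for large files; for a small file, `ET.parse` would work just as well.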
By downloading and using this dataset you acknowledge that:
Please consider citing the following paper if you use this dataset in your research:
Arkaitz Zubiaga, Raquel Martínez, and Víctor Fresno. Getting the Most Out of Social Annotations for Web Page Classification. Proceedings of DocEng 2009, the 9th ACM Symposium on Document Engineering, pp. 74-83, Munich, Germany. 2009.