News deduplication dataset

I am looking for a news dataset in which semantically duplicate news articles are tagged. Basically, all the news articles that talk about the same story should be grouped together. The stories can be worded differently but, at a high level, talk about the same event, similar to what Google News does. Are there tagged news datasets for this?

Topic dataset clustering



You can find a lot of articles from multiple news sources and languages discussing the same trending event in this dataset.

However, it provides no tags for semantically duplicate articles, so you would have to implement that grouping yourself. Check the page of the source mentioned in the dataset, which provides some event correlation features.
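
If you do end up grouping the articles yourself, here is a minimal sketch of one simple way to start. It is only an illustration, not what Google News or the dataset's source actually does: it compares bag-of-words vectors with cosine similarity and merges any pair above a threshold using union-find. The function names (`tokenize`, `cosine`, `group_articles`) and the `threshold=0.5` default are assumptions made for this example, and real articles would likely need better features (TF-IDF weighting or sentence embeddings) and a tuned threshold.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def group_articles(articles, threshold=0.5):
    """Return groups of article indices whose texts look like the same story.

    Hypothetical helper: any two articles with cosine similarity >= threshold
    are linked, and linked articles are merged transitively.
    """
    vectors = [Counter(tokenize(a)) for a in articles]
    parent = list(range(len(articles)))

    def find(i):
        # Follow parent links to the group's root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; fine for small samples, quadratic for large ones.
    for i, j in combinations(range(len(articles)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(articles)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `group_articles` on a list of headlines or article bodies returns lists of indices, one list per presumed story. Because every pair is compared, this only scales to a few thousand articles; for larger collections you would want an approximate method such as MinHash or a nearest-neighbour index.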
