The Science Daily Climate Change Dataset
The Science Daily Climate Change (SciDCC) dataset was created by web scraping news articles from the "Earth & Climate" and "Plants & Animals" topics in the environmental science section of the Science Daily (SD) website. SD news articles are more scientific in tone than those of most other news outlets, which makes SD well suited to extracting science-based climate change news. In total, we extracted over 11k news articles from 20 categories relevant to climate change, where each article comprises a title, a summary, and a body. For each category, we extracted at most 1k news articles. The key statistics of the SciDCC dataset are summarized in the sections below.
Length Distributions
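As a minimal sketch of how such length statistics could be computed, assume the articles have been loaded as dictionaries with `title`, `summary`, and `body` keys. The key names, the `length_stats` helper, and the whitespace tokenization are assumptions made for illustration, not the dataset's documented schema or the authors' method; the records below are toy data.

```python
from statistics import mean, median

# Assumed key names; adjust to match the actual column names in the download.
FIELDS = ("title", "summary", "body")


def length_stats(articles):
    """Return min/max/mean/median word counts for each text field."""
    stats = {}
    for field in FIELDS:
        # Whitespace splitting is a simplification of real tokenization.
        lengths = [len(article[field].split()) for article in articles]
        stats[field] = {
            "min": min(lengths),
            "max": max(lengths),
            "mean": mean(lengths),
            "median": median(lengths),
        }
    return stats


# Toy records for illustration only; these are not taken from SciDCC.
sample = [
    {
        "title": "Arctic ice loss",
        "summary": "Sea ice is shrinking fast.",
        "body": "Researchers report that Arctic sea ice is shrinking.",
    },
    {
        "title": "Coral reefs under heat stress",
        "summary": "Reefs bleach.",
        "body": "Warmer oceans cause bleaching.",
    },
]

for field, values in length_stats(sample).items():
    print(field, values)
```

The same function can be applied per category by first grouping the records on a category field, if one is present in the downloaded file.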
Cumulative Distribution By Year
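A cumulative count of articles by year could be derived along the following lines. This is only a sketch: it assumes each record carries a publication `year` field, which the description above does not confirm, and both the field name and the `cumulative_by_year` helper are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cumulative_by_year(articles):
    """Return (year, cumulative article count) pairs in ascending year order."""
    # "year" is an assumed field name, not part of the documented schema.
    counts = Counter(article["year"] for article in articles)
    years = sorted(counts)
    totals = accumulate(counts[year] for year in years)
    return list(zip(years, totals))


# Toy records for illustration only; these are not taken from SciDCC.
toy_articles = [{"year": 2019}, {"year": 2018}, {"year": 2019}, {"year": 2020}]
print(cumulative_by_year(toy_articles))
```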
Download Link:
This dataset is introduced in the paper "NeuralNERE: Neural Named Entity Relationship Extraction for End-to-End Climate Change Knowledge Graph Construction", which is still under review at the ICML 2021 Workshop on Tackling Climate Change with Machine Learning.