For Research Purposes Only!
Please cite the related papers when using the code and datasets.
Processed ACL Anthology Network (data up to 2020; processing finished in Sep 2021)
For the JASIST 2023 paper Extracting the evolutionary backbone of scientific domains: The semantic main path network analysis approach based on citation context analysis
Processed with Allen AI's doc2json tool and in-house scripts
Brief notes: readme
Users must abide by ACL Anthology's copyright rules (see the FAQ here) and use the data for non-commercial purposes only (under the Creative Commons 3.0 BY-NC-SA license).
JSON files of papers, including citation contexts at the end of each file: semantic aan data (3 GB zipped, 20-30 GB unzipped).
Citation network(s): 1985-2020 (the PDF quality of early papers is poor, so the paper uses the time range starting from 1985); 1965-2020 (may be useful for citation network analysis, as AAN seems to be discontinued)
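The layout of the per-paper JSON files is described in the readme rather than on this page, so a reasonable first step is to inspect one file before writing a loader. The following is a minimal Python sketch that reports the top-level keys of a JSON file and the type of each value. It assumes nothing about field names; the function name `summarize_json` and any file path passed to it are hypothetical.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return {top-level key: type name of its value} for one JSON file.

    No schema is assumed: the function only reports what it finds.
    """
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        # The top level may be a list or scalar; report its type instead.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}
```

Calling `summarize_json("some-paper.json")` (a hypothetical path) on one unzipped file should show which key holds the citation contexts before the full 20-30 GB collection is processed.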
Datasets and Source Codes for Citation Context Analysis
For the Scientometrics 2023 paper: Contextualised segment-wise citation function classification
Users must follow ACL Anthology's copyright rules (see the FAQ here) and use the data for non-commercial purposes only (under the Creative Commons 3.0 BY-NC-SA license).
Python source code: CCA (updated July 2022)
JSON files for citation context annotation: citation function dataset (created by merging and re-annotating six existing datasets in the computational linguistics area; see the paper for details)
Test code for the data augmentation experiments: DataAugCFC (see the augmentation package)
JMPA: A Java package for Main Path Analysis. download (20190325 version).
For the JASIST 2020 paper Main path analysis on cyclic citation networks
Benchmarking environment for J. Informetrics 2019: download
For the J. Informetrics 2019 paper Forward search path count as an alternative indirect citation impact indicator
Dataset creation is introduced in the paper.
The following gold-standard paper lists were compiled by me.
Paper lists collected from source textbooks or survey papers, using different thresholds on the number of recommendations (mainly for years before 2000):
AAN-Gold-Standard-Papers-Final-New-recCntThr=2-recTimeThr=2.txt
AAN-Gold-Standard-Papers-Final-New-recCntThr=3-recTimeThr=3.txt
AAN-Gold-Standard-Papers-Final-New-recCntThr=4-recTimeThr=4.txt
With additional so-called undercited papers defined by this J. Informetrics 2016 paper (Scientific influence is not always visible: The phenomenon of under-cited influential publications):
AAN-Gold-Standard-Papers-Final-New-recCntThr=2-recTimeThr=2-TOPCM-TTPCM.txt
AAN-Gold-Standard-Papers-Final-New-recCntThr=3-recTimeThr=3-TOPCM-TTPCM.txt
AAN-Gold-Standard-Papers-Final-New-recCntThr=4-recTimeThr=4-TOPCM-TTPCM.txt
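The six file names above encode their settings in the name itself. As a convenience, the following Python sketch parses such a name into its two threshold values and a flag for the `-TOPCM-TTPCM` variant (the lists with additional undercited papers). The names `GoldListSpec` and `parse_gold_list_name` are hypothetical, and the sketch only reads the numbers from the file name; it makes no assumption about how the thresholds were applied when the lists were built.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the six file names listed above.
_GOLD_LIST_PATTERN = re.compile(
    r"^AAN-Gold-Standard-Papers-Final-New"
    r"-recCntThr=(?P<cnt>\d+)-recTimeThr=(?P<time>\d+)"
    r"(?P<under>-TOPCM-TTPCM)?\.txt$"
)


class GoldListSpec(NamedTuple):
    rec_cnt_thr: int       # value after "recCntThr="
    rec_time_thr: int      # value after "recTimeThr="
    with_undercited: bool  # True for the "-TOPCM-TTPCM" variant


def parse_gold_list_name(filename: str) -> Optional[GoldListSpec]:
    """Parse a gold-standard list file name; return None if it does not match."""
    match = _GOLD_LIST_PATTERN.match(filename)
    if match is None:
        return None
    return GoldListSpec(
        rec_cnt_thr=int(match["cnt"]),
        rec_time_thr=int(match["time"]),
        with_undercited=match["under"] is not None,
    )
```

This makes it straightforward to select, for example, all lists with a given threshold from a directory listing without hard-coding each file name.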
A November 2021 version of the gold standard, covering more than 55K ACL Anthology papers up to 2020: download.
This new gold standard, named GS-NLP, builds on the 2019 version and is compiled from more recent textbooks and survey papers.
The time range of the dataset covers the recent developments in deep learning techniques for natural language processing well. Compared to the 2019 version, this new gold standard is expected to show a third spike in the time distribution of gold-standard papers around 2014-2016, reflecting the gradual penetration and dominance of deep learning in the natural language processing domain.
Curated ACL Anthology Network (2016 version). download.
The original AAN distribution can be requested from aaa.how.
If you use the metadata.AANLoader class of the JMPA package, put the acl-venue-map.txt file in the corresponding release folder.
Benchmarking environment for JASIST'2016 and AAAI'2016.
Dataset (curated gold-standard): download
Gold standard built from two books and 15 course reading lists.
Curated ACL Anthology Network (2011 version).
Benchmarking environment for SKG'2016 and beyond. download.
Gold standard built from four textbooks or handbooks:
Speech and Language Processing (2nd Edition). Daniel Jurafsky and James H. Martin. Pearson Prentice Hall, 2010.
Handbook of Computational Linguistics and Natural Language Processing (Blackwell Handbooks in Linguistics). Edited by Alexander Clark, Chris Fox, and Shalom Lappin. Wiley-Blackwell, 2010.
Handbook of Natural Language Processing (2nd Edition). Edited by Nitin Indurkhya and Fred J. Damerau. Chapman and Hall/CRC, 2010.
Statistical Natural Language Processing (2nd Edition), Chengqing Zong. Tsinghua University Press, 2013. (《统计自然语言处理(第2版)》, 宗成庆, 清华大学出版社, 2013年)
Curated ACL Anthology Network (2011 version).