Data Details
Data Columns
The dataset is stored as a Google spreadsheet. The data contains the following columns:
A: Short title of the Conference the paper appeared in. One of {SciVis, InfoVis, VAST, Vis}. See notes on conference names below.
B: The Year the paper appeared at the conference. Note that this can differ from the year of publication: for example, most VIS 2018 papers were published in the first 2019 issue of the TVCG journal.
C: The Title of the paper
D: If we found it, a DOI to the paper. Where a paper did not include a valid DOI, a syntactically valid but fake DOI was entered that starts with 10.0000. This column can, thus, be used as a unique identifier for each paper.
E: A Link to the paper in the IEEE digital library - based on the DOI
F: The FirstPage of the paper in the printed proceedings. This information has not been thoroughly checked for correctness.
G: The LastPage of the paper in the printed proceedings. This information has not been thoroughly checked for correctness. Note that sometimes a second page is listed here (for example as 20,203) - this is often the case for old years that included a separate color page for pictures in the back of the proceedings. The page after the comma would be the page where the colored pictures appeared.
H: PaperType: one of C (conference paper), J (journal paper), M (miscellaneous). If you would like to filter for scientific research articles, then take all Cs and Js. Papers with an "M" are not typically considered archival publications.
I: The Abstract of the paper
J: AuthorNames-Deduped. Co-authors are separated by a semicolon. Authors are ordered by how they appear on the paper. The full author names were taken from the de-duped names the DBLP provides.
K: AuthorNames as they appeared on the paper. Currently this column is not yet complete. For analysis you should always use column J.
L: AuthorAffiliation. This information has not been cleaned up and for most entries just shows the affiliation of the first author. It has only partially been checked for correctness. We have, however, checked that there is now one affiliation entry per author, with entries separated by semicolons. An entry may be empty, though. If an author has more than one affiliation, these affiliations are separated by a # character.
M: A list of InternalReferences to other IEEE VIS papers using DOIs of the papers in this list. This list does not include citations to papers external to this conference series. These internal citations are automatically extracted either from a full-text extraction of the citation data using Grobid (for older papers) or using CrossRef or Aminer (for papers from VIS 2021 onwards).
N: A list of AuthorKeywords freely chosen by the authors - extracted as found on the paper pdf
O: A number indicating how many times this publication has been cited. The AminerCitationCount was extracted from the Aminer dataset (using a DBLP-Citation-network). A missing value indicates that this publication was not found in the Aminer dataset.
P: A number indicating how many times this publication has been cited. The CitationCount_CrossRef was extracted from CrossRef. A missing value indicates that the article was not found on CrossRef.
Q: PubsCited_CrossRef: A number indicating a reference count = the number of other papers this publication cites. This is also automatically extracted via the CrossRef API.
R: Downloads_Xplore: A number that indicates how often the article has been downloaded from IEEEXplore.
S: A code indicating any VIS internal Award the publication has won. The data for BP and HM is reliable from about 2004 for InfoVis and 2009 for Vis. Historic records on awards in early years are very sparse (e.g. they come from people's personal websites or CVs). TT awards are a recent addition to the conference and are therefore complete.
BP = best paper,
HM = best paper honorable mention (= like a runner-up),
TT = test of time award (= an award for a paper that has proved to be influential to the community over time),
BA = best application paper award (in the early years of the conference),
BCS = best case study award (in the early years of the conference).
T: GraphicsReplicabilityStamp: this column contains an X if the paper has been awarded a graphics replicability stamp.
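Several of the columns above pack more than one value into a single cell. The following is a minimal Python sketch of how such cells could be unpacked, using only the separators documented above (semicolons between authors and between per-author affiliation entries, # between multiple affiliations of one author, a comma before a separate color page, and the 10.0000 prefix for placeholder DOIs). The function names are hypothetical helpers, not part of the dataset or of any library, and the values in the usage note are made up for illustration.

```python
PLACEHOLDER_DOI_PREFIX = "10.0000"  # prefix of the fake DOIs described for column D


def split_authors(cell):
    """Column J: semicolon-separated author names, in paper order."""
    return [name.strip() for name in cell.split(";") if name.strip()]


def split_affiliations(cell):
    """Column L: one ';'-separated entry per author, '#' between the
    affiliations of a single author. Empty entries are kept as empty
    lists so that positions still line up with the author list."""
    if not cell.strip():
        return []
    return [
        [aff.strip() for aff in entry.split("#") if aff.strip()]
        for entry in cell.split(";")
    ]


def has_placeholder_doi(doi):
    """Column D: True if the DOI is one of the fake '10.0000' entries."""
    return doi.strip().startswith(PLACEHOLDER_DOI_PREFIX)


def _to_int(text):
    """Return the text as an int, or None if it is not a plain number."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def parse_last_page(cell):
    """Column G: return (last_page, color_page). The color page is the
    number after the comma, if there is one, else None."""
    main, _, color = cell.partition(",")
    return _to_int(main), _to_int(color)
```

For example, split_affiliations("Univ A#Lab B;;Univ C") keeps an empty list for the second author, and parse_last_page("20,203") separates the last page 20 from the color page 203. Since the page columns have not been thoroughly checked, non-numeric cells are returned as None rather than raising an error.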
Data Collection Methodology
The data has a relatively complex collection pipeline that involves both manual and automatic steps. Information is mainly extracted from the pdf proceedings and double-checked with information from the IEEEXplore library. Where discrepancies occur we manually check the printed or electronic proceedings and add information by hand. We extract the deduped full author names from DBLP using an author's "primary name" as stored in this database (e.g. authors who changed their last name will appear with their new name even on older papers). Citation data is extracted from Aminer and CrossRef, and internal citations are calculated following a reference extraction from the pdf files using Grobid.
Notes on IEEE VIS, its Child Conferences and Naming
The IEEE Conference on Visualization started in 1990 under the name IEEE Visualization (Vis). It quickly grew, and in 1995 the IEEE Symposium on Information Visualization (InfoVis) was held. The symposium was later renamed to the IEEE Conference on Information Visualization but kept the acronym InfoVis (not to be confused with the other IEEE Conference on Information Visualization that uses the acronym IV). In 2006 the Symposium on Visual Analytics Science and Technology (VAST) joined and was later renamed to the VAST conference. The original IEEE Visualization track used the acronym VIS for several years but was renamed in 2013 to the IEEE Conference on Scientific Visualization, with the acronym SciVis. With the name change the conference also changed its call for participation (cfp) - and in particular the focus topics of the conference. For example, information visualization was dropped as a topic.
For a number of years the three conferences ran together under the umbrella name VisWeek, in what you can more or less think of as parallel tracks of one big joint conference. In 2013, however, VIS was chosen as the joint acronym for all three conferences and the name VisWeek was dropped. Since 2021 the conference has been run as a single joint event under the name "The IEEE Visualization and Visual Analytics Conference (VIS)".
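Because the same short title in column A can refer to different things depending on the year, it may help to interpret the label together with the Year column. The sketch below is one possible way to do this in Python. The function name is hypothetical, and it ASSUMES that papers from the single joint conference (2021 onwards) are labeled "Vis" in column A, which the notes above do not state explicitly; please check this against the data before relying on it.

```python
# Labels that mean the same thing in every year they occur.
FIXED_LABELS = {
    "SciVis": "IEEE Conference on Scientific Visualization",
    "InfoVis": "IEEE Information Visualization (symposium, later conference)",
    "VAST": "Visual Analytics Science and Technology",
}


def describe_conference(label, year):
    """Describe a (Conference, Year) pair from columns A and B."""
    if label == "Vis":
        if year < 2013:
            # Original IEEE Visualization track, renamed to SciVis in 2013.
            return "IEEE Visualization (original track, later SciVis)"
        if year >= 2021:
            # ASSUMPTION: the joint conference uses the label "Vis".
            return "IEEE VIS (single joint conference)"
        return "unexpected: 'Vis' label between 2013 and 2020"
    return FIXED_LABELS.get(label, "unknown label")
```

For instance, describe_conference("Vis", 1990) refers to the original IEEE Visualization track, whereas describe_conference("Vis", 2021) would, under the stated assumption, refer to the joint conference.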
The dataset is further a bit complicated in that VAST has since 2012 had both conference and TVCG journal papers. The top-ranked papers are considered journal papers and the other papers are considered conference track papers. We chose to mark the conference-only papers in the dataset in Column H (PaperType) so that you can filter them if needed. SciVis also experimented with this concept in 2015.
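The filtering mentioned here, and the advice to take all Cs and Js when looking for scientific research articles, could be sketched in Python as follows. This assumes the spreadsheet has been exported to CSV with header names matching the column names used in this document (Conference, Year, Title, PaperType); the export format, the header names, the function names, and the sample rows are all assumptions or made-up values for illustration, not real entries from the dataset.

```python
import csv
import io

ARCHIVAL_TYPES = {"C", "J"}  # conference and journal papers; "M" is excluded


def research_papers(rows):
    """Keep only rows whose PaperType is C or J."""
    return [r for r in rows if r["PaperType"].strip() in ARCHIVAL_TYPES]


def conference_only(rows, conference):
    """Keep the conference-track (PaperType C) papers of one conference."""
    return [
        r for r in rows
        if r["Conference"] == conference and r["PaperType"].strip() == "C"
    ]


# Made-up sample rows standing in for a CSV export of the spreadsheet.
SAMPLE_CSV = """Conference,Year,Title,PaperType
VAST,2012,Hypothetical paper one,J
VAST,2012,Hypothetical paper two,C
SciVis,2015,Hypothetical paper three,C
InfoVis,2015,Hypothetical misc entry,M
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
archival = research_papers(rows)
vast_conference_track = conference_only(rows, "VAST")
```

With a real export, the io.StringIO wrapper would be replaced by an opened file; csv.DictReader then yields one dictionary per paper keyed by the header names.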
The image below is a graphical explanation of the naming changes for the conference series between 1990 and 2021:
NOTE: The image above is explicitly placed into the public domain.
Acknowledgements
The dataset was prepared primarily by the following people:
Petra Isenberg (Inria)
John Stasko (Georgia Tech)
Florian Heimerl (University of Stuttgart)
Steffen Koch (University of Stuttgart)
Tobias Isenberg (Inria)
Natkamon Tovanich (Inria)
With significant help from:
Torsten Möller (University of Vienna)
Michael Sedlmair (University of Vienna)
Panpan Xu (Hong Kong University of Science and Technology)
Chad Stolper (Georgia Tech)
Jian Chen (UMBC)
Mohammad Ghoniem (Luxembourg Institute of Science and Technology)
And with a little bit of help from:
Charles Perin (Inria)
Nadia Boukhelifa (Inria)
Jean-Daniel Fekete (Inria)
Wesley Willett (Inria)
Fred Vernier (Université Paris Sud)
Pascal Goffin (Inria)
Heidi Lam (Google)