Data Details

Data Columns

The dataset is stored as a Google spreadsheet. It contains the following columns:

  • A: Short title of the conference the paper appeared in. One of {SciVis, InfoVis, VAST, Vis}. See notes on conference names below.

  • B: The year the paper appeared at the conference. Note that this can differ from the year of publication; for example, most VIS 2018 papers were published in the first 2019 issue of the TVCG journal.

  • C: The title of the paper

  • D: If we found it, a DOI to the paper. Where a paper did not include a valid DOI (2 cases at this point), a syntactically valid but fake DOI was entered that starts with 10.0000. This column can, thus, be used as a unique identifier for each paper.

  • E: A link to the paper in the IEEE digital library, based on the DOI

  • F: The first page of the paper in the printed proceedings. This information has not been thoroughly checked for correctness.

  • G: The last page of the paper in the printed proceedings. This information has not been thoroughly checked for correctness.

  • H: Paper type: one of C (conference paper), J (journal paper), M (miscellaneous). If you would like to filter for scientific research articles, take all Cs and Js. Papers with an "M" are not typically considered archival publications.

  • I: The abstract of the paper

  • J: Deduplicated author names. Co-authors are separated by a semicolon. Authors are ordered as they appear on the paper. The full author names were taken from the deduplicated names that DBLP provides.

  • K: Author names as they appeared on the paper. This column is not yet complete. For analysis you should always use column J.

  • L: Author affiliation. This information has not been cleaned up and for most entries just shows the affiliation of the first author. This information has not been thoroughly checked for correctness.

  • M: A list of internal references to other IEEE VIS papers, given as the DOIs of the papers in this dataset. This list does not include citations to papers external to this conference series. These internal citations are extracted automatically from a full-text extraction of the citation data using Grobid, so errors may have occurred.

  • N: A list of author keywords freely chosen by the authors - extracted as found on the paper pdf

  • O: A number indicating how many times this publication has been cited. The citation count was extracted from the Aminer dataset (DBLP-Citation-network V14). The title of the column indicates when this count was last updated. A missing value indicates that this publication was not found in the Aminer dataset.

  • P: A number indicating how many times this publication has been cited. This citation count was extracted from CrossRef. A missing value indicates that the article was not found on CrossRef.

  • Q: PubsCited_CrossRef: A reference count, that is, the number of other papers this publication cites. This is also extracted automatically via the CrossRef API.

  • R: A code indicating any VIS internal award the publication has won. BP = best paper, HM = best paper honorable mention (like a runner-up), TT = test of time award (an award for a paper that has proved to be influential to the community over time), BA = best application paper award, BCS = best case study award. The data for BP and HM is reliable from about 2004 for InfoVis and 2009 for Vis. Historic records on awards in early years are very sparse (e.g. they come from people's personal websites or CVs). TT awards are a recent addition to the conference, so the records for them are complete.
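The column conventions above can be sketched in code. The example below is a minimal illustration, not an official loader: it assumes the spreadsheet has been exported to CSV with one header row, addresses columns by their letters (A to R) rather than by header names, and assumes that internal references are semicolon-separated like the author names and that counts are exported as plain integers. All helper names and the sample rows are made up for illustration.

```python
import csv
import io
from typing import List, Optional

NUM_COLUMNS = 18  # columns A through R


def col(letter: str) -> int:
    """Map a single spreadsheet column letter to a zero-based index."""
    return ord(letter.upper()) - ord("A")


def split_list(cell: str, sep: str = ";") -> List[str]:
    """Split a multi-valued cell on `sep`, dropping empty entries."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def parse_count(cell: str) -> Optional[int]:
    """Return a count, or None when the cell is empty (paper not found)."""
    cell = cell.strip()
    return int(cell) if cell else None


def is_placeholder_doi(doi: str) -> bool:
    """True for the fake DOIs (prefix 10.0000) used where no valid DOI existed."""
    return doi.strip().startswith("10.0000")


def research_papers(rows):
    """Keep rows whose paper type (column H) is C or J."""
    return [row for row in rows if row[col("H")].strip() in {"C", "J"}]


def read_rows(handle):
    """Read CSV rows, skipping the header line and padding short rows."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return [row + [""] * (NUM_COLUMNS - len(row)) for row in reader]


def sample_row(**cells):
    """Build a made-up 18-column row from letter=value pairs."""
    row = [""] * NUM_COLUMNS
    for letter, value in cells.items():
        row[col(letter)] = value
    return row


# Made-up sample data standing in for a CSV export of the spreadsheet.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([chr(ord("A") + i) for i in range(NUM_COLUMNS)])  # stand-in header
writer.writerow(sample_row(A="InfoVis", B="2010", D="10.0000/demo-1", H="J",
                           J="Ada Example;Bo Sample", M="10.0000/demo-2", P="12"))
writer.writerow(sample_row(A="Vis", B="2009", D="10.0000/demo-2", H="M",
                           J="Cy Placeholder"))
buffer.seek(0)

rows = read_rows(buffer)
papers = research_papers(rows)                      # drops the "M" row
authors = split_list(papers[0][col("J")])           # ordered co-author list
internal_refs = split_list(papers[0][col("M")])     # DOIs of cited VIS papers
crossref_citations = [parse_count(row[col("P")]) for row in rows]
```

Addressing columns by letter keeps the sketch independent of the exact header text, which is not listed here for most columns. Treating empty count cells as None rather than 0 preserves the distinction between "not found" and "never cited".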

Data Collection Methodology

The data has a relatively complex collection pipeline that combines manual and automatic steps. Information is mainly extracted from the PDF proceedings and double-checked against information from the IEEE Xplore library. Where discrepancies occur, we manually check the printed or electronic proceedings and add information by hand. We extract the deduplicated full author names from DBLP using an author's "primary name" as stored in that database (e.g. authors who changed their last name will appear with their new name even on older papers). Citation data is extracted from Aminer and CrossRef, and internal citations are calculated following a reference extraction from the PDF files using Grobid.

Notes on IEEE VIS, its Child Conferences and Naming

The IEEE Conference on Visualization started as a conference in 1990 under the name IEEE Visualization (Vis). It quickly grew, and in 1995 the IEEE Symposium on Information Visualization (InfoVis) was held. The symposium was later renamed to the IEEE Conference on Information Visualization but kept the acronym InfoVis (not to be confused with the other IEEE Conference on Information Visualization, which uses the acronym IV). In 2006 the Symposium on Visual Analytics Science and Technology (VAST) joined and was later renamed to the VAST conference. The original IEEE Visualization track used the acronym VIS for several years but was renamed in 2013 to the IEEE Conference on Scientific Visualization, using the acronym SciVis. With the name change the conference also changed its call for participation (CFP), and in particular the focus topics of the conference. For example, information visualization was dropped as a topic.

For a number of years the three conferences ran together under the umbrella name VisWeek in what you can more or less think of as parallel tracks of a big joint conference. In 2013, however, VIS was chosen as the joint acronym for all three conferences and the name VisWeek was dropped. Since 2021 the conference has been run as a single joint event under the name "The IEEE Visualization and Visual Analytics Conference (VIS)".

The dataset is further complicated by the fact that, since 2012, VAST has had both conference papers and TVCG journal papers. The top-ranked papers are considered journal papers and the others are considered conference-track papers. We chose to mark the conference-only papers in the dataset in column H (paper type C) so that you can filter them if needed. SciVis also experimented with this concept in 2015.
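As a rough illustration of that filter, the sketch below separates conference-track papers from journal papers using the conference name (column A), the year (column B), and the paper type code (column H) described in the column list. It assumes rows have already been loaded as lists of 18 strings in column order; the function names and sample rows are hypothetical.

```python
from collections import Counter

CONFERENCE, YEAR, PAPER_TYPE = 0, 1, 7  # zero-based positions of columns A, B, H


def conference_only(rows, conference="VAST"):
    """Rows of one conference whose paper type is C (conference track)."""
    return [r for r in rows
            if r[CONFERENCE] == conference and r[PAPER_TYPE] == "C"]


def type_counts(rows):
    """Count papers per (conference, year, paper type) combination."""
    return Counter((r[CONFERENCE], r[YEAR], r[PAPER_TYPE]) for r in rows)


def make_row(conference, year, paper_type):
    """Build a made-up 18-column row with only columns A, B, and H filled."""
    row = [""] * 18
    row[CONFERENCE], row[YEAR], row[PAPER_TYPE] = conference, year, paper_type
    return row


# Made-up sample rows, not real dataset entries.
rows = [
    make_row("VAST", "2012", "J"),
    make_row("VAST", "2012", "C"),
    make_row("VAST", "2012", "C"),
    make_row("SciVis", "2015", "C"),
]

vast_conference_track = conference_only(rows)
counts = type_counts(rows)
```

Counting by (conference, year, type) gives a quick check of how many papers of each kind a given year contributes before deciding whether to exclude the conference-track ones.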

The image below is a graphical explanation of the naming changes for the conference series between 1990 and 2021:

NOTE: The image above is explicitly placed into the public domain.


The dataset was prepared primarily by the following people:

With significant help from:

  • Torsten Möller (University of Vienna)

  • Michael Sedlmair (University of Vienna)

  • Panpan Xu (Hong Kong University of Science and Technology)

  • Chad Stolper (Georgia Tech)

  • Jian Chen (UMBC)

And with a little bit of help from:

  • Charles Perin (Inria)

  • Nadia Boukhelifa (Inria)

  • Jean-Daniel Fekete (Inria)

  • Wesley Willett (Inria)

  • Fred Vernier (Université Paris Sud)

  • Pascal Goffin (Inria)

  • Heidi Lam (Google)