Welcome to Project Amber

Project Amber: The .gov Data Integrity Initiative

Preserving the history and mapping the movement, accessibility, and changes in content and tone of the United States government's public data.

Imagine that over the years, from one Presidential Administration to another, changes were made to the Declaration of Independence and the Bill of Rights and that no one ever noted what those changes were, how they varied from the original documents, or even how to find copies to view. We as a people could not then be sure of our own history, or of the foundations upon which our nation is built. We would lose not only the original, accurate information itself, but also the intent of that information.


The United States' digital footprint -- the information available on all websites that end in .gov -- is an essential part of the way the United States talks to us as citizens and talks to the world. That information is used for everything from research and journalism, to supplying essential information to other public and governmental organizations, to serving individual citizens who need accurate information from a trusted source. We estimate that .gov pages were viewed about eight billion times in 2016.


Our digital footprint -- all of the information on those .gov websites, all of the links they contain, and all of the datasets and databases that are part of them -- is always in danger of being lost. It is akin to having fossils catalogued in a natural history museum, only to have them misfiled, lost, thrown into corners, or accidentally or purposefully hidden when a new curator comes in. With Project Amber, we intend to prevent that from happening to the United States' "digital fossils," and to keep us from losing the essential connections to our past, our history, and our original values and verbiage.


Project Amber is a research initiative that originated in Carnegie Mellon University's Heinz College, home to the top-ranked School of Information Systems Management and School of Public Policy and Management in the US.


Project Amber got its start in November 2016, when Heinz College PhD candidate Matt Crespi (mcrespi@andrew.cmu.edu) and Professor Chris Labash (clabash@cmu.edu) were discussing the data loss that might occur as the nation transitioned from the Obama administration to the Trump administration. They considered the impact such a loss could have on researchers, the media, and citizens, and the possibility of "backing up" all of the web pages, links, and datasets that were freely available to the American public. This kind of digital preservation already had a heritage in Carnegie Mellon's School of Computer Science, and Machine Learning Professor Jamie Callan, drawing on his own research, guided the early data-capture efforts.


Crespi, Labash, and a small team of graduate students from Professor Labash's Innovation + Technology class (where Crespi is the Teaching Assistant) began putting together a framework for collecting and curating data scraped from all .gov web pages. Upon reaching out to the helpful folks at the Internet Archive and its Wayback Machine, however, they were happy to find that most of that job was already being handled by the Archive's End-of-Term collection of digital .gov artifacts.


The challenge now is to ensure that when datasets or other digital elements and artifacts change, the changes and versions are recorded and the history and integrity of the information are preserved.


To that end, several researchers have expressed interest, and a number of research projects are being proposed and developed. We'll post updates to the Project Amber website as projects and initiatives unfold.


This Google Site is a placeholder that we're using (and still developing) to keep interested folks informed while we build a proper, more complete web presence. Please bear with us and bookmark this site if you like; we'll link to the new URL once the official site is live.


Follow Project Amber.

Project Amber is an evolving collection of research projects and initiatives. New projects will appear on the Projects pages as they are added, and you can follow individual research projects from there. To stay abreast of everything going on with Project Amber, check our Updates and News pages periodically, sign up for our Newsletter, and follow us on social media.

Related Projects and Research.

Project Amber is one of a number of initiatives that have been (and are still being) created to ensure that government information retains its accuracy, intent, and integrity. Our intent is to preserve, archive, and track changes to public information and taxpayer-funded scientific research. If you're interested in becoming part of the team, have a research project of your own that you'd like to discuss, or just want to stay informed about this and other research and our regularly updated findings, please get in touch with us, stay connected through social media, or connect with any of the other groups engaged in similar efforts. We've listed below the ones we know about. We hope to keep adding to our informal consortium, and we would appreciate your adding the name of your group or letting us know of any that we've missed.