How data science, digital humanities, and libraries address sources, methods, and meaning
Researchers in every field of study have access to far more data than could have been imagined even a decade ago. The scale and speed of data creation have opened up exciting new paths of inquiry while, at the same time, introducing new kinds of data bias and new challenges to the reproducibility and reliability of results.
All data are representations. Their collection, handling, reduction and transformation influence what we can and cannot learn from data. And yet our traditional methods of data management, rooted in provenance, context, and careful documentation, struggle to keep pace with big data.
Data science has made significant advances in computational and mathematical tools to organize and analyze data, recognize patterns and make predictions at scale.
Digital humanities are also concerned with scale, but approach the problem from a different angle: How do we reduce rich, layered, and uneven historical and literary sources into data representations that are ripe for computation?
Libraries and archives produce metadata, define collections, and trace provenance, all to provide meaningful context, and they need more powerful tools to support the production of new knowledge.
This state of affairs presents an opportunity, and perhaps an imperative, to learn from one another across theoretical and methodological divides, and to address the social, ethical, and political implications of the boundlessness of data today.
In response to this need, the Stanford Data Science Institute, CESTA (Center for Spatial and Textual Analysis) and Stanford Libraries are hosting two complementary trans-disciplinary events to foster communication and shared understanding.