Today’s grand challenges such as mitigating climate change, ensuring energy security or restoring the environment have an interrelated set of large, diverse, and at times competing goals. Any one of these, or the grand challenges to come, will only be achieved by engaging a broad, tightly interacting multidisciplinary, geographically distributed team; and by leveraging, growing and integrating necessary science and technology.
Large-scale, high-dimensional modeling, observation and experimentation have become core and expanding components of each required discipline. While the investment in modeling has produced rapid advancements, the process of identifying, acquiring and manipulating appropriate modeling, observational, and experimental data sets can be time-consuming and at times intractable. As the fidelity, complexity and scope of models increase, the difficulty of data identification, acquisition, translation, interpretation and validation will present an ever-growing drag on scientific discovery. Consequently, the specialized needs of data sciences are increasingly important across domains.
Within and across disciplines, the communication of theories, hypotheses, inputs, methods, models, results, and inferences, i.e. of data and knowledge, will accelerate the advancement and utilization of solutions to grand challenge problems. Data intensive sciences are as much about this communication as about modeling, observation and experimentation. Today’s diverse communication environment, as embodied in commonly adopted yet under-recognized marvels (Facebook, Google, Netflix, SharePoint, VMware, Wikipedia, World of Warcraft, flash mobs, …), provides the necessary technology that data sciences will leverage, grow and integrate to achieve the coming grand challenges.
This web site describes the vision of a research agenda to rapidly accelerate the state of practice in the broad discipline of data intensive sciences. We envisage a future where scientists will interact to identify, search for, acquire and manipulate useful scientific data as easily as existing communication technologies currently manage and communicate personal information. Alongside this, a globally based collaborative infrastructure would enable cross-discipline scientific communities to form rapidly to address important problems, easily share and disseminate data about all aspects of a scientific endeavor, and perform innovative analyses using scalable, widely accessible and accepted scientific tools. This discovery, collaboration and analysis environment will leverage standards-based technologies that ensure its familiarity to the ‘next generation of scientists’ as well as its longevity.
Achieving these goals requires a large-scale community-driven effort that brings together the mathematical, computational and domain science communities. This community would define and create the infrastructure, tools and services for ubiquitous data access, manipulation and analysis that underpin the solutions to coming grand challenges.