Action Plan


2/4/10 This site has moved. New URL is
http://systemsbiologyknowledgebase.org.

Project Description

This project will create the conceptual design and implementation planning necessary for subsequent development of the Systems Biology Knowledgebase (Kbase). A fully functional Systems Biology Knowledgebase is envisioned to be a cyberinfrastructure for systems biology information and data that not only includes data storage, retrieval, and management, but also enables new knowledge acquisition and management, through free and open access to data, analysis tools, and information for the scientific research community. Knowledgebase capabilities are envisioned to include:

  • Curation, not just of data but also of models and representations of scientific concepts
  • Analysis including the ability to compare methods and inventory of results
  • Simulation including the ability to modify and improve models
  • Prediction based on simulation and analysis to form new hypotheses
  • Experimental design and comparison between prediction and results

The Knowledgebase would drive two classes of work: (1) experimental design and (2) modeling and simulation. Integrating data derived from computational predictions and modeling, as envisioned in the Knowledgebase project, would increase data completeness, fidelity, and accuracy. These advancements in turn would greatly improve modeling and simulation, leading to new experimentation, analyses, and mechanistic insight. Scientists' ever-increasing exploitation of the dynamic linkages among data integration, experimentation, and modeling and simulation, aided by Kbase, will advance efforts to achieve a predictive understanding of the functions of biological systems.


The Knowledgebase will transform the way the biology community views data and information by providing a contextual and interactive environment to drive and transform technology revolutions and ensure the Department of Energy (DOE) meets mission-critical challenges in energy and the environment. Specifically, the Knowledgebase will accelerate research by integrating data and information on microbial and plant systems to enable a clearer understanding of the processes involved in bioenergy production and environmental stabilization.


This project will define the scope, cost, schedule, and infrastructural needs for Kbase by gathering community input from a series of workshops and supporting pilot projects. A partnership between the DOE Office of Biological and Environmental Research and the DOE Office of Advanced Scientific Computing Research will support activities focused on Kbase computational infrastructure.

Purpose

As biological research advances and enables the pursuit of more ambitious objectives, projects are becoming larger and more complex, encompassing more diverse and specialized capabilities. This is true both experimentally and computationally. For these projects to succeed, they need increasing cooperation and standardization. In addition, advances in technological capabilities associated with large-scale biological research projects are producing exponentially more data. Together, these trends call for a new kind of computational infrastructure if the overall scientific effort is to succeed.

 

Ultimately, to do large-scale science in the future, it will be necessary to have an associated large-scale, open-community computational capability for data management and analysis. The vision for an integrated experimental framework for accessing, comparing, analyzing, modeling, and testing systems biology data was described in the DOE report Systems Biology Knowledgebase for a New Era in Biology.

 

Background

Historically, individual research groups were largely independent and funded as such. Not surprisingly, the associated computational systems were developed independently and remain largely disconnected. Because of a lack of standards, both computational and experimental, these systems are not readily integrated today. As research using new technologies becomes more productive and collaborative, the computational systems supporting this research must reflect this change. This is the transition from individual research projects to large-scale community efforts – in many ways a cultural change.

 

As described in a recent article in Science (Gordon Bell et al., 6 March 2009 323: 1297-1298), biology, as with other areas of science, is demanding data-intensive computing. For systems biology, the computation is less numerical processing and more the mining and comparison of large datasets.

 

The Knowledgebase will be a substantial software engineering effort unlike anything done before in this community. As such, it demands a clear description of the stakeholders, what they require, and what they would define as success. Furthermore, success for this project will be as much about scientific accomplishment as technological achievement.

 

Objectives
The Knowledgebase project objectives are as follows:
  • Engage the computer science research community in the challenges of biocomputing for the future.
  • Transition the BER informatics community from individual project-based efforts toward research community-based informatics.
  • Provide pilot examples of software.
  • Provide infrastructure examples of hardware and software.
  • Develop strategies for a successful open informatics science endeavor.
  • Establish scope, specifications, and requirements for Knowledgebase implementation.
Guiding Principles
The guiding principles for this project include:
  • Open contribution – Data and methods will be available for anyone to use.
  • Open source – Source code will be freely available to access, modify, and redistribute under the same terms, as under licenses such as the GNU General Public License.
  • Open development – Anyone can contribute to the development by following the organization guidelines. This would be analogous to submitting a publication: a review process by an authoritative group would determine whether the contribution meets established criteria.
    • Allow for user accounts such that data and code can be held private and analysis can be conducted in a private environment.
    • Allow for tracking history, versions, and provenance so that analysis done today can be usefully compared with analysis done previously.
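As one illustration of the provenance principle above, the following minimal Python sketch shows how two analysis runs might be recorded and compared. It is not part of the Knowledgebase design: the names ProvenanceRecord, digest, and differences, the tool name "example-aligner", and all field names are hypothetical and chosen only for this example.

```python
import hashlib
from dataclasses import dataclass, fields


def digest(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies data by its content."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of one analysis run (illustrative fields only)."""
    tool: str             # name of the analysis tool
    tool_version: str     # version of the tool that produced the result
    parameters: dict      # settings used for the run
    input_digests: tuple  # content digests of every input dataset
    output_digest: str    # content digest of the result


def differences(earlier: ProvenanceRecord, later: ProvenanceRecord) -> list:
    """List the names of the fields that differ between two runs."""
    return [
        f.name
        for f in fields(ProvenanceRecord)
        if getattr(earlier, f.name) != getattr(later, f.name)
    ]


# Two runs over the same input data with different tool versions.
reads = b"ATGCATGC"
run_old = ProvenanceRecord(
    "example-aligner", "1.0", {"k": 11}, (digest(reads),), digest(b"result A")
)
run_new = ProvenanceRecord(
    "example-aligner", "1.1", {"k": 11}, (digest(reads),), digest(b"result B")
)

print(differences(run_old, run_new))
```

Here the comparison reports that only the tool version and the output changed, so a difference between the two results can be attributed to the tool upgrade and not to the inputs or parameters. Identifying data by content digest, as assumed here, is one of several possible ways to make such comparisons reliable.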
Plan for Success
In order to be successful, the Knowledgebase will focus on scientific goals and strong community involvement while emphasizing the cultural change from individual to community science. This plan for success will include significant efforts toward:
  • Assessing quality of experimental data
  • Establishing experimental protocol and standards
  • Tracking provenance of data, including analytical processes 

Final Report - The Conceptual Design Report

The DOE Systems Biology Knowledgebase project is a significant software engineering and operations effort. Successfully building such a system depends on sufficiently detailed specifications and requirements. The Final Report will be the conceptual design document establishing the scope, cost, and schedule of the Knowledgebase effort.