Lei Wang1, Jessica Turner2, Arcot Rajasekar3 and Howard Lander3
1Northwestern University, 2Georgia State University, 3University of North Carolina at Chapel Hill
Saturday, August 11, 2018
8:00 – 1:00
de Grandpré Communications Centre, Montreal Neurological Institute & Hospital
3801 University Street, Montreal, Quebec H3A 2B4
Scope
The challenge of understanding complicated neurobiological systems is one of the most compelling problems in modern science. Progress toward this goal depends not only on advanced experimental and computational techniques, but also on the timely availability and discoverability of the most useful datasets. Traditional discovery relies on faceted search over metadata and associated keywords, which is very limiting for the image, numeric and instrument datasets commonly found in clinical neuroscience. Deep indexing is a novel technique that overcomes this problem by applying domain-centric analytical algorithms to create discriminating signatures for datasets; these signatures can, in turn, be used to discover similar datasets of interest.
Neuroscience is at an inflection point where more and more data are being aggregated and shared through common repositories. But the sheer number of these datasets has made locating data of interest a tedious, time-consuming task with no guarantee of success. With sparse provenance and metadata, combined with heterogeneity and variety, these numerous datasets are not easily discoverable, which limits their potential to empower important new research and science. The next challenge facing the neuroscience community in the Big Data era is developing new modes of discovery spanning multiple repositories.
Neuroscience researchers have developed several algorithms for finding specific results from datasets. These computational models have been used to answer specific questions of interest to a researcher, using datasets that the researcher has collected. But these algorithms can also be viewed as a treasure trove of indexing techniques: they can provide insight into other datasets that were collected for different reasons but are targets for reuse and repurposing. These signature generation algorithms are uniquely useful because they give insights into datasets that are not available through textual metadata.
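The signature-based discovery idea described above can be illustrated with a minimal Python sketch. It is not the DataBridge implementation: the histogram signature, the cosine-similarity ranking, the function names and the toy dataset values are all assumptions chosen only to show the general pattern of reducing each dataset to a numeric signature and then ranking other datasets by how close their signatures are.

```python
import math


def signature(values, bins=4, lo=0.0, hi=1.0):
    """Hypothetical signature: a normalized histogram of values in [lo, hi]."""
    if not values:
        raise ValueError("cannot build a signature from an empty dataset")
    counts = [0] * bins
    for v in values:
        # Map the value to a bin index, clamping out-of-range values
        # into the first or last bin.
        idx = int((v - lo) / (hi - lo) * bins)
        idx = max(0, min(idx, bins - 1))
        counts[idx] += 1
    return [c / len(values) for c in counts]


def cosine_similarity(a, b):
    """Cosine of the angle between two signature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_similar(query_name, signatures):
    """Rank every other dataset by signature similarity to the query, best first."""
    query = signatures[query_name]
    scores = [
        (name, cosine_similarity(query, sig))
        for name, sig in signatures.items()
        if name != query_name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy measurements standing in for real datasets (illustrative values only).
datasets = {
    "study_a": [0.10, 0.20, 0.15, 0.80],
    "study_b": [0.12, 0.22, 0.10, 0.90],
    "study_c": [0.70, 0.80, 0.90, 0.95],
}
signatures = {name: signature(vals) for name, vals in datasets.items()}
ranking = rank_similar("study_a", signatures)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy example the ranking depends only on the measured values, not on any textual metadata. In practice the histogram would be replaced by a domain-specific analysis, such as one of the algorithm classes listed in the next paragraph.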
The scope of this workshop is to explore novel dataset discovery mechanisms driven by neuroscience analytic algorithms. Several classes of algorithms, including voxel-based quantitative analysis, cellular morphology, disease prediction and progression algorithms, tractography and biomarker algorithms, are candidates for ‘signature analysis’ for use across datasets in various repositories and individual collections. Through an NSF-funded project called the DataBridge, we have developed a platform for using analytics to index dataset collections. A prototype application using schizophrenia data has shown the applicability of the approach. This workshop is a means to explore further applications of this technology for a wider neuroscience audience. Adding deep indexing for neuroscience datasets will have to be a community effort, engaging the researchers most familiar with the data in selecting the datasets and algorithms with the highest scientific potential. Our aim in conducting the DataBridge for Neuroscience workshop at INCF 2018 is to engage the broadest possible set of neuroscience researchers and data experts as we explore the role of deep indexing as a solution to the problem of data discovery in the neurosciences.
Agenda
8:00 – 8:30 Breakfast
8:30 – 8:40 Welcome and Introduction (Arcot Rajasekar)
8:40 – 9:40 Panel 1: Data discovery, comparison and similarity: How to find data and compare data, in terms of study design, measurement, and ontology
Moderator: Howard Lander
Panel Members: Arcot Rajasekar, Ruth Duerr, Jeff Grethe (15 minutes each plus 15 minutes for discussion)
Presentations: Current approaches, implementations, and perspectives
Discussion: Data discovery, ontology, use of common data elements, comparison of data dictionaries, harmonization/integration
9:45 – 10:45 Panel 2: Neuroimaging data: How to make neuroimaging data discoverable and reusable, and how to facilitate comparison across datasets, in terms of research domain, study design, measurement, and structure
Moderators: Lei Wang and Jessica Turner
Panel Members: Satra Ghosh, Chris Gorgolewski, Vince Calhoun (15 minutes each plus 15 minutes for discussion)
Presentations: Current approaches, implementations, and pros and cons
Discussion: Data models, description of studies, etc.
10:45 – 11:00 Coffee Break
11:00 – 12:00 Panel 3: Current experiences with data discovery
Moderator: David Keator
Panel Members: Greg Farber, Howard Lander, David Kennedy (15 minutes each plus 15 minutes for discussion)
Presentations: Current experiences: DataBridge for Neuroscience, ReproNim
Discussion: Next steps for collaboration.
12:00 – 1:00 Lunch, Wrap Up and Closing Remarks