INCF 2018 Workshop on Advanced Data Discovery for Neuroscience

Lei Wang¹, Jessica Turner², Arcot Rajasekar³ and Howard Lander³

¹Northwestern University, ²Georgia State University, ³University of North Carolina at Chapel Hill

Date/Time

Saturday, August 11, 2018

8:00 AM – 1:00 PM

Place

de Grandpré Communications Centre, Montreal Neurological Institute & Hospital

3801 University Street, Montreal, Quebec H3A 2B4

Scope

The challenge of understanding complicated neurobiological systems is one of the most compelling problems in modern science. Progress toward this goal depends not only on advanced experimental and computational techniques, but also on the timely availability and discoverability of the most useful datasets. Traditional discovery relies on faceted search over metadata and associated keywords. This approach is very limiting for the image, numeric and instrument datasets commonly found in clinical neuroscience. Deep indexing is a novel technique that overcomes this problem by applying domain-centric analytical algorithms to create discriminating signatures for datasets. These signatures can, in turn, be used to discover similar datasets of interest.
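As a rough illustration of the deep indexing idea described above, the following Python sketch computes a signature for each dataset and ranks other datasets by similarity to a query dataset. It is not the DataBridge implementation: the histogram signature, the cosine similarity measure, the function names (`signature`, `cosine`, `rank_similar`) and the toy dataset names and values are all assumptions chosen for illustration. A real deployment would substitute a domain-specific analytic algorithm for the signature step.

```python
import math


def signature(values, bins=4, lo=0.0, hi=1.0):
    """Hypothetical signature: a normalized histogram of values in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp out-of-range values into the first or last bin.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def cosine(a, b):
    """Cosine similarity between two equal-length signature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar(query_name, datasets):
    """Rank all other datasets by signature similarity to the query dataset."""
    sigs = {name: signature(vals) for name, vals in datasets.items()}
    query_sig = sigs[query_name]
    scored = [
        (name, cosine(query_sig, sig))
        for name, sig in sigs.items()
        if name != query_name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (made up): each list stands in for measurements from one dataset.
datasets = {
    "scan_a": [0.10, 0.15, 0.20, 0.80],
    "scan_b": [0.12, 0.18, 0.22, 0.90],
    "scan_c": [0.60, 0.70, 0.85, 0.95],
}
ranking = rank_similar("scan_a", datasets)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The point of the sketch is that similarity is computed from the data values themselves rather than from keywords, so two datasets with no shared textual metadata can still be matched.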

Neuroscience is at an inflection point where more and more data are being aggregated and shared through common repositories. But the sheer number of these datasets has made locating data of interest a tedious, time-consuming task with no guarantee of success. With sparse provenance and metadata, combined with heterogeneity and variety, these numerous datasets are not easily discoverable, which limits their potential to empower important new research and science. The next challenge facing the neuroscience community in the Big Data era is developing new modes of discovery that span multiple repositories.

Neuroscience researchers have developed several algorithms for finding specific results in datasets. These computational models have been used to answer specific questions of interest to a researcher, using datasets that the researcher has collected. But these algorithms can also be viewed as a treasure trove of indexing techniques that provide insight into other datasets, collected for different reasons, which are targets for reuse and repurposing. These signature generation algorithms are uniquely useful because they give insights into datasets that are not available through textual metadata.

The scope of this workshop is to explore novel dataset discovery mechanisms driven by neuroscience analytic algorithms. Several classes of algorithms, including voxel-based quantitative analysis, cellular morphology, disease prediction and progression algorithms, tractography and biomarker algorithms, are candidates for ‘signature analysis’ for use across datasets in various repositories and individual collections. Through an NSF-funded project called the DataBridge, we have developed a platform for using analytics to index dataset collections. A prototype application using schizophrenia data has shown the applicability of the approach. This workshop is a means to explore further applications of this technology for a wider neuroscience audience. Adding deep indexing for neuroscience datasets will have to be a community effort, engaging the researchers most familiar with the data in selecting datasets and algorithms with the highest scientific potential. Our aim in conducting the DataBridge for Neuroscience workshop at INCF 2018 is to engage the broadest possible set of neuroscience researchers and data experts as we explore the role of deep indexing as a solution to the problem of data discovery in the neurosciences.

Agenda

8:00 – 8:30 Breakfast

8:30 – 8:40 Welcome and Introduction (Arcot Rajasekar)

8:40 – 9:40 Panel 1: Data discovery, comparison and similarity: How to find data and compare data, in terms of study design, measurement, and ontology

Moderator: Howard Lander

Panel Members: Arcot Rajasekar, Ruth Duerr, Jeff Grethe (15 minutes each plus 15 minutes for discussion)

Presentations: Current approaches, implementations, and perspectives

Discussion: Data discovery, ontology, use of common data elements, comparison of data dictionaries, harmonization/integration

9:45 – 10:45 Panel 2: Neuroimaging data: How to make neuroimaging data discoverable and reusable, and how to facilitate comparison across datasets, in terms of research domain, study design, measurement, and structure

Moderators: Lei Wang and Jessica Turner

Panel Members: Satra Ghosh, Chris Gorgolewski, Vince Calhoun (15 minutes each plus 15 minutes for discussion)

Presentations: Current approaches, implementations, and pros and cons

Discussion: Data models, description of studies, etc.

10:45 – 11:00 Coffee Break

11:00 – 12:00 Panel 3: Current experiences with data discovery

Moderator: David Keator

Panel Members: Greg Farber, Howard Lander, David Kennedy (15 minutes each plus 15 minutes for discussion)

Presentations: Current experiences: DataBridge for Neuroscience, ReproNim

Discussion: Next steps for collaboration

12:00 – 1:00 Lunch, Wrap Up and Closing Remarks
