Day 1
13:30 - 14:30
Veridical data science aims at responsible, reliable, reproducible, and transparent data analysis and decision-making. Predictability, computability, and stability (PCS) are three core principles of veridical data science. They embed the scientific principles of prediction and replication in data-driven decision-making while recognizing the central role of computation. Based on these principles, the PCS framework consists of a workflow and documentation (in R Markdown or Jupyter Notebook) for the entire data science life cycle, from problem formulation, data collection, and data cleaning to modeling, interpretation of results, and conclusions.
Employing the PCS framework in causal inference and analyzing data from the VIGOR clinical trial, we developed staDISC for stable discovery of interpretable subgroups via calibration for precision medicine. The subgroups discovered by staDISC using the VIGOR data are validated to a good extent with the APPROVe study.
18:00 - 19:00
This talk will consist of two parts. In the first part, I will discuss reduced rank regression with matrix projections for high-dimensional multivariate linear regression and present some technical results, a simulation study, and a case study illustrating the results and methods. In the second part, I will discuss envelope-based reduced rank regression for high-dimensional multivariate linear regression, present the corresponding results, and compare them with those of the first part.
Short Talks and Posters
Day 2
18:00 - 19:00
At each point in time we live in a 3D space, so data records of natural scenes should be 3D; in practice, however, data are stored as 1D or 2D images. Examples include satellite pictures (which show that we live in a thin layer of air), first- and second-generation DNA sequences (initially stored as images), and digital camera images. Emulating bilateral color human vision, machine vision is based on 3D projective shape retrieval of scenes from their RGB camera images. Once the 3D information is extracted, the data may be represented on certain metric spaces that often have a smooth structure, or that of a stratified space, opening the door to geometric and algebraic topological data analysis of 3D scenes extracted from image data. A few basic examples of 3D machine vision analysis are presented here. This is joint work with Rob Paige (MST), Daniel Osborne (FAMU), Mingfei Qiu, Ruite Guo, K. David Yao, David Lester, Yifang Deng, Seunghee Choi, and Michael Crane.
Short Talks and Posters
Day 3
18:00 - 19:00
Thanks to advances in modern data-acquisition technology, massive data with diverse features and large volume are more accessible than ever. The impact of big data is significant. While the abundance of data presents great opportunities for researchers to extract useful information, gain new knowledge, and make sensible decisions, big data also present great challenges. A very important, sometimes overlooked challenge is the quality and provenance of the data. Big data are not automatically useful; they are often raw and involve considerable noise.
The challenges presented by noisy data with measurement error, missing observations, and high dimensionality are particularly intriguing. Noisy data with these features arise ubiquitously in fields including health sciences, epidemiological studies, environmental studies, survey research, and economics. In this talk, I will discuss some issues induced by noisy data and how these features may challenge inferential procedures.
Short Talks and Posters
Day 4
18:00 - 19:30
Look for the YouTube link in Gather "Data" Town!
Short Talks and Posters
Day 5
12:30 - 13:30
Andriy Bezuhlyy
Prof. Ronald Friedman
Jim Kunce
Robert E. Neher
Tammy Toscos
Prof. Yvonne Zubovic
Prof. Mark Daniel Ward
Short Talks and Posters