As it becomes more difficult to analyze large-scale simulation output at full resolution, users will have to review and identify regions of interest by transforming data into compact information descriptors that characterize simulation results and allow detailed analysis on demand. Among the many possible feature descriptors, statistical information derived from data samples is a promising way to tame the big data avalanche, because data distributions computed from a population can compactly describe the presence and characteristics of salient features with minimal data movement. Summarizing and processing data computationally as distributions captures the information content of a large-scale data set efficiently and representatively. In the GRAVITY lab, we aim to develop novel, compact distribution-based data representations that, on the one hand, significantly reduce the overall data size through statistical summarization and, on the other hand, allow compact and efficient stochastic data analysis and visualization for feature discovery.
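To make the idea of a distribution-based representation concrete, the following is a minimal sketch, not the lab's actual method. It assumes a 3D scalar field stored as a NumPy array, splits it into fixed-size bricks, and replaces each brick with a normalized histogram. The function names `summarize_blocks` and `probability_in_range`, the brick size, the bin count, and the value range are all hypothetical choices made for illustration.

```python
import numpy as np


def summarize_blocks(volume, block=8, bins=16, value_range=(0.0, 1.0)):
    """Replace each block^3 brick of a 3D field with a normalized histogram.

    Hypothetical helper: brick size, bin count, and value range are
    illustrative assumptions, not values taken from the source text.
    """
    nz, ny, nx = (s // block for s in volume.shape)
    # Drop ragged edges and clamp values so every sample lands in some bin.
    v = np.clip(volume[:nz * block, :ny * block, :nx * block], *value_range)
    # Rearrange so that the samples of each brick lie along the last axis.
    bricks = (v.reshape(nz, block, ny, block, nx, block)
               .transpose(0, 2, 4, 1, 3, 5)
               .reshape(nz, ny, nx, -1))
    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    hists = np.empty((nz, ny, nx, bins))
    for idx in np.ndindex(nz, ny, nx):
        counts, _ = np.histogram(bricks[idx], bins=edges)
        hists[idx] = counts / counts.sum()
    return hists, edges


def probability_in_range(hists, edges, lo, hi):
    """Per-brick probability that a sample falls in [lo, hi).

    Approximation: a bin counts as inside when its center lies in the range.
    """
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = (centers >= lo) & (centers < hi)
    return hists[..., mask].sum(axis=-1)
```

With these illustrative settings, each brick of 8 × 8 × 8 = 512 samples is stored as 16 bin probabilities. A query such as `probability_in_range(hists, edges, 0.8, 1.0)` then flags bricks likely to contain high values using only the summaries, so the full-resolution data need only be fetched for those regions.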