Abstracts for the afternoon speakers

Hunter Glanz, Boston University

Title: High Dimensional Inference in Remote Sensing Using MODIS Data

Abstract:

High dimensionality and large proportions of missing data present significant challenges for algorithms that use remote sensing to classify land cover and detect land cover changes. Time series images are increasingly being used to monitor and map land cover. In this talk we present an approach that uses PCA to reduce the dimensionality of multitemporal MODIS data. The goal of this method is to reduce the spectral dimensionality in a way that retains the temporal variance properties of different land cover classes. As part of our approach we also present a method to impute missing data and reliably perform pixel-wise land cover classification using a MAP estimator. The PCA-based approach we describe successfully reduced the dimensionality of the multispectral and multitemporal data: 91% of the variance in the 196 input bands was captured by 3 components. The retained PCs successfully capture spectral-temporal features that distinguish the MODIS land cover classes from one another. Because spatial information can help inform the classification, we extend the previous setup by developing a graphical model on the full lattice of pixels using spanning trees. Inference, in this scenario, is made via the centroid estimator. To evaluate our approach we use a data set composed of MODIS data covering approximately 500 square kilometers near Montreal, Canada.
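The dimensionality-reduction step can be illustrated with a small synthetic sketch. The data, the sizes, and the 3-component choice below are stand-ins for the MODIS setup (196 bands, 3 retained components), not the authors' actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for multitemporal spectral data:
# 2000 "pixels", 196 features (e.g. 7 bands x 28 time steps),
# generated from 3 latent seasonal profiles plus noise.
n_pixels, n_features, n_latent = 2000, 196, 3
latent = rng.normal(size=(n_pixels, n_latent))
loadings = rng.normal(size=(n_latent, n_features))
X = latent @ loadings + 0.1 * rng.normal(size=(n_pixels, n_features))

# PCA via SVD of the centered data matrix
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)

# Variance captured by the first 3 principal components
print(f"first 3 PCs capture {explained[:3].sum():.1%} of the variance")

# Project pixels onto the retained components for downstream classification
scores = Xc @ Vt[:3].T          # shape (2000, 3)
```

The projected scores, rather than the raw bands, would then feed the pixel-wise classifier.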

Prakash Balachandran, Boston University

Title: Inference of Network Summary Statistics through Network Denoising

Abstract:

Consider observing an undirected network that is ‘noisy’ in the sense that there are Type I and Type II errors in the observation of edges. Such errors can arise, for example, in the context of inferring gene regulatory networks in genomics or functional connectivity networks in neuroscience. Given a single observed network, then, to what extent are summary statistics for that network representative of their analogues for the true underlying network? Can we infer such statistics more accurately by taking into account the noise in the observed network edges?

In this paper, we answer both of these questions. In particular, we develop a spectral-based methodology using the adjacency matrix to ‘denoise’ the observed network data and produce more accurate inference of the summary statistics of the true network. We characterize its performance through bounds on appropriate notions of risk in the L2 sense, and conclude by illustrating the practical impact of this work on synthetic and real-world data.
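The flavor of the approach can be sketched on a toy example. The low-rank-projection-plus-threshold rule below is a minimal illustration of spectral denoising of an adjacency matrix, not the authors' estimator, and the two-clique truth and 10% flip rate are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# 'True' network: two 30-node cliques (a clean low-rank structure)
n, k = 60, 2
z = np.repeat([0, 1], n // 2)
A_true = (z[:, None] == z[None, :]).astype(float)
np.fill_diagonal(A_true, 0)

# Observed network: each edge indicator flipped with prob 0.1
# (Type I errors add spurious edges, Type II errors delete real ones)
flips = np.triu(rng.random((n, n)) < 0.1, 1)
flips = (flips | flips.T).astype(float)
A_obs = np.abs(A_true - flips)

# Spectral denoising sketch: keep the k leading eigenpairs of the
# observed adjacency matrix, then threshold the reconstruction at 1/2
w, V = np.linalg.eigh(A_obs)
top = np.argsort(np.abs(w))[-k:]
A_hat = (V[:, top] * w[top]) @ V[:, top].T
A_den = (A_hat > 0.5).astype(float)
np.fill_diagonal(A_den, 0)

# Hamming error of the edge sets, before vs after denoising
err_obs = int(np.sum(A_obs != A_true)) // 2
err_den = int(np.sum(A_den != A_true)) // 2
print(f"edge errors: observed {err_obs}, denoised {err_den}")
```

Summary statistics (degrees, edge counts, clustering) computed from the denoised matrix then inherit its lower error.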

This is joint work with Eric Kolaczyk and Edo Airoldi.

Tomonari Sei, Keio University

Title: Infinitely imbalanced binomial regression and deformed exponential families

Abstract:

The logistic regression model is known to converge to a Poisson point process model as the binary response becomes infinitely imbalanced. In this talk, we show that this phenomenon is universal in a wide class of link functions for binomial regression. The intensity measure of the point process becomes a deformed exponential family. The proof relies on extreme value theory.
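The special case of the logistic link illustrates the limit (a standard heuristic; the talk establishes the result for a general class of links):

$$\Pr(Y = 1 \mid x) \;=\; \frac{e^{\alpha + \beta^\top x}}{1 + e^{\alpha + \beta^\top x}} \;\sim\; e^{\alpha + \beta^\top x} \qquad (\alpha \to -\infty),$$

so the rare positive responses behave like independent Bernoulli trials with vanishing success probabilities, and their covariate locations converge to a Poisson point process whose intensity measure is proportional to $e^{\beta^\top x}\,dF(x)$, an exponential tilting of the covariate distribution $F$. For other link functions, the tilt is replaced by a deformed exponential family.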

Ryoichi Suzuki, Keio University

Title: A Clark-Ocone type formula under change of measure for Lévy processes and related topics

Abstract:

The Clark-Ocone formula is an explicit stochastic integral representation for random variables in terms of Malliavin derivatives. In this talk, we introduce a Clark-Ocone type formula under change of measure (COCM) for Lévy processes with L2 Lévy measure. To show the COCM for L2 Lévy processes, we develop Malliavin calculus for Lévy processes, based on Geiss and Laukkarinen (2011). By using the σ-finiteness of the Lévy measure, we obtain a commutation formula for Lebesgue integration and the Malliavin derivative, as well as a chain rule for the Malliavin derivative. These formulas yield the COCM. Moreover, we introduce some further formulas for the Malliavin derivative. Finally, we obtain a log-Sobolev type formula for Lévy functionals.
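For orientation, the classical Clark-Ocone formula for a Brownian motion $W$ on $[0,T]$, with filtration $(\mathcal{F}_t)$ and Malliavin derivative $D$, represents any Malliavin-differentiable functional $F$ as

$$F \;=\; \mathbb{E}[F] \;+\; \int_0^T \mathbb{E}\bigl[D_t F \mid \mathcal{F}_t\bigr]\, dW_t .$$

In the Lévy setting the representation acquires an additional integral against the compensated jump measure, and a change of measure modifies the integrands; the COCM of the talk concerns this more general setting.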

Tomoaki Imoto, Keio University

Title: A flexible distribution for modeling under- and over-dispersion

Abstract:

The Conway-Maxwell-Poisson (COM-Poisson) distribution with two parameters was originally developed as a solution for handling queueing systems with state-dependent arrival or service rates. This distribution generalizes the Poisson distribution by adding a parameter to model over-dispersion and under-dispersion; it includes the geometric distribution as a special case and the Bernoulli distribution as a limiting case. In this talk, we propose a generalized COM-Poisson (GCOM-Poisson) distribution with three parameters, which includes the negative binomial distribution as a special case and can become a longer-tailed model than the COM-Poisson distribution. The new parameter controls the length of the tail. The GCOM-Poisson distribution can also be bimodal, with one mode at zero, and is therefore applicable to count data with excess zeros. Estimation methods for the GCOM-Poisson distribution are also discussed.
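The dispersion behavior of the two-parameter COM-Poisson baseline is easy to check numerically from its well-known pmf, $P(X = k) \propto \lambda^k / (k!)^\nu$ (the truncation point and parameter values below are arbitrary choices for the sketch; the three-parameter GCOM-Poisson of the talk is not implemented here):

```python
import numpy as np
from math import lgamma

def com_poisson_pmf(lam, nu, kmax=200):
    """COM-Poisson pmf P(X=k) proportional to lam^k / (k!)^nu, truncated at kmax."""
    k = np.arange(kmax + 1)
    logw = k * np.log(lam) - nu * np.array([lgamma(i + 1.0) for i in k])
    w = np.exp(logw - logw.max())        # stabilize before normalizing
    return w / w.sum()

def dispersion(lam, nu):
    """Variance-to-mean ratio; equals 1 for the Poisson case (nu = 1)."""
    p = com_poisson_pmf(lam, nu)
    k = np.arange(len(p))
    mean = np.sum(k * p)
    var = np.sum((k - mean) ** 2 * p)
    return var / mean

print(dispersion(4.0, 1.0))   # ~1: Poisson
print(dispersion(4.0, 2.0))   # < 1: under-dispersion
print(dispersion(4.0, 0.5))   # > 1: over-dispersion
```

Setting nu = 0 with lam < 1 recovers the geometric special case mentioned above.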

Ian Johnston, Boston University

Title: Hierarchical gene-proximity models for GWAS

Abstract:

Motivated by the important problem of detecting association between genetic markers and binary traits in genomewide association studies, we present a novel Bayesian model that establishes a hierarchy between markers and genes by defining weights according to gene lengths and distances from genes to markers. The proposed hierarchical model uses these weights to define unique prior probabilities of association for markers based on their proximities to genes that are believed to be relevant to the trait of interest. We use an Expectation-Maximization algorithm in a filtering step to first reduce the dimensionality of the data and then sample from the posterior distribution of the model parameters to calculate estimates of the posterior probabilities of association for the markers. We offer practical and meaningful guidelines for the selection of the model tuning parameters and propose a pipeline that exploits a singular value decomposition on the raw data to make it feasible to run our model efficiently on large data sets. We demonstrate the performance of the model in a simulation study and conclude by discussing the results of a case study using a real dataset provided by the Wellcome Trust Case Control Consortium.
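The gene-proximity weighting idea can be sketched as follows. The abstract specifies only that weights depend on gene lengths and gene-marker distances; the exponential-decay form, the scale, and the positions below are illustrative assumptions, not the model's actual weight function:

```python
import numpy as np

# Hypothetical chromosome coordinates (base pairs) for the sketch
genes = np.array([            # (start, end) positions of two genes
    [1_000, 6_000],           # a long gene
    [20_000, 21_000],         # a short gene
])
markers = np.array([500, 5_500, 12_000, 20_500])   # marker positions

def proximity_weight(pos, genes, scale=5_000.0):
    """Weight grows with gene length and decays with distance to the gene.
    (Illustrative functional form, not the authors' specification.)"""
    starts, ends = genes[:, 0], genes[:, 1]
    # distance is 0 when the marker lies inside the gene
    dist = np.maximum(0, np.maximum(starts - pos, pos - ends))
    length = ends - starts
    return np.sum(length * np.exp(-dist / scale))

w = np.array([proximity_weight(p, genes) for p in markers])
prior = w / w.sum()           # relative prior probabilities of association
print(prior.round(3))
```

Markers inside or near long genes receive the largest prior probabilities of association, which is the hierarchy the model encodes.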