Talk Abstracts

10:10 黃信雄

Matrix variate regression models have been studied in many existing works, but classical statistical and computational methods for estimating the regression coefficients are strongly affected by ultrahigh-dimensional matrix-valued predictors. To address this issue, this paper proposes a framework of matrix variate regression methods based on a rank-constrained optimization problem solved by an alternating gradient descent algorithm. In particular, we consider three low-rank matrix variate regression models (ordinary matrix regression, robust matrix regression, and matrix logistic regression), and we establish the convergence property and statistical consistency of the proposed estimator under each of them. The rank constraint effectively reduces the number of parameters in the model; as a result, our method has a better theoretical consistency rate than existing regularization-based methods. The experimental results show that the proposed algorithms are effective and efficient under various settings.
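As a rough illustration of the idea only, and not the speaker's actual algorithm, the ordinary (least-squares) case can be sketched by writing the coefficient matrix as B = U V^T with a fixed rank and taking alternating gradient steps on U and V. The function names, the small random initialization, the step size, and the iteration count below are all assumptions made for this sketch.

```python
import numpy as np

def predict(X, U, V):
    """Return <X_i, U V^T> for each matrix predictor X_i; X has shape (n, p, q)."""
    return np.einsum("ipq,pq->i", X, U @ V.T)

def fit_low_rank_regression(X, y, rank, step=0.01, n_iter=2000, seed=0):
    """Alternating gradient descent on (1/2n) * sum_i (y_i - <X_i, U V^T>)^2.

    Hypothetical helper for illustration: the rank constraint is enforced
    by the factorization B = U V^T, with U of shape (p, rank) and V of
    shape (q, rank). Step size and initialization are untuned assumptions.
    """
    rng = np.random.default_rng(seed)
    n, p, q = X.shape
    U = 0.1 * rng.standard_normal((p, rank))
    V = 0.1 * rng.standard_normal((q, rank))
    for _ in range(n_iter):
        # Gradient with respect to B is G = (1/n) * sum_i r_i X_i;
        # the chain rule gives G V for U and G^T U for V.
        G = np.einsum("i,ipq->pq", predict(X, U, V) - y, X) / n
        U = U - step * G @ V
        # Recompute the gradient with the updated U before stepping in V.
        G = np.einsum("i,ipq->pq", predict(X, U, V) - y, X) / n
        V = V - step * G.T @ U
    return U, V

# Small noiseless simulation with a rank-2 coefficient matrix.
rng = np.random.default_rng(0)
n, p, q, r = 400, 8, 6, 2
B_true = rng.standard_normal((p, r)) @ rng.standard_normal((q, r)).T
X = rng.standard_normal((n, p, q))
y = np.einsum("ipq,pq->i", X, B_true)

U_hat, V_hat = fit_low_rank_regression(X, y, rank=r)
B_hat = U_hat @ V_hat.T
print("relative error:", np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true))
```

The factorization uses (p + q) * rank free parameters instead of p * q, which is the parameter reduction the abstract attributes to the rank constraint.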

11:10 陳光亮

Spectral clustering has emerged as a very effective clustering approach; however, it is quite slow on large data sets. As a result, there has been considerable effort in the machine learning community to develop fast, approximate spectral clustering algorithms that are scalable to large data. Notably, most of those methods use a small set of landmark points selected from the given data. In this talk we present two new landmark-based scalable spectral clustering algorithms that are developed based on novel document-term and bipartite graph models. We demonstrate the superior performance of our proposed algorithms by comparing them with the state-of-the-art methods on some benchmark data sets. Finally, we provide a unified view of all the old and new landmark-based spectral clustering methods.
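For orientation, a generic landmark-based scheme in the bipartite-graph style can be sketched as follows. This is a textbook-style sketch and not the two algorithms presented in the talk: the uniform random landmark selection, the Gaussian affinity with bandwidth `sigma`, the small hand-written k-means loop, and all function and parameter names are assumptions.

```python
import numpy as np

def landmark_spectral_clustering(X, k, n_landmarks=50, sigma=1.0, n_iter=50, seed=0):
    """Cluster the rows of X into k groups using an n x m point-landmark graph.

    Hypothetical helper for illustration; cost is driven by the n x m
    affinity matrix rather than a full n x n one.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Landmarks: a uniform random subsample (k-means centers are another option).
    landmarks = X[rng.choice(n, size=n_landmarks, replace=False)]
    # Gaussian affinities between points and landmarks: bipartite edge weights.
    sq_dist = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Degree-normalize both sides of the bipartite graph.
    d_rows = A.sum(axis=1)
    d_cols = A.sum(axis=0)
    A_norm = A / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]
    # The top-k left singular vectors give the spectral embedding of the points.
    U, _, _ = np.linalg.svd(A_norm, full_matrices=False)
    emb = U[:, :k]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Plain Lloyd iterations (k-means) on the embedded points.
    centers = emb[rng.choice(n, size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = emb[labels == j].mean(axis=0)
    return labels

# Two well-separated Gaussian blobs as a toy data set.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(6.0, 0.5, (100, 2))])
labels = landmark_spectral_clustering(X, k=2, n_landmarks=20)
print(np.bincount(labels))
```

Only the thin SVD of the n x m matrix is needed, which is what makes landmark methods scale when the number of landmarks m is much smaller than n.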

13:30 李易儒

The recent increase in demand for data has accelerated the adoption of statistical and computational methods to support precision healthcare. Effectively defining the states of “disease” and “healthy” from complex medical data is one of the key issues in modern medical studies. This talk will introduce two approaches for investigating complex medical images. First, we adopt concepts from Complexity Science, which integrates knowledge from physics, mathematics, and computer science. By quantifying the complexity of a medical image, we can provide a holistic and scale-free image-based evaluation that may serve as a potential biomarker to support clinical decisions. Second, with the growing accumulation of healthcare data, deep learning techniques have had a great impact on clinical workflow efficiency and national policymaking. Using real-world data as an example, we will show how statistical methods may advance typical deep learning training. Ultimately, revealing the underlying messages of complex images from different perspectives may optimize the value of data and create insights for the medical field.

14:30 黃世豪

We consider testing independence between two spatial Gaussian random fields evaluated respectively at p and q locations with sample size n, where p and q are allowed to be larger than n. Our approach is based on canonical correlation analysis (CCA) and does not impose any spatial stationarity or parametric structure on the two random fields. Instead of applying CCA directly, which is not feasible in the high-dimensional setting considered, we adopt a dimension-reduction approach using a special class of multiresolution spline basis functions. These functions are ordered in terms of their degrees of smoothness. By projecting the data onto the function space spanned by a few leading basis functions, the spatial variation of the data can be effectively preserved. The test statistic is constructed from the first sample canonical correlation coefficient in the projected space and is shown to have an asymptotic Tracy-Widom distribution under the null hypothesis. Our proposed method automatically detects the signal between the two random fields and is designed to handle irregularly spaced data directly. In addition, we show that our test is consistent under mild conditions and provide simulation experiments to demonstrate its power. Moreover, we apply our method to investigate the teleconnection between east Africa and the Indian Ocean, and that between west Australia and the North Atlantic Ocean. (Work done jointly with H.-C. Huang, R. S. Tsay, and G. Pan.)


Keywords: canonical correlation analysis, dimension reduction, high-dimensional test, irregularly spaced data, multiresolution spline basis functions, teleconnection, Tracy-Widom distribution.

15:30 林得勝

The talk will start by introducing some fundamental mathematical background for machine learning and methodologies for solving PDEs. Then, a shallow neural network model for solving elliptic interface problems will be explained in detail. Some remaining open questions and future directions will also be raised.


16:30 Free discussion