In this era of "Big Data", as Hal Varian, chief
economist at Google, put it, "the sexy job in the next 10 years will be statisticians."
Here "statisticians" refers not only to people in Statistics
departments but, more broadly, to "Data Scientists", who apply modern
statistical methods to extract knowledge from massive data sets arising in many
fields. Machine Learning, Graphical Models, Computational Bayesian Analysis and
related methods are becoming increasingly popular among data scientists, with
applications in Engineering, Computational Biology, Finance, the Social
Sciences and beyond. Real data are also easy to obtain online, for example from
Kaggle, a well-known website that hosts data analysis competitions using data
sets provided by large companies such as Facebook.
High-dimensional data analysis, one of the most active areas in statistics,
deals with estimation in the "large p, small n" setting, where p is the
dimensionality of the data and n is the sample size. It is well known that such
high-dimensional scaling can lead to dramatic breakdowns in many classical
procedures. Without additional model assumptions, it is frequently impossible
to obtain consistent procedures when p is much larger than n. Accordingly, an
active line of statistical research imposes various restrictions on the model,
for instance sparsity, manifold structure, or graphical model structure, and
then studies the scaling behavior of different estimators as a function of the
sample size n, the ambient dimension p, and additional parameters related to
these structural assumptions. Sparsity is therefore not the only assumption
used in high-dimensional statistics; graphical model structure is another
reasonable one. Learning graphical models is thus important not only in its
own right but also for high-dimensional statistics.
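To make the "large p, small n" breakdown concrete, here is a minimal toy sketch in Python. The design matrix and coefficients are hypothetical values chosen for illustration, not data from any study. With p = 3 predictors but only n = 2 observations, the rank of the design matrix is at most n < p, so it has a nontrivial null space, and least squares cannot single out one coefficient vector: every vector in the family below fits the observations exactly. This non-identifiability is one reason structural assumptions such as sparsity or graphical model structure are imposed.

```python
# Toy illustration (hypothetical data): with p = 3 predictors and only
# n = 2 observations, many coefficient vectors fit the data identically.

def predict(X, beta):
    """Return the matrix-vector product X @ beta for a list-of-rows matrix X."""
    return [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]

# Design matrix: n = 2 rows (observations), p = 3 columns (predictors).
X = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]

# v lies in the null space of X (X @ v = 0); such a nonzero v always
# exists when p > n, because rank(X) <= n < p.
v = [1.0, 1.0, -1.0]

# Pick one coefficient vector and use its fitted values as the response y.
beta_a = [1.0, 2.0, 0.0]
y = predict(X, beta_a)

# Every beta_a + t * v reproduces y exactly (zero residuals),
# so the data alone cannot identify beta.
for t in [0.0, 1.0, -2.5, 10.0]:
    beta_t = [b + t * v_j for b, v_j in zip(beta_a, v)]
    print(t, beta_t, predict(X, beta_t))
```

Changing t moves the coefficients arbitrarily far apart while the fitted values stay the same, which is the simplest form of the failure the paragraph describes.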
In this student seminar, we will focus on Graphical Models, which are popular
both among data scientists in industry and among statisticians in academia.
The topic connects Machine Learning, Bayesian Analysis, High-Dimensional
Statistics and other active areas of data analysis. We will start with the
basics of graphical models and related algorithms, probably for the whole fall
semester, and then move on to the high-dimensional case in the following spring
semester. Along with learning the theory, we will also implement the algorithms
and apply them to real data, which will strengthen our computational and
programming skills. This is an effective way to learn statistical methods.
In addition, our department has strong expertise in this area. Professor
R. V. Ramamoorthi works on Graphical Models. Professor Yuehua Cui is interested
in gene regulatory networks (GRNs), an application of graphical models to
genetics. Professor Ping-Shou Zhong works on high-dimensional inference, and
Professor Lifeng Wang is interested in graph-valued regression analysis.
Professor Yimin Xiao works on Random Fields, which include Markov random fields
(undirected graphical models). Professors Sarat Dass, Tapabrata Maiti and
Chae Young Lim work on spatial data analysis and Bayesian analysis, which are
closely related to Graphical Models. We therefore have excellent academic
resources in our department for studying Graphical Models. The seminar can make
it easier for us to collaborate with our professors, and it can also help new
students choose academic advisors.
The student seminar is also a platform for strengthening
students' presentation skills and encouraging collaboration among students.
Ideally, the seminar will lead to published papers by the time it ends.