`Data are widely available, what is scarce is the ability to extract wisdom from them’
-- Hal Varian, Google’s chief economist
We live in an era of data deluge. Pervasive media collect massive amounts of data in a wide variety of formats. A major portion corresponds to data residing on networks representing a wide range of physical, biological, and social interdependencies. For instance, we as Facebook users happily feed 10 billion messages per day, click the ‘like’ button 4.5 billion times, and upload 350 million new pictures each and every day. Learning from these large volumes of network data is expected to bring significant science and engineering advances.
The overarching theme of my research is on algorithms, analysis, and application of machine learning, optimization and statistical signal processing tools to data science and network science. Past and current research includes scalable nonlinear and tensor-based learning from high-dimensional (network) data, which finds exciting applications in understanding the structure and dynamics of social-, biological-, and financial-systems.
My doctoral research work focuses on three major challenges for data analytics and network science: massive size, unknown nonlinearity, and unknown dynamics. In addition to theoretical foundations, my research has pursued the following thrusts:
T1. Scalable online learning adaptive to unknown dynamics
- Online subspace learning for big categorical data with misses
- Online functional approximation in environments with unknown dynamics
T2. Graph topology identification and tracking
- Nonlinear topology identification and tracking
- Tensor based topology identification and tracking with missing observations.
T3. Scalable learning adaptive to graphs
- Graph-aware dimensionality reduction
- Online Graph-adaptive learning with scalability and privacy
My research work has contributed to part of three tutorial talks in international conferences
 “Learning Nonlinear and Dynamic Connectivity and Processes over Graphs”
- Intl. Conf. on Acoust., Speech, and Signal Processing, May 12-17, 2019, Brighton, United Kingdom. (co-presenter with G. B. Giannakis).
 “Resilient and Scalable Interactive Learning”
- IEEE Conf. on Military Communications, October, 28-31 2018, Los Angeles, US. (co-presenter with Prof. G. B. Giannakis).
 "Interactive Learning for Optimizing IoT Management"
- IEEE Global Communications Conference 9-13 December 2018, Abu Dhabi, UAE
T1. Scalable and adaptive online learning for big streaming data
Major challenges encountered with large-scale datasets is that they are often incomplete; they are prone to outlying measurements, and may contain categorical attributes. Furthermore, as information sources unceasingly produce data in real time, analytics must often be performed on-the-fly, typically without a chance to reconsider previous data. Extracting latent low-dimensional structure from high-dimensional data is of paramount importance in timely inference tasks encountered with big data. The sheer volume of data and the fact that observations are acquired sequentially over time motivate updating previously obtained estimates, rather than recomputing new ones from scratch, each time a new datum becomes available. Moreover, simple linear models may not be sufficient to accomplish the complex nonlinear learning tasks. Hence, in this thrust, I focused on two questions:
Q1) How to extract low-dimensional subspace from incomplete streaming data generated with nonlinear models?
Q2) How to learn general nonlinear functions in environments with unknown dynamics?
- Online Subspace learning for streaming categorical data with misses
I first developed a new framework to efficiently track the latent low-dimensional structures from incomplete and corrupt datasets with categorical features, that typically encountered in practice, e.g., movie ratings, gene expressions. The proposed framework encompasses several fundamental learning tasks including imputation, clustering, and classification. The novel methods were tested over using real MovieLens and MINST datasets. Performance of the subspace iterates was analyzed in terms of asymptotic convergence and sublinear regret bounds for infinite and finite data streams respectively.
Y. Shen, M. Mardani, and G. B. Giannakis, "Online Categorical Subspace Learning for Sketching Big Data with Misses," IEEE Transactions on Signal Processing, vol. 65, no. 15, pp. 4004-4018, August 2017. [pdf]
Y. Shen, M. Mardani, and G. B. Giannakis, "Online Sketching of Big Categorical Data with Absent Features," Proc. of Conf. on Info. Sciences and Systems, Baltimore, MD, March 18-20, 2015.
Y. Shen and G. B. Giannakis, "Online Dictionary Learning for Large-Scale Binary Data," Proc. of EUSIPCO, Sept. 2016. [pdf]
- Adaptive nonlinear functional approximation in environments with unknown dynamics
Major challenges in the area of nonlinear functional approximation include scalability, unknown dynamics and the choice of functional form. To this end, I developed a scalable, adaptive, and data-driven learning scheme to obtain the sought function ‘on the fly.’ I further analyzed its performance in terms of both static and dynamic regrets that quantitatively assess its tracking capability in environments with unknown and possibly adversarial dynamics. Tests with real datasets showcased the effectiveness of the novel algorithms in tracking temperature, movements, and Twitter data.
Y. Shen, T. Chen, and G. B. Giannakis, "Random Feature-based Online Multi-kernel Learning in Non-stationary and Adversarial Environments," Journal of Machine Learning Research (JMLR), to appear. [pdf]
Y. Shen , T. Chen and G. B. Giannakis, “Online Ensemble Multi-kernel Learning Adaptive to Non-stationary and Adversarial Environments,” Proc. of Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), Lanzarote, Canary Islands, April 9-11, 2018.[pdf]
T2. Graph topology identification and tracking
The study of graphs and interconnected complex agents has recently emerged as a major catalyst for collectively understanding the behavior of complex systems. Topology identification of networks is a prominent task in a gamut of application domains. For example, discovery of causal links between regions of interest in the brain is tantamount to identifying an implicit connectivity network. Studies pertaining to regulations among genes hinge upon identification of unknown links within gene regulatory networks. Most contemporary graph topology inference approaches rely on linear and static models due to their inherent simplicity and tractability, and presume that the nodal processes are directly observable, which may not be feasible in many applications. Therefore, this thrust contributed answers to the following two questions:
Q1) How to model and adaptively learn nonlinear interactions in networks?
Q2) How to identify and track dynamic networks with partial observations?
Results in this direction are summarized in our overview paper published in Proceedings of the IEEE.
G. B. Giannakis, Y. Shen, and G. V. Karanikolas, "Topology Identification and Learning over Graphs: Accounting for Nonlinearities and Dynamics," Proceedings of the IEEE, vol. 106, no. 5, pp. 787-807, May 2018. [pdf]
- Nonlinear network topology inference
In order to model nonlinear iterations, we advocated novel nonlinear data-driven topology inference approaches that could account for the nonlinear dependencies among nodes and across time, the proposed method can adaptively learn the nonlinearity from the data. Experiments on a real gene expression dataset, as well as brain data sets unveil interesting new edges that were not revealed by linear models, which could shed more light on regulatory behavior of human genes, and brain dynamics at the onset of seizures. In addition, I put forth an online kernel-based method to track time-varying and nonlinear dependencies.
Y. Shen, B. Baingana, and G. B. Giannakis, "Nonlinear Structural Vector Autoregressive Models for Inferring Effective Brain Network Connectivity," IEEE Transactions on Signal Processing, submitted September 2018. [pdf]
Y. Shen, B. Baingana, and G. B. Giannakis, "Kernel-based structural equation models for topology identification of directed networks," IEEE Trans. on Sig. Processing, vol. 65, no. 10, pp. 2503-2516, May 2017.[pdf]
Y. Shen and G. B. Giannakis, “Online Nonlinear Sparse Structural Vector AR models for Identifying Dynamic Graph Topologies,” Proc. of Data Science Workshop, Lausanne, Switzerland, June 4-6, 2018. [pdf]
- Tensor-based network topology inference with missing nodal measurements
To cope with nodal processes that may not be directly observable in certain applications, due to e.g., privacy, I developed tensor-based algorithms that only require statistics of nodal processes. Capitalizing on the uniqueness properties inherent to high-order tensor factorizations, I established conditions for exact recovery of graph topology. To facilitate real-time operation and inference of time-varying networks, I further developed an adaptive tensor decomposition based scheme, which tracks the topology-revealing tensor factors. Extensive testson real stock quote data demonstrated the merits of the novel tensor-based approach.
Y. Shen, B. Baingana, and G. B. Giannakis, "Tensor Decompositions for Identifying Directed Graph Topologies and Tracking Dynamic Networks," IEEE Transactions on Signal Processing, vol. 65, no. 14, pp. 3675 - 3687, July 2017.[pdf]
Y. Shen, B. Baingana, and G. B. Giannakis, "Tracking dynamic piecewise-constant network topologies via adaptive tensor factorization," Proc. of Globalsip, Washington, DC, Dec. 7-9, 2016. (SPS Travel Grant Award)
Y. Shen, X. Fu, G. B. Giannakis, and N. D. Sidiropoulos, "Inferring Directed Network Topologies via Joint Diagonalization," Proc. of Asilomar Conf., Pacific Grove, CA, Oct. 29 - Nov. 1, 2017.
T3. Graph-adaptive nonlinear learning with Scalability and Privacy
With insights gained from T1 and T2, I further envision broadening the theoretical and algorithmic framework to online scalable learning over networks, by answering the following two intertwined questions:
Q1) How can various machine learning tasks benefit from incorporating structural information encoded by graphs?
Q2) How can adaptive online learning scale up learning over dynamic graphs?
- Graph-aware learning
With the adjacency matrices at hand, this set of work studies how various learning tasks can benefit from incorporating structural information encoded by graphs. Our recent results have revealed the benefits of considering the underlying network topologies for classification and clustering, as well as dimensionality reduction of data observed over graphs. Prominent tests show that our method markedly improve recognition accuracy on face datasets.
Y. Shen, P. A. Traganitis, and G. B. Giannakis, "Graph-Adaptive Nonlinear Dimensionality Reduction," IEEE Transactions on Signal Processing, submitted March 2018. [pdf]
Y. Shen, P. A. Traganitis, and G. B. Giannakis, "Nonlinear Dimensionality Reduction on Graphs," Proc. of CAMSAP Conf., Curacao, Dutch Antilles, Dec. 10-13, 2017. (Best Student Paper Finalist)
J. Chen, G. Wang, Y. Shen, and G. B. Giannakis, "Canonical Correlation Analysis of Datasets with a Common Source Graph," IEEE Transactions on Signal Processing, vol. 66, no. 16, pp. 4398-4408, August 2018. [pdf]
- Adaptive learning over graphs with misses/ perturbations
Real-world networks may have very large size, and nodal attributes can be unavailable to a number of nodes. To this end, we explored joint inference of topologies and dynamic processes over graphs. Tests over gene regulatory networks, and financial networks show that the novel algorithm can track both topologies and signals over networks with partial observations, or with perturbation in graph measurements.
*V. N. Ioannidis, Y. Shen, and G. B. Giannakis, "Semi-Blind Inference of Topologies and Dynamical Processes over Graphs," IEEE Transactions on Signal Processing, submitted May 2018, revised October, 2018. [pdf]
*V. N. Ioannidis, Y. Shen, and G. B. Giannakis, ”Semi-blind Inference of Topologies and Signals over Graphs,” Proc. of Data Science Workshop, Lausanne, Switzerland, June 4-6, 2018. (SPS Travel Grant Award)
*E. Ceci, Y. Shen, G. B. Giannakis, and S. Barbarossa, “Signal and Graph Perturbations via Total Least Squares,” Proc. of Asilomar Conf. on Sig., Systems, and Computers , Pacific Grove, CA, October 28-31, 2018.
*E. Eeci and V. N.Ioannidis are two junior PhD students I mentored.
- Privacy-Preserving Adaptive learning over growing graphs
Existing inference methods over graphs typically assume that the network size is fixed. However, new nodes can emerge over time, which necessitates real time analytics of growing network data. In order to cope with newly-joining nodes, and to meet potential privacy requirements, I cast inference of nodal attributes as an online function learning task, and developed an adaptive privacy-preserving approach that is scalable to large-size dynamic networks. The novel method is capable of effectively providing real-time evaluation of growing networks with newly-joining nodes without resorting to a batch solver. In addition, the novel scheme only relies on an encrypted version of each node’s connectivity in order to learn the nodal attributes, which promotes privacy. Regret analysis, as well as experiments on real email and citation network datasets corroborate the effectiveness of the proposed methods.
Y. Shen, G. Leus, and G. B. Giannakis, "Online Graph-Adaptive Learning with Scalability and Privacy," IEEE Transactions on Signal Processing, submitted August 2018, revised December 2018. [pdf]
Y. Shen, G. Leus, and G. B. Giannakis, "Scalable Learning with Privacy over Dynamic graphs," Proc. of Intl. Conf. on Artificial Intell. and Stat. (AISTATS), Naha, Japan, April 2019. (in review)
M. Sc. research work
I have also worked on theoretical analysis and algorithm design of Compressive Sensing, and Sparse Bayesian learning in my M.Sc. study, which resulted in 6 journal papers, and 6 conference papers. Please check out my Publications for more details.