Welcome to my home page!

Short Bio

I am a Postdoctoral Researcher at the Cluster of Excellence on Multimodal Computing and Interaction (MMCI), Saarland University. In addition, I am affiliated as a Researcher with the Database and Information Systems (D5) group of the Max Planck Institute for Informatics (MPI).

Before joining MMCI and MPI, I obtained my Doctor of Engineering (PhD) degree on February 6th, 2015 from the Institute for Program Structures and Data Organization (IPD) - Chair Prof. Böhm, Karlsruhe Institute of Technology (KIT).

Before joining KIT, I obtained my Master of Engineering and Bachelor of Engineering degrees from the Nanyang Technological University (NTU).

Research Interests: I am broadly interested in large-scale data analysis. Currently, I focus on time series analysis, correlation analysis, multi-view learning, transfer learning, and subgroup discovery.


Recent Awards & Honors

(For every award I have received, I am deeply grateful to everyone who helped me achieve it...)
  • Ohio State University (OSU) University Fellowship (to pursue PhD study at OSU), 2012
  • University of Southern California (USC) Provost's PhD Fellowship (to pursue PhD study at USC), 2011
  • Best Student Paper Runner-up award in DASFAA 2011
  • Best Student Paper award in DASFAA 2010
  • Paper nominated for the Best Paper award in WISE 2008
  • Gold Medal & Book Prize, Information Technology Management Association, 2008
    (for the student with the best performance in the Final Year Project of the School of Computer Engineering)
  • Book Prize, Media Development Authority, 2008
    (for the student with the best performance in general proficiency during the final year of the School of Computer Engineering)
  • Nominee of the Professional Engineers Board Gold Medal, Singapore, 2008
    (for students graduating with First Class Honors and a strong record of extracurricular activities)
  • Dean's List (top 5% of students), NTU, 2005-2008
  • Undergraduate Research Experience on Campus Awards, NTU, 2005-2007

Publications (Google Scholar and DBLP)

2016

  1. Linear-time Detection of Non-Linear Changes in Massively High Dimensional Time Series
    Hoang-Vu Nguyen and Jilles Vreeken
    The 16th SIAM International Conference on Data Mining (SDM), 2016. (acceptance rate 25.9%)

  2. Flexibly Mining Better Subgroups
    Hoang-Vu Nguyen and Jilles Vreeken
    The 16th SIAM International Conference on Data Mining (SDM), 2016. (acceptance rate 25.9%)

  3. Universal Dependency Analysis
    Hoang-Vu Nguyen, Panagiotis Mandros, and Jilles Vreeken
    The 16th SIAM International Conference on Data Mining (SDM), 2016. (acceptance rate 25.9%)

2015

  1. Non-parametric Methods for Correlation Analysis in Multivariate Data with Applications in Data Mining
    Hoang-Vu Nguyen
    PhD Dissertation, Fakultät für Informatik, Karlsruher Institut für Technologie

  2. Non-Parametric Jensen-Shannon Divergence
    Hoang Vu Nguyen and Jilles Vreeken
    The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2015. (acceptance rate 23.2%)

  3. Identifying User Interests within the Data Space - A Case Study with SkyServer
    Hoang-Vu Nguyen, Klemens Böhm, Florian Becker, Bertrand Goldman, Georg Hinkel, and Emmanuel Müller
    The 18th International Conference on Extending Database Technology (EDBT), 2015.

2014

  1. A Near-Linear Time Subspace Search Scheme for Unsupervised Selection of Correlated Features
    Hoang-Vu Nguyen, Emmanuel Müller, and Klemens Böhm
    Big Data Research (BDR), Elsevier, 2014.

  2. Multivariate Maximal Correlation Analysis
    Hoang-Vu Nguyen, Emmanuel Müller, Jilles Vreeken, Pavel Efros, and Klemens Böhm
    The 31st International Conference on Machine Learning (ICML), 2014. (acceptance rate 25.0%)

  3. Unsupervised Interaction-Preserving Discretization of Multivariate Data
    Hoang-Vu Nguyen, Emmanuel Müller, Jilles Vreeken, and Klemens Böhm
    Data Mining and Knowledge Discovery (DAMI), Springer, 2014. (Impact Factor: 2.877)

  4. Detecting Correlated Columns in Relational Databases
    Hoang-Vu Nguyen, Emmanuel Müller, Periklis Andritsos, and Klemens Böhm
    The 26th International Conference on Scientific and Statistical Database Management (SSDBM), 2014.

2013

  1. 4S: Scalable subspace search scheme overcoming traditional Apriori processing
    Hoang-Vu Nguyen, Emmanuel Müller, and Klemens Böhm
    The 1st IEEE International Conference on Big Data (BigData), 2013. (full paper acceptance rate 17.0%)

  2. CMI: An Information-Theoretic Contrast Measure for Enhancing Subspace Cluster and Outlier Detection
    Hoang-Vu Nguyen, Emmanuel Müller, Jilles Vreeken, Fabian Keller, and Klemens Böhm
    The 13th SIAM International Conference on Data Mining (SDM), 2013. (oral presentation acceptance rate 14.4%)

2011

  1. An Unbiased Distance-based Outlier Detection Approach for High-dimensional Data
    Best Student Paper Runner-up Award
    Hoang-Vu Nguyen, Vivekanand Gopalkrishnan, and Ira Assent
    The 16th International Conference on Database Systems for Advanced Applications (DASFAA), 2011. (full paper acceptance rate 23.6%)

2010

  1. Outlier Detection Based On Neighborhood Proximity
    Hoang-Vu Nguyen
    Dissertation (Master Thesis), School of Computer Engineering, Nanyang Technological University, 2010.

  2. Mining Outliers with Ensemble of Heterogeneous Detectors on Random Subspaces
    Best Student Paper Award
    Hoang-Vu Nguyen, Hock Hee Ang, and Vivekanand Gopalkrishnan
    The 15th International Conference on Database Systems for Advanced Applications (DASFAA), 2010. (full paper acceptance rate 23.2%)

  3. Feature Extraction for Outlier Detection in High-Dimensional Spaces
    Hoang-Vu Nguyen and Vivekanand Gopalkrishnan
    The 4th International Workshop on Feature Selection in Data Mining (FSDM), 2010.

2009

  1. Efficient Pruning Schemes for Distance-Based Outlier Detection
    Hoang-Vu Nguyen and Vivekanand Gopalkrishnan
    The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2009. (acceptance rate 25.1%)

  2. On Scheduling Data Loading and View Maintenance in Soft Real-time Data Warehouses
    Hoang-Vu Nguyen and Vivekanand Gopalkrishnan
    The 15th International Conference on Management of Data (COMAD), 2009.

2008

  1. Online Outlier Detection Based on Relative Neighbourhood Dissimilarity
    Nominated for Best Paper Award
    Hoang Vu Nguyen, Vivekanand Gopalkrishnan, and Praneeth Namburi
    The 9th International Conference on Web Information Systems Engineering (WISE), 2008. (full presentation acceptance rate 12.7%)

Informal Publications

2015

  1. Linear-time Detection of Non-Linear Changes in Massively High Dimensional Time Series
    Hoang-Vu Nguyen and Jilles Vreeken
    arXiv 1510.08385, 2015.

  2. Canonical Divergence Analysis
    Hoang-Vu Nguyen and Jilles Vreeken
    arXiv 1510.08370, 2015.

  3. Universal Dependency Analysis
    Hoang-Vu Nguyen, Panagiotis Mandros, and Jilles Vreeken
    arXiv 1510.08389, 2015.

  4. Flexibly Mining Better Subgroups
    Hoang-Vu Nguyen and Jilles Vreeken
    arXiv 1510.08382, 2015.