Dr. Zhang Zhenjie
Chief Scientist of Nose Labs
Email: zhenjie at nose dot red
I received my B.S. from the Department of Computer Science and Engineering, Fudan University, in 2004, and my Ph.D. in Computer Science from the School of Computing, National University of Singapore, in 2010. My Ph.D. thesis, advised by Prof. Anthony K. H. Tung, is about clustering and unsupervised learning on uncertain data. During my Ph.D., I also worked on other topics related to data analytics, including multi-criteria selection and high-dimensional indexing techniques. I was a senior research scientist with the Advanced Digital Sciences Center from 2010 to 2018. I have also been a contributor to the PChain project since 2017. My research now focuses mainly on distributed databases, parallel streaming analytics, text mining, and large-scale machine learning. I was a recipient of the NUS President's Graduate Fellowship in 2008 and a best paper award winner at IC2E 2013.
Projects
Enabling medical research with differential privacy: the project team includes biomedical researchers from the Genome Institute of Singapore and from NUHS/NUS, along with data mining and security experts from ADSC, I2R, and NTU. The overall plan was for the biomedical researchers to identify the types of analyses where they most wanted to be able to obtain differentially private results, along with the accuracy needed for those results in order for them to be useful. Then the computer scientists would devise a differentially private version of each identified type of analysis, and validate the quality of the results using data provided by the biomedical researchers.
Scalable and Real-Time Analytics for Challenging Data: this project targets distributed stream processing systems, which are relatively easy to deploy, manage, and optimize on cloud platforms. The key driver for specializing this generic framework is the need for scalable, flexible, real-time response at reasonable cost. We design new technologies at all levels of the software stack to enable effective on-the-fly analytics over challenging data, including text, audio, and video streams.
LEO: Learning-based Efficiency Optimization for centralized air-conditioning system: this is an NRF-funded project that aims to optimize the energy efficiency of HVAC systems through machine learning on streaming sensor data and real-time reconfiguration of the chiller plants. It is a 2-year project that started in July 2016.
Recent and Representative Publications
For the complete publication list and citation statistics, please refer to DBLP and Google Scholar.
Yuming Li, Rong Zhang, Xiaoyan Yang, Zhenjie Zhang, Aoying Zhou, "Touchstone: Generating Enormous Query-Aware Test Databases", to appear in USENIX ATC 2018.
Ruichu Cai, Boyan Xu, Zhenjie Zhang, Xiaoyan Yang, Zijian Li, "An Encoder-Decoder Framework Translating Natural Language to Database Queries", to appear in IJCAI 2018.
Jiong He, Yao Chen, Tom Z. J. Fu, Xin Long, Marianne Winslett, Liang You, Zhenjie Zhang, "HaaS: Cloud-based Real-time Data Analytics with Heterogeneity-aware Scheduling", to appear in ICDCS 2018.
Li Wang, Ruichu Cai, Tom Z. J. Fu, Jiong He, Zijie Lu, Marianne Winslett, Zhenjie Zhang, "Waterwheel: Realtime Indexing and Temporal Range Query Processing over Massive Data Stream", in ICDE 2018.
Ruichu Cai, Jie Qiao, Zhenjie Zhang, Zhifeng Hao, "SELF: Structural Equational Likelihood Framework for Causal Discovery", in AAAI 2018.
Hoang Dung Vu, Kok Soon Chai, Bryan Keating, Nurislam Tursynbek, Boyan Xu, Kaige Yang, Xiaoyan Yang, Zhenjie Zhang, "Data Driven Chiller Plant Energy Optimization with Domain Knowledge", in CIKM 2017.
Ruichu Cai, Zhenjie Zhang, Zhifeng Hao, Marianne Winslett, "Sophisticated Merging over Random Partitions: A Scalable and Robust Causal Discovery Approach", to appear in IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
Tom Z. J. Fu, Richard T. B. Ma, Marianne Winslett, Yin Yang, Zhenjie Zhang, "DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams", to appear in IEEE/ACM Transactions on Networking (ToN).
Ruichu Cai, Zijie Lu, Li Wang, Zhenjie Zhang, Tom Z. J. Fu, Marianne Winslett, "DITIR: Distributed Index for High Throughput Trajectory Insertion and Real-time Temporal Range Query" (demo paper), in PVLDB 2017.
Junhua Fang, Rong Zhang, Tom Z. J. Fu, Zhenjie Zhang, Aoying Zhou, Junhua Zhu, "Parallel Stream Processing Against Workload Skewness and Variance", in HPDC 2017.
Deokwoo Jung, Zhenjie Zhang, Marianne Winslett, "Vibration Analysis for IoT Enabled Predictive Maintenance", in ICDE 2017.
Ning Wang, Xiaokui Xiao, Yin Yang, Zhenjie Zhang, Yu Gu, Ge Yu, "PrivSuper: a Superset-First Approach to Frequent Itemset Mining under Differential Privacy", in ICDE 2017.
Zhida Chen, Gao Cong, Zhenjie Zhang, Tom Fu, Lisi Chen, "Distributed Publish/Subscribe Query Processing on the Spatio-Textual Data Stream", in ICDE 2017.
Wenliang Chen, Zhenjie Zhang, Zhenhua Li, Min Zhang, "Distributed Representation for Building Profiles of Users and Items from Text Reviews", in COLING 2016.
Junhua Fang, Rong Zhang, Xiaotong Wang, Tom Fu, Zhenjie Zhang, Aoying Zhou, "Cost-Effective Stream Join Algorithm on Cloud System", in CIKM 2016.
Parijat Mazumdar, Li Wang, Marianne Winslett, Zhenjie Zhang, Deokwoo Jung, "An Index Scheme for Fast Data Stream to Distributed Append-Only Store", in WebDB 2016.
Ganzhao Yuan, Yin Yang, Zhenjie Zhang, Zhifeng Hao, "Semidefinite Optimization for Linear Aggregate Query Processing under Approximate Differential Privacy", in SIGKDD 2016.
Ruichu Cai, Zhenjie Zhang, Zhifeng Hao, Marianne Winslett, "Understanding Social Causalities Behind Human Action Sequences", to appear in IEEE Transactions on Neural Networks and Learning Systems.
Ruichu Cai, Zhenjie Zhang, Srini Parthasarathy, Anthony K. H. Tung, Zhifeng Hao, Wen Zhang, "Multi-Domain Manifold Learning for Drug-Target Interaction Prediction", in SDM 2016.
Jianbing Ding, Zhenjie Zhang, Richard T. B. Ma, Yin Yang, "Abacus: An Auction-Based Approach to Cloud Service Differentiation", in Computer Networks.
Li Wang, Minqi Zhou, Zhenjie Zhang, Ming-Chien Shan, Yin Yang, Aoying Zhou, "Elastic Pipelining in An In-Memory Database Cluster", in SIGMOD 2016.
Tom Fu, Jianbing Ding, Richard T. B. Ma, Marianne Winslett, Yin Yang, Zhenjie Zhang, Yong Pei, Bingbing Ni, "LiveTraj: Real-Time Trajectory Tracking over Live Video Streams", (demo paper), in ACM Multimedia 2015.
Tom Fu, Jianbing Ding, Richard T. B. Ma, Marianne Winslett, Yin Yang, Zhenjie Zhang, "DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams", in ICDCS 2015.
Ruichu Cai, Zhifeng Hao, Marianne Winslett, Xiaokui Xiao, Yin Yang, Zhenjie Zhang, Shuigen Zhou, "Deterministic Identification of Specific Individuals from GWAS Results", in Bioinformatics.
Ganzhao Yuan, Zhenjie Zhang, Marianne Winslett, Xiaokui Xiao, Yin Yang, Zhifeng Hao, "Optimizing Batch Linear Queries under Exact and Approximate Differential Privacy", in ACM TODS.
Ling Gu, Minqi Zhou, Zhenjie Zhang, Ming-Chien Shan, Aoying Zhou, Marianne Winslett, "Chronos: An Elastic Parallel Framework for Stream Benchmark Generation and Simulation", in ICDE 2015. (System web page)
Xianke Zhou, Sai Wu, Zhenjie Zhang, Gang Chen, Anthony K. H. Tung, Marianne Winslett, "PABIRS: A Data Access Middleware for Distributed File Systems", in ICDE 2015.
Rong Zhang, Zhenjie Zhang, Xiaofeng He, Aoying Zhou, "Dish Comment Summarization Based on Bilateral Topic Analysis", in ICDE 2015.
Tuan-Anh Nguyen Pham, Xutao Li, Gao Cong, Zhenjie Zhang, "A General Graph-based Model for Recommendation in Event-based Social Networks", in ICDE 2015.
Li Wang, Minqi Zhou, Zhenjie Zhang, Ming-Chien Shan, Aoying Zhou, "NUMA-Aware Scalable and Efficient In-Memory Aggregation on Large Domains", in IEEE Transactions on Knowledge and Data Engineering (TKDE).
Ruichu Cai, Zhenjie Zhang, Anthony K. H. Tung, Chenyun Dai, Zhifeng Hao, "A general framework of hierarchical clustering and its applications", in Information Sciences 272: 29-48 (2014).
Xiaoli Wang, Xiaofeng Ding, Anthony K. H. Tung, Zhenjie Zhang, "Efficient and Effective KNN Sequence Search with Approximate N-Grams", in VLDB 2014.
Jia Xu, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, Ge Yu, Marianne Winslett, "Differentially Private Histogram Publication", in VLDB Journal.
Sai Wu, Sheng Wang, Xiaoli Wang, Zhenjie Zhang, Anthony K. H. Tung, "K-Anonymity for Crowdsourcing", in IEEE Transactions on Knowledge and Data Engineering (TKDE).
Jun Zhang, Xiaokui Xiao, Yin Yang, Zhenjie Zhang, and Marianne Winslett, "PrivGene: Differentially Private Model Fitting Using Genetic Algorithms", in SIGMOD 2013.
Ruichu Cai, Zhenjie Zhang, and Zhifeng Hao, "SADA: A General Framework to Support Robust Causation Discovery", in ICML 2013.
Zhenjie Zhang, Richard T. B. Ma, Jianbing Ding, Yin Yang, "ABACUS: An Auction-Based Approach to Cloud Service Differentiation" [slides], in IC2E 2013. (best paper award winner)
Zhenjie Zhang, Hu Shu, Zhihong Chong, Hua Lu, Yin Yang, "C-Cube: Elastic Continuous Clustering in Clouds", in ICDE 2013.
Jun Zhang, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, and Marianne Winslett, "Functional Mechanism: Regression Analysis under Differential Privacy", in VLDB 2012.
Jia Xu, Zhenjie Zhang, Anthony K. H. Tung, and Ge Yu, "Efficient and Effective Similarity Search on Probabilistic Data based on Earth Mover's Distance", in VLDB Journal, [Codes & Data].
Daniel Yang Li, Zhenjie Zhang, Yin Yang, and Marianne Winslett, "Compressive Mechanism: Utilizing Sparse Representation in Differential Privacy", in WPES 2011.
Zhenjie Zhang, Marios Hadjieleftheriou, Beng Chin Ooi, and Divesh Srivastava, "B^{ed}-Tree: An All-Purpose Tree Index for String Similarity Search on Edit Distance", in SIGMOD 2010.
Zhenjie Zhang, Beng Chin Ooi, Srinivasan Parthasarathy, and Anthony K. H. Tung, "Similarity Search on Bregman Divergence: Towards Non-Metric Indexing", in VLDB 2009.
Zhenjie Zhang, Hua Lu, Beng Chin Ooi, and Anthony K. H. Tung, "Understanding the Meaning of A Shifted Sky: A General Framework on Extending Skyline Query", in the International Journal on Very Large Data Bases (VLDBJ).
Zhenjie Zhang, Yin Yang, Ruichu Cai, Dimitris Papadias, and Anthony K. H. Tung, "Kernel-Based Skyline Cardinality Estimation", in SIGMOD 2009.
Zhenjie Zhang, Reynold Cheng, Dimitris Papadias, and Anthony K. H. Tung, "Minimizing the Communication Cost for Continuous Skyline Maintenance", in SIGMOD 2009.
Zhenjie Zhang, Laks Lakshmanan, and Anthony K. H. Tung, "On Domination Game Analysis for Microeconomic Data Mining", in ACM Transactions on Knowledge Discovery from Data (TKDD).
Zhenjie Zhang, Bing Tian Dai, and Anthony K. H. Tung, "Estimating Local Optimums in EM Algorithm over Gaussian Mixture Model", in ICML 2008. [Technical Report]
Chee-Yong Chan, H. V. Jagadish, Kian-Lee Tan, Anthony K. H. Tung, and Zhenjie Zhang, "Finding k-Dominant Skylines in High Dimensional Spaces", in SIGMOD 2006.
Chee-Yong Chan, H. V. Jagadish, Kian-Lee Tan, Anthony K. H. Tung, and Zhenjie Zhang, "On High Dimensional Skyline", in EDBT 2006.
Honours and Awards
Honorable Mention, Early Career Award, IEEE Technical Committee of Data Engineering, 2015
NUS President's Graduate Fellowship in 2007
Dean's Award in 2008
Best Paper Award of IEEE International Conference on Cloud Engineering (IC2E 2013)
Recent Professional Activities