Jia Zou
Assistant Professor
Computer Science and Engineering
School of Computing and Augmented Intelligence
Arizona State University
jia.zou@asu.edu
Bio
Jia Zou is a Tenure-Track Assistant Professor in the School of Computing and Augmented Intelligence, Ira A. Fulton Schools of Engineering, Arizona State University - Tempe, a position she has held since summer 2019. She is the director of the CACTUS data-intensive systems lab, founded in summer 2020. Since 2023, she has also been affiliated with the Lawrence Berkeley National Lab. Before 2019, she was a Research Scientist in the Department of Computer Science at Rice University, and before that she worked as a researcher at IBM Research - China. She received her Ph.D. in Computer Science from Tsinghua University, China.
Jia Zou received a prestigious NSF CAREER Award in 2022, an Amazon Research Award in 2023, and an IBM Faculty Award in 2021. Her research interests include database systems, AI/ML in databases, AI/ML applied to databases, and federated data management. She has published more than 20 papers in venues including VLDB, SIGMOD, ICDE, the VLDB Journal, SoCC, ICDCS, and ICDM, and has been granted 15 patents. Her work received the VLDB 2019 Best Paper Honorable Mention Award and the 2020 SIGMOD Research Highlight Award.
More project information is here.
[New!!!] We have two Ph.D. openings starting in fall 2024. If you are interested in database systems research, please send your CV to jia.zou@asu.edu. [More Details]
News and Activities
Jia will serve as the Co-chair of the Ph.D Workshop of EDBT 2026.
Collaborating with Irbaz Riaz, Arsalan Naqvi, and Parminder Singh, Jia has won a prestigious 2024 Mayo Clinic and ASU Faculty Summer Residency Award. We will work on Scalable Infrastructure for Human-Artificial Intelligence Collaboration (HumAI) and Longitudinal Analysis on Electronic Health Records and Clinical Practice Guidelines.
Our work "DeepMapping: Learned Data Mapping for Lossless Compression and Efficient Lookup," in collaboration with Dr. Selcuk Candan, was accepted to ICDE 2024 as a full research paper! Congratulations to Lixi!!!
Jia will serve as a Program Committee Member for KDD 2024 ADS track.
Our vision paper "Serving Deep Learning Model in Relational Databases" was accepted to EDBT 2024. Congratulations to our co-authors!
Jia was invited to be a Program Committee Member for VLDB 2025.
Jia was invited to be a Program Committee Member for SIGMOD 2025.
Jia was invited to be a Program Committee Member for SC 2024.
Our paper "Automatic Data Transformation Using Large Language Model: An Experimental Study on Building Energy Data" has been accepted to IEEE Big Data 2023 (Industrial and Government Track) as a full research paper. Congratulations to Ankita, Xuanmao, Hong, Jasper (Guoxin), and all other co-authors!!! A pre-print can be found here. Many thanks to the three anonymous reviewers, who gave us one Strong Accept and two Accepts; their feedback greatly encourages us to continue this line of research!
Jia was invited to be a Program Committee Member for SIGMOD 2024 Demo Track.
Our paper "A Comparison of End-to-End Decision Forest Inference Pipelines" has been accepted to ACM SoCC 2023. Congratulations to Hong and all co-authors!!
Jia will serve as a Co-Chair of the New Researcher Symposium at SIGMOD 2024.
Our paper "Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis" has been accepted to SSDBM 2023. Congratulations to Lixi and co-authors Lei Yu and Hong Min!!
Jia received an Amazon Research Award (70,000 USD plus 50,000 USD in cloud credits) for our work on DNN inference query compilation!
We received a summer fellowship (58,800 USD) from DOE and SHI to support Jia and two students, Ankita Sharma and Hong Guan, working on scientific data management at the Lawrence Berkeley National Lab this summer. We are super excited about it!!
Jia was invited to be a Program Committee Member for the ECML/PKDD 2023 Applied Data Science Track.
Jia was invited to be a Program Committee Member for the VLDB 2024 Research Track.
Our project "A Federated Query Optimizer for Privacy-Preserving Analytics and Machine Learning," in collaboration with Co-PIs Prof. Chaowei Xiao and Prof. Yingzhen Yang, has been funded by the U.S. Department of Homeland Security (DHS) Center for Accelerating Operational Efficiency (125,000 USD for the first six months, with 874,998 USD expected in total). Congratulations to the team!!
Jia was invited to be a Program Committee Member for the KDD 2023 Applied Data Science Track.
Jia was invited to be a Program Committee Member for SSDBM 2023.
Lixi has won an NSF VLDB 2022 Travel Grant. Congratulations, Lixi!
Our paper "Benchmark of DNN Model Search at Deployment Time" was accepted by SSDBM 2022 as a research paper. Congratulations to Lixi and all co-authors.
Our paper "Serving Deep Learning Models with Deduplication from Relational Databases" was officially accepted by VLDB 2022 as a research paper. Congratulations to Lixi and all co-authors.
Jia wins an NSF CAREER Award of 547,584 USD. The project title is "CAREER: Rethink and Redesign of Analytics Databases for Machine Learning Model Serving" (Link). Thanks, NSF!
Jia will serve as a New Researcher Symposium Co-Chair and a Program Committee Member of SIGMOD 2023.
Jia wins an ACM WSDM 2022 Outstanding Service Award.
Jia will serve as a Program Committee Member of SSDBM 2022.
One of Jia's patents has been assigned to DoorDash, Inc.
We've received an IBM Global University Program Academic Award of 40,000 USD. The project title is "Low-Latency and Large-Scale Serving of Machine Learning Models in DB2". Thanks, IBM!!
We have two research papers accepted to VLDB 2021. Congratulations to Amitabh, Pratik, and all co-authors!
Jia will serve as a Program Committee Member of the VLDB 2022 Research Track.
Jia will serve as a Local Arrangement Chair of WSDM 2022.
Lixi received a CIDSE doctoral fellowship. Congratulations, Lixi!
An abstract "Using Deep Learning Models to Replace Large Materialized Views in Relational Database" has been accepted to CIDR 2021.
Our paper "Integration of Fast-Evolving Data Sources Using A Deep Learning Approach" has been accepted to SFDI 2020 (in conjunction with VLDB 2020). Congratulations to Zijie and Lixi!
Our paper "Declarative Recursive Computation on an RDBMS, or, why you should use a database for distributed machine learning" has been selected for 2020 ACM SIGMOD Research Highlight Award!
Our paper "WATSON: A Workflow-based Data Storage Optimizer for Analytics" (with Ming Zhao, Juwei Shi, Chen Wang) has been accepted to MSST 2020.
Jia will serve as a Program Vice Chair of IEEE Big Data 2020.
Our paper "Architecture of A Distributed Storage that Combines File System, Memory and Computation in A Single Layer" (with Arun Iyengar and Chris Jermaine) has been accepted to VLDB journal.
Our application for the Amazon AWS Cloud Credits for Research program was successful! We have been awarded $15,000 in AWS credits for our automatic data placement project, starting in Oct 2019.
Our paper “Declarative Recursive Computation on an RDBMS” has been selected for Best Paper Honorable Mention Award at VLDB 2019.
Google Cloud has provided a research credit of 5,000 USD to support our work on automatic memory tuning of the TensorFlow system, starting in Aug 2019.
Awards
Mayo Clinic and ASU Faculty Summer Residency Award, 2024
Amazon Research Award, 2023
DOE SRP Fellowship, 2023
NSF CAREER Award, 2022
WSDM Outstanding Service Award, 2022
IBM Academic Award, 2021
SIGMOD Research Highlight Award, 2020
VLDB Best Paper Award Runner-up, 2019
Publons Top Peer Reviewer Award, 2018
IBM Patent Awards, 2011-2014
Publications
(Supervised students are marked with *)
2024
[ICDE 2024] Lixi Zhou*, K. Selçuk Candan, Jia Zou. DeepMapping: Learned Data Mapping for Lossless Compression and Efficient Lookup. The 40th IEEE International Conference on Data Engineering (ICDE 2024). To appear. Preprint: CoRR abs/2307.05861 (2023) [PDF] [13 pages]
[EDBT 2024] Lixi Zhou*, Qi Lin*, Kanchan Chowdhury, Saif Masood*, Alexandre Eichenberger, Hong Min, Alexander Sim, Jie Wang, Yida Wang, Kesheng Wu, Binhang Yuan, Jia Zou. "Serving Deep Learning Model in Relational Databases." 27th International Conference on Extending Database Technology (EDBT'24), Research Track. [9 pages] [PDF]
Hong Guan*, S. Gautier*, D. Gupta, R. H. Ambrish*, Y. Wang, H. Lakamsani*, D. Giriyan*, S. Maslanka*, Chaowei Xiao, Yingzhen Yang, Jia Zou. A Learning-based Declarative Privacy-Preserving Framework for Federated Data Management. arXiv preprint arXiv:2401.12393 (2024).
2023
[IEEE Big Data 2023] Ankita Sharma*, Xuanmao Li*, Hong Guan*, Guoxin Sun*, Liang Zhang, Lanjun Wang, Kesheng Wu, Lei Cao, Erkang Zhu, Alexander Sim, Teresa Wu, Jia Zou. Automatic Data Transformation Using Large Language Model: An Experimental Study on Building Energy Data. 2023 IEEE International Conference on Big Data (Industrial and Government Track). Pre-Print CoRR abs/2309.01957 (2023) [PDF] [10 pages] (The first four student authors contributed equally to this work.)
[SoCC 2023] Hong Guan*, Saif Masood*, Mahidhar Dawrampudi*, Venkatesh Gunda*, Hong Min, Lei Yu, Soham Nag*, Jia Zou. A Comparison of End-to-End Decision Forest Inference Pipelines, 2023 ACM Symposium on Cloud Computing SoCC'23 [16 pages][PDF]
[SSDBM 2023] Lixi Zhou*, Lei Yu, Jia Zou, Hong Min. Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis. In Proceedings of the 35th International Conference on Scientific and Statistical Database Management, SSDBM 2023 [4 pages] [PDF]
Liang Zhang, Jianli Chen, Jia Zou. Taxonomy, Semantic Data Schema, and Schema Alignment for Open Data in Urban Building Energy Modeling. CoRR abs/2311.08535 (2023) [PDF]
2022
[VLDB 2022] Lixi Zhou*, Jiaqing Chen*, Amitabh Das*, Hong Min, Lei Yu, Ming Zhao, and Jia Zou. "Serving Deep Learning Models with Deduplication from Relational Databases." VLDB 2022, PVLDB Volume 15 Issue 10. [14 pages][PDF]
[SSDBM 2022] Lixi Zhou*, Arindam Jain*, Zijie Wang*, Amitabh Das*, Yingzhen Yang, and Jia Zou, "Benchmark of DNN Model Search at Deployment Time." SSDBM 2022. [12 pages] [PDF]
2021
[VLDB 2021] Jia Zou, Amitabh Das*, Pratik Barhate*, Arun Iyengar, Binhang Yuan, Dimitrije Jankov, and Chris Jermaine. "Lachesis: Automatic Partitionings for UDF-Centric Analytics.", VLDB 2021, PVLDB Volume 14 Issue 8 [14 pages][PDF]
[VLDB 2021] Binhang Yuan, Dimitrije Jankov, Jia Zou, Yuxin Tang, Daniel Bourgeois, and Chris Jermaine. “Tensor Relational Algebra for Machine Learning System Design.” VLDB 2021, PVLDB Volume 14 Issue 8 [13 pages][PDF]
[CIDR 2021] Jia Zou, "Using Deep Learning Models to Replace Large Materialized Views in Relational Database", CIDR 2021 (Abstract) [1 page] [PDF]
2020
Zijie Wang*, Lixi Zhou*, Amitabh Das*, Valay Dave*, Zhanpeng Jin, Jia Zou, "Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning", arXiv:2010.07586 [cs.DB] (pre-submission) [12 pages]
Lixi Zhou*, Zijie Wang*, Amitabh Das*, and Jia Zou. "It's the Best Only When It Fits You Most: Finding Related Models for Serving Based on Dynamic Locality Sensitive Hashing." arXiv preprint arXiv:2010.09474 (2020) [9 pages]
[MSST 2020] Jia Zou, Ming Zhao, Juwei Shi and Chen Wang. "WATSON: A Workflow-based Data Storage Optimizer for Analytics." MSST 2020 [14 pages][PDF]
[SIGMOD Record] Dimitrije Jankov, Shangyu Luo, Binhang Yuan, Zhuhua Cai, Jia Zou, Chris Jermaine, Zekai J. Gao. Declarative recursive computation on an RDBMS, or, why you should use a database for distributed machine learning, SIGMOD Record, Volume 49 No. 1. [8 pages] [PDF] (Invited)
[SFDI 2020] Zijie Wang*, Lixi Zhou*, Jia Zou. "Integration of Fast-Evolving Data Sources Using A Deep Learning Approach." SFDI 2020, workshop co-located with VLDB 2020 [14 pages] [PDF][Video]
[VLDB Journal] Jia Zou, Arun Iyengar, and Chris Jermaine. "Architecture of a distributed storage that combines file system, memory and computation in a single layer." The VLDB Journal 29(5) (2020): 1049-1073. [25 pages][PDF]
2019 and before
[VLDB 2019] Dimitrije Jankov, Shangyu Luo, Binhang Yuan, Zhuhua Cai, Jia Zou, Chris Jermaine, Zekai J. Gao. Declarative recursive computation on an RDBMS, or, why you should use a database for distributed machine learning, VLDB 2019, PVLDB Volume 12 Issue 7. [14 pages] (PDF) (Honorable Mention, VLDB 2019 Best Paper Award runner-up, 2020 SIGMOD Research Highlight Award)
[VLDB 2019] Jia Zou, Arun Iyengar, Chris Jermaine, Pangea: Monolithic Distributed Storage for Data Analytics, VLDB 2019, PVLDB Volume 12 Issue 6. [14 pages] (PDF)
[SIGMOD 2018] Jia Zou, R Matthew Barnett, Tania Lorido-Botran, Shangyu Luo, Carlos Monroy, Sourav Sikdar, Kia Teymourian, Binhang Yuan, Chris Jermaine, PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development, SIGMOD 2018. [16 pages] (PDF)
[ICDCS 2015] Jia Zou, Juwei Shi, Tongping Liu, Zhao Cao, Chen Wang, Foreseer: Workload-aware Data Storage for MapReduce, ICDCS 2015. [2 pages]
[VLDB 2015] Lanjun Wang, Oktie Hassanzadeh, Shuo Zhang, Juwei Shi, Limei Jiao, Jia Zou, Chen Wang, Schema Management for Document Stores, VLDB 2015, PVLDB Volume 8 Issue 9. [12 pages] (PDF)
[VLDB 2014] Juwei Shi, Jia Zou, Jiaheng Lu, Zhao Cao, Shiqiang Li, Chen Wang, MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs, VLDB 2014, PVLDB Volume 7 Issue 13. [12 pages] (PDF)
[ICDCS 2011] Jia Zou, Gong Su, Arun Iyengar, Yu Yuan, Yi Ge, Design and Analysis of a Distributed Multi-leg Stock Trading System, ICDCS 2011. [12 pages]
[ICDM 2010] Jia Zou, Jing Xiao, Rui Hou, Yanqi Wang, Frequent Instruction Sequential Pattern Mining in Hardware Sample Data, ICDM 2010. [6 pages]
[ICPP 2008] Jia Zou, Zhiyong Liang, Yiqi Dai, Scalability Evaluation and Optimization of Multi-core SIP Proxy Server, ICPP 2008. [8 pages][PDF]
[ICEBE 2008] Jianguo Hao, Jia Zou, Yiqi Dai, A real-time payment scheme for SIP service based on hash chain, ICEBE 2008. [8 pages]
[GLOBECOM 2007] Jia Zou, Wei Xue, Zhiyong Liang, Yixin Zhao, Bo Yang and Ling Shao, SIP Parsing Offload: Design and Performance, GLOBECOM 2007. [6 pages]
[ICCCN 2007] Jia Zou, Yiqi Dai, Motivating and Modeling SIP Offload, ICCCN 2007. [6 pages]
Granted Patents
with Z Cao, P Li, et al. Generating Job Alert. US Patent 10,705,935, 2020
with J Liu, JW Shi, et al. Method and apparatus for configuring relevant parameters of MapReduce applications. US Patent 10,831,716, 2020
with GC Chen, JW Shi, et al. Method and apparatus for processing database data in distributed database system. US Patent 10,628,449, 2020
with Juwei Shi, Chen Wang, et al. Method and Apparatus for Generating Schema of Non-Relational Database. US Patent 10002142B2, 2018
with Li Li, Juwei Shi, et al. Resource management in MapReduce architecture and architectural system. US Patent 9582334 B2, 2017
with Zhao Cao, Juwei Shi, et al. Scheduling and execution of tasks based on resource availability. US Patent 9495206 B2, 2016
with Heng Cao, Juwei Shi, et al. Determining location of a user of a mobile device. US Patent 9374800 B2, 2016
with Xiaotao Chang, Fei Chen, et al. Method and system for allocating FPGA resources. US Patent 9389915 B2, 2016
with Kun Wang, Tianyi Wang, et al. Data processing method, data query method in a database, and corresponding device. US Patent 9471612 B2, 2016
with Bo Yang, Juwei Shi, et al. Method and apparatus for processing database data in distributed database system. US Patent 10140351B2, 2016
with Arun Iyengar, Gong Su, et al. Methods and systems for highly available coordinated transaction processing. US Patent 9146944B2, 2015
with Stephen Heisig, Yanqi Wang, et al. Computer system performance analysis. US Patent 8639697 B2, 2014
with Arun Iyengar, Gong Su, et al. Systems and methods for multi-leg transaction processing. US Patent 8601479 B2, 2013 (This patent was assigned to DoorDash, Inc. on Oct 15, 2021)
Chinese Patent: CN100539499C, 2009
Chinese Patent: CN100428731C, 2008
Teaching
2024 spring: CSE 598: Special Topics: Data-Intensive System for Machine Learning (Class Size: 105)
2023 fall: CSE 412: Database Management (Class Size: 160 per session, 2 sessions)
2023 spring: CSE 598: Special Topics: Data-Intensive System for Machine Learning (Class Size: 100)
2022 fall: CSE 412: Database Management (Class Size: 160)
2022 spring: CSE 412: Database Management (Class Size: 160)
2022 spring: CSE 598: Special Topics: Data-Intensive System for Machine Learning (Class Size: 100)
2021 fall: CSE 412: Database Management (Class Size: 160)
2021 spring: CSE 598: Special Topics: Data-Intensive System for Machine Learning (Class Size: 40)
2020 fall: CSE 412: Database Management (Class Size: 160)
2020 spring: CSE 598: Special Topics: Data-Intensive System for Machine Learning (Class Size: 75)
2019 fall: CSE 205: Object-Oriented Program & Data (Class Size: 180)
Professional Activities
Ph.D Workshop Co-Chair, EDBT 2026
Program Committee Member, VLDB 2025
Program Committee Member, SIGMOD 2025
DOE ASCR Panelist, 2024
NSF Panelist, 2024
Program Committee Member, KDD 2024 (ADS Track)
Program Committee Member, SC 2024
Program Committee Member, SIGMOD 2024 (Demo Track)
New Researcher Symposium Co-Chair, SIGMOD 2024
Program Committee Member, VLDB 2024
Program Committee Member, APWeb-WAIM 2024
Program Committee Member, KDD 2023 (ADS Track)
Program Committee Member, ECML/PKDD 2023 (ADS Track)
Program Committee Member, SSDBM 2023
Program Committee Member, SIGMOD 2023
New Researcher Symposium Co-Chair, SIGMOD 2023
NSF Panelist, 2022
Local Arrangement Chair, ACM WSDM 2022
Program Committee Member, SSDBM 2022
Review Board Member, VLDB 2022 Research Track
Guest Editor, Special Issue of IEEE Internet Computing: Cognitive Web Service for Machine Learning and Data Analytics
Program Vice Chair, IEEE Big Data 2020
Organization Committee Member, IEEE Service Hackathon 2020
Program Committee Member, IEEE SmartDataServices 2020
Program Committee Member, CIKM 2019
Program Committee Member, IEEE Cluster 2018, 2019
Program Committee Member, HotData I (the First International Workshop on Hot Topics in Big Data and Networking), in conjunction with ICCCN 2014
Reviewer for IEEE Transactions on Parallel and Distributed Systems (TPDS), IEEE Transactions on Knowledge and Data Engineering (TKDE), The VLDB Journal, and others (https://publons.com/researcher/1466438/jia-zou/)
Co-lead of IBM GTO Topic (subtopic: Future of Data Management), 2011
Technical Assistant to Director of IBM Research-China, 2011
Department Services
Search Committee Chair: Smart Data Services (2022 fall to 2023 spring)
Data Science Graduate Program Committee (2022 summer to present)
Graduate Admission Committee (2019 fall to present)
Recent Presentations
Synthetic Data Generation for Privacy-Preserving Fraud ID Detection & Federated Query Optimizer, DHS CAOE PI Meeting (Washington D.C.), Apr 2024
Integrating AI/ML and Database Systems, at ASU GPU Day, Apr 2024
Serving Machine Learning Models from Relational Databases, at EDBT 2024 (Virtual talk)
Integration of AI/ML and Big Data Management, at ASU AI Day, Nov 2023
Unified Compilation Framework for ML Inferences, at Amazon ARA Research Talk Series (Virtual talk), Aug 2023
Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis. SSDBM 2023 (Los Angeles)
Four Lightning Talks at DOE Sustainable Research Pathways Virtual Workshops, 2022-2023
Lachesis: Automatic Partitionings for UDF-Centric Analytics. VLDB 2021 (Virtual)
Using Deep Learning Models to Replace Large Materialized Views in Relational Database. CIDR 2021 (Virtual)
WATSON: A Workflow-based Data Storage Optimizer for Analytics. MSST 2020 (Virtual)
Integration of Fast-Evolving Data Sources Using A Deep Learning Approach. SFDI 2020, co-located with VLDB 2020 (Virtual)
Pangea: Monolithic Distributed Storage Architecture for Big Data Analytics at VLDB 2019 (Los Angeles, CA)
PlinyCompute: Connecting Programming, Computation and Storage for Analytics at Speed, at the University of Wisconsin-Madison, Nov 2018 (Madison, WI)
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development at SIGMOD 2018 (Houston, TX)
PlinyCompute: A Large-Scale Analytics Platform for Complex Objects and UDF-Centric Workflows at DARPA MUSE PI Meeting 2018 (Austin, TX)
Pangea: Monolithic Distributed Storage for Big Data Processing at USENIX ATC 2017 (San Jose, CA)
Foreseer: Workload-aware Data Storage at ICDCS 2015 (Columbus, OH)