The First International Workshop on

Big Data eXploration, Compression and Systems (BDXCS 2026)

Afternoon (13:30 - 16:30) on Monday, January 26th, 2026

Room 1201, Osaka International Convention Center

The First International Workshop on Big Data eXploration, Compression and Systems (BDXCS) is held in conjunction with SCA/HPCAsia 2026 and focuses on big data, data compression, and their associated systems.

SCA/HPCAsia 2026: https://www.sca-hpcasia2026.jp/

Call for Papers

PDF version

Objectives

In addition to traditional applications, the rise of AI and cloud computing has significantly increased the volume of data processing and communication required in high-performance computing (HPC).

Efficient data analytics and data movement across distributed and parallel environments (e.g., the Internet, inter-node networks, and system interconnects) have become critical factors in determining the performance and energy efficiency of supercomputers, data centers, and cloud platforms.

This workshop aims to address key research challenges related to big data from multiple perspectives, including data exploration, data compression, and big data systems.

To tackle these challenges, the workshop will explore practical and effective approaches to data analytics and mining, big data visualization, data integration, scalable data compression, and storage/processing systems for big data.

These investigations will consider both the characteristics of large-scale data workloads and the constraints of modern hardware architectures.

In particular, the workshop will emphasize optimization strategies for big data processing, adaptive and general-purpose compression techniques, and high-performance systems designed for high-throughput, low-latency, and hardware-efficient data operations.

Scopes
- Big Data Exploration
  - Data Analytics and Mining: Statistical and Machine Learning methods for Big Data, Graph Analytics and Network Mining, Pattern Recognition and Anomaly Detection, Time Series and Spatial Data Analysis
  - Interactive Visualization and Visual Analytics: Real-time and Interactive Visualization Techniques, Scalable Visualization Algorithms, Exploratory Data Analysis, Visualization of Complex Data Structures
  - Data Integration and Fusion: Multi-source and Multi-modal Data Integration, Schema Matching and Data Harmonization, Semantic Web and Knowledge Graphs, Data Fusion Techniques for Heterogeneous Data

- Big Data Compression
  - Lossless and Lossy Data Compression: Compression Techniques for Structured and Unstructured Scientific Data, Multimedia Data Compression, Time-series Data Compression, Textual Data Compression
  - Compression Algorithms and Techniques: Quantization, Predictive Coding, Transform-based Compression, Dictionary/Entropy-based Compression, Tensor Decomposition and Low-rank Approximations
  - Compression and Analytics Integration: Compression-aware Data Mining and Machine Learning, Performance Investigation by Applying Compression, Analysis of Power Consumption Associated with Compression
  - Compression/Reduction-conscious Architecture: Offloading Data Compression/Reduction to the Network, Data Reduction in Smart NICs, Adaptive Compression with Dedicated Hardware, Online Data Compression Methods

- Big Data Systems
  - Scalable and Distributed Systems: Distributed Storage Systems (e.g., Hadoop, HDFS, Ceph), Distributed and Parallel Computing Frameworks (e.g., Spark, Flink, MPI), Cloud and Edge Computing Platforms for Big Data
  - Performance and Optimization: Big Data-based Resource Scheduling and Load Balancing, Hardware-accelerated Big Data Processing (GPU, FPGA), Energy-efficient and Cost-efficient Big Data Processing
  - Reliability, Privacy, and Security: Fault Tolerance and Reliability in Big Data Systems, Data Security and Encryption in Large-scale Storage, Privacy-preserving Analytics and Differential Privacy
  - Architecture and Middleware: Big Data Workflow Management, Middleware Systems for Data-intensive Computing, Containers and Virtualization for Big Data Applications, Custom Architecture for Big Data Systems

Keynotes

Speaker: Xiaoyi Lu, University of Florida

Dr. Xiaoyi Lu is a tenured Associate Professor in the Department of Electrical and Computer Engineering at the University of Florida, where he leads the Parallel and Distributed Systems Laboratory (PADSYS Lab). His research focuses on parallel and distributed computing; scalable and efficient systems across diverse computing paradigms, including HPC, big data, AI, cloud, and edge computing; high-performance communication and I/O technologies (e.g., RDMA, NVMe, GPUs, and DPUs); scalable algorithms and applications; and interdisciplinary computing for social good. Dr. Lu has published over 180 papers in prestigious conferences and journals and has received ten Best Paper or Best Student Paper Awards or Nominations, including those at SC 2019 and IPDPS 2024. He has delivered over 100 invited talks, tutorials, and presentations worldwide and is deeply engaged in academic service and community leadership. His research outputs, including OpenDOTA, SR-APPFL, PMIdioBench, HiBD, MVAPICH2-Virt, and DataMPI, have made a broad impact across both industry and academia. Dr. Lu has received notable honors, including the NSF CAREER Award and research awards from Amazon, Google, and Meta/Facebook.

Title: Heterogeneity-Enriched Communication: Rethinking Data Movement for Data-Intensive AI

Modern AI workloads are increasingly data-intensive, and their scalability depends on how effectively computation and data movement are coordinated across heterogeneous systems. As datasets and model sizes continue to grow, the efficiency of data handling becomes increasingly important. This talk introduces Heterogeneity-Enriched Communication (HEC) as a new perspective for understanding and optimizing data movement in large-scale, data-driven AI. I will discuss how HEC principles inspire new opportunities in data compression and communication-efficient system designs that leverage modern CPUs, GPUs, DPUs, and emerging accelerators. Through recent studies and examples, I will illustrate how rethinking data movement alongside computation can improve the scalability and efficiency of distributed, data-intensive AI workloads and can provide a path toward the next generation of high-performance big data computing systems.

Speaker: Takaki Hatsui, RIKEN SPring-8 Center

Dr. Takaki Hatsui received his B.Eng. and M.Eng. degrees from Kyoto University in 1994 and 1996, and his Ph.D. from the Graduate University for Advanced Studies (SOKENDAI) in 1999. After postdoctoral research at the University of Tokyo and Uppsala University, he served as Assistant and later Associate Professor at the Institute for Molecular Science before joining RIKEN in 2007. He is currently leading the development of advanced X-ray imaging detectors and data infrastructure for the large-scale synchrotron and XFEL facilities, SPring-8 and SACLA, and concurrently serves as Project Leader for Semiconductor Applications at SPring-8.

Title: Towards an AI-Ready SPring-8: Key Prerequisites for Data-Intensive Photon Science

Next-generation synchrotron facilities, such as the upgraded SPring-8 (SPring-8-II) to be completed in 2029, will generate data volumes of up to an exabyte per year per instrument, driven by high-repetition, high-pixel-count imaging detectors, including the CITIUS detector developed through our collaboration. To fully exploit these capabilities, artificial intelligence must be integrated not as an add-on, but as a core component of the facility infrastructure. This keynote presents the key prerequisites for making SPring-8 AI-ready, focusing on the end-to-end data lifecycle in data-intensive photon science. Topics include scalable detector data acquisition, streaming data reduction, high-throughput data transport, and tight integration with high-performance computing and AI platforms.

Important Dates

Paper submission deadline: November 11, 2025 (AoE, firm); extended from October 21, 2025 and November 4, 2025 (AoE)
Notification of acceptance: November 19, 2025
Camera-ready paper deadline: December 15, 2025

Paper Submission

Please submit your paper here

Accepted papers will be published together with SCA/HPCAsia 2026 proceedings.

The paper format and page limits are the same as for SCA/HPCAsia 2026. For details, please see the following page.

https://www.sca-hpcasia2026.jp/submit/papers.html

For this workshop, we will accept both single- and double-column formats.

The page limit is 18 pages for single-column submissions and 10 pages for double-column submissions (both limits include all content, such as references, figures, and tables).

Location

https://www.sca-hpcasia2026.jp/location.html

Room 1201, Osaka International Convention Center

Registration

Please refer to the SCA/HPCAsia 2026 registration page: https://www.sca-hpcasia2026.jp/
