Efficient and Effective Network Monitoring
Optimization for Telemetry Collection
In-band network telemetry (INT) is a promising technique for investigating the real-time state of networks. However, it inevitably incurs considerable transmission overhead because it inserts telemetry data directly into packet headers. A widely used way to alleviate this overhead is sampling, which requires careful selection of sampling rates to control the trade-off between monitoring accuracy and transmission overhead. Moreover, INT collects various kinds of telemetry items whose values change at different frequencies, so each item requires a different sampling rate to satisfy the desired monitoring accuracy. To address these issues, we propose an INT scheme that effectively balances this trade-off, achieving low overhead with comparable and robust monitoring accuracy under dynamic network environments.
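As a rough, self-contained sketch of the frequency-aware idea (not the FAT-INT algorithm itself), the Python below estimates one item's dominant variation frequency from its recent trace and applies a Nyquist-style rule; the function name, the `acc_factor` knob, and the thresholds are all illustrative assumptions:

```python
import numpy as np

def pick_sampling_interval(history, dt, acc_factor, min_ivl, max_ivl):
    """Pick a per-item sampling interval from the item's frequency content.

    history: recent values of one telemetry item, traced every dt seconds.
    acc_factor: in (0, 1]; smaller means tighter accuracy, denser sampling.
    """
    values = np.asarray(history, dtype=float)
    spectrum = np.abs(np.fft.rfft(values - values.mean()))
    freqs = np.fft.rfftfreq(len(values), d=dt)
    peak = np.argmax(spectrum[1:]) + 1  # strongest non-DC frequency bin
    if spectrum[peak] < 1e-9:
        return max_ivl  # quasi-static item: sample as rarely as allowed
    # Nyquist-style rule: sample at >= 2x the dominant frequency,
    # tightened further by the accuracy factor.
    return float(np.clip(acc_factor / (2.0 * freqs[peak]), min_ivl, max_ivl))

# e.g. a queue-depth item oscillating at ~2 Hz, traced every 0.05 s:
t = np.arange(0, 5, 0.05)
print(pick_sampling_interval(np.sin(2 * np.pi * 2.0 * t), 0.05, 0.5, 0.01, 10.0))
```

A fast-varying item thus gets a short interval while a quasi-static one is sampled rarely, which is what makes item-wise rates cheaper than one global rate.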
Related Publications
FAT-INT: Frequency-Aware and Item-Wise In-band Network Telemetry for Low-Overhead and Accurate Measurement (ACM CoNEXT 2025)
Intent-based Automatic Telemetry System
Recent advances in large language models (LLMs) have opened new opportunities for automating complex decision-making in dynamic environments. Their ability to perform black-box optimization through in-context learning allows them to adaptively tune system parameters based on accumulated feedback, without requiring explicit models or retraining. Motivated by this, we explore the use of LLMs as a general-purpose control mechanism for INT, where conventional tuning methods struggle to adapt to fast-changing network conditions. We present an LLM-driven system that automatically adjusts the sampling interval based on real-time feedback. Using in-context learning, it continuously improves its decisions by minimizing error regret, and it maintains a long-term memory of past decisions to accelerate convergence and improve reliability.
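A minimal sketch of such a feedback loop is shown below, with a generic `query_llm` stub standing in for whatever model endpoint is actually used; the prompt format, memory layout, and fallback policy are illustrative assumptions, not the system's actual design:

```python
import json

def build_prompt(memory, target_err):
    """Assemble past (interval, error, overhead) trials into an
    in-context optimization prompt (format is a made-up example)."""
    lines = [
        f"interval={m['interval']}s -> error={m['error']:.3f}, "
        f"overhead={m['overhead']:.3f}"
        for m in memory[-20:]  # long-term memory, truncated to fit the context
    ]
    return (
        "You tune the telemetry sampling interval.\n"
        "Past trials:\n" + "\n".join(lines) + "\n"
        f"Target error <= {target_err}. "
        'Reply with JSON: {"interval": <seconds>}.'
    )

def next_interval(memory, target_err, query_llm):
    """One step of the feedback loop; query_llm is an injected stub."""
    reply = query_llm(build_prompt(memory, target_err))
    try:
        return float(json.loads(reply)["interval"])
    except (ValueError, KeyError):
        # Unparseable reply: keep the last tried interval.
        return memory[-1]["interval"] if memory else 1.0

# usage with a trivial stub in place of a real LLM call:
memory = [{"interval": 1.0, "error": 0.08, "overhead": 0.20}]
print(next_interval(memory, 0.05, lambda prompt: '{"interval": 2.0}'))
```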
Related Publications
[Double-Blind] (under review at ACM CoNEXT 2025 Poster)
Intelligence Processing in Networks
Distributed Model Deployment for In-Network Intelligence
In-network inference over programmable data planes enables fast, low-overhead inference with deep neural networks. To cope with the heavy processing and deployment costs of in-network inference, the model is commonly partitioned into submodels and deployed across multiple programmable network devices. However, this distribution can generate considerable network traffic, since the intermediate data exchanged between submodels may traverse a long forwarding path. In this project, we propose an in-network inference scheme that minimizes the network traffic of in-network inference without significantly degrading classification performance.
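To make the traffic objective concrete, the toy sketch below brute-forces the ordered placement of submodels onto switches along a single forwarding path, costing each cut by intermediate data size times hop distance; this illustrates the objective under simplifying assumptions (one path, uniform hops) rather than the scheme proposed in the project:

```python
from itertools import combinations

def placement_cost(placement, inter_sizes):
    """Traffic = sum over consecutive submodels of (intermediate data
    size between them) x (hop distance between their host switches)."""
    return sum(size * (placement[i + 1] - placement[i])
               for i, size in enumerate(inter_sizes))

def best_placement(num_switches, inter_sizes):
    """Brute-force the ordered assignment of len(inter_sizes)+1 submodels
    onto switch positions 0..num_switches-1 along one forwarding path."""
    k = len(inter_sizes) + 1
    best = min(combinations(range(num_switches), k),
               key=lambda combo: placement_cost(combo, inter_sizes))
    return best, placement_cost(best, inter_sizes)

# three cuts with shrinking intermediate tensors on a 6-switch path:
print(best_placement(6, [4096, 512, 64]))
```

The shrinking tensor sizes reward placing early submodels close together, which is exactly why cut points and hop distances must be chosen jointly.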
Related Publications
TINIEE: Traffic-Aware Adaptive In-Network Intelligence via Early-Exit Strategy (IEEE SECON 2024; under review at IEEE ToN)
Model Optimization for In-Network Intelligence
As programmable data planes evolve to support in-network intelligence, there is growing interest in running multiple inference tasks directly within the network. However, the limited computational and memory resources of programmable switches make it challenging to support such functionality at scale. To address this, we explore efficient methods for enabling multi-task inference in resource-constrained network environments. Our approach optimizes model structures to reduce memory usage and computational load, allowing a single model to support multiple tasks without duplicating resources. By carefully selecting critical features and compressing the model accordingly, we demonstrate that intelligent inference can be performed within the strict constraints of programmable switches, bringing scalable and low-latency intelligence closer to the data.
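As an illustration of the feature-selection idea (not the MALOI method itself), the sketch below ranks input features by their summed absolute first-layer weights across task heads and keeps a shared top-k; the weight-sum proxy for importance is an assumption made for the example:

```python
import numpy as np

def select_shared_features(task_heads, k):
    """Rank input features by summed absolute first-layer weight across
    all task heads and keep the top-k, so one compressed model can
    serve every task (a simple proxy for cross-task importance)."""
    importance = sum(np.abs(W).sum(axis=0) for W in task_heads)
    return np.argsort(importance)[::-1][:k]

# toy example: 3 tasks, 16 candidate features, keep the 6 most shared
rng = np.random.default_rng(0)
heads = [rng.normal(size=(4, 16)) for _ in range(3)]
print(select_shared_features(heads, 6))
```

Pruning the remaining features shrinks both the match-action tables and the arithmetic a switch must perform, which is where the memory and compute savings come from.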
Related Publications
MALOI: Multi-Task-Aware Low-Overhead In-Network Inference using Programmable Switch (IEEE Communications Magazine)
Data Aggregation in Networks
Orchestration for Distributed In-Network Aggregators
Distributed machine learning offers scalability by spreading computation across multiple nodes, but it often faces communication bottlenecks due to frequent gradient exchanges. In-network aggregation, enabled by programmable data planes, has emerged as a promising solution by allowing gradients to be aggregated directly within the network, reducing traffic and accelerating training. However, deploying these aggregation functions across the entire network is non-trivial, as programmable switches have limited resources. This project investigates how to strategically place in-network aggregation in multi-tenant environments, aiming to minimize overall network traffic while respecting hardware constraints. Our goal is to make distributed machine learning not only faster but also more efficient and scalable within real-world network infrastructures.
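A greedy toy version of this placement problem is sketched below, assuming per-switch aggregator "slots" and a precomputed per-switch traffic saving for each tenant job; both are simplifying assumptions for illustration, not the formulation used in the paper:

```python
def place_aggregators(jobs, capacity):
    """Greedy sketch: visit tenant jobs by traffic volume and give each
    its best remaining switch.

    jobs: list of {"name", "traffic", "savings": {switch: bytes_saved}}
    capacity: {switch: free aggregator slots}
    """
    placement = {}
    for job in sorted(jobs, key=lambda j: j["traffic"], reverse=True):
        options = [(s, saved) for s, saved in job["savings"].items()
                   if capacity.get(s, 0) > 0]
        if not options:
            continue  # no switch left: this job aggregates at the server
        switch, _ = max(options, key=lambda o: o[1])
        placement[job["name"]] = switch
        capacity[switch] -= 1
    return placement

jobs = [{"name": "tenantA", "traffic": 100, "savings": {"s1": 80, "s2": 60}},
        {"name": "tenantB", "traffic": 70, "savings": {"s1": 50}}]
print(place_aggregators(jobs, {"s1": 1, "s2": 1}))  # tenantB loses s1 to tenantA
```

The toy run shows the multi-tenant tension: once tenantA takes the last slot on s1, tenantB must fall back to server-side aggregation, so placement order and capacity interact.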
Related Publications
Traffic- and Multi-Tenancy-Aware In-Network Aggregation Placement for Distributed Machine Learning (IEEE ICCCN 2023; under review at IEEE TNSM)