My research lies at the intersection of ML systems, computer vision, and efficient hardware execution. I view vision workloads as carrying exploitable visual priors such as locality, redundancy, and spatial structure. However, modern vision models often rely on generic operations such as self-attention, especially dense global attention, for flexibility and generality. As a result, these priors are often no longer expressed as explicit computational structure; they remain implicitly embedded in learned attention patterns.
I study how to make these implicit visual priors explicit again through algorithmic transformations and system execution abstractions. For example, this may involve transforming locality or redundancy hidden inside global attention into structured or windowed computation, or converting already structured window topologies into descriptor-driven representations that backends can execute directly. The goal is to enable models to execute their intended computational structure more directly, rather than indirectly through generic tensor operations, dense masks, layout transformations, or backend-specific kernel sequences.
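The first of these transformations can be illustrated with a minimal NumPy sketch. It is a toy example under stated assumptions, not code from any particular project: tokens form a 1D sequence, windows are non-overlapping and of equal size, and the function names `dense_masked_attention` and `windowed_attention` are hypothetical. The sketch computes the same local attention two ways: indirectly, as dense global attention with a block-diagonal mask, and directly, as batched per-window attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_masked_attention(q, k, v, window):
    # Generic path: build the full n x n score matrix, then hide
    # everything outside each token's window with a dense mask.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    block = np.arange(n) // window
    mask = block[:, None] == block[None, :]
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v

def windowed_attention(q, k, v, window):
    # Structured path: reshape into (num_windows, window, d) and attend
    # within each window, so only n * window scores are ever computed.
    n, d = q.shape
    qw = q.reshape(n // window, window, d)
    kw = k.reshape(n // window, window, d)
    vw = v.reshape(n // window, window, d)
    scores = qw @ kw.transpose(0, 2, 1) / np.sqrt(d)
    return (softmax(scores) @ vw).reshape(n, d)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
dense = dense_masked_attention(q, k, v, window=4)
local = windowed_attention(q, k, v, window=4)
print(np.allclose(dense, local))
```

Both paths produce the same output, but the dense version computes and stores all n × n scores and leaves the locality hidden inside a mask, while the windowed version exposes it as the shape of the computation itself.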
More broadly, I am interested in narrowing the gap between algorithmic structure and practical system execution in modern vision and vision-language workloads. Rather than focusing solely on optimizing individual kernels or proposing new model architectures, I aim to study how visual priors inherent in vision workloads can be transformed at the algorithmic level and represented and executed efficiently at the system level.
Publications - [link]
Career - [link]
Extracurricular Study - [link]
Projects - [link]
Blog 1: Woongjoon AI
Blog 2: Tistory blog
Blog 3 (English): Woongjoon_AI2