Moon Ye-Bin
PhD Student @ POSTECH & Research Intern @ Huawei London Research Center
I am a Ph.D. student at POSTECH, South Korea, advised by Prof. Tae-Hyun Oh. I am currently a full-time research intern at the Huawei London Research Center in London, U.K. I received my master's degree from POSTECH and my bachelor's degree from Chung-Ang University (CAU), South Korea.
I am interested in building reliable and automated multimodal systems that exploit VLMs & LLMs as human proxies. This involves constructing and augmenting data to evaluate model appropriateness and to facilitate model refinement through SFT, preference optimization, and other techniques.
Keywords: VLM & LLM / Agentic System / SFT & PO / Model Evaluation / Data Augmentation & Construction (but not limited to these)
Full-time Research Intern, Huawei London Research Center, UK (current)
VLM & LLM / Agentic System / PO / Data Construction (RetouchLLM & Automatic Photomontage)
Jan. 2025 – Jul. 2025 / Jul. 2025 – Dec. 2025 (extended)
Collaborating with Jiankang Deng, Ismail Elezi, and Roy Miles
Ph.D. in Electrical Engineering, POSTECH, South Korea (current)
VLM & LLM / SFT / Model Evaluation / Data Augmentation & Construction
Mar. 2022 – present
Master's in Electrical Engineering, POSTECH, South Korea
Sports AIX Program
Mar. 2020 – Feb. 2022
Photo Retouching / VLM & LLM / Agentic System
RetouchLLM is an iterative retouching framework driven by a style-guided selection score, which ensures convergence toward the target style without requiring any training data. The system’s white-box, code-based design provides transparency and reproducibility, operating directly on high-resolution images. By leveraging LLMs & VLMs, the framework further supports natural-language human instructions, enabling personalized retouching aligned with user intent (a minimal sketch of the loop follows below).
Work performed while interning at Huawei London Research Center.
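Below is a minimal sketch of the iterative select-and-apply loop. The luminance-histogram distance and the PIL-enhancer candidates are illustrative stand-ins chosen for this sketch; in RetouchLLM, candidate edits are code proposed by the LLM and selection uses the style-guided score.

```python
import numpy as np
from PIL import Image, ImageEnhance

def style_distance(img: Image.Image, ref: Image.Image) -> float:
    """Illustrative stand-in for the style-guided selection score:
    distance between luminance histograms (lower = closer to target style)."""
    h1, _ = np.histogram(np.asarray(img.convert("L")), bins=64, range=(0, 255), density=True)
    h2, _ = np.histogram(np.asarray(ref.convert("L")), bins=64, range=(0, 255), density=True)
    return float(np.abs(h1 - h2).sum())

# Candidate code-based edits; in the actual framework these are proposed by the LLM.
CANDIDATE_EDITS = [
    ("brightness+", lambda im: ImageEnhance.Brightness(im).enhance(1.1)),
    ("brightness-", lambda im: ImageEnhance.Brightness(im).enhance(0.9)),
    ("contrast+",   lambda im: ImageEnhance.Contrast(im).enhance(1.1)),
    ("saturation+", lambda im: ImageEnhance.Color(im).enhance(1.1)),
]

def retouch(img: Image.Image, ref: Image.Image, steps: int = 10) -> Image.Image:
    best_score = style_distance(img, ref)
    for _ in range(steps):
        # Greedily keep the single edit that moves the image toward the target style.
        scored = [(style_distance(fn(img), ref), name, fn) for name, fn in CANDIDATE_EDITS]
        score, name, fn = min(scored)
        if score >= best_score:  # no candidate improves the score: converged
            break
        img, best_score = fn(img), score
    return img
```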
Photomontage / VLM & LLM / Preference Optimization (PO) / Data Construction
Model Discovery / VLM & LLM / Agentic System
Interpreting time-series data as visual graphs with VLMs yields higher model discovery accuracy than interpreting it as text with LLMs. By leveraging VLM-based agents, the system visualizes and analyzes the given time-series data to propose model candidates, then iteratively evaluates and refines them to identify the most suitable model.
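A minimal sketch of the visualize-propose-evaluate loop follows; the `vlm_propose_models` stub is hypothetical and stands in for the VLM agent, which in the actual system reads the rendered plot.

```python
import io
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def render_plot(t: np.ndarray, y: np.ndarray) -> bytes:
    """Render the series to a PNG; this image is what the VLM agent analyzes."""
    fig, ax = plt.subplots()
    ax.plot(t, y)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()

def vlm_propose_models(plot_png: bytes):
    """Hypothetical stand-in: the VLM agent would propose symbolic model
    candidates after inspecting the plot; here two forms are hard-coded."""
    return {
        "linear":   lambda t, a, b: a * t + b,
        "sinusoid": lambda t, a, w, b: a * np.sin(w * t) + b,
    }

def discover(t: np.ndarray, y: np.ndarray):
    candidates = vlm_propose_models(render_plot(t, y))
    best = None
    for name, f in candidates.items():
        try:
            params, _ = curve_fit(f, t, y, maxfev=5000)
        except RuntimeError:
            continue  # fit failed; the agent would refine or drop this candidate
        err = float(np.mean((f(t, *params) - y) ** 2))
        if best is None or err < best[0]:
            best = (err, name, params)
    return best  # (error, model name, fitted parameters)
```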
Data Augmentation / Diffusion Models
SYNAuG augments imbalanced training data with synthetic images to balance the data distribution. Although a domain gap exists between synthetic and real images, incorporating synthetic data is beneficial in data-imbalanced settings when at least 5–10 real samples are available. On long-tail recognition, model fairness, and spurious-correlation tasks, we demonstrate that SYNAuG outperforms algorithmic approaches trained only on real data, suggesting the importance of controlling imbalance from a data perspective (see the sketch below).
[Workshop] ICCVw 2023 (MMFM: What is Next in Multimodal Foundation Models?)
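A minimal sketch of the uniformization step, assuming a pre-generated per-class pool of synthetic images (e.g., sampled from a diffusion model); all names are illustrative.

```python
import random
from collections import Counter

def uniformize(real_data, synth_pool, seed: int = 0):
    """Top up each class with synthetic samples until the label distribution is uniform.

    real_data:  list of (image, label) pairs with an imbalanced label distribution
    synth_pool: dict mapping label -> list of pre-generated synthetic images
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in real_data)
    target = max(counts.values())  # match every class to the head-class size
    augmented = list(real_data)
    for label, n in counts.items():
        deficit = target - n
        if deficit > 0:  # sample with replacement in case the pool is small
            augmented += [(img, label) for img in rng.choices(synth_pool[label], k=deficit)]
    return augmented
```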
Model Evaluation / VLM & LLM / SFT
VLM’s Eye Examination investigates how a VLM perceives images, specifically focusing on key elements of visual recognition, from primitive color and shape to semantic levels. Models that fail to understand the exam format undergo supervised fine-tuning before evaluation. The color exam reveals that VLMs are less sensitive to the green spectrum. Furthermore, we observe that the LLM component, which serves as the cognitive core of the VLM, influences shape sensitivity and patch-wise semantic discrimination.
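An illustrative sketch of how a color exam could be administered; the stimulus design and the `vlm_answer` callable are assumptions for this sketch, not the paper's exact protocol.

```python
import colorsys
from PIL import Image

def color_patch(hue: float, size: int = 224) -> Image.Image:
    """Solid stimulus at a given hue in [0, 1), full saturation and value."""
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return Image.new("RGB", (size, size), (int(r * 255), int(g * 255), int(b * 255)))

def color_exam(vlm_answer, hues, name_of):
    """Per-hue accuracy; depressed scores near hue ~ 1/3 would reflect the
    green-spectrum insensitivity reported in the paper.
    vlm_answer(image, question) -> str is the model under examination;
    name_of(hue) -> str gives the ground-truth color name."""
    acc = {}
    for h in hues:
        ans = vlm_answer(color_patch(h), "What color is this image? Answer in one word.")
        acc[h] = float(ans.strip().lower() == name_of(h))
    return acc
```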
Model Evaluation / VLM & LLM
BEAF evaluates whether a VLM correctly understands a scene by observing its response changes to the same question when presented with an original image versus a manipulated version from which an object has been removed. Based on the proposed change-aware metrics, we demonstrate that answers previously considered non-hallucinatory by existing benchmarks may actually be guesses made without referencing the image content. This finding suggests that observing changes along the vision-axis is crucial for accurately assessing VLM hallucination.
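An illustrative sketch of change-aware scoring over before/after answer pairs, loosely following the benchmark's true-understanding and stubbornness ideas; the exact metric definitions are in the paper.

```python
from dataclasses import dataclass

@dataclass
class Pair:
    ans_before: str      # answer on the original image ("yes"/"no")
    ans_after: str       # answer after the queried object is removed
    about_removed: bool  # whether the question concerns the removed object

def true_understanding(pairs):
    """Answered "yes" before removal and correctly flipped to "no" after."""
    rel = [p for p in pairs if p.about_removed]
    return sum(p.ans_before == "yes" and p.ans_after == "no" for p in rel) / max(len(rel), 1)

def stubbornness(pairs):
    """Still answers "yes" although the object is gone: a likely guess
    made without referencing the image content."""
    rel = [p for p in pairs if p.about_removed]
    return sum(p.ans_before == "yes" and p.ans_after == "yes" for p in rel) / max(len(rel), 1)
```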
Data Augmentation / VLM & LLM
TextManiA augments visual features by incorporating attribute vectors from the text embedding space. Unlike conventional noise injection techniques, TextManiA makes augmentation understandable and semantically meaningful. Surprisingly, we demonstrate that visual features can be effectively augmented not only using aligned text-visual spaces like CLIP, but also by leveraging embeddings from independent LLMs such as GPT-2 and BERT (a minimal sketch follows below).
[Workshop] CVPRw 2023 (WFM: Workshop on Foundation Models)
[Workshop] ICCVw 2023 (MMFM: What is Next in Multimodal Foundation Models?)
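A minimal sketch of attribute-vector augmentation; the encoder stub and the `strength` hyperparameter are assumptions for illustration.

```python
import torch

def attribute_vector(text_encode, cls: str, attr: str) -> torch.Tensor:
    """The difference of the two text embeddings isolates the attribute direction."""
    return text_encode(f"a photo of a {attr} {cls}") - text_encode(f"a photo of a {cls}")

def textmania_augment(visual_feat: torch.Tensor, attr_vec: torch.Tensor,
                      strength: float = 0.5) -> torch.Tensor:
    # Perturb the visual feature along a semantically meaningful text direction.
    return visual_feat + strength * attr_vec / attr_vec.norm()

# Usage with a stand-in encoder (in practice: CLIP's text tower, or even GPT-2/BERT):
torch.manual_seed(0)
text_encode = lambda s: torch.randn(512)   # hypothetical encoder stub
v = torch.randn(512)                       # visual feature of, e.g., a "dog" image
v_aug = textmania_augment(v, attribute_vector(text_encode, "dog", "fluffy"))
```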
Few-shot Segmentation / Markov Random Field / Parameterization
Based on performance bottleneck analyses, ENInst introduces sub-task enhancement methods: MRF-based instance-wise refinement to improve pixel localization quality, and a novel-classifier composition that parameterizes novel classifiers with base classifiers and Gaussian random vectors to improve classification accuracy.
[Conference] IPIU 2022, Best Paper Award
Scene Reconstruction / HDR
HDR-Plenoxels learns the plenoptic function of a 3D scene by jointly modeling the 3D geometry, the physical radiance field, and the varying camera settings inherent in 2D LDR images.
[Workshop] CVPRw 2023 (3DMV: Learning 3D with Multi-View Supervision)
Federated Learning / Parameterization
FedPara re-parameterizes layer weights as a low-rank Hadamard product, achieving larger capacity at lower communication cost than models with the original layers in a federated learning (FL) setting. pFedPara, a personalized-FL application built on FedPara, splits layer weights into global and local parameters and shows more robust results than competing methods.
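A minimal sketch of the low-rank Hadamard-product parameterization for a linear layer; the initialization here is illustrative rather than the paper's exact scheme.

```python
import torch
import torch.nn as nn

class FedParaLinear(nn.Module):
    """Linear layer with weights W = (X1 @ Y1.T) * (X2 @ Y2.T): an elementwise
    (Hadamard) product of two rank-`rank` factorizations. The product can reach
    rank `rank**2`, so capacity exceeds a plain low-rank layer while only
    2 * rank * (out_features + in_features) weight parameters (plus bias)
    are communicated per FL round."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # Illustrative small-random initialization (not the paper's scheme).
        self.x1 = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.y1 = nn.Parameter(torch.randn(in_features, rank) * 0.02)
        self.x2 = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.y2 = nn.Parameter(torch.randn(in_features, rank) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = (self.x1 @ self.y1.T) * (self.x2 @ self.y2.T)  # shape (out, in)
        return x @ w.T + self.bias
```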
Localized image retrieval (Samsung Research, 2024)
Time-series regression with LLMs (SAIT, 2024)
Abnormal and danger sign detection with LLMs (KRIT, 2023)
Video panoptic segmentation and depth estimation (ETRI, 2022)
Data augmentation for domain adaptive object detection (LG Display, 2022)
Weakly-supervised low-shot instance segmentation (ETRI, 2021)
Self-supervised few-shot learning by episodic instance discrimination (ETRI, 2020)
Journal/Conference Reviewer
IJCV
WACV'25, CVPR'25, ICCV'25, ACMMM'25, NeurIPS'25
ACCV'24, ECCV'24, CVPR'24
ICCV'23, CVPR'23, WACV'23
Teaching Experiences
[NAVER Boostcamp AI Tech 6th] Data-centric CV course, NAVER & Upstage, 2023
[EECE236] Learning Electronic Engineering with MATLAB, POSTECH, 2020
POSTECH AMILab seminar [YouTube]