Moon Ye-Bin
PhD Student @ POSTECH & Research Intern @ Huawei London Research Center
I am a Ph.D. student at POSTECH, South Korea, advised by Prof. Tae-Hyun Oh. I am currently a full-time research intern at the Huawei London Research Center in London, U.K. I received my master's degree from POSTECH and my bachelor's degree from Chung-Ang University (CAU), South Korea.
I am interested in building reliable and automated multimodal systems that exploit VLMs & LLMs as human proxies. This involves constructing and augmenting data to evaluate model appropriateness and to facilitate model refinement through SFT, preference optimization, and other techniques.
Keywords: VLM & LLM / Agentic System / SFT & PO / Model Evaluation / Data Augmentation & Construction (but not limited to these)
Full-time Research Intern, Huawei London Research Center, UK (current)
VLM & LLM / Agentic System / PO / Data Construction (RetouchLLM & Automatic Photomontage)
Jan. 2025 – Jul. 2025 / Jul. 2025 – Dec. 2025 (extended)
Collaborating with Jiankang Deng, Ismail Elezi, and Roy Miles
Ph.D. in Electrical Engineering, POSTECH, South Korea (current)
VLM & LLM / SFT / Model Evaluation / Data Augmentation & Construction
Mar. 2022 – present
Master's in Electrical Engineering, POSTECH, South Korea
Sports AIX Program
Mar. 2020 – Feb. 2022
Photo Retouching / VLM & LLM / Agentic System
RetouchLLM is an iterative retouching framework driven by a style-guided selection score, which ensures convergence toward the target style without requiring any training data. The system’s white-box, code-based design provides transparency and reproducibility, operating directly on high-resolution images. By leveraging LLMs & VLMs, the framework further supports natural-language human instructions, enabling personalized retouching aligned with user intent (a minimal sketch of the loop follows below).
Work performed while interning at Huawei London Research Center.
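Below is a minimal sketch of the iterative select-and-apply loop. The luminance-histogram distance and the PIL-enhancer candidates are illustrative stand-ins chosen for this sketch; in RetouchLLM, candidate edits are code proposed by the LLM and selection uses the style-guided score.

```python
import numpy as np
from PIL import Image, ImageEnhance

def style_distance(img: Image.Image, ref: Image.Image) -> float:
    """Illustrative stand-in for the style-guided selection score:
    distance between luminance histograms (lower = closer to target style)."""
    h1, _ = np.histogram(np.asarray(img.convert("L")), bins=64, range=(0, 255), density=True)
    h2, _ = np.histogram(np.asarray(ref.convert("L")), bins=64, range=(0, 255), density=True)
    return float(np.abs(h1 - h2).sum())

# Candidate code-based edits; in the actual framework these are proposed by the LLM.
CANDIDATE_EDITS = [
    ("brightness+", lambda im: ImageEnhance.Brightness(im).enhance(1.1)),
    ("brightness-", lambda im: ImageEnhance.Brightness(im).enhance(0.9)),
    ("contrast+",   lambda im: ImageEnhance.Contrast(im).enhance(1.1)),
    ("saturation+", lambda im: ImageEnhance.Color(im).enhance(1.1)),
]

def retouch(img: Image.Image, ref: Image.Image, steps: int = 10) -> Image.Image:
    best_score = style_distance(img, ref)
    for _ in range(steps):
        # Greedily keep the single edit that moves the image toward the target style.
        scored = [(style_distance(fn(img), ref), name, fn) for name, fn in CANDIDATE_EDITS]
        score, name, fn = min(scored)
        if score >= best_score:  # no candidate improves the score: converged
            break
        img, best_score = fn(img), score
    return img
```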
Photomontage / VLM & LLM / Preference Optimization (PO) / Data Construction
Model Discovery / VLM & LLM / Agentic System
Interpreting time-series data as visual graphs with VLMs yields higher model discovery accuracy than interpreting it as text with LLMs. By leveraging VLM-based agents, the system visualizes and analyzes the given time-series data to propose model candidates, then iteratively evaluates and refines them to identify the most suitable model.
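A minimal sketch of the visualize-propose-evaluate loop follows; the `vlm_propose_models` stub is hypothetical and stands in for the VLM agent, which in the actual system reads the rendered plot.

```python
import io
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def render_plot(t: np.ndarray, y: np.ndarray) -> bytes:
    """Render the series to a PNG; this image is what the VLM agent analyzes."""
    fig, ax = plt.subplots()
    ax.plot(t, y)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    return buf.getvalue()

def vlm_propose_models(plot_png: bytes):
    """Hypothetical stand-in: the VLM agent would propose symbolic model
    candidates after inspecting the plot; here two forms are hard-coded."""
    return {
        "linear":   lambda t, a, b: a * t + b,
        "sinusoid": lambda t, a, w, b: a * np.sin(w * t) + b,
    }

def discover(t: np.ndarray, y: np.ndarray):
    candidates = vlm_propose_models(render_plot(t, y))
    best = None
    for name, f in candidates.items():
        try:
            params, _ = curve_fit(f, t, y, maxfev=5000)
        except RuntimeError:
            continue  # fit failed; the agent would refine or drop this candidate
        err = float(np.mean((f(t, *params) - y) ** 2))
        if best is None or err < best[0]:
            best = (err, name, params)
    return best  # (error, model name, fitted parameters)
```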
Data Augmentation / Diffusion Models
SYNAuG augments imbalanced training data with synthetic images to balance the data distribution. Although a domain gap exists between synthetic and real images, incorporating synthetic data is beneficial in data-imbalanced settings when at least 5–10 real samples are available. On long-tail recognition, model fairness, and spurious-correlation tasks, we demonstrate that SYNAuG outperforms algorithmic approaches trained only on real data, suggesting the importance of controlling imbalance from a data perspective (see the sketch below).
[Workshop] ICCVw 2023 (MMFM: What is Next in Multimodal Foundation Models?)
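A minimal sketch of the uniformization step, assuming a pre-generated per-class pool of synthetic images (e.g., sampled from a diffusion model); all names are illustrative.

```python
import random
from collections import Counter

def uniformize(real_data, synth_pool, seed: int = 0):
    """Top up each class with synthetic samples until the label distribution is uniform.

    real_data:  list of (image, label) pairs with an imbalanced label distribution
    synth_pool: dict mapping label -> list of pre-generated synthetic images
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in real_data)
    target = max(counts.values())  # match every class to the head-class size
    augmented = list(real_data)
    for label, n in counts.items():
        deficit = target - n
        if deficit > 0:  # sample with replacement in case the pool is small
            augmented += [(img, label) for img in rng.choices(synth_pool[label], k=deficit)]
    return augmented
```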
Model Evaluation / VLM & LLM / SFT
VLM’s Eye Examination investigates how a VLM perceives images, specifically focusing on key elements of visual recognition, from primitive color and shape to semantic levels. Models that fail to understand the exam format undergo supervised fine-tuning before evaluation. The color exam reveals that VLMs are less sensitive to the green spectrum. Furthermore, we observe that the LLM component, which serves as the cognitive core of the VLM, influences shape sensitivity and patch-wise semantic discrimination.
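An illustrative sketch of how a color exam could be administered; the stimulus design and the `vlm_answer` callable are assumptions for this sketch, not the paper's exact protocol.

```python
import colorsys
from PIL import Image

def color_patch(hue: float, size: int = 224) -> Image.Image:
    """Solid stimulus at a given hue in [0, 1), full saturation and value."""
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return Image.new("RGB", (size, size), (int(r * 255), int(g * 255), int(b * 255)))

def color_exam(vlm_answer, hues, name_of):
    """Per-hue accuracy; depressed scores near hue ~ 1/3 would reflect the
    green-spectrum insensitivity reported in the paper.
    vlm_answer(image, question) -> str is the model under examination;
    name_of(hue) -> str gives the ground-truth color name."""
    acc = {}
    for h in hues:
        ans = vlm_answer(color_patch(h), "What color is this image? Answer in one word.")
        acc[h] = float(ans.strip().lower() == name_of(h))
    return acc
```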
Model Evaluation / VLM & LLM
BEAF evaluates whether a VLM correctly understands a scene by observing its response changes to the same question when presented with an original image versus a manipulated version from which an object has been removed. Based on the proposed change-aware metrics, we demonstrate that answers previously considered non-hallucinatory by existing benchmarks may actually be guesses made without referencing the image content. This finding suggests that observing changes along the vision-axis is crucial for accurately assessing VLM hallucination.
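An illustrative sketch of change-aware scoring over before/after answer pairs, loosely following the benchmark's true-understanding and stubbornness ideas; the exact metric definitions are in the paper.

```python
from dataclasses import dataclass

@dataclass
class Pair:
    ans_before: str      # answer on the original image ("yes"/"no")
    ans_after: str       # answer after the queried object is removed
    about_removed: bool  # whether the question concerns the removed object

def true_understanding(pairs):
    """Answered "yes" before removal and correctly flipped to "no" after."""
    rel = [p for p in pairs if p.about_removed]
    return sum(p.ans_before == "yes" and p.ans_after == "no" for p in rel) / max(len(rel), 1)

def stubbornness(pairs):
    """Still answers "yes" although the object is gone: a likely guess
    made without referencing the image content."""
    rel = [p for p in pairs if p.about_removed]
    return sum(p.ans_before == "yes" and p.ans_after == "yes" for p in rel) / max(len(rel), 1)
```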
Data Augmentation / VLM & LLM
TextManiA augments visual features by incorporating attribute vectors from the text embedding space. Unlike conventional noise injection techniques, TextManiA makes augmentation understandable and semantically meaningful. Surprisingly, we demonstrate that visual features can be effectively augmented not only using aligned text-visual spaces like CLIP, but also by leveraging embeddings from independent LLMs such as GPT-2 and BERT (a minimal sketch follows below).
[Workshop] CVPRw 2023 (WFM: Workshop on Foundation Models)
[Workshop] ICCVw 2023 (MMFM: What is Next in Multimodal Foundation Models?)
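A minimal sketch of attribute-vector augmentation; the encoder stub and the `strength` hyperparameter are assumptions for illustration.

```python
import torch

def attribute_vector(text_encode, cls: str, attr: str) -> torch.Tensor:
    """The difference of the two text embeddings isolates the attribute direction."""
    return text_encode(f"a photo of a {attr} {cls}") - text_encode(f"a photo of a {cls}")

def textmania_augment(visual_feat: torch.Tensor, attr_vec: torch.Tensor,
                      strength: float = 0.5) -> torch.Tensor:
    # Perturb the visual feature along a semantically meaningful text direction.
    return visual_feat + strength * attr_vec / attr_vec.norm()

# Usage with a stand-in encoder (in practice: CLIP's text tower, or even GPT-2/BERT):
torch.manual_seed(0)
text_encode = lambda s: torch.randn(512)   # hypothetical encoder stub
v = torch.randn(512)                       # visual feature of, e.g., a "dog" image
v_aug = textmania_augment(v, attribute_vector(text_encode, "dog", "fluffy"))
```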
Few-shot Segmentation / Markov Random Field / Parameterization
Based on performance bottleneck analyses, ENInst introduces sub-task enhancement methods: MRF-based instance-wise refinement to improve pixel localization quality, and a novel-classifier composition that parameterizes novel classifiers with base classifiers and Gaussian random vectors to improve classification accuracy.
[Conference] IPIU 2022, Best Paper Award
Scene Reconstruction / HDR
HDR-Plenoxels learns the plenoptic function of a 3D scene by jointly modeling the 3D geometry, the physical radiance field, and the varying camera settings inherent in 2D LDR images.
[Workshop] CVPRw 2023 (3DMV: Learning 3D with Multi-View Supervision)
Federated Learning / Parameterization
FedPara re-parameterizes layer weights as a low-rank Hadamard product, achieving larger capacity at lower communication cost than models with the original layers in a federated learning (FL) setting. pFedPara, a personalized-FL application built on FedPara, splits layer weights into global and local parameters and shows more robust results than competing methods.
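A minimal sketch of the low-rank Hadamard-product parameterization for a linear layer; the initialization here is illustrative rather than the paper's exact scheme.

```python
import torch
import torch.nn as nn

class FedParaLinear(nn.Module):
    """Linear layer with weights W = (X1 @ Y1.T) * (X2 @ Y2.T): an elementwise
    (Hadamard) product of two rank-`rank` factorizations. The product can reach
    rank `rank**2`, so capacity exceeds a plain low-rank layer while only
    2 * rank * (out_features + in_features) weight parameters (plus bias)
    are communicated per FL round."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # Illustrative small-random initialization (not the paper's scheme).
        self.x1 = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.y1 = nn.Parameter(torch.randn(in_features, rank) * 0.02)
        self.x2 = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.y2 = nn.Parameter(torch.randn(in_features, rank) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = (self.x1 @ self.y1.T) * (self.x2 @ self.y2.T)  # shape (out, in)
        return x @ w.T + self.bias
```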
Localized image retrieval (Samsung Research, 2024)
Time-series regression with LLMs (SAIT, 2024)
Abnormal and danger sign detection with LLMs (KRIT, 2023)
Video panoptic segmentation and depth estimation (ETRI, 2022)
Data augmentation for domain adaptive object detection (LG Display, 2022)
Weakly-supervised low-shot instance segmentation (ETRI, 2021)
Self-supervised few-shot learning by episodic instance discrimination (ETRI, 2020)
Journal/Conference Reviewer
IJCV
WACV'25, CVPR'25, ICCV'25, ACMMM'25, NeurIPS'25
ACCV'24, ECCV'24, CVPR'24
ICCV'23, CVPR'23, WACV'23
Teaching Experiences
[NAVER Boostcamp AI Tech 6th] Data-centric CV course, NAVER & Upstage, 2023
[EECE236] Learning Electronic Engineering with MATLAB, POSTECH, 2020
POSTECH AMILab seminar [YouTube]