Sogang University
BA, Media and Entertainment
BE, Computer Science and Engineering
Feb 2020 - Aug 2026 (Expected)
State University of New York, Stony Brook
Exchange Program
Aug 2024 - Dec 2024
Ulsan Science High School
Mar 2017 - Jan 2020
Photographer
Sogang University, Office of Public Affairs
Jan 2025 - Present
AI Engineer Intern
GenON
Dec 2025 - Feb 2026
Big Data Platform Department Intern
Woori Bank
Jul 2025 - Aug 2025
Public Affairs Specialist
Defense Intelligence Agency - 777 Command
May 2022 - Nov 2023
2nd Place - 13th BIGCONTEST 2025 (AI·Data Analysis Track)
Shinhan Card CEO
Dec 2025
Award Winner: 37th Place - 12th University Student Mock Investment Competition
Korea Investment & Securities
Jul 2025
Namgoong Hoon Future Talent Scholarship
Sogang University Development Fund
May 2025
Excellence Award - 3rd Unification Video Contest (2021 Republic of Korea Youth Day)
Members of the National Assembly
Nov 2021
Core Systems & Architecture
Operating Systems
Fundamentals of Compiler Construction
Computer Architecture
Computer Networks
System Programming
Digital Logic Design
Introduction to Computer System
AI & Data Science
Natural Language Processing
Introduction to Data Science
Introduction to Machine Learning
AI Convergence Capstone Design
Data & AI
Mathematics & Algorithms
Linear Algebra
Analytic Geometry and Calculus I
Analytic Geometry and Calculus II
Data Structures
Design and Analysis of Algorithms
Engineering & Media
Software Engineering
Introduction to Computer Graphics
Introduction to Visual Media Programming
Advanced Applied C Programming
Languages
Python
C/C++
SQL
F#
Data / AI
Pandas
Scikit-learn
PyTorch
Keras
Frontend
Streamlit
Backend
FastAPI
Database
MySQL
Infrastructure
Docker
Engineer Big Data Analysis
Korea Data Agency
Dec 19, 2025
Advanced Data Analytics Semi-Professional
Korea Data Agency
Mar 21, 2025
Specialist-Multimedia Contents Producing
Human Resources Development Service of Korea
Aug 7, 2020
Craftsman Computer Graphics Operation
Human Resources Development Service of Korea
Jul 17, 2020
OPIc - IH
ACTFL
May 7, 2025
TOEIC - 990
ETS
Jan 26, 2025
This 2025 Big Contest project proposed an AI-driven early warning system that predicts crisis signals, such as sharp sales declines or closure risks, for local small and medium-sized businesses. The system was designed to analyze diverse data, identify key predictive signals, and provide actionable insights to help business owners stabilize their operations.
This project, conducted for the 2025 Big Contest, involved the development of an "AI Early Warning System for Small Businesses". Facing an unprecedented crisis with record-high closure rates and worsening financial health, small business owners lack tools that offer predictive insights. This project leverages survival analysis to build a model that preemptively detects crisis signals. The analysis was built on a comprehensive dataset integrating internal store performance (Meso), market/geospatial data (Macro), and macroeconomic indicators (Mega).
To provide preemptive risk detection
To diagnose specific root causes
To create a foundation for customized support
To overcome methodological limitations
Managed the end-to-end data science lifecycle, from problem definition and data collection to modeling, evaluation, and strategic proposal.
Processed, cleaned, and integrated complex, multi-source datasets.
Engineered time-series features.
Developed, trained, and evaluated a Time-Varying Cox model for statistical interpretation.
Analyzed and translated complex model outputs (Hazard Ratios, C-Index scores) into actionable business insights and designed a final application prototype.
Problem Re-definition: Identified that standard classification is unsuitable for this task. Its failure to handle "censored" data (stores that are still operating) and extreme data imbalance leads to models with high accuracy but practically zero predictive power (low recall/F1 scores).
Geospatial Feature Engineering: Overcame the limitations of outdated administrative districts by using the Kakao Maps API and K-means clustering to generate new 'commercial cluster IDs' that reflect actual consumer search patterns.
Dual-Track Modeling Strategy: Implemented a two-pronged approach to balance predictive accuracy with "Explainable AI" (XAI).
Prediction (Who/When): A Random Survival Forest (RSF) model was developed using scikit-survival to accurately identify which stores were at high risk. This model achieved the highest performance (C-Index: 0.9553).
Interpretation (Why): A Time-Varying Cox (TVC) model was developed to provide clear, interpretable insights into why a store was at risk by analyzing the dynamic impact of variables over time.
Key Findings & Drivers: The TVC model identified statistically significant (p<0.05) drivers of closure risk. Key factors included high delivery sales ratio (Hazard Ratio: 1.54), high floating population customer ratio (HR: 1.33), and high new customer ratio (HR: 1.18), indicating struggles with profitability and customer loyalty.
Key Deliverable (XAI Solution): Proposed a deployable "AI Early Warning System" for Shinhan Card's "MyShop Partner" platform. The dashboard provides a simple "risk traffic light", diagnoses the primary risk factor (e.g., "weakening customer loyalty"), and offers concrete, actionable solutions (e.g., "Start a stamp coupon event for repeat customers").
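The C-Index reported above measures how well a model ranks stores by risk while respecting censoring. The following is a minimal sketch of Harrell's concordance index in plain Python, not the project's code (which used scikit-survival); the function name and the toy data are invented for illustration, and tied observation times are simply skipped.

```python
from itertools import combinations


def concordance_index(durations, events, risk_scores):
    """Share of comparable pairs that the risk scores order correctly.

    A pair is comparable only when the store with the shorter observed
    time actually closed (event == 1). If the shorter time is censored
    (store still operating), the true ordering is unknown and the pair
    is skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # simplification: tied times are ignored
        # Put the store with the shorter observed time first.
        first, second = (i, j) if durations[i] < durations[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so not comparable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): months observed, closure flag, predicted risk.
durations = [6, 12, 18, 24, 24]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.3, 0.4, 0.5, 0.1]
c_index = concordance_index(durations, events, risk_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why censored stores can still contribute information here while a plain classifier would have to drop or mislabel them.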
Survival Analysis, Data Analysis, Data Modeling, Data Preprocessing
Woori Bank holds a significant advantage in the youth market through its exclusive student ID card partnerships with major universities (10 of 26 in the metro area), securing over 200,000 young customers. However, analysis revealed this segment has the lowest digital engagement with the bank's WON Banking app. This project aimed to solve this paradox by analyzing young customer data to develop targeted gamification strategies, capitalizing on this generation's high preference for game-like elements and fun in financial services.
To analyze and define the characteristics of the "Active Young Customer" (aged 19-34), comparing their financial behavior and digital engagement against other generations (Generation X and Baby Boomers).
To identify the key drivers and product relationships that correlate with a young customer becoming "Active" (defined as having a 1-month average deposit balance of 300,000 KRW or more).
To prove that "Active Young Customers" are not a homogeneous group, and to segment them into distinct, actionable clusters based on their financial goals and digital activity levels.
To design and propose specific, persona-based gamification marketing strategies to be implemented within the WON Banking app, with the goal of increasing engagement, retention (Lock-in), and cross-selling opportunities.
Conducted end-to-end data analysis on a 100:1 sampled dataset of 256,104 Woori Bank customers.
Processed and analyzed customer data across four main categories: basic info, account/transaction info (deposits, loans, card usage), channel activity (branch vs. app visits), and product holdings (savings, funds, etc.).
Engineered key derived variables for the analysis, such as "Active Customer Status" and "Weekly WON Banking Visitor".
Performed Apriori Association Rule analysis to discover significant behavioral patterns, such as the relationship between holding a housing subscription account and being an active customer.
Executed K-Means clustering (K=5) to segment the "Active Young Customer" population, validating the cluster count using the Elbow Method and Silhouette Score (0.46).
Synthesized all analytical findings into three distinct customer personas and developed a detailed, three-pronged gamification marketing proposal tailored to each persona.
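The Silhouette Score used to validate the cluster count compares, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b). The following is a minimal plain-Python sketch of that calculation, assuming Euclidean distance; the function name and the 2-D toy points are hypothetical and do not reflect the actual customer features.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (distance to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy example: two well-separated groups, labeled well and badly.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(points, [0, 0, 1, 1])
bad_score = silhouette_score(points, [0, 1, 0, 1])
```

Scores near 1 indicate compact, well-separated clusters, and scores near 0 or below indicate overlapping or misassigned points. In practice the same score would be computed for several values of K and read alongside the Elbow Method.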
Key Insight from EDA: Young customers have the highest rate of Open Banking adoption (24.1%), yet the lowest rate of weekly engagement with Woori's own WON Banking app (15.4%).
Key Driver Analysis (Association Rules): Identified that holding a "Housing Subscription" (청약) account is strongly associated with being an active customer: holders were active at 2.27 times the baseline rate (Lift: 2.27). This suggests the segment is highly goal-oriented.
Customer Segmentation (K-Means): The analysis disproved the idea of a single "young customer" and segmented the active group into 5 clusters, which were then defined as three key personas:
'Professional Digital Affluent': A high-value, high-engagement core group with high balances and product diversification (e.g., "Youth Leap Account"). Identified as potential VIPs.
'Pragmatic Subscription Saver': A large group focused on a single goal (housing subscription) but with minimal digital engagement. Identified as the highest-potential group for cross-selling.
'Low-Engagement Academic': A group with basic assets but the lowest financial and digital activity, requiring re-activation strategies.
Key Deliverable (Persona-Based Gamification Strategy): Proposed a detailed marketing plan with three distinct "Challenges" within the WON Banking app, each targeting a specific persona:
For 'Affluents' (Up-sell/Lock-in): An "Investment Master Challenge" with visible ranks (Bronze to Diamond), achievement badges, and "hidden quests" linked to portfolio diversification and PB consultations.
For 'Savers' (Cross-sell/Engagement): A "My Home-Building Challenge" that visually gamifies saving by "growing" a virtual house as their subscription balance increases, supported by daily quests and random rewards.
For 'Academics' (Re-activation): A "Simple App-Tech & Mini-Game" program to build a habit of logging in via low-effort daily check-ins and 10-second quests for small, tangible point rewards.
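The Lift figure quoted in the association rule finding is the rule's confidence divided by the base rate of the outcome. The following is a minimal sketch of that arithmetic in plain Python; the field names and the ten toy records are hypothetical and are not drawn from the bank's data.

```python
def lift(records, antecedent, consequent):
    """Lift of the rule antecedent -> consequent over boolean records.

    lift = P(consequent | antecedent) / P(consequent)
    """
    total = len(records)
    with_antecedent = sum(1 for r in records if r[antecedent])
    with_consequent = sum(1 for r in records if r[consequent])
    with_both = sum(1 for r in records if r[antecedent] and r[consequent])
    if with_antecedent == 0 or with_consequent == 0:
        raise ValueError("rule is undefined: an item never occurs")
    confidence = with_both / with_antecedent
    base_rate = with_consequent / total
    return confidence / base_rate


# Toy data (invented): 4 of 10 customers hold a housing subscription,
# 3 of those 4 are active, and 5 of 10 customers are active overall.
records = (
    [{"housing_subscription": True, "active": True}] * 3
    + [{"housing_subscription": True, "active": False}] * 1
    + [{"housing_subscription": False, "active": True}] * 2
    + [{"housing_subscription": False, "active": False}] * 4
)
rule_lift = lift(records, "housing_subscription", "active")
```

A lift above 1 means the two attributes co-occur more often than they would if independent. It describes association and does not by itself establish that opening the account causes a customer to become active.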
Data Analysis, Association Rule Analysis, Cluster Analysis, Strategic Planning, Quantitative Reasoning
INSICON is INSIGHT's internal data competition, designed for members to apply and expand upon data analysis skills gained from our sessions. In this hands-on competition, participants develop business scenarios, analyze metadata, and leverage workshop insights to create actionable strategies.
This project involved conducting an in-depth analysis of Dunnhumby's internal data to identify key business challenges and formulate data-backed strategies for growth. This dataset contains household-level transactions over two years from a group of 2,500 households who are frequent shoppers at a retailer. It contains all of each household’s purchases, not just those from a limited number of categories. For certain households, demographic information as well as direct marketing contact history are included.
To diagnose current business problems or areas with significant improvement potential using quantitative data analysis.
To develop realistic and actionable solutions addressing these identified issues.
To ensure that both the problem definition and the proposed solutions were rigorously supported by numerical evidence.
To forecast the expected impact of the proposed strategies in quantifiable terms.
To navigate the complexities of Dunnhumby's retail distribution structure, where multiple stakeholders are involved, requiring careful consideration for decisions related to pricing, coupon issuance, and promotional activities.
Preprocessed and analyzed extensive internal datasets (e.g., transaction, customer, product, promotional data).
Developed, coded, and analyzed a regression model to predict customer value.
Analyzed behavioral patterns of segmented customer groups through persona development and analysis.
Presented a comprehensive proposal detailing the problem, data-driven analysis, proposed solution, and expected numerical outcomes for Dunnhumby's executive management.
Customer Base Analysis & Strategic Goal Refinement: An analysis of 2,500 loyal customer households, characterized by high retention, prompted a strategic shift from new customer acquisition to maximizing the value derived from the existing customer base.
KPI Definition: A "Customer Value" KPI was established by dividing each customer's cumulative sales by their engagement period, providing a standardized metric for comparison and predictive modeling.
Customer Segmentation & Persona Development based on Behavioral Patterns: Customers were categorized into four distinct personas (Loyal, Churned, Returning, Non-Loyal) based on their visit patterns, revealing unique needs and highlighting the inefficiency of a uniform marketing approach.
Identification of Key Value Drivers & Formulation of Targeted Engagement Strategies: Regression modeling identified key factors influencing customer value, leading to the development of tailored engagement strategies designed to retain high-value customers and effectively reactivate or convert other segments.
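The "Customer Value" KPI above divides cumulative sales by the engagement period. The following is a minimal sketch of that calculation, assuming transactions arrive as (household, day, sales) tuples and that the engagement period is the number of days from first to last purchase, inclusive. Both the tuple layout and that definition of the period are assumptions for illustration, not the project's confirmed definition.

```python
from collections import defaultdict


def customer_value(transactions):
    """Map each household to cumulative sales / engagement period in days."""
    totals = defaultdict(float)
    first_day = {}
    last_day = {}
    for household, day, sales in transactions:
        totals[household] += sales
        first_day[household] = min(day, first_day.get(household, day))
        last_day[household] = max(day, last_day.get(household, day))
    # Inclusive day count, so a single-visit household has a period of 1.
    return {
        household: totals[household]
        / (last_day[household] - first_day[household] + 1)
        for household in totals
    }


# Toy transactions (invented): household id, day index, sales amount.
transactions = [
    ("A", 1, 20.0),
    ("A", 10, 30.0),
    ("B", 5, 12.0),
]
values = customer_value(transactions)
```

Normalizing by the engagement period keeps long-tenured and recent households comparable, which is what makes the metric usable as a regression target.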
Data Analysis, Strategic Planning, Problem-Solving, Quantitative Reasoning, Communication & Presentation
The CSE354 (Natural Language Processing) Final Team Project, titled 'Reading Between the Labels: Mitigating Ambiguity in MBTI Personality Classification', was a capstone experience during my Fall 2024 exchange at Stony Brook University. This hands-on project focused on leveraging Large Language Models to tackle the inherent ambiguities in personal trait inference, specifically by developing and evaluating techniques to improve MBTI personality type prediction from textual data.
This project focused on leveraging Large Language Models (LLMs) to infer personal traits, specifically tackling the task of MBTI personality type prediction. After initial models showed low accuracy (e.g., BERT at 37.53%), the project conducted an in-depth analysis to identify and address the core challenge: the inherent "ambiguity" in datasets related to human personality traits due to the lack of strong ground truth. The primary goal was to develop and evaluate techniques to mitigate this ambiguity and improve classification performance.
To thoroughly analyze and define the impact of ambiguity on NLP models in the context of personality trait prediction.
To develop and implement data-centric strategies, including perplexity-based filtering and data augmentation, to improve the quality and reduce ambiguity in the training dataset.
To design and test advanced prompting techniques, such as label reduction and contrastive comparison, for enhancing the performance of LLMs in zero-shot inference scenarios.
To systematically evaluate the effectiveness of these proposed ambiguity mitigation techniques across various models (BERT, ELECTRA, GPT-4o Mini) using metrics like accuracy, recall, and F1-Score.
Led the processing and preparation of the dataset for training, which included implementing data cleaning procedures, oversampling techniques to address class imbalance, and perplexity-based filtering to refine data quality.
Developed the codebase for fine-tuning the ELECTRA model and conducted a series of experiments to assess its performance on the MBTI personality classification task.
Systematically evaluated the ELECTRA model's effectiveness across different versions of the dataset: preprocessed, oversampled, and oversampled with perplexity filtering.
Analyzed and reported the experimental results for the ELECTRA model, contributing key findings on its response to various data refinement strategies within the project's broader investigation of ambiguity mitigation.
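The oversampling step mentioned above can be sketched as random oversampling, where minority classes are resampled with replacement until every class matches the largest one. This is an assumption about the variant used; the function name, seed, and toy posts are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are the same size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw the missing examples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy training split (invented): three INFP posts, one ESTJ, one ENTP.
samples = ["post a", "post b", "post c", "post d", "post e"]
labels = ["INFP", "INFP", "INFP", "ESTJ", "ENTP"]
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_counts = Counter(balanced_labels)
```

Oversampling is normally applied to the training split only, so that duplicated examples do not leak into validation or test data and inflate the reported accuracy.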
Problem Identification & Dataset Characterization: The project began by establishing baseline model performance (BERT achieving 37.53% accuracy), followed by an analysis that pinpointed "ambiguity" arising from the lack of strong ground truth in MBTI datasets as the primary obstacle to accurate personality classification.
Data-Centric Ambiguity Mitigation: A comprehensive data refinement pipeline was implemented, involving systematic data cleaning, perplexity-based filtering with GPT-2 to select less ambiguous text samples, and techniques like oversampling and T5-based paraphrasing to address significant class imbalances, thereby enhancing overall data quality for model training.
Advanced Prompting & Zero-Shot LLM Evaluation: Novel prompting strategies were developed and applied to a pre-trained model (GPT-4o Mini) in a zero-shot setting; this included "positive label reduction" to narrow down choices effectively and "contrastive comparison" to help the model discern subtle differences between potential labels.
Comprehensive Model Evaluation & Findings: Extensive experiments were conducted, fine-tuning models like ELECTRA and BERT, and evaluating the zero-shot GPT-4o Mini. The results demonstrated that the proposed ambiguity reduction techniques led to significant improvements in MBTI classification accuracy (e.g., ELECTRA's accuracy rose to 75.84% with oversampling and perplexity filtering). Positive label reduction also enhanced GPT-4o Mini's performance. Detailed error analysis further identified the N/S (Intuition/Sensing) MBTI dimension as particularly challenging for the models.
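Perplexity-based filtering can be sketched as follows, assuming each text already has per-token log-probabilities from a language model (GPT-2 in the project) and that texts at or below a threshold are kept. The threshold, the keep-low direction, and the toy log-probabilities are assumptions for illustration; the model call itself is left out.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean token log-probability (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def filter_by_perplexity(scored_texts, max_perplexity):
    """Keep texts whose perplexity is at or below the threshold."""
    return [
        text
        for text, token_log_probs in scored_texts
        if perplexity(token_log_probs) <= max_perplexity
    ]


# Toy scores (invented): each token gets probability 0.5 or 0.1.
scored_texts = [
    ("clear post", [math.log(0.5)] * 4),  # perplexity of about 2
    ("noisy post", [math.log(0.1)] * 3),  # perplexity of about 10
]
kept = filter_by_perplexity(scored_texts, max_perplexity=5.0)
```

Lower perplexity means the language model finds the text more predictable, so the filter acts as a rough proxy for fluent, less noisy samples.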
NLP, LLMs, ML (Model Fine-tuning, Zero-Shot Learning), Data Analysis & Data Preprocessing (Perplexity Filtering, Augmentation), Prompt Engineering, Model Evaluation & Error Analysis, Communication & Presentation