Role: Research Assistant
Timeline: Jan. 2015 - Jun. 2015
Tech Stack: Python, R, Fortran
I developed a program that computes the limit amount per dose of a drug using the Bayesian semi-parametric regression model that was under research at the time. In addition, I developed a GUI (Graphical User Interface) in Python so that even people unfamiliar with statistical software such as R can use it. The underlying regression model, which uses the Metropolis-Hastings algorithm, was implemented in Fortran.
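Below is a minimal random-walk Metropolis-Hastings sketch in Python, illustrating only the sampling idea; the production sampler was written in Fortran, and the toy log-posterior, proposal scale, and function names here are assumptions rather than the actual dose-limit model.

import numpy as np

def metropolis_hastings(log_post, init, n_iter=10000, step=0.5, seed=0):
    # Random-walk Metropolis-Hastings over a one-dimensional parameter (illustrative).
    rng = np.random.default_rng(seed)
    samples = np.empty(n_iter)
    current = init
    current_lp = log_post(current)
    for i in range(n_iter):
        proposal = current + step * rng.standard_normal()
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples

# Toy target: standard normal log-density, standing in for the real posterior.
draws = metropolis_hastings(lambda x: -0.5 * x**2, init=0.0)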
Role: Research Assistant
Timeline: Mar. 2014 - Dec. 2015
Tech Stack: R, Fortran, Matlab
I proposed a hierarchical smoothing model for estimating the mean and covariance of functional data. The model is based on the Bayesian shape-restricted spectral analysis of Lenk and Choi (2015), which places a Gaussian process prior expanded in a cosine basis. It supports various shape restrictions for the mean estimate, such as monotone, convex, S-shaped, and U-shaped. The Metropolis-Hastings algorithm was implemented in R and Fortran.
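A minimal Python sketch of the cosine-basis spectral representation behind the model; the basis size, coefficient prior, and variable names are illustrative assumptions, and the shape restrictions and MCMC updates from Lenk and Choi (2015) are omitted.

import numpy as np

def cosine_basis(x, J):
    # Cosine basis on [0, 1]: phi_0 = 1, phi_j = sqrt(2) * cos(j * pi * x).
    x = np.asarray(x)
    phi = np.ones((x.size, J + 1))
    for j in range(1, J + 1):
        phi[:, j] = np.sqrt(2.0) * np.cos(j * np.pi * x)
    return phi

# One draw from the (unrestricted) spectral GP prior:
# f(x) = sum_j theta_j * phi_j(x), with smoothing prior theta_j ~ N(0, tau * exp(-gamma * j)).
rng = np.random.default_rng(0)
J, tau, gamma = 20, 1.0, 0.5  # illustrative hyperparameter values
x = np.linspace(0, 1, 200)
theta = rng.normal(0, np.sqrt(tau * np.exp(-gamma * np.arange(J + 1))))
f = cosine_basis(x, J) @ theta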
Role: FDS Modeler
Timeline: Jan. 2021 - Apr. 2021
Tech Stack: Python, MySQL
I developed statistical models and programs to predict fraud risk for a fraud detection system that had previously relied only on scenario-based rules. The new system is stable enough to calculate fraud risk for every money transfer and payment transaction, and it is exposed as an API that responds to requests from the main application. In addition, the system provides over 200 input variables that can be used by both models and rules, a database to store them, and real-time updates to that database.
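A minimal sketch of a scoring endpoint of this kind, assuming Flask for illustration; the field names, model path, and route are hypothetical, and the production API, 200+ variables, and MySQL integration were more involved.

from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)
model = joblib.load("fraud_model.pkl")  # hypothetical path to a trained model
FEATURES = ["amount", "hour", "device_age_days"]  # stand-ins for the real input variables

@app.route("/score", methods=["POST"])
def score():
    # Score a single transaction sent by the main application.
    payload = request.get_json()
    row = [[payload[f] for f in FEATURES]]
    risk = float(model.predict_proba(row)[0, 1])
    return jsonify({"fraud_risk": risk})

if __name__ == "__main__":
    app.run(port=8080)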
Role: Modeling Part Leader
Timeline: May 2020 - Dec. 2020
Tech Stack: Python, Oracle SQL
This project created fraud detection and anomaly detection models to screen suspected fraudulent transactions for further investigation. I built the fraud detection models with gradient boosting and convolutional neural networks, and the anomaly detection models with auto-encoders. It was also the first project in South Korea to use an automated rule generator. As the modeling part leader, I was in charge of building models, supervising team members' duties, and coordinating work with the clients and the IT development team.
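A minimal auto-encoder anomaly-scoring sketch in Keras; the layer sizes, toy data, and threshold rule are illustrative assumptions, not the project's architecture.

import numpy as np
from tensorflow import keras

def build_autoencoder(n_features, code_dim=8):
    # Symmetric dense auto-encoder; reconstruction error serves as the anomaly score.
    inputs = keras.Input(shape=(n_features,))
    encoded = keras.layers.Dense(32, activation="relu")(inputs)
    encoded = keras.layers.Dense(code_dim, activation="relu")(encoded)
    decoded = keras.layers.Dense(32, activation="relu")(encoded)
    outputs = keras.layers.Dense(n_features, activation="linear")(decoded)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Train on (mostly normal) transactions; high reconstruction error flags anomalies.
X = np.random.rand(1000, 20)  # toy data standing in for transaction features
ae = build_autoencoder(n_features=20)
ae.fit(X, X, epochs=5, batch_size=64, verbose=0)
errors = np.mean((X - ae.predict(X, verbose=0)) ** 2, axis=1)
threshold = np.quantile(errors, 0.99)  # illustrative cut-off for review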
Role: Data Scientist
Timeline: Jan. 2020 - Oct. 2020
Tech Stack: Python
I developed an API that automatically creates scenario-based rules from gradient boosting decision trees. By combining each tree's weights and split criteria, I developed a method to convert the trees into natural-language rules so that even operations managers can easily understand and modify them.
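A minimal sketch of turning decision-tree splits into readable rule text with scikit-learn; the rule template and feature names are hypothetical, and the production method additionally combined tree weights across the whole ensemble.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import _tree

def tree_to_rules(tree, feature_names):
    # Walk one decision tree and emit each root-to-leaf path as a readable rule.
    t = tree.tree_
    rules = []

    def recurse(node, conditions):
        if t.feature[node] == _tree.TREE_UNDEFINED:  # leaf node
            rules.append("IF " + " AND ".join(conditions) +
                         f" THEN score={t.value[node][0][0]:.3f}")
            return
        name = feature_names[t.feature[node]]
        thr = t.threshold[node]
        recurse(t.children_left[node], conditions + [f"{name} <= {thr:.2f}"])
        recurse(t.children_right[node], conditions + [f"{name} > {thr:.2f}"])

    recurse(0, [])
    return rules

# Toy example: extract rules from the first tree of a small gradient boosting model.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
gbdt = GradientBoostingClassifier(n_estimators=10, max_depth=2).fit(X, y)
rules = tree_to_rules(gbdt.estimators_[0, 0], [f"var_{i}" for i in range(5)])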
Role: Modeling Part Leader
Timeline: Oct. 2019 - Feb. 2020
Tech Stack: Python
The goal of this project was to create models for detecting identity theft and voice phishing. I built a model that uses consumption patterns to detect identity theft among clients who have passed the credit card membership screening. In addition, I created a model to determine whether voice phishing fraud has occurred through impersonation of police officers or account hacking. As the modeling part leader, I was in charge of overall schedule management, including model development.
Role: Consultant
Timeline: Jul. 2019 - Aug. 2019
Tech Stack: Python, SAS
This project developed models and built a database for detecting fraudulent insurance claims. Since most of the data was unlabeled, I used unsupervised learning to create an anomaly detection model.
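A minimal unsupervised anomaly detection sketch; Isolation Forest is shown only as one possible example, and the toy features do not reflect the actual claim data or the algorithm used in the project.

import numpy as np
from sklearn.ensemble import IsolationForest

# Toy claim features standing in for the real, mostly unlabeled insurance data.
X = np.random.rand(5000, 12)

model = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
model.fit(X)
scores = -model.score_samples(X)     # higher score means more anomalous
suspects = np.argsort(scores)[-50:]  # claims flagged for manual review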
Role: Consultant
Timeline: May 2019 - Jun. 2019
Tech Stack: SAS
This was a consulting project to evaluate the stability and efficiency of the models and rules developed in a 2017 project. In addition, I was in charge of establishing an operating strategy for the models and rules to support an effective fraud detection process.
Role: Data Scientist & Software Developer
Timeline: Feb. 2019 - Jun. 2019
Tech Stack: Python
As a developer, I created a Python package that contains all of the step-by-step tasks that consultants must perform according to the FDS project manual. The package covers everything that must be completed before model development, such as data extraction from the database, preprocessing, input variable generation, and variable importance calculation.
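A minimal sketch of how such step-by-step tasks can be packaged; the function names and the quick tree-based importance shown here are hypothetical, not the actual package API.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def extract(con, query):
    # Pull raw data from the project database; the connection is supplied per project.
    return pd.read_sql(query, con)

def preprocess(df):
    # Shared cleaning steps: drop exact duplicates, fill numeric gaps with medians.
    df = df.drop_duplicates()
    return df.fillna(df.median(numeric_only=True))

def variable_importance(df, target):
    # Rank candidate input variables with a quick tree-based importance (illustrative choice).
    X = df.drop(columns=[target]).select_dtypes("number")
    y = df[target]
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    return pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)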
Role: FDS Modeler
Timeline: Oct. 2018 - Feb. 2019
Tech Stack: Python, Oracle SQL
This project developed new fraud detection models and built an automatic re-training system for them. I used nearly 200 input variables, including more than 50 newly created ones, for the prediction models.
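A minimal re-training loop sketch; the trigger metric, threshold, and model class are assumptions about how such a system can be structured, not the project's implementation.

import joblib
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def retrain_if_degraded(model_path, X_recent, y_recent, auc_floor=0.85):
    # Re-fit and redeploy the model when recent performance drops below a floor.
    model = joblib.load(model_path)
    auc = roc_auc_score(y_recent, model.predict_proba(X_recent)[:, 1])
    if auc < auc_floor:
        model = GradientBoostingClassifier().fit(X_recent, y_recent)
        joblib.dump(model, model_path)
    return auc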
Role: FDS Modeler
Timeline: Mar. 2018 - Oct. 2018
Tech Stack: Python, SAS
I developed models to detect fraudulent usage and authentication, and built a system that prevents model performance degradation through automatic re-training. In particular, I concentrated on a model that can detect a variety of scams in the Google Play Store and Apple App Store. In addition, to detect fake identity authentication, I created around 50 input variables from IP address and device information.
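A minimal sketch of IP- and device-based feature engineering of this kind; the column names and the three example variables are illustrative, not the roughly 50 production features.

import pandas as pd

def device_ip_features(df):
    # Derive authentication-risk features from IP address and device columns (illustrative).
    df = df.sort_values("event_time")
    feats = pd.DataFrame(index=df.index)
    # How many distinct devices has this user authenticated with?
    feats["n_devices_per_user"] = df.groupby("user_id")["device_id"].transform("nunique")
    # How many distinct users share this IP address (a proxy for fraud rings)?
    feats["n_users_per_ip"] = df.groupby("ip_address")["user_id"].transform("nunique")
    # Is the /16 IP prefix new for this user compared with earlier events?
    df["ip_prefix"] = df["ip_address"].str.split(".").str[:2].str.join(".")
    feats["new_ip_prefix"] = (~df.duplicated(["user_id", "ip_prefix"])).astype(int)
    return feats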
Role: Data Scientist
Timeline: Mar. 2018 - Sep. 2018
Tech Stack: Python
To overcome the difficulty of classifying rare-event data without resorting to simple oversampling, I investigated generative models for creating synthetic data. I trained generative adversarial networks on actual fraud transactions to generate synthetic fraud data, and developed a pipeline that uses the generated data to train fraud prediction models and develop rules.
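A minimal GAN sketch for tabular synthetic data in Keras; the network sizes, toy data, and training loop are illustrative assumptions, not the project's generator or pipeline.

import numpy as np
from tensorflow import keras

latent_dim, n_features = 16, 30  # illustrative sizes

generator = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(latent_dim,)),
    keras.layers.Dense(n_features, activation="tanh"),
])
discriminator = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Stack generator and discriminator; freeze the discriminator inside the combined model.
gan = keras.Sequential([generator, discriminator])
discriminator.trainable = False
gan.compile(optimizer="adam", loss="binary_crossentropy")

real = np.random.rand(256, n_features)  # stand-in for scaled fraud transactions
for step in range(100):
    noise = np.random.normal(size=(64, latent_dim))
    fake = generator.predict(noise, verbose=0)
    # Train the discriminator on real vs. generated rows, then the generator via the GAN.
    discriminator.train_on_batch(real[:64], np.ones((64, 1)))
    discriminator.train_on_batch(fake, np.zeros((64, 1)))
    gan.train_on_batch(noise, np.ones((64, 1)))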
Role: Data Scientist & Software Developer
Timeline: Jan. 2018 - Jan. 2021
Tech Stack: Python
To reduce the time required to convert models developed in SAS to C or Java, I developed a solution that integrates the model development and deployment process. As a data scientist, I defined the functions required for model development and deployment, wrote the user interface requirements, and designed training and prediction modules for various machine learning algorithms. In addition, I implemented numerous modules directly in Python and served as a field tester. I remained in charge of adding new features and improving the program even after the project was completed.
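A minimal sketch of the kind of unified train/predict wrapper such a solution relies on; the class and method names are hypothetical, not the actual product API.

import joblib

class ModelModule:
    # Uniform interface so any algorithm can be developed and deployed the same way.

    def __init__(self, estimator):
        self.estimator = estimator

    def train(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        # Return fraud probabilities regardless of the underlying algorithm.
        return self.estimator.predict_proba(X)[:, 1]

    def save(self, path):
        joblib.dump(self.estimator, path)

    @classmethod
    def load(cls, path):
        return cls(joblib.load(path))

# Usage: the same wrapper covers gradient boosting, logistic regression, and so on.
# module = ModelModule(GradientBoostingClassifier()).train(X_train, y_train)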
Role: FDS Modeler
Timeline: Mar. 2017 - Oct. 2017
Tech Stack: Python, Oracle SQL, SAS
This project separated a system previously housed within another system and rebuilt it as a standalone system. I developed fraud detection models and rules as a member of the modeling team. Using approval, customer, card, and merchant data, I created nearly 300 input variables. In addition, I built fraud risk prediction models from ensembles of multiple machine learning algorithms and deep neural networks, the first deep learning model for credit card fraud detection in South Korea.
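A minimal sketch of combining several algorithms into one fraud-risk ensemble using scikit-learn soft voting; the member models, toy data, and settings are illustrative assumptions, not the production ensemble or deep neural networks.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Toy, heavily imbalanced data standing in for credit card transactions.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.99], random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("gbm", GradientBoostingClassifier()),
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("nn", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)),
    ],
    voting="soft",  # average the members' predicted fraud probabilities
)
ensemble.fit(X, y)
risk = ensemble.predict_proba(X)[:, 1]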
Role: FDS Modeler
Timeline: Mar. 2017 - Apr. 2017
Tech Stack: Python, Oracle SQL, SAS
The goal of this proof-of-concept project was to check whether deep learning was suitable for credit card fraud detection. I developed various deep neural network models using the existing database without adding new input variables. Because they outperformed the existing techniques, the team agreed to use deep learning in the main project.