I am an associate professor in the Department of Applied Statistics at Yonsei University. My research spans theoretical, methodological, and computational statistics. My main research directions include High-dimensional Statistics, Nonparametric Statistics, Machine Learning Research, Optimization, and Quantile-based Inference. The best way to contact me is by email.
Email: ishspsy@yonsei.ac.kr
Office: 434, Daewoo Hall, Yonsei University
Ph.D., Statistics (2016), University of Michigan, Ann Arbor, USA.
B.S. in Mathematics and B.E. in Industrial Engineering (2009), Yonsei University, Seoul, Korea.
Associate Professor (March 2024 - present): Department of Applied Statistics, Yonsei University, Seoul, Korea.
Associate Professor (March 2023 - February 2024): Department of Statistics, Sungkyunkwan University, Seoul, Korea.
Assistant Professor (September 2018 - February 2023): Department of Statistics, Sungkyunkwan University, Seoul, Korea.
Postdoctoral Associate (August 2016 - July 2018): Department of Biostatistics, Yale University, New Haven, USA.
High-Dimensional Statistics
Machine Learning Research
Optimization
Quantile Modeling
Model Selection
Associate Editor, Computational Statistics and Data Analysis
I am looking for self-motivated Ph.D. students with a strong interest in high-dimensional statistics, nonparametric statistics, and machine learning. If you are interested, feel free to send your CV or resume to ishspsy@yonsei.ac.kr.
Kim, Y., Kim, I.*, and Park, S.* (2025). Transfer learning for benign overfitting in high-dimensional linear regression. NeurIPS 2025 (Spotlight).
Park, S.*, Lee, E., Kim, H., and Zhao, H. (2025). Transfer learning under large-scale low-rank regression models. Journal of the American Statistical Association, Theory & Methods, to appear. https://www.tandfonline.com/doi/full/10.1080/01621459.2025.2555057
Lee, E.^, Park, S.^, Mammen, E., and Park, B. U. (2024). Efficient functional Lasso kernel smoothing for high-dimensional additive regression. Annals of Statistics, 52(4), 1741-1773. https://www.e-publications.org/ims/submission/AOS/user/submissionFile/63540?confirm=4736b830
Park, S., Lee, E.*, and Zhao, H. (2024). Low-rank regression models for multiple binary responses and their applications to cancer cell-line encyclopedia data. Journal of the American Statistical Association, Theory & Methods, 119, 202-216. https://www.tandfonline.com/doi/full/10.1080/01621459.2022.2105704
Kim, H., Lee, E., and Park, S.* (2023). Debiased inference for heterogeneous subpopulations in a high-dimensional logistic regression model. Scientific Reports, 13, 21979.
Park, S., Kim, H., and Lee, E.* (2023). Regional quantile regression for multiple responses. Computational Statistics and Data Analysis, 188, 107826.
Park, S.^, Lee, E.^, and Hong, G.* (2023). Varying-coefficients for regional quantile via KNN-based LASSO with applications to health outcome study. Statistics in Medicine, 42, 3903-3918.
Lee, E.^, Park, S.^, Lee, S., and Hong, G. (2023). Quantile forward regression for high-dimensional survival data. Lifetime Data Analysis, 29, 769-806.
Tang, D., Park, S., and Zhao, H. (2022). SCADIE: simultaneous estimation of cell type proportions and cell type-specific gene expressions using SCAD-based iterative estimating procedure. Genome Biology, 23, 129.
Kim, H. and Park, S.* (2022). Pairwise fusion approach to cluster analysis with applications to movie data. The Korean Journal of Applied Statistics, 35, 265-283. (written in Korean)
Cho, Y. and Park, S.* (2022). Multivariate response regression with low-rank and generalized sparsity. Journal of the Korean Statistical Society, 51, 847-867.
Lee, E., Cho, J., and Park, S.* (2022). Penalized kernel quantile regression for varying coefficient models. Journal of Statistical Planning and Inference, Vol. 217, 8-23.
Ahn, D. and Park, S.* (2022). Linear programming models using a Dantzig type risk for portfolio optimization. The Korean Journal of Applied Statistics, 35, 229-250. (written in Korean)
Park, S. and Lee, E. (2021). Hypothesis testing of varying coefficients for regional quantiles. Computational Statistics and Data Analysis, Vol. 159, 107204.
Lee, E. and Park, S.^ (2021). Poisson reduced-rank models with sparse loadings. Journal of the Korean Statistical Society, Vol. 50, 1079-1097.
Park, S., Xu, H., and Zhao, H.* (2021). Integrating multidimensional data for clustering analysis with applications to cancer patient data. Journal of the American Statistical Association, Applications & Case Studies, Vol. 116, No. 533, 14-26. https://doi.org/10.1080/01621459.2020.1730853
Won, H. and Park, S.* (2021). Mean-shortfall optimization problem with perturbation methods. The Korean Journal of Applied Statistics, Vol. 34, No. 1, 39-68. (written in Korean)
Tang, D., Park, S.^, and Zhao, H. (2020). NITUMID: NMF-based Immune-TUmor MIcroenvironment Deconvolution. Bioinformatics, Vol. 36, No. 5, 1344-1350.
Park, M. and Park, S.* (2020). One-step spectral clustering of weighted variables on single-cell RNA-sequencing data. The Korean Journal of Applied Statistics, Vol. 33, No. 4, 511-526. (written in Korean)
Park, S.* and Zhao, H. (2019). Sparse principal component analysis with missing observations. Annals of Applied Statistics, Vol. 13, No. 2, 1016-1042.
Park, S.* and Lee, S. (2019). Linear programming models for portfolio optimization using a benchmark. European Journal of Finance, Vol. 25, 435-457.
Park, S.* and Zhao, H. (2018). Spectral clustering based on learning similarity matrix. Bioinformatics, Vol. 34, No. 12, 2069-2076.
Park, S.*, He, X., and Zhou, S. (2017). Dantzig-type penalization for multiple quantile regression with high dimensional covariates. Statistica Sinica, Vol. 27, No. 4, 1619-1638. (Winner of the 2015 Student Paper Competition in the ASA Section on SLDM)
Park, S.* and He, X. (2017). Hypothesis testing for regional quantiles. Journal of Statistical Planning and Inference, Vol. 191, 13-24.
Greenewald, K., Park, S., Giessing, A., and Zhou, S. (2017). Time varying matrix-variate graphical models. In Advances in Neural Information Processing Systems, 30 (NeurIPS 2017).
* Corresponding author
^ Co-first author
Graduate students are underlined.
Youngjin Cho (Virginia Tech, PhD Student): Multi-task learning with low rank and generalized sparsity in a high dimensional model.
Minyoung Park (LG CNS): One-step spectral clustering of weighted variables on single-cell RNA-sequencing data.
Hyunjin Kim, PhD. (Financial Supervisory Service): Debiased inference for multivariate quantile regression for regional quantiles.
Hayeon Won (Samsung Electronics): Mean-shortfall optimization problem with perturbation methods.
Sohyeon Kim (NC State University, PhD student): Matrix decomposition for regional quantile regression with multiple responses.
Hui Jin Kim (Tokyo Electron): Pairwise fusion approach to cluster analysis with applications to movie data.
Dayoung Ahn (SBI Saving Bank): Linear programming models using a Dantzig type risk for portfolio optimization.
Sungmin Ji (U. of Arizona, PhD student): Multivariate single index models for predicting multi-dimensional drug responses.
Jiyeon Lee (Naver): Reduced Rank Quantile Regression with Row-Wise Sparsity.
Jinku Kang (JB Woori Capital): Maximum Sharpe Ratio Portfolio Optimization with Perturbation Method and Schaible Transformation.
Woorim Jung (Master's Student): Reduced varying coefficients of regional quantile for multiple responses.
2023, 33rd Excellent Paper Award in Science and Technology (Korean Federation of Science and Technology Societies)
Yonsei University Future-Leading Research Initiative, 2024-2026
National Research Foundation of Korea (Mid-Career Researcher Program)
Principal Investigator, 2025-2030
National Research Foundation of Korea (Mid-Career-Linked Early-Career Follow-up Research Program)
Principal Investigator, 2022-2025
National Research Foundation of Korea (Outstanding Early-Career Researcher Program)
Principal Investigator, 2019-2022
Sungkyun Research Fund 2018
Principal Investigator, 2018 - 2019
KOFAC-2019 Undergraduate Research Program
Principal Investigator, 2019
Introduction to Statistical Computing - Fall 2018. This class is partially supported by DataCamp; students have full access to the entire DataCamp course curriculum for the semester.
High Dimensional Data Analysis - Spring 2024, Spring 2025
Regression Analysis - Fall 2024, Fall 2025
Graphical Models in Statistics (graduate course) - Spring 2024
Mathematical Statistics (graduate course) - Spring 2019
Modern Statistical Theory (graduate course) - Spring 2019
Large Sample Theory (graduate course) - Spring 2020, Spring 2021, Spring 2022, Spring 2025
Categorical Data Analysis (graduate course) - Fall 2024
Advanced Statistical Computing (graduate course) - Fall 2020, Fall 2021
Advanced Regression Analysis (graduate course) - Spring 2022
Introduction to Statistical Programming - Fall 2019, Spring 2020, Spring 2021. This class is partially supported by DataCamp; students have full access to the entire DataCamp course curriculum for the semester.
Statistics and Data Science - Fall 2019, Fall 2020, Fall 2021. This class is partially supported by DataCamp; students have full access to the entire DataCamp course curriculum for the semester.