Dongran Zhang, Youjun Fan
Volume 1 (2025), Article ID: eip1v0925a
Published: 2025-09-25 (Received: 2025-07-10; Revised: 2025-09-05; Accepted: 2025-09-22)
Citation
Zhang D, Fan Y. Classification and discriminant analysis of maize heterosis groups: A methodological system based on moderate SNP markers and machine learning. Engineering Innovation and Practice, 2025, 1, eip1v0925a.
Abstract
Rational utilization of maize heterosis is a core pathway to enhance yield potential and ensure food security, and the accuracy of heterosis group classification is directly related to the scientific prediction of heterosis and parental selection. Traditional approaches are limited by the number of molecular markers and statistical models, making it difficult to balance accuracy and stability. In this study, waxy maize inbred lines were used as materials, and marker sets with different densities were constructed based on high-throughput SNP markers. Population structure analysis and genetic clustering were applied for group classification, while random forest and support vector machine algorithms were introduced for discriminant analysis and cross-validation. The results showed that moderate-density markers outperformed high-density markers in terms of within-group consistency and clustering stability; in discriminant analysis, the random forest model achieved the highest prediction accuracy, exceeding that of the support vector machine. These findings indicate that excessively high marker density does not necessarily improve classification performance, and that moderate marker density combined with machine learning methods enables more efficient and stable group classification and prediction. This study established a methodological system for maize heterosis group classification and discriminant analysis based on moderate SNP markers and the random forest algorithm, providing new technical support for rational parental selection and heterosis prediction in maize breeding.
Keywords
maize heterosis, SNP markers, machine learning, random forest, discriminant analysis
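The workflow summarized in the abstract (marker sets of different densities, followed by random forest and support vector machine discriminant analysis under cross-validation) can be sketched roughly as below. This is a minimal illustration, not the authors' pipeline: the genotypes are simulated, and the group count, marker counts, random thinning scheme, linear SVM kernel, and all function names (`simulate_genotypes`, `thin_markers`, `compare_models`) are assumptions introduced here. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def simulate_genotypes(n_per_group=40, n_markers=2000, n_groups=3, seed=0):
    """Simulate homozygous SNP genotypes (coded 0 or 2) for inbred lines.

    Hypothetical data: each group gets its own allele frequencies, so the
    groups are separable by construction.
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for group in range(n_groups):
        freqs = rng.uniform(0.1, 0.9, size=n_markers)
        alleles = rng.binomial(1, freqs, size=(n_per_group, n_markers))
        X_parts.append(2 * alleles)
        y_parts.append(np.full(n_per_group, group))
    return np.vstack(X_parts), np.concatenate(y_parts)


def thin_markers(X, n_keep, seed=0):
    """Build a lower-density marker set by sampling columns without replacement."""
    rng = np.random.default_rng(seed)
    cols = np.sort(rng.choice(X.shape[1], size=n_keep, replace=False))
    return X[:, cols]


def compare_models(X, y, n_splits=5, seed=0):
    """Return mean stratified k-fold accuracy for a random forest and an SVM."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "svm": SVC(kernel="linear", C=1.0),  # kernel choice is an assumption
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in models.items()
    }


X, y = simulate_genotypes()
# Marker counts standing in for low, moderate, and high density (illustrative values).
results = {n: compare_models(thin_markers(X, n), y) for n in (50, 500, 2000)}
for n_markers, scores in results.items():
    print(n_markers, scores)
```

On real data, the group labels would come from the population structure and clustering step rather than from simulation, and the thinning would normally respect genomic position or linkage disequilibrium instead of sampling markers at random.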
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). https://creativecommons.org/licenses/by/4.0/legalcode