Our work aims to address three major challenges in PCOS intelligent detection research: identifying the detection features used and understanding their relevance, evaluating the capabilities of detection tools, and assessing the current state of related datasets.
In this study, we introduce a comprehensive framework, the Analytical Framework for the Current Status of PCOS Diagnostic Research. Using this framework, we developed a robust taxonomy of features for PCOS diagnosis. Each feature is annotated with its acquisition method and the difficulty of obtaining it, and the taxonomy was validated through an industry survey.
Our evaluation of detection tools and datasets reveals several key findings. The 12 publicly available datasets together cover only 52% of the 110 identified diagnostic features. These datasets lack multimodal data, are outdated, and often have unclear licensing information, which directly impacts the performance of detection tools. Furthermore, of the 42 detection tools analyzed, many demand substantial computational resources, lack multimodal data processing capabilities, and remain unvalidated in clinical settings.
These findings highlight significant room for improvement in PCOS intelligent detection. Through this study, we not only enhance the understanding of PCOS features and detection tools but also provide insights to guide future research and practice in this field.
The framework of our study
Answer to RQ1: Through a review of 93 scientific publications, we have proposed the most comprehensive taxonomy of PCOS detection features to date. This taxonomy is divided into eight categories encompassing 110 detection features, and it annotates the method and difficulty level of acquiring each feature in clinical practice. The taxonomy has been affirmed by 82 domain experts and researchers, including senior doctors and nurses. Over 91% of the feedback rated the taxonomy highly, scoring it 4 or 5 on rationality, completeness, and practical utility in both academic research and clinical applications.
Answer to RQ2: Among the 36 PCOS datasets, 12 are publicly accessible. Together they cover 58 features, which corresponds to an overall coverage rate of 52% of the 110 features included in the taxonomy. The current state of these datasets also raises concerns: multimodal datasets are lacking, many datasets have not been updated for years, and most lack clear licensing information.
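To make the coverage figure concrete, the following minimal sketch shows one way such a rate could be computed: take the union of the features found across the public datasets and divide by the size of the taxonomy. The function name `coverage_rate`, the feature identifiers (`F1` to `F110`), and the two example datasets are hypothetical placeholders, not the 12 datasets surveyed; only the totals (58 covered features, 110 taxonomy features) come from the text. Note that 58/110 is approximately 52.7%, which is reported above as 52%.

```python
def coverage_rate(dataset_features, taxonomy_features):
    """Fraction of taxonomy features that appear in at least one dataset."""
    taxonomy = set(taxonomy_features)
    # Union of every dataset's features, restricted to features in the taxonomy.
    covered = set().union(*dataset_features.values()) & taxonomy
    return len(covered) / len(taxonomy)


# Hypothetical placeholder identifiers for the 110 taxonomy features.
taxonomy = [f"F{i}" for i in range(1, 111)]

# Two made-up datasets whose combined features total 58 (F1..F58).
datasets = {
    "dataset_a": {f"F{i}" for i in range(1, 41)},   # F1..F40
    "dataset_b": {f"F{i}" for i in range(30, 59)},  # F30..F58
}

rate = coverage_rate(datasets, taxonomy)
print(f"{rate:.1%}")  # prints 52.7%
```

Using a set union means that a feature shared by several datasets is counted once, so the rate reflects how much of the taxonomy is reachable from public data rather than how often each feature recurs.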
Answer to RQ3: The 42 selected representative PCOS detection tools are all based on machine learning or deep learning and demonstrate good detection performance on their test datasets. Some tools even report a detection accuracy of 100%. However, these tools still have the following limitations: they require substantial computing resources and long training times, they offer insufficient support for multimodal data processing, and they lack clinical validation.