This requirements dataset was introduced and released in [1]. It consists of 1278 user review sentences expressing non-functional requirements (NFRs), collected from two popular apps, iBooks and WhatsApp; we therefore name it NFR-Review. The 1278 samples cover 5 NFR categories: security, portability, performance, reliability, and usability.
The following table shows the classification performance of the four models trained on 11 different training-set sizes of NFR-Review (from 0% to 100% in steps of 10%). Each configuration was run 3 times, with the random seeds initialized to 42, 930728, and 904727489 respectively.
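The experimental grid described above (11 training-set sizes × 3 seeds = 33 runs per model) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each training subset is drawn uniformly at random from the full training split using the run's seed. The helper names `subsample` and `experiment_grid` are hypothetical, and model training and evaluation are omitted.

```python
import random

# Random seeds reported for the three repeated runs.
SEEDS = [42, 930728, 904727489]

# 11 training-set sizes: 0%, 10%, ..., 100% (integer percentages avoid float drift).
TRAIN_PERCENTS = list(range(0, 101, 10))


def subsample(train_set, percent, seed):
    """Hypothetical helper: draw `percent`% of the training set, reproducibly for a given seed.

    Assumption: uniform random sampling without replacement; the original
    sampling procedure (e.g. stratified or not) is not stated in the source.
    """
    rng = random.Random(seed)  # independent generator so runs do not share state
    k = round(len(train_set) * percent / 100)
    return rng.sample(train_set, k)


def experiment_grid(train_set):
    """Yield (seed, percent, subset) for every repeat and training-set size."""
    for seed in SEEDS:
        for percent in TRAIN_PERCENTS:
            yield seed, percent, subsample(train_set, percent, seed)


if __name__ == "__main__":
    toy_train = list(range(100))  # stand-in for the NFR-Review training sentences
    runs = list(experiment_grid(toy_train))
    print(len(runs))  # 33 runs per model
```

Each `(seed, percent)` pair yields the same subset every time it is drawn, which keeps the repeated experiments reproducible. The three seeds only vary which sentences enter each training subset.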
The results (p-value much less than 0.05) indicate that the differences in classification performance across the 3 repeated experiments are not statistically different from zero. In other words, the results of all repeated experiments are statistically significant.
[1] Tianlu Wang, Peng Liang, and Mengmeng Lu. 2018. What aspects do nonfunctional requirements in app user reviews describe? An exploratory and comparative study. In 2018 25th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 494–503.