In this case, 4 images are retrieved for the control group, and 10, 30, and 50 designs are retrieved for the three experimental groups. Across the 10 participants, an average of 3.5/4 (87.5%) of the control-group designs are marked as related candidates. The corresponding averages for the experimental groups are 3/4 (75%), 3.35/4 (83.9%), and 3.77/4 (94.4%), respectively. In the experimental group (50), 8 of the 10 participants consider more than 45 of the 50 designs useful, and 6 of them mark all 50 designs as useful candidates.
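As an illustration of how such an average can be derived from per-participant responses, the following minimal sketch computes the mean number of designs marked as related and the corresponding share of the retrieved set. The function name `related_share` and the per-participant counts are hypothetical and chosen only so that they reproduce the reported control-group average of 3.5/4; they are not the study's raw data.

```python
from statistics import mean


def related_share(counts, retrieved):
    """Return (mean count marked as related, mean share of the retrieved set)."""
    avg = mean(counts)
    return avg, avg / retrieved


# Hypothetical counts for 10 participants in the control group (4 images each).
# These values are illustrative only, not the raw responses from the study.
control_counts = [4, 4, 4, 4, 4, 3, 3, 3, 3, 3]

avg, share = related_share(control_counts, retrieved=4)
print(f"{avg:.2f}/4 ({share:.1%})")
```

The same function applies to the experimental groups by passing the per-participant counts with `retrieved` set to 10, 30, or 50.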
The bar chart above shows that our method outperforms the control group in satisfaction score (3.5/5, 4.1/5, and 4.9/5 for the experimental groups, compared with 3.4/5 for the control group). The results suggest that the more results are retrieved, the more satisfied participants feel.
The bar chart above shows that our method is rated higher than the control group in diversity, especially for the experimental groups (30) and (50), which score 4.5/5 and 4.7/5, respectively. For the control group, participants' responses vary widely: 3 participants rate the diversity as 1 out of 5, while 2 rate it as 5. In contrast, for the experimental groups, 93.3% of participants give a rating of 3 or higher. In the experimental group (50) in particular, all participants give a rating of 3 or higher, and 6 give the full score.