In the third case, 9 designs are retrieved for the control group, and 10, 30, and 50 designs are randomly selected from the results of our method to form the experimental groups. Across the 10 participants, 3/4 (75%) of the control-group designs are marked as related candidates on average. The experimental groups reach 7.5/10 (75%), 27.5/30 (91.67%), and 44.7/50 (89.4%), so the two larger experimental groups are marked as related to the query more often than the control group. Nearly 76.67% of participants in the experimental groups consider more than 80% of the retrieved designs useful, and one third of them mark all results as useful candidates.
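The percentages quoted for this case follow directly from the reported averages. The short sketch below recomputes them; the dictionary layout and the helper name `relevance_rate` are illustrative assumptions, and the control-group entry uses the ratio 3/4 exactly as it is reported in the text.

```python
# Average number of designs marked as related, as reported in the text,
# paired with the denominator reported for each group.
reported = {
    "control": (3, 4),  # reported in the text as the ratio 3/4
    "experimental (10)": (7.5, 10),
    "experimental (30)": (27.5, 30),
    "experimental (50)": (44.7, 50),
}


def relevance_rate(marked: float, shown: float) -> float:
    """Return the share of designs marked as related, in percent."""
    return round(marked / shown * 100, 2)


# Hypothetical helper structure: one relevance rate per group.
rates = {group: relevance_rate(m, s) for group, (m, s) in reported.items()}

for group, rate in rates.items():
    print(f"{group}: {rate}%")
```

Under these assumptions, the control group and the experimental group (10) come out at the same rate, while the groups (30) and (50) come out higher.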
The bar chart above shows that the satisfaction score of the experimental group (10) is the same as that of the control group (3.6/5), while the scores of the experimental groups (30) and (50), 4.5/5 and 4.9/5, are much higher than that of the control group. This result is reasonable because the control group contains 9 retrieved results, which is close to the size of the experimental group (10). Note that, for the experimental group (50), 90% of participants give full marks.
The results retrieved for the control group are rated as slightly more diverse than those of the experimental group (10), at 2.9/5 versus 2.6/5, but both the experimental groups (30) and (50) outperform the control group, with average ratings of 4.2/5 and 4.7/5. For the control group, participants' ratings vary widely: one participant rates the diversity of the control-group results as 1 out of 5, while another gives full marks. For the experimental groups (30) and (50), 90% of participants rate the results above 3 out of 5.