In case 5, 10 designs are retrieved for the control group and 10, 30, and 50 designs for the three experimental groups. Averaged across the 10 participants, 69% (6.21/9) of the control group's designs are marked as related candidates, whereas a larger share of the designs from the experimental groups are judged related to the query: 7.9/10 (79%), 23.46/30 (78.2%), and 40.9/50 (81.8%), respectively.
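The percentages above are the average number of designs marked as related divided by the number of designs shown. The short sketch below reproduces that arithmetic from the reported counts. The helper name `related_ratio` and the group labels are illustrative only and do not come from the study's tooling.

```python
# Average number of designs marked as related (over 10 participants) in case 5,
# with the denominators exactly as reported in the text.
results = {
    "control (10)": (6.21, 9),
    "experimental (10)": (7.9, 10),
    "experimental (30)": (23.46, 30),
    "experimental (50)": (40.9, 50),
}

def related_ratio(related: float, retrieved: int) -> float:
    """Hypothetical helper: fraction of retrieved designs marked as related."""
    return related / retrieved

for group, (related, retrieved) in results.items():
    # Print each ratio as a percentage with one decimal place.
    print(f"{group}: {related_ratio(related, retrieved):.1%}")
```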
The bar chart shows that in this case our method matches the control group in satisfaction score (3.4/5). This is expected, since the control group and experimental group (10) return the same number of results. However, experimental groups (30) and (50) achieve higher satisfaction scores, 4.3/5 and 4.5/5, respectively. The satisfaction score thus increases with the number of retrieved results, which is consistent with human perception.
In general, the diversity rating of experimental group (10) (2.6/5) is slightly lower than that of the control group (2.8/5), but the ratings of the other two experimental groups are substantially higher, at 4.3/5 and 4.6/5 on average. Notably, all participants consider experimental groups (30) and (50) very diverse: every participant rated both above 3.
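To make the size of the effects concrete, the sketch below tabulates the mean satisfaction and diversity ratings reported for case 5 and computes each experimental group's difference from the control group. The dictionary layout and the helper `delta_vs_control` are hypothetical conveniences for illustration, not part of the study's analysis code.

```python
# Mean ratings on a 5-point scale for case 5, as reported in the text.
ratings = {
    # group: (satisfaction, diversity)
    "control (10)": (3.4, 2.8),
    "experimental (10)": (3.4, 2.6),
    "experimental (30)": (4.3, 4.3),
    "experimental (50)": (4.5, 4.6),
}

def delta_vs_control(group: str) -> tuple[float, float]:
    """Hypothetical helper: (satisfaction, diversity) difference from the control group."""
    base_sat, base_div = ratings["control (10)"]
    sat, div = ratings[group]
    # Round to two decimals to suppress floating-point noise.
    return round(sat - base_sat, 2), round(div - base_div, 2)

for group in ratings:
    if group != "control (10)":
        print(group, delta_vs_control(group))
```

The differences make the contrast in the text explicit: experimental group (10) ties on satisfaction and trails slightly on diversity (-0.2), while groups (30) and (50) gain roughly one point or more on both measures.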