In case 4, there are 4 designs retrieved from the control group, and 10, 30, and 50 designs randomly chosen to form the experimental groups. Across the 10 participants, our method scores substantially higher than the control group (7.4/10 (74%), 27.5/30 (91.67%), and 48.1/50 (96.2%) vs. 3.76/10 (37.6%)). In the control group, 90% of participants think that more than 50% of the retrieved results are not related to the query. In contrast, 76.67% of participants in the experimental groups think that more than 80% of the designs retrieved by our method are highly related. Notably, in the experimental groups (10), (30), and (50), half of the participants judge all of the retrieved designs to be related to the query.
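The relevance percentages quoted above follow directly from dividing the average number of designs judged relevant by the number of designs shown. The snippet below is only a minimal sketch of that arithmetic; the dictionary layout, the group labels, and the helper name `relevance_rate` are illustrative assumptions, and the denominators are taken from the reported ratios rather than from the raw study data.

```python
# Reported (average designs judged relevant, designs shown) per group.
# Values are copied from the ratios in the text; the structure is hypothetical.
reported = {
    "control": (3.76, 10),
    "experimental (10)": (7.4, 10),
    "experimental (30)": (27.5, 30),
    "experimental (50)": (48.1, 50),
}


def relevance_rate(avg_relevant: float, shown: int) -> float:
    """Return the share of shown designs judged relevant, as a percentage."""
    return round(100.0 * avg_relevant / shown, 2)


# Compute the percentage for every group and print it.
rates = {group: relevance_rate(avg, shown) for group, (avg, shown) in reported.items()}

for group, rate in rates.items():
    print(f"{group}: {rate}%")
```

Rounding to two decimals matches the precision used in the text, which is why 27.5/30 appears as 91.67%.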
For the satisfaction score, the average rating of our method exceeds that of the control group in every setting, with 3.4/5, 4.5/5, and 4.8/5 compared with 2.6/5. The designs retrieved by the control group satisfy only 2 participants, who give ratings of 3 or above. In the experimental groups, however, over 73% of participants rate the satisfaction score over 3 out of 5. The satisfaction score increases with the number of designs in the experimental group. Note that in the experimental group (50), all of the participants rate the satisfaction score over 3.
From the above bar chart, we can see that our method is also better than the control group in diversity score (2.9/5, 4/5, and 4.8/5 vs. 2.3/5). Only 3 participants rate the designs in the control group as 3 or above, whereas 7 out of 10 participants do so in the experimental group (10). In the experimental groups (30) and (50), only one participant rates the diversity score below 3.