This is an evaluation of the usefulness of the predicted additional tags for bootstrapping UI design retrieval. We select five queries. For each query, we set up two groups: a control group and an experimental group. Because the control group contains approximately 10 designs, we randomly take 10, 30, and 50 designs as the experimental groups to keep the comparison fair.
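The random selection of experimental groups described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `sample_experimental_groups`, the fixed seed, the placeholder pool of design IDs, and the choice to draw each group independently (rather than nesting the smaller groups inside the larger ones) are all assumptions.

```python
import random


def sample_experimental_groups(candidates, sizes=(10, 30, 50), seed=0):
    """Draw one random subset of `candidates` for each requested group size.

    Assumption: each group is drawn independently and without replacement.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    groups = {}
    for size in sizes:
        if size > len(candidates):
            raise ValueError(f"cannot draw {size} designs from {len(candidates)}")
        groups[size] = rng.sample(candidates, size)
    return groups


# Hypothetical pool of design IDs returned for one query (placeholder data).
pool = [f"design_{i:03d}" for i in range(200)]
groups = sample_experimental_groups(pool)
```

Here `groups[10]`, `groups[30]`, and `groups[50]` hold the three experimental groups for a single query; the same call would be repeated for each of the five queries.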
The following shows the results of a survey of 10 Master's and final-year Bachelor's students from our school. We asked the participants to mark each design as related to the query or not, to rate how satisfied they were with the query results on a five-point Likert scale (1: not satisfied, 5: highly satisfied), and to rate the diversity of the results.
Note: * denotes p < 0.01, ** denotes p < 0.05
Most participants reported that, compared with the baseline, our method provides more satisfactory and more diverse results to choose from. Although by no means conclusive, this user study provides initial evidence that our method is useful for improving the performance of tagging-based search.