Evaluation Metrics
In this section, we discuss the evaluation metrics we used to measure the effectiveness of our fashion recommendation system.
Precision:
Precision is a commonly used metric in information retrieval and recommendation systems that measures the fraction of relevant items in the recommended list. For StyleMate, we used precision@10 as an evaluation metric and defined a relevant item as one whose product category matches the searched query.
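Below is a minimal sketch of how precision@10 can be computed. The function name and the representation of a recommendation list as item IDs with a set of relevant IDs are illustrative assumptions, not our exact implementation.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k
```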
Mean Average Precision (MAP):
Mean Average Precision is a widely used metric that takes into account the order in which items are recommended. It assigns higher scores to systems that recommend relevant items early in the recommendation list. For StyleMate, we used MAP@10 to evaluate the effectiveness of our recommendation system. We calculated the average precision of the top 10 recommendations for various queries and then computed the mean of these values across all queries.
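For reference, here is a minimal sketch of MAP@10 under the same assumptions as above (a list of recommended item IDs and a set of relevant IDs per query). Normalizing by min(len(relevant), k) is one common convention and may differ from our exact implementation.

```python
def average_precision_at_k(recommended, relevant, k=10):
    """Average of the precision values at each rank where a relevant item appears."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(all_recommended, all_relevant, k=10):
    """Mean of the per-query average precision values."""
    ap_values = [average_precision_at_k(rec, rel, k)
                 for rec, rel in zip(all_recommended, all_relevant)]
    return sum(ap_values) / len(ap_values)
```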
Mean Reciprocal Rank (MRR):
MRR is calculated by taking the reciprocal of the rank of the first relevant result for each query and then taking the mean of these reciprocals over the set of queries. For StyleMate, we used MRR@10.
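A minimal sketch of MRR@10 under the same assumptions follows; treating a query with no relevant item in the top 10 as contributing zero is one common convention.

```python
def mrr_at_k(all_recommended, all_relevant, k=10):
    """Mean reciprocal rank of the first relevant item in each query's top-k list."""
    total = 0.0
    for recommended, relevant in zip(all_recommended, all_relevant):
        for rank, item in enumerate(recommended[:k], start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(all_recommended)
```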
Analysis
The main algorithm we used for identifying relevant items for a query gave excellent results. The first item returned was always highly relevant, and except for slightly harder queries such as "Trendy Handbags" and "Watches for all seasons", the top 10 recommendations included at least 7 relevant items. In the case of "Trendy Handbags", a few clothing items described as trendy were also returned.
The ResNet18 model pretrained on ImageNet performed very well: it met the expectation of identifying shirts as 'similar' to shirts and pants as similar to pants.
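The sketch below shows one way such image similarity can be computed with a pretrained ResNet18 from torchvision (using the weights API of torchvision 0.13+). The preprocessing pipeline, the use of the 512-dimensional penultimate features, and cosine similarity are assumptions, not necessarily the exact setup used in StyleMate.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Pretrained ResNet18 with the classification head removed, so each image
# maps to a 512-dimensional feature vector.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def embed(image_path):
    """Return the feature vector for one product image."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(image).squeeze(0)

def similarity(path_a, path_b):
    """Cosine similarity between two product images (higher means more similar)."""
    return torch.nn.functional.cosine_similarity(
        embed(path_a), embed(path_b), dim=0
    ).item()
```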
On closer analysis, we found that most irrelevant results involved a slightly different type of object with the same color. For example, the algorithm returned a black women's top as an item similar to a black shirt.
We performed Aspect Based Sentiment Analysis for four aspects: Quality, Fit, Comfort, and Price. We obtained fairly good results on our test data, with MAP@10 of 0.82 and MRR@10 of 0.85.
For some queries, the same top two reviews were returned for multiple aspects, for example Fit and Comfort. This is an area where we think we can improve and further fine-tune our model.