Dataset Used: Amazon Review dataset
StyleMate combines several methodologies and techniques to build an end-to-end e-commerce experience. Each technique we employed is listed below along with the feature it supports.
Recommendation system - Content-based filtering [Text] / Word Embeddings / DistilBERT
Similar-item suggestions - Content-based filtering [Images] / Image2Vec / ResNet-50
Top reviews for personalized features - Aspect-based sentiment analysis / DeBERTa
Product Pipeline
We implemented a retrieval and ranking method that identifies the products most relevant to a specific user query.
The first step in processing a user query is identifying the general category of the product being searched for. To this end, we defined a set of major categories (e.g., shirts, shoes, accessories) and used zero-shot classification to determine the most suitable category for the query.
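The category step can be sketched as follows. This is a minimal illustration, not StyleMate's actual code: the category list, the function names, and the choice of the Hugging Face `zero-shot-classification` pipeline with the `facebook/bart-large-mnli` checkpoint are assumptions, since the source does not say which zero-shot model was used.

```python
from typing import Callable, Sequence

# Hypothetical category list; the source names only shirts, shoes, and accessories.
MAJOR_CATEGORIES = ["shirts", "shoes", "accessories"]


def load_zero_shot_classifier():
    """Build a Hugging Face zero-shot pipeline (assumed model; downloads weights)."""
    from transformers import pipeline  # imported lazily so the rest runs without it

    return pipeline("zero-shot-classification", model="facebook/bart-large-mnli")


def pick_category(query: str, categories: Sequence[str], classifier: Callable) -> str:
    """Return the candidate label that the classifier scores highest for the query."""
    result = classifier(query, candidate_labels=list(categories))
    scores = result["scores"]  # one score per entry in result["labels"]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return result["labels"][best]
```

With the real pipeline, the call would look like `pick_category("red running sneakers", MAJOR_CATEGORIES, load_zero_shot_classifier())`. Passing the classifier in as an argument keeps the selection logic testable without loading a model.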
In addition, we employed a fuzzy logic framework to identify the relevant subcategories within the identified major category. Narrowing the search to one category and its subcategories reduces the number of products that need to be scored and keeps unrelated products out of the results, which improves both computational efficiency and the user experience.
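The source does not describe the fuzzy framework in detail. As a stand-in, the sketch below uses approximate string matching from Python's standard `difflib` module, so that a misspelled query word can still select a subcategory. The subcategory table, the threshold, and the function name are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical subcategory table; the real taxonomy is not given in the source.
SUBCATEGORIES = {
    "shoes": ["sneakers", "sandals", "boots", "loafers"],
    "shirts": ["t-shirts", "polos", "flannels"],
}


def match_subcategories(query, category, threshold=0.8):
    """Return subcategories of `category` that approximately match a query word."""
    words = query.lower().split()
    scored = []
    for sub in SUBCATEGORIES.get(category, []):
        # Best similarity between the subcategory name and any single query word.
        score = max(
            (SequenceMatcher(None, word, sub).ratio() for word in words),
            default=0.0,
        )
        if score >= threshold:
            scored.append((score, sub))
    # Highest-scoring subcategories first.
    return [sub for score, sub in sorted(scored, reverse=True)]


print(match_subcategories("red runing sneekers", "shoes"))
```

Only products in the returned subcategories would then need to be scored against the query.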
To generate an embedding for each product, we used its available textual data (title, description, and features) as input to a pre-trained DistilBERT model.
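A minimal sketch of this step is shown below. The field names (`title`, `description`, `features`), the `distilbert-base-uncased` checkpoint, and the masked mean pooling over token vectors are assumptions; the source states only that a pre-trained DistilBERT model was used on the product text.

```python
def build_product_text(product: dict) -> str:
    """Join the title, description, and feature bullets into one string."""
    parts = [product.get("title", ""), product.get("description", "")]
    parts.extend(product.get("features", []))
    return " ".join(p.strip() for p in parts if p and p.strip())


def embed_texts(texts, model_name="distilbert-base-uncased"):
    """Embed texts with DistilBERT using masked mean pooling (assumed pooling choice)."""
    import torch  # imported lazily; requires torch and transformers
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    batch = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # shape: (batch, tokens, 768)
    # Zero out padding positions before averaging over tokens.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # shape: (batch, 768)
```

Calling `embed_texts([build_product_text(p) for p in products])` would produce one 768-dimensional vector per product, which can be computed once and stored.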
When a user query is received, a query embedding is generated using the same DistilBERT model. The similarity between the query and a given product is then determined by computing the dot product of their respective embeddings.
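The ranking step can be illustrated with a small, dependency-free sketch. The three-dimensional vectors and product names are toy values chosen for readability; real embeddings would come from DistilBERT.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_products(query_vec, product_vecs, top_k=3):
    """Return (product_id, score) pairs sorted by descending dot product."""
    scores = [(pid, dot(query_vec, vec)) for pid, vec in product_vecs.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy embeddings standing in for DistilBERT outputs.
products = {
    "sneaker": [0.9, 0.1, 0.0],
    "sandal": [0.6, 0.4, 0.0],
    "belt": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
ranking = rank_products(query, products, top_k=2)
print(ranking)
```

Note that an unnormalized dot product favors vectors with larger norms. Normalizing the embeddings to unit length first would make the score equal to cosine similarity.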
Using a pre-trained model such as DistilBERT allowed us to obtain general-purpose text embeddings for each product without training a language model from scratch.
DistilBERT is a distilled version of the BERT (Bidirectional Encoder Representations from Transformers) model that uses knowledge distillation to compress the original architecture while retaining most of its performance.
Compared to BERT-base, DistilBERT has about 40% fewer parameters (roughly 66 million versus 110 million), which makes it faster and more memory-efficient. This is accomplished by using 6 transformer layers instead of the 12 found in BERT-base (BERT-large has 24). Its authors report that it retains about 97% of BERT's language-understanding performance while running about 60% faster.
Despite its reduced size, DistilBERT has shown competitive results on benchmark natural language processing (NLP) tasks such as sentiment analysis, text classification, and question answering. This makes it a practical choice for generating embeddings for product data in an e-commerce recommendation system like StyleMate.
We used an Image2Vec model with a ResNet-50 backbone (pre-trained on ImageNet) to show the user a list of visually similar items on the product page.
Each image was converted into a 512-dimensional vector, and the vectors for a product's images were averaged to obtain a single product embedding.
Cosine similarity was used to measure the similarity between product embeddings, and the candidate products were ordered from most to least similar.
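The averaging and similarity steps can be sketched as follows. The two-dimensional vectors and product names are toy values, and the function names are hypothetical; in the real pipeline the vectors would be the Image2Vec outputs.

```python
import math


def average_embedding(image_vectors):
    """Element-wise mean of a product's image vectors."""
    n = len(image_vectors)
    return [sum(column) / n for column in zip(*image_vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_products(target_id, catalog, top_k=5):
    """Rank every other product by cosine similarity to the target product."""
    target = catalog[target_id]
    scored = [
        (pid, cosine_similarity(target, vec))
        for pid, vec in catalog.items()
        if pid != target_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy catalog: each product embedding is the mean of its image vectors.
catalog = {
    "tee_a": average_embedding([[1.0, 0.0], [0.8, 0.2]]),
    "tee_b": average_embedding([[0.9, 0.1]]),
    "boot": average_embedding([[0.0, 1.0]]),
}
neighbors = similar_products("tee_a", catalog)
print(neighbors)
```

Unlike a raw dot product, cosine similarity ignores vector length, so products with more or brighter images are not favored simply because their embeddings have larger norms.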
This approach gives users a list of visually similar products while keeping computational costs low, since it relies on a pre-trained network rather than a model trained from scratch.
Information nugget - "Sentiment analysis is a natural language processing (NLP) technique used to automatically determine the sentiment (positive, negative, or neutral) expressed in a given piece of text, such as a review, tweet, or news article. Aspect-based sentiment analysis (ABSA) is a specific task within sentiment analysis that aims to identify and extract opinions towards specific aspects or features of a product or service."
To identify the top reviews for each feature of a particular product, we utilized Aspect-Based Sentiment Analysis (ABSA), implemented through a pre-trained DeBERTa model.
In the current prototype, the features are predetermined; support for user-provided features is planned for the future.
Because a single product can have hundreds to tens of thousands of reviews, we calculated the sentiment toward each feature in every review and sorted the reviews by the confidence of the sentiment prediction.
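The sorting step can be sketched as below. The sketch assumes the ABSA model has already produced one prediction per (review, feature) pair; the record fields (`review_id`, `aspect`, `sentiment`, `confidence`) and the sample values are hypothetical, not taken from the project's data.

```python
from collections import defaultdict


def top_reviews_by_aspect(predictions, top_k=2):
    """Group ABSA predictions by aspect and keep the most confident reviews."""
    grouped = defaultdict(list)
    for pred in predictions:
        grouped[pred["aspect"]].append(pred)
    return {
        aspect: sorted(preds, key=lambda p: p["confidence"], reverse=True)[:top_k]
        for aspect, preds in grouped.items()
    }


# Hypothetical model outputs for one product.
predictions = [
    {"review_id": "r1", "aspect": "fit", "sentiment": "positive", "confidence": 0.71},
    {"review_id": "r2", "aspect": "fit", "sentiment": "negative", "confidence": 0.96},
    {"review_id": "r3", "aspect": "fit", "sentiment": "positive", "confidence": 0.88},
    {"review_id": "r1", "aspect": "fabric", "sentiment": "positive", "confidence": 0.93},
]
top = top_reviews_by_aspect(predictions)
print([p["review_id"] for p in top["fit"]])
```

The same grouping could be filtered by sentiment first if positive and negative reviews should be shown in separate lists.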
This approach enabled us to surface the most relevant reviews for each feature, giving users insights that help them make informed decisions. Using a pre-trained DeBERTa model allowed us to perform aspect-based sentiment analysis without training a model from scratch.
Information nugget - "DeBERTa is a transformer-based language model that improves upon the performance of the BERT model. DeBERTa introduces two key innovations: a disentangled attention mechanism, which represents each token's content and position with separate vectors, and an enhanced mask decoder, which incorporates absolute position information when predicting masked tokens. Together, these allow for more fine-grained modeling of contextual dependencies and enable more accurate predictions on a range of NLP tasks."