The Video Game Recommender System shows a list of the top 10 games based on user reviews. The Game Finder App uses the Metacritic Video Game Comments dataset. To find the top 10 games, the app uses TF-IDF weighting to measure the similarity between the user query and each game.
In information retrieval, TF-IDF stands for term frequency-inverse document frequency. TF-IDF tells us how important a particular word is to a document within a collection. TF and IDF are calculated separately and then multiplied together.
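As a rough illustration of how the two parts combine, the sketch below computes TF-IDF weights for a small tokenized corpus. It assumes TF is the term count divided by document length and IDF is `log(N / df)`; the source does not say which variant Game Finder uses, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the Game Finder code base.
final class TfIdfSketch {

    // TF: occurrences of each term divided by the number of tokens in the document.
    static Map<String, Double> termFrequency(List<String> tokens) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1.0, Double::sum);
        }
        for (Map.Entry<String, Double> e : tf.entrySet()) {
            e.setValue(e.getValue() / tokens.size());
        }
        return tf;
    }

    // IDF (assumed variant): log(N / df), where df is the number of
    // documents that contain the term at least once.
    static Map<String, Double> inverseDocumentFrequency(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        }
        return idf;
    }

    // TF-IDF vector for one document: tf(t, d) * idf(t).
    // Terms that never occur in the corpus are skipped.
    static Map<String, Double> tfIdf(List<String> tokens, Map<String, Double> idf) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Double> e : termFrequency(tokens).entrySet()) {
            Double termIdf = idf.get(e.getKey());
            if (termIdf != null) {
                weights.put(e.getKey(), e.getValue() * termIdf);
            }
        }
        return weights;
    }
}
```

With this variant, a word that appears in every document gets an IDF of zero, so it contributes nothing to the similarity score.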
After finding TF-IDF for each document and the user query, we need to calculate cosine similarity. Cosine similarity shows how similar two vectors are by measuring the cosine of the angle between them.
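A minimal sketch of the calculation, assuming each TF-IDF vector is stored as a sparse map from term to weight (a representation chosen here for illustration; `CosineSketch` is a hypothetical name):

```java
import java.util.Map;

// Hypothetical helper, not taken from the Game Finder code base.
final class CosineSketch {

    // Cosine similarity of two sparse vectors: dot(a, b) / (|a| * |b|).
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        // By convention in this sketch, an all-zero vector is similar to nothing.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sumOfSquares = 0.0;
        for (double weight : v.values()) {
            sumOfSquares += weight * weight;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```

Because TF-IDF weights are non-negative, the result lies between 0 (no shared terms) and 1 (same direction).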
Building search was the most difficult phase of the Game Finder App. The most challenging part of this phase was creating the posting list. In the first version of the search feature, I computed the cosine similarity between the query and every document. That made each search slow: it took approximately 25 seconds. Later I decided to create a posting list, and the app's performance improved significantly. A search now takes less than 3 seconds. However, as I mentioned before, creating the posting list was extremely difficult. I read the skeleton given by Dr. Deokgun Park over and over.
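The source does not show how the posting list is structured. One common form, sketched here under that caveat, is an inverted index that maps each term to the ids of the documents containing it, so that only documents sharing at least one term with the query need a cosine score. All names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical inverted index, not taken from the Game Finder code base.
final class PostingListSketch {

    // term -> ids of the documents that contain the term
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Adds a document; each distinct term records the document id once.
    void addDocument(int docId, List<String> tokens) {
        for (String term : new LinkedHashSet<>(tokens)) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
    }

    // Returns the ids of documents that share at least one term with the
    // query, in ascending order. Only these need to be scored.
    Set<Integer> candidates(List<String> queryTokens) {
        Set<Integer> result = new TreeSet<>();
        for (String term : queryTokens) {
            result.addAll(postings.getOrDefault(term, Collections.emptyList()));
        }
        return result;
    }
}
```

A document with no query term in common has a cosine similarity of zero, so skipping it does not change the ranking.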
For Phase I, Game Finder used two references:
1. https://github.com/AdnanOquaish/Cosine-similarity-Tf-Idf-/blob/master/DocumentParser.java
2. https://stackoverflow.com/questions/27685839/removing-stopwords-from-a-string-in-java
What did I do differently?
The first reference describes an algorithm for cosine similarity and TF-IDF. Game Finder uses a similar algorithm, but the implementation is different. In other words, the reference was used to understand cosine similarity.
The second reference shows the steps that can be taken to remove stop words. Game Finder uses an algorithm similar to the one described in this link.
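A minimal sketch of stop word removal, assuming the text is lowercased and split on non-alphanumeric characters. The stop word list here is a tiny illustrative sample, not the list Game Finder uses, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not taken from the Game Finder code base.
final class StopWordSketch {

    // Tiny illustrative list; a real list would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Lowercases the text, splits it into words, and drops stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }
}
```

For example, `StopWordSketch.tokenize("The Legend of Zelda is a classic!")` keeps only `legend`, `zelda`, and `classic`.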
The classify feature classifies a user query based on training data. To classify, Game Finder uses a multinomial naive Bayes classifier.
The multinomial naive Bayes classifier is a very simple algorithm, yet it is surprisingly fast. It is well suited to large data sets and supervised learning. It assumes that the features (here, the words in a document) are conditionally independent of one another given the class. It also treats each class separately, estimating word probabilities for one class without reference to the others. Let's go through the multinomial naive Bayes algorithm step by step:
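One way to sketch the algorithm in Java is given below. It assumes add-one (Laplace) smoothing and pre-tokenized input, neither of which is stated in the source, and the class name, method names, and labels are hypothetical. Training counts documents and term occurrences per class; classification picks the class with the highest log-probability.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical classifier sketch, not taken from the Game Finder code base.
final class NaiveBayesSketch {

    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Training: count documents per class and term occurrences per class.
    void train(String label, List<String> tokens) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
                termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Classification: choose the class c that maximizes
    //   log P(c) + sum over tokens t of log P(t | c),
    // where P(t | c) = (count(t, c) + 1) / (tokens in c + vocabulary size).
    // Logs are used so that many small probabilities do not underflow.
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = Math.log((double) docsPerClass.get(label) / totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            double denominator =
                    tokensPerClass.getOrDefault(label, 0) + vocabulary.size();
            for (String token : tokens) {
                score += Math.log((counts.getOrDefault(token, 0) + 1) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

The add-one term keeps a word that was never seen in a class from forcing that class's probability to zero.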
The most challenging part of building a classifier was understanding the algorithm. Once I understood the algorithm, I was able to build the classifier very quickly. Another challenge was building the classifier in Java, since there are not many built-in libraries compared to Python.
In development Phase II, I used two references, and these are:
I used the first link to understand the algorithm. Basically, I used the book to understand how the classifier works. I also brushed up on the topic with an edureka! lecture video. I didn't use any code from other sources. I built the classifier based on this link.
Recommending a small set of similar products from a large dataset is challenging. To overcome this challenge, we use a content-based recommender system. A content-based recommender system gives the highest priority to user preference. The content-based filtering algorithm recommends similar products based on what the user likes.
Game Finder App uses a content-based recommender system. The content-based algorithm recommends games based on what the user likes by calculating cosine similarities. To use the recommend feature, users first need to search for games. After the search, Game Finder shows three games based on the user's query. Then, the user can find three more similar games by selecting one game from the search result list. Basically, the recommender system calculates cosine similarity twice: once for the search and once for the recommendation.
TF-IDF: In information retrieval, TF-IDF stands for term frequency-inverse document frequency. TF-IDF shows us how important a particular word is. TF and IDF are calculated separately.
Cosine similarity: After finding TF-IDF for each document and the user query, we need to calculate cosine similarity. Cosine similarity shows how similar two vectors are.
Content-based filtering: Content-based filtering recommends similar products by considering user preference. In this case, the user preference is the game the user selects. After the user selects the desired game, the content-based algorithm shows three more similar games based on that selection.
The major steps of the content-based recommender system are below:
Finding three games based on the user query is very similar to the Search feature.
· Game Finder loads all documents from two .csv files and splits the data into title, platform, publisher, genre, players, release year, Metacritic rating, user rating, and user reviews.
· After loading the data, Game Finder removes stop words and tokenizes the text.
· Game Finder calculates the TF-IDF score for each token.
· After computing TF-IDF for the training data, Game Finder takes the user query and performs stemming and lemmatization on it. Game Finder also tokenizes the user query and calculates TF-IDF for each token.
· After finding the TF-IDF of the training data and the user query, Game Finder calculates the cosine similarity between each document and the user query. Game Finder then shows the top 3 most similar games for the user query.
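The final ranking step in the list above can be sketched as follows. This assumes the cosine scores have already been computed and stored per game title; `TopKSketch` and `topK` are hypothetical names, not part of Game Finder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the Game Finder code base.
final class TopKSketch {

    // Returns up to k keys with the highest scores, best first.
    // The order of keys with equal scores is not specified.
    static List<String> topK(Map<String, Double> scores, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Calling `topK(scores, 3)` yields the three titles shown to the user; sorting the full list is simple, though a bounded heap would avoid sorting every score when the dataset is large.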
After the search results are found using cosine similarity, the user can select one game by typing its number. Game Finder uses the selected game's description to find three similar games in the Metacritic dataset. Game Finder uses content-based filtering to show these three games, and the process is very similar to the cosine similarity search. The steps Game Finder uses are below:
· Game Finder finds the TF-IDF of the training data by considering each game's title, publisher, platform, and genre.
· Game Finder also calculates the TF-IDF of the user-selected game's description, using the same fields: title, publisher, platform, and genre.
· After finding the TF-IDF of the training data and the user-selected game, Game Finder calculates cosine similarities and shows three more games that are similar to the user-selected game.
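The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not Game Finder's actual code: the `Game` class, the whitespace tokenization, and the TF-IDF variant (raw count times `log(N / df)`) are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical content-based recommender sketch.
final class ContentBasedSketch {

    // The four metadata fields named in the steps above.
    static final class Game {
        final String title, publisher, platform, genre;

        Game(String title, String publisher, String platform, String genre) {
            this.title = title;
            this.publisher = publisher;
            this.platform = platform;
            this.genre = genre;
        }

        // Joins the metadata fields into one bag of lowercase tokens.
        List<String> tokens() {
            String text = String.join(" ", title, publisher, platform, genre);
            return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
        }
    }

    // One TF-IDF vector per game: term count * log(N / df) (assumed variant).
    static List<Map<String, Double>> vectors(List<Game> games) {
        Map<String, Integer> df = new HashMap<>();
        for (Game g : games) {
            for (String t : new HashSet<>(g.tokens())) {
                df.merge(t, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> result = new ArrayList<>();
        for (Game g : games) {
            Map<String, Double> v = new HashMap<>();
            for (String t : g.tokens()) {
                v.merge(t, Math.log((double) games.size() / df.get(t)), Double::sum);
            }
            result.add(v);
        }
        return result;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Titles of up to k games most similar to games.get(selected),
    // excluding the selected game itself.
    static List<String> recommend(List<Game> games, int selected, int k) {
        List<Map<String, Double>> vecs = vectors(games);
        List<Integer> others = new ArrayList<>();
        for (int i = 0; i < games.size(); i++) {
            if (i != selected) {
                others.add(i);
            }
        }
        // Sort by similarity to the selected game, highest first.
        others.sort((x, y) -> Double.compare(
                cosine(vecs.get(selected), vecs.get(y)),
                cosine(vecs.get(selected), vecs.get(x))));
        List<String> titles = new ArrayList<>();
        for (int index : others.subList(0, Math.min(k, others.size()))) {
            titles.add(games.get(index).title);
        }
        return titles;
    }
}
```

The selected game is excluded from its own results; otherwise it would always rank first with a similarity of 1.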
Building a recommender system was easy compared to the other two features. However, I struggled at the very beginning. At first, I tried to build a collaborative filtering method, but I couldn't finish implementing it because of time. Then, I switched to a different algorithm for the recommender system: content-based filtering. I also found visualizing the recommend feature difficult. To overcome this difficulty, I read chapter 9 of Mining of Massive Datasets (MMDS). I also read a blog article online. After understanding content-based filtering, I found it very easy to implement.
To implement the recommender system, I used two sources, and these are:
· Chapter 9 from Mining of Massive Datasets (MMDS): http://infolab.stanford.edu/~ullman/mmds/ch9.pdf
· Online blog article: https://towardsdatascience.com/how-to-build-from-scratch-a-content-based-movie-recommender-with-natural-language-processing-25ad400eb243
What did I do differently?
I used these two sources to understand the concept behind content-based filtering. However, I didn't use code from either source in my implementation. The blog post describes an example of movie recommendation, which was very helpful for understanding content-based filtering. I developed this feature by reusing code from the search feature of Game Finder (Phase I).