Recommendation App & Restaurant Decision Tool for Two
Modern consumers are overwhelmed with dining choices. With all the information and services available at a consumer’s fingertips, it takes a lot for restaurants to stand out. Recommendation systems offer an effective way for lesser-known restaurants to come to the attention of consumers. Such systems allow consumers to explore the abundance of choices at hand while still catering to their particular interests. As a result, they can enhance a consumer’s satisfaction and loyalty.
Considering this, we wanted to explore the two broad groups of recommendation systems: content-based and collaborative filtering methods. Content-based systems use the attributes of items and users to recommend items similar to those the user liked in the past. In contrast, collaborative filtering systems recommend items that were liked by people identified as having tastes similar to the user's. We have also combined both approaches into a hybrid recommendation system, using restaurant attributes and the users’ review history.
Additionally, we have developed an evaluation system for the user-based recommendation systems so that we can compare them and choose the best approach. The chosen recommendation system was implemented in our app, “OkFoodie!,” which is now online to provide indecisive customers with recommendations for two within the city of Las Vegas.
For this project, our team wanted to explore data provided by Yelp to create a recommendation system for two people with different tastes. The only task more difficult than one person picking a place for dinner is two people agreeing on one.
The Yelp Challenge Dataset is a subset of Yelp’s businesses, reviews, and user data from several cities. The data consists of:
- 5,200,000 user reviews
- 174,000 businesses
- 11 metropolitan areas from two countries
We chose to focus on Las Vegas because it had a large proportion of the businesses in the dataset with over 4,000 restaurants.
Yelp Dataset & EDA
Our exploratory data analysis started with a thorough understanding of the Yelp users. The average rating given for businesses is 3.8 stars. Only 20% of reviewers write the majority of the reviews; most reviewers write fewer than 5 reviews.
The Yelp Elite status is a way for the site to recognize users who are active in the Yelp community. Elite-worthiness is based on well-written reviews and high quality tips. Our team felt the reviews from Yelp Elite users should receive more weight when creating a recommendation system.
The general process for the app, as shown in the diagram below, requires two users to interact with the app by providing their location and each person’s restaurant preferences from the local restaurant list. The model utilizes the reviews written for each restaurant and runs the text through a natural language processing (NLP) model to find the most similar restaurants based on their reviews. Next, the similar restaurants are filtered based on location and other criteria taken from the businesses’ information contained in the dataset. Those recommendations are then ranked and, finally, returned back to the user within the app along with their locations displayed on a map.
Yelp and the 90/9/1 Rule (Unknown Users)
When you think about Yelp users, you probably think of the users who write the reviews. However, according to a phenomenon in social media known as the 90/9/1 Rule, the users who write reviews are only a small percentage of the overall population using Yelp. According to this rule, only 1 percent of users actively create content. Another 9 percent observe and occasionally contribute. The other 90 percent observe without responding. In other words, most people on Yelp don’t write reviews; they merely read them. Those Yelp lurkers represent the majority of the prospective users for our app.
The first approach we chose was to offer recommendations to these users using a content-based recommendation system. This approach uses a series of discrete characteristics of an item liked by the user in order to recommend additional items with similar properties. In our case, each user would choose a restaurant that they like to form the basis of the recommendation. Examining the text of the reviews of the two selected restaurants would bring to light details that we would use to find other restaurants with similar characteristics.
A precedent for this content-based approach can be found in Pandora. The music site uses the properties of a song or artist to create a "station" that plays music with similar properties.
Similarity Based on Text Vectorization
To rank the restaurants for our recommendation system, we looked for restaurants whose reviews are similar to those of the chosen restaurants. Comparing reviews in this way takes three steps: pre-processing, vectorization, and document similarity.
The pre-processing of the text mainly includes:
- Tokenizing, or splitting the text into a sequence of words
- Lemmatizing, or identifying the root words
- Filtering stop words, that is, frequent words that add no value to our text
- Filtering infrequent words
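The four pre-processing steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline we ran: the stop-word list, the `crude_lemma` helper (a toy stand-in for a real lemmatizer), and the `min_count` threshold are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "were", "it", "of", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def crude_lemma(token):
    """Toy stand-in for a lemmatizer: it only strips a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def preprocess(reviews, min_count=2):
    """Tokenize, lemmatize, drop stop words, then drop infrequent words."""
    docs = [[crude_lemma(t) for t in tokenize(r)] for r in reviews]
    docs = [[t for t in d if t not in STOP_WORDS] for d in docs]
    counts = Counter(t for d in docs for t in d)
    return [[t for t in d if counts[t] >= min_count] for d in docs]

reviews = [
    "The tacos were amazing and the salsa was fresh.",
    "Amazing tacos, friendly service.",
    "Service was slow.",
]
cleaned = preprocess(reviews)
print(cleaned)
```

On a corpus this small, the frequency filter removes most words; on the full set of reviews, it would only trim the long tail of rare terms.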
Once the text is pre-processed, we vectorize it (that is, make a numerical representation of it) by using word embeddings. Word embeddings can be classified into two categories: (1) frequency-based embeddings and (2) prediction-based embeddings.
- Frequency-based embeddings are represented by three types of vectors:
  - Count vectors, which only take into consideration the frequency of the words
  - TF-IDF (Term Frequency-Inverse Document Frequency): On top of the frequency of the words, this also considers the importance of a word in a document. For example, it penalizes common words by assigning them lower weights and gives words that are only present in some documents higher weights.
  - Co-occurrence vectors or matrices. These vectors measure how often two words occur together, which is usually linked to their semantic relationship.
- Prediction-based embeddings include Word2Vec and Doc2Vec, which are discussed in the next section.
Finally, once we have our vectorized text for each review, we measure the document similarities by calculating the cosine similarity between the reviews of the business “liked” by our users and the reviews of all the other restaurants. When calculating cosine similarity, the dimensions are the terms or words. The similarity depends on the orientation of the vectors. Two similar vectors with the same orientation would have a cosine similarity of 1, and two vectors oriented 90-degrees to each other would have a cosine similarity of 0. Using the cosine similarity between the reviews, the corresponding restaurants are ranked in descending order.
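As a rough illustration of the vectorization and similarity steps, the sketch below builds TF-IDF vectors for three tiny tokenized documents and ranks two of them by cosine similarity to the "liked" one. The documents and the unsmoothed `log(N / df)` weighting are assumptions chosen for brevity; library implementations typically use smoothed variants.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical pre-processed reviews, one per restaurant.
docs = {
    "liked": ["taco", "salsa", "margarita"],
    "similar": ["taco", "salsa", "burrito"],
    "different": ["sushi", "sake", "tempura"],
}
names = list(docs)
vectors = dict(zip(names, tfidf_vectors(list(docs.values()))))

# Rank the other restaurants by similarity to the liked one, highest first.
ranking = sorted(
    (n for n in names if n != "liked"),
    key=lambda n: cosine_similarity(vectors["liked"], vectors[n]),
    reverse=True,
)
print(ranking)
```

Documents that share no terms have orthogonal vectors, so their similarity is exactly 0, which is the 90-degree case described above.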
TF-IDF Weighted-Sum of Embedding Vectors
For our recommendation system, we have generated the TF-IDF vectors and also generated a more sophisticated text document vector by multiplying the matrix of TF-IDF weights by a co-occurrence matrix to obtain a TF-IDF weighted sum of embedding vectors.
In the resulting matrix, the number of rows corresponds to the number of documents, and the number of columns corresponds to the number of terms, which were the top 300 words in each review. In order to use these vectors as inputs in our collaborative filtering model, we decided to reduce their dimensionality. With PCA (Principal Component Analysis), we reduced them to 8 dimensions. We further used t-SNE (t-distributed Stochastic Neighbor Embedding) to project the vectors into 2D, obtaining this graph in which each document is a dot. Orange represents good reviews, and blue represents bad reviews. Even though there is one large cluster, we can see that the bad reviews are mostly located in the upper part of the cluster. These vectors were then fed into our collaborative filtering model along with the vectors generated by the Doc2Vec model.
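The weighted-sum step amounts to a single matrix product: a documents-by-terms matrix of TF-IDF weights times a terms-by-terms co-occurrence matrix. The sketch below uses a made-up vocabulary of three terms (instead of 300), and every number in it is illustrative only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Rows are documents, columns are terms (hypothetical TF-IDF weights).
tfidf = [
    [0.5, 0.5, 0.0],
    [0.0, 0.25, 0.75],
]

# Symmetric term-by-term co-occurrence counts (hypothetical).
cooccurrence = [
    [2, 1, 0],
    [1, 3, 1],
    [0, 1, 4],
]

# Each document vector is the TF-IDF-weighted sum of its terms' co-occurrence rows.
doc_vectors = matmul(tfidf, cooccurrence)
print(doc_vectors)
```

The result keeps one row per document and one column per term, matching the shape described above.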
Inferring Contextual Similarities with Doc2Vec
Another approach we used to analyze the Yelp reviews and recommend restaurants was a neural-network-based algorithm called Doc2Vec. This method allows us to compare the reviews based on context, as well as on the word representations.
Doc2Vec, an unsupervised NLP model, generates vectors for a larger piece of text, such as a paragraph or document, independent of the text’s length. It is based on the Word2Vec algorithm that defines a vector representation for each word in the document based on its neighboring words to infer additional relationships between them, such as antonyms or analogies. These word vectors are then combined with a paragraph or document vector that represents the overall contextual meaning of the document. Besides comparing the words contained within each review, which may give information about menu items, cuisine, and service quality, one advantage of the document vectors is that they can help infer other useful information about each restaurant, such as its uniqueness, reputation, or customer base, leading to more insightful recommendations for the two users.
The process for comparing the Yelp restaurant reviews using the Gensim Doc2Vec module is fairly straightforward. All reviews are pre-processed to tokenize the text and remove punctuation. Each review is tagged with its ID before training the Doc2Vec model, which constructs a 200-dimension vector for each review. Once the model is trained, we use the vectors from the set of reviews for each restaurant to find similarities between the two original choices and the remaining restaurants in the chosen area. We take the median of the cosine similarities, which limits the influence of unusual reviews or outliers, and then rank the restaurants to provide personalized recommendations for the users. The paragraph vectors are also used in the collaborative filtering model, as discussed in the following section.
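The ranking step can be sketched without the Doc2Vec model itself, given review vectors that have already been inferred. In the example below, the 2-dimensional vectors stand in for the 200-dimension ones, and averaging the two users' median similarities is an assumption made for illustration, not necessarily how the app combines them.

```python
import math
from statistics import median

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def median_similarity(chosen_reviews, candidate_reviews):
    """Median cosine similarity over all pairs of (chosen review, candidate review)."""
    return median(cosine(c, r) for c in chosen_reviews for r in candidate_reviews)

def rank_for_two(choice_a, choice_b, candidates):
    """Rank candidates by the mean of their median similarity to each user's choice."""
    scores = {
        name: (median_similarity(choice_a, vecs) + median_similarity(choice_b, vecs)) / 2
        for name, vecs in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical review vectors for each user's chosen restaurant.
choice_a = [[1.0, 0.0], [0.9, 0.1]]
choice_b = [[0.0, 1.0]]

# Hypothetical review vectors for the remaining restaurants.
candidates = {
    "a_only": [[1.0, 0.0]],
    "mix": [[1.0, 1.0], [1.0, 0.9]],
    "off": [[-1.0, -1.0]],
}
ranking = rank_for_two(choice_a, choice_b, candidates)
print(ranking)
```

Using the median rather than the mean means a single atypical review moves a restaurant's score much less.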
Collaborative Filtering & Matrix Factorization (Existing Users)
Matrix factorization is a class of collaborative filtering algorithms used in recommender systems. Matrix factorization techniques are often more effective than user-based and item-based systems because they uncover the latent (hidden) features and underlying interactions that describe the relationships between users and items.
In mathematics, factorization or factoring consists of writing a number or another mathematical object as a product of several factors, usually smaller or simpler objects of the same type. Matrix factorization algorithms work the same way by decomposing the user-item interaction matrix into the products of matrices with lower dimensionality.
Let R be the matrix of size U × I (users × items, where the items are restaurants) that contains all the ratings the users have assigned to the items. We assume that a small number K of latent features explains these ratings. Our task, then, is to find two matrices, P (U × K, users by latent features) and Q (I × K, items by latent features), such that their product approximately equals R:
$$R \approx P \times Q^{T} = \hat{R}$$
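To make the factorization concrete, here is a small sketch that learns P and Q by stochastic gradient descent on the observed ratings only. It is a generic textbook version, not the GraphLab model described below; the toy ratings, learning rate, regularization strength, and number of epochs are all assumptions.

```python
import math
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=3000, seed=0):
    """Learn P (users x k) and Q (items x k) from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's latent factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    """Root mean squared error over the given (user, item, rating) triples."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical observed ratings: (user index, restaurant index, stars).
ratings = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]
P, Q = factorize(ratings, n_users=5, n_items=4)
print("training RMSE:", round(rmse(ratings, P, Q), 3))
print("predicted rating for user 1, restaurant 1:", round(predict(P, Q, 1, 1), 2))
```

Once P and Q are learned, any missing entry of R can be filled in with `predict`, which is what makes the approach usable for recommendation.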
Our team used the GraphLab library in Python to build a model that accepts various features pertaining to the users and items. The factorization recommender in this library trains a model capable of predicting a score for each possible combination of users and items. The internal coefficients of the model are learned from the known scores of users and items. The dataset used for training the model must contain a column of user IDs and a column of item IDs. Each row represents an observed interaction between a user and an item. The user and item pairs are stored with the model so that they can later be excluded from recommendations if desired. The dataset can optionally contain a target ratings column. All other columns included are considered side features.
The GraphLab factorization model was fed a user table, item (restaurant) table, and review table. User and items are represented by weights and factors.
Our team extracted features such as Elite users, users with friends, and users that provided at least 5 ratings from the Yelp dataset. The restaurant table included features such as kid-friendly, restaurant atmosphere, alcohol, intimate, touristy, friendly and about 10 other features that our team felt were relevant to restaurant attributes. The review table included NLP vectors created from Doc2Vec and TF-IDF for each review.
The factor terms model interactions between users and items. For example, if a user tends to love Mexican food and hate Italian food, the factor terms attempt to capture these features along with items (restaurants) that are similar to the items that this user likes.
The Factorization Machine recommender model approximates target rating values as a weighted combination of user and item latent factors, biases, side features, and their pairwise combinations. The model is trained using stochastic gradient descent and alternating least squares (ALS).
The categorical features were turned into dummy variables and, when fed into the model, the matrix factorization predicts the user ratings (scores).
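For reference, a general second-order factorization machine is commonly written as the equation below, where x is the feature vector built from the dummy-encoded user, item, and side features. The symbols (w_0 for the global bias, w_j for per-feature weights, v_j for per-feature latent factor vectors, n for the number of features) are notation introduced here, and GraphLab's exact parameterization may differ.

```latex
\hat{y}(x) = w_0 + \sum_{j=1}^{n} w_j x_j + \sum_{j=1}^{n} \sum_{l=j+1}^{n} \langle v_j, v_l \rangle \, x_j x_l
```

When x contains only the one-hot user and item indicators, the pairwise term reduces to the dot product of one user factor vector and one item factor vector, which is the plain matrix factorization case with biases.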
When using this recommender system for two, our team generates a list of recommendations for each user, each with a score and a rank. The two lists are then merged on the business ID, and the two users' ranks are summed. A lower rank indicates a stronger recommendation for that user, with rank 1 being the best. The restaurant with the lowest rank sum is the restaurant recommended to both users.
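The merge-and-sum step is simple enough to show directly. In this sketch, the business IDs and ranks are invented, and breaking ties by business ID is an assumption added only to make the output deterministic.

```python
def recommend_for_two(ranks_a, ranks_b):
    """Merge two {business_id: rank} lists (rank 1 is best) and sort by summed rank."""
    shared = ranks_a.keys() & ranks_b.keys()
    summed = {business: ranks_a[business] + ranks_b[business] for business in shared}
    # Lowest sum first; ties are broken by business ID (an assumption for this sketch).
    return sorted(summed.items(), key=lambda pair: (pair[1], pair[0]))

# Hypothetical per-user ranked recommendations.
ranks_a = {"biz_1": 1, "biz_2": 2, "biz_3": 3, "biz_4": 4}
ranks_b = {"biz_2": 1, "biz_3": 2, "biz_5": 3, "biz_1": 4}

merged = recommend_for_two(ranks_a, ranks_b)
print(merged)      # restaurants both users were recommended, best first
print(merged[0])   # the joint recommendation
```

Restaurants that appear in only one user's list drop out in the merge, so the joint recommendation is always one that both models ranked.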
When implementing our exploratory data analysis and NLP vectors, the results shown below demonstrate that adding more information (particularly the weight of the review through NLP) substantially improves the model’s ability to predict user ratings.
As we collect more user and restaurant information, the model will improve its ability to predict ratings and improve its recommendation capabilities.
While the performance of our collaborative filtering models allows for a direct evaluation metric (e.g., RMSE), recommendations based on unsupervised models lack an immediate approach to evaluation. Nonetheless, we set out to identify a way to measure the success of our recommendations for unknown users. Our approach relies on the idea that, despite lacking information for such users, the selected restaurant(s) allows us to match them with existing users in the Yelp dataset. By matching a new and existing user based on the former’s input to our app, we can develop a measure of how likely unknown users are to like the recommendations, even though they result from unsupervised machine learning models.
Our general approach is summarized as follows: Imagine two users who are unknown to Yelp supplying two restaurant choices to our app. At this point, all we know about these users is that one user (e.g., User A) likes their choice (e.g., Restaurant A) and the other user (e.g., User B) likes their choice (e.g., Restaurant B). Despite lacking additional information about these individuals, our project uses these choices to produce a recommendation set for these users. We evaluate the quality of our recommendation set by assuming the preferences of these unknown users can be approximated by existing Yelp users who have indicated that they like Restaurants A and B.
Implementing this idea requires several steps. First, we must deal with the sparsity of the existing ratings matrix. As most Yelp users only review a small number of restaurants, we use collaborative filtering to predict users’ unknown ratings. Second, we must also match new and existing users. As any two given Yelp users may rate restaurants very differently on average, we initially demean the ratings matrix at the user-level. The demeaned ratings are then used to identify the set of users that “really like” (or would really like) Restaurant A and also the set of users that really like Restaurant B. The set of existing users that are matched to the unknown user that selected Restaurant A are those for which this restaurant is among the top 600 restaurants in the ratings matrix. With approximately 3,000 restaurants in our main sample, this threshold corresponds to approximately a user’s top 20% of restaurants.
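The matching step can be sketched as follows, assuming the ratings matrix has already been completed by collaborative filtering. The users, restaurants, and ratings are invented, and `top_n` plays the role of the top-600 threshold on a much smaller scale.

```python
def demean(ratings):
    """Subtract each user's mean rating from that user's ratings."""
    centered = {}
    for user, row in ratings.items():
        mean = sum(row.values()) / len(row)
        centered[user] = {restaurant: value - mean for restaurant, value in row.items()}
    return centered

def comparison_group(centered, restaurant, top_n):
    """Users for whom `restaurant` is among their top_n demeaned ratings."""
    group = set()
    for user, row in centered.items():
        top = sorted(row, key=row.get, reverse=True)[:top_n]
        if restaurant in top:
            group.add(user)
    return group

# Hypothetical completed ratings matrix: {user: {restaurant: predicted or actual stars}}.
ratings = {
    "u1": {"A": 5.0, "B": 3.0, "C": 1.0},
    "u2": {"A": 2.0, "B": 4.0, "C": 5.0},
    "u3": {"A": 4.0, "B": 4.5, "C": 2.0},
}
centered = demean(ratings)
print(comparison_group(centered, "A", top_n=1))
print(comparison_group(centered, "A", top_n=2))
```

Demeaning leaves each user's own ordering of restaurants unchanged; it matters when ratings are compared across users who grade on different scales.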
The ratings of these comparison groups can then be used to approximate the probability that the two unknown users would like the restaurants our app recommends. For any given input from our app, which is a pair of restaurants, this probability can be estimated by simply computing the probability that the comparison groups would like the set of recommendations produced by the app. In general, any given unsupervised approach can then be evaluated by developing a random sample of restaurant pairs and then computing the average probability that the unknown users would like their recommendation set.
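One way to compute this probability is as the share of (matched user, recommended restaurant) pairs in which the recommendation falls within that user's top n restaurants. That definition of "like" and all the data below are assumptions made for this sketch.

```python
def top_n(row, n):
    """The n restaurants with the highest ratings in one user's row."""
    return set(sorted(row, key=row.get, reverse=True)[:n])

def like_probability(predicted, group, recommendations, n):
    """Share of (user, recommendation) pairs where the recommendation is in the user's top n."""
    hits = 0
    total = 0
    for user in group:
        liked = top_n(predicted[user], n)
        for restaurant in recommendations:
            total += 1
            hits += restaurant in liked
    return hits / total if total else 0.0

# Hypothetical completed ratings for the matched comparison group.
predicted = {
    "u1": {"r1": 4.8, "r2": 4.1, "r3": 2.0, "r4": 3.5},
    "u2": {"r1": 3.0, "r2": 4.5, "r3": 4.9, "r4": 1.0},
}
probability = like_probability(predicted, {"u1", "u2"}, ["r2", "r3"], n=2)
print(probability)
```

Averaging this value over a random sample of input restaurant pairs gives a single score for comparing unsupervised approaches.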
The probability that our top recommendations would be liked by the users is quite high: ~90%. However, we did find that this probability can be influenced considerably by which thresholds are used. For example, we found much lower success rates when we computed the probability that a recommendation set of the top 100 restaurants would be liked, on average, when the comparison group of users were only considered to have liked the restaurant if it were in the top 200 (of ~3000) restaurants. This point illustrates that while the collaborative filtering results can help us gauge the performance of our unsupervised recommendations, additional work should be done to determine the optimal parameter choices.
Ok Foodie! - Restaurant Decision App
The app we nicknamed “OkFoodie!” currently uses the Doc2Vec NLP model within the Flask web framework to provide recommendations for restaurants in Las Vegas. Users can allow the web-based app to use their current location, or they may enter their zip code, so it can provide recommendations within 5 miles of the given location. The two users then enter their preferred restaurants, hit Submit, and the model returns a list of 5 restaurants based on similarities to both of the users’ original restaurant preferences.
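The 5-mile filter can be approximated with the haversine formula for great-circle distance. This is a sketch under stated assumptions: the coordinates are made up, and the app may compute distance differently.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def within_radius(user_location, restaurants, radius_miles=5.0):
    """Names of restaurants no farther than radius_miles from the user."""
    lat, lon = user_location
    return [
        name
        for name, (r_lat, r_lon) in restaurants.items()
        if haversine_miles(lat, lon, r_lat, r_lon) <= radius_miles
    ]

# Hypothetical coordinates in the Las Vegas area.
user_location = (36.1147, -115.1728)
restaurants = {
    "near": (36.1215, -115.1739),
    "far": (36.0395, -114.9817),
}
nearby = within_radius(user_location, restaurants)
print(nearby)
```

Filtering by distance before ranking keeps the similarity computation limited to restaurants the two users could realistically visit.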
Conclusions & Next Steps
While we are able to provide restaurant recommendations based on similarities inferred from the Yelp reviews as well as on preferences assumed for the two users, we know that there are many more steps to be taken to enhance the algorithm, recommendation quality, and user experience. Moving forward, we expect to refine the model further by incorporating larger samples of the Yelp dataset and possibly the reviewers’ tips that are also available for each restaurant, in order to improve accuracy and personalization. We could also begin adding other cities available from the Yelp Challenge Dataset, at which point we also intend to scale up the model for processing with Spark.
Future versions may also include the option for existing Yelp users to log into the site so we may use their profiles to provide more personalized recommendations through the collaborative filtering model. User selections can also help optimize the model performance. Ultimately, we anticipate this would be available as a mobile app, allowing users to find restaurants when traveling based on restaurants they’ve liked in other locations around the globe.