Restaurant Reviews Analysis Model Based on ML Algorithms

Amrutha Last Updated : 22 Mar, 2022

6 min read

This article was published as a part of the blog.

Introduction
Working with dataset
Import Count Vectorizer
Import Support Vector Classifier
Using Pipeline
Save the model
Prediction of new reviews using the model
Conclusion

Introduction

In this article, we work with the Restaurant Reviews dataset, which contains customer reviews labelled as either positive or negative. We build a machine learning model that combines a Count Vectorizer (to turn the review text into numeric features) with a Support Vector Classifier (SVC), and finally use it to predict whether a given review is positive or negative.

Working with Dataset

Let’s start by looking into the dataset.


Here is the link to the dataset. Download it before proceeding.

https://drive.google.com/file/d/1TgqU0Q_wyEy250ed5xm3lAggYSKU71wN/view?usp=sharing

The dataset has two columns, Review and Liked. The Review column holds the text of each customer review, and the Liked column holds either 0 or 1: 1 indicates a positive review and 0 a negative one.

First, import the basic libraries we need before building the machine learning model.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Next, download the dataset from the link above and load it into a pandas DataFrame.

# Import the Restaurant Reviews dataset
df=pd.read_table(r"C:\Users\Admin\Downloads\Restaurant_Reviews.csv")

Between the quotation marks, paste the path of the Restaurant Reviews dataset on your computer. The resulting DataFrame is stored in the variable df.
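pd.read_table reads delimited text with a tab ("\t") as its default separator, so it behaves like pd.read_csv(..., sep="\t"). Whatever the file extension, the call therefore assumes the file is tab-separated, which suits review text because reviews often contain commas. A minimal sketch, with two made-up rows and io.StringIO standing in for the downloaded file:

```python
import io

import pandas as pd

# Two hypothetical rows in the same layout as the dataset: Review<TAB>Liked.
# io.StringIO stands in for the real file on disk.
sample = "Review\tLiked\nGreat pasta, friendly staff.\t1\nCold fries, slow service.\t0\n"

df = pd.read_table(io.StringIO(sample))  # tab is the default separator
print(df.shape)          # (2, 2)
print(list(df.columns))  # ['Review', 'Liked']
print(df.loc[0, "Review"])  # the comma inside the review is kept as text
```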

Let’s view it.

df

Displaying df in a notebook shows the first five and last five rows, along with the number of rows and columns in the DataFrame.

df.info()

The info() method summarizes the DataFrame: the number of columns, the column labels, the number of non-null entries and the data type of each column, and the memory usage.


Statistical Description:

The describe() method reports the count, mean, standard deviation, minimum, 25th, 50th (median) and 75th percentiles, and maximum of each numeric column.

df.describe()
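By default, describe() summarizes only the numeric columns, so for this dataset it reports on Liked and skips the text column Review. Because Liked holds only 0 and 1, its mean is simply the fraction of positive reviews. A small sketch on made-up labels (not the real data):

```python
import pandas as pd

# Hypothetical data: three positive reviews and one negative review
df = pd.DataFrame({
    "Review": ["Tasty", "Bland", "Great", "Fresh"],
    "Liked": [1, 0, 1, 1],
})

stats = df.describe()               # numeric columns only, so just "Liked"
print(list(stats.columns))          # ['Liked']
print(list(stats.index))            # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(stats.loc["mean", "Liked"])   # 0.75, i.e. 3 of the 4 reviews are positive
```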

The output looks like this:

Restaurant Reviews Analysis — Source: Author

Let’s list the columns in df.

df.columns

Index(['Review', 'Liked'], dtype='object')

The nunique() method gives the number of unique values in a column. For Liked it is 2, since the only labels are 0 and 1.

df['Liked'].nunique()

The unique() method returns the unique values in a column.

print(df['Liked'].unique())

[1 0]

The value_counts() method counts how many times each value occurs in a column.

df['Liked'].value_counts()
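The three methods above answer related questions about a label column: how many distinct labels there are, which ones they are, and how often each occurs. A sketch on a hypothetical five-review Liked column shows them side by side:

```python
import pandas as pd

# Hypothetical "Liked" column for five reviews
liked = pd.Series([1, 0, 1, 1, 0], name="Liked")

print(liked.nunique())       # 2: number of distinct labels
print(liked.unique())        # [1 0]: distinct labels, in order of first appearance
print(liked.value_counts())  # label 1 occurs 3 times, label 0 occurs 2 times
```

value_counts() is the quickest way to check whether the classes are balanced before training.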

Let’s see the first five entries of the DataFrame.

df.head()

Similarly, the tail() method shows the last five entries of the DataFrame.

Visualizations

plt.figure(figsize=(8,5))
sns.countplot(x=df.Liked);

Here we used the seaborn library to visualize the data. A count plot counts the entries for each value of a column and draws one bar per value.

Bar Graph | Restaurant Reviews Analysis — Source: Author

Define X and Y

Here, X is the input feature we give to the model, and Y is the output the model should predict. In our dataset, the Review column is the input, and the Liked column is what the model predicts.

x=df['Review'].values
y=df['Liked'].values

Split the Dataset into Training and Testing Sets

For this, we import train_test_split from the scikit-learn library. It divides the data into four arrays: x_train, x_test, y_train and y_test, so that both x and y are split into training and test sets.

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,random_state=0)

View the Shapes of Train Sets and Test Sets

x_train.shape

(750,)

x_test.shape

(250,)

y_train.shape

(750,)

y_test.shape

(250,)
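The 750/250 shapes come from train_test_split's default test_size of 0.25, used when neither test_size nor train_size is given: ceil(0.25 × 1000) = 250 rows go to the test set and the remaining 750 to training. Setting random_state=0 fixes the shuffle, so the same split is produced on every run. A quick check on a stand-in array of 1,000 items (not the real reviews):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same number of rows as the reviews dataset (1,000)
x = np.arange(1000)
y = x % 2

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)
print(x_train.shape, x_test.shape)  # (750,) (250,)

# Same random_state, same shuffle: the split is reproducible
x_train2, _, _, _ = train_test_split(x, y, random_state=0)
print((x_train == x_train2).all())  # True
```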

Import CountVectorizer

From the scikit-learn library, import CountVectorizer and store an instance of it in a variable such as vect, setting stop_words to 'english' so that common English words are dropped.

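CountVectorizer turns each text into a vector of word counts: each entry is the number of times a particular word appears in that review. fit_transform learns the vocabulary from the training reviews and counts them; transform counts the test reviews using that same vocabulary.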

from sklearn.feature_extraction.text import CountVectorizer
vect=CountVectorizer(stop_words='english')

x_train_vect=vect.fit_transform(x_train)
x_test_vect=vect.transform(x_test)

Import Support Vector Classifier (SVC)

Import the Support Vector Classifier (SVC) from scikit-learn's svm (Support Vector Machine) module and assign it to a variable called model.

from sklearn.svm import SVC
model=SVC()
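SVC() with no arguments uses scikit-learn's default hyperparameters, notably an RBF kernel with C=1.0 and gamma='scale' (the gamma default since scikit-learn 0.22). SVC also accepts the sparse count matrices produced by CountVectorizer directly. A quick way to confirm which settings the model is using:

```python
from sklearn.svm import SVC

model = SVC()
params = model.get_params()  # dictionary of all constructor arguments
print(params["kernel"], params["C"], params["gamma"])  # rbf 1.0 scale
```

These are the values to change, for example SVC(kernel="linear"), if you want to experiment; the results reported in this article use the defaults.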

Train the Model

The fit method trains the model; we pass it the vectorized training reviews and their labels.

model.fit(x_train_vect,y_train)

Predict the Test Results

Use the predict method to predict the test results, passing it the vectorized test reviews.

y_pred=model.predict(x_test_vect)

Evaluate the Model

Scikit-learn's metrics module provides various methods for evaluating machine learning models; for this support vector classifier (SVC), we use the accuracy score.

Import accuracy_score from sklearn.metrics and pass it the two sets of labels to compare: the true test labels and the predicted labels.

from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)

0.792

For my model, the accuracy is 79.2%.
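Accuracy is the fraction of predictions that match the true labels, so on the 250-review test set a score of 0.792 means 198 reviews were classified correctly (198 / 250 = 0.792). scikit-learn documents the argument order as accuracy_score(y_true, y_pred); for accuracy the order does not change the result, but it does for metrics such as precision and recall. A hand-rolled check against accuracy_score on made-up labels:

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions for four reviews
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

# Accuracy = number of matching positions / total number of predictions
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual)                          # 0.75
print(accuracy_score(y_true, y_pred))  # 0.75

# The test-set accuracy above corresponds to 198 correct out of 250
print(198 / 250)                       # 0.792
```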

Using Pipeline

Before using a pipeline in our model, let us understand a little about pipelines. A pipeline chains several steps, such as text transformers followed by a final model, into a single object that can be fitted and used for prediction in one call. Let us see this with the code below.

First, we will see without using the pipeline.

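from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

# Xtrain, ytrain and Xtest are placeholders for your training texts, training labels and test texts
vect = CountVectorizer()
tfidf = TfidfTransformer()
clf = SGDClassifier()
# Fit each step on the training set, feeding its output to the next step
vX = vect.fit_transform(Xtrain)
tfidfX = tfidf.fit_transform(vX)
clf.fit(tfidfX, ytrain)
# Now evaluate all steps on test set: transform only, never refit on test data
vX = vect.transform(Xtest)
tfidfX = tfidf.transform(vX)
predicted = clf.predict(tfidfX)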

With a pipeline, we need far fewer lines of code: we pass all the steps we want to use, in order, as a list to Pipeline.

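from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

# Xtrain, ytrain and Xtest are placeholders for your training texts, training labels and test texts
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])
# fit() fits every step on the training set, in order
pipeline.fit(Xtrain, ytrain)
# Now evaluate all steps on test set
predicted = pipeline.predict(Xtest)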

Now let’s use a pipeline in our model. Import make_pipeline from sklearn.pipeline and pass CountVectorizer and SVC to it as arguments.

from sklearn.pipeline import make_pipeline
text_model=make_pipeline(CountVectorizer(),SVC())

Train the Model with Training Sets

As before, the fit method trains the model. Here we train the new pipeline-based model on the raw training reviews, since the pipeline does the vectorizing itself.

text_model.fit(x_train,y_train)

Predict the Test Results

Similarly, predict the results using the predict method.

y_pred=text_model.predict(x_test)
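Calling fit on the pipeline runs fit_transform on the vectorizer and then fits the SVC on the resulting counts; predict runs transform and then the SVC's predict. The sketch below, on hypothetical data, checks that the pipeline gives the same predictions as the manual steps, provided both use the same vectorizer settings. Note that the earlier manual model used CountVectorizer(stop_words='english'), while the pipeline above uses CountVectorizer() with its defaults, so the two models in this article are not strictly identical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical training and test reviews
x_train = ["loved the pasta", "great staff", "cold soup", "rude waiter",
           "great pasta", "cold and rude"]
y_train = [1, 1, 0, 0, 1, 0]
x_test = ["great soup", "rude staff"]

# Manual route: vectorize, then fit the classifier on the counts
vect = CountVectorizer()
clf = SVC().fit(vect.fit_transform(x_train), y_train)
manual_pred = clf.predict(vect.transform(x_test))

# Pipeline route: the same two steps chained in one object
pipe = make_pipeline(CountVectorizer(), SVC()).fit(x_train, y_train)
pipe_pred = pipe.predict(x_test)

print(np.array_equal(manual_pred, pipe_pred))  # True
```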

The output is:

y_pred

Test Results | Restaurant Reviews Analysis

Source: Author

Evaluate the Model

Let’s evaluate our new model using accuracy_score.

accuracy_score(y_test,y_pred)

0.792

The accuracy of the model is 79.2%.

Save the Model

We can save the model with joblib. Import joblib and call its dump method with two arguments: the model and the name of the file to write.

import joblib
joblib.dump(text_model,'Project')

To use the model again, call joblib's load method with the same file name and assign the result to a variable.

import joblib
text_model=joblib.load('Project')
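The file name passed to joblib.load must match the one given to joblib.dump. Below is a minimal round trip that writes to a temporary directory; the file name text_model.joblib and the toy training data are illustrative assumptions, not the article's files.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Train a tiny pipeline on hypothetical reviews
text_model = make_pipeline(CountVectorizer(), SVC())
text_model.fit(["loved it", "great food", "awful service", "cold food"], [1, 1, 0, 0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text_model.joblib")  # hypothetical file name
    joblib.dump(text_model, path)                  # save the fitted pipeline
    restored = joblib.load(path)                   # load it back using the same name

# The restored model makes the same predictions as the original
new_reviews = ["great service", "cold and awful"]
print((restored.predict(new_reviews) == text_model.predict(new_reviews)).all())  # True
```

joblib files are pickle-based, so only load model files from sources you trust.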

Prediction of New Reviews using the Model

Now our model is well trained and ready for implementation. Let us try with some examples.

text_model.predict(['hello!!Love Your Food'])

array([1], dtype=int64)

This is a positive review, and as expected the model predicted 1 for it, meaning positive.

Let’s try with a negative review and see what it will predict.

text_model.predict(["omg!!it was too spice and i asked you don't add too much "])

array([0], dtype=int64)

As expected, it output 0, meaning negative.
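predict expects an iterable of review strings and returns one label per review, so several reviews can be scored in a single call; this is also why each single review above is wrapped in a list. Passing a bare string makes CountVectorizer raise a ValueError. A sketch with a toy stand-in for text_model (the training reviews are made up, so the labels it predicts are not meaningful):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy stand-in for the trained text_model (hypothetical training data)
text_model = make_pipeline(CountVectorizer(), SVC())
text_model.fit(["loved the food", "great place", "too spicy", "cold and bland"], [1, 1, 0, 0])

# Several reviews in one call: one 0/1 label per review
preds = text_model.predict(["great food", "bland and cold", "loved it"])
print(preds.shape)  # (3,)

# A single review must still be wrapped in a list; a bare string is rejected
try:
    text_model.predict("great food")
except ValueError as err:
    print("ValueError:", err)
```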

Conclusion

We have learned how to use a support vector classifier together with a count vectorizer, how to combine the two into a single model with a pipeline, and how to build a model that predicts whether a review is positive or negative. We tested it on some example reviews, saved the model with joblib, and then loaded it back and used it again.

I hope you found this article on restaurant reviews analysis useful. Share your views in the comments section. Read more articles on our blog.

Connect with me on LinkedIn: https://www.linkedin.com/in/amrutha-k-6335231a6vl/

Thank you!

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.

Amrutha

This is Amrutha. I am pursuing a B.Tech in Computer Science. I am interested in developing ML models with Python and in data analysis, and I also have an interest in web development. I hope my articles on Analytics Vidhya help you learn better. Thank you!!
