In this article, we will study the polynomial regression model and implement it in Python on sample data. I assume you are already familiar with simple and multiple linear regression; if not, please read our previous article on linear regression first, because polynomial regression is built from the same concepts as linear regression, with a few modifications to improve accuracy on non-linear data.
This article will teach you what polynomial regression is, show examples, and explain its uses in machine learning. We will walk through how polynomial regression works, emphasizing its mathematical basis and real-world application, and work through an example that shows how it can effectively model non-linear relationships between variables.
A simple linear regression algorithm only works when the relationship between the variables is linear. If the data is non-linear, linear regression cannot draw a good best-fit line, and simple regression analysis fails. Consider the diagram below, which shows a non-linear relationship: the linear regression fit does not perform well and does not come close to the true pattern. Polynomial regression is introduced to overcome this problem, as it can capture the curvilinear relationship between the independent and dependent variables.
Polynomial regression is a form of linear regression: when the relationship between the dependent and independent variables is non-linear, we add polynomial terms to the linear model, which turns it into polynomial regression.
The relationship between the dependent variable and the independent variable is modeled as an nth-degree polynomial function. When the polynomial is of degree 2, it is called a quadratic model; when the degree of a polynomial is 3, it is called a cubic model, and so on.
Suppose we have a dataset where variable X represents the independent data and Y the dependent data. Before feeding the data to the model, in the preprocessing stage we convert the input variables into polynomial terms of some chosen degree.
For example, if my input value is 35 and the degree of the polynomial is 2, I will compute 35^0, 35^1, and 35^2; these extra terms help the model capture the non-linear relationship in the data.
The equation of the polynomial model then becomes:
y = a0 + a1x + a2x^2 + … + anx^n
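To make this concrete, here is a minimal sketch (my own illustration, not part of the original walkthrough) that uses scikit-learn's PolynomialFeatures to expand the value 35 into its degree-2 terms:
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

x = np.array([[35]])                           # the single input value from the example above
poly = PolynomialFeatures(degree=2, include_bias=True)
print(poly.fit_transform(x))                   # [[1. 35. 1225.]] -> 35^0, 35^1, 35^2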
The degree of the polynomial is a hyperparameter, and we need to choose it wisely: a high degree tends to overfit the data, while too low a degree tends to underfit, so we need to find an optimum value. Polynomial regression models are usually fitted with the method of least squares; by the Gauss-Markov theorem, the least-squares estimator has the smallest variance among unbiased linear estimators.
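For reference, fitting by least squares simply means choosing the coefficients a0, a1, …, an so that the sum of squared residuals over the m training points is as small as possible:
minimize over a0, …, an:  sum over i = 1..m of ( y_i − (a0 + a1·x_i + a2·x_i^2 + … + an·x_i^n) )^2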
If you look at this equation carefully, you can see that we are estimating the relationship between y and the coefficients. The values of x and y are already given to us; we only need to determine the coefficients, and the equation is of degree one in those coefficients. Degree one means simple linear regression, which is why polynomial regression is also known as polynomial linear regression: the equation is a polynomial in x but still a linear model in its parameters. That is the whole idea behind it.
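If you want to verify this yourself, here is a small sketch (my own illustration, not from the original article): building the x and x^2 columns by hand and fitting an ordinary LinearRegression recovers the coefficients of a quadratic relationship, exactly as the PolynomialFeatures-based code later in the article does.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x_demo = rng.uniform(-3, 3, size=(50, 1))
y_demo = 0.8 * x_demo**2 + 0.9 * x_demo + 2 + rng.normal(size=(50, 1))

# manually create the polynomial columns [x, x^2]; the model stays linear in its coefficients
X_manual = np.hstack([x_demo, x_demo**2])
lr_demo = LinearRegression().fit(X_manual, y_demo)
print(lr_demo.coef_, lr_demo.intercept_)       # roughly [0.9, 0.8] and 2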
Now that we know how polynomial regression works and how it helps model non-linear data, let's compare both algorithms practically and see the results.
First, we will generate data from an equation of the form ax^2 + bx + c and fit simple linear regression to it. Then we will apply polynomial regression on the same data, which makes it easy to compare the practical performance of both algorithms.
Initially, we will use only one input column and one output column. Once that is clear, we will try it on higher-dimensional data.
Let's get our hands dirty with a practical implementation.
Step 1: Import all the libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
Step 2: Create and visualize the data
X = 6 * np.random.rand(200, 1) - 3
y = 0.8 * X**2 + 0.9*X + 2 + np.random.randn(200, 1)
#equation used -> y = 0.8x^2 + 0.9x + 2
#visualize the data
plt.plot(X, y, 'b.')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
We have added some random noise to the data so that the points do not lie exactly on the curve and the example better resembles real-world data.
Step 3: Split data in the train and test set
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
Step 4: Apply simple linear regression
Now we will analyze the predictions from fitting simple linear regression. We can see how poorly the model performs; it is not able to capture the pattern in the points.
lr = LinearRegression()
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)
print(r2_score(y_test, y_pred))
If you check the score, it will be around 15 to 20 percent, which is quite poor. If you plot the prediction line, it will look like the straight line we saw above, which cannot identify or estimate the best-fit curve.
plt.plot(x_train, lr.predict(x_train), color="r")
plt.plot(X, y, "b.")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
Step 5: Apply polynomial regression
Now we will convert the inputs into polynomial terms using degree 2, because the equation we used to generate the data is a second-degree polynomial. When dealing with real-world problems, we choose the degree by trial and error.
# applying polynomial regression with degree 2 (include_bias=True adds the constant term)
poly = PolynomialFeatures(degree=2, include_bias=True)
x_train_trans = poly.fit_transform(x_train)
x_test_trans = poly.transform(x_test)
lr = LinearRegression()
lr.fit(x_train_trans, y_train)
y_pred = lr.predict(x_test_trans)
print(r2_score(y_test, y_pred))
After converting the inputs to polynomial terms, we fit a linear regression on them, which now effectively performs polynomial regression. If you print an x_train value alongside its transformed value, you will see the three polynomial terms. The model now performs decently well; look at the coefficients and the intercept. The true coefficient of x was 0.9 and the model estimated about 0.88, and the true intercept was 2 while the model gave about 1.9, which is very close to the original values, so the model can be considered well generalized.
print(lr.coef_)
print(lr.intercept_)
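To see the three polynomial terms mentioned above, you can print one raw training value next to its transformed version (a quick check using the variables already defined in this step):
# one raw value versus its degree-2 expansion (1, x, x^2)
print(x_train[0])
print(x_train_trans[0])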
If we visualize the predicted line across the training data points, we can see how well it identifies the non-linear relationship in data.
X_new = np.linspace(-3, 3, 200).reshape(200, 1)
X_new_poly = poly.transform(X_new)
y_new = lr.predict(X_new_poly)
plt.plot(X_new, y_new, "r-", linewidth=2, label="Predictions")
plt.plot(x_train, y_train, "b.",label='Training points')
plt.plot(x_test, y_test, "g.",label='Testing points')
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()
Now we will design a function that helps you find the right value for the degree. Here we apply all the preprocessing steps from above inside a single function and plot the resulting prediction line. All you need to do is pass the degree, and it will build a model and plot the graph for that degree. We will create a pipeline of the preprocessing steps to keep the process streamlined.
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def polynomial_regression(degree):
    X_new = np.linspace(-3, 3, 100).reshape(100, 1)

    # pipeline: polynomial expansion -> scaling -> linear regression
    polybig_features = PolynomialFeatures(degree=degree, include_bias=False)
    std_scaler = StandardScaler()
    lin_reg = LinearRegression()
    poly_pipeline = Pipeline([
        ("poly_features", polybig_features),
        ("std_scaler", std_scaler),
        ("lin_reg", lin_reg),
    ])
    poly_pipeline.fit(X, y)
    y_newbig = poly_pipeline.predict(X_new)

    # plot the prediction line against the training and testing points
    plt.plot(X_new, y_newbig, 'r', label="Degree " + str(degree), linewidth=2)
    plt.plot(x_train, y_train, "b.", linewidth=3)
    plt.plot(x_test, y_test, "g.", linewidth=3)
    plt.legend(loc="upper left")
    plt.xlabel("X")
    plt.ylabel("y")
    plt.axis([-3, 3, 0, 10])
    plt.show()
When we run the function with high degrees like 10, 15, or 20, the model starts to overfit the data: the prediction line gradually loses its overall shape and starts chasing individual training points, bending wherever the training data wiggles.
polynomial_regression(25)
This is the problem with a high polynomial degree, which I wanted to show you practically, so it is necessary to choose an optimum value for the degree. I recommend trying different degrees and analyzing the results, for example with the quick sweep shown below.
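As a simple numeric way to compare degrees (my own addition, not part of the original walkthrough), you can loop over a few candidates and compare test-set R2 scores using the train/test split from the earlier steps; the score typically peaks near the true degree and then degrades as the model overfits:
# quick degree sweep on the existing train/test split
for degree in [1, 2, 3, 5, 10, 20]:
    poly_d = PolynomialFeatures(degree=degree, include_bias=True)
    X_tr = poly_d.fit_transform(x_train)
    X_te = poly_d.transform(x_test)
    model = LinearRegression().fit(X_tr, y_train)
    print(f"degree={degree:2d}  test R2={r2_score(y_test, model.predict(X_te)):.3f}")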
We have seen polynomial regression with one variable. Most of the time there will be multiple columns in the input data, so how do we apply polynomial regression and visualize the result in 3-dimensional space? This can feel like a daunting task for beginners, so let's break it down and see how to perform polynomial regression in 3-D space.
Step 1: Creating a dataset
I am taking two input columns and one output column; the approach with more columns is the same.
# 3D polynomial regression
x = 7 * np.random.rand(100, 1) - 2.8
y = 7 * np.random.rand(100, 1) - 2.8
z = x**2 + y**2 + 0.2*x + 0.2*y + 0.1*x*y +2 + np.random.randn(100, 1)
Let's visualize the data in 3-D space using a 3-D scatter plot (Plotly library).
import plotly.express as px
fig = px.scatter_3d(x=x.ravel(), y=y.ravel(), z=z.ravel())
fig.show()
Step 2: Applying linear regression
First, let's estimate the results with simple linear regression, for a better understanding and an easier comparison.
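The fitting code for this step is not shown in the original article; a minimal sketch that produces the x_input, y_input, and z_final arrays used in the plot below might look like this (the grid range and size are my own assumptions):
# fit a plain linear model on the two input columns
X_lin = np.hstack((x, y))
lr = LinearRegression()
lr.fit(X_lin, z)

# build a 10x10 grid over the input range and predict the fitted plane on it
x_input, y_input = np.meshgrid(np.linspace(-3, 4, 10), np.linspace(-3, 4, 10))
final = np.hstack((x_input.reshape(-1, 1), y_input.reshape(-1, 1)))
z_final = lr.predict(final).reshape(10, 10)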
Now let's visualize the linear regression prediction in 3-D space.
import plotly.graph_objects as go
fig = px.scatter_3d(x=x.ravel(), y=y.ravel(), z=z.ravel())
fig.add_trace(go.Surface(x=x_input, y=y_input, z=z_final))
fig.show()
Step 3: Estimating results using polynomial regression
Now we will transform the inputs into polynomial terms and inspect the generated powers.
X_multi = np.hstack((x, y))           # stack the two input columns side by side
poly = PolynomialFeatures(degree=30)  # note: such a high degree will overfit, as discussed below
X_multi_trans = poly.fit_transform(X_multi)
print("Input", poly.n_features_in_)
print("Output", poly.n_output_features_)
print("Powers\n", poly.powers_)
After running the above code, you will see the powers of both x and y: the first row corresponds to x^0 · y^0, the next to x^1 · y^0, and so on.
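If the degree-30 output is hard to read, a quick illustration (my own, not from the original) of the same powers_ attribute with degree 2 on two inputs makes the pattern obvious:
small = PolynomialFeatures(degree=2)
small.fit(np.zeros((1, 2)))            # two dummy input columns, just to set the feature count
print(small.powers_)
# [[0 0]
#  [1 0]
#  [0 1]
#  [2 0]
#  [1 1]
#  [0 2]]
Now let's apply linear regression to these polynomial terms.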
lr = LinearRegression()
lr.fit(X_multi_trans, z)

# transform the 10x10 grid points from the linear-regression step and predict the surface on them
X_test_multi = poly.transform(final)
z_final = lr.predict(X_test_multi).reshape(10, 10)
Now when we visualize the results of polynomial regression, we can see how well the fitted surface follows the data.
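The plotting code for this surface is not included in the original; it can be reproduced with the same Plotly pattern used for the linear fit above (a sketch reusing the existing grid):
fig = px.scatter_3d(x=x.ravel(), y=y.ravel(), z=z.ravel())
fig.add_trace(go.Surface(x=x_input, y=y_input, z=z_final))
fig.show()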
The plot looks good, but in some places the surface goes up and down, which means it is overfitting the data there. So it takes some time to find a well-generalized model, and you have to rely on trial and error.
I hope you now understand the intuition and practical implementation behind the algorithm.
This tutorial showed that polynomial regression is a form of linear regression, specifically a special case of multiple linear regression, that estimates the relationship as an nth-degree polynomial. Polynomial regression is sensitive to outliers, so the presence of even one or two outliers can badly affect its performance.
I hope you liked the article! Polynomial regression is a powerful technique in machine learning that models relationships using polynomial equations. By fitting a curve instead of a straight line, it captures non-linear patterns effectively, which improves predictive accuracy when linear models fall short and makes it useful in a wide range of applications.
Frequently Asked Questions
Q1. What is the difference between linear regression and polynomial regression?
A. Linear regression models a relationship with a straight line, while polynomial regression uses a curve by including higher-degree terms.
Q2. When should you use polynomial regression?
A. Use polynomial regression when data shows a nonlinear relationship that a straight line cannot accurately model.
Q3. What is a real-life example of polynomial regression?
A. A real-life example of polynomial regression is predicting the trajectory of a rocket, where the relationship between time and position is nonlinear.
Q4. How does polynomial regression differ from logistic regression?
A. Logistic regression predicts categorical outcomes, typically binary, while polynomial regression models continuous data with a polynomial equation.