Aaron Krueger 1 February, 2021
During our work on the Flowability project, we came across a scenario where we wanted to perform polynomial regression on a set of data points. Polynomial regression is useful when the relationship between the variables is curved, so a straight line cannot capture it. In short, linear regression would not fit the data properly, so we turned to polynomial regression instead.
To perform polynomial regression, we first import a few libraries:
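A minimal set of imports for this walkthrough might look like the sketch below. It assumes NumPy for the data, Matplotlib for plotting, and scikit-learn for the model; only LinearRegression and PolynomialFeatures are named in this post, so the rest is our guess at a typical setup.

```python
# Assumed setup: NumPy + Matplotlib + scikit-learn.
import numpy as np                      # arrays and dummy data
import matplotlib.pyplot as plt         # plotting the points and the fit

from sklearn.linear_model import LinearRegression      # the (linear) model
from sklearn.preprocessing import PolynomialFeatures   # builds x, x^2, ... terms
```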
Note that we import LinearRegression here. The model itself is still linear in its coefficients; the curve comes from the polynomial terms of x that we feed into it, which is how a linear model can fit a non-linear curve!
First, let’s create some dummy test data:
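As a sketch, the dummy data could be generated as follows. The cubic shape, the noise level, the sample count, and the fixed seed are all assumptions made for illustration, not the values used in the project.

```python
import numpy as np

# Hypothetical dummy data: a noisy cubic curve (all constants are made up).
rng = np.random.default_rng(0)                 # fixed seed so runs are repeatable
x = np.linspace(-3, 3, 30).reshape(-1, 1)      # scikit-learn expects 2-D input: (n_samples, 1)
noise = rng.normal(0, 3, size=x.shape)         # Gaussian noise with standard deviation 3
y = 0.5 * x**3 - 2 * x**2 + x + noise          # target values, same shape as x
```

The reshape matters: scikit-learn estimators expect the features as a column, one row per sample.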
Next, we can use the PolynomialFeatures class to expand our original feature (designated by x) into its higher-order terms, fitting and transforming it in a single step. In this case, we are using a second-degree polynomial (as seen by “degree=2”), so each value x becomes the three terms 1, x, and x².
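To make the transformation concrete, here is a small sketch on three hand-picked input values (the values are ours, chosen only so the output is easy to read). The name x_poly matches the one used later in this post.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[1.0], [2.0], [3.0]])        # tiny illustrative input, one feature

poly = PolynomialFeatures(degree=2)        # include_bias=True by default
x_poly = poly.fit_transform(x)             # columns: 1, x, x^2

print(x_poly)
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
```

Each row now holds the constant term, the original value, and its square, which is exactly what the linear model will be fitted on.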
Next, we can create our polynomial regression model by fitting LinearRegression on the transformed features, and use it to predict our y values. If we were performing ordinary linear regression, we would use the x data as is (i.e., replacing “x_poly” with “x”).
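The fitting step could be sketched like this. To keep the example checkable, it uses a noise-free quadratic that we made up, so a degree-2 fit should reproduce it almost exactly; the names model and y_pred are our own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data: y = 2x^2 - x + 1
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2 * x**2 - x + 1

# Expand x into [1, x, x^2], then fit an ordinary linear model on those columns.
x_poly = PolynomialFeatures(degree=2).fit_transform(x)
model = LinearRegression()
model.fit(x_poly, y)

# Predict on the same transformed inputs.
y_pred = model.predict(x_poly)
print(np.allclose(y_pred, y))  # the quadratic is recovered up to rounding
```

For a plain linear fit, the same two calls would take x in place of x_poly.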
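Now that we’ve fit the model, we can plot our x and y data as a scatter plot. After this, we’ll sort the points by their x values, so that the fitted curve is drawn from left to right rather than zigzagging between points, and plot the model’s predictions.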
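One way to do the sorting and plotting is sketched below. The data is again made up (and deliberately unsorted so the sort matters), and the colors and labels are arbitrary choices on our part.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical unsorted data: a noisy cubic.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(30, 1))
y = 0.5 * x**3 - 2 * x**2 + x + rng.normal(0, 3, size=(30, 1))

x_poly = PolynomialFeatures(degree=2).fit_transform(x)
y_pred = LinearRegression().fit(x_poly, y).predict(x_poly)

# Sort by x so the line is drawn left to right.
order = np.argsort(x[:, 0])

fig, ax = plt.subplots()
ax.scatter(x[:, 0], y[:, 0], s=10, label="data")
ax.plot(x[order, 0], y_pred[order, 0], color="m", label="degree=2 fit")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

Calling plt.show() afterwards displays the figure in an interactive session.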
Here are our results compared to a standard linear regression model and that of a model with a cubic fit (i.e. with “degree=3” as the PolynomialFeatures parameter):
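A comparison along these lines can be reproduced roughly as follows. This sketch uses made-up data and scikit-learn's make_pipeline helper for brevity, and it reports R² on the training points as the measure of fit (our choice, not necessarily the one behind the results in this post), so the numbers it prints will differ from ours.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: a noisy cubic.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(30, 1))
y = (0.5 * x**3 - 2 * x**2 + x + rng.normal(0, 3, size=(30, 1))).ravel()

scores = {}
for degree in (1, 2, 3):
    # degree=1 is plain linear regression; 2 and 3 add x^2 and x^3 terms.
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(x, y)
    scores[degree] = model.score(x, y)   # R^2 on the training data
    print(f"degree={degree}: R^2 = {scores[degree]:.3f}")
```

Because each higher degree contains the lower-degree model as a special case, the training R² cannot go down as the degree goes up.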
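You can see that the curve with “degree=2” follows the data much more closely than the linear fit, and the cubic fit follows it more closely than both! Keep in mind that raising the degree always fits the training points at least as well, so a closer fit on its own does not guarantee better predictions on new data. I hope you’ve enjoyed learning about polynomial regression and its visualization, and are able to apply it in the future.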