1. Concepts & Definitions
1.1. Linear regression: Concepts and equations
1.2. Linear regression: Numerical example
1.3. Correlation is not causation
1.4. Dummy and categorical variables
1.5. Multiple linear regression
1.6. Dummy multiple linear regression
2. Problem & Solution
2.1. Predicting Exportation & Importation Volume
This case study presents a simplified methodology for the problem and data described in the article "Modelling the Philippines’ Revenue Collection Performance: A Study Using Multiple Linear Regression Analysis", available at:
https://ijrpr.com/uploads/V3ISSUE11/IJRPR8001.pdf
To build the linear regression model, the dataset provides five independent variables: inflation rate, forex rate, import value, export value, and stock price index. These variables can be used to predict the Philippines’ revenue.
Let's employ these trade and economic indicators from the online data to build a multiple linear regression model for predicting the Philippines’ revenue. But first, it is necessary to read and clean the data using the following Python commands. The next code reads the Excel table directly from its shared Google Drive link.
import pandas as pd
# Original shared link: https://docs.google.com/spreadsheets/d/1m9bOb2Kw-kq6j_6Ji75h6cJK5-THhSL1/edit?usp=sharing&ouid=106640872116257813737&rtpof=true&sd=true
url = "https://drive.google.com/file/d/1m9bOb2Kw-kq6j_6Ji75h6cJK5-THhSL1/view?usp=sharing"
# Build a direct-download URL from the file id embedded in the shared link
url2 = 'https://drive.google.com/uc?id=' + url.split('/')[-2]
df = pd.read_excel(url2)
df
Now, let's check data validity by inspecting the type of each column.
print(df.dtypes)
Date object
Revenue Collections (LNY) float64
Inflation Rate(x1) float64
Forex Rate Average (x2) float64
Import (FOB Value in Million USD) (x3) float64
Export (FOB Value in Million USD) (x4) float64
Stock Price Index (x5) float64
dtype: object
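The Date column is read as a generic object. It is dropped before modelling, but if it were needed (for instance as a time index), it could be parsed explicitly; this optional step assumes pandas recognizes the date format used in the spreadsheet.
# Optional: convert the Date column to proper datetimes (the column is dropped later anyway)
df['Date'] = pd.to_datetime(df['Date'])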
It is also important to check data validity by verifying whether there are any null values.
df.isnull().values.any()
False
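Had any null values been found, they would have to be handled before modelling. A minimal sketch, not needed for this dataset, would be to drop or fill the affected rows.
# Only needed if the check above had returned True:
# df = df.dropna()                            # drop rows with missing values
# df = df.fillna(df.mean(numeric_only=True))  # or fill them with column means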
Since there are no null values, let's filter the data to keep only the numerical columns.
# Remove the non-numeric column (Date)
df2 = df.drop(df.columns[[0]], axis=1, inplace=False)
# Remove duplicate rows, keeping the result
df2 = df2.drop_duplicates()
df2
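Before moving on, a quick statistical summary of the numeric columns is a useful sanity check; this optional step uses the standard pandas describe method.
# Summary statistics (count, mean, std, min, quartiles, max) of each column
df2.describe()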
Now that the dataset has been cleaned, it is time to extract statistics to better understand how to model the multiple linear regression. The first step is to compute the Pearson correlation between the variables.
pearsoncorr = df2.corr(method='pearson')
pearsoncorr
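To make the comparison easier, the correlation of each predictor with the revenue column can also be listed in decreasing order; the snippet below simply reorders values already contained in pearsoncorr.
# Correlation of each predictor with the revenue column, strongest first
pearsoncorr['Revenue Collections (LNY)'].drop('Revenue Collections (LNY)').sort_values(ascending=False)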
The Pearson correlation matrix can be visualized as a heatmap.
# Import the required library
import seaborn as sns
# Display the correlation matrix as a heatmap
# with the diverging colourmap RdYlGn
sns.heatmap(pearsoncorr, cmap='RdYlGn', linewidths=0.30, annot=True);
A scatter-plot matrix is another way to visualize correlation patterns between the independent and dependent variables, as done in the next commands.
import seaborn as sns
X = df2[list(df2.columns)]
sns.pairplot(X);
First, select the column that will be the output of the multiple linear regression model.
# Y output column, revenue is the output
out_col = 'Revenue Collections (LNY)' # 'Revenue Collections (y)'
y_revenue = df2[out_col]
y_revenue
0 11.36
1 11.14
2 11.23
3 11.67
4 11.46
...
68 11.86
69 11.87
70 12.10
71 11.94
72 12.08
Name: Revenue Collections (LNY), Length: 73, dtype: float64
Then define the columns that will be the independent variables of the model.
# X 5 predictor variables
x = df2[['Inflation Rate(x1)',
'Forex Rate Average (x2)',
'Import (FOB Value in Million USD) (x3)',
'Export (FOB Value in Million USD) (x4)',
'Stock Price Index (x5)']]
x
Another possibility is to select only the columns that showed higher Pearson correlation values or a clear linear pattern in the scatter-plot matrix, i.e., variables x3 and x5, as done in the code below.
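The reduced design matrix x_selected keeps only these two columns; it is the variable used when fitting the second model further down.
# X with only the two predictors selected from the correlation analysis
x_selected = df2[['Import (FOB Value in Million USD) (x3)',
                  'Stock Price Index (x5)']]
x_selected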
Now it is possible to build multiple linear regression models with all variables and with only the selected variables, and to print a detailed report for each.
# Obtain multiple linear regression using statsmodels
import statsmodels.api as sm

# Fit a multiple linear regression and return its results.
def get_model(x, y):
    xm = sm.add_constant(x)  # adding a constant (intercept) term
    model = sm.OLS(y, xm).fit()
    return model, xm

# Print a report about the model.
def print_model(model, name='None'):
    summary = model.summary()
    print('--------------------------------------')
    print(name, 'model')
    print(summary)
    print('--------------------------------------')

# Multiple linear regression using statsmodels
model_revenue, xm_import = get_model(x, y_revenue)
print_model(model_revenue, 'Revenue - All five')
model_revenue_selected, xm_import_selected = get_model(x_selected, y_revenue)
print_model(model_revenue_selected, 'Revenue - x3 and x5')
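Beyond the printed summaries, a few goodness-of-fit statistics can be read directly from the fitted results to compare the two models; this is an optional check using standard statsmodels attributes (rsquared, rsquared_adj, aic).
# Compare goodness of fit of the full and the reduced model
for name, m in [('All five', model_revenue), ('x3 and x5', model_revenue_selected)]:
    print(f"{name}: R2 = {m.rsquared:.3f}, adj. R2 = {m.rsquared_adj:.3f}, AIC = {m.aic:.1f}")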
Finally, it is possible to use both models to predict the dependent variable and plot the results to verify the quality of each model's prediction.
import matplotlib.pyplot as plt

# Predict the output values for a given design matrix.
def model_predict(model, xm):
    y_hat = model.predict(xm)
    return y_hat

# Plot the observed data against the model prediction.
def draw_model(model, y, yname, y_hat):
    xp = list(range(1, len(y) + 1))
    plt.plot(xp, y, 'ob', label='Data')
    plt.plot(xp, y, '-r')
    plt.plot(xp, y_hat, '--g', label='Prediction')
    plt.title('Prediction on ' + str(yname))
    plt.xlabel('Time')
    plt.ylabel(yname)
    plt.legend()
    plt.grid()
    plt.show()
# Predict the values used to build the model with all five predictors.
y_hat = model_predict(model_revenue, xm_import)
#print(y_hat)
# Plot the observed data against the prediction.
draw_model(model_revenue, y_revenue, 'Revenue', y_hat)

# Predict the values used to build the model with only x3 and x5.
y_hat_selected = model_predict(model_revenue_selected, xm_import_selected)
#print(y_hat_selected)
# Plot the observed data against the prediction.
draw_model(model_revenue_selected, y_revenue, 'Selected Revenue', y_hat_selected)
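A numerical error measure complements the plots. As a simple sketch, the root-mean-square error of both predictions can be computed with NumPy; the rmse helper below is introduced here only for illustration.
import numpy as np
# Root-mean-square error between observed and predicted values
def rmse(y, y_hat):
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))
print('RMSE (all five):  ', rmse(y_revenue, y_hat))
print('RMSE (x3 and x5): ', rmse(y_revenue, y_hat_selected))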
The Python code with all the steps is available in this Google Colab notebook:
https://colab.research.google.com/drive/1zRkBkIjKUxieS2hR-bXSmv-9Wu_zL9eR?usp=sharing