Advertising.csv consists of 200 rows and 5 columns: Unnamed: 0, TV, Radio, Newspaper and Sales. The Unnamed: 0 column is only a row index and needs to be removed.
The dataset shows the sales achieved with different combinations of TV, Radio and Newspaper advertising spend.
# Load libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Import the dataset (pandas is already imported above)
dataexe = pd.read_csv('Advertising.csv')
# Check column types and non-null counts
dataexe.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  200 non-null    int64  
 1   TV          200 non-null    float64
 2   Radio       200 non-null    float64
 3   Newspaper   200 non-null    float64
 4   Sales       200 non-null    float64
dtypes: float64(4), int64(1)
memory usage: 7.9 KB
# Remove the unnecessary 'Unnamed: 0' index column
dataexe.drop(dataexe.columns[[0]], axis=1, inplace=True)
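Alternatively, the index column can be consumed at load time instead of being dropped afterwards. A minimal sketch, assuming the first column of Advertising.csv has an empty header (which is what makes pandas label it `Unnamed: 0`) and holds row numbers starting at 1; the two data rows are copied from the printed output later in this notebook, and the leading `1`, `2` values are an assumption about the file layout.

```python
import io

import pandas as pd

# Assumed file layout: an unnamed first column holding a row number.
# The two data rows are copied from the printed output further down.
sample_csv = io.StringIO(
    ",TV,Radio,Newspaper,Sales\n"
    "1,230.1,37.8,69.2,22.1\n"
    "2,44.5,39.3,45.1,10.4\n"
)

# index_col=0 turns the unnamed first column into the row index,
# so no "Unnamed: 0" column appears and nothing needs to be dropped.
df = pd.read_csv(sample_csv, index_col=0)
print(df.columns.tolist())  # ['TV', 'Radio', 'Newspaper', 'Sales']
```

With the real file this would be `pd.read_csv('Advertising.csv', index_col=0)`.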
# Pairplot to find the relationships between features
sns.pairplot(dataexe, kind="reg");
The pairplot shows that TV and Radio both have a positive linear relationship with Sales: as TV or Radio spending increases, Sales increase. Newspaper shows the weakest relationship. TV appears to be the most effective channel for boosting sales, so the focus should be on TV and Radio.
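The strength of each linear relationship can be quantified with the Pearson correlation coefficient. A minimal sketch: on the full data this is simply `dataexe.corr()['Sales']` (run before the Total row is added); here it is shown on the nine rows printed later in this notebook, so the numbers are illustrative only and will differ from the full 200-row dataset.

```python
import pandas as pd

# Nine rows copied from the printed output of Advertising.csv (rows 0-4 and 196-199).
# This is only a small illustrative subset, not the full dataset.
sample = pd.DataFrame(
    {
        "TV": [230.1, 44.5, 17.2, 151.5, 180.8, 94.2, 177.0, 283.6, 232.1],
        "Radio": [37.8, 39.3, 45.9, 41.3, 10.8, 4.9, 9.3, 42.0, 8.6],
        "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4, 8.1, 6.4, 66.2, 8.7],
        "Sales": [22.1, 10.4, 9.3, 18.5, 12.9, 9.7, 12.8, 25.5, 13.4],
    }
)

# Pearson correlation of each advertising channel with Sales,
# sorted from the highest to the lowest coefficient.
corr_with_sales = sample.corr()["Sales"].drop("Sales").sort_values(ascending=False)
print(corr_with_sales)
```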
# Regression plot of Sales against TV
sns.regplot(data = dataexe, x = 'TV', y = 'Sales');
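The line drawn by `regplot` is an ordinary least-squares fit of Sales on TV. Its slope and intercept can also be computed directly. A minimal sketch using `np.polyfit` with degree 1; the helper name `fit_line` is hypothetical, and it is checked here on toy data lying exactly on y = 2x + 1 (on the real data, pass `dataexe['TV']` and `dataexe['Sales']`).

```python
import numpy as np


def fit_line(x, y):
    """Hypothetical helper: (slope, intercept) of the least-squares line y ~ slope * x + intercept."""
    # np.polyfit returns coefficients from the highest degree down.
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


# Toy data on the exact line y = 2x + 1, so the fit recovers slope 2 and intercept 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
slope, intercept = fit_line(x, y)
print(round(float(slope), 6), round(float(intercept), 6))  # 2.0 1.0
```

The slope is the estimated change in Sales for each additional unit of TV spend.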
#Scatterplot
sns.scatterplot(data = dataexe, x = 'Radio', y = 'Sales');
# Boxplot to check TV for outliers
sns.boxplot(x = dataexe['TV']);
# Boxplot to check Radio for outliers
sns.boxplot(x = dataexe['Radio']);
A boxplot is a standardized way of displaying the distribution of data based on a five-number summary (“minimum”, first quartile [Q1], median, third quartile [Q3] and “maximum”). It also shows potential outliers and their values: by default, points lying more than 1.5 × IQR (the interquartile range, Q3 - Q1) below Q1 or above Q3 are drawn individually.
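The whisker rule can be reproduced numerically to list the outlying values. A minimal sketch, assuming the common 1.5 × IQR rule and pandas' default linear quantile interpolation; the plotting library may compute quartiles slightly differently on small samples. The helper name `iqr_outliers` is hypothetical, and the toy list has one obvious outlier (on the real data, pass `dataexe['TV']` or `dataexe['Radio']`).

```python
import pandas as pd


def iqr_outliers(values, whis=1.5):
    """Hypothetical helper: values outside [Q1 - whis*IQR, Q3 + whis*IQR], plus the bounds."""
    s = pd.Series(values, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # linear interpolation by default
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return s[(s < lower) | (s > upper)], (lower, upper)


# Toy data: Q1 = 3.25, Q3 = 7.75, IQR = 4.5, so the bounds are -3.5 and 14.5.
outliers, bounds = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(outliers.tolist())  # [100.0]
print(float(bounds[0]), float(bounds[1]))  # -3.5 14.5
```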
Add a row containing the sum of each column.
This shows which channel received the most investment.
dataexe.loc['Total'] = dataexe.sum(numeric_only=True, axis=0)
print(dataexe)
OUTPUT
            TV   Radio  Newspaper   Sales
0        230.1    37.8       69.2    22.1
1         44.5    39.3       45.1    10.4
2         17.2    45.9       69.3     9.3
3        151.5    41.3       58.5    18.5
4        180.8    10.8       58.4    12.9
...        ...     ...        ...     ...
196       94.2     4.9        8.1     9.7
197      177.0     9.3        6.4    12.8
198      283.6    42.0       66.2    25.5
199      232.1     8.6        8.7    13.4
Total  29408.5  4652.8     6110.8  2804.5
# Keep only the Total row (select it by label instead of dropping the first 200 rows)
data2 = dataexe.loc[['Total']]
data2
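# Pie chart of the total spend per channel (values from the Total row above)
y = np.array([29408.5, 4652.8, 6110.8])
mylabels = ["TV", "Radio", "Newspaper"]
# autopct labels each slice with its percentage of the total
plt.pie(y, labels = mylabels, autopct = '%1.1f%%')
plt.show()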
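The pie chart shows the proportions visually. The exact shares follow from dividing each channel's total by the combined total. A minimal sketch using the totals printed above; Sales is left out because it is the outcome rather than an advertising spend.

```python
import pandas as pd

# Column totals taken from the "Total" row printed above.
totals = pd.Series({"TV": 29408.5, "Radio": 4652.8, "Newspaper": 6110.8})

# Share of the total advertising spend per channel, in percent, rounded to 1 decimal.
shares = (totals / totals.sum() * 100).round(1)
print(shares.tolist())  # [73.2, 11.6, 15.2]
```

TV accounts for roughly three quarters of the total spend, which matches the largest slice in the pie chart.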