Data Visualization

Machine Learning Visualization

Machine learning visualization uses visual representations to make complex models and data patterns easier to understand. It supports exploring data, identifying patterns and trends, and making informed decisions throughout the machine learning process. Charts, graphs, and maps can reveal structure that is hard to see in raw data, which can lead to better modeling decisions.

Barplot

A bar plot is a common type of chart used to visualize categorical data. It displays the relationship between a categorical variable and a numerical variable. Each category is represented by a rectangular bar, and the length or height of the bar corresponds to the numerical value associated with that category.

sns.barplot(x='day', y='total_bill', data=tips, palette='tab10');
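
By default, seaborn's barplot draws each bar at the mean of the numerical variable within a category. The aggregation behind the bar heights can be sketched with pandas alone. The small table below is a made-up stand-in for the tips data, so its values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the tips data: a few bills per day
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 12.0, 18.0, 30.0, 40.0],
})

# Mean bill per category: the bar heights a default bar plot would show
bar_heights = toy.groupby("day")["total_bill"].mean()
print(bar_heights)
```

Passing a different estimator to barplot (for example the median) changes this aggregation, not the way the bars are drawn.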

Boxplot

A box plot is a statistical chart that summarizes a data distribution. It is built around the five-number summary: minimum, first quartile, median, third quartile, and maximum. The box spans the interquartile range (IQR) with a line at the median, and the whiskers extend to the most extreme values that are not classed as outliers (by default in seaborn, points within 1.5 × IQR of the box). Outliers are plotted as individual points.

sns.boxplot(x='day', y='total_bill', hue='sex', data=tips, linewidth=2.5, palette='Dark2');
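
The quantities a box plot draws can be computed directly. The following is a minimal sketch with NumPy on a made-up sample, assuming the common 1.5 × IQR rule for outliers; the variable names are illustrative.

```python
import numpy as np

# Illustrative sample with one obvious outlier (made-up values)
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inliers = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers end at the most extreme inliers, not at the fences themselves
whisker_low, whisker_high = inliers.min(), inliers.max()
print(q1, median, q3, whisker_low, whisker_high, outliers)
```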

Kdeplot

A KDE plot, or Kernel Density Estimation plot, is a smooth curve that estimates the probability density function of a continuous variable. It provides a visual representation of the data's distribution, highlighting its shape, central tendency, and spread. By examining the peaks, valleys, and overall shape of the KDE curve, insights can be gained into the data's characteristics.

sns.kdeplot(data=df, x='Age', hue='Sex', multiple='stack', palette='tab10');
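
The idea behind the curve is to place one small Gaussian bump on every observation and average them. A hand-rolled sketch with NumPy is shown below. The ages and the bandwidth are assumptions chosen for illustration, whereas seaborn picks a bandwidth automatically.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Made-up ages; the bandwidth is a hand-picked assumption
ages = np.array([22.0, 25.0, 27.0, 31.0, 40.0, 41.0, 45.0])
grid = np.linspace(0, 80, 801)
density = gaussian_kde_1d(ages, grid, bandwidth=3.0)

# A density should integrate to roughly 1 over a wide enough grid
step = grid[1] - grid[0]
area = density.sum() * step
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points; a larger one gives a smoother curve that can hide separate peaks.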

Violinplot

A violin plot is a statistical chart that combines a box plot with a kernel density plot. It provides a richer visualization of data distribution, showing both the overall shape and density of the data. The wider parts of the violin represent regions with higher data density, while narrower parts indicate lower density. Violin plots are useful for comparing distributions, especially when there are multiple groups or categories.

sns.violinplot(x="day", y="total_bill", data=tips);

Stripplot

A stripplot is a simple visualization that shows the distribution of data points along a single axis. It's useful for visualizing the spread and density of data, especially when combined with other plots like box plots or violin plots. Stripplots can be customized with jitter to reduce overlap between points and enhance readability.

sns.stripplot(x="time", y="total_bill", hue="sex", data=tips);
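
Jitter can be pictured as a small random offset along the categorical axis. The sketch below shows the idea with NumPy; the values and the jitter width are made up, and this is not seaborn's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up values for one category drawn at x position 0
y = np.array([10.0, 10.0, 10.5, 11.0, 11.0, 11.0])
x_center = 0.0
jitter_width = 0.1  # hypothetical half-width of the jitter band

# Each point gets a small random horizontal offset; y is left untouched
x = x_center + rng.uniform(-jitter_width, jitter_width, size=y.size)
print(np.column_stack([x, y]))
```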

Scatterplot

A scatter plot is a graph used to visualize the relationship between two numerical variables. Each data point is represented by a dot on the plot, with its position determined by its values on the x and y axes. Scatter plots are useful for identifying trends, correlations, and outliers in data. They can also be used to fit regression lines to model the relationship between the variables.

sns.scatterplot(x='total_bill', y='tip', hue='sex', data=tips);

Swarmplot

A swarmplot is a visualization that displays individual data points along an axis. It's similar to a stripplot, but it adjusts the positions of points to minimize overlap, making it easier to visualize the distribution of data, especially when there are many data points. Swarmplots are often used in conjunction with box plots or violin plots to provide a more detailed view of the data.

sns.swarmplot(x="day", y="total_bill", hue="sex", data=tips);

Boxenplot

A boxenplot is an enhanced box plot that provides a more detailed view of data distribution. It displays multiple quantiles, revealing more information about the shape of the distribution, especially in the tails. Boxenplots are particularly useful for large datasets where traditional box plots might not be sufficient.

sns.boxenplot(x='time', y='total_bill', hue='sex', data=tips);
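
The extra boxes correspond to quantiles that move progressively further into the tails (often called letter values). The sketch below computes a few such levels with NumPy on synthetic data. The number of levels is chosen arbitrarily here, while seaborn decides it from the sample size.

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=20.0, scale=5.0, size=10_000)  # synthetic data

# Letter values: quantiles at tail areas 1/4, 1/8, 1/16, ...
depths = 4  # how many levels to compute; chosen arbitrarily here
letter_values = []
for k in range(1, depths + 1):
    tail = 0.5 ** (k + 1)  # 1/4, 1/8, 1/16, 1/32
    lower, upper = np.quantile(sample, [tail, 1 - tail])
    letter_values.append((tail, lower, upper))

for tail, lower, upper in letter_values:
    print(f"tail {tail:.4f}: [{lower:.2f}, {upper:.2f}]")
```

Each level halves the tail area of the previous one, so every successive box reaches further out and describes a smaller share of the data.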

Lineplot

A line plot is a visualization used to show trends over time or across categories. It connects data points with lines, revealing patterns, increases, decreases, and overall changes. Line plots are excellent for tracking variables over time or comparing multiple variables simultaneously. By analyzing the slope and direction of the lines, insights can be gained into the underlying trends and relationships in the data.

sns.lineplot(x="size", y="total_bill", data=tips, hue='sex', style='sex', markers=True);

Jointplot

A jointplot combines a bivariate plot of two numerical variables with univariate plots of each variable along the x and y axes. By default the center is a scatter plot and the margins are histograms; with kind="kde", as in the example below, both are drawn as kernel density estimates, and hue draws one set of contours per group. This shows the bivariate relationship and the marginal distribution of each variable in a single figure, which helps in spotting trends, correlations, and outliers.

sns.jointplot(x="chol", y="trtbps", data=heart, kind="kde", hue='sex');

JointGrid

JointGrid is a figure-level object in Seaborn that allows for the creation of complex visualizations. It combines a bivariate plot (scatter plot, hex plot, etc.) with univariate plots (histograms, KDE plots) along the margins. This provides a comprehensive view of the relationship between two variables, along with their individual distributions. JointGrid offers flexibility in customizing the appearance and style of the plot, making it a powerful tool for data exploration and analysis.

g = sns.JointGrid(data=heart, x="age", y="chol", hue="output")

g.plot(sns.scatterplot, sns.histplot);

Lmplot

Lmplot, a function in Seaborn, is used to visualize linear relationships between numerical variables. It combines scatter plots with regression lines, providing insights into the correlation and potential trends between the variables. Lmplot can also handle categorical variables, allowing for visualizing relationships within different groups.

g = sns.lmplot(x="age", y="chol", hue="cp", data=heart)
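
With hue set, one regression line is fitted per group. A minimal sketch of that per-group fit, using NumPy's least-squares polynomial fit on made-up points (the group names and values are hypothetical):

```python
import numpy as np

# Made-up (x, y) points for two hypothetical groups
groups = {
    "a": (np.array([1.0, 2.0, 3.0, 4.0]), np.array([3.0, 5.0, 7.0, 9.0])),
    "b": (np.array([1.0, 2.0, 3.0, 4.0]), np.array([10.0, 9.0, 8.0, 7.0])),
}

fits = {}
for name, (x, y) in groups.items():
    slope, intercept = np.polyfit(x, y, deg=1)  # ordinary least-squares line
    fits[name] = (slope, intercept)
    print(f"group {name}: y = {slope:.2f} * x + {intercept:.2f}")
```

Opposite slopes between groups, as in this toy data, are the kind of pattern a single pooled regression line would hide.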

Relplot

Relplot is a versatile function in Seaborn that allows for creating various relational plots, including scatter plots, line plots, and more. It offers flexibility in customizing the appearance, adding multiple dimensions (hue, size, style), and organizing plots into subplots. Relplot is a powerful tool for exploring relationships between variables and understanding data trends.

g = sns.relplot(x="age", y="chol", data=heart, hue='sex')

Heatmap

A heatmap is a 2D visualization that represents data values as colors. How colors map to values depends on the colormap: with a sequential or diverging colormap, one end of the scale marks high values and the other low values, whereas a qualitative colormap has no such order and is a poor fit for numeric data. Heatmaps are useful for identifying patterns, trends, and anomalies in matrices such as correlation tables. They are commonly used in fields like finance, biology, and machine learning.

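corr = tips.corr(numeric_only=True)  # tips also has categorical columns; correlate the numeric ones only

mask = np.triu(np.ones_like(corr, dtype=bool))  # hide the diagonal and the redundant upper triangle

sns.heatmap(corr, mask=mask, annot=True, cmap='coolwarm');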
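
A correlation matrix is symmetric, so the mask hides the cells that carry no extra information. The sketch below shows which cells stay visible. It uses a small made-up numeric table rather than the real tips data.

```python
import numpy as np
import pandas as pd

# Small made-up numeric table standing in for the numeric columns of tips
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.5, 4.0],
    "size": [1, 2, 2, 4],
})

corr = toy.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))  # True = cell is hidden

# Only the strictly lower triangle stays visible
visible = corr.where(~mask)
print(visible.round(2))
```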

Catplot

Catplot in Seaborn is a figure-level function for visualizing categorical data. Its kind parameter selects the plot type, such as strip, swarm, box, violin, bar, or count plots, and row and col split the data into subplots. The example below draws one count plot of smokers per sex.

sns.catplot(x='smoker', col='sex', kind='count', data=tips, palette="Dark2")

Top 4 Chest Pain Categories: Seaborn

This code counts the values in the 'cp' column of the dataframe 'df' and keeps the 4 most frequent categories in 'top_leagues' (the variable name is a leftover; it holds chest pain types). It then plots age against cholesterol for the rows in those categories, colored by chest pain type.

top_leagues = df['cp'].value_counts().nlargest(4).index

plt.figure(figsize=(8, 5))

sns.scatterplot(x='age', y='chol', data=df[df['cp'].isin(top_leagues)], hue='cp')

plt.title('Age vs. Cholesterol for Top 4 Chest Pain',fontsize = 18, color = 'DarkRed')

plt.xlabel('Age',fontsize = 14, color = 'DarkBlue')

plt.ylabel('Cholesterol',fontsize = 14, color = 'DarkBlue')

plt.xticks(fontsize = 12, color = 'Black')

plt.yticks(fontsize = 12, color = 'Black')

plt.legend(title='Chest Pain Type', bbox_to_anchor=(1.05, 1), loc='upper left')

plt.show()
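
The filtering step can be checked in isolation on a toy column. In this sketch the category labels are placeholders, not real chest pain types.

```python
import pandas as pd

# Made-up chest pain labels; the category names are placeholders
toy = pd.DataFrame({"cp": ["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d"] * 2 + ["e"]})

# Keep the 4 most frequent categories, then filter the rows to them
top4 = toy["cp"].value_counts().nlargest(4).index
filtered = toy[toy["cp"].isin(top4)]

print(list(top4))
print(len(filtered))
```

Here the rarest category "e" is dropped, so 14 of the 15 rows remain.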

Cholesterol vs. Age by Chest Pain Type: Plotly

This code draws an interactive Plotly Express scatter plot from the dataframe 'df', with 'chol' on the x axis and 'age' on the y axis. Points are colored by chest pain type ('cp') and sized by 'oldpeak', and hovering over a point shows its 'exang' value as the tooltip title.

import plotly.express as px

fig = px.scatter(df, x='chol', y='age', color='cp', size = 'oldpeak', size_max = 30, hover_name = 'exang')

fig.update_layout(width=1000, height=600)

fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by cp)')

fig.show()

Cholesterol vs. Age Faceted by Chest Pain Type: Plotly

This variant of the previous plot adds facet_col='cp', which draws one panel per chest pain type, and log_x=True, which puts cholesterol on a logarithmic axis.

import plotly.express as px

fig = px.scatter(df, x='chol', y='age', color='cp', size = 'oldpeak', size_max = 30, hover_name = 'exang',facet_col = 'cp', log_x = True )

fig.update_layout(width=1000, height=600)

fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by cp)')

fig.show()

Stacked Bar Charts

This code snippet creates a stacked bar chart of case counts per chest pain category ('cp'), with each bar broken down by the slope of the ST segment ('slope').

def generate_rating_df(df):
    # Count rows for each (cp, slope) pair
    rating_df = df.groupby(['cp', 'slope']).agg({'id': 'count'}).reset_index()
    rating_df = rating_df[rating_df['id'] != 0]  # drop empty combinations
    rating_df.columns = ['cp', 'slope', 'counts']
    rating_df = rating_df.sort_values('slope')
    return rating_df

rating_df = generate_rating_df(df)

# text='counts' gives update_traces below something to position and size
fig = px.bar(rating_df, x='cp', y='counts', color='slope', text='counts')

fig.update_traces(textposition='auto', textfont_size=20)

fig.update_layout(barmode='stack')

fig.show()
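
The counting step is easier to follow on a tiny table. The sketch below reproduces the group-and-count logic on made-up rows and pivots the result so that each row is one bar and each column one stacked segment. The column values are hypothetical.

```python
import pandas as pd

# Six made-up patients; the labels are placeholders
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "cp": ["typical", "typical", "typical", "atypical", "atypical", "atypical"],
    "slope": ["flat", "flat", "up", "up", "up", "down"],
})

# Count rows per (cp, slope) pair, as in the chart's input table
counts = (toy.groupby(["cp", "slope"])
             .agg({"id": "count"})
             .reset_index()
             .rename(columns={"id": "counts"}))

# Wide view: each row is one bar, each column one stacked segment
stacks = counts.pivot(index="cp", columns="slope", values="counts").fillna(0).astype(int)
print(stacks)
```

The row sums of this table are the total bar heights in the stacked chart.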

Grouped Bar Charts

This code plots the same counts as grouped bars: for each chest pain type ('cp'), one bar per ST-segment slope ('slope') is drawn side by side (barmode='group') instead of stacked.

def generate_rating_df(df):
    # Count rows for each (cp, slope) pair
    rating_df = df.groupby(['cp', 'slope']).agg({'id': 'count'}).reset_index()
    rating_df = rating_df[rating_df['id'] != 0]  # drop empty combinations
    rating_df.columns = ['cp', 'slope', 'counts']
    rating_df = rating_df.sort_values('slope')
    return rating_df

rating_df = generate_rating_df(df)

# text='counts' gives update_traces below something to position and size
fig = px.bar(rating_df, x='cp', y='counts', color='slope', text='counts')

fig.update_traces(textposition='auto', textfont_size=20)

fig.update_layout(barmode='group')

fig.show()

Scatter Plot

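This code snippet creates a scatter plot of age against cholesterol with Plotly Express. Points are colored by chest pain type ('cp') and sized by 'ca', 'oldpeak' is added to the hover tooltip, and the margins show a histogram of the x values and a box plot of the y values.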

fig = px.scatter(data_frame=df,
                 x="age",
                 y="chol",
                 color="cp",
                 size='ca',
                 hover_data=['oldpeak'],
                 marginal_x="histogram",
                 marginal_y="box")

fig.update_layout(title_text="Age vs Cholesterol",
                  title_font={'size': 24, 'family': 'Serif'},  # replaces the deprecated titlefont
                  width=1000,
                  height=600)

fig.show()

Bar Chart

This code snippet creates a bar chart of the gender distribution in the data, with one bar per value of 'sex' and the count printed on each bar.

# Count each value of 'sex' and name the columns so they double as axis labels
gender = df["sex"].value_counts().rename_axis("Gender").reset_index(name="Frequency")

fig = px.bar(data_frame=gender,
             x="Gender",
             y="Frequency",
             color="Gender",
             text_auto=".3s")

fig.update_traces(textfont_size=24)

fig.show()

Pie Chart

This code creates a donut-style pie chart showing the share of each chest pain type ('cp'), ordered from most to least frequent.

# Rows per chest pain type; after count(), the 'sex' column holds the row count
cp_counts = (df.groupby('cp', as_index=False)['sex'].count()
               .sort_values(by='sex', ascending=False)
               .reset_index(drop=True))

fig = px.pie(cp_counts, names='cp', values='sex', color='cp',
             color_discrete_sequence=px.colors.sequential.Plasma_r,
             labels={'cp': 'Chest Pain', 'sex': 'Count'},
             template='seaborn', hole=0.4)

fig.update_layout(autosize=False, width=1000, height=600,
                  legend=dict(orientation='v', yanchor='bottom', y=0.40, xanchor='center', x=1),
                  title='Chest Pain', title_x=0.5, showlegend=True)

fig.update_traces(textfont={"family": "consolas", "size": 20})

fig.show()

3D Scatter Plot

This code snippet creates a 3D scatter plot of 'chol', 'trestbps', and 'oldpeak', with points colored by 'slope' and sized by 'age', to show relationships between several variables at once.

# Visualization 5: Plotly Express

fig = px.scatter_3d(df, x='chol', y='trestbps', z='oldpeak', color='slope', size='age', hover_name='cp')

fig.show()

Sunburst Chart

This code snippet creates a sunburst chart that shows the distribution of genders within each chest pain category: the inner ring is 'cp' and the outer ring splits each category by 'sex'.

fig = px.sunburst(df, path=['cp', 'sex'])

fig.update_layout(title_text="Chest Pain vs Gender",
                  title_font={'size': 24, 'family': 'Serif'},  # replaces the deprecated titlefont
                  width=750,
                  height=750)

fig.show()

Pie Chart Subplots

This code snippet places two donut charts side by side with make_subplots: the left one shows the share of each chest pain type ('cp') and the right one the share of each sex.

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# One row, two columns; pie traces need 'domain'-type subplots
fig = make_subplots(rows=1, cols=2,
                    specs=[[{'type': 'domain'}, {'type': 'domain'}]])

# With labels only and no values, go.Pie counts how often each label occurs
fig.add_trace(
    go.Pie(labels=df['cp'], hole=.4,
           title=dict(text='Chest Pain', font=dict(size=24))),
    row=1, col=1
)

fig.add_trace(
    go.Pie(labels=df['sex'], hole=.4,
           title=dict(text='Sex', font=dict(size=24))),
    row=1, col=2
)

# Applies to both pie traces
fig.update_traces(
    hoverinfo='label+value',
    textinfo='label+percent',
    textfont_size=16,
    marker=dict(colors=['lightgray', 'lightseagreen'],
                line=dict(color='#000000', width=2))
)

fig.update_layout(title_text="Heart Disease",
                  title_font={'size': 24, 'family': 'Sans-Serif'},
                  showlegend=False,
                  height=600,
                  width=950)

fig.show()

Choropleth Map

This code snippet creates a choropleth map with Plotly Express (px) that colors each country by its population. Hovering over a country shows its name, life expectancy, and population.

fig = px.choropleth(df,
                    locations="Country",
                    locationmode='country names',
                    color="Population",
                    hover_name="Country",
                    hover_data={"LifeExpectancy": True,
                                "Population": True},
                    title="World Map of Population size",
                    color_continuous_scale=px.colors.sequential.Plasma)

fig.show()

Choropleth Map

This code snippet creates a choropleth map with Plotly Express (px) that colors each country by its fertility rate. Hovering over a country shows its name, life expectancy, and population.

fig = px.choropleth(df,
                    locations="Country",
                    locationmode='country names',
                    color="FertilityRate",
                    hover_name="Country",
                    hover_data={"LifeExpectancy": True,
                                "Population": True},
                    title="World Map of Fertility Rate",
                    color_continuous_scale=px.colors.sequential.Plasma)

fig.show()

Choropleth Map

This code snippet creates an animated choropleth map with Plotly Express (px) that colors each country by life expectancy, with one animation frame per year. Countries are matched by their ISO alpha-3 codes and drawn on a "natural earth" projection.

gapminder = px.data.gapminder()  # assumption: the sample dataset bundled with Plotly Express

px.choropleth(gapminder, locations="iso_alpha", color="lifeExp", hover_name="country",
              animation_frame="year", color_continuous_scale=px.colors.sequential.Plasma,
              projection="natural earth")

Word Cloud

The provided code snippet creates a word cloud from the company names of all rows in 'df1' whose location is 'Atlanta, GA'. Words that occur more often across those names are drawn larger.

import matplotlib.pyplot as plt

from wordcloud import WordCloud

# Join the company names of all rows located in Atlanta, GA into one string
atlanta_companies = df1[df1['location'] == 'Atlanta, GA']['company'].str.cat(sep=' ')

company_cloud = WordCloud(width=1500, height=800, max_words=100, background_color='white').generate(atlanta_companies)

plt.figure(figsize=(20, 6), facecolor=None)

plt.imshow(company_cloud)

plt.axis("off")

plt.tight_layout(pad=0)

plt.show()
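
A word cloud is essentially a picture of word frequencies. The sketch below counts tokens with the standard library on a made-up string. The wordcloud library applies further processing (such as stopword filtering), so this only illustrates the core idea.

```python
from collections import Counter

# Made-up company names standing in for the joined 'company' column
text = "Acme Data Labs Acme Analytics Data Works Acme"

# Count whitespace-separated tokens; larger counts mean larger words
frequencies = Counter(text.split())
print(frequencies.most_common(2))
```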

Bar Plot

This code snippet finds the 10 most frequent job titles in the 'position' column of the DataFrame 'df1', then draws a bar chart of those titles and their counts, with one color per title.

# Top 10 titles as a two-column table; the column names double as axis labels
z = df1['position'].value_counts().head(10).rename_axis('job title').reset_index(name='count')

fig = px.bar(z, x='job title', y='count', color='job title', text='count',
             template='seaborn', title='Top 10 Popular Roles in Data Science')

fig.show()