Warning - This site is moving to https://getthecodingbug.anvil.app
Topics covered
Using matplotlib to create charts
line chart
bar chart
pie chart
Using matplotlib and numpy
matplotlib and numpy are two libraries that are commonly used together. Between them they let us generate charts to help visualise data.
Line chart
We first import the libraries:
from mpl_toolkits.axisartist.axislines import SubplotZero
import matplotlib.pyplot as plt
import numpy as np
Then we generate the data for our chart.
Firstly, using numpy, we generate an array of 100 evenly spaced numbers for our x axis, from -1.0 through 0 to +1.0 (a spacing of about 0.02):
x = np.linspace(-1.0, 1.0, 100)
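As a quick check of what np.linspace produces, the short sketch below prints the number of values and the gap between neighbours. The names step and x_coarse are only for illustration and are not part of the sineWave.py program. The second call is an assumption about what you might want if a step of exactly 0.1 is needed: 21 points between -1.0 and +1.0.

```python
import numpy as np

# 100 evenly spaced values from -1.0 to +1.0, both ends included
x = np.linspace(-1.0, 1.0, 100)
step = x[1] - x[0]   # gap between neighbouring values: 2.0 / 99
print(len(x), round(step, 4))

# Illustration only: 21 points give a gap of 0.1 between neighbours
x_coarse = np.linspace(-1.0, 1.0, 21)
print(round(x_coarse[1] - x_coarse[0], 4))
```

The third argument of np.linspace is the number of points, not the step size, so more points give a smoother curve.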
Secondly, we create an empty list for our y axis, and populate it with the sine of each of our x values multiplied by pi:
y = []
for n in range(len(x)):
    y.append(np.sin(x[n]*np.pi))
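The loop above works, but numpy can also apply a function to a whole array in one go. The sketch below is an alternative, not the method used in the rest of this page: it builds y both ways (y_loop is a name made up for the comparison) and checks that the results agree.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 100)

# Loop version, one value at a time
y_loop = []
for n in range(len(x)):
    y_loop.append(np.sin(x[n] * np.pi))

# Vectorised version: np.sin is applied to every element of the array at once
y = np.sin(x * np.pi)

# True if both versions hold the same numbers
print(np.allclose(y, y_loop))
```

Either form of y can be passed to ax.plot(x, y, color='blue').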
The next three statements prepare the chart:
fig = plt.figure(1)
ax = SubplotZero(fig, 111)
fig.add_subplot(ax)
Next we say we are going to plot a line, based on our x and y values and colour it blue:
ax.plot(x, y, color='blue')
Finally we show the chart:
plt.show()
Open up IDLE and create a new empty file (File/New File) and save it as 'sineWave.py'.
Next, copy'n'paste the code below and press F5 to run the program.
# sineWave.py
from mpl_toolkits.axisartist.axislines import SubplotZero
import matplotlib.pyplot as plt
import numpy as np
# Generate data
x = np.linspace(-1.0, 1.0, 100)
y = []
for n in range(len(x)):
    y.append(np.sin(x[n]*np.pi))
# Prepare Chart
fig = plt.figure(1)
ax = SubplotZero(fig, 111)
fig.add_subplot(ax)
# Choose chart type (plot = line) and colour
ax.plot(x, y, color='blue')
# Show chart
plt.show()
You should get a window like this:
Note the buttons at the foot of the window, which may be used to manipulate the image and to reset it. The one on the far right lets you save the image, which can then be imported into a Word document or web page, for example.
The x and y axes on this chart are on the outside, left and bottom, but can be placed inside by inserting these lines of code before the ax.plot(x, y, color='blue') statement:
# Hide borders
for direction in ["left", "right", "bottom", "top"]:
    ax.axis[direction].set_visible(False)
for direction in ["xzero", "yzero"]:
    # adds X and Y-axis from the origin
    ax.axis[direction].set_visible(True)
    # adds arrows at the ends of each axis
    ax.axis[direction].set_axisline_style("-|>")
After inserting, save this file as 'sineWave1.py' and press F5 to run it. You should get a chart like this one:
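If the SubplotZero import is not available, a similar result can probably be had with ordinary matplotlib axes by moving the 'spines' (the border lines) to the zero position. This is a different technique from the one above, offered only as a sketch: it puts the axes through the origin but does not draw arrowheads.

```python
import matplotlib.pyplot as plt
import numpy as np

# Generate data
x = np.linspace(-1.0, 1.0, 100)
y = np.sin(x * np.pi)

# Prepare chart using standard axes instead of SubplotZero
fig, ax = plt.subplots()

# Move the left and bottom border lines so they pass through the origin
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')

# Hide the top and right borders
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

ax.plot(x, y, color='blue')
```

Add plt.show() as the last line to display the window when running from IDLE.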
Bar chart
When we are done with the windows above, close them and just leave one IDLE window.
Create a new empty file (File/New File) and save it as 'selfieChart.py'.
Next, copy'n'paste the code below and save (File / Save)
The data used for this chart was copied from a web page: http://selfiecity.net/#dataset
I always make a comment in my code when I use someone else's data, see # Reference, below.
The line y_pos = np.arange(len(cities)) counts the number of items in cities and creates an array of positions, one for each city (0, 1, 2, 3, 4), which it places in the variable y_pos.
In the # Prepare the chart section, we first choose a bar chart (plt.bar), using data from the list named 'women', colour it blue and make it semi-transparent (alpha=0.5).
Then we say what the x-axis ticks should be labelled (city names), label the y-axis 'Women' and then give the whole chart a title. Finally we show the chart as before.
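To see what y_pos holds, a short check like the one below can be run on its own before building the chart. It is only an illustration and is separate from the selfieChart.py program that follows.

```python
import numpy as np

cities = ('Bangkok', 'Berlin', 'Moscow', 'New York', 'Sao Paulo')
y_pos = np.arange(len(cities))

print(len(cities))   # how many cities there are
print(y_pos)         # one position per city, counting from 0
```

matplotlib uses these positions to decide where along the x axis each bar is drawn, and plt.xticks(y_pos, cities) places a city name at each one.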
# selfieChart.py
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
# Reference http://selfiecity.net/#dataset
# Generate the data
cities = ('Bangkok', 'Berlin', 'Moscow', 'New York', 'Sao Paulo')
y_pos = np.arange(len(cities))
women = [52.2,59.4,61.6,65.4,82.0]
# Prepare the chart
plt.bar(y_pos, women, align='center', color='blue', alpha=0.5)
plt.xticks(y_pos, cities)
plt.ylabel('Women')
plt.title('Percentage of women taking selfies')
# Show the chart
plt.show()
Press F5 to run the program and you should get a chart like this one:
Try changing the colour and perhaps the alpha value to, say, 1. Also change the title of the chart.
We can add the data for men and have a second bar of a different colour next to the bar for women for comparison.
As the figures for women are percentages of both sexes, we can determine the data for the men by subtracting the figures for women from 100.
Insert the following lines after the line: women = [52.2,59.4,61.6,65.4,82.0]
men = []
for n in range(len(women)):
    men.append(100-women[n])
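The same list can also be built in a single line with a list comprehension. This is just an alternative sketch; the loop above gives the same result.

```python
women = [52.2, 59.4, 61.6, 65.4, 82.0]

# Each men's figure is whatever remains of 100 percent
men = [100 - w for w in women]

# Rounded for display, since floating-point subtraction can leave tiny errors
print([round(m, 1) for m in men])
```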
In the next section, # Prepare the chart, we add another bar, this time for men and colour it green.
We give each bar a label, change the title to 'People taking selfies' and add a 'legend' to inform the viewer which bar belongs to which sex.
Now, replace the # Prepare the chart section with the following lines:
# Prepare the chart
bar_width = 0.30
plt.bar(y_pos, women, bar_width, alpha=0.5, label='women', color='blue')
plt.bar(y_pos+bar_width, men, bar_width, alpha=0.5, label='men', color='green')
plt.xticks(y_pos, cities)
plt.ylabel('Percentage')
plt.title('People taking selfies')
plt.legend()
Save the file as 'selfieChart1.py' and press F5 to run, and expect something like this:
Pie chart
The third kind of chart we are going to use is called a pie chart, so-called because it looks like a pie, with slices in it.
Matplotlib has many other kinds of charts, and you can investigate them here:
https://matplotlib.org/tutorials/introductory/sample_plots.html
This time we are going to use some data from the Norfolk Daily News in the USA.
In the section, # Prepare the chart, we choose a pie chart (plt.pie) and set it to automatically compute the percentages for each slice (autopct='%1.1f%%').
The line plt.axis('equal') ensures the pie is a circle and not squashed.
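To get a feel for what autopct='%1.1f%%' will display, the sketch below works out each slice's share by hand, using the survey figures from the program further down, and formats it with the same pattern. The names total and share are made up for this illustration.

```python
flavours = [16, 15, 15, 12, 11, 10, 10, 5, 4]
total = sum(flavours)
print(total)

# Each slice's share of the total, formatted the way '%1.1f%%' formats it:
# one decimal place followed by a percent sign
for value in flavours:
    share = 100 * value / total
    print('%1.1f%%' % share)
```

The survey figures add up to 98 rather than 100, so the percentages shown on the pie are slightly larger than the numbers in the list.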
Create a new empty file (File/New File) and save it as 'iceCreamFlavours.py'.
Next, copy'n'paste the following lines and save the file.
# iceCreamFlavours.py
import matplotlib.pyplot as plt
# Generate Data
'''
Reference: http://norfolkdailynews.com/lite_rock/programs/dave_williams/most-popular-ice-cream-flavors/article_8e46a544-6a88-11e8-a65b-2bb11e94b092.html
1. Mint chocolate chip, 16% of people say it's their favorite flavor
2. Chocolate, 15%
3. Cookies and cream, 15%
4. Vanilla, 12%
5. Butter pecan, 11%
6. Rocky road, 10%
7. Strawberry, 10%
8. Chocolate chip, 5%
9. Neapolitan, 4%
'''
labels = ['Mint chocolate chip', 'Chocolate', 'Cookies and cream', 'Vanilla',
          'Butter pecan', 'Rocky road', 'Strawberry', 'Chocolate chip', 'Neapolitan']
flavours = [16, 15, 15, 12, 11, 10, 10, 5, 4]
# Prepare chart
plt.pie(flavours, labels=labels, autopct='%1.1f%%', startangle=90)
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Most Popular Ice Cream Flavors (Norfolk Daily News)')
# Show chart
plt.show()
Press F5 to run the program and you should get something like this:
You can enhance the pie-chart by 'pulling-out' one or more slices, also known as 'exploding',
by adding these two lines after the line: flavours = [16, 15, 15, 12, 11, 10, 10, 5, 4]
# "explode" first and second slices
explode = (0.2, 0.1, 0, 0, 0, 0, 0, 0, 0)
Note that the first two slices are exploded by different amounts (0.2 and 0.1) and the rest are not (0).
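The explode values must contain one number per slice. Rather than typing them by hand, they could also be worked out from the data. The sketch below uses a made-up rule, pulling out only the largest slice, purely to show the idea.

```python
flavours = [16, 15, 15, 12, 11, 10, 10, 5, 4]

# Hypothetical rule: pull out only the largest slice, leave the rest in place
largest = max(flavours)
explode = tuple(0.2 if value == largest else 0 for value in flavours)

print(explode)
```

If two slices tie for largest, this rule pulls both of them out.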
Next we insert two parameters, 'explode' and 'shadow', into the plt.pie(flavours, line:
plt.pie(flavours, explode=explode, shadow=True, labels=labels, …
Save the file as 'iceCreamFlavours1.py' and press F5 to run, and you should see this:
Now you can generate some data of your own using, say, the files stored in your school directory.
Most files are stored with names and an extension, like '.py' for Python files, although Windows sometimes hides the extension, depending on the default settings for your username.
With Python, you can scan all the files in a folder or folders, using the 'os' library function called 'walk'.
Copy and paste the following lines into a new empty file, then change the 'Y:\Source' in the line:
rootDir = r'Y:\Source'
to point to your drive letter and folder. Then save the file as 'myDataPieChart.py' and press F5 to run.
It may take a few minutes depending on the speed of your processor.
# myDataPieChart.py
# Import the os module, for the os.walk function
import os
import os.path
import matplotlib.pyplot as plt
# Setup empty dictionary for extensions
extensions = {}
# Set the directory you want to start from
rootDir = r'Y:\Source'
# Walk the directory folders collecting the data
for dirName, subdirList, fileList in os.walk(rootDir):
    for fname in fileList:
        extension = os.path.splitext(fname)[1]
        extension = extension.upper()
        if len(extension) > 0:
            if extension in extensions.keys():
                extensions[extension] += 1
            else:
                extensions.update({extension : 1})
# Sort the data by extensions
# (sorted() returns a new list, so the result has to be stored)
extensions = dict(sorted(extensions.items()))
# Collect just the extensions which number greater than 100
labels = []
extn_count = []
for extension in extensions:
    count = extensions[extension]
    if count > 100:
        labels.append(extension)
        extn_count.append(extensions[extension])
# Chart settings
plt.pie(extn_count, labels=labels, autopct='%1.1f%%', startangle=90)
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('File types by extensions under ' + rootDir + '\n')
# Show chart
plt.show()
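The counting part of the program can also be written with Counter from Python's collections module, which starts every new key at zero so no if/else is needed. This is an alternative sketch, not the method used above. The function name count_extensions is made up for the example.

```python
import os
from collections import Counter


def count_extensions(root_dir):
    """Count the files under root_dir, grouped by upper-cased extension."""
    counts = Counter()
    for dir_name, subdir_list, file_list in os.walk(root_dir):
        for fname in file_list:
            extension = os.path.splitext(fname)[1].upper()
            if extension:   # skip files that have no extension
                counts[extension] += 1
    return counts
```

Calling count_extensions(rootDir) returns a dictionary-like object, and its most_common(5) method gives the five most frequent extensions with their counts, which could then be passed to plt.pie in the same way as before.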