external

Python has hundreds of thousands of external (third-party) packages, which can be installed with the package manager, pip.

The Python Package Index (PyPI) is a repository of software for the Python programming language.

Some of the major ones are:

numpy (numerical python)
pandas (panel data)
matplotlib (MATLAB-style plotting library)
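Packages from PyPI are usually installed from a terminal with pip, for example with the command: python -m pip install numpy pandas matplotlib. The short sketch below checks from inside Python which of the three are already installed, using the standard library's importlib.metadata; the helper name installed_versions is just an illustrative choice, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {distribution name: installed version string, or None if missing}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)       # reads the installed package's metadata
        except PackageNotFoundError:
            found[name] = None                # not installed in this environment
    return found


for name, ver in installed_versions(["numpy", "pandas", "matplotlib"]).items():
    if ver is None:
        print(f"{name}: not installed (try: python -m pip install {name})")
    else:
        print(f"{name}: {ver}")
```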

NumPy

Nearly every scientist working in Python draws on the power of NumPy.

NumPy brings the computational power of languages like C and Fortran to Python, a language much easier to learn and use. With this power comes simplicity: a solution in NumPy is often clear and elegant.

NumPy is an open source project that enables numerical computing with Python. It was created in 2005, building on the early work of the Numeric and Numarray libraries. NumPy will always be 100% open source software and free for all to use.

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object (the ndarray), various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
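A few of the routine families listed above (element-wise math, sorting, selecting, statistics and basic linear algebra) can be tried in a handful of lines. This is a small illustrative sketch; the array values are made up for the example.

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0])

doubled = a * 2                # element-wise math on the whole array at once
ordered = np.sort(a)           # sorting (returns a sorted copy)
selected = a[a > 1.5]          # selecting with a boolean mask
average = a.mean()             # basic statistics

# basic linear algebra: solve the linear system m @ x = b for x
m = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(m, b)

print(doubled)   # [6. 2. 4.]
print(ordered)   # [1. 2. 3.]
print(selected)  # [3. 2.]
print(average)   # 2.0
print(x)         # [1. 2.]
```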

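import numpy as np

a = np.arange(6)          # 1-D array [0 1 2 3 4 5], shape (6,)
a2 = a[np.newaxis, :]     # insert a new leading axis, giving a 2-D row vector
print(a2.shape)           # (1, 6)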

An array is the central data structure of the NumPy library. It is a grid of values that carries information about the raw data, how to locate an element, and how to interpret an element. Its elements can be indexed in various ways, and they all share the same type, referred to as the array's dtype.
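The properties described above (shape, a single shared dtype, and several ways of indexing) can be inspected directly. This is a minimal sketch with an invented 2 x 3 array; the exact integer dtype width depends on the platform, so the code does not assume one.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.shape)              # (2, 3): 2 rows, 3 columns
print(grid.dtype)              # an integer dtype; the exact width is platform-dependent
print(grid[1, 2])              # 6: row 1, column 2 (indices start at 0)
print(grid[:, 0])              # [1 4]: every row, column 0
print(grid[grid % 2 == 0])     # [2 4 6]: boolean-mask indexing

# All elements share one dtype, so mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5])
print(mixed.dtype)             # float64
```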

The official docs, containing the above and much more, are here: https://numpy.org/

Pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

In particular, it offers data structures and operations for manipulating numerical tables and time series.
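As a small illustration of the time-series side, the sketch below builds a Series indexed by dates and sums it over 3-day windows with resample. The dates and values are invented for the example.

```python
import pandas as pd

# Six daily readings, indexed by date
days = pd.date_range("2024-01-01", periods=6, freq="D")
readings = pd.Series([1, 2, 3, 4, 5, 6], index=days)

# Group the readings into 3-day buckets and add up each bucket:
# 1 + 2 + 3 = 6 for the bucket starting 2024-01-01, 4 + 5 + 6 = 15 for 2024-01-04
totals = readings.resample("3D").sum()
print(totals)
```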

The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. The name is also a play on the phrase "Python data analysis".

Pandas is mainly used for data analysis and associated manipulation of tabular data in DataFrames. Pandas allows importing data from various file formats such as comma-separated values, JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
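Reading one of those formats takes a single call. The sketch below uses pd.read_csv on a small CSV held in memory (via io.StringIO) so it runs without any file; the column names and values are invented. With a real file you would pass its path instead, and pd.read_json, pd.read_parquet, pd.read_sql and pd.read_excel follow the same pattern for the other formats (Parquet and Excel need extra engine packages installed).

```python
import io

import pandas as pd

# A small CSV held in memory; with a real file, pass its path to read_csv instead
csv_text = """name,city,sales
Ann,Oslo,120
Bob,Lima,95
Cai,Oslo,130
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)     # (3, 3): 3 rows, 3 columns
print(df.dtypes)    # column types inferred from the text
print(df.head())    # first rows of the table
```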

Pandas supports data manipulation operations such as merging, reshaping and selecting, as well as data cleaning and data wrangling. It brought to Python many of the DataFrame features already established in the R programming language. Pandas is built on NumPy, which focuses on efficient work with arrays rather than on DataFrame-style tabular operations.
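The sketch below strings together a cleaning step (fillna), a merge, a selection and a group-by aggregation on two tiny invented tables; store names, regions and amounts are all made up for illustration.

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "A", "B", "C"],
    "amount": [100.0, np.nan, 80.0, 50.0],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Cleaning: replace the missing amount with 0
clean = sales.fillna({"amount": 0.0})

# Merging: attach each store's region (similar to a SQL left join on "store")
merged = clean.merge(regions, on="store", how="left")

# Selecting: keep only rows with an amount above 60
big = merged[merged["amount"] > 60]

# Aggregating: total amount per region
totals = merged.groupby("region")["amount"].sum()
print(big)
print(totals)
```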

Many potential pandas users have some familiarity with spreadsheet programs like Excel. Here is a comparison.

pandas       Excel
DataFrame    worksheet
Series       column
Index        row headings
row          row
NaN          empty cell
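The analogy in the table can be seen directly in code. In this sketch (fruit names and numbers invented), the DataFrame stands in for a worksheet, its index for the row headings, one column is pulled out as a Series, and a missing value shows up as NaN.

```python
import numpy as np
import pandas as pd

# A DataFrame plays the role of a worksheet
sheet = pd.DataFrame(
    {"price": [1.5, np.nan, 3.0], "qty": [10, 4, 7]},
    index=["apple", "pear", "plum"],   # the Index works like row headings
)

price_column = sheet["price"]          # selecting one column gives a Series
print(type(price_column).__name__)     # Series
print(sheet.loc["plum"])               # one row, looked up by its row heading
print(sheet["price"].isna())           # NaN marks the "empty cell"
```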

The official pandas docs are here: https://pandas.pydata.org/

Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.

Matplotlib makes heavy use of NumPy and also integrates well with pandas.
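To show how the three libraries fit together, this sketch generates data with NumPy, stores it in a pandas DataFrame and lets DataFrame.plot draw onto a Matplotlib Axes. The sine/cosine data and axis labels are just illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NumPy generates the data, pandas holds it, Matplotlib draws it
t = np.linspace(0, 2 * np.pi, 50)
df = pd.DataFrame({"sin": np.sin(t), "cos": np.cos(t)}, index=t)

fig, ax = plt.subplots()
df.plot(ax=ax)                 # pandas draws one line per column onto the Matplotlib Axes
ax.set_xlabel("t (radians)")
ax.set_ylabel("value")
plt.show()
```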

Here is an example of a histogram:

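import matplotlib.pyplot as plt
import numpy as np

# make data: 200 samples from a normal distribution with mean 4 and standard deviation 1.5
x = 4 + np.random.normal(0, 1.5, 200)

# plot: a histogram with 8 bins and thin white edges between the bars
fig, ax = plt.subplots()
ax.hist(x, bins=8, linewidth=0.5, edgecolor="white")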

ax.set(xlim=(0, 8), xticks=np.arange(1, 8),