Pandas Titanic Dojo
Digital storytelling of titanic proportions using pandas and matplotlib. Bring to life in full technicolor detail your data science odyssey.
The Titanic story is deeply embedded in modern culture and storytelling, and we take advantage of that here to communicate key observations recounted in numerous films, dramas and other forms of literature. We investigate survival with reference to sex, age, family size and passenger class, building on the work carried out in the Titanic Tidyverse. The video clips below are designed as a series of tutorials that allow us to interrogate the Titanic passenger list.
The Pandas library is useful because it provides an extensive syntax for uncovering nuances that may be embedded in our data but are not observable at first glance in Excel worksheets or CSV files. Developing skills in Pandas is useful in its own right. Writing business reports or communicating scientific ideas requires a succinct appraisal of underlying trends: filtering on key attributes, sorting from highest to lowest, and simply viewing the data once it has been loaded into a dataframe. The tutorials below put all of these actions into practice by creating dataframe objects, viewing, selecting, histogramming, dealing with missing values and grouping.
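As a rough sketch of the actions listed above (creating a dataframe, viewing, selecting, handling missing values, binning for a histogram, grouping and sorting), the snippet below runs on a handful of invented rows. The rows are made up purely for illustration and are not real passenger records; the column names (`pclass`, `survived`, `sex`, `age`, `sibsp`, `parch`) are assumed to follow the titanic3 convention, and `family_size` is a hypothetical helper column, not something the tutorials necessarily define this way.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration only; NOT real passenger records.
# Column names are assumed to match the titanic3 dataset.
df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "survived": [1, 0, 1, 0, 0, 1, 0, 0],
    "sex":      ["female", "male", "female", "male",
                 "male", "female", "male", "female"],
    "age":      [29.0, 40.0, np.nan, 35.0, 22.0, 4.0, np.nan, 30.0],
    "sibsp":    [0, 1, 1, 0, 0, 1, 0, 2],
    "parch":    [0, 0, 2, 0, 0, 1, 0, 3],
})

# Viewing: first rows plus column types and non-null counts.
print(df.head())
df.info()

# Selecting: rows by boolean mask, columns by label.
women_first = df.loc[(df["sex"] == "female") & (df["pclass"] == 1),
                     ["age", "survived"]]

# Missing values: count them, then fill with the median age.
# (Median imputation is one simple choice among several.)
n_missing_age = df["age"].isna().sum()
df["age_filled"] = df["age"].fillna(df["age"].median())

# Hypothetical derived column: siblings/spouses + parents/children + self.
df["family_size"] = df["sibsp"] + df["parch"] + 1

# Histogramming: count passengers per age band. In a notebook,
# df["age_filled"].plot.hist() would draw the chart via matplotlib.
age_bins = (pd.cut(df["age_filled"], bins=[0, 18, 40, 80])
              .value_counts()
              .sort_index())

# Grouping: the mean of a 0/1 column is the survival rate.
survival_by_sex = df.groupby("sex")["survived"].mean()

# Sorting: survival rate per class, highest first.
survival_by_class = (df.groupby("pclass")["survived"]
                       .mean()
                       .sort_values(ascending=False))

print(survival_by_sex)
print(survival_by_class)
```

Because `survived` is coded as 0 or 1, taking its mean within each group gives the survival rate directly, which is why `groupby(...).mean()` is enough here.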
Below, we make extensive use of the Pandas library to better understand the passenger data. Pandas was written for the Python programming language to perform data manipulation and analysis, and it is well suited to transforming, reshaping and sifting through large volumes of information. We take the raw titanic3 dataset, parse its key attributes, and link attributes such as sex, age and class to survival. Here we apply the library in Google Colab to carry out Exploratory Data Analysis through pivoting and subsetting.
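The parse, subset and pivot workflow described above might look roughly like the following. This is a minimal sketch under stated assumptions: the CSV text is invented for illustration (not real passenger records), it is read from an in-memory string so the example runs anywhere, and the column names are assumed to match titanic3. In Colab you would pass the path or URL of the real file to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Toy CSV text invented for illustration only; NOT real passenger records.
# Empty age fields are parsed as NaN (missing values).
raw = io.StringIO("""pclass,survived,sex,age
1,1,female,29
1,0,male,40
2,1,female,
2,0,male,35
3,0,male,22
3,1,female,4
3,0,male,
3,0,female,30
""")
df = pd.read_csv(raw)

# Subsetting with a boolean mask: rows with a missing age drop out
# because NaN >= 18 evaluates to False.
adults = df[df["age"] >= 18]

# Subsetting with a query string: same idea, more readable for
# compound conditions.
third_class_men = df.query("pclass == 3 and sex == 'male'")

# Pivoting: survival rate with sex as rows and passenger class as columns.
pivot = df.pivot_table(index="sex", columns="pclass",
                       values="survived", aggfunc="mean")
print(pivot)
```

The pivot table puts two attributes side by side, so the effect of sex and class on survival can be read from a single grid rather than from two separate groupings.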