❯ 1.1 DataFrame
⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺
EXPECTED COMPLETION TIME
❲▹❳ Video 10m 59s
☷ Interactive readings 5m
We will start by importing the pandas library. Throughout this course, we will learn numerous pandas methods that can help us understand data.
First, we use the read_csv function to read in the movie.csv dataset, and display the first five rows with the head method.
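As a minimal sketch of this step, the example below reads a small in-memory CSV instead of movie.csv so that it runs anywhere. The column names and the six rows are made up for illustration and are not the real dataset; with the actual file, the call would simply be pd.read_csv("movie.csv").

```python
import io

import pandas as pd

# Made-up stand-in for movie.csv (hypothetical columns and rows).
csv_text = """movie_title,director_name,duration,gross
Movie A,Director X,120,1000000
Movie B,Director Y,95,
Movie C,Director Z,,250000
Movie D,Director X,101,300000
Movie E,Director Y,88,40000
Movie F,Director Z,130,5000000
"""

# read_csv accepts a file path or any file-like object.
movie = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default.
first_rows = movie.head()
print(first_rows)
```

The empty fields in the sample are read as NaN, which is how missing values appear in the output.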
Here is some terminology for the parts of the output:
index label: an individual member of the index
column name: an individual member of the columns
index: all the index labels as a whole
columns: all the column names as a whole
values: the data itself, excluding the index and the column names
NaN (not a number): the representation of a missing value
...: at least one column is hidden because the number of columns exceeds the display limit
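The terms above can be tied to code with a small sketch. The two-row DataFrame here is made up for illustration. The display limit that produces the "..." is the pandas option display.max_columns, whose default depends on the environment.

```python
import numpy as np
import pandas as pd

# Tiny made-up DataFrame with one missing value.
df = pd.DataFrame(
    {"movie_title": ["Movie A", "Movie B"], "gross": [1000000.0, np.nan]}
)

first_label = df.index[0]     # an index label
first_column = df.columns[0]  # a column name
missing = df.isna()           # True wherever a value is NaN

print(first_label, first_column)
print(missing)

# Current column display limit; columns beyond it are shown as "...".
print(pd.get_option("display.max_columns"))
```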
We can use the DataFrame attributes index, columns, and values to assign the index, columns, and values to their own variables. Here we print all three with an f-string.
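A minimal sketch of this step, using a small made-up DataFrame in place of the movie dataset:

```python
import pandas as pd

# Made-up data standing in for the movie dataset.
movie = pd.DataFrame(
    {"movie_title": ["Movie A", "Movie B"], "duration": [120, 95]}
)

# These are attributes, so there are no parentheses.
index = movie.index
columns = movie.columns
values = movie.values

print(f"index: {index}\ncolumns: {columns}\nvalues:\n{values}")
```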
Most of pandas relies heavily on the ndarray, the basic array type of the NumPy library. It can be considered the base object on which many pandas objects are built.
The values attribute returns the DataFrame's data as a NumPy array. We will use NumPy arrays in later sections.
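The sketch below checks the types involved, again on made-up data. It also calls to_numpy(), a DataFrame method that returns the same array and that the pandas documentation recommends over values; the lesson itself only uses values.

```python
import numpy as np
import pandas as pd

# Made-up numeric data.
movie = pd.DataFrame({"duration": [120, 95], "gross": [1.0e6, 2.5e5]})

arr = movie.values            # the data as a NumPy ndarray
also_arr = movie.to_numpy()   # method form, same contents

print(type(arr))
# The index and the columns expose their underlying arrays as well.
print(type(movie.index.values), type(movie.columns.values))
```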
The next thing to do when we see new data is to check how big it is.
shape returns the number of rows and columns as a tuple (rows, columns).
size gives the number of entries (rows × columns).
ndim gives the number of dimensions (2 for a DataFrame, since it is a two-dimensional table).
len gives the number of rows.
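The four size checks can be sketched on a made-up DataFrame with three rows and two columns:

```python
import pandas as pd

# Made-up data: 3 rows, 2 columns.
movie = pd.DataFrame(
    {"movie_title": ["Movie A", "Movie B", "Movie C"], "duration": [120, 95, 101]}
)

n_shape = movie.shape  # (rows, columns)
n_size = movie.size    # rows * columns
n_dim = movie.ndim     # number of dimensions
n_rows = len(movie)    # number of rows

print(n_shape, n_size, n_dim, n_rows)  # (3, 2) 6 2 3
```

Note that shape, size, and ndim are attributes, while len is a built-in Python function applied to the DataFrame.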
Read in the movie dataset and make the index meaningful by setting it to the movie title. The DataFrame rename method accepts dictionaries that map each old value to its new value. Let's create one for the index labels and another for the column names.
Pass the dictionaries to the rename method, and assign the result to a new variable.
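A minimal sketch of the renaming steps. The data, the column names, and the new names in both dictionaries are made up for illustration. With the real file, passing index_col to read_csv is one way to set the title as the index while reading.

```python
import pandas as pd

# Made-up data with the movie title set as the index.
movie = pd.DataFrame(
    {
        "movie_title": ["Movie A", "Movie B", "Movie C"],
        "director_name": ["Director X", "Director Y", "Director Z"],
        "duration": [120, 95, 101],
    }
).set_index("movie_title")

# Dictionaries mapping old names to new names (hypothetical choices).
idx_rename = {"Movie A": "Film A", "Movie B": "Film B"}
col_rename = {"director_name": "director", "duration": "minutes"}

# rename returns a new DataFrame; the original is left unchanged.
movie_renamed = movie.rename(index=idx_rename, columns=col_rename)
print(movie_renamed)
```

Labels that do not appear in a dictionary, such as "Movie C" here, are kept as they are.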