AIMD GPDS Courses


1.1  DataFrame

Expected completion time: video 10m 59s, interactive readings 5m

We will start by importing the pandas library. Throughout this course, we will learn numerous pandas methods that can help us understand data.
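As a minimal sketch, the import usually looks like the following. The alias pd is a widely used convention rather than a requirement, and printing the version is only an optional sanity check.

```python
import pandas as pd  # "pd" is the conventional alias for pandas

# Optional: confirm that the library loaded and see which version is installed
print(pd.__version__)
```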

Reading File

First, we use the read_csv function to read in the movie.csv dataset, and display the first five rows with the head method.
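The step above might look like the sketch below. Because the real movie.csv is not included here, the example reads a small made-up CSV from memory instead. The column names and all the rows are placeholders chosen for illustration, not the actual dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for movie.csv: made-up rows, NOT the real dataset.
csv_text = """movie_title,director_name,duration,imdb_score
Movie A,Director X,120,7.1
Movie B,Director Y,,6.4
Movie C,,95,8.0
Movie D,Director X,101,5.9
Movie E,Director Z,134,7.7
Movie F,Director Y,88,6.8
"""

# With the real file on disk this would be: movie = pd.read_csv("movie.csv")
movie = pd.read_csv(io.StringIO(csv_text))

# head() shows the first five rows by default; empty cells appear as NaN
print(movie.head())
```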

Let us see some terminology of the parts that we see in the output:

index label: an individual member of the index
column name: an individual member of the columns
index: all the index labels as a whole
columns: all the column names as a whole
values: the data itself, excluding the index and the columns
NaN (not a number): the representation of a missing value
...: at least one column is hidden because the number of columns exceeds the display limit

> The head method accepts a single parameter, n, which controls the number of rows displayed. Similarly, the tail method returns the last n rows.
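A short sketch of head and tail with an explicit n, using a tiny made-up DataFrame (the column names and values are placeholders):

```python
import pandas as pd

# Tiny made-up frame for illustration only
df = pd.DataFrame({"title": list("ABCDEFG"), "score": range(7)})

first_two = df.head(n=2)   # rows with index labels 0 and 1
last_three = df.tail(3)    # rows with index labels 4, 5 and 6

print(first_two)
print(last_three)
```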

Checking the index and columns

We can use the DataFrame attributes index, columns, and values to assign the index, columns, and values to their own variables. Here we will print all of them using an f-string.
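One way this could look, sketched on a small made-up DataFrame rather than the movie dataset:

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame(
    {"duration": [120, 95], "imdb_score": [7.1, 8.0]},
    index=["Movie A", "Movie C"],
)

# Each attribute is stored in its own variable
index = df.index
columns = df.columns
values = df.values

print(f"index:   {index}")
print(f"columns: {columns}")
print(f"values:\n{values}")
```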

Checking the data type

We can also check the type of each DataFrame component using the built-in type function. The name of the type is the word following the last dot of the output.
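A minimal sketch on a made-up DataFrame. The full module path printed by type can differ between pandas versions, so the example also prints just the class name, which is the part after the last dot.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame(
    {"duration": [120, 95], "imdb_score": [7.1, 8.0]},
    index=["Movie A", "Movie C"],
)

# Full type, including the module path
print(type(df))
print(type(df.index))
print(type(df.columns))
print(type(df.values))

# Only the class name (the word after the last dot)
print(type(df).__name__)
print(type(df.values).__name__)
```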

We will further discuss this in the next lesson.

Note that the values attribute returns a NumPy n-dimensional array: ndarray.

Converting a pandas Object into a NumPy Array

Much of pandas relies heavily on the ndarray, the basic array type of the NumPy library. It can be considered the base object on which many pandas objects are built.

The values attribute converts the DataFrame into a NumPy array. We will use NumPy arrays in later sections.
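A small sketch of the conversion on made-up data. It also shows the to_numpy method, which the pandas documentation recommends over values for new code; both give the same array here.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({"duration": [120, 95], "imdb_score": [7.1, 8.0]})

arr = df.values        # attribute, no parentheses
arr2 = df.to_numpy()   # method with the same result here

print(type(arr))
print(arr.dtype)   # integer and float columns are combined into one float dtype
print(arr)
```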

Checking the dimension of the data

The first thing to do when we see new data is to check how big it is.

shape returns the number of rows and columns as a tuple.

size gives the number of entries (rows × columns).

ndim gives the number of dimensions (2 for a DataFrame, since it is a 2-dimensional table).

len gives the number of rows.
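The four checks above, sketched on a made-up DataFrame with 3 rows and 2 columns:

```python
import pandas as pd

# Made-up data for illustration only: 3 rows, 2 columns
df = pd.DataFrame({"duration": [120, 95, 101], "imdb_score": [7.1, 8.0, 5.9]})

print(df.shape)  # (3, 2)
print(df.size)   # 6
print(df.ndim)   # 2
print(len(df))   # 3
```

Note that shape, size, and ndim are attributes, while len is a built-in function called on the DataFrame.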

Renaming Index and Columns using Dictionary

Read in the movie dataset and make the index meaningful by setting it to the movie title. The rename DataFrame method accepts dictionaries that map old values to new values. Let's create one for the index and another for the columns.

Pass the dictionaries to the rename method, and assign the result to a new variable.
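A sketch of these steps on made-up data. The column names, the titles, and the variable names idx_rename, col_rename, and movie_renamed are placeholders for illustration. With the real file, the index could presumably be set while reading, for example via the index_col parameter of read_csv.

```python
import pandas as pd

# Made-up stand-in for the movie dataset, indexed by title
movie = pd.DataFrame(
    {
        "movie_title": ["Movie A", "Movie B"],
        "director_name": ["Director X", "Director Y"],
        "imdb_score": [7.1, 6.4],
    }
).set_index("movie_title")

# One dictionary for index labels, one for column names (old -> new)
idx_rename = {"Movie A": "First Movie", "Movie B": "Second Movie"}
col_rename = {"director_name": "Director", "imdb_score": "IMDB Score"}

# rename returns a new DataFrame; the original is left unchanged
movie_renamed = movie.rename(index=idx_rename, columns=col_rename)
print(movie_renamed)
```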

Changing the Index and Columns Format

The rename method is useful for renaming index labels or column names. Instead of a dictionary, you can also pass a function that is applied to every label, such as the string methods str.upper, str.lower, or str.title.
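A brief sketch of passing string methods to rename, again on made-up data with placeholder names:

```python
import pandas as pd

# Made-up data for illustration only
movie = pd.DataFrame(
    {"director_name": ["Director X", "Director Y"], "imdb_score": [7.1, 6.4]},
    index=["Movie A", "Movie B"],
)

upper_cols = movie.rename(columns=str.upper)  # DIRECTOR_NAME, IMDB_SCORE
lower_idx = movie.rename(index=str.lower)     # movie a, movie b
title_cols = movie.rename(columns=str.title)  # Director_Name, Imdb_Score

print(upper_cols)
print(lower_idx)
print(title_cols)
```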

©2023. All rights reserved.  Samy Baladram,
Graduate Program in Data Science - GSIS - Tohoku University