AIMD GPDS Courses



1.1  DataFrame

1.2  Data Types Conversion

1.3  Rows Selection

❯  1.4  Columns Selection

EXPECTED COMPLETION TIME
❲▹❳  Video   9m 21s
☷  Interactive readings   5m

Selecting One Column

To get the column 'director_name' as a Series, pass the column name as a string to the indexing operator.
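As a minimal sketch of this selection: the small `movie` DataFrame below is a made-up stand-in for the lesson's movie dataset (its values are invented for illustration), so that the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame({
    "director_name": ["Director A", "Director B", "Director C"],
    "duration": [120, 95, 110],
})

# A single string inside the indexing operator returns a Series.
director = movie["director_name"]
print(director)
```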

Alternatively, you may use the dot notation to accomplish the same task:

movie.director_name

Look at the image and try to familiarize yourself with the Series anatomy. 

Finally, we can verify the output using type(), which confirms that it is indeed a Series.
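The check can be sketched as follows, again on a made-up stand-in for the movie dataset. The snippet also prints the index, values, and name of the Series, which are the main parts of its anatomy, and confirms that the indexing operator and the dot notation return the same data.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame({
    "director_name": ["Director A", "Director B", "Director C"],
    "duration": [120, 95, 110],
})

by_bracket = movie["director_name"]  # indexing operator
by_dot = movie.director_name         # dot notation

# Both selections are Series and hold the same data.
print(type(by_bracket))
print(by_bracket.equals(by_dot))

# Main parts of a Series: index (row labels), values (data), and name.
print(by_bracket.index)
print(by_bracket.values)
print(by_bracket.name)
```

Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so the indexing operator is the safer default.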

Selecting Multiple Columns with List

Let's select some columns in the movie dataset by passing a list of the desired columns to the indexing operator. There are also instances when a single column needs to be selected as a DataFrame rather than as a Series. This can be done by passing a single-element list to the indexing operator.
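A minimal sketch of both cases, using a made-up stand-in for the movie dataset (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame({
    "director_name": ["Director A", "Director B"],
    "actor_1_name": ["Actor A", "Actor B"],
    "actor_2_name": ["Actor C", "Actor D"],
    "actor_3_name": ["Actor E", "Actor F"],
    "duration": [120, 95],
})

# A list of column names returns a DataFrame with those columns, in list order.
movie_actor_director = movie[
    ["actor_1_name", "actor_2_name", "actor_3_name", "director_name"]
]

# A single-element list still returns a DataFrame (one column), not a Series.
director_df = movie[["director_name"]]

print(type(movie_actor_director), movie_actor_director.shape)
print(type(director_df), director_df.shape)
```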

> Passing a long list inside the indexing operator might cause readability issues. To help with this, you may save all your column names to a list variable first:
>
> cols = ['actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name']
> movie_actor_director = movie[cols]

Selecting Columns by their Types

We can also select columns of a particular data type. Calling the .value_counts() method on movie.dtypes shows the number of columns with each specific data type. You can use the select_dtypes method to select only the integer columns. If you would like to select all the numeric columns, you may simply pass the string 'number' to the include parameter.

> Because all integers and floats default to 64 bits, you may select them by simply using the string 'int' or 'float'. If you want to select all integers and floats regardless of their specific size, use the string 'number'.
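The type-based selection described above might look like the sketch below. The DataFrame is a made-up stand-in for the movie dataset, and its column names and values are assumptions chosen only to give a mix of text, integer, and float columns.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame({
    "director_name": ["Director A", "Director B", "Director C"],  # text
    "duration": [120, 95, 110],                                    # int64
    "num_voted_users": [1000, 2500, 400],                          # int64
    "imdb_score": [7.1, 6.4, 8.0],                                 # float64
})

# Count how many columns have each data type.
print(movie.dtypes.value_counts())

# Select only the integer columns.
int_cols = movie.select_dtypes(include=["int"])
print(int_cols.columns.tolist())

# Select every numeric column (integers and floats of any size).
num_cols = movie.select_dtypes(include=["number"])
print(num_cols.columns.tolist())
```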

Filtering Columns using like and regex

An alternative method to select columns is with the filter method. This method is flexible and searches column names (or index labels) based on which parameter is used. Here, we use the like parameter to search for all column names that contain the exact string, 'fb'.
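A small sketch of the like parameter follows. The column names mimic those mentioned in this lesson, but the DataFrame itself is a made-up stand-in for the movie dataset.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame({
    "director_name": ["Director A", "Director B"],
    "director_fb_likes": [10, 20],
    "actor_1_name": ["Actor A", "Actor B"],
    "actor_1_fb_likes": [300, 400],
    "duration": [120, 95],
})

# Keep only the columns whose name contains the substring 'fb'.
fb_cols = movie.filter(like="fb")
print(fb_cols.columns.tolist())
```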

The filter method also allows columns to be searched with a regular expression through the regex parameter. Here, we search for all columns that have a digit somewhere in their name.
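The digit search can be sketched with the pattern `\d`, which matches any single digit. As before, the DataFrame is a made-up stand-in for the movie dataset.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame({
    "director_name": ["Director A", "Director B"],
    "actor_1_name": ["Actor A", "Actor B"],
    "actor_2_name": ["Actor C", "Actor D"],
    "actor_1_fb_likes": [300, 400],
    "duration": [120, 95],
})

# Keep only the columns whose name contains at least one digit.
digit_cols = movie.filter(regex=r"\d")
print(digit_cols.columns.tolist())
```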

> The filter method comes with another parameter, items, which takes a list of exact column names. This is nearly an exact duplication of the indexing operator, except that a KeyError will not be raised if one of the strings does not match a column name. For instance, movie.filter(items=['actor_1_name', 'asdf']) runs without error and returns a single column DataFrame.
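The difference between items and the indexing operator can be sketched as below, on a made-up stand-in for the movie dataset. The name 'asdf' is deliberately not a column.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame({
    "actor_1_name": ["Actor A", "Actor B"],
    "director_name": ["Director A", "Director B"],
})

# filter(items=...) silently skips names that are not columns.
found = movie.filter(items=["actor_1_name", "asdf"])
print(found.columns.tolist())

# The indexing operator raises a KeyError for the same list.
raised = False
try:
    movie[["actor_1_name", "asdf"]]
except KeyError:
    raised = True
print(raised)
```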

Selecting Columns and Rows using .loc and .iloc 

The generic form to select rows and columns will look like the following code:

df.iloc[rows, columns]
df.loc[rows, columns]

Here, we will see how to select the first few rows and the first few columns with slice notation.
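A minimal sketch of slicing with both indexers. The DataFrame and its row labels ("Movie A" and so on) are made up for illustration. Note that .iloc slices by integer position and excludes the end position, while .loc slices by label and includes the end label.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame(
    {
        "color": ["Color", "Color", "Black and White", "Color"],
        "director_name": ["Director A", "Director B", "Director C", "Director D"],
        "duration": [120, 95, 110, 101],
        "imdb_score": [7.1, 6.4, 8.0, 5.9],
    },
    index=["Movie A", "Movie B", "Movie C", "Movie D"],
)

# First three rows and first two columns by integer position (end excluded).
by_position = movie.iloc[:3, :2]

# The same block by label (end label included).
by_label = movie.loc[:"Movie C", :"director_name"]

print(by_position)
print(by_position.equals(by_label))
```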

Selecting All Rows of Two Columns

To select all the rows of two different columns, we can use the slice notation as in the examples.
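One way to sketch this: a bare colon in the rows position means "all rows", and a list in the columns position picks the two columns. The DataFrame is a made-up stand-in for the movie dataset.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame(
    {
        "color": ["Color", "Color", "Black and White"],
        "director_name": ["Director A", "Director B", "Director C"],
        "duration": [120, 95, 110],
        "imdb_score": [7.1, 6.4, 8.0],
    },
    index=["Movie A", "Movie B", "Movie C"],
)

# All rows (":") of the columns at positions 1 and 3.
by_position = movie.iloc[:, [1, 3]]

# All rows (":") of the same two columns, selected by label.
by_label = movie.loc[:, ["director_name", "imdb_score"]]

print(by_position)
```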

Selecting Some Rows of Some Columns

Here, we select disjoint (non-adjacent) rows and columns by passing lists instead of slices.
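A minimal sketch of such a selection, on a made-up stand-in for the movie dataset (row labels and values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (values are made up).
movie = pd.DataFrame(
    {
        "color": ["Color", "Color", "Black and White", "Color"],
        "director_name": ["Director A", "Director B", "Director C", "Director D"],
        "duration": [120, 95, 110, 101],
        "imdb_score": [7.1, 6.4, 8.0, 5.9],
    },
    index=["Movie A", "Movie B", "Movie C", "Movie D"],
)

# Rows at positions 0 and 2, columns at positions 0 and 3.
by_position = movie.iloc[[0, 2], [0, 3]]

# The same cells selected by row and column labels.
by_label = movie.loc[["Movie A", "Movie C"], ["color", "imdb_score"]]

print(by_position)
```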

©2023. All rights reserved.  Samy Baladram,
Graduate Program in Data Science - GSIS - Tohoku University