❯ 1.4 Columns Selection
⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺
To select the column 'director_name' as a Series, pass the column name as a string to the indexing operator (square brackets).
Alternatively, you may use the dot notation to accomplish the same task:
movie.director_name
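Here is a minimal sketch of both notations. The lesson's movie dataset is not reproduced on this page, so the small `movie` DataFrame below is a hypothetical stand-in with illustrative values; only the column name 'director_name' comes from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (illustrative values)
movie = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "Inception"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan"],
    "duration": [178, 148, 148],
})

# Indexing operator: pass the column name as a string
by_brackets = movie["director_name"]

# Dot notation: access the column as an attribute
by_dot = movie.director_name

# Both notations return the same Series
print(by_brackets.equals(by_dot))
```

Keep in mind that dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so the indexing operator is the safer default.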
Look at the image and try to familiarize yourself with the Series anatomy.
Finally, we can verify the output using type(). You'll see that it is indeed a Series.
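The check can be sketched as follows, together with a quick look at the main parts of a Series (index, name, and values). The `movie` DataFrame is again a hypothetical stand-in with illustrative values, not the lesson's actual dataset.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (illustrative values)
movie = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "Inception"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan"],
})

director = movie["director_name"]

# Verify that a single selected column is a Series
print(type(director))
is_series = isinstance(director, pd.Series)

# Main components of a Series
print(director.index)    # the row labels
print(director.name)     # the name, taken from the column name
print(director.values)   # the underlying data as an array
```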
Let's select several columns of the movie dataset by passing a list of the desired column names to the indexing operator. Sometimes a single column needs to be selected as a DataFrame rather than as a Series. This can be done by passing a single-element list to the indexing operator.
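The difference between passing a list and passing a plain string can be sketched like this. The `movie` DataFrame and all column names other than 'director_name' are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (illustrative values)
movie = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "Inception"],
    "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan"],
    "duration": [178, 148, 148],
})

# A list of column names returns a DataFrame with those columns
subset = movie[["movie_title", "director_name"]]

# A single-element list still returns a DataFrame (one column)
single_df = movie[["director_name"]]

# A plain string returns a Series
single_series = movie["director_name"]

print(type(single_df).__name__, single_df.shape)
print(type(single_series).__name__, single_series.shape)
```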
We can also select columns by data type. Calling the .value_counts() method on the DataFrame's dtypes attribute shows the number of columns of each data type. You can use the select_dtypes method to select only the integer columns. If you would like to select all the numeric columns, simply pass the string 'number' to the include parameter.
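A minimal sketch of selection by data type follows. It assumes the counts come from `movie.dtypes.value_counts()` and uses the explicit `'int64'` dtype string for the integer columns; the `movie` DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (illustrative values)
movie = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "Inception"],  # text
    "duration": [178, 148, 148],                        # integer
    "imdb_score": [7.9, 6.8, 8.8],                      # float
    "title_year": [2009, 2015, 2010],                   # integer
})

# Number of columns of each data type
dtype_counts = movie.dtypes.value_counts()
print(dtype_counts)

# Only the integer columns (Python ints are stored as int64 here)
int_cols = movie.select_dtypes(include="int64")

# All numeric columns, integer and float alike
num_cols = movie.select_dtypes(include="number")

print(list(int_cols.columns))
print(list(num_cols.columns))
```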
An alternative way to select columns is the filter method. This method is flexible and searches column names (or index labels) according to which parameter is used. Here, we use the like parameter to search for all column names that contain the substring 'fb'.
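As a rough sketch of the like parameter, assume the dataset has a few columns whose names contain 'fb'. The column names and values below are made up for illustration and may not match the lesson's actual dataset.

```python
import pandas as pd

# Hypothetical columns; names containing 'fb' are assumed for illustration
movie = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "Inception"],
    "director_fb": [0, 0, 22000],
    "actor_1_fb": [1000, 11000, 29000],
    "duration": [178, 148, 148],
})

# Keep every column whose name contains the substring 'fb'
fb_cols = movie.filter(like="fb")
print(list(fb_cols.columns))
```

By default, filter searches the column names of a DataFrame and never looks at the data itself.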
The filter method also allows column names to be searched with regular expressions through the regex parameter. Here, we search for all columns that have a digit somewhere in their name.
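The regex search can be sketched as below, where the pattern `\d` matches any digit. The `movie` DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset (illustrative values)
movie = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre", "Inception"],
    "actor_1_name": ["Sam Worthington", "Daniel Craig", "Leonardo DiCaprio"],
    "actor_2_name": ["Zoe Saldana", "Christoph Waltz", "Tom Hardy"],
    "duration": [178, 148, 148],
})

# Keep every column whose name contains at least one digit
digit_cols = movie.filter(regex=r"\d")
print(list(digit_cols.columns))
```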
The generic form for selecting rows and columns together is shown below. The .iloc indexer selects by integer position, while .loc selects by label:
df.iloc[rows, columns]
df.loc[rows, columns]
Here, we select the first few rows and the first few columns with slice notation.
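A small sketch of slicing with both indexers follows. The `movie` DataFrame, indexed by title, is a hypothetical stand-in with illustrative values. Note that .iloc slices exclude the stop position, while .loc slices include the stop label.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset, indexed by title
movie = pd.DataFrame(
    {
        "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan"],
        "title_year": [2009, 2015, 2010],
        "duration": [178, 148, 148],
        "imdb_score": [7.9, 6.8, 8.8],
    },
    index=["Avatar", "Spectre", "Inception"],
)

# First two rows and first three columns by integer position
by_position = movie.iloc[:2, :3]

# The same block by label; both end labels are included
by_label = movie.loc[:"Spectre", :"duration"]

print(by_position.equals(by_label))
```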
To select all the rows of two different columns, we can use a bare colon as the row slice, as in the examples.
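One way to sketch this is with a bare colon for the rows and a list for the two columns. The `movie` DataFrame is a hypothetical stand-in with illustrative values.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset, indexed by title
movie = pd.DataFrame(
    {
        "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan"],
        "title_year": [2009, 2015, 2010],
        "duration": [178, 148, 148],
    },
    index=["Avatar", "Spectre", "Inception"],
)

# The bare colon means "all rows"; the list picks two columns by position
two_by_position = movie.iloc[:, [0, 2]]

# The same selection by column label
two_by_label = movie.loc[:, ["director_name", "duration"]]

print(two_by_position.equals(two_by_label))
```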
In this example, we select non-adjacent rows and columns:
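Passing lists to both positions of the indexer picks out rows and columns that are not next to each other. This is a sketch on a hypothetical `movie` DataFrame with illustrative values.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's movie dataset, indexed by title
movie = pd.DataFrame(
    {
        "director_name": ["James Cameron", "Sam Mendes", "Christopher Nolan"],
        "title_year": [2009, 2015, 2010],
        "duration": [178, 148, 148],
        "imdb_score": [7.9, 6.8, 8.8],
    },
    index=["Avatar", "Spectre", "Inception"],
)

# First and third rows, first and fourth columns, by integer position
picked_by_position = movie.iloc[[0, 2], [0, 3]]

# The same rows and columns by label
picked_by_label = movie.loc[
    ["Avatar", "Inception"], ["director_name", "imdb_score"]
]

print(picked_by_position.equals(picked_by_label))
```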