Slicing, Combining, Removing Data

You may not need to work with all the data in a dataset. Slicing and combining data gives you a subset suited to your analysis. The following sections describe several ways to obtain specific pieces of data to meet particular needs.

Slicing Rows

Slicing can occur in multiple ways when working with data. Say you want to see the first 10 entries of homes.csv. You can do this as follows.
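A minimal sketch of row slicing is given here. Since homes.csv itself is not reproduced on this page, the example builds a small stand-in table in memory; the values are invented, and only the column names Sell, List, and Taxes come from the lesson.

```python
import io
import pandas as pd

# Stand-in for homes.csv (invented values, 25 rows).
# With the real file you would write: homes = pd.read_csv("homes.csv")
csv_text = "Sell,List,Taxes\n" + "\n".join(
    f"{100 + i},{110 + i},{3000 + 10 * i}" for i in range(25)
)
homes = pd.read_csv(io.StringIO(csv_text))

first_ten = homes[:10]    # rows at positions 0 through 9
last_ten = homes[-10:]    # the final ten rows
print(first_ten)
```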

Here, we use [:10] to get the first 10 entries. This is the same slice notation used for lists, and you can change the bounds to whatever you need. The last ten entries, for example, would be [-10:].

Slicing Columns

In homes.csv, say we only want to see the columns Sell and List. We selected a single column before, and the same approach extends to multiple columns.
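One way to select several columns is to pass a list of column names inside the brackets. The sketch below again uses an invented stand-in for homes.csv, so treat the numbers as placeholders.

```python
import io
import pandas as pd

# Stand-in for homes.csv (invented values, 25 rows).
csv_text = "Sell,List,Taxes\n" + "\n".join(
    f"{100 + i},{110 + i},{3000 + 10 * i}" for i in range(25)
)
homes = pd.read_csv(io.StringIO(csv_text))

# The inner brackets are a Python list of the column names to keep.
sell_list = homes[["Sell", "List"]]

# Show only the first 10 rows of the two selected columns.
print(sell_list[:10])
```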

As with rows, [:10] limits the output to the first 10 entries, and [-10:] would give the last ten.

Adding new cases

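You often need to combine datasets in various ways. The following example shows how to add new cases: we combine the two CSV files using .append(), much like appending to a list. (DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, pd.concat() does the same job.)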
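The sketch below stacks two small tables. It swaps in pd.concat() for .append(), because DataFrame.append() is no longer available in pandas 2.0 and later. The names homes and more_homes and all values are hypothetical stand-ins for the two files.

```python
import io
import pandas as pd

# Two invented tables standing in for the two CSV files.
homes = pd.read_csv(io.StringIO("Sell,List,Taxes\n142,160,3167\n175,180,4033\n"))
more_homes = pd.read_csv(io.StringIO("Sell,List,Taxes\n129,132,1471\n138,140,3204\n"))

# Stack the rows of the second table under the first.
h_table = pd.concat([homes, more_homes])
print(list(h_table.index))    # each table keeps its own labels: [0, 1, 0, 1]

# Renumber the rows; drop=True discards the old labels
# instead of keeping them as an extra column.
h_table = h_table.reset_index(drop=True)
print(list(h_table.index))    # [0, 1, 2, 3]
```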

We make a new combined table called h_table. Since it still uses the old index values from the two files, .reset_index() renumbers the rows.
Documentation: append, reset_index

Adding new variables

Sometimes you need to add a new variable (column) to the DataFrame. Here, we add only the name and age of each house's owner from homes_owner.csv using join().
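A rough sketch of the join follows. The contents of homes_owner.csv are not shown in this lesson, so the column names Name, Age, and City, along with every value, are assumptions made for illustration.

```python
import io
import pandas as pd

# Invented stand-ins for homes.csv and homes_owner.csv.
homes = pd.read_csv(io.StringIO(
    "Sell,List,Taxes\n142,160,3167\n175,180,4033\n129,132,1471\n"
))
owners = pd.read_csv(io.StringIO(
    "Name,Age,City\nAnn,41,Ames\nBob,58,Boone\nCy,36,Cary\n"
))

# Keep only the two owner columns we want, then attach them to homes.
# Rows are paired up by index value (0 with 0, 1 with 1, and so on).
homes_with_owner = homes.join(owners[["Name", "Age"]])
print(homes_with_owner)
```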

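Note that the resulting DataFrame matches cases with the same index value, so indexing is important. The number of cases in both DataFrame objects should match; by default, join() keeps every row of the calling DataFrame and fills rows that have no match with NaN.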
Documentation: join

Removing Row

At some point, you may need to remove cases or variables from a dataset because they aren't required. As in the previous lesson, we use drop() for this. Let's say the entries with index 1 and 25 through 29 are not needed.
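Dropping those rows might look like the sketch below. It assumes the range 25 through 29 is inclusive and uses an invented 30-row stand-in for homes.csv.

```python
import io
import pandas as pd

# Stand-in for homes.csv (invented values, 30 rows, index 0 to 29).
csv_text = "Sell,List,Taxes\n" + "\n".join(
    f"{100 + i},{110 + i},{3000 + 10 * i}" for i in range(30)
)
homes = pd.read_csv(io.StringIO(csv_text))

# Index labels to remove: 1 and 25, 26, 27, 28, 29.
rows_to_drop = [1] + list(range(25, 30))

# drop() returns a new DataFrame; homes itself is left unchanged.
trimmed = homes.drop(rows_to_drop)
print(len(homes), len(trimmed))
```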

Documentation: drop

Removing Column

Removing a column works differently: you refer to the column by name. Say we want to remove the "Taxes" column from homes.csv.

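You must specify an axis as part of the removal process: 1 for columns (the default, 0, refers to rows). The axis was traditionally passed as the second argument; writing it as the keyword axis=1 is clearer and works across pandas versions.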
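As a small illustration, the sketch below removes the "Taxes" column from an invented stand-in for homes.csv, passing the axis by keyword.

```python
import io
import pandas as pd

# Stand-in for homes.csv (invented values).
homes = pd.read_csv(io.StringIO(
    "Sell,List,Taxes\n142,160,3167\n175,180,4033\n129,132,1471\n"
))

# axis=1 tells drop() that "Taxes" is a column name, not a row label.
no_taxes = homes.drop("Taxes", axis=1)
print(list(no_taxes.columns))    # ['Sell', 'List']
```

Because drop() returns a new DataFrame, homes still contains the "Taxes" column afterward; assign the result back to homes if you want the change to stick.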

Exercise 1.5

Here, we have cities.csv that we've seen before.

1. Get the first 100 entries from cities.csv and save them in a variable.
2. The columns with the headings "NS" and "EW" are not really useful. Remove those two columns from your variable.
3. Add the name, height, and weight from representatives.csv to your variable. Name should appear in the first column. (Note: the two datasets are not related.)
4. The entries with index 91 and 98 contain some wrong information. Remove those two entries from your variable. Print the final result.
5. Drop all the rows whose entry in the column "State" is CA.
6. Save the new data (the result of steps 1-5) in a new CSV file called cities_no_CA.csv.
