❯ 3.3 Discretization
⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺
EXPECTED COMPLETION TIME
❲▹❳ Video 8m 4s
☷ Interactive readings 5m
Suppose you have data about a group of people in a study, and you want to group them into discrete age buckets.
Let's divide the ages into bins of 18 to 25, 26 to 35, 36 to 60, and finally 61 and older. To do so, use cut, a function in pandas.
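As a minimal sketch of this binning step: the ages list and the upper edge of 100 are assumptions made for illustration, not data from the study. By default, cut builds intervals that are open on the left and closed on the right, so (18, 25] holds ages 19 through 25.

```python
import pandas as pd

# Hypothetical sample ages (illustrative only).
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]

# Bin edges for (18, 25], (25, 35], (35, 60], (60, 100].
# The upper edge of 100 is an assumed cap for "61 and older".
bins = [18, 25, 35, 60, 100]

# Each age is assigned to the interval that contains it.
age_categories = pd.cut(ages, bins)
print(age_categories)
```

Passing right=False to cut switches the closed side, giving intervals such as [18, 25).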
The object that cut returns is a Categorical. It contains a categories array specifying the distinct category names, along with a labeling for the ages data in the codes attribute.
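The two attributes can be inspected directly. This sketch reuses hypothetical ages and bin edges (both assumptions) and also counts how many ages fall into each bin.

```python
import pandas as pd

# Hypothetical sample ages and bin edges (illustrative only).
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
age_categories = pd.cut(ages, bins)

# The distinct intervals, in order.
print(age_categories.categories)

# One integer per age: the position of its interval in categories.
print(age_categories.codes)

# Number of ages that fall into each bin.
bin_counts = pd.Series(age_categories).value_counts(sort=False)
print(bin_counts)
```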
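We can also replace the labels. Pass a list of names to the labels option of cut, or, for a Series or DataFrame indexed by bin, relabel the index using reset_index and set_index.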
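One way to get readable labels, shown here as a sketch, is the labels option of cut. The group names, ages, and bin edges below are hypothetical; the list of names must have one entry per bin.

```python
import pandas as pd

# Hypothetical sample ages and bin edges (illustrative only).
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

# Hypothetical bin names, one per interval, in bin order.
group_names = ["Youth", "YoungAdult", "MiddleAged", "Senior"]

# The names replace the default interval labels.
named_categories = pd.cut(ages, bins, labels=group_names)
print(named_categories)
```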
If you pass an integer number of bins to cut instead of explicit bin edges, it will compute equal-length bins based on the minimum and maximum values in the data.
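A short sketch of the integer form, using randomly generated data (the seed and sample size are arbitrary assumptions). Passing retbins=True also returns the computed edges, which makes the equal widths visible. Note that pandas nudges the lowest edge down slightly so that the minimum value is included in the first bin.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 20 uniform random values (seed chosen arbitrarily).
rng = np.random.default_rng(12345)
data = rng.uniform(size=20)

# Ask for 4 bins; cut derives the edges from data.min() and data.max().
equal_width, edges = pd.cut(data, 4, retbins=True)

print(equal_width.categories)
print(np.diff(edges))  # widths of the four bins
```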
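A closely related function, qcut, bins the data based on sample quantiles. Depending on the distribution of the data, cut will not usually put the same number of data points in each bin. Because qcut uses sample quantiles instead, it produces bins that contain roughly equal numbers of data points.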
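The difference shows up when both functions are applied to the same data. This sketch assumes 1,000 normally distributed values (seed and size are arbitrary) and compares the per-bin counts.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(12345)
data = rng.standard_normal(1000)

# qcut with 4 bins splits at the sample quartiles.
quartiles = pd.qcut(data, 4)
quartile_counts = pd.Series(quartiles).value_counts(sort=False)
print(quartile_counts)

# cut with 4 bins splits the value range into equal widths instead,
# so the middle bins hold far more points than the outer ones.
width_counts = pd.Series(pd.cut(data, 4)).value_counts(sort=False)
print(width_counts)
```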
Just as you can pass cut your own bin edges, you can pass qcut your own quantiles (numbers between 0 and 1, inclusive).
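For instance, a list of quantiles can isolate the tails of a distribution. The quantile values and the generated data below are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(12345)
data = rng.standard_normal(1000)

# Custom quantile edges: bottom 10%, next 40%, next 40%, top 10%.
tail_bins = pd.qcut(data, [0, 0.1, 0.5, 0.9, 1.0])

# Count the points in each bin, in bin order.
tail_counts = pd.Series(tail_bins).value_counts(sort=False)
print(tail_counts)
```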