❯ 4.5 RegEx for Data Cleaning
⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺
EXPECTED COMPLETION TIME
❲▹❳ Video 12m 28s
☷ Interactive readings 5m
✑ Practice 4.5 (G Colab) 25m
In data cleaning, some corrupted characters can be deleted or replaced.
re.sub(pattern, sub, string)
replaces every match of pattern in string with sub and returns the resulting string. Replacing with an empty string "" deletes the matches.
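A minimal sketch of both uses of re.sub(), deleting and replacing. The sample strings are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical messy value containing corrupted characters (# and @)
raw = "Pri#ce: $1,2@00"

# Delete: replace every unwanted character with an empty string
deleted = re.sub(r"[#@]", "", raw)
print(deleted)  # Price: $1,200

# Replace: swap every run of whitespace (spaces, tabs) for a single space
spaced = re.sub(r"\s+", " ", "New   York\tCity")
print(spaced)  # New York City
```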
A list comprehension is a convenient way to apply the same cleaning step to every item in a list.
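As an illustration, assuming a hypothetical list of phone numbers written with inconsistent punctuation, one list comprehension can strip everything except the digits from each item:

```python
import re

# Hypothetical phone numbers with inconsistent separators
phones = ["(03) 1234-5678", "03.1234.5678", "03 1234 5678"]

# \D matches any non-digit character; deleting it leaves only the digits
cleaned = [re.sub(r"\D", "", p) for p in phones]
print(cleaned)  # ['0312345678', '0312345678', '0312345678']
```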
The example in this lesson combines everything we have learned so far.
Wrapping the cleaning steps in a function is handy because the same function can be reused on multiple data sets.
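A sketch of this pattern, assuming a hypothetical helper named clean_text() and two made-up data sets. It is not necessarily the same set of steps used in the lesson video; it just shows how re.sub(), strip(), and a list comprehension can be bundled into one reusable function.

```python
import re

def clean_text(text):
    """Hypothetical helper: lowercase, drop symbols, collapse and trim whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)  # delete anything that is not a letter, digit, or whitespace
    text = re.sub(r"\s+", " ", text)         # collapse runs of whitespace into one space
    return text.strip()                      # trim the start and end

# Two hypothetical data sets cleaned by the same function
dataset_a = ["  Hello, World!! ", "Data   Science#101"]
dataset_b = ["RegEx\tis *fun*"]

cleaned_a = [clean_text(t) for t in dataset_a]
cleaned_b = [clean_text(t) for t in dataset_b]
print(cleaned_a)  # ['hello world', 'data science101']
print(cleaned_b)  # ['regex is fun']
```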
strip() removes unnecessary whitespace from the start and end of a string.
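A short example of strip() on hypothetical values. Note that whitespace inside the string is left untouched, and the related methods lstrip() and rstrip() trim only one side.

```python
# strip() with no argument removes leading and trailing spaces, tabs, and newlines
city = "\t  Tokyo \n".strip()
print(repr(city))       # 'Tokyo'

# Internal spaces are kept; only the ends are trimmed
name = "  New York  ".strip()
print(repr(name))       # 'New York'

# lstrip() trims only the left side, rstrip() only the right side
left = "  x  ".lstrip()
right = "  x  ".rstrip()
print(repr(left))       # 'x  '
print(repr(right))      # '  x'
```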