AIMD GPDS Courses
  • Home
  • Courses
  • Contact



Lesson 4.4: Intro to RegEx

EXPECTED COMPLETION TIME
❲▹❳  Video   8m 18s
☷  Interactive readings   5m
✑  Practice 4.4 (G Colab)   15m

Finding All Patterns in a String

A regular expression (RegEx) is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.

The Python module re needs to be imported first. One of the simplest re functions is

re.findall(pattern, string).
This function returns a list containing all non-overlapping matches of pattern in string, in the order they are found. If nothing matches, the list is empty.
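As a minimal sketch of re.findall, the example below searches a made-up sentence for the literal pattern "ain". The sentence and the variable names are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample sentence, used only for illustration
text = "The rain in Spain stays mainly in the plain"

# Collect every occurrence of the literal pattern "ain"
matches = re.findall("ain", text)
print(matches)  # ['ain', 'ain', 'ain', 'ain']

# A pattern that never occurs gives an empty list
no_matches = re.findall("xyz", text)
print(no_matches)  # []
```

The length of the returned list, len(matches), is a quick way to count how many times the pattern occurs.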

Splitting a String

Splitting a string can be done by using

re.split(pattern, string).
This function returns a list of the pieces obtained by splitting string at each match of pattern.

\s matches any whitespace character (including tab and newline).
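The short sketch below shows re.split with a literal comma and with \s. The sample strings are assumptions made up for this example.

```python
import re

# Hypothetical sample strings, used only for illustration
colors = "red,green,blue"
fruits = "apple banana\tcherry\ndate"

# Split at every comma
color_list = re.split(",", colors)
print(color_list)  # ['red', 'green', 'blue']

# Split at every whitespace character: space, tab, or newline
fruit_list = re.split(r"\s", fruits)
print(fruit_list)  # ['apple', 'banana', 'cherry', 'date']
```

Writing the pattern as a raw string (r"...") keeps Python from interpreting the backslash itself, so \s reaches the re module unchanged.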

Matching Multiple Possibilities

Square brackets [ ] define a set of characters. The set matches any single character listed inside it, so one pattern can cover several possibilities.

Insert a backslash \ before a special character such as ), ], -, or ^ to include it in your pattern as a literal character.
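A small sketch of both ideas follows: a set that accepts two spellings of one word, and a set made of escaped special characters. The sample strings are assumptions chosen for illustration.

```python
import re

# [ae] matches either 'a' or 'e' at that position
spellings = re.findall("gr[ae]y", "gray or grey? (maybe)")
print(spellings)  # ['gray', 'grey']

# Escaped special characters inside a set are matched literally
symbols = re.findall(r"[\)\]\-\^]", "a) b] c-d e^f")
print(symbols)  # [')', ']', '-', '^']
```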

Matching Numbers and Alphabets

The square bracket [ ] can also contain a range of characters separated by a hyphen (-), in which case it matches any single character within the range. 

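[A-Za-z] matches any ASCII letter. For upper or lower case only, use [A-Z] or [a-z]. Avoid [A-z]: that range also covers the characters [, \, ], ^, _, and `, which sit between Z and a in the character table.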

[0-9] matches any digit character.
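The ranges can be compared side by side in the sketch below. The sample string "Ab_9z" is an assumption picked so that each range returns something different. It also shows that the range A-z picks up the underscore, because the underscore lies between Z and a in the character table.

```python
import re

# Hypothetical sample string, used only for illustration
sample = "Ab_9z"

upper = re.findall("[A-Z]", sample)
lower = re.findall("[a-z]", sample)
digits = re.findall("[0-9]", sample)
letters = re.findall("[A-Za-z]", sample)
too_wide = re.findall("[A-z]", sample)

print(upper)     # ['A']
print(lower)     # ['b', 'z']
print(digits)    # ['9']
print(letters)   # ['A', 'b', 'z']
print(too_wide)  # ['A', 'b', '_', 'z']  (the underscore is included)
```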

Matching Repeating Patterns

To match multiple repetitions of a pattern, you can use the following:

The plus symbol + matches one or more repetitions of the preceding regex.

{m} matches exactly m repetitions of the preceding regex.

{m,n} matches from m to n repetitions of the preceding regex, with both m and n included.
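The sketch below applies the three repetition forms to the digit set [0-9]. The sample string is an assumption made up for this example.

```python
import re

# Hypothetical sample string, used only for illustration
numbers = "7 42 123 2023"

# + : one or more digits in a row
one_or_more = re.findall("[0-9]+", numbers)
print(one_or_more)  # ['7', '42', '123', '2023']

# {4} : exactly four digits in a row
exactly_four = re.findall("[0-9]{4}", numbers)
print(exactly_four)  # ['2023']

# {2,3} : two or three digits in a row, taking as many as allowed
two_to_three = re.findall("[0-9]{2,3}", numbers)
print(two_to_three)  # ['42', '123', '202']
```

In the last result, '202' is the first three digits of 2023. The remaining '3' is a single digit, which is too short for {2,3}, so it is not returned.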

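Returning Only the Necessary Part of a Pattern

Parentheses ( ) can be used to group part of a pattern. When a pattern contains a group, re.findall returns only the text matched by the group instead of the whole match. With more than one group, each match is returned as a tuple.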
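As a hedged illustration, the sketch below runs re.findall on the same text without a group, with one group, and with two groups. The sample text and its labels are assumptions made up for this example.

```python
import re

# Hypothetical sample text, used only for illustration
prices = "Price: 100 yen, Cost: 250 yen"

# No group: each whole match is returned
whole = re.findall("[0-9]+ yen", prices)
print(whole)  # ['100 yen', '250 yen']

# One group: only the part inside ( ) is returned
amounts = re.findall("([0-9]+) yen", prices)
print(amounts)  # ['100', '250']

# Two groups: each match becomes a tuple with one item per group
pairs = re.findall("([A-Za-z]+): ([0-9]+)", prices)
print(pairs)  # [('Price', '100'), ('Cost', '250')]
```

The text outside the parentheses still has to match, so " yen" works as context that decides which numbers are picked up without appearing in the result.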

©2023. All rights reserved.  Samy Baladram,
Graduate Program in Data Science - GSIS - Tohoku University