❯ 4.4 Intro to RegEx
⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺⎺
EXPECTED COMPLETION TIME
❲▹❳ Video 8m 18s
☷ Interactive readings 5m
✑ Practice 4.4 (G Colab) 15m
A regular expression (RegEx) is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.
The Python module re needs to be imported first. One of the simplest re functions is
re.findall(pattern, string).
This function returns a list containing all matches of pattern in string.
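As a minimal sketch of re.findall, the snippet below searches a sample sentence for a literal pattern. The sentence and the variable names are illustrative assumptions, not part of the lesson.

```python
import re

# Sample text chosen for illustration only.
text = "The rain in Spain stays mainly in the plain"

# findall scans left to right and collects every non-overlapping match.
matches = re.findall("ain", text)
print(matches)  # ['ain', 'ain', 'ain', 'ain']

# When nothing matches, the result is an empty list rather than an error.
no_matches = re.findall("xyz", text)
print(no_matches)  # []
```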
Splitting a string can be done by using
re.split(pattern, string).
This function returns a list where the string has been split at each match of pattern.
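A short sketch of re.split, assuming a comma-and-space separator in a made-up record string:

```python
import re

# Hypothetical input: items separated by a comma followed by a space.
record = "apple, banana, cherry"

# The separator itself is consumed and does not appear in the result.
parts = re.split(", ", record)
print(parts)  # ['apple', 'banana', 'cherry']
```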
\s matches any whitespace character (including tab and newline).
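The following sketch shows \s matching a space, a tab, and a newline alike. The sample string is an assumption for illustration. The r prefix (a raw string) is a common Python convention for regex patterns, used here so that the backslash reaches re unchanged.

```python
import re

# Sample text containing a space, a tab (\t), and a newline (\n).
text = "one two\tthree\nfour"

# Each single whitespace character is a separate match.
whitespace = re.findall(r"\s", text)
print(whitespace)  # [' ', '\t', '\n']

# Splitting on \s breaks the text at every whitespace character.
words = re.split(r"\s", text)
print(words)  # ['one', 'two', 'three', 'four']
```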
Square brackets [ ] define a set of characters; the set matches any single character listed inside it.
A backslash \ can be inserted before special characters such as ), ], -, or ^ if you want to match them literally in your pattern.
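A small sketch of character sets and escaping, using made-up sample strings. The first pattern matches any one of the listed vowels; the second escapes ], -, and ^ so they are treated as ordinary characters inside the set.

```python
import re

# A set matches exactly one character out of those listed.
vowels = re.findall(r"[aeiou]", "regular expression")
print(vowels)  # ['e', 'u', 'a', 'e', 'e', 'i', 'o']

# Escaping ], -, and ^ with a backslash makes them literal set members.
specials = re.findall(r"[\]\-\^]", "a-b^c]d")
print(specials)  # ['-', '^', ']']
```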
Square brackets [ ] can also contain a range of characters separated by a hyphen (-), in which case they match any single character within the range.
[A-Za-z] matches any ASCII alphabetic character. For upper or lower case only, use [A-Z] or [a-z]. Avoid [A-z]: that range also includes the characters [, \, ], ^, _, and `, which sit between Z and a in ASCII.
[0-9] matches any digit character.
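The ranges above can be sketched as follows, with sample strings invented for illustration. The last two lines compare [A-Za-z] with [A-z] on a string containing _ and ^, which fall between Z and a in ASCII and are therefore matched by [A-z].

```python
import re

text = "Room 42B, Floor 7"  # illustrative sample text

digits = re.findall(r"[0-9]", text)
print(digits)  # ['4', '2', '7']

uppers = re.findall(r"[A-Z]", text)
print(uppers)  # ['R', 'B', 'F']

lowers = re.findall(r"[a-z]", text)
print(lowers)  # ['o', 'o', 'm', 'l', 'o', 'o', 'r']

# [A-Za-z] keeps only letters; [A-z] also lets _ and ^ through.
letters_only = re.findall(r"[A-Za-z]", "a_b^c")
print(letters_only)  # ['a', 'b', 'c']
too_wide = re.findall(r"[A-z]", "a_b^c")
print(too_wide)  # ['a', '_', 'b', '^', 'c']
```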
To match multiple repetitions of a pattern, you can use the following:
The plus symbol + matches one or more repetitions of the preceding regex (the asterisk * matches zero or more).
{m} matches exactly m repetitions of the preceding regex.
{m,n} matches from m to n repetitions of the preceding regex, with both m and n included.
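A sketch of the repetition operators on a made-up string of repeated letters. Keep in mind that re.findall scans left to right, takes as many repetitions as the pattern allows, and never reuses a character in two matches.

```python
import re

text = "a aa aaa aaaa"  # illustrative sample text

# + needs at least one "a" and takes as many as it can.
one_or_more = re.findall(r"a+", text)
print(one_or_more)  # ['a', 'aa', 'aaa', 'aaaa']

# {2} takes exactly two; "aaa" yields one pair, "aaaa" yields two pairs.
exactly_two = re.findall(r"a{2}", text)
print(exactly_two)  # ['aa', 'aa', 'aa', 'aa']

# {2,3} takes two or three; the upper bound 3 is included.
two_to_three = re.findall(r"a{2,3}", text)
print(two_to_three)  # ['aa', 'aaa', 'aaa']
```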
Parentheses ( ) can be used to group part of a pattern. With re.findall, a group can also be used to return only the grouped part of each match in the list.
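The effect of a group on re.findall can be sketched like this, using a hypothetical key=value string. Without parentheses the whole match is returned; with one group only the grouped part is returned; with several groups each match becomes a tuple.

```python
import re

text = "id=17, id=204, name=x"  # hypothetical sample text

# No group: each list item is the entire match.
whole = re.findall(r"id=[0-9]+", text)
print(whole)  # ['id=17', 'id=204']

# One group: only the part inside the parentheses is returned.
numbers = re.findall(r"id=([0-9]+)", text)
print(numbers)  # ['17', '204']

# Two groups: each match is returned as a tuple of the grouped parts.
pairs = re.findall(r"([a-z]+)=([0-9]+)", text)
print(pairs)  # [('id', '17'), ('id', '204')]
```

Note that name=x is skipped in the last call because x is not a digit.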