5.4. Finding a Single Character

The re module contains many useful functions and methods. In this lesson, we will see how to find a single character in a string.

[ ]

Specifies a set of characters to match

Characters enclosed in square brackets ([ ]) form a character class. A character class matches any single character contained in the class.

> The metacharacter sequence [aiueo] matches any single 'a', 'i', 'u', 'e', or 'o' character. For example, the regex st[aiueo] matches both 'sta' and 'sto' (and would also match 'sti', 'stu', and 'ste').
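As a minimal sketch of a character class in use, the snippet below searches a sample string (made up for illustration) for 'st' followed by exactly one vowel. The variable names are assumptions, not part of the lesson.

```python
import re

# Sample text invented for this illustration.
text = "stack, stick, stuck, stop, step, string"

# 's', then 't', then exactly one character from the class [aiueo].
vowel_matches = re.findall(r"st[aiueo]", text)

# 'string' is skipped because 'r' is not in the class.
print(vowel_matches)  # ['sta', 'sti', 'stu', 'sto', 'ste']
```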

A character class can also contain a range of characters separated by a hyphen (-), in which case it matches any single character within the range.

[a-z] matches any lowercase alphabetic character between 'a' and 'z'.

[0-9] matches any digit character.

[0-9a-fA-F] matches any hexadecimal digit character.

> For example, [0-9][0-9][0-9] matches a sequence of three digits.
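The ranges above can be tried out with re.findall. This is a small sketch on sample strings chosen for illustration; the variable names are hypothetical.

```python
import re

# Three consecutive digits: only '404' qualifies in this sample.
three_digits = re.findall(r"[0-9][0-9][0-9]", "Room 404 uses color #1aF and code 7x9")
print(three_digits)  # ['404']

# Hexadecimal digits: 'x' and 'G' fall outside 0-9, a-f, and A-F.
hex_digits = re.findall(r"[0-9a-fA-F]", "0x1aG")
print(hex_digits)  # ['0', '1', 'a']

# Lowercase letters only: 'P', '3', and '!' are not in a-z.
lowercase = re.findall(r"[a-z]", "Py3!")
print(lowercase)  # ['y']
```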

Not in the class: ^

Complement a character class

You can complement a character class by specifying ^ as the first character, in which case it matches any character that is not in the set.

> The metacharacter sequence [^aiueo] matches any single character that is not 'a', 'i', 'u', 'e', or 'o'. For example, the regex st[^aiueo] matches 'str' and 'sty', but not 'sta' or 'sto'.
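To see a complemented class in action, here is a short sketch on an invented sample string. It also shows that ^ only negates the class when it comes first; elsewhere in the class it is an ordinary character.

```python
import re

# [^aiueo] matches every character that is NOT one of the five vowels,
# including the space and the digits.
non_vowels = re.findall(r"[^aiueo]", "audio 42")
print(non_vowels)  # ['d', ' ', '4', '2']

# When ^ is not the first character, it is matched literally.
literal_caret = re.findall(r"[a^]", "a^b")
print(literal_caret)  # ['a', '^']
```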

Escape Metacharacter: \

Removes a metacharacter's special meaning

What if you want to include a special character such as ], -, or ^ in your character class? You can escape it with a backslash (\) so that it is matched literally.
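A minimal sketch of escaping, using made-up sample strings: the first pattern escapes ], -, and ^ inside a character class, and the second escapes a dot outside a class so that it matches only a literal period.

```python
import re

# Inside the class, each backslash makes the next character literal.
specials = re.findall(r"[\]\-\^]", "a]b-c^d")
print(specials)  # [']', '-', '^']

# Outside a class, \. matches a literal dot rather than any character.
literal_dots = re.findall(r"\.", "v3.5.x")
print(literal_dots)  # ['.', '.']
```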

Any Character: .

Specifies a wildcard

The . metacharacter matches any single character except a newline.
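The sketch below illustrates the wildcard on an invented sample string, including the newline exception. It also uses the re.DOTALL flag, which is not covered in this lesson, to show how that exception can be lifted.

```python
import re

# Sample text: note the newline between 'c' and 't' in the third item.
text = "cat cot c\nt c-t ct"

# By default, . matches any single character except a newline.
dot_default = re.findall(r"c.t", text)
print(dot_default)  # ['cat', 'cot', 'c-t']

# With re.DOTALL, . also matches the newline.
dot_all = re.findall(r"c.t", text, flags=re.DOTALL)
print(dot_all)  # ['cat', 'cot', 'c\nt', 'c-t']
```

Note that 'ct' never matches, because . must consume exactly one character.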

Word Character: \w \W

Match based on whether a character is a word character.

\w matches any alphanumeric word character (uppercase and lowercase letters, digits, and the underscore (_) character). \W is the opposite. It matches any non-word character.

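\w is equivalent to [a-zA-Z0-9_] for ASCII text. (In Python 3, \w in a str pattern also matches Unicode letters and digits unless the re.ASCII flag is set.)
\W is equivalent to [^a-zA-Z0-9_] under the same condition.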
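As a quick sketch of \w and \W, the example below uses a made-up string containing one accented letter. It shows that, for Python 3 str patterns, \w is wider than [a-zA-Z0-9_] unless re.ASCII is passed.

```python
import re

# Sample text with an accented letter (U+00E9, 'e' with acute accent).
text = "Py_3 \u00e9!"

# Default (Unicode) matching: the accented letter counts as a word character.
word_chars = re.findall(r"\w", text)
print(word_chars)  # ['P', 'y', '_', '3', 'é']

# With re.ASCII, \w behaves like [a-zA-Z0-9_].
ascii_word_chars = re.findall(r"\w", text, flags=re.ASCII)
print(ascii_word_chars)  # ['P', 'y', '_', '3']

# \W matches everything \w does not: here, the space and '!'.
non_word_chars = re.findall(r"\W", text)
print(non_word_chars)  # [' ', '!']
```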

Digits: \d \D

Match based on whether a character is a decimal digit.

\d matches any decimal digit character. \D is the opposite. It matches any character that isn’t a decimal digit.

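\d is equivalent to [0-9] for ASCII text. (In Python 3, \d in a str pattern also matches other Unicode decimal digits unless the re.ASCII flag is set.)
\D is equivalent to [^0-9] under the same condition.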
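Here is a short sketch of \d and \D on invented sample strings. Repeating \d four times is one simple way to pick out four-digit numbers such as years.

```python
import re

# \d picks out each decimal digit individually.
digits = re.findall(r"\d", "Call 555-0199")
print(digits)  # ['5', '5', '5', '0', '1', '9', '9']

# \D picks out everything that is not a digit.
non_digits = re.findall(r"\D", "a1b2")
print(non_digits)  # ['a', 'b']

# Four digits in a row, e.g. a year.
years = re.findall(r"\d\d\d\d", "Released in 1991 and 2008.")
print(years)  # ['1991', '2008']
```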

Whitespace: \s \S

Match based on whether a character represents whitespace.

\s matches any whitespace character (including tab and newline). \S is the opposite of \s. It matches any character that isn't whitespace.
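The following sketch applies \s and \S to a made-up string that contains a space, a tab, and a newline.

```python
import re

# Sample text with three kinds of whitespace between the letters.
text = "a b\tc\nd"

# \s matches the space, the tab, and the newline.
whitespace = re.findall(r"\s", text)
print(whitespace)  # [' ', '\t', '\n']

# \S matches everything else.
non_whitespace = re.findall(r"\S", text)
print(non_whitespace)  # ['a', 'b', 'c', 'd']
```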

Exercise 5.4

Read the contents of python.txt into a variable. Then do the following:

  1. Find all the references
    Find all instances of two-digit reference numbers in the text. A two-digit reference looks like this: [12], [13], [15], etc. Print the number of matches and the matches themselves.


  2. Find all the versions
    Find all instances of Python versions that end in .x. It should capture the following: 3.5.x, 2.6.x, 2.7.x. Print the number of matches and the matches themselves.


  3. Find all years before a period or comma
    Find all instances of a year followed by a period or comma. It should capture the following: '2018,', '2000,', '2008.'. Print the number of matches and the matches themselves.


  4. Find all years not followed by a period or comma
    Find all instances of a year that is not followed by a period or comma. Print the number of matches and the matches themselves.