Here is more information about Python regular expressions:
The Python re module: http://docs.python.org/library/re.html
Regular expression HOWTO: http://docs.python.org/howto/regex.html
Always use Python raw strings, so that backslashes reach the regexp engine unchanged: r"…", r'…', r"""…""", or r'''…'''
r"." matches any character
r"^…" / r"…$" matches the start / end of the string
r"…?" / r"…*" / r"…+" matches a regexp at most once / any number of times / at least once
r"…??" / r"…*?" / r"…+?" are the non-greedy alternatives
r"…|…|…" matches either this or that or that
r"[…]" matches any of the characters given
r"[^…]" matches any character not given
r"[a-z]" matches any character between "a" and "z" (according to the Unicode order)
r"[…-]" matches a minus (in addition to the other given characters)
r"[]…]" matches a closing bracket (in addition to the other given characters), provided it comes first
backslashed characters
r"\." / r"\[" / r"\\" / etc. matches the literal symbol
r"\s" matches a whitespace character
r"\S" matches non-whitespace
r"\w" matches a letter or a digit or underscore
r"\W" matches non-(letter|digit|underscore)
r"\d" matches a digit
r"\D" matches non-digit
r"\b" matches a word boundary
i.e., the empty string, but only in the context of r"\W\w" or r"\w\W", or at the start / end of the string next to a r"\w" character
r"[^\W\d_]" matches only letters
try to understand why
the idea is taken from http://stackoverflow.com/questions/1673749
r"…(…)…" matches the whole regexp, but captures the part inside the parentheses so that you can look it up later
r"…(?:…)…" matches the regexp, but does not capture the part inside the parentheses
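A few of the rules above are easiest to see by trying them out. This is only a small sketch, assuming Python 3 (where str patterns are Unicode-aware by default); all sample strings are made up for illustration:

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" takes as much as possible, so the match runs to the last ">".
greedy = re.findall(r"<.*>", text)

# Non-greedy: ".*?" stops at the first ">" that lets the pattern succeed.
lazy = re.findall(r"<.*?>", text)

# "]" placed first and "-" placed last in a character class are literal.
specials = re.findall(r"[]a-c-]", "a-]d[")

# [^\W\d_] reads "not (non-word character, digit, or underscore)":
# a word character that is neither a digit nor "_", i.e. a letter.
letters = re.findall(r"[^\W\d_]+", "abc_123 déf")

# \b only matches at word boundaries, so the "cat" in "concat" is skipped.
words = re.findall(r"\bcat\b", "cat concat cat.")

# (…) captures, (?:…) only groups.
capturing = re.search(r"(ab)+c", "ababc").groups()
non_capturing = re.search(r"(?:ab)+c", "ababc").groups()

print(greedy)         # ['<b>bold</b> and <i>italic</i>']
print(lazy)           # ['<b>', '</b>', '<i>', '</i>']
print(specials)       # ['a', '-', ']']
print(letters)        # ['abc', 'déf']
print(words)          # ['cat', 'cat']
print(capturing)      # ('ab',)
print(non_capturing)  # ()
```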
These are the main regexp functions:
re.compile(r"…", [flags])
returns an object with all the methods below (but with no pattern argument, since this is already compiled)
re.split(r"…", string, [maxsplit])
returns a list of strings, split by the pattern
if there are capturing parentheses r"…(…)…" in the pattern, then their values are also returned as part of the list
re.findall(r"…", string, [flags])
returns a list of all substrings that match the pattern
if there are n capturing parentheses r"…(…)…" in the pattern, then it returns a list of n-tuples instead (for n = 1, a plain list of the captured strings)
re.sub(r"…", replacement, string, [count])
substitutes each occurrence of the pattern with the replacement string
if there are n capturing parentheses r"…(…)…" in the pattern, then you can use \1, …, \n as backreferences in the replacement string
re.search(r"…", string, [flags])
returns a MatchObject for the first match, or None if there is no match
read more about match objects here: http://docs.python.org/library/re.html#match-objects
re.finditer(r"…", string, [flags])
returns an iterator of MatchObjects that can be used in a for loop
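The functions above can be tried out side by side. The following is a minimal sketch, assuming Python 3; the input strings and variable names are invented for illustration only:

```python
import re

# re.compile: compile once, then call the same methods without a pattern.
word = re.compile(r"\w+")
compiled_words = word.findall("one two")

# re.split: without and with capturing parentheses.
plain_split = re.split(r",\s*", "a, b,c")
split_with_group = re.split(r"(,)\s*", "a, b,c")

# re.findall: no groups gives strings, two groups give 2-tuples.
numbers = re.findall(r"\d+", "3 apples, 12 pears")
pairs = re.findall(r"(\d+) (\w+)", "3 apples, 12 pears")

# re.sub: \1 and \2 refer back to the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")

# re.search: returns None when nothing matches.
missing = re.search(r"\d+", "no digits")

# re.finditer: loop over match objects.
found = [m.group() for m in re.finditer(r"\d+", "a1b22c333")]

print(compiled_words)    # ['one', 'two']
print(plain_split)       # ['a', 'b', 'c']
print(split_with_group)  # ['a', ',', 'b', ',', 'c']
print(numbers)           # ['3', '12']
print(pairs)             # [('3', 'apples'), ('12', 'pears')]
print(swapped)           # host at user
print(missing)           # None
print(found)             # ['1', '22', '333']
```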
Match objects are returned by re.search and re.finditer. They have the following methods:
.group() => returns the matching string
m.group() == m.group(0)
.group(k) => returns the kth capture group (k=0 means the whole match)
.groups() => returns a tuple of all capture groups
.start() => returns the start position of the match in the search string
.start(k) => returns the start position of the kth capture group
.end() => returns the end position of the match in the search string
.end(k) => returns the end position of the kth capture group
.span() => returns a tuple (start, end)
m.span(k) == (m.start(k), m.end(k))
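The match object methods can be checked on a single match. This is a short sketch, assuming Python 3; the pattern and the sample string are made up for illustration:

```python
import re

m = re.search(r"(\d+)-(\d+)", "pages 10-25 only")

whole = m.group()        # same as m.group(0)
first = m.group(1)       # first capture group
second = m.group(2)      # second capture group
all_groups = m.groups()  # tuple of all capture groups

print(whole, first, second)  # 10-25 10 25
print(all_groups)            # ('10', '25')
print(m.start(), m.end())    # 6 11
print(m.span())              # (6, 11)
print(m.span(2))             # (9, 11)
```

Positions are 0-based, and the end position is exclusive, so string[m.start():m.end()] gives back m.group().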