To match something that follows a prefix, without including the prefix in the match, use:
(?<=...)
To match something that precedes a suffix, without including the suffix in the match, use:
(?=...)
To match something enclosed by both a prefix and a suffix (e.g. brackets) without including either in the result, combine the two.
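import re
string = 'abc1,[2],3end'
# Digits preceded by 'abc' (prefix excluded from the match)
s1 = re.search(r'(?<=abc)[0-9]+', string)
# Digits followed by 'end' (suffix excluded from the match)
s2 = re.search(r'[0-9]+(?=end)', string)
# Digits enclosed in square brackets (brackets excluded from the match)
s3 = re.search(r'(?<=\[)[0-9]+(?=\])', string)
print(s1.group(0))
print(s2.group(0))
print(s3.group(0))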
output:
1
3
2
(?=regex) is a positive lookahead: it requires the pattern to appear immediately ahead of the current position, without including it in the match.
(?!regex) is a negative lookahead: it requires that the pattern does not appear immediately ahead.
Similarly, (?<=regex) is a positive lookbehind: it requires the pattern to appear immediately behind the current position, without including it in the match.
(?<!regex) is a negative lookbehind: it requires that the pattern does not appear immediately behind.
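The example earlier only exercises the positive forms. A minimal sketch of the two negative forms on the same sample string (the variable names are chosen here for illustration):

```python
import re

text = 'abc1,[2],3end'

# Negative lookahead: digits NOT immediately followed by 'end'
not_before_end = re.findall(r'[0-9]+(?!end)', text)

# Negative lookbehind: digits NOT immediately preceded by 'abc'
not_after_abc = re.findall(r'(?<!abc)[0-9]+', text)

print(not_before_end)  # ['1', '2']
print(not_after_abc)   # ['2', '3']
```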
Group
In .NET (and possibly other regex flavors), a pattern can be captured as a group and given a name, so that the group can be accessed later by that name.
(?'groupname'regex)
Each time the group matches, its capture is pushed onto a stack belonging to that group, so it can be visited later.
When the group name is prefixed with a minus sign, matching the pattern pops the most recent capture off that group's stack instead:
(?'-groupname'regex)
Matching a pair, e.g. left and right parentheses
(?'parenthesis'\()[a-z]+(?'-parenthesis'\))(?(parenthesis)(?!))
First, (?'parenthesis'\() matches a left parenthesis \( and pushes the capture onto the stack of the group named parenthesis.
Second, [a-z]+ matches one or more lowercase letters between the parentheses.
Third, (?'-parenthesis'\)) matches a right parenthesis \) and pops the group parenthesis off the stack, so the stack should now be empty if the parentheses come in a pair.
Finally, the conditional check (?(parenthesis)(?!)) tests whether the stack is empty; if it is not, the match fails.
The conditional check works as follows:
(?(groupname)yes|no)
It checks whether the named group currently has a capture on its stack. If it does, the yes regex is evaluated; otherwise the no regex is evaluated. The no regex is optional.
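Python's re module has a conditional of the same shape, (?(name)yes|no), although there it tests whether the named group took part in the match rather than inspecting a capture stack. A minimal sketch, where the group name open and the sample strings are chosen here for illustration:

```python
import re

# Optional '(' captured as group 'open'.
# If 'open' matched, the conditional requires a closing ')'; otherwise nothing.
pattern = re.compile(r'(?P<open>\()?[a-z]+(?(open)\))')

candidates = ['(abc)', 'abc', '(abc', 'abc)']
results = {s: bool(pattern.fullmatch(s)) for s in candidates}

for s, ok in results.items():
    print(s, ok)
# (abc) True
# abc True
# (abc False
# abc) False
```

Here the no branch is omitted, so when the opening parenthesis is absent the conditional matches the empty string and passes through.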
So in the parenthesis pattern, (?(parenthesis)(?!)) checks whether the group parenthesis is still on the stack. If it is not, the pair is complete and the check simply passes.
If it is, (?!) is evaluated, which is a negative lookahead for the empty string: nothing follows the !, so the pattern inside is empty. Because an empty string can be matched at any position, the negative lookahead (?!) always fails. So whenever the group parenthesis is still on the stack, the match fails, which guarantees that left and right parentheses must come in pairs to clear the group from the stack.
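Python's standard re module does not offer push/pop groups like these, so the same logic cannot be written there as a single pattern. The sketch below only mirrors the push, pop and final check with an explicit list; the function name is_balanced is hypothetical, and this is not how .NET implements it. It also shows that (?!) never matches in Python either.

```python
import re

def is_balanced(text):
    """Return True if every '(' is closed by a later ')' and vice versa."""
    stack = []                  # plays the role of the 'parenthesis' group stack
    for ch in text:
        if ch == '(':
            stack.append(ch)    # like (?'parenthesis'\()  -> push
        elif ch == ')':
            if not stack:
                return False    # nothing to pop: the match would fail here
            stack.pop()         # like (?'-parenthesis'\)) -> pop
    return not stack            # like (?(parenthesis)(?!)) -> fail if non-empty

# (?!) is a negative lookahead for the empty pattern; it can never succeed.
always_fails = re.search(r'(?!)', 'any text')

print(is_balanced('(abc)'))   # True
print(is_balanced('((abc)'))  # False
print(is_balanced('abc)'))    # False
print(always_fails)           # None
```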
Named groups in Python
import re
# (?P<name>...) captures the match under the given group name
pattern = r'(?P<groupname>[0-9]+)'
string = 'dddabc123'
m = re.search(pattern, string)
print(m.group('groupname'))  # prints 123
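A pattern can hold several named groups at once, and the match object can return all of them as a dictionary. A minimal sketch on the same sample string, where the group names letters and digits are chosen here for illustration:

```python
import re

# Two named groups: the leading letters and the trailing digits
pattern = r'(?P<letters>[a-z]+)(?P<digits>[0-9]+)'
m = re.search(pattern, 'dddabc123')

print(m.group('letters'))  # dddabc
print(m.group('digits'))   # 123
print(m.groupdict())       # {'letters': 'dddabc', 'digits': '123'}
```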