Regex

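An 'r' is placed in front of the patterns 'Test' and 'function'. This 'r' marks a raw string. In Python, raw strings are mainly used to avoid the ambiguity caused by special characters. The backslash '\' escape character mentioned earlier is one such character, and it causes a lot of unnecessary ambiguity. For example, suppose you open a file on Windows with Python's open() function: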

>>> f = open('C:\Program Files\test.txt', 'r')

IOError: [Errno 2] No such file or directory: 'C:\\Program Files\test.txt'

The file cannot be opened. Python raises the IOError shown above because "\t" is interpreted as a special character (a tab) instead of as part of the file name. The fix is simple: use a raw string.

>>> f = open(r'C:\Program Files\test.txt', 'r')

>>> print f

<open file 'C:\\Program Files\\test.txt', mode 'r' at 0x0000000002A01150>
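The effect of the r prefix can be checked directly. The sketch below uses Python 3 and a hypothetical path 'C:\temp\test.txt' (not the path above) so that every backslash sequence in the normal string is a valid escape. It also shows why the prefix matters for regex patterns: in a normal string '\b' is a backspace character, while in a raw string it reaches the re module as the word-boundary token.

```python
import re

# Hypothetical path chosen for illustration; both "\t" sequences are tab escapes.
normal = 'C:\temp\test.txt'    # each "\t" becomes a single TAB character
raw = r'C:\temp\test.txt'      # backslashes are kept literally

print(len(normal))             # 14
print(len(raw))                # 16
print('\t' in normal)          # True
print('\t' in raw)             # False

# The same issue applies to regex patterns.
text = 'a Test function'
no_prefix = re.search('\bTest\b', text)     # '\b' is a backspace here, so no match
with_prefix = re.search(r'\bTest\b', text)  # r'\b' is a word boundary

print(no_prefix)               # None
print(with_prefix.group())     # Test
```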

re.match() # match only at the beginning of the string

re.search() # scan the whole string and return the first match

re.findall() # return all non-overlapping matches as a list

re.compile() # compile a regular expression once so it can be reused many times
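A minimal sketch of how the four functions differ, written for Python 3. The sample string and variable names are made up for illustration.

```python
import re

text = "abc123def456"   # hypothetical sample input

# re.match() only succeeds if the pattern matches at position 0.
m1 = re.match(r"\d+", text)        # None, because the string starts with letters
m2 = re.match(r"[a-z]+", text)     # matches "abc"

# re.search() scans forward and returns the first match anywhere in the string.
s1 = re.search(r"\d+", text)       # matches "123" at positions 3..6

# re.findall() returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", text)

# re.compile() builds a pattern object whose methods mirror the module functions.
number = re.compile(r"\d+")
same_numbers = number.findall(text)

print(m1)                 # None
print(m2.group())         # abc
print(s1.group())         # 123
print(all_numbers)        # ['123', '456']
print(same_numbers)       # ['123', '456']
```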

=============================================

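import re

s = "a1b2c3d4"                    # renamed from str, which would shadow the built-in type

reg0 = re.compile(r"\d{0,1}")     # zero or one digit; straight quotes and a raw string

ss = re.findall(reg0, s)          # method 1: pass the compiled pattern to re.findall()

ss = reg0.findall(s)              # method 2: call findall() on the compiled pattern

>>> ss

['', '1', '', '2', '', '3', '', '4', '']

# The empty strings are zero-length matches at positions where there is no digit.

re.sub(r"\d", "#", s)             # find the targets and replace them; str.replace() only accepts a literal substring, so use re.sub(). Result: 'a#b#c#d#'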
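When each match should receive its own replacement value (for example 4, 3, 2, 1 in turn), re.sub() also accepts a function as the replacement. The sketch below assumes that this is the intent of the note above; the names replacements and result are made up for illustration.

```python
import re

s = "a1b2c3d4"

# Hypothetical replacement values, consumed one per match from left to right.
replacements = iter(["4", "3", "2", "1"])

# re.sub() calls the function once for every match and inserts its return value.
result = re.sub(r"\d", lambda m: next(replacements), s)

print(result)   # a4b3c2d1
```

Note that next() raises StopIteration if there are more matches than replacement values, so the two counts need to agree.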

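What is the difference between (.*) and (.*?) in regular expression matching?

(.*) is greedy: it consumes as many characters as possible while still allowing the whole pattern to match.

(.*?) is non-greedy: it consumes as few characters as possible while still allowing the whole pattern to match.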

>>> s="<a>haha</a><a>hehe</a>"

>>> import re

>>> res1=re.findall("<a>(.*)</a>",s)

>>> print (res1)

['haha</a><a>hehe']

>>> res2=re.findall("<a>(.*?)</a>",s)

>>> print (res2)

['haha', 'hehe']
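The same difference can be seen in the span of text each version consumes. This sketch (Python 3) reuses the string from the example above. The negated character class at the end is an alternative technique, not something from the original note.

```python
import re

s = "<a>haha</a><a>hehe</a>"

greedy = re.search(r"<a>.*</a>", s)    # runs to the last </a>
lazy = re.search(r"<a>.*?</a>", s)     # stops at the first </a>

print(greedy.span())    # (0, 22)
print(lazy.span())      # (0, 11)

# Alternative: forbid '<' inside the group instead of using a lazy quantifier.
no_angle = re.findall(r"<a>([^<]*)</a>", s)
print(no_angle)         # ['haha', 'hehe']
```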

Splitting a string on multiple delimiters with a regex (e.g. space and colon)

s = "info:xiaozhang is work at Nanjing"

res = re.split(r":| ",s)

print (res)

['info', 'xiaozhang', 'is', 'work', 'at', 'Nanjing']
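One caveat with the pattern r":| " is that two delimiters in a row produce an empty string in the result. A character class with + treats a run of delimiters as one, and maxsplit limits the number of splits. The sketch below (Python 3) uses a made-up input with a colon followed by a space.

```python
import re

s = "info: xiaozhang is work at Nanjing"   # hypothetical input: ':' followed by ' '

# Each delimiter splits on its own, so ": " leaves an empty string behind.
single = re.split(r":| ", s)

# [: ]+ matches one or more colons or spaces as a single delimiter.
merged = re.split(r"[: ]+", s)

# maxsplit=1 splits only at the first delimiter run.
first_only = re.split(r"[: ]+", s, maxsplit=1)

print(single)       # ['info', '', 'xiaozhang', 'is', 'work', 'at', 'Nanjing']
print(merged)       # ['info', 'xiaozhang', 'is', 'work', 'at', 'Nanjing']
print(first_only)   # ['info', 'xiaozhang is work at Nanjing']
```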
