expression syntax: http://docs.python.org/2/library/re.html
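To prepare a pattern from existing text such as HTML, first pick a sample section and replace the parts that vary with placeholder tokens (VEYVEYVEYVEY for text to skip, CTXCTXCTXCTX for text to capture). Then escape the sample with re.escape and swap the placeholders for re directives.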
import re
x = re.escape(re.sub(r'\s+', 'EmPtYeMpTyEmPtY', sample))
x = x.replace('EmPtYeMpTyEmPtY', '\\s+').replace('VEYVEYVEYVEY', '.*?').replace('CTXCTXCTXCTX', '(.*?)')
Then type x at the interactive prompt to see the escaped pattern, paste it into the code, and edit it further.
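A minimal end-to-end sketch of the steps above. The sample and page strings are made up for illustration, and reading VEYVEYVEYVEY as "skip this" and CTXCTXCTXCTX as "capture this" is inferred from the replacements, not stated anywhere official.

```python
import re

# Hypothetical sample cut from a page, with the varying parts
# already replaced by the placeholder tokens.
sample = '<div class="item">\n  <a href="VEYVEYVEYVEY">CTXCTXCTXCTX</a>\n</div>'

# Collapse whitespace runs to a token, escape everything else,
# then turn the tokens into regex directives.
x = re.escape(re.sub(r'\s+', 'EmPtYeMpTyEmPtY', sample))
x = (x.replace('EmPtYeMpTyEmPtY', r'\s+')
      .replace('VEYVEYVEYVEY', '.*?')
      .replace('CTXCTXCTXCTX', '(.*?)'))

# Hypothetical real page: different spacing, different link and title.
page = '<div class="item">\n\n    <a href="/p/42?ref=home">Blue   Widget</a> </div>'
m = re.search(x, page, re.DOTALL)
title = m.group(1) if m else None
print(title)
```

The tokens are plain letters, so re.escape leaves them untouched and the replace calls can find them afterwards.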
Test the pattern against the file. If it does not match, the best way to find out where it goes wrong is to split off half of the pattern, test again, and repeat. To make this easy, format the string like this:
x = 'sssssssssssssssssssssssssss'\
+'sssssssssssssssssssssssssssss'\
+'ssssssssssssss'
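Comment out the last line and see whether the result changes. If it does, whatever causes the failure is in that last line. If it does not, the cause is in the earlier lines, so cut further back and test again.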
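The same idea can be sketched as a small helper. It tries progressively longer prefixes of the pattern one piece at a time, which is a linear variant of the halving described above. The function name first_failing_piece and the sample data are hypothetical, and the sketch assumes every prefix is a valid pattern on its own (no piece boundary falls inside a group).

```python
import re

def first_failing_piece(pieces, text, flags=re.DOTALL):
    """Return the index of the first piece whose addition makes the
    pattern stop matching, or None if the whole pattern matches."""
    for i in range(1, len(pieces) + 1):
        if re.search(''.join(pieces[:i]), text, flags) is None:
            return i - 1
    return None

# Hypothetical pattern, split the same way as the string above.
pieces = [
    r'<ul>\s+',
    r'<li>(.*?)</li>\s+',
    r'<li class="x">(.*?)</li>\s+',   # wrong class name on purpose
    r'</ul>',
]
text = '<ul>\n<li>one</li>\n<li class="y">two</li>\n</ul>'

bad = first_failing_piece(pieces, text)
print(bad)
```

Once the reported piece is corrected, the helper returns None, meaning the whole pattern matches.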
It is often best to compile the pattern with re.compile(pattern, re.DOTALL), so that '.' also matches newlines and the pattern catches content that spans line breaks in the HTML.
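A short illustration of the difference, using a made-up HTML fragment:

```python
import re

# Hypothetical fragment whose content spans a line break.
html = '<p>first line\nsecond line</p>'
pattern = '<p>(.*?)</p>'

# Without DOTALL, '.' stops at the newline and nothing matches.
without_flag = re.compile(pattern).search(html)

# With DOTALL, '.' also matches '\n', so the whole paragraph is captured.
with_flag = re.compile(pattern, re.DOTALL).search(html)

print(without_flag)
print(repr(with_flag.group(1)))
```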