Entry
Matching multiline patterns
Jul 5th, 2000 10:03
Nathan Wallace, Hans Nowak, Snippet 299, A.M. Kuchling
"""
Packages: text.regular_expressions
"""
"""
> > I have a pattern to match which would be like
> > //------------------
> > There are multiple such lines but I want to pick out the pairs because there
> > is data in between them and sometimes I want to insert between them.
Assuming that you want to match successive occurrences of //---, and
not ones that are nested, you can do this with a regular expression.
You need to specify MULTILINE mode; in MULTILINE mode, ^ will match at
the start of the string, and after newlines embedded in the string.
Similarly, $ will match at the end of the string, or before newlines
inside the string.
"""
data = """
//------------------
Inside the first pair
//------------------
Outside both pairs
//------------------
Inside the second pair
//------------------"""
import re
pat = re.compile('^/+-+$ .*? ^/+-+$', re.VERBOSE|re.MULTILINE|re.DOTALL)
print(pat.findall(data))
"""
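The spaces in the pattern are there only for readability; the VERBOSE
flag tells the regex engine to ignore them.
'.' usually matches anything except a newline, so the DOTALL flag makes
'.' match newlines as well. Note that the central component uses a
non-greedy, or minimal match: *? instead of *. This avoids getting a
match extending from the first //--- line all the way to the last one.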
This prints out:
['//------------------\nInside the first pair\n//------------------',
'//------------------\nInside the second pair\n//------------------']
If you want to do the matching line-by-line, without reading
everything into a string, then Tim Ottinger's solution is pretty much
the correct one. (Should really finish the mmap module one of these
days; if you had it, then you could mmap the entire file and do the
regex match on it.)
"""