Matching multiline patterns

Jul 5th, 2000 10:03
Nathan Wallace, Hans Nowak, Snippet 299, A.M. Kuchling


"""
Packages: text.regular_expressions
"""
"""
> > I have a pattern to match which would be like
> > //------------------
> > There are multiple such lines but I want to pick out the pairs because there
> > is data in between them and sometimes I want to insert between them.
Assuming that you want to match successive pairs of //--- lines, and
not ones that are nested, you can do this with a regular expression.
You need to specify MULTILINE mode; in MULTILINE mode, ^ will match at
the start of the string and after every newline embedded in the string.
Similarly, $ will match at the end of the string or before every newline
inside the string.
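
A minimal Python 3 sketch of the MULTILINE behavior just described. The
two-line string and the names text, plain and multi are invented for
illustration only:

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the start of the string and $ only at
# the end (or just before a trailing newline), so neither line matches whole.
plain = re.findall(r'^[a-z]+ line$', text)

# With MULTILINE, ^ also matches after each embedded newline and $ before it.
multi = re.findall(r'^[a-z]+ line$', text, re.MULTILINE)

print(plain)  # []
print(multi)  # ['first line', 'second line']
```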
"""
data = """
//------------------
  Inside the first pair
//------------------
  Outside both pairs
//------------------
   Inside the second pair
//------------------"""
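import re
# VERBOSE ignores the spaces in the pattern, MULTILINE lets ^ and $ match
# at every line boundary, and DOTALL lets . match newlines too.
pat = re.compile(r'^/+-+$ .*? ^/+-+$', re.VERBOSE | re.MULTILINE | re.DOTALL)
print(pat.findall(data))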
"""
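'.' usually matches anything except a newline; the DOTALL flag makes
'.' match newlines as well.  VERBOSE makes the compiler ignore the spaces
in the pattern, which are there only for readability.  Note that the
central component uses a non-greedy, or minimal, match: *? instead of *.
This avoids getting a single match extending from the first //--- line
all the way to the last one.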
This prints out:
['//------------------\n  Inside the first pair\n//------------------', 
 '//------------------\n   Inside the second pair\n//------------------']
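
To see what DOTALL and the non-greedy *? each contribute, here is a small
Python 3 sketch that runs the same pattern lazily, greedily, and without
DOTALL. The sample string, with shortened //-- delimiter lines, and the
variable names are made up for illustration:

```python
import re

# Hypothetical sample: two delimited pairs with a line of text between them.
data = "//--\na\n//--\nb\n//--\nc\n//--"

flags = re.VERBOSE | re.MULTILINE | re.DOTALL

# Non-greedy .*? stops at the first closing delimiter line it can reach.
lazy = re.findall(r'^/+-+$ .*? ^/+-+$', data, flags)

# Greedy .* runs on to the last delimiter line, giving one big match.
greedy = re.findall(r'^/+-+$ .* ^/+-+$', data, flags)

# Without DOTALL, . cannot cross the newlines, so nothing matches at all.
no_dotall = re.findall(r'^/+-+$ .*? ^/+-+$', data, re.VERBOSE | re.MULTILINE)

print(len(lazy), len(greedy), len(no_dotall))  # 2 1 0
```

The greedy version swallows the "b" line that sits between the two pairs,
which is exactly the first-to-last match the non-greedy form avoids.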
If you want to do the matching line-by-line, without reading
everything into a string, then Tim Ottinger's solution is pretty much
the correct one.  (Should really finish the mmap module one of these
days; if you had it, then you could mmap the entire file and do the
regex match on it.)
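
The mmap module is now part of the standard library, so the approach
suggested above can be sketched in Python 3.  This is an illustration,
not the original author's code: the sample contents and the temporary file
are assumptions made only so the example is self-contained; in practice
you would open your own file.  The pattern has to be a bytes pattern,
because an mmap object exposes bytes rather than text.

```python
import mmap
import os
import re
import tempfile

# Hypothetical sample file contents; a real file would already be on disk.
sample = (b"//----\n  Inside the first pair\n//----\n"
          b"  Outside both pairs\n//----\n  Inside the second pair\n//----")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Same pattern as before, but as bytes, since mmap contents are bytes.
pat = re.compile(rb'^/+-+$ .*? ^/+-+$', re.VERBOSE | re.MULTILINE | re.DOTALL)

try:
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; ACCESS_READ gives a read-only view.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            matches = pat.findall(mm)
finally:
    os.remove(path)

for m in matches:
    print(m)
```

The file is never read into a string up front; only the matched pieces
come back, as bytes objects.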
"""