faqts : Computers : Programming : Languages : Python : Snippets

+ Search
Add Entry AlertManage Folder Edit Entry Add page to http://del.icio.us/
Did You Find This Entry Useful?

1 of 1 people (100%) answered Yes

Entry

Search (match) backward

Jul 5th, 2000 10:02
Nathan Wallace, Hans Nowak, Snippet 249, Magnus L. Hetland


"""
Packages: text.parsing
"""
"""
> This raises the interesting question of whether there is an
> algorithm for reversing an RE (i.e. find an RE which matches
> the reverse of the strings that the original RE matches).
> Can it even be done in general? My intuition says yes, but
> I can't think of a proof off the top of my head.
I want to try! :)
OK... A regular expression expresses a regular language, right? And
they are built from union, concatenation, and closure... (I'm assuming
that all other stuff in the re module follows from this... Though with
lookahead etc. this is probably not the case...)
So...A regular set is a singleton set, the union of two regular sets,
the concatenation of two regular sets, or the closure of a regular
sets. I guess I just have to show how to reverse all of these
constructs...
A singleton set consists of only one symbol, so it doesn't have to be
reversed. A union doesn't express order, so that doesn't have to be
reversed either (but of course both of the sets being used must be).
Concatenation must be reversed - and van, quite simply. Just
concatenate the sets in the opposite order (after they themselves have
been reversed). Finally - closure does not have to be reversed, since
patterns from the same set are repeated, and the order is
insignificant.
So - as far as I can see, it can be done, and only the concatenation
actually needs rearranging... So - while I'm at it...
There's no parser here, but I think it should be trivial to construct
(with this minimal regexp grammar).
"""
# ----- rereverse.py -----
class RegExp:
    def __init__(self, left, right):
        self.left = left
        self.right = right
class Singleton(RegExp):
    def __init__(self, value):
        self.value = value
    def accept(self, visitor):
        return visitor.visitSingleton(self)
class Union(RegExp):
    def accept(self, visitor):
        return visitor.visitUnion(self)
class Concatenation(RegExp):
    def accept(self, visitor):
        return visitor.visitConcatenation(self)
class Closure(RegExp):
    def __init__(self, child):
        self.child = child
    def accept(self, visitor):
        return visitor.visitClosure(self)
class RegexpReverser:
    def visitSingleton(self, singleton):
        return singleton.value
    def visitUnion(self, union):
        l = union.left.accept(self)
        r = union.right.accept(self)
        return "("+l+"|"+r+")"
    def visitConcatenation(self, concat):
        l = concat.left.accept(self)
        r = concat.right.accept(self)
        return r+l
    def visitClosure(self, closure):
        c = closure.child.accept(self)
        return "("+c+")*"
if __name__=="__main__":
    r = Concatenation(Union(Singleton("a"),
                            Singleton("b")),
                      Closure(Singleton("c")))
    rr = RegexpReverser()
    result = r.accept(rr)
    print result