FAQTs - Knowledge Base - View Entry - Docstring-driven testing [LONG]

faqts : Computers : Programming : Languages : Python : Snippets

+ Search

Python Cookbook (lots of great snippets!),
Entry

Docstring-driven testing [LONG]

Jul 5th, 2000 09:59
Nathan Wallace, Hans Nowak, Snippet 83, Tim Peters
"""
Packages: text.docstrings
"""
"""
If you're like me, you've been using Python since '91, and every scheme
you've come up with for testing basically sucked.  Some observations:
+ Examples are priceless.
+ Examples that don't work are worse than worthless.
+ Examples that work eventually turn into examples that don't.
+ Docstrings too often don't get written.
+ Docstrings that do get written rarely contain those priceless examples.
+ The rare written docstrings that do contain priceless examples eventually
turn into rare docstrings with examples that don't work.  I think this one
may follow from the above ...
+ Module unit tests too often don't get written.
+ The best Python testing gets done in interactive mode, esp. trying
endcases that almost never make it into a test suite because they're so
tedious to code up.
+ The endcases that were tested interactively-- but never coded up --also
fail to work after time.
About a month ago, I tried something new:  take those priceless interactive
testing sessions, paste them into docstrings, and write a module to do all
the rest by magic (find the examples, execute them, and verify they still
work exactly as advertised).
Wow -- it turned out to be the only scheme I've ever really liked, and I
like it a lot!  With almost no extra work beyond what I was doing before,
tests and docstrings get written now, and I'm certain the docstring examples
are accurate.  It's also caught an amazing number of formerly-insidious
buglets in my modules, from accidental changes in endcase behavior, to hasty
but inconsistent renamings.
doctest.py is attached, and it's the whole banana.  Give it a try, if you
like.  After another month or so of ignoring your groundless complaints,
I'll upload it to the python.org FTP contrib site.  Note that it serves as
an example of its own use, albeit an artificially strained example.
winklessly y'rs  - tim
"""
# Module doctest.
# Released to the public domain 06-Mar-1999,
# by Tim Peters (tim_one@email.msn.com).
# Provided as-is; use at your own risk; no warranty; no promises; enjoy!
"""Module doctest -- a framework for running examples in docstrings.
NORMAL USAGE
In normal use, end each module M with:
def _test():
    import M, doctest           # replace M with your module's name
    return doctest.testmod(M)   # ditto
if __name__ == "__main__":
    _test()
Then running the module as a script will cause the examples in the
docstrings to get executed and verified:
python M.py
This won't display anything unless an example fails, in which case
the failing example(s) and the cause of the failure(s) are printed
to stdout (why not stderr? because stderr is a lame hack <0.2 wink>),
and the final line of output is "Test failed.".
Run it with the -v switch instead:
python M.py -v
and a detailed report of all examples tried is printed to stdout, along
with assorted summaries at the end.
You can force verbose mode by passing "verbose=1" to testmod, or prohibit
it by passing "verbose=0".  In either of those cases, sys.argv is not
examined by testmod.
In any case, testmod returns a 2-tuple of ints, (f, t), where f is the
number of docstrings examples that failed and t is the total number of
docstring examples attempted.
WHICH DOCSTRINGS ARE EXAMINED?
In any case, M.__doc__ is searched for examples.
By default, the following are also searched:
+ All functions in M.__dict__.values(), except those whose names
  begin with an underscore.
+ All classes in M.__dict__.values(), except those whose names
  begin with an underscore.
By default, any classes found are recursively searched similarly, to
test docstrings in their contained methods and nested classes.  Pass
"deep=0" to testmod and *only* M.__doc__ is searched.
Warning:  imports can cause trouble; e.g., if you do
from XYZ import XYZclass
then XYZclass is a name in M.__dict__ too, and doctest has no way to
know that XYZclass wasn't *defined* in M.  So it may try to execute the
examples in XYZclass's docstring, and those in turn may require a
different set of globals to work correctly.  I prefer to do "import *"-
friendly imports, a la
import XYY
_XYZclass = XYZ.XYZclass
del XYZ
and then the leading underscore stops testmod from going nuts.  You may
prefer the method in the next section.
WHAT'S THE EXECUTION CONTEXT?
By default, each time testmod finds a docstring to test, it uses a
*copy* of M's globals (so that running tests on a module doesn't change
the module's real globals).  This means examples can freely use any
names defined at top-level in M.  It also means that sloppy imports (see
above) can cause examples in external docstrings to use globals
inappropriate for them.
You can force use of your own dict as the execution context by passing
"globs=your_dict" to testmod instead.  Presumably this would be a copy
of M.__dict__ merged with the globals from other imported modules.
WHAT IF I WANT TO TEST A WHOLE PACKAGE?
Piece o' cake, provided the modules do their testing from docstrings.
Here's the test.py I use for the world's most elaborate Rational/
floating-base-conversion pkg (which I'll distribute some day):
from Rational import Cvt
from Rational import Format
from Rational import machprec
from Rational import Rat
from Rational import Round
from Rational import utils
modules = (Cvt,
           Format,
           machprec,
           Rat,
           Round,
           utils)
def _test():
    import doctest
    import sys
    verbose = "-v" in sys.argv
    for mod in modules:
        doctest.testmod(mod, verbose=verbose, report=0)
    doctest.master.summarize()
if __name__ == "__main__":
    _test()
IOW, it just runs testmod on all the pkg modules.  testmod remembers the
names and outcomes (# of failures, # of tries) for each item it's seen,
and passing "report=0" prevents it from printing a summary in verbose
mode.  Instead, the summary is delayed until all modules have been
tested, and then "doctest.master.summarize()" forces the summary at the
end.
So this is very nice in practice:  each module can be tested individually
with almost no work beyond writing up docstring examples, and collections
of modules can be tested too as a unit with no more work than the above.
SO WHAT DOES A DOCSTRING EXAMPLE LOOK LIKE ALREADY!?
Oh ya.  It's easy!  In most cases a slightly fiddled copy-and-paste of an
interactive console session works fine -- just make sure there aren't any
tab characters in it, and that you eliminate stray whitespace-only lines.
    >>> # comments are harmless
    >>> x = 12
    >>> x
    12
    >>> if x == 13:
    ...     print "yes"
    ... else:
    ...     print "no"
    ...     print "NO"
    ...     print "NO!!!"
    no
    NO
    NO!!!
    >>>
Any expected output must immediately follow the final ">>>" or "..."
line containing the code, and the expected output (if any) extends
to the next ">>>" or all-whitespace line.  That's it.
Bummer:  only non-exceptional examples can be used.  If anything raises
an uncaught exception, doctest will report it as a failure.
The starting column doesn't matter:
>>> assert "Easy!"
     >>> import math
            >>> math.floor(1.9)
            1.0
and as many leading blanks are stripped from the expected output as
appeared in the code lines that triggered it.
If you execute this very file, the examples above will be found and
executed, leading to this output in verbose mode:
Running doctest.__doc__
Trying: # comments are harmless
Expecting: nothing
ok
Trying: x = 12
Expecting: nothing
ok
Trying: x
Expecting: 12
ok
Trying:
if x == 13:
    print "yes"
else:
    print "no"
    print "NO"
    print "NO!!!"
Expecting:
no
NO
NO!!!
ok
... and a bunch more like that, with this summary at the end:
2 items had no tests:
    doctest.run_docstring_examples
    doctest.testmod
4 items passed all tests:
   7 tests in doctest
   2 tests in doctest.TestClass
   2 tests in doctest.TestClass.get
   1 tests in doctest.TestClass.square
12 tests in 6 items.
12 passed and 0 failed.
Test passed.
"""
__version__ = 0, 0, 1
import types
_FunctionType = types.FunctionType
_ClassType    = types.ClassType
_ModuleType   = types.ModuleType
del types
import string
_string_find = string.find
del string
import re
PS1 = re.compile(r" *>>>").match
PS2 = "... "
del re
# Extract interactive examples from a string.  Return a list of string
# pairs, (source, outcome).  "source" is the source code, and ends
# with a newline iff the source spans more than one line.  "outcome" is
# the expected output if any, else None.  If not None, outcome always
# ends with a newline.
def _extract_examples(s):
    import string
    examples = []
    lines = string.split(s, "\n")
    i, n = 0, len(lines)
    while i < n:
        line = lines[i]
        i = i + 1
        m = PS1(line)
        if m is None:
            continue
        j = m.end(0)  # beyond the prompt
        if string.strip(line[j:]) == "":
            # a bare prompt -- not interesting
            continue
        assert line[j] == " "
        j = j + 1
        nblanks = j - 4  # 4 = len(">>> ")
        blanks = " " * nblanks
        # suck up this and following PS2 lines
        source = []
        while 1:
            source.append(line[j:])
            line = lines[i]
            if line[:j] == blanks + PS2:
                i = i + 1
            else:
                break
        if len(source) == 1:
            source = source[0]
        else:
            source = string.join(source, "\n") + "\n"
        # suck up response
        if  PS1(line) or string.strip(line) == "":
            expect = None
        else:
            expect = []
            while 1:
                assert line[:nblanks] == blanks
                expect.append(line[nblanks:])
                i = i + 1
                line = lines[i]
                if PS1(line) or string.strip(line) == "":
                    break
            expect = string.join(expect, "\n") + "\n"
        examples.append( (source, expect) )
    return examples
# Capture stdout when running examples.
class _SpoofOut:
    def __init__(self):
        self.clear()
    def write(self, s):
        self.buf = self.buf + s
    def get(self):
        return self.buf
    def clear(self):
        self.buf = ""
# Display some tag-and-msg pairs nicely, keeping the tag and its msg
# on the same line when that makes sense.
def _tag_out(printer, *tag_msg_pairs):
    for tag, msg in tag_msg_pairs:
        printer(tag + ":")
        msg_has_nl = msg[-1:] == "\n"
        msg_has_two_nl = msg_has_nl and \
                        _string_find(msg, "\n") < len(msg) - 1
        if len(tag) + len(msg) < 76 and not msg_has_two_nl:
            printer(" ")
        else:
            printer("\n")
        printer(msg)
        if not msg_has_nl:
            printer("\n")
# Run list of examples, in context globs.  "out" can be used to display
# stuff to "the real" stdout, and fakeout is an instance of _SpoofOut
# that captures the examples' std output.  Return (#failures, #tries).
def _run_examples_inner(out, fakeout, examples, globs, verbose):
    import sys
    TRYING, BOOM, OK, FAIL = range(4)
    tries = failures = 0
    for source, want in examples:
        if verbose:
            _tag_out(out, ("Trying", source),
                          ("Expecting",
                           want is None and "nothing" or want))
        fakeout.clear()
        state = TRYING
        tries = tries + 1
        if want is None:
            # this must be an exec
            want = ""
            try:
                exec source in globs
                got = fakeout.get()    # expect this to be empty
                state = OK
            except:
                etype, evalue = sys.exc_info()[:2]
                state = BOOM
        else:
            # can't tell whether to eval or exec without trying
            try:
                result = eval(source, globs)
                # interactive console applies repr to result, so us too
                got = fakeout.get() + repr(result) + "\n"
                state = OK
            except SyntaxError:
                # must need an exec
                pass
            except:
                etype, evalue = sys.exc_info()[:2]
                state = BOOM
            if state == TRYING:
                try:
                    exec source in globs
                    got = fakeout.get()
                    state = OK
                except:
                    etype, evalue = sys.exc_info()[:2]
                    state = BOOM
        assert state in (OK, BOOM)
        if state == OK:
            if got == want:
                if verbose:
                    out("ok\n")
                continue
            state = FAIL
        assert state in (FAIL, BOOM)
        failures = failures + 1
        out("*" * 65 + "\n")
        _tag_out(out, ("Failure in example", source))
        if state == FAIL:
            _tag_out(out, ("Expected", want), ("Got", got))
        else:
            assert state == BOOM
            _tag_out(out, ("Exception raised",
                           str(etype) + ": " + str(evalue)))
    return failures, tries
# Run list of examples, in context globs.  Return (#failures, #tries).
def _run_examples(examples, globs, verbose):
    import sys
    saveout = sys.stdout
    try:
        sys.stdout = fakeout = _SpoofOut()
        x = _run_examples_inner(saveout.write, fakeout, examples,
                                globs, verbose)
    finally:
        sys.stdout = saveout
    return x
def run_docstring_examples(f, globs, verbose=0):
    """f, globs [,verbose] -> run examples from f.__doc__.
    Use globs as the globals for execution.
    Return (#failures, #tries).
    If optional arg verbose is true, print stuff even if there are no
    failures.
    """
    try:
        doc = f.__doc__
    except AttributeError:
        return 0, 0
    if not doc:
        # docstring empty or None
        return 0, 0
    e = _extract_examples(doc)
    if not e:
        return 0, 0
    return _run_examples(e, globs, verbose)
class _Tester:
    def __init__(self):
        self.globs = {}     # globals for execution
        self.deep = 1       # recurse into classes?
        self.verbose = 0    # print lots of stuff?
        self.name2ft = {}   # map name to (#failures, #trials) pairs
    def runone(self, target, name):
        if _string_find(name, "._") >= 0:
            return 0, 0
        if self.verbose:
            print "Running", name + ".__doc__"
        f, t = run_docstring_examples(target, self.globs.copy(),
                                      self.verbose)
        if self.verbose:
            print f, "of", t, "examples failed in", name + ".__doc__"
        self.name2ft[name] = f, t
        if self.deep and type(target) is _ClassType:
            f2, t2 = self.rundict(target.__dict__, name)
            f = f + f2
            t = t + t2
        return f, t
    def rundict(self, d, name):
        f = t = 0
        for thisname, value in d.items():
            if type(value) in (_FunctionType, _ClassType):
                f2, t2 = self.runone(value, name + "." + thisname)
                f = f + f2
                t = t + t2
        return f, t
    def summarize(self):
        notests = []
        passed = []
        failed = []
        totalt = totalf = 0
        for x in self.name2ft.items():
            name, (f, t) = x
            assert f <= t
            totalt = totalt + t
            totalf = totalf + f
            if t == 0:
                notests.append(name)
            elif f == 0:
                passed.append( (name, t) )
            else:
                failed.append(x)
        if self.verbose:
            if notests:
                print len(notests), "items had no tests:"
                notests.sort()
                for thing in notests:
                    print "   ", thing
            if passed:
                print len(passed), "items passed all tests:"
                passed.sort()
                for thing, count in passed:
                    print " %3d tests in %s" % (count, thing)
        if failed:
            print len(failed), "items had failures:"
            failed.sort()
            for thing, (f, t) in failed:
                print " %3d of %3d in %s" % (f, t, thing)
        if self.verbose:
            print totalt + totalf, "tests in", len(self.name2ft), "items."
            print totalt, "passed and", totalf, "failed."
        if totalf:
            print "***Test Failed***", totalf, "failures."
        elif self.verbose:
            print "Test passed."
master = _Tester()
def testmod(m, name=None, globs=None, verbose=None, deep=1, report=1):
    """m, name=None, globs=None, verbose=None, deep=1, report=1
    Test examples in docstrings in functions and classes reachable from
    module m, starting with m.__doc__.  Names beginning with a leading
    underscore are skipped.
    See doctest.__doc__ for an overview.
    Optional keyword arg "name" gives the name of the module; by default
    use m.__name__.
    Optional keyword arg "globs" gives a dict to be used as the globals
    when executing examples; by default, use m.__dict__.  A copy of this
    dict is actually used for each docstring.
    Optional keyword arg "verbose" prints lots of stuff if true, prints
    only failures if false; by default, it's true iff "-v" is in sys.argv.
    Optional keyword arg "deep" tests *only* m.__doc__ when false.
    Optional keyword arg "report" prints a summary at the end when true,
    else prints nothing at the end.  In verbose mode, the summary is
    detailed, else very brief.
    """
    if type(m) is not _ModuleType:
        raise TypeError("testmod: module required; " + `m`)
    if name is None:
        name = m.__name__
    if globs is None:
        globs = m.__dict__
    if verbose is None:
        import sys
        verbose = "-v" in sys.argv
    master.globs = globs
    master.verbose = verbose
    master.deep = deep
    failures, tries = master.runone(m, name)
    if deep:
        f, t = master.rundict(m.__dict__, name)
        failures = failures + f
        tries = tries + t
    if report:
        master.summarize()
    return failures, tries
class TestClass:
    """
    A pointless class, for sanity-checking of docstring testing.
    Methods:
        square()
        get()
    >>> TestClass(13).get() + TestClass(-12).get()
    1
    >>> hex(TestClass(13).square().get())
    '0xa9'
    """
    def __init__(self, val):
        self.val = val
    def square(self):
        """square() -> square TestClass's associated value
        >>> TestClass(13).square().get()
        169
        """
        self.val = self.val ** 2
        return self
    def get(self):
        """get() -> return TestClass's associated value.
        >>> x = TestClass(-42)
        >>> print x.get()
        -42
        """
        return self.val
def _test():
    import doctest
    return doctest.testmod(doctest)
if __name__ == "__main__":
    _test()