Docstring-driven testing [LONG]
Jul 5th, 2000 09:59
Nathan Wallace, Hans Nowak, Snippet 83, Tim Peters
"""
Packages: text.docstrings
"""
"""
If you're like me, you've been using Python since '91, and every scheme
you've come up with for testing basically sucked. Some observations:

+ Examples are priceless.
+ Examples that don't work are worse than worthless.
+ Examples that work eventually turn into examples that don't.
+ Docstrings too often don't get written.
+ Docstrings that do get written rarely contain those priceless examples.
+ The rare written docstrings that do contain priceless examples eventually
  turn into rare docstrings with examples that don't work. I think this one
  may follow from the above ...
+ Module unit tests too often don't get written.
+ The best Python testing gets done in interactive mode, esp. trying
  endcases that almost never make it into a test suite because they're so
  tedious to code up.
+ The endcases that were tested interactively-- but never coded up --also
  fail to work after time.

About a month ago, I tried something new: take those priceless interactive
testing sessions, paste them into docstrings, and write a module to do all
the rest by magic (find the examples, execute them, and verify they still
work exactly as advertised).

Wow -- it turned out to be the only scheme I've ever really liked, and I
like it a lot! With almost no extra work beyond what I was doing before,
tests and docstrings get written now, and I'm certain the docstring examples
are accurate. It's also caught an amazing number of formerly-insidious
buglets in my modules, from accidental changes in endcase behavior, to hasty
but inconsistent renamings.

doctest.py is attached, and it's the whole banana. Give it a try, if you
like. After another month or so of ignoring your groundless complaints,
I'll upload it to the python.org FTP contrib site. Note that it serves as
an example of its own use, albeit an artificially strained example.

winklessly y'rs - tim
"""
# Module doctest.
# Released to the public domain 06-Mar-1999,
# by Tim Peters (tim_one@email.msn.com).
# Provided as-is; use at your own risk; no warranty; no promises; enjoy!
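The scheme described in the post (paste an interactive session into a docstring, then let a tool re-run it and compare the output) can be tried with the `doctest` module that ships in today's Python standard library, which grew out of the file below. The following is only a Python 3 sketch, not the 1999 code: `mean`, `stale_double` and `run_examples` are hypothetical names made up for illustration.

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence.

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([10])
    10.0
    """
    return sum(values) / len(values)


def stale_double(x):
    """Return x doubled (the example below was never updated).

    >>> stale_double(4)
    9
    """
    return 2 * x


def run_examples(func):
    """Run the examples in func.__doc__; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The globals dict only needs the names the examples actually use.
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)  # failure reports are written to stdout
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_examples(mean))          # both examples still hold
    print(run_examples(stale_double))  # the stale example is reported
```

The pair returned by `run_examples` plays the same role as the `(f, t)` tuple that `testmod` returns in the module below.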
"""Module doctest -- a framework for running examples in docstrings.

NORMAL USAGE

In normal use, end each module M with:

def _test():
    import M, doctest           # replace M with your module's name
    return doctest.testmod(M)   # ditto

if __name__ == "__main__":
    _test()

Then running the module as a script will cause the examples in the
docstrings to get executed and verified:

    python M.py

This won't display anything unless an example fails, in which case
the failing example(s) and the cause of the failure(s) are printed
to stdout (why not stderr? because stderr is a lame hack <0.2 wink>),
and the final line of output is "***Test Failed*** N failures.", where
N is the number of failing examples.

Run it with the -v switch instead:

    python M.py -v

and a detailed report of all examples tried is printed to stdout, along
with assorted summaries at the end.

You can force verbose mode by passing "verbose=1" to testmod, or prohibit
it by passing "verbose=0". In either of those cases, sys.argv is not
examined by testmod.

In any case, testmod returns a 2-tuple of ints, (f, t), where f is the
number of docstring examples that failed and t is the total number of
docstring examples attempted.

WHICH DOCSTRINGS ARE EXAMINED?

In any case, M.__doc__ is searched for examples.

By default, the following are also searched:

+ All functions in M.__dict__.values(), except those whose names
  begin with an underscore.

+ All classes in M.__dict__.values(), except those whose names
  begin with an underscore.

By default, any classes found are recursively searched similarly, to
test docstrings in their contained methods and nested classes. Pass
"deep=0" to testmod and *only* M.__doc__ is searched.

Warning: imports can cause trouble; e.g., if you do

    from XYZ import XYZclass

then XYZclass is a name in M.__dict__ too, and doctest has no way to
know that XYZclass wasn't *defined* in M. So it may try to execute the
examples in XYZclass's docstring, and those in turn may require a
different set of globals to work correctly. I prefer to do "import *"-
friendly imports, a la

    import XYZ
    _XYZclass = XYZ.XYZclass
    del XYZ

and then the leading underscore stops testmod from going nuts. You may
prefer the method in the next section.

WHAT'S THE EXECUTION CONTEXT?

By default, each time testmod finds a docstring to test, it uses a
*copy* of M's globals (so that running tests on a module doesn't change
the module's real globals). This means examples can freely use any
names defined at top-level in M. It also means that sloppy imports (see
above) can cause examples in external docstrings to use globals
inappropriate for them.

You can force use of your own dict as the execution context by passing
"globs=your_dict" to testmod instead. Presumably this would be a copy
of M.__dict__ merged with the globals from other imported modules.

WHAT IF I WANT TO TEST A WHOLE PACKAGE?

Piece o' cake, provided the modules do their testing from docstrings.
Here's the test.py I use for the world's most elaborate Rational/
floating-base-conversion pkg (which I'll distribute some day):

from Rational import Cvt
from Rational import Format
from Rational import machprec
from Rational import Rat
from Rational import Round
from Rational import utils

modules = (Cvt,
           Format,
           machprec,
           Rat,
           Round,
           utils)

def _test():
    import doctest
    import sys
    verbose = "-v" in sys.argv
    for mod in modules:
        doctest.testmod(mod, verbose=verbose, report=0)
    doctest.master.summarize()

if __name__ == "__main__":
    _test()

IOW, it just runs testmod on all the pkg modules. testmod remembers the
names and outcomes (# of failures, # of tries) for each item it's seen,
and passing "report=0" prevents it from printing a summary in verbose
mode. Instead, the summary is delayed until all modules have been
tested, and then "doctest.master.summarize()" forces the summary at the
end.

So this is very nice in practice: each module can be tested individually
with almost no work beyond writing up docstring examples, and collections
of modules can be tested too as a unit with no more work than the above.

SO WHAT DOES A DOCSTRING EXAMPLE LOOK LIKE ALREADY!?

Oh ya. It's easy! In most cases a slightly fiddled copy-and-paste of an
interactive console session works fine -- just make sure there aren't any
tab characters in it, and that you eliminate stray whitespace-only lines.

    >>> # comments are harmless
    >>> x = 12
    >>> x
    12
    >>> if x == 13:
    ...     print "yes"
    ... else:
    ...     print "no"
    ...     print "NO"
    ...     print "NO!!!"
    no
    NO
    NO!!!
    >>>

Any expected output must immediately follow the final ">>>" or "..."
line containing the code, and the expected output (if any) extends
to the next ">>>" or all-whitespace line. That's it.

Bummer: only non-exceptional examples can be used. If anything raises
an uncaught exception, doctest will report it as a failure.

The starting column doesn't matter:

  >>> assert "Easy!"
     >>> import math
            >>> math.floor(1.9)
            1.0

and as many leading blanks are stripped from the expected output as
appeared in the code lines that triggered it.

If you execute this very file, the examples above will be found and
executed, leading to this output in verbose mode:

Running doctest.__doc__
Trying: # comments are harmless
Expecting: nothing
ok
Trying: x = 12
Expecting: nothing
ok
Trying: x
Expecting: 12
ok
Trying:
if x == 13:
    print "yes"
else:
    print "no"
    print "NO"
    print "NO!!!"
Expecting:
no
NO
NO!!!
ok
... and a bunch more like that, with this summary at the end:

2 items had no tests:
    doctest.run_docstring_examples
    doctest.testmod
4 items passed all tests:
   7 tests in doctest
   2 tests in doctest.TestClass
   2 tests in doctest.TestClass.get
   1 tests in doctest.TestClass.square
12 tests in 6 items.
12 passed and 0 failed.
Test passed.
"""
__version__ = 0, 0, 1
import types
_FunctionType = types.FunctionType
_ClassType = types.ClassType
_ModuleType = types.ModuleType
del types
import string
_string_find = string.find
del string
import re
PS1 = re.compile(r" *>>>").match
PS2 = "... "
del re
# Extract interactive examples from a string. Return a list of string
# pairs, (source, outcome). "source" is the source code, and ends
# with a newline iff the source spans more than one line. "outcome" is
# the expected output if any, else None. If not None, outcome always
# ends with a newline.
def _extract_examples(s):
    import string
    examples = []
    lines = string.split(s, "\n")
    i, n = 0, len(lines)
    while i < n:
        line = lines[i]
        i = i + 1
        m = PS1(line)
        if m is None:
            continue
        j = m.end(0)  # beyond the prompt
        if string.strip(line[j:]) == "":
            # a bare prompt -- not interesting
            continue
        assert line[j] == " "
        j = j + 1
        nblanks = j - 4  # 4 = len(">>> ")
        blanks = " " * nblanks
        # suck up this and following PS2 lines
        source = []
        while 1:
            source.append(line[j:])
            line = lines[i]
            if line[:j] == blanks + PS2:
                i = i + 1
            else:
                break
        if len(source) == 1:
            source = source[0]
        else:
            source = string.join(source, "\n") + "\n"
        # suck up response
        if PS1(line) or string.strip(line) == "":
            expect = None
        else:
            expect = []
            while 1:
                assert line[:nblanks] == blanks
                expect.append(line[nblanks:])
                i = i + 1
                line = lines[i]
                if PS1(line) or string.strip(line) == "":
                    break
            expect = string.join(expect, "\n") + "\n"
        examples.append( (source, expect) )
    return examples
# Capture stdout when running examples.
class _SpoofOut:
    def __init__(self):
        self.clear()
    def write(self, s):
        self.buf = self.buf + s
    def get(self):
        return self.buf
    def clear(self):
        self.buf = ""
# Display some tag-and-msg pairs nicely, keeping the tag and its msg
# on the same line when that makes sense.
def _tag_out(printer, *tag_msg_pairs):
    for tag, msg in tag_msg_pairs:
        printer(tag + ":")
        msg_has_nl = msg[-1:] == "\n"
        msg_has_two_nl = msg_has_nl and \
                        _string_find(msg, "\n") < len(msg) - 1
        if len(tag) + len(msg) < 76 and not msg_has_two_nl:
            printer(" ")
        else:
            printer("\n")
        printer(msg)
        if not msg_has_nl:
            printer("\n")
# Run list of examples, in context globs. "out" can be used to display
# stuff to "the real" stdout, and fakeout is an instance of _SpoofOut
# that captures the examples' std output. Return (#failures, #tries).
def _run_examples_inner(out, fakeout, examples, globs, verbose):
    import sys
    TRYING, BOOM, OK, FAIL = range(4)
    tries = failures = 0
    for source, want in examples:
        if verbose:
            _tag_out(out, ("Trying", source),
                          ("Expecting",
                           want is None and "nothing" or want))
        fakeout.clear()
        state = TRYING
        tries = tries + 1
        if want is None:
            # this must be an exec
            want = ""
            try:
                exec source in globs
                got = fakeout.get()  # expect this to be empty
                state = OK
            except:
                etype, evalue = sys.exc_info()[:2]
                state = BOOM
        else:
            # can't tell whether to eval or exec without trying
            try:
                result = eval(source, globs)
                # interactive console applies repr to result, so us too
                got = fakeout.get() + repr(result) + "\n"
                state = OK
            except SyntaxError:
                # must need an exec
                pass
            except:
                etype, evalue = sys.exc_info()[:2]
                state = BOOM
            if state == TRYING:
                try:
                    exec source in globs
                    got = fakeout.get()
                    state = OK
                except:
                    etype, evalue = sys.exc_info()[:2]
                    state = BOOM
        assert state in (OK, BOOM)
        if state == OK:
            if got == want:
                if verbose:
                    out("ok\n")
                continue
            state = FAIL
        assert state in (FAIL, BOOM)
        failures = failures + 1
        out("*" * 65 + "\n")
        _tag_out(out, ("Failure in example", source))
        if state == FAIL:
            _tag_out(out, ("Expected", want), ("Got", got))
        else:
            assert state == BOOM
            _tag_out(out, ("Exception raised",
                           str(etype) + ": " + str(evalue)))
    return failures, tries
# Run list of examples, in context globs. Return (#failures, #tries).
def _run_examples(examples, globs, verbose):
    import sys
    saveout = sys.stdout
    try:
        sys.stdout = fakeout = _SpoofOut()
        x = _run_examples_inner(saveout.write, fakeout, examples,
                                globs, verbose)
    finally:
        sys.stdout = saveout
    return x
def run_docstring_examples(f, globs, verbose=0):
    """f, globs [,verbose] -> run examples from f.__doc__.

    Use globs as the globals for execution.
    Return (#failures, #tries).

    If optional arg verbose is true, print stuff even if there are no
    failures.
    """
    try:
        doc = f.__doc__
    except AttributeError:
        return 0, 0
    if not doc:
        # docstring empty or None
        return 0, 0
    e = _extract_examples(doc)
    if not e:
        return 0, 0
    return _run_examples(e, globs, verbose)
class _Tester:
    def __init__(self):
        self.globs = {}     # globals for execution
        self.deep = 1       # recurse into classes?
        self.verbose = 0    # print lots of stuff?
        self.name2ft = {}   # map name to (#failures, #trials) pairs

    def runone(self, target, name):
        if _string_find(name, "._") >= 0:
            return 0, 0
        if self.verbose:
            print "Running", name + ".__doc__"
        f, t = run_docstring_examples(target, self.globs.copy(),
                                      self.verbose)
        if self.verbose:
            print f, "of", t, "examples failed in", name + ".__doc__"
        self.name2ft[name] = f, t
        if self.deep and type(target) is _ClassType:
            f2, t2 = self.rundict(target.__dict__, name)
            f = f + f2
            t = t + t2
        return f, t

    def rundict(self, d, name):
        f = t = 0
        for thisname, value in d.items():
            if type(value) in (_FunctionType, _ClassType):
                f2, t2 = self.runone(value, name + "." + thisname)
                f = f + f2
                t = t + t2
        return f, t

    def summarize(self):
        notests = []
        passed = []
        failed = []
        totalt = totalf = 0
        for x in self.name2ft.items():
            name, (f, t) = x
            assert f <= t
            totalt = totalt + t
            totalf = totalf + f
            if t == 0:
                notests.append(name)
            elif f == 0:
                passed.append( (name, t) )
            else:
                failed.append(x)
        if self.verbose:
            if notests:
                print len(notests), "items had no tests:"
                notests.sort()
                for thing in notests:
                    print " ", thing
            if passed:
                print len(passed), "items passed all tests:"
                passed.sort()
                for thing, count in passed:
                    print " %3d tests in %s" % (count, thing)
        if failed:
            print len(failed), "items had failures:"
            failed.sort()
            for thing, (f, t) in failed:
                print " %3d of %3d in %s" % (f, t, thing)
        if self.verbose:
            # totalt already counts every example tried, failures included
            print totalt, "tests in", len(self.name2ft), "items."
            print totalt - totalf, "passed and", totalf, "failed."
        if totalf:
            print "***Test Failed***", totalf, "failures."
        elif self.verbose:
            print "Test passed."
master = _Tester()
def testmod(m, name=None, globs=None, verbose=None, deep=1, report=1):
    """m, name=None, globs=None, verbose=None, deep=1, report=1

    Test examples in docstrings in functions and classes reachable from
    module m, starting with m.__doc__. Names beginning with a leading
    underscore are skipped.

    See doctest.__doc__ for an overview.

    Optional keyword arg "name" gives the name of the module; by default
    use m.__name__.

    Optional keyword arg "globs" gives a dict to be used as the globals
    when executing examples; by default, use m.__dict__. A copy of this
    dict is actually used for each docstring.

    Optional keyword arg "verbose" prints lots of stuff if true, prints
    only failures if false; by default, it's true iff "-v" is in sys.argv.

    Optional keyword arg "deep" tests *only* m.__doc__ when false.

    Optional keyword arg "report" prints a summary at the end when true,
    else prints nothing at the end. In verbose mode, the summary is
    detailed, else very brief.
    """
    if type(m) is not _ModuleType:
        raise TypeError("testmod: module required; " + `m`)
    if name is None:
        name = m.__name__
    if globs is None:
        globs = m.__dict__
    if verbose is None:
        import sys
        verbose = "-v" in sys.argv
    master.globs = globs
    master.verbose = verbose
    master.deep = deep
    failures, tries = master.runone(m, name)
    if deep:
        f, t = master.rundict(m.__dict__, name)
        failures = failures + f
        tries = tries + t
    if report:
        master.summarize()
    return failures, tries
class TestClass:
    """
    A pointless class, for sanity-checking of docstring testing.

    Methods:
        square()
        get()

    >>> TestClass(13).get() + TestClass(-12).get()
    1
    >>> hex(TestClass(13).square().get())
    '0xa9'
    """

    def __init__(self, val):
        self.val = val

    def square(self):
        """square() -> square TestClass's associated value

        >>> TestClass(13).square().get()
        169
        """
        self.val = self.val ** 2
        return self

    def get(self):
        """get() -> return TestClass's associated value.

        >>> x = TestClass(-42)
        >>> print x.get()
        -42
        """
        return self.val
def _test():
    import doctest
    return doctest.testmod(doctest)

if __name__ == "__main__":
    _test()
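
The pairing of source and expected output that `_extract_examples` performs above can be explored with the parser in the present-day standard-library `doctest`. This is a Python 3 sketch under that assumption, not a port of the code above: `SESSION` and `extract_pairs` are hypothetical names, and the modern parser differs in small ways (source text always ends with a newline, and an example with no output gets an empty string rather than None).

```python
import doctest

# A pasted interactive session, as it might appear inside a docstring.
SESSION = '''
Some prose before the examples.

    >>> x = 12
    >>> x
    12
    >>> if x == 13:
    ...     print("yes")
    ... else:
    ...     print("no")
    no
'''


def extract_pairs(text):
    """Return a list of (source, expected_output) string pairs."""
    examples = doctest.DocTestParser().get_examples(text)
    return [(ex.source, ex.want) for ex in examples]


if __name__ == "__main__":
    for source, want in extract_pairs(SESSION):
        print(repr(source), "->", repr(want))
```

As in the original, the expected output runs from the line after the code up to the next prompt or blank line, and the leading blanks of the prompt line are stripped from it.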