Entry
XML beautifier
Jul 5th, 2000 10:00
Nathan Wallace, Hans Nowak, Snippet 144, Andreas Jung
"""
Packages: text.xml
"""
"""
> I'm having a gas with the XML package and its DOM classes, but its toxml
> () mechanism outputs mainly flat XML -- no visual structure in the form
> of line shifts or indentation. Is there a Python module that such
> beautification reasonably hassle-free?
Here is just a very stupid program which does the job. It works
with regular expressions. You can although use the sgmllib
to parse the file, find the tags with the unknown_starttag() and
unknown_endtag() functions and indent the output corresponding.
Cheers,
Andreas
"""
import os,sys,re,string
import gzip
fname = sys.argv[1]
if fname[-2:] == 'gz':
data = gzip.GzipFile(fname,'r').read()
else:
data = open(fname,'r').read()
fields = re.split('(<.*?>)',data)
level = 0
for f in fields:
if string.strip(f)=='': continue
if f[0]=='<' and f[1] != '/':
print ' '*(level*4) + f
level = level + 1
elif f[:2]=='</':
level = level - 1
print ' '*(level*4) + f
else:
print ' '*(level*4) + f