faqts : Computers : Programming : Languages : Python : Snippets

+ Search
Add Entry AlertManage Folder Edit Entry Add page to http://del.icio.us/
Did You Find This Entry Useful?

11 of 16 people (69%) answered Yes
Recently 8 of 10 people (80%) answered Yes

Entry

XML beautifier

Jul 5th, 2000 10:00
Nathan Wallace, Hans Nowak, Snippet 144, Andreas Jung


"""
Packages: text.xml
"""
"""
> I'm having a gas with the XML package and its DOM classes, but its toxml
> () mechanism outputs mainly flat XML -- no visual structure in the form 
> of line shifts or indentation. Is there a Python module that such 
> beautification reasonably hassle-free?
Here is just a very stupid program which does the job. It works
with regular expressions. You can although use the sgmllib
to parse the file, find the tags with the unknown_starttag() and
unknown_endtag() functions and indent the output corresponding.
Cheers,
Andreas
"""
import os,sys,re,string
import gzip
fname = sys.argv[1]
if fname[-2:] == 'gz':
    data = gzip.GzipFile(fname,'r').read()
else:
    data = open(fname,'r').read()
fields = re.split('(<.*?>)',data)
level = 0
for f in fields:
    if string.strip(f)=='': continue
    if f[0]=='<' and f[1] != '/':
        print ' '*(level*4) + f
        level = level + 1
    elif f[:2]=='</':
        level = level - 1
        print ' '*(level*4) + f
    else:
        print ' '*(level*4) + f