FAQTs - Knowledge Base - View Entry - I have a really big XML file, but only need to read a small part. Do I have to read it all in memory to parse?

I have a really big XML file, but only need to read a small part. Do I have to read it all in memory to parse?

Jul 22nd, 2002 12:15
Michael Chermside, Henrik Motakef, Fredrik Lundh

Normally, using the DOM approach to XML processing (instead of the SAX
approach) requires reading the entire document into memory. But if you
only need to process a small portion of the document, Python has a
version of the DOM, xml.dom.pulldom, which works on a "pull" basis: it
parses the file incrementally as you iterate and builds a DOM subtree
only for the elements you explicitly expand. Here is a snippet of
sample code that Fredrik Lundh posted to comp.lang.python (c.l.p):
>>> from xml.dom import pulldom
>>> source = pulldom.parse("somefile.xml")
>>> for event, node in source:
...     # node is a DOM node whose children have not been read yet
...     if event == pulldom.START_ELEMENT and node.tagName == "record":
...         # read the rest of this element so its children are available
...         source.expandNode(node)
...         process(node)  # process() stands for your own handler
...
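
If you only need one record, you can return as soon as you find it, so
the rest of the file is never parsed. The following is a minimal sketch
of that idea, not part of the original post. The sample data, the "id"
attribute, the "name" child element and the helper names find_record()
and text_of() are all made up for illustration, and an in-memory stream
stands in for a large file on disk.

```python
import io
from xml.dom import pulldom

# Hypothetical sample data; a real use would open a large file instead.
SAMPLE = b"""<records>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
  <record id="3"><name>gamma</name></record>
</records>"""


def find_record(stream, record_id):
    """Return the first <record> element whose id matches, or None.

    Parsing stops as soon as the record is found, so nothing after it
    in the stream is turned into DOM nodes.
    """
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "record":
            # Attributes are already available on the unexpanded node.
            if node.getAttribute("id") == record_id:
                # Pull in the children of this one element only.
                events.expandNode(node)
                return node
    return None


def text_of(element):
    """Concatenate the text nodes that are direct children of element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


record = find_record(io.BytesIO(SAMPLE), "2")
name = record.getElementsByTagName("name")[0]
print(text_of(name))
```

Checking the attribute before calling expandNode() means that records
you do not care about are never expanded into a subtree.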