December 3, 2007

XML Tree Pruning

A little while back I had an XML document that I needed to prune down. Tools like XPath make it easy to pick out certain bits of an XML tree, but what I needed was the complement: keep the document intact, but zap specific bits of it. It turns out that libxml2 provides an easy way to do this. For instance, in Python:

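import libxml2

doc = libxml2.parseFile("file.xml")
xpc = doc.xpathNewContext()
# Select every item, plus every folder whose title text is not "Keeper"
for n in xpc.xpathEval('/root/item|/root/folder[./title/text()!="Keeper"]'):
    n.unlinkNode()  # detach the node from the tree
    n.freeNode()    # unlinked nodes are not freed with the document
doc.saveFormatFile("pruned.xml", True)  # True = indent the output
xpc.xpathFreeContext()
doc.freeDoc()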

I.e., if I have an XML tree with a root element containing a bunch of item and folder nodes, this will toss out all of them except the folder I want to keep (here, the one with title "Keeper"). One caveat: a folder with no title text never matches the != test, so it is kept as well. The procedure should be about the same from any language that can use libxml2.
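If the libxml2 bindings aren't handy, the same pruning can be sketched with Python's standard-library ElementTree. To be clear, this is a different library from the one used above, and the sample document, the prune() helper and its keep_title parameter are all made up for illustration. ElementTree's XPath subset has no != test, so the check is done in plain Python instead:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: a root holding items and folders.
SAMPLE = """<root>
  <item><title>A</title></item>
  <folder><title>Keeper</title><item><title>inside</title></item></folder>
  <folder><title>Other</title></folder>
  <folder><note>untitled</note></folder>
  <item><title>B</title></item>
</root>"""


def prune(xml_text, keep_title="Keeper"):
    """Drop top-level items, and top-level folders titled something else."""
    root = ET.fromstring(xml_text)
    # Loop over a copy of the child list, since we remove while iterating.
    for child in list(root):
        if child.tag == "item":
            root.remove(child)
        elif child.tag == "folder":
            title = child.findtext("title")
            # Like the XPath test, a folder without title text is left alone.
            if title and title != keep_title:
                root.remove(child)
    return root


pruned = prune(SAMPLE)
remaining = [(c.tag, c.findtext("title")) for c in pruned]
print(remaining)
```

Only direct children of the root are examined, so the item nested inside the "Keeper" folder survives, just as it would with the /root/item path in the XPath version.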

Posted by Milligan at December 3, 2007 2:32 PM