A little while back I had an XML document that I needed to prune down. Now tools like XPath will easily let you pick out certain bits of an XML tree, but what I needed was kind of the complement: keep the document intact, but zap specific bits of it. Turns out that libxml2 provides an easy way to do this: select the unwanted nodes with XPath, then unlink each one from the tree. For instance, in Python:
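import libxml2

doc = libxml2.parseFile("file.xml")
xpc = doc.xpathNewContext()
# Select every item, plus every folder whose title is not "Keeper"
for n in xpc.xpathEval('/root/item|/root/folder[./title/text()!="Keeper"]'):
    n.unlinkNode()  # detach the node (and its subtree) from the document
doc.saveFormatFile("pruned.xml", True)
# The bindings do not clean these up automatically
xpc.xpathFreeContext()
doc.freeDoc()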
I.e. if I have an XML tree with a root element containing a bunch of item and folder nodes, this will toss out all of them except the folder I want to keep (here, the one with title "Keeper"). One caveat: a folder with no title at all also survives, because the != test needs an actual title text node to compare against. The procedure should be about the same from any language that can use libxml2.
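If the libxml2 bindings aren't handy, the same pruning idea can be sketched with Python's standard-library xml.etree.ElementTree instead. This is a different library from the one used above, not a drop-in equivalent; the sample document and the prune helper are made up for illustration, and unlike the XPath version this one also drops folders that have no title.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
SAMPLE = """<root>
  <item>one</item>
  <folder><title>Junk</title></folder>
  <folder><title>Keeper</title><item>nested</item></folder>
  <item>two</item>
</root>"""


def prune(root, keep_title="Keeper"):
    """Drop top-level items, and folders whose title is not keep_title."""
    # Loop over a copy of the child list, since we remove while iterating.
    for child in list(root):
        is_item = child.tag == "item"
        is_unwanted_folder = (
            child.tag == "folder" and child.findtext("title") != keep_title
        )
        if is_item or is_unwanted_folder:
            root.remove(child)
    return root


root = prune(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

Only direct children of the root are touched, so the item nested inside the kept folder stays put, just as it would with the /root/item path above.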
Posted by Milligan at December 3, 2007 02:32 PM