Merging two XML documents with the ElementTree

Posted by Rafal Zawadzki at Jul 02, 2010 02:37 PM

A simple recipe for merging several XML documents in Python with the ElementTree module.

In my current project (for Headnet.dk) we get data from an external source, a Xerox DocuShare instance. DocuShare provides an XML API over HTTP. The API looks quite complex, and after a quick search I couldn't find a way to get all documents using its search functionality.

So I decided to iterate over the folders (collections) starting from the site's root element, which gave me around 10 XML files. It's good to have the list of documents locally, but I really needed all this data in one file. Below is a small Python snippet I wrote to merge all the documents:
import xml.etree.ElementTree as ET

# FILES (list of input paths) and URLS['destination'] (output path)
# are defined elsewhere in the project.
root = None
for filename in FILES:
    tree = ET.parse(filename)
    if root is None:
        # the first document becomes the base of the merged tree
        root = tree.getroot()
    else:
        # move the top-level children of every later document into it
        root.extend(list(tree.getroot()))

merged = ET.tostring(root)
# this line is hacky and totally stupid, but it looks like
# the Solr XPath parser is broken more than they stated
# and cannot handle XML with the ns0: namespace prefix :(
correct_data = merged.replace(b'ns0:', b'')
# tostring() returns bytes here, so write the file in binary mode
with open(URLS['destination'], 'wb') as filewrite:
    filewrite.write(correct_data)
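To see the merge step in isolation, here is a minimal, self-contained sketch. It assumes every document shares the same root element and that only the root's direct children need to be carried over. The function name merge_xml_strings and the sample documents are made up for illustration; they are not DocuShare output.

```python
import xml.etree.ElementTree as ET


def merge_xml_strings(documents):
    """Merge XML documents by moving each root's children into the first root.

    `documents` is a hypothetical list of XML strings. Returns the merged
    root element, or None when the list is empty.
    """
    root = None
    for text in documents:
        current = ET.fromstring(text)
        if root is None:
            # The first document becomes the base of the merged tree.
            root = current
        else:
            # Copy the child list first, then append it to the base root.
            root.extend(list(current))
    return root


# Made-up sample data, for illustration only.
first = "<collection><document>a</document><document>b</document></collection>"
second = "<collection><document>c</document></collection>"

merged = merge_xml_strings([first, second])
print([child.text for child in merged])
```

Comparing against None, instead of testing the truthiness of a child list, keeps the merge working when the first document happens to have no children.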
A few words about removing the 'ns0:' string from the documents: you shouldn't do this. I had to because my destination's XPath parser (from Solr) couldn't work with the additional namespace prefixes added by ElementTree.
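If the prefixes really have to go, a less fragile option than string replacement may be to strip the namespace from the tag names before serializing. ElementTree stores a namespaced tag as '{uri}name', so dropping everything up to the closing brace leaves the local name. This is only a sketch and not what the project above did: the function name strip_namespaces and the sample namespace URI are made up, attributes are left untouched, and I have not checked the result against Solr.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag, in place."""
    for element in root.iter():
        # Comments and processing instructions have non-string tags.
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root


# Made-up document with a default namespace, for illustration only.
sample = '<a xmlns="http://example.com/ns"><b>text</b></a>'
root = strip_namespaces(ET.fromstring(sample))
output = ET.tostring(root, encoding="unicode")
print(output)
```

Because the tags no longer carry a namespace, the serializer has no reason to emit an ns0: prefix or an xmlns declaration. The same caveat applies as with the string replacement: elements from different namespaces can end up with identical names.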
