A while back I needed to convert a ton (millions) of small XML files to JSON so I could store them in MongoDB. To that end I wrote a teensy-tiny tool called xml-to-json (GitHub, Hackage). Originally it was just a command-line tool with all the code thrown into a single file.

So this week I did a quick refactor to split it into a library + executable, and pushed it to GitHub (to deafening cries of joy).

Features

First, a non-feature. xml-to-json is “optimized” for many small XML files. If that is your workload, you can easily take advantage of multiple cores / CPUs. You should be aware, though, that for large files (over 10MB of XML data in a single file) something starts to eat up RAM, to the tune of around 50 times the size of the file.
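To make the "many small files, many cores" point concrete, here is a minimal sketch of fanning a list of files out over a few worker threads, using only base. It is not how xml-to-json itself is wired up: `mapPool` and `convertAll` are names I made up for illustration, and the per-file conversion is a placeholder you pass in yourself.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities)
import Control.Concurrent.MVar (modifyMVar, newEmptyMVar, newMVar, putMVar, takeMVar)
import Control.Exception (SomeException, try)
import Control.Monad (forM, replicateM_)

-- | Run an action over every input using at most n worker threads.
-- Results come back in input order. A failure on one input is returned
-- as a Left and does not stop the remaining inputs.
-- (Hypothetical helper, not part of xml-to-json.)
mapPool :: Int -> (a -> IO b) -> [a] -> IO [Either SomeException b]
mapPool n act xs = do
  -- Pair every input with an empty slot that will receive its result.
  slots <- forM xs $ \x -> do
    slot <- newEmptyMVar
    pure (x, slot)
  -- Shared work queue: each worker pops the next job until none are left.
  queue <- newMVar slots
  let worker = do
        next <- modifyMVar queue $ \q -> pure $ case q of
          []       -> ([], Nothing)
          (j : js) -> (js, Just j)
        case next of
          Nothing        -> pure ()
          Just (x, slot) -> do
            r <- try (act x)
            putMVar slot r
            worker
  replicateM_ (max 1 n) (forkIO worker)
  -- Block until every slot has been filled, preserving input order.
  mapM (takeMVar . snd) slots

-- | Convert every file, with one worker per capability (see +RTS -N).
-- The conversion function is a placeholder supplied by the caller; it
-- should do its real work in IO (e.g. write the JSON out), otherwise
-- laziness may defer that work to whoever inspects the result.
convertAll :: (FilePath -> IO ()) -> [FilePath] -> IO [Either SomeException ()]
convertAll convertFile files = do
  n <- getNumCapabilities
  mapPool n convertFile files
```

For the threads to actually land on separate cores, the program has to be built with `-threaded -rtsopts` and run with `+RTS -N`. Since each small file is an independent job, a plain work queue like this is about all the coordination that is needed.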

Other features:

Packages used

For XML decoding, I’m using hxt (over expat, via hxt-expat). I tried a few of the XML packages on Hackage, and hxt + expat was the only way I could parse quickly while avoiding nasty memory issues. Apparently, tagsoup can be used with ByteStrings to avoid the same issue, but I didn’t try it.

JSON is encoded using aeson.