xmlutils.py is a set of Python utilities for processing xml files serially, namely converting
them to other formats (SQL, CSV, JSON). The scripts use ElementTree.iterparse() to iterate
through nodes in an XML file, thus not needing to load the whole DOM into memory.
The scripts can be used to churn through large XML files (albeit taking long :P) without memory hiccups.

Blind conversion of XML to CSV and SQL is not recommended.
It only works if the structure of the XML document is simple (flat).
On the other hand, xml2json supports complex XML documents with multiple nested hierarchies.
Lastly, the XML files are not validated at the time of conversion.

Kailash Nadh, October 2011

License:        MIT License

Documentation: http://nadh.in/code/xmlutils.py

Installation
============
With pip or easy_install

``pip install xmlutils`` or ``easy_install xmlutils``

Commandline utilities
=====================

xml2csv
-------
Convert an XML document to a CSV file.

``xml2csv --input "samples/fruits.xml" --output "samples/fruits.csv" --tag "item"``

Arguments
^^^^^^^^^
--input         Input XML document's filename*
--output        Output CSV file's filename*
--tag           The tag of the node that represents a single record (Eg: item, record)*
--delimiter     Delimiter for seperating items in a row. Default is , (a comma followed by a space)
--ignore        A space separated list of element tags in the XML document to ignore.
--header        Whether to print the CSV header (list of fields) in the first line; 1=yes, 0=no. Default is 1.
--encoding      Character encoding of the document. Default is utf-8
--limit         Limit the number of records to be processed from the document to a particular number. Default is no limit (-1)
--buffer        The number of records to be kept in memory before it is written to the output CSV file. Helps reduce the number of disk writes. Default is 1000.


xml2sql
-------
Convert an XML document to an SQL file.

``xml2sql --input "samples/fruits.xml" --output "samples/fruits.sql" --tag "item" --table "myfruits"``

Arguments
^^^^^^^^^
::

 tag     -- the record tag. eg: item
 table   -- table name
 ignore  -- list of tags to ignore
 limit   -- maximum number of records to process
 packet  -- maximum size of an insert query in MB (MySQL's max_allowed_packet)

 Returns:
 {       num: number of records converted,
         num_insert: number of sql insert statements generated
 }


xml2json
--------
Convert XML to JSON.
xml2json supports hierarchies nested to any number of levels.

``xml2json --input "samples/fruits.xml" --output "samples/fruits.sql"``

Modules
=======

xmlutils.xml2sql
----------------

::

  from xmlutils.xml2sql import xml2sql

  converter = xml2sql("samples/fruits.xml", "samples/fruits.sql", encoding="utf-8")
  converter.convert(tag="item", table="table")

Arguments
^^^^^^^^^
::

  tag     -- the record tag. eg: item
  table   -- table name
  ignore  -- list of tags to ignore
  limit   -- maximum number of records to process
  packet  -- maximum size of an insert query in MB (MySQL's max_allowed_packet)

  Returns:
  {       num: number of records converted,
          num_insert: number of sql insert statements generated
  }


xmlutils.xml2csv
----------------
::

  from xmlutils.xml2csv import xml2csv

  converter = xml2csv("samples/fruits.xml", "samples/fruits.sql", encoding="utf-8")
  converter.convert(tag="item")


Arguments
^^^^^^^^^
::

  tag     -- the record tag. eg: item
  delimiter -- csv field delimiter
  ignore  -- list of tags to ignore
  limit   -- maximum number of records to process
  buffer  -- number of records to keep in buffer before writing to disk

  Returns:
  number of records converted

xmlutils.xml2json
-----------------
::

  from xmlutils.xml2json import xml2json

  converter = xml2json("samples/fruits.xml", "samples/fruits.sql", encoding="utf-8")
  converter.convert()

  # to get a json string
  converter = xml2json("samples/fruits.xml", encoding="utf-8")
  print converter.get_json()

Arguments
^^^^^^^^^
``pretty  -- pretty print?``
