Metadata-Version: 1.1
Name: MetaCSV
Version: 0.0.15
Summary: Tools for documentation-aware data reading, writing, and analysis
Home-page: https://github.com/delgadom/metacsv
Author: Michael Delgado
Author-email: delgado.michaelt@gmail.com
License: MIT
Description: =======
        MetaCSV
        =======
        
        
        .. image:: https://img.shields.io/travis/delgadom/metacsv/master.svg?style=flat-square
            :target: https://travis-ci.org/delgadom/metacsv
        
        .. image:: https://img.shields.io/pypi/v/metacsv.svg?style=flat-square
            :target: https://pypi.python.org/pypi/MetaCSV
        
        .. image:: https://img.shields.io/coveralls/delgadom/metacsv/master.svg?style=flat-square
            :target: https://coveralls.io/github/delgadom/metacsv?branch=master
        
        .. image:: https://img.shields.io/pypi/pyversions/metacsv.svg?style=flat-square
            :target: https://pypi.python.org/pypi/MetaCSV
        
        .. image:: https://anaconda.org/delgadom/metacsv/badges/version.svg
            :target: https://anaconda.org/delgadom/metacsv
        
        .. image:: https://anaconda.org/delgadom/metacsv/badges/downloads.svg
            :target: https://anaconda.org/delgadom/metacsv
        
        ``metacsv`` - Tools for documentation-aware data reading, writing, and analysis
        
        See the full documentation at ReadTheDocs_ 
        
        .. _ReadTheDocs: http://metacsv.rtfd.org
        
        Overview
        =========
        
        **MetaCSV** provides tools to read in CSV data with a yaml-compliant header 
        directly into a ``pandas`` ``Series``, ``DataFrame``, or ``Panel`` or an 
        ``xarray`` ``DataArray`` or ``Dataset``.
        
        Data specification
        ----------------------------
        
        Data can be specified using a yaml-formatted header, with the YAML *start-mark*
        string (``---``) above and the YAML *end-mark* string (``...``) below the yaml 
        block. Only one yaml block is allowed. If the start-mark string is not the 
        first (non-whitespace) line in the file, all of the file's contents will be 
        interpreted by the csv reader. The yaml data can have arbitrary complexity.
        
        .. code-block:: python
        
            >>> import metacsv, numpy as np
            >>> import StringIO as io # import io for python 3
            >>> doc = io.StringIO('''
            ---
            author: A Person
            date:   2000-12-31
            variables:
                pop:
                  name: Population
                  unit: millions
                gdp:
                  name: Product
                  unit: 2005 $Bn
            ...
            region,year,pop,gdp
            USA,2010,309.3,13599.3
            USA,2011,311.7,13817.0
            CAN,2010,34.0,1240.0
            CAN,2011,34.3,1276.7
            ''')
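        
        As a rough illustration of the layout described above, the following sketch 
        separates the yaml block from the CSV body using only the standard library 
        (Python 3). The helper name ``split_header`` is hypothetical, and this is not 
        necessarily how ``metacsv`` parses files internally; it only mirrors the rule 
        that the start-mark must be the first non-whitespace line.
        
        .. code-block:: python
        
            import csv
            import io
            
            
            def split_header(text):
                """Return (header_text, csv_text) for a MetaCSV-style string.
            
                Hypothetical helper for illustration only. If the first
                non-whitespace line is not '---', the header is empty and
                all of the text is treated as CSV.
                """
                lines = text.lstrip().splitlines()
                if not lines or lines[0].strip() != '---':
                    return '', text
                # Find the end-mark that closes the single yaml block.
                for i, line in enumerate(lines[1:], start=1):
                    if line.strip() == '...':
                        header = '\n'.join(lines[1:i])
                        body = '\n'.join(lines[i + 1:])
                        return header, body
                raise ValueError('yaml start-mark found without an end-mark')
            
            
            raw = '''
            ---
            author: A Person
            date:   2000-12-31
            ...
            region,year,pop,gdp
            USA,2010,309.3,13599.3
            CAN,2010,34.0,1240.0
            '''
            
            header, body = split_header(raw)
            rows = list(csv.reader(io.StringIO(body)))
            print(header)
            print(rows[0])
        
        The header text returned here would still need to be passed to a yaml parser; 
        the sketch stops at the split so that it has no third-party dependencies.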
        
        
        
        Using MetaCSV-formatted files in python
        --------------------------------------------
        
        
        Read MetaCSV-formatted data into python using pandas-like syntax: 
        
        .. code-block:: python
        
            >>> df = metacsv.read_csv(doc, index_col=[0,1])
            >>> df
            <metacsv.core.containers.DataFrame (4, 2)>
                           pop      gdp
            region year
            USA    2010  309.3  13599.3
                   2011  311.7  13817.0
            CAN    2010   34.0   1240.0
                   2011   34.3   1276.7
            
            Variables
                gdp:       OrderedDict([('name', 'Product'), ('unit', '2005 $Bn')])
                pop:       OrderedDict([('name', 'Population'), ('unit', 'millions')])
            Attributes
                date:      2000-12-31
                author:    A Person
        
        These properties can be transferred from one data container to another:
        
        .. code-block:: python
        
            >>> s = metacsv.Series(np.random.random(6))
            >>> s
            <metacsv.core.containers.Series (6,)>
            0    0.881924
            1    0.556330
            2    0.554700
            3    0.221284
            4    0.970801
            5    0.946414
            dtype: float64
            >>> s.attrs = df.attrs
            >>> s
            <metacsv.core.containers.Series (6,)>
            0    0.881924
            1    0.556330
            2    0.554700
            3    0.221284
            4    0.970801
            5    0.946414
            dtype: float64
        
            Attributes
                date:      2000-12-31
                author:    A Person
            
        
        All MetaCSV attributes, including the ``attrs`` Attribute object, can be copied, 
        assigned to new objects, and deleted. Since these attributes are largely 
        unstable across normal pandas data processing, it is recommended that attributes 
        be copied before data work is attempted and then reassigned before IO 
        conversions.
        
        
        Exporting MetaCSV data to other formats
        -----------------------------------------------
        
        CSV
        ~~~~~~~~~
        
        A MetaCSV ``Series`` or ``DataFrame`` can be written as a yaml-prefixed CSV 
        using the same ``to_csv`` syntax as its ``pandas`` counterpart:
        
        .. code-block:: python
        
            >>> df.attrs['new attribute'] = 'changed in python!'
            >>> df.to_csv('my_new_data.csv')
        
        The resulting csv will include a yaml-formatted header with the original 
        metadata updated to include ``attrs['new attribute']``.
        
        
        pandas
        ~~~~~~~~~~~~~~~
        
        The coordinates and MetaCSV attributes can be easily stripped from a MetaCSV 
        Container:
        
        .. code-block:: python
        
            >>> df.to_pandas()
                           pop      gdp
            region year
            USA    2010  309.3  13599.3
                   2011  311.7  13817.0
            CAN    2010   34.0   1240.0
                   2011   34.3   1276.7
        
        
        
        xarray/netCDF
        ~~~~~~~~~~~~~~~
        
        `xarray <http://xarray.pydata.org/>`_ provides a pandas-like interface for 
        operating on indexed ``ndarray`` data. It is modeled on the ``netCDF`` data 
        storage format used frequently in climate science, but is useful for many 
        applications with higher-order data.
        
        
        
        .. code-block:: python
        
            >>> ds = df.to_xarray()
            >>> ds
            <xarray.Dataset>
            Dimensions:  (region: 2, year: 2)
            Coordinates:
              * region   (region) object 'USA' 'CAN'
              * year     (year) int64 2010 2011
            Data variables:
                pop      (region, year) float64 309.3 311.7 34.0 34.3
                gdp      (region, year) float64 1.36e+04 1.382e+04 1.24e+03 1.277e+03
            Attributes:
                date: 2000-12-31
                author: A Person
            >>> ds.to_netcdf('my_netcdf_data.nc')
        
        Pickling
        ~~~~~~~~~
        
        Pickling works just as it does in pandas:
        
        .. code-block:: python
        
            >>> df.to_pickle('my_metacsv_pickle.pkl')
            >>> metacsv.read_pickle('my_metacsv_pickle.pkl')
            <metacsv.core.containers.DataFrame (4, 2)>
                           pop      gdp
            region year
            USA    2010  309.3  13599.3
                   2011  311.7  13817.0
            CAN    2010   34.0   1240.0
                   2011   34.3   1276.7
        
            Variables
                gdp:       OrderedDict([('name', 'Product'), ('unit', '2005 $Bn')])
                pop:       OrderedDict([('name', 'Population'), ('unit', 'millions')])
            Attributes
                date:      2000-12-31
                author:    A Person
        
        
        
        Others
        ~~~~~~~~~
        
        Currently, MetaCSV only supports conversion to CSV and to netCDF through the 
        ``xarray`` module. However, feel free to suggest additional features and to 
        contribute your own!
        
        
        
        Conversion to other types on the fly
        -----------------------------------------------
        
        Special conversion utilities allow you to convert any metacsv, pandas, or xarray 
        container or a CSV filepath into any other type in this group.
        
        All of these conversion utilities are also methods on metacsv containers.
        
        * to_csv
        
        ``to_csv`` allows you to write any container or csv file to a metacsv-formatted 
        csv file. Keyword arguments ``attrs``, ``coords``, and ``variables`` will be 
        attached to the data before it is written. Where these arguments conflict with 
        attributes already on the data, the arguments take precedence.
        
        .. code-block:: python
        
            >>> import pandas as pd, numpy as np, xarray as xr, metacsv
            >>> df = pd.DataFrame(np.random.random((3,4)), columns=list('abcd'))
            >>> df
                      a         b         c         d
            0  0.558083  0.665184  0.226173  0.339905
            1  0.541712  0.835804  0.326078  0.179103
            2  0.332869  0.435573  0.904612  0.823884
            
            >>> metacsv.to_csv(df, 'mycsv.csv', attrs={'author': 'my name', 'date': '2015-12-31'})
            >>> 
            >>> df2 = metacsv.read_csv('mycsv.csv', index_col=[0])
            >>> df2
            <metacsv.core.containers.DataFrame (3, 4)>
                      a         b         c         d
            0  0.558083  0.665184  0.226173  0.339905
            1  0.541712  0.835804  0.326078  0.179103
            2  0.332869  0.435573  0.904612  0.823884
            
            Attributes
                date:      2015-12-31
                author:    my name
            
            >>> metacsv.to_csv(df2, 'mycsv.csv', attrs={'author': 'new name'})
            >>> 
            >>> metacsv.read_csv('mycsv.csv', index_col=[0])
            <metacsv.core.containers.DataFrame (3, 4)>
                      a         b         c         d
            0  0.558083  0.665184  0.226173  0.339905
            1  0.541712  0.835804  0.326078  0.179103
            2  0.332869  0.435573  0.904612  0.823884
            
            Attributes
                date:      2015-12-31
                author:    new name
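        
        The update rule in the example above behaves like a dictionary update: keys 
        passed to the function win, and keys that are not passed are kept. A minimal 
        sketch of that rule, using plain dicts rather than ``metacsv`` objects (the 
        function name ``merge_attrs`` is hypothetical):
        
        .. code-block:: python
        
            def merge_attrs(existing, overrides=None):
                """Return a new dict in which ``overrides`` win on conflicts."""
                merged = dict(existing)          # copy, so the input is not mutated
                merged.update(overrides or {})   # conflicting keys are replaced
                return merged
            
            
            stored = {'author': 'my name', 'date': '2015-12-31'}
            merged = merge_attrs(stored, {'author': 'new name'})
            print(merged)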
        
        * to_header
        
        ``to_header`` allows you to write the special attributes directly to a 
        metacsv-formatted header file. The special attributes may be individually 
        specified or taken from a metacsv container. The ``header_file`` argument to 
        both ``read_csv`` and ``to_csv`` lets you keep the metacsv-formatted header 
        in a file separate from the data if desired.
        
        For example, say you have a table to read into pandas:
        
        .. code-block:: python
        
            >>> import metacsv, pandas as pd
            >>> pd.DataFrame(
            ...     [['x',1,2,3],['y',4,5,6],['z',7,8,9]],
            ...     columns=['index','a','b','c']).to_csv('mycsv.csv', index=None)
            >>> metacsv.read_csv('mycsv.csv')
            <metacsv.core.containers.DataFrame (3, 4)>
              index  a  b  c
            0     x  1  2  3
            1     y  4  5  6
            2     z  7  8  9
        
        A separate header file can be created and then read in together with the data:
        
        .. code-block:: python
        
            >>> metacsv.to_header('mycsv.header', attrs={'author': 'me'}, coords='index')
            >>> metacsv.read_csv('mycsv.csv', header_file='mycsv.header')
            <metacsv.core.containers.DataFrame (3, 3)>
                   a  b  c
            index
            x      1  2  3
            y      4  5  6
            z      7  8  9
        
            Coordinates
              * index      (index) object x, y, z
            Attributes
                author:         me
        
        
        * to_xarray
        
        ``to_xarray`` returns any container or csv file as an xarray container. Table 
        data (CSV files and DataFrames) will create ``xarray.Dataset`` objects, while 
        Series objects will create ``xarray.DataArray`` objects. Keyword arguments 
        ``attrs``, ``coords``, and ``variables`` will be attached to the data before it 
        is converted. Where these arguments conflict with attributes already on the 
        data, the arguments take precedence.
        
        * to_dataarray
        
        ``to_dataarray`` returns any container or csv file as an ``xarray.DataArray``. 
        Table data (CSV files and DataFrames) will be stacked, with columns re-arranged 
        as new ``xarray.Coordinates``. Keyword arguments ``attrs``, ``coords``, and 
        ``variables`` will be attached to the data before it is converted. Where these 
        arguments conflict with attributes already on the data, the arguments take 
        precedence.
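        
        To picture what stacking means here, the sketch below flattens a small table 
        into one value per ``(index, column)`` pair using plain Python. It is only an 
        illustration of the idea, with made-up variable names, and does not call 
        ``metacsv`` or ``xarray``.
        
        .. code-block:: python
        
            index = ['x', 'y', 'z']
            columns = ['a', 'b', 'c']
            table = [
                [1, 2, 3],
                [4, 5, 6],
                [7, 8, 9],
            ]
            
            # Stack: one entry per (index, column) pair, so the column labels
            # act as a second coordinate rather than as separate variables.
            stacked = {
                (idx, col): table[i][j]
                for i, idx in enumerate(index)
                for j, col in enumerate(columns)
            }
            
            print(stacked[('y', 'b')])
            print(len(stacked))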
        
        * to_dataset
        
        ``to_dataset`` returns any container or csv file as an ``xarray.Dataset``. 
        Keyword arguments ``attrs``, ``coords``, and ``variables`` will be attached to 
        the data before it is converted. Where these arguments conflict with 
        attributes already on the data, the arguments take precedence.
        
        * to_pandas
        
        ``to_pandas`` strips special attributes and returns an ordinary ``Series`` or 
        ``DataFrame`` object.
        
        * to_netcdf
        
        ``to_netcdf`` first converts a container or csv file to an ``xarray.Dataset`` 
        using the ``to_dataset`` function, then writes the dataset to file with the
        ``xarray`` ``ds.to_netcdf`` method.
        
        .. code-block:: python
        
            >>> metacsv.to_netcdf('mycsv.csv', 'mycsv.nc', header_file='mycsv.header')
            >>> import xarray as xr
            >>> xr.open_dataset('mycsv.nc')
            <xarray.Dataset>
            Dimensions:  (index: 3)
            Coordinates:
              * index    (index) |S1 'x' 'y' 'z'
            Data variables:
                a        (index) int64 1 4 7
                b        (index) int64 2 5 8
                c        (index) int64 3 6 9
            Attributes:
                author: me
        
        Special attributes
        -----------------------------------------------
        
        The ``coords`` and ``variables`` attributes are keywords and are not simply 
        passed to the MetaCSV object's ``attrs`` attribute.
        
        
        Variables
        ~~~~~~~~~~~~~
        
        Variables are attributes which apply to specific columns or data variables. In 
        MetaCSV containers, variables are displayed as a separate set of attributes. On 
        conversion to ``xarray``, these attributes are assigned to variable-specific 
        ``attrs``:
        
        .. code-block:: python
        
            >>> ds = df.to_xarray()
            >>> ds
            <xarray.Dataset>
            Dimensions:  (index: 4)
            Coordinates:
              * index    (index) int64 0 1 2 3
            Data variables:
                region   (index) object 'USA' 'USA' 'CAN' 'CAN'
                year     (index) int64 2010 2011 2010 2011
                pop      (index) float64 309.3 311.7 34.0 34.3
                gdp      (index) float64 1.36e+04 1.382e+04 1.24e+03 1.277e+03
            Attributes:
                date: 2000-12-31
                author: A Person
            
            >>> ds.pop
            <xarray.DataArray 'pop' (index: 4)>
            array([ 309.3,  311.7,   34. ,   34.3])
            Coordinates:
              * index    (index) int64 0 1 2 3
            Attributes:
                name: Population
                unit: millions
        
        Note that at present, variables are not persistent across slicing operations.
        
        **parse_vars**
        
        Variables have a special argument to ``read_csv``: ``parse_vars`` allows parsing of one-line variable definitions in the format ``var: description [unit]``:
        
        .. code-block:: python
        
            >>> doc = io.StringIO('''
            ---
            author: A Person
            date:   2000-12-31
            variables:
                pop: Population [millions]
                gdp: Product [2005 $Bn]
            ...
            region,year,pop,gdp
            USA,2010,309.3,13599.3
            USA,2011,311.7,13817.0
            CAN,2010,34.0,1240.0
            CAN,2011,34.3,1276.7
            ''')
            
            >>> metacsv.read_csv(doc, index_col=0, parse_vars=True)
            <metacsv.core.containers.DataFrame (4, 3)>
                    year    pop      gdp
            region
            USA     2010  309.3  13599.3
            USA     2011  311.7  13817.0
            CAN     2010   34.0   1240.0
            CAN     2011   34.3   1276.7
            
            Variables
                gdp:       {u'description': 'Product', u'unit': '2005 $Bn'}
                pop:       {u'description': 'Population', u'unit': 'millions'}
            Attributes
                date:      2000-12-31
                author:    A Person
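        
        The one-line format can be pictured as a description followed by an optional 
        bracketed unit. The sketch below parses that pattern with a regular 
        expression; the function name ``parse_var`` and the exact pattern are 
        assumptions for illustration, not the parser that ``metacsv`` ships.
        
        .. code-block:: python
        
            import re
            
            # "description [unit]", where the bracketed unit is optional.
            VAR_PATTERN = re.compile(
                r'^(?P<description>.*?)\s*(?:\[(?P<unit>[^\]]*)\])?\s*$')
            
            
            def parse_var(definition):
                """Split 'description [unit]' into a dict (hypothetical helper)."""
                match = VAR_PATTERN.match(definition.strip())
                parsed = {'description': match.group('description')}
                if match.group('unit') is not None:
                    parsed['unit'] = match.group('unit').strip()
                return parsed
            
            
            print(parse_var('Population [millions]'))
            print(parse_var('Product [2005 $Bn]'))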
        
        Coordinates
        ~~~~~~~~~~~~~
        
        The conceptual foundation of coordinates is taken from ``xarray``, where data is 
        treated as an ndarray rather than a table. If you plan to only work with the 
        pandas-like features of ``metacsv``, you do not really need coordinates.
        
        That said, specifying the ``coords`` attribute in a csv results in automatic
        index handling:
        
        .. code-block:: python
        
            >>> doc = io.StringIO('''
            ---
            author: A Person
            date:   2000-12-31
            variables:
                pop:
                  name: Population
                  unit: millions
                gdp:
                  name: Product
                  unit: 2005 $Bn
            coords:
                - region
                - year
            ...
            region,year,pop,gdp
            USA,2010,309.3,13599.3
            USA,2011,311.7,13817.0
            CAN,2010,34.0,1240.0
            CAN,2011,34.3,1276.7
            ''')
            
            >>> df = metacsv.read_csv(doc)
            >>> df
            <metacsv.core.containers.DataFrame (4, 2)>
                           pop      gdp
            region year
            USA    2010  309.3  13599.3
                   2011  311.7  13817.0
            CAN    2010   34.0   1240.0
                   2011   34.3   1276.7
            
            Coordinates
              * region     (region) object CAN, USA
              * year       (year) int64 2010, 2011
            Variables
                gdp:       OrderedDict([('name', 'Product'), ('unit', '2005 $Bn')])
                pop:       OrderedDict([('name', 'Population'), ('unit', 'millions')])
            Attributes
                date:      2000-12-31
                author:    A Person
        
        
        Coordinates become especially useful, however, when moving to ``xarray`` objects 
        or ``netCDF`` files. The ``DataFrame`` above will have no trouble, as ``region`` 
        and ``year`` are orthogonal:
        
        .. code-block:: python
        
            >>> df.to_xarray()
            <xarray.Dataset>
            Dimensions:  (region: 2, year: 2)
            Coordinates:
              * region   (region) object 'USA' 'CAN'
              * year     (year) int64 2010 2011
            Data variables:
                pop      (region, year) float64 309.3 311.7 34.0 34.3
                gdp      (region, year) float64 1.36e+04 1.382e+04 1.24e+03 1.277e+03
            Attributes:
                date: 2000-12-31
                author: A Person
        
        This becomes more complicated when columns in the index are not independent and 
        cannot be thought of as orthogonal. In this case, you can specify ``coords`` as 
        a dict-like attribute either in the CSV header or as an argument to the 
        conversion method:
        
        .. code-block:: python
        
            >>> doc = io.StringIO('''
            ---
            coords:
                region:
                regname: 'region'
                continent: 'region'
                year:
            ...
            region,regname,continent,year,pop,gdp
            USA,United States,North America,2010,309.3,13599.3
            USA,United States,North America,2011,311.7,13817.0
            CAN,Canada,North America,2010,34.0,1240.0
            CAN,Canada,North America,2011,34.3,1276.7
            ''')
            
            >>> metacsv.to_xarray(doc)
            <xarray.Dataset>
            Dimensions:    (region: 2, year: 2)
            Coordinates:
              * region     (region) object 'USA' 'CAN'
              * year       (year) int64 2010 2011
                regname    (region) object 'United States' 'Canada'
                continent  (region) object 'North America' 'North America'
            Data variables:
                pop        (region, year) float64 309.3 311.7 34.0 34.3
                gdp        (region, year) float64 1.36e+04 1.382e+04 1.24e+03 1.277e+03
        
        Note that the resulting ``Dataset`` is not indexed by the cartesian product of 
        all four coordinates, but only by the base coordinates, indicated by the ``*``. 
        Without first setting the ``coords`` attribute this way, the resulting data 
        would have ``NaN`` values corresponding to ``(USA, Canada)`` and 
        ``(CAN, United States)``.
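        
        To see why the dependent coordinates matter, the sketch below checks, with 
        the standard library only (Python 3), whether one column is fully determined 
        by another, and compares the number of rows with the size of the cartesian 
        product. The helpers ``depends_on`` and ``product_size`` are hypothetical and 
        are not part of ``metacsv``.
        
        .. code-block:: python
        
            import csv
            import io
            
            data = '''region,regname,continent,year,pop,gdp
            USA,United States,North America,2010,309.3,13599.3
            USA,United States,North America,2011,311.7,13817.0
            CAN,Canada,North America,2010,34.0,1240.0
            CAN,Canada,North America,2011,34.3,1276.7
            '''
            
            rows = list(csv.DictReader(io.StringIO(data)))
            
            
            def depends_on(rows, column, base):
                """True if each value of ``base`` maps to one ``column`` value."""
                seen = {}
                for row in rows:
                    if seen.setdefault(row[base], row[column]) != row[column]:
                        return False
                return True
            
            
            def product_size(rows, columns):
                """Size of the cartesian product of distinct values per column."""
                size = 1
                for col in columns:
                    size *= len({row[col] for row in rows})
                return size
            
            
            print(depends_on(rows, 'regname', 'region'))   # regname follows region
            print(depends_on(rows, 'year', 'region'))      # year varies within region
            print(product_size(rows, ['region', 'year']))             # equals the row count
            print(product_size(rows, ['region', 'regname', 'year']))  # exceeds the row count
        
        When the product is larger than the number of rows, indexing by every column 
        leaves cells with no data, which is the source of the ``NaN`` values 
        mentioned above.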
        
        
        TODO
        ============
        
        * Allow automatic coercion of ``xarray.Dataset`` and ``xarray.DataArray`` 
          objects to MetaCSV containers.
        
        * Extend metacsv functionality to ``Panel`` objects
        
        * Make ``coords`` and ``attrs`` persistent across slicing operations 
          (try ``df['pop'].to_xarray()`` from above example and watch it 
          fail...)
        
        * Improve hooks between ``pandas`` and ``metacsv``:
        
          - update ``coord`` names on ``df.index.names`` assignment
          - update ``coords`` on stack/unstack
          - update ``coords`` on 
        
        * Improve parser to automatically strip trailing commas and other excel relics
        
        * Enable ``read_csv(engine='C')``... this currently does not work.
        
        * Handle attributes indexed by coord/variable names --> assign to 
          coord/variable-specific ``attrs``
        
        * Let's start an issue tracker and get rid of this section!
        
        * Should we rethink the naming of "special attributes" such as ``coords``? 
          Maybe these should have some special prefix like ``_coords`` when included 
          in yaml headers to avoid confusion with other generic attributes...
        
        * Allow attribute assertions (e.g. ``version='>1.6.0'``) in ``read_csv`` call
        
        * Improve test coverage
        
        * Improve documentation & build readthedocs page
        
        
        
        Feature Requests
        ==================
        * Create syntax for ``multi-csv`` --> ``Panel`` or combining using filename 
          regex 
        * Eventually? allow for on-disk manipulation of many/large files with 
          dask/xarray 
        * Eventually? add xml, SQL, other structured syntax language conversions
        
        
        
        
        ==============  ==========================================================
        Python support  Python 2.7, >= 3.3
        Source          https://github.com/delgadom/metacsv
        Docs            http://metacsv.rtfd.org
        Changelog       http://metacsv.readthedocs.org/en/latest/history.html
        API             http://metacsv.readthedocs.org/en/latest/api.html
        Issues          https://github.com/delgadom/metacsv/issues
        Travis          http://travis-ci.org/delgadom/metacsv
        Test coverage   https://coveralls.io/r/delgadom/metacsv
        pypi            https://pypi.python.org/pypi/metacsv
        Ohloh           https://www.ohloh.net/p/metacsv
        License         MIT
        git repo        .. code-block:: bash
        
                            $ git clone https://github.com/delgadom/metacsv.git
        install dev     .. code-block:: bash
        
                            $ git clone https://github.com/delgadom/metacsv.git metacsv
                            $ cd ./metacsv
                            $ virtualenv .env
                            $ source .env/bin/activate
                            $ pip install -e .
        tests           .. code-block:: bash
        
                            $ python setup.py test
        ==============  ==========================================================
        
        .. _BSD: http://opensource.org/licenses/BSD-3-Clause
        .. _Documentation: http://metacsv.readthedocs.org/en/latest/
        .. _API: http://metacsv.readthedocs.org/en/latest/api.html
        
        
        =========
        Changelog
        =========
        
        Here you can find the recent changes to MetaCSV.
        
        .. changelog::
            :version: dev
            :released: Ongoing
        
            .. change::
                :tags:  docs
        
                Updated CHANGES.
        
        .. changelog::
            :version: 0.0.1
            :released: 2016-05-04
        
            .. change::
                :tags: project
        
                First release on PyPI.
        
        
Keywords: MetaCSV
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
