Metadata-Version: 1.1
Name: jx-sqlite
Version: 0.9.17206
Summary: JSON query expressions using SQLite
Home-page: https://github.com/mozilla/jx-sqlite
Author: Rohit Kumar, Kyle Lahnakoski
Author-email: rohitkumar.a255@gmail.com, kyle@lahnakoski.com
License: MPL 2.0
Description: jx-sqlite
        =========
        
        JSON query expressions using SQLite
        
        Overview
        --------
        
        An attempt to store JSON documents in SQLite so that they are accessible
        via SQL. The hope is this will serve a basis for a general
        document-relational map (DRM), and leverage the database's query
        optimizer.
        
        Status
        ------
        
        It looks like many tests already pass, but those are the easy ones. The
        difficult tests, testing queries into nested arrays, remain to be
        solved. Hopefully, there will be a GSOC project to refactor and finish
        this work.
        
        The tests fail because what I have written does not handle the most
        interesting, and most important features: We want to query nested object
        arrays as if they were just another table. This is important for two
        reasons;
        
        1. Inner objects ``{"a":{"b":0}}`` are a shortcut for nested arrays
           ``{"a":[{"b":0}]}``, plus
        2. Schemas can be expanded from one-to-one to one-to-many
           ``{"a":[{"b":0}, {"b":1}]}``.
        
        Running tests
        -------------
        
        ::
        
            export PYTHONPATH=.
            python -m unittest discover -v -s tests
        
        Design
        ------
        
        Nomenclature
        ~~~~~~~~~~~~
        
        -  **Nested Object** - An object in an array
        -  **Inner Object** - A child object, no arrays
        
        This nomenclature is different than most documentation that talks about
        JSON documents: Other documentation will use the word *nested* to refer
        to either sub-structure ambiguously.
        
        Overloading property types
        ~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        A distinction between databases and document stores is the ability to
        store different primitive types at the same property. To overcome this,
        we markup the columns of the database with the type. Let's put two
        objects into the ``example`` table:
        
        ::
        
            {"a": 1}
            {"a": "hello"}
        
        We markup the columns with ``$``\ +typename, with the hope we avoid
        namespace collisions.
        
        ``example``
        ~~~~~~~~~~~
        
        \| \_id \| a.\ :math:`integer | a.`\ string \|
        \|-----\|------------\|-----------\| \| 0 \| 1 \| null \| \| 1 \| null
        \| "hello" \|
        
        We add an ``_id`` column as a UID, so we can distinguish documents.
        
        The good thing about adding the type to the name is we can store
        primitive values:
        
        ::
        
            "hello world"
        
        \| \_id \| a.\ :math:`integer | a.`\ string \| $string \|
        \|-----\|------------\|-----------\|---------------\| \| 2 \| null \|
        null \| "hello world" \|
        
        Inner objects
        ~~~~~~~~~~~~~
        
        Limiting ourselves to inner objects, with no arrays, we can store them
        in the database so that each column represents a *path* to a literal
        value
        
        ::
        
            {"a": {"b": 1, "c": "test"}}
        
        there are only two leaves in this tree of documents:
        
        ``example``
        ~~~~~~~~~~~
        
        \| \_id \| a.b.\ :math:`integer | a.c.`\ string \|
        \|-----\|--------------\|-------------\| \| 3 \| 1 \| "test" \|
        
        When we encounter objects with different structures, we can perform
        schema expansion
        
        ::
        
            {"a": {"b": {"d": 3}}}
        
        we do this by adding columns to the table so we can store the new leaf
        values
        
        ``example``
        ~~~~~~~~~~~
        
        \| \_id \| a.b.\ :math:`integer | a.c.`\ string \| a.b.d.$integer \|
        \|-----\|--------------\|-------------\|----------------\| \| 4 \| null
        \| null \| 3 \|
        
        Nested Objects
        ~~~~~~~~~~~~~~
        
        When it comes to nesting objects, a new table will be required
        
        ::
        
            {"a": [{"b": 4}, {"b":5}]}
        
        Our fact table has no primitive values
        
        ``example``
        ~~~~~~~~~~~
        
        \| \_id \| a.b.\ :math:`integer | a.c.`\ string \| a.b.d.$integer \|
        \|-----\|--------------\|-------------\|----------------\| \| 5 \| null
        \| null \| null \|
        
        Our nested documents are stored in a new table, called ``example.a``
        
        ``example.a``
        ~~~~~~~~~~~~~
        
        | \_id \| \_order \| \_parent \| b.integer \|
        | --- \| ------ \| ------- \| --------- \|
        |  6 \| 0 \| 5 \| 4 \|
        |  7 \| 1 \| 5 \| 5 \|
        
        Child tables have a ``_id`` column, plus two others: ``_order`` so we
        can reconstruct the original JSON and ``_parent`` which is used to refer
        to the immediate parent of the array.
        
        More Design Docs
        ----------------
        
        -  `Snowflake <https://github.com/mozilla/jx-sqlite/blob/master/docs/Perspective.md>`__
        -  `JSON in
           Database <https://github.com/mozilla/jx-sqlite/blob/master/docs/JSON%20in%20Database.md>`__
        
        Open problems
        -------------
        
        **Do we copy the ``a.*`` columns from the ``example`` table to our new
        child table?** As I see it, there are two possible answers:
        
        1. **Do not copy** - If there is just one nested document, or it is an
           inner object, then it will fit in a single row of the parent, and
           reduce the number of rows returned when querying. The query
           complexity increases because we must consider the case of inner
           objects and the case of nested objects.
        2. **Copy** - Effectively move the columns to the new child table: This
           will simplify the queries because each path is realized in only one
           table but may be more expensive because every inner object will
           demand an SQL-join, it may be expensive to perform the alter table.
        
        **How to handle arrays of arrays?** I have not seen many examples in the
        wild yet. Usually, arrays of arrays represent a multidimensional array,
        where the number of elements in every array is the same. Maybe we can
        reject JSON that does not conform to a multidimensional interpretation.
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
