Metadata-Version: 2.4
Name: opteryx_core
Version: 0.6.56
Summary: Opteryx Query Engine
Home-page: https://github.com/mabel-dev/opteryx/
Author-email: Justin Joyce <justin.joyce@joocer.com>
Maintainer-email: Justin Joyce <justin.joyce@joocer.com>
Project-URL: Homepage, https://opteryx.dev/
Project-URL: Documentation, https://opteryx.dev/
Project-URL: Repository, https://github.com/mabel-dev/opteryx.git
Requires-Python: >=3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiohttp==3.13.*
Requires-Dist: numpy==2.4.*
Requires-Dist: orso==0.0.*
Requires-Dist: pyarrow==23.0.*
Provides-Extra: testing
Requires-Dist: freezegun; extra == "testing"
Provides-Extra: performance
Requires-Dist: orjson==3.11.*; extra == "performance"
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Opteryx-Core

Opteryx-Core is the SQL execution engine behind [opteryx.app](https://opteryx.app). It is a fork of [Opteryx](https://github.com/mabel-dev/opteryx) with a smaller, more opinionated API and configuration surface, shaped around the workloads we run in the hosted service.

This library is designed for fast, read-heavy analytical queries over Parquet-backed data. It handles SQL parsing, planning, predicate pushdown, projection pruning, and execution so you can query datasets from Python without standing up a separate warehouse.

This project is opinionated toward the needs of `opteryx.app`, but it is still useful as a standalone library, especially if you want to query local Parquet-backed datasets via registered workspaces, embed SQL in a Python service or notebook, or experiment with the engine directly.

## Install

```bash
pip install opteryx-core
```

Import it as:

```python
import opteryx
```

## Quick Start: Query Local Files

If your current working directory contains local Parquet data, the simplest way to use Opteryx-Core is to register a local workspace and query it with dot-separated names.

```python
import opteryx
from opteryx.connectors import DiskConnector

opteryx.register_workspace("data", DiskConnector)

session = opteryx.session()
result = session.execute_to_arrow(
    "SELECT id, name FROM data.planets WHERE id < 5"
)

print(result)
```

In this model, dataset names are resolved relative to the current working directory. For example, `data.planets` resolves to `./data/planets`, and Opteryx-Core will read the Parquet files it finds there.
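The naming rule above can be sketched with the standard library alone. This is an illustration of the convention as described here, not Opteryx-Core's own resolver: `dataset_path` is a hypothetical helper, and the `*.parquet` glob assumes the files sit directly inside the dataset directory with that extension.

```python
from pathlib import Path


def dataset_path(name: str, root: Path = Path(".")) -> Path:
    """Map a dot-separated dataset name to a directory under `root`.

    Hypothetical helper for illustration only; it is not part of the
    opteryx API.
    """
    parts = name.split(".")
    if not all(parts):
        # Rejects empty names and names with empty segments such as "data..planets".
        raise ValueError(f"invalid dataset name: {name!r}")
    return root.joinpath(*parts)


# "data.planets" maps to the relative directory data/planets.
planets_dir = dataset_path("data.planets")

# Assumption: Parquet files live directly in that directory.
parquet_files = sorted(planets_dir.glob("*.parquet")) if planets_dir.is_dir() else []
print(planets_dir.as_posix(), len(parquet_files))
```

A quick check like this can help confirm that the directory layout matches the name you plan to use in a query before you register the workspace.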

## What It Is For

- Powering the execution layer used by `opteryx.app`
- Running analytical SQL against local Parquet-backed datasets
- Embedding a query engine inside Python applications, scripts, notebooks, and services
- Working on engine internals such as planning, execution, and Parquet performance

## Best With Opteryx Catalog

Opteryx-Core works best when paired with the `opteryx_catalog` library. That pairing is the intended model for named datasets and catalog-backed tables, and it matches the experience in `opteryx.app`.

Typical setup:

```python
import os

import opteryx

from opteryx import set_default_connector
from opteryx.connectors import OpteryxConnector
from opteryx_catalog import OpteryxCatalog

set_default_connector(
    OpteryxConnector,
    catalog=OpteryxCatalog,
    firestore_project=os.environ["GCP_PROJECT_ID"],
    firestore_database=os.environ["FIRESTORE_DATABASE"],
    gcs_bucket=os.environ["GCS_BUCKET"],
)
```
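The setup above reads three environment variables with `os.environ[...]`, which raises a bare `KeyError` if any of them is unset. A minimal sketch of validating them up front is shown below. `load_catalog_settings` and `REQUIRED_VARS` are hypothetical names introduced here; only the variable names and keyword arguments come from the snippet above.

```python
import os

# Environment variables read by the setup snippet above.
REQUIRED_VARS = ("GCP_PROJECT_ID", "FIRESTORE_DATABASE", "GCS_BUCKET")


def load_catalog_settings(environ=os.environ) -> dict:
    """Collect connector settings, failing with one clear error if any are missing.

    Hypothetical helper for illustration; it is not part of opteryx or
    opteryx_catalog.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Keys match the keyword arguments passed to set_default_connector above.
    return {
        "firestore_project": environ["GCP_PROJECT_ID"],
        "firestore_database": environ["FIRESTORE_DATABASE"],
        "gcs_bucket": environ["GCS_BUCKET"],
    }
```

With this in place, the call could be written as `set_default_connector(OpteryxConnector, catalog=OpteryxCatalog, **load_catalog_settings())`, so a misconfigured environment reports every missing variable at once.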

Once configured, you can query catalog-backed datasets using dot-separated names such as `public.space.planets` or `opteryx.ops.billing`.

For local data, Opteryx-Core is typically used through registered workspaces such as `testdata`, `scratch`, or `data`. Queries refer to datasets by dot-separated names relative to the workspace root, for example `testdata.planets`, `testdata.satellites`, or `scratch.signals`.

## Where It Fits

Opteryx-Core is best thought of as an embedded analytical engine rather than a full end-user platform. If you want a hosted experience, multi-tenant service features, and the broader product workflow, use [opteryx.app](https://opteryx.app). If you want the core engine in your own environment, this package gives you that engine directly. If you want the intended table-resolution model, pair it with `opteryx_catalog`.

## Contributing

If you use Opteryx-Core yourself, we want to hear from you.

- Use it on your own datasets
- Raise bugs when queries, schemas, or performance do not behave as expected
- Open pull requests for fixes, tests, docs, or performance improvements
- Share repro cases, failing queries, and edge-case Parquet files

This project is being actively built, and outside usage helps make it better.

Docs: https://docs.opteryx.app/  •  Source: https://github.com/mabel-dev/opteryx-core  •  License: Apache-2.0
