Metadata-Version: 2.4
Name: strake
Version: 0.2.3rc1
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: mcp>=0.1.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: uvicorn>=0.20.0
Requires-Dist: lancedb>=0.6.0
Requires-Dist: tantivy>=0.21.0
Summary: The AI Data Layer: Secure, sandboxed environment for AI agents to query and process data.
Keywords: ai,data,sandbox,mcp,datafusion,arrow,agent
Author-email: Strake Data <hello@strakedata.com>
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://strake-data.github.io/strake/
Project-URL: Homepage, https://strakedata.com
Project-URL: Issues, https://github.com/strake-data/strake/issues
Project-URL: Repository, https://github.com/strake-data/strake

<div align="center">
  <img src="docs/assets/logo.png" width="200" height="auto" alt="Strake Logo">
  <h1>Strake</h1>
  <p>
    <strong>The AI Data Layer</strong>
  </p>
  <p>
    <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache_2.0-blue.svg" alt="License"></a>
    <a href="CONTRIBUTING.md"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg" alt="PRs Welcome"></a>
    <a href="https://strake-data.github.io/strake/"><img src="https://img.shields.io/badge/docs-latest-blue.svg" alt="Docs"></a>
  </p>
</div>

<br>

**Strake** is the AI Data Layer. Not a query tool. Not a RAG pipeline. The sandboxed execution environment where agents meet your data and return answers, not rows.

Built on [Apache Arrow DataFusion](https://github.com/apache/arrow-datafusion), Strake enables AI agents to discover, query, and process data across your entire stack (PostgreSQL, Snowflake, S3, and more) without data movement or ETL, giving agents safe, structured access to all of it.

> 📚 **Full Documentation**: Check out the [complete documentation](https://strake-data.github.io/strake/) for installation, architecture, and API references.

---

## Key Features

- **MCP-Native Discovery**: Built for the Model Context Protocol. Your agents immediately discover your entire data catalog and schemas.
- **Run Python, Not Prompts**: Every agent execution runs inside a strict native OS sandbox for performance, or an ephemeral MicroVM for hardware-level isolation.
- **Zero-Copy Federation**: Query Postgres, S3, local files, REST, gRPC, and more simultaneously, with pushdown optimization via Apache Arrow.
- **Read-Only by Default**: Strict read-only enforcement, dynamic Row-Level Security (RLS), and PII masking out of the box.
- **Developer First**: Built for engineers shipping agents to production. Type-safe configuration, rich CLI tooling, and local development workflows.
- **Python Native**: Zero-copy integration with Pandas and Polars via PyO3.
- **GitOps Native**: Manage your data mesh configuration as code. Version control your sources, policies, and metrics.
- **Observability**: Built-in OpenTelemetry tracing and Prometheus metrics.
- **Enterprise Capabilities**: OIDC Authentication, Row-Level Security, and Data Contracts (Enterprise Edition).

## Code Mode: Don't Compute in Context

Most agents fail by swallowing thousands of raw SQL rows. Strake's **Code Mode** lets them process data in Python where it lives, inside a secure sandbox, sending only the parsed results that matter to the LLM.

```python
import asyncio

from strake.mcp import run_python

script = """
import strake

# 1. Query 10M rows instantly via DataFusion
df = strake.sql("SELECT * FROM user_events")

# 2. Aggregate in Python to prevent context bloat
summary = df.groupby('feature_flag')['latency'].median()

# 3. Print exactly what the LLM needs
print(summary.to_json())
"""

# Runs isolated with OS Sandboxing or Firecracker VMs
result = asyncio.run(run_python(script))
print(result)
```
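The aggregation step is plain Pandas, so it can be sanity-checked outside the sandbox. A minimal sketch with illustrative sample data (the `feature_flag`/`latency` columns mirror the script above):

```python
import pandas as pd

# Sample events standing in for the sandboxed query result
df = pd.DataFrame({
    "feature_flag": ["a", "b", "a", "b"],
    "latency": [12.0, 30.0, 18.0, 34.0],
})

# Same aggregation as the sandbox script: median latency per flag
summary = df.groupby("feature_flag")["latency"].median()

# A few bytes of JSON reach the LLM instead of every raw row
print(summary.to_json())
```

The LLM sees a two-number summary rather than the full result set, which is the point of Code Mode.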

## Quick Start (5-Minute Setup)

If you're building agents that need to query Postgres, S3, and a REST API in a single operation, without context overflow and without leaking credentials, Strake is the runtime you're missing.

### 1. Installation

#### Quick Install (Linux/macOS)
```bash
curl -sSfL https://strakedata.com/install.sh | sh
```

#### Install via Cargo (Rust)
```bash
# From a source checkout of the repository
cargo install --path crates/cli
cargo install --path crates/server
```

#### Python Client
```bash
pip install strake
```

### 2. Configuration (GitOps)

Initialize a new config and validate your sources:

```bash
# Initialize a new config
strake-cli init

# Validate configuration
strake-cli validate sources.yaml

# Apply to the metadata store (Sync)
strake-cli apply sources.yaml --force
```

### 3. Query with Python

First, define your data sources in a `sources.yaml` file:

```yaml
sources:
  - name: local_files
    type: csv
    path: "data/*.csv"
    has_header: true
    tables:
      - name: measurements
```
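Federated setups follow the same pattern with additional `sources` entries. The snippet below is purely illustrative: the `postgres` field names are assumptions, not documented schema, so check the configuration reference before use:

```yaml
sources:
  - name: local_files
    type: csv
    path: "data/*.csv"
    has_header: true
    tables:
      - name: measurements

  # Hypothetical Postgres source; key names are illustrative
  - name: app_db
    type: postgres
    connection_string: "postgres://readonly@localhost:5432/app"
    tables:
      - name: user_events
```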

Then, query using the Strake Python client:

```python
import strake
import polars as pl

# Connect using your source configuration
conn = strake.connect(sources_config="sources.yaml")

# Query across sources using standard SQL
query = "SELECT * FROM measurements LIMIT 5"
data = conn.sql(query)

# Zero-copy integration with Polars/Pandas
df = pl.from_arrow(data)
print(df)
```

## Project Structure

| Component | Description |
|-----------|-------------|
| [**strake-runtime**](crates/runtime) | Orchestration layer (Federation Engine, Sidecar). |
| [**strake-connectors**](crates/connectors) | Data source implementations (Postgres, S3, REST, etc). |
| [**strake-sql**](crates/sql) | SQL Dialects, Query Optimization, and Substrait generation. |
| [**strake-common**](crates/common) | Shared types, configuration, and error handling. |
| [**strake-server**](crates/server) | Arrow Flight SQL server implementation. |
| [**strake-cli**](crates/cli) | GitOps CLI for managing data mesh configurations. |
| [**strake-python**](python) | Python bindings for high-performance data access. |


## Contributing

We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details on how to get started.

## License

Strake is licensed under the [Apache 2.0](LICENSE) license.

