Metadata-Version: 2.4
Name: seahorse-coral
Version: 0.1.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: typing-extensions>=4.0.0
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0 ; extra == 'dev'
Requires-Dist: black>=22.0 ; extra == 'dev'
Requires-Dist: isort>=5.0 ; extra == 'dev'
Requires-Dist: mypy>=1.0 ; extra == 'dev'
Requires-Dist: ruff>=0.1.0 ; extra == 'dev'
Provides-Extra: dev
Summary: Python client library for SeahorseDB via Coral API server
Author: Seahorse Team
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://docs.seahorse.ai
Project-URL: Repository, https://github.com/seahorse-ai/seahorse

# seahorse-coral

gRPC-native Python client library for SeahorseDB via Coral.

The low-level Python surface is intentionally:
- Arrow-first for tabular reads (`scan`, `search`, `hybrid`)
- typed-model-first for metadata/admin results (`nodes`, segment status/retry, table schema)
- explicit about post-processing (`to_pyarrow()`, `to_pandas()`, `to_polars()`, `to_json()`)

Additional docs:
- Build from source: `docs/build.md`
- Advanced usage: `docs/advanced.md`
- Compatibility policy: `docs/compatibility.md`

## Quickstart (Recommended)

This project intentionally supports multiple ways to define a schema (preset / components / builder).
To keep onboarding simple, **we recommend starting with the preset schema** and moving to the
advanced options only when you need customization.

### 1) Create a Coral client

```python
import seahorse_coral as sc

coral = sc.Coral("http://localhost:8080")
```

`Coral` and `AsyncCoral` use the same gRPC-native contract as the Rust client.

### 2) Create a table (preset: `id + vector + metadata`)

```python
import seahorse_coral as sc

schema = sc.default_vector_table_schema(
    dim=384,
    # Optional:
    # id_type=sc.ScalarType.STRING,
)

table = coral.create_table("documents", schema=schema)
```

Metadata/admin APIs also return typed Python models:

```python
nodes = coral.nodes()
first = nodes[0]
print(first.node_id)
print(first.node_address)
```

Table admin/mutation helpers are also available:

```python
counts = table.indexed_row_count()  # readable counts included by default
print(counts.total_row_count)
print(counts.readable.total_row_count if counts.readable else None)

table.update_rows("metadata = '{\"source\":\"updated\"}'", where="id = 1")
table.delete_rows(where="id = 2")

plan = table.rebalance_plan(writer_nodes=["writer-2"])
status = table.rebalance_status()
commit = table.rebalance_commit(plan.commit_template)
print(commit.status)
```

The preset creates:
- `id`: `INT64` primary key (or `STRING` if `id_type=ScalarType.STRING`)
- `vector`: dense vector column
- `metadata`: `STRING` (nullable; store JSON-encoded strings if you want structured metadata)
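Because `metadata` is a plain `STRING` column, structured metadata must be serialized and deserialized by the caller. A minimal stdlib sketch of that round trip (the row shape mirrors the preset above; nothing here touches the client API):

```python
import json

# Encode structured metadata to a JSON string before inserting the row...
row = {
    "id": 1,
    "vector": [0.1, 0.2, 0.3],
    "metadata": json.dumps({"source": "a", "lang": "en"}),
}

# ...and decode it again after reading the row back.
meta = json.loads(row["metadata"])
print(meta["source"])  # → a
```

Keeping the encoding explicit on the caller side is what lets the server treat `metadata` as an opaque string.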

### (Optional) Schema building (components / `SchemaBuilder`)

If you need customization (more columns, segmentation, multiple indexes, etc.), build a schema explicitly.

#### A) Create table with components (no `SchemaBuilder` object)

```python
import seahorse_coral as sc

table = coral.create_table(
    "documents",
    columns=[
        sc.int64_column("id", nullable=False),
        sc.vector_column("vector", dim=384),
        sc.metadata_column("metadata"),
    ],
    primary_key=["id"],
    indexes=[sc.hnsw_index("vector")],  # List[IndexDefinition]
)
```

#### B) `SchemaBuilder` (constructor style)

```python
import seahorse_coral as sc

schema = sc.SchemaBuilder(
    columns=[
        sc.int64_column("id", nullable=False),
        sc.vector_column("vector", dim=384),
        sc.metadata_column("metadata"),
    ],
    primary_key=["id"],
    indexes=[sc.hnsw_index("vector", space=sc.IndexSpace.COSINE)],
)

table = coral.create_table("documents", schema=schema)
```

#### C) `SchemaBuilder` (fluent / chain style)

```python
import seahorse_coral as sc

schema = (
    sc.SchemaBuilder()
    .int64("id", nullable=False)
    .vector("vector", dim=384)
    .metadata()
    .with_primary_key("id")
    .hnsw("vector", space=sc.IndexSpace.COSINE)
)

table = coral.create_table("documents", schema=schema)
```

### 3) Insert rows

```python
import json

table.insert_rows(
    [
        {"id": 1, "vector": [0.1, 0.2, 0.3], "metadata": json.dumps({"source": "a"})},
        {"id": 2, "vector": [0.2, 0.1, 0.0], "metadata": json.dumps({"source": "b"})},
    ]
)
```

### (Optional) More insert options

Each write API is explicit about the input mode it accepts.

```python
import seahorse_coral as sc

# 1) JSONL string (each line is a JSON object)
jsonl = (
    '{"id": 4, "vector": [0.4, 0.4, 0.4], "metadata": "{}"}\n'
    '{"id": 5, "vector": [0.5, 0.5, 0.5], "metadata": "{}"}\n'
)
table.insert_jsonl(jsonl)

# 2) Local Parquet file
# - client converts Parquet -> Arrow IPC stream -> gRPC upload stream
table.insert_parquet("./data/documents.parquet", batch_size=8192)

# 3) Single remote Parquet file
# - server reads the object directly
table.insert_parquet(
    sc.s3_file(
        "path/to/documents.parquet",
        bucket="my-bucket",
        access_key="YOUR_ACCESS_KEY",
        secret_key="YOUR_SECRET_KEY",
        region="ap-northeast-2",
    ),
    options=sc.ImportOptions(reader_batch_size=8192),
)

# 4) Multi-file import from S3
request = sc.s3_file(
    ["path/to/a.parquet", "path/to/b.parquet"],
    bucket="my-bucket",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    region="ap-northeast-2",
)
options = sc.ImportOptions(
    format=sc.FileFormat.PARQUET,
    reader_batch_size=8192,  # reader record batch size
    max_concurrent_files=4,  # optional
)
result = table.import_files(request, options=options)
print(result.total_inserted_row_count)

# `import_files()` is strict by default.
# If any file fails, sc.PartialImportError or sc.ImportFilesError is raised
# and the exception carries the same ImportFilesResult via `.result`.

# 5) Arrow IPC stream bytes (advanced)
# - bytes, pyarrow.Table, pyarrow.RecordBatch, and list[RecordBatch] are supported
table.insert_arrow(arrow_ipc_bytes)
```
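When building the JSONL payload programmatically, each line must be one standalone JSON object. A small stdlib helper for that (the `to_jsonl` function is an illustrative sketch, not part of the client):

```python
import json

def to_jsonl(rows):
    """Serialize a list of dicts into newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(row) + "\n" for row in rows)

jsonl = to_jsonl([
    {"id": 4, "vector": [0.4, 0.4, 0.4], "metadata": "{}"},
    {"id": 5, "vector": [0.5, 0.5, 0.5], "metadata": "{}"},
])

# Every line parses back independently, which is what JSONL requires.
assert all(isinstance(json.loads(line), dict) for line in jsonl.splitlines())
```

The resulting string can be passed to `table.insert_jsonl(jsonl)` as shown above.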

### 4) Search (dense)

```python
# Dense vector search
#
# Note:
# - `index` is the index name, typically the same as the vector column name.
vec = table.index("vector")

result = vec.search([0.1, 0.2, 0.3], top_k=10)

result = vec.search(
    [0.1, 0.2, 0.3],
    top_k=10,
    ef_search=128,
    select="id, metadata, distance",
    where="id > 0",
)
```
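The `distance` column selected above reflects the index's distance metric. For a cosine-space index like the `IndexSpace.COSINE` examples, this is conventionally `1 − cosine similarity`; that convention is an assumption here, not a documented guarantee of the server. A pure-Python sketch of the quantity:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: ≈ 0.0 for identical directions, up to 2.0 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # ≈ 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # ≈ 1.0
```

Lower values mean closer matches, so results sorted by `distance` ascending are best-first.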

### 5) Consume Arrow-first tabular results

`scan()` and `search()` return a `ResultSet`.

```python
result = table.scan(select="id, metadata", limit=100)

# Low-level Arrow-native access
batches = result.to_record_batches()
arrow_table = result.to_pyarrow()

# Explicit convenience conversions
rows = result.to_json()
df = result.to_pandas()
```

Batch vector search (`search_batch`) returns a `ResultSets` collection.

```python
results = vec.search_batch([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]], top_k=10)

for result in results:
    print(result.to_pyarrow())
```

Large-result paths are exposed separately.

```python
for batch in table.scan_stream(select="id, metadata"):
    process(batch)

for result in vec.search_batch_stream([[0.1, 0.2, 0.3], [0.3, 0.2, 0.1]], top_k=10):
    process(result.to_pyarrow())
```

### (Optional) Sparse & hybrid search

Sparse/hybrid search requires a table with a sparse vector column and an inverted index.

```python
import seahorse_coral as sc

schema = sc.SchemaBuilder(
    columns=[
        sc.int64_column("id", nullable=False),
        sc.vector_column("vector", dim=384),
        sc.sparse_vector_column("sparse_emb"),
        sc.metadata_column("metadata"),
    ],
    primary_key=["id"],
    indexes=[
        sc.hnsw_index("vector"),
        sc.inverted_index("sparse_emb"),
    ],
)

table = coral.create_table("documents_hybrid", schema=schema)

# Sparse vector search (BM25 / inverted index)
sparse_query = "1:0.8 5:0.6 12:0.4"
result = table.index("sparse_emb").search_sparse(
    sparse_query,
    top_k=10,
    bm25_k=1.2,
    bm25_b=0.75,
)

# Hybrid search (dense + sparse + fusion)
# - requires dense_column + sparse_column
result = table.hybrid_search(
    dense_column="vector",
    dense_query=[0.1, 0.2, 0.3],
    sparse_column="sparse_emb",
    sparse_query=sparse_query,
    top_k=10,
    options=sc.HybridSearchOptions(
        fusion="rrf",
        rrf_k=60,
        alpha=0.7,
    ),
)
```
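Two details worth spelling out. The sparse query string is whitespace-separated `index:weight` pairs, and `fusion="rrf"` presumably refers to standard Reciprocal Rank Fusion (`score = Σ 1/(k + rank)` over the per-ranking positions). Both are sketched below in pure Python; the parser and fusion function are illustrative assumptions, not part of the client API:

```python
def parse_sparse_query(query):
    """Parse 'index:weight' pairs, e.g. '1:0.8 5:0.6 12:0.4', into a dict."""
    pairs = (term.split(":") for term in query.split())
    return {int(i): float(w) for i, w in pairs}

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each input list ranks doc ids best-first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores

sparse = parse_sparse_query("1:0.8 5:0.6 12:0.4")

# doc 7 ranks first in both the (hypothetical) dense and sparse rankings,
# so it accumulates the largest fused score.
fused = rrf_fuse([[7, 3, 9], [7, 5, 3]])
top = max(fused, key=fused.get)  # → 7
```

The `rrf_k` option above maps onto `k` here; larger values flatten the contribution of rank differences.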

## Next steps

- Advanced schema options (segmentation / placement / tuning): [docs/advanced.md](https://github.com/dn-inc/Coral/blob/main/seahorse-coral/docs/advanced.md)
- Build from source / development: [docs/build.md](https://github.com/dn-inc/Coral/blob/main/seahorse-coral/docs/build.md)

