Metadata-Version: 2.4
Name: oxenai
Version: 0.46.7
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Version Control
Requires-Dist: fsspec>=2025.3.0
Requires-Dist: maturin>=1.9.3
Requires-Dist: pandas>=2.3.1
Requires-Dist: polars>=1.32.0
Requires-Dist: pyarrow>=22.0.0
Requires-Dist: pytest>=8.4.1
Requires-Dist: pytest-datadir>=1.8.0
Requires-Dist: requests>=2.32.4
Requires-Dist: ruff>=0.12.7
Requires-Dist: toml>=0.10.2
Requires-Dist: tqdm>=4.67.1
Summary: Data version control for machine learning
Keywords: oxen,version control
License-Expression: Apache-2.0
Requires-Python: >=3.11
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://docs.oxen.ai/
Project-URL: Homepage, https://www.oxen.ai/
Project-URL: Repository, https://github.com/Oxen-AI/Oxen

# 🐂 🐍 Oxen Python Interface

The Oxen Python interface makes it easy to integrate Oxen datasets directly into machine learning dataloaders or other data pipelines.

## Repositories

There are two types of repositories you can interact with: a local `Repo` and a `RemoteRepo`.


## Local Repo

To fully clone all the data to your local machine, you can use the `Repo` class.

```python
import oxen

repo = oxen.Repo("path/to/repository")
repo.clone("https://hub.oxen.ai/ox/CatDogBBox")
```

If there is a specific version of your data you want to access, you can specify the `branch` when cloning.

```python
repo.clone("https://hub.oxen.ai/ox/CatDogBBox", branch="my-pets")
```

Once you have a repository locally, you can perform the same operations through the Python API that you would from the command line.

For example, you can check out a branch, add a file, commit, and push the data to the same remote you cloned it from.

```python
import oxen

repo = oxen.Repo("path/to/repository")
repo.clone("https://hub.oxen.ai/ox/CatDogBBox")
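# The calls below mirror the CLI commands; the branch and file path are illustrative
repo.checkout("my-pets")
repo.add("images/new-cat.png")
repo.commit("Adding another training image")
repo.push(branch="my-pets")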
```
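
Because a clone leaves ordinary files in the repository directory, they can be fed to a data pipeline with standard tooling. The following is a minimal sketch, assuming the cloned repository contains a CSV file of annotations. The `iter_rows` helper and the `annotations/train.csv` path are hypothetical, not part of the Oxen API or of any particular dataset.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator


def iter_rows(repo_dir: str, relative_csv: str) -> Iterator[Dict[str, str]]:
    """Yield each row of a CSV file inside a cloned repository as a dict.

    Hypothetical helper: it only reads files from disk and does not call Oxen.
    """
    csv_path = Path(repo_dir) / relative_csv
    with csv_path.open(newline="") as f:
        # DictReader uses the first line of the file as the column names.
        yield from csv.DictReader(f)
```

For example, `for row in iter_rows("path/to/repository", "annotations/train.csv"): ...` would hand one record at a time to a dataloader.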

## Remote Repo

If you don't want to download the data locally, you can use the `RemoteRepo` class to interact with a remote repository on OxenHub.

```python
import oxen

repo = oxen.RemoteRepo("https://hub.oxen.ai/ox/CatDogBBox")
```

To stage and commit files to a specific version of the data, you can `checkout` an existing branch or create a new one.

```python
repo.create_branch("dev")
repo.checkout("dev")
```

You can then stage files to the remote repository by specifying the file path and destination directory.

```python
repo.add("new-cat.png", "images") # Stage to images/new-cat.png on remote
repo.commit("Adding another training image")
```

Note that no "push" command is required here, since the above code creates a commit directly on the remote branch.
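
As a sketch of how the staging calls above might be batched, the hypothetical helper below stages several local files into one remote directory and records them in a single commit. It assumes only the `add(path, directory)` and `commit(message)` calls shown above. `stage_and_commit` and its parameters are illustrative names, not part of the Oxen API.

```python
from pathlib import Path
from typing import Iterable


def stage_and_commit(repo, paths: Iterable[str], dst_dir: str, message: str) -> int:
    """Stage each existing file under dst_dir, then commit once.

    `repo` is assumed to expose add(path, dst_dir) and commit(message),
    as in the RemoteRepo snippets above. Returns the number of files staged.
    """
    staged = 0
    for path in paths:
        if not Path(path).is_file():
            # Skip missing files rather than failing halfway through a batch.
            continue
        repo.add(str(path), dst_dir)
        staged += 1
    if staged:
        # Commit only when something was staged, to avoid an empty commit.
        repo.commit(message)
    return staged
```

With the `repo` object from above, `stage_and_commit(repo, ["new-cat.png"], "images", "Adding another training image")` would be equivalent to the two-line example, and longer lists of files would still produce a single commit.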


## Build 🔨

### Prerequisites

If you're developing the Python interface, you'll need to:
1. [Install the Rust toolchain](../README.md#build-)
2. [Install `uv`](https://docs.astral.sh/uv/getting-started/installation/)
3. Install the [pre-commit hooks](../README.md#pre-commit-hooks) to ensure your code is consistent

### Development Cycle

To fetch and build the dependencies along with the `oxen-python` code, run:
```bash
uv sync --verbose
```

To rebuild only the PyO3 Oxen wrappers, use [`maturin`](https://github.com/PyO3/maturin) with `--no-sync`:
```bash
uv run --no-sync maturin develop
```

## Test

Run `pytest`:

```bash
uv run --verbose pytest -s tests/
```

If you have already installed all dependencies and are not changing
[`liboxen`](../crates/lib), you can pass `--no-sync` to skip re-syncing the environment:

```bash
uv run --no-sync pytest -s tests/
```

Lint and format the code with:
```bash
uvx ruff check .
uvx ruff format .
```

