Metadata-Version: 2.2
Name: onebatch
Version: 0.1.0
Summary: OneBatchPAM: A Fast and Frugal K-Medoids Algorithm
Author-email: Antoine de Mathelin <antoine.demat@gmail.com>
License: BSD-3-Clause
Project-URL: Homepage, https://github.com/antoinedemathelin/onebatch
Project-URL: Repository, https://github.com/antoinedemathelin/onebatch
Project-URL: Issues, https://github.com/antoinedemathelin/onebatch/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Cython
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.0
Requires-Dist: scikit-learn>=1.2.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"

# OneBatchPAM

[![CI](https://github.com/antoinedemathelin/onebatch/actions/workflows/ci.yml/badge.svg)](https://github.com/antoinedemathelin/onebatch/actions/workflows/ci.yml)
[![Python 3.9-3.12](https://img.shields.io/badge/python-3.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue)](https://github.com/antoinedemathelin/onebatch)
[![PyPI](https://img.shields.io/pypi/v/onebatch)](https://pypi.org/project/onebatch/)

Implementation of [OneBatchPAM: A Fast and Frugal K-Medoids Algorithm](https://arxiv.org/pdf/2501.19285) (AAAI 2025)

## Installation
```bash
pip install git+https://github.com/antoinedemathelin/onebatch.git
```

## Minimal examples

From data:

```python
import numpy as np
from onebatch import OneBatchPAM

X = np.random.RandomState(0).randn(10000, 2).astype(np.float32)

km = OneBatchPAM(
    n_medoids=9,
    distance="euclidean",
    batch_size="auto",
    weighting=True,
    max_iter=100,
    tol=1e-6,
    n_jobs=None,
    random_state=0,
)

# Option 1: fit + attributes
km.fit(X)
medoids = km.medoid_indices_
labels = km.predict(X)  # or use km.labels_

# Option 2: fit_predict returns medoid indices
medoids2 = km.fit_predict(X)
```
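Because `fit_predict` returns medoid indices rather than labels, you may want to recover per-point labels and the k-medoids cost from the indices alone. The sketch below does this with scikit-learn's `pairwise_distances_argmin_min`. `labels_and_cost` is a hypothetical helper and is not part of `onebatch`. It numbers clusters by position in the medoid array, which is an assumption and may not match exactly how `labels_` is defined.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_argmin_min


def labels_and_cost(X, medoid_indices, metric="euclidean"):
    """Assign each row of X to its nearest medoid and return the k-medoids cost.

    Hypothetical helper for illustration; not part of the onebatch API.
    """
    medoid_indices = np.asarray(medoid_indices)
    # For every point: position (in medoid_indices) of the closest medoid, and the distance to it
    labels, dists = pairwise_distances_argmin_min(X, X[medoid_indices], metric=metric)
    # k-medoids objective: sum of distances from each point to its assigned medoid
    return labels, float(dists.sum())


# Usage on random data (onebatch is not needed for this snippet)
X = np.random.RandomState(0).randn(500, 2)
medoids = np.array([0, 10, 20])
labels, cost = labels_and_cost(X, medoids)
```

In the example above, you could pass `medoids2 = km.fit_predict(X)` directly as `medoid_indices`.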

From a precomputed distance matrix:
```python
import numpy as np
from sklearn.metrics import pairwise_distances
from onebatch import OneBatchPAM

X = np.random.RandomState(0).randn(10000, 2).astype(np.float32)

# Build an (n_samples, m) distance matrix to m candidate points (seeded for reproducibility)
m = 1000
cand_idx = np.random.RandomState(0).choice(X.shape[0], size=m, replace=False)
K = pairwise_distances(X, X[cand_idx], metric="euclidean")

km = OneBatchPAM(
    n_medoids=9,
    distance="precomputed",
    weighting=True,
    max_iter=100,
    tol=1e-6,
    random_state=0,
)

km.fit(K)
medoids = km.medoid_indices_
# Labels are available after fit via km.labels_
# Note: predict() is intended for feature-space distances; for precomputed, prefer km.labels_
```
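The precomputed mode only needs an `(n_samples, m)` matrix because, according to the paper, OneBatchPAM estimates the k-medoids objective from a single batch of `m` points instead of all `n`. The following is a minimal sketch of such an estimate. It assumes that medoids are rows of `K` and that the `m` columns form the batch over which the cost is averaged (the README calls these columns candidate points). `batch_cost` is a hypothetical helper, not the library's internal code.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def batch_cost(K, medoid_rows):
    """Estimate the mean k-medoids cost from an (n_samples, m) distance matrix.

    Hypothetical helper: K[i, j] is the distance from point i to batch point j,
    and medoid_rows are the row indices of K chosen as medoids.
    """
    # Distance from each batch point (column) to its closest medoid (row)
    nearest = K[np.asarray(medoid_rows), :].min(axis=0)
    # Averaging over the m batch points estimates the mean cost over all n points
    return float(nearest.mean())


rng = np.random.RandomState(0)
X = rng.randn(2000, 2)
medoids = rng.choice(X.shape[0], size=9, replace=False)

# The exact mean cost needs every column; the estimate needs only m of them
exact = batch_cost(pairwise_distances(X, X), medoids)
cand_idx = rng.choice(X.shape[0], size=200, replace=False)
approx = batch_cost(pairwise_distances(X, X[cand_idx]), medoids)
print(f"exact={exact:.3f}, estimate={approx:.3f}")
```

With the full square matrix, the estimate is exact. With `m` uniformly sampled columns, it is an unbiased estimate of the mean cost for a fixed medoid set and needs only `O(n m)` memory.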

To visualize the clusters, reuse `X`, `labels`, and `medoids` from the first example:
```python
import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="tab10", s=3, alpha=0.4)
plt.plot(X[medoids, 0], X[medoids, 1], "ko", markersize=12, markeredgewidth=2, markerfacecolor="none")
plt.show()
```

![OneBatchPAM example](assets/example.png)

## API notes
- **n_jobs**: number of parallel jobs for `sklearn.metrics.pairwise_distances`.
- **random_state**: `int` or `numpy.random.Generator` controlling sampling and initialization.
- **fit_predict**: returns medoid indices (not labels). After `fit`, labels are in `labels_`.
- **distance != 'precomputed'**: `batch_size` controls the number of candidate columns in the distance matrix; `'auto'` selects it with a heuristic.
- **distance == 'precomputed'**: pass a distance matrix of shape `(n_samples, m)` where each column contains distances to a candidate point. `batch_size` is ignored in this mode.
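The README does not say how `random_state` is normalized internally. A common NumPy idiom, assumed here, is `np.random.default_rng`. It accepts `None`, an `int` seed, or an existing `Generator`, and it returns a `Generator` unchanged. The sketch below uses a hypothetical `sample_candidates` helper to show what that convention means for candidate sampling.

```python
import numpy as np


def sample_candidates(n_samples, m, random_state=None):
    """Hypothetical helper: draw m distinct candidate indices reproducibly.

    np.random.default_rng turns None or an int seed into a new Generator
    and returns an existing Generator as-is.
    """
    rng = np.random.default_rng(random_state)
    return rng.choice(n_samples, size=m, replace=False)


# The same int seed gives the same candidates on every call
a = sample_candidates(10000, 1000, random_state=0)
b = sample_candidates(10000, 1000, random_state=0)

# A shared Generator advances its state, so successive calls differ
gen = np.random.default_rng(0)
c = sample_candidates(10000, 1000, random_state=gen)
d = sample_candidates(10000, 1000, random_state=gen)
```

Passing an `int` therefore makes repeated fits reproducible. Passing one `Generator` to several calls gives different but still reproducible draws.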
