Metadata-Version: 2.4
Name: fast-hash-utils
Version: 0.2.2
Summary: Fast deterministic dict hashing via mypyc
Author: Toby Mao
License-Expression: MIT
Project-URL: Homepage, https://github.com/tobymao/hash-utils
Project-URL: Repository, https://github.com/tobymao/hash-utils
Project-URL: Issues, https://github.com/tobymao/hash-utils/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Typing :: Typed
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mypy-extensions>=1.0
Provides-Extra: dev
Requires-Dist: mypy>=1.19; extra == "dev"
Requires-Dist: pytest>=8.4; extra == "dev"
Requires-Dist: ruff>=0.15; extra == "dev"
Dynamic: license-file

# hash-utils

Fast deterministic dict hashing via mypyc.

## Functions

- **`dict_hash(d)`** — deterministic hash of a nested dict's full content (keys + values)
- **`shape_hash(d)`** — structural hash that ignores string/int/float values, only hashing keys, value types, bools, and container lengths
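
To make the difference between the two functions concrete, here is a rough pure-Python sketch of what a structural hash along these lines might compute. It is an illustration only, not the library's implementation: the helper names `toy_shape_key` and `toy_shape_hash`, the key sorting, and the use of SHA-256 are all assumptions, and the values it returns will not match `shape_hash`.

```python
import hashlib
from typing import Any


def toy_shape_key(value: Any) -> str:
    """Build a string describing the structure of `value` (hypothetical helper)."""
    # bool must be checked before other scalars, because bool is a subclass of int.
    if isinstance(value, bool):
        return f"bool:{value}"
    if isinstance(value, dict):
        # Assumption: sort keys so insertion order does not affect the result.
        items = ",".join(
            f"{key!r}={toy_shape_key(value[key])}" for key in sorted(value, key=repr)
        )
        return f"dict[{len(value)}]{{{items}}}"
    if isinstance(value, (list, tuple)):
        items = ",".join(toy_shape_key(item) for item in value)
        return f"{type(value).__name__}[{len(value)}]({items})"
    # Strings, ints, floats, None, etc.: record only the type name, not the value.
    return type(value).__name__


def toy_shape_hash(value: Any) -> str:
    """Hash the structural key with SHA-256 so the result is stable across runs."""
    return hashlib.sha256(toy_shape_key(value).encode("utf-8")).hexdigest()


d1 = {"name": "alice", "config": {"enabled": True, "tags": []}}
d2 = {"name": "bob", "config": {"enabled": True, "tags": []}}
d3 = {"name": "carol", "config": {"enabled": False, "tags": []}}

assert toy_shape_hash(d1) == toy_shape_hash(d2)  # only string values differ
assert toy_shape_hash(d1) != toy_shape_hash(d3)  # a bool value differs
```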

## Install

```bash
pip install fast-hash-utils
```

## Usage

```python
from hash_utils import dict_hash, shape_hash

d1 = {"name": "alice", "config": {"enabled": True, "tags": []}}
d2 = {"name": "bob",   "config": {"enabled": True, "tags": []}}

# Full content hash: different names produce different hashes
assert dict_hash(d1) != dict_hash(d2)

# Shape hash: same structure produces the same hash
assert shape_hash(d1) == shape_hash(d2)
```

## Why

`shape_hash` enables large-scale deduplication for jsonschema validation. If 13,000 dicts share the same structure and differ only in string values, they collapse to one unique shape, so 12,999 redundant validations can be skipped.
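
The deduplication pattern described above can be sketched as follows. This is a minimal illustration, not code from the library: `validate_unique_shapes`, `stand_in_shape`, and `stand_in_validate` are hypothetical names, and the stand-ins exist only so the example runs without third-party packages.

```python
from typing import Any, Callable, Dict, Hashable, List


def validate_unique_shapes(
    records: List[dict],
    shape_fn: Callable[[dict], Hashable],
    validate: Callable[[dict], Any],
) -> Dict[Hashable, Any]:
    """Run `validate` once per distinct shape and cache the outcome."""
    results: Dict[Hashable, Any] = {}
    for record in records:
        key = shape_fn(record)
        if key not in results:
            # First record with this shape: pay for one real validation.
            results[key] = validate(record)
    return results


def stand_in_shape(record: dict) -> tuple:
    """Hypothetical stand-in for shape_hash: top-level keys and value type names."""
    return tuple(sorted((k, type(v).__name__) for k, v in record.items()))


calls = []


def stand_in_validate(record: dict) -> bool:
    """Hypothetical stand-in for a schema validator; records each invocation."""
    calls.append(record)
    return isinstance(record.get("name"), str)


records = [{"name": f"user{i}"} for i in range(13_000)]
results = validate_unique_shapes(records, stand_in_shape, stand_in_validate)

assert len(results) == 1  # all records share one shape
assert len(calls) == 1    # so the validator ran only once
```

With the package installed, `shape_hash` would be passed as `shape_fn` and a jsonschema validation call as `validate`. The shortcut is only sound when the schema does not constrain the values that the shape ignores, such as string patterns, enums, or numeric ranges.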

## Performance

Compiled by mypyc into a native C extension. Roughly 335K ops/s for nested dicts on a single core.

| Method | ops/s | Deterministic |
|---|---|---|
| `shape_hash` (mypyc) | 335K | Yes |
| `dict_hash` (mypyc) | 334K | Yes |
| `json.dumps + hash` | 112K | Yes |
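
For readers who want to measure a comparable baseline on their own machine, a rough timing harness might look like the sketch below. The exact benchmark setup behind the table is not stated here, so everything in this block is an assumption: the helper names `json_hash` and `ops_per_second`, the use of `sort_keys=True`, SHA-256 as the hash, and the sample dict. Results will vary by hardware and Python version.

```python
import hashlib
import json
import timeit


def json_hash(d: dict) -> str:
    """Baseline: serialize with sorted keys, then hash the bytes (assumed setup)."""
    payload = json.dumps(d, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def ops_per_second(func, arg, number: int = 20_000) -> float:
    """Time `number` calls of func(arg) and convert to calls per second."""
    elapsed = timeit.timeit(lambda: func(arg), number=number)
    return number / elapsed


sample = {"name": "alice", "config": {"enabled": True, "tags": []}}
print(f"json.dumps + hash: {ops_per_second(json_hash, sample):,.0f} ops/s")
```

With the package installed, `dict_hash` or `shape_hash` can be passed as `func` to time them on the same input.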

## Development

```bash
python -m venv .venv && source .venv/bin/activate
make install    # editable install with dev deps
make test       # run tests (pure Python or compiled)
make style      # ruff check --fix + format
make clean      # remove build artifacts
```

## License

MIT
