Metadata-Version: 2.4
Name: william-occam
Version: 0.2.3.1
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: License :: Free for non-commercial use
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Requires-Dist: numpy
Requires-Dist: numba
Requires-Dist: pandas
Requires-Dist: intnan
Requires-Dist: typing-inspect
Requires-Dist: scikit-learn
Requires-Dist: xxhash
Requires-Dist: scipy ; extra == 'dev'
Requires-Dist: graphviz ; extra == 'dev'
Requires-Dist: pillow ; extra == 'dev'
Requires-Dist: uvicorn ; extra == 'web'
Requires-Dist: fastapi ; extra == 'web'
Requires-Dist: jinja2 ; extra == 'web'
Requires-Dist: email-validator ; extra == 'web'
Requires-Dist: python-multipart ; extra == 'web'
Requires-Dist: python-dotenv ; extra == 'web'
Requires-Dist: starlette ; extra == 'web'
Provides-Extra: dev
Provides-Extra: web
License-File: LICENSE.md
Summary: William: A tool for data compression and machine learning automation
Keywords: artificial intelligence,machine learning,incremental compression,data compression,inductive programming,AGI
Author-email: Arthur Franz <af@occam.com.ua>, Michael Löffler <ml@occam.com.ua>
Requires-Python: >=3.11
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: homepage, https://occam.com.ua
Project-URL: source, https://gitlab.com/occam_ua/william/
Project-URL: tracker, https://gitlab.com/occam_ua/william/-/issues

[![pipeline status](https://gitlab.com/occam_ua/william/badges/master/pipeline.svg)](https://gitlab.com/occam_ua/william/-/commits/master)
[![PyPI](https://img.shields.io/pypi/v/william-occam.svg?style=flat)](https://pypi.org/project/william-occam/)
![Python Version](https://img.shields.io/pypi/pyversions/william-occam)

# WILLIAM - A general-purpose data compression algorithm

## Overview

WILLIAM is an **inductive programming system** based on the
**theory of Incremental Compression (IC)** [Franz et al. 2021].
Its core principle is that *learning = compression*:  
given a dataset `x`, the algorithm searches for short descriptions in the form
of compositional features `f1, f2, …, f_s` such that

```
x = f1(f2(… f_s(r_s)))
```

with each step achieving some compression.
This corresponds to an incremental approximation of the **Kolmogorov complexity K(x)**:

```
K(x) ≈ Σ l(f*i) + K(r_s) + O(s · log l(x))
```

where each `f*i` is the shortest compressing feature at step `i`.
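
The learning-as-compression principle can be illustrated with a toy two-part code. This is a standalone sketch, not WILLIAM's API: `bits` is a made-up description-length proxy, and "subtract the mean" stands in for a compressing feature `f1`.

```python
import numpy as np

def bits(values, precision=1e-3):
    """Crude code-length proxy: bits needed to store each value as an
    integer at the given precision, so smaller residuals cost fewer bits."""
    q = np.round(np.asarray(values, dtype=float) / precision).astype(np.int64)
    span = int(np.max(np.abs(q))) + 1
    return len(q) * int(np.ceil(np.log2(2 * span)))

# Data with an obvious regularity: values cluster around 1000.
x = 1000.0 + np.random.default_rng(0).normal(0.0, 1.0, size=100)

mu = x.mean()   # feature f1: "add mu back"
r = x - mu      # residual r1, much smaller in magnitude than x

raw_cost = bits(x)                       # describe x directly
compressed_cost = bits([mu]) + bits(r)   # l(f1) plus the residual's cost
assert compressed_cost < raw_cost        # this step achieves compression
```

Describing the feature (one number, `mu`) plus the small residual is far cheaper than describing the raw data, which is exactly what one compression step in the decomposition above buys.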

WILLIAM differs from classical ML approaches in that it does not optimize
parameters in a fixed representation, but **searches a broad algorithmic space**
for compressing autoencoders.  
This yields machine learning algorithms (centralization, regression, classification, decision trees, outlier detection) as *emergent special cases* of general compression.

For theoretical background, see:
- *A Theory of Incremental Compression* (Franz, Antonenko, Soletskyi, 2021)
- *WILLIAM: A Monolithic Approach to AGI* (Franz, Gogulya, Löffler, 2019)
- *Experiments on the Generalization of Machine Learning Algorithms* (Franz, 2020)


## Key Concepts

- **Incremental Compression**  
  Decomposes data into *features* and *residuals* step by step, ensuring that each feature is independent and incompressible.

- **Features as Properties**  
  Features formalize algorithmic properties of data and can be related to **Martin-Löf randomness tests**:  
  non-random regularities correspond to compressible features.

- **Universality**  
  Unlike specialized ML algorithms, WILLIAM discovers short descriptions exhaustively
  via directed acyclic graphs (DAGs) of operators, reusing values and cutting at information bottlenecks.

- **Emergent ML Algorithms**  
  Without any tuning, WILLIAM naturally rediscovers:
  - data centralization  
  - outlier detection  
  - linear regression  
  - linear classification  
  - decision tree induction
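
As a standalone illustration of this emergence (none of the names below come from WILLIAM's API), linear regression falls out of the compression view because the linear feature leaves a much smaller residual to encode than the raw data:

```python
import numpy as np

# Data with a linear regularity plus small noise.
rng = np.random.default_rng(1)
t = np.arange(200, dtype=float)
x = 3.0 * t + 7.0 + rng.normal(0.0, 0.5, size=t.size)

# Candidate feature: f(r) = a*t + b + r, with (a, b) from least squares.
a, b = np.polyfit(t, x, 1)
residual = x - (a * t + b)

# Variance as a rough stand-in for bits per value: the linear feature is
# preferred because it shrinks the residual by orders of magnitude.
assert residual.var() < x.var()
```

A compressor searching over short programs does not need to "know" regression; any feature of the form `a*t + b` that shrinks the residual this much wins on description length, which is why regression appears as a special case.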


## Limitations and Future Work

- **Overhead accumulation**: IC theory implies additive overhead terms.
- **Alternative descriptions**: currently only one compression path is explored at a time.
- **Reuse of functions**: a theory of memory and retrieval is still open.
- **Performance**: the Python prototype handles graphs of depth 4–5; a C++/Rust backend and parallelization are natural next steps.

Despite these challenges, IC theory provides a guarantee: incremental compression reaches the Kolmogorov complexity up to logarithmic precision.
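
The greedy, one-feature-per-step regime this guarantee refers to can be sketched as follows. This is an illustrative loop with a crude cost proxy and a two-feature candidate pool, not WILLIAM's actual search:

```python
import numpy as np

def cost(r):
    """Description-length proxy: bits for r.size values at the residual's
    current scale (finer precision = more bits)."""
    r = np.asarray(r, dtype=float)
    scale = max(float(np.max(np.abs(r))), 1e-9)
    return r.size * np.log2(2.0 * scale / 1e-3 + 1.0)

rng = np.random.default_rng(2)
x = 400.0 + 50.0 * rng.normal(size=300)

r = x.copy()
features = []
for step in range(5):
    # At each step, try a small pool of candidate features and keep the
    # one that shrinks the cost proxy the most.
    candidates = {
        "center": r - r.mean(),                 # subtract the mean
        "rescale": r / max(float(r.std()), 1e-9),  # divide by the std
    }
    name, best = min(candidates.items(), key=lambda kv: cost(kv[1]))
    if cost(best) >= cost(r):   # no candidate compresses further: stop
        break
    features.append(name)
    r = best

assert cost(r) < cost(x)        # the sequence of steps compressed the data
```

Each accepted step strictly reduces the description length, mirroring the "one compressing feature per step" structure of IC, while the per-step bookkeeping is where the logarithmic overhead terms accumulate.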


## Installation

For a standard installation, use:
```
pip install william-occam
```

To install all dependencies for development, testing, and graphical output, use (the extra is quoted so that shells such as zsh do not expand the brackets):
```
pip install "william-occam[dev]"
```
A `web` extra (FastAPI, uvicorn, and related dependencies) is also available and installs the same way.


## Compression examples

You can run various compression tests directly with **pytest**. Set
```
export WILLIAM_DEBUG=3
```
to get visual output after every compression step, or set it to `2` to see only the compression results after each task.
Now run:

```bash
pytest -v -s william/tests/test_alice.py
```
Enter `c` and press Enter to step through the compression steps in the debugger and inspect the generated graphs.

During execution, WILLIAM will:

- Generate synthetic training data for several regression problems.
- Search for a minimal program (tree/DAG) that explains the data.
- Display the compression progress (how the description length decreases).
- Render the resulting DAGs as PDF files in your working directory.


## License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). You are free to use, share, and modify the code for non-commercial purposes only, with proper attribution to the original author. For full license details, see the [LICENSE.md](./LICENSE.md) file.

## Releasing

Releases are published automatically when a tag is pushed to GitLab.

```bash
# Example for version 1.2.3
export RELEASE=v1.2.3

# Create a tag and push the specific tag to trigger the CI pipeline
git tag $RELEASE && git push origin $RELEASE
```

