Metadata-Version: 2.4
Name: synamine
Version: 0.1.0
Summary: AI-powered process mining for Python. Fully open source, fully flexible, lightning fast.
Project-URL: Homepage, https://github.com/captain-red-baron/synamine-core
Project-URL: Documentation, https://captain-red-baron.github.io/synamine-core/
Project-URL: Repository, https://github.com/captain-red-baron/synamine-core
Project-URL: Issues, https://github.com/captain-red-baron/synamine-core/issues
Project-URL: Changelog, https://github.com/captain-red-baron/synamine-core/blob/main/CHANGELOG.md
Author: Marcel Mueller
License-Expression: MIT
License-File: LICENSE
Keywords: ai,analytics,business-process,machine-learning,process-mining
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: lxml>=5.0.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pandas>=2.2.0
Requires-Dist: pyarrow>=15.0.0
Provides-Extra: all
Requires-Dist: anthropic>=0.18.0; extra == 'all'
Requires-Dist: graphviz>=0.20; extra == 'all'
Requires-Dist: httpx>=0.27.0; extra == 'all'
Requires-Dist: matplotlib>=3.8.0; extra == 'all'
Requires-Dist: openai>=1.12.0; extra == 'all'
Requires-Dist: polars>=0.20.0; extra == 'all'
Requires-Dist: scikit-learn>=1.4.0; extra == 'all'
Requires-Dist: scipy>=1.12.0; extra == 'all'
Provides-Extra: llm
Requires-Dist: anthropic>=0.18.0; extra == 'llm'
Requires-Dist: httpx>=0.27.0; extra == 'llm'
Requires-Dist: openai>=1.12.0; extra == 'llm'
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.4.0; extra == 'ml'
Requires-Dist: scipy>=1.12.0; extra == 'ml'
Provides-Extra: polars
Requires-Dist: polars>=0.20.0; extra == 'polars'
Provides-Extra: viz
Requires-Dist: graphviz>=0.20; extra == 'viz'
Requires-Dist: matplotlib>=3.8.0; extra == 'viz'
Description-Content-Type: text/markdown

# synamine

AI-powered process mining for Python. Fully open source, fully flexible, lightning fast.

## Features

- **Event Log Handling** -- Read and write event logs in CSV and XES (IEEE 1849) formats
- **Process Discovery** -- Discover process models using Alpha Miner, Heuristic Miner, Inductive Miner, DFG, and Variant Trie algorithms
- **Process Models** -- Petri Nets, Process Trees, Directly-Follows Graphs, Variant Tries, and BPMN data structures
- **Statistics** -- Variants, activities, case durations, case lengths, sojourn times, waiting times, rework, resource workload
- **Filtering** -- Filter cases by activity, time range, or variant
- **Visualization** -- Render Petri Nets, Process Trees, DFGs (Graphviz), variant trie dendrograms, and statistics charts (Matplotlib)
- **Typed** -- Full type annotations with `py.typed` marker

## Requirements

- Python 3.12+
- [uv](https://docs.astral.sh/uv/) (recommended) or pip

## Installation

### From source (development)

```bash
git clone https://github.com/captain-red-baron/synamine-core.git
cd synamine-core
uv sync
```

### With optional dependencies

```bash
# Visualization (graphviz + matplotlib)
uv sync --extra viz

# Machine learning
uv sync --extra ml

# LLM integration
uv sync --extra llm

# Everything
uv sync --extra all
```

> **Note:** Graphviz-based rendering (DFGs, Petri Nets, and Process Trees) also requires the [Graphviz](https://graphviz.org/download/) system binary to be installed (e.g. `brew install graphviz` on macOS).

### With pip

```bash
pip install -e .            # core only
pip install -e ".[viz]"     # with visualization
pip install -e ".[all]"     # everything
```

## Quick Start

```python
from datetime import datetime

import synamine

# Read an event log
log = synamine.read_csv("events.csv")
# or: log = synamine.read_xes("events.xes")
# or from an existing DataFrame: log = synamine.read_dataframe(df)

# Explore statistics
print(f"Cases: {log.num_cases}, Events: {log.num_events}")
print(f"Activities: {log.activities}")

variants = synamine.get_variants(log)
start = synamine.get_start_activities(log)
end = synamine.get_end_activities(log)
durations = synamine.get_case_durations(log)

# Filter
filtered = synamine.filter_by_activity(log, activities=["Approve"])
filtered = synamine.filter_by_variant(log, top_k=5)
filtered = synamine.filter_by_timeframe(log, start=datetime(2024, 1, 1))

# Discover process models
dfg = synamine.discover_dfg(log)
pn = synamine.discover_petri_net(log, algorithm="alpha")       # or "heuristic"
pt = synamine.discover_process_tree(log)                        # Inductive Miner
pn_from_dfg = synamine.convert_dfg_to_petri_net(dfg)

# Discover a Variant Trie
trie = synamine.discover_variant_trie(log)

# Visualize (requires synamine[viz])
synamine.save_visualization(dfg, "dfg.png")
synamine.save_visualization(pn, "petri_net.png")
synamine.save_visualization(pt, "process_tree.svg")
synamine.save_visualization(trie, "trie.svg")

# Prune rare variants and visualize with options
pruned = trie.prune(min_count=5)
synamine.save_visualization(
    pruned, "trie_pruned.png",
    label_strategy="truncate",
    pct_reference="relative",
)
```

## From a pandas DataFrame

If you already have a DataFrame (e.g. in a Jupyter notebook), use `read_dataframe` directly -- no file I/O needed:

```python
import pandas as pd
import synamine

# `conn` is an existing database connection (e.g. sqlite3 or SQLAlchemy)
df = pd.read_sql("SELECT * FROM events", conn)
log = synamine.read_dataframe(df)

# With custom column names
log = synamine.read_dataframe(df, case_id="order_id", activity="step", timestamp="ts")
```

## Custom Column Names

By default, synamine expects columns named `case_id`, `activity`, and `timestamp`. You can override these names in `read_csv` and `read_dataframe`:

```python
log = synamine.read_csv(
    "events.csv",
    case_id="order_id",
    activity="step",
    timestamp="ts",
)
```
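
By contrast, a file that already uses the default column names needs no overrides. The sketch below writes such a file with the standard library only, which can be handy for trying the read functions without real data. The rows, the activity names, and the ISO 8601 timestamp format are assumptions for illustration; which timestamp formats `read_csv` accepts is not stated here.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample events: (case_id, activity, timestamp).
ROWS = [
    ("c1", "Create", "2024-01-01T09:00:00"),
    ("c1", "Approve", "2024-01-01T10:30:00"),
    ("c2", "Create", "2024-01-02T08:15:00"),
    ("c2", "Reject", "2024-01-02T09:00:00"),
]


def write_sample_log(path: Path) -> Path:
    """Write ROWS to a CSV file whose header uses the default column names."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["case_id", "activity", "timestamp"])
        writer.writerows(ROWS)
    return path


# Write into a temporary directory so nothing in the working tree is touched.
sample_path = write_sample_log(Path(tempfile.mkdtemp()) / "events.csv")
print(sample_path.read_text(encoding="utf-8"))
```

The resulting path could then be passed to `synamine.read_csv` without any column arguments.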

## CSV / XES Round-Tripping

```python
# CSV -> XES
log = synamine.read_csv("events.csv")
synamine.write_xes(log, "events.xes")

# XES -> CSV
log = synamine.read_xes("events.xes")
synamine.write_csv(log, "events.csv")
```

## Development

### Setup

```bash
git clone https://github.com/captain-red-baron/synamine-core.git
cd synamine-core
uv sync --extra viz       # install core deps plus the viz extra
uv run pre-commit install # set up pre-commit hooks
```

### Running Tests

```bash
# Run all tests
uv run pytest

# Verbose output
uv run pytest -v

# Run a specific test file
uv run pytest tests/models/test_log.py

# Run a specific test class or method
uv run pytest tests/models/test_log.py::TestEventLog::test_create_from_dataframe

# Run with coverage
uv run pytest --cov=synamine --cov-report=term-missing

# Skip slow tests
uv run pytest -m "not slow"
```

### Linting & Formatting

```bash
# Check for lint errors
uv run ruff check src/ tests/

# Auto-fix lint errors
uv run ruff check src/ tests/ --fix

# Check formatting
uv run ruff format --check src/ tests/

# Auto-format
uv run ruff format src/ tests/
```

### Type Checking

```bash
uv run mypy src/synamine
```

### Project Structure

```
synamine-core/
├── src/synamine/
│   ├── __init__.py              # Public API
│   ├── _version.py              # Version string
│   ├── read.py                  # read_csv(), read_xes(), read_dataframe()
│   ├── write.py                 # write_csv(), write_xes()
│   ├── discover.py              # discover_dfg(), discover_petri_net(), discover_process_tree(), ...
│   ├── stats.py                 # get_variants(), get_activities(), ...
│   ├── filter.py                # filter_by_activity(), filter_by_timeframe(), ...
│   ├── viz.py                   # view_dfg(), view_petri_net(), save_visualization(), ...
│   ├── models/
│   │   ├── log.py               # EventLog, EventLogSchema, Event, Trace
│   │   ├── dfg.py               # DirectlyFollowsGraph
│   │   ├── variant_trie.py      # VariantTrie, VariantTrieNode
│   │   ├── petri_net.py         # PetriNet, Place, Transition, Marking
│   │   ├── process_tree.py      # ProcessTree, Operator
│   │   └── bpmn.py              # BpmnGraph, BpmnNode, BpmnEdge
│   ├── io/
│   │   ├── csv/                 # CSV importer/exporter
│   │   └── xes/                 # XES importer/exporter
│   ├── algo/
│   │   ├── discovery/
│   │   │   ├── dfg/             # DFG discovery algorithm
│   │   │   ├── alpha/           # Alpha Miner (Petri Net)
│   │   │   ├── heuristic/       # Heuristic Miner (Petri Net)
│   │   │   └── inductive/       # Inductive Miner (Process Tree)
│   │   ├── conversion/          # DFG-to-Petri-Net conversion
│   │   ├── statistics/          # Variant, activity, duration stats
│   │   └── filtering/           # Activity, temporal, variant filters
│   └── visualization/
│       ├── dfg.py               # Graphviz-based DFG rendering
│       ├── petri_net.py         # Graphviz-based Petri Net rendering
│       ├── process_tree.py      # Graphviz-based Process Tree rendering
│       ├── charts.py            # Matplotlib statistics charts
│       └── variant_trie.py      # Matplotlib-based dendrogram
├── tests/
│   ├── conftest.py              # Shared fixtures
│   ├── data/                    # running_example.csv, running_example.xes
│   ├── models/                  # Model unit tests
│   ├── io/                      # IO round-trip tests
│   ├── algo/                    # Algorithm tests
│   ├── visualization/           # Visualization tests
│   ├── test_discover.py         # Discovery API tests
│   └── test_end_to_end.py       # Integration tests
├── pyproject.toml               # Project config, deps, tool settings
├── .github/workflows/ci.yml     # CI: lint + test matrix
└── .pre-commit-config.yaml      # Pre-commit hooks
```

## API Reference

### IO

| Function | Description |
|----------|-------------|
| `synamine.read_csv(path, ...)` | Read a CSV file into an EventLog |
| `synamine.read_xes(path)` | Read an XES file into an EventLog |
| `synamine.read_dataframe(df, ...)` | Create an EventLog from a pandas DataFrame |
| `synamine.write_csv(log, path)` | Write an EventLog to CSV |
| `synamine.write_xes(log, path)` | Write an EventLog to XES |

### Discovery

| Function | Description |
|----------|-------------|
| `synamine.discover_dfg(log, noise_threshold=0.0)` | Discover a Directly-Follows Graph |
| `synamine.discover_petri_net(log, algorithm="alpha")` | Discover a Petri Net (Alpha or Heuristic Miner) |
| `synamine.discover_process_tree(log, noise_threshold=0.0)` | Discover a Process Tree (Inductive Miner) |
| `synamine.discover_variant_trie(log, min_count=1)` | Discover a Variant Trie |
| `synamine.convert_dfg_to_petri_net(dfg)` | Convert a DFG to a Petri Net |

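As a rough illustration of what a Directly-Follows Graph records, the sketch below counts how often one activity is immediately followed by another within a trace. It is plain Python over hypothetical traces, not synamine's implementation, and it ignores `noise_threshold`, whose exact semantics are not described in this README.

```python
from collections import Counter

# Hypothetical traces: each tuple is the ordered activity sequence of one case.
traces = [
    ("Create", "Check", "Approve"),
    ("Create", "Check", "Reject"),
    ("Create", "Check", "Check", "Approve"),
]


def directly_follows(traces):
    """Count (a, b) pairs where b occurs immediately after a in a trace."""
    counts = Counter()
    for trace in traces:
        # zip(trace, trace[1:]) yields each pair of consecutive activities.
        counts.update(zip(trace, trace[1:]))
    return counts


dfg_counts = directly_follows(traces)
for (source, target), count in sorted(dfg_counts.items()):
    print(f"{source} -> {target}: {count}")
```

Each key of `dfg_counts` corresponds to one edge of the graph, and its value to the edge frequency.
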
#### Discovery Algorithms

| Algorithm | Output | Key Idea |
|-----------|--------|----------|
| **Alpha Miner** | Petri Net | Footprint matrix from ordering relations (causality, parallelism, choice) |
| **Heuristic Miner** | Petri Net | Frequency-based dependency measure; noise-tolerant via `dependency_threshold` |
| **Inductive Miner** | Process Tree | Recursive activity partitioning via cut detection (sequence, XOR, parallel, loop) |
| **DFG-to-Petri-Net** | Petri Net | Structural conversion of any DFG into a Petri Net |

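The ordering relations and the dependency measure named in the table can both be derived from directly-follows counts. The sketch below uses the textbook definitions (causality when `a` is followed by `b` but never the reverse, parallelism when both orders occur, choice when neither does) and the commonly cited Heuristic Miner dependency formula. Whether synamine computes them in exactly this way is an assumption, and the traces are hypothetical.

```python
from collections import Counter

# Hypothetical traces in which B and C can occur in either order.
traces = [
    ("A", "B", "C", "D"),
    ("A", "C", "B", "D"),
    ("A", "E", "D"),
]

# Directly-follows counts: how often b comes immediately after a.
follows = Counter()
for trace in traces:
    follows.update(zip(trace, trace[1:]))


def relation(a, b):
    """Footprint relation between two distinct activities."""
    ab, ba = follows[(a, b)] > 0, follows[(b, a)] > 0
    if ab and ba:
        return "parallel"
    if ab:
        return "causes"
    if ba:
        return "caused_by"
    return "choice"


def dependency(a, b):
    """Dependency measure for distinct activities, in the open range (-1, 1)."""
    ab, ba = follows[(a, b)], follows[(b, a)]
    return (ab - ba) / (ab + ba + 1)


print(relation("A", "B"), relation("B", "C"), relation("B", "E"))
print(dependency("A", "B"), dependency("B", "C"))
```

A dependency threshold then keeps only the pairs whose measure is high enough, which is what makes the frequency-based approach tolerant of rare, noisy orderings.
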
### Statistics

| Function | Description |
|----------|-------------|
| `synamine.get_variants(log)` | Get variant frequencies (`dict[tuple[str,...], int]`) |
| `synamine.get_activities(log)` | Get activity frequencies (`dict[str, int]`) |
| `synamine.get_start_activities(log)` | Get start activity frequencies |
| `synamine.get_end_activities(log)` | Get end activity frequencies |
| `synamine.get_case_durations(log)` | Get case durations in seconds |
| `synamine.get_case_lengths(log)` | Get number of events per case |
| `synamine.get_activity_durations(log)` | Get sojourn time stats per activity (mean, median, min, max, stdev) |
| `synamine.get_waiting_times(log)` | Get all inter-event waiting times as a flat list |
| `synamine.get_rework(log)` | Get count of cases with repeated activities |
| `synamine.get_resource_workload(log)` | Get event count per resource (or `None`) |
| `synamine.get_resource_activity_matrix(log)` | Get resource x activity counts (or `None`) |

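For orientation, the sketch below shows how three of these statistics (variants, case durations, and rework) relate to the underlying events. It is plain Python over hypothetical rows rather than synamine's implementation; the definitions follow the descriptions in the table above.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical events as (case_id, activity, timestamp) rows.
events = [
    ("c1", "Create", datetime(2024, 1, 1, 9, 0)),
    ("c1", "Check", datetime(2024, 1, 1, 9, 30)),
    ("c1", "Check", datetime(2024, 1, 1, 10, 0)),
    ("c1", "Approve", datetime(2024, 1, 1, 11, 0)),
    ("c2", "Create", datetime(2024, 1, 2, 8, 0)),
    ("c2", "Approve", datetime(2024, 1, 2, 8, 45)),
    ("c3", "Create", datetime(2024, 1, 3, 10, 0)),
    ("c3", "Approve", datetime(2024, 1, 3, 10, 20)),
]

# Group events into traces, ordered by timestamp within each case.
traces = defaultdict(list)
for case_id, activity, timestamp in sorted(events, key=lambda e: (e[0], e[2])):
    traces[case_id].append((activity, timestamp))

# Variant = the ordered activity sequence of a case; count identical sequences.
variants = Counter(tuple(a for a, _ in trace) for trace in traces.values())

# Case duration = last timestamp minus first timestamp, in seconds.
durations = {
    case_id: (trace[-1][1] - trace[0][1]).total_seconds()
    for case_id, trace in traces.items()
}

# Rework = number of cases in which at least one activity occurs more than once.
rework_cases = sum(
    1 for trace in traces.values()
    if len({a for a, _ in trace}) < len(trace)
)

print(variants.most_common())
print(durations)
print(rework_cases)
```
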
### Filtering

| Function | Description |
|----------|-------------|
| `synamine.filter_by_activity(log, activities, exclude=False)` | Filter cases by activity presence |
| `synamine.filter_by_timeframe(log, start=None, end=None)` | Filter cases by time range |
| `synamine.filter_by_variant(log, top_k=None, variants=None)` | Filter cases by variant |

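Filtering by activity presence operates on whole cases rather than on individual events. The sketch below assumes that a case matches when it contains at least one of the listed activities and that `exclude=True` inverts the selection. Both points are assumptions about the semantics, expressed in plain Python over hypothetical traces; the helper name is made up for this illustration.

```python
# Hypothetical traces keyed by case id.
traces = {
    "c1": ("Create", "Check", "Approve"),
    "c2": ("Create", "Reject"),
    "c3": ("Create", "Check", "Reject"),
}


def filter_cases_by_activity(traces, activities, exclude=False):
    """Keep cases containing any listed activity, or the others if exclude=True."""
    wanted = set(activities)
    return {
        case_id: trace
        for case_id, trace in traces.items()
        # A case matches when it shares at least one activity with `wanted`.
        if bool(wanted.intersection(trace)) != exclude
    }


kept = filter_cases_by_activity(traces, ["Approve"])
dropped = filter_cases_by_activity(traces, ["Approve"], exclude=True)
print(sorted(kept), sorted(dropped))
```
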
### Visualization

| Function | Description |
|----------|-------------|
| `synamine.save_visualization(model, path, **kwargs)` | Save any model to file (.png, .svg, .pdf) |
| `synamine.view_dfg(dfg)` | Open DFG in system viewer |
| `synamine.view_petri_net(net)` | Open Petri Net in system viewer |
| `synamine.view_process_tree(tree)` | Open Process Tree in system viewer |
| `synamine.view_variant_trie(trie, label_strategy, pct_reference)` | Open variant trie dendrogram |

#### Statistics Charts

Every chart function accepts a `path=` argument to save the chart directly to a file. Called without it, the function returns a `Figure` instead.

| Function | Chart Type |
|----------|-----------|
| `synamine.plot_activity_frequencies(log)` | Horizontal bar chart |
| `synamine.plot_case_durations(log)` | Duration histogram (hours) |
| `synamine.plot_case_lengths(log)` | Case length histogram |
| `synamine.plot_start_end_activities(log)` | Grouped bar chart (start vs end) |
| `synamine.plot_variant_frequencies(log, top_k=10)` | Top-k variant bar chart |
| `synamine.plot_cases_over_time(log, freq="W")` | Case arrivals line chart |

```python
# Save chart to file
synamine.plot_activity_frequencies(log, path="activities.png")

# Get Figure for customization in Jupyter
fig = synamine.plot_case_durations(log)
fig.axes[0].set_title("My Custom Title")
fig.savefig("custom.png")
```

#### Variant Trie Options

Both `save_visualization` (for `VariantTrie` models) and `view_variant_trie` accept these keyword arguments:

**`label_strategy`** -- Controls how node labels are rendered when the tree is dense.

| Value | Behavior |
|-------|----------|
| `"auto"` (default) | Boxes sized to fit the full activity label. Font size scales with tree density. |
| `"truncate"` | Long labels are trimmed with an ellipsis (`...`) to fit a maximum box width. |
| `"hide"` | Nodes with < 5% frequency show only their count (e.g. `(12)`) instead of the activity name. |

**`pct_reference`** -- Controls what the edge percentages are relative to.

| Value | Behavior |
|-------|----------|
| `"absolute"` (default) | Percentage of all traces (root count). "20%" means 20% of all cases pass through this edge. |
| `"relative"` | Percentage of the parent node's count (branching probability). "20%" means 20% of traces reaching the parent continue down this branch. |

**Example:**

```python
trie = synamine.discover_variant_trie(log)

# Default: full labels, absolute percentages
synamine.save_visualization(trie, "trie.png")

# Truncate long labels, show branching probabilities
synamine.save_visualization(
    trie, "trie_relative.png",  # separate file so the default rendering is not overwritten
    label_strategy="truncate",
    pct_reference="relative",
)

# Hide rare nodes, keep absolute percentages
synamine.save_visualization(trie, "trie.svg", label_strategy="hide")

# Interactive viewer with options
synamine.view_variant_trie(
    trie,
    label_strategy="hide",
    pct_reference="relative",
)
```
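
The difference between the two `pct_reference` modes comes down to the denominator. The sketch below reproduces that arithmetic on hypothetical prefix counts, following the definitions in the table above; the data layout and the helper name are illustrative and not part of synamine.

```python
# Hypothetical trie counts: number of traces passing through each prefix.
prefix_counts = {
    (): 100,                               # root: all traces
    ("Create",): 100,
    ("Create", "Check"): 40,
    ("Create", "Check", "Approve"): 10,
}


def edge_percentage(prefix, pct_reference="absolute"):
    """Percentage shown on the edge leading into the node for `prefix`."""
    if pct_reference == "absolute":
        denominator = prefix_counts[()]            # root count
    elif pct_reference == "relative":
        denominator = prefix_counts[prefix[:-1]]   # parent node's count
    else:
        raise ValueError(f"unknown pct_reference: {pct_reference!r}")
    return 100 * prefix_counts[prefix] / denominator


node = ("Create", "Check", "Approve")
print(edge_percentage(node, "absolute"))
print(edge_percentage(node, "relative"))
```

Here the same edge reads as 10% of all cases but as 25% of the cases that reached its parent, so the relative mode is the one to use for reading branching probabilities.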

#### Variant Trie Pruning

Use `VariantTrie.prune(min_count=N)` to remove low-frequency branches before visualization. The root count is preserved so percentages remain accurate.

```python
trie = synamine.discover_variant_trie(log)
pruned = trie.prune(min_count=10)  # keep only variants with >= 10 traces
synamine.save_visualization(pruned, "trie_pruned.png")
```
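
To see why preserving the root count matters, the sketch below builds and prunes a small trie made of nested dictionaries. This is a stand-in for illustration and not the `VariantTrie` class; it assumes that pruning drops every branch whose count is below `min_count` and leaves the remaining counts untouched.

```python
def build_trie(variants):
    """Build a prefix tree; each node stores how many traces pass through it."""
    root = {"count": 0, "children": {}}
    for variant, freq in variants.items():
        root["count"] += freq
        node = root
        for activity in variant:
            node = node["children"].setdefault(
                activity, {"count": 0, "children": {}}
            )
            node["count"] += freq
    return root


def prune(node, min_count):
    """Return a copy without branches below min_count; kept counts are unchanged."""
    return {
        "count": node["count"],
        "children": {
            name: prune(child, min_count)
            for name, child in node["children"].items()
            if child["count"] >= min_count
        },
    }


# Hypothetical variant frequencies.
variants = {("Create", "Approve"): 90, ("Create", "Reject"): 8, ("Cancel",): 2}
trie = build_trie(variants)
pruned = prune(trie, min_count=10)

approve = pruned["children"]["Create"]["children"]["Approve"]
print(pruned["count"], approve["count"] / pruned["count"])
```

Because the root still counts all 100 traces, the surviving `Approve` branch keeps its share of 90% even though the rare branches are gone.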

## Examples

The `examples/` directory contains runnable scripts that demonstrate synamine's features using the **Sepsis Cases** event log.

| Example | Description |
|---------|-------------|
| [`dfg_trie_dendrogram/`](examples/dfg_trie_dendrogram/) | DFG discovery (full + noise-filtered) and variant trie dendrogram |
| [`discovery_algorithms/`](examples/discovery_algorithms/) | Compare Alpha Miner, Heuristic Miner, Inductive Miner, and DFG-to-Petri-Net conversion |
| [`statistics/`](examples/statistics/) | Process statistics and chart visualizations (activity frequencies, durations, rework, etc.) |

```bash
# Run an example (from the repository root)
uv run python examples/dfg_trie_dendrogram/dfg_trie_dendrogram.py
uv run python examples/discovery_algorithms/discover.py
uv run python examples/statistics/process_statistics.py
```

## License

MIT
