Metadata-Version: 2.4
Name: finamt
Version: 0.13.1
Summary: An agentic Python library for extracting key information from receipts and preparing essential German tax return statements.
Author: Yauheniya Varabyova
Maintainer: Yauheniya Varabyova
License: MIT
Project-URL: Changelog, https://github.com/yauheniya-ai/finamt/blob/main/CHANGELOG.md
Project-URL: Documentation, https://finamt.readthedocs.io
Project-URL: Repository, https://github.com/yauheniya-ai/finamt
Keywords: finanzamt,finance,office,receipts,invoices,tax,statement,ocr,extraction
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Legal Industry
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Natural Language :: German
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Multimedia :: Graphics :: Capture :: Digital Camera
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.32.0
Requires-Dist: PyMuPDF>=1.22.0
Requires-Dist: paddleocr>=3.0.0
Requires-Dist: paddlepaddle>=3.0.0
Requires-Dist: pytesseract>=0.3.10
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: Pillow>=9.0.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: fastapi>=0.110
Requires-Dist: uvicorn[standard]>=0.29
Requires-Dist: python-multipart>=0.0.9
Requires-Dist: typer>=0.12.0
Provides-Extra: docs
Requires-Dist: sphinx>=5.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0; extra == "docs"
Requires-Dist: myst-parser>=0.18; extra == "docs"
Requires-Dist: sphinx-copybutton>=0.5; extra == "docs"
Provides-Extra: dev
Requires-Dist: twine; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: pytest-mock; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: flake8>=5.0; extra == "dev"
Requires-Dist: mypy>=0.991; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Requires-Dist: finamt[docs]; extra == "dev"
Dynamic: license-file

# finamt

<img src="https://raw.githubusercontent.com/yauheniya-ai/finamt/main/.github/images/finamt-wordmark.svg" width="50%" alt="finamt"/>

<div>
<br>
</div>

<div align="center">

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-purple.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://img.shields.io/pypi/v/finamt?color=blue&label=PyPI)](https://pypi.org/project/finamt/)
[![PyPI Downloads](https://img.shields.io/pypi/dm/finamt)](https://pypistats.org/packages/finamt)
[![Tests](https://github.com/yauheniya-ai/finamt/actions/workflows/tests.yml/badge.svg)](https://github.com/yauheniya-ai/finamt/actions/workflows/tests.yml)
[![Coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/yauheniya-ai/d09f6edc7b1928aeea1fbde834a6080b/raw/coverage.json)](https://github.com/yauheniya-ai/finamt/actions/workflows/tests.yml)
[![GitHub last commit](https://img.shields.io/github/last-commit/yauheniya-ai/finamt)](https://github.com/yauheniya-ai/finamt/commits/main)
[![Documentation Status](https://readthedocs.org/projects/finamt/badge/?version=latest)](https://readthedocs.org/projects/finamt/)

<img src="https://api.iconify.design/noto-v1:flag-for-flag-united-states.svg" width="16" height="16"> English | <img src="https://api.iconify.design/noto-v1:flag-for-flag-germany.svg" width="16" height="16"> [German](https://github.com/yauheniya-ai/finamt/blob/main/readme/README_de.md)

</div>

An agentic Python library for extracting structured data from receipts and invoices and preparing essential German tax return statements.

## Features

- **German Tax Alignment** — Category taxonomy and VAT handling aligned with German fiscal practice
- **Local-First** — Everything runs completely offline, with data stored in a local database
- **4-Agent Pipeline** — Sequential specialised agents for metadata, counterparty, amounts, and line items; short focused prompts for reliable local model performance
- **Web UI** — Full browser interface for uploading, reviewing, editing, and managing receipts

## Tech Stack

**Backend**
- <img src="https://api.iconify.design/devicon:python.svg" width="16" height="16"> [Python](https://www.python.org) — package language
- <img src="https://api.iconify.design/devicon:fastapi.svg" width="16" height="16"> [FastAPI](https://fastapi.tiangolo.com) — backend for the web UI
- <img src="https://api.iconify.design/simple-icons:paddlepaddle.svg" width="16" height="16"> [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) — OCR for scanned PDFs
- <img src="https://api.iconify.design/devicon:google.svg" width="16" height="16"> [Tesseract](https://github.com/tesseract-ocr/tesseract) — OCR for scanned PDFs and images when PaddleOCR fails or times out
- <img src="https://api.iconify.design/devicon:ollama.svg" width="16" height="16"> [Ollama](https://ollama.com) — local LLMs for structured extraction of information from receipts and invoices
    - <img src="https://upload.wikimedia.org/wikipedia/commons/6/69/Qwen_logo.svg" width="16" height="16"> [Qwen](https://qwen.ai/home) – laptop-friendly LLMs; `qwen2.5:7b-instruct-q4_K_M` is currently the preferred default for text-based extraction
- <img src="https://api.iconify.design/devicon:sqlite.svg" width="16" height="16"> [SQLite](https://sqlite.org) – local database for original receipts and extracted data

**Frontend**
- <img src="https://api.iconify.design/devicon:react.svg" width="16" height="16"> [React](https://react.dev) — interactive frontend
- <img src="https://api.iconify.design/devicon:vitejs.svg" width="16" height="16"> [Vite](https://vite.dev) — fast dev server and production bundler
- <img src="https://api.iconify.design/devicon:tailwindcss.svg" width="16" height="16"> [Tailwind CSS](https://tailwindcss.com) — utility-first styling
- <img src="https://api.iconify.design/devicon:typescript.svg" width="16" height="16"> [TypeScript](https://www.typescriptlang.org) — type-safe component and API code


**CLI**
- <img src="https://api.iconify.design/devicon:typer.svg" width="16" height="16"> [Typer](https://typer.tiangolo.com/) — CLI with coloured progress output

**Packaging**
- <img src="https://api.iconify.design/devicon:pypi.svg" width="16" height="16"> [PyPI](https://pypi.org/project/finamt/) — distributed as an installable Python package

## Installation

```bash
pip install finamt
```

For CLI usage, installing via [pipx](https://pipx.pypa.io/) is recommended. It places `finamt` in its own virtual environment, so its dependencies never interfere with your other projects, while still exposing the `finamt` command globally without activating a virtualenv:

```bash
pipx install finamt
```

### System Requirements

- Python 3.10 to 3.13 (the package metadata requires `>=3.10,<3.14`)
- Ollama running locally with a supported model pulled
- Tesseract OCR (optional fallback when PaddleOCR times out)

#### Ollama

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model — qwen2.5 7B is the recommended default
ollama pull qwen2.5:7b-instruct-q4_K_M
```

Other models that work well: `qwen3:8b`, `llama3.2`, `llama3.1`.

#### Tesseract OCR (optional fallback from PaddleOCR)

**Ubuntu / Debian**
```bash
sudo apt-get install tesseract-ocr tesseract-ocr-deu
```

**macOS**
```bash
brew install tesseract tesseract-lang
```

**Windows**

Download the installer from https://github.com/UB-Mannheim/tesseract/wiki, run it, and add the Tesseract installation directory to your `PATH`.

## Quick Start

### Interactive UI

```bash
finamt serve
```

<p align="center">
  <img src="https://raw.githubusercontent.com/yauheniya-ai/finamt/main/docs/images/Demo.webp" width="100%" />
  <em>Interactive UI to upload receipts and manage tax statements</em>
</p>

### Python API

#### Process a single receipt (expense)

```python
from finamt import FinanceAgent

agent = FinanceAgent()
result = agent.process_receipt("receipt.pdf")

if result.success:
    data = result.data
    print(f"Counterparty: {data.vendor}")
    print(f"Date:         {data.receipt_date}")
    print(f"Total:        {data.total_amount} EUR")
    print(f"VAT:          {data.vat_percentage}% ({data.vat_amount} EUR)")
    print(f"Net:          {data.net_amount} EUR")
    print(f"Category:     {data.category}")
    print(f"Items:        {len(data.items)}")

    # Serialise to JSON
    with open("extracted.json", "w", encoding="utf-8") as f:
        f.write(data.to_json())
else:
    print(f"Extraction failed: {result.error_message}")
```

#### Sale invoices (outgoing)

```python
result = agent.process_receipt("invoice_to_client.pdf", receipt_type="sale")
```

#### Batch processing

```python
from pathlib import Path
from finamt import FinanceAgent

agent = FinanceAgent()
results = agent.batch_process(list(Path("receipts/").glob("*.pdf")))

for path, result in results.items():
    if result.success:
        print(f"{path}: {result.data.total_amount} EUR")
    else:
        print(f"{path}: ERROR — {result.error_message}")
```

## Configuration

Settings are read in priority order from: environment variables → `.env` file → built-in defaults.

```bash
# .env

# OCR and general settings
FINAMT_OLLAMA_BASE_URL=http://localhost:11434
FINAMT_OCR_LANGUAGE=german
FINAMT_OCR_TIMEOUT=60
FINAMT_TESSERACT_CMD=tesseract
FINAMT_OCR_PREPROCESS=true
FINAMT_PDF_DPI=150

# Extraction agents — all 4 agents use this model
FINAMT_AGENT_MODEL=qwen2.5:7b-instruct-q4_K_M
FINAMT_AGENT_TIMEOUT=60
FINAMT_AGENT_NUM_CTX=4096
FINAMT_AGENT_MAX_RETRIES=2
```

You can also pass config objects directly:

```python
from finamt import FinanceAgent
from finamt.agents.config import Config, AgentsConfig

agent = FinanceAgent(
    config=Config(ocr_language="deu+eng", pdf_dpi=150),
    agents_cfg=AgentsConfig(agent_model="qwen3:8b"),
)
```

## API Reference

### FinanceAgent

```python
class FinanceAgent:
    def __init__(
        self,
        config:     Config | None = None,
        db_path:    str | Path | None = "~/.finamt/default/finamt.db",
        agents_cfg: AgentsConfig | None = None,
    ) -> None: ...

    def process_receipt(
        self,
        pdf_path:     str | Path | bytes,
        receipt_type: str = "purchase",   # "purchase" or "sale"
    ) -> ExtractionResult: ...

    def batch_process(
        self,
        pdf_paths:    list[str | Path],
        receipt_type: str = "purchase",
    ) -> dict[str, ExtractionResult]: ...
```

### ExtractionResult

Always check `success` before accessing `data`.

```python
@dataclass
class ExtractionResult:
    success:         bool
    data:            ReceiptData | None
    error_message:   str | None
    duplicate:       bool                  # True if already in the database
    existing_id:     str | None            # ID of the original if duplicate
    processing_time: float | None          # seconds

    def to_dict(self) -> dict: ...
```
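
The `duplicate` and `existing_id` fields rely on a stable receipt ID, which the `ReceiptData` reference describes as a SHA-256 of the OCR text. A minimal sketch of that idea follows. It assumes the text is hashed as UTF-8 without any normalisation (finamt may preprocess it differently), and it uses a plain dict as a hypothetical stand-in for the database.

```python
import hashlib


def receipt_id(ocr_text: str) -> str:
    """Hex SHA-256 digest of the OCR text (assumed UTF-8, no normalisation)."""
    return hashlib.sha256(ocr_text.encode("utf-8")).hexdigest()


seen: dict[str, str] = {}  # hypothetical stand-in for the database: id -> source file


def register(ocr_text: str, source: str) -> tuple[bool, str]:
    """Return (duplicate, id). The first sighting is stored, repeats are flagged."""
    rid = receipt_id(ocr_text)
    if rid in seen:
        return True, rid
    seen[rid] = source
    return False, rid


first = register("Example GmbH\nSumme 12,34 EUR", "scan_a.pdf")
again = register("Example GmbH\nSumme 12,34 EUR", "scan_b.pdf")
print(first[0], again[0])  # the second call is reported as a duplicate
```

Because the key is derived from the text rather than the file name, the same receipt uploaded under two names maps to one ID.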

### ReceiptData

```python
@dataclass
class ReceiptData:
    id:               str                  # SHA-256 of OCR text — stable dedup key
    receipt_type:     ReceiptType          # "purchase" or "sale"
    counterparty:     Counterparty | None  # vendor (purchase) or client (sale)
    receipt_number:   str | None
    receipt_date:     datetime | None
    total_amount:     Decimal | None
    currency:         str                  # defaults to "EUR"
    vat_percentage:   Decimal | None       # e.g. Decimal("19.0")
    vat_amount:       Decimal | None
    net_amount:       Decimal | None       # computed: total - vat
    category:         ReceiptCategory
    items:            list[ReceiptItem]
    vat_splits:       list[dict]           # for mixed-rate invoices

    vendor: str | None                     # alias for counterparty.name

    def to_dict(self) -> dict: ...
    def to_json(self) -> str: ...
```
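
The `net_amount` comment above says the net is computed as total minus VAT. As a worked sketch with `Decimal`, the function below derives both values from a gross total and a single VAT rate. The helper name `split_gross` and the round-half-up-to-cents rule are assumptions for illustration; finamt's own rounding is not documented here.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def split_gross(total: Decimal, vat_percentage: Decimal) -> tuple[Decimal, Decimal]:
    """Return (net, vat) for a gross total that includes VAT at one rate."""
    net = (total / (1 + vat_percentage / 100)).quantize(CENT, rounding=ROUND_HALF_UP)
    vat = total - net  # derived by subtraction, so net + vat == total exactly
    return net, vat


net, vat = split_gross(Decimal("119.00"), Decimal("19.0"))
print(net, vat)  # 100.00 19.00
```

Deriving the VAT by subtraction rather than rounding it separately avoids one-cent mismatches between the three figures.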

### Counterparty

```python
@dataclass
class Counterparty:
    id:          str           # UUID assigned by the database
    name:        str | None
    vat_id:      str | None    # EU format, e.g. DE123456789
    tax_number:  str | None    # German Steuernummer, e.g. 123/456/78901
    address:     Address
    verified:    bool          # manually confirmed in the UI
```
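
The `vat_id` example above follows the German VAT ID layout: the prefix `DE` followed by nine digits. A small format check could look like the sketch below. It is not part of finamt, the function name is hypothetical, and it only tests the shape of the string. It does not verify a checksum or confirm that the ID is registered, and other EU countries use different layouts.

```python
import re

# German VAT ID (USt-IdNr.): "DE" followed by exactly nine digits.
DE_VAT_ID = re.compile(r"DE[0-9]{9}")


def looks_like_german_vat_id(value: str) -> bool:
    """Format check only: strips whitespace and ignores letter case."""
    cleaned = re.sub(r"\s+", "", value).upper()
    return DE_VAT_ID.fullmatch(cleaned) is not None


print(looks_like_german_vat_id("DE123456789"))     # True
print(looks_like_german_vat_id("de 123 456 789"))  # True
print(looks_like_german_vat_id("DE12345678"))      # False, one digit short
```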

### ReceiptItem

```python
@dataclass
class ReceiptItem:
    position:    int | None
    description: str
    quantity:    Decimal | None
    unit_price:  Decimal | None
    total_price: Decimal | None
    vat_rate:    Decimal | None
    vat_amount:  Decimal | None
    category:    ReceiptCategory

    def to_dict(self) -> dict: ...
```
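
For mixed-rate invoices, `ReceiptData.vat_splits` holds a per-rate breakdown. One way to picture such a breakdown is to group line items by their VAT rate and sum the amounts. The sketch below uses plain tuples instead of `ReceiptItem`, and the output keys (`rate`, `total`, `vat`) are illustrative assumptions, not the documented shape of `vat_splits`.

```python
from collections import defaultdict
from decimal import Decimal

# (vat_rate, total_price, vat_amount) per line item, values invented for the example
items = [
    (Decimal("19"), Decimal("11.90"), Decimal("1.90")),
    (Decimal("7"), Decimal("5.35"), Decimal("0.35")),
    (Decimal("7"), Decimal("2.14"), Decimal("0.14")),
]


def vat_breakdown(rows):
    """Sum total_price and vat_amount per VAT rate, sorted by rate."""
    totals = defaultdict(lambda: [Decimal("0"), Decimal("0")])
    for rate, total_price, vat_amount in rows:
        totals[rate][0] += total_price
        totals[rate][1] += vat_amount
    return [
        {"rate": rate, "total": total, "vat": vat}
        for rate, (total, vat) in sorted(totals.items())
    ]


for split in vat_breakdown(items):
    print(split["rate"], split["total"], split["vat"])
```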

### ReceiptCategory

A validated string subclass. Invalid values are silently normalised to `"other"`.

```python
from finamt.agents.prompts import RECEIPT_CATEGORIES   # list[str]
from finamt.models import ReceiptCategory

cat = ReceiptCategory("software")       # valid
cat = ReceiptCategory("unknown_value")  # normalised to "other"
cat = ReceiptCategory.other()           # explicit fallback
```
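
To show how a validated string subclass can normalise invalid values, here is a standalone sketch of the pattern. `Category` and `ALLOWED` are hypothetical stand-ins, not finamt's implementation, and the trimming and lower-casing step is an assumption that `ReceiptCategory` may or may not share.

```python
ALLOWED = {"software", "travel", "office", "other"}  # small subset for illustration


class Category(str):
    """Hypothetical stand-in for ReceiptCategory: a str that validates itself."""

    def __new__(cls, value: str) -> "Category":
        normalised = str(value).strip().lower()  # assumption: tolerate case and spaces
        if normalised not in ALLOWED:
            normalised = "other"  # invalid values fall back silently
        return super().__new__(cls, normalised)

    @classmethod
    def other(cls) -> "Category":
        """Explicit fallback constructor."""
        return cls("other")


print(Category("software"))       # software
print(Category("unknown_value"))  # other
print(Category.other() == "other")  # True, instances compare like plain strings
```

Since the class subclasses `str`, instances serialise to JSON and compare against string literals without any conversion.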

### Exceptions

All exceptions inherit from `FinanceAgentError`.

| Exception | Raised when |
|---|---|
| `OCRProcessingError` | PDF cannot be opened or text extraction fails |
| `LLMExtractionError` | Ollama is unreachable or returns invalid JSON after all retries |
| `InvalidReceiptError` | Extracted data fails business-logic validation |

```python
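from finamt import FinanceAgent
from finamt.exceptions import FinanceAgentError, OCRProcessingError

agent = FinanceAgent()
try:
    result = agent.process_receipt("scan.pdf")
except OCRProcessingError as e:
    # The PDF could not be opened or text extraction failed
    print(f"OCR failed: {e}")
except FinanceAgentError as e:
    # Any other finamt error, since all of them share this base class
    print(f"Processing failed: {e}")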
```

## Extraction Pipeline

Each receipt goes through four sequential LLM calls, each with a short focused prompt:

| Agent | Extracts |
|---|---|
| Agent 1 | Receipt number, date, category |
| Agent 2 | Counterparty name, VAT ID, Steuernummer, address |
| Agent 3 | Total amount, VAT percentage, VAT amount |
| Agent 4 | Line items (description, VAT rate, VAT amount, price) |

Results are merged in Python — no additional LLM validation step. Debug output for every agent (prompt, raw response, parsed JSON) is saved to `~/.finamt/debug/<receipt_id>/`.
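
The flow described above (four sequential calls, then a plain merge in Python) can be sketched with stub functions in place of the LLM calls. Every function name and return value below is hypothetical. The real agents call Ollama and parse its JSON response.

```python
from typing import Callable


# Stub agents: each stands in for one LLM call and returns only its own fields.
def metadata_agent(text: str) -> dict:
    return {"receipt_number": "R-2024-001", "category": "software"}


def counterparty_agent(text: str) -> dict:
    return {"counterparty": {"name": "Example GmbH", "vat_id": "DE123456789"}}


def amounts_agent(text: str) -> dict:
    return {"total_amount": "119.00", "vat_percentage": "19.0", "vat_amount": "19.00"}


def items_agent(text: str) -> dict:
    return {"items": [{"description": "Licence", "total_price": "119.00"}]}


PIPELINE: list[Callable[[str], dict]] = [
    metadata_agent,
    counterparty_agent,
    amounts_agent,
    items_agent,
]


def run_pipeline(ocr_text: str) -> dict:
    """Call each agent in order and merge the partial results into one dict."""
    merged: dict = {}
    for agent in PIPELINE:
        merged.update(agent(ocr_text))  # plain dict merge, no extra LLM pass
    return merged


result = run_pipeline("OCR text of the receipt")
print(sorted(result))
```

Because each agent owns a disjoint set of keys, the merge order does not affect the outcome in this sketch.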

## Categories and Subcategories

Every receipt is tagged with a category and optional subcategory. Categories map directly to line items in the German ELSTER tax forms (EÜR / UStVA), so the correct totals land in the right fields without manual re-sorting.

| Category | Subcategories |
|---|---|
| <img src="https://api.iconify.design/mdi:briefcase.svg" width="16" height="16"> `services` | `freelance` `consulting` `legal` `accounting` `notary` |
| <img src="https://api.iconify.design/ant-design:product-filled.svg" width="16" height="16"> `products` | `physical_goods` `digital_goods` `merchandise` `samples` |
| <img src="https://api.iconify.design/solar:box-bold.svg" width="16" height="16"> `material` | `consumables` `raw_materials` `packaging` `merchandise` |
| <img src="https://api.iconify.design/streamline-plump:computer-pc-desktop-solid.svg" width="16" height="16"> `equipment` | `low_value_asset` `computer` `machinery` `furniture` `tools` |
| <img src="https://api.iconify.design/heroicons:cpu-chip-16-solid.svg" width="16" height="16"> `software` | `subscriptions` `pay_as_you_go` `licenses` `hosting` `domains` |
| <img src="https://api.iconify.design/mdi:file-certificate.svg" width="16" height="16"> `licensing` | `software_licenses` `media_licenses` `other_ip` |
| <img src="https://api.iconify.design/streamline-flex:satellite-dish-solid.svg" width="16" height="16"> `telecommunication` | `phone` `internet` `bundled` |
| <img src="https://api.iconify.design/mdi:airplane.svg" width="16" height="16"> `travel` | `transport` `accommodation` `meals` `per_diem` `incidental` |
| <img src="https://api.iconify.design/boxicons:car-filled.svg" width="16" height="16"> `car` | `fuel` `parking` `garage` `repair` `maintenance` `insurance` `leasing` `rental` |
| <img src="https://api.iconify.design/wpf:books.svg" width="16" height="16"> `education` | `courses` `books` `conferences` `certifications` |
| <img src="https://api.iconify.design/roentgen:electricity.svg" width="16" height="16"> `utilities` | `electricity` `heating` `water` `waste` |
| <img src="https://api.iconify.design/fa:shield.svg" width="16" height="16"> `insurance` | `liability` `health` `vehicle` `property` |
| <img src="https://api.iconify.design/boxicons:bank-filled.svg" width="16" height="16"> `financial` | `bank_fees` `interest` `loan_costs` `payment_fees` |
| <img src="https://api.iconify.design/vaadin:office.svg" width="16" height="16"> `office` | `rent` `coworking` `storage` `cleaning` `security` |
| <img src="https://api.iconify.design/mdi:loudspeaker.svg" width="16" height="16"> `marketing` | `advertising` `print_media` `trade_fairs` `sponsorship` `gifts` |
| <img src="https://api.iconify.design/mdi:donation.svg" width="16" height="16"> `donations` | `charitable` `political` `church` |
| <img src="https://api.iconify.design/flowbite:folder-plus-solid.svg" width="16" height="16"> `other` | `membership_fees` `sundry` |
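
Before any form field can be filled, receipts have to be totalled per category. The sketch below shows only that aggregation step on plain `(category, amount)` pairs with invented values. The mapping from categories to ELSTER fields is not shown, since the field mapper is still on the TODO list below.

```python
from collections import defaultdict
from decimal import Decimal

# (category, total_amount) pairs, values invented for the example
receipts = [
    ("software", Decimal("119.00")),
    ("travel", Decimal("45.50")),
    ("software", Decimal("29.99")),
]


def totals_by_category(rows) -> dict[str, Decimal]:
    """Sum the amounts of all receipts that share a category."""
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)  # Decimal() is zero
    for category, amount in rows:
        totals[category] += amount
    return dict(totals)


for category, total in sorted(totals_by_category(receipts).items()):
    print(f"{category}: {total} EUR")
```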



## TODO

- [x] Receipt parsing
- [x] Tax calculation engine
- [ ] ELSTER field mapper
- [ ] XML generator
- [ ] XSD validator

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/my-change`)
3. Make your changes
4. Run the test suite: `pytest --cov=src --cov-report=term-missing`
5. Submit a pull request

## License

MIT — see [LICENSE](https://raw.githubusercontent.com/yauheniya-ai/finamt/main/LICENSE) for details.
