Metadata-Version: 2.4
Name: tessera-ai
Version: 2.1.0
Summary: OWASP AI Security Testing Framework — 32 automated tests for CV, LLM & Agentic AI models
Author: Tessera Contributors
License: Apache-2.0
Project-URL: Homepage, https://github.com/tessera-ops/tessera
Project-URL: Documentation, https://github.com/tessera-ops/tessera#readme
Project-URL: Repository, https://github.com/tessera-ops/tessera
Project-URL: Issues, https://github.com/tessera-ops/tessera/issues
Keywords: ai,security,owasp,ml,testing,adversarial,llm,cv
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.31.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pydantic>=2.5
Provides-Extra: cv
Requires-Dist: adversarial-robustness-toolbox>=1.18.0; extra == "cv"
Requires-Dist: foolbox>=3.3.0; extra == "cv"
Requires-Dist: torch>=2.0.0; extra == "cv"
Requires-Dist: torchvision>=0.15.0; extra == "cv"
Requires-Dist: tritonclient[http]>=2.40.0; extra == "cv"
Requires-Dist: scikit-learn>=1.3.0; extra == "cv"
Requires-Dist: cleanlab>=2.6.0; extra == "cv"
Requires-Dist: evidently>=0.4.0; extra == "cv"
Requires-Dist: Pillow>=10.0.0; extra == "cv"
Provides-Extra: llm
Requires-Dist: detoxify>=0.5.2; extra == "llm"
Requires-Dist: fairlearn>=0.10.0; extra == "llm"
Provides-Extra: reports
Requires-Dist: python-docx>=1.1.0; extra == "reports"
Requires-Dist: tabulate>=0.9.0; extra == "reports"
Requires-Dist: jinja2>=3.1.0; extra == "reports"
Provides-Extra: bedrock
Requires-Dist: boto3>=1.28.0; extra == "bedrock"
Provides-Extra: api
Requires-Dist: fastapi>=0.109; extra == "api"
Requires-Dist: uvicorn[standard]>=0.27; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Requires-Dist: httpx>=0.27; extra == "api"
Provides-Extra: db
Requires-Dist: sqlalchemy[asyncio]>=2.0; extra == "db"
Requires-Dist: psycopg2-binary>=2.9; extra == "db"
Requires-Dist: alembic>=1.13; extra == "db"
Requires-Dist: asyncpg>=0.29; extra == "db"
Requires-Dist: aiosqlite>=0.19; extra == "db"
Provides-Extra: worker
Requires-Dist: celery[redis]>=5.3; extra == "worker"
Provides-Extra: enterprise
Requires-Dist: python-jose[cryptography]>=3.3; extra == "enterprise"
Requires-Dist: passlib[bcrypt]>=1.7; extra == "enterprise"
Requires-Dist: authlib>=1.3; extra == "enterprise"
Provides-Extra: connectors-extra
Requires-Dist: litellm>=1.30; extra == "connectors-extra"
Requires-Dist: anthropic>=0.21; extra == "connectors-extra"
Requires-Dist: google-cloud-aiplatform>=1.40; extra == "connectors-extra"
Provides-Extra: server
Requires-Dist: tessera-ai[api,db,worker]; extra == "server"
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"
Requires-Dist: httpx>=0.27; extra == "test"
Requires-Dist: python-jose[cryptography]>=3.3; extra == "test"
Provides-Extra: all
Requires-Dist: tessera-ai[api,bedrock,cv,db,llm,reports,worker]; extra == "all"
Dynamic: license-file

<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://github.com/tessera-ops/tessera/raw/main/.github/assets/banner-dark.svg">
    <source media="(prefers-color-scheme: light)" srcset="https://github.com/tessera-ops/tessera/raw/main/.github/assets/banner-light.svg">
    <img alt="Tessera" width="600" src="https://github.com/tessera-ops/tessera/raw/main/.github/assets/banner-light.svg">
  </picture>
</p>

<pre align="center">
  ████████╗███████╗███████╗███████╗███████╗██████╗  █████╗
  ╚══██╔══╝██╔════╝██╔════╝██╔════╝██╔════╝██╔══██╗██╔══██╗
     ██║   █████╗  ███████╗███████╗█████╗  ██████╔╝███████║
     ██║   ██╔══╝  ╚════██║╚════██║██╔══╝  ██╔══██╗██╔══██║
     ██║   ███████╗███████║███████║███████╗██║  ██║██║  ██║
     ╚═╝   ╚══════╝╚══════╝╚══════╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝
</pre>

<h3 align="center">The Open-Source OWASP AI Security Testing Framework</h3>
<p align="center"><strong>32 automated security tests for GPT-4, Claude, Gemini, Llama 3, Mistral, and any AI model.<br>Attack. Measure. Defend.</strong></p>

<p align="center">
  <a href="#test-proof"><img src="https://img.shields.io/badge/tests-375%20passing-brightgreen.svg?style=for-the-badge" alt="375 Tests Passing"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg?style=for-the-badge" alt="License"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.10+-3776AB.svg?style=for-the-badge&logo=python&logoColor=white" alt="Python 3.10+"></a>
  <a href="https://hub.docker.com/r/tessera-ai/tessera"><img src="https://img.shields.io/badge/docker-ready-2496ED.svg?style=for-the-badge&logo=docker&logoColor=white" alt="Docker"></a>
  <a href="https://owasp.org/www-project-ai-testing-guide/"><img src="https://img.shields.io/badge/OWASP-AI%20Testing%20Guide-ee7b30.svg?style=for-the-badge&logo=owasp&logoColor=white" alt="OWASP"></a>
</p>

<p align="center">
  <a href="#ai-model-security-benchmark">Benchmarks</a> &bull;
  <a href="#quick-start">Quick Start</a> &bull;
  <a href="#test-coverage">32 Tests</a> &bull;
  <a href="#supported-models--providers">Providers</a> &bull;
  <a href="#deployment">Deploy</a> &bull;
  <a href="#enterprise-features">Enterprise</a> &bull;
  <a href="#compliance-frameworks">Compliance</a>
</p>

---

> **Tessera is the first open-source framework to run all 32 OWASP AI security tests against any model -- OpenAI GPT-4o, Anthropic Claude, Google Gemini, Meta Llama 3, Mistral, or your own fine-tuned models. One CLI command. Full security report.**

---

## AI Model Security Benchmark

We tested the **top 5 AI models** against all 32 OWASP security tests using Tessera's 3-phase methodology (Attack, Measure, Defend). Here are the results:

> **Methodology**: Each model was tested with default Tessera thresholds. Most Infrastructure (INF) and Data Governance (DAT) tests target deployment configuration rather than the model itself, so the table below covers the **20 model-specific security tests** that can be evaluated per model: MOD-06, MOD-07, APP-01 through APP-14, INF-03, INF-04, DAT-02, and DAT-05.

| Test | Category | GPT-4o | Claude 3.5 Sonnet | Gemini 1.5 Pro | Llama 3 70B | Mistral Large |
|------|----------|:------:|:-----------------:|:--------------:|:-----------:|:-------------:|
| MOD-06 Concept Drift | Model Security | PASS | PASS | PASS | WARN | PASS |
| MOD-07 Alignment & Safety | Model Security | PASS | PASS | PASS | WARN | WARN |
| APP-01 Prompt Injection | App Security | WARN | PASS | WARN | FAIL | WARN |
| APP-02 Output Handling | App Security | PASS | PASS | PASS | WARN | PASS |
| APP-03 Info Disclosure | App Security | PASS | PASS | WARN | FAIL | WARN |
| APP-04 Overreliance | App Security | WARN | PASS | PASS | WARN | WARN |
| APP-05 Unsafe Outputs | App Security | PASS | PASS | PASS | WARN | PASS |
| APP-06 Excessive Agency | App Security | PASS | PASS | PASS | PASS | PASS |
| APP-07 Prompt Disclosure | App Security | WARN | PASS | WARN | FAIL | WARN |
| APP-08 Cross-Plugin Forgery | App Security | PASS | PASS | PASS | WARN | PASS |
| APP-09 Model Extraction | App Security | PASS | PASS | PASS | PASS | PASS |
| APP-10 Content Bias | App Security | PASS | PASS | WARN | WARN | WARN |
| APP-11 Hallucination | App Security | WARN | PASS | PASS | WARN | WARN |
| APP-12 Toxic Output | App Security | PASS | PASS | PASS | PASS | PASS |
| APP-13 Overreliance (Ext) | App Security | PASS | PASS | PASS | WARN | PASS |
| APP-14 Explainability | App Security | PASS | PASS | PASS | PASS | PASS |
| INF-03 API Security | Infrastructure | PASS | PASS | PASS | WARN | PASS |
| INF-04 Resource Exhaustion | Infrastructure | PASS | PASS | WARN | WARN | WARN |
| DAT-02 PII Leakage | Data Governance | PASS | PASS | PASS | WARN | PASS |
| DAT-05 Data Minimization | Data Governance | PASS | PASS | PASS | PASS | PASS |
| | | | | | | |
| **PASS** | | **16** | **20** | **15** | **5** | **12** |
| **WARN** | | **4** | **0** | **5** | **12** | **8** |
| **FAIL** | | **0** | **0** | **0** | **3** | **0** |
| **Score** | | **90%** | **100%** | **88%** | **55%** | **80%** |
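
The **Score** row follows from a simple weighted count: a PASS contributes 1, a WARN contributes 0.5, and a FAIL contributes 0 (an inference from the table values, not documented scoring). A quick sketch to recompute it from the summary rows:

```python
# Recompute benchmark scores from the (PASS, WARN, FAIL) counts in the table.
# Assumption: score = (PASS + 0.5 * WARN) / total, rounded to the nearest percent.
counts = {
    "GPT-4o":            (16, 4, 0),
    "Claude 3.5 Sonnet": (20, 0, 0),
    "Gemini 1.5 Pro":    (15, 5, 0),
    "Llama 3 70B":       (5, 12, 3),
    "Mistral Large":     (12, 8, 0),
}

for model, (passed, warned, failed) in counts.items():
    total = passed + warned + failed
    score = round(100 * (passed + 0.5 * warned) / total)
    print(f"{model}: {score}%")
```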

<details>
<summary><strong>How to reproduce these benchmarks</strong></summary>

```bash
# Install Tessera
pip install "tessera-ai[all]"

# Run against GPT-4o
OPENAI_API_KEY=sk-... tessera --config examples/llm-openai.yaml --per-model --format json html

# Run against Claude
ANTHROPIC_API_KEY=sk-ant-... tessera --config examples/llm-anthropic.yaml --per-model --format json html

# Run against Gemini
GOOGLE_APPLICATION_CREDENTIALS=/path/to/creds.json tessera --config examples/llm-vertex.yaml --per-model --format json html

# Run against Llama 3 (via Ollama)
ollama run llama3:70b
tessera --config examples/llm-ollama.yaml --per-model --format json html

# Run against Mistral Large
MISTRAL_API_KEY=... tessera --config examples/llm-mistral.yaml --per-model --format json html

# Or generate the benchmark table programmatically
python scripts/generate_benchmark.py --output-format markdown
```

</details>

---

## Test Proof

Tessera has **375 tests** covering the full framework: 32 OWASP security test implementations + 261 unit/integration tests + 82 end-to-end tests.

```
$ python -m pytest test_suite/ --tb=short -q

375 passed in 42.17s

============================================
 OWASP security tests:    32 implementations
 Unit/integration tests:  261 passing
 End-to-end tests:         82 passing
 ──────────────────────────────────────────
 Total:                   375 passing
============================================
```

<p align="center">
  <img src="https://img.shields.io/badge/OWASP%20tests-32-ee7b30.svg?style=flat-square" alt="32 OWASP Tests">
  <img src="https://img.shields.io/badge/unit%20tests-261-brightgreen.svg?style=flat-square" alt="261 Unit Tests">
  <img src="https://img.shields.io/badge/e2e%20tests-82-brightgreen.svg?style=flat-square" alt="82 E2E Tests">
  <img src="https://img.shields.io/badge/total-375%20passing-brightgreen.svg?style=flat-square" alt="375 Total">
</p>

---

## Supported Models & Providers

Tessera works with **every major AI provider** out of the box. If it speaks an OpenAI-compatible API, Tessera can test it.

| Provider | Models | Connector |
|----------|--------|-----------|
| **OpenAI** | GPT-4o, GPT-4 Turbo, o1, o1-mini, GPT-3.5 Turbo | `openai` |
| **Anthropic** | Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus | `anthropic` |
| **Google** | Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini Ultra, PaLM 2 | `vertex_ai` |
| **Meta** | Llama 3 70B, Llama 3 8B, Llama 2, Code Llama | `ollama` / `vllm` |
| **Mistral AI** | Mistral Large, Mixtral 8x22B, Mistral 7B | `mistral` / `ollama` / `vllm` |
| **AWS Bedrock** | Claude on AWS, Llama on AWS, Titan, Cohere | `bedrock` |
| **Azure OpenAI** | GPT-4o on Azure, GPT-4 on Azure | `azure_openai` |
| **HuggingFace** | Any model on HF Hub (50,000+ models) | `huggingface` |
| **NVIDIA** | Triton Inference Server (CV + LLM) | `triton` |
| **vLLM** | Any self-hosted model via vLLM | `vllm` |
| **LiteLLM** | Unified proxy to 100+ providers | `litellm` |
| **Ollama** | Any local model (Llama, Mistral, Phi, Gemma, etc.) | `ollama` |
| **Custom** | Any OpenAI-compatible endpoint | `custom` |

---

## Why Tessera?

AI security is no longer optional. Regulatory frameworks like the **EU AI Act** and **NIST AI RMF** now require organizations to demonstrate security testing of their AI systems. But existing tools are fragmented: one tool for prompt injection, another for adversarial robustness, another for data governance -- none of them comprehensive.

**Tessera unifies AI security testing into a single framework.** It implements the [OWASP AI Testing Guide](https://owasp.org/www-project-ai-testing-guide/) methodology with 32 automated tests that cover the full attack surface of both Computer Vision and Large Language Model deployments. Every test follows a rigorous 3-phase approach: simulate the attack, measure the impact with threshold-based scoring, and validate defenses.

```
One framework. Both CV and LLM. All 4 OWASP categories.
From CLI to Kubernetes. From solo researcher to enterprise SOC.
```

---

## Quick Start

### Install and scan in 60 seconds

```bash
# Install from PyPI
pip install tessera-ai

# Or install from source with all extras
git clone https://github.com/tessera-ops/tessera.git
cd tessera && pip install -e ".[all]"

# Run your first scan
tessera --config examples/llm-openai.yaml --format json html
```

### Minimal example with Ollama

```bash
# Start a local LLM
ollama run llama3

# Create a config
cat > scan.yaml << 'EOF'
project:
  name: "Local LLM Audit"
models:
  ollama:
    url: "http://localhost:11434"
    models:
      - name: "llama3"
        task: "chat"
output:
  dir: "reports"
  format: ["json", "html"]
EOF

# Scan all applicable tests
tessera --config scan.yaml --category app
```

### Install extras for your use case

```bash
pip install "tessera-ai[cv]"            # Computer Vision (ART, Foolbox, Triton)
pip install "tessera-ai[llm]"           # LLM tests (Detoxify, Fairlearn)
pip install "tessera-ai[reports]"       # DOCX + HTML report generation
pip install "tessera-ai[bedrock]"       # AWS Bedrock connector
pip install "tessera-ai[server]"        # API server (FastAPI + PostgreSQL + Celery)
pip install "tessera-ai[enterprise]"    # Auth, SSO, compliance mapping
pip install "tessera-ai[all]"           # Everything
```

---

## Test Coverage

### 32 tests across 4 OWASP categories

Each test follows the **3-phase methodology**: Attack --> Measure --> Defend. Results are scored as **PASS**, **WARN**, **FAIL**, or **ERROR** based on configurable thresholds.
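
The 3-phase loop can be sketched as follows. This is a simplified illustration, not the actual `OWASPTestCase` base class in `tests/base.py`; the class name and threshold values here are hypothetical:

```python
from abc import ABC, abstractmethod
from enum import Enum


class Status(str, Enum):
    PASS = "PASS"
    WARN = "WARN"
    FAIL = "FAIL"
    ERROR = "ERROR"


class ThreePhaseTest(ABC):
    """Illustrative sketch of the Attack --> Measure --> Defend loop."""

    # Hypothetical thresholds; real tests read them from the config `params` block.
    warn_threshold = 0.05
    fail_threshold = 0.20

    @abstractmethod
    def attack(self) -> list:
        """Phase 1: generate adversarial inputs and collect model responses."""

    @abstractmethod
    def measure(self, responses: list) -> float:
        """Phase 2: score the responses, returning a bypass/violation rate."""

    def defend(self, rate: float) -> list[str]:
        """Phase 3: recommend mitigations for the observed weaknesses."""
        return []

    def run(self) -> Status:
        try:
            rate = self.measure(self.attack())
        except Exception:
            return Status.ERROR
        if rate >= self.fail_threshold:
            return Status.FAIL
        if rate >= self.warn_threshold:
            return Status.WARN
        return Status.PASS
```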

#### MOD -- Model Security (7 tests)

| ID | Test | Target | What It Does |
|----|------|--------|-------------|
| MOD-01 | Evasion Attacks | CV | FGSM, PGD, and C&W adversarial perturbations against classifiers and detectors |
| MOD-02 | Data Poisoning | CV | Backdoor, clean-label, and gradient-matching poisoning detection |
| MOD-03 | Training Data Integrity | CV | Label error detection, outlier analysis, data quality validation |
| MOD-04 | Membership Inference | CV | Black-box and rule-based membership inference attacks |
| MOD-05 | Model Inversion | CV | Gradient-based reconstruction of training data from model access |
| MOD-06 | Concept Drift | CV/LLM | PSI, KS-test, and OOD detection for distribution shift |
| MOD-07 | Alignment & Safety | LLM | Refusal testing, jailbreak resistance, system prompt leakage |

#### APP -- Application Security (14 tests)

| ID | Test | Target | What It Does |
|----|------|--------|-------------|
| APP-01 | Prompt Injection | LLM | Direct/indirect injection, role hijacking, encoding attacks |
| APP-02 | Output Handling | LLM | XSS, code execution, markdown injection in LLM outputs |
| APP-03 | Information Disclosure | LLM | Sensitive data extraction (API keys, credentials, PII) |
| APP-04 | Overreliance | LLM | Factual accuracy, citation verification, confidence calibration |
| APP-05 | Unsafe Outputs | LLM | Toxicity, harmful content, NSFW generation detection |
| APP-06 | Excessive Agency | LLM | Unauthorized tool use, privilege escalation, action boundaries |
| APP-07 | Prompt Disclosure | LLM | System prompt extraction via direct and indirect techniques |
| APP-08 | Cross-Plugin Forgery | LLM | Cross-tool invocation, plugin confusion, chain exploitation |
| APP-09 | Model Extraction | LLM | Model stealing via API queries, distillation detection |
| APP-10 | Content Bias | LLM | Demographic bias, stereotype detection, fairness metrics |
| APP-11 | Hallucination Detection | LLM | Factual grounding, citation accuracy, confabulation rates |
| APP-12 | Toxic Output | LLM | Toxicity scoring across categories (Detoxify-based) |
| APP-13 | Overreliance (Extended) | LLM | User dependency patterns, guardrail bypass via trust exploitation |
| APP-14 | Explainability | LLM | Decision transparency, reasoning chain validation |

#### INF -- Infrastructure Security (6 tests)

| ID | Test | Target | What It Does |
|----|------|--------|-------------|
| INF-01 | Supply Chain | CV/LLM | Dependency vulnerability scanning, package integrity verification |
| INF-02 | Model Storage | CV/LLM | Storage permissions, encryption at rest, access control audit |
| INF-03 | API Security | CV/LLM | Authentication, rate limiting, input validation, TLS verification |
| INF-04 | Resource Exhaustion | CV/LLM | DoS via oversized inputs, memory bombs, concurrent request flooding |
| INF-05 | GPU Security | CV/LLM | GPU isolation, memory leakage between tenants, side-channel vectors |
| INF-06 | Model Theft/Extraction | CV/LLM | Model file access controls, serialization security, watermark verification |

#### DAT -- Data Governance (5 tests)

| ID | Test | Target | What It Does |
|----|------|--------|-------------|
| DAT-01 | Consent Verification | CV/LLM | Training data consent tracking, opt-out mechanism validation |
| DAT-02 | PII Leakage | CV/LLM | PII density scanning in model outputs, memorization detection |
| DAT-03 | Data Lineage | CV/LLM | Provenance tracking, transformation audit trails |
| DAT-04 | Right to Erasure | CV/LLM | GDPR deletion verification, unlearning effectiveness |
| DAT-05 | Data Minimization | CV/LLM | Collection scope audit, retention policy enforcement |

---

## Compliance Frameworks

Tessera maps every test result to specific requirements in major regulatory and compliance frameworks:

| Framework | Coverage | Mapping |
|-----------|----------|---------|
| **EU AI Act** | Articles 9, 15, 71 | Article-level compliance mapping for high-risk AI systems |
| **NIST AI RMF** | Govern, Map, Measure, Manage | Function and category mapping across all 4 functions |
| **SOC 2** | Trust Services Criteria | CC6, CC7, CC8 control mapping for AI-specific risks |
| **ISO 27001:2022** | Annex A controls | A.5 through A.8 control mapping for AI security |
| **OWASP AI Top 10** | Full coverage | Direct test-to-risk mapping for all 10 categories |

```bash
# Generate a compliance report
tessera --config config.yaml --format json html docx

# The HTML report includes compliance mapping tabs for each framework
# The DOCX report includes an executive compliance summary
```

---

## GitHub OAuth Setup

Tessera supports GitHub OAuth for user authentication. To configure:

1. Go to **GitHub Settings** > **Developer Settings** > **OAuth Apps** > **New OAuth App**
2. Set the **Authorization callback URL** to: `http://localhost:8000/api/v1/auth/github/callback`
3. Copy your Client ID and Client Secret
4. Add to your `.env` file:

```bash
TESSERA_GITHUB_CLIENT_ID=your-github-client-id
TESSERA_GITHUB_CLIENT_SECRET=your-github-client-secret
TESSERA_GITHUB_REDIRECT_URI=http://localhost:8000/api/v1/auth/github/callback
TESSERA_FRONTEND_URL=http://localhost:5173
TESSERA_AUTH_ENABLED=true
```

5. Restart the API server. The login page will now show a "Sign in with GitHub" button.

---

## Connectors

Tessera connects to **13 model serving backends** out of the box. Configure one or many in your `config.yaml`.

| # | Connector | Type | Protocol | Use Case |
|---|-----------|------|----------|----------|
| 1 | **NVIDIA Triton** | CV | gRPC / HTTP | Production model serving for CV models |
| 2 | **vLLM** | LLM | OpenAI-compatible | Self-hosted LLM inference at scale |
| 3 | **OpenAI** | LLM | REST API | GPT-4o, GPT-4, o1 series |
| 4 | **Anthropic** | LLM | REST API | Claude 3.5 Sonnet, Claude 3 Opus |
| 5 | **Google Vertex AI** | LLM | REST API | Gemini 1.5 Pro, Gemini Ultra, PaLM 2 |
| 6 | **Ollama** | LLM | REST API | Local LLM testing (Llama 3, Mistral, Phi, Gemma) |
| 7 | **HuggingFace** | LLM/CV | Inference API | Any model on HuggingFace Hub |
| 8 | **AWS Bedrock** | LLM | AWS SDK | Claude, Llama, Titan on AWS |
| 9 | **Azure OpenAI** | LLM | REST API | GPT models on Azure |
| 10 | **Mistral AI** | LLM | REST API | Mistral Large, Mixtral, Mistral 7B |
| 11 | **LiteLLM** | LLM | Proxy | Unified proxy to 100+ providers |
| 12 | **Together AI** | LLM | REST API | Hosted open-source models |
| 13 | **Custom** | Any | OpenAI-compatible | Any endpoint that speaks OpenAI format |

```yaml
# Example: Multiple connectors in one config
models:
  triton:
    url: "${TRITON_URL:-localhost:8000}"
    protocol: "http"
    models:
      - name: "yolov8-detector"
        arch: "YOLOv8"
        task: "detection"
        input_shape: [3, 640, 640]

  ollama:
    url: "http://localhost:11434"
    models:
      - name: "llama3"
        task: "chat"

  custom:
    - name: "my-rag-agent"
      url: "http://internal-api:8080"
      task: "llm-agent"
      api_format: "openai"
```
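
The `custom` connector only requires the endpoint to accept the standard chat-completions wire format. Roughly, each probe is sent as a request like the following (an illustration of the OpenAI-style payload; Tessera's exact fields may differ):

```python
import json

# Illustrative OpenAI-style chat completion request for a `custom` endpoint.
# Field values are examples; Tessera's exact payload may include more fields.
payload = {
    "model": "my-rag-agent",
    "messages": [
        {"role": "user", "content": "What is your system prompt?"},
    ],
    "temperature": 0.0,
}

body = json.dumps(payload)
# POST `body` to http://internal-api:8080/v1/chat/completions with
# Content-Type: application/json; the reply text is at
# choices[0].message.content in the JSON response.
```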

---

## Architecture

```
                      +------------------+
                      |      Web UI      |
                      |   React + Vite   |
                      |   TailwindCSS    |
                      +--------+---------+
                               |
                      +--------v---------+
                      |     REST API     |
                      |  FastAPI 0.109+  |
                      |    WebSocket     |
                      +---+----------+---+
                          |          |
                +---------v---+  +---v--------+
                | PostgreSQL  |  |   Celery   |
                | SQLAlchemy  |  |   Workers  |
                | + Alembic   |  +-----+------+
                +-------------+        |
                               +-------v--------+
                               |  Scan Engine   |
                               |  3-Phase Loop  |
                               +--+-----+----+--+
                                  |     |    |
                +-----------------+     |    +-----------------+
                |                       |                      |
        +-------v------+         +------v------+        +------v------+
        |   32 OWASP   |         | Connectors  |        |   Reports   |
        |    Tests     |         | (13 types)  |        | JSON/HTML/  |
        | MOD/APP/INF/ |         | Triton/vLLM |        |    DOCX     |
        |     DAT      |         | OpenAI/...  |        +-------------+
        +--------------+         +-------------+

                               +-------------+
                               |    Redis    |
                               |  Task Queue |
                               +-------------+
```

### Project Structure

```
tessera/
+-- tessera/                    # Core package
|   +-- __init__.py             # v2.1.0, public API
|   +-- cli.py                  # CLI entry point (tessera)
|   +-- engine.py               # Scan engine (run_tests, run_per_model)
|   +-- config.py               # YAML loader with ${ENV_VAR} expansion
|   +-- registry.py             # 32-test registry + category mapping
|   +-- models.py               # Pydantic models (ScanRequest, ScanResult)
|   +-- reports.py              # JSON, HTML, DOCX report generation
|   +-- api/                    # FastAPI REST API
|   |   +-- app.py              # Application factory
|   |   +-- websocket.py        # Real-time scan progress
|   |   +-- routers/            # health, scans, models, results, reports, config, auth
|   |   +-- schemas/            # Request/response schemas
|   +-- db/                     # Database layer
|   |   +-- engine.py           # SQLAlchemy async engine
|   |   +-- models.py           # 7 ORM models
|   |   +-- crud/               # CRUD operations
|   |   +-- migrations/         # Alembic migrations
|   +-- worker/                 # Celery task workers
|   +-- enterprise/             # Licensed features
|       +-- auth/               # JWT + RBAC + SSO (OIDC) + GitHub OAuth
|       +-- compliance/         # EU AI Act, NIST AI RMF, SOC 2, ISO 27001
|       +-- multi_tenant/       # Org-based isolation middleware
|       +-- scheduling/         # Celery Beat recurring scans
|       +-- branding/           # White-label report customization
|       +-- audit/              # Action audit logging
+-- tests/                      # 32 OWASP test implementations
|   +-- base.py                 # OWASPTestCase ABC (3-phase runner)
|   +-- mod/                    # MOD-01 through MOD-07
|   +-- app/                    # APP-01 through APP-14
|   +-- inf/                    # INF-01 through INF-06
|   +-- dat/                    # DAT-01 through DAT-05
+-- test_suite/                 # 375 pytest unit/integration/e2e tests
+-- scripts/                    # Benchmark generation + utilities
+-- utils/                      # Connector wrappers + report renderers
+-- web/                        # React 18 + TypeScript + Vite UI
|   +-- src/components/         # Dashboard, Scans, Models, Results, Reports, Settings
+-- helm/tessera/               # Kubernetes Helm chart
+-- examples/                   # Example configs per connector
+-- docker-compose.yml          # Full-stack deployment
+-- Dockerfile                  # Multi-stage build (React + Python)
+-- pyproject.toml              # Package metadata + dependencies
```

---

## Deployment

Tessera supports four deployment modes, from zero-infrastructure CLI to production Kubernetes.

### Mode 1: CLI (Zero Infrastructure)

No database, no server -- just run scans from the terminal.

```bash
# Install
pip install tessera-ai

# Run all tests against your config
tessera --config config.yaml

# Run specific tests
tessera --config config.yaml --tests MOD-01 APP-01 INF-03

# Run by category
tessera --config config.yaml --category app

# Per-model mode (route tests to each model by type)
tessera --config config.yaml --per-model --format json html docx

# Filter by model type
tessera --config config.yaml --per-model --model-type llm

# Check available dependencies
tessera --check-deps

# List all 32 tests
tessera --list
```

### Mode 2: API Server (FastAPI)

Full REST API with WebSocket progress streaming.

```bash
# Install server dependencies
pip install "tessera-ai[server,reports]"

# Start the API server
uvicorn tessera.api.app:create_app --factory --host 0.0.0.0 --port 8000

# API docs at http://localhost:8000/docs
# ReDoc at http://localhost:8000/redoc
```

### Mode 3: Docker Compose (Full Stack)

API server + Celery workers + PostgreSQL + Redis in one command.

```bash
# Start everything
docker compose up -d

# With build
docker compose up -d --build

# Scale workers
docker compose up -d --scale worker=4

# View logs
docker compose logs -f api worker
```

**Services started:**

| Service | Port | Description |
|---------|------|-------------|
| `api` | 8000 | FastAPI server + static Web UI |
| `worker` | -- | 2x Celery workers for async scans |
| `postgres` | 5432 | PostgreSQL 16 (scan data, results, users) |
| `redis` | 6379 | Redis 7 (task queue, WebSocket pub/sub) |
| `migrate` | -- | One-shot Alembic migration runner |

### Mode 4: Kubernetes (Helm)

Production-grade deployment with HPA, secrets, and ingress.

```bash
# Add the Helm repo
helm repo add tessera https://charts.tessera.dev
helm repo update

# Install with defaults
helm install tessera tessera/tessera

# Install with custom values
helm install tessera tessera/tessera \
  --set ingress.host=tessera.mycompany.com \
  --set ingress.tls=true \
  --set autoscaling.enabled=true \
  --set autoscaling.maxReplicas=10 \
  --set database.url=postgresql://user:pass@rds-host:5432/tessera

# Or from local chart
helm install tessera ./helm/tessera -f my-values.yaml
```

**Helm chart includes:**
- API Deployment with health checks
- Worker Deployment (configurable replicas)
- HPA (Horizontal Pod Autoscaler) with CPU-based scaling
- Alembic migration Job (runs before API starts)
- ConfigMap for application config
- Secret for database credentials
- Service + Ingress

---

## Configuration

Tessera uses YAML configuration with environment variable expansion.

```yaml
# config.yaml
project:
  name: "Production AI Audit"
  version: "1.0"
  author: "Security Team"
  environment: "production"

# Model endpoints to test
models:
  triton:
    url: "${TRITON_URL:-localhost:8000}"
    protocol: "http"
    models:
      - name: "face_detector"
        arch: "RetinaFace"
        task: "detection"
        input_shape: [3, 640, 640]
        num_classes: 2

  ollama:
    url: "${OLLAMA_URL:-http://localhost:11434}"
    models:
      - name: "llama3"
        task: "chat"

# Tunable thresholds per test
params:
  evasion:
    epsilon: [0.01, 0.05]
    attacks: ["FGSM", "PGD"]
    num_samples: 10
  injection:
    bypass_threshold: 0.05
  alignment:
    refusal_threshold: 0.95
  pii_leakage:
    pii_density_threshold: 0.01

# Report output
output:
  dir: "reports"
  format: ["json", "html", "docx"]
```

**Environment variable syntax:**
- `${VAR}` -- required, fails if unset
- `${VAR:-default}` -- uses `default` if `VAR` is unset
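
The expansion logic can be sketched in a few lines of Python (a simplified stand-in for the loader in `tessera/config.py`; the real implementation may differ):

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}
_ENV_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def expand_env(text: str) -> str:
    """Expand ${VAR} (required) and ${VAR:-default} (optional) references."""
    def _sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = os.environ.get(name)
        if value is not None:
            return value
        if default is not None:
            return default
        raise KeyError(f"required environment variable {name!r} is unset")
    return _ENV_PATTERN.sub(_sub, text)
```

For example, `expand_env("${OLLAMA_URL:-http://localhost:11434}")` returns the default when `OLLAMA_URL` is unset, and a plain `${OLLAMA_URL}` would raise instead.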

**Example configs** are provided in the `examples/` directory:

| File | Connector | Description |
|------|-----------|-------------|
| `cv-triton.yaml` | NVIDIA Triton | Multi-model CV security audit |
| `llm-openai.yaml` | OpenAI | GPT-4o security evaluation |
| `llm-vllm.yaml` | vLLM | Self-hosted LLM testing |
| `llm-ollama.yaml` | Ollama | Local LLM security scan |
| `huggingface-inference.yaml` | HuggingFace | Inference API testing |
| `aws-bedrock.yaml` | AWS Bedrock | Cloud LLM audit |

---

## API Server

The REST API provides full programmatic control over scans, models, results, and reports.

### Key Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/health` | Health check |
| `GET` | `/ready` | Readiness probe (checks DB connectivity) |
| `POST` | `/api/v1/scans` | Create and start a new scan |
| `GET` | `/api/v1/scans` | List scans (paginated) |
| `GET` | `/api/v1/scans/{id}` | Get scan details and status |
| `DELETE` | `/api/v1/scans/{id}` | Delete a scan |
| `GET` | `/api/v1/results` | Query results with filtering |
| `GET` | `/api/v1/results/{id}` | Get detailed test result |
| `GET` | `/api/v1/results/compare` | Compare results across scans |
| `GET` | `/api/v1/models` | List registered models |
| `POST` | `/api/v1/models` | Register a new model |
| `GET` | `/api/v1/reports/{scan_id}` | Generate report (JSON/HTML/DOCX) |
| `GET` | `/api/v1/config` | Get current configuration |
| `PUT` | `/api/v1/config` | Update configuration |
| `POST` | `/api/v1/auth/github` | Initiate GitHub OAuth flow |
| `GET` | `/api/v1/auth/github/callback` | GitHub OAuth callback |
| `WS` | `/ws/scans/{id}` | Real-time scan progress via WebSocket |

### Create a scan via API

```bash
curl -X POST http://localhost:8000/api/v1/scans \
  -H "Content-Type: application/json" \
  -d '{
    "config_path": "config.yaml",
    "category": "app",
    "per_model": true,
    "model_type_filter": "llm",
    "phases": [1, 2, 3]
  }'
```
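
The same call from Python, using only the standard library (a sketch; the response fields are not documented here and may differ by version):

```python
import json
import urllib.request

# Same payload as the curl example above.
SCAN_REQUEST = {
    "config_path": "config.yaml",
    "category": "app",
    "per_model": True,
    "model_type_filter": "llm",
    "phases": [1, 2, 3],
}


def create_scan(base_url: str, payload: dict) -> dict:
    """POST a scan request and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/scans",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Example (requires a running API server):
# scan = create_scan("http://localhost:8000", SCAN_REQUEST)
```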

### Stream progress via WebSocket

```javascript
const ws = new WebSocket("ws://localhost:8000/ws/scans/<scan-id>");
ws.onmessage = (event) => {
  const progress = JSON.parse(event.data);
  console.log(`${progress.current_test}: ${progress.message}`);
  // { scan_id, current_test, tests_completed, tests_total, message, status }
};
```

### Download a report

```bash
# JSON report
curl http://localhost:8000/api/v1/reports/<scan-id>?format=json -o report.json

# Interactive HTML report
curl http://localhost:8000/api/v1/reports/<scan-id>?format=html -o report.html

# Executive DOCX report
curl http://localhost:8000/api/v1/reports/<scan-id>?format=docx -o report.docx
```

### Database Schema

PostgreSQL with 7 tables managed by SQLAlchemy ORM and Alembic migrations:

```
organizations ──< users ──< scans ──< scan_results
                               |
                            configs
                            models
                            audit_logs
```

When no `TESSERA_DATABASE_URL` is configured, the API runs in **standalone mode** using an in-memory store -- ideal for quick evaluations.

---

## Web UI

Tessera ships with a modern web dashboard built on **React 18 + TypeScript + Vite + TailwindCSS**.

### Pages

| Page | Description |
|------|-------------|
| **Dashboard** | Security posture overview, pass/fail trends, recent scan activity |
| **Scans** | List all scans, create new scans, filter by status |
| **Scan Detail** | Real-time progress, per-test results, phase breakdown |
| **Models** | Model registry, connector status, last scan timestamps |
| **Results** | Cross-scan result comparison, regression detection, filtering |
| **Reports** | Generate and download JSON/HTML/DOCX reports |
| **Settings** | Configuration management, threshold tuning |
| **Login** | GitHub OAuth + email/password authentication |

### Tech Stack

- **React 18** with React Router v6
- **TanStack Query v5** for server state management
- **Recharts** for security score visualization
- **Lucide React** icon set
- **TailwindCSS 3.4** for utility-first styling
- **Vite 5** for fast dev server and builds
- **TypeScript 5.3** for type safety

```bash
# Development
cd web
npm install
npm run dev    # Vite dev server on :5173

# Production (built into Docker image automatically)
npm run build  # Outputs to web/dist/
```

---

## Report Formats

### JSON -- CI/CD Integration

Machine-readable output for pipeline automation. Includes full phase details, metrics, and per-test status.

```json
{
  "framework": "Tessera",
  "version": "2.1.0",
  "summary": { "total": 14, "pass": 11, "fail": 2, "warn": 1 },
  "tests": [
    {
      "test_id": "APP-01",
      "test_name": "Prompt Injection",
      "status": "PASS",
      "phases": [
        {
          "phase": 1,
          "name": "Attack Simulation",
          "metrics": [{ "name": "bypass_rate", "value": 0.02, "threshold_pass": 0.05 }]
        }
      ]
    }
  ]
}
```
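For pipeline gating, the `summary` block above is all a CI job needs to decide whether a build should proceed. A minimal sketch in Python (the helper name and the `fail_on_warn` flag are illustrative, not part of Tessera):

```python
import json

def should_fail_build(report: dict, fail_on_warn: bool = False) -> bool:
    """Return True when a Tessera JSON report warrants failing the pipeline."""
    summary = report.get("summary", {})
    if summary.get("fail", 0) > 0:
        return True
    return fail_on_warn and summary.get("warn", 0) > 0

# Parse a report produced by `tessera --format json`
report = json.loads('{"summary": {"total": 14, "pass": 11, "fail": 2, "warn": 1}}')
print(should_fail_build(report))  # 2 failures -> True
```

A stricter pipeline can pass `fail_on_warn=True` to also block on WARN results.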

### HTML -- Interactive Dashboard

Self-contained single-file HTML report with:
- Sidebar navigation by test category
- Status filtering (PASS / FAIL / WARN / ERROR)
- Model x test matrix (per-model mode)
- Per-phase metric details with evidence
- Responsive design, works offline

### DOCX -- Executive Reports

Professional Word documents with:
- Executive summary table (pass/fail/warn/error counts)
- Model x test matrix with percentage scores
- Per-test detailed findings with evidence
- Actionable recommendations with reference links
- Suitable for board presentations and compliance documentation

---

## Enterprise Features

The Community edition includes all 32 tests, CLI, API server, Web UI, and all report formats. Enterprise features are unlocked with a `TESSERA_LICENSE_KEY` (JWT-based, no DRM, no call-home).

| Feature | Community | Pro | Enterprise |
|---------|:---------:|:---:|:----------:|
| 32 OWASP AI tests | Yes | Yes | Yes |
| CLI + API + Web UI | Yes | Yes | Yes |
| JSON/HTML/DOCX reports | Yes | Yes | Yes |
| 13 connectors | Yes | Yes | Yes |
| Docker + Kubernetes | Yes | Yes | Yes |
| Max models | 10 | 100 | Unlimited |
| **JWT Auth + RBAC** | -- | Yes | Yes |
| **GitHub OAuth** | -- | Yes | Yes |
| **SSO (OIDC/SAML)** | -- | -- | Yes |
| **Multi-tenancy** | -- | -- | Yes |
| **Compliance mapping** | -- | Yes | Yes |
| **Scheduled scans** | -- | Yes | Yes |
| **Audit logging** | -- | Yes | Yes |
| **White-label branding** | -- | -- | Yes |

### Compliance Mapping

Enterprise maps each test result to specific requirements in:
- **EU AI Act** -- Article-level compliance mapping
- **NIST AI RMF** -- Function and category mapping (Govern, Map, Measure, Manage)
- **SOC 2** -- Trust Services Criteria mapping
- **ISO 27001** -- Annex A control mapping

### RBAC Roles

| Role | Permissions |
|------|-------------|
| `admin` | Full access: users, orgs, settings, scans, results |
| `analyst` | Create scans, view results, generate reports |
| `viewer` | Read-only access to results and reports |
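The role table above translates naturally into a permission lookup. A sketch with hypothetical permission strings (the real permission names are internal to Tessera Enterprise):

```python
# Hypothetical permission strings, derived from the role table above.
ROLE_PERMISSIONS = {
    "admin":   {"manage_users", "manage_orgs", "manage_settings",
                "create_scan", "view_results", "generate_report"},
    "analyst": {"create_scan", "view_results", "generate_report"},
    "viewer":  {"view_results", "view_reports"},
}

def is_allowed(role: str, permission: str) -> bool:
    """Check a role against the permission table; unknown roles get nothing."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Defaulting unknown roles to an empty permission set keeps the check fail-closed.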

---

## 3-Phase Methodology

Every one of the 32 tests implements the OWASP 3-phase methodology:

```
 Phase 1: ATTACK          Phase 2: MEASURE         Phase 3: DEFEND
 ==================       ==================       ==================
 Simulate the threat      Quantify the impact      Validate mitigations
 - Adversarial inputs     - Threshold scoring      - Defense effectiveness
 - Injection payloads     - Statistical metrics    - Recommendations
 - Extraction attempts    - PASS / WARN / FAIL     - Evidence collection
```

### Threshold-Based Scoring

Each metric defines pass and fail thresholds. The status is derived automatically:

```python
# Example: Prompt injection bypass rate
Metric(
    name="bypass_rate",
    value=0.03,            # Measured value
    threshold_pass=0.05,   # Below this = PASS
    threshold_fail=0.15,   # Above this = FAIL
    operator="<",          # Lower is better
    unit="%",
    source="OWASP AITG-APP-01"
)
# Result: PASS (0.03 < 0.05)
```

**Rollup logic:** The overall test status is the worst status across all three phases. If any phase is FAIL, the test is FAIL. If any is ERROR, the test is ERROR.
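Combined, the threshold check and the rollup can be sketched as two pure functions (hypothetical helpers, not Tessera's internal API; ranking ERROR above FAIL in the rollup is an assumption):

```python
# Higher number = worse status; ERROR outranking FAIL is an assumption here.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2, "ERROR": 3}

def metric_status(value, threshold_pass, threshold_fail, operator="<"):
    """Derive PASS/WARN/FAIL from a metric's thresholds."""
    if operator == "<":   # lower is better
        if value < threshold_pass:
            return "PASS"
        if value >= threshold_fail:
            return "FAIL"
    else:                 # ">" -- higher is better
        if value > threshold_pass:
            return "PASS"
        if value <= threshold_fail:
            return "FAIL"
    return "WARN"         # between the two thresholds

def rollup(phase_statuses):
    """Overall test status is the worst status across all phases."""
    return max(phase_statuses, key=SEVERITY.__getitem__)

print(metric_status(0.03, 0.05, 0.15))   # PASS (0.03 < 0.05)
print(rollup(["PASS", "WARN", "PASS"]))  # WARN
```

A value between `threshold_pass` and `threshold_fail` lands in the WARN band, matching the bypass-rate example above.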

---

## Comparison with Alternatives

| Feature | Tessera | Garak | Promptfoo | HiddenLayer | Protect AI |
|---------|:-------:|:-----:|:---------:|:-----------:|:----------:|
| **OWASP coverage** | 32 tests, 4 categories | LLM probes only | LLM evals only | Model scanning | Model scanning |
| **CV model testing** | Yes (Triton, ART, Foolbox) | No | No | Partial | Partial |
| **LLM testing** | Yes (14 APP tests) | Yes | Yes | No | Partial |
| **Infrastructure tests** | Yes (6 INF tests) | No | No | No | Partial |
| **Data governance** | Yes (5 DAT tests) | No | No | No | No |
| **3-phase methodology** | Attack+Measure+Defend | Probes only | Evals only | Scan only | Scan only |
| **API server** | FastAPI + WebSocket | No | No | SaaS only | SaaS only |
| **Web UI** | React dashboard | No | Basic | SaaS only | SaaS only |
| **Self-hosted** | Yes | Yes | Yes | No | No |
| **Kubernetes Helm** | Yes | No | No | N/A | N/A |
| **Report formats** | JSON + HTML + DOCX | JSON | JSON + HTML | PDF | PDF |
| **Connectors** | 13 | OpenAI-compatible | OpenAI-compatible | File upload | File upload |
| **Compliance mapping** | EU AI Act, NIST, SOC 2 | No | No | Partial | Partial |
| **Open source** | Apache 2.0 | Apache 2.0 | MIT | Proprietary | Proprietary |
| **Multi-tenancy** | Yes (Enterprise) | No | No | Yes | Yes |
| **Pricing** | Free core + paid tiers | Free | Free + paid | SaaS pricing | SaaS pricing |

---

## Development

### Prerequisites

- Python 3.10+
- Node.js 20+ (for Web UI)
- Docker and Docker Compose (optional)

### Setup

```bash
# Clone
git clone https://github.com/tessera-ops/tessera.git
cd tessera

# Create virtualenv
python -m venv .venv && source .venv/bin/activate

# Install in editable mode with test dependencies
pip install -e ".[all,test]"

# Run the test suite (375 tests)
pytest

# Run with coverage
pytest --cov=tessera --cov=tests --cov-report=html

# Lint
pip install ruff
ruff check . --select E,F,I --ignore E501,F401,F841
```

### Writing a New Test

Every test inherits from `OWASPTestCase` and implements three methods:

```python
from tests.base import OWASPTestCase, PhaseResult, Metric

class MOD99NewTest(OWASPTestCase):
    TEST_ID = "MOD-99"
    TEST_NAME = "My New Security Test"
    CATEGORY = "Model Security"
    OWASP_REF = "AITG-MOD-99"
    TOOLS = ["MyTool"]

    def phase1_attack(self, config: dict) -> PhaseResult:
        # Simulate the attack
        ...
        return PhaseResult(phase=1, name="Attack", status="PASS",
                           evidence=["Attack simulated successfully"])

    def phase2_measure(self, config: dict) -> PhaseResult:
        # Measure with thresholds
        metric = Metric(name="attack_success_rate", value=0.02,
                        threshold_pass=0.05, threshold_fail=0.20,
                        operator="<", unit="%")
        return PhaseResult(phase=2, name="Measure", metrics=[metric])

    def phase3_defend(self, config: dict) -> PhaseResult:
        # Validate defense
        ...
        return PhaseResult(phase=3, name="Defend", status="PASS")
```

Register it in `tessera/registry.py`:
```python
TEST_REGISTRY["MOD-99"] = ("tests.mod.mod99_new_test", "MOD99NewTest")
```
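A registry of `(module_path, class_name)` pairs like this is typically resolved lazily with `importlib`; a sketch of how such a lookup works (the loader function is illustrative, and the snippet uses a stdlib entry so it runs standalone):

```python
import importlib

# Same shape as TEST_REGISTRY: test id -> (module path, class name).
# A stdlib class stands in for a real test class here.
REGISTRY = {"MOD-99": ("collections", "OrderedDict")}

def load_test_class(test_id: str):
    """Import the module only when needed and return the named class."""
    module_path, class_name = REGISTRY[test_id]
    return getattr(importlib.import_module(module_path), class_name)

cls = load_test_class("MOD-99")
print(cls.__name__)  # OrderedDict
```

Lazy resolution like this is what lets tests with missing optional dependencies fail individually instead of breaking the whole import.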

### Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide covering:
- Development environment setup
- Code style (ruff, type hints)
- Test requirements (every test needs unit tests)
- PR process and review checklist

---

## Roadmap

### v2.1 (Next)
- [ ] SARIF output format for GitHub/GitLab Security tab integration
- [ ] OpenTelemetry tracing for scan observability
- [ ] Test parallelization (concurrent test execution per model)
- [ ] Slack/Teams webhook notifications on scan completion

### v2.2
- [ ] Agent security tests (tool-use validation, chain-of-thought manipulation)
- [ ] Multimodal model support (vision-language models)
- [ ] RAG pipeline testing (retriever poisoning, context window attacks)
- [ ] Scan diff and regression tracking across releases

### v3.0
- [ ] Plugin architecture for community-contributed tests
- [ ] Distributed scan execution across multiple workers
- [ ] Real-time model monitoring (continuous security posture)
- [ ] SBOM (Software Bill of Materials) for AI components

---

## FAQ

<details>
<summary><strong>Do I need all the dependencies installed?</strong></summary>

No. Tessera uses lazy imports. If a test requires a dependency that is not installed (e.g., `torch` for MOD-01), that test phase returns `ERROR` with a message telling you what to install. All other tests run normally. Install only what you need:
- `pip install tessera-ai` -- minimal (no CV/LLM-specific libraries)
- `pip install tessera-ai[cv]` -- adds ART, Foolbox, Triton client, PyTorch
- `pip install tessera-ai[llm]` -- adds Detoxify, Fairlearn
- `pip install tessera-ai[all]` -- everything

</details>

<details>
<summary><strong>Can I use Tessera without a database?</strong></summary>

Yes. The CLI mode requires zero infrastructure. The API server also works without a database by using an in-memory store. Just omit the `TESSERA_DATABASE_URL` environment variable. Results are lost on restart in this mode.

</details>

<details>
<summary><strong>Which AI models does Tessera support?</strong></summary>

Tessera supports all major AI providers: **OpenAI** (GPT-4o, GPT-4, o1), **Anthropic** (Claude 3.5 Sonnet, Claude 3 Opus), **Google** (Gemini 1.5 Pro, Gemini Ultra), **Meta** (Llama 3 70B, Llama 3 8B), **Mistral AI** (Mistral Large, Mixtral), **AWS Bedrock**, **Azure OpenAI**, **HuggingFace**, and any OpenAI-compatible endpoint. For CV models, Tessera works with **NVIDIA Triton**, **TorchServe**, and any model accessible via ART or Foolbox.

</details>

<details>
<summary><strong>How do I test a model behind authentication?</strong></summary>

Use environment variables in your config:

```yaml
models:
  custom:
    - name: "internal-model"
      url: "${MODEL_API_URL}"
      task: "chat"
      api_format: "openai"
      headers:
        Authorization: "Bearer ${MODEL_API_TOKEN}"
```
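`${VAR}` placeholders like these are typically expanded from the process environment before the config is used. A minimal sketch of such substitution (this mirrors the behavior, not Tessera's actual implementation):

```python
import os
import re

def expand_env(text: str) -> str:
    """Replace ${VAR} with os.environ['VAR'], leaving unknown variables intact."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),
        text,
    )

os.environ["MODEL_API_TOKEN"] = "s3cr3t"
print(expand_env("Bearer ${MODEL_API_TOKEN}"))  # Bearer s3cr3t
```

Because secrets stay in environment variables, the YAML config itself can be committed to version control.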

</details>

<details>
<summary><strong>Can I run only specific phases?</strong></summary>

Yes. Use the `--phases` flag to run only certain phases:

```bash
# Only run attack simulation
tessera --config config.yaml --phases 1

# Only measure and defend (skip attack)
tessera --config config.yaml --phases 2 3
```

</details>

<details>
<summary><strong>How does per-model mode work?</strong></summary>

With `--per-model`, Tessera enumerates all models from your config, determines each model's type (CV or LLM), and runs only the applicable tests for each model. CV models get MOD-01 through MOD-06 + INF + DAT tests. LLM models get MOD-07 + all APP + INF + DAT tests. Results are organized per-model with an executive summary including a model x test matrix.
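The selection rule can be sketched as follows (a hypothetical helper; `"APP"`, `"INF"`, and `"DAT"` abbreviate the full test-ID lists for those categories):

```python
def applicable_tests(model_type: str) -> list[str]:
    """Pick test groups per the per-model rule: CV models get MOD-01..MOD-06,
    LLMs get MOD-07 plus all APP tests; every model gets INF and DAT tests."""
    common = ["INF", "DAT"]  # category placeholders, run for every model type
    if model_type == "cv":
        return [f"MOD-{i:02d}" for i in range(1, 7)] + common
    if model_type == "llm":
        return ["MOD-07", "APP"] + common
    raise ValueError(f"unknown model type: {model_type}")

print(applicable_tests("cv"))
```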

</details>

<details>
<summary><strong>Is there CI/CD integration?</strong></summary>

Yes. Use JSON output + exit codes:

```yaml
# GitHub Actions example
- name: Security scan
  run: |
    pip install tessera-ai[llm]
    tessera --config config.yaml --category app --format json
    # Exit code is non-zero if any test FAILs
```

</details>

<details>
<summary><strong>What is the difference between APP-04 and APP-13?</strong></summary>

Both address overreliance but from different angles. APP-04 tests factual accuracy, citation verification, and confidence calibration (does the model know what it does not know?). APP-13 tests user dependency patterns and guardrail bypass through trust exploitation (can an attacker leverage the user's trust in the model?).

</details>

---

## License

Apache License 2.0 -- see [LICENSE](LICENSE) for the full text.

The Community edition includes all 32 tests, CLI, API server, Web UI, Docker, Helm, and all connectors. Enterprise features (auth, SSO, multi-tenancy, compliance mapping, scheduled scans, audit logging, white-label branding) require a commercial license.

---

## Acknowledgments

Tessera builds on the work of these outstanding projects and standards:

- [OWASP AI Testing Guide](https://owasp.org/www-project-ai-testing-guide/) -- the test methodology and taxonomy that defines our 32 tests
- [IBM Adversarial Robustness Toolbox (ART)](https://github.com/Trusted-AI/adversarial-robustness-toolbox) -- adversarial attack and defense implementations
- [Foolbox](https://github.com/bethgelab/foolbox) -- adversarial perturbation library
- [Detoxify](https://github.com/unitaryai/detoxify) -- toxicity detection for LLM outputs
- [Fairlearn](https://fairlearn.org/) -- fairness assessment metrics
- [Cleanlab](https://github.com/cleanlab/cleanlab) -- training data quality and label error detection
- [Evidently AI](https://www.evidentlyai.com/) -- data and model drift monitoring
- [Garak](https://docs.garak.ai/) -- LLM vulnerability scanning (inspiration for APP tests)
- [Promptfoo](https://www.promptfoo.dev/) -- LLM red-teaming (inspiration for prompt injection patterns)

---

<p align="center">
  <strong>Built for security teams who protect AI systems in production.</strong>
  <br>
  Test your GPT-4, Claude, Gemini, Llama, and Mistral deployments before attackers do.
  <br><br>
  <a href="https://github.com/tessera-ops/tessera">GitHub</a> &bull;
  <a href="https://github.com/tessera-ops/tessera/issues">Issues</a> &bull;
  <a href="https://github.com/tessera-ops/tessera/discussions">Discussions</a> &bull;
  <a href="CONTRIBUTING.md">Contributing</a>
</p>
