Metadata-Version: 2.4
Name: daylily-ursa
Version: 0.1.15
Summary: Daylily Workset Management API - automated analysis workset manager for genomics pipelines
Author-email: Daylily Informatics <daylily@daylilyinformatics.com>
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: boto3>=1.26.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: daylily-cognito>=0.1.24
Requires-Dist: daylily-tapdb<0.2.0,>=0.1.28
Requires-Dist: fastapi>=0.104.0
Requires-Dist: uvicorn[standard]>=0.24.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic[email]>=2.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: python-multipart>=0.0.6
Requires-Dist: jinja2>=3.1.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: jsonschema>=4.17.0
Requires-Dist: itsdangerous>=2.2.0
Requires-Dist: tabulate>=0.9.0
Requires-Dist: typer>=0.9.0
Requires-Dist: rich>=13.0.0
Provides-Extra: auth
Requires-Dist: python-jose[cryptography]>=3.3.0; extra == "auth"
Requires-Dist: passlib[bcrypt]>=1.7.4; extra == "auth"
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: moto>=4.2.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: boto3-stubs[cloudwatch,s3,sns]>=1.28.0; extra == "dev"
Provides-Extra: cluster
Requires-Dist: daylily-ephemeral-cluster; extra == "cluster"

# daylily-ursa

[![CI](https://github.com/Daylily-Informatics/daylily-ursa/actions/workflows/ci.yml/badge.svg)](https://github.com/Daylily-Informatics/daylily-ursa/actions/workflows/ci.yml)
[![Version](https://img.shields.io/github/v/tag/Daylily-Informatics/daylily-ursa?label=version)](https://github.com/Daylily-Informatics/daylily-ursa/tags)

**Daylily Workset Management API** — Automated analysis workset manager for genomics pipelines.

## Overview

Daylily Ursa provides a comprehensive workset management system for orchestrating genomics analysis pipelines. It handles:

- **Workset Lifecycle Management** — Create, monitor, and manage analysis worksets through a TapDB-backed state machine
- **File Registry** — Track and validate input/output files across S3 buckets
- **Customer Portal** — Web-based interface for customers to submit and monitor worksets
- **Biospecimen Management** — Track samples, manifests, and metadata
- **Multi-Region Support** — Coordinate worksets across AWS regions
- **Notifications** — SNS-based alerts for workset state changes
- **Storage Metrics** — Track and display workset directory sizes and storage consumption
- **Cognito Authentication** — Optional AWS Cognito integration for secure multi-tenant access

## Quick Start (Development)

```bash
# Activate the development environment (creates conda env if needed)
source ./ursa_activate

# Check system status
ursa info

# Run tests
ursa test run

# Start the API server (no auth, development mode)
ursa server start
```

## CLI Tools

### `ursa_activate` — Environment Setup

Source this script to set up the development environment:

```bash
source ./ursa_activate
```

This will:
1. Create the `URSA` conda environment from `config/ursa_env.yaml` (if it does not already exist)
2. Activate the conda environment
3. Install the package in development mode
4. Add CLI tools to PATH
5. Enable tab completion for the `ursa` CLI

### `ursa` — Management CLI

The main CLI tool for managing the project. It is built on Typer and organized into subcommand groups:

```bash
ursa <group> <command> [args]
```

**Command Groups:**

| Group | Description |
|-------|-------------|
| `ursa server` | API server management (start, stop, status, logs) |
| `ursa monitor` | Workset monitor daemon (start, stop, status, logs) |
| `ursa aws` | AWS resource management (setup, status, teardown) |
| `daycog` | Cognito/SSO management via `daylily-cognito` (setup, status, users, apps) |
| `ursa test` | Testing and code quality (run, cov, lint, format, typecheck) |
| `ursa env` | Environment and configuration (status, generate, clean) |

**Top-Level Commands:**
- `ursa version` — Show version information
- `ursa info` — Show system status and configuration
- `ursa --help` — Show all available commands

**Examples:**

```bash
# Server management
ursa server start              # Start API server as daemon
ursa server start --no-daemon  # Start in foreground
ursa server stop               # Stop the server
ursa server status             # Check server status
ursa server logs               # Tail server logs

# Testing
ursa test run                  # Run test suite
ursa test cov                  # Run with coverage
ursa test lint                 # Run ruff linter
ursa test format               # Format code

# AWS resources
ursa aws setup                 # Bootstrap TapDB templates and registries
ursa aws status                # Show TapDB template readiness
ursa aws teardown --force      # Print manual teardown instructions

# Cognito authentication (Google-first default)
./scripts/setup_cognito_google_default.sh  # Uses ~/.config/google_oauth/client_secret_2_...json
daycog status            # Check Cognito configuration
daycog list-users        # List users in the configured pool
daycog set-password --email user@example.com --password 'NewPass123!'
```

## Installation (Production)

```bash
pip install daylily-ursa

# With authentication support
pip install "daylily-ursa[auth]"

# For development
pip install "daylily-ursa[dev]"
```
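
To confirm which release ended up in the active environment, a short standard-library check can help. This is a minimal sketch: the helper name `installed_version` is illustrative and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "daylily-ursa") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "daylily-ursa is not installed in this environment")
```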

## Alternative Quick Start

```bash
# Start the API server directly
daylily-workset-api --host 0.0.0.0 --port 8914

# Start the workset monitor
daylily-workset-monitor config/workset-monitor-config.yaml
```

## Architecture

```
daylib/
├── workset_api.py          # FastAPI application entry point
├── workset_state_db.py     # TapDB state management
├── workset_monitor.py      # S3 workset monitoring daemon
├── workset_integration.py  # TapDB/S3 integration layer
├── workset_metrics.py      # Storage and performance metrics
├── workset_customer.py     # Customer/tenant management
├── workset_multi_region.py # Multi-region coordination
├── file_registry.py        # File tracking and validation
├── biospecimen.py          # Sample/manifest management
├── config.py               # Pydantic settings (env vars)
├── cli/                    # Typer CLI modules
│   ├── __init__.py         # Main CLI app
│   ├── server.py           # Server commands
│   ├── monitor.py          # Monitor commands
│   ├── aws.py              # AWS resource commands
│   ├── test.py             # Test commands
│   └── env.py              # Environment commands
└── routes/                 # FastAPI route modules
    ├── portal.py           # Customer portal routes
    ├── worksets.py         # Workset CRUD routes
    ├── utilities.py        # Utility endpoints
    └── dependencies.py     # Shared dependencies
```

## Configuration

Configuration is managed via environment variables or a `.env` file. Generate a template:

```bash
ursa env generate
```

**Key Environment Variables:**

```bash
# AWS Configuration (required)
AWS_PROFILE=your-profile-name

# Region Configuration
# Regions are configured in ~/.config/ursa/ursa-config.yaml
# Use URSA_ALLOWED_REGIONS to specify regions to scan for clusters
URSA_ALLOWED_REGIONS=us-west-2,us-east-1

# S3 Configuration
# NOTE: S3 buckets are discovered from cluster tags (aws-parallelcluster-monitor-bucket)
# No bucket environment variables are required.

# TapDB (Strict Namespace)
# Bootstrap (preferred):
#   tapdb config init --client-id local --database-name ursa --env dev
#   tapdb bootstrap local
TAPDB_STRICT_NAMESPACE=1
TAPDB_CLIENT_ID=local
TAPDB_DATABASE_NAME=ursa
TAPDB_ENV=dev

# Authentication (optional)
ENABLE_AUTH=false
COGNITO_USER_POOL_ID=us-west-2_xxxxxxxx
COGNITO_CLIENT_ID=xxxxxxxxxxxxxxxxxxxxxxxxxx
SESSION_SECRET_KEY=change-this-in-production
WHITELIST_DOMAINS=all  # or comma-separated: company.com,partner.org

# Server
URSA_HOST=0.0.0.0
URSA_PORT=8914

# Multi-Region (optional)
DAYLILY_MULTI_REGION=false
DAYLILY_PRIMARY_REGION=us-west-2
```

See `docs/AUTHENTICATION_SETUP.md` and `docs/MULTI_REGION.md` for detailed configuration guides.
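
As an illustration of how a few of these variables might be read, the sketch below parses them with the standard library only. It is not the package's `config.py` (which uses Pydantic settings). The `UrsaEnv` class, the `read_env` helper, the fallback defaults (copied from the example values above), and the accepted boolean spellings are all assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


def _as_bool(raw: str) -> bool:
    # Assumed set of truthy spellings; the real parser may accept others.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class UrsaEnv:
    """Hypothetical container for a subset of the variables shown above."""

    host: str
    port: int
    enable_auth: bool
    allowed_regions: List[str]


def read_env(env: Optional[Mapping[str, str]] = None) -> UrsaEnv:
    """Read settings from a mapping, defaulting to the process environment."""
    source = os.environ if env is None else env
    # URSA_ALLOWED_REGIONS is comma-separated; drop blanks and stray spaces.
    regions = [
        part.strip()
        for part in source.get("URSA_ALLOWED_REGIONS", "").split(",")
        if part.strip()
    ]
    return UrsaEnv(
        host=source.get("URSA_HOST", "0.0.0.0"),
        port=int(source.get("URSA_PORT", "8914")),
        enable_auth=_as_bool(source.get("ENABLE_AUTH", "false")),
        allowed_regions=regions,
    )


if __name__ == "__main__":
    print(read_env())
```

Passing a plain dictionary instead of relying on `os.environ` keeps the parsing logic easy to exercise in tests.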

## Features

### Customer Portal

Web-based interface at `/portal/` providing:
- Dashboard with workset overview and storage metrics
- Workset list with status, progress, and directory sizes
- Workset detail view with resources, samples, and logs
- File registry for managing input files
- Usage tracking and storage breakdown
- Cluster management (admin only)

### Storage Metrics

Workset directory sizes are automatically calculated during the pre-export phase and displayed throughout the UI:
- **Dashboard**: Total storage across all worksets
- **Workset List**: Per-workset directory size column
- **Workset Detail**: Storage in the Resources card
- **Usage Page**: Storage breakdown by workset
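
This README does not describe how the sizes are computed. One plausible approach, sketched below, is to sum the `Size` field of every object listed under a workset's S3 prefix and format the total for display. Both helper names are hypothetical. The input is assumed to look like the `Contents` entries returned by an S3 `list_objects_v2` call.

```python
from typing import Any, Iterable, Mapping


def total_size_bytes(objects: Iterable[Mapping[str, Any]]) -> int:
    """Sum the `Size` field across S3-style object listing entries."""
    return sum(int(obj.get("Size", 0)) for obj in objects)


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    units = ("B", "KiB", "MiB", "GiB", "TiB")
    size = float(num_bytes)
    index = 0
    # Step up one unit at a time until the value fits or units run out.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.1f} {units[index]}"


if __name__ == "__main__":
    # Made-up listing entries, for illustration only.
    listing = [
        {"Key": "workset-a/sample1.bam", "Size": 1024},
        {"Key": "workset-a/sample1.bai", "Size": 512},
    ]
    print(human_size(total_size_bytes(listing)))
```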

### Authentication Modes

1. **No Auth** (development): `ursa server start`
2. **Cognito Auth** (production): Set `ENABLE_AUTH=true` with Cognito configuration

See `docs/AUTHENTICATION_SETUP.md` for setup instructions.
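
The `WHITELIST_DOMAINS` setting shown under Configuration accepts either `all` or a comma-separated list of domains. The sketch below shows one way such a value could be checked against a user's email address. The function name is hypothetical, and the matching rules (case-insensitive, exact domain match, no subdomain handling) are assumptions rather than documented behavior.

```python
def email_domain_allowed(email: str, whitelist: str) -> bool:
    """Check an email's domain against a WHITELIST_DOMAINS-style value."""
    # Split on the last "@"; reject strings without a local part or domain.
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return False

    setting = whitelist.strip().lower()
    if setting == "all":
        return True

    allowed = {item.strip() for item in setting.split(",") if item.strip()}
    return domain in allowed


if __name__ == "__main__":
    print(email_domain_allowed("user@company.com", "company.com,partner.org"))
```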

## Documentation

Detailed guides are available in the `docs/` directory:

| Document | Description |
|----------|-------------|
| `AUTHENTICATION_SETUP.md` | Cognito authentication configuration |
| `CUSTOMER_PORTAL.md` | Portal features and multi-tenant support |
| `MULTI_REGION.md` | Multi-region deployment guide |
| `BILLING_INTEGRATION.md` | AWS billing and cost allocation |
| `IAM_SETUP_GUIDE.md` | Required IAM permissions |
| `QUICKSTART_WORKSET_MONITOR.md` | Monitor daemon setup |
| `WORKSET_STATE_DIAGRAM.md` | Workset state machine reference |

## Related Projects

- [daylily-ephemeral-cluster](https://github.com/Daylily-Informatics/daylily-ephemeral-cluster) — AWS ParallelCluster infrastructure for running genomics pipelines

## License

MIT
