Metadata-Version: 2.4
Name: cvxtract
Version: 0.3.3
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Text Processing
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
License-File: LICENSE-MIT
Summary: LLM-powered CV/resume extraction — typed Python dicts from PDF, DOCX, HTML, or TXT
Keywords: resume,cv,llm,extraction,document
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Bug Tracker, https://github.com/satadeep3927/cvxtract/issues
Project-URL: Documentation, https://docs.rs/cvxtract
Project-URL: Homepage, https://github.com/satadeep3927/cvxtract
Project-URL: Repository, https://github.com/satadeep3927/cvxtract

# cvxtract

[![Crates.io](https://img.shields.io/crates/v/cvxtract)](https://crates.io/crates/cvxtract)
[![Docs.rs](https://docs.rs/cvxtract/badge.svg)](https://docs.rs/cvxtract)
[![License](https://img.shields.io/badge/license-MIT-blue)](LICENSE-MIT)

LLM-powered structured extraction from CVs and resumes.

**cvxtract** loads a CV/resume in any common format (PDF, DOCX, HTML, plain text),
sends the text to an LLM of your choice, and deserialises the response directly
into typed Rust structs — no regex, no hand-written parsers.

## Quick start

```rust
use cvxtract::{Extractor, Model};

#[tokio::main]
async fn main() {
    // No API key needed — model is downloaded automatically on first run.
    let mut extractor = Extractor::new(Some(Model::from_local()));

    match extractor.extract_resume("resume.pdf".into()).await {
        Ok(resume) => {
            println!("Name:  {}", resume.name);
            println!("Email: {}", resume.email.as_deref().unwrap_or("-"));
            println!("Jobs:  {}", resume.experience.len());
        }
        Err(e) => eprintln!("Extraction failed: {e}"),
    }
}
```

## Installation

```toml
[dependencies]
cvxtract = "0.3"
tokio = { version = "1", features = ["full"] }
```

## Providers

| Constructor | Backend | Auth |
|---|---|---|
| `Model::from_local()` | llama-cpp-2 on-device (Qwen3.5-2B) | none — model auto-downloaded |
| `Model::from_openai()` | OpenAI API | `OPENAI_API_KEY` |
| `Model::from_openrouter()` | OpenRouter | `OPENROUTER_API_KEY` |
| `Model::from_ollama()` | Local Ollama | Ollama running on `localhost:11434` |
| `Model::from_openai_compatible()` | Any OpenAI-compatible endpoint | explicit key + URL |
| `Model::from_copilot()` | GitHub Copilot | `COPILOT_TOKEN` |

```rust
// OpenAI
let model = Model::from_openai("gpt-4o-mini");

// Ollama (local)
let model = Model::from_ollama("llama3.2");

// Any OpenAI-compatible endpoint
let model = Model::from_openai_compatible(
    "https://api.my-provider.com/v1",
    // Fail early with a readable message if the key is not set.
    std::env::var("MY_API_KEY").expect("MY_API_KEY must be set"),
    "my-model-name",
);
```
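
The table above ties most hosted backends to an environment variable, so an application can choose a provider at startup from whichever credential is present. The following is a minimal sketch of that selection logic only. The `Backend` enum, the `choose_backend` function, and the priority order are assumptions made for illustration and are not part of cvxtract.

```rust
/// Hypothetical enum for illustration; it mirrors rows of the provider
/// table but is not a cvxtract type.
#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    OpenAi,
    OpenRouter,
    Copilot,
    Local,
}

/// Picks a backend from whichever credential is set, in a fixed
/// (assumed) priority order. `lookup` abstracts over `std::env::var`
/// so the logic can be exercised without touching the process environment.
pub fn choose_backend(lookup: impl Fn(&str) -> Option<String>) -> Backend {
    // Treat empty or whitespace-only values as unset.
    let is_set = |name: &str| lookup(name).map_or(false, |v| !v.trim().is_empty());

    if is_set("OPENAI_API_KEY") {
        Backend::OpenAi
    } else if is_set("OPENROUTER_API_KEY") {
        Backend::OpenRouter
    } else if is_set("COPILOT_TOKEN") {
        Backend::Copilot
    } else {
        // No credential found: fall back to the on-device model.
        Backend::Local
    }
}
```

In a real program the lookup would typically be `choose_backend(|k| std::env::var(k).ok())`, with each resulting variant mapped to the matching `Model::from_*` constructor.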

## Supported input formats

| Format | Extension |
|---|---|
| PDF | `.pdf` |
| Word | `.docx` |
| HTML | `.html`, `.htm` |
| Plain text | `.txt` |
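
It can be useful to reject unsupported files before any model is loaded. The sketch below maps a path's extension to the formats in the table using only the standard library. The `InputFormat` enum and `detect_format` function are hypothetical helpers, not cvxtract APIs, and the case-insensitive match is an assumption about how a caller might want to treat extensions.

```rust
use std::path::Path;

/// Hypothetical enum for illustration; not part of the cvxtract API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InputFormat {
    Pdf,
    Docx,
    Html,
    Text,
}

/// Maps a path's extension (compared case-insensitively) to one of the
/// supported formats. Returns `None` when the extension is missing or
/// not listed in the table.
pub fn detect_format(path: &Path) -> Option<InputFormat> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(InputFormat::Pdf),
        "docx" => Some(InputFormat::Docx),
        "html" | "htm" => Some(InputFormat::Html),
        "txt" => Some(InputFormat::Text),
        _ => None,
    }
}
```

A caller could check `detect_format(Path::new("resume.pdf")).is_some()` and report a clear error for anything else before invoking the extractor.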

## Built-in `Resume` type

`extract_resume()` populates a comprehensive `Resume` struct:

```rust
pub struct Resume {
    pub name:           String,
    pub email:          Option<String>,
    pub phone:          Option<String>,
    pub location:       Option<String>,
    pub linkedin:       Option<String>,
    pub github:         Option<String>,
    pub website:        Option<String>,
    pub summary:        Option<String>,
    pub experience:     Vec<Experience>,   // company, role, dates, highlights
    pub education:      Vec<Education>,    // institution, degree, field, dates
    pub skills:         Vec<SkillGroup>,   // grouped or flat skill lists
    pub projects:       Vec<Project>,      // name, tech stack, URL
    pub certifications: Vec<Certification>,
    pub languages:      Vec<Language>,
    pub awards:         Vec<Award>,
}
```

## Custom types

Extract *any* shape by deriving `serde::Deserialize` and `schemars::JsonSchema`:

```rust
use cvxtract::{Extractor, Model};
use schemars::JsonSchema;
use serde::Deserialize;

#[derive(Debug, Deserialize, JsonSchema)]
struct ContactInfo {
    name:  String,
    email: Option<String>,
    phone: Option<String>,
}

#[tokio::main]
async fn main() {
    let mut extractor = Extractor::new(Some(Model::from_openai("gpt-4o-mini")));
    let info: ContactInfo = extractor
        .extract::<ContactInfo>("resume.pdf".into())
        .await
        .unwrap();
    println!("{info:#?}");
}
```

## GPU acceleration (local model)

When using `Model::from_local()`, compile with a feature flag to offload layers to
your GPU. llama.cpp auto-fits what it can into VRAM and falls back to CPU for the
remainder — this is safe even on GPUs with limited memory.

```bash
# NVIDIA CUDA
cargo build --release --features cuda

# Apple Silicon (Metal)
cargo build --release --features metal

# AMD / Intel Vulkan
cargo build --release --features vulkan
```

```toml
# Cargo.toml
[dependencies]
cvxtract = { version = "0.3", features = ["cuda"] }
```

## Error handling

All async methods return `Result<T, ExtractionError>`:

```rust
use cvxtract::ExtractionError;

match extractor.extract_resume(path).await {
    Ok(resume) => { /* use resume */ }
    Err(ExtractionError::LoadError(e))  => eprintln!("Could not load file: {e}"),
    Err(ExtractionError::ModelError(m)) => eprintln!("LLM error: {m}"),
    Err(ExtractionError::ParseError(e)) => eprintln!("JSON parse error: {e}"),
}
```

## Raw document loading

Use `UnstructuredLoader` to extract plain text from a file without any LLM call:

```rust
use cvxtract::UnstructuredLoader;

let loader = UnstructuredLoader::new();
let doc = loader.load("resume.pdf")?;
// `len()` would report bytes; count chars to match the message.
println!("{} characters extracted", doc.content.chars().count());
```

## License

Licensed under the MIT license ([LICENSE-MIT](LICENSE-MIT)).
