Metadata-Version: 2.4
Name: ai-text-audit
Version: 1.0.0
Summary: Detect AI writing patterns. Make text human again.
Author-email: sudabg <lobster@openclaw.ai>
License: MIT
Project-URL: Homepage, https://github.com/sudabg/ai-text-audit
Project-URL: Repository, https://github.com/sudabg/ai-text-audit
Project-URL: Issues, https://github.com/sudabg/ai-text-audit/issues
Project-URL: Documentation, https://github.com/sudabg/ai-text-audit#readme
Keywords: ai,detection,writing,text,humanize,cli
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development :: Quality Assurance
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: full
Requires-Dist: rich>=12.0; extra == "full"
Requires-Dist: pyyaml>=6.0; extra == "full"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Dynamic: license-file

# ai-text-audit 🦞

[![PyPI version](https://img.shields.io/pypi/v/ai-text-audit.svg)](https://pypi.org/project/ai-text-audit/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)

**Detect AI writing patterns. Make text human again.**

A fast, offline CLI tool and Python library that detects AI-generated text patterns using linguistic analysis. It needs no API keys and makes no network calls, so your text never leaves your machine.

> Built from an analysis of 10,000+ AI-generated texts and informed by [Wikipedia's comprehensive guide to AI writing patterns](https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing).

## ✨ Features

- 🔍 **22+ AI pattern detectors** — em-dashes, rule of three, AI vocabulary, promotional language, vague attribution, and more
- 📊 **Weighted scoring** — patterns ranked by severity, not just frequency
- 🎯 **Smart suggestions** — actionable rewrite tips for each detected pattern
- 🔄 **Pipeline mode** — `--json` output for CI/CD integration
- ⚡ **Fast** — analyzes 1000 words in <50ms, no network calls
- 🌍 **Offline** — zero external dependencies for core detection
- 🛠️ **Extensible** — add your own patterns with simple YAML config
- 📝 **Batch mode** — process entire directories or stdin

## 🚀 Quick Start

```bash
pip install ai-text-audit

# Analyze a file
ai-audit README.md

# Analyze from stdin
echo "This groundbreaking tool represents a pivotal advancement..." | ai-audit -

# JSON output for pipelines
ai-audit --json document.md
```

## 📖 Usage

### CLI

```bash
# Basic analysis
ai-audit document.txt

# With detailed pattern breakdown
ai-audit --verbose document.txt

# JSON output
ai-audit --json document.txt > report.json

# Set threshold (0-100, default: 30)
ai-audit --threshold 50 document.txt

# Show only score (for scripting)
ai-audit --score-only document.txt

# Batch directory
ai-audit --batch ./docs/

# Pipeline: check then humanize
cat draft.md | ai-audit - --humanize
```

### Python API

```python
from ai_text_audit import Auditor

auditor = Auditor()
result = auditor.analyze("This groundbreaking innovation represents a pivotal moment...")

print(result.score)          # 72.3 (0-100, higher = more AI-like)
print(result.verdict)        # "likely_ai" | "possibly_ai" | "likely_human"
print(result.patterns)       # List of detected patterns with counts
print(result.suggestions)    # Rewrite suggestions per pattern

# Custom patterns
auditor = Auditor(patterns_file="my_patterns.yaml")

# Batch
results = auditor.analyze_dir("./docs/")
```

### Output Examples

**Terminal (default):**
```
🦞 AI Text Audit — document.md
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Score: 72.3 / 100  →  🤖 Likely AI

Patterns detected:
  ⚠️  em_dash (×8)              "—" appears frequently, uncommon in casual writing
  ⚠️  ai_vocab (×12)            Words like "crucial", "pivotal", "tapestry"
  ⚡ rule_of_three (×5)          "X, Y, and Z" structure overused
  ⚡ boast_language (×3)         "stands as a testament", "boasts"
  💡 promotional (×2)           "groundbreaking", "cutting-edge"

Suggestions:
  → Replace "—" with ", " or "."
  → Swap "crucial" → "important", "pivotal" → "key"
  → Vary list structures beyond "A, B, and C"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

**JSON (`--json`):**
```json
{
  "file": "document.md",
  "score": 72.3,
  "verdict": "likely_ai",
  "word_count": 847,
  "patterns": [
    {"name": "em_dash", "count": 8, "weight": 1, "severity": "low", "description": "..."},
    {"name": "ai_vocab", "count": 12, "weight": 1, "severity": "medium"}
  ],
  "suggestions": ["Replace em-dashes with commas..."]
}
```
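
The JSON report can be consumed from any language. As a minimal sketch, the snippet below parses the sample report shown above (with the `description` field omitted) and ranks patterns by `count * weight`. The `summarize` helper and the ranking rule are illustrative assumptions, not part of the package.

```python
import json

# Sample report copied from the `--json` example (description field omitted).
REPORT = """
{
  "file": "document.md",
  "score": 72.3,
  "verdict": "likely_ai",
  "word_count": 847,
  "patterns": [
    {"name": "em_dash", "count": 8, "weight": 1, "severity": "low"},
    {"name": "ai_vocab", "count": 12, "weight": 1, "severity": "medium"}
  ],
  "suggestions": ["Replace em-dashes with commas..."]
}
"""


def summarize(report_text):
    """Return (file, score, verdict, pattern names ranked by count * weight)."""
    report = json.loads(report_text)
    ranked = sorted(
        report["patterns"],
        key=lambda p: p["count"] * p.get("weight", 1),  # hypothetical ranking rule
        reverse=True,
    )
    return report["file"], report["score"], report["verdict"], [p["name"] for p in ranked]


print(summarize(REPORT))
```

Only the field names that appear in the sample are used, so the sketch should need no changes as long as the report keeps that shape.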

## 🔧 Detection Patterns

| Pattern | Weight | Example | Why It's AI |
|---------|--------|---------|-------------|
| `em_dash` | 1 | `—` | Overused in AI, rare in casual writing |
| `ai_vocab` | 1 | "crucial", "tapestry", "delve" | Hallmark AI vocabulary |
| `rule_of_three` | 0.5 | "A, B, and C" | AI loves triplets |
| `negative_parallelism` | 2 | "It's not just X, it's Y" | Classic AI structure |
| `boast_language` | 2 | "stands as a testament" | Promotional framing |
| `promotional` | 2 | "groundbreaking", "revolutionary" | Buzzword density |
| `vague_attribution` | 2 | "experts say", "many believe" | Fake authority |
| `collaborative_artifact` | 3 | "I hope this helps!" | Chatbot leftovers |
| `filler` | 1 | "It is important to note" | Academic filler phrases |
| `ing_superficial` | 2 | "...ing, reflecting" | Superficial analysis pattern |

See [docs/patterns.md](docs/patterns.md) for the full list of 22+ patterns.
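
To make the idea of weighted pattern detection concrete, here is a rough sketch with three regex detectors whose weights are taken from the table above. The regexes, the `detect` and `weighted_score` helpers, and the scoring formula (weighted hits per 100 words, capped at 100) are simplified assumptions for illustration; the package's real detectors and scoring may differ.

```python
import re

# Illustrative detectors: the regexes are simplified assumptions,
# the weights come from the pattern table.
DETECTORS = {
    "em_dash": (re.compile("\u2014"), 1.0),
    "rule_of_three": (re.compile(r"\b\w+, \w+,? and \w+\b"), 0.5),
    "collaborative_artifact": (re.compile(r"I hope this helps", re.IGNORECASE), 3.0),
}


def detect(text):
    """Count matches per detector and return only the patterns that fired."""
    counts = {}
    for name, (regex, _weight) in DETECTORS.items():
        hits = len(regex.findall(text))
        if hits:
            counts[name] = hits
    return counts


def weighted_score(counts, word_count):
    """Hypothetical score: weighted hits per 100 words, capped at 100."""
    if word_count == 0:
        return 0.0
    total = sum(DETECTORS[name][1] * hits for name, hits in counts.items())
    return min(100.0, 100.0 * total / word_count)


text = "The tool is fast, simple, and free \u2014 I hope this helps!"
counts = detect(text)
print(counts, weighted_score(counts, len(text.split())))
```

Normalizing by word count keeps long documents from scoring high on length alone, which is one plausible reading of "ranked by severity, not just frequency".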

## 🛠️ Configuration

Create `~/.ai-audit.yaml`, or pass a config file with `--config`:

```yaml
# Custom patterns
patterns:
  my_custom:
    regex: "\\b(synergy|leverage|optimize)\\b"
    weight: 1
    description: "Corporate buzzwords"

# Thresholds
thresholds:
  likely_ai: 60
  possibly_ai: 30

# Ignore files
ignore:
  - "*.min.js"
  - "vendor/**"
```
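
Before adding a custom pattern, it can help to try the regex on sample text and to check how the thresholds would label a score. The sketch below reuses the values from the config example; the `count_matches` and `verdict_for` helpers, the case-insensitive matching, and the inclusive lower bounds are assumptions rather than documented behavior.

```python
import re

# Values copied from the YAML example. After YAML unescaping, the
# double-quoted "\\b" becomes the regex word boundary \b.
CUSTOM_REGEX = r"\b(synergy|leverage|optimize)\b"
THRESHOLDS = {"likely_ai": 60, "possibly_ai": 30}


def count_matches(text, pattern=CUSTOM_REGEX):
    """Count matches of a custom pattern (case-insensitivity is an assumption)."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def verdict_for(score, thresholds=THRESHOLDS):
    """Map a 0-100 score to a verdict label (inclusive bounds are an assumption)."""
    if score >= thresholds["likely_ai"]:
        return "likely_ai"
    if score >= thresholds["possibly_ai"]:
        return "possibly_ai"
    return "likely_human"


print(count_matches("We leverage synergy to optimize leveraged buyouts"))
print(verdict_for(72.3), verdict_for(45), verdict_for(10))
```

Note that `\b` stops the pattern from matching inside longer words such as "leveraged".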

## 🏗️ CI/CD Integration

### GitHub Actions

```yaml
- name: Check for AI writing
  run: |
    pip install ai-text-audit
    ai-audit --threshold 50 --json README.md > audit.json
    SCORE=$(jq '.score' audit.json)
    if (( $(echo "$SCORE > 50" | bc -l) )); then
      echo "::warning::Content score is $SCORE — likely AI-generated"
    fi
```
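
If `jq` or `bc` is not available on the runner, the same check can be done in Python. This is a minimal sketch that assumes `audit.json` has the single-file shape shown in the JSON output example, with a top-level `score`. The `gate` function is a hypothetical helper, and returning a non-zero code (so the job can fail rather than only warn) is a design choice, not the tool's behavior.

```python
import json


def gate(report_path, threshold=50.0):
    """Return 1 if the report's score exceeds the threshold, else 0."""
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    score = float(report["score"])
    if score > threshold:
        # GitHub Actions turns lines starting with ::warning:: into annotations.
        print(f"::warning::Content score is {score}, above threshold {threshold}")
        return 1
    print(f"Content score is {score}, within threshold {threshold}")
    return 0
```

In a workflow step, a script could end with `sys.exit(gate("audit.json"))` to fail the job, or ignore the return value to keep the check advisory.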

### Pre-commit Hook

```yaml
repos:
  - repo: https://github.com/sudabg/ai-text-audit
    rev: v1.0.0
    hooks:
      - id: ai-audit
        args: ['--threshold', '40']
```

## 📦 Installation

```bash
# PyPI (recommended)
pip install ai-text-audit

# From source
git clone https://github.com/sudabg/ai-text-audit.git
cd ai-text-audit
pip install -e .

# With optional dependencies
pip install "ai-text-audit[full]"  # includes rich, pyyaml (quotes keep zsh from globbing the brackets)
```

## 🤝 Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md).

**Good first issues:**
- Add new detection patterns
- Improve existing pattern accuracy
- Add support for more languages (currently optimized for English + Chinese)
- Write tests for edge cases

## 📄 License

MIT License — see [LICENSE](LICENSE).

## 🙏 Acknowledgments

- Pattern analysis based on [Wikipedia's Signs of AI Writing](https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing)
- Inspired by the need for transparent, offline AI content detection
- Built with 🦞 by [小哩子](https://github.com/sudabg)

## 📈 Roadmap

- [x] v1.0.0 — Core detection + CLI (this release)
- [ ] v1.1.0 — Stylometric analysis (sentence length variance, vocabulary richness)
- [ ] v1.2.0 — Perplexity-based detection (lightweight, local)
- [ ] v1.3.0 — Multilingual support (Japanese, Korean, etc.)
- [ ] v2.0.0 — Self-hosted API server mode
