Metadata-Version: 2.4
Name: domain-scout-ct
Version: 0.7.0
Summary: Discover internet domains associated with a business entity via CT logs, RDAP, and DNS
Project-URL: Repository, https://github.com/minghsuy/domain-scout
Project-URL: Changelog, https://github.com/minghsuy/domain-scout/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/minghsuy/domain-scout/issues
Author: Ming Yang
License-Expression: MIT
License-File: LICENSE
Keywords: certificate-transparency,dns,domain-discovery,rdap,security
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: Name Service (DNS)
Classifier: Topic :: Security
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: dnspython>=2.7.0
Requires-Dist: httpx>=0.28.0
Requires-Dist: psycopg2-binary>=2.9.11
Requires-Dist: pydantic>=2.10.0
Requires-Dist: rapidfuzz>=3.11.0
Requires-Dist: structlog>=24.4.0
Requires-Dist: typer>=0.15.0
Provides-Extra: all
Requires-Dist: duckdb>=1.0.0; extra == 'all'
Requires-Dist: fastapi>=0.115.0; extra == 'all'
Requires-Dist: prometheus-client>=0.21.0; extra == 'all'
Requires-Dist: pyyaml>=6.0; extra == 'all'
Requires-Dist: uvicorn[standard]>=0.32.0; extra == 'all'
Provides-Extra: api
Requires-Dist: fastapi>=0.115.0; extra == 'api'
Requires-Dist: uvicorn[standard]>=0.32.0; extra == 'api'
Provides-Extra: cache
Requires-Dist: duckdb>=1.0.0; extra == 'cache'
Provides-Extra: eval
Requires-Dist: pyyaml>=6.0; extra == 'eval'
Provides-Extra: metrics
Requires-Dist: prometheus-client>=0.21.0; extra == 'metrics'
Description-Content-Type: text/markdown

# domain-scout

[![CI](https://github.com/minghsuy/domain-scout/actions/workflows/ci.yml/badge.svg)](https://github.com/minghsuy/domain-scout/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/domain-scout-ct)](https://pypi.org/project/domain-scout-ct/)

Discover internet domains associated with a business entity using Certificate Transparency logs, RDAP, and DNS.

Useful for security reviews, asset inventories, and M&A due diligence, where a seed domain may be wrong, misspelled, or belong to a parent company.

## Install

```bash
pip install domain-scout-ct              # core library + CLI
pip install "domain-scout-ct[api]"       # + REST API server
pip install "domain-scout-ct[cache]"     # + DuckDB query cache
pip install "domain-scout-ct[metrics]"   # + Prometheus metrics
pip install "domain-scout-ct[all]"       # everything
```

For development:

```bash
uv sync --all-groups --all-extras
```

## Usage

### CLI

```bash
# Basic usage
domain-scout scout --name "Guidewire Software" --location "San Mateo, CA"

# With seed domain
domain-scout scout --name "Palo Alto Networks" --location "Santa Clara, CA" --seed "paloaltonetworks.com"

# Multiple seeds: cross-verification boosts confidence for domains found by both
domain-scout scout --name "Walmart" --seed walmart.com --seed samsclub.com

# Deep mode: GeoDNS global resolution for non-resolving domains
domain-scout scout --name "Walmart" --seed "walmart.com" --deep

# JSON output
domain-scout scout --name "Acme Corp" --output json > results.json

# Verbose logging
domain-scout scout --name "Cloudflare" --seed "cloudflare.com" -v
```

### REST API

```bash
# Start the API server (cache enabled by default)
domain-scout serve --port 8080

# Health check
curl http://localhost:8080/health

# Run a scan
curl -X POST http://localhost:8080/scan \
  -H "Content-Type: application/json" \
  -d '{"entity": {"company_name": "Walmart", "seed_domain": ["walmart.com"]}}'

# Readiness check (probes crt.sh connectivity)
curl http://localhost:8080/ready
```
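
The same scan request can be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call above. The helper names `build_scan_request` and `run_scan` are illustrative and not part of domain-scout, the timeout is an arbitrary choice, and the response is returned as a plain decoded dict because its schema is not described here.

```python
import json
import urllib.request


def build_scan_request(company_name, seeds, base_url="http://localhost:8080"):
    """Build the POST /scan request from the curl example (hypothetical helper)."""
    payload = {"entity": {"company_name": company_name, "seed_domain": list(seeds)}}
    return urllib.request.Request(
        f"{base_url}/scan",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_scan(company_name, seeds, base_url="http://localhost:8080", timeout=120):
    """Send the request and return the decoded JSON body.

    The 120 s timeout is an assumption; adjust it to how long scans take for you.
    """
    request = build_scan_request(company_name, seeds, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a server started via `domain-scout serve --port 8080`, calling `run_scan("Walmart", ["walmart.com"])` should return the same body the `curl` command prints.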

### Docker

```bash
# Build
docker build -t domain-scout-ct .

# Run API server
docker run -p 8080:8080 domain-scout-ct

# Run CLI scan
docker run domain-scout-ct scout --name "Walmart" --seed walmart.com

# Persist cache across runs
docker run -p 8080:8080 -v scout-cache:/data/cache domain-scout-ct
```

### Cache

```bash
# Enable cache for CLI scans
domain-scout scout --name "Walmart" --seed walmart.com --cache

# View cache statistics
domain-scout cache stats

# Clear cache
domain-scout cache clear
```

### Library

```python
from domain_scout import Scout

result = Scout().discover(
    company_name="Palo Alto Networks",
    location="Santa Clara, CA",
    seed_domain=["paloaltonetworks.com"],
)

for domain in result.domains:
    print(f"{domain.domain:40s}  {domain.confidence:.2f}  {domain.sources}")
```
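
Results are often filtered before they are used. A minimal sketch, assuming only the `domain` and `confidence` attributes used in the loop above; `filter_by_confidence` and the 0.8 threshold are illustrative and not part of the library.

```python
def filter_by_confidence(domains, threshold=0.8):
    """Return domains scoring at least `threshold`, highest confidence first.

    Works on any objects exposing `.domain` and `.confidence`, such as the
    items in `result.domains`. Hypothetical helper with an arbitrary threshold.
    """
    kept = [d for d in domains if d.confidence >= threshold]
    # Sort by descending confidence, then by name for a stable order.
    return sorted(kept, key=lambda d: (-d.confidence, d.domain))
```

For example, `filter_by_confidence(result.domains, threshold=0.9)` keeps only the strongest matches for manual review.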

### Async

```python
import asyncio
from domain_scout import Scout, EntityInput

async def main():
    scout = Scout()
    result = await scout.discover_async(EntityInput(
        company_name="Palo Alto Networks",
        seed_domain=["paloaltonetworks.com"],
    ))
    return result

result = asyncio.run(main())
```
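
The async entry point makes it natural to scan several entities in one event loop. The sketch below is a generic helper, not part of domain-scout: `scan_many` and its `discover` parameter are hypothetical names, and it assumes `discover` is any coroutine function, such as the `discover_async` method above. Failures are returned alongside results so that one failed lookup does not abort the batch.

```python
import asyncio


async def scan_many(discover, entities):
    """Run `discover(entity)` for every entity concurrently.

    Returns a list of (entity, result_or_exception) pairs in input order.
    """
    entities = list(entities)  # allow generators; we iterate twice
    outcomes = await asyncio.gather(
        *(discover(entity) for entity in entities),
        return_exceptions=True,  # keep going when one scan raises
    )
    return list(zip(entities, outcomes))
```

With the package installed, this could be called as `asyncio.run(scan_many(Scout().discover_async, entities))` with a list of `EntityInput` objects. Whether a single `Scout` instance is meant to be shared across concurrent scans is not stated here, so verify that before relying on it.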

## How it works

1. **Seed validation**: DNS-resolves the seed domain, then checks the RDAP registrant org and CT cert org names against the company name
2. **CT org search**: Queries the crt.sh Postgres database for certificates whose Subject Organization matches the company name
3. **Seed expansion**: Collects all SANs on certs covering the seed domain, revealing related domains (e.g., acquired companies)
4. **Domain guessing**: Generates candidates from the company name plus common TLDs, resolves them, and verifies them via CT
5. **Cross-seed verification**: With multiple seeds, domains found independently by 2+ seeds get a confidence boost
6. **RDAP corroboration**: Queries the RDAP registrant org for the top discovered domains to confirm that ownership matches the target company
7. **Confidence scoring**: A corroboration-level model scores each domain from 0 to 1 based on the combined evidence (CT org match, SAN co-occurrence, DNS resolution, RDAP registrant match, cross-seed verification, and shared infrastructure)
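
Steps 4 and 5 can be illustrated with a short, self-contained sketch. This is not the library's implementation: `guess_domains`, `cross_seed_hits`, the TLD list, and the legal-suffix list are all assumptions chosen for illustration, and real candidates would still need the DNS and CT checks described above.

```python
import re
from collections import defaultdict

# Illustrative values only; the real TLD list and suffix handling may differ.
COMMON_TLDS = ("com", "net", "org", "io")
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co", "company"}


def guess_domains(company_name, tlds=COMMON_TLDS):
    """Step 4 sketch: build candidate domains from a company name."""
    words = [
        w for w in re.findall(r"[a-z0-9]+", company_name.lower())
        if w not in LEGAL_SUFFIXES
    ]
    if not words:
        return []
    # Try the joined name, a hyphenated form, and the first word alone,
    # dropping duplicates while keeping order.
    labels = dict.fromkeys(["".join(words), "-".join(words), words[0]])
    return [f"{label}.{tld}" for label in labels for tld in tlds]


def cross_seed_hits(domains_by_seed, min_seeds=2):
    """Step 5 sketch: domains reached independently from `min_seeds` or more seeds."""
    seeds_for = defaultdict(set)
    for seed, domains in domains_by_seed.items():
        for domain in domains:
            seeds_for[domain].add(seed)
    return {d: sorted(s) for d, s in seeds_for.items() if len(s) >= min_seeds}
```

Here `guess_domains("Palo Alto Networks")` starts with `paloaltonetworks.com`, and `cross_seed_hits` maps each shared domain to the seeds that led to it, which is the signal a confidence boost would be based on.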

### Data sources

| Source | Method | Rate limiting |
|--------|--------|-------------|
| crt.sh | Postgres (primary), JSON API (fallback) | 5 concurrent queries, 1s burst delay |
| RDAP | rdap.org universal bootstrap | Per-request |
| DNS | dnspython (8.8.8.8, 1.1.1.1) | 5 concurrent |
| Shodan GeoDNS | geonet.shodan.io (deep mode) | 3 concurrent, 0.5s delay |
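
The limits in the table combine a concurrency cap with a delay. One common way to get that behaviour with `asyncio` is a semaphore plus a pause before each slot is released. The sketch below shows that pattern and is not domain-scout's actual implementation; `ThrottledRunner` and its parameters are hypothetical names, with defaults borrowed from the crt.sh row.

```python
import asyncio


class ThrottledRunner:
    """Cap in-flight calls and pause after each one before freeing its slot."""

    def __init__(self, max_concurrent=5, delay=1.0):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._delay = delay

    async def run(self, func, *args):
        """Await `func(*args)` while holding one of the limited slots."""
        async with self._semaphore:
            result = await func(*args)
            # Hold the slot briefly so consecutive calls are spaced out.
            await asyncio.sleep(self._delay)
            return result
```

Wrapping each outbound query in `runner.run(...)` and gathering the calls keeps at most `max_concurrent` requests in flight, regardless of how many are queued.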

## Development

```bash
make install      # uv sync --all-groups
make test         # unit tests (mocked external calls)
make lint         # ruff + mypy
make format       # ruff --fix + ruff format
make check        # format + lint + test
```

Integration tests query the live crt.sh service:

```bash
make test-integration
```

## License

MIT
