Metadata-Version: 2.4
Name: s3bolt
Version: 0.2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
License-File: LICENSE
Summary: High-performance S3 file copy tool — concurrent, async, built in Rust
Keywords: s3,aws,copy,async,performance
Author: cykruss
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/cykruss/s3bolt
Project-URL: Issues, https://github.com/cykruss/s3bolt/issues
Project-URL: Repository, https://github.com/cykruss/s3bolt

# s3bolt

**High-performance S3 file copy tool — concurrent, async, built in Rust.**

[![CI](https://github.com/cykruss/s3bolt/actions/workflows/CI.yml/badge.svg)](https://github.com/cykruss/s3bolt/actions/workflows/CI.yml)
[![Crates.io](https://img.shields.io/crates/v/s3bolt.svg)](https://crates.io/crates/s3bolt)
[![PyPI](https://img.shields.io/pypi/v/s3bolt.svg)](https://pypi.org/project/s3bolt/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

Copy S3 objects between buckets and prefixes at maximum throughput. Uses server-side copy, adaptive concurrency, and async I/O. Available as a **Rust crate**, a **CLI tool**, and a **Python package**.

## Why s3bolt?

| | `aws s3 cp` | s3bolt |
|---|---|---|
| Concurrency | ~10 concurrent transfers | Up to 1024 with adaptive AIMD |
| Large files (>5 GiB) | Multipart upload (downloads data) | Multipart **server-side copy** (zero data transfer) |
| Throttle handling | Fixed retry | Adaptive backoff (halves concurrency on 503) |
| Resume | No | Checkpoint file for resumable copies |
| Filtering | Basic `--exclude` | Glob, regex, size range, date range |
| Cross-account | Single profile | Separate source/dest profiles |

## Installation

### CLI (Rust)

```bash
cargo install s3bolt
```

### Python

```bash
pip install s3bolt
```

### From source

```bash
git clone https://github.com/cykruss/s3bolt.git
cd s3bolt
cargo build --release
# Binary at target/release/s3bolt
```

## Quick start

### CLI

```bash
# Copy a single file
s3bolt s3://src-bucket/data/file.parquet s3://dst-bucket/data/file.parquet

# Recursive prefix copy
s3bolt -r s3://src-bucket/data/2024/ s3://dst-bucket/archive/2024/

# Sync (only copy new/changed objects)
s3bolt -r --sync s3://data-lake/raw/ s3://data-lake/curated/

# Filter by pattern
s3bolt -r --include "**/*.parquet" --exclude "_tmp/**" s3://src/ s3://dst/

# Cross-account with different AWS profiles
s3bolt -r --source-profile prod --dest-profile analytics s3://prod/ s3://analytics/

# High concurrency
s3bolt -r -j 512 s3://src/ s3://dst/

# Dry run (see what would be copied)
s3bolt -r --dry-run s3://src/ s3://dst/

# Resumable copy with checkpoint
s3bolt -r --checkpoint /tmp/copy.ckpt s3://src/ s3://dst/
# If interrupted, resume with:
s3bolt -r --checkpoint /tmp/copy.ckpt --resume s3://src/ s3://dst/
```

### Python

```python
from s3bolt import S3CopyEngine

engine = S3CopyEngine(source_profile="prod", dest_profile="analytics")

result = engine.copy(
    "s3://src-bucket/data/",
    "s3://dst-bucket/data/",
    recursive=True,
    include=["**/*.parquet"],
    max_concurrent=512,
)

print(f"Copied {result['copied_objects']} objects ({result['copied_bytes']} bytes)")
print(f"Skipped {result['skipped_objects']} | Failed {result['failed_objects']}")
print(f"Duration: {result['duration_secs']:.1f}s")
```
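The result dict shown above is also enough to derive throughput. A small illustrative helper (only the `copied_bytes` and `duration_secs` fields come from the example above; the helper itself is not part of the s3bolt API):

```python
def throughput_mib_s(result: dict) -> float:
    """Average copy throughput in MiB/s from an engine result dict.

    Assumes the `copied_bytes` and `duration_secs` fields shown in the
    quick-start example; this helper is illustrative, not s3bolt API.
    """
    if result["duration_secs"] == 0:
        return 0.0
    return result["copied_bytes"] / (1024 * 1024) / result["duration_secs"]

# Made-up numbers: 8 GiB copied in 40 s
sample = {"copied_bytes": 8 * 1024**3, "duration_secs": 40.0}
print(f"{throughput_mib_s(sample):.1f} MiB/s")  # → 204.8 MiB/s
```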

### Rust

```rust
use std::sync::Arc;
use s3bolt::config::{CopyConfig, ConcurrencyConfig, FilterConfig};
use s3bolt::engine::orchestrator;
use s3bolt::progress::reporter::ProgressState;
use s3bolt::types::S3Uri;

#[tokio::main]
async fn main() -> s3bolt::error::Result<()> {
    let config = CopyConfig {
        source: S3Uri::parse("s3://src-bucket/prefix/")?,
        destination: S3Uri::parse("s3://dst-bucket/prefix/")?,
        recursive: true,
        sync_mode: false,
        dry_run: false,
        verify: false,
        filters: FilterConfig::default(),
        concurrency: ConcurrencyConfig::default(),
        checkpoint_path: None,
        resume: false,
        storage_class: None,
        sse: None,
        preserve_metadata: false,
        source_profile: None,
        dest_profile: None,
    };

    let progress = Arc::new(ProgressState::default());
    let manifest = orchestrator::run(config, progress).await?;
    println!("Copied {} objects", manifest.copied_objects);
    Ok(())
}
```

## Architecture

```
ListObjectsV2 (async paginator stream)
       │
       ▼
[bounded channel, cap=10,000]  ← backpressure: listing pauses when full
       │
       ▼
Filter stage (glob, regex, size, date)
       │
       ▼
[tokio::sync::Semaphore, permits=N]  ← adaptive concurrency (AIMD)
       │
       ▼
Worker tasks (spawn per object)
  ├── CopyObject (≤ 5 GiB) ─── server-side, zero data transfer
  └── UploadPartCopy (> 5 GiB) ─ parallel multipart, server-side
       │
       ▼
Progress reporter + checkpoint writer
```
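The same pipeline shape can be sketched in Python with `asyncio` (the real implementation uses Tokio; every name below is illustrative, not the s3bolt API). A bounded queue provides the backpressure between lister and workers, and a semaphore caps in-flight copies:

```python
import asyncio

async def pipeline(keys, copy_one, max_concurrent=4, queue_cap=10):
    """Bounded lister -> dispatcher -> worker pipeline, mirroring the diagram.

    `copy_one` stands in for the per-object copy call (CopyObject or
    UploadPartCopy in s3bolt). Illustrative sketch only.
    """
    queue: asyncio.Queue = asyncio.Queue(maxsize=queue_cap)  # backpressure
    sem = asyncio.Semaphore(max_concurrent)                  # concurrency cap
    copied = []

    async def lister():
        for key in keys:             # stands in for ListObjectsV2 pages
            await queue.put(key)     # blocks when the queue is full
        await queue.put(None)        # sentinel: listing finished

    async def worker(key):
        async with sem:              # at most max_concurrent in flight
            copied.append(await copy_one(key))

    async def dispatcher():
        tasks = []
        while (key := await queue.get()) is not None:
            tasks.append(asyncio.create_task(worker(key)))
        await asyncio.gather(*tasks)

    await asyncio.gather(lister(), dispatcher())
    return copied
```

Because `queue.put` blocks once `queue_cap` items are pending, the lister naturally pauses whenever workers fall behind, which is what keeps memory bounded regardless of how many objects the prefix contains.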

### Performance design

- **Tokio async runtime** — tens of thousands of concurrent I/O tasks on a small thread pool. No OS thread overhead.
- **Server-side copy** — `CopyObject` and `UploadPartCopy` move data within S3's network. The client only sends metadata requests (~50-200ms latency each).
- **Adaptive concurrency (AIMD)** — starts at the configured limit (default 256), ramps up on sustained success, and halves on S3 503 SlowDown responses. This automatically adapts to S3's per-prefix rate limits (3,500 write and 5,500 read requests per second).
- **Bounded backpressure** — a 10,000-item channel between the lister and copy workers. If workers fall behind, listing pauses. Memory stays bounded (~2-3 MiB for the queue).
- **Multipart for large files** — objects > 5 GiB are split into 256 MiB parts, each copied server-side in parallel. Automatic cleanup (abort) on failure.
- **Zero unnecessary data transfer** — data never flows through the client for same-region copies. Pure metadata orchestration.
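The AIMD loop described above can be sketched as a small controller. This is a hedged illustration: the +1 step, the success window, and the halving-on-503 rule follow the description here, but the actual s3bolt tuning constants and internals may differ.

```python
class AimdConcurrency:
    """Additive-increase / multiplicative-decrease concurrency controller.

    Illustrative only: the window size and +1 / x0.5 steps are assumptions,
    not s3bolt's actual tuning.
    """

    def __init__(self, start=256, floor=1, ceiling=1024):
        self.limit = start
        self.floor = floor
        self.ceiling = ceiling
        self._successes = 0

    def on_success(self, window=100):
        # Additive increase: +1 permit per `window` consecutive successes.
        self._successes += 1
        if self._successes >= window:
            self._successes = 0
            self.limit = min(self.limit + 1, self.ceiling)

    def on_slow_down(self):
        # Multiplicative decrease: halve the limit on a 503 SlowDown.
        self._successes = 0
        self.limit = max(self.limit // 2, self.floor)
```

Starting at 256, two consecutive SlowDown responses would drop the limit to 128 and then 64; a subsequent run of successes ramps it back up one permit at a time.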

## CLI reference

```
s3bolt [OPTIONS] <SOURCE> <DESTINATION>

Arguments:
  <SOURCE>        Source S3 URI (s3://bucket/key or s3://bucket/prefix/)
  <DESTINATION>   Destination S3 URI

Copy options:
  -r, --recursive              Recursively copy all objects under prefix
      --sync                   Only copy new/changed objects
      --dry-run                List objects without copying
      --verify                 Verify ETag after copy
      --storage-class <CLASS>  Override storage class

Filtering:
      --include <GLOB>         Include keys matching glob (repeatable)
      --exclude <GLOB>         Exclude keys matching glob (repeatable)
      --key-regex <REGEX>      Include keys matching regex
      --min-size <BYTES>       Minimum object size
      --max-size <BYTES>       Maximum object size

Concurrency:
  -j, --concurrency <N>        Max concurrent copies [default: 256]
      --no-adaptive            Disable adaptive concurrency

Resume:
      --checkpoint <FILE>      Checkpoint file for resume support
      --resume                 Resume from existing checkpoint

Credentials:
      --source-profile <NAME>  AWS profile for source
      --dest-profile <NAME>    AWS profile for destination

Output:
  -v, --verbose                Debug logging
  -q, --quiet                  Errors only
```

## Prerequisites

- AWS credentials configured via any standard method (env vars, `~/.aws/credentials`, IAM role, SSO)
- The executing role must have `s3:GetObject` + `s3:ListBucket` on source and `s3:PutObject` on destination
- For multipart copies: `s3:AbortMultipartUpload` on destination

### Minimal IAM policy

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::source-bucket",
        "arn:aws:s3:::source-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
      "Resource": [
        "arn:aws:s3:::dest-bucket",
        "arn:aws:s3:::dest-bucket/*"
      ]
    }
  ]
}
```

## Development

```bash
# Clone
git clone https://github.com/cykruss/s3bolt.git
cd s3bolt

# Rust tests
cargo test --no-default-features

# Clippy
cargo clippy --no-default-features --lib -- -D warnings

# Build Python package (dev mode)
python -m venv .venv
source .venv/bin/activate
pip install maturin pytest
maturin develop --release
pytest tests/ -v
```

## Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

[MIT](LICENSE)

