Metadata-Version: 2.4
Name: pyautoflip
Version: 0.2.1
Summary: Intelligent video reframing library that automatically crops videos to different aspect ratios while preserving important content
Project-URL: Homepage, https://github.com/AhmedHisham1/pyautoflip
Project-URL: Repository, https://github.com/AhmedHisham1/pyautoflip
Project-URL: Documentation, https://github.com/AhmedHisham1/pyautoflip#readme
Author-email: Ahmed Hisham <a7med.hish@gmail.com>
License: MIT
License-File: LICENSE
Keywords: aspect-ratio,autoflip,computer-vision,cropping,face-detection,object-detection,reframing,video
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Video
Classifier: Topic :: Multimedia :: Video :: Conversion
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Processing
Requires-Python: <3.14,>=3.10
Requires-Dist: insightface>=0.7.3
Requires-Dist: matplotlib>=3.10.7
Requires-Dist: mediapipe>=0.10.21
Requires-Dist: numpy>=1.24.0
Requires-Dist: onnxruntime>=1.21.0
Requires-Dist: opencv-python>=4.11.0.86
Requires-Dist: pillow>=10.0.0
Requires-Dist: scenedetect>=0.6.6
Requires-Dist: scikit-image>=0.25.2
Requires-Dist: scikit-learn>=1.6.1
Requires-Dist: scipy>=1.15.3
Requires-Dist: tensorboard>=2.20.0
Requires-Dist: tensorboardx>=2.6.4
Requires-Dist: torch>=2.11.0
Requires-Dist: torchvision>=0.26.0
Requires-Dist: tqdm>=4.65.0
Description-Content-Type: text/markdown

# PyAutoFlip

A Python library for saliency-aware video cropping that automatically reframes videos to different aspect ratios while preserving important content.

**Note**: This is a Python implementation inspired by [MediaPipe's AutoFlip](https://github.com/google-ai-edge/mediapipe/blob/master/docs/solutions/autoflip.md). The original MediaPipe AutoFlip solution is no longer actively supported, so this project provides a maintained alternative using similar techniques.

## What it does

PyAutoFlip analyzes videos to identify salient content (faces, objects, motion) and intelligently crops frames to fit target aspect ratios. This is useful for adapting content between different platforms (e.g., landscape videos for portrait social media formats).

### Examples

Landscape to 9:16 — the saliency method automatically tracks the most important region:

![Example 1](assets/example1.jpg)

When two people are too far apart for a single crop, it automatically creates a split-screen layout:

![Split-screen example 1](assets/example_split1.jpg)

![Split-screen example 2](assets/example_split2.jpg)

Handles multi-person scenes by focusing on the salient subject:

![Example 2](assets/example2.jpg)

## Installation

```bash
# From PyPI
pip install pyautoflip
```

## Quick Start

### Command Line

```bash
# Convert a landscape video to portrait (9:16)
pyautoflip reframe -i input.mp4 -o output.mp4

# Use saliency-based detection (UNISAL + face detection)
pyautoflip reframe -i input.mp4 -o output.mp4 --method saliency

# Convert to square format
pyautoflip reframe -i input.mp4 -o output.mp4 --aspect-ratio 1:1

# Enable debug visualizations
pyautoflip reframe -i input.mp4 -o output.mp4 --debug
```

### Python API

```python
from pyautoflip import reframe_video

# Basic usage (detection method - face/object detection)
reframe_video(
    input_path="input.mp4",
    output_path="output.mp4",
    target_aspect_ratio="9:16"
)

# Saliency-based method (better for complex content)
reframe_video(
    input_path="input.mp4",
    output_path="output.mp4",
    target_aspect_ratio="9:16",
    detection_method="saliency"
)

# With options
reframe_video(
    input_path="input.mp4",
    output_path="output.mp4",
    target_aspect_ratio="1:1",
    motion_threshold=0.3,        # Lower = more stable crops
    padding_method="blur",       # or "solid_color"
    detection_method="saliency",
    debug_mode=True
)
```

## Detection Methods

PyAutoFlip supports two detection methods for determining what to keep in frame:

### `detection` (default)

Uses InsightFace for face detection and MediaPipe for object detection. Fast and reliable for content with clear subjects (people, animals, text). Assigns priority weights to different object types (faces > people > animals > text).
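The priority-weighting idea can be sketched as follows. This is an illustrative example only; the `Detection` class and the specific weight values are hypothetical, not pyautoflip's internal API:

```python
from dataclasses import dataclass

# Hypothetical priority weights: faces outrank people, animals, then text.
PRIORITY = {"face": 1.0, "person": 0.7, "animal": 0.5, "text": 0.3}

@dataclass
class Detection:
    label: str         # e.g. "face", "person"
    confidence: float  # detector confidence in [0, 1]

def score(det: Detection) -> float:
    """Weight a detection by its type priority and detector confidence."""
    return PRIORITY.get(det.label, 0.1) * det.confidence

detections = [
    Detection("person", 0.9),
    Detection("face", 0.8),
]
best = max(detections, key=score)
print(best.label)  # "face" wins despite lower raw confidence
```

Under this weighting, a lower-confidence face still outranks a higher-confidence person, which is the behavior the priority ordering above is meant to produce.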

### `saliency`

Uses [UNISAL](https://github.com/rdroste/unisal) saliency maps combined with InsightFace face detection. Better for complex scenes where important content isn't just faces/objects. Features:

- **UNISAL saliency**: A model trained to predict human visual attention, run via ONNX Runtime for fast CPU inference
- **Face-aware**: Combines saliency with face detection and uses a size filter to reject false positives such as faces in portraits or posters
- **Adaptive crop width**: Uses narrow (exact AR) or wide (+30% with blur padding) crop per scene based on saliency spread
- **Split-screen**: Automatically detects when two faces are too far apart for one crop (e.g., podcast wide shots) and renders a 2-panel split layout
- **Temporal stabilization**: Per-scene camera motion classification (stationary/panning/tracking) with trajectory smoothing
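To illustrate the "crop window centered on the saliency center of mass" idea, here is a minimal NumPy sketch. This is not pyautoflip's internal code, just a self-contained demonstration of the technique:

```python
import numpy as np

def crop_window(saliency: np.ndarray, crop_w: int) -> tuple[int, int]:
    """Return (x0, x1) of a crop_w-wide window centered on the
    horizontal center of mass of the saliency map, clamped to the frame."""
    h, w = saliency.shape
    col_mass = saliency.sum(axis=0)                      # per-column saliency
    cx = float((col_mass * np.arange(w)).sum() / col_mass.sum())
    x0 = int(round(cx - crop_w / 2))
    x0 = max(0, min(x0, w - crop_w))                     # keep window inside frame
    return x0, x0 + crop_w

sal = np.zeros((90, 160), dtype=np.float32)
sal[:, 120:140] = 1.0          # salient region near the right edge
x0, x1 = crop_window(sal, 90)  # 90px-wide crop of a 160x90 frame
print(x0, x1)                  # window is pushed to, and clamped at, the right edge
```

With the salient region near the right edge, the window clamps against the frame boundary rather than running off it, which is the behavior you want when the subject sits close to an edge.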

## How it works

1. **Scene Detection**: Identifies scene boundaries using PySceneDetect
2. **Content Analysis**: Samples key frames per scene and runs detection
   - *Detection method*: InsightFace faces + MediaPipe objects with priority weights
   - *Saliency method*: UNISAL saliency maps + InsightFace faces (size-filtered) on downscaled frames
3. **Crop Computation**: Determines optimal crop regions per frame
   - Fixed-width crop windows centered on the saliency center of mass
   - Per-scene crop width decision (narrow vs wide with padding)
4. **Temporal Smoothing**: Camera motion classification (STATIONARY/PANNING/TRACKING) with appropriate stabilization per scene
5. **Output**: Applies crops with blur/solid padding and recombines with original audio
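The temporal-smoothing step (4) can be illustrated with an exponential moving average of per-frame crop centers. This is a simplified sketch, not pyautoflip's exact stabilization algorithm; a small `alpha` yields stable crops, a larger one tracks motion more aggressively:

```python
def smooth_trajectory(centers: list[float], alpha: float = 0.2) -> list[float]:
    """Exponentially smooth a sequence of per-frame crop centers."""
    smoothed = [centers[0]]
    for c in centers[1:]:
        smoothed.append(alpha * c + (1 - alpha) * smoothed[-1])
    return smoothed

raw = [100, 100, 160, 100, 100]   # one-frame detection jitter of 60px
stable = smooth_trajectory(raw)
print(stable)                     # the 60px spike is damped to a ~12px bump
```

A single-frame spike in the raw trajectory is damped rather than passed through, which avoids the visible jerk a naive per-frame crop would produce.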

## Options

| Option | Description | Default |
|---|---|---|
| `--aspect-ratio` | Target aspect ratio (e.g., "9:16", "1:1", "4:3") | `9:16` |
| `--method` | Detection method: `detection` or `saliency` | `detection` |
| `--motion-threshold` | Camera motion sensitivity (0.0 = stable, 1.0 = allow motion) | `0.5` |
| `--padding-method` | Padding style: `blur` or `solid_color` | `blur` |
| `--debug` | Enable debug mode with visualizations and logging | off |

## Requirements

- Python 3.10+
- FFmpeg (for video processing)

### System dependencies

**Ubuntu/Debian:**
```bash
sudo apt-get install ffmpeg libgl1-mesa-glx libglib2.0-0
```

**macOS:**
```bash
brew install ffmpeg
```

## Development

```bash
git clone https://github.com/AhmedHisham1/pyautoflip.git
cd pyautoflip
uv sync
```

## License

MIT License - see LICENSE file for details.

## Acknowledgments

- [MediaPipe AutoFlip](https://github.com/google-ai-edge/mediapipe/blob/master/docs/solutions/autoflip.md) for the original concept and methodology
- [UNISAL](https://github.com/rdroste/unisal) for the saliency detection model
- [InsightFace](https://github.com/deepinsight/insightface) for face detection
- [MediaPipe](https://github.com/google/mediapipe) for object detection
- [PySceneDetect](https://github.com/Breakthrough/PySceneDetect) for scene analysis
- [ONNX Runtime](https://onnxruntime.ai/) for fast CPU inference
