Metadata-Version: 2.4
Name: sonata-asr
Version: 0.0.1
Summary: SONATA: SOund and Narrative Advanced Transcription Assistant
Home-page: https://github.com/hwk06023/SONATA
Author: hwk06023
Author-email: hwk06023@github.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: torch>=1.10.0
Requires-Dist: torchaudio>=0.10.0
Requires-Dist: transformers>=4.25.1
Requires-Dist: whisperx>=3.1.0
Requires-Dist: librosa>=0.9.0
Requires-Dist: pydub>=0.25.1
Requires-Dist: scipy>=1.7.0
Requires-Dist: soundfile>=0.10.3
Requires-Dist: tqdm>=4.62.0
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: huggingface_hub>=0.12.0
Requires-Dist: hf_xet>=0.1.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# SONATA

**SOund and Narrative Advanced Transcription Assistant**

SONATA is an Automatic Speech Recognition (ASR) system that transcribes both spoken words and emotive, non-verbal sounds such as laughter and sighs.

## Features

- High-accuracy speech-to-text transcription
- Recognition of emotive sounds and non-verbal cues
- Support for tags like `<laugh>`, `<sigh>`, `<yawn>`, `<surprise>`, `<inhale>`, `<groan>`, `<cough>`, `<sneeze>`, `<sniffle>`
- Open-source and extensible architecture
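
As a rough illustration of working with tagged output, the sketch below splits a transcript string into plain text and a list of tags. It assumes the tags appear inline in the text exactly as written in the list above. The `split_tags` helper and the sample string are hypothetical and not part of SONATA's API.

```python
import re

# Tag names taken from the feature list above.
KNOWN_TAGS = {
    "laugh", "sigh", "yawn", "surprise", "inhale",
    "groan", "cough", "sneeze", "sniffle",
}

TAG_PATTERN = re.compile(r"<(\w+)>")


def split_tags(tagged_text):
    """Return (plain_text, tags) for a transcript with inline tags.

    Hypothetical helper: assumes tags look like <laugh> and sit inline.
    """
    tags = [t for t in TAG_PATTERN.findall(tagged_text) if t in KNOWN_TAGS]
    # Drop only recognised tags, then collapse the leftover whitespace.
    plain = TAG_PATTERN.sub(
        lambda m: "" if m.group(1) in KNOWN_TAGS else m.group(0), tagged_text
    )
    plain = " ".join(plain.split())
    return plain, tags


# Illustrative input, not real SONATA output.
sample = "That was great <laugh> but I am so tired <yawn>"
plain_text, tags = split_tags(sample)
print(plain_text)
print(tags)
```

Unrecognised angle-bracket tokens are left in the text, so a transcript that happens to contain other markup is not silently altered.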

## Installation

Install the package from PyPI:

```bash
pip install sonata-asr
```

Or install from source:

```bash
git clone https://github.com/hwk06023/SONATA.git
cd SONATA
pip install -e .
```

## Usage Examples

### Basic Transcription

```python
from sonata import Transcriber

# Initialize the transcriber
transcriber = Transcriber()

# Transcribe an audio file
result = transcriber.transcribe("path/to/audio.wav")
print(result)
```

### Detecting Emotive Sounds

```python
from sonata.core import EmotiveDetector

# Initialize the emotive detector
detector = EmotiveDetector(threshold=0.6)

# Detect emotive events in an audio file
events = detector.detect_events("path/to/audio.wav")

# Print the detected events
for event in events:
    print(f"{event.type}: {event.start_time:.2f}s - {event.end_time:.2f}s (confidence: {event.confidence:.2f})")
```
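
The `threshold` argument above presumably sets how confident the detector must be before it reports an event. The sketch below shows that idea in isolation, assuming events carry the attributes printed in the loop above. The `Event` dataclass and `filter_events` function are hypothetical stand-ins rather than SONATA classes, and the `>=` comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in mirroring the attributes used above."""
    type: str
    start_time: float
    end_time: float
    confidence: float


def filter_events(events, threshold=0.6):
    """Keep events whose confidence is at least `threshold` (assumed rule)."""
    return [e for e in events if e.confidence >= threshold]


# Illustrative values only, not real detector output.
events = [
    Event("laugh", 1.20, 2.05, 0.91),
    Event("cough", 3.40, 3.80, 0.42),
    Event("sigh", 5.00, 5.90, 0.60),
]

kept = filter_events(events, threshold=0.6)
for event in kept:
    print(f"{event.type}: {event.start_time:.2f}s - {event.end_time:.2f}s")
```

Raising the threshold trades recall for precision: fewer spurious events are reported, but quieter or ambiguous sounds are more likely to be missed.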

### Full Pipeline

```python
from sonata import Sonata

# Initialize SONATA with default settings
sonata = Sonata()

# Process audio file - transcribes speech and detects emotive sounds
result = sonata.process("path/to/audio.wav")

# Print the text with emotive tags
print(result.text_with_tags)

# Save the result
sonata.save_output(result, "output.json")
```

## Command Line Interface

SONATA also provides a CLI for quick transcription:

```bash
# Basic usage
sonata-asr path/to/audio.wav

# Save output to specific file
sonata-asr path/to/audio.wav --output result.json

# Set threshold for emotive detection
sonata-asr path/to/audio.wav --threshold 0.7
```
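
To run the CLI over several files, one option is to build the argument lists in Python. The sketch below only constructs the commands from the flags shown above. It assumes `--output` and `--threshold` can be combined in one invocation, which the examples above do not confirm. `build_command` and the file names are hypothetical.

```python
import shlex
from pathlib import Path


def build_command(audio_path, threshold=None):
    """Build a sonata-asr argument list using the flags shown above."""
    audio = Path(audio_path)
    # Write the result next to the audio file, with a .json extension.
    cmd = ["sonata-asr", str(audio), "--output", str(audio.with_suffix(".json"))]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    return cmd


# Illustrative file names; replace with real recordings.
for name in ["interview.wav", "meeting notes.wav"]:
    print(shlex.join(build_command(name, threshold=0.7)))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the package is installed. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.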

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the GNU General Public License v3.0; see the [LICENSE](LICENSE) file for details. Under this license, derivative works that are distributed must also be released under the GPLv3, with their source code made available.
