Metadata-Version: 2.4
Name: f2a
Version: 1.0.3
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Visualization
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Dist: pandas>=2.0
Requires-Dist: numpy>=1.24
Requires-Dist: matplotlib>=3.7
Requires-Dist: seaborn>=0.13
Requires-Dist: scipy>=1.11
Requires-Dist: pyarrow>=12.0
Requires-Dist: rich>=13.0
Requires-Dist: jinja2>=3.1
Requires-Dist: scikit-learn>=1.3 ; extra == 'advanced'
Requires-Dist: networkx>=3.0 ; extra == 'advanced'
Requires-Dist: umap-learn>=0.5 ; extra == 'advanced'
Requires-Dist: statsmodels>=0.14 ; extra == 'advanced'
Requires-Dist: pytest>=7.0 ; extra == 'dev'
Requires-Dist: pytest-benchmark ; extra == 'dev'
Requires-Dist: ruff ; extra == 'dev'
Requires-Dist: black ; extra == 'dev'
Requires-Dist: isort ; extra == 'dev'
Requires-Dist: mypy ; extra == 'dev'
Requires-Dist: maturin>=1.7 ; extra == 'dev'
Requires-Dist: openpyxl>=3.1 ; extra == 'io'
Requires-Dist: pyreadstat>=1.2 ; extra == 'io'
Requires-Dist: tables>=3.8 ; extra == 'io'
Requires-Dist: odfpy>=1.4 ; extra == 'io'
Requires-Dist: lxml>=4.9 ; extra == 'io'
Requires-Dist: duckdb>=0.9 ; extra == 'io'
Requires-Dist: datasets>=2.14 ; extra == 'io'
Provides-Extra: advanced
Provides-Extra: dev
Provides-Extra: io
License-File: LICENSE
Summary: File to Analysis -- Automatically perform statistical analysis from any data source (Rust-powered)
Keywords: statistics,visualization,data-analysis,eda,rust,performance
Author: CocoRoF
License: Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/CocoRoF/f2a#readme
Project-URL: Homepage, https://github.com/CocoRoF/f2a
Project-URL: Issues, https://github.com/CocoRoF/f2a/issues
Project-URL: Repository, https://github.com/CocoRoF/f2a

# f2a

> **File to Analysis** — Automatically perform statistical analysis from any data source.

`f2a` is a high-performance data analysis library that provides a simple
Python API while running all compute-heavy operations in native Rust via
[PyO3](https://pyo3.rs) and [maturin](https://www.maturin.rs).

## Architecture

```
┌─────────────────────────────────────────┐
│            Python API Layer             │
│  f2a.analyze()  /  AnalysisConfig       │
│  Report generation (Jinja2 HTML)        │
│  Visualization (matplotlib / seaborn)   │
└──────────────┬──────────────────────────┘
               │  PyO3 FFI
┌──────────────▼──────────────────────────┐
│          Rust Core  (_core)             │
│  Data loading (polars)                  │
│  Schema inference & preprocessing       │
│  21 statistical analysis modules        │
│  Parallel computation (rayon)           │
└─────────────────────────────────────────┘
```

### What runs in Rust

| Layer | Modules |
|---|---|
| **Core** | Loader (CSV/TSV/Parquet/JSON/JSONL), Schema inference, Preprocessor, Analyzer orchestration |
| **Basic Stats** | Descriptive, Correlation, Distribution, Missing, Outlier, Categorical, Duplicates, Quality, Feature Importance, PCA |
| **Advanced Stats** | Statistical Tests, Clustering, Anomaly Detection, Advanced Correlation, Advanced Distribution, Dimensionality Reduction, Feature Insights, Insight Engine, Column Role, Cross Analysis, ML Readiness |

### What stays in Python

| Layer | Reason |
|---|---|
| **Visualization** | Built on matplotlib/seaborn; no Rust equivalent justifies the porting effort |
| **HTML Report** | Jinja2 is a Python templating library |
| **i18n** | String-heavy, low compute |

## Quick Start

```python
import f2a

report = f2a.analyze("data.csv")
report.show()               # Rich console summary
report.to_html("output/")   # Self-contained HTML report
report.get("quality")       # Dict access to any section
```
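
To try the calls above without a dataset at hand, a small CSV can be generated with the standard library first. The following is a minimal sketch: the column names, the values, and the helpers `write_sample_csv` and `run_quick_start` are hypothetical and only for illustration. The only `f2a` calls used are the ones shown in the Quick Start.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data; column names are illustrative only.
SAMPLE_ROWS = [
    {"id": 1, "group": "a", "value": 3.2},
    {"id": 2, "group": "a", "value": 4.1},
    {"id": 3, "group": "b", "value": 2.7},
    {"id": 4, "group": "b", "value": 5.9},
    {"id": 5, "group": "c", "value": 4.4},
]


def write_sample_csv(path):
    """Write SAMPLE_ROWS to `path` as CSV and return the number of data rows."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["id", "group", "value"])
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return len(SAMPLE_ROWS)


def run_quick_start(path):
    """Run the Quick Start calls on `path`; requires f2a to be installed."""
    import f2a  # imported lazily so the CSV helper works without f2a

    report = f2a.analyze(str(path))
    report.show()
    return report


# Create the sample file in a temporary directory.
sample_path = Path(tempfile.mkdtemp()) / "data.csv"
n_rows = write_sample_csv(sample_path)
print(f"wrote {n_rows} rows to {sample_path}")
```

With `f2a` installed, `run_quick_start(sample_path)` prints the console summary for the generated file.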

## Installation

```bash
pip install f2a
```

### Building from Source

```bash
# Prerequisites: Rust toolchain, Python >=3.10
pip install maturin

# Development build (editable)
maturin develop --release

# Build wheel
maturin build --release
```
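
After a source build, it can be useful to confirm that the compiled extension is importable. The sketch below assumes the Rust core is exposed as the submodule `f2a._core`, a name inferred from the architecture diagram rather than confirmed by the package; the helper `rust_core_available` is hypothetical.

```python
import importlib.util


def rust_core_available(package="f2a", ext="_core"):
    """Return True if `<package>.<ext>` can be located on the import path.

    Assumption: the compiled Rust extension lives at `f2a._core`.
    """
    # A top-level lookup does not import the package.
    if importlib.util.find_spec(package) is None:
        return False
    try:
        # Looking up a submodule imports the parent package first,
        # so a broken build can surface here as an ImportError.
        return importlib.util.find_spec(f"{package}.{ext}") is not None
    except ImportError:
        return False


print("Rust core found:", rust_core_available())
```

If this prints `False` right after `maturin develop --release`, the build most likely targeted a different Python environment than the one running the check.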

## Supported Formats

CSV, TSV, JSON, JSONL, and Parquet are supported out of the box. The optional `io` extra adds dependencies for Excel, SPSS, SAS, HDF5, ODF, and more:

```bash
# Quote the requirement so shells such as zsh do not expand the brackets
pip install "f2a[io]"
```
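
To see which of the optional I/O dependencies are present in the current environment, the installed distributions can be queried with the standard library. This is a small sketch: the distribution names are the ones listed under the `io` extra in the package metadata, and the helper `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names declared under the `io` extra in the package metadata.
IO_EXTRA_DISTS = [
    "openpyxl",
    "pyreadstat",
    "tables",
    "odfpy",
    "lxml",
    "duckdb",
    "datasets",
]


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in dist_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


io_status = installed_versions(IO_EXTRA_DISTS)
for name, ver in io_status.items():
    print(f"{name:12s} {ver or 'not installed'}")
```

This only reports whether each distribution is installed; it does not check the minimum versions that the extra requires.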

## License

Apache-2.0

