Metadata-Version: 2.4
Name: waste-predictor
Version: 4.0.0
Summary: Production-grade machine learning system for industrial waste prediction
Author: Research Project Team
License: MIT
Project-URL: Homepage, https://github.com/yourusername/waste-predictor
Project-URL: Documentation, https://github.com/yourusername/waste-predictor#readme
Project-URL: Repository, https://github.com/yourusername/waste-predictor
Project-URL: Issues, https://github.com/yourusername/waste-predictor/issues
Keywords: machine-learning,waste-prediction,industrial,ml,prediction,ensemble
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: xgboost>=2.0.0
Requires-Dist: lightgbm>=4.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Provides-Extra: api
Requires-Dist: flask>=3.0.0; extra == "api"
Requires-Dist: flask-cors>=4.0.0; extra == "api"
Dynamic: license-file

# Waste Prediction Module V4

A production-grade machine learning system for predicting industrial waste compositions based on production volume and environmental parameters.

## 📊 Model Performance

| Metric | Score |
|--------|-------|
| **R² Score** | **0.98** |
| **MAE** | 1,607 |
| **RMSE** | 3,092 |
| **CV R² (5-Fold)** | 0.974 ± 0.005 |

### Per-Target R² Scores

| Waste Type | R² |
|------------|-----|
| Total_Waste_kg | 0.96 |
| Solid_Waste_Limestone_kg | 0.98 |
| Solid_Waste_Gypsum_kg | 0.99 |
| Solid_Waste_Industrial_Salt_kg | 0.98 |
| Liquid_Waste_Bittern_Liters | 0.99 |
| Potential_Epsom_Salt_kg | 0.98 |
| Potential_Potash_kg | 0.99 |
| Potential_Magnesium_Oil_Liters | 0.98 |

## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Training

```bash
python train_v4.py
```

### Prediction

```python
from predict_v4 import WastePredictorV4

# Load model
predictor = WastePredictorV4.load('waste_predictor_v4.pkl')

# Make prediction
result = predictor.predict(
    production_volume=50000,
    rain_sum=200,
    temperature_mean=28,
    humidity_mean=85,
    wind_speed_mean=15,
    month=6
)

print(result)
# {'Total_Waste_kg': 125000.5, 'Solid_Waste_Limestone_kg': 12500.0, ...}
```

### Quick Prediction Function

```python
from predict_v4 import quick_predict

result = quick_predict(
    production_volume=50000,
    rain_sum=200,
    temperature_mean=28,
    humidity_mean=85,
    wind_speed_mean=15,
    month=6
)
```

### Batch Prediction

```python
import pandas as pd
from predict_v4 import WastePredictorV4

predictor = WastePredictorV4.load('waste_predictor_v4.pkl')

# Create input DataFrame
data = pd.DataFrame({
    'Month': [1, 6, 12],
    'production_volume': [30000, 70000, 50000],
    'rain_sum': [300, 0, 250],
    'temperature_mean': [26, 28, 26],
    'humidity_mean': [100, 80, 98],
    'wind_speed_mean': [20, 12, 18]
})

# Get predictions
results = predictor.predict_batch(data)
print(results)
```

## 📁 Project Structure

```
local-module/
├── data/
│   └── training/
│       └── training.csv          # Training dataset (312 samples)
├── train_v4.py                   # V4 training (PRODUCTION - R²=0.98)
├── predict_v4.py                 # V4 inference module
├── waste_predictor_v4.pkl        # Trained model checkpoint
├── waste_predictor_v4_metadata.json
├── train_v3.py                   # V3 training (Neural network)
├── model_v2.py                   # Enhanced neural network model
├── feature_engineering.py        # Feature engineering pipeline
├── requirements.txt              # Dependencies
└── README.md
```

## 🔧 Input Features

| Feature | Description | Range |
|---------|-------------|-------|
| `production_volume` | Production volume | 0 - 200,000 |
| `rain_sum` | Total rainfall (mm) | 0 - 1,000 |
| `temperature_mean` | Average temperature (°C) | 0 - 50 |
| `humidity_mean` | Average humidity (%) | 0 - 100 |
| `wind_speed_mean` | Average wind speed (km/h) | 0 - 50 |
| `month` | Month of year | 1 - 12 |
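The documented ranges above can double as a lightweight sanity check before calling the predictor. The `validate_inputs` helper below is illustrative, not part of the module:

```python
# Illustrative range checks mirroring the documented feature ranges.
RANGES = {
    "production_volume": (0, 200_000),
    "rain_sum": (0, 1_000),
    "temperature_mean": (0, 50),
    "humidity_mean": (0, 100),
    "wind_speed_mean": (0, 50),
    "month": (1, 12),
}

def validate_inputs(**features):
    """Raise ValueError for any feature outside its documented range."""
    for name, value in features.items():
        lo, hi = RANGES[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")

validate_inputs(production_volume=50000, rain_sum=200, temperature_mean=28,
                humidity_mean=85, wind_speed_mean=15, month=6)  # in range, no error
```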

## 📤 Output Predictions

| Output | Description |
|--------|-------------|
| `Total_Waste_kg` | Total waste produced (kg) |
| `Solid_Waste_Limestone_kg` | Limestone solid waste (kg) |
| `Solid_Waste_Gypsum_kg` | Gypsum solid waste (kg) |
| `Solid_Waste_Industrial_Salt_kg` | Industrial salt waste (kg) |
| `Liquid_Waste_Bittern_Liters` | Bittern liquid waste (L) |
| `Potential_Epsom_Salt_kg` | Potential Epsom salt byproduct (kg) |
| `Potential_Potash_kg` | Potential potash byproduct (kg) |
| `Potential_Magnesium_Oil_Liters` | Potential magnesium oil byproduct (L) |

## 🧠 Model Architecture (V4)

### Weighted Ensemble of 3 Model Types

1. **XGBoost Gradient Boosting** (weight ~33%)
   - 500 estimators, max_depth=6
   - Per-target models with log-transformed outputs

2. **Stacked Ensemble** (weight ~34%)
   - Level 0: XGBoost + LightGBM + Random Forest + GBR
   - Level 1: Ridge regression meta-learner
   - 5-fold stacking with passthrough

3. **Deep Neural Network** (weight ~33%)
   - Architecture: 256 → 512 → 256 → 128 with skip connections
   - GELU activation, BatchNorm, Dropout
   - Cosine annealing LR schedule
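
The weighted combination of the three sub-models can be sketched as a normalized weighted average of their per-target outputs. Model names, weights, and the numbers below are illustrative placeholders, not the trained artifacts:

```python
import numpy as np

def ensemble_predict(preds_by_model, weights):
    """Combine per-model prediction vectors with a normalized weighted average."""
    total = sum(weights.values())
    return sum(weights[m] * p for m, p in preds_by_model.items()) / total

# Hypothetical per-target outputs (Total_Waste_kg, Solid_Waste_Limestone_kg)
preds = {
    "xgboost": np.array([124000.0, 12400.0]),
    "stacked": np.array([126000.0, 12600.0]),
    "dnn":     np.array([125000.0, 12500.0]),
}
weights = {"xgboost": 0.33, "stacked": 0.34, "dnn": 0.33}
combined = ensemble_predict(preds, weights)
```

Normalizing by the weight sum keeps the combination valid even if the weights drift slightly from summing to 1.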

### Feature Engineering (30+ features)
- Log/sqrt/squared production transforms
- Cyclical month encoding (sin/cos)
- Weather condition indices (wet, dry, evaporation)
- Production × weather interactions
- Domain-driven ratio features
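
A few of these transforms, sketched out. The feature names here are assumptions for illustration; the actual pipeline lives in `feature_engineering.py`:

```python
import numpy as np

def engineer_features(production_volume, rain_sum, temperature_mean, month):
    """Illustrative subset of the V4 feature transforms."""
    return {
        # Production transforms
        "prod_log": np.log1p(production_volume),
        "prod_sqrt": np.sqrt(production_volume),
        "prod_squared": production_volume ** 2,
        # Cyclical month encoding: December (12) lands next to January (1)
        "month_sin": np.sin(2 * np.pi * month / 12),
        "month_cos": np.cos(2 * np.pi * month / 12),
        # Production x weather interactions
        "prod_x_rain": production_volume * rain_sum,
        "prod_x_temp": production_volume * temperature_mean,
    }

feats = engineer_features(50000, 200, 28, 6)
```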

## 📈 Performance Comparison

| Version | R² Score | MAE | Key Technique |
|---------|----------|-----|---------------|
| V1 (Original) | 0.47 | 7,873 | Simple MLP |
| V3 | 0.77 | 5,575 | Log transform + Feature eng |
| **V4 (Production)** | **0.98** | **1,607** | XGBoost + Stacked + DNN Ensemble |

## 🔄 Retraining

To retrain the model with new data:

1. Add new data to `data/training/training.csv`
2. Run training:
   ```bash
   python train_v4.py
   ```
3. The retrained model is saved to `waste_predictor_v4.pkl`

## 📝 License

MIT License
