{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NumPy and Pandas Transforms\n",
    "\n",
    "TimeDataModel provides clean patterns for transforming time series data using numpy and pandas. Every `TimeSeriesList` and `TimeSeriesTable` exposes `.arr` (numpy array) and `.df` (pandas DataFrame) properties, plus dedicated methods for writing results back. This keeps your domain model structured while letting you leverage the full scientific Python ecosystem."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "from datetime import datetime, timedelta, timezone\n\nimport numpy as np\n\nimport timedatamodel as tdm\n\nbase = datetime(2024, 1, 15, tzinfo=timezone.utc)\ntimestamps = [base + timedelta(hours=i) for i in range(24)]\n\nts = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H,\n    timestamps=timestamps,\n    values=[\n        120.0, 115.0, 108.0, 105.0, 102.0, 100.0,\n        110.0, 135.0, 160.0, 175.0, 180.0, 178.0,\n        172.0, 170.0, 168.0, 165.0, 175.0, 190.0,\n        200.0, 195.0, 180.0, 165.0, 145.0, 130.0,\n    ],\n    name=\"power\",\n    unit=\"MW\",\n    data_type=tdm.DataType.OBSERVATION,\n)"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The `.arr` and `.df` properties\n",
    "\n",
    "Every `TimeSeriesList` has two shorthand properties for quick access to the underlying data:\n",
    "- `ts.arr` — returns a numpy `ndarray` (same as `ts.to_numpy()`)\n",
    "- `ts.df` — returns a pandas `DataFrame` (same as `ts.to_pandas_dataframe()`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts.arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts.df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pattern 1: `apply_numpy(func)`\n",
    "\n",
    "Pass a function that receives a numpy array and returns a numpy array. Timestamps, frequency, and all metadata are preserved automatically. The output array must have the same length as the input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "normalized = ts.apply_numpy(lambda arr: (arr - arr.mean()) / arr.std())\n",
    "normalized"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cumulative = ts.apply_numpy(np.cumsum)\n",
    "cumulative.head(6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clipped = ts.apply_numpy(lambda arr: np.clip(arr, 110, 180))\n",
    "clipped"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pattern 2: `apply_pandas(func)`\n",
    "\n",
     "Pass a function that receives a pandas DataFrame and returns a pandas DataFrame of the same length. This lets you use the full pandas API — rolling windows, differencing, interpolation, and more."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rolling_mean = ts.apply_pandas(lambda df: df.rolling(6, min_periods=1).mean())\n",
    "rolling_mean"
   ]
  },
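  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A note on the `min_periods=1` argument used above: without it, the first `window - 1` rows of a rolling mean are `NaN`. A quick illustration with plain pandas (no `TimeSeriesList` needed):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import pandas as pd\n\ns = pd.Series([1.0, 2.0, 3.0, 4.0])\n# min_periods=1 averages over whatever data is available in the window,\n# so the first rows are partial means instead of NaN.\ns.rolling(3, min_periods=1).mean()"
  },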
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "diff = ts.apply_pandas(lambda df: df.diff())\n",
    "diff"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pct_change = ts.apply_pandas(lambda df: df.pct_change() * 100)\n",
    "pct_change.head(6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pattern 3: One-liner round-trips with `update_arr()` and `update_df()`\n",
    "\n",
    "Combine `.arr` / `.df` with `update_arr()` / `update_df()` to transform data in a single expression. The result is a new `TimeSeriesList` with all metadata preserved."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts.update_arr(ts.arr.clip(110, 180))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts.update_df(ts.df.resample(\"3h\").mean())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts.update_df(ts.df.diff())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts.update_arr(np.cumsum(ts.arr))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pattern 4: Manual numpy round-trip\n",
    "\n",
    "For transformations where the output shape differs from the input, export to numpy, transform freely, and construct a new `TimeSeriesList`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "arr = ts.to_numpy()\n",
    "print(f\"Type:  {type(arr)}\")\n",
    "print(f\"Shape: {arr.shape}\")\n",
    "print(f\"Mean:  {arr.mean():.1f} MW\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "window = 3\nsmoothed_arr = np.convolve(arr, np.ones(window) / window, mode=\"valid\")\nsmoothed_timestamps = timestamps[window - 1 :]\n\nts_smoothed = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H,\n    timestamps=smoothed_timestamps,\n    values=smoothed_arr.tolist(),\n    name=ts.name,\n    unit=ts.unit,\n    data_type=ts.data_type,\n)\nprint(f\"Original length: {len(ts)}, Smoothed length: {len(ts_smoothed)}\")\nts_smoothed.head(6)"
  },
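  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check on what `np.convolve(..., mode=\"valid\")` computes: each output element is the mean over one full window, so the result has `window - 1` fewer elements than the input — which is exactly why the timestamps were trimmed above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "demo = np.array([1.0, 2.0, 3.0, 4.0])\n# mode=\"valid\" keeps only positions where the window fully overlaps the data:\n# the mean of [1, 2, 3] and the mean of [2, 3, 4]\nnp.convolve(demo, np.ones(3) / 3, mode=\"valid\")"
  },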
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pattern 5: Manual pandas round-trip\n",
    "\n",
    "For multi-step pandas workflows where a one-liner would be hard to read, break it into separate steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = ts.to_pandas_dataframe()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_resampled = df.resample(\"3h\").mean()\n",
    "\n",
    "ts_resampled = ts.update_from_pandas(df_resampled)\n",
    "print(f\"Original:  {len(ts)} points\")\n",
    "print(f\"Resampled: {len(ts_resampled)} points\")\n",
    "print(f\"Unit preserved: {ts_resampled.unit}\")\n",
    "ts_resampled"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_ewm = df.ewm(span=6).mean()\n",
    "\n",
    "ts_ewm = ts.update_from_pandas(df_ewm)\n",
    "ts_ewm"
   ]
  },
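  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In `df.ewm(span=6)`, the span sets the exponential decay rate via `alpha = 2 / (span + 1)`. With pandas' default `adjust=True`, each smoothed value is a weighted mean of all observations so far; a two-point check with plain pandas:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import pandas as pd\n\nalpha = 2 / (6 + 1)\ns = pd.Series([1.0, 2.0])\n# With adjust=True, the second smoothed value is a weighted mean of both points,\n# with the older point down-weighted by (1 - alpha).\nexpected = (2.0 + (1 - alpha) * 1.0) / (1 + (1 - alpha))\ns.ewm(span=6).mean().iloc[1], expected"
  },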
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Transforms on TimeSeriesTable\n",
    "\n",
    "All patterns — `apply_*`, `update_arr()`, `update_df()`, `.arr`, `.df` — also work on `TimeSeriesTable`, applying across all columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "rng = np.random.default_rng(42)\n\ntable = tdm.TimeSeriesTable(\n    tdm.Frequency.PT1H,\n    timestamps=timestamps,\n    values=np.column_stack([\n        80 + 40 * np.sin(np.linspace(0, 2 * np.pi, 24)) + rng.normal(0, 5, 24),\n        np.clip(60 * np.sin(np.linspace(-0.5, np.pi + 0.5, 24)), 0, None),\n        50 + rng.normal(0, 3, 24),\n    ]),\n    names=[\"wind\", \"solar\", \"hydro\"],\n    units=[\"MW\", \"MW\", \"MW\"],\n)\ntable"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table_rolling = table.apply_pandas(lambda df: df.rolling(4, min_periods=1).mean())\n",
    "table_rolling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table_norm = table.apply_numpy(\n",
    "    lambda arr: (arr - arr.mean(axis=0)) / arr.std(axis=0)\n",
    ")\n",
    "table_norm.head(6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table.update_df(table.df.rolling(4, min_periods=1).mean())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table.update_arr(np.clip(table.arr, 40, 120))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "Five patterns for transforming time series data:\n",
    "\n",
     "| Pattern | Style | Best for |\n",
    "|---------|--------|----------|\n",
    "| `ts.apply_numpy(func)` | Functional | Same-length vectorized ops (normalize, cumsum) |\n",
    "| `ts.apply_pandas(func)` | Functional | Rolling windows, diff, pct_change |\n",
    "| `ts.update_arr(ts.arr.clip(...))` | One-liner | Quick numpy transforms via `.arr` |\n",
    "| `ts.update_df(ts.df.resample(...).mean())` | One-liner | Quick pandas transforms via `.df` |\n",
    "| Manual `to_numpy()` / `to_pandas_dataframe()` | Multi-step | Shape-changing ops, complex workflows |\n",
    "\n",
    "All patterns preserve metadata. Use `.arr` / `.df` for read access and `update_arr()` / `update_df()` to write results back.\n",
    "\n",
    "Next up: **nb_03** covers unit handling, validation, and rich metadata."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.14.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}