{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Unit Handling and Validation\n",
    "\n",
    "TimeDataModel treats **units**, **data types**, and **validation** as first-class concerns.  \n",
    "This notebook covers:\n",
    "\n",
    "1. Setting and inspecting units on `TimeSeriesList` and `TimeSeriesTable`\n",
    "2. Converting between compatible units with `convert_unit()`\n",
    "3. Automatic unit conversion in arithmetic operations\n",
    "4. Resolving units to pint objects with `pint_unit`\n",
    "5. Validating timestamps and frequency with `validate()`\n",
    "6. Using `DataType`, `TimeSeriesType`, and custom `attributes`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "from datetime import datetime, timedelta, timezone\n\nimport numpy as np\n\nimport timedatamodel as tdm\n\nbase = datetime(2024, 1, 15, tzinfo=timezone.utc)\ntimestamps = [base + timedelta(hours=i) for i in range(24)]\nrng = np.random.default_rng(42)"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting units on a TimeSeriesList\n",
    "\n",
    "The `unit` parameter is a free-form string. It appears in the repr and is carried through all operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "wind = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H,\n    timezone=\"UTC\",\n    timestamps=timestamps,\n    values=(8 + rng.normal(0, 2, 24)).tolist(),\n    name=\"wind_speed\",\n    unit=\"m/s\",\n)\nwind"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"Unit: {wind.unit}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Converting units with `convert_unit()`\n",
    "\n",
    "`convert_unit()` uses [pint](https://pint.readthedocs.io/) under the hood to convert values.  \n",
    "It returns a **new** `TimeSeriesList` — the original is unchanged.\n",
    "\n",
    "```bash\n",
    "pip install timedatamodel[pint]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "wind_kmh = wind.convert_unit(\"km/h\")\n",
    "wind_knot = wind.convert_unit(\"knot\")\n",
    "\n",
    "print(f\"Original:  {wind.unit:5s}  mean={np.nanmean(wind.arr):.2f}\")\n",
    "print(f\"Converted: {wind_kmh.unit:5s}  mean={np.nanmean(wind_kmh.arr):.2f}\")\n",
    "print(f\"Converted: {wind_knot.unit:5s}  mean={np.nanmean(wind_knot.arr):.2f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "energy_kwh = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H,\n    timezone=\"UTC\",\n    timestamps=timestamps,\n    values=(500 + rng.normal(0, 50, 24)).tolist(),\n    name=\"energy\",\n    unit=\"kWh\",\n)\n\nenergy_mwh = energy_kwh.convert_unit(\"MWh\")\nenergy_j = energy_kwh.convert_unit(\"J\")\n\nprint(f\"kWh: mean={np.nanmean(energy_kwh.arr):.1f}\")\nprint(f\"MWh: mean={np.nanmean(energy_mwh.arr):.4f}\")\nprint(f\"J:   mean={np.nanmean(energy_j.arr):.0f}\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Incompatible units raise an error"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    wind.convert_unit(\"MW\")\n",
    "except ValueError as e:\n",
    "    print(f\"Error: {e}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "no_unit = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=timestamps,\n    values=rng.normal(0, 1, 24).tolist(),\n    name=\"dimensionless\",\n)\n\ntry:\n    no_unit.convert_unit(\"MW\")\nexcept ValueError as e:\n    print(f\"Error: {e}\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Automatic unit conversion in arithmetic\n",
    "\n",
    "When you add or subtract two `TimeSeriesList` with compatible units, values are automatically converted to the left operand's unit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "power_mw = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=timestamps,\n    values=(100 + rng.normal(0, 10, 24)).tolist(),\n    name=\"plant_a\",\n    unit=\"MW\",\n)\n\npower_kw = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=timestamps,\n    values=(50000 + rng.normal(0, 5000, 24)).tolist(),\n    name=\"plant_b\",\n    unit=\"kW\",\n)\n\ntotal = power_mw + power_kw\nprint(f\"Result unit: {total.unit}\")\nprint(f\"plant_a mean: {np.nanmean(power_mw.arr):.1f} MW\")\nprint(f\"plant_b mean: {np.nanmean(power_kw.arr):.1f} kW = {np.nanmean(power_kw.arr)/1000:.1f} MW\")\nprint(f\"total mean:   {np.nanmean(total.arr):.1f} MW\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mismatched unit *presence* (one has a unit, the other doesn't) raises an error:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    _ = power_mw + no_unit\n",
    "except ValueError as e:\n",
    "    print(f\"Error: {e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resolving units with `pint_unit`\n",
    "\n",
    "The `pint_unit` property returns a `pint.Unit` object for programmatic inspection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pu = power_mw.pint_unit\n",
    "print(f\"pint unit: {pu}\")\n",
    "print(f\"type:      {type(pu).__name__}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Units on TimeSeriesTable\n",
    "\n",
    "`TimeSeriesTable` supports per-column units via the `units` parameter.  \n",
    "`convert_unit()` can target a single column or all columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "table = tdm.TimeSeriesTable(\n    tdm.Frequency.PT1H,\n    timezone=\"UTC\",\n    timestamps=timestamps,\n    values=np.column_stack([\n        100 + rng.normal(0, 15, 24),\n        8 + rng.normal(0, 2, 24),\n    ]),\n    names=[\"power\", \"wind_speed\"],\n    units=[\"MW\", \"m/s\"],\n)\ntable"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table_kw = table.convert_unit(\"kW\", column=\"power\")\n",
    "\n",
    "print(f\"Original units: {table.units}\")\n",
    "print(f\"After convert:  {table_kw.units}\")\n",
    "print(f\"Power mean: {table.arr[:, 0].mean():.1f} MW → {table_kw.arr[:, 0].mean():.1f} kW\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validating timestamps and frequency\n",
    "\n",
    "`validate()` checks that timestamps are strictly increasing and match the declared frequency.  \n",
    "It returns a list of warning strings — an empty list means everything is consistent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "good = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=timestamps,\n    values=rng.normal(0, 1, 24).tolist(),\n    name=\"clean\",\n)\n\nwarnings = good.validate()\nprint(f\"Warnings: {warnings}\")"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "gap_timestamps = timestamps[:12] + timestamps[14:]\ngap_values = rng.normal(0, 1, len(gap_timestamps)).tolist()\n\ngapped = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=gap_timestamps,\n    values=gap_values,\n    name=\"has_gap\",\n)\n\nfor w in gapped.validate():\n    print(f\"⚠ {w}\")"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "bad_order = timestamps[:12] + [timestamps[13], timestamps[12]] + timestamps[14:]\nbad_values = rng.normal(0, 1, len(bad_order)).tolist()\n\nunordered = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=bad_order,\n    values=bad_values,\n    name=\"unordered\",\n)\n\nfor w in unordered.validate():\n    print(f\"⚠ {w}\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Detecting missing values\n",
    "\n",
    "The `has_missing` property returns `True` when any value is `None` (NaN)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "values_with_gaps = rng.normal(100, 10, 24).tolist()\nvalues_with_gaps[5] = None\nvalues_with_gaps[18] = None\n\nsparse = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=timestamps,\n    values=values_with_gaps,\n    name=\"sparse\",\n    unit=\"MW\",\n)\n\nprint(f\"has_missing: {sparse.has_missing}\")\nprint(f\"NaN count:   {np.isnan(sparse.arr).sum()}\")\nprint(f\"Length:      {len(sparse)}\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DataType — classifying your data\n",
    "\n",
    "The `DataType` enum communicates what kind of data a series holds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "print(\"Available DataType values:\")\nfor dt in tdm.DataType:\n    print(f\"  {dt.value}\")"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "measured = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=timestamps,\n    values=(100 + rng.normal(0, 10, 24)).tolist(),\n    name=\"wind_measured\",\n    unit=\"MW\",\n    data_type=tdm.DataType.OBSERVATION,\n)\n\nforecast = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=timestamps,\n    values=(105 + rng.normal(0, 15, 24)).tolist(),\n    name=\"wind_forecast\",\n    unit=\"MW\",\n    data_type=tdm.DataType.FORECAST,\n)\n\nprint(f\"{measured.name}: data_type={measured.data_type}\")\nprint(f\"{forecast.name}: data_type={forecast.data_type}\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## TimeSeriesType — structural classification\n",
    "\n",
    "`TimeSeriesType` describes the structural nature of the series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "print(\"Available TimeSeriesType values:\")\nfor tst in tdm.TimeSeriesType:\n    print(f\"  {tst.value}\")"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "flat = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H, timezone=\"UTC\",\n    timestamps=timestamps,\n    values=rng.normal(0, 1, 24).tolist(),\n    name=\"flat_series\",\n    timeseries_type=tdm.TimeSeriesType.FLAT,\n)\nprint(f\"timeseries_type: {flat.timeseries_type}\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Custom attributes\n",
    "\n",
    "The `attributes` dict stores arbitrary key-value metadata — source system, fuel type, model version, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "rich = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H,\n    timezone=\"UTC\",\n    timestamps=timestamps,\n    values=(80 + rng.normal(0, 10, 24)).tolist(),\n    name=\"wind_farm_alpha\",\n    unit=\"MW\",\n    description=\"Measured output from Wind Farm Alpha\",\n    data_type=tdm.DataType.OBSERVATION,\n    timeseries_type=tdm.TimeSeriesType.FLAT,\n    attributes={\n        \"source\": \"SCADA\",\n        \"fuel\": \"wind\",\n        \"capacity_mw\": \"120\",\n        \"operator\": \"NorthWind Energy\",\n    },\n)\n\nprint(f\"Attributes: {rich.attributes}\")\nprint(f\"Capacity:   {rich.attributes['capacity_mw']} MW\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Frequency enum\n",
    "\n",
    "`Frequency` is a `StrEnum` with helpers for calendar-based vs fixed-duration frequencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "print(f\"{'Frequency':<8s}  {'timedelta':<22s}  {'calendar?'}\")\nprint(\"-\" * 45)\nfor f in tdm.Frequency:\n    td = f.to_timedelta()\n    td_str = str(td) if td else \"-\"\n    print(f\"{f.value:<8s}  {td_str:<22s}  {f.is_calendar_based}\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Metadata survives serialization\n",
    "\n",
    "Units, data types, attributes, and other metadata round-trip through JSON."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "json_str = rich.to_json()\nrestored = tdm.TimeSeriesList.from_json(json_str)\n\nprint(f\"unit:            {restored.unit}\")\nprint(f\"data_type:       {restored.data_type}\")\nprint(f\"timeseries_type: {restored.timeseries_type}\")\nprint(f\"attributes:      {restored.attributes}\")\nprint(f\"description:     {restored.description}\")"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "| Feature | API |\n",
    "| --- | --- |\n",
    "| Set unit | `TimeSeriesList(..., unit=\"MW\")` |\n",
    "| Convert unit | `ts.convert_unit(\"kW\")` — returns new series |\n",
    "| Auto-convert in arithmetic | `ts_mw + ts_kw` converts to left operand's unit |\n",
    "| Pint integration | `ts.pint_unit` — resolves to `pint.Unit` |\n",
    "| Per-column units | `TimeSeriesTable(..., units=[\"MW\", \"m/s\"])` |\n",
    "| Column conversion | `table.convert_unit(\"kW\", column=\"power\")` |\n",
    "| Validate timestamps | `ts.validate()` → list of warning strings |\n",
    "| Missing values | `ts.has_missing` |\n",
    "| Data classification | `DataType.OBSERVATION`, `.FORECAST`, `.SCENARIO`, … |\n",
    "| Structural type | `TimeSeriesType.FLAT`, `.OVERLAPPING` |\n",
    "| Custom metadata | `attributes={\"key\": \"value\"}` |\n",
    "| Frequency info | `Frequency.PT1H.to_timedelta()`, `.is_calendar_based` |\n",
    "\n",
    "Next up: **nb_04** covers arithmetic operations and comparisons on TimeSeriesList."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.14.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}