{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Cubes and Collections\n",
    "\n",
    "For multi-dimensional data (e.g., scenario × time or region × time), use `TimeSeriesCube`. To group heterogeneous time series that don't share the same timestamps, use `TimeSeriesCollection`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## TimeSeriesCube\n",
    "\n",
    "A cube stores an N-dimensional array whose axes are labeled by named `Dimension` objects. Common use cases include ensemble forecasts, scenario analysis, and region-by-time grids."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "from datetime import datetime, timedelta, timezone\n\nimport numpy as np\n\nimport timedatamodel as tdm\n\nbase = datetime(2024, 1, 15, tzinfo=timezone.utc)\nhours = [base + timedelta(hours=i) for i in range(24)]"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building a cube from scratch\n",
    "\n",
    "Create a 3-scenario x 24-hour cube representing price forecasts under different assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "rng = np.random.default_rng(42)  # fixed seed so the noise is reproducible\n# Sinusoidal price profile oscillating around 50 EUR/MWh over 24 hours\nbase_prices = 50 + 20 * np.sin(np.linspace(0, 2 * np.pi, 24))\n\n# Rows are scenarios, columns are hours: shape (3, 24)\ndata = np.array([\n    base_prices * 0.8 + rng.normal(0, 2, 24),  # low scenario\n    base_prices + rng.normal(0, 2, 24),  # base scenario\n    base_prices * 1.3 + rng.normal(0, 3, 24),  # high scenario\n])\n\ncube = tdm.TimeSeriesCube(\n    tdm.Frequency.PT1H,\n    timezone=\"UTC\",\n    name=\"price_forecast\",\n    unit=\"EUR/MWh\",\n    data_type=tdm.DataType.FORECAST,\n    dimensions=[\n        tdm.Dimension(\"scenario\", [\"low\", \"base\", \"high\"]),\n        tdm.Dimension(\"valid_time\", hours),\n    ],\n    values=data,\n)\ncube"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cube properties"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"Shape:       {cube.shape}\")\n",
    "print(f\"Dimensions:  {cube.dim_names}\")\n",
    "print(f\"Begin:       {cube.begin}\")\n",
    "print(f\"End:         {cube.end}\")\n",
    "print(f\"Has missing: {cube.has_missing}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting with `sel()` — label-based\n",
    "\n",
    "Select a single scenario to collapse the cube into a `TimeSeries`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "base_scenario = cube.sel(scenario=\"base\")\n",
    "base_scenario"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting with `isel()` — index-based\n",
    "\n",
    "Select by integer position along a dimension. Here index 0 picks the first `scenario` label, `low`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_scenario = cube.isel(scenario=0)\n",
    "first_scenario"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Slicing a dimension\n",
    "\n",
    "Pass a `slice` of labels to `sel()` to keep a range along a dimension. The result is a smaller cube or table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "two_scenarios = cube.sel(scenario=slice(\"low\", \"base\"))\nprint(f\"Type:  {type(two_scenarios).__name__}\")\nprint(two_scenarios)"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Auto-collapse to Table or Series\n",
    "\n",
    "When a `sel()` or `isel()` call removes enough dimensions, the result automatically becomes a `TimeSeriesTable` (2D) or `TimeSeries` (1D). A cube can also be converted explicitly with `to_table()`, as in the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table_view = cube.to_table()\n",
    "print(f\"Type: {type(table_view).__name__}\")\n",
    "table_view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building a cube from a list of TimeSeries\n",
    "\n",
    "`from_timeseries_list()` is handy when you already have the individual forecasts as separate `TimeSeries` objects: pass the list together with a `Dimension` that labels each series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "series_list = [\n    tdm.TimeSeries(\n        tdm.Frequency.PT1H,\n        timestamps=hours,\n        values=(base_prices * factor + rng.normal(0, 2, 24)).tolist(),\n        name=\"price\",\n        unit=\"EUR/MWh\",\n    )\n    for factor in [0.7, 0.85, 1.0, 1.15, 1.3]\n]\n\nensemble = tdm.TimeSeriesCube.from_timeseries_list(\n    series_list,\n    dimension=tdm.Dimension(\"percentile\", [\"p10\", \"p25\", \"p50\", \"p75\", \"p90\"]),\n    name=\"price_ensemble\",\n)\nensemble"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## TimeSeriesCollection\n",
    "\n",
    "A `TimeSeriesCollection` groups time series that may have different frequencies, time ranges, or numbers of points. Think of it as a named bag of `TimeSeries` and `TimeSeriesTable` objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "daily_base = datetime(2024, 1, 1, tzinfo=timezone.utc)\n\nts_hourly = tdm.TimeSeries(\n    tdm.Frequency.PT1H,\n    timestamps=hours,\n    values=[100.0 + rng.normal(0, 10) for _ in range(24)],\n    name=\"wind_hourly\",\n    unit=\"MW\",\n)\n\nts_daily = tdm.TimeSeries(\n    tdm.Frequency.P1D,\n    timestamps=[daily_base + timedelta(days=d) for d in range(30)],\n    values=[2400.0 + rng.normal(0, 200) for _ in range(30)],\n    name=\"wind_daily_energy\",\n    unit=\"MWh\",\n)\n\nts_15min = tdm.TimeSeries(\n    tdm.Frequency.PT15M,\n    timestamps=[base + timedelta(minutes=15 * i) for i in range(96)],\n    values=[50.0 + rng.normal(0, 5) for _ in range(96)],\n    name=\"solar_15min\",\n    unit=\"MW\",\n)"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a collection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "collection = tdm.TimeSeriesCollection(\n    [ts_hourly, ts_daily, ts_15min],\n    name=\"Plant overview\",\n    description=\"Mixed-frequency data for a single plant\",\n)\ncollection"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dictionary-like access"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"Names: {collection.names}\")\n",
    "print(f\"Count: {collection.series_count}\")\n",
    "\n",
    "collection[\"wind_hourly\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding and removing series\n",
    "\n",
    "Collections are immutable: `add()` and `remove()` return new collections and leave the original unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "ts_price = tdm.TimeSeries(\n    tdm.Frequency.PT1H,\n    timestamps=hours,\n    values=[45.0 + rng.normal(0, 8) for _ in range(24)],\n    name=\"spot_price\",\n    unit=\"EUR/MWh\",\n)\n\nextended = collection.add(ts_price)\nprint(f\"Original: {collection.names}\")\nprint(f\"Extended: {extended.names}\")"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "reduced = extended.remove(\"wind_daily_energy\")\n",
    "print(f\"Reduced: {reduced.names}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iterating over a collection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for name, series in collection.items():\n",
    "    print(f\"{name:20s}  freq={str(series.frequency):5s}  len={len(series):3d}  begin={series.begin}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "- **`TimeSeriesCube`**: N-dimensional time series with `Dimension` labels; slice with `sel()` / `isel()`; auto-collapses to Table or Series\n",
    "- **`TimeSeriesCollection`**: heterogeneous container for series with different frequencies and time ranges; dictionary-like access; immutable add/remove\n",
    "\n",
    "Next up: **nb_07** covers data quality tools (coverage bars and validation)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}