{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Quality and Coverage\n",
    "\n",
    "Real-world time series often have gaps caused by sensor outages, missing transmissions, or maintenance windows. TimeDataModel provides built-in tools for both sides of the problem: `coverage_bar()` visualizes where data is present or missing, and `validate()` checks timestamp integrity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "from datetime import datetime, timedelta, timezone\n\nimport numpy as np\n\nimport timedatamodel as tdm\n\nbase = datetime(2024, 1, 15, tzinfo=timezone.utc)\nweek_hours = [base + timedelta(hours=i) for i in range(168)]"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Coverage bars on a TimeSeriesList\n",
    "\n",
    "Create a week of hourly data (168 points) with a simulated outage: hours 50 to 69 (20 values) are set to `None`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "rng = np.random.default_rng(42)\nvalues_full = (100 + rng.normal(0, 15, 168)).tolist()\n\n# Simulate an outage: indices 50-69 (20 hours) become None.\nvalues_with_gap = [\n    None if 50 <= i < 70 else v\n    for i, v in enumerate(values_full)\n]\n\nts_sensor = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H,\n    timestamps=week_hours,\n    values=values_with_gap,\n    name=\"sensor_A\",\n    unit=\"MW\",\n    data_type=tdm.DataType.OBSERVATION,\n)\n\n# Count gaps whether they are stored as None or as NaN (NaN != NaN is True).\nn_missing = sum(1 for v in ts_sensor.values if v is None or v != v)\nprint(f\"Has missing: {ts_sensor.has_missing}\")\nprint(f\"Total points: {len(ts_sensor)}, missing: {n_missing}\")"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_sensor.coverage_bar()"
   ]
  },
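  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "As a rough illustration of what a coverage bar summarizes, the sketch below splits a series into equal-width bins, computes the fraction of non-missing values in each bin, and maps that fraction to a Unicode shade character. It is a hypothetical pure-Python helper (`unicode_coverage_bar`, the 24-bin default, and the shade scale are all assumptions), not TimeDataModel's actual rendering code. For the `demo` series, which copies sensor_A's gap pattern, bins 8 and 9 fall entirely inside the outage and render blank, while bin 7 is only partly covered."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import math\n\n# Hypothetical helper for illustration only; not part of TimeDataModel.\nSHADES = \" ░▒▓█\"  # index 0 = no data, index 4 = fully covered\n\n\ndef is_missing(v):\n    \"\"\"Treat None and float NaN as missing.\"\"\"\n    return v is None or (isinstance(v, float) and math.isnan(v))\n\n\ndef unicode_coverage_bar(values, n_bins=24):\n    \"\"\"Render coverage as one shade character per bin of consecutive values.\"\"\"\n    n = len(values)\n    chars = []\n    for b in range(n_bins):\n        chunk = values[b * n // n_bins:(b + 1) * n // n_bins]\n        if not chunk:\n            chars.append(SHADES[0])\n            continue\n        covered = sum(1 for v in chunk if not is_missing(v)) / len(chunk)\n        # Map the covered fraction (0.0 to 1.0) onto the five shade levels.\n        chars.append(SHADES[round(covered * (len(SHADES) - 1))])\n    return \"\".join(chars)\n\n\n# Same gap pattern as sensor_A: hours 50 to 69 missing out of 168.\ndemo = [None if 50 <= i < 70 else 100.0 for i in range(168)]\nprint(f\"|{unicode_coverage_bar(demo)}|\")"
  },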
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Coverage bars on a TimeSeriesTable\n",
    "\n",
    "With multiple columns, each column gets its own coverage row, which makes it easy to spot which signals have gaps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# sensor_A reuses the outage from above; B and C get their own gap patterns.\nsensor_a = values_with_gap\nsensor_b = [\n    None if 100 <= i < 130 else v\n    for i, v in enumerate(values_full)\n]\nsensor_c = [\n    None if (20 <= i < 30 or 140 <= i < 155) else v\n    for i, v in enumerate(values_full)\n]\n\n\ndef to_nan(values):\n    \"\"\"Replace None with NaN so the values fit a float NumPy array.\"\"\"\n    return [np.nan if v is None else v for v in values]\n\n\n# Shape (168, 3): one row per hour, one column per sensor.\nvals = np.column_stack([to_nan(s) for s in (sensor_a, sensor_b, sensor_c)])\n\ntable = tdm.TimeSeriesTable(\n    tdm.Frequency.PT1H,\n    timestamps=week_hours,\n    values=vals,\n    names=[\"sensor_A\", \"sensor_B\", \"sensor_C\"],\n    units=[\"MW\", \"MW\", \"MW\"],\n)\ntable.coverage_bar()"
  },
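  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The statistic behind a per-column coverage row can be approximated as the share of non-NaN cells in each column. This NumPy-only sketch rebuilds the same gap layout independently of TimeDataModel and reduces it along the time axis; it illustrates the idea and is not how `coverage_bar()` is implemented."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import numpy as np\n\n# Rebuild a (168, 3) array with the same gap ranges as sensor_A, B and C above.\ndata = np.ones((168, 3))\ndata[50:70, 0] = np.nan    # sensor_A\ndata[100:130, 1] = np.nan  # sensor_B\ndata[20:30, 2] = np.nan    # sensor_C, first gap\ndata[140:155, 2] = np.nan  # sensor_C, second gap\n\n# Fraction of non-missing hours per column (axis 0 is time).\ncoverage = (~np.isnan(data)).mean(axis=0)\nfor name, frac in zip([\"sensor_A\", \"sensor_B\", \"sensor_C\"], coverage):\n    print(f\"{name}: {frac:.1%} covered\")"
  },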
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Coverage bars on a TimeSeriesArray\n\nArrays show one bar per label in the non-time dimension. Here that dimension is `sensor`, so there is one bar each for `A`, `B`, and `C`."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# Shape (3, 168): axis 0 follows the sensor dimension, axis 1 follows valid_time.\ncube_data = np.array([\n    [v if v is not None else np.nan for v in sensor_a],\n    [v if v is not None else np.nan for v in sensor_b],\n    [v if v is not None else np.nan for v in sensor_c],\n])\n\ncube = tdm.TimeSeriesArray(\n    tdm.Frequency.PT1H,\n    dimensions=[\n        tdm.Dimension(\"sensor\", [\"A\", \"B\", \"C\"]),\n        tdm.Dimension(\"valid_time\", week_hours),\n    ],\n    values=cube_data,\n    name=\"sensor_grid\",\n    unit=\"MW\",\n)\ncube.coverage_bar()"
  },
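  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "A bar shows roughly where the gaps are. To list the exact gap intervals per label, you can find contiguous runs of NaN along the time axis. The sketch below does this with plain NumPy; `find_gap_runs` is a hypothetical helper, not a TimeDataModel method."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "import numpy as np\n\n\ndef find_gap_runs(row):\n    \"\"\"Return (start, end) index pairs, end exclusive, for each run of NaN in a 1-D array.\"\"\"\n    missing = np.isnan(row).astype(np.int8)\n    # Pad with zeros so runs touching either edge still yield a start and an end.\n    edges = np.diff(np.concatenate(([0], missing, [0])))\n    starts = np.flatnonzero(edges == 1)\n    ends = np.flatnonzero(edges == -1)\n    return list(zip(starts.tolist(), ends.tolist()))\n\n\n# Same layout as the array above: one row per sensor label, one column per hour.\ngrid = np.ones((3, 168))\ngrid[0, 50:70] = np.nan\ngrid[1, 100:130] = np.nan\ngrid[2, 20:30] = np.nan\ngrid[2, 140:155] = np.nan\n\nfor label, row in zip([\"A\", \"B\", \"C\"], grid):\n    print(label, find_gap_runs(row))"
  },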
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Coverage bars on a TimeSeriesCollection\n",
    "\n",
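    "Collections map all series onto a shared global time range, so you can compare coverage across series with different lengths or start and end times. In the example below, `short_range` spans only the first 72 hours of the week, while `sensor_A` spans all 168."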
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "ts_short = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H,\n    timestamps=week_hours[:72],\n    values=values_full[:72],\n    name=\"short_range\",\n    unit=\"MW\",\n)\n\ncollection = tdm.TimeSeriesCollection(\n    [ts_sensor, ts_short],\n    name=\"Sensor comparison\",\n)\ncollection.coverage_bar()"
  },
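  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The idea of a shared global range can be illustrated without TimeDataModel: take the earliest start and latest end across all series, build the hourly grid between them, and flag which grid slots each series has a value for. This sketch assumes a fixed one-hour step; `coverage_on_global_grid` is a hypothetical name, and the library may align series differently."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "from datetime import datetime, timedelta, timezone\n\nSTEP = timedelta(hours=1)\n\n\ndef coverage_on_global_grid(series):\n    \"\"\"Map each series (dict of timestamp -> value) onto one shared hourly grid.\n\n    Returns the grid and, per series name, a list of booleans marking the\n    grid slots that hold a non-None value.\n    \"\"\"\n    start = min(min(ts) for ts in series.values())\n    end = max(max(ts) for ts in series.values())\n    n_slots = (end - start) // STEP + 1\n    slots = [start + i * STEP for i in range(n_slots)]\n    flags = {\n        name: [ts.get(t) is not None for t in slots]\n        for name, ts in series.items()\n    }\n    return slots, flags\n\n\nbase = datetime(2024, 1, 15, tzinfo=timezone.utc)\nhours = [base + i * STEP for i in range(168)]\nseries = {\n    # Full week with hours 50 to 69 missing, like sensor_A.\n    \"sensor_A\": {t: (None if 50 <= i < 70 else 1.0) for i, t in enumerate(hours)},\n    # Only the first 72 hours, like short_range.\n    \"short_range\": {t: 1.0 for t in hours[:72]},\n}\n\nslots, flags = coverage_on_global_grid(series)\nfor name, f in flags.items():\n    print(f\"{name}: {sum(f)}/{len(slots)} slots covered\")"
  },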
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Validation\n",
    "\n",
    "`validate()` checks that timestamps are strictly increasing and that the step between consecutive timestamps matches the declared frequency. It returns a list of warning strings; an empty list means no problems were found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "warnings = ts_sensor.validate()\n",
    "print(f\"Warnings for ts_sensor: {warnings}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Catching problems\n",
    "\n",
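    "Let's create a series with intentionally bad timestamps to trigger validation warnings: a duplicated 01:00 timestamp (not strictly increasing) and a jump from 01:00 to 04:00 (a step that does not match `PT1H`)."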
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "bad_timestamps = [\n    datetime(2024, 1, 15, 0, tzinfo=timezone.utc),\n    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),\n    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),  # duplicate!\n    datetime(2024, 1, 15, 4, tzinfo=timezone.utc),  # gap: skipped hours 2-3\n    datetime(2024, 1, 15, 5, tzinfo=timezone.utc),\n]\n\nts_bad = tdm.TimeSeriesList(\n    tdm.Frequency.PT1H,\n    timestamps=bad_timestamps,\n    values=[10.0, 20.0, 30.0, 40.0, 50.0],\n    name=\"bad_data\",\n)\n\nfor w in ts_bad.validate():\n    print(f\"  WARNING: {w}\")"
  },
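  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "The two checks described for `validate()` can be approximated in a few lines of standard-library Python. This is a hypothetical re-implementation for illustration: `check_timestamps` and its message wording are assumptions and will not match TimeDataModel's warning text. On the timestamps above it flags index 2 (a zero step) and index 3 (a three-hour step)."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "from datetime import datetime, timedelta, timezone\n\n\ndef check_timestamps(timestamps, step):\n    \"\"\"Return warning strings for non-increasing timestamps or unexpected steps.\"\"\"\n    problems = []\n    for i in range(1, len(timestamps)):\n        delta = timestamps[i] - timestamps[i - 1]\n        if delta <= timedelta(0):\n            problems.append(f\"index {i}: not strictly increasing ({timestamps[i - 1]} -> {timestamps[i]})\")\n        elif delta != step:\n            problems.append(f\"index {i}: step {delta} does not match expected {step}\")\n    return problems\n\n\nutc = timezone.utc\nbad_timestamps = [\n    datetime(2024, 1, 15, 0, tzinfo=utc),\n    datetime(2024, 1, 15, 1, tzinfo=utc),\n    datetime(2024, 1, 15, 1, tzinfo=utc),  # duplicate: delta of zero\n    datetime(2024, 1, 15, 4, tzinfo=utc),  # three-hour jump\n    datetime(2024, 1, 15, 5, tzinfo=utc),\n]\n\nfor w in check_timestamps(bad_timestamps, timedelta(hours=1)):\n    print(f\"  WARNING: {w}\")"
  },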
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Practical example: multi-sensor data feed audit\n",
    "\n",
    "Imagine you receive data from five turbine sensors, each with a different gap pattern. A single collection coverage bar shows at a glance which feeds are reliable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "# (start, end) hour indices to blank out per turbine; end is exclusive.\ngap_ranges = {\n    \"turbine_1\": [],\n    \"turbine_2\": [(30, 45)],\n    \"turbine_3\": [(10, 20), (80, 100)],\n    \"turbine_4\": [(0, 50)],\n    \"turbine_5\": [(60, 65), (120, 130), (150, 160)],\n}\n\nsensors = []\nfor name, gaps in gap_ranges.items():\n    turbine_vals = values_full.copy()\n    for start, end in gaps:\n        turbine_vals[start:end] = [None] * (end - start)\n    sensors.append(\n        tdm.TimeSeriesList(\n            tdm.Frequency.PT1H,\n            timestamps=week_hours,\n            values=turbine_vals,\n            name=name,\n            unit=\"MW\",\n        )\n    )\n\naudit = tdm.TimeSeriesCollection(sensors, name=\"Turbine fleet audit\")\naudit.coverage_bar()"
  },
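  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "To complement the visual audit with numbers, you could rank the turbines by coverage fraction and longest outage. The sketch below works directly from the gap ranges in plain Python without calling TimeDataModel; `summarize_gaps` is a hypothetical helper and assumes the gaps within a turbine do not overlap."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": "N_HOURS = 168\n\n# Same (start, end) hour ranges as the audit cell above; end is exclusive.\ngap_ranges = {\n    \"turbine_1\": [],\n    \"turbine_2\": [(30, 45)],\n    \"turbine_3\": [(10, 20), (80, 100)],\n    \"turbine_4\": [(0, 50)],\n    \"turbine_5\": [(60, 65), (120, 130), (150, 160)],\n}\n\n\ndef summarize_gaps(gaps, n_hours=N_HOURS):\n    \"\"\"Return (coverage fraction, longest gap in hours), assuming non-overlapping gaps.\"\"\"\n    missing = sum(end - start for start, end in gaps)\n    longest = max((end - start for start, end in gaps), default=0)\n    return 1 - missing / n_hours, longest\n\n\n# Most complete first; ties broken by the shorter longest gap.\nranking = sorted(\n    ((name, *summarize_gaps(gaps)) for name, gaps in gap_ranges.items()),\n    key=lambda row: (-row[1], row[2]),\n)\nfor name, coverage, longest in ranking:\n    print(f\"{name}: {coverage:.1%} covered, longest gap {longest} h\")"
  },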
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": "## Summary\n\n- `coverage_bar()` is available on `TimeSeriesList`, `TimeSeriesTable`, `TimeSeriesArray`, and `TimeSeriesCollection`\n- It renders as a color-coded SVG in notebooks and Unicode blocks in terminals\n- `validate()` catches non-monotonic timestamps and frequency inconsistencies\n- `has_missing` is a quick boolean check for any gaps\n\nNext up: **nb_08** demonstrates I/O and interoperability with pandas, numpy, polars, JSON, and CSV."
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}