Data Quality and Coverage

Real-world time series often have gaps caused by sensor outages, missing transmissions, or maintenance windows. TimeDataModel provides built-in tools to visualize coverage and validate data integrity.

[1]:
from datetime import datetime, timedelta, timezone

import numpy as np

import timedatamodel as tdm

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
week_hours = [base + timedelta(hours=i) for i in range(168)]

Coverage bars on a TimeSeries

Create a week of hourly data with a simulated outage (hours 50-69 missing, 20 points in total).

[2]:
rng = np.random.default_rng(42)
values_full = (100 + rng.normal(0, 15, 168)).tolist()

values_with_gap = [
    None if 50 <= i < 70 else v
    for i, v in enumerate(values_full)
]

ts_sensor = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=week_hours,
    values=values_with_gap,
    name="sensor_A",
    unit="MW",
    data_type=tdm.DataType.MEASUREMENT,
)

print(f"Has missing: {ts_sensor.has_missing}")
print(f"Total points: {len(ts_sensor)}, missing: {sum(1 for v in ts_sensor.values if v is None)}")
[3]:
ts_sensor.coverage_bar()
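
To make the idea of a coverage bar concrete, here is a minimal plain-Python sketch that locates runs of missing values and draws a one-line text bar. It is not TimeDataModel's rendering code: the helper names gap_runs and text_coverage_bar, the bucket width, and the choice of characters are all assumptions made for illustration.

```python
from itertools import groupby


def gap_runs(values):
    """Return (start, end) index pairs, end exclusive, for each run of None."""
    runs = []
    index = 0
    for is_missing, group in groupby(values, key=lambda v: v is None):
        length = len(list(group))
        if is_missing:
            runs.append((index, index + length))
        index += length
    return runs


def text_coverage_bar(values, width=42):
    """Draw one character per bucket of points.

    Full block: every point present. Light shade: some points missing.
    Space: no point present. (Hypothetical rendering, for illustration only.)
    """
    n = len(values)
    cells = []
    for b in range(width):
        lo = b * n // width
        hi = max(lo + 1, (b + 1) * n // width)
        bucket = values[lo:hi]
        present = sum(v is not None for v in bucket)
        if present == len(bucket):
            cells.append("█")
        elif present == 0:
            cells.append(" ")
        else:
            cells.append("░")
    return "".join(cells)


# Same outage pattern as the cell above: indices 50..69 are missing.
values = [None if 50 <= i < 70 else float(i) for i in range(168)]
runs = gap_runs(values)
bar = text_coverage_bar(values)
print(runs)
print(f"|{bar}|")
```

With 168 points and 42 buckets, each character summarizes four hours, so an outage that starts or ends mid-bucket shows up as a partially shaded cell rather than a blank one.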

Coverage bars on a TimeSeriesTable

With multiple columns, each gets its own coverage row — making it easy to spot which signals have gaps.

[4]:
sensor_a = values_with_gap
sensor_b = [
    None if 100 <= i < 130 else v
    for i, v in enumerate(values_full)
]
sensor_c = [
    None if (20 <= i < 30 or 140 <= i < 155) else v
    for i, v in enumerate(values_full)
]

vals = np.column_stack([
    [v if v is not None else np.nan for v in sensor_a],
    [v if v is not None else np.nan for v in sensor_b],
    [v if v is not None else np.nan for v in sensor_c],
])

table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timestamps=week_hours,
    values=vals,
    names=["sensor_A", "sensor_B", "sensor_C"],
    units=["MW", "MW", "MW"],
)
table.coverage_bar()
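
The per-column view can also be summarized numerically. The sketch below uses only NumPy and is independent of TimeDataModel; it assumes missing points are stored as NaN in a (time, column) array, as in the cell above, and uses freshly generated random values rather than the notebook's.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 168
data = 100 + rng.normal(0, 15, size=(n, 3))

# Reproduce the three outage patterns from the cell above as NaN.
data[50:70, 0] = np.nan      # sensor_A
data[100:130, 1] = np.nan    # sensor_B
data[20:30, 2] = np.nan      # sensor_C, first gap
data[140:155, 2] = np.nan    # sensor_C, second gap

names = ["sensor_A", "sensor_B", "sensor_C"]
missing = np.isnan(data).sum(axis=0)   # missing points per column
coverage = 1.0 - missing / n           # fraction of points present

for name, m, c in zip(names, missing.tolist(), coverage.tolist()):
    print(f"{name}: {m:3d} missing, {c:.1%} covered")
```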

Coverage bars on a TimeSeriesCube

Cubes show one bar per label in the non-time dimension.

[5]:
cube_data = np.array([
    [v if v is not None else np.nan for v in sensor_a],
    [v if v is not None else np.nan for v in sensor_b],
    [v if v is not None else np.nan for v in sensor_c],
])

cube = tdm.TimeSeriesCube(
    tdm.Frequency.PT1H,
    dimensions=[
        tdm.Dimension("sensor", ["A", "B", "C"]),
        tdm.Dimension("valid_time", week_hours),
    ],
    values=cube_data,
    name="sensor_grid",
    unit="MW",
)
cube.coverage_bar()

Coverage bars on a TimeSeriesCollection

Collections map all series onto a shared global time range, so you can compare coverage across heterogeneous data.

[6]:
ts_short = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=week_hours[:72],
    values=values_full[:72],
    name="short_range",
    unit="MW",
)

collection = tdm.TimeSeriesCollection(
    [ts_sensor, ts_short],
    name="Sensor comparison",
)
collection.coverage_bar()
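
One way to picture the shared global time range is sketched below with the standard library only. The dictionaries standing in for the two series are hypothetical, and how TimeSeriesCollection actually aligns its members is not shown in this notebook, so treat this as an illustration of the idea rather than the library's method.

```python
from datetime import datetime, timedelta, timezone

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
step = timedelta(hours=1)

# Hypothetical stand-ins for the two series: timestamp -> value (None = missing).
sensor_a = {base + i * step: (None if 50 <= i < 70 else 1.0) for i in range(168)}
short_range = {base + i * step: 1.0 for i in range(72)}
series = {"sensor_A": sensor_a, "short_range": short_range}

# The shared range runs from the earliest to the latest timestamp of any series.
start = min(min(s) for s in series.values())
end = max(max(s) for s in series.values())
n_slots = (end - start) // step + 1
grid = [start + i * step for i in range(n_slots)]

# A slot counts as covered only if the series has a non-missing value there.
present_counts = {}
for name, s in series.items():
    present = sum(s.get(t) is not None for t in grid)
    present_counts[name] = present
    print(f"{name:12s} {present:3d}/{n_slots} slots ({present / n_slots:.1%})")
```

Measured against the full week, the 72-hour series covers well under half of the slots even though it has no missing values of its own, which is the kind of difference a shared range makes visible.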

Validation

validate() checks that timestamps are strictly increasing and that the step between consecutive timestamps matches the declared frequency. It returns a list of warning strings — empty means everything is fine.

[7]:
warnings = ts_sensor.validate()
print(f"Warnings for ts_sensor: {warnings}")

Catching problems

Let’s create a series with intentionally bad timestamps to trigger validation warnings.

[8]:
bad_timestamps = [
    datetime(2024, 1, 15, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),  # duplicate!
    datetime(2024, 1, 15, 4, tzinfo=timezone.utc),  # gap: skipped hours 2-3
    datetime(2024, 1, 15, 5, tzinfo=timezone.utc),
]

ts_bad = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=bad_timestamps,
    values=[10.0, 20.0, 30.0, 40.0, 50.0],
    name="bad_data",
)

for w in ts_bad.validate():
    print(f"  WARNING: {w}")
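
The two checks described for validate() (strictly increasing timestamps, and a step that matches the declared frequency) can be sketched in plain Python as follows. The function check_timestamps and its message wording are hypothetical; the strings that validate() itself returns may differ.

```python
from datetime import datetime, timedelta, timezone


def check_timestamps(timestamps, step):
    """Return a list of problem descriptions; an empty list means no issues."""
    problems = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta <= timedelta(0):
            # Duplicate or out-of-order timestamp.
            problems.append(
                f"index {i}: {timestamps[i].isoformat()} is not after the previous timestamp"
            )
        elif delta != step:
            # Increasing, but the spacing does not match the expected frequency.
            problems.append(f"index {i}: step {delta} does not match expected {step}")
    return problems


utc = timezone.utc
bad = [
    datetime(2024, 1, 15, 0, tzinfo=utc),
    datetime(2024, 1, 15, 1, tzinfo=utc),
    datetime(2024, 1, 15, 1, tzinfo=utc),  # duplicate
    datetime(2024, 1, 15, 4, tzinfo=utc),  # hours 2-3 skipped
    datetime(2024, 1, 15, 5, tzinfo=utc),
]
good = [datetime(2024, 1, 15, h, tzinfo=utc) for h in range(5)]

bad_problems = check_timestamps(bad, timedelta(hours=1))
good_problems = check_timestamps(good, timedelta(hours=1))
for p in bad_problems:
    print(p)
```

In this sketch the duplicate at index 2 fails the ordering check and the jump at index 3 fails the step check, so the bad series yields two messages and the clean one yields none.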

Practical example: multi-sensor data feed audit

Imagine you receive data from five turbines. One coverage bar per series lets you assess quickly which feeds are reliable.

[9]:
gap_ranges = {
    "turbine_1": [],
    "turbine_2": [(30, 45)],
    "turbine_3": [(10, 20), (80, 100)],
    "turbine_4": [(0, 50)],
    "turbine_5": [(60, 65), (120, 130), (150, 160)],
}

sensors = []
for name, gaps in gap_ranges.items():
    vals = values_full.copy()
    for start, end in gaps:
        for i in range(start, end):
            vals[i] = None
    sensors.append(
        tdm.TimeSeries(
            tdm.Frequency.PT1H,
            timestamps=week_hours,
            values=vals,
            name=name,
            unit="MW",
        )
    )

audit = tdm.TimeSeriesCollection(sensors, name="Turbine fleet audit")
audit.coverage_bar()
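
Alongside the visual audit, a numeric ranking can help when the fleet is large. The sketch below works directly from the gap_ranges dictionary and does not use TimeDataModel; the 90% reliability threshold is an arbitrary value chosen for illustration.

```python
gap_ranges = {
    "turbine_1": [],
    "turbine_2": [(30, 45)],
    "turbine_3": [(10, 20), (80, 100)],
    "turbine_4": [(0, 50)],
    "turbine_5": [(60, 65), (120, 130), (150, 160)],
}
n_points = 168  # one week of hourly data


def missing_count(gaps):
    """Count distinct missing indices; a set guards against overlapping ranges."""
    missing = set()
    for start, end in gaps:
        missing.update(range(start, end))
    return len(missing)


# Rank turbines from best to worst coverage.
report = sorted(
    ((name, 1 - missing_count(gaps) / n_points) for name, gaps in gap_ranges.items()),
    key=lambda item: item[1],
    reverse=True,
)

threshold = 0.90  # hypothetical cutoff, not a library default
for name, cov in report:
    status = "ok" if cov >= threshold else "review"
    print(f"{name}: {cov:.1%} covered ({status})")

ranked_names = [name for name, _ in report]
reliable = [name for name, cov in report if cov >= threshold]
```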

Summary

  • coverage_bar() is available on TimeSeries, TimeSeriesTable, TimeSeriesCube, and TimeSeriesCollection

  • It renders as a color-coded SVG in notebooks and as Unicode blocks in terminals

  • validate() catches non-monotonic timestamps and frequency inconsistencies

  • has_missing is a quick boolean check for any gaps

Next up: nb_08 demonstrates I/O and interoperability with pandas, numpy, polars, JSON, and CSV.