Data Quality and Coverage

Real-world time series often have gaps — sensor outages, missing transmissions, or maintenance windows. TimeDataModel provides built-in tools to visualize coverage and validate data integrity.

[1]:
from datetime import datetime, timedelta, timezone

import numpy as np

import timedatamodel as tdm

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
week_hours = [base + timedelta(hours=i) for i in range(168)]

Coverage bars on a TimeSeriesList

Create a week of hourly data with a simulated outage (hours 50-69 missing).

[2]:
rng = np.random.default_rng(42)
values_full = (100 + rng.normal(0, 15, 168)).tolist()

values_with_gap = [
    None if 50 <= i < 70 else v
    for i, v in enumerate(values_full)
]

ts_sensor = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=week_hours,
    values=values_with_gap,
    name="sensor_A",
    unit="MW",
    data_type=tdm.DataType.OBSERVATION,
)

print(f"Has missing: {ts_sensor.has_missing}")
print(f"Total points: {len(ts_sensor)}, missing: {sum(1 for v in ts_sensor.values if v is None)}")
Has missing: True
Total points: 168, missing: 20
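has_missing only answers whether any gap exists. Locating the gaps themselves is a short exercise in plain Python, independent of TimeDataModel; a minimal sketch:

```python
def missing_runs(values):
    """Return (start, end) index pairs of contiguous None runs, end exclusive."""
    runs, start = [], None
    for i, v in enumerate(values):
        if v is None:
            if start is None:
                start = i  # a run of missing values begins here
        elif start is not None:
            runs.append((start, i))  # run ended just before index i
            start = None
    if start is not None:
        runs.append((start, len(values)))  # run extends to the end
    return runs

values_with_gap = [1.0] * 50 + [None] * 20 + [1.0] * 98
print(missing_runs(values_with_gap))  # [(50, 70)]
```

This recovers exactly the half-open index range used to construct the outage above.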
[3]:
ts_sensor.coverage_bar()
[3]:
[coverage bar SVG: sensor_A, 2024-01-15 00:00 to 2024-01-21 23:00]
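A rough terminal-style rendering of the same information can be sketched without the library, by binning presence into fixed-width cells of Unicode blocks (an approximation for illustration, not TimeDataModel's actual renderer):

```python
def ascii_coverage(values, width=40):
    """Bin presence/absence into `width` cells: full, partial, or empty blocks."""
    n = len(values)
    cells = []
    for c in range(width):
        lo = c * n // width
        hi = max(lo + 1, (c + 1) * n // width)
        chunk = values[lo:hi]
        frac = sum(v is not None for v in chunk) / len(chunk)
        # Full block = complete coverage, light shade = all missing, medium = partial.
        cells.append("\u2588" if frac == 1 else "\u2591" if frac == 0 else "\u2592")
    return "".join(cells)

values_with_gap = [None if 50 <= i < 70 else 1.0 for i in range(168)]
print(ascii_coverage(values_with_gap))
```

The outage shows up as a shaded stretch roughly a third of the way along the bar.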

Coverage bars on a TimeSeriesTable

With multiple columns, each gets its own coverage row — making it easy to spot which signals have gaps.

[4]:
sensor_a = values_with_gap
sensor_b = [
    None if 100 <= i < 130 else v
    for i, v in enumerate(values_full)
]
sensor_c = [
    None if (20 <= i < 30 or 140 <= i < 155) else v
    for i, v in enumerate(values_full)
]

vals = np.column_stack([
    [v if v is not None else np.nan for v in sensor_a],
    [v if v is not None else np.nan for v in sensor_b],
    [v if v is not None else np.nan for v in sensor_c],
])

table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timestamps=week_hours,
    values=vals,
    names=["sensor_A", "sensor_B", "sensor_C"],
    units=["MW", "MW", "MW"],
)
table.coverage_bar()
[4]:
[coverage bar SVG: sensor_A, sensor_B, sensor_C; 2024-01-15 00:00 to 2024-01-21 23:00]

Coverage bars on a TimeSeriesArray

Arrays show one bar per label in the non-time dimension.

[5]:
cube_data = np.array([
    [v if v is not None else np.nan for v in sensor_a],
    [v if v is not None else np.nan for v in sensor_b],
    [v if v is not None else np.nan for v in sensor_c],
])

cube = tdm.TimeSeriesArray(
    tdm.Frequency.PT1H,
    dimensions=[
        tdm.Dimension("sensor", ["A", "B", "C"]),
        tdm.Dimension("valid_time", week_hours),
    ],
    values=cube_data,
    name="sensor_grid",
    unit="MW",
)
cube.coverage_bar()
[5]:
[coverage bar SVG: A, B, C; 2024-01-15 00:00 to 2024-01-21 23:00]

Coverage bars on a TimeSeriesCollection

Collections map all series onto a shared global time range, so you can compare coverage across heterogeneous data.

[6]:
ts_short = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=week_hours[:72],
    values=values_full[:72],
    name="short_range",
    unit="MW",
)

collection = tdm.TimeSeriesCollection(
    [ts_sensor, ts_short],
    name="Sensor comparison",
)
collection.coverage_bar()
[6]:
[coverage bar SVG: sensor_A, short_range; 2024-01-15 00:00 to 2024-01-21 23:00]
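Mapping series onto a shared global range amounts to taking the union of the individual spans; a minimal plain-Python sketch of the idea (not the library's internals):

```python
from datetime import datetime, timedelta, timezone

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
# First and last timestamp of each series in the collection.
spans = {
    "sensor_A": (base, base + timedelta(hours=167)),
    "short_range": (base, base + timedelta(hours=71)),
}

# The shared global range is the union of the individual spans.
global_start = min(start for start, _ in spans.values())
global_end = max(end for _, end in spans.values())
total = global_end - global_start

for name, (start, end) in spans.items():
    frac = (end - start) / total
    print(f"{name}: spans {frac:.0%} of the global range")
# sensor_A: spans 100% of the global range
# short_range: spans 43% of the global range
```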

Validation

validate() checks that timestamps are strictly increasing and that the step between consecutive timestamps matches the declared frequency. It returns a list of warning strings — empty means everything is fine.

[7]:
warnings = ts_sensor.validate()
print(f"Warnings for ts_sensor: {warnings}")
Warnings for ts_sensor: []

Catching problems

Let’s create a series with intentionally bad timestamps to trigger validation warnings.

[8]:
bad_timestamps = [
    datetime(2024, 1, 15, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),  # duplicate!
    datetime(2024, 1, 15, 4, tzinfo=timezone.utc),  # gap: skipped hours 2-3
    datetime(2024, 1, 15, 5, tzinfo=timezone.utc),
]

ts_bad = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=bad_timestamps,
    values=[10.0, 20.0, 30.0, 40.0, 50.0],
    name="bad_data",
)

for w in ts_bad.validate():
    print(f"  WARNING: {w}")
  WARNING: timestamps not strictly increasing at index 2: 2024-01-15 01:00:00+00:00 >= 2024-01-15 01:00:00+00:00
  WARNING: inconsistent frequency at index 2: expected 1:00:00, got 0:00:00
  WARNING: inconsistent frequency at index 3: expected 1:00:00, got 3:00:00
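The two rules can be reproduced in a few lines of plain Python. This is a simplified sketch of the checks described above, not TimeDataModel's implementation:

```python
from datetime import datetime, timedelta, timezone

def validate_timestamps(timestamps, step):
    """Check strict monotonicity and a constant step between timestamps."""
    warnings = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta <= timedelta(0):
            warnings.append(f"timestamps not strictly increasing at index {i}")
        if delta != step:
            warnings.append(f"inconsistent frequency at index {i}: "
                            f"expected {step}, got {delta}")
    return warnings

bad = [
    datetime(2024, 1, 15, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),  # duplicate
    datetime(2024, 1, 15, 4, tzinfo=timezone.utc),  # 3-hour jump
]
for w in validate_timestamps(bad, timedelta(hours=1)):
    print(w)
```

A duplicate timestamp trips both rules at once (a zero delta is neither increasing nor equal to the declared step), while the jump trips only the frequency rule.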

Practical example: multi-sensor data feed audit

Imagine you receive data feeds from five turbines and want to quickly assess which ones are reliable.

[9]:
gap_ranges = {
    "turbine_1": [],
    "turbine_2": [(30, 45)],
    "turbine_3": [(10, 20), (80, 100)],
    "turbine_4": [(0, 50)],
    "turbine_5": [(60, 65), (120, 130), (150, 160)],
}

sensors = []
for name, gaps in gap_ranges.items():
    vals = values_full.copy()
    for start, end in gaps:
        for i in range(start, end):
            vals[i] = None
    sensors.append(
        tdm.TimeSeriesList(
            tdm.Frequency.PT1H,
            timestamps=week_hours,
            values=vals,
            name=name,
            unit="MW",
        )
    )

audit = tdm.TimeSeriesCollection(sensors, name="Turbine fleet audit")
audit.coverage_bar()
[9]:
[coverage bar SVG: turbine_1 through turbine_5; 2024-01-15 00:00 to 2024-01-21 23:00]
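Beyond the visual bar, per-sensor coverage percentages make the audit quantitative. A minimal plain-Python sketch using the same gap_ranges, independent of the library:

```python
gap_ranges = {
    "turbine_1": [],
    "turbine_2": [(30, 45)],
    "turbine_3": [(10, 20), (80, 100)],
    "turbine_4": [(0, 50)],
    "turbine_5": [(60, 65), (120, 130), (150, 160)],
}

n_hours = 168  # one week of hourly data
for name, gaps in gap_ranges.items():
    missing = sum(end - start for start, end in gaps)  # half-open ranges
    coverage = 1 - missing / n_hours
    print(f"{name}: {coverage:.1%} coverage, {missing} missing hours")
# turbine_1: 100.0% coverage, 0 missing hours
# turbine_2: 91.1% coverage, 15 missing hours
# turbine_3: 82.1% coverage, 30 missing hours
# turbine_4: 70.2% coverage, 50 missing hours
# turbine_5: 85.1% coverage, 25 missing hours
```

With a threshold (say, 90% coverage) this turns the visual check into a pass/fail audit.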

Summary

  • coverage_bar() is available on TimeSeriesList, TimeSeriesTable, TimeSeriesArray, and TimeSeriesCollection

  • It renders as a color-coded SVG in notebooks and Unicode blocks in terminals

  • validate() catches non-monotonic timestamps and frequency inconsistencies

  • has_missing is a quick boolean check for any gaps

Next up: nb_08 demonstrates I/O and interoperability with pandas, numpy, polars, JSON, and CSV.