Data Quality and Coverage
Real-world time series often have gaps — sensor outages, missing transmissions, or maintenance windows. TimeDataModel provides built-in tools to visualize coverage and validate data integrity.
[1]:
from datetime import datetime, timedelta, timezone
import numpy as np
import timedatamodel as tdm
base = datetime(2024, 1, 15, tzinfo=timezone.utc)
week_hours = [base + timedelta(hours=i) for i in range(168)]
Coverage bars on a TimeSeriesList
Create a week of hourly data with a simulated outage (hours 50 through 69 missing, 20 points in total).
[2]:
rng = np.random.default_rng(42)
values_full = (100 + rng.normal(0, 15, 168)).tolist()
values_with_gap = [
    None if 50 <= i < 70 else v
    for i, v in enumerate(values_full)
]
ts_sensor = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=week_hours,
    values=values_with_gap,
    name="sensor_A",
    unit="MW",
    data_type=tdm.DataType.OBSERVATION,
)
print(f"Has missing: {ts_sensor.has_missing}")
print(f"Total points: {len(ts_sensor)}, missing: {sum(1 for v in ts_sensor.values if v is None)}")
Has missing: True
Total points: 168, missing: 20
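The missing count printed above can also be broken down into contiguous gap runs. The following is a minimal plain-Python sketch that does not use TimeDataModel; `find_gaps` is a hypothetical helper introduced only for illustration, and it assumes missing points are represented as `None`, as in the cell above.

```python
def find_gaps(values):
    """Return (start, end) index pairs, end exclusive, for each run of None."""
    gaps = []
    start = None
    for i, v in enumerate(values):
        if v is None and start is None:
            start = i                      # a new gap begins here
        elif v is not None and start is not None:
            gaps.append((start, i))        # the gap ended just before i
            start = None
    if start is not None:                  # the series ends inside a gap
        gaps.append((start, len(values)))
    return gaps


# Stand-in data: 168 hourly points with indices 50..69 missing.
values = [None if 50 <= i < 70 else float(i) for i in range(168)]

gaps = find_gaps(values)
coverage = 1 - sum(end - start for start, end in gaps) / len(values)

print(gaps)                # [(50, 70)]
print(f"{coverage:.1%}")   # 88.1%
```

Reporting gaps as half-open index ranges keeps them consistent with the `50 <= i < 70` condition used to create the outage.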
[3]:
ts_sensor.coverage_bar()
Coverage bars on a TimeSeriesTable
With multiple columns, each gets its own coverage row — making it easy to spot which signals have gaps.
[4]:
sensor_a = values_with_gap
sensor_b = [
    None if 100 <= i < 130 else v
    for i, v in enumerate(values_full)
]
sensor_c = [
    None if (20 <= i < 30 or 140 <= i < 155) else v
    for i, v in enumerate(values_full)
]
vals = np.column_stack([
    [v if v is not None else np.nan for v in sensor_a],
    [v if v is not None else np.nan for v in sensor_b],
    [v if v is not None else np.nan for v in sensor_c],
])
table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timestamps=week_hours,
    values=vals,
    names=["sensor_A", "sensor_B", "sensor_C"],
    units=["MW", "MW", "MW"],
)
table.coverage_bar()
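To put numbers on what the per-column coverage rows convey, the sketch below computes the fraction of non-missing points in each column. It is plain Python and independent of TimeDataModel; the stand-in rows and the `coverage` dictionary are illustrative assumptions that mirror the gap positions used above, with gaps encoded as NaN as in the table cell.

```python
import math

# Stand-in table: one row per hour, one column per sensor, NaN marks a gap.
rows = []
for i in range(168):
    a = math.nan if 50 <= i < 70 else 1.0
    b = math.nan if 100 <= i < 130 else 1.0
    c = math.nan if (20 <= i < 30 or 140 <= i < 155) else 1.0
    rows.append((a, b, c))

names = ["sensor_A", "sensor_B", "sensor_C"]

# zip(*rows) transposes the row tuples into one tuple per column.
coverage = {}
for name, column in zip(names, zip(*rows)):
    present = sum(1 for v in column if not math.isnan(v))
    coverage[name] = present / len(column)

for name, fraction in coverage.items():
    print(f"{name}: {fraction:.1%}")
```

Note that NaN never compares equal to itself, so the check has to go through `math.isnan` rather than `==`.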
Coverage bars on a TimeSeriesArray
Arrays show one bar per label in the non-time dimension.
[5]:
cube_data = np.array([
    [v if v is not None else np.nan for v in sensor_a],
    [v if v is not None else np.nan for v in sensor_b],
    [v if v is not None else np.nan for v in sensor_c],
])
cube = tdm.TimeSeriesArray(
    tdm.Frequency.PT1H,
    dimensions=[
        tdm.Dimension("sensor", ["A", "B", "C"]),
        tdm.Dimension("valid_time", week_hours),
    ],
    values=cube_data,
    name="sensor_grid",
    unit="MW",
)
cube.coverage_bar()
Coverage bars on a TimeSeriesCollection
Collections map all series onto a shared global time range, so you can compare coverage across heterogeneous data.
[6]:
ts_short = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=week_hours[:72],
    values=values_full[:72],
    name="short_range",
    unit="MW",
)
collection = tdm.TimeSeriesCollection(
    [ts_sensor, ts_short],
    name="Sensor comparison",
)
collection.coverage_bar()
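The idea of a shared global time range can be sketched without the library: take the earliest and latest timestamp across all series, count the hourly slots in between, and express each series' populated slots as a share of that total. This is an illustrative assumption about how such a comparison could be computed, not a description of how TimeSeriesCollection works internally; the `series` dictionary and `share` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
step = timedelta(hours=1)

# Timestamps that actually carry a value, per series.
series = {
    "sensor_A": [base + i * step for i in range(168) if not 50 <= i < 70],
    "short_range": [base + i * step for i in range(72)],
}

# The global range spans the earliest start to the latest end.
global_start = min(stamps[0] for stamps in series.values())
global_end = max(stamps[-1] for stamps in series.values())
total_slots = int((global_end - global_start) / step) + 1

share = {name: len(stamps) / total_slots for name, stamps in series.items()}

for name, fraction in share.items():
    print(f"{name}: {fraction:.1%} of {total_slots} hourly slots")
```

Measured against the global range, a series that simply ends early (short_range) scores lower than one with an outage in the middle (sensor_A), even though it has no internal gaps.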
Validation
validate() checks that timestamps are strictly increasing and that the step between consecutive timestamps matches the declared frequency. It returns a list of warning strings — empty means everything is fine.
[7]:
warnings = ts_sensor.validate()
print(f"Warnings for ts_sensor: {warnings}")
Warnings for ts_sensor: []
Catching problems
Let’s create a series with intentionally bad timestamps to trigger validation warnings.
[8]:
bad_timestamps = [
    datetime(2024, 1, 15, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),  # duplicate!
    datetime(2024, 1, 15, 4, tzinfo=timezone.utc),  # gap: skipped hours 2-3
    datetime(2024, 1, 15, 5, tzinfo=timezone.utc),
]
ts_bad = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=bad_timestamps,
    values=[10.0, 20.0, 30.0, 40.0, 50.0],
    name="bad_data",
)
for w in ts_bad.validate():
    print(f" WARNING: {w}")
WARNING: timestamps not strictly increasing at index 2: 2024-01-15 01:00:00+00:00 >= 2024-01-15 01:00:00+00:00
WARNING: inconsistent frequency at index 2: expected 1:00:00, got 0:00:00
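For readers who want to see the two checks spelled out, here is a minimal plain-Python sketch of the same idea: compare each timestamp with its predecessor for ordering and for step size. `check_timestamps` is a hypothetical helper, not the library's implementation. It reports every offending index, so its messages need not match what validate() prints.

```python
from datetime import datetime, timedelta, timezone


def check_timestamps(timestamps, step):
    """Return warning strings for ordering and spacing problems."""
    warnings = []
    for i in range(1, len(timestamps)):
        prev, curr = timestamps[i - 1], timestamps[i]
        if curr <= prev:
            # Duplicates and backward jumps both violate strict ordering.
            warnings.append(
                f"not strictly increasing at index {i}: {prev} >= {curr}"
            )
        if curr - prev != step:
            warnings.append(
                f"inconsistent step at index {i}: expected {step}, got {curr - prev}"
            )
    return warnings


# Same pattern as the bad series: a duplicate at 01:00, then a jump to 04:00.
stamps = [datetime(2024, 1, 15, h, tzinfo=timezone.utc) for h in (0, 1, 1, 4, 5)]
result = check_timestamps(stamps, timedelta(hours=1))

for message in result:
    print(message)
```

In this sketch the duplicate produces two messages at index 2 (ordering and step), and the three-hour jump produces one more at index 3.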
Practical example: multi-sensor data feed audit
Imagine you receive data from 5 sensors. Quickly assess which ones are reliable.
[9]:
gap_ranges = {
    "turbine_1": [],
    "turbine_2": [(30, 45)],
    "turbine_3": [(10, 20), (80, 100)],
    "turbine_4": [(0, 50)],
    "turbine_5": [(60, 65), (120, 130), (150, 160)],
}
sensors = []
for name, gaps in gap_ranges.items():
    vals = values_full.copy()
    for start, end in gaps:
        for i in range(start, end):
            vals[i] = None
    sensors.append(
        tdm.TimeSeriesList(
            tdm.Frequency.PT1H,
            timestamps=week_hours,
            values=vals,
            name=name,
            unit="MW",
        )
    )
audit = tdm.TimeSeriesCollection(sensors, name="Turbine fleet audit")
audit.coverage_bar()
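The visual audit can be complemented with a numeric ranking. The sketch below derives each turbine's coverage directly from the gap ranges and flags those below a cutoff. The 90% threshold (`min_coverage`) is an arbitrary assumption chosen for illustration, and the calculation assumes the gap ranges within a turbine do not overlap.

```python
gap_ranges = {
    "turbine_1": [],
    "turbine_2": [(30, 45)],
    "turbine_3": [(10, 20), (80, 100)],
    "turbine_4": [(0, 50)],
    "turbine_5": [(60, 65), (120, 130), (150, 160)],
}
total_hours = 168
min_coverage = 0.90  # hypothetical reliability cutoff, not a library default

report = []
for name, gaps in gap_ranges.items():
    # Each gap is a half-open index range, so its length is end - start.
    missing = sum(end - start for start, end in gaps)
    report.append((name, 1 - missing / total_hours))

# Worst coverage first, so the least reliable feeds appear at the top.
report.sort(key=lambda item: item[1])
flagged = [name for name, fraction in report if fraction < min_coverage]

for name, fraction in report:
    marker = "  <-- below threshold" if fraction < min_coverage else ""
    print(f"{name}: {fraction:.1%}{marker}")
```

With these gap ranges, turbine_4, turbine_3, and turbine_5 fall below the cutoff, while turbine_2 (about 91%) and turbine_1 (100%) pass.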
Summary
coverage_bar() is available on TimeSeriesList, TimeSeriesTable, TimeSeriesArray, and TimeSeriesCollection.
It renders as a color-coded SVG in notebooks and as Unicode blocks in terminals.
validate() catches non-monotonic timestamps and frequency inconsistencies.
has_missing is a quick boolean check for any gaps.
Next up: nb_08 demonstrates I/O and interoperability with pandas, numpy, polars, JSON, and CSV.