Data Quality and Coverage
Real-world time series often have gaps — sensor outages, missing transmissions, or maintenance windows. TimeDataModel provides built-in tools to visualize coverage and validate data integrity.
[1]:
from datetime import datetime, timedelta, timezone
import numpy as np
import timedatamodel as tdm
base = datetime(2024, 1, 15, tzinfo=timezone.utc)
week_hours = [base + timedelta(hours=i) for i in range(168)]
Coverage bars on a TimeSeries
Create a week of hourly data with a simulated outage: hours 50 through 69 (20 points) are missing.
[2]:
rng = np.random.default_rng(42)
values_full = (100 + rng.normal(0, 15, 168)).tolist()
values_with_gap = [
    None if 50 <= i < 70 else v
    for i, v in enumerate(values_full)
]
ts_sensor = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=week_hours,
    values=values_with_gap,
    name="sensor_A",
    unit="MW",
    data_type=tdm.DataType.MEASUREMENT,
)
print(f"Has missing: {ts_sensor.has_missing}")
print(f"Total points: {len(ts_sensor)}, missing: {sum(1 for v in ts_sensor.values if v is None)}")
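To locate the outage numerically rather than visually, the gap can be found with plain Python. This is a minimal sketch, not part of TimeDataModel: find_gaps is a hypothetical helper that scans a value list for runs of None, returns half-open index ranges, and maps them back to hourly timestamps.

```python
from datetime import datetime, timedelta, timezone


def find_gaps(values):
    """Hypothetical helper: half-open (start, end) index ranges of consecutive None values."""
    gaps = []
    start = None
    for i, v in enumerate(values):
        if v is None and start is None:
            start = i                      # a gap begins here
        elif v is not None and start is not None:
            gaps.append((start, i))        # the gap ended just before index i
            start = None
    if start is not None:
        gaps.append((start, len(values)))  # the gap runs to the end of the series
    return gaps


# Same shape as values_with_gap above; the constant 1.0 stands in for real readings
demo_values = [None if 50 <= i < 70 else 1.0 for i in range(168)]
start_time = datetime(2024, 1, 15, tzinfo=timezone.utc)
for first, end in find_gaps(demo_values):
    t_first = start_time + timedelta(hours=first)
    t_last = start_time + timedelta(hours=end - 1)
    print(f"{end - first} missing hours: {t_first:%Y-%m-%d %H:%M} to {t_last:%Y-%m-%d %H:%M}")
# 20 missing hours: 2024-01-17 02:00 to 2024-01-17 21:00
```

Half-open ranges match the `50 <= i < 70` convention used to build the gap, so the range length is directly the number of missing points.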
[3]:
ts_sensor.coverage_bar()
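In a terminal the coverage bar is drawn with Unicode blocks. One plausible way to compute such a bar is to split the series into equal buckets and pick a glyph per bucket by how many points are present. This is a sketch under stated assumptions, not TimeDataModel's implementation: unicode_coverage_bar, the bucket count of 24, and the three glyphs are all hypothetical choices.

```python
def unicode_coverage_bar(values, width=24):
    """Hypothetical helper: one Unicode cell per bucket of consecutive points.

    '█' = every point in the bucket present, '▒' = some missing, ' ' = all missing.
    """
    n = len(values)
    width = min(width, n)  # never create empty buckets
    cells = []
    for b in range(width):
        lo, hi = b * n // width, (b + 1) * n // width
        bucket = values[lo:hi]
        present = sum(v is not None for v in bucket)
        if present == len(bucket):
            cells.append("█")
        elif present == 0:
            cells.append(" ")
        else:
            cells.append("▒")
    return "|" + "".join(cells) + "|"


# 168 hourly points with hours 50-69 missing, as in the sensor_A example
demo_values = [None if 50 <= i < 70 else 1.0 for i in range(168)]
print(unicode_coverage_bar(demo_values))
```

With 168 points and 24 cells each cell covers 7 hours, so the outage shows up as one partially filled cell followed by two empty ones.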
Coverage bars on a TimeSeriesTable
With multiple columns, each gets its own coverage row — making it easy to spot which signals have gaps.
[4]:
sensor_a = values_with_gap
sensor_b = [
    None if 100 <= i < 130 else v
    for i, v in enumerate(values_full)
]
sensor_c = [
    None if (20 <= i < 30 or 140 <= i < 155) else v
    for i, v in enumerate(values_full)
]
vals = np.column_stack([
    [v if v is not None else np.nan for v in sensor_a],
    [v if v is not None else np.nan for v in sensor_b],
    [v if v is not None else np.nan for v in sensor_c],
])
table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timestamps=week_hours,
    values=vals,
    names=["sensor_A", "sensor_B", "sensor_C"],
    units=["MW", "MW", "MW"],
)
table.coverage_bar()
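Because the table stores missing readings as np.nan, each row of its coverage bar corresponds to counting non-NaN entries down one column. A minimal numpy sketch of that per-column summary; column_coverage is a hypothetical helper rather than a TimeDataModel method, and the constant 100.0 stands in for the random values above.

```python
import numpy as np


def column_coverage(values, names):
    """Hypothetical helper: fraction of non-NaN entries in each column."""
    present = ~np.isnan(values)       # True where a reading exists
    fractions = present.mean(axis=0)  # average down the time axis (rows)
    return {name: float(f) for name, f in zip(names, fractions)}


hours = np.arange(168)
# Same gap layout as sensor_A/B/C; the constant 100.0 stands in for real readings
demo_table = np.column_stack([
    np.where((hours >= 50) & (hours < 70), np.nan, 100.0),
    np.where((hours >= 100) & (hours < 130), np.nan, 100.0),
    np.where(((hours >= 20) & (hours < 30)) | ((hours >= 140) & (hours < 155)), np.nan, 100.0),
])
for name, frac in column_coverage(demo_table, ["sensor_A", "sensor_B", "sensor_C"]).items():
    print(f"{name}: {frac:.1%} present")
```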
Coverage bars on a TimeSeriesCube
Cubes show one bar per label in the non-time dimension.
[5]:
cube_data = np.array([
    [v if v is not None else np.nan for v in sensor_a],
    [v if v is not None else np.nan for v in sensor_b],
    [v if v is not None else np.nan for v in sensor_c],
])
cube = tdm.TimeSeriesCube(
    tdm.Frequency.PT1H,
    dimensions=[
        tdm.Dimension("sensor", ["A", "B", "C"]),
        tdm.Dimension("valid_time", week_hours),
    ],
    values=cube_data,
    name="sensor_grid",
    unit="MW",
)
cube.coverage_bar()
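A cube is addressed by named dimensions instead of fixed columns, so one bar per label means reducing over whichever axis holds time. The sketch below looks that axis up by name and averages presence along it. The (name, labels) tuples only mimic the Dimension objects above; coverage_by_label and this layout are assumptions for illustration, not TimeDataModel internals.

```python
import numpy as np


def coverage_by_label(values, dims, time_dim="valid_time"):
    """Hypothetical helper for a 2-D cube: coverage per label of the non-time dimension.

    dims lists (name, labels) pairs in the same order as the array axes.
    """
    names = [name for name, _ in dims]
    time_axis = names.index(time_dim)
    label_axis = 1 - time_axis                 # the other axis of a 2-D array
    labels = dims[label_axis][1]
    fractions = (~np.isnan(values)).mean(axis=time_axis)
    return dict(zip(labels, fractions.tolist()))


hours = np.arange(168)
demo_cube = np.array([
    np.where((hours >= 50) & (hours < 70), np.nan, 100.0),    # sensor A
    np.where((hours >= 100) & (hours < 130), np.nan, 100.0),  # sensor B
    np.where(((hours >= 20) & (hours < 30)) | ((hours >= 140) & (hours < 155)), np.nan, 100.0),  # sensor C
])
# Hour offsets stand in for the week_hours timestamps
demo_dims = [("sensor", ["A", "B", "C"]), ("valid_time", list(range(168)))]
print(coverage_by_label(demo_cube, demo_dims))
```

Looking the axis up by name means the same helper works if the array is stored time-first; only the order of `dims` has to match the array.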
Coverage bars on a TimeSeriesCollection
Collections map all series onto a shared global time range, so you can compare coverage across series that span different periods, such as the full-week ts_sensor and the 72-hour ts_short below.
[6]:
ts_short = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=week_hours[:72],
    values=values_full[:72],
    name="short_range",
    unit="MW",
)
collection = tdm.TimeSeriesCollection(
    [ts_sensor, ts_short],
    name="Sensor comparison",
)
collection.coverage_bar()
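Conceptually, a shared global time range is the union of every member's span, sampled at the common frequency; each series is then marked present or absent at every grid point, and anything outside a series' own span counts as absent. A minimal sketch with plain dicts, assuming all series share one fixed step (one hour here); align_to_global_grid is hypothetical and not part of TimeDataModel.

```python
from datetime import datetime, timedelta, timezone


def align_to_global_grid(series, step):
    """Hypothetical helper: mark each series present/absent on a shared grid.

    series maps a name to {timestamp: value}; the grid runs from the earliest
    to the latest timestamp across all series in increments of `step`.
    """
    start = min(min(points) for points in series.values())
    end = max(max(points) for points in series.values())
    n = (end - start) // step + 1                        # timedelta // timedelta -> int
    grid = [start + i * step for i in range(n)]
    masks = {
        name: [points.get(t) is not None for t in grid]  # absent key or None -> False
        for name, points in series.items()
    }
    return grid, masks


hour = timedelta(hours=1)
t0 = datetime(2024, 1, 15, tzinfo=timezone.utc)
demo_series = {
    # a full week with hours 50-69 missing, like ts_sensor
    "sensor_A": {t0 + i * hour: (None if 50 <= i < 70 else 1.0) for i in range(168)},
    # only the first 72 hours, like ts_short
    "short_range": {t0 + i * hour: 1.0 for i in range(72)},
}
grid, masks = align_to_global_grid(demo_series, hour)
for name, mask in masks.items():
    print(f"{name}: {sum(mask)}/{len(grid)} grid points present")
```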
Validation
validate() checks that timestamps are strictly increasing and that the step between consecutive timestamps matches the declared frequency. It returns a list of warning strings; an empty list means both checks passed.
[7]:
warnings = ts_sensor.validate()
print(f"Warnings for ts_sensor: {warnings}")
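The two checks can be sketched in a few lines to make their semantics concrete. check_timestamps is a hypothetical stand-in, not TimeDataModel's validate(), and its warning wording is invented: any non-positive step is reported as a monotonicity violation, and any other step that differs from the expected timedelta is reported as a frequency mismatch. It is applied here to the same timestamps used in the bad-data example below.

```python
from datetime import datetime, timedelta, timezone


def check_timestamps(timestamps, step):
    """Hypothetical re-implementation of the two checks; [] means both passed."""
    warnings = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta <= timedelta(0):
            # duplicates (delta == 0) and backwards jumps both break strict ordering
            warnings.append(f"not strictly increasing at {cur.isoformat()}")
        elif delta != step:
            warnings.append(f"step {delta} != {step} between {prev.isoformat()} and {cur.isoformat()}")
    return warnings


hour = timedelta(hours=1)
demo_bad = [datetime(2024, 1, 15, h, tzinfo=timezone.utc) for h in (0, 1, 1, 4, 5)]
for w in check_timestamps(demo_bad, hour):
    print("WARNING:", w)
```

A fixed timedelta only models fixed-length frequencies such as PT1H; calendar frequencies like months have variable length and would need calendar-aware arithmetic instead.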
Catching problems
Let’s create a series with intentionally bad timestamps to trigger validation warnings.
[8]:
bad_timestamps = [
    datetime(2024, 1, 15, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 1, tzinfo=timezone.utc),  # duplicate!
    datetime(2024, 1, 15, 4, tzinfo=timezone.utc),  # gap: skipped hours 2-3
    datetime(2024, 1, 15, 5, tzinfo=timezone.utc),
]
ts_bad = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=bad_timestamps,
    values=[10.0, 20.0, 30.0, 40.0, 50.0],
    name="bad_data",
)
for w in ts_bad.validate():
    print(f"  WARNING: {w}")
Practical example: multi-sensor data feed audit
Imagine you receive hourly data from five turbine sensors, each with a different pattern of outages. A coverage bar on a collection shows at a glance which feeds are reliable.
[9]:
gap_ranges = {
    "turbine_1": [],
    "turbine_2": [(30, 45)],
    "turbine_3": [(10, 20), (80, 100)],
    "turbine_4": [(0, 50)],
    "turbine_5": [(60, 65), (120, 130), (150, 160)],
}
sensors = []
for name, gaps in gap_ranges.items():
    vals = values_full.copy()
    for start, end in gaps:
        for i in range(start, end):
            vals[i] = None
    sensors.append(
        tdm.TimeSeries(
            tdm.Frequency.PT1H,
            timestamps=week_hours,
            values=vals,
            name=name,
            unit="MW",
        )
    )
audit = tdm.TimeSeriesCollection(sensors, name="Turbine fleet audit")
audit.coverage_bar()
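The visual audit can be complemented by a numeric ranking that sorts sensors from least to most available. A minimal sketch in plain Python that reuses the gap-range layout from the cell above; rank_availability and the 90% availability threshold are illustrative assumptions, not part of TimeDataModel.

```python
def rank_availability(gap_ranges, n_points, threshold=0.9):
    """Hypothetical helper: (name, availability, meets_threshold), worst first."""
    rows = []
    for name, gaps in gap_ranges.items():
        missing = set()
        for start, end in gaps:
            missing.update(range(start, end))  # half-open ranges; overlaps count once
        availability = 1 - len(missing) / n_points
        rows.append((name, availability, availability >= threshold))
    return sorted(rows, key=lambda row: row[1])


fleet_gaps = {
    "turbine_1": [],
    "turbine_2": [(30, 45)],
    "turbine_3": [(10, 20), (80, 100)],
    "turbine_4": [(0, 50)],
    "turbine_5": [(60, 65), (120, 130), (150, 160)],
}
for name, availability, ok in rank_availability(fleet_gaps, n_points=168):
    print(f"{name}: {availability:.1%} {'OK' if ok else 'CHECK'}")
```

Collecting missing indices in a set keeps the count correct even if two gap ranges overlap.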
Summary
coverage_bar() is available on TimeSeries, TimeSeriesTable, TimeSeriesCube, and TimeSeriesCollection. It renders as a color-coded SVG in notebooks and as Unicode blocks in terminals.
validate() catches timestamps that are not strictly increasing (including duplicates) and steps that do not match the declared frequency.
has_missing is a quick boolean check for any gaps.
Next up: nb_08 demonstrates I/O and interoperability with pandas, numpy, polars, JSON, and CSV.