Unit Handling and Validation

TimeDataModel treats units, data types, and validation as first-class concerns.
This notebook covers:
  1. Setting and inspecting units on TimeSeriesList and TimeSeriesTable
  2. Converting between compatible units with convert_unit()
  3. Automatic unit conversion in arithmetic operations
  4. Resolving units to pint objects with pint_unit
  5. Validating timestamps and frequency with validate()
  6. Using DataType, TimeSeriesType, and custom attributes

[1]:
from datetime import datetime, timedelta, timezone

import numpy as np

import timedatamodel as tdm

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
rng = np.random.default_rng(42)

Setting units on a TimeSeriesList

The unit parameter is a free-form string. It appears in the repr and is carried through all operations.

[2]:
wind = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=(8 + rng.normal(0, 2, 24)).tolist(),
    name="wind_speed",
    unit="m/s",
)
wind
[2]:
TimeSeriesList
Name       wind_speed
Length     24
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       m/s

timestamp         wind_speed
2024-01-15 00:00  8.60943
2024-01-15 01:00  5.92003
2024-01-15 02:00  9.5009
...               ...
2024-01-15 21:00  6.63814
2024-01-15 22:00  10.4451
2024-01-15 23:00  7.69094
[3]:
print(f"Unit: {wind.unit}")
Unit: m/s

Converting units with convert_unit()

convert_unit() uses pint under the hood to convert values.
It returns a new TimeSeriesList — the original is unchanged.
This requires the optional pint extra: pip install timedatamodel[pint]
[4]:
wind_kmh = wind.convert_unit("km/h")
wind_knot = wind.convert_unit("knot")

print(f"Original:  {wind.unit:5s}  mean={np.nanmean(wind.arr):.2f}")
print(f"Converted: {wind_kmh.unit:5s}  mean={np.nanmean(wind_kmh.arr):.2f}")
print(f"Converted: {wind_knot.unit:5s}  mean={np.nanmean(wind_knot.arr):.2f}")
Original:  m/s    mean=7.96
Converted: km/h   mean=28.66
Converted: knot   mean=15.48
[5]:
energy_kwh = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=(500 + rng.normal(0, 50, 24)).tolist(),
    name="energy",
    unit="kWh",
)

energy_mwh = energy_kwh.convert_unit("MWh")
energy_j = energy_kwh.convert_unit("J")

print(f"kWh: mean={np.nanmean(energy_kwh.arr):.1f}")
print(f"MWh: mean={np.nanmean(energy_mwh.arr):.4f}")
print(f"J:   mean={np.nanmean(energy_j.arr):.0f}")
kWh: mean=508.9
MWh: mean=0.5089
J:   mean=1832028606
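
The numbers above can be checked by hand. The following is a minimal sketch of factor-based conversion, assuming a hypothetical TO_SI table and convert() helper that are not part of timedatamodel (the library delegates the real work to pint). Each unit is expressed as a multiple of its SI base unit, and a conversion goes from the source unit to SI and then to the target unit.

```python
# Hypothetical factor table (not part of timedatamodel): the size of one unit
# expressed in the SI base unit of its quantity.
TO_SI = {
    "m/s": 1.0,                # SI base unit for speed
    "km/h": 1000.0 / 3600.0,   # 1 km/h = 0.2777... m/s
    "J": 1.0,                  # SI base unit for energy
    "kWh": 3.6e6,              # 1 kWh = 1000 W * 3600 s
    "MWh": 3.6e9,              # 1 MWh = 1000 kWh
}


def convert(value: float, src: str, dst: str) -> float:
    """Convert by going through the SI base unit: src -> SI -> dst."""
    return value * TO_SI[src] / TO_SI[dst]


print(round(convert(7.96, "m/s", "km/h"), 2))  # speed: multiply by 3.6
print(convert(500.0, "kWh", "MWh"))            # energy: divide by 1000
print(convert(1.0, "kWh", "J"))                # energy: multiply by 3.6e6
```

This sketch deliberately skips any dimension check, so it would happily turn m/s into J. A real unit library rejects that kind of request.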

Incompatible units raise an error

[6]:
try:
    wind.convert_unit("MW")
except ValueError as e:
    print(f"Error: {e}")
Error: cannot convert 'm/s' to 'MW': incompatible dimensions
[7]:
no_unit = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=rng.normal(0, 1, 24).tolist(),
    name="dimensionless",
)

try:
    no_unit.convert_unit("MW")
except ValueError as e:
    print(f"Error: {e}")
Error: cannot convert units: source unit is None
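
One way to picture the compatibility rule is to compare dimensions as exponents of length, mass, and time. The sketch below is an illustration under that assumption; the DIMENSIONS table and check_convertible() helper are hypothetical, and the error messages simply mirror the ones printed above rather than the library's actual code path.

```python
from typing import Optional

# Hypothetical table: each unit's dimension as exponents of (length, mass, time).
# Illustration only; pint derives this information from its own unit registry.
DIMENSIONS = {
    "m/s": (1, 0, -1),    # speed = length / time
    "km/h": (1, 0, -1),
    "knot": (1, 0, -1),
    "kW": (2, 1, -3),     # power = mass * length^2 / time^3
    "MW": (2, 1, -3),
    "J": (2, 1, -2),      # energy = power * time
    "kWh": (2, 1, -2),
}


def check_convertible(src: Optional[str], dst: str) -> None:
    """Raise ValueError unless src and dst share the same dimension."""
    if src is None:
        raise ValueError("cannot convert units: source unit is None")
    if DIMENSIONS[src] != DIMENSIONS[dst]:
        raise ValueError(
            f"cannot convert {src!r} to {dst!r}: incompatible dimensions"
        )


check_convertible("m/s", "knot")  # same dimension, passes silently
for src, dst in [("m/s", "MW"), (None, "MW")]:
    try:
        check_convertible(src, dst)
    except ValueError as e:
        print(f"Error: {e}")
```

Speed and power differ in every exponent, so no scale factor can relate them. A series without a unit has no dimension to compare at all, which is why it fails earlier with a different message.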

Automatic unit conversion in arithmetic

When you add or subtract two TimeSeriesList objects with compatible units, the right operand’s values are automatically converted to the left operand’s unit before the operation.

[8]:
power_mw = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(100 + rng.normal(0, 10, 24)).tolist(),
    name="plant_a",
    unit="MW",
)

power_kw = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(50000 + rng.normal(0, 5000, 24)).tolist(),
    name="plant_b",
    unit="kW",
)

total = power_mw + power_kw
print(f"Result unit: {total.unit}")
print(f"plant_a mean: {np.nanmean(power_mw.arr):.1f} MW")
print(f"plant_b mean: {np.nanmean(power_kw.arr):.1f} kW = {np.nanmean(power_kw.arr)/1000:.1f} MW")
print(f"total mean:   {np.nanmean(total.arr):.1f} MW")
Result unit: MW
plant_a mean: 98.4 MW
plant_b mean: 48948.6 kW = 48.9 MW
total mean:   147.4 MW

Mismatched unit presence (one has a unit, the other doesn’t) raises an error:

[9]:
try:
    _ = power_mw + no_unit
except ValueError as e:
    print(f"Error: {e}")
Error: unit mismatch: one operand has unit='MW' and the other has unit=None
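
The two rules above (convert to the left operand's unit, and reject operands where only one side has a unit) can be sketched with a small stand-in class. Series and TO_WATT below are hypothetical names for illustration and are not the timedatamodel implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical factors to watts; the real library resolves these through pint.
TO_WATT = {"W": 1.0, "kW": 1e3, "MW": 1e6}


@dataclass
class Series:
    """Minimal stand-in for a unit-carrying series (not the tdm class)."""

    values: List[float]
    unit: Optional[str] = None

    def __add__(self, other: "Series") -> "Series":
        # Reject the case where only one operand carries a unit.
        if (self.unit is None) != (other.unit is None):
            raise ValueError(
                f"unit mismatch: one operand has unit={self.unit!r} "
                f"and the other has unit={other.unit!r}"
            )
        # Scale the right operand into the left operand's unit before adding.
        factor = 1.0
        if self.unit is not None:
            factor = TO_WATT[other.unit] / TO_WATT[self.unit]
        summed = [a + b * factor for a, b in zip(self.values, other.values)]
        return Series(summed, self.unit)


total = Series([100.0, 98.0], "MW") + Series([50000.0, 49000.0], "kW")
print(total.unit, total.values)
```

Keeping the left operand's unit makes the result predictable from the expression alone: swapping the operands gives the same physical quantity expressed in kW instead of MW.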

Resolving units with pint_unit

The pint_unit property returns a pint.Unit object for programmatic inspection.

[10]:
pu = power_mw.pint_unit
print(f"pint unit: {pu}")
print(f"type:      {type(pu).__name__}")
pint unit: megawatt
type:      Unit

Units on TimeSeriesTable

TimeSeriesTable supports per-column units via the units parameter.
convert_unit() can target a single column or all columns.
[11]:
table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=np.column_stack([
        100 + rng.normal(0, 15, 24),
        8 + rng.normal(0, 2, 24),
    ]),
    names=["power", "wind_speed"],
    units=["MW", "m/s"],
)
table
[11]:
TimeSeriesTable
Name       unnamed
Columns    power, wind_speed
Length     24 × 2
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW, m/s

timestamp         power    wind_speed
2024-01-15 00:00  84.6475  6.37412
2024-01-15 01:00  102.689  7.16929
2024-01-15 02:00  103.3    6.77581
...               ...      ...
2024-01-15 21:00  85.1569  6.19415
2024-01-15 22:00  68.0193  9.86315
2024-01-15 23:00  104.016  8.7699
[12]:
table_kw = table.convert_unit("kW", column="power")

print(f"Original units: {table.units}")
print(f"After convert:  {table_kw.units}")
print(f"Power mean: {table.arr[:, 0].mean():.1f} MW → {table_kw.arr[:, 0].mean():.1f} kW")
Original units: ['MW', 'm/s']
After convert:  ['kW', 'm/s']
Power mean: 100.3 MW → 100261.2 kW

Validating timestamps and frequency

validate() checks that timestamps are strictly increasing and match the declared frequency.
It returns a list of warning strings — an empty list means everything is consistent.
[13]:
good = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=rng.normal(0, 1, 24).tolist(),
    name="clean",
)

warnings = good.validate()
print(f"Warnings: {warnings}")
Warnings: []
[14]:
gap_timestamps = timestamps[:12] + timestamps[14:]
gap_values = rng.normal(0, 1, len(gap_timestamps)).tolist()

gapped = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=gap_timestamps,
    values=gap_values,
    name="has_gap",
)

for w in gapped.validate():
    print(f"⚠ {w}")
⚠ inconsistent frequency at index 12: expected 1:00:00, got 3:00:00
[15]:
bad_order = timestamps[:12] + [timestamps[13], timestamps[12]] + timestamps[14:]
bad_values = rng.normal(0, 1, len(bad_order)).tolist()

unordered = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=bad_order,
    values=bad_values,
    name="unordered",
)

for w in unordered.validate():
    print(f"⚠ {w}")
⚠ inconsistent frequency at index 12: expected 1:00:00, got 2:00:00
⚠ timestamps not strictly increasing at index 13: 2024-01-15 13:00:00+00:00 >= 2024-01-15 12:00:00+00:00
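
The checks can be pictured as a single pass over adjacent timestamp pairs. The following is a minimal sketch using only the standard library; the validate() function here is a hypothetical stand-in, and it reports every offending pair, so its output may differ from what the library prints for the same input.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def validate(timestamps: List[datetime], step: timedelta) -> List[str]:
    """Return one warning per adjacent pair that breaks ordering or spacing."""
    warnings = []
    for i in range(1, len(timestamps)):
        prev, cur = timestamps[i - 1], timestamps[i]
        if cur <= prev:
            # Ordering problem: the series goes backwards or repeats.
            warnings.append(
                f"timestamps not strictly increasing at index {i}: {prev} >= {cur}"
            )
        elif cur - prev != step:
            # Spacing problem: ordered, but not exactly one step apart.
            warnings.append(
                f"inconsistent frequency at index {i}: "
                f"expected {step}, got {cur - prev}"
            )
    return warnings


base = datetime(2024, 1, 15, tzinfo=timezone.utc)
hours = [base + timedelta(hours=i) for i in range(24)]
gapped = hours[:12] + hours[14:]  # drop 12:00 and 13:00

for w in validate(gapped, timedelta(hours=1)):
    print(w)
```

For the gapped input this yields a single warning at index 12, where the step jumps from one hour to three. Returning strings instead of raising lets the caller decide whether a gap is fatal or merely worth logging.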

Detecting missing values

The has_missing property returns True when any value is missing, that is, passed as None and stored as NaN.

[16]:
values_with_gaps = rng.normal(100, 10, 24).tolist()
values_with_gaps[5] = None
values_with_gaps[18] = None

sparse = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=values_with_gaps,
    name="sparse",
    unit="MW",
)

print(f"has_missing: {sparse.has_missing}")
print(f"NaN count:   {np.isnan(sparse.arr).sum()}")
print(f"Length:      {len(sparse)}")
has_missing: True
NaN count:   2
Length:      24
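
Missing entries stay in the series, which is why the length is still 24 and why the earlier cells use np.nanmean rather than a plain mean. The sketch below shows the same idea with the standard library only; the variable names are illustrative and nothing here calls timedatamodel.

```python
import math

raw = [101.2, None, 98.7, None, 103.4]

# Map None to NaN so every slot holds a float and positions are preserved.
values = [math.nan if v is None else float(v) for v in raw]

has_missing = any(math.isnan(v) for v in values)
nan_count = sum(math.isnan(v) for v in values)

# A plain mean is poisoned by NaN; skipping NaN is what np.nanmean does.
plain_mean = sum(values) / len(values)
present = [v for v in values if not math.isnan(v)]
mean_present = sum(present) / len(present)

print(has_missing, nan_count, len(values))
print(math.isnan(plain_mean), round(mean_present, 2))
```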

DataType — classifying your data

The DataType enum communicates what kind of data a series holds.

[17]:
print("Available DataType values:")
for dt in tdm.DataType:
    print(f"  {dt.value}")
Available DataType values:
  ACTUAL
  OBSERVATION
  DERIVED
  CALCULATED
  ESTIMATION
  FORECAST
  PREDICTION
  SCENARIO
  SIMULATION
  RECONSTRUCTION
  REFERENCE
  BASELINE
  BENCHMARK
  IDEAL
[18]:
measured = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(100 + rng.normal(0, 10, 24)).tolist(),
    name="wind_measured",
    unit="MW",
    data_type=tdm.DataType.OBSERVATION,
)

forecast = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(105 + rng.normal(0, 15, 24)).tolist(),
    name="wind_forecast",
    unit="MW",
    data_type=tdm.DataType.FORECAST,
)

print(f"{measured.name}: data_type={measured.data_type}")
print(f"{forecast.name}: data_type={forecast.data_type}")
wind_measured: data_type=OBSERVATION
wind_forecast: data_type=FORECAST

TimeSeriesType — structural classification

TimeSeriesType describes the structural nature of the series.

[19]:
print("Available TimeSeriesType values:")
for tst in tdm.TimeSeriesType:
    print(f"  {tst.value}")
Available TimeSeriesType values:
  FLAT
  OVERLAPPING
[20]:
flat = tdm.TimeSeriesList(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=rng.normal(0, 1, 24).tolist(),
    name="flat_series",
    timeseries_type=tdm.TimeSeriesType.FLAT,
)
print(f"timeseries_type: {flat.timeseries_type}")
timeseries_type: FLAT

Custom attributes

The attributes dict stores arbitrary key-value metadata — source system, fuel type, model version, etc.

[21]:
rich = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=(80 + rng.normal(0, 10, 24)).tolist(),
    name="wind_farm_alpha",
    unit="MW",
    description="Measured output from Wind Farm Alpha",
    data_type=tdm.DataType.OBSERVATION,
    timeseries_type=tdm.TimeSeriesType.FLAT,
    attributes={
        "source": "SCADA",
        "fuel": "wind",
        "capacity_mw": "120",
        "operator": "NorthWind Energy",
    },
)

print(f"Attributes: {rich.attributes}")
print(f"Capacity:   {rich.attributes['capacity_mw']} MW")
Attributes: {'source': 'SCADA', 'fuel': 'wind', 'capacity_mw': '120', 'operator': 'NorthWind Energy'}
Capacity:   120 MW

Frequency enum

Frequency is a StrEnum with helpers for calendar-based vs fixed-duration frequencies.

[22]:
# Column width 9 fits the longest header ("Frequency") so rows line up.
print(f"{'Frequency':<9s}  {'timedelta':<22s}  {'calendar?'}")
print("-" * 45)
for f in tdm.Frequency:
    td = f.to_timedelta()
    td_str = str(td) if td else "-"
    print(f"{f.value:<9s}  {td_str:<22s}  {f.is_calendar_based}")
Frequency  timedelta               calendar?
---------------------------------------------
P1Y        -                       True
P3M        -                       True
P1M        -                       True
P1W        7 days, 0:00:00         False
P1D        1 day, 0:00:00          False
PT1H       1:00:00                 False
PT30M      0:30:00                 False
PT15M      0:15:00                 False
PT10M      0:10:00                 False
PT5M       0:05:00                 False
PT1M       0:01:00                 False
PT1S       0:00:01                 False
NONE       -                       False
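
The reason monthly, quarterly, and yearly frequencies have no timedelta is that their length depends on the calendar. A short standard-library check makes this concrete; nothing in it depends on timedatamodel.

```python
import calendar
from datetime import timedelta

# monthrange() returns (weekday of the first day, number of days in the month).
days_2024 = [calendar.monthrange(2024, m)[1] for m in range(1, 13)]
print(days_2024)

# A monthly step maps to several different durations within one year...
monthly_steps = {timedelta(days=d) for d in days_2024}
print(len(monthly_steps))  # 3 distinct lengths: 29, 30 and 31 days

# ...whereas an hourly step is always the same timedelta.
hourly_step = timedelta(hours=1)
print(hourly_step)
```

Because no single duration describes a month, stepping through a calendar-based series needs calendar arithmetic rather than repeated addition of a fixed timedelta.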

Metadata survives serialization

Units, data types, attributes, and other metadata round-trip through JSON.

[23]:
json_str = rich.to_json()
restored = tdm.TimeSeriesList.from_json(json_str)

print(f"unit:            {restored.unit}")
print(f"data_type:       {restored.data_type}")
print(f"timeseries_type: {restored.timeseries_type}")
print(f"attributes:      {restored.attributes}")
print(f"description:     {restored.description}")
unit:            MW
data_type:       OBSERVATION
timeseries_type: FLAT
attributes:      {'source': 'SCADA', 'fuel': 'wind', 'capacity_mw': '120', 'operator': 'NorthWind Energy'}
description:     Measured output from Wind Farm Alpha
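
To see why this kind of metadata round-trips cleanly, it helps to note that strings, numbers, nulls, lists, and dicts all map directly onto JSON, and timestamps can travel as ISO 8601 strings. The payload layout below is a hypothetical illustration; the schema that to_json() actually emits may differ.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload layout; timedatamodel's actual JSON schema may differ.
payload = {
    "name": "wind_farm_alpha",
    "unit": "MW",
    "data_type": "OBSERVATION",
    "attributes": {"source": "SCADA", "capacity_mw": "120"},
    "timestamps": [
        datetime(2024, 1, 15, h, tzinfo=timezone.utc).isoformat()
        for h in range(3)
    ],
    "values": [81.5, None, 79.2],  # None serializes as JSON null
}

restored = json.loads(json.dumps(payload))
print(restored == payload)  # metadata and values survive unchanged

# ISO 8601 strings parse back into timezone-aware datetimes.
first = datetime.fromisoformat(restored["timestamps"][0])
print(first)
```

Storing attribute values as strings, as the capacity_mw example does, sidesteps any ambiguity about numeric types after a round trip, at the cost of parsing them back when a number is needed.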

Summary

Feature                     API
--------------------------  --------------------------------------------------
Set unit                    TimeSeriesList(..., unit="MW")
Convert unit                ts.convert_unit("kW") (returns a new series)
Auto-convert in arithmetic  ts_mw + ts_kw converts to left operand’s unit
Pint integration            ts.pint_unit (resolves to pint.Unit)
Per-column units            TimeSeriesTable(..., units=["MW", "m/s"])
Column conversion           table.convert_unit("kW", column="power")
Validate timestamps         ts.validate() → list of warning strings
Missing values              ts.has_missing
Data classification         DataType.OBSERVATION, .FORECAST, .SCENARIO, …
Structural type             TimeSeriesType.FLAT, .OVERLAPPING
Custom metadata             attributes={"key": "value"}
Frequency info              Frequency.PT1H.to_timedelta(), .is_calendar_based

Next up: nb_04 covers arithmetic operations and comparisons on TimeSeriesList.