Unit Handling and Validation

TimeDataModel treats units, data types, and validation as first-class concerns.
This notebook covers:
  1. Setting and inspecting units on TimeSeries and TimeSeriesTable
  2. Converting between compatible units with convert_unit()
  3. Automatic unit conversion in arithmetic operations
  4. Resolving units to pint objects with pint_unit
  5. Validating timestamps and frequency with validate()
  6. Using DataType, TimeSeriesType, and custom attributes

[1]:
from datetime import datetime, timedelta, timezone

import numpy as np

import timedatamodel as tdm

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
rng = np.random.default_rng(42)

Setting units on a TimeSeries

The unit parameter is a free-form string. It appears in the repr and is carried through all operations.

[2]:
wind = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=(8 + rng.normal(0, 2, 24)).tolist(),
    name="wind_speed",
    unit="m/s",
)
wind
[3]:
print(f"Unit: {wind.unit}")

Converting units with convert_unit()

convert_unit() uses pint under the hood to convert values.
It returns a new TimeSeries; the original is unchanged.
pint is installed through the package's pint extra:
pip install timedatamodel[pint]
[4]:
wind_kmh = wind.convert_unit("km/h")
wind_knot = wind.convert_unit("knot")

print(f"Original:  {wind.unit:5s}  mean={np.nanmean(wind.arr):.2f}")
print(f"Converted: {wind_kmh.unit:5s}  mean={np.nanmean(wind_kmh.arr):.2f}")
print(f"Converted: {wind_knot.unit:5s}  mean={np.nanmean(wind_knot.arr):.2f}")
[5]:
energy_kwh = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=(500 + rng.normal(0, 50, 24)).tolist(),
    name="energy",
    unit="kWh",
)

energy_mwh = energy_kwh.convert_unit("MWh")
energy_j = energy_kwh.convert_unit("J")

print(f"kWh: mean={np.nanmean(energy_kwh.arr):.1f}")
print(f"MWh: mean={np.nanmean(energy_mwh.arr):.4f}")
print(f"J:   mean={np.nanmean(energy_j.arr):.0f}")
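
The conversions in this section amount to rescaling every value by a fixed factor. The following is a minimal standard-library sketch of that arithmetic, not how timedatamodel or pint implement it. The TO_BASE table and the convert() helper are hypothetical names introduced for illustration, and the sketch does no dimension checking.

```python
# Hypothetical factor table: each unit expressed in a common base unit
# (m/s for speed, joule for energy). Not part of timedatamodel.
TO_BASE = {
    "m/s": 1.0,
    "km/h": 1000.0 / 3600.0,
    "knot": 1852.0 / 3600.0,  # one nautical mile (1852 m) per hour
    "J": 1.0,
    "kWh": 3.6e6,
    "MWh": 3.6e9,
}


def convert(values, src, dst):
    """Return a new list with values rescaled from src to dst."""
    factor = TO_BASE[src] / TO_BASE[dst]
    return [v * factor for v in values]


wind_ms = [8.0, 10.0]
wind_kmh = convert(wind_ms, "m/s", "km/h")    # input list is left untouched
energy_mwh = convert([500.0], "kWh", "MWh")
energy_j = convert([1.0], "kWh", "J")
```

Because convert() builds a new list, the input stays as it was, which mirrors the "returns a new TimeSeries" behaviour described above.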

Incompatible units raise an error

[6]:
try:
    wind.convert_unit("MW")
except ValueError as e:
    print(f"Error: {e}")
[7]:
no_unit = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=rng.normal(0, 1, 24).tolist(),
    name="dimensionless",
)

try:
    no_unit.convert_unit("MW")
except ValueError as e:
    print(f"Error: {e}")
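
The two failure cases above (a unit of a different physical dimension, and a series with no unit at all) can be pictured as a check that runs before any rescaling. This is a rough sketch under stated assumptions: the DIMENSION table, check_convertible(), and the message wording are all hypothetical, and only the use of ValueError is taken from the cells above.

```python
# Hypothetical lookup from unit string to physical dimension.
DIMENSION = {
    "m/s": "speed", "km/h": "speed", "knot": "speed",
    "kW": "power", "MW": "power",
    "kWh": "energy", "MWh": "energy", "J": "energy",
}


def check_convertible(src, dst):
    """Raise ValueError unless a value in src can be expressed in dst."""
    if src is None:
        raise ValueError(f"cannot convert to {dst!r}: series has no unit")
    if DIMENSION[src] != DIMENSION[dst]:
        raise ValueError(
            f"cannot convert {src!r} ({DIMENSION[src]}) "
            f"to {dst!r} ({DIMENSION[dst]})"
        )


def error_message(src, dst):
    """Return the error text for a conversion, or None if it is allowed."""
    try:
        check_convertible(src, dst)
    except ValueError as exc:
        return str(exc)
    return None


ok = error_message("m/s", "km/h")
wrong_dimension = error_message("m/s", "MW")
missing_unit = error_message(None, "MW")
```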

Automatic unit conversion in arithmetic

When you add or subtract two TimeSeries with compatible units, values are automatically converted to the left operand’s unit.

[8]:
power_mw = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(100 + rng.normal(0, 10, 24)).tolist(),
    name="plant_a",
    unit="MW",
)

power_kw = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(50000 + rng.normal(0, 5000, 24)).tolist(),
    name="plant_b",
    unit="kW",
)

total = power_mw + power_kw
print(f"Result unit: {total.unit}")
print(f"plant_a mean: {np.nanmean(power_mw.arr):.1f} MW")
print(f"plant_b mean: {np.nanmean(power_kw.arr):.1f} kW = {np.nanmean(power_kw.arr)/1000:.1f} MW")
print(f"total mean:   {np.nanmean(total.arr):.1f} MW")

Mismatched unit presence (one has a unit, the other doesn’t) raises an error:

[9]:
try:
    _ = power_mw + no_unit
except ValueError as e:
    print(f"Error: {e}")
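
As an illustration of the two rules in this section (convert to the left operand's unit, and reject operands where only one side carries a unit), here is a small standard-library sketch. The Series class and the TO_WATT table are hypothetical stand-ins and do not reflect how timedatamodel implements arithmetic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical scale factors to watts.
TO_WATT = {"W": 1.0, "kW": 1e3, "MW": 1e6}


@dataclass(frozen=True)
class Series:
    values: tuple
    unit: Optional[str] = None

    def __add__(self, other):
        # Reject the case where exactly one operand has a unit.
        if (self.unit is None) != (other.unit is None):
            raise ValueError("cannot combine a series with a unit and one without")
        if self.unit is None or self.unit == other.unit:
            factor = 1.0
        else:
            # Rescale the right operand into the left operand's unit.
            factor = TO_WATT[other.unit] / TO_WATT[self.unit]
        summed = tuple(a + b * factor for a, b in zip(self.values, other.values))
        return Series(summed, self.unit)


plant_a = Series((100.0, 110.0), "MW")
plant_b = Series((50000.0, 40000.0), "kW")
total = plant_a + plant_b  # expressed in MW, the left operand's unit
```

Swapping the operands (plant_b + plant_a) would give the same physical total expressed in kW instead.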

Resolving units with pint_unit

The pint_unit property returns a pint.Unit object for programmatic inspection.

[10]:
pu = power_mw.pint_unit
print(f"pint unit: {pu}")
print(f"type:      {type(pu).__name__}")

Units on TimeSeriesTable

TimeSeriesTable supports per-column units via the units parameter.
convert_unit() can target a single column or all columns.
[11]:
table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=np.column_stack([
        100 + rng.normal(0, 15, 24),
        8 + rng.normal(0, 2, 24),
    ]),
    names=["power", "wind_speed"],
    units=["MW", "m/s"],
)
table
[12]:
table_kw = table.convert_unit("kW", column="power")

print(f"Original units: {table.units}")
print(f"After convert:  {table_kw.units}")
print(f"Power mean: {table.arr[:, 0].mean():.1f} MW → {table_kw.arr[:, 0].mean():.1f} kW")
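
Converting a single column means rescaling one slice of the value matrix and updating only that column's unit. The sketch below shows the idea with plain lists. convert_column() and TO_WATT are hypothetical helpers, not part of TimeSeriesTable.

```python
# Hypothetical scale factors to watts.
TO_WATT = {"W": 1.0, "kW": 1e3, "MW": 1e6}


def convert_column(rows, units, names, column, target):
    """Return (new_rows, new_units) with one named column rescaled to target."""
    idx = names.index(column)
    factor = TO_WATT[units[idx]] / TO_WATT[target]
    new_rows = [
        [v * factor if j == idx else v for j, v in enumerate(row)]
        for row in rows
    ]
    new_units = [target if j == idx else u for j, u in enumerate(units)]
    return new_rows, new_units


names = ["power", "wind_speed"]
units = ["MW", "m/s"]
rows = [[100.0, 8.0], [120.0, 9.5]]

# Only the "power" column changes; "wind_speed" keeps its values and unit.
rows_kw, units_kw = convert_column(rows, units, names, "power", "kW")
```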

Validating timestamps and frequency

validate() checks that timestamps are strictly increasing and match the declared frequency.
It returns a list of warning strings — an empty list means everything is consistent.
[13]:
good = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=rng.normal(0, 1, 24).tolist(),
    name="clean",
)

warnings = good.validate()
print(f"Warnings: {warnings}")
[14]:
gap_timestamps = timestamps[:12] + timestamps[14:]
gap_values = rng.normal(0, 1, len(gap_timestamps)).tolist()

gapped = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=gap_timestamps,
    values=gap_values,
    name="has_gap",
)

for w in gapped.validate():
    print(f"⚠ {w}")
[15]:
bad_order = timestamps[:12] + [timestamps[13], timestamps[12]] + timestamps[14:]
bad_values = rng.normal(0, 1, len(bad_order)).tolist()

unordered = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=bad_order,
    values=bad_values,
    name="unordered",
)

for w in unordered.validate():
    print(f"⚠ {w}")
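
The two checks described in this section (strictly increasing timestamps, and spacing that matches the declared frequency) can be sketched with the standard library alone. The validate() function below is a hypothetical stand-in, and its warning wording is made up. The number and text of the warnings that timedatamodel reports may differ.

```python
from datetime import datetime, timedelta, timezone


def validate(timestamps, step):
    """Return a list of warning strings; an empty list means no issues found."""
    warnings = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta <= timedelta(0):
            warnings.append(f"not strictly increasing at {cur.isoformat()}")
        elif delta != step:
            warnings.append(
                f"expected step {step}, got {delta} before {cur.isoformat()}"
            )
    return warnings


base = datetime(2024, 1, 15, tzinfo=timezone.utc)
hourly = [base + timedelta(hours=i) for i in range(24)]
gapped = hourly[:12] + hourly[14:]                                # two hours missing
swapped = hourly[:12] + [hourly[13], hourly[12]] + hourly[14:]    # one pair out of order

one_hour = timedelta(hours=1)
clean_warnings = validate(hourly, one_hour)
gap_warnings = validate(gapped, one_hour)
swap_warnings = validate(swapped, one_hour)
```

In this sketch a single swapped pair produces three warnings, because the neighbours on either side of the swap also end up with the wrong spacing.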

Detecting missing values

The has_missing property returns True when any value is missing. A None passed in values appears as NaN in the underlying array.

[16]:
values_with_gaps = rng.normal(100, 10, 24).tolist()
values_with_gaps[5] = None
values_with_gaps[18] = None

sparse = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=values_with_gaps,
    name="sparse",
    unit="MW",
)

print(f"has_missing: {sparse.has_missing}")
print(f"NaN count:   {np.isnan(sparse.arr).sum()}")
print(f"Length:      {len(sparse)}")
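
To make the None-to-NaN mapping concrete, here is a minimal sketch using only the standard library. to_float_list() and has_missing() are hypothetical helpers that mimic the behaviour described above. They are not timedatamodel functions.

```python
import math


def to_float_list(values):
    """Map None to NaN so the result is a homogeneous list of floats."""
    return [math.nan if v is None else float(v) for v in values]


def has_missing(arr):
    """Return True if any entry is NaN."""
    return any(math.isnan(v) for v in arr)


raw = [101.2, None, 98.7, None, 100.0]
arr = to_float_list(raw)
nan_count = sum(math.isnan(v) for v in arr)
```

Missing entries keep their position, so the length of the series is unchanged and only the NaN count reveals the gaps.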

DataType — classifying your data

The DataType enum communicates what kind of data a series holds.

[17]:
print("Available DataType values:")
for dt in tdm.DataType:
    print(f"  {dt.value}")
[18]:
measured = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(100 + rng.normal(0, 10, 24)).tolist(),
    name="wind_measured",
    unit="MW",
    data_type=tdm.DataType.MEASUREMENT,
)

forecast = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(105 + rng.normal(0, 15, 24)).tolist(),
    name="wind_forecast",
    unit="MW",
    data_type=tdm.DataType.FORECAST,
)

print(f"{measured.name}: data_type={measured.data_type}")
print(f"{forecast.name}: data_type={forecast.data_type}")

TimeSeriesType — structural classification

TimeSeriesType describes the structural nature of the series.

[19]:
print("Available TimeSeriesType values:")
for tst in tdm.TimeSeriesType:
    print(f"  {tst.value}")
[20]:
flat = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=rng.normal(0, 1, 24).tolist(),
    name="flat_series",
    timeseries_type=tdm.TimeSeriesType.FLAT,
)
print(f"timeseries_type: {flat.timeseries_type}")

Custom attributes

The attributes dict stores arbitrary key-value metadata — source system, fuel type, model version, etc.

[21]:
rich = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=(80 + rng.normal(0, 10, 24)).tolist(),
    name="wind_farm_alpha",
    unit="MW",
    description="Measured output from Wind Farm Alpha",
    data_type=tdm.DataType.MEASUREMENT,
    timeseries_type=tdm.TimeSeriesType.FLAT,
    attributes={
        "source": "SCADA",
        "fuel": "wind",
        "capacity_mw": "120",
        "operator": "NorthWind Energy",
    },
)

print(f"Attributes: {rich.attributes}")
print(f"Capacity:   {rich.attributes['capacity_mw']} MW")

Frequency enum

Frequency is a StrEnum with helpers for calendar-based vs fixed-duration frequencies.

[22]:
print(f"{'Frequency':<8s}  {'timedelta':<22s}  {'calendar?'}")
print("-" * 45)
for f in tdm.Frequency:
    td = f.to_timedelta()
    td_str = str(td) if td else "-"
    print(f"{f.value:<8s}  {td_str:<22s}  {f.is_calendar_based}")
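
The distinction between fixed-duration and calendar-based frequencies can be sketched as a string-valued enum whose to_timedelta() returns None when no fixed duration exists (a month has 28 to 31 days). The Freq class and its four members are hypothetical. The real Frequency enum may define different members and helpers.

```python
from datetime import timedelta
from enum import Enum


class Freq(str, Enum):
    """Hypothetical frequency enum keyed by ISO 8601 duration strings."""

    PT15M = "PT15M"
    PT1H = "PT1H"
    P1M = "P1M"
    P1Y = "P1Y"

    @property
    def is_calendar_based(self):
        # Months and years vary in length, so they have no fixed timedelta.
        return self in (Freq.P1M, Freq.P1Y)

    def to_timedelta(self):
        fixed = {
            Freq.PT15M: timedelta(minutes=15),
            Freq.PT1H: timedelta(hours=1),
        }
        return fixed.get(self)  # None for calendar-based members


hour_step = Freq.PT1H.to_timedelta()
month_step = Freq.P1M.to_timedelta()
```

Subclassing str keeps the sketch runnable on Python versions older than 3.11, where enum.StrEnum is not available.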

Metadata survives serialization

Units, data types, attributes, and other metadata round-trip through JSON.

[23]:
json_str = rich.to_json()
restored = tdm.TimeSeries.from_json(json_str)

print(f"unit:            {restored.unit}")
print(f"data_type:       {restored.data_type}")
print(f"timeseries_type: {restored.timeseries_type}")
print(f"attributes:      {restored.attributes}")
print(f"description:     {restored.description}")
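
A round trip of this kind comes down to serializing every metadata field next to the values and rebuilding the object from the parsed result. The sketch below shows the pattern with the standard json module. The Record class is hypothetical, and the JSON layout that timedatamodel actually produces is not shown in this notebook.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Record:
    """Hypothetical container holding values together with their metadata."""

    name: str
    unit: str
    values: list
    attributes: dict = field(default_factory=dict)

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


original = Record(
    name="wind_farm_alpha",
    unit="MW",
    values=[80.5, 79.1],
    attributes={"source": "SCADA", "capacity_mw": "120"},
)
restored = Record.from_json(original.to_json())
```

Comparing restored with original field by field is a quick way to confirm that nothing was dropped along the way.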

Summary

Feature                     API
--------------------------  --------------------------------------------------
Set unit                    TimeSeries(..., unit="MW")
Convert unit                ts.convert_unit("kW") (returns new series)
Auto-convert in arithmetic  ts_mw + ts_kw converts to left operand’s unit
Pint integration            ts.pint_unit (resolves to pint.Unit)
Per-column units            TimeSeriesTable(..., units=["MW", "m/s"])
Column conversion           table.convert_unit("kW", column="power")
Validate timestamps         ts.validate() → list of warning strings
Missing values              ts.has_missing
Data classification         DataType.MEASUREMENT, .FORECAST, .SCENARIO, …
Structural type             TimeSeriesType.FLAT, .OVERLAPPING
Custom metadata             attributes={"key": "value"}
Frequency info              Frequency.PT1H.to_timedelta(), .is_calendar_based

Next up: nb_04 covers arithmetic operations and comparisons on TimeSeries.