Unit Handling and Validation
- Setting and inspecting units on TimeSeries and TimeSeriesTable
- Converting between compatible units with convert_unit()
- Automatic unit conversion in arithmetic operations
- Resolving units to pint objects with pint_unit
- Validating timestamps and frequency with validate()
- Using DataType, TimeSeriesType, and custom attributes
[1]:
from datetime import datetime, timedelta, timezone
import numpy as np
import timedatamodel as tdm
base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
rng = np.random.default_rng(42)
Setting units on a TimeSeries
The unit parameter is a free-form string. It appears in the repr and is carried through all operations.
[2]:
wind = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=(8 + rng.normal(0, 2, 24)).tolist(),
    name="wind_speed",
    unit="m/s",
)
wind
[3]:
print(f"Unit: {wind.unit}")
Converting units with convert_unit()
convert_unit() uses pint under the hood to convert values. It returns a new TimeSeries; the original is unchanged. It requires the optional pint extra: pip install timedatamodel[pint]
[4]:
wind_kmh = wind.convert_unit("km/h")
wind_knot = wind.convert_unit("knot")
print(f"Original: {wind.unit:5s} mean={np.nanmean(wind.arr):.2f}")
print(f"Converted: {wind_kmh.unit:5s} mean={np.nanmean(wind_kmh.arr):.2f}")
print(f"Converted: {wind_knot.unit:5s} mean={np.nanmean(wind_knot.arr):.2f}")
[5]:
energy_kwh = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=(500 + rng.normal(0, 50, 24)).tolist(),
    name="energy",
    unit="kWh",
)
energy_mwh = energy_kwh.convert_unit("MWh")
energy_j = energy_kwh.convert_unit("J")
print(f"kWh: mean={np.nanmean(energy_kwh.arr):.1f}")
print(f"MWh: mean={np.nanmean(energy_mwh.arr):.4f}")
print(f"J: mean={np.nanmean(energy_j.arr):.0f}")
Incompatible units raise an error
[6]:
try:
    wind.convert_unit("MW")
except ValueError as e:
    print(f"Error: {e}")
[7]:
no_unit = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=rng.normal(0, 1, 24).tolist(),
    name="dimensionless",
)
try:
    no_unit.convert_unit("MW")
except ValueError as e:
    print(f"Error: {e}")
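To make the conversion and the incompatible-unit errors above more concrete, here is a minimal sketch of the arithmetic involved. It does not use timedatamodel or pint: the UNITS table and the convert() helper are hypothetical stand-ins for pint's unit registry, and the real error messages may differ.

```python
# Illustrative sketch only: timedatamodel delegates this work to pint.
# UNITS is a hypothetical stand-in for pint's unit registry.

# unit -> (dimension, factor to the dimension's reference unit)
UNITS = {
    "m/s": ("speed", 1.0),
    "km/h": ("speed", 1000.0 / 3600.0),
    "knot": ("speed", 1852.0 / 3600.0),
    "W": ("power", 1.0),
    "kW": ("power", 1e3),
    "MW": ("power", 1e6),
}


def convert(values, from_unit, to_unit):
    """Return a new list of values expressed in to_unit."""
    if from_unit is None:
        raise ValueError("cannot convert a series that has no unit")
    from_dim, from_factor = UNITS[from_unit]
    to_dim, to_factor = UNITS[to_unit]
    if from_dim != to_dim:
        raise ValueError(
            f"cannot convert {from_unit!r} ({from_dim}) to {to_unit!r} ({to_dim})"
        )
    scale = from_factor / to_factor
    return [v * scale for v in values]


wind_ms = [10.0, 5.0]
wind_kmh = convert(wind_ms, "m/s", "km/h")  # new list; wind_ms is untouched
print(wind_kmh)

try:
    convert(wind_ms, "m/s", "MW")  # speed vs. power: incompatible
except ValueError as e:
    print(f"Error: {e}")
```

Two units are compatible when they share a dimension, and the conversion itself is a single multiplication by the ratio of their factors.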
Automatic unit conversion in arithmetic
When you add or subtract two TimeSeries with compatible units, values are automatically converted to the left operand’s unit.
[8]:
power_mw = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(100 + rng.normal(0, 10, 24)).tolist(),
    name="plant_a",
    unit="MW",
)
power_kw = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(50000 + rng.normal(0, 5000, 24)).tolist(),
    name="plant_b",
    unit="kW",
)
total = power_mw + power_kw
print(f"Result unit: {total.unit}")
print(f"plant_a mean: {np.nanmean(power_mw.arr):.1f} MW")
print(f"plant_b mean: {np.nanmean(power_kw.arr):.1f} kW = {np.nanmean(power_kw.arr)/1000:.1f} MW")
print(f"total mean: {np.nanmean(total.arr):.1f} MW")
Mismatched unit presence (one has a unit, the other doesn’t) raises an error:
[9]:
try:
    _ = power_mw + no_unit
except ValueError as e:
    print(f"Error: {e}")
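The left-operand rule and the unit-presence check can be sketched with a toy class. MiniSeries and TO_WATT below are hypothetical names invented for illustration; they only mirror the behaviour described above and are not how timedatamodel is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical factor table; the real library resolves units through pint.
TO_WATT = {"W": 1.0, "kW": 1e3, "MW": 1e6}


@dataclass
class MiniSeries:
    """Toy stand-in for a TimeSeries: values plus an optional unit."""

    values: List[float]
    unit: Optional[str] = None

    def __add__(self, other: "MiniSeries") -> "MiniSeries":
        # One operand with a unit and one without is ambiguous, so refuse.
        if (self.unit is None) != (other.unit is None):
            raise ValueError("cannot add a series with a unit to one without")
        if self.unit is None:
            scale = 1.0
        else:
            # Express the right operand in the left operand's unit.
            scale = TO_WATT[other.unit] / TO_WATT[self.unit]
        summed = [a + b * scale for a, b in zip(self.values, other.values)]
        return MiniSeries(summed, self.unit)


plant_a = MiniSeries([100.0, 110.0], "MW")
plant_b = MiniSeries([50000.0, 40000.0], "kW")
total = plant_a + plant_b
print(total.unit, total.values)
```

Because the result takes the left operand's unit, plant_b + plant_a would give the same physical total expressed in kW.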
Resolving units with pint_unit
The pint_unit property returns a pint.Unit object for programmatic inspection.
[10]:
pu = power_mw.pint_unit
print(f"pint unit: {pu}")
print(f"type: {type(pu).__name__}")
Units on TimeSeriesTable
TimeSeriesTable supports per-column units via the units parameter. convert_unit() can target a single column or all columns.
[11]:
table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=np.column_stack([
        100 + rng.normal(0, 15, 24),
        8 + rng.normal(0, 2, 24),
    ]),
    names=["power", "wind_speed"],
    units=["MW", "m/s"],
)
table
[12]:
table_kw = table.convert_unit("kW", column="power")
print(f"Original units: {table.units}")
print(f"After convert: {table_kw.units}")
print(f"Power mean: {table.arr[:, 0].mean():.1f} MW → {table_kw.arr[:, 0].mean():.1f} kW")
Validating timestamps and frequency
validate() checks that timestamps are strictly increasing and match the declared frequency.
[13]:
good = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=rng.normal(0, 1, 24).tolist(),
    name="clean",
)
warnings = good.validate()
print(f"Warnings: {warnings}")
[14]:
gap_timestamps = timestamps[:12] + timestamps[14:]
gap_values = rng.normal(0, 1, len(gap_timestamps)).tolist()
gapped = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=gap_timestamps,
    values=gap_values,
    name="has_gap",
)
for w in gapped.validate():
    print(f"⚠ {w}")
[15]:
bad_order = timestamps[:12] + [timestamps[13], timestamps[12]] + timestamps[14:]
bad_values = rng.normal(0, 1, len(bad_order)).tolist()
unordered = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=bad_order,
    values=bad_values,
    name="unordered",
)
for w in unordered.validate():
    print(f"⚠ {w}")
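The two checks described for validate() (strictly increasing timestamps, and spacing that matches the declared frequency) can be sketched with the standard library alone. The check_timestamps() helper and its message wording are hypothetical; the messages returned by the real validate() may look different.

```python
from datetime import datetime, timedelta, timezone


def check_timestamps(timestamps, step):
    """Return human-readable issues; an empty list means none were found."""
    issues = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta <= timedelta(0):
            # Duplicate or backwards timestamp.
            issues.append(f"not strictly increasing at {curr.isoformat()}")
        elif delta != step:
            # Increasing, but the spacing does not match the frequency.
            issues.append(f"expected step {step}, got {delta} before {curr.isoformat()}")
    return issues


base = datetime(2024, 1, 15, tzinfo=timezone.utc)
hourly = [base + timedelta(hours=i) for i in range(24)]
step = timedelta(hours=1)

clean = check_timestamps(hourly, step)
gapped = check_timestamps(hourly[:12] + hourly[14:], step)
unordered = check_timestamps(
    hourly[:12] + [hourly[13], hourly[12]] + hourly[14:], step
)

for issue in gapped + unordered:
    print(issue)
```

Swapping two neighbouring timestamps produces more than one issue in this sketch, because the hops into and out of the swapped pair also break the hourly spacing.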
Detecting missing values
The has_missing property returns True when any value is missing. Values passed as None are stored as NaN in the underlying array.
[16]:
values_with_gaps = rng.normal(100, 10, 24).tolist()
values_with_gaps[5] = None
values_with_gaps[18] = None
sparse = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=values_with_gaps,
    name="sparse",
    unit="MW",
)
print(f"has_missing: {sparse.has_missing}")
print(f"NaN count: {np.isnan(sparse.arr).sum()}")
print(f"Length: {len(sparse)}")
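The cell above counts missing entries with np.isnan, which suggests that None is stored as NaN. A minimal sketch of that idea in plain Python follows; the variable names are illustrative and this is an assumption about the storage, not a description of the library's internals.

```python
import math

raw = [101.2, None, 98.7, None, 100.4]

# Store None as NaN so every entry is a float.
stored = [math.nan if v is None else float(v) for v in raw]

has_missing = any(math.isnan(v) for v in stored)
nan_count = sum(math.isnan(v) for v in stored)
print(has_missing, nan_count, len(stored))  # True 2 5
```

Missing entries keep their position, so the length of the series does not change.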
DataType — classifying your data
The DataType enum communicates what kind of data a series holds.
[17]:
print("Available DataType values:")
for dt in tdm.DataType:
    print(f" {dt.value}")
[18]:
measured = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(100 + rng.normal(0, 10, 24)).tolist(),
    name="wind_measured",
    unit="MW",
    data_type=tdm.DataType.MEASUREMENT,
)
forecast = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=(105 + rng.normal(0, 15, 24)).tolist(),
    name="wind_forecast",
    unit="MW",
    data_type=tdm.DataType.FORECAST,
)
print(f"{measured.name}: data_type={measured.data_type}")
print(f"{forecast.name}: data_type={forecast.data_type}")
TimeSeriesType — structural classification
TimeSeriesType describes the structural nature of the series.
[19]:
print("Available TimeSeriesType values:")
for tst in tdm.TimeSeriesType:
    print(f" {tst.value}")
[20]:
flat = tdm.TimeSeries(
    tdm.Frequency.PT1H, timezone="UTC",
    timestamps=timestamps,
    values=rng.normal(0, 1, 24).tolist(),
    name="flat_series",
    timeseries_type=tdm.TimeSeriesType.FLAT,
)
print(f"timeseries_type: {flat.timeseries_type}")
Custom attributes
The attributes dict stores arbitrary key-value metadata — source system, fuel type, model version, etc.
[21]:
rich = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timezone="UTC",
    timestamps=timestamps,
    values=(80 + rng.normal(0, 10, 24)).tolist(),
    name="wind_farm_alpha",
    unit="MW",
    description="Measured output from Wind Farm Alpha",
    data_type=tdm.DataType.MEASUREMENT,
    timeseries_type=tdm.TimeSeriesType.FLAT,
    attributes={
        "source": "SCADA",
        "fuel": "wind",
        "capacity_mw": "120",
        "operator": "NorthWind Energy",
    },
)
print(f"Attributes: {rich.attributes}")
print(f"Capacity: {rich.attributes['capacity_mw']} MW")
Frequency enum
Frequency is a StrEnum with helpers for calendar-based vs fixed-duration frequencies.
[22]:
print(f"{'Frequency':<8s} {'timedelta':<22s} {'calendar?'}")
print("-" * 45)
for f in tdm.Frequency:
    td = f.to_timedelta()
    td_str = str(td) if td else "-"
    print(f"{f.value:<8s} {td_str:<22s} {f.is_calendar_based}")
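The split between fixed-duration and calendar-based frequencies can be illustrated with a small string-valued enum. MiniFrequency is hypothetical: apart from PT1H, its members are assumed ISO 8601 duration codes rather than a listing of what tdm.Frequency contains, and treating P1D as a fixed 24 hours is also an assumption.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class MiniFrequency(str, Enum):
    """Hypothetical subset of ISO 8601 duration codes; not the library's enum."""

    PT15M = "PT15M"
    PT1H = "PT1H"
    P1D = "P1D"
    P1M = "P1M"

    @property
    def is_calendar_based(self) -> bool:
        # A month has no fixed length, so it cannot map to one timedelta.
        return self is MiniFrequency.P1M

    def to_timedelta(self) -> Optional[timedelta]:
        fixed = {
            "PT15M": timedelta(minutes=15),
            "PT1H": timedelta(hours=1),
            "P1D": timedelta(days=1),
        }
        # Calendar-based members are absent from the table and yield None.
        return fixed.get(self.value)


for f in MiniFrequency:
    td = f.to_timedelta()
    td_str = str(td) if td else "-"
    print(f"{f.value:<8s} {td_str:<22s} {f.is_calendar_based}")
```

Returning None for calendar-based members matches the `if td else "-"` guard in the cell above, which only makes sense if to_timedelta() can come back empty.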
Metadata survives serialization
Units, data types, attributes, and other metadata round-trip through JSON.
[23]:
json_str = rich.to_json()
restored = tdm.TimeSeries.from_json(json_str)
print(f"unit: {restored.unit}")
print(f"data_type: {restored.data_type}")
print(f"timeseries_type: {restored.timeseries_type}")
print(f"attributes: {restored.attributes}")
print(f"description: {restored.description}")
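As a rough picture of what a metadata round trip involves, the sketch below serializes a toy record with the standard json module and rebuilds it. MiniRecord and its JSON layout are hypothetical; the format produced by to_json() in timedatamodel is not shown in this notebook and may be structured differently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class MiniRecord:
    """Toy container; field names mirror the metadata used above."""

    values: List[float]
    name: Optional[str] = None
    unit: Optional[str] = None
    description: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict() turns the dataclass into plain dicts and lists.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "MiniRecord":
        return cls(**json.loads(text))


original = MiniRecord(
    values=[80.0, 82.5],
    name="wind_farm_alpha",
    unit="MW",
    description="Measured output from Wind Farm Alpha",
    attributes={"source": "SCADA", "capacity_mw": "120"},
)
restored = MiniRecord.from_json(original.to_json())
print(restored == original)  # True
```

Keeping attribute values as strings, as the notebook does with "capacity_mw": "120", avoids type surprises when the data comes back from JSON.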
Summary
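| Feature | API |
|---|---|
| Set unit | TimeSeries(..., unit="m/s") |
| Convert unit | ts.convert_unit("km/h") |
| Auto-convert in arithmetic | power_mw + power_kw |
| Pint integration | ts.pint_unit |
| Per-column units | TimeSeriesTable(..., units=["MW", "m/s"]) |
| Column conversion | table.convert_unit("kW", column="power") |
| Validate timestamps | ts.validate() |
| Missing values | ts.has_missing |
| Data classification | data_type=DataType.MEASUREMENT |
| Structural type | timeseries_type=TimeSeriesType.FLAT |
| Custom metadata | attributes={"source": "SCADA"} |
| Frequency info | Frequency.to_timedelta(), Frequency.is_calendar_based |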
Next up: nb_04 covers arithmetic operations and comparisons on TimeSeries.