Hierarchical Time Series
HierarchicalTimeSeries lets you model a tree-shaped set of time series directly and aggregate bottom-up through the tree.

| Class | Purpose |
|---|---|
| HierarchyNode | A single node: key, level, children, and an optional TimeSeries |
| HierarchicalTimeSeries | The tree container: traversal, aggregation, conversion |

[1]:
from datetime import datetime, timedelta, timezone
import numpy as np
import timedatamodel as tdm
base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
rng = np.random.default_rng(42)
Create leaf time series
Each leaf in the hierarchy holds a TimeSeries. Here we model electricity consumption for five Norwegian cities.
[2]:
def make_consumption(name: str, base_mw: float) -> tdm.TimeSeries:
    pattern = base_mw * (1 + 0.3 * np.sin(np.linspace(0, 2 * np.pi, 24)))
    noise = rng.normal(0, base_mw * 0.05, 24)
    return tdm.TimeSeries(
        tdm.Frequency.PT1H,
        timezone="Europe/Oslo",
        timestamps=timestamps,
        values=(pattern + noise).tolist(),
        name=name,
        unit="MW",
    )

ts_oslo = make_consumption("Oslo", 500)
ts_bergen = make_consumption("Bergen", 200)
ts_stavanger = make_consumption("Stavanger", 150)
ts_tromsoe = make_consumption("Tromsø", 80)
ts_bodoe = make_consumption("Bodø", 50)
Building a hierarchy with HierarchyNode
Construct the tree by nesting HierarchyNode objects. Leaves hold a TimeSeries; interior nodes have children.
[3]:
root = tdm.HierarchyNode(
    key="Norway",
    level="country",
    children=[
        tdm.HierarchyNode(
            key="South",
            level="region",
            children=[
                tdm.HierarchyNode(key="Oslo", level="city", timeseries=ts_oslo),
                tdm.HierarchyNode(key="Bergen", level="city", timeseries=ts_bergen),
                tdm.HierarchyNode(key="Stavanger", level="city", timeseries=ts_stavanger),
            ],
        ),
        tdm.HierarchyNode(
            key="North",
            level="region",
            children=[
                tdm.HierarchyNode(key="Tromsø", level="city", timeseries=ts_tromsoe),
                tdm.HierarchyNode(key="Bodø", level="city", timeseries=ts_bodoe),
            ],
        ),
    ],
)

hierarchy = tdm.HierarchicalTimeSeries(
    root,
    name="Norway Consumption",
    description="Hourly electricity consumption by city",
    levels=["country", "region", "city"],
    aggregation=tdm.AggregationMethod.SUM,
)
hierarchy
Inspecting the tree
Basic properties tell you the shape of the hierarchy.
[4]:
print(f"Name: {hierarchy.name}")
print(f"Levels: {hierarchy.levels}")
print(f"# levels: {hierarchy.n_levels}")
print(f"# nodes: {hierarchy.n_nodes}")
print(f"# leaves: {hierarchy.n_leaves}")
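The shape properties above can be reasoned about with a plain-Python model of the same tree. The sketch below is not the library's implementation: `Node`, `count_nodes`, `count_leaves`, and `height` are hypothetical names used only to show what each count means for the Norway example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a hierarchy node (not the library class)."""
    key: str
    level: str
    children: list["Node"] = field(default_factory=list)


def count_nodes(node: Node) -> int:
    # A node counts itself plus everything below it.
    return 1 + sum(count_nodes(child) for child in node.children)


def count_leaves(node: Node) -> int:
    # A leaf is a node without children.
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


def height(node: Node) -> int:
    # Number of levels on the longest root-to-leaf path.
    return 1 + max((height(child) for child in node.children), default=0)


norway = Node("Norway", "country", [
    Node("South", "region", [Node(k, "city") for k in ("Oslo", "Bergen", "Stavanger")]),
    Node("North", "region", [Node(k, "city") for k in ("Tromsø", "Bodø")]),
])

print(count_nodes(norway), count_leaves(norway), height(norway))  # 8 5 3
```

Under these assumptions the example tree has 8 nodes (1 country, 2 regions, 5 cities), 5 leaves, and 3 levels.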
Leaves and walking
leaves() returns all leaf nodes. walk() yields nodes in pre-order (default) or post-order.
[8]:
print("All leaves:")
for leaf in hierarchy.leaves():
    print(f" {leaf.key:12s} path={leaf.path}")
[9]:
print("Pre-order walk:")
for node in hierarchy.walk(order="pre"):
    indent = " " * node.depth
    label = f"{node.key} [{node.level}]"
    if node.is_leaf:
        label += f" — {len(node.timeseries)} pts"
    print(f"{indent}{label}")
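To make the difference between the two traversal orders concrete, here is a minimal sketch on a nested-tuple tree. The `(key, children)` representation and this `walk` function are illustrative assumptions, not the library's data model: pre-order yields a parent before its children, post-order yields it after them.

```python
# Hypothetical tree: each node is (key, [child nodes]).
tree = ("Norway", [
    ("South", [("Oslo", []), ("Bergen", []), ("Stavanger", [])]),
    ("North", [("Tromsø", []), ("Bodø", [])]),
])


def walk(node, order="pre"):
    """Yield node keys depth-first, parent first ("pre") or parent last ("post")."""
    key, children = node
    if order == "pre":
        yield key
    for child in children:
        yield from walk(child, order)
    if order == "post":
        yield key


pre = list(walk(tree, "pre"))
post = list(walk(tree, "post"))
print(pre)   # ['Norway', 'South', 'Oslo', 'Bergen', 'Stavanger', 'North', 'Tromsø', 'Bodø']
print(post)  # ['Oslo', 'Bergen', 'Stavanger', 'South', 'Tromsø', 'Bodø', 'North', 'Norway']
```

Post-order is the natural order for bottom-up work, because every child has been visited by the time its parent is reached.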
Bottom-up aggregation
aggregate() recursively combines leaf series using the chosen method (default: SUM).
[10]:
total = hierarchy.aggregate()
print(f"Name: {total.name}")
print(f"Length: {len(total)} data points")
print(f"Mean: {np.nanmean(total.arr):.1f} MW")
[11]:
south = hierarchy.get_node("South")
south_total = hierarchy.aggregate(south)
print(f"South region total — mean: {np.nanmean(south_total.arr):.1f} MW")
north = hierarchy.get_node("North")
north_total = hierarchy.aggregate(north)
print(f"North region total — mean: {np.nanmean(north_total.arr):.1f} MW")
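The recursion behind bottom-up aggregation can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: nodes are plain dicts, leaves carry a `values` list, all series are already aligned on the same timestamps, and `combine` is a hypothetical parameter standing in for the aggregation method.

```python
def aggregate(node, combine=sum):
    """Recursively combine leaf series point by point."""
    if "values" in node:  # leaf: return a copy of its series
        return list(node["values"])
    child_series = [aggregate(child, combine) for child in node["children"]]
    # zip(*...) lines up the i-th value of every child series.
    # Assumes equal-length, aligned series; zip would silently truncate otherwise.
    return [combine(point) for point in zip(*child_series)]


south = {"children": [
    {"values": [500, 510, 490]},  # Oslo
    {"values": [200, 205, 195]},  # Bergen
    {"values": [150, 150, 150]},  # Stavanger
]}
north = {"children": [
    {"values": [80, 82, 78]},     # Tromsø
    {"values": [50, 49, 51]},     # Bodø
]}
norway = {"children": [south, north]}

print(aggregate(south))   # [850, 865, 835]
print(aggregate(north))   # [130, 131, 129]
print(aggregate(norway))  # [980, 996, 964]
```

Passing an interior node, as with `south` here, aggregates only that branch, which mirrors calling `aggregate` with a node argument in the cell above.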
Level-wise aggregation
aggregate_level(level) aggregates every node at the named level, returning a dict.
[12]:
region_agg = hierarchy.aggregate_level("region")

for name, ts in region_agg.items():
    print(f"{name:8s} mean={np.nanmean(ts.arr):7.1f} MW max={np.nanmax(ts.arr):7.1f} MW")
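As a rough model of what a level-wise aggregation returns, the sketch below descends until it reaches nodes at the requested level and aggregates each one separately. The dict-based nodes and the helpers `leaf_sum`, `aggregate_level`, and `city` are hypothetical and only illustrate the idea of one aggregated series per node key.

```python
def leaf_sum(node):
    """Point-wise sum of every leaf series at or below this node."""
    if "values" in node:
        return list(node["values"])
    parts = [leaf_sum(child) for child in node["children"]]
    return [sum(point) for point in zip(*parts)]


def aggregate_level(node, level):
    """Return {key: aggregated series} for every node at the given level."""
    if node["level"] == level:
        return {node["key"]: leaf_sum(node)}
    result = {}
    for child in node.get("children", []):
        result.update(aggregate_level(child, level))
    return result


def city(key, values):
    # Small helper to keep the example tree readable.
    return {"key": key, "level": "city", "values": values}


norway = {"key": "Norway", "level": "country", "children": [
    {"key": "South", "level": "region", "children": [
        city("Oslo", [500, 510]), city("Bergen", [200, 205]), city("Stavanger", [150, 150]),
    ]},
    {"key": "North", "level": "region", "children": [
        city("Tromsø", [80, 82]), city("Bodø", [50, 49]),
    ]},
]}

by_region = aggregate_level(norway, "region")
print(by_region)  # {'South': [850, 865], 'North': [130, 131]}
```

In this sketch, keys are assumed to be unique within a level; two nodes with the same key at the same level would overwrite each other in the result dict.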
Choosing an aggregation method
Override the default method by passing a different AggregationMethod.
[13]:
for method in tdm.AggregationMethod:
    agg = hierarchy.aggregate(method=method)
    vals = agg.arr
    print(f"{method.value:5s} mean={np.nanmean(vals):7.1f} min={np.nanmin(vals):7.1f} max={np.nanmax(vals):7.1f}")
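When switching away from SUM, it is worth checking how a method behaves when applied level by level rather than over all leaves at once. This notebook does not show which of the two the library does, so the snippet below is only a standalone arithmetic illustration with made-up values for a single timestamp: sums agree either way, while a mean of region means generally differs from the mean over all cities when regions have different numbers of children.

```python
from statistics import mean

# One value per city at a single timestamp (illustrative numbers).
south = [500.0, 200.0, 150.0]
north = [80.0, 50.0]

flat_sum = sum(south + north)                    # over all five leaves
nested_sum = sum([sum(south), sum(north)])       # region by region, then combined

flat_mean = mean(south + north)                  # mean over all five leaves
nested_mean = mean([mean(south), mean(north)])   # mean of the two region means

print(flat_sum, nested_sum)    # 980.0 980.0
print(flat_mean)               # 196.0
print(round(nested_mean, 2))   # 174.17
```

If the distinction matters for your data, comparing the library's root-level result against a flat computation over the leaf series is a quick way to find out which behavior applies.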
Subtree extraction
subtree(*path) creates a new HierarchicalTimeSeries rooted at the specified node.
[14]:
south_tree = hierarchy.subtree("South")
print(south_tree)
print(f"\nLevels: {south_tree.levels}")
print(f"Leaves: {south_tree.n_leaves}")
Converting to other containers
Flatten the hierarchy into a TimeSeriesCollection or TimeSeriesTable.
[15]:
collection = hierarchy.to_collection()
print(f"Leaf-level collection: {list(collection.keys())}")
[16]:
collection_regions = hierarchy.to_collection(level="region")
print("Region-level collection (aggregated):")
for key, ts in collection_regions.items():
    print(f" {key}: {len(ts)} pts, mean={np.nanmean(ts.arr):.1f} MW")
[17]:
table = hierarchy.to_table()
print(f"Table shape: {len(table)} rows × {table.n_columns} columns")
print(f"Columns: {table.names}")
table
Building from a DataFrame
from_dataframe builds the tree from a long-format DataFrame with hierarchy columns.
[18]:
import pandas as pd

rows = []
for ts_dt in timestamps:
    for region, cities in [("South", ["Oslo", "Bergen"]), ("North", ["Tromsø", "Bodø"])]:
        for city in cities:
            rows.append({
                "timestamp": ts_dt,
                "region": region,
                "city": city,
                "consumption_mw": float(rng.normal(200, 30)),
            })

df = pd.DataFrame(rows)
print(f"DataFrame shape: {df.shape}")
df.head(8)
[19]:
h_from_df = tdm.HierarchicalTimeSeries.from_dataframe(
    df,
    level_columns=["region", "city"],
    value_column="consumption_mw",
    timestamp_column="timestamp",
    name="Consumption from DataFrame",
    frequency=tdm.Frequency.PT1H,
    timezone="Europe/Oslo",
)
h_from_df
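Conceptually, building a tree from long-format data means grouping rows by the level columns, outermost first, and collecting each leaf's values in timestamp order. The sketch below shows that grouping with plain dicts and lists instead of pandas; `build_tree` is a hypothetical helper whose parameter names merely echo the call above, and integer timestamps are used for brevity.

```python
rows = [
    {"timestamp": 0, "region": "South", "city": "Oslo", "consumption_mw": 500.0},
    {"timestamp": 1, "region": "South", "city": "Oslo", "consumption_mw": 510.0},
    {"timestamp": 0, "region": "South", "city": "Bergen", "consumption_mw": 200.0},
    {"timestamp": 1, "region": "South", "city": "Bergen", "consumption_mw": 205.0},
    {"timestamp": 0, "region": "North", "city": "Tromsø", "consumption_mw": 80.0},
    {"timestamp": 1, "region": "North", "city": "Tromsø", "consumption_mw": 82.0},
]


def build_tree(rows, level_columns, value_column, timestamp_column):
    """Group long-format rows into nested dicts; the innermost level maps to value lists."""
    tree = {}
    # Sort by timestamp so each leaf's values end up in time order.
    for row in sorted(rows, key=lambda r: r[timestamp_column]):
        node = tree
        for column in level_columns[:-1]:
            node = node.setdefault(row[column], {})  # descend, creating branches as needed
        node.setdefault(row[level_columns[-1]], []).append(row[value_column])
    return tree


tree = build_tree(rows, ["region", "city"], "consumption_mw", "timestamp")
print(tree)
# {'South': {'Oslo': [500.0, 510.0], 'Bergen': [200.0, 205.0]}, 'North': {'Tromsø': [80.0, 82.0]}}
```

The order of `level_columns` determines the nesting, so `["region", "city"]` puts regions above cities.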
[20]:
total_df = h_from_df.aggregate()
print(f"Total from DataFrame hierarchy: mean={np.nanmean(total_df.arr):.1f} MW")
Another example: energy production by source
Hierarchies can model any tree-shaped relationship — here, power production broken down by energy source.
[21]:
energy_root = tdm.HierarchyNode(
    key="Total",
    level="total",
    children=[
        tdm.HierarchyNode(
            key="Wind",
            level="source",
            children=[
                tdm.HierarchyNode(key="Farm A", level="farm", timeseries=make_consumption("Farm A", 100)),
                tdm.HierarchyNode(key="Farm B", level="farm", timeseries=make_consumption("Farm B", 80)),
            ],
        ),
        tdm.HierarchyNode(
            key="Solar",
            level="source",
            children=[
                tdm.HierarchyNode(key="Plant X", level="farm", timeseries=make_consumption("Plant X", 60)),
            ],
        ),
    ],
)

energy = tdm.HierarchicalTimeSeries(
    energy_root,
    name="Energy Production",
    levels=["total", "source", "farm"],
    aggregation=tdm.AggregationMethod.SUM,
)
energy
[22]:
source_agg = energy.aggregate_level("source")
for name, ts in source_agg.items():
    print(f"{name:8s} mean={np.nanmean(ts.arr):.1f} MW")
Sequence protocol
HierarchicalTimeSeries supports len, in, and bracket indexing with slash-separated paths.
[23]:
print(f"Total nodes: {len(hierarchy)}")
print(f"'Oslo' in tree: {'Oslo' in hierarchy}")
print(f"'Helsinki' in tree: {'Helsinki' in hierarchy}")
node = hierarchy["South/Oslo"]
print(f"\nBracket access: {node.key} (leaf={node.is_leaf})")
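For readers curious how len, in, and slash-separated bracket access can coexist on one container, here is a minimal sketch using Python's special methods. `PathTree` is a hypothetical class built on nested dicts, not the library's implementation; as in the cell above, paths are taken relative to the root.

```python
class PathTree:
    """Hypothetical container; nested dicts stand in for nodes, {} marks a leaf."""

    def __init__(self, root_key, children):
        self.root_key = root_key
        self.children = children

    def _iter_keys(self, children):
        # Depth-first iteration over every key below the root.
        for key, sub in children.items():
            yield key
            yield from self._iter_keys(sub)

    def __len__(self):
        # Root plus all descendants.
        return 1 + sum(1 for _ in self._iter_keys(self.children))

    def __contains__(self, key):
        return key == self.root_key or key in self._iter_keys(self.children)

    def __getitem__(self, path):
        node = self.children
        for part in path.split("/"):
            node = node[part]  # raises KeyError if any segment is missing
        return node


tree = PathTree("Norway", {
    "South": {"Oslo": {}, "Bergen": {}, "Stavanger": {}},
    "North": {"Tromsø": {}, "Bodø": {}},
})

print(len(tree))                   # 8
print("Oslo" in tree)              # True
print("Helsinki" in tree)          # False
print(list(tree["South"].keys()))  # ['Oslo', 'Bergen', 'Stavanger']
```

Membership here searches by key anywhere in the tree, while bracket access needs the full path from the root's children down.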
Summary
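| Feature | API |
|---|---|
| Build manually | HierarchyNode(...), HierarchicalTimeSeries(root, ...) |
| Build from DataFrame | HierarchicalTimeSeries.from_dataframe(...) |
| Build from dict | |
| Navigate | get_node(key), hierarchy["South/Oslo"] |
| Walk | walk(order=...), leaves() |
| Aggregate | aggregate(), aggregate(node), aggregate(method=...) |
| Level aggregate | aggregate_level(level) |
| Subtree | subtree(*path) |
| Convert | to_collection(), to_table() |
| Sequence ops | len(), in, [] |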