Hierarchical Time Series

Many real-world datasets are naturally organised as trees: a country’s electricity consumption breaks into regions, which break into cities.
HierarchicalTimeSeries lets you model that structure directly and aggregate bottom-up through the tree.

Class                    Purpose
HierarchyNode            A single node: key, level, children, and an optional TimeSeries
HierarchicalTimeSeries   The tree container: traversal, aggregation, conversion
AggregationMethod        SUM, MEAN, MIN, MAX

[1]:
from datetime import datetime, timedelta, timezone

import numpy as np

import timedatamodel as tdm

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
rng = np.random.default_rng(42)

Create leaf time series

Each leaf in the hierarchy holds a TimeSeries. Here we model electricity consumption for five Norwegian cities.

[2]:
def make_consumption(name: str, base_mw: float) -> tdm.TimeSeries:
    pattern = base_mw * (1 + 0.3 * np.sin(np.linspace(0, 2 * np.pi, 24)))
    noise = rng.normal(0, base_mw * 0.05, 24)
    return tdm.TimeSeries(
        tdm.Frequency.PT1H,
        timezone="Europe/Oslo",
        timestamps=timestamps,
        values=(pattern + noise).tolist(),
        name=name,
        unit="MW",
    )

ts_oslo = make_consumption("Oslo", 500)
ts_bergen = make_consumption("Bergen", 200)
ts_stavanger = make_consumption("Stavanger", 150)
ts_tromsoe = make_consumption("Tromsø", 80)
ts_bodoe = make_consumption("Bodø", 50)

Building a hierarchy with HierarchyNode

Construct the tree by nesting HierarchyNode objects. Leaves hold a TimeSeries; interior nodes have children.

[3]:
root = tdm.HierarchyNode(
    key="Norway",
    level="country",
    children=[
        tdm.HierarchyNode(
            key="South",
            level="region",
            children=[
                tdm.HierarchyNode(key="Oslo", level="city", timeseries=ts_oslo),
                tdm.HierarchyNode(key="Bergen", level="city", timeseries=ts_bergen),
                tdm.HierarchyNode(key="Stavanger", level="city", timeseries=ts_stavanger),
            ],
        ),
        tdm.HierarchyNode(
            key="North",
            level="region",
            children=[
                tdm.HierarchyNode(key="Tromsø", level="city", timeseries=ts_tromsoe),
                tdm.HierarchyNode(key="Bodø", level="city", timeseries=ts_bodoe),
            ],
        ),
    ],
)

hierarchy = tdm.HierarchicalTimeSeries(
    root,
    name="Norway Consumption",
    description="Hourly electricity consumption by city",
    levels=["country", "region", "city"],
    aggregation=tdm.AggregationMethod.SUM,
)
hierarchy

Inspecting the tree

Basic properties tell you the shape of the hierarchy.

[4]:
print(f"Name:     {hierarchy.name}")
print(f"Levels:   {hierarchy.levels}")
print(f"# levels: {hierarchy.n_levels}")
print(f"# nodes:  {hierarchy.n_nodes}")
print(f"# leaves: {hierarchy.n_leaves}")

Leaves and walking

leaves() returns all leaf nodes. walk() yields nodes in pre-order (default) or post-order.

[8]:
print("All leaves:")
for leaf in hierarchy.leaves():
    print(f"  {leaf.key:12s}  path={leaf.path}")
[9]:
print("Pre-order walk:")
for node in hierarchy.walk(order="pre"):
    indent = "  " * node.depth
    label = f"{node.key} [{node.level}]"
    if node.is_leaf:
        label += f" — {len(node.timeseries)} pts"
    print(f"{indent}{label}")
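To make the two traversal orders concrete, here is a minimal sketch on a plain nested-dict tree. It does not use timedatamodel: the `walk` helper and the dict layout below are assumptions made purely for illustration, not the library's implementation.

```python
from typing import Iterator

# Hypothetical stand-in for the hierarchy: each node is a dict with a key
# and a list of children. This is not the timedatamodel data structure.
tree = {
    "key": "Norway",
    "children": [
        {"key": "South", "children": [
            {"key": "Oslo", "children": []},
            {"key": "Bergen", "children": []},
        ]},
        {"key": "North", "children": [
            {"key": "Tromsø", "children": []},
        ]},
    ],
}


def walk(node: dict, order: str = "pre") -> Iterator[str]:
    """Yield node keys depth-first: parents first ("pre") or last ("post")."""
    if order not in ("pre", "post"):
        raise ValueError(f"unknown order: {order!r}")
    if order == "pre":
        yield node["key"]
    for child in node["children"]:
        yield from walk(child, order)
    if order == "post":
        yield node["key"]


pre_order = list(walk(tree, "pre"))
post_order = list(walk(tree, "post"))
print(pre_order)   # parents appear before their children
print(post_order)  # children appear before their parents
```

Post-order is the natural fit for bottom-up work, since every child is visited before its parent.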

Bottom-up aggregation

aggregate() recursively combines leaf series using the chosen method (default: SUM).
Calling it on the root gives the total for the whole hierarchy.
[10]:
total = hierarchy.aggregate()
print(f"Name:   {total.name}")
print(f"Length: {len(total)} data points")
print(f"Mean:   {np.nanmean(total.arr):.1f} MW")
[11]:
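# `south` must be looked up first, in the same way as `north` below
south = hierarchy.get_node("South")
south_total = hierarchy.aggregate(south)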
print(f"South region total — mean: {np.nanmean(south_total.arr):.1f} MW")

north = hierarchy.get_node("North")
north_total = hierarchy.aggregate(north)
print(f"North region total — mean: {np.nanmean(north_total.arr):.1f} MW")
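The recursion behind a bottom-up sum can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the library's code: the nested-dict tree, the `values` field, and the `aggregate_sum` helper are all hypothetical.

```python
import numpy as np

# Hypothetical tree: leaves carry a "values" array, interior nodes carry
# "children". The numbers are made up for illustration.
tree = {
    "key": "Norway",
    "children": [
        {"key": "South", "children": [
            {"key": "Oslo", "values": np.array([500.0, 520.0, 480.0])},
            {"key": "Bergen", "values": np.array([200.0, 210.0, 190.0])},
        ]},
        {"key": "North", "children": [
            {"key": "Tromsø", "values": np.array([80.0, 85.0, 75.0])},
        ]},
    ],
}


def aggregate_sum(node: dict) -> np.ndarray:
    """Return the leaf values, or the element-wise sum over all children."""
    if "values" in node:
        return node["values"]
    child_totals = [aggregate_sum(child) for child in node["children"]]
    return np.sum(child_totals, axis=0)


total = aggregate_sum(tree)                       # whole hierarchy
south_total = aggregate_sum(tree["children"][0])  # one subtree
print(total)
print(south_total)
```

Calling the helper on an interior node aggregates only that node's subtree, which mirrors the difference between the root total and the regional totals above.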

Level-wise aggregation

aggregate_level(level) aggregates every node at the named level, returning a dict that maps each node's key to its aggregated TimeSeries.

[12]:
region_agg = hierarchy.aggregate_level("region")

for name, ts in region_agg.items():
    print(f"{name:8s}  mean={np.nanmean(ts.arr):7.1f} MW  max={np.nanmax(ts.arr):7.1f} MW")

Choosing an aggregation method

Override the default method by passing a different AggregationMethod.

[13]:
for method in tdm.AggregationMethod:
    agg = hierarchy.aggregate(method=method)
    vals = agg.arr
    print(f"{method.value:5s}  mean={np.nanmean(vals):7.1f}  min={np.nanmin(vals):7.1f}  max={np.nanmax(vals):7.1f}")
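One point worth checking when switching methods: SUM, MIN, and MAX give the same answer whether they are applied level by level or across all leaves at once, but MEAN does not when subtrees differ in size. The sketch below shows the arithmetic with plain NumPy. The numbers are illustrative, and which of the two mean definitions timedatamodel uses is not stated here, so treat this as a caveat to verify rather than a description of the library.

```python
import numpy as np

# One value per city at a single time step (illustrative numbers only).
south = np.array([500.0, 200.0, 150.0])  # Oslo, Bergen, Stavanger
north = np.array([80.0, 50.0])           # Tromsø, Bodø
all_leaves = np.concatenate([south, north])

# SUM and MAX: recursive and flat results agree.
recursive_sum = south.sum() + north.sum()
flat_sum = all_leaves.sum()
recursive_max = max(south.max(), north.max())
flat_max = all_leaves.max()

# MEAN: averaging the region means weights each region equally,
# while averaging all leaves weights each city equally.
mean_of_region_means = np.mean([south.mean(), north.mean()])
mean_of_leaves = all_leaves.mean()

print(f"sum:  recursive={recursive_sum:.1f}  flat={flat_sum:.1f}")
print(f"max:  recursive={recursive_max:.1f}  flat={flat_max:.1f}")
print(f"mean: of region means={mean_of_region_means:.1f}  of leaves={mean_of_leaves:.1f}")
```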

Subtree extraction

subtree(*path) creates a new HierarchicalTimeSeries rooted at the specified node.

[14]:
south_tree = hierarchy.subtree("South")
print(south_tree)
print(f"\nLevels: {south_tree.levels}")
print(f"Leaves: {south_tree.n_leaves}")

Converting to other containers

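Flatten the hierarchy into a TimeSeriesCollection or TimeSeriesTable. Without arguments the result holds the leaf series; passing level= returns the aggregated series at that level instead.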

[15]:
collection = hierarchy.to_collection()
print(f"Leaf-level collection: {list(collection.keys())}")
[16]:
collection_regions = hierarchy.to_collection(level="region")
print("Region-level collection (aggregated):")
for key, ts in collection_regions.items():
    print(f"  {key}: {len(ts)} pts, mean={np.nanmean(ts.arr):.1f} MW")
[17]:
table = hierarchy.to_table()
print(f"Table shape: {len(table)} rows × {table.n_columns} columns")
print(f"Columns: {table.names}")
table
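Conceptually, flattening turns the tree into a wide table with one column per leaf, or one column per node at a chosen level after aggregation. The following sketch shows that idea with pandas only. The `leaf_series` dict and the `parent` mapping are hypothetical stand-ins for the tree, and the result is a plain DataFrame rather than a TimeSeriesTable.

```python
import pandas as pd

timestamps = pd.to_datetime(
    ["2024-01-15 00:00", "2024-01-15 01:00", "2024-01-15 02:00"], utc=True
)

# Hypothetical leaf data keyed by city (illustrative numbers).
leaf_series = {
    "Oslo": pd.Series([500.0, 520.0, 480.0], index=timestamps),
    "Bergen": pd.Series([200.0, 210.0, 190.0], index=timestamps),
    "Tromsø": pd.Series([80.0, 85.0, 75.0], index=timestamps),
}
# Hypothetical leaf -> region mapping standing in for the tree structure.
parent = {"Oslo": "South", "Bergen": "South", "Tromsø": "North"}

# Leaf level: one column per leaf, one row per timestamp.
leaf_table = pd.DataFrame(leaf_series)

# Region level: sum the leaf columns that share a parent.
region_table = leaf_table.T.groupby(parent).sum().T

print(leaf_table)
print(region_table)
```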

Building from a DataFrame

from_dataframe builds the tree from a long-format DataFrame with hierarchy columns.
Each unique combination of level columns becomes a leaf.
[18]:
import pandas as pd

rows = []
for ts_dt in timestamps:
    for region, cities in [("South", ["Oslo", "Bergen"]), ("North", ["Tromsø", "Bodø"])]:
        for city in cities:
            rows.append({
                "timestamp": ts_dt,
                "region": region,
                "city": city,
                "consumption_mw": float(rng.normal(200, 30)),
            })

df = pd.DataFrame(rows)
print(f"DataFrame shape: {df.shape}")
df.head(8)
[19]:
h_from_df = tdm.HierarchicalTimeSeries.from_dataframe(
    df,
    level_columns=["region", "city"],
    value_column="consumption_mw",
    timestamp_column="timestamp",
    name="Consumption from DataFrame",
    frequency=tdm.Frequency.PT1H,
    timezone="Europe/Oslo",
)
h_from_df
[20]:
total_df = h_from_df.aggregate()
print(f"Total from DataFrame hierarchy: mean={np.nanmean(total_df.arr):.1f} MW")
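The mapping from long-format rows to leaves can be pictured with a plain pandas groupby. This sketch does not call from_dataframe. It only shows, on a small made-up DataFrame with the same column names as above, how each unique (region, city) combination yields one series and how regional totals follow from the leaves.

```python
import pandas as pd

# Long format: one row per (timestamp, region, city) observation.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-15 00:00", "2024-01-15 01:00"] * 3, utc=True
    ),
    "region": ["South", "South", "South", "South", "North", "North"],
    "city": ["Oslo", "Oslo", "Bergen", "Bergen", "Tromsø", "Tromsø"],
    "consumption_mw": [500.0, 510.0, 200.0, 205.0, 80.0, 82.0],
})

# Each unique (region, city) pair corresponds to one leaf series.
leaves = {
    path: group.set_index("timestamp")["consumption_mw"]
    for path, group in df.groupby(["region", "city"])
}

# Summing rows that share a region and timestamp gives the region totals.
region_totals = (
    df.groupby(["region", "timestamp"])["consumption_mw"].sum().unstack("region")
)

print(sorted(leaves))
print(region_totals)
```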

Another example: energy production by source

Hierarchies can model any tree-shaped relationship — here, power production broken down by energy source.

[21]:
energy_root = tdm.HierarchyNode(
    key="Total",
    level="total",
    children=[
        tdm.HierarchyNode(
            key="Wind",
            level="source",
            children=[
                tdm.HierarchyNode(key="Farm A", level="farm", timeseries=make_consumption("Farm A", 100)),
                tdm.HierarchyNode(key="Farm B", level="farm", timeseries=make_consumption("Farm B", 80)),
            ],
        ),
        tdm.HierarchyNode(
            key="Solar",
            level="source",
            children=[
                tdm.HierarchyNode(key="Plant X", level="farm", timeseries=make_consumption("Plant X", 60)),
            ],
        ),
    ],
)

energy = tdm.HierarchicalTimeSeries(
    energy_root,
    name="Energy Production",
    levels=["total", "source", "farm"],
    aggregation=tdm.AggregationMethod.SUM,
)
energy
[22]:
source_agg = energy.aggregate_level("source")
for name, ts in source_agg.items():
    print(f"{name:8s}  mean={np.nanmean(ts.arr):.1f} MW")

Sequence protocol

HierarchicalTimeSeries supports len, in, and bracket indexing with slash-separated paths.

[23]:
print(f"Total nodes:        {len(hierarchy)}")
print(f"'Oslo' in tree:     {'Oslo' in hierarchy}")
print(f"'Helsinki' in tree: {'Helsinki' in hierarchy}")

node = hierarchy["South/Oslo"]
print(f"\nBracket access:     {node.key} (leaf={node.is_leaf})")
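For readers curious how such a protocol can be wired up, here is a minimal sketch of a container supporting len, in, and slash-separated path lookup. `TinyTree` is a hypothetical class written for this example. Whether the real library counts the root in len() or resolves paths in exactly this way is an assumption.

```python
from typing import Iterator


class TinyTree:
    """Hypothetical container illustrating len, in, and path indexing."""

    def __init__(self, root: dict) -> None:
        self.root = root

    def _walk(self, node: dict) -> Iterator[dict]:
        yield node
        for child in node.get("children", []):
            yield from self._walk(child)

    def __len__(self) -> int:
        # Count every node, root included (an assumption of this sketch).
        return sum(1 for _ in self._walk(self.root))

    def __contains__(self, key: str) -> bool:
        return any(node["key"] == key for node in self._walk(self.root))

    def __getitem__(self, path: str) -> dict:
        # Resolve "South/Oslo" one segment at a time, starting below the root.
        node = self.root
        for segment in path.split("/"):
            matches = [c for c in node.get("children", []) if c["key"] == segment]
            if not matches:
                raise KeyError(path)
            node = matches[0]
        return node


tree = {"key": "Norway", "children": [
    {"key": "South", "children": [{"key": "Oslo"}, {"key": "Bergen"}]},
    {"key": "North", "children": [{"key": "Tromsø"}]},
]}
h = TinyTree(tree)
print(len(h), "Oslo" in h, "Helsinki" in h, h["South/Oslo"]["key"])
```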

Summary

Feature                API
Build manually         HierarchyNode(key, level, children, timeseries)
Build from DataFrame   HierarchicalTimeSeries.from_dataframe(df, level_columns, value_column)
Build from dict        HierarchicalTimeSeries.from_dict(tree, series_map, levels=...)
Navigate               get_node(*path), get_level(name), leaves()
Walk                   walk(order="pre") / walk(order="post")
Aggregate              aggregate(node, method): bottom-up recursion
Level aggregate        aggregate_level(level) → dict[str, TimeSeries]
Subtree                subtree(*path) → new HierarchicalTimeSeries
Convert                to_collection(level), to_table(level)
Sequence ops           len(h), key in h, h["path/to/node"]