Cubes and Collections

For multi-dimensional data (e.g., scenario x time, or region x time) use TimeSeriesCube. For grouping heterogeneous time series that don’t share the same timestamps, use TimeSeriesCollection.

TimeSeriesCube

A cube stores an N-dimensional array with named Dimension objects. Common use cases include ensemble forecasts, scenario analysis, and region-by-time grids.
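
Here is a minimal sketch of named dimensions in plain Python, which runs without timedatamodel. The Dim class and the check_cube() helper are hypothetical illustrations, not the library's API. Each dimension contributes its label count to the expected shape, and the values must match that shape exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Hypothetical stand-in for a named dimension: a name plus ordered labels."""
    name: str
    labels: tuple


def nested_shape(values):
    """Shape of a nested list, read along the first element at each level.

    Assumes the nesting is rectangular, e.g. [[1, 2, 3], [4, 5, 6]] -> (2, 3).
    """
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0] if values else None
    return tuple(shape)


def check_cube(dims, values):
    """Return the cube's shape, or raise if the values and dimensions disagree."""
    names = [d.name for d in dims]
    if len(set(names)) != len(names):
        raise ValueError(f"dimension names must be unique: {names}")
    expected = tuple(len(d.labels) for d in dims)
    actual = nested_shape(values)
    if actual != expected:
        raise ValueError(f"values have shape {actual}, dimensions imply {expected}")
    return expected


# Usage: 3 scenarios x 24 hourly steps, mirroring the price forecast in this notebook.
dims = (
    Dim("scenario", ("low", "base", "high")),
    Dim("valid_time", tuple(range(24))),  # hour offsets stand in for timestamps
)
values = [[0.0] * 24 for _ in range(3)]
print(check_cube(dims, values))  # (3, 24)
```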

[1]:
from datetime import datetime, timedelta, timezone

import numpy as np

import timedatamodel as tdm

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
hours = [base + timedelta(hours=i) for i in range(24)]

Building a cube from scratch

Create a 3-scenario x 24-hour cube representing price forecasts under different assumptions.

[2]:
rng = np.random.default_rng(42)
base_prices = 50 + 20 * np.sin(np.linspace(0, 2 * np.pi, 24))

data = np.array([
    base_prices * 0.8 + rng.normal(0, 2, 24),  # low scenario
    base_prices + rng.normal(0, 2, 24),        # base scenario
    base_prices * 1.3 + rng.normal(0, 3, 24),  # high scenario
])

cube = tdm.TimeSeriesCube(
    tdm.Frequency.PT1H,
    timezone="UTC",
    name="price_forecast",
    unit="EUR/MWh",
    data_type=tdm.DataType.FORECAST,
    dimensions=[
        tdm.Dimension("scenario", ["low", "base", "high"]),
        tdm.Dimension("valid_time", hours),
    ],
    values=data,
)
cube
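
The synthetic prices above follow one sine cycle across the 24 hours. np.linspace(0, 2 * np.pi, 24) includes both endpoints, so consecutive hours are 2π/23 apart. As a result, hour 0 and hour 23 of the base curve both equal 50 EUR/MWh, and the curve stays between roughly 30 and 70 EUR/MWh. Written out, with ε as the random noise (the second argument of rng.normal is the standard deviation):

```latex
p_k = 50 + 20\,\sin\!\left(\frac{2\pi k}{23}\right), \qquad k = 0, 1, \dots, 23

x_{\text{low},k}  = 0.8\,p_k + \varepsilon_k, \qquad \varepsilon_k \sim \mathcal{N}(0, 2^2)
x_{\text{base},k} = p_k + \varepsilon_k,       \qquad \varepsilon_k \sim \mathcal{N}(0, 2^2)
x_{\text{high},k} = 1.3\,p_k + \varepsilon_k, \qquad \varepsilon_k \sim \mathcal{N}(0, 3^2)
```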

Cube properties

[3]:
print(f"Shape:       {cube.shape}")
print(f"Dimensions:  {cube.dim_names}")
print(f"Begin:       {cube.begin}")
print(f"End:         {cube.end}")
print(f"Has missing: {cube.has_missing}")
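
These properties are derived from the dimensions and values. The plain-Python sketch below assumes that begin and end are the first and last labels of the time dimension and that has_missing flags NaN values. This notebook does not spell out either definition, and a library could instead report an exclusive end, such as 2024-01-16 00:00 for hourly data.

```python
import math
from datetime import datetime, timedelta, timezone

# Hypothetical 3 x 24 grid (scenario x valid_time) standing in for cube values.
base = datetime(2024, 1, 15, tzinfo=timezone.utc)
valid_time = [base + timedelta(hours=i) for i in range(24)]
values = [[100.0 * s + i for i in range(24)] for s in range(3)]
values[2][5] = math.nan  # inject one gap so has_missing has something to find

shape = (len(values), len(values[0]))
begin = valid_time[0]   # assumed: first label on the time dimension
end = valid_time[-1]    # assumed: last label, not an exclusive interval end
has_missing = any(math.isnan(v) for row in values for v in row)

print(shape, begin.isoformat(), end.isoformat(), has_missing)
# (3, 24) 2024-01-15T00:00:00+00:00 2024-01-15T23:00:00+00:00 True
```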

Selecting with sel() — label-based

Select a single scenario to collapse the cube into a TimeSeries.

[4]:
base_scenario = cube.sel(scenario="base")
base_scenario
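
Label-based selection comes down to translating a label into a position and then indexing. Below is a minimal sketch for the first axis of a nested list. sel_first_axis() is a hypothetical helper, not part of timedatamodel.

```python
def sel_first_axis(labels, values, label):
    """Label-based selection along the first axis of a nested list.

    Builds a label -> position index, looks the label up, and returns that
    slice with the first axis dropped (a 2-D grid becomes a 1-D row).
    """
    position = {lab: i for i, lab in enumerate(labels)}
    if label not in position:
        raise KeyError(f"{label!r} not in {list(labels)}")
    return values[position[label]]


scenarios = ["low", "base", "high"]
grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]  # 3 scenarios x 3 steps
base_row = sel_first_axis(scenarios, grid, "base")
print(base_row)  # [4.0, 5.0, 6.0]
```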

Selecting with isel() — index-based

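Select by integer position along a dimension. The scenario labels are ordered ["low", "base", "high"], so isel(scenario=0) returns the same series as sel(scenario="low").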

[5]:
first_scenario = cube.isel(scenario=0)
first_scenario
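
Positional selection works the same way on any axis: pick one index and drop that axis. The isel() function below is a plain-Python sketch over nested lists, not the library's method. Negative indices here just follow Python's own indexing, and this notebook does not show whether the library supports them.

```python
def isel(values, axis, index):
    """Position-based selection along any axis of a rectangular nested list.

    The selected axis is dropped, so a 3 x 4 grid becomes a list of 4
    (axis=0) or a list of 3 (axis=1).
    """
    if axis == 0:
        return values[index]
    # Recurse into each sub-list until we reach the requested axis.
    return [isel(sub, axis - 1, index) for sub in values]


grid = [[10 * s + t for t in range(4)] for s in range(3)]  # 3 scenarios x 4 time steps
first_scenario = isel(grid, axis=0, index=0)
third_step = isel(grid, axis=1, index=2)
print(first_scenario, third_step)  # [0, 1, 2, 3] [2, 12, 22]
```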

Slicing a dimension

Select a range of labels to get a smaller cube or table.

[6]:
two_scenarios = cube.sel(scenario=slice("low", "base"))
print(f"Type:  {type(two_scenarios).__name__}")
print(two_scenarios)
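
A label slice first resolves both endpoint labels to positions and then keeps everything in between. The sketch below includes both endpoints, as pandas and xarray do for label slices. It is an assumption that TimeSeriesCube.sel() treats the stop label the same way, so check this by printing the result. label_slice() is a hypothetical helper.

```python
def label_slice(labels, start, stop):
    """Positions covered by a label slice, with BOTH endpoints included.

    Inclusive stops follow the pandas/xarray convention for label slices;
    treating TimeSeriesCube.sel() the same way is an assumption.
    """
    i, j = labels.index(start), labels.index(stop)
    if i > j:
        raise ValueError(f"{start!r} comes after {stop!r}")
    return list(range(i, j + 1))


scenarios = ["low", "base", "high"]
grid = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 scenarios x 2 time steps
kept = label_slice(scenarios, "low", "base")
sub_labels = [scenarios[i] for i in kept]
sub_grid = [grid[i] for i in kept]  # the scenario axis survives, so the result stays 2-D
print(sub_labels, sub_grid)  # ['low', 'base'] [[1.0, 2.0], [3.0, 4.0]]
```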

Auto-collapse to Table or Series

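When a sel() or isel() call removes enough dimensions, the result automatically becomes a TimeSeriesTable (2D) or TimeSeries (1D), as with the single-scenario selections above. To convert a cube into a table explicitly, call to_table().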

[7]:
table_view = cube.to_table()
print(f"Type: {type(table_view).__name__}")
table_view
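
The collapse rule depends only on how many dimensions remain after scalar selections. The sketch below shows that bookkeeping using the container names from this notebook. The helper names and the extra "region" dimension are hypothetical. Raising on zero remaining dimensions is this sketch's choice, not documented library behavior.

```python
def remaining_dims(dims, selection):
    """Dimensions left after picking ONE label on each dimension named in `selection`."""
    return [d for d in dims if d not in selection]


def collapsed_type(n_dims):
    """Container named in this notebook for a result with n_dims dimensions."""
    if n_dims == 1:
        return "TimeSeries"
    if n_dims == 2:
        return "TimeSeriesTable"
    if n_dims > 2:
        return "TimeSeriesCube"
    raise ValueError("no dimensions left: the result would be a single value")


dims = ["scenario", "region", "valid_time"]  # "region" is a hypothetical extra axis
one_pick = remaining_dims(dims, {"scenario": "base"})
two_picks = remaining_dims(dims, {"scenario": "base", "region": "north"})
print(collapsed_type(len(one_pick)))   # TimeSeriesTable
print(collapsed_type(len(two_picks)))  # TimeSeries
```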

Building a cube from a list of TimeSeries

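from_timeseries_list() is handy when you already have the individual forecasts, such as ensemble members or percentile curves, as separate TimeSeries objects. It stacks them along a new Dimension, with one label per series.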

[8]:
series_list = [
    tdm.TimeSeries(
        tdm.Frequency.PT1H,
        timestamps=hours,
        values=(base_prices * factor + rng.normal(0, 2, 24)).tolist(),
        name="price",
        unit="EUR/MWh",
    )
    for factor in [0.7, 0.85, 1.0, 1.15, 1.3]
]

ensemble = tdm.TimeSeriesCube.from_timeseries_list(
    series_list,
    dimension=tdm.Dimension("percentile", ["p10", "p25", "p50", "p75", "p90"]),
    name="price_ensemble",
)
ensemble
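
Stacking amounts to checking that the series line up and then placing each one in its own row under a new label. The plain-Python sketch below assumes, without confirmation from this notebook, that from_timeseries_list() requires identical timestamps and exactly one label per series. stack_series() is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stack_series(series, labels):
    """Stack (timestamps, values) pairs along a new labelled dimension.

    Every series must share the same timestamps, and there must be exactly
    one label per series; the result is a len(labels) x n_times grid.
    """
    if len(series) != len(labels):
        raise ValueError(f"{len(series)} series but {len(labels)} labels")
    timestamps = series[0][0]
    for ts, _ in series[1:]:
        if ts != timestamps:
            raise ValueError("all series must share the same timestamps")
    return list(labels), timestamps, [list(vals) for _, vals in series]


base = datetime(2024, 1, 15, tzinfo=timezone.utc)
hours = [base + timedelta(hours=i) for i in range(3)]
factors = [0.7, 0.85, 1.0, 1.15, 1.3]
members = [(hours, [round(50 * f, 2)] * 3) for f in factors]
labels, times, grid = stack_series(members, ["p10", "p25", "p50", "p75", "p90"])
print(len(grid), len(grid[0]))  # 5 3
```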

TimeSeriesCollection

A TimeSeriesCollection groups time series that may have different frequencies, time ranges, or numbers of points. Think of it as a named bag of TimeSeries and TimeSeriesTable objects.

[9]:
daily_base = datetime(2024, 1, 1, tzinfo=timezone.utc)

ts_hourly = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=hours,
    values=[100.0 + rng.normal(0, 10) for _ in range(24)],
    name="wind_hourly",
    unit="MW",
)

ts_daily = tdm.TimeSeries(
    tdm.Frequency.P1D,
    timestamps=[daily_base + timedelta(days=d) for d in range(30)],
    values=[2400.0 + rng.normal(0, 200) for _ in range(30)],
    name="wind_daily_energy",
    unit="MWh",
)

ts_15min = tdm.TimeSeries(
    tdm.Frequency.PT15M,
    timestamps=[base + timedelta(minutes=15 * i) for i in range(96)],
    values=[50.0 + rng.normal(0, 5) for _ in range(96)],
    name="solar_15min",
    unit="MW",
)
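
These three series cannot share one time axis. Datetime arithmetic on their spans shows why. The hourly and 15-minute series both cover January 15, but at different resolutions (24 and 96 points), while the daily series runs from January 1 to January 30. The covers value assumes that each timestamp labels the start of a full interval.

```python
from datetime import datetime, timedelta, timezone

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
daily_base = datetime(2024, 1, 1, tzinfo=timezone.utc)

# (step, number of points, first timestamp) for the three series defined above
layouts = {
    "wind_hourly": (timedelta(hours=1), 24, base),
    "wind_daily_energy": (timedelta(days=1), 30, daily_base),
    "solar_15min": (timedelta(minutes=15), 96, base),
}

spans = {}
for name, (step, n, start) in layouts.items():
    last = start + (n - 1) * step  # timestamp of the final point
    covered = n * step             # total span if each point labels one full interval
    spans[name] = (last, covered)
    print(f"{name:18s} last={last:%Y-%m-%d %H:%M}  covers={covered}")
```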

Creating a collection

[10]:
collection = tdm.TimeSeriesCollection(
    [ts_hourly, ts_daily, ts_15min],
    name="Plant overview",
    description="Mixed-frequency data for a single plant",
)
collection

Dictionary-like access

[11]:
print(f"Names: {collection.names}")
print(f"Count: {collection.series_count}")

collection["wind_hourly"]
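
Dictionary-like access means the collection behaves like a read-only mapping from series name to series. The minimal sketch below is built on collections.abc.Mapping, which supplies keys(), items(), get() and the in operator once __getitem__, __iter__ and __len__ are defined. SeriesBag is hypothetical, and its names and series_count properties only mirror the attributes used above.

```python
from collections.abc import Mapping


class SeriesBag(Mapping):
    """Hypothetical minimal collection: read-only, dict-like access by series name."""

    def __init__(self, named_series):
        pairs = list(named_series)           # (name, series) pairs
        self._items = dict(pairs)
        if len(self._items) != len(pairs):   # dict() would silently drop duplicates
            raise ValueError("series names must be unique")

    def __getitem__(self, name):
        return self._items[name]

    def __iter__(self):
        return iter(self._items)             # insertion order is preserved

    def __len__(self):
        return len(self._items)

    @property
    def names(self):
        return list(self._items)

    @property
    def series_count(self):
        return len(self._items)


bag = SeriesBag([
    ("wind_hourly", [100.0] * 24),
    ("wind_daily_energy", [2400.0] * 30),
    ("solar_15min", [50.0] * 96),
])
print(bag.names, bag.series_count, len(bag["solar_15min"]))
```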

Adding and removing series

Collections are immutable: add() and remove() return new collections and leave the original unchanged.

[12]:
ts_price = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=hours,
    values=[45.0 + rng.normal(0, 8) for _ in range(24)],
    name="spot_price",
    unit="EUR/MWh",
)

extended = collection.add(ts_price)
print(f"Original: {collection.names}")
print(f"Extended: {extended.names}")
[13]:
reduced = extended.remove("wind_daily_energy")
print(f"Reduced: {reduced.names}")
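
Immutability means every change produces a new container while the original stays as it was. The sketch below shows that pattern. FrozenBag is hypothetical, and its add(name, series) signature differs from the notebook's add(ts_price), which takes a series that carries its own name. Rejecting duplicate names in add() is this sketch's choice.

```python
from types import MappingProxyType


class FrozenBag:
    """Hypothetical immutable collection: add/remove return new objects."""

    def __init__(self, items=()):
        # Copy into a fresh dict and expose it only through a read-only view.
        self._items = MappingProxyType(dict(items))

    @property
    def names(self):
        return list(self._items)

    def add(self, name, series):
        if name in self._items:
            raise ValueError(f"{name!r} already present")
        return FrozenBag({**self._items, name: series})

    def remove(self, name):
        if name not in self._items:
            raise KeyError(name)
        return FrozenBag({k: v for k, v in self._items.items() if k != name})


original = FrozenBag({"wind_hourly": [1.0], "wind_daily_energy": [2.0]})
extended = original.add("spot_price", [3.0])
reduced = extended.remove("wind_daily_energy")
print(original.names, extended.names, reduced.names)
```

The new containers share the underlying series objects instead of copying them, which keeps add() and remove() cheap.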

Iterating over a collection

[14]:
for name, series in collection.items():
    print(f"{name:20s}  freq={str(series.frequency):5s}  len={len(series):3d}  begin={series.begin}")

Summary

  • TimeSeriesCube: N-dimensional time series with named Dimension labels; slice with sel() / isel(); auto-collapses to a TimeSeriesTable or TimeSeries

  • TimeSeriesCollection: heterogeneous container for series with different frequencies and time ranges; dictionary-like access; immutable add/remove

Next up: nb_07 covers data quality tools — coverage bars and validation.