Arrays and Collections

For multi-dimensional data (e.g., scenario × time, or region × quantile × time), use TimeSeriesArray. For grouping heterogeneous time series that don’t share the same timestamps, use TimeSeriesCollection.

TimeSeriesArray

A TimeSeriesArray stores N-dimensional data with named Dimension objects. Each dimension has a name and a list of labels (datetimes, strings, or numbers).
Common use cases include ensemble forecasts, scenario analysis, and multi-site probabilistic forecasts.
[1]:
from datetime import datetime, timedelta, timezone

import numpy as np

import timedatamodel as tdm

rng = np.random.default_rng(42)
base = datetime(2024, 1, 15, tzinfo=timezone.utc)

Building a 4D array

Imagine a wind power forecasting system that produces:

  • 3 knowledge times — forecasts issued at 00:00, 06:00, and 12:00

  • 24 valid times — hourly forecast horizon

  • 3 wind farms — Alpha, Bravo, Charlie

  • 5 quantiles — probabilistic spread from q10 to q90

That gives a (3, 24, 3, 5) array with 3 × 24 × 3 × 5 = 1,080 values.
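As a quick sanity check of that value count, the shape can be derived from the four dimension sizes with the standard library alone. This is a plain-Python sketch that does not use timedatamodel; the `dim_sizes` name is only illustrative.

```python
import math

# Dimension sizes taken from the list above (the dict name is illustrative only).
dim_sizes = {
    "knowledge_time": 3,  # issuances at 00:00, 06:00, 12:00
    "valid_time": 24,     # hourly forecast horizon
    "wind_farm": 3,       # Alpha, Bravo, Charlie
    "quantile": 5,        # q10 ... q90
}

shape = tuple(dim_sizes.values())
n_values = math.prod(shape)

print(shape)     # (3, 24, 3, 5)
print(n_values)  # 1080
```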

[2]:
knowledge_times = [base + timedelta(hours=h) for h in [0, 6, 12]]
valid_times = [base + timedelta(hours=h) for h in range(24)]
wind_farms = ["Alpha", "Bravo", "Charlie"]
quantiles = ["q10", "q25", "q50", "q75", "q90"]

nk = len(knowledge_times)
nv = len(valid_times)
nw = len(wind_farms)
nq = len(quantiles)

data = np.empty((nk, nv, nw, nq))
for k in range(nk):
    for w in range(nw):
        capacity = 50 + 20 * w
        daily_shape = capacity * (1 + 0.3 * np.sin(np.linspace(0, 2 * np.pi, nv)))
        for q_idx, q_spread in enumerate([-0.4, -0.15, 0, 0.15, 0.4]):
            data[k, :, w, q_idx] = daily_shape * (1 + q_spread) + rng.normal(0, 3, nv) + k * 5

cube = tdm.TimeSeriesArray(
    tdm.Frequency.PT1H,
    timezone="UTC",
    name="wind_power",
    unit="MW",
    data_type=tdm.DataType.FORECAST,
    dimensions=[
        tdm.Dimension("knowledge_time", knowledge_times),
        tdm.Dimension("valid_time", valid_times),
        tdm.Dimension("wind_farm", wind_farms),
        tdm.Dimension("quantile", quantiles),
    ],
    values=data,
)
cube
[2]:
TimeSeriesArray
Name:       wind_power
Dimensions: knowledge_time: 3, valid_time: 24, wind_farm: 3, quantile: 5
Shape:      (3, 24, 3, 5)
Frequency:  PT1H
Timezone:   UTC (+00:00)
Unit:       MW
Data type:  FORECAST

                                    Alpha                               ...  Charlie
knowledge_time    valid_time        q10      q25      q50      q75      ...  q25      q50      q75      q90
2024-01-15 00:00  2024-01-15 00:00  30.9142  41.215   52.0367  54.7416  ...  77.9146  89.1235  108.732  123.987
2024-01-15 00:00  2024-01-15 01:00  29.3082  44.8835  54.2497  63.6455  ...  85.73    96.974   113.194  137.134
2024-01-15 00:00  2024-01-15 02:00  36.9276  50.7216  58.6611  66.8901  ...  88.8907  103.273  122.117  149.106
...
2024-01-15 12:00  2024-01-15 21:00  38.8869  40.5245  51.7531  57.0253  ...  75.5962  89.1078  104.702  124.787
2024-01-15 12:00  2024-01-15 22:00  34.51    49.311   53.7148  60.9618  ...  82.3184  96.2784  100.991  124.042
2024-01-15 12:00  2024-01-15 23:00  34.4205  50.8314  56.2491  68.4344  ...  85.3755  100.608  117.916  140.306

Array properties

[3]:
print(f"Shape:      {cube.shape}")
print(f"Dimensions: {cube.dim_names}")
print(f"N dims:     {cube.ndim}")
print(f"Begin:      {cube.begin}")
print(f"End:        {cube.end}")
print(f"Has missing:{cube.has_missing}")
Shape:      (3, 24, 3, 5)
Dimensions: ('knowledge_time', 'valid_time', 'wind_farm', 'quantile')
N dims:     4
Begin:      2024-01-15 00:00:00+00:00
End:        2024-01-15 23:00:00+00:00
Has missing:False
[4]:
coords = cube.coords
for dim_name, labels in coords.items():
    preview = labels[:3]
    suffix = f" ... ({len(labels)} total)" if len(labels) > 3 else ""
    print(f"{dim_name:18s} {preview}{suffix}")
knowledge_time     [datetime.datetime(2024, 1, 15, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 15, 6, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 15, 12, 0, tzinfo=datetime.timezone.utc)]
valid_time         [datetime.datetime(2024, 1, 15, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 15, 1, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 15, 2, 0, tzinfo=datetime.timezone.utc)] ... (24 total)
wind_farm          ['Alpha', 'Bravo', 'Charlie']
quantile           ['q10', 'q25', 'q50'] ... (5 total)

sel() — fixing one dimension (4D → 3D array)

Fix knowledge_time to get all farms and quantiles for a single forecast issuance.
The result is still an array because 3 dimensions remain.
[5]:
noon_forecast = cube.sel(knowledge_time=knowledge_times[2])  # 12:00 issuance

print(f"Type:  {type(noon_forecast).__name__}")
print(f"Shape: {noon_forecast.shape}")
print(f"Dims:  {noon_forecast.dim_names}")
Type:  TimeSeriesArray
Shape: (24, 3, 5)
Dims:  ('valid_time', 'wind_farm', 'quantile')
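Conceptually, fixing a dimension by label amounts to looking up the label's position and dropping that dimension from the shape. The following plain-Python sketch illustrates the idea; `sketch_sel` is a hypothetical helper written for this explanation and is not how timedatamodel necessarily implements sel().

```python
from datetime import datetime, timedelta, timezone

base = datetime(2024, 1, 15, tzinfo=timezone.utc)

# Dimension name -> labels, mirroring the cube built earlier in this notebook.
dims = {
    "knowledge_time": [base + timedelta(hours=h) for h in (0, 6, 12)],
    "valid_time": [base + timedelta(hours=h) for h in range(24)],
    "wind_farm": ["Alpha", "Bravo", "Charlie"],
    "quantile": ["q10", "q25", "q50", "q75", "q90"],
}


def sketch_sel(dims, **fixed):
    """Hypothetical helper: resolve labels to positions and report what remains."""
    # list.index raises ValueError if a label is not present in its dimension.
    positions = {name: dims[name].index(label) for name, label in fixed.items()}
    remaining = {name: len(labels) for name, labels in dims.items() if name not in fixed}
    return positions, tuple(remaining), tuple(remaining.values())


positions, dim_names, shape = sketch_sel(dims, knowledge_time=base + timedelta(hours=12))
print(positions)  # {'knowledge_time': 2}
print(dim_names)  # ('valid_time', 'wind_farm', 'quantile')
print(shape)      # (24, 3, 5)
```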

sel() — fixing two dimensions (4D → 2D → TimeSeriesTable)

Fix knowledge_time and wind_farm to see all quantiles over time for one farm.
Only 2 dimensions remain (valid_time × quantile), so the array auto-collapses to a TimeSeriesTable.
[6]:
alpha_quantiles = cube.sel(
    knowledge_time=knowledge_times[2],
    wind_farm="Alpha",
)

print(f"Type:    {type(alpha_quantiles).__name__}")
print(f"Columns: {alpha_quantiles.column_names}")
alpha_quantiles
Type:    TimeSeriesTable
Columns: ('q10', 'q25', 'q50', 'q75', 'q90')
[6]:
TimeSeriesTable
Name:      unnamed
Columns:   q10, q25, q50, q75, q90
Length:    24 × 5
Frequency: PT1H
Timezone:  UTC (+00:00)

timestamp         q10      q25      q50      q75      q90
2024-01-15 00:00  39.7115  52.7971  56.1605  69.0337  78.794
2024-01-15 01:00  45.8122  58.7324  69.0924  73.3278  86.398
2024-01-15 02:00  37.834   64.5175  72.9807  71.1027  91.7309
...
2024-01-15 21:00  38.8869  40.5245  51.7531  57.0253  73.3712
2024-01-15 22:00  34.51    49.311   53.7148  60.9618  79.8464
2024-01-15 23:00  34.4205  50.8314  56.2491  68.4344  78.9922

Fix a different pair — knowledge_time and quantile — to compare farms at the median:

[7]:
farms_median = cube.sel(
    knowledge_time=knowledge_times[2],
    quantile="q50",
)

print(f"Type:    {type(farms_median).__name__}")
print(f"Columns: {farms_median.column_names}")
farms_median
Type:    TimeSeriesTable
Columns: ('Alpha', 'Bravo', 'Charlie')
[7]:
TimeSeriesTable
Name:      unnamed
Columns:   Alpha, Bravo, Charlie
Length:    24 × 3
Frequency: PT1H
Timezone:  UTC (+00:00)

timestamp         Alpha    Bravo    Charlie
2024-01-15 00:00  56.1605  72.4949  102.269
2024-01-15 01:00  69.0924  80.5538  108.421
2024-01-15 02:00  72.9807  88.4122  110.324
...
2024-01-15 21:00  51.7531  65.86    89.1078
2024-01-15 22:00  53.7148  72.079   96.2784
2024-01-15 23:00  56.2491  81.1911  100.608

sel() — fixing three dimensions (4D → 1D → TimeSeriesList)

Fix everything except valid_time to extract a single time series.

[8]:
single = cube.sel(
    knowledge_time=knowledge_times[2],
    wind_farm="Alpha",
    quantile="q50",
)

print(f"Type: {type(single).__name__}")
print(f"Len:  {len(single)}")
single
Type: TimeSeriesList
Len:  24
[8]:
TimeSeriesList
Name:      wind_power
Length:    24
Frequency: PT1H
Timezone:  UTC (+00:00)
Unit:      MW
Data type: FORECAST

timestamp         wind_power
2024-01-15 00:00  56.1605
2024-01-15 01:00  69.0924
2024-01-15 02:00  72.9807
...
2024-01-15 21:00  51.7531
2024-01-15 22:00  53.7148
2024-01-15 23:00  56.2491
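
The three sel() examples above suggest a simple rule for the return type, keyed on how many dimensions remain. The sketch below encodes that rule as observed in this notebook's outputs. `sketch_result_type` is a hypothetical helper, not part of timedatamodel, and the rule is only claimed for selection results: the two-dimensional array produced by from_timeseries_list() later in this notebook stays a TimeSeriesArray.

```python
def sketch_result_type(n_remaining: int) -> str:
    """Hypothetical helper: type name observed after sel() for a given number of remaining dimensions."""
    if n_remaining >= 3:
        return "TimeSeriesArray"
    if n_remaining == 2:
        return "TimeSeriesTable"
    if n_remaining == 1:
        return "TimeSeriesList"
    # Fixing every dimension is not shown in this notebook, so the sketch refuses to guess.
    raise ValueError("behaviour for zero remaining dimensions is not covered here")


total_dims = 4
results = {n_fixed: sketch_result_type(total_dims - n_fixed) for n_fixed in (1, 2, 3)}
for n_fixed, type_name in results.items():
    print(f"{n_fixed} fixed -> {type_name}")
```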

isel() — index-based selection

Use integer positions instead of labels.

[9]:
bravo_p90 = cube.isel(
    knowledge_time=0,
    wind_farm=1,       # Bravo
    quantile=-1,       # q90 (last)
)

print(f"Type: {type(bravo_p90).__name__}, len={len(bravo_p90)}")
print(f"Mean: {np.nanmean(bravo_p90.arr):.1f} MW")
Type: TimeSeriesList, len=24
Mean: 97.7 MW
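
To make the link between positions and labels explicit, the sketch below translates the integer positions used above back into labels. It assumes isel() follows ordinary Python indexing, including negative positions counting from the end, as the `# q90 (last)` comment in the cell implies. `sketch_positions_to_labels` is a hypothetical helper, not a timedatamodel function.

```python
wind_farms = ["Alpha", "Bravo", "Charlie"]
quantiles = ["q10", "q25", "q50", "q75", "q90"]

labels_by_dim = {"wind_farm": wind_farms, "quantile": quantiles}


def sketch_positions_to_labels(labels_by_dim, **positions):
    """Hypothetical helper: translate integer positions into the labels they point at."""
    return {name: labels_by_dim[name][pos] for name, pos in positions.items()}


picked = sketch_positions_to_labels(labels_by_dim, wind_farm=1, quantile=-1)
print(picked)  # {'wind_farm': 'Bravo', 'quantile': 'q90'}
```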

Slicing a dimension

Use slice() to keep a range of labels. Both endpoints are included (slice("q25", "q75") keeps q25, q50, and q75), and the sliced dimension is preserved rather than collapsed.

[10]:
narrow = cube.sel(
    knowledge_time=knowledge_times[2],
    wind_farm="Alpha",
    quantile=slice("q25", "q75"),
)

print(f"Type:    {type(narrow).__name__}")
print(f"Columns: {narrow.column_names}")
narrow
Type:    TimeSeriesTable
Columns: ('q25', 'q50', 'q75')
[10]:
TimeSeriesTable
Name:      unnamed
Columns:   q25, q50, q75
Length:    24 × 3
Frequency: PT1H
Timezone:  UTC (+00:00)

timestamp         q25      q50      q75
2024-01-15 00:00  52.7971  56.1605  69.0337
2024-01-15 01:00  58.7324  69.0924  73.3278
2024-01-15 02:00  64.5175  72.9807  71.1027
...
2024-01-15 21:00  40.5245  51.7531  57.0253
2024-01-15 22:00  49.311   53.7148  60.9618
2024-01-15 23:00  50.8314  56.2491  68.4344
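
The output above keeps q25, q50, and q75, which suggests that label slices include both endpoints, unlike positional Python slices. A minimal plain-Python sketch of that behaviour follows; `sketch_label_slice` is a hypothetical helper and only mirrors what the output shows.

```python
quantiles = ["q10", "q25", "q50", "q75", "q90"]


def sketch_label_slice(labels, start, stop):
    """Hypothetical helper: keep labels from start to stop, both endpoints included."""
    lo = labels.index(start)
    hi = labels.index(stop)
    return labels[lo : hi + 1]


inclusive = sketch_label_slice(quantiles, "q25", "q75")
positional = quantiles[1:3]  # a plain positional slice drops the upper endpoint

print(inclusive)   # ['q25', 'q50', 'q75']
print(positional)  # ['q25', 'q50']
```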

Converting to pandas

to_pandas_dataframe() produces a long-format DataFrame with a MultiIndex covering all dimensions.

[11]:
df = cube.to_pandas_dataframe()
print(f"Shape:        {df.shape}")
print(f"Index levels: {list(df.index.names)}")
df.head(10)
Shape:        (1080, 1)
Index levels: ['knowledge_time', 'valid_time', 'wind_farm', 'quantile']
[11]:
                                                                           wind_power
knowledge_time             valid_time                 wind_farm  quantile
2024-01-15 00:00:00+00:00  2024-01-15 00:00:00+00:00  Alpha      q10       30.914151
                                                                 q25       41.215017
                                                                 q50       52.036741
                                                                 q75       54.741643
                                                                 q90       66.031901
                                                      Bravo      q10       38.929508
                                                                 q25       57.061177
                                                                 q50       69.530086
                                                                 q75       79.350438
                                                                 q90       99.614346
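
The long format has one row per cell of the 4D array, which is why the DataFrame has 1080 rows. The sketch below enumerates the index tuples with itertools.product, using shortened string stand-ins for the datetimes. The row order (last dimension varying fastest) is an assumption read off the head(10) output, not a documented guarantee.

```python
from itertools import product

knowledge_times = ["00:00", "06:00", "12:00"]      # shortened stand-ins for the datetimes
valid_times = [f"{h:02d}:00" for h in range(24)]   # 24 hourly labels
wind_farms = ["Alpha", "Bravo", "Charlie"]
quantiles = ["q10", "q25", "q50", "q75", "q90"]

# One tuple per cell of the 4D array; the last dimension varies fastest.
index_rows = list(product(knowledge_times, valid_times, wind_farms, quantiles))

print(len(index_rows))  # 1080
print(index_rows[0])    # ('00:00', '00:00', 'Alpha', 'q10')
print(index_rows[5])    # ('00:00', '00:00', 'Bravo', 'q10')
```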

Building an array from a list of TimeSeriesList objects

from_timeseries_list() is handy when you already have individual forecasts.

[12]:
base_prices = 50 + 20 * np.sin(np.linspace(0, 2 * np.pi, 24))

series_list = [
    tdm.TimeSeriesList(
        tdm.Frequency.PT1H,
        timestamps=valid_times,
        values=(base_prices * factor + rng.normal(0, 2, 24)).tolist(),
        name="price",
        unit="EUR/MWh",
    )
    for factor in [0.7, 0.85, 1.0, 1.15, 1.3]
]

ensemble = tdm.TimeSeriesArray.from_timeseries_list(
    series_list,
    dimension=tdm.Dimension("percentile", ["p10", "p25", "p50", "p75", "p90"]),
    name="price_ensemble",
)
ensemble
[12]:
TimeSeriesArray
Name:       price_ensemble
Dimensions: percentile: 5, valid_time: 24
Shape:      (5, 24)
Frequency:  PT1H
Timezone:   UTC (+00:00)
Unit:       EUR/MWh

valid_time        p10      p25      p50      p75      p90
2024-01-15 00:00  35.4875  40.3053  49.5122  58.0453  63.4761
2024-01-15 01:00  38.4747  48.488   51.201   62.5824  72.312
2024-01-15 02:00  43.1394  48.6213  58.6033  70.8461  79.2416
...
2024-01-15 21:00  28.954   33.1268  41.6687  43.5539  49.6378
2024-01-15 22:00  32.4195  41.4561  44.9722  51.125   60.9551
2024-01-15 23:00  36.0393  42.3031  51.9253  54.2996  62.8834
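
The (5, 24) shape above comes from stacking five equal-length series along a new leading dimension. The plain-Python sketch below shows that bookkeeping with flat stand-in values. The equal-length check is an assumption about what stacking presumably requires; it is not taken from the timedatamodel documentation.

```python
percentiles = ["p10", "p25", "p50", "p75", "p90"]
factors = [0.7, 0.85, 1.0, 1.15, 1.3]

# Five stand-in series of 24 hourly values each (flat profiles, no noise).
series_values = [[50.0 * factor] * 24 for factor in factors]

# Stacking only makes sense when every series has the same length.
lengths = {len(values) for values in series_values}
if len(lengths) != 1:
    raise ValueError("all series must have the same length to be stacked")

# New leading dimension: one label per stacked series.
stacked = dict(zip(percentiles, series_values))
shape = (len(stacked), len(series_values[0]))

print(shape)  # (5, 24)
```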

TimeSeriesCollection

A TimeSeriesCollection groups time series that may have different frequencies, time ranges, or numbers of points. Think of it as a named bag of TimeSeriesList and TimeSeriesTable objects.

[13]:
daily_base = datetime(2024, 1, 1, tzinfo=timezone.utc)
hours = [base + timedelta(hours=i) for i in range(24)]

ts_hourly = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=hours,
    values=[100.0 + rng.normal(0, 10) for _ in range(24)],
    name="wind_hourly",
    unit="MW",
)

ts_daily = tdm.TimeSeriesList(
    tdm.Frequency.P1D,
    timestamps=[daily_base + timedelta(days=d) for d in range(30)],
    values=[2400.0 + rng.normal(0, 200) for _ in range(30)],
    name="wind_daily_energy",
    unit="MWh",
)

ts_15min = tdm.TimeSeriesList(
    tdm.Frequency.PT15M,
    timestamps=[base + timedelta(minutes=15 * i) for i in range(96)],
    values=[50.0 + rng.normal(0, 5) for _ in range(96)],
    name="solar_15min",
    unit="MW",
)

Creating a collection

[14]:
collection = tdm.TimeSeriesCollection(
    [ts_hourly, ts_daily, ts_15min],
    name="Plant overview",
    description="Mixed-frequency data for a single plant",
)
collection
[14]:
TimeSeriesCollection
name               type            freq   tz   length  begin             end
wind_hourly        TimeSeriesList  PT1H   UTC  24      2024-01-15 00:00  2024-01-15 23:00
wind_daily_energy  TimeSeriesList  P1D    UTC  30      2024-01-01 00:00  2024-01-30 00:00
solar_15min        TimeSeriesList  PT15M  UTC  96      2024-01-15 00:00  2024-01-15 23:45

Dictionary-like access

[15]:
print(f"Names: {collection.names}")
print(f"Count: {collection.series_count}")

collection["wind_hourly"]
Names: ['wind_hourly', 'wind_daily_energy', 'solar_15min']
Count: 3
[15]:
TimeSeriesList
Name:      wind_hourly
Length:    24
Frequency: PT1H
Timezone:  UTC (+00:00)
Unit:      MW

timestamp         wind_hourly
2024-01-15 00:00  86.7746
2024-01-15 01:00  95.1381
2024-01-15 02:00  104.202
...
2024-01-15 21:00  112.906
2024-01-15 22:00  104.11
2024-01-15 23:00  107.826

Adding and removing series

Collections are immutable — add() and remove() return new collections.

[16]:
ts_price = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=hours,
    values=[45.0 + rng.normal(0, 8) for _ in range(24)],
    name="spot_price",
    unit="EUR/MWh",
)

extended = collection.add(ts_price)
print(f"Original: {collection.names}")
print(f"Extended: {extended.names}")
Original: ['wind_hourly', 'wind_daily_energy', 'solar_15min']
Extended: ['wind_hourly', 'wind_daily_energy', 'solar_15min', 'spot_price']
[17]:
reduced = extended.remove("wind_daily_energy")
print(f"Reduced: {reduced.names}")
Reduced: ['wind_hourly', 'solar_15min', 'spot_price']
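
The immutable add/remove pattern shown above can be sketched with a frozen dataclass: each operation builds a new object and leaves the original untouched. `SketchCollection` is a hypothetical stand-in that tracks only names; it is not the timedatamodel implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchCollection:
    """Hypothetical stand-in for an immutable, name-keyed container."""

    names: tuple = ()

    def add(self, name):
        # Build a new object instead of mutating this one.
        return SketchCollection(self.names + (name,))

    def remove(self, name):
        if name not in self.names:
            raise KeyError(name)
        return SketchCollection(tuple(n for n in self.names if n != name))


original = SketchCollection(("wind_hourly", "wind_daily_energy", "solar_15min"))
extended = original.add("spot_price")
reduced = extended.remove("wind_daily_energy")

print(original.names)  # ('wind_hourly', 'wind_daily_energy', 'solar_15min')
print(reduced.names)   # ('wind_hourly', 'solar_15min', 'spot_price')
```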

Iterating over a collection

[18]:
for name, series in collection.items():
    print(f"{name:20s}  freq={str(series.frequency):5s}  len={len(series):3d}  begin={series.begin}")
wind_hourly           freq=PT1H   len= 24  begin=2024-01-15 00:00:00+00:00
wind_daily_energy     freq=P1D    len= 30  begin=2024-01-01 00:00:00+00:00
solar_15min           freq=PT15M  len= 96  begin=2024-01-15 00:00:00+00:00

Summary

  • TimeSeriesArray: N-dimensional time series with Dimension labels; select with sel() / isel(); auto-collapses to TimeSeriesTable or TimeSeriesList

  • TimeSeriesCollection: heterogeneous container for series with different frequencies and time ranges; dictionary-like access; immutable add/remove

Next up: nb_07 covers data quality tools — coverage bars and validation.