Arrays and Collections
For multi-dimensional data (e.g., scenario × time, or region × quantile × time) use TimeSeriesArray. For grouping heterogeneous time series that don’t share the same timestamps, use TimeSeriesCollection.
TimeSeriesArray
A TimeSeriesArray is built from Dimension objects. Each dimension has a name and a list of labels (datetimes, strings, or numbers).
[1]:
from datetime import datetime, timedelta, timezone
import numpy as np
import timedatamodel as tdm
rng = np.random.default_rng(42)
base = datetime(2024, 1, 15, tzinfo=timezone.utc)
Building a 4D array
Imagine a wind power forecasting system that produces:
3 knowledge times — forecasts issued at 00:00, 06:00, and 12:00
24 valid times — hourly forecast horizon
3 wind farms — Alpha, Bravo, Charlie
5 quantiles — probabilistic spread from q10 to q90
That gives a (3, 24, 3, 5) array with 1 080 values.
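The shape follows directly from the number of labels per dimension. A minimal sketch using only the standard library (it makes no timedatamodel calls, and the `labels` dict is just an illustrative container):

```python
from datetime import datetime, timedelta, timezone
from math import prod

base = datetime(2024, 1, 15, tzinfo=timezone.utc)

# Illustrative container only: one list of labels per dimension.
labels = {
    "knowledge_time": [base + timedelta(hours=h) for h in (0, 6, 12)],
    "valid_time": [base + timedelta(hours=h) for h in range(24)],
    "wind_farm": ["Alpha", "Bravo", "Charlie"],
    "quantile": ["q10", "q25", "q50", "q75", "q90"],
}

# The array shape is the label count of each dimension, in order.
shape = tuple(len(v) for v in labels.values())
n_values = prod(shape)
print(shape, n_values)
```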
[2]:
knowledge_times = [base + timedelta(hours=h) for h in [0, 6, 12]]
valid_times = [base + timedelta(hours=h) for h in range(24)]
wind_farms = ["Alpha", "Bravo", "Charlie"]
quantiles = ["q10", "q25", "q50", "q75", "q90"]
nk = len(knowledge_times)
nv = len(valid_times)
nw = len(wind_farms)
nq = len(quantiles)
data = np.empty((nk, nv, nw, nq))
for k in range(nk):
    for w in range(nw):
        capacity = 50 + 20 * w  # base level rises with the farm index
        daily_shape = capacity * (1 + 0.3 * np.sin(np.linspace(0, 2 * np.pi, nv)))
        for q_idx, q_spread in enumerate([-0.4, -0.15, 0, 0.15, 0.4]):
            data[k, :, w, q_idx] = daily_shape * (1 + q_spread) + rng.normal(0, 3, nv) + k * 5
cube = tdm.TimeSeriesArray(
    tdm.Frequency.PT1H,
    timezone="UTC",
    name="wind_power",
    unit="MW",
    data_type=tdm.DataType.FORECAST,
    dimensions=[
        tdm.Dimension("knowledge_time", knowledge_times),
        tdm.Dimension("valid_time", valid_times),
        tdm.Dimension("wind_farm", wind_farms),
        tdm.Dimension("quantile", quantiles),
    ],
    values=data,
)
cube
[2]:
| knowledge_time | valid_time | Alpha q10 | Alpha q25 | Alpha q50 | Alpha q75 | … | Charlie q25 | Charlie q50 | Charlie q75 | Charlie q90 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2024-01-15 00:00 | 2024-01-15 00:00 | 30.9142 | 41.215 | 52.0367 | 54.7416 | … | 77.9146 | 89.1235 | 108.732 | 123.987 |
| 2024-01-15 00:00 | 2024-01-15 01:00 | 29.3082 | 44.8835 | 54.2497 | 63.6455 | … | 85.73 | 96.974 | 113.194 | 137.134 |
| 2024-01-15 00:00 | 2024-01-15 02:00 | 36.9276 | 50.7216 | 58.6611 | 66.8901 | … | 88.8907 | 103.273 | 122.117 | 149.106 |
| … | … | … | … | … | … | … | … | … | … | … |
| 2024-01-15 12:00 | 2024-01-15 21:00 | 38.8869 | 40.5245 | 51.7531 | 57.0253 | … | 75.5962 | 89.1078 | 104.702 | 124.787 |
| 2024-01-15 12:00 | 2024-01-15 22:00 | 34.51 | 49.311 | 53.7148 | 60.9618 | … | 82.3184 | 96.2784 | 100.991 | 124.042 |
| 2024-01-15 12:00 | 2024-01-15 23:00 | 34.4205 | 50.8314 | 56.2491 | 68.4344 | … | 85.3755 | 100.608 | 117.916 | 140.306 |
Array properties
[3]:
print(f"Shape: {cube.shape}")
print(f"Dimensions: {cube.dim_names}")
print(f"N dims: {cube.ndim}")
print(f"Begin: {cube.begin}")
print(f"End: {cube.end}")
print(f"Has missing:{cube.has_missing}")
Shape: (3, 24, 3, 5)
Dimensions: ('knowledge_time', 'valid_time', 'wind_farm', 'quantile')
N dims: 4
Begin: 2024-01-15 00:00:00+00:00
End: 2024-01-15 23:00:00+00:00
Has missing:False
[4]:
coords = cube.coords
for dim_name, labels in coords.items():
    preview = labels[:3]
    suffix = f" ... ({len(labels)} total)" if len(labels) > 3 else ""
    print(f"{dim_name:18s} {preview}{suffix}")
knowledge_time [datetime.datetime(2024, 1, 15, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 15, 6, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 15, 12, 0, tzinfo=datetime.timezone.utc)]
valid_time [datetime.datetime(2024, 1, 15, 0, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 15, 1, 0, tzinfo=datetime.timezone.utc), datetime.datetime(2024, 1, 15, 2, 0, tzinfo=datetime.timezone.utc)] ... (24 total)
wind_farm ['Alpha', 'Bravo', 'Charlie']
quantile ['q10', 'q25', 'q50'] ... (5 total)
sel() — fixing one dimension (4D → 3D array)
Fix knowledge_time to get all farms and quantiles for a single forecast issuance.
[5]:
noon_forecast = cube.sel(knowledge_time=knowledge_times[2]) # 12:00 issuance
print(f"Type: {type(noon_forecast).__name__}")
print(f"Shape: {noon_forecast.shape}")
print(f"Dims: {noon_forecast.dim_names}")
Type: TimeSeriesArray
Shape: (24, 3, 5)
Dims: ('valid_time', 'wind_farm', 'quantile')
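To make the idea of label-based selection concrete, here is a toy sketch in plain Python. It is not how timedatamodel implements sel(); the function `toy_sel`, the dimension names, and the values are all hypothetical. It only illustrates that fixing a dimension by label filters the data and drops that dimension from the result.

```python
from itertools import product

def toy_sel(dim_names, values, **fixed):
    """Fix some dimensions by label; return (remaining dim names, filtered values)."""
    keep = [i for i, name in enumerate(dim_names) if name not in fixed]
    remaining = tuple(dim_names[i] for i in keep)
    out = {}
    for key, val in values.items():
        # Keep an entry only if it matches every fixed label.
        if all(key[dim_names.index(n)] == lab for n, lab in fixed.items()):
            out[tuple(key[i] for i in keep)] = val
    return remaining, out

dim_names = ("issue", "farm", "quantile")
labels = {
    "issue": ["00:00", "12:00"],
    "farm": ["Alpha", "Bravo"],
    "quantile": ["q10", "q50", "q90"],
}
# Hypothetical values: one number per label combination.
values = {
    key: float(i)
    for i, key in enumerate(product(*(labels[n] for n in dim_names)))
}

remaining, subset = toy_sel(dim_names, values, issue="12:00")
print(remaining)
print(len(subset))
```

The fixed dimension no longer appears in `remaining`, which mirrors how the 4D cube above becomes a 3D array.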
sel() — fixing two dimensions (4D → 2D → TimeSeriesTable)
Fix knowledge_time and wind_farm to see all quantiles over time for one farm. The result is 2D (valid_time × quantile), so the array auto-collapses to a TimeSeriesTable.
[6]:
alpha_quantiles = cube.sel(
    knowledge_time=knowledge_times[2],
    wind_farm="Alpha",
)
print(f"Type: {type(alpha_quantiles).__name__}")
print(f"Columns: {alpha_quantiles.column_names}")
alpha_quantiles
Type: TimeSeriesTable
Columns: ('q10', 'q25', 'q50', 'q75', 'q90')
[6]:
| timestamp | q10 | q25 | q50 | q75 | q90 |
|---|---|---|---|---|---|
| 2024-01-15 00:00 | 39.7115 | 52.7971 | 56.1605 | 69.0337 | 78.794 |
| 2024-01-15 01:00 | 45.8122 | 58.7324 | 69.0924 | 73.3278 | 86.398 |
| 2024-01-15 02:00 | 37.834 | 64.5175 | 72.9807 | 71.1027 | 91.7309 |
| … | … | … | … | … | … |
| 2024-01-15 21:00 | 38.8869 | 40.5245 | 51.7531 | 57.0253 | 73.3712 |
| 2024-01-15 22:00 | 34.51 | 49.311 | 53.7148 | 60.9618 | 79.8464 |
| 2024-01-15 23:00 | 34.4205 | 50.8314 | 56.2491 | 68.4344 | 78.9922 |
Fix a different pair — knowledge_time and quantile — to compare farms at the median:
[7]:
farms_median = cube.sel(
    knowledge_time=knowledge_times[2],
    quantile="q50",
)
print(f"Type: {type(farms_median).__name__}")
print(f"Columns: {farms_median.column_names}")
farms_median
Type: TimeSeriesTable
Columns: ('Alpha', 'Bravo', 'Charlie')
[7]:
| timestamp | Alpha | Bravo | Charlie |
|---|---|---|---|
| 2024-01-15 00:00 | 56.1605 | 72.4949 | 102.269 |
| 2024-01-15 01:00 | 69.0924 | 80.5538 | 108.421 |
| 2024-01-15 02:00 | 72.9807 | 88.4122 | 110.324 |
| … | … | … | … |
| 2024-01-15 21:00 | 51.7531 | 65.86 | 89.1078 |
| 2024-01-15 22:00 | 53.7148 | 72.079 | 96.2784 |
| 2024-01-15 23:00 | 56.2491 | 81.1911 | 100.608 |
sel() — fixing three dimensions (4D → 1D → TimeSeriesList)
Fix everything except valid_time to extract a single time series.
[8]:
single = cube.sel(
    knowledge_time=knowledge_times[2],
    wind_farm="Alpha",
    quantile="q50",
)
print(f"Type: {type(single).__name__}")
print(f"Len: {len(single)}")
single
Type: TimeSeriesList
Len: 24
[8]:
| timestamp | wind_power |
|---|---|
| 2024-01-15 00:00 | 56.1605 |
| 2024-01-15 01:00 | 69.0924 |
| 2024-01-15 02:00 | 72.9807 |
| … | … |
| 2024-01-15 21:00 | 51.7531 |
| 2024-01-15 22:00 | 53.7148 |
| 2024-01-15 23:00 | 56.2491 |
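The three sel() examples suggest a simple rule for the return type, based on how many dimensions remain. The sketch below encodes only what the outputs above show; `result_kind` is a hypothetical helper, not a library function, and the case where every dimension (including valid_time) is fixed is not demonstrated here, so the sketch rejects it.

```python
def result_kind(n_remaining: int) -> str:
    """Return the container type observed above for a given number of remaining dims."""
    if n_remaining < 1:
        raise ValueError("behaviour with no remaining dimension is not covered here")
    if n_remaining == 1:
        return "TimeSeriesList"
    if n_remaining == 2:
        return "TimeSeriesTable"
    return "TimeSeriesArray"

cube_dims = ("knowledge_time", "valid_time", "wind_farm", "quantile")
examples = [
    {"knowledge_time"},
    {"knowledge_time", "wind_farm"},
    {"knowledge_time", "wind_farm", "quantile"},
]
kinds = [result_kind(len(cube_dims) - len(fixed)) for fixed in examples]
print(kinds)
```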
isel() — index-based selection
Use integer positions instead of labels.
[9]:
bravo_p90 = cube.isel(
    knowledge_time=0,
    wind_farm=1,   # Bravo
    quantile=-1,   # q90 (last)
)
print(f"Type: {type(bravo_p90).__name__}, len={len(bravo_p90)}")
print(f"Mean: {np.nanmean(bravo_p90.arr):.1f} MW")
Type: TimeSeriesList, len=24
Mean: 97.7 MW
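Integer positions follow ordinary Python indexing, so negative values count from the end. A small sketch of the position-to-label mapping used in the call above (`positions_to_labels` is a hypothetical helper, and the knowledge times are abbreviated to strings for brevity):

```python
labels_by_dim = {
    "knowledge_time": ["00:00", "06:00", "12:00"],
    "wind_farm": ["Alpha", "Bravo", "Charlie"],
    "quantile": ["q10", "q25", "q50", "q75", "q90"],
}

def positions_to_labels(labels_by_dim, **positions):
    """Translate integer positions into the labels they refer to."""
    return {dim: labels_by_dim[dim][pos] for dim, pos in positions.items()}

picked = positions_to_labels(labels_by_dim, knowledge_time=0, wind_farm=1, quantile=-1)
print(picked)
```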
Slicing a dimension
Use slice() to keep a range of labels. The dimension is preserved (not collapsed).
[10]:
narrow = cube.sel(
    knowledge_time=knowledge_times[2],
    wind_farm="Alpha",
    quantile=slice("q25", "q75"),
)
print(f"Type: {type(narrow).__name__}")
print(f"Columns: {narrow.column_names}")
narrow
Type: TimeSeriesTable
Columns: ('q25', 'q50', 'q75')
[10]:
| timestamp | q25 | q50 | q75 |
|---|---|---|---|
| 2024-01-15 00:00 | 52.7971 | 56.1605 | 69.0337 |
| 2024-01-15 01:00 | 58.7324 | 69.0924 | 73.3278 |
| 2024-01-15 02:00 | 64.5175 | 72.9807 | 71.1027 |
| … | … | … | … |
| 2024-01-15 21:00 | 40.5245 | 51.7531 | 57.0253 |
| 2024-01-15 22:00 | 49.311 | 53.7148 | 60.9618 |
| 2024-01-15 23:00 | 50.8314 | 56.2491 | 68.4344 |
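Note that the output includes q75, so the label slice is inclusive at both ends, unlike a positional Python slice. A plain-Python sketch of that behaviour (`label_slice` is a hypothetical helper written to match the output above, not the library's implementation):

```python
quantiles = ["q10", "q25", "q50", "q75", "q90"]

def label_slice(labels, start, stop):
    """Return the labels from start to stop, including both endpoints."""
    i, j = labels.index(start), labels.index(stop)
    return labels[i : j + 1]

by_label = label_slice(quantiles, "q25", "q75")
by_position = quantiles[1:3]  # positional slices exclude the stop index
print(by_label)
print(by_position)
```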
Converting to pandas
to_pandas_dataframe() produces a long-format DataFrame with a MultiIndex covering all dimensions.
[11]:
df = cube.to_pandas_dataframe()
print(f"Shape: {df.shape}")
print(f"Index levels: {list(df.index.names)}")
df.head(10)
Shape: (1080, 1)
Index levels: ['knowledge_time', 'valid_time', 'wind_farm', 'quantile']
[11]:
| knowledge_time | valid_time | wind_farm | quantile | wind_power |
|---|---|---|---|---|
| 2024-01-15 00:00:00+00:00 | 2024-01-15 00:00:00+00:00 | Alpha | q10 | 30.914151 |
| | | | q25 | 41.215017 |
| | | | q50 | 52.036741 |
| | | | q75 | 54.741643 |
| | | | q90 | 66.031901 |
| | | Bravo | q10 | 38.929508 |
| | | | q25 | 57.061177 |
| | | | q50 | 69.530086 |
| | | | q75 | 79.350438 |
| | | | q90 | 99.614346 |
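The row order in the long format matches a Cartesian product of the dimension labels with the last dimension varying fastest. A standard-library sketch of that ordering (it only reproduces the index, not the values, and does not call pandas or timedatamodel):

```python
from datetime import datetime, timedelta, timezone
from itertools import product

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
dims = {
    "knowledge_time": [base + timedelta(hours=h) for h in (0, 6, 12)],
    "valid_time": [base + timedelta(hours=h) for h in range(24)],
    "wind_farm": ["Alpha", "Bravo", "Charlie"],
    "quantile": ["q10", "q25", "q50", "q75", "q90"],
}

# One tuple per row of the long-format frame, last dimension varying fastest.
index_rows = list(product(*dims.values()))
print(len(index_rows))
for row in index_rows[:6]:
    print(row[2], row[3])
```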
Building an array from a list of TimeSeriesList
from_timeseries_list() is handy when you already have individual forecasts.
[12]:
base_prices = 50 + 20 * np.sin(np.linspace(0, 2 * np.pi, 24))
series_list = [
    tdm.TimeSeriesList(
        tdm.Frequency.PT1H,
        timestamps=valid_times,
        values=(base_prices * factor + rng.normal(0, 2, 24)).tolist(),
        name="price",
        unit="EUR/MWh",
    )
    for factor in [0.7, 0.85, 1.0, 1.15, 1.3]
]
ensemble = tdm.TimeSeriesArray.from_timeseries_list(
    series_list,
    dimension=tdm.Dimension("percentile", ["p10", "p25", "p50", "p75", "p90"]),
    name="price_ensemble",
)
ensemble
[12]:
| valid_time | p10 | p25 | p50 | p75 | p90 |
|---|---|---|---|---|---|
| 2024-01-15 00:00 | 35.4875 | 40.3053 | 49.5122 | 58.0453 | 63.4761 |
| 2024-01-15 01:00 | 38.4747 | 48.488 | 51.201 | 62.5824 | 72.312 |
| 2024-01-15 02:00 | 43.1394 | 48.6213 | 58.6033 | 70.8461 | 79.2416 |
| … | … | … | … | … | … |
| 2024-01-15 21:00 | 28.954 | 33.1268 | 41.6687 | 43.5539 | 49.6378 |
| 2024-01-15 22:00 | 32.4195 | 41.4561 | 44.9722 | 51.125 | 60.9551 |
| 2024-01-15 23:00 | 36.0393 | 42.3031 | 51.9253 | 54.2996 | 62.8834 |
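Conceptually, stacking equal-length series along a new dimension turns a list of columns into rows of a table. The sketch below shows that idea in plain Python; `stack_columns` and the numbers are hypothetical, and whether from_timeseries_list() performs a similar length or timestamp check is an assumption not confirmed by this notebook.

```python
def stack_columns(series_by_label):
    """Stack equal-length series into rows, one column per label."""
    lengths = {len(v) for v in series_by_label.values()}
    if len(lengths) != 1:
        raise ValueError("all series must have the same number of points")
    column_names = tuple(series_by_label)
    rows = [tuple(row) for row in zip(*series_by_label.values())]
    return column_names, rows

# Hypothetical values, three time steps per percentile.
series_by_label = {
    "p10": [35.0, 38.0, 43.0],
    "p50": [50.0, 51.0, 59.0],
    "p90": [63.0, 72.0, 79.0],
}
column_names, rows = stack_columns(series_by_label)
print(column_names)
print(rows[0])
```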
TimeSeriesCollection
A TimeSeriesCollection groups time series that may have different frequencies, time ranges, or numbers of points. Think of it as a named bag of TimeSeriesList and TimeSeriesTable objects.
[13]:
daily_base = datetime(2024, 1, 1, tzinfo=timezone.utc)
hours = [base + timedelta(hours=i) for i in range(24)]
ts_hourly = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=hours,
    values=[100.0 + rng.normal(0, 10) for _ in range(24)],
    name="wind_hourly",
    unit="MW",
)
ts_daily = tdm.TimeSeriesList(
    tdm.Frequency.P1D,
    timestamps=[daily_base + timedelta(days=d) for d in range(30)],
    values=[2400.0 + rng.normal(0, 200) for _ in range(30)],
    name="wind_daily_energy",
    unit="MWh",
)
ts_15min = tdm.TimeSeriesList(
    tdm.Frequency.PT15M,
    timestamps=[base + timedelta(minutes=15 * i) for i in range(96)],
    values=[50.0 + rng.normal(0, 5) for _ in range(96)],
    name="solar_15min",
    unit="MW",
)
Creating a collection
[14]:
collection = tdm.TimeSeriesCollection(
    [ts_hourly, ts_daily, ts_15min],
    name="Plant overview",
    description="Mixed-frequency data for a single plant",
)
collection
[14]:
| name | type | freq | tz | length | begin | end |
|---|---|---|---|---|---|---|
| wind_hourly | TimeSeriesList | PT1H | UTC | 24 | 2024-01-15 00:00 | 2024-01-15 23:00 |
| wind_daily_energy | TimeSeriesList | P1D | UTC | 30 | 2024-01-01 00:00 | 2024-01-30 00:00 |
| solar_15min | TimeSeriesList | PT15M | UTC | 96 | 2024-01-15 00:00 | 2024-01-15 23:45 |
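The begin and end columns can be checked by hand: for a regular series, the last timestamp is the first one plus (length - 1) steps. A standard-library sketch (`last_timestamp` is a hypothetical helper, not part of timedatamodel):

```python
from datetime import datetime, timedelta, timezone

def last_timestamp(begin, step, length):
    """Last timestamp of a regular series with the given start, step, and length."""
    return begin + step * (length - 1)

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
daily_base = datetime(2024, 1, 1, tzinfo=timezone.utc)

end_hourly = last_timestamp(base, timedelta(hours=1), 24)
end_daily = last_timestamp(daily_base, timedelta(days=1), 30)
end_15min = last_timestamp(base, timedelta(minutes=15), 96)
print(end_hourly, end_daily, end_15min, sep="\n")
```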
Dictionary-like access
[15]:
print(f"Names: {collection.names}")
print(f"Count: {collection.series_count}")
collection["wind_hourly"]
Names: ['wind_hourly', 'wind_daily_energy', 'solar_15min']
Count: 3
[15]:
| timestamp | wind_hourly |
|---|---|
| 2024-01-15 00:00 | 86.7746 |
| 2024-01-15 01:00 | 95.1381 |
| 2024-01-15 02:00 | 104.202 |
| … | … |
| 2024-01-15 21:00 | 112.906 |
| 2024-01-15 22:00 | 104.11 |
| 2024-01-15 23:00 | 107.826 |
Adding and removing series
Collections are immutable — add() and remove() return new collections.
[16]:
ts_price = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=hours,
    values=[45.0 + rng.normal(0, 8) for _ in range(24)],
    name="spot_price",
    unit="EUR/MWh",
)
extended = collection.add(ts_price)
print(f"Original: {collection.names}")
print(f"Extended: {extended.names}")
Original: ['wind_hourly', 'wind_daily_energy', 'solar_15min']
Extended: ['wind_hourly', 'wind_daily_energy', 'solar_15min', 'spot_price']
[17]:
reduced = extended.remove("wind_daily_energy")
print(f"Reduced: {reduced.names}")
Reduced: ['wind_hourly', 'solar_15min', 'spot_price']
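The immutable pattern can be sketched with a frozen dataclass: each operation builds a new object and leaves the original untouched. `ToyCollection` is a hypothetical stand-in that tracks names only, and its handling of duplicate or missing names is an assumption, not documented TimeSeriesCollection behaviour.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyCollection:
    names: tuple = ()

    def add(self, name):
        """Return a new collection with name appended."""
        if name in self.names:
            raise ValueError(f"duplicate name: {name}")
        return ToyCollection(self.names + (name,))

    def remove(self, name):
        """Return a new collection without name."""
        if name not in self.names:
            raise KeyError(name)
        return ToyCollection(tuple(n for n in self.names if n != name))

original = ToyCollection(("wind_hourly", "wind_daily_energy", "solar_15min"))
extended_toy = original.add("spot_price")
reduced_toy = extended_toy.remove("wind_daily_energy")
print(original.names)
print(extended_toy.names)
print(reduced_toy.names)
```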
Iterating over a collection
[18]:
for name, series in collection.items():
    print(f"{name:20s} freq={str(series.frequency):5s} len={len(series):3d} begin={series.begin}")
wind_hourly freq=PT1H len= 24 begin=2024-01-15 00:00:00+00:00
wind_daily_energy freq=P1D len= 30 begin=2024-01-01 00:00:00+00:00
solar_15min freq=PT15M len= 96 begin=2024-01-15 00:00:00+00:00
Summary
TimeSeriesArray: N-dimensional time series with Dimension labels; slice with sel()/isel(); auto-collapses to a TimeSeriesTable (2D) or TimeSeriesList (1D)
TimeSeriesCollection: heterogeneous container for series with different frequencies and time ranges; dictionary-like access; immutable add/remove
Next up: nb_07 covers data quality tools — coverage bars and validation.