NumPy and Pandas Transforms

TimeDataModel provides clean patterns for transforming time series data using numpy and pandas. Every TimeSeriesList and TimeSeriesTable exposes .arr (numpy array) and .df (pandas DataFrame) properties, plus dedicated methods for writing results back. This keeps your domain model structured while letting you leverage the full scientific Python ecosystem.

[1]:
from datetime import datetime, timedelta, timezone

import numpy as np

import timedatamodel as tdm

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]

ts = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=timestamps,
    values=[
        120.0, 115.0, 108.0, 105.0, 102.0, 100.0,
        110.0, 135.0, 160.0, 175.0, 180.0, 178.0,
        172.0, 170.0, 168.0, 165.0, 175.0, 190.0,
        200.0, 195.0, 180.0, 165.0, 145.0, 130.0,
    ],
    name="power",
    unit="MW",
    data_type=tdm.DataType.OBSERVATION,
)

The .arr and .df properties

Every TimeSeriesList has two shorthand properties for quick access to the underlying data:

  • ts.arr — returns a numpy ndarray (same as ts.to_numpy())

  • ts.df — returns a pandas DataFrame (same as ts.to_pandas_dataframe())

[2]:
ts.arr
[2]:
array([120., 115., 108., 105., 102., 100., 110., 135., 160., 175., 180.,
       178., 172., 170., 168., 165., 175., 190., 200., 195., 180., 165.,
       145., 130.])
[3]:
ts.df.head()
[3]:
                           power
timestamp
2024-01-15 00:00:00+00:00  120.0
2024-01-15 01:00:00+00:00  115.0
2024-01-15 02:00:00+00:00  108.0
2024-01-15 03:00:00+00:00  105.0
2024-01-15 04:00:00+00:00  102.0
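Functions you pass to apply_pandas() receive a frame like the one above. As a rough mental model, the sketch below builds a frame with the same layout in plain pandas: a tz-aware DatetimeIndex named timestamp and one float column named after the series. The layout is inferred from the ts.df output, not taken from the library's source, and the values are placeholders.

```python
from datetime import datetime, timedelta, timezone

import numpy as np
import pandas as pd

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
values = np.linspace(100.0, 200.0, 24)  # placeholder data, not the notebook's series

# Assumed .df layout (inferred from the output above): tz-aware index named
# "timestamp" and one column named after the series.
df = pd.DataFrame({"power": values}, index=pd.DatetimeIndex(timestamps, name="timestamp"))

# The .arr counterpart is simply that column as a 1-D float array.
arr = df["power"].to_numpy()

print(df.index.tz, df.shape, arr.shape)
```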

Pattern 1: apply_numpy(func)

Pass a function that receives a numpy array and returns a numpy array of the same length. Timestamps, frequency, and metadata are carried over unchanged. Because they are copied rather than recomputed, the unit stays MW even when the values are no longer in MW, as with the z-scores below.
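The length rule exists because the original timestamps are reused as-is. Below is a minimal sketch of that contract as a hypothetical helper. apply_numpy_sketch is not part of timedatamodel, and the real method may perform its checks differently.

```python
import numpy as np

def apply_numpy_sketch(values, func):
    """Hypothetical stand-in for the apply_numpy() contract: run func on a
    float array and reject results whose length differs from the input,
    since the original timestamps are reused unchanged."""
    arr = np.asarray(values, dtype=float)
    out = np.asarray(func(arr), dtype=float)
    if len(out) != len(arr):
        raise ValueError(f"func returned {len(out)} values for {len(arr)} timestamps")
    return out

values = [120.0, 115.0, 108.0, 105.0]
print(apply_numpy_sketch(values, np.cumsum))  # [120. 235. 343. 448.]

try:
    apply_numpy_sketch(values, np.diff)  # np.diff drops one element
except ValueError as err:
    print(err)  # func returned 3 values for 4 timestamps
```

Length-changing operations such as np.diff are what Pattern 4 below is for.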

[4]:
normalized = ts.apply_numpy(lambda arr: (arr - arr.mean()) / arr.std())
normalized
[4]:
TimeSeriesList
Name       power
Length     24
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  -0.992844
2024-01-15 01:00  -1.14899
2024-01-15 02:00  -1.3676
…
2024-01-15 21:00  0.412492
2024-01-15 22:00  -0.212102
2024-01-15 23:00  -0.680547
[5]:
cumulative = ts.apply_numpy(np.cumsum)
cumulative.head(6)
[5]:
TimeSeriesList
Name       power
Length     6
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  120.0
2024-01-15 01:00  235.0
2024-01-15 02:00  343.0
[6]:
clipped = ts.apply_numpy(lambda arr: np.clip(arr, 110, 180))
clipped
[6]:
TimeSeriesList
Name       power
Length     24
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  120.0
2024-01-15 01:00  115.0
2024-01-15 02:00  110.0
…
2024-01-15 21:00  165.0
2024-01-15 22:00  145.0
2024-01-15 23:00  130.0

Pattern 2: apply_pandas(func)

Pass a function that receives a pandas DataFrame and returns a pandas DataFrame. This lets you use the full pandas API — rolling windows, resampling, interpolation, and more.
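The callback is an ordinary function, so multi-step logic can live in a named function instead of a lambda. The sketch below applies a hypothetical fill_and_smooth helper to a small made-up frame with gaps. It runs time-weighted interpolation and then a trailing 3-point mean. Both steps keep the index, which is what lets the result map back onto the original timestamps.

```python
import numpy as np
import pandas as pd

def fill_and_smooth(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill gaps by time-weighted interpolation, then
    take a trailing 3-point mean. Same index in, same index out."""
    filled = df.interpolate(method="time")
    return filled.rolling(3, min_periods=1).mean()

# Made-up hourly data with two missing readings.
index = pd.date_range("2024-01-15", periods=6, freq="h", tz="UTC", name="timestamp")
df = pd.DataFrame({"power": [120.0, np.nan, 108.0, 105.0, np.nan, 100.0]}, index=index)

out = fill_and_smooth(df)
print(out)
```

With the notebook's series, the same function could be passed directly: ts.apply_pandas(fill_and_smooth).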

[7]:
rolling_mean = ts.apply_pandas(lambda df: df.rolling(6, min_periods=1).mean())
rolling_mean
[7]:
TimeSeriesList
Name       power
Length     24
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  120.0
2024-01-15 01:00  117.5
2024-01-15 02:00  114.333
…
2024-01-15 21:00  184.167
2024-01-15 22:00  179.167
2024-01-15 23:00  169.167
[8]:
diff = ts.apply_pandas(lambda df: df.diff())
diff
[8]:
TimeSeriesList
Name       power
Length     24
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  NaN
2024-01-15 01:00  -5.0
2024-01-15 02:00  -7.0
…
2024-01-15 21:00  -15.0
2024-01-15 22:00  -20.0
2024-01-15 23:00  -15.0
[9]:
pct_change = ts.apply_pandas(lambda df: df.pct_change() * 100)
pct_change.head(6)
[9]:
TimeSeriesList
Name       power
Length     6
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  NaN
2024-01-15 01:00  -4.16667
2024-01-15 02:00  -6.08696
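pct_change() divides each step by the previous value. The first row has no previous value to compare against, so it comes out NaN, just like diff(). The first rows can be reproduced by hand with plain pandas:

```python
import pandas as pd

s = pd.Series([120.0, 115.0, 108.0], name="power")

# pct_change()[t] == (s[t] - s[t-1]) / s[t-1]
by_hand = s.diff() / s.shift(1) * 100
builtin = s.pct_change() * 100

print(pd.DataFrame({"builtin": builtin, "by_hand": by_hand}))
# row 1: -5 / 120 * 100 = -4.16667, row 2: -7 / 115 * 100 = -6.08696
```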

Pattern 3: One-liner round-trips with update_arr() and update_df()

Combine .arr / .df with update_arr() / update_df() to transform data in a single expression. The result is a new TimeSeriesList that keeps the original name, unit, data type, and frequency.

[10]:
ts.update_arr(ts.arr.clip(110, 180))
[10]:
TimeSeriesList
Name       power
Length     24
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  120.0
2024-01-15 01:00  115.0
2024-01-15 02:00  110.0
…
2024-01-15 21:00  165.0
2024-01-15 22:00  145.0
2024-01-15 23:00  130.0
[11]:
ts.update_df(ts.df.resample("3h").mean())
[11]:
TimeSeriesList
Name       power
Length     8
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  114.333
2024-01-15 03:00  102.333
2024-01-15 06:00  135.0
…
2024-01-15 15:00  176.667
2024-01-15 18:00  191.667
2024-01-15 21:00  146.667
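The frequency is copied rather than recomputed, so the resampled series above still reports PT1H even though its timestamps are 3 hours apart. If you need the spacing actually present in the data, pandas.infer_freq can recover it from the index. The sketch below uses plain pandas. This notebook does not show how to turn the resulting string back into a tdm.Frequency, so that step is left out.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
index = pd.DatetimeIndex([base + timedelta(hours=i) for i in range(24)], name="timestamp")
df = pd.DataFrame({"power": range(24)}, index=index, dtype=float)

resampled = df.resample("3h").mean()

# infer_freq needs at least 3 timestamps and returns None for irregular spacing.
inferred = pd.infer_freq(resampled.index)
print(len(resampled), inferred)  # 8 rows and a 3-hour alias ("3h", or "3H" on older pandas)
```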
[12]:
ts.update_df(ts.df.diff())
[12]:
TimeSeriesList
Name       power
Length     24
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  NaN
2024-01-15 01:00  -5.0
2024-01-15 02:00  -7.0
…
2024-01-15 21:00  -15.0
2024-01-15 22:00  -20.0
2024-01-15 23:00  -15.0
[13]:
ts.update_arr(np.cumsum(ts.arr))
[13]:
TimeSeriesList
Name       power
Length     24
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  120.0
2024-01-15 01:00  235.0
2024-01-15 02:00  343.0
…
2024-01-15 21:00  3368.0
2024-01-15 22:00  3513.0
2024-01-15 23:00  3643.0

Pattern 4: Manual numpy round-trip

For transformations where the output shape differs from the input, export to numpy, transform freely, and construct a new TimeSeriesList.

[14]:
arr = ts.to_numpy()
print(f"Type:  {type(arr)}")
print(f"Shape: {arr.shape}")
print(f"Mean:  {arr.mean():.1f} MW")
Type:  <class 'numpy.ndarray'>
Shape: (24,)
Mean:  151.8 MW
[15]:
window = 3
smoothed_arr = np.convolve(arr, np.ones(window) / window, mode="valid")
smoothed_timestamps = timestamps[window - 1 :]

ts_smoothed = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=smoothed_timestamps,
    values=smoothed_arr.tolist(),
    name=ts.name,
    unit=ts.unit,
    data_type=ts.data_type,
)
print(f"Original length: {len(ts)}, Smoothed length: {len(ts_smoothed)}")
ts_smoothed.head(6)
Original length: 24, Smoothed length: 22
[15]:
TimeSeriesList
Name       power
Length     6
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 02:00  114.333
2024-01-15 03:00  109.333
2024-01-15 04:00  105.0
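Slicing timestamps[window - 1 :] labels each 3-point average with the last hour in its window. This matches pandas's default right-aligned rolling mean. The standalone check below uses plain numpy and pandas on the first six readings from the notebook:

```python
import numpy as np
import pandas as pd

values = np.array([120.0, 115.0, 108.0, 105.0, 102.0, 100.0])
window = 3

# mode="valid" keeps only full overlaps: len(values) - window + 1 outputs.
conv = np.convolve(values, np.ones(window) / window, mode="valid")

# pandas' rolling window is right-aligned by default: row i averages rows
# i - window + 1 .. i, and the first window - 1 rows are NaN, so drop them.
roll = pd.Series(values).rolling(window).mean().to_numpy()[window - 1 :]

print(conv)
print(np.allclose(conv, roll))  # True
```

For centered labels instead, pandas offers rolling(window, center=True).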

Pattern 5: Manual pandas round-trip

For multi-step pandas workflows where a one-liner would be hard to read, break it into separate steps.

[16]:
df = ts.to_pandas_dataframe()
df.head()
[16]:
                           power
timestamp
2024-01-15 00:00:00+00:00  120.0
2024-01-15 01:00:00+00:00  115.0
2024-01-15 02:00:00+00:00  108.0
2024-01-15 03:00:00+00:00  105.0
2024-01-15 04:00:00+00:00  102.0
[17]:
df_resampled = df.resample("3h").mean()

ts_resampled = ts.update_from_pandas(df_resampled)
print(f"Original:  {len(ts)} points")
print(f"Resampled: {len(ts_resampled)} points")
print(f"Unit preserved: {ts_resampled.unit}")
ts_resampled
Original:  24 points
Resampled: 8 points
Unit preserved: MW
[17]:
TimeSeriesList
Name       power
Length     8
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  114.333
2024-01-15 03:00  102.333
2024-01-15 06:00  135.0
…
2024-01-15 15:00  176.667
2024-01-15 18:00  191.667
2024-01-15 21:00  146.667
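Each resampled row is labelled with the start of its 3-hour bin. For example, 114.333 at 00:00 is the mean of the 00:00, 01:00 and 02:00 readings (120, 115, 108). Left-closed, left-labelled bins are pandas's default for an "h"-based rule. The plain-pandas sketch below runs the first six hourly values through that default and through label="right", for readers who prefer end-of-interval stamps:

```python
import pandas as pd

index = pd.date_range("2024-01-15", periods=6, freq="h", tz="UTC", name="timestamp")
df = pd.DataFrame({"power": [120.0, 115.0, 108.0, 105.0, 102.0, 100.0]}, index=index)

# "3h" bins default to closed="left", label="left": the 00:00 row is the
# mean of 00:00, 01:00 and 02:00.
left = df.resample("3h").mean()

# label="right" keeps the same bins but stamps each one with its end time.
right = df.resample("3h", label="right").mean()

print(left)
print(right)
```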
[18]:
df_ewm = df.ewm(span=6).mean()

ts_ewm = ts.update_from_pandas(df_ewm)
ts_ewm
[18]:
TimeSeriesList
Name       power
Length     24
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW
Data type  OBSERVATION
timestamp         power
2024-01-15 00:00  120.0
2024-01-15 01:00  117.083
2024-01-15 02:00  113.0
…
2024-01-15 21:00  178.601
2024-01-15 22:00  168.997
2024-01-15 23:00  157.851
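With span=6, pandas uses a smoothing factor alpha = 2 / (span + 1) = 2/7. Because adjust=True by default, each value is a weighted average of all earlier points rather than a simple recursion. For example, the second value is (115 + 5/7 * 120) / (1 + 5/7) ≈ 117.083. The sketch below recomputes the first three values of [18] by hand. ewm_adjusted is a hypothetical helper written for illustration.

```python
import numpy as np
import pandas as pd

x = np.array([120.0, 115.0, 108.0])  # first three readings from the notebook
span = 6
alpha = 2 / (span + 1)  # pandas' span-to-alpha conversion

def ewm_adjusted(x, alpha):
    """Hypothetical helper: adjust=True EWM, i.e. at step t a weighted mean of
    x[0..t] with weight (1 - alpha)**k for a point that is k steps old."""
    out = []
    for t in range(len(x)):
        weights = (1 - alpha) ** np.arange(t, -1, -1)  # oldest point: highest power
        out.append(np.dot(weights, x[: t + 1]) / weights.sum())
    return np.array(out)

manual = ewm_adjusted(x, alpha)
via_pandas = pd.Series(x).ewm(span=span).mean().to_numpy()
print(manual)  # approximately [120.0, 117.083, 113.0]
print(np.allclose(manual, via_pandas))  # True
```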

Transforms on TimeSeriesTable

All patterns — apply_*, update_arr(), update_df(), .arr, .df — also work on TimeSeriesTable, applying across all columns.

[19]:
rng = np.random.default_rng(42)

table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timestamps=timestamps,
    values=np.column_stack([
        80 + 40 * np.sin(np.linspace(0, 2 * np.pi, 24)) + rng.normal(0, 5, 24),
        np.clip(60 * np.sin(np.linspace(-0.5, np.pi + 0.5, 24)), 0, None),
        50 + rng.normal(0, 3, 24),
    ]),
    names=["wind", "solar", "hydro"],
    units=["MW", "MW", "MW"],
)
table
[19]:
TimeSeriesTable
Name       unnamed
Columns    wind, solar, hydro
Length     24 × 3
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW, MW, MW
timestamp         wind     solar  hydro
2024-01-15 00:00  81.5236  0.0    48.715
2024-01-15 01:00  85.592   0.0    48.9436
2024-01-15 02:00  104.536  0.0    51.5969
…
2024-01-15 21:00  55.812   0.0    50.6561
2024-01-15 22:00  75.3208  0.0    52.6143
2024-01-15 23:00  79.2274  0.0    50.6708
[20]:
table_rolling = table.apply_pandas(lambda df: df.rolling(4, min_periods=1).mean())
table_rolling
[20]:
TimeSeriesTable
Name       unnamed
Columns    wind, solar, hydro
Length     24 × 3
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW, MW, MW
timestamp         wind     solar     hydro
2024-01-15 00:00  81.5236  0.0       48.715
2024-01-15 01:00  83.5578  0.0       48.8293
2024-01-15 02:00  90.5504  0.0       49.7518
…
2024-01-15 21:00  48.7795  9.72651   49.9265
2024-01-15 22:00  56.3025  3.88045   51.0792
2024-01-15 23:00  65.0506  0.602954  51.0728
[21]:
table_norm = table.apply_numpy(
    lambda arr: (arr - arr.mean(axis=0)) / arr.std(axis=0)
)
table_norm.head(6)
[21]:
TimeSeriesTable
Name       unnamed
Columns    wind, solar, hydro
Length     6 × 3
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW, MW, MW
timestamp         wind       solar     hydro
2024-01-15 00:00  0.0594697  -1.19819  -0.868761
2024-01-15 01:00  0.208941   -1.19819  -0.759577
2024-01-15 02:00  0.904926   -1.19819  0.507806
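In [21], arr is two-dimensional (24 rows by 3 columns), so axis=0 is what makes the normalization column-wise. With it, wind, solar and hydro each get their own mean and standard deviation. Without it, NumPy pools every cell into a single mean and standard deviation. Here is a standalone illustration with made-up numbers:

```python
import numpy as np

# Two columns on very different scales (made-up numbers).
arr = np.array([[80.0, 0.0], [100.0, 30.0], [120.0, 60.0]])

# axis=0 reduces down the rows: one mean/std per column.
per_column = (arr - arr.mean(axis=0)) / arr.std(axis=0)

# The default (axis=None) pools all cells into one mean/std.
pooled = (arr - arr.mean()) / arr.std()

print(per_column)
print(pooled)
```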
[22]:
table.update_df(table.df.rolling(4, min_periods=1).mean())
[22]:
TimeSeriesTable
Name       unnamed
Columns    wind, solar, hydro
Length     24 × 3
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW, MW, MW
timestamp         wind     solar     hydro
2024-01-15 00:00  81.5236  0.0       48.715
2024-01-15 01:00  83.5578  0.0       48.8293
2024-01-15 02:00  90.5504  0.0       49.7518
…
2024-01-15 21:00  48.7795  9.72651   49.9265
2024-01-15 22:00  56.3025  3.88045   51.0792
2024-01-15 23:00  65.0506  0.602954  51.0728
[23]:
table.update_arr(np.clip(table.arr, 40, 120))
[23]:
TimeSeriesTable
Name       unnamed
Columns    wind, solar, hydro
Length     24 × 3
Frequency  PT1H
Timezone   UTC (+00:00)
Unit       MW, MW, MW
timestamp         wind     solar  hydro
2024-01-15 00:00  81.5236  40.0   48.715
2024-01-15 01:00  85.592   40.0   48.9436
2024-01-15 02:00  104.536  40.0   51.5969
…
2024-01-15 21:00  55.812   40.0   50.6561
2024-01-15 22:00  75.3208  40.0   52.6143
2024-01-15 23:00  79.2274  40.0   50.6708

Summary

Five patterns for transforming time series data:

Pattern                                     Style        Best for
ts.apply_numpy(func)                        Functional   Same-length vectorized ops (normalize, cumsum)
ts.apply_pandas(func)                       Functional   Rolling windows, diff, pct_change
ts.update_arr(ts.arr.clip(...))             One-liner    Quick numpy transforms via .arr
ts.update_df(ts.df.resample(...).mean())    One-liner    Quick pandas transforms via .df
Manual to_numpy() / to_pandas_dataframe()   Multi-step   Shape-changing ops, complex workflows

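The apply_* and update_* methods carry metadata over automatically. With a manual round-trip, you either pass it yourself (Pattern 4) or write back through update_from_pandas() (Pattern 5). Use .arr / .df for read access and update_arr() / update_df() to write results back.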

Next up: nb_03 covers unit handling, validation, and rich metadata.