NumPy and Pandas Transforms
TimeDataModel provides clean patterns for transforming time series data using numpy and pandas. Every TimeSeriesList and TimeSeriesTable exposes .arr (numpy array) and .df (pandas DataFrame) properties, plus dedicated methods for writing results back. This keeps your domain model structured while letting you leverage the full scientific Python ecosystem.
[1]:
from datetime import datetime, timedelta, timezone
import numpy as np
import timedatamodel as tdm
base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
ts = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=timestamps,
    values=[
        120.0, 115.0, 108.0, 105.0, 102.0, 100.0,
        110.0, 135.0, 160.0, 175.0, 180.0, 178.0,
        172.0, 170.0, 168.0, 165.0, 175.0, 190.0,
        200.0, 195.0, 180.0, 165.0, 145.0, 130.0,
    ],
    name="power",
    unit="MW",
    data_type=tdm.DataType.OBSERVATION,
)
The .arr and .df properties
Every TimeSeriesList has two shorthand properties for quick access to the underlying data:
- ts.arr — returns a numpy ndarray (same as ts.to_numpy())
- ts.df — returns a pandas DataFrame (same as ts.to_pandas_dataframe())
[2]:
ts.arr
[2]:
array([120., 115., 108., 105., 102., 100., 110., 135., 160., 175., 180.,
178., 172., 170., 168., 165., 175., 190., 200., 195., 180., 165.,
145., 130.])
[3]:
ts.df.head()
[3]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00:00+00:00 | 120.0 |
| 2024-01-15 01:00:00+00:00 | 115.0 |
| 2024-01-15 02:00:00+00:00 | 108.0 |
| 2024-01-15 03:00:00+00:00 | 105.0 |
| 2024-01-15 04:00:00+00:00 | 102.0 |
Pattern 1: apply_numpy(func)
Pass a function that receives a numpy array and returns a numpy array. Timestamps, frequency, and all metadata are preserved automatically. The output array must have the same length as the input.
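As a library-independent sketch of the kind of function apply_numpy expects, here is the z-score normalization from the cell below written against a plain numpy array (the array values are an excerpt of the series above):

```python
import numpy as np

def zscore(arr: np.ndarray) -> np.ndarray:
    """Return a z-scored copy; the output length equals the input length."""
    return (arr - arr.mean()) / arr.std()

arr = np.array([120.0, 115.0, 108.0, 105.0, 102.0, 100.0])
out = zscore(arr)

# apply_numpy requires the shapes to match:
assert out.shape == arr.shape
```

Any function with this signature and length-preserving behavior is a valid argument to apply_numpy.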
[4]:
normalized = ts.apply_numpy(lambda arr: (arr - arr.mean()) / arr.std())
normalized
[4]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | -0.992844 |
| 2024-01-15 01:00 | -1.14899 |
| 2024-01-15 02:00 | -1.3676 |
| … | … |
| 2024-01-15 21:00 | 0.412492 |
| 2024-01-15 22:00 | -0.212102 |
| 2024-01-15 23:00 | -0.680547 |
[5]:
cumulative = ts.apply_numpy(np.cumsum)
cumulative.head(6)
[5]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | 120.0 |
| 2024-01-15 01:00 | 235.0 |
| 2024-01-15 02:00 | 343.0 |
[6]:
clipped = ts.apply_numpy(lambda arr: np.clip(arr, 110, 180))
clipped
[6]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | 120.0 |
| 2024-01-15 01:00 | 115.0 |
| 2024-01-15 02:00 | 110.0 |
| … | … |
| 2024-01-15 21:00 | 165.0 |
| 2024-01-15 22:00 | 145.0 |
| 2024-01-15 23:00 | 130.0 |
Pattern 2: apply_pandas(func)
Pass a function that receives a pandas DataFrame and returns a pandas DataFrame. This lets you use the full pandas API — rolling windows, resampling, interpolation, and more.
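The function receives a plain DataFrame shaped like ts.df above: a tz-aware DatetimeIndex named "timestamp" and one column per series. A self-contained pandas sketch of the same rolling-mean transform, using the first six values of the series:

```python
import pandas as pd

# A frame shaped like ts.df: tz-aware DatetimeIndex named "timestamp".
idx = pd.date_range("2024-01-15", periods=6, freq="h", tz="UTC", name="timestamp")
df = pd.DataFrame({"power": [120.0, 115.0, 108.0, 105.0, 102.0, 100.0]}, index=idx)

# Rolling mean with a partial first window (min_periods=1 avoids leading NaNs).
rolled = df.rolling(3, min_periods=1).mean()
```

Whatever the function returns replaces the values of the series, so it must come back as a DataFrame with the same index.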
[7]:
rolling_mean = ts.apply_pandas(lambda df: df.rolling(6, min_periods=1).mean())
rolling_mean
[7]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | 120.0 |
| 2024-01-15 01:00 | 117.5 |
| 2024-01-15 02:00 | 114.333 |
| … | … |
| 2024-01-15 21:00 | 184.167 |
| 2024-01-15 22:00 | 179.167 |
| 2024-01-15 23:00 | 169.167 |
[8]:
diff = ts.apply_pandas(lambda df: df.diff())
diff
[8]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | NaN |
| 2024-01-15 01:00 | -5.0 |
| 2024-01-15 02:00 | -7.0 |
| … | … |
| 2024-01-15 21:00 | -15.0 |
| 2024-01-15 22:00 | -20.0 |
| 2024-01-15 23:00 | -15.0 |
[9]:
pct_change = ts.apply_pandas(lambda df: df.pct_change() * 100)
pct_change.head(6)
[9]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | NaN |
| 2024-01-15 01:00 | -4.16667 |
| 2024-01-15 02:00 | -6.08696 |
Pattern 3: One-liner round-trips with update_arr() and update_df()
Combine .arr / .df with update_arr() / update_df() to transform data in a single expression. The result is a new TimeSeriesList with all metadata preserved.
[10]:
ts.update_arr(ts.arr.clip(110, 180))
[10]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | 120.0 |
| 2024-01-15 01:00 | 115.0 |
| 2024-01-15 02:00 | 110.0 |
| … | … |
| 2024-01-15 21:00 | 165.0 |
| 2024-01-15 22:00 | 145.0 |
| 2024-01-15 23:00 | 130.0 |
[11]:
ts.update_df(ts.df.resample("3h").mean())
[11]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | 114.333 |
| 2024-01-15 03:00 | 102.333 |
| 2024-01-15 06:00 | 135.0 |
| … | … |
| 2024-01-15 15:00 | 176.667 |
| 2024-01-15 18:00 | 191.667 |
| 2024-01-15 21:00 | 146.667 |
[12]:
ts.update_df(ts.df.diff())
[12]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | NaN |
| 2024-01-15 01:00 | -5.0 |
| 2024-01-15 02:00 | -7.0 |
| … | … |
| 2024-01-15 21:00 | -15.0 |
| 2024-01-15 22:00 | -20.0 |
| 2024-01-15 23:00 | -15.0 |
[13]:
ts.update_arr(np.cumsum(ts.arr))
[13]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | 120.0 |
| 2024-01-15 01:00 | 235.0 |
| 2024-01-15 02:00 | 343.0 |
| … | … |
| 2024-01-15 21:00 | 3368.0 |
| 2024-01-15 22:00 | 3513.0 |
| 2024-01-15 23:00 | 3643.0 |
Pattern 4: Manual numpy round-trip
For transformations where the output shape differs from the input, export to numpy, transform freely, and construct a new TimeSeriesList.
[14]:
arr = ts.to_numpy()
print(f"Type: {type(arr)}")
print(f"Shape: {arr.shape}")
print(f"Mean: {arr.mean():.1f} MW")
Type: <class 'numpy.ndarray'>
Shape: (24,)
Mean: 151.8 MW
[15]:
window = 3
smoothed_arr = np.convolve(arr, np.ones(window) / window, mode="valid")
smoothed_timestamps = timestamps[window - 1 :]
ts_smoothed = tdm.TimeSeriesList(
    tdm.Frequency.PT1H,
    timestamps=smoothed_timestamps,
    values=smoothed_arr.tolist(),
    name=ts.name,
    unit=ts.unit,
    data_type=ts.data_type,
)
print(f"Original length: {len(ts)}, Smoothed length: {len(ts_smoothed)}")
ts_smoothed.head(6)
Original length: 24, Smoothed length: 22
[15]:
| timestamp | power |
|---|---|
| 2024-01-15 02:00 | 114.333 |
| 2024-01-15 03:00 | 109.333 |
| 2024-01-15 04:00 | 105.0 |
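The length drop (24 to 22) is a property of "valid"-mode convolution, not of the library: the kernel only fits at len(arr) - window + 1 positions. A pure-numpy check of the length rule:

```python
import numpy as np

window = 3
arr = np.arange(10, dtype=float)
smoothed = np.convolve(arr, np.ones(window) / window, mode="valid")

# "valid" keeps only positions where the kernel fully overlaps the input,
# so each output element is a complete 3-point average.
assert len(smoothed) == len(arr) - window + 1  # 10 -> 8
```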
Pattern 5: Manual pandas round-trip
For multi-step pandas workflows where a one-liner would be hard to read, break it into separate steps.
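The intermediate steps are plain pandas; only the final hand-back involves the library. A minimal standalone sketch of the downsampling step (hourly to 3-hourly means, 24 points to 8) on synthetic values:

```python
import pandas as pd

idx = pd.date_range("2024-01-15", periods=24, freq="h", tz="UTC", name="timestamp")
df = pd.DataFrame({"power": [float(i) for i in range(24)]}, index=idx)

# Downsample: each 3-hour bin collapses to one mean value.
df_3h = df.resample("3h").mean()
assert len(df_3h) == 8
```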
[16]:
df = ts.to_pandas_dataframe()
df.head()
[16]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00:00+00:00 | 120.0 |
| 2024-01-15 01:00:00+00:00 | 115.0 |
| 2024-01-15 02:00:00+00:00 | 108.0 |
| 2024-01-15 03:00:00+00:00 | 105.0 |
| 2024-01-15 04:00:00+00:00 | 102.0 |
[17]:
df_resampled = df.resample("3h").mean()
ts_resampled = ts.update_from_pandas(df_resampled)
print(f"Original: {len(ts)} points")
print(f"Resampled: {len(ts_resampled)} points")
print(f"Unit preserved: {ts_resampled.unit}")
ts_resampled
Original: 24 points
Resampled: 8 points
Unit preserved: MW
[17]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | 114.333 |
| 2024-01-15 03:00 | 102.333 |
| 2024-01-15 06:00 | 135.0 |
| … | … |
| 2024-01-15 15:00 | 176.667 |
| 2024-01-15 18:00 | 191.667 |
| 2024-01-15 21:00 | 146.667 |
[18]:
df_ewm = df.ewm(span=6).mean()
ts_ewm = ts.update_from_pandas(df_ewm)
ts_ewm
[18]:
| timestamp | power |
|---|---|
| 2024-01-15 00:00 | 120.0 |
| 2024-01-15 01:00 | 117.083 |
| 2024-01-15 02:00 | 113.0 |
| … | … |
| 2024-01-15 21:00 | 178.601 |
| 2024-01-15 22:00 | 168.997 |
| 2024-01-15 23:00 | 157.851 |
Transforms on TimeSeriesTable
All patterns — apply_*, update_arr(), update_df(), .arr, .df — also work on TimeSeriesTable, applying across all columns.
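For apply_numpy on a table, the column-wise behaviour falls out of numpy broadcasting: with a 2-D array, axis=0 statistics have one entry per column, so each column is transformed with its own mean and std. A plain-numpy sketch on synthetic data (not the table constructed below):

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(loc=[80.0, 30.0, 50.0], scale=[5.0, 10.0, 3.0], size=(24, 3))

# mean/std over axis=0 give shape-(3,) vectors; broadcasting then applies
# each column's statistics to that column only.
normalized = (arr - arr.mean(axis=0)) / arr.std(axis=0)
assert normalized.shape == arr.shape
```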
[19]:
rng = np.random.default_rng(42)
table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timestamps=timestamps,
    values=np.column_stack([
        80 + 40 * np.sin(np.linspace(0, 2 * np.pi, 24)) + rng.normal(0, 5, 24),
        np.clip(60 * np.sin(np.linspace(-0.5, np.pi + 0.5, 24)), 0, None),
        50 + rng.normal(0, 3, 24),
    ]),
    names=["wind", "solar", "hydro"],
    units=["MW", "MW", "MW"],
)
table
[19]:
| timestamp | wind | solar | hydro |
|---|---|---|---|
| 2024-01-15 00:00 | 81.5236 | 0.0 | 48.715 |
| 2024-01-15 01:00 | 85.592 | 0.0 | 48.9436 |
| 2024-01-15 02:00 | 104.536 | 0.0 | 51.5969 |
| … | … | … | … |
| 2024-01-15 21:00 | 55.812 | 0.0 | 50.6561 |
| 2024-01-15 22:00 | 75.3208 | 0.0 | 52.6143 |
| 2024-01-15 23:00 | 79.2274 | 0.0 | 50.6708 |
[20]:
table_rolling = table.apply_pandas(lambda df: df.rolling(4, min_periods=1).mean())
table_rolling
[20]:
| timestamp | wind | solar | hydro |
|---|---|---|---|
| 2024-01-15 00:00 | 81.5236 | 0.0 | 48.715 |
| 2024-01-15 01:00 | 83.5578 | 0.0 | 48.8293 |
| 2024-01-15 02:00 | 90.5504 | 0.0 | 49.7518 |
| … | … | … | … |
| 2024-01-15 21:00 | 48.7795 | 9.72651 | 49.9265 |
| 2024-01-15 22:00 | 56.3025 | 3.88045 | 51.0792 |
| 2024-01-15 23:00 | 65.0506 | 0.602954 | 51.0728 |
[21]:
table_norm = table.apply_numpy(
    lambda arr: (arr - arr.mean(axis=0)) / arr.std(axis=0)
)
table_norm.head(6)
[21]:
| timestamp | wind | solar | hydro |
|---|---|---|---|
| 2024-01-15 00:00 | 0.0594697 | -1.19819 | -0.868761 |
| 2024-01-15 01:00 | 0.208941 | -1.19819 | -0.759577 |
| 2024-01-15 02:00 | 0.904926 | -1.19819 | 0.507806 |
[22]:
table.update_df(table.df.rolling(4, min_periods=1).mean())
[22]:
| timestamp | wind | solar | hydro |
|---|---|---|---|
| 2024-01-15 00:00 | 81.5236 | 0.0 | 48.715 |
| 2024-01-15 01:00 | 83.5578 | 0.0 | 48.8293 |
| 2024-01-15 02:00 | 90.5504 | 0.0 | 49.7518 |
| … | … | … | … |
| 2024-01-15 21:00 | 48.7795 | 9.72651 | 49.9265 |
| 2024-01-15 22:00 | 56.3025 | 3.88045 | 51.0792 |
| 2024-01-15 23:00 | 65.0506 | 0.602954 | 51.0728 |
[23]:
table.update_arr(np.clip(table.arr, 40, 120))
[23]:
| timestamp | wind | solar | hydro |
|---|---|---|---|
| 2024-01-15 00:00 | 81.5236 | 40.0 | 48.715 |
| 2024-01-15 01:00 | 85.592 | 40.0 | 48.9436 |
| 2024-01-15 02:00 | 104.536 | 40.0 | 51.5969 |
| … | … | … | … |
| 2024-01-15 21:00 | 55.812 | 40.0 | 50.6561 |
| 2024-01-15 22:00 | 75.3208 | 40.0 | 52.6143 |
| 2024-01-15 23:00 | 79.2274 | 40.0 | 50.6708 |
Summary
Five patterns for transforming time series data:
| Pattern | Style | Best for |
|---|---|---|
| apply_numpy(func) | Functional | Same-length vectorized ops (normalize, cumsum) |
| apply_pandas(func) | Functional | Rolling windows, diff, pct_change |
| update_arr() | One-liner | Quick numpy transforms via .arr |
| update_df() | One-liner | Quick pandas transforms via .df |
| Manual round-trip | Multi-step | Shape-changing ops, complex workflows |
All patterns preserve metadata. Use .arr / .df for read access and update_arr() / update_df() to write results back.
Next up: nb_03 covers unit handling, validation, and rich metadata.