NumPy and Pandas Transforms
TimeDataModel provides a small set of patterns for transforming time series data with NumPy and pandas. Every TimeSeries and TimeSeriesTable exposes .arr (NumPy array) and .df (pandas DataFrame) properties, plus dedicated methods for writing results back. Your domain model stays structured while you use the wider scientific Python ecosystem.
[1]:
from datetime import datetime, timedelta, timezone
import numpy as np
import timedatamodel as tdm
base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
ts = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=timestamps,
    values=[
        120.0, 115.0, 108.0, 105.0, 102.0, 100.0,
        110.0, 135.0, 160.0, 175.0, 180.0, 178.0,
        172.0, 170.0, 168.0, 165.0, 175.0, 190.0,
        200.0, 195.0, 180.0, 165.0, 145.0, 130.0,
    ],
    name="power",
    unit="MW",
    data_type=tdm.DataType.MEASUREMENT,
)
The .arr and .df properties
Every TimeSeries has two shorthand properties for quick access to the underlying data:
- ts.arr: returns a numpy ndarray (same as ts.to_numpy())
- ts.df: returns a pandas DataFrame (same as ts.to_pandas_dataframe())
[2]:
ts.arr
[3]:
ts.df.head()
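For readers who want to see the two shapes without the library installed, the following sketch builds rough stand-ins with plain NumPy and pandas. It does not call TimeDataModel at all: the timestamp-indexed, single-column layout of the DataFrame and the column name "power" are assumptions made for illustration, not confirmed behaviour of ts.df.

```python
from datetime import datetime, timedelta, timezone

import numpy as np
import pandas as pd

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
values = [
    120.0, 115.0, 108.0, 105.0, 102.0, 100.0,
    110.0, 135.0, 160.0, 175.0, 180.0, 178.0,
    172.0, 170.0, 168.0, 165.0, 175.0, 190.0,
    200.0, 195.0, 180.0, 165.0, 145.0, 130.0,
]

# Rough stand-in for ts.arr: a 1-D float array holding only the values.
arr = np.asarray(values, dtype=float)

# Rough stand-in for ts.df (assumed layout): one value column,
# indexed by the timezone-aware timestamps.
df = pd.DataFrame(
    {"power": arr},
    index=pd.DatetimeIndex(timestamps, name="timestamp"),
)

print(arr.shape)
print(df.head())
```

The array carries no time information, while the DataFrame keeps the timestamps in its index. That difference is why index-aware operations such as resampling belong on the pandas side.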
Pattern 1: apply_numpy(func)
Pass a function that receives a numpy array and returns a numpy array. Timestamps, frequency, and all metadata are preserved automatically. The output array must have the same length as the input.
[4]:
normalized = ts.apply_numpy(lambda arr: (arr - arr.mean()) / arr.std())
normalized
[5]:
cumulative = ts.apply_numpy(np.cumsum)
cumulative.head(6)
[6]:
clipped = ts.apply_numpy(lambda arr: np.clip(arr, 110, 180))
clipped
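The three functions passed to apply_numpy above can be tried on a plain array to see what they return. This sketch uses only NumPy on the same 24 values and checks the same-length requirement by hand; it says nothing about how apply_numpy itself enforces that requirement.

```python
import numpy as np

values = np.array([
    120.0, 115.0, 108.0, 105.0, 102.0, 100.0,
    110.0, 135.0, 160.0, 175.0, 180.0, 178.0,
    172.0, 170.0, 168.0, 165.0, 175.0, 190.0,
    200.0, 195.0, 180.0, 165.0, 145.0, 130.0,
])


def zscore(arr: np.ndarray) -> np.ndarray:
    # Population standard deviation (ddof=0), matching arr.std() in the lambda.
    return (arr - arr.mean()) / arr.std()


results = {
    "normalized": zscore(values),
    "cumulative": np.cumsum(values),
    "clipped": np.clip(values, 110, 180),
}

for name, out in results.items():
    # Each function maps N values to N values, so the timestamps still line up.
    assert out.shape == values.shape
    print(name, np.round(out[:3], 2))
```

A function such as np.diff, which returns N - 1 values, would not satisfy the same-length requirement and fits the manual round-trip better.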
Pattern 2: apply_pandas(func)
Pass a function that receives a pandas DataFrame and returns a pandas DataFrame. This lets you use the full pandas API — rolling windows, resampling, interpolation, and more.
[7]:
rolling_mean = ts.apply_pandas(lambda df: df.rolling(6, min_periods=1).mean())
rolling_mean
[8]:
diff = ts.apply_pandas(lambda df: df.diff())
diff
[9]:
pct_change = ts.apply_pandas(lambda df: df.pct_change() * 100)
pct_change.head(6)
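To see what these pandas functions hand back, here is a sketch on a plain DataFrame holding the first six values of the series. The column name "power" and the index layout are assumptions for illustration; only standard pandas calls are used.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2024-01-15", periods=6, freq="h", tz="UTC")
df = pd.DataFrame({"power": [120.0, 115.0, 108.0, 105.0, 102.0, 100.0]}, index=index)

# min_periods=1 lets the window start averaging from the first row,
# so no leading NaN values are produced.
rolling = df.rolling(6, min_periods=1).mean()

# diff() and pct_change() have no previous row to compare against
# at position 0, so the first row is NaN.
diff = df.diff()
pct = df.pct_change() * 100

summary = pd.concat(
    [df["power"], rolling["power"], diff["power"], pct["power"]],
    axis=1,
    keys=["power", "rolling_mean", "diff", "pct_change"],
)
print(summary.round(2))
```

All three results keep the original index and length. The leading NaN from diff() and pct_change() is worth keeping in mind when the result is written back into a series.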
Pattern 3: One-liner round-trips with update_arr() and update_df()
Combine .arr / .df with update_arr() / update_df() to transform data in a single expression. The result is a new TimeSeries with all metadata preserved.
[10]:
ts.update_arr(ts.arr.clip(110, 180))
[11]:
ts.update_df(ts.df.resample("3h").mean())
[12]:
ts.update_df(ts.df.diff())
[13]:
ts.update_arr(np.cumsum(ts.arr))
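The idea of "return a new object, keep the metadata" can be sketched with a tiny frozen dataclass. MiniSeries below is a hypothetical stand-in written for this illustration; it is not tdm.TimeSeries and makes no claim about how TimeDataModel implements update_arr().

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True, eq=False)
class MiniSeries:
    """Hypothetical metadata-carrying series (illustration only)."""

    values: np.ndarray
    name: str
    unit: str

    @property
    def arr(self) -> np.ndarray:
        # Hand out a copy so callers cannot mutate the stored values.
        return self.values.copy()

    def update_arr(self, new_values) -> "MiniSeries":
        # replace() builds a new instance; name and unit carry over unchanged.
        return replace(self, values=np.asarray(new_values, dtype=float))


original = MiniSeries(np.array([100.0, 150.0, 200.0]), name="power", unit="MW")
clipped = original.update_arr(original.arr.clip(110, 180))

print(clipped.values, clipped.name, clipped.unit)
print(original.values)  # the original object is left untouched
```

Because the original is never modified, several variants (clipped, cumulative, differenced) can be derived from the same source series without interfering with each other.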
Pattern 4: Manual numpy round-trip
For transformations where the output shape differs from the input, export to numpy, transform freely, and construct a new TimeSeries.
[14]:
arr = ts.to_numpy()
print(f"Type: {type(arr)}")
print(f"Shape: {arr.shape}")
print(f"Mean: {arr.mean():.1f} MW")
[15]:
window = 3
smoothed_arr = np.convolve(arr, np.ones(window) / window, mode="valid")
smoothed_timestamps = timestamps[window - 1 :]
ts_smoothed = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=smoothed_timestamps,
    values=smoothed_arr.tolist(),
    name=ts.name,
    unit=ts.unit,
    data_type=ts.data_type,
)
print(f"Original length: {len(ts)}, Smoothed length: {len(ts_smoothed)}")
ts_smoothed.head(6)
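The length change and the timestamp slicing in the cell above follow from how np.convolve treats mode="valid". The sketch below checks both with plain NumPy on the first six values of the series.

```python
import numpy as np

values = np.array([120.0, 115.0, 108.0, 105.0, 102.0, 100.0])
window = 3

smoothed = np.convolve(values, np.ones(window) / window, mode="valid")

# "valid" keeps only positions where the window fully overlaps the data,
# so the output has len(values) - window + 1 points.
print(len(values), len(smoothed))

# smoothed[i] is the mean of values[i : i + window]. Labelling it with the
# timestamp at position i + window - 1 (as timestamps[window - 1:] does)
# makes it a trailing average: each point only uses current and past values.
for i, s in enumerate(smoothed):
    assert np.isclose(s, values[i : i + window].mean())
```

Slicing the timestamps differently, for example timestamps[1:-1] for window = 3, would give a centred average instead. Either choice is valid as long as the timestamps and values end up the same length.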
Pattern 5: Manual pandas round-trip
For multi-step pandas workflows where a one-liner would be hard to read, break it into separate steps.
[16]:
df = ts.to_pandas_dataframe()
df.head()
[17]:
df_resampled = df.resample("3h").mean()
ts_resampled = ts.update_from_pandas(df_resampled)
print(f"Original: {len(ts)} points")
print(f"Resampled: {len(ts_resampled)} points")
print(f"Unit preserved: {ts_resampled.unit}")
ts_resampled
[18]:
df_ewm = df.ewm(span=6).mean()
ts_ewm = ts.update_from_pandas(df_ewm)
ts_ewm
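What the two pandas steps do to the data can be checked on a plain DataFrame. This sketch rebuilds the 24 hourly values with an assumed column name "power" and uses only standard pandas; the write-back through update_from_pandas is left out because it needs the library.

```python
import numpy as np
import pandas as pd

values = [
    120.0, 115.0, 108.0, 105.0, 102.0, 100.0,
    110.0, 135.0, 160.0, 175.0, 180.0, 178.0,
    172.0, 170.0, 168.0, 165.0, 175.0, 190.0,
    200.0, 195.0, 180.0, 165.0, 145.0, 130.0,
]
index = pd.date_range("2024-01-15", periods=24, freq="h", tz="UTC")
df = pd.DataFrame({"power": values}, index=index)

# Resampling to 3-hour bins averages each group of three hourly points,
# so 24 rows become 8 and the index spacing becomes 3 hours.
resampled = df.resample("3h").mean()
print(len(df), "->", len(resampled))

# ewm(span=6) uses smoothing factor alpha = 2 / (span + 1) and, with the
# default adjust=True, normalises by the sum of the weights.
ewm = df.ewm(span=6).mean()
alpha = 2 / (6 + 1)
second = (values[1] + (1 - alpha) * values[0]) / (1 + (1 - alpha))
assert np.isclose(ewm.iloc[1, 0], second)
```

Resampling changes both the number of rows and the spacing of the index, whereas the exponentially weighted mean keeps all 24 rows.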
Transforms on TimeSeriesTable
All patterns — apply_*, update_arr(), update_df(), .arr, .df — also work on TimeSeriesTable, applying across all columns.
[19]:
rng = np.random.default_rng(42)
table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timestamps=timestamps,
    values=np.column_stack([
        80 + 40 * np.sin(np.linspace(0, 2 * np.pi, 24)) + rng.normal(0, 5, 24),
        np.clip(60 * np.sin(np.linspace(-0.5, np.pi + 0.5, 24)), 0, None),
        50 + rng.normal(0, 3, 24),
    ]),
    names=["wind", "solar", "hydro"],
    units=["MW", "MW", "MW"],
)
table
[20]:
table_rolling = table.apply_pandas(lambda df: df.rolling(4, min_periods=1).mean())
table_rolling
[21]:
table_norm = table.apply_numpy(
    lambda arr: (arr - arr.mean(axis=0)) / arr.std(axis=0)
)
table_norm.head(6)
[22]:
table.update_df(table.df.rolling(4, min_periods=1).mean())
[23]:
table.update_arr(np.clip(table.arr, 40, 120))
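The axis=0 in the normalization lambda matters once the data is two-dimensional. The sketch below rebuilds the same values as a plain NumPy array, assuming the table's array has shape (timestamps, columns) as the column_stack call suggests, and compares per-column statistics with whole-array statistics.

```python
import numpy as np

rng = np.random.default_rng(42)
table_arr = np.column_stack([
    80 + 40 * np.sin(np.linspace(0, 2 * np.pi, 24)) + rng.normal(0, 5, 24),
    np.clip(60 * np.sin(np.linspace(-0.5, np.pi + 0.5, 24)), 0, None),
    50 + rng.normal(0, 3, 24),
])
print(table_arr.shape)  # rows are timestamps, columns are wind/solar/hydro

# axis=0 reduces over the rows, giving one mean and one std per column.
col_mean = table_arr.mean(axis=0)
col_std = table_arr.std(axis=0)

# Broadcasting subtracts and divides column by column, so every column
# ends up with mean 0 and standard deviation 1 on its own scale.
normalized = (table_arr - col_mean) / col_std

# Without axis=0 the statistics are scalars over all columns combined,
# which mixes the scales of the three series.
global_norm = (table_arr - table_arr.mean()) / table_arr.std()

print(np.round(normalized.mean(axis=0), 6))
print(np.round(global_norm.mean(axis=0), 2))
```

With the global version, a column with a high overall level keeps a positive mean after normalization, which is rarely what is wanted when the columns are different physical series.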
Summary
Five patterns for transforming time series data:
| Pattern | Method | Best for |
|---|---|---|
| apply_numpy(func) | Functional | Same-length vectorized ops (normalize, cumsum) |
| apply_pandas(func) | Functional | Rolling windows, diff, pct_change |
| update_arr() | One-liner | Quick numpy transforms via .arr |
| update_df() | One-liner | Quick pandas transforms via .df |
| Manual | Multi-step | Shape-changing ops, complex workflows |
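All patterns carry metadata through to the result; in the manual numpy round-trip (Pattern 4) this happens because name, unit, and data_type are copied into the new TimeSeries explicitly. Use .arr / .df for read access and update_arr() / update_df() to write results back.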
Next up: nb_03 covers unit handling, validation, and rich metadata.