NumPy and Pandas Transforms

TimeDataModel provides patterns for transforming time series data with NumPy and pandas. Every TimeSeries and TimeSeriesTable exposes .arr (NumPy array) and .df (pandas DataFrame) properties, plus dedicated methods for writing results back. Your domain model stays structured while you use the rest of the scientific Python ecosystem.

[1]:
from datetime import datetime, timedelta, timezone

import numpy as np

import timedatamodel as tdm

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]

ts = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=timestamps,
    values=[
        120.0, 115.0, 108.0, 105.0, 102.0, 100.0,
        110.0, 135.0, 160.0, 175.0, 180.0, 178.0,
        172.0, 170.0, 168.0, 165.0, 175.0, 190.0,
        200.0, 195.0, 180.0, 165.0, 145.0, 130.0,
    ],
    name="power",
    unit="MW",
    data_type=tdm.DataType.MEASUREMENT,
)

The .arr and .df properties

Every TimeSeries has two shorthand properties for quick access to the underlying data:

  • ts.arr — returns a numpy ndarray (same as ts.to_numpy())

  • ts.df — returns a pandas DataFrame (same as ts.to_pandas_dataframe())

[2]:
ts.arr
[3]:
ts.df.head()

Pattern 1: apply_numpy(func)

Pass a function that receives a numpy array and returns a numpy array. Timestamps, frequency, and all metadata are preserved automatically. The output array must have the same length as the input.
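To make the same-length contract concrete, here is a minimal sketch of how such a wrapper could behave. MiniSeries is a hypothetical stand-in written for this example only; it is not the TimeDataModel class, and the library's actual implementation may differ.

```python
from dataclasses import dataclass, replace
from typing import Callable

import numpy as np


@dataclass(frozen=True)
class MiniSeries:
    """Hypothetical stand-in for a series container (not the TimeDataModel class)."""

    values: np.ndarray
    name: str
    unit: str

    def apply_numpy(self, func: Callable[[np.ndarray], np.ndarray]) -> "MiniSeries":
        # Hand the function a copy so the original values cannot be mutated.
        result = np.asarray(func(self.values.copy()), dtype=float)
        # Same-length contract: each output value must map to an existing timestamp.
        if result.shape != self.values.shape:
            raise ValueError(
                f"expected shape {self.values.shape}, got {result.shape}"
            )
        # Copy every field except the values, so metadata carries over.
        return replace(self, values=result)


series = MiniSeries(np.array([120.0, 115.0, 108.0, 105.0]), name="power", unit="MW")
normalized = series.apply_numpy(lambda arr: (arr - arr.mean()) / arr.std())
```

In this sketch, a function that shortens the array (for example `lambda arr: arr[:2]`) raises a ValueError instead of silently misaligning values and timestamps.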

[4]:
normalized = ts.apply_numpy(lambda arr: (arr - arr.mean()) / arr.std())
normalized
[5]:
cumulative = ts.apply_numpy(np.cumsum)
cumulative.head(6)
[6]:
clipped = ts.apply_numpy(lambda arr: np.clip(arr, 110, 180))
clipped

Pattern 2: apply_pandas(func)

Pass a function that receives a pandas DataFrame and returns a pandas DataFrame. This lets you use the full pandas API — rolling windows, resampling, interpolation, and more.
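The operations used in this section can be tried on a plain pandas DataFrame first. The sketch below assumes the frame has a DatetimeIndex and a single value column named "power"; that layout is an assumption made for illustration, not something confirmed about the frame TimeDataModel passes in.

```python
import pandas as pd

# Assumed frame layout: a DatetimeIndex and one value column named "power".
index = pd.date_range("2024-01-15", periods=6, freq="h", tz="UTC")
df = pd.DataFrame(
    {"power": [120.0, 115.0, 108.0, 105.0, 102.0, 100.0]}, index=index
)

# min_periods=1 allows partial windows, so the first rows are not NaN.
rolling_mean = df.rolling(3, min_periods=1).mean()

# diff and pct_change have no previous row to compare with at position 0,
# so their first row is NaN.
diff = df.diff()
pct_change = df.pct_change() * 100

# All three results keep the original index and row count.
print(len(df), len(rolling_mean), len(diff), len(pct_change))
```

All three results have one row per input row, which is what makes them suitable for a same-index write-back.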

[7]:
rolling_mean = ts.apply_pandas(lambda df: df.rolling(6, min_periods=1).mean())
rolling_mean
[8]:
diff = ts.apply_pandas(lambda df: df.diff())
diff
[9]:
pct_change = ts.apply_pandas(lambda df: df.pct_change() * 100)
pct_change.head(6)

Pattern 3: One-liner round-trips with update_arr() and update_df()

Combine .arr / .df with update_arr() / update_df() to transform data in a single expression. The result is a new TimeSeries with all metadata preserved.

[10]:
ts.update_arr(ts.arr.clip(110, 180))
[11]:
ts.update_df(ts.df.resample("3h").mean())
[12]:
ts.update_df(ts.df.diff())
[13]:
ts.update_arr(np.cumsum(ts.arr))

Pattern 4: Manual numpy round-trip

For transformations where the output shape differs from the input, export to numpy, transform freely, and construct a new TimeSeries.
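When the output is shorter than the input, you also have to decide which timestamps the surviving values belong to. The sketch below uses only NumPy and the standard library, with placeholder values rather than the series above, to show the length arithmetic for a "valid" moving average and two common alignment choices. The centered variant is an illustration and is not used elsewhere in this notebook.

```python
from datetime import datetime, timedelta, timezone

import numpy as np

base = datetime(2024, 1, 15, tzinfo=timezone.utc)
timestamps = [base + timedelta(hours=i) for i in range(24)]
values = np.arange(24, dtype=float)  # placeholder data for illustration

window = 3
smoothed = np.convolve(values, np.ones(window) / window, mode="valid")

# "valid" keeps only full windows, so the output has N - window + 1 points.
assert len(smoothed) == len(values) - window + 1

# Trailing alignment: each average is stamped with the last timestamp in its window.
trailing = timestamps[window - 1 :]

# Centered alignment (odd windows): stamp with the middle timestamp instead.
half = window // 2
centered = timestamps[half : len(timestamps) - half]

print(len(smoothed), len(trailing), len(centered))
```

Trailing alignment avoids using future values at each timestamp; centered alignment avoids the lag a trailing average introduces. Either way, the timestamp list must be sliced to the same length as the smoothed array.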

[14]:
arr = ts.to_numpy()
print(f"Type:  {type(arr)}")
print(f"Shape: {arr.shape}")
print(f"Mean:  {arr.mean():.1f} MW")
[15]:
window = 3
smoothed_arr = np.convolve(arr, np.ones(window) / window, mode="valid")
smoothed_timestamps = timestamps[window - 1 :]

ts_smoothed = tdm.TimeSeries(
    tdm.Frequency.PT1H,
    timestamps=smoothed_timestamps,
    values=smoothed_arr.tolist(),
    name=ts.name,
    unit=ts.unit,
    data_type=ts.data_type,
)
print(f"Original length: {len(ts)}, Smoothed length: {len(ts_smoothed)}")
ts_smoothed.head(6)

Pattern 5: Manual pandas round-trip

For multi-step pandas workflows where a one-liner would be hard to read, break it into separate steps.
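Splitting the workflow into steps also makes it easy to inspect intermediate frames. The sketch below runs on a plain pandas DataFrame with placeholder values (the column name "power" and the linspace data are assumptions for illustration) and shows how resampling changes both the row count and the spacing of the index, while an exponentially weighted mean changes neither.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2024-01-15", periods=24, freq="h", tz="UTC")
df = pd.DataFrame({"power": np.linspace(100.0, 200.0, 24)}, index=index)

# Step 1: downsample. 24 hourly rows become 8 three-hour means.
df_resampled = df.resample("3h").mean()

# Step 2: smooth. An exponentially weighted mean keeps one row per input row.
df_ewm = df.ewm(span=6).mean()

# Step 3: inspect the index before writing back, since the spacing
# between rows of the resampled frame is now 3 hours.
step = df_resampled.index[1] - df_resampled.index[0]
print(len(df), len(df_resampled), len(df_ewm), step)
```

Checking the row count and index spacing at this point is a cheap way to catch a frequency mismatch before the frame is handed back to the time series object.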

[16]:
df = ts.to_pandas_dataframe()
df.head()
[17]:
df_resampled = df.resample("3h").mean()

ts_resampled = ts.update_from_pandas(df_resampled)
print(f"Original:  {len(ts)} points")
print(f"Resampled: {len(ts_resampled)} points")
print(f"Unit preserved: {ts_resampled.unit}")
ts_resampled
[18]:
df_ewm = df.ewm(span=6).mean()

ts_ewm = ts.update_from_pandas(df_ewm)
ts_ewm

Transforms on TimeSeriesTable

All patterns — apply_*, update_arr(), update_df(), .arr, .df — also work on TimeSeriesTable, applying across all columns.
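With several columns, the axis argument decides whether a statistic is computed per column or over the whole table. The NumPy-only sketch below assumes a 2-D array with rows as time steps and columns as series, which matches how the table in this section is built with np.column_stack; the random data is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
# 24 time steps (rows) by 3 series (columns), placeholder data.
arr = rng.normal(loc=[80.0, 30.0, 50.0], scale=[5.0, 10.0, 3.0], size=(24, 3))

# axis=0 reduces over rows (time), giving one statistic per column.
col_mean = arr.mean(axis=0)  # shape (3,)
col_std = arr.std(axis=0)    # shape (3,)

# Broadcasting stretches the (3,) statistics across all 24 rows.
normalized = (arr - col_mean) / col_std

# Without axis=0 the mean is a single scalar over every column, which
# would mix all three series into one normalization.
global_mean = arr.mean()

print(normalized.shape, col_mean.shape, np.ndim(global_mean))
```

Per-column statistics keep each series on its own scale, which is usually what you want when the columns measure different sources.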

[19]:
rng = np.random.default_rng(42)

table = tdm.TimeSeriesTable(
    tdm.Frequency.PT1H,
    timestamps=timestamps,
    values=np.column_stack([
        80 + 40 * np.sin(np.linspace(0, 2 * np.pi, 24)) + rng.normal(0, 5, 24),
        np.clip(60 * np.sin(np.linspace(-0.5, np.pi + 0.5, 24)), 0, None),
        50 + rng.normal(0, 3, 24),
    ]),
    names=["wind", "solar", "hydro"],
    units=["MW", "MW", "MW"],
)
table
[20]:
table_rolling = table.apply_pandas(lambda df: df.rolling(4, min_periods=1).mean())
table_rolling
[21]:
table_norm = table.apply_numpy(
    lambda arr: (arr - arr.mean(axis=0)) / arr.std(axis=0)
)
table_norm.head(6)
[22]:
table.update_df(table.df.rolling(4, min_periods=1).mean())
[23]:
table.update_arr(np.clip(table.arr, 40, 120))

Summary

Five patterns for transforming time series data:

Pattern                                    Method      Best for
-----------------------------------------  ----------  ----------------------------------------------
ts.apply_numpy(func)                       Functional  Same-length vectorized ops (normalize, cumsum)
ts.apply_pandas(func)                      Functional  Rolling windows, diff, pct_change
ts.update_arr(ts.arr.clip(...))            One-liner   Quick numpy transforms via .arr
ts.update_df(ts.df.resample(...).mean())   One-liner   Quick pandas transforms via .df
Manual to_numpy() / to_pandas_dataframe()  Multi-step  Shape-changing ops, complex workflows

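The apply_* and update_* methods preserve metadata automatically; in the manual numpy round-trip you pass name, unit, and data_type to the new TimeSeries yourself. Use .arr / .df for read access and update_arr() / update_df() to write results back.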

Next up: nb_03 covers unit handling, validation, and rich metadata.