Metadata-Version: 2.4
Name: polars-units
Version: 0.1.0a3
Requires-Dist: polars>=1.38.1
Requires-Dist: regex>=2026.1.15
Summary: Unit management for polars
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

<h1 align="center">polars-units</h1>

**polars-units** extends Polars with support for physical units and unit-aware computations. It offers:
- A new `DataType` named `Quantity` for `pl.DataFrame` and `pl.Series`
- Unit validation and parsing
- Arithmetic operations
- Conversions between units
- Unit metadata preserved when writing or reading Parquet files, through the recently supported [Arrow extension types](https://github.com/pola-rs/polars/pull/25322)

# How to use

## Installation

```bash
uv add polars-units
# or
pip install polars-units
```

## Usage

### Creating a `DataFrame` / `Series`

By default, polars-units uses a catalog of standard units, so you don't need any configuration to get started: simply set a column's data type to `pu.Quantity`, with a unit of your choice.

```python
import polars as pl
import polars_units as pu

data = pl.DataFrame(
    [[1.2, 3.4, 6.8], [3.2, 2.4, 9.7]],
    schema={"length": pu.Quantity("m"), "time": pu.Quantity("s")},
)

print(data)
```

Which prints:

```
shape: (3, 2)
┌─────────────┬─────────────┐
│ length      ┆ time        │
│ ---         ┆ ---         │
│ ext[q['m']] ┆ ext[q['s']] │
╞═════════════╪═════════════╡
│ 1.2         ┆ 3.2         │
│ 3.4         ┆ 2.4         │
│ 6.8         ┆ 9.7         │
└─────────────┴─────────────┘
```

> ***INFO*** - Under the hood, `pu.Quantity` is an [extension type](https://docs.pola.rs/api/python/stable/reference/api/polars.datatypes.BaseExtension.html) that stores values as `pl.Float64` together with unit metadata, which is why the displayed type is `ext[...]`.

### Complex units

As long as all their components are known to polars-units, you can build composed units using `/`, `.` and exponents. [Metric prefixes](https://en.wikipedia.org/wiki/International_System_of_Units#Prefixes) are automatically parsed.

```
>>> pu.Quantity("m")
Quantity(m)
>>> pu.Quantity("m2/s.V^5.km-3")   # `^` is optional
Quantity(m^2.s^-1.V^5.km^-3)
```
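
The notation above can be illustrated with a small standalone parser. This is only a sketch of the grammar, not the parser polars-units ships: `parse_unit` and `format_unit` are hypothetical helpers, the regular expression only accepts ASCII letters as symbols, and it assumes (as the example output above suggests) that `/` inverts only the component that immediately follows it.

```python
import re

# One component: optional separator, ASCII-letter symbol, optional "^",
# optional signed integer exponent. Hypothetical grammar, for illustration only.
_COMPONENT = re.compile(r"([./]?)([A-Za-z]+)\^?(-?\d+)?")


def parse_unit(text: str) -> list[tuple[str, int]]:
    """Split a unit string into (symbol, exponent) pairs."""
    components = []
    pos = 0
    while pos < len(text):
        match = _COMPONENT.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse {text!r} at position {pos}")
        separator, symbol, exponent = match.groups()
        power = int(exponent) if exponent else 1
        if separator == "/":
            power = -power  # "/" inverts the component that follows it
        components.append((symbol, power))
        pos = match.end()
    return components


def format_unit(components: list[tuple[str, int]]) -> str:
    """Render components in the normalized `a^n.b^m` form."""
    return ".".join(s if p == 1 else f"{s}^{p}" for s, p in components)


print(format_unit(parse_unit("m2/s.V^5.km-3")))  # m^2.s^-1.V^5.km^-3
```

The sketch does not split prefixes from symbols (`km` stays a single symbol) and does not check that symbols exist; in polars-units that validation is the job of the registry.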

### Casting to / from numeric columns

You can cast any `pl.Float64` column to `pu.Quantity` by using `pl.Expr.ext.to()`:

```python
>>> data = pl.DataFrame([1.2, 4.9], schema={"distance": pl.Float64})
>>> data

shape: (2, 1)
┌──────────┐
│ distance │
│ ---      │
│ f64      │
╞══════════╡
│ 1.2      │
│ 4.9      │
└──────────┘

>>> data.with_columns(pl.col("distance").ext.to(pu.Quantity("km")))

shape: (2, 1)
┌──────────────┐
│ distance     │
│ ---          │
│ ext[q['km']] │
╞══════════════╡
│ 1.2          │
│ 4.9          │
└──────────────┘
```

> ***NOTE*** - For now, only `Float64` columns are supported. If you have a column of another numeric type, you need to cast it to `Float64` before converting it to a `Quantity` column.

In the same way, you can convert a `Quantity` column back to `Float64` by using `pl.Expr.ext.storage()`:

```python
>>> data = pl.DataFrame([1.2, 4.9], schema={"distance": pu.Quantity("km")})
>>> data.with_columns(pl.col("distance").ext.storage())

shape: (2, 1)
┌──────────┐
│ distance │
│ ---      │
│ f64      │
╞══════════╡
│ 1.2      │
│ 4.9      │
└──────────┘
```

### Conversions

polars-units introduces a new `unit` namespace that groups all unit-related operations. You can convert a column to another unit via the `to` method (the examples below reuse the `length` / `time` DataFrame from the first example):

```python
>>> data.with_columns(pl.col("length").unit.to("cm"))

shape: (3, 2)
┌──────────────┬─────────────┐
│ length       ┆ time        │
│ ---          ┆ ---         │
│ ext[q['cm']] ┆ ext[q['s']] │
╞══════════════╪═════════════╡
│ 120.0        ┆ 3.2         │
│ 340.0        ┆ 2.4         │
│ 680.0        ┆ 9.7         │
└──────────────┴─────────────┘
```

### Arithmetic

Arithmetic operators (`+`, `*`, ...) are not yet supported for extension types, but the corresponding methods are available in the `unit` namespace.

| Operator | Method |
|:--------:|:--------------------:|
|     +    |       `.add()`       |
|     -    |       `.sub()`       |
|     *    |       `.mul()`       |
|     /    |       `.div()`       |

```python
>>> data.with_columns(pl.col("length").unit.div("time").alias("speed"))

shape: (3, 3)
┌─────────────┬─────────────┬──────────────────┐
│ length      ┆ time        ┆ speed            │
│ ---         ┆ ---         ┆ ---              │
│ ext[q['m']] ┆ ext[q['s']] ┆ ext[q['m.s^-1']] │
╞═════════════╪═════════════╪══════════════════╡
│ 1.2         ┆ 3.2         ┆ 0.375            │
│ 3.4         ┆ 2.4         ┆ 1.416667         │
│ 6.8         ┆ 9.7         ┆ 0.701031         │
└─────────────┴─────────────┴──────────────────┘
```

You can only add or subtract quantities whose units have the same dimension.

```python
length = pl.Series([1.2, 4.9], dtype=pu.Quantity("m"))
width = pl.Series([42.0, 18.5], dtype=pu.Quantity("cm"))
time = pl.Series([1.0, 2.0], dtype=pu.Quantity("min"))

length.unit.add(width)
```

If the units differ, the result is expressed in the unit of the first operand:

```
shape: (2,)
Series: '' [ext[q['m']]]
[
        1.62
        5.085
]
```

If we try to add quantities with different dimensions, an error is raised to prevent the operation:
```python
>>> length.unit.add(time)

polars.exceptions.ComputeError: the plugin failed with message: Dimensions of both units must be the same:
        > left: Dimension { length: 0, current: 0, luminosity: 0, mass: 0, amount: 0, temperature: 0, time: 1 },
        > right: Dimension { length: 1, current: 0, luminosity: 0, mass: 0, amount: 0, temperature: 0, time: 0 }
```
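
The two rules above (align the right operand to the left unit, refuse mismatched dimensions) can be sketched in plain Python. `SimpleUnit` and `add` are hypothetical names, and the `to_si` scale factor is an assumption made for this illustration rather than the way polars-units stores its units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleUnit:
    """Hypothetical stand-in for a unit: symbol, dimension and scale to SI."""

    symbol: str
    dimension: str  # e.g. "L" for length, "T" for time
    to_si: float    # how many SI units one of this unit is worth


def add(left, left_unit, right, right_unit):
    """Add two lists of values, expressing the result in `left_unit`."""
    if left_unit.dimension != right_unit.dimension:
        raise ValueError(
            f"cannot add {right_unit.symbol} to {left_unit.symbol}: "
            f"dimensions {right_unit.dimension} and {left_unit.dimension} differ"
        )
    # Rescale the right operand into the left unit before summing.
    ratio = right_unit.to_si / left_unit.to_si
    return [a + b * ratio for a, b in zip(left, right)]


meter = SimpleUnit("m", "L", 1.0)
centimeter = SimpleUnit("cm", "L", 0.01)
minute = SimpleUnit("min", "T", 60.0)

total = add([1.2, 4.9], meter, [42.0, 18.5], centimeter)
print([round(x, 3) for x in total])  # [1.62, 5.085]
```

Calling `add([1.0], meter, [1.0], minute)` raises a `ValueError` in this sketch, mirroring the `ComputeError` shown above.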

> ***NOTE*** - For now, only operations between units are allowed. Operations between units and scalars are planned, but not yet implemented.

## Advanced usage

### Core objects

Internally, polars-units uses a few classes to manipulate units, and check the validity of operations. You can use these classes to build your own sets of units.

#### `Dimension`

In the [International System of Units](https://en.wikipedia.org/wiki/International_System_of_Units), there are seven fundamental physical dimensions that are used to construct every unit.

|          Quantity         | Symbol (in polars-units) | Associated unit |
|:--------------------------|:------------------------:|:---------------:|
| Time                      |             T            |    second `s`   |
| Length                    |             L            |    meter `m`    |
| Mass                      |             M            |  kilogram `kg`  |
| Electric current          |             I            |    ampere `A`   |
| Thermodynamic temperature |            Th            |    kelvin `K`   |
| Amount of substance       |             N            |    mole `mol`   |
| Luminous intensity        |             J            |   candela `cd`  |

Dimensions are represented by the `Dimension` class, and are used to verify the validity of operations between units (you cannot add kilograms and meters, for example). 

You can build `Dimension` objects by passing a dictionary:

```python
speed_dimension = pu.Dimension({pu.BaseDimension.LENGTH: 1, pu.BaseDimension.TIME: -1})
```

Or you can directly pass a string that will be parsed:

```python
speed_dimension = pu.Dimension("L/T") # or "L.T^-1"
```
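
To make the bookkeeping concrete, the sketch below represents a dimension as a plain mapping from base symbol to exponent: multiplying quantities adds exponents, dividing subtracts them, and two quantities can be added only when the mappings are equal. The helper functions are hypothetical and are not how `pu.Dimension` is implemented.

```python
def multiply(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Multiplying quantities adds the exponents of their dimensions."""
    result = dict(a)
    for symbol, exponent in b.items():
        result[symbol] = result.get(symbol, 0) + exponent
    # Drop exponents that cancelled out, so {} means "dimensionless".
    return {s: e for s, e in result.items() if e != 0}


def divide(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Dividing quantities subtracts the exponents of the divisor."""
    return multiply(a, {s: -e for s, e in b.items()})


LENGTH = {"L": 1}
TIME = {"T": 1}
MASS = {"M": 1}

speed = divide(LENGTH, TIME)          # same dimension as "L/T"
acceleration = divide(speed, TIME)
force = multiply(MASS, acceleration)  # M.L.T^-2, the dimension of the newton

print(speed)                 # {'L': 1, 'T': -1}
print(force)                 # {'M': 1, 'L': 1, 'T': -2}
print(speed == acceleration) # False: these two could not be added
```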

#### `BaseUnit`

A base unit is a single unit (e.g. meter `m`, volt `V`, minute `min`), possibly associated with a prefix (kilo `k`, milli `m`). It is used as a base component of composed units, such as `m/s²`. In polars-units, a `BaseUnit` has several attributes:
- A `symbol`, that is used for display,
- A `name`,
- A `prefix` (optional),
- A `formula` (see [Dimensions](#dimension)),
- `si_offset` and `si_factor`, used for conversions between units.

You can define new units by creating new `BaseUnit` objects:

```python
yard = pu.BaseUnit(symbol="yd", name="yard", dimension="L", si_factor=1 / 0.9144)
foot = pu.BaseUnit(symbol="ft", name="foot", dimension="L", si_factor=1 / 0.3048)
```
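
The exact semantics of `si_factor` and `si_offset` are not spelled out in this document. As a rough illustration of how a factor and an offset can drive conversions, the sketch below uses one common convention, `si_value = value * factor + offset`. The `AffineUnit` class and this convention are assumptions made for the example; polars-units may define its attributes differently, so check the library before relying on a particular direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineUnit:
    """Hypothetical unit described by a scale factor and an offset to SI."""

    symbol: str
    factor: float = 1.0
    offset: float = 0.0

    def to_si(self, value: float) -> float:
        # Assumed convention: scale first, then shift.
        return value * self.factor + self.offset

    def from_si(self, value: float) -> float:
        # Exact inverse of to_si.
        return (value - self.offset) / self.factor


def convert(value: float, source: AffineUnit, target: AffineUnit) -> float:
    """Convert by going through the SI unit (kelvin here)."""
    return target.from_si(source.to_si(value))


kelvin = AffineUnit("K")
celsius = AffineUnit("°C", offset=273.15)
fahrenheit = AffineUnit("°F", factor=5 / 9, offset=459.67 * 5 / 9)

print(round(convert(100.0, celsius, fahrenheit), 6))  # 212.0
print(round(convert(0.0, celsius, kelvin), 2))        # 273.15
```

Purely multiplicative units such as the yard or the foot are the special case where the offset is zero.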

#### `Unit`

`Unit` objects represent combinations of base units and their exponent in a formula. For example, a speed in `km/h` has a `km` component with exponent 1, and a `h` component with exponent -1. `Quantity`-type columns store `Unit` objects internally to perform operations. They are almost never instantiated directly; instead, they are built through [`Registry`](#registries) objects.

### Registries

In order to perform validation on units, polars-units relies on `pu.Registry` objects. They contain a list of valid units and their properties. By default, polars-units uses a `pu.DefaultRegistry()`, which includes a catalog of usual SI units (see [Catalog](#catalog)). If you want to extend the pool of available units, you need to configure a `Registry` with your own `BaseUnit`s.

#### Extend the default registry

By instantiating a new `DefaultRegistry`, you can benefit from both the default units and your own.

```python
yard = pu.BaseUnit(symbol="yd", name="yard", dimension="L", si_factor=1 / 0.9144)
foot = pu.BaseUnit(symbol="ft", name="foot", dimension="L", si_factor=1 / 0.3048)

registry = pu.DefaultRegistry([yard, foot])
```

You need to set the global registry before creating new `Quantity` columns based on your units:

```python
pu.config.registry = registry
```

Once the configuration is set to the new registry, you can use these new units in your program:

```
>>> pl.Series([1.2, 4.9], dtype=pu.Quantity("m")).unit.to("yd")

shape: (2,)
Series: '' [ext[q['yd']]]
[
        1.09728
        4.48056
]
```

#### Create a new registry

If you want to define your own set of units from scratch, you can use the `Registry` class instead:

```python
registry = pu.Registry([yard, foot])

pu.config.registry = registry   # Use your new registry
```

Now only your units will be registered:

```
>>> length = pl.Series([1.2, 4.9], dtype=pu.Quantity("m"))

...
polars_units.utils.exceptions.UnitNotFoundError: Unit with symbol 'm' not found in registry.
```
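
The difference between the two registry flavors can be pictured with a tiny lookup table. `TinyRegistry` is a hypothetical class, and the prefix handling (exact symbol first, then a one-letter prefix plus a known base symbol) is only an assumption about how a name like `km` might be resolved, not a description of `pu.Registry` internals.

```python
PREFIXES = {"k": "kilo", "c": "centi", "m": "milli"}  # small subset


class TinyRegistry:
    """Hypothetical registry mapping unit symbols to unit names."""

    def __init__(self, units: dict[str, str]):
        self._units = dict(units)

    def resolve(self, symbol: str) -> str:
        # An exact match wins, so "m" is the meter and "min" the minute,
        # not a "milli" prefix applied to something else.
        if symbol in self._units:
            return self._units[symbol]
        # Otherwise try to split off a one-letter metric prefix.
        prefix, base = symbol[:1], symbol[1:]
        if prefix in PREFIXES and base in self._units:
            return PREFIXES[prefix] + self._units[base]
        raise KeyError(f"Unit with symbol {symbol!r} not found in registry.")


default_like = TinyRegistry({"m": "meter", "s": "second", "min": "minute"})
custom_only = TinyRegistry({"yd": "yard", "ft": "foot"})

print(default_like.resolve("km"))   # kilometer
print(default_like.resolve("min"))  # minute
print(custom_only.resolve("ft"))    # foot
```

In this sketch `custom_only.resolve("m")` raises a `KeyError`, in the same spirit as the `UnitNotFoundError` above: a registry built from scratch knows only the units it was given.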

## Catalog

The following table lists all units included in polars-units by default.

[Dimensions](#dimension) are noted using the following symbols: $L$ = Length, $I$ = Current, $J$ = Luminosity, $M$ = Mass, $N$ = Amount of substance, $\Theta$ = Temperature, $T$ = Time.

### SI base units


| Symbol | Name | Dimension | Notes |
|:------:|------|:---------:|-------|
| `m` | Meter | $L$ | SI base unit |
| `A` | Ampere | $I$ | SI base unit |
| `cd` | Candela | $J$ | SI base unit |
| `kg` | Kilogram | $M$ | SI base unit |
| `mol` | Mole | $N$ | SI base unit |
| `K` | Kelvin | $\Theta$ | SI base unit |
| `s` | Second | $T$ | SI base unit |
| `°C` | Degree Celsius | $\Theta$ | |
| `°F` | Degree Fahrenheit | $\Theta$ | |
| `min` | Minute | $T$ | |
| `h` | Hour | $T$ | |
| `rad` | Radian | $-$ | Dimensionless |
| `sr` | Steradian | $-$ | Dimensionless |
| `Hz` | Hertz | $T^{-1}$ | |
| `N` | Newton | $M \cdot L \cdot T^{-2}$ | |
| `Pa` | Pascal | $M \cdot L^{-1} \cdot T^{-2}$ | |
| `J` | Joule | $M \cdot L^2 \cdot T^{-2}$ | |
| `W` | Watt | $M \cdot L^2 \cdot T^{-3}$ | |
| `C` | Coulomb | $I \cdot T$ | |
| `V` | Volt | $M \cdot L^2 \cdot I^{-1} \cdot T^{-3}$ | |
| `F` | Farad | $M^{-1} \cdot L^{-2} \cdot I^2 \cdot T^4$ | |
| `Ω` / `ohm` | Ohm | $M \cdot L^2 \cdot I^{-2} \cdot T^{-3}$ | Both symbols are equivalent |
| `S` | Siemens | $M^{-1} \cdot L^{-2} \cdot I^2 \cdot T^3$ | |
| `Wb` | Weber | $M \cdot L^2 \cdot I^{-1} \cdot T^{-2}$ | |
| `T` | Tesla | $M \cdot I^{-1} \cdot T^{-2}$ | |
| `H` | Henry | $M \cdot L^2 \cdot I^{-2} \cdot T^{-2}$ | |
| `lm` | Lumen | $J$ | |
| `lx` | Lux | $J \cdot L^{-2}$ | |
| `Bq` | Becquerel | $T^{-1}$ | |
| `Gy` | Gray | $L^2 \cdot T^{-2}$ | |
| `Sv` | Sievert | $L^2 \cdot T^{-2}$ | |
| `kat` | Katal | $N \cdot T^{-1}$ | |

