Metadata-Version: 2.4
Name: polars-iptools
Version: 0.2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Dist: polars>=1.3.0
License-File: LICENSE
Summary: Polars extension for IP address parsing and enrichment including geolocation
Keywords: polars,dfir,geoip,ip-address,geolocation
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/erichutchins/polars_iptools
Project-URL: Issues, https://github.com/erichutchins/polars_iptools/issues
Project-URL: Repository, https://github.com/erichutchins/polars_iptools

# Polars IPTools

Polars IPTools is a Rust-based extension to accelerates IP address manipulation and enrichment in Polars dataframes. This library includes various utility functions for working with IPv4 and IPv6 addresses and geoip and anonymization/proxy enrichment using MaxMind databases.

[![Documentation](https://img.shields.io/badge/docs-mkdocs-blue)](https://erichutchins.github.io/polars_iptools/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/polars-iptools)](https://pypi.org/project/polars-iptools/)
[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)
[![ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![ty](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ty/refs/heads/main/assets/badge/v0.json)](https://github.com/astral-sh/ty)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Claude](https://img.shields.io/badge/Claude-D97757?logo=claude&logoColor=fff)](https://claude.ai)
[![Gemini](https://img.shields.io/badge/Gemini-8E75FF?logo=googlegemini&logoColor=fff)](https://antigravity.google)

## Install

```shell
pip install polars-iptools
```

## Examples

### Simple enrichments

IPTools' Rust implementation gives you speedy answers to basic IP questions like "is this a private IP?"

```python
>>> import polars as pl
>>> import polars_iptools as ip
>>> df = pl.DataFrame({'ip': ['8.8.8.8', '2606:4700::1111', '192.168.100.100', '172.21.1.1', '172.34.5.5', 'a.b.c.d']})
>>> df.with_columns(ip.is_private(pl.col('ip')).alias('is_private'))
shape: (6, 2)
┌─────────────────┬────────────┐
│ ip              ┆ is_private │
│ ---             ┆ ---        │
│ str             ┆ bool       │
╞═════════════════╪════════════╡
│ 8.8.8.8         ┆ false      │
│ 2606:4700::1111 ┆ false      │
│ 192.168.100.100 ┆ true       │
│ 172.21.1.1      ┆ true       │
│ 172.34.5.5      ┆ false      │
│ a.b.c.d         ┆ false      │
└─────────────────┴────────────┘
```

### IP extension types

IPTools provides two Arrow extension types for storing IP addresses efficiently.
`IPv4` uses 4-byte `UInt32` storage; `IPAddress` uses 16-byte binary and handles
both IPv4 and IPv6. Both types survive Parquet and IPC round-trips with dtype preserved.

```python
>>> import polars as pl
>>> import polars_iptools as ip

>>> df = pl.DataFrame({"ip": ["8.8.8.8", "2606:4700::1111", "192.168.1.1", "invalid"]})
>>> df.with_columns(ip.to_address("ip"))
shape: (4, 2)
┌─────────────────┬─────────────────┐
│ ip              ┆ ip              │
│ ---             ┆ ---             │
│ str             ┆ ip_addr         │
╞═════════════════╪═════════════════╡
│ 8.8.8.8         ┆ 8.8.8.8         │
│ 2606:4700::1111 ┆ 2606:4700::1111 │
│ 192.168.1.1     ┆ 192.168.1.1     │
│ invalid         ┆ null            │
└─────────────────┴─────────────────┘

>>> # Write typed column to Parquet — dtype is preserved on read
>>> typed = df.with_columns(ip.to_address("ip"))
>>> typed.write_parquet("/tmp/ips.parquet")
>>> pl.read_parquet("/tmp/ips.parquet").dtypes
[String, IPAddress]
```

### `is_in` but for network ranges

Pandas and Polars have `is_in` functions to perform membership lookups. IPTools extends this to enable IP address membership in IP _networks_. This function works seamlessly with both IPv4 and IPv6 addresses and converts the specified networks into a [Level-Compressed trie (LC-Trie)](https://github.com/Orange-OpenSource/iptrie) for fast, efficient lookups.

```python
>>> import polars as pl
>>> import polars_iptools as ip
>>> df = pl.DataFrame({'ip': ['8.8.8.8', '1.1.1.1', '2606:4700::1111']})
>>> networks = ['8.8.8.0/24', '2606:4700::/32']
>>> df.with_columns(ip.is_in(pl.col('ip'), networks).alias('is_in'))
shape: (3, 2)
┌─────────────────┬───────┐
│ ip              ┆ is_in │
│ ---             ┆ ---   │
│ str             ┆ bool  │
╞═════════════════╪═══════╡
│ 8.8.8.8         ┆ true  │
│ 1.1.1.1         ┆ false │
│ 2606:4700::1111 ┆ true  │
└─────────────────┴───────┘
```

### GeoIP enrichment

Using [MaxMind's](https://www.maxmind.com/en/geoip-databases) _GeoLite2-ASN.mmdb_ and _GeoLite2-City.mmdb_ databases, IPTools provides offline enrichment of network ownership and geolocation.

`ip.geoip.full` returns a Polars struct containing all available metadata parameters. If you just want the ASN and AS organization, you can use `ip.geoip.asn`.

```python
>>> import polars as pl
>>> import polars_iptools as ip

>>> df = pl.DataFrame({"ip":["8.8.8.8", "192.168.1.1", "2606:4700::1111", "999.abc.def.123"]})
>>> df.with_columns([ip.geoip.full(pl.col("ip")).alias("geoip")])

shape: (4, 2)
┌─────────────────┬─────────────────────────────────┐
│ ip              ┆ geoip                           │
│ ---             ┆ ---                             │
│ str             ┆ struct[11]                      │
╞═════════════════╪═════════════════════════════════╡
│ 8.8.8.8         ┆ {15169,"GOOGLE","","NA","","",… │
│ 192.168.1.1     ┆ {0,"","","","","","","",0.0,0.… │
│ 2606:4700::1111 ┆ {13335,"CLOUDFLARENET","","","… │
│ 999.abc.def.123 ┆ {null,null,null,null,null,null… │
└─────────────────┴─────────────────────────────────┘

>>> df.with_columns([ip.geoip.asn(pl.col("ip")).alias("asn")])
shape: (4, 2)
┌─────────────────┬───────────────────────┐
│ ip              ┆ asn                   │
│ ---             ┆ ---                   │
│ str             ┆ str                   │
╞═════════════════╪═══════════════════════╡
│ 8.8.8.8         ┆ AS15169 GOOGLE        │
│ 192.168.1.1     ┆                       │
│ 2606:4700::1111 ┆ AS13335 CLOUDFLARENET │
│ 999.abc.def.123 ┆                       │
└─────────────────┴───────────────────────┘
```

### Spur enrichment

[Spur](https://spur.us/) is a commercial service that provides "data to detect VPNs, residential proxies, and bots". One of its offerings is a [Maxmind mmdb format](https://docs.spur.us/feeds?id=feed-export-utility) of at most 2,000,000 "busiest" Anonymous or Anonymous+Residential ips.

`ip.spur.full` returns a Polars struct containing all available metadata parameters.

```python
>>> import polars as pl
>>> import polars_iptools as ip

>>> df = pl.DataFrame({"ip":["8.8.8.8", "192.168.1.1", "999.abc.def.123"]})
>>> df.with_columns([ip.spur.full(pl.col("ip")).alias("spur")])

shape: (3, 2)
┌─────────────────┬─────────────────────────────────┐
│ ip              ┆ spur                            │
│ ---             ┆ ---                             │
│ str             ┆ struct[7]                       │
╞═════════════════╪═════════════════════════════════╡
│ 8.8.8.8         ┆ {0.0,"","","","","",null}       │
│ 192.168.1.1     ┆ {0.0,"","","","","",null}       │
│ 999.abc.def.123 ┆ {null,null,null,null,null,null… │
└─────────────────┴─────────────────────────────────┘
```

## Environment Configuration

IPTools uses two MaxMind databases: _GeoLite2-ASN.mmdb_ and _GeoLite2-City.mmdb_. You only need these files if you call the geoip functions.

Set the `MAXMIND_MMDB_DIR` environment variable to tell the extension where these files are located.

```shell
export MAXMIND_MMDB_DIR=/path/to/your/mmdb/files
# or Windows users
set MAXMIND_MMDB_DIR=c:\path\to\your\mmdb\files
```

If the environment is not set, polars_iptools will check two other common locations (on Mac/Linux):

```
/usr/local/share/GeoIP
/opt/homebrew/var/GeoIP
```

### Spur Environment

If you're a Spur customer, export the feed as `spur.mmdb` and specify its location using `SPUR_MMDB_DIR` environment variable.

```shell
export SPUR_MMDB_DIR=/path/to/spur/mmdb
# or Windows users
set SPUR_MMDB_DIR=c:\path\to\spur\mmdb
```

## Credit

Developing this extension was super easy by following Marco Gorelli's [tutorial](https://marcogorelli.github.io/polars-plugins-tutorial/) and [cookiecutter template](https://github.com/MarcoGorelli/cookiecutter-polars-plugins).

