Metadata-Version: 2.4
Name: kashima
Version: 3.1.0
Summary: Machine Learning Tools for Geotechnical Earthquake Engineering.
Home-page: https://github.com/SRKConsulting/kashima
Author: Alejandro Verri Kozlowski
Author-email: averri@fi.uba.ar
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: folium>=0.12.0
Requires-Dist: geopandas>=0.10.0
Requires-Dist: pyproj>=3.0.0
Requires-Dist: requests>=2.25.0
Requires-Dist: branca>=0.4.0
Requires-Dist: geopy>=2.0.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: obspy>=1.2.0
Requires-Dist: appdirs>=1.4.0
Requires-Dist: pyarrow>=10.0.0
Requires-Dist: usgs-strec<3,>=2.3.13
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# kashima

**Interactive Seismic Event Mapping and Catalog Management**

[![Python Version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://python.org) [![Version](https://img.shields.io/badge/version-3.1.0-green.svg)](https://github.com/averriK/kashima)

> **Last updated:** February 26, 2026

kashima is a Python library for seismic event visualization and catalog processing that produces interactive Folium-based web maps from global earthquake catalogs and auxiliary datasets.

## What is it?

kashima focuses on the *mapping workflow* for engineering seismology: given one or more sites of interest, it builds reproducible web maps that combine:

- Global earthquake catalogs (USGS ComCat, Global CMT NDK, ISC Bulletin)
- Auxiliary fault databases (GEM, USGS Quaternary, EFSM20) and user GeoJSON faults
- Global ISC station layer (from a packaged CSV, filtered to the map window)
- Optional user event catalogs (e.g. blasts) in CSV or Parquet form

The heavy lifting (data download, caching, clipping and styling) is encapsulated in a small public API under `kashima.mapper`.

## Who is this for? (TL;DR)

kashima is aimed at engineering seismology, seismic hazard and mining/energy projects where you need reproducible, shareable web maps of earthquakes, faults and stations around one or more sites.

The typical workflow is:

1. Install the library: `pip install kashima`.
2. Pre-populate the global cache once: `downloadAllCatalogs(include_faults=True)`.
3. Build a map for your site with `buildMap(...)` and open the generated `maps/index.html` in a browser.

## Features

- **Multi-catalog support**: USGS, Global CMT (NDK method), ISC, and custom blasts
- **GMDB maintenance helpers**: update provider `EventTable.<PROVIDER>.csv` and `StationTable.<PROVIDER>.csv` files; fill audit IDs for events; optionally augment with generation mechanism + faulting style
- **GMM/GMPE evaluation helpers**: evaluate Sa(T)/PGA from *single-TRT* GMPE logic tree seeds (OpenQuake hazardlib required at runtime). See `examples/gmm/01_*`..`04_*`.
- **GMDB Vs30 helpers**: query and update station Vs30 from GMDB `StationTable.*.csv`, combining provider (`Vs30.owner`), neighbor-inferred (`Vs30.neighbors`) and USGS-proxy (`Vs30.USGS`) values into a synthetic `StationVs30`
- **STREC + Slab2 integration (optional but supported)**: bootstrap Slab2 grids and write auditable `*.STREC` columns for reviewer-friendly subduction subtype tagging
- **Interactive maps**: Folium-based maps with beachball focal mechanisms, distance rings and rich tooltips
- **Global cache**: Download catalogs and fault databases once, reuse across projects with incremental updates
- **Advanced visualizations**: Heatmaps, clustered markers, epicentral circles, fault overlays
- **Auxiliary data**: GEM Active Faults, USGS Quaternary Faults, EFSM20 fault databases
- **Global ISC stations**: packaged CSV (~41k stations) automatically clipped to the map radius, with a dedicated layer
- **Multi-fault datasets**: combine GEM, USGS Quaternary and EFSM20 faults (and local GeoJSONs) in a single color-coded layer
- **Reproducible projects**: every map writes the catalogs actually used to `./maps` and (optionally) `./data` for auditability

## Installation

Requires Python **3.8+**.

```bash
pip install kashima
```

Development version:

```bash
git clone https://github.com/averrik/kashima.git
cd kashima
pip install -e .
```

## Quickstart

1. **Install** `kashima` (see above).
2. **Initialize the global cache** (earthquake catalogs + optional fault databases). Run this once per machine:

```python
from kashima.mapper import downloadAllCatalogs

# First-time setup: fills ~/.cache/kashima (Linux),
# ~/Library/Caches/kashima (macOS) or %LOCALAPPDATA%\kashima\Cache\ (Windows)
downloadAllCatalogs(include_faults=True)
```

3. **Build your first map** around a site of interest:

```python
from kashima.mapper import buildMap

result = buildMap(
    latitude=-32.86758,
    longitude=-68.88867,
    radius_km=500,
    project_name="Mendoza seismicity",
    client="Example Mining Co.",
)

print("HTML map:", result["html"])  # ./maps/index.html
print("Events CSV:", result["csv"])  # ./maps/epicenters.csv
```

Open the generated `index.html` in a browser to explore earthquakes, faults and stations interactively.
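
If you would rather open the map from the same script, the following is a minimal standard-library sketch. It assumes `result["html"]` is a filesystem path to the generated file, as in the example above; the helper names `map_uri` and `open_map` are hypothetical and not part of kashima.

```python
import webbrowser
from pathlib import Path


def map_uri(html_path):
    """Return a file:// URI for a generated map (hypothetical helper)."""
    # resolve() makes the path absolute, which as_uri() requires
    return Path(html_path).expanduser().resolve().as_uri()


def open_map(html_path):
    """Open the map in the default browser; returns False if none could be launched."""
    return webbrowser.open(map_uri(html_path))


# Usage, after buildMap has run:
# open_map(result["html"])
```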

## Usage

### GMDB EventTable workflows (audit + enrichment)

The `kashima.gmdb` module provides workflows for maintaining provider EventTables.

- `updateEventTable(...)`: fill `usgsEventId` / `iscEventId`, optionally remove non-resolvable rows, and augment rows with:
  - `faultStyle` (from moment tensors)
  - `genMech` + auditable `*.STREC` columns (STREC + Slab2)

Bootstrap STREC+Slab2 once per machine:

```python
from kashima.gmdb import ensureStrecData

# Creates ~/.strec/config.ini and downloads Slab2 grids into the configured slab folder.
ensureStrecData(createConfig=True, ensureSlab2=True, downloadSlab2=True)
```

See `docs/gmdb_eventtable.md`, `docs/gmdb_stationtable.md`, and `examples/gmdb/06_update_all_eventtables_with_strec.py`.

### GMDB Vs30 helpers (station site conditions)

GMDB `StationTable.*.csv` files carry several Vs30-related columns (`Vs30.owner`, `Vs30.neighbors`, `Vs30.USGS`, `StationVs30`).
The `kashima.gmdb` helpers expose these through a small, file-based API so you do not have to re-implement column semantics.

Query helpers:

```python
from kashima.gmdb import (
    getStationVs30,
    getStationVs30Sources,
    getStationVs30Index,
)

# Single-station Vs30 (synthetic by default: owner > neighbors > USGS)
vs30 = getStationVs30(
    "NWZ",
    "WEL/THZ",
    indexDir="~/kashimaDB/gmdb.v2/index",
    source="synthetic",  # or "owner" / "neighbors" / "usgs"
)

# Full provenance for one station
sources = getStationVs30Sources("NWZ", "WEL/THZ", indexDir="~/kashimaDB/gmdb.v2/index")

# Bulk index: StationID -> Vs30Sources
idx = getStationVs30Index("NWZ", indexDir="~/kashimaDB/gmdb.v2/index")
```
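
The precedence noted in the comment above (owner > neighbors > USGS) can be pictured as a simple "first usable value wins" rule. The sketch below is illustrative only: the function name `synthetic_vs30` and the rule that `None`, NaN and non-positive values count as missing are assumptions, not kashima's implementation.

```python
import math


def synthetic_vs30(owner=None, neighbors=None, usgs=None):
    """Pick the first usable Vs30 (m/s) in the order owner > neighbors > USGS.

    Returns a (source, value) tuple, or (None, None) when nothing is usable.
    """
    candidates = (("owner", owner), ("neighbors", neighbors), ("usgs", usgs))
    for source, value in candidates:
        # Assumption: None, NaN and non-positive values are treated as missing
        if value is not None and not math.isnan(value) and value > 0:
            return source, float(value)
    return None, None
```

Returning the source together with the value keeps the provenance visible, which is the same idea behind `getStationVs30Sources`.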

Update helpers (flatfiles → Vs30.owner → full pipeline):

```python
from kashima.gmdb import (
    updateStationVs30OwnerFromFlatfiles,
    updateStationVs30Pipeline,
)

# Refresh Vs30.owner from provider flatfiles
status = updateStationVs30OwnerFromFlatfiles(
    "NGAW",
    rootDir="~/kashimaDB/gmdb.v2",
    indexDir="~/kashimaDB/gmdb.v2/index",
)

# Run the full Vs30 pipeline (owner + neighbors + USGS + synthetic rebuild)
r = updateStationVs30Pipeline(
    "NGAW",
    rootDir="~/kashimaDB/gmdb.v2",
    indexDir="~/kashimaDB/gmdb.v2/index",
    distanceKm=1.0,
    includeLocOnly=True,
    usgsGrid=None,  # default: downloads global_vs30.grd into the user cache (if missing)
)
```

See `docs/gmdb_vs30.md` and `examples/gmdb/10_vs30_helpers.py` for more details.

### GMDB provider ingestion (no download)

`kashima.gmdb` also includes short helpers to ingest already-downloaded provider data into a GMDB root:
- `ingestFDSNProvider(...)`: restage + normalize + base tables + RecordTables
- `updateProviderIndex(...)`: completion step for EventTable/StationTable/Vs30
- `validateProvider(...)`: RecordTable + filesystem validation
- `auditProviderFDSN(...)`: optional online audit against FDSN endpoints

MiniSEED payload policy:
- MiniSEED-based owners (NCEDC/SCEDC/NN/PNSN) are normalized to ASCII `.txt` payloads in `raw.owner/`.
- On apply/rewriteExisting, the original `.mseed` files are removed (no intermediate artifacts left).

PGA sanity workflow (recommended for MiniSEED owners):
1. Rebuild RecordTables from payloads.
2. Audit outliers.
3. If sentinel-derived spikes exist in already-materialized `.txt`, repair payloads and rebuild again.
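
To make the "audit outliers" step concrete, here is a generic sketch of one common screening technique: a robust z-score (median and MAD) computed on log10(PGA). It is not the logic of the audit script listed below; the function name, the threshold of 5 and the rule that non-positive or non-finite values are always flagged are assumptions chosen for illustration.

```python
import math
import statistics


def flag_pga_outliers(pga_values, threshold=5.0):
    """Return indices of suspicious PGA values (any consistent unit)."""
    suspicious = []
    logs = {}
    for i, value in enumerate(pga_values):
        # Missing, non-finite or non-positive amplitudes cannot be valid PGA
        if value is None or not math.isfinite(value) or value <= 0:
            suspicious.append(i)
        else:
            logs[i] = math.log10(value)
    if len(logs) < 3:
        return sorted(suspicious)  # too few values for a robust estimate
    median = statistics.median(logs.values())
    mad = statistics.median(abs(x - median) for x in logs.values())
    scale = 1.4826 * mad  # MAD scaled to match a normal standard deviation
    if scale == 0:
        return sorted(suspicious)
    for i, x in logs.items():
        if abs(x - median) / scale > threshold:
            suspicious.append(i)
    return sorted(suspicious)
```

Working in log space keeps the check meaningful across the several orders of magnitude that PGA values usually span, so a spike caused by a sentinel value stands out without flagging ordinary strong records.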

Scripts:
- Audit PGA outliers: `examples/gmdb/23_audit_recordtable_pga_outliers.py`
- Repair `.txt` sentinels (dry-run default; requires `--apply`): `examples/gmdb/24_repair_fdsn_txt_sentinels.py`

Examples:
- `examples/gmdb/19_ingest_fdsn_provider.py`
- `examples/gmdb/21_ingest_fdsn_provider_end_to_end.py`
- `examples/gmdb/22_normalize_raw_owner.py`

### Map layers and concepts

Each map produced by `buildMap` is composed of several layers that you can turn on/off in the Folium LayerControl:

- **Events**: epicentral points coloured and sized by magnitude, coming from USGS/GCMT/ISC or an optional user CSV.
- **Clustered view**: an alternative representation where nearby events are grouped into clusters to reduce overplotting.
- **Heatmap**: a smoothed density field of events, controlled by the `heatmap_*` parameters.
- **Beachballs**: focal mechanisms (from GCMT) drawn as beachball symbols for events above a given magnitude.
- **Faults**: line features from global fault databases (GEM, USGS Quaternary, EFSM20) selected via `fault_sets`, plus any local GeoJSON passed in `faults_files`, all clipped to the same geographic window as the events.
- **Stations**: global ISC stations from the packaged CSV (or your own `station_csv_path`), clipped to the map window and rendered as square markers.
- **Site marker**: a star symbol at the site location (`latitude`, `longitude`).
- **Epicentral circles**: concentric distance rings around the site, controlled by `epicentral_circles`.

These layers cover most of what kashima can draw; the parameters of `buildMap` control which ones are included and how they look.
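
Several layers above are described as "clipped to the map window". As a rough illustration of what clipping to a radius around a site involves, the sketch below filters points by great-circle (haversine) distance. It is not kashima's implementation; the function names and the Earth radius value of 6371 km are assumptions for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed value for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding slightly above 1 for near-antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def clip_to_radius(points, site_lat, site_lon, radius_km):
    """Keep only the (lat, lon) points within radius_km of the site."""
    return [
        (lat, lon)
        for lat, lon in points
        if haversine_km(site_lat, site_lon, lat, lon) <= radius_km
    ]
```

For example, with the Mendoza site used in the Quickstart and a 500 km radius, a point near Santiago de Chile is kept while one near Buenos Aires is dropped.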

### High-level map API: `buildMap`

The main entry point is `kashima.mapper.buildMap`. It:

- Copies the latest cached USGS/ISC/GCMT catalogs into a project-local `data/` directory
- Optionally merges global fault databases (GEM, USGS Quaternary, EFSM20) and user GeoJSON faults
- Adds a global ISC stations layer by default (unless you override it)
- Builds a Folium map and writes `maps/index.html` + `maps/epicenters.csv`

Minimal call (requires a pre-populated cache, see **Quickstart**):

```python
from kashima.mapper import buildMap

result = buildMap(
    latitude=-32.86758,
    longitude=-68.88867,
)
```

A more realistic example using multiple layers, fault sets and local faults:

```python
from kashima.mapper import buildMap

result = buildMap(
    latitude=-12.90795,
    longitude=+15.24845,
    radius_km=3500,
    # Layer visibility
    show_events_default=True,
    show_cluster_default=False,
    show_heatmap_default=True,
    show_beachballs_default=True,
    show_faults_default=True,
    show_epicentral_circles_default=True,
    # Fault datasets: global cache + local GeoJSONs
    fault_sets=["gem", "usgs", "efsm20"],
    faults_files=[
        "examples/mapper/faults/Angola1982.geojson",
        "examples/mapper/faults/Escosa2024.geojson",
    ],
    # Stations: default ISC CSV from cache, custom title
    station_layer_title="ISC + local stations",
    # Keep ./data snapshot for documentation
    keep_data=True,
)

print(result)
```

Key parameter groups (see `help(buildMap)` for the full list and defaults):

- **Location & radius** (`latitude`, `longitude`, `radius_km`, `event_radius_multiplier`): define the geographic window of the map. `radius_km` sets the base radius, and `event_radius_multiplier` scales that radius when computing the spatial window used for events, faults and stations.
- **Layers** (`show_events_default`, `show_cluster_default`, `show_heatmap_default`, `show_beachballs_default`, `show_faults_default`, `show_stations_default`, `show_epicentral_circles_default`): control which layers are visible when the map opens. Users can still toggle them later via the Folium LayerControl.
- **Catalogs & data** (`user_events_csv`, `keep_data`, `output_dir`): override the global catalogs with your own CSV, preserve the `./data` snapshot for auditability and choose where `maps/` and `data/` are written.
- **Fault configuration** (`fault_sets`, `faults_files`, `regional_faults_color`, `regional_faults_weight`, `faults_coordinate_system`): select which cached fault databases (any subset of `"gem"`, `"usgs"`, `"efsm20"`) are merged and which extra GeoJSON faults to add (for example the Angola files used in `examples/mapper/longonjo.py`), and how they are styled.
- **Stations** (`station_csv_path`, `station_coordinate_system`, `station_layer_title`, `show_stations_default`): keep the default global ISC stations or replace them with your own CSV, adjusting CRS and layer title for the stations layer. Use `show_stations_default=False` to start with the stations layer turned off.
- **Styling & legend** (`mag_bins`, `dot_palette`, `dot_sizes`, `beachball_sizes`, `fault_style_meta`, `color_palette`, `color_reversed`, `scaling_factor`, `legend_title`, `legend_position`): control how magnitudes map to colours and sizes and how the legend is rendered. Much of the visual power of examples like `examples/mapper/longonjo.py` comes from careful tuning of these parameters.
- **XY coordinates** (`x_col`, `y_col`, `location_crs`): work in projected coordinates (for example local UTM) instead of latitude/longitude, useful when your input catalogs are already in a local CRS.
- **Tooltips** (`tooltip_fields`, `legend_map`): choose which event fields appear in the tooltip and how they are labelled.
- **Map behavior** (`base_zoom_level`, `min_zoom_level`, `max_zoom_level`, `default_tile_layer`, `auto_fit_bounds`, `lock_pan`, `epicentral_circles`): control the initial view (zoom levels and base tile layer) and how many distance rings are drawn around the site via `epicentral_circles`. `auto_fit_bounds` and `lock_pan` exist for future map-behaviour controls and may have no visible effect in some versions; use `help(buildMap)` for the authoritative description.
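
To illustrate the idea behind the styling group (magnitude bins mapped to marker sizes and colours), here is a small self-contained sketch. The bin edges, sizes and colours are hypothetical values, not kashima defaults, and the data layout is an assumption; check `help(buildMap)` for the formats that `mag_bins`, `dot_sizes` and `dot_palette` actually expect.

```python
from bisect import bisect_right

# Hypothetical values for illustration only
BIN_EDGES = [4.0, 5.0, 6.0, 7.0]   # magnitude thresholds
SIZES = [2, 4, 7, 11, 16]          # one marker size per bin (len(edges) + 1)
COLOURS = ["#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"]


def style_for_magnitude(mag):
    """Return the bin index, marker size and colour for a magnitude."""
    # bisect_right places a magnitude equal to an edge in the upper bin,
    # so M 5.0 falls in the [5.0, 6.0) bin
    idx = bisect_right(BIN_EDGES, mag)
    return {"bin": idx, "size": SIZES[idx], "colour": COLOURS[idx]}
```

Keeping sizes and colours in lists indexed by bin makes it easy to check that every bin has exactly one style.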

`buildMap` returns a small dictionary:

```python
{
    "html": "path/to/index.html",
    "csv": "path/to/epicenters.csv",
    "event_count": 1234,
}
```
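
A script that builds several maps may want a short log line per result. The helper below is a hypothetical sketch (not part of kashima) that relies only on the three keys shown above.

```python
from pathlib import Path


def summarize_result(result):
    """Return a one-line summary of a buildMap-style result dictionary."""
    missing = {"html", "csv", "event_count"} - set(result)
    if missing:
        raise KeyError(f"result is missing keys: {sorted(missing)}")
    html, csv = Path(result["html"]), Path(result["csv"])
    return f"{result['event_count']} events -> {html.name}, {csv.name} (in {html.parent})"
```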

### Catalog API: `buildCatalog`

For scripted data pipelines you can call `buildCatalog` directly to fetch and save catalogs without generating maps.

```python
from kashima.mapper import buildCatalog

# Radial USGS query around a site
result = buildCatalog(
    source="usgs",
    output_path="data/usgs-events.csv",
    latitude=-32.86758,
    longitude=-68.88867,
    max_radius_km=500,
    min_magnitude=5.0,
    start_time="2010-01-01",
    end_time="2024-12-31",
)
print(f"Downloaded {result['event_count']} events from {result['source']}")

# Full global catalog (no spatial filter)
result = buildCatalog(
    source="gcmt",
    output_path="data/gcmt-full.csv",
    min_magnitude=5.5,
)
```

Supported sources are `"usgs"`, `"gcmt"` and `"isc"`; `"blast"` is planned as a future source (see the docstring for details and current status).

### Global cache & updates

kashima maintains a **global cache** so catalogs and fault databases are downloaded once and reused across all projects.

```python
from kashima.mapper import (
    downloadAllCatalogs,
    updateAllCatalogs,
    get_cache_dir,
    clear_cache,
)

# One-time setup (or when you want to pre-populate everything)
catalogs = downloadAllCatalogs(include_faults=True)
print("Cache directory:", catalogs["cache_dir"])

# Incremental update (new events only + refreshed fault databases)
updated = updateAllCatalogs(include_faults=True)
print("New USGS events:", updated["usgs_new"])

# Inspect cache location
print("Cache lives in:", get_cache_dir())

# Optional: clear a catalog if needed
# clear_cache("usgs")
```

On first use, `downloadAllCatalogs` also copies any bundled data shipped inside the wheel into the cache, so initial setup is often instant.
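
`get_cache_dir()` is the authoritative way to locate the cache. If you need to reason about the location without importing kashima (for example in a provisioning script), the sketch below reproduces the per-platform paths listed in the Quickstart comment. The function name is hypothetical, and the sketch ignores overrides such as `XDG_CACHE_HOME` on Linux, so treat its output as an expectation rather than a guarantee.

```python
import ntpath
import posixpath


def expected_cache_dir(system, home, local_appdata=None):
    """Expected kashima cache directory for a platform.system() value."""
    if system == "Linux":
        return posixpath.join(home, ".cache", "kashima")
    if system == "Darwin":  # macOS
        return posixpath.join(home, "Library", "Caches", "kashima")
    if system == "Windows":
        # Fall back to the conventional location when %LOCALAPPDATA% is unknown
        base = local_appdata or ntpath.join(home, "AppData", "Local")
        return ntpath.join(base, "kashima", "Cache")
    raise ValueError(f"unsupported platform: {system!r}")


# Usage on the current machine:
# import os, platform
# expected_cache_dir(platform.system(), os.path.expanduser("~"),
#                    os.environ.get("LOCALAPPDATA"))
```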

### Fault databases

Global fault datasets live in the cache as GeoJSON files and are consumed automatically by `buildMap` when `show_faults_default=True`. You can also work with them explicitly via:

- `buildGEMActiveFaults()`
- `buildUSGSQuaternaryFaults()`
- `buildEFSM20Faults()`

Use `fault_sets` to choose which cached datasets to merge (any subset of `"gem"`, `"usgs"`, `"efsm20"`) and `faults_files` to add custom GeoJSON faults (for example the Angola examples in `examples/mapper/faults/`).
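
When `fault_sets` comes from a config file or command-line argument, it can help to validate it before calling `buildMap`. The helper below is a hypothetical sketch (not part of kashima) that accepts only the three names listed above and removes duplicates while preserving order.

```python
KNOWN_FAULT_SETS = ("gem", "usgs", "efsm20")  # names listed in this README


def normalize_fault_sets(fault_sets):
    """Lower-case, validate and de-duplicate a list of fault set names."""
    cleaned = []
    for name in fault_sets:
        key = name.strip().lower()
        if key not in KNOWN_FAULT_SETS:
            raise ValueError(
                f"unknown fault set {name!r}; expected one of {KNOWN_FAULT_SETS}"
            )
        if key not in cleaned:
            cleaned.append(key)
    return cleaned


# Usage:
# buildMap(..., fault_sets=normalize_fault_sets(user_choice))
```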

### Station layer

By default `buildMap` adds a global ISC stations layer:

- The CSV `isc_stations.csv` is bundled inside the package and copied to the cache on first use.
- When you do not pass `station_csv_path`, `buildMap` reads stations from the cache, clips them to the same geographic window as the events and adds them as a toggleable layer.
- If you pass `station_csv_path`, your CSV is used instead and the default ISC stations are ignored.
- Note: passing an empty string for `station_csv_path` raises an error; omit it to use the default ISC stations.
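
Because an empty `station_csv_path` raises an error, scripts that read the path from user input may want to drop the argument entirely when it is blank. The helper below is a hypothetical sketch (not part of kashima) that builds only the keyword arguments that should be passed.

```python
def station_kwargs(station_csv_path=None, layer_title=None):
    """Build station-related keyword arguments, omitting blank values."""
    kwargs = {}
    # Skip None, "" and whitespace-only paths so the default ISC layer is used
    if station_csv_path and str(station_csv_path).strip():
        kwargs["station_csv_path"] = str(station_csv_path)
    if layer_title:
        kwargs["station_layer_title"] = layer_title
    return kwargs


# Usage:
# buildMap(latitude=..., longitude=..., **station_kwargs(user_path))
```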

## Examples

Complete, runnable workflows live in `examples/mapper/`:

- **Catalog setup & maintenance**
  - `00_download_catalogs.py`, `00_update_catalogs.py`
  - `01_usgs_catalog.py`, `02_gcmt_catalog.py`, `03_isc_catalog.py`, `03_update_catalogs.py`, `04_rebuild_cache.py`
- **Basic and intermediate maps**
  - `04_minimal_map.py`, `05_map_with_beachballs.py`, `06_map_with_custom_legend.py`, `07_map_with_heatmap.py`, `08_map_with_faults.py`, `09_map_advanced_config.py`, `longonjo.py`
- **Fault databases & stations**
  - `05_custom_faults.py`, `06_update_active_faults.py`, `07_compile_all_fault_databases.py`, `08_update_all_catalogs_and_faults.py`
- **Custom catalogs**
  - `10_blast_catalog.py`

Use these scripts as living documentation of typical workflows and advanced configuration.

Offline GMDB workflows live in `examples/gmdb/`:
- `01_export_offline_catalog.py`, `02_resolve_single_event.py`
- `03_enrich_eventtable_csv.py`, `04_enrich_with_robust_inputs.py`
- `05_update_eventtable_csv.py` (recommended for maintaining `EventTable.<PROVIDER>.csv`)

GMM evaluation examples live in `examples/gmm/`:
- `01_sa_longtable_distances.py`
- `02_sa_longtable_planar_corners.py`
- `03_sa_longtable_from_r.py`

### Advanced example: `examples/mapper/longonjo.py`

This script demonstrates a project-style map centred on Angola with:

- custom magnitude bins, colour scales and point sizes tuned for satellite imagery;
- a mix of global fault databases and multiple regional fault GeoJSON files passed via `faults_files`;
- a large search radius (`radius_km=3500`), satellite base tiles, locked panning and `keep_data=True` so the generated `data/` directory can be inspected or versioned.

Use it as a template for real engineering projects: copy the script, adjust coordinates, radius, `faults_files` and project metadata, and you will obtain a map suitable for inclusion in technical reports.

## API overview

The public API of `kashima.mapper` is defined by what is exported from `kashima/mapper/__init__.py`. The most important entry points are:

- **Map & catalogs**
  - `buildMap()`: high-level map builder
  - `buildCatalog()`: generic catalog builder (`source="usgs" | "gcmt" | "isc"`)
  - `buildUSGSCatalog()`, `buildGCMTCatalog()`, `buildISCCatalog()`
- **Faults & auxiliary data**
  - `buildGEMActiveFaults()`, `buildUSGSQuaternaryFaults()`, `buildEFSM20Faults()`
- **Cache management**
  - `downloadAllCatalogs()`, `updateAllCatalogs()`
  - `updateUSGSCatalog()`, `updateGCMTCatalog()`, `updateISCCatalog()`
  - `updateGEMActiveFaults()`, `updateUSGSQuaternaryFaults()`, `updateEFSM20Faults()`
  - `get_cache_dir()`, `clear_cache()`
- **Core classes & configuration**
  - `MapConfig`, `EventConfig`, `FaultConfig`, `StationConfig`, `BlastConfig`
  - `BaseMap`, `USGSCatalog`, `GCMTCatalog`, `BlastCatalog`, `EventMap`
  - Constants: `EARTH_RADIUS_KM`, `TILE_LAYERS`, `calculate_zoom_level()`

For the definitive parameter list and defaults, always refer to the Python docstrings:

```python
from kashima.mapper import buildMap, buildCatalog
help(buildMap)
help(buildCatalog)
```

## Help & documentation

- **GitHub Pages**: the rendered docs site includes a landing page at `docs/` (see `docs/index.md`).
- **API help**: use Python's built-in tools, for example `python -m pydoc kashima.mapper.buildMap` or `help(buildMap)` / `help(buildCatalog)` from an interactive session.
- **User and internal docs**: additional notes live in the `docs/` directory of the source repository:
  - `docs/user_guide.md`: offline resolver workflows
  - `docs/gmdb_provider_ingestion.md`: provider ingestion (no download) + payload policy + QA scripts
  - `docs/gmdb_eventtable.md`: GMDB provider EventTable maintenance via `kashima.gmdb.updateEventTable(...)`
  - `docs/gmdb_stationtable.md`: GMDB provider StationTable maintenance via `kashima.gmdb.updateStationTable(...)`
  - `docs/gmdb_vs30.md`: GMDB Vs30 query/update helpers (`getStationVs30(...)`, `updateStationVs30Pipeline(...)`)
  - `docs/gmdb_naming_conventions.md`, `docs/naming_conventions.md`, `docs/mapper_layers_plan.md`
- **Examples as a manual**: the scripts under `examples/mapper/` illustrate end-to-end workflows and advanced configuration.

## Documentation for developers

Some internal design notes and naming rules live in `docs/`:

- `docs/naming_conventions.md`: what is considered public API vs. internal helpers and the naming scheme used across the package.
- `docs/mapper_layers_plan.md`: design notes for the ISC stations layer, fault handling and cache behavior.

These documents are primarily for contributors; end users should not rely on internal helpers, only on the public API listed above.

## Command-line interface / man page

At the moment kashima is a **Python library only**: it does not install a standalone `kashima` command and does not ship a Unix `man` page.

For built-in help, use Python's introspection:

```bash
# From the shell
python -m pydoc kashima.mapper.buildMap
python -m pydoc kashima.mapper.buildCatalog
```

or from a Python session:

```python
from kashima.mapper import buildMap
help(buildMap)
```

## Dependencies

Core runtime dependencies (installed automatically by `pip install kashima`):

- Python (>= 3.8)
- pandas, numpy, folium, geopandas, pyproj
- requests, branca, geopy, matplotlib
- obspy (beachball rendering), pyarrow (Parquet cache)
- appdirs, usgs-strec (>=2.3.13, <3; STREC tagging)

Some fault builders download large GeoJSONs or talk to WFS services and therefore require a working internet connection the first time you run them.
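
To check which of these packages are present in the current environment, a standard-library sketch using `importlib.metadata` is shown below. The package list is copied from the `Requires-Dist` metadata; the function name is hypothetical, and the sketch only reports installed versions without comparing them against the minimum requirements.

```python
from importlib import metadata

REQUIRED = [
    "pandas", "numpy", "folium", "geopandas", "pyproj", "requests", "branca",
    "geopy", "matplotlib", "obspy", "appdirs", "pyarrow", "usgs-strec",
]


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Usage:
# for name, version in installed_versions().items():
#     print(f"{name}: {version or 'NOT INSTALLED'}")
```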

## License

MIT License - see [LICENSE](LICENSE) (to be added)

## Citation

```bibtex
@software{kashima2022,
  author = {Verri Kozlowski, Alejandro},
  title = {kashima: Interactive Seismic Event Mapping and Catalog Management},
  year = {2022},
  version = {3.0.0},
  url = {https://averrik.github.io/kashima/}
}
```

---

## Author

**Alejandro Verri Kozlowski**  
**Email:** averri@fi.uba.ar  
**ORCID:** [0000-0002-8535-1170](https://orcid.org/0000-0002-8535-1170)  
**Affiliation:** Universidad de Buenos Aires, Facultad de Ingeniería
