Metadata-Version: 2.4
Name: geoprepare
Version: 0.6.66
Summary: A Python package to prepare (download, extract, process input data) for GEOCIF and related models
Author-email: Ritvik Sahajpal <ritvik@umd.edu>
License: MIT
Project-URL: Homepage, https://github.com/ritviksahajpal/geoprepare
Project-URL: Documentation, https://ritviksahajpal.github.io/geoprepare
Project-URL: Repository, https://github.com/ritviksahajpal/geoprepare
Project-URL: Issues, https://github.com/ritviksahajpal/geoprepare/issues
Keywords: geoprepare,geospatial,agriculture,remote-sensing,earth-observation
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: GIS
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: bump2version; extra == "dev"
Requires-Dist: wheel; extra == "dev"
Requires-Dist: watchdog; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: tox; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: Sphinx; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: grip; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs; extra == "docs"
Requires-Dist: mkdocstrings; extra == "docs"
Requires-Dist: mkdocs-material; extra == "docs"
Requires-Dist: mkdocs-jupyter; extra == "docs"
Requires-Dist: mkdocs-git-revision-date-plugin; extra == "docs"
Dynamic: license-file

# geoprepare


[![image](https://img.shields.io/pypi/v/geoprepare.svg)](https://pypi.python.org/pypi/geoprepare)


**A Python package to prepare (download, extract, process input data) for GEOCIF and related models**


-   Free software: MIT license
-   Documentation: https://ritviksahajpal.github.io/geoprepare

## Installation
> **Note:** The instructions below have only been tested on a Linux system

### Install Anaconda
We recommend that you use the conda package manager to install the `geoprepare` library and all its
dependencies. If you do not have it installed already, you can get it from the [Anaconda distribution](https://www.anaconda.com/download#downloads)

### Using the CDS API
If you intend to download AgERA5 data, you will need to install and configure the CDS API client.
You can do so by following the instructions [here](https://cds.climate.copernicus.eu/api-how-to)
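
Part of that setup is creating a `~/.cdsapirc` credentials file. The sketch below shows its typical layout; the exact URL and key format have changed between CDS API versions, so use the values given in the linked instructions rather than these illustrative ones:

```
url: https://cds.climate.copernicus.eu/api
key: YOUR-PERSONAL-ACCESS-TOKEN
```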

### Create a new conda environment (optional but highly recommended)
`geoprepare` requires multiple Python GIS packages, including `gdal` and `rasterio`, which are not always easy
to install. To make the process easier, you can optionally create a new environment using the commands below,
specifying the Python version available on your machine (Python >= 3.9 is recommended). We use the `pygis` library
to install multiple Python GIS packages, including `gdal` and `rasterio`.

```bash
conda create --name <name_of_environment> python=3.x
conda activate <name_of_environment>
conda install -c conda-forge mamba
mamba install -c conda-forge gdal
mamba install -c conda-forge rasterio
mamba install -c conda-forge xarray
mamba install -c conda-forge rioxarray
mamba install -c conda-forge pyresample
mamba install -c conda-forge cdsapi
mamba install -c conda-forge pygis
pip install wget
pip install pyl4c
```

Install the octvi package to download MODIS data
```bash
pip install git+https://github.com/ritviksahajpal/octvi.git
```

Downloading from the NASA Distributed Active Archive Centers (DAACs) requires a personal app key. Configure the
module using the `octviconfig` console script: after installation, run `octviconfig` in your terminal and enter
your personal app key when prompted. Information on obtaining app keys
can be found [here](https://ladsweb.modaps.eosdis.nasa.gov/tools-and-services/data-download-scripts/#tokens)


### Using PyPi (default)
```bash
pip install --upgrade geoprepare
```

### Using Github repository (for development)
```bash
pip install --upgrade --no-deps --force-reinstall git+https://github.com/ritviksahajpal/geoprepare.git
```

### Local installation
Navigate to the directory containing `pyproject.toml` and run the following command:
```bash
pip install .
```

For development (editable install):
```bash
pip install -e ".[dev]"
```

## Usage
* Execute the following code to download the data

```python
from geoprepare import geodownload

# Provide full path to the configuration files
# Download and preprocess data
geodownload.run([r"PATH_TO_geobase.txt"])
```

* Execute the following code to extract crop masks and EO data
```python
from geoprepare import geoextract

# Extract crop masks and EO variables
geoextract.run([r"PATH_TO_geobase.txt", r"PATH_TO_geoextract.txt"])
```

* Execute the following code to prepare the data for the crop yield ML model and AgMet graphics
```python
from geoprepare import geomerge

# Merge EO files into one; this is needed to create AgMet graphics and to run the crop yield model
geomerge.run([r"PATH_TO_geobase.txt", r"PATH_TO_geoextract.txt"])
```


Before running the code above, you need to set up the two configuration files:
* `geobase.txt` contains configuration settings for downloading and processing the input data.
* `geoextract.txt` contains configuration settings for extracting crop masks and EO variables.

## Configuration files
### geobase.txt
> **NOTE:** `dir_base` needs to be changed to your specific directory structure
* `datasets`: Specify which datasets need to be downloaded and processed
* `dir_base`: Path where to store the downloaded and processed files
* `start_year`, `end_year`: Specify time-period for which data should be downloaded and processed
* `logfile`: What directory name to use for the log files
* `level`: Which level to use for [logging](https://www.loggly.com/ultimate-guide/python-logging-basics/)
* `parallel_process`: Whether to use multiple CPUs
* `fraction_cpus`: What fraction of available CPUs to use
```ini
[DATASETS]
datasets = ['NDVI', 'AGERA5', 'CHIRPS', 'CPC', 'CHIRPS-GEFS', 'NSIDC']

[PATHS]
dir_base = /gpfs/data1/cmongp1/GEOGLAM
dir_input = ${dir_base}/Input
dir_log = ${dir_base}/log
dir_interim = ${dir_input}/intermed
dir_download = ${dir_input}/download
dir_output = ${dir_base}/Output
dir_global_datasets = ${dir_input}/Global_Datasets
dir_metadata = ${dir_input}/metadata
dir_masks = ${dir_global_datasets}/masks
dir_regions = ${dir_global_datasets}/regions
dir_regions_shp = ${dir_regions}/shps
dir_crop_masks = ${dir_input}/crop_masks
dir_models = ${dir_input}/models

[AGERA5]
variables = ['Precipitation_Flux', 'Temperature_Air_2m_Max_24h', 'Temperature_Air_2m_Min_24h']

[AVHRR]
data_dir = https://www.ncei.noaa.gov/data/avhrr-land-normalized-difference-vegetation-index/access

[CHIRPS]
fill_value = -2147483648
prelim = /pub/org/chc/products/CHIRPS-2.0/prelim/global_daily/tifs/p05/
final = /pub/org/chc/products/CHIRPS-2.0/global_daily/tifs/p05/

[CHIRPS-GEFS]
fill_value = -2147483648
data_dir = /pub/org/chc/products/EWX/data/forecasts/CHIRPS-GEFS_precip_v12/15day/precip_mean/

[CPC]
data_dir = ftp://ftp.cdc.noaa.gov/Datasets

[ESI]
data_dir = https://gis1.servirglobal.net//data//esi//

[FLDAS]

[LST]
num_update_days = 7

[VHI]
data_historic = https://www.star.nesdis.noaa.gov/data/pub0018/VHPdata4users/VHP_4km_GeoTiff/
data_current = https://www.star.nesdis.noaa.gov/pub/corp/scsb/wguo/data/Blended_VH_4km/geo_TIFF/

[NDVI]
product = MOD09CMG
vi = ndvi
scale_glam = False
scale_mark = True
print_missing = False

[VIIRS]
product = VNP09CMG
vi = ndvi
scale_glam = False
scale_mark = True
print_missing = False

[NSIDC]

[SOIL-MOISTURE]
data_dir = https://gimms.gsfc.nasa.gov/SMOS/SMAP/L03/

[FPAR]
data_url = https://agricultural-production-hotspots.ec.europa.eu/data/indicators_fpar/fpar/

[LOGGING]
level = ERROR

[DEFAULT]
logfile = log
parallel_process = False
fraction_cpus = 0.75
start_year = 2001
end_year = 2024
```
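
Note that `geobase.txt` mixes `${...}` path interpolation with Python-style list literals. `geoprepare` has its own config reader, but if you want to inspect such a file from your own scripts, a standard-library sketch looks like this (the inlined snippet is a shortened stand-in for the full file above):

```python
import ast
import configparser

# ExtendedInterpolation resolves ${dir_base}-style references
parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
parser.read_string("""
[PATHS]
dir_base = /gpfs/data1/cmongp1/GEOGLAM
dir_input = ${dir_base}/Input

[DATASETS]
datasets = ['NDVI', 'CHIRPS']
""")

# Interpolated path
print(parser["PATHS"]["dir_input"])  # /gpfs/data1/cmongp1/GEOGLAM/Input

# List-valued options are Python literals; parse them with ast.literal_eval
datasets = ast.literal_eval(parser["DATASETS"]["datasets"])
print(datasets)  # ['NDVI', 'CHIRPS']
```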

### geoextract.txt
> **NOTE:** For each country add a new section to this file, using `kenya` as an example
* `countries`: List of countries to process
* `forecast_seasons`: List of seasons to process
* `mask`: Name of file to use as a mask for cropland/croptype
* `redo`: Redo the processing for all days (`True`) or only days with new data (`False`)
* `threshold`: Use a `threshold` value (`True`) or a `percentile` (`False`) on the cropland/croptype mask
* `floor`: Value below which to set the mask to 0
* `ceil`: Value above which to set the mask to 1
* `eo_model`: List of datasets to extract from
```ini
[kenya]
category = EWCM
scales = ['admin_1']  ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1]  ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[rwanda]
category = EWCM
scales = ['admin_1']  ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1]  ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[malawi]
category = EWCM
scales = ['admin_1']  ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1]  ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[zambia]
category = EWCM
scales = ['admin_1']  ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1]  ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[united_republic_of_tanzania]
category = EWCM
scales = ['admin_1']  ; can be admin_1 (state level) or admin_2 (county level)
growing_seasons = [1]  ; 1 is primary/long season, 2 is secondary/short season
crops = ['mz', 'sr', 'ml', 'rc', 'ww', 'tf']
use_cropland_mask = True

[ww]
mask = cropland_v9.tif  ; A tif file specifying name of cropland/crop-type mask

[mz]
mask = cropland_v9.tif

[sb]
mask = cropland_v9.tif

[rc]
mask = cropland_v9.tif

[tf]
mask = cropland_v9.tif

[sr]
mask = cropland_v9.tif

[ml]
mask = cropland_v9.tif

[EWCM]
calendar_file = EWCM_2021-6-17.xlsx

[AMIS]
calendar_file = AMISCM_2021-6-17.xlsx

[DEFAULT]
redo = False
threshold = True
floor = 20
ceil = 90
scales = ['admin_1']
growing_seasons = [1]
countries = ['kenya']
forecast_seasons = [2022]
mask = cropland_v9.tif
shp_boundary = EWCM_Level_1.shp
statistics_file = statistics.csv
zone_file = countries.csv
calendar_file = crop_calendar.csv
eo_model = ['ndvi', 'cpc_tmax', 'cpc_tmin', 'chirps', 'chirps_gefs', 'esi_4wk', 'soil_moisture_as1', 'soil_moisture_as2']
```
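
To illustrate the `floor`/`ceil` options described above: with `threshold = True`, mask values below `floor` become 0 and values above `ceil` become 1. The sketch below is illustrative only; how `geoprepare` internally treats values between the two thresholds may differ (here they are kept as fractions):

```python
def apply_mask_thresholds(values, floor=20, ceil=90):
    """Clamp percent-cropland mask values: below `floor` -> 0, above `ceil` -> 1.

    Intermediate values are kept as fractions here; the actual handling
    inside geoprepare may differ -- this is only an illustration.
    """
    out = []
    for v in values:
        if v < floor:
            out.append(0.0)
        elif v > ceil:
            out.append(1.0)
        else:
            out.append(v / 100.0)  # keep as a fraction of cropland
    return out

print(apply_mask_thresholds([5, 20, 55, 95]))  # [0.0, 0.2, 0.55, 1.0]
```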

## Accessing EO data using the earthaccess library
```python
import geopandas as gpd
from tqdm import tqdm
from pathlib import Path

from geoprepare.eoaccess import eoaccess

dg = gpd.read_file(PATH_TO_SHAPEFILE, engine="pyogrio")

# Convert to CRS 4326 if not already
if dg.crs != "EPSG:4326":
    dg = dg.to_crs("EPSG:4326")

# Iterate over each row of the shapefile
for index, row in tqdm(dg.iterrows(), desc="Iterating over shapefile", total=len(dg)):
    # Get bbox from geometry of the row
    bbox = row.geometry.bounds

    obj = eoaccess.NASAEarthAccess(
        dataset=["HLSL30", "HLSS30"],
        bbox=bbox,
        temporal=(f"{row['year']}-01-01", f"{row['year']}-12-31"),
        output_dir=".",
    )

    obj.search_data()
    if obj.results:
        obj.download_parallel()

obj = eoaccess.EarthAccessProcessor(
    dataset=["HLSL30", "HLSS30"],
    input_dir=".",
    shapefile=Path(PATH_TO_SHAPEFILE),
)
obj.mosaic()
```

## Upload package to PyPI

Navigate to the **root of the geoprepare repository** (the directory containing `pyproject.toml`):
```bash
cd /path/to/geoprepare
```

### Step 1: Update version
Use `bump2version` to update the version in both `pyproject.toml` and `geoprepare/__init__.py`:

**Using uv:**
```bash
uvx bump2version patch --current-version X.X.X --new-version X.X.Y pyproject.toml geoprepare/__init__.py
```

**Using pip:**
```bash
pip install bump2version
bump2version patch --current-version X.X.X --new-version X.X.Y pyproject.toml geoprepare/__init__.py
```

Or manually edit the version in `pyproject.toml` and `geoprepare/__init__.py`.

### Step 2: Clean old builds

**Linux/macOS:**
```bash
rm -rf dist/ build/ *.egg-info/
```

**Windows (Command Prompt):**
```cmd
rmdir /s /q dist build geoprepare.egg-info
```

**Windows (PowerShell):**
```powershell
Remove-Item -Recurse -Force dist/, build/, *.egg-info/ -ErrorAction SilentlyContinue
```

### Step 3: Build and upload

**Using uv (Linux/macOS):**
```bash
uv build
uvx twine check dist/*
uvx twine upload dist/geoprepare-X.X.X*
```

**Using uv (Windows):**
```cmd
uv build
uvx twine check dist\geoprepare-X.X.X.tar.gz dist\geoprepare-X.X.X-py3-none-any.whl
uvx twine upload dist\geoprepare-X.X.X.tar.gz dist\geoprepare-X.X.X-py3-none-any.whl
```

**Using pip:**
```bash
pip install build twine
python -m build
twine check dist/*
twine upload dist/geoprepare-X.X.X*
```

Replace `X.X.X` with your current version and `X.X.Y` with the new version.

### Optional: Configure PyPI credentials
To avoid entering credentials each time, create a `~/.pypirc` file (Linux/macOS) or `%USERPROFILE%\.pypirc` (Windows):
```ini
[pypi]
username = __token__
password = pypi-YOUR_API_TOKEN_HERE
```

## Credits
This project was supported by NASA Applied Sciences Grant No. 80NSSC17K0625 through the NASA Harvest Consortium,
and by the NASA Acres Consortium under NASA Grant No. 80NSSC23M0034.
