Metadata-Version: 2.4
Name: socio4health
Version: 1.0.4
Summary: Socio4health is a Python package for gathering and consolidating socio-demographic data.
Home-page: https://github.com/harmonize-tools/socio4health
Author: Erick Lozano, Diego Irreño, Juan Montenegro, Ingrid Mora
Author-email: 
Project-URL: Bug Reports, https://github.com/harmonize-tools/socio4health/issues
Project-URL: Source, https://github.com/harmonize-tools/socio4health/
Project-URL: Changelog, https://github.com/harmonize-tools/socio4health/blob/main/CHANGELOG.md
Keywords: extract transform load etl scraping relational census sociodemographic colombia brazil
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.10, <4
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: pandas>=2.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: Scrapy>=2.11.1
Requires-Dist: tqdm>=4.66.1
Requires-Dist: pyreadstat>=1.2.6
Requires-Dist: py7zr>=0.20.8
Requires-Dist: matplotlib>=3.7.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: openpyxl>=3.1.2
Requires-Dist: dask>=2023.0.0
Requires-Dist: appdirs>=1.4.4
Requires-Dist: pyarrow>=12.0.0
Requires-Dist: deep-translator>=1.11.4
Requires-Dist: transformers>=4.30.0
Requires-Dist: torch>=2.0.0
Requires-Dist: geopandas>=0.14.0
Requires-Dist: zipfile-deflate64==0.2.0
Requires-Dist: pyzipper==0.3.6
Provides-Extra: dev
Requires-Dist: check-manifest; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Provides-Extra: test
Requires-Dist: coverage; extra == "test"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary


<a href="https://www.harmonize-tools.org/">
    <img height="120" align="right" src="https://harmonize-tools.github.io/harmonize-logo.png" />
</a>

<a href="https://harmonize-tools.github.io/socio4health/">
    <img height="120" src="https://raw.githubusercontent.com/harmonize-tools/socio4health/main/docs/source/_static/image.png" />
</a>

# socio4health
                                                             
<!-- badges: start -->

[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![MIT
license](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/harmonize-tools/socio4health/blob/main/LICENSE.md/)
[![GitHub
contributors](https://img.shields.io/github/contributors/harmonize-tools/socio4health)](https://github.com/harmonize-tools/socio4health/graphs/contributors)
![commits](https://img.shields.io/github/commit-activity/t/harmonize-tools/socio4health)
<!-- badges: end -->

## Overview  
<p style="font-family: Arial, sans-serif; font-size: 14px;">
  socio4health is an extraction, transformation, loading (ETL), and classification tool that simplifies collecting and merging data from multiple sources into a harmonized dataset. It focuses on sociodemographic and census datasets from Colombia, Brazil, and Peru.
</p>

- Retrieve data from online sources through web scraping, as well as from local files.
- Read a range of data formats, including `.csv`, `.xlsx`, `.xls`, `.txt`, `.sav`, fixed-width files, and geospatial files.
- Consolidate extracted data into a pandas (or dask) DataFrame.
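
As a rough illustration of working with local files in the formats listed above, the sketch below collects files by extension using only the Python standard library. The helper `find_supported_files` and the `SUPPORTED_EXTENSIONS` set are hypothetical names introduced for this example; they are not part of the socio4health API, and the package's own file discovery may work differently.

```python
from pathlib import Path

# Extensions named in the feature list above. This set is illustrative
# and is not read from socio4health itself.
SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".xls", ".txt", ".sav"}


def find_supported_files(folder, extensions=SUPPORTED_EXTENSIONS):
    """Return files under `folder` whose suffix is in `extensions`, sorted by path."""
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*")  # walk the folder recursively
        if path.is_file() and path.suffix.lower() in extensions
    )
```

Calling `find_supported_files("data")` would return a sorted list of `Path` objects for the matching files under a local `data` folder, comparing extensions case-insensitively.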



## Dependencies

<table>
  <tr>
    <td align="center">
      <a href="https://www.dask.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/17131925?s=200&v=4" height="50" alt="dask logo">
      </a>
    </td>
    <td align="left">
      <strong>Dask</strong><br>
     Dask is a flexible parallel computing library for analytics.<br>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://pandas.pydata.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/21206976?s=280&v=4" height="50" alt="pandas logo">
      </a>
    </td>
    <td align="left">
      <strong>Pandas</strong><br>
      Pandas is a well-known open source data analysis and manipulation tool.<br>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://geopandas.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/8130715?s=48&v=4" height="50" alt="geopandas logo">
      </a>
    </td>
    <td align="left">
      <strong>Geopandas</strong><br>
     Python tools for geographic data.<br>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://numpy.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/288276?s=48&v=4" height="50" alt="numpy logo">
      </a>
    </td>
    <td align="left">
      <strong>Numpy</strong><br>
      The fundamental package for scientific computing with Python.<br>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://scrapy.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/733635?s=48&v=4" height="50" alt="scrapy logo">
      </a>
    </td>
    <td align="left">
      <strong>Scrapy</strong><br>
      Framework for extracting the data you need from websites.<br>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://matplotlib.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/215947?s=48&v=4" height="50" alt="matplotlib logo">
      </a>
    </td>
    <td align="left">
      <strong>Matplotlib</strong><br>
      Library for creating static, animated, and interactive visualizations in Python.<br>
    </td>
  </tr>
  <tr>
    <td align="center">
      <a href="https://pytorch.org/" target="_blank">
        <img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" height="50" alt="pytorch logo">
      </a>
    </td>
    <td align="left">
      <strong>Torch</strong><br>
      Python package for tensor computation and deep neural networks.<br>
    </td>
  </tr>
</table>

- <a href="https://openpyxl.readthedocs.io/en/stable/">openpyxl</a>
- <a href="https://py7zr.readthedocs.io/en/latest/">py7zr</a>
- <a href="https://pypi.org/project/pyreadstat/">pyreadstat</a>
- <a href="https://tqdm.github.io/">tqdm</a>
- <a href="https://requests.readthedocs.io/en/latest/">requests</a>
- <a href="https://pypi.org/project/appdirs/">appdirs</a>
- <a href="https://pypi.org/project/pyarrow/">pyarrow</a>
- <a href="https://pypi.org/project/deep-translator/">deep_translator</a>
- <a href="https://pypi.org/project/transformers/">transformers</a>
- <a href="https://pypi.org/project/pytest/">pytest</a>

## Installation

**socio4health** can be installed via pip from [PyPI](https://pypi.org/project/socio4health/).

``` CMD
# Install using pip
pip install socio4health
```
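
After installing, a quick check like the one below can confirm that the package is visible to the current interpreter. This is a minimal sketch using only the standard library; `installed_version` is a helper name introduced here, not a socio4health function. The Python 3.10 minimum comes from the package metadata (`Requires-Python: >=3.10, <4`).

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of `package_name`, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# The package metadata requires Python 3.10 or newer.
python_ok = sys.version_info >= (3, 10)
print("Python >= 3.10:", python_ok)
print("socio4health version:", installed_version("socio4health"))
```

If the second line prints `None`, the package is not installed in the environment that ran the script.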

## How to Use It

To use the socio4health package, follow these steps:

1. Import the package in your Python script:

   ```python
   from socio4health import Extractor
   from socio4health import Harmonizer
   ```
2. Create an instance of the `Extractor` class:

   ```python
   extractor = Extractor()
   ```

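3. Extract data from an online source to build a list of data information, then create a `Harmonizer` instance:

   ```python
   url = 'https://www.example.com'  # placeholder address of the page to scrape
   depth = 0                        # scraping depth
   ext = 'csv'                      # file extension to look for
   list_datainfo = extractor.s4h_extract(url=url, depth=depth, ext=ext)
   harmonizer = Harmonizer()
   ```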

For more detailed examples and use cases, please refer to the [socio4health documentation](https://harmonize-tools.github.io/socio4health/).

## Resources

<details>
<summary>
Package Website
</summary>

The [socio4health website](https://harmonize-tools.github.io/socio4health/) includes the **API reference**, **user guide**, and **examples**. The site mainly covers the release version, but you can also find documentation for the latest development version.

</details>
<details>
<summary>
Organisation Website
</summary>

[Harmonize](https://www.harmonize-tools.org/) is an international project that develops cost-effective and reproducible digital tools for stakeholders in Latin America and the Caribbean (LAC) affected by a changing climate. These stakeholders include cities, small islands, highlands, and the Amazon rainforest.

The project consists of resources and [tools](https://harmonize-tools.github.io/) developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru, and Spain.

</details>

## Organizations

<table>
  <tr>
    <td align="center">
      <a href="https://www.bsc.es/" target="_blank">
        <img src="https://imgs.search.brave.com/t_FUOTCQZmDh3ddbVSX1LgHYq4mzCxvVA8U_YHywMTc/rs:fit:500:0:0/g:ce/aHR0cHM6Ly9zb21t/YS5lcy93cC1jb250/ZW50L3VwbG9hZHMv/MjAyMi8wNC9CU0Mt/Ymx1ZS1zbWFsbC5q/cGc" height="64" alt="bsc logo">
      </a>
    </td>
    <td align="center">
      <a href="https://uniandes.edu.co/" target="_blank">
        <img src="https://raw.githubusercontent.com/harmonize-tools/socio4health/refs/heads/main/docs/img/uniandes.png" height="64" alt="uniandes logo">
      </a>
    </td>
  </tr>
</table>


## Authors / Contact information

Contact the authors and contributors listed below with questions or feedback.
<br>
<br>
<a href="https://github.com/dirreno">
  <img src="https://avatars.githubusercontent.com/u/39099417?v=4" style="width: 50px; height: auto;" />
</a>
<span style="display: flex; align-items: center; margin-left: 10px;">
  <strong>Diego Irreño</strong> (developer)
</span>
<br>
<a href="https://github.com/Ersebreck">
  <img src="https://avatars.githubusercontent.com/u/81669194?v=4" style="width: 50px; height: auto;" />
</a>
<span style="display: flex; align-items: center; margin-left: 10px;">
  <strong>Erick Lozano</strong> (developer)
</span>
<br>
<a href="https://github.com/Juanmontenegro99">
  <img src="https://avatars.githubusercontent.com/u/60274234?v=4" style="width: 50px; height: auto;" />
</a>
<span style="display: flex; align-items: center; margin-left: 10px;">
  <strong>Juan Montenegro</strong> (developer)
</span>
<br>
<a href="https://github.com/ingridvmoras">
  <img src="https://avatars.githubusercontent.com/u/91691844?s=400&u=945efa0d09fcc25d1e592d2a9fddb984fdc6ceea&v=4" style="width: 50px; height: auto;" />
</a>
<span style="display: flex; align-items: center; margin-left: 10px;">
  <strong>Ingrid Mora</strong> (documentation)
</span>
