Metadata-Version: 2.4
Name: rdetoolkit
Version: 1.6.2
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Rust
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Classifier: Typing :: Typed
Requires-Dist: chardet>=5.2.0
Requires-Dist: charset-normalizer>=3.2.0
Requires-Dist: openpyxl>=3.1.2
Requires-Dist: pandas>=2.2.3
Requires-Dist: build>=1.0.3
Requires-Dist: typer>=0.9.0
Requires-Dist: toml>=0.10.2
Requires-Dist: pydantic>=2.8.3
Requires-Dist: jsonschema>=4.21.1
Requires-Dist: tomlkit>=0.12.4
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: eval-type-backport>=0.2.0
Requires-Dist: typing-extensions>=4.12.2 ; python_full_version < '3.11'
Requires-Dist: numpy>=1.26.4
Requires-Dist: polars>=1.9.0
Requires-Dist: pyarrow>=19.0.0
Requires-Dist: pip>=24.3.1
Requires-Dist: rpds-py>=0.26
Requires-Dist: markdown>=3.7
Requires-Dist: matplotlib>=3.9.4
Requires-Dist: minio>=7.2.15 ; extra == 'minio'
Requires-Dist: plotly>=6.3.1 ; extra == 'plotly'
Provides-Extra: minio
Provides-Extra: plotly
License-File: LICENSE
Summary: A module that supports the workflow of the RDE dataset construction program
Keywords: rdetoolkit,RDE,toolkit,structure,dataset
Author-email: NIMS MDPF <rde@nims.go>
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Bug Tracker, https://github.com/nims-dpfc/rdetoolkit
Project-URL: Homepage, https://github.com/nims-dpfc/rdetoolkit

![GitHub Release](https://img.shields.io/github/v/release/nims-mdpf/rdetoolkit)
[![python.org](https://img.shields.io/badge/Python-3.10%7C3.11%7C3.12%7C3.13%7C3.14-%233776AB?logo=python)](https://www.python.org/downloads/)
[![MIT License](https://img.shields.io/badge/license-MIT-green)](https://github.com/nims-mdpf/rdetoolkit/blob/main/LICENSE)
[![Issue](https://img.shields.io/badge/issue_tracking-github-orange)](https://github.com/nims-mdpf/rdetoolkit/issues)
![workflow](https://github.com/nims-mdpf/rdetoolkit/actions/workflows/main.yml/badge.svg)
![coverage](docs/img/coverage.svg)

> [日本語ドキュメント](docs/README_ja.md)

# RDEToolKit

RDEToolKit is a foundational Python package for building workflows of RDE-structured programs.
Its modules let you easily build processes that register research and experimental data in RDE.
By combining RDEToolKit with the Python modules you already use in your research or experiments, you can handle a wide range of tasks, from data registration to processing and visualization.

## Documents

See the [documentation](https://nims-mdpf.github.io/rdetoolkit/) for more details.

## Contributing

If you wish to make changes, please read the following document first:

- [CONTRIBUTING.md](https://github.com/nims-mdpf/rdetoolkit/blob/main/CONTRIBUTING.md)

## Requirements

- **Python**: 3.10 or higher

> **Note: Python 3.9 support removed.** Python 3.9 support was dropped in rdetoolkit 1.6.x. If you need Python 3.9, use rdetoolkit 1.5.x or earlier.

## Install

To install, run the following command:

```shell
pip install rdetoolkit
```

## Usage

Below is an example of building an RDE-structured program.

### Create a Project

First, prepare the necessary files for the RDE-structured program. Run the following command in your terminal or shell:

```shell
python3 -m rdetoolkit init
```

If the command runs successfully, the following files and directories will be generated.

In this example, development proceeds within a directory named `container`.

- **requirements.txt**
  - List any additional Python packages your structured program needs, then install them with `pip install -r requirements.txt`.
- **modules**
  - Store programs you want to use for structuring processing here. Details are explained in a later section.
- **main.py**
  - Defines the entry point for the structured program.
- **data/inputdata**
  - Place data files to be processed here.
- **data/invoice**
  - Holds `invoice.json`. The file must exist for local execution, even if it is empty.
- **data/tasksupport**
  - Place supporting files for structuring processing here.

```shell
container
├── data
│   ├── inputdata
│   ├── invoice
│   │   └── invoice.json
│   └── tasksupport
│       ├── invoice.schema.json
│       └── metadata-def.json
├── main.py
├── modules
└── requirements.txt
```
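To confirm that the scaffold is complete before you start, a small check like the following can help. It is a minimal sketch, not part of rdetoolkit: `missing_scaffold_paths()` and `EXPECTED_PATHS` are hypothetical names, and the path list simply mirrors the tree above.

```python
from pathlib import Path

# Hypothetical list mirroring the tree generated by `rdetoolkit init` above.
EXPECTED_PATHS = [
    "requirements.txt",
    "main.py",
    "modules",
    "data/inputdata",
    "data/invoice/invoice.json",
    "data/tasksupport/invoice.schema.json",
    "data/tasksupport/metadata-def.json",
]


def missing_scaffold_paths(container: Path) -> list[str]:
    """Return the expected scaffold entries that do not exist under `container`."""
    return [rel for rel in EXPECTED_PATHS if not (container / rel).exists()]
```

For example, `missing_scaffold_paths(Path("container"))` returns an empty list when every entry is in place.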

### Implementing Structuring Processing

You can process input data (e.g., data transformation, visualization, creation of CSV files for machine learning) and register the results into RDE. By following the format below, you can incorporate your own processing into the RDE structured workflow.

The recommended signature for the `dataset()` function accepts a single
`RdeDatasetPaths` argument that bundles both input and output locations. The
legacy two-argument style (`RdeInputDirPaths`, `RdeOutputResourcePath`) remains
available for backward compatibility.

```python
from rdetoolkit.models.rde2types import RdeDatasetPaths

def dataset(paths: RdeDatasetPaths) -> None:
    ...
```

In this example, we define a dummy function `display_message()` under `modules` to demonstrate how to implement custom structuring processing. Create a file named `modules/modules.py` as follows:

```python
# modules/modules.py
from rdetoolkit.models.rde2types import RdeDatasetPaths


def display_message(path):
    print(f"Test Message!: {path}")


def dataset(paths: RdeDatasetPaths) -> None:
    display_message(paths.inputdata)
    display_message(paths.struct)
```
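The dummy example above only prints paths. As a slightly more realistic sketch, the `dataset()` below writes a CSV listing each input file and its size into the structured-output directory. It assumes `paths.inputdata` and `paths.struct` are `pathlib.Path` directories, as used in the module above. To stay runnable without rdetoolkit, it is typed against a hypothetical `DatasetPathsLike` protocol (in a real project you would annotate with `RdeDatasetPaths`), and the output name `file_list.csv` is illustrative.

```python
import csv
from pathlib import Path
from typing import Protocol


class DatasetPathsLike(Protocol):
    """Hypothetical stand-in for the two RdeDatasetPaths attributes used here."""

    inputdata: Path
    struct: Path


def dataset(paths: DatasetPathsLike) -> None:
    """Write the name and size of every input file to a CSV in the struct directory."""
    rows = [
        (f.name, f.stat().st_size)
        for f in sorted(paths.inputdata.iterdir())
        if f.is_file()
    ]
    paths.struct.mkdir(parents=True, exist_ok=True)
    out = paths.struct / "file_list.csv"  # hypothetical output file name
    with out.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filename", "size_bytes"])
        writer.writerows(rows)
```

Because the function only reads two attributes, you can try it locally by passing `types.SimpleNamespace(inputdata=..., struct=...)` before wiring it into the workflow.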

### About the Entry Point

Next, use `rdetoolkit.workflows.run()` to define the entry point. The entry point performs the following tasks:

- Checking input files
- Obtaining various directory paths as specified by RDE structure
- Executing user-defined structuring processing

```python
import rdetoolkit
from modules.modules import dataset  # User-defined structuring processing function

# Pass the user-defined structuring processing function as an argument
rdetoolkit.workflows.run(custom_dataset_function=dataset)
```

If you do not need a custom structuring processing function, call `run()` with no arguments:

```python
import rdetoolkit

rdetoolkit.workflows.run()
```

### Running in a Local Environment

To debug or test the RDE structuring process locally, add the necessary input data to the `data` directory. The workflow runs as long as the `data` directory sits at the same level as `main.py`, as in the layout below:

```shell
container/
├── main.py
├── requirements.txt
├── modules/
│   └── modules.py
└── data/
    ├── inputdata/
    │   └── <experimental data to process>
    ├── invoice/
    │   └── invoice.json
    └── tasksupport/
        ├── metadata-def.json
        └── invoice.schema.json
```
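Staging input data can also be scripted. The sketch below uses a hypothetical helper, `stage_input()`, that copies one file into `data/inputdata` and refuses to proceed if `data/invoice/invoice.json` is missing, since that file is required for local execution.

```python
import shutil
from pathlib import Path


def stage_input(container: Path, source: Path) -> Path:
    """Copy one experimental data file into container/data/inputdata (hypothetical helper)."""
    invoice = container / "data" / "invoice" / "invoice.json"
    if not invoice.exists():
        raise FileNotFoundError(f"{invoice} is required for local execution")

    inputdata = container / "data" / "inputdata"
    inputdata.mkdir(parents=True, exist_ok=True)
    target = inputdata / source.name
    shutil.copy2(source, target)  # copies contents and preserves timestamps
    return target
```

After staging, run `python main.py` from the `container` directory to execute the workflow.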

### Validating RDE Files

RDEToolKit provides validation commands to verify the structure and correctness of your RDE project files. These commands help catch configuration errors early and can be integrated into CI/CD pipelines.

#### Validate Invoice Schema

```bash
rdetoolkit validate invoice-schema data/tasksupport/invoice.schema.json
```

#### Validate Invoice Data

```bash
rdetoolkit validate invoice data/invoice/invoice.json \
  --schema data/tasksupport/invoice.schema.json
```

#### Validate Metadata Definition

```bash
rdetoolkit validate metadata-def data/tasksupport/metadata-def.json
```

#### Validate Metadata Data

```bash
rdetoolkit validate metadata data/metadata.json \
  --schema data/tasksupport/metadata-def.json
```

#### Batch Validation

Validate all standard files in your project at once:

```bash
# Validate all files in current directory
rdetoolkit validate all

# Validate all files in specific project
rdetoolkit validate all /path/to/project

# Use JSON output for CI/CD integration
rdetoolkit validate all --format json
```
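In a CI job you might wrap the batch command in a small Python script. This is a sketch under one assumption: that `rdetoolkit validate all` exits with a non-zero status when validation fails. `run_validation()` and `VALIDATE_ALL` are hypothetical names, and the JSON output is passed through unparsed because its schema is not described here.

```python
import subprocess
import sys

# Command from this section. Assumption: a failed validation yields a non-zero exit code.
VALIDATE_ALL = ["rdetoolkit", "validate", "all", "--format", "json"]


def run_validation(cmd: list[str]) -> int:
    """Run a validation command, echo its output, and return its exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode
```

A CI step could then call `sys.exit(run_validation(VALIDATE_ALL))` so that the job fails whenever the command does.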

For more details, see the [validation documentation](https://nims-mdpf.github.io/rdetoolkit/rdetoolkit/cmd/validate/).

