Metadata-Version: 2.4
Name: conf-spl2-converter
Version: 0.8.0
Summary: A CLI tool for converting Splunk .conf configurations to SPL2
Project-URL: Homepage, https://github.com/splunk/conf-spl2-converter
Project-URL: Repository, https://github.com/splunk/conf-spl2-converter
Author-email: Splunk <mgazda@cisco.com>
License: Splunk Proprietary
License-File: LICENSE
Keywords: conf,converter,pipeline,spl2,splunk
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: Other/Proprietary License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Requires-Python: >=3.9
Requires-Dist: addonfactory-splunk-conf-parser-lib>=0.4
Requires-Dist: antlr4-python3-runtime>=4.13
Requires-Dist: deepdiff>=8.5
Requires-Dist: openai>=1.93
Requires-Dist: python-dotenv>=1.1
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.32
Requires-Dist: splunk-sdk>=2.1
Requires-Dist: typer>=0.12.0
Requires-Dist: xmltodict>=0.14
Provides-Extra: testing
Requires-Dist: defusedxml>=0.7; extra == 'testing'
Requires-Dist: requests>=2.31; extra == 'testing'
Requires-Dist: splunk-sdk>=2.0; extra == 'testing'
Requires-Dist: urllib3>=2.0; extra == 'testing'
Description-Content-Type: text/markdown

# conf-spl2-converter

A CLI tool for converting Splunk `.conf` configurations (props.conf / transforms.conf) to SPL2 pipeline templates, and generating expected test outputs from Splunk field extractions or CIM field annotations.

> **Alpha** — this project is under active development. APIs and output format may change.

## Installation

Requires Python 3.9+.

```bash
pip install conf-spl2-converter
```

To use the test generation pipeline (`generate-expected`), install with the testing extra:

```bash
pip install conf-spl2-converter[testing]
```

## Quick start

```bash
# 1. Generate SPL2 pipeline files from a TA
conf-spl2-converter generate /path/to/ta

# 2a. Generate expected test outputs from CIM fields (no Docker needed)
conf-spl2-converter generate-expected-cim /path/to/ta

# 2b. Or generate expected test outputs via Splunk (requires Docker)
conf-spl2-converter generate-expected /path/to/ta
```

## Commands

### `generate` — Create SPL2 pipeline templates

Reads the TA's `props.conf` and `transforms.conf` and generates SPL2 pipeline files.

```bash
# Auto-discover all sourcetypes from props.conf (no config file needed)
conf-spl2-converter generate /path/to/ta

# Use a config file to control which sourcetypes are processed and how
conf-spl2-converter generate /path/to/ta -c field_extraction_config.json

# Write output to a custom directory
conf-spl2-converter generate /path/to/ta -o /tmp/my-output

# Export parsed template data as JSON (useful for debugging / integration)
conf-spl2-converter generate /path/to/ta -o /tmp/my-output -f json

# Combine all options with verbose logging
conf-spl2-converter generate /path/to/ta -c config.json -o ./out -f spl2 -v
```


### Config file

When `--config` / `-c` is **not** provided, the tool looks for `field_extraction_config.json` inside the TA directory. If found, it is used automatically. If not found, sourcetypes are auto-discovered from `props.conf`.

When `--config` / `-c` **is** provided, the specified config file is used instead (overrides the default lookup in the TA directory).

The config file controls which sourcetypes are processed along with extra settings like lookups, fields to trim, kv_mode overrides, etc.
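
The resolution order can be sketched as a small helper (an illustration of the documented behaviour, not the tool's actual code):

```python
from pathlib import Path
from typing import Optional

def resolve_config(ta_path: str, explicit: Optional[str] = None) -> Optional[Path]:
    """Sketch of the documented lookup order: an explicit -c path wins,
    then field_extraction_config.json inside the TA directory; None means
    sourcetypes will be auto-discovered from props.conf."""
    if explicit:
        return Path(explicit)
    candidate = Path(ta_path) / "field_extraction_config.json"
    return candidate if candidate.is_file() else None
```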

**Single config for all commands** — The same `field_extraction_config.json` is used by `generate`, `knowledge-build`, and the `generate-expected` family. Use one config file and the same TA path; all commands share the same resolution order (explicit path, then TA directory, then auto-discovery) and the same keys (sourcetype names, `source`, `slug`, etc.). Example:

```bash
CONFIG=path/to/field_extraction_config.json
TA_PATH=/path/to/Splunk_TA_windows

conf-spl2-converter generate    $TA_PATH -c $CONFIG -o out/gen
conf-spl2-converter knowledge-build $TA_PATH -c $CONFIG -o out/kb -k ta
```

### Config file format (`field_extraction_config.json`)

The config file is a **JSON object** whose **top-level keys are sourcetype names** (as in `props.conf`). Each value is an object that can contain the following keys. All keys are optional unless noted.

| Key | Type | Used by | Description |
|-----|------|---------|-------------|
| `slug` | string | both | Filesystem-safe identifier for output paths (e.g. `pan_firewall`). If omitted, derived from the sourcetype. |
| `addon_name` | string | converter | Add-on identifier (e.g. `splunk-add-on-for-palo-alto-networks`). |
| `version` | string | converter | Add-on version (metadata in generated pipeline). |
| `label` | string | converter | Add-on label (e.g. `Splunk_TA_paloalto_networks`). |
| `human_readable_name` | string | converter | Human-readable add-on name (metadata). |
| `splunk_base_url` | string | converter | URL to Splunkbase or docs (metadata). |
| `template_version` | string | converter | Template format version (metadata). |
| `sample_files` | array of strings | KB | Sample file names for the sourcetype (Knowledge Builder). |
| `lb_rule` | string | KB | Line-breaking rule (e.g. `"\n"`) for the sourcetype (Knowledge Builder). |
| `source` | array of strings | both | List of source names (e.g. `["WinEventLog:Security", "WinEventLog:Application"]`). When set, the converter/KB generates one pipeline branch per source. Values are the part after `source::` in props.conf stanza names. |
| `sub_sourcetypes` | array of strings | both | List of sub-sourcetype stanzas to process under this sourcetype (e.g. `["pan:threat", "pan:traffic"]`). Generates one pipeline branch per sub-sourcetype. |
| `system_default_extractions` | array of strings | converter | List of **default** stanza names to use for extractions (converter only). |
| `lookups_with_empty_values` | array of strings | converter | Lookup table names that accept empty values (converter). |
| `fields_to_trim` | array of strings | converter | Field names to trim (whitespace) in the pipeline. |
| `fields_to_trim_newlines` | array of strings | converter | Field names to trim newlines from. |
| `fields_to_trim_quotes` | array of strings | converter | Field names to trim surrounding quotes from. |
| `remove_duplicate_fields_case_insensitive` | array of strings | converter | Field names to deduplicate case-insensitively. |
| `convert_string_to_array` | array of strings | converter | Field names to convert from string to array. |
| `kv_mode` | string | both | Override KV mode: `auto`, `none`, `json`, `xml`, or another value supported by Splunk's `KV_MODE` setting. Defaults to `auto` when omitted. |
| `template_name` | string | converter | Override `@template` annotation name. Default: derived from add-on name and sourcetype. |
| `template_description` | string | converter | Override `@template` annotation description. |
| `template_runtime` | array of strings | converter | Override `@template` runtime list. Default: `["ingestProcessor", "edgeProcessor"]`. |
| `template_sourcetype` | object | converter | Override `@template` sourcetype matching (`field`, `operator`, `values`). Supports `EQUAL` and `MATCH` operators. |
| `template_events` | array of objects | converter | Sample events for `@template` annotation (each with `host`, `sourcetype`, `source`, `_raw`). |

**Minimal example** (one sourcetype, no sources or sub-sourcetypes):

```json
{
  "mysourcetype": {
    "slug": "mysourcetype",
    "addon_name": "my-addon"
  }
}
```

**Example with sources and options** (converter + Knowledge Builder):

```json
{
  "pan:firewall": {
    "addon_name": "splunk-add-on-for-palo-alto-networks",
    "slug": "pan_firewall",
    "version": "3.0.0",
    "label": "Splunk_TA_paloalto_networks",
    "human_readable_name": "Splunk Add-on for Palo Alto Networks",
    "sample_files": ["pan_firewall.samples"],
    "lb_rule": "\n",
    "sub_sourcetypes": ["pan:threat", "pan:traffic", "pan:system"],
    "fields_to_trim": ["threat_name", "signature"],
    "remove_duplicate_fields_case_insensitive": ["action", "rule"],
    "convert_string_to_array": ["flags"]
  }
}
```
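
Because the config is plain JSON keyed by sourcetype, it is easy to inspect programmatically; a minimal sketch (the inline string mirrors the example above; a real file would use `json.load`):

```python
import json

# Inline copy of a config in the shape shown above:
# top-level keys are sourcetype names.
raw = """
{
  "pan:firewall": {
    "slug": "pan_firewall",
    "sub_sourcetypes": ["pan:threat", "pan:traffic", "pan:system"]
  }
}
"""
config = json.loads(raw)

# Summarize: sourcetype -> (slug, number of sub-sourcetype branches)
summary = {
    sourcetype: (settings.get("slug", ""), len(settings.get("sub_sourcetypes", [])))
    for sourcetype, settings in config.items()
}
```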

When **no config file** is provided (and none is found in the TA), the tool **auto-discovers** all sourcetypes from `props.conf` and uses default behaviour: no `source`/`sub_sourcetypes`, no field trimming or lookups, and `slug` is derived from the sourcetype name.

### `generate` options

| Flag | Short | Description |
|------|-------|-------------|
| `--config` | `-c` | Path to a `field_extraction_config.json`. When omitted, looks for it in the TA directory; falls back to auto-discovery from `props.conf`. |
| `--output` | `-o` | Output directory for generated files. Defaults to `<ta_path>/default/data/spl2/`. |
| `--format` | `-f` | Output format: `spl2` (default) or `json`. |
| `--no-annotation` | | Disable `@template` annotation generation in SPL2 output. |
| `--verbose` | `-v` | Enable debug logging. |

#### Output formats

- **`spl2`** (default) — renders `.spl2` pipeline files ready for use in Splunk.
- **`json`** — writes a structured JSON file per sourcetype containing the parsed template data (extractions, evals, lookups, etc.).

#### `@template` annotation

By default, every generated SPL2 pipeline includes a `@template` annotation just before the `$pipeline` statement. This annotation provides metadata for Data Orchestrator, including the template name, sourcetype matching rules, and sample events.

Example output:

```
@template("Palo Alto Networks: Firewall events field extractions", sourcetype: {field: "sourcetype", operator: "MATCH", values: ["/(pan_log|pan:[^:]+)(?!(?::|_)cloud)/i"]}, events: [{...}]);
```

The annotation fields are derived automatically from the TA metadata (add-on name, sourcetype) but can be overridden in `field_extraction_config.json` using the template config keys below.

To disable annotation generation entirely, use `--no-annotation`:

```bash
conf-spl2-converter generate /path/to/ta --no-annotation
```

#### Template config keys

These optional keys in `field_extraction_config.json` control the `@template` annotation content:

| Key | Type | Description |
|-----|------|-------------|
| `template_name` | string | Override the annotation name. Default: `"<addon_name>: <sourcetype> field extractions"`. |
| `template_description` | string | Override the description. Default: auto-generated from the pipeline context. |
| `template_runtime` | array of strings | Override the runtime list. Default: `["ingestProcessor", "edgeProcessor"]`. |
| `template_sourcetype` | object | Override sourcetype matching. Object with `field`, `operator` (`EQUAL` or `MATCH`), and `values`. Default: `EQUAL` with the sourcetype name(s). |
| `template_events` | array of objects | Sample events to embed. Each object should contain `host`, `sourcetype`, `source`, and `_raw`. |

Example with `MATCH` operator and sample events:

```json
{
  "pan:firewall": {
    "slug": "pan_firewall",
    "template_name": "Palo Alto Networks: Firewall events field extractions",
    "template_sourcetype": {
      "field": "sourcetype",
      "operator": "MATCH",
      "values": ["/(pan_log|pan:[^:]+)(?!(?::|_)cloud)/i"]
    },
    "template_events": [
      {
        "host": "so1",
        "sourcetype": "pan:traffic",
        "source": "pan:traffic",
        "_raw": "May 14 12:03:13 gateway ..."
      }
    ]
  }
}
```

### `generate-expected` — Generate expected test outputs

Runs the full test generation pipeline in a single command. Requires Docker.

The pipeline:

1. Starts a Splunk Docker container with the TA installed.
2. Collects test samples from the TA's `tests/knowledge/samples/` directory (XML/log files).
3. Sends each sample event to Splunk via HEC.
4. Retrieves Splunk's extracted fields via the Splunk SDK.
5. Generates `module.test.json` files containing the expected field extractions.
6. Stops the Splunk container (unless `--keep-running` is used).
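
Step 3 uses Splunk's standard HTTP Event Collector API; each sample is wrapped in a payload roughly like the following (a sketch with illustrative values, not the tool's exact code; the index matches the `SPL2_TF_SPLUNK_INSTANCE_INDEX` default documented below):

```python
import json

# One HEC event payload, as sent in step 3 of the pipeline (values illustrative)
payload = {
    "event": "May 14 12:03:13 gateway ...",
    "sourcetype": "pan:traffic",
    "index": "cov_test",
}
body = json.dumps(payload)

# The pipeline then POSTs this to Splunk's HEC endpoint, roughly:
#   requests.post(
#       "https://127.0.0.1:8088/services/collector/event",
#       headers={"Authorization": "Splunk <HEC_TOKEN>"},
#       data=body, verify=False,
#   )
```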

```bash
# Basic usage — starts Docker, runs pipeline, stops Docker
conf-spl2-converter generate-expected /path/to/ta

# With a config file and custom output directory
conf-spl2-converter generate-expected /path/to/ta -c config.json -o ./out

# Skip Docker management (assumes Splunk is already running on localhost)
conf-spl2-converter generate-expected /path/to/ta --skip-docker

# Leave the Splunk container running after completion (useful for iterating)
conf-spl2-converter generate-expected /path/to/ta --keep-running

# Verbose logging
conf-spl2-converter generate-expected /path/to/ta -v
```

#### Options

| Flag | Short | Description |
|------|-------|-------------|
| `--config` | `-c` | Path to a `field_extraction_config.json`. When omitted, looks for it in the TA directory; falls back to auto-discovery from `props.conf`. |
| `--output` | `-o` | Output directory for generated files. Defaults to `<ta_path>/default/data/spl2/`. |
| `--skip-docker` | | Skip Docker container management; assume Splunk is already running. |
| `--keep-running` | | Leave the Splunk container running after completion. |
| `--verbose` | `-v` | Enable debug logging. |

#### Generated files

For each sourcetype, the pipeline produces:

```
<output_dir>/<sourcetype_slug>/
    <sourcetype_slug>.samples      # JSONL file with collected sample events
    module.test.json               # Expected field extractions for each sample
```
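
Since the `.samples` file is JSONL, each line parses independently; a sketch of reading it (the per-line field names here are assumptions, not the tool's exact schema):

```python
import json

# Two JSONL lines, inlined for illustration; a real run would iterate
# over the lines of <sourcetype_slug>.samples instead.
samples_jsonl = (
    '{"_raw": "May 14 12:03:13 gateway ...", "sourcetype": "pan:traffic"}\n'
    '{"_raw": "May 14 12:03:14 gateway ...", "sourcetype": "pan:threat"}\n'
)

# One JSON object per non-empty line
events = [json.loads(line) for line in samples_jsonl.splitlines() if line.strip()]
```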

#### Environment variables

Splunk connection settings can be overridden via environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `SPL2_TF_SPLUNK_INSTANCE_IP` | `127.0.0.1` | Splunk host address |
| `SPL2_TF_SPLUNK_INSTANCE_PORT` | `8088` | HEC port |
| `SPL2_TF_SPLUNK_INSTANCE_API_PORT` | `8089` | Splunk management API port |
| `SPL2_TF_SPLUNK_INSTANCE_USERNAME` | `admin` | Splunk admin username |
| `SPL2_TF_SPLUNK_INSTANCE_PASSWORD` | `newPassword` | Splunk admin password |
| `SPL2_TF_SPLUNK_INSTANCE_INDEX` | `cov_test` | Index used for test events |
| `SPL2_TF_SPLUNK_INSTANCE_HEC_TOKEN` | `cc7f4d5e-...` | HEC authentication token |
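
For example, to target an existing Splunk instance instead of the managed Docker container (address and password below are placeholders):

```shell
# Point the test pipeline at an already-running Splunk instance
export SPL2_TF_SPLUNK_INSTANCE_IP=10.0.0.5
export SPL2_TF_SPLUNK_INSTANCE_PORT=8088
export SPL2_TF_SPLUNK_INSTANCE_PASSWORD='example-placeholder'

# then run, e.g.:
#   conf-spl2-converter generate-expected /path/to/ta --skip-docker
```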

### `generate-expected-all` — Splunk + CIM in a single command

Runs both pipelines sequentially: first `generate-expected` (Splunk-based) to populate `expected_destination_result`, then `generate-expected-cim` to add `expected_cim_fields`. The result is a `module.test.json` with both full Splunk extraction results and CIM field expectations.

```bash
conf-spl2-converter generate-expected-all /path/to/ta

# Skip Docker management if Splunk is already running
conf-spl2-converter generate-expected-all /path/to/ta --skip-docker -v
```

Accepts the same options as `generate-expected` (`--config`, `--output`, `--skip-docker`, `--keep-running`, `--verbose`).

### `generate-expected-cim` — Generate expected test outputs from CIM fields

Offline alternative to `generate-expected`. Instead of running events through a Splunk instance, this command reads CIM field annotations already present in the TA's XML sample files and writes them as `expected_cim_fields` in `module.test.json`. No Docker or Splunk required.

Each XML event can contain a `<cim>` element with `<cim_fields>`, `<models>`, and `<missing_recommended_fields>`. This command extracts the CIM field name/value pairs and:

- **If a `module.test.json` already exists** (e.g. from a prior `generate-expected` run), it merges the CIM data into matching test entries (matched by `_raw`) as an `expected_cim_fields` section, preserving the existing `expected_destination_result`.
- **If no prior test exists** for a sample, it creates a new entry with an empty `expected_destination_result` and the `expected_cim_fields` section.
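
The keys `expected_destination_result` and `expected_cim_fields` are as described above; the surrounding structure below is an illustrative assumption of what a merged entry can look like, not the exact file format:

```json
{
  "_raw": "May 14 12:03:13 gateway ...",
  "expected_destination_result": {
    "action": "allowed",
    "rule": "trust-out"
  },
  "expected_cim_fields": {
    "action": "allowed",
    "src": "10.1.1.1"
  }
}
```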

```bash
# Basic usage
conf-spl2-converter generate-expected-cim /path/to/ta

# With a config file and custom output directory
conf-spl2-converter generate-expected-cim /path/to/ta -c config.json -o ./out

# Verbose logging
conf-spl2-converter generate-expected-cim /path/to/ta -v
```

#### Options

| Flag | Short | Description |
|------|-------|-------------|
| `--config` | `-c` | Path to a `field_extraction_config.json`. When omitted, looks for it in the TA directory; falls back to auto-discovery from `props.conf`. |
| `--output` | `-o` | Output directory for generated files. Defaults to `<ta_path>/default/data/spl2/`. |
| `--verbose` | `-v` | Enable debug logging. |

> **Note:** Events without CIM field annotations (`<cim/>` or missing `<cim>`) are skipped.

### Full workflow example

Generate SPL2 pipelines, expected test data, and run tests for a TA:

```bash
# Step 1: Generate SPL2 pipeline templates
conf-spl2-converter generate /path/to/ta

# Step 2: Generate expected test outputs — pick one:
#   Option A: From CIM fields in XML samples (fast, no Docker)
conf-spl2-converter generate-expected-cim /path/to/ta
#   Option B: Via Splunk field extraction (full fidelity, requires Docker)
conf-spl2-converter generate-expected /path/to/ta
#   Option C: Both Splunk + CIM in one command (requires Docker)
conf-spl2-converter generate-expected-all /path/to/ta

# Step 3: Run tests with spl2-testing-framework (see below)
cd <ta>/default/data/spl2
spl2_tests_run cli -v --ignore_additional_fields_in_actual --ignore_empty_strings
```

All commands write to `<ta_path>/default/data/spl2/` by default, producing:

```
<ta>/default/data/spl2/<sourcetype_slug>/
    pipeline_<sourcetype_slug>.spl2    # SPL2 pipeline (from generate)
    <sourcetype_slug>.samples          # Sample events (from generate-expected)
    module.test.json                   # Expected outputs (from generate-expected)
```

## Running tests with spl2-testing-framework

Use [spl2-testing-framework](https://pypi.org/project/spl2-testing-framework/) to verify that the generated SPL2 pipelines produce the expected field extractions.

```bash
pip install spl2-testing-framework

cd <ta>/default/data/spl2
spl2_tests_run cli -v --ignore_additional_fields_in_actual --ignore_empty_strings
```

## Knowledge Builder integration

The **Knowledge Builder** builds knowledge bases from a TA (and optionally security content) and generates SPL2 noise-reduction pipelines. It uses the same input config behaviour as the `generate` command.

### Command

```bash
# TA path required; optional config and output
conf-spl2-converter knowledge-build <ta_path> [-c CONFIG] [-o OUTPUT_DIR] [-k KNOWLEDGE_SOURCE] [-v]

# Examples
conf-spl2-converter knowledge-build /path/to/Splunk_TA_cisco-asa
conf-spl2-converter knowledge-build /path/to/ta -o ./out/kb-cisco -k ta -v
```

| Flag | Short | Description |
|------|-------|-------------|
| `ta_path` | | (Required) Path to the TA package directory (must contain `default/props.conf`). |
| `--config` | `-c` | Path to `field_extraction_config.json`. When omitted, looks for it in the TA directory; if not found, sourcetypes are auto-discovered from `props.conf`. |
| `--output` | `-o` | Output directory for SPL2 templates and knowledge bases. If omitted, uses paths from the package config. |
| `--knowledge-source` | `-k` | `ta` (TA only) or `security_content`. Use `ta` when the security_content repo is not available. |
| `--verbose` | `-v` | Enable verbose/debug logging. |

### Input config (same as generate)

- If **config is provided** (`-c` or `field_extraction_config.json` found in the TA): that config defines which sourcetypes (and optional sources) are processed.
- If **no config is used**: sourcetypes are **auto-discovered** from the TA's `default/props.conf` (all non-`source::` stanzas).

### Output when no config (auto-discovery)

When no config file is used, the Knowledge Builder generates:

1. **One combined template** for all discovered sourcetypes:
   - Path: `<output_dir>/all_sourcetypes/all_sourcetypes_noisereduce.spl2`

2. **One template per sourcetype**, in a directory per sourcetype (slug):
   - Path: `<output_dir>/<slug>/<slug>_noisereduce.spl2`
   - Example: `out/kb-cisco/cisco_asa/cisco_asa_noisereduce.spl2`, `out/kb-cisco/syslog/syslog_noisereduce.spl2`

**Naming convention:** All generated SPL2 templates use the pattern `{sourcetype}_noisereduce.spl2`, where the sourcetype is represented by its **slug** (filesystem-safe, e.g. `cisco:asa` → `cisco_asa`). The combined template uses the slug `all_sourcetypes`.
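
A plausible derivation rule, stated as an assumption (the tool's exact slug logic may differ):

```python
import re

def slugify_sourcetype(sourcetype: str) -> str:
    """One plausible slug derivation: lowercase, with runs of
    non-alphanumerics collapsed to '_' (e.g. 'cisco:asa' -> 'cisco_asa')."""
    return re.sub(r"[^a-z0-9]+", "_", sourcetype.lower()).strip("_")
```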

Knowledge base JSON files and field-regex mappings are written under `<output_dir>/knowledge_bases/`.

Generated SPL2 templates include a **comment header** (metadata table, disclaimer, overview, purpose). Metadata is taken from the config when available (`human_readable_name`, `version`, `splunk_base_url`, `template_version`); missing values are replaced with placeholders (e.g. `[SPLUNKBASE_URL]`) for you to fill in.

### Output when config is provided

When a config file is used, a **single** combined SPL2 file is written to the legacy path (e.g. `<output_dir>/<sourcetype1>_<sourcetype2>_..._noise_reduction.spl2`), and the per-sourcetype directory layout above is not used.

## Development

### Prerequisites

- **Python 3.9+**
- **uv** (recommended) — install with: `pip install uv` or see [uv installation](https://docs.astral.sh/uv/getting-started/installation/)

### Running from the repository

To run the CLI from the repo (without installing the package globally):

1. **One-time setup** — from the repo root, create the environment and install the project:

   ```bash
   uv sync --group dev
   ```

   This creates a `.venv` in the repo and installs the project in editable mode with all dependencies.

2. **Run the CLI** — use `uv run` so the command uses the project’s environment:

   ```bash
   uv run conf-spl2-converter knowledge-build /path/to/Splunk_TA_cisco-asa -o ./out/kb-cisco -k ta -v
   uv run conf-spl2-converter generate /path/to/ta -o ./out
   ```

   Run these from anywhere inside the repository; `uv run` resolves the project by searching upward from the current working directory.

   **Alternative:** Activate the virtual environment and run the script directly:

   ```bash
   source .venv/bin/activate   # macOS/Linux
   # or:  .venv\Scripts\activate  on Windows
   conf-spl2-converter knowledge-build /path/to/ta -o ./out/kb -k ta
   ```

   After activation, `conf-spl2-converter` is on `PATH` because the project is installed in `.venv`.

### Tests, lint, format

```bash
# Run tests
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .

# Install pre-commit hooks
uv run pre-commit install
```

## License

Copyright (C) 2026 Splunk Inc. All Rights Reserved.
See [LICENSE](LICENSE) for details.
