Metadata-Version: 2.4
Name: agentdsl
Version: 0.0.2
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Communications
License-File: license.md
Summary: A token-efficient DSL for Agent-to-Agent and Human-to-Agent communication, optimized for LLM context windows
Keywords: agent,dsl,llm,rpc,communication,ai,language-model
Author-email: Khaled Abbas <halidrauf@users.noreply.github.com>
Requires-Python: >=3.7
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/halidrauf/AgentDSL
Project-URL: Issues, https://github.com/halidrauf/AgentDSL/issues
Project-URL: Repository, https://github.com/halidrauf/AgentDSL

# AgentDSL

[![PyPI](https://img.shields.io/pypi/v/agentdsl.svg)](https://pypi.org/project/agentdsl/)
[![Rust](https://img.shields.io/badge/rust-stable-orange.svg)](https://www.rust-lang.org/)
[![Python](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](license.md)

**AgentDSL** is a specialized, token-efficient format designed for **Agent-to-Agent (A2A)** and **Human-to-Agent** relational data communication. Unlike verbose JSON payloads, AgentDSL uses a concise, delimiter-based Domain Specific Language (DSL) optimized for Large Language Model (LLM) context windows and parsing efficiency.

> [!IMPORTANT]
> **Project Status: Pre-Alpha / Experimental**
> This project is in early development. APIs and the DSL specification are subject to change.

## 🚀 Key Features

- **Token Efficient**: Specialized syntax reduces overhead compared to JSON/XML.
- **Streamable**: Designed to be parsed line-by-line.
- **Type Safe**: Built-in support for standard types (`str`, `int`, `float`, `any`).
- **Relational**: Support for cross-schema references using `@`.

## 🛠 Prerequisites

- **Rust**: `stable` (2021 edition)
- **Python**: `3.10` or higher
- **Maturin**: For building Python bindings

## 📖 Protocol Specification

The format relies on standard markers (e.g., `[SECTION]`) and pipe `|` delimiters.

### 1. Structured Data Exchange

AgentDSL decouples generic data structure from values to save tokens on repetitive keys.

#### `[SCHEMA:LABEL]`

Defines the structure of a data block.

- **LABEL**: A unique name for the schema.
- **Fields**: Format `FIELD_NAME:type`.
- **ID**: Each schema MUST define an `ID:int` as the first column.
- **References**: Use `@SCHEMA.FIELD:type` (e.g., `@INDIVIDUAL.ID:int`) to reference other blocks.

```text
[SCHEMA:INDIVIDUAL]
ID:int|NAME:str|AGE:int|GENDER:str|HOBBY:str
```

#### `[DATA]`

Contains the actual records. Each row MUST start with an ID, followed by values corresponding to the Schema.

> [!TIP]
> **Multi-Value Support**: Use a comma `,` to separate multiple values within a single column (e.g., `Coding,Reading`).

> [!NOTE]
> **Empty Columns**: Use an underscore `_` to represent an empty column to maintain structural integrity while keeping the row token-efficient.

```text
[DATA]
1|Sion|25|MALE|Coding,Reading
2|Alice|30|FEMALE|Reading
3|Bob|35|_|_
```
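
The pairing of a schema line with its data rows can be sketched in plain Python. This is only an illustration of the rules above, not the library's parser: the helper names `parse_schema_line`, `convert`, and `parse_rows` are hypothetical, and treating a comma-containing `str` cell as a list is an assumption about how multi-values are meant to be read.

```python
from typing import List, Tuple, Union

Value = Union[int, float, str, List[str], None]


def parse_schema_line(line: str) -> List[Tuple[str, str]]:
    """Split 'ID:int|NAME:str' into [('ID', 'int'), ('NAME', 'str')]."""
    fields = []
    for column in line.split("|"):
        # rpartition keeps reference names such as '@INDIVIDUAL.ID' intact.
        name, _, type_name = column.rpartition(":")
        fields.append((name, type_name))
    return fields


def convert(raw: str, type_name: str) -> Value:
    """Turn one cell into a Python value (hypothetical conversion rules)."""
    if raw == "_":  # underscore marks an empty column
        return None
    if type_name == "int":
        return int(raw)
    if type_name == "float":
        return float(raw)
    if "," in raw:  # assumption: commas separate multiple values
        return raw.split(",")
    return raw


def parse_rows(schema_line: str, data_lines: List[str]) -> List[dict]:
    """Zip each pipe-delimited row with the schema's field names."""
    fields = parse_schema_line(schema_line)
    records = []
    for line in data_lines:
        cells = line.split("|")
        if len(cells) != len(fields):
            raise ValueError(f"expected {len(fields)} columns, got {len(cells)}: {line!r}")
        records.append({name: convert(cell, t) for (name, t), cell in zip(fields, cells)})
    return records


records = parse_rows(
    "ID:int|NAME:str|AGE:int|GENDER:str|HOBBY:str",
    ["1|Sion|25|MALE|Coding,Reading", "2|Alice|30|FEMALE|Reading", "3|Bob|35|_|_"],
)
print(records[0])
```

Because field names appear once in the schema line, each additional row costs only its values and delimiters.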

### 2. Capability Discovery

Agents broadcast available remote procedures using the `[FUNCTIONS]` block.

#### `[FUNCTIONS]`

Defines available functions with descriptions and typed parameters.

- **[FunctionName]**: Each function name is enclosed in brackets.
- **DESC**: Human-readable description.
- **IN**: Input parameters (`*` indicates required).
- **OUT**: Return values.

```text
[FUNCTIONS]
[GetWeather]
DESC[Get weather for a city]
IN[CITY:str*|FORECAST_DAYS:int]
OUT[TEMPERATURE:str|HUMIDITY:str|WIND_SPEED:str]
```

### 3. Remote Procedure Calls (RPC)

#### `[CALL]`

Invokes a function defined in a `[FUNCTIONS]` block.
Format: `[FunctionName]` followed by `FIELD:VALUE` pairs.

```text
[CALL]
[GetWeather]
CITY:New York
FORECAST_DAYS:3
```

### 4. Results & Errors

#### `[RESULT]`

Contains structured outputs, status, and error messages.
Headers are typed: `[FIELD:type|...]`. Status and Error blocks provide execution feedback.

```text
[RESULT]
[TEMPERATURE:str|HUMIDITY:str|WIND_SPEED:str]
30C|70%|10mph
[STATUS:str]
SUCCESS
[ERROR:str]
```

In case of failure:

```text
[RESULT]
...
[STATUS:str]
ERROR
[ERROR:str]
Invalid city name
```
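
A consumer can split a `[RESULT]` block into rows, status, and error by tracking which typed header it last saw. The following is a minimal sketch in plain Python, assuming the layout shown above; `parse_result` is a hypothetical helper, not part of the `agentdsl` package, and it keeps every cell as a string.

```python
def parse_result(text: str) -> dict:
    """Split a [RESULT] block into field names, rows, status and error."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or lines[0] != "[RESULT]":
        raise ValueError("expected a [RESULT] block")

    result = {"fields": [], "rows": [], "status": None, "error": ""}
    section = None
    for line in lines[1:]:
        if line.startswith("[") and line.endswith("]"):
            header = line[1:-1]
            if header == "STATUS:str":
                section = "status"
            elif header == "ERROR:str":
                section = "error"
            else:
                # Any other typed header introduces the output columns.
                section = "rows"
                result["fields"] = [col.rpartition(":")[0] for col in header.split("|")]
        elif section == "rows":
            result["rows"].append(dict(zip(result["fields"], line.split("|"))))
        elif section == "status":
            result["status"] = line
        elif section == "error":
            result["error"] = line
    return result


ok = parse_result("""
[RESULT]
[TEMPERATURE:str|HUMIDITY:str|WIND_SPEED:str]
30C|70%|10mph
[STATUS:str]
SUCCESS
[ERROR:str]
""")

failed = parse_result("""
[RESULT]
[STATUS:str]
ERROR
[ERROR:str]
Invalid city name
""")

print(ok["status"], ok["rows"])
print(failed["status"], failed["error"])
```

The sketch would misread a data row that itself starts with `[` and ends with `]`; a real parser needs a stricter header grammar.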

For nested data structures, AgentDSL generates a sequence of blocks to maintain token efficiency and relational integrity:

```text
[SCHEMA:Details]
ID:int|HUMIDITY:str|UV_INDEX:int

[DATA]
1|60%|5

[RESULT]
[CITY:str|DETAILS:@Details.ID]
Paris|1
[STATUS:str]
SUCCESS
[ERROR:str]
```

## ⚡ Examples & Resources

For the formal language specification, grammar, and detailed semantics, refer to [DSL_SPECIFICATION.md](DSL_SPECIFICATION.md).

## 📦 Rust Library

AgentDSL provides a Rust library for parsing and serializing AgentDSL messages.

### Installation

Add this to your `Cargo.toml`:

```toml
[dependencies]
agentdsl = { path = "." } # Or git/crates.io version
```

### Usage (Rust)

```rust
use agentdsl::{parse, serialize, AgentMessage};

fn main() {
    let input = r#"
[DATA]
1|Sion|25
"#;

    // Parsing
    let message = parse(input.to_string()).unwrap();

    // Serialization
    let dsl_string = serialize(message);
    println!("DSL: {}", dsl_string);
}
```

## 🐍 Python Integration

AgentDSL is available as a Python package.

### Installation

```bash
pip install agentdsl
```

### Usage (Python)

```python
import agentdsl

# --- DSL to JSON (Parsing) ---
ast = agentdsl.parse("[DATA]\n1|Sion|25")
print(ast)

# --- JSON to DSL (Serialization) ---
dsl = agentdsl.serialize(ast)
print(dsl)

# --- Tool Integration ---
@agentdsl.tool
def get_weather(city: str, forecast_days: int = 1) -> str:
    """Get weather for a city"""
    return f"Weather in {city} for {forecast_days} days: Sunny"

# Access generated AgentDSL definition
print(get_weather.__agentdsl__)

# Format execution results (Relational Data Support)
data = get_weather("New York", 3)
result_dsl = agentdsl.serialize_result(get_weather, data)
print(result_dsl)

# --- NEW: Native JSON Support ---
# DSL -> Natural JSON (Resolves relational references)
json_obj = agentdsl.to_json(result_dsl)
print(json_obj)

# Natural JSON -> DSL (Infers schemas & extracts relations)
new_dsl = agentdsl.from_json(json_obj)
print(new_dsl)
```

## 🔗 C-FFI (Other Languages)

For integration with other languages, AgentDSL provides C-compatible bindings in `src/ffi.rs`.

- `agentdsl_parse`: Convert DSL string to JSON AST string.
- `agentdsl_serialize`: Convert JSON AST string to AgentDSL string.
- `agentdsl_to_json`: Convert DSL string to Native JSON string.
- `agentdsl_from_json`: Convert Native JSON string to AgentDSL string.
- `agentdsl_free_string`: Free memory allocated by the library.

You can generate headers using `cbindgen`.

---

### 5. Native JSON Conversion

AgentDSL provides "Native JSON" support to bridge the gap between token-efficient DSL and developer-friendly keyed objects. This handles **relational resolution** (re-hydrating `@` references into nested objects).

#### DSL to Native JSON

Converts a sequence of DSL blocks into a nested JSON structure. Relational references are automatically resolved into nested objects when the referenced data is present.

```python
# Returns a dictionary with re-hydrated nested objects
native_json = agentdsl.to_json(dsl_string)
```

#### Native JSON to DSL

Infers schemas from JSON keys and extracts nested objects into separate schemas with `@` references to maintain token efficiency.

```python
# Returns a compact DSL string
dsl_string = agentdsl.from_json(native_json_obj)
```
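
To make the idea of relational extraction concrete, here is a self-contained sketch of how a nested object could be flattened into schema and data blocks. It is not the algorithm used by `agentdsl.from_json`; the functions `decompose`, `column_type`, and `render`, the upper-casing of keys, and the sequential ID assignment are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def decompose(obj: Dict[str, Any], label: str, tables: Dict[str, List[dict]]) -> int:
    """Store obj as a flat row in tables[label] and return its assigned ID.

    Nested dicts move into their own table (named after the key) and are
    replaced by a reference column that holds the child's ID.
    """
    row: Dict[str, Any] = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            child_label = key.upper()
            row[f"@{child_label}.ID"] = decompose(value, child_label, tables)
        else:
            row[key.upper()] = value
    rows = tables.setdefault(label, [])
    row_id = len(rows) + 1
    rows.append({"ID": row_id, **row})
    return row_id


def column_type(value: Any) -> str:
    """Infer a DSL type name from a Python value (hypothetical mapping)."""
    if isinstance(value, bool):
        return "any"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    return "any"


def render(tables: Dict[str, List[dict]]) -> str:
    """Emit one [SCHEMA]/[DATA] pair per table, referenced tables first."""
    blocks = []
    for label, rows in tables.items():
        header = "|".join(f"{name}:{column_type(value)}" for name, value in rows[0].items())
        body = "\n".join("|".join(str(v) for v in row.values()) for row in rows)
        blocks.append(f"[SCHEMA:{label}]\n{header}\n\n[DATA]\n{body}")
    return "\n\n".join(blocks)


tables: Dict[str, List[dict]] = {}
decompose({"city": "Paris", "details": {"humidity": "60%", "uv_index": 5}}, "WEATHER", tables)
dsl = render(tables)
print(dsl)
```

Child tables are registered before their parent finishes, so a referenced schema always precedes the schema that points at it.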

## 🎯 Use Cases

### ✅ When to Use AgentDSL

AgentDSL excels in agent communication scenarios where **token efficiency matters** and data is **relational or tabular**.

#### 1. **High-Density Data Tables** ✓

When an agent needs to pass a large batch of structured records (like CSV data but type-safe and streamable), AgentDSL's `[SCHEMA]` and `[DATA]` blocks provide a flat, high-density alternative to JSON arrays.

**Example: Database query results**

```text
[SCHEMA:USERS]
ID:int|NAME:str|EMAIL:str|CREATED:int

[DATA]
1|Alice|alice@example.com|1704067200
2|Bob|bob@example.com|1704153600
3|Carol|carol@example.com|1704240000
```

vs. JSON (much more verbose):

```json
{
  "users": [
    {
      "id": 1,
      "name": "Alice",
      "email": "alice@example.com",
      "created": 1704067200
    },
    {
      "id": 2,
      "name": "Bob",
      "email": "bob@example.com",
      "created": 1704153600
    },
    {
      "id": 3,
      "name": "Carol",
      "email": "carol@example.com",
      "created": 1704240000
    }
  ]
}
```

**Size reduction: ~48% compared to JSON, or ~32% fewer tokens (Users_Table_500 benchmark)**
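
A rough way to check the character-level difference for the three-row example is to build both encodings and compare their lengths. This sketch uses only the standard library and counts characters, not tokens; with so few rows the ratio will not match the 500-row benchmark, and the inline type inference is a simplification for this data only.

```python
import json

users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com", "created": 1704067200},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "created": 1704153600},
    {"id": 3, "name": "Carol", "email": "carol@example.com", "created": 1704240000},
]

# Pretty-printed JSON, as in the comparison above.
as_json = json.dumps({"users": users}, indent=2)

# Schema line: keys appear once, upper-cased, with a simple int/str inference.
header = "|".join(
    f"{key.upper()}:{'int' if isinstance(value, int) else 'str'}"
    for key, value in users[0].items()
)
rows = ["|".join(str(value) for value in user.values()) for user in users]
as_dsl = "\n".join(["[SCHEMA:USERS]", header, "", "[DATA]", *rows])

reduction = 1 - len(as_dsl) / len(as_json)
print(f"JSON: {len(as_json)} chars, DSL: {len(as_dsl)} chars, reduction: {reduction:.0%}")
```

The saving grows with row count because the schema line is paid for once while JSON repeats every key in every record.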

#### 2. **Deeply Nested/Relational Data** ✓

Instead of nesting objects deeply (which can make structure harder for an LLM to track), AgentDSL uses **Relational References** (`@`).

**Example: API response with customer → orders → items hierarchy**

```text
[SCHEMA:CUSTOMER]
ID:int|NAME:str|ACCOUNT_TYPE:str

[DATA]
1|Acme Corp|ENTERPRISE

[SCHEMA:ORDERS]
ID:int|@CUSTOMER.ID:int|ORDER_NUM:str|TOTAL:float

[DATA]
1|1|ORD-001|5000.50
2|1|ORD-002|3200.00

[SCHEMA:ITEMS]
ID:int|@ORDERS.ID:int|SKU:str|QTY:int|PRICE:float

[DATA]
1|1|SKU-A001|100|50.00
2|1|SKU-A002|50|25.00
3|2|SKU-B001|200|16.00
```

vs. JSON (deeply nested, harder to parse):

```json
{"customer": {"id": 1, "name": "Acme Corp", "orders": [{"id": 1, "orderId": "ORD-001", "items": [...]}]}}
```

**Benefits:**

- Maintains relational integrity without token-expensive nesting
- Supports **unlimited nesting depth** via relational decomposition
- Each schema is independently parseable
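
Rebuilding the nested view from the three flat tables is a pair of joins on the reference columns. The sketch below hard-codes the rows from the example as dictionaries keyed by ID; `nest` and the key names `CUSTOMER_ID` and `ORDER_ID` are hypothetical stand-ins for the `@CUSTOMER.ID` and `@ORDERS.ID` columns, not the library's own representation.

```python
customers = {1: {"NAME": "Acme Corp", "ACCOUNT_TYPE": "ENTERPRISE"}}
orders = {
    1: {"CUSTOMER_ID": 1, "ORDER_NUM": "ORD-001", "TOTAL": 5000.50},
    2: {"CUSTOMER_ID": 1, "ORDER_NUM": "ORD-002", "TOTAL": 3200.00},
}
items = {
    1: {"ORDER_ID": 1, "SKU": "SKU-A001", "QTY": 100, "PRICE": 50.00},
    2: {"ORDER_ID": 1, "SKU": "SKU-A002", "QTY": 50, "PRICE": 25.00},
    3: {"ORDER_ID": 2, "SKU": "SKU-B001", "QTY": 200, "PRICE": 16.00},
}


def nest(parents: dict, children: dict, ref_key: str, child_key: str) -> dict:
    """Attach each child row to its parent under child_key, following ref_key."""
    nested = {pid: {**row, child_key: []} for pid, row in parents.items()}
    for child in children.values():
        child = dict(child)              # copy so the flat table stays untouched
        parent_id = child.pop(ref_key)   # the reference column is consumed by the join
        nested[parent_id][child_key].append(child)
    return nested


# Join bottom-up: items into orders, then orders into customers.
orders_with_items = nest(orders, items, "ORDER_ID", "items")
customers_full = nest(customers, orders_with_items, "CUSTOMER_ID", "orders")

for order in customers_full[1]["orders"]:
    print(order["ORDER_NUM"], [item["SKU"] for item in order["items"]])
```

Each extra level of hierarchy adds one more `nest` call, which is why depth is bounded by the data rather than by the format.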

#### 3. **Tool Discovery & Function Schema Broadcasting** ✓

Agents can broadcast their "Capabilities" in a format that looks like a header file rather than verbose JSON-Schema.

**Example: LLM discovering available functions**

```text
[FUNCTIONS]
[SearchDocuments]
DESC[Search documentation using keyword or phrase]
IN[QUERY:str*|MAX_RESULTS:int|LANGUAGE:str]
OUT[ID:str|TITLE:str|URL:str|RELEVANCE:float]

[AnalyzeSentiment]
DESC[Analyze sentiment of text]
IN[TEXT:str*|DETAILED:any]
OUT[SENTIMENT:str|SCORE:float|CONFIDENCE:float]
```

**Size reduction: ~38% vs. JSON-Schema, or ~12% fewer tokens (Function_Schema_Discovery benchmark)**
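
A function entry like the ones above can be derived from an ordinary Python signature with the standard `inspect` module. This is a sketch of the general idea only and makes no claim about how `@agentdsl.tool` works internally; `describe`, `TYPE_NAMES`, and the snake_case to PascalCase naming rule are assumptions, and the `OUT` line is omitted because named outputs cannot be read from a plain return annotation.

```python
import inspect

# Hypothetical mapping from Python annotations to DSL type names.
TYPE_NAMES = {int: "int", float: "float", str: "str"}


def describe(func) -> str:
    """Render a [FunctionName] entry with DESC and IN lines from a signature."""
    params = []
    for name, param in inspect.signature(func).parameters.items():
        type_name = TYPE_NAMES.get(param.annotation, "any")
        # Parameters without a default are marked required with '*'.
        required = "*" if param.default is inspect.Parameter.empty else ""
        params.append(f"{name.upper()}:{type_name}{required}")
    func_name = "".join(part.capitalize() for part in func.__name__.split("_"))
    desc = inspect.getdoc(func) or ""
    return f"[{func_name}]\nDESC[{desc}]\nIN[{'|'.join(params)}]"


def search_documents(query: str, max_results: int = 10, language: str = "en") -> list:
    """Search documentation using keyword or phrase"""
    return []


print("[FUNCTIONS]\n" + describe(search_documents))
```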

#### 4. **Relational AI Results** ✓

Function results with relational references improve structure clarity for LLMs.

**Example: Search + metadata correlation**

```text
[SCHEMA:METADATA]
ID:int|SOURCE:str|INDEXED_DATE:int

[DATA]
1|internal_kb|1704067200
2|public_api|1704153600

[RESULT]
[ID:str|TITLE:str|METADATA:@METADATA.ID]
doc_001|Deployment Guide|1
doc_002|API Reference|2
[STATUS:str]
SUCCESS
[ERROR:str]
```

### ❌ When NOT to Use AgentDSL

#### ❌ Avoid AgentDSL When:

1. **Binary or Complex Data Types**
   - Geospatial data (GIS), images, audio, videos
   - Encrypted/compressed data
   - Custom serialized objects
   - **Use Instead:** Base64-encoded JSON or appropriate binary protocols

2. **Highly Irregular/Sparse Data**
   - Data with too many optional fields (>50% null values)
   - Completely unstructured data
   - Heterogeneous records with different field sets
   - **Use Instead:** JSON with nullable fields or document databases

3. **Single Small Objects**
   - Communicating a single config object
   - Small API responses (< 500 chars)
   - Simple key-value pairs
   - **Use Instead:** JSON (overhead of schema definition not justified)

   **Example: DON'T do this**

   ```text
   [SCHEMA:CONFIG]
   ID:int|API_KEY:str|DEBUG:any

   [DATA]
   1|secret_key_123|true
   ```

   **Use JSON instead:**

   ```json
   { "api_key": "secret_key_123", "debug": true }
   ```

4. **When Human Readability is Critical**
   - Configuration files for end users
   - API documentation examples
   - Log files for debugging
   - **Use Instead:** JSON, YAML, or TOML

5. **Unstructured Text**
   - Chat messages with arbitrary metadata
   - Natural language processing results
   - Free-form annotations
   - **Use Instead:** JSON with text field

6. **Real-time Streaming with Variable Schema**
   - Data where field count/types change per message
   - Sensor streams with dynamic attributes
   - Log aggregation with unknown field structure
   - **Use Instead:** JSON or MessagePack with schema versioning

---

## 📋 Specification & Limits

### Type System

AgentDSL supports four atomic types and one composite type:

```text
int       - 64-bit signed integer: -2^63 to 2^63-1
float     - 64-bit IEEE754 double precision
str       - UTF-8 string, up to ~2 billion characters
any       - JSON-serialized value (arrays, objects, etc.)
Reference - Relational pointer to another schema
```

### Limits & Constraints

| Constraint                   | Limit            | Notes                                                   |
| ---------------------------- | ---------------- | ------------------------------------------------------- |
| **Field Count per Schema**   | 1,000            | Practical limit; performance degrades >500              |
| **Nesting Depth**            | Unlimited        | Relational decomposition handles any depth              |
| **Records per Block**        | 100,000+         | Limited by memory; tested up to 1M+ records             |
| **String Length**            | ~2GB theoretical | Practical limit 10MB+ for efficient parsing             |
| **Field Name Length**        | 255 chars        | Should be kept <50 for readability                      |
| **Schema Label Length**      | 100 chars        | Avoid special chars, use `[A-Za-z0-9_]`                 |
| **Single Line Width**        | Unlimited        | Pipe-delimited; no newlines in values except `any` type |
| **Array Nesting (in `any`)** | 100 levels       | Not explicitly enforced; the JSON RFC defines no limit  |

### Protocol Rules

#### Schema Definition

```text
[SCHEMA:LABEL]
FIELDNAME:type|FIELDNAME:type|...
```

- **Must have** exactly one `ID:int` as first field
- **Field names** are case-sensitive, use `[A-Za-z0-9_]`
- **Types** must be one of: `int`, `str`, `float`, `any`, `@Schema.Field`
- **References** format: `@ParentSchema.ID:int` (the `:int` is type annotation)
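
The schema rules above are simple enough to check mechanically. The following is a minimal validator sketch, assuming the rules exactly as listed; `validate_schema` and its regular expressions are hypothetical and accept a reference either as the column name (`@CUSTOMER.ID:int`) or as the type (`DETAILS:@Details.ID`), since both forms appear in this document.

```python
import re

FIELD_NAME = re.compile(r"^[A-Za-z0-9_]+$")
REFERENCE = re.compile(r"^@[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")
SCALAR_TYPES = {"int", "str", "float", "any"}


def validate_schema(line: str) -> list:
    """Return a list of rule violations for one schema field line (empty if valid)."""
    problems = []
    columns = line.split("|")
    if columns[0] != "ID:int":
        problems.append("first field must be ID:int")
    if columns.count("ID:int") > 1:
        problems.append("ID:int may appear only once")
    for column in columns:
        name, sep, type_name = column.rpartition(":")
        if not sep:
            problems.append(f"{column!r}: missing type")
        elif not (type_name in SCALAR_TYPES or REFERENCE.match(type_name)):
            problems.append(f"{column!r}: unknown type {type_name!r}")
        elif not (FIELD_NAME.match(name) or REFERENCE.match(name)):
            problems.append(f"{column!r}: invalid field name {name!r}")
    return problems


print(validate_schema("ID:int|@CUSTOMER.ID:int|ORDER_NUM:str|TOTAL:float"))
print(validate_schema("NAME:str|ID:int"))
print(validate_schema("ID:int|AGE:number"))
```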

#### Data Rows

```text
[DATA]
id_value|field2|field3|...
```

- **Each row starts with integer ID**, one per record
- **Delimiter is pipe `|`** with no escape syntax; a value that contains a pipe must go in an `any` column
- **Empty fields** represented as `_` (underscore)
- **Multi-values** in single field: comma-separated (e.g., `skill1,skill2,skill3`)
- **No newlines** in scalar values; use `any` type for complex data
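
Going the other way, a writer has to apply the same rules when emitting a row. The sketch below shows one possible encoder for scalar columns; `encode_cell` and `encode_row` are hypothetical helpers, and rejecting pipes and newlines outright (rather than routing them to an `any` column) is a simplification.

```python
def encode_cell(value) -> str:
    """Encode one scalar or multi-value cell following the data row rules."""
    if value is None:
        return "_"  # empty field
    if isinstance(value, (list, tuple)):
        if not value:
            return "_"  # assumption: an empty list is treated as an empty field
        return ",".join(encode_cell(item) for item in value)
    text = str(value)
    if "|" in text or "\n" in text:
        raise ValueError(f"{text!r} contains a pipe or newline; use an `any` column instead")
    return text


def encode_row(values) -> str:
    """Join encoded cells with the pipe delimiter."""
    return "|".join(encode_cell(value) for value in values)


print(encode_row([1, "Sion", 25, "MALE", ["Coding", "Reading"]]))
print(encode_row([3, "Bob", 35, None, None]))
```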

#### Functions Block

```text
[FUNCTIONS]
[FunctionName]
DESC[Description here]
IN[PARAM1:type*|PARAM2:type]
OUT[RESULT1:type|RESULT2:type]
```

- **Function names** must be alphanumeric + underscore
- **DESC** is mandatory (searchable by agents)
- **Asterisk `*`** marks required parameters
- **A parameter with no type** defaults to `str`

#### RPC Call

```text
[CALL]
[FunctionName]
PARAM:VALUE
PARAM:VALUE
```

- **Function must be previously declared** in `[FUNCTIONS]` block
- **Parameters provided as KEY:VALUE pairs**, one per line
- **Order doesn't matter**; matching by name
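
These rules suggest a straightforward check of a `[CALL]` against the `IN[...]` line of its function. The sketch below is illustrative only: `parse_in_spec`, `parse_call`, and `check_call` are hypothetical names, values are kept as strings, and the sample function and parameters are made up for the example.

```python
def parse_in_spec(spec: str) -> dict:
    """'QUERY:str*|LANGUAGE' -> {'QUERY': ('str', True), 'LANGUAGE': ('str', False)}"""
    params = {}
    for column in spec.split("|"):
        required = column.endswith("*")
        name, _, type_name = column.rstrip("*").partition(":")
        params[name] = (type_name or "str", required)  # untyped parameters default to str
    return params


def parse_call(text: str):
    """Return (function_name, {PARAM: value}) from a [CALL] block."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) < 2 or lines[0] != "[CALL]":
        raise ValueError("expected [CALL] followed by [FunctionName]")
    function = lines[1].strip("[]")
    args = {}
    for line in lines[2:]:
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        args[key] = value
    return function, args


def check_call(args: dict, spec: dict) -> list:
    """List missing required parameters and parameters the spec does not declare."""
    problems = [
        f"missing required parameter {name}"
        for name, (_, required) in spec.items()
        if required and name not in args
    ]
    problems += [f"unknown parameter {name}" for name in args if name not in spec]
    return problems


spec = parse_in_spec("QUERY:str*|MAX_RESULTS:int|LANGUAGE")
function, args = parse_call("[CALL]\n[SearchDocuments]\nMAX_RESULTS:5\nQUERY:rate limits: api")
print(function, args, check_call(args, spec))
```

Matching by name rather than position is what allows the caller to send `MAX_RESULTS` before `QUERY`.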

#### Result Block

```text
[RESULT]
[FIELD:type|FIELD:type]
value1|value2|...
[STATUS:str]
SUCCESS|ERROR
[ERROR:str]
optional_error_message
```

- **Status** must be exactly `SUCCESS` or `ERROR`
- **Error message** empty if success
- **Records follow same pipe-delimited format** as `[DATA]`

### Performance Characteristics

| Operation                | Time Complexity | Notes                                             |
| ------------------------ | --------------- | ------------------------------------------------- |
| **Parse DSL to AST**     | O(n)            | Linear in input size; single pass parser          |
| **Serialize AST to DSL** | O(n)            | Linear in record count                            |
| **From JSON decompose**  | O(n × d)        | n=records, d=nesting depth; relational extraction |
| **To JSON reconstruct**  | O(n × m)        | n=total records, m=average reference depth        |

### Stability & Versioning

- **Current Version:** Pre-Alpha (0.0.x)
- **Protocol Stability:** Subject to change until 1.0 release
- **Breaking Changes:** Will increment minor version (e.g., 0.1 → 0.2)
- **Recommendations:**
  - Version your DSL payloads with a version field
  - Keep agent implementations flexible for schema evolution
  - Use `[FUNCTIONS]` DESC field for version info if needed

### Best Practices

✅ **DO:**

- Use `int` for IDs, timestamps, counts
- Use `str` for product names, messages, categories
- Use `float` for metrics, percentages, financial amounts
- Use `@Reference` for any foreign key relationship
- Use `any` rarely—only for complex metadata
- Keep field names descriptive but concise (under 50 chars)
- Place parent schema before child schemas in message order

❌ **DON'T:**

- Use `str` for large binary blobs (use encoding or external reference)
- Create >500 fields per schema (split into related schemas instead)
- Store unstructured text in required fields (use optional `any` type)
- Rely on field order in `[SCHEMA]`—always validate by name
- Store passwords/secrets in DSL—use encryption or environment injection

---

## 📊 Benchmark Results

Real-world benchmarks using **8 diverse scenarios** (table data, 4-level nested hierarchies, API responses, and tool discovery):

### Overall Performance

| Metric                       | Result        | Notes                                |
| ---------------------------- | ------------- | ------------------------------------ |
| **Character Size Reduction** | **46.0%**     | JSON → AgentDSL compression ratio    |
| **Token Efficiency Gain**    | **23.8%**     | Using OpenAI CL100K encoding         |
| **Average Per-Scenario**     | 42.8% / 20.4% | Size reduction / Token reduction     |
| **Conversion Speed**         | 0.88ms/item   | Single-pass JSON → DSL decomposition |

### Scenario Breakdown

| Scenario                      | Size Reduction | Token Gain | Nesting Depth |
| ----------------------------- | -------------- | ---------- | ------------- |
| **Users_Table_500**           | 48.3%          | 32.1%      | 1 (flat)      |
| **Deep_Org_Hierarchy**        | 56.2%          | 41.3%      | 4             |
| **E-Commerce_Catalog**        | 59.1%          | 44.8%      | 4             |
| **API_Response_Nested**       | 42.1%          | 18.5%      | 2-3           |
| **Database_Orders_4Levels**   | 47.3%          | 17.5%      | 4             |
| **Documentation_Content**     | 51.8%          | 22.1%      | 4             |
| **Function_Schema_Discovery** | 38.2%          | 12.3%      | 1 (flat)      |
| **API_Results_200**           | 44.6%          | 15.0%      | 2             |

### Key Findings

- ✅ **High-density flat data:** 38-48% size reduction (excellent for tables, schemas)
- ✅ **Nested hierarchies (4-level):** 47-59% size reduction (unlimited depth supported)
- ✅ **Mixed relational data:** 12-45% token reduction across all nesting levels
- ✅ **Fast conversion:** Sub-millisecond decomposition for typical payloads
- ✅ **Scalability:** Tested with hundreds of records and thousands of values

### Benchmark Configuration

- **Encoding:** OpenAI `cl100k_base` (the encoding used by GPT-4 and GPT-3.5-turbo)
- **Test Data:** Generated realistic schemas with deterministic and random nesting
- **Largest Scenario:** Database with 12 customers, 4 orders each, ~5 items per order
- **Tool:** AgentDSL v0.0.1-pre (Rust + Python bindings via maturin)

---

## 🤝 Contributing

Contributions are welcome! Whether it's bug fixes, feature requests, or documentation improvements, please feel free to open an issue or submit a pull request.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📄 License

Distributed under the MIT License. See [license.md](license.md) for more information.

