Metadata-Version: 2.4
Name: sela-browse-sdk
Version: 1.0.5
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Summary: Sela Network Python SDK - P2P client for browser automation agents
Keywords: sela,p2p,libp2p,browser-automation,web-scraping,semantic
Author-email: Sela Network <dev@sela.network>
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: documentation, https://github.com/sela-network/sela-node-v2/tree/main/client-sdk
Project-URL: homepage, https://github.com/sela-network/sela-node-v2
Project-URL: repository, https://github.com/sela-network/sela-node-v2

# Sela Network Python SDK

Python bindings for the Sela Network P2P client SDK. Browse any website and get back structured data through distributed browser agents — no CAPTCHAs, no bot detection.

## Installation

```bash
pip install sela-browse-sdk
```

## Quick Start

```python
import asyncio
import json
from sela_browse_sdk import SelaClient, BrowseOptions

async def main():
    # 1. Create client with your API key (get one at https://app.selanet.ai)
    client = await SelaClient.with_api_key('sk_live_xxx')
    await client.start()

    # 2. Find and connect to an available agent
    agents = await client.discover_agents('web')
    await client.connect_to_best_available(agents)

    # 3. Browse a URL
    result = await client.browse(
        'https://x.com/search?q=ai&f=live',
        BrowseOptions(parse_only=True),
    )

    # 4. Extract structured data
    for item in result.page.content:
        fields = json.loads(item.fields_json)
        print(fields['text'])
        print(fields['author_username'], '—', fields.get('like_count'), 'likes')

    await client.shutdown()

asyncio.run(main())
```

Example response:

```json
{
  "page": {
    "page_type": "x_com::search",
    "content": [
      {
        "content_type": "tweet",
        "fields": {
          "text": "AI is transforming how we build software...",
          "author_username": "johndoe",
          "like_count": 142,
          "retweet_count": 38
        }
      }
    ]
  }
}
```

> **Note:** `fields_json` and `collection_stats` are JSON strings in Python. Always use `json.loads()` to parse them.
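For instance, decoding one of these strings is a single `json.loads()` call. The sample below is hand-written to mirror the response shape shown above, not real API output:

```python
import json

# Hand-written sample of what a fields_json string may look like
sample_fields_json = '{"text": "AI is transforming...", "author_username": "johndoe", "like_count": 142}'

fields = json.loads(sample_fields_json)
print(fields["author_username"])    # johndoe
print(fields.get("like_count", 0))  # 142 — use .get() for fields that may be absent
```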

For platform-specific examples (X, Xiaohongshu, parallel browse, markdown extraction), see the [Browse API Guide](../../docs/browse-api-guide.python.md).

---

## Examples

### Browse Any Website (Markdown Extraction)

For general websites (not X or Xiaohongshu), use `extract_format='markdown'` to get readable content:

```python
import asyncio
from sela_browse_sdk import SelaClient, BrowseOptions

async def main():
    client = await SelaClient.with_api_key('sk_live_xxx')
    await client.start()

    agents = await client.discover_agents('web')
    await client.connect_to_best_available(agents)

    # extract_format='markdown' is required for general websites
    result = await client.browse(
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        BrowseOptions(parse_only=True, extract_format='markdown'),
    )

    print(result.extracted_content)  # Clean markdown text

    await client.shutdown()

asyncio.run(main())
```

> **Important:** `extract_format='schema'` (default) only works for supported platforms (X, Xiaohongshu). For all other sites, you **must** use `'markdown'` or `'html'` — otherwise you'll get an empty response.

### Collect More Items (Infinite Scroll)

Use `count` to collect multiple items by scrolling the page:

```python
import asyncio
import json
from sela_browse_sdk import SelaClient, BrowseOptions

async def main():
    client = await SelaClient.with_api_key('sk_live_xxx')
    await client.start()

    agents = await client.discover_agents('web')
    await client.connect_to_best_available(agents)

    # Collect 50 tweets by scrolling
    result = await client.browse(
        'https://x.com/search?q=python&f=live',
        BrowseOptions(parse_only=True, count=50),
    )

    for item in result.page.content:
        fields = json.loads(item.fields_json)
        print(fields['text'][:80])

    # Check collection stats
    if result.collection_stats:
        stats = json.loads(result.collection_stats)
        print(f"Collected: {stats}")

    await client.shutdown()

asyncio.run(main())
```

### Browse Multiple URLs in Parallel

`browse_parallel_collect()` distributes URLs across connected agents for concurrent browsing:

```python
import asyncio
from sela_browse_sdk import SelaClient, BrowseOptions, ParallelBrowseItem

async def main():
    client = await SelaClient.with_api_key('sk_live_xxx')
    await client.start()

    agents = await client.discover_agents('web')
    await client.connect_to_best_available(agents)

    # Define URLs to browse in parallel
    items = [
        ParallelBrowseItem(
            url='https://example.com/page-1',
            options=BrowseOptions(parse_only=True, extract_format='markdown'),
        ),
        ParallelBrowseItem(
            url='https://example.com/page-2',
            options=BrowseOptions(parse_only=True, extract_format='markdown'),
        ),
        ParallelBrowseItem(
            url='https://example.com/page-3',
            options=BrowseOptions(parse_only=True, extract_format='markdown'),
        ),
    ]

    # Browse all URLs concurrently (max 3 per agent)
    results = await client.browse_parallel_collect(items, max_concurrent_per_agent=3)

    for r in results:
        if r.error:
            print(f"[{r.url}] Failed: {r.error}")
        else:
            print(f"[{r.url}] {r.response.page.metadata.title} ({r.elapsed_ms}ms)")

    await client.shutdown()

asyncio.run(main())
```

`ParallelBrowseResult` fields:

| Field | Type | Description |
|-------|------|-------------|
| `index` | `int` | Original position in the input list |
| `url` | `str` | The URL that was browsed |
| `response` | `SemanticResponse?` | Result (None if error) |
| `error` | `str?` | Error message (None if success) |
| `agent_peer_id` | `str` | Which agent handled this URL |
| `elapsed_ms` | `int` | Time taken in milliseconds |
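Since `response` and `error` are mutually exclusive, a common pattern is to partition results before processing. The sketch below uses a hypothetical stand-in dataclass mirroring the table above, so it runs without the SDK:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Result:  # hypothetical stand-in for ParallelBrowseResult
    index: int
    url: str
    response: Optional[Any]
    error: Optional[str]
    agent_peer_id: str
    elapsed_ms: int

def partition(results):
    """Split results into (successes, failures), preserving input order."""
    ok = [r for r in results if r.error is None]
    failed = [r for r in results if r.error is not None]
    return ok, failed

results = [
    Result(0, "https://example.com/page-1", object(), None, "peer-a", 812),
    Result(1, "https://example.com/page-2", None, "timeout", "peer-b", 60000),
]
ok, failed = partition(results)
print(len(ok), len(failed))  # 1 1
```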

---

## API Reference

### SelaClient

#### Creating a Client

```python
# Recommended — auto-discovers bootstrap nodes and relays
client = await SelaClient.with_api_key('sk_live_xxx')

# With environment selection (production or staging)
client = await SelaClient.with_api_key('sk_live_xxx', 'production')
client = await SelaClient.with_api_key('sk_live_xxx', 'staging')

# With full configuration
client = await SelaClient.create(config)

# From SELA_API_KEY environment variable
client = await SelaClient.from_env()
```

#### Methods

| Method | Description |
|--------|-------------|
| `start()` | Start the P2P client and connect to bootstrap nodes |
| `shutdown()` | Gracefully shut down the client |
| `discover_agents(capability, options?)` | Discover available agents |
| `connect_to_best_available(agents, url?, options?)` | Connect to the best available agent |
| `connect_to_agent(peer_id, multiaddr, options?)` | Connect to a specific agent |
| `browse(url, options?)` | Browse a URL and get semantic content |
| `browse_parallel_collect(items, max_concurrent_per_agent?)` | Browse multiple URLs in parallel |
| `search(query, target_url?, options?)` | Search on a platform |
| `local_peer_id()` | Get the local peer ID |
| `state()` | Get current client state (`Created`, `Running`, `Stopped`) |
| `poll_event()` | Poll for the next event (non-blocking) |
| `get_reputation(peer_id)` | Get reputation for a peer |
| `report_outcome(peer_id, outcome, response_time_ms?)` | Report task outcome |

---

### Configuration

#### SelaClientConfig

Only needed if you want to override defaults. `with_api_key()` handles everything automatically.

```python
from sela_browse_sdk import SelaClientConfig, BootstrapNode

config = SelaClientConfig(
    bootstrap_nodes=[],                             # Auto-resolved if empty
    relay_nodes=None,                               # Auto-discovered
    api_key='sk_live_xxx',
    api_server_url=None,                            # Default: http://localhost:9002
    connection_timeout_secs=30,                     # Peer connection timeout
    discovery_timeout_secs=60,                      # DHT agent discovery timeout
    request_timeout_secs=60,                        # Browse request timeout
    auto_discover_relays=True,
    environment=None,                               # "production" or "staging" (None = compile-time default)
)
```

#### BrowseOptions

```python
BrowseOptions(
    parse_only=True,            # Always recommended — skip AI planning, extract only
    count=None,                 # Items to collect (enables infinite scroll)
    extract_format=None,        # "schema" (default), "markdown", "html"
    session_id=None,            # Reuse existing browser session
    timeout_ms=None,            # Request timeout in ms (default: 60000)
    x_params=None,              # X (Twitter) search/profile/post params
    xiaohongshu_params=None,    # Xiaohongshu search/note/profile params
    agent_id=None,              # Target a specific agent by peer ID
    query=None,                 # Hint for semantic extraction
    include_html=None,          # Return raw HTML in response
    api_key=None,               # Override API key per-request
)
```

> **Important:** `extract_format='schema'` only works for supported platforms (X, Xiaohongshu). For all other sites, you **must** use `'markdown'` or `'html'` — otherwise you'll get an empty response.

#### DiscoveryOptions / ConnectionOptions

```python
DiscoveryOptions(max_agents=None, timeout_ms=None)
ConnectionOptions(timeout_ms=None)
```

---

### Response Types

#### SemanticResponse

```python
result = await client.browse(url, options)

result.session_id           # Browser session ID
result.page                 # SemanticPage
result.page.page_type       # e.g., "x_com::search", "generic"
result.page.content         # list[SemanticContent]
result.page.metadata.title  # Page title
result.collection_stats     # JSON string (when count is used) — use json.loads()
result.extracted_content    # Markdown/HTML text (when extract_format is set)
result.extract_format       # "schema", "html", or "markdown"
```
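Because the response carries text in different places depending on `extract_format`, a small helper can normalize all three formats into plain text. This is a sketch using `types.SimpleNamespace` stand-ins rather than a real `SemanticResponse`, so it runs without the SDK:

```python
import json
from types import SimpleNamespace

def extract_text(result) -> str:
    """Return readable text for any extract_format value."""
    if result.extract_format in ("markdown", "html"):
        return result.extracted_content or ""
    # "schema": join the 'text' field of each content item, if present
    parts = []
    for item in result.page.content:
        fields = json.loads(item.fields_json)
        if "text" in fields:
            parts.append(fields["text"])
    return "\n".join(parts)

# Stand-in mimicking a schema-format response
fake = SimpleNamespace(
    extract_format="schema",
    extracted_content=None,
    page=SimpleNamespace(content=[
        SimpleNamespace(fields_json='{"text": "hello"}'),
        SimpleNamespace(fields_json='{"text": "world"}'),
    ]),
)
print(extract_text(fake))  # prints "hello" then "world"
```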

#### SemanticContent

```python
for item in result.page.content:
    item.content_type       # e.g., "tweet", "note", "user"
    item.content_id         # Unique ID (if available)
    item.fields_json        # JSON string — use json.loads()
    item.actions            # list[AvailableAction]
```

---

### Error Handling

```python
from sela_browse_sdk import SelaError

try:
    result = await client.browse(url, options)
except SelaError.ConnectionError as e:
    print(f"Connection failed: {e}")
except SelaError.TimeoutError as e:
    print(f"Request timed out: {e}")
except SelaError.BrowseError as e:
    print(f"Browse failed: {e}")
except SelaError as e:
    print(f"Error: {e}")
```

| Error | When |
|-------|------|
| `ConfigurationError` | Invalid config (bad API key format, missing fields) |
| `ConnectionError` | Cannot connect to agent |
| `TimeoutError` | Request or discovery timed out |
| `BrowseError` | Browse operation failed on the agent |
| `DiscoveryError` | No agents found in DHT |
| `NotStartedError` | Called method before `start()` |
| `NotConnectedError` | No agent connected |
| `ProtocolError` | P2P protocol mismatch |
| `InvalidPeerIdError` | Bad peer ID format |
| `InternalError` | Unexpected internal error |
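Timeouts and connection failures are often transient, so a retry loop with backoff around `browse()` can help. The sketch below is generic and runs without the SDK; in real code you would catch `SelaError.TimeoutError` / `SelaError.ConnectionError` instead of the placeholder exception:

```python
import asyncio

class TransientError(Exception):
    """Placeholder for SelaError.TimeoutError / SelaError.ConnectionError."""

async def browse_with_retry(do_browse, attempts=3, base_delay=1.0):
    """Retry an async callable with exponential backoff on transient errors."""
    for attempt in range(attempts):
        try:
            return await do_browse()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts — surface the error
            await asyncio.sleep(base_delay * 2 ** attempt)

# Demo: a flaky callable that fails twice, then succeeds
calls = {"n": 0}

async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("timed out")
    return "ok"

print(asyncio.run(browse_with_retry(flaky, base_delay=0.01)))  # ok
```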

---

## Development

### Building from Source

Requires: Rust 1.70+, Python 3.9+, maturin

```bash
pip install maturin

cd client-sdk/crates/sela-py
maturin develop          # Install in dev mode
maturin build --release  # Build wheel
```

### Running Tests

```bash
cargo test -p sela-py
```

## License

MIT License - see [LICENSE](../../../LICENSE) for details.

