Metadata-Version: 2.4
Name: selanet-sdk
Version: 0.2.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Summary: Selanet Python SDK - HTTP client for browser automation agents
Keywords: selanet,browser-automation,web-scraping,semantic,ai-agents
Author-email: Selanet <dev@sela.network>
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: documentation, https://github.com/sela-network/sela-node-v2/tree/main/client-sdk
Project-URL: homepage, https://github.com/sela-network/sela-node-v2
Project-URL: repository, https://github.com/sela-network/sela-node-v2

# Selanet Python SDK

Python bindings for the Selanet client SDK. Browse any website and get back structured data through distributed browser agents — no CAPTCHAs, no bot detection.

## Installation

```bash
pip install selanet-sdk
```

## Quick Start

```python
import asyncio
import json
import os
from selanet_sdk import SelaClient, BrowseOptions

async def main():
    # 1. Create client with your API key (get one at https://app.selanet.ai)
    client = await SelaClient.with_api_key(os.environ["SELA_API_KEY"])

    # 2. Browse a URL
    result = await client.browse(
        url="https://x.com/search?q=ai&f=live",
        options=BrowseOptions(parse_only=True),
    )

    # 3. Extract structured data
    for item in result.page.content:
        fields = json.loads(item.fields_json)
        print(fields["text"])
        print(fields["author_username"], "—", fields.get("like_count"), "likes")

    await client.shutdown()

asyncio.run(main())
```

Example response:

```json
{
  "page": {
    "page_type": "x_com::search",
    "content": [
      {
        "content_type": "tweet",
        "fields": {
          "text": "AI is transforming how we build software...",
          "author_username": "johndoe",
          "like_count": 142,
          "retweet_count": 38
        }
      }
    ]
  }
}
```

> **Note:** `fields_json` and `collection_stats` are JSON strings in Python. Always use `json.loads()` to parse them.
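If you parse many items, a small defensive helper avoids repeating the decode. This helper is illustrative, assuming only that items expose `fields_json` as shown above:

```python
import json

def parse_fields(item) -> dict:
    """Decode an item's fields_json string into a dict.

    Treats a missing or empty fields_json as an empty record rather
    than raising, so loops over result.page.content stay simple.
    """
    raw = getattr(item, "fields_json", None) or "{}"
    return json.loads(raw)
```

Then `fields = parse_fields(item)` works uniformly across platforms and content types.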

---

## Examples

All examples below assume you've created a client:

```python
import os
from selanet_sdk import SelaClient

client = await SelaClient.with_api_key(os.environ["SELA_API_KEY"])
```

### Browse Any Website (Markdown Extraction)

For general websites (not X, Xiaohongshu, Rednote, or YouTube), use `extract_format='markdown'` to get readable content:

```python
result = await client.browse(
    url="https://en.wikipedia.org/wiki/Python_(programming_language)",
    options=BrowseOptions(parse_only=True, extract_format="markdown"),
)

print(result.extracted_content)  # Clean markdown text
```

> **Important:** `extract_format='schema'` (default) only works for supported platforms (X, Xiaohongshu, Rednote, YouTube). For all other sites, you **must** use `'markdown'` or `'html'` — otherwise you'll get an empty response.

### Collect More Items (Infinite Scroll)

Use `count` to collect multiple items by scrolling the page. The agent will scroll and accumulate items until reaching the target count.

- Typical range: `10` – `100` items per request
- Approximate time: `count=30` takes ~10–15s, `count=100` takes ~30–60s (depends on the platform and network)

```python
import json

# Collect 50 tweets by scrolling
result = await client.browse(
    url="https://x.com/search?q=python&f=live",
    options=BrowseOptions(parse_only=True, count=50),
)

for item in result.page.content:
    fields = json.loads(item.fields_json)
    print(fields["text"][:80])

# Check collection stats
if result.collection_stats:
    stats = json.loads(result.collection_stats)
    print(f"Collected: {stats}")
    # Example: {"total_collected": 50, "scroll_count": 8, "duplicates_removed": 3}
```

### Xiaohongshu

#### Search Posts

When using platform params (`xiaohongshu_params`, `x_params`, `rednote_params`, `youtube_params`), pass `url=None` — the URL is determined by the params.

```python
import json
from selanet_sdk import BrowseOptions, XiaohongshuParams

# Search and collect 30 posts
result = await client.browse(
    url=None,
    options=BrowseOptions(
        xiaohongshu_params=XiaohongshuParams(
            feature="search",
            query="맛집 추천",
        ),
        count=30,
        parse_only=True,
    ),
)

for item in result.page.content:
    fields = json.loads(item.fields_json)
    print(fields.get("title", ""), "—", fields.get("like_count", ""))
```

#### Browse a Note by URL

Use `feature="note"` with a `url` to browse a note directly. This is useful when you have note links from search results:

```python
result = await client.browse(
    url=None,
    options=BrowseOptions(
        xiaohongshu_params=XiaohongshuParams(
            feature="note",
            url="https://www.xiaohongshu.com/explore/682e7c72000000000b03a68a",
        ),
        parse_only=True,
    ),
)
```

#### Browse a Note by ID

Use `feature="note"` with a `note_id` to browse a specific note:

```python
result = await client.browse(
    url=None,
    options=BrowseOptions(
        xiaohongshu_params=XiaohongshuParams(
            feature="note",
            note_id="682e7c72000000000b03a68a",
        ),
        parse_only=True,
    ),
)
```

#### XiaohongshuParams Features

| Feature | Required Fields | Description |
|---------|----------------|-------------|
| `"search"` | `query` | Search posts by keyword |
| `"note"` | `note_id` or `url` | Browse a specific note by ID or URL |
| `"profile"` | `user_id` | Browse a user profile |
| `"url"` | `url` | Browse any Xiaohongshu URL directly |
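For instance, the `profile` feature follows the same call shape as the examples above. The `user_id` value below is a placeholder, not a real account:

```python
result = await client.browse(
    url=None,
    options=BrowseOptions(
        xiaohongshu_params=XiaohongshuParams(
            feature="profile",
            user_id="your-target-user-id",  # placeholder; substitute a real user ID
        ),
        parse_only=True,
    ),
)
```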

### X (Twitter)

#### Search Tweets

```python
from selanet_sdk import BrowseOptions, XParams

result = await client.browse(
    url=None,
    options=BrowseOptions(
        x_params=XParams(
            feature="search",
            query="selanet",
            search_tab="latest",        # "top" (default), "latest", "people", "media"
            filters=["has:images"],      # content filters
            min_likes=100,
        ),
        count=30,
        parse_only=True,
    ),
)
```

#### Browse a Profile

```python
result = await client.browse(
    url=None,
    options=BrowseOptions(
        x_params=XParams(feature="profile", username="elonmusk", tab="media"),
        parse_only=True,
    ),
)
```

#### Browse a Single Post

```python
result = await client.browse(
    url=None,
    options=BrowseOptions(
        x_params=XParams(feature="post", username="elonmusk", tweet_id="1234567890"),
        parse_only=True,
    ),
)
```

#### XParams Features

| Feature | Required Fields | Optional Fields | Description |
|---------|----------------|-----------------|-------------|
| `"search"` | `query` | `search_tab`, `filters`, `min_likes`, `min_retweets`, `lang`, `since`, `until` | Advanced search with filters |
| `"profile"` | `username` | `tab` (posts/replies/media/likes) | User profile |
| `"post"` | `username`, `tweet_id` | | Single tweet |
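The optional `since`/`until` fields can narrow a search to a date window. A sketch combining them with the other optional fields; the `YYYY-MM-DD` date format is an assumption based on X's advanced-search syntax, so verify it against your results:

```python
result = await client.browse(
    url=None,
    options=BrowseOptions(
        x_params=XParams(
            feature="search",
            query="rust lang",
            search_tab="latest",
            min_retweets=10,
            lang="en",
            since="2024-01-01",  # assumed YYYY-MM-DD, as in X advanced search
            until="2024-06-30",
        ),
        count=30,
        parse_only=True,
    ),
)
```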

### Rednote

Rednote is the international version of Xiaohongshu. `RednoteParams` has the same fields and features as `XiaohongshuParams`.

#### Search Posts

```python
from selanet_sdk import BrowseOptions, RednoteParams

result = await client.browse(
    url=None,
    options=BrowseOptions(
        rednote_params=RednoteParams(
            feature="search",
            query="travel tips",
        ),
        count=20,
        parse_only=True,
    ),
)
```

#### Browse a Note

```python
# By note ID
result = await client.browse(
    url=None,
    options=BrowseOptions(
        rednote_params=RednoteParams(
            feature="note",
            note_id="682e7c72000000000b03a68a",
        ),
        parse_only=True,
    ),
)

# By URL (Rednote shares the xiaohongshu.com domain)
result = await client.browse(
    url=None,
    options=BrowseOptions(
        rednote_params=RednoteParams(
            feature="note",
            url="https://www.xiaohongshu.com/explore/682e7c72000000000b03a68a",
        ),
        parse_only=True,
    ),
)
```

#### RednoteParams Features

| Feature | Required Fields | Description |
|---------|----------------|-------------|
| `"search"` | `query` | Search posts by keyword |
| `"note"` | `note_id` or `url` | Browse a specific note by ID or URL |
| `"profile"` | `user_id` | Browse a user profile |
| `"url"` | `url` | Browse any Rednote URL directly |

### YouTube

#### Search Videos

```python
import json
from selanet_sdk import BrowseOptions, YouTubeParams

result = await client.browse(
    url=None,
    options=BrowseOptions(
        youtube_params=YouTubeParams(
            feature="search",
            query="rust programming tutorial",
        ),
        count=10,
        parse_only=True,
    ),
)

for item in result.page.content:
    fields = json.loads(item.fields_json)
    print(fields.get("title"), "—", fields.get("link"))
```

#### Watch a Video

Use `feature="watch"` with `duration` (seconds) to watch a video for a specified time:

```python
result = await client.browse(
    url=None,
    options=BrowseOptions(
        youtube_params=YouTubeParams(
            feature="watch",
            video_id="dQw4w9WgXcQ",
            duration=30,  # watch for 30 seconds
        ),
        parse_only=True,
    ),
)
```

#### Get Comments

```python
result = await client.browse(
    url=None,
    options=BrowseOptions(
        youtube_params=YouTubeParams(
            feature="comments",
            video_id="dQw4w9WgXcQ",
        ),
        count=20,
        parse_only=True,
    ),
)
```

#### YouTubeParams Features

| Feature | Required Fields | Optional Fields | Description |
|---------|----------------|-----------------|-------------|
| `"search"` | `query` | `url` | Search videos by keyword |
| `"watch"` | `video_id` | `duration` (seconds) | Watch a video for a specified duration |
| `"comments"` | `video_id` | | Get comments on a video (use `count` for more) |

### Browse Multiple URLs in Parallel

`browse_parallel_collect()` distributes URLs across agents for concurrent browsing. Each result carries an `index` field giving its position in the input list; results may arrive out of order, so use `index` to map them back to their inputs.

#### Search → Parallel Browse Pattern

The typical workflow: search first, extract links, then browse them in parallel:

```python
import json
import os
from selanet_sdk import (
    SelaClient, BrowseOptions,
    XiaohongshuParams, ParallelBrowseItem,
)

async def main():
    client = await SelaClient.with_api_key(os.environ["SELA_API_KEY"])

    # Step 1: Search
    search_result = await client.browse(
        url=None,
        options=BrowseOptions(
            xiaohongshu_params=XiaohongshuParams(feature="search", query="맛집 추천"),
            count=10,
            parse_only=True,
        ),
    )

    # Step 2: Extract note links from search results
    note_links = []
    for item in search_result.page.content:
        fields = json.loads(item.fields_json)
        link = fields.get("link")
        if link and "/explore/" in link:
            note_links.append(link)

    # Step 3: Browse notes in parallel
    items = [
        ParallelBrowseItem(
            url=None,
            options=BrowseOptions(
                xiaohongshu_params=XiaohongshuParams(feature="note", url=link),
                parse_only=True,
            ),
        )
        for link in note_links[:5]  # batch of 5
    ]

    results = await client.browse_parallel_collect(items, max_concurrent_per_agent=5)

    for r in sorted(results, key=lambda r: r.index):
        if r.error:
            print(f"[{r.index}] FAIL: {r.error}")
        else:
            title = r.response.page.metadata.title or "N/A"
            count = len(r.response.page.content)
            print(f"[{r.index}] {title[:40]} — {count} items ({r.elapsed_ms}ms)")

    await client.shutdown()

import asyncio
asyncio.run(main())
```

#### Batching Large Requests

For many URLs, batch them to avoid overwhelming the network:

```python
batch_size = 5
for i in range(0, len(all_urls), batch_size):
    batch = all_urls[i : i + batch_size]
    items = [
        ParallelBrowseItem(
            url=None,
            options=BrowseOptions(
                xiaohongshu_params=XiaohongshuParams(feature="note", url=url),
                parse_only=True,
            ),
        )
        for url in batch
    ]
    results = await client.browse_parallel_collect(items, max_concurrent_per_agent=5)
    # process results...
```

`ParallelBrowseResult` fields:

| Field | Type | Description |
|-------|------|-------------|
| `index` | `int` | Original position in the input list (use this to match results to inputs) |
| `url` | `str` | The URL that was browsed |
| `response` | `SemanticResponse?` | Result (None if error) |
| `error` | `str?` | Error message (None if success) |
| `agent_peer_id` | `str` | Always empty in HTTP mode. Use `response.agent_id` instead. |
| `elapsed_ms` | `int` | Time taken in milliseconds |

---

## API Reference

### SelaClient

#### Creating a Client

| Constructor | Description |
|-------------|-------------|
| `SelaClient.with_api_key(key)` | Pass API key directly (recommended) |
| `SelaClient.from_env()` | Auto-load from `SELA_API_KEY` or `API_KEY` env var |
| `SelaClient.create(config)` | Full control via `SelaClientConfig` |

```python
import os
from selanet_sdk import SelaClient, SelaClientConfig

# Option 1: Pass the API key directly (recommended)
client = await SelaClient.with_api_key(os.environ["SELA_API_KEY"])

# Option 2: Auto-load from environment variables (SELA_API_KEY or API_KEY)
client = await SelaClient.from_env()

# Option 3: Full configuration
config = SelaClientConfig(api_key="sk_live_xxx", api_server_url="https://api.selanet.ai")
client = await SelaClient.create(config)
```

#### Methods

| Method | Description |
|--------|-------------|
| `browse(url, options?)` | Browse a URL and get semantic content |
| `browse_parallel_collect(items, max_concurrent_per_agent?)` | Browse multiple URLs in parallel |
| `shutdown(timeout_ms?)` | Gracefully shutdown the client |
| `state()` | Get current client state |

---

### Configuration

#### BrowseOptions

All fields are optional. Only set what you need.

| Field | Type | Description |
|-------|------|-------------|
| `parse_only` | `bool` | **Recommended.** Skip AI planning, extract only |
| `count` | `int` | Items to collect via infinite scroll (~10–100) |
| `extract_format` | `str` | `"schema"` (default), `"markdown"`, `"html"` |
| `x_params` | `XParams` | X (Twitter) search/profile/post params |
| `xiaohongshu_params` | `XiaohongshuParams` | Xiaohongshu search/note/profile/url params |
| `rednote_params` | `RednoteParams` | Rednote search/note/profile/url params |
| `youtube_params` | `YouTubeParams` | YouTube search/watch/comments params |
| `wait_for_agent` | `bool` | Wait for agent availability within timeout |
| `include_html` | `bool` | Return raw HTML in response |
| `timeout_ms` | `int` | Request timeout in ms |
| `session_id` | `str` | Reuse existing browser session |
| `query` | `str` | Query hint for semantic extraction |
| `api_key` | `str` | Per-request API key override |
| `agent_id` | `str` | Target a specific agent by peer ID |

> **Important:** `extract_format='schema'` only works for supported platforms (X, Xiaohongshu, Rednote, YouTube). For all other sites, you **must** use `'markdown'` or `'html'`.
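`session_id` lets a follow-up request reuse the browser session opened by an earlier one, e.g. to keep login state or page context. A sketch, assuming the session is still alive on the agent:

```python
# First request opens a session
first = await client.browse(
    url="https://x.com/search?q=python&f=live",
    options=BrowseOptions(parse_only=True),
)

# Follow-up request reuses the same browser session via its ID
more = await client.browse(
    url="https://x.com/search?q=rust&f=live",
    options=BrowseOptions(parse_only=True, session_id=first.session_id),
)
```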

#### SelaClientConfig

Most users should use `SelaClient.with_api_key()`. Use `SelaClientConfig` only when you need to customize timeouts or API server URL:

```python
import os
from selanet_sdk import SelaClient, SelaClientConfig

config = SelaClientConfig(
    api_key=os.environ["SELA_API_KEY"],
    api_server_url="https://api.selanet.ai",  # Optional, override API server
    connection_timeout_secs=30,     # Optional, default: 30s
    request_timeout_secs=120,       # Optional, default: 120s
)
client = await SelaClient.create(config)
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `api_key` | `str?` | `None` | API key (`sk_live_xxx`) |
| `api_server_url` | `str?` | `None` | API Server URL override |
| `connection_timeout_secs` | `int?` | `30` | Connection timeout |
| `request_timeout_secs` | `int?` | `120` | Per-request timeout |

All fields are optional.

---

### Response Types

#### SemanticResponse

```python
result = await client.browse(url, options)

result.session_id           # Browser session ID
result.page                 # SemanticPage
result.page.page_type       # e.g., "x_com::search", "generic"
result.page.content         # list[SemanticContent]
result.page.metadata.title  # Page title
result.collection_stats     # JSON string (when count is used) — use json.loads()
result.extracted_content    # Markdown/HTML text (when extract_format is set)
result.extract_format       # "schema", "html", or "markdown"
result.agent_version        # Agent node version (e.g., "0.2.6")
result.agent_id             # Agent peer ID that processed the request
result.request_id           # Unique request identifier
result.action_result        # ActionResult (Success, Failed, Skipped, etc.)
```

#### SemanticContent

```python
for item in result.page.content:
    item.content_type       # e.g., "tweet", "note", "user"
    item.content_id         # Unique ID (if available)
    item.fields_json        # JSON string — use json.loads()
    item.actions            # list[AvailableAction]
```

#### collection_stats

When using `count`, `collection_stats` contains scroll/collection metadata as a JSON string:

```python
if result.collection_stats:
    stats = json.loads(result.collection_stats)
    # {
    #   "total_collected": 30,
    #   "scroll_count": 5,
    #   "duplicates_removed": 2,
    #   "elapsed_ms": 12340
    # }
```

---

### Error Handling

```python
from selanet_sdk import SelaError

try:
    result = await client.browse(url, options)
except SelaError.ConfigurationError as e:
    print(f"Bad config: {e}")
except SelaError.TimeoutError as e:
    print(f"Request timed out: {e}")
except SelaError.BrowseError as e:
    print(f"Browse failed: {e}")
except SelaError.DiscoveryError as e:
    print(f"No agents available: {e}")
except SelaError as e:
    print(f"Error: {e}")
```

| Error | When |
|-------|------|
| `ConfigurationError` | Invalid config (bad API key format, missing fields) |
| `TimeoutError` | Request timed out |
| `BrowseError` | Browse operation failed on agent |
| `DiscoveryError` | No agents found or agent connection failed |
| `ConnectionError` | Network connection failed |
| `ProtocolError` | Rate limited, invalid response, serialization error |
| `InternalError` | Unexpected internal error |
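Transient failures (`TimeoutError`, `DiscoveryError`) are often worth retrying with backoff. A minimal sketch; it catches `Exception` so it runs standalone, but in real code you would narrow the `except` clause to the retryable `SelaError` subtypes above:

```python
import asyncio

def backoff_delays(attempts: int, base: float = 1.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(attempts)]

async def browse_with_retry(do_browse, attempts: int = 3, base: float = 1.0):
    """Retry an async browse call with exponential backoff.

    `do_browse` is a zero-arg callable returning a coroutine, e.g.
    `lambda: client.browse(url, options)`.
    """
    last_err = None
    for delay in backoff_delays(attempts, base):
        try:
            return await do_browse()
        except Exception as e:  # narrow to SelaError.TimeoutError etc. in real code
            last_err = e
            await asyncio.sleep(delay)
    raise last_err
```

Usage: `result = await browse_with_retry(lambda: client.browse(url, options))`.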

---

## Development

### Building from Source

Requires: Rust 1.70+, Python 3.9+, maturin

```bash
pip install maturin

cd client-sdk/crates/sela-py
maturin develop --release   # Install in dev mode
maturin build --release     # Build wheel
```

### Running the Example

```bash
cd client-sdk
python examples/browse_example.py
```

### Running Tests

```bash
cargo test -p sela-py
```

## License

MIT License - see [LICENSE](../../../LICENSE) for details.

