Metadata-Version: 2.4
Name: crawlnest-mcp
Version: 0.3.1
Summary: CrawlNest MCP server for web scraping, Twitter, Reddit, video download, image download, and transcription. Use with Claude Code, Claude Desktop, Cursor, and other MCP clients.
Author-email: WonderCrafts <support@crawlnest.com>
License: MIT
Project-URL: Homepage, https://crawlnest.com
Project-URL: Documentation, https://docs.crawlnest.com
Project-URL: Repository, https://github.com/WonderCrafts/CrawlNest
Project-URL: Issues, https://github.com/WonderCrafts/CrawlNest/issues
Keywords: mcp,model-context-protocol,web-scraping,twitter,reddit,claude-code,claude-desktop,anthropic,scraping,crawler,anti-bot,data-extraction,video-download,image-download,transcription
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastmcp>=0.1.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: httpx>=0.24.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Dynamic: license-file

# crawlnest-mcp

MCP server for web scraping, Twitter, and Reddit — 17 tools for any MCP-compatible LLM client.

Works with Claude Code, Claude Desktop, Cursor, Windsurf, and any other MCP client.

## Quick Start

```bash
pip install crawlnest-mcp
```

Then configure your client with two env vars:

| Variable | Description |
|----------|-------------|
| `CRAWLNEST_API_URL` | Your CrawlNest API endpoint (e.g., `https://api.crawlnest.com` or `http://100.x.x.x:8000` for Tailscale) |
| `CRAWLNEST_API_KEY` | Your API key (starts with `cn_...`) |

That's it. No SSH, no scripts, no cloning repos.
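
If you want to confirm the install and your values before touching any client config, you can launch the server by hand. MCP clients normally pass these variables from their own config, so the exports below are only needed for this manual check:

```bash
# Verify the entry point is on PATH
which crawlnest-mcp

# The server reads its configuration from the environment at startup
export CRAWLNEST_API_URL=https://api.crawlnest.com
export CRAWLNEST_API_KEY=cn_your_key
crawlnest-mcp   # speaks MCP over stdio; stop with Ctrl+C
```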

## Client Configuration

### Claude Code

```bash
claude mcp add crawlnest \
  -e CRAWLNEST_API_URL=https://api.crawlnest.com \
  -e CRAWLNEST_API_KEY=cn_your_key \
  -- crawlnest-mcp
```

Or add to `~/.claude.json` manually:

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "https://api.crawlnest.com",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```

### Claude Desktop

Add to your config file:

| Platform | Config Path |
|----------|------------|
| macOS | `~/Library/Application Support/Claude/claude_desktop_config.json` |
| Windows | `%APPDATA%\Claude\claude_desktop_config.json` |
| Linux | `~/.config/Claude/claude_desktop_config.json` |

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "https://api.crawlnest.com",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```

Restart Claude Desktop after saving.

### Cursor

Add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "https://api.crawlnest.com",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```

### Windsurf

Add to `~/.codeium/windsurf/mcp_config.json`:

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "https://api.crawlnest.com",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```

### Self-Hosted / Tailscale / Proxmox

Point `CRAWLNEST_API_URL` at your instance:

```json
{
  "mcpServers": {
    "crawlnest": {
      "command": "crawlnest-mcp",
      "env": {
        "CRAWLNEST_API_URL": "http://100.64.0.5:8000",
        "CRAWLNEST_API_KEY": "cn_your_key"
      }
    }
  }
}
```

Replace `100.64.0.5` with your Tailscale IP (`tailscale ip` on the server).
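
Before wiring up the client, it's worth confirming the API is reachable from the machine that runs your MCP client; the same `/health` endpoint used in Troubleshooting works here:

```bash
# Run on the machine that hosts your MCP client
curl -H "X-API-Key: cn_your_key" http://100.64.0.5:8000/health
```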

## Tools (17)

### Web Scraping

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| `scrape_url` | Scrape a single page with anti-bot bypass | `url` |
| `crawl_website` | Crawl up to 50 pages with sitemap discovery | `url`, `max_pages` |
| `extract_structured` | Extract structured data with custom schema | `url`, `schema`, `output_format` |

### Twitter

| Tool | Description | Credentials |
|------|-------------|-------------|
| `fetch_tweet` | Fetch tweet by URL (3-layer cascade) | No |
| `twitter_user_info` | User profile info | No |
| `twitter_search` | Search tweets by query | Required |
| `twitter_trending` | Trending topics (local/global) | Required |
| `twitter_news` | News timeline | Required |
| `twitter_for_you` | Explore timeline | Required |
| `twitter_home_timeline` | Home feed (for_you/following) | Required |

### Reddit

| Tool | Description | Credentials |
|------|-------------|-------------|
| `reddit_post` | Fetch post + comment tree | No |
| `reddit_community` | Subreddit post listings | No |
| `reddit_about` | Subreddit metadata | No |
| `reddit_search` | Search across Reddit | No |
| `reddit_explore` | Discover subreddits | No |
| `reddit_popular` | Trending posts (with geo filter) | No |
| `reddit_feed` | Personalized home/news feed | Only with `use_cookies=true` |

## Social Media Credentials

Twitter and Reddit tools that need authentication use credentials stored server-side; they are never passed through MCP. Set them once via the API.

### Twitter (for search, trending, timelines)

Get `auth_token` and `ct0` from x.com cookies (DevTools → Application → Cookies):

```bash
curl -X POST $CRAWLNEST_API_URL/api/twitter/credentials \
  -H "X-API-Key: cn_your_key" \
  -H "Content-Type: application/json" \
  -d '{"auth_token": "YOUR_AUTH_TOKEN", "ct0": "YOUR_CT0"}'
```

### Reddit (for personalized feeds only)

Get `reddit_session` from reddit.com cookies:

```bash
curl -X POST $CRAWLNEST_API_URL/api/subreddit/credentials \
  -H "X-API-Key: cn_your_key" \
  -H "Content-Type: application/json" \
  -d '{"reddit_session": "YOUR_SESSION"}'
```

## Usage Examples

```
> Scrape https://news.ycombinator.com and show the top stories

> Extract product name, price, and image from https://example.com/products as CSV

> Crawl https://anthropic.com/news with max 5 pages

> Get the latest tweet from https://x.com/anthropic/status/123456

> What's trending on Twitter right now?

> Show me the top posts from r/technology this week

> Search Reddit for "machine learning" in r/programming
```

## Troubleshooting

### "Command not found: crawlnest-mcp"

```bash
pip install --force-reinstall crawlnest-mcp
which crawlnest-mcp
```
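
If `which` still comes up empty, pip's scripts directory is probably not on your `PATH`. One workaround (assuming you have pipx available) is to install the package into an isolated environment whose entry points are always on `PATH`:

```bash
pipx install crawlnest-mcp
```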

### "CrawlNest API not reachable"

```bash
curl -H "X-API-Key: cn_your_key" $CRAWLNEST_API_URL/health
```
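
If `CRAWLNEST_API_URL` points at a Tailscale address and the request times out, check the tunnel itself before suspecting the server (assuming the Tailscale CLI is installed on the client machine):

```bash
tailscale status
```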

### "Twitter credentials missing"

Save credentials first (see [Social Media Credentials](#social-media-credentials)).

### Tools not showing

1. Restart your client
2. Validate the config file's JSON syntax: `python3 -m json.tool < config.json` (substitute your client's config path)
3. Check logs: `tail -f /tmp/mcp_server.log`

## Self-Hosting

```bash
git clone https://github.com/WonderCrafts/CrawlNest
cd CrawlNest
make setup && make infra && make db-apply && make dev
```

Then use `http://localhost:8000` as your `CRAWLNEST_API_URL`.
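
Once the stack is up, the same health check from Troubleshooting confirms the local instance is serving requests (use whatever API key your instance issued):

```bash
curl -H "X-API-Key: cn_your_key" http://localhost:8000/health
```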

## License

MIT

## Links

- [GitHub](https://github.com/WonderCrafts/CrawlNest)
- [Issues](https://github.com/WonderCrafts/CrawlNest/issues)
- [Full MCP Setup Guide](https://github.com/WonderCrafts/CrawlNest/blob/main/docs/MCP_SETUP.md)
