Metadata-Version: 2.3
Name: jehoctor-rag-demo
Version: 0.2.4
Summary: Chat with Wikipedia
Author: James Hoctor
Author-email: James Hoctor <JEHoctor@protonmail.com>
Requires-Dist: aiosqlite==0.21.0
Requires-Dist: bitsandbytes>=0.49.1
Requires-Dist: chromadb>=1.3.4
Requires-Dist: datasets>=4.4.1
Requires-Dist: httpx>=0.28.1
Requires-Dist: huggingface-hub>=0.36.0
Requires-Dist: langchain>=1.0.5
Requires-Dist: langchain-anthropic>=1.0.2
Requires-Dist: langchain-community>=0.4.1
Requires-Dist: langchain-huggingface>=1.1.0
Requires-Dist: langchain-ollama>=1.0.0
Requires-Dist: langchain-openai>=1.0.2
Requires-Dist: langgraph-checkpoint-sqlite>=3.0.1
Requires-Dist: nvidia-ml-py>=13.590.44
Requires-Dist: ollama>=0.6.0
Requires-Dist: platformdirs>=4.5.0
Requires-Dist: psutil>=7.1.3
Requires-Dist: py-cpuinfo>=9.0.0
Requires-Dist: pydantic>=2.12.4
Requires-Dist: pyperclip>=1.11.0
Requires-Dist: sentence-transformers>=5.2.2
Requires-Dist: textual>=6.5.0
Requires-Dist: transformers[torch]>=4.57.6
Requires-Dist: typer>=0.20.0
Requires-Dist: llama-cpp-python>=0.3.16 ; extra == 'llamacpp'
Requires-Python: ~=3.12.0
Provides-Extra: llamacpp
Description-Content-Type: text/markdown

# RAG-demo

Chat with (a small portion of) Wikipedia

⚠️ RAG functionality is still under development. ⚠️

![app screenshot](screenshots/screenshot_0.2.0.png "App screenshot")

## Requirements

 1. The [uv](https://docs.astral.sh/uv/) Python package manager
    - Install or update `uv` by following [the docs](https://docs.astral.sh/uv/getting-started/installation/).
    - As of 2026-01-25, I'm developing with `uv` version 0.9.26 and using its experimental `--torch-backend` option.
 2. A terminal emulator or web browser
    - Any common web browser will work.
    - Some terminal emulators will work better than others.
      See [Notes on terminal emulators](#notes-on-terminal-emulators) below.

### Notes on terminal emulators

Some features of this program do not work in certain terminal emulators.
In particular, on macOS consider using [iTerm2](https://iterm2.com/) instead of the default Terminal.app ([explanation](https://textual.textualize.io/FAQ/#why-doesnt-textual-look-good-on-macos)).
On Linux you might want to try [kitty](https://sw.kovidgoyal.net/kitty/), [wezterm](https://wezterm.org/), [alacritty](https://alacritty.org/), or [ghostty](https://ghostty.org/) instead of the terminal that came with your desktop environment ([reason](https://darren.codes/posts/textual-copy-paste/)).
Windows Terminal should be fine as far as I know.

### Optional dependencies

 1. A [Hugging Face login](https://huggingface.co/docs/huggingface_hub/quick-start#login)
 2. An API key for your favorite LLM provider (support coming soon)
 3. [Ollama](https://ollama.com/) installed on your system, if you have a GPU
 4. A more capable machine (bigger GPU) to run RAG-demo on over SSH, if you have access to one. It is a terminal app, after all.
 5. A C compiler, if you want to build Llama.cpp from source
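If you do want to use a hosted LLM provider once support lands, LangChain's OpenAI and Anthropic integrations conventionally read their credentials from the standard `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` environment variables. A sketch with placeholder values (substitute your real keys, and note that RAG-demo's exact configuration mechanism may differ):

```shell
# Placeholder values shown; replace with your actual keys.
export OPENAI_API_KEY="sk-your-key-here"
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
```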

## Run the latest version

Run in a terminal:
```bash
uvx --torch-backend=auto --from=jehoctor-rag-demo@latest chat
```
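`@latest` always resolves to the newest release. If you want a reproducible invocation, you can pin an exact version instead (0.2.4 shown here as an example):

```shell
uvx --torch-backend=auto --from=jehoctor-rag-demo==0.2.4 chat
```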

Or run in a web browser:
```bash
uvx --torch-backend=auto --from=jehoctor-rag-demo@latest textual serve chat
```
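`textual serve` prints the local URL to open in your browser. If its default port is already taken on your machine, the serve command accepts a port option (check `textual serve --help` for the exact flags in your installed version):

```shell
uvx --torch-backend=auto --from=jehoctor-rag-demo@latest textual serve --port 8000 chat
```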

## CUDA acceleration via Llama.cpp

If you have an NVIDIA GPU and the CUDA toolkit and build tools installed, you may be able to get CUDA acceleration without installing Ollama.

```bash
CMAKE_ARGS="-DGGML_CUDA=on" uv run --extra=llamacpp chat
```
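If the build fails or silently falls back to CPU, it is usually worth confirming that the CUDA toolchain is visible before retrying. A quick preflight, assuming the standard NVIDIA and CMake tools are on your `PATH`:

```shell
nvcc --version    # CUDA compiler, needed to build the GGML CUDA kernels
nvidia-smi        # confirms the driver can see your GPU
cmake --version   # llama-cpp-python builds via CMake
```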

## Metal acceleration via Llama.cpp (on Apple Silicon)

On an Apple Silicon machine, make sure `uv` uses an ARM Python interpreter; this should cause it to install Llama.cpp with Metal support.
Also run with the `llamacpp` extra.
Try this:

```bash
uvx --python-platform=aarch64-apple-darwin --torch-backend=auto --from='jehoctor-rag-demo[llamacpp]@latest' chat
```

## Ollama on Linux

On Linux, remember that you have to keep Ollama up to date manually.
A recent version of Ollama (v0.11.10 or later) is required to run the [embedding model we use](https://ollama.com/library/embeddinggemma).
See the [Ollama FAQ](https://docs.ollama.com/faq#how-can-i-upgrade-ollama) for upgrade instructions.
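Ollama reports its version over HTTP (`GET http://localhost:11434/api/version` on a default install). The minimum-version gate itself is just a tuple comparison; a small sketch of that logic, with the HTTP call left out:

```python
# Sketch: Ollama 0.11.10 or later is needed for embeddinggemma.
# Fetching the running server's version (e.g. via
# `curl http://localhost:11434/api/version`) is left out; only the
# comparison logic is shown.

MIN_OLLAMA_VERSION = "0.11.10"

def parse_version(v: str) -> tuple[int, ...]:
    # "0.11.10" -> (0, 11, 10); drop any pre-release suffix like "-rc1"
    return tuple(int(part) for part in v.split("-")[0].split("."))

def ollama_is_new_enough(version: str, minimum: str = MIN_OLLAMA_VERSION) -> bool:
    return parse_version(version) >= parse_version(minimum)

print(ollama_is_new_enough("0.11.9"))   # False: one patch release too old
print(ollama_is_new_enough("0.12.3"))   # True
```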

## Project feature roadmap

- ❌ RAG functionality
- ✅ torch inference via LangChain's local Hugging Face inference integration
- ✅ uv automatic torch backend selection (see [the docs](https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection))
- ❌ OpenAI integration
- ❌ Anthropic integration

## Run from the repository

First, clone this repository. Then, run one of the options below.

Run in a terminal:
```bash
uv run chat
```

Or run in a web browser:
```bash
uv run textual serve chat
```
