Metadata-Version: 2.4
Name: londonaicentre-mesa-local
Version: 2.1.0
Summary: Serve MESA models locally
Author-email: "Dr. Joe Zhang" <jzhang@nhs.net>, Martin Chapman <contact@martinchapman.co.uk>, "Dr. Lawrence Adams" <lawrence.adams2@nhs.net>
License-Expression: LicenseRef-Proprietary
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: boto3>=1.42.56
Requires-Dist: httpx>=0.28.1
Requires-Dist: pydantic>=2.12.4
Requires-Dist: pydantic-settings>=2.11.0
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: tqdm>=4.67.1
Requires-Dist: vllm>=0.16.0
Dynamic: license-file

# MESA local

Serve MESA models locally.

- ⬇️ Downloads weights from S3

- 📦 Unpacks

- 🚀 Serves via a local OpenAI-compatible server

## Prerequisites

### Software

- Python 3.12

### Hardware

- A GPU with >=24GB VRAM (tested on NVIDIA A30)

### Configuration

- Create a file called `.env` in the directory where you intend to run this package.
Populate it with the details you have been provided, in the following format:

```text
MODEL_NAME=
WEIGHTS_ID=
WEIGHTS_KEY=
```
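The `.env` format is plain `KEY=VALUE` pairs, one per line. As an illustration only (the package itself reads the file via `pydantic-settings`), a minimal stdlib sketch of parsing this format, with hypothetical placeholder values:

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Placeholder values for illustration; use the details you were provided.
sample = "MODEL_NAME=example-model\nWEIGHTS_ID=abc\nWEIGHTS_KEY=secret"
print(parse_dotenv(sample)["MODEL_NAME"])  # → example-model
```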

#### vLLM configuration

The package provides a [set of vLLM configuration files](src/mesalocal/config/) for running a specific model on a specific GPU.
In addition to `MODEL_NAME`, a configuration can be selected by adding a `GPU` entry to the `.env` file.
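For example (values hypothetical; `a30` follows the tested hardware above, and the accepted values depend on the bundled configuration files):

```text
MODEL_NAME=oncollama3betav01
GPU=a30
```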

## Installation

1. (Recommended) Create a virtual environment and activate it:

    ```shell
    python -m venv .venv
    source .venv/bin/activate
    ```

2. Install this package: `pip install londonaicentre-mesa-local`.

## Usage

### CLI (primary)

1. Note command line arguments:

    | Argument | Description |
    | --- | --- |
    | -v, --verbose | Enable debug output (optional) |

2. Start the server as follows: `mesalocal [args]`.

### Library (secondary)

1. Import and use the logic of this package as a library:

```python
import asyncio
from mesalocal.weights import Weights
from mesalocal.inferrer import VLLM, VLLMConfig  # VLLMConfig import path assumed

# VLLMConfig() reads MODEL_NAME (and optionally GPU) from the .env file;
# pass VLLMConfig(model_name="foo", gpu="bar") to configure without one.
vllm_config: VLLMConfig = VLLMConfig()
weights: Weights = Weights(vllm_config.model)

if weights.unpack():
    vllm: VLLM = VLLM(weights.get_model_folder(), vllm_config)
    prompt: str = "Diagnosis 01/01/26..."  # example prompt

    async def run() -> None:
        async for output in vllm.generate(prompt):
            print(output.outputs[0].text)

    asyncio.run(run())
```

## Clients

### OpenAI (example with _Oncollama_)

1. Interact with the server using the [OpenAI client](https://pypi.org/project/openai) in Python:

    ```python
    from openai import OpenAI
    from oncoschema.prompt_builder import PromptBuilder # pip install londonaicentre-oncoschema

    client = OpenAI(
        base_url="http://localhost:5000/v1",
        api_key="blank"
    )

    response = client.chat.completions.create(
        model="oncollama3betav01",
        messages=[
            {"role": "system", "content": PromptBuilder().build_main_prompt()},
            {"role": "user", "content": "Diagnosis 01/01/26..."}
        ]
    )

    print(response.choices[0].message.content)
    ```

## License

This project uses a proprietary license (see [LICENSE](LICENSE.md)).
