GGUF setup — embedded local LLM

When built with the embedded-llm feature, Gnomad can run small GGUF models in-process for the command planner and, optionally, for local chat. Models are not bundled with the repository or the default installers because of their size and licensing terms.

Quick start

Build with embedded LLM:
```
npm run tauri:dev:embedded
```
Download a small model (optional helper):
```
npm run download:gguf
```
By default, the helper downloads Qwen2.5-Coder 1.5B Instruct Q4 (~1 GB) into ~/.gnomad/models/.

In Settings → Agent access → GGUF path, paste the full path to the .gguf file.

Enable Command planner, Use GGUF for local chat (experimental), or both.
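The GGUF path setting needs the full path to the file. As a convenience, a small shell helper can print the absolute paths of the .gguf files in a models directory. This is a hypothetical sketch, not a script shipped in the repo; `list_gguf_models` is an illustrative name, and the only assumption taken from this page is the default ~/.gnomad/models/ location.

```shell
# Hypothetical helper (not part of the repo): print absolute paths of the
# .gguf files directly inside a models directory, one per line, sorted.
# Defaults to ~/.gnomad/models, the location used by `npm run download:gguf`.
list_gguf_models() {
  local dir="${1:-$HOME/.gnomad/models}"
  if [ ! -d "$dir" ]; then
    echo "no models directory at $dir" >&2
    return 1
  fi
  # Resolve to an absolute path with cd + pwd (works on Linux and macOS).
  local abs
  abs="$(cd "$dir" && pwd)" || return 1
  # -maxdepth 1: only files directly in the directory, not subfolders.
  find "$abs" -maxdepth 1 -type f -name '*.gguf' | sort
}
```

Run `list_gguf_models` with no argument to search the default directory, or pass another directory (for example, the one given to the download script) and copy the printed path into Settings.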

Custom download

```
# Download into a directory of your choice:
bash scripts/download-gguf-model.sh /path/to/output/dir

# Or download a different model by URL:
GGUF_URL="https://huggingface.co/.../model.gguf" bash scripts/download-gguf-model.sh ~/models
```

After the download finishes, set the GGUF path in Settings to the file path the script prints.
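Before pasting the path into Settings, it can help to confirm that the download is a real GGUF file rather than, for example, a saved HTML page. GGUF files begin with the four ASCII bytes `GGUF`, so a quick check of the first bytes catches most bad downloads. The helper below is a hypothetical sketch (`is_gguf` is an illustrative name, not a repo script) and only checks the magic number, not the rest of the header.

```shell
# Hypothetical helper (not part of the repo): succeed only if the file
# exists and starts with the 4-byte GGUF magic "GGUF".
is_gguf() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "not a file: $file" >&2
    return 1
  fi
  # Compare the first four bytes against the ASCII magic.
  [ "$(head -c 4 "$file")" = "GGUF" ]
}
```

For example, `is_gguf ~/models/model.gguf && echo "looks like GGUF"`. With a custom Hugging Face URL, a failed check often means the URL points at the `/blob/` web page for the file instead of the `/resolve/` download link.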

Recommended models (dev)

| Model | Size (approx.) | Use case |
|---|---|---|
| Qwen2.5-Coder 1.5B Q4 | ~1 GB | Command planner, light local chat |
| Llama 3.2 3B Q4 | ~2 GB | General chat (slower on CPU) |

For CPU inference on a laptop, use Q4_K_M or a similar 4-bit quantization.
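GGUF files published on Hugging Face usually encode the quantization in the file name (for example, a name ending in `-q4_k_m.gguf`). Assuming that naming convention, a rough filter over a list of paths can pick out the 4-bit variants. This is a hypothetical helper that looks only at names, not at the quantization recorded inside the file, so a misleadingly named file will slip through.

```shell
# Hypothetical helper (not part of the repo): read paths on stdin and keep
# those whose file name ends in a Q4 quantization tag such as q4_k_m or Q4_0.
# Matching is case-insensitive and relies only on the naming convention.
filter_q4_models() {
  grep -i -E 'q4(_[a-z0-9]+)*\.gguf$'
}
```

For example, `ls ~/.gnomad/models/*.gguf | filter_q4_models` prints only the Q4-tagged models in the default directory.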

CI note

The default GitHub Actions builds omit the embedded-llm feature to keep the build matrix fast. When testing GGUF support, enable it locally or in a dedicated workflow.

Related

- USER_GUIDE.md: embedded GGUF section
- WAVE_B_ROADMAP.md: B2 design
- BUILD.md: build flags

Built with ❤️ by Gnomad Studio 🦙
