GGUF setup

GGUF setup: embedded local LLM

Gnomad can run small GGUF models in-process for the command planner and, optionally, for local chat when it is built with the embedded-llm feature. Models are not bundled in the repo or the default installers because of their size and licensing.


Quick start

  1. Build with embedded LLM:

    npm run tauri:dev:embedded
    
  2. Download a small model (optional helper):

    npm run download:gguf
    

    Default: Qwen2.5-Coder 1.5B Instruct Q4 (~1 GB) into ~/.gnomad/models/.

  3. Settings → Agent access → GGUF path: paste the full path to the .gguf file.

  4. Enable Command planner and/or Use GGUF for local chat (experimental).
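
Before pasting a path in step 3, it can help to confirm that the file really is a GGUF model and to get its absolute path. The sketch below relies on one fact about the format: a GGUF file starts with the four ASCII bytes "GGUF". The function names is_gguf and gguf_settings_path are hypothetical helpers for illustration, not part of Gnomad.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Gnomad).

# Succeeds if the first four bytes of the file are the ASCII string "GGUF".
is_gguf() {
  local file="$1" magic
  [ -f "$file" ] || return 1
  # tr drops NUL bytes so the shell does not warn on arbitrary binary input
  magic="$(head -c 4 "$file" 2>/dev/null | tr -d '\0')"
  [ "$magic" = "GGUF" ]
}

# Prints the absolute path of a GGUF file, or an error on stderr.
gguf_settings_path() {
  local file="$1"
  if is_gguf "$file"; then
    # Resolve the containing directory so the printed path is absolute
    printf '%s/%s\n' "$(cd "$(dirname "$file")" && pwd)" "$(basename "$file")"
  else
    echo "not a GGUF file: $file" >&2
    return 1
  fi
}
```

After sourcing the functions, run gguf_settings_path with the path to the downloaded file and paste the printed line into the GGUF path field. A magic-byte check only catches wrong or truncated-at-start files; it does not prove the model loads.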


Custom download

bash scripts/download-gguf-model.sh /path/to/output/dir
# or with a custom URL:
GGUF_URL="https://huggingface.co/.../model.gguf" bash scripts/download-gguf-model.sh ~/models

After download, set the GGUF path in Settings to the printed file path.
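
If the printed path has scrolled away, the model files can be listed again. This is a minimal sketch, assuming the models were saved with a .gguf extension; list_gguf_models is a hypothetical helper, and its default directory is the ~/.gnomad/models/ location mentioned in the quick start.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every .gguf file under a directory as an absolute path.
list_gguf_models() {
  local dir="${1:-$HOME/.gnomad/models}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # cd + pwd turns a relative directory into an absolute one, so find prints full paths
  find "$(cd "$dir" && pwd)" -type f -name '*.gguf' | sort
}
```

Call it with the same output directory that was passed to the download script, then copy the relevant line into Settings.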


Model                   Size (approx.)   Use case
Qwen2.5-Coder 1.5B Q4   ~1 GB            Command planner, light local chat
Llama 3.2 3B Q4         ~2 GB            General chat (slower on CPU)

Use Q4_K_M or similar quantizations for laptop CPU inference.
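
The approximate sizes above follow from simple arithmetic: file size is roughly the parameter count times the bits per weight, divided by 8. The sketch below assumes about 4.8 bits per weight for Q4_K_M-style quantization, which is a ballpark figure rather than an exact property of any file; real files differ somewhat because of metadata and per-tensor quantization choices. The function name estimate_gguf_gb is hypothetical.

```shell
#!/usr/bin/env bash
# Rough size estimate in GB: (parameters in billions) x (bits per weight) / 8.
# $1 = parameter count in billions
# $2 = bits per weight (optional; 4.8 is an assumed ballpark for Q4_K_M)
estimate_gguf_gb() {
  LC_ALL=C awk -v params_b="$1" -v bpw="${2:-4.8}" \
    'BEGIN { printf "%.1f\n", params_b * bpw / 8 }'
}

estimate_gguf_gb 1.5   # prints 0.9
estimate_gguf_gb 3     # prints 1.8
```

Both results land in the same range as the sizes listed above. Plan for more free RAM than the file size, since inference also needs memory for the context.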


CI note

Default GitHub Actions builds omit the embedded-llm feature to keep the build matrix fast. Enable it locally or in a dedicated workflow when testing GGUF.



Built with ❤️ by Gnomad Studio 🦙