nixos-configs

johno/nixos-configs

Commit Graph

Author: John Ogle
SHA1: 170a27310e
Date: 2026-04-16 15:37:02 -07:00
CI: check (push) failing after 1m44s; build-and-cache (push) skipped

feat(local-inference): add TTL support for automatic model unloading

Add globalTTL and per-model ttl options to llama-swap config,
allowing idle models to be automatically unloaded from memory.

Author: John Ogle
SHA1: 10efafd92e
Date: 2026-04-16 15:20:37 -07:00

feat(local-inference): replace ollama with llama-swap + llama.cpp on zix790prors

- Add local-inference NixOS role using llama-swap (from nixpkgs-unstable)
  with llama.cpp (CUDA-enabled, from nixpkgs-unstable)
- Serves Qwen3.6-35B-A3B via HuggingFace auto-download with --cpu-moe
- Add nixosSpecialArgs for nixpkgs-unstable module access
- Configure opencode with llama-local provider pointing to zix790prors:8080
- Update gptel from Ollama backend to OpenAI-compatible llama-swap backend
- Remove ollama service from zix790prors

2 Commits